Compute Library 21.02: NEGEMMLowpOutputStage.h
/*
 * Copyright (c) 2017-2021 Arm Limited.
 *
 * SPDX-License-Identifier: MIT
 *
 * Permission is hereby granted, free of charge, to any person obtaining a copy
 * of this software and associated documentation files (the "Software"), to
 * deal in the Software without restriction, including without limitation the
 * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
 * sell copies of the Software, and to permit persons to whom the Software is
 * furnished to do so, subject to the following conditions:
 *
 * The above copyright notice and this permission notice shall be included in all
 * copies or substantial portions of the Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 * SOFTWARE.
 */
#ifndef ARM_COMPUTE_NEGEMMLOWPOUTPUTSTAGE_H
#define ARM_COMPUTE_NEGEMMLOWPOUTPUTSTAGE_H

#include "arm_compute/core/Types.h"
#include "arm_compute/runtime/NEON/INESimpleFunctionNoBorder.h"

/** This file contains all available output stages for GEMMLowp on Neon.
 *
 * In gemmlowp, the "output stage" is the process that takes a final int32 accumulator value (the output of @ref NEGEMMLowpMatrixMultiplyCore),
 * and processes it to obtain the final ASYMM8 value.
 *
 * More information about the GEMMLowp output stage can be found at https://github.com/google/gemmlowp/blob/master/doc/output.md
 */
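
// A hedged numeric example of such an output stage: an int32 accumulator of 2048,
// requantized with a fixed-point multiplier of 2^30 (a real scale of 0.5), a right
// shift of 5 and an offset of 10, becomes (round(2048 * 0.5) >> 5) + 10 = 32 + 10 = 42,
// which then saturates into the 8-bit output range.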

namespace arm_compute
{
class ITensor;
class ITensorInfo;

/** Basic function to execute NEGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPoint on Neon.
 *
 * NEGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPoint depends on 3 parameters:
 *
 * result_fixedpoint_multiplier, result_shift, result_offset_after_shift
 *
 * The final result is:
 *
 * (FixedPointMul(input[i][k], result_fixedpoint_multiplier) >> result_shift) + result_offset_after_shift
 *
 * where FixedPointMul(x, y) is the nearest integer to the following
 * mathematical expression, evaluated without overflow or intermediate rounding:
 *
 * (x * y) / 2^31
 *
 * For more information: https://github.com/google/gemmlowp/blob/master/public/output_stages.h#L68
 *
 * In case the bias tensor is provided, the final result is:
 *
 * ((FixedPointMul(input[i][k] + bias[k], result_fixedpoint_multiplier)) >> result_shift) + result_offset_after_shift
 *
 * This function calls the following Neon kernels:
 *
 * -# @ref NEGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPointKernel
 *
 * @note The function also accepts two optional input arguments (min and max) which can be used to implement "rectified linear unit" activation functions
 * after the result is shifted right by result_shift.
 */
class NEGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPoint : public INESimpleFunctionNoBorder
{
public:
    /** Constructor */
    NEGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPoint() = default;
    /** Prevent instances of this class from being copied (As this class contains pointers) */
    NEGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPoint(const NEGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPoint &) = delete;
    /** Prevent instances of this class from being copied (As this class contains pointers) */
    NEGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPoint &operator=(const NEGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPoint &) = delete;
    /** Prevent instances of this class from being moved (As this class contains non movable objects) */
    NEGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPoint(NEGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPoint &&) = delete;
    /** Prevent instances of this class from being moved (As this class contains non movable objects) */
    NEGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPoint &operator=(NEGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPoint &&) = delete;
    /** Default destructor */
    ~NEGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPoint();
    /** Initialise the kernel's inputs and output.
     *
     * @param[in]  input                        Input tensor. Data type supported: S32
     * @param[in]  bias                         Biases tensor. Only shared biases are supported and it can be a nullptr if the addition of biases is not required.
     *                                          Biases are a 1D tensor with dimensions [OFM]. Data type supported: Same as @p input.
     * @param[out] output                       Output tensor. Data type supported: QASYMM8
     * @param[in]  result_fixedpoint_multiplier Fixed point value to be multiplied with each element of the input matrix once the result_offset has been added.
     * @param[in]  result_shift                 Number of bits to shift right the result after the fixed point multiplication.
     * @param[in]  result_offset_after_shift    Offset to be applied to the result before converting it back to QASYMM8.
     * @param[in]  min                          (Optional) Min value used to saturate down the output result before converting back to QASYMM8. Defaults to the minimum possible 32-bit signed integer.
     * @param[in]  max                          (Optional) Max value used to saturate up the output result before converting back to QASYMM8.
     *                                          Along with @p min, this value can be used to implement "rectified linear unit" activation functions. Defaults to the maximum possible 32-bit signed integer.
     */
    void configure(const ITensor *input, const ITensor *bias, ITensor *output, int result_fixedpoint_multiplier, int result_shift, int result_offset_after_shift,
                   int min = std::numeric_limits<int32_t>::lowest(), int max = std::numeric_limits<int32_t>::max());
    /** Static function to check if given info will lead to a valid configuration of @ref NEGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPoint
     *
     * @param[in] input  Input tensor info. It is the output of @ref NEGEMMLowpMatrixMultiplyCore. Data type supported: S32
     * @param[in] bias   Biases tensor info. Only shared biases are supported and it can be a nullptr if the addition of biases is not required.
     *                   Biases are a 1D tensor with dimensions [OFM]. Data type supported: Same as @p input.
     * @param[in] output Output tensor info. Data type supported: QASYMM8
     * @param[in] min    (Optional) Min value used to saturate down the output result before converting back to QASYMM8. Defaults to the minimum possible 32-bit signed integer.
     * @param[in] max    (Optional) Max value used to saturate up the output result before converting back to QASYMM8.
     *                   Along with @p min, this value can be used to implement "rectified linear unit" activation functions. Defaults to the maximum possible 32-bit signed integer.
     *
     * @return a status
     */
    static Status validate(const ITensorInfo *input, const ITensorInfo *bias, const ITensorInfo *output, int min = std::numeric_limits<int32_t>::lowest(), int max = std::numeric_limits<int32_t>::max());
};
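
// For reference, a scalar model of the quantize-down arithmetic documented above.
// This is an illustrative sketch, not part of the original header: the actual kernel
// implements the same computation with Neon intrinsics. FixedPointMul follows
// gemmlowp's SaturatingRoundingDoublingHighMul: the nearest integer to (x * y) / 2^31,
// evaluated in 64-bit so the intermediate product cannot overflow.
inline int example_quantize_down_to_uint8(int acc, int bias,
                                          int result_fixedpoint_multiplier, int result_shift,
                                          int result_offset_after_shift, int min, int max)
{
    const long long ab    = (static_cast<long long>(acc) + bias) * result_fixedpoint_multiplier;
    const long long nudge = ab >= 0 ? (1LL << 30) : (1 - (1LL << 30));    // round half away from zero
    int             res   = static_cast<int>((ab + nudge) / (1LL << 31)); // FixedPointMul(acc + bias, multiplier)
    if(result_shift > 0)
    {
        res = (res + (1 << (result_shift - 1))) >> result_shift; // rounding right shift
    }
    res += result_offset_after_shift;
    res = res < min ? min : (res > max ? max : res); // optional clamp, e.g. a fused ReLU
    return res < 0 ? 0 : (res > 255 ? 255 : res);    // saturate to the QASYMM8 range
}
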
/** Basic function to execute NEGEMMLowpQuantizeDownInt32ToInt8ScaleByFixedPoint on Neon.
 *
 * NEGEMMLowpQuantizeDownInt32ToInt8ScaleByFixedPoint depends on 3 parameters:
 *
 * result_fixedpoint_multiplier, result_shift, result_offset_after_shift
 *
 * The final result is:
 *
 * (FixedPointMul(input[i][k], result_fixedpoint_multiplier) >> result_shift) + result_offset_after_shift
 *
 * where FixedPointMul(x, y) is the nearest integer to the following
 * mathematical expression, evaluated without overflow or intermediate rounding:
 *
 * (x * y) / 2^31
 *
 * For more information: https://github.com/google/gemmlowp/blob/master/public/output_stages.h#L68
 *
 * In case the bias tensor is provided, the final result is:
 *
 * ((FixedPointMul(input[i][k] + bias[k], result_fixedpoint_multiplier)) >> result_shift) + result_offset_after_shift
 *
 * This function calls the following Neon kernels:
 *
 * -# @ref NEGEMMLowpQuantizeDownInt32ToInt8ScaleByFixedPointKernel
 *
 * @note The function also accepts two optional input arguments (min and max) which can be used to implement "rectified linear unit" activation functions
 * after the result is shifted right by result_shift.
 */
class NEGEMMLowpQuantizeDownInt32ToInt8ScaleByFixedPoint : public INESimpleFunctionNoBorder
{
public:
    /** Constructor */
    NEGEMMLowpQuantizeDownInt32ToInt8ScaleByFixedPoint() = default;
    /** Prevent instances of this class from being copied (As this class contains pointers) */
    NEGEMMLowpQuantizeDownInt32ToInt8ScaleByFixedPoint(const NEGEMMLowpQuantizeDownInt32ToInt8ScaleByFixedPoint &) = delete;
    /** Prevent instances of this class from being copied (As this class contains pointers) */
    NEGEMMLowpQuantizeDownInt32ToInt8ScaleByFixedPoint &operator=(const NEGEMMLowpQuantizeDownInt32ToInt8ScaleByFixedPoint &) = delete;
    /** Prevent instances of this class from being moved (As this class contains non movable objects) */
    NEGEMMLowpQuantizeDownInt32ToInt8ScaleByFixedPoint(NEGEMMLowpQuantizeDownInt32ToInt8ScaleByFixedPoint &&) = delete;
    /** Prevent instances of this class from being moved (As this class contains non movable objects) */
    NEGEMMLowpQuantizeDownInt32ToInt8ScaleByFixedPoint &operator=(NEGEMMLowpQuantizeDownInt32ToInt8ScaleByFixedPoint &&) = delete;
    /** Default destructor */
    ~NEGEMMLowpQuantizeDownInt32ToInt8ScaleByFixedPoint();
    /** Initialise the kernel's inputs and output.
     *
     * @param[in]  input                        Input tensor. Data type supported: S32
     * @param[in]  bias                         Biases tensor. Only shared biases are supported and it can be a nullptr if the addition of biases is not required.
     *                                          Biases are a 1D tensor with dimensions [OFM]. Data type supported: Same as @p input.
     * @param[out] output                       Output tensor. Data type supported: QASYMM8_SIGNED
     * @param[in]  result_fixedpoint_multiplier Fixed point value to be multiplied with each element of the input matrix once the result_offset has been added.
     * @param[in]  result_shift                 Number of bits to shift right the result after the fixed point multiplication.
     * @param[in]  result_offset_after_shift    Offset to be applied to the result before converting it back to QASYMM8_SIGNED.
     * @param[in]  min                          (Optional) Min value used to saturate down the output result before converting back to QASYMM8_SIGNED. Defaults to the minimum possible 32-bit signed integer.
     * @param[in]  max                          (Optional) Max value used to saturate up the output result before converting back to QASYMM8_SIGNED.
     *                                          Along with @p min, this value can be used to implement "rectified linear unit" activation functions. Defaults to the maximum possible 32-bit signed integer.
     */
    void configure(const ITensor *input, const ITensor *bias, ITensor *output, int result_fixedpoint_multiplier, int result_shift, int result_offset_after_shift,
                   int min = std::numeric_limits<int32_t>::lowest(), int max = std::numeric_limits<int32_t>::max());
    /** Static function to check if given info will lead to a valid configuration of @ref NEGEMMLowpQuantizeDownInt32ToInt8ScaleByFixedPoint
     *
     * @param[in] input  Input tensor info. It is the output of @ref NEGEMMLowpMatrixMultiplyCore. Data type supported: S32
     * @param[in] bias   Biases tensor info. Only shared biases are supported and it can be a nullptr if the addition of biases is not required.
     *                   Biases are a 1D tensor with dimensions [OFM]. Data type supported: Same as @p input.
     * @param[in] output Output tensor info. Data type supported: QASYMM8_SIGNED
     * @param[in] min    (Optional) Min value used to saturate down the output result before converting back to QASYMM8_SIGNED. Defaults to the minimum possible 32-bit signed integer.
     * @param[in] max    (Optional) Max value used to saturate up the output result before converting back to QASYMM8_SIGNED.
     *                   Along with @p min, this value can be used to implement "rectified linear unit" activation functions. Defaults to the maximum possible 32-bit signed integer.
     *
     * @return a status
     */
    static Status validate(const ITensorInfo *input, const ITensorInfo *bias, const ITensorInfo *output, int min = std::numeric_limits<int32_t>::lowest(), int max = std::numeric_limits<int32_t>::max());
};
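
// A minimal usage sketch for the function above (illustrative only; the shapes and
// quantization parameters are hypothetical). It follows the usual Compute Library
// flow of init -> configure -> allocate -> run, with the S32 accumulators coming
// from @ref NEGEMMLowpMatrixMultiplyCore:
//
//   #include "arm_compute/runtime/NEON/NEFunctions.h"
//   #include "arm_compute/runtime/Tensor.h"
//
//   using namespace arm_compute;
//
//   Tensor acc{}, bias{}, dst{};
//   acc.allocator()->init(TensorInfo(TensorShape(32U, 16U), 1, DataType::S32));
//   bias.allocator()->init(TensorInfo(TensorShape(32U), 1, DataType::S32));
//   dst.allocator()->init(TensorInfo(TensorShape(32U, 16U), 1, DataType::QASYMM8_SIGNED));
//
//   NEGEMMLowpQuantizeDownInt32ToInt8ScaleByFixedPoint output_stage{};
//   output_stage.configure(&acc, &bias, &dst,
//                          1073741824 /* multiplier, real scale ~0.5 */, 5 /* shift */, -10 /* offset */,
//                          -128, 127);
//
//   acc.allocator()->allocate();
//   bias.allocator()->allocate();
//   dst.allocator()->allocate();
//   // ... fill acc and bias with the int32 GEMM results ...
//   output_stage.run();
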
/** Basic function to execute NEGEMMLowpQuantizeDownInt32ToInt16ScaleByFixedPoint on Neon.
 *
 * NEGEMMLowpQuantizeDownInt32ToInt16ScaleByFixedPoint depends on 2 parameters:
 *
 * result_fixedpoint_multiplier, result_shift
 *
 * The final result is:
 *
 * (FixedPointMul(input[i][k], result_fixedpoint_multiplier) >> result_shift)
 *
 * where FixedPointMul(x, y) is the nearest integer to the following
 * mathematical expression, evaluated without overflow or intermediate rounding:
 *
 * (x * y) / 2^31
 *
 * For more information: https://github.com/google/gemmlowp/blob/master/public/output_stages.h#L68
 *
 * In case the bias tensor is provided, the final result is:
 *
 * (FixedPointMul(input[i][k] + bias[k], result_fixedpoint_multiplier) >> result_shift)
 *
 * This function calls the following Neon kernels:
 *
 * -# @ref NEGEMMLowpQuantizeDownInt32ToInt16ScaleByFixedPointKernel
 *
 * @note The function also accepts two optional input arguments (min and max) which can be used to implement "rectified linear unit" activation functions
 * after the result is shifted right by result_shift.
 */
class NEGEMMLowpQuantizeDownInt32ToInt16ScaleByFixedPoint : public INESimpleFunctionNoBorder
{
public:
    /** Constructor */
    NEGEMMLowpQuantizeDownInt32ToInt16ScaleByFixedPoint() = default;
    /** Prevent instances of this class from being copied (As this class contains pointers) */
    NEGEMMLowpQuantizeDownInt32ToInt16ScaleByFixedPoint(const NEGEMMLowpQuantizeDownInt32ToInt16ScaleByFixedPoint &) = delete;
    /** Prevent instances of this class from being copied (As this class contains pointers) */
    NEGEMMLowpQuantizeDownInt32ToInt16ScaleByFixedPoint &operator=(const NEGEMMLowpQuantizeDownInt32ToInt16ScaleByFixedPoint &) = delete;
    /** Prevent instances of this class from being moved (As this class contains non movable objects) */
    NEGEMMLowpQuantizeDownInt32ToInt16ScaleByFixedPoint(NEGEMMLowpQuantizeDownInt32ToInt16ScaleByFixedPoint &&) = delete;
    /** Prevent instances of this class from being moved (As this class contains non movable objects) */
    NEGEMMLowpQuantizeDownInt32ToInt16ScaleByFixedPoint &operator=(NEGEMMLowpQuantizeDownInt32ToInt16ScaleByFixedPoint &&) = delete;
    /** Default destructor */
    ~NEGEMMLowpQuantizeDownInt32ToInt16ScaleByFixedPoint();
    /** Initialise the kernel's inputs and output.
     *
     * @param[in]  input                        Input tensor. Data type supported: S32
     * @param[in]  bias                         Biases tensor. Only shared biases are supported and it can be a nullptr if the addition of biases is not required.
     *                                          Biases are a 1D tensor with dimensions [OFM]. Data type supported: Same as @p input.
     * @param[out] output                       Output tensor. Data type supported: QSYMM16
     * @param[in]  result_fixedpoint_multiplier Fixed point value to be multiplied with each element of the input matrix.
     * @param[in]  result_shift                 Number of bits to shift right the result after the fixed point multiplication.
     * @param[in]  min                          (Optional) Min value used to saturate down the output result before converting back to QSYMM16. Defaults to the minimum possible 32-bit signed integer.
     * @param[in]  max                          (Optional) Max value used to saturate up the output result before converting back to QSYMM16.
     *                                          Along with @p min, this value can be used to implement "rectified linear unit" activation functions. Defaults to the maximum possible 32-bit signed integer.
     */
    void configure(const ITensor *input, const ITensor *bias, ITensor *output, int result_fixedpoint_multiplier, int result_shift, int min = std::numeric_limits<int32_t>::lowest(),
                   int max = std::numeric_limits<int32_t>::max());
    /** Static function to check if given info will lead to a valid configuration of @ref NEGEMMLowpQuantizeDownInt32ToInt16ScaleByFixedPoint
     *
     * @param[in] input  Input tensor info. It is the output of @ref NEGEMMLowpMatrixMultiplyCore. Data type supported: S32
     * @param[in] bias   Biases tensor info. Only shared biases are supported and it can be a nullptr if the addition of biases is not required.
     *                   Biases are a 1D tensor with dimensions [OFM]. Data type supported: Same as @p input.
     * @param[in] output Output tensor info. Data type supported: QSYMM16
     * @param[in] min    (Optional) Min value used to saturate down the output result before converting back to QSYMM16. Defaults to the minimum possible 32-bit signed integer.
     * @param[in] max    (Optional) Max value used to saturate up the output result before converting back to QSYMM16.
     *                   Along with @p min, this value can be used to implement "rectified linear unit" activation functions. Defaults to the maximum possible 32-bit signed integer.
     *
     * @return a status
     */
    static Status validate(const ITensorInfo *input, const ITensorInfo *bias, const ITensorInfo *output, int min = std::numeric_limits<int32_t>::lowest(), int max = std::numeric_limits<int32_t>::max());
};
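
// The static validate() entry points above make it possible to check a configuration
// before allocating any memory. A hedged sketch (the shapes are hypothetical):
//
//   TensorInfo acc_info(TensorShape(32U, 16U), 1, DataType::S32);
//   TensorInfo dst_info(TensorShape(32U, 16U), 1, DataType::QSYMM16);
//   const Status st = NEGEMMLowpQuantizeDownInt32ToInt16ScaleByFixedPoint::validate(&acc_info, nullptr, &dst_info);
//   ARM_COMPUTE_ERROR_THROW_ON(st); // raises an error if the configuration is unsupported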

/** Basic function to execute GEMMLowpQuantizeDown kernels on Neon.
 *
 * This function calls the following Neon kernels:
 *
 * -# @ref NEGEMMLowpQuantizeDownInt32ScaleKernel
 * -# @ref NEGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPointKernel
 * -# @ref NEGEMMLowpQuantizeDownInt32ToInt8ScaleByFixedPointKernel
 * -# @ref NEGEMMLowpQuantizeDownInt32ToInt16ScaleByFixedPointKernel
 */
class NEGEMMLowpOutputStage : public INESimpleFunctionNoBorder
{
public:
    /** Constructor */
    NEGEMMLowpOutputStage() = default;
    /** Prevent instances of this class from being copied (As this class contains pointers) */
    NEGEMMLowpOutputStage(const NEGEMMLowpOutputStage &) = delete;
    /** Prevent instances of this class from being copied (As this class contains pointers) */
    NEGEMMLowpOutputStage &operator=(const NEGEMMLowpOutputStage &) = delete;
    /** Prevent instances of this class from being moved (As this class contains non movable objects) */
    NEGEMMLowpOutputStage(NEGEMMLowpOutputStage &&) = delete;
    /** Prevent instances of this class from being moved (As this class contains non movable objects) */
    NEGEMMLowpOutputStage &operator=(NEGEMMLowpOutputStage &&) = delete;
    /** Default destructor */
    ~NEGEMMLowpOutputStage();
    /** Initialise the kernel's inputs and output.
     *
     * @param[in]  input  Input tensor. Data type supported: S32
     * @param[in]  bias   Biases tensor. Only shared biases are supported and it can be a nullptr if the addition of biases is not required.
     *                    Biases are a 1D tensor with dimensions [OFM]. Data type supported: Same as @p input.
     * @param[out] output Output tensor. Data type supported: QASYMM8/QASYMM8_SIGNED/QSYMM16
     * @param[in]  info   GEMMLowp output stage metadata.
     */
    void configure(const ITensor *input, const ITensor *bias, ITensor *output, const GEMMLowpOutputStageInfo &info);
    /** Static function to check if given info will lead to a valid configuration of @ref NEGEMMLowpOutputStage
     *
     * @param[in] input  Input tensor info. It is the output of @ref NEGEMMLowpMatrixMultiplyCore. Data type supported: S32
     * @param[in] bias   Biases tensor info. Only shared biases are supported and it can be a nullptr if the addition of biases is not required.
     *                   Biases are a 1D tensor with dimensions [OFM]. Data type supported: Same as @p input.
     * @param[in] output Output tensor info. Data type supported: QASYMM8/QASYMM8_SIGNED/QSYMM16
     * @param[in] info   GEMMLowp output stage metadata.
     *
     * @return a status
     */
    static Status validate(const ITensorInfo *input, const ITensorInfo *bias, const ITensorInfo *output, const GEMMLowpOutputStageInfo &info);
};
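
// A usage sketch for this generic function (illustrative; the field values are
// hypothetical). GEMMLowpOutputStageInfo selects which kernel runs; the settings
// below request QUANTIZE_DOWN_FIXEDPOINT with a QASYMM8 output, equivalent to
// @ref NEGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPoint:
//
//   GEMMLowpOutputStageInfo info{};
//   info.type                = GEMMLowpOutputStageType::QUANTIZE_DOWN_FIXEDPOINT;
//   info.gemmlowp_multiplier = 1073741824; // result_fixedpoint_multiplier
//   info.gemmlowp_shift      = 5;          // result_shift
//   info.gemmlowp_offset     = 10;         // result_offset_after_shift
//   info.gemmlowp_min_bound  = 0;          // fused "rectified linear unit" lower bound
//   info.gemmlowp_max_bound  = 255;
//   info.output_data_type    = DataType::QASYMM8;
//
//   NEGEMMLowpOutputStage output_stage{};
//   output_stage.configure(&acc, &bias, &dst, info);
//   output_stage.run();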
} // namespace arm_compute
#endif /* ARM_COMPUTE_NEGEMMLOWPOUTPUTSTAGE_H */