Compute Library 21.02
NELSTMLayerQuantized Class Reference

Basic function to run NELSTMLayerQuantized. More...

#include <NELSTMLayerQuantized.h>

[Collaboration diagram for NELSTMLayerQuantized]

Public Member Functions

 NELSTMLayerQuantized (std::shared_ptr< IMemoryManager > memory_manager=nullptr)
 Default constructor. More...
 
 NELSTMLayerQuantized (const NELSTMLayerQuantized &)=delete
 Prevent instances of this class from being copied (As this class contains pointers) More...
 
 NELSTMLayerQuantized (NELSTMLayerQuantized &&)=delete
 Prevent instances of this class from being moved (As this class contains pointers) More...
 
NELSTMLayerQuantized & operator= (const NELSTMLayerQuantized &)=delete
 Prevent instances of this class from being copied (As this class contains pointers) More...
 
NELSTMLayerQuantized & operator= (NELSTMLayerQuantized &&)=delete
 Prevent instances of this class from being moved (As this class contains pointers) More...
 
 ~NELSTMLayerQuantized ()
 Default destructor. More...
 
void configure (const ITensor *input, const ITensor *input_to_input_weights, const ITensor *input_to_forget_weights, const ITensor *input_to_cell_weights, const ITensor *input_to_output_weights, const ITensor *recurrent_to_input_weights, const ITensor *recurrent_to_forget_weights, const ITensor *recurrent_to_cell_weights, const ITensor *recurrent_to_output_weights, const ITensor *input_gate_bias, const ITensor *forget_gate_bias, const ITensor *cell_bias, const ITensor *output_gate_bias, ITensor *cell_state_in, const ITensor *output_state_in, ITensor *cell_state_out, ITensor *output_state_out)
 Initialize function's tensors. More...
 
void run () override
 Run the kernels contained in the function. More...
 
void prepare () override
 Prepare the function for executing. More...
 
- Public Member Functions inherited from IFunction
virtual ~IFunction ()=default
 Destructor. More...
 

Static Public Member Functions

static Status validate (const ITensorInfo *input, const ITensorInfo *input_to_input_weights, const ITensorInfo *input_to_forget_weights, const ITensorInfo *input_to_cell_weights, const ITensorInfo *input_to_output_weights, const ITensorInfo *recurrent_to_input_weights, const ITensorInfo *recurrent_to_forget_weights, const ITensorInfo *recurrent_to_cell_weights, const ITensorInfo *recurrent_to_output_weights, const ITensorInfo *input_gate_bias, const ITensorInfo *forget_gate_bias, const ITensorInfo *cell_bias, const ITensorInfo *output_gate_bias, const ITensorInfo *cell_state_in, const ITensorInfo *output_state_in, const ITensorInfo *cell_state_out, const ITensorInfo *output_state_out)
 Static function to check if given info will lead to a valid configuration of NELSTMLayerQuantized. More...
 

Detailed Description

Basic function to run NELSTMLayerQuantized.

This function calls the following Neon functions/kernels:

  1. NEGEMMLowpMatrixMultiplyCore Quantized matrix multiplication core. Accumulators are 32-bit integers
  2. NEGEMMLowpQuantizeDownInt32ToInt16ScaleByFixedPoint Convert 32-bit integers into QSYMM16
  3. NETranspose Matrix transpose
  4. NEConcatenateLayer Tensor concatenation
  5. NEActivationLayer Activation functions (tanh and logistic)
  6. NEArithmeticAddition Elementwise addition
  7. NEPixelWiseMultiplication Elementwise multiplication
  8. NESlice Tensor slicing
  9. NEDequantizationLayer Dequantize into float
  10. NEQuantizationLayer Quantize from float

Definition at line 63 of file NELSTMLayerQuantized.h.
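
Taken together, these kernels compute one step of a standard (non-peephole) LSTM cell. The summary below is inferred from the kernel sequence and the configure() source further down this page, not quoted from the library documentation:

    i_t = \sigma(W_i x_t + R_i h_{t-1} + b_i)
    f_t = \sigma(W_f x_t + R_f h_{t-1} + b_f)
    g_t = \tanh(W_g x_t + R_g h_{t-1} + b_g)
    o_t = \sigma(W_o x_t + R_o h_{t-1} + b_o)
    c_t = f_t \odot c_{t-1} + i_t \odot g_t
    h_t = o_t \odot \tanh(c_t)

where x_t is input, h_{t-1} is output_state_in, c_{t-1} is cell_state_in, W_* are the input_to_*_weights, R_* the recurrent_to_*_weights and b_* the gate biases. c_t is written to cell_state_out in QSYMM16, and h_t reaches output_state_out in QASYMM8 via the final dequantize/quantize pair.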

Constructor & Destructor Documentation

◆ NELSTMLayerQuantized() [1/3]

NELSTMLayerQuantized ( std::shared_ptr< IMemoryManager > memory_manager = nullptr)

Default constructor.

Definition at line 57 of file NELSTMLayerQuantized.cpp.

58  : _memory_group(std::move(memory_manager)), _gemmlowp(), _output_stage(), _transpose_weights(), _concat_input_weights(), _concat_recurrent_weights(), _concat_weights(), _concat_inputs(),
59  _concat_bias(), _sigmoid_forget_gate(), _sigmoid_input_gate(), _sigmoid_output_gate(), _tanh_modulation_gate(), _tanh_output_state(), _add1(), _add2(), _mul1(), _mul2(), _mul3(),
60  _slice_input_tensor(), _slice_forget_tensor(), _slice_cell_tensor(), _slice_output_tensor(), _dequantize(), _quantize(), _input_to_input_weights(nullptr), _input_to_forget_weights(nullptr),
61  _input_to_cell_weights(nullptr), _input_to_output_weights(nullptr), _recurrent_to_input_weights(nullptr), _recurrent_to_forget_weights(nullptr), _recurrent_to_cell_weights(nullptr),
62  _recurrent_to_output_weights(nullptr), _input_gate_bias(nullptr), _forget_gate_bias(nullptr), _cell_bias(nullptr), _output_gate_bias(nullptr), _recurrent_weights(), _input_weights(), _weights(),
63  _input(), _weights_transposed(), _output_highp(), _output_lowp(), _bias(), _forget_gate_input(), _input_gate_input(), _output_gate_input(), _input_modulation_gate_input(), _forget_gate_output(),
64  _input_gate_output(), _output_gate_output(), _input_modulation_gate_output(), _cell_state1(), _cell_state2(), _output_state_tmp(), _output_state_out_symm(), _output_state_out_f32(),
65  _is_prepared(false)
66 {
67 }

◆ NELSTMLayerQuantized() [2/3]

NELSTMLayerQuantized ( const NELSTMLayerQuantized &  )
delete

Prevent instances of this class from being copied (As this class contains pointers)

◆ NELSTMLayerQuantized() [3/3]

NELSTMLayerQuantized ( NELSTMLayerQuantized &&  )
delete

Prevent instances of this class from being moved (As this class contains pointers)

◆ ~NELSTMLayerQuantized()

~NELSTMLayerQuantized ( )
default

Default destructor.

Member Function Documentation

◆ configure()

void configure ( const ITensor * input,
const ITensor * input_to_input_weights,
const ITensor * input_to_forget_weights,
const ITensor * input_to_cell_weights,
const ITensor * input_to_output_weights,
const ITensor * recurrent_to_input_weights,
const ITensor * recurrent_to_forget_weights,
const ITensor * recurrent_to_cell_weights,
const ITensor * recurrent_to_output_weights,
const ITensor * input_gate_bias,
const ITensor * forget_gate_bias,
const ITensor * cell_bias,
const ITensor * output_gate_bias,
ITensor * cell_state_in,
const ITensor * output_state_in,
ITensor * cell_state_out,
ITensor * output_state_out 
)

Initialize function's tensors.

Parameters
[in]  input  Source tensor. Input is a 2D tensor with dimensions [input_size, batch_size]. Data types supported: QASYMM8.
[in]  input_to_input_weights  2D weights tensor with dimensions [input_size, output_size]. Data type supported: Same as input.
[in]  input_to_forget_weights  2D weights tensor with dimensions [input_size, output_size]. Data type supported: Same as input.
[in]  input_to_cell_weights  2D weights tensor with dimensions [input_size, output_size]. Data type supported: Same as input.
[in]  input_to_output_weights  2D weights tensor with dimensions [input_size, output_size]. Data type supported: Same as input.
[in]  recurrent_to_input_weights  2D weights tensor with dimensions [output_size, output_size]. Data type supported: Same as input.
[in]  recurrent_to_forget_weights  2D weights tensor with dimensions [output_size, output_size]. Data type supported: Same as input.
[in]  recurrent_to_cell_weights  2D weights tensor with dimensions [output_size, output_size]. Data type supported: Same as input.
[in]  recurrent_to_output_weights  2D weights tensor with dimensions [output_size, output_size]. Data type supported: Same as input.
[in]  input_gate_bias  1D weights tensor with dimensions [output_size]. Data type supported: S32.
[in]  forget_gate_bias  1D weights tensor with dimensions [output_size]. Data type supported: S32.
[in]  cell_bias  1D weights tensor with dimensions [output_size]. Data type supported: S32.
[in]  output_gate_bias  1D weights tensor with dimensions [output_size]. Data type supported: S32.
[in]  cell_state_in  2D tensor with dimensions [output_size, batch_size]. Data type supported: QSYMM16.
[in]  output_state_in  2D tensor with dimensions [output_size, batch_size]. Data type supported: Same as input.
[out]  cell_state_out  Destination tensor. Output is a 2D tensor with dimensions [output_size, batch_size]. Data type supported: QSYMM16.
[out]  output_state_out  Destination tensor. Output is a 2D tensor with dimensions [output_size, batch_size]. Data types supported: Same as input.
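
A minimal usage sketch is given below. It is not taken from the library's examples; the sizes and QuantizationInfo values are illustrative assumptions, while the tensor shapes, data types and call order follow the parameter table above and the source listing that follows.

    #include "arm_compute/core/TensorInfo.h"
    #include "arm_compute/runtime/NEON/NEFunctions.h"
    #include "arm_compute/runtime/Tensor.h"
    #include <initializer_list>

    using namespace arm_compute;

    int main()
    {
        // Illustrative sizes and quantization parameters (assumptions for this example only)
        const unsigned int input_size = 32, output_size = 16, batch_size = 2;
        const QuantizationInfo qasymm(1.f / 128.f, 128);   // input / output state quantization
        const QuantizationInfo qweights(1.f / 16.f, 16);   // weights quantization
        const QuantizationInfo qsymm_4(16.f / 32768.f, 0); // cell state quantization

        Tensor input, cell_state_in, output_state_in, cell_state_out, output_state_out;
        Tensor input_to_input_w, input_to_forget_w, input_to_cell_w, input_to_output_w;
        Tensor recurrent_to_input_w, recurrent_to_forget_w, recurrent_to_cell_w, recurrent_to_output_w;
        Tensor input_gate_bias, forget_gate_bias, cell_bias, output_gate_bias;

        // Shapes and data types follow the parameter table above
        input.allocator()->init(TensorInfo(TensorShape(input_size, batch_size), 1, DataType::QASYMM8, qasymm));
        for(Tensor *w : { &input_to_input_w, &input_to_forget_w, &input_to_cell_w, &input_to_output_w })
            w->allocator()->init(TensorInfo(TensorShape(input_size, output_size), 1, DataType::QASYMM8, qweights));
        for(Tensor *w : { &recurrent_to_input_w, &recurrent_to_forget_w, &recurrent_to_cell_w, &recurrent_to_output_w })
            w->allocator()->init(TensorInfo(TensorShape(output_size, output_size), 1, DataType::QASYMM8, qweights));
        for(Tensor *b : { &input_gate_bias, &forget_gate_bias, &cell_bias, &output_gate_bias })
            b->allocator()->init(TensorInfo(TensorShape(output_size), 1, DataType::S32));
        cell_state_in.allocator()->init(TensorInfo(TensorShape(output_size, batch_size), 1, DataType::QSYMM16, qsymm_4));
        output_state_in.allocator()->init(TensorInfo(TensorShape(output_size, batch_size), 1, DataType::QASYMM8, qasymm));
        cell_state_out.allocator()->init(TensorInfo(TensorShape(output_size, batch_size), 1, DataType::QSYMM16, qsymm_4));
        output_state_out.allocator()->init(TensorInfo(TensorShape(output_size, batch_size), 1, DataType::QASYMM8, qasymm));

        // Configure first (operates on tensor metadata), then allocate the backing memory
        NELSTMLayerQuantized lstm;
        lstm.configure(&input, &input_to_input_w, &input_to_forget_w, &input_to_cell_w, &input_to_output_w,
                       &recurrent_to_input_w, &recurrent_to_forget_w, &recurrent_to_cell_w, &recurrent_to_output_w,
                       &input_gate_bias, &forget_gate_bias, &cell_bias, &output_gate_bias,
                       &cell_state_in, &output_state_in, &cell_state_out, &output_state_out);

        for(Tensor *t : { &input, &input_to_input_w, &input_to_forget_w, &input_to_cell_w, &input_to_output_w,
                          &recurrent_to_input_w, &recurrent_to_forget_w, &recurrent_to_cell_w, &recurrent_to_output_w,
                          &input_gate_bias, &forget_gate_bias, &cell_bias, &output_gate_bias,
                          &cell_state_in, &output_state_in, &cell_state_out, &output_state_out })
            t->allocator()->allocate();

        // Fill the input, weights, biases and initial states (omitted), then run one LSTM step
        lstm.run();
        return 0;
    }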

Definition at line 69 of file NELSTMLayerQuantized.cpp.

References TensorAllocator::allocate(), Tensor::allocator(), ARM_COMPUTE_ERROR_ON_NULLPTR, ARM_COMPUTE_ERROR_THROW_ON, arm_compute::auto_init_if_empty(), arm_compute::quantization::calculate_quantized_multiplier(), NEDequantizationLayer::configure(), NETranspose::configure(), NEQuantizationLayer::configure(), NEConcatenateLayer::configure(), NEActivationLayer::configure(), NEArithmeticAddition::configure(), NEGEMMLowpMatrixMultiplyCore::configure(), NESlice::configure(), NEPixelWiseMultiplication::configure(), NEGEMMLowpQuantizeDownInt32ToInt16ScaleByFixedPoint::configure(), ITensorInfo::dimension(), Window::DimX, Window::DimY, arm_compute::F32, arm_compute::test::validation::forget_gate_bias, ITensor::info(), Tensor::info(), TensorAllocator::init(), arm_compute::test::validation::input, arm_compute::test::validation::input_gate_bias, arm_compute::test::validation::input_size, arm_compute::test::validation::input_to_cell_weights, arm_compute::test::validation::input_to_forget_weights, arm_compute::test::validation::input_to_input_weights, arm_compute::test::validation::input_to_output_weights, ActivationLayerInfo::LOGISTIC, MemoryGroup::manage(), UniformQuantizationInfo::offset, arm_compute::test::validation::output_gate_bias, arm_compute::test::validation::output_size, arm_compute::test::validation::qasymm(), arm_compute::QASYMM8, arm_compute::QSYMM16, arm_compute::test::validation::qsymm_3(), arm_compute::test::validation::qsymm_4(), ITensorInfo::quantization_info(), arm_compute::test::validation::qweights(), arm_compute::test::validation::recurrent_to_cell_weights, arm_compute::test::validation::recurrent_to_forget_weights, arm_compute::test::validation::recurrent_to_input_weights, arm_compute::test::validation::recurrent_to_output_weights, arm_compute::S32, arm_compute::SATURATE, UniformQuantizationInfo::scale, ITensorInfo::set_quantization_info(), ActivationLayerInfo::TANH, ITensorInfo::tensor_shape(), arm_compute::TO_ZERO, QuantizationInfo::uniform(), and NELSTMLayerQuantized::validate().

75 {
78  input_gate_bias, forget_gate_bias, cell_bias, output_gate_bias, cell_state_in, output_state_in, cell_state_out, output_state_out);
79 
83  input_gate_bias->info(), forget_gate_bias->info(), cell_bias->info(), output_gate_bias->info(), cell_state_in->info(), output_state_in->info(), cell_state_out->info(), output_state_out->info()));
84 
85  const int input_size = input->info()->dimension(0);
86  const int batch_size = input->info()->dimension(1);
87  const int output_size = input_to_input_weights->info()->dimension(1);
88 
89  const QuantizationInfo qweights = input_to_input_weights->info()->quantization_info(); // Weights quantization
90 
91  auto_init_if_empty(*cell_state_out->info(), TensorInfo(TensorShape(batch_size, output_size), 1, DataType::QSYMM16, qsymm_4));
92  auto_init_if_empty(*output_state_out->info(), TensorInfo(TensorShape(batch_size, output_size), 1, DataType::QASYMM8, qasymm));
93 
94  _input_to_input_weights = input_to_input_weights;
95  _input_to_forget_weights = input_to_forget_weights;
96  _input_to_cell_weights = input_to_cell_weights;
97  _input_to_output_weights = input_to_output_weights;
98  _recurrent_to_input_weights = recurrent_to_input_weights;
99  _recurrent_to_forget_weights = recurrent_to_forget_weights;
100  _recurrent_to_cell_weights = recurrent_to_cell_weights;
101  _recurrent_to_output_weights = recurrent_to_output_weights;
102  _input_gate_bias = input_gate_bias;
103  _forget_gate_bias = forget_gate_bias;
104  _cell_bias = cell_bias;
105  _output_gate_bias = output_gate_bias;
106 
107  // Weights concatenation
108  std::vector<const ITensor *> inputs_weights_vector{ input_to_input_weights, input_to_forget_weights, input_to_cell_weights, input_to_output_weights };
109  std::vector<const ITensor *> recurrent_weights_vector{ recurrent_to_input_weights, recurrent_to_forget_weights, recurrent_to_cell_weights, recurrent_to_output_weights };
110 
111  _input_weights.allocator()->init(TensorInfo(TensorShape(input_size, 4 * output_size), 1, DataType::QASYMM8, qweights));
112  _concat_input_weights.configure(inputs_weights_vector, &_input_weights, Window::DimY);
113 
114  _recurrent_weights.allocator()->init(TensorInfo(TensorShape(output_size, 4 * output_size), 1, DataType::QASYMM8, qweights));
115  _concat_recurrent_weights.configure(recurrent_weights_vector, &_recurrent_weights, Window::DimY);
116 
117  std::vector<const ITensor *> weights_vector{ &_recurrent_weights, &_input_weights };
118  _weights.allocator()->init(TensorInfo(TensorShape(output_size + input_size, 4 * output_size), 1, DataType::QASYMM8, qweights));
119  _concat_weights.configure(weights_vector, &_weights, Window::DimX);
120  _transpose_weights.configure(&_weights, &_weights_transposed);
121 
122  // Input concatenation
123  std::vector<const ITensor *> input_vector{ input, output_state_in };
124  _memory_group.manage(&_input);
125  _input.allocator()->init(TensorInfo(TensorShape(output_size + input_size, batch_size), 1, DataType::QASYMM8, qasymm));
126  _concat_inputs.configure(input_vector, &_input, Window::DimX);
127 
128  // Bias concatenation
129  std::vector<const ITensor *> bias_vector{ input_gate_bias, forget_gate_bias, cell_bias, output_gate_bias };
130  _bias.allocator()->init(TensorInfo(TensorShape(4 * output_size), 1, DataType::S32));
131  _concat_bias.configure(bias_vector, &_bias, Window::DimX);
132 
133  // Invert the offset for gemmlowp
134  _input.info()->set_quantization_info(QuantizationInfo(qasymm.uniform().scale, -qasymm.uniform().offset));
135  _weights_transposed.info()->set_quantization_info(QuantizationInfo(qweights.uniform().scale, -qweights.uniform().offset));
136 
137  // Run gemmlowp
138  _memory_group.manage(&_output_highp);
139  _output_highp.allocator()->init(TensorInfo(TensorShape(4 * output_size, batch_size), 1, DataType::S32));
140  _gemmlowp.configure(&_input, &_weights_transposed, nullptr, &_output_highp);
141  _input.allocator()->allocate();
142 
143  // Set the offset back
144  _input.info()->set_quantization_info(QuantizationInfo(qasymm.uniform().scale, qasymm.uniform().offset));
145  _weights_transposed.info()->set_quantization_info(QuantizationInfo(qweights.uniform().scale, qweights.uniform().offset));
146 
147  // multiplier = (input_scale * weights_scale) / output_scale (2 ^ (-12))
148  _output_lowp.allocator()->init(TensorInfo(_output_highp.info()->tensor_shape(), 1, DataType::QSYMM16, qsymm_3));
149 
150  const float multiplier = 4096.f * qasymm.uniform().scale * qweights.uniform().scale;
151  int32_t output_multiplier = 0;
152  int32_t output_shift = 0;
153  quantization::calculate_quantized_multiplier(multiplier, &output_multiplier, &output_shift);
154 
155  _memory_group.manage(&_output_lowp);
156  _output_stage.configure(&_output_highp, &_bias, &_output_lowp, output_multiplier, output_shift);
157  _output_highp.allocator()->allocate();
158  _bias.allocator()->allocate();
159 
160  // Get the gate tensors
161  if(batch_size > 1)
162  {
163  _memory_group.manage(&_input_gate_input);
164  _slice_input_tensor.configure(&_output_lowp, &_input_gate_input, { 0, 0 }, { output_size, batch_size });
165  _memory_group.manage(&_forget_gate_input);
166  _slice_forget_tensor.configure(&_output_lowp, &_forget_gate_input, { output_size, 0 }, { 2 * output_size, batch_size });
167  _memory_group.manage(&_input_modulation_gate_input);
168  _slice_cell_tensor.configure(&_output_lowp, &_input_modulation_gate_input, { 2 * output_size, 0 }, { 3 * output_size, batch_size });
169  _memory_group.manage(&_output_gate_input);
170  _slice_output_tensor.configure(&_output_lowp, &_output_gate_input, { 3 * output_size, 0 }, { 4 * output_size, batch_size });
171  _output_lowp.allocator()->allocate();
172  }
173  else
174  {
175  _memory_group.manage(&_input_gate_input);
176  _slice_input_tensor.configure(&_output_lowp, &_input_gate_input, { 0 }, { output_size });
177  _memory_group.manage(&_forget_gate_input);
178  _slice_forget_tensor.configure(&_output_lowp, &_forget_gate_input, { output_size }, { 2 * output_size });
179  _memory_group.manage(&_input_modulation_gate_input);
180  _slice_cell_tensor.configure(&_output_lowp, &_input_modulation_gate_input, { 2 * output_size }, { 3 * output_size });
181  _memory_group.manage(&_output_gate_input);
182  _slice_output_tensor.configure(&_output_lowp, &_output_gate_input, { 3 * output_size }, { 4 * output_size });
183  _output_lowp.allocator()->allocate();
184  }
185 
186  // Forget gate
187  _memory_group.manage(&_forget_gate_output);
188  _forget_gate_output.allocator()->init(TensorInfo(_forget_gate_input.info()->tensor_shape(), 1, DataType::QSYMM16, qsymm_0));
189  _sigmoid_forget_gate.configure(&_forget_gate_input, &_forget_gate_output, ActivationLayerInfo(ActivationLayerInfo::ActivationFunction::LOGISTIC));
190  _forget_gate_input.allocator()->allocate();
191 
192  // Input gate
193  _memory_group.manage(&_input_gate_output);
194  _input_gate_output.allocator()->init(TensorInfo(_input_gate_input.info()->tensor_shape(), 1, DataType::QSYMM16, qsymm_0));
195  _sigmoid_input_gate.configure(&_input_gate_input, &_input_gate_output, ActivationLayerInfo(ActivationLayerInfo::ActivationFunction::LOGISTIC));
196  _input_gate_input.allocator()->allocate();
197 
198  // Input modulation gate equation
199  _memory_group.manage(&_input_modulation_gate_output);
200  _input_modulation_gate_output.allocator()->init(TensorInfo(_input_modulation_gate_input.info()->tensor_shape(), 1, DataType::QSYMM16, qsymm_0));
201  _tanh_modulation_gate.configure(&_input_modulation_gate_input, &_input_modulation_gate_output, ActivationLayerInfo(ActivationLayerInfo::ActivationFunction::TANH, 1.0f, 1.0f));
202  _input_modulation_gate_input.allocator()->allocate();
203 
204  // Output gate
205  _memory_group.manage(&_output_gate_output);
206  _output_gate_output.allocator()->init(TensorInfo(_output_gate_input.info()->tensor_shape(), 1, DataType::QSYMM16, qsymm_0));
207  _sigmoid_output_gate.configure(&_output_gate_input, &_output_gate_output, ActivationLayerInfo(ActivationLayerInfo::ActivationFunction::LOGISTIC));
208  _output_gate_input.allocator()->allocate();
209 
210  // Long term memory
211  _memory_group.manage(&_cell_state1);
212  _cell_state1.allocator()->init(TensorInfo(_forget_gate_output.info()->tensor_shape(), 1, DataType::QSYMM16, qsymm_4));
213  _mul1.configure(&_forget_gate_output, cell_state_in, &_cell_state1, 1, ConvertPolicy::SATURATE, RoundingPolicy::TO_ZERO);
214  _forget_gate_output.allocator()->allocate();
215 
216  _memory_group.manage(&_cell_state2);
217  _cell_state2.allocator()->init(TensorInfo(_input_gate_output.info()->tensor_shape(), 1, DataType::QSYMM16, qsymm_4));
218  _mul2.configure(&_input_gate_output, &_input_modulation_gate_output, &_cell_state2, 1, ConvertPolicy::SATURATE, RoundingPolicy::TO_ZERO);
219  _input_modulation_gate_output.allocator()->allocate();
220  _input_gate_output.allocator()->allocate();
221 
222  _add1.configure(&_cell_state1, &_cell_state2, cell_state_out, ConvertPolicy::SATURATE);
223  _cell_state1.allocator()->allocate();
224  _cell_state2.allocator()->allocate();
225 
226  // Short term memory
227  _memory_group.manage(&_output_state_tmp);
228  _output_state_tmp.allocator()->init(TensorInfo(cell_state_out->info()->tensor_shape(), 1, DataType::QSYMM16, qsymm_0));
229  _tanh_output_state.configure(cell_state_out, &_output_state_tmp, ActivationLayerInfo(ActivationLayerInfo::ActivationFunction::TANH, 1.0f, 1.0f));
230 
231  _memory_group.manage(&_output_state_out_symm);
232  _output_state_out_symm.allocator()->init(TensorInfo(_output_gate_output.info()->tensor_shape(), 1, DataType::QSYMM16, qsymm_0));
233  _mul3.configure(&_output_state_tmp, &_output_gate_output, &_output_state_out_symm, 1, ConvertPolicy::SATURATE, RoundingPolicy::TO_ZERO);
234  _output_gate_output.allocator()->allocate();
235  _output_state_tmp.allocator()->allocate();
236 
237  // Requantize the output state from QSYMM16 to QASYMM8
238  _memory_group.manage(&_output_state_out_f32);
239  _output_state_out_f32.allocator()->init(TensorInfo(_output_state_out_symm.info()->tensor_shape(), 1, DataType::F32));
240  _dequantize.configure(&_output_state_out_symm, &_output_state_out_f32);
241  _output_state_out_symm.allocator()->allocate();
242 
243  _quantize.configure(&_output_state_out_f32, output_state_out);
244  _output_state_out_f32.allocator()->allocate();
245 }
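
The fixed 4096 factor in the listing above is simply the reciprocal of the intermediate QSYMM16 scale: the S32 accumulators coming out of the GEMM carry an effective scale of s_input * s_weights, and the output stage requantizes them to the qsymm_3 scale 8/32768 = 2^{-12}. In LaTeX form, with an illustrative pair of scales (an assumption for the example, not fixed by the API):

    \text{multiplier} = \frac{s_\text{input} \cdot s_\text{weights}}{s_\text{out}} = 4096 \cdot s_\text{input} \cdot s_\text{weights}, \qquad s_\text{out} = \frac{8}{32768} = 2^{-12}

For s_input = 1/128 and s_weights = 1/16 this gives multiplier = 4096 / (128 * 16) = 2, which calculate_quantized_multiplier() decomposes into the integer multiplier/shift pair passed to NEGEMMLowpQuantizeDownInt32ToInt16ScaleByFixedPoint.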

◆ operator=() [1/2]

NELSTMLayerQuantized& operator= ( const NELSTMLayerQuantized &  )
delete

Prevent instances of this class from being copied (As this class contains pointers)

◆ operator=() [2/2]

NELSTMLayerQuantized& operator= ( NELSTMLayerQuantized &&  )
delete

Prevent instances of this class from being moved (As this class contains pointers)

◆ prepare()

void prepare ( )
override virtual

Prepare the function for executing.

Any one-off pre-processing step required by the function is handled here.

Note
Prepare stage might not need all the function's buffers' backing memory to be available in order to execute
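
In this function, prepare() performs the weight/bias concatenation and transpose once and releases the original weight tensors. A hedged sketch of how a caller might separate this one-off stage from the steady-state loop (the helper name and num_timesteps are hypothetical; tensor setup is assumed to match the configure() example above):

    #include "arm_compute/runtime/NEON/NEFunctions.h"

    // 'lstm' is assumed to be configured and its tensors allocated already.
    void run_sequence(arm_compute::NELSTMLayerQuantized &lstm, int num_timesteps)
    {
        // Optional: do the one-off weight/bias concatenation and transpose up front,
        // so the first iteration is not slower than the rest. If omitted, run()
        // performs this step itself on its first call.
        lstm.prepare();

        for(int t = 0; t < num_timesteps; ++t)
        {
            // Refresh the input tensor and feed the previous cell/output state back
            // into cell_state_in / output_state_in for step t (omitted).
            lstm.run();
        }
    }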

Reimplemented from IFunction.

Definition at line 503 of file NELSTMLayerQuantized.cpp.

References TensorAllocator::allocate(), Tensor::allocator(), TensorAllocator::free(), ITensor::mark_as_unused(), INESimpleFunctionNoBorder::run(), and NEConcatenateLayer::run().

Referenced by NELSTMLayerQuantized::run().

504 {
505  if(!_is_prepared)
506  {
507  _input_weights.allocator()->allocate();
508  _concat_input_weights.run();
509 
510  _input_to_input_weights->mark_as_unused();
511  _input_to_forget_weights->mark_as_unused();
512  _input_to_cell_weights->mark_as_unused();
513  _input_to_output_weights->mark_as_unused();
514 
515  _recurrent_weights.allocator()->allocate();
516  _concat_recurrent_weights.run();
517  _recurrent_to_input_weights->mark_as_unused();
518  _recurrent_to_forget_weights->mark_as_unused();
519  _recurrent_to_cell_weights->mark_as_unused();
520  _recurrent_to_output_weights->mark_as_unused();
521 
522  _weights.allocator()->allocate();
523  _concat_weights.run();
524 
525  _input_weights.mark_as_unused();
526  _input_weights.allocator()->free();
527  _recurrent_weights.mark_as_unused();
528  _recurrent_weights.allocator()->free();
529 
530  _weights_transposed.allocator()->allocate();
531  _transpose_weights.run();
532 
533  _weights.mark_as_unused();
534  _weights.allocator()->free();
535 
536  _bias.allocator()->allocate();
537  _concat_bias.run();
538  _input_gate_bias->mark_as_unused();
539  _forget_gate_bias->mark_as_unused();
540  _cell_bias->mark_as_unused();
541  _output_gate_bias->mark_as_unused();
542 
543  _is_prepared = true;
544  }
545 }

◆ run()

void run ( )
override virtual

Run the kernels contained in the function.

For Neon kernels:

  • Multi-threading is used for the kernels which are parallelisable.
  • By default std::thread::hardware_concurrency() threads are used.
Note
CPPScheduler::set_num_threads() can be used to manually set the number of threads

For OpenCL kernels:

  • All the kernels are enqueued on the queue associated with CLScheduler.
  • The queue is then flushed.
Note
The function will not block until the kernels are executed. It is the user's responsibility to wait.
Will call prepare() on the first run if it has not been done already.
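
For example, a caller could cap the worker-thread count before invoking run(). This is a sketch; it uses the generic Scheduler accessor and assumes the default CPP scheduler is active (the note above refers to CPPScheduler::set_num_threads(), which is the same setting):

    #include "arm_compute/runtime/Scheduler.h"

    // Use 4 worker threads for the parallelisable Neon kernels instead of
    // std::thread::hardware_concurrency().
    arm_compute::Scheduler::get().set_num_threads(4);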

Implements IFunction.

Definition at line 456 of file NELSTMLayerQuantized.cpp.

References NELSTMLayerQuantized::prepare(), INESimpleFunctionNoBorder::run(), NEConcatenateLayer::run(), NEActivationLayer::run(), NEArithmeticAddition::run(), NEGEMMLowpMatrixMultiplyCore::run(), NESlice::run(), and NEPixelWiseMultiplication::run().

457 {
458  prepare();
459 
460  // Acquire all the temporaries
461  MemoryGroupResourceScope scope_mg(_memory_group);
462 
463  // Concat and transpose the input
464  _concat_inputs.run();
465 
466  // Run gemmlowp
467  _gemmlowp.run();
468  _output_stage.run();
469 
470  // Slice the results
471  _slice_input_tensor.run();
472  _slice_forget_tensor.run();
473  _slice_cell_tensor.run();
474  _slice_output_tensor.run();
475 
476  // Gates
477  // Forget gate
478  _sigmoid_forget_gate.run();
479 
480  // Input gate
481  _sigmoid_input_gate.run();
482 
483  // Input modulation gate
484  _tanh_modulation_gate.run();
485 
486  // Output gate
487  _sigmoid_output_gate.run();
488 
489  // Cell state (long term memory)
490  _mul1.run();
491  _mul2.run();
492  _add1.run();
493 
494  // Output state (short term memory)
495  _tanh_output_state.run();
496  _mul3.run();
497 
498  // Requantize output state from QSYMM16 to QASYMM8
499  _dequantize.run();
500  _quantize.run();
501 }

◆ validate()

Status validate ( const ITensorInfo * input,
const ITensorInfo * input_to_input_weights,
const ITensorInfo * input_to_forget_weights,
const ITensorInfo * input_to_cell_weights,
const ITensorInfo * input_to_output_weights,
const ITensorInfo * recurrent_to_input_weights,
const ITensorInfo * recurrent_to_forget_weights,
const ITensorInfo * recurrent_to_cell_weights,
const ITensorInfo * recurrent_to_output_weights,
const ITensorInfo * input_gate_bias,
const ITensorInfo * forget_gate_bias,
const ITensorInfo * cell_bias,
const ITensorInfo * output_gate_bias,
const ITensorInfo * cell_state_in,
const ITensorInfo * output_state_in,
const ITensorInfo * cell_state_out,
const ITensorInfo * output_state_out 
)
static

Static function to check if given info will lead to a valid configuration of NELSTMLayerQuantized.

Parameters
[in]  input  Source tensor info. Input is a 2D tensor info with dimensions [input_size, batch_size]. Data types supported: QASYMM8.
[in]  input_to_input_weights  2D weights tensor info with dimensions [input_size, output_size]. Data type supported: Same as input.
[in]  input_to_forget_weights  2D weights tensor info with dimensions [input_size, output_size]. Data type supported: Same as input.
[in]  input_to_cell_weights  2D weights tensor info with dimensions [input_size, output_size]. Data type supported: Same as input.
[in]  input_to_output_weights  2D weights tensor info with dimensions [input_size, output_size]. Data type supported: Same as input.
[in]  recurrent_to_input_weights  2D weights tensor info with dimensions [output_size, output_size]. Data type supported: Same as input.
[in]  recurrent_to_forget_weights  2D weights tensor info with dimensions [output_size, output_size]. Data type supported: Same as input.
[in]  recurrent_to_cell_weights  2D weights tensor info with dimensions [output_size, output_size]. Data type supported: Same as input.
[in]  recurrent_to_output_weights  2D weights tensor info with dimensions [output_size, output_size]. Data type supported: Same as input.
[in]  input_gate_bias  1D weights tensor info with dimensions [output_size]. Data type supported: S32.
[in]  forget_gate_bias  1D weights tensor info with dimensions [output_size]. Data type supported: S32.
[in]  cell_bias  1D weights tensor info with dimensions [output_size]. Data type supported: S32.
[in]  output_gate_bias  1D weights tensor info with dimensions [output_size]. Data type supported: S32.
[in]  cell_state_in  2D tensor info with dimensions [output_size, batch_size]. Data type supported: QSYMM16.
[in]  output_state_in  2D tensor info with dimensions [output_size, batch_size]. Data type supported: Same as input.
[out]  cell_state_out  Destination tensor info. Output is a 2D tensor info with dimensions [output_size, batch_size]. Data type supported: QSYMM16.
[out]  output_state_out  Destination tensor info. Output is a 2D tensor info with dimensions [output_size, batch_size]. Data types supported: Same as input.
Returns
a status
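
A common pattern is to call validate() on the tensors' info before configuring the function. The fragment below is a sketch; the tensor objects are assumed to be initialised exactly as in the configure() example earlier on this page:

    #include <iostream>

    // Check the proposed configuration without configuring or allocating anything.
    const arm_compute::Status status = arm_compute::NELSTMLayerQuantized::validate(
        input.info(), input_to_input_w.info(), input_to_forget_w.info(), input_to_cell_w.info(), input_to_output_w.info(),
        recurrent_to_input_w.info(), recurrent_to_forget_w.info(), recurrent_to_cell_w.info(), recurrent_to_output_w.info(),
        input_gate_bias.info(), forget_gate_bias.info(), cell_bias.info(), output_gate_bias.info(),
        cell_state_in.info(), output_state_in.info(), cell_state_out.info(), output_state_out.info());

    if(!bool(status))
    {
        std::cerr << "NELSTMLayerQuantized rejected the configuration: " << status.error_description() << "\n";
    }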

Definition at line 247 of file NELSTMLayerQuantized.cpp.

References ARM_COMPUTE_RETURN_ERROR_ON, ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES, ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_QUANTIZATION_INFO, ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_SHAPES, ARM_COMPUTE_RETURN_ERROR_ON_NULLPTR, ARM_COMPUTE_RETURN_ON_ERROR, arm_compute::test::validation::bias_info, arm_compute::quantization::calculate_quantized_multiplier(), ICloneable< T >::clone(), TensorInfo::clone(), ITensorInfo::dimension(), Window::DimX, Window::DimY, arm_compute::F32, arm_compute::test::validation::input_size, ActivationLayerInfo::LOGISTIC, ITensorInfo::num_dimensions(), UniformQuantizationInfo::offset, arm_compute::test::validation::output_size, arm_compute::test::validation::qasymm(), arm_compute::QASYMM8, arm_compute::QSYMM16, arm_compute::test::validation::qsymm_3(), arm_compute::test::validation::qsymm_4(), ITensorInfo::quantization_info(), arm_compute::test::validation::qweights(), arm_compute::S32, arm_compute::SATURATE, UniformQuantizationInfo::scale, ITensorInfo::set_quantization_info(), TensorInfo::set_quantization_info(), ActivationLayerInfo::TANH, ITensorInfo::tensor_shape(), TensorInfo::tensor_shape(), arm_compute::TO_ZERO, ITensorInfo::total_size(), QuantizationInfo::uniform(), NEDequantizationLayer::validate(), NETranspose::validate(), NEQuantizationLayer::validate(), NEConcatenateLayer::validate(), NEActivationLayer::validate(), NEArithmeticAddition::validate(), NEGEMMLowpMatrixMultiplyCore::validate(), NESlice::validate(), NEPixelWiseMultiplication::validate(), and NEGEMMLowpQuantizeDownInt32ToInt16ScaleByFixedPoint::validate().

Referenced by NELSTMLayerQuantized::configure().

253 {
256  output_state_in, cell_state_out, output_state_out);
257 
258  const int input_size = input->dimension(0);
259  const int batch_size = input->dimension(1);
260  const int output_size = input_to_input_weights->dimension(1);
261 
262  // Dimensionality checks
263  ARM_COMPUTE_RETURN_ERROR_ON(input->num_dimensions() > 2);
265  ARM_COMPUTE_RETURN_ERROR_ON(input_gate_bias->num_dimensions() > 1);
266  ARM_COMPUTE_RETURN_ERROR_ON(output_state_in->num_dimensions() > 2);
267 
268  TensorInfo input_weights_info(input_to_input_weights->clone()->set_tensor_shape(TensorShape(input_size, output_size)).set_data_type(DataType::QASYMM8));
269  TensorInfo recurrent_weights_info(input_to_input_weights->clone()->set_tensor_shape(TensorShape(output_size, output_size)).set_data_type(DataType::QASYMM8));
270  TensorInfo bias_info(input_gate_bias->clone()->set_tensor_shape(TensorShape(output_size)).set_data_type(DataType::S32));
271  TensorInfo output_state_info(cell_state_in->clone()->set_tensor_shape(TensorShape(output_size, batch_size)).set_data_type(DataType::QASYMM8).set_quantization_info(qasymm));
272  TensorInfo cell_state_info(cell_state_in->clone()->set_tensor_shape(TensorShape(output_size, batch_size)).set_data_type(DataType::QSYMM16).set_quantization_info(qsymm_4));
273 
274  // Shape checks
278  ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_SHAPES(&cell_state_info, cell_state_in);
279  ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_SHAPES(&output_state_info, output_state_in);
280 
281  // Data type checks
285  ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES(&cell_state_info, cell_state_in);
286  ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES(&output_state_info, output_state_in);
287 
288  // Quantization checks
291  ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_QUANTIZATION_INFO(&cell_state_info, cell_state_in);
292  ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_QUANTIZATION_INFO(&output_state_info, output_state_in);
293 
294  // Validate internal functions
295  // _concat_input_weights
296  std::vector<const ITensorInfo *> inputs_weights_vector;
297  inputs_weights_vector.emplace_back(input_to_input_weights);
298  inputs_weights_vector.emplace_back(input_to_forget_weights);
299  inputs_weights_vector.emplace_back(input_to_cell_weights);
300  inputs_weights_vector.emplace_back(input_to_output_weights);
301  const QuantizationInfo qweights = input_to_input_weights->quantization_info(); // Weights quantization
302  const TensorInfo input_weights(TensorShape(input_size, 4 * output_size), 1, DataType::QASYMM8, qweights);
303  ARM_COMPUTE_RETURN_ON_ERROR(NEConcatenateLayer::validate(inputs_weights_vector, &input_weights, Window::DimY));
304 
305  // _concat_recurrent_weights
306  std::vector<const ITensorInfo *> recurrent_weights_vector;
307  recurrent_weights_vector.emplace_back(recurrent_to_input_weights);
308  recurrent_weights_vector.emplace_back(recurrent_to_forget_weights);
309  recurrent_weights_vector.emplace_back(recurrent_to_cell_weights);
310  recurrent_weights_vector.emplace_back(recurrent_to_output_weights);
311  const TensorInfo recurrent_weights(TensorShape(output_size, 4 * output_size), 1, DataType::QASYMM8, qweights);
312  ARM_COMPUTE_RETURN_ON_ERROR(NEConcatenateLayer::validate(recurrent_weights_vector, &recurrent_weights, Window::DimY));
313 
314  // _concat_weights
315  std::vector<const ITensorInfo *> weights_vector;
316  weights_vector.emplace_back(&recurrent_weights);
317  weights_vector.emplace_back(&input_weights);
318  const TensorInfo weights(TensorShape(input_size + output_size, 4 * output_size), 1, DataType::QASYMM8, qweights);
320  // _transpose_weights
321  const TensorShape weights_transposed_shape(weights.tensor_shape()[1], weights.tensor_shape()[0]);
322  TensorInfo weights_transposed = weights.clone()->set_is_resizable(true).set_tensor_shape(weights_transposed_shape);
323  ARM_COMPUTE_RETURN_ON_ERROR(NETranspose::validate(&weights, &weights_transposed));
324 
325  // _concat_inputs
326  std::vector<const ITensorInfo *> input_vector;
327  input_vector.emplace_back(input);
328  input_vector.emplace_back(output_state_in);
329  TensorInfo input_concatenated(TensorShape(output_size + input_size, batch_size), 1, DataType::QASYMM8, qasymm);
330  ARM_COMPUTE_RETURN_ON_ERROR(NEConcatenateLayer::validate(input_vector, &input_concatenated, Window::DimX));
331 
332  // _concat_bias
333  std::vector<const ITensorInfo *> bias_vector;
334  bias_vector.emplace_back(input_gate_bias);
335  bias_vector.emplace_back(forget_gate_bias);
336  bias_vector.emplace_back(cell_bias);
337  bias_vector.emplace_back(output_gate_bias);
338 
339  const TensorInfo bias_concatenated(TensorShape(4 * output_size), 1, DataType::S32);
340  ARM_COMPUTE_RETURN_ON_ERROR(NEConcatenateLayer::validate(bias_vector, &bias_concatenated, Window::DimX));
341 
342  // Invert the offset for gemmlowp
343  input_concatenated.set_quantization_info(QuantizationInfo(qasymm.uniform().scale, -qasymm.uniform().offset));
344  weights_transposed.set_quantization_info(QuantizationInfo(qweights.uniform().scale, -qweights.uniform().offset));
345 
346  // _gemmlowp
347  const TensorInfo output_highp(TensorShape(4 * output_size, batch_size), 1, DataType::S32);
348  ARM_COMPUTE_RETURN_ON_ERROR(NEGEMMLowpMatrixMultiplyCore::validate(&input_concatenated, &weights_transposed, nullptr, &output_highp));
349 
350  // Set the offset back
351  input_concatenated.set_quantization_info(QuantizationInfo(qasymm.uniform().scale, qasymm.uniform().offset));
352  weights_transposed.set_quantization_info(QuantizationInfo(qweights.uniform().scale, qweights.uniform().offset));
353 
354  const TensorInfo output_lowp(output_highp.tensor_shape(), 1, DataType::QSYMM16, qsymm_3);
355 
356  const float multiplier = 4096.f * qasymm.uniform().scale * qweights.uniform().scale;
357  int32_t output_multiplier = 0;
358  int32_t output_shift = 0;
359  ARM_COMPUTE_RETURN_ON_ERROR(quantization::calculate_quantized_multiplier(multiplier, &output_multiplier, &output_shift));
360 
361  // _output_stage
362  ARM_COMPUTE_RETURN_ON_ERROR(NEGEMMLowpQuantizeDownInt32ToInt16ScaleByFixedPoint::validate(&output_highp, &bias_concatenated, &output_lowp));
363 
364  TensorInfo input_gate_input;
365  TensorInfo forget_gate_input;
366  TensorInfo input_modulation_gate_input;
367  TensorInfo output_gate_input;
368 
369  if(batch_size > 1)
370  {
371  // _slice_input_tensor
372  input_gate_input = TensorInfo(TensorShape(output_size, batch_size), 1, DataType::QSYMM16, qsymm_3);
373  ARM_COMPUTE_RETURN_ON_ERROR(NESlice::validate(&output_lowp, &input_gate_input, { 0, 0 }, { output_size, batch_size }));
374  // _slice_forget_tensor
375  forget_gate_input = TensorInfo(TensorShape(output_size, batch_size), 1, DataType::QSYMM16, qsymm_3);
376  ARM_COMPUTE_RETURN_ON_ERROR(NESlice::validate(&output_lowp, &forget_gate_input, { output_size, 0 }, { 2 * output_size, batch_size }));
377  // _slice_cell_tensor
378  input_modulation_gate_input = TensorInfo(TensorShape(output_size, batch_size), 1, DataType::QSYMM16, qsymm_3);
379  ARM_COMPUTE_RETURN_ON_ERROR(NESlice::validate(&output_lowp, &input_modulation_gate_input, { 2 * output_size, 0 }, { 3 * output_size, batch_size }));
380  // _slice_output_tensor
381  output_gate_input = TensorInfo(TensorShape(output_size, batch_size), 1, DataType::QSYMM16, qsymm_3);
382  ARM_COMPUTE_RETURN_ON_ERROR(NESlice::validate(&output_lowp, &output_gate_input, { 3 * output_size, 0 }, { 4 * output_size, batch_size }));
383  }
384  else
385  {
386  // _slice_input_tensor
387  input_gate_input = TensorInfo(TensorShape(output_size), 1, DataType::QSYMM16, qsymm_3);
388  ARM_COMPUTE_RETURN_ON_ERROR(NESlice::validate(&output_lowp, &input_gate_input, { 0 }, { output_size }));
389  // _slice_forget_tensor
390  forget_gate_input = TensorInfo(TensorShape(output_size), 1, DataType::QSYMM16, qsymm_3);
391  ARM_COMPUTE_RETURN_ON_ERROR(NESlice::validate(&output_lowp, &forget_gate_input, { output_size }, { 2 * output_size }));
392  // _slice_cell_tensor
393  input_modulation_gate_input = TensorInfo(TensorShape(output_size), 1, DataType::QSYMM16, qsymm_3);
394  ARM_COMPUTE_RETURN_ON_ERROR(NESlice::validate(&output_lowp, &input_modulation_gate_input, { 2 * output_size }, { 3 * output_size }));
395  // _slice_output_tensor
396  output_gate_input = TensorInfo(TensorShape(output_size), 1, DataType::QSYMM16, qsymm_3);
397  ARM_COMPUTE_RETURN_ON_ERROR(NESlice::validate(&output_lowp, &output_gate_input, { 3 * output_size }, { 4 * output_size }));
398  }
399 
400  // _sigmoid_forget_gate
401  const TensorInfo forget_gate_output(forget_gate_input.tensor_shape(), 1, DataType::QSYMM16, qsymm_0);
402  ARM_COMPUTE_RETURN_ON_ERROR(NEActivationLayer::validate(&forget_gate_input, &forget_gate_output, ActivationLayerInfo(ActivationLayerInfo::ActivationFunction::LOGISTIC)));
403  // _sigmoid_input_gate
404  const TensorInfo input_gate_output(input_gate_input.tensor_shape(), 1, DataType::QSYMM16, qsymm_0);
405  ARM_COMPUTE_RETURN_ON_ERROR(NEActivationLayer::validate(&input_gate_input, &input_gate_output, ActivationLayerInfo(ActivationLayerInfo::ActivationFunction::LOGISTIC)));
406  // _tanh_modulation_gate
407  const TensorInfo input_modulation_gate_output(input_modulation_gate_input.tensor_shape(), 1, DataType::QSYMM16, qsymm_0);
408  ARM_COMPUTE_RETURN_ON_ERROR(NEActivationLayer::validate(&input_modulation_gate_input, &input_modulation_gate_output, ActivationLayerInfo(ActivationLayerInfo::ActivationFunction::TANH, 1.0f, 1.0f)));
409  // _sigmoid_output_gate
410  const TensorInfo output_gate_output(output_gate_input.tensor_shape(), 1, DataType::QSYMM16, qsymm_0);
411  ARM_COMPUTE_RETURN_ON_ERROR(NEActivationLayer::validate(&output_gate_input, &output_gate_output, ActivationLayerInfo(ActivationLayerInfo::ActivationFunction::LOGISTIC)));
412 
413  // _mul_forget_gate_cell_state
414  const TensorInfo cell_state_tmp1(forget_gate_output.tensor_shape(), 1, DataType::QSYMM16, qsymm_4);
416 
417  // _mul_input_gate_input_mod_gate
418  const TensorInfo cell_state_tmp2(input_gate_output.tensor_shape(), 1, DataType::QSYMM16, qsymm_4);
419  ARM_COMPUTE_RETURN_ON_ERROR(NEPixelWiseMultiplication::validate(&input_gate_output, &input_modulation_gate_output, &cell_state_tmp2, 1, ConvertPolicy::SATURATE, RoundingPolicy::TO_ZERO));
420 
421  // _add_cell_state_tmps
422  ARM_COMPUTE_RETURN_ON_ERROR(NEArithmeticAddition::validate(&cell_state_tmp1, &cell_state_tmp2, cell_state_out, ConvertPolicy::SATURATE));
423 
424  // _tanh_modulation_gate
425  const TensorInfo output_state_tmp(cell_state_out->tensor_shape(), 1, DataType::QSYMM16, qsymm_0);
426  ARM_COMPUTE_RETURN_ON_ERROR(NEActivationLayer::validate(cell_state_out, &output_state_tmp, ActivationLayerInfo(ActivationLayerInfo::ActivationFunction::TANH, 1.0f, 1.0f)));
427 
428  // _mul_output_state_tmp_output_gate
429  const TensorInfo output_state_out_symm(output_gate_output.tensor_shape(), 1, DataType::QSYMM16, qsymm_0);
430  ARM_COMPUTE_RETURN_ON_ERROR(NEPixelWiseMultiplication::validate(&output_state_tmp, &output_gate_output, &output_state_out_symm, 1, ConvertPolicy::SATURATE, RoundingPolicy::TO_ZERO));
431 
432  // _dequantize
433  const TensorInfo output_state_out_f32(output_state_out_symm.tensor_shape(), 1, DataType::F32);
434  ARM_COMPUTE_RETURN_ON_ERROR(NEDequantizationLayer::validate(&output_state_out_symm, &output_state_out_f32));
435 
436  // _quantize
437  ARM_COMPUTE_RETURN_ON_ERROR(NEQuantizationLayer::validate(&output_state_out_f32, output_state_out));
438 
439  if(cell_state_out->total_size() != 0)
440  {
441  ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES(&cell_state_info, cell_state_out);
442  ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_SHAPES(&cell_state_info, cell_state_out);
443  ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_QUANTIZATION_INFO(&cell_state_info, cell_state_out);
444  }
445 
446  if(output_state_out->total_size() != 0)
447  {
448  ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES(&output_state_info, output_state_out);
449  ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_SHAPES(&output_state_info, output_state_out);
450  ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_QUANTIZATION_INFO(&output_state_info, output_state_out);
451  }
452 
453  return Status{};
454 }

The documentation for this class was generated from the following files:

  • NELSTMLayerQuantized.h
  • NELSTMLayerQuantized.cpp