Compute Library
 21.02
CLLSTMLayerQuantized Class Reference

Basic function to run CLLSTMLayerQuantized. More...

#include <CLLSTMLayerQuantized.h>

Collaboration diagram for CLLSTMLayerQuantized (figure not reproduced in this text rendering)

Public Member Functions

 CLLSTMLayerQuantized (std::shared_ptr< IMemoryManager > memory_manager=nullptr)
 Default constructor. More...
 
 CLLSTMLayerQuantized (const CLLSTMLayerQuantized &)=delete
 Prevent instances of this class from being copied (As this class contains pointers) More...
 
 CLLSTMLayerQuantized (CLLSTMLayerQuantized &&)=default
 Default move constructor. More...
 
CLLSTMLayerQuantized & operator= (const CLLSTMLayerQuantized &)=delete
 Prevent instances of this class from being copied (As this class contains pointers) More...
 
CLLSTMLayerQuantized & operator= (CLLSTMLayerQuantized &&)=default
 Default move assignment operator. More...
 
void configure (const ICLTensor *input, const ICLTensor *input_to_input_weights, const ICLTensor *input_to_forget_weights, const ICLTensor *input_to_cell_weights, const ICLTensor *input_to_output_weights, const ICLTensor *recurrent_to_input_weights, const ICLTensor *recurrent_to_forget_weights, const ICLTensor *recurrent_to_cell_weights, const ICLTensor *recurrent_to_output_weights, const ICLTensor *input_gate_bias, const ICLTensor *forget_gate_bias, const ICLTensor *cell_bias, const ICLTensor *output_gate_bias, ICLTensor *cell_state_in, const ICLTensor *output_state_in, ICLTensor *cell_state_out, ICLTensor *output_state_out)
 Initialize function's tensors. More...
 
void configure (const CLCompileContext &compile_context, const ICLTensor *input, const ICLTensor *input_to_input_weights, const ICLTensor *input_to_forget_weights, const ICLTensor *input_to_cell_weights, const ICLTensor *input_to_output_weights, const ICLTensor *recurrent_to_input_weights, const ICLTensor *recurrent_to_forget_weights, const ICLTensor *recurrent_to_cell_weights, const ICLTensor *recurrent_to_output_weights, const ICLTensor *input_gate_bias, const ICLTensor *forget_gate_bias, const ICLTensor *cell_bias, const ICLTensor *output_gate_bias, ICLTensor *cell_state_in, const ICLTensor *output_state_in, ICLTensor *cell_state_out, ICLTensor *output_state_out)
 Initialize function's tensors. More...
 
void run () override
 Run the kernels contained in the function. More...
 
void prepare () override
 Prepare the function for executing. More...
 
- Public Member Functions inherited from IFunction
virtual ~IFunction ()=default
 Destructor. More...
 

Static Public Member Functions

static Status validate (const ITensorInfo *input, const ITensorInfo *input_to_input_weights, const ITensorInfo *input_to_forget_weights, const ITensorInfo *input_to_cell_weights, const ITensorInfo *input_to_output_weights, const ITensorInfo *recurrent_to_input_weights, const ITensorInfo *recurrent_to_forget_weights, const ITensorInfo *recurrent_to_cell_weights, const ITensorInfo *recurrent_to_output_weights, const ITensorInfo *input_gate_bias, const ITensorInfo *forget_gate_bias, const ITensorInfo *cell_bias, const ITensorInfo *output_gate_bias, const ITensorInfo *cell_state_in, const ITensorInfo *output_state_in, const ITensorInfo *cell_state_out, const ITensorInfo *output_state_out)
 Static function to check if given info will lead to a valid configuration of CLLSTMLayerQuantized. More...
 

Detailed Description

Basic function to run CLLSTMLayerQuantized.

This function calls the following CL functions/kernels:

  1. CLGEMMLowpMatrixMultiplyCore Quantized matrix multiplication core. Accumulators are 32-bit integers
  2. CLGEMMLowpQuantizeDownInt32ToInt16ScaleByFixedPoint Convert 32-bit integers into QSYMM16
  3. CLTranspose Matrix transpose
  4. CLConcatenateLayer Tensor concatenation
  5. CLActivationLayer Activation functions (tanh and logistic)
  6. CLArithmeticAddition Elementwise addition
  7. CLPixelWiseMultiplication Elementwise multiplication
  8. CLSlice Tensor slicing
  9. CLDequantizationLayer Dequantize into float
  10. CLQuantizationLayer Quantize from float

Definition at line 61 of file CLLSTMLayerQuantized.h.
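A minimal usage sketch follows. The sizes and quantization parameters are illustrative assumptions (they mirror the constants used in this page's listings), not values mandated by the API beyond the shapes and data types documented under configure().

#include "arm_compute/runtime/CL/CLScheduler.h"
#include "arm_compute/runtime/CL/CLTensor.h"
#include "arm_compute/runtime/CL/functions/CLLSTMLayerQuantized.h"

using namespace arm_compute;

int main()
{
    CLScheduler::get().default_init();

    // Illustrative dimensions and quantization parameters
    const int              input_size = 32, output_size = 16, batch_size = 1;
    const QuantizationInfo qasymm(1.f / 128.f, 128);   // QASYMM8 input/output state
    const QuantizationInfo qweights(1.f / 16.f, 16);   // QASYMM8 weights
    const QuantizationInfo qsymm_4(16.f / 32768.f, 0); // QSYMM16 cell state

    CLTensor input, cell_state_in, output_state_in, cell_state_out, output_state_out;
    CLTensor w_ii, w_if, w_ic, w_io, w_ri, w_rf, w_rc, w_ro, b_i, b_f, b_c, b_o;

    input.allocator()->init(TensorInfo(TensorShape(input_size, batch_size), 1, DataType::QASYMM8, qasymm));
    cell_state_in.allocator()->init(TensorInfo(TensorShape(output_size, batch_size), 1, DataType::QSYMM16, qsymm_4));
    output_state_in.allocator()->init(TensorInfo(TensorShape(output_size, batch_size), 1, DataType::QASYMM8, qasymm));
    for(CLTensor *w : { &w_ii, &w_if, &w_ic, &w_io })
        w->allocator()->init(TensorInfo(TensorShape(input_size, output_size), 1, DataType::QASYMM8, qweights));
    for(CLTensor *w : { &w_ri, &w_rf, &w_rc, &w_ro })
        w->allocator()->init(TensorInfo(TensorShape(output_size, output_size), 1, DataType::QASYMM8, qweights));
    for(CLTensor *b : { &b_i, &b_f, &b_c, &b_o })
        b->allocator()->init(TensorInfo(TensorShape(output_size), 1, DataType::S32));

    CLLSTMLayerQuantized lstm;
    lstm.configure(&input, &w_ii, &w_if, &w_ic, &w_io, &w_ri, &w_rf, &w_rc, &w_ro,
                   &b_i, &b_f, &b_c, &b_o, &cell_state_in, &output_state_in, &cell_state_out, &output_state_out);

    // configure() auto-initialises the output infos, so the outputs can be allocated afterwards
    for(CLTensor *t : { &input, &w_ii, &w_if, &w_ic, &w_io, &w_ri, &w_rf, &w_rc, &w_ro,
                        &b_i, &b_f, &b_c, &b_o, &cell_state_in, &output_state_in, &cell_state_out, &output_state_out })
        t->allocator()->allocate();

    // ...fill input/weight/bias/state buffers (e.g. via map()/unmap())...

    lstm.run();                // enqueues the kernels; does not block
    CLScheduler::get().sync(); // wait for the results before reading them back
    return 0;
}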

Constructor & Destructor Documentation

◆ CLLSTMLayerQuantized() [1/3]

CLLSTMLayerQuantized ( std::shared_ptr< IMemoryManager > memory_manager = nullptr )

Default constructor.

Definition at line 53 of file CLLSTMLayerQuantized.cpp.

    : _memory_group(std::move(memory_manager)), _gemmlowp(), _output_stage(), _transpose_weights(), _concat_input_weights(), _concat_recurrent_weights(), _concat_weights(), _concat_inputs(),
      _concat_bias(), _sigmoid_forget_gate(), _sigmoid_input_gate(), _sigmoid_output_gate(), _tanh_modulation_gate(), _tanh_output_state(), _add_cell_state_tmps(), _add2(), _mul_forget_gate_cell_state(),
      _mul_input_gate_input_mod_gate(), _mul_output_state_tmp_output_gate(), _slice_input_tensor(), _slice_forget_tensor(), _slice_cell_tensor(), _slice_output_tensor(), _dequantize(), _quantize(),
      _input_to_input_weights(nullptr), _input_to_forget_weights(nullptr), _input_to_cell_weights(nullptr), _input_to_output_weights(nullptr), _recurrent_to_input_weights(nullptr),
      _recurrent_to_forget_weights(nullptr), _recurrent_to_cell_weights(nullptr), _recurrent_to_output_weights(nullptr), _input_gate_bias(nullptr), _forget_gate_bias(nullptr), _cell_bias(nullptr),
      _output_gate_bias(nullptr), _recurrent_weights(), _input_weights(), _weights(), _input(), _weights_transposed(), _output_highp(), _output_lowp(), _bias(), _forget_gate_input(), _input_gate_input(),
      _output_gate_input(), _input_modulation_gate_input(), _forget_gate_output(), _input_gate_output(), _output_gate_output(), _input_modulation_gate_output(), _cell_state_tmp1(), _cell_state_tmp2(),
      _output_state_tmp(), _output_state_out_symm(), _output_state_out_f32(), _is_prepared(false)
{
}

◆ CLLSTMLayerQuantized() [2/3]

CLLSTMLayerQuantized ( const CLLSTMLayerQuantized & )
delete

Prevent instances of this class from being copied (As this class contains pointers)

◆ CLLSTMLayerQuantized() [3/3]

CLLSTMLayerQuantized ( CLLSTMLayerQuantized && )
default

Default move constructor.

Member Function Documentation

◆ configure() [1/2]

void configure ( const ICLTensor * input,
const ICLTensor * input_to_input_weights,
const ICLTensor * input_to_forget_weights,
const ICLTensor * input_to_cell_weights,
const ICLTensor * input_to_output_weights,
const ICLTensor * recurrent_to_input_weights,
const ICLTensor * recurrent_to_forget_weights,
const ICLTensor * recurrent_to_cell_weights,
const ICLTensor * recurrent_to_output_weights,
const ICLTensor * input_gate_bias,
const ICLTensor * forget_gate_bias,
const ICLTensor * cell_bias,
const ICLTensor * output_gate_bias,
ICLTensor * cell_state_in,
const ICLTensor * output_state_in,
ICLTensor * cell_state_out,
ICLTensor * output_state_out
)

Initialize function's tensors.

Parameters
[in]  input                        Source tensor. Input is a 2D tensor with dimensions [input_size, batch_size]. Data types supported: QASYMM8.
[in]  input_to_input_weights       2D weights tensor with dimensions [input_size, output_size]. Data type supported: Same as input.
[in]  input_to_forget_weights      2D weights tensor with dimensions [input_size, output_size]. Data type supported: Same as input.
[in]  input_to_cell_weights        2D weights tensor with dimensions [input_size, output_size]. Data type supported: Same as input.
[in]  input_to_output_weights      2D weights tensor with dimensions [input_size, output_size]. Data type supported: Same as input.
[in]  recurrent_to_input_weights   2D weights tensor with dimensions [output_size, output_size]. Data type supported: Same as input.
[in]  recurrent_to_forget_weights  2D weights tensor with dimensions [output_size, output_size]. Data type supported: Same as input.
[in]  recurrent_to_cell_weights    2D weights tensor with dimensions [output_size, output_size]. Data type supported: Same as input.
[in]  recurrent_to_output_weights  2D weights tensor with dimensions [output_size, output_size]. Data type supported: Same as input.
[in]  input_gate_bias              1D weights tensor with dimensions [output_size]. Data type supported: S32.
[in]  forget_gate_bias             1D weights tensor with dimensions [output_size]. Data type supported: S32.
[in]  cell_bias                    1D weights tensor with dimensions [output_size]. Data type supported: S32.
[in]  output_gate_bias             1D weights tensor with dimensions [output_size]. Data type supported: S32.
[in]  cell_state_in                2D tensor with dimensions [output_size, batch_size]. Data type supported: QSYMM16.
[in]  output_state_in              2D tensor with dimensions [output_size, batch_size]. Data type supported: Same as input.
[out] cell_state_out               Destination tensor. Output is a 2D tensor with dimensions [output_size, batch_size]. Data type supported: QSYMM16.
[out] output_state_out             Destination tensor. Output is a 2D tensor with dimensions [output_size, batch_size]. Data types supported: Same as input.
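Beyond the per-tensor requirements above, validate() (documented below) additionally requires all eight weight tensors to share a single quantization info and all four biases to match in shape and type. A sketch of a consistent set of tensor infos, assuming the arm_compute headers and namespace from the example near the top of this page (values are illustrative):

    const int              input_size = 32, output_size = 16;                                        // illustrative sizes
    const QuantizationInfo qweights(1.f / 16.f, 16);                                                 // shared by all eight weight tensors
    const TensorInfo       w_in(TensorShape(input_size, output_size), 1, DataType::QASYMM8, qweights);   // each input_to_* weight
    const TensorInfo       w_rec(TensorShape(output_size, output_size), 1, DataType::QASYMM8, qweights); // each recurrent_to_* weight
    const TensorInfo       bias(TensorShape(output_size), 1, DataType::S32);                             // each gate bias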

Definition at line 65 of file CLLSTMLayerQuantized.cpp.

References CLKernelLibrary::get().

Referenced by arm_compute::test::validation::TEST_CASE().

{
    configure(CLKernelLibrary::get().get_compile_context(), input, input_to_input_weights, input_to_forget_weights, input_to_cell_weights, input_to_output_weights, recurrent_to_input_weights,
              recurrent_to_forget_weights, recurrent_to_cell_weights, recurrent_to_output_weights, input_gate_bias, forget_gate_bias, cell_bias, output_gate_bias, cell_state_in, output_state_in,
              cell_state_out, output_state_out);
}

◆ configure() [2/2]

void configure ( const CLCompileContext & compile_context,
const ICLTensor * input,
const ICLTensor * input_to_input_weights,
const ICLTensor * input_to_forget_weights,
const ICLTensor * input_to_cell_weights,
const ICLTensor * input_to_output_weights,
const ICLTensor * recurrent_to_input_weights,
const ICLTensor * recurrent_to_forget_weights,
const ICLTensor * recurrent_to_cell_weights,
const ICLTensor * recurrent_to_output_weights,
const ICLTensor * input_gate_bias,
const ICLTensor * forget_gate_bias,
const ICLTensor * cell_bias,
const ICLTensor * output_gate_bias,
ICLTensor * cell_state_in,
const ICLTensor * output_state_in,
ICLTensor * cell_state_out,
ICLTensor * output_state_out
)

Initialize function's tensors.

Parameters
[in]  compile_context              The compile context to be used.
[in]  input                        Source tensor. Input is a 2D tensor with dimensions [input_size, batch_size]. Data types supported: QASYMM8.
[in]  input_to_input_weights       2D weights tensor with dimensions [input_size, output_size]. Data type supported: Same as input.
[in]  input_to_forget_weights      2D weights tensor with dimensions [input_size, output_size]. Data type supported: Same as input.
[in]  input_to_cell_weights        2D weights tensor with dimensions [input_size, output_size]. Data type supported: Same as input.
[in]  input_to_output_weights      2D weights tensor with dimensions [input_size, output_size]. Data type supported: Same as input.
[in]  recurrent_to_input_weights   2D weights tensor with dimensions [output_size, output_size]. Data type supported: Same as input.
[in]  recurrent_to_forget_weights  2D weights tensor with dimensions [output_size, output_size]. Data type supported: Same as input.
[in]  recurrent_to_cell_weights    2D weights tensor with dimensions [output_size, output_size]. Data type supported: Same as input.
[in]  recurrent_to_output_weights  2D weights tensor with dimensions [output_size, output_size]. Data type supported: Same as input.
[in]  input_gate_bias              1D weights tensor with dimensions [output_size]. Data type supported: S32.
[in]  forget_gate_bias             1D weights tensor with dimensions [output_size]. Data type supported: S32.
[in]  cell_bias                    1D weights tensor with dimensions [output_size]. Data type supported: S32.
[in]  output_gate_bias             1D weights tensor with dimensions [output_size]. Data type supported: S32.
[in]  cell_state_in                2D tensor with dimensions [output_size, batch_size]. Data type supported: QSYMM16.
[in]  output_state_in              2D tensor with dimensions [output_size, batch_size]. Data type supported: Same as input.
[out] cell_state_out               Destination tensor. Output is a 2D tensor with dimensions [output_size, batch_size]. Data type supported: QSYMM16.
[out] output_state_out             Destination tensor. Output is a 2D tensor with dimensions [output_size, batch_size]. Data types supported: Same as input.
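When no dedicated compile context is needed, the first overload above can be used instead; as its definition earlier on this page shows, it simply forwards the compile context obtained from CLKernelLibrary::get() together with the tensor arguments.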

Definition at line 77 of file CLLSTMLayerQuantized.cpp.

References CLTensorAllocator::allocate(), CLTensor::allocator(), ARM_COMPUTE_ERROR_ON_NULLPTR, ARM_COMPUTE_ERROR_THROW_ON, arm_compute::auto_init_if_empty(), arm_compute::quantization::calculate_quantized_multiplier(), CLDequantizationLayer::configure(), CLTranspose::configure(), CLQuantizationLayer::configure(), CLActivationLayer::configure(), CLConcatenateLayer::configure(), CLGEMMLowpMatrixMultiplyCore::configure(), CLArithmeticAddition::configure(), CLSlice::configure(), CLPixelWiseMultiplication::configure(), CLGEMMLowpQuantizeDownInt32ToInt16ScaleByFixedPoint::configure(), ITensorInfo::dimension(), Window::DimX, Window::DimY, arm_compute::F32, arm_compute::test::validation::forget_gate_bias, ITensor::info(), CLTensor::info(), ITensorAllocator::init(), arm_compute::test::validation::input_gate_bias, arm_compute::test::validation::input_size, arm_compute::test::validation::input_to_cell_weights, arm_compute::test::validation::input_to_forget_weights, arm_compute::test::validation::input_to_input_weights, arm_compute::test::validation::input_to_output_weights, ActivationLayerInfo::LOGISTIC, MemoryGroup::manage(), UniformQuantizationInfo::offset, arm_compute::test::validation::output_gate_bias, arm_compute::test::validation::output_size, arm_compute::test::validation::qasymm(), arm_compute::QASYMM8, arm_compute::QSYMM16, arm_compute::test::validation::qsymm_3(), arm_compute::test::validation::qsymm_4(), ITensorInfo::quantization_info(), arm_compute::test::validation::qweights(), arm_compute::test::validation::recurrent_to_cell_weights, arm_compute::test::validation::recurrent_to_forget_weights, arm_compute::test::validation::recurrent_to_input_weights, arm_compute::test::validation::recurrent_to_output_weights, arm_compute::S32, arm_compute::SATURATE, UniformQuantizationInfo::scale, TensorInfo::set_quantization_info(), ActivationLayerInfo::TANH, ITensorInfo::tensor_shape(), TensorInfo::tensor_shape(), arm_compute::TO_ZERO, QuantizationInfo::uniform(), and CLLSTMLayerQuantized::validate().

{
    ARM_COMPUTE_ERROR_ON_NULLPTR(input, input_to_input_weights, input_to_forget_weights, input_to_cell_weights, input_to_output_weights, recurrent_to_input_weights, recurrent_to_forget_weights,
                                 recurrent_to_cell_weights, recurrent_to_output_weights,
                                 input_gate_bias, forget_gate_bias, cell_bias, output_gate_bias, cell_state_in, output_state_in, cell_state_out, output_state_out);

    ARM_COMPUTE_ERROR_THROW_ON(CLLSTMLayerQuantized::validate(input->info(), input_to_input_weights->info(), input_to_forget_weights->info(), input_to_cell_weights->info(), input_to_output_weights->info(),
                                                              recurrent_to_input_weights->info(), recurrent_to_forget_weights->info(), recurrent_to_cell_weights->info(), recurrent_to_output_weights->info(),
                                                              input_gate_bias->info(), forget_gate_bias->info(), cell_bias->info(), output_gate_bias->info(), cell_state_in->info(), output_state_in->info(), cell_state_out->info(), output_state_out->info()));

    const int input_size  = input->info()->dimension(0);
    const int batch_size  = input->info()->dimension(1);
    const int output_size = input_to_input_weights->info()->dimension(1);

    const QuantizationInfo qweights = input_to_input_weights->info()->quantization_info(); // Weights quantization

    // Note: qasymm(1.f/128.f, 128), qsymm_3(8.f/32768.f, 0) and qsymm_4(16.f/32768.f, 0) are file-scope QuantizationInfo
    // constants in CLLSTMLayerQuantized.cpp; qsymm_0 is the QSYMM16 quantization info with zero integer bits.
    auto_init_if_empty(*cell_state_out->info(), TensorInfo(TensorShape(batch_size, output_size), 1, DataType::QSYMM16, qsymm_4));
    auto_init_if_empty(*output_state_out->info(), TensorInfo(TensorShape(batch_size, output_size), 1, DataType::QASYMM8, qasymm));

    _input_to_input_weights      = input_to_input_weights;
    _input_to_forget_weights     = input_to_forget_weights;
    _input_to_cell_weights       = input_to_cell_weights;
    _input_to_output_weights     = input_to_output_weights;
    _recurrent_to_input_weights  = recurrent_to_input_weights;
    _recurrent_to_forget_weights = recurrent_to_forget_weights;
    _recurrent_to_cell_weights   = recurrent_to_cell_weights;
    _recurrent_to_output_weights = recurrent_to_output_weights;
    _input_gate_bias             = input_gate_bias;
    _forget_gate_bias            = forget_gate_bias;
    _cell_bias                   = cell_bias;
    _output_gate_bias            = output_gate_bias;

    // Weights concatenation
    std::vector<const ICLTensor *> inputs_weights_vector;
    inputs_weights_vector.emplace_back(input_to_input_weights);
    inputs_weights_vector.emplace_back(input_to_forget_weights);
    inputs_weights_vector.emplace_back(input_to_cell_weights);
    inputs_weights_vector.emplace_back(input_to_output_weights);

    std::vector<const ICLTensor *> recurrent_weights_vector;
    recurrent_weights_vector.emplace_back(recurrent_to_input_weights);
    recurrent_weights_vector.emplace_back(recurrent_to_forget_weights);
    recurrent_weights_vector.emplace_back(recurrent_to_cell_weights);
    recurrent_weights_vector.emplace_back(recurrent_to_output_weights);

    _input_weights.allocator()->init(TensorInfo(TensorShape(input_size, 4 * output_size), 1, DataType::QASYMM8, qweights));
    _concat_input_weights.configure(compile_context, inputs_weights_vector, &_input_weights, Window::DimY);

    _recurrent_weights.allocator()->init(TensorInfo(TensorShape(output_size, 4 * output_size), 1, DataType::QASYMM8, qweights));
    _concat_recurrent_weights.configure(compile_context, recurrent_weights_vector, &_recurrent_weights, Window::DimY);

    std::vector<const ICLTensor *> weights_vector;
    weights_vector.emplace_back(&_recurrent_weights);
    weights_vector.emplace_back(&_input_weights);

    _weights.allocator()->init(TensorInfo(TensorShape(output_size + input_size, 4 * output_size), 1, DataType::QASYMM8, qweights));
    _concat_weights.configure(compile_context, weights_vector, &_weights, Window::DimX);
    _transpose_weights.configure(compile_context, &_weights, &_weights_transposed);

    // Input concatenation
    std::vector<const ICLTensor *> input_vector;
    input_vector.emplace_back(input);
    input_vector.emplace_back(output_state_in);

    _memory_group.manage(&_input);
    _input.allocator()->init(TensorInfo(TensorShape(output_size + input_size, batch_size), 1, DataType::QASYMM8, qasymm));
    _concat_inputs.configure(compile_context, input_vector, &_input, Window::DimX);

    // Bias concatenation
    std::vector<const ICLTensor *> bias_vector;
    bias_vector.emplace_back(input_gate_bias);
    bias_vector.emplace_back(forget_gate_bias);
    bias_vector.emplace_back(cell_bias);
    bias_vector.emplace_back(output_gate_bias);

    _bias.allocator()->init(TensorInfo(TensorShape(4 * output_size), 1, DataType::S32));
    _concat_bias.configure(compile_context, bias_vector, &_bias, Window::DimX);

    // Invert the offset for gemmlowp
    _input.info()->set_quantization_info(QuantizationInfo(qasymm.uniform().scale, -qasymm.uniform().offset));
    _weights_transposed.info()->set_quantization_info(QuantizationInfo(qweights.uniform().scale, -qweights.uniform().offset));

    // Run gemmlowp
    _memory_group.manage(&_output_highp);
    _output_highp.allocator()->init(TensorInfo(TensorShape(4 * output_size, batch_size), 1, DataType::S32));
    _gemmlowp.configure(compile_context, &_input, &_weights_transposed, nullptr, &_output_highp);
    _input.allocator()->allocate();

    // Set the offset back
    _input.info()->set_quantization_info(QuantizationInfo(qasymm.uniform().scale, qasymm.uniform().offset));
    _weights_transposed.info()->set_quantization_info(QuantizationInfo(qweights.uniform().scale, qweights.uniform().offset));

    // multiplier = (input_scale * weights_scale) / output_scale (2 ^ (-12))
    _output_lowp.allocator()->init(TensorInfo(_output_highp.info()->tensor_shape(), 1, DataType::QSYMM16, qsymm_3));

    const float multiplier        = 4096.f * qasymm.uniform().scale * qweights.uniform().scale;
    int         output_multiplier = 0;
    int         output_shift      = 0;
    quantization::calculate_quantized_multiplier(multiplier, &output_multiplier, &output_shift);

    _memory_group.manage(&_output_lowp);
    _output_stage.configure(compile_context, &_output_highp, &_bias, &_output_lowp, output_multiplier, output_shift);
    _output_highp.allocator()->allocate();
    _bias.allocator()->allocate();

    // Get the gate tensors
    if(batch_size > 1)
    {
        _memory_group.manage(&_input_gate_input);
        _slice_input_tensor.configure(compile_context, &_output_lowp, &_input_gate_input, { 0, 0 }, { output_size, batch_size });
        _memory_group.manage(&_forget_gate_input);
        _slice_forget_tensor.configure(compile_context, &_output_lowp, &_forget_gate_input, { output_size, 0 }, { 2 * output_size, batch_size });
        _memory_group.manage(&_input_modulation_gate_input);
        _slice_cell_tensor.configure(compile_context, &_output_lowp, &_input_modulation_gate_input, { 2 * output_size, 0 }, { 3 * output_size, batch_size });
        _memory_group.manage(&_output_gate_input);
        _slice_output_tensor.configure(compile_context, &_output_lowp, &_output_gate_input, { 3 * output_size, 0 }, { 4 * output_size, batch_size });
        _output_lowp.allocator()->allocate();
    }
    else
    {
        _memory_group.manage(&_input_gate_input);
        _slice_input_tensor.configure(compile_context, &_output_lowp, &_input_gate_input, { 0 }, { output_size });
        _memory_group.manage(&_forget_gate_input);
        _slice_forget_tensor.configure(compile_context, &_output_lowp, &_forget_gate_input, { output_size }, { 2 * output_size });
        _memory_group.manage(&_input_modulation_gate_input);
        _slice_cell_tensor.configure(compile_context, &_output_lowp, &_input_modulation_gate_input, { 2 * output_size }, { 3 * output_size });
        _memory_group.manage(&_output_gate_input);
        _slice_output_tensor.configure(compile_context, &_output_lowp, &_output_gate_input, { 3 * output_size }, { 4 * output_size });
        _output_lowp.allocator()->allocate();
    }

    // Forget gate
    _memory_group.manage(&_forget_gate_output);
    _forget_gate_output.allocator()->init(TensorInfo(_forget_gate_input.info()->tensor_shape(), 1, DataType::QSYMM16, qsymm_0));
    _sigmoid_forget_gate.configure(compile_context, &_forget_gate_input, &_forget_gate_output, ActivationLayerInfo(ActivationLayerInfo::ActivationFunction::LOGISTIC));
    _forget_gate_input.allocator()->allocate();

    // Input gate
    _memory_group.manage(&_input_gate_output);
    _input_gate_output.allocator()->init(TensorInfo(_input_gate_input.info()->tensor_shape(), 1, DataType::QSYMM16, qsymm_0));
    _sigmoid_input_gate.configure(compile_context, &_input_gate_input, &_input_gate_output, ActivationLayerInfo(ActivationLayerInfo::ActivationFunction::LOGISTIC));
    _input_gate_input.allocator()->allocate();

    // Input modulation gate equation
    _memory_group.manage(&_input_modulation_gate_output);
    _input_modulation_gate_output.allocator()->init(TensorInfo(_input_modulation_gate_input.info()->tensor_shape(), 1, DataType::QSYMM16, qsymm_0));
    _tanh_modulation_gate.configure(compile_context, &_input_modulation_gate_input, &_input_modulation_gate_output, ActivationLayerInfo(ActivationLayerInfo::ActivationFunction::TANH, 1.0f, 1.0f));
    _input_modulation_gate_input.allocator()->allocate();

    // Output gate
    _memory_group.manage(&_output_gate_output);
    _output_gate_output.allocator()->init(TensorInfo(_output_gate_input.info()->tensor_shape(), 1, DataType::QSYMM16, qsymm_0));
    _sigmoid_output_gate.configure(compile_context, &_output_gate_input, &_output_gate_output, ActivationLayerInfo(ActivationLayerInfo::ActivationFunction::LOGISTIC));
    _output_gate_input.allocator()->allocate();

    // Long term memory
    _memory_group.manage(&_cell_state_tmp1);
    _cell_state_tmp1.allocator()->init(TensorInfo(_forget_gate_output.info()->tensor_shape(), 1, DataType::QSYMM16, qsymm_4));
    _mul_forget_gate_cell_state.configure(compile_context, &_forget_gate_output, cell_state_in, &_cell_state_tmp1, 1, ConvertPolicy::SATURATE, RoundingPolicy::TO_ZERO);
    _forget_gate_output.allocator()->allocate();

    _memory_group.manage(&_cell_state_tmp2);
    _cell_state_tmp2.allocator()->init(TensorInfo(_input_gate_output.info()->tensor_shape(), 1, DataType::QSYMM16, qsymm_4));
    _mul_input_gate_input_mod_gate.configure(compile_context, &_input_gate_output, &_input_modulation_gate_output, &_cell_state_tmp2, 1, ConvertPolicy::SATURATE, RoundingPolicy::TO_ZERO);
    _input_modulation_gate_output.allocator()->allocate();
    _input_gate_output.allocator()->allocate();

    _add_cell_state_tmps.configure(compile_context, &_cell_state_tmp1, &_cell_state_tmp2, cell_state_out, ConvertPolicy::SATURATE);
    _cell_state_tmp1.allocator()->allocate();
    _cell_state_tmp2.allocator()->allocate();

    // Short term memory
    _memory_group.manage(&_output_state_tmp);
    _output_state_tmp.allocator()->init(TensorInfo(cell_state_out->info()->tensor_shape(), 1, DataType::QSYMM16, qsymm_0));
    _tanh_output_state.configure(compile_context, cell_state_out, &_output_state_tmp, ActivationLayerInfo(ActivationLayerInfo::ActivationFunction::TANH, 1.0f, 1.0f));

    _memory_group.manage(&_output_state_out_symm);
    _output_state_out_symm.allocator()->init(TensorInfo(_output_gate_output.info()->tensor_shape(), 1, DataType::QSYMM16, qsymm_0));
    _mul_output_state_tmp_output_gate.configure(compile_context, &_output_state_tmp, &_output_gate_output, &_output_state_out_symm, 1, ConvertPolicy::SATURATE, RoundingPolicy::TO_ZERO);
    _output_gate_output.allocator()->allocate();
    _output_state_tmp.allocator()->allocate();

    // Requantize the output state from QSYMM16 to QASYMM8
    _memory_group.manage(&_output_state_out_f32);
    _output_state_out_f32.allocator()->init(TensorInfo(_output_state_out_symm.info()->tensor_shape(), 1, DataType::F32));
    _dequantize.configure(compile_context, &_output_state_out_symm, &_output_state_out_f32);
    _output_state_out_symm.allocator()->allocate();

    _quantize.configure(compile_context, &_output_state_out_f32, output_state_out);
    _output_state_out_f32.allocator()->allocate();
}
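For reference, the output stage requantizes the S32 accumulators to QSYMM16 with qsymm_3 = QuantizationInfo(8.f / 32768.f, 0), i.e. an output scale of 2^-12, so multiplier = (input_scale * weights_scale) / output_scale = 4096 * input_scale * weights_scale. With the illustrative values qasymm(1.f / 128.f, 128) and qweights(1.f / 16.f, 16), multiplier = 4096 / (128 * 16) = 2, which calculate_quantized_multiplier() then decomposes into a normalised fixed-point multiplier and shift for the kernel.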

◆ operator=() [1/2]

CLLSTMLayerQuantized & operator= ( const CLLSTMLayerQuantized & )
delete

Prevent instances of this class from being copied (As this class contains pointers)

◆ operator=() [2/2]

CLLSTMLayerQuantized& operator= ( CLLSTMLayerQuantized &&  )
default

Default move assignment operator.

◆ prepare()

void prepare ( )
override virtual

Prepare the function for executing.

Any one-off pre-processing step required by the function is handled here.

Note
Prepare stage might not need all the function's buffers' backing memory to be available in order to execute
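Because preparation concatenates and transposes the weights and then frees the intermediate buffers (see the definition below), a caller can pay this one-off cost ahead of the steady-state loop. A sketch, continuing the usage example near the top of this page (num_timesteps is a hypothetical caller-side variable):

    lstm.prepare(); // one-off: concatenate and transpose weights, mark the originals unused
    for(int t = 0; t < num_timesteps; ++t)
    {
        lstm.run(); // prepare() becomes a no-op after the first call
    }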

Reimplemented from IFunction.

Definition at line 532 of file CLLSTMLayerQuantized.cpp.

References CLTensorAllocator::allocate(), CLTensor::allocator(), CLTensorAllocator::free(), ITensor::mark_as_unused(), ICLSimpleFunction::run(), and CLConcatenateLayer::run().

Referenced by CLLSTMLayerQuantized::run().

{
    if(!_is_prepared)
    {
        _input_weights.allocator()->allocate();
        _concat_input_weights.run();

        _input_to_input_weights->mark_as_unused();
        _input_to_forget_weights->mark_as_unused();
        _input_to_cell_weights->mark_as_unused();
        _input_to_output_weights->mark_as_unused();

        _recurrent_weights.allocator()->allocate();
        _concat_recurrent_weights.run();
        _recurrent_to_input_weights->mark_as_unused();
        _recurrent_to_forget_weights->mark_as_unused();
        _recurrent_to_cell_weights->mark_as_unused();
        _recurrent_to_output_weights->mark_as_unused();

        _weights.allocator()->allocate();
        _concat_weights.run();

        _input_weights.mark_as_unused();
        _input_weights.allocator()->free();
        _recurrent_weights.mark_as_unused();
        _recurrent_weights.allocator()->free();

        _weights_transposed.allocator()->allocate();
        _transpose_weights.run();

        _weights.mark_as_unused();
        _weights.allocator()->free();

        _bias.allocator()->allocate();
        _concat_bias.run();
        _input_gate_bias->mark_as_unused();
        _forget_gate_bias->mark_as_unused();
        _cell_bias->mark_as_unused();
        _output_gate_bias->mark_as_unused();

        _is_prepared = true;
    }
}

◆ run()

void run ( )
override virtual

Run the kernels contained in the function.

For Neon kernels:

  • Multi-threading is used for the kernels which are parallelisable.
  • By default std::thread::hardware_concurrency() threads are used.
Note
CPPScheduler::set_num_threads() can be used to manually set the number of threads

For OpenCL kernels:

  • All the kernels are enqueued on the queue associated with CLScheduler.
  • The queue is then flushed.
Note
The function will not block until the kernels are executed. It is the user's responsibility to wait.
Will call prepare() on the first run if it hasn't been done yet.

Implements IFunction.
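Since run() only enqueues and flushes, synchronize before reading results back on the host. A sketch, continuing the usage example near the top of this page:

    lstm.run();                // non-blocking: enqueues the kernels and flushes the queue
    CLScheduler::get().sync(); // block until the queue has drained
    // output_state_out and cell_state_out can now be mapped and read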

Definition at line 485 of file CLLSTMLayerQuantized.cpp.

References CLLSTMLayerQuantized::prepare(), ICLSimpleFunction::run(), CLActivationLayer::run(), CLConcatenateLayer::run(), CLGEMMLowpMatrixMultiplyCore::run(), CLArithmeticAddition::run(), CLSlice::run(), and CLPixelWiseMultiplication::run().

{
    prepare();

    // Acquire all the temporaries
    MemoryGroupResourceScope scope_mg(_memory_group);

    // Concat and transpose the input
    _concat_inputs.run();

    // Run gemmlowp
    _gemmlowp.run();
    _output_stage.run();

    // Slice the results
    _slice_input_tensor.run();
    _slice_forget_tensor.run();
    _slice_cell_tensor.run();
    _slice_output_tensor.run();

    // Gates
    // Forget gate
    _sigmoid_forget_gate.run();

    // Input gate
    _sigmoid_input_gate.run();

    // Input modulation gate
    _tanh_modulation_gate.run();

    // Output gate
    _sigmoid_output_gate.run();

    // Cell state (long term memory)
    _mul_forget_gate_cell_state.run();
    _mul_input_gate_input_mod_gate.run();
    _add_cell_state_tmps.run();

    // Output state (short term memory)
    _tanh_output_state.run();
    _mul_output_state_tmp_output_gate.run();

    // Requantize output state from QSYMM16 to QASYMM8
    _dequantize.run();
    _quantize.run();
}

◆ validate()

Status validate ( const ITensorInfo * input,
const ITensorInfo * input_to_input_weights,
const ITensorInfo * input_to_forget_weights,
const ITensorInfo * input_to_cell_weights,
const ITensorInfo * input_to_output_weights,
const ITensorInfo * recurrent_to_input_weights,
const ITensorInfo * recurrent_to_forget_weights,
const ITensorInfo * recurrent_to_cell_weights,
const ITensorInfo * recurrent_to_output_weights,
const ITensorInfo * input_gate_bias,
const ITensorInfo * forget_gate_bias,
const ITensorInfo * cell_bias,
const ITensorInfo * output_gate_bias,
const ITensorInfo * cell_state_in,
const ITensorInfo * output_state_in,
const ITensorInfo * cell_state_out,
const ITensorInfo * output_state_out
)
static

Static function to check if given info will lead to a valid configuration of CLLSTMLayerQuantized.

Parameters
[in]  input                        Source tensor info. Input is a 2D tensor info with dimensions [input_size, batch_size]. Data types supported: QASYMM8.
[in]  input_to_input_weights       2D weights tensor info with dimensions [input_size, output_size]. Data type supported: Same as input.
[in]  input_to_forget_weights      2D weights tensor info with dimensions [input_size, output_size]. Data type supported: Same as input.
[in]  input_to_cell_weights        2D weights tensor info with dimensions [input_size, output_size]. Data type supported: Same as input.
[in]  input_to_output_weights      2D weights tensor info with dimensions [input_size, output_size]. Data type supported: Same as input.
[in]  recurrent_to_input_weights   2D weights tensor info with dimensions [output_size, output_size]. Data type supported: Same as input.
[in]  recurrent_to_forget_weights  2D weights tensor info with dimensions [output_size, output_size]. Data type supported: Same as input.
[in]  recurrent_to_cell_weights    2D weights tensor info with dimensions [output_size, output_size]. Data type supported: Same as input.
[in]  recurrent_to_output_weights  2D weights tensor info with dimensions [output_size, output_size]. Data type supported: Same as input.
[in]  input_gate_bias              1D weights tensor info with dimensions [output_size]. Data type supported: S32.
[in]  forget_gate_bias             1D weights tensor info with dimensions [output_size]. Data type supported: S32.
[in]  cell_bias                    1D weights tensor info with dimensions [output_size]. Data type supported: S32.
[in]  output_gate_bias             1D weights tensor info with dimensions [output_size]. Data type supported: S32.
[in]  cell_state_in                2D tensor info with dimensions [output_size, batch_size]. Data type supported: QSYMM16.
[in]  output_state_in              2D tensor info with dimensions [output_size, batch_size]. Data type supported: Same as input.
[out] cell_state_out               Destination tensor info. Output is a 2D tensor info with dimensions [output_size, batch_size]. Data type supported: QSYMM16.
[out] output_state_out             Destination tensor info. Output is a 2D tensor info with dimensions [output_size, batch_size]. Data types supported: Same as input.
Returns
a status
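A sketch of the validate-before-configure pattern, assuming the tensors were initialised as in the usage example near the top of this page (and <iostream> is included):

    const Status status = CLLSTMLayerQuantized::validate(input.info(), w_ii.info(), w_if.info(), w_ic.info(), w_io.info(),
                                                         w_ri.info(), w_rf.info(), w_rc.info(), w_ro.info(),
                                                         b_i.info(), b_f.info(), b_c.info(), b_o.info(),
                                                         cell_state_in.info(), output_state_in.info(), cell_state_out.info(), output_state_out.info());
    if(status.error_code() != ErrorCode::OK)
    {
        std::cerr << status.error_description() << std::endl; // report why the configuration is invalid
    }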

Definition at line 275 of file CLLSTMLayerQuantized.cpp.

References ARM_COMPUTE_RETURN_ERROR_ON, ARM_COMPUTE_RETURN_ERROR_ON_DATA_TYPE_NOT_IN, ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES, ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_QUANTIZATION_INFO, ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_SHAPES, ARM_COMPUTE_RETURN_ERROR_ON_NULLPTR, ARM_COMPUTE_RETURN_ON_ERROR, arm_compute::test::validation::bias_info, arm_compute::quantization::calculate_quantized_multiplier(), ICloneable< T >::clone(), TensorInfo::clone(), ITensorInfo::dimension(), Window::DimX, Window::DimY, arm_compute::F32, arm_compute::test::validation::input_size, ActivationLayerInfo::LOGISTIC, ITensorInfo::num_dimensions(), UniformQuantizationInfo::offset, arm_compute::test::validation::output_size, arm_compute::test::validation::qasymm(), arm_compute::QASYMM8, arm_compute::QSYMM16, arm_compute::test::validation::qsymm_3(), arm_compute::test::validation::qsymm_4(), ITensorInfo::quantization_info(), arm_compute::test::validation::qweights(), arm_compute::S32, arm_compute::SATURATE, UniformQuantizationInfo::scale, ITensorInfo::set_quantization_info(), TensorInfo::set_quantization_info(), ActivationLayerInfo::TANH, ITensorInfo::tensor_shape(), TensorInfo::tensor_shape(), arm_compute::TO_ZERO, ITensorInfo::total_size(), QuantizationInfo::uniform(), CLDequantizationLayer::validate(), CLTranspose::validate(), CLQuantizationLayer::validate(), CLActivationLayer::validate(), CLConcatenateLayer::validate(), CLGEMMLowpMatrixMultiplyCore::validate(), CLArithmeticAddition::validate(), CLSlice::validate(), CLPixelWiseMultiplication::validate(), and CLGEMMLowpQuantizeDownInt32ToInt16ScaleByFixedPoint::validate().

Referenced by CLLSTMLayerQuantized::configure().

{
    ARM_COMPUTE_RETURN_ERROR_ON_NULLPTR(input, input_to_input_weights, input_to_forget_weights, input_to_cell_weights, input_to_output_weights, recurrent_to_input_weights,
                                        recurrent_to_forget_weights, recurrent_to_cell_weights, recurrent_to_output_weights, input_gate_bias, forget_gate_bias, cell_bias, output_gate_bias, cell_state_in,
                                        output_state_in, cell_state_out, output_state_out);
    ARM_COMPUTE_RETURN_ERROR_ON_DATA_TYPE_NOT_IN(input, DataType::QASYMM8);

    const int input_size  = input->dimension(0);
    const int batch_size  = input->dimension(1);
    const int output_size = input_to_input_weights->dimension(1);

    // Dimensionality checks
    ARM_COMPUTE_RETURN_ERROR_ON(input->num_dimensions() > 2);
    ARM_COMPUTE_RETURN_ERROR_ON(input_to_input_weights->num_dimensions() > 2);
    ARM_COMPUTE_RETURN_ERROR_ON(input_gate_bias->num_dimensions() > 1);
    ARM_COMPUTE_RETURN_ERROR_ON(output_state_in->num_dimensions() > 2);

    TensorInfo input_weights_info(input_to_input_weights->clone()->set_tensor_shape(TensorShape(input_size, output_size)).set_data_type(DataType::QASYMM8));
    TensorInfo recurrent_weights_info(input_to_input_weights->clone()->set_tensor_shape(TensorShape(output_size, output_size)).set_data_type(DataType::QASYMM8));
    TensorInfo bias_info(input_gate_bias->clone()->set_tensor_shape(TensorShape(output_size)).set_data_type(DataType::S32));
    TensorInfo output_state_info(cell_state_in->clone()->set_tensor_shape(TensorShape(output_size, batch_size)).set_data_type(DataType::QASYMM8).set_quantization_info(qasymm));
    TensorInfo cell_state_info(cell_state_in->clone()->set_tensor_shape(TensorShape(output_size, batch_size)).set_data_type(DataType::QSYMM16).set_quantization_info(qsymm_4));

    // Shape checks
    ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_SHAPES(&input_weights_info, input_to_input_weights, input_to_forget_weights, input_to_cell_weights, input_to_output_weights);
    ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_SHAPES(&recurrent_weights_info, recurrent_to_input_weights, recurrent_to_forget_weights, recurrent_to_cell_weights, recurrent_to_output_weights);
    ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_SHAPES(&bias_info, input_gate_bias, forget_gate_bias, cell_bias, output_gate_bias);
    ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_SHAPES(&cell_state_info, cell_state_in);
    ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_SHAPES(&output_state_info, output_state_in);

    // Data type checks
    ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES(&input_weights_info, input, input_to_input_weights, input_to_forget_weights, input_to_cell_weights, input_to_output_weights);
    ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES(recurrent_to_input_weights, recurrent_to_forget_weights, recurrent_to_cell_weights, recurrent_to_output_weights);
    ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES(&bias_info, input_gate_bias, forget_gate_bias, cell_bias, output_gate_bias);
    ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES(&cell_state_info, cell_state_in);
    ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES(&output_state_info, output_state_in);

    // Quantization checks
    ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_QUANTIZATION_INFO(input_to_input_weights, input_to_forget_weights, input_to_cell_weights, input_to_output_weights);
    ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_QUANTIZATION_INFO(recurrent_to_input_weights, recurrent_to_forget_weights, recurrent_to_cell_weights, recurrent_to_output_weights);
    ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_QUANTIZATION_INFO(&cell_state_info, cell_state_in);
    ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_QUANTIZATION_INFO(&output_state_info, output_state_in);

    // Validate internal functions
    // _concat_input_weights
    std::vector<const ITensorInfo *> inputs_weights_vector;
    inputs_weights_vector.emplace_back(input_to_input_weights);
    inputs_weights_vector.emplace_back(input_to_forget_weights);
    inputs_weights_vector.emplace_back(input_to_cell_weights);
    inputs_weights_vector.emplace_back(input_to_output_weights);
    const QuantizationInfo qweights = input_to_input_weights->quantization_info(); // Weights quantization
    const TensorInfo       input_weights(TensorShape(input_size, 4 * output_size), 1, DataType::QASYMM8, qweights);
    ARM_COMPUTE_RETURN_ON_ERROR(CLConcatenateLayer::validate(inputs_weights_vector, &input_weights, Window::DimY));

    // _concat_recurrent_weights
    std::vector<const ITensorInfo *> recurrent_weights_vector;
    recurrent_weights_vector.emplace_back(recurrent_to_input_weights);
    recurrent_weights_vector.emplace_back(recurrent_to_forget_weights);
    recurrent_weights_vector.emplace_back(recurrent_to_cell_weights);
    recurrent_weights_vector.emplace_back(recurrent_to_output_weights);
    const TensorInfo recurrent_weights(TensorShape(output_size, 4 * output_size), 1, DataType::QASYMM8, qweights);
    ARM_COMPUTE_RETURN_ON_ERROR(CLConcatenateLayer::validate(recurrent_weights_vector, &recurrent_weights, Window::DimY));

    // _concat_weights
    std::vector<const ITensorInfo *> weights_vector;
    weights_vector.emplace_back(&recurrent_weights);
    weights_vector.emplace_back(&input_weights);
    const TensorInfo weights(TensorShape(input_size + output_size, 4 * output_size), 1, DataType::QASYMM8, qweights);
    ARM_COMPUTE_RETURN_ON_ERROR(CLConcatenateLayer::validate(weights_vector, &weights, Window::DimX));
    // _transpose_weights
    const TensorShape weights_transposed_shape(weights.tensor_shape()[1], weights.tensor_shape()[0]);
    TensorInfo        weights_transposed = weights.clone()->set_is_resizable(true).set_tensor_shape(weights_transposed_shape);
    ARM_COMPUTE_RETURN_ON_ERROR(CLTranspose::validate(&weights, &weights_transposed));

    // _concat_inputs
    std::vector<const ITensorInfo *> input_vector;
    input_vector.emplace_back(input);
    input_vector.emplace_back(output_state_in);
    TensorInfo input_concatenated(TensorShape(output_size + input_size, batch_size), 1, DataType::QASYMM8, qasymm);
    ARM_COMPUTE_RETURN_ON_ERROR(CLConcatenateLayer::validate(input_vector, &input_concatenated, Window::DimX));

    // _concat_bias
    std::vector<const ITensorInfo *> bias_vector;
    bias_vector.emplace_back(input_gate_bias);
    bias_vector.emplace_back(forget_gate_bias);
    bias_vector.emplace_back(cell_bias);
    bias_vector.emplace_back(output_gate_bias);

    const TensorInfo bias_concatenated(TensorShape(4 * output_size), 1, DataType::S32);
    ARM_COMPUTE_RETURN_ON_ERROR(CLConcatenateLayer::validate(bias_vector, &bias_concatenated, Window::DimX));

    // Invert the offset for gemmlowp
    input_concatenated.set_quantization_info(QuantizationInfo(qasymm.uniform().scale, -qasymm.uniform().offset));
    weights_transposed.set_quantization_info(QuantizationInfo(qweights.uniform().scale, -qweights.uniform().offset));

    // _gemmlowp
    const TensorInfo output_highp(TensorShape(4 * output_size, batch_size), 1, DataType::S32);
    ARM_COMPUTE_RETURN_ON_ERROR(CLGEMMLowpMatrixMultiplyCore::validate(&input_concatenated, &weights_transposed, nullptr, &output_highp));

    // Set the offset back
    input_concatenated.set_quantization_info(QuantizationInfo(qasymm.uniform().scale, qasymm.uniform().offset));
    weights_transposed.set_quantization_info(QuantizationInfo(qweights.uniform().scale, qweights.uniform().offset));

    const TensorInfo output_lowp(output_highp.tensor_shape(), 1, DataType::QSYMM16, qsymm_3);

    const float multiplier        = 4096.f * qasymm.uniform().scale * qweights.uniform().scale;
    int         output_multiplier = 0;
    int         output_shift      = 0;
    ARM_COMPUTE_RETURN_ON_ERROR(quantization::calculate_quantized_multiplier(multiplier, &output_multiplier, &output_shift));

    // _output_stage
    ARM_COMPUTE_RETURN_ON_ERROR(CLGEMMLowpQuantizeDownInt32ToInt16ScaleByFixedPoint::validate(&output_highp, &bias_concatenated, &output_lowp));

    TensorInfo input_gate_input;
    TensorInfo forget_gate_input;
    TensorInfo input_modulation_gate_input;
    TensorInfo output_gate_input;

    if(batch_size > 1)
    {
        // _slice_input_tensor
        input_gate_input = TensorInfo(TensorShape(output_size, batch_size), 1, DataType::QSYMM16, qsymm_3);
        ARM_COMPUTE_RETURN_ON_ERROR(CLSlice::validate(&output_lowp, &input_gate_input, { 0, 0 }, { output_size, batch_size }));
        // _slice_forget_tensor
        forget_gate_input = TensorInfo(TensorShape(output_size, batch_size), 1, DataType::QSYMM16, qsymm_3);
        ARM_COMPUTE_RETURN_ON_ERROR(CLSlice::validate(&output_lowp, &forget_gate_input, { output_size, 0 }, { 2 * output_size, batch_size }));
        // _slice_cell_tensor
        input_modulation_gate_input = TensorInfo(TensorShape(output_size, batch_size), 1, DataType::QSYMM16, qsymm_3);
        ARM_COMPUTE_RETURN_ON_ERROR(CLSlice::validate(&output_lowp, &input_modulation_gate_input, { 2 * output_size, 0 }, { 3 * output_size, batch_size }));
        // _slice_output_tensor
        output_gate_input = TensorInfo(TensorShape(output_size, batch_size), 1, DataType::QSYMM16, qsymm_3);
        ARM_COMPUTE_RETURN_ON_ERROR(CLSlice::validate(&output_lowp, &output_gate_input, { 3 * output_size, 0 }, { 4 * output_size, batch_size }));
    }
    else
    {
        // _slice_input_tensor
        input_gate_input = TensorInfo(TensorShape(output_size), 1, DataType::QSYMM16, qsymm_3);
        ARM_COMPUTE_RETURN_ON_ERROR(CLSlice::validate(&output_lowp, &input_gate_input, { 0 }, { output_size }));
        // _slice_forget_tensor
        forget_gate_input = TensorInfo(TensorShape(output_size), 1, DataType::QSYMM16, qsymm_3);
        ARM_COMPUTE_RETURN_ON_ERROR(CLSlice::validate(&output_lowp, &forget_gate_input, { output_size }, { 2 * output_size }));
        // _slice_cell_tensor
        input_modulation_gate_input = TensorInfo(TensorShape(output_size), 1, DataType::QSYMM16, qsymm_3);
        ARM_COMPUTE_RETURN_ON_ERROR(CLSlice::validate(&output_lowp, &input_modulation_gate_input, { 2 * output_size }, { 3 * output_size }));
        // _slice_output_tensor
        output_gate_input = TensorInfo(TensorShape(output_size), 1, DataType::QSYMM16, qsymm_3);
        ARM_COMPUTE_RETURN_ON_ERROR(CLSlice::validate(&output_lowp, &output_gate_input, { 3 * output_size }, { 4 * output_size }));
    }

    // _sigmoid_forget_gate
    const TensorInfo forget_gate_output(forget_gate_input.tensor_shape(), 1, DataType::QSYMM16, qsymm_0);
    ARM_COMPUTE_RETURN_ON_ERROR(CLActivationLayer::validate(&forget_gate_input, &forget_gate_output, ActivationLayerInfo(ActivationLayerInfo::ActivationFunction::LOGISTIC)));
    // _sigmoid_input_gate
    const TensorInfo input_gate_output(input_gate_input.tensor_shape(), 1, DataType::QSYMM16, qsymm_0);
    ARM_COMPUTE_RETURN_ON_ERROR(CLActivationLayer::validate(&input_gate_input, &input_gate_output, ActivationLayerInfo(ActivationLayerInfo::ActivationFunction::LOGISTIC)));
    // _tanh_modulation_gate
    const TensorInfo input_modulation_gate_output(input_modulation_gate_input.tensor_shape(), 1, DataType::QSYMM16, qsymm_0);
    ARM_COMPUTE_RETURN_ON_ERROR(CLActivationLayer::validate(&input_modulation_gate_input, &input_modulation_gate_output, ActivationLayerInfo(ActivationLayerInfo::ActivationFunction::TANH, 1.0f, 1.0f)));
    // _sigmoid_output_gate
    const TensorInfo output_gate_output(output_gate_input.tensor_shape(), 1, DataType::QSYMM16, qsymm_0);
    ARM_COMPUTE_RETURN_ON_ERROR(CLActivationLayer::validate(&output_gate_input, &output_gate_output, ActivationLayerInfo(ActivationLayerInfo::ActivationFunction::LOGISTIC)));

    // _mul_forget_gate_cell_state
    const TensorInfo cell_state_tmp1(forget_gate_output.tensor_shape(), 1, DataType::QSYMM16, qsymm_4);
    ARM_COMPUTE_RETURN_ON_ERROR(CLPixelWiseMultiplication::validate(&forget_gate_output, cell_state_in, &cell_state_tmp1, 1, ConvertPolicy::SATURATE, RoundingPolicy::TO_ZERO));

    // _mul_input_gate_input_mod_gate
    const TensorInfo cell_state_tmp2(input_gate_output.tensor_shape(), 1, DataType::QSYMM16, qsymm_4);
    ARM_COMPUTE_RETURN_ON_ERROR(CLPixelWiseMultiplication::validate(&input_gate_output, &input_modulation_gate_output, &cell_state_tmp2, 1, ConvertPolicy::SATURATE, RoundingPolicy::TO_ZERO));

    // _add_cell_state_tmps
    ARM_COMPUTE_RETURN_ON_ERROR(CLArithmeticAddition::validate(&cell_state_tmp1, &cell_state_tmp2, cell_state_out, ConvertPolicy::SATURATE));

    // _tanh_output_state
    const TensorInfo output_state_tmp(cell_state_out->tensor_shape(), 1, DataType::QSYMM16, qsymm_0);
    ARM_COMPUTE_RETURN_ON_ERROR(CLActivationLayer::validate(cell_state_out, &output_state_tmp, ActivationLayerInfo(ActivationLayerInfo::ActivationFunction::TANH, 1.0f, 1.0f)));

    // _mul_output_state_tmp_output_gate
    const TensorInfo output_state_out_symm(output_gate_output.tensor_shape(), 1, DataType::QSYMM16, qsymm_0);
    ARM_COMPUTE_RETURN_ON_ERROR(CLPixelWiseMultiplication::validate(&output_state_tmp, &output_gate_output, &output_state_out_symm, 1, ConvertPolicy::SATURATE, RoundingPolicy::TO_ZERO));

    // _dequantize
    const TensorInfo output_state_out_f32(output_state_out_symm.tensor_shape(), 1, DataType::F32);
    ARM_COMPUTE_RETURN_ON_ERROR(CLDequantizationLayer::validate(&output_state_out_symm, &output_state_out_f32));

    // _quantize
    ARM_COMPUTE_RETURN_ON_ERROR(CLQuantizationLayer::validate(&output_state_out_f32, output_state_out));

    if(cell_state_out->total_size() != 0)
    {
        ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES(&cell_state_info, cell_state_out);
        ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_SHAPES(&cell_state_info, cell_state_out);
        ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_QUANTIZATION_INFO(&cell_state_info, cell_state_out);
    }

    if(output_state_out->total_size() != 0)
    {
        ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES(&output_state_info, output_state_out);
        ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_SHAPES(&output_state_info, output_state_out);
        ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_QUANTIZATION_INFO(&output_state_info, output_state_out);
    }

    return Status{};
}

The documentation for this class was generated from the following files:

CLLSTMLayerQuantized.h
CLLSTMLayerQuantized.cpp