Compute Library
 21.02
CLQLSTMLayer Class Reference

Basic function to run CLQLSTMLayer.

#include <CLQLSTMLayer.h>


Public Member Functions

 CLQLSTMLayer (std::shared_ptr< IMemoryManager > memory_manager=nullptr)
 	Default constructor.
 
 CLQLSTMLayer (const CLQLSTMLayer &)=delete
 	Prevent instances of this class from being copied (as this class contains pointers).
 
 CLQLSTMLayer (CLQLSTMLayer &&)=default
 	Default move constructor.
 
CLQLSTMLayer & operator= (const CLQLSTMLayer &)=delete
 	Prevent instances of this class from being copied (as this class contains pointers).
 
CLQLSTMLayer & operator= (CLQLSTMLayer &&)=default
 	Default move assignment operator.
 
 ~CLQLSTMLayer ()
 	Default destructor.
 
void configure (const ICLTensor *input, const ICLTensor *input_to_forget_weights, const ICLTensor *input_to_cell_weights, const ICLTensor *input_to_output_weights, const ICLTensor *recurrent_to_forget_weights, const ICLTensor *recurrent_to_cell_weights, const ICLTensor *recurrent_to_output_weights, const ICLTensor *forget_gate_bias, const ICLTensor *cell_bias, const ICLTensor *output_gate_bias, ICLTensor *cell_state_in, ICLTensor *output_state_in, ICLTensor *cell_state_out, ICLTensor *output_state_out, ICLTensor *output, const LSTMParams< ICLTensor > &lstm_params)
 	Initialize function's tensors.
 
void configure (const CLCompileContext &compile_context, const ICLTensor *input, const ICLTensor *input_to_forget_weights, const ICLTensor *input_to_cell_weights, const ICLTensor *input_to_output_weights, const ICLTensor *recurrent_to_forget_weights, const ICLTensor *recurrent_to_cell_weights, const ICLTensor *recurrent_to_output_weights, const ICLTensor *forget_gate_bias, const ICLTensor *cell_bias, const ICLTensor *output_gate_bias, ICLTensor *cell_state_in, ICLTensor *output_state_in, ICLTensor *cell_state_out, ICLTensor *output_state_out, ICLTensor *output, const LSTMParams< ICLTensor > &lstm_params)
 	Initialize function's tensors.
 
void run () override
 	Run the kernels contained in the function.
 
void prepare () override
 	Prepare the function for executing.
 
- Public Member Functions inherited from IFunction

virtual ~IFunction ()=default
 	Destructor.
 

Static Public Member Functions

static Status validate (const ITensorInfo *input, const ITensorInfo *input_to_forget_weights, const ITensorInfo *input_to_cell_weights, const ITensorInfo *input_to_output_weights, const ITensorInfo *recurrent_to_forget_weights, const ITensorInfo *recurrent_to_cell_weights, const ITensorInfo *recurrent_to_output_weights, const ITensorInfo *forget_gate_bias, const ITensorInfo *cell_bias, const ITensorInfo *output_gate_bias, const ITensorInfo *cell_state_in, const ITensorInfo *output_state_in, const ITensorInfo *cell_state_out, const ITensorInfo *output_state_out, const ITensorInfo *output, const LSTMParams< ITensorInfo > &lstm_params)
 	Static function to check if given info will lead to a valid configuration of CLQLSTMLayer.
 

Detailed Description

Basic function to run CLQLSTMLayer.

This function calls the following CL functions/kernels:

  1. CLActivationLayer: Activation functions (tanh and logistic)
  2. CLCopy: Copy function for copying output_state_out to output
  3. CLArithmeticAddition: Elementwise addition and subtraction
  4. CLGEMMLowpMatrixMultiplyCore: Quantized matrix multiplication core. Accumulators are 32-bit integers
  5. CLGEMMLowpQuantizeDownInt32ToInt16ScaleByFixedPoint: Convert 32-bit integers into QSYMM16
  6. CLGEMMLowpMatrixAReductionKernel: For precomputing effective biases to use
  7. CLPixelWiseMultiplication: Elementwise multiplication
  8. CLTranspose: Transpose function for reshaping the weights

Definition at line 60 of file CLQLSTMLayer.h.
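
A minimal end-to-end usage sketch is shown below. All shapes, scales and zero points are illustrative assumptions for the example, not values mandated by the API; no LSTMParams setters beyond the mandatory QLSTM scale parameters are called, so the layer is assumed to run with the CIFG optimization (no input gate tensors).

    // Illustrative sketch only: every numeric value below is an assumption.
    #include "arm_compute/runtime/CL/CLScheduler.h"
    #include "arm_compute/runtime/CL/CLTensor.h"
    #include "arm_compute/runtime/CL/functions/CLQLSTMLayer.h"

    using namespace arm_compute;

    int main()
    {
        CLScheduler::get().default_init(); // Create the OpenCL context and queue

        const unsigned int input_size = 32, num_units = 16, output_size = 16, batch_size = 1;

        // init() only sets metadata; backing memory is allocated further down.
        auto init = [](CLTensor &t, const TensorShape &s, DataType dt, const QuantizationInfo &qi)
        {
            t.allocator()->init(TensorInfo(s, 1, dt, qi));
        };

        CLTensor input, i2f, i2c, i2o, r2f, r2c, r2o, bf, bc, bo;
        CLTensor cell_in, state_in, cell_out, state_out, output;

        init(input, TensorShape(input_size, batch_size), DataType::QASYMM8_SIGNED, QuantizationInfo(0.01f, 0));
        for(CLTensor *w : { &i2f, &i2c, &i2o })
            init(*w, TensorShape(input_size, num_units), DataType::QSYMM8, QuantizationInfo(0.01f));
        for(CLTensor *w : { &r2f, &r2c, &r2o })
            init(*w, TensorShape(output_size, num_units), DataType::QSYMM8, QuantizationInfo(0.01f));
        for(CLTensor *b : { &bf, &bc, &bo })
            init(*b, TensorShape(num_units), DataType::S32, QuantizationInfo());
        // Cell state scale is chosen as a power of two (2^-11), as the function derives a shift from it.
        init(cell_in, TensorShape(num_units, batch_size), DataType::QSYMM16, QuantizationInfo(1.f / 2048.f, 0));
        init(cell_out, TensorShape(num_units, batch_size), DataType::QSYMM16, QuantizationInfo(1.f / 2048.f, 0));
        init(state_in, TensorShape(output_size, batch_size), DataType::QASYMM8_SIGNED, QuantizationInfo(0.01f, 0));
        init(state_out, TensorShape(output_size, batch_size), DataType::QASYMM8_SIGNED, QuantizationInfo(0.01f, 0));
        init(output, TensorShape(output_size, batch_size), DataType::QASYMM8_SIGNED, QuantizationInfo(0.01f, 0));

        // Intermediate (pre-layer-norm) scales and hidden state quantization are mandatory for QLSTM.
        LSTMParams<ICLTensor> params;
        params.set_matmul_scale_params(0.007f, 0.007f, 0.007f, 0.007f);
        params.set_hidden_state_params(0, 0.007f);

        CLQLSTMLayer qlstm;
        qlstm.configure(&input, &i2f, &i2c, &i2o, &r2f, &r2c, &r2o, &bf, &bc, &bo,
                        &cell_in, &state_in, &cell_out, &state_out, &output, params);

        // Allocate backing memory (weights/state would be filled here), then execute.
        for(CLTensor *t : { &input, &i2f, &i2c, &i2o, &r2f, &r2c, &r2o, &bf, &bc, &bo,
                            &cell_in, &state_in, &cell_out, &state_out, &output })
            t->allocator()->allocate();

        qlstm.run();                // enqueues the kernels; does not block
        CLScheduler::get().sync();  // wait for the results
        return 0;
    }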

Constructor & Destructor Documentation

◆ CLQLSTMLayer() [1/3]

CLQLSTMLayer ( std::shared_ptr< IMemoryManager > memory_manager = nullptr )

Default constructor.

Definition at line 97 of file CLQLSTMLayer.cpp.

References CLTensorAllocator::allocate(), CLTensor::allocator(), ARM_COMPUTE_ERROR_ON, arm_compute::quantization::calculate_quantized_multiplier(), CLQLSTMLayerNormalizationKernel::configure(), CLGEMMLowpMatrixMultiplyCore::configure(), CLGEMMLowpOutputStage::configure(), GEMMLowpOutputStageInfo::gemmlowp_multiplier, GEMMLowpOutputStageInfo::gemmlowp_shift, ITensor::info(), ITensorAllocator::init(), MemoryGroup::manage(), CLQLSTMLayerNormalizationKernel::validate(), and CLQLSTMLayer::~CLQLSTMLayer().

    : _input_to_input_reduction(std::make_unique<CLGEMMLowpMatrixAReductionKernel>()),
      _recurrent_to_input_reduction(std::make_unique<CLGEMMLowpMatrixAReductionKernel>()),
      _input_to_forget_reduction(std::make_unique<CLGEMMLowpMatrixAReductionKernel>()),
      _recurrent_to_forget_reduction(std::make_unique<CLGEMMLowpMatrixAReductionKernel>()),
      _input_to_cell_reduction(std::make_unique<CLGEMMLowpMatrixAReductionKernel>()),
      _recurrent_to_cell_reduction(std::make_unique<CLGEMMLowpMatrixAReductionKernel>()),
      _input_to_output_reduction(std::make_unique<CLGEMMLowpMatrixAReductionKernel>()),
      _recurrent_to_output_reduction(std::make_unique<CLGEMMLowpMatrixAReductionKernel>()),
      _projection_reduction(std::make_unique<CLGEMMLowpMatrixAReductionKernel>()),
      _layer_norms(),
      _copy_output()
    {
        for(auto &norm : _layer_norms)
        {
            norm = std::make_unique<CLQLSTMLayerNormalizationKernel>();
        }

        _memory_group = MemoryGroup(std::move(memory_manager));
    }
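
The memory_manager argument lets several functions share backing memory for their intermediate tensors. A minimal sketch of constructing the layer with an on-demand memory manager built from the library's generic lifetime/pool managers (the single-pool choice is an assumption suited to sequential execution):

    #include "arm_compute/runtime/BlobLifetimeManager.h"
    #include "arm_compute/runtime/CL/CLBufferAllocator.h"
    #include "arm_compute/runtime/CL/functions/CLQLSTMLayer.h"
    #include "arm_compute/runtime/MemoryManagerOnDemand.h"
    #include "arm_compute/runtime/PoolManager.h"

    using namespace arm_compute;

    void memory_manager_example()
    {
        // The lifetime manager tracks when intermediate tensors are alive; the
        // pool manager hands out memory pools to the functions that need them.
        auto lifetime_mgr = std::make_shared<BlobLifetimeManager>();
        auto pool_mgr     = std::make_shared<PoolManager>();
        auto mm           = std::make_shared<MemoryManagerOnDemand>(lifetime_mgr, pool_mgr);

        // Functions constructed with the same manager can share scratch memory.
        CLQLSTMLayer qlstm(mm);

        // ... configure qlstm and allocate its tensors ...

        // Back the manager with device memory before the first run.
        CLBufferAllocator alloc;
        mm->populate(alloc, 1 /* num_pools */);
    }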

◆ CLQLSTMLayer() [2/3]

CLQLSTMLayer ( const CLQLSTMLayer & )
delete

Prevent instances of this class from being copied (As this class contains pointers)

◆ CLQLSTMLayer() [3/3]

CLQLSTMLayer ( CLQLSTMLayer && )
default

Default move constructor.

◆ ~CLQLSTMLayer()

~CLQLSTMLayer ( )
default

Default destructor.

Referenced by CLQLSTMLayer::CLQLSTMLayer().

Member Function Documentation

◆ configure() [1/2]

void configure ( const ICLTensor *             input,
                 const ICLTensor *             input_to_forget_weights,
                 const ICLTensor *             input_to_cell_weights,
                 const ICLTensor *             input_to_output_weights,
                 const ICLTensor *             recurrent_to_forget_weights,
                 const ICLTensor *             recurrent_to_cell_weights,
                 const ICLTensor *             recurrent_to_output_weights,
                 const ICLTensor *             forget_gate_bias,
                 const ICLTensor *             cell_bias,
                 const ICLTensor *             output_gate_bias,
                 ICLTensor *                   cell_state_in,
                 ICLTensor *                   output_state_in,
                 ICLTensor *                   cell_state_out,
                 ICLTensor *                   output_state_out,
                 ICLTensor *                   output,
                 const LSTMParams< ICLTensor > &lstm_params
               )

Initialize function's tensors.

Parameters
    [in]  input                        Source tensor. Input is a 2D tensor with dimensions [input_size, batch_size]. Data types supported: QASYMM8_SIGNED.
    [in]  input_to_forget_weights      2D weights tensor with dimensions [input_size, num_units]. Data type supported: QSYMM8.
    [in]  input_to_cell_weights        2D weights tensor with dimensions [input_size, num_units]. Data type supported: QSYMM8.
    [in]  input_to_output_weights      2D weights tensor with dimensions [input_size, num_units]. Data type supported: QSYMM8.
    [in]  recurrent_to_forget_weights  2D weights tensor with dimensions [output_size, num_units]. Data type supported: QSYMM8.
    [in]  recurrent_to_cell_weights    2D weights tensor with dimensions [output_size, num_units]. Data type supported: QSYMM8.
    [in]  recurrent_to_output_weights  2D weights tensor with dimensions [output_size, num_units]. Data type supported: QSYMM8.
    [in]  forget_gate_bias             1D weights tensor with dimensions [num_units]. Data type supported: S32.
    [in]  cell_bias                    1D weights tensor with dimensions [num_units]. Data type supported: S32.
    [in]  output_gate_bias             1D weights tensor with dimensions [num_units]. Data type supported: S32.
    [in]  cell_state_in                2D tensor with dimensions [num_units, batch_size]. Data type supported: QSYMM16.
    [in]  output_state_in              2D tensor with dimensions [output_size, batch_size]. Data type supported: Same as input.
    [out] cell_state_out               Destination tensor. Output is a 2D tensor with dimensions [num_units, batch_size]. Data type supported: QSYMM16.
    [out] output_state_out             Destination tensor. Output is a 2D tensor with dimensions [output_size, batch_size]. Data types supported: Same as input.
    [out] output                       Destination tensor. Output is a 2D tensor with dimensions [output_size, batch_size]. Data types supported: Same as input.
    [in]  lstm_params                  Weights tensors used in peephole, CIFG and layer normalization optimizations:
          - input_intermediate_scale   Scale of the intermediate result of matmul, i.e. input to layer normalization, at input gate.
          - forget_intermediate_scale  Scale of the intermediate result of matmul, i.e. input to layer normalization, at forget gate.
          - cell_intermediate_scale    Scale of the intermediate result of matmul, i.e. input to layer normalization, at cell gate.
          - output_intermediate_scale  Scale of the intermediate result of matmul, i.e. input to layer normalization, at output gate.
          - hidden_state_zero          The zero point of the hidden state.
          - hidden_state_scale         The scale of the hidden state.
          - input_to_input_weights     (Optional) 2D weights tensor with dimensions [input_size, num_units]. Data type supported: QSYMM8.
          - recurrent_to_input_weights (Optional) 2D weights tensor with dimensions [output_size, num_units]. Data type supported: QSYMM8.
          - cell_to_input_weights      (Optional) 1D weights tensor with dimensions [num_units]. Can be nullptr. Data type supported: QSYMM16.
          - cell_to_forget_weights     (Optional) 1D weights tensor with dimensions [num_units]. Data type supported: QSYMM16.
          - cell_to_output_weights     (Optional) 1D weights tensor with dimensions [num_units]. Data type supported: QSYMM16.
          - input_gate_bias            (Optional) 1D weights tensor with dimensions [num_units]. Data type supported: S32.
          - projection_weights         (Optional) 2D weights tensor with dimensions [output_size, num_units]. Data type supported: QSYMM8.
          - projection_bias            (Optional) 1D weights tensor with dimensions [output_size]. Data type supported: S32.
          - input_layer_norm_weights   (Optional) 1D weights tensor with dimensions [num_units]. Data type supported: QSYMM16.
          - forget_layer_norm_weights  (Optional) 1D weights tensor with dimensions [num_units]. Data type supported: QSYMM16.
          - cell_layer_norm_weights    (Optional) 1D weights tensor with dimensions [num_units]. Data type supported: QSYMM16.
          - output_layer_norm_weights  (Optional) 1D weights tensor with dimensions [num_units]. Data type supported: QSYMM16.
          - cell_threshold             (Optional) The clipping threshold for the cell state, such that values are bound within [-cell_clip, cell_clip]. If set to 0.0 then clipping is disabled.
          - projection_threshold       (Optional) The clipping threshold for the output from the projection layer, such that values are bound within [-proj_clip, proj_clip]. If set to 0.0 then clipping is disabled.

Definition at line 162 of file CLQLSTMLayer.cpp.

References CLKernelLibrary::get().

    {
        configure(CLKernelLibrary::get().get_compile_context(), input, input_to_forget_weights, input_to_cell_weights, input_to_output_weights,
                  recurrent_to_forget_weights, recurrent_to_cell_weights, recurrent_to_output_weights, forget_gate_bias, cell_bias, output_gate_bias,
                  cell_state_in, output_state_in, cell_state_out, output_state_out, output, lstm_params);
    }

◆ configure() [2/2]

void configure ( const CLCompileContext &      compile_context,
                 const ICLTensor *             input,
                 const ICLTensor *             input_to_forget_weights,
                 const ICLTensor *             input_to_cell_weights,
                 const ICLTensor *             input_to_output_weights,
                 const ICLTensor *             recurrent_to_forget_weights,
                 const ICLTensor *             recurrent_to_cell_weights,
                 const ICLTensor *             recurrent_to_output_weights,
                 const ICLTensor *             forget_gate_bias,
                 const ICLTensor *             cell_bias,
                 const ICLTensor *             output_gate_bias,
                 ICLTensor *                   cell_state_in,
                 ICLTensor *                   output_state_in,
                 ICLTensor *                   cell_state_out,
                 ICLTensor *                   output_state_out,
                 ICLTensor *                   output,
                 const LSTMParams< ICLTensor > &lstm_params
               )

Initialize function's tensors.

Parameters
    [in]  compile_context              The compile context to be used.
    [in]  input                        Source tensor. Input is a 2D tensor with dimensions [input_size, batch_size]. Data types supported: QASYMM8_SIGNED.
    [in]  input_to_forget_weights      2D weights tensor with dimensions [input_size, num_units]. Data type supported: QSYMM8.
    [in]  input_to_cell_weights        2D weights tensor with dimensions [input_size, num_units]. Data type supported: QSYMM8.
    [in]  input_to_output_weights      2D weights tensor with dimensions [input_size, num_units]. Data type supported: QSYMM8.
    [in]  recurrent_to_forget_weights  2D weights tensor with dimensions [output_size, num_units]. Data type supported: QSYMM8.
    [in]  recurrent_to_cell_weights    2D weights tensor with dimensions [output_size, num_units]. Data type supported: QSYMM8.
    [in]  recurrent_to_output_weights  2D weights tensor with dimensions [output_size, num_units]. Data type supported: QSYMM8.
    [in]  forget_gate_bias             1D weights tensor with dimensions [num_units]. Data type supported: S32.
    [in]  cell_bias                    1D weights tensor with dimensions [num_units]. Data type supported: S32.
    [in]  output_gate_bias             1D weights tensor with dimensions [num_units]. Data type supported: S32.
    [in]  cell_state_in                2D tensor with dimensions [num_units, batch_size]. Data type supported: QSYMM16.
    [in]  output_state_in              2D tensor with dimensions [output_size, batch_size]. Data type supported: Same as input.
    [out] cell_state_out               Destination tensor. Output is a 2D tensor with dimensions [num_units, batch_size]. Data type supported: QSYMM16.
    [out] output_state_out             Destination tensor. Output is a 2D tensor with dimensions [output_size, batch_size]. Data types supported: Same as input.
    [out] output                       Destination tensor. Output is a 2D tensor with dimensions [output_size, batch_size]. Data types supported: Same as input.
    [in]  lstm_params                  Weights tensors used in peephole, CIFG and layer normalization optimizations (a sketch of assembling these parameters follows the list):
          - input_intermediate_scale   Scale of the intermediate result of matmul, i.e. input to layer normalization, at input gate.
          - forget_intermediate_scale  Scale of the intermediate result of matmul, i.e. input to layer normalization, at forget gate.
          - cell_intermediate_scale    Scale of the intermediate result of matmul, i.e. input to layer normalization, at cell gate.
          - output_intermediate_scale  Scale of the intermediate result of matmul, i.e. input to layer normalization, at output gate.
          - hidden_state_zero          The zero point of the hidden state.
          - hidden_state_scale         The scale of the hidden state.
          - input_to_input_weights     (Optional) 2D weights tensor with dimensions [input_size, num_units]. Data type supported: QSYMM8.
          - recurrent_to_input_weights (Optional) 2D weights tensor with dimensions [output_size, num_units]. Data type supported: QSYMM8.
          - cell_to_input_weights      (Optional) 1D weights tensor with dimensions [num_units]. Can be nullptr. Data type supported: QSYMM16.
          - cell_to_forget_weights     (Optional) 1D weights tensor with dimensions [num_units]. Data type supported: QSYMM16.
          - cell_to_output_weights     (Optional) 1D weights tensor with dimensions [num_units]. Data type supported: QSYMM16.
          - input_gate_bias            (Optional) 1D weights tensor with dimensions [num_units]. Data type supported: S32.
          - projection_weights         (Optional) 2D weights tensor with dimensions [output_size, num_units]. Data type supported: QSYMM8.
          - projection_bias            (Optional) 1D weights tensor with dimensions [output_size]. Data type supported: S32.
          - input_layer_norm_weights   (Optional) 1D weights tensor with dimensions [num_units]. Data type supported: QSYMM16.
          - forget_layer_norm_weights  (Optional) 1D weights tensor with dimensions [num_units]. Data type supported: QSYMM16.
          - cell_layer_norm_weights    (Optional) 1D weights tensor with dimensions [num_units]. Data type supported: QSYMM16.
          - output_layer_norm_weights  (Optional) 1D weights tensor with dimensions [num_units]. Data type supported: QSYMM16.
          - cell_threshold             (Optional) The clipping threshold for the cell state, such that values are bound within [-cell_clip, cell_clip]. If set to 0.0 then clipping is disabled.
          - projection_threshold       (Optional) The clipping threshold for the output from the projection layer, such that values are bound within [-proj_clip, proj_clip]. If set to 0.0 then clipping is disabled.
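
The optional behaviours above are enabled by populating the LSTMParams object before calling configure(). A hedged sketch follows; the tensor variables are assumed to be already-initialized CLTensor objects matching the shapes and data types listed above, the numeric values are arbitrary, and the setter names are taken from LSTMParams.h (verify against your library version):

    LSTMParams<ICLTensor> params;

    // Mandatory for QLSTM: intermediate scales and hidden state quantization.
    params.set_matmul_scale_params(0.007f, 0.007f, 0.007f, 0.007f); // input/forget/cell/output intermediate scales
    params.set_hidden_state_params(0, 0.007f);                      // hidden_state_zero, hidden_state_scale

    // Optional: provide the input gate tensors to disable the CIFG optimization.
    params.set_cifg_params(&input_to_input_weights, &recurrent_to_input_weights,
                           &cell_to_input_weights, &input_gate_bias);

    // Optional: peephole connections, projection layer, layer normalization and clipping.
    params.set_peephole_params(&cell_to_forget_weights, &cell_to_output_weights);
    params.set_projection_params(&projection_weights, &projection_bias);
    params.set_layer_normalization_params(&input_ln_weights, &forget_ln_weights,
                                          &cell_ln_weights, &output_ln_weights);
    params.set_cell_clip_params(2.f);
    params.set_projection_clip_params(0.5f);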

Definition at line 175 of file CLQLSTMLayer.cpp.

References CLTensorAllocator::allocate(), CLTensor::allocator(), ARM_COMPUTE_ERROR_ON_NULLPTR, ARM_COMPUTE_ERROR_THROW_ON, arm_compute::utils::info_helpers::build_lstm_params_tensor_info(), arm_compute::quantization::calculate_quantized_multiplier(), LSTMParams< T >::cell_clip(), LSTMParams< T >::cell_intermediate_scale(), LSTMParams< T >::cell_layer_norm_weights(), LSTMParams< T >::cell_to_forget_weights(), LSTMParams< T >::cell_to_input_weights(), LSTMParams< T >::cell_to_output_weights(), CLTranspose::configure(), CLCopy::configure(), CLActivationLayer::configure(), CLArithmeticAddition::configure(), CLArithmeticSubtraction::configure(), CLPixelWiseMultiplication::configure(), CLGEMMLowpOutputStage::configure(), ITensorInfo::data_type(), ITensorInfo::dimension(), LSTMParams< T >::forget_intermediate_scale(), LSTMParams< T >::forget_layer_norm_weights(), GEMMLowpOutputStageInfo::gemmlowp_max_bound, GEMMLowpOutputStageInfo::gemmlowp_min_bound, GEMMLowpOutputStageInfo::gemmlowp_multiplier, GEMMLowpOutputStageInfo::gemmlowp_offset, GEMMLowpOutputStageInfo::gemmlowp_shift, LSTMParams< T >::has_cifg_opt(), LSTMParams< T >::has_peephole_opt(), LSTMParams< T >::has_projection(), LSTMParams< T >::hidden_state_scale(), LSTMParams< T >::hidden_state_zero(), ITensor::info(), CLTensor::info(), ITensorAllocator::init(), LSTMParams< T >::input_gate_bias(), LSTMParams< T >::input_intermediate_scale(), LSTMParams< T >::input_layer_norm_weights(), arm_compute::test::validation::input_to_cell_weights, arm_compute::test::validation::input_to_forget_weights, LSTMParams< T >::input_to_input_weights(), arm_compute::test::validation::input_to_output_weights, ActivationLayerInfo::LOGISTIC, arm_compute::support::cpp11::lowest(), ActivationLayerInfo::LU_BOUNDED_RELU, MemoryGroup::manage(), UniformQuantizationInfo::offset, GEMMLowpOutputStageInfo::output_data_type, LSTMParams< T >::output_intermediate_scale(), LSTMParams< T >::output_layer_norm_weights(), arm_compute::test::validation::output_size, LSTMParams< T >::projection_bias(), LSTMParams< T >::projection_clip(), LSTMParams< T >::projection_weights(), arm_compute::QASYMM8_SIGNED, arm_compute::QSYMM16, ITensorInfo::quantization_info(), TensorInfo::quantization_info(), arm_compute::QUANTIZE_DOWN_FIXEDPOINT, arm_compute::quantize_qsymm16(), arm_compute::test::validation::recurrent_to_cell_weights, arm_compute::test::validation::recurrent_to_forget_weights, LSTMParams< T >::recurrent_to_input_weights(), arm_compute::test::validation::recurrent_to_output_weights, arm_compute::S32, arm_compute::SATURATE, UniformQuantizationInfo::scale, TensorInfo::set_tensor_shape(), ActivationLayerInfo::TANH, ITensorInfo::tensor_shape(), TensorInfo::tensor_shape(), arm_compute::TO_ZERO, GEMMLowpOutputStageInfo::type, QuantizationInfo::uniform(), LSTMParams< T >::use_layer_norm(), and CLQLSTMLayer::validate().

    {
        ARM_COMPUTE_ERROR_ON_NULLPTR(input, input_to_forget_weights, input_to_cell_weights, input_to_output_weights,
                                     recurrent_to_forget_weights, recurrent_to_cell_weights, recurrent_to_output_weights,
                                     forget_gate_bias, cell_bias, output_gate_bias, cell_state_in, output_state_in,
                                     cell_state_out, output_state_out, output);

        // Set lstm parameters
        LSTMParams<ITensorInfo> lstm_params_info{};
        build_lstm_params_tensor_info(lstm_params, &lstm_params_info);

        // Validate
        ARM_COMPUTE_ERROR_THROW_ON(CLQLSTMLayer::validate(input->info(), input_to_forget_weights->info(), input_to_cell_weights->info(), input_to_output_weights->info(),
                                                          recurrent_to_forget_weights->info(), recurrent_to_cell_weights->info(), recurrent_to_output_weights->info(),
                                                          forget_gate_bias->info(), cell_bias->info(), output_gate_bias->info(),
                                                          cell_state_in->info(), output_state_in->info(), cell_state_out->info(), output_state_out->info(), output->info(),
                                                          lstm_params_info));

        const int batch_size  = input->info()->dimension(1);
        const int num_units   = input_to_output_weights->info()->dimension(1);
        const int output_size = output_state_out->info()->dimension(_out_state_output_size_dimension_idx);

        const UniformQuantizationInfo qinput           = input->info()->quantization_info().uniform();
        const UniformQuantizationInfo qcell_state_in   = cell_state_in->info()->quantization_info().uniform();
        const UniformQuantizationInfo qoutput_state_in = output_state_in->info()->quantization_info().uniform();

        _projection_bias             = lstm_params.projection_bias();
        _input_to_forget_weights     = input_to_forget_weights;
        _input_to_cell_weights       = input_to_cell_weights;
        _input_to_output_weights     = input_to_output_weights;
        _recurrent_to_forget_weights = recurrent_to_forget_weights;
        _recurrent_to_cell_weights   = recurrent_to_cell_weights;
        _recurrent_to_output_weights = recurrent_to_output_weights;
        _projection_weights          = lstm_params.projection_weights();

        // Layer normalization
        _has_layer_norm = lstm_params.use_layer_norm();
        if(_has_layer_norm)
        {
            set_layer_norm_weight(lstm_params.forget_layer_norm_weights(), LayerNormGate::Forget);
            set_layer_norm_weight(lstm_params.cell_layer_norm_weights(), LayerNormGate::Cell);
            set_layer_norm_weight(lstm_params.input_layer_norm_weights(), LayerNormGate::Input);
            set_layer_norm_weight(lstm_params.output_layer_norm_weights(), LayerNormGate::Output);

            set_layer_norm_bias(forget_gate_bias, LayerNormGate::Forget);
            set_layer_norm_bias(cell_bias, LayerNormGate::Cell);
            set_layer_norm_bias(lstm_params.input_gate_bias(), LayerNormGate::Input);
            set_layer_norm_bias(output_gate_bias, LayerNormGate::Output);
        }

        _has_cifg       = lstm_params.has_cifg_opt();
        _has_projection = lstm_params.has_projection();
        _has_peephole   = lstm_params.has_peephole_opt();

        // Calculate and decompose effective scales for optimizing matmul calculation
        const int32_t cell_shift = log2(qcell_state_in.scale);

        // Calculate quantized parameters for clipping.
        int16_t quantized_cell_clip = 0;
        if(lstm_params.cell_clip() > 0.0f)
        {
            quantized_cell_clip = quantize_qsymm16(lstm_params.cell_clip(), qcell_state_in);
        }
        _has_cell_clipping = quantized_cell_clip > 0;

        // Precompute effective bias for optimizing the matmul computations.
        if(!_has_cifg)
        {
            _input_to_input_weights     = lstm_params.input_to_input_weights();
            _recurrent_to_input_weights = lstm_params.recurrent_to_input_weights();

            _input_to_input_reduction->configure(compile_context, _input_to_input_weights, &_input_to_input_eff_bias, GEMMLowpReductionKernelInfo(num_units, false, -qinput.offset, true));
            _recurrent_to_input_reduction->configure(compile_context, _recurrent_to_input_weights, &_recurrent_to_input_eff_bias, GEMMLowpReductionKernelInfo(num_units, false, -qoutput_state_in.offset, true));
        }
        _input_to_forget_reduction->configure(compile_context, input_to_forget_weights, &_input_to_forget_eff_bias, GEMMLowpReductionKernelInfo(num_units, false, -qinput.offset, true));
        _recurrent_to_forget_reduction->configure(compile_context, recurrent_to_forget_weights, &_recurrent_to_forget_eff_bias, GEMMLowpReductionKernelInfo(num_units, false, -qoutput_state_in.offset, true));
        _input_to_cell_reduction->configure(compile_context, input_to_cell_weights, &_input_to_cell_eff_bias, GEMMLowpReductionKernelInfo(num_units, false, -qinput.offset, true));
        _recurrent_to_cell_reduction->configure(compile_context, recurrent_to_cell_weights, &_recurrent_to_cell_eff_bias, GEMMLowpReductionKernelInfo(num_units, false, -qoutput_state_in.offset, true));
        _input_to_output_reduction->configure(compile_context, input_to_output_weights, &_input_to_output_eff_bias, GEMMLowpReductionKernelInfo(num_units, false, -qinput.offset, true));
        _recurrent_to_output_reduction->configure(compile_context, recurrent_to_output_weights, &_recurrent_to_output_eff_bias, GEMMLowpReductionKernelInfo(num_units, false, -qoutput_state_in.offset, true));
        if(_has_projection)
        {
            _projection_reduction->configure(compile_context, _projection_weights, &_projection_eff_bias, GEMMLowpReductionKernelInfo(output_size, false, lstm_params.hidden_state_zero(), true));
            if(_projection_bias != nullptr)
            {
                _projection_bias_add.configure(compile_context, _projection_bias, &_projection_eff_bias, &_projection_eff_bias, ConvertPolicy::SATURATE);
            }
        }

        // Pre-transpose weights to be used in GEMM.
        _transpose_input_to_forget_weights.configure(compile_context, input_to_forget_weights, &_input_to_forget_weights_transposed);
        _transpose_input_to_cell_weights.configure(compile_context, input_to_cell_weights, &_input_to_cell_weights_transposed);
        _transpose_input_to_output_weights.configure(compile_context, input_to_output_weights, &_input_to_output_weights_transposed);
        _transpose_recurrent_to_forget_weights.configure(compile_context, recurrent_to_forget_weights, &_recurrent_to_forget_weights_transposed);
        _transpose_recurrent_to_cell_weights.configure(compile_context, recurrent_to_cell_weights, &_recurrent_to_cell_weights_transposed);
        _transpose_recurrent_to_output_weights.configure(compile_context, recurrent_to_output_weights, &_recurrent_to_output_weights_transposed);
        if(!_has_cifg)
        {
            _transpose_input_to_input_weights.configure(compile_context, lstm_params.input_to_input_weights(), &_input_to_input_weights_transposed);
            _transpose_recurrent_to_input_weights.configure(compile_context, lstm_params.recurrent_to_input_weights(), &_recurrent_to_input_weights_transposed);
        }
        if(_has_projection)
        {
            _transpose_projection_weights.configure(compile_context, _projection_weights, &_projection_weights_transposed);
        }

        GEMMLowpOutputStageInfo gemmlowp_info;
        gemmlowp_info.type               = GEMMLowpOutputStageType::QUANTIZE_DOWN_FIXEDPOINT;
        gemmlowp_info.gemmlowp_min_bound = std::numeric_limits<int16_t>::lowest();
        gemmlowp_info.gemmlowp_max_bound = std::numeric_limits<int16_t>::max();
        gemmlowp_info.output_data_type   = DataType::QSYMM16;

        const TensorInfo mm_out_info(TensorShape(num_units, batch_size), 1, DataType::S32);
        // Forget gate.
        const TensorInfo forget_gate_outstage_info(mm_out_info.tensor_shape(), 1, DataType::QSYMM16, QuantizationInfo(lstm_params.forget_intermediate_scale(), 0));
        const float      input_to_forget_scale = input_to_forget_weights->info()->quantization_info().uniform().scale * qinput.scale / lstm_params.forget_intermediate_scale();
        configure_mm(compile_context, _mm_input_to_forget, _input_to_forget_outstage, gemmlowp_info,
                     input, &_input_to_forget_weights_transposed, &_input_to_forget_eff_bias,
                     &_mm_input_to_forget_res, &_input_to_forget_outstage_res, input_to_forget_scale,
                     mm_out_info, forget_gate_outstage_info);

        const float recurrent_to_forget_scale = recurrent_to_forget_weights->info()->quantization_info().uniform().scale * qoutput_state_in.scale / lstm_params.forget_intermediate_scale();
        configure_mm(compile_context, _mm_recurrent_to_forget, _recurrent_to_forget_outstage, gemmlowp_info,
                     output_state_in, &_recurrent_to_forget_weights_transposed, &_recurrent_to_forget_eff_bias,
                     &_mm_recurrent_to_forget_res, &_recurrent_to_forget_outstage_res, recurrent_to_forget_scale,
                     mm_out_info, forget_gate_outstage_info);

        _accumulate_input_recurrent_forget.configure(compile_context, &_input_to_forget_outstage_res, &_recurrent_to_forget_outstage_res, &_recurrent_to_forget_outstage_res,
                                                     ConvertPolicy::SATURATE);
        _input_to_forget_outstage_res.allocator()->allocate();

        if(_has_peephole)
        {
            _mul_cell_to_forget_res.allocator()->init(TensorInfo(cell_state_in->info()->tensor_shape(), 1, DataType::S32));
            _memory_group.manage(&_mul_cell_to_forget_res);
            _pixelwise_mul_cell_to_forget.configure(compile_context, cell_state_in, lstm_params.cell_to_forget_weights(), &_mul_cell_to_forget_res, 1.f, ConvertPolicy::SATURATE, RoundingPolicy::TO_ZERO);
            _cell_to_forget_outstage_res.allocator()->init(TensorInfo(_mul_cell_to_forget_res.info()->tensor_shape(), 1, DataType::QSYMM16, QuantizationInfo(lstm_params.forget_intermediate_scale(), 0)));
            _memory_group.manage(&_cell_to_forget_outstage_res);
            const float cell_to_forget_scale = std::pow(2, cell_shift) * lstm_params.cell_to_forget_weights()->info()->quantization_info().uniform().scale / lstm_params.forget_intermediate_scale();
            quantization::calculate_quantized_multiplier(cell_to_forget_scale, &gemmlowp_info.gemmlowp_multiplier, &gemmlowp_info.gemmlowp_shift);
            _cell_to_forget_outstage.configure(compile_context, &_mul_cell_to_forget_res, nullptr, &_cell_to_forget_outstage_res, gemmlowp_info);
            _mul_cell_to_forget_res.allocator()->allocate();
            _accumulate_cell_forget.configure(compile_context, &_recurrent_to_forget_outstage_res, &_cell_to_forget_outstage_res, &_recurrent_to_forget_outstage_res,
                                              ConvertPolicy::SATURATE);
            _cell_to_forget_outstage_res.allocator()->allocate();
        }

        CLTensor *forget_activation_input = &_recurrent_to_forget_outstage_res;

        if(_has_layer_norm)
        {
            configure_layer_norm(LayerNormGate::Forget, &_recurrent_to_forget_outstage_res);
            _recurrent_to_forget_outstage_res.allocator()->allocate();
            forget_activation_input = &get_layer_norm_output(LayerNormGate::Forget);
        }

        // Output quantization info of Sigmoid and Tanh activations
        const QuantizationInfo sigmoid_tanh_outqinfo(1.f / 32768.f, 0);

        const TensorInfo forget_gate_info(TensorShape(num_units, batch_size), 1, DataType::QSYMM16, sigmoid_tanh_outqinfo);
        _memory_group.manage(&_forget_gate);
        _forget_gate.allocator()->init(forget_gate_info);
        _forget_gate_sigmoid.configure(compile_context, forget_activation_input, &_forget_gate, ActivationLayerInfo(ActivationLayerInfo::ActivationFunction::LOGISTIC));
        forget_activation_input->allocator()->allocate();

        // Modulation gate.
        const TensorInfo cell_outstage_info(mm_out_info.tensor_shape(), 1, DataType::QSYMM16, QuantizationInfo(lstm_params.cell_intermediate_scale(), 0));
        const float      input_to_cell_scale = input_to_cell_weights->info()->quantization_info().uniform().scale * qinput.scale / lstm_params.cell_intermediate_scale();
        configure_mm(compile_context, _mm_input_to_cell, _input_to_cell_outstage, gemmlowp_info,
                     input, &_input_to_cell_weights_transposed, &_input_to_cell_eff_bias,
                     &_mm_input_to_cell_res, &_input_to_cell_outstage_res, input_to_cell_scale,
                     mm_out_info, cell_outstage_info);

        const float recurrent_to_cell_scale = recurrent_to_cell_weights->info()->quantization_info().uniform().scale * qoutput_state_in.scale / lstm_params.cell_intermediate_scale();
        configure_mm(compile_context, _mm_recurrent_to_cell, _recurrent_to_cell_outstage, gemmlowp_info,
                     output_state_in, &_recurrent_to_cell_weights_transposed, &_recurrent_to_cell_eff_bias,
                     &_mm_recurrent_to_cell_res, &_recurrent_to_cell_outstage_res, recurrent_to_cell_scale,
                     mm_out_info, cell_outstage_info);

        _accumulate_input_recurrent_modulation.configure(compile_context, &_input_to_cell_outstage_res, &_recurrent_to_cell_outstage_res, &_recurrent_to_cell_outstage_res,
                                                         ConvertPolicy::SATURATE);
        _input_to_cell_outstage_res.allocator()->allocate();

        CLTensor *cell_activation_input = &_recurrent_to_cell_outstage_res;

        if(_has_layer_norm)
        {
            configure_layer_norm(LayerNormGate::Cell, &_recurrent_to_cell_outstage_res);
            _recurrent_to_cell_outstage_res.allocator()->allocate();
            cell_activation_input = &get_layer_norm_output(LayerNormGate::Cell);
        }

        const TensorInfo cell_gate_info(TensorShape(num_units, batch_size), 1, DataType::QSYMM16, sigmoid_tanh_outqinfo);
        _memory_group.manage(&_cell_gate);
        _cell_gate.allocator()->init(cell_gate_info);
        _cell_gate_tanh.configure(compile_context, cell_activation_input, &_cell_gate, ActivationLayerInfo(ActivationLayerInfo::ActivationFunction::TANH, 1.f, 1.f));
        cell_activation_input->allocator()->allocate();

        // Input gate.
        const TensorInfo input_gate_info(TensorShape(num_units, batch_size), 1, DataType::QSYMM16, sigmoid_tanh_outqinfo);
        _input_gate.allocator()->init(input_gate_info);
        _memory_group.manage(&_input_gate);
        if(_has_cifg)
        {
            _ones.allocator()->init(*_forget_gate.info());
            _input_gate_sub.configure(compile_context, &_ones, &_forget_gate, &_input_gate, ConvertPolicy::SATURATE);
            _ones.allocator()->allocate();
        }
        else
        {
            const TensorInfo input_outstage_info(TensorShape(num_units, batch_size), 1, DataType::QSYMM16, QuantizationInfo(lstm_params.input_intermediate_scale(), 0));
            const float      input_to_input_scale = _input_to_input_weights->info()->quantization_info().uniform().scale * qinput.scale / lstm_params.input_intermediate_scale();
            configure_mm(compile_context, _mm_input_to_input, _input_to_input_outstage, gemmlowp_info,
                         input, &_input_to_input_weights_transposed, &_input_to_input_eff_bias,
                         &_mm_input_to_input_res, &_input_to_input_outstage_res, input_to_input_scale,
                         mm_out_info, input_outstage_info);

            const float recurrent_to_input_scale = _recurrent_to_input_weights->info()->quantization_info().uniform().scale * qoutput_state_in.scale / lstm_params.input_intermediate_scale();
            configure_mm(compile_context, _mm_recurrent_to_input, _recurrent_to_input_outstage, gemmlowp_info,
                         output_state_in, &_recurrent_to_input_weights_transposed, &_recurrent_to_input_eff_bias,
                         &_mm_recurrent_to_input_res, &_recurrent_to_input_outstage_res, recurrent_to_input_scale,
                         mm_out_info, input_outstage_info);
            _accumulate_input_recurrent_input.configure(compile_context, &_input_to_input_outstage_res, &_recurrent_to_input_outstage_res, &_recurrent_to_input_outstage_res,
                                                        ConvertPolicy::SATURATE);
            _input_to_input_outstage_res.allocator()->allocate();

            if(_has_peephole)
            {
                _mul_cell_to_input_res.allocator()->init(TensorInfo(cell_state_in->info()->tensor_shape(), 1, DataType::S32));
                _memory_group.manage(&_mul_cell_to_input_res);
                _pixelwise_mul_cell_to_input.configure(compile_context, cell_state_in, lstm_params.cell_to_input_weights(), &_mul_cell_to_input_res, 1.f, ConvertPolicy::SATURATE, RoundingPolicy::TO_ZERO);
                const float cell_to_input_scale = std::pow(2, cell_shift) * lstm_params.cell_to_input_weights()->info()->quantization_info().uniform().scale / lstm_params.input_intermediate_scale();
                quantization::calculate_quantized_multiplier(cell_to_input_scale, &gemmlowp_info.gemmlowp_multiplier, &gemmlowp_info.gemmlowp_shift);
                _cell_to_input_outstage_res.allocator()->init(TensorInfo(_mul_cell_to_input_res.info()->tensor_shape(), 1, DataType::QSYMM16, QuantizationInfo(lstm_params.input_intermediate_scale(), 0)));
                _memory_group.manage(&_cell_to_input_outstage_res);
                _cell_to_input_outstage.configure(compile_context, &_mul_cell_to_input_res, nullptr, &_cell_to_input_outstage_res, gemmlowp_info);
                _mul_cell_to_input_res.allocator()->allocate();
                _accumulate_cell_input.configure(&_recurrent_to_input_outstage_res, &_cell_to_input_outstage_res, &_recurrent_to_input_outstage_res, ConvertPolicy::SATURATE);
                _cell_to_input_outstage_res.allocator()->allocate();
            }

            CLTensor *input_activation_input = &_recurrent_to_input_outstage_res;

            if(_has_layer_norm)
            {
                configure_layer_norm(LayerNormGate::Input, &_recurrent_to_input_outstage_res);
                _recurrent_to_input_outstage_res.allocator()->allocate();
                input_activation_input = &get_layer_norm_output(LayerNormGate::Input);
            }

            _input_gate_sigmoid.configure(compile_context, input_activation_input, &_input_gate, ActivationLayerInfo(ActivationLayerInfo::ActivationFunction::LOGISTIC));
            input_activation_input->allocator()->allocate();
        }
        // Cell.
        // TODO(COMPMID-3396): Perform multiplication in the quantized domain in CLPixelWiseMultiplication
        _pixelwise_mul_forget_cell.configure(compile_context, &_forget_gate, cell_state_in, &_forget_gate, 1.f, ConvertPolicy::SATURATE, RoundingPolicy::TO_ZERO);
        const float      cell_gate_scale      = _cell_gate.info()->quantization_info().uniform().scale;
        const float      mul_input_cell_scale = cell_gate_scale * std::pow(2, 15 + cell_shift);
        const TensorInfo mul_input_cell_info(TensorShape(num_units, batch_size), 1, DataType::QSYMM16, QuantizationInfo(mul_input_cell_scale, 0));
        _memory_group.manage(&_mul_input_cell_res);
        _mul_input_cell_res.allocator()->init(mul_input_cell_info);
        _pixelwise_mul_input_cell.configure(compile_context, &_input_gate, &_cell_gate, &_mul_input_cell_res, 1.f, ConvertPolicy::SATURATE, RoundingPolicy::TO_ZERO);
        _cell_gate.allocator()->allocate();
        _add_forget_cell.configure(compile_context, &_forget_gate, &_mul_input_cell_res, cell_state_out, ConvertPolicy::SATURATE);
        _mul_input_cell_res.allocator()->allocate();
        _forget_gate.allocator()->allocate();
        if(_has_cell_clipping)
        {
            _cell_clip.configure(compile_context, cell_state_out, nullptr, ActivationLayerInfo(ActivationLayerInfo::ActivationFunction::LU_BOUNDED_RELU, -quantized_cell_clip, quantized_cell_clip));
        }
        // Output gate.
        const TensorInfo output_outstage_info(TensorShape(num_units, batch_size), 1, DataType::QSYMM16, QuantizationInfo(lstm_params.output_intermediate_scale(), 0));
        const float      input_to_output_scale = input_to_output_weights->info()->quantization_info().uniform().scale * qinput.scale / lstm_params.output_intermediate_scale();
        configure_mm(compile_context, _mm_input_to_output, _input_to_output_outstage, gemmlowp_info,
                     input, &_input_to_output_weights_transposed, &_input_to_output_eff_bias,
                     &_mm_input_to_output_res, &_input_to_output_outstage_res, input_to_output_scale,
                     mm_out_info, output_outstage_info);

        const float recurrent_to_output_scale = recurrent_to_output_weights->info()->quantization_info().uniform().scale * qoutput_state_in.scale / lstm_params.output_intermediate_scale();
        configure_mm(compile_context, _mm_recurrent_to_output, _recurrent_to_output_outstage, gemmlowp_info,
                     output_state_in, &_recurrent_to_output_weights_transposed, &_recurrent_to_output_eff_bias,
                     &_mm_recurrent_to_output_res, &_recurrent_to_output_outstage_res, recurrent_to_output_scale,
                     mm_out_info, output_outstage_info);

        _accumulate_input_recurrent_output.configure(compile_context, &_recurrent_to_output_outstage_res, &_input_to_output_outstage_res, &_recurrent_to_output_outstage_res,
                                                     ConvertPolicy::SATURATE);
        _input_to_output_outstage_res.allocator()->allocate();

        if(_has_peephole)
        {
            // TODO(COMPMID-3396): Perform multiplication in the quantized domain in CLPixelWiseMultiplication
            // Here we are not using the output stage because all operations are done in float
            _mul_cell_to_output_res.allocator()->init(TensorInfo(cell_state_out->info()->tensor_shape(), 1, DataType::S32));
            _memory_group.manage(&_mul_cell_to_output_res);
            _pixelwise_mul_cell_to_output.configure(compile_context, cell_state_out, lstm_params.cell_to_output_weights(), &_mul_cell_to_output_res, 1.f, ConvertPolicy::SATURATE, RoundingPolicy::TO_ZERO);

            const float cell_to_output_scale = std::pow(2, cell_shift) * lstm_params.cell_to_output_weights()->info()->quantization_info().uniform().scale / lstm_params.output_intermediate_scale();
            quantization::calculate_quantized_multiplier(cell_to_output_scale, &gemmlowp_info.gemmlowp_multiplier, &gemmlowp_info.gemmlowp_shift);
            _cell_to_output_outstage_res.allocator()->init(TensorInfo(_mul_cell_to_output_res.info()->tensor_shape(), 1, DataType::QSYMM16, QuantizationInfo(lstm_params.output_intermediate_scale(), 0)));
            _memory_group.manage(&_cell_to_output_outstage_res);
            _cell_to_output_outstage.configure(compile_context, &_mul_cell_to_output_res, nullptr, &_cell_to_output_outstage_res, gemmlowp_info);
            _mul_cell_to_output_res.allocator()->allocate();

            _accumulate_cell_to_output.configure(compile_context, &_recurrent_to_output_outstage_res, &_cell_to_output_outstage_res, &_recurrent_to_output_outstage_res,
                                                 ConvertPolicy::SATURATE);
            _cell_to_output_outstage_res.allocator()->allocate();
        }

        CLTensor *output_activation_input = &_recurrent_to_output_outstage_res;

        if(_has_layer_norm)
        {
            configure_layer_norm(LayerNormGate::Output, &_recurrent_to_output_outstage_res);
            _recurrent_to_output_outstage_res.allocator()->allocate();
            output_activation_input = &get_layer_norm_output(LayerNormGate::Output);
        }

        const TensorInfo output_gate_info(TensorShape(num_units, batch_size), 1, DataType::QSYMM16, sigmoid_tanh_outqinfo);
        _memory_group.manage(&_output_gate);
        _output_gate.allocator()->init(output_gate_info);
        _output_gate_sigmoid.configure(compile_context, output_activation_input, &_output_gate, ActivationLayerInfo(ActivationLayerInfo::ActivationFunction::LOGISTIC));
        output_activation_input->allocator()->allocate();

        // Hidden.
        _hidden_tanh.configure(compile_context, cell_state_out, &_input_gate, ActivationLayerInfo(ActivationLayerInfo::ActivationFunction::TANH, 1.f, 1.f));
        // TODO(COMPMID-3396): Perform multiplication in the quantized domain in CLPixelWiseMultiplication
        _memory_group.manage(&_hidden_mul_res);
        const TensorInfo hidden_mul_res(_input_gate.info()->tensor_shape(), 1, DataType::S32);
        _hidden_mul_res.allocator()->init(hidden_mul_res);
        _pixelwise_mul_hidden.configure(compile_context, &_output_gate, &_input_gate, &_hidden_mul_res, 1.f, ConvertPolicy::SATURATE, RoundingPolicy::TO_ZERO);
        _output_gate.allocator()->allocate();
        _input_gate.allocator()->allocate();
        const float hidden_state_scale = std::pow(2, -15) / lstm_params.hidden_state_scale() * std::pow(2, -15);
        quantization::calculate_quantized_multiplier(hidden_state_scale, &gemmlowp_info.gemmlowp_multiplier, &gemmlowp_info.gemmlowp_shift, /* ignore_epsilon */ true);
        gemmlowp_info.gemmlowp_offset  = lstm_params.hidden_state_zero();
        gemmlowp_info.output_data_type = output_state_in->info()->data_type();

        _projection_tensor_copy_required = (num_units != output_size);
        ICLTensor *hidden_gate_result    = output_state_out;

        _memory_group.manage(&_hidden_gate);

        if(_projection_tensor_copy_required)
        {
            _hidden_gate.allocator()->init(*output_state_out->info());
            _hidden_gate.info()->set_tensor_shape(_hidden_mul_res.info()->tensor_shape());
            hidden_gate_result = &_hidden_gate;
        }

        _hidden_outstage.configure(compile_context, &_hidden_mul_res, nullptr, hidden_gate_result, gemmlowp_info);
        _hidden_mul_res.allocator()->allocate();

        // Projection.
        if(_has_projection)
        {
            const TensorInfo              projection_outstage_info(*output_state_out->info());
            const UniformQuantizationInfo qprojection      = _projection_weights->info()->quantization_info().uniform();
            const float                   projection_scale = qprojection.scale * lstm_params.hidden_state_scale() / qoutput_state_in.scale;
            gemmlowp_info.gemmlowp_offset    = qoutput_state_in.offset;
            gemmlowp_info.gemmlowp_min_bound = std::numeric_limits<int8_t>::lowest();
            gemmlowp_info.gemmlowp_max_bound = std::numeric_limits<int8_t>::max();
            gemmlowp_info.output_data_type   = DataType::QASYMM8_SIGNED;

            TensorInfo projection_mm_out_info{ mm_out_info };
            projection_mm_out_info.set_tensor_shape(TensorShape(output_size, batch_size));

            configure_mm(compile_context, _mm_projection, _projection_outstage, gemmlowp_info,
                         hidden_gate_result, &_projection_weights_transposed, &_projection_eff_bias,
                         &_mm_projection_res, &_projection_outstage_res, projection_scale,
                         projection_mm_out_info, projection_outstage_info);

            ICLTensor *accumulate_destination = output_state_out;

            if(_projection_tensor_copy_required)
            {
                _hidden_gate.allocator()->allocate();
                _projection_accumulate_res.allocator()->init(*output_state_in->info());
                _projection_accumulate_res.info()->set_tensor_shape(_projection_outstage_res.info()->tensor_shape());
                _projection_output_to_accumulate_copy.configure(*output_state_in, _projection_accumulate_res);
                accumulate_destination = &_projection_accumulate_res;
            }

            _accumulate_projection.configure(compile_context, &_projection_outstage_res, accumulate_destination, accumulate_destination, ConvertPolicy::SATURATE);
            _projection_outstage_res.allocator()->allocate();

            if(_projection_tensor_copy_required)
            {
                _projection_accumulate_to_output_copy.configure(_projection_accumulate_res, *output_state_out);
                _projection_accumulate_res.allocator()->allocate();
            }

            int8_t quantized_projection_clip{ 0 };
            if(lstm_params.projection_clip() > 0.0f)
            {
                quantized_projection_clip = utility::clamp<int8_t>(lstm_params.projection_clip() / qprojection.scale, -128, 127);
            }

            if(quantized_projection_clip > 0)
            {
                _projection_clip.configure(compile_context, output_state_out, nullptr, ActivationLayerInfo(ActivationLayerInfo::ActivationFunction::LU_BOUNDED_RELU, -quantized_projection_clip,
                                                                                                           quantized_projection_clip));
                _has_projection_clipping = true;
            }
        }
        else
        {
            if(_projection_tensor_copy_required)
            {
                _hidden_to_output_copy.configure(_hidden_gate, *output_state_out);
                _hidden_gate.allocator()->allocate();
            }
        }

        // Copy output_state_out to output
        _copy_output.configure(compile_context, output_state_out, output);
    }

◆ operator=() [1/2]

CLQLSTMLayer & operator= ( const CLQLSTMLayer & )
delete

Prevent instances of this class from being copied (As this class contains pointers)

◆ operator=() [2/2]

CLQLSTMLayer & operator= ( CLQLSTMLayer && )
default

Default move assignment operator.

◆ prepare()

void prepare ( )
override virtual

Prepare the function for executing.

Any one-off pre-processing step required by the function is handled here.

Note
	The prepare stage might not need all the function's buffers' backing memory to be available in order to execute.

Reimplemented from IFunction.

Definition at line 1104 of file CLQLSTMLayer.cpp.

References CLTensorAllocator::allocate(), CLTensor::allocator(), ICLTensor::buffer(), TensorInfo::element_size(), CLScheduler::enqueue(), CLScheduler::get(), CLTensor::info(), CLTensor::map(), ITensor::mark_as_unused(), CLScheduler::queue(), ICLSimpleFunction::run(), CLArithmeticAddition::run(), TensorInfo::total_size(), and CLTensor::unmap().

Referenced by CLQLSTMLayer::run().

    {
        if(!_is_prepared)
        {
            // Pre-transpose weights to be used in GEMM.
            _input_to_forget_weights_transposed.allocator()->allocate();
            _input_to_cell_weights_transposed.allocator()->allocate();
            _input_to_output_weights_transposed.allocator()->allocate();
            _recurrent_to_forget_weights_transposed.allocator()->allocate();
            _recurrent_to_cell_weights_transposed.allocator()->allocate();
            _recurrent_to_output_weights_transposed.allocator()->allocate();
            _transpose_input_to_forget_weights.run();
            _transpose_input_to_cell_weights.run();
            _transpose_input_to_output_weights.run();
            _transpose_recurrent_to_forget_weights.run();
            _transpose_recurrent_to_cell_weights.run();
            _transpose_recurrent_to_output_weights.run();

            // Precompute effective biases
            if(_has_cifg)
            {
                _ones.map(true);
                std::fill_n(reinterpret_cast<int16_t *>(_ones.buffer()), _ones.info()->total_size() / _ones.info()->element_size(), 32767);
                _ones.unmap();
            }
            else
            {
                _input_to_input_eff_bias.allocator()->allocate();
                _recurrent_to_input_eff_bias.allocator()->allocate();
                CLScheduler::get().enqueue(*_input_to_input_reduction);
                CLScheduler::get().enqueue(*_recurrent_to_input_reduction);

                _input_to_input_weights_transposed.allocator()->allocate();
                _recurrent_to_input_weights_transposed.allocator()->allocate();
                _transpose_input_to_input_weights.run();
                _transpose_recurrent_to_input_weights.run();
                _input_to_input_weights->mark_as_unused();
                _recurrent_to_input_weights->mark_as_unused();
            }
            _input_to_forget_eff_bias.allocator()->allocate();
            _recurrent_to_forget_eff_bias.allocator()->allocate();
            _input_to_cell_eff_bias.allocator()->allocate();
            _recurrent_to_cell_eff_bias.allocator()->allocate();
            _input_to_output_eff_bias.allocator()->allocate();
            _recurrent_to_output_eff_bias.allocator()->allocate();
            CLScheduler::get().enqueue(*_input_to_forget_reduction);
            CLScheduler::get().enqueue(*_recurrent_to_forget_reduction);
            CLScheduler::get().enqueue(*_input_to_cell_reduction);
            CLScheduler::get().enqueue(*_recurrent_to_cell_reduction);
            CLScheduler::get().enqueue(*_input_to_output_reduction);
            CLScheduler::get().enqueue(*_recurrent_to_output_reduction);

            if(_has_projection)
            {
                _projection_eff_bias.allocator()->allocate();
                CLScheduler::get().enqueue(*_projection_reduction);
                if(_projection_bias != nullptr)
                {
                    _projection_bias_add.run();
                    _projection_bias->mark_as_unused();
                }

                _projection_weights_transposed.allocator()->allocate();
                _transpose_projection_weights.run();
                _projection_weights->mark_as_unused();

                if(!_projection_tensor_copy_required)
                {
                    _hidden_gate.mark_as_unused();
                    _projection_accumulate_res.mark_as_unused();
                }
            }

            // Mark weights as unused
            _input_to_forget_weights->mark_as_unused();
            _input_to_cell_weights->mark_as_unused();
            _input_to_output_weights->mark_as_unused();
            _recurrent_to_forget_weights->mark_as_unused();
            _recurrent_to_cell_weights->mark_as_unused();
            _recurrent_to_output_weights->mark_as_unused();

            CLScheduler::get().queue().finish();
            _is_prepared = true;
        }
    }
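
Because the heavy one-off work (weight transposition and effective-bias reductions) happens in prepare(), it can be worth calling it explicitly at a controlled point such as model load, rather than paying the cost inside the first run(). A small sketch (qlstm and num_timesteps are assumptions carried over from the examples above):

    qlstm.prepare(); // one-off: transposes weights and precomputes effective biases;
                     // run() would otherwise trigger this on its first call

    for(int step = 0; step < num_timesteps; ++step)
    {
        // ... update the input and state tensors for this timestep ...
        qlstm.run();
    }
    CLScheduler::get().sync(); // wait for all enqueued work to finish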

◆ run()

void run ( )
override virtual

Run the kernels contained in the function.

For Neon kernels:

  • Multi-threading is used for the kernels which are parallelisable.
  • By default std::thread::hardware_concurrency() threads are used.
Note
CPPScheduler::set_num_threads() can be used to manually set the number of threads

For OpenCL kernels:

  • All the kernels are enqueued on the queue associated with CLScheduler.
  • The queue is then flushed.
Note
The function will not block until the kernels are executed. It is the user's responsibility to wait.
Will call prepare() on the first run if it has not already been done.

Implements IFunction.

Definition at line 963 of file CLQLSTMLayer.cpp.

References CLScheduler::enqueue(), CLScheduler::get(), CLQLSTMLayer::prepare(), ICLSimpleFunction::run(), CLCopy::run(), CLActivationLayer::run(), CLGEMMLowpMatrixMultiplyCore::run(), CLArithmeticAddition::run(), CLArithmeticSubtraction::run(), and CLPixelWiseMultiplication::run().

    {
        prepare();

        // Acquire all the temporaries
        MemoryGroupResourceScope scope_mg(_memory_group);

        // Forget gate.
        _mm_input_to_forget.run();
        _input_to_forget_outstage.run();

        _mm_recurrent_to_forget.run();
        _recurrent_to_forget_outstage.run();
        _accumulate_input_recurrent_forget.run();

        if(_has_peephole)
        {
            _pixelwise_mul_cell_to_forget.run();
            _cell_to_forget_outstage.run();
            _accumulate_cell_forget.run();
        }

        if(_has_layer_norm)
        {
            CLScheduler::get().enqueue(get_layer_norm(LayerNormGate::Forget));
        }

        _forget_gate_sigmoid.run();

        // Modulation gate.
        _mm_input_to_cell.run();
        _input_to_cell_outstage.run();

        _mm_recurrent_to_cell.run();
        _recurrent_to_cell_outstage.run();
        _accumulate_input_recurrent_modulation.run();

        if(_has_layer_norm)
        {
            CLScheduler::get().enqueue(get_layer_norm(LayerNormGate::Cell));
        }

        _cell_gate_tanh.run();

        // Input gate
        if(_has_cifg)
        {
            _input_gate_sub.run();
        }
        else
        {
            _mm_input_to_input.run();
            _input_to_input_outstage.run();
            _mm_recurrent_to_input.run();
            _recurrent_to_input_outstage.run();
            _accumulate_input_recurrent_input.run();

            if(_has_peephole)
            {
                _pixelwise_mul_cell_to_input.run();
                _cell_to_input_outstage.run();
                _accumulate_cell_input.run();
            }

            if(_has_layer_norm)
            {
                CLScheduler::get().enqueue(get_layer_norm(LayerNormGate::Input));
            }

            _input_gate_sigmoid.run();
        }

        // Cell.
        _pixelwise_mul_forget_cell.run();
        _pixelwise_mul_input_cell.run();
        _add_forget_cell.run();
        if(_has_cell_clipping)
        {
            _cell_clip.run();
        }

        // Output gate.
        _mm_input_to_output.run();
        _input_to_output_outstage.run();
        _mm_recurrent_to_output.run();
        _recurrent_to_output_outstage.run();
        _accumulate_input_recurrent_output.run();
        if(_has_peephole)
        {
            _pixelwise_mul_cell_to_output.run();
            _cell_to_output_outstage.run();
            _accumulate_cell_to_output.run();
        }

        if(_has_layer_norm)
        {
            CLScheduler::get().enqueue(get_layer_norm(LayerNormGate::Output));
        }

        _output_gate_sigmoid.run();

        // Hidden.
        _hidden_tanh.run();
        _pixelwise_mul_hidden.run();
        _hidden_outstage.run();

        // Projection.
        if(_has_projection)
        {
            _mm_projection.run();
            _projection_outstage.run();

            if(_projection_tensor_copy_required)
            {
                _projection_output_to_accumulate_copy.run();
            }

            _accumulate_projection.run();

            if(_projection_tensor_copy_required)
            {
                _projection_accumulate_to_output_copy.run();
            }

            if(_has_projection_clipping)
            {
                _projection_clip.run();
            }
        }
        else
        {
            if(_projection_tensor_copy_required)
            {
                _hidden_to_output_copy.run();
            }
        }

        // Copy output_state_out to output
        _copy_output.run();
    }

◆ validate()

Status validate ( const ITensorInfo * input,
const ITensorInfo * input_to_forget_weights,
const ITensorInfo * input_to_cell_weights,
const ITensorInfo * input_to_output_weights,
const ITensorInfo * recurrent_to_forget_weights,
const ITensorInfo * recurrent_to_cell_weights,
const ITensorInfo * recurrent_to_output_weights,
const ITensorInfo * forget_gate_bias,
const ITensorInfo * cell_bias,
const ITensorInfo * output_gate_bias,
const ITensorInfo * cell_state_in,
const ITensorInfo * output_state_in,
const ITensorInfo * cell_state_out,
const ITensorInfo * output_state_out,
const ITensorInfo * output,
const LSTMParams< ITensorInfo > & lstm_params 
)
static

Static function to check if given info will lead to a valid configuration of CLQLSTMLayer.

Parameters
  [in] input                        Source tensor info. Input is a 2D tensor info with dimensions [input_size, batch_size]. Data types supported: QASYMM8_SIGNED.
  [in] input_to_forget_weights      2D weights tensor info with dimensions [input_size, num_units]. Data type supported: QSYMM8.
  [in] input_to_cell_weights        2D weights tensor info with dimensions [input_size, num_units]. Data type supported: QSYMM8.
  [in] input_to_output_weights      2D weights tensor info with dimensions [input_size, num_units]. Data type supported: QSYMM8.
  [in] recurrent_to_forget_weights  2D weights tensor info with dimensions [output_size, num_units]. Data type supported: QSYMM8.
  [in] recurrent_to_cell_weights    2D weights tensor info with dimensions [output_size, num_units]. Data type supported: QSYMM8.
  [in] recurrent_to_output_weights  2D weights tensor info with dimensions [output_size, num_units]. Data type supported: QSYMM8.
  [in] forget_gate_bias             1D weights tensor info with dimensions [num_units]. Data type supported: S32.
  [in] cell_bias                    1D weights tensor info with dimensions [num_units]. Data type supported: S32.
  [in] output_gate_bias             1D weights tensor info with dimensions [num_units]. Data type supported: S32.
  [in] cell_state_in                2D tensor info with dimensions [num_units, batch_size]. Data type supported: QSYMM16.
  [in] output_state_in              2D tensor info with dimensions [output_size, batch_size]. Data type supported: Same as input.
  [in] cell_state_out               Destination tensor info. Output is a 2D tensor info with dimensions [num_units, batch_size]. Data type supported: QSYMM16.
  [in] output_state_out             Destination tensor info. Output is a 2D tensor info with dimensions [output_size, batch_size]. Data types supported: Same as input.
  [in] output                       Destination tensor info. Output is a 2D tensor info with dimensions [output_size, batch_size]. Data types supported: Same as input.
  [in] lstm_params                  Weights tensors info used in peephole, CIFG and layer normalization optimizations:
         input_intermediate_scale   Scale of the intermediate result of matmul, i.e. input to layer normalization, at input gate.
         forget_intermediate_scale  Scale of the intermediate result of matmul, i.e. input to layer normalization, at forget gate.
         cell_intermediate_scale    Scale of the intermediate result of matmul, i.e. input to layer normalization, at cell gate.
         output_intermediate_scale  Scale of the intermediate result of matmul, i.e. input to layer normalization, at output gate.
         hidden_state_zero          The zero point of the hidden state.
         hidden_state_scale         The scale of the hidden state.
         input_to_input_weights     (Optional) 2D weights tensor with dimensions [input_size, num_units]. Data type supported: QSYMM8.
         recurrent_to_input_weights (Optional) 2D weights tensor with dimensions [output_size, num_units]. Data type supported: QSYMM8.
         cell_to_input_weights      (Optional) 1D weights tensor with dimensions [num_units]. Can be nullptr. Data type supported: QSYMM16.
         cell_to_forget_weights     (Optional) 1D weights tensor with dimensions [num_units]. Data type supported: QSYMM16.
         cell_to_output_weights     (Optional) 1D weights tensor with dimensions [num_units]. Data type supported: QSYMM16.
         input_gate_bias            (Optional) 1D weights tensor with dimensions [num_units]. Data type supported: S32.
         projection_weights         (Optional) 2D weights tensor with dimensions [output_size, num_units]. Data type supported: QSYMM8.
         projection_bias            (Optional) 1D weights tensor with dimensions [output_size]. Data type supported: S32.
         input_layer_norm_weights   (Optional) 1D weights tensor with dimensions [num_units]. Data type supported: QSYMM16.
         forget_layer_norm_weights  (Optional) 1D weights tensor with dimensions [num_units]. Data type supported: QSYMM16.
         cell_layer_norm_weights    (Optional) 1D weights tensor with dimensions [num_units]. Data type supported: QSYMM16.
         output_layer_norm_weights  (Optional) 1D weights tensor with dimensions [num_units]. Data type supported: QSYMM16.
         cell_threshold             (Optional) The clipping threshold for the cell state, such that values are bound within [-cell_clip, cell_clip]. If set to 0.0 then clipping is disabled.
         projection_threshold       (Optional) The clipping threshold for the output from the projection layer, such that values are bound within [-proj_clip, proj_clip]. If set to 0.0 then clipping is disabled.
Returns
a status

Definition at line 598 of file CLQLSTMLayer.cpp.

References ARM_COMPUTE_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN, ARM_COMPUTE_RETURN_ERROR_ON, ARM_COMPUTE_RETURN_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN, ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES, ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_SHAPES, ARM_COMPUTE_RETURN_ERROR_ON_MSG, ARM_COMPUTE_RETURN_ERROR_ON_NULLPTR, ARM_COMPUTE_RETURN_ON_ERROR, arm_compute::quantization::calculate_quantized_multiplier(), LSTMParams< T >::cell_clip(), LSTMParams< T >::cell_intermediate_scale(), LSTMParams< T >::cell_layer_norm_weights(), LSTMParams< T >::cell_to_forget_weights(), LSTMParams< T >::cell_to_input_weights(), LSTMParams< T >::cell_to_output_weights(), ITensorInfo::data_type(), TensorInfo::data_type(), ITensorInfo::dimension(), arm_compute::test::validation::forget_gate_bias, LSTMParams< T >::forget_intermediate_scale(), LSTMParams< T >::forget_layer_norm_weights(), GEMMLowpOutputStageInfo::gemmlowp_max_bound, GEMMLowpOutputStageInfo::gemmlowp_min_bound, GEMMLowpOutputStageInfo::gemmlowp_multiplier, GEMMLowpOutputStageInfo::gemmlowp_offset, GEMMLowpOutputStageInfo::gemmlowp_shift, LSTMParams< T >::has_cifg_opt(), LSTMParams< T >::has_peephole_opt(), LSTMParams< T >::has_projection(), LSTMParams< T >::hidden_state_scale(), LSTMParams< T >::hidden_state_zero(), LSTMParams< T >::input_gate_bias(), LSTMParams< T >::input_intermediate_scale(), LSTMParams< T >::input_layer_norm_weights(), arm_compute::test::validation::input_size, LSTMParams< T >::input_to_input_weights(), ActivationLayerInfo::LOGISTIC, arm_compute::support::cpp11::lowest(), ActivationLayerInfo::LU_BOUNDED_RELU, ITensorInfo::num_dimensions(), UniformQuantizationInfo::offset, GEMMLowpOutputStageInfo::output_data_type, arm_compute::test::validation::output_gate_bias, LSTMParams< T >::output_intermediate_scale(), LSTMParams< T >::output_layer_norm_weights(), arm_compute::test::validation::output_size, LSTMParams< T >::projection_bias(), LSTMParams< T >::projection_clip(), LSTMParams< T >::projection_weights(), arm_compute::QASYMM8_SIGNED, arm_compute::QSYMM16, arm_compute::QSYMM8, ITensorInfo::quantization_info(), arm_compute::QUANTIZE_DOWN_FIXEDPOINT, arm_compute::quantize_qasymm8_signed(), arm_compute::quantize_qsymm16(), LSTMParams< T >::recurrent_to_input_weights(), arm_compute::S32, arm_compute::SATURATE, UniformQuantizationInfo::scale, TensorInfo::set_tensor_shape(), ActivationLayerInfo::TANH, arm_compute::TO_ZERO, ITensorInfo::total_size(), GEMMLowpOutputStageInfo::type, QuantizationInfo::uniform(), LSTMParams< T >::use_layer_norm(), CLTranspose::validate(), CLCopy::validate(), CLActivationLayer::validate(), CLGEMMLowpMatrixAReductionKernel::validate(), CLArithmeticAddition::validate(), arm_compute::validate(), CLArithmeticSubtraction::validate(), CLPixelWiseMultiplication::validate(), and CLGEMMLowpOutputStage::validate().

Referenced by CLQLSTMLayer::configure().
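
For illustration, a minimal sketch of pre-checking a configuration. The shapes and quantization values below are arbitrary placeholders (not taken from the library); the LSTMParams setters set_matmul_scale_params() and set_hidden_state_params() are assumed for the mandatory intermediate scales, and a default-constructed LSTMParams leaves CIFG enabled, so no input-gate tensors are needed:

    #include "arm_compute/runtime/CL/functions/CLQLSTMLayer.h"
    #include <iostream>
    using namespace arm_compute;

    const unsigned int input_size = 32, output_size = 16, num_units = 16, batch_size = 1;

    // Data types follow the parameter table above; scales are placeholders.
    const TensorInfo input(TensorShape(input_size, batch_size), 1, DataType::QASYMM8_SIGNED, QuantizationInfo(0.01f, 0));
    const TensorInfo in_w(TensorShape(input_size, num_units), 1, DataType::QSYMM8, QuantizationInfo(0.02f, 0));
    const TensorInfo rec_w(TensorShape(output_size, num_units), 1, DataType::QSYMM8, QuantizationInfo(0.02f, 0));
    const TensorInfo bias(TensorShape(num_units), 1, DataType::S32);
    const TensorInfo cell_state(TensorShape(num_units, batch_size), 1, DataType::QSYMM16, QuantizationInfo(1.f / 2048.f, 0)); // log2(scale) must be <= -9
    const TensorInfo out_state(TensorShape(output_size, batch_size), 1, DataType::QASYMM8_SIGNED, QuantizationInfo(0.01f, 0));

    LSTMParams<ITensorInfo> lstm_params;                                // CIFG enabled by default: no input gate tensors set
    lstm_params.set_matmul_scale_params(0.007f, 0.007f, 0.007f, 0.007f); // intermediate scales (placeholders)
    lstm_params.set_hidden_state_params(0, 0.007f);                      // hidden state zero point and scale

    const Status s = CLQLSTMLayer::validate(&input, &in_w, &in_w, &in_w, &rec_w, &rec_w, &rec_w,
                                            &bias, &bias, &bias, &cell_state, &out_state,
                                            &cell_state, &out_state, &out_state, lstm_params);
    if(!bool(s))
    {
        std::cout << s.error_description() << std::endl;
    }

The same tensor infos can then be used to drive configure(), which performs the same checks (it references validate()).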

605 {
606  ARM_COMPUTE_RETURN_ERROR_ON_NULLPTR(input, input_to_forget_weights, input_to_cell_weights, input_to_output_weights, recurrent_to_forget_weights, recurrent_to_cell_weights,
607  recurrent_to_output_weights, forget_gate_bias, cell_bias, output_gate_bias, cell_state_in, output_state_in,
608  cell_state_out, output_state_out, output);
609 
610  ARM_COMPUTE_RETURN_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN(input, 1, DataType::QASYMM8_SIGNED);
611  ARM_COMPUTE_RETURN_ERROR_ON_MSG(input->num_dimensions() != 2, "Input must have exactly 2 dimensions");
612 
613  const unsigned int input_size = input->dimension(0);
614  const unsigned int batch_size = input->dimension(1);
615  const unsigned int num_units = input_to_output_weights->dimension(1);
616  const unsigned int output_size = output_state_out->dimension(_out_state_output_size_dimension_idx);
617 
622  ARM_COMPUTE_RETURN_ERROR_ON(recurrent_to_output_weights->dimension(1) != num_units);
627 
628  ARM_COMPUTE_RETURN_ERROR_ON(forget_gate_bias->num_dimensions() != 1);
629  ARM_COMPUTE_RETURN_ERROR_ON(forget_gate_bias->dimension(0) != num_units);
633 
634  ARM_COMPUTE_RETURN_ERROR_ON(cell_state_in->num_dimensions() != 2);
635  ARM_COMPUTE_RETURN_ERROR_ON(cell_state_in->dimension(0) != num_units);
636  ARM_COMPUTE_RETURN_ERROR_ON(cell_state_in->dimension(1) != batch_size);
638 
639  ARM_COMPUTE_RETURN_ERROR_ON(output_state_in->num_dimensions() != 2);
640  ARM_COMPUTE_RETURN_ERROR_ON(output_state_in->dimension(0) != output_size);
641  ARM_COMPUTE_RETURN_ERROR_ON(output_state_in->dimension(1) != batch_size);
643 
644  // Check whether peephole weights are all there or none
645  if(lstm_params.has_peephole_opt())
646  {
647  ARM_COMPUTE_RETURN_ERROR_ON_NULLPTR(lstm_params.cell_to_forget_weights(), lstm_params.cell_to_output_weights());
648  ARM_COMPUTE_RETURN_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN(lstm_params.cell_to_forget_weights(), 1, DataType::QSYMM16);
649  ARM_COMPUTE_RETURN_ERROR_ON(lstm_params.cell_to_forget_weights()->num_dimensions() != 1);
650  ARM_COMPUTE_RETURN_ERROR_ON(lstm_params.cell_to_forget_weights()->dimension(0) != num_units);
651  ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES(lstm_params.cell_to_forget_weights(), lstm_params.cell_to_output_weights());
652  ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_SHAPES(lstm_params.cell_to_forget_weights(), lstm_params.cell_to_output_weights());
653 
654  if(!lstm_params.has_cifg_opt())
655  {
656  ARM_COMPUTE_RETURN_ERROR_ON_NULLPTR(lstm_params.cell_to_input_weights());
657  ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES(lstm_params.cell_to_forget_weights(), lstm_params.cell_to_input_weights());
658  ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_SHAPES(lstm_params.cell_to_forget_weights(), lstm_params.cell_to_input_weights());
659  }
660  }
661 
662  const UniformQuantizationInfo qinput = input->quantization_info().uniform();
663  const UniformQuantizationInfo qcell_state_in = cell_state_in->quantization_info().uniform();
664  const UniformQuantizationInfo qoutput_state_in = output_state_in->quantization_info().uniform();
665 
666  // Calculate and decompose effective scales for optimizing matmul calculation
667  const int32_t cell_shift = log2(qcell_state_in.scale);
668  ARM_COMPUTE_RETURN_ERROR_ON(cell_shift > -9);
669 
670  // Calculate quantized parameters for clipping.
671  int16_t quantized_cell_clip = 0;
672  if(lstm_params.cell_clip() > 0.0f)
673  {
674  quantized_cell_clip = quantize_qsymm16(lstm_params.cell_clip(), qcell_state_in);
675  }
676 
677  // Precompute effective bias for optimizing the matmul computations.
678  const TensorInfo eff_bias_info(TensorShape(num_units), 1, DataType::S32);
679  const TensorInfo projection_eff_bias_info(TensorShape(output_size), 1, DataType::S32);
680  if(!lstm_params.has_cifg_opt())
681  {
682  ARM_COMPUTE_RETURN_ON_ERROR(CLGEMMLowpMatrixAReductionKernel::validate(lstm_params.input_to_input_weights(), &eff_bias_info, GEMMLowpReductionKernelInfo(num_units, false, -qinput.offset, true)));
683  ARM_COMPUTE_RETURN_ON_ERROR(CLGEMMLowpMatrixAReductionKernel::validate(lstm_params.recurrent_to_input_weights(), &eff_bias_info, GEMMLowpReductionKernelInfo(num_units, false, -qoutput_state_in.offset,
684  true)));
685  }
686  ARM_COMPUTE_RETURN_ON_ERROR(CLGEMMLowpMatrixAReductionKernel::validate(input_to_forget_weights, &eff_bias_info, GEMMLowpReductionKernelInfo(num_units, false, -qinput.offset, true)));
687  ARM_COMPUTE_RETURN_ON_ERROR(CLGEMMLowpMatrixAReductionKernel::validate(recurrent_to_forget_weights, &eff_bias_info, GEMMLowpReductionKernelInfo(num_units, false, -qoutput_state_in.offset, true)));
688  ARM_COMPUTE_RETURN_ON_ERROR(CLGEMMLowpMatrixAReductionKernel::validate(input_to_cell_weights, &eff_bias_info, GEMMLowpReductionKernelInfo(num_units, false, -qinput.offset, true)));
689  ARM_COMPUTE_RETURN_ON_ERROR(CLGEMMLowpMatrixAReductionKernel::validate(recurrent_to_cell_weights, &eff_bias_info, GEMMLowpReductionKernelInfo(num_units, false, -qoutput_state_in.offset, true)));
690  ARM_COMPUTE_RETURN_ON_ERROR(CLGEMMLowpMatrixAReductionKernel::validate(input_to_output_weights, &eff_bias_info, GEMMLowpReductionKernelInfo(num_units, false, -qinput.offset, true)));
691  ARM_COMPUTE_RETURN_ON_ERROR(CLGEMMLowpMatrixAReductionKernel::validate(recurrent_to_output_weights, &eff_bias_info, GEMMLowpReductionKernelInfo(num_units, false, -qoutput_state_in.offset, true)));
692  if(lstm_params.has_projection())
693  {
694  ARM_COMPUTE_RETURN_ON_ERROR(CLGEMMLowpMatrixAReductionKernel::validate(lstm_params.projection_weights(), &projection_eff_bias_info, GEMMLowpReductionKernelInfo(output_size, false,
695  lstm_params.hidden_state_zero(),
696  true)));
697  if(lstm_params.projection_bias() != nullptr)
698  {
699  ARM_COMPUTE_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN(lstm_params.projection_bias(), 1, DataType::S32);
700  ARM_COMPUTE_RETURN_ON_ERROR(CLArithmeticAddition::validate(lstm_params.projection_bias(), &projection_eff_bias_info,
701  &projection_eff_bias_info, ConvertPolicy::SATURATE));
702  }
703  }
704 
705  const TensorInfo input_weights_transposed(TensorShape(num_units, input_size), 1, input_to_forget_weights->data_type(), input_to_forget_weights->quantization_info());
706  const TensorInfo recurrent_weights_transposed(TensorShape(num_units, output_size), 1, recurrent_to_forget_weights->data_type(), recurrent_to_forget_weights->quantization_info());
707 
708  // Validate weights transpose
715  if(!lstm_params.has_cifg_opt())
716  {
717  ARM_COMPUTE_RETURN_ON_ERROR(CLTranspose::validate(lstm_params.input_to_input_weights(), &input_weights_transposed));
718  ARM_COMPUTE_RETURN_ON_ERROR(CLTranspose::validate(lstm_params.recurrent_to_input_weights(), &recurrent_weights_transposed));
719  }
720  if(lstm_params.has_projection())
721  {
722  const TensorInfo projection_weights_transposed(TensorShape(output_size, num_units), 1, lstm_params.projection_weights()->data_type(), lstm_params.projection_weights()->quantization_info());
723  ARM_COMPUTE_RETURN_ON_ERROR(CLTranspose::validate(lstm_params.projection_weights(), &projection_weights_transposed));
724  }
725 
726  GEMMLowpOutputStageInfo gemmlowp_info;
727  gemmlowp_info.type = GEMMLowpOutputStageType::QUANTIZE_DOWN_FIXEDPOINT;
728  gemmlowp_info.gemmlowp_min_bound = std::numeric_limits<int16_t>::lowest();
729  gemmlowp_info.gemmlowp_max_bound = std::numeric_limits<int16_t>::max();
730  gemmlowp_info.output_data_type = DataType::QSYMM16;
731 
732  const bool has_layer_norm = lstm_params.use_layer_norm();
733 
734  // Forget gate.
735  ARM_COMPUTE_RETURN_ERROR_ON(lstm_params.forget_intermediate_scale() == 0);
736  const TensorInfo forget_outstage_info(TensorShape(num_units, batch_size), 1, DataType::QSYMM16, QuantizationInfo(lstm_params.forget_intermediate_scale(), 0));
737  const TensorInfo mm_out_info(TensorShape(num_units, batch_size), 1, DataType::S32);
738  const float input_to_forget_scale = input_to_forget_weights->quantization_info().uniform().scale * qinput.scale / lstm_params.forget_intermediate_scale();
739  ARM_COMPUTE_RETURN_ON_ERROR(validate_mm(gemmlowp_info, input, &input_weights_transposed, &eff_bias_info, input_to_forget_scale, &mm_out_info, &forget_outstage_info));
740 
741  const float recurrent_to_forget_scale = recurrent_to_forget_weights->quantization_info().uniform().scale * qoutput_state_in.scale / lstm_params.forget_intermediate_scale();
742  ARM_COMPUTE_RETURN_ON_ERROR(validate_mm(gemmlowp_info, output_state_in, &recurrent_weights_transposed, &eff_bias_info, recurrent_to_forget_scale, &mm_out_info, &forget_outstage_info));
743 
744  ARM_COMPUTE_RETURN_ON_ERROR(CLArithmeticAddition::validate(&forget_outstage_info, &forget_outstage_info, &forget_outstage_info, ConvertPolicy::SATURATE));
745 
746  if(lstm_params.has_peephole_opt())
747  {
748  ARM_COMPUTE_RETURN_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN(lstm_params.cell_to_forget_weights(), 1, DataType::QSYMM16);
749  ARM_COMPUTE_RETURN_ON_ERROR(CLPixelWiseMultiplication::validate(cell_state_in, lstm_params.cell_to_forget_weights(), &mm_out_info, 1.f, ConvertPolicy::SATURATE,
750  RoundingPolicy::TO_ZERO));
751  const float cell_to_forget_scale = std::pow(2, cell_shift) * lstm_params.cell_to_forget_weights()->quantization_info().uniform().scale / lstm_params.forget_intermediate_scale();
752  ARM_COMPUTE_RETURN_ON_ERROR(quantization::calculate_quantized_multiplier(cell_to_forget_scale, &gemmlowp_info.gemmlowp_multiplier, &gemmlowp_info.gemmlowp_shift));
753  ARM_COMPUTE_RETURN_ON_ERROR(CLGEMMLowpOutputStage::validate(&mm_out_info, nullptr, &forget_outstage_info, gemmlowp_info));
754  ARM_COMPUTE_RETURN_ON_ERROR(CLArithmeticAddition::validate(&forget_outstage_info, &forget_outstage_info, &forget_outstage_info, ConvertPolicy::SATURATE));
755  }
756 
757  if(has_layer_norm)
758  {
759  const ITensorInfo *w_info = lstm_params.forget_layer_norm_weights();
760  const ITensorInfo *b_info = forget_gate_bias;
761  ARM_COMPUTE_RETURN_ON_ERROR(validate_layer_norm(forget_outstage_info, *w_info, *b_info));
762  }
763 
764  // Output quantization info of Sigmoid and Tanh activations
765  const QuantizationInfo sigmoid_tanh_outqinfo(1.f / 32768.f, 0);
766 
767  const TensorInfo forget_gate_info(TensorShape(num_units, batch_size), 1, DataType::QSYMM16, sigmoid_tanh_outqinfo);
768  ARM_COMPUTE_RETURN_ON_ERROR(CLActivationLayer::validate(&forget_outstage_info, &forget_gate_info, ActivationLayerInfo(ActivationLayerInfo::ActivationFunction::LOGISTIC)));
769 
770  // Modulation gate.
771  ARM_COMPUTE_RETURN_ERROR_ON(lstm_params.cell_intermediate_scale() == 0);
772  const TensorInfo cell_outstage_info(TensorShape(num_units, batch_size), 1, DataType::QSYMM16, QuantizationInfo(lstm_params.cell_intermediate_scale(), 0));
773  const float input_to_cell_scale = input_to_cell_weights->quantization_info().uniform().scale * qinput.scale / lstm_params.cell_intermediate_scale();
774  ARM_COMPUTE_RETURN_ON_ERROR(validate_mm(gemmlowp_info, input, &input_weights_transposed, &eff_bias_info, input_to_cell_scale, &mm_out_info, &cell_outstage_info));
775 
776  const float recurrent_to_cell_scale = recurrent_to_cell_weights->quantization_info().uniform().scale * qoutput_state_in.scale / lstm_params.cell_intermediate_scale();
777  ARM_COMPUTE_RETURN_ON_ERROR(validate_mm(gemmlowp_info, output_state_in, &input_weights_transposed, &eff_bias_info, recurrent_to_cell_scale, &mm_out_info, &cell_outstage_info));
778 
779  ARM_COMPUTE_RETURN_ON_ERROR(CLArithmeticAddition::validate(&cell_outstage_info, &cell_outstage_info, &cell_outstage_info, ConvertPolicy::SATURATE));
780 
781  if(has_layer_norm)
782  {
783  const ITensorInfo *w_info = lstm_params.cell_layer_norm_weights();
784  const ITensorInfo *b_info = cell_bias;
785  ARM_COMPUTE_RETURN_ON_ERROR(validate_layer_norm(cell_outstage_info, *w_info, *b_info));
786  }
787 
788  const TensorInfo cell_gate_info(TensorShape(num_units, batch_size), 1, DataType::QSYMM16, sigmoid_tanh_outqinfo);
789  ARM_COMPUTE_RETURN_ON_ERROR(CLActivationLayer::validate(&cell_outstage_info, &cell_gate_info, ActivationLayerInfo(ActivationLayerInfo::ActivationFunction::TANH, 1.f, 1.f)));
790 
791  // Input gate.
792  const TensorInfo input_gate_info(TensorShape(num_units, batch_size), 1, DataType::QSYMM16, sigmoid_tanh_outqinfo);
793  if(lstm_params.has_cifg_opt())
794  {
795  ARM_COMPUTE_RETURN_ERROR_ON_MSG(lstm_params.input_gate_bias() != nullptr, "Input gate bias must not be present when CIFG is used");
796  ARM_COMPUTE_RETURN_ON_ERROR(CLArithmeticSubtraction::validate(&input_gate_info, &forget_gate_info, &forget_gate_info, ConvertPolicy::SATURATE));
797  }
798  else
799  {
800  ARM_COMPUTE_RETURN_ERROR_ON_NULLPTR(lstm_params.input_to_input_weights(), lstm_params.recurrent_to_input_weights(), lstm_params.input_gate_bias());
801  ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES(input_to_forget_weights, lstm_params.input_to_input_weights(), lstm_params.recurrent_to_input_weights());
802  ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_SHAPES(input_to_forget_weights, lstm_params.input_to_input_weights());
803  ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_SHAPES(recurrent_to_forget_weights, lstm_params.recurrent_to_input_weights());
806 
807  ARM_COMPUTE_RETURN_ERROR_ON(lstm_params.input_intermediate_scale() == 0);
808  const TensorInfo input_outstage_info(TensorShape(num_units, batch_size), 1, DataType::QSYMM16, QuantizationInfo(lstm_params.input_intermediate_scale(), 0));
809  const float input_to_input_scale = lstm_params.input_to_input_weights()->quantization_info().uniform().scale * qinput.scale / lstm_params.input_intermediate_scale();
810  ARM_COMPUTE_RETURN_ON_ERROR(validate_mm(gemmlowp_info, input, &input_weights_transposed, &eff_bias_info, input_to_input_scale, &mm_out_info, &input_outstage_info));
811 
812  const float recurrent_to_input_scale = lstm_params.recurrent_to_input_weights()->quantization_info().uniform().scale * qoutput_state_in.scale / lstm_params.input_intermediate_scale();
813  ARM_COMPUTE_RETURN_ON_ERROR(validate_mm(gemmlowp_info, output_state_in, &recurrent_weights_transposed, &eff_bias_info, recurrent_to_input_scale, &mm_out_info, &input_outstage_info));
814 
815  ARM_COMPUTE_RETURN_ON_ERROR(CLArithmeticAddition::validate(&input_outstage_info, &input_outstage_info, &input_outstage_info, ConvertPolicy::SATURATE));
816 
817  if(lstm_params.has_peephole_opt())
818  {
819  ARM_COMPUTE_RETURN_ON_ERROR(CLPixelWiseMultiplication::validate(cell_state_in, lstm_params.cell_to_input_weights(), &mm_out_info, 1.f, ConvertPolicy::SATURATE,
820  RoundingPolicy::TO_ZERO));
821  const float cell_to_input_scale = std::pow(2, cell_shift) * lstm_params.cell_to_input_weights()->quantization_info().uniform().scale / lstm_params.input_intermediate_scale();
822  ARM_COMPUTE_RETURN_ON_ERROR(quantization::calculate_quantized_multiplier(cell_to_input_scale, &gemmlowp_info.gemmlowp_multiplier, &gemmlowp_info.gemmlowp_shift));
823  ARM_COMPUTE_RETURN_ON_ERROR(CLGEMMLowpOutputStage::validate(&mm_out_info, &eff_bias_info, &input_outstage_info, gemmlowp_info));
824  ARM_COMPUTE_RETURN_ON_ERROR(CLArithmeticAddition::validate(&input_outstage_info, &input_outstage_info, &input_outstage_info, ConvertPolicy::SATURATE));
825  }
826 
827  if(has_layer_norm)
828  {
829  const ITensorInfo *w_info = lstm_params.input_layer_norm_weights();
830  const ITensorInfo *b_info = lstm_params.input_gate_bias();
831  ARM_COMPUTE_RETURN_ON_ERROR(validate_layer_norm(cell_outstage_info, *w_info, *b_info));
832  }
833 
834  ARM_COMPUTE_RETURN_ON_ERROR(CLActivationLayer::validate(&input_outstage_info, &input_gate_info, ActivationLayerInfo(ActivationLayerInfo::ActivationFunction::LOGISTIC, 1.f, 1.f)));
835  }
836  // Cell.
837  ARM_COMPUTE_RETURN_ON_ERROR(CLPixelWiseMultiplication::validate(&forget_gate_info, cell_state_in, &forget_gate_info, 1.f, ConvertPolicy::SATURATE, RoundingPolicy::TO_ZERO));
838  ARM_COMPUTE_RETURN_ON_ERROR(CLPixelWiseMultiplication::validate(&input_gate_info, &cell_gate_info, &cell_gate_info, 1.f, ConvertPolicy::SATURATE, RoundingPolicy::TO_ZERO));
839  ARM_COMPUTE_RETURN_ON_ERROR(CLArithmeticAddition::validate(&forget_gate_info, &cell_gate_info, cell_state_out, ConvertPolicy::SATURATE));
840  if(quantized_cell_clip > 0)
841  {
842  ARM_COMPUTE_RETURN_ON_ERROR(CLActivationLayer::validate(cell_state_out, nullptr, ActivationLayerInfo(ActivationLayerInfo::ActivationFunction::LU_BOUNDED_RELU, -quantized_cell_clip,
843  quantized_cell_clip)));
844  }
845  // Output gate.
846  ARM_COMPUTE_RETURN_ERROR_ON(lstm_params.output_intermediate_scale() == 0);
847  const TensorInfo output_outstage_info(TensorShape(num_units, batch_size), 1, DataType::QSYMM16, QuantizationInfo(lstm_params.output_intermediate_scale(), 0));
848  const float input_to_output_scale = input_to_output_weights->quantization_info().uniform().scale * qinput.scale / lstm_params.output_intermediate_scale();
849  ARM_COMPUTE_RETURN_ON_ERROR(validate_mm(gemmlowp_info, input, &input_weights_transposed, &eff_bias_info, input_to_output_scale, &mm_out_info, &output_outstage_info));
850 
851  const float recurrent_to_output_scale = recurrent_to_output_weights->quantization_info().uniform().scale * qoutput_state_in.scale / lstm_params.output_intermediate_scale();
852  ARM_COMPUTE_RETURN_ON_ERROR(validate_mm(gemmlowp_info, output_state_in, &recurrent_weights_transposed, &eff_bias_info, recurrent_to_output_scale, &mm_out_info, &output_outstage_info));
853 
854  ARM_COMPUTE_RETURN_ON_ERROR(CLArithmeticAddition::validate(&output_outstage_info, &output_outstage_info, &output_outstage_info, ConvertPolicy::SATURATE));
855  if(lstm_params.has_peephole_opt())
856  {
857  ARM_COMPUTE_RETURN_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN(lstm_params.cell_to_output_weights(), 1, DataType::QSYMM16);
858  // TODO(COMPMID-3395): Perform multiplication in the quantized domain in NEPixelWiseMultiplicationKernel
859  // Here we are not using the output stage because all operations are done in float
860  // const float cell_to_output_scale = std::pow(2, cell_shift) * lstm_params.cell_to_output_weights()->quantization_info().uniform().scale / lstm_params.output_intermediate_scale();
861  // ARM_COMPUTE_RETURN_ON_ERROR(quantization::calculate_quantized_multiplier(cell_to_output_scale, &gemmlowp_info.gemmlowp_multiplier, &gemmlowp_info.gemmlowp_shift));
862  ARM_COMPUTE_RETURN_ON_ERROR(CLPixelWiseMultiplication::validate(cell_state_out, lstm_params.cell_to_output_weights(), &output_outstage_info, 1.f, ConvertPolicy::SATURATE,
863  RoundingPolicy::TO_ZERO));
864  ARM_COMPUTE_RETURN_ON_ERROR(CLArithmeticAddition::validate(&output_outstage_info, &output_outstage_info, &output_outstage_info, ConvertPolicy::SATURATE));
865  }
866 
867  if(has_layer_norm)
868  {
869  const ITensorInfo *w_info = lstm_params.output_layer_norm_weights();
870  const ITensorInfo *b_info = output_gate_bias;
871  ARM_COMPUTE_RETURN_ON_ERROR(validate_layer_norm(output_outstage_info, *w_info, *b_info));
872  }
873 
874  const TensorInfo output_gate_info(TensorShape(num_units, batch_size), 1, DataType::QSYMM16, sigmoid_tanh_outqinfo);
875  ARM_COMPUTE_RETURN_ON_ERROR(CLActivationLayer::validate(&output_outstage_info, &output_gate_info, ActivationLayerInfo(ActivationLayerInfo::ActivationFunction::LOGISTIC)));
876 
877  // Hidden.
878  ARM_COMPUTE_RETURN_ON_ERROR(CLActivationLayer::validate(cell_state_out, &input_gate_info, ActivationLayerInfo(ActivationLayerInfo::ActivationFunction::TANH, 1.f, 1.f)));
879  const TensorInfo hidden_mul_res(TensorShape(num_units, batch_size), 1, DataType::S32);
880  const TensorInfo hidden_out_info(TensorShape(num_units, batch_size), 1, DataType::QASYMM8_SIGNED);
881 
882  ARM_COMPUTE_RETURN_ERROR_ON(lstm_params.hidden_state_scale() == 0);
883  ARM_COMPUTE_RETURN_ON_ERROR(CLPixelWiseMultiplication::validate(&output_gate_info, &input_gate_info, &hidden_mul_res, 1.f, ConvertPolicy::SATURATE, RoundingPolicy::TO_ZERO));
884  const float hidden_state_scale = std::pow(2, -15) / lstm_params.hidden_state_scale() * std::pow(2, -15);
885  ARM_COMPUTE_RETURN_ON_ERROR(quantization::calculate_quantized_multiplier(hidden_state_scale, &gemmlowp_info.gemmlowp_multiplier, &gemmlowp_info.gemmlowp_shift, /* ignore_epsilon */ true));
886  gemmlowp_info.gemmlowp_offset = lstm_params.hidden_state_zero();
887  gemmlowp_info.output_data_type = hidden_out_info.data_type();
888  ARM_COMPUTE_RETURN_ON_ERROR(CLGEMMLowpOutputStage::validate(&hidden_mul_res, nullptr, &hidden_out_info, gemmlowp_info));
889 
890  const bool projection_tensor_copy_required = num_units != output_size;
891 
892  // Projection.
893  if(lstm_params.has_projection())
894  {
896  ARM_COMPUTE_RETURN_ERROR_ON(qoutput_state_in.scale == 0);
897 
898  const UniformQuantizationInfo qprojection = lstm_params.projection_weights()->quantization_info().uniform();
899  const float projection_scale = qprojection.scale * lstm_params.hidden_state_scale() / qoutput_state_in.scale;
900  ARM_COMPUTE_RETURN_ON_ERROR(quantization::calculate_quantized_multiplier(projection_scale, &gemmlowp_info.gemmlowp_multiplier, &gemmlowp_info.gemmlowp_shift));
901  gemmlowp_info.gemmlowp_offset = qoutput_state_in.offset;
902  gemmlowp_info.gemmlowp_min_bound = std::numeric_limits<int8_t>::lowest();
903  gemmlowp_info.gemmlowp_max_bound = std::numeric_limits<int8_t>::max();
904  gemmlowp_info.output_data_type = DataType::QASYMM8_SIGNED;
905 
906  const TensorInfo projection_outstage_info(*output_state_out);
907  const TensorInfo projection_weights_transposed(TensorShape(output_size, num_units), 1, lstm_params.projection_weights()->data_type(), lstm_params.projection_weights()->quantization_info());
908 
909  TensorInfo projection_mm_out_info{ mm_out_info };
910  projection_mm_out_info.set_tensor_shape(TensorShape(output_size, batch_size));
911 
912  ARM_COMPUTE_RETURN_ON_ERROR(validate_mm(gemmlowp_info, &hidden_out_info, &projection_weights_transposed, &projection_eff_bias_info, projection_scale, &projection_mm_out_info,
913  &projection_outstage_info));
914 
915  if(projection_tensor_copy_required)
916  {
917  ARM_COMPUTE_RETURN_ON_ERROR(CLQLSTMLayer::TensorCopyKernel::validate(*output_state_in, projection_outstage_info));
918  }
919 
920  ARM_COMPUTE_RETURN_ON_ERROR(CLArithmeticAddition::validate(output_state_out, output_state_out, output_state_out, ConvertPolicy::SATURATE));
921 
922  if(projection_tensor_copy_required)
923  {
924  ARM_COMPUTE_RETURN_ON_ERROR(CLQLSTMLayer::TensorCopyKernel::validate(projection_outstage_info, *output_state_out));
925  }
926 
927  int8_t quantized_projection_clip{ 0 };
928  if(lstm_params.projection_clip() > 0.0f)
929  {
930  quantized_projection_clip = quantize_qasymm8_signed(lstm_params.projection_clip(), qprojection);
931  }
932 
933  if(quantized_projection_clip > 0)
934  {
935  ARM_COMPUTE_RETURN_ON_ERROR(CLActivationLayer::validate(output_state_out, nullptr, ActivationLayerInfo(ActivationLayerInfo::ActivationFunction::LU_BOUNDED_RELU, -quantized_projection_clip,
936  quantized_projection_clip)));
937  }
938  }
939  else
940  {
941  if(projection_tensor_copy_required)
942  {
943  ARM_COMPUTE_RETURN_ON_ERROR(CLQLSTMLayer::TensorCopyKernel::validate(hidden_out_info, *output_state_out));
944  }
945  }
946 
947  if(cell_state_out->total_size() > 0)
948  {
949  ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES(cell_state_in, cell_state_out);
950  ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_SHAPES(cell_state_in, cell_state_out);
951  }
952 
953  if(output_state_out->total_size() > 0)
954  {
955  ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES(output_state_in, output_state_out);
956  ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_SHAPES(output_state_in, output_state_out);
957  }
958 
959  ARM_COMPUTE_RETURN_ON_ERROR(CLCopy::validate(output_state_out, output));
960  return Status{};
961 }
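
Each effective scale computed in the body above (input_to_forget_scale, recurrent_to_forget_scale, projection_scale, ...) is decomposed by quantization::calculate_quantized_multiplier() into the integer gemmlowp_multiplier/gemmlowp_shift pair consumed by the GEMMLowp output stage. A sketch of this style of fixed-point decomposition, for illustration only (not the library's implementation):

    #include <cmath>
    #include <cstdint>

    // Express 'multiplier' as q * 2^exponent with q in [0.5, 1), then store q
    // as a Q0.31 fixed-point integer; the runtime multiply-and-shift
    // approximates the real-valued rescaling.
    void decompose_multiplier(float multiplier, int32_t &quant_multiplier, int32_t &shift)
    {
        int exponent = 0;
        const double q = std::frexp(multiplier, &exponent); // multiplier = q * 2^exponent
        int64_t q_fixed = static_cast<int64_t>(std::round(q * (1ll << 31)));
        if(q_fixed == (1ll << 31)) // rounding may push q up to exactly 1.0
        {
            q_fixed /= 2;
            ++exponent;
        }
        quant_multiplier = static_cast<int32_t>(q_fixed);
        shift            = exponent; // negative => right shift at runtime
    }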

The documentation for this class was generated from the following files:

  • CLQLSTMLayer.h
  • CLQLSTMLayer.cpp