Compute Library
 22.11
CLQLSTMLayer Class Reference

Basic function to run CLQLSTMLayer.

#include <CLQLSTMLayer.h>


Public Member Functions

 CLQLSTMLayer (std::shared_ptr< IMemoryManager > memory_manager=nullptr)
 	Default constructor.
 
 CLQLSTMLayer (const CLQLSTMLayer &)=delete
 	Prevent instances of this class from being copied (as this class contains pointers).
 
 CLQLSTMLayer (CLQLSTMLayer &&)=default
 	Default move constructor.
 
CLQLSTMLayer & operator= (const CLQLSTMLayer &)=delete
 	Prevent instances of this class from being copied (as this class contains pointers).
 
CLQLSTMLayer & operator= (CLQLSTMLayer &&)=default
 	Default move assignment operator.
 
 ~CLQLSTMLayer ()
 	Default destructor.
 
void configure (const ICLTensor *input, const ICLTensor *input_to_forget_weights, const ICLTensor *input_to_cell_weights, const ICLTensor *input_to_output_weights, const ICLTensor *recurrent_to_forget_weights, const ICLTensor *recurrent_to_cell_weights, const ICLTensor *recurrent_to_output_weights, const ICLTensor *forget_gate_bias, const ICLTensor *cell_bias, const ICLTensor *output_gate_bias, ICLTensor *cell_state_in, ICLTensor *output_state_in, ICLTensor *cell_state_out, ICLTensor *output_state_out, ICLTensor *output, const LSTMParams< ICLTensor > &lstm_params)
 	Initialize the function's tensors.
 
void configure (const CLCompileContext &compile_context, const ICLTensor *input, const ICLTensor *input_to_forget_weights, const ICLTensor *input_to_cell_weights, const ICLTensor *input_to_output_weights, const ICLTensor *recurrent_to_forget_weights, const ICLTensor *recurrent_to_cell_weights, const ICLTensor *recurrent_to_output_weights, const ICLTensor *forget_gate_bias, const ICLTensor *cell_bias, const ICLTensor *output_gate_bias, ICLTensor *cell_state_in, ICLTensor *output_state_in, ICLTensor *cell_state_out, ICLTensor *output_state_out, ICLTensor *output, const LSTMParams< ICLTensor > &lstm_params)
 	Initialize the function's tensors.
 
void run () override
 	Run the kernels contained in the function.
 
void prepare () override
 	Prepare the function for execution.
 
- Public Member Functions inherited from IFunction
virtual ~IFunction ()=default
 	Destructor.
 

Static Public Member Functions

static Status validate (const ITensorInfo *input, const ITensorInfo *input_to_forget_weights, const ITensorInfo *input_to_cell_weights, const ITensorInfo *input_to_output_weights, const ITensorInfo *recurrent_to_forget_weights, const ITensorInfo *recurrent_to_cell_weights, const ITensorInfo *recurrent_to_output_weights, const ITensorInfo *forget_gate_bias, const ITensorInfo *cell_bias, const ITensorInfo *output_gate_bias, const ITensorInfo *cell_state_in, const ITensorInfo *output_state_in, const ITensorInfo *cell_state_out, const ITensorInfo *output_state_out, const ITensorInfo *output, const LSTMParams< ITensorInfo > &lstm_params)
 	Static function to check if the given info will lead to a valid configuration of CLQLSTMLayer.
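
Since configure() asserts on invalid configurations, callers can probe support ahead of time with validate(). The following is a brief sketch, not taken from the library's examples; all *_info variables are hypothetical const ITensorInfo * objects describing the tensors documented under configure(), and lstm_params_info is an LSTMParams< ITensorInfo > populated the same way as the runtime LSTMParams:

    #include <iostream>

    // Hypothetical ITensorInfo pointers mirroring the configure() arguments.
    const Status status = CLQLSTMLayer::validate(input_info, i2f_info, i2c_info, i2o_info,
                                                 r2f_info, r2c_info, r2o_info,
                                                 forget_bias_info, cell_bias_info, output_bias_info,
                                                 cell_in_info, output_in_info,
                                                 cell_out_info, output_out_info, out_info,
                                                 lstm_params_info);
    if(!bool(status))
    {
        std::cout << status.error_description() << std::endl; // Why the configuration is unsupported.
    }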
 

Detailed Description

Basic function to run CLQLSTMLayer.

This function calls the following CL functions/kernels:

  1. CLActivationLayer: Activation functions (tanh and logistic)
  2. CLCopy: Copy function for copying output_state_out to output
  3. CLArithmeticAddition: Elementwise addition and subtraction
  4. CLGEMMLowpMatrixMultiplyCore: Quantized matrix multiplication core. Accumulators are 32-bit integers
  5. CLGEMMLowpOutputStage: Converts the 32-bit integer accumulators into QSYMM16
  6. opencl::kernels::ClGemmLowpMatrixAReductionKernel: Precomputes the effective biases used by the matrix multiplications
  7. CLPixelWiseMultiplication: Elementwise multiplication
  8. CLTranspose: Transpose function for reshaping the weights

Definition at line 66 of file CLQLSTMLayer.h.
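
The snippet below is a minimal usage sketch, not taken from the library's examples; the shapes, quantization parameters, and variable names are illustrative assumptions. It configures a CIFG QLSTM (no input-gate tensors set on LSTMParams) without peephole, projection, or layer normalization, allocates the tensors, and runs one step:

    #include "arm_compute/core/Types.h"
    #include "arm_compute/runtime/CL/CLScheduler.h"
    #include "arm_compute/runtime/CL/CLTensor.h"
    #include "arm_compute/runtime/CL/functions/CLQLSTMLayer.h"
    #include "arm_compute/runtime/common/LSTMParams.h"
    #include <initializer_list>

    using namespace arm_compute;

    int main()
    {
        CLScheduler::get().default_init(); // Create the default OpenCL context and queue.

        // Illustrative dimensions (num_units == output_size, so no projection is needed).
        const unsigned int input_size = 32, num_units = 16, output_size = 16, batch_size = 1;

        // Helper to initialize tensor metadata.
        auto init = [](CLTensor &t, const TensorShape &shape, DataType dt, const QuantizationInfo &qi = QuantizationInfo())
        {
            t.allocator()->init(TensorInfo(shape, 1, dt, qi));
        };

        // Placeholder quantization parameters.
        const QuantizationInfo qweights(1.f / 128.f);
        const QuantizationInfo qstate(1.f / 128.f, 0);
        const QuantizationInfo qcell(1.f / 2048.f);

        CLTensor input, i2f_w, i2c_w, i2o_w, r2f_w, r2c_w, r2o_w, f_bias, c_bias, o_bias;
        CLTensor cell_state_in, output_state_in, cell_state_out, output_state_out, output;

        init(input, TensorShape(input_size, batch_size), DataType::QASYMM8_SIGNED, qstate);
        for(CLTensor *w : { &i2f_w, &i2c_w, &i2o_w })
            init(*w, TensorShape(input_size, num_units), DataType::QSYMM8, qweights);
        for(CLTensor *w : { &r2f_w, &r2c_w, &r2o_w })
            init(*w, TensorShape(output_size, num_units), DataType::QSYMM8, qweights);
        for(CLTensor *b : { &f_bias, &c_bias, &o_bias })
            init(*b, TensorShape(num_units), DataType::S32);
        init(cell_state_in, TensorShape(num_units, batch_size), DataType::QSYMM16, qcell);
        init(cell_state_out, TensorShape(num_units, batch_size), DataType::QSYMM16, qcell);
        init(output_state_in, TensorShape(output_size, batch_size), DataType::QASYMM8_SIGNED, qstate);
        init(output_state_out, TensorShape(output_size, batch_size), DataType::QASYMM8_SIGNED, qstate);
        init(output, TensorShape(output_size, batch_size), DataType::QASYMM8_SIGNED, qstate);

        // Mandatory intermediate/hidden-state quantization parameters (placeholder values).
        LSTMParams<ICLTensor> lstm_params;
        lstm_params.set_matmul_scale_params(0.007f, 0.007f, 0.007f, 0.007f)
            .set_hidden_state_params(0, 0.007f);

        CLQLSTMLayer qlstm;
        qlstm.configure(&input, &i2f_w, &i2c_w, &i2o_w, &r2f_w, &r2c_w, &r2o_w,
                        &f_bias, &c_bias, &o_bias,
                        &cell_state_in, &output_state_in, &cell_state_out, &output_state_out, &output,
                        lstm_params);

        // Allocate backing memory; real code would then fill input, weights and biases.
        for(CLTensor *t : { &input, &i2f_w, &i2c_w, &i2o_w, &r2f_w, &r2c_w, &r2o_w, &f_bias, &c_bias, &o_bias,
                            &cell_state_in, &output_state_in, &cell_state_out, &output_state_out, &output })
        {
            t->allocator()->allocate();
        }

        qlstm.run();                         // Enqueues the kernels; does not block.
        CLScheduler::get().queue().finish(); // Wait for the results.
        return 0;
    }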

Constructor & Destructor Documentation

◆ CLQLSTMLayer() [1/3]

CLQLSTMLayer ( std::shared_ptr< IMemoryManager > memory_manager = nullptr )

Default constructor.

Definition at line 94 of file CLQLSTMLayer.cpp.

References CLTensorAllocator::allocate(), CLTensor::allocator(), ARM_COMPUTE_ERROR_ON, arm_compute::quantization::calculate_quantized_multiplier(), CLQLSTMLayerNormalizationKernel::configure(), CLGEMMLowpOutputStage::configure(), CLGEMMLowpMatrixMultiplyCore::configure(), GEMMLowpOutputStageInfo::gemmlowp_multiplier, GEMMLowpOutputStageInfo::gemmlowp_shift, ITensor::info(), ITensorAllocator::init(), MemoryGroup::manage(), CLQLSTMLayerNormalizationKernel::validate(), and CLQLSTMLayer::~CLQLSTMLayer().

94 CLQLSTMLayer::CLQLSTMLayer(std::shared_ptr<IMemoryManager> memory_manager)
95  : _input_to_input_reduction(std::make_unique<ClGemmLowpMatrixAReductionKernel>()),
96  _recurrent_to_input_reduction(std::make_unique<ClGemmLowpMatrixAReductionKernel>()),
97  _input_to_forget_reduction(std::make_unique<ClGemmLowpMatrixAReductionKernel>()),
98  _recurrent_to_forget_reduction(std::make_unique<ClGemmLowpMatrixAReductionKernel>()),
99  _input_to_cell_reduction(std::make_unique<ClGemmLowpMatrixAReductionKernel>()),
100  _recurrent_to_cell_reduction(std::make_unique<ClGemmLowpMatrixAReductionKernel>()),
101  _input_to_output_reduction(std::make_unique<ClGemmLowpMatrixAReductionKernel>()),
102  _recurrent_to_output_reduction(std::make_unique<ClGemmLowpMatrixAReductionKernel>()),
103  _projection_reduction(std::make_unique<ClGemmLowpMatrixAReductionKernel>()),
104  _layer_norms(),
105  _copy_output()
106 {
107  for(auto &norm : _layer_norms)
108  {
109  norm = std::make_unique<CLQLSTMLayerNormalizationKernel>();
110  }
111 
112  _memory_group = MemoryGroup(std::move(memory_manager));
113 }
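
For illustration, a memory manager can be created and passed in so that this function's intermediate tensors share memory with other functions. The following sketch uses the standard Compute Library memory-manager setup; the setup pattern itself is an assumption of this note, not part of CLQLSTMLayer's documentation:

    #include "arm_compute/runtime/BlobLifetimeManager.h"
    #include "arm_compute/runtime/CL/CLBufferAllocator.h"
    #include "arm_compute/runtime/MemoryManagerOnDemand.h"
    #include "arm_compute/runtime/PoolManager.h"
    #include <memory>

    using namespace arm_compute;

    auto lifetime_mgr = std::make_shared<BlobLifetimeManager>();
    auto pool_mgr     = std::make_shared<PoolManager>();
    auto memory_mgr   = std::make_shared<MemoryManagerOnDemand>(lifetime_mgr, pool_mgr);

    CLQLSTMLayer qlstm(memory_mgr); // Intermediate buffers are now managed by memory_mgr.

    // ... qlstm.configure(...) and tensor allocation as usual ...

    CLBufferAllocator allocator;
    memory_mgr->populate(allocator, 1 /* num_pools */); // Back the manager with OpenCL buffers before run().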

◆ CLQLSTMLayer() [2/3]

CLQLSTMLayer ( const CLQLSTMLayer &  )
delete

Prevent instances of this class from being copied (As this class contains pointers)

◆ CLQLSTMLayer() [3/3]

CLQLSTMLayer ( CLQLSTMLayer &&  )
default

Default move constructor.

◆ ~CLQLSTMLayer()

~CLQLSTMLayer ( )
default

Default destructor.

Referenced by CLQLSTMLayer::CLQLSTMLayer().

Member Function Documentation

◆ configure() [1/2]

void configure ( const ICLTensor * input,
const ICLTensor * input_to_forget_weights,
const ICLTensor * input_to_cell_weights,
const ICLTensor * input_to_output_weights,
const ICLTensor * recurrent_to_forget_weights,
const ICLTensor * recurrent_to_cell_weights,
const ICLTensor * recurrent_to_output_weights,
const ICLTensor * forget_gate_bias,
const ICLTensor * cell_bias,
const ICLTensor * output_gate_bias,
ICLTensor * cell_state_in,
ICLTensor * output_state_in,
ICLTensor * cell_state_out,
ICLTensor * output_state_out,
ICLTensor * output,
const LSTMParams< ICLTensor > &  lstm_params 
)

Initialize the function's tensors.

Valid data layouts:

  • All

Valid data type configurations:

  src0             src1 - src6   src7 - src9   src10     src11            dst0      dst1 - dst2
  QASYMM8_SIGNED   QASYMM8       S32           QSYMM16   QASYMM8_SIGNED   QSYMM16   QASYMM8_SIGNED
Parameters
    [in]  input                        Source tensor. Input is a 2D tensor with dimensions [input_size, batch_size]. Data types supported: QASYMM8_SIGNED.
    [in]  input_to_forget_weights      2D weights tensor with dimensions [input_size, num_units]. Data type supported: QSYMM8.
    [in]  input_to_cell_weights        2D weights tensor with dimensions [input_size, num_units]. Data type supported: QSYMM8.
    [in]  input_to_output_weights      2D weights tensor with dimensions [input_size, num_units]. Data type supported: QSYMM8.
    [in]  recurrent_to_forget_weights  2D weights tensor with dimensions [output_size, num_units]. Data type supported: QSYMM8.
    [in]  recurrent_to_cell_weights    2D weights tensor with dimensions [output_size, num_units]. Data type supported: QSYMM8.
    [in]  recurrent_to_output_weights  2D weights tensor with dimensions [output_size, num_units]. Data type supported: QSYMM8.
    [in]  forget_gate_bias             1D weights tensor with dimensions [num_units]. Data type supported: S32.
    [in]  cell_bias                    1D weights tensor with dimensions [num_units]. Data type supported: S32.
    [in]  output_gate_bias             1D weights tensor with dimensions [num_units]. Data type supported: S32.
    [in]  cell_state_in                2D tensor with dimensions [num_units, batch_size]. Data type supported: QSYMM16.
    [in]  output_state_in              2D tensor with dimensions [output_size, batch_size]. Data type supported: Same as input.
    [out] cell_state_out               Destination tensor. Output is a 2D tensor with dimensions [num_units, batch_size]. Data type supported: QSYMM16.
    [out] output_state_out             Destination tensor. Output is a 2D tensor with dimensions [output_size, batch_size]. Data types supported: Same as input.
    [out] output                       Destination tensor. Output is a 2D tensor with dimensions [output_size, batch_size]. Data types supported: Same as input.
    [in]  lstm_params                  Weights tensors used in peephole, CIFG and layer normalization optimizations:
          input_intermediate_scale     Scale of the intermediate result of matmul, i.e. input to layer normalization, at input gate.
          forget_intermediate_scale    Scale of the intermediate result of matmul, i.e. input to layer normalization, at forget gate.
          cell_intermediate_scale      Scale of the intermediate result of matmul, i.e. input to layer normalization, at cell gate.
          output_intermediate_scale    Scale of the intermediate result of matmul, i.e. input to layer normalization, at output gate.
          hidden_state_zero            The zero point of the hidden state.
          hidden_state_scale           The scale of the hidden state.
          input_to_input_weights       (Optional) 2D weights tensor with dimensions [input_size, num_units]. Data type supported: QSYMM8.
          recurrent_to_input_weights   (Optional) 2D weights tensor with dimensions [output_size, num_units]. Data type supported: QSYMM8.
          cell_to_input_weights        (Optional) 1D weights tensor with dimensions [num_units]. Can be nullptr. Data type supported: QSYMM16.
          cell_to_forget_weights       (Optional) 1D weights tensor with dimensions [num_units]. Data type supported: QSYMM16.
          cell_to_output_weights       (Optional) 1D weights tensor with dimensions [num_units]. Data type supported: QSYMM16.
          input_gate_bias              (Optional) 1D weights tensor with dimensions [num_units]. Data type supported: S32.
          projection_weights           (Optional) 2D weights tensor with dimensions [output_size, num_units]. Data type supported: QSYMM8.
          projection_bias              (Optional) 1D weights tensor with dimensions [output_size]. Data type supported: S32.
          input_layer_norm_weights     (Optional) 1D weights tensor with dimensions [num_units]. Data type supported: QSYMM16.
          forget_layer_norm_weights    (Optional) 1D weights tensor with dimensions [num_units]. Data type supported: QSYMM16.
          cell_layer_norm_weights      (Optional) 1D weights tensor with dimensions [num_units]. Data type supported: QSYMM16.
          output_layer_norm_weights    (Optional) 1D weights tensor with dimensions [num_units]. Data type supported: QSYMM16.
          cell_threshold               (Optional) The clipping threshold for the cell state, such that values are bound within [-cell_clip, cell_clip]. If set to 0.0 then clipping is disabled.
          projection_threshold         (Optional) The clipping threshold for the output from the projection layer, such that values are bound within [-proj_clip, proj_clip]. If set to 0.0 then clipping is disabled.
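
The optional blocks of lstm_params are supplied through its setter methods, which can be chained. A hedged sketch (all tensor variables and numeric values are placeholders, not library defaults) for a non-CIFG configuration with layer normalization, projection, and clipping:

    LSTMParams<ICLTensor> lstm_params;
    lstm_params.set_matmul_scale_params(0.007f /* input */, 0.007f /* forget */,
                                        0.007f /* cell */, 0.007f /* output */)
        .set_hidden_state_params(0 /* hidden_state_zero */, 0.007f /* hidden_state_scale */)
        // Despite its name, set_cifg_params() provides the input-gate tensors and
        // thereby disables the CIFG optimization.
        .set_cifg_params(&i2i_weights, &r2i_weights, nullptr /* cell_to_input_weights */, &input_gate_bias)
        .set_layer_normalization_params(&input_ln_w, &forget_ln_w, &cell_ln_w, &output_ln_w)
        .set_projection_params(&projection_weights, &projection_bias)
        .set_cell_clip_params(1.0f)        // cell_threshold
        .set_projection_clip_params(0.5f); // projection_threshold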

Definition at line 159 of file CLQLSTMLayer.cpp.

References CLKernelLibrary::get().

166 {
167  configure(CLKernelLibrary::get().get_compile_context(), input, input_to_forget_weights, input_to_cell_weights, input_to_output_weights, recurrent_to_forget_weights,
168  recurrent_to_cell_weights, recurrent_to_output_weights, forget_gate_bias, cell_bias, output_gate_bias,
169  cell_state_in, output_state_in, cell_state_out, output_state_out, output, lstm_params);
170 }

◆ configure() [2/2]

void configure ( const CLCompileContext & compile_context,
const ICLTensor * input,
const ICLTensor * input_to_forget_weights,
const ICLTensor * input_to_cell_weights,
const ICLTensor * input_to_output_weights,
const ICLTensor * recurrent_to_forget_weights,
const ICLTensor * recurrent_to_cell_weights,
const ICLTensor * recurrent_to_output_weights,
const ICLTensor * forget_gate_bias,
const ICLTensor * cell_bias,
const ICLTensor * output_gate_bias,
ICLTensor * cell_state_in,
ICLTensor * output_state_in,
ICLTensor * cell_state_out,
ICLTensor * output_state_out,
ICLTensor * output,
const LSTMParams< ICLTensor > &  lstm_params 
)

Initialize the function's tensors.

Parameters
    [in]  compile_context              The compile context to be used.
    [in]  input                        Source tensor. Input is a 2D tensor with dimensions [input_size, batch_size]. Data types supported: QASYMM8_SIGNED.
    [in]  input_to_forget_weights      2D weights tensor with dimensions [input_size, num_units]. Data type supported: QSYMM8.
    [in]  input_to_cell_weights        2D weights tensor with dimensions [input_size, num_units]. Data type supported: QSYMM8.
    [in]  input_to_output_weights      2D weights tensor with dimensions [input_size, num_units]. Data type supported: QSYMM8.
    [in]  recurrent_to_forget_weights  2D weights tensor with dimensions [output_size, num_units]. Data type supported: QSYMM8.
    [in]  recurrent_to_cell_weights    2D weights tensor with dimensions [output_size, num_units]. Data type supported: QSYMM8.
    [in]  recurrent_to_output_weights  2D weights tensor with dimensions [output_size, num_units]. Data type supported: QSYMM8.
    [in]  forget_gate_bias             1D weights tensor with dimensions [num_units]. Data type supported: S32.
    [in]  cell_bias                    1D weights tensor with dimensions [num_units]. Data type supported: S32.
    [in]  output_gate_bias             1D weights tensor with dimensions [num_units]. Data type supported: S32.
    [in]  cell_state_in                2D tensor with dimensions [num_units, batch_size]. Data type supported: QSYMM16.
    [in]  output_state_in              2D tensor with dimensions [output_size, batch_size]. Data type supported: Same as input.
    [out] cell_state_out               Destination tensor. Output is a 2D tensor with dimensions [num_units, batch_size]. Data type supported: QSYMM16.
    [out] output_state_out             Destination tensor. Output is a 2D tensor with dimensions [output_size, batch_size]. Data types supported: Same as input.
    [out] output                       Destination tensor. Output is a 2D tensor with dimensions [output_size, batch_size]. Data types supported: Same as input.
    [in]  lstm_params                  Weights tensors used in peephole, CIFG and layer normalization optimizations:
          input_intermediate_scale     Scale of the intermediate result of matmul, i.e. input to layer normalization, at input gate.
          forget_intermediate_scale    Scale of the intermediate result of matmul, i.e. input to layer normalization, at forget gate.
          cell_intermediate_scale      Scale of the intermediate result of matmul, i.e. input to layer normalization, at cell gate.
          output_intermediate_scale    Scale of the intermediate result of matmul, i.e. input to layer normalization, at output gate.
          hidden_state_zero            The zero point of the hidden state.
          hidden_state_scale           The scale of the hidden state.
          input_to_input_weights       (Optional) 2D weights tensor with dimensions [input_size, num_units]. Data type supported: QSYMM8.
          recurrent_to_input_weights   (Optional) 2D weights tensor with dimensions [output_size, num_units]. Data type supported: QSYMM8.
          cell_to_input_weights        (Optional) 1D weights tensor with dimensions [num_units]. Can be nullptr. Data type supported: QSYMM16.
          cell_to_forget_weights       (Optional) 1D weights tensor with dimensions [num_units]. Data type supported: QSYMM16.
          cell_to_output_weights       (Optional) 1D weights tensor with dimensions [num_units]. Data type supported: QSYMM16.
          input_gate_bias              (Optional) 1D weights tensor with dimensions [num_units]. Data type supported: S32.
          projection_weights           (Optional) 2D weights tensor with dimensions [output_size, num_units]. Data type supported: QSYMM8.
          projection_bias              (Optional) 1D weights tensor with dimensions [output_size]. Data type supported: S32.
          input_layer_norm_weights     (Optional) 1D weights tensor with dimensions [num_units]. Data type supported: QSYMM16.
          forget_layer_norm_weights    (Optional) 1D weights tensor with dimensions [num_units]. Data type supported: QSYMM16.
          cell_layer_norm_weights      (Optional) 1D weights tensor with dimensions [num_units]. Data type supported: QSYMM16.
          output_layer_norm_weights    (Optional) 1D weights tensor with dimensions [num_units]. Data type supported: QSYMM16.
          cell_threshold               (Optional) The clipping threshold for the cell state, such that values are bound within [-cell_clip, cell_clip]. If set to 0.0 then clipping is disabled.
          projection_threshold         (Optional) The clipping threshold for the output from the projection layer, such that values are bound within [-proj_clip, proj_clip]. If set to 0.0 then clipping is disabled.

Definition at line 172 of file CLQLSTMLayer.cpp.

References CLTensorAllocator::allocate(), CLTensor::allocator(), ARM_COMPUTE_ERROR_ON_NULLPTR, ARM_COMPUTE_ERROR_THROW_ON, ARM_COMPUTE_LOG_PARAMS, arm_compute::utils::info_helpers::build_lstm_params_tensor_info(), arm_compute::quantization::calculate_quantized_multiplier(), LSTMParams< T >::cell_clip(), LSTMParams< T >::cell_intermediate_scale(), LSTMParams< T >::cell_layer_norm_weights(), LSTMParams< T >::cell_to_forget_weights(), LSTMParams< T >::cell_to_input_weights(), LSTMParams< T >::cell_to_output_weights(), CLTranspose::configure(), CLCopy::configure(), CLActivationLayer::configure(), CLArithmeticAddition::configure(), CLPixelWiseMultiplication::configure(), CLGEMMLowpOutputStage::configure(), CLArithmeticSubtraction::configure(), ITensorInfo::data_type(), ITensorInfo::dimension(), LSTMParams< T >::forget_intermediate_scale(), LSTMParams< T >::forget_layer_norm_weights(), GEMMLowpOutputStageInfo::gemmlowp_max_bound, GEMMLowpOutputStageInfo::gemmlowp_min_bound, GEMMLowpOutputStageInfo::gemmlowp_multiplier, GEMMLowpOutputStageInfo::gemmlowp_offset, GEMMLowpOutputStageInfo::gemmlowp_shift, LSTMParams< T >::has_cifg_opt(), LSTMParams< T >::has_peephole_opt(), LSTMParams< T >::has_projection(), LSTMParams< T >::hidden_state_scale(), LSTMParams< T >::hidden_state_zero(), ITensor::info(), CLTensor::info(), ITensorAllocator::init(), LSTMParams< T >::input_gate_bias(), LSTMParams< T >::input_intermediate_scale(), LSTMParams< T >::input_layer_norm_weights(), arm_compute::test::validation::input_to_cell_weights, arm_compute::test::validation::input_to_forget_weights, LSTMParams< T >::input_to_input_weights(), arm_compute::test::validation::input_to_output_weights, ActivationLayerInfo::LOGISTIC, arm_compute::support::cpp11::lowest(), ActivationLayerInfo::LU_BOUNDED_RELU, MemoryGroup::manage(), UniformQuantizationInfo::offset, GEMMLowpOutputStageInfo::output_data_type, LSTMParams< T >::output_intermediate_scale(), LSTMParams< T >::output_layer_norm_weights(), arm_compute::test::validation::output_size, LSTMParams< T >::projection_bias(), LSTMParams< T >::projection_clip(), LSTMParams< T >::projection_weights(), arm_compute::QASYMM8_SIGNED, arm_compute::QSYMM16, ITensorInfo::quantization_info(), TensorInfo::quantization_info(), arm_compute::QUANTIZE_DOWN_FIXEDPOINT, arm_compute::quantize_qsymm16(), arm_compute::test::validation::recurrent_to_cell_weights, arm_compute::test::validation::recurrent_to_forget_weights, LSTMParams< T >::recurrent_to_input_weights(), arm_compute::test::validation::recurrent_to_output_weights, arm_compute::S32, arm_compute::SATURATE, UniformQuantizationInfo::scale, TensorInfo::set_tensor_shape(), ActivationLayerInfo::TANH, ITensorInfo::tensor_shape(), TensorInfo::tensor_shape(), arm_compute::TO_ZERO, GEMMLowpOutputStageInfo::type, QuantizationInfo::uniform(), LSTMParams< T >::use_layer_norm(), and CLQLSTMLayer::validate().

179 {
180  ARM_COMPUTE_ERROR_ON_NULLPTR(input, input_to_forget_weights, input_to_cell_weights, input_to_output_weights, recurrent_to_forget_weights,
181  recurrent_to_cell_weights, recurrent_to_output_weights,
182  forget_gate_bias, cell_bias, output_gate_bias, cell_state_in, output_state_in,
183  cell_state_out, output_state_out, output);
184 
185  ARM_COMPUTE_LOG_PARAMS(input, input_to_forget_weights, input_to_cell_weights, input_to_output_weights, recurrent_to_forget_weights,
186  recurrent_to_cell_weights, recurrent_to_output_weights,
187  forget_gate_bias, cell_bias, output_gate_bias, cell_state_in, output_state_in,
188  cell_state_out, output_state_out, output, lstm_params);
189  // Set lstm parameters
190  LSTMParams<ITensorInfo> lstm_params_info{};
191  build_lstm_params_tensor_info(lstm_params, &lstm_params_info);
192 
193  // Validate
194  ARM_COMPUTE_ERROR_THROW_ON(validate(input->info(), input_to_forget_weights->info(), input_to_cell_weights->info(), input_to_output_weights->info(),
195  recurrent_to_forget_weights->info(), recurrent_to_cell_weights->info(), recurrent_to_output_weights->info(),
196  forget_gate_bias->info(), cell_bias->info(), output_gate_bias->info(),
197  cell_state_in->info(), output_state_in->info(), cell_state_out->info(), output_state_out->info(), output->info(),
198  lstm_params_info));
199 
200  const int batch_size = input->info()->dimension(1);
201  const int num_units = input_to_output_weights->info()->dimension(1);
202  const int output_size = output_state_out->info()->dimension(_out_state_output_size_dimension_idx);
203 
204  const UniformQuantizationInfo qinput = input->info()->quantization_info().uniform();
205  const UniformQuantizationInfo qcell_state_in = cell_state_in->info()->quantization_info().uniform();
206  const UniformQuantizationInfo qoutput_state_in = output_state_in->info()->quantization_info().uniform();
207 
208  _projection_bias = lstm_params.projection_bias();
209  _input_to_forget_weights = input_to_forget_weights;
210  _input_to_cell_weights = input_to_cell_weights;
211  _input_to_output_weights = input_to_output_weights;
212  _recurrent_to_forget_weights = recurrent_to_forget_weights;
213  _recurrent_to_cell_weights = recurrent_to_cell_weights;
214  _recurrent_to_output_weights = recurrent_to_output_weights;
215  _projection_weights = lstm_params.projection_weights();
216 
217  // Layer normalization
218  _has_layer_norm = lstm_params.use_layer_norm();
219  if(_has_layer_norm)
220  {
221  set_layer_norm_weight(lstm_params.forget_layer_norm_weights(), LayerNormGate::Forget);
222  set_layer_norm_weight(lstm_params.cell_layer_norm_weights(), LayerNormGate::Cell);
223  set_layer_norm_weight(lstm_params.input_layer_norm_weights(), LayerNormGate::Input);
224  set_layer_norm_weight(lstm_params.output_layer_norm_weights(), LayerNormGate::Output);
225 
226  set_layer_norm_bias(forget_gate_bias, LayerNormGate::Forget);
227  set_layer_norm_bias(cell_bias, LayerNormGate::Cell);
228  set_layer_norm_bias(lstm_params.input_gate_bias(), LayerNormGate::Input);
229  set_layer_norm_bias(output_gate_bias, LayerNormGate::Output);
230  }
231 
232  _has_cifg = lstm_params.has_cifg_opt();
233  _has_projection = lstm_params.has_projection();
234  _has_peephole = lstm_params.has_peephole_opt();
235 
236  // Calculate and decompose effective scales for optimizing matmul calculation
237  const int32_t cell_shift = log2(qcell_state_in.scale);
238 
239  // Calculate quantized parameters for clipping.
240  int16_t quantized_cell_clip = 0;
241  if(lstm_params.cell_clip() > 0.0f)
242  {
243  quantized_cell_clip = quantize_qsymm16(lstm_params.cell_clip(), qcell_state_in);
244  }
245  _has_cell_clipping = quantized_cell_clip > 0;
246 
247  // Precompute effective bias for optimizing the matmul computations.
248  if(!_has_cifg)
249  {
250  _input_to_input_weights = lstm_params.input_to_input_weights();
251  _recurrent_to_input_weights = lstm_params.recurrent_to_input_weights();
252 
253  _input_to_input_reduction->configure(compile_context, _input_to_input_weights->info(), _input_to_input_eff_bias.info(), GEMMLowpReductionKernelInfo(num_units, false, -qinput.offset, true));
254  _recurrent_to_input_reduction->configure(compile_context, _recurrent_to_input_weights->info(), _recurrent_to_input_eff_bias.info(), GEMMLowpReductionKernelInfo(num_units, false,
255  -qoutput_state_in.offset, true));
256  }
257  _input_to_forget_reduction->configure(compile_context, input_to_forget_weights->info(), _input_to_forget_eff_bias.info(), GEMMLowpReductionKernelInfo(num_units, false, -qinput.offset, true));
258  _recurrent_to_forget_reduction->configure(compile_context, recurrent_to_forget_weights->info(), _recurrent_to_forget_eff_bias.info(), GEMMLowpReductionKernelInfo(num_units, false,
259  -qoutput_state_in.offset, true));
260  _input_to_cell_reduction->configure(compile_context, input_to_cell_weights->info(), _input_to_cell_eff_bias.info(), GEMMLowpReductionKernelInfo(num_units, false, -qinput.offset, true));
261  _recurrent_to_cell_reduction->configure(compile_context, recurrent_to_cell_weights->info(), _recurrent_to_cell_eff_bias.info(), GEMMLowpReductionKernelInfo(num_units, false, -qoutput_state_in.offset,
262  true));
263  _input_to_output_reduction->configure(compile_context, input_to_output_weights->info(), _input_to_output_eff_bias.info(), GEMMLowpReductionKernelInfo(num_units, false, -qinput.offset, true));
264  _recurrent_to_output_reduction->configure(compile_context, recurrent_to_output_weights->info(), _recurrent_to_output_eff_bias.info(), GEMMLowpReductionKernelInfo(num_units, false,
265  -qoutput_state_in.offset, true));
266  if(_has_projection)
267  {
268  _projection_reduction->configure(compile_context, _projection_weights->info(), _projection_eff_bias.info(), GEMMLowpReductionKernelInfo(output_size, false, lstm_params.hidden_state_zero(), true));
269  if(_projection_bias != nullptr)
270  {
271  _projection_bias_add.configure(compile_context, _projection_bias, &_projection_eff_bias, &_projection_eff_bias, ConvertPolicy::SATURATE);
272  }
273  }
274 
275  // Pre-transpose weights to be used in GEMM.
276  _transpose_input_to_forget_weights.configure(compile_context, input_to_forget_weights, &_input_to_forget_weights_transposed);
277  _transpose_input_to_cell_weights.configure(compile_context, input_to_cell_weights, &_input_to_cell_weights_transposed);
278  _transpose_input_to_output_weights.configure(compile_context, input_to_output_weights, &_input_to_output_weights_transposed);
279  _transpose_recurrent_to_forget_weights.configure(compile_context, recurrent_to_forget_weights, &_recurrent_to_forget_weights_transposed);
280  _transpose_recurrent_to_cell_weights.configure(compile_context, recurrent_to_cell_weights, &_recurrent_to_cell_weights_transposed);
281  _transpose_recurrent_to_output_weights.configure(compile_context, recurrent_to_output_weights, &_recurrent_to_output_weights_transposed);
282  if(!_has_cifg)
283  {
284  _transpose_input_to_input_weights.configure(compile_context, lstm_params.input_to_input_weights(), &_input_to_input_weights_transposed);
285  _transpose_recurrent_to_input_weights.configure(compile_context, lstm_params.recurrent_to_input_weights(), &_recurrent_to_input_weights_transposed);
286  }
287  if(_has_projection)
288  {
289  _transpose_projection_weights.configure(compile_context, _projection_weights, &_projection_weights_transposed);
290  }
291 
292  GEMMLowpOutputStageInfo gemmlowp_info;
293  gemmlowp_info.type = GEMMLowpOutputStageType::QUANTIZE_DOWN_FIXEDPOINT;
294  gemmlowp_info.gemmlowp_min_bound = std::numeric_limits<int16_t>::lowest();
295  gemmlowp_info.gemmlowp_max_bound = std::numeric_limits<int16_t>::max();
296  gemmlowp_info.output_data_type = DataType::QSYMM16;
297 
298  const TensorInfo mm_out_info(TensorShape(num_units, batch_size), 1, DataType::S32);
299  // Forget gate.
300  const TensorInfo forget_gate_outstage_info(mm_out_info.tensor_shape(), 1, DataType::QSYMM16, QuantizationInfo(lstm_params.forget_intermediate_scale(), 0));
301  const float input_to_forget_scale = input_to_forget_weights->info()->quantization_info().uniform().scale * qinput.scale / lstm_params.forget_intermediate_scale();
302  configure_mm(compile_context, _mm_input_to_forget, _input_to_forget_outstage, gemmlowp_info,
303  input, &_input_to_forget_weights_transposed, &_input_to_forget_eff_bias,
304  &_mm_input_to_forget_res, &_input_to_forget_outstage_res, input_to_forget_scale,
305  mm_out_info, forget_gate_outstage_info);
306 
307  const float recurrent_to_forget_scale = recurrent_to_forget_weights->info()->quantization_info().uniform().scale * qoutput_state_in.scale / lstm_params.forget_intermediate_scale();
308  configure_mm(compile_context, _mm_recurrent_to_forget, _recurrent_to_forget_outstage, gemmlowp_info,
309  output_state_in, &_recurrent_to_forget_weights_transposed, &_recurrent_to_forget_eff_bias,
310  &_mm_recurrent_to_forget_res, &_recurrent_to_forget_outstage_res, recurrent_to_forget_scale,
311  mm_out_info, forget_gate_outstage_info);
312 
313  _accumulate_input_recurrent_forget.configure(compile_context, &_input_to_forget_outstage_res, &_recurrent_to_forget_outstage_res, &_recurrent_to_forget_outstage_res,
314  ConvertPolicy::SATURATE);
315  _input_to_forget_outstage_res.allocator()->allocate();
316 
317  if(_has_peephole)
318  {
319  _mul_cell_to_forget_res.allocator()->init(TensorInfo(cell_state_in->info()->tensor_shape(), 1, DataType::S32));
320  _memory_group.manage(&_mul_cell_to_forget_res);
321  _pixelwise_mul_cell_to_forget.configure(compile_context, cell_state_in, lstm_params.cell_to_forget_weights(), &_mul_cell_to_forget_res, 1.f, ConvertPolicy::SATURATE, RoundingPolicy::TO_ZERO);
322  _cell_to_forget_outstage_res.allocator()->init(TensorInfo(_mul_cell_to_forget_res.info()->tensor_shape(), 1, DataType::QSYMM16, QuantizationInfo(lstm_params.forget_intermediate_scale(), 0)));
323  _memory_group.manage(&_cell_to_forget_outstage_res);
324  const float cell_to_forget_scale = std::pow(2, cell_shift) * lstm_params.cell_to_forget_weights()->info()->quantization_info().uniform().scale / lstm_params.forget_intermediate_scale();
325  quantization::calculate_quantized_multiplier(cell_to_forget_scale, &gemmlowp_info.gemmlowp_multiplier, &gemmlowp_info.gemmlowp_shift);
326  _cell_to_forget_outstage.configure(compile_context, &_mul_cell_to_forget_res, nullptr, &_cell_to_forget_outstage_res, gemmlowp_info);
327  _mul_cell_to_forget_res.allocator()->allocate();
328  _accumulate_cell_forget.configure(compile_context, &_recurrent_to_forget_outstage_res, &_cell_to_forget_outstage_res, &_recurrent_to_forget_outstage_res,
329  ConvertPolicy::SATURATE);
330  _cell_to_forget_outstage_res.allocator()->allocate();
331  }
332 
333  CLTensor *forget_activation_input = &_recurrent_to_forget_outstage_res;
334 
335  if(_has_layer_norm)
336  {
337  configure_layer_norm(LayerNormGate::Forget, &_recurrent_to_forget_outstage_res);
338  _recurrent_to_forget_outstage_res.allocator()->allocate();
339  forget_activation_input = &get_layer_norm_output(LayerNormGate::Forget);
340  }
341 
342  // Output quantization info of Sigmoid and Tanh activations
343  const QuantizationInfo sigmoid_tanh_outqinfo(1.f / 32768.f, 0);
344 
345  const TensorInfo forget_gate_info(TensorShape(num_units, batch_size), 1, DataType::QSYMM16, sigmoid_tanh_outqinfo);
346  _memory_group.manage(&_forget_gate);
347  _forget_gate.allocator()->init(forget_gate_info);
348  _forget_gate_sigmoid.configure(compile_context, forget_activation_input, &_forget_gate, ActivationLayerInfo(ActivationLayerInfo::ActivationFunction::LOGISTIC));
349  forget_activation_input->allocator()->allocate();
350 
351  // Modulation gate.
352  const TensorInfo cell_outstage_info(mm_out_info.tensor_shape(), 1, DataType::QSYMM16, QuantizationInfo(lstm_params.cell_intermediate_scale(), 0));
353  const float input_to_cell_scale = input_to_cell_weights->info()->quantization_info().uniform().scale * qinput.scale / lstm_params.cell_intermediate_scale();
354  configure_mm(compile_context, _mm_input_to_cell, _input_to_cell_outstage, gemmlowp_info,
355  input, &_input_to_cell_weights_transposed, &_input_to_cell_eff_bias,
356  &_mm_input_to_cell_res, &_input_to_cell_outstage_res, input_to_cell_scale,
357  mm_out_info, cell_outstage_info);
358 
359  const float recurrent_to_cell_scale = recurrent_to_cell_weights->info()->quantization_info().uniform().scale * qoutput_state_in.scale / lstm_params.cell_intermediate_scale();
360  configure_mm(compile_context, _mm_recurrent_to_cell, _recurrent_to_cell_outstage, gemmlowp_info,
361  output_state_in, &_recurrent_to_cell_weights_transposed, &_recurrent_to_cell_eff_bias,
362  &_mm_recurrent_to_cell_res, &_recurrent_to_cell_outstage_res, recurrent_to_cell_scale,
363  mm_out_info, cell_outstage_info);
364 
365  _accumulate_input_recurrent_modulation.configure(compile_context, &_input_to_cell_outstage_res, &_recurrent_to_cell_outstage_res, &_recurrent_to_cell_outstage_res,
366  ConvertPolicy::SATURATE);
367  _input_to_cell_outstage_res.allocator()->allocate();
368 
369  CLTensor *cell_activation_input = &_recurrent_to_cell_outstage_res;
370 
371  if(_has_layer_norm)
372  {
373  configure_layer_norm(LayerNormGate::Cell, &_recurrent_to_cell_outstage_res);
374  _recurrent_to_cell_outstage_res.allocator()->allocate();
375  cell_activation_input = &get_layer_norm_output(LayerNormGate::Cell);
376  }
377 
378  const TensorInfo cell_gate_info(TensorShape(num_units, batch_size), 1, DataType::QSYMM16, sigmoid_tanh_outqinfo);
379  _memory_group.manage(&_cell_gate);
380  _cell_gate.allocator()->init(cell_gate_info);
381  _cell_gate_tanh.configure(compile_context, cell_activation_input, &_cell_gate, ActivationLayerInfo(ActivationLayerInfo::ActivationFunction::TANH, 1.f, 1.f));
382  cell_activation_input->allocator()->allocate();
383 
384  // Input gate.
385  const TensorInfo input_gate_info(TensorShape(num_units, batch_size), 1, DataType::QSYMM16, sigmoid_tanh_outqinfo);
386  _input_gate.allocator()->init(input_gate_info);
387  _memory_group.manage(&_input_gate);
388  if(_has_cifg)
389  {
390  _ones.allocator()->init(*_forget_gate.info());
391  _input_gate_sub.configure(compile_context, &_ones, &_forget_gate, &_input_gate, ConvertPolicy::SATURATE);
392  _ones.allocator()->allocate();
393  }
394  else
395  {
396  const TensorInfo input_outstage_info(TensorShape(num_units, batch_size), 1, DataType::QSYMM16, QuantizationInfo(lstm_params.input_intermediate_scale(), 0));
397  const float input_to_input_scale = _input_to_input_weights->info()->quantization_info().uniform().scale * qinput.scale / lstm_params.input_intermediate_scale();
398  configure_mm(compile_context, _mm_input_to_input, _input_to_input_outstage, gemmlowp_info,
399  input, &_input_to_input_weights_transposed, &_input_to_input_eff_bias,
400  &_mm_input_to_input_res, &_input_to_input_outstage_res, input_to_input_scale,
401  mm_out_info, input_outstage_info);
402 
403  const float recurrent_to_input_scale = _recurrent_to_input_weights->info()->quantization_info().uniform().scale * qoutput_state_in.scale / lstm_params.input_intermediate_scale();
404  configure_mm(compile_context, _mm_recurrent_to_input, _recurrent_to_input_outstage, gemmlowp_info,
405  output_state_in, &_recurrent_to_input_weights_transposed, &_recurrent_to_input_eff_bias,
406  &_mm_recurrent_to_input_res, &_recurrent_to_input_outstage_res, recurrent_to_input_scale,
407  mm_out_info, input_outstage_info);
408  _accumulate_input_recurrent_input.configure(compile_context, &_input_to_input_outstage_res, &_recurrent_to_input_outstage_res, &_recurrent_to_input_outstage_res,
409  ConvertPolicy::SATURATE);
410  _input_to_input_outstage_res.allocator()->allocate();
411 
412  if(_has_peephole)
413  {
414  _mul_cell_to_input_res.allocator()->init(TensorInfo(cell_state_in->info()->tensor_shape(), 1, DataType::S32));
415  _memory_group.manage(&_mul_cell_to_input_res);
416  _pixelwise_mul_cell_to_input.configure(compile_context, cell_state_in, lstm_params.cell_to_input_weights(), &_mul_cell_to_input_res, 1.f, ConvertPolicy::SATURATE, RoundingPolicy::TO_ZERO);
417  const float cell_to_input_scale = std::pow(2, cell_shift) * lstm_params.cell_to_input_weights()->info()->quantization_info().uniform().scale / lstm_params.input_intermediate_scale();
418  quantization::calculate_quantized_multiplier(cell_to_input_scale, &gemmlowp_info.gemmlowp_multiplier, &gemmlowp_info.gemmlowp_shift);
419  _cell_to_input_outstage_res.allocator()->init(TensorInfo(_mul_cell_to_input_res.info()->tensor_shape(), 1, DataType::QSYMM16, QuantizationInfo(lstm_params.input_intermediate_scale(), 0)));
420  _memory_group.manage(&_cell_to_input_outstage_res);
421  _cell_to_input_outstage.configure(compile_context, &_mul_cell_to_input_res, nullptr, &_cell_to_input_outstage_res, gemmlowp_info);
422  _mul_cell_to_input_res.allocator()->allocate();
423  _accumulate_cell_input.configure(&_recurrent_to_input_outstage_res, &_cell_to_input_outstage_res, &_recurrent_to_input_outstage_res, ConvertPolicy::SATURATE);
424  _cell_to_input_outstage_res.allocator()->allocate();
425  }
426 
427  CLTensor *input_activation_input = &_recurrent_to_input_outstage_res;
428 
429  if(_has_layer_norm)
430  {
431  configure_layer_norm(LayerNormGate::Input, &_recurrent_to_input_outstage_res);
432  _recurrent_to_input_outstage_res.allocator()->allocate();
433  input_activation_input = &get_layer_norm_output(LayerNormGate::Input);
434  }
435 
436  _input_gate_sigmoid.configure(compile_context, input_activation_input, &_input_gate, ActivationLayerInfo(ActivationLayerInfo::ActivationFunction::LOGISTIC));
437  input_activation_input->allocator()->allocate();
438  }
439  // Cell.
440  // TODO(COMPMID-3396): Perform multiplication in the quantized domain in CLPixelWiseMultiplication
441  _pixelwise_mul_forget_cell.configure(compile_context, &_forget_gate, cell_state_in, &_forget_gate, 1.f, ConvertPolicy::SATURATE, RoundingPolicy::TO_ZERO);
442  const float cell_gate_scale = _cell_gate.info()->quantization_info().uniform().scale;
443  const float mul_input_cell_scale = cell_gate_scale * std::pow(2, 15 + cell_shift);
444  const TensorInfo mul_input_cell_info(TensorShape(num_units, batch_size), 1, DataType::QSYMM16, QuantizationInfo(mul_input_cell_scale, 0));
445  _memory_group.manage(&_mul_input_cell_res);
446  _mul_input_cell_res.allocator()->init(mul_input_cell_info);
447  _pixelwise_mul_input_cell.configure(compile_context, &_input_gate, &_cell_gate, &_mul_input_cell_res, 1.f, ConvertPolicy::SATURATE, RoundingPolicy::TO_ZERO);
448  _cell_gate.allocator()->allocate();
449  _add_forget_cell.configure(compile_context, &_forget_gate, &_mul_input_cell_res, cell_state_out, ConvertPolicy::SATURATE);
450  _mul_input_cell_res.allocator()->allocate();
451  _forget_gate.allocator()->allocate();
452  if(_has_cell_clipping)
453  {
454  _cell_clip.configure(compile_context, cell_state_out, nullptr, ActivationLayerInfo(ActivationLayerInfo::ActivationFunction::LU_BOUNDED_RELU, -quantized_cell_clip, quantized_cell_clip));
455  }
456  // Output gate.
457  const TensorInfo output_outstage_info(TensorShape(num_units, batch_size), 1, DataType::QSYMM16, QuantizationInfo(lstm_params.output_intermediate_scale(), 0));
458  const float input_to_output_scale = input_to_output_weights->info()->quantization_info().uniform().scale * qinput.scale / lstm_params.output_intermediate_scale();
459  configure_mm(compile_context, _mm_input_to_output, _input_to_output_outstage, gemmlowp_info,
460  input, &_input_to_output_weights_transposed, &_input_to_output_eff_bias,
461  &_mm_input_to_output_res, &_input_to_output_outstage_res, input_to_output_scale,
462  mm_out_info, output_outstage_info);
463 
464  const float recurrent_to_output_scale = recurrent_to_output_weights->info()->quantization_info().uniform().scale * qoutput_state_in.scale / lstm_params.output_intermediate_scale();
465  configure_mm(compile_context, _mm_recurrent_to_output, _recurrent_to_output_outstage, gemmlowp_info,
466  output_state_in, &_recurrent_to_output_weights_transposed, &_recurrent_to_output_eff_bias,
467  &_mm_recurrent_to_output_res, &_recurrent_to_output_outstage_res, recurrent_to_output_scale,
468  mm_out_info, output_outstage_info);
469 
470  _accumulate_input_recurrent_output.configure(compile_context, &_recurrent_to_output_outstage_res, &_input_to_output_outstage_res, &_recurrent_to_output_outstage_res,
471  ConvertPolicy::SATURATE);
472  _input_to_output_outstage_res.allocator()->allocate();
473 
474  if(_has_peephole)
475  {
476  // TODO(COMPMID-3396): Perform multiplication in the quantized domain in CLPixelWiseMultiplication
477  // Here we are not using the output stage because all operations are done in float
478  _mul_cell_to_output_res.allocator()->init(TensorInfo(cell_state_out->info()->tensor_shape(), 1, DataType::S32));
479  _memory_group.manage(&_mul_cell_to_output_res);
480  _pixelwise_mul_cell_to_output.configure(compile_context, cell_state_out, lstm_params.cell_to_output_weights(), &_mul_cell_to_output_res, 1.f, ConvertPolicy::SATURATE, RoundingPolicy::TO_ZERO);
481 
482  const float cell_to_output_scale = std::pow(2, cell_shift) * lstm_params.cell_to_output_weights()->info()->quantization_info().uniform().scale / lstm_params.output_intermediate_scale();
483  quantization::calculate_quantized_multiplier(cell_to_output_scale, &gemmlowp_info.gemmlowp_multiplier, &gemmlowp_info.gemmlowp_shift);
484  _cell_to_output_outstage_res.allocator()->init(TensorInfo(_mul_cell_to_output_res.info()->tensor_shape(), 1, DataType::QSYMM16, QuantizationInfo(lstm_params.output_intermediate_scale(), 0)));
485  _memory_group.manage(&_cell_to_output_outstage_res);
486  _cell_to_output_outstage.configure(compile_context, &_mul_cell_to_output_res, nullptr, &_cell_to_output_outstage_res, gemmlowp_info);
487  _mul_cell_to_output_res.allocator()->allocate();
488 
489  _accumulate_cell_to_output.configure(compile_context, &_recurrent_to_output_outstage_res, &_cell_to_output_outstage_res, &_recurrent_to_output_outstage_res,
490  ConvertPolicy::SATURATE);
491  _cell_to_output_outstage_res.allocator()->allocate();
492  }
493 
494  CLTensor *output_activation_input = &_recurrent_to_output_outstage_res;
495 
496  if(_has_layer_norm)
497  {
498  configure_layer_norm(LayerNormGate::Output, &_recurrent_to_output_outstage_res);
499  _recurrent_to_output_outstage_res.allocator()->allocate();
500  output_activation_input = &get_layer_norm_output(LayerNormGate::Output);
501  }
502 
503  const TensorInfo output_gate_info(TensorShape(num_units, batch_size), 1, DataType::QSYMM16, sigmoid_tanh_outqinfo);
504  _memory_group.manage(&_output_gate);
505  _output_gate.allocator()->init(output_gate_info);
506  _output_gate_sigmoid.configure(compile_context, output_activation_input, &_output_gate, ActivationLayerInfo(ActivationLayerInfo::ActivationFunction::LOGISTIC));
507  output_activation_input->allocator()->allocate();
508 
509  // Hidden.
510  _hidden_tanh.configure(compile_context, cell_state_out, &_input_gate, ActivationLayerInfo(ActivationLayerInfo::ActivationFunction::TANH, 1.f, 1.f));
511  // TODO(COMPMID-3396): Perform multiplication in the quantized domain in CLPixelWiseMultiplication
512  _memory_group.manage(&_hidden_mul_res);
513  const TensorInfo hidden_mul_res(_input_gate.info()->tensor_shape(), 1, DataType::S32);
514  _hidden_mul_res.allocator()->init(hidden_mul_res);
515  _pixelwise_mul_hidden.configure(compile_context, &_output_gate, &_input_gate, &_hidden_mul_res, 1.f, ConvertPolicy::SATURATE, RoundingPolicy::TO_ZERO);
516  _output_gate.allocator()->allocate();
517  _input_gate.allocator()->allocate();
518  const float hidden_state_scale = std::pow(2, -15) / lstm_params.hidden_state_scale() * std::pow(2, -15);
519  quantization::calculate_quantized_multiplier(hidden_state_scale, &gemmlowp_info.gemmlowp_multiplier, &gemmlowp_info.gemmlowp_shift, /* ignore_epsilon */ true);
520  gemmlowp_info.gemmlowp_offset = lstm_params.hidden_state_zero();
521  gemmlowp_info.output_data_type = output_state_in->info()->data_type();
522 
523  _projection_tensor_copy_required = (num_units != output_size);
524  ICLTensor *hidden_gate_result = output_state_out;
525 
526  _memory_group.manage(&_hidden_gate);
527 
528  if(_projection_tensor_copy_required)
529  {
530  _hidden_gate.allocator()->init(*output_state_out->info());
531  _hidden_gate.info()->set_tensor_shape(_hidden_mul_res.info()->tensor_shape());
532  hidden_gate_result = &_hidden_gate;
533  }
534 
535  _hidden_outstage.configure(compile_context, &_hidden_mul_res, nullptr, hidden_gate_result, gemmlowp_info);
536  _hidden_mul_res.allocator()->allocate();
537 
538  // Projection.
539  if(_has_projection)
540  {
541  const TensorInfo projection_outstage_info(*output_state_out->info());
542  const UniformQuantizationInfo qprojection = _projection_weights->info()->quantization_info().uniform();
543  const float projection_scale = qprojection.scale * lstm_params.hidden_state_scale() / qoutput_state_in.scale;
544  gemmlowp_info.gemmlowp_offset = qoutput_state_in.offset;
545  gemmlowp_info.gemmlowp_min_bound = std::numeric_limits<int8_t>::lowest();
546  gemmlowp_info.gemmlowp_max_bound = std::numeric_limits<int8_t>::max();
547  gemmlowp_info.output_data_type = DataType::QASYMM8_SIGNED;
548 
549  TensorInfo projection_mm_out_info{ mm_out_info };
550  projection_mm_out_info.set_tensor_shape(TensorShape(output_size, batch_size));
551 
552  configure_mm(compile_context, _mm_projection, _projection_outstage, gemmlowp_info,
553  hidden_gate_result, &_projection_weights_transposed, &_projection_eff_bias,
554  &_mm_projection_res, &_projection_outstage_res, projection_scale,
555  projection_mm_out_info, projection_outstage_info);
556 
557  ICLTensor *accumulate_destination = output_state_out;
558 
559  if(_projection_tensor_copy_required)
560  {
561  _hidden_gate.allocator()->allocate();
562  _projection_accumulate_res.allocator()->init(*output_state_in->info());
563  _projection_accumulate_res.info()->set_tensor_shape(_projection_outstage_res.info()->tensor_shape());
564  _projection_output_to_accumulate_copy.configure(*output_state_in, _projection_accumulate_res);
565  accumulate_destination = &_projection_accumulate_res;
566  }
567 
568  _accumulate_projection.configure(compile_context, &_projection_outstage_res, accumulate_destination, accumulate_destination, ConvertPolicy::SATURATE);
569  _projection_outstage_res.allocator()->allocate();
570 
571  if(_projection_tensor_copy_required)
572  {
573  _projection_accumulate_to_output_copy.configure(_projection_accumulate_res, *output_state_out);
574  _projection_accumulate_res.allocator()->allocate();
575  }
576 
577  int8_t quantized_projection_clip{ 0 };
578  if(lstm_params.projection_clip() > 0.0f)
579  {
580  quantized_projection_clip = utility::clamp<int8_t>(lstm_params.projection_clip() / qprojection.scale, -128, 127);
581  }
582 
583  if(quantized_projection_clip > 0)
584  {
585  _projection_clip.configure(compile_context, output_state_out, nullptr, ActivationLayerInfo(ActivationLayerInfo::ActivationFunction::LU_BOUNDED_RELU, -quantized_projection_clip,
586  quantized_projection_clip));
587  _has_projection_clipping = true;
588  }
589  }
590  else
591  {
592  if(_projection_tensor_copy_required)
593  {
594  _hidden_to_output_copy.configure(_hidden_gate, *output_state_out);
595  _hidden_gate.allocator()->allocate();
596  }
597  }
598 
599  // Copy output_state_out to output
600  _copy_output.configure(compile_context, output_state_out, output);
601 }

◆ operator=() [1/2]

CLQLSTMLayer & operator= ( const CLQLSTMLayer &  )
delete

Prevent instances of this class from being copied (As this class contains pointers)

◆ operator=() [2/2]

CLQLSTMLayer & operator= ( CLQLSTMLayer &&  )
default

Default move assignment operator.

◆ prepare()

void prepare ( )
override virtual

Prepare the function for execution.

Any one-off pre-processing steps required by the function are handled here.

Note
The prepare stage may not need all of the function's buffers' backing memory to be available in order to execute.

Reimplemented from IFunction.

Definition at line 1109 of file CLQLSTMLayer.cpp.

References arm_compute::ACL_DST, arm_compute::ACL_SRC, CLTensorAllocator::allocate(), CLTensor::allocator(), ICLTensor::buffer(), TensorInfo::element_size(), CLScheduler::enqueue_op(), CLScheduler::get(), CLTensor::info(), CLTensor::map(), ITensor::mark_as_unused(), CLScheduler::queue(), CLTranspose::run(), CLArithmeticAddition::run(), TensorInfo::total_size(), and CLTensor::unmap().

Referenced by CLQLSTMLayer::run().
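
In practice this means an explicit prepare() call can move the one-off cost (weight transposes, effective-bias reductions) out of a latency-sensitive loop; run() would otherwise trigger it on the first call. A sketch, assuming qlstm has been configured as in the earlier example and its tensors allocated:

    qlstm.prepare();                     // One-off: transpose weights, precompute effective biases.
    CLScheduler::get().queue().finish(); // Optionally wait until preparation has completed.

    const int num_steps = 100;           // Placeholder step count.
    for(int step = 0; step < num_steps; ++step)
    {
        // ... update `input` (and, between steps, the state tensors) ...
        qlstm.run();                     // No preparation cost inside the loop.
    }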

1109 void CLQLSTMLayer::prepare()
1110 {
1111  if(!_is_prepared)
1112  {
1113  // Pre-transpose weights to be used in GEMM.
1114  _input_to_forget_weights_transposed.allocator()->allocate();
1115  _input_to_cell_weights_transposed.allocator()->allocate();
1116  _input_to_output_weights_transposed.allocator()->allocate();
1117  _recurrent_to_forget_weights_transposed.allocator()->allocate();
1118  _recurrent_to_cell_weights_transposed.allocator()->allocate();
1119  _recurrent_to_output_weights_transposed.allocator()->allocate();
1120  _transpose_input_to_forget_weights.run();
1121  _transpose_input_to_cell_weights.run();
1122  _transpose_input_to_output_weights.run();
1123  _transpose_recurrent_to_forget_weights.run();
1124  _transpose_recurrent_to_cell_weights.run();
1125  _transpose_recurrent_to_output_weights.run();
1126 
1127  // Precompute effective biases
1128  if(_has_cifg)
1129  {
1130  _ones.map(true);
1131  std::fill_n(reinterpret_cast<int16_t *>(_ones.buffer()), _ones.info()->total_size() / _ones.info()->element_size(), 32767);
1132  _ones.unmap();
1133  }
1134  else
1135  {
1136  _input_to_input_eff_bias.allocator()->allocate();
1137  _recurrent_to_input_eff_bias.allocator()->allocate();
1138 
1139  ITensorPack input_to_input_red_pack = { { ACL_SRC, _input_to_input_weights }, { ACL_DST, &_input_to_input_eff_bias } };
1140  CLScheduler::get().enqueue_op(*_input_to_input_reduction, input_to_input_red_pack, false);
1141 
1142  ITensorPack rec_to_input_red_pack = { { ACL_SRC, _recurrent_to_input_weights }, { ACL_DST, &_recurrent_to_input_eff_bias } };
1143  CLScheduler::get().enqueue_op(*_recurrent_to_input_reduction, rec_to_input_red_pack, false);
1144 
1145  _input_to_input_weights_transposed.allocator()->allocate();
1146  _recurrent_to_input_weights_transposed.allocator()->allocate();
1147  _transpose_input_to_input_weights.run();
1148  _transpose_recurrent_to_input_weights.run();
1149  _input_to_input_weights->mark_as_unused();
1150  _recurrent_to_input_weights->mark_as_unused();
1151  }
1152  _input_to_forget_eff_bias.allocator()->allocate();
1153  _recurrent_to_forget_eff_bias.allocator()->allocate();
1154  _input_to_cell_eff_bias.allocator()->allocate();
1155  _recurrent_to_cell_eff_bias.allocator()->allocate();
1156  _input_to_output_eff_bias.allocator()->allocate();
1157  _recurrent_to_output_eff_bias.allocator()->allocate();
1158 
1159  ITensorPack input_to_forget_red_pack = { { ACL_SRC, _input_to_forget_weights }, { ACL_DST, &_input_to_forget_eff_bias } };
1160  CLScheduler::get().enqueue_op(*_input_to_forget_reduction, input_to_forget_red_pack, false);
1161 
1162  ITensorPack rec_to_forget_red_pack = { { ACL_SRC, _recurrent_to_forget_weights }, { ACL_DST, &_recurrent_to_forget_eff_bias } };
1163  CLScheduler::get().enqueue_op(*_recurrent_to_forget_reduction, rec_to_forget_red_pack, false);
1164 
1165  ITensorPack input_to_cell_red_pack = { { ACL_SRC, _input_to_cell_weights }, { ACL_DST, &_input_to_cell_eff_bias } };
1166  CLScheduler::get().enqueue_op(*_input_to_cell_reduction, input_to_cell_red_pack, false);
1167 
1168  ITensorPack rec_to_cell_red_pack = { { ACL_SRC, _recurrent_to_cell_weights }, { ACL_DST, &_recurrent_to_cell_eff_bias } };
1169  CLScheduler::get().enqueue_op(*_recurrent_to_cell_reduction, rec_to_cell_red_pack, false);
1170 
1171  ITensorPack input_to_output_red_pack = { { ACL_SRC, _input_to_output_weights }, { ACL_DST, &_input_to_output_eff_bias } };
1172  CLScheduler::get().enqueue_op(*_input_to_output_reduction, input_to_output_red_pack, false);
1173 
1174  ITensorPack rec_to_output_red_pack = { { ACL_SRC, _recurrent_to_output_weights }, { ACL_DST, &_recurrent_to_output_eff_bias } };
1175  CLScheduler::get().enqueue_op(*_recurrent_to_output_reduction, rec_to_output_red_pack, false);
1176 
1177  if(_has_projection)
1178  {
1179  _projection_eff_bias.allocator()->allocate();
1180  ITensorPack proj_red_pack{ { ACL_SRC, _projection_weights }, { ACL_DST, &_projection_eff_bias } };
1181  CLScheduler::get().enqueue_op(*_projection_reduction, proj_red_pack, false);
1182  if(_projection_bias != nullptr)
1183  {
1184  _projection_bias_add.run();
1185  _projection_bias->mark_as_unused();
1186  }
1187 
1188  _projection_weights_transposed.allocator()->allocate();
1189  _transpose_projection_weights.run();
1190  _projection_weights->mark_as_unused();
1191 
1192  if(!_projection_tensor_copy_required)
1193  {
1194  _hidden_gate.mark_as_unused();
1195  _projection_accumulate_res.mark_as_unused();
1196  }
1197  }
1198 
1199  // Mark weights as unused
1200  _input_to_forget_weights->mark_as_unused();
1201  _input_to_cell_weights->mark_as_unused();
1202  _input_to_output_weights->mark_as_unused();
1203  _recurrent_to_forget_weights->mark_as_unused();
1204  _recurrent_to_cell_weights->mark_as_unused();
1205  _recurrent_to_output_weights->mark_as_unused();
1206 
1207  CLScheduler::get().queue().finish();
1208  _is_prepared = true;
1209  }
1210 }

◆ run()

void run ( )
override virtual

Run the kernels contained in the function.

For CPU kernels:

  • Multi-threading is used for the kernels which are parallelisable.
  • By default std::thread::hardware_concurrency() threads are used.
Note
CPPScheduler::set_num_threads() can be used to manually set the number of threads

For OpenCL kernels:

  • All the kernels are enqueued on the queue associated with CLScheduler.
  • The queue is then flushed.
Note
The function will not block until the kernels are executed. It is the user's responsibility to wait.
The function will call prepare() on the first run if it has not already been done.
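
Because the OpenCL work is enqueued asynchronously, a host-side reader must synchronize before touching the outputs. A minimal sketch, assuming a CLQLSTMLayer instance named qlstm that has already been configured (the name and the tensor setup are hypothetical):

qlstm.run();                          // enqueue and flush the kernels
CLScheduler::get().queue().finish();  // block until the enqueued work has completed
// Output tensors can now be mapped and read on the host.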

Implements IFunction.

Definition at line 968 of file CLQLSTMLayer.cpp.

References CLScheduler::enqueue(), CLScheduler::get(), CLQLSTMLayer::prepare(), CLCopy::run(), CLActivationLayer::run(), CLGEMMLowpOutputStage::run(), CLPixelWiseMultiplication::run(), CLGEMMLowpMatrixMultiplyCore::run(), CLArithmeticAddition::run(), and CLArithmeticSubtraction::run().

969 {
970  prepare();
971 
972  // Acquire all the temporaries
973  MemoryGroupResourceScope scope_mg(_memory_group);
974 
975  // Forget gate.
976  _mm_input_to_forget.run();
977  _input_to_forget_outstage.run();
978 
979  _mm_recurrent_to_forget.run();
980  _recurrent_to_forget_outstage.run();
981  _accumulate_input_recurrent_forget.run();
982 
983  if(_has_peephole)
984  {
985  _pixelwise_mul_cell_to_forget.run();
986  _cell_to_forget_outstage.run();
987  _accumulate_cell_forget.run();
988  }
989 
990  if(_has_layer_norm)
991  {
992  CLScheduler::get().enqueue(get_layer_norm(LayerNormGate::Forget));
993  }
994 
995  _forget_gate_sigmoid.run();
996 
997  // Modulation gate.
998  _mm_input_to_cell.run();
999  _input_to_cell_outstage.run();
1000 
1001  _mm_recurrent_to_cell.run();
1002  _recurrent_to_cell_outstage.run();
1003  _accumulate_input_recurrent_modulation.run();
1004 
1005  if(_has_layer_norm)
1006  {
1007  CLScheduler::get().enqueue(get_layer_norm(LayerNormGate::Cell));
1008  }
1009 
1010  _cell_gate_tanh.run();
1011 
1012  // Input gate
1013  if(_has_cifg)
1014  {
1015  _input_gate_sub.run();
1016  }
1017  else
1018  {
1019  _mm_input_to_input.run();
1020  _input_to_input_outstage.run();
1021  _mm_recurrent_to_input.run();
1022  _recurrent_to_input_outstage.run();
1023  _accumulate_input_recurrent_input.run();
1024 
1025  if(_has_peephole)
1026  {
1027  _pixelwise_mul_cell_to_input.run();
1028  _cell_to_input_outstage.run();
1029  _accumulate_cell_input.run();
1030  }
1031 
1032  if(_has_layer_norm)
1033  {
1034  CLScheduler::get().enqueue(get_layer_norm(LayerNormGate::Input));
1035  }
1036 
1037  _input_gate_sigmoid.run();
1038  }
1039 
1040  // Cell.
1041  _pixelwise_mul_forget_cell.run();
1042  _pixelwise_mul_input_cell.run();
1043  _add_forget_cell.run();
1044  if(_has_cell_clipping)
1045  {
1046  _cell_clip.run();
1047  }
1048 
1049  // Output gate.
1050  _mm_input_to_output.run();
1051  _input_to_output_outstage.run();
1052  _mm_recurrent_to_output.run();
1053  _recurrent_to_output_outstage.run();
1054  _accumulate_input_recurrent_output.run();
1055  if(_has_peephole)
1056  {
1057  _pixelwise_mul_cell_to_output.run();
1058  _cell_to_output_outstage.run();
1059  _accumulate_cell_to_output.run();
1060  }
1061 
1062  if(_has_layer_norm)
1063  {
1064  CLScheduler::get().enqueue(get_layer_norm(LayerNormGate::Output));
1065  }
1066 
1067  _output_gate_sigmoid.run();
1068 
1069  // Hidden.
1070  _hidden_tanh.run();
1071  _pixelwise_mul_hidden.run();
1072  _hidden_outstage.run();
1073 
1074  // Projection.
1075  if(_has_projection)
1076  {
1077  _mm_projection.run();
1078  _projection_outstage.run();
1079 
1080  if(_projection_tensor_copy_required)
1081  {
1082  _projection_output_to_accumulate_copy.run();
1083  }
1084 
1085  _accumulate_projection.run();
1086 
1087  if(_projection_tensor_copy_required)
1088  {
1089  _projection_accumulate_to_output_copy.run();
1090  }
1091 
1092  if(_has_projection_clipping)
1093  {
1094  _projection_clip.run();
1095  }
1096  }
1097  else
1098  {
1099  if(_projection_tensor_copy_required)
1100  {
1101  _hidden_to_output_copy.run();
1102  }
1103  }
1104 
1105  // Copy output_state_out to output
1106  _copy_output.run();
1107 }
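
Note that a single run() advances the layer by one timestep, so a sequence is processed by driving the function in a loop and feeding the produced states back in. A sketch of such a driver, where the copy helpers and the tensor names are hypothetical (an application may manage its state buffers differently):

for(unsigned int t = 0; t < num_timesteps; ++t)
{
    fill_input_for_timestep(input, t);              // hypothetical helper writing one step of the sequence
    qlstm.run();
    copy_tensor(cell_state_out, cell_state_in);     // hypothetical: feed states back for step t+1
    copy_tensor(output_state_out, output_state_in);
}
CLScheduler::get().queue().finish();                // wait for all enqueued steps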

◆ validate()

Status validate ( const ITensorInfo *input,
const ITensorInfo *input_to_forget_weights,
const ITensorInfo *input_to_cell_weights,
const ITensorInfo *input_to_output_weights,
const ITensorInfo *recurrent_to_forget_weights,
const ITensorInfo *recurrent_to_cell_weights,
const ITensorInfo *recurrent_to_output_weights,
const ITensorInfo *forget_gate_bias,
const ITensorInfo *cell_bias,
const ITensorInfo *output_gate_bias,
const ITensorInfo *cell_state_in,
const ITensorInfo *output_state_in,
const ITensorInfo *cell_state_out,
const ITensorInfo *output_state_out,
const ITensorInfo *output,
const LSTMParams< ITensorInfo > &lstm_params
)
static

Static function to check if given info will lead to a valid configuration of CLQLSTMLayer.

Parameters
[in] input Source tensor info. Input is a 2D tensor info with dimensions [input_size, batch_size]. Data types supported: QASYMM8_SIGNED.
[in] input_to_forget_weights 2D weights tensor info with dimensions [input_size, num_units]. Data type supported: QSYMM8.
[in] input_to_cell_weights 2D weights tensor info with dimensions [input_size, num_units]. Data type supported: QSYMM8.
[in] input_to_output_weights 2D weights tensor info with dimensions [input_size, num_units]. Data type supported: QSYMM8.
[in] recurrent_to_forget_weights 2D weights tensor info with dimensions [output_size, num_units]. Data type supported: QSYMM8.
[in] recurrent_to_cell_weights 2D weights tensor info with dimensions [output_size, num_units]. Data type supported: QSYMM8.
[in] recurrent_to_output_weights 2D weights tensor info with dimensions [output_size, num_units]. Data type supported: QSYMM8.
[in] forget_gate_bias 1D weights tensor info with dimensions [num_units]. Data type supported: S32.
[in] cell_bias 1D weights tensor info with dimensions [num_units]. Data type supported: S32.
[in] output_gate_bias 1D weights tensor info with dimensions [num_units]. Data type supported: S32.
[in] cell_state_in 2D tensor info with dimensions [num_units, batch_size]. Data type supported: QSYMM16.
[in] output_state_in 2D tensor info with dimensions [output_size, batch_size]. Data type supported: Same as input.
[in] cell_state_out Destination tensor info. Output is a 2D tensor info with dimensions [num_units, batch_size]. Data type supported: QSYMM16.
[in] output_state_out Destination tensor info. Output is a 2D tensor info with dimensions [output_size, batch_size]. Data types supported: Same as input.
[in] output Destination tensor info. Output is a 2D tensor info with dimensions [output_size, batch_size]. Data types supported: Same as input.
[in] lstm_params Weights tensors info used in peephole, CIFG and layer normalization optimizations:
    input_intermediate_scale Scale of the intermediate result of matmul, i.e. input to layer normalization, at input gate.
    forget_intermediate_scale Scale of the intermediate result of matmul, i.e. input to layer normalization, at forget gate.
    cell_intermediate_scale Scale of the intermediate result of matmul, i.e. input to layer normalization, at cell gate.
    output_intermediate_scale Scale of the intermediate result of matmul, i.e. input to layer normalization, at output gate.
    hidden_state_zero The zero point of the hidden state.
    hidden_state_scale The scale of the hidden state.
    input_to_input_weights (Optional) 2D weights tensor with dimensions [input_size, num_units]. Data type supported: QSYMM8.
    recurrent_to_input_weights (Optional) 2D weights tensor with dimensions [output_size, num_units]. Data type supported: QSYMM8.
    cell_to_input_weights (Optional) 1D weights tensor with dimensions [num_units]. Can be nullptr. Data type supported: QSYMM16.
    cell_to_forget_weights (Optional) 1D weights tensor with dimensions [num_units]. Data type supported: QSYMM16.
    cell_to_output_weights (Optional) 1D weights tensor with dimensions [num_units]. Data type supported: QSYMM16.
    input_gate_bias (Optional) 1D weights tensor with dimensions [num_units]. Data type supported: S32.
    projection_weights (Optional) 2D weights tensor with dimensions [output_size, num_units]. Data type supported: QSYMM8.
    projection_bias (Optional) 1D weights tensor with dimensions [output_size]. Data type supported: S32.
    input_layer_norm_weights (Optional) 1D weights tensor with dimensions [num_units]. Data type supported: QSYMM16.
    forget_layer_norm_weights (Optional) 1D weights tensor with dimensions [num_units]. Data type supported: QSYMM16.
    cell_layer_norm_weights (Optional) 1D weights tensor with dimensions [num_units]. Data type supported: QSYMM16.
    output_layer_norm_weights (Optional) 1D weights tensor with dimensions [num_units]. Data type supported: QSYMM16.
    cell_threshold (Optional) The clipping threshold for the cell state, such that values are bound within [-cell_clip, cell_clip]. If set to 0.0 then clipping is disabled.
    projection_threshold (Optional) The clipping threshold for the output from the projection layer, such that values are bound within [-proj_clip, proj_clip]. If set to 0.0 then clipping is disabled.
Returns
a status
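
Since validate() only needs ITensorInfo descriptors, a candidate configuration can be checked before any OpenCL memory is allocated. A condensed sketch under assumed shapes and quantization scales (all values are illustrative, not taken from a real model; a default-constructed LSTMParams enables CIFG, so no input-gate tensors are required):

#include "arm_compute/core/TensorInfo.h"
#include "arm_compute/runtime/CL/functions/CLQLSTMLayer.h"
#include "arm_compute/runtime/common/LSTMParams.h"

using namespace arm_compute;

bool is_config_valid()
{
    const unsigned int input_size = 32, output_size = 16, num_units = 16, batch_size = 1;

    // Illustrative quantization parameters; a real model supplies its own.
    const TensorInfo input(TensorShape(input_size, batch_size), 1, DataType::QASYMM8_SIGNED, QuantizationInfo(0.01f, 0));
    const TensorInfo input_weights(TensorShape(input_size, num_units), 1, DataType::QSYMM8, QuantizationInfo(0.005f));
    const TensorInfo recurrent_weights(TensorShape(output_size, num_units), 1, DataType::QSYMM8, QuantizationInfo(0.005f));
    const TensorInfo bias(TensorShape(num_units), 1, DataType::S32);
    // The cell state scale must be a power of two no greater than 2^-9 (see the cell_shift check below).
    const TensorInfo cell_state(TensorShape(num_units, batch_size), 1, DataType::QSYMM16, QuantizationInfo(1.f / 2048.f, 0));
    const TensorInfo output_state(TensorShape(output_size, batch_size), 1, DataType::QASYMM8_SIGNED, QuantizationInfo(0.01f, 0));

    LSTMParams<ITensorInfo> lstm_params; // CIFG enabled by default: input gate derived from forget gate
    lstm_params.set_matmul_scale_params(0.001f, 0.001f, 0.001f, 0.001f);
    lstm_params.set_hidden_state_params(0, 0.007f);

    const Status status = CLQLSTMLayer::validate(&input, &input_weights, &input_weights, &input_weights,
                                                 &recurrent_weights, &recurrent_weights, &recurrent_weights,
                                                 &bias, &bias, &bias,
                                                 &cell_state, &output_state, &cell_state, &output_state,
                                                 &output_state, lstm_params);
    return bool(status); // true when the configuration is valid; configure() is safe to call
}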

Definition at line 603 of file CLQLSTMLayer.cpp.

References ARM_COMPUTE_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN, ARM_COMPUTE_RETURN_ERROR_ON, ARM_COMPUTE_RETURN_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN, ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES, ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_SHAPES, ARM_COMPUTE_RETURN_ERROR_ON_MSG, ARM_COMPUTE_RETURN_ERROR_ON_NULLPTR, ARM_COMPUTE_RETURN_ON_ERROR, arm_compute::quantization::calculate_quantized_multiplier(), LSTMParams< T >::cell_clip(), LSTMParams< T >::cell_intermediate_scale(), LSTMParams< T >::cell_layer_norm_weights(), LSTMParams< T >::cell_to_forget_weights(), LSTMParams< T >::cell_to_input_weights(), LSTMParams< T >::cell_to_output_weights(), ITensorInfo::data_type(), TensorInfo::data_type(), ITensorInfo::dimension(), arm_compute::test::validation::forget_gate_bias, LSTMParams< T >::forget_intermediate_scale(), LSTMParams< T >::forget_layer_norm_weights(), GEMMLowpOutputStageInfo::gemmlowp_max_bound, GEMMLowpOutputStageInfo::gemmlowp_min_bound, GEMMLowpOutputStageInfo::gemmlowp_multiplier, GEMMLowpOutputStageInfo::gemmlowp_offset, GEMMLowpOutputStageInfo::gemmlowp_shift, LSTMParams< T >::has_cifg_opt(), LSTMParams< T >::has_peephole_opt(), LSTMParams< T >::has_projection(), LSTMParams< T >::hidden_state_scale(), LSTMParams< T >::hidden_state_zero(), LSTMParams< T >::input_gate_bias(), LSTMParams< T >::input_intermediate_scale(), LSTMParams< T >::input_layer_norm_weights(), arm_compute::test::validation::input_size, LSTMParams< T >::input_to_input_weights(), ActivationLayerInfo::LOGISTIC, arm_compute::support::cpp11::lowest(), ActivationLayerInfo::LU_BOUNDED_RELU, ITensorInfo::num_dimensions(), UniformQuantizationInfo::offset, GEMMLowpOutputStageInfo::output_data_type, arm_compute::test::validation::output_gate_bias, LSTMParams< T >::output_intermediate_scale(), LSTMParams< T >::output_layer_norm_weights(), arm_compute::test::validation::output_size, LSTMParams< T >::projection_bias(), LSTMParams< T >::projection_clip(), LSTMParams< T >::projection_weights(), arm_compute::QASYMM8_SIGNED, arm_compute::QSYMM16, arm_compute::QSYMM8, ITensorInfo::quantization_info(), arm_compute::QUANTIZE_DOWN_FIXEDPOINT, arm_compute::quantize_qasymm8_signed(), arm_compute::quantize_qsymm16(), LSTMParams< T >::recurrent_to_input_weights(), arm_compute::S32, arm_compute::SATURATE, UniformQuantizationInfo::scale, TensorInfo::set_tensor_shape(), ActivationLayerInfo::TANH, arm_compute::TO_ZERO, ITensorInfo::total_size(), GEMMLowpOutputStageInfo::type, QuantizationInfo::uniform(), LSTMParams< T >::use_layer_norm(), arm_compute::experimental::dynamic_fusion::validate(), CLTranspose::validate(), ClGemmLowpMatrixAReductionKernel::validate(), CLCopy::validate(), CLActivationLayer::validate(), CLGEMMLowpOutputStage::validate(), CLPixelWiseMultiplication::validate(), CLArithmeticAddition::validate(), and CLArithmeticSubtraction::validate().

Referenced by CLQLSTMLayer::configure().

610 {
611  ARM_COMPUTE_RETURN_ERROR_ON_NULLPTR(input, input_to_forget_weights, input_to_cell_weights, input_to_output_weights, recurrent_to_forget_weights, recurrent_to_cell_weights,
612  recurrent_to_output_weights, forget_gate_bias, cell_bias, output_gate_bias, cell_state_in, output_state_in,
613  cell_state_out, output_state_out, output);
614 
615  ARM_COMPUTE_RETURN_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN(input, 1, DataType::QASYMM8_SIGNED);
616  ARM_COMPUTE_RETURN_ERROR_ON_MSG(input->num_dimensions() != 2, "Input must have exactly 2 dimensions");
617 
618  const unsigned int input_size = input->dimension(0);
619  const unsigned int batch_size = input->dimension(1);
620  const unsigned int num_units = input_to_output_weights->dimension(1);
621  const unsigned int output_size = output_state_out->dimension(_out_state_output_size_dimension_idx);
622 
627  ARM_COMPUTE_RETURN_ERROR_ON(recurrent_to_output_weights->dimension(1) != num_units);
632 
633  ARM_COMPUTE_RETURN_ERROR_ON(forget_gate_bias->num_dimensions() != 1);
634  ARM_COMPUTE_RETURN_ERROR_ON(forget_gate_bias->dimension(0) != num_units);
638 
639  ARM_COMPUTE_RETURN_ERROR_ON(cell_state_in->num_dimensions() != 2);
640  ARM_COMPUTE_RETURN_ERROR_ON(cell_state_in->dimension(0) != num_units);
641  ARM_COMPUTE_RETURN_ERROR_ON(cell_state_in->dimension(1) != batch_size);
642  ARM_COMPUTE_RETURN_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN(cell_state_in, 1, DataType::QSYMM16);
643 
644  ARM_COMPUTE_RETURN_ERROR_ON(output_state_in->num_dimensions() != 2);
645  ARM_COMPUTE_RETURN_ERROR_ON(output_state_in->dimension(0) != output_size);
646  ARM_COMPUTE_RETURN_ERROR_ON(output_state_in->dimension(1) != batch_size);
647  ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES(input, output_state_in);
648 
649  // Check whether peephole weights are all there or none
650  if(lstm_params.has_peephole_opt())
651  {
652  ARM_COMPUTE_RETURN_ERROR_ON_NULLPTR(lstm_params.cell_to_forget_weights(), lstm_params.cell_to_output_weights());
653  ARM_COMPUTE_RETURN_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN(lstm_params.cell_to_forget_weights(), 1, DataType::QSYMM16);
654  ARM_COMPUTE_RETURN_ERROR_ON(lstm_params.cell_to_forget_weights()->num_dimensions() != 1);
655  ARM_COMPUTE_RETURN_ERROR_ON(lstm_params.cell_to_forget_weights()->dimension(0) != num_units);
656  ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES(lstm_params.cell_to_forget_weights(), lstm_params.cell_to_output_weights());
657  ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_SHAPES(lstm_params.cell_to_forget_weights(), lstm_params.cell_to_output_weights());
658 
659  if(!lstm_params.has_cifg_opt())
660  {
661  ARM_COMPUTE_RETURN_ERROR_ON_NULLPTR(lstm_params.cell_to_input_weights());
662  ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES(lstm_params.cell_to_forget_weights(), lstm_params.cell_to_input_weights());
663  ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_SHAPES(lstm_params.cell_to_forget_weights(), lstm_params.cell_to_input_weights());
664  }
665  }
666 
667  const UniformQuantizationInfo qinput = input->quantization_info().uniform();
668  const UniformQuantizationInfo qcell_state_in = cell_state_in->quantization_info().uniform();
669  const UniformQuantizationInfo qoutput_state_in = output_state_in->quantization_info().uniform();
670 
671  // Calculate and decompose effective scales for optimizing matmul calculation
672  const int32_t cell_shift = log2(qcell_state_in.scale);
673  ARM_COMPUTE_RETURN_ERROR_ON(cell_shift > -9);
674 
675  // Calculate quantized parameters for clipping.
676  int16_t quantized_cell_clip = 0;
677  if(lstm_params.cell_clip() > 0.0f)
678  {
679  quantized_cell_clip = quantize_qsymm16(lstm_params.cell_clip(), qcell_state_in);
680  }
681 
682  // Precompute effective bias for optimizing the matmul computations.
683  const TensorInfo eff_bias_info(TensorShape(num_units), 1, DataType::S32);
684  const TensorInfo projection_eff_bias_info(TensorShape(output_size), 1, DataType::S32);
685  if(!lstm_params.has_cifg_opt())
686  {
687  ARM_COMPUTE_RETURN_ON_ERROR(ClGemmLowpMatrixAReductionKernel::validate(lstm_params.input_to_input_weights(), &eff_bias_info, GEMMLowpReductionKernelInfo(num_units, false, -qinput.offset, true)));
688  ARM_COMPUTE_RETURN_ON_ERROR(ClGemmLowpMatrixAReductionKernel::validate(lstm_params.recurrent_to_input_weights(), &eff_bias_info, GEMMLowpReductionKernelInfo(num_units, false, -qoutput_state_in.offset,
689  true)));
690  }
691  ARM_COMPUTE_RETURN_ON_ERROR(ClGemmLowpMatrixAReductionKernel::validate(input_to_forget_weights, &eff_bias_info, GEMMLowpReductionKernelInfo(num_units, false, -qinput.offset, true)));
692  ARM_COMPUTE_RETURN_ON_ERROR(ClGemmLowpMatrixAReductionKernel::validate(recurrent_to_forget_weights, &eff_bias_info, GEMMLowpReductionKernelInfo(num_units, false, -qoutput_state_in.offset, true)));
693  ARM_COMPUTE_RETURN_ON_ERROR(ClGemmLowpMatrixAReductionKernel::validate(input_to_cell_weights, &eff_bias_info, GEMMLowpReductionKernelInfo(num_units, false, -qinput.offset, true)));
694  ARM_COMPUTE_RETURN_ON_ERROR(ClGemmLowpMatrixAReductionKernel::validate(recurrent_to_cell_weights, &eff_bias_info, GEMMLowpReductionKernelInfo(num_units, false, -qoutput_state_in.offset, true)));
695  ARM_COMPUTE_RETURN_ON_ERROR(ClGemmLowpMatrixAReductionKernel::validate(input_to_output_weights, &eff_bias_info, GEMMLowpReductionKernelInfo(num_units, false, -qinput.offset, true)));
696  ARM_COMPUTE_RETURN_ON_ERROR(ClGemmLowpMatrixAReductionKernel::validate(recurrent_to_output_weights, &eff_bias_info, GEMMLowpReductionKernelInfo(num_units, false, -qoutput_state_in.offset, true)));
697  if(lstm_params.has_projection())
698  {
699  ARM_COMPUTE_RETURN_ON_ERROR(ClGemmLowpMatrixAReductionKernel::validate(lstm_params.projection_weights(), &projection_eff_bias_info, GEMMLowpReductionKernelInfo(output_size, false,
700  lstm_params.hidden_state_zero(),
701  true)));
702  if(lstm_params.projection_bias() != nullptr)
703  {
704  ARM_COMPUTE_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN(lstm_params.projection_bias(), 1, DataType::S32);
705  ARM_COMPUTE_RETURN_ON_ERROR(CLArithmeticAddition::validate(lstm_params.projection_bias(), &projection_eff_bias_info,
706  &projection_eff_bias_info, ConvertPolicy::SATURATE));
707  }
708  }
709 
710  const TensorInfo input_weights_transposed(TensorShape(num_units, input_size), 1, input_to_forget_weights->data_type(), input_to_forget_weights->quantization_info());
711  const TensorInfo recurrent_weights_transposed(TensorShape(num_units, output_size), 1, recurrent_to_forget_weights->data_type(), recurrent_to_forget_weights->quantization_info());
712 
713  // Validate weights transpose
720  if(!lstm_params.has_cifg_opt())
721  {
722  ARM_COMPUTE_RETURN_ON_ERROR(CLTranspose::validate(lstm_params.input_to_input_weights(), &input_weights_transposed));
723  ARM_COMPUTE_RETURN_ON_ERROR(CLTranspose::validate(lstm_params.recurrent_to_input_weights(), &recurrent_weights_transposed));
724  }
725  if(lstm_params.has_projection())
726  {
727  const TensorInfo projection_weights_transposed(TensorShape(output_size, num_units), 1, lstm_params.projection_weights()->data_type(), lstm_params.projection_weights()->quantization_info());
728  ARM_COMPUTE_RETURN_ON_ERROR(CLTranspose::validate(lstm_params.projection_weights(), &projection_weights_transposed));
729  }
730 
731  GEMMLowpOutputStageInfo gemmlowp_info;
732  gemmlowp_info.type = GEMMLowpOutputStageType::QUANTIZE_DOWN_FIXEDPOINT;
733  gemmlowp_info.gemmlowp_min_bound = std::numeric_limits<int16_t>::lowest();
734  gemmlowp_info.gemmlowp_max_bound = std::numeric_limits<int16_t>::max();
735  gemmlowp_info.output_data_type = DataType::QSYMM16;
736 
737  const bool has_layer_norm = lstm_params.use_layer_norm();
738 
739  // Forget gate.
740  ARM_COMPUTE_RETURN_ERROR_ON(lstm_params.forget_intermediate_scale() == 0);
741  const TensorInfo forget_outstage_info(TensorShape(num_units, batch_size), 1, DataType::QSYMM16, QuantizationInfo(lstm_params.forget_intermediate_scale(), 0));
742  const TensorInfo mm_out_info(TensorShape(num_units, batch_size), 1, DataType::S32);
743  const float input_to_forget_scale = input_to_forget_weights->quantization_info().uniform().scale * qinput.scale / lstm_params.forget_intermediate_scale();
744  ARM_COMPUTE_RETURN_ON_ERROR(validate_mm(gemmlowp_info, input, &input_weights_transposed, &eff_bias_info, input_to_forget_scale, &mm_out_info, &forget_outstage_info));
745 
746  const float recurrent_to_forget_scale = recurrent_to_forget_weights->quantization_info().uniform().scale * qoutput_state_in.scale / lstm_params.forget_intermediate_scale();
747  ARM_COMPUTE_RETURN_ON_ERROR(validate_mm(gemmlowp_info, output_state_in, &recurrent_weights_transposed, &eff_bias_info, recurrent_to_forget_scale, &mm_out_info, &forget_outstage_info));
748 
749  ARM_COMPUTE_RETURN_ON_ERROR(CLArithmeticAddition::validate(&forget_outstage_info, &forget_outstage_info, &forget_outstage_info, ConvertPolicy::SATURATE));
750 
751  if(lstm_params.has_peephole_opt())
752  {
753  ARM_COMPUTE_RETURN_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN(lstm_params.cell_to_forget_weights(), 1, DataType::QSYMM16);
754  ARM_COMPUTE_RETURN_ON_ERROR(CLPixelWiseMultiplication::validate(cell_state_in, lstm_params.cell_to_forget_weights(), &mm_out_info, 1.f, ConvertPolicy::SATURATE,
755  RoundingPolicy::TO_ZERO));
756  const float cell_to_forget_scale = std::pow(2, cell_shift) * lstm_params.cell_to_forget_weights()->quantization_info().uniform().scale / lstm_params.forget_intermediate_scale();
757  ARM_COMPUTE_RETURN_ON_ERROR(quantization::calculate_quantized_multiplier(cell_to_forget_scale, &gemmlowp_info.gemmlowp_multiplier, &gemmlowp_info.gemmlowp_shift));
758  ARM_COMPUTE_RETURN_ON_ERROR(CLGEMMLowpOutputStage::validate(&mm_out_info, nullptr, &forget_outstage_info, gemmlowp_info));
759  ARM_COMPUTE_RETURN_ON_ERROR(CLArithmeticAddition::validate(&forget_outstage_info, &forget_outstage_info, &forget_outstage_info, ConvertPolicy::SATURATE));
760  }
761 
762  if(has_layer_norm)
763  {
764  const ITensorInfo *w_info = lstm_params.forget_layer_norm_weights();
765  const ITensorInfo *b_info = forget_gate_bias;
766  ARM_COMPUTE_RETURN_ON_ERROR(validate_layer_norm(forget_outstage_info, *w_info, *b_info));
767  }
768 
769  // Output quantization info of Sigmoid and Tanh activations
770  const QuantizationInfo sigmoid_tanh_outqinfo(1.f / 32768.f, 0);
771 
772  const TensorInfo forget_gate_info(TensorShape(num_units, batch_size), 1, DataType::QSYMM16, sigmoid_tanh_outqinfo);
773  ARM_COMPUTE_RETURN_ON_ERROR(CLActivationLayer::validate(&forget_outstage_info, &forget_gate_info, ActivationLayerInfo(ActivationLayerInfo::ActivationFunction::LOGISTIC)));
774 
775  // Modulation gate.
776  ARM_COMPUTE_RETURN_ERROR_ON(lstm_params.cell_intermediate_scale() == 0);
777  const TensorInfo cell_outstage_info(TensorShape(num_units, batch_size), 1, DataType::QSYMM16, QuantizationInfo(lstm_params.cell_intermediate_scale(), 0));
778  const float input_to_cell_scale = input_to_cell_weights->quantization_info().uniform().scale * qinput.scale / lstm_params.cell_intermediate_scale();
779  ARM_COMPUTE_RETURN_ON_ERROR(validate_mm(gemmlowp_info, input, &input_weights_transposed, &eff_bias_info, input_to_cell_scale, &mm_out_info, &cell_outstage_info));
780 
781  const float recurrent_to_cell_scale = recurrent_to_cell_weights->quantization_info().uniform().scale * qoutput_state_in.scale / lstm_params.cell_intermediate_scale();
782  ARM_COMPUTE_RETURN_ON_ERROR(validate_mm(gemmlowp_info, output_state_in, &recurrent_weights_transposed, &eff_bias_info, recurrent_to_cell_scale, &mm_out_info, &cell_outstage_info));
783 
784  ARM_COMPUTE_RETURN_ON_ERROR(CLArithmeticAddition::validate(&cell_outstage_info, &cell_outstage_info, &cell_outstage_info, ConvertPolicy::SATURATE));
785 
786  if(has_layer_norm)
787  {
788  const ITensorInfo *w_info = lstm_params.cell_layer_norm_weights();
789  const ITensorInfo *b_info = cell_bias;
790  ARM_COMPUTE_RETURN_ON_ERROR(validate_layer_norm(cell_outstage_info, *w_info, *b_info));
791  }
792 
793  const TensorInfo cell_gate_info(TensorShape(num_units, batch_size), 1, DataType::QSYMM16, sigmoid_tanh_outqinfo);
794  ARM_COMPUTE_RETURN_ON_ERROR(CLActivationLayer::validate(&cell_outstage_info, &cell_gate_info, ActivationLayerInfo(ActivationLayerInfo::ActivationFunction::TANH, 1.f, 1.f)));
795 
796  // Input gate.
797  const TensorInfo input_gate_info(TensorShape(num_units, batch_size), 1, DataType::QSYMM16, sigmoid_tanh_outqinfo);
798  if(lstm_params.has_cifg_opt())
799  {
800  ARM_COMPUTE_RETURN_ERROR_ON_MSG(lstm_params.input_gate_bias() != nullptr, "Input gate bias must not be present when CIFG is used");
801  ARM_COMPUTE_RETURN_ON_ERROR(CLArithmeticSubtraction::validate(&input_gate_info, &forget_gate_info, &forget_gate_info, ConvertPolicy::SATURATE));
802  }
803  else
804  {
805  ARM_COMPUTE_RETURN_ERROR_ON_NULLPTR(lstm_params.input_to_input_weights(), lstm_params.recurrent_to_input_weights(), lstm_params.input_gate_bias());
806  ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES(input_to_forget_weights, lstm_params.input_to_input_weights(), lstm_params.recurrent_to_input_weights());
807  ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_SHAPES(input_to_forget_weights, lstm_params.input_to_input_weights());
808  ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_SHAPES(recurrent_to_forget_weights, lstm_params.recurrent_to_input_weights());
811 
812  ARM_COMPUTE_RETURN_ERROR_ON(lstm_params.input_intermediate_scale() == 0);
813  const TensorInfo input_outstage_info(TensorShape(num_units, batch_size), 1, DataType::QSYMM16, QuantizationInfo(lstm_params.input_intermediate_scale(), 0));
814  const float input_to_input_scale = lstm_params.input_to_input_weights()->quantization_info().uniform().scale * qinput.scale / lstm_params.input_intermediate_scale();
815  ARM_COMPUTE_RETURN_ON_ERROR(validate_mm(gemmlowp_info, input, &input_weights_transposed, &eff_bias_info, input_to_input_scale, &mm_out_info, &input_outstage_info));
816 
817  const float recurrent_to_input_scale = lstm_params.recurrent_to_input_weights()->quantization_info().uniform().scale * qoutput_state_in.scale / lstm_params.input_intermediate_scale();
818  ARM_COMPUTE_RETURN_ON_ERROR(validate_mm(gemmlowp_info, output_state_in, &recurrent_weights_transposed, &eff_bias_info, recurrent_to_input_scale, &mm_out_info, &input_outstage_info));
819 
820  ARM_COMPUTE_RETURN_ON_ERROR(CLArithmeticAddition::validate(&input_outstage_info, &input_outstage_info, &input_outstage_info, ConvertPolicy::SATURATE));
821 
822  if(lstm_params.has_peephole_opt())
823  {
824  ARM_COMPUTE_RETURN_ON_ERROR(CLPixelWiseMultiplication::validate(cell_state_in, lstm_params.cell_to_input_weights(), &mm_out_info, 1.f, ConvertPolicy::SATURATE,
825  RoundingPolicy::TO_ZERO));
826  const float cell_to_input_scale = std::pow(2, cell_shift) * lstm_params.cell_to_input_weights()->quantization_info().uniform().scale / lstm_params.input_intermediate_scale();
827  ARM_COMPUTE_RETURN_ON_ERROR(quantization::calculate_quantized_multiplier(cell_to_input_scale, &gemmlowp_info.gemmlowp_multiplier, &gemmlowp_info.gemmlowp_shift));
828  ARM_COMPUTE_RETURN_ON_ERROR(CLGEMMLowpOutputStage::validate(&mm_out_info, &eff_bias_info, &input_outstage_info, gemmlowp_info));
829  ARM_COMPUTE_RETURN_ON_ERROR(CLArithmeticAddition::validate(&input_outstage_info, &input_outstage_info, &input_outstage_info, ConvertPolicy::SATURATE));
830  }
831 
832  if(has_layer_norm)
833  {
834  const ITensorInfo *w_info = lstm_params.input_layer_norm_weights();
835  const ITensorInfo *b_info = lstm_params.input_gate_bias();
836  ARM_COMPUTE_RETURN_ON_ERROR(validate_layer_norm(cell_outstage_info, *w_info, *b_info));
837  }
838 
839  ARM_COMPUTE_RETURN_ON_ERROR(CLActivationLayer::validate(&input_outstage_info, &input_gate_info, ActivationLayerInfo(ActivationLayerInfo::ActivationFunction::LOGISTIC, 1.f, 1.f)));
840  }
841  // Cell.
842  ARM_COMPUTE_RETURN_ON_ERROR(CLPixelWiseMultiplication::validate(&forget_gate_info, cell_state_in, &forget_gate_info, 1.f, ConvertPolicy::SATURATE, RoundingPolicy::TO_ZERO));
843  ARM_COMPUTE_RETURN_ON_ERROR(CLPixelWiseMultiplication::validate(&input_gate_info, &cell_gate_info, &cell_gate_info, 1.f, ConvertPolicy::SATURATE, RoundingPolicy::TO_ZERO));
844  ARM_COMPUTE_RETURN_ON_ERROR(CLArithmeticAddition::validate(&forget_gate_info, &cell_gate_info, cell_state_out, ConvertPolicy::SATURATE));
845  if(quantized_cell_clip > 0)
846  {
847  ARM_COMPUTE_RETURN_ON_ERROR(CLActivationLayer::validate(cell_state_out, nullptr, ActivationLayerInfo(ActivationLayerInfo::ActivationFunction::LU_BOUNDED_RELU, -quantized_cell_clip,
848  quantized_cell_clip)));
849  }
850  // Output gate.
851  ARM_COMPUTE_RETURN_ERROR_ON(lstm_params.output_intermediate_scale() == 0);
852  const TensorInfo output_outstage_info(TensorShape(num_units, batch_size), 1, DataType::QSYMM16, QuantizationInfo(lstm_params.output_intermediate_scale(), 0));
853  const float input_to_output_scale = input_to_output_weights->quantization_info().uniform().scale * qinput.scale / lstm_params.output_intermediate_scale();
854  ARM_COMPUTE_RETURN_ON_ERROR(validate_mm(gemmlowp_info, input, &input_weights_transposed, &eff_bias_info, input_to_output_scale, &mm_out_info, &output_outstage_info));
855 
856  const float recurrent_to_output_scale = recurrent_to_output_weights->quantization_info().uniform().scale * qoutput_state_in.scale / lstm_params.output_intermediate_scale();
857  ARM_COMPUTE_RETURN_ON_ERROR(validate_mm(gemmlowp_info, output_state_in, &recurrent_weights_transposed, &eff_bias_info, recurrent_to_output_scale, &mm_out_info, &output_outstage_info));
858 
859  ARM_COMPUTE_RETURN_ON_ERROR(CLArithmeticAddition::validate(&output_outstage_info, &output_outstage_info, &output_outstage_info, ConvertPolicy::SATURATE));
860  if(lstm_params.has_peephole_opt())
861  {
862  ARM_COMPUTE_RETURN_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN(lstm_params.cell_to_output_weights(), 1, DataType::QSYMM16);
863  // TODO(COMPMID-3395): Perform multiplication in the quantized domain in NEPixelWiseMultiplicationKernel
864  // Here we are not using the output stage because all operations are done in float
865  // const float cell_to_output_scale = std::pow(2, cell_shift) * lstm_params.cell_to_output_weights()->quantization_info().uniform().scale / lstm_params.output_intermediate_scale();
866  // ARM_COMPUTE_RETURN_ON_ERROR(quantization::calculate_quantized_multiplier(cell_to_output_scale, &gemmlowp_info.gemmlowp_multiplier, &gemmlowp_info.gemmlowp_shift));
867  ARM_COMPUTE_RETURN_ON_ERROR(CLPixelWiseMultiplication::validate(cell_state_out, lstm_params.cell_to_output_weights(), &output_outstage_info, 1.f, ConvertPolicy::SATURATE,
868  RoundingPolicy::TO_ZERO));
869  ARM_COMPUTE_RETURN_ON_ERROR(CLArithmeticAddition::validate(&output_outstage_info, &output_outstage_info, &output_outstage_info, ConvertPolicy::SATURATE));
870  }
871 
872  if(has_layer_norm)
873  {
874  const ITensorInfo *w_info = lstm_params.output_layer_norm_weights();
875  const ITensorInfo *b_info = output_gate_bias;
876  ARM_COMPUTE_RETURN_ON_ERROR(validate_layer_norm(output_outstage_info, *w_info, *b_info));
877  }
878 
879  const TensorInfo output_gate_info(TensorShape(num_units, batch_size), 1, DataType::QSYMM16, sigmoid_tanh_outqinfo);
880  ARM_COMPUTE_RETURN_ON_ERROR(CLActivationLayer::validate(&output_outstage_info, &output_gate_info, ActivationLayerInfo(ActivationLayerInfo::ActivationFunction::LOGISTIC)));
881 
882  // Hidden.
883  ARM_COMPUTE_RETURN_ON_ERROR(CLActivationLayer::validate(cell_state_out, &input_gate_info, ActivationLayerInfo(ActivationLayerInfo::ActivationFunction::TANH, 1.f, 1.f)));
884  const TensorInfo hidden_mul_res(TensorShape(num_units, batch_size), 1, DataType::S32);
885  const TensorInfo hidden_out_info(TensorShape(num_units, batch_size), 1, DataType::QASYMM8_SIGNED);
886 
887  ARM_COMPUTE_RETURN_ERROR_ON(lstm_params.hidden_state_scale() == 0);
888  ARM_COMPUTE_RETURN_ON_ERROR(CLPixelWiseMultiplication::validate(&output_gate_info, &input_gate_info, &hidden_mul_res, 1.f, ConvertPolicy::SATURATE, RoundingPolicy::TO_ZERO));
889  const float hidden_state_scale = std::pow(2, -15) / lstm_params.hidden_state_scale() * std::pow(2, -15);
890  ARM_COMPUTE_RETURN_ON_ERROR(quantization::calculate_quantized_multiplier(hidden_state_scale, &gemmlowp_info.gemmlowp_multiplier, &gemmlowp_info.gemmlowp_shift, /* ignore_epsilon */ true));
891  gemmlowp_info.gemmlowp_offset = lstm_params.hidden_state_zero();
892  gemmlowp_info.output_data_type = hidden_out_info.data_type();
893  ARM_COMPUTE_RETURN_ON_ERROR(CLGEMMLowpOutputStage::validate(&hidden_mul_res, nullptr, &hidden_out_info, gemmlowp_info));
894 
895  const bool projection_tensor_copy_required = num_units != output_size;
896 
897  // Projection.
898  if(lstm_params.has_projection())
899  {
901  ARM_COMPUTE_RETURN_ERROR_ON(qoutput_state_in.scale == 0);
902 
903  const UniformQuantizationInfo qprojection = lstm_params.projection_weights()->quantization_info().uniform();
904  const float projection_scale = qprojection.scale * lstm_params.hidden_state_scale() / qoutput_state_in.scale;
905  ARM_COMPUTE_RETURN_ON_ERROR(quantization::calculate_quantized_multiplier(projection_scale, &gemmlowp_info.gemmlowp_multiplier, &gemmlowp_info.gemmlowp_shift));
906  gemmlowp_info.gemmlowp_offset = qoutput_state_in.offset;
907  gemmlowp_info.gemmlowp_min_bound = std::numeric_limits<int8_t>::lowest();
908  gemmlowp_info.gemmlowp_max_bound = std::numeric_limits<int8_t>::max();
909  gemmlowp_info.output_data_type = DataType::QASYMM8_SIGNED;
910 
911  const TensorInfo projection_outstage_info(*output_state_out);
912  const TensorInfo projection_weights_transposed(TensorShape(output_size, num_units), 1, lstm_params.projection_weights()->data_type(), lstm_params.projection_weights()->quantization_info());
913 
914  TensorInfo projection_mm_out_info{ mm_out_info };
915  projection_mm_out_info.set_tensor_shape(TensorShape(output_size, batch_size));
916 
917  ARM_COMPUTE_RETURN_ON_ERROR(validate_mm(gemmlowp_info, &hidden_out_info, &projection_weights_transposed, &projection_eff_bias_info, projection_scale, &projection_mm_out_info,
918  &projection_outstage_info));
919 
920  if(projection_tensor_copy_required)
921  {
922  ARM_COMPUTE_RETURN_ON_ERROR(CLQLSTMLayer::TensorCopyKernel::validate(*output_state_in, projection_outstage_info));
923  }
924 
925  ARM_COMPUTE_RETURN_ON_ERROR(CLArithmeticAddition::validate(output_state_out, output_state_out, output_state_out, ConvertPolicy::SATURATE));
926 
927  if(projection_tensor_copy_required)
928  {
929  ARM_COMPUTE_RETURN_ON_ERROR(CLQLSTMLayer::TensorCopyKernel::validate(projection_outstage_info, *output_state_out));
930  }
931 
932  int8_t quantized_projection_clip{ 0 };
933  if(lstm_params.projection_clip() > 0.0f)
934  {
935  quantized_projection_clip = quantize_qasymm8_signed(lstm_params.projection_clip(), qprojection);
936  }
937 
938  if(quantized_projection_clip > 0)
939  {
940  ARM_COMPUTE_RETURN_ON_ERROR(CLActivationLayer::validate(output_state_out, nullptr, ActivationLayerInfo(ActivationLayerInfo::ActivationFunction::LU_BOUNDED_RELU, -quantized_projection_clip,
941  quantized_projection_clip)));
942  }
943  }
944  else
945  {
946  if(projection_tensor_copy_required)
947  {
948  ARM_COMPUTE_RETURN_ON_ERROR(CLQLSTMLayer::TensorCopyKernel::validate(hidden_out_info, *output_state_out));
949  }
950  }
951 
952  if(cell_state_out->total_size() > 0)
953  {
954  ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES(cell_state_in, cell_state_out);
955  ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_SHAPES(cell_state_in, cell_state_out);
956  }
957 
958  if(output_state_out->total_size() > 0)
959  {
960  ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES(output_state_in, output_state_out);
961  ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_SHAPES(output_state_in, output_state_out);
962  }
963 
964  ARM_COMPUTE_RETURN_ON_ERROR(CLCopy::validate(output_state_out, output));
965  return Status{};
966 }
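
All gate activations above share the output quantization (1.f / 32768.f, 0): QSYMM16 with a 2^-15 scale is a Q0.15 fixed-point format whose representable range [-1, 1) matches the codomain of sigmoid and tanh. A small check of that mapping with the library's scalar helper (the clamp at +1.0 is the expected saturation behaviour):

#include "arm_compute/core/QuantizationInfo.h"
#include <cassert>

int main()
{
    const arm_compute::UniformQuantizationInfo qinfo{ 1.f / 32768.f, 0 };
    assert(arm_compute::quantize_qsymm16(0.5f, qinfo) == 16384);   // 0.5 * 2^15
    assert(arm_compute::quantize_qsymm16(-1.0f, qinfo) == -32768); // lowest representable value
    assert(arm_compute::quantize_qsymm16(1.0f, qinfo) == 32767);   // saturates just below +1.0
    return 0;
}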

The documentation for this class was generated from the following files:
CLQLSTMLayer.h
CLQLSTMLayer.cpp