Compute Library
 21.02
CLLSTMLayerQuantized Class Reference

Basic function to run CLLSTMLayerQuantized. More...

#include <CLLSTMLayerQuantized.h>

Collaboration diagram for CLLSTMLayerQuantized (figure not reproduced in this text rendering)

Public Member Functions

 CLLSTMLayerQuantized (std::shared_ptr< IMemoryManager > memory_manager=nullptr)
 Default constructor. More...
 
 CLLSTMLayerQuantized (const CLLSTMLayerQuantized &)=delete
 Prevent instances of this class from being copied (As this class contains pointers) More...
 
 CLLSTMLayerQuantized (CLLSTMLayerQuantized &&)=default
 Default move constructor. More...
 
CLLSTMLayerQuantized & operator= (const CLLSTMLayerQuantized &)=delete
 Prevent instances of this class from being copied (As this class contains pointers) More...
 
CLLSTMLayerQuantized & operator= (CLLSTMLayerQuantized &&)=default
 Default move assignment operator. More...
 
void configure (const ICLTensor *input, const ICLTensor *input_to_input_weights, const ICLTensor *input_to_forget_weights, const ICLTensor *input_to_cell_weights, const ICLTensor *input_to_output_weights, const ICLTensor *recurrent_to_input_weights, const ICLTensor *recurrent_to_forget_weights, const ICLTensor *recurrent_to_cell_weights, const ICLTensor *recurrent_to_output_weights, const ICLTensor *input_gate_bias, const ICLTensor *forget_gate_bias, const ICLTensor *cell_bias, const ICLTensor *output_gate_bias, ICLTensor *cell_state_in, const ICLTensor *output_state_in, ICLTensor *cell_state_out, ICLTensor *output_state_out)
 Initialize function's tensors. More...
 
void configure (const CLCompileContext &compile_context, const ICLTensor *input, const ICLTensor *input_to_input_weights, const ICLTensor *input_to_forget_weights, const ICLTensor *input_to_cell_weights, const ICLTensor *input_to_output_weights, const ICLTensor *recurrent_to_input_weights, const ICLTensor *recurrent_to_forget_weights, const ICLTensor *recurrent_to_cell_weights, const ICLTensor *recurrent_to_output_weights, const ICLTensor *input_gate_bias, const ICLTensor *forget_gate_bias, const ICLTensor *cell_bias, const ICLTensor *output_gate_bias, ICLTensor *cell_state_in, const ICLTensor *output_state_in, ICLTensor *cell_state_out, ICLTensor *output_state_out)
 Initialize function's tensors. More...
 
void run () override
 Run the kernels contained in the function. More...
 
void prepare () override
 Prepare the function for executing. More...
 
- Public Member Functions inherited from IFunction
virtual ~IFunction ()=default
 Destructor. More...
 

Static Public Member Functions

static Status validate (const ITensorInfo *input, const ITensorInfo *input_to_input_weights, const ITensorInfo *input_to_forget_weights, const ITensorInfo *input_to_cell_weights, const ITensorInfo *input_to_output_weights, const ITensorInfo *recurrent_to_input_weights, const ITensorInfo *recurrent_to_forget_weights, const ITensorInfo *recurrent_to_cell_weights, const ITensorInfo *recurrent_to_output_weights, const ITensorInfo *input_gate_bias, const ITensorInfo *forget_gate_bias, const ITensorInfo *cell_bias, const ITensorInfo *output_gate_bias, const ITensorInfo *cell_state_in, const ITensorInfo *output_state_in, const ITensorInfo *cell_state_out, const ITensorInfo *output_state_out)
 Static function to check if given info will lead to a valid configuration of CLLSTMLayerQuantized. More...
 

Detailed Description

Basic function to run CLLSTMLayerQuantized.

This function calls the following CL functions/kernels:

  1. CLGEMMLowpMatrixMultiplyCore Quantized matrix multiplication core. Accumulators are 32-bit integers
  2. CLGEMMLowpQuantizeDownInt32ToInt16ScaleByFixedPoint Convert 32-bit integers into QSYMM16
  3. CLTranspose Matrix transpose
  4. CLConcatenateLayer Tensor concatenation
  5. CLActivationLayer Activation functions (tanh and logistic)
  6. CLArithmeticAddition Elementwise addition
  7. CLPixelWiseMultiplication Elementwise multiplication
  8. CLSlice Tensor slicing
  9. CLDequantizationLayer Dequantize into float
  10. CLQuantizationLayer Quantize from float

Definition at line 61 of file CLLSTMLayerQuantized.h.
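A minimal usage sketch follows. The sizes and quantization parameters are illustrative assumptions (they mirror the constants used in this page's listings), not values mandated by the API beyond the shapes and data types documented under configure().

#include "arm_compute/runtime/CL/CLScheduler.h"
#include "arm_compute/runtime/CL/CLTensor.h"
#include "arm_compute/runtime/CL/functions/CLLSTMLayerQuantized.h"

using namespace arm_compute;

int main()
{
    CLScheduler::get().default_init();

    // Illustrative dimensions and quantization parameters
    const int              input_size = 32, output_size = 16, batch_size = 1;
    const QuantizationInfo qasymm(1.f / 128.f, 128);   // QASYMM8 input/output state
    const QuantizationInfo qweights(1.f / 16.f, 16);   // QASYMM8 weights
    const QuantizationInfo qsymm_4(16.f / 32768.f, 0); // QSYMM16 cell state

    CLTensor input, cell_state_in, output_state_in, cell_state_out, output_state_out;
    CLTensor w_ii, w_if, w_ic, w_io, w_ri, w_rf, w_rc, w_ro, b_i, b_f, b_c, b_o;

    input.allocator()->init(TensorInfo(TensorShape(input_size, batch_size), 1, DataType::QASYMM8, qasymm));
    cell_state_in.allocator()->init(TensorInfo(TensorShape(output_size, batch_size), 1, DataType::QSYMM16, qsymm_4));
    output_state_in.allocator()->init(TensorInfo(TensorShape(output_size, batch_size), 1, DataType::QASYMM8, qasymm));
    for(CLTensor *w : { &w_ii, &w_if, &w_ic, &w_io })
        w->allocator()->init(TensorInfo(TensorShape(input_size, output_size), 1, DataType::QASYMM8, qweights));
    for(CLTensor *w : { &w_ri, &w_rf, &w_rc, &w_ro })
        w->allocator()->init(TensorInfo(TensorShape(output_size, output_size), 1, DataType::QASYMM8, qweights));
    for(CLTensor *b : { &b_i, &b_f, &b_c, &b_o })
        b->allocator()->init(TensorInfo(TensorShape(output_size), 1, DataType::S32));

    CLLSTMLayerQuantized lstm;
    lstm.configure(&input, &w_ii, &w_if, &w_ic, &w_io, &w_ri, &w_rf, &w_rc, &w_ro,
                   &b_i, &b_f, &b_c, &b_o, &cell_state_in, &output_state_in, &cell_state_out, &output_state_out);

    // configure() auto-initialises the output infos, so the outputs can be allocated afterwards
    for(CLTensor *t : { &input, &w_ii, &w_if, &w_ic, &w_io, &w_ri, &w_rf, &w_rc, &w_ro,
                        &b_i, &b_f, &b_c, &b_o, &cell_state_in, &output_state_in, &cell_state_out, &output_state_out })
        t->allocator()->allocate();

    // ...fill input/weight/bias/state buffers (e.g. via map()/unmap())...

    lstm.run();                // enqueues the kernels; does not block
    CLScheduler::get().sync(); // wait for the results before reading them back
    return 0;
}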

Constructor & Destructor Documentation

◆ CLLSTMLayerQuantized() [1/3]

CLLSTMLayerQuantized ( std::shared_ptr< IMemoryManager > memory_manager = nullptr )

Default constructor.

Definition at line 53 of file CLLSTMLayerQuantized.cpp.

    : _memory_group(std::move(memory_manager)), _gemmlowp(), _output_stage(), _transpose_weights(), _concat_input_weights(), _concat_recurrent_weights(), _concat_weights(), _concat_inputs(),
      _concat_bias(), _sigmoid_forget_gate(), _sigmoid_input_gate(), _sigmoid_output_gate(), _tanh_modulation_gate(), _tanh_output_state(), _add_cell_state_tmps(), _add2(), _mul_forget_gate_cell_state(),
      _mul_input_gate_input_mod_gate(), _mul_output_state_tmp_output_gate(), _slice_input_tensor(), _slice_forget_tensor(), _slice_cell_tensor(), _slice_output_tensor(), _dequantize(), _quantize(),
      _input_to_input_weights(nullptr), _input_to_forget_weights(nullptr), _input_to_cell_weights(nullptr), _input_to_output_weights(nullptr), _recurrent_to_input_weights(nullptr),
      _recurrent_to_forget_weights(nullptr), _recurrent_to_cell_weights(nullptr), _recurrent_to_output_weights(nullptr), _input_gate_bias(nullptr), _forget_gate_bias(nullptr), _cell_bias(nullptr),
      _output_gate_bias(nullptr), _recurrent_weights(), _input_weights(), _weights(), _input(), _weights_transposed(), _output_highp(), _output_lowp(), _bias(), _forget_gate_input(), _input_gate_input(),
      _output_gate_input(), _input_modulation_gate_input(), _forget_gate_output(), _input_gate_output(), _output_gate_output(), _input_modulation_gate_output(), _cell_state_tmp1(), _cell_state_tmp2(),
      _output_state_tmp(), _output_state_out_symm(), _output_state_out_f32(), _is_prepared(false)
{
}

◆ CLLSTMLayerQuantized() [2/3]

CLLSTMLayerQuantized ( const CLLSTMLayerQuantized & )
delete

Prevent instances of this class from being copied (As this class contains pointers)

◆ CLLSTMLayerQuantized() [3/3]

CLLSTMLayerQuantized ( CLLSTMLayerQuantized && )
default

Default move constructor.

Member Function Documentation

◆ configure() [1/2]

void configure ( const ICLTensor * input,
const ICLTensor * input_to_input_weights,
const ICLTensor * input_to_forget_weights,
const ICLTensor * input_to_cell_weights,
const ICLTensor * input_to_output_weights,
const ICLTensor * recurrent_to_input_weights,
const ICLTensor * recurrent_to_forget_weights,
const ICLTensor * recurrent_to_cell_weights,
const ICLTensor * recurrent_to_output_weights,
const ICLTensor * input_gate_bias,
const ICLTensor * forget_gate_bias,
const ICLTensor * cell_bias,
const ICLTensor * output_gate_bias,
ICLTensor * cell_state_in,
const ICLTensor * output_state_in,
ICLTensor * cell_state_out,
ICLTensor * output_state_out
)

Initialize function's tensors.

Parameters
[in]  input                        Source tensor. Input is a 2D tensor with dimensions [input_size, batch_size]. Data types supported: QASYMM8.
[in]  input_to_input_weights       2D weights tensor with dimensions [input_size, output_size]. Data type supported: Same as input.
[in]  input_to_forget_weights      2D weights tensor with dimensions [input_size, output_size]. Data type supported: Same as input.
[in]  input_to_cell_weights        2D weights tensor with dimensions [input_size, output_size]. Data type supported: Same as input.
[in]  input_to_output_weights      2D weights tensor with dimensions [input_size, output_size]. Data type supported: Same as input.
[in]  recurrent_to_input_weights   2D weights tensor with dimensions [output_size, output_size]. Data type supported: Same as input.
[in]  recurrent_to_forget_weights  2D weights tensor with dimensions [output_size, output_size]. Data type supported: Same as input.
[in]  recurrent_to_cell_weights    2D weights tensor with dimensions [output_size, output_size]. Data type supported: Same as input.
[in]  recurrent_to_output_weights  2D weights tensor with dimensions [output_size, output_size]. Data type supported: Same as input.
[in]  input_gate_bias              1D weights tensor with dimensions [output_size]. Data type supported: S32.
[in]  forget_gate_bias             1D weights tensor with dimensions [output_size]. Data type supported: S32.
[in]  cell_bias                    1D weights tensor with dimensions [output_size]. Data type supported: S32.
[in]  output_gate_bias             1D weights tensor with dimensions [output_size]. Data type supported: S32.
[in]  cell_state_in                2D tensor with dimensions [output_size, batch_size]. Data type supported: QSYMM16.
[in]  output_state_in              2D tensor with dimensions [output_size, batch_size]. Data type supported: Same as input.
[out] cell_state_out               Destination tensor. Output is a 2D tensor with dimensions [output_size, batch_size]. Data type supported: QSYMM16.
[out] output_state_out             Destination tensor. Output is a 2D tensor with dimensions [output_size, batch_size]. Data types supported: Same as input.
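Beyond the per-tensor requirements above, validate() (documented below) additionally requires all eight weight tensors to share a single quantization info and all four biases to match in shape and type. A sketch of a consistent set of tensor infos, assuming the arm_compute headers and namespace from the example near the top of this page (values are illustrative):

    const int              input_size = 32, output_size = 16;                                        // illustrative sizes
    const QuantizationInfo qweights(1.f / 16.f, 16);                                                 // shared by all eight weight tensors
    const TensorInfo       w_in(TensorShape(input_size, output_size), 1, DataType::QASYMM8, qweights);   // each input_to_* weight
    const TensorInfo       w_rec(TensorShape(output_size, output_size), 1, DataType::QASYMM8, qweights); // each recurrent_to_* weight
    const TensorInfo       bias(TensorShape(output_size), 1, DataType::S32);                             // each gate bias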

Definition at line 65 of file CLLSTMLayerQuantized.cpp.

References CLKernelLibrary::get().

Referenced by arm_compute::test::validation::TEST_CASE().

{
    configure(CLKernelLibrary::get().get_compile_context(), input, input_to_input_weights, input_to_forget_weights, input_to_cell_weights, input_to_output_weights, recurrent_to_input_weights,
              recurrent_to_forget_weights, recurrent_to_cell_weights, recurrent_to_output_weights, input_gate_bias, forget_gate_bias, cell_bias, output_gate_bias, cell_state_in, output_state_in,
              cell_state_out, output_state_out);
}

◆ configure() [2/2]

void configure ( const CLCompileContext & compile_context,
const ICLTensor * input,
const ICLTensor * input_to_input_weights,
const ICLTensor * input_to_forget_weights,
const ICLTensor * input_to_cell_weights,
const ICLTensor * input_to_output_weights,
const ICLTensor * recurrent_to_input_weights,
const ICLTensor * recurrent_to_forget_weights,
const ICLTensor * recurrent_to_cell_weights,
const ICLTensor * recurrent_to_output_weights,
const ICLTensor * input_gate_bias,
const ICLTensor * forget_gate_bias,
const ICLTensor * cell_bias,
const ICLTensor * output_gate_bias,
ICLTensor * cell_state_in,
const ICLTensor * output_state_in,
ICLTensor * cell_state_out,
ICLTensor * output_state_out
)

Initialize function's tensors.

Parameters
[in]  compile_context              The compile context to be used.
[in]  input                        Source tensor. Input is a 2D tensor with dimensions [input_size, batch_size]. Data types supported: QASYMM8.
[in]  input_to_input_weights       2D weights tensor with dimensions [input_size, output_size]. Data type supported: Same as input.
[in]  input_to_forget_weights      2D weights tensor with dimensions [input_size, output_size]. Data type supported: Same as input.
[in]  input_to_cell_weights        2D weights tensor with dimensions [input_size, output_size]. Data type supported: Same as input.
[in]  input_to_output_weights      2D weights tensor with dimensions [input_size, output_size]. Data type supported: Same as input.
[in]  recurrent_to_input_weights   2D weights tensor with dimensions [output_size, output_size]. Data type supported: Same as input.
[in]  recurrent_to_forget_weights  2D weights tensor with dimensions [output_size, output_size]. Data type supported: Same as input.
[in]  recurrent_to_cell_weights    2D weights tensor with dimensions [output_size, output_size]. Data type supported: Same as input.
[in]  recurrent_to_output_weights  2D weights tensor with dimensions [output_size, output_size]. Data type supported: Same as input.
[in]  input_gate_bias              1D weights tensor with dimensions [output_size]. Data type supported: S32.
[in]  forget_gate_bias             1D weights tensor with dimensions [output_size]. Data type supported: S32.
[in]  cell_bias                    1D weights tensor with dimensions [output_size]. Data type supported: S32.
[in]  output_gate_bias             1D weights tensor with dimensions [output_size]. Data type supported: S32.
[in]  cell_state_in                2D tensor with dimensions [output_size, batch_size]. Data type supported: QSYMM16.
[in]  output_state_in              2D tensor with dimensions [output_size, batch_size]. Data type supported: Same as input.
[out] cell_state_out               Destination tensor. Output is a 2D tensor with dimensions [output_size, batch_size]. Data type supported: QSYMM16.
[out] output_state_out             Destination tensor. Output is a 2D tensor with dimensions [output_size, batch_size]. Data types supported: Same as input.
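When no dedicated compile context is needed, the first overload above can be used instead; as its definition earlier on this page shows, it simply forwards the compile context obtained from CLKernelLibrary::get() together with the tensor arguments.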

Definition at line 77 of file CLLSTMLayerQuantized.cpp.

References CLTensorAllocator::allocate(), CLTensor::allocator(), ARM_COMPUTE_ERROR_ON_NULLPTR, ARM_COMPUTE_ERROR_THROW_ON, arm_compute::auto_init_if_empty(), arm_compute::quantization::calculate_quantized_multiplier(), CLDequantizationLayer::configure(), CLTranspose::configure(), CLQuantizationLayer::configure(), CLActivationLayer::configure(), CLConcatenateLayer::configure(), CLGEMMLowpMatrixMultiplyCore::configure(), CLArithmeticAddition::configure(), CLSlice::configure(), CLPixelWiseMultiplication::configure(), CLGEMMLowpQuantizeDownInt32ToInt16ScaleByFixedPoint::configure(), ITensorInfo::dimension(), Window::DimX, Window::DimY, arm_compute::F32, arm_compute::test::validation::forget_gate_bias, ITensor::info(), CLTensor::info(), ITensorAllocator::init(), arm_compute::test::validation::input_gate_bias, arm_compute::test::validation::input_size, arm_compute::test::validation::input_to_cell_weights, arm_compute::test::validation::input_to_forget_weights, arm_compute::test::validation::input_to_input_weights, arm_compute::test::validation::input_to_output_weights, ActivationLayerInfo::LOGISTIC, MemoryGroup::manage(), UniformQuantizationInfo::offset, arm_compute::test::validation::output_gate_bias, arm_compute::test::validation::output_size, arm_compute::test::validation::qasymm(), arm_compute::QASYMM8, arm_compute::QSYMM16, arm_compute::test::validation::qsymm_3(), arm_compute::test::validation::qsymm_4(), ITensorInfo::quantization_info(), arm_compute::test::validation::qweights(), arm_compute::test::validation::recurrent_to_cell_weights, arm_compute::test::validation::recurrent_to_forget_weights, arm_compute::test::validation::recurrent_to_input_weights, arm_compute::test::validation::recurrent_to_output_weights, arm_compute::S32, arm_compute::SATURATE, UniformQuantizationInfo::scale, TensorInfo::set_quantization_info(), ActivationLayerInfo::TANH, ITensorInfo::tensor_shape(), TensorInfo::tensor_shape(), arm_compute::TO_ZERO, QuantizationInfo::uniform(), and CLLSTMLayerQuantized::validate().

{
    ARM_COMPUTE_ERROR_ON_NULLPTR(input, input_to_input_weights, input_to_forget_weights, input_to_cell_weights, input_to_output_weights, recurrent_to_input_weights, recurrent_to_forget_weights,
                                 recurrent_to_cell_weights, recurrent_to_output_weights,
                                 input_gate_bias, forget_gate_bias, cell_bias, output_gate_bias, cell_state_in, output_state_in, cell_state_out, output_state_out);

    ARM_COMPUTE_ERROR_THROW_ON(CLLSTMLayerQuantized::validate(input->info(), input_to_input_weights->info(), input_to_forget_weights->info(), input_to_cell_weights->info(), input_to_output_weights->info(),
                                                              recurrent_to_input_weights->info(), recurrent_to_forget_weights->info(), recurrent_to_cell_weights->info(), recurrent_to_output_weights->info(),
                                                              input_gate_bias->info(), forget_gate_bias->info(), cell_bias->info(), output_gate_bias->info(), cell_state_in->info(), output_state_in->info(), cell_state_out->info(), output_state_out->info()));

    const int input_size  = input->info()->dimension(0);
    const int batch_size  = input->info()->dimension(1);
    const int output_size = input_to_input_weights->info()->dimension(1);

    const QuantizationInfo qweights = input_to_input_weights->info()->quantization_info(); // Weights quantization

    // Note: qasymm(1.f/128.f, 128), qsymm_3(8.f/32768.f, 0) and qsymm_4(16.f/32768.f, 0) are file-scope QuantizationInfo
    // constants in CLLSTMLayerQuantized.cpp; qsymm_0 is the QSYMM16 quantization info with zero integer bits.
    auto_init_if_empty(*cell_state_out->info(), TensorInfo(TensorShape(batch_size, output_size), 1, DataType::QSYMM16, qsymm_4));
    auto_init_if_empty(*output_state_out->info(), TensorInfo(TensorShape(batch_size, output_size), 1, DataType::QASYMM8, qasymm));

    _input_to_input_weights      = input_to_input_weights;
    _input_to_forget_weights     = input_to_forget_weights;
    _input_to_cell_weights       = input_to_cell_weights;
    _input_to_output_weights     = input_to_output_weights;
    _recurrent_to_input_weights  = recurrent_to_input_weights;
    _recurrent_to_forget_weights = recurrent_to_forget_weights;
    _recurrent_to_cell_weights   = recurrent_to_cell_weights;
    _recurrent_to_output_weights = recurrent_to_output_weights;
    _input_gate_bias             = input_gate_bias;
    _forget_gate_bias            = forget_gate_bias;
    _cell_bias                   = cell_bias;
    _output_gate_bias            = output_gate_bias;

    // Weights concatenation
    std::vector<const ICLTensor *> inputs_weights_vector;
    inputs_weights_vector.emplace_back(input_to_input_weights);
    inputs_weights_vector.emplace_back(input_to_forget_weights);
    inputs_weights_vector.emplace_back(input_to_cell_weights);
    inputs_weights_vector.emplace_back(input_to_output_weights);

    std::vector<const ICLTensor *> recurrent_weights_vector;
    recurrent_weights_vector.emplace_back(recurrent_to_input_weights);
    recurrent_weights_vector.emplace_back(recurrent_to_forget_weights);
    recurrent_weights_vector.emplace_back(recurrent_to_cell_weights);
    recurrent_weights_vector.emplace_back(recurrent_to_output_weights);

    _input_weights.allocator()->init(TensorInfo(TensorShape(input_size, 4 * output_size), 1, DataType::QASYMM8, qweights));
    _concat_input_weights.configure(compile_context, inputs_weights_vector, &_input_weights, Window::DimY);

    _recurrent_weights.allocator()->init(TensorInfo(TensorShape(output_size, 4 * output_size), 1, DataType::QASYMM8, qweights));
    _concat_recurrent_weights.configure(compile_context, recurrent_weights_vector, &_recurrent_weights, Window::DimY);

    std::vector<const ICLTensor *> weights_vector;
    weights_vector.emplace_back(&_recurrent_weights);
    weights_vector.emplace_back(&_input_weights);

    _weights.allocator()->init(TensorInfo(TensorShape(output_size + input_size, 4 * output_size), 1, DataType::QASYMM8, qweights));
    _concat_weights.configure(compile_context, weights_vector, &_weights, Window::DimX);
    _transpose_weights.configure(compile_context, &_weights, &_weights_transposed);

    // Input concatenation
    std::vector<const ICLTensor *> input_vector;
    input_vector.emplace_back(input);
    input_vector.emplace_back(output_state_in);

    _memory_group.manage(&_input);
    _input.allocator()->init(TensorInfo(TensorShape(output_size + input_size, batch_size), 1, DataType::QASYMM8, qasymm));
    _concat_inputs.configure(compile_context, input_vector, &_input, Window::DimX);

    // Bias concatenation
    std::vector<const ICLTensor *> bias_vector;
    bias_vector.emplace_back(input_gate_bias);
    bias_vector.emplace_back(forget_gate_bias);
    bias_vector.emplace_back(cell_bias);
    bias_vector.emplace_back(output_gate_bias);

    _bias.allocator()->init(TensorInfo(TensorShape(4 * output_size), 1, DataType::S32));
    _concat_bias.configure(compile_context, bias_vector, &_bias, Window::DimX);

    // Invert the offset for gemmlowp
    _input.info()->set_quantization_info(QuantizationInfo(qasymm.uniform().scale, -qasymm.uniform().offset));
    _weights_transposed.info()->set_quantization_info(QuantizationInfo(qweights.uniform().scale, -qweights.uniform().offset));

    // Run gemmlowp
    _memory_group.manage(&_output_highp);
    _output_highp.allocator()->init(TensorInfo(TensorShape(4 * output_size, batch_size), 1, DataType::S32));
    _gemmlowp.configure(compile_context, &_input, &_weights_transposed, nullptr, &_output_highp);
    _input.allocator()->allocate();

    // Set the offset back
    _input.info()->set_quantization_info(QuantizationInfo(qasymm.uniform().scale, qasymm.uniform().offset));
    _weights_transposed.info()->set_quantization_info(QuantizationInfo(qweights.uniform().scale, qweights.uniform().offset));

    // multiplier = (input_scale * weights_scale) / output_scale (2 ^ (-12))
    _output_lowp.allocator()->init(TensorInfo(_output_highp.info()->tensor_shape(), 1, DataType::QSYMM16, qsymm_3));

    const float multiplier        = 4096.f * qasymm.uniform().scale * qweights.uniform().scale;
    int         output_multiplier = 0;
    int         output_shift      = 0;
    quantization::calculate_quantized_multiplier(multiplier, &output_multiplier, &output_shift);

    _memory_group.manage(&_output_lowp);
    _output_stage.configure(compile_context, &_output_highp, &_bias, &_output_lowp, output_multiplier, output_shift);
    _output_highp.allocator()->allocate();
    _bias.allocator()->allocate();

    // Get the gate tensors
    if(batch_size > 1)
    {
        _memory_group.manage(&_input_gate_input);
        _slice_input_tensor.configure(compile_context, &_output_lowp, &_input_gate_input, { 0, 0 }, { output_size, batch_size });
        _memory_group.manage(&_forget_gate_input);
        _slice_forget_tensor.configure(compile_context, &_output_lowp, &_forget_gate_input, { output_size, 0 }, { 2 * output_size, batch_size });
        _memory_group.manage(&_input_modulation_gate_input);
        _slice_cell_tensor.configure(compile_context, &_output_lowp, &_input_modulation_gate_input, { 2 * output_size, 0 }, { 3 * output_size, batch_size });
        _memory_group.manage(&_output_gate_input);
        _slice_output_tensor.configure(compile_context, &_output_lowp, &_output_gate_input, { 3 * output_size, 0 }, { 4 * output_size, batch_size });
        _output_lowp.allocator()->allocate();
    }
    else
    {
        _memory_group.manage(&_input_gate_input);
        _slice_input_tensor.configure(compile_context, &_output_lowp, &_input_gate_input, { 0 }, { output_size });
        _memory_group.manage(&_forget_gate_input);
        _slice_forget_tensor.configure(compile_context, &_output_lowp, &_forget_gate_input, { output_size }, { 2 * output_size });
        _memory_group.manage(&_input_modulation_gate_input);
        _slice_cell_tensor.configure(compile_context, &_output_lowp, &_input_modulation_gate_input, { 2 * output_size }, { 3 * output_size });
        _memory_group.manage(&_output_gate_input);
        _slice_output_tensor.configure(compile_context, &_output_lowp, &_output_gate_input, { 3 * output_size }, { 4 * output_size });
        _output_lowp.allocator()->allocate();
    }

    // Forget gate
    _memory_group.manage(&_forget_gate_output);
    _forget_gate_output.allocator()->init(TensorInfo(_forget_gate_input.info()->tensor_shape(), 1, DataType::QSYMM16, qsymm_0));
    _sigmoid_forget_gate.configure(compile_context, &_forget_gate_input, &_forget_gate_output, ActivationLayerInfo(ActivationLayerInfo::ActivationFunction::LOGISTIC));
    _forget_gate_input.allocator()->allocate();

    // Input gate
    _memory_group.manage(&_input_gate_output);
    _input_gate_output.allocator()->init(TensorInfo(_input_gate_input.info()->tensor_shape(), 1, DataType::QSYMM16, qsymm_0));
    _sigmoid_input_gate.configure(compile_context, &_input_gate_input, &_input_gate_output, ActivationLayerInfo(ActivationLayerInfo::ActivationFunction::LOGISTIC));
    _input_gate_input.allocator()->allocate();

    // Input modulation gate equation
    _memory_group.manage(&_input_modulation_gate_output);
    _input_modulation_gate_output.allocator()->init(TensorInfo(_input_modulation_gate_input.info()->tensor_shape(), 1, DataType::QSYMM16, qsymm_0));
    _tanh_modulation_gate.configure(compile_context, &_input_modulation_gate_input, &_input_modulation_gate_output, ActivationLayerInfo(ActivationLayerInfo::ActivationFunction::TANH, 1.0f, 1.0f));
    _input_modulation_gate_input.allocator()->allocate();

    // Output gate
    _memory_group.manage(&_output_gate_output);
    _output_gate_output.allocator()->init(TensorInfo(_output_gate_input.info()->tensor_shape(), 1, DataType::QSYMM16, qsymm_0));
    _sigmoid_output_gate.configure(compile_context, &_output_gate_input, &_output_gate_output, ActivationLayerInfo(ActivationLayerInfo::ActivationFunction::LOGISTIC));
    _output_gate_input.allocator()->allocate();

    // Long term memory
    _memory_group.manage(&_cell_state_tmp1);
    _cell_state_tmp1.allocator()->init(TensorInfo(_forget_gate_output.info()->tensor_shape(), 1, DataType::QSYMM16, qsymm_4));
    _mul_forget_gate_cell_state.configure(compile_context, &_forget_gate_output, cell_state_in, &_cell_state_tmp1, 1, ConvertPolicy::SATURATE, RoundingPolicy::TO_ZERO);
    _forget_gate_output.allocator()->allocate();

    _memory_group.manage(&_cell_state_tmp2);
    _cell_state_tmp2.allocator()->init(TensorInfo(_input_gate_output.info()->tensor_shape(), 1, DataType::QSYMM16, qsymm_4));
    _mul_input_gate_input_mod_gate.configure(compile_context, &_input_gate_output, &_input_modulation_gate_output, &_cell_state_tmp2, 1, ConvertPolicy::SATURATE, RoundingPolicy::TO_ZERO);
    _input_modulation_gate_output.allocator()->allocate();
    _input_gate_output.allocator()->allocate();

    _add_cell_state_tmps.configure(compile_context, &_cell_state_tmp1, &_cell_state_tmp2, cell_state_out, ConvertPolicy::SATURATE);
    _cell_state_tmp1.allocator()->allocate();
    _cell_state_tmp2.allocator()->allocate();

    // Short term memory
    _memory_group.manage(&_output_state_tmp);
    _output_state_tmp.allocator()->init(TensorInfo(cell_state_out->info()->tensor_shape(), 1, DataType::QSYMM16, qsymm_0));
    _tanh_output_state.configure(compile_context, cell_state_out, &_output_state_tmp, ActivationLayerInfo(ActivationLayerInfo::ActivationFunction::TANH, 1.0f, 1.0f));

    _memory_group.manage(&_output_state_out_symm);
    _output_state_out_symm.allocator()->init(TensorInfo(_output_gate_output.info()->tensor_shape(), 1, DataType::QSYMM16, qsymm_0));
    _mul_output_state_tmp_output_gate.configure(compile_context, &_output_state_tmp, &_output_gate_output, &_output_state_out_symm, 1, ConvertPolicy::SATURATE, RoundingPolicy::TO_ZERO);
    _output_gate_output.allocator()->allocate();
    _output_state_tmp.allocator()->allocate();

    // Requantize the output state from QSYMM16 to QASYMM8
    _memory_group.manage(&_output_state_out_f32);
    _output_state_out_f32.allocator()->init(TensorInfo(_output_state_out_symm.info()->tensor_shape(), 1, DataType::F32));
    _dequantize.configure(compile_context, &_output_state_out_symm, &_output_state_out_f32);
    _output_state_out_symm.allocator()->allocate();

    _quantize.configure(compile_context, &_output_state_out_f32, output_state_out);
    _output_state_out_f32.allocator()->allocate();
}
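For reference, the output stage requantizes the S32 accumulators to QSYMM16 with qsymm_3 = QuantizationInfo(8.f / 32768.f, 0), i.e. an output scale of 2^-12, so multiplier = (input_scale * weights_scale) / output_scale = 4096 * input_scale * weights_scale. With the illustrative values qasymm(1.f / 128.f, 128) and qweights(1.f / 16.f, 16), multiplier = 4096 / (128 * 16) = 2, which calculate_quantized_multiplier() then decomposes into a normalised fixed-point multiplier and shift for the kernel.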

◆ operator=() [1/2]

CLLSTMLayerQuantized & operator= ( const CLLSTMLayerQuantized & )
delete

Prevent instances of this class from being copied (As this class contains pointers)

◆ operator=() [2/2]

CLLSTMLayerQuantized& operator= ( CLLSTMLayerQuantized &&  )
default

Default move assignment operator.

◆ prepare()

void prepare ( )
override virtual

Prepare the function for executing.

Any one-off pre-processing step required by the function is handled here.

Note
Prepare stage might not need all the function's buffers' backing memory to be available in order to execute
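Because preparation concatenates and transposes the weights and then frees the intermediate buffers (see the definition below), a caller can pay this one-off cost ahead of the steady-state loop. A sketch, continuing the usage example near the top of this page (num_timesteps is a hypothetical caller-side variable):

    lstm.prepare(); // one-off: concatenate and transpose weights, mark the originals unused
    for(int t = 0; t < num_timesteps; ++t)
    {
        lstm.run(); // prepare() becomes a no-op after the first call
    }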

Reimplemented from IFunction.

Definition at line 532 of file CLLSTMLayerQuantized.cpp.

References CLTensorAllocator::allocate(), CLTensor::allocator(), CLTensorAllocator::free(), ITensor::mark_as_unused(), ICLSimpleFunction::run(), and CLConcatenateLayer::run().

Referenced by CLLSTMLayerQuantized::run().

{
    if(!_is_prepared)
    {
        _input_weights.allocator()->allocate();
        _concat_input_weights.run();

        _input_to_input_weights->mark_as_unused();
        _input_to_forget_weights->mark_as_unused();
        _input_to_cell_weights->mark_as_unused();
        _input_to_output_weights->mark_as_unused();

        _recurrent_weights.allocator()->allocate();
        _concat_recurrent_weights.run();
        _recurrent_to_input_weights->mark_as_unused();
        _recurrent_to_forget_weights->mark_as_unused();
        _recurrent_to_cell_weights->mark_as_unused();
        _recurrent_to_output_weights->mark_as_unused();

        _weights.allocator()->allocate();
        _concat_weights.run();

        _input_weights.mark_as_unused();
        _input_weights.allocator()->free();
        _recurrent_weights.mark_as_unused();
        _recurrent_weights.allocator()->free();

        _weights_transposed.allocator()->allocate();
        _transpose_weights.run();

        _weights.mark_as_unused();
        _weights.allocator()->free();

        _bias.allocator()->allocate();
        _concat_bias.run();
        _input_gate_bias->mark_as_unused();
        _forget_gate_bias->mark_as_unused();
        _cell_bias->mark_as_unused();
        _output_gate_bias->mark_as_unused();

        _is_prepared = true;
    }
}

◆ run()

void run ( )
override virtual

Run the kernels contained in the function.

For Neon kernels:

  • Multi-threading is used for the kernels which are parallelisable.
  • By default std::thread::hardware_concurrency() threads are used.
Note
CPPScheduler::set_num_threads() can be used to manually set the number of threads

For OpenCL kernels:

  • All the kernels are enqueued on the queue associated with CLScheduler.
  • The queue is then flushed.
Note
The function will not block until the kernels are executed. It is the user's responsibility to wait.
Will call prepare() on the first run if it hasn't been done yet.

Implements IFunction.
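Since run() only enqueues and flushes, synchronize before reading results back on the host. A sketch, continuing the usage example near the top of this page:

    lstm.run();                // non-blocking: enqueues the kernels and flushes the queue
    CLScheduler::get().sync(); // block until the queue has drained
    // output_state_out and cell_state_out can now be mapped and read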

Definition at line 485 of file CLLSTMLayerQuantized.cpp.

References CLLSTMLayerQuantized::prepare(), ICLSimpleFunction::run(), CLActivationLayer::run(), CLConcatenateLayer::run(), CLGEMMLowpMatrixMultiplyCore::run(), CLArithmeticAddition::run(), CLSlice::run(), and CLPixelWiseMultiplication::run().

{
    prepare();

    // Acquire all the temporaries
    MemoryGroupResourceScope scope_mg(_memory_group);

    // Concat and transpose the input
    _concat_inputs.run();

    // Run gemmlowp
    _gemmlowp.run();
    _output_stage.run();

    // Slice the results
    _slice_input_tensor.run();
    _slice_forget_tensor.run();
    _slice_cell_tensor.run();
    _slice_output_tensor.run();

    // Gates
    // Forget gate
    _sigmoid_forget_gate.run();

    // Input gate
    _sigmoid_input_gate.run();

    // Input modulation gate
    _tanh_modulation_gate.run();

    // Output gate
    _sigmoid_output_gate.run();

    // Cell state (long term memory)
    _mul_forget_gate_cell_state.run();
    _mul_input_gate_input_mod_gate.run();
    _add_cell_state_tmps.run();

    // Output state (short term memory)
    _tanh_output_state.run();
    _mul_output_state_tmp_output_gate.run();

    // Requantize output state from QSYMM16 to QASYMM8
    _dequantize.run();
    _quantize.run();
}

◆ validate()

Status validate ( const ITensorInfo * input,
const ITensorInfo * input_to_input_weights,
const ITensorInfo * input_to_forget_weights,
const ITensorInfo * input_to_cell_weights,
const ITensorInfo * input_to_output_weights,
const ITensorInfo * recurrent_to_input_weights,
const ITensorInfo * recurrent_to_forget_weights,
const ITensorInfo * recurrent_to_cell_weights,
const ITensorInfo * recurrent_to_output_weights,
const ITensorInfo * input_gate_bias,
const ITensorInfo * forget_gate_bias,
const ITensorInfo * cell_bias,
const ITensorInfo * output_gate_bias,
const ITensorInfo * cell_state_in,
const ITensorInfo * output_state_in,
const ITensorInfo * cell_state_out,
const ITensorInfo * output_state_out
)
static

Static function to check if given info will lead to a valid configuration of CLLSTMLayerQuantized.

Parameters
[in]  input                        Source tensor info. Input is a 2D tensor info with dimensions [input_size, batch_size]. Data types supported: QASYMM8.
[in]  input_to_input_weights       2D weights tensor info with dimensions [input_size, output_size]. Data type supported: Same as input.
[in]  input_to_forget_weights      2D weights tensor info with dimensions [input_size, output_size]. Data type supported: Same as input.
[in]  input_to_cell_weights        2D weights tensor info with dimensions [input_size, output_size]. Data type supported: Same as input.
[in]  input_to_output_weights      2D weights tensor info with dimensions [input_size, output_size]. Data type supported: Same as input.
[in]  recurrent_to_input_weights   2D weights tensor info with dimensions [output_size, output_size]. Data type supported: Same as input.
[in]  recurrent_to_forget_weights  2D weights tensor info with dimensions [output_size, output_size]. Data type supported: Same as input.
[in]  recurrent_to_cell_weights    2D weights tensor info with dimensions [output_size, output_size]. Data type supported: Same as input.
[in]  recurrent_to_output_weights  2D weights tensor info with dimensions [output_size, output_size]. Data type supported: Same as input.
[in]  input_gate_bias              1D weights tensor info with dimensions [output_size]. Data type supported: S32.
[in]  forget_gate_bias             1D weights tensor info with dimensions [output_size]. Data type supported: S32.
[in]  cell_bias                    1D weights tensor info with dimensions [output_size]. Data type supported: S32.
[in]  output_gate_bias             1D weights tensor info with dimensions [output_size]. Data type supported: S32.
[in]  cell_state_in                2D tensor info with dimensions [output_size, batch_size]. Data type supported: QSYMM16.
[in]  output_state_in              2D tensor info with dimensions [output_size, batch_size]. Data type supported: Same as input.
[out] cell_state_out               Destination tensor info. Output is a 2D tensor info with dimensions [output_size, batch_size]. Data type supported: QSYMM16.
[out] output_state_out             Destination tensor info. Output is a 2D tensor info with dimensions [output_size, batch_size]. Data types supported: Same as input.
Returns
a status
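A sketch of the validate-before-configure pattern, assuming the tensors were initialised as in the usage example near the top of this page (and <iostream> is included):

    const Status status = CLLSTMLayerQuantized::validate(input.info(), w_ii.info(), w_if.info(), w_ic.info(), w_io.info(),
                                                         w_ri.info(), w_rf.info(), w_rc.info(), w_ro.info(),
                                                         b_i.info(), b_f.info(), b_c.info(), b_o.info(),
                                                         cell_state_in.info(), output_state_in.info(), cell_state_out.info(), output_state_out.info());
    if(status.error_code() != ErrorCode::OK)
    {
        std::cerr << status.error_description() << std::endl; // report why the configuration is invalid
    }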

Definition at line 275 of file CLLSTMLayerQuantized.cpp.

References ARM_COMPUTE_RETURN_ERROR_ON, ARM_COMPUTE_RETURN_ERROR_ON_DATA_TYPE_NOT_IN, ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES, ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_QUANTIZATION_INFO, ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_SHAPES, ARM_COMPUTE_RETURN_ERROR_ON_NULLPTR, ARM_COMPUTE_RETURN_ON_ERROR, arm_compute::test::validation::bias_info, arm_compute::quantization::calculate_quantized_multiplier(), ICloneable< T >::clone(), TensorInfo::clone(), ITensorInfo::dimension(), Window::DimX, Window::DimY, arm_compute::F32, arm_compute::test::validation::input_size, ActivationLayerInfo::LOGISTIC, ITensorInfo::num_dimensions(), UniformQuantizationInfo::offset, arm_compute::test::validation::output_size, arm_compute::test::validation::qasymm(), arm_compute::QASYMM8, arm_compute::QSYMM16, arm_compute::test::validation::qsymm_3(), arm_compute::test::validation::qsymm_4(), ITensorInfo::quantization_info(), arm_compute::test::validation::qweights(), arm_compute::S32, arm_compute::SATURATE, UniformQuantizationInfo::scale, ITensorInfo::set_quantization_info(), TensorInfo::set_quantization_info(), ActivationLayerInfo::TANH, ITensorInfo::tensor_shape(), TensorInfo::tensor_shape(), arm_compute::TO_ZERO, ITensorInfo::total_size(), QuantizationInfo::uniform(), CLDequantizationLayer::validate(), CLTranspose::validate(), CLQuantizationLayer::validate(), CLActivationLayer::validate(), CLConcatenateLayer::validate(), CLGEMMLowpMatrixMultiplyCore::validate(), CLArithmeticAddition::validate(), CLSlice::validate(), CLPixelWiseMultiplication::validate(), and CLGEMMLowpQuantizeDownInt32ToInt16ScaleByFixedPoint::validate().

Referenced by CLLSTMLayerQuantized::configure().

{
    ARM_COMPUTE_RETURN_ERROR_ON_NULLPTR(input, input_to_input_weights, input_to_forget_weights, input_to_cell_weights, input_to_output_weights, recurrent_to_input_weights,
                                        recurrent_to_forget_weights, recurrent_to_cell_weights, recurrent_to_output_weights, input_gate_bias, forget_gate_bias, cell_bias, output_gate_bias, cell_state_in,
                                        output_state_in, cell_state_out, output_state_out);
    ARM_COMPUTE_RETURN_ERROR_ON_DATA_TYPE_NOT_IN(input, DataType::QASYMM8);

    const int input_size  = input->dimension(0);
    const int batch_size  = input->dimension(1);
    const int output_size = input_to_input_weights->dimension(1);

    // Dimensionality checks
    ARM_COMPUTE_RETURN_ERROR_ON(input->num_dimensions() > 2);
    ARM_COMPUTE_RETURN_ERROR_ON(input_to_input_weights->num_dimensions() > 2);
    ARM_COMPUTE_RETURN_ERROR_ON(input_gate_bias->num_dimensions() > 1);
    ARM_COMPUTE_RETURN_ERROR_ON(output_state_in->num_dimensions() > 2);

    TensorInfo input_weights_info(input_to_input_weights->clone()->set_tensor_shape(TensorShape(input_size, output_size)).set_data_type(DataType::QASYMM8));
    TensorInfo recurrent_weights_info(input_to_input_weights->clone()->set_tensor_shape(TensorShape(output_size, output_size)).set_data_type(DataType::QASYMM8));
    TensorInfo bias_info(input_gate_bias->clone()->set_tensor_shape(TensorShape(output_size)).set_data_type(DataType::S32));
    TensorInfo output_state_info(cell_state_in->clone()->set_tensor_shape(TensorShape(output_size, batch_size)).set_data_type(DataType::QASYMM8).set_quantization_info(qasymm));
    TensorInfo cell_state_info(cell_state_in->clone()->set_tensor_shape(TensorShape(output_size, batch_size)).set_data_type(DataType::QSYMM16).set_quantization_info(qsymm_4));

    // Shape checks
    ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_SHAPES(&input_weights_info, input_to_input_weights, input_to_forget_weights, input_to_cell_weights, input_to_output_weights);
    ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_SHAPES(&recurrent_weights_info, recurrent_to_input_weights, recurrent_to_forget_weights, recurrent_to_cell_weights, recurrent_to_output_weights);
    ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_SHAPES(&bias_info, input_gate_bias, forget_gate_bias, cell_bias, output_gate_bias);
    ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_SHAPES(&cell_state_info, cell_state_in);
    ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_SHAPES(&output_state_info, output_state_in);

    // Data type checks
    ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES(&input_weights_info, input, input_to_input_weights, input_to_forget_weights, input_to_cell_weights, input_to_output_weights);
    ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES(recurrent_to_input_weights, recurrent_to_forget_weights, recurrent_to_cell_weights, recurrent_to_output_weights);
    ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES(&bias_info, input_gate_bias, forget_gate_bias, cell_bias, output_gate_bias);
    ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES(&cell_state_info, cell_state_in);
    ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES(&output_state_info, output_state_in);

    // Quantization checks
    ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_QUANTIZATION_INFO(input_to_input_weights, input_to_forget_weights, input_to_cell_weights, input_to_output_weights);
    ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_QUANTIZATION_INFO(recurrent_to_input_weights, recurrent_to_forget_weights, recurrent_to_cell_weights, recurrent_to_output_weights);
    ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_QUANTIZATION_INFO(&cell_state_info, cell_state_in);
    ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_QUANTIZATION_INFO(&output_state_info, output_state_in);

    // Validate internal functions
    // _concat_input_weights
    std::vector<const ITensorInfo *> inputs_weights_vector;
    inputs_weights_vector.emplace_back(input_to_input_weights);
    inputs_weights_vector.emplace_back(input_to_forget_weights);
    inputs_weights_vector.emplace_back(input_to_cell_weights);
    inputs_weights_vector.emplace_back(input_to_output_weights);
    const QuantizationInfo qweights = input_to_input_weights->quantization_info(); // Weights quantization
    const TensorInfo       input_weights(TensorShape(input_size, 4 * output_size), 1, DataType::QASYMM8, qweights);
    ARM_COMPUTE_RETURN_ON_ERROR(CLConcatenateLayer::validate(inputs_weights_vector, &input_weights, Window::DimY));

    // _concat_recurrent_weights
    std::vector<const ITensorInfo *> recurrent_weights_vector;
    recurrent_weights_vector.emplace_back(recurrent_to_input_weights);
    recurrent_weights_vector.emplace_back(recurrent_to_forget_weights);
    recurrent_weights_vector.emplace_back(recurrent_to_cell_weights);
    recurrent_weights_vector.emplace_back(recurrent_to_output_weights);
    const TensorInfo recurrent_weights(TensorShape(output_size, 4 * output_size), 1, DataType::QASYMM8, qweights);
    ARM_COMPUTE_RETURN_ON_ERROR(CLConcatenateLayer::validate(recurrent_weights_vector, &recurrent_weights, Window::DimY));

    // _concat_weights
    std::vector<const ITensorInfo *> weights_vector;
    weights_vector.emplace_back(&recurrent_weights);
    weights_vector.emplace_back(&input_weights);
    const TensorInfo weights(TensorShape(input_size + output_size, 4 * output_size), 1, DataType::QASYMM8, qweights);
    ARM_COMPUTE_RETURN_ON_ERROR(CLConcatenateLayer::validate(weights_vector, &weights, Window::DimX));
    // _transpose_weights
    const TensorShape weights_transposed_shape(weights.tensor_shape()[1], weights.tensor_shape()[0]);
    TensorInfo        weights_transposed = weights.clone()->set_is_resizable(true).set_tensor_shape(weights_transposed_shape);
    ARM_COMPUTE_RETURN_ON_ERROR(CLTranspose::validate(&weights, &weights_transposed));

    // _concat_inputs
    std::vector<const ITensorInfo *> input_vector;
    input_vector.emplace_back(input);
    input_vector.emplace_back(output_state_in);
    TensorInfo input_concatenated(TensorShape(output_size + input_size, batch_size), 1, DataType::QASYMM8, qasymm);
    ARM_COMPUTE_RETURN_ON_ERROR(CLConcatenateLayer::validate(input_vector, &input_concatenated, Window::DimX));

    // _concat_bias
    std::vector<const ITensorInfo *> bias_vector;
    bias_vector.emplace_back(input_gate_bias);
    bias_vector.emplace_back(forget_gate_bias);
    bias_vector.emplace_back(cell_bias);
    bias_vector.emplace_back(output_gate_bias);

    const TensorInfo bias_concatenated(TensorShape(4 * output_size), 1, DataType::S32);
    ARM_COMPUTE_RETURN_ON_ERROR(CLConcatenateLayer::validate(bias_vector, &bias_concatenated, Window::DimX));

    // Invert the offset for gemmlowp
    input_concatenated.set_quantization_info(QuantizationInfo(qasymm.uniform().scale, -qasymm.uniform().offset));
    weights_transposed.set_quantization_info(QuantizationInfo(qweights.uniform().scale, -qweights.uniform().offset));

    // _gemmlowp
    const TensorInfo output_highp(TensorShape(4 * output_size, batch_size), 1, DataType::S32);
    ARM_COMPUTE_RETURN_ON_ERROR(CLGEMMLowpMatrixMultiplyCore::validate(&input_concatenated, &weights_transposed, nullptr, &output_highp));

    // Set the offset back
    input_concatenated.set_quantization_info(QuantizationInfo(qasymm.uniform().scale, qasymm.uniform().offset));
    weights_transposed.set_quantization_info(QuantizationInfo(qweights.uniform().scale, qweights.uniform().offset));

    const TensorInfo output_lowp(output_highp.tensor_shape(), 1, DataType::QSYMM16, qsymm_3);

    const float multiplier        = 4096.f * qasymm.uniform().scale * qweights.uniform().scale;
    int         output_multiplier = 0;
    int         output_shift      = 0;
    ARM_COMPUTE_RETURN_ON_ERROR(quantization::calculate_quantized_multiplier(multiplier, &output_multiplier, &output_shift));

    // _output_stage
    ARM_COMPUTE_RETURN_ON_ERROR(CLGEMMLowpQuantizeDownInt32ToInt16ScaleByFixedPoint::validate(&output_highp, &bias_concatenated, &output_lowp));

    TensorInfo input_gate_input;
    TensorInfo forget_gate_input;
    TensorInfo input_modulation_gate_input;
    TensorInfo output_gate_input;

    if(batch_size > 1)
    {
        // _slice_input_tensor
        input_gate_input = TensorInfo(TensorShape(output_size, batch_size), 1, DataType::QSYMM16, qsymm_3);
        ARM_COMPUTE_RETURN_ON_ERROR(CLSlice::validate(&output_lowp, &input_gate_input, { 0, 0 }, { output_size, batch_size }));
        // _slice_forget_tensor
        forget_gate_input = TensorInfo(TensorShape(output_size, batch_size), 1, DataType::QSYMM16, qsymm_3);
        ARM_COMPUTE_RETURN_ON_ERROR(CLSlice::validate(&output_lowp, &forget_gate_input, { output_size, 0 }, { 2 * output_size, batch_size }));
        // _slice_cell_tensor
        input_modulation_gate_input = TensorInfo(TensorShape(output_size, batch_size), 1, DataType::QSYMM16, qsymm_3);
        ARM_COMPUTE_RETURN_ON_ERROR(CLSlice::validate(&output_lowp, &input_modulation_gate_input, { 2 * output_size, 0 }, { 3 * output_size, batch_size }));
        // _slice_output_tensor
        output_gate_input = TensorInfo(TensorShape(output_size, batch_size), 1, DataType::QSYMM16, qsymm_3);
        ARM_COMPUTE_RETURN_ON_ERROR(CLSlice::validate(&output_lowp, &output_gate_input, { 3 * output_size, 0 }, { 4 * output_size, batch_size }));
    }
    else
    {
        // _slice_input_tensor
        input_gate_input = TensorInfo(TensorShape(output_size), 1, DataType::QSYMM16, qsymm_3);
        ARM_COMPUTE_RETURN_ON_ERROR(CLSlice::validate(&output_lowp, &input_gate_input, { 0 }, { output_size }));
        // _slice_forget_tensor
        forget_gate_input = TensorInfo(TensorShape(output_size), 1, DataType::QSYMM16, qsymm_3);
        ARM_COMPUTE_RETURN_ON_ERROR(CLSlice::validate(&output_lowp, &forget_gate_input, { output_size }, { 2 * output_size }));
        // _slice_cell_tensor
        input_modulation_gate_input = TensorInfo(TensorShape(output_size), 1, DataType::QSYMM16, qsymm_3);
        ARM_COMPUTE_RETURN_ON_ERROR(CLSlice::validate(&output_lowp, &input_modulation_gate_input, { 2 * output_size }, { 3 * output_size }));
        // _slice_output_tensor
        output_gate_input = TensorInfo(TensorShape(output_size), 1, DataType::QSYMM16, qsymm_3);
        ARM_COMPUTE_RETURN_ON_ERROR(CLSlice::validate(&output_lowp, &output_gate_input, { 3 * output_size }, { 4 * output_size }));
    }

    // _sigmoid_forget_gate
    const TensorInfo forget_gate_output(forget_gate_input.tensor_shape(), 1, DataType::QSYMM16, qsymm_0);
    ARM_COMPUTE_RETURN_ON_ERROR(CLActivationLayer::validate(&forget_gate_input, &forget_gate_output, ActivationLayerInfo(ActivationLayerInfo::ActivationFunction::LOGISTIC)));
    // _sigmoid_input_gate
    const TensorInfo input_gate_output(input_gate_input.tensor_shape(), 1, DataType::QSYMM16, qsymm_0);
    ARM_COMPUTE_RETURN_ON_ERROR(CLActivationLayer::validate(&input_gate_input, &input_gate_output, ActivationLayerInfo(ActivationLayerInfo::ActivationFunction::LOGISTIC)));
    // _tanh_modulation_gate
    const TensorInfo input_modulation_gate_output(input_modulation_gate_input.tensor_shape(), 1, DataType::QSYMM16, qsymm_0);
    ARM_COMPUTE_RETURN_ON_ERROR(CLActivationLayer::validate(&input_modulation_gate_input, &input_modulation_gate_output, ActivationLayerInfo(ActivationLayerInfo::ActivationFunction::TANH, 1.0f, 1.0f)));
    // _sigmoid_output_gate
    const TensorInfo output_gate_output(output_gate_input.tensor_shape(), 1, DataType::QSYMM16, qsymm_0);
    ARM_COMPUTE_RETURN_ON_ERROR(CLActivationLayer::validate(&output_gate_input, &output_gate_output, ActivationLayerInfo(ActivationLayerInfo::ActivationFunction::LOGISTIC)));

    // _mul_forget_gate_cell_state
    const TensorInfo cell_state_tmp1(forget_gate_output.tensor_shape(), 1, DataType::QSYMM16, qsymm_4);
    ARM_COMPUTE_RETURN_ON_ERROR(CLPixelWiseMultiplication::validate(&forget_gate_output, cell_state_in, &cell_state_tmp1, 1, ConvertPolicy::SATURATE, RoundingPolicy::TO_ZERO));

    // _mul_input_gate_input_mod_gate
    const TensorInfo cell_state_tmp2(input_gate_output.tensor_shape(), 1, DataType::QSYMM16, qsymm_4);
    ARM_COMPUTE_RETURN_ON_ERROR(CLPixelWiseMultiplication::validate(&input_gate_output, &input_modulation_gate_output, &cell_state_tmp2, 1, ConvertPolicy::SATURATE, RoundingPolicy::TO_ZERO));

    // _add_cell_state_tmps
    ARM_COMPUTE_RETURN_ON_ERROR(CLArithmeticAddition::validate(&cell_state_tmp1, &cell_state_tmp2, cell_state_out, ConvertPolicy::SATURATE));

    // _tanh_output_state
    const TensorInfo output_state_tmp(cell_state_out->tensor_shape(), 1, DataType::QSYMM16, qsymm_0);
    ARM_COMPUTE_RETURN_ON_ERROR(CLActivationLayer::validate(cell_state_out, &output_state_tmp, ActivationLayerInfo(ActivationLayerInfo::ActivationFunction::TANH, 1.0f, 1.0f)));

    // _mul_output_state_tmp_output_gate
    const TensorInfo output_state_out_symm(output_gate_output.tensor_shape(), 1, DataType::QSYMM16, qsymm_0);
    ARM_COMPUTE_RETURN_ON_ERROR(CLPixelWiseMultiplication::validate(&output_state_tmp, &output_gate_output, &output_state_out_symm, 1, ConvertPolicy::SATURATE, RoundingPolicy::TO_ZERO));

    // _dequantize
    const TensorInfo output_state_out_f32(output_state_out_symm.tensor_shape(), 1, DataType::F32);
    ARM_COMPUTE_RETURN_ON_ERROR(CLDequantizationLayer::validate(&output_state_out_symm, &output_state_out_f32));

    // _quantize
    ARM_COMPUTE_RETURN_ON_ERROR(CLQuantizationLayer::validate(&output_state_out_f32, output_state_out));

    if(cell_state_out->total_size() != 0)
    {
        ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES(&cell_state_info, cell_state_out);
        ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_SHAPES(&cell_state_info, cell_state_out);
        ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_QUANTIZATION_INFO(&cell_state_info, cell_state_out);
    }

    if(output_state_out->total_size() != 0)
    {
        ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES(&output_state_info, output_state_out);
        ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_SHAPES(&output_state_info, output_state_out);
        ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_QUANTIZATION_INFO(&output_state_info, output_state_out);
    }

    return Status{};
}

The documentation for this class was generated from the following files:

CLLSTMLayerQuantized.h
CLLSTMLayerQuantized.cpp