Compute Library
 19.08
CLLSTMLayer Class Reference

This function performs a single time step in a Long Short-Term Memory (LSTM) layer. More...

#include <CLLSTMLayer.h>

Collaboration diagram for CLLSTMLayer:

Public Member Functions

 CLLSTMLayer (std::shared_ptr< IMemoryManager > memory_manager=nullptr)
 Default constructor. More...
 
void configure (const ICLTensor *input, const ICLTensor *input_to_forget_weights, const ICLTensor *input_to_cell_weights, const ICLTensor *input_to_output_weights, const ICLTensor *recurrent_to_forget_weights, const ICLTensor *recurrent_to_cell_weights, const ICLTensor *recurrent_to_output_weights, const ICLTensor *forget_gate_bias, const ICLTensor *cell_bias, const ICLTensor *output_gate_bias, const ICLTensor *output_state_in, const ICLTensor *cell_state_in, ICLTensor *scratch_buffer, ICLTensor *output_state_out, ICLTensor *cell_state_out, ICLTensor *output, const LSTMParams< ICLTensor > &lstm_params, const ActivationLayerInfo &activation_info, float cell_threshold=0.f, float projection_threshold=0.f)
 Initialize function's tensors. More...
 
void run () override
 Run the kernels contained in the function. More...
 
void prepare () override
 Prepare the function for executing. More...
 
- Public Member Functions inherited from IFunction
virtual ~IFunction ()=default
 Destructor. More...
 

Static Public Member Functions

static Status validate (const ITensorInfo *input, const ITensorInfo *input_to_forget_weights, const ITensorInfo *input_to_cell_weights, const ITensorInfo *input_to_output_weights, const ITensorInfo *recurrent_to_forget_weights, const ITensorInfo *recurrent_to_cell_weights, const ITensorInfo *recurrent_to_output_weights, const ITensorInfo *forget_gate_bias, const ITensorInfo *cell_bias, const ITensorInfo *output_gate_bias, const ITensorInfo *output_state_in, const ITensorInfo *cell_state_in, const ITensorInfo *scratch_buffer, const ITensorInfo *output_state_out, const ITensorInfo *cell_state_out, const ITensorInfo *output, const LSTMParams< ITensorInfo > &lstm_params, const ActivationLayerInfo &activation_info, float cell_threshold=0.f, float projection_threshold=0.f)
 Static function to check if given info will lead to a valid configuration of CLLSTMLayer. More...
 

Detailed Description

This function performs a single time step in a Long Short-Term Memory (LSTM) layer.

Definition at line 55 of file CLLSTMLayer.h.
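As context for the gate computations this function configures, here is a hedged, plain-C++ scalar sketch of one LSTM time step (num_units = input_size = batch_size = 1, no CIFG, peephole or projection). The real class runs the same recurrence over 2D ICLTensor objects through OpenCL kernels; none of the names in the sketch are part of the library API.

```cpp
#include <cmath>

// Illustrative sketch only: a scalar LSTM time step. The struct and the
// weight names (w_*, u_*, b_*) are assumptions for this example, not
// library types.
struct LstmState
{
    double cell;   // cell_state
    double output; // output_state
};

static double sigmoid(double x)
{
    return 1.0 / (1.0 + std::exp(-x));
}

LstmState lstm_step(double x, LstmState s,
                    double w_f, double u_f, double b_f,  // forget gate
                    double w_i, double u_i, double b_i,  // input gate
                    double w_c, double u_c, double b_c,  // cell candidate
                    double w_o, double u_o, double b_o)  // output gate
{
    const double forget_gate = sigmoid(w_f * x + u_f * s.output + b_f);
    const double input_gate  = sigmoid(w_i * x + u_i * s.output + b_i); // with CIFG: 1 - forget_gate
    const double cell_state  = forget_gate * s.cell
                               + input_gate * std::tanh(w_c * x + u_c * s.output + b_c);
    const double output_gate = sigmoid(w_o * x + u_o * s.output + b_o);
    return { cell_state, output_gate * std::tanh(cell_state) };
}
```

With all weights and biases at zero and a previous cell state of 1, each gate evaluates to sigmoid(0) = 0.5, so the new cell state is 0.5 and the output is 0.5 * tanh(0.5).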

Constructor & Destructor Documentation

◆ CLLSTMLayer()

CLLSTMLayer ( std::shared_ptr< IMemoryManager >  memory_manager = nullptr )

Default constructor.

Definition at line 40 of file CLLSTMLayer.cpp.

41  : _memory_group(std::move(memory_manager)), _fully_connected_input_gate(), _accum_input_gate1(), _subtract_input_gate(), _pixelwise_mul_input_gate(), _activation_input_gate(),
42  _fully_connected_forget_gate(), _accum_forget_gate1(), _pixelwise_mul_forget_gate(), _activation_forget_gate(), _fully_connected_cell_state(), _gemm_cell_state1(), _transpose_cell_state(),
43  _accum_cell_state1(), _accum_cell_state2(), _pixelwise_mul_cell_state1(), _activation_cell_state(), _cell_clip(), _pixelwise_mul_cell_state2(), _fully_connected_output(),
44  _pixelwise_mul_output_state1(), _accum_output1(), _activation_output(), _activation_output_state(), _pixelwise_mul_output_state2(), _fully_connected_output_state(), _projection_clip(),
45  _copy_cell_state(), _copy_output(), _concat_scratch_buffer(), _concat_inputs_forget_gate(), _concat_weights_forget_gate(), _concat_weights_input_gate(), _concat_weights_output(),
46  _ones_memset_kernel(), _mean_std_norm_input_gate(), _pixelwise_mul_input_gate_coeff(), _accum_input_gate_bias(), _mean_std_norm_forget_gate(), _pixelwise_mul_forget_gate_coeff(),
47  _accum_forget_gate_bias(), _mean_std_norm_cell_gate(), _pixelwise_mul_cell_gate_coeff(), _accum_cell_gate_bias(), _mean_std_norm_output_gate(), _pixelwise_mul_output_gate_coeff(),
48  _accum_output_gate_bias(), _input_gate_out1(), _input_gate_out2(), _input_gate_out3(), _input_gate_out4(), _forget_gate_out1(), _forget_gate_out2(), _forget_gate_out3(), _forget_gate_out4(),
49  _forget_gate_out5(), _forget_gate_out6(), _cell_state_out1(), _cell_state_out2(), _cell_state_out3(), _cell_state_out4(), _cell_state_out5(), _output1(), _output2(), _output3(), _output4(),
50  _cell_state_activation(), _output_state1(), _ones(), _input_layer_norm_out1(), _input_layer_norm_out2(), _forget_layer_norm_out1(), _forget_layer_norm_out2(), _cell_layer_norm_out1(),
51  _cell_layer_norm_out2(), _output_layer_norm_out1(), _output_layer_norm_out2(), _run_peephole_opt(false), _run_cifg_opt(false), _perform_cell_clipping(false), _has_projection_weights(false),
52  _perform_projection_clipping(false), _is_prepared(false), _is_layer_norm_lstm(false)
53 {
54 }

Member Function Documentation

◆ configure()

void configure ( const ICLTensor *  input,
const ICLTensor *  input_to_forget_weights,
const ICLTensor *  input_to_cell_weights,
const ICLTensor *  input_to_output_weights,
const ICLTensor *  recurrent_to_forget_weights,
const ICLTensor *  recurrent_to_cell_weights,
const ICLTensor *  recurrent_to_output_weights,
const ICLTensor *  forget_gate_bias,
const ICLTensor *  cell_bias,
const ICLTensor *  output_gate_bias,
const ICLTensor *  output_state_in,
const ICLTensor *  cell_state_in,
ICLTensor *  scratch_buffer,
ICLTensor *  output_state_out,
ICLTensor *  cell_state_out,
ICLTensor *  output,
const LSTMParams< ICLTensor > &  lstm_params,
const ActivationLayerInfo &  activation_info,
float  cell_threshold = 0.f,
float  projection_threshold = 0.f 
)

Initialize function's tensors.

Parameters
[in]  input                        Source tensor. Input is a 2D tensor with dimensions [input_size, batch_size]. Data types supported: F16/F32.
[in]  input_to_forget_weights      2D weights tensor with dimensions [input_size, num_units]. Data type supported: Same as input.
[in]  input_to_cell_weights        2D weights tensor with dimensions [input_size, num_units]. Data type supported: Same as input.
[in]  input_to_output_weights      2D weights tensor with dimensions [input_size, num_units]. Data type supported: Same as input.
[in]  recurrent_to_forget_weights  2D weights tensor with dimensions [output_size, num_units]. Data type supported: Same as input.
[in]  recurrent_to_cell_weights    2D weights tensor with dimensions [output_size, num_units]. Data type supported: Same as input.
[in]  recurrent_to_output_weights  2D weights tensor with dimensions [output_size, num_units]. Data type supported: Same as input.
[in]  forget_gate_bias             1D weights tensor with dimensions [num_units]. Data type supported: Same as input.
[in]  cell_bias                    1D weights tensor with dimensions [num_units]. Data type supported: Same as input.
[in]  output_gate_bias             1D weights tensor with dimensions [num_units]. Data type supported: Same as input.
[in]  output_state_in              2D tensor with dimensions [output_size, batch_size]. Data type supported: Same as input.
[in]  cell_state_in                2D tensor with dimensions [num_units, batch_size]. Data type supported: Same as input.
[out] scratch_buffer               2D tensor with dimensions [num_units * 3, batch_size] with CIFG or [num_units * 4, batch_size] without CIFG. Data type supported: Same as input.
[out] output_state_out             2D tensor with dimensions [output_size, batch_size]. Data type supported: Same as input.
[out] cell_state_out               2D tensor with dimensions [num_units, batch_size]. Data type supported: Same as input.
[out] output                       Destination tensor. Output is a 2D tensor with dimensions [output_size, batch_size]. Data types supported: Same as input.
[in]  lstm_params                  (Optional) Weights tensors used in the peephole, CIFG, projection and layer-normalization optimizations:
      input_to_input_weights          2D weights tensor with dimensions [input_size, num_units]. Data type supported: Same as input.
      recurrent_to_input_weights      2D weights tensor with dimensions [output_size, num_units]. Data type supported: Same as input.
      cell_to_input_weights           1D weights tensor with dimensions [num_units]. Can be nullptr. Data type supported: Same as input.
      cell_to_forget_weights          1D weights tensor with dimensions [num_units]. Data type supported: Same as input.
      cell_to_output_weights          1D weights tensor with dimensions [num_units]. Data type supported: Same as input.
      input_gate_bias                 1D weights tensor with dimensions [num_units]. Data type supported: Same as input.
      projection_weights              2D weights tensor with dimensions [output_size, num_units]. Data type supported: Same as input.
      projection_bias                 1D weights tensor with dimensions [output_size]. Data type supported: Same as input.
      input_layer_norm_coefficients   1D weights tensor with dimensions [num_units]. Data type supported: Same as input.
      forget_layer_norm_coefficients  1D weights tensor with dimensions [num_units]. Data type supported: Same as input.
      cell_layer_norm_coefficients    1D weights tensor with dimensions [num_units]. Data type supported: Same as input.
      output_layer_norm_coefficients  1D weights tensor with dimensions [num_units]. Data type supported: Same as input.
[in]  activation_info              Contains activation information described in ActivationLayerInfo.
[in]  cell_threshold               The clipping threshold for the cell state, such that values are bound within [-cell_clip, cell_clip]. If set to 0.0f then clipping is disabled.
[in]  projection_threshold         The clipping threshold for the output from the projection layer, such that values are bound within [-proj_clip, proj_clip]. If set to 0.0f then clipping is disabled.

lstm_res = PixelwiseMul(output, Activation(cell_state))

output_state = Clip(lstm_res * projection_weights + projection_bias, projection_threshold), if there is a projection
output_state = lstm_res, otherwise
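The projection branch above can be written out as a small scalar sketch (the function and parameter names are illustrative, not the library API); a projection_threshold of 0 disables clipping, as documented for configure().

```cpp
#include <cmath>

// Illustrative scalar version of the output_state selection. The clip
// mirrors the LU_BOUNDED_RELU activation used internally for clipping.
double output_state(double lstm_res, bool has_projection,
                    double projection_weights, double projection_bias,
                    double projection_threshold)
{
    if (!has_projection)
        return lstm_res;
    double proj = lstm_res * projection_weights + projection_bias;
    if (projection_threshold != 0.0)
        proj = std::fmin(std::fmax(proj, -projection_threshold), projection_threshold);
    return proj;
}
```

For example, with lstm_res = 2, projection_weights = 3 and projection_bias = 0.5, the projected value 6.5 is clipped to 4 when projection_threshold = 4, and passed through unchanged when the threshold is 0.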

Definition at line 56 of file CLLSTMLayer.cpp.

63 {
64  ARM_COMPUTE_ERROR_ON_NULLPTR(input,
65  input_to_forget_weights, input_to_cell_weights, input_to_output_weights,
66  recurrent_to_forget_weights, recurrent_to_cell_weights, recurrent_to_output_weights,
67  forget_gate_bias, cell_bias, output_gate_bias,
68  output_state_in, cell_state_in,
69  scratch_buffer, output_state_out, cell_state_out, output);
70 
71  _is_layer_norm_lstm = lstm_params.use_layer_norm();
72 
73  // Set lstm parameters
74  LSTMParams<ITensorInfo> lstm_params_info;
75  if(lstm_params.has_peephole_opt())
76  {
77  lstm_params_info.set_peephole_params(lstm_params.cell_to_forget_weights()->info(), lstm_params.cell_to_output_weights()->info());
78  }
79  if(lstm_params.has_projection())
80  {
81  lstm_params_info.set_projection_params(lstm_params.projection_weights()->info(),
82  lstm_params.projection_bias() != nullptr ? lstm_params.projection_bias()->info() : nullptr);
83  }
84  if(!lstm_params.has_cifg_opt())
85  {
86  const ITensorInfo *cell_to_input_weights_info = (lstm_params.has_peephole_opt()) ? lstm_params.cell_to_input_weights()->info() : nullptr;
87  lstm_params_info.set_cifg_params(lstm_params.input_to_input_weights()->info(), lstm_params.recurrent_to_input_weights()->info(),
88  cell_to_input_weights_info, lstm_params.input_gate_bias()->info());
89  }
90 
91  // Validate
92  ARM_COMPUTE_ERROR_THROW_ON(CLLSTMLayer::validate(input->info(), input_to_forget_weights->info(),
93  input_to_cell_weights->info(), input_to_output_weights->info(),
94  recurrent_to_forget_weights->info(), recurrent_to_cell_weights->info(), recurrent_to_output_weights->info(),
95  forget_gate_bias->info(), cell_bias->info(), output_gate_bias->info(),
96  output_state_in->info(), cell_state_in->info(),
97  scratch_buffer->info(), output_state_out->info(), cell_state_out->info(), output->info(),
98  lstm_params_info, activation_info, cell_threshold, projection_threshold));
99 
100  const TensorShape cell_state_shape = cell_state_in->info()->tensor_shape();
101  // Configure block that calculates the forget gate
102  // forget_gate = Activation(input * input_to_forget_weights + output_state_in * recurrent_to_forget_weights + PixelWiseMul(cell_state, cell_to_forget_weights) + forget_gate_bias)
103  // We optimize this as follows:
104  // forget_gate = Activation( (input,output_state_in) * (input_to_forget_weights,recurrent_to_forget_weights) + PixelWiseMul(cell_state, cell_to_forget_weights) + forget_gate_bias
105  _forget_gate_out1.allocator()->init(TensorInfo(cell_state_shape, 1, input->info()->data_type()));
106  _forget_gate_out3.allocator()->init(TensorInfo(cell_state_shape, 1, input->info()->data_type()));
107  _forget_gate_out5.allocator()->init(TensorInfo(cell_state_shape, 1, input->info()->data_type()));
108 
109  std::vector<const ICLTensor *> inputs_vector;
110  inputs_vector.emplace_back(input);
111  inputs_vector.emplace_back(output_state_in);
112  const TensorShape concat_shape = arm_compute::misc::shape_calculator::calculate_concatenate_shape(inputs_vector, 0);
113  _forget_gate_out2.allocator()->init(TensorInfo(concat_shape, 1, input->info()->data_type()));
114 
115  _memory_group.manage(&_forget_gate_out2);
116  _concat_inputs_forget_gate.configure(input, output_state_in, &_forget_gate_out2);
117 
118  std::vector<const ICLTensor *> weights_vector;
119 
120  weights_vector.emplace_back(input_to_forget_weights);
121  weights_vector.emplace_back(recurrent_to_forget_weights);
122  const TensorShape weights_concat_shape = arm_compute::misc::shape_calculator::calculate_concatenate_shape(weights_vector, 0);
123  _forget_gate_out6.allocator()->init(TensorInfo(weights_concat_shape, 1, input->info()->data_type()));
124 
125  _concat_weights_forget_gate.configure(input_to_forget_weights, recurrent_to_forget_weights, &_forget_gate_out6);
126 
127  _memory_group.manage(&_forget_gate_out5);
128  _fully_connected_forget_gate.configure(&_forget_gate_out2, &_forget_gate_out6, (_is_layer_norm_lstm) ? nullptr : forget_gate_bias, &_forget_gate_out5);
129  _memory_group.manage(&_forget_gate_out1);
130  _memory_group.manage(&_forget_gate_out3);
131  _forget_gate_out6.allocator()->allocate();
132 
133  CLTensor *forget_gate_out = &_forget_gate_out5;
134  if(lstm_params.has_peephole_opt())
135  {
136  _forget_gate_out4.allocator()->init(TensorInfo(cell_state_shape, 1, input->info()->data_type()));
137 
138  _run_peephole_opt = true;
139  _memory_group.manage(&_forget_gate_out4);
140  _pixelwise_mul_forget_gate.configure(cell_state_in, lstm_params.cell_to_forget_weights(), &_forget_gate_out4, 1, ConvertPolicy::SATURATE, RoundingPolicy::TO_NEAREST_EVEN);
141  _accum_forget_gate1.configure(&_forget_gate_out5, &_forget_gate_out4, &_forget_gate_out3, ConvertPolicy::SATURATE);
142  _forget_gate_out4.allocator()->allocate();
143  _forget_gate_out5.allocator()->allocate();
144  forget_gate_out = &_forget_gate_out3;
145  }
146  else
147  {
148  _forget_gate_out3.allocator()->allocate();
149  }
150  if(_is_layer_norm_lstm)
151  {
152  _forget_layer_norm_out1.allocator()->init(TensorInfo(cell_state_shape, 1, input->info()->data_type()));
153  _forget_layer_norm_out2.allocator()->init(TensorInfo(cell_state_shape, 1, input->info()->data_type()));
154  _memory_group.manage(&_forget_layer_norm_out1);
155  _memory_group.manage(&_forget_layer_norm_out2);
156  _mean_std_norm_forget_gate.configure(forget_gate_out);
157  _pixelwise_mul_forget_gate_coeff.configure(forget_gate_out, lstm_params.forget_layer_norm_weights(), &_forget_layer_norm_out1, 1, ConvertPolicy::SATURATE, RoundingPolicy::TO_NEAREST_EVEN);
158  // forget_gate_out is going to be reassigned, so allocate the tensor that it was assigned to before
159  forget_gate_out->allocator()->allocate();
160  _accum_forget_gate_bias.configure(ArithmeticOperation::ADD, &_forget_layer_norm_out1, forget_gate_bias, &_forget_layer_norm_out2, ConvertPolicy::SATURATE);
161  _forget_layer_norm_out1.allocator()->allocate();
162  forget_gate_out = &_forget_layer_norm_out2;
163  }
164  _activation_forget_gate.configure(forget_gate_out, nullptr, ActivationLayerInfo(ActivationLayerInfo::ActivationFunction::LOGISTIC));
165 
166  // Configure block that calculates the input gate
167  // input_gate = Activation(input * input_to_input_weights + output_state * recurrent_to_input_weights + PixelWiseMul(cell_state, cell_to_input_weights) + input_gate_bias), without CIFG
168  // input_gate = 1 - forget_gate, with CIFG
169  // We optimize this as follows:
170  // input_gate = Activation((input,output_state) * (input_to_input_weights,recurrent_to_input_weights) + PixelWiseMul(cell_state, cell_to_input_weights) + input_gate_bias), without CIFG
171  _input_gate_out1.allocator()->init(TensorInfo(cell_state_shape, 1, input->info()->data_type()));
172  CLTensor *input_gate_out = &_input_gate_out1;
173  if(lstm_params.has_cifg_opt())
174  {
175  _memory_group.manage(&_input_gate_out1);
176  _ones.allocator()->init(TensorInfo(cell_state_shape, 1, input->info()->data_type()));
177  _ones_memset_kernel.configure(&_ones, PixelValue(1, _ones.info()->data_type()));
178  _subtract_input_gate.configure(ArithmeticOperation::SUB, &_ones, forget_gate_out, &_input_gate_out1, ConvertPolicy::SATURATE);
179  _ones.allocator()->allocate();
180  _run_cifg_opt = true;
181  }
182  else
183  {
184  _input_gate_out3.allocator()->init(TensorInfo(cell_state_shape, 1, input->info()->data_type()));
185  _input_gate_out4.allocator()->init(TensorInfo(cell_state_shape, 1, input->info()->data_type()));
186 
187  std::vector<const ICLTensor *> lstm_weights;
188  lstm_weights.emplace_back(lstm_params.input_to_input_weights());
189  lstm_weights.emplace_back(lstm_params.recurrent_to_input_weights());
190  TensorShape lstm_weights_concat_shape = arm_compute::misc::shape_calculator::calculate_concatenate_shape(lstm_weights, 0);
191  _input_gate_out2.allocator()->init(TensorInfo(lstm_weights_concat_shape, 1, input->info()->data_type()));
192 
193  _concat_weights_input_gate.configure(lstm_params.input_to_input_weights(), lstm_params.recurrent_to_input_weights(), &_input_gate_out2);
194 
195  _memory_group.manage(&_input_gate_out1);
196 
197  _memory_group.manage(&_input_gate_out3);
198  _fully_connected_input_gate.configure(&_forget_gate_out2, &_input_gate_out2, (_is_layer_norm_lstm) ? nullptr : lstm_params.input_gate_bias(), &_input_gate_out3);
199  _input_gate_out2.allocator()->allocate();
200 
201  input_gate_out = &_input_gate_out3;
202  if(_run_peephole_opt)
203  {
204  _memory_group.manage(&_input_gate_out4);
205  _pixelwise_mul_input_gate.configure(cell_state_in, lstm_params.cell_to_input_weights(), &_input_gate_out4, 1, ConvertPolicy::SATURATE, RoundingPolicy::TO_NEAREST_EVEN);
206  _accum_input_gate1.configure(&_input_gate_out3, &_input_gate_out4, &_input_gate_out1, ConvertPolicy::SATURATE);
207  _input_gate_out3.allocator()->allocate();
208  _input_gate_out4.allocator()->allocate();
209  input_gate_out = &_input_gate_out1;
210  }
211  else
212  {
213  _input_gate_out1.allocator()->allocate();
214  }
215 
216  if(_is_layer_norm_lstm)
217  {
218  _input_layer_norm_out1.allocator()->init(TensorInfo(cell_state_shape, 1, input->info()->data_type()));
219  _input_layer_norm_out2.allocator()->init(TensorInfo(cell_state_shape, 1, input->info()->data_type()));
220  _memory_group.manage(&_input_layer_norm_out1);
221  _memory_group.manage(&_input_layer_norm_out2);
222  _mean_std_norm_input_gate.configure(input_gate_out);
223  _pixelwise_mul_input_gate_coeff.configure(input_gate_out, lstm_params.input_layer_norm_weights(), &_input_layer_norm_out1, 1, ConvertPolicy::SATURATE, RoundingPolicy::TO_NEAREST_EVEN);
224  // input_gate_out is going to be reassigned, so allocate the tensor that it was assigned to before
225  input_gate_out->allocator()->allocate();
226  _accum_input_gate_bias.configure(ArithmeticOperation::ADD, &_input_layer_norm_out1, lstm_params.input_gate_bias(), &_input_layer_norm_out2, ConvertPolicy::SATURATE);
227  _input_layer_norm_out1.allocator()->allocate();
228  input_gate_out = &_input_layer_norm_out2;
229  }
230  _activation_input_gate.configure(input_gate_out, nullptr, ActivationLayerInfo(ActivationLayerInfo::ActivationFunction::LOGISTIC));
231  }
232 
233  // Configure block that calculates the cell state
234  // cell_state = Clip((PixelwiseMul(input_gate, Activation(input * input_to_cell_weights + output_state_in * recurrent_to_cell_weights + cell_bias)) + PixelwiseMul(forget_gate, cell_state)), cell_threshold)
235  TensorShape cell_state1_shape = compute_transposed_shape(*recurrent_to_output_weights->info());
236  _cell_state_out1.allocator()->init(TensorInfo(cell_state_shape, 1, input->info()->data_type()));
237  _cell_state_out2.allocator()->init(TensorInfo(cell_state1_shape, 1, input->info()->data_type()));
238  _cell_state_out3.allocator()->init(TensorInfo(cell_state_shape, 1, input->info()->data_type()));
239  _cell_state_out4.allocator()->init(TensorInfo(cell_state_shape, 1, input->info()->data_type()));
240  _cell_state_out5.allocator()->init(TensorInfo(cell_state_shape, 1, input->info()->data_type()));
241 
242  _memory_group.manage(&_cell_state_out1);
243  _fully_connected_cell_state.configure(input, input_to_cell_weights, (_is_layer_norm_lstm) ? nullptr : cell_bias, &_cell_state_out1);
244  _memory_group.manage(&_cell_state_out2);
245  _transpose_cell_state.configure(recurrent_to_cell_weights, &_cell_state_out2);
246  _memory_group.manage(&_cell_state_out3);
247  _gemm_cell_state1.configure(output_state_in, &_cell_state_out2, nullptr, &_cell_state_out3, 1.f, 0.f);
248  _cell_state_out2.allocator()->allocate();
249  _memory_group.manage(&_cell_state_out4);
250  _accum_cell_state1.configure(ArithmeticOperation::ADD, &_cell_state_out1, &_cell_state_out3, &_cell_state_out4, ConvertPolicy::SATURATE);
251  CLTensor *cell_state_out_ptr = &_cell_state_out4;
252  if(_is_layer_norm_lstm)
253  {
254  _cell_layer_norm_out1.allocator()->init(TensorInfo(cell_state_shape, 1, input->info()->data_type()));
255  _cell_layer_norm_out2.allocator()->init(TensorInfo(cell_state_shape, 1, input->info()->data_type()));
256  _memory_group.manage(&_cell_layer_norm_out1);
257  _memory_group.manage(&_cell_layer_norm_out2);
258  _mean_std_norm_cell_gate.configure(cell_state_out_ptr);
259  _pixelwise_mul_cell_gate_coeff.configure(cell_state_out_ptr, lstm_params.cell_layer_norm_weights(), &_cell_layer_norm_out1, 1, ConvertPolicy::SATURATE, RoundingPolicy::TO_NEAREST_EVEN);
260  // cell_state_out_ptr is going to be reassigned, so allocate the tensor that it was assigned to before
261  cell_state_out_ptr->allocator()->allocate();
262  _accum_cell_gate_bias.configure(ArithmeticOperation::ADD, &_cell_layer_norm_out1, cell_bias, &_cell_layer_norm_out2, ConvertPolicy::SATURATE);
263  _cell_layer_norm_out1.allocator()->allocate();
264  cell_state_out_ptr = &_cell_layer_norm_out2;
265  }
266  _activation_cell_state.configure(cell_state_out_ptr, nullptr, activation_info);
267  _memory_group.manage(&_cell_state_out5);
268  _pixelwise_mul_cell_state1.configure(cell_state_out_ptr, input_gate_out, &_cell_state_out5, 1, ConvertPolicy::SATURATE, RoundingPolicy::TO_NEAREST_EVEN);
269  cell_state_out_ptr->allocator()->allocate();
270  _pixelwise_mul_cell_state2.configure(forget_gate_out, cell_state_in, &_cell_state_out3, 1, ConvertPolicy::SATURATE, RoundingPolicy::TO_NEAREST_EVEN);
271  _accum_cell_state2.configure(ArithmeticOperation::ADD, &_cell_state_out5, &_cell_state_out3, &_cell_state_out1, ConvertPolicy::SATURATE);
272  _cell_state_out3.allocator()->allocate();
273  _cell_state_out5.allocator()->allocate();
274  // Perform clipping
275  if(cell_threshold != 0.f)
276  {
277  _perform_cell_clipping = true;
278  _cell_clip.configure(&_cell_state_out1, nullptr, ActivationLayerInfo(ActivationLayerInfo::ActivationFunction::LU_BOUNDED_RELU, -cell_threshold, cell_threshold));
279  }
280 
281  // Configure block that calculates the output
282  // output_state_out = Activation(input * input_to_output_weights + output_state_in * recurrent_to_output_weights + PixelWiseMul(cell_state, cell_to_output_weights) + output_gate_bias)
283  // We optimize this as follows:
284  // output_state_out = Activation( (input,output_state_in) * (input_to_output_weights, recurrent_to_output_weights) + PixelWiseMul(cell_state, cell_to_output_weights) + output_gate_bias)
285  _output1.allocator()->init(TensorInfo(cell_state_shape, 1, input->info()->data_type()));
286  _output4.allocator()->init(TensorInfo(cell_state_shape, 1, input->info()->data_type()));
287  std::vector<const ICLTensor *> in_out_weights;
288  in_out_weights.emplace_back(input_to_output_weights);
289  in_out_weights.emplace_back(recurrent_to_output_weights);
290  TensorShape in_out_weights_concat_shape = arm_compute::misc::shape_calculator::calculate_concatenate_shape(in_out_weights, 0);
291  _output2.allocator()->init(TensorInfo(in_out_weights_concat_shape, 1, input->info()->data_type()));
292 
293  _concat_weights_output.configure(input_to_output_weights, recurrent_to_output_weights, &_output2);
294 
295  _memory_group.manage(&_output1);
296  _memory_group.manage(&_output4);
297 
298  _fully_connected_output.configure(&_forget_gate_out2, &_output2, (_is_layer_norm_lstm) ? nullptr : output_gate_bias, &_output4);
299 
300  _output2.allocator()->allocate();
301  _forget_gate_out2.allocator()->allocate();
302 
303  CLTensor *output_gate_out = &_output4;
304  if(lstm_params.has_peephole_opt())
305  {
306  _output3.allocator()->init(TensorInfo(_cell_state_out1.info()->tensor_shape(), 1, input->info()->data_type()));
307 
308  _memory_group.manage(&_output3);
309  _pixelwise_mul_output_state1.configure(&_cell_state_out1, lstm_params.cell_to_output_weights(), &_output3, 1, ConvertPolicy::SATURATE, RoundingPolicy::TO_NEAREST_EVEN);
310  _accum_output1.configure(&_output4, &_output3, &_output1, ConvertPolicy::SATURATE);
311  _output4.allocator()->allocate();
312  output_gate_out = &_output1;
313 
314  // Allocate intermediate buffers
315  _output3.allocator()->allocate();
316  }
317  else
318  {
319  _output1.allocator()->allocate();
320  }
321  if(_is_layer_norm_lstm)
322  {
323  _output_layer_norm_out1.allocator()->init(TensorInfo(cell_state_shape, 1, input->info()->data_type()));
324  _output_layer_norm_out2.allocator()->init(TensorInfo(cell_state_shape, 1, input->info()->data_type()));
325  _memory_group.manage(&_output_layer_norm_out1);
326  _memory_group.manage(&_output_layer_norm_out2);
327  _mean_std_norm_output_gate.configure(output_gate_out);
328  _pixelwise_mul_output_gate_coeff.configure(output_gate_out, lstm_params.output_layer_norm_weights(), &_output_layer_norm_out1, 1, ConvertPolicy::SATURATE, RoundingPolicy::TO_NEAREST_EVEN);
329  // output_gate_out is going to be reassigned, so allocate the tensor that it was assigned to before
330  output_gate_out->allocator()->allocate();
331  _accum_output_gate_bias.configure(ArithmeticOperation::ADD, &_output_layer_norm_out1, output_gate_bias, &_output_layer_norm_out2, ConvertPolicy::SATURATE);
332  _output_layer_norm_out1.allocator()->allocate();
333  output_gate_out = &_output_layer_norm_out2;
334  }
335  _activation_output.configure(output_gate_out, nullptr, ActivationLayerInfo(ActivationLayerInfo::ActivationFunction::LOGISTIC));
336 
337  // Configure block that calculates the output state
338  // lstm_res = PixelwiseMul(output, Activation(cell_state))
339  //
340  // output_state = Clip(lstm_res * projection_weights + projection_bias, projection_threshold), if there is a projection
341  // output_state = lstm_res, otherwise
346  ICLTensor *output_state_out_tmp = lstm_params.has_projection() ? &_output_state1 : output_state_out;
347  _cell_state_activation.allocator()->init(TensorInfo(cell_state_shape, 1, input->info()->data_type()));
348  _output_state1.allocator()->init(TensorInfo(cell_state_shape, 1, input->info()->data_type()));
349 
350  _memory_group.manage(&_cell_state_activation);
351  _activation_output_state.configure(&_cell_state_out1, &_cell_state_activation, activation_info);
352  _pixelwise_mul_output_state2.configure(&_cell_state_activation, output_gate_out, output_state_out_tmp, 1, ConvertPolicy::SATURATE, RoundingPolicy::TO_NEAREST_EVEN);
353  _cell_state_activation.allocator()->allocate();
354 
355  if(lstm_params.has_projection())
356  {
357  _has_projection_weights = true;
358  _fully_connected_output_state.configure(output_state_out_tmp, lstm_params.projection_weights(), lstm_params.projection_bias(), output_state_out);
359  _output_state1.allocator()->allocate();
360  // Perform clipping
361  if(projection_threshold != 0.f)
362  {
363  _perform_projection_clipping = true;
364  _projection_clip.configure(output_state_out, nullptr, ActivationLayerInfo(ActivationLayerInfo::ActivationFunction::LU_BOUNDED_RELU, -projection_threshold, projection_threshold));
365  }
366  }
367 
368  // Copy cell state and output
369  _copy_cell_state.configure(&_cell_state_out1, cell_state_out);
370  _copy_output.configure(output_state_out, output);
371 
372  // Vector for holding the tensors to store in scratch buffer
373  std::vector<ICLTensor *> scratch_inputs;
374  if(!lstm_params.has_cifg_opt())
375  {
376  scratch_inputs.emplace_back(input_gate_out);
377  }
378  scratch_inputs.emplace_back(&_cell_state_out1);
379  scratch_inputs.emplace_back(forget_gate_out);
380  scratch_inputs.emplace_back(output_gate_out);
381  _concat_scratch_buffer.configure(scratch_inputs, scratch_buffer, Window::DimX);
382  input_gate_out->allocator()->allocate();
383  _cell_state_out1.allocator()->allocate();
384  forget_gate_out->allocator()->allocate();
385  output_gate_out->allocator()->allocate();
386 }
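The scratch-buffer concatenation at the end of configure() (lines 373-385) stacks the input gate (skipped under CIFG), cell state, forget gate and output gate along dimension 0, which determines the scratch-buffer width. A minimal sketch (the helper name is an assumption, not a library function):

```cpp
#include <cstddef>

// Width of the scratch buffer along dimension 0: one num_units-wide slot
// per concatenated gate tensor. With CIFG the input gate is derived as
// 1 - forget_gate and is not stored.
std::size_t scratch_buffer_width(std::size_t num_units, bool has_cifg_opt)
{
    const std::size_t num_gates = has_cifg_opt ? 3 : 4;
    return num_units * num_gates;
}
```

For num_units = 16 this gives a width of 48 with CIFG and 64 without, matching the scratch_buffer dimensions documented for configure().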
Initialise the kernel's inputs and output.
Definition: CLGEMM.cpp:470
void configure(const ICLTensor *input, ICLTensor *output)
Initialise the kernel's input and output.
const T * output_layer_norm_weights() const
Definition: LSTMParams.h:175
const T * input_gate_bias() const
Definition: LSTMParams.h:135
void configure(ArithmeticOperation op, const ICLTensor *input1, const ICLTensor *input2, ICLTensor *output, const ConvertPolicy &policy)
Static function to check if given info will lead to a valid configuration of CLSaturatedArithmeticOpe...
Store the tensor's metadata.
Definition: TensorInfo.h:45
void configure(ICLTensor *tensor, const PixelValue &constant_value, Window *window=nullptr)
Initialise the kernel's tensor and filling value.
void configure(const ICLTensor *input1, const ICLTensor *input2, ICLTensor *output, float scale, ConvertPolicy overflow_policy, RoundingPolicy rounding_policy)
Initialise the kernel's input, output and border mode.
const TensorShape & tensor_shape() const override
Size for each dimension of the tensor.
Definition: TensorInfo.h:252
const T * forget_layer_norm_weights() const
Definition: LSTMParams.h:165
Basic implementation of the OpenCL tensor interface.
Definition: CLTensor.h:40

References arm_compute::ADD, CLTensorAllocator::allocate(), CLTensor::allocator(), ARM_COMPUTE_ERROR_ON_NULLPTR, ARM_COMPUTE_ERROR_THROW_ON, arm_compute::misc::shape_calculator::calculate_concatenate_shape(), LSTMParams< T >::cell_layer_norm_weights(), LSTMParams< T >::cell_to_forget_weights(), LSTMParams< T >::cell_to_input_weights(), LSTMParams< T >::cell_to_output_weights(), arm_compute::misc::shape_calculator::compute_transposed_shape(), CLMeanStdDevNormalizationLayer::configure(), CLTransposeKernel::configure(), CLArithmeticAddition::configure(), CLCopyKernel::configure(), CLActivationLayerKernel::configure(), CLMemsetKernel::configure(), CLWidthConcatenate2TensorsKernel::configure(), CLPixelWiseMultiplicationKernel::configure(), CLConcatenateLayer::configure(), CLGEMM::configure(), CLFullyConnectedLayer::configure(), CLSaturatedArithmeticOperationKernel::configure(), ITensorInfo::data_type(), TensorInfo::data_type(), Window::DimX, LSTMParams< T >::forget_layer_norm_weights(), LSTMParams< T >::has_cifg_opt(), LSTMParams< T >::has_peephole_opt(), LSTMParams< T >::has_projection(), ITensor::info(), CLTensor::info(), ITensorAllocator::init(), LSTMParams< T >::input_gate_bias(), LSTMParams< T >::input_layer_norm_weights(), LSTMParams< T >::input_to_input_weights(), ActivationLayerInfo::LOGISTIC, ActivationLayerInfo::LU_BOUNDED_RELU, MemoryGroupBase< TensorType >::manage(), LSTMParams< T >::output_layer_norm_weights(), LSTMParams< T >::projection_bias(), LSTMParams< T >::projection_weights(), LSTMParams< T >::recurrent_to_input_weights(), arm_compute::SATURATE, LSTMParams< T >::set_cifg_params(), LSTMParams< T >::set_peephole_params(), LSTMParams< T >::set_projection_params(), arm_compute::SUB, ITensorInfo::tensor_shape(), TensorInfo::tensor_shape(), arm_compute::TO_NEAREST_EVEN, LSTMParams< T >::use_layer_norm(), and CLLSTMLayer::validate().

◆ prepare()

void prepare ( )
override virtual

Prepare the function for executing.

Any one-off pre-processing step required by the function is handled here.

Note
The prepare stage might not require all of the function's buffers' backing memory to be available in order to execute.

Reimplemented from IFunction.

Definition at line 719 of file CLLSTMLayer.cpp.

720 {
721  if(!_is_prepared)
722  {
723  CLScheduler::get().enqueue(_concat_weights_forget_gate);
724  if(!_run_cifg_opt)
725  {
726  CLScheduler::get().enqueue(_concat_weights_input_gate);
727  }
728  CLScheduler::get().enqueue(_concat_weights_output);
729  _is_prepared = true;
730  }
731 }

References CLScheduler::enqueue(), and CLScheduler::get().

Referenced by CLLSTMLayer::run().

◆ run()

void run ( )
override virtual

Run the kernels contained in the function.

For NEON kernels:

  • Multi-threading is used for the kernels which are parallelisable.
  • By default std::thread::hardware_concurrency() threads are used.
Note
CPPScheduler::set_num_threads() can be used to manually set the number of threads

For OpenCL kernels:

  • All the kernels are enqueued on the queue associated with CLScheduler.
  • The queue is then flushed.
Note
The function does not block until the kernels have executed; it is the user's responsibility to wait for completion.
prepare() is called automatically on the first run if it has not already been done.

Implements IFunction.

Definition at line 619 of file CLLSTMLayer.cpp.

620 {
621  prepare();
622 
623  MemoryGroupResourceScope scope_mg(_memory_group);
624 
625  CLScheduler::get().enqueue(_concat_inputs_forget_gate);
626 
627  _fully_connected_forget_gate.run();
628 
629  if(_run_peephole_opt)
630  {
631  CLScheduler::get().enqueue(_pixelwise_mul_forget_gate);
632  _accum_forget_gate1.run();
633  }
634  if(_is_layer_norm_lstm)
635  {
636  _mean_std_norm_forget_gate.run();
637  CLScheduler::get().enqueue(_pixelwise_mul_forget_gate_coeff);
638  CLScheduler::get().enqueue(_accum_forget_gate_bias);
639  }
640  CLScheduler::get().enqueue(_activation_forget_gate);
641 
642  if(_run_cifg_opt)
643  {
644  CLScheduler::get().enqueue(_ones_memset_kernel);
645  CLScheduler::get().enqueue(_subtract_input_gate);
646  }
647  else
648  {
649  _fully_connected_input_gate.run();
650 
651  if(_run_peephole_opt)
652  {
653  CLScheduler::get().enqueue(_pixelwise_mul_input_gate);
654  _accum_input_gate1.run();
655  }
656 
657  if(_is_layer_norm_lstm)
658  {
659  _mean_std_norm_input_gate.run();
660  CLScheduler::get().enqueue(_pixelwise_mul_input_gate_coeff);
661  CLScheduler::get().enqueue(_accum_input_gate_bias);
662  }
663  CLScheduler::get().enqueue(_activation_input_gate);
664  }
665 
666  _fully_connected_cell_state.run();
667  CLScheduler::get().enqueue(_transpose_cell_state);
668  _gemm_cell_state1.run();
669  CLScheduler::get().enqueue(_accum_cell_state1);
670  if(_is_layer_norm_lstm)
671  {
672  _mean_std_norm_cell_gate.run();
673  CLScheduler::get().enqueue(_pixelwise_mul_cell_gate_coeff);
674  CLScheduler::get().enqueue(_accum_cell_gate_bias);
675  }
676  CLScheduler::get().enqueue(_activation_cell_state);
677  CLScheduler::get().enqueue(_pixelwise_mul_cell_state1);
678  CLScheduler::get().enqueue(_pixelwise_mul_cell_state2);
679  CLScheduler::get().enqueue(_accum_cell_state2);
680 
681  if(_perform_cell_clipping)
682  {
683  CLScheduler::get().enqueue(_cell_clip);
684  }
685 
686  _fully_connected_output.run();
687 
688  if(_run_peephole_opt)
689  {
690  CLScheduler::get().enqueue(_pixelwise_mul_output_state1);
691  _accum_output1.run();
692  }
693  if(_is_layer_norm_lstm)
694  {
695  _mean_std_norm_output_gate.run();
696  CLScheduler::get().enqueue(_pixelwise_mul_output_gate_coeff);
697  CLScheduler::get().enqueue(_accum_output_gate_bias);
698  }
699  CLScheduler::get().enqueue(_activation_output);
700 
701  CLScheduler::get().enqueue(_activation_output_state);
702  CLScheduler::get().enqueue(_pixelwise_mul_output_state2);
703 
704  if(_has_projection_weights)
705  {
706  _fully_connected_output_state.run();
707  if(_perform_projection_clipping)
708  {
709  CLScheduler::get().enqueue(_projection_clip);
710  }
711  }
712 
713  CLScheduler::get().enqueue(_copy_cell_state);
714  CLScheduler::get().enqueue(_copy_output);
715 
716  _concat_scratch_buffer.run();
717 }

References CLScheduler::enqueue(), CLScheduler::get(), CLLSTMLayer::prepare(), ICLSimpleFunction::run(), CLConcatenateLayer::run(), CLGEMM::run(), and CLFullyConnectedLayer::run().

◆ validate()

Status validate ( const ITensorInfo *  input,
const ITensorInfo *  input_to_forget_weights,
const ITensorInfo *  input_to_cell_weights,
const ITensorInfo *  input_to_output_weights,
const ITensorInfo *  recurrent_to_forget_weights,
const ITensorInfo *  recurrent_to_cell_weights,
const ITensorInfo *  recurrent_to_output_weights,
const ITensorInfo *  forget_gate_bias,
const ITensorInfo *  cell_bias,
const ITensorInfo *  output_gate_bias,
const ITensorInfo *  output_state_in,
const ITensorInfo *  cell_state_in,
const ITensorInfo *  scratch_buffer,
const ITensorInfo *  output_state_out,
const ITensorInfo *  cell_state_out,
const ITensorInfo *  output,
const LSTMParams< ITensorInfo > &  lstm_params,
const ActivationLayerInfo &  activation_info,
float  cell_threshold = 0.f,
float  projection_threshold = 0.f 
)
static

Static function to check if given info will lead to a valid configuration of CLLSTMLayer.

Parameters
[in]  input                        Source tensor info. Input is a 2D tensor with dimensions [input_size, batch_size]. Data types supported: F16/F32.
[in]  input_to_forget_weights      2D weights tensor info with dimensions [input_size, num_units]. Data type supported: Same as input.
[in]  input_to_cell_weights        2D weights tensor info with dimensions [input_size, num_units]. Data type supported: Same as input.
[in]  input_to_output_weights      2D weights tensor info with dimensions [input_size, num_units]. Data type supported: Same as input.
[in]  recurrent_to_forget_weights  2D weights tensor info with dimensions [output_size, num_units]. Data type supported: Same as input.
[in]  recurrent_to_cell_weights    2D weights tensor info with dimensions [output_size, num_units]. Data type supported: Same as input.
[in]  recurrent_to_output_weights  2D weights tensor info with dimensions [output_size, num_units]. Data type supported: Same as input.
[in]  forget_gate_bias             1D weights tensor info with dimensions [num_units]. Data type supported: Same as input.
[in]  cell_bias                    1D weights tensor info with dimensions [num_units]. Data type supported: Same as input.
[in]  output_gate_bias             1D weights tensor info with dimensions [num_units]. Data type supported: Same as input.
[in]  output_state_in              2D tensor info with dimensions [output_size, batch_size]. Data type supported: Same as input.
[in]  cell_state_in                2D tensor info with dimensions [num_units, batch_size]. Data type supported: Same as input.
[in]  scratch_buffer               2D tensor info with dimensions [num_units * 4, batch_size] without CIFG or [num_units * 3, batch_size] with CIFG. Data type supported: Same as input.
[in]  output_state_out             2D tensor info with dimensions [output_size, batch_size]. Data type supported: Same as input.
[in]  cell_state_out               2D tensor info with dimensions [num_units, batch_size]. Data type supported: Same as input.
[in]  output                       Destination tensor info. Output is a 2D tensor with dimensions [output_size, batch_size]. Data types supported: Same as input.
[in]  lstm_params                  (Optional) Weights tensors info used in the peephole, CIFG, projection and layer-normalization optimizations:
        input_to_input_weights          2D weights tensor info with dimensions [input_size, num_units]. Data type supported: Same as input.
        recurrent_to_input_weights      2D weights tensor info with dimensions [output_size, num_units]. Data type supported: Same as input.
        cell_to_input_weights           1D weights tensor info with dimensions [num_units]. Can be nullptr. Data type supported: Same as input.
        cell_to_forget_weights          1D weights tensor info with dimensions [num_units]. Data type supported: Same as input.
        cell_to_output_weights          1D weights tensor info with dimensions [num_units]. Data type supported: Same as input.
        input_gate_bias                 1D weights tensor info with dimensions [num_units]. Data type supported: Same as input.
        projection_weights              2D weights tensor info with dimensions [output_size, num_units]. Data type supported: Same as input.
        projection_bias                 1D weights tensor info with dimensions [output_size]. Data type supported: Same as input.
        input_layer_norm_coefficients   1D weights tensor info with dimensions [num_units]. Data type supported: Same as input.
        forget_layer_norm_coefficients  1D weights tensor info with dimensions [num_units]. Data type supported: Same as input.
        cell_layer_norm_coefficients    1D weights tensor info with dimensions [num_units]. Data type supported: Same as input.
        output_layer_norm_coefficients  1D weights tensor info with dimensions [num_units]. Data type supported: Same as input.
[in]  activation_info              Contains activation information described in ActivationLayerInfo.
[in]  cell_threshold               The clipping threshold for the cell state, such that values are bound within [-cell_clip, cell_clip]. If set to 0.0f then clipping is disabled.
[in]  projection_threshold         The clipping threshold for the output from the projection layer, such that values are bound within [-proj_clip, proj_clip]. If set to 0.0f then clipping is disabled.
Returns
a status

Definition at line 388 of file CLLSTMLayer.cpp.

395 {
397  input_to_forget_weights, input_to_cell_weights, input_to_output_weights,
398  recurrent_to_forget_weights, recurrent_to_cell_weights, recurrent_to_output_weights,
399  forget_gate_bias, cell_bias, output_gate_bias,
400  output_state_in, cell_state_in,
401  scratch_buffer, output_state_out, cell_state_out, output);
402 
403  // Check data types
406  input_to_forget_weights, input_to_cell_weights, input_to_output_weights,
407  recurrent_to_forget_weights, recurrent_to_cell_weights, recurrent_to_output_weights,
408  forget_gate_bias, cell_bias, output_gate_bias,
409  output_state_in, cell_state_in,
410  scratch_buffer, output_state_out, cell_state_out, output);
411 
412  // Check dimensions
414  ARM_COMPUTE_RETURN_ERROR_ON(input_to_forget_weights->num_dimensions() > 2);
415  ARM_COMPUTE_RETURN_ERROR_ON(input_to_cell_weights->num_dimensions() > 2);
416  ARM_COMPUTE_RETURN_ERROR_ON(input_to_output_weights->num_dimensions() > 2);
417  ARM_COMPUTE_RETURN_ERROR_ON(recurrent_to_forget_weights->num_dimensions() > 2);
418  ARM_COMPUTE_RETURN_ERROR_ON(recurrent_to_cell_weights->num_dimensions() > 2);
419  ARM_COMPUTE_RETURN_ERROR_ON(recurrent_to_output_weights->num_dimensions() > 2);
420  ARM_COMPUTE_RETURN_ERROR_ON(forget_gate_bias->num_dimensions() > 1);
422  ARM_COMPUTE_RETURN_ERROR_ON(output_gate_bias->num_dimensions() > 1);
423  ARM_COMPUTE_RETURN_ERROR_ON(output_state_in->num_dimensions() > 2);
424  ARM_COMPUTE_RETURN_ERROR_ON(cell_state_in->num_dimensions() > 2);
425  ARM_COMPUTE_RETURN_ERROR_ON(scratch_buffer->num_dimensions() > 2);
426  ARM_COMPUTE_RETURN_ERROR_ON(output_state_out->num_dimensions() > 2);
427  ARM_COMPUTE_RETURN_ERROR_ON(cell_state_out->num_dimensions() > 2);
429  ARM_COMPUTE_RETURN_ERROR_ON(cell_bias->dimension(0) * 4 != scratch_buffer->dimension(0)
430  && cell_bias->dimension(0) * 3 != scratch_buffer->dimension(0));
431 
432  const unsigned int num_batches = input->dimension(1);
433  const unsigned int num_cells = input_to_output_weights->dimension(1);
434 
435  if(lstm_params.use_layer_norm())
436  {
437  // If CIFG is used, input layer normalization weights tensor is omitted
438  if(lstm_params.has_cifg_opt())
439  {
440  ARM_COMPUTE_RETURN_ERROR_ON(lstm_params.input_layer_norm_weights() != nullptr);
441  }
442  else
443  {
446  ARM_COMPUTE_RETURN_ERROR_ON(lstm_params.input_layer_norm_weights()->dimension(0) != num_batches);
448  }
449 
455  ARM_COMPUTE_RETURN_ERROR_ON(lstm_params.forget_layer_norm_weights()->dimension(0) != num_batches);
456  ARM_COMPUTE_RETURN_ERROR_ON(lstm_params.cell_layer_norm_weights()->dimension(0) != num_batches);
457  ARM_COMPUTE_RETURN_ERROR_ON(lstm_params.output_layer_norm_weights()->dimension(0) != num_batches);
458  }
459 
460  // Check peephole optimization
461  if(lstm_params.has_peephole_opt())
462  {
466  }
467 
468  TensorShape units_out_transposed_shape = compute_transposed_shape(*recurrent_to_output_weights);
469  TensorShape num_units_transposed_shape = compute_transposed_shape(*forget_gate_bias);
470  const TensorInfo units_out_transposed_info = TensorInfo(units_out_transposed_shape, 1, input->data_type());
471  const TensorInfo num_units_transposed_info = TensorInfo(num_units_transposed_shape, 1, input->data_type());
472 
473  TensorInfo input_gate = TensorInfo(TensorShape(num_cells, num_batches), 1, input->data_type());
474  TensorInfo forget_gate = TensorInfo(TensorShape(num_cells, num_batches), 1, input->data_type());
475  TensorInfo output_gate_tmp = TensorInfo(TensorShape(num_cells, num_batches), 1, input->data_type());
476  TensorInfo cell_state_tmp = TensorInfo(TensorShape(num_cells, num_batches), 1, input->data_type());
477 
478  // Validate forget gate
479  ARM_COMPUTE_RETURN_ON_ERROR(CLFullyConnectedLayer::validate(input, input_to_forget_weights, (lstm_params.use_layer_norm()) ? nullptr : forget_gate_bias, &forget_gate));
480 
481  std::vector<const ITensorInfo *> inputs_vector;
482  inputs_vector.emplace_back(input);
483  inputs_vector.emplace_back(output_state_in);
485  TensorInfo forget_gate_concat = TensorInfo(concat_shape, 1, input->data_type());
486 
487  ARM_COMPUTE_RETURN_ON_ERROR(CLWidthConcatenate2TensorsKernel::validate(input, output_state_in, &forget_gate_concat));
488 
489  if(lstm_params.has_peephole_opt())
490  {
492  ARM_COMPUTE_RETURN_ON_ERROR(CLArithmeticAddition::validate(&forget_gate, &forget_gate, &forget_gate, ConvertPolicy::SATURATE));
493  }
494  if(lstm_params.use_layer_norm())
495  {
499  ARM_COMPUTE_RETURN_ON_ERROR(CLArithmeticAddition::validate(&forget_gate, forget_gate_bias, &forget_gate, ConvertPolicy::SATURATE));
500  }
502 
503  // Validate input gate
504  if(!lstm_params.has_cifg_opt())
505  {
507  lstm_params.recurrent_to_input_weights(),
508  lstm_params.input_gate_bias());
512 
513  std::vector<const ITensorInfo *> lstm_weights;
514  lstm_weights.emplace_back(lstm_params.input_to_input_weights());
515  lstm_weights.emplace_back(lstm_params.recurrent_to_input_weights());
516  TensorShape lstm_weights_concat_shape = arm_compute::misc::shape_calculator::calculate_concatenate_shape(lstm_weights, 0);
517  TensorInfo lstm_gate_concat = TensorInfo(lstm_weights_concat_shape, 1, input->data_type());
519 
520  ARM_COMPUTE_RETURN_ON_ERROR(CLFullyConnectedLayer::validate(input, lstm_params.input_to_input_weights(), (lstm_params.use_layer_norm()) ? nullptr : lstm_params.input_gate_bias(), &input_gate));
521 
522  if(lstm_params.has_peephole_opt())
523  {
528  }
529 
530  if(lstm_params.use_layer_norm())
531  {
535  }
537  }
538  else
539  {
541  }
542 
543  // Validate cell state
544  ARM_COMPUTE_RETURN_ON_ERROR(CLFullyConnectedLayer::validate(input, input_to_cell_weights, (lstm_params.use_layer_norm()) ? nullptr : cell_bias, &cell_state_tmp));
545  ARM_COMPUTE_RETURN_ON_ERROR(CLGEMM::validate(output_state_in, &units_out_transposed_info, nullptr, &cell_state_tmp, 1.f, 0.f, GEMMInfo()));
546  ARM_COMPUTE_RETURN_ON_ERROR(CLArithmeticAddition::validate(&cell_state_tmp, &cell_state_tmp, &cell_state_tmp, ConvertPolicy::SATURATE));
547  if(lstm_params.use_layer_norm())
548  {
552  ARM_COMPUTE_RETURN_ON_ERROR(CLArithmeticAddition::validate(&cell_state_tmp, cell_bias, &cell_state_tmp, ConvertPolicy::SATURATE));
553  }
554  ARM_COMPUTE_RETURN_ON_ERROR(CLActivationLayerKernel::validate(&cell_state_tmp, nullptr, activation_info));
557  ARM_COMPUTE_RETURN_ON_ERROR(CLArithmeticAddition::validate(&cell_state_tmp, &cell_state_tmp, &cell_state_tmp, ConvertPolicy::SATURATE));
558  if(cell_threshold != 0.f)
559  {
561  cell_threshold)));
562  }
563 
564  std::vector<const ITensorInfo *> in_out_weights;
565  in_out_weights.emplace_back(input_to_output_weights);
566  in_out_weights.emplace_back(recurrent_to_output_weights);
567  TensorShape in_out_weights_concat_shape = arm_compute::misc::shape_calculator::calculate_concatenate_shape(in_out_weights, 0);
568  TensorInfo in_out_gate_concat = TensorInfo(in_out_weights_concat_shape, 1, input->data_type());
569  ARM_COMPUTE_RETURN_ON_ERROR(CLWidthConcatenate2TensorsKernel::validate(input_to_output_weights, recurrent_to_output_weights, &in_out_gate_concat));
570  // Validate output gate tmp
571  ARM_COMPUTE_RETURN_ON_ERROR(CLFullyConnectedLayer::validate(input, input_to_output_weights, (lstm_params.use_layer_norm()) ? nullptr : output_gate_bias, &output_gate_tmp));
572 
573  if(lstm_params.has_peephole_opt())
574  {
577  ARM_COMPUTE_RETURN_ON_ERROR(CLArithmeticAddition::validate(&output_gate_tmp, &output_gate_tmp, &output_gate_tmp, ConvertPolicy::SATURATE));
578  }
579  if(lstm_params.use_layer_norm())
580  {
584  ARM_COMPUTE_RETURN_ON_ERROR(CLArithmeticAddition::validate(&output_gate_tmp, output_gate_bias, &output_gate_tmp, ConvertPolicy::SATURATE));
585  }
587 
588  // Validate output state
589  ARM_COMPUTE_RETURN_ON_ERROR(CLActivationLayerKernel::validate(&cell_state_tmp, &cell_state_tmp, activation_info));
591  if(lstm_params.has_projection())
592  {
593  ARM_COMPUTE_RETURN_ON_ERROR(CLFullyConnectedLayer::validate(&output_gate_tmp, lstm_params.projection_weights(), lstm_params.projection_bias(), output_state_out));
594  if(projection_threshold != 0.f)
595  {
596  ARM_COMPUTE_RETURN_ON_ERROR(CLActivationLayerKernel::validate(output_state_out, output_state_out,
597  ActivationLayerInfo(ActivationLayerInfo::ActivationFunction::LU_BOUNDED_RELU, -projection_threshold, projection_threshold)));
598  }
599  }
600 
601  // Validate copy kernel
602  ARM_COMPUTE_RETURN_ON_ERROR(CLCopyKernel::validate(&cell_state_tmp, cell_state_out));
603  ARM_COMPUTE_RETURN_ON_ERROR(CLCopyKernel::validate(output_state_out, output));
604 
605  // Validate scratch concatenation
606  std::vector<ITensorInfo *> inputs_vector_info_raw;
607  if(!lstm_params.has_cifg_opt())
608  {
609  inputs_vector_info_raw.push_back(&input_gate);
610  }
611  inputs_vector_info_raw.push_back(&cell_state_tmp);
612  inputs_vector_info_raw.push_back(&forget_gate);
613  inputs_vector_info_raw.push_back(&output_gate_tmp);
614 
615  ARM_COMPUTE_RETURN_ON_ERROR(CLConcatenateLayer::validate(inputs_vector_info_raw, scratch_buffer, Window::DimX));
616  return Status{};
617 }

References ARM_COMPUTE_RETURN_ERROR_ON, ARM_COMPUTE_RETURN_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN, ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES, ARM_COMPUTE_RETURN_ERROR_ON_NULLPTR, ARM_COMPUTE_RETURN_ON_ERROR, arm_compute::misc::shape_calculator::calculate_concatenate_shape(), LSTMParams< T >::cell_layer_norm_weights(), LSTMParams< T >::cell_to_forget_weights(), LSTMParams< T >::cell_to_input_weights(), LSTMParams< T >::cell_to_output_weights(), arm_compute::misc::shape_calculator::compute_transposed_shape(), ITensorInfo::data_type(), ITensorInfo::dimension(), Window::DimX, arm_compute::F16, arm_compute::F32, LSTMParams< T >::forget_layer_norm_weights(), LSTMParams< T >::has_cifg_opt(), LSTMParams< T >::has_peephole_opt(), LSTMParams< T >::has_projection(), LSTMParams< T >::input_gate_bias(), LSTMParams< T >::input_layer_norm_weights(), LSTMParams< T >::input_to_input_weights(), ActivationLayerInfo::LOGISTIC, ActivationLayerInfo::LU_BOUNDED_RELU, ITensorInfo::num_dimensions(), LSTMParams< T >::output_layer_norm_weights(), LSTMParams< T >::projection_bias(), LSTMParams< T >::projection_weights(), LSTMParams< T >::recurrent_to_input_weights(), arm_compute::SATURATE, arm_compute::SUB, arm_compute::TO_NEAREST_EVEN, LSTMParams< T >::use_layer_norm(), CLMeanStdDevNormalizationLayer::validate(), CLArithmeticAddition::validate(), CLCopyKernel::validate(), CLWidthConcatenate2TensorsKernel::validate(), CLActivationLayerKernel::validate(), CLPixelWiseMultiplicationKernel::validate(), CLConcatenateLayer::validate(), CLGEMM::validate(), CLFullyConnectedLayer::validate(), and CLSaturatedArithmeticOperationKernel::validate().

Referenced by CLLSTMLayer::configure().


The documentation for this class was generated from the following files: