Compute Library
 19.08
CLFullyConnectedLayer Class Reference

Basic function to compute a Fully Connected layer on OpenCL. More...

#include <CLFullyConnectedLayer.h>

Collaboration diagram for CLFullyConnectedLayer:

Public Member Functions

 CLFullyConnectedLayer (std::shared_ptr< IMemoryManager > memory_manager=nullptr)
 Constructor. More...
 
 CLFullyConnectedLayer (const CLFullyConnectedLayer &)=delete
 Prevent instances of this class from being copied (As this class contains pointers) More...
 
 CLFullyConnectedLayer (CLFullyConnectedLayer &&)=default
 Default move constructor. More...
 
CLFullyConnectedLayer & operator= (const CLFullyConnectedLayer &)=delete
 Prevent instances of this class from being copied (As this class contains pointers) More...
 
CLFullyConnectedLayer & operator= (CLFullyConnectedLayer &&)=default
 Default move assignment operator. More...
 
void configure (const ICLTensor *input, const ICLTensor *weights, const ICLTensor *biases, ICLTensor *output, FullyConnectedLayerInfo fc_info=FullyConnectedLayerInfo())
 Set the input and output tensors. More...
 
void run () override
 Run the kernels contained in the function. More...
 
void prepare () override
 Prepare the function for executing. More...
 
- Public Member Functions inherited from IFunction
virtual ~IFunction ()=default
 Destructor. More...
 

Static Public Member Functions

static Status validate (const ITensorInfo *input, const ITensorInfo *weights, const ITensorInfo *biases, const ITensorInfo *output, FullyConnectedLayerInfo fc_info=FullyConnectedLayerInfo())
 Static function to check if given info will lead to a valid configuration of CLFullyConnectedLayer. More...
 

Detailed Description

Basic function to compute a Fully Connected layer on OpenCL.

This function calls the following OpenCL kernels:

  1. CLIm2ColKernel (called when the input comes from a convolutional layer)
  2. CLFullyConnectedLayerReshapeWeights (if are_weights_reshaped is set to false and transpose_weights is set to true) (called once)
  3. CLGEMMMatrixMultiplyKernel or CLGEMMLowpMatrixMultiplyCore (if quantized asymmetric)
  4. CLGEMMMatrixAccumulateBiasesKernel or CLGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPoint (if quantized asymmetric) (if biases is not equal to nullptr)
Note
The fully connected layer accepts "weights" tensors only with 2 dimensions.

Definition at line 75 of file CLFullyConnectedLayer.h.
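As a plain-CPU reference for what the kernel pipeline above computes, here is a minimal sketch (illustrative names, no OpenCL, no im2col tiling, no quantization): the input coming from a convolution layer is flattened into a single row, multiplied by the 2D weights, and the bias, when present, is accumulated.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reference implementation, not part of the library.
// flat_input holds w*h*c values; weights is row-major with
// rows = w*h*c and cols = num_outputs; biases may be empty (nullptr case).
std::vector<float> fully_connected(const std::vector<float> &flat_input,
                                   const std::vector<float> &weights,
                                   const std::vector<float> &biases,
                                   std::size_t               num_outputs)
{
    const std::size_t  in_size = flat_input.size();
    std::vector<float> output(num_outputs, 0.f);
    for(std::size_t o = 0; o < num_outputs; ++o)
    {
        // Bias accumulation mirrors CLGEMMMatrixAccumulateBiasesKernel
        float acc = biases.empty() ? 0.f : biases[o];
        // Matrix multiply mirrors the GEMM kernels
        for(std::size_t i = 0; i < in_size; ++i)
        {
            acc += flat_input[i] * weights[i * num_outputs + o];
        }
        output[o] = acc;
    }
    return output;
}
```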

Constructor & Destructor Documentation

◆ CLFullyConnectedLayer() [1/3]

CLFullyConnectedLayer ( std::shared_ptr< IMemoryManager > memory_manager = nullptr )

Constructor.

Definition at line 79 of file CLFullyConnectedLayer.cpp.

    : _memory_group(memory_manager), _convert_weights(), _flatten_layer(), _reshape_weights_kernel(), _mm_gemm(memory_manager), _mm_gemmlowp(memory_manager), _gemmlowp_output_stage(),
      _accumulate_biases_kernel(), _flatten_output(), _gemmlowp_output(), _converted_weights_output(), _reshape_weights_output(), _are_weights_converted(true), _are_weights_reshaped(true),
      _is_fc_after_conv(true), _accumulate_biases(false), _is_quantized(false), _is_prepared(false), _original_weights(nullptr)
{
}

◆ CLFullyConnectedLayer() [2/3]

Prevent instances of this class from being copied (As this class contains pointers)

◆ CLFullyConnectedLayer() [3/3]

Default move constructor.

Member Function Documentation

◆ configure()

void configure ( const ICLTensor * input,
const ICLTensor * weights,
const ICLTensor * biases,
ICLTensor * output,
FullyConnectedLayerInfo  fc_info = FullyConnectedLayerInfo() 
)

Set the input and output tensors.

Parameters
[in]  input    Source tensor. Data type supported: QASYMM8/F16/F32.
[in]  weights  Weights tensor. The weights must be 2 dimensional. If this function is called after a Convolution Layer, the (transposed) weights will have as many rows as the product of the input's first 3 dimensions. If it is called after another FullyConnected Layer, the (transposed) weights will have as many rows as the input's first dimension. Data type supported: Same as input.
[in]  biases   Bias tensor. Can be nullptr. Data type supported: Same as input.
[out] output   Destination tensor. Its shape should be equal to the output of a matrix multiplication between:
  • The output of im2col on the input and the (transposed) 2D weights, if the function is called after a Convolution Layer
  • The input tensor and the (transposed) 2D weights, if the function is called after another FullyConnected Layer. Data type supported: Same as input.
[in]  fc_info  (Optional) Fully connected layer additional info

Definition at line 140 of file CLFullyConnectedLayer.cpp.

{
    ARM_COMPUTE_ERROR_ON_NULLPTR(input, weights, output);

    // Perform validate step
    ARM_COMPUTE_ERROR_THROW_ON(CLFullyConnectedLayer::validate(input->info(),
                                                               weights->info(),
                                                               biases != nullptr ? biases->info() : nullptr,
                                                               output->info(),
                                                               fc_info));

    _are_weights_converted = true;
    _are_weights_reshaped  = fc_info.transpose_weights ? fc_info.are_weights_reshaped : true;
    _is_fc_after_conv      = true;
    _accumulate_biases     = false;
    _is_quantized          = is_data_type_quantized_asymmetric(input->info()->data_type());
    _is_prepared           = fc_info.retain_internal_weights;
    _original_weights      = weights;

    // Configure gemmlowp output
    if(_is_quantized)
    {
        _gemmlowp_output.allocator()->init(output->info()->clone()->set_is_resizable(true).reset_padding().set_data_type(DataType::S32));
    }

    // Configure accumulate biases kernel for non quantized asymmetric types
    if(biases != nullptr && !_is_quantized)
    {
        ARM_COMPUTE_ERROR_ON_MISMATCHING_DATA_TYPES(input, biases);

        _accumulate_biases = true;

        // Configure accumulate biases kernel
        _accumulate_biases_kernel.set_target(CLScheduler::get().target());
        _accumulate_biases_kernel.configure(output, biases);
    }

    const ICLTensor *weights_to_use = weights;

    // With the Fully Connected layer we can have 4 different cases:
    //  1) Convolution layer -> Fully Connected layer without batches
    //  2) Fully Connected layer -> Fully Connected layer without batches
    //  3) Convolution layer -> Fully Connected layer with batches
    //  4) Fully Connected layer -> Fully Connected layer with batches

    // Check if we have a fully connected layer with batches
    const bool is_batched_fc_layer = output->info()->dimension(1) > 1;
    if(is_batched_fc_layer)
    {
        _is_fc_after_conv = (TensorShape::num_max_dimensions >= 4) && (std::equal(input->info()->tensor_shape().cbegin() + 3,
                                                                                 input->info()->tensor_shape().cend(),
                                                                                 output->info()->tensor_shape().cbegin() + 1));
    }
    else
    {
        _is_fc_after_conv = input->info()->num_dimensions() > 1;
    }

    // Reshape weights if needed
    if(!_are_weights_reshaped)
    {
        // Reshape the weights
        _reshape_weights_kernel.configure(weights, &_reshape_weights_output);
        weights_to_use = &_reshape_weights_output;
    }

    // Convert weights if needed
    if(_is_fc_after_conv && (input->info()->data_layout() != fc_info.weights_trained_layout))
    {
        // Convert weights
        _convert_weights.configure(weights_to_use,
                                   &_converted_weights_output,
                                   input->info()->tensor_shape(),
                                   fc_info.weights_trained_layout);

        weights_to_use         = &_converted_weights_output;
        _are_weights_converted = false;
    }

    // Configure fc core
    ICLTensor *tmp_output = (_is_quantized) ? &_gemmlowp_output : output;
    if(_is_fc_after_conv)
    {
        // Fully Connected layer after a Convolution Layer without batches
        configure_conv_fc(input, weights_to_use, tmp_output, fc_info.retain_internal_weights);
    }
    else
    {
        // Fully Connected layer after a Fully Connected Layer without batches
        configure_fc_fc(input, weights_to_use, tmp_output, fc_info.retain_internal_weights);
    }

    // Configure output stage for asymmetric quantized types
    if(_is_quantized)
    {
        const UniformQuantizationInfo iq_info = input->info()->quantization_info().uniform();
        const UniformQuantizationInfo wq_info = weights->info()->quantization_info().uniform();
        const UniformQuantizationInfo oq_info = output->info()->quantization_info().uniform();

        float multiplier = iq_info.scale * wq_info.scale / oq_info.scale;
        int   output_multiplier;
        int   output_shift;
        quantization::calculate_quantized_multiplier_less_than_one(multiplier, &output_multiplier, &output_shift);
        _gemmlowp_output_stage.configure(&_gemmlowp_output, biases, output, output_multiplier, output_shift, oq_info.offset);
        _gemmlowp_output.allocator()->allocate();
    }
}
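The quantized output stage above rescales the S32 accumulator by the float factor iq_info.scale * wq_info.scale / oq_info.scale, expressed as an integer multiplier plus shift. A hedged sketch of how a multiplier in (0, 1) can be split into a Q0.31 fixed-point value and a right shift, in the spirit of calculate_quantized_multiplier_less_than_one() (the library's exact rounding and error handling may differ):

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Illustrative helper, not the library function: represent
// multiplier ~= quant_multiplier * 2^-31 * 2^-right_shift
void quantize_multiplier(float multiplier, std::int32_t *quant_multiplier, int *right_shift)
{
    // frexp: multiplier == q * 2^exponent with q in [0.5, 1)
    const double q = std::frexp(multiplier, right_shift);
    *right_shift *= -1; // store the (negative) exponent as a right shift

    // Round the mantissa to Q0.31 fixed point
    auto q_fixed = static_cast<std::int64_t>(std::llround(q * (1ll << 31)));
    if(q_fixed == (1ll << 31)) // rounding can push q to exactly 1.0
    {
        q_fixed /= 2;
        --*right_shift;
    }
    *quant_multiplier = static_cast<std::int32_t>(q_fixed);
}
```

For example, a multiplier of 0.25 becomes the Q0.31 value for 0.5 together with a right shift of 1.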

References CLTensorAllocator::allocate(), CLTensor::allocator(), FullyConnectedLayerInfo::are_weights_reshaped, ARM_COMPUTE_ERROR_ON_MISMATCHING_DATA_TYPES, ARM_COMPUTE_ERROR_ON_NULLPTR, ARM_COMPUTE_ERROR_THROW_ON, arm_compute::quantization::calculate_quantized_multiplier_less_than_one(), Dimensions< T >::cbegin(), Dimensions< T >::cend(), ICloneable< T >::clone(), CLConvertFullyConnectedWeights::configure(), CLGEMMMatrixAccumulateBiasesKernel::configure(), CLFullyConnectedLayerReshapeWeights::configure(), CLGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPoint::configure(), ITensorInfo::data_layout(), ITensorInfo::data_type(), ITensorInfo::dimension(), CLScheduler::get(), ITensor::info(), CLTensor::info(), ITensorAllocator::init(), arm_compute::is_data_type_quantized_asymmetric(), ITensorInfo::num_dimensions(), Dimensions< size_t >::num_max_dimensions, UniformQuantizationInfo::offset, ITensorInfo::quantization_info(), TensorInfo::quantization_info(), FullyConnectedLayerInfo::retain_internal_weights, arm_compute::S32, UniformQuantizationInfo::scale, ICLKernel::set_target(), ITensorInfo::tensor_shape(), FullyConnectedLayerInfo::transpose_weights, QuantizationInfo::uniform(), CLFullyConnectedLayer::validate(), arm_compute::test::validation::weights, and FullyConnectedLayerInfo::weights_trained_layout.

Referenced by CLRNNLayer::configure(), CLLSTMLayer::configure(), and arm_compute::test::validation::DATA_TEST_CASE().

◆ operator=() [1/2]

CLFullyConnectedLayer& operator= ( const CLFullyConnectedLayer & )
delete

Prevent instances of this class from being copied (As this class contains pointers)

◆ operator=() [2/2]

CLFullyConnectedLayer& operator= ( CLFullyConnectedLayer &&  )
default

Default move assignment operator.

◆ prepare()

void prepare ( )
overridevirtual

Prepare the function for executing.

Any one-off pre-processing step required by the function is handled here.

Note
Prepare stage might not need all the function's buffers' backing memory to be available in order to execute

Reimplemented from IFunction.

Definition at line 383 of file CLFullyConnectedLayer.cpp.

{
    if(!_is_prepared)
    {
        ARM_COMPUTE_ERROR_ON(!_original_weights->is_used());

        auto release_unused = [](CLTensor * w)
        {
            if(!w->is_used())
            {
                CLScheduler::get().queue().finish();
                w->allocator()->free();
            }
        };

        // Pointer to current weights
        const ICLTensor *cur_weights = _original_weights;

        // Reshape of the weights if needed (happens only once)
        if(!_are_weights_reshaped)
        {
            // Run reshape weights kernel and mark weights as unused
            _reshape_weights_output.allocator()->allocate();
            _reshape_weights_kernel.run();

            cur_weights->mark_as_unused();
            cur_weights           = &_reshape_weights_output;
            _are_weights_reshaped = true;
        }

        // Convert weights if needed (happens only once)
        if(!_are_weights_converted)
        {
            _converted_weights_output.allocator()->allocate();
            _convert_weights.run();

            cur_weights->mark_as_unused();
            _are_weights_converted = true;
        }

        // Release reshaped weights if unused
        release_unused(&_reshape_weights_output);

        // Prepare GEMM and release unused weights
        if(!_is_quantized)
        {
            _mm_gemm.prepare();
        }

        // Release converted weights if unused
        release_unused(&_reshape_weights_output);
        release_unused(&_converted_weights_output);

        _is_prepared = true;
    }
}

References CLTensorAllocator::allocate(), CLTensor::allocator(), ARM_COMPUTE_ERROR_ON, CLScheduler::get(), ITensor::is_used(), ITensor::mark_as_unused(), CLGEMM::prepare(), CLScheduler::queue(), ICLSimpleFunction::run(), and arm_compute::test::validation::w.

Referenced by CLRNNLayer::prepare(), and CLFullyConnectedLayer::run().

◆ run()

void run ( )
overridevirtual

Run the kernels contained in the function.

For NEON kernels:

  • Multi-threading is used for the kernels which are parallelisable.
  • By default std::thread::hardware_concurrency() threads are used.
Note
CPPScheduler::set_num_threads() can be used to manually set the number of threads

For OpenCL kernels:

  • All the kernels are enqueued on the queue associated with CLScheduler.
  • The queue is then flushed.
Note
The function does not block until the kernels have executed; it is the user's responsibility to wait for completion.
prepare() is called on the first run if it has not already been done

Implements IFunction.

Definition at line 347 of file CLFullyConnectedLayer.cpp.

{
    prepare();

    MemoryGroupResourceScope scope_mg(_memory_group);

    // Linearize input if it comes from a convolutional layer
    if(_is_fc_after_conv)
    {
        _flatten_layer.run();
    }

    // Run matrix multiply
    if(_is_quantized)
    {
        _mm_gemmlowp.run();
    }
    else
    {
        _mm_gemm.run();
    }

    // Accumulate biases if provided
    if(_is_quantized)
    {
        _gemmlowp_output_stage.run();
    }
    else
    {
        if(_accumulate_biases)
        {
            CLScheduler::get().enqueue(_accumulate_biases_kernel);
        }
    }
}

References CLScheduler::enqueue(), CLScheduler::get(), CLFullyConnectedLayer::prepare(), ICLSimpleFunction::run(), CLGEMMLowpMatrixMultiplyCore::run(), and CLGEMM::run().

Referenced by CLRNNLayer::run(), and CLLSTMLayer::run().

◆ validate()

Status validate ( const ITensorInfo * input,
const ITensorInfo * weights,
const ITensorInfo * biases,
const ITensorInfo * output,
FullyConnectedLayerInfo  fc_info = FullyConnectedLayerInfo() 
)
static

Static function to check if given info will lead to a valid configuration of CLFullyConnectedLayer.

Parameters
[in]  input    Source tensor info. Data type supported: QASYMM8/F16/F32.
[in]  weights  Weights tensor info. The weights must be 2 dimensional. If this function is called after a Convolution Layer, the (transposed) weights will have as many rows as the product of the input's first 3 dimensions. If it is called after another FullyConnected Layer, the (transposed) weights will have as many rows as the input's first dimension. Data type supported: Same as input.
[in]  biases   Bias tensor info. Can be nullptr. Data type supported: Same as input.
[out] output   Destination tensor info. Its shape should be equal to the output of a matrix multiplication between:
  • The output of im2col on the input and the (transposed) 2D weights, if the function is called after a Convolution Layer
  • The input tensor and the (transposed) 2D weights, if the function is called after another FullyConnected Layer. Data type supported: Same as input.
[in]  fc_info  (Optional) Fully connected layer additional info
Returns
a status

Definition at line 249 of file CLFullyConnectedLayer.cpp.

{
    ARM_COMPUTE_RETURN_ERROR_ON_NULLPTR(input, weights, output);
    ARM_COMPUTE_RETURN_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN(input, 1, DataType::QASYMM8, DataType::F16, DataType::F32);
    ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES(input, weights, output);
    ARM_COMPUTE_RETURN_ERROR_ON(weights->num_dimensions() > 2);

    bool            weights_reshaped = fc_info.transpose_weights ? fc_info.are_weights_reshaped : true;
    bool            is_fc_after_conv = true;
    bool            is_quantized     = is_data_type_quantized_asymmetric(input->data_type());
    const GPUTarget gpu_target       = CLScheduler::get().target();

    const ITensorInfo &flatten_input     = TensorInfo(input->clone()->set_is_resizable(true).reset_padding().set_tensor_shape(compute_flatten_shape(input)).set_data_layout(DataLayout::NCHW));
    const ITensorInfo &reshaped_weights  = TensorInfo(weights->clone()->set_is_resizable(true).reset_padding().set_tensor_shape(compute_transposed_shape(*weights)));
    const ITensorInfo &converted_weights = weights_reshaped ? TensorInfo(weights->clone()->set_is_resizable(true).reset_padding()) : TensorInfo(*reshaped_weights.clone());
    const ITensorInfo &gemmlowp_output   = TensorInfo(output->clone()->set_is_resizable(true).reset_padding().set_data_type(DataType::S32));

    // Configure accumulate biases kernel for non quantized asymmetric types
    if(biases != nullptr && !is_quantized)
    {
        ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES(input, biases);
        ARM_COMPUTE_RETURN_ON_ERROR(CLGEMMMatrixAccumulateBiasesKernel::validate(output, biases, gpu_target));
    }

    // With the Fully Connected layer we can have 4 different cases:
    //  1) Convolution layer -> Fully Connected layer without batches
    //  2) Fully Connected layer -> Fully Connected layer without batches
    //  3) Convolution layer -> Fully Connected layer with batches
    //  4) Fully Connected layer -> Fully Connected layer with batches

    const ITensorInfo *input_to_use   = input;
    const ITensorInfo *weights_to_use = weights;
    const ITensorInfo *tmp_output     = (is_quantized) ? &gemmlowp_output : output;

    // Check if we have a fully connected layer with batches
    const bool is_batched_fc_layer = output->dimension(1) > 1;
    if(is_batched_fc_layer)
    {
        is_fc_after_conv = (TensorShape::num_max_dimensions >= 4) && (std::equal(input->tensor_shape().cbegin() + 3,
                                                                                 input->tensor_shape().cend(),
                                                                                 output->tensor_shape().cbegin() + 1));
    }
    else
    {
        is_fc_after_conv = input->num_dimensions() > 1;
    }

    if(!weights_reshaped)
    {
        // Validate reshape weights kernel
        ARM_COMPUTE_RETURN_ON_ERROR(CLFullyConnectedLayerReshapeWeights::validate(weights, &reshaped_weights));
        weights_to_use = &reshaped_weights;
    }

    if(is_fc_after_conv && (input->data_layout() != fc_info.weights_trained_layout))
    {
        // Validate convert weights kernel
        ARM_COMPUTE_RETURN_ON_ERROR(CLConvertFullyConnectedWeights::validate(weights_to_use,
                                                                             &converted_weights,
                                                                             input->tensor_shape(),
                                                                             fc_info.weights_trained_layout));
        weights_to_use = &converted_weights;
    }

    if(is_fc_after_conv)
    {
        // Fully Connected layer after a Convolution Layer without batches
        ARM_COMPUTE_RETURN_ERROR_ON((weights_to_use->dimension(1) != (input->dimension(0) * input->dimension(1) * input->dimension(2))));

        // Validate flatten kernel
        ARM_COMPUTE_RETURN_ON_ERROR(CLFlattenLayer::validate(input, &flatten_input));
        input_to_use = &flatten_input;
    }
    else
    {
        // Fully Connected layer after a Fully Connected Layer without batches
        ARM_COMPUTE_RETURN_ERROR_ON(input->dimension(0) != weights_to_use->dimension(1));
    }
    // Validate matrix multiply kernel
    ARM_COMPUTE_RETURN_ON_ERROR(validate_mm(*input_to_use, *weights_to_use, *tmp_output));

    // Validate output stage for asymmetric quantized types
    if(is_quantized)
    {
        const UniformQuantizationInfo iq_info    = input->quantization_info().uniform();
        const UniformQuantizationInfo wq_info    = weights->quantization_info().uniform();
        const UniformQuantizationInfo oq_info    = output->quantization_info().uniform();
        const float                   multiplier = iq_info.scale * wq_info.scale / oq_info.scale;

        ARM_COMPUTE_UNUSED(multiplier);
        ARM_COMPUTE_RETURN_ERROR_ON(multiplier > 1.0f);
        ARM_COMPUTE_RETURN_ON_ERROR(CLGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPoint::validate(&gemmlowp_output, biases, output));
    }

    return Status{};
}

References FullyConnectedLayerInfo::are_weights_reshaped, ARM_COMPUTE_RETURN_ERROR_ON, ARM_COMPUTE_RETURN_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN, ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES, ARM_COMPUTE_RETURN_ERROR_ON_NULLPTR, ARM_COMPUTE_RETURN_ON_ERROR, ARM_COMPUTE_UNUSED, Dimensions< T >::cbegin(), Dimensions< T >::cend(), ICloneable< T >::clone(), arm_compute::misc::shape_calculator::compute_flatten_shape(), arm_compute::misc::shape_calculator::compute_transposed_shape(), ITensorInfo::data_layout(), ITensorInfo::data_type(), ITensorInfo::dimension(), arm_compute::F16, arm_compute::F32, CLScheduler::get(), arm_compute::is_data_type_quantized_asymmetric(), arm_compute::NCHW, ITensorInfo::num_dimensions(), Dimensions< size_t >::num_max_dimensions, arm_compute::QASYMM8, ITensorInfo::quantization_info(), arm_compute::S32, UniformQuantizationInfo::scale, CLScheduler::target(), ITensorInfo::tensor_shape(), FullyConnectedLayerInfo::transpose_weights, QuantizationInfo::uniform(), CLConvertFullyConnectedWeights::validate(), CLFlattenLayer::validate(), CLGEMMMatrixAccumulateBiasesKernel::validate(), CLFullyConnectedLayerReshapeWeights::validate(), CLGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPoint::validate(), arm_compute::test::validation::weights, and FullyConnectedLayerInfo::weights_trained_layout.

Referenced by CLFullyConnectedLayer::configure(), CLRNNLayer::validate(), and CLLSTMLayer::validate().
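The weights-shape rule that validate() enforces (the weights' second dimension must equal the flattened w * h * c of the input when following a Convolution Layer, or the input's first dimension when following another FullyConnected Layer) can be sketched as a standalone check. This is a hypothetical helper for illustration, not part of the library:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative shape check mirroring the ARM_COMPUTE_RETURN_ERROR_ON
// conditions in validate(). input_shape is e.g. {w, h, c} after a
// convolution, or {n} after another fully connected layer.
bool fc_weights_shape_ok(const std::vector<std::size_t> &input_shape,
                         std::size_t                     weight_rows,
                         bool                            after_convolution)
{
    if(after_convolution)
    {
        // Weights rows must match the flattened first 3 input dimensions
        std::size_t flat = 1;
        for(std::size_t i = 0; i < 3 && i < input_shape.size(); ++i)
        {
            flat *= input_shape[i];
        }
        return weight_rows == flat;
    }
    // Weights rows must match the input's first dimension
    return weight_rows == input_shape[0];
}
```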


The documentation for this class was generated from the following files:

  • CLFullyConnectedLayer.h
  • CLFullyConnectedLayer.cpp