Compute Library
 22.11
ClFullyConnected Class Reference

Basic function to compute a Fully Connected layer on OpenCL. More...

#include <ClFullyConnected.h>

Collaboration diagram for ClFullyConnected:

Public Member Functions

 ClFullyConnected ()
 
 ~ClFullyConnected ()
 
void configure (const CLCompileContext &compile_context, ITensorInfo *src, ITensorInfo *weights, ITensorInfo *biases, ITensorInfo *dst, FullyConnectedLayerInfo fc_info=FullyConnectedLayerInfo())
 Set the input and output tensors. More...
 
void run (ITensorPack &tensors) override
 Run the kernels contained in the function. More...
 
void prepare (ITensorPack &tensors) override
 Prepare the function for executing. More...
 
experimental::MemoryRequirements workspace () const override
 Return the memory requirements required by the workspace. More...
 
- Public Member Functions inherited from ICLOperator
 ICLOperator (IRuntimeContext *ctx=nullptr)
 Constructor. More...
 
 ICLOperator (const ICLOperator &)=delete
 Prevent instances of this class from being copied (As this class contains pointers) More...
 
 ICLOperator (ICLOperator &&)=default
 Default move constructor. More...
 
ICLOperator & operator= (const ICLOperator &)=delete
 Prevent instances of this class from being copied (As this class contains pointers) More...
 
ICLOperator & operator= (ICLOperator &&)=default
 Default move assignment operator. More...
 
- Public Member Functions inherited from IOperator
virtual ~IOperator ()=default
 Destructor. More...
 

Static Public Member Functions

static Status validate (const ITensorInfo *src, const ITensorInfo *weights, const ITensorInfo *biases, const ITensorInfo *dst, FullyConnectedLayerInfo fc_info=FullyConnectedLayerInfo())
 Static function to check if given info will lead to a valid configuration. More...
 

Detailed Description

Basic function to compute a Fully Connected layer on OpenCL.

This function calls the following OpenCL kernels:

  1. opencl::kernels::ClIm2ColKernel (called when the input comes from a convolutional layer)
  2. CLTranspose (if are_weights_reshaped is set to false and transpose_weights is set to true) (called once)
  3. opencl::ClGemm or CLGEMMLowpMatrixMultiplyCore (if quantized asymmetric)
Note
The fully connected layer accepts "weights" tensors only with 2 dimensions.

Definition at line 53 of file ClFullyConnected.h.

Constructor & Destructor Documentation

◆ ClFullyConnected()

Definition at line 145 of file ClFullyConnected.cpp.

ClFullyConnected::ClFullyConnected()
    : _convert_weights(nullptr),
      _flatten(nullptr),
      _reshape_weights(nullptr),
      _mm_gemm(nullptr),
      _mm_gemmlowp(nullptr),
      _aux_mem(Count)
{
}

◆ ~ClFullyConnected()

~ClFullyConnected ( )
default

Member Function Documentation

◆ configure()

void configure ( const CLCompileContext & compile_context,
  ITensorInfo * src,
  ITensorInfo * weights,
  ITensorInfo * biases,
  ITensorInfo * dst,
  FullyConnectedLayerInfo  fc_info = FullyConnectedLayerInfo()
 )

Set the input and output tensors.

Valid data layouts:

  • NHWC
  • NCHW

Valid data type configurations:

src0 src1 src2 dst
F16 F16 F16 F16
F32 F32 F32 F32
QASYMM8 QASYMM8 S32 QASYMM8
QASYMM8_SIGNED QASYMM8_SIGNED S32 QASYMM8_SIGNED
Parameters
  [in]  compile_context  The compile context to be used.
  [in]  src              Source tensor. Data type supported: QASYMM8/QASYMM8_SIGNED/F16/F32.
  [in]  weights          Weights tensor. The weights must be 2 dimensional. If this function is called after a Convolution Layer, the (transposed) weights will have as many rows as the product of the first 3 input's dimensions. If it is called after another FullyConnected Layer, the (transposed) weights will have as many rows as the input's first dimension. Data type supported: Same as src.
  [in]  biases           Bias tensor. Can be nullptr. Data type supported: Same as src.
  [out] dst              Destination tensor. Its shape should be equal to the output of a matrix multiplication between:
                           • The output of im2col on the input and the (transposed) 2D weights, if the function is called after a Convolution Layer
                           • The input tensor and the (transposed) 2D weights, if the function is called after another FullyConnected Layer. Data type supported: Same as src.
  [in]  fc_info          (Optional) Fully connected layer additional info

Definition at line 227 of file ClFullyConnected.cpp.

References arm_compute::ACL_SRC_1, FullyConnectedLayerInfo::are_weights_reshaped, ARM_COMPUTE_ERROR_ON_NULLPTR, ARM_COMPUTE_ERROR_THROW_ON, ARM_COMPUTE_LOG_PARAMS, Dimensions< T >::cbegin(), Dimensions< T >::cend(), ITensorInfo::data_layout(), ITensorInfo::data_type(), ITensorInfo::dimension(), arm_compute::is_data_type_quantized_asymmetric(), ITensorInfo::num_dimensions(), Dimensions< size_t >::num_max_dimensions, arm_compute::offset_int_vec(), arm_compute::experimental::Prepare, FullyConnectedLayerInfo::retain_internal_weights, ITensorInfo::tensor_shape(), TensorInfo::total_size(), FullyConnectedLayerInfo::transpose_weights, ClFullyConnected::validate(), and FullyConnectedLayerInfo::weights_trained_layout.

{
    ARM_COMPUTE_ERROR_ON_NULLPTR(src, weights, dst);

    // Perform validate step
    ARM_COMPUTE_ERROR_THROW_ON(ClFullyConnected::validate(src, weights, biases, dst, fc_info));
    ARM_COMPUTE_LOG_PARAMS(src, weights, biases, dst, fc_info);

    _are_weights_converted = true;
    _are_weights_reshaped  = fc_info.transpose_weights ? fc_info.are_weights_reshaped : true;
    _is_fc_after_conv      = true;
    _is_quantized          = is_data_type_quantized_asymmetric(src->data_type());
    _is_prepared           = fc_info.retain_internal_weights;
    _weights_to_use        = TensorInfo(*weights);
    _weights_to_use_idx    = ACL_SRC_1;

    // With the Fully Connected layer we can have 4 different cases:
    // 1) Convolution layer -> Fully Connected layer without batches
    // 2) Fully Connected layer -> Fully Connected layer without batches
    // 3) Convolution layer -> Fully Connected layer with batches
    // 4) Fully Connected layer -> Fully Connected layer with batches

    // Check if we have a fully connected layer with batches
    const bool is_batched_fc_layer = dst->dimension(1) > 1;
    if(is_batched_fc_layer)
    {
        _is_fc_after_conv = (TensorShape::num_max_dimensions >= 4) && (std::equal(src->tensor_shape().cbegin() + 3,
                                                                                  src->tensor_shape().cend(),
                                                                                  dst->tensor_shape().cbegin() + 1));
    }
    else
    {
        _is_fc_after_conv = src->num_dimensions() > 1;
    }

    ITensorInfo *weights_used = weights;

    // Reshape weights if needed
    if(!_are_weights_reshaped)
    {
        // Reshape the weights
        _reshape_weights = std::make_unique<ClTranspose>();
        _reshape_weights->configure(compile_context, weights, &_reshaped_weights);
        weights_used        = &_reshaped_weights;
        _weights_to_use_idx = offset_int_vec(TransposedWeights);
    }

    // Convert weights if needed
    if(_is_fc_after_conv && (src->data_layout() != fc_info.weights_trained_layout))
    {
        // Convert weights
        _convert_weights = std::make_unique<ClConvertFullyConnectedWeights>();
        _convert_weights->configure(compile_context,
                                    weights_used,
                                    &_converted_weights,
                                    src->tensor_shape(),
                                    fc_info.weights_trained_layout);

        weights_used           = &_converted_weights;
        _weights_to_use_idx    = offset_int_vec(ConvertedWeights);
        _are_weights_converted = false;
    }

    if(_is_fc_after_conv)
    {
        // Fully Connected layer after a Convolution Layer without batches
        configure_conv_fc(compile_context, src, weights_used, biases, dst, fc_info);
    }
    else
    {
        // Fully Connected layer after a Fully Connected Layer without batches
        configure_fc_fc(compile_context, src, weights_used, biases, dst, fc_info);
    }
    // Update TensorInfo of final weights used (needs to be done at the end due to padding expansion)
    _weights_to_use = *weights_used;

    // Set auxiliary memory requirements
    auto gemm_mem_req = (_is_quantized) ? _mm_gemmlowp->workspace() : _mm_gemm->workspace();
    for(unsigned int i = 0; i < gemm_mem_req.size(); ++i)
    {
        _aux_mem[i] = gemm_mem_req[i];
    }
    if(_aux_mem[1].size > 0 || _aux_mem[2].size > 0) // Persistent weights memory on GEMMs
    {
        // Release permuted weights at the end of prepare as they are further transposed by the assembly dispatch
        _aux_mem[TransposedWeights] = MemoryInfo(offset_int_vec(TransposedWeights), MemoryLifetime::Prepare, _reshaped_weights.total_size());
        _aux_mem[ConvertedWeights]  = MemoryInfo(offset_int_vec(ConvertedWeights), MemoryLifetime::Prepare, _converted_weights.total_size());
    }
    else
    {
        // Keep the weights actually used persistent; release the others at the end of prepare
        const auto transposed_wei_lft = (_weights_to_use_idx == offset_int_vec(TransposedWeights)) ? MemoryLifetime::Persistent : MemoryLifetime::Prepare;
        const auto converted_wei_lft  = (_weights_to_use_idx == offset_int_vec(ConvertedWeights)) ? MemoryLifetime::Persistent : MemoryLifetime::Prepare;

        _aux_mem[TransposedWeights] = MemoryInfo(offset_int_vec(TransposedWeights), transposed_wei_lft, _reshaped_weights.total_size());
        _aux_mem[ConvertedWeights]  = MemoryInfo(offset_int_vec(ConvertedWeights), converted_wei_lft, _converted_weights.total_size());
    }
    _aux_mem[FlattenedSrc] = MemoryInfo(offset_int_vec(FlattenedSrc), MemoryLifetime::Temporary, _flattened_src.total_size());
}

◆ prepare()

void prepare ( ITensorPack & constants)
override virtual

Prepare the function for executing.

Any one-off pre-processing step required by the function is handled here.

Parameters
  [in]  constants  Vector that contains the constant tensors.
Note
Prepare stage might not need all the function's buffers' backing memory to be available in order to execute

Reimplemented from ICLOperator.

Definition at line 453 of file ClFullyConnected.cpp.

References arm_compute::ACL_DST, arm_compute::ACL_SRC, arm_compute::ACL_SRC_1, ITensorPack::add_const_tensor(), CLAuxTensorHandler::get(), ITensorPack::get_const_tensor(), ITensor::mark_as_unused(), and arm_compute::offset_int_vec().

Referenced by ClFullyConnected::run().

{
    if(!_is_prepared)
    {
        auto weights = tensors.get_const_tensor(ACL_SRC_1);

        CLAuxTensorHandler reshaped_weights(offset_int_vec(TransposedWeights), _reshaped_weights, tensors, false);
        CLAuxTensorHandler converted_weights(offset_int_vec(ConvertedWeights), _converted_weights, tensors, false);

        // Pointer to current weights
        const ITensor *cur_weights = weights;

        // Reshape the weights if needed (happens only once)
        if(!_are_weights_reshaped)
        {
            // Run reshape weights kernel and mark weights as unused
            ITensorPack transpose_pack{ { ACL_SRC, weights }, { ACL_DST, reshaped_weights.get() } };
            _reshape_weights->run(transpose_pack);

            cur_weights->mark_as_unused();
            cur_weights = reshaped_weights.get();

            _are_weights_reshaped = true;
        }

        // Convert weights if needed (happens only once)
        if(!_are_weights_converted)
        {
            ITensorPack convert_pack{ { ACL_SRC, cur_weights }, { ACL_DST, converted_weights.get() } };
            _convert_weights->run(convert_pack);

            cur_weights->mark_as_unused();
            cur_weights = converted_weights.get();

            _are_weights_converted = true;
        }

        tensors.add_const_tensor(ACL_SRC_1, cur_weights);

        // Prepare GEMM and release unused weights
        if(!_is_quantized)
        {
            _mm_gemm->prepare(tensors);
        }
        else
        {
            _mm_gemmlowp->prepare(tensors);
        }
        _is_prepared = true;
    }
}

◆ run()

void run ( ITensorPack & tensors)
override virtual

Run the kernels contained in the function.

Parameters
  [in]  tensors  Vector that contains the tensors to operate on.

Reimplemented from ICLOperator.

Definition at line 419 of file ClFullyConnected.cpp.

References arm_compute::ACL_DST, arm_compute::ACL_SRC, arm_compute::ACL_SRC_0, arm_compute::ACL_SRC_1, ITensorPack::add_const_tensor(), CLAuxTensorHandler::get(), ITensorPack::get_const_tensor(), arm_compute::offset_int_vec(), ClFullyConnected::prepare(), and arm_compute::test::validation::src.

{
    prepare(tensors);

    auto src = tensors.get_const_tensor(ACL_SRC_0);

    CLAuxTensorHandler flattened_src(offset_int_vec(FlattenedSrc), _flattened_src, tensors, false);
    CLAuxTensorHandler weights(_weights_to_use_idx, _weights_to_use, tensors, false);

    // Linearize the input if it comes from a convolutional layer
    if(_is_fc_after_conv)
    {
        ITensorPack flatten_pack{ { ACL_SRC, src }, { ACL_DST, flattened_src.get() } };
        _flatten->run(flatten_pack);
    }

    ITensorPack gemm_pack = tensors;
    gemm_pack.add_const_tensor(ACL_SRC_0, (_is_fc_after_conv) ? flattened_src.get() : src);
    if(_weights_to_use_idx != ACL_SRC_1)
    {
        gemm_pack.add_const_tensor(ACL_SRC_1, weights.get());
    }

    // Run matrix multiply
    if(_is_quantized)
    {
        _mm_gemmlowp->run(gemm_pack);
    }
    else
    {
        _mm_gemm->run(gemm_pack);
    }
}

◆ validate()

Status validate ( const ITensorInfo * src,
  const ITensorInfo * weights,
  const ITensorInfo * biases,
  const ITensorInfo * dst,
  FullyConnectedLayerInfo  fc_info = FullyConnectedLayerInfo()
 )
static

Static function to check if given info will lead to a valid configuration.

Similar to ClFullyConnected::configure()

Returns
a status

Definition at line 328 of file ClFullyConnected.cpp.

References ActivationLayerInfo::activation(), FullyConnectedLayerInfo::activation_info, ITensorInfo::are_values_constant(), FullyConnectedLayerInfo::are_weights_reshaped, ARM_COMPUTE_RETURN_ERROR_ON, ARM_COMPUTE_RETURN_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN, ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES, ARM_COMPUTE_RETURN_ERROR_ON_NULLPTR, ARM_COMPUTE_RETURN_ON_ERROR, ActivationLayerInfo::BOUNDED_RELU, Dimensions< T >::cbegin(), Dimensions< T >::cend(), ICloneable< T >::clone(), arm_compute::misc::shape_calculator::compute_flatten_shape(), arm_compute::misc::shape_calculator::compute_transposed_shape(), ITensorInfo::data_layout(), ITensorInfo::data_type(), ITensorInfo::dimension(), ActivationLayerInfo::enabled(), arm_compute::F16, arm_compute::F32, arm_compute::is_data_type_quantized(), ActivationLayerInfo::LU_BOUNDED_RELU, arm_compute::NCHW, ITensorInfo::num_dimensions(), Dimensions< size_t >::num_max_dimensions, arm_compute::QASYMM8, arm_compute::QASYMM8_SIGNED, ActivationLayerInfo::RELU, arm_compute::S32, arm_compute::test::validation::src, ITensorInfo::tensor_shape(), FullyConnectedLayerInfo::transpose_weights, ClTranspose::validate(), ClConvertFullyConnectedWeights::validate(), ClFlatten::validate(), and FullyConnectedLayerInfo::weights_trained_layout.

Referenced by ClFullyConnected::configure(), and CLFullyConnectedLayer::validate().

{
    ARM_COMPUTE_RETURN_ERROR_ON_NULLPTR(src, weights, dst);
    ARM_COMPUTE_RETURN_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN(src, 1, DataType::QASYMM8, DataType::QASYMM8_SIGNED, DataType::F16, DataType::F32);
    ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES(src, weights, dst);
    ARM_COMPUTE_RETURN_ERROR_ON(weights->num_dimensions() > 2);
    ARM_COMPUTE_RETURN_ERROR_ON(fc_info.activation_info.enabled() && is_data_type_quantized(src->data_type()) && fc_info.activation_info.activation() != ActivationLayerInfo::ActivationFunction::RELU
                                && fc_info.activation_info.activation() != ActivationLayerInfo::ActivationFunction::BOUNDED_RELU && fc_info.activation_info.activation() != ActivationLayerInfo::ActivationFunction::LU_BOUNDED_RELU);
    ARM_COMPUTE_RETURN_ERROR_ON(!weights->are_values_constant() && (!fc_info.are_weights_reshaped || fc_info.transpose_weights));

    bool weights_reshaped = fc_info.transpose_weights ? fc_info.are_weights_reshaped : true;
    bool is_fc_after_conv = true;

    const ITensorInfo &flatten_src       = TensorInfo(src->clone()->set_is_resizable(true).reset_padding().set_tensor_shape(compute_flatten_shape(src)).set_data_layout(DataLayout::NCHW));
    const ITensorInfo &reshaped_weights  = TensorInfo(weights->clone()->set_is_resizable(true).reset_padding().set_tensor_shape(compute_transposed_shape(*weights)));
    const ITensorInfo &converted_weights = weights_reshaped ? TensorInfo(weights->clone()->set_is_resizable(true).reset_padding()) : TensorInfo(*reshaped_weights.clone());

    // With the Fully Connected layer we can have 4 different cases:
    // 1) Convolution layer -> Fully Connected layer without batches
    // 2) Fully Connected layer -> Fully Connected layer without batches
    // 3) Convolution layer -> Fully Connected layer with batches
    // 4) Fully Connected layer -> Fully Connected layer with batches

    const ITensorInfo *src_to_use     = src;
    const ITensorInfo *weights_to_use = weights;

    if(biases != nullptr)
    {
        ARM_COMPUTE_RETURN_ERROR_ON(biases->num_dimensions() > 1);
        if(is_data_type_quantized(src->data_type()))
        {
            ARM_COMPUTE_RETURN_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN(biases, 1, DataType::S32);
        }
        else
        {
            ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES(src, biases);
        }
    }

    // Check if we have a fully connected layer with batches
    const bool is_batched_fc_layer = dst->dimension(1) > 1;
    if(is_batched_fc_layer)
    {
        is_fc_after_conv = (TensorShape::num_max_dimensions >= 4) && (std::equal(src->tensor_shape().cbegin() + 3,
                                                                                 src->tensor_shape().cend(),
                                                                                 dst->tensor_shape().cbegin() + 1));
    }
    else
    {
        is_fc_after_conv = src->num_dimensions() > 1;
    }

    if(!weights_reshaped)
    {
        // Validate reshape weights kernel
        ARM_COMPUTE_RETURN_ON_ERROR(ClTranspose::validate(weights, &reshaped_weights));
        weights_to_use = &reshaped_weights;
    }

    if(is_fc_after_conv && (src->data_layout() != fc_info.weights_trained_layout))
    {
        // Validate convert weights kernel
        ARM_COMPUTE_RETURN_ON_ERROR(ClConvertFullyConnectedWeights::validate(weights_to_use,
                                                                             &converted_weights,
                                                                             src->tensor_shape(),
                                                                             fc_info.weights_trained_layout));
        weights_to_use = &converted_weights;
    }

    if(is_fc_after_conv)
    {
        // Fully Connected layer after a Convolution Layer without batches
        ARM_COMPUTE_RETURN_ERROR_ON((weights_to_use->dimension(1) != (src->dimension(0) * src->dimension(1) * src->dimension(2))));

        // Validate flatten kernel
        ARM_COMPUTE_RETURN_ON_ERROR(ClFlatten::validate(src, &flatten_src));
        src_to_use = &flatten_src;
    }
    else
    {
        // Fully Connected layer after a Fully Connected Layer without batches
        ARM_COMPUTE_RETURN_ERROR_ON(src->dimension(0) != weights_to_use->dimension(1));
    }

    // Validate matrix multiply kernel
    ARM_COMPUTE_RETURN_ON_ERROR(validate_mm(*src_to_use, *weights_to_use, biases, *dst, fc_info));

    return Status{};
}

◆ workspace()

experimental::MemoryRequirements workspace ( ) const
override virtual

Return the memory requirements required by the workspace.

Reimplemented from ICLOperator.

Definition at line 505 of file ClFullyConnected.cpp.

{
    return _aux_mem;
}

The documentation for this class was generated from the following files: