Compute Library 21.08
ClFullyConnected Class Reference

Basic function to compute a Fully Connected layer on OpenCL. More...

#include <ClFullyConnected.h>


Public Member Functions

 ClFullyConnected ()
 
 ~ClFullyConnected ()
 
void configure (const CLCompileContext &compile_context, ITensorInfo *src, ITensorInfo *weights, ITensorInfo *biases, ITensorInfo *dst, FullyConnectedLayerInfo fc_info=FullyConnectedLayerInfo())
 Set the input and output tensors. More...
 
void run (ITensorPack &tensors) override
 Run the kernels contained in the function. More...
 
void prepare (ITensorPack &tensors) override
 Prepare the function for executing. More...
 
experimental::MemoryRequirements workspace () const override
 Return the memory requirements required by the workspace. More...
 
- Public Member Functions inherited from ICLOperator
 ICLOperator (IRuntimeContext *ctx=nullptr)
 Constructor. More...
 
 ICLOperator (const ICLOperator &)=delete
 Prevent instances of this class from being copied (As this class contains pointers) More...
 
 ICLOperator (ICLOperator &&)=default
 Default move constructor. More...
 
ICLOperator & operator= (const ICLOperator &)=delete
 Prevent instances of this class from being copied (As this class contains pointers) More...
 
ICLOperator & operator= (ICLOperator &&)=default
 Default move assignment operator. More...
 
- Public Member Functions inherited from IOperator
virtual ~IOperator ()=default
 Destructor. More...
 

Static Public Member Functions

static Status validate (const ITensorInfo *src, const ITensorInfo *weights, const ITensorInfo *biases, const ITensorInfo *dst, FullyConnectedLayerInfo fc_info=FullyConnectedLayerInfo())
 Static function to check if given info will lead to a valid configuration. More...
 

Detailed Description

Basic function to compute a Fully Connected layer on OpenCL.

This function calls the following OpenCL kernels:

  1. opencl::kernels::ClIm2ColKernel (called when the input comes from a convolutional layer)
  2. CLTranspose (if are_weights_reshaped is set to false and transpose_weights is set to true) (called once)
  3. opencl::kernels::ClGemmMatrixMultiplyKernel or CLGEMMLowpMatrixMultiplyCore (if quantized asymmetric)
Note
The fully connected layer accepts "weights" tensors only with 2 dimensions.

Definition at line 53 of file ClFullyConnected.h.
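
The snippet below is a minimal usage sketch at the operator level. It is not taken from the library's documentation: the shapes, the FullyConnectedLayerInfo defaults noted in the comments and the surrounding setup are illustrative assumptions. The caller describes all tensors with ITensorInfo, asks validate() whether the configuration is supported, and then calls configure() once with a CLCompileContext:

  // Sketch only; headers and error handling trimmed, shapes invented for the example.
  using namespace arm_compute;

  CLScheduler::get().default_init(); // create a default OpenCL context and queue

  // Fully connected layer fed by another FC layer: 256 inputs -> 10 outputs, F32.
  TensorInfo src_info(TensorShape(256U), 1, DataType::F32);
  TensorInfo wei_info(TensorShape(256U, 10U), 1, DataType::F32); // untransposed weights (transpose_weights = true)
  TensorInfo bia_info(TensorShape(10U), 1, DataType::F32);
  TensorInfo dst_info(TensorShape(10U), 1, DataType::F32);

  FullyConnectedLayerInfo fc_info{}; // assumed defaults: transpose_weights = true, are_weights_reshaped = false

  opencl::ClFullyConnected fc;
  ARM_COMPUTE_ERROR_THROW_ON(opencl::ClFullyConnected::validate(&src_info, &wei_info, &bia_info, &dst_info, fc_info));
  fc.configure(CLKernelLibrary::get().get_compile_context(), &src_info, &wei_info, &bia_info, &dst_info, fc_info);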

Constructor & Destructor Documentation

◆ ClFullyConnected()

ClFullyConnected ( )

Definition at line 144 of file ClFullyConnected.cpp.

144 ClFullyConnected::ClFullyConnected()
145  : _convert_weights(nullptr),
146  _flatten(nullptr),
147  _reshape_weights(nullptr),
148  _mm_gemm(nullptr),
149  _mm_gemmlowp(nullptr),
150  _aux_mem(Count)
151 {
152 }

◆ ~ClFullyConnected()

~ClFullyConnected ( )
default

Member Function Documentation

◆ configure()

void configure ( const CLCompileContext & compile_context,
ITensorInfo * src,
ITensorInfo * weights,
ITensorInfo * biases,
ITensorInfo * dst,
FullyConnectedLayerInfo  fc_info = FullyConnectedLayerInfo() 
)

Set the input and output tensors.

Valid data layouts:

  • NHWC
  • NCHW

Valid data type configurations:

src0            src1            src2  dst
F16             F16             F16   F16
F32             F32             F32   F32
QASYMM8         QASYMM8         S32   QASYMM8
QASYMM8_SIGNED  QASYMM8_SIGNED  S32   QASYMM8_SIGNED
Parameters
  [in]  compile_context  The compile context to be used.
  [in]  src              Source tensor. Data type supported: QASYMM8/QASYMM8_SIGNED/F16/F32.
  [in]  weights          Weights tensor. The weights must be 2 dimensional. If this function is called after a Convolution Layer, the (transposed) weights will have as many rows as the product of the first 3 input's dimensions. If it is called after another FullyConnected Layer, the (transposed) weights will have as many rows as the input's first dimension. Data type supported: Same as src.
  [in]  biases           Bias tensor. Can be nullptr. Data type supported: Same as src.
  [out] dst              Destination tensor. Its shape should be equal to the output of a matrix multiplication between:
                           • The output of im2col on the input and the (transposed) 2D weights, if the function is called after a Convolution Layer
                           • The input tensor and the (transposed) 2D weights, if the function is called after another FullyConnected Layer. Data type supported: Same as src.
  [in]  fc_info          (Optional) Fully connected layer additional info
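
To make the weights-shape requirement above concrete, here is a small worked example with invented shapes (not from the library's documentation): after a convolution whose output is 7x7x64, the flattened input has 7*7*64 = 3136 features, so the transposed 2D weights must have 3136 rows; with 4096 outputs and the default transpose_weights = true, the weights tensor a caller would pass is therefore 3136 x 4096:

  // Illustrative shapes only.
  using namespace arm_compute;

  TensorInfo conv_out(TensorShape(7U, 7U, 64U), 1, DataType::F32); // W x H x C coming from a conv layer
  const unsigned int num_inputs  = 7U * 7U * 64U;                  // 3136 features once flattened
  const unsigned int num_outputs = 4096U;

  // Untransposed weights laid out as (num_inputs, num_outputs); the function transposes
  // them internally, so the transposed weights end up with 3136 rows, matching the check in validate():
  //   weights_to_use->dimension(1) == src->dimension(0) * src->dimension(1) * src->dimension(2)
  TensorInfo wei_info(TensorShape(num_inputs, num_outputs), 1, DataType::F32);
  TensorInfo dst_info(TensorShape(num_outputs), 1, DataType::F32);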

Definition at line 227 of file ClFullyConnected.cpp.

References arm_compute::ACL_SRC_1, FullyConnectedLayerInfo::are_weights_reshaped, ARM_COMPUTE_ERROR_ON_NULLPTR, ARM_COMPUTE_ERROR_THROW_ON, Dimensions< T >::cbegin(), Dimensions< T >::cend(), ITensorInfo::data_layout(), ITensorInfo::data_type(), ITensorInfo::dimension(), arm_compute::is_data_type_quantized_asymmetric(), ITensorInfo::num_dimensions(), Dimensions< size_t >::num_max_dimensions, arm_compute::offset_int_vec(), FullyConnectedLayerInfo::retain_internal_weights, ITensorInfo::tensor_shape(), TensorInfo::total_size(), FullyConnectedLayerInfo::transpose_weights, ClFullyConnected::validate(), and FullyConnectedLayerInfo::weights_trained_layout.

229 {
230  ARM_COMPUTE_ERROR_ON_NULLPTR(src, weights, dst);
231 
232  // Perform validate step
233  ARM_COMPUTE_ERROR_THROW_ON(ClFullyConnected::validate(src, weights, biases, dst, fc_info));
234 
235  _are_weights_converted = true;
236  _are_weights_reshaped = fc_info.transpose_weights ? fc_info.are_weights_reshaped : true;
237  _is_fc_after_conv = true;
238  _is_quantized = is_data_type_quantized_asymmetric(src->data_type());
239  _is_prepared = fc_info.retain_internal_weights;
240  _weights_to_use = TensorInfo(*weights);
241  _weights_to_use_idx = ACL_SRC_1;
242 
243  // With the Fully Connected layer we can have 4 different cases:
244  // 1) Convolution layer -> Fully Connected layer without batches
245  // 2) Fully Connected layer -> Fully Connected layer without batches
246  // 3) Convolution layer -> Fully Connected layer with batches
247  // 4) Fully Connected layer -> Fully Connected layer with batches
248 
249  // Check if we have a fully connected layer with batches
250  const bool is_batched_fc_layer = dst->dimension(1) > 1;
251  if(is_batched_fc_layer)
252  {
253  _is_fc_after_conv = (TensorShape::num_max_dimensions >= 4) && (std::equal(src->tensor_shape().cbegin() + 3,
254  src->tensor_shape().cend(),
255  dst->tensor_shape().cbegin() + 1));
256  }
257  else
258  {
259  _is_fc_after_conv = src->num_dimensions() > 1;
260  }
261 
262  ITensorInfo *weights_used = weights;
263 
264  // Reshape weights if needed
265  if(!_are_weights_reshaped)
266  {
267  // Reshape the weights
268  _reshape_weights = std::make_unique<ClTranspose>();
269  _reshape_weights->configure(compile_context, weights, &_reshaped_weights);
270  weights_used = &_reshaped_weights;
271  _weights_to_use_idx = offset_int_vec(TransposedWeights);
272  }
273 
274  // Convert weights if needed
275  if(_is_fc_after_conv && (src->data_layout() != fc_info.weights_trained_layout))
276  {
277  // Convert weights
278  _convert_weights = std::make_unique<ClConvertFullyConnectedWeights>();
279  _convert_weights->configure(compile_context,
280  weights_used,
281  &_converted_weights,
282  src->tensor_shape(),
283  fc_info.weights_trained_layout);
284 
285  weights_used = &_converted_weights;
286  _weights_to_use_idx = offset_int_vec(ConvertedWeights);
287  _are_weights_converted = false;
288  }
289 
290  if(_is_fc_after_conv)
291  {
292  // Fully Connected layer after a Convolution Layer without batches
293  configure_conv_fc(compile_context, src, weights_used, biases, dst, fc_info);
294  }
295  else
296  {
297  // Fully Connected layer after a Fully Connected Layer without batches
298  configure_fc_fc(compile_context, src, weights_used, biases, dst, fc_info);
299  }
300  // Update TensorInfo of final weights used (Need to be done in the end due to padding expansion)
301  _weights_to_use = *weights_used;
302 
303  // Set auxiliary memory requirements
304  auto gemm_mem_req = (_is_quantized) ? _mm_gemmlowp->workspace() : _mm_gemm->workspace();
305  for(unsigned int i = 0; i < gemm_mem_req.size(); ++i)
306  {
307  _aux_mem[i] = gemm_mem_req[i];
308  }
309  if(_aux_mem[1].size > 0 || _aux_mem[2].size > 0) // Persistent weights memory on GEMMs
310  {
311  // Release permuted weights at the end of prepare as they are further transposed by the assembly dispatch
312  _aux_mem[TransposedWeights] = MemoryInfo(offset_int_vec(TransposedWeights), MemoryLifetime::Prepare, _reshaped_weights.total_size());
313  _aux_mem[ConvertedWeights] = MemoryInfo(offset_int_vec(ConvertedWeights), MemoryLifetime::Prepare, _converted_weights.total_size());
314  }
315  else
316  {
317  // Release permuted weights at the end of prepare as they are further transposed by the assembly dispatch
318  const auto transposed_wei_lft = (_weights_to_use_idx == offset_int_vec(TransposedWeights)) ? MemoryLifetime::Persistent : MemoryLifetime::Prepare;
319  const auto converted_wei_lft = (_weights_to_use_idx == offset_int_vec(ConvertedWeights)) ? MemoryLifetime::Persistent : MemoryLifetime::Prepare;
320 
321  _aux_mem[TransposedWeights] = MemoryInfo(offset_int_vec(TransposedWeights), transposed_wei_lft, _reshaped_weights.total_size());
322  _aux_mem[ConvertedWeights] = MemoryInfo(offset_int_vec(ConvertedWeights), converted_wei_lft, _converted_weights.total_size());
323  }
324  _aux_mem[FlattenedSrc] = MemoryInfo(offset_int_vec(FlattenedSrc), MemoryLifetime::Temporary, _flattened_src.total_size());
325 }
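
The batched-layer check near the top of the listing above can be read as follows; this is a worked sketch with invented shapes, not library documentation:

  // Invented shapes to illustrate the is_batched_fc_layer / _is_fc_after_conv decision.
  // Needs <algorithm> for std::equal.
  using namespace arm_compute;

  TensorShape src_shape(7U, 7U, 64U, 2U); // convolution output for a batch of 2
  TensorShape dst_shape(4096U, 2U);       // FC output: 4096 features x 2 batches

  const bool is_batched = dst_shape[1] > 1; // true: dimension 1 of dst is the batch size

  // Batched case: the layer is treated as following a convolution when the dimensions of
  // src from index 3 onwards equal the dimensions of dst from index 1 onwards, i.e. the
  // batch dimensions line up ({2, 1, 1} == {2, 1, 1} here), so the input will be flattened.
  const bool fc_after_conv = (TensorShape::num_max_dimensions >= 4)
                             && std::equal(src_shape.cbegin() + 3, src_shape.cend(), dst_shape.cbegin() + 1);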

◆ prepare()

void prepare ( ITensorPack & constants )
override virtual

Prepare the function for executing.

Any one off pre-processing step required by the function is handled here

Parameters
  [in]  constants  Vector that contains the constant tensors.
Note
Prepare stage might not need all the function's buffers' backing memory to be available in order to execute

Reimplemented from ICLOperator.

Definition at line 439 of file ClFullyConnected.cpp.

References arm_compute::ACL_DST, arm_compute::ACL_SRC, arm_compute::ACL_SRC_1, ITensorPack::add_const_tensor(), CLAuxTensorHandler::get(), ITensorPack::get_const_tensor(), ITensor::mark_as_unused(), and arm_compute::offset_int_vec().

Referenced by ClFullyConnected::run().

440 {
441  if(!_is_prepared)
442  {
443  auto weights = tensors.get_const_tensor(ACL_SRC_1);
444 
445  CLAuxTensorHandler reshaped_weights(offset_int_vec(TransposedWeights), _reshaped_weights, tensors, false);
446  CLAuxTensorHandler converted_weights(offset_int_vec(ConvertedWeights), _converted_weights, tensors, false);
447 
448  // Pointer to current weights
449  const ITensor *cur_weights = weights;
450 
451  // Reshape of the weights if needed (happens only once)
452  if(!_are_weights_reshaped)
453  {
454  // Run reshape weights kernel and mark weights as unused
455  ITensorPack transpose_pack{ { ACL_SRC, weights }, { ACL_DST, reshaped_weights.get() } };
456  _reshape_weights->run(transpose_pack);
457 
458  cur_weights->mark_as_unused();
459  cur_weights = reshaped_weights.get();
460 
461  _are_weights_reshaped = true;
462  }
463 
464  // Convert weights if needed (happens only once)
465  if(!_are_weights_converted)
466  {
467  ITensorPack convert_pack{ { ACL_SRC, cur_weights }, { ACL_DST, converted_weights.get() } };
468  _convert_weights->run(convert_pack);
469 
470  cur_weights->mark_as_unused();
471  cur_weights = converted_weights.get();
472 
473  _are_weights_converted = true;
474  }
475 
476  tensors.add_const_tensor(ACL_SRC_1, cur_weights);
477 
478  // Run the GEMM prepare step and release unused weights
479  if(!_is_quantized)
480  {
481  _mm_gemm->prepare(tensors);
482  }
483  else
484  {
485  _mm_gemmlowp->prepare(tensors);
486  }
487  _is_prepared = true;
488  }
489 }
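
As a usage sketch (continuing the invented setup from the earlier examples, so fc and the CLTensor objects are assumptions rather than documented code), prepare() can be called explicitly once the trained weights are available, so that the one-off weight reshape/convert and the GEMM preparation do not run inside a latency-critical loop; run() would otherwise trigger it on its first call:

  // Sketch: src, weights, biases and dst are CLTensors allocated with the shapes passed to
  // configure(); auxiliary workspace tensors are omitted here (see workspace() below).
  ITensorPack pack;
  pack.add_tensor(ACL_SRC_0, &src);
  pack.add_const_tensor(ACL_SRC_1, &weights);
  pack.add_const_tensor(ACL_SRC_2, &biases);
  pack.add_tensor(ACL_DST, &dst);

  fc.prepare(pack); // one-off: transpose/convert the weights and prepare the GEMM

  for(int i = 0; i < 100; ++i)
  {
      fc.run(pack); // steady state: flatten (if needed) + matrix multiply only
  }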

◆ run()

void run ( ITensorPack & tensors )
override virtual

Run the kernels contained in the function.

Parameters
  [in]  tensors  Vector that contains the tensors to operate on.

Reimplemented from ICLOperator.

Definition at line 405 of file ClFullyConnected.cpp.

References arm_compute::ACL_DST, arm_compute::ACL_SRC, arm_compute::ACL_SRC_0, arm_compute::ACL_SRC_1, ITensorPack::add_const_tensor(), CLAuxTensorHandler::get(), ITensorPack::get_const_tensor(), arm_compute::offset_int_vec(), ClFullyConnected::prepare(), and arm_compute::test::validation::src.

406 {
407  prepare(tensors);
408 
409  auto src = tensors.get_const_tensor(ACL_SRC_0);
410 
411  CLAuxTensorHandler flattened_src(offset_int_vec(FlattenedSrc), _flattened_src, tensors, false);
412  CLAuxTensorHandler weights(_weights_to_use_idx, _weights_to_use, tensors, false);
413 
414  // Linearize input if it comes from a convolutional layer
415  if(_is_fc_after_conv)
416  {
417  ITensorPack flatten_pack{ { ACL_SRC, src }, { ACL_DST, flattened_src.get() } };
418  _flatten->run(flatten_pack);
419  }
420 
421  ITensorPack gemm_pack = tensors;
422  gemm_pack.add_const_tensor(ACL_SRC_0, (_is_fc_after_conv) ? flattened_src.get() : src);
423  if(_weights_to_use_idx != ACL_SRC_1)
424  {
425  gemm_pack.add_const_tensor(ACL_SRC_1, weights.get());
426  }
427 
428  // Run matrix multiply
429  if(_is_quantized)
430  {
431  _mm_gemmlowp->run(gemm_pack);
432  }
433  else
434  {
435  _mm_gemm->run(gemm_pack);
436  }
437 }
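
Continuing the same sketch (the pack and tensors are the invented ones from the prepare() example above), a single inference then amounts to calling run() and synchronising the OpenCL queue before reading the result back on the host:

  fc.run(pack);              // enqueues the kernels; calls prepare() itself on the first use
  CLScheduler::get().sync(); // wait for the enqueued OpenCL work to finish

  dst.map(true);             // map the output buffer for host access
  // ... read the results through dst.buffer() or a Window/Iterator pair ...
  dst.unmap();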

◆ validate()

Status validate ( const ITensorInfo * src,
const ITensorInfo * weights,
const ITensorInfo * biases,
const ITensorInfo * dst,
FullyConnectedLayerInfo  fc_info = FullyConnectedLayerInfo() 
)
static

Static function to check if given info will lead to a valid configuration.

Similar to ClFullyConnected::configure()

Returns
a status
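
A caller would typically consume the returned status as in the sketch below (the tensor infos are the invented ones from the earlier configure() sketch; error_code() and error_description() are assumed from the core Status API, and <iostream> is needed for the print):

  const Status st = opencl::ClFullyConnected::validate(&src_info, &wei_info, &bia_info, &dst_info, fc_info);
  if(st.error_code() != ErrorCode::OK)
  {
      std::cerr << "ClFullyConnected rejected the configuration: " << st.error_description() << std::endl;
  }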

Definition at line 327 of file ClFullyConnected.cpp.

References ActivationLayerInfo::activation(), FullyConnectedLayerInfo::activation_info, FullyConnectedLayerInfo::are_weights_reshaped, ARM_COMPUTE_RETURN_ERROR_ON, ARM_COMPUTE_RETURN_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN, ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES, ARM_COMPUTE_RETURN_ERROR_ON_NULLPTR, ARM_COMPUTE_RETURN_ON_ERROR, ActivationLayerInfo::BOUNDED_RELU, Dimensions< T >::cbegin(), Dimensions< T >::cend(), ICloneable< T >::clone(), arm_compute::misc::shape_calculator::compute_flatten_shape(), arm_compute::misc::shape_calculator::compute_transposed_shape(), FullyConnectedLayerInfo::constant_weights, ITensorInfo::data_layout(), ITensorInfo::data_type(), ITensorInfo::dimension(), ActivationLayerInfo::enabled(), arm_compute::F16, arm_compute::F32, arm_compute::is_data_type_quantized(), ActivationLayerInfo::LU_BOUNDED_RELU, arm_compute::NCHW, ITensorInfo::num_dimensions(), Dimensions< size_t >::num_max_dimensions, arm_compute::QASYMM8, arm_compute::QASYMM8_SIGNED, ActivationLayerInfo::RELU, arm_compute::test::validation::src, ITensorInfo::tensor_shape(), FullyConnectedLayerInfo::transpose_weights, ClTranspose::validate(), ClConvertFullyConnectedWeights::validate(), ClFlatten::validate(), and FullyConnectedLayerInfo::weights_trained_layout.

Referenced by ClFullyConnected::configure(), and CLFullyConnectedLayer::validate().

329 {
330  ARM_COMPUTE_RETURN_ERROR_ON_NULLPTR(src, weights, dst);
331  ARM_COMPUTE_RETURN_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN(src, 1, DataType::QASYMM8, DataType::QASYMM8_SIGNED, DataType::F16, DataType::F32);
332  ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES(src, weights, dst);
333  ARM_COMPUTE_RETURN_ERROR_ON(weights->num_dimensions() > 2);
334  ARM_COMPUTE_RETURN_ERROR_ON(fc_info.activation_info.enabled() && is_data_type_quantized(src->data_type()) && fc_info.activation_info.activation() != ActivationLayerInfo::ActivationFunction::RELU
335  && fc_info.activation_info.activation() != ActivationLayerInfo::ActivationFunction::BOUNDED_RELU && fc_info.activation_info.activation() != ActivationLayerInfo::ActivationFunction::LU_BOUNDED_RELU);
336  ARM_COMPUTE_RETURN_ERROR_ON(!fc_info.constant_weights && (!fc_info.are_weights_reshaped || fc_info.transpose_weights));
337 
338  bool weights_reshaped = fc_info.transpose_weights ? fc_info.are_weights_reshaped : true;
339  bool is_fc_after_conv = true;
340 
341  const ITensorInfo &flatten_src = TensorInfo(src->clone()->set_is_resizable(true).reset_padding().set_tensor_shape(compute_flatten_shape(src)).set_data_layout(DataLayout::NCHW));
342  const ITensorInfo &reshaped_weights = TensorInfo(weights->clone()->set_is_resizable(true).reset_padding().set_tensor_shape(compute_transposed_shape(*weights)));
343  const ITensorInfo &converted_weights = weights_reshaped ? TensorInfo(weights->clone()->set_is_resizable(true).reset_padding()) : TensorInfo(*reshaped_weights.clone());
344 
345  // With the Fully Connected layer we can have 4 different cases:
346  // 1) Convolution layer -> Fully Connected layer without batches
347  // 2) Fully Connected layer -> Fully Connected layer without batches
348  // 3) Convolution layer -> Fully Connected layer with batches
349  // 4) Fully Connected layer -> Fully Connected layer with batches
350 
351  const ITensorInfo *src_to_use = src;
352  const ITensorInfo *weights_to_use = weights;
353 
354  // Check if we have a fully connected layer with batches
355  const bool is_batched_fc_layer = dst->dimension(1) > 1;
356  if(is_batched_fc_layer)
357  {
358  is_fc_after_conv = (TensorShape::num_max_dimensions >= 4) && (std::equal(src->tensor_shape().cbegin() + 3,
359  src->tensor_shape().cend(),
360  dst->tensor_shape().cbegin() + 1));
361  }
362  else
363  {
364  is_fc_after_conv = src->num_dimensions() > 1;
365  }
366 
367  if(!weights_reshaped)
368  {
369  // Validate reshape weights kernel
370  ARM_COMPUTE_RETURN_ON_ERROR(ClTranspose::validate(weights, &reshaped_weights));
371  weights_to_use = &reshaped_weights;
372  }
373 
374  if(is_fc_after_conv && (src->data_layout() != fc_info.weights_trained_layout))
375  {
376  // Validate convert weights kernel
377  ARM_COMPUTE_RETURN_ON_ERROR(ClConvertFullyConnectedWeights::validate(weights_to_use,
378  &converted_weights,
379  src->tensor_shape(),
380  fc_info.weights_trained_layout));
381  weights_to_use = &converted_weights;
382  }
383 
384  if(is_fc_after_conv)
385  {
386  // Fully Connected layer after a Convolution Layer without batches
387  ARM_COMPUTE_RETURN_ERROR_ON((weights_to_use->dimension(1) != (src->dimension(0) * src->dimension(1) * src->dimension(2))));
388 
389  // Validate flatten kernel
390  ARM_COMPUTE_RETURN_ON_ERROR(ClFlatten::validate(src, &flatten_src));
391  src_to_use = &flatten_src;
392  }
393  else
394  {
395  // Fully Connected layer after a Fully Connected Layer without batches
396  ARM_COMPUTE_RETURN_ERROR_ON(src->dimension(0) != weights_to_use->dimension(1));
397  }
398 
399  // Validate matrix multiply kernel
400  ARM_COMPUTE_RETURN_ON_ERROR(validate_mm(*src_to_use, *weights_to_use, biases, *dst, fc_info));
401 
402  return Status{};
403 }

◆ workspace()

experimental::MemoryRequirements workspace ( ) const
override virtual

Return the memory requirements required by the workspace.

Reimplemented from ICLOperator.

Definition at line 491 of file ClFullyConnected.cpp.

492 {
493  return _aux_mem;
494 }
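
The requirements are meant to be satisfied by the caller before prepare()/run(). The sketch below is an assumption-laden illustration rather than documented usage: it presumes MemoryInfo exposes the slot, size and lifetime fields implied by the MemoryInfo(...) constructor calls in configure() above, and it reuses fc and pack from the earlier examples. It allocates one backing CLTensor per requested slot and imports it into the same ITensorPack that is passed to prepare() and run():

  // Sketch: needs <memory> and <vector>.
  std::vector<std::unique_ptr<CLTensor>> aux_tensors;
  for(const auto &req : fc.workspace())
  {
      if(req.size == 0)
      {
          continue; // nothing requested for this slot
      }
      auto aux = std::make_unique<CLTensor>();
      aux->allocator()->init(TensorInfo(TensorShape(req.size), 1, DataType::U8)); // plain byte buffer
      aux->allocator()->allocate();
      pack.add_tensor(req.slot, aux.get()); // slot ids match the ones used inside the operator
      aux_tensors.emplace_back(std::move(aux));
  }
  // Buffers with MemoryLifetime::Prepare are only needed until prepare() returns; Temporary
  // ones must stay alive for every run(); Persistent ones for the operator's whole lifetime.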

The documentation for this class was generated from the following files:

  ClFullyConnected.h
  ClFullyConnected.cpp