Compute Library 21.02
CLGEMMConvolutionLayer Class Reference

Basic function to compute the convolution layer. More...

#include <CLGEMMConvolutionLayer.h>

Collaboration diagram for CLGEMMConvolutionLayer (diagram omitted).

Public Member Functions

 CLGEMMConvolutionLayer (std::shared_ptr< IMemoryManager > memory_manager=nullptr, IWeightsManager *weights_manager=nullptr)
 Constructor. More...
 
 CLGEMMConvolutionLayer (const CLGEMMConvolutionLayer &)=delete
 Prevent instances of this class from being copied (As this class contains pointers) More...
 
 CLGEMMConvolutionLayer (CLGEMMConvolutionLayer &&)=default
 Default move constructor. More...
 
CLGEMMConvolutionLayer & operator= (const CLGEMMConvolutionLayer &)=delete
 Prevent instances of this class from being copied (As this class contains pointers) More...
 
CLGEMMConvolutionLayer & operator= (CLGEMMConvolutionLayer &&)=default
 Default move assignment operator. More...
 
 ~CLGEMMConvolutionLayer ()
 Default destructor. More...
 
void configure (const ICLTensor *input, const ICLTensor *weights, const ICLTensor *biases, ICLTensor *output, const PadStrideInfo &conv_info, const WeightsInfo &weights_info=WeightsInfo(), const Size2D &dilation=Size2D(1U, 1U), const ActivationLayerInfo &act_info=ActivationLayerInfo(), unsigned int num_groups=1)
 Set the input and output tensors. More...
 
void configure (const CLCompileContext &compile_context, const ICLTensor *input, const ICLTensor *weights, const ICLTensor *biases, ICLTensor *output, const PadStrideInfo &conv_info, const WeightsInfo &weights_info=WeightsInfo(), const Size2D &dilation=Size2D(1U, 1U), const ActivationLayerInfo &act_info=ActivationLayerInfo(), unsigned int num_groups=1)
 Set the input and output tensors. More...
 
void run () override
 Run the kernels contained in the function. More...
 
void prepare () override
 Prepare the function for executing. More...
 
- Public Member Functions inherited from IFunction
virtual ~IFunction ()=default
 Destructor. More...
 

Static Public Member Functions

static Status validate (const ITensorInfo *input, const ITensorInfo *weights, const ITensorInfo *biases, const ITensorInfo *output, const PadStrideInfo &conv_info, const WeightsInfo &weights_info=WeightsInfo(), const Size2D &dilation=Size2D(1U, 1U), const ActivationLayerInfo &act_info=ActivationLayerInfo(), unsigned int num_groups=1)
 Static function to check if given info will lead to a valid configuration of CLGEMMConvolutionLayer. More...
 

Detailed Description

Basic function to compute the convolution layer.

This function calls the following OpenCL kernels/functions:

  1. CLIm2ColKernel
  2. CLGEMM (if the data type is FP32 or FP16)
  3. CLGEMMLowpMatrixMultiplyCore (if the data type is QASYMM8/QASYMM8_SIGNED)
  4. CLGEMMLowpOutputStage with QUANTIZE_DOWN_FIXEDPOINT type of quantization (if the data type is QASYMM8/QASYMM8_SIGNED)
  5. CLCol2ImKernel (if NCHW data layout)

Definition at line 176 of file CLGEMMConvolutionLayer.h.
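
A minimal usage sketch, assuming an FP32 NCHW layer with illustrative shapes (the sizes, padding, and stride below are examples chosen for this sketch, not values taken from this page): configure the function on initialised tensor metadata, allocate the backing CL buffers, then call run() per inference.

#include "arm_compute/core/TensorInfo.h"
#include "arm_compute/core/Types.h"
#include "arm_compute/runtime/CL/CLScheduler.h"
#include "arm_compute/runtime/CL/CLTensor.h"
#include "arm_compute/runtime/CL/functions/CLGEMMConvolutionLayer.h"

using namespace arm_compute;

int main()
{
    // Initialise the default OpenCL context and queue for the CL backend.
    CLScheduler::get().default_init();

    CLTensor input, weights, biases, output;
    // Illustrative shapes (NCHW): a 224x224 RGB input and 64 3x3 kernels ([kernel_x, kernel_y, IFM, OFM]).
    input.allocator()->init(TensorInfo(TensorShape(224U, 224U, 3U), 1, DataType::F32));
    weights.allocator()->init(TensorInfo(TensorShape(3U, 3U, 3U, 64U), 1, DataType::F32));
    biases.allocator()->init(TensorInfo(TensorShape(64U), 1, DataType::F32));
    output.allocator()->init(TensorInfo(TensorShape(224U, 224U, 64U), 1, DataType::F32));

    CLGEMMConvolutionLayer conv;
    // Stride 1 and padding 1 keep the 224x224 spatial size for a 3x3 kernel.
    conv.configure(&input, &weights, &biases, &output, PadStrideInfo(1, 1, 1, 1));

    // Backing OpenCL buffers are allocated after configuration.
    input.allocator()->allocate();
    weights.allocator()->allocate();
    biases.allocator()->allocate();
    output.allocator()->allocate();

    // ...fill input/weights/biases, e.g. via map()/unmap()...

    conv.run();                // enqueues the kernels
    CLScheduler::get().sync(); // block until execution has finished
    return 0;
}

Note that the first run() also triggers prepare() (one-off weight reshaping), so steady-state timings should be measured from the second iteration onwards.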

Constructor & Destructor Documentation

◆ CLGEMMConvolutionLayer() [1/3]

CLGEMMConvolutionLayer ( std::shared_ptr< IMemoryManager > memory_manager = nullptr,
                         IWeightsManager *                 weights_manager = nullptr
                       )

Constructor.

Parameters
    [in] memory_manager  (Optional) Memory manager.
    [in] weights_manager (Optional) Weights manager.

Definition at line 118 of file CLGEMMConvolutionLayer.cpp.

    : _memory_group(memory_manager), _weights_manager(weights_manager), _reshape_weights(), _reshape_weights_managed(),
      _im2col_kernel(std::make_unique<CLIm2ColKernel>()), _mm_gemm(memory_manager, weights_manager), _mm_gemmlowp(memory_manager),
      _col2im_kernel(std::make_unique<CLCol2ImKernel>()), _activationlayer_function(), _original_weights(nullptr),
      _im2col_output(), _weights_reshaped(), _gemm_output(),
      _skip_im2col(false), _skip_col2im(false), _is_quantized(false), _fuse_activation(true), _is_prepared(false)
{
}
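
The optional managers enable buffer reuse across functions. A sketch of wiring up an on-demand memory manager; BlobLifetimeManager, PoolManager, MemoryManagerOnDemand, and CLBufferAllocator are standard Compute Library runtime classes, while the single-pool configuration is an illustrative choice:

#include <memory>

#include "arm_compute/runtime/BlobLifetimeManager.h"
#include "arm_compute/runtime/CL/CLBufferAllocator.h"
#include "arm_compute/runtime/CL/functions/CLGEMMConvolutionLayer.h"
#include "arm_compute/runtime/MemoryManagerOnDemand.h"
#include "arm_compute/runtime/PoolManager.h"

using namespace arm_compute;

void build_layers()
{
    // A lifetime manager plus a pool manager back the on-demand memory manager.
    auto lifetime_mgr = std::make_shared<BlobLifetimeManager>();
    auto pool_mgr     = std::make_shared<PoolManager>();
    auto memory_mgr   = std::make_shared<MemoryManagerOnDemand>(lifetime_mgr, pool_mgr);

    // Both functions draw their intermediate buffers (im2col output, GEMM output) from the same manager.
    CLGEMMConvolutionLayer conv0(memory_mgr);
    CLGEMMConvolutionLayer conv1(memory_mgr);

    // ...configure conv0 and conv1...

    // Back the pools once all sharing functions are configured, before the first run().
    CLBufferAllocator allocator;
    memory_mgr->populate(allocator, 1 /* num_pools */);
}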

◆ CLGEMMConvolutionLayer() [2/3]

Prevent instances of this class from being copied (As this class contains pointers)

◆ CLGEMMConvolutionLayer() [3/3]

Default move constructor.

◆ ~CLGEMMConvolutionLayer()

~CLGEMMConvolutionLayer ( )
default

Default destructor.

Member Function Documentation

◆ configure() [1/2]

void configure ( const ICLTensor *          input,
                 const ICLTensor *          weights,
                 const ICLTensor *          biases,
                 ICLTensor *                output,
                 const PadStrideInfo &      conv_info,
                 const WeightsInfo &        weights_info = WeightsInfo(),
                 const Size2D &             dilation = Size2D(1U, 1U),
                 const ActivationLayerInfo &act_info = ActivationLayerInfo(),
                 unsigned int               num_groups = 1
               )

Set the input and output tensors.

Parameters
    [in]  input        Source tensor. 3 lower dimensions represent a single input [width, height, IFM], while every optional dimension from 4 and above represents a batch of inputs. Data types supported: QASYMM8/QASYMM8_SIGNED/F16/F32.
    [in]  weights      Weights tensor. Weights are a 4D tensor with dimensions [kernel_x, kernel_y, IFM, OFM]. Data type supported: same as input, or QASYMM8/QSYMM8_PER_CHANNEL when input is QASYMM8, or QASYMM8_SIGNED/QSYMM8_PER_CHANNEL when input is QASYMM8_SIGNED.
    [in]  biases       Biases tensor. Shared biases are supported. Biases are a 1D tensor with dimensions [OFM]. Data type supported: should match the input data type, except for quantized inputs, where biases should be of S32 type.
    [out] output       Destination tensor. 3 lower dimensions represent a single output [width, height, OFM], while the rest represent a batch of outputs. Data types supported: same as input.
    [in]  conv_info    Contains padding and stride information described in PadStrideInfo.
    [in]  weights_info Specifies if the weights tensor has been reshaped with CLWeightsReshapeKernel. If this is not part of the fully connected layer, the weights tensor has also been transposed with CLGEMMReshapeRHSMatrixKernel. Data type supported: same as input.
    [in]  dilation     (Optional) Dilation, in elements, across x and y. Defaults to (1, 1).
    [in]  act_info     (Optional) Activation layer information in case of a fused activation.
    [in]  num_groups   (Optional) Number of groups when performing a grouped convolution. num_groups != 1 is only supported for NCHW data layout.

Definition at line 206 of file CLGEMMConvolutionLayer.cpp.

References CLKernelLibrary::get().

{
    configure(CLKernelLibrary::get().get_compile_context(), input, weights, biases, output, conv_info, weights_info, dilation, act_info, num_groups);
}
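A hedged sketch of the optional arguments, reusing tensors initialised as in the earlier example (the values are illustrative): a 3x3 kernel with dilation 2 and a fused bounded ReLU.

CLGEMMConvolutionLayer conv;
conv.configure(&input, &weights, &biases, &output,
               PadStrideInfo(1, 1, 2, 2), // stride 1, padding 2: preserves size for a 3x3 kernel dilated by 2
               WeightsInfo(),             // the weights have not been pre-reshaped
               Size2D(2U, 2U),            // dilation of 2 in x and y
               ActivationLayerInfo(ActivationLayerInfo::ActivationFunction::BOUNDED_RELU, 6.f));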

◆ configure() [2/2]

void configure ( const CLCompileContext &   compile_context,
                 const ICLTensor *          input,
                 const ICLTensor *          weights,
                 const ICLTensor *          biases,
                 ICLTensor *                output,
                 const PadStrideInfo &      conv_info,
                 const WeightsInfo &        weights_info = WeightsInfo(),
                 const Size2D &             dilation = Size2D(1U, 1U),
                 const ActivationLayerInfo &act_info = ActivationLayerInfo(),
                 unsigned int               num_groups = 1
               )

Set the input and output tensors.

Parameters
    [in]  compile_context The compile context to be used.
    [in]  input           Source tensor. 3 lower dimensions represent a single input [width, height, IFM], while every optional dimension from 4 and above represents a batch of inputs. Data types supported: QASYMM8/QASYMM8_SIGNED/F16/F32.
    [in]  weights         Weights tensor. Weights are a 4D tensor with dimensions [kernel_x, kernel_y, IFM, OFM]. Data type supported: same as input, or QASYMM8/QSYMM8_PER_CHANNEL when input is QASYMM8, or QASYMM8_SIGNED/QSYMM8_PER_CHANNEL when input is QASYMM8_SIGNED.
    [in]  biases          Biases tensor. Shared biases are supported. Biases are a 1D tensor with dimensions [OFM]. Data type supported: should match the input data type, except for quantized inputs, where biases should be of S32 type.
    [out] output          Destination tensor. 3 lower dimensions represent a single output [width, height, OFM], while the rest represent a batch of outputs. Data types supported: same as input.
    [in]  conv_info       Contains padding and stride information described in PadStrideInfo.
    [in]  weights_info    Specifies if the weights tensor has been reshaped with CLWeightsReshapeKernel. If this is not part of the fully connected layer, the weights tensor has also been transposed with CLGEMMReshapeRHSMatrixKernel. Data type supported: same as input.
    [in]  dilation        (Optional) Dilation, in elements, across x and y. Defaults to (1, 1).
    [in]  act_info        (Optional) Activation layer information in case of a fused activation.
    [in]  num_groups      (Optional) Number of groups when performing a grouped convolution. num_groups != 1 is only supported for NCHW data layout.

Definition at line 212 of file CLGEMMConvolutionLayer.cpp.

References IWeightsManager::acquire(), CLTensorAllocator::allocate(), CLTensor::allocator(), IWeightsManager::are_weights_managed(), ARM_COMPUTE_ERROR_ON_MSG, ARM_COMPUTE_ERROR_ON_NULLPTR, ARM_COMPUTE_ERROR_THROW_ON, ARM_COMPUTE_UNUSED, arm_compute::BATCHES, ActivationLayerInfo::BOUNDED_RELU, arm_compute::quantization::compute_quantized_multipliers_and_shifts(), CLActivationLayer::configure(), CLConvolutionLayerReshapeWeights::configure(), CLConvolutionLayerReshapeWeightsTransform::configure(), arm_compute::test::validation::conv_info, arm_compute::test::validation::data_layout, ITensorInfo::data_layout(), arm_compute::test::validation::data_type, ITensorInfo::data_type(), ITensorInfo::dimension(), GEMMLowpOutputStageInfo::gemmlowp_max_bound, GEMMLowpOutputStageInfo::gemmlowp_min_bound, GEMMLowpOutputStageInfo::gemmlowp_multiplier, GEMMLowpOutputStageInfo::gemmlowp_multipliers, GEMMLowpOutputStageInfo::gemmlowp_offset, GEMMLowpOutputStageInfo::gemmlowp_shift, GEMMLowpOutputStageInfo::gemmlowp_shifts, CLScheduler::get(), arm_compute::get_data_layout_dimension_index(), arm_compute::get_min_max(), arm_compute::get_quantized_activation_min_max(), arm_compute::HEIGHT, arm_compute::test::validation::idx_height, arm_compute::test::validation::idx_width, ITensor::info(), CLTensor::info(), ITensorAllocator::init(), arm_compute::test::validation::input, arm_compute::is_data_type_quantized_asymmetric(), arm_compute::is_data_type_quantized_per_channel(), GEMMLowpOutputStageInfo::is_quantized_per_channel, ActivationLayerInfo::LU_BOUNDED_RELU, MemoryGroup::manage(), arm_compute::NHWC, arm_compute::test::validation::num_groups, ITensorInfo::quantization_info(), arm_compute::QUANTIZE_DOWN_FIXEDPOINT, ActivationLayerInfo::RELU, arm_compute::scaled_dimensions(), TensorShape::set(), arm_compute::test::validation::set_data_layout(), TensorInfo::set_quantization_info(), PadStrideInfo::stride(), TensorInfo::tensor_shape(), ITensorInfo::total_size(), CLScheduler::tune_kernel_static(), GEMMLowpOutputStageInfo::type, QuantizationInfo::uniform(), CLGEMMConvolutionLayer::validate(), arm_compute::test::validation::weights_info, and arm_compute::WIDTH.

{
    ARM_COMPUTE_ERROR_ON_NULLPTR(input, weights, output);

    ARM_COMPUTE_ERROR_THROW_ON(CLGEMMConvolutionLayer::validate(input->info(),
                                                                weights->info(),
                                                                biases != nullptr ? biases->info() : nullptr,
                                                                output->info(),
                                                                conv_info,
                                                                weights_info,
                                                                dilation,
                                                                act_info,
                                                                num_groups));

    const DataType   data_type   = input->info()->data_type();
    const DataLayout data_layout = input->info()->data_layout();
    const int        idx_width   = get_data_layout_dimension_index(data_layout, DataLayoutDimension::WIDTH);
    const int        idx_height  = get_data_layout_dimension_index(data_layout, DataLayoutDimension::HEIGHT);
    const int        idx_kernels = get_data_layout_dimension_index(data_layout, DataLayoutDimension::BATCHES);

    const unsigned int kernel_width  = weights->info()->dimension(idx_width);
    const unsigned int kernel_height = weights->info()->dimension(idx_height);
    const unsigned int num_kernels   = weights->info()->dimension(idx_kernels);

    const UniformQuantizationInfo iq_info = input->info()->quantization_info().uniform();
    const UniformQuantizationInfo oq_info = output->info()->quantization_info().uniform();

    _is_prepared      = weights_info.retain_internal_weights();
    _original_weights = weights;
    _is_quantized     = is_data_type_quantized_asymmetric(input->info()->data_type());
    _skip_im2col      = (data_layout == DataLayout::NHWC && kernel_width == 1 && kernel_height == 1 && conv_info.stride().first == 1 && conv_info.stride().second == 1);
    _skip_col2im      = data_layout == DataLayout::NHWC;

    // Only for quantized types are there a few cases where the activation function cannot be fused in GEMM
    _fuse_activation = true;

    // Set the GPU target for im2col and col2im
    _im2col_kernel->set_target(CLScheduler::get().target());
    _col2im_kernel->set_target(CLScheduler::get().target());

    const ICLTensor *gemm_input_to_use  = input;
    ICLTensor       *gemm_output_to_use = output;

    // Get parameters from conv_info
    unsigned int stride_x = 0;
    unsigned int stride_y = 0;
    std::tie(stride_x, stride_y) = conv_info.stride();

    // Get convolved dimensions
    unsigned int conv_w = 0;
    unsigned int conv_h = 0;
    std::tie(conv_w, conv_h) = scaled_dimensions(input->info()->dimension(idx_width),
                                                 input->info()->dimension(idx_height),
                                                 kernel_width,
                                                 kernel_height,
                                                 conv_info,
                                                 dilation);

    unsigned int mat_weights_cols = num_kernels / num_groups;

    const ICLTensor *biases_to_use = biases;
    bool             append_bias   = false;

    ICLTensor *weights_to_use = &_weights_reshaped;
    if(num_groups != 1 && biases != nullptr)
    {
        // num_groups != 1 can only be for NCHW
        // Since a utility function to reshape the biases is missing, we append the biases to the weights tensor
        biases_to_use = nullptr;
        append_bias   = true;

        if(_weights_manager && _weights_manager->are_weights_managed(weights))
        {
            _reshape_weights_managed.configure(compile_context, weights, biases, num_groups);
            weights_to_use = utils::cast::polymorphic_downcast<ICLTensor *>(_weights_manager->acquire(weights, &_reshape_weights_managed));
        }
        else
        {
            _reshape_weights.configure(compile_context, weights, biases, &_weights_reshaped, num_groups);
        }
    }
    else
    {
        if(_weights_manager && _weights_manager->are_weights_managed(weights))
        {
            _reshape_weights_managed.configure(compile_context, weights, nullptr, num_groups);
            weights_to_use = utils::cast::polymorphic_downcast<ICLTensor *>(_weights_manager->acquire(weights, &_reshape_weights_managed));
        }
        else
        {
            _reshape_weights.configure(compile_context, weights, nullptr, &_weights_reshaped, num_groups);
        }
    }

    // Create tensor to store im2col reshaped inputs
    if(!_skip_im2col)
    {
        _memory_group.manage(&_im2col_output);

        // Configure and tune im2col. im2col output shape is auto-initialized
        _im2col_kernel->configure(compile_context, input, &_im2col_output, Size2D(kernel_width, kernel_height), conv_info, append_bias, dilation, num_groups);

        // Set quantization info
        _im2col_output.info()->set_quantization_info(input->info()->quantization_info());
        CLScheduler::get().tune_kernel_static(*_im2col_kernel);

        // Update GEMM input
        gemm_input_to_use = &_im2col_output;
    }

    // Create GEMM output tensor
    if(!_skip_col2im)
    {
        TensorShape shape_gemm;

        // If we cannot skip col2im it means we run im2col as well
        shape_gemm = _im2col_output.info()->tensor_shape();
        shape_gemm.set(0, mat_weights_cols);
        shape_gemm.set(1, conv_w * conv_h);

        TensorInfo info_gemm(shape_gemm, 1, data_type);
        info_gemm.set_quantization_info(output->info()->quantization_info()).set_data_layout(input->info()->data_layout());
        _gemm_output.allocator()->init(info_gemm);
        _memory_group.manage(&_gemm_output);

        // Update GEMM output
        gemm_output_to_use = &_gemm_output;
    }

    GEMMLowpOutputStageInfo gemmlowp_output_stage;
    gemmlowp_output_stage.type            = GEMMLowpOutputStageType::QUANTIZE_DOWN_FIXEDPOINT;
    gemmlowp_output_stage.gemmlowp_offset = 0;

    // Configure output stage for quantized case
    if(_is_quantized)
    {
        const auto         output_quant_info        = (output->info()->total_size() == 0) ? iq_info : oq_info;
        const bool         is_quantized_per_channel = is_data_type_quantized_per_channel(weights->info()->data_type());
        const unsigned int num_filters              = (is_quantized_per_channel) ? num_kernels : 1;

        gemmlowp_output_stage.is_quantized_per_channel = is_quantized_per_channel;

        gemmlowp_output_stage.gemmlowp_multipliers.resize(num_filters);
        gemmlowp_output_stage.gemmlowp_shifts.resize(num_filters);
        quantization::compute_quantized_multipliers_and_shifts(input->info(),
                                                               weights->info(),
                                                               output->info(),
                                                               idx_kernels,
                                                               gemmlowp_output_stage.gemmlowp_multipliers.data(),
                                                               gemmlowp_output_stage.gemmlowp_shifts.data());
        gemmlowp_output_stage.gemmlowp_multiplier = gemmlowp_output_stage.gemmlowp_multipliers[0];
        gemmlowp_output_stage.gemmlowp_shift      = gemmlowp_output_stage.gemmlowp_shifts[0];

        PixelValue min_val{};
        PixelValue max_val{};
        std::tie(min_val, max_val) = get_min_max(output->info()->data_type());

        auto min_activation = min_val.get<int32_t>();
        auto max_activation = max_val.get<int32_t>();

        const std::set<ActivationLayerInfo::ActivationFunction> supported_acts = { ActivationLayerInfo::ActivationFunction::RELU,
                                                                                   ActivationLayerInfo::ActivationFunction::BOUNDED_RELU,
                                                                                   ActivationLayerInfo::ActivationFunction::LU_BOUNDED_RELU
                                                                                 };

        if(act_info.enabled())
        {
            if(supported_acts.count(act_info.activation()) != 0)
            {
                std::tie(min_activation, max_activation) = get_quantized_activation_min_max(act_info, data_type, output_quant_info);
            }
            else
            {
                _fuse_activation = false;
            }
        }

        // Set the GEMMLowp output stage info
        gemmlowp_output_stage.gemmlowp_offset    = output_quant_info.offset;
        gemmlowp_output_stage.gemmlowp_min_bound = min_activation;
        gemmlowp_output_stage.gemmlowp_max_bound = max_activation;
    }

    // Configure and tune GEMM
    // In case of NHWC, we need to run GEMM3D (gemm_3d_depth != 0) in order to avoid reshaping the output matrix
    const unsigned int gemm_3d_depth = (data_layout == DataLayout::NHWC) ? conv_h : 0;

    configure_mm(compile_context, gemm_input_to_use, weights_to_use, biases_to_use, gemm_output_to_use, gemmlowp_output_stage, gemm_3d_depth, act_info);

    if(!_skip_im2col)
    {
        _im2col_output.allocator()->allocate();
    }

    if(!_skip_col2im)
    {
        // Configure and tune Col2Im
        _col2im_kernel->configure(compile_context, gemm_output_to_use, output, Size2D(conv_w, conv_h), num_groups);
        CLScheduler::get().tune_kernel_static(*_col2im_kernel.get());
    }

    if(!_skip_col2im)
    {
        _gemm_output.allocator()->allocate();
    }

    ARM_COMPUTE_ERROR_ON_MSG((output->info()->dimension(idx_width) != conv_w) || (output->info()->dimension(idx_height) != conv_h),
                             "Output shape does not match the expected one");

    if(!_fuse_activation)
    {
        _activationlayer_function.configure(compile_context, output, nullptr, act_info);
    }

    ARM_COMPUTE_UNUSED(weights_info);
}
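The only difference from the first overload is the explicit CLCompileContext; as the snippet under the first overload shows, that overload simply forwards the context obtained from CLKernelLibrary. Passing one explicitly lets several functions share it, for example (stride and padding values are illustrative):

const CLCompileContext &ctx = CLKernelLibrary::get().get_compile_context();

CLGEMMConvolutionLayer conv;
conv.configure(ctx, &input, &weights, &biases, &output, PadStrideInfo(2, 2, 0, 0));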

◆ operator=() [1/2]

CLGEMMConvolutionLayer & operator= ( const CLGEMMConvolutionLayer & )
delete

Prevent instances of this class from being copied (As this class contains pointers)

◆ operator=() [2/2]

CLGEMMConvolutionLayer& operator= ( CLGEMMConvolutionLayer &&  )
default

Default move assignment operator.

◆ prepare()

void prepare ( )
override virtual

Prepare the function for executing.

Any one-off pre-processing step required by the function is handled here.

Note
The prepare stage might not need all of the function's buffers' backing memory to be available in order to execute.

Reimplemented from IFunction.

Definition at line 660 of file CLGEMMConvolutionLayer.cpp.

References CLTensorAllocator::allocate(), CLTensor::allocator(), IWeightsManager::are_weights_managed(), ARM_COMPUTE_ERROR_ON, CLTensorAllocator::free(), CLScheduler::get(), ITensor::is_used(), ITensor::mark_as_unused(), CLGEMMLowpMatrixMultiplyCore::prepare(), CLGEMM::prepare(), CLScheduler::queue(), IWeightsManager::run(), and CLConvolutionLayerReshapeWeights::run().

Referenced by CLGEMMConvolutionLayer::run().

{
    if(!_is_prepared)
    {
        ARM_COMPUTE_ERROR_ON(!_original_weights->is_used());
        if(_weights_manager && _weights_manager->are_weights_managed(_original_weights))
        {
            _weights_manager->run(_original_weights, &_reshape_weights_managed);
        }
        else
        {
            // Run weights reshaping and mark original weights tensor as unused
            _weights_reshaped.allocator()->allocate();
            _reshape_weights.run();
            _original_weights->mark_as_unused();
        }

        // Prepare GEMM
        _is_quantized ? _mm_gemmlowp.prepare() : _mm_gemm.prepare();
        if(!_weights_reshaped.is_used())
        {
            _weights_reshaped.allocator()->free();
        }

        CLScheduler::get().queue().finish();
        _is_prepared = true;
    }
}
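run() invokes prepare() on first use, but it can also be called eagerly, e.g. during a warm-up phase, so the first inference does not pay for the one-off weight reshape. A sketch, with tensor setup elided (tensors as in the earlier example):

CLGEMMConvolutionLayer conv;
conv.configure(&input, &weights, &biases, &output, PadStrideInfo(1, 1, 0, 0));
// ...allocate tensors and upload the trained weights...

conv.prepare(); // one-off: reshapes the weights and finishes the CL queue
conv.run();     // the first run no longer pays the weight-reshape cost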

◆ run()

void run ( )
override virtual

Run the kernels contained in the function.

For Neon kernels:

  • Multi-threading is used for the kernels which are parallelisable.
  • By default std::thread::hardware_concurrency() threads are used.
Note
CPPScheduler::set_num_threads() can be used to manually set the number of threads

For OpenCL kernels:

  • All the kernels are enqueued on the queue associated with CLScheduler.
  • The queue is then flushed.
Note
The function will not block until the kernels are executed. It is the user's responsibility to wait.
Will call prepare() on the first run if it has not already been done.

Implements IFunction.

Definition at line 623 of file CLGEMMConvolutionLayer.cpp.

References CLScheduler::enqueue(), CLScheduler::get(), CLGEMMConvolutionLayer::prepare(), CLActivationLayer::run(), CLGEMMLowpMatrixMultiplyCore::run(), and CLGEMM::run().

{
    prepare();

    MemoryGroupResourceScope scope_mg(_memory_group);

    // Run im2col
    if(!_skip_im2col)
    {
        CLScheduler::get().enqueue(*_im2col_kernel);
    }

    // Runs CLGEMM or CLGEMMLowpMatrixMultiplyCore functions
    if(_is_quantized)
    {
        // Run gemmlowp
        _mm_gemmlowp.run();
    }
    else
    {
        // Run gemm
        _mm_gemm.run();
    }

    // Reshape output matrix
    if(!_skip_col2im)
    {
        CLScheduler::get().enqueue(*_col2im_kernel.get(), false);
    }

    // Run the activation layer if it could not be fused in GEMM
    if(!_fuse_activation)
    {
        _activationlayer_function.run();
    }
}
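Because run() only enqueues and flushes, results must not be read on the host until the queue has been synchronised; one way is the scheduler's sync() (mapping the output with its blocking flag would also serialise):

conv.run();                // enqueues the kernels and flushes the queue
CLScheduler::get().sync(); // block until execution has finished

output.map();              // map the CL buffer for host access (blocking by default)
// ...read the results through output.buffer()...
output.unmap();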

◆ validate()

Status validate ( const ITensorInfo *        input,
                  const ITensorInfo *        weights,
                  const ITensorInfo *        biases,
                  const ITensorInfo *        output,
                  const PadStrideInfo &      conv_info,
                  const WeightsInfo &        weights_info = WeightsInfo(),
                  const Size2D &             dilation = Size2D(1U, 1U),
                  const ActivationLayerInfo &act_info = ActivationLayerInfo(),
                  unsigned int               num_groups = 1
                )
static

Static function to check if given info will lead to a valid configuration of CLGEMMConvolutionLayer.

Parameters
    [in]  input        Source tensor. 3 lower dimensions represent a single input [width, height, IFM], while every optional dimension from 4 and above represents a batch of inputs. Data types supported: QASYMM8/QASYMM8_SIGNED/F16/F32.
    [in]  weights      Weights tensor. Weights are a 4D tensor with dimensions [kernel_x, kernel_y, IFM, OFM]. Data type supported: same as input, or QASYMM8/QSYMM8_PER_CHANNEL when input is QASYMM8, or QASYMM8_SIGNED/QSYMM8_PER_CHANNEL when input is QASYMM8_SIGNED.
    [in]  biases       Biases tensor. Shared biases are supported. Biases are a 1D tensor with dimensions [OFM]. Data type supported: should match the input data type, except for quantized inputs, where biases should be of S32 type.
    [out] output       Destination tensor. 3 lower dimensions represent a single output [width, height, OFM], while the rest represent a batch of outputs. Data types supported: same as input.
    [in]  conv_info    Contains padding and stride information described in PadStrideInfo.
    [in]  weights_info Specifies if the weights tensor has been reshaped with CLWeightsReshapeKernel. If this is not part of the fully connected layer, the weights tensor has also been transposed with CLGEMMReshapeRHSMatrixKernel. Data type supported: same as input.
    [in]  dilation     (Optional) Dilation, in elements, across x and y. Defaults to (1, 1).
    [in]  act_info     (Optional) Activation layer information in case of a fused activation.
    [in]  num_groups   (Optional) Number of groups when performing a grouped convolution. num_groups != 1 is only supported for NCHW data layout.
Returns
    a status

Definition at line 431 of file CLGEMMConvolutionLayer.cpp.

References ActivationLayerInfo::a(), ActivationLayerInfo::activation(), WeightsInfo::are_reshaped(), ARM_COMPUTE_ERROR_ON, ARM_COMPUTE_RETURN_ERROR_ON, ARM_COMPUTE_RETURN_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN, ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_LAYOUT, ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES, ARM_COMPUTE_RETURN_ERROR_ON_MSG, ARM_COMPUTE_RETURN_ERROR_ON_NULLPTR, ARM_COMPUTE_RETURN_ON_ERROR, arm_compute::auto_init_if_empty(), ActivationLayerInfo::b(), arm_compute::BATCHES, ActivationLayerInfo::BOUNDED_RELU, arm_compute::CHANNEL, ICloneable< T >::clone(), arm_compute::misc::shape_calculator::compute_im2col_conv_shape(), arm_compute::quantization::compute_quantized_multipliers_and_shifts(), arm_compute::misc::shape_calculator::compute_weights_reshaped_shape(), arm_compute::test::validation::conv_info, arm_compute::test::validation::data_layout, ITensorInfo::data_layout(), arm_compute::test::validation::data_type, ITensorInfo::data_type(), ITensorInfo::dimension(), ActivationLayerInfo::enabled(), arm_compute::F16, arm_compute::F32, GEMMLowpOutputStageInfo::gemmlowp_max_bound, GEMMLowpOutputStageInfo::gemmlowp_min_bound, GEMMLowpOutputStageInfo::gemmlowp_multiplier, GEMMLowpOutputStageInfo::gemmlowp_multipliers, GEMMLowpOutputStageInfo::gemmlowp_offset, GEMMLowpOutputStageInfo::gemmlowp_shift, GEMMLowpOutputStageInfo::gemmlowp_shifts, arm_compute::get_data_layout_dimension_index(), arm_compute::get_quantized_activation_min_max(), arm_compute::HEIGHT, arm_compute::test::validation::idx_height, arm_compute::test::validation::idx_width, arm_compute::test::validation::input, arm_compute::is_data_type_quantized_asymmetric(), arm_compute::is_data_type_quantized_per_channel(), GEMMLowpOutputStageInfo::is_quantized_per_channel, ActivationLayerInfo::LU_BOUNDED_RELU, arm_compute::NCHW, arm_compute::NHWC, ITensorInfo::num_dimensions(), arm_compute::test::validation::num_groups, arm_compute::QASYMM8, arm_compute::QASYMM8_SIGNED, ITensorInfo::quantization_info(), arm_compute::QUANTIZE_DOWN_FIXEDPOINT, ActivationLayerInfo::RELU, arm_compute::S32, arm_compute::scaled_dimensions(), TensorShape::set(), arm_compute::test::validation::set_data_layout(), PadStrideInfo::stride(), ITensorInfo::tensor_shape(), ITensorInfo::total_size(), GEMMLowpOutputStageInfo::type, QuantizationInfo::uniform(), CLActivationLayer::validate(), CLCol2ImKernel::validate(), CLConvolutionLayerReshapeWeights::validate(), CLIm2ColKernel::validate(), and arm_compute::WIDTH.

Referenced by CLGEMMConvolutionLayer::configure(), and CLConvolutionLayer::validate().

{
    ARM_COMPUTE_RETURN_ERROR_ON_NULLPTR(input, weights, output);
    ARM_COMPUTE_RETURN_ERROR_ON_MSG(weights_info.are_reshaped(), "Weights already reshaped are not supported!");
    ARM_COMPUTE_RETURN_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN(input, 1, DataType::QASYMM8, DataType::QASYMM8_SIGNED, DataType::F16, DataType::F32);
    const bool is_quantized_per_channel = is_data_type_quantized_per_channel(weights->data_type());

    if(!is_quantized_per_channel)
    {
        ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES(input, weights);
    }
    ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_LAYOUT(input, weights);
    ARM_COMPUTE_RETURN_ERROR_ON_MSG((num_groups != 1) && (input->data_layout() != DataLayout::NCHW), "Grouping (num_groups != 1) with NHWC data layout is not supported");
    ARM_COMPUTE_RETURN_ERROR_ON_MSG((num_groups != 1) && (input->data_type() == DataType::QASYMM8), "Grouping (num_groups != 1) is not supported with QASYMM8");
    ARM_COMPUTE_RETURN_ERROR_ON(((input->dimension(2) / weights->dimension(2)) != num_groups) && (input->data_layout() == DataLayout::NCHW));

    const DataLayout data_layout = input->data_layout();
    const DataType   data_type   = input->data_type();
    const int        idx_width   = get_data_layout_dimension_index(data_layout, DataLayoutDimension::WIDTH);
    const int        idx_height  = get_data_layout_dimension_index(data_layout, DataLayoutDimension::HEIGHT);
    const int        idx_channel = get_data_layout_dimension_index(data_layout, DataLayoutDimension::CHANNEL);
    const int        idx_kernels = get_data_layout_dimension_index(data_layout, DataLayoutDimension::BATCHES);

    const unsigned int kernel_width  = weights->dimension(idx_width);
    const unsigned int kernel_height = weights->dimension(idx_height);
    const unsigned int num_kernels   = weights->dimension(idx_kernels);

    TensorInfo         im2col_reshaped_info{};
    TensorInfo         info_gemm{};
    TensorInfo         weights_reshaped_info{};
    const ITensorInfo *gemm_input_to_use  = input;
    const ITensorInfo *gemm_output_to_use = output;
    const ITensorInfo *weights_to_use     = weights;
    const bool         is_quantized       = is_data_type_quantized_asymmetric(data_type);
    const bool         skip_im2col        = (data_layout == DataLayout::NHWC && kernel_width == 1 && kernel_height == 1 && conv_info.stride().first == 1 && conv_info.stride().second == 1);
    const bool         skip_col2im        = data_layout == DataLayout::NHWC;
    bool               fuse_activation    = true;

    ARM_COMPUTE_RETURN_ERROR_ON((weights->dimension(idx_channel) * num_groups) != input->dimension(idx_channel));
    ARM_COMPUTE_RETURN_ERROR_ON(weights->num_dimensions() > 4);

    // Validate biases
    if(biases != nullptr)
    {
        if(is_quantized)
        {
            ARM_COMPUTE_RETURN_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN(biases, 1, DataType::S32);
        }
        else
        {
            ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES(input, biases);
        }
        ARM_COMPUTE_RETURN_ERROR_ON(biases->dimension(0) != weights->dimension(idx_kernels));
        ARM_COMPUTE_RETURN_ERROR_ON(biases->num_dimensions() > 1);
    }

    if(act_info.enabled())
    {
        ARM_COMPUTE_ERROR_ON(act_info.b() > act_info.a());
    }

    // Get convolved dimensions
    unsigned int conv_w = 0;
    unsigned int conv_h = 0;

    std::tie(conv_w, conv_h) = scaled_dimensions(input->dimension(idx_width),
                                                 input->dimension(idx_height),
                                                 kernel_width,
                                                 kernel_height,
                                                 conv_info,
                                                 dilation);

    unsigned int mat_weights_cols = num_kernels / num_groups;

    const ITensorInfo *biases_to_use = biases;
    bool               append_bias   = false;

    if(num_groups != 1 && biases != nullptr)
    {
        // num_groups != 1 can only be for NCHW
        // Since a utility function to reshape the biases is missing, we append the biases to the weights tensor
        biases_to_use = nullptr;
        append_bias   = true;

        ARM_COMPUTE_RETURN_ON_ERROR(CLConvolutionLayerReshapeWeights::validate(weights, biases, nullptr, num_groups));
        weights_reshaped_info = TensorInfo(compute_weights_reshaped_shape(*weights, true, num_groups), 1, data_type);
    }
    else
    {
        ARM_COMPUTE_RETURN_ON_ERROR(CLConvolutionLayerReshapeWeights::validate(weights, nullptr, nullptr, num_groups));
        weights_reshaped_info = TensorInfo(compute_weights_reshaped_shape(*weights, false, num_groups), 1, data_type);
    }

    weights_to_use = &weights_reshaped_info;

    if(!skip_im2col)
    {
        const Size2D kernel_dims(kernel_width, kernel_height);

        // Output tensor auto initialization if not yet initialized
        TensorShape expected_output_shape = compute_im2col_conv_shape(input, kernel_dims, conv_info, append_bias, dilation, num_groups == 1, num_groups);

        auto_init_if_empty(im2col_reshaped_info, input->clone()->set_tensor_shape(expected_output_shape));

        ARM_COMPUTE_RETURN_ON_ERROR(CLIm2ColKernel::validate(input, &im2col_reshaped_info, kernel_dims, conv_info, append_bias, dilation, num_groups));
        gemm_input_to_use = &im2col_reshaped_info;
    }

    // Create GEMM output tensor
    if(!skip_col2im)
    {
        TensorShape shape_gemm;

        shape_gemm = gemm_input_to_use->tensor_shape();
        shape_gemm.set(0, mat_weights_cols);
        shape_gemm.set(1, conv_w * conv_h);

        info_gemm = TensorInfo(shape_gemm, 1, data_type);
        info_gemm.set_quantization_info(output->quantization_info()).set_data_layout(input->data_layout());
        gemm_output_to_use = &info_gemm;
    }

    GEMMLowpOutputStageInfo gemmlowp_output_stage;
    gemmlowp_output_stage.type                     = GEMMLowpOutputStageType::QUANTIZE_DOWN_FIXEDPOINT;
    gemmlowp_output_stage.gemmlowp_offset          = 0;
    gemmlowp_output_stage.is_quantized_per_channel = is_quantized_per_channel;

    if(is_quantized)
    {
        const UniformQuantizationInfo iq_info           = input->quantization_info().uniform();
        const UniformQuantizationInfo oq_info           = output->quantization_info().uniform();
        const auto                    output_quant_info = (output->total_size() == 0) ? iq_info : oq_info;
        const unsigned int            num_filters       = (is_quantized_per_channel) ? num_kernels : 1;

        gemmlowp_output_stage.gemmlowp_multipliers.resize(num_filters);
        gemmlowp_output_stage.gemmlowp_shifts.resize(num_filters);
        quantization::compute_quantized_multipliers_and_shifts(input,
                                                               weights,
                                                               output,
                                                               idx_kernels,
                                                               gemmlowp_output_stage.gemmlowp_multipliers.data(),
                                                               gemmlowp_output_stage.gemmlowp_shifts.data());
        gemmlowp_output_stage.gemmlowp_multiplier = gemmlowp_output_stage.gemmlowp_multipliers[0];
        gemmlowp_output_stage.gemmlowp_shift      = gemmlowp_output_stage.gemmlowp_shifts[0];

        int min_activation = 0;
        int max_activation = 0;

        const std::set<ActivationLayerInfo::ActivationFunction> supported_acts = { ActivationLayerInfo::ActivationFunction::RELU,
                                                                                   ActivationLayerInfo::ActivationFunction::BOUNDED_RELU,
                                                                                   ActivationLayerInfo::ActivationFunction::LU_BOUNDED_RELU
                                                                                 };

        if(act_info.enabled())
        {
            if(supported_acts.count(act_info.activation()) != 0)
            {
                std::tie(min_activation, max_activation) = get_quantized_activation_min_max(act_info, data_type, output_quant_info);
            }
            else
            {
                fuse_activation = false;
            }
        }

        // Set the GEMMLowp output stage info
        gemmlowp_output_stage.gemmlowp_offset    = output_quant_info.offset;
        gemmlowp_output_stage.gemmlowp_min_bound = min_activation;
        gemmlowp_output_stage.gemmlowp_max_bound = max_activation;
    }

    // In case of NHWC, we need to run GEMM3D (gemm_3d_depth != 0) in order to avoid reshaping the output matrix
    const unsigned int gemm_3d_depth = (data_layout == DataLayout::NHWC) ? conv_h : 0;

    ARM_COMPUTE_RETURN_ON_ERROR(validate_mm(gemm_input_to_use, weights_to_use, biases_to_use, gemm_output_to_use, gemmlowp_output_stage, gemm_3d_depth, skip_im2col, act_info));

    // Validate Col2Im
    if(!skip_col2im)
    {
        ARM_COMPUTE_RETURN_ON_ERROR(CLCol2ImKernel::validate(gemm_output_to_use, output, Size2D(conv_w, conv_h), num_groups));
    }

    // Validate Activation Layer
    if(!fuse_activation)
    {
        ARM_COMPUTE_RETURN_ON_ERROR(CLActivationLayer::validate(output, nullptr, act_info));
    }

    return Status{};
}
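Because validate() is static and operates purely on ITensorInfo metadata, a configuration can be rejected before any OpenCL memory is allocated. A sketch with illustrative shapes (a 1x1 convolution over a 56x56x64 feature map):

#include <iostream>

#include "arm_compute/core/TensorInfo.h"
#include "arm_compute/core/Types.h"
#include "arm_compute/runtime/CL/functions/CLGEMMConvolutionLayer.h"

using namespace arm_compute;

bool check_conv_config()
{
    const TensorInfo src(TensorShape(56U, 56U, 64U), 1, DataType::F32);
    const TensorInfo wei(TensorShape(1U, 1U, 64U, 256U), 1, DataType::F32);
    const TensorInfo bia(TensorShape(256U), 1, DataType::F32);
    const TensorInfo dst(TensorShape(56U, 56U, 256U), 1, DataType::F32);

    const Status status = CLGEMMConvolutionLayer::validate(&src, &wei, &bia, &dst, PadStrideInfo(1, 1, 0, 0));
    if(!bool(status))
    {
        std::cerr << "Invalid configuration: " << status.error_description() << std::endl;
    }
    return bool(status);
}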

The documentation for this class was generated from the following files:
  • CLGEMMConvolutionLayer.h
  • CLGEMMConvolutionLayer.cpp