Compute Library 19.11
CLGEMMConvolutionLayer Class Reference

Basic function to compute the convolution layer. More...

#include <CLGEMMConvolutionLayer.h>


Public Member Functions

 CLGEMMConvolutionLayer (std::shared_ptr< IMemoryManager > memory_manager=nullptr, IWeightsManager *weights_manager=nullptr)
 Constructor. More...
 
 CLGEMMConvolutionLayer (const CLGEMMConvolutionLayer &)=delete
 Prevent instances of this class from being copied (As this class contains pointers) More...
 
 CLGEMMConvolutionLayer (CLGEMMConvolutionLayer &&)=default
 Default move constructor. More...
 
CLGEMMConvolutionLayer & operator= (const CLGEMMConvolutionLayer &)=delete
 Prevent instances of this class from being copied (As this class contains pointers) More...
 
CLGEMMConvolutionLayer & operator= (CLGEMMConvolutionLayer &&)=default
 Default move assignment operator. More...
 
void configure (const ICLTensor *input, const ICLTensor *weights, const ICLTensor *biases, ICLTensor *output, const PadStrideInfo &conv_info, const WeightsInfo &weights_info=WeightsInfo(), const Size2D &dilation=Size2D(1U, 1U), const ActivationLayerInfo &act_info=ActivationLayerInfo(), unsigned int num_groups=1)
 Set the input and output tensors. More...
 
void run () override
 Run the kernels contained in the function. More...
 
void prepare () override
 Prepare the function for executing. More...
 
- Public Member Functions inherited from IFunction
virtual ~IFunction ()=default
 Destructor. More...
 

Static Public Member Functions

static Status validate (const ITensorInfo *input, const ITensorInfo *weights, const ITensorInfo *biases, const ITensorInfo *output, const PadStrideInfo &conv_info, const WeightsInfo &weights_info=WeightsInfo(), const Size2D &dilation=Size2D(1U, 1U), const ActivationLayerInfo &act_info=ActivationLayerInfo(), unsigned int num_groups=1)
 Static function to check if given info will lead to a valid configuration of CLGEMMConvolutionLayer. More...
 

Detailed Description

Basic function to compute the convolution layer.

This function calls the following OpenCL kernels/functions:

  1. CLIm2ColKernel
  2. CLGEMM (if the data type is FP32 or FP16)
  3. CLGEMMLowpMatrixMultiplyCore (if the data type is QASYMM8)
  4. CLGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPoint (if the data type is QASYMM8)
  5. CLElementwiseOperationKernel for addition (if biases != nullptr and we have a 1x1 convolution with the NHWC data layout)
  6. CLCol2ImKernel (if NCHW data layout)

Definition at line 149 of file CLGEMMConvolutionLayer.h.
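
The following is an editorial usage sketch rather than part of the generated documentation. It assumes an FP32 NCHW workload; the tensor names (src, weights, bias, dst) and shapes are illustrative only.

#include "arm_compute/runtime/CL/CLScheduler.h"
#include "arm_compute/runtime/CL/CLTensor.h"
#include "arm_compute/runtime/CL/functions/CLGEMMConvolutionLayer.h"

using namespace arm_compute;

void run_gemm_convolution_example() // hypothetical helper
{
    CLScheduler::get().default_init(); // create the OpenCL context and queue

    CLTensor src{}, weights{}, bias{}, dst{};
    src.allocator()->init(TensorInfo(TensorShape(224U, 224U, 3U), 1, DataType::F32));      // [width, height, IFM]
    weights.allocator()->init(TensorInfo(TensorShape(3U, 3U, 3U, 16U), 1, DataType::F32)); // [kernel_x, kernel_y, IFM, OFM]
    bias.allocator()->init(TensorInfo(TensorShape(16U), 1, DataType::F32));                // [OFM]
    dst.allocator()->init(TensorInfo(TensorShape(224U, 224U, 16U), 1, DataType::F32));     // 3x3, stride 1, pad 1 preserves WxH

    CLGEMMConvolutionLayer conv;
    conv.configure(&src, &weights, &bias, &dst, PadStrideInfo(1, 1, 1, 1)); // stride (1,1), pad (1,1)

    src.allocator()->allocate();
    weights.allocator()->allocate();
    bias.allocator()->allocate();
    dst.allocator()->allocate();

    // ... fill src/weights/bias, e.g. via map()/unmap() ...

    conv.run();                // enqueues the kernels; does not block
    CLScheduler::get().sync(); // wait for the results before reading dst
}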

Constructor & Destructor Documentation

◆ CLGEMMConvolutionLayer() [1/3]

CLGEMMConvolutionLayer ( std::shared_ptr< IMemoryManager > memory_manager = nullptr,
                         IWeightsManager * weights_manager = nullptr
                       )

Constructor.

Parameters
    [in]  memory_manager   (Optional) Memory manager.
    [in]  weights_manager  (Optional) Weights manager.

Definition at line 96 of file CLGEMMConvolutionLayer.cpp.

CLGEMMConvolutionLayer::CLGEMMConvolutionLayer(std::shared_ptr<IMemoryManager> memory_manager, IWeightsManager *weights_manager)
    : _memory_group(memory_manager), _weights_manager(weights_manager), _reshape_weights(), _reshape_weights_managed(), _im2col_kernel(), _mm_gemm(memory_manager, weights_manager),
      _mm_gemmlowp(memory_manager), _col2im_kernel(), _activationlayer_function(), _original_weights(nullptr), _im2col_output(), _weights_reshaped(), _gemm_output(), _skip_im2col(false),
      _skip_col2im(false), _is_quantized(false), _fuse_activation(true), _is_prepared(false)
{
}

◆ CLGEMMConvolutionLayer() [2/3]

CLGEMMConvolutionLayer ( const CLGEMMConvolutionLayer & )
delete

Prevent instances of this class from being copied (As this class contains pointers)

◆ CLGEMMConvolutionLayer() [3/3]

CLGEMMConvolutionLayer ( CLGEMMConvolutionLayer && )
default

Default move constructor.

Member Function Documentation

◆ configure()

void configure ( const ICLTensor * input,
                 const ICLTensor * weights,
                 const ICLTensor * biases,
                 ICLTensor * output,
                 const PadStrideInfo & conv_info,
                 const WeightsInfo & weights_info = WeightsInfo(),
                 const Size2D & dilation = Size2D(1U, 1U),
                 const ActivationLayerInfo & act_info = ActivationLayerInfo(),
                 unsigned int num_groups = 1
               )

Set the input and output tensors.

Parameters
    [in]  input         Source tensor. 3 lower dimensions represent a single input [width, height, IFM], while every optional dimension from 4 and above represent a batch of inputs. Data types supported: QASYMM8/F16/F32.
    [in]  weights       Weights tensor. Weights are 4D tensor with dimensions [kernel_x, kernel_y, IFM, OFM]. Data type supported: Same as input or QASYMM8/QSYMM8_PER_CHANNEL when input is QASYMM8.
    [in]  biases        Biases tensor. Shared biases supported. Biases are 1D tensor with dimensions [OFM]. Data type supported: Should match input data type, except for input of QASYMM8 type where biases should be of S32 type.
    [out] output        Destination tensor. 3 lower dimensions represent a single output [width, height, OFM], while the rest represent batch of outputs. Data types supported: Same as input.
    [in]  conv_info     Contains padding and stride information described in PadStrideInfo.
    [in]  weights_info  Specifies if the weights tensor has been reshaped with CLWeightsReshapeKernel. If this is not part of the fully connected layer the weights tensor has also been transposed with CLGEMMReshapeRHSMatrixKernel. Data type supported: Same as input.
    [in]  dilation      (Optional) Dilation, in elements, across x and y. Defaults to (1, 1).
    [in]  act_info      (Optional) Activation layer information in case of a fused activation.
    [in]  num_groups    (Optional) Number of groups when performing a grouped convolution. num_groups != 1 is only supported for NCHW data layout.
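
As an editorial illustration of the quantized-type rules above (shapes and quantization parameters are made up; only the data-type combinations follow the documented constraints): a QASYMM8 configuration uses S32 biases and can fuse a bounded ReLU into the GEMMLowp output stage.

CLTensor src{}, weights{}, bias{}, dst{};
src.allocator()->init(TensorInfo(TensorShape(56U, 56U, 64U), 1, DataType::QASYMM8, QuantizationInfo(0.05f, 10)));
weights.allocator()->init(TensorInfo(TensorShape(1U, 1U, 64U, 128U), 1, DataType::QASYMM8, QuantizationInfo(0.01f, 0)));
bias.allocator()->init(TensorInfo(TensorShape(128U), 1, DataType::S32)); // S32 biases are required for QASYMM8 input
dst.allocator()->init(TensorInfo(TensorShape(56U, 56U, 128U), 1, DataType::QASYMM8, QuantizationInfo(0.1f, 5)));

CLGEMMConvolutionLayer conv;
conv.configure(&src, &weights, &bias, &dst, PadStrideInfo(1, 1, 0, 0), WeightsInfo(), Size2D(1U, 1U),
               ActivationLayerInfo(ActivationLayerInfo::ActivationFunction::BOUNDED_RELU, 6.f));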

Definition at line 181 of file CLGEMMConvolutionLayer.cpp.

void CLGEMMConvolutionLayer::configure(const ICLTensor *input, const ICLTensor *weights, const ICLTensor *biases, ICLTensor *output, const PadStrideInfo &conv_info, const WeightsInfo &weights_info,
                                       const Size2D &dilation, const ActivationLayerInfo &act_info, unsigned int num_groups)
{
    ARM_COMPUTE_ERROR_ON_NULLPTR(input, weights, output);

    ARM_COMPUTE_ERROR_THROW_ON(CLGEMMConvolutionLayer::validate(input->info(),
                                                                weights->info(),
                                                                biases != nullptr ? biases->info() : nullptr,
                                                                output->info(),
                                                                conv_info,
                                                                weights_info,
                                                                dilation,
                                                                act_info,
                                                                num_groups));

    const DataType   data_type   = input->info()->data_type();
    const DataLayout data_layout = input->info()->data_layout();
    const int        idx_width   = get_data_layout_dimension_index(data_layout, DataLayoutDimension::WIDTH);
    const int        idx_height  = get_data_layout_dimension_index(data_layout, DataLayoutDimension::HEIGHT);
    const int        idx_kernels = get_data_layout_dimension_index(data_layout, DataLayoutDimension::BATCHES);

    const unsigned int kernel_width  = weights->info()->dimension(idx_width);
    const unsigned int kernel_height = weights->info()->dimension(idx_height);
    const unsigned int num_kernels   = weights->info()->dimension(idx_kernels);

    const UniformQuantizationInfo iq_info = input->info()->quantization_info().uniform();
    const UniformQuantizationInfo oq_info = output->info()->quantization_info().uniform();

    _is_prepared      = weights_info.retain_internal_weights();
    _original_weights = weights;
    _is_quantized     = is_data_type_quantized_asymmetric(input->info()->data_type());
    _skip_im2col      = (data_layout == DataLayout::NHWC && kernel_width == 1 && kernel_height == 1 && conv_info.stride().first == 1 && conv_info.stride().second == 1);
    _skip_col2im      = data_layout == DataLayout::NHWC;

    // Only for quantize there are few cases where we cannot fuse the activation function in GEMM
    _fuse_activation = true;

    // Set the GPU target for im2col and col2im
    _im2col_kernel.set_target(CLScheduler::get().target());
    _col2im_kernel.set_target(CLScheduler::get().target());

    const ICLTensor *gemm_input_to_use  = input;
    ICLTensor       *gemm_output_to_use = output;

    // Get parameters from conv_info
    unsigned int stride_x = 0;
    unsigned int stride_y = 0;
    std::tie(stride_x, stride_y) = conv_info.stride();

    // Get convolved dimensions
    unsigned int conv_w = 0;
    unsigned int conv_h = 0;
    std::tie(conv_w, conv_h) = scaled_dimensions(input->info()->dimension(idx_width),
                                                 input->info()->dimension(idx_height),
                                                 kernel_width,
                                                 kernel_height,
                                                 conv_info,
                                                 dilation);

    unsigned int mat_weights_cols = num_kernels / num_groups;

    const ICLTensor *biases_to_use = biases;
    bool             append_bias   = false;

    ICLTensor *weights_to_use = &_weights_reshaped;
    if(num_groups != 1 && biases != nullptr)
    {
        // num_groups != 1 can only be for NCHW
        // Since it is missing an utility function to reshape the biases, we append the biases into the weights tensor
        biases_to_use = nullptr;
        append_bias   = true;

        if(_weights_manager && _weights_manager->are_weights_managed(weights))
        {
            _reshape_weights_managed.configure(weights, biases, num_groups);
            weights_to_use = utils::cast::polymorphic_downcast<ICLTensor *>(_weights_manager->acquire(weights, &_reshape_weights_managed));
        }
        else
        {
            _reshape_weights.configure(weights, biases, &_weights_reshaped, num_groups);
        }
    }
    else
    {
        if(_weights_manager && _weights_manager->are_weights_managed(weights))
        {
            _reshape_weights_managed.configure(weights, nullptr, num_groups);
            weights_to_use = utils::cast::polymorphic_downcast<ICLTensor *>(_weights_manager->acquire(weights, &_reshape_weights_managed));
        }
        else
        {
            _reshape_weights.configure(weights, nullptr, &_weights_reshaped, num_groups);
        }
    }

    // Create tensor to store im2col reshaped inputs
    if(!_skip_im2col)
    {
        _memory_group.manage(&_im2col_output);

        // Configure and tune im2col. im2col output shape is auto-initialized
        _im2col_kernel.configure(input, &_im2col_output, Size2D(kernel_width, kernel_height), conv_info, append_bias, dilation, num_groups);

        // Set quantization info
        _im2col_output.info()->set_quantization_info(input->info()->quantization_info());
        CLScheduler::get().tune_kernel_static(_im2col_kernel);

        // Update GEMM input
        gemm_input_to_use = &_im2col_output;
    }

    // Create GEMM output tensor
    if(!_skip_col2im)
    {
        TensorShape shape_gemm;

        // If we cannot skip col2im it means we run im2col as well
        shape_gemm = _im2col_output.info()->tensor_shape();
        shape_gemm.set(0, mat_weights_cols);
        shape_gemm.set(1, conv_w * conv_h);

        // TODO(COMPMID-2078): input->clone() doesn't work with subtensors for grouped convolutions.
        TensorInfo info_gemm(shape_gemm, 1, data_type);
        info_gemm.set_quantization_info(output->info()->quantization_info()).set_data_layout(input->info()->data_layout());
        _gemm_output.allocator()->init(info_gemm);
        _memory_group.manage(&_gemm_output);

        // Update GEMM output
        gemm_output_to_use = &_gemm_output;
    }

    GEMMLowpOutputStageInfo gemmlowp_output_stage;
    gemmlowp_output_stage.type            = GEMMLowpOutputStageType::QUANTIZE_DOWN_FIXEDPOINT;
    gemmlowp_output_stage.gemmlowp_offset = 0;

    // Configure output stage for quantized case
    if(_is_quantized)
    {
        const auto         output_quant_info        = (output->info()->total_size() == 0) ? iq_info : oq_info;
        const bool         is_quantized_per_channel = is_data_type_quantized_per_channel(weights->info()->data_type());
        const unsigned int num_filters              = (is_quantized_per_channel) ? num_kernels : 1;

        gemmlowp_output_stage.is_quantized_per_channel = is_quantized_per_channel;

        gemmlowp_output_stage.gemmlowp_multipliers.resize(num_filters);
        gemmlowp_output_stage.gemmlowp_shifts.resize(num_filters);
        quantization::compute_quantized_multipliers_and_shifts(input->info(),
                                                               weights->info(),
                                                               output->info(),
                                                               idx_kernels,
                                                               gemmlowp_output_stage.gemmlowp_multipliers.data(),
                                                               gemmlowp_output_stage.gemmlowp_shifts.data());
        gemmlowp_output_stage.gemmlowp_multiplier = gemmlowp_output_stage.gemmlowp_multipliers[0];
        gemmlowp_output_stage.gemmlowp_shift      = gemmlowp_output_stage.gemmlowp_shifts[0];

        int min_activation = 0;
        int max_activation = 0;

        const std::set<ActivationLayerInfo::ActivationFunction> supported_acts = { ActivationLayerInfo::ActivationFunction::RELU,
                                                                                   ActivationLayerInfo::ActivationFunction::BOUNDED_RELU,
                                                                                   ActivationLayerInfo::ActivationFunction::LU_BOUNDED_RELU
                                                                                 };

        if(act_info.enabled())
        {
            if(supported_acts.count(act_info.activation()) != 0)
            {
                const int a_const_int = quantize_qasymm8(act_info.a(), output_quant_info);
                const int b_const_int = quantize_qasymm8(act_info.b(), output_quant_info);

                min_activation = act_info.activation() != ActivationLayerInfo::ActivationFunction::LU_BOUNDED_RELU ? output_quant_info.offset : b_const_int;
                max_activation = act_info.activation() == ActivationLayerInfo::ActivationFunction::RELU ? 255 : a_const_int;
            }
            else
            {
                _fuse_activation = false;
            }
        }

        // Set the GEMMLowp output stage info
        gemmlowp_output_stage.gemmlowp_offset    = output_quant_info.offset;
        gemmlowp_output_stage.gemmlowp_min_bound = min_activation;
        gemmlowp_output_stage.gemmlowp_max_bound = max_activation;
    }

    // Configure and tune GEMM
    // In case of NHWC, we need to run GEMM3D (gemm_3d_depth != 0) in order to avoid reshaping the output matrix
    const unsigned int gemm_3d_depth = (data_layout == DataLayout::NHWC) ? conv_h : 0;

    configure_mm(gemm_input_to_use, weights_to_use, biases_to_use, gemm_output_to_use, gemmlowp_output_stage, gemm_3d_depth, act_info);

    if(!_skip_im2col)
    {
        _im2col_output.allocator()->allocate();
    }

    if(!_skip_col2im)
    {
        // Configure and tune Col2Im
        _col2im_kernel.configure(gemm_output_to_use, output, Size2D(conv_w, conv_h), num_groups);
        CLScheduler::get().tune_kernel_static(_col2im_kernel);
    }

    if(!_skip_col2im)
    {
        _gemm_output.allocator()->allocate();
    }

    ARM_COMPUTE_ERROR_ON_MSG((output->info()->dimension(idx_width) != conv_w) || (output->info()->dimension(idx_height) != conv_h),
                             "Output shape does not match the expected one");

    if(!_fuse_activation)
    {
        _activationlayer_function.configure(output, nullptr, act_info);
    }

    ARM_COMPUTE_UNUSED(weights_info);
}

References IWeightsManager::acquire(), arm_compute::test::validation::act_info, CLTensorAllocator::allocate(), CLTensor::allocator(), IWeightsManager::are_weights_managed(), ARM_COMPUTE_ERROR_ON_MSG, ARM_COMPUTE_ERROR_ON_NULLPTR, ARM_COMPUTE_ERROR_THROW_ON, ARM_COMPUTE_UNUSED, arm_compute::BATCHES, ActivationLayerInfo::BOUNDED_RELU, arm_compute::quantization::compute_quantized_multipliers_and_shifts(), CLActivationLayer::configure(), CLConvolutionLayerReshapeWeights::configure(), CLCol2ImKernel::configure(), CLIm2ColKernel::configure(), CLConvolutionLayerReshapeWeightsTransform::configure(), arm_compute::test::validation::conv_info, arm_compute::test::validation::data_layout, arm_compute::test::validation::data_type, TensorInfo::data_type(), arm_compute::test::validation::dilation, ITensorInfo::dimension(), TensorInfo::dimension(), GEMMLowpOutputStageInfo::gemmlowp_max_bound, GEMMLowpOutputStageInfo::gemmlowp_min_bound, GEMMLowpOutputStageInfo::gemmlowp_multiplier, GEMMLowpOutputStageInfo::gemmlowp_multipliers, GEMMLowpOutputStageInfo::gemmlowp_offset, GEMMLowpOutputStageInfo::gemmlowp_shift, GEMMLowpOutputStageInfo::gemmlowp_shifts, CLScheduler::get(), arm_compute::get_data_layout_dimension_index(), arm_compute::HEIGHT, ITensor::info(), CLTensor::info(), ITensorAllocator::init(), arm_compute::test::validation::input, arm_compute::is_data_type_quantized_asymmetric(), arm_compute::is_data_type_quantized_per_channel(), GEMMLowpOutputStageInfo::is_quantized_per_channel, ActivationLayerInfo::LU_BOUNDED_RELU, MemoryGroup::manage(), arm_compute::NHWC, arm_compute::test::validation::num_groups, ITensorInfo::quantization_info(), arm_compute::QUANTIZE_DOWN_FIXEDPOINT, arm_compute::quantize_qasymm8(), ActivationLayerInfo::RELU, arm_compute::scaled_dimensions(), TensorShape::set(), arm_compute::test::validation::set_data_layout(), TensorInfo::set_quantization_info(), ICLKernel::set_target(), TensorInfo::tensor_shape(), ITensorInfo::total_size(), CLScheduler::tune_kernel_static(), GEMMLowpOutputStageInfo::type, QuantizationInfo::uniform(), CLGEMMConvolutionLayer::validate(), arm_compute::test::validation::weights, arm_compute::test::validation::weights_info, and arm_compute::WIDTH.

◆ operator=() [1/2]

CLGEMMConvolutionLayer & operator= ( const CLGEMMConvolutionLayer & )
delete

Prevent instances of this class from being copied (As this class contains pointers)

◆ operator=() [2/2]

CLGEMMConvolutionLayer & operator= ( CLGEMMConvolutionLayer && )
default

Default move assignment operator.

◆ prepare()

void prepare ( )
override virtual

Prepare the function for executing.

Any one-off pre-processing step required by the function is handled here.

Note
Prepare stage might not need all the function's buffers' backing memory to be available in order to execute

Reimplemented from IFunction.
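
Calling prepare() explicitly is optional, since run() invokes it on first use, but doing so lets the one-off weight-reshaping cost be paid outside the latency-critical path. A sketch, reusing the illustrative tensor names from the earlier example:

CLGEMMConvolutionLayer conv;
conv.configure(&src, &weights, &bias, &dst, PadStrideInfo(1, 1, 1, 1));
// ... allocate tensors and upload weights ...
conv.prepare(); // reshapes the weights now; may mark the original weights tensor as unused

for(int i = 0; i < num_frames; ++i) // hypothetical inference loop
{
    // ... upload the next input into src ...
    conv.run();                // no weight-reshaping cost here
    CLScheduler::get().sync();
}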

Definition at line 637 of file CLGEMMConvolutionLayer.cpp.

void CLGEMMConvolutionLayer::prepare()
{
    if(!_is_prepared)
    {
        ARM_COMPUTE_ERROR_ON(!_original_weights->is_used());
        if(_weights_manager && _weights_manager->are_weights_managed(_original_weights))
        {
            _weights_manager->run(_original_weights, &_reshape_weights_managed);
        }
        else
        {
            // Run weights reshaping and mark original weights tensor as unused
            _weights_reshaped.allocator()->allocate();
            _reshape_weights.run();
            _original_weights->mark_as_unused();
        }

        // Prepare GEMM
        _is_quantized ? _mm_gemmlowp.prepare() : _mm_gemm.prepare();
        if(!_weights_reshaped.is_used())
        {
            _weights_reshaped.allocator()->free();
        }

        CLScheduler::get().queue().finish();
        _is_prepared = true;
    }
}

References CLTensorAllocator::allocate(), CLTensor::allocator(), IWeightsManager::are_weights_managed(), ARM_COMPUTE_ERROR_ON, CLTensorAllocator::free(), CLScheduler::get(), ITensor::is_used(), ITensor::mark_as_unused(), CLGEMMLowpMatrixMultiplyCore::prepare(), CLGEMM::prepare(), CLScheduler::queue(), IWeightsManager::run(), and CLConvolutionLayerReshapeWeights::run().

Referenced by CLGEMMConvolutionLayer::run().

◆ run()

void run ( )
override virtual

Run the kernels contained in the function.

For NEON kernels:

  • Multi-threading is used for the kernels which are parallelisable.
  • By default std::thread::hardware_concurrency() threads are used.
Note
CPPScheduler::set_num_threads() can be used to manually set the number of threads

For OpenCL kernels:

  • All the kernels are enqueued on the queue associated with CLScheduler.
  • The queue is then flushed.
Note
The function will not block until the kernels are executed. It is the user's responsibility to wait.
Will call prepare() on the first run if it hasn't been done already.

Implements IFunction.
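
Because run() only enqueues and flushes, the caller must synchronize before reading the output back. A sketch using the standard CLTensor map()/unmap() pattern (tensor names as in the earlier illustration):

conv.run();                // enqueue the kernels; returns immediately

CLScheduler::get().sync(); // block until the command queue has finished

dst.map(true);             // blocking map of the output buffer
// ... read the results through dst.buffer() ...
dst.unmap();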

Definition at line 600 of file CLGEMMConvolutionLayer.cpp.

void CLGEMMConvolutionLayer::run()
{
    prepare();

    MemoryGroupResourceScope scope_mg(_memory_group);

    // Run im2col
    if(!_skip_im2col)
    {
        CLScheduler::get().enqueue(_im2col_kernel);
    }

    // Runs CLGEMM or CLGEMMLowpMatrixMultiplyCore functions
    if(_is_quantized)
    {
        // Run gemmlowp
        _mm_gemmlowp.run();
    }
    else
    {
        // Run gemm
        _mm_gemm.run();
    }

    // Reshape output matrix
    if(!_skip_col2im)
    {
        CLScheduler::get().enqueue(_col2im_kernel, false);
    }

    // Run Activation Layer if we cannot fuse in GEMM
    if(!_fuse_activation)
    {
        _activationlayer_function.run();
    }
}

References CLScheduler::enqueue(), CLScheduler::get(), CLGEMMConvolutionLayer::prepare(), ICLSimpleFunction::run(), CLGEMMLowpMatrixMultiplyCore::run(), and CLGEMM::run().

◆ validate()

Status validate ( const ITensorInfo * input,
                  const ITensorInfo * weights,
                  const ITensorInfo * biases,
                  const ITensorInfo * output,
                  const PadStrideInfo & conv_info,
                  const WeightsInfo & weights_info = WeightsInfo(),
                  const Size2D & dilation = Size2D(1U, 1U),
                  const ActivationLayerInfo & act_info = ActivationLayerInfo(),
                  unsigned int num_groups = 1
                )
static

Static function to check if given info will lead to a valid configuration of CLGEMMConvolutionLayer.

Parameters
    [in]  input         Source tensor. 3 lower dimensions represent a single input [width, height, IFM], while every optional dimension from 4 and above represent a batch of inputs. Data types supported: QASYMM8/F16/F32.
    [in]  weights       Weights tensor. Weights are 4D tensor with dimensions [kernel_x, kernel_y, IFM, OFM]. Data type supported: Same as input or QASYMM8/QSYMM8_PER_CHANNEL when input is QASYMM8.
    [in]  biases        Biases tensor. Shared biases supported. Biases are 1D tensor with dimensions [OFM]. Data type supported: Should match input data type, except for input of QASYMM8 type where biases should be of S32 type.
    [out] output        Destination tensor. 3 lower dimensions represent a single output [width, height, OFM], while the rest represent batch of outputs. Data types supported: Same as input.
    [in]  conv_info     Contains padding and stride information described in PadStrideInfo.
    [in]  weights_info  Specifies if the weights tensor has been reshaped with CLWeightsReshapeKernel. If this is not part of the fully connected layer the weights tensor has also been transposed with CLGEMMReshapeRHSMatrixKernel. Data type supported: Same as input.
    [in]  dilation      (Optional) Dilation, in elements, across x and y. Defaults to (1, 1).
    [in]  act_info      (Optional) Activation layer information in case of a fused activation.
    [in]  num_groups    (Optional) Number of groups when performing a grouped convolution. num_groups != 1 is only supported for NCHW data layout.
Returns
a status
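
validate() mirrors configure() but operates on ITensorInfo objects only, so a configuration can be checked before any OpenCL buffers are allocated. A sketch of the usual pattern (the TensorInfo values are illustrative; the local names are hypothetical):

const TensorInfo src_nfo(TensorShape(224U, 224U, 3U), 1, DataType::F32);
const TensorInfo weights_nfo(TensorShape(3U, 3U, 3U, 16U), 1, DataType::F32);
const TensorInfo bias_nfo(TensorShape(16U), 1, DataType::F32);
const TensorInfo dst_nfo(TensorShape(224U, 224U, 16U), 1, DataType::F32);

const Status st = CLGEMMConvolutionLayer::validate(&src_nfo, &weights_nfo, &bias_nfo, &dst_nfo,
                                                   PadStrideInfo(1, 1, 1, 1));
if(!bool(st))
{
    // Configuration is not supported; inspect st.error_description() and fall back to another method.
}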

Definition at line 400 of file CLGEMMConvolutionLayer.cpp.

Status CLGEMMConvolutionLayer::validate(const ITensorInfo *input, const ITensorInfo *weights, const ITensorInfo *biases, const ITensorInfo *output, const PadStrideInfo &conv_info,
                                        const WeightsInfo &weights_info, const Size2D &dilation, const ActivationLayerInfo &act_info, unsigned int num_groups)
{
    ARM_COMPUTE_RETURN_ERROR_ON_NULLPTR(input, weights, output);
    ARM_COMPUTE_RETURN_ERROR_ON_MSG(weights_info.are_reshaped(), "Weights already reshaped are not supported!");
    ARM_COMPUTE_RETURN_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN(input, 1, DataType::QASYMM8, DataType::F16, DataType::F32);
    const bool is_quantized_per_channel = is_data_type_quantized_per_channel(weights->data_type());

    if(is_quantized_per_channel)
    {
        ARM_COMPUTE_RETURN_ERROR_ON_MSG(input->data_type() != DataType::QASYMM8, "Input data type not compatible with Weights");
    }
    else
    {
        ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES(input, weights);
    }
    ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_LAYOUT(input, output);
    ARM_COMPUTE_RETURN_ERROR_ON_MSG((num_groups != 1) && (input->data_layout() != DataLayout::NCHW), "Grouping (num_groups != 1) with NHWC data layout is not supported");
    ARM_COMPUTE_RETURN_ERROR_ON_MSG((num_groups != 1) && (input->data_type() == DataType::QASYMM8), "Grouping (num_groups != 1) is not supported with QASYMM8");
    ARM_COMPUTE_RETURN_ERROR_ON(((input->dimension(2) / weights->dimension(2)) != num_groups) && (input->data_layout() == DataLayout::NCHW));

    const DataLayout data_layout = input->data_layout();
    const DataType   data_type   = input->data_type();
    const int        idx_width   = get_data_layout_dimension_index(data_layout, DataLayoutDimension::WIDTH);
    const int        idx_height  = get_data_layout_dimension_index(data_layout, DataLayoutDimension::HEIGHT);
    const int        idx_channel = get_data_layout_dimension_index(data_layout, DataLayoutDimension::CHANNEL);
    const int        idx_kernels = get_data_layout_dimension_index(data_layout, DataLayoutDimension::BATCHES);

    const unsigned int kernel_width  = weights->dimension(idx_width);
    const unsigned int kernel_height = weights->dimension(idx_height);
    const unsigned int num_kernels   = weights->dimension(idx_kernels);

    TensorInfo         im2col_reshaped_info{};
    TensorInfo         info_gemm{};
    TensorInfo         weights_reshaped_info{};
    const ITensorInfo *gemm_input_to_use  = input;
    const ITensorInfo *gemm_output_to_use = output;
    const ITensorInfo *weights_to_use     = weights;
    const bool         is_quantized       = is_data_type_quantized_asymmetric(data_type);
    const bool         skip_im2col        = (data_layout == DataLayout::NHWC && kernel_width == 1 && kernel_height == 1 && conv_info.stride().first == 1 && conv_info.stride().second == 1);
    const bool         skip_col2im        = data_layout == DataLayout::NHWC;
    bool               fuse_activation    = true;

    ARM_COMPUTE_RETURN_ERROR_ON((weights->dimension(idx_channel) * num_groups) != input->dimension(idx_channel));
    ARM_COMPUTE_RETURN_ERROR_ON(weights->num_dimensions() > 4);

    // Validate biases
    if(biases != nullptr)
    {
        if(is_quantized)
        {
            ARM_COMPUTE_RETURN_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN(biases, 1, DataType::S32);
        }
        else
        {
            ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES(input, biases);
        }
        ARM_COMPUTE_RETURN_ERROR_ON(biases->dimension(0) != weights->dimension(idx_kernels));
        ARM_COMPUTE_RETURN_ERROR_ON(biases->num_dimensions() > 1);
    }

    if(act_info.enabled())
    {
        ARM_COMPUTE_RETURN_ERROR_ON(act_info.b() > act_info.a());
    }

    // Get convolved dimensions
    unsigned int conv_w = 0;
    unsigned int conv_h = 0;

    std::tie(conv_w, conv_h) = scaled_dimensions(input->dimension(idx_width),
                                                 input->dimension(idx_height),
                                                 kernel_width,
                                                 kernel_height,
                                                 conv_info,
                                                 dilation);

    unsigned int mat_weights_cols = num_kernels / num_groups;

    const ITensorInfo *biases_to_use = biases;
    bool               append_bias   = false;

    if(num_groups != 1 && biases != nullptr)
    {
        // num_groups != 1 can only be for NCHW
        // Since it is missing an utility function to reshape the biases, we append the biases into the weights tensor
        biases_to_use = nullptr;
        append_bias   = true;

        ARM_COMPUTE_RETURN_ON_ERROR(CLConvolutionLayerReshapeWeights::validate(weights, biases, nullptr, num_groups));
        weights_reshaped_info = TensorInfo(compute_weights_reshaped_shape(*weights, true, num_groups), 1, data_type);
    }
    else
    {
        ARM_COMPUTE_RETURN_ON_ERROR(CLConvolutionLayerReshapeWeights::validate(weights, nullptr, nullptr, num_groups));
        weights_reshaped_info = TensorInfo(compute_weights_reshaped_shape(*weights, false, num_groups), 1, data_type);
    }

    weights_to_use = &weights_reshaped_info;

    if(!skip_im2col)
    {
        const Size2D kernel_dims(kernel_width, kernel_height);

        // Output tensor auto initialization if not yet initialized
        TensorShape expected_output_shape = compute_im2col_conv_shape(input, kernel_dims, conv_info, append_bias, dilation, num_groups == 1, num_groups);

        auto_init_if_empty(im2col_reshaped_info, input->clone()->set_tensor_shape(expected_output_shape));

        ARM_COMPUTE_RETURN_ON_ERROR(CLIm2ColKernel::validate(input, &im2col_reshaped_info, kernel_dims, conv_info, append_bias, dilation, num_groups));
        gemm_input_to_use = &im2col_reshaped_info;
    }

    // Create GEMM output tensor
    if(!skip_col2im)
    {
        TensorShape shape_gemm;

        shape_gemm = gemm_input_to_use->tensor_shape();
        shape_gemm.set(0, mat_weights_cols);
        shape_gemm.set(1, conv_w * conv_h);

        info_gemm = TensorInfo(shape_gemm, 1, data_type);
        info_gemm.set_quantization_info(output->quantization_info()).set_data_layout(input->data_layout());
        gemm_output_to_use = &info_gemm;
    }

    GEMMLowpOutputStageInfo gemmlowp_output_stage;
    gemmlowp_output_stage.type                     = GEMMLowpOutputStageType::QUANTIZE_DOWN_FIXEDPOINT;
    gemmlowp_output_stage.gemmlowp_offset          = 0;
    gemmlowp_output_stage.is_quantized_per_channel = is_quantized_per_channel;

    if(is_quantized)
    {
        const UniformQuantizationInfo iq_info           = input->quantization_info().uniform();
        const UniformQuantizationInfo oq_info           = output->quantization_info().uniform();
        const auto                    output_quant_info = (output->total_size() == 0) ? iq_info : oq_info;
        const unsigned int            num_filters       = (is_quantized_per_channel) ? num_kernels : 1;

        gemmlowp_output_stage.gemmlowp_multipliers.resize(num_filters);
        gemmlowp_output_stage.gemmlowp_shifts.resize(num_filters);
        quantization::compute_quantized_multipliers_and_shifts(input,
                                                               weights,
                                                               output,
                                                               idx_kernels,
                                                               gemmlowp_output_stage.gemmlowp_multipliers.data(),
                                                               gemmlowp_output_stage.gemmlowp_shifts.data());
        gemmlowp_output_stage.gemmlowp_multiplier = gemmlowp_output_stage.gemmlowp_multipliers[0];
        gemmlowp_output_stage.gemmlowp_shift      = gemmlowp_output_stage.gemmlowp_shifts[0];

        int min_activation = 0;
        int max_activation = 0;

        const std::set<ActivationLayerInfo::ActivationFunction> supported_acts = { ActivationLayerInfo::ActivationFunction::RELU,
                                                                                   ActivationLayerInfo::ActivationFunction::BOUNDED_RELU,
                                                                                   ActivationLayerInfo::ActivationFunction::LU_BOUNDED_RELU
                                                                                 };

        if(act_info.enabled())
        {
            if(supported_acts.count(act_info.activation()) != 0)
            {
                const int a_const_int = quantize_qasymm8(act_info.a(), output_quant_info);
                const int b_const_int = quantize_qasymm8(act_info.b(), output_quant_info);

                min_activation = act_info.activation() != ActivationLayerInfo::ActivationFunction::LU_BOUNDED_RELU ? output_quant_info.offset : b_const_int;
                max_activation = act_info.activation() == ActivationLayerInfo::ActivationFunction::RELU ? 255 : a_const_int;
            }
            else
            {
                fuse_activation = false;
            }
        }

        // Set the GEMMLowp output stage info
        gemmlowp_output_stage.gemmlowp_offset    = output_quant_info.offset;
        gemmlowp_output_stage.gemmlowp_min_bound = min_activation;
        gemmlowp_output_stage.gemmlowp_max_bound = max_activation;
    }

    // In case of NHWC, we need to run GEMM3D (gemm_3d_depth != 0) in order to avoid reshaping the output matrix
    const unsigned int gemm_3d_depth = (data_layout == DataLayout::NHWC) ? conv_h : 0;

    ARM_COMPUTE_RETURN_ON_ERROR(validate_mm(gemm_input_to_use, weights_to_use, biases_to_use, gemm_output_to_use, gemmlowp_output_stage, gemm_3d_depth, skip_im2col, act_info));

    // Validate Col2Im
    if(!skip_col2im)
    {
        ARM_COMPUTE_RETURN_ON_ERROR(CLCol2ImKernel::validate(gemm_output_to_use, output, Size2D(conv_w, conv_h), num_groups));
    }

    // Validate Activation Layer
    if(!fuse_activation)
    {
        ARM_COMPUTE_RETURN_ON_ERROR(CLActivationLayer::validate(output, nullptr, act_info));
    }

    return Status{};
}

References arm_compute::test::validation::act_info, ARM_COMPUTE_ERROR_ON, ARM_COMPUTE_RETURN_ERROR_ON, ARM_COMPUTE_RETURN_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN, ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_LAYOUT, ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES, ARM_COMPUTE_RETURN_ERROR_ON_MSG, ARM_COMPUTE_RETURN_ERROR_ON_NULLPTR, ARM_COMPUTE_RETURN_ON_ERROR, arm_compute::auto_init_if_empty(), arm_compute::BATCHES, ActivationLayerInfo::BOUNDED_RELU, arm_compute::CHANNEL, arm_compute::misc::shape_calculator::compute_im2col_conv_shape(), arm_compute::quantization::compute_quantized_multipliers_and_shifts(), arm_compute::misc::shape_calculator::compute_weights_reshaped_shape(), arm_compute::test::validation::conv_info, arm_compute::test::validation::data_layout, arm_compute::test::validation::data_type, arm_compute::test::validation::dilation, ITensorInfo::dimension(), arm_compute::F16, arm_compute::F32, GEMMLowpOutputStageInfo::gemmlowp_max_bound, GEMMLowpOutputStageInfo::gemmlowp_min_bound, GEMMLowpOutputStageInfo::gemmlowp_multiplier, GEMMLowpOutputStageInfo::gemmlowp_multipliers, GEMMLowpOutputStageInfo::gemmlowp_offset, GEMMLowpOutputStageInfo::gemmlowp_shift, GEMMLowpOutputStageInfo::gemmlowp_shifts, arm_compute::get_data_layout_dimension_index(), arm_compute::HEIGHT, arm_compute::test::validation::input, arm_compute::is_data_type_quantized_asymmetric(), arm_compute::is_data_type_quantized_per_channel(), GEMMLowpOutputStageInfo::is_quantized_per_channel, ActivationLayerInfo::LU_BOUNDED_RELU, arm_compute::NCHW, arm_compute::NHWC, ITensorInfo::num_dimensions(), arm_compute::test::validation::num_groups, arm_compute::QASYMM8, arm_compute::QSYMM8_PER_CHANNEL, ITensorInfo::quantization_info(), arm_compute::QUANTIZE_DOWN_FIXEDPOINT, arm_compute::quantize_qasymm8(), ActivationLayerInfo::RELU, arm_compute::S32, arm_compute::scaled_dimensions(), TensorShape::set(), arm_compute::test::validation::set_data_layout(), ITensorInfo::tensor_shape(), ITensorInfo::total_size(), GEMMLowpOutputStageInfo::type, QuantizationInfo::uniform(), CLActivationLayer::validate(), CLConvolutionLayerReshapeWeights::validate(), CLCol2ImKernel::validate(), CLIm2ColKernel::validate(), arm_compute::test::validation::weights, arm_compute::test::validation::weights_info, and arm_compute::WIDTH.

Referenced by CLGEMMConvolutionLayer::configure(), and CLConvolutionLayer::validate().


The documentation for this class was generated from the following files:

    CLGEMMConvolutionLayer.h
    CLGEMMConvolutionLayer.cpp