Compute Library
 19.08
CLGEMMConvolutionLayer Class Reference

Basic function to compute the convolution layer. More...

#include <CLGEMMConvolutionLayer.h>

Collaboration diagram for CLGEMMConvolutionLayer:

Public Member Functions

 CLGEMMConvolutionLayer (std::shared_ptr< IMemoryManager > memory_manager=nullptr)
 Default constructor. More...
 
 CLGEMMConvolutionLayer (const CLGEMMConvolutionLayer &)=delete
 Prevent instances of this class from being copied (As this class contains pointers) More...
 
 CLGEMMConvolutionLayer (CLGEMMConvolutionLayer &&)=default
 Default move constructor. More...
 
CLGEMMConvolutionLayer & operator= (const CLGEMMConvolutionLayer &)=delete
 Prevent instances of this class from being copied (As this class contains pointers) More...
 
CLGEMMConvolutionLayer & operator= (CLGEMMConvolutionLayer &&)=default
 Default move assignment operator. More...
 
void configure (const ICLTensor *input, const ICLTensor *weights, const ICLTensor *biases, ICLTensor *output, const PadStrideInfo &conv_info, const WeightsInfo &weights_info=WeightsInfo(), const Size2D &dilation=Size2D(1U, 1U), const ActivationLayerInfo &act_info=ActivationLayerInfo(), unsigned int num_groups=1)
 Set the input and output tensors. More...
 
void run () override
 Run the kernels contained in the function. More...
 
void prepare () override
 Prepare the function for executing. More...
 
- Public Member Functions inherited from IFunction
virtual ~IFunction ()=default
 Destructor. More...
 

Static Public Member Functions

static Status validate (const ITensorInfo *input, const ITensorInfo *weights, const ITensorInfo *biases, const ITensorInfo *output, const PadStrideInfo &conv_info, const WeightsInfo &weights_info=WeightsInfo(), const Size2D &dilation=Size2D(1U, 1U), const ActivationLayerInfo &act_info=ActivationLayerInfo(), unsigned int num_groups=1)
 Static function to check if given info will lead to a valid configuration of CLGEMMConvolutionLayer. More...
 

Detailed Description

Basic function to compute the convolution layer.

This function calls the following OpenCL kernels/functions:

  1. CLIm2ColKernel
  2. CLGEMM (if the data type is FP32 or FP16)
  3. CLGEMMLowpMatrixMultiplyCore (if the data type is QASYMM8)
  4. CLGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPoint (if the data type is QASYMM8)
  5. CLElementwiseOperationKernel for addition (if biases != nullptr and we have a 1x1 convolution with the NHWC data layout)
  6. CLCol2ImKernel (if NCHW data layout)

Definition at line 94 of file CLGEMMConvolutionLayer.h.

Constructor & Destructor Documentation

◆ CLGEMMConvolutionLayer() [1/3]

CLGEMMConvolutionLayer ( std::shared_ptr< IMemoryManager > memory_manager = nullptr)

Default constructor.

Parameters
[in] memory_manager (Optional) Memory manager.

Definition at line 93 of file CLGEMMConvolutionLayer.cpp.

94  : _memory_group(memory_manager), _reshape_weights(), _im2col_kernel(), _mm_gemm(memory_manager), _mm_gemmlowp(memory_manager), _col2im_kernel(), _activationlayer_function(),
95  _original_weights(nullptr), _im2col_output(), _weights_reshaped(), _gemm_output(), _skip_im2col(false), _skip_col2im(false), _is_quantized(false), _fuse_activation(true), _is_prepared(false)
96 {
97 }

◆ CLGEMMConvolutionLayer() [2/3]

CLGEMMConvolutionLayer ( const CLGEMMConvolutionLayer & )
delete

Prevent instances of this class from being copied (As this class contains pointers)

◆ CLGEMMConvolutionLayer() [3/3]

CLGEMMConvolutionLayer ( CLGEMMConvolutionLayer && )
default

Default move constructor.

Member Function Documentation

◆ configure()

void configure ( const ICLTensor * input,
const ICLTensor * weights,
const ICLTensor * biases,
ICLTensor * output,
const PadStrideInfo & conv_info,
const WeightsInfo & weights_info = WeightsInfo(),
const Size2D & dilation = Size2D(1U, 1U),
const ActivationLayerInfo & act_info = ActivationLayerInfo(),
unsigned int  num_groups = 1 
)

Set the input and output tensors.

Parameters
[in] input Source tensor. 3 lower dimensions represent a single input [width, height, IFM], while every optional dimension from 4 and above represents a batch of inputs. Data types supported: QASYMM8/F16/F32.
[in] weights Weights tensor. Weights are a 4D tensor with dimensions [kernel_x, kernel_y, IFM, OFM]. Data type supported: Same as input.
[in] biases Biases tensor. Shared biases are supported. Biases are a 1D tensor with dimensions [OFM]. Data type supported: Should match the input data type, except for input of QASYMM8 type, where biases should be of S32 type.
[out] output Destination tensor. 3 lower dimensions represent a single output [width, height, OFM], while the rest represent a batch of outputs. Data types supported: Same as input.
[in] conv_info Contains padding and stride information described in PadStrideInfo.
[in] weights_info Specifies if the weights tensor has been reshaped with CLWeightsReshapeKernel. If this is not part of the fully connected layer, the weights tensor has also been transposed with CLGEMMReshapeRHSMatrixKernel. Data type supported: Same as input.
[in] dilation (Optional) Dilation, in elements, across x and y. Defaults to (1, 1).
[in] act_info (Optional) Activation layer information in case of a fused activation.
[in] num_groups (Optional) Number of groups when performing a grouped convolution. num_groups != 1 is only supported for the NCHW data layout.

Definition at line 177 of file CLGEMMConvolutionLayer.cpp.

179 {
180  ARM_COMPUTE_ERROR_ON_NULLPTR(input, weights, output);
181 
182  ARM_COMPUTE_ERROR_THROW_ON(CLGEMMConvolutionLayer::validate(input->info(),
183  weights->info(),
184  biases != nullptr ? biases->info() : nullptr,
185  output->info(),
186  conv_info,
187  weights_info,
188  dilation,
189  act_info,
190  num_groups));
191 
192  const DataType data_type = input->info()->data_type();
193  const DataLayout data_layout = input->info()->data_layout();
194  const int idx_width = get_data_layout_dimension_index(data_layout, DataLayoutDimension::WIDTH);
195  const int idx_height = get_data_layout_dimension_index(data_layout, DataLayoutDimension::HEIGHT);
196  const int idx_kernels = get_data_layout_dimension_index(data_layout, DataLayoutDimension::BATCHES);
197 
198  const unsigned int kernel_width = weights->info()->dimension(idx_width);
199  const unsigned int kernel_height = weights->info()->dimension(idx_height);
200 
201  const UniformQuantizationInfo iq_info = input->info()->quantization_info().uniform();
202  const UniformQuantizationInfo wq_info = weights->info()->quantization_info().uniform();
203  const UniformQuantizationInfo oq_info = output->info()->quantization_info().uniform();
204 
205  _is_prepared = weights_info.retain_internal_weights();
206  _original_weights = weights;
207  _is_quantized = is_data_type_quantized_asymmetric(input->info()->data_type());
208  _skip_im2col = (data_layout == DataLayout::NHWC && kernel_width == 1 && kernel_height == 1 && conv_info.stride().first == 1 && conv_info.stride().second == 1);
209  _skip_col2im = data_layout == DataLayout::NHWC;
210 
211  // Only for quantize there are few cases where we cannot fuse the activation function in GEMM
212  _fuse_activation = true;
213 
214  // Set the GPU target for im2col and col2im
215  _im2col_kernel.set_target(CLScheduler::get().target());
216  _col2im_kernel.set_target(CLScheduler::get().target());
217 
218  const ICLTensor *gemm_input_to_use = input;
219  ICLTensor *gemm_output_to_use = output;
220 
221  // Get parameters from conv_info
222  unsigned int stride_x = 0;
223  unsigned int stride_y = 0;
224  std::tie(stride_x, stride_y) = conv_info.stride();
225 
226  // Get convolved dimensions
227  unsigned int conv_w = 0;
228  unsigned int conv_h = 0;
229  std::tie(conv_w, conv_h) = scaled_dimensions(input->info()->dimension(idx_width),
230  input->info()->dimension(idx_height),
231  kernel_width,
232  kernel_height,
233  conv_info,
234  dilation);
235 
236  unsigned int mat_weights_cols = weights->info()->dimension(idx_kernels) / num_groups;
237 
238  const ICLTensor *biases_to_use = biases;
239  bool append_bias = false;
240 
241  if(num_groups != 1 && biases != nullptr)
242  {
243  // num_groups != 1 can only be for NCHW
244  // Since it is missing an utility function to reshape the biases, we append the biases into the weights tensor
245  biases_to_use = nullptr;
246  append_bias = true;
247 
248  _reshape_weights.configure(weights, biases, &_weights_reshaped, num_groups);
249  }
250  else
251  {
252  _reshape_weights.configure(weights, nullptr, &_weights_reshaped, num_groups);
253  }
254 
255  // Create tensor to store im2col reshaped inputs
256  if(!_skip_im2col)
257  {
258  _memory_group.manage(&_im2col_output);
259 
260  // Configure and tune im2col. im2col output shape is auto-initialized
261  _im2col_kernel.configure(input, &_im2col_output, Size2D(kernel_width, kernel_height), conv_info, append_bias, dilation, num_groups);
262 
263  // Set quantization info
264  _im2col_output.info()->set_quantization_info(input->info()->quantization_info());
265  CLScheduler::get().tune_kernel_static(_im2col_kernel);
266 
267  // Update GEMM input
268  gemm_input_to_use = &_im2col_output;
269  }
270 
271  // Create GEMM output tensor
272  if(!_skip_col2im)
273  {
274  TensorShape shape_gemm;
275 
276  // If we cannot skip col2im it means we run im2col as well
277  shape_gemm = _im2col_output.info()->tensor_shape();
278  shape_gemm.set(0, mat_weights_cols);
279  shape_gemm.set(1, conv_w * conv_h);
280 
281  // TODO(COMPMID-2078): input->clone() doesn't work with subtensors for grouped convolutions.
282  TensorInfo info_gemm(shape_gemm, 1, data_type);
283  info_gemm.set_quantization_info(output->info()->quantization_info()).set_data_layout(input->info()->data_layout());
284  _gemm_output.allocator()->init(info_gemm);
285  _memory_group.manage(&_gemm_output);
286 
287  // Update GEMM output
288  gemm_output_to_use = &_gemm_output;
289  }
290 
291  GEMMLowpOutputStageInfo gemmlowp_output_stage;
292  gemmlowp_output_stage.type = GEMMLowpOutputStageType::QUANTIZE_DOWN_FIXEDPOINT;
293  gemmlowp_output_stage.gemmlowp_offset = 0;
294  gemmlowp_output_stage.gemmlowp_multiplier = 0;
295  gemmlowp_output_stage.gemmlowp_shift = 0;
296 
297  // Configure output stage for quantized case
298  if(_is_quantized)
299  {
300  const auto output_quant_info = (output->info()->total_size() == 0) ? iq_info : oq_info;
301 
302  const float multiplier = (iq_info.scale * wq_info.scale) / output_quant_info.scale;
303  int output_multiplier = 0;
304  int output_shift = 0;
305  quantization::calculate_quantized_multiplier_less_than_one(multiplier, &output_multiplier, &output_shift);
306 
307  int min_activation = 0;
308  int max_activation = 0;
309 
310  const std::set<ActivationLayerInfo::ActivationFunction> supported_acts = { ActivationLayerInfo::ActivationFunction::RELU,
311  ActivationLayerInfo::ActivationFunction::BOUNDED_RELU,
312  ActivationLayerInfo::ActivationFunction::LU_BOUNDED_RELU
313  };
314 
315  if(act_info.enabled())
316  {
317  if(supported_acts.count(act_info.activation()) != 0)
318  {
319  const int a_const_int = quantize_qasymm8(act_info.a(), output_quant_info);
320  const int b_const_int = quantize_qasymm8(act_info.b(), output_quant_info);
321 
322  min_activation = act_info.activation() != ActivationLayerInfo::ActivationFunction::LU_BOUNDED_RELU ? output_quant_info.offset : b_const_int;
323  max_activation = act_info.activation() == ActivationLayerInfo::ActivationFunction::RELU ? 255 : a_const_int;
324  }
325  else
326  {
327  _fuse_activation = false;
328  }
329  }
330 
331  // Set the GEMMLowp output stage info
332  gemmlowp_output_stage.gemmlowp_offset = output_quant_info.offset;
333  gemmlowp_output_stage.gemmlowp_multiplier = output_multiplier;
334  gemmlowp_output_stage.gemmlowp_shift = output_shift;
335  gemmlowp_output_stage.gemmlowp_min_bound = min_activation;
336  gemmlowp_output_stage.gemmlowp_max_bound = max_activation;
337  }
338 
339  // Configure and tune GEMM
340  // In case of NHWC, we need to run GEMM3D (gemm_3d_depth != 0) in order to avoid reshaping the output matrix
341  const unsigned int gemm_3d_depth = (data_layout == DataLayout::NHWC) ? conv_h : 0;
342 
343  configure_mm(gemm_input_to_use, &_weights_reshaped, biases_to_use, gemm_output_to_use, gemmlowp_output_stage, gemm_3d_depth, act_info);
344 
345  if(!_skip_im2col)
346  {
347  _im2col_output.allocator()->allocate();
348  }
349 
350  if(!_skip_col2im)
351  {
352  // Configure and tune Col2Im
353  _col2im_kernel.configure(gemm_output_to_use, output, Size2D(conv_w, conv_h), num_groups);
354  CLScheduler::get().tune_kernel_static(_col2im_kernel);
355  }
356 
357  if(!_skip_col2im)
358  {
359  _gemm_output.allocator()->allocate();
360  }
361 
362  ARM_COMPUTE_ERROR_ON_MSG((output->info()->dimension(idx_width) != conv_w) || (output->info()->dimension(idx_height) != conv_h),
363  "Output shape does not match the expected one");
364 
365  if(!_fuse_activation)
366  {
367  _activationlayer_function.configure(output, nullptr, act_info);
368  }
369 
371 }

References arm_compute::test::validation::act_info, CLTensorAllocator::allocate(), CLTensor::allocator(), ARM_COMPUTE_ERROR_ON_MSG, ARM_COMPUTE_ERROR_ON_NULLPTR, ARM_COMPUTE_ERROR_THROW_ON, ARM_COMPUTE_UNUSED, arm_compute::BATCHES, ActivationLayerInfo::BOUNDED_RELU, arm_compute::quantization::calculate_quantized_multiplier_less_than_one(), CLActivationLayer::configure(), CLConvolutionLayerReshapeWeights::configure(), CLCol2ImKernel::configure(), CLIm2ColKernel::configure(), arm_compute::test::validation::conv_info, arm_compute::test::validation::data_layout, ITensorInfo::data_layout(), arm_compute::test::validation::data_type, ITensorInfo::data_type(), arm_compute::test::validation::dilation, ITensorInfo::dimension(), TensorInfo::dimension(), GEMMLowpOutputStageInfo::gemmlowp_max_bound, GEMMLowpOutputStageInfo::gemmlowp_min_bound, GEMMLowpOutputStageInfo::gemmlowp_multiplier, GEMMLowpOutputStageInfo::gemmlowp_offset, GEMMLowpOutputStageInfo::gemmlowp_shift, CLScheduler::get(), arm_compute::get_data_layout_dimension_index(), arm_compute::HEIGHT, ITensor::info(), CLTensor::info(), ITensorAllocator::init(), arm_compute::is_data_type_quantized_asymmetric(), ActivationLayerInfo::LU_BOUNDED_RELU, MemoryGroupBase< TensorType >::manage(), arm_compute::NHWC, arm_compute::test::validation::num_groups, ITensorInfo::quantization_info(), TensorInfo::quantization_info(), arm_compute::QUANTIZE_DOWN_FIXEDPOINT, arm_compute::quantize_qasymm8(), ActivationLayerInfo::RELU, UniformQuantizationInfo::scale, arm_compute::scaled_dimensions(), TensorShape::set(), arm_compute::test::validation::set_data_layout(), TensorInfo::set_quantization_info(), ICLKernel::set_target(), TensorInfo::tensor_shape(), ITensorInfo::total_size(), CLScheduler::tune_kernel_static(), GEMMLowpOutputStageInfo::type, QuantizationInfo::uniform(), CLGEMMConvolutionLayer::validate(), arm_compute::test::validation::weights, arm_compute::test::validation::weights_info, and arm_compute::WIDTH.

◆ operator=() [1/2]

CLGEMMConvolutionLayer & operator= ( const CLGEMMConvolutionLayer & )
delete

Prevent instances of this class from being copied (As this class contains pointers)

◆ operator=() [2/2]

CLGEMMConvolutionLayer& operator= ( CLGEMMConvolutionLayer &&  )
default

Default move assignment operator.

◆ prepare()

void prepare ( )
overridevirtual

Prepare the function for executing.

Any one-off pre-processing step required by the function is handled here.

Note
Prepare stage might not need all the function's buffers' backing memory to be available in order to execute

Reimplemented from IFunction.

Definition at line 600 of file CLGEMMConvolutionLayer.cpp.

601 {
602  if(!_is_prepared)
603  {
604  ARM_COMPUTE_ERROR_ON(!_original_weights->is_used());
605 
606  // Run weights reshaping and mark original weights tensor as unused
607  _weights_reshaped.allocator()->allocate();
608  _reshape_weights.run();
609  _original_weights->mark_as_unused();
610 
611  // Prepare GEMM
612  _is_quantized ? _mm_gemmlowp.prepare() : _mm_gemm.prepare();
613  if(!_weights_reshaped.is_used())
614  {
615  _weights_reshaped.allocator()->free();
616  }
617 
618  CLScheduler::get().queue().finish();
619  _is_prepared = true;
620  }
621 }

References CLTensorAllocator::allocate(), CLTensor::allocator(), ARM_COMPUTE_ERROR_ON, CLTensorAllocator::free(), CLScheduler::get(), ITensor::is_used(), ITensor::mark_as_unused(), CLGEMMLowpMatrixMultiplyCore::prepare(), CLGEMM::prepare(), CLScheduler::queue(), and CLConvolutionLayerReshapeWeights::run().

Referenced by CLGEMMConvolutionLayer::run().

◆ run()

void run ( )
overridevirtual

Run the kernels contained in the function.

For NEON kernels:

  • Multi-threading is used for the kernels which are parallelisable.
  • By default std::thread::hardware_concurrency() threads are used.
Note
CPPScheduler::set_num_threads() can be used to manually set the number of threads

For OpenCL kernels:

  • All the kernels are enqueued on the queue associated with CLScheduler.
  • The queue is then flushed.
Note
The function will not block until the kernels are executed. It is the user's responsibility to wait.
Will call prepare() on first run if hasn't been done

Implements IFunction.

Definition at line 563 of file CLGEMMConvolutionLayer.cpp.

564 {
565  prepare();
566 
567  MemoryGroupResourceScope scope_mg(_memory_group);
568 
569  // Run im2col
570  if(!_skip_im2col)
571  {
572  CLScheduler::get().enqueue(_im2col_kernel);
573  }
574 
575  // Runs CLGEMM or CLGEMMLowpMatrixMultiplyCore functions
576  if(_is_quantized)
577  {
578  // Run gemmlowp
579  _mm_gemmlowp.run();
580  }
581  else
582  {
583  // Run gemm
584  _mm_gemm.run();
585  }
586 
587  // Reshape output matrix
588  if(!_skip_col2im)
589  {
590  CLScheduler::get().enqueue(_col2im_kernel, false);
591  }
592 
593  //Run Activation Layer if we cannot fuse in GEMM
594  if(!_fuse_activation)
595  {
596  _activationlayer_function.run();
597  }
598 }

References CLScheduler::enqueue(), CLScheduler::get(), CLGEMMConvolutionLayer::prepare(), ICLSimpleFunction::run(), CLGEMMLowpMatrixMultiplyCore::run(), and CLGEMM::run().

◆ validate()

Status validate ( const ITensorInfo * input,
const ITensorInfo * weights,
const ITensorInfo * biases,
const ITensorInfo * output,
const PadStrideInfo & conv_info,
const WeightsInfo & weights_info = WeightsInfo(),
const Size2D & dilation = Size2D(1U, 1U),
const ActivationLayerInfo & act_info = ActivationLayerInfo(),
unsigned int  num_groups = 1 
)
static

Static function to check if given info will lead to a valid configuration of CLGEMMConvolutionLayer.

Parameters
[in] input Source tensor. 3 lower dimensions represent a single input [width, height, IFM], while every optional dimension from 4 and above represents a batch of inputs. Data types supported: QASYMM8/F16/F32.
[in] weights Weights tensor. Weights are a 4D tensor with dimensions [kernel_x, kernel_y, IFM, OFM]. Data type supported: Same as input.
[in] biases Biases tensor. Shared biases are supported. Biases are a 1D tensor with dimensions [OFM]. Data type supported: Should match the input data type, except for input of QASYMM8 type, where biases should be of S32 type.
[out] output Destination tensor. 3 lower dimensions represent a single output [width, height, OFM], while the rest represent a batch of outputs. Data types supported: Same as input.
[in] conv_info Contains padding and stride information described in PadStrideInfo.
[in] weights_info Specifies if the weights tensor has been reshaped with CLWeightsReshapeKernel. If this is not part of the fully connected layer, the weights tensor has also been transposed with CLGEMMReshapeRHSMatrixKernel. Data type supported: Same as input.
[in] dilation (Optional) Dilation, in elements, across x and y. Defaults to (1, 1).
[in] act_info (Optional) Activation layer information in case of a fused activation.
[in] num_groups (Optional) Number of groups when performing a grouped convolution. num_groups != 1 is only supported for the NCHW data layout.
Returns
a status

Definition at line 373 of file CLGEMMConvolutionLayer.cpp.

375 {
376  ARM_COMPUTE_RETURN_ERROR_ON_NULLPTR(input, weights, output);
377  ARM_COMPUTE_RETURN_ERROR_ON_MSG(weights_info.are_reshaped(), "Weights already reshaped are not supported!");
381  ARM_COMPUTE_RETURN_ERROR_ON_MSG((num_groups != 1) && (input->data_layout() != DataLayout::NCHW), "Grouping (num_groups != 1) with NHWC data layout is not supported");
382  ARM_COMPUTE_RETURN_ERROR_ON_MSG((num_groups != 1) && (input->data_type() == DataType::QASYMM8), "Grouping (num_groups != 1) is not supported with QASYMM8");
383  ARM_COMPUTE_RETURN_ERROR_ON(((input->dimension(2) / weights->dimension(2)) != num_groups) && (input->data_layout() == DataLayout::NCHW));
384 
385  const DataLayout data_layout = input->data_layout();
386  const DataType data_type = input->data_type();
387  const int idx_width = get_data_layout_dimension_index(data_layout, DataLayoutDimension::WIDTH);
388  const int idx_height = get_data_layout_dimension_index(data_layout, DataLayoutDimension::HEIGHT);
389  const int idx_channel = get_data_layout_dimension_index(data_layout, DataLayoutDimension::CHANNEL);
390  const int idx_kernels = get_data_layout_dimension_index(data_layout, DataLayoutDimension::BATCHES);
391 
392  const unsigned int kernel_width = weights->dimension(idx_width);
393  const unsigned int kernel_height = weights->dimension(idx_height);
394 
395  TensorInfo im2col_reshaped_info{};
396  TensorInfo info_gemm{};
397  TensorInfo weights_reshaped_info{};
398  const ITensorInfo *gemm_input_to_use = input;
399  const ITensorInfo *gemm_output_to_use = output;
400  const ITensorInfo *weights_to_use = weights;
401 
402  const bool is_quantized = is_data_type_quantized_asymmetric(data_type);
403  const bool skip_im2col = (data_layout == DataLayout::NHWC && kernel_width == 1 && kernel_height == 1 && conv_info.stride().first == 1 && conv_info.stride().second == 1);
404  const bool skip_col2im = data_layout == DataLayout::NHWC;
405  bool fuse_activation = true;
406 
407  const UniformQuantizationInfo iq_info = input->quantization_info().uniform();
408  const UniformQuantizationInfo wq_info = weights->quantization_info().uniform();
409  const UniformQuantizationInfo oq_info = output->quantization_info().uniform();
410 
411  ARM_COMPUTE_RETURN_ERROR_ON((weights->dimension(idx_channel) * num_groups) != input->dimension(idx_channel));
412  ARM_COMPUTE_RETURN_ERROR_ON(weights->num_dimensions() > 4);
413 
414  // Validate biases
415  if(biases != nullptr)
416  {
417  if(is_quantized)
418  {
419  ARM_COMPUTE_RETURN_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN(biases, 1, DataType::S32);
420  }
421  else
422  {
423  ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES(input, biases);
424  }
425  ARM_COMPUTE_RETURN_ERROR_ON(biases->dimension(0) != weights->dimension(idx_kernels));
426  ARM_COMPUTE_RETURN_ERROR_ON(biases->num_dimensions() > 1);
427  }
428 
429  if(act_info.enabled())
430  {
432  }
433 
434  // Get convolved dimensions
435  unsigned int conv_w = 0;
436  unsigned int conv_h = 0;
437 
438  std::tie(conv_w, conv_h) = scaled_dimensions(input->dimension(idx_width),
439  input->dimension(idx_height),
440  kernel_width,
441  kernel_height,
442  conv_info,
443  dilation);
444 
445  unsigned int mat_weights_cols = weights->dimension(idx_kernels) / num_groups;
446 
447  const ITensorInfo *biases_to_use = biases;
448  bool append_bias = false;
449 
450  if(num_groups != 1 && biases != nullptr)
451  {
452  // num_groups != 1 can only be for NCHW
453  // Since it is missing an utility function to reshape the biases, we append the biases into the weights tensor
454  biases_to_use = nullptr;
455  append_bias = true;
456 
457  ARM_COMPUTE_RETURN_ON_ERROR(CLConvolutionLayerReshapeWeights::validate(weights, biases, nullptr, num_groups));
458  weights_reshaped_info = TensorInfo(compute_weights_reshaped_shape(*weights, true, num_groups), 1, data_type);
459  }
460  else
461  {
462  ARM_COMPUTE_RETURN_ON_ERROR(CLConvolutionLayerReshapeWeights::validate(weights, nullptr, nullptr, num_groups));
463  weights_reshaped_info = TensorInfo(compute_weights_reshaped_shape(*weights, false, num_groups), 1, data_type);
464  }
465 
466  weights_to_use = &weights_reshaped_info;
467 
468  if(!skip_im2col)
469  {
470  const Size2D kernel_dims(kernel_width, kernel_height);
471 
472  // Output tensor auto initialization if not yet initialized
473  TensorShape expected_output_shape = compute_im2col_conv_shape(input, kernel_dims, conv_info, append_bias, dilation, num_groups == 1, num_groups);
474 
475  auto_init_if_empty(im2col_reshaped_info, input->clone()->set_tensor_shape(expected_output_shape));
476 
477  ARM_COMPUTE_RETURN_ON_ERROR(CLIm2ColKernel::validate(input, &im2col_reshaped_info, kernel_dims, conv_info, append_bias, dilation, num_groups));
478  gemm_input_to_use = &im2col_reshaped_info;
479  }
480 
481  // Create GEMM output tensor
482  if(!skip_col2im)
483  {
484  TensorShape shape_gemm;
485 
486  shape_gemm = gemm_input_to_use->tensor_shape();
487  shape_gemm.set(0, mat_weights_cols);
488  shape_gemm.set(1, conv_w * conv_h);
489 
490  info_gemm = TensorInfo(shape_gemm, 1, data_type);
491  info_gemm.set_quantization_info(output->quantization_info()).set_data_layout(input->data_layout());
492  gemm_output_to_use = &info_gemm;
493  }
494 
495  GEMMLowpOutputStageInfo gemmlowp_output_stage;
496  gemmlowp_output_stage.type = GEMMLowpOutputStageType::QUANTIZE_DOWN_FIXEDPOINT;
497  gemmlowp_output_stage.gemmlowp_offset = 0;
498  gemmlowp_output_stage.gemmlowp_multiplier = 0;
499  gemmlowp_output_stage.gemmlowp_shift = 0;
500 
501  if(is_quantized)
502  {
503  const auto output_quant_info = (output->total_size() == 0) ? iq_info : oq_info;
504 
505  const float multiplier = (iq_info.scale * wq_info.scale) / output_quant_info.scale;
506  int output_multiplier = 0;
507  int output_shift = 0;
508 
509  ARM_COMPUTE_RETURN_ON_ERROR(quantization::calculate_quantized_multiplier_less_than_one(multiplier, &output_multiplier, &output_shift));
510 
511  int min_activation = 0;
512  int max_activation = 0;
513 
514  const std::set<ActivationLayerInfo::ActivationFunction> supported_acts = { ActivationLayerInfo::ActivationFunction::RELU,
515  ActivationLayerInfo::ActivationFunction::BOUNDED_RELU,
516  ActivationLayerInfo::ActivationFunction::LU_BOUNDED_RELU
517  };
518 
519  if(act_info.enabled())
520  {
521  if(supported_acts.count(act_info.activation()) != 0)
522  {
523  const int a_const_int = quantize_qasymm8(act_info.a(), output_quant_info);
524  const int b_const_int = quantize_qasymm8(act_info.b(), output_quant_info);
525 
526  min_activation = act_info.activation() != ActivationLayerInfo::ActivationFunction::LU_BOUNDED_RELU ? output_quant_info.offset : b_const_int;
527  max_activation = act_info.activation() == ActivationLayerInfo::ActivationFunction::RELU ? 255 : a_const_int;
528  }
529  else
530  {
531  fuse_activation = false;
532  }
533  }
534 
535  // Set the GEMMLowp output stage info
536  gemmlowp_output_stage.gemmlowp_offset = output_quant_info.offset;
537  gemmlowp_output_stage.gemmlowp_multiplier = output_multiplier;
538  gemmlowp_output_stage.gemmlowp_shift = output_shift;
539  gemmlowp_output_stage.gemmlowp_min_bound = min_activation;
540  gemmlowp_output_stage.gemmlowp_max_bound = max_activation;
541  }
542 
543  // In case of NHWC, we need to run GEMM3D (gemm_3d_depth != 0) in order to avoid reshaping the output matrix
544  const unsigned int gemm_3d_depth = (data_layout == DataLayout::NHWC) ? conv_h : 0;
545 
546  ARM_COMPUTE_RETURN_ON_ERROR(validate_mm(gemm_input_to_use, weights_to_use, biases_to_use, gemm_output_to_use, gemmlowp_output_stage, gemm_3d_depth, skip_im2col, act_info));
547 
548  // Validate Col2Im
549  if(!skip_col2im)
550  {
551  ARM_COMPUTE_RETURN_ON_ERROR(CLCol2ImKernel::validate(gemm_output_to_use, output, Size2D(conv_w, conv_h), num_groups));
552  }
553 
554  // Validate Activation Layer
555  if(!fuse_activation)
556  {
557  ARM_COMPUTE_RETURN_ON_ERROR(CLActivationLayer::validate(output, nullptr, act_info));
558  }
559 
560  return Status{};
561 }

References arm_compute::test::validation::act_info, ARM_COMPUTE_ERROR_ON, ARM_COMPUTE_RETURN_ERROR_ON, ARM_COMPUTE_RETURN_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN, ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_LAYOUT, ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES, ARM_COMPUTE_RETURN_ERROR_ON_MSG, ARM_COMPUTE_RETURN_ERROR_ON_NULLPTR, ARM_COMPUTE_RETURN_ON_ERROR, arm_compute::auto_init_if_empty(), arm_compute::BATCHES, ActivationLayerInfo::BOUNDED_RELU, arm_compute::quantization::calculate_quantized_multiplier_less_than_one(), arm_compute::CHANNEL, ICloneable< T >::clone(), arm_compute::misc::shape_calculator::compute_im2col_conv_shape(), arm_compute::misc::shape_calculator::compute_weights_reshaped_shape(), arm_compute::test::validation::conv_info, arm_compute::test::validation::data_layout, ITensorInfo::data_layout(), arm_compute::test::validation::data_type, ITensorInfo::data_type(), arm_compute::test::validation::dilation, ITensorInfo::dimension(), arm_compute::F16, arm_compute::F32, GEMMLowpOutputStageInfo::gemmlowp_max_bound, GEMMLowpOutputStageInfo::gemmlowp_min_bound, GEMMLowpOutputStageInfo::gemmlowp_multiplier, GEMMLowpOutputStageInfo::gemmlowp_offset, GEMMLowpOutputStageInfo::gemmlowp_shift, arm_compute::get_data_layout_dimension_index(), arm_compute::HEIGHT, arm_compute::is_data_type_quantized_asymmetric(), ActivationLayerInfo::LU_BOUNDED_RELU, arm_compute::NCHW, arm_compute::NHWC, ITensorInfo::num_dimensions(), arm_compute::test::validation::num_groups, arm_compute::QASYMM8, ITensorInfo::quantization_info(), arm_compute::QUANTIZE_DOWN_FIXEDPOINT, arm_compute::quantize_qasymm8(), ActivationLayerInfo::RELU, arm_compute::S32, UniformQuantizationInfo::scale, arm_compute::scaled_dimensions(), TensorShape::set(), arm_compute::test::validation::set_data_layout(), ITensorInfo::tensor_shape(), ITensorInfo::total_size(), GEMMLowpOutputStageInfo::type, QuantizationInfo::uniform(), CLActivationLayer::validate(), CLConvolutionLayerReshapeWeights::validate(), 
CLCol2ImKernel::validate(), CLIm2ColKernel::validate(), arm_compute::test::validation::weights, arm_compute::test::validation::weights_info, and arm_compute::WIDTH.

Referenced by CLGEMMConvolutionLayer::configure(), and CLConvolutionLayer::validate().


The documentation for this class was generated from the following files: