Compute Library 19.08
CLDepthwiseConvolutionLayer Class Reference

Basic function to execute a generic depthwise convolution. More...

#include <CLDepthwiseConvolutionLayer.h>

Collaboration diagram for CLDepthwiseConvolutionLayer (diagram omitted)

Public Member Functions

 CLDepthwiseConvolutionLayer ()
 Default constructor. More...
 
 CLDepthwiseConvolutionLayer (const CLDepthwiseConvolutionLayer &)=delete
 Prevent instances of this class from being copied (as this class contains pointers). More...
 
 CLDepthwiseConvolutionLayer (CLDepthwiseConvolutionLayer &&)=default
 Default move constructor. More...
 
CLDepthwiseConvolutionLayer & operator= (const CLDepthwiseConvolutionLayer &)=delete
 Prevent instances of this class from being copied (as this class contains pointers). More...
 
CLDepthwiseConvolutionLayer & operator= (CLDepthwiseConvolutionLayer &&)=default
 Default move assignment operator. More...
 
void configure (ICLTensor *input, const ICLTensor *weights, const ICLTensor *biases, ICLTensor *output, const PadStrideInfo &conv_info, unsigned int depth_multiplier=1, const ActivationLayerInfo &act_info=ActivationLayerInfo(), const Size2D &dilation=Size2D(1U, 1U))
 Initialize the function's source, destination, weights and convolution information. More...
 
void run () override
 Run the kernels contained in the function. More...
 
void prepare () override
 Prepare the function for executing. More...
 
- Public Member Functions inherited from IFunction
virtual ~IFunction ()=default
 Destructor. More...
 

Static Public Member Functions

static Status validate (const ITensorInfo *input, const ITensorInfo *weights, const ITensorInfo *biases, const ITensorInfo *output, const PadStrideInfo &conv_info, unsigned int depth_multiplier=1, const ActivationLayerInfo &act_info=ActivationLayerInfo(), const Size2D &dilation=Size2D(1U, 1U))
 Static function to check if given info will lead to a valid configuration of CLDepthwiseConvolutionLayer. More...
 

Detailed Description

Basic function to execute a generic depthwise convolution.

This function calls the following OpenCL kernels:

  1. CLDepthwiseIm2ColKernel
  2. CLGEMMMatrixVectorMultiplyKernel
  3. CLDepthwiseConvolutionLayerReshapeWeightsGenericKernel
  4. CLFillBorderKernel (if pad_x or pad_y > 0)

Definition at line 130 of file CLDepthwiseConvolutionLayer.h.
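
The following is a minimal usage sketch, not part of the generated reference; the shapes, padding values and the use of F32 data are illustrative assumptions. A 5x5 kernel is chosen deliberately so that the generic path documented here is taken (a 3x3 kernel would dispatch the optimised CLDepthwiseConvolutionLayer3x3 internally, as the configure() listing below shows).

#include "arm_compute/core/Types.h"
#include "arm_compute/runtime/CL/CLScheduler.h"
#include "arm_compute/runtime/CL/CLTensor.h"
#include "arm_compute/runtime/CL/functions/CLDepthwiseConvolutionLayer.h"

using namespace arm_compute;

int main()
{
    // Create an OpenCL context and command queue for the scheduler singleton
    CLScheduler::get().default_init();

    // Illustrative shapes: 32x32 input with 16 channels, 5x5 depthwise kernel
    CLTensor input, weights, biases, output;
    input.allocator()->init(TensorInfo(TensorShape(32U, 32U, 16U), 1, DataType::F32));
    weights.allocator()->init(TensorInfo(TensorShape(5U, 5U, 16U), 1, DataType::F32)); // [kernel_x, kernel_y, IFM]
    biases.allocator()->init(TensorInfo(TensorShape(16U), 1, DataType::F32));          // [IFM]

    // Configure before allocating: the output info is auto-initialized from
    // the computed convolution shape
    CLDepthwiseConvolutionLayer depthwise_conv;
    depthwise_conv.configure(&input, &weights, &biases, &output,
                             PadStrideInfo(1 /* stride_x */, 1 /* stride_y */,
                                           2 /* pad_x */, 2 /* pad_y */));

    input.allocator()->allocate();
    weights.allocator()->allocate();
    biases.allocator()->allocate();
    output.allocator()->allocate();

    // ... map tensors and fill input/weights/biases here ...

    depthwise_conv.run();      // enqueues the kernels; does not block
    CLScheduler::get().sync(); // wait for the results before reading output
    return 0;
}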

Constructor & Destructor Documentation

◆ CLDepthwiseConvolutionLayer() [1/3]

Default constructor.

Definition at line 249 of file CLDepthwiseConvolutionLayer.cpp.

250  : _im2col_kernel(), _weights_reshape_kernel(), _v2mm_kernel(), _vector_to_tensor_kernel(), _output_stage_kernel(), _activationlayer_function(), _v2mm_input_fill_border(), _v2mm_weights_fill_border(),
251  _input_reshaped(), _weights_reshaped(), _v2mm_output(), _output_reshaped(), _is_prepared(false), _is_quantized(false), _is_activationlayer_enabled(false), _original_weights(nullptr),
252  _optimised_function(nullptr)
253 {
254 }

◆ CLDepthwiseConvolutionLayer() [2/3]

Prevent instances of this class from being copied (as this class contains pointers).

◆ CLDepthwiseConvolutionLayer() [3/3]

Default move constructor.

Member Function Documentation

◆ configure()

void configure ( ICLTensor *input,
const ICLTensor *weights,
const ICLTensor *biases,
ICLTensor *output,
const PadStrideInfo &conv_info,
unsigned int depth_multiplier = 1,
const ActivationLayerInfo &act_info = ActivationLayerInfo(),
const Size2D &dilation = Size2D(1U, 1U)
)

Initialize the function's source, destination, weights and convolution information.

Parameters
[in,out] input             Source tensor. Data type supported: QASYMM8/F32. (Written to only for border filling.)
[in]     weights           Weights tensor. These are 3D tensors with shape [kernel_x, kernel_y, IFM]. Data type supported: same as input.
[in]     biases            Biases tensor. A 1D tensor with shape [IFM]. Must be nullptr if not needed. Data type supported: same as input, S32 when input is QASYMM8.
[out]    output            Destination tensor. Data type supported: same as input.
[in]     conv_info         Padding and stride information to use for the convolution.
[in]     depth_multiplier  (Optional) Multiplier to apply to the input's depth in order to retrieve the output's depth. Defaults to 1.
[in]     act_info          (Optional) Activation layer information in case of a fused activation.
[in]     dilation          (Optional) Dilation, in elements, across x and y. Defaults to (1, 1).

Definition at line 256 of file CLDepthwiseConvolutionLayer.cpp.

258 {
262 
263  const size_t idx_w = get_data_layout_dimension_index(input->info()->data_layout(), DataLayoutDimension::WIDTH);
264  const size_t idx_h = get_data_layout_dimension_index(input->info()->data_layout(), DataLayoutDimension::HEIGHT);
265 
266  ARM_COMPUTE_ERROR_ON(weights->info()->dimension(idx_w) + (weights->info()->dimension(idx_w) - 1) * (dilation.x() - 1) > input->info()->dimension(idx_w) + conv_info.pad_left() + conv_info.pad_right());
267  ARM_COMPUTE_ERROR_ON(weights->info()->dimension(idx_h) + (weights->info()->dimension(idx_h) - 1) * (dilation.y() - 1) > input->info()->dimension(idx_h) + conv_info.pad_top() + conv_info.pad_bottom());
268 
269  const bool can_run_optimised_3x3_kernel = (weights->info()->dimension(idx_w) == 3) && (weights->info()->dimension(idx_h) == 3);
270 
271  if(can_run_optimised_3x3_kernel)
272  {
273  auto f = arm_compute::support::cpp14::make_unique<CLDepthwiseConvolutionLayer3x3>();
274  f->configure(input, weights, biases, output, conv_info, depth_multiplier, act_info, dilation);
275  _optimised_function = std::move(f);
276  }
277  else
278  {
279  const size_t idx_c = get_data_layout_dimension_index(input->info()->data_layout(), DataLayoutDimension::CHANNEL);
280 
281  const size_t weights_w = weights->info()->dimension(idx_w);
282  const size_t weights_h = weights->info()->dimension(idx_h);
283  const size_t weights_z = weights->info()->dimension(idx_c);
284 
285  _is_prepared = false;
286  _original_weights = weights;
287  _is_quantized = is_data_type_quantized_asymmetric(input->info()->data_type());
288 
289  bool append_bias = (biases != nullptr) && !_is_quantized;
290  const GPUTarget gpu_target = CLScheduler::get().target();
291 
292  // Calculate output shape
293  TensorShape output_shape = shape_calculator::compute_depthwise_convolution_shape(*input->info(), *weights->info(), conv_info, depth_multiplier, dilation);
294 
295  // Output auto initialization if not yet initialized
296  auto_init_if_empty(*output->info(), input->info()->clone()->set_tensor_shape(output_shape));
297  ARM_COMPUTE_ERROR_ON_MISMATCHING_DIMENSIONS(output->info()->tensor_shape(), output_shape);
298 
299  // Output width and height
300  const unsigned int conv_w = output_shape[idx_w];
301  const unsigned int conv_h = output_shape[idx_h];
302 
303  // Set up intermediate tensors
304  const size_t patch_size = weights_w * weights_h + ((append_bias) ? 1 : 0);
305  const size_t conv_size = conv_w * conv_h;
306 
307  const UniformQuantizationInfo iq_info = input->info()->quantization_info().uniform();
308  const UniformQuantizationInfo wq_info = weights->info()->quantization_info().uniform();
309  const UniformQuantizationInfo oq_info = output->info()->quantization_info().uniform();
310 
311  // Im2Col configuration
312  TensorShape shape_im2col = input->info()->tensor_shape();
313  shape_im2col.set(0, patch_size);
314  shape_im2col.set(1, conv_size);
315  shape_im2col.set(2, weights_z);
316  _input_reshaped.allocator()->init(input->info()->clone()->set_is_resizable(true).reset_padding().set_tensor_shape(shape_im2col));
317  _im2col_kernel.set_target(gpu_target);
318  _im2col_kernel.configure(input, &_input_reshaped, Size2D(weights_w, weights_h), conv_info, append_bias, depth_multiplier, dilation);
319  CLScheduler::get().tune_kernel_static(_im2col_kernel);
320 
321  // Weights reshape configuration
322  const TensorShape shape_weights_reshape(patch_size, weights_z);
323  _weights_reshaped.allocator()->init(weights->info()->clone()->set_is_resizable(true).reset_padding().set_tensor_shape(shape_weights_reshape));
324  _weights_reshape_kernel.configure(weights, &_weights_reshaped, append_bias ? biases : nullptr);
325 
326  // GEMV configuration
327  DataType v2mm_dt = (input->info()->data_type() == DataType::QASYMM8) ? DataType::S32 : input->info()->data_type();
328  TensorShape shape_v2mm_out = input->info()->tensor_shape();
329  shape_v2mm_out.set(0, conv_size * weights_z);
330  shape_v2mm_out.set(1, 1);
331  shape_v2mm_out.set(2, 1);
332  _v2mm_output.allocator()->init(input->info()->clone()->set_is_resizable(true).reset_padding().set_data_type(v2mm_dt).set_tensor_shape(shape_v2mm_out));
333  _v2mm_kernel.set_target(gpu_target);
334  _v2mm_kernel.configure(&_input_reshaped, &_weights_reshaped, &_v2mm_output);
335  CLScheduler::get().tune_kernel_static(_v2mm_kernel);
336  _output_reshaped.allocator()->init(_v2mm_output.info()->clone()->set_is_resizable(true).reset_padding().set_tensor_shape(output_shape));
337  _vector_to_tensor_kernel.configure(&_v2mm_output, (_is_quantized) ? &_output_reshaped : output, conv_w, conv_h);
338 
339  // Output staged configuration
340  if(_is_quantized)
341  {
342  const UniformQuantizationInfo output_quant_info = (output->info()->total_size() == 0) ? iq_info : oq_info;
343 
344  int output_multiplier = 0;
345  int output_shift = 0;
346  const float multiplier = iq_info.scale * wq_info.scale / output_quant_info.scale;
347  quantization::calculate_quantized_multiplier_less_than_one(multiplier, &output_multiplier, &output_shift);
348  _output_stage_kernel.configure(&_output_reshaped, biases, output, output_multiplier, output_shift, output_quant_info.offset);
349  _output_reshaped.allocator()->allocate();
350  }
351 
352  // Fill borders on inputs
353  PixelValue zero_in(static_cast<int32_t>(0));
354  PixelValue zero_w(static_cast<int32_t>(0));
355  if(_is_quantized)
356  {
357  zero_in = PixelValue(static_cast<int32_t>(iq_info.offset));
358  zero_w = PixelValue(static_cast<int32_t>(wq_info.offset));
359  }
360  BorderSize border_size = _v2mm_kernel.border_size();
361  _v2mm_input_fill_border.configure(&_input_reshaped, border_size, BorderMode::CONSTANT, zero_in);
362 
363  border_size.bottom = 0;
364  _v2mm_weights_fill_border.configure(&_weights_reshaped, border_size, BorderMode::CONSTANT, zero_w);
365 
366  // Allocate intermediate tensors
367  _input_reshaped.allocator()->allocate();
368  _v2mm_output.allocator()->allocate();
369 
370  //Configure Activation Layer
371  _is_activationlayer_enabled = act_info.enabled();
372 
373  if(_is_activationlayer_enabled)
374  {
375  _activationlayer_function.configure(output, nullptr, act_info);
376  }
377  }
378 }

References arm_compute::test::validation::act_info, CLTensorAllocator::allocate(), CLTensor::allocator(), ARM_COMPUTE_ERROR_ON, ARM_COMPUTE_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN, ARM_COMPUTE_ERROR_ON_MISMATCHING_DATA_LAYOUT, ARM_COMPUTE_ERROR_ON_MISMATCHING_DATA_TYPES, ARM_COMPUTE_ERROR_ON_MISMATCHING_DIMENSIONS, arm_compute::auto_init_if_empty(), CLGEMMMatrixVectorMultiplyKernel::border_size(), BorderSize::bottom, arm_compute::quantization::calculate_quantized_multiplier_less_than_one(), arm_compute::CHANNEL, ICloneable< T >::clone(), TensorInfo::clone(), arm_compute::misc::shape_calculator::compute_depthwise_convolution_shape(), CLActivationLayer::configure(), CLGEMMMatrixVectorMultiplyKernel::configure(), CLDepthwiseConvolutionLayerReshapeWeightsGenericKernel::configure(), CLFillBorderKernel::configure(), CLDepthwiseVectorToTensorKernel::configure(), CLDepthwiseIm2ColKernel::configure(), CLDirectConvolutionLayerOutputStageKernel::configure(), arm_compute::CONSTANT, arm_compute::test::validation::conv_info, ITensorInfo::data_layout(), ITensorInfo::data_type(), arm_compute::test::validation::dilation, ITensorInfo::dimension(), TensorInfo::dimension(), arm_compute::F16, arm_compute::F32, CLScheduler::get(), arm_compute::get_data_layout_dimension_index(), arm_compute::HEIGHT, ITensor::info(), CLTensor::info(), ITensorAllocator::init(), arm_compute::is_data_type_quantized_asymmetric(), UniformQuantizationInfo::offset, arm_compute::test::validation::output_shape, arm_compute::QASYMM8, ITensorInfo::quantization_info(), TensorInfo::quantization_info(), arm_compute::S32, UniformQuantizationInfo::scale, TensorShape::set(), ICLKernel::set_target(), CLScheduler::target(), ITensorInfo::tensor_shape(), CLScheduler::tune_kernel_static(), QuantizationInfo::uniform(), arm_compute::test::validation::weights, and arm_compute::WIDTH.

Referenced by CLDepthwiseSeparableConvolutionLayer::configure().
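
As a further sketch of the optional parameters (the tensor names and shapes are assumptions carried over from the sketch above, not part of the reference): on the generic path, a depth multiplier of 2 requires the weights' channel dimension to equal IFM * depth_multiplier, and a fused activation and a dilation can be passed in the same call.

// Input has IFM = 16 channels, so the weights carry 16 * 2 = 32 planes
weights.allocator()->init(TensorInfo(TensorShape(5U, 5U, 32U), 1, DataType::F32));

CLDepthwiseConvolutionLayer conv;
conv.configure(&input, &weights, nullptr /* no biases */, &output,
               PadStrideInfo(1, 1, 1, 1),
               2 /* depth_multiplier */,
               ActivationLayerInfo(ActivationLayerInfo::ActivationFunction::RELU),
               Size2D(2U, 2U) /* dilation in x and y */);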

◆ operator=() [1/2]

Prevent instances of this class from being copied (as this class contains pointers).

◆ operator=() [2/2]

Default move assignment operator.

◆ prepare()

void prepare ( )
override virtual

Prepare the function for executing.

Any one-off pre-processing step required by the function is handled here.

Note
The prepare stage might not need all of the function's buffers' backing memory to be available in order to execute.

Reimplemented from IFunction.

Definition at line 481 of file CLDepthwiseConvolutionLayer.cpp.

482 {
483  if(_optimised_function != nullptr)
484  {
485  _optimised_function->prepare();
486  }
487  else
488  {
489  if(!_is_prepared)
490  {
491  ARM_COMPUTE_ERROR_ON(!_original_weights->is_used());
492 
493  // Run weights reshaping and mark original weights tensor as unused
494  _weights_reshaped.allocator()->allocate();
495  CLScheduler::get().enqueue(_weights_reshape_kernel);
496  CLScheduler::get().enqueue(_v2mm_weights_fill_border);
497  _original_weights->mark_as_unused();
498 
499  CLScheduler::get().queue().finish();
500  _is_prepared = true;
501  }
502  }
503 }

References CLTensorAllocator::allocate(), CLTensor::allocator(), ARM_COMPUTE_ERROR_ON, CLScheduler::enqueue(), CLScheduler::get(), ITensor::is_used(), ITensor::mark_as_unused(), and CLScheduler::queue().

Referenced by CLDepthwiseSeparableConvolutionLayer::prepare(), and CLDepthwiseConvolutionLayer::run().
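
A short sketch of front-loading the one-off weights reshaping instead of paying for it on the first run(); conv, the tensors and conv_info are assumed to be set up and allocated as in the earlier sketches:

conv.configure(&input, &weights, &biases, &output, conv_info);
// ... allocate the tensors and upload the weights ...
conv.prepare(); // reshapes the weights now and marks the original weights tensor as unused
conv.run();     // the first run() no longer pays the preparation cost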

◆ run()

void run ( )
override virtual

Run the kernels contained in the function.

For NEON kernels:

  • Multi-threading is used for the kernels which are parallelisable.
  • By default std::thread::hardware_concurrency() threads are used.
Note
CPPScheduler::set_num_threads() can be used to manually set the number of threads

For OpenCL kernels:

  • All the kernels are enqueued on the queue associated with CLScheduler.
  • The queue is then flushed.
Note
The function does not block until the kernels have finished executing. It is the user's responsibility to wait.
Will call prepare() on the first run if it hasn't already been done

Implements IFunction.

Definition at line 456 of file CLDepthwiseConvolutionLayer.cpp.

457 {
458  prepare();
459 
460  if(_optimised_function != nullptr)
461  {
462  _optimised_function->run();
463  }
464  else
465  {
466  CLScheduler::get().enqueue(_im2col_kernel);
467  CLScheduler::get().enqueue(_v2mm_input_fill_border);
468  CLScheduler::get().enqueue(_v2mm_kernel);
469  CLScheduler::get().enqueue(_vector_to_tensor_kernel);
470  if(_is_quantized)
471  {
472  CLScheduler::get().enqueue(_output_stage_kernel);
473  }
474  if(_is_activationlayer_enabled)
475  {
476  _activationlayer_function.run();
477  }
478  }
479 }

References CLScheduler::enqueue(), CLScheduler::get(), CLDepthwiseConvolutionLayer::prepare(), and ICLSimpleFunction::run().

Referenced by CLDepthwiseSeparableConvolutionLayer::run().
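
Since run() only enqueues the kernels and flushes the queue, reading the results requires an explicit synchronisation point. A minimal sketch, with the buffer-reading details elided and the tensor setup assumed from the earlier sketches:

conv.run();                // enqueue the kernels; does not block
CLScheduler::get().sync(); // block until the queue has drained

output.map(true);          // blocking map of the CL buffer for host access
// ... read the result through output.buffer() or a Window iterator ...
output.unmap();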

◆ validate()

Status validate ( const ITensorInfo *input,
const ITensorInfo *weights,
const ITensorInfo *biases,
const ITensorInfo *output,
const PadStrideInfo &conv_info,
unsigned int depth_multiplier = 1,
const ActivationLayerInfo &act_info = ActivationLayerInfo(),
const Size2D &dilation = Size2D(1U, 1U)
)
static

Static function to check if given info will lead to a valid configuration of CLDepthwiseConvolutionLayer.

Parameters
[in] input             Source tensor info. Data type supported: QASYMM8/F32.
[in] weights           Weights tensor info. These are 3D tensors with shape [kernel_x, kernel_y, IFM]. Data type supported: same as input.
[in] biases            Biases tensor info. A 1D tensor with shape [IFM]. Must be nullptr if not needed. Data type supported: same as input, S32 when input is QASYMM8.
[in] output            Destination tensor info. Data type supported: same as input.
[in] conv_info         Padding and stride information to use for the convolution.
[in] depth_multiplier  (Optional) Multiplier to apply to the input's depth in order to retrieve the output's depth. Defaults to 1.
[in] act_info          (Optional) Activation layer information in case of a fused activation.
[in] dilation          (Optional) Dilation, in elements, across x and y. Defaults to (1, 1).
Returns
A status.

Definition at line 380 of file CLDepthwiseConvolutionLayer.cpp.

382 {
383  const size_t idx_w = get_data_layout_dimension_index(input->data_layout(), DataLayoutDimension::WIDTH);
384  const size_t idx_h = get_data_layout_dimension_index(input->data_layout(), DataLayoutDimension::HEIGHT);
385 
386  ARM_COMPUTE_RETURN_ERROR_ON(weights->dimension(idx_w) + (weights->dimension(idx_w) - 1) * (dilation.x() - 1) > input->dimension(idx_w) + conv_info.pad_left() + conv_info.pad_right());
387  ARM_COMPUTE_RETURN_ERROR_ON(weights->dimension(idx_h) + (weights->dimension(idx_h) - 1) * (dilation.y() - 1) > input->dimension(idx_h) + conv_info.pad_top() + conv_info.pad_bottom());
388 
389  const bool can_run_optimised_3x3_kernel = (weights->dimension(idx_w) == 3) && (weights->dimension(idx_h) == 3);
390 
391  if(!can_run_optimised_3x3_kernel)
392  {
393  const size_t idx_c = get_data_layout_dimension_index(input->data_layout(), DataLayoutDimension::CHANNEL);
394 
396  ARM_COMPUTE_RETURN_ERROR_ON((input->dimension(idx_c) * depth_multiplier) != weights->dimension(idx_c));
397 
398  const bool is_quantized = is_data_type_quantized_asymmetric(input->data_type());
399  const bool append_bias = (biases != nullptr) && !is_quantized;
400  const TensorShape output_shape = shape_calculator::compute_depthwise_convolution_shape(*input, *weights, conv_info, depth_multiplier, dilation);
401  const size_t weights_w = weights->dimension(idx_w);
402  const size_t weights_h = weights->dimension(idx_h);
403  const size_t weights_z = weights->dimension(idx_c);
404  const unsigned int conv_w = output_shape[idx_w];
405  const unsigned int conv_h = output_shape[idx_h];
406  const size_t patch_size = weights_w * weights_h + ((append_bias) ? 1 : 0);
407  const size_t conv_size = conv_w * conv_h;
408 
409  TensorShape shape_im2col = input->tensor_shape();
410  shape_im2col.set(0, patch_size);
411  shape_im2col.set(1, conv_size);
412  shape_im2col.set(2, weights_z);
413  TensorInfo input_reshaped(input->clone()->set_is_resizable(true).reset_padding().set_tensor_shape(shape_im2col));
414  ARM_COMPUTE_RETURN_ON_ERROR(CLDepthwiseIm2ColKernel::validate(input, &input_reshaped, Size2D(weights_w, weights_h), conv_info, append_bias, depth_multiplier, dilation));
415 
416  const TensorShape shape_weights_reshape(patch_size, weights_z);
417  TensorInfo weights_reshaped(weights->clone()->set_is_resizable(true).reset_padding().set_tensor_shape(shape_weights_reshape));
419 
420  DataType v2mm_dt = (input->data_type() == DataType::QASYMM8) ? DataType::S32 : input->data_type();
421  TensorShape shape_v2mm_out = input->tensor_shape();
422  shape_v2mm_out.set(0, conv_size * weights_z);
423  shape_v2mm_out.set(1, 1);
424  shape_v2mm_out.set(2, 1);
425  TensorInfo v2mm_output(input->clone()->set_is_resizable(true).reset_padding().set_data_type(v2mm_dt).set_tensor_shape(shape_v2mm_out));
426  ARM_COMPUTE_RETURN_ON_ERROR(CLGEMMMatrixVectorMultiplyKernel::validate(&input_reshaped, &weights_reshaped, &v2mm_output));
427 
428  TensorInfo output_reshaped(v2mm_output.clone()->set_is_resizable(true).reset_padding().set_tensor_shape(output_shape));
429  ARM_COMPUTE_RETURN_ON_ERROR(CLDepthwiseVectorToTensorKernel::validate(&v2mm_output, (is_quantized) ? &output_reshaped : output, conv_w, conv_h));
430 
431  if(is_quantized)
432  {
433  const UniformQuantizationInfo iq_info = input->quantization_info().uniform();
434  const UniformQuantizationInfo wq_info = weights->quantization_info().uniform();
435  const UniformQuantizationInfo oq_info = (output->total_size() == 0) ? iq_info : output->quantization_info().uniform();
436 
437  const float multiplier = iq_info.scale * wq_info.scale / oq_info.scale;
438  ARM_COMPUTE_UNUSED(multiplier);
439  ARM_COMPUTE_RETURN_ERROR_ON(multiplier > 1.0f);
441  }
442 
443  // Validate Activation Layer
444  if(act_info.enabled())
445  {
447  }
448  }
449  else
450  {
452  }
453  return Status{};
454 }

References arm_compute::test::validation::act_info, ARM_COMPUTE_RETURN_ERROR_ON, ARM_COMPUTE_RETURN_ERROR_ON_NULLPTR, ARM_COMPUTE_RETURN_ON_ERROR, ARM_COMPUTE_UNUSED, arm_compute::CHANNEL, ICloneable< T >::clone(), arm_compute::misc::shape_calculator::compute_depthwise_convolution_shape(), arm_compute::test::validation::conv_info, ITensorInfo::data_layout(), ITensorInfo::data_type(), arm_compute::test::validation::dilation, ITensorInfo::dimension(), arm_compute::get_data_layout_dimension_index(), arm_compute::HEIGHT, arm_compute::is_data_type_quantized_asymmetric(), arm_compute::MIDGARD, arm_compute::test::validation::output_shape, arm_compute::QASYMM8, ITensorInfo::quantization_info(), arm_compute::S32, UniformQuantizationInfo::scale, TensorShape::set(), ITensorInfo::tensor_shape(), QuantizationInfo::uniform(), CLActivationLayer::validate(), CLGEMMMatrixVectorMultiplyKernel::validate(), CLDepthwiseConvolutionLayerReshapeWeightsGenericKernel::validate(), CLDepthwiseVectorToTensorKernel::validate(), CLDirectConvolutionLayerOutputStageKernel::validate(), CLDepthwiseIm2ColKernel::validate(), CLDepthwiseConvolutionLayer3x3::validate(), arm_compute::test::validation::weights, and arm_compute::WIDTH.
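
A hedged sketch of the intended use: query validate() with the tensors' metadata before calling configure(), and surface the error message on failure. The helper name depthwise_config_is_valid and the padding values are illustrative assumptions:

#include <iostream>

#include "arm_compute/runtime/CL/CLTensor.h"
#include "arm_compute/runtime/CL/functions/CLDepthwiseConvolutionLayer.h"

using namespace arm_compute;

bool depthwise_config_is_valid(const CLTensor &input, const CLTensor &weights,
                               const CLTensor &biases, const CLTensor &output)
{
    // Pre-flight check on the metadata only; no OpenCL work is done here
    const Status status = CLDepthwiseConvolutionLayer::validate(
        input.info(), weights.info(), biases.info(), output.info(),
        PadStrideInfo(1, 1, 2, 2));

    if(status.error_code() != ErrorCode::OK)
    {
        std::cerr << "Invalid configuration: " << status.error_description() << std::endl;
        return false;
    }
    return true;
}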


The documentation for this class was generated from the following files:

  • CLDepthwiseConvolutionLayer.h
  • CLDepthwiseConvolutionLayer.cpp