Compute Library
 21.02
NEDepthwiseConvolutionAssemblyDispatch Class Reference

Depthwise convolution assembly kernel glue.

#include <NEDepthwiseConvolutionAssemblyDispatch.h>


Public Member Functions

 NEDepthwiseConvolutionAssemblyDispatch (std::shared_ptr< IMemoryManager > memory_manager=nullptr)
 Default constructor.
 
 NEDepthwiseConvolutionAssemblyDispatch (const NEDepthwiseConvolutionAssemblyDispatch &)=delete
 Prevent instances of this class from being copied (as this class contains pointers).
 
 NEDepthwiseConvolutionAssemblyDispatch (NEDepthwiseConvolutionAssemblyDispatch &&)=default
 Default move constructor.
 
NEDepthwiseConvolutionAssemblyDispatch & operator= (const NEDepthwiseConvolutionAssemblyDispatch &)=delete
 Prevent instances of this class from being copied (as this class contains pointers).
 
NEDepthwiseConvolutionAssemblyDispatch & operator= (NEDepthwiseConvolutionAssemblyDispatch &&)=default
 Default move assignment operator.
 
 ~NEDepthwiseConvolutionAssemblyDispatch ()
 Default destructor.
 
void configure (const ITensor *input, const ITensor *weights, const ITensor *bias, ITensor *output, const PadStrideInfo &conv_info, unsigned int depth_multiplier=1, const ActivationLayerInfo &act_info=ActivationLayerInfo(), const Size2D &dilation=Size2D(1, 1))
 Initialize the function's source, destination, kernels and border_size.
 
void run () override
 Run the kernels contained in the function.
 
void prepare () override
 Prepare the function for executing.
 
- Public Member Functions inherited from IFunction
virtual ~IFunction ()=default
 Destructor.
 

Static Public Member Functions

static Status validate (const ITensorInfo *input, const ITensorInfo *weights, const ITensorInfo *bias, const ITensorInfo *output, const PadStrideInfo &conv_info, unsigned int depth_multiplier=1, const ActivationLayerInfo &act_info=ActivationLayerInfo(), const Size2D &dilation=Size2D(1, 1))
 Static function to check if given info will lead to a valid configuration of NEDepthwiseConvolutionAssemblyDispatch.
 
static bool is_optimized_supported (const ITensorInfo *input, const ITensorInfo *weights, PadStrideInfo conv_info, unsigned int depth_multiplier=1, const Size2D &dilation=Size2D(1, 1))
 Check if the optimized kernel can be used for the given kernel sizes and strides.
 

Detailed Description

Depthwise convolution assembly kernel glue.

Definition at line 36 of file NEDepthwiseConvolutionAssemblyDispatch.h.

Constructor & Destructor Documentation

◆ NEDepthwiseConvolutionAssemblyDispatch() [1/3]

NEDepthwiseConvolutionAssemblyDispatch ( std::shared_ptr< IMemoryManager > memory_manager = nullptr)

Default constructor.

Parameters
[in,out] memory_manager Memory manager to use

◆ NEDepthwiseConvolutionAssemblyDispatch() [2/3]

Prevent instances of this class from being copied (As this class contains pointers)

◆ NEDepthwiseConvolutionAssemblyDispatch() [3/3]

Default move constructor.

◆ ~NEDepthwiseConvolutionAssemblyDispatch()

Default destructor.

Member Function Documentation

◆ configure()

void configure ( const ITensor *input,
const ITensor *weights,
const ITensor *bias,
ITensor *output,
const PadStrideInfo &conv_info,
unsigned int  depth_multiplier = 1,
const ActivationLayerInfo &act_info = ActivationLayerInfo(),
const Size2D &dilation = Size2D(1, 1) 
)

Initialize the function's source, destination, kernels and border_size.

Note
Supports only NHWC format
Parameters
[in] input Source tensor. Data type supported: QASYMM8/F16/F32. (Written to only for border filling).
[in] weights Weights tensor. These are 3D tensors with shape [W, H, IFM]. Data type supported: Same as input.
[in] bias (Optional) Biases tensor. A 1D tensor with shape [IFM]. Must be nullptr if not needed. Data type supported: Same as input.
[out] output Destination tensor. Data type supported: same as input.
[in] conv_info Padding and stride information to use for the convolution.
[in] depth_multiplier (Optional) Multiplier to apply to the input's depth in order to retrieve the output's depth. Defaults to 1.
[in] act_info (Optional) Activation layer information in case of a fused activation.
[in] dilation (Optional) Dilation, in elements, across x and y. Defaults to (1, 1).

Definition at line 347 of file NEDepthwiseConvolutionAssemblyDispatch.cpp.

References ARM_COMPUTE_ERROR_ON_NULLPTR, ARM_COMPUTE_ERROR_THROW_ON, ARM_COMPUTE_UNUSED, arm_compute::auto_init_if_empty(), ICloneable< T >::clone(), arm_compute::misc::shape_calculator::compute_depthwise_convolution_shape(), arm_compute::test::validation::conv_info, ITensor::info(), arm_compute::test::validation::input, arm_compute::test::validation::output_shape, ITensorInfo::quantization_info(), and NEDepthwiseConvolutionAssemblyDispatch::validate().

{
    ARM_COMPUTE_ERROR_ON_NULLPTR(input, weights, output);
    ARM_COMPUTE_UNUSED(depth_multiplier);
    ARM_COMPUTE_ERROR_THROW_ON(validate(input->info(),
                                        weights->info(),
                                        bias != nullptr ? bias->info() : nullptr,
                                        output->info(),
                                        conv_info,
                                        depth_multiplier,
                                        act_info,
                                        dilation));

    // Output auto initialization if not yet initialized
    const TensorShape output_shape = misc::shape_calculator::compute_depthwise_convolution_shape(*input->info(), *weights->info(), conv_info, depth_multiplier, dilation);
    auto_init_if_empty(*output->info(), input->info()->clone()->set_is_resizable(true).reset_padding().set_tensor_shape(output_shape).set_quantization_info(output->info()->quantization_info()));

    _input       = input;
    _weights     = weights;
    _bias        = bias;
    _output      = output;
    _is_prepared = false;

    // Create convolver
    _pImpl->_dwc_assembly_kernel = create_convolver(input, weights, output, conv_info, act_info, dilation);
    ARM_COMPUTE_ERROR_ON(_pImpl->_dwc_assembly_kernel == nullptr);

    // Create assembly kernel wrapper
    _pImpl->_dwc_acl_kernel.configure(_pImpl->_dwc_assembly_kernel.get());

    constexpr size_t alignment = 128;

    // Create workspace
    const unsigned int num_threads    = NEScheduler::get().num_threads();
    const size_t       workspace_size = _pImpl->_dwc_assembly_kernel->get_working_space_size(num_threads);
    ARM_COMPUTE_ERROR_ON_MSG(workspace_size == 0, "Workspace size cannot be 0 !");
    _workspace.allocator()->init(TensorInfo(TensorShape{ workspace_size }, 1, DataType::S8), alignment);
    _memory_group.manage(&_workspace);
    _workspace.allocator()->allocate();

    // Create packing tensor
    const size_t pack_tensor_size = _pImpl->_dwc_assembly_kernel->get_packed_params_size();
    ARM_COMPUTE_ERROR_ON_MSG(pack_tensor_size == 0, "Pack tensor size cannot be 0 !");
    _packed_weights.allocator()->init(TensorInfo(TensorShape{ pack_tensor_size }, 1, DataType::S8), alignment);
}

◆ is_optimized_supported()

bool is_optimized_supported ( const ITensorInfo *input,
const ITensorInfo *weights,
PadStrideInfo  conv_info,
unsigned int  depth_multiplier = 1,
const Size2D &dilation = Size2D(1, 1) 
)
static

Check if the optimized kernel can be used for the given kernel sizes and strides.

Warning
Even if this returns true, the inputs and outputs might need to be permuted, as the only supported layout is NHWC
Parameters
[in] input Input tensor info.
[in] weights Weights tensor info.
[in] conv_info Convolution layer metadata.
[in] depth_multiplier (Optional) Depth multiplier to be used.
[in] dilation (Optional) Dilation, in elements, across x and y. Defaults to (1, 1).
Returns
True if the assembly kernel can be used, false otherwise. Note that transformations of input/output could be needed.

Definition at line 454 of file NEDepthwiseConvolutionAssemblyDispatch.cpp.

References ARM_COMPUTE_ERROR_ON_NULLPTR, arm_compute::calculate_same_pad(), arm_compute::test::validation::data_layout, ITensorInfo::data_layout(), ITensorInfo::data_type(), ITensorInfo::dimension(), Window::DimX, Window::DimY, Window::DimZ, arm_compute::get_data_layout_dimension_index(), arm_compute::HEIGHT, arm_compute::is_data_type_float(), arm_compute::NCHW, arm_compute::NHWC, PadStrideInfo::pad_bottom(), PadStrideInfo::pad_left(), PadStrideInfo::pad_right(), PadStrideInfo::pad_top(), arm_compute::QASYMM8, arm_compute::QASYMM8_SIGNED, arm_compute::QSYMM8_PER_CHANNEL, TensorShape::set(), PadStrideInfo::stride(), ITensorInfo::tensor_shape(), arm_compute::U, arm_compute::WIDTH, Size2D::x(), Dimensions< T >::x(), Size2D::y(), Dimensions< T >::y(), and Dimensions< T >::z().

{
    ARM_COMPUTE_ERROR_ON_NULLPTR(input, weights);

    // Reshape input shape if in NHWC format
    const DataLayout data_layout = input->data_layout();
    TensorShape      in_shape{ input->tensor_shape() };
    if(data_layout == DataLayout::NHWC)
    {
        in_shape.set(Window::DimX, input->tensor_shape().y());
        in_shape.set(Window::DimY, input->tensor_shape().z());
        in_shape.set(Window::DimZ, input->tensor_shape().x());
    }

    // Check data type
    // TODO (COMPMID-3004): Add assembly optimized routine for QASYMM8_SIGNED NEDepthwiseConvolutionLayer
    const DataType input_type            = input->data_type();
    const bool     is_input_type_valid   = is_data_type_float(input_type) || input_type == DataType::QASYMM8;
    const DataType weights_type          = weights->data_type();
    const bool     is_weights_type_valid = is_data_type_float(weights_type) || weights_type == DataType::QASYMM8 || weights_type == DataType::QASYMM8_SIGNED
                                           || weights_type == DataType::QSYMM8_PER_CHANNEL;

    // Check weights size
    std::set<unsigned int> supported_kernel_sizes = { 3, 5 };
    const unsigned int     width_idx         = get_data_layout_dimension_index(data_layout, DataLayoutDimension::WIDTH);
    const unsigned int     height_idx        = get_data_layout_dimension_index(data_layout, DataLayoutDimension::HEIGHT);
    const unsigned int     kernel_w          = weights->dimension(width_idx);
    const unsigned int     kernel_h          = weights->dimension(height_idx);
    bool                   weights_supported = (kernel_w == kernel_h) && (supported_kernel_sizes.count(kernel_w) != 0);

    // Check for supported strides
    const auto &strides           = conv_info.stride();
    bool        supported_strides = (strides.first == strides.second) && ((strides.first == 1) || (strides.first == 2));

    // Check for supported padding
    const auto    pad_top           = conv_info.pad_top();
    const auto    pad_right         = conv_info.pad_right();
    const auto    pad_bottom        = conv_info.pad_bottom();
    const auto    pad_left          = conv_info.pad_left();
    PadStrideInfo same_pad          = calculate_same_pad(in_shape, TensorShape(kernel_w, kernel_h), conv_info, DataLayout::NCHW, dilation);
    bool          is_same_padding   = (pad_top == same_pad.pad_top()) && (pad_right == same_pad.pad_right()) && (pad_bottom == same_pad.pad_bottom()) && (pad_left == same_pad.pad_left());
    bool          is_valid_padding  = (pad_top == 0) && (pad_right == 0) && (pad_bottom == 0) && (pad_left == 0);
    bool          supported_padding = is_same_padding || is_valid_padding;
    // TODO(COMPMID-2464): Enable once dilated conv with stride 2 is supported
    bool is_dilation_supported = ((dilation == Size2D(1U, 1U)) || ((dilation.x() == dilation.y()) && strides.first == 1));

    if(weights_type == DataType::QSYMM8_PER_CHANNEL)
    {
        is_dilation_supported = is_dilation_supported && (dilation == Size2D(1U, 1U));
    }

    return is_input_type_valid && is_weights_type_valid && weights_supported && supported_strides && supported_padding && (depth_multiplier == 1) && is_dilation_supported;
}

◆ operator=() [1/2]

Prevent instances of this class from being copied (As this class contains pointers)

◆ operator=() [2/2]

Default move assignment operator.

◆ prepare()

void prepare ( )
overridevirtual

Prepare the function for executing.

Any one-off pre-processing step required by the function is handled here.

Note
Prepare stage might not need all the function's buffers' backing memory to be available in order to execute

Reimplemented from IFunction.

Definition at line 543 of file NEDepthwiseConvolutionAssemblyDispatch.cpp.

{
    if(!_is_prepared)
    {
        _packed_weights.allocator()->allocate();
        ARM_COMPUTE_ERROR_ON(_packed_weights.buffer() == nullptr);

        // Pack weights and bias
        const int weights_element_size = _weights->info()->element_size();
        const int weights_row_stride   = _weights->info()->strides_in_bytes().z() / weights_element_size;
        const int weights_col_stride   = _weights->info()->strides_in_bytes().y() / weights_element_size;
        _pImpl->_dwc_assembly_kernel->pack_params(_packed_weights.buffer(),
                                                  _weights->buffer() + _weights->info()->offset_first_element_in_bytes(),
                                                  weights_row_stride,
                                                  weights_col_stride,
                                                  (_bias != nullptr) ? _bias->buffer() : nullptr);
        _pImpl->_dwc_assembly_kernel->set_packed_params_buffer(_packed_weights.buffer());

        _weights->mark_as_unused();
        if(_bias != nullptr)
        {
            _bias->mark_as_unused();
        }
        _is_prepared = true;
    }
}

◆ run()

void run ( )
overridevirtual

Run the kernels contained in the function.

For Neon kernels:

  • Multi-threading is used for the kernels which are parallelisable.
  • By default std::thread::hardware_concurrency() threads are used.
Note
CPPScheduler::set_num_threads() can be used to manually set the number of threads

For OpenCL kernels:

  • All the kernels are enqueued on the queue associated with CLScheduler.
  • The queue is then flushed.
Note
The function will not block until the kernels are executed. It is the user's responsibility to wait.
Will call prepare() on first run if it hasn't been done

Implements IFunction.

Definition at line 512 of file NEDepthwiseConvolutionAssemblyDispatch.cpp.

References ARM_COMPUTE_ERROR_ON.

{
    // Prepare assembly kernel
    prepare();

    MemoryGroupResourceScope scope_mg(_memory_group);

    // Setup inputs/outputs
    ARM_COMPUTE_ERROR_ON(_workspace.buffer() == nullptr);
    _pImpl->_dwc_assembly_kernel->set_working_space(static_cast<void *>(_workspace.buffer()));

    ARM_COMPUTE_ERROR_ON(_input->buffer() == nullptr);
    const int   input_element_size = _input->info()->element_size();
    const int   input_batch_stride = _input->info()->strides_in_bytes()[3] / input_element_size;
    const int   input_row_stride   = _input->info()->strides_in_bytes().z() / input_element_size;
    const int   input_col_stride   = _input->info()->strides_in_bytes().y() / input_element_size;
    const void *input_ptr          = _input->buffer() + _input->info()->offset_first_element_in_bytes();
    _pImpl->_dwc_assembly_kernel->set_input(input_ptr, input_batch_stride, input_row_stride, input_col_stride);

    ARM_COMPUTE_ERROR_ON(_output->buffer() == nullptr);
    const int output_element_size = _output->info()->element_size();
    const int output_batch_stride = _output->info()->strides_in_bytes()[3] / output_element_size;
    const int output_row_stride   = _output->info()->strides_in_bytes().z() / output_element_size;
    const int output_col_stride   = _output->info()->strides_in_bytes().y() / output_element_size;
    void     *output_ptr          = _output->buffer() + _output->info()->offset_first_element_in_bytes();
    _pImpl->_dwc_assembly_kernel->set_output(output_ptr, output_batch_stride, output_row_stride, output_col_stride);

    // Schedule assembly kernel
    NEScheduler::get().schedule(&_pImpl->_dwc_acl_kernel, Window::DimX);
}

◆ validate()

Status validate ( const ITensorInfo *input,
const ITensorInfo *weights,
const ITensorInfo *bias,
const ITensorInfo *output,
const PadStrideInfo &conv_info,
unsigned int  depth_multiplier = 1,
const ActivationLayerInfo &act_info = ActivationLayerInfo(),
const Size2D &dilation = Size2D(1, 1) 
)
static

Static function to check if given info will lead to a valid configuration of NEDepthwiseConvolutionAssemblyDispatch.

Note
Supports only NHWC format
Parameters
[in] input Source tensor. Data type supported: QASYMM8/F16/F32. (Written to only for border filling).
[in] weights Weights tensor. These are 3D tensors with shape [W, H, IFM]. Data type supported: Same as input.
[in] bias (Optional) Biases tensor. A 1D tensor with shape [IFM]. Must be nullptr if not needed. Data type supported: Same as input.
[out] output Destination tensor. Data type supported: same as input.
[in] conv_info Padding and stride information to use for the convolution.
[in] depth_multiplier (Optional) Multiplier to apply to the input's depth in order to retrieve the output's depth. Defaults to 1.
[in] act_info (Optional) Activation layer information in case of a fused activation.
[in] dilation (Optional) Dilation, in elements, across x and y. Defaults to (1, 1).
Returns
An error status

Definition at line 400 of file NEDepthwiseConvolutionAssemblyDispatch.cpp.

References ARM_COMPUTE_RETURN_ERROR_ON, ARM_COMPUTE_RETURN_ERROR_ON_CPU_F16_UNSUPPORTED, ARM_COMPUTE_RETURN_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN, ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_LAYOUT, ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES, ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DIMENSIONS, arm_compute::CHANNEL, arm_compute::misc::shape_calculator::compute_depthwise_convolution_shape(), ITensorInfo::data_layout(), ITensorInfo::data_type(), ITensorInfo::dimension(), ActivationLayerInfo::enabled(), arm_compute::F16, arm_compute::F32, arm_compute::get_data_layout_dimension_index(), arm_compute::utils::info_helpers::is_relu(), arm_compute::utils::info_helpers::is_relu6(), ITensorInfo::num_dimensions(), arm_compute::test::validation::output_shape, arm_compute::QASYMM8, arm_compute::QSYMM8_PER_CHANNEL, ITensorInfo::quantization_info(), UniformQuantizationInfo::scale, QuantizationInfo::scale(), ITensorInfo::tensor_shape(), ITensorInfo::total_size(), and QuantizationInfo::uniform().

Referenced by NEDepthwiseConvolutionAssemblyDispatch::configure().

{
    ARM_COMPUTE_RETURN_ERROR_ON_CPU_F16_UNSUPPORTED(input);
    ARM_COMPUTE_RETURN_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN(input, 1, DataType::QASYMM8, DataType::F16, DataType::F32);
    if(weights->data_type() != DataType::QSYMM8_PER_CHANNEL)
    {
        ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES(input, weights);
    }
    ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_LAYOUT(input, weights);

    // Validate convolver
    ARM_COMPUTE_RETURN_ERROR_ON(!is_optimized_supported(input, weights, conv_info, depth_multiplier, dilation));

    // Validate activation
    const bool is_relu  = arm_compute::utils::info_helpers::is_relu(act_info);
    const bool is_relu6 = arm_compute::utils::info_helpers::is_relu6(act_info);
    ARM_COMPUTE_RETURN_ERROR_ON(act_info.enabled() && !(is_relu || is_relu6));

    // Check bias
    if(bias != nullptr)
    {
        unsigned int channel_idx = get_data_layout_dimension_index(input->data_layout(), DataLayoutDimension::CHANNEL);
        ARM_COMPUTE_RETURN_ERROR_ON(bias->num_dimensions() > 1);
        ARM_COMPUTE_RETURN_ERROR_ON(bias->dimension(0) != weights->dimension(channel_idx));
    }

    // Check output
    if(output->total_size() != 0)
    {
        const TensorShape output_shape = misc::shape_calculator::compute_depthwise_convolution_shape(*input, *weights, conv_info, depth_multiplier, dilation);
        ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DIMENSIONS(output->tensor_shape(), output_shape);
        ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES(input, output);
    }

    // The uniform quantization case will only have 1 scale value in the weights quantization info
    const UniformQuantizationInfo input_qinfo   = input->quantization_info().uniform();
    const QuantizationInfo        weights_qinfo = weights->quantization_info();
    const UniformQuantizationInfo output_qinfo  = output->quantization_info().uniform();
    for(auto const s : weights_qinfo.scale())
    {
        const float fmultipler = input_qinfo.scale * s / output_qinfo.scale;
        ARM_COMPUTE_RETURN_ERROR_ON(fmultipler > 1.f);
    }

    return Status{};
}

The documentation for this class was generated from the following files:

  • NEDepthwiseConvolutionAssemblyDispatch.h
  • NEDepthwiseConvolutionAssemblyDispatch.cpp