Compute Library
 21.08
CpuDepthwiseConv2dNativeKernel Class Reference

Interface for the kernel that runs a native depthwise convolution on a tensor. More...

#include <CpuDepthwiseConv2dNativeKernel.h>


Public Member Functions

 CpuDepthwiseConv2dNativeKernel ()=default
 
 ARM_COMPUTE_DISALLOW_COPY_ALLOW_MOVE (CpuDepthwiseConv2dNativeKernel)
 
void configure (const ITensorInfo *src, const ITensorInfo *weights, const ITensorInfo *biases, ITensorInfo *dst, const ConvolutionInfo &info)
 Initialize the function's source, destination and parameters. More...
 
void run_op (ITensorPack &tensors, const Window &window, const ThreadInfo &info) override
 Execute the kernel on the passed window. More...
 
const char * name () const override
 Name of the kernel. More...
 
- Public Member Functions inherited from ICPPKernel
virtual ~ICPPKernel ()=default
 Default destructor. More...
 
virtual void run (const Window &window, const ThreadInfo &info)
 Execute the kernel on the passed window. More...
 
virtual void run_nd (const Window &window, const ThreadInfo &info, const Window &thread_locator)
 Legacy compatibility layer for implementations that do not support thread_locator; in these cases the interface is simply narrowed down to the legacy version. More...
 
- Public Member Functions inherited from IKernel
 IKernel ()
 Constructor. More...
 
virtual ~IKernel ()=default
 Destructor. More...
 
virtual bool is_parallelisable () const
 Indicates whether or not the kernel is parallelisable. More...
 
virtual BorderSize border_size () const
 The size of the border for that kernel. More...
 
const Window & window () const
 The maximum window the kernel can be executed on. More...
 
bool is_window_configured () const
 Function to check if the embedded window of this kernel has been configured. More...
 

Static Public Member Functions

static Status validate (const ITensorInfo *src, const ITensorInfo *weights, const ITensorInfo *biases, const ITensorInfo *dst, const ConvolutionInfo &info)
 Static function to check if given info will lead to a valid configuration. More...
 

Detailed Description

Interface for the kernel that runs a native depthwise convolution on a tensor.

Definition at line 43 of file CpuDepthwiseConv2dNativeKernel.h.

Constructor & Destructor Documentation

◆ CpuDepthwiseConv2dNativeKernel()

Member Function Documentation

◆ ARM_COMPUTE_DISALLOW_COPY_ALLOW_MOVE()

ARM_COMPUTE_DISALLOW_COPY_ALLOW_MOVE ( CpuDepthwiseConv2dNativeKernel  )

◆ configure()

void configure ( const ITensorInfo * src,
const ITensorInfo * weights,
const ITensorInfo * biases,
ITensorInfo * dst,
const ConvolutionInfo & info 
)

Initialize the function's source, destination and parameters.

Note
Supported data layouts: NHWC
Parameters
[in] src Source tensor. Data type supported: QASYMM8/QASYMM8_SIGNED/F16/F32.
[in] weights Weights tensor. This is a 3D tensor with dimensions [IFM, W, H]. Data type supported: same as src, or QASYMM8/QASYMM8_SIGNED/QSYMM8_PER_CHANNEL when src is QASYMM8/QASYMM8_SIGNED.
[in] biases Biases tensor. A 1D tensor with dimensions [IFM]. Must be nullptr if not needed. Data type supported: same as src, or S32 when src is QASYMM8/QASYMM8_SIGNED.
[out] dst Destination tensor. Data type supported: same as src.
[in] info Depthwise convolution meta-data.
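The weights tensor carries one [W, H] filter per input channel, and the destination channel count is IFM * depth_multiplier. As a rough standalone sketch of that shape arithmetic (a hypothetical stand-in for misc::shape_calculator::compute_depthwise_convolution_shape(), assuming square stride/padding/dilation for brevity):

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper mirroring the usual convolution output-size arithmetic;
// the library computes this via compute_depthwise_convolution_shape().
struct DepthwiseOutShape
{
    std::size_t width, height, channels;
};

DepthwiseOutShape depthwise_out_shape(std::size_t in_w, std::size_t in_h, std::size_t ifm,
                                      std::size_t k_w, std::size_t k_h,
                                      std::size_t stride, std::size_t pad,
                                      std::size_t dilation, std::size_t depth_multiplier)
{
    // Effective kernel extent once dilation is applied.
    const std::size_t eff_kw = dilation * (k_w - 1) + 1;
    const std::size_t eff_kh = dilation * (k_h - 1) + 1;
    return { (in_w + 2 * pad - eff_kw) / stride + 1,
             (in_h + 2 * pad - eff_kh) / stride + 1,
             ifm * depth_multiplier }; // each input channel yields depth_multiplier outputs
}
```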

Definition at line 806 of file CpuDepthwiseConv2dNativeKernel.cpp.

References ARM_COMPUTE_ERROR, ARM_COMPUTE_ERROR_ON_NULLPTR, ARM_COMPUTE_ERROR_THROW_ON, arm_compute::auto_init_if_empty(), arm_compute::calculate_max_window(), arm_compute::quantization::calculate_quantized_multiplier(), ICloneable< T >::clone(), arm_compute::misc::shape_calculator::compute_depthwise_convolution_shape(), ITensorInfo::data_type(), ConvolutionInfo::depth_multiplier, ConvolutionInfo::dilation, ITensorInfo::dimension(), arm_compute::F16, arm_compute::F32, arm_compute::is_data_type_quantized(), arm_compute::is_data_type_quantized_per_channel(), arm_compute::test::validation::output_shape, ConvolutionInfo::pad_stride_info, arm_compute::QASYMM8, arm_compute::QASYMM8_SIGNED, arm_compute::QSYMM8_PER_CHANNEL, ITensorInfo::quantization_info(), UniformQuantizationInfo::scale, QuantizationInfo::scale(), and QuantizationInfo::uniform().

{
    ARM_COMPUTE_ERROR_THROW_ON(validate_arguments(src, weights, (biases != nullptr) ? biases : nullptr, dst, info));

    _conv_info        = info.pad_stride_info;
    _depth_multiplier = info.depth_multiplier;
    _dilation         = info.dilation;
    _has_biases       = (biases != nullptr);

    if(is_data_type_quantized(src->data_type()))
    {
        const auto input_scale  = src->quantization_info().uniform().scale;
        const auto output_scale = dst->quantization_info().uniform().scale;

        auto weights_scale = weights->quantization_info().scale();
        if(!is_data_type_quantized_per_channel(weights->data_type()))
        {
            for(size_t i = 1; i < weights->dimension(channel_idx); ++i)
            {
                weights_scale.push_back(weights_scale.front());
            }
        }

        for(const auto &s : weights_scale)
        {
            int32_t     out_mult   = 0;
            int32_t     out_shift  = 0;
            const float multiplier = input_scale * s / output_scale;
            arm_compute::quantization::calculate_quantized_multiplier(multiplier, &out_mult, &out_shift);

            _output_multiplier.push_back(out_mult);
            _output_shift.push_back(out_shift);
        }
    }

    switch(weights->data_type())
    {
        case DataType::QASYMM8:
            _func = &CpuDepthwiseConv2dNativeKernel::run_depthwise<uint8_t, uint8_t>;
            break;
        case DataType::QASYMM8_SIGNED:
            _func = &CpuDepthwiseConv2dNativeKernel::run_depthwise<int8_t, int8_t>;
            break;
        case DataType::QSYMM8_PER_CHANNEL:
            if(src->data_type() == DataType::QASYMM8)
            {
                _func = &CpuDepthwiseConv2dNativeKernel::run_depthwise<uint8_t, int8_t>;
            }
            else
            {
                _func = &CpuDepthwiseConv2dNativeKernel::run_depthwise<int8_t, int8_t>;
            }
            break;
#ifdef __ARM_FEATURE_FP16_VECTOR_ARITHMETIC
        case DataType::F16:
            _func = &CpuDepthwiseConv2dNativeKernel::run_depthwise<float16_t, float16_t>;
            break;
#endif // __ARM_FEATURE_FP16_VECTOR_ARITHMETIC
        case DataType::F32:
            _func = &CpuDepthwiseConv2dNativeKernel::run_depthwise<float, float>;
            break;
        default:
            ARM_COMPUTE_ERROR("Data type not supported");
            break;
    }

    const auto output_shape = misc::shape_calculator::compute_depthwise_convolution_shape(*src, *weights, info);
    auto_init_if_empty(*dst, src->clone()->set_is_resizable(true).reset_padding().set_tensor_shape(output_shape).set_quantization_info(dst->quantization_info()));

    Window win = calculate_max_window(*dst, Steps());
    ICpuKernel::configure(win);
}
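The quantized path in configure() folds input_scale * weights_scale / output_scale into a per-channel fixed-point multiplier and shift via quantization::calculate_quantized_multiplier(). That decomposition can be sketched in standalone form as follows (the name and the exact rounding/saturation rules are assumptions; only the frexp-based arithmetic is the point):

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Illustrative decomposition of a real requantization multiplier into a
// Q0.31 fixed-point multiplier and a right shift, so that
//   value * multiplier  ~=  (value * quant_multiplier) >> (31 + shift).
// This mirrors the spirit of calculate_quantized_multiplier(); it is not
// the library's implementation.
void decompose_multiplier(float multiplier, int32_t *quant_multiplier, int32_t *shift)
{
    int exponent = 0;
    const double q = std::frexp(multiplier, &exponent); // multiplier = q * 2^exponent, q in [0.5, 1)
    auto q_fixed   = static_cast<int64_t>(std::llround(q * (1ll << 31)));
    if(q_fixed == (1ll << 31)) // rounding can push q to exactly 1.0
    {
        q_fixed /= 2;
        ++exponent;
    }
    *quant_multiplier = static_cast<int32_t>(q_fixed);
    *shift            = -exponent; // positive shift means a larger right shift
}
```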

◆ name()

const char * name ( ) const
override virtual

Name of the kernel.

Returns
Kernel name

Implements ICPPKernel.

Definition at line 944 of file CpuDepthwiseConv2dNativeKernel.cpp.

{
    return "CpuDepthwiseConv2dNativeKernel";
}

◆ run_op()

void run_op ( ITensorPack & tensors,
const Window & window,
const ThreadInfo & info 
)
override virtual

Execute the kernel on the passed window.

Warning
If is_parallelisable() returns false then the passed window must be equal to window()
Note
The window has to be a region within the window returned by the window() method
The width of the window has to be a multiple of num_elems_processed_per_iteration().
Parameters
[in] tensors A vector containing the tensors to operate on.
[in] window Region on which to execute the kernel. (Must be a region of the window returned by window())
[in] info Info about executing thread and CPU.

Reimplemented from ICPPKernel.

Definition at line 930 of file CpuDepthwiseConv2dNativeKernel.cpp.

References arm_compute::ACL_DST, arm_compute::ACL_SRC_0, arm_compute::ACL_SRC_1, arm_compute::ACL_SRC_2, ARM_COMPUTE_ERROR_ON, ARM_COMPUTE_ERROR_ON_INVALID_SUBWINDOW, ARM_COMPUTE_ERROR_ON_UNCONFIGURED_KERNEL, ARM_COMPUTE_UNUSED, ITensorPack::get_const_tensor(), ITensorPack::get_tensor(), and IKernel::window().

{
    ARM_COMPUTE_UNUSED(info);
    ARM_COMPUTE_ERROR_ON_UNCONFIGURED_KERNEL(this);
    ARM_COMPUTE_ERROR_ON_INVALID_SUBWINDOW(IKernel::window(), window);
    ARM_COMPUTE_ERROR_ON(_func == nullptr);

    const auto src     = tensors.get_const_tensor(TensorType::ACL_SRC_0);
    const auto weights = tensors.get_const_tensor(TensorType::ACL_SRC_1);
    const auto biases  = tensors.get_const_tensor(TensorType::ACL_SRC_2);
    auto       dst     = tensors.get_tensor(TensorType::ACL_DST);
    (this->*_func)(src, weights, biases, dst, window, _has_biases);
}

◆ validate()

Status validate ( const ITensorInfo * src,
const ITensorInfo * weights,
const ITensorInfo * biases,
const ITensorInfo * dst,
const ConvolutionInfo & info 
)
static

Static function to check if given info will lead to a valid configuration.

Similar to CpuDepthwiseConv2dNativeKernel::configure()

Returns
a status

Definition at line 880 of file CpuDepthwiseConv2dNativeKernel.cpp.

References ARM_COMPUTE_ERROR_ON_INVALID_SUBWINDOW, ARM_COMPUTE_ERROR_ON_UNCONFIGURED_KERNEL, ARM_COMPUTE_RETURN_ON_ERROR, ITensorInfo::data_type(), arm_compute::test::validation::dst, ITensor::info(), arm_compute::is_data_type_quantized_per_channel(), arm_compute::test::validation::src, and IKernel::window().

{
    ARM_COMPUTE_RETURN_ON_ERROR(validate_arguments(src, weights, biases, dst, info));
    return Status{};
}

The documentation for this class was generated from the following files: