Compute Library 21.08
CpuDepthwiseConv2dAssemblyWrapperKernel Class Reference (final)

This class is a wrapper for the depthwise convolution assembly kernels. More...

#include <CpuDepthwiseConv2dAssemblyWrapperKernel.h>

Collaboration diagram for CpuDepthwiseConv2dAssemblyWrapperKernel (diagram and legend omitted).

Public Member Functions

 CpuDepthwiseConv2dAssemblyWrapperKernel ()
 Default constructor. More...
 
 ~CpuDepthwiseConv2dAssemblyWrapperKernel ()
 
 ARM_COMPUTE_DISALLOW_COPY_ALLOW_MOVE (CpuDepthwiseConv2dAssemblyWrapperKernel)
 
void configure (const ITensorInfo *src, const ITensorInfo *weights, const ITensorInfo *bias, ITensorInfo *dst, const ConvolutionInfo &info, const CPUInfo &cpu_info)
 Initialise the kernel's src and dst. More...
 
void run_op (ITensorPack &tensors, const Window &window, const ThreadInfo &info) override
 Execute the kernel on the passed window. More...
 
const char * name () const override
 Name of the kernel. More...
 
void pack_parameters (void *parameters_ptr, void *bias_ptr, void *weights_ptr, size_t ld_weights_col, size_t ld_weights_row)
 Pack bias and weights in a storage space for the assembly kernel. More...
 
size_t get_storage_size () const
 Get the amount of storage space required for the rearranged weights and bias. More...
 
size_t get_working_size (unsigned int num_threads, unsigned int num_input_channels) const
 Get size of the workspace needed by the assembly kernel. More...
 
bool is_configured () const
 Was the asm kernel successfully configured? More...
 
- Public Member Functions inherited from ICPPKernel
virtual ~ICPPKernel ()=default
 Default destructor. More...
 
virtual void run (const Window &window, const ThreadInfo &info)
 Execute the kernel on the passed window. More...
 
virtual void run_nd (const Window &window, const ThreadInfo &info, const Window &thread_locator)
 Legacy compatibility layer for implementations which do not support thread_locator. In these cases we simply narrow the interface down to the legacy version. More...
 
- Public Member Functions inherited from IKernel
 IKernel ()
 Constructor. More...
 
virtual ~IKernel ()=default
 Destructor. More...
 
virtual bool is_parallelisable () const
 Indicates whether or not the kernel is parallelisable. More...
 
virtual BorderSize border_size () const
 The size of the border for that kernel. More...
 
const Window & window () const
 The maximum window the kernel can be executed on. More...
 
bool is_window_configured () const
 Function to check if the embedded window of this kernel has been configured. More...
 

Static Public Member Functions

static Status validate (const ITensorInfo *src, const ITensorInfo *weights, const ITensorInfo *bias, const ITensorInfo *dst, const ConvolutionInfo &info)
 Indicates whether or not this function can be used to process the given parameters. More...
 

Detailed Description

This class is a wrapper for the depthwise convolution assembly kernels.

Definition at line 47 of file CpuDepthwiseConv2dAssemblyWrapperKernel.h.
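
This kernel lives in the library's internal src/ tree and is normally created and driven by CpuDepthwiseConv2dAssemblyDispatch rather than used directly. As an orientation, the expected call order is validate(), configure(), then the buffer-sizing and packing helpers, and finally run_op(). The sketch below only illustrates that order; the tensor shapes, the ConvolutionInfo field names, the namespace qualification and the internal include path are assumptions, not guarantees taken from this page.

// Orientation sketch: call order of the wrapper kernel.
// Assumed: internal header path, cpu::kernels namespace, ConvolutionInfo field names.
#include "arm_compute/core/TensorInfo.h"
#include "arm_compute/core/Types.h"
#include "arm_compute/runtime/NEON/NEScheduler.h"
#include "src/core/cpu/kernels/internal/CpuDepthwiseConv2dAssemblyWrapperKernel.h" // assumed path

using namespace arm_compute;
using cpu::kernels::CpuDepthwiseConv2dAssemblyWrapperKernel; // assumed namespace

void depthwise_asm_sketch()
{
    // NHWC tensor descriptions (channels in dimension 0 assumed): 16 channels, 32x32 spatial, batch 1.
    TensorInfo src_info(TensorShape(16U, 32U, 32U, 1U), 1, DataType::F32);
    TensorInfo wei_info(TensorShape(16U, 3U, 3U), 1, DataType::F32); // 3D weights, IFM = 16
    TensorInfo bia_info(TensorShape(16U), 1, DataType::F32);
    TensorInfo dst_info{}; // left empty; configure() auto-initialises it
    src_info.set_data_layout(DataLayout::NHWC);
    wei_info.set_data_layout(DataLayout::NHWC);

    ConvolutionInfo conv_info{};
    conv_info.pad_stride_info = PadStrideInfo(1, 1, 1, 1); // stride 1, pad 1; field name assumed

    CpuDepthwiseConv2dAssemblyWrapperKernel kernel;
    if(bool(CpuDepthwiseConv2dAssemblyWrapperKernel::validate(&src_info, &wei_info, &bia_info, &dst_info, conv_info)))
    {
        kernel.configure(&src_info, &wei_info, &bia_info, &dst_info, conv_info, NEScheduler::get().cpu_info());
    }

    // Next steps, shown in the per-function sketches further down this page:
    //   1. get_storage_size() / get_working_size() -> reserve the two auxiliary buffers
    //   2. pack_parameters()                       -> rearrange weights and bias into the storage buffer
    //   3. run_op()                                -> execute with an ITensorPack
}

In the library itself this sequence is hidden behind CpuDepthwiseConv2dAssemblyDispatch, which also owns the auxiliary buffers.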

Constructor & Destructor Documentation

◆ CpuDepthwiseConv2dAssemblyWrapperKernel()

Default constructor.

Definition at line 197 of file CpuDepthwiseConv2dAssemblyWrapperKernel.cpp.

197 CpuDepthwiseConv2dAssemblyWrapperKernel::CpuDepthwiseConv2dAssemblyWrapperKernel()
198  : _kernel_asm(nullptr),
199  _multipliers(),
200  _left_shifts(),
201  _right_shifts()
202 {
203 }

◆ ~CpuDepthwiseConv2dAssemblyWrapperKernel()

Member Function Documentation

◆ ARM_COMPUTE_DISALLOW_COPY_ALLOW_MOVE()

ARM_COMPUTE_DISALLOW_COPY_ALLOW_MOVE ( CpuDepthwiseConv2dAssemblyWrapperKernel  )

◆ configure()

void configure ( const ITensorInfo *  src,
const ITensorInfo *  weights,
const ITensorInfo *  bias,
ITensorInfo *  dst,
const ConvolutionInfo &  info,
const CPUInfo &  cpu_info 
)

Initialise the kernel's src and dst.

Parameters
[in]  src       Source tensor info. Data types supported: QASYMM8/QASYMM8_SIGNED/F16/F32.
[in]  weights   Weights tensor info. These are 3D tensors with shape [kernel_x, kernel_y, IFM]. Data type supported: same as src, or QASYMM8/QASYMM8_SIGNED/QSYMM8_PER_CHANNEL when src is QASYMM8/QASYMM8_SIGNED.
[in]  bias      Bias tensor info. A 1D tensor with shape [IFM]. Must be nullptr if not needed. Data type supported: same as src, or S32 when src is QASYMM8/QASYMM8_SIGNED.
[out] dst       Destination tensor info. Data type supported: same as src.
[in]  info      Depthwise convolution layer meta-data.
[in]  cpu_info  CPU information needed to select the most appropriate kernel.
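
For the quantized combinations listed above (QASYMM8 or QASYMM8_SIGNED src, optionally QSYMM8_PER_CHANNEL weights, S32 bias), the tensor infos additionally carry quantization metadata. A minimal sketch, continuing the example from the Detailed Description; the scales, offsets and shapes are illustrative assumptions.

// Quantized configuration sketch (illustrative values; needs <vector> and "arm_compute/core/QuantizationInfo.h").
TensorInfo src_q(TensorShape(16U, 32U, 32U, 1U), 1, DataType::QASYMM8);
TensorInfo wei_q(TensorShape(16U, 3U, 3U), 1, DataType::QSYMM8_PER_CHANNEL);
TensorInfo bia_q(TensorShape(16U), 1, DataType::S32);
TensorInfo dst_q{}; // auto-initialised from src by configure()
src_q.set_data_layout(DataLayout::NHWC);
wei_q.set_data_layout(DataLayout::NHWC);
src_q.set_quantization_info(QuantizationInfo(0.05f, 10));
wei_q.set_quantization_info(QuantizationInfo(std::vector<float>(16, 0.02f))); // one scale per IFM

CpuDepthwiseConv2dAssemblyWrapperKernel q_kernel;
q_kernel.configure(&src_q, &wei_q, &bia_q, &dst_q, conv_info, NEScheduler::get().cpu_info());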

Definition at line 207 of file CpuDepthwiseConv2dAssemblyWrapperKernel.cpp.

References ARM_COMPUTE_ERROR_ON_NULLPTR, ARM_COMPUTE_UNUSED, arm_compute::auto_init_if_empty(), arm_compute::calculate_max_window(), ICloneable< T >::clone(), arm_compute::misc::shape_calculator::compute_depthwise_convolution_shape(), ITensorInfo::data_type(), arm_compute::test::validation::dst, arm_compute::test::validation::dst_shape, arm_compute::F16, arm_compute::F32, arm_compute::test::validation::info, arm_compute::is_data_type_quantized_per_channel(), arm_compute::QASYMM8, arm_compute::QASYMM8_SIGNED, and arm_compute::test::validation::src.

209 {
210  ARM_COMPUTE_UNUSED(cpu_info);
212 
213  // Destination initialization if not yet initialized
214  const TensorShape dst_shape = compute_depthwise_convolution_shape(*src, *weights, info);
215  auto_init_if_empty(*dst, src->clone()->set_tensor_shape(dst_shape));
216 
217 #if defined(__aarch64__)
218  switch(src->data_type())
219  {
220  case DataType::QASYMM8:
221  if(is_data_type_quantized_per_channel(weights->data_type()))
222  {
223  create_arm_dwc_quant<uint8_t, int8_t, uint8_t>(src, weights, dst, info, cpu_info, _kernel_asm, _multipliers, _right_shifts, _left_shifts);
224  }
225  else
226  {
227  create_arm_dwc_quant<uint8_t, uint8_t, uint8_t>(src, weights, dst, info, cpu_info, _kernel_asm, _multipliers, _right_shifts, _left_shifts);
228  }
229  break;
 230  case DataType::QASYMM8_SIGNED:
 231  create_arm_dwc_quant<int8_t, int8_t, int8_t>(src, weights, dst, info, cpu_info, _kernel_asm, _multipliers, _right_shifts, _left_shifts);
232  break;
233 #if defined(__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
234  case DataType::F16:
235  create_arm_dwc<float16_t, float16_t, float16_t>(src, weights, dst, info, cpu_info, _kernel_asm);
236  break;
237 #endif // defined(__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
238  case DataType::F32:
239  create_arm_dwc<float, float, float>(src, weights, dst, info, cpu_info, _kernel_asm);
240  break;
241  default:
242  break;
243  }
244 #endif // defined(__aarch64__)
245 
246  Window win = calculate_max_window(*dst, Steps());
247  ICpuKernel::configure(win);
248 }

◆ get_storage_size()

size_t get_storage_size ( ) const

Get the amount of storage space required for the rearranged weights and bias.

Returns
size of workspace

Definition at line 338 of file CpuDepthwiseConv2dAssemblyWrapperKernel.cpp.

339 {
340  return _kernel_asm->get_storage_size();
341 }

◆ get_working_size()

size_t get_working_size ( unsigned int  num_threads,
unsigned int  num_input_channels 
) const

Get size of the workspace needed by the assembly kernel.

Parameters
[in]  num_threads         Maximum number of threads that are going to be spawned.
[in]  num_input_channels  Number of channels of the input tensor.
Returns
size of workspace
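
Together with get_storage_size(), this is what a caller uses to reserve the two auxiliary buffers that run_op() later looks up in the tensor pack (working space and packed parameter storage). A minimal allocation sketch, continuing the earlier example and assuming plain U8 Tensor objects (from arm_compute/runtime/Tensor.h) are used as raw byte carriers:

// Reserve the auxiliary buffers once the kernel is configured (sketch).
Tensor workspace; // bound later as TensorType::ACL_INT_0
Tensor storage;   // bound later as TensorType::ACL_INT_1

const unsigned int num_threads  = NEScheduler::get().num_threads();
const unsigned int num_channels = static_cast<unsigned int>(src_info.dimension(0)); // NHWC: dimension 0 = channels

workspace.allocator()->init(TensorInfo(TensorShape(kernel.get_working_size(num_threads, num_channels)), 1, DataType::U8));
storage.allocator()->init(TensorInfo(TensorShape(kernel.get_storage_size()), 1, DataType::U8));
workspace.allocator()->allocate();
storage.allocator()->allocate();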

Definition at line 343 of file CpuDepthwiseConv2dAssemblyWrapperKernel.cpp.

344 {
345  return _kernel_asm->get_working_size(num_threads, num_input_channels);
346 }

◆ is_configured()

bool is_configured ( ) const

Was the asm kernel successfully configured?

Returns
True if the asm kernel is configured and ready to run

Definition at line 348 of file CpuDepthwiseConv2dAssemblyWrapperKernel.cpp.

349 {
350  return _kernel_asm != nullptr;
351 }

◆ name()

const char * name ( ) const
override virtual

Name of the kernel.

Returns
Kernel name

Implements ICPPKernel.

Definition at line 353 of file CpuDepthwiseConv2dAssemblyWrapperKernel.cpp.

354 {
355  return "CpuDepthwiseConv2dAssemblyWrapperKernel";
356 }

◆ pack_parameters()

void pack_parameters ( void *  parameters_ptr,
void *  bias_ptr,
void *  weights_ptr,
size_t  ld_weights_col,
size_t  ld_weights_row 
)

Pack bias and weights in a storage space for the assembly kernel.

Parameters
[in]  parameters_ptr  Pointer to storage space.
[in]  bias_ptr        Pointer to bias buffer.
[in]  weights_ptr     Pointer to weights buffer.
[in]  ld_weights_col  Columns displacement for the weights tensor.
[in]  ld_weights_row  Rows displacement for the weights tensor.
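
A sketch of how the packing call might be driven, continuing the earlier example and assuming Tensor objects named weights and bias that hold the actual data. The leading dimensions are computed here the same way run_op() computes the src/dst strides (elements including padding); whether the assembly kernel expects elements or bytes here is an assumption to verify against CpuDepthwiseConv2dAssemblyDispatch.

// Pack weights and bias into the storage buffer reserved above (sketch).
const auto   wei_shape      = weights.info()->tensor_shape();
const auto   wei_padding    = weights.info()->padding();
const size_t ld_weights_col = wei_shape[0] + wei_padding.left + wei_padding.right;                    // assumed: elements incl. padding
const size_t ld_weights_row = ld_weights_col * (wei_shape[1] + wei_padding.top + wei_padding.bottom);

auto weights_ptr = weights.buffer() + weights.info()->offset_first_element_in_bytes();
auto bias_ptr    = bias.buffer() + bias.info()->offset_first_element_in_bytes();
auto storage_ptr = storage.buffer() + storage.info()->offset_first_element_in_bytes();

kernel.pack_parameters(storage_ptr, bias_ptr, weights_ptr, ld_weights_col, ld_weights_row);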

Definition at line 333 of file CpuDepthwiseConv2dAssemblyWrapperKernel.cpp.

334 {
 335  _kernel_asm->pack_parameters(parameters_ptr, bias_ptr, weights_ptr, ld_weights_col, ld_weights_row);
336 }

◆ run_op()

void run_op ( ITensorPack &  tensors,
const Window &  window,
const ThreadInfo &  info 
)
override virtual

Execute the kernel on the passed window.

Warning
If is_parallelisable() returns false then the passed window must be equal to window()
Note
The window has to be a region within the window returned by the window() method
The width of the window has to be a multiple of num_elems_processed_per_iteration().
Parameters
[in]  tensors  A vector containing the tensors to operate on.
[in]  window   Region on which to execute the kernel. (Must be a region of the window returned by window())
[in]  info     Info about executing thread and CPU.
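
The tensor IDs the implementation fetches are ACL_SRC_0 (input), ACL_DST (output), ACL_INT_0 (working space) and ACL_INT_1 (packed storage), as listed in the references below. A single-threaded execution sketch, continuing the earlier example and assuming Tensor objects src and dst backed by allocated buffers; in the library the CPP scheduler normally drives run_op() across threads.

// Bind the operands under the IDs run_op() expects, then execute on one thread (sketch).
ITensorPack pack;
pack.add_const_tensor(TensorType::ACL_SRC_0, &src);  // NHWC input
pack.add_tensor(TensorType::ACL_DST, &dst);          // output
pack.add_tensor(TensorType::ACL_INT_0, &workspace);  // per-thread working space
pack.add_tensor(TensorType::ACL_INT_1, &storage);    // packed weights and bias from pack_parameters()

ThreadInfo thread_info{};
thread_info.num_threads = 1; // thread_id defaults to 0
kernel.run_op(pack, kernel.window(), thread_info);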

Reimplemented from ICPPKernel.

Definition at line 296 of file CpuDepthwiseConv2dAssemblyWrapperKernel.cpp.

References arm_compute::ACL_DST, arm_compute::ACL_INT_0, arm_compute::ACL_INT_1, arm_compute::ACL_SRC_0, ARM_COMPUTE_ERROR_ON, ARM_COMPUTE_ERROR_ON_NULLPTR, ARM_COMPUTE_ERROR_ON_UNCONFIGURED_KERNEL, ARM_COMPUTE_UNUSED, ITensor::buffer(), arm_compute::test::validation::dst_shape, ITensorPack::empty(), ITensorPack::get_const_tensor(), ITensorPack::get_tensor(), ITensor::info(), BorderSize::left, ThreadInfo::num_threads, ITensorInfo::offset_first_element_in_bytes(), ITensorInfo::padding(), ITensorInfo::tensor_shape(), and ThreadInfo::thread_id.

297 {
298  ARM_COMPUTE_ERROR_ON_NULLPTR(_kernel_asm.get());
302 
303  ARM_COMPUTE_ERROR_ON(tensors.empty());
304 
305  const ITensor *src = tensors.get_const_tensor(TensorType::ACL_SRC_0);
306  ITensor *dst = tensors.get_tensor(TensorType::ACL_DST);
307  ITensor *workspace = tensors.get_tensor(TensorType::ACL_INT_0);
308  ITensor *storage = tensors.get_tensor(TensorType::ACL_INT_1);
309 
310  const auto src_ptr = src->buffer() + src->info()->offset_first_element_in_bytes();
311  auto dst_ptr = dst->buffer() + dst->info()->offset_first_element_in_bytes();
312  auto working_space = workspace->buffer() + workspace->info()->offset_first_element_in_bytes();
313  auto parameters_ptr = storage->buffer() + storage->info()->offset_first_element_in_bytes();
314 
315  const auto src_shape = src->info()->tensor_shape();
316  const auto dst_shape = dst->info()->tensor_shape();
317  const auto src_padding = src->info()->padding();
318  const auto dst_padding = dst->info()->padding();
319 
320  const size_t ld_src_col = src_shape[0] + src_padding.left + src_padding.right;
321  const size_t ld_src_row = ld_src_col * (src_shape[1] + src_padding.top + src_padding.bottom);
322  const size_t ld_src_batch = ld_src_row * src_shape[2];
323  const size_t ld_dst_col = dst_shape[0] + dst_padding.left + dst_padding.right;
324  const size_t ld_dst_row = ld_dst_col * (dst_shape[1] + dst_padding.top + dst_padding.bottom);
325  const size_t ld_dst_batch = ld_dst_row * dst_shape[2];
326 
327  _kernel_asm->execute(src_ptr, ld_src_col, ld_src_row, ld_src_batch,
328  parameters_ptr,
329  dst_ptr, ld_dst_col, ld_dst_row, ld_dst_batch,
330  working_space, info.thread_id, info.num_threads);
331 }

◆ validate()

Status validate ( const ITensorInfo *  src,
const ITensorInfo *  weights,
const ITensorInfo *  bias,
const ITensorInfo *  dst,
const ConvolutionInfo &  info 
)
static

Indicates whether or not this function can be used to process the given parameters.

Similar to CpuDepthwiseConv2dAssemblyWrapperKernel::configure()

Returns
a status.
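
A short sketch of how the returned status might be consumed when choosing between the assembly path and a generic fallback; the conditions rejected by the implementation (32-bit builds, non-NHWC layouts, dilation other than (1, 1)) surface through Status::error_description().

// Probe assembly support up front and fall back if it is unavailable (sketch; needs <iostream>).
const Status status = CpuDepthwiseConv2dAssemblyWrapperKernel::validate(&src_info, &wei_info, &bia_info, &dst_info, conv_info);
if(!bool(status))
{
    std::cerr << "assembly depthwise kernel rejected: " << status.error_description() << std::endl;
    // ...configure a generic depthwise convolution kernel instead.
}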

Definition at line 250 of file CpuDepthwiseConv2dAssemblyWrapperKernel.cpp.

References ARM_COMPUTE_RETURN_ERROR_MSG, ARM_COMPUTE_RETURN_ERROR_ON, ARM_COMPUTE_RETURN_ERROR_ON_CPU_F16_UNSUPPORTED, ARM_COMPUTE_RETURN_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN, ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES, ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DIMENSIONS, ARM_COMPUTE_RETURN_ERROR_ON_MSG, ARM_COMPUTE_RETURN_ERROR_ON_NULLPTR, arm_compute::misc::shape_calculator::compute_depthwise_convolution_shape(), ITensorInfo::data_layout(), ITensorInfo::data_type(), ConvolutionInfo::dilation, ITensorInfo::dimension(), arm_compute::test::validation::dst_shape, arm_compute::F16, arm_compute::F32, arm_compute::is_data_type_quantized(), arm_compute::is_data_type_quantized_per_channel(), arm_compute::NHWC, ITensorInfo::num_dimensions(), arm_compute::QASYMM8, arm_compute::QASYMM8_SIGNED, arm_compute::QSYMM8_PER_CHANNEL, ITensorInfo::quantization_info(), arm_compute::S32, QuantizationInfo::scale(), ITensorInfo::tensor_shape(), and ITensorInfo::total_size().

Referenced by CpuDepthwiseConv2dAssemblyDispatch::validate().

251 {
253 
254 #if !defined(__aarch64__)
255  ARM_COMPUTE_RETURN_ERROR_MSG("32-bit is not supported by assembly kernels");
256 #endif // !defined(__aarch64__)
259  ARM_COMPUTE_RETURN_ERROR_ON_MSG(src->data_layout() != DataLayout::NHWC, "Only NHWC is supported by assembly kernels");
260  ARM_COMPUTE_RETURN_ERROR_ON_MSG(info.dilation != Size2D(1, 1), "Assembly kernels do not support dilation != (1, 1)");
261 
262  if(is_data_type_quantized_per_channel(weights->data_type()))
263  {
265  ARM_COMPUTE_RETURN_ERROR_ON(weights->dimension(0) != weights->quantization_info().scale().size());
266  }
267  else
268  {
270  }
271 
272  if(bias != nullptr)
273  {
274  ARM_COMPUTE_RETURN_ERROR_ON(bias->num_dimensions() > 1);
275  ARM_COMPUTE_RETURN_ERROR_ON(bias->dimension(0) != weights->dimension(0));
276 
277  if(is_data_type_quantized(src->data_type()))
278  {
280  }
281  else
282  {
284  }
285  }
286 
287  if(dst->total_size() > 0)
288  {
292  }
293  return Status{};
294 }

The documentation for this class was generated from the following files:

CpuDepthwiseConv2dAssemblyWrapperKernel.h
CpuDepthwiseConv2dAssemblyWrapperKernel.cpp