Compute Library 21.08
CpuDepthwiseConv2dAssemblyWrapperKernel Class Reference (final)

This class is a wrapper for the depthwise convolution assembly kernels. More...

#include <CpuDepthwiseConv2dAssemblyWrapperKernel.h>

Collaboration diagram for CpuDepthwiseConv2dAssemblyWrapperKernel (diagram and legend omitted).

Public Member Functions

 CpuDepthwiseConv2dAssemblyWrapperKernel ()
 Default constructor. More...
 
 ~CpuDepthwiseConv2dAssemblyWrapperKernel ()
 
 ARM_COMPUTE_DISALLOW_COPY_ALLOW_MOVE (CpuDepthwiseConv2dAssemblyWrapperKernel)
 
void configure (const ITensorInfo *src, const ITensorInfo *weights, const ITensorInfo *bias, ITensorInfo *dst, const ConvolutionInfo &info, const CPUInfo &cpu_info)
 Initialise the kernel's src and dst. More...
 
void run_op (ITensorPack &tensors, const Window &window, const ThreadInfo &info) override
 Execute the kernel on the passed window. More...
 
const char * name () const override
 Name of the kernel. More...
 
void pack_parameters (void *parameters_ptr, void *bias_ptr, void *weights_ptr, size_t ld_weights_col, size_t ld_weights_row)
 Pack bias and weights in a storage space for the assembly kernel. More...
 
size_t get_storage_size () const
 Get the amount of storage space required for the rearranged weights and bias. More...
 
size_t get_working_size (unsigned int num_threads, unsigned int num_input_channels) const
 Get size of the workspace needed by the assembly kernel. More...
 
bool is_configured () const
 Was the asm kernel successfully configured? More...
 
- Public Member Functions inherited from ICPPKernel
virtual ~ICPPKernel ()=default
 Default destructor. More...
 
virtual void run (const Window &window, const ThreadInfo &info)
 Execute the kernel on the passed window. More...
 
virtual void run_nd (const Window &window, const ThreadInfo &info, const Window &thread_locator)
 Legacy compatibility layer for implementations which do not support thread_locator. In these cases we simply narrow the interface down to the legacy version. More...
 
- Public Member Functions inherited from IKernel
 IKernel ()
 Constructor. More...
 
virtual ~IKernel ()=default
 Destructor. More...
 
virtual bool is_parallelisable () const
 Indicates whether or not the kernel is parallelisable. More...
 
virtual BorderSize border_size () const
 The size of the border for that kernel. More...
 
const Window & window () const
 The maximum window the kernel can be executed on. More...
 
bool is_window_configured () const
 Function to check if the embedded window of this kernel has been configured. More...
 

Static Public Member Functions

static Status validate (const ITensorInfo *src, const ITensorInfo *weights, const ITensorInfo *bias, const ITensorInfo *dst, const ConvolutionInfo &info)
 Indicates whether or not this function can be used to process the given parameters. More...
 

Detailed Description

This class is a wrapper for the depthwise convolution assembly kernels.

Definition at line 47 of file CpuDepthwiseConv2dAssemblyWrapperKernel.h.
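
This kernel lives in the library's internal src/ tree and is normally created and driven by CpuDepthwiseConv2dAssemblyDispatch rather than used directly. As an orientation, the expected call order is validate(), configure(), then the buffer-sizing and packing helpers, and finally run_op(). The sketch below only illustrates that order; the tensor shapes, the ConvolutionInfo field names, the namespace qualification and the internal include path are assumptions, not guarantees taken from this page.

// Orientation sketch: call order of the wrapper kernel.
// Assumed: internal header path, cpu::kernels namespace, ConvolutionInfo field names.
#include "arm_compute/core/TensorInfo.h"
#include "arm_compute/core/Types.h"
#include "arm_compute/runtime/NEON/NEScheduler.h"
#include "src/core/cpu/kernels/internal/CpuDepthwiseConv2dAssemblyWrapperKernel.h" // assumed path

using namespace arm_compute;
using cpu::kernels::CpuDepthwiseConv2dAssemblyWrapperKernel; // assumed namespace

void depthwise_asm_sketch()
{
    // NHWC tensor descriptions (channels in dimension 0 assumed): 16 channels, 32x32 spatial, batch 1.
    TensorInfo src_info(TensorShape(16U, 32U, 32U, 1U), 1, DataType::F32);
    TensorInfo wei_info(TensorShape(16U, 3U, 3U), 1, DataType::F32); // 3D weights, IFM = 16
    TensorInfo bia_info(TensorShape(16U), 1, DataType::F32);
    TensorInfo dst_info{}; // left empty; configure() auto-initialises it
    src_info.set_data_layout(DataLayout::NHWC);
    wei_info.set_data_layout(DataLayout::NHWC);

    ConvolutionInfo conv_info{};
    conv_info.pad_stride_info = PadStrideInfo(1, 1, 1, 1); // stride 1, pad 1; field name assumed

    CpuDepthwiseConv2dAssemblyWrapperKernel kernel;
    if(bool(CpuDepthwiseConv2dAssemblyWrapperKernel::validate(&src_info, &wei_info, &bia_info, &dst_info, conv_info)))
    {
        kernel.configure(&src_info, &wei_info, &bia_info, &dst_info, conv_info, NEScheduler::get().cpu_info());
    }

    // Next steps, shown in the per-function sketches further down this page:
    //   1. get_storage_size() / get_working_size() -> reserve the two auxiliary buffers
    //   2. pack_parameters()                       -> rearrange weights and bias into the storage buffer
    //   3. run_op()                                -> execute with an ITensorPack
}

In the library itself this sequence is hidden behind CpuDepthwiseConv2dAssemblyDispatch, which also owns the auxiliary buffers.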

Constructor & Destructor Documentation

◆ CpuDepthwiseConv2dAssemblyWrapperKernel()

Default constructor.

Definition at line 197 of file CpuDepthwiseConv2dAssemblyWrapperKernel.cpp.

197 CpuDepthwiseConv2dAssemblyWrapperKernel::CpuDepthwiseConv2dAssemblyWrapperKernel()
198  : _kernel_asm(nullptr),
199  _multipliers(),
200  _left_shifts(),
201  _right_shifts()
202 {
203 }

◆ ~CpuDepthwiseConv2dAssemblyWrapperKernel()

Member Function Documentation

◆ ARM_COMPUTE_DISALLOW_COPY_ALLOW_MOVE()

ARM_COMPUTE_DISALLOW_COPY_ALLOW_MOVE ( CpuDepthwiseConv2dAssemblyWrapperKernel  )

◆ configure()

void configure ( const ITensorInfo *  src,
const ITensorInfo *  weights,
const ITensorInfo *  bias,
ITensorInfo *  dst,
const ConvolutionInfo &  info,
const CPUInfo &  cpu_info 
)

Initialise the kernel's src and dst.

Parameters
[in]  src       Source tensor info. Data types supported: QASYMM8/QASYMM8_SIGNED/F16/F32.
[in]  weights   Weights tensor info. These are 3D tensors with shape [kernel_x, kernel_y, IFM]. Data type supported: same as src, or QASYMM8/QASYMM8_SIGNED/QSYMM8_PER_CHANNEL when src is QASYMM8/QASYMM8_SIGNED.
[in]  bias      Bias tensor info. A 1D tensor with shape [IFM]. Must be nullptr if not needed. Data type supported: same as src, or S32 when src is QASYMM8/QASYMM8_SIGNED.
[out] dst       Destination tensor info. Data type supported: same as src.
[in]  info      Depthwise convolution layer meta-data.
[in]  cpu_info  CPU information needed to select the most appropriate kernel.
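
For the quantized combinations listed above (QASYMM8 or QASYMM8_SIGNED src, optionally QSYMM8_PER_CHANNEL weights, S32 bias), the tensor infos additionally carry quantization metadata. A minimal sketch, continuing the example from the Detailed Description; the scales, offsets and shapes are illustrative assumptions.

// Quantized configuration sketch (illustrative values; needs <vector> and "arm_compute/core/QuantizationInfo.h").
TensorInfo src_q(TensorShape(16U, 32U, 32U, 1U), 1, DataType::QASYMM8);
TensorInfo wei_q(TensorShape(16U, 3U, 3U), 1, DataType::QSYMM8_PER_CHANNEL);
TensorInfo bia_q(TensorShape(16U), 1, DataType::S32);
TensorInfo dst_q{}; // auto-initialised from src by configure()
src_q.set_data_layout(DataLayout::NHWC);
wei_q.set_data_layout(DataLayout::NHWC);
src_q.set_quantization_info(QuantizationInfo(0.05f, 10));
wei_q.set_quantization_info(QuantizationInfo(std::vector<float>(16, 0.02f))); // one scale per IFM

CpuDepthwiseConv2dAssemblyWrapperKernel q_kernel;
q_kernel.configure(&src_q, &wei_q, &bia_q, &dst_q, conv_info, NEScheduler::get().cpu_info());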

Definition at line 207 of file CpuDepthwiseConv2dAssemblyWrapperKernel.cpp.

References ARM_COMPUTE_ERROR_ON_NULLPTR, ARM_COMPUTE_UNUSED, arm_compute::auto_init_if_empty(), arm_compute::calculate_max_window(), ICloneable< T >::clone(), arm_compute::misc::shape_calculator::compute_depthwise_convolution_shape(), ITensorInfo::data_type(), arm_compute::test::validation::dst, arm_compute::test::validation::dst_shape, arm_compute::F16, arm_compute::F32, arm_compute::test::validation::info, arm_compute::is_data_type_quantized_per_channel(), arm_compute::QASYMM8, arm_compute::QASYMM8_SIGNED, and arm_compute::test::validation::src.

209 {
210  ARM_COMPUTE_UNUSED(cpu_info);
212 
213  // Destination initialization if not yet initialized
214  const TensorShape dst_shape = compute_depthwise_convolution_shape(*src, *weights, info);
215  auto_init_if_empty(*dst, src->clone()->set_tensor_shape(dst_shape));
216 
217 #if defined(__aarch64__)
218  switch(src->data_type())
219  {
220  case DataType::QASYMM8:
221  if(is_data_type_quantized_per_channel(weights->data_type()))
222  {
223  create_arm_dwc_quant<uint8_t, int8_t, uint8_t>(src, weights, dst, info, cpu_info, _kernel_asm, _multipliers, _right_shifts, _left_shifts);
224  }
225  else
226  {
227  create_arm_dwc_quant<uint8_t, uint8_t, uint8_t>(src, weights, dst, info, cpu_info, _kernel_asm, _multipliers, _right_shifts, _left_shifts);
228  }
229  break;
 230  case DataType::QASYMM8_SIGNED:
 231  create_arm_dwc_quant<int8_t, int8_t, int8_t>(src, weights, dst, info, cpu_info, _kernel_asm, _multipliers, _right_shifts, _left_shifts);
232  break;
233 #if defined(__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
234  case DataType::F16:
235  create_arm_dwc<float16_t, float16_t, float16_t>(src, weights, dst, info, cpu_info, _kernel_asm);
236  break;
237 #endif // defined(__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
238  case DataType::F32:
239  create_arm_dwc<float, float, float>(src, weights, dst, info, cpu_info, _kernel_asm);
240  break;
241  default:
242  break;
243  }
244 #endif // defined(__aarch64__)
245 
246  Window win = calculate_max_window(*dst, Steps());
247  ICpuKernel::configure(win);
248 }

◆ get_storage_size()

size_t get_storage_size ( ) const

Get the amount of storage space required for the rearranged weights and bias.

Returns
size of workspace

Definition at line 338 of file CpuDepthwiseConv2dAssemblyWrapperKernel.cpp.

339 {
340  return _kernel_asm->get_storage_size();
341 }

◆ get_working_size()

size_t get_working_size ( unsigned int  num_threads,
unsigned int  num_input_channels 
) const

Get size of the workspace needed by the assembly kernel.

Parameters
[in]  num_threads         Maximum number of threads that are going to be spawned.
[in]  num_input_channels  Number of channels of the input tensor.
Returns
size of workspace
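
Together with get_storage_size(), this is what a caller uses to reserve the two auxiliary buffers that run_op() later looks up in the tensor pack (working space and packed parameter storage). A minimal allocation sketch, continuing the earlier example and assuming plain U8 Tensor objects (from arm_compute/runtime/Tensor.h) are used as raw byte carriers:

// Reserve the auxiliary buffers once the kernel is configured (sketch).
Tensor workspace; // bound later as TensorType::ACL_INT_0
Tensor storage;   // bound later as TensorType::ACL_INT_1

const unsigned int num_threads  = NEScheduler::get().num_threads();
const unsigned int num_channels = static_cast<unsigned int>(src_info.dimension(0)); // NHWC: dimension 0 = channels

workspace.allocator()->init(TensorInfo(TensorShape(kernel.get_working_size(num_threads, num_channels)), 1, DataType::U8));
storage.allocator()->init(TensorInfo(TensorShape(kernel.get_storage_size()), 1, DataType::U8));
workspace.allocator()->allocate();
storage.allocator()->allocate();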

Definition at line 343 of file CpuDepthwiseConv2dAssemblyWrapperKernel.cpp.

344 {
345  return _kernel_asm->get_working_size(num_threads, num_input_channels);
346 }

◆ is_configured()

bool is_configured ( ) const

Was the asm kernel successfully configured?

Returns
True if the asm kernel is configured and ready to run

Definition at line 348 of file CpuDepthwiseConv2dAssemblyWrapperKernel.cpp.

349 {
350  return _kernel_asm != nullptr;
351 }

◆ name()

const char * name ( ) const
override virtual

Name of the kernel.

Returns
Kernel name

Implements ICPPKernel.

Definition at line 353 of file CpuDepthwiseConv2dAssemblyWrapperKernel.cpp.

354 {
355  return "CpuDepthwiseConv2dAssemblyWrapperKernel";
356 }

◆ pack_parameters()

void pack_parameters ( void *  parameters_ptr,
void *  bias_ptr,
void *  weights_ptr,
size_t  ld_weights_col,
size_t  ld_weights_row 
)

Pack bias and weights in a storage space for the assembly kernel.

Parameters
[in]  parameters_ptr  Pointer to storage space.
[in]  bias_ptr        Pointer to bias buffer.
[in]  weights_ptr     Pointer to weights buffer.
[in]  ld_weights_col  Columns displacement for the weights tensor.
[in]  ld_weights_row  Rows displacement for the weights tensor.
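
A sketch of how the packing call might be driven, continuing the earlier example and assuming Tensor objects named weights and bias that hold the actual data. The leading dimensions are computed here the same way run_op() computes the src/dst strides (elements including padding); whether the assembly kernel expects elements or bytes here is an assumption to verify against CpuDepthwiseConv2dAssemblyDispatch.

// Pack weights and bias into the storage buffer reserved above (sketch).
const auto   wei_shape      = weights.info()->tensor_shape();
const auto   wei_padding    = weights.info()->padding();
const size_t ld_weights_col = wei_shape[0] + wei_padding.left + wei_padding.right;                    // assumed: elements incl. padding
const size_t ld_weights_row = ld_weights_col * (wei_shape[1] + wei_padding.top + wei_padding.bottom);

auto weights_ptr = weights.buffer() + weights.info()->offset_first_element_in_bytes();
auto bias_ptr    = bias.buffer() + bias.info()->offset_first_element_in_bytes();
auto storage_ptr = storage.buffer() + storage.info()->offset_first_element_in_bytes();

kernel.pack_parameters(storage_ptr, bias_ptr, weights_ptr, ld_weights_col, ld_weights_row);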

Definition at line 333 of file CpuDepthwiseConv2dAssemblyWrapperKernel.cpp.

334 {
 335  _kernel_asm->pack_parameters(parameters_ptr, bias_ptr, weights_ptr, ld_weights_col, ld_weights_row);
336 }

◆ run_op()

void run_op ( ITensorPack &  tensors,
const Window &  window,
const ThreadInfo &  info 
)
override virtual

Execute the kernel on the passed window.

Warning
If is_parallelisable() returns false then the passed window must be equal to window()
Note
The window has to be a region within the window returned by the window() method
The width of the window has to be a multiple of num_elems_processed_per_iteration().
Parameters
[in]  tensors  A vector containing the tensors to operate on.
[in]  window   Region on which to execute the kernel. (Must be a region of the window returned by window())
[in]  info     Info about executing thread and CPU.
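
The tensor IDs the implementation fetches are ACL_SRC_0 (input), ACL_DST (output), ACL_INT_0 (working space) and ACL_INT_1 (packed storage), as listed in the references below. A single-threaded execution sketch, continuing the earlier example and assuming Tensor objects src and dst backed by allocated buffers; in the library the CPP scheduler normally drives run_op() across threads.

// Bind the operands under the IDs run_op() expects, then execute on one thread (sketch).
ITensorPack pack;
pack.add_const_tensor(TensorType::ACL_SRC_0, &src);  // NHWC input
pack.add_tensor(TensorType::ACL_DST, &dst);          // output
pack.add_tensor(TensorType::ACL_INT_0, &workspace);  // per-thread working space
pack.add_tensor(TensorType::ACL_INT_1, &storage);    // packed weights and bias from pack_parameters()

ThreadInfo thread_info{};
thread_info.num_threads = 1; // thread_id defaults to 0
kernel.run_op(pack, kernel.window(), thread_info);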

Reimplemented from ICPPKernel.

Definition at line 296 of file CpuDepthwiseConv2dAssemblyWrapperKernel.cpp.

References arm_compute::ACL_DST, arm_compute::ACL_INT_0, arm_compute::ACL_INT_1, arm_compute::ACL_SRC_0, ARM_COMPUTE_ERROR_ON, ARM_COMPUTE_ERROR_ON_NULLPTR, ARM_COMPUTE_ERROR_ON_UNCONFIGURED_KERNEL, ARM_COMPUTE_UNUSED, ITensor::buffer(), arm_compute::test::validation::dst_shape, ITensorPack::empty(), ITensorPack::get_const_tensor(), ITensorPack::get_tensor(), ITensor::info(), BorderSize::left, ThreadInfo::num_threads, ITensorInfo::offset_first_element_in_bytes(), ITensorInfo::padding(), ITensorInfo::tensor_shape(), and ThreadInfo::thread_id.

297 {
298  ARM_COMPUTE_ERROR_ON_NULLPTR(_kernel_asm.get());
302 
303  ARM_COMPUTE_ERROR_ON(tensors.empty());
304 
305  const ITensor *src = tensors.get_const_tensor(TensorType::ACL_SRC_0);
306  ITensor *dst = tensors.get_tensor(TensorType::ACL_DST);
307  ITensor *workspace = tensors.get_tensor(TensorType::ACL_INT_0);
308  ITensor *storage = tensors.get_tensor(TensorType::ACL_INT_1);
309 
310  const auto src_ptr = src->buffer() + src->info()->offset_first_element_in_bytes();
311  auto dst_ptr = dst->buffer() + dst->info()->offset_first_element_in_bytes();
312  auto working_space = workspace->buffer() + workspace->info()->offset_first_element_in_bytes();
313  auto parameters_ptr = storage->buffer() + storage->info()->offset_first_element_in_bytes();
314 
315  const auto src_shape = src->info()->tensor_shape();
316  const auto dst_shape = dst->info()->tensor_shape();
317  const auto src_padding = src->info()->padding();
318  const auto dst_padding = dst->info()->padding();
319 
320  const size_t ld_src_col = src_shape[0] + src_padding.left + src_padding.right;
321  const size_t ld_src_row = ld_src_col * (src_shape[1] + src_padding.top + src_padding.bottom);
322  const size_t ld_src_batch = ld_src_row * src_shape[2];
323  const size_t ld_dst_col = dst_shape[0] + dst_padding.left + dst_padding.right;
324  const size_t ld_dst_row = ld_dst_col * (dst_shape[1] + dst_padding.top + dst_padding.bottom);
325  const size_t ld_dst_batch = ld_dst_row * dst_shape[2];
326 
327  _kernel_asm->execute(src_ptr, ld_src_col, ld_src_row, ld_src_batch,
328  parameters_ptr,
329  dst_ptr, ld_dst_col, ld_dst_row, ld_dst_batch,
330  working_space, info.thread_id, info.num_threads);
331 }

◆ validate()

Status validate ( const ITensorInfo *  src,
const ITensorInfo *  weights,
const ITensorInfo *  bias,
const ITensorInfo *  dst,
const ConvolutionInfo &  info 
)
static

Indicates whether or not this function can be used to process the given parameters.

Similar to CpuDepthwiseConv2dAssemblyWrapperKernel::configure()

Returns
a status.
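
A short sketch of how the returned status might be consumed when choosing between the assembly path and a generic fallback; the conditions rejected by the implementation (32-bit builds, non-NHWC layouts, dilation other than (1, 1)) surface through Status::error_description().

// Probe assembly support up front and fall back if it is unavailable (sketch; needs <iostream>).
const Status status = CpuDepthwiseConv2dAssemblyWrapperKernel::validate(&src_info, &wei_info, &bia_info, &dst_info, conv_info);
if(!bool(status))
{
    std::cerr << "assembly depthwise kernel rejected: " << status.error_description() << std::endl;
    // ...configure a generic depthwise convolution kernel instead.
}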

Definition at line 250 of file CpuDepthwiseConv2dAssemblyWrapperKernel.cpp.

References ARM_COMPUTE_RETURN_ERROR_MSG, ARM_COMPUTE_RETURN_ERROR_ON, ARM_COMPUTE_RETURN_ERROR_ON_CPU_F16_UNSUPPORTED, ARM_COMPUTE_RETURN_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN, ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES, ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DIMENSIONS, ARM_COMPUTE_RETURN_ERROR_ON_MSG, ARM_COMPUTE_RETURN_ERROR_ON_NULLPTR, arm_compute::misc::shape_calculator::compute_depthwise_convolution_shape(), ITensorInfo::data_layout(), ITensorInfo::data_type(), ConvolutionInfo::dilation, ITensorInfo::dimension(), arm_compute::test::validation::dst_shape, arm_compute::F16, arm_compute::F32, arm_compute::is_data_type_quantized(), arm_compute::is_data_type_quantized_per_channel(), arm_compute::NHWC, ITensorInfo::num_dimensions(), arm_compute::QASYMM8, arm_compute::QASYMM8_SIGNED, arm_compute::QSYMM8_PER_CHANNEL, ITensorInfo::quantization_info(), arm_compute::S32, QuantizationInfo::scale(), ITensorInfo::tensor_shape(), and ITensorInfo::total_size().

Referenced by CpuDepthwiseConv2dAssemblyDispatch::validate().

251 {
253 
254 #if !defined(__aarch64__)
255  ARM_COMPUTE_RETURN_ERROR_MSG("32-bit is not supported by assembly kernels");
256 #endif // !defined(__aarch64__)
259  ARM_COMPUTE_RETURN_ERROR_ON_MSG(src->data_layout() != DataLayout::NHWC, "Only NHWC is supported by assembly kernels");
260  ARM_COMPUTE_RETURN_ERROR_ON_MSG(info.dilation != Size2D(1, 1), "Assembly kernels do not support dilation != (1, 1)");
261 
262  if(is_data_type_quantized_per_channel(weights->data_type()))
263  {
265  ARM_COMPUTE_RETURN_ERROR_ON(weights->dimension(0) != weights->quantization_info().scale().size());
266  }
267  else
268  {
270  }
271 
272  if(bias != nullptr)
273  {
274  ARM_COMPUTE_RETURN_ERROR_ON(bias->num_dimensions() > 1);
275  ARM_COMPUTE_RETURN_ERROR_ON(bias->dimension(0) != weights->dimension(0));
276 
277  if(is_data_type_quantized(src->data_type()))
278  {
280  }
281  else
282  {
284  }
285  }
286 
287  if(dst->total_size() > 0)
288  {
292  }
293  return Status{};
294 }

The documentation for this class was generated from the following files:

CpuDepthwiseConv2dAssemblyWrapperKernel.h
CpuDepthwiseConv2dAssemblyWrapperKernel.cpp