Depthwise convolution assembly kernel glue.

#include <NEDepthwiseConvolutionAssemblyDispatch.h>

Public Member Functions
	NEDepthwiseConvolutionAssemblyDispatch (std::shared_ptr< IMemoryManager > memory_manager=nullptr)
	Default constructor.

	NEDepthwiseConvolutionAssemblyDispatch (const NEDepthwiseConvolutionAssemblyDispatch &)=delete
	Prevent instances of this class from being copied (As this class contains pointers)

	NEDepthwiseConvolutionAssemblyDispatch (NEDepthwiseConvolutionAssemblyDispatch &&)=default
	Default move constructor.

NEDepthwiseConvolutionAssemblyDispatch &	operator= (const NEDepthwiseConvolutionAssemblyDispatch &)=delete
	Prevent instances of this class from being copied (As this class contains pointers)

NEDepthwiseConvolutionAssemblyDispatch &	operator= (NEDepthwiseConvolutionAssemblyDispatch &&)=default
	Default move assignment operator.

	~NEDepthwiseConvolutionAssemblyDispatch ()
	Default destructor.

void	configure (const ITensor *input, const ITensor *weights, const ITensor *bias, ITensor *output, const PadStrideInfo &conv_info, unsigned int depth_multiplier=1, const ActivationLayerInfo &act_info=ActivationLayerInfo(), const Size2D &dilation=Size2D(1, 1))
	Initialize the function's source, destination, kernels and border_size.

void	run () override
	Run the kernels contained in the function.

void	prepare () override
	Prepare the function for executing.

Public Member Functions inherited from IFunction
virtual	~IFunction ()=default
	Destructor.

Static Public Member Functions
static Status	validate (const ITensorInfo *input, const ITensorInfo *weights, const ITensorInfo *bias, const ITensorInfo *output, const PadStrideInfo &conv_info, unsigned int depth_multiplier=1, const ActivationLayerInfo &act_info=ActivationLayerInfo(), const Size2D &dilation=Size2D(1, 1))
	Static function to check if given info will lead to a valid configuration of NEDepthwiseConvolutionAssemblyDispatch.

static bool	is_optimized_supported (const ITensorInfo *input, const ITensorInfo *weights, PadStrideInfo conv_info, unsigned int depth_multiplier=1, const Size2D &dilation=Size2D(1, 1))
	Check if the optimized kernel can be used for the given kernel sizes and strides.

Detailed Description

Depthwise convolution assembly kernel glue.

Definition at line 36 of file NEDepthwiseConvolutionAssemblyDispatch.h.

Constructor & Destructor Documentation

◆ NEDepthwiseConvolutionAssemblyDispatch() [1/3]

NEDepthwiseConvolutionAssemblyDispatch ( std::shared_ptr< IMemoryManager > memory_manager = nullptr )

Default constructor.

Parameters

[in,out] memory_manager Memory manager to use

◆ NEDepthwiseConvolutionAssemblyDispatch() [2/3]

NEDepthwiseConvolutionAssemblyDispatch ( const NEDepthwiseConvolutionAssemblyDispatch & )

delete

Prevent instances of this class from being copied (As this class contains pointers)

◆ NEDepthwiseConvolutionAssemblyDispatch() [3/3]

NEDepthwiseConvolutionAssemblyDispatch ( NEDepthwiseConvolutionAssemblyDispatch && )

default

Default move constructor.

◆ ~NEDepthwiseConvolutionAssemblyDispatch()

~NEDepthwiseConvolutionAssemblyDispatch ( )

default

Default destructor.

Member Function Documentation

◆ configure()

void configure	(	const ITensor *	input,
		const ITensor *	weights,
		const ITensor *	bias,
		ITensor *	output,
		const PadStrideInfo &	conv_info,
		unsigned int	depth_multiplier = `1`,
		const ActivationLayerInfo &	act_info = `ActivationLayerInfo()`,
		const Size2D &	dilation = `Size2D(1, 1)`
	)

Initialize the function's source, destination, kernels and border_size.

Note: Supports only NHWC format

Parameters

[in]	input	Source tensor. Data type supported: QASYMM8/F16/F32. (Written to only for border filling).
[in]	weights	Weights tensor. These are 3D tensors with shape [W, H, IFM]. Data type supported: Same as `input`.
[in]	bias	(Optional) Biases tensor. A 1D tensor with shape [IFM]. Must be nullptr if not needed. Data type supported: Same as `input`.
[out]	output	Destination tensor. Data type supported: same as `input`.
[in]	conv_info	Padding and stride information to use for the convolution.
[in]	depth_multiplier	(Optional) Multiplier to apply to the input's depth in order to retrieve the output's depth. Defaults to 1.
[in]	act_info	(Optional) Activation layer information in case of a fused activation.
[in]	dilation	(Optional) Dilation, in elements, across x and y. Defaults to (1, 1).

Definition at line 347 of file NEDepthwiseConvolutionAssemblyDispatch.cpp.

References ARM_COMPUTE_ERROR_ON_NULLPTR, ARM_COMPUTE_ERROR_THROW_ON, ARM_COMPUTE_UNUSED, arm_compute::auto_init_if_empty(), ICloneable< T >::clone(), arm_compute::misc::shape_calculator::compute_depthwise_convolution_shape(), arm_compute::test::validation::conv_info, ITensor::info(), arm_compute::test::validation::input, arm_compute::test::validation::output_shape, ITensorInfo::quantization_info(), and NEDepthwiseConvolutionAssemblyDispatch::validate().

 {
     ARM_COMPUTE_ERROR_ON_NULLPTR(input, weights, output);
     ARM_COMPUTE_UNUSED(depth_multiplier);
     ARM_COMPUTE_ERROR_THROW_ON(NEDepthwiseConvolutionAssemblyDispatch::validate(input->info(),
                                                                                 weights->info(),
                                                                                 bias != nullptr ? bias->info() : nullptr,
                                                                                 output->info(),
                                                                                 conv_info,
                                                                                 depth_multiplier,
                                                                                 act_info,
                                                                                 dilation));
 
     // Output auto initialization if not yet initialized
     const TensorShape output_shape = misc::shape_calculator::compute_depthwise_convolution_shape(*input->info(), *weights->info(), conv_info, depth_multiplier, dilation);
     auto_init_if_empty(*output->info(), input->info()->clone()->set_is_resizable(true).reset_padding().set_tensor_shape(output_shape).set_quantization_info(output->info()->quantization_info()));
 
     _input       = input;
     _weights     = weights;
     _bias        = bias;
     _output      = output;
     _is_prepared = false;
 
     // Create convolver
     _pImpl->_dwc_assembly_kernel = create_convolver(input, weights, output, conv_info, act_info, dilation);
     ARM_COMPUTE_ERROR_ON(_pImpl->_dwc_assembly_kernel == nullptr);
 
     // Create assembly kernel wrapper
     _pImpl->_dwc_acl_kernel.configure(_pImpl->_dwc_assembly_kernel.get());
 
     constexpr size_t alignment = 128;
 
     // Create workspace
     const unsigned int num_threads    = NEScheduler::get().num_threads();
     const size_t       workspace_size = _pImpl->_dwc_assembly_kernel->get_working_space_size(num_threads);
     ARM_COMPUTE_ERROR_ON_MSG(workspace_size == 0, "Workspace size cannot be 0 !");
     _workspace.allocator()->init(TensorInfo(TensorShape{ workspace_size }, 1, DataType::S8), alignment);
     _memory_group.manage(&_workspace);
     _workspace.allocator()->allocate();
 
     // Create packing tensor
     const size_t pack_tensor_size = _pImpl->_dwc_assembly_kernel->get_packed_params_size();
     ARM_COMPUTE_ERROR_ON_MSG(pack_tensor_size == 0, "Pack tensor size cannot be 0 !");
     _packed_weights.allocator()->init(TensorInfo(TensorShape{ pack_tensor_size }, 1, DataType::S8), alignment);
 }
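
The output auto-initialization above depends on the shape returned by compute_depthwise_convolution_shape(). As a rough illustration of the spatial arithmetic involved, the sketch below computes the output extent along one axis. The helper name `conv_out_dim` is hypothetical and not part of the library, and floor rounding is assumed; the library's own helper may differ in details such as the rounding mode carried by PadStrideInfo.

```cpp
#include <cassert>

// Hypothetical helper (not a library function): output extent along one
// spatial axis of a convolution, assuming floor rounding.
inline unsigned int conv_out_dim(unsigned int in,
                                 unsigned int kernel,
                                 unsigned int stride,
                                 unsigned int pad_before,
                                 unsigned int pad_after,
                                 unsigned int dilation = 1)
{
    assert(kernel > 0 && stride > 0 && dilation > 0);

    // A dilated kernel spans (kernel - 1) * dilation + 1 input elements.
    const unsigned int effective_kernel = (kernel - 1) * dilation + 1;
    const unsigned int padded           = in + pad_before + pad_after;
    assert(padded >= effective_kernel);

    // Number of stride-sized steps that fit, plus the first position.
    return (padded - effective_kernel) / stride + 1;
}
```

For instance, a 3x3 kernel with stride 1 and one element of padding on each side keeps a 224-wide input at 224, while stride 2 with padding (0, 1) halves it to 112.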

◆ is_optimized_supported()

bool is_optimized_supported	(	const ITensorInfo *	input,
		const ITensorInfo *	weights,
		PadStrideInfo	conv_info,
		unsigned int	depth_multiplier = `1`,
		const Size2D &	dilation = `Size2D(1, 1)`
	)

static

Check if the optimized kernel can be used for the given kernel sizes and strides.

Warning: Even if this returns true, the inputs and outputs might need to be permuted, as the only supported layout is NHWC.

Parameters

[in]	input	Input tensor info.
[in]	weights	Weights tensor info.
[in]	conv_info	Convolution layer metadata.
[in]	depth_multiplier	(Optional) Depth multiplier to be used.
[in]	dilation	(Optional) Dilation, in elements, across x and y. Defaults to (1, 1).

Returns: True if the assembly kernel can be used, false otherwise. Note that the input/output may need to be transformed.

Definition at line 454 of file NEDepthwiseConvolutionAssemblyDispatch.cpp.

References ARM_COMPUTE_ERROR_ON_NULLPTR, arm_compute::calculate_same_pad(), arm_compute::test::validation::data_layout, ITensorInfo::data_layout(), ITensorInfo::data_type(), ITensorInfo::dimension(), Window::DimX, Window::DimY, Window::DimZ, arm_compute::get_data_layout_dimension_index(), arm_compute::HEIGHT, arm_compute::is_data_type_float(), arm_compute::NCHW, arm_compute::NHWC, PadStrideInfo::pad_bottom(), PadStrideInfo::pad_left(), PadStrideInfo::pad_right(), PadStrideInfo::pad_top(), arm_compute::QASYMM8, arm_compute::QASYMM8_SIGNED, arm_compute::QSYMM8_PER_CHANNEL, TensorShape::set(), PadStrideInfo::stride(), ITensorInfo::tensor_shape(), arm_compute::U, arm_compute::WIDTH, Size2D::x(), Dimensions< T >::x(), Size2D::y(), Dimensions< T >::y(), and Dimensions< T >::z().

 {
     ARM_COMPUTE_ERROR_ON_NULLPTR(input, weights);
 
     // Reshape input shape if in NHWC format
     const DataLayout data_layout = input->data_layout();
     TensorShape      in_shape{ input->tensor_shape() };
     if(data_layout == DataLayout::NHWC)
     {
         in_shape.set(Window::DimX, input->tensor_shape().y());
         in_shape.set(Window::DimY, input->tensor_shape().z());
         in_shape.set(Window::DimZ, input->tensor_shape().x());
     }
 
     // Check data type
     // TODO (COMPMID-3004): Add assembly optimized routine for QASYMM8_SIGNED NEDepthwiseConvolutionLayer
     const DataType input_type            = input->data_type();
     const bool     is_input_type_valid   = is_data_type_float(input_type) || input_type == DataType::QASYMM8;
     const DataType weights_type          = weights->data_type();
     const bool     is_weights_type_valid = is_data_type_float(weights_type) || weights_type == DataType::QASYMM8 || weights_type == DataType::QASYMM8_SIGNED
                                            || weights_type == DataType::QSYMM8_PER_CHANNEL;
 
     // Check weights size
     std::set<unsigned int> supported_kernel_sizes = { 3, 5 };
     const unsigned int     width_idx              = get_data_layout_dimension_index(data_layout, DataLayoutDimension::WIDTH);
     const unsigned int     height_idx             = get_data_layout_dimension_index(data_layout, DataLayoutDimension::HEIGHT);
     const unsigned int     kernel_w               = weights->dimension(width_idx);
     const unsigned int     kernel_h               = weights->dimension(height_idx);
     bool                   weights_supported      = (kernel_w == kernel_h) && (supported_kernel_sizes.count(kernel_w) != 0);
 
     // Check for supported strides
     const auto &strides           = conv_info.stride();
     bool        supported_strides = (strides.first == strides.second) && ((strides.first == 1) || (strides.first == 2));
 
     // Check for supported padding
     const auto    pad_top           = conv_info.pad_top();
     const auto    pad_right         = conv_info.pad_right();
     const auto    pad_bottom        = conv_info.pad_bottom();
     const auto    pad_left          = conv_info.pad_left();
     PadStrideInfo same_pad          = calculate_same_pad(in_shape, TensorShape(kernel_w, kernel_h), conv_info, DataLayout::NCHW, dilation);
     bool          is_same_padding   = (pad_top == same_pad.pad_top()) && (pad_right == same_pad.pad_right()) && (pad_bottom == same_pad.pad_bottom()) && (pad_left == same_pad.pad_left());
     bool          is_valid_padding  = (pad_top == 0) && (pad_right == 0) && (pad_bottom == 0) && (pad_left == 0);
     bool          supported_padding = is_same_padding || is_valid_padding;
     // TODO(COMPMID-2464): Enable once dilated conv with stride 2 is supported
     bool is_dilation_supported = ((dilation == Size2D(1U, 1U)) || ((dilation.x() == dilation.y()) && strides.first == 1));
 
     if(weights_type == DataType::QSYMM8_PER_CHANNEL)
     {
         is_dilation_supported = is_dilation_supported && (dilation == Size2D(1U, 1U));
     }
 
     return is_input_type_valid && is_weights_type_valid && weights_supported && supported_strides && supported_padding && (depth_multiplier == 1) && is_dilation_supported;
 }
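
The listing above combines several independent checks. The following standalone sketch restates the shape-related ones (kernel size, stride, padding, depth multiplier, dilation) without any library types, so the rules can be tried in isolation. `DwcQuery`, `same_pad_1d` and `is_candidate` are hypothetical names introduced for illustration; the data type checks are omitted, and the SAME padding formula is an assumption about what calculate_same_pad() computes.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical plain-data description of a depthwise convolution request.
struct DwcQuery
{
    unsigned int in_w, in_h;
    unsigned int kernel_w, kernel_h;
    unsigned int stride_x, stride_y;
    unsigned int dilation_x, dilation_y;
    unsigned int pad_left, pad_right, pad_top, pad_bottom;
    unsigned int depth_multiplier;
    bool         per_channel_weights; // stands in for QSYMM8_PER_CHANNEL weights
};

struct AxisPad
{
    unsigned int before;
    unsigned int after;
};

// Assumed SAME padding rule along one axis: output = ceil(in / stride),
// with any odd remainder placed after the data.
inline AxisPad same_pad_1d(unsigned int in, unsigned int kernel, unsigned int stride, unsigned int dilation)
{
    assert(stride > 0);
    const int out              = static_cast<int>((in + stride - 1) / stride);
    const int effective_kernel = static_cast<int>((kernel - 1) * dilation + 1);
    const int total            = std::max(0, (out - 1) * static_cast<int>(stride) + effective_kernel - static_cast<int>(in));
    return AxisPad{ static_cast<unsigned int>(total / 2), static_cast<unsigned int>(total - total / 2) };
}

inline bool is_candidate(const DwcQuery &q)
{
    // Square 3x3 or 5x5 kernels only.
    const bool kernel_ok = (q.kernel_w == q.kernel_h) && (q.kernel_w == 3 || q.kernel_w == 5);
    // Equal strides of 1 or 2 only.
    const bool stride_ok = (q.stride_x == q.stride_y) && (q.stride_x == 1 || q.stride_x == 2);
    if(!kernel_ok || !stride_ok)
    {
        return false;
    }

    // Padding must be either all zero (VALID) or match SAME padding.
    const AxisPad same_x   = same_pad_1d(q.in_w, q.kernel_w, q.stride_x, q.dilation_x);
    const AxisPad same_y   = same_pad_1d(q.in_h, q.kernel_h, q.stride_y, q.dilation_y);
    const bool    is_same  = q.pad_left == same_x.before && q.pad_right == same_x.after
                             && q.pad_top == same_y.before && q.pad_bottom == same_y.after;
    const bool    is_valid = q.pad_left == 0 && q.pad_right == 0 && q.pad_top == 0 && q.pad_bottom == 0;

    // Dilation other than (1, 1) requires equal factors and stride 1,
    // and is rejected outright for per-channel quantized weights.
    const bool unit_dilation = (q.dilation_x == 1) && (q.dilation_y == 1);
    bool       dilation_ok   = unit_dilation || ((q.dilation_x == q.dilation_y) && q.stride_x == 1);
    if(q.per_channel_weights)
    {
        dilation_ok = dilation_ok && unit_dilation;
    }

    return (is_same || is_valid) && (q.depth_multiplier == 1) && dilation_ok;
}
```

Under these assumptions, a 3x3 kernel with stride 2 on a 56x56 input is accepted with padding (left 0, right 1, top 0, bottom 1) but rejected with symmetric padding of 1, because the latter is neither VALID nor SAME.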

◆ operator=() [1/2]

NEDepthwiseConvolutionAssemblyDispatch& operator= ( const NEDepthwiseConvolutionAssemblyDispatch & )

delete

Prevent instances of this class from being copied (As this class contains pointers)

◆ operator=() [2/2]

NEDepthwiseConvolutionAssemblyDispatch& operator= ( NEDepthwiseConvolutionAssemblyDispatch && )

default

Default move assignment operator.

◆ prepare()

void prepare ( )

override virtual

Prepare the function for executing.

Any one-off pre-processing step required by the function is handled here.

Note: The prepare stage might not need all of the function's buffers' backing memory to be available in order to execute.

Reimplemented from IFunction.

Definition at line 543 of file NEDepthwiseConvolutionAssemblyDispatch.cpp.

 {
     if(!_is_prepared)
     {
         _packed_weights.allocator()->allocate();
         ARM_COMPUTE_ERROR_ON(_packed_weights.buffer() == nullptr);
 
         // Pack weights and bias
         const int weights_element_size = _weights->info()->element_size();
         const int weights_row_stride   = _weights->info()->strides_in_bytes().z() / weights_element_size;
         const int weights_col_stride   = _weights->info()->strides_in_bytes().y() / weights_element_size;
         _pImpl->_dwc_assembly_kernel->pack_params(_packed_weights.buffer(),
                                                   _weights->buffer() + _weights->info()->offset_first_element_in_bytes(),
                                                   weights_row_stride,
                                                   weights_col_stride,
                                                   (_bias != nullptr) ? _bias->buffer() : nullptr);
         _pImpl->_dwc_assembly_kernel->set_packed_params_buffer(_packed_weights.buffer());
 
         _weights->mark_as_unused();
         if(_bias != nullptr)
         {
             _bias->mark_as_unused();
         }
         _is_prepared = true;
     }
 }
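
The `_is_prepared` flag makes prepare() idempotent: the weights are packed on the first call only, and configure() resets the flag so that a reconfigured function packs again. A minimal sketch of that pattern, using a hypothetical stand-in class with counters in place of the real packing and scheduling work:

```cpp
#include <cassert>

// Hypothetical stand-in (not a library class) showing the one-time
// preparation guard used by prepare() and run().
class LazyPreparedFunction
{
public:
    void configure()
    {
        // Reconfiguring invalidates any previously packed parameters.
        _is_prepared = false;
    }

    void prepare()
    {
        if(!_is_prepared)
        {
            ++pack_count; // stands in for packing weights and bias
            _is_prepared = true;
        }
    }

    void run()
    {
        prepare(); // no-op after the first call
        ++run_count; // stands in for scheduling the kernel
    }

    int pack_count{ 0 };
    int run_count{ 0 };

private:
    bool _is_prepared{ false };
};
```

Calling run() repeatedly therefore pays the packing cost once; callers who want to control when that cost is paid can call prepare() explicitly beforehand.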

◆ run()

void run ( )

override virtual

Run the kernels contained in the function.

For Neon kernels:

Multi-threading is used for the kernels which are parallelisable.
By default std::thread::hardware_concurrency() threads are used.

Note: CPPScheduler::set_num_threads() can be used to manually set the number of threads

For OpenCL kernels:

All the kernels are enqueued on the queue associated with CLScheduler.
The queue is then flushed.

Note: The function will not block until the kernels are executed. It is the user's responsibility to wait.

Note: Calls prepare() on the first run if it has not been done already.

Implements IFunction.

Definition at line 512 of file NEDepthwiseConvolutionAssemblyDispatch.cpp.

References ARM_COMPUTE_ERROR_ON.

 {
     // Prepare assembly kernel
     prepare();
 
     MemoryGroupResourceScope scope_mg(_memory_group);
 
     // Setup inputs/outputs
     ARM_COMPUTE_ERROR_ON(_workspace.buffer() == nullptr);
     _pImpl->_dwc_assembly_kernel->set_working_space(static_cast<void *>(_workspace.buffer()));
 
     ARM_COMPUTE_ERROR_ON(_input->buffer() == nullptr);
     const int   input_element_size = _input->info()->element_size();
     const int   input_batch_stride = _input->info()->strides_in_bytes()[3] / input_element_size;
     const int   input_row_stride   = _input->info()->strides_in_bytes().z() / input_element_size;
     const int   input_col_stride   = _input->info()->strides_in_bytes().y() / input_element_size;
     const void *input_ptr          = _input->buffer() + _input->info()->offset_first_element_in_bytes();
     _pImpl->_dwc_assembly_kernel->set_input(input_ptr, input_batch_stride, input_row_stride, input_col_stride);
 
     ARM_COMPUTE_ERROR_ON(_output->buffer() == nullptr);
     const int output_element_size = _output->info()->element_size();
     const int output_batch_stride = _output->info()->strides_in_bytes()[3] / output_element_size;
     const int output_row_stride   = _output->info()->strides_in_bytes().z() / output_element_size;
     const int output_col_stride   = _output->info()->strides_in_bytes().y() / output_element_size;
     void     *output_ptr          = _output->buffer() + _output->info()->offset_first_element_in_bytes();
     _pImpl->_dwc_assembly_kernel->set_output(output_ptr, output_batch_stride, output_row_stride, output_col_stride);
 
     // Schedule assembly kernel
     NEScheduler::get().schedule(&_pImpl->_dwc_acl_kernel, Window::DimX);
 }
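
run() passes strides to the assembly kernel in elements rather than bytes, dividing each entry of strides_in_bytes() by the element size. The sketch below reproduces that conversion for a densely packed four-dimensional tensor. The helper names are hypothetical, and the assumptions are that dimension 0 holds channels, dimension 1 width, dimension 2 height and dimension 3 batches (the NHWC case), with no padding between rows.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Strides expressed in elements, as handed to the assembly kernel.
struct ElementStrides
{
    int batch;
    int row;
    int col;
};

// Hypothetical helper: byte strides of a dense tensor, innermost dimension first.
// Real tensors may carry padding, in which case the strides would be larger.
inline std::array<std::size_t, 4> dense_strides_in_bytes(const std::array<std::size_t, 4> &shape, std::size_t element_size)
{
    std::array<std::size_t, 4> strides{};
    std::size_t                stride = element_size;
    for(std::size_t d = 0; d < shape.size(); ++d)
    {
        strides[d] = stride;
        stride *= shape[d];
    }
    return strides;
}

// Mirrors the divisions in run(): index 3 is the batch stride,
// index 2 (z) the row stride and index 1 (y) the column stride.
inline ElementStrides to_element_strides(const std::array<std::size_t, 4> &strides_in_bytes, std::size_t element_size)
{
    assert(element_size > 0);
    return ElementStrides{ static_cast<int>(strides_in_bytes[3] / element_size),
                           static_cast<int>(strides_in_bytes[2] / element_size),
                           static_cast<int>(strides_in_bytes[1] / element_size) };
}
```

For a dense F32 tensor with 32 channels and a 56x56 spatial extent, this gives a column stride of 32 elements, a row stride of 32 * 56 = 1792 and a batch stride of 1792 * 56 = 100352.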

◆ validate()

Status validate	(	const ITensorInfo *	input,
		const ITensorInfo *	weights,
		const ITensorInfo *	bias,
		const ITensorInfo *	output,
		const PadStrideInfo &	conv_info,
		unsigned int	depth_multiplier = `1`,
		const ActivationLayerInfo &	act_info = `ActivationLayerInfo()`,
		const Size2D &	dilation = `Size2D(1, 1)`
	)

static

Static function to check if given info will lead to a valid configuration of NEDepthwiseConvolutionAssemblyDispatch.

Note: Supports only NHWC format

Parameters

[in]	input	Source tensor. Data type supported: QASYMM8/F16/F32. (Written to only for border filling).
[in]	weights	Weights tensor. These are 3D tensors with shape [W, H, IFM]. Data type supported: Same as `input`.
[in]	bias	(Optional) Biases tensor. A 1D tensor with shape [IFM]. Must be nullptr if not needed. Data type supported: Same as `input`.
[out]	output	Destination tensor. Data type supported: same as `input`.
[in]	conv_info	Padding and stride information to use for the convolution.
[in]	depth_multiplier	(Optional) Multiplier to apply to the input's depth in order to retrieve the output's depth. Defaults to 1.
[in]	act_info	(Optional) Activation layer information in case of a fused activation.
[in]	dilation	(Optional) Dilation, in elements, across x and y. Defaults to (1, 1).

Returns: A status, which carries an error if the configuration is not valid.

Definition at line 400 of file NEDepthwiseConvolutionAssemblyDispatch.cpp.

References ARM_COMPUTE_RETURN_ERROR_ON, ARM_COMPUTE_RETURN_ERROR_ON_CPU_F16_UNSUPPORTED, ARM_COMPUTE_RETURN_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN, ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_LAYOUT, ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES, ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DIMENSIONS, arm_compute::CHANNEL, arm_compute::misc::shape_calculator::compute_depthwise_convolution_shape(), ITensorInfo::data_layout(), ITensorInfo::data_type(), ITensorInfo::dimension(), ActivationLayerInfo::enabled(), arm_compute::F16, arm_compute::F32, arm_compute::get_data_layout_dimension_index(), arm_compute::utils::info_helpers::is_relu(), arm_compute::utils::info_helpers::is_relu6(), ITensorInfo::num_dimensions(), arm_compute::test::validation::output_shape, arm_compute::QASYMM8, arm_compute::QSYMM8_PER_CHANNEL, ITensorInfo::quantization_info(), UniformQuantizationInfo::scale, QuantizationInfo::scale(), ITensorInfo::tensor_shape(), ITensorInfo::total_size(), and QuantizationInfo::uniform().

Referenced by NEDepthwiseConvolutionAssemblyDispatch::configure().

 {
     ARM_COMPUTE_RETURN_ERROR_ON_CPU_F16_UNSUPPORTED(input);
     ARM_COMPUTE_RETURN_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN(input, 1, DataType::QASYMM8, DataType::F16, DataType::F32);
     if(weights->data_type() != DataType::QSYMM8_PER_CHANNEL)
     {
         ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES(input, weights);
     }
     ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_LAYOUT(input, weights);
 
     // Validate convolver
     ARM_COMPUTE_RETURN_ERROR_ON(!is_optimized_supported(input, weights, conv_info, depth_multiplier, dilation));
 
     // Validate activation
     const bool is_relu  = arm_compute::utils::info_helpers::is_relu(act_info);
     const bool is_relu6 = arm_compute::utils::info_helpers::is_relu6(act_info);
     ARM_COMPUTE_RETURN_ERROR_ON(act_info.enabled() && !(is_relu || is_relu6));
 
     // Check bias
     if(bias != nullptr)
     {
         unsigned int channel_idx = get_data_layout_dimension_index(input->data_layout(), DataLayoutDimension::CHANNEL);
         ARM_COMPUTE_RETURN_ERROR_ON(bias->num_dimensions() > 1);
         ARM_COMPUTE_RETURN_ERROR_ON(bias->dimension(0) != weights->dimension(channel_idx));
     }
 
     // Check output
     if(output->total_size() != 0)
     {
         const TensorShape output_shape = misc::shape_calculator::compute_depthwise_convolution_shape(*input, *weights, conv_info, depth_multiplier, dilation);
         ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DIMENSIONS(output->tensor_shape(), output_shape);
         ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES(input, output);
     }
 
     // The uniform quantization case will only have 1 scale value in the weights quantization info
     const UniformQuantizationInfo input_qinfo   = input->quantization_info().uniform();
     const QuantizationInfo        weights_qinfo = weights->quantization_info();
     const UniformQuantizationInfo output_qinfo  = output->quantization_info().uniform();
     for(auto const s : weights_qinfo.scale())
     {
         const float fmultipler = input_qinfo.scale * s / output_qinfo.scale;
         ARM_COMPUTE_RETURN_ERROR_ON(fmultipler > 1.f);
     }
 
     return Status{};
 }
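
The final loop in the listing rejects any configuration whose requantization multiplier, input scale times weight scale divided by output scale, exceeds 1. A standalone sketch of that check follows; `rescale_multipliers_ok` is a hypothetical name, and plain floats stand in for the library's quantization info types.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: returns true when every per-weight-scale multiplier
// input_scale * weight_scale / output_scale is at most 1.
// With uniform quantization the vector holds a single scale; with
// per-channel quantization it holds one scale per channel.
inline bool rescale_multipliers_ok(float input_scale, const std::vector<float> &weight_scales, float output_scale)
{
    for(const float s : weight_scales)
    {
        const float multiplier = input_scale * s / output_scale;
        if(multiplier > 1.f)
        {
            return false;
        }
    }
    return true;
}
```

A single out-of-range channel is enough to fail the check, so per-channel weights need every scale to satisfy the bound.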

The documentation for this class was generated from the following files:

arm_compute/runtime/NEON/functions/assembly/NEDepthwiseConvolutionAssemblyDispatch.h
src/runtime/NEON/functions/assembly/NEDepthwiseConvolutionAssemblyDispatch.cpp
