Function to execute a depthwise convolution.

#include <CLDepthwiseConvolutionLayer.h>


Public Member Functions
	CLDepthwiseConvolutionLayer (std::shared_ptr< IMemoryManager > memory_manager=nullptr)
	Default constructor.

	CLDepthwiseConvolutionLayer (const CLDepthwiseConvolutionLayer &)=delete
	Prevent instances of this class from being copied (as this class contains pointers).

	CLDepthwiseConvolutionLayer (CLDepthwiseConvolutionLayer &&)=default
	Default move constructor.

CLDepthwiseConvolutionLayer &	operator= (const CLDepthwiseConvolutionLayer &)=delete
	Prevent instances of this class from being copied (as this class contains pointers).

CLDepthwiseConvolutionLayer &	operator= (CLDepthwiseConvolutionLayer &&)=default
	Default move assignment operator.

	~CLDepthwiseConvolutionLayer ()
	Default destructor.

void	configure (const CLCompileContext &compile_context, ICLTensor *input, const ICLTensor *weights, const ICLTensor *biases, ICLTensor *output, const PadStrideInfo &conv_info, unsigned int depth_multiplier=1, ActivationLayerInfo act_info=ActivationLayerInfo(), const Size2D &dilation=Size2D(1U, 1U))
	Initialize the function's source, destination, weights and convolution information.

void	configure (ICLTensor *input, const ICLTensor *weights, const ICLTensor *biases, ICLTensor *output, const PadStrideInfo &conv_info, unsigned int depth_multiplier=1, ActivationLayerInfo act_info=ActivationLayerInfo(), const Size2D &dilation=Size2D(1U, 1U))
	Initialize the function's source, destination, weights and convolution information using the default compile context.

void	run () override
	Run the kernels contained in the function.

void	prepare () override
	Prepare the function for executing.

void	set_memory_group (std::shared_ptr< IMemoryManager > memory_manager)
	Replace the function's memory group with one backed by the given memory manager.

Public Member Functions inherited from IFunction
virtual	~IFunction ()=default
	Destructor.

Static Public Member Functions
static Status	validate (const ITensorInfo *input, const ITensorInfo *weights, const ITensorInfo *biases, const ITensorInfo *output, const PadStrideInfo &conv_info, unsigned int depth_multiplier=1, ActivationLayerInfo act_info=ActivationLayerInfo(), const Size2D &dilation=Size2D(1U, 1U))
	Static function to check if given info will lead to a valid configuration of CLDepthwiseConvolutionLayer.

Detailed Description

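Function to execute a depthwise convolution.

This function calls the following OpenCL kernels/functions:

CLDepthwiseConvolutionLayerNativeKernel
CLPermute (only if the data layout is NCHW: the input and weights are permuted to NHWC and the output is permuted back to NCHW)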

Definition at line 45 of file CLDepthwiseConvolutionLayer.h.

Constructor & Destructor Documentation

◆ CLDepthwiseConvolutionLayer() [1/3]

CLDepthwiseConvolutionLayer ( std::shared_ptr< IMemoryManager > memory_manager = nullptr )

Default constructor.

Definition at line 133 of file CLDepthwiseConvolutionLayer.cpp.

     : _memory_group(std::move(memory_manager)),
       _dwc_native_kernel(std::make_unique<CLDepthwiseConvolutionLayerNativeKernel>()),
       _permute_input_to_nhwc(),
       _permute_weights_to_nhwc(),
       _permute_output_to_nchw(),
       _permuted_input(),
       _permuted_weights(),
       _permuted_output(),
       _output_multipliers(),
       _output_shifts(),
       _original_weights(),
       _input(),
       _output(),
       _needs_permute(false),
       _is_prepared(false),
       _is_quantized(false)
 {
 }

◆ CLDepthwiseConvolutionLayer() [2/3]

CLDepthwiseConvolutionLayer ( const CLDepthwiseConvolutionLayer & )

delete

Prevent instances of this class from being copied (as this class contains pointers).

◆ CLDepthwiseConvolutionLayer() [3/3]

CLDepthwiseConvolutionLayer ( CLDepthwiseConvolutionLayer && )

default

Default move constructor.

◆ ~CLDepthwiseConvolutionLayer()

~CLDepthwiseConvolutionLayer ( )

default

Default destructor.

Member Function Documentation

◆ configure() [1/2]

void configure	(	const CLCompileContext &	compile_context,
		ICLTensor *	input,
		const ICLTensor *	weights,
		const ICLTensor *	biases,
		ICLTensor *	output,
		const PadStrideInfo &	conv_info,
		unsigned int	depth_multiplier = `1`,
		ActivationLayerInfo	act_info = `ActivationLayerInfo()`,
		const Size2D &	dilation = `Size2D(1U, 1U)`
	)

Initialize the function's source, destination, weights and convolution information.

Valid data layouts:

NHWC
NCHW

Valid data type configurations:

src0 (input)	src1 (weights)	src2 (biases)	dst (output)
F16	F16	F16	F16
F32	F32	F32	F32
QASYMM8	QASYMM8	S32	QASYMM8
QASYMM8	QSYMM8_PER_CHANNEL	S32	QASYMM8
QASYMM8_SIGNED	QASYMM8_SIGNED	S32	QASYMM8_SIGNED
QASYMM8_SIGNED	QSYMM8_PER_CHANNEL	S32	QASYMM8_SIGNED
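The table is a closed list of supported combinations. As an illustration only, the following C++ sketch encodes it as a lookup. DtypeCombo and is_listed_combo are hypothetical names that are not part of Compute Library, and data types are plain strings rather than arm_compute::DataType values; the authoritative check remains validate().

```cpp
#include <array>
#include <string>

// Hypothetical row of the "Valid data type configurations" table.
struct DtypeCombo
{
    std::string input, weights, biases, output;
};

// Returns true if the (input, weights, biases, output) data types match a row of the table.
bool is_listed_combo(const std::string &input, const std::string &weights,
                     const std::string &biases, const std::string &output)
{
    static const std::array<DtypeCombo, 6> table{{
        {"F16", "F16", "F16", "F16"},
        {"F32", "F32", "F32", "F32"},
        {"QASYMM8", "QASYMM8", "S32", "QASYMM8"},
        {"QASYMM8", "QSYMM8_PER_CHANNEL", "S32", "QASYMM8"},
        {"QASYMM8_SIGNED", "QASYMM8_SIGNED", "S32", "QASYMM8_SIGNED"},
        {"QASYMM8_SIGNED", "QSYMM8_PER_CHANNEL", "S32", "QASYMM8_SIGNED"},
    }};
    for (const auto &row : table)
    {
        if (row.input == input && row.weights == weights && row.biases == biases && row.output == output)
        {
            return true;
        }
    }
    return false;
}
```

Note that mixing QASYMM8 input with QASYMM8_SIGNED weights is not listed, which matches the mismatching-data-type check performed in validate().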

Parameters

[in]	compile_context	The compile context to be used.
[in,out]	input	Source tensor. Data type supported: QASYMM8/QASYMM8_SIGNED/FP16/FP32. Data layout supported: NHWC, NCHW.
[in]	weights	Weights tensor. These are 3D tensors with shape [kernel_x, kernel_y, IFM]. Data type supported: same as `input`, or QSYMM8_PER_CHANNEL when `input` is QASYMM8 or QASYMM8_SIGNED.
[in]	biases	Biases tensor. A 1D tensor with shape [IFM]. Must be nullptr if not needed. Data type supported: same as `input`, or S32 when `input` is QASYMM8/QASYMM8_SIGNED.
[out]	output	Destination tensor. Pass in nullptr or `input` for in-place operation. Data type supported: same as `input`.
[in]	conv_info	Padding and stride information to use for the convolution.
[in]	depth_multiplier	(Optional) Multiplier to apply to the input's depth in order to retrieve the output's depth. Defaults to 1.
[in]	act_info	(Optional) Activation layer information in case of a fused activation.
[in]	dilation	(Optional) Dilation, in elements, across x and y. Defaults to (1, 1).

Note: In-place computation is supported only with the NHWC data layout; see CLDepthwiseConvolutionLayerNativeKernel for further in-place constraints.
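How conv_info, depth_multiplier and dilation combine to determine the output shape can be sketched as follows. This is a standalone illustration, not the library's shape calculator (compute_depthwise_convolution_shape()); it assumes floor rounding and uses hypothetical Shape3D and ConvParams structs in place of TensorShape and PadStrideInfo.

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical stand-ins for the Compute Library types; illustration only.
struct Shape3D
{
    uint32_t width, height, channels;
};

struct ConvParams
{
    uint32_t stride_x = 1, stride_y = 1;
    uint32_t pad_left = 0, pad_right = 0, pad_top = 0, pad_bottom = 0;
    uint32_t depth_multiplier = 1;
    uint32_t dilation_x = 1, dilation_y = 1;
};

// Effective extent of a dilated kernel: (k - 1) * d + 1.
constexpr uint32_t dilated_extent(uint32_t k, uint32_t d)
{
    return (k - 1) * d + 1;
}

// Output shape of a depthwise convolution, assuming floor rounding.
Shape3D depthwise_output_shape(const Shape3D &in, uint32_t kernel_w, uint32_t kernel_h, const ConvParams &p)
{
    const uint32_t padded_w = in.width + p.pad_left + p.pad_right;
    const uint32_t padded_h = in.height + p.pad_top + p.pad_bottom;
    const uint32_t ext_w    = dilated_extent(kernel_w, p.dilation_x);
    const uint32_t ext_h    = dilated_extent(kernel_h, p.dilation_y);
    if (ext_w > padded_w || ext_h > padded_h)
    {
        throw std::invalid_argument("dilated kernel does not fit in the padded input");
    }
    // Each input channel produces depth_multiplier output channels.
    return Shape3D{(padded_w - ext_w) / p.stride_x + 1,
                   (padded_h - ext_h) / p.stride_y + 1,
                   in.channels * p.depth_multiplier};
}
```

For example, a 112x112x32 input with a 3x3 kernel, stride 2 and 1 element of padding on every side gives a 56x56x32 output; setting depth_multiplier to 2 gives 64 output channels instead.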

Definition at line 161 of file CLDepthwiseConvolutionLayer.cpp.

References CLTensorAllocator::allocate(), CLTensor::allocator(), ARM_COMPUTE_ERROR_ON_NULLPTR, ARM_COMPUTE_ERROR_THROW_ON, arm_compute::CHANNEL, CLPermute::configure(), arm_compute::test::validation::conv_info, ITensorInfo::data_layout(), ITensorInfo::data_type(), ITensorInfo::dimension(), CLScheduler::get(), arm_compute::get_data_layout_dimension_index(), ITensor::info(), CLTensor::info(), ITensorAllocator::init(), arm_compute::test::validation::input, arm_compute::is_data_type_quantized(), arm_compute::is_data_type_quantized_per_channel(), MemoryGroup::manage(), arm_compute::NCHW, arm_compute::NHWC, ITensorInfo::quantization_info(), arm_compute::S32, TensorInfo::set_data_layout(), TensorInfo::set_quantization_info(), CLScheduler::target(), arm_compute::U, and CLDepthwiseConvolutionLayer::validate().

Referenced by CLDepthwiseConvolutionLayer::configure().

 {
     ARM_COMPUTE_ERROR_ON_NULLPTR(input, weights);
     ARM_COMPUTE_ERROR_THROW_ON(CLDepthwiseConvolutionLayer::validate(input->info(),
                                                                      weights->info(),
                                                                      biases != nullptr ? biases->info() : nullptr,
                                                                      output != nullptr ? output->info() : input->info(),
                                                                      conv_info,
                                                                      depth_multiplier,
                                                                      act_info,
                                                                      dilation));
 
     _is_quantized     = is_data_type_quantized(input->info()->data_type());
     _is_prepared      = false;
     _original_weights = weights;
     _input            = input;
     _output           = output;
     _needs_permute    = input->info()->data_layout() == DataLayout::NCHW;
 
     const GPUTarget gpu_target = CLScheduler::get().target();
 
     ICLTensor       *input_to_use   = input;
     const ICLTensor *weights_to_use = weights;
     ICLTensor       *output_to_use  = output;
     if(_needs_permute)
     {
         _memory_group.manage(&_permuted_input);
         _memory_group.manage(&_permuted_output);
 
         // Configure the function to transform the input tensor from NCHW -> NHWC
         _permute_input_to_nhwc.configure(compile_context, input, &_permuted_input, PermutationVector(2U, 0U, 1U));
         _permuted_input.info()->set_data_layout(DataLayout::NHWC);
 
         // Configure the function to transform the weights tensor from IHW -> HWI
         _permute_weights_to_nhwc.configure(compile_context, weights, &_permuted_weights, PermutationVector(2U, 0U, 1U));
         _permuted_weights.info()->set_data_layout(DataLayout::NHWC);
 
         // Set output quantization info before dwc kernel configure
         _permuted_output.info()->set_quantization_info(output->info()->quantization_info());
 
         input_to_use   = &_permuted_input;
         weights_to_use = &_permuted_weights;
         output_to_use  = &_permuted_output;
     }
 
     CLTensor *output_multipliers_to_use = nullptr;
     CLTensor *output_shifts_to_use      = nullptr;
     if(_is_quantized)
     {
         const size_t idx_c       = get_data_layout_dimension_index(weights->info()->data_layout(), DataLayoutDimension::CHANNEL);
         const size_t num_filters = (is_data_type_quantized_per_channel(weights->info()->data_type())) ? weights->info()->dimension(idx_c) : 1;
 
         _output_multipliers.allocator()->init(TensorInfo(TensorShape(num_filters), 1, DataType::S32));
         _output_shifts.allocator()->init(TensorInfo(TensorShape(num_filters), 1, DataType::S32));
 
         output_multipliers_to_use = &_output_multipliers;
         output_shifts_to_use      = &_output_shifts;
     }
 
     DWCComputeKernelInfo dwc_native_compute_info;
     initialize_dwc_native_compute_info(dwc_native_compute_info, weights_to_use->info(), conv_info, dilation, depth_multiplier, gpu_target);
 
     const ConvolutionInfo conv_kernel_info{ conv_info, depth_multiplier, act_info, dilation };
 
     _dwc_native_kernel->configure(compile_context, input_to_use, weights_to_use, biases, output_to_use,
                                   dwc_native_compute_info, conv_kernel_info, output_multipliers_to_use, output_shifts_to_use);
 
     if(_needs_permute)
     {
         _permuted_input.allocator()->allocate();
 
         // Configure the function to transform the convolved output to NCHW format
         _permuted_output.info()->set_data_layout(DataLayout::NCHW);
         _permute_output_to_nchw.configure(compile_context, &_permuted_output, output, PermutationVector(1U, 2U, 0U));
         _permuted_output.allocator()->allocate();
     }
 
     if(_is_quantized)
     {
         _output_multipliers.allocator()->allocate();
         _output_shifts.allocator()->allocate();
     }
 }
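When the input is NCHW, configure() wraps the native NHWC kernel with CLPermute, using PermutationVector(2U, 0U, 1U) on the way in and PermutationVector(1U, 2U, 0U) on the way out. The sketch below shows the effect on a 3D shape, assuming the rule dst[i] = src[perm[i]] and Compute Library's innermost-first dimension order, in which an NCHW tensor is listed as (W, H, C) and an NHWC tensor as (C, W, H). permute_shape is a hypothetical helper, not arm_compute::permute().

```cpp
#include <array>
#include <cstddef>

// Shapes and permutations are listed innermost dimension first.
using Shape = std::array<std::size_t, 3>;
using Perm  = std::array<std::size_t, 3>;

// Reorder a shape so that dimension i of the result is dimension perm[i] of the source.
Shape permute_shape(const Shape &src, const Perm &perm)
{
    Shape dst{};
    for (std::size_t i = 0; i < dst.size(); ++i)
    {
        dst[i] = src[perm[i]];
    }
    return dst;
}
```

With W = 7, H = 5, C = 3, the first permutation turns (7, 5, 3) into (3, 7, 5) and the second restores (7, 5, 3), which is why the two vectors are inverses of each other.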

◆ configure() [2/2]

void configure	(	ICLTensor *	input,
		const ICLTensor *	weights,
		const ICLTensor *	biases,
		ICLTensor *	output,
		const PadStrideInfo &	conv_info,
		unsigned int	depth_multiplier = `1`,
		ActivationLayerInfo	act_info = `ActivationLayerInfo()`,
		const Size2D &	dilation = `Size2D(1U, 1U)`
	)

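Initialize the function's source, destination, weights and convolution information. This overload uses the default compile context returned by CLKernelLibrary::get().get_compile_context(); the parameters are otherwise the same as for configure() [1/2].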

Definition at line 155 of file CLDepthwiseConvolutionLayer.cpp.

References CLDepthwiseConvolutionLayer::configure(), and CLKernelLibrary::get().

 {
     configure(CLKernelLibrary::get().get_compile_context(), input, weights, biases, output, conv_info, depth_multiplier, act_info, dilation);
 }

◆ operator=() [1/2]

CLDepthwiseConvolutionLayer& operator= ( const CLDepthwiseConvolutionLayer & )

delete

Prevent instances of this class from being copied (as this class contains pointers).

◆ operator=() [2/2]

CLDepthwiseConvolutionLayer& operator= ( CLDepthwiseConvolutionLayer && )

default

Default move assignment operator.

◆ prepare()

void prepare ( )

override virtual

Prepare the function for executing.

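Any one-off pre-processing step required by the function is handled here. For this function that means computing the output multipliers and shifts (quantized data types only) and permuting the weights to NHWC (NCHW data layout only).

Note: The prepare stage might not need the backing memory of all the function's buffers to be available in order to execute.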

Reimplemented from IFunction.

Definition at line 340 of file CLDepthwiseConvolutionLayer.cpp.

References CLTensorAllocator::allocate(), CLTensor::allocator(), ARM_COMPUTE_ERROR_ON, arm_compute::quantization::compute_quantized_multipliers_and_shifts(), ITensor::info(), ITensor::is_used(), CLTensor::map(), ITensor::mark_as_unused(), ITensor::ptr_to_element(), CLPermute::run(), and CLTensor::unmap().

Referenced by CLDepthwiseConvolutionLayer::run().

 {
     if(!_is_prepared)
     {
         if(_is_quantized)
         {
             _output_multipliers.map();
             _output_shifts.map();
             quantization::compute_quantized_multipliers_and_shifts(_input->info(),
                                                                    _original_weights->info(),
                                                                    _output != nullptr ? _output->info() : _input->info(),
                                                                    reinterpret_cast<int32_t *>(_output_multipliers.ptr_to_element(Coordinates(0))),
                                                                    reinterpret_cast<int32_t *>(_output_shifts.ptr_to_element(Coordinates(0))));
             _output_multipliers.unmap();
             _output_shifts.unmap();
         }
 
         if(_needs_permute)
         {
             ARM_COMPUTE_ERROR_ON(!_original_weights->is_used());
 
             _permuted_weights.allocator()->allocate();
             _permute_weights_to_nhwc.run();
             _original_weights->mark_as_unused();
         }
         _is_prepared = true;
     }
 }
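For quantized types, prepare() fills _output_multipliers and _output_shifts with one entry per output channel (per-channel weights) or a single entry (per-tensor weights). A common approach, sketched below, computes the real rescale factor input_scale * weight_scale / output_scale and splits it into a Q0.31 fixed-point multiplier and a power-of-two exponent. This is an assumption about the general technique, written with hypothetical function names; the exact rounding and shift sign convention of compute_quantized_multipliers_and_shifts() may differ.

```cpp
#include <cmath>
#include <cstdint>
#include <utility>
#include <vector>

// Decompose a positive real multiplier M into (q, e) such that
// M ~= q * 2^(e - 31), with q a Q0.31 fixed-point value in [2^30, 2^31).
std::pair<int32_t, int> quantize_multiplier(double multiplier)
{
    int exponent = 0;
    const double q = std::frexp(multiplier, &exponent); // q in [0.5, 1)
    int64_t q_fixed = static_cast<int64_t>(std::llround(q * (1LL << 31)));
    if (q_fixed == (1LL << 31)) // rounding overflowed to 1.0
    {
        q_fixed /= 2;
        ++exponent;
    }
    return {static_cast<int32_t>(q_fixed), exponent};
}

// One (multiplier, exponent) pair per entry of weight_scales: one entry for
// per-tensor weights, one per output channel for per-channel weights.
std::vector<std::pair<int32_t, int>> per_channel_multipliers(double input_scale,
                                                             const std::vector<double> &weight_scales,
                                                             double output_scale)
{
    std::vector<std::pair<int32_t, int>> result;
    result.reserve(weight_scales.size());
    for (double ws : weight_scales)
    {
        result.push_back(quantize_multiplier(input_scale * ws / output_scale));
    }
    return result;
}
```

Computing these values once in prepare(), rather than in every run(), is possible because they depend only on the quantization info of the tensors, not on their contents.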

◆ run()

void run ( )

override virtual

Run the kernels contained in the function.

For CPU kernels:

Multi-threading is used for the kernels which are parallelisable.
By default std::thread::hardware_concurrency() threads are used.

Note: CPPScheduler::set_num_threads() can be used to manually set the number of threads

For OpenCL kernels:

All the kernels are enqueued on the queue associated with CLScheduler.
The queue is then flushed.

Note: The function does not block until the kernels have executed; it is the user's responsibility to wait (for example with CLScheduler::get().sync()). It calls prepare() on the first run if that has not already been done.
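The "prepare on first run" behaviour follows from the _is_prepared guard in prepare() and the unconditional prepare() call at the top of run(). A minimal, library-free C++ sketch of that pattern; PrepareOnceFunction and its counters are hypothetical and only count how often each stage does work.

```cpp
// Hypothetical class mimicking the _is_prepared guard used by the function.
class PrepareOnceFunction
{
public:
    void prepare()
    {
        if (!_is_prepared)
        {
            ++prepare_work_count; // stands in for one-off work such as permuting weights
            _is_prepared = true;
        }
    }

    void run()
    {
        prepare();   // does nothing after the first call
        ++run_count; // stands in for enqueuing the kernels
    }

    int prepare_work_count = 0;
    int run_count          = 0;

private:
    bool _is_prepared = false;
};
```

Calling run() three times performs the one-off work once and the per-run work three times.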

Implements IFunction.

Definition at line 323 of file CLDepthwiseConvolutionLayer.cpp.

References CLScheduler::enqueue(), CLScheduler::get(), CLDepthwiseConvolutionLayer::prepare(), and CLPermute::run().

 {
     prepare();
 
     MemoryGroupResourceScope scope_mg(_memory_group);
 
     if(_needs_permute)
     {
         _permute_input_to_nhwc.run();
     }
     CLScheduler::get().enqueue(*_dwc_native_kernel);
     if(_needs_permute)
     {
         _permute_output_to_nchw.run();
     }
 }

◆ set_memory_group()

void set_memory_group ( std::shared_ptr< IMemoryManager > memory_manager )

inline

Definition at line 113 of file CLDepthwiseConvolutionLayer.h.

     {
         _memory_group = MemoryGroup(std::move(memory_manager));
     };

◆ validate()

Status validate	(	const ITensorInfo *	input,
		const ITensorInfo *	weights,
		const ITensorInfo *	biases,
		const ITensorInfo *	output,
		const PadStrideInfo &	conv_info,
		unsigned int	depth_multiplier = `1`,
		ActivationLayerInfo	act_info = `ActivationLayerInfo()`,
		const Size2D &	dilation = `Size2D(1U, 1U)`
	)

static

Static function to check if given info will lead to a valid configuration of CLDepthwiseConvolutionLayer.

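Returns: a Status. A default-constructed (empty) Status indicates a valid configuration; otherwise the Status describes the first error found.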

Definition at line 247 of file CLDepthwiseConvolutionLayer.cpp.

References ARM_COMPUTE_RETURN_ERROR_ON, ARM_COMPUTE_RETURN_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN, ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_LAYOUT, ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES, ARM_COMPUTE_RETURN_ERROR_ON_MSG, ARM_COMPUTE_RETURN_ON_ERROR, arm_compute::CHANNEL, ICloneable< T >::clone(), arm_compute::misc::shape_calculator::compute_depthwise_convolution_shape(), arm_compute::test::validation::conv_info, ITensorInfo::data_layout(), ITensorInfo::data_type(), ITensorInfo::dimension(), CLScheduler::get(), arm_compute::get_data_layout_dimension_index(), arm_compute::HEIGHT, arm_compute::test::validation::info, arm_compute::test::validation::input, arm_compute::is_data_type_quantized(), arm_compute::is_data_type_quantized_per_channel(), arm_compute::NCHW, arm_compute::NHWC, PadStrideInfo::pad_bottom(), PadStrideInfo::pad_left(), PadStrideInfo::pad_right(), PadStrideInfo::pad_top(), arm_compute::permute(), arm_compute::QSYMM8_PER_CHANNEL, arm_compute::S32, CLScheduler::target(), ITensorInfo::tensor_shape(), arm_compute::U, CLDepthwiseConvolutionLayerNativeKernel::validate(), CLPermute::validate(), arm_compute::WIDTH, Size2D::x(), and Size2D::y().

Referenced by CLDepthwiseConvolutionLayer::configure(), and arm_compute::test::validation::DATA_TEST_CASE().

 {
     const bool in_place = input == output || output == nullptr;
     if(in_place)
     {
         output = input;
     }
     ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_LAYOUT(input, output);
     const size_t idx_w = get_data_layout_dimension_index(input->data_layout(), DataLayoutDimension::WIDTH);
     const size_t idx_h = get_data_layout_dimension_index(input->data_layout(), DataLayoutDimension::HEIGHT);
 
     ARM_COMPUTE_RETURN_ERROR_ON(weights->dimension(idx_w) + (weights->dimension(idx_w) - 1) * (dilation.x() - 1) > input->dimension(idx_w) + conv_info.pad_left() + conv_info.pad_right());
     ARM_COMPUTE_RETURN_ERROR_ON(weights->dimension(idx_h) + (weights->dimension(idx_h) - 1) * (dilation.y() - 1) > input->dimension(idx_h) + conv_info.pad_top() + conv_info.pad_bottom());
 
     const GPUTarget gpu_target = CLScheduler::get().target();
 
     const ConvolutionInfo conv_kernel_info{ conv_info, depth_multiplier, act_info, dilation };
 
     const bool needs_permute = input->data_layout() == DataLayout::NCHW;
 
     const bool is_quantized = is_data_type_quantized(input->data_type());
 
     TensorInfo output_multipliers_shifts_info(TensorInfo(TensorShape(1U), 1, DataType::S32));
     if(is_quantized)
     {
         if(is_data_type_quantized_per_channel(weights->data_type()))
         {
             ARM_COMPUTE_RETURN_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN(weights, 1, DataType::QSYMM8_PER_CHANNEL);
 
             const size_t idx_c = get_data_layout_dimension_index(weights->data_layout(), DataLayoutDimension::CHANNEL);
             output_multipliers_shifts_info.set_tensor_shape(TensorShape(weights->dimension(idx_c)));
         }
         else
         {
             ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES(input, weights);
         }
     }
 
     if(needs_permute)
     {
         ARM_COMPUTE_RETURN_ERROR_ON_MSG(in_place, "In-place is supported only with NHWC data layout");
         TensorShape           permuted_input_shape   = input->tensor_shape();
         TensorShape           permuted_weights_shape = weights->tensor_shape();
         const ConvolutionInfo info{ conv_info, depth_multiplier, ActivationLayerInfo(), dilation };
         TensorShape           permuted_output_shape = shape_calculator::compute_depthwise_convolution_shape(*input, *weights, info);
 
         permute(permuted_input_shape, PermutationVector(2U, 0U, 1U));
         permute(permuted_weights_shape, PermutationVector(2U, 0U, 1U));
         permute(permuted_output_shape, PermutationVector(2U, 0U, 1U));
 
         const TensorInfo permuted_input   = input->clone()->set_is_resizable(true).reset_padding().set_tensor_shape(permuted_input_shape).set_data_layout(DataLayout::NHWC);
         const TensorInfo permuted_weights = weights->clone()->set_is_resizable(true).reset_padding().set_tensor_shape(permuted_weights_shape).set_data_layout(DataLayout::NHWC);
         const TensorInfo permuted_output  = output->clone()->set_is_resizable(true).reset_padding().set_tensor_shape(permuted_output_shape).set_data_layout(DataLayout::NHWC);
 
         ARM_COMPUTE_RETURN_ON_ERROR(CLPermute::validate(input, &permuted_input, PermutationVector(2U, 0U, 1U)));
         ARM_COMPUTE_RETURN_ON_ERROR(CLPermute::validate(weights, &permuted_weights, PermutationVector(2U, 0U, 1U)));
 
         DWCComputeKernelInfo dwc_native_compute_info;
         initialize_dwc_native_compute_info(dwc_native_compute_info, &permuted_weights, conv_info, dilation, depth_multiplier, gpu_target);
 
         ARM_COMPUTE_RETURN_ON_ERROR(CLDepthwiseConvolutionLayerNativeKernel::validate(&permuted_input, &permuted_weights, biases, &permuted_output,
                                                                                       dwc_native_compute_info, conv_kernel_info, &output_multipliers_shifts_info, &output_multipliers_shifts_info));
         ARM_COMPUTE_RETURN_ON_ERROR(CLPermute::validate(&permuted_output, output, PermutationVector(1U, 2U, 0U)));
     }
     else
     {
         DWCComputeKernelInfo dwc_native_compute_info;
         initialize_dwc_native_compute_info(dwc_native_compute_info, weights, conv_info, dilation, depth_multiplier, gpu_target);
         ARM_COMPUTE_RETURN_ON_ERROR(CLDepthwiseConvolutionLayerNativeKernel::validate(input, weights, biases, output, dwc_native_compute_info, conv_kernel_info, &output_multipliers_shifts_info,
                                                                                       &output_multipliers_shifts_info));
     }
     return Status{};
 }
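Two of the early checks in validate() can be summarized in a library-free sketch: the in-place rule (a null output, or an output equal to input, means in-place, which is rejected for NCHW) and the requirement that the dilated kernel fit inside the padded input. TensorDesc, Layout and precheck are hypothetical simplifications; the real function additionally validates data types, the permutations and the native kernel.

```cpp
#include <cstdint>
#include <string>

enum class Layout { NCHW, NHWC };

// Hypothetical, simplified tensor description.
struct TensorDesc
{
    Layout   layout;
    uint32_t width, height;
};

// Returns an empty string when the checks pass, otherwise an error message.
std::string precheck(const TensorDesc *input, const TensorDesc *output,
                     uint32_t kernel_w, uint32_t kernel_h,
                     uint32_t pad_x_total, uint32_t pad_y_total,
                     uint32_t dilation_x, uint32_t dilation_y)
{
    // A null output, or an output aliasing the input, means in-place computation.
    const bool in_place = (output == nullptr || output == input);
    if (in_place && input->layout == Layout::NCHW)
    {
        return "In-place is supported only with NHWC data layout";
    }
    // The dilated kernel extent k + (k - 1) * (d - 1) must fit in the padded input.
    if (kernel_w + (kernel_w - 1) * (dilation_x - 1) > input->width + pad_x_total ||
        kernel_h + (kernel_h - 1) * (dilation_y - 1) > input->height + pad_y_total)
    {
        return "Dilated kernel larger than padded input";
    }
    return {};
}
```

For instance, a 3x3 kernel with dilation 5 spans 11 elements, which does not fit an 8x8 input padded by 2 elements in each direction.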

The documentation for this class was generated from the following files:

arm_compute/runtime/CL/functions/CLDepthwiseConvolutionLayer.h
src/runtime/CL/functions/CLDepthwiseConvolutionLayer.cpp
