Compute Library 19.08
CLDepthwiseConvolutionLayer Class Reference

Basic function to execute a generic depthwise convolution. More...

#include <CLDepthwiseConvolutionLayer.h>

Collaboration diagram for CLDepthwiseConvolutionLayer (diagram omitted)

Public Member Functions

 CLDepthwiseConvolutionLayer ()
 Default constructor. More...
 
 CLDepthwiseConvolutionLayer (const CLDepthwiseConvolutionLayer &)=delete
 Prevent instances of this class from being copied (as this class contains pointers). More...
 
 CLDepthwiseConvolutionLayer (CLDepthwiseConvolutionLayer &&)=default
 Default move constructor. More...
 
CLDepthwiseConvolutionLayer & operator= (const CLDepthwiseConvolutionLayer &)=delete
 Prevent instances of this class from being copied (as this class contains pointers). More...
 
CLDepthwiseConvolutionLayer & operator= (CLDepthwiseConvolutionLayer &&)=default
 Default move assignment operator. More...
 
void configure (ICLTensor *input, const ICLTensor *weights, const ICLTensor *biases, ICLTensor *output, const PadStrideInfo &conv_info, unsigned int depth_multiplier=1, const ActivationLayerInfo &act_info=ActivationLayerInfo(), const Size2D &dilation=Size2D(1U, 1U))
 Initialize the function's source, destination, weights and convolution information. More...
 
void run () override
 Run the kernels contained in the function. More...
 
void prepare () override
 Prepare the function for executing. More...
 
- Public Member Functions inherited from IFunction
virtual ~IFunction ()=default
 Destructor. More...
 

Static Public Member Functions

static Status validate (const ITensorInfo *input, const ITensorInfo *weights, const ITensorInfo *biases, const ITensorInfo *output, const PadStrideInfo &conv_info, unsigned int depth_multiplier=1, const ActivationLayerInfo &act_info=ActivationLayerInfo(), const Size2D &dilation=Size2D(1U, 1U))
 Static function to check if given info will lead to a valid configuration of CLDepthwiseConvolutionLayer. More...
 

Detailed Description

Basic function to execute a generic depthwise convolution.

This function calls the following OpenCL kernels:

  1. CLDepthwiseIm2ColKernel
  2. CLGEMMMatrixVectorMultiplyKernel
  3. CLDepthwiseConvolutionLayerReshapeWeightsGenericKernel
  4. CLFillBorderKernel (if pad_x or pad_y > 0)

Definition at line 130 of file CLDepthwiseConvolutionLayer.h.
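
The following is a minimal usage sketch, not part of the generated reference; the shapes, padding values and the use of F32 data are illustrative assumptions. A 5x5 kernel is chosen deliberately so that the generic path documented here is taken (a 3x3 kernel would dispatch the optimised CLDepthwiseConvolutionLayer3x3 internally, as the configure() listing below shows).

#include "arm_compute/core/Types.h"
#include "arm_compute/runtime/CL/CLScheduler.h"
#include "arm_compute/runtime/CL/CLTensor.h"
#include "arm_compute/runtime/CL/functions/CLDepthwiseConvolutionLayer.h"

using namespace arm_compute;

int main()
{
    // Create an OpenCL context and command queue for the scheduler singleton
    CLScheduler::get().default_init();

    // Illustrative shapes: 32x32 input with 16 channels, 5x5 depthwise kernel
    CLTensor input, weights, biases, output;
    input.allocator()->init(TensorInfo(TensorShape(32U, 32U, 16U), 1, DataType::F32));
    weights.allocator()->init(TensorInfo(TensorShape(5U, 5U, 16U), 1, DataType::F32)); // [kernel_x, kernel_y, IFM]
    biases.allocator()->init(TensorInfo(TensorShape(16U), 1, DataType::F32));          // [IFM]

    // Configure before allocating: the output info is auto-initialized from
    // the computed convolution shape
    CLDepthwiseConvolutionLayer depthwise_conv;
    depthwise_conv.configure(&input, &weights, &biases, &output,
                             PadStrideInfo(1 /* stride_x */, 1 /* stride_y */,
                                           2 /* pad_x */, 2 /* pad_y */));

    input.allocator()->allocate();
    weights.allocator()->allocate();
    biases.allocator()->allocate();
    output.allocator()->allocate();

    // ... map tensors and fill input/weights/biases here ...

    depthwise_conv.run();      // enqueues the kernels; does not block
    CLScheduler::get().sync(); // wait for the results before reading output
    return 0;
}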

Constructor & Destructor Documentation

◆ CLDepthwiseConvolutionLayer() [1/3]

Default constructor.

Definition at line 249 of file CLDepthwiseConvolutionLayer.cpp.

250  : _im2col_kernel(), _weights_reshape_kernel(), _v2mm_kernel(), _vector_to_tensor_kernel(), _output_stage_kernel(), _activationlayer_function(), _v2mm_input_fill_border(), _v2mm_weights_fill_border(),
251  _input_reshaped(), _weights_reshaped(), _v2mm_output(), _output_reshaped(), _is_prepared(false), _is_quantized(false), _is_activationlayer_enabled(false), _original_weights(nullptr),
252  _optimised_function(nullptr)
253 {
254 }

◆ CLDepthwiseConvolutionLayer() [2/3]

Prevent instances of this class from being copied (as this class contains pointers).

◆ CLDepthwiseConvolutionLayer() [3/3]

Default move constructor.

Member Function Documentation

◆ configure()

void configure ( ICLTensor *input,
const ICLTensor *weights,
const ICLTensor *biases,
ICLTensor *output,
const PadStrideInfo &conv_info,
unsigned int depth_multiplier = 1,
const ActivationLayerInfo &act_info = ActivationLayerInfo(),
const Size2D &dilation = Size2D(1U, 1U)
)

Initialize the function's source, destination, weights and convolution information.

Parameters
[in,out] input             Source tensor. Data type supported: QASYMM8/F32. (Written to only for border filling.)
[in]     weights           Weights tensor. These are 3D tensors with shape [kernel_x, kernel_y, IFM]. Data type supported: same as input.
[in]     biases            Biases tensor. A 1D tensor with shape [IFM]. Must be nullptr if not needed. Data type supported: same as input, S32 when input is QASYMM8.
[out]    output            Destination tensor. Data type supported: same as input.
[in]     conv_info         Padding and stride information to use for the convolution.
[in]     depth_multiplier  (Optional) Multiplier to apply to the input's depth in order to retrieve the output's depth. Defaults to 1.
[in]     act_info          (Optional) Activation layer information in case of a fused activation.
[in]     dilation          (Optional) Dilation, in elements, across x and y. Defaults to (1, 1).

Definition at line 256 of file CLDepthwiseConvolutionLayer.cpp.

258 {
262 
263  const size_t idx_w = get_data_layout_dimension_index(input->info()->data_layout(), DataLayoutDimension::WIDTH);
264  const size_t idx_h = get_data_layout_dimension_index(input->info()->data_layout(), DataLayoutDimension::HEIGHT);
265 
266  ARM_COMPUTE_ERROR_ON(weights->info()->dimension(idx_w) + (weights->info()->dimension(idx_w) - 1) * (dilation.x() - 1) > input->info()->dimension(idx_w) + conv_info.pad_left() + conv_info.pad_right());
267  ARM_COMPUTE_ERROR_ON(weights->info()->dimension(idx_h) + (weights->info()->dimension(idx_h) - 1) * (dilation.y() - 1) > input->info()->dimension(idx_h) + conv_info.pad_top() + conv_info.pad_bottom());
268 
269  const bool can_run_optimised_3x3_kernel = (weights->info()->dimension(idx_w) == 3) && (weights->info()->dimension(idx_h) == 3);
270 
271  if(can_run_optimised_3x3_kernel)
272  {
273  auto f = arm_compute::support::cpp14::make_unique<CLDepthwiseConvolutionLayer3x3>();
274  f->configure(input, weights, biases, output, conv_info, depth_multiplier, act_info, dilation);
275  _optimised_function = std::move(f);
276  }
277  else
278  {
279  const size_t idx_c = get_data_layout_dimension_index(input->info()->data_layout(), DataLayoutDimension::CHANNEL);
280 
281  const size_t weights_w = weights->info()->dimension(idx_w);
282  const size_t weights_h = weights->info()->dimension(idx_h);
283  const size_t weights_z = weights->info()->dimension(idx_c);
284 
285  _is_prepared = false;
286  _original_weights = weights;
287  _is_quantized = is_data_type_quantized_asymmetric(input->info()->data_type());
288 
289  bool append_bias = (biases != nullptr) && !_is_quantized;
290  const GPUTarget gpu_target = CLScheduler::get().target();
291 
292  // Calculate output shape
293  TensorShape output_shape = shape_calculator::compute_depthwise_convolution_shape(*input->info(), *weights->info(), conv_info, depth_multiplier, dilation);
294 
295  // Output auto initialization if not yet initialized
296  auto_init_if_empty(*output->info(), input->info()->clone()->set_tensor_shape(output_shape));
297  ARM_COMPUTE_ERROR_ON_MISMATCHING_DIMENSIONS(output->info()->tensor_shape(), output_shape);
298 
299  // Output width and height
300  const unsigned int conv_w = output_shape[idx_w];
301  const unsigned int conv_h = output_shape[idx_h];
302 
303  // Set up intermediate tensors
304  const size_t patch_size = weights_w * weights_h + ((append_bias) ? 1 : 0);
305  const size_t conv_size = conv_w * conv_h;
306 
307  const UniformQuantizationInfo iq_info = input->info()->quantization_info().uniform();
308  const UniformQuantizationInfo wq_info = weights->info()->quantization_info().uniform();
309  const UniformQuantizationInfo oq_info = output->info()->quantization_info().uniform();
310 
311  // Im2Col configuration
312  TensorShape shape_im2col = input->info()->tensor_shape();
313  shape_im2col.set(0, patch_size);
314  shape_im2col.set(1, conv_size);
315  shape_im2col.set(2, weights_z);
316  _input_reshaped.allocator()->init(input->info()->clone()->set_is_resizable(true).reset_padding().set_tensor_shape(shape_im2col));
317  _im2col_kernel.set_target(gpu_target);
318  _im2col_kernel.configure(input, &_input_reshaped, Size2D(weights_w, weights_h), conv_info, append_bias, depth_multiplier, dilation);
319  CLScheduler::get().tune_kernel_static(_im2col_kernel);
320 
321  // Weights reshape configuration
322  const TensorShape shape_weights_reshape(patch_size, weights_z);
323  _weights_reshaped.allocator()->init(weights->info()->clone()->set_is_resizable(true).reset_padding().set_tensor_shape(shape_weights_reshape));
324  _weights_reshape_kernel.configure(weights, &_weights_reshaped, append_bias ? biases : nullptr);
325 
326  // GEMV configuration
327  DataType v2mm_dt = (input->info()->data_type() == DataType::QASYMM8) ? DataType::S32 : input->info()->data_type();
328  TensorShape shape_v2mm_out = input->info()->tensor_shape();
329  shape_v2mm_out.set(0, conv_size * weights_z);
330  shape_v2mm_out.set(1, 1);
331  shape_v2mm_out.set(2, 1);
332  _v2mm_output.allocator()->init(input->info()->clone()->set_is_resizable(true).reset_padding().set_data_type(v2mm_dt).set_tensor_shape(shape_v2mm_out));
333  _v2mm_kernel.set_target(gpu_target);
334  _v2mm_kernel.configure(&_input_reshaped, &_weights_reshaped, &_v2mm_output);
335  CLScheduler::get().tune_kernel_static(_v2mm_kernel);
336  _output_reshaped.allocator()->init(_v2mm_output.info()->clone()->set_is_resizable(true).reset_padding().set_tensor_shape(output_shape));
337  _vector_to_tensor_kernel.configure(&_v2mm_output, (_is_quantized) ? &_output_reshaped : output, conv_w, conv_h);
338 
339  // Output staged configuration
340  if(_is_quantized)
341  {
342  const UniformQuantizationInfo output_quant_info = (output->info()->total_size() == 0) ? iq_info : oq_info;
343 
344  int output_multiplier = 0;
345  int output_shift = 0;
346  const float multiplier = iq_info.scale * wq_info.scale / output_quant_info.scale;
347  quantization::calculate_quantized_multiplier_less_than_one(multiplier, &output_multiplier, &output_shift);
348  _output_stage_kernel.configure(&_output_reshaped, biases, output, output_multiplier, output_shift, output_quant_info.offset);
349  _output_reshaped.allocator()->allocate();
350  }
351 
352  // Fill borders on inputs
353  PixelValue zero_in(static_cast<int32_t>(0));
354  PixelValue zero_w(static_cast<int32_t>(0));
355  if(_is_quantized)
356  {
357  zero_in = PixelValue(static_cast<int32_t>(iq_info.offset));
358  zero_w = PixelValue(static_cast<int32_t>(wq_info.offset));
359  }
360  BorderSize border_size = _v2mm_kernel.border_size();
361  _v2mm_input_fill_border.configure(&_input_reshaped, border_size, BorderMode::CONSTANT, zero_in);
362 
363  border_size.bottom = 0;
364  _v2mm_weights_fill_border.configure(&_weights_reshaped, border_size, BorderMode::CONSTANT, zero_w);
365 
366  // Allocate intermediate tensors
367  _input_reshaped.allocator()->allocate();
368  _v2mm_output.allocator()->allocate();
369 
370  //Configure Activation Layer
371  _is_activationlayer_enabled = act_info.enabled();
372 
373  if(_is_activationlayer_enabled)
374  {
375  _activationlayer_function.configure(output, nullptr, act_info);
376  }
377  }
378 }

References arm_compute::test::validation::act_info, CLTensorAllocator::allocate(), CLTensor::allocator(), ARM_COMPUTE_ERROR_ON, ARM_COMPUTE_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN, ARM_COMPUTE_ERROR_ON_MISMATCHING_DATA_LAYOUT, ARM_COMPUTE_ERROR_ON_MISMATCHING_DATA_TYPES, ARM_COMPUTE_ERROR_ON_MISMATCHING_DIMENSIONS, arm_compute::auto_init_if_empty(), CLGEMMMatrixVectorMultiplyKernel::border_size(), BorderSize::bottom, arm_compute::quantization::calculate_quantized_multiplier_less_than_one(), arm_compute::CHANNEL, ICloneable< T >::clone(), TensorInfo::clone(), arm_compute::misc::shape_calculator::compute_depthwise_convolution_shape(), CLActivationLayer::configure(), CLGEMMMatrixVectorMultiplyKernel::configure(), CLDepthwiseConvolutionLayerReshapeWeightsGenericKernel::configure(), CLFillBorderKernel::configure(), CLDepthwiseVectorToTensorKernel::configure(), CLDepthwiseIm2ColKernel::configure(), CLDirectConvolutionLayerOutputStageKernel::configure(), arm_compute::CONSTANT, arm_compute::test::validation::conv_info, ITensorInfo::data_layout(), ITensorInfo::data_type(), arm_compute::test::validation::dilation, ITensorInfo::dimension(), TensorInfo::dimension(), arm_compute::F16, arm_compute::F32, CLScheduler::get(), arm_compute::get_data_layout_dimension_index(), arm_compute::HEIGHT, ITensor::info(), CLTensor::info(), ITensorAllocator::init(), arm_compute::is_data_type_quantized_asymmetric(), UniformQuantizationInfo::offset, arm_compute::test::validation::output_shape, arm_compute::QASYMM8, ITensorInfo::quantization_info(), TensorInfo::quantization_info(), arm_compute::S32, UniformQuantizationInfo::scale, TensorShape::set(), ICLKernel::set_target(), CLScheduler::target(), ITensorInfo::tensor_shape(), CLScheduler::tune_kernel_static(), QuantizationInfo::uniform(), arm_compute::test::validation::weights, and arm_compute::WIDTH.

Referenced by CLDepthwiseSeparableConvolutionLayer::configure().
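
As a further sketch of the optional parameters (the tensor names and shapes are assumptions carried over from the sketch above, not part of the reference): on the generic path, a depth multiplier of 2 requires the weights' channel dimension to equal IFM * depth_multiplier, and a fused activation and a dilation can be passed in the same call.

// Input has IFM = 16 channels, so the weights carry 16 * 2 = 32 planes
weights.allocator()->init(TensorInfo(TensorShape(5U, 5U, 32U), 1, DataType::F32));

CLDepthwiseConvolutionLayer conv;
conv.configure(&input, &weights, nullptr /* no biases */, &output,
               PadStrideInfo(1, 1, 1, 1),
               2 /* depth_multiplier */,
               ActivationLayerInfo(ActivationLayerInfo::ActivationFunction::RELU),
               Size2D(2U, 2U) /* dilation in x and y */);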

◆ operator=() [1/2]

Prevent instances of this class from being copied (as this class contains pointers).

◆ operator=() [2/2]

Default move assignment operator.

◆ prepare()

void prepare ( )
override virtual

Prepare the function for executing.

Any one-off pre-processing step required by the function is handled here.

Note
The prepare stage might not need all of the function's buffers' backing memory to be available in order to execute.

Reimplemented from IFunction.

Definition at line 481 of file CLDepthwiseConvolutionLayer.cpp.

482 {
483  if(_optimised_function != nullptr)
484  {
485  _optimised_function->prepare();
486  }
487  else
488  {
489  if(!_is_prepared)
490  {
491  ARM_COMPUTE_ERROR_ON(!_original_weights->is_used());
492 
493  // Run weights reshaping and mark original weights tensor as unused
494  _weights_reshaped.allocator()->allocate();
495  CLScheduler::get().enqueue(_weights_reshape_kernel);
496  CLScheduler::get().enqueue(_v2mm_weights_fill_border);
497  _original_weights->mark_as_unused();
498 
499  CLScheduler::get().queue().finish();
500  _is_prepared = true;
501  }
502  }
503 }

References CLTensorAllocator::allocate(), CLTensor::allocator(), ARM_COMPUTE_ERROR_ON, CLScheduler::enqueue(), CLScheduler::get(), ITensor::is_used(), ITensor::mark_as_unused(), and CLScheduler::queue().

Referenced by CLDepthwiseSeparableConvolutionLayer::prepare(), and CLDepthwiseConvolutionLayer::run().
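
A short sketch of front-loading the one-off weights reshaping instead of paying for it on the first run(); conv, the tensors and conv_info are assumed to be set up and allocated as in the earlier sketches:

conv.configure(&input, &weights, &biases, &output, conv_info);
// ... allocate the tensors and upload the weights ...
conv.prepare(); // reshapes the weights now and marks the original weights tensor as unused
conv.run();     // the first run() no longer pays the preparation cost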

◆ run()

void run ( )
override virtual

Run the kernels contained in the function.

For NEON kernels:

  • Multi-threading is used for the kernels which are parallelisable.
  • By default std::thread::hardware_concurrency() threads are used.
Note
CPPScheduler::set_num_threads() can be used to manually set the number of threads

For OpenCL kernels:

  • All the kernels are enqueued on the queue associated with CLScheduler.
  • The queue is then flushed.
Note
The function does not block until the kernels have finished executing. It is the user's responsibility to wait.
Will call prepare() on the first run if it hasn't already been done

Implements IFunction.

Definition at line 456 of file CLDepthwiseConvolutionLayer.cpp.

457 {
458  prepare();
459 
460  if(_optimised_function != nullptr)
461  {
462  _optimised_function->run();
463  }
464  else
465  {
466  CLScheduler::get().enqueue(_im2col_kernel);
467  CLScheduler::get().enqueue(_v2mm_input_fill_border);
468  CLScheduler::get().enqueue(_v2mm_kernel);
469  CLScheduler::get().enqueue(_vector_to_tensor_kernel);
470  if(_is_quantized)
471  {
472  CLScheduler::get().enqueue(_output_stage_kernel);
473  }
474  if(_is_activationlayer_enabled)
475  {
476  _activationlayer_function.run();
477  }
478  }
479 }

References CLScheduler::enqueue(), CLScheduler::get(), CLDepthwiseConvolutionLayer::prepare(), and ICLSimpleFunction::run().

Referenced by CLDepthwiseSeparableConvolutionLayer::run().
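
Since run() only enqueues the kernels and flushes the queue, reading the results requires an explicit synchronisation point. A minimal sketch, with the buffer-reading details elided and the tensor setup assumed from the earlier sketches:

conv.run();                // enqueue the kernels; does not block
CLScheduler::get().sync(); // block until the queue has drained

output.map(true);          // blocking map of the CL buffer for host access
// ... read the result through output.buffer() or a Window iterator ...
output.unmap();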

◆ validate()

Status validate ( const ITensorInfo *input,
const ITensorInfo *weights,
const ITensorInfo *biases,
const ITensorInfo *output,
const PadStrideInfo &conv_info,
unsigned int depth_multiplier = 1,
const ActivationLayerInfo &act_info = ActivationLayerInfo(),
const Size2D &dilation = Size2D(1U, 1U)
)
static

Static function to check if given info will lead to a valid configuration of CLDepthwiseConvolutionLayer.

Parameters
[in] input             Source tensor info. Data type supported: QASYMM8/F32.
[in] weights           Weights tensor info. These are 3D tensors with shape [kernel_x, kernel_y, IFM]. Data type supported: same as input.
[in] biases            Biases tensor info. A 1D tensor with shape [IFM]. Must be nullptr if not needed. Data type supported: same as input, S32 when input is QASYMM8.
[in] output            Destination tensor info. Data type supported: same as input.
[in] conv_info         Padding and stride information to use for the convolution.
[in] depth_multiplier  (Optional) Multiplier to apply to the input's depth in order to retrieve the output's depth. Defaults to 1.
[in] act_info          (Optional) Activation layer information in case of a fused activation.
[in] dilation          (Optional) Dilation, in elements, across x and y. Defaults to (1, 1).
Returns
A status.

Definition at line 380 of file CLDepthwiseConvolutionLayer.cpp.

382 {
383  const size_t idx_w = get_data_layout_dimension_index(input->data_layout(), DataLayoutDimension::WIDTH);
384  const size_t idx_h = get_data_layout_dimension_index(input->data_layout(), DataLayoutDimension::HEIGHT);
385 
386  ARM_COMPUTE_RETURN_ERROR_ON(weights->dimension(idx_w) + (weights->dimension(idx_w) - 1) * (dilation.x() - 1) > input->dimension(idx_w) + conv_info.pad_left() + conv_info.pad_right());
387  ARM_COMPUTE_RETURN_ERROR_ON(weights->dimension(idx_h) + (weights->dimension(idx_h) - 1) * (dilation.y() - 1) > input->dimension(idx_h) + conv_info.pad_top() + conv_info.pad_bottom());
388 
389  const bool can_run_optimised_3x3_kernel = (weights->dimension(idx_w) == 3) && (weights->dimension(idx_h) == 3);
390 
391  if(!can_run_optimised_3x3_kernel)
392  {
393  const size_t idx_c = get_data_layout_dimension_index(input->data_layout(), DataLayoutDimension::CHANNEL);
394 
396  ARM_COMPUTE_RETURN_ERROR_ON((input->dimension(idx_c) * depth_multiplier) != weights->dimension(idx_c));
397 
398  const bool is_quantized = is_data_type_quantized_asymmetric(input->data_type());
399  const bool append_bias = (biases != nullptr) && !is_quantized;
400  const TensorShape output_shape = shape_calculator::compute_depthwise_convolution_shape(*input, *weights, conv_info, depth_multiplier, dilation);
401  const size_t weights_w = weights->dimension(idx_w);
402  const size_t weights_h = weights->dimension(idx_h);
403  const size_t weights_z = weights->dimension(idx_c);
404  const unsigned int conv_w = output_shape[idx_w];
405  const unsigned int conv_h = output_shape[idx_h];
406  const size_t patch_size = weights_w * weights_h + ((append_bias) ? 1 : 0);
407  const size_t conv_size = conv_w * conv_h;
408 
409  TensorShape shape_im2col = input->tensor_shape();
410  shape_im2col.set(0, patch_size);
411  shape_im2col.set(1, conv_size);
412  shape_im2col.set(2, weights_z);
413  TensorInfo input_reshaped(input->clone()->set_is_resizable(true).reset_padding().set_tensor_shape(shape_im2col));
414  ARM_COMPUTE_RETURN_ON_ERROR(CLDepthwiseIm2ColKernel::validate(input, &input_reshaped, Size2D(weights_w, weights_h), conv_info, append_bias, depth_multiplier, dilation));
415 
416  const TensorShape shape_weights_reshape(patch_size, weights_z);
417  TensorInfo weights_reshaped(weights->clone()->set_is_resizable(true).reset_padding().set_tensor_shape(shape_weights_reshape));
419 
420  DataType v2mm_dt = (input->data_type() == DataType::QASYMM8) ? DataType::S32 : input->data_type();
421  TensorShape shape_v2mm_out = input->tensor_shape();
422  shape_v2mm_out.set(0, conv_size * weights_z);
423  shape_v2mm_out.set(1, 1);
424  shape_v2mm_out.set(2, 1);
425  TensorInfo v2mm_output(input->clone()->set_is_resizable(true).reset_padding().set_data_type(v2mm_dt).set_tensor_shape(shape_v2mm_out));
426  ARM_COMPUTE_RETURN_ON_ERROR(CLGEMMMatrixVectorMultiplyKernel::validate(&input_reshaped, &weights_reshaped, &v2mm_output));
427 
428  TensorInfo output_reshaped(v2mm_output.clone()->set_is_resizable(true).reset_padding().set_tensor_shape(output_shape));
429  ARM_COMPUTE_RETURN_ON_ERROR(CLDepthwiseVectorToTensorKernel::validate(&v2mm_output, (is_quantized) ? &output_reshaped : output, conv_w, conv_h));
430 
431  if(is_quantized)
432  {
433  const UniformQuantizationInfo iq_info = input->quantization_info().uniform();
434  const UniformQuantizationInfo wq_info = weights->quantization_info().uniform();
435  const UniformQuantizationInfo oq_info = (output->total_size() == 0) ? iq_info : output->quantization_info().uniform();
436 
437  const float multiplier = iq_info.scale * wq_info.scale / oq_info.scale;
438  ARM_COMPUTE_UNUSED(multiplier);
439  ARM_COMPUTE_RETURN_ERROR_ON(multiplier > 1.0f);
441  }
442 
443  // Validate Activation Layer
444  if(act_info.enabled())
445  {
447  }
448  }
449  else
450  {
452  }
453  return Status{};
454 }

References arm_compute::test::validation::act_info, ARM_COMPUTE_RETURN_ERROR_ON, ARM_COMPUTE_RETURN_ERROR_ON_NULLPTR, ARM_COMPUTE_RETURN_ON_ERROR, ARM_COMPUTE_UNUSED, arm_compute::CHANNEL, ICloneable< T >::clone(), arm_compute::misc::shape_calculator::compute_depthwise_convolution_shape(), arm_compute::test::validation::conv_info, ITensorInfo::data_layout(), ITensorInfo::data_type(), arm_compute::test::validation::dilation, ITensorInfo::dimension(), arm_compute::get_data_layout_dimension_index(), arm_compute::HEIGHT, arm_compute::is_data_type_quantized_asymmetric(), arm_compute::MIDGARD, arm_compute::test::validation::output_shape, arm_compute::QASYMM8, ITensorInfo::quantization_info(), arm_compute::S32, UniformQuantizationInfo::scale, TensorShape::set(), ITensorInfo::tensor_shape(), QuantizationInfo::uniform(), CLActivationLayer::validate(), CLGEMMMatrixVectorMultiplyKernel::validate(), CLDepthwiseConvolutionLayerReshapeWeightsGenericKernel::validate(), CLDepthwiseVectorToTensorKernel::validate(), CLDirectConvolutionLayerOutputStageKernel::validate(), CLDepthwiseIm2ColKernel::validate(), CLDepthwiseConvolutionLayer3x3::validate(), arm_compute::test::validation::weights, and arm_compute::WIDTH.
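
A hedged sketch of the intended use: query validate() with the tensors' metadata before calling configure(), and surface the error message on failure. The helper name depthwise_config_is_valid and the padding values are illustrative assumptions:

#include <iostream>

#include "arm_compute/runtime/CL/CLTensor.h"
#include "arm_compute/runtime/CL/functions/CLDepthwiseConvolutionLayer.h"

using namespace arm_compute;

bool depthwise_config_is_valid(const CLTensor &input, const CLTensor &weights,
                               const CLTensor &biases, const CLTensor &output)
{
    // Pre-flight check on the metadata only; no OpenCL work is done here
    const Status status = CLDepthwiseConvolutionLayer::validate(
        input.info(), weights.info(), biases.info(), output.info(),
        PadStrideInfo(1, 1, 2, 2));

    if(status.error_code() != ErrorCode::OK)
    {
        std::cerr << "Invalid configuration: " << status.error_description() << std::endl;
        return false;
    }
    return true;
}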


The documentation for this class was generated from the following files:

  • CLDepthwiseConvolutionLayer.h
  • CLDepthwiseConvolutionLayer.cpp