Basic function to compute the convolution layer. More...

#include <CLConvolutionLayer.h>

Collaboration diagram for CLConvolutionLayer:

Public Member Functions
	CLConvolutionLayer (std::shared_ptr< IMemoryManager > memory_manager=nullptr)
	Default constructor. More...

	~CLConvolutionLayer ()
	Default Destructor. More...

	CLConvolutionLayer (const CLConvolutionLayer &)=delete
	Prevent instances of this class from being copied (As this class contains pointers) More...

	CLConvolutionLayer (CLConvolutionLayer &&)=default
	Default move constructor. More...

CLConvolutionLayer &	operator= (const CLConvolutionLayer &)=delete
	Prevent instances of this class from being copied (As this class contains pointers) More...

CLConvolutionLayer &	operator= (CLConvolutionLayer &&)=default
	Default move assignment operator. More...

void	configure (ICLTensor input, const ICLTensor weights, const ICLTensor biases, ICLTensor output, const PadStrideInfo &conv_info, const WeightsInfo &weights_info=WeightsInfo(), const Size2D &dilation=Size2D(1U, 1U), const ActivationLayerInfo &act_info=ActivationLayerInfo(), bool enable_fast_math=false, unsigned int num_groups=1)
	Set the input and output tensors. More...

void	configure (const CLCompileContext &compile_context, ICLTensor input, const ICLTensor weights, const ICLTensor biases, ICLTensor output, const PadStrideInfo &conv_info, const WeightsInfo &weights_info=WeightsInfo(), const Size2D &dilation=Size2D(1U, 1U), const ActivationLayerInfo &act_info=ActivationLayerInfo(), bool enable_fast_math=false, unsigned int num_groups=1)
	Set the input and output tensors. More...

void	run () override
	Run the kernels contained in the function. More...

void	prepare () override
	Prepare the function for executing. More...

Public Member Functions inherited from IFunction
virtual	~IFunction ()=default
	Destructor. More...

Static Public Member Functions
static Status	validate (const ITensorInfo input, const ITensorInfo weights, const ITensorInfo biases, const ITensorInfo output, const PadStrideInfo &conv_info, const WeightsInfo &weights_info=WeightsInfo(), const Size2D &dilation=Size2D(1U, 1U), const ActivationLayerInfo &act_info=ActivationLayerInfo(), bool enable_fast_math=false, unsigned int num_groups=1)
	Static function to check if given info will lead to a valid configuration of CLConvolutionLayer. More...

static ConvolutionMethod	get_convolution_method (const ITensorInfo input, const ITensorInfo weights, const ITensorInfo *output, const PadStrideInfo &conv_info, const WeightsInfo &weights_info, const ActivationLayerInfo &act_info, const GPUTarget gpu_target, const Size2D &dilation=Size2D(1U, 1U), bool enable_fast_math=false)
	Static function to check if given info will return the convolution called by CLConvolutionLayer. More...

Detailed Description

Basic function to compute the convolution layer.

This function calls the following OpenCL kernels/functions:

The function selects one of the algorithms mentioned above based on:

The size of the kernel
Number of input/output feature maps
Amount of memory needed

Generally GEMM-based convolution is executed when neither Winograd nor FFT nor Direct convolution can be performed.

FP32 Algorithm	Filter Size	Input/Output feature maps
Winograd	3x3 1x3 3x1 5x1 1x5 5x5(fast maths) 7x1 1x7	Input channels is greater than 3
FFT	Squared kernels and greater than 9x9	Input feature maps > Output feature maps
DirectConv	9x9
GEMM	Any size

Winograd 5x5 requires fast maths enabled.

FP16 Algorithm	Filter Size	Input/Output feature maps
Winograd	3x3 1x3 3x1 5x1 1x5 5x5	Input channels is greater than 3
FFT	Not supported
DirectConv	9x9
GEMM	Any size

Winograd FP16 requires fast maths enabled.

Definition at line 75 of file CLConvolutionLayer.h.

Constructor & Destructor Documentation

◆ CLConvolutionLayer() [1/3]

CLConvolutionLayer ( std::shared_ptr< IMemoryManager > memory_manager = nullptr )

Default constructor.

Definition at line 54 of file CLConvolutionLayer.cpp.

                                                                                    : _impl(std::make_unique<Impl>())
 {
     _impl->memory_manager = std::move(memory_manager);
 }

◆ ~CLConvolutionLayer()

~CLConvolutionLayer ( )

default

Default Destructor.

◆ CLConvolutionLayer() [2/3]

CLConvolutionLayer ( const CLConvolutionLayer & )

delete

Prevent instances of this class from being copied (As this class contains pointers)

◆ CLConvolutionLayer() [3/3]

CLConvolutionLayer ( CLConvolutionLayer && )

default

Default move constructor.

Member Function Documentation

◆ configure() [1/2]

void configure	(	const CLCompileContext &	compile_context,
		ICLTensor *	input,
		const ICLTensor *	weights,
		const ICLTensor *	biases,
		ICLTensor *	output,
		const PadStrideInfo &	conv_info,
		const WeightsInfo &	weights_info = `WeightsInfo()`,
		const Size2D &	dilation = `Size2D(1U, 1U)`,
		const ActivationLayerInfo &	act_info = `ActivationLayerInfo()`,
		bool	enable_fast_math = `false`,
		unsigned int	num_groups = `1`
	)

Set the input and output tensors.

Parameters

[in]	compile_context	The compile context to be used.
[in]	input	Source tensor. 3 lower dimensions represent a single input [width, height, IFM], while every optional dimension from 4 and above represent a batch of inputs. Data types supported: QASYMM8/QASYMM8_SIGNED/F16/F32.
[in]	weights	Weights tensor. Weights are 4D tensor with dimensions [kernel_x, kernel_y, IFM, OFM]. Data type supported: Same as `input`, also could be QSYMM8_PER_CHANNEL if input is QASYMM8/QASYMM8_SIGNED.
[in]	biases	Biases tensor. Shared biases supported. Biases are 1D tensor with dimensions [OFM]. Data type supported: Same as `input`, except for input of QASYMM8/QASYMM8_SIGNED type where biases should be of S32 type.
[out]	output	Destination tensor. 3 lower dimensions represent a single output [width, height, OFM], while the rest represent batch of outputs. Data types supported: Same as `input`.
[in]	conv_info	Contains padding and stride information described in PadStrideInfo.
[in]	weights_info	Specifies if the weights tensor has been reshaped with CLWeightsReshapeKernel. Data type supported: Same as `input`.
[in]	dilation	(Optional) Dilation, in elements, across x and y. Defaults to (1, 1).
[in]	act_info	(Optional) Activation layer information in case of a fused activation.
[in]	enable_fast_math	(Optional) Enable fast math computation. In case this flag were set, the function could dispatch the fastest implementation available which may introduce a drop of accuracy as well. Default is false
[in]	num_groups	(Optional) Number of groups when performing a grouped convolution. num_groups != 1 is only supported for NCHW data layout

Definition at line 76 of file CLConvolutionLayer.cpp.

 {
     ARM_COMPUTE_ERROR_ON_NULLPTR(input, weights, output);
     ARM_COMPUTE_ERROR_THROW_ON(CLConvolutionLayer::validate(
         input->info(), weights->info(), ((biases != nullptr) ? biases->info() : nullptr), output->info(), conv_info,
         weights_info, dilation, act_info, enable_fast_math, num_groups));
     ARM_COMPUTE_LOG_PARAMS(input, weights, biases, output, conv_info, weights_info, dilation, act_info,
                            enable_fast_math, num_groups);
  
     const Conv2dInfo conv2d_info = Conv2dInfo(conv_info, dilation, act_info, enable_fast_math, num_groups);
  
     switch (opencl::ClConv2d::get_convolution_method(input->info(), weights->info(), output->info(), conv2d_info,
                                                      weights_info, CLScheduler::get().target()))
     {
         case ConvolutionMethod::WINOGRAD:
         case ConvolutionMethod::DIRECT:
         case ConvolutionMethod::INDIRECT:
         case ConvolutionMethod::GEMM:
         {
             auto f = std::make_unique<opencl::ClConv2d>();
             f->configure(compile_context, input->info(), weights->info(),
                          ((biases != nullptr) ? biases->info() : nullptr), output->info(), conv2d_info, weights_info);
             _impl->op = std::move(f);
             break;
         }
         case ConvolutionMethod::FFT:
         {
             auto f = std::make_unique<CLFFTConvolutionLayer>(_impl->memory_manager);
             f->configure(compile_context, input, weights, biases, output, conv_info, act_info, enable_fast_math);
             _impl->func = std::move(f);
             break;
         }
         default:
             ARM_COMPUTE_ERROR("Not supported.");
             break;
     }
  
     if (_impl->op)
     {
         _impl->memory_group = MemoryGroup(std::move(_impl->memory_manager));
         _impl->aux_mem_req  = _impl->op->workspace();
         _impl->run_pack     = {{ACL_SRC_0, input}, {ACL_SRC_1, weights}, {ACL_SRC_2, biases}, {ACL_DST, output}};
         _impl->prep_pack    = {{ACL_SRC_1, weights}, {ACL_SRC_2, biases}};
         _impl->workspace =
             manage_workspace<CLTensor>(_impl->aux_mem_req, _impl->memory_group, _impl->run_pack, _impl->prep_pack);
     }
 }

References arm_compute::ACL_DST, arm_compute::ACL_SRC_0, arm_compute::ACL_SRC_1, arm_compute::ACL_SRC_2, arm_compute::test::validation::act_info, ARM_COMPUTE_ERROR, ARM_COMPUTE_ERROR_ON_NULLPTR, ARM_COMPUTE_ERROR_THROW_ON, ARM_COMPUTE_LOG_PARAMS, arm_compute::test::validation::conv_info, arm_compute::DIRECT, arm_compute::FFT, arm_compute::GEMM, CLScheduler::get(), ClConv2d::get_convolution_method(), arm_compute::INDIRECT, ITensor::info(), arm_compute::test::validation::input, arm_compute::test::validation::num_groups, CLScheduler::target(), CLConvolutionLayer::validate(), arm_compute::test::validation::weights_info, and arm_compute::WINOGRAD.

◆ configure() [2/2]

void configure	(	ICLTensor *	input,
		const ICLTensor *	weights,
		const ICLTensor *	biases,
		ICLTensor *	output,
		const PadStrideInfo &	conv_info,
		const WeightsInfo &	weights_info = `WeightsInfo()`,
		const Size2D &	dilation = `Size2D(1U, 1U)`,
		const ActivationLayerInfo &	act_info = `ActivationLayerInfo()`,
		bool	enable_fast_math = `false`,
		unsigned int	num_groups = `1`
	)

Set the input and output tensors.

Valid data layouts:

NHWC
NCHW

Valid data type configurations:

src0	src1	src2	dst
F16	F16	F16	F16
F32	F32	F32	F32
QASYMM8	QASYMM8	S32	QASYMM8
QASYMM8	QSYMM8_PER_CHANNEL	S32	QASYMM8
QASYMM8_SIGNED	QASYMM8_SIGNED	S32	QASYMM8_SIGNED
QASYMM8_SIGNED	QSYMM8_PER_CHANNEL	S32	QASYMM8_SIGNED

Parameters

[in]	input	Source tensor. 3 lower dimensions represent a single input [width, height, IFM], while every optional dimension from 4 and above represent a batch of inputs. Data types supported: QASYMM8/QASYMM8_SIGNED/F16/F32.
[in]	weights	Weights tensor. Weights are 4D tensor with dimensions [kernel_x, kernel_y, IFM, OFM]. Data type supported: Same as `input`, also could be QSYMM8_PER_CHANNEL if input is QASYMM8/QASYMM8_SIGNED.
[in]	biases	Biases tensor. Shared biases supported. Biases are 1D tensor with dimensions [OFM]. Data type supported: Same as `input`, except for input of QASYMM8/QASYMM8_SIGNED type where biases should be of S32 type.
[out]	output	Destination tensor. 3 lower dimensions represent a single output [width, height, OFM], while the rest represent batch of outputs. Data types supported: Same as `input`.
[in]	conv_info	Contains padding and stride information described in PadStrideInfo.
[in]	weights_info	Specifies if the weights tensor has been reshaped with CLWeightsReshapeKernel. Data type supported: Same as `input`.
[in]	dilation	(Optional) Dilation, in elements, across x and y. Defaults to (1, 1).
[in]	act_info	(Optional) Activation layer information in case of a fused activation.
[in]	enable_fast_math	(Optional) Enable fast math computation. In case this flag were set, the function could dispatch the fastest implementation available which may introduce a drop of accuracy as well. Default is false
[in]	num_groups	(Optional) Number of groups when performing a grouped convolution. num_groups != 1 is only supported for NCHW data layout

Definition at line 61 of file CLConvolutionLayer.cpp.

 {
     configure(CLKernelLibrary::get().get_compile_context(), input, weights, biases, output, conv_info, weights_info,
               dilation, act_info, enable_fast_math, num_groups);
 }

References arm_compute::test::validation::act_info, arm_compute::test::validation::conv_info, CLKernelLibrary::get(), arm_compute::test::validation::input, arm_compute::test::validation::num_groups, and arm_compute::test::validation::weights_info.

Referenced by CLDirectDeconvolutionLayer::configure().

◆ get_convolution_method()

ConvolutionMethod get_convolution_method	(	const ITensorInfo *	input,
		const ITensorInfo *	weights,
		const ITensorInfo *	output,
		const PadStrideInfo &	conv_info,
		const WeightsInfo &	weights_info,
		const ActivationLayerInfo &	act_info,
		const GPUTarget	gpu_target,
		const Size2D &	dilation = `Size2D(1U, 1U)`,
		bool	enable_fast_math = `false`
	)

static

Static function to check if given info will return the convolution called by CLConvolutionLayer.

Parameters

[in]	input	Source tensor. 3 lower dimensions represent a single input [width, height, IFM], while every optional dimension from 4 and above represent a batch of inputs. Data types supported: QASYMM8/QASYMM8_SIGNED/F16/F32.
[in]	weights	Weights tensor. Weights are 4D tensor with dimensions [kernel_x, kernel_y, IFM, OFM]. Data type supported: Same as `input`, also could be QSYMM8_PER_CHANNEL if input is QASYMM8/QASYMM8_SIGNED.
[in]	output	Destination tensor. 3 lower dimensions represent a single output [width, height, OFM], while the rest represent batch of outputs. Data types supported: Same as `input`.
[in]	conv_info	Contains padding and stride information described in PadStrideInfo.
[in]	weights_info	Specifies if the weights tensor has been reshaped with CLWeightsReshapeKernel.
[in]	act_info	(Optional) Activation layer information in case of a fused activation.
[in]	gpu_target	Specifies the `GPUTarget`.
[in]	dilation	(Optional) Dilation, in elements, across x and y. Defaults to (1, 1).
[in]	enable_fast_math	(Optional) Enable fast math computation. In case this flag were set, the function could dispatch the fastest implementation available which may introduce a drop of accuracy as well. Default is false

Returns: the Convolution Method Hint

Definition at line 179 of file CLConvolutionLayer.cpp.

 {
     const Conv2dInfo conv2d_info = Conv2dInfo(conv_info, dilation, act_info, enable_fast_math, 1);
     return opencl::ClConv2d::get_convolution_method(input, weights, output, conv2d_info, weights_info, gpu_target);
 }

References arm_compute::test::validation::act_info, arm_compute::test::validation::conv_info, ClConv2d::get_convolution_method(), arm_compute::test::validation::input, and arm_compute::test::validation::weights_info.

◆ operator=() [1/2]

CLConvolutionLayer& operator= ( CLConvolutionLayer && )

default

Default move assignment operator.

◆ operator=() [2/2]

CLConvolutionLayer& operator= ( const CLConvolutionLayer & )

delete

Prevent instances of this class from being copied (As this class contains pointers)

◆ prepare()

void prepare ( )

overridevirtual

Prepare the function for executing.

Any one off pre-processing step required by the function is handled here

Note: Prepare stage might not need all the function's buffers' backing memory to be available in order to execute

Reimplemented from IFunction.

Definition at line 209 of file CLConvolutionLayer.cpp.

 {
     if (_impl->func)
     {
         _impl->func->prepare();
     }
     else
     {
         _impl->op->prepare(_impl->prep_pack);
  
         // Release temporary tensors that are only used in prepare stage
         release_temporaries(_impl->aux_mem_req, _impl->workspace);
     }
 }

References arm_compute::release_temporaries().

Referenced by CLDirectDeconvolutionLayer::prepare(), and CLConvolutionLayer::run().

◆ run()

void run ( )

overridevirtual

Run the kernels contained in the function.

For CPU kernels:

Multi-threading is used for the kernels which are parallelisable.
By default std::thread::hardware_concurrency() threads are used.

Note: CPPScheduler::set_num_threads() can be used to manually set the number of threads

For OpenCL kernels:

All the kernels are enqueued on the queue associated with CLScheduler.
The queue is then flushed.

Note: The function will not block until the kernels are executed. It is the user's responsibility to wait.; Will call prepare() on first run if hasn't been done

Implements IFunction.

Definition at line 193 of file CLConvolutionLayer.cpp.

 {
     prepare();
  
     MemoryGroupResourceScope scope_mg(_impl->memory_group);
  
     if (_impl->func)
     {
         _impl->func->run();
     }
     else
     {
         _impl->op->run(_impl->run_pack);
     }
 }

References CLConvolutionLayer::prepare().

Referenced by CLDirectDeconvolutionLayer::run().

◆ validate()

Status validate	(	const ITensorInfo *	input,
		const ITensorInfo *	weights,
		const ITensorInfo *	biases,
		const ITensorInfo *	output,
		const PadStrideInfo &	conv_info,
		const WeightsInfo &	weights_info = `WeightsInfo()`,
		const Size2D &	dilation = `Size2D(1U, 1U)`,
		const ActivationLayerInfo &	act_info = `ActivationLayerInfo()`,
		bool	enable_fast_math = `false`,
		unsigned int	num_groups = `1`
	)

static

Static function to check if given info will lead to a valid configuration of CLConvolutionLayer.

Parameters

[in]	input	Source tensor. 3 lower dimensions represent a single input [width, height, IFM], while every optional dimension from 4 and above represent a batch of inputs. Data types supported: QASYMM8/QASYMM8_SIGNED/F16/F32.
[in]	weights	Weights tensor. Weights are 4D tensor with dimensions [kernel_x, kernel_y, IFM, OFM]. Data type supported: Same as `input`, also could be QSYMM8_PER_CHANNEL if input is QASYMM8/QASYMM8_SIGNED.
[in]	biases	Biases tensor. Shared biases supported. Biases are 1D tensor with dimensions [OFM]. Data type supported: Same as `input`, except for input of QASYMM8/QASYMM8_SIGNED type where biases should be of S32 type.
[in]	output	Destination tensor. 3 lower dimensions represent a single output [width, height, OFM], while the rest represent batch of outputs. Data types supported: Same as `input`.
[in]	conv_info	Contains padding and stride information described in PadStrideInfo.
[in]	weights_info	Specifies if the weights tensor has been reshaped with CLWeightsReshapeKernel.
[in]	dilation	(Optional) Dilation, in elements, across x and y. Defaults to (1, 1).
[in]	act_info	(Optional) Activation layer information in case of a fused activation.
[in]	enable_fast_math	(Optional) Enable fast math computation. In case this flag were set, the function could dispatch the fastest implementation available which may introduce a drop of accuracy as well. Default is false
[in]	num_groups	(Optional) Number of groups when performing a grouped convolution. num_groups != 1 is only supported for NCHW data layout

Returns: a status

Definition at line 134 of file CLConvolutionLayer.cpp.

 {
     ARM_COMPUTE_RETURN_ERROR_ON_NULLPTR(input, weights, output);
     ARM_COMPUTE_RETURN_ERROR_ON_MSG(!weights->are_values_constant(), "Dynamic weights are not supported");
     ARM_COMPUTE_RETURN_ERROR_ON_MSG((num_groups != 1) && (input->data_layout() != DataLayout::NCHW),
                                     "Grouping (num_groups != 1) with NHWC data layout is not supported");
  
     const GPUTarget  gpu_target  = CLScheduler::get().target();
     const Conv2dInfo conv2d_info = Conv2dInfo(conv_info, dilation, act_info, enable_fast_math, num_groups);
  
     switch (opencl::ClConv2d::get_convolution_method(input, weights, output, conv2d_info, weights_info, gpu_target))
     {
         case ConvolutionMethod::WINOGRAD:
         case ConvolutionMethod::DIRECT:
         case ConvolutionMethod::INDIRECT:
         case ConvolutionMethod::GEMM:
         {
             ARM_COMPUTE_RETURN_ON_ERROR(
                 opencl::ClConv2d::validate(input, weights, biases, output, conv2d_info, weights_info));
             break;
         }
         case ConvolutionMethod::FFT:
         {
             // Validate FFT-based convolution layer
             ARM_COMPUTE_RETURN_ON_ERROR(CLFFTConvolutionLayer::validate(input, weights, nullptr, output, conv_info,
                                                                         act_info, enable_fast_math));
             break;
         }
         default:
             ARM_COMPUTE_ERROR("Not supported.");
             break;
     }
  
     return Status{};
 }

References arm_compute::test::validation::act_info, ITensorInfo::are_values_constant(), ARM_COMPUTE_ERROR, ARM_COMPUTE_RETURN_ERROR_ON_MSG, ARM_COMPUTE_RETURN_ERROR_ON_NULLPTR, ARM_COMPUTE_RETURN_ON_ERROR, arm_compute::test::validation::conv_info, arm_compute::DIRECT, arm_compute::FFT, arm_compute::GEMM, CLScheduler::get(), ClConv2d::get_convolution_method(), arm_compute::INDIRECT, arm_compute::test::validation::input, arm_compute::NCHW, arm_compute::test::validation::num_groups, CLScheduler::target(), ClConv2d::validate(), CLFFTConvolutionLayer::validate(), arm_compute::test::validation::weights_info, and arm_compute::WINOGRAD.

Referenced by CLConvolutionLayer::configure(), and CLDirectDeconvolutionLayer::validate().

The documentation for this class was generated from the following files:

arm_compute/runtime/CL/functions/CLConvolutionLayer.h
src/runtime/CL/functions/CLConvolutionLayer.cpp

Public Member Functions

Static Public Member Functions

Detailed Description

Constructor & Destructor Documentation

◆ CLConvolutionLayer() [1/3]

◆ ~CLConvolutionLayer()

◆ CLConvolutionLayer() [2/3]

◆ CLConvolutionLayer() [3/3]

Member Function Documentation

◆ configure() [1/2]

◆ configure() [2/2]

◆ get_convolution_method()

◆ operator=() [1/2]

◆ operator=() [2/2]

◆ prepare()

◆ run()

◆ validate()