Compute Library
 21.02
CLConvolutionLayer Class Reference

Basic function to compute the convolution layer. More...

#include <CLConvolutionLayer.h>


Public Member Functions

 CLConvolutionLayer (std::shared_ptr< IMemoryManager > memory_manager=nullptr)
 Default constructor. More...
 
 ~CLConvolutionLayer ()
 Default Destructor. More...
 
 CLConvolutionLayer (const CLConvolutionLayer &)=delete
 Prevent instances of this class from being copied (As this class contains pointers) More...
 
 CLConvolutionLayer (CLConvolutionLayer &&)=default
 Default move constructor. More...
 
CLConvolutionLayer & operator= (const CLConvolutionLayer &)=delete
 Prevent instances of this class from being copied (As this class contains pointers) More...
 
CLConvolutionLayer & operator= (CLConvolutionLayer &&)=default
 Default move assignment operator. More...
 
void configure (ICLTensor *input, const ICLTensor *weights, const ICLTensor *biases, ICLTensor *output, const PadStrideInfo &conv_info, const WeightsInfo &weights_info=WeightsInfo(), const Size2D &dilation=Size2D(1U, 1U), const ActivationLayerInfo &act_info=ActivationLayerInfo(), bool enable_fast_math=false, unsigned int num_groups=1)
 Set the input and output tensors. More...
 
void configure (const CLCompileContext &compile_context, ICLTensor *input, const ICLTensor *weights, const ICLTensor *biases, ICLTensor *output, const PadStrideInfo &conv_info, const WeightsInfo &weights_info=WeightsInfo(), const Size2D &dilation=Size2D(1U, 1U), const ActivationLayerInfo &act_info=ActivationLayerInfo(), bool enable_fast_math=false, unsigned int num_groups=1)
 Set the input and output tensors. More...
 
void run () override
 Run the kernels contained in the function. More...
 
void prepare () override
 Prepare the function for executing. More...
 
- Public Member Functions inherited from IFunction
virtual ~IFunction ()=default
 Destructor. More...
 

Static Public Member Functions

static Status validate (const ITensorInfo *input, const ITensorInfo *weights, const ITensorInfo *biases, const ITensorInfo *output, const PadStrideInfo &conv_info, const WeightsInfo &weights_info=WeightsInfo(), const Size2D &dilation=Size2D(1U, 1U), const ActivationLayerInfo &act_info=ActivationLayerInfo(), bool enable_fast_math=false, unsigned int num_groups=1)
 Static function to check if given info will lead to a valid configuration of CLConvolutionLayer. More...
 
static ConvolutionMethod get_convolution_method (const ITensorInfo *input, const ITensorInfo *weights, const ITensorInfo *output, const PadStrideInfo &conv_info, const WeightsInfo &weights_info, const ActivationLayerInfo &act_info, const GPUTarget gpu_target, const Size2D &dilation=Size2D(1U, 1U), bool enable_fast_math=false)
 Static function that returns the convolution method that CLConvolutionLayer would select for the given info. More...
 

Detailed Description

Basic function to compute the convolution layer.

This function calls the following OpenCL kernels/functions:

  1. CLGEMMConvolutionLayer
  2. CLWinogradConvolutionLayer
  3. CLDirectConvolutionLayer
  4. CLFFTConvolutionLayer

The function selects one of the algorithms mentioned above based on:

  • The size of the kernel
  • Number of input/output feature maps
  • Amount of memory needed

Generally GEMM-based convolution is executed when neither Winograd nor FFT nor Direct convolution can be performed.

FP32

Algorithm   Filter Size                                            Input/Output feature maps
Winograd    3x3, 1x3, 3x1, 5x1, 1x5, 5x5 (fast maths), 7x1, 1x7    Input channels greater than 3
FFT         Square kernels larger than 9x9                         Input feature maps > Output feature maps
DirectConv  9x9                                                    -
GEMM        Any size                                               -

Winograd 5x5 requires fast maths enabled.

FP16

Algorithm   Filter Size                      Input/Output feature maps
Winograd    3x3, 1x3, 3x1, 5x1, 1x5, 5x5    Input channels greater than 3
FFT         Not supported                   -
DirectConv  9x9                             -
GEMM        Any size                        -

Winograd FP16 requires fast maths enabled.
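The FP32 rules above can be sketched as a standalone function. This is only an illustrative approximation of the documented table, not the library's actual get_convolution_method() heuristic (which also inspects data layout, dilation, and known network configurations); select_method_fp32 is a hypothetical helper name.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <utility>

// Illustrative sketch of the documented FP32 selection table (not library code).
std::string select_method_fp32(std::size_t kernel_w, std::size_t kernel_h,
                               std::size_t ifm, std::size_t ofm,
                               bool fast_math)
{
    // Filter shapes eligible for Winograd; 5x5 additionally requires fast maths
    static const std::set<std::pair<std::size_t, std::size_t>> winograd_sizes =
    {
        {3, 3}, {1, 3}, {3, 1}, {5, 1}, {1, 5}, {7, 1}, {1, 7}
    };
    const bool winograd_shape = winograd_sizes.count({kernel_w, kernel_h}) != 0
                                || (kernel_w == 5 && kernel_h == 5 && fast_math);
    if(winograd_shape && ifm > 3)
    {
        return "WINOGRAD";
    }
    // FFT: square kernels larger than 9x9, more input than output feature maps
    if(kernel_w == kernel_h && kernel_w > 9 && ifm > ofm)
    {
        return "FFT";
    }
    if(kernel_w == 9 && kernel_h == 9)
    {
        return "DIRECT";
    }
    return "GEMM"; // fallback for any size
}
```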

Definition at line 71 of file CLConvolutionLayer.h.

Constructor & Destructor Documentation

◆ CLConvolutionLayer() [1/3]

CLConvolutionLayer ( std::shared_ptr< IMemoryManager > memory_manager = nullptr )

Default constructor.

Definition at line 41 of file CLConvolutionLayer.cpp.

References CLConvolutionLayer::~CLConvolutionLayer().

42  : _memory_manager(std::move(memory_manager)), _function()
43 {
44 }

◆ ~CLConvolutionLayer()

~CLConvolutionLayer ( )
default

Default Destructor.

Referenced by CLConvolutionLayer::CLConvolutionLayer().

◆ CLConvolutionLayer() [2/3]

CLConvolutionLayer ( const CLConvolutionLayer & )
delete

Prevent instances of this class from being copied (As this class contains pointers)

◆ CLConvolutionLayer() [3/3]

CLConvolutionLayer ( CLConvolutionLayer && )
default

Default move constructor.

Member Function Documentation

◆ configure() [1/2]

void configure ( ICLTensor * input,
const ICLTensor * weights,
const ICLTensor * biases,
ICLTensor * output,
const PadStrideInfo & conv_info,
const WeightsInfo & weights_info = WeightsInfo(),
const Size2D & dilation = Size2D(1U, 1U),
const ActivationLayerInfo & act_info = ActivationLayerInfo(),
bool  enable_fast_math = false,
unsigned int  num_groups = 1 
)

Set the input and output tensors.

Parameters
[in]  input            Source tensor. The 3 lower dimensions represent a single input [width, height, IFM], while every optional dimension from 4 and above represents a batch of inputs. Data types supported: QASYMM8/QASYMM8_SIGNED/F16/F32.
[in]  weights          Weights tensor. Weights are a 4D tensor with dimensions [kernel_x, kernel_y, IFM, OFM]. Data type supported: Same as input, or QASYMM8/QSYMM8_PER_CHANNEL when input is QASYMM8.
[in]  biases           Biases tensor. Shared biases are supported. Biases are a 1D tensor with dimensions [OFM]. Data type supported: Should match the input data type, except for QASYMM8 input, where biases should be of S32 type.
[out] output           Destination tensor. The 3 lower dimensions represent a single output [width, height, OFM], while the rest represent a batch of outputs. Data types supported: Same as input.
[in]  conv_info        Contains padding and stride information described in PadStrideInfo.
[in]  weights_info     Specifies if the weights tensor has been reshaped with CLWeightsReshapeKernel. Data type supported: Same as input.
[in]  dilation         (Optional) Dilation, in elements, across x and y. Defaults to (1, 1).
[in]  act_info         (Optional) Activation layer information in case of a fused activation.
[in]  enable_fast_math (Optional) Enable fast math computation. If this flag is set, the function may dispatch the fastest implementation available, which can introduce a drop in accuracy. Default is false.
[in]  num_groups       (Optional) Number of groups when performing a grouped convolution. num_groups != 1 is only supported for the NCHW data layout.

Definition at line 48 of file CLConvolutionLayer.cpp.

References CLKernelLibrary::get().

Referenced by CLDirectDeconvolutionLayer::configure().

50 {
51  configure(CLKernelLibrary::get().get_compile_context(), input, weights, biases, output, conv_info, weights_info, dilation, act_info, enable_fast_math, num_groups);
52 }
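As an end-to-end usage sketch (not taken from this page; the CLTensor/TensorInfo setup, CLScheduler::default_init(), and CLScheduler::sync() are assumed from the wider Compute Library 21.02 runtime API, and the shapes are arbitrary):

```cpp
#include "arm_compute/runtime/CL/CLScheduler.h"
#include "arm_compute/runtime/CL/CLTensor.h"
#include "arm_compute/runtime/CL/functions/CLConvolutionLayer.h"

using namespace arm_compute;

int main()
{
    CLScheduler::get().default_init(); // create context/queue for the default CL device

    // 3x3 convolution, 16 IFM -> 32 OFM on a 64x64 FP32 input (NCHW)
    CLTensor input, weights, biases, output;
    input.allocator()->init(TensorInfo(TensorShape(64U, 64U, 16U), 1, DataType::F32));
    weights.allocator()->init(TensorInfo(TensorShape(3U, 3U, 16U, 32U), 1, DataType::F32));
    biases.allocator()->init(TensorInfo(TensorShape(32U), 1, DataType::F32));
    output.allocator()->init(TensorInfo(TensorShape(64U, 64U, 32U), 1, DataType::F32));

    CLConvolutionLayer conv;
    conv.configure(&input, &weights, &biases, &output,
                   PadStrideInfo(1, 1, 1, 1)); // stride 1, pad 1 (same-size output)

    input.allocator()->allocate();
    weights.allocator()->allocate();
    biases.allocator()->allocate();
    output.allocator()->allocate();

    // ... fill input/weights/biases here ...

    conv.run();                // enqueue the kernels
    CLScheduler::get().sync(); // run() does not block; wait before reading output
    return 0;
}
```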

◆ configure() [2/2]

void configure ( const CLCompileContext & compile_context,
ICLTensor * input,
const ICLTensor * weights,
const ICLTensor * biases,
ICLTensor * output,
const PadStrideInfo & conv_info,
const WeightsInfo & weights_info = WeightsInfo(),
const Size2D & dilation = Size2D(1U, 1U),
const ActivationLayerInfo & act_info = ActivationLayerInfo(),
bool  enable_fast_math = false,
unsigned int  num_groups = 1 
)

Set the input and output tensors.

Parameters
[in]  compile_context  The compile context to be used.
[in]  input            Source tensor. The 3 lower dimensions represent a single input [width, height, IFM], while every optional dimension from 4 and above represents a batch of inputs. Data types supported: QASYMM8/QASYMM8_SIGNED/F16/F32.
[in]  weights          Weights tensor. Weights are a 4D tensor with dimensions [kernel_x, kernel_y, IFM, OFM]. Data type supported: Same as input, or QASYMM8/QSYMM8_PER_CHANNEL when input is QASYMM8.
[in]  biases           Biases tensor. Shared biases are supported. Biases are a 1D tensor with dimensions [OFM]. Data type supported: Should match the input data type, except for QASYMM8 input, where biases should be of S32 type.
[out] output           Destination tensor. The 3 lower dimensions represent a single output [width, height, OFM], while the rest represent a batch of outputs. Data types supported: Same as input.
[in]  conv_info        Contains padding and stride information described in PadStrideInfo.
[in]  weights_info     Specifies if the weights tensor has been reshaped with CLWeightsReshapeKernel. Data type supported: Same as input.
[in]  dilation         (Optional) Dilation, in elements, across x and y. Defaults to (1, 1).
[in]  act_info         (Optional) Activation layer information in case of a fused activation.
[in]  enable_fast_math (Optional) Enable fast math computation. If this flag is set, the function may dispatch the fastest implementation available, which can introduce a drop in accuracy. Default is false.
[in]  num_groups       (Optional) Number of groups when performing a grouped convolution. num_groups != 1 is only supported for the NCHW data layout.

Definition at line 54 of file CLConvolutionLayer.cpp.

References ARM_COMPUTE_ERROR, ARM_COMPUTE_ERROR_ON, ARM_COMPUTE_ERROR_ON_NULLPTR, ARM_COMPUTE_ERROR_THROW_ON, arm_compute::test::validation::conv_info, arm_compute::DIRECT, arm_compute::FFT, arm_compute::GEMM, CLScheduler::get(), CLConvolutionLayer::get_convolution_method(), ITensor::info(), arm_compute::test::validation::num_groups, CLScheduler::target(), CLConvolutionLayer::validate(), arm_compute::test::validation::weights_info, and arm_compute::WINOGRAD.

57 {
58  ARM_COMPUTE_ERROR_ON_NULLPTR(input, weights, output);
59  ARM_COMPUTE_ERROR_THROW_ON(CLConvolutionLayer::validate(input->info(), weights->info(), ((biases != nullptr) ? biases->info() : nullptr), output->info(), conv_info, weights_info, dilation, act_info,
60  enable_fast_math, num_groups));
61 
62  switch(CLConvolutionLayer::get_convolution_method(input->info(), weights->info(), output->info(), conv_info,
63  weights_info, act_info, CLScheduler::get().target(), dilation, enable_fast_math))
64  {
65  case ConvolutionMethod::WINOGRAD:
66  {
67  ARM_COMPUTE_ERROR_ON(num_groups != 1);
68  auto f = std::make_unique<CLWinogradConvolutionLayer>(_memory_manager);
69  f->configure(compile_context, input, weights, biases, output, conv_info, act_info, enable_fast_math);
70  _function = std::move(f);
71  break;
72  }
73  case ConvolutionMethod::DIRECT:
74  {
75  ARM_COMPUTE_ERROR_ON(num_groups != 1);
76  auto f = std::make_unique<CLDirectConvolutionLayer>();
77  f->configure(compile_context, input, weights, biases, output, conv_info, act_info);
78  _function = std::move(f);
79  break;
80  }
81  case ConvolutionMethod::GEMM:
82  {
83  auto f = std::make_unique<CLGEMMConvolutionLayer>(_memory_manager);
84  f->configure(compile_context, input, weights, biases, output, conv_info, weights_info, dilation, act_info, num_groups);
85  _function = std::move(f);
86  break;
87  }
88  case ConvolutionMethod::FFT:
89  {
90  auto f = std::make_unique<CLFFTConvolutionLayer>(_memory_manager);
91  f->configure(compile_context, input, weights, biases, output, conv_info, act_info, enable_fast_math);
92  _function = std::move(f);
93  break;
94  }
95  default:
96  ARM_COMPUTE_ERROR("Not supported.");
97  break;
98  }
99 }

◆ get_convolution_method()

ConvolutionMethod get_convolution_method ( const ITensorInfo * input,
const ITensorInfo * weights,
const ITensorInfo * output,
const PadStrideInfo & conv_info,
const WeightsInfo & weights_info,
const ActivationLayerInfo & act_info,
const GPUTarget  gpu_target,
const Size2D & dilation = Size2D(1U, 1U),
bool  enable_fast_math = false 
)
static

Static function that returns the convolution method that CLConvolutionLayer would select for the given info.

Parameters
[in]  input            Source tensor. The 3 lower dimensions represent a single input [width, height, IFM], while every optional dimension from 4 and above represents a batch of inputs. Data types supported: QASYMM8/QASYMM8_SIGNED/F16/F32.
[in]  weights          Weights tensor. Weights are a 4D tensor with dimensions [kernel_x, kernel_y, IFM, OFM]. Data type supported: Same as input, or QASYMM8/QSYMM8_PER_CHANNEL when input is QASYMM8.
[in]  output           Destination tensor. The 3 lower dimensions represent a single output [width, height, OFM], while the rest represent a batch of outputs. Data types supported: Same as input.
[in]  conv_info        Contains padding and stride information described in PadStrideInfo.
[in]  weights_info     Specifies if the weights tensor has been reshaped with CLWeightsReshapeKernel.
[in]  act_info         (Optional) Activation layer information in case of a fused activation.
[in]  gpu_target       Specifies the GPUTarget.
[in]  dilation         (Optional) Dilation, in elements, across x and y. Defaults to (1, 1).
[in]  enable_fast_math (Optional) Enable fast math computation. If this flag is set, the function may dispatch the fastest implementation available, which can introduce a drop in accuracy. Default is false.
Returns
the ConvolutionMethod that CLConvolutionLayer would use for the given configuration
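A sketch of querying the dispatch decision up front, without configuring or allocating anything. The TensorInfo/PadStrideInfo construction is assumed from the wider library API, and the shapes here are arbitrary:

```cpp
#include "arm_compute/core/TensorInfo.h"
#include "arm_compute/core/Types.h"
#include "arm_compute/runtime/CL/CLScheduler.h"
#include "arm_compute/runtime/CL/functions/CLConvolutionLayer.h"

using namespace arm_compute;

// Ask which backend would be dispatched for a 3x3, 3->64 convolution
// on a 224x224 FP32 input, using only tensor metadata.
ConvolutionMethod query_method()
{
    const TensorInfo input(TensorShape(224U, 224U, 3U), 1, DataType::F32);
    const TensorInfo weights(TensorShape(3U, 3U, 3U, 64U), 1, DataType::F32);
    const TensorInfo output(TensorShape(224U, 224U, 64U), 1, DataType::F32);

    return CLConvolutionLayer::get_convolution_method(&input, &weights, &output,
                                                      PadStrideInfo(1, 1, 1, 1),
                                                      WeightsInfo(), ActivationLayerInfo(),
                                                      CLScheduler::get().target());
}
```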

Definition at line 145 of file CLConvolutionLayer.cpp.

References ARM_COMPUTE_ERROR_ON_NULLPTR, ARM_COMPUTE_UNUSED, arm_compute::CHANNEL, arm_compute::test::validation::data_layout, ITensorInfo::data_layout(), ITensorInfo::dimension(), arm_compute::DIRECT, arm_compute::FFT, arm_compute::FLOOR, arm_compute::GEMM, arm_compute::get_data_layout_dimension_index(), arm_compute::HEIGHT, arm_compute::test::validation::info, arm_compute::NCHW, arm_compute::NHWC, PadStrideInfo::pad_bottom(), PadStrideInfo::pad_left(), PadStrideInfo::pad_right(), PadStrideInfo::pad_top(), PadStrideInfo::stride(), arm_compute::U, CLDirectConvolutionLayer::validate(), CLWinogradConvolutionLayer::validate(), CLFFTConvolutionLayer::validate(), arm_compute::WIDTH, and arm_compute::WINOGRAD.

Referenced by CLConvolutionLayer::configure(), arm_compute::test::validation::DATA_TEST_CASE(), and CLConvolutionLayer::validate().

147 {
148  ARM_COMPUTE_ERROR_ON_NULLPTR(input);
149  ARM_COMPUTE_ERROR_ON_NULLPTR(output);
150  ARM_COMPUTE_ERROR_ON_NULLPTR(weights);
151  ARM_COMPUTE_UNUSED(weights_info);
152  ARM_COMPUTE_UNUSED(gpu_target);
153 
154  const size_t idx_w = get_data_layout_dimension_index(input->data_layout(), DataLayoutDimension::WIDTH);
155  const size_t idx_h = get_data_layout_dimension_index(input->data_layout(), DataLayoutDimension::HEIGHT);
156  const size_t idx_c = get_data_layout_dimension_index(input->data_layout(), DataLayoutDimension::CHANNEL);
157 
158  /* Input spatial dims, kernel size, IFM/OFM, conv info*/
159  using ConvolutionConfiguration = std::tuple<Size2D, Size2D, Size2D, PadStrideInfo, DataLayout>;
160  using ConfigurationMethod = std::pair<ConvolutionConfiguration, ConvolutionMethod>;
161 
162  const std::vector<ConfigurationMethod> known_configs =
163  {
164  // Alexnet
165  ConfigurationMethod(ConvolutionConfiguration(Size2D(27U, 27U), Size2D(5U, 5U), Size2D(48U, 128U), PadStrideInfo(1U, 1U, 2U, 2U), DataLayout::NCHW), ConvolutionMethod::DIRECT),
166  // VGG16 / VGG19
167  ConfigurationMethod(ConvolutionConfiguration(Size2D(224U, 224U), Size2D(3U, 3U), Size2D(3U, 64U), PadStrideInfo(1U, 1U, 1U, 1U), DataLayout::NCHW), ConvolutionMethod::DIRECT),
168  // Mobilenet 224
169  ConfigurationMethod(ConvolutionConfiguration(Size2D(224U, 224U), Size2D(3U, 3U), Size2D(3U, 32U), PadStrideInfo(2U, 2U, 0U, 1U, 0U, 1U, DimensionRoundingType::FLOOR), DataLayout::NCHW), ConvolutionMethod::GEMM),
170  // Mobilenet 160
171  ConfigurationMethod(ConvolutionConfiguration(Size2D(160U, 160U), Size2D(3U, 3U), Size2D(3U, 24U), PadStrideInfo(2U, 2U, 0U, 1U, 0U, 1U, DimensionRoundingType::FLOOR), DataLayout::NCHW), ConvolutionMethod::GEMM),
172  // Mobilenet 224
173  ConfigurationMethod(ConvolutionConfiguration(Size2D(224U, 224U), Size2D(3U, 3U), Size2D(3U, 32U), PadStrideInfo(2U, 2U, 0U, 1U, 0U, 1U, DimensionRoundingType::FLOOR), DataLayout::NHWC), ConvolutionMethod::GEMM),
174  // Mobilenet 160
175  ConfigurationMethod(ConvolutionConfiguration(Size2D(160U, 160U), Size2D(3U, 3U), Size2D(3U, 24U), PadStrideInfo(2U, 2U, 0U, 1U, 0U, 1U, DimensionRoundingType::FLOOR), DataLayout::NHWC), ConvolutionMethod::GEMM),
176  };
177 
178  const auto find_config = [&](ConfigurationMethod c)
179  {
180  const ConvolutionConfiguration config = c.first;
181  const PadStrideInfo info = std::get<3>(config);
182  const DataLayout data_layout = std::get<4>(config);
183 
184  return std::get<0>(config) == Size2D(input->dimension(idx_w), input->dimension(idx_h)) && std::get<1>(config) == Size2D(weights->dimension(idx_w), weights->dimension(idx_h))
185  && std::get<2>(config) == Size2D(weights->dimension(idx_c), weights->dimension(3)) && info.pad_top() == conv_info.pad_top() && info.pad_right() == conv_info.pad_right()
186  && info.pad_bottom() == conv_info.pad_bottom() && info.pad_left() == conv_info.pad_left() && info.stride() == conv_info.stride() && (data_layout == input->data_layout());
187  };
188 
189  std::vector<ConfigurationMethod>::const_iterator found;
190  if((found = std::find_if(known_configs.begin(), known_configs.end(), find_config)) != known_configs.end())
191  {
192  return (*found).second;
193  }
194 
195  if(dilation != Size2D(1U, 1U))
196  {
197  return ConvolutionMethod::GEMM;
198  }
199  else
200  {
201  if(input->data_layout() == DataLayout::NCHW)
202  {
203  // SRGAN
204  if((input->dimension(idx_h) > 720U) && (output->dimension(idx_h) > 720U) && (weights->dimension(idx_h) == 9) && (conv_info.pad_top() < 3)
205  && (CLDirectConvolutionLayer::validate(input, weights, nullptr, output, conv_info, act_info)))
206  {
207  return ConvolutionMethod::DIRECT;
208  }
209  if((weights->dimension(idx_h) > 5) && (input->dimension(idx_c) > output->dimension(idx_c)) && (CLFFTConvolutionLayer::validate(input, weights, nullptr, output, conv_info, act_info, enable_fast_math)))
210  {
211  return ConvolutionMethod::FFT;
212  }
213  if(input->dimension(idx_c) < 16)
214  {
215  return ConvolutionMethod::GEMM;
216  }
217  return bool(CLWinogradConvolutionLayer::validate(input, weights, nullptr, output, conv_info, act_info, enable_fast_math)) ? ConvolutionMethod::WINOGRAD : ConvolutionMethod::GEMM;
218  }
219  else
220  {
221  // SRGAN
222  if((input->dimension(idx_h) > 720U) && (output->dimension(idx_h) > 720U) && (weights->dimension(idx_h) == 9) && (conv_info.pad_top() < 3)
223  && (CLDirectConvolutionLayer::validate(input, weights, nullptr, output, conv_info, act_info)))
224  {
225  return ConvolutionMethod::DIRECT;
226  }
227  if((weights->dimension(idx_h) > 7) && (input->dimension(idx_c) > output->dimension(idx_c)) && (CLDirectConvolutionLayer::validate(input, weights, nullptr, output, conv_info, act_info)))
228  {
229  return ConvolutionMethod::DIRECT;
230  }
231  if(input->dimension(idx_c) < 16)
232  {
233  return ConvolutionMethod::GEMM;
234  }
235  return bool(CLWinogradConvolutionLayer::validate(input, weights, nullptr, output, conv_info, act_info, enable_fast_math)) ? ConvolutionMethod::WINOGRAD : ConvolutionMethod::GEMM;
236  }
237  }
238 }

◆ operator=() [1/2]

CLConvolutionLayer & operator= ( const CLConvolutionLayer & )
delete

Prevent instances of this class from being copied (As this class contains pointers)

◆ operator=() [2/2]

CLConvolutionLayer& operator= ( CLConvolutionLayer &&  )
default

Default move assignment operator.

◆ prepare()

void prepare ( )
overridevirtual

Prepare the function for executing.

Any one-off pre-processing step required by the function is handled here.

Note
Prepare stage might not need all the function's buffers' backing memory to be available in order to execute

Reimplemented from IFunction.

Definition at line 246 of file CLConvolutionLayer.cpp.

Referenced by CLDirectDeconvolutionLayer::prepare(), and CLConvolutionLayer::run().

247 {
248  _function->prepare();
249 }

◆ run()

void run ( )
overridevirtual

Run the kernels contained in the function.

For Neon kernels:

  • Multi-threading is used for the kernels which are parallelisable.
  • By default std::thread::hardware_concurrency() threads are used.
Note
CPPScheduler::set_num_threads() can be used to manually set the number of threads

For OpenCL kernels:

  • All the kernels are enqueued on the queue associated with CLScheduler.
  • The queue is then flushed.
Note
The function will not block until the kernels are executed. It is the user's responsibility to wait.
prepare() will be called on the first run if it has not been called already
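The non-blocking behaviour above can be sketched as follows. This is an illustrative fragment, not from this page: it assumes an already-configured CLConvolutionLayer with allocated tensors, and CLScheduler::sync() from the wider library API.

```cpp
#include "arm_compute/runtime/CL/CLScheduler.h"
#include "arm_compute/runtime/CL/functions/CLConvolutionLayer.h"

// Process several batches with one configured function object.
void process_batches(arm_compute::CLConvolutionLayer &conv, int num_batches)
{
    conv.prepare(); // optional: perform one-off work (e.g. weights transformation) up front
    for(int i = 0; i < num_batches; ++i)
    {
        // ... upload the next input batch here ...
        conv.run(); // enqueues the kernels and flushes the queue; does not block
    }
    arm_compute::CLScheduler::get().sync(); // wait before reading back results
}
```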

Implements IFunction.

Definition at line 240 of file CLConvolutionLayer.cpp.

References CLConvolutionLayer::prepare().

Referenced by CLDirectDeconvolutionLayer::run().

241 {
242  prepare();
243  _function->run();
244 }

◆ validate()

Status validate ( const ITensorInfo * input,
const ITensorInfo * weights,
const ITensorInfo * biases,
const ITensorInfo * output,
const PadStrideInfo & conv_info,
const WeightsInfo & weights_info = WeightsInfo(),
const Size2D & dilation = Size2D(1U, 1U),
const ActivationLayerInfo & act_info = ActivationLayerInfo(),
bool  enable_fast_math = false,
unsigned int  num_groups = 1 
)
static

Static function to check if given info will lead to a valid configuration of CLConvolutionLayer.

Parameters
[in]  input            Source tensor. The 3 lower dimensions represent a single input [width, height, IFM], while every optional dimension from 4 and above represents a batch of inputs. Data types supported: QASYMM8/QASYMM8_SIGNED/F16/F32.
[in]  weights          Weights tensor. Weights are a 4D tensor with dimensions [kernel_x, kernel_y, IFM, OFM]. Data type supported: Same as input, or QASYMM8/QSYMM8_PER_CHANNEL when input is QASYMM8.
[in]  biases           Biases tensor. Shared biases are supported. Biases are a 1D tensor with dimensions [OFM]. Data type supported: Same as input.
[in]  output           Destination tensor. The 3 lower dimensions represent a single output [width, height, OFM], while the rest represent a batch of outputs. Data types supported: Same as input.
[in]  conv_info        Contains padding and stride information described in PadStrideInfo.
[in]  weights_info     Specifies if the weights tensor has been reshaped with CLWeightsReshapeKernel.
[in]  dilation         (Optional) Dilation, in elements, across x and y. Defaults to (1, 1).
[in]  act_info         (Optional) Activation layer information in case of a fused activation.
[in]  enable_fast_math (Optional) Enable fast math computation. If this flag is set, the function may dispatch the fastest implementation available, which can introduce a drop in accuracy. Default is false.
[in]  num_groups       (Optional) Number of groups when performing a grouped convolution. num_groups != 1 is only supported for the NCHW data layout.
Returns
a status
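A sketch of checking a configuration before committing to it. The TensorInfo setup is assumed from the wider library API, the shapes are arbitrary, and the parameters mirror those of configure():

```cpp
#include "arm_compute/core/TensorInfo.h"
#include "arm_compute/runtime/CL/functions/CLConvolutionLayer.h"
#include <iostream>

using namespace arm_compute;

// Return true if a 3x3, 16->32 convolution on a 64x64 FP32 input is a
// valid configuration, printing the error description otherwise.
bool can_run_conv()
{
    const TensorInfo input(TensorShape(64U, 64U, 16U), 1, DataType::F32);
    const TensorInfo weights(TensorShape(3U, 3U, 16U, 32U), 1, DataType::F32);
    const TensorInfo biases(TensorShape(32U), 1, DataType::F32);
    const TensorInfo output(TensorShape(64U, 64U, 32U), 1, DataType::F32);

    const Status s = CLConvolutionLayer::validate(&input, &weights, &biases, &output,
                                                  PadStrideInfo(1, 1, 1, 1));
    if(!bool(s))
    {
        std::cerr << s.error_description() << "\n";
    }
    return bool(s);
}
```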

Definition at line 101 of file CLConvolutionLayer.cpp.

References ARM_COMPUTE_ERROR, ARM_COMPUTE_RETURN_ERROR_ON_MSG, ARM_COMPUTE_RETURN_ERROR_ON_NULLPTR, ARM_COMPUTE_RETURN_ON_ERROR, ITensorInfo::data_layout(), arm_compute::DIRECT, arm_compute::FFT, arm_compute::GEMM, CLScheduler::get(), CLConvolutionLayer::get_convolution_method(), arm_compute::NCHW, CLScheduler::target(), CLDirectConvolutionLayer::validate(), CLWinogradConvolutionLayer::validate(), CLFFTConvolutionLayer::validate(), CLGEMMConvolutionLayer::validate(), and arm_compute::WINOGRAD.

Referenced by CLConvolutionLayer::configure(), and CLDirectDeconvolutionLayer::validate().

103 {
104  ARM_COMPUTE_RETURN_ERROR_ON_NULLPTR(input, weights, output);
105  ARM_COMPUTE_RETURN_ERROR_ON_MSG((num_groups != 1) && (input->data_layout() != DataLayout::NCHW), "Grouping (num_groups != 1) with NHWC data layout is not supported");
106 
107  const GPUTarget gpu_target = CLScheduler::get().target();
108 
109  switch(CLConvolutionLayer::get_convolution_method(input, weights, output, conv_info, weights_info, act_info, gpu_target, dilation, enable_fast_math))
110  {
111  case ConvolutionMethod::WINOGRAD:
112  {
113  // Validate Winograd
114  ARM_COMPUTE_RETURN_ERROR_ON_MSG(num_groups != 1, "Grouping (num_groups != 1) with CLWinogradConvolutionLayer is not supported");
115  ARM_COMPUTE_RETURN_ON_ERROR(CLWinogradConvolutionLayer::validate(input, weights, biases, output, conv_info, act_info, enable_fast_math));
116  break;
117  }
118  case ConvolutionMethod::DIRECT:
119  {
120  // Validate direct convolution layer
121  ARM_COMPUTE_RETURN_ERROR_ON_MSG(num_groups != 1, "Grouping (num_groups != 1) with CLDirectConvolutionLayer is not supported");
122  ARM_COMPUTE_RETURN_ON_ERROR(CLDirectConvolutionLayer::validate(input, weights, biases, output, conv_info, act_info));
123  break;
124  }
125  case ConvolutionMethod::GEMM:
126  {
127  // Validate gemm-based convolution layer
128  ARM_COMPUTE_RETURN_ON_ERROR(CLGEMMConvolutionLayer::validate(input, weights, biases, output, conv_info, weights_info, dilation, act_info, num_groups));
129  break;
130  }
131  case ConvolutionMethod::FFT:
132  {
133  // Validate FFT-based convolution layer
134  ARM_COMPUTE_RETURN_ON_ERROR(CLFFTConvolutionLayer::validate(input, weights, nullptr, output, conv_info, act_info, enable_fast_math));
135  break;
136  }
137  default:
138  ARM_COMPUTE_ERROR("Not supported.");
139  break;
140  }
141 
142  return Status{};
143 }

The documentation for this class was generated from the following files:

  • CLConvolutionLayer.h
  • CLConvolutionLayer.cpp