Compute Library 21.02
CLDirectConvolutionLayerKernel Class Reference

Interface for the direct convolution kernel. More...

#include <CLDirectConvolutionLayerKernel.h>

Collaboration diagram for CLDirectConvolutionLayerKernel:

Public Member Functions

 CLDirectConvolutionLayerKernel ()
 Default constructor. More...
 
 CLDirectConvolutionLayerKernel (const CLDirectConvolutionLayerKernel &)=delete
 Prevent instances of this class from being copied (As this class contains pointers) More...
 
CLDirectConvolutionLayerKernel & operator= (const CLDirectConvolutionLayerKernel &)=delete
 Prevent instances of this class from being copied (As this class contains pointers) More...
 
 CLDirectConvolutionLayerKernel (CLDirectConvolutionLayerKernel &&)=default
 Allow instances of this class to be moved. More...
 
CLDirectConvolutionLayerKernel & operator= (CLDirectConvolutionLayerKernel &&)=default
 Allow instances of this class to be moved. More...
 
 ~CLDirectConvolutionLayerKernel ()=default
 Default destructor. More...
 
void configure (const ICLTensor *input, const ICLTensor *weights, const ICLTensor *biases, ICLTensor *output, const PadStrideInfo &conv_info)
 Set the input, weights, biases and output tensors. More...
 
void configure (const CLCompileContext &compile_context, const ICLTensor *input, const ICLTensor *weights, const ICLTensor *biases, ICLTensor *output, const PadStrideInfo &conv_info)
 Set the input, weights, biases and output tensors. More...
 
void run (const Window &window, cl::CommandQueue &queue) override
 Enqueue the OpenCL kernel to process the given window on the passed OpenCL command queue. More...
 
BorderSize border_size () const override
 The size of the border for that kernel. More...
 
- Public Member Functions inherited from ICLKernel
 ICLKernel ()
 Constructor. More...
 
cl::Kernel & kernel ()
 Returns a reference to the OpenCL kernel of this object. More...
 
template<typename T >
void add_1D_array_argument (unsigned int &idx, const ICLArray< T > *array, const Strides &strides, unsigned int num_dimensions, const Window &window)
 Add the passed 1D array's parameters to the object's kernel's arguments starting from the index idx. More...
 
void add_1D_tensor_argument (unsigned int &idx, const ICLTensor *tensor, const Window &window)
 Add the passed 1D tensor's parameters to the object's kernel's arguments starting from the index idx. More...
 
void add_1D_tensor_argument_if (bool cond, unsigned int &idx, const ICLTensor *tensor, const Window &window)
 Add the passed 1D tensor's parameters to the object's kernel's arguments starting from the index idx if the condition is true. More...
 
void add_2D_tensor_argument (unsigned int &idx, const ICLTensor *tensor, const Window &window)
 Add the passed 2D tensor's parameters to the object's kernel's arguments starting from the index idx. More...
 
void add_2D_tensor_argument_if (bool cond, unsigned int &idx, const ICLTensor *tensor, const Window &window)
 Add the passed 2D tensor's parameters to the object's kernel's arguments starting from the index idx if the condition is true. More...
 
void add_3D_tensor_argument (unsigned int &idx, const ICLTensor *tensor, const Window &window)
 Add the passed 3D tensor's parameters to the object's kernel's arguments starting from the index idx. More...
 
void add_4D_tensor_argument (unsigned int &idx, const ICLTensor *tensor, const Window &window)
 Add the passed 4D tensor's parameters to the object's kernel's arguments starting from the index idx. More...
 
virtual void run_op (ITensorPack &tensors, const Window &window, cl::CommandQueue &queue)
 Enqueue the OpenCL kernel to process the given window on the passed OpenCL command queue. More...
 
template<typename T >
void add_argument (unsigned int &idx, T value)
 Add the passed parameters to the object's kernel's arguments starting from the index idx. More...
 
void set_lws_hint (const cl::NDRange &lws_hint)
 Set the Local-Workgroup-Size hint. More...
 
cl::NDRange lws_hint () const
 Return the Local-Workgroup-Size hint. More...
 
void set_wbsm_hint (const cl_int &wbsm_hint)
 Set the workgroup batch size modifier hint. More...
 
cl_int wbsm_hint () const
 Return the workgroup batch size modifier hint. More...
 
const std::string & config_id () const
 Get the configuration ID. More...
 
void set_target (GPUTarget target)
 Set the targeted GPU architecture. More...
 
void set_target (cl::Device &device)
 Set the targeted GPU architecture according to the CL device. More...
 
GPUTarget get_target () const
 Get the targeted GPU architecture. More...
 
size_t get_max_workgroup_size ()
 Get the maximum workgroup size for the device the CLKernelLibrary uses. More...
 
template<unsigned int dimension_size>
void add_tensor_argument (unsigned &idx, const ICLTensor *tensor, const Window &window)
 
template<typename T , unsigned int dimension_size>
void add_array_argument (unsigned &idx, const ICLArray< T > *array, const Strides &strides, unsigned int num_dimensions, const Window &window)
 Add the passed array's parameters to the object's kernel's arguments starting from the index idx. More...
 
- Public Member Functions inherited from IKernel
 IKernel ()
 Constructor. More...
 
virtual ~IKernel ()=default
 Destructor. More...
 
virtual bool is_parallelisable () const
 Indicates whether or not the kernel is parallelisable. More...
 
const Window & window () const
 The maximum window the kernel can be executed on. More...
 

Static Public Member Functions

static Status validate (const ITensorInfo *input, const ITensorInfo *weights, const ITensorInfo *biases, const ITensorInfo *output, const PadStrideInfo &conv_info, const GPUTarget target)
 Static function to check if given info will lead to a valid configuration of CLDirectConvolutionLayerKernel. More...
 
- Static Public Member Functions inherited from ICLKernel
static constexpr unsigned int num_arguments_per_1D_array ()
 Returns the number of arguments enqueued per 1D array object. More...
 
static constexpr unsigned int num_arguments_per_1D_tensor ()
 Returns the number of arguments enqueued per 1D tensor object. More...
 
static constexpr unsigned int num_arguments_per_2D_tensor ()
 Returns the number of arguments enqueued per 2D tensor object. More...
 
static constexpr unsigned int num_arguments_per_3D_tensor ()
 Returns the number of arguments enqueued per 3D tensor object. More...
 
static constexpr unsigned int num_arguments_per_4D_tensor ()
 Returns the number of arguments enqueued per 4D tensor object. More...
 
static cl::NDRange gws_from_window (const Window &window)
 Get the global work size given an execution window. More...
 

Data Fields

const ICLTensor * _input
 
const ICLTensor * _biases
 
const ICLTensor * _weights
 
ICLTensor * _output
 
DataLayout _data_layout
 
BorderSize _border_size
 
int _conv_stride_x
 
int _conv_stride_y
 
PadStrideInfo _conv_info
 

Detailed Description

Interface for the direct convolution kernel.

Definition at line 36 of file CLDirectConvolutionLayerKernel.h.

Constructor & Destructor Documentation

◆ CLDirectConvolutionLayerKernel() [1/3]

Default constructor.

◆ CLDirectConvolutionLayerKernel() [2/3]

Prevent instances of this class from being copied (As this class contains pointers)

◆ CLDirectConvolutionLayerKernel() [3/3]

Allow instances of this class to be moved.

◆ ~CLDirectConvolutionLayerKernel()

Default destructor.

Member Function Documentation

◆ border_size()

BorderSize border_size ( ) const
override virtual

The size of the border for that kernel.

Returns
The width in number of elements of the border.

Reimplemented from IKernel.

Definition at line 324 of file CLDirectConvolutionLayerKernel.cpp.

References CLDirectConvolutionLayerKernel::_border_size.

Referenced by CLDirectConvolutionLayerKernel::configure().

◆ configure() [1/2]

void configure ( const ICLTensor *  input,
const ICLTensor *  weights,
const ICLTensor *  biases,
ICLTensor *  output,
const PadStrideInfo &  conv_info 
)

Set the input, weights, biases and output tensors.

Note
DirectConvolution only works in the following configurations:
- 1x1 convolution with stride_x = 1/2/3, stride_y = 1/2/3
- 3x3 convolution with stride_x = 1/2, stride_y = 1/2
- 5x5 convolution with stride_x = 1/2, stride_y = 1/2
- 9x9 convolution with stride_x = 1/2, stride_y = 1/2
Parameters
[in]  input  The input tensor to convolve. The 3 lower dimensions represent a single input [width, height, IFM], while every optional dimension from 4 and above represents a batch of inputs. Data types supported: QASYMM8_SIGNED/QASYMM8/F16/F32.
[in]  weights  Weights tensor. Weights are a 4D tensor with dimensions [kernel_x, kernel_y, IFM, OFM]. The 3rd dimension must be the same as the input's 3rd dimension. Data type supported: Same as input.
[in]  biases  Biases tensor. Biases are a 1D tensor with dimension [OFM]. Data type supported: Should match the input data type, except for QASYMM8 and QASYMM8_SIGNED inputs, where biases must be of S32 type.
[out]  output  Output tensor. The 3rd dimension must be equal to the 4th dimension of the weights tensor. Data types supported: Same as input.
[in]  conv_info  Contains padding and stride information described in PadStrideInfo.

Definition at line 329 of file CLDirectConvolutionLayerKernel.cpp.

References CLKernelLibrary::get().

Referenced by arm_compute::test::validation::TEST_CASE().

330 {
331  configure(CLKernelLibrary::get().get_compile_context(), input, weights, biases, output, conv_info);
332 }
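
The following is a minimal usage sketch, not taken from this reference page: the tensor shapes, the CLTensor/TensorInfo set-up and the allocation calls are illustrative assumptions, and the relevant Compute Library headers are assumed to be included. It shows how configure() might be called for a 3x3 F32 direct convolution with stride 1 and padding 1:

    // Hypothetical shapes: 32x32x16 input, 3x3x16x8 weights, 8 biases, 32x32x8 output.
    CLTensor src, weights, biases, dst;
    src.allocator()->init(TensorInfo(TensorShape(32U, 32U, 16U), 1, DataType::F32));
    weights.allocator()->init(TensorInfo(TensorShape(3U, 3U, 16U, 8U), 1, DataType::F32));
    biases.allocator()->init(TensorInfo(TensorShape(8U), 1, DataType::F32));
    dst.allocator()->init(TensorInfo(TensorShape(32U, 32U, 8U), 1, DataType::F32));

    CLDirectConvolutionLayerKernel conv_kernel;
    conv_kernel.configure(&src, &weights, &biases, &dst,
                          PadStrideInfo(1 /* stride_x */, 1 /* stride_y */, 1 /* pad_x */, 1 /* pad_y */));

    // The backing OpenCL buffers must be allocated before the kernel is run.
    src.allocator()->allocate();
    weights.allocator()->allocate();
    biases.allocator()->allocate();
    dst.allocator()->allocate();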

◆ configure() [2/2]

void configure ( const CLCompileContext &  compile_context,
const ICLTensor *  input,
const ICLTensor *  weights,
const ICLTensor *  biases,
ICLTensor *  output,
const PadStrideInfo &  conv_info 
)

Set the input, weights, biases and output tensors.

Note
DirectConvolution only works in the following configurations:
- 1x1 convolution with stride_x = 1/2/3, stride_y = 1/2/3
- 3x3 convolution with stride_x = 1/2, stride_y = 1/2
- 5x5 convolution with stride_x = 1/2, stride_y = 1/2
- 9x9 convolution with stride_x = 1/2, stride_y = 1/2
Parameters
[in]  compile_context  The compile context to be used.
[in]  input  The input tensor to convolve. The 3 lower dimensions represent a single input [width, height, IFM], while every optional dimension from 4 and above represents a batch of inputs. Data types supported: QASYMM8_SIGNED/QASYMM8/F16/F32.
[in]  weights  Weights tensor. Weights are a 4D tensor with dimensions [kernel_x, kernel_y, IFM, OFM]. The 3rd dimension must be the same as the input's 3rd dimension. Data type supported: Same as input.
[in]  biases  Biases tensor. Biases are a 1D tensor with dimension [OFM]. Data type supported: Should match the input data type, except for QASYMM8 and QASYMM8_SIGNED inputs, where biases must be of S32 type.
[out]  output  Output tensor. The 3rd dimension must be equal to the 4th dimension of the weights tensor. Data types supported: Same as input.
[in]  conv_info  Contains padding and stride information described in PadStrideInfo.

Definition at line 334 of file CLDirectConvolutionLayerKernel.cpp.

References CLDirectConvolutionLayerKernel::_biases, CLDirectConvolutionLayerKernel::_border_size, CLDirectConvolutionLayerKernel::_conv_info, CLDirectConvolutionLayerKernel::_conv_stride_x, CLDirectConvolutionLayerKernel::_conv_stride_y, CLDirectConvolutionLayerKernel::_data_layout, CLDirectConvolutionLayerKernel::_input, CLDirectConvolutionLayerKernel::_output, CLDirectConvolutionLayerKernel::_weights, CLBuildOptions::add_option(), CLBuildOptions::add_option_if(), arm_compute::adjust_vec_size(), ARM_COMPUTE_ERROR_ON_NULLPTR, ARM_COMPUTE_ERROR_THROW_ON, CLDirectConvolutionLayerKernel::border_size(), build_options, arm_compute::quantization::calculate_quantized_multiplier(), arm_compute::CHANNEL, arm_compute::test::validation::conv_info, arm_compute::create_kernel(), ITensorInfo::data_layout(), ITensorInfo::data_type(), ITensorInfo::dimension(), PixelValue::get(), arm_compute::get_cl_type_from_data_type(), arm_compute::get_data_layout_dimension_index(), arm_compute::get_data_size_from_data_type(), ICLKernel::get_target(), arm_compute::HEIGHT, ITensor::info(), arm_compute::test::validation::input, arm_compute::is_data_type_quantized(), kernel_name, arm_compute::lower_string(), arm_compute::NHWC, UniformQuantizationInfo::offset, CLBuildOptions::options(), PadStrideInfo::pad_left(), PadStrideInfo::pad_top(), ITensorInfo::padding(), ITensorInfo::quantization_info(), arm_compute::S32, UniformQuantizationInfo::scale, PadStrideInfo::stride(), arm_compute::string_from_data_layout(), arm_compute::string_from_data_type(), arm_compute::support::cpp11::to_string(), QuantizationInfo::uniform(), arm_compute::validate_arguments(), and arm_compute::WIDTH.

336 {
337  ARM_COMPUTE_ERROR_ON_NULLPTR(input, weights, output);
338 
339  // Perform validation
340  ARM_COMPUTE_ERROR_THROW_ON(validate_arguments(input->info(),
341  weights->info(),
342  (biases != nullptr) ? biases->info() : nullptr,
343  output->info(),
344  conv_info));
345 
346  _conv_stride_x = std::get<0>(conv_info.stride());
347  _conv_stride_y = std::get<1>(conv_info.stride());
348  _data_layout = input->info()->data_layout();
349  _input = input;
350  _weights = weights;
351  _output = output;
352  _biases = biases;
353  _conv_info = conv_info;
354 
355  const unsigned int width_idx = get_data_layout_dimension_index(_data_layout, DataLayoutDimension::WIDTH);
356  const unsigned int height_idx = get_data_layout_dimension_index(_data_layout, DataLayoutDimension::HEIGHT);
357  const unsigned int channel_idx = get_data_layout_dimension_index(_data_layout, DataLayoutDimension::CHANNEL);
358  const unsigned int kernel_size = weights->info()->dimension(width_idx);
359  const DataType data_type = input->info()->data_type();
360 
361  const GPUTarget gpu_target = get_target();
362 
363  // Configure kernel window
364  auto win_config = validate_and_configure_window(input->info(), weights->info(), output->info(), conv_info, gpu_target);
365  ARM_COMPUTE_ERROR_THROW_ON(win_config.first);
366  ICLKernel::configure_internal(win_config.second);
367 
368  std::stringstream kernel_name;
369  CLBuildOptions build_options;
370 
371  if(_data_layout == DataLayout::NHWC)
372  {
373  _border_size = BorderSize();
374 
375  kernel_name << "direct_convolution_nhwc";
376 
377  const unsigned int n0 = win_config.second.x().step();
378  const unsigned int m0 = win_config.second.y().step();
379  const unsigned int k0 = adjust_vec_size(16u, _input->info()->dimension(channel_idx));
380  const unsigned int partial_store_n0 = _output->info()->dimension(channel_idx) % n0;
381  const unsigned int partial_store_m0 = (_output->info()->dimension(width_idx) * _output->info()->dimension(height_idx)) % m0;
382  const unsigned int pad_left = conv_info.pad_left();
383  const unsigned int pad_top = conv_info.pad_top();
384 
385  if(_biases != nullptr)
386  {
387  build_options.add_option(std::string("-DHAS_BIAS"));
388  build_options.add_option(std::string("-DBIA_DATA_TYPE=" + get_cl_type_from_data_type(_biases->info()->data_type())));
389  }
390  build_options.add_option("-DSRC_WIDTH=" + support::cpp11::to_string(_input->info()->dimension(width_idx)));
391  build_options.add_option("-DSRC_HEIGHT=" + support::cpp11::to_string(_input->info()->dimension(height_idx)));
392  build_options.add_option("-DSRC_CHANNELS=" + support::cpp11::to_string(_input->info()->dimension(channel_idx)));
393  build_options.add_option("-DSRC_DATA_TYPE=" + get_cl_type_from_data_type(_input->info()->data_type()));
394  build_options.add_option("-DDST_WIDTH=" + support::cpp11::to_string(_output->info()->dimension(width_idx)));
395  build_options.add_option("-DDST_HEIGHT=" + support::cpp11::to_string(_output->info()->dimension(height_idx)));
396  build_options.add_option("-DDST_CHANNELS=" + support::cpp11::to_string(_output->info()->dimension(channel_idx)));
397  build_options.add_option("-DDST_DATA_TYPE=" + get_cl_type_from_data_type(_output->info()->data_type()));
398  build_options.add_option("-DWEI_WIDTH=" + support::cpp11::to_string(_weights->info()->dimension(width_idx)));
399  build_options.add_option("-DWEI_HEIGHT=" + support::cpp11::to_string(_weights->info()->dimension(height_idx)));
400  build_options.add_option("-DWEI_DATA_TYPE=" + get_cl_type_from_data_type(_weights->info()->data_type()));
401  build_options.add_option("-DSTRIDE_X=" + support::cpp11::to_string(_conv_stride_x));
402  build_options.add_option("-DSTRIDE_Y=" + support::cpp11::to_string(_conv_stride_y));
403  build_options.add_option("-DPAD_LEFT=" + support::cpp11::to_string(pad_left));
404  build_options.add_option("-DPAD_TOP=" + support::cpp11::to_string(pad_top));
405  build_options.add_option("-DN0=" + support::cpp11::to_string(n0));
406  build_options.add_option("-DM0=" + support::cpp11::to_string(m0));
407  build_options.add_option("-DK0=" + support::cpp11::to_string(k0));
408  build_options.add_option("-DPARTIAL_STORE_N0=" + support::cpp11::to_string(partial_store_n0));
409  build_options.add_option("-DPARTIAL_STORE_M0=" + support::cpp11::to_string(partial_store_m0));
410 
411  if(is_data_type_quantized(data_type))
412  {
413  const UniformQuantizationInfo iqinfo = _input->info()->quantization_info().uniform();
414  const UniformQuantizationInfo wqinfo = _weights->info()->quantization_info().uniform();
415  const UniformQuantizationInfo oqinfo = _output->info()->quantization_info().uniform();
416 
417  PixelValue zero_value = PixelValue(0, input->info()->data_type(), input->info()->quantization_info());
418  int zero_value_s32;
419  zero_value.get(zero_value_s32);
420 
421  float multiplier = iqinfo.scale * wqinfo.scale / oqinfo.scale;
422  int output_multiplier = 0;
423  int output_shift = 0;
424  quantization::calculate_quantized_multiplier(multiplier, &output_multiplier, &output_shift);
425  build_options.add_option("-DIS_QUANTIZED");
426  build_options.add_option("-DDST_MULTIPLIER=" + support::cpp11::to_string(output_multiplier));
427  build_options.add_option("-DDST_SHIFT=" + support::cpp11::to_string(output_shift));
428  build_options.add_option("-DSRC_OFFSET=" + support::cpp11::to_string(-iqinfo.offset));
429  build_options.add_option("-DWEI_OFFSET=" + support::cpp11::to_string(-wqinfo.offset));
430  build_options.add_option("-DDST_OFFSET=" + support::cpp11::to_string(oqinfo.offset));
431  build_options.add_option("-DZERO_VALUE=" + support::cpp11::to_string(zero_value_s32));
432  build_options.add_option("-DACC_DATA_TYPE=" + get_cl_type_from_data_type(DataType::S32));
433  }
434  else
435  {
436  build_options.add_option("-DACC_DATA_TYPE=" + get_cl_type_from_data_type(data_type));
437  build_options.add_option("-DSRC_OFFSET=" + support::cpp11::to_string(0));
438  build_options.add_option("-DWEI_OFFSET=" + support::cpp11::to_string(0));
439  build_options.add_option("-DDST_OFFSET=" + support::cpp11::to_string(0));
440  }
441  }
442  else
443  {
444  _border_size = BorderSize(_input->info()->padding());
445 
446  kernel_name << "direct_convolution" << kernel_size << "x" << kernel_size;
447 
448  build_options.add_option_if(_biases != nullptr, std::string("-DHAS_BIAS"));
449 
450  const bool run_optimized_for_bifrost = can_run_optimized_kernel_for_bifrost_nchw(gpu_target, _conv_stride_x, _conv_stride_y, kernel_size, data_type, _data_layout);
451 
452  if(run_optimized_for_bifrost)
453  {
454  build_options.add_option(std::string("-DWEIGHTS_DEPTH=" + support::cpp11::to_string(_weights->info()->dimension(channel_idx))));
455 
456  kernel_name << "_f32_bifrost";
457  }
458  else
459  {
460  build_options.add_option(std::string("-DDATA_TYPE=" + get_cl_type_from_data_type(data_type)));
461  build_options.add_option(std::string("-DDATA_SIZE=" + get_data_size_from_data_type(data_type)));
462  build_options.add_option(std::string("-DWEIGHTS_DEPTH=" + support::cpp11::to_string(_weights->info()->dimension(channel_idx))));
463  build_options.add_option(std::string("-DSTRIDE_X=" + support::cpp11::to_string(_conv_stride_x)));
464  build_options.add_option(std::string("-DDATA_TYPE_PROMOTED=" + get_cl_type_from_data_type(data_type)));
465 
466  if(is_data_type_quantized(data_type))
467  {
468  const UniformQuantizationInfo iqinfo = _input->info()->quantization_info().uniform();
469  const UniformQuantizationInfo wqinfo = _weights->info()->quantization_info().uniform();
470  const UniformQuantizationInfo oqinfo = _output->info()->quantization_info().uniform();
471 
472  float multiplier = iqinfo.scale * wqinfo.scale / oqinfo.scale;
473  int output_multiplier = 0;
474  int output_shift = 0;
475  quantization::calculate_quantized_multiplier(multiplier, &output_multiplier, &output_shift);
476  build_options.add_option("-DOUTPUT_MULTIPLIER=" + support::cpp11::to_string(output_multiplier));
477  build_options.add_option("-DOUTPUT_SHIFT=" + support::cpp11::to_string(output_shift));
478  build_options.add_option("-DKERNEL_SIZE=" + support::cpp11::to_string(kernel_size));
479  build_options.add_option("-DINPUT_OFFSET=" + support::cpp11::to_string(-iqinfo.offset));
480  build_options.add_option("-DWEIGHTS_OFFSET=" + support::cpp11::to_string(-wqinfo.offset));
481  build_options.add_option("-DOUTPUT_OFFSET=" + support::cpp11::to_string(oqinfo.offset));
482 
483  kernel_name.str("direct_convolution_quantized");
484  }
485  }
486  }
487 
488  _kernel = create_kernel(compile_context, kernel_name.str(), build_options.options());
489 
490  // Set config_id for enabling LWS tuning
491  _config_id = kernel_name.str();
492  _config_id += "_";
493  _config_id += lower_string(string_from_data_type(data_type));
494  _config_id += "_";
495  _config_id += support::cpp11::to_string(kernel_size);
496  _config_id += "_";
497  _config_id += support::cpp11::to_string(border_size().left);
498  _config_id += "_";
499  _config_id += support::cpp11::to_string(border_size().top);
500  _config_id += "_";
501  _config_id += support::cpp11::to_string(border_size().right);
502  _config_id += "_";
503  _config_id += support::cpp11::to_string(border_size().bottom);
504  _config_id += "_";
505  _config_id += support::cpp11::to_string(_conv_stride_x);
506  _config_id += "_";
507  _config_id += support::cpp11::to_string(_conv_stride_y);
508  _config_id += "_";
509  _config_id += support::cpp11::to_string(output->info()->dimension(width_idx));
510  _config_id += "_";
511  _config_id += support::cpp11::to_string(output->info()->dimension(height_idx));
512  _config_id += "_";
513  _config_id += lower_string(string_from_data_layout(_data_layout));
514 }

◆ operator=() [1/2]

Prevent instances of this class from being copied (As this class contains pointers)

◆ operator=() [2/2]

Allow instances of this class to be moved.

◆ run()

void run ( const Window &  window,
cl::CommandQueue &  queue 
)
override virtual

Enqueue the OpenCL kernel to process the given window on the passed OpenCL command queue.

Note
The queue is not flushed by this method, and therefore the kernel will not have been executed by the time this method returns.
Parameters
[in]  window  Region on which to execute the kernel (must be a valid region of the window returned by window()).
[in,out]  queue  Command queue on which to enqueue the kernel.

Reimplemented from ICLKernel.

Definition at line 525 of file CLDirectConvolutionLayerKernel.cpp.

References CLDirectConvolutionLayerKernel::_biases, CLDirectConvolutionLayerKernel::_conv_info, CLDirectConvolutionLayerKernel::_conv_stride_x, CLDirectConvolutionLayerKernel::_conv_stride_y, CLDirectConvolutionLayerKernel::_data_layout, CLDirectConvolutionLayerKernel::_input, CLDirectConvolutionLayerKernel::_output, CLDirectConvolutionLayerKernel::_weights, ICLKernel::add_1D_tensor_argument(), ICLKernel::add_3D_tensor_argument(), Window::adjust(), ARM_COMPUTE_ERROR_ON_INVALID_SUBWINDOW, ARM_COMPUTE_ERROR_ON_UNCONFIGURED_KERNEL, ITensorInfo::dimension(), Window::DimX, Window::DimY, Window::DimZ, arm_compute::enqueue(), Window::first_slice_window_3D(), arm_compute::get_data_layout_dimension_index(), arm_compute::HEIGHT, ITensor::info(), ICLKernel::lws_hint(), arm_compute::NHWC, ICLKernel::num_arguments_per_3D_tensor(), PadStrideInfo::pad_left(), PadStrideInfo::pad_top(), Window::set(), Window::set_dimension_step(), arm_compute::test::validation::reference::slice(), Window::slide_window_slice_3D(), arm_compute::cpu::step, ITensorInfo::strides_in_bytes(), ITensorInfo::tensor_shape(), Window::use_tensor_dimensions(), arm_compute::WIDTH, and IKernel::window().

526 {
527  ARM_COMPUTE_ERROR_ON_UNCONFIGURED_KERNEL(this);
528  ARM_COMPUTE_ERROR_ON_INVALID_SUBWINDOW(IKernel::window(), window);
529 
530  // Get initial windows
531  Window slice = window.first_slice_window_3D();
532 
533  if(_data_layout == DataLayout::NHWC)
534  {
535  slice.set(Window::DimY, Window::Dimension(0, _output->info()->dimension(1) * _output->info()->dimension(2), 1));
536  slice.set(Window::DimZ, Window::Dimension(0, _output->info()->dimension(3), 1));
537 
538  unsigned int idx = 0;
539  add_3D_tensor_argument(idx, _input, slice);
540  add_3D_tensor_argument(idx, _output, slice);
541  add_3D_tensor_argument(idx, _weights, slice);
542  if(_biases != nullptr)
543  {
544  add_1D_tensor_argument(idx, _biases, slice);
545  }
546  _kernel.setArg(idx++, static_cast<unsigned int>(_weights->info()->strides_in_bytes()[3]));
547  enqueue(queue, *this, slice, lws_hint());
548  }
549  else
550  {
551  Window win_in = window;
552 
553  win_in.adjust(Window::DimX, -_conv_info.pad_left(), true);
554  win_in.adjust(Window::DimY, -_conv_info.pad_top(), true);
555 
556  const unsigned int width_idx  = get_data_layout_dimension_index(_data_layout, DataLayoutDimension::WIDTH);
557  const unsigned int height_idx = get_data_layout_dimension_index(_data_layout, DataLayoutDimension::HEIGHT);
558 
559  win_in.set_dimension_step(width_idx, window[width_idx].step() * _conv_stride_x);
560  win_in.set_dimension_step(height_idx, window[height_idx].step() * _conv_stride_y);
561 
562  Window slice_in = win_in.first_slice_window_3D();
563  unsigned int idx1 = 2 * num_arguments_per_3D_tensor();
564  add_3D_tensor_argument(idx1, _weights, slice);
565 
566  if(_biases != nullptr)
567  {
568  Window slice_biases;
569  slice_biases.use_tensor_dimensions(_biases->info()->tensor_shape());
570  add_1D_tensor_argument(idx1, _biases, slice_biases);
571  }
572 
573  _kernel.setArg(idx1++, static_cast<unsigned int>(_weights->info()->strides_in_bytes()[3]));
574 
575  do
576  {
577  unsigned int idx = 0;
578  add_3D_tensor_argument(idx, _input, slice_in);
579  add_3D_tensor_argument(idx, _output, slice);
580  enqueue(queue, *this, slice, lws_hint());
581  }
582  while(window.slide_window_slice_3D(slice) && win_in.slide_window_slice_3D(slice_in));
583  }
584 }
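
In application code the kernel is usually not driven by calling run() directly; the sketch below shows the more common dispatch path through CLScheduler (assumed typical Compute Library usage rather than anything documented on this page; conv_kernel is the hypothetical kernel configured in the earlier sketch):

    CLScheduler::get().default_init();        // create the OpenCL context, queue and kernel library once
    CLScheduler::get().enqueue(conv_kernel);  // internally runs the kernel over its configured window
    CLScheduler::get().sync();                // block until the command queue has finished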

◆ validate()

Status validate ( const ITensorInfo *  input,
const ITensorInfo *  weights,
const ITensorInfo *  biases,
const ITensorInfo *  output,
const PadStrideInfo &  conv_info,
const GPUTarget  target 
)
static

Static function to check if given info will lead to a valid configuration of CLDirectConvolutionLayerKernel.

Parameters
[in]  input  The input tensor to convolve. The 3 lower dimensions represent a single input [width, height, IFM], while every optional dimension from 4 and above represents a batch of inputs. Data types supported: QASYMM8_SIGNED/QASYMM8/F16/F32.
[in]  weights  Weights tensor. Weights are a 4D tensor with dimensions [kernel_x, kernel_y, IFM, OFM]. The 3rd dimension must be the same as the input's 3rd dimension. Data type supported: Same as input.
[in]  biases  Biases tensor. Biases are a 1D tensor with dimension [OFM]. Data type supported: Should match the input data type, except for QASYMM8 and QASYMM8_SIGNED inputs, where biases must be of S32 type.
[in]  output  Output tensor. The 3rd dimension must be equal to the 4th dimension of the weights tensor. Data types supported: Same as input.
[in]  conv_info  Contains padding and stride information described in PadStrideInfo.
[in]  target  Target GPU architecture.
Returns
a status

Definition at line 516 of file CLDirectConvolutionLayerKernel.cpp.

References ARM_COMPUTE_RETURN_ON_ERROR, ICloneable< T >::clone(), arm_compute::test::validation::conv_info, and arm_compute::validate_arguments().

Referenced by CLDirectConvolutionLayer::validate().

518 {
519  ARM_COMPUTE_RETURN_ON_ERROR(validate_arguments(input, weights, biases, output, conv_info, target));
520  ARM_COMPUTE_RETURN_ON_ERROR(validate_and_configure_window(input->clone().get(), weights->clone().get(), output->clone().get(), conv_info, target).first);
521 
522  return Status{};
523 }
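
A brief sketch of checking a configuration up front with validate(), which needs only ITensorInfo metadata; the shapes, the GPUTarget value and the error-handling style below are illustrative assumptions, not examples taken from this page:

    const TensorInfo src_info(TensorShape(32U, 32U, 16U), 1, DataType::F32);
    const TensorInfo wei_info(TensorShape(3U, 3U, 16U, 8U), 1, DataType::F32);
    const TensorInfo bia_info(TensorShape(8U), 1, DataType::F32);
    const TensorInfo dst_info(TensorShape(32U, 32U, 8U), 1, DataType::F32);

    const Status status = CLDirectConvolutionLayerKernel::validate(&src_info, &wei_info, &bia_info, &dst_info,
                                                                   PadStrideInfo(1, 1, 1, 1), GPUTarget::MIDGARD);
    if(status.error_code() != ErrorCode::OK)
    {
        std::cout << "Configuration rejected: " << status.error_description() << std::endl;
    }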

Field Documentation

◆ _biases

◆ _border_size

◆ _conv_info

◆ _conv_stride_x

◆ _conv_stride_y

◆ _data_layout

◆ _input

◆ _output

◆ _weights


The documentation for this class was generated from the following files:

CLDirectConvolutionLayerKernel.h
CLDirectConvolutionLayerKernel.cpp