Compute Library
 19.08
CLDirectConvolutionLayerKernel Class Reference

Interface for the direct convolution kernel.

#include <CLDirectConvolutionLayerKernel.h>


Public Member Functions

 CLDirectConvolutionLayerKernel ()
 Default constructor.

 CLDirectConvolutionLayerKernel (const CLDirectConvolutionLayerKernel &)=delete
 Prevent instances of this class from being copied (as this class contains pointers).

CLDirectConvolutionLayerKernel & operator= (const CLDirectConvolutionLayerKernel &)=delete
 Prevent instances of this class from being copied (as this class contains pointers).

 CLDirectConvolutionLayerKernel (CLDirectConvolutionLayerKernel &&)=default
 Allow instances of this class to be moved.

CLDirectConvolutionLayerKernel & operator= (CLDirectConvolutionLayerKernel &&)=default
 Allow instances of this class to be moved.

 ~CLDirectConvolutionLayerKernel ()=default
 Default destructor.

void configure (const ICLTensor *input, const ICLTensor *weights, const ICLTensor *biases, ICLTensor *output, const PadStrideInfo &conv_info)
 Set the input, weights, biases and output tensors.

void run (const Window &window, cl::CommandQueue &queue) override
 Enqueue the OpenCL kernel to process the given window on the passed OpenCL command queue.

BorderSize border_size () const override
 The size of the border for this kernel.
 
Public Member Functions inherited from ICLKernel
 ICLKernel ()
 Constructor.

cl::Kernel & kernel ()
 Returns a reference to the OpenCL kernel of this object.

template<typename T >
void add_1D_array_argument (unsigned int &idx, const ICLArray< T > *array, const Strides &strides, unsigned int num_dimensions, const Window &window)
 Add the passed 1D array's parameters to the object's kernel's arguments starting from the index idx.

void add_1D_tensor_argument (unsigned int &idx, const ICLTensor *tensor, const Window &window)
 Add the passed 1D tensor's parameters to the object's kernel's arguments starting from the index idx.

void add_1D_tensor_argument_if (bool cond, unsigned int &idx, const ICLTensor *tensor, const Window &window)
 Add the passed 1D tensor's parameters to the object's kernel's arguments starting from the index idx if the condition is true.

void add_2D_tensor_argument (unsigned int &idx, const ICLTensor *tensor, const Window &window)
 Add the passed 2D tensor's parameters to the object's kernel's arguments starting from the index idx.

void add_2D_tensor_argument_if (bool cond, unsigned int &idx, const ICLTensor *tensor, const Window &window)
 Add the passed 2D tensor's parameters to the object's kernel's arguments starting from the index idx if the condition is true.

void add_3D_tensor_argument (unsigned int &idx, const ICLTensor *tensor, const Window &window)
 Add the passed 3D tensor's parameters to the object's kernel's arguments starting from the index idx.

void add_4D_tensor_argument (unsigned int &idx, const ICLTensor *tensor, const Window &window)
 Add the passed 4D tensor's parameters to the object's kernel's arguments starting from the index idx.

template<typename T >
void add_argument (unsigned int &idx, T value)
 Add the passed parameters to the object's kernel's arguments starting from the index idx.

void set_lws_hint (const cl::NDRange &lws_hint)
 Set the Local-Workgroup-Size hint.

cl::NDRange lws_hint () const
 Return the Local-Workgroup-Size hint.

const std::string & config_id () const
 Get the configuration ID.

void set_target (GPUTarget target)
 Set the targeted GPU architecture.

void set_target (cl::Device &device)
 Set the targeted GPU architecture according to the CL device.

GPUTarget get_target () const
 Get the targeted GPU architecture.

size_t get_max_workgroup_size ()
 Get the maximum workgroup size for the device the CLKernelLibrary uses.

template<typename T , unsigned int dimension_size>
void add_array_argument (unsigned &idx, const ICLArray< T > *array, const Strides &strides, unsigned int num_dimensions, const Window &window)
 Add the passed array's parameters to the object's kernel's arguments starting from the index idx.

template<unsigned int dimension_size>
void add_tensor_argument (unsigned &idx, const ICLTensor *tensor, const Window &window)
 
Public Member Functions inherited from IKernel
 IKernel ()
 Constructor.

virtual ~IKernel ()=default
 Destructor.

virtual bool is_parallelisable () const
 Indicates whether or not the kernel is parallelisable.

const Window & window () const
 The maximum window the kernel can be executed on.
 

Static Public Member Functions

static Status validate (const ITensorInfo *input, const ITensorInfo *weights, const ITensorInfo *biases, const ITensorInfo *output, const PadStrideInfo &conv_info, const GPUTarget target)
 Static function to check if the given info will lead to a valid configuration of CLDirectConvolutionLayerKernel.

Static Public Member Functions inherited from ICLKernel
static constexpr unsigned int num_arguments_per_1D_array ()
 Returns the number of arguments enqueued per 1D array object.

static constexpr unsigned int num_arguments_per_1D_tensor ()
 Returns the number of arguments enqueued per 1D tensor object.

static constexpr unsigned int num_arguments_per_2D_tensor ()
 Returns the number of arguments enqueued per 2D tensor object.

static constexpr unsigned int num_arguments_per_3D_tensor ()
 Returns the number of arguments enqueued per 3D tensor object.

static constexpr unsigned int num_arguments_per_4D_tensor ()
 Returns the number of arguments enqueued per 4D tensor object.

static cl::NDRange gws_from_window (const Window &window)
 Get the global work size given an execution window.
 

Data Fields

const ICLTensor * _input

const ICLTensor * _biases

const ICLTensor * _weights

ICLTensor * _output
 
BorderSize _border_size
 
int _conv_stride_x
 
int _conv_stride_y
 

Detailed Description

Interface for the direct convolution kernel.

Definition at line 36 of file CLDirectConvolutionLayerKernel.h.

Constructor & Destructor Documentation

◆ CLDirectConvolutionLayerKernel() [1/3]

◆ CLDirectConvolutionLayerKernel() [2/3]

Prevent instances of this class from being copied (as this class contains pointers).

◆ CLDirectConvolutionLayerKernel() [3/3]

Allow instances of this class to be moved.

◆ ~CLDirectConvolutionLayerKernel()

Default destructor.
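
The copy and move rules listed above can be illustrated with a small stand-alone sketch. `DummyTensor` and `PointerHoldingKernel` are hypothetical names invented for this example and are not part of Compute Library; the sketch only shows the general C++ pattern of deleting copy operations while defaulting move operations for a class that stores non-owning pointers.

```cpp
#include <type_traits>
#include <utility>

// Hypothetical stand-in for a tensor type; not a Compute Library class.
struct DummyTensor
{
    int id;
};

// Hypothetical kernel-like class that stores a non-owning pointer.
class PointerHoldingKernel
{
public:
    PointerHoldingKernel() = default;
    // Copying is disabled, as in the class documented here.
    PointerHoldingKernel(const PointerHoldingKernel &) = delete;
    PointerHoldingKernel &operator=(const PointerHoldingKernel &) = delete;
    // Moving is allowed; a defaulted move simply copies the raw pointer value.
    PointerHoldingKernel(PointerHoldingKernel &&) = default;
    PointerHoldingKernel &operator=(PointerHoldingKernel &&) = default;
    ~PointerHoldingKernel() = default;

    void set_input(const DummyTensor *input) { _input = input; }
    const DummyTensor *input() const { return _input; }

private:
    const DummyTensor *_input{ nullptr };
};

// Compile-time checks of the copy/move properties.
static_assert(!std::is_copy_constructible<PointerHoldingKernel>::value, "copy construction is disabled");
static_assert(!std::is_copy_assignable<PointerHoldingKernel>::value, "copy assignment is disabled");
static_assert(std::is_move_constructible<PointerHoldingKernel>::value, "move construction is allowed");
static_assert(std::is_move_assignable<PointerHoldingKernel>::value, "move assignment is allowed");
```

Because the pointer is non-owning, the pointed-to tensor must outlive every kernel object that refers to it, whether or not that object has been moved.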

Member Function Documentation

◆ border_size()

BorderSize border_size () const
override virtual

The size of the border for this kernel.

Returns
The width in number of elements of the border.

Reimplemented from IKernel.

Definition at line 384 of file CLDirectConvolutionLayerKernel.cpp.

References CLDirectConvolutionLayerKernel::_border_size.

Referenced by CLDirectConvolutionLayer::configure(), and CLDirectConvolutionLayerKernel::configure().

◆ configure()

void configure (const ICLTensor *input,
                const ICLTensor *weights,
                const ICLTensor *biases,
                ICLTensor *output,
                const PadStrideInfo &conv_info
               )

Set the input, weights, biases and output tensors.

Note
DirectConvolution only works in the following configurations:
  1x1 convolution with stride_x = 1/2/3, stride_y = 1/2/3
  3x3 convolution with stride_x = 1/2, stride_y = 1/2
  5x5 convolution with stride_x = 1/2, stride_y = 1/2
  9x9 convolution with stride_x = 1/2, stride_y = 1/2, data_layout=NHWC
Parameters
[in] input: The input tensor to convolve. The 3 lower dimensions represent a single input [width, height, IFM], while every optional dimension from the 4th onwards represents a batch of inputs. Data types supported: QASYMM8/F16/F32.
[in] weights: Weights tensor. Weights are a 4D tensor with dimensions [kernel_x, kernel_y, IFM, OFM]. The 3rd dimension must match the 3rd dimension (IFM) of the input. Data type supported: Same as input.
[in] biases: Biases tensor. Biases are a 1D tensor with dimension [OFM]. Data type supported: Same as input, except for QASYMM8 input, where biases must be of type S32.
[out] output: Output tensor. The 3rd dimension must be equal to the 4th dimension (OFM) of the weights tensor. Data types supported: Same as input.
[in] conv_info: Contains padding and stride information described in PadStrideInfo.
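
The supported configurations in the note can be restated as a predicate. The following is a minimal sketch that mirrors only that note; `Layout` and `is_documented_configuration` are hypothetical names made up for this example, and the library's own validation may apply further checks (data types, shapes, GPU target) that are not modelled here.

```cpp
// Hypothetical layout tag for this sketch; not the library's DataLayout enum.
enum class Layout
{
    NCHW,
    NHWC
};

// Returns true when (kernel_size, stride_x, stride_y, layout) matches one of the
// configurations listed in the note for configure().
bool is_documented_configuration(unsigned int kernel_size, unsigned int stride_x, unsigned int stride_y, Layout layout)
{
    // A stride is acceptable when it lies in [1, max_stride].
    auto in_range = [](unsigned int stride, unsigned int max_stride)
    {
        return stride >= 1 && stride <= max_stride;
    };

    switch(kernel_size)
    {
        case 1: // 1x1: strides 1, 2 or 3
            return in_range(stride_x, 3) && in_range(stride_y, 3);
        case 3: // 3x3 and 5x5: strides 1 or 2
        case 5:
            return in_range(stride_x, 2) && in_range(stride_y, 2);
        case 9: // 9x9: strides 1 or 2, NHWC only
            return layout == Layout::NHWC && in_range(stride_x, 2) && in_range(stride_y, 2);
        default:
            return false;
    }
}
```

For example, `is_documented_configuration(9, 1, 1, Layout::NCHW)` is false under this reading, because the note lists the 9x9 case only with data_layout=NHWC.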

Definition at line 389 of file CLDirectConvolutionLayerKernel.cpp.

390 {
391  ARM_COMPUTE_ERROR_ON_NULLPTR(input, weights, output);
392 
393  const DataLayout data_layout = input->info()->data_layout();
397 
398  const unsigned int kernel_size = weights->info()->dimension(width_idx);
399  const DataType data_type = input->info()->data_type();
400 
401  // Get convolved dimensions
403 
404  // Output auto initialization if not yet initialized
405  // TODO(COMPMID-2078): input->clone()->set_tensor_shape(output_shape) doesn't work with subtensors for grouped direct convolutions (AlexNet).
406  auto_init_if_empty(*output->info(),
407  output_shape,
408  1,
409  input->info()->data_type(),
410  input->info()->quantization_info());
411 
412  // Perform validation step
413  ARM_COMPUTE_ERROR_THROW_ON(validate_arguments(input->info(),
414  weights->info(),
415  (biases != nullptr) ? biases->info() : nullptr,
416  output->info(),
417  conv_info));
418 
419  _conv_stride_x = std::get<0>(conv_info.stride());
420  _conv_stride_y = std::get<1>(conv_info.stride());
421 
423  {
424  _border_size = BorderSize(conv_info.pad_left(), 0, conv_info.pad_right(), 0);
425  }
426  else if(data_layout == DataLayout::NCHW)
427  {
428  _border_size = BorderSize(conv_info.pad_top(), conv_info.pad_right(), conv_info.pad_bottom(), conv_info.pad_left());
429  }
430  else
431  {
432  ARM_COMPUTE_ERROR("Not supported");
433  }
434 
435  _input = input;
436  _weights = weights;
437  _output = output;
438  _biases = biases;
439 
440  const GPUTarget gpu_target = get_target();
441 
442  std::stringstream kernel_name;
443  kernel_name << "direct_convolution" << kernel_size << "x" << kernel_size;
445  {
446  kernel_name << "_" << lower_string(string_from_data_layout(data_layout));
447  }
448 
449  CLBuildOptions build_options;
450  build_options.add_option_if(_biases != nullptr, std::string("-DHAS_BIAS"));
451 
452  const bool run_optimized_for_bifrost = can_run_optimized_kernel_for_bifrost(gpu_target, _conv_stride_x, _conv_stride_y, kernel_size, data_type, data_layout);
453 
454  if(run_optimized_for_bifrost)
455  {
456  build_options.add_option(std::string("-DWEIGHTS_DEPTH=" + support::cpp11::to_string(_weights->info()->dimension(channel_idx))));
457 
458  kernel_name << "_f32_bifrost";
459  _kernel = static_cast<cl::Kernel>(CLKernelLibrary::get().create_kernel(kernel_name.str(), build_options.options()));
460  }
461  else
462  {
463  const bool is_quantized_asymm = is_data_type_quantized_asymmetric(data_type);
464  build_options.add_option_if(is_quantized_asymm, std::string("-DKERNEL_SIZE=" + support::cpp11::to_string(kernel_size)));
465  build_options.add_option(std::string("-DDATA_TYPE=" + get_cl_type_from_data_type(data_type)));
466  build_options.add_option(std::string("-DDATA_SIZE=" + get_data_size_from_data_type(data_type)));
467  build_options.add_option(std::string("-DWEIGHTS_DEPTH=" + support::cpp11::to_string(_weights->info()->dimension(channel_idx))));
468  build_options.add_option(std::string("-DSTRIDE_X=" + support::cpp11::to_string(_conv_stride_x)));
470  {
471  const bool run_optimized_for_bifrost_nhwc = can_run_optimized_kernel_for_bifrost_nhwc(gpu_target, _conv_stride_x, _conv_stride_y, kernel_size, data_type, data_layout);
472  build_options.add_option(std::string("-DDATA_LAYOUT_NHWC=1"));
473  build_options.add_option(std::string("-DDST_HEIGHT=" + support::cpp11::to_string(_output->info()->dimension(height_idx))));
474  build_options.add_option(std::string("-DDST_WIDTH=" + support::cpp11::to_string(_output->info()->dimension(width_idx))));
475  build_options.add_option(std::string("-DSRC_HEIGHT=" + support::cpp11::to_string(_input->info()->dimension(height_idx))));
476  build_options.add_option(std::string("-DSRC_WIDTH=" + support::cpp11::to_string(_input->info()->dimension(width_idx))));
477  build_options.add_option(std::string("-DPAD_LEFT=" + support::cpp11::to_string(conv_info.pad_left())));
478  build_options.add_option(std::string("-DPAD_TOP=" + support::cpp11::to_string(conv_info.pad_top())));
479  build_options.add_option(std::string("-DSTRIDE_Y=" + support::cpp11::to_string(_conv_stride_y)));
480  if(run_optimized_for_bifrost_nhwc)
481  {
482  const unsigned int num_elems_read_per_iteration_x = 4;
483  _border_size.right = num_elems_read_per_iteration_x;
484  build_options.add_option("-DVEC_SIZE=" + support::cpp11::to_string(num_elems_read_per_iteration_x));
485  }
486  }
487  build_options.add_option(std::string("-DDATA_TYPE_PROMOTED=" + get_cl_type_from_data_type(data_type)));
488  // Create kernel
489  _kernel = static_cast<cl::Kernel>(CLKernelLibrary::get().create_kernel(is_quantized_asymm ? "direct_convolution_1x1_3x3_5x5_quantized" : kernel_name.str(),
490  build_options.options()));
491  }
492 
493  // Configure kernel window
494  auto win_config = validate_and_configure_window(input->info(), weights->info(), output->info(), conv_info, gpu_target);
495  ARM_COMPUTE_ERROR_THROW_ON(win_config.first);
496  ICLKernel::configure_internal(win_config.second);
497 
498  // Set static kernel arguments
500  {
504 
505  int output_multiplier = 0;
506  int output_shift = 0;
507 
508  float multiplier = iqinfo.scale * wqinfo.scale / oqinfo.scale;
509  ARM_COMPUTE_THROW_ON_ERROR(quantization::calculate_quantized_multiplier_less_than_one(multiplier, &output_multiplier, &output_shift));
510 
511  unsigned int idx = 3 * num_arguments_per_3D_tensor() + ((_biases != nullptr) ? num_arguments_per_1D_tensor() : 0) + 1;
512  _kernel.setArg(idx++, -iqinfo.offset);
513  _kernel.setArg(idx++, -wqinfo.offset);
514  _kernel.setArg(idx++, oqinfo.offset);
515  _kernel.setArg(idx++, output_multiplier);
516  _kernel.setArg(idx++, output_shift);
517  }
518 
519  // Set config_id for enabling LWS tuning
520  _config_id = "direct_convolution_";
522  _config_id += "_";
523  _config_id += support::cpp11::to_string(kernel_size);
524  _config_id += "_";
525  _config_id += support::cpp11::to_string(border_size().left);
526  _config_id += "_";
527  _config_id += support::cpp11::to_string(border_size().top);
528  _config_id += "_";
529  _config_id += support::cpp11::to_string(border_size().right);
530  _config_id += "_";
531  _config_id += support::cpp11::to_string(border_size().bottom);
532  _config_id += "_";
534  _config_id += "_";
536  _config_id += "_";
537  _config_id += support::cpp11::to_string(output->info()->dimension(width_idx));
538  _config_id += "_";
539  _config_id += support::cpp11::to_string(output->info()->dimension(height_idx));
540  _config_id += "_";
542 }
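
The kernel-name selection in the listing above can be summarised in a stand-alone sketch. `direct_conv_kernel_name` is a hypothetical helper written for this example. The condition guarding the layout suffix is not visible in the listing, so treating it as "layout is NHWC" is an assumption, as is the lowercase suffix text `nhwc`.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper mirroring the name-building steps of configure().
// Assumption: the layout suffix is appended only for NHWC and reads "_nhwc".
std::string direct_conv_kernel_name(unsigned int kernel_size, bool is_nhwc, bool bifrost_optimized, bool is_quantized_asymm)
{
    std::stringstream name;
    name << "direct_convolution" << kernel_size << "x" << kernel_size;
    if(is_nhwc)
    {
        name << "_nhwc";
    }

    if(bifrost_optimized)
    {
        // Optimized path: the suffix is appended and the name is used directly.
        name << "_f32_bifrost";
        return name.str();
    }

    // Generic path: quantized asymmetric types share a single kernel name.
    return is_quantized_asymm ? std::string("direct_convolution_1x1_3x3_5x5_quantized") : name.str();
}
```

Under these assumptions, a 3x3 floating-point NCHW convolution on the generic path resolves to `direct_convolution3x3`.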

References CLDirectConvolutionLayerKernel::_biases, CLDirectConvolutionLayerKernel::_border_size, CLDirectConvolutionLayerKernel::_conv_stride_x, CLDirectConvolutionLayerKernel::_conv_stride_y, CLDirectConvolutionLayerKernel::_input, CLDirectConvolutionLayerKernel::_output, CLDirectConvolutionLayerKernel::_weights, CLBuildOptions::add_option(), CLBuildOptions::add_option_if(), ARM_COMPUTE_ERROR, ARM_COMPUTE_ERROR_ON_NULLPTR, ARM_COMPUTE_ERROR_THROW_ON, ARM_COMPUTE_THROW_ON_ERROR, arm_compute::auto_init_if_empty(), CLDirectConvolutionLayerKernel::border_size(), arm_compute::quantization::calculate_quantized_multiplier_less_than_one(), arm_compute::CHANNEL, arm_compute::misc::shape_calculator::compute_deep_convolution_shape(), arm_compute::test::validation::conv_info, arm_compute::create_kernel(), arm_compute::test::validation::data_layout, ITensorInfo::data_layout(), arm_compute::test::validation::data_type, ITensorInfo::data_type(), ITensorInfo::dimension(), TensorInfo::dimension(), CLKernelLibrary::get(), arm_compute::get_cl_type_from_data_type(), arm_compute::get_data_layout_dimension_index(), arm_compute::get_data_size_from_data_type(), ICLKernel::get_target(), arm_compute::HEIGHT, ITensor::info(), CLTensor::info(), arm_compute::is_data_type_quantized_asymmetric(), arm_compute::lower_string(), arm_compute::NCHW, arm_compute::NHWC, ICLKernel::num_arguments_per_1D_tensor(), ICLKernel::num_arguments_per_3D_tensor(), UniformQuantizationInfo::offset, CLBuildOptions::options(), arm_compute::test::validation::output_shape, ITensorInfo::quantization_info(), BorderSize::right, UniformQuantizationInfo::scale, arm_compute::string_from_data_layout(), arm_compute::string_from_data_type(), arm_compute::support::cpp11::to_string(), QuantizationInfo::uniform(), arm_compute::validate_and_configure_window(), arm_compute::test::validation::weights, and arm_compute::WIDTH.

Referenced by CLDirectConvolutionLayer::configure().
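
For quantized inputs, configure() computes a real multiplier `iqinfo.scale * wqinfo.scale / oqinfo.scale` and converts it to an integer multiplier and a shift. The sketch below shows one common fixed-point decomposition for a multiplier in (0, 1). It is an illustration only: `decompose_multiplier_less_than_one` is a hypothetical function, and the library's calculate_quantized_multiplier_less_than_one may differ in rounding and error handling.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical sketch: express multiplier in (0, 1) as
//   multiplier ~= quant_multiplier * 2^-31 * 2^-right_shift
// with quant_multiplier in [2^30, 2^31). Returns false for out-of-range input.
bool decompose_multiplier_less_than_one(float multiplier, std::int32_t &quant_multiplier, int &right_shift)
{
    if(!(multiplier > 0.f && multiplier < 1.f))
    {
        return false;
    }

    // frexp yields q in [0.5, 1) and an exponent with multiplier == q * 2^exponent.
    int          exponent = 0;
    const double q        = std::frexp(static_cast<double>(multiplier), &exponent);
    right_shift           = -exponent;

    // Scale the mantissa to a Q0.31 fixed-point value.
    std::int64_t q_fixed = static_cast<std::int64_t>(std::llround(q * static_cast<double>(1ll << 31)));
    if(q_fixed == (1ll << 31))
    {
        // Rounding reached 1.0: renormalise so the value fits in 31 bits.
        q_fixed /= 2;
        --right_shift;
    }

    quant_multiplier = static_cast<std::int32_t>(q_fixed);
    return true;
}
```

With this scheme, a multiplier of 0.25 decomposes into a mantissa of 0.5 (stored as 2^30) and a right shift of 1.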

◆ operator=() [1/2]

Prevent instances of this class from being copied (as this class contains pointers).

◆ operator=() [2/2]

Allow instances of this class to be moved.

◆ run()

void run (const Window &window,
          cl::CommandQueue &queue
         )
override virtual

Enqueue the OpenCL kernel to process the given window on the passed OpenCL command queue.

Note
The queue is not flushed by this method, and therefore the kernel will not have been executed by the time this method returns.
Parameters
[in] window: Region on which to execute the kernel. (Must be a valid region of the window returned by window()).
[in,out] queue: Command queue on which to enqueue the kernel.

Implements ICLKernel.

Definition at line 553 of file CLDirectConvolutionLayerKernel.cpp.

554 {
557 
558  // Get initial windows
560  Window win_in = window;
561 
562  win_in.adjust(Window::DimX, -_border_size.left, true);
563  win_in.adjust(Window::DimY, -_border_size.top, true);
564 
568 
569  win_in.set_dimension_step(width_idx, window[width_idx].step() * _conv_stride_x);
570  win_in.set_dimension_step(height_idx, window[height_idx].step() * _conv_stride_y);
571 
572  Window slice_in = win_in.first_slice_window_3D();
573  unsigned int idx1 = 2 * num_arguments_per_3D_tensor();
575 
576  if(_biases != nullptr)
577  {
578  Window slice_biases;
579  slice_biases.use_tensor_dimensions(_biases->info()->tensor_shape());
580  add_1D_tensor_argument(idx1, _biases, slice_biases);
581  }
582 
583  _kernel.setArg(idx1++, static_cast<unsigned int>(_weights->info()->strides_in_bytes()[3]));
584 
585  do
586  {
587  unsigned int idx = 0;
588  add_3D_tensor_argument(idx, _input, slice_in);
590  enqueue(queue, *this, slice, lws_hint());
591  }
592  while(window.slide_window_slice_3D(slice) && win_in.slide_window_slice_3D(slice_in));
593 }
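
In the listing above, the input window starts one border earlier than the output window and its step is multiplied by the convolution stride. A minimal sketch of the resulting coordinate mapping follows. `input_coordinate` is a hypothetical helper, and the sketch assumes that each work-item addresses the element at window start plus work-item index times window step, which is not shown on this page.

```cpp
// Hypothetical helper: first input coordinate read by a given work-item along one dimension.
//   work_item : index of the work-item along the dimension
//   out_step  : step of the output window along the dimension
//   stride    : convolution stride along the dimension
//   border    : border (padding) size at the start of the dimension
// Assumption: element index = window start + work_item * window step.
int input_coordinate(int work_item, int out_step, int stride, int border)
{
    const int in_start = -border;           // start moved back by the border
    const int in_step  = out_step * stride; // step scaled by the stride
    return in_start + work_item * in_step;
}
```

With a stride of 2 and a border of 1, work-item 0 begins at input coordinate -1, inside the border region, and work-item 3 begins at coordinate 5.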

References CLDirectConvolutionLayerKernel::_biases, CLDirectConvolutionLayerKernel::_border_size, CLDirectConvolutionLayerKernel::_conv_stride_x, CLDirectConvolutionLayerKernel::_conv_stride_y, CLDirectConvolutionLayerKernel::_input, CLDirectConvolutionLayerKernel::_output, CLDirectConvolutionLayerKernel::_weights, ICLKernel::add_1D_tensor_argument(), ICLKernel::add_3D_tensor_argument(), Window::adjust(), ARM_COMPUTE_ERROR_ON_INVALID_SUBWINDOW, ARM_COMPUTE_ERROR_ON_UNCONFIGURED_KERNEL, arm_compute::test::validation::data_layout, ITensorInfo::data_layout(), Window::DimX, Window::DimY, arm_compute::enqueue(), Window::first_slice_window_3D(), arm_compute::get_data_layout_dimension_index(), arm_compute::HEIGHT, ITensor::info(), BorderSize::left, ICLKernel::lws_hint(), ICLKernel::num_arguments_per_3D_tensor(), Window::set_dimension_step(), arm_compute::test::validation::reference::slice(), Window::slide_window_slice_3D(), arm_compute::test::validation::step, ITensorInfo::strides_in_bytes(), ITensorInfo::tensor_shape(), BorderSize::top, Window::use_tensor_dimensions(), arm_compute::WIDTH, and IKernel::window().

◆ validate()

Status validate (const ITensorInfo *input,
                 const ITensorInfo *weights,
                 const ITensorInfo *biases,
                 const ITensorInfo *output,
                 const PadStrideInfo &conv_info,
                 const GPUTarget target
                )
static

Static function to check if given info will lead to a valid configuration of CLDirectConvolutionLayerKernel.

Parameters
[in] input: The input tensor to convolve. The 3 lower dimensions represent a single input [width, height, IFM], while every optional dimension from the 4th onwards represents a batch of inputs. Data types supported: QASYMM8/F16/F32.
[in] weights: Weights tensor. Weights are a 4D tensor with dimensions [kernel_x, kernel_y, IFM, OFM]. The 3rd dimension must match the 3rd dimension (IFM) of the input. Data type supported: Same as input.
[in] biases: Biases tensor. Biases are a 1D tensor with dimension [OFM]. Data type supported: Same as input.
[in] output: Output tensor. The 3rd dimension must be equal to the 4th dimension (OFM) of the weights tensor. Data types supported: Same as input.
[in] conv_info: Contains padding and stride information described in PadStrideInfo.
[in] target: Target GPU architecture.
Returns
A status.
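
The dimension constraints stated in the parameter descriptions can be checked in isolation. The sketch below is a stand-alone illustration: `Shape` and `check_direct_conv_shapes` are hypothetical names, the dimension order follows the [width, height, IFM] convention used in the descriptions, and the function covers only these shape relations, not everything validate() may check.

```cpp
#include <array>
#include <cstddef>
#include <string>

// Hypothetical 4D shape: [dim0, dim1, dim2, dim3].
using Shape = std::array<std::size_t, 4>;

// Returns an empty string when the documented shape relations hold,
// otherwise a short description of the first violated relation.
// num_biases may be nullptr when no bias tensor is used.
std::string check_direct_conv_shapes(const Shape &input, const Shape &weights, const Shape &output, const std::size_t *num_biases)
{
    // Weights are [kernel_x, kernel_y, IFM, OFM]; IFM must match the input's 3rd dimension.
    if(weights[2] != input[2])
    {
        return "weights IFM does not match input IFM";
    }
    // The output's 3rd dimension must equal the weights' 4th dimension (OFM).
    if(output[2] != weights[3])
    {
        return "output channels do not match weights OFM";
    }
    // Biases, when present, are 1D with OFM elements.
    if(num_biases != nullptr && *num_biases != weights[3])
    {
        return "biases length does not match weights OFM";
    }
    return "";
}
```

Returning a description instead of throwing loosely follows the status-returning style of validate(), which lets a caller test a configuration before calling configure().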

Definition at line 544 of file CLDirectConvolutionLayerKernel.cpp.

546 {
547  ARM_COMPUTE_RETURN_ON_ERROR(validate_arguments(input, weights, biases, output, conv_info));
548  ARM_COMPUTE_RETURN_ON_ERROR(validate_and_configure_window(input->clone().get(), weights->clone().get(), output->clone().get(), conv_info, target).first);
549 
550  return Status{};
551 }

References ARM_COMPUTE_RETURN_ON_ERROR, ICloneable< T >::clone(), arm_compute::test::validation::conv_info, arm_compute::validate_and_configure_window(), and arm_compute::test::validation::weights.

Referenced by CLDirectConvolutionLayer::validate().

Field Documentation

◆ _biases

◆ _border_size

◆ _conv_stride_x

◆ _conv_stride_y

◆ _input

◆ _output

◆ _weights


The documentation for this class was generated from the following files: