Compute Library
 19.11
CLDirectConvolutionLayerKernel Class Reference

Interface for the direct convolution kernel.

#include <CLDirectConvolutionLayerKernel.h>


Public Member Functions

 CLDirectConvolutionLayerKernel ()
 Default constructor.
 
 CLDirectConvolutionLayerKernel (const CLDirectConvolutionLayerKernel &)=delete
 Prevent instances of this class from being copied (As this class contains pointers)
 
CLDirectConvolutionLayerKernel & operator= (const CLDirectConvolutionLayerKernel &)=delete
 Prevent instances of this class from being copied (As this class contains pointers)
 
 CLDirectConvolutionLayerKernel (CLDirectConvolutionLayerKernel &&)=default
 Allow instances of this class to be moved.
 
CLDirectConvolutionLayerKernel & operator= (CLDirectConvolutionLayerKernel &&)=default
 Allow instances of this class to be moved.
 
 ~CLDirectConvolutionLayerKernel ()=default
 Default destructor.
 
void configure (const ICLTensor *input, const ICLTensor *weights, const ICLTensor *biases, ICLTensor *output, const PadStrideInfo &conv_info)
 Set the input, weights, biases and output tensors.
 
void run (const Window &window, cl::CommandQueue &queue) override
 Enqueue the OpenCL kernel to process the given window on the passed OpenCL command queue.
 
BorderSize border_size () const override
 The size of the border for that kernel.
 
Public Member Functions inherited from ICLKernel
 ICLKernel ()
 Constructor.
 
cl::Kernel & kernel ()
 Returns a reference to the OpenCL kernel of this object.
 
template<typename T >
void add_1D_array_argument (unsigned int &idx, const ICLArray< T > *array, const Strides &strides, unsigned int num_dimensions, const Window &window)
 Add the passed 1D array's parameters to the object's kernel's arguments starting from the index idx.
 
void add_1D_tensor_argument (unsigned int &idx, const ICLTensor *tensor, const Window &window)
 Add the passed 1D tensor's parameters to the object's kernel's arguments starting from the index idx.
 
void add_1D_tensor_argument_if (bool cond, unsigned int &idx, const ICLTensor *tensor, const Window &window)
 Add the passed 1D tensor's parameters to the object's kernel's arguments starting from the index idx if the condition is true.
 
void add_2D_tensor_argument (unsigned int &idx, const ICLTensor *tensor, const Window &window)
 Add the passed 2D tensor's parameters to the object's kernel's arguments starting from the index idx.
 
void add_2D_tensor_argument_if (bool cond, unsigned int &idx, const ICLTensor *tensor, const Window &window)
 Add the passed 2D tensor's parameters to the object's kernel's arguments starting from the index idx if the condition is true.
 
void add_3D_tensor_argument (unsigned int &idx, const ICLTensor *tensor, const Window &window)
 Add the passed 3D tensor's parameters to the object's kernel's arguments starting from the index idx.
 
void add_4D_tensor_argument (unsigned int &idx, const ICLTensor *tensor, const Window &window)
 Add the passed 4D tensor's parameters to the object's kernel's arguments starting from the index idx.
 
template<typename T >
void add_argument (unsigned int &idx, T value)
 Add the passed parameters to the object's kernel's arguments starting from the index idx.
 
void set_lws_hint (const cl::NDRange &lws_hint)
 Set the Local-Workgroup-Size hint.
 
cl::NDRange lws_hint () const
 Return the Local-Workgroup-Size hint.
 
const std::string & config_id () const
 Get the configuration ID.
 
void set_target (GPUTarget target)
 Set the targeted GPU architecture.
 
void set_target (cl::Device &device)
 Set the targeted GPU architecture according to the CL device.
 
GPUTarget get_target () const
 Get the targeted GPU architecture.
 
size_t get_max_workgroup_size ()
 Get the maximum workgroup size for the device the CLKernelLibrary uses.
 
template<typename T , unsigned int dimension_size>
void add_array_argument (unsigned &idx, const ICLArray< T > *array, const Strides &strides, unsigned int num_dimensions, const Window &window)
 Add the passed array's parameters to the object's kernel's arguments starting from the index idx.
 
template<unsigned int dimension_size>
void add_tensor_argument (unsigned &idx, const ICLTensor *tensor, const Window &window)
 
Public Member Functions inherited from IKernel
 IKernel ()
 Constructor.
 
virtual ~IKernel ()=default
 Destructor.
 
virtual bool is_parallelisable () const
 Indicates whether or not the kernel is parallelisable.
 
const Window & window () const
 The maximum window the kernel can be executed on.
 

Static Public Member Functions

static Status validate (const ITensorInfo *input, const ITensorInfo *weights, const ITensorInfo *biases, const ITensorInfo *output, const PadStrideInfo &conv_info, const GPUTarget target)
 Static function to check if given info will lead to a valid configuration of CLDirectConvolutionLayerKernel.
 
Static Public Member Functions inherited from ICLKernel
static constexpr unsigned int num_arguments_per_1D_array ()
 Returns the number of arguments enqueued per 1D array object.
 
static constexpr unsigned int num_arguments_per_1D_tensor ()
 Returns the number of arguments enqueued per 1D tensor object.
 
static constexpr unsigned int num_arguments_per_2D_tensor ()
 Returns the number of arguments enqueued per 2D tensor object.
 
static constexpr unsigned int num_arguments_per_3D_tensor ()
 Returns the number of arguments enqueued per 3D tensor object.
 
static constexpr unsigned int num_arguments_per_4D_tensor ()
 Returns the number of arguments enqueued per 4D tensor object.
 
static cl::NDRange gws_from_window (const Window &window)
 Get the global work size given an execution window.
 

Data Fields

const ICLTensor * _input
 
const ICLTensor * _biases
 
const ICLTensor * _weights
 
ICLTensor * _output
 
BorderSize _border_size
 
int _conv_stride_x
 
int _conv_stride_y
 

Detailed Description

Interface for the direct convolution kernel.

Definition at line 36 of file CLDirectConvolutionLayerKernel.h.

Constructor & Destructor Documentation

◆ CLDirectConvolutionLayerKernel() [1/3]

◆ CLDirectConvolutionLayerKernel() [2/3]

Prevent instances of this class from being copied (As this class contains pointers)

◆ CLDirectConvolutionLayerKernel() [3/3]

Allow instances of this class to be moved.

◆ ~CLDirectConvolutionLayerKernel()

Default destructor.

Member Function Documentation

◆ border_size()

BorderSize border_size ( ) const
override virtual

The size of the border for that kernel.

Returns
The width in number of elements of the border.

Reimplemented from IKernel.

Definition at line 407 of file CLDirectConvolutionLayerKernel.cpp.

References CLDirectConvolutionLayerKernel::_border_size.

Referenced by CLDirectConvolutionLayer::configure(), and CLDirectConvolutionLayerKernel::configure().
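
The configure() listing on this page fills _border_size from the padding values in one of two ways, depending on the data layout. The following is a minimal, self-contained sketch of that mapping. Border, Padding, Layout and border_from_padding are hypothetical stand-ins, not Compute Library types. Two things are assumptions here: that the first branch of the listing (whose condition is missing from this page) is the NHWC case, and that the four-argument BorderSize constructor takes its values in the order top, right, bottom, left. In NHWC the width is dimension 1, which may be why the left/right padding ends up in the top/bottom slots.

```cpp
// Hypothetical stand-ins for BorderSize, PadStrideInfo and DataLayout.
struct Border
{
    unsigned int top;
    unsigned int right;
    unsigned int bottom;
    unsigned int left;
};

struct Padding
{
    unsigned int left;
    unsigned int top;
    unsigned int right;
    unsigned int bottom;
};

enum class Layout
{
    NCHW,
    NHWC
};

// Mirrors the two assignments to _border_size shown in the configure() listing.
// Assumption: the first branch in the listing corresponds to NHWC.
inline Border border_from_padding(const Padding &pad, Layout layout)
{
    if(layout == Layout::NHWC)
    {
        // Values passed in (top, right, bottom, left) order: pad_left, 0, pad_right, 0
        return Border{ pad.left, 0u, pad.right, 0u };
    }
    // NCHW: pad_top, pad_right, pad_bottom, pad_left
    return Border{ pad.top, pad.right, pad.bottom, pad.left };
}
```

For example, border_from_padding(Padding{1, 2, 3, 4}, Layout::NCHW) yields top = 2, right = 3, bottom = 4, left = 1.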

◆ configure()

void configure ( const ICLTensor * input,
const ICLTensor * weights,
const ICLTensor * biases,
ICLTensor * output,
const PadStrideInfo & conv_info 
)

Set the input, weights, biases and output tensors.

Note
DirectConvolution only works in the following configurations:
 1x1 convolution with stride_x = 1/2/3, stride_y = 1/2/3
 3x3 convolution with stride_x = 1/2, stride_y = 1/2
 5x5 convolution with stride_x = 1/2, stride_y = 1/2
 9x9 convolution with stride_x = 1/2, stride_y = 1/2, data_layout=NHWC
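
The supported configurations in the note can be expressed as a small pre-check. This is only a sketch: Layout and is_listed_direct_conv_config are hypothetical names, not part of the Compute Library API, and the sketch reads the note as restricting only the 9x9 case to NHWC. The authoritative check remains validate().

```cpp
// Hypothetical layout tag; the library uses its own DataLayout enum.
enum class Layout
{
    NCHW,
    NHWC
};

// Returns true when the kernel size / stride / layout combination is one of
// those listed in the note. Assumption: only 9x9 is restricted to NHWC.
inline bool is_listed_direct_conv_config(unsigned int kernel_size,
                                         unsigned int stride_x,
                                         unsigned int stride_y,
                                         Layout       layout)
{
    unsigned int max_stride = 0;
    switch(kernel_size)
    {
        case 1:
            max_stride = 3; // strides 1, 2 or 3
            break;
        case 3:
        case 5:
            max_stride = 2; // strides 1 or 2
            break;
        case 9:
            if(layout != Layout::NHWC)
            {
                return false;
            }
            max_stride = 2; // strides 1 or 2
            break;
        default:
            return false; // kernel size not listed
    }
    return stride_x >= 1 && stride_x <= max_stride && stride_y >= 1 && stride_y <= max_stride;
}
```

A caller could use such a check to decide early whether to fall back to another convolution method before building tensor infos.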
Parameters
[in] input: The input tensor to convolve. The 3 lower dimensions represent a single input [width, height, IFM], while every optional dimension from 4 and above represents a batch of inputs. Data types supported: QASYMM8/F16/F32.
[in] weights: Weights tensor. Weights are a 4D tensor with dimensions [kernel_x, kernel_y, IFM, OFM]. The 3rd dimension must be the same as the input's volume 3rd dimension. Data types supported: Same as input.
[in] biases: Biases tensor. Biases are a 1D tensor with dimension [OFM]. Data types supported: Should match the input data type, except for input of QASYMM8 type, where biases should be of S32 type.
[out] output: Output tensor. The 3rd dimension must be equal to the 4th dimension of the kernels tensor. Data types supported: Same as input.
[in] conv_info: Contains padding and stride information described in PadStrideInfo.
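
The parameter descriptions fix the channel dimension of the output (OFM) but not its width and height. As a rough guide, the usual convolution arithmetic can be sketched as below. conv_out_dim is a hypothetical helper, and the sketch assumes floor rounding and no dilation; the library computes the actual shape with compute_deep_convolution_shape().

```cpp
#include <cstddef>
#include <stdexcept>

// Output extent along one spatial axis, assuming floor rounding and no dilation:
//   out = (in + pad_before + pad_after - kernel) / stride + 1
inline std::size_t conv_out_dim(std::size_t in,
                                std::size_t kernel,
                                std::size_t stride,
                                std::size_t pad_before,
                                std::size_t pad_after)
{
    if(stride == 0)
    {
        throw std::invalid_argument("stride must be non-zero");
    }
    const std::size_t padded = in + pad_before + pad_after;
    if(padded < kernel)
    {
        throw std::invalid_argument("kernel does not fit in the padded input");
    }
    return (padded - kernel) / stride + 1; // integer division rounds down
}
```

For instance, a 224-wide input with a 3x3 kernel, stride 2 and one element of padding on each side gives (224 + 1 + 1 - 3) / 2 + 1 = 112 output columns.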

Definition at line 412 of file CLDirectConvolutionLayerKernel.cpp.

413 {
415 
416  const DataLayout data_layout = input->info()->data_layout();
420 
421  const unsigned int kernel_size = weights->info()->dimension(width_idx);
422  const DataType data_type = input->info()->data_type();
423 
424  // Get convolved dimensions
426 
427  // Output auto initialization if not yet initialized
428  // TODO(COMPMID-2078): input->clone()->set_tensor_shape(output_shape) doesn't work with subtensors for grouped direct convolutions (AlexNet).
429  auto_init_if_empty(*output->info(),
430  output_shape,
431  1,
432  input->info()->data_type(),
433  input->info()->quantization_info());
434 
435  // Perform validation step
436  ARM_COMPUTE_ERROR_THROW_ON(validate_arguments(input->info(),
437  weights->info(),
438  (biases != nullptr) ? biases->info() : nullptr,
439  output->info(),
440  conv_info));
441 
442  _conv_stride_x = std::get<0>(conv_info.stride());
443  _conv_stride_y = std::get<1>(conv_info.stride());
444 
446  {
447  _border_size = BorderSize(conv_info.pad_left(), 0, conv_info.pad_right(), 0);
448  }
449  else if(data_layout == DataLayout::NCHW)
450  {
451  _border_size = BorderSize(conv_info.pad_top(), conv_info.pad_right(), conv_info.pad_bottom(), conv_info.pad_left());
452  }
453  else
454  {
455  ARM_COMPUTE_ERROR("Not supported");
456  }
457 
458  _input = input;
459  _weights = weights;
460  _output = output;
461  _biases = biases;
462 
463  const GPUTarget gpu_target = get_target();
464 
465  std::stringstream kernel_name;
466  kernel_name << "direct_convolution" << kernel_size << "x" << kernel_size;
468  {
469  kernel_name << "_" << lower_string(string_from_data_layout(data_layout));
470  }
471 
472  CLBuildOptions build_options;
473  build_options.add_option_if(_biases != nullptr, std::string("-DHAS_BIAS"));
474 
475  const bool run_optimized_for_bifrost = can_run_optimized_kernel_for_bifrost(gpu_target, _conv_stride_x, _conv_stride_y, kernel_size, data_type, data_layout);
476 
477  if(run_optimized_for_bifrost)
478  {
479  build_options.add_option(std::string("-DWEIGHTS_DEPTH=" + support::cpp11::to_string(_weights->info()->dimension(channel_idx))));
480 
481  kernel_name << "_f32_bifrost";
482  _kernel = static_cast<cl::Kernel>(CLKernelLibrary::get().create_kernel(kernel_name.str(), build_options.options()));
483  }
484  else
485  {
486  const bool is_quantized_asymm = is_data_type_quantized_asymmetric(data_type);
487  build_options.add_option_if(is_quantized_asymm, std::string("-DKERNEL_SIZE=" + support::cpp11::to_string(kernel_size)));
488  build_options.add_option(std::string("-DDATA_TYPE=" + get_cl_type_from_data_type(data_type)));
489  build_options.add_option(std::string("-DDATA_SIZE=" + get_data_size_from_data_type(data_type)));
490  build_options.add_option(std::string("-DWEIGHTS_DEPTH=" + support::cpp11::to_string(_weights->info()->dimension(channel_idx))));
491  build_options.add_option(std::string("-DSTRIDE_X=" + support::cpp11::to_string(_conv_stride_x)));
493  {
494  const bool run_optimized_for_bifrost_nhwc = can_run_optimized_kernel_for_bifrost_nhwc(gpu_target, _conv_stride_x, _conv_stride_y, kernel_size, data_type, data_layout);
495  build_options.add_option(std::string("-DDATA_LAYOUT_NHWC=1"));
496  build_options.add_option(std::string("-DDST_HEIGHT=" + support::cpp11::to_string(_output->info()->dimension(height_idx))));
497  build_options.add_option(std::string("-DDST_WIDTH=" + support::cpp11::to_string(_output->info()->dimension(width_idx))));
498  build_options.add_option(std::string("-DSRC_HEIGHT=" + support::cpp11::to_string(_input->info()->dimension(height_idx))));
499  build_options.add_option(std::string("-DSRC_WIDTH=" + support::cpp11::to_string(_input->info()->dimension(width_idx))));
500  build_options.add_option(std::string("-DPAD_LEFT=" + support::cpp11::to_string(conv_info.pad_left())));
501  build_options.add_option(std::string("-DPAD_TOP=" + support::cpp11::to_string(conv_info.pad_top())));
502  build_options.add_option(std::string("-DSTRIDE_Y=" + support::cpp11::to_string(_conv_stride_y)));
503  if(run_optimized_for_bifrost_nhwc)
504  {
505  const unsigned int num_elems_read_per_iteration_x = 4;
506  _border_size.right = num_elems_read_per_iteration_x;
507  build_options.add_option("-DVEC_SIZE=" + support::cpp11::to_string(num_elems_read_per_iteration_x));
508  }
509  }
510  build_options.add_option(std::string("-DDATA_TYPE_PROMOTED=" + get_cl_type_from_data_type(data_type)));
511  // Create kernel
512  _kernel = static_cast<cl::Kernel>(CLKernelLibrary::get().create_kernel(is_quantized_asymm ? "direct_convolution_quantized" : kernel_name.str(),
513  build_options.options()));
514  }
515 
516  // Configure kernel window
517  auto win_config = validate_and_configure_window(input->info(), weights->info(), output->info(), conv_info, gpu_target);
518  ARM_COMPUTE_ERROR_THROW_ON(win_config.first);
519  ICLKernel::configure_internal(win_config.second);
520 
521  // Set static kernel arguments
523  {
527 
528  int output_multiplier = 0;
529  int output_shift = 0;
530 
531  float multiplier = iqinfo.scale * wqinfo.scale / oqinfo.scale;
532  ARM_COMPUTE_THROW_ON_ERROR(quantization::calculate_quantized_multiplier_less_than_one(multiplier, &output_multiplier, &output_shift));
533 
534  unsigned int idx = 3 * num_arguments_per_3D_tensor() + ((_biases != nullptr) ? num_arguments_per_1D_tensor() : 0) + 1;
535  _kernel.setArg(idx++, -iqinfo.offset);
536  _kernel.setArg(idx++, -wqinfo.offset);
537  _kernel.setArg(idx++, oqinfo.offset);
538  _kernel.setArg(idx++, output_multiplier);
539  _kernel.setArg(idx++, output_shift);
540  }
541 
542  // Set config_id for enabling LWS tuning
543  _config_id = "direct_convolution_";
545  _config_id += "_";
546  _config_id += support::cpp11::to_string(kernel_size);
547  _config_id += "_";
548  _config_id += support::cpp11::to_string(border_size().left);
549  _config_id += "_";
550  _config_id += support::cpp11::to_string(border_size().top);
551  _config_id += "_";
552  _config_id += support::cpp11::to_string(border_size().right);
553  _config_id += "_";
554  _config_id += support::cpp11::to_string(border_size().bottom);
555  _config_id += "_";
557  _config_id += "_";
559  _config_id += "_";
560  _config_id += support::cpp11::to_string(output->info()->dimension(width_idx));
561  _config_id += "_";
562  _config_id += support::cpp11::to_string(output->info()->dimension(height_idx));
563  _config_id += "_";
565 }
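
For quantized inputs, the listing above combines the three scales into one real multiplier (iqinfo.scale * wqinfo.scale / oqinfo.scale) and passes it to calculate_quantized_multiplier_less_than_one(). The sketch below shows the general fixed-point technique such a function typically relies on: splitting a value in (0, 1) into a 32-bit integer multiplier and a right shift. It is an illustration under that assumption, not the library's implementation; QuantizedMultiplier and quantize_multiplier_less_than_one are hypothetical names.

```cpp
#include <cmath>
#include <cstdint>
#include <stdexcept>

// Hypothetical result type: real_multiplier ~= multiplier * 2^-31 * 2^-right_shift
struct QuantizedMultiplier
{
    std::int32_t multiplier;
    int          right_shift;
};

inline QuantizedMultiplier quantize_multiplier_less_than_one(double real_multiplier)
{
    if(!(real_multiplier > 0.0 && real_multiplier < 1.0))
    {
        throw std::invalid_argument("multiplier must be in (0, 1)");
    }

    // frexp: real_multiplier = mantissa * 2^exponent, with mantissa in [0.5, 1)
    int          exponent = 0;
    const double mantissa = std::frexp(real_multiplier, &exponent);
    int          right_shift = -exponent;

    // Represent the mantissa as a Q0.31 fixed-point value
    std::int64_t fixed = std::llround(mantissa * static_cast<double>(1LL << 31));
    if(fixed == (1LL << 31))
    {
        // Rounding reached 1.0: halve the value and compensate in the shift
        fixed /= 2;
        --right_shift;
    }
    if(right_shift < 0)
    {
        throw std::invalid_argument("multiplier too close to one for this representation");
    }
    return QuantizedMultiplier{ static_cast<std::int32_t>(fixed), right_shift };
}
```

With this scheme a multiplier of 0.25 becomes the integer 1073741824 (0.5 in Q0.31) with a right shift of 1.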

References CLDirectConvolutionLayerKernel::_biases, CLDirectConvolutionLayerKernel::_border_size, CLDirectConvolutionLayerKernel::_conv_stride_x, CLDirectConvolutionLayerKernel::_conv_stride_y, CLDirectConvolutionLayerKernel::_input, CLDirectConvolutionLayerKernel::_output, CLDirectConvolutionLayerKernel::_weights, CLBuildOptions::add_option(), CLBuildOptions::add_option_if(), ARM_COMPUTE_ERROR, ARM_COMPUTE_ERROR_ON_NULLPTR, ARM_COMPUTE_ERROR_THROW_ON, ARM_COMPUTE_THROW_ON_ERROR, arm_compute::auto_init_if_empty(), CLDirectConvolutionLayerKernel::border_size(), arm_compute::quantization::calculate_quantized_multiplier_less_than_one(), arm_compute::CHANNEL, arm_compute::misc::shape_calculator::compute_deep_convolution_shape(), arm_compute::test::validation::conv_info, arm_compute::create_kernel(), arm_compute::test::validation::data_layout, arm_compute::test::validation::data_type, ITensorInfo::dimension(), TensorInfo::dimension(), CLKernelLibrary::get(), arm_compute::get_cl_type_from_data_type(), arm_compute::get_data_layout_dimension_index(), arm_compute::get_data_size_from_data_type(), ICLKernel::get_target(), arm_compute::HEIGHT, ITensor::info(), CLTensor::info(), arm_compute::test::validation::input, arm_compute::is_data_type_quantized_asymmetric(), arm_compute::lower_string(), arm_compute::NCHW, arm_compute::NHWC, ICLKernel::num_arguments_per_1D_tensor(), ICLKernel::num_arguments_per_3D_tensor(), UniformQuantizationInfo::offset, CLBuildOptions::options(), arm_compute::test::validation::output_shape, ITensorInfo::quantization_info(), BorderSize::right, UniformQuantizationInfo::scale, arm_compute::string_from_data_layout(), arm_compute::string_from_data_type(), arm_compute::support::cpp11::to_string(), QuantizationInfo::uniform(), arm_compute::test::validation::weights, and arm_compute::WIDTH.

Referenced by CLDirectConvolutionLayer::configure().

◆ operator=() [1/2]

Prevent instances of this class from being copied (As this class contains pointers)

◆ operator=() [2/2]

Allow instances of this class to be moved.

◆ run()

void run ( const Window & window,
cl::CommandQueue & queue 
)
override virtual

Enqueue the OpenCL kernel to process the given window on the passed OpenCL command queue.

Note
The queue is not flushed by this method, and therefore the kernel will not have been executed by the time this method returns.
Parameters
[in] window: Region on which to execute the kernel. (Must be a valid region of the window returned by window()).
[in,out] queue: Command queue on which to enqueue the kernel.

Implements ICLKernel.

Definition at line 576 of file CLDirectConvolutionLayerKernel.cpp.

577 {
580 
581  // Get initial windows
583  Window win_in = window;
584 
585  win_in.adjust(Window::DimX, -_border_size.left, true);
586  win_in.adjust(Window::DimY, -_border_size.top, true);
587 
591 
592  win_in.set_dimension_step(width_idx, window[width_idx].step() * _conv_stride_x);
593  win_in.set_dimension_step(height_idx, window[height_idx].step() * _conv_stride_y);
594 
595  Window slice_in = win_in.first_slice_window_3D();
596  unsigned int idx1 = 2 * num_arguments_per_3D_tensor();
598 
599  if(_biases != nullptr)
600  {
601  Window slice_biases;
602  slice_biases.use_tensor_dimensions(_biases->info()->tensor_shape());
603  add_1D_tensor_argument(idx1, _biases, slice_biases);
604  }
605 
606  _kernel.setArg(idx1++, static_cast<unsigned int>(_weights->info()->strides_in_bytes()[3]));
607 
608  do
609  {
610  unsigned int idx = 0;
611  add_3D_tensor_argument(idx, _input, slice_in);
613  enqueue(queue, *this, slice, lws_hint());
614  }
615  while(window.slide_window_slice_3D(slice) && win_in.slide_window_slice_3D(slice_in));
616 }
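
The listing above derives the input window from the output window by moving the start back by the border and multiplying the step by the convolution stride. A minimal sketch of that relationship for a single dimension follows. Dim1D, input_dim_from_output and first_input_coord are hypothetical names, and the sketch assumes that adjusting a dimension "at start" simply adds the given value to its start.

```cpp
// Hypothetical one-dimensional window: [start, end) traversed in increments of step.
struct Dim1D
{
    int start;
    int end;
    int step;
};

// Input-side dimension for a given output-side dimension:
// the start moves back by the border and the step is scaled by the stride.
inline Dim1D input_dim_from_output(const Dim1D &out, int border_before, int stride)
{
    return Dim1D{ out.start - border_before, out.end, out.step * stride };
}

// First input coordinate read when computing output element out_index.
inline int first_input_coord(int out_index, int stride, int pad_before)
{
    return out_index * stride - pad_before;
}
```

With a stride of 2 and a border of 1, output element 3 starts reading at input coordinate 3 * 2 - 1 = 5, and the input window advances two elements for every output element.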

References CLDirectConvolutionLayerKernel::_biases, CLDirectConvolutionLayerKernel::_border_size, CLDirectConvolutionLayerKernel::_conv_stride_x, CLDirectConvolutionLayerKernel::_conv_stride_y, CLDirectConvolutionLayerKernel::_input, CLDirectConvolutionLayerKernel::_output, CLDirectConvolutionLayerKernel::_weights, ICLKernel::add_1D_tensor_argument(), ICLKernel::add_3D_tensor_argument(), Window::adjust(), ARM_COMPUTE_ERROR_ON_INVALID_SUBWINDOW, ARM_COMPUTE_ERROR_ON_UNCONFIGURED_KERNEL, arm_compute::test::validation::data_layout, ITensorInfo::data_layout(), Window::DimX, Window::DimY, arm_compute::enqueue(), Window::first_slice_window_3D(), arm_compute::get_data_layout_dimension_index(), arm_compute::HEIGHT, ITensor::info(), BorderSize::left, ICLKernel::lws_hint(), ICLKernel::num_arguments_per_3D_tensor(), Window::set_dimension_step(), arm_compute::test::validation::reference::slice(), Window::slide_window_slice_3D(), arm_compute::test::validation::step, ITensorInfo::strides_in_bytes(), ITensorInfo::tensor_shape(), BorderSize::top, Window::use_tensor_dimensions(), arm_compute::WIDTH, and IKernel::window().

◆ validate()

Status validate ( const ITensorInfo * input,
const ITensorInfo * weights,
const ITensorInfo * biases,
const ITensorInfo * output,
const PadStrideInfo & conv_info,
const GPUTarget target 
)
static

Static function to check if given info will lead to a valid configuration of CLDirectConvolutionLayerKernel.

Parameters
[in] input: The input tensor to convolve. The 3 lower dimensions represent a single input [width, height, IFM], while every optional dimension from 4 and above represents a batch of inputs. Data types supported: QASYMM8/F16/F32.
[in] weights: Weights tensor. Weights are a 4D tensor with dimensions [kernel_x, kernel_y, IFM, OFM]. The 3rd dimension must be the same as the input's volume 3rd dimension. Data types supported: Same as input.
[in] biases: Biases tensor. Biases are a 1D tensor with dimension [OFM]. Data types supported: Same as input.
[in] output: Output tensor. The 3rd dimension must be equal to the 4th dimension of the kernels tensor. Data types supported: Same as input.
[in] conv_info: Contains padding and stride information described in PadStrideInfo.
[in] target: Target GPU architecture.
Returns
a status

Definition at line 567 of file CLDirectConvolutionLayerKernel.cpp.

569 {
570  ARM_COMPUTE_RETURN_ON_ERROR(validate_arguments(input, weights, biases, output, conv_info));
571  ARM_COMPUTE_RETURN_ON_ERROR(validate_and_configure_window(input->clone().get(), weights->clone().get(), output->clone().get(), conv_info, target).first);
572 
573  return Status{};
574 }
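
validate() runs two checks in sequence and returns the first failing status, or an empty (successful) Status otherwise. The early-return pattern can be sketched with stand-in types as follows. Status, RETURN_ON_ERROR, check_arguments, check_window and validate_sketch below are all hypothetical and much simpler than the library's Status class and ARM_COMPUTE_RETURN_ON_ERROR macro; the individual checks are placeholders.

```cpp
#include <string>
#include <utility>

// Minimal stand-in for a status object: default-constructed means success.
class Status
{
public:
    Status() = default;
    explicit Status(std::string msg)
        : _ok(false), _msg(std::move(msg))
    {
    }
    explicit operator bool() const
    {
        return _ok;
    }
    const std::string &description() const
    {
        return _msg;
    }

private:
    bool        _ok{ true };
    std::string _msg{};
};

// Stand-in for the early-return macro: propagate the first error encountered.
#define RETURN_ON_ERROR(expr)    \
    do                           \
    {                            \
        const Status s_ = (expr); \
        if(!s_)                  \
        {                        \
            return s_;           \
        }                        \
    } while(false)

// Placeholder checks; the real ones inspect tensor infos and the execution window.
inline Status check_arguments(unsigned int kernel_size)
{
    const bool ok = kernel_size == 1 || kernel_size == 3 || kernel_size == 5 || kernel_size == 9;
    return ok ? Status{} : Status{ "unsupported kernel size" };
}

inline Status check_window(unsigned int stride)
{
    return (stride >= 1) ? Status{} : Status{ "invalid stride" };
}

inline Status validate_sketch(unsigned int kernel_size, unsigned int stride)
{
    RETURN_ON_ERROR(check_arguments(kernel_size));
    RETURN_ON_ERROR(check_window(stride));
    return Status{};
}
```

Because the function is static and only takes tensor metadata, a caller can run it before allocating any OpenCL resources and only call configure() when the returned status is successful.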

References ARM_COMPUTE_RETURN_ON_ERROR, ICloneable< T >::clone(), arm_compute::test::validation::conv_info, arm_compute::test::validation::input, and arm_compute::test::validation::weights.

Referenced by CLDirectConvolutionLayer::validate().

Field Documentation

◆ _biases

◆ _border_size

◆ _conv_stride_x

◆ _conv_stride_y

◆ _input

◆ _output

◆ _weights


The documentation for this class was generated from the following files: