Compute Library 19.11
CLDepthwiseConvolutionLayer3x3NCHWKernel Class Reference

Interface for the kernel to run a 3x3 depthwise convolution on a tensor when the data layout is NCHW. More...

#include <CLDepthwiseConvolutionLayer3x3NCHWKernel.h>

Collaboration diagram for CLDepthwiseConvolutionLayer3x3NCHWKernel: [diagram omitted]

Public Member Functions

 CLDepthwiseConvolutionLayer3x3NCHWKernel ()
 Default constructor. More...
 
void configure (const ICLTensor *input, const ICLTensor *weights, const ICLTensor *biases, ICLTensor *output, const PadStrideInfo &conv_info, unsigned int depth_multiplier=1, ActivationLayerInfo act_info=ActivationLayerInfo(), const Size2D &dilation=Size2D(1U, 1U), const ICLTensor *output_multipliers=nullptr, const ICLTensor *output_shifts=nullptr) override
 Initialize the function's source, destination, conv and border_size. More...
 
void run (const Window &window, cl::CommandQueue &queue) override
 Enqueue the OpenCL kernel to process the given window on the passed OpenCL command queue. More...
 
BorderSize border_size () const override
 The size of the border for that kernel. More...
 
- Public Member Functions inherited from ICLDepthwiseConvolutionLayer3x3Kernel
 ICLDepthwiseConvolutionLayer3x3Kernel ()
 Default constructor. More...
 
 ICLDepthwiseConvolutionLayer3x3Kernel (const ICLDepthwiseConvolutionLayer3x3Kernel &)=delete
 Prevent instances of this class from being copied (As this class contains pointers) More...
 
ICLDepthwiseConvolutionLayer3x3Kernel & operator= (const ICLDepthwiseConvolutionLayer3x3Kernel &)=delete
 Prevent instances of this class from being copied (As this class contains pointers) More...
 
 ICLDepthwiseConvolutionLayer3x3Kernel (ICLDepthwiseConvolutionLayer3x3Kernel &&)=default
 Default Move Constructor. More...
 
ICLDepthwiseConvolutionLayer3x3Kernel & operator= (ICLDepthwiseConvolutionLayer3x3Kernel &&)=default
 Default move assignment operator. More...
 
- Public Member Functions inherited from ICLKernel
 ICLKernel ()
 Constructor. More...
 
cl::Kernel & kernel ()
 Returns a reference to the OpenCL kernel of this object. More...
 
template<typename T >
void add_1D_array_argument (unsigned int &idx, const ICLArray< T > *array, const Strides &strides, unsigned int num_dimensions, const Window &window)
 Add the passed 1D array's parameters to the object's kernel's arguments starting from the index idx. More...
 
void add_1D_tensor_argument (unsigned int &idx, const ICLTensor *tensor, const Window &window)
 Add the passed 1D tensor's parameters to the object's kernel's arguments starting from the index idx. More...
 
void add_1D_tensor_argument_if (bool cond, unsigned int &idx, const ICLTensor *tensor, const Window &window)
 Add the passed 1D tensor's parameters to the object's kernel's arguments starting from the index idx if the condition is true. More...
 
void add_2D_tensor_argument (unsigned int &idx, const ICLTensor *tensor, const Window &window)
 Add the passed 2D tensor's parameters to the object's kernel's arguments starting from the index idx. More...
 
void add_2D_tensor_argument_if (bool cond, unsigned int &idx, const ICLTensor *tensor, const Window &window)
 Add the passed 2D tensor's parameters to the object's kernel's arguments starting from the index idx if the condition is true. More...
 
void add_3D_tensor_argument (unsigned int &idx, const ICLTensor *tensor, const Window &window)
 Add the passed 3D tensor's parameters to the object's kernel's arguments starting from the index idx. More...
 
void add_4D_tensor_argument (unsigned int &idx, const ICLTensor *tensor, const Window &window)
 Add the passed 4D tensor's parameters to the object's kernel's arguments starting from the index idx. More...
 
template<typename T >
void add_argument (unsigned int &idx, T value)
 Add the passed parameters to the object's kernel's arguments starting from the index idx. More...
 
void set_lws_hint (const cl::NDRange &lws_hint)
 Set the Local-Workgroup-Size hint. More...
 
cl::NDRange lws_hint () const
 Return the Local-Workgroup-Size hint. More...
 
const std::string & config_id () const
 Get the configuration ID. More...
 
void set_target (GPUTarget target)
 Set the targeted GPU architecture. More...
 
void set_target (cl::Device &device)
 Set the targeted GPU architecture according to the CL device. More...
 
GPUTarget get_target () const
 Get the targeted GPU architecture. More...
 
size_t get_max_workgroup_size ()
 Get the maximum workgroup size for the device the CLKernelLibrary uses. More...
 
template<typename T , unsigned int dimension_size>
void add_array_argument (unsigned &idx, const ICLArray< T > *array, const Strides &strides, unsigned int num_dimensions, const Window &window)
 Add the passed array's parameters to the object's kernel's arguments starting from the index idx. More...
 
template<unsigned int dimension_size>
void add_tensor_argument (unsigned &idx, const ICLTensor *tensor, const Window &window)
 
- Public Member Functions inherited from IKernel
 IKernel ()
 Constructor. More...
 
virtual ~IKernel ()=default
 Destructor. More...
 
virtual bool is_parallelisable () const
 Indicates whether or not the kernel is parallelisable. More...
 
const Window & window () const
 The maximum window the kernel can be executed on. More...
 

Static Public Member Functions

static Status validate (const ITensorInfo *input, const ITensorInfo *weights, const ITensorInfo *biases, const ITensorInfo *output, const PadStrideInfo &conv_info, unsigned int depth_multiplier=1, ActivationLayerInfo act_info=ActivationLayerInfo(), GPUTarget gpu_target=GPUTarget::MIDGARD, const Size2D &dilation=Size2D(1U, 1U), const ITensorInfo *output_multipliers=nullptr, const ITensorInfo *output_shifts=nullptr)
 Static function to check if given info will lead to a valid configuration of CLDepthwiseConvolutionLayer3x3NCHWKernel. More...
 
- Static Public Member Functions inherited from ICLKernel
static constexpr unsigned int num_arguments_per_1D_array ()
 Returns the number of arguments enqueued per 1D array object. More...
 
static constexpr unsigned int num_arguments_per_1D_tensor ()
 Returns the number of arguments enqueued per 1D tensor object. More...
 
static constexpr unsigned int num_arguments_per_2D_tensor ()
 Returns the number of arguments enqueued per 2D tensor object. More...
 
static constexpr unsigned int num_arguments_per_3D_tensor ()
 Returns the number of arguments enqueued per 3D tensor object. More...
 
static constexpr unsigned int num_arguments_per_4D_tensor ()
 Returns the number of arguments enqueued per 4D tensor object. More...
 
static cl::NDRange gws_from_window (const Window &window)
 Get the global work size given an execution window. More...
 

Detailed Description

Interface for the kernel to run a 3x3 depthwise convolution on a tensor when the data layout is NCHW.

Definition at line 35 of file CLDepthwiseConvolutionLayer3x3NCHWKernel.h.
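
In production code this kernel is normally reached through the CLDepthwiseConvolutionLayer runtime function, which selects and schedules it internally, but it can also be driven directly. The following is a minimal, illustrative sketch of direct use; the tensor shapes, data type and stride/padding values are assumptions chosen for the example, not taken from this page.

#include "arm_compute/core/CL/kernels/CLDepthwiseConvolutionLayer3x3NCHWKernel.h"
#include "arm_compute/runtime/CL/CLScheduler.h"
#include "arm_compute/runtime/CL/CLTensor.h"

using namespace arm_compute;

void example_depthwise_3x3_nchw()
{
    CLScheduler::get().default_init();

    // Illustrative shapes: 56x56 input with 16 channels (NCHW), one 3x3 filter per channel.
    CLTensor input, weights, biases, output;
    input.allocator()->init(TensorInfo(TensorShape(56U, 56U, 16U), 1, DataType::F32));
    weights.allocator()->init(TensorInfo(TensorShape(3U, 3U, 16U), 1, DataType::F32));
    biases.allocator()->init(TensorInfo(TensorShape(16U), 1, DataType::F32));
    output.allocator()->init(TensorInfo(TensorShape(28U, 28U, 16U), 1, DataType::F32));

    // Stride 2 and padding 1 in both dimensions: 56x56 -> 28x28.
    const PadStrideInfo conv_info(2, 2, 1, 1);

    CLDepthwiseConvolutionLayer3x3NCHWKernel kernel;
    kernel.set_target(CLScheduler::get().target());
    kernel.configure(&input, &weights, &biases, &output, conv_info);

    // Allocate after configure() so any padding requested by the kernel is taken into account.
    input.allocator()->allocate();
    weights.allocator()->allocate();
    biases.allocator()->allocate();
    output.allocator()->allocate();

    // ... fill input, weights and biases ...

    CLScheduler::get().enqueue(kernel);
    CLScheduler::get().sync();
}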

Constructor & Destructor Documentation

◆ CLDepthwiseConvolutionLayer3x3NCHWKernel()

Default constructor.

Definition at line 233 of file CLDepthwiseConvolutionLayer3x3NCHWKernel.cpp.

CLDepthwiseConvolutionLayer3x3NCHWKernel::CLDepthwiseConvolutionLayer3x3NCHWKernel()
    : _conv_stride_x(0), _conv_pad_top(0), _conv_pad_left(0)
{
}

Member Function Documentation

◆ border_size()

BorderSize border_size ( ) const
override virtual

The size of the border for that kernel.

Returns
The width in number of elements of the border.

Reimplemented from IKernel.

Definition at line 238 of file CLDepthwiseConvolutionLayer3x3NCHWKernel.cpp.

BorderSize CLDepthwiseConvolutionLayer3x3NCHWKernel::border_size() const
{
    return _border_size;
}

◆ configure()

void configure ( const ICLTensor *  input,
                 const ICLTensor *  weights,
                 const ICLTensor *  biases,
                 ICLTensor *  output,
                 const PadStrideInfo &  conv_info,
                 unsigned int  depth_multiplier = 1,
                 ActivationLayerInfo  act_info = ActivationLayerInfo(),
                 const Size2D &  dilation = Size2D(1U, 1U),
                 const ICLTensor *  output_multipliers = nullptr,
                 const ICLTensor *  output_shifts = nullptr 
               )
override virtual

Initialize the function's source, destination, conv and border_size.

Parameters
    [in]   input               Source tensor. Data type supported: QASYMM8/F16/F32.
    [in]   weights             Weights tensor. A 3D tensor with dimensions [3, 3, IFM]. Data type supported: Same as input, or QASYMM8/QSYMM8_PER_CHANNEL when input is QASYMM8.
    [in]   biases              Biases tensor. A 1D tensor with dimensions [IFM]. Must be nullptr if not needed. Data type supported: Same as input, or S32 when input is QASYMM8.
    [out]  output              Destination tensor. Data type supported: Same as input.
    [in]   conv_info           Padding and stride information to use for the convolution.
    [in]   depth_multiplier    (Optional) Multiplier to apply to the input's depth in order to retrieve the output's depth. Defaults to 1.
    [in]   act_info            (Optional) Activation layer information in case of a fused activation. Only RELU, BOUNDED_RELU and LU_BOUNDED_RELU are supported for QASYMM8.
    [in]   dilation            (Optional) Dilation, in elements, across x and y. Defaults to (1, 1).
    [in]   output_multipliers  (Optional) Output multipliers tensor for quantized computations. In case of per-channel quantization, the number of multipliers must be equal to the number of filters (IFM). Data type supported: S32.
    [in]   output_shifts       (Optional) Output shifts tensor for quantized computations. In case of per-channel quantization, the number of shifts must be equal to the number of filters (IFM). Data type supported: S32.

Implements ICLDepthwiseConvolutionLayer3x3Kernel.

Definition at line 243 of file CLDepthwiseConvolutionLayer3x3NCHWKernel.cpp.

void CLDepthwiseConvolutionLayer3x3NCHWKernel::configure(const ICLTensor *input, const ICLTensor *weights, const ICLTensor *biases, ICLTensor *output,
                                                         const PadStrideInfo &conv_info, unsigned int depth_multiplier, ActivationLayerInfo act_info, const Size2D &dilation,
                                                         const ICLTensor *output_multipliers, const ICLTensor *output_shifts)
{
    ARM_COMPUTE_ERROR_ON_NULLPTR(input, weights, output);
    ARM_COMPUTE_ERROR_THROW_ON(validate_arguments(input->info(), weights->info(), (biases != nullptr) ? biases->info() : nullptr, output->info(),
                                                  conv_info, depth_multiplier, act_info, dilation,
                                                  (output_multipliers != nullptr) ? output_multipliers->info() : nullptr,
                                                  (output_shifts != nullptr) ? output_shifts->info() : nullptr));

    _input              = input;
    _output             = output;
    _weights            = weights;
    _biases             = biases;
    _conv_stride_x      = conv_info.stride().first;
    _conv_stride_y      = conv_info.stride().second;
    _conv_pad_left      = conv_info.pad_left();
    _conv_pad_top       = conv_info.pad_top();
    _border_size        = BorderSize(_conv_pad_top, conv_info.pad_right(), conv_info.pad_bottom(), _conv_pad_left);
    _output_multipliers = output_multipliers;
    _output_shifts      = output_shifts;
    _is_quantized       = is_data_type_quantized_asymmetric(input->info()->data_type());

    // Configure kernel window
    std::string     kernel_name;
    const GPUTarget gpu_target = get_target();

    auto win_config = validate_and_configure_window(input->info(), weights->info(), output->info(), conv_info, depth_multiplier, gpu_target, kernel_name, dilation);
    ARM_COMPUTE_ERROR_THROW_ON(win_config.first);
    ICLKernel::configure_internal(win_config.second);

    // Set build options
    CLBuildOptions build_opts;
    build_opts.add_option("-DACTIVATION_TYPE=" + lower_string(string_from_activation_func(act_info.activation())));
    build_opts.add_option("-DDST_CHANNELS=" + support::cpp11::to_string(_output->info()->tensor_shape().z()));
    build_opts.add_option("-DDEPTH_MULTIPLIER=" + support::cpp11::to_string(depth_multiplier));
    build_opts.add_option("-DCONV_STRIDE_X=" + support::cpp11::to_string(_conv_stride_x));
    build_opts.add_option("-DDILATION_X=" + support::cpp11::to_string(dilation.x()));
    build_opts.add_option("-DDILATION_Y=" + support::cpp11::to_string(dilation.y()));
    build_opts.add_option_if(_biases != nullptr, "-DHAS_BIAS");

    if(_is_quantized)
    {
        const UniformQuantizationInfo iq_info = _input->info()->quantization_info().uniform();
        const UniformQuantizationInfo wq_info = _weights->info()->quantization_info().uniform();
        const UniformQuantizationInfo oq_info = _output->info()->quantization_info().uniform();

        const bool is_quantized_per_channel = is_data_type_quantized_per_channel(weights->info()->data_type());
        const bool is_dot8_supported        = dot8_supported(CLKernelLibrary::get().get_device()) && !is_quantized_per_channel;
        build_opts.add_option("-DCONV_STRIDE_Y=" + support::cpp11::to_string(_conv_stride_y));
        build_opts.add_option("-DINPUT_OFFSET=" + support::cpp11::to_string(-iq_info.offset));
        build_opts.add_option("-DWEIGHTS_OFFSET=" + support::cpp11::to_string(-wq_info.offset));
        build_opts.add_option("-DOUTPUT_OFFSET=" + support::cpp11::to_string(oq_info.offset));
        build_opts.add_option("-DK_OFFSET=" + support::cpp11::to_string(9 * iq_info.offset * wq_info.offset));
        build_opts.add_option_if(is_quantized_per_channel, "-DPER_CHANNEL_QUANTIZATION");
        build_opts.add_option_if(is_dot8_supported, "-DIS_DOT8");

        if(act_info.enabled())
        {
            const int a_val = quantize_qasymm8(act_info.a(), oq_info);
            const int b_val = quantize_qasymm8(act_info.b(), oq_info);
            const int o1    = oq_info.offset;

            build_opts.add_option("-DA_VAL=" + support::cpp11::to_string(a_val));
            build_opts.add_option("-DB_VAL=" + support::cpp11::to_string(b_val));
            build_opts.add_option("-DCONST_0=" + support::cpp11::to_string(o1));

            const float s1 = iq_info.scale;
            build_opts.add_option("-DS1_VAL=" + float_to_string_with_full_precision(s1));
            build_opts.add_option("-DO1_VAL=" + support::cpp11::to_string(o1));
        }

        build_opts.add_option("-DDATA_TYPE=" + get_cl_type_from_data_type(input->info()->data_type()));
        build_opts.add_option("-DWEIGHTS_TYPE=" + get_cl_type_from_data_type(weights->info()->data_type()));
        build_opts.add_option("-DWEIGHTS_PROMOTED_TYPE=" + get_cl_promoted_type_from_data_type(weights->info()->data_type()));
    }
    else
    {
        build_opts.add_option_if(act_info.enabled(), "-DA_VAL=" + float_to_string_with_full_precision(act_info.a()));
        build_opts.add_option_if(act_info.enabled(), "-DB_VAL=" + float_to_string_with_full_precision(act_info.b()));
        build_opts.add_option_if(act_info.enabled(), "-DDATA_TYPE=" + get_cl_type_from_data_type(input->info()->data_type()));
        build_opts.add_option("-DVEC_SIZE=" + support::cpp11::to_string(win_config.second.x().step()));
    }

    build_opts.add_option_if(input->info()->data_type() == DataType::F16, "-DIS_F16");
    build_opts.add_option_if(input->info()->data_type() == DataType::F32, "-DIS_F32");

    _kernel = static_cast<cl::Kernel>(CLKernelLibrary::get().create_kernel(kernel_name, build_opts.options()));

    // Set config_id for enabling LWS tuning
    _config_id = kernel_name;
    _config_id += "_";
    _config_id += lower_string(string_from_data_type(input->info()->data_type()));
    _config_id += "_";
    _config_id += support::cpp11::to_string(input->info()->dimension(0));
    _config_id += "_";
    _config_id += support::cpp11::to_string(input->info()->dimension(1));
    _config_id += "_";
    _config_id += support::cpp11::to_string(input->info()->dimension(2));
    _config_id += "_";
    _config_id += support::cpp11::to_string(output->info()->dimension(0));
    _config_id += "_";
    _config_id += support::cpp11::to_string(output->info()->dimension(1));
}

References arm_compute::test::validation::act_info, CLBuildOptions::add_option(), CLBuildOptions::add_option_if(), ARM_COMPUTE_ERROR_ON_NULLPTR, ARM_COMPUTE_ERROR_THROW_ON, arm_compute::test::validation::conv_info, arm_compute::create_kernel(), TensorInfo::data_type(), arm_compute::test::validation::dilation, ITensorInfo::dimension(), arm_compute::dot8_supported(), arm_compute::F16, arm_compute::F32, arm_compute::float_to_string_with_full_precision(), CLKernelLibrary::get(), arm_compute::get_cl_promoted_type_from_data_type(), arm_compute::get_cl_type_from_data_type(), ICLKernel::get_target(), ITensor::info(), CLTensor::info(), arm_compute::test::validation::input, arm_compute::is_data_type_quantized_asymmetric(), arm_compute::is_data_type_quantized_per_channel(), arm_compute::lower_string(), UniformQuantizationInfo::offset, CLBuildOptions::options(), arm_compute::quantize_qasymm8(), UniformQuantizationInfo::scale, arm_compute::string_from_activation_func(), arm_compute::string_from_data_type(), arm_compute::support::cpp11::to_string(), and arm_compute::test::validation::weights.
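
configure() derives the border this kernel needs from the padding in conv_info (see border_size()). When the kernel is used on its own, the input's border region has to be filled before run() is called; the CLDepthwiseConvolutionLayer runtime function takes care of this with a border-fill kernel. A hedged sketch, reusing the kernel and input tensor from the example in the Detailed Description (the constant border mode and zero fill value are assumptions for illustration):

#include "arm_compute/core/CL/kernels/CLFillBorderKernel.h"

// Fill the input border so that 3x3 taps falling into the padded region read a defined value.
CLFillBorderKernel fill_border;
fill_border.configure(&input, kernel.border_size(), BorderMode::CONSTANT);

CLScheduler::get().enqueue(fill_border);
CLScheduler::get().enqueue(kernel);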

◆ run()

void run ( const Window &  window,
           cl::CommandQueue &  queue 
         )
override virtual

Enqueue the OpenCL kernel to process the given window on the passed OpenCL command queue.

Note
The queue is not flushed by this method, and therefore the kernel will not have been executed by the time this method returns.
Parameters
    [in]      window  Region on which to execute the kernel. (Must be a valid region of the window returned by window()).
    [in,out]  queue   Command queue on which to enqueue the kernel.

Implements ICLKernel.

Definition at line 361 of file CLDepthwiseConvolutionLayer3x3NCHWKernel.cpp.

void CLDepthwiseConvolutionLayer3x3NCHWKernel::run(const Window &window, cl::CommandQueue &queue)
{
    ARM_COMPUTE_ERROR_ON_UNCONFIGURED_KERNEL(this);
    ARM_COMPUTE_ERROR_ON_INVALID_SUBWINDOW(IKernel::window(), window);

    Window collapsed = window.collapse_if_possible(ICLKernel::window(), Window::DimZ);

    // Create input window and adjust
    Window collapsed_in = collapsed;
    collapsed_in.adjust(Window::DimX, -_conv_pad_left, true);
    collapsed_in.adjust(Window::DimY, -_conv_pad_top, true);
    collapsed_in.set_dimension_step(Window::DimX, collapsed_in.x().step() * _conv_stride_x);
    collapsed_in.set_dimension_step(Window::DimY, collapsed_in.y().step() * _conv_stride_y);

    Window slice_in      = collapsed_in.first_slice_window_3D();
    Window slice_out     = collapsed.first_slice_window_3D();
    Window slice_weights = window.first_slice_window_3D();
    slice_weights.set_dimension_step(Window::DimX, 0);
    slice_weights.set_dimension_step(Window::DimY, 0);

    unsigned int idx = 3 * num_arguments_per_3D_tensor();

    // Set output multipliers in case of quantized data type
    if(_is_quantized)
    {
        Window slice;
        slice.use_tensor_dimensions(_output_multipliers->info()->tensor_shape());
        add_1D_tensor_argument(idx, _output_multipliers, slice);
        add_1D_tensor_argument(idx, _output_shifts, slice);
    }

    // Set biases
    if(_biases != nullptr)
    {
        Window slice_biases;
        slice_biases.use_tensor_dimensions(_biases->info()->tensor_shape());
        add_1D_tensor_argument(idx, _biases, slice_biases);
    }

    do
    {
        idx = 0;
        add_3D_tensor_argument(idx, _input, slice_in);
        add_3D_tensor_argument(idx, _output, slice_out);
        add_3D_tensor_argument(idx, _weights, slice_weights);

        enqueue(queue, *this, slice_out, lws_hint());
    }
    while(collapsed.slide_window_slice_3D(slice_out) && collapsed_in.slide_window_slice_3D(slice_in));
}

References ICLKernel::add_1D_tensor_argument(), ICLKernel::add_3D_tensor_argument(), Window::adjust(), ARM_COMPUTE_ERROR_ON_INVALID_SUBWINDOW, ARM_COMPUTE_ERROR_ON_UNCONFIGURED_KERNEL, Window::collapse_if_possible(), Window::DimX, Window::DimY, Window::DimZ, arm_compute::enqueue(), Window::first_slice_window_3D(), ICLKernel::lws_hint(), ICLKernel::num_arguments_per_3D_tensor(), Window::set_dimension_step(), arm_compute::test::validation::reference::slice(), Window::slide_window_slice_3D(), Window::Dimension::step(), Window::use_tensor_dimensions(), IKernel::window(), Window::x(), and Window::y().
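
run() is normally invoked through CLScheduler::get().enqueue(), which passes the kernel's maximum window and the scheduler's command queue. Dispatching by hand looks roughly like the sketch below, assuming a kernel configured as in the earlier example:

cl::CommandQueue &queue = CLScheduler::get().queue();

// Execute over the full window computed during configure().
kernel.run(kernel.window(), queue);

// run() only enqueues work; synchronise explicitly when the results are needed.
queue.finish();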

◆ validate()

Status validate ( const ITensorInfo *  input,
                  const ITensorInfo *  weights,
                  const ITensorInfo *  biases,
                  const ITensorInfo *  output,
                  const PadStrideInfo &  conv_info,
                  unsigned int  depth_multiplier = 1,
                  ActivationLayerInfo  act_info = ActivationLayerInfo(),
                  GPUTarget  gpu_target = GPUTarget::MIDGARD,
                  const Size2D &  dilation = Size2D(1U, 1U),
                  const ITensorInfo *  output_multipliers = nullptr,
                  const ITensorInfo *  output_shifts = nullptr 
)
static

Static function to check if given info will lead to a valid configuration of CLDepthwiseConvolutionLayer3x3NCHWKernel.

Parameters
    [in]   input               Source tensor info. Data type supported: F16/F32/QASYMM8.
    [in]   weights             Weights tensor info. A 3D tensor with dimensions [3, 3, IFM]. Data type supported: Same as input, or QASYMM8/QSYMM8_PER_CHANNEL when input is QASYMM8.
    [in]   biases              Biases tensor info. A 1D tensor with dimensions [IFM]. Must be nullptr if not needed. Data type supported: Same as input, or S32 when input is QASYMM8.
    [in]   output              Destination tensor info. Data type supported: Same as input.
    [in]   conv_info           Padding and stride information to use for the convolution.
    [in]   depth_multiplier    (Optional) Multiplier to apply to the input's depth in order to retrieve the output's depth. Defaults to 1.
    [in]   act_info            (Optional) Activation layer information in case of a fused activation. Only RELU, BOUNDED_RELU and LU_BOUNDED_RELU are supported.
    [in]   gpu_target          (Optional) GPU target to validate the kernel for. Defaults to midgard.
    [in]   dilation            (Optional) Dilation, in elements, across x and y. Defaults to (1, 1).
    [in]   output_multipliers  (Optional) Output multipliers tensor info for quantized computations. In case of per-channel quantization, the number of multipliers must be equal to the number of filters (IFM). Data type supported: S32.
    [in]   output_shifts       (Optional) Output shifts tensor info for quantized computations. In case of per-channel quantization, the number of shifts must be equal to the number of filters (IFM). Data type supported: S32.
Returns
a status

Definition at line 348 of file CLDepthwiseConvolutionLayer3x3NCHWKernel.cpp.

Status CLDepthwiseConvolutionLayer3x3NCHWKernel::validate(const ITensorInfo *input, const ITensorInfo *weights, const ITensorInfo *biases, const ITensorInfo *output,
                                                          const PadStrideInfo &conv_info, unsigned int depth_multiplier, ActivationLayerInfo act_info, GPUTarget gpu_target,
                                                          const Size2D &dilation, const ITensorInfo *output_multipliers, const ITensorInfo *output_shifts)
{
    std::string kernel_name;
    ARM_COMPUTE_RETURN_ON_ERROR(validate_arguments(input, weights, biases, output, conv_info, depth_multiplier, act_info, dilation, output_multipliers, output_shifts));
    ARM_COMPUTE_RETURN_ON_ERROR(validate_and_configure_window(input->clone().get(), weights->clone().get(), output->clone().get(),
                                                              conv_info, depth_multiplier, gpu_target, kernel_name, dilation)
                                .first);

    return Status{};
}

References arm_compute::test::validation::act_info, ARM_COMPUTE_RETURN_ON_ERROR, ICloneable< T >::clone(), arm_compute::test::validation::conv_info, arm_compute::test::validation::dilation, arm_compute::test::validation::input, and arm_compute::test::validation::weights.
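
A typical pattern is to call validate() with tensor metadata before allocating anything, so that unsupported configurations are rejected early. A minimal sketch under assumed shapes that mirror the configure() example above:

#include "arm_compute/core/CL/kernels/CLDepthwiseConvolutionLayer3x3NCHWKernel.h"
#include "arm_compute/core/TensorInfo.h"

using namespace arm_compute;

bool config_is_supported()
{
    const TensorInfo    input_info(TensorShape(56U, 56U, 16U), 1, DataType::F32);
    const TensorInfo    weights_info(TensorShape(3U, 3U, 16U), 1, DataType::F32);
    const TensorInfo    biases_info(TensorShape(16U), 1, DataType::F32);
    const TensorInfo    output_info(TensorShape(28U, 28U, 16U), 1, DataType::F32);
    const PadStrideInfo conv_info(2, 2, 1, 1);

    const Status status = CLDepthwiseConvolutionLayer3x3NCHWKernel::validate(&input_info, &weights_info, &biases_info, &output_info, conv_info);
    return bool(status); // true when the configuration is supported
}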


The documentation for this class was generated from the following files:

    CLDepthwiseConvolutionLayer3x3NCHWKernel.h
    CLDepthwiseConvolutionLayer3x3NCHWKernel.cpp