Compute Library 19.11
CLDepthwiseConvolutionLayer3x3NHWCKernel Class Reference

Interface for the kernel to run a 3x3 depthwise convolution on a tensor when the data layout is NHWC. More...

#include <CLDepthwiseConvolutionLayer3x3NHWCKernel.h>


Public Member Functions

 CLDepthwiseConvolutionLayer3x3NHWCKernel ()
 Default constructor. More...
 
void configure (const ICLTensor *input, const ICLTensor *weights, const ICLTensor *biases, ICLTensor *output, const PadStrideInfo &conv_info, unsigned int depth_multiplier=1, ActivationLayerInfo act_info=ActivationLayerInfo(), const Size2D &dilation=Size2D(1U, 1U), const ICLTensor *output_multipliers=nullptr, const ICLTensor *output_shifts=nullptr) override
 Initialize the function's source, destination, conv_info and border size. More...
 
void run (const Window &window, cl::CommandQueue &queue) override
 Enqueue the OpenCL kernel to process the given window on the passed OpenCL command queue. More...
 
BorderSize border_size () const override
 The size of the border for that kernel. More...
 
- Public Member Functions inherited from ICLDepthwiseConvolutionLayer3x3Kernel
 ICLDepthwiseConvolutionLayer3x3Kernel ()
 Default constructor. More...
 
 ICLDepthwiseConvolutionLayer3x3Kernel (const ICLDepthwiseConvolutionLayer3x3Kernel &)=delete
 Prevent instances of this class from being copied (As this class contains pointers) More...
 
ICLDepthwiseConvolutionLayer3x3Kernel & operator= (const ICLDepthwiseConvolutionLayer3x3Kernel &)=delete
 Prevent instances of this class from being copied (As this class contains pointers) More...
 
 ICLDepthwiseConvolutionLayer3x3Kernel (ICLDepthwiseConvolutionLayer3x3Kernel &&)=default
 Default Move Constructor. More...
 
ICLDepthwiseConvolutionLayer3x3Kernel & operator= (ICLDepthwiseConvolutionLayer3x3Kernel &&)=default
 Default move assignment operator. More...
 
- Public Member Functions inherited from ICLKernel
 ICLKernel ()
 Constructor. More...
 
cl::Kernel & kernel ()
 Returns a reference to the OpenCL kernel of this object. More...
 
template<typename T >
void add_1D_array_argument (unsigned int &idx, const ICLArray< T > *array, const Strides &strides, unsigned int num_dimensions, const Window &window)
 Add the passed 1D array's parameters to the object's kernel's arguments starting from the index idx. More...
 
void add_1D_tensor_argument (unsigned int &idx, const ICLTensor *tensor, const Window &window)
 Add the passed 1D tensor's parameters to the object's kernel's arguments starting from the index idx. More...
 
void add_1D_tensor_argument_if (bool cond, unsigned int &idx, const ICLTensor *tensor, const Window &window)
 Add the passed 1D tensor's parameters to the object's kernel's arguments starting from the index idx if the condition is true. More...
 
void add_2D_tensor_argument (unsigned int &idx, const ICLTensor *tensor, const Window &window)
 Add the passed 2D tensor's parameters to the object's kernel's arguments starting from the index idx. More...
 
void add_2D_tensor_argument_if (bool cond, unsigned int &idx, const ICLTensor *tensor, const Window &window)
 Add the passed 2D tensor's parameters to the object's kernel's arguments starting from the index idx if the condition is true. More...
 
void add_3D_tensor_argument (unsigned int &idx, const ICLTensor *tensor, const Window &window)
 Add the passed 3D tensor's parameters to the object's kernel's arguments starting from the index idx. More...
 
void add_4D_tensor_argument (unsigned int &idx, const ICLTensor *tensor, const Window &window)
 Add the passed 4D tensor's parameters to the object's kernel's arguments starting from the index idx. More...
 
template<typename T >
void add_argument (unsigned int &idx, T value)
 Add the passed parameters to the object's kernel's arguments starting from the index idx. More...
 
void set_lws_hint (const cl::NDRange &lws_hint)
 Set the Local-Workgroup-Size hint. More...
 
cl::NDRange lws_hint () const
 Return the Local-Workgroup-Size hint. More...
 
const std::string & config_id () const
 Get the configuration ID. More...
 
void set_target (GPUTarget target)
 Set the targeted GPU architecture. More...
 
void set_target (cl::Device &device)
 Set the targeted GPU architecture according to the CL device. More...
 
GPUTarget get_target () const
 Get the targeted GPU architecture. More...
 
size_t get_max_workgroup_size ()
 Get the maximum workgroup size for the device the CLKernelLibrary uses. More...
 
template<typename T , unsigned int dimension_size>
void add_array_argument (unsigned &idx, const ICLArray< T > *array, const Strides &strides, unsigned int num_dimensions, const Window &window)
 Add the passed array's parameters to the object's kernel's arguments starting from the index idx. More...
 
template<unsigned int dimension_size>
void add_tensor_argument (unsigned &idx, const ICLTensor *tensor, const Window &window)
 
- Public Member Functions inherited from IKernel
 IKernel ()
 Constructor. More...
 
virtual ~IKernel ()=default
 Destructor. More...
 
virtual bool is_parallelisable () const
 Indicates whether or not the kernel is parallelisable. More...
 
const Window & window () const
 The maximum window the kernel can be executed on. More...
 

Static Public Member Functions

static Status validate (const ITensorInfo *input, const ITensorInfo *weights, const ITensorInfo *biases, const ITensorInfo *output, const PadStrideInfo &conv_info, unsigned int depth_multiplier=1, ActivationLayerInfo act_info=ActivationLayerInfo(), const Size2D &dilation=Size2D(1U, 1U), const ITensorInfo *output_multipliers=nullptr, const ITensorInfo *output_shifts=nullptr)
 Static function to check if given info will lead to a valid configuration of CLDepthwiseConvolutionLayer3x3NHWCKernel. More...
 
- Static Public Member Functions inherited from ICLKernel
static constexpr unsigned int num_arguments_per_1D_array ()
 Returns the number of arguments enqueued per 1D array object. More...
 
static constexpr unsigned int num_arguments_per_1D_tensor ()
 Returns the number of arguments enqueued per 1D tensor object. More...
 
static constexpr unsigned int num_arguments_per_2D_tensor ()
 Returns the number of arguments enqueued per 2D tensor object. More...
 
static constexpr unsigned int num_arguments_per_3D_tensor ()
 Returns the number of arguments enqueued per 3D tensor object. More...
 
static constexpr unsigned int num_arguments_per_4D_tensor ()
 Returns the number of arguments enqueued per 4D tensor object. More...
 
static cl::NDRange gws_from_window (const Window &window)
 Get the global work size given an execution window. More...
 

Detailed Description

Interface for the kernel to run a 3x3 depthwise convolution on a tensor when the data layout is NHWC.

Definition at line 35 of file CLDepthwiseConvolutionLayer3x3NHWCKernel.h.

Constructor & Destructor Documentation

◆ CLDepthwiseConvolutionLayer3x3NHWCKernel()

Default constructor.

Definition at line 187 of file CLDepthwiseConvolutionLayer3x3NHWCKernel.cpp.

187 CLDepthwiseConvolutionLayer3x3NHWCKernel::CLDepthwiseConvolutionLayer3x3NHWCKernel()
188  : _num_rows_processed_per_iteration(1), _num_planes_processed_per_iteration(1)
189 {
190 }

Member Function Documentation

◆ border_size()

BorderSize border_size () const
override virtual

The size of the border for that kernel.

Returns
The width in number of elements of the border.

Reimplemented from IKernel.

Definition at line 192 of file CLDepthwiseConvolutionLayer3x3NHWCKernel.cpp.

193 {
194  return _border_size;
195 }

◆ configure()

void configure (const ICLTensor *input,
                const ICLTensor *weights,
                const ICLTensor *biases,
                ICLTensor *output,
                const PadStrideInfo &conv_info,
                unsigned int depth_multiplier = 1,
                ActivationLayerInfo act_info = ActivationLayerInfo(),
                const Size2D &dilation = Size2D(1U, 1U),
                const ICLTensor *output_multipliers = nullptr,
                const ICLTensor *output_shifts = nullptr
               )
override virtual

Initialize the function's source, destination, conv_info and border size.

Parameters
  [in]  input               Source tensor. Data type supported: QASYMM8.
  [in]  weights             Weights tensor. A 3D tensor with dimensions [IFM, 3, 3]. Data type supported: Same as input, or QASYMM8/QSYMM8_PER_CHANNEL when input is QASYMM8.
  [in]  biases              Biases tensor. A 1D tensor with dimensions [IFM]. Must be nullptr if not needed. Data type supported: Same as input, or S32 when input is QASYMM8.
  [out] output              Destination tensor. Data type supported: Same as input.
  [in]  conv_info           Padding and stride information to use for the convolution.
  [in]  depth_multiplier    (Optional) Multiplier to apply to the input's depth in order to retrieve the output's depth. Defaults to 1.
  [in]  act_info            (Optional) Activation layer information in case of a fused activation. Only RELU, BOUNDED_RELU and LU_BOUNDED_RELU are supported.
  [in]  dilation            (Optional) Dilation, in elements, across x and y. Defaults to (1, 1).
  [in]  output_multipliers  (Optional) Output multipliers tensor for quantized computations. In case of per-channel quantization, the number of multipliers must be equal to the number of filters (IFM). Supported data types: S32
  [in]  output_shifts       (Optional) Output shifts tensor for quantized computations. In case of per-channel quantization, the number of shifts must be equal to the number of filters (IFM). Supported data types: S32

Implements ICLDepthwiseConvolutionLayer3x3Kernel.

Definition at line 197 of file CLDepthwiseConvolutionLayer3x3NHWCKernel.cpp.

200 {
202  ARM_COMPUTE_ERROR_THROW_ON(validate_arguments(input->info(), weights->info(), (biases != nullptr) ? biases->info() : nullptr, output->info(),
203  conv_info, depth_multiplier, act_info, dilation,
204  (output_multipliers != nullptr) ? output_multipliers->info() : nullptr,
205  (output_shifts != nullptr) ? output_shifts->info() : nullptr));
206  auto win_config = validate_and_configure_window(input->info(), weights->info(), biases != nullptr ? biases->info() : nullptr, output->info(),
207  conv_info, depth_multiplier, dilation,
208  (output_multipliers != nullptr) ? output_multipliers->info() : nullptr,
209  (output_shifts != nullptr) ? output_shifts->info() : nullptr);
210  ARM_COMPUTE_ERROR_THROW_ON(win_config.first);
211 
212  const bool is_stride_1 = ((conv_info.stride().first == conv_info.stride().second) && (conv_info.stride().first == 1));
213  const bool is_stride_1_dilation_1 = (is_stride_1 && dilation.x() == 1 && dilation.y() == 1);
214 
215  const bool is_quantized_per_channel = is_data_type_quantized_per_channel(weights->info()->data_type());
216  const bool is_dot8_supported = dot8_supported(CLKernelLibrary::get().get_device()) && !is_quantized_per_channel;
217 
218  _input = input;
219  _output = output;
220  _weights = weights;
221  _biases = biases;
222  _conv_stride_y = conv_info.stride().second;
223  _num_rows_processed_per_iteration = is_stride_1_dilation_1 ? 2 : 1;
224  _num_planes_processed_per_iteration = is_stride_1_dilation_1 ? 2 : 1;
225  _output_multipliers = output_multipliers;
226  _output_shifts = output_shifts;
227  _is_quantized = is_data_type_quantized_asymmetric(input->info()->data_type());
228 
229  // If QASYMM8 and the 8 bit dot product is available, force _num_planes_processed_per_iteration to 1
230  if(is_dot8_supported && _is_quantized)
231  {
232  _num_planes_processed_per_iteration = 1;
233  }
234 
235  _border_size = BorderSize(_is_quantized && is_stride_1 ? 0 : conv_info.pad_left(), 0, std::max(std::max(conv_info.pad_right(), conv_info.pad_bottom()), conv_info.pad_top()), 0);
236 
237  const unsigned int num_elems_accessed_per_iteration = _is_quantized ? 4 : (8 / input->info()->element_size());
238 
239  CLBuildOptions build_opts;
240  build_opts.add_option("-DACTIVATION_TYPE=" + lower_string(string_from_activation_func(act_info.activation())));
241  build_opts.add_option_if(_biases != nullptr, "-DHAS_BIAS");
242  build_opts.add_option("-DVEC_SIZE=" + support::cpp11::to_string(num_elems_accessed_per_iteration));
243  build_opts.add_option("-DSRC_DIM_2=" + support::cpp11::to_string(_input->info()->dimension(2)));
244  build_opts.add_option("-DCONV_PAD_TOP=" + support::cpp11::to_string(conv_info.pad_top()));
245  build_opts.add_option("-DCONV_PAD_LEFT=" + support::cpp11::to_string(conv_info.pad_left()));
246  build_opts.add_option("-DDILATION_X=" + support::cpp11::to_string(dilation.x()));
247  build_opts.add_option("-DDILATION_Y=" + support::cpp11::to_string(dilation.y()));
248 
249  if(_is_quantized)
250  {
251  const UniformQuantizationInfo iq_info = _input->info()->quantization_info().uniform();
252  const UniformQuantizationInfo wq_info = _weights->info()->quantization_info().uniform();
253  const UniformQuantizationInfo oq_info = _output->info()->quantization_info().uniform();
254 
255  build_opts.add_option("-DSRC_DIM_1=" + support::cpp11::to_string(_input->info()->dimension(1)));
256  build_opts.add_option("-DINPUT_OFFSET=" + support::cpp11::to_string(-iq_info.offset));
257  build_opts.add_option("-DWEIGHTS_OFFSET=" + support::cpp11::to_string(-wq_info.offset));
258  build_opts.add_option("-DOUTPUT_OFFSET=" + support::cpp11::to_string(oq_info.offset));
259  build_opts.add_option("-DK_OFFSET=" + support::cpp11::to_string(9 * iq_info.offset * wq_info.offset));
260  build_opts.add_option_if(is_quantized_per_channel, "-DPER_CHANNEL_QUANTIZATION");
261  build_opts.add_option_if(is_dot8_supported, "-DIS_DOT8");
262 
263  if(act_info.enabled())
264  {
265  const int a_val = quantize_qasymm8(act_info.a(), oq_info);
266  const int b_val = quantize_qasymm8(act_info.b(), oq_info);
267  const int o1 = oq_info.offset;
268 
269  build_opts.add_option("-DA_VAL=" + support::cpp11::to_string(a_val));
270  build_opts.add_option("-DB_VAL=" + support::cpp11::to_string(b_val));
271  build_opts.add_option("-DCONST_0=" + support::cpp11::to_string(o1));
272 
273  const float s1 = iq_info.scale;
274  build_opts.add_option("-DS1_VAL=" + float_to_string_with_full_precision(s1));
275  build_opts.add_option("-DO1_VAL=" + support::cpp11::to_string(o1));
276  }
277 
278  build_opts.add_option("-DDATA_TYPE=" + get_cl_type_from_data_type(input->info()->data_type()));
279  build_opts.add_option("-DWEIGHTS_TYPE=" + get_cl_type_from_data_type(weights->info()->data_type()));
280  build_opts.add_option("-DWEIGHTS_PROMOTED_TYPE=" + get_cl_promoted_type_from_data_type(weights->info()->data_type()));
281  }
282  else
283  {
284  build_opts.add_option_if(act_info.enabled(), "-DA_VAL=" + float_to_string_with_full_precision(act_info.a()));
285  build_opts.add_option_if(act_info.enabled(), "-DB_VAL=" + float_to_string_with_full_precision(act_info.b()));
286  build_opts.add_option("-DDATA_TYPE=" + get_cl_type_from_data_type(_input->info()->data_type()));
287  }
288 
289  if(is_stride_1_dilation_1)
290  {
291  build_opts.add_option("-DNUM_ROWS_PROCESSED=" + support::cpp11::to_string(_num_rows_processed_per_iteration));
292  build_opts.add_option("-DNUM_PLANES_PROCESSED=" + support::cpp11::to_string(_num_planes_processed_per_iteration));
293  build_opts.add_option("-DDST_DIM_2=" + support::cpp11::to_string(_output->info()->dimension(2)));
294  }
295  else
296  {
297  build_opts.add_option("-DCONV_STRIDE_X=" + support::cpp11::to_string(conv_info.stride().first));
298  build_opts.add_option("-DCONV_STRIDE_Y=" + support::cpp11::to_string(_conv_stride_y));
299  }
300  build_opts.add_option_if(_input->info()->tensor_shape().total_size_upper(3) > 1,
301  "-DDST_DEPTH=" + support::cpp11::to_string(static_cast<int>(std::ceil(_output->info()->dimension(2) / static_cast<float>(_num_planes_processed_per_iteration)))));
302 
303  std::string kernel_name;
304  // Create kernel
305  if(_is_quantized)
306  {
307  kernel_name = std::string("dwc_3x3_reshaped_quantized8");
308  kernel_name += (is_dot8_supported && is_stride_1_dilation_1 ? "_dot8" : "");
309  kernel_name += (is_stride_1_dilation_1 ? "_stride1" : "");
310  kernel_name += "_nhwc";
311  }
312  else
313  {
314  kernel_name = std::string("depthwise_convolution_3x3_nhwc");
315  kernel_name += (is_stride_1_dilation_1 ? "_stride1" : "");
316  }
317 
318  build_opts.add_option_if(input->info()->data_type() == DataType::F16, "-DIS_F16");
319  build_opts.add_option_if(input->info()->data_type() == DataType::F32, "-DIS_F32");
320 
321  ICLKernel::configure_internal(win_config.second);
322  _kernel = static_cast<cl::Kernel>(CLKernelLibrary::get().create_kernel(kernel_name, build_opts.options()));
323 
324  // Set config_id for enabling LWS tuning
325  _config_id = kernel_name;
326  _config_id += "_";
327  _config_id += support::cpp11::to_string(input->info()->dimension(0));
328  _config_id += "_";
329  _config_id += support::cpp11::to_string(input->info()->dimension(1));
330  _config_id += "_";
331  _config_id += support::cpp11::to_string(input->info()->dimension(2));
332  _config_id += "_";
333  _config_id += support::cpp11::to_string(output->info()->dimension(0));
334  _config_id += "_";
335  _config_id += support::cpp11::to_string(output->info()->dimension(1));
336  _config_id += "_";
337  _config_id += string_from_data_type(input->info()->data_type());
338 }

References arm_compute::test::validation::act_info, CLBuildOptions::add_option(), CLBuildOptions::add_option_if(), ARM_COMPUTE_ERROR_ON_NULLPTR, ARM_COMPUTE_ERROR_THROW_ON, arm_compute::test::validation::conv_info, arm_compute::create_kernel(), TensorInfo::data_type(), arm_compute::test::validation::dilation, ITensorInfo::dimension(), arm_compute::dot8_supported(), arm_compute::F16, arm_compute::F32, arm_compute::float_to_string_with_full_precision(), CLKernelLibrary::get(), arm_compute::get_cl_promoted_type_from_data_type(), arm_compute::get_cl_type_from_data_type(), ITensor::info(), CLTensor::info(), arm_compute::test::validation::input, arm_compute::is_data_type_quantized_asymmetric(), arm_compute::is_data_type_quantized_per_channel(), arm_compute::lower_string(), UniformQuantizationInfo::offset, CLBuildOptions::options(), arm_compute::quantize_qasymm8(), UniformQuantizationInfo::scale, arm_compute::string_from_activation_func(), arm_compute::string_from_data_type(), arm_compute::support::cpp11::to_string(), and arm_compute::test::validation::weights.
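
The following is a minimal configuration sketch added for illustration; it is not taken from the library's own tests. The tensor shapes, the helper name configure_dwc_nhwc_example and the choice of F32 data are assumptions (F32 is used for brevity; the build options above also show the IS_F16/IS_F32 and quantized paths). Applications would normally configure this kernel indirectly through the CLDepthwiseConvolutionLayer runtime function.

    // Sketch: configure the kernel directly on NHWC F32 tensors (hypothetical shapes).
    #include "arm_compute/core/CL/kernels/CLDepthwiseConvolutionLayer3x3NHWCKernel.h"
    #include "arm_compute/core/TensorInfo.h"
    #include "arm_compute/core/Types.h"
    #include "arm_compute/runtime/CL/CLScheduler.h"
    #include "arm_compute/runtime/CL/CLTensor.h"

    using namespace arm_compute;

    void configure_dwc_nhwc_example()
    {
        CLScheduler::get().default_init(); // create the default CL context and queue

        // NHWC tensors: Compute Library shapes are given as (C, W, H[, N])
        TensorInfo src_info(TensorShape(32U, 56U, 56U), 1, DataType::F32);
        TensorInfo wei_info(TensorShape(32U, 3U, 3U), 1, DataType::F32); // [IFM, 3, 3]
        TensorInfo bia_info(TensorShape(32U), 1, DataType::F32);         // [IFM]
        TensorInfo dst_info(TensorShape(32U, 56U, 56U), 1, DataType::F32);
        src_info.set_data_layout(DataLayout::NHWC);
        wei_info.set_data_layout(DataLayout::NHWC);
        dst_info.set_data_layout(DataLayout::NHWC);

        CLTensor input, weights, biases, output;
        input.allocator()->init(src_info);
        weights.allocator()->init(wei_info);
        biases.allocator()->init(bia_info);
        output.allocator()->init(dst_info);

        CLDepthwiseConvolutionLayer3x3NHWCKernel kernel;
        kernel.configure(&input, &weights, &biases, &output,
                         PadStrideInfo(1 /* stride_x */, 1 /* stride_y */, 1 /* pad_x */, 1 /* pad_y */),
                         1 /* depth_multiplier */,
                         ActivationLayerInfo(ActivationLayerInfo::ActivationFunction::RELU));

        // Allocate backing CL buffers once the configuration is done
        input.allocator()->allocate();
        weights.allocator()->allocate();
        biases.allocator()->allocate();
        output.allocator()->allocate();
    }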

◆ run()

void run (const Window &window,
          cl::CommandQueue &queue
         )
override virtual

Enqueue the OpenCL kernel to process the given window on the passed OpenCL command queue.

Note
The queue is not flushed by this method, and therefore the kernel will not have been executed by the time this method returns.
Parameters
  [in]      window  Region on which to execute the kernel. (Must be a valid region of the window returned by window()).
  [in,out]  queue   Command queue on which to enqueue the kernel.

Implements ICLKernel.

Definition at line 355 of file CLDepthwiseConvolutionLayer3x3NHWCKernel.cpp.

356 {
359 
360  // Collapse window
361  Window window_collapsed = window.collapse_if_possible(ICLKernel::window(), Window::DimZ);
362  const size_t total_batches = _input->info()->tensor_shape().total_size_upper(3);
363 
364  Window win = window_collapsed;
365  win.set(Window::DimZ, Window::Dimension(0, std::ceil(_output->info()->dimension(2) / static_cast<float>(_num_planes_processed_per_iteration)) * total_batches, 1));
366 
367  // Create input window and adjust
368  Window win_in = win;
369  win_in.set_dimension_step(Window::DimY, _num_rows_processed_per_iteration);
370  win_in.set_dimension_step(Window::DimZ, _conv_stride_y);
371 
372  ARM_COMPUTE_ERROR_ON((win_in.y().step() < window.y().step()) || (win_in.z().step() < window.z().step()));
373 
374  Window slice_in = win_in.first_slice_window_4D();
375  Window slice_out = win.first_slice_window_4D();
376 
377  unsigned int idx = 2 * num_arguments_per_4D_tensor() + (_is_quantized ? num_arguments_per_2D_tensor() : num_arguments_per_3D_tensor());
378 
379  if(_is_quantized)
380  {
381  Window slice;
382  slice.use_tensor_dimensions(_output_multipliers->info()->tensor_shape());
383  slice.set_dimension_step(Window::DimX, window.x().step());
384  add_1D_tensor_argument(idx, _output_multipliers, slice);
385  add_1D_tensor_argument(idx, _output_shifts, slice);
386  }
387 
388  if(_biases != nullptr)
389  {
390  Window win_biases;
391  win_biases.use_tensor_dimensions(_biases->info()->tensor_shape());
392  win_biases.set_dimension_step(Window::DimX, window.x().step());
393  add_1D_tensor_argument(idx, _biases, win_biases);
394  }
395 
396  // Calculate the max_offset.
397  // max_offset is the offset for the last NOT valid value in the Z dimension (spatial dimension Y for NHWC)
398  // |******************|
399  // | pad_top |
400  // |******************|
401  // | |
402  // | plane0 |
403  // | batch0 |
404  // |__________________|
405  // |******************| Batch 0
406  // | pad_bottom |
407  // | pad_top |
408  // |******************|
409  // | |
410  // | plane1 |
411  // | batch0 |
412  // |__________________|-----> max_offset
413  // |******************|
414  // | pad_bottom |
415  // | pad_top |
416  // |******************|
417  // | |
418  // | plane0 |
419  // | batch1 |
420  // |__________________|
421  // |******************| Batch 1
422  // | pad_bottom |
423  // | pad_top |
424  // |******************|
425  // | |
426  // | plane1 |
427  // | batch1 |
428  // |__________________|
429  // | pad_bottom |
430  // |******************|
431  const int max_offset = _input->info()->strides_in_bytes().z() * _input->info()->dimension(2) - (_input->info()->padding().bottom + _input->info()->padding().top) *
432  _input->info()->strides_in_bytes().y();
433  _kernel.setArg(idx, max_offset);
434 
435  do
436  {
437  unsigned int idx = 0;
438  add_4D_tensor_argument(idx, _input, slice_in);
439  add_4D_tensor_argument(idx, _output, slice_out);
440  if(_is_quantized)
441  {
442  add_2D_tensor_argument(idx, _weights, slice_out);
443  }
444  else
445  {
446  add_3D_tensor_argument(idx, _weights, slice_out);
447  }
448  enqueue(queue, *this, slice_out, lws_hint());
449  }
450  while(win.slide_window_slice_4D(slice_out) && win_in.slide_window_slice_4D(slice_in));
451 }

References ICLKernel::add_1D_tensor_argument(), ICLKernel::add_2D_tensor_argument(), ICLKernel::add_3D_tensor_argument(), ICLKernel::add_4D_tensor_argument(), ARM_COMPUTE_ERROR_ON, ARM_COMPUTE_ERROR_ON_INVALID_SUBWINDOW, ARM_COMPUTE_ERROR_ON_UNCONFIGURED_KERNEL, Window::collapse_if_possible(), Window::DimX, Window::DimY, Window::DimZ, arm_compute::enqueue(), Window::first_slice_window_4D(), ICLKernel::lws_hint(), ICLKernel::num_arguments_per_2D_tensor(), ICLKernel::num_arguments_per_3D_tensor(), ICLKernel::num_arguments_per_4D_tensor(), Window::set(), Window::set_dimension_step(), arm_compute::test::validation::reference::slice(), Window::slide_window_slice_4D(), Window::Dimension::step(), Window::use_tensor_dimensions(), IKernel::window(), Window::x(), Window::y(), and Window::z().
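
A minimal dispatch sketch, assuming the kernel has been configured as shown above and all tensors are allocated. CLScheduler::get().enqueue() runs an ICLKernel over its full execution window on the scheduler's command queue; calling run() directly requires a valid sub-window of window() and an explicit cl::CommandQueue. The helper name is hypothetical.

    // Sketch: enqueue the configured kernel and wait for completion.
    #include "arm_compute/runtime/CL/CLScheduler.h"

    void run_dwc_nhwc_example(arm_compute::CLDepthwiseConvolutionLayer3x3NHWCKernel &kernel)
    {
        // Let the scheduler enqueue the kernel (second argument flushes the queue)
        arm_compute::CLScheduler::get().enqueue(kernel, true);

        // Equivalent manual call over the maximum execution window:
        // kernel.run(kernel.window(), arm_compute::CLScheduler::get().queue());

        arm_compute::CLScheduler::get().sync(); // block until the queue has finished
    }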

◆ validate()

Status validate (const ITensorInfo *input,
                 const ITensorInfo *weights,
                 const ITensorInfo *biases,
                 const ITensorInfo *output,
                 const PadStrideInfo &conv_info,
                 unsigned int depth_multiplier = 1,
                 ActivationLayerInfo act_info = ActivationLayerInfo(),
                 const Size2D &dilation = Size2D(1U, 1U),
                 const ITensorInfo *output_multipliers = nullptr,
                 const ITensorInfo *output_shifts = nullptr
                )
static

Static function to check if given info will lead to a valid configuration of CLDepthwiseConvolutionLayer3x3NHWCKernel.

Parameters
  [in]  input               Source tensor info. Data type supported: QASYMM8.
  [in]  weights             Weights tensor info. A 3D tensor with dimensions [IFM, 3, 3]. Data type supported: Same as input, or QASYMM8/QSYMM8_PER_CHANNEL when input is QASYMM8.
  [in]  biases              Biases tensor info. A 1D tensor with dimensions [IFM]. Must be nullptr if not needed. Data type supported: Same as input, or S32 when input is QASYMM8.
  [in]  output              Destination tensor info. Data type supported: Same as input.
  [in]  conv_info           Padding and stride information to use for the convolution.
  [in]  depth_multiplier    (Optional) Multiplier to apply to the input's depth in order to retrieve the output's depth. Defaults to 1.
  [in]  act_info            (Optional) Activation layer information in case of a fused activation. Only RELU, BOUNDED_RELU and LU_BOUNDED_RELU are supported.
  [in]  dilation            (Optional) Dilation, in elements, across x and y. Defaults to (1, 1).
  [in]  output_multipliers  (Optional) Output multipliers tensor info for quantized computations. In case of per-channel quantization, the number of multipliers must be equal to the number of filters (IFM). Supported data types: S32
  [in]  output_shifts       (Optional) Output shifts tensor info for quantized computations. In case of per-channel quantization, the number of shifts must be equal to the number of filters (IFM). Supported data types: S32
Returns
a status

Definition at line 340 of file CLDepthwiseConvolutionLayer3x3NHWCKernel.cpp.

343 {
344  ARM_COMPUTE_RETURN_ON_ERROR(validate_arguments(input, weights, biases, output, conv_info, depth_multiplier, act_info, dilation, output_multipliers, output_shifts));
345  ARM_COMPUTE_RETURN_ON_ERROR(validate_and_configure_window(input->clone().get(), weights->clone().get(),
346  biases != nullptr ? biases->clone().get() : nullptr,
347  output->clone().get(), conv_info, depth_multiplier, dilation,
348  (output_multipliers != nullptr) ? output_multipliers->clone().get() : nullptr,
349  (output_shifts != nullptr) ? output_shifts->clone().get() : nullptr)
350  .first);
351 
352  return Status{};
353 }

References arm_compute::test::validation::act_info, ARM_COMPUTE_RETURN_ON_ERROR, ICloneable< T >::clone(), arm_compute::test::validation::conv_info, arm_compute::test::validation::dilation, arm_compute::test::validation::input, and arm_compute::test::validation::weights.
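
A minimal validation sketch with hypothetical shapes: only ITensorInfo metadata is needed, so the check runs without allocating tensors or touching the GPU. The helper name is an assumption added for illustration.

    // Sketch: check whether a configuration would be accepted before building anything.
    #include "arm_compute/core/CL/kernels/CLDepthwiseConvolutionLayer3x3NHWCKernel.h"
    #include "arm_compute/core/TensorInfo.h"
    #include "arm_compute/core/Types.h"

    using namespace arm_compute;

    bool dwc_nhwc_config_is_valid()
    {
        TensorInfo input(TensorShape(32U, 56U, 56U), 1, DataType::F32);
        TensorInfo weights(TensorShape(32U, 3U, 3U), 1, DataType::F32); // [IFM, 3, 3]
        TensorInfo output(TensorShape(32U, 56U, 56U), 1, DataType::F32);
        input.set_data_layout(DataLayout::NHWC);
        weights.set_data_layout(DataLayout::NHWC);
        output.set_data_layout(DataLayout::NHWC);

        const Status status = CLDepthwiseConvolutionLayer3x3NHWCKernel::validate(
            &input, &weights, nullptr /* no biases */, &output,
            PadStrideInfo(1, 1, 1, 1), 1 /* depth_multiplier */);

        return bool(status); // Status converts to true when no error was reported
    }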


The documentation for this class was generated from the following files:
CLDepthwiseConvolutionLayer3x3NHWCKernel.h
CLDepthwiseConvolutionLayer3x3NHWCKernel.cpp