Compute Library
 19.08
CLDepthwiseConvolutionLayer3x3NHWCKernel Class Reference

Interface for the kernel to run a 3x3 depthwise convolution on a tensor when the data layout is NHWC. More...

#include <CLDepthwiseConvolutionLayer3x3NHWCKernel.h>

Collaboration diagram for CLDepthwiseConvolutionLayer3x3NHWCKernel:
[legend]

Public Member Functions

 CLDepthwiseConvolutionLayer3x3NHWCKernel ()
 Default constructor. More...
 
void configure (const ICLTensor *input, const ICLTensor *weights, const ICLTensor *biases, ICLTensor *output, const PadStrideInfo &conv_info, unsigned int depth_multiplier, ActivationLayerInfo act_info, const Size2D &dilation) override
 Default move assignment operator. More...
 
void run (const Window &window, cl::CommandQueue &queue) override
 Enqueue the OpenCL kernel to process the given window on the passed OpenCL command queue. More...
 
BorderSize border_size () const override
 The size of the border for that kernel. More...
 
- Public Member Functions inherited from ICLDepthwiseConvolutionLayer3x3Kernel
 ICLDepthwiseConvolutionLayer3x3Kernel ()
 Default constructor. More...
 
 ICLDepthwiseConvolutionLayer3x3Kernel (const ICLDepthwiseConvolutionLayer3x3Kernel &)=delete
 Prevent instances of this class from being copied (As this class contains pointers) More...
 
ICLDepthwiseConvolutionLayer3x3Kerneloperator= (const ICLDepthwiseConvolutionLayer3x3Kernel &)=delete
 Prevent instances of this class from being copied (As this class contains pointers) More...
 
 ICLDepthwiseConvolutionLayer3x3Kernel (ICLDepthwiseConvolutionLayer3x3Kernel &&)=default
 Default Move Constructor. More...
 
ICLDepthwiseConvolutionLayer3x3Kerneloperator= (ICLDepthwiseConvolutionLayer3x3Kernel &&)=default
 Default move assignment operator. More...
 
- Public Member Functions inherited from ICLKernel
 ICLKernel ()
 Constructor. More...
 
cl::Kernel & kernel ()
 Returns a reference to the OpenCL kernel of this object. More...
 
template<typename T >
void add_1D_array_argument (unsigned int &idx, const ICLArray< T > *array, const Strides &strides, unsigned int num_dimensions, const Window &window)
 Add the passed 1D array's parameters to the object's kernel's arguments starting from the index idx. More...
 
void add_1D_tensor_argument (unsigned int &idx, const ICLTensor *tensor, const Window &window)
 Add the passed 1D tensor's parameters to the object's kernel's arguments starting from the index idx. More...
 
void add_1D_tensor_argument_if (bool cond, unsigned int &idx, const ICLTensor *tensor, const Window &window)
 Add the passed 1D tensor's parameters to the object's kernel's arguments starting from the index idx if the condition is true. More...
 
void add_2D_tensor_argument (unsigned int &idx, const ICLTensor *tensor, const Window &window)
 Add the passed 2D tensor's parameters to the object's kernel's arguments starting from the index idx. More...
 
void add_2D_tensor_argument_if (bool cond, unsigned int &idx, const ICLTensor *tensor, const Window &window)
 Add the passed 2D tensor's parameters to the object's kernel's arguments starting from the index idx if the condition is true. More...
 
void add_3D_tensor_argument (unsigned int &idx, const ICLTensor *tensor, const Window &window)
 Add the passed 3D tensor's parameters to the object's kernel's arguments starting from the index idx. More...
 
void add_4D_tensor_argument (unsigned int &idx, const ICLTensor *tensor, const Window &window)
 Add the passed 4D tensor's parameters to the object's kernel's arguments starting from the index idx. More...
 
template<typename T >
void add_argument (unsigned int &idx, T value)
 Add the passed parameters to the object's kernel's arguments starting from the index idx. More...
 
void set_lws_hint (const cl::NDRange &lws_hint)
 Set the Local-Workgroup-Size hint. More...
 
cl::NDRange lws_hint () const
 Return the Local-Workgroup-Size hint. More...
 
const std::string & config_id () const
 Get the configuration ID. More...
 
void set_target (GPUTarget target)
 Set the targeted GPU architecture. More...
 
void set_target (cl::Device &device)
 Set the targeted GPU architecture according to the CL device. More...
 
GPUTarget get_target () const
 Get the targeted GPU architecture. More...
 
size_t get_max_workgroup_size ()
 Get the maximum workgroup size for the device the CLKernelLibrary uses. More...
 
template<typename T , unsigned int dimension_size>
void add_array_argument (unsigned &idx, const ICLArray< T > *array, const Strides &strides, unsigned int num_dimensions, const Window &window)
 Add the passed array's parameters to the object's kernel's arguments starting from the index idx. More...
 
template<unsigned int dimension_size>
void add_tensor_argument (unsigned &idx, const ICLTensor *tensor, const Window &window)
 
- Public Member Functions inherited from IKernel
 IKernel ()
 Constructor. More...
 
virtual ~IKernel ()=default
 Destructor. More...
 
virtual bool is_parallelisable () const
 Indicates whether or not the kernel is parallelisable. More...
 
const Windowwindow () const
 The maximum window the kernel can be executed on. More...
 

Static Public Member Functions

static Status validate (const ITensorInfo *input, const ITensorInfo *weights, const ITensorInfo *biases, const ITensorInfo *output, const PadStrideInfo &conv_info, unsigned int depth_multiplier, ActivationLayerInfo act_info=ActivationLayerInfo(), const Size2D &dilation=Size2D(1U, 1U))
 Static function to check if given info will lead to a valid configuration of CLDepthwiseConvolutionLayer3x3NHWCKernel. More...
 
- Static Public Member Functions inherited from ICLKernel
static constexpr unsigned int num_arguments_per_1D_array ()
 Returns the number of arguments enqueued per 1D array object. More...
 
static constexpr unsigned int num_arguments_per_1D_tensor ()
 Returns the number of arguments enqueued per 1D tensor object. More...
 
static constexpr unsigned int num_arguments_per_2D_tensor ()
 Returns the number of arguments enqueued per 2D tensor object. More...
 
static constexpr unsigned int num_arguments_per_3D_tensor ()
 Returns the number of arguments enqueued per 3D tensor object. More...
 
static constexpr unsigned int num_arguments_per_4D_tensor ()
 Returns the number of arguments enqueued per 4D tensor object. More...
 
static cl::NDRange gws_from_window (const Window &window)
 Get the global work size given an execution window. More...
 

Detailed Description

Interface for the kernel to run a 3x3 depthwise convolution on a tensor when the data layout is NHWC.

Definition at line 35 of file CLDepthwiseConvolutionLayer3x3NHWCKernel.h.

Constructor & Destructor Documentation

◆ CLDepthwiseConvolutionLayer3x3NHWCKernel()

Default constructor.

Definition at line 168 of file CLDepthwiseConvolutionLayer3x3NHWCKernel.cpp.

169  : _num_rows_processed_per_iteration(1), _num_planes_processed_per_iteration(1)
170 {
171 }

Member Function Documentation

◆ border_size()

BorderSize border_size ( ) const
overridevirtual

The size of the border for that kernel.

Returns
The width in number of elements of the border.

Reimplemented from IKernel.

Definition at line 173 of file CLDepthwiseConvolutionLayer3x3NHWCKernel.cpp.

174 {
175  return _border_size;
176 }

◆ configure()

void configure ( const ICLTensor input,
const ICLTensor weights,
const ICLTensor biases,
ICLTensor output,
const PadStrideInfo conv_info,
unsigned int  depth_multiplier,
ActivationLayerInfo  act_info,
const Size2D dilation 
)
overridevirtual

Default move assignment operator.

Initialize the function's source, destination, conv and border_size.

Parameters
[in]inputSource tensor. DataType supported: QASYMM8.
[in]weightsWeights tensor. A 3D tensor with dimensions [IFM, 3, 3]. Data type supported: Same as input.
[in]biasesBiases tensor. A 1D tensor with dimensions [IFM]. Must be nullptr if not needed. Data type supported: Same as input.
[out]outputDestination tensor. Data type supported: Same as input.
[in]conv_infoPadding and stride information to use for the convolution.
[in]depth_multiplier(Optional) Multiplier to apply to the input's depth in order to retrieve the output's depth. Defaults to 1.
[in]act_info(Optional) Activation layer information in case of a fused activation. Only RELU, BOUNDED_RELU and LU_BOUNDED_RELU are supported.
[in]dilation(Optional) Dilation, in elements, across x and y. Defaults to (1, 1).

Implements ICLDepthwiseConvolutionLayer3x3Kernel.

Definition at line 178 of file CLDepthwiseConvolutionLayer3x3NHWCKernel.cpp.

180 {
181  ARM_COMPUTE_ERROR_ON_NULLPTR(input, weights, output);
182  ARM_COMPUTE_ERROR_THROW_ON(validate_arguments(input->info(), weights->info(), (biases != nullptr) ? biases->info() : nullptr, output->info(), conv_info, depth_multiplier, act_info, dilation));
183  auto win_config = validate_and_configure_window(input->info(), weights->info(), biases != nullptr ? biases->info() : nullptr, output->info(), conv_info, depth_multiplier, dilation);
184  ARM_COMPUTE_ERROR_THROW_ON(win_config.first);
185 
186  const bool is_qasymm = is_data_type_quantized_asymmetric(input->info()->data_type());
187  const bool is_stride_1 = ((conv_info.stride().first == conv_info.stride().second) && (conv_info.stride().first == 1));
188  const bool is_stride_1_dilation_1 = (is_stride_1 && dilation.x() == 1 && dilation.y() == 1);
189 
190  const bool is_dot8_supported = dot8_supported(CLKernelLibrary::get().get_device());
191 
192  _input = input;
193  _output = output;
194  _weights = weights;
195  _biases = biases;
196  _conv_stride_y = conv_info.stride().second;
197  _num_rows_processed_per_iteration = is_stride_1_dilation_1 ? 2 : 1;
198  _num_planes_processed_per_iteration = is_stride_1_dilation_1 ? 2 : 1;
199 
200  // If QASYMM8 and the 8 bit dot product is available, force _num_planes_processed_per_iteration to 1
201  if(is_dot8_supported && is_qasymm)
202  {
203  _num_planes_processed_per_iteration = 1;
204  }
205 
206  _border_size = BorderSize(is_qasymm && is_stride_1 ? 0 : conv_info.pad_left(), 0, std::max(std::max(conv_info.pad_right(), conv_info.pad_bottom()), conv_info.pad_top()), 0);
207 
208  const unsigned int num_elems_accessed_per_iteration = is_qasymm ? 4 : (8 / input->info()->element_size());
209 
210  CLBuildOptions build_opts;
211  build_opts.add_option("-DACTIVATION_TYPE=" + lower_string(string_from_activation_func(act_info.activation())));
212  build_opts.add_option_if(_biases != nullptr, "-DHAS_BIAS");
213  build_opts.add_option("-DVEC_SIZE=" + support::cpp11::to_string(num_elems_accessed_per_iteration));
214  build_opts.add_option("-DSRC_DIM_2=" + support::cpp11::to_string(_input->info()->dimension(2)));
215  build_opts.add_option("-DCONV_PAD_TOP=" + support::cpp11::to_string(conv_info.pad_top()));
216  build_opts.add_option("-DCONV_PAD_LEFT=" + support::cpp11::to_string(conv_info.pad_left()));
217  build_opts.add_option("-DDILATION_X=" + support::cpp11::to_string(dilation.x()));
218  build_opts.add_option("-DDILATION_Y=" + support::cpp11::to_string(dilation.y()));
219 
220  if(is_qasymm)
221  {
222  const UniformQuantizationInfo iq_info = _input->info()->quantization_info().uniform();
223  const UniformQuantizationInfo wq_info = _weights->info()->quantization_info().uniform();
224  const UniformQuantizationInfo oq_info = _output->info()->quantization_info().uniform();
225 
226  float multiplier = iq_info.scale * wq_info.scale / oq_info.scale;
227  int output_multiplier = 0;
228  int output_shift = 0;
229  quantization::calculate_quantized_multiplier_less_than_one(multiplier, &output_multiplier, &output_shift);
230 
231  build_opts.add_option("-DSRC_DIM_1=" + support::cpp11::to_string(_input->info()->dimension(1)));
232  build_opts.add_option("-DINPUT_OFFSET=" + support::cpp11::to_string(-iq_info.offset));
233  build_opts.add_option("-DWEIGHTS_OFFSET=" + support::cpp11::to_string(-wq_info.offset));
234  build_opts.add_option("-DOUTPUT_OFFSET=" + support::cpp11::to_string(oq_info.offset));
235  build_opts.add_option("-DK_OFFSET=" + support::cpp11::to_string(9 * iq_info.offset * wq_info.offset));
236  build_opts.add_option("-DOUTPUT_MULTIPLIER=" + support::cpp11::to_string(output_multiplier));
237  build_opts.add_option("-DOUTPUT_SHIFT=" + support::cpp11::to_string(output_shift));
238 
239  if(act_info.enabled())
240  {
241  const int a_val = quantize_qasymm8(act_info.a(), oq_info);
242  const int b_val = quantize_qasymm8(act_info.b(), oq_info);
243  const int o1 = oq_info.offset;
244 
245  build_opts.add_option("-DA_VAL=" + support::cpp11::to_string(a_val));
246  build_opts.add_option("-DB_VAL=" + support::cpp11::to_string(b_val));
247  build_opts.add_option("-DCONST_0=" + support::cpp11::to_string(o1));
248 
249  const float s1 = iq_info.scale;
250  build_opts.add_option("-DS1_VAL=" + float_to_string_with_full_precision(s1));
251  build_opts.add_option("-DO1_VAL=" + support::cpp11::to_string(o1));
252  }
253  }
254  else
255  {
256  build_opts.add_option_if(act_info.enabled(), "-DA_VAL=" + float_to_string_with_full_precision(act_info.a()));
257  build_opts.add_option_if(act_info.enabled(), "-DB_VAL=" + float_to_string_with_full_precision(act_info.b()));
258  build_opts.add_option("-DDATA_TYPE=" + get_cl_type_from_data_type(_input->info()->data_type()));
259  }
260 
261  if(is_stride_1_dilation_1)
262  {
263  build_opts.add_option("-DNUM_ROWS_PROCESSED=" + support::cpp11::to_string(_num_rows_processed_per_iteration));
264  build_opts.add_option("-DNUM_PLANES_PROCESSED=" + support::cpp11::to_string(_num_planes_processed_per_iteration));
265  build_opts.add_option("-DDST_DIM_2=" + support::cpp11::to_string(_output->info()->dimension(2)));
266  }
267  else
268  {
269  build_opts.add_option("-DCONV_STRIDE_X=" + support::cpp11::to_string(conv_info.stride().first));
270  build_opts.add_option("-DCONV_STRIDE_Y=" + support::cpp11::to_string(_conv_stride_y));
271  }
272  build_opts.add_option_if(_input->info()->tensor_shape().total_size_upper(3) > 1,
273  "-DDST_DEPTH=" + support::cpp11::to_string(static_cast<int>(std::ceil(_output->info()->dimension(2) / static_cast<float>(_num_planes_processed_per_iteration)))));
274 
275  std::string kernel_name;
276  // Create kernel
277  if(is_qasymm)
278  {
279  kernel_name = std::string("dwc_3x3_reshaped_qasymm8");
280  kernel_name += (is_dot8_supported && is_stride_1_dilation_1 ? "_dot8" : "");
281  kernel_name += (is_stride_1_dilation_1 ? "_stride1" : "");
282  kernel_name += "_nhwc";
283  }
284  else
285  {
286  kernel_name = std::string("depthwise_convolution_3x3_nhwc");
287  kernel_name += (is_stride_1_dilation_1 ? "_stride1" : "");
288  }
289 
290  build_opts.add_option_if(input->info()->data_type() == DataType::F16, "-DIS_F16");
291  build_opts.add_option_if(input->info()->data_type() == DataType::F32, "-DIS_F32");
292 
293  ICLKernel::configure_internal(win_config.second);
294  _kernel = static_cast<cl::Kernel>(CLKernelLibrary::get().create_kernel(kernel_name, build_opts.options()));
295 
296  // Set config_id for enabling LWS tuning
297  _config_id = kernel_name;
298  _config_id += "_";
299  _config_id += support::cpp11::to_string(input->info()->dimension(0));
300  _config_id += "_";
301  _config_id += support::cpp11::to_string(input->info()->dimension(1));
302  _config_id += "_";
303  _config_id += support::cpp11::to_string(input->info()->dimension(2));
304  _config_id += "_";
305  _config_id += support::cpp11::to_string(output->info()->dimension(0));
306  _config_id += "_";
307  _config_id += support::cpp11::to_string(output->info()->dimension(1));
308  _config_id += "_";
309  _config_id += string_from_data_type(input->info()->data_type());
310 }
arm_compute::Status calculate_quantized_multiplier_less_than_one(float multiplier, int *quant_multiplier, int *right_shift)
Calculate quantized representation of multiplier with value less than one.
bool dot8_supported(const cl::Device &device)
Helper function to check whether the cl_arm_integer_dot_product_int8 extension is supported.
Definition: CLHelpers.cpp:149
TensorInfo * info() const override
Interface to be implemented by the child class to return the tensor's metadata.
Definition: CLTensor.cpp:35
std::pair< Status, Window > validate_and_configure_window(ITensorInfo *input, ITensorInfo *weights, ITensorInfo *biases, ITensorInfo *output, const PadStrideInfo &conv_info, unsigned int depth_multiplier, const Size2D &dilation)
std::string to_string(T &&value)
Convert integer and float values to string.
1 channel, 1 F32 per channel
const std::string & string_from_activation_func(ActivationLayerInfo::ActivationFunction act)
Translates a given activation function to a string.
Definition: Utils.cpp:170
static CLKernelLibrary & get()
Access the KernelLibrary singleton.
#define ARM_COMPUTE_ERROR_THROW_ON(status)
Definition: Error.h:327
std::string lower_string(const std::string &val)
Lower a given string.
Definition: Utils.cpp:327
1 channel, 1 F16 per channel
const std::string & string_from_data_type(DataType dt)
Convert a data type identity into a string.
Definition: Utils.cpp:144
std::string float_to_string_with_full_precision(float val)
Create a string with the float in full precision.
Definition: Utils.h:1066
std::string get_cl_type_from_data_type(const DataType &dt)
Translates a tensor data type to the appropriate OpenCL type.
Definition: CLHelpers.cpp:35
std::unique_ptr< Kernel > create_kernel()
Helper function to create and return a unique_ptr pointed to a CL/GLES kernel object.
Definition: Helpers.h:86
bool is_data_type_quantized_asymmetric(DataType dt)
Check if a given data type is of asymmetric quantized type.
Definition: Utils.h:1030
#define ARM_COMPUTE_ERROR_ON_NULLPTR(...)
Definition: Validate.h:161
uint8_t quantize_qasymm8(float value, const UniformQuantizationInfo &qinfo, RoundingPolicy rounding_policy=RoundingPolicy::TO_NEAREST_UP)
Quantize a value given a asymmetric quantization scheme.

References arm_compute::test::validation::act_info, CLBuildOptions::add_option(), CLBuildOptions::add_option_if(), ARM_COMPUTE_ERROR_ON_NULLPTR, ARM_COMPUTE_ERROR_THROW_ON, arm_compute::quantization::calculate_quantized_multiplier_less_than_one(), arm_compute::test::validation::conv_info, arm_compute::create_kernel(), ITensorInfo::data_type(), arm_compute::test::validation::dilation, ITensorInfo::dimension(), arm_compute::dot8_supported(), ITensorInfo::element_size(), arm_compute::F16, arm_compute::F32, arm_compute::float_to_string_with_full_precision(), CLKernelLibrary::get(), arm_compute::get_cl_type_from_data_type(), ITensor::info(), CLTensor::info(), arm_compute::is_data_type_quantized_asymmetric(), arm_compute::lower_string(), UniformQuantizationInfo::offset, CLBuildOptions::options(), arm_compute::quantize_qasymm8(), UniformQuantizationInfo::scale, arm_compute::string_from_activation_func(), arm_compute::string_from_data_type(), arm_compute::support::cpp11::to_string(), arm_compute::validate_and_configure_window(), and arm_compute::test::validation::weights.

◆ run()

void run ( const Window window,
cl::CommandQueue &  queue 
)
overridevirtual

Enqueue the OpenCL kernel to process the given window on the passed OpenCL command queue.

Note
The queue is not flushed by this method, and therefore the kernel will not have been executed by the time this method returns.
Parameters
[in]windowRegion on which to execute the kernel. (Must be a valid region of the window returned by window()).
[in,out]queueCommand queue on which to enqueue the kernel.

Implements ICLKernel.

Definition at line 324 of file CLDepthwiseConvolutionLayer3x3NHWCKernel.cpp.

325 {
328 
329  // Collapse window
330  Window window_collapsed = window.collapse_if_possible(ICLKernel::window(), Window::DimZ);
331  const size_t total_batches = _input->info()->tensor_shape().total_size_upper(3);
332  const bool is_qasymm = is_data_type_quantized_asymmetric(_input->info()->data_type());
333 
334  Window win = window_collapsed;
335  win.set(Window::DimZ, Window::Dimension(0, std::ceil(_output->info()->dimension(2) / static_cast<float>(_num_planes_processed_per_iteration)) * total_batches, 1));
336 
337  // Create input window and adjust
338  Window win_in = win;
339  win_in.set_dimension_step(Window::DimY, _num_rows_processed_per_iteration);
340  win_in.set_dimension_step(Window::DimZ, _conv_stride_y);
341 
342  ARM_COMPUTE_ERROR_ON((win_in.y().step() < window.y().step()) || (win_in.z().step() < window.z().step()));
343 
344  Window slice_in = win_in.first_slice_window_4D();
345  Window slice_out = win.first_slice_window_4D();
346 
347  unsigned int idx = 2 * num_arguments_per_4D_tensor() + (is_qasymm ? num_arguments_per_2D_tensor() : num_arguments_per_3D_tensor());
348 
349  if(_biases != nullptr)
350  {
351  Window win_biases;
352  win_biases.use_tensor_dimensions(_biases->info()->tensor_shape());
353  win_biases.set_dimension_step(Window::DimX, window.x().step());
354  add_1D_tensor_argument(idx, _biases, win_biases);
355  }
356 
357  // Calculate the max_offset.
358  // max_offset is the offset for the last NOT valid value in the Z dimension (spatial dimension Y for NHWC)
359  // |******************|
360  // | pad_top |
361  // |******************|
362  // | |
363  // | plane0 |
364  // | batch0 |
365  // |__________________|
366  // |******************| Batch 0
367  // | pad_bottom |
368  // | pad_top |
369  // |******************|
370  // | |
371  // | plane1 |
372  // | batch0 |
373  // |__________________|-----> max_offset
374  // |******************|
375  // | pad_bottom |
376  // | pad_top |
377  // |******************|
378  // | |
379  // | plane0 |
380  // | batch1 |
381  // |__________________|
382  // |******************| Batch 1
383  // | pad_bottom |
384  // | pad_top |
385  // |******************|
386  // | |
387  // | plane1 |
388  // | batch1 |
389  // |__________________|
390  // | pad_bottom |
391  // |******************|
392  const int max_offset = _input->info()->strides_in_bytes().z() * _input->info()->dimension(2) - (_input->info()->padding().bottom + _input->info()->padding().top) *
393  _input->info()->strides_in_bytes().y();
394  _kernel.setArg(idx, max_offset);
395 
396  do
397  {
398  unsigned int idx = 0;
399  add_4D_tensor_argument(idx, _input, slice_in);
400  add_4D_tensor_argument(idx, _output, slice_out);
401  if(is_qasymm)
402  {
403  add_2D_tensor_argument(idx, _weights, slice_out);
404  }
405  else
406  {
407  add_3D_tensor_argument(idx, _weights, slice_out);
408  }
409  enqueue(queue, *this, slice_out, lws_hint());
410  }
411  while(win.slide_window_slice_4D(slice_out) && win_in.slide_window_slice_4D(slice_in));
412 }
const Window & window() const
The maximum window the kernel can be executed on.
Definition: IKernel.cpp:28
void enqueue(cl::CommandQueue &queue, ICLKernel &kernel, const Window &window, const cl::NDRange &lws_hint=CLKernelLibrary::get().default_ndrange(), bool use_dummy_work_items=false)
Add the kernel to the command queue with the given window.
Definition: ICLKernel.cpp:39
constexpr int step() const
Return the step of the dimension.
Definition: Window.h:102
cl::NDRange lws_hint() const
Return the Local-Workgroup-Size hint.
Definition: ICLKernel.h:247
#define ARM_COMPUTE_ERROR_ON(cond)
If the condition is true then an error message is printed and an exception thrown.
Definition: Error.h:337
constexpr const Dimension & z() const
Alias to access the third dimension of the window.
Definition: Window.h:161
void add_3D_tensor_argument(unsigned int &idx, const ICLTensor *tensor, const Window &window)
Add the passed 3D tensor's parameters to the object's kernel's arguments starting from the index idx.
Definition: ICLKernel.h:158
static constexpr size_t DimX
Alias for dimension 0 also known as X dimension.
Definition: Window.h:43
static constexpr unsigned int num_arguments_per_3D_tensor()
Returns the number of arguments enqueued per 3D tensor object.
Definition: ICLKernel.h:200
Window collapse_if_possible(const Window &full_window, size_t first, size_t last, bool *has_collapsed=nullptr) const
Collapse the dimensions between first and last if possible.
Definition: Window.inl:54
static constexpr unsigned int num_arguments_per_2D_tensor()
Returns the number of arguments enqueued per 2D tensor object.
Definition: ICLKernel.h:192
static constexpr unsigned int num_arguments_per_4D_tensor()
Returns the number of arguments enqueued per 4D tensor object.
Definition: ICLKernel.h:208
bool is_data_type_quantized_asymmetric(DataType dt)
Check if a given data type is of asymmetric quantized type.
Definition: Utils.h:1030
static constexpr size_t DimY
Alias for dimension 1 also known as Y dimension.
Definition: Window.h:45
void add_2D_tensor_argument(unsigned int &idx, const ICLTensor *tensor, const Window &window)
Add the passed 2D tensor's parameters to the object's kernel's arguments starting from the index idx.
Definition: ICLKernel.h:134
static constexpr size_t DimZ
Alias for dimension 2 also known as Z dimension.
Definition: Window.h:47
constexpr const Dimension & y() const
Alias to access the second dimension of the window.
Definition: Window.h:152
void add_1D_tensor_argument(unsigned int &idx, const ICLTensor *tensor, const Window &window)
Add the passed 1D tensor's parameters to the object's kernel's arguments starting from the index idx.
Definition: ICLKernel.h:110
#define ARM_COMPUTE_ERROR_ON_INVALID_SUBWINDOW(f, s)
Definition: Validate.h:205
void add_4D_tensor_argument(unsigned int &idx, const ICLTensor *tensor, const Window &window)
Add the passed 4D tensor's parameters to the object's kernel's arguments starting from the index idx.
Definition: ICLKernel.h:168
#define ARM_COMPUTE_ERROR_ON_UNCONFIGURED_KERNEL(k)
Definition: Validate.h:940
constexpr const Dimension & x() const
Alias to access the first dimension of the window.
Definition: Window.h:143

References ICLKernel::add_1D_tensor_argument(), ICLKernel::add_2D_tensor_argument(), ICLKernel::add_3D_tensor_argument(), ICLKernel::add_4D_tensor_argument(), ARM_COMPUTE_ERROR_ON, ARM_COMPUTE_ERROR_ON_INVALID_SUBWINDOW, ARM_COMPUTE_ERROR_ON_UNCONFIGURED_KERNEL, Window::collapse_if_possible(), Window::DimX, Window::DimY, Window::DimZ, arm_compute::enqueue(), Window::first_slice_window_4D(), arm_compute::is_data_type_quantized_asymmetric(), ICLKernel::lws_hint(), ICLKernel::num_arguments_per_2D_tensor(), ICLKernel::num_arguments_per_3D_tensor(), ICLKernel::num_arguments_per_4D_tensor(), Window::set(), Window::set_dimension_step(), Window::slide_window_slice_4D(), Window::Dimension::step(), Window::use_tensor_dimensions(), IKernel::window(), Window::x(), Window::y(), and Window::z().

◆ validate()

Status validate ( const ITensorInfo input,
const ITensorInfo weights,
const ITensorInfo biases,
const ITensorInfo output,
const PadStrideInfo conv_info,
unsigned int  depth_multiplier,
ActivationLayerInfo  act_info = ActivationLayerInfo(),
const Size2D dilation = Size2D(1U, 1U) 
)
static

Static function to check if given info will lead to a valid configuration of CLDepthwiseConvolutionLayer3x3NHWCKernel.

Parameters
[in]inputSource tensor info. DataType supported: QASYMM8.
[in]weightsWeights tensor info. A 3D tensor with dimensions [IFM, 3, 3]. Data type supported: Same as input.
[in]biasesBiases tensor info. A 1D tensor with dimensions [IFM]. Must be nullptr if not needed. Data type supported: Same as input.
[in]outputDestination tensor info. Data type supported: Same as input.
[in]conv_infoPadding and stride information to use for the convolution.
[in]depth_multiplier(Optional) Multiplier to apply to the input's depth in order to retrieve the output's depth. Defaults to 1.
[in]act_info(Optional) Activation layer information in case of a fused activation. Only RELU, BOUNDED_RELU and LU_BOUNDED_RELU are supported.
[in]dilation(Optional) Dilation, in elements, across x and y. Defaults to (1, 1).
Returns
a status

Definition at line 312 of file CLDepthwiseConvolutionLayer3x3NHWCKernel.cpp.

314 {
315  ARM_COMPUTE_RETURN_ON_ERROR(validate_arguments(input, weights, biases, output, conv_info, depth_multiplier, act_info, dilation));
316  ARM_COMPUTE_RETURN_ON_ERROR(validate_and_configure_window(input->clone().get(), weights->clone().get(),
317  biases != nullptr ? biases->clone().get() : nullptr,
318  output->clone().get(), conv_info, depth_multiplier, dilation)
319  .first);
320 
321  return Status{};
322 }
std::pair< Status, Window > validate_and_configure_window(ITensorInfo *input, ITensorInfo *weights, ITensorInfo *biases, ITensorInfo *output, const PadStrideInfo &conv_info, unsigned int depth_multiplier, const Size2D &dilation)
#define ARM_COMPUTE_RETURN_ON_ERROR(status)
Checks if a status contains an error and returns it.
Definition: Error.h:193

References arm_compute::test::validation::act_info, ARM_COMPUTE_RETURN_ON_ERROR, ICloneable< T >::clone(), arm_compute::test::validation::conv_info, arm_compute::test::validation::dilation, arm_compute::validate_and_configure_window(), and arm_compute::test::validation::weights.

Referenced by CLDepthwiseConvolutionLayer3x3::validate().


The documentation for this class was generated from the following files: