Compute Library
 21.02
CLDepthwiseConvolutionLayer3x3NCHWKernel Class Reference

Interface for the kernel to run a 3x3 depthwise convolution on a tensor when the data layout is NCHW. More...

#include <CLDepthwiseConvolutionLayer3x3NCHWKernel.h>

Collaboration diagram for CLDepthwiseConvolutionLayer3x3NCHWKernel:
[legend]

Public Member Functions

 CLDepthwiseConvolutionLayer3x3NCHWKernel ()
 Default constructor. More...
 
void configure (const ICLTensor *input, const ICLTensor *weights, const ICLTensor *biases, ICLTensor *output, const PadStrideInfo &conv_info, unsigned int depth_multiplier=1, ActivationLayerInfo act_info=ActivationLayerInfo(), const Size2D &dilation=Size2D(1U, 1U), const ICLTensor *output_multipliers=nullptr, const ICLTensor *output_shifts=nullptr) override
 Initialize the function's source, destination, conv and border_size. More...
 
void configure (const CLCompileContext &compile_context, const ICLTensor *input, const ICLTensor *weights, const ICLTensor *biases, ICLTensor *output, const PadStrideInfo &conv_info, unsigned int depth_multiplier=1, ActivationLayerInfo act_info=ActivationLayerInfo(), const Size2D &dilation=Size2D(1U, 1U), const ICLTensor *output_multipliers=nullptr, const ICLTensor *output_shifts=nullptr) override
 Initialize the function's source, destination, conv and border_size. More...
 
void run (const Window &window, cl::CommandQueue &queue) override
 Enqueue the OpenCL kernel to process the given window on the passed OpenCL command queue. More...
 
BorderSize border_size () const override
 The size of the border for that kernel. More...
 
- Public Member Functions inherited from ICLDepthwiseConvolutionLayer3x3Kernel
 ICLDepthwiseConvolutionLayer3x3Kernel ()
 Default constructor. More...
 
 ICLDepthwiseConvolutionLayer3x3Kernel (const ICLDepthwiseConvolutionLayer3x3Kernel &)=delete
 Prevent instances of this class from being copied (As this class contains pointers) More...
 
ICLDepthwiseConvolutionLayer3x3Kerneloperator= (const ICLDepthwiseConvolutionLayer3x3Kernel &)=delete
 Prevent instances of this class from being copied (As this class contains pointers) More...
 
 ICLDepthwiseConvolutionLayer3x3Kernel (ICLDepthwiseConvolutionLayer3x3Kernel &&)=default
 Default Move Constructor. More...
 
ICLDepthwiseConvolutionLayer3x3Kerneloperator= (ICLDepthwiseConvolutionLayer3x3Kernel &&)=default
 Default move assignment operator. More...
 
- Public Member Functions inherited from ICLKernel
 ICLKernel ()
 Constructor. More...
 
cl::Kernel & kernel ()
 Returns a reference to the OpenCL kernel of this object. More...
 
template<typename T >
void add_1D_array_argument (unsigned int &idx, const ICLArray< T > *array, const Strides &strides, unsigned int num_dimensions, const Window &window)
 Add the passed 1D array's parameters to the object's kernel's arguments starting from the index idx. More...
 
void add_1D_tensor_argument (unsigned int &idx, const ICLTensor *tensor, const Window &window)
 Add the passed 1D tensor's parameters to the object's kernel's arguments starting from the index idx. More...
 
void add_1D_tensor_argument_if (bool cond, unsigned int &idx, const ICLTensor *tensor, const Window &window)
 Add the passed 1D tensor's parameters to the object's kernel's arguments starting from the index idx if the condition is true. More...
 
void add_2D_tensor_argument (unsigned int &idx, const ICLTensor *tensor, const Window &window)
 Add the passed 2D tensor's parameters to the object's kernel's arguments starting from the index idx. More...
 
void add_2D_tensor_argument_if (bool cond, unsigned int &idx, const ICLTensor *tensor, const Window &window)
 Add the passed 2D tensor's parameters to the object's kernel's arguments starting from the index idx if the condition is true. More...
 
void add_3D_tensor_argument (unsigned int &idx, const ICLTensor *tensor, const Window &window)
 Add the passed 3D tensor's parameters to the object's kernel's arguments starting from the index idx. More...
 
void add_4D_tensor_argument (unsigned int &idx, const ICLTensor *tensor, const Window &window)
 Add the passed 4D tensor's parameters to the object's kernel's arguments starting from the index idx. More...
 
virtual void run_op (ITensorPack &tensors, const Window &window, cl::CommandQueue &queue)
 Enqueue the OpenCL kernel to process the given window on the passed OpenCL command queue. More...
 
template<typename T >
void add_argument (unsigned int &idx, T value)
 Add the passed parameters to the object's kernel's arguments starting from the index idx. More...
 
void set_lws_hint (const cl::NDRange &lws_hint)
 Set the Local-Workgroup-Size hint. More...
 
cl::NDRange lws_hint () const
 Return the Local-Workgroup-Size hint. More...
 
void set_wbsm_hint (const cl_int &wbsm_hint)
 Set the workgroup batch size modifier hint. More...
 
cl_int wbsm_hint () const
 Return the workgroup batch size modifier hint. More...
 
const std::string & config_id () const
 Get the configuration ID. More...
 
void set_target (GPUTarget target)
 Set the targeted GPU architecture. More...
 
void set_target (cl::Device &device)
 Set the targeted GPU architecture according to the CL device. More...
 
GPUTarget get_target () const
 Get the targeted GPU architecture. More...
 
size_t get_max_workgroup_size ()
 Get the maximum workgroup size for the device the CLKernelLibrary uses. More...
 
template<unsigned int dimension_size>
void add_tensor_argument (unsigned &idx, const ICLTensor *tensor, const Window &window)
 
template<typename T , unsigned int dimension_size>
void add_array_argument (unsigned &idx, const ICLArray< T > *array, const Strides &strides, unsigned int num_dimensions, const Window &window)
 Add the passed array's parameters to the object's kernel's arguments starting from the index idx. More...
 
- Public Member Functions inherited from IKernel
 IKernel ()
 Constructor. More...
 
virtual ~IKernel ()=default
 Destructor. More...
 
virtual bool is_parallelisable () const
 Indicates whether or not the kernel is parallelisable. More...
 
const Windowwindow () const
 The maximum window the kernel can be executed on. More...
 

Static Public Member Functions

static Status validate (const ITensorInfo *input, const ITensorInfo *weights, const ITensorInfo *biases, const ITensorInfo *output, const PadStrideInfo &conv_info, unsigned int depth_multiplier=1, ActivationLayerInfo act_info=ActivationLayerInfo(), GPUTarget gpu_target=GPUTarget::MIDGARD, const Size2D &dilation=Size2D(1U, 1U), const ITensorInfo *output_multipliers=nullptr, const ITensorInfo *output_shifts=nullptr)
 Static function to check if given info will lead to a valid configuration of CLDepthwiseConvolutionLayer3x3NCHWKernel. More...
 
- Static Public Member Functions inherited from ICLKernel
static constexpr unsigned int num_arguments_per_1D_array ()
 Returns the number of arguments enqueued per 1D array object. More...
 
static constexpr unsigned int num_arguments_per_1D_tensor ()
 Returns the number of arguments enqueued per 1D tensor object. More...
 
static constexpr unsigned int num_arguments_per_2D_tensor ()
 Returns the number of arguments enqueued per 2D tensor object. More...
 
static constexpr unsigned int num_arguments_per_3D_tensor ()
 Returns the number of arguments enqueued per 3D tensor object. More...
 
static constexpr unsigned int num_arguments_per_4D_tensor ()
 Returns the number of arguments enqueued per 4D tensor object. More...
 
static cl::NDRange gws_from_window (const Window &window)
 Get the global work size given an execution window. More...
 

Detailed Description

Interface for the kernel to run a 3x3 depthwise convolution on a tensor when the data layout is NCHW.

Definition at line 35 of file CLDepthwiseConvolutionLayer3x3NCHWKernel.h.

Constructor & Destructor Documentation

◆ CLDepthwiseConvolutionLayer3x3NCHWKernel()

Default constructor.

Definition at line 238 of file CLDepthwiseConvolutionLayer3x3NCHWKernel.cpp.

239  : _conv_stride_x(0), _conv_pad_top(0), _conv_pad_left(0)
240 {
241 }

Member Function Documentation

◆ border_size()

BorderSize border_size ( ) const
overridevirtual

The size of the border for that kernel.

Returns
The width in number of elements of the border.

Reimplemented from IKernel.

Definition at line 243 of file CLDepthwiseConvolutionLayer3x3NCHWKernel.cpp.

244 {
245  return _border_size;
246 }

◆ configure() [1/2]

void configure ( const ICLTensor input,
const ICLTensor weights,
const ICLTensor biases,
ICLTensor output,
const PadStrideInfo conv_info,
unsigned int  depth_multiplier = 1,
ActivationLayerInfo  act_info = ActivationLayerInfo(),
const Size2D dilation = Size2D(1U, 1U),
const ICLTensor output_multipliers = nullptr,
const ICLTensor output_shifts = nullptr 
)
overridevirtual

Initialize the function's source, destination, conv and border_size.

Parameters
[in]inputSource tensor. DataType supported: QASYMM8/QASYMM8_SIGNED/F16/F32.
[in]weightsWeights tensor. A 3D tensor with dimensions [3, 3, IFM]. Data type supported: Same as input or QASYMM8/QASYMM8_SIGNED/QSYMM8_PER_CHANNEL when input is QASYMM8/QASYMM8_SIGNED.
[in]biasesBiases tensor. A 1D tensor with dimensions [IFM]. Must be nullptr if not needed. Data type supported: Same as input, S32 when input is QASYMM8/QASYMM8_SIGNED.
[out]outputDestination tensor. Data type supported: Same as input.
[in]conv_infoPadding and stride information to use for the convolution.
[in]depth_multiplier(Optional) Multiplier to apply to the input's depth in order to retrieve the output's depth. Defaults to 1.
[in]act_info(Optional) Activation layer information in case of a fused activation. Only RELU, BOUNDED_RELU and LU_BOUNDED_RELU for QASYMM8 supported.
[in]dilation(Optional) Dilation, in elements, across x and y. Defaults to (1, 1).
[in]output_multipliers(Optional) Output multipliers tensor for quantized computations. In case of per-channel quantization, the number of multipliers must be equal to the number of filters (IFM). Supported data types: S32
[in]output_shifts(Optional) Output shifts tensor for quantized computations. In case of per-channel quantization, the number of multipliers must be equal to the number of filters (IFM). Supported data types: S32

Implements ICLDepthwiseConvolutionLayer3x3Kernel.

Definition at line 248 of file CLDepthwiseConvolutionLayer3x3NCHWKernel.cpp.

References CLKernelLibrary::get().

251 {
252  configure(CLKernelLibrary::get().get_compile_context(), input, weights, biases, output, conv_info, depth_multiplier, act_info, dilation, output_multipliers, output_shifts);
253 }
static CLKernelLibrary & get()
Access the KernelLibrary singleton.
void configure(const ICLTensor *input, const ICLTensor *weights, const ICLTensor *biases, ICLTensor *output, const PadStrideInfo &conv_info, unsigned int depth_multiplier=1, ActivationLayerInfo act_info=ActivationLayerInfo(), const Size2D &dilation=Size2D(1U, 1U), const ICLTensor *output_multipliers=nullptr, const ICLTensor *output_shifts=nullptr) override
Initialize the function&#39;s source, destination, conv and border_size.

◆ configure() [2/2]

void configure ( const CLCompileContext compile_context,
const ICLTensor input,
const ICLTensor weights,
const ICLTensor biases,
ICLTensor output,
const PadStrideInfo conv_info,
unsigned int  depth_multiplier = 1,
ActivationLayerInfo  act_info = ActivationLayerInfo(),
const Size2D dilation = Size2D(1U, 1U),
const ICLTensor output_multipliers = nullptr,
const ICLTensor output_shifts = nullptr 
)
overridevirtual

Initialize the function's source, destination, conv and border_size.

Parameters
[in]compile_contextThe compile context to be used.
[in]inputSource tensor. DataType supported: QASYMM8/QASYMM8_SIGNED/F16/F32.
[in]weightsWeights tensor. A 3D tensor with dimensions [3, 3, IFM]. Data type supported: Same as input or QASYMM8/QASYMM8_SIGNED/QSYMM8_PER_CHANNEL when input is QASYMM8/QASYMM8_SIGNED.
[in]biasesBiases tensor. A 1D tensor with dimensions [IFM]. Must be nullptr if not needed. Data type supported: Same as input, S32 when input is QASYMM8/QASYMM8_SIGNED.
[out]outputDestination tensor. Data type supported: Same as input.
[in]conv_infoPadding and stride information to use for the convolution.
[in]depth_multiplier(Optional) Multiplier to apply to the input's depth in order to retrieve the output's depth. Defaults to 1.
[in]act_info(Optional) Activation layer information in case of a fused activation. Only RELU, BOUNDED_RELU and LU_BOUNDED_RELU for QASYMM8 supported.
[in]dilation(Optional) Dilation, in elements, across x and y. Defaults to (1, 1).
[in]output_multipliers(Optional) Output multipliers tensor for quantized computations. In case of per-channel quantization, the number of multipliers must be equal to the number of filters (IFM). Supported data types: S32
[in]output_shifts(Optional) Output shifts tensor for quantized computations. In case of per-channel quantization, the number of multipliers must be equal to the number of filters (IFM). Supported data types: S32

Implements ICLDepthwiseConvolutionLayer3x3Kernel.

Definition at line 255 of file CLDepthwiseConvolutionLayer3x3NCHWKernel.cpp.

References CLBuildOptions::add_option(), ARM_COMPUTE_ERROR_ON_NULLPTR, ARM_COMPUTE_ERROR_THROW_ON, arm_compute::quantization::calculate_quantized_multiplier(), arm_compute::test::validation::conv_info, arm_compute::create_kernel(), ITensorInfo::data_type(), ITensorInfo::dimension(), arm_compute::dot8_supported(), arm_compute::F16, arm_compute::F32, arm_compute::float_to_string_with_full_precision(), CLKernelLibrary::get(), arm_compute::get_cl_promoted_type_from_data_type(), arm_compute::get_cl_type_from_data_type(), arm_compute::get_quantized_activation_min_max(), ICLKernel::get_target(), ITensor::info(), arm_compute::test::validation::input, arm_compute::is_data_type_quantized_asymmetric(), arm_compute::is_data_type_quantized_per_channel(), kernel_name, arm_compute::lower_string(), UniformQuantizationInfo::offset, PadStrideInfo::pad_left(), PadStrideInfo::pad_top(), ITensorInfo::padding(), UniformQuantizationInfo::scale, PadStrideInfo::stride(), arm_compute::string_from_activation_func(), arm_compute::string_from_data_type(), arm_compute::support::cpp11::to_string(), and arm_compute::validate_arguments().

258 {
259  ARM_COMPUTE_ERROR_ON_NULLPTR(input, weights, output);
260  ARM_COMPUTE_ERROR_THROW_ON(validate_arguments(input->info(), weights->info(), (biases != nullptr) ? biases->info() : nullptr, output->info(),
261  conv_info, depth_multiplier, act_info, dilation,
262  (output_multipliers != nullptr) ? output_multipliers->info() : nullptr,
263  (output_shifts != nullptr) ? output_shifts->info() : nullptr));
264 
265  _input = input;
266  _output = output;
267  _weights = weights;
268  _biases = biases;
269  _conv_stride_x = conv_info.stride().first;
270  _conv_stride_y = conv_info.stride().second;
271  _conv_pad_left = conv_info.pad_left();
272  _conv_pad_top = conv_info.pad_top();
273  _output_multipliers = output_multipliers;
274  _output_shifts = output_shifts;
275  _is_quantized = is_data_type_quantized_asymmetric(input->info()->data_type());
276 
277  // Configure kernel window
278  std::string kernel_name;
279  const GPUTarget gpu_target = get_target();
280 
281  auto win_config = validate_and_configure_window(input->info(), weights->info(), output->info(), conv_info, depth_multiplier, gpu_target, kernel_name, dilation);
282  ARM_COMPUTE_ERROR_THROW_ON(win_config.first);
283  ICLKernel::configure_internal(win_config.second);
284 
285  _border_size = BorderSize(input->info()->padding());
286 
287  // Set build options
288  CLBuildOptions build_opts;
289  build_opts.add_option("-DACTIVATION_TYPE=" + lower_string(string_from_activation_func(act_info.activation())));
290  build_opts.add_option("-DDST_CHANNELS=" + support::cpp11::to_string(_output->info()->tensor_shape().z()));
291  build_opts.add_option("-DDEPTH_MULTIPLIER=" + support::cpp11::to_string(depth_multiplier));
292  build_opts.add_option("-DCONV_STRIDE_X=" + support::cpp11::to_string(_conv_stride_x));
293  build_opts.add_option("-DDILATION_X=" + support::cpp11::to_string(dilation.x()));
294  build_opts.add_option("-DDILATION_Y=" + support::cpp11::to_string(dilation.y()));
295  build_opts.add_option_if(_biases != nullptr, "-DHAS_BIAS");
296 
297  if(_is_quantized)
298  {
299  const UniformQuantizationInfo iq_info = _input->info()->quantization_info().uniform();
300  const UniformQuantizationInfo wq_info = _weights->info()->quantization_info().uniform();
301  const UniformQuantizationInfo oq_info = _output->info()->quantization_info().uniform();
302 
303  const bool is_quantized_per_channel = is_data_type_quantized_per_channel(weights->info()->data_type());
304  const bool is_dot8_supported = dot8_supported(CLKernelLibrary::get().get_device()) && !is_quantized_per_channel;
305  build_opts.add_option("-DCONV_STRIDE_Y=" + support::cpp11::to_string(_conv_stride_y));
306  build_opts.add_option("-DINPUT_OFFSET=" + support::cpp11::to_string(-iq_info.offset));
307  build_opts.add_option("-DWEIGHTS_OFFSET=" + support::cpp11::to_string(-wq_info.offset));
308  build_opts.add_option("-DOUTPUT_OFFSET=" + support::cpp11::to_string(oq_info.offset));
309  build_opts.add_option("-DK_OFFSET=" + support::cpp11::to_string(9 * iq_info.offset * wq_info.offset));
310  build_opts.add_option_if(is_quantized_per_channel, "-DPER_CHANNEL_QUANTIZATION");
311  build_opts.add_option_if(is_dot8_supported, "-DIS_DOT8");
312 
313  // Compute non-per-channel multiplier and shift anyway to make OpenCL kernel simpler
314  float multiplier = iq_info.scale * wq_info.scale / oq_info.scale;
315  int output_multiplier = 0;
316  int output_shift = 0;
317  quantization::calculate_quantized_multiplier(multiplier, &output_multiplier, &output_shift);
318  build_opts.add_option("-DOUTPUT_MULTIPLIER=" + support::cpp11::to_string(output_multiplier));
319  build_opts.add_option("-DOUTPUT_SHIFT=" + support::cpp11::to_string(output_shift));
320 
321  if(act_info.enabled())
322  {
323  int a_val{};
324  int b_val{};
325  std::tie(b_val, a_val) = get_quantized_activation_min_max(act_info, input->info()->data_type(), oq_info);
326 
327  const int o1 = oq_info.offset;
328 
329  build_opts.add_option("-DA_VAL=" + support::cpp11::to_string(a_val));
330  build_opts.add_option("-DB_VAL=" + support::cpp11::to_string(b_val));
331  build_opts.add_option("-DCONST_0=" + support::cpp11::to_string(o1));
332 
333  const float s1 = iq_info.scale;
334  build_opts.add_option("-DS1_VAL=" + float_to_string_with_full_precision(s1));
335  build_opts.add_option("-DO1_VAL=" + support::cpp11::to_string(o1));
336  }
337 
338  build_opts.add_option("-DDATA_TYPE=" + get_cl_type_from_data_type(input->info()->data_type()));
339  build_opts.add_option("-DWEIGHTS_TYPE=" + get_cl_type_from_data_type(weights->info()->data_type()));
340  build_opts.add_option("-DWEIGHTS_PROMOTED_TYPE=" + get_cl_promoted_type_from_data_type(weights->info()->data_type()));
341  }
342  else
343  {
344  build_opts.add_option_if(act_info.enabled(), "-DA_VAL=" + float_to_string_with_full_precision(act_info.a()));
345  build_opts.add_option_if(act_info.enabled(), "-DB_VAL=" + float_to_string_with_full_precision(act_info.b()));
346  build_opts.add_option_if(act_info.enabled(), "-DDATA_TYPE=" + get_cl_type_from_data_type(input->info()->data_type()));
347  build_opts.add_option("-DVEC_SIZE=" + support::cpp11::to_string(win_config.second.x().step()));
348  }
349 
350  build_opts.add_option_if(input->info()->data_type() == DataType::F16, "-DIS_F16");
351  build_opts.add_option_if(input->info()->data_type() == DataType::F32, "-DIS_F32");
352 
353  _kernel = create_kernel(compile_context, kernel_name, build_opts.options());
354 
355  // Set config_id for enabling LWS tuning
356  _config_id = kernel_name;
357  _config_id += "_";
358  _config_id += lower_string(string_from_data_type(input->info()->data_type()));
359  _config_id += "_";
360  _config_id += support::cpp11::to_string(input->info()->dimension(0));
361  _config_id += "_";
362  _config_id += support::cpp11::to_string(input->info()->dimension(1));
363  _config_id += "_";
364  _config_id += support::cpp11::to_string(input->info()->dimension(2));
365  _config_id += "_";
366  _config_id += support::cpp11::to_string(output->info()->dimension(0));
367  _config_id += "_";
368  _config_id += support::cpp11::to_string(output->info()->dimension(1));
369 }
bool dot8_supported(const cl::Device &device)
Helper function to check whether the cl_arm_integer_dot_product_int8 extension is supported...
Definition: CLHelpers.cpp:239
std::string to_string(T &&value)
Convert integer and float values to string.
1 channel, 1 F32 per channel
const std::string & string_from_activation_func(ActivationLayerInfo::ActivationFunction act)
Translates a given activation function to a string.
Definition: Utils.cpp:163
static CLKernelLibrary & get()
Access the KernelLibrary singleton.
#define ARM_COMPUTE_ERROR_THROW_ON(status)
Definition: Error.h:455
Status calculate_quantized_multiplier(float multiplier, int32_t *quant_multiplier, int32_t *shift, bool ignore_epsilon=false)
Calculate quantized representation of multiplier.
std::string lower_string(const std::string &val)
Lower a given string.
Definition: Utils.cpp:350
1 channel, 1 F16 per channel
cl::Kernel create_kernel(const CLCompileContext &ctx, const std::string &kernel_name, const std::set< std::string > &build_opts=std::set< std::string >())
Creates an opencl kernel using a compile context.
Definition: CLHelpers.cpp:403
const std::string & string_from_data_type(DataType dt)
Convert a data type identity into a string.
Definition: Utils.cpp:135
bool is_data_type_quantized_per_channel(DataType dt)
Check if a given data type is of per channel type.
Definition: Utils.h:1245
std::string float_to_string_with_full_precision(float val)
Create a string with the float in full precision.
Definition: Utils.h:1262
std::pair< int32_t, int32_t > get_quantized_activation_min_max(ActivationLayerInfo act_info, DataType data_type, UniformQuantizationInfo oq_info)
Returns a pair of minimum and maximum values for a quantized activation.
Definition: Utils.cpp:483
std::string kernel_name
GPUTarget get_target() const
Get the targeted GPU architecture.
Definition: ICLKernel.h:336
std::string get_cl_type_from_data_type(const DataType &dt)
Translates a tensor data type to the appropriate OpenCL type.
Definition: CLHelpers.cpp:37
bool is_data_type_quantized_asymmetric(DataType dt)
Check if a given data type is of asymmetric quantized type.
Definition: Utils.h:1190
std::string get_cl_promoted_type_from_data_type(const DataType &dt)
Translates a tensor data type to the appropriate OpenCL promoted type.
Definition: CLHelpers.cpp:73
GPUTarget
Available GPU Targets.
Definition: GPUTarget.h:34
Status validate_arguments(const ITensorInfo *input, const ITensorInfo *bias, const ITensorInfo *output, const GEMMLowpOutputStageInfo *output_stage)
#define ARM_COMPUTE_ERROR_ON_NULLPTR(...)
Definition: Validate.h:161

◆ run()

void run ( const Window window,
cl::CommandQueue &  queue 
)
overridevirtual

Enqueue the OpenCL kernel to process the given window on the passed OpenCL command queue.

Note
The queue is not flushed by this method, and therefore the kernel will not have been executed by the time this method returns.
Parameters
[in]windowRegion on which to execute the kernel. (Must be a valid region of the window returned by window()).
[in,out]queueCommand queue on which to enqueue the kernel.

Reimplemented from ICLKernel.

Definition at line 384 of file CLDepthwiseConvolutionLayer3x3NCHWKernel.cpp.

References ICLKernel::add_1D_tensor_argument(), ICLKernel::add_3D_tensor_argument(), Window::adjust(), ARM_COMPUTE_ERROR_ON_INVALID_SUBWINDOW, ARM_COMPUTE_ERROR_ON_UNCONFIGURED_KERNEL, Window::collapse_if_possible(), Window::DimX, Window::DimY, Window::DimZ, arm_compute::enqueue(), Window::first_slice_window_3D(), ICLKernel::lws_hint(), ICLKernel::num_arguments_per_3D_tensor(), Window::set_dimension_step(), arm_compute::test::validation::reference::slice(), Window::slide_window_slice_3D(), Window::Dimension::step(), Window::use_tensor_dimensions(), IKernel::window(), Window::x(), and Window::y().

385 {
388 
390 
391  // Create input window and adjust
392  Window collapsed_in = collapsed;
393  collapsed_in.adjust(Window::DimX, -_conv_pad_left, true);
394  collapsed_in.adjust(Window::DimY, -_conv_pad_top, true);
395  collapsed_in.set_dimension_step(Window::DimX, collapsed_in.x().step() * _conv_stride_x);
396  collapsed_in.set_dimension_step(Window::DimY, collapsed_in.y().step() * _conv_stride_y);
397 
398  Window slice_in = collapsed_in.first_slice_window_3D();
399  Window slice_out = collapsed.first_slice_window_3D();
400  Window slice_weights = window.first_slice_window_3D();
401  slice_weights.set_dimension_step(Window::DimX, 0);
402  slice_weights.set_dimension_step(Window::DimY, 0);
403 
404  unsigned int idx = 3 * num_arguments_per_3D_tensor();
405 
406  // Set output multipliers in case of quantized data type
407  if(_is_quantized)
408  {
409  Window slice;
410  slice.use_tensor_dimensions(_output_multipliers->info()->tensor_shape());
411  add_1D_tensor_argument(idx, _output_multipliers, slice);
412  add_1D_tensor_argument(idx, _output_shifts, slice);
413  }
414 
415  // Set biases
416  if(_biases != nullptr)
417  {
418  Window slice_biases;
419  slice_biases.use_tensor_dimensions(_biases->info()->tensor_shape());
420  add_1D_tensor_argument(idx, _biases, slice_biases);
421  }
422 
423  do
424  {
425  idx = 0;
426  add_3D_tensor_argument(idx, _input, slice_in);
427  add_3D_tensor_argument(idx, _output, slice_out);
428  add_3D_tensor_argument(idx, _weights, slice_weights);
429 
430  enqueue(queue, *this, slice_out, lws_hint());
431  }
432  while(collapsed.slide_window_slice_3D(slice_out) && collapsed_in.slide_window_slice_3D(slice_in));
433 }
const Window & window() const
The maximum window the kernel can be executed on.
Definition: IKernel.cpp:28
void enqueue(IGCKernel &kernel, const Window &window, const gles::NDRange &lws=gles::NDRange(1U, 1U, 1U))
Add the kernel to the command queue with the given window.
Definition: IGCKernel.cpp:41
cl::NDRange lws_hint() const
Return the Local-Workgroup-Size hint.
Definition: ICLKernel.h:276
void add_3D_tensor_argument(unsigned int &idx, const ICLTensor *tensor, const Window &window)
Add the passed 3D tensor&#39;s parameters to the object&#39;s kernel&#39;s arguments starting from the index idx...
Definition: ICLKernel.h:172
static constexpr size_t DimX
Alias for dimension 0 also known as X dimension.
Definition: Window.h:43
static constexpr unsigned int num_arguments_per_3D_tensor()
Returns the number of arguments enqueued per 3D tensor object.
Definition: ICLKernel.h:214
Window collapse_if_possible(const Window &full_window, size_t first, size_t last, bool *has_collapsed=nullptr) const
Collapse the dimensions between first and last if possible.
Definition: Window.inl:68
#define ARM_COMPUTE_ERROR_ON_UNCONFIGURED_KERNEL(k)
Definition: Validate.h:941
static constexpr size_t DimY
Alias for dimension 1 also known as Y dimension.
Definition: Window.h:45
void set_dimension_step(size_t dimension, int step)
Set the step of a given dimension.
Definition: Window.inl:167
static constexpr size_t DimZ
Alias for dimension 2 also known as Z dimension.
Definition: Window.h:47
void adjust(size_t dimension, int adjust_value, bool is_at_start)
Adjust the start or end of a given dimension by the given value.
Definition: Window.inl:140
void add_1D_tensor_argument(unsigned int &idx, const ICLTensor *tensor, const Window &window)
Add the passed 1D tensor&#39;s parameters to the object&#39;s kernel&#39;s arguments starting from the index idx...
Definition: ICLKernel.h:124
Window first_slice_window_3D() const
First 3D slice of the window.
Definition: Window.h:291
#define ARM_COMPUTE_ERROR_ON_INVALID_SUBWINDOW(f, s)
Definition: Validate.h:205
SimpleTensor< T > slice(const SimpleTensor< T > &src, Coordinates starts, Coordinates ends)

◆ validate()

Status validate ( const ITensorInfo input,
const ITensorInfo weights,
const ITensorInfo biases,
const ITensorInfo output,
const PadStrideInfo conv_info,
unsigned int  depth_multiplier = 1,
ActivationLayerInfo  act_info = ActivationLayerInfo(),
GPUTarget  gpu_target = GPUTarget::MIDGARD,
const Size2D dilation = Size2D(1U, 1U),
const ITensorInfo output_multipliers = nullptr,
const ITensorInfo output_shifts = nullptr 
)
static

Static function to check if given info will lead to a valid configuration of CLDepthwiseConvolutionLayer3x3NCHWKernel.

Parameters
[in]inputSource tensor info. DataType supported: QASYMM8/QASYMM8_SIGNED/F16/F32.
[in]weightsWeights tensor info. A 3D tensor with dimensions [3, 3, IFM]. Data type supported: Same as input or QASYMM8/QASYMM8_SIGNED/QSYMM8_PER_CHANNEL when input is QASYMM8/QASYMM8_SIGNED.
[in]biasesBiases tensor info. A 1D tensor with dimensions [IFM]. Must be nullptr if not needed. Data type supported: Same as input, S32 when input is QASYMM8/QASYMM8_SIGNED.
[in]outputDestination tensor. Data type supported: Same as input.
[in]conv_infoPadding and stride information to use for the convolution.
[in]depth_multiplier(Optional) Multiplier to apply to the input's depth in order to retrieve the output's depth. Defaults to 1.
[in]act_info(Optional) Activation layer information in case of a fused activation. Only RELU, BOUNDED_RELU and LU_BOUNDED_RELU are supported.
[in]gpu_target(Optional) GPU target to validate the kernel for. Defaults to midgard.
[in]dilation(Optional) Dilation, in elements, across x and y. Defaults to (1, 1).
[in]output_multipliers(Optional) Output multipliers tensor info for quantized computations. In case of per-channel quantization, the number of multipliers must be equal to the number of filters (IFM). Supported data types: S32
[in]output_shifts(Optional) Output shifts tensor for quantized computations. In case of per-channel quantization, the number of multipliers must be equal to the number of filters (IFM). Supported data types: S32
Returns
a status

Definition at line 371 of file CLDepthwiseConvolutionLayer3x3NCHWKernel.cpp.

References ARM_COMPUTE_RETURN_ON_ERROR, ICloneable< T >::clone(), arm_compute::test::validation::conv_info, kernel_name, and arm_compute::validate_arguments().

374 {
375  std::string kernel_name;
376  ARM_COMPUTE_RETURN_ON_ERROR(validate_arguments(input, weights, biases, output, conv_info, depth_multiplier, act_info, dilation, output_multipliers, output_shifts));
377  ARM_COMPUTE_RETURN_ON_ERROR(validate_and_configure_window(input->clone().get(), weights->clone().get(), output->clone().get(),
378  conv_info, depth_multiplier, gpu_target, kernel_name, dilation)
379  .first);
380 
381  return Status{};
382 }
#define ARM_COMPUTE_RETURN_ON_ERROR(status)
Checks if a status contains an error and returns it.
Definition: Error.h:204
std::string kernel_name
Status validate_arguments(const ITensorInfo *input, const ITensorInfo *bias, const ITensorInfo *output, const GEMMLowpOutputStageInfo *output_stage)

The documentation for this class was generated from the following files: