Compute Library
 21.02
ClPoolingKernel Class Reference

Interface for the pooling layer kernel. More...

#include <ClPoolingKernel.h>

Collaboration diagram for ClPoolingKernel:
[legend]

Public Member Functions

 ClPoolingKernel ()
 Default constructor. More...
 
 ARM_COMPUTE_DISALLOW_COPY_ALLOW_MOVE (ClPoolingKernel)
 
void configure (const ClCompileContext &compile_context, ITensorInfo *src, ITensorInfo *dst, const PoolingLayerInfo &pool_info, ITensorInfo *indices=nullptr)
 Configure kernel for a given list of arguments. More...
 
void run_op (ITensorPack &tensors, const Window &window, cl::CommandQueue &queue) override
 Enqueue the OpenCL kernel to process the given window on the passed OpenCL command queue. More...
 
BorderSize border_size () const override
 The size of the border for that kernel. More...
 
- Public Member Functions inherited from ICLKernel
 ICLKernel ()
 Constructor. More...
 
cl::Kernel & kernel ()
 Returns a reference to the OpenCL kernel of this object. More...
 
template<typename T >
void add_1D_array_argument (unsigned int &idx, const ICLArray< T > *array, const Strides &strides, unsigned int num_dimensions, const Window &window)
 Add the passed 1D array's parameters to the object's kernel's arguments starting from the index idx. More...
 
void add_1D_tensor_argument (unsigned int &idx, const ICLTensor *tensor, const Window &window)
 Add the passed 1D tensor's parameters to the object's kernel's arguments starting from the index idx. More...
 
void add_1D_tensor_argument_if (bool cond, unsigned int &idx, const ICLTensor *tensor, const Window &window)
 Add the passed 1D tensor's parameters to the object's kernel's arguments starting from the index idx if the condition is true. More...
 
void add_2D_tensor_argument (unsigned int &idx, const ICLTensor *tensor, const Window &window)
 Add the passed 2D tensor's parameters to the object's kernel's arguments starting from the index idx. More...
 
void add_2D_tensor_argument_if (bool cond, unsigned int &idx, const ICLTensor *tensor, const Window &window)
 Add the passed 2D tensor's parameters to the object's kernel's arguments starting from the index idx if the condition is true. More...
 
void add_3D_tensor_argument (unsigned int &idx, const ICLTensor *tensor, const Window &window)
 Add the passed 3D tensor's parameters to the object's kernel's arguments starting from the index idx. More...
 
void add_4D_tensor_argument (unsigned int &idx, const ICLTensor *tensor, const Window &window)
 Add the passed 4D tensor's parameters to the object's kernel's arguments starting from the index idx. More...
 
virtual void run (const Window &window, cl::CommandQueue &queue)
 Enqueue the OpenCL kernel to process the given window on the passed OpenCL command queue. More...
 
template<typename T >
void add_argument (unsigned int &idx, T value)
 Add the passed parameters to the object's kernel's arguments starting from the index idx. More...
 
void set_lws_hint (const cl::NDRange &lws_hint)
 Set the Local-Workgroup-Size hint. More...
 
cl::NDRange lws_hint () const
 Return the Local-Workgroup-Size hint. More...
 
void set_wbsm_hint (const cl_int &wbsm_hint)
 Set the workgroup batch size modifier hint. More...
 
cl_int wbsm_hint () const
 Return the workgroup batch size modifier hint. More...
 
const std::string & config_id () const
 Get the configuration ID. More...
 
void set_target (GPUTarget target)
 Set the targeted GPU architecture. More...
 
void set_target (cl::Device &device)
 Set the targeted GPU architecture according to the CL device. More...
 
GPUTarget get_target () const
 Get the targeted GPU architecture. More...
 
size_t get_max_workgroup_size ()
 Get the maximum workgroup size for the device the CLKernelLibrary uses. More...
 
template<unsigned int dimension_size>
void add_tensor_argument (unsigned &idx, const ICLTensor *tensor, const Window &window)
 
template<typename T , unsigned int dimension_size>
void add_array_argument (unsigned &idx, const ICLArray< T > *array, const Strides &strides, unsigned int num_dimensions, const Window &window)
 Add the passed array's parameters to the object's kernel's arguments starting from the index idx. More...
 
- Public Member Functions inherited from IKernel
 IKernel ()
 Constructor. More...
 
virtual ~IKernel ()=default
 Destructor. More...
 
virtual bool is_parallelisable () const
 Indicates whether or not the kernel is parallelisable. More...
 
const Windowwindow () const
 The maximum window the kernel can be executed on. More...
 

Static Public Member Functions

static Status validate (const ITensorInfo *src, const ITensorInfo *dst, const PoolingLayerInfo &pool_info, const ITensorInfo *indices=nullptr)
 Static function to check if given info will lead to a valid configuration of ClPoolingKernel. More...
 
- Static Public Member Functions inherited from ICLKernel
static constexpr unsigned int num_arguments_per_1D_array ()
 Returns the number of arguments enqueued per 1D array object. More...
 
static constexpr unsigned int num_arguments_per_1D_tensor ()
 Returns the number of arguments enqueued per 1D tensor object. More...
 
static constexpr unsigned int num_arguments_per_2D_tensor ()
 Returns the number of arguments enqueued per 2D tensor object. More...
 
static constexpr unsigned int num_arguments_per_3D_tensor ()
 Returns the number of arguments enqueued per 3D tensor object. More...
 
static constexpr unsigned int num_arguments_per_4D_tensor ()
 Returns the number of arguments enqueued per 4D tensor object. More...
 
static cl::NDRange gws_from_window (const Window &window)
 Get the global work size given an execution window. More...
 

Data Fields

PoolingLayerInfo _pool_info
 
DataLayout _data_layout
 
BorderSize _border_size
 
unsigned int _num_elems_processed_per_iteration
 

Detailed Description

Interface for the pooling layer kernel.

Definition at line 38 of file ClPoolingKernel.h.

Constructor & Destructor Documentation

◆ ClPoolingKernel()

Member Function Documentation

◆ ARM_COMPUTE_DISALLOW_COPY_ALLOW_MOVE()

ARM_COMPUTE_DISALLOW_COPY_ALLOW_MOVE ( ClPoolingKernel  )

◆ border_size()

BorderSize border_size ( ) const
overridevirtual

The size of the border for that kernel.

Returns
The width in number of elements of the border.

Reimplemented from IKernel.

Definition at line 203 of file ClPoolingKernel.cpp.

References ClPoolingKernel::_border_size.

204 {
205  return _border_size;
206 }

◆ configure()

void configure ( const ClCompileContext compile_context,
ITensorInfo src,
ITensorInfo dst,
const PoolingLayerInfo pool_info,
ITensorInfo indices = nullptr 
)

Configure kernel for a given list of arguments.

Parameters
[in]compile_contextThe compile context to be used.
[in]srcSource tensor info. Data types supported: QASYMM8/QASYMM8_SIGNED/F16/F32.
[out]dstDestination tensor info. Data types supported: same as src.
[in]pool_infoContains pooling operation information described in PoolingLayerInfo.
[out]indices(optional) The indices of the maximal values. Data type supported: U32.

Definition at line 208 of file ClPoolingKernel.cpp.

References ClPoolingKernel::_border_size, ClPoolingKernel::_data_layout, ClPoolingKernel::_num_elems_processed_per_iteration, ClPoolingKernel::_pool_info, CLBuildOptions::add_option(), CLBuildOptions::add_option_if(), ARM_COMPUTE_ERROR, ARM_COMPUTE_ERROR_ON, ARM_COMPUTE_ERROR_ON_NULLPTR, ARM_COMPUTE_ERROR_THROW_ON, arm_compute::BATCHES, BorderSize::bottom, arm_compute::CHANNEL, arm_compute::create_kernel(), ITensorInfo::data_layout(), PoolingLayerInfo::data_layout, ITensorInfo::data_type(), ITensorInfo::dimension(), arm_compute::test::validation::dst, PoolingLayerInfo::exclude_padding, arm_compute::F16, arm_compute::F32, arm_compute::float_to_string_with_full_precision(), PoolingLayerInfo::fp_mixed_precision, arm_compute::get_cl_type_from_data_type(), arm_compute::get_data_layout_dimension_index(), arm_compute::get_min_max(), arm_compute::get_padding_info(), arm_compute::has_padding_changed(), Size2D::height, arm_compute::HEIGHT, arm_compute::is_data_type_float(), arm_compute::is_data_type_quantized(), arm_compute::is_data_type_quantized_asymmetric(), PoolingLayerInfo::is_global_pooling, kernel_name, BorderSize::left, arm_compute::lower_string(), arm_compute::support::cpp11::lowest(), arm_compute::MAX, arm_compute::NCHW, arm_compute::NHWC, UniformQuantizationInfo::offset, CLBuildOptions::options(), PadStrideInfo::pad_left(), PoolingLayerInfo::pad_stride_info, PadStrideInfo::pad_top(), ITensorInfo::padding(), PoolingLayerInfo::pool_size, PoolingLayerInfo::pool_type, ITensorInfo::quantization_info(), BorderSize::right, arm_compute::S32, UniformQuantizationInfo::scale, arm_compute::test::validation::src, PadStrideInfo::stride(), arm_compute::string_from_data_layout(), arm_compute::string_from_data_type(), arm_compute::string_from_pooling_type(), arm_compute::support::cpp11::to_string(), BorderSize::top, type_min, QuantizationInfo::uniform(), arm_compute::UNKNOWN, arm_compute::validate_arguments(), Size2D::width, and arm_compute::WIDTH.

209 {
211 
212  auto padding_info = get_padding_info({ src, dst, indices });
213 
214  // Set instance variables
215  _pool_info = pool_info;
216  _data_layout = pool_info.data_layout == DataLayout::UNKNOWN ? src->data_layout() : pool_info.data_layout;
217  int pool_stride_x = 0;
218  int pool_stride_y = 0;
219  const PoolingType pool_type = pool_info.pool_type;
224  const int pool_size_x = pool_info.is_global_pooling ? src->dimension(idx_width) : pool_info.pool_size.width;
225  const int pool_size_y = pool_info.is_global_pooling ? src->dimension(idx_height) : pool_info.pool_size.height;
226  const PadStrideInfo pad_stride_info = pool_info.pad_stride_info;
227  const bool exclude_padding = pool_info.exclude_padding;
228  std::tie(pool_stride_x, pool_stride_y) = pad_stride_info.stride();
229  const int pool_pad_top = pad_stride_info.pad_top();
230  const int pool_pad_left = pad_stride_info.pad_left();
231 
232  // Set build options
233  CLBuildOptions build_opts;
234  const DataType data_type = src->data_type();
235 
236  // Configure kernel window
237  auto win_config = validate_and_configure_window(src, dst, pool_info, indices);
238 
239  ARM_COMPUTE_ERROR_THROW_ON(std::get<0>(win_config));
240  ICLKernel::configure_internal(std::get<1>(win_config));
241 
242  ClPoolingConfig pooling_config = std::get<2>(win_config);
243  _num_elems_processed_per_iteration = pooling_config.first;
244  _border_size = pooling_config.second;
245 
246  build_opts.add_option("-DVEC_SIZE=" + support::cpp11::to_string(_num_elems_processed_per_iteration));
247 
248  // Tensor paddings are used to calculate the indicies for MAX pooling
249  if(pool_info.pool_size == Size2D(2, 2) && pool_type == PoolingType::MAX && indices && is_data_type_float(data_type))
250  {
251  build_opts.add_option("-DPAD_TENSOR_LEFT=" + support::cpp11::to_string(src->padding().left));
252  build_opts.add_option("-DPAD_TENSOR_RIGHT=" + support::cpp11::to_string(src->padding().right));
253  build_opts.add_option("-DPAD_TENSOR_TOP=" + support::cpp11::to_string(src->padding().top));
254  build_opts.add_option("-DPAD_TENSOR_BOTTOM=" + support::cpp11::to_string(src->padding().bottom));
255  build_opts.add_option("-DTENSOR_CHANNEL=" + support::cpp11::to_string(src->dimension(idx_channel)));
256  build_opts.add_option("-DTENSOR_WIDTH=" + support::cpp11::to_string(src->dimension(idx_width)));
257  build_opts.add_option("-DTENSOR_HEIGHT=" + support::cpp11::to_string(src->dimension(idx_height)));
258  }
259 
260  if(is_data_type_quantized_asymmetric(data_type) && src->quantization_info() != dst->quantization_info())
261  {
262  const UniformQuantizationInfo iq_info = src->quantization_info().uniform();
263  const UniformQuantizationInfo oq_info = dst->quantization_info().uniform();
264 
265  build_opts.add_option("-DOFFSET_IN1=" + float_to_string_with_full_precision(iq_info.offset));
266  build_opts.add_option("-DOFFSET_OUT=" + float_to_string_with_full_precision(oq_info.offset));
267  build_opts.add_option("-DSCALE_IN1=" + float_to_string_with_full_precision(iq_info.scale));
268  build_opts.add_option("-DSCALE_OUT=" + float_to_string_with_full_precision(oq_info.scale));
269  }
270 
271  // Check dst dimensions
272  auto_init(src, dst, indices, pool_info);
273 
274  ARM_COMPUTE_ERROR_THROW_ON(validate_arguments(src, dst, pool_info, indices));
275 
276  build_opts.add_option("-DDATA_TYPE=" + get_cl_type_from_data_type(data_type));
277  build_opts.add_option("-DPOOL_" + string_from_pooling_type(pool_type));
278  build_opts.add_option("-DSTRIDE_X=" + support::cpp11::to_string(pool_stride_x));
279  build_opts.add_option("-DSTRIDE_Y=" + support::cpp11::to_string(pool_stride_y));
280  build_opts.add_option("-DPAD_X=" + support::cpp11::to_string(pool_pad_left));
281  build_opts.add_option("-DPAD_Y=" + support::cpp11::to_string(pool_pad_top));
282  build_opts.add_option("-DPOOL_SIZE_X=" + support::cpp11::to_string(pool_size_x));
283  build_opts.add_option("-DPOOL_SIZE_Y=" + support::cpp11::to_string(pool_size_y));
284 
285  // Set the initial value for the pooling operation accordingly with the data type
286  if(pool_type == PoolingType::MAX)
287  {
288  if(is_data_type_quantized(data_type))
289  {
290  PixelValue type_min{};
291  std::tie(type_min, std::ignore) = get_min_max(data_type);
292  build_opts.add_option("-DINITIAL_VALUE=" + support::cpp11::to_string(type_min.get<int32_t>()));
293  }
294  else
295  {
296  build_opts.add_option("-DINITIAL_VALUE=" + float_to_string_with_full_precision(std::numeric_limits<float>::lowest()));
297  }
298  }
299  else
300  {
301  // Pool AVG and Pool L2 initial value
302  build_opts.add_option("-DINITIAL_VALUE=0");
303  }
304 
305  build_opts.add_option("-DMAX_WIDTH=" + support::cpp11::to_string(src->dimension(idx_width) + (exclude_padding ? 0 : pool_pad_left)));
306  build_opts.add_option("-DMAX_HEIGHT=" + support::cpp11::to_string(src->dimension(idx_height) + (exclude_padding ? 0 : pool_pad_top)));
307 
308  // Create kernel
309  switch(_data_layout)
310  {
311  case DataLayout::NCHW:
312  {
313  const auto use_fp_mixed_precision = (data_type == DataType::F16) && pool_info.fp_mixed_precision;
314  const auto use_wider_accumulator = use_fp_mixed_precision && (pool_type != PoolingType::MAX);
315  const auto acc_data_type = get_cl_type_from_data_type(use_wider_accumulator ? DataType::F32 : data_type);
316  build_opts.add_option("-DACC_DATA_TYPE=" + acc_data_type);
317  build_opts.add_option_if(use_wider_accumulator, "-DFP_MIXED_PRECISION");
318 
319  if(pool_type != PoolingType::MAX)
320  {
321  build_opts.add_option_if(exclude_padding, "-DEXCLUDE_PADDING");
322  }
323 
324  if((pool_size_x == 3) && (pool_size_y == 3) && !is_data_type_quantized_asymmetric(data_type))
325  {
326  // Check if we have pool3x3 with stride_x less equal than 3. In these cases, run an optimized OpenCL kernel where
327  // each thread computes 4 dst elements
328  const bool is_pool3x3_stride_le3 = (pool_size_x == 3) && (pool_size_y == 3) && (pool_stride_x <= 3);
329 
330  std::string kernel_name = ((is_pool3x3_stride_le3) ? "pooling_layer_optimized_" : "pooling_layer_")
331  + support::cpp11::to_string(pool_size_x);
332  _kernel = create_kernel(compile_context, kernel_name, build_opts.options());
333  }
334  else if(pool_info.pool_size == Size2D(2, 2) && pool_type == PoolingType::MAX && indices && is_data_type_float(data_type))
335  {
336  // For max pooling with pool2x2, store indicies which will be used in max unpooling
337  if(data_type == DataType::F32)
338  {
339  std::string kernel_name = "pooling_layer_2_nchw_indices_fp32";
340  _kernel = create_kernel(compile_context, kernel_name, build_opts.options());
341  }
342  else if(data_type == DataType::F16)
343  {
344  std::string kernel_name = "pooling_layer_2_nchw_indices_fp16";
345  _kernel = create_kernel(compile_context, kernel_name, build_opts.options());
346  }
347  }
348  else // Run general case
349  {
350  std::string kernel_name = is_data_type_quantized_asymmetric(data_type) ? "pooling_layer_MxN_quantized_nchw" : "pooling_layer_MxN_nchw";
351  _kernel = create_kernel(compile_context, kernel_name, build_opts.options());
352  }
353  break;
354  }
355  case DataLayout::NHWC:
356  {
357  // Floating point mixed precision is support on F16 only
358  const auto use_fp_mixed_precision = (data_type == DataType::F16) && pool_info.fp_mixed_precision && pool_type != PoolingType::MAX;
359 
360  // Wider accumulation is required to avoid accuracy loss
361  // Case 1: Floating point mixed precision (fp16 src data and fp32 accumulation)
362  // Cast 2: Quantized (int8/uint8 src data and int32 accumulation )
363  DataType acc_data_type = data_type;
364 
365  if(use_fp_mixed_precision)
366  {
367  acc_data_type = DataType::F32;
368  }
369  else if(is_data_type_quantized(data_type) && pool_type != PoolingType::MAX)
370  {
371  acc_data_type = DataType::S32;
372  }
373 
374  build_opts.add_option("-DACC_DATA_TYPE=" + get_cl_type_from_data_type(acc_data_type));
375  build_opts.add_option_if(use_fp_mixed_precision, "-DFP_MIXED_PRECISION");
376  build_opts.add_option_if(exclude_padding, "-DEXCLUDE_PADDING");
377  build_opts.add_option("-DSRC_WIDTH=" + support::cpp11::to_string(src->dimension(idx_width)));
378  build_opts.add_option("-DSRC_HEIGHT=" + support::cpp11::to_string(src->dimension(idx_height)));
379  build_opts.add_option("-DDST_HEIGHT=" + support::cpp11::to_string(dst->dimension(idx_height)));
380  build_opts.add_option("-DDST_CHANNELS=" + support::cpp11::to_string(dst->dimension(idx_channel)));
381  build_opts.add_option("-DDST_BATCH_SIZE=" + support::cpp11::to_string(dst->dimension(idx_batch_size)));
382  build_opts.add_option("-DVEC_SIZE_LEFTOVER=" + support::cpp11::to_string(src->dimension(0) % _num_elems_processed_per_iteration));
383  if(pool_info.pool_size == Size2D(2, 2) && is_data_type_float(data_type))
384  {
385  build_opts.add_option_if(indices != nullptr && pool_type == PoolingType::MAX, "-DEXTRACT_MAX_INDEX");
386 
387  std::string kernel_name = "pooling_layer_2x2_nhwc";
388  _kernel = create_kernel(compile_context, kernel_name, build_opts.options());
389  }
390  else
391  {
392  std::string kernel_name = is_data_type_quantized_asymmetric(data_type) ? "pooling_layer_MxN_quantized_nhwc" : "pooling_layer_MxN_nhwc";
393  _kernel = create_kernel(compile_context, kernel_name, build_opts.options());
394  }
395  break;
396  }
397  default:
398  ARM_COMPUTE_ERROR("Not implemented");
399  }
400 
401  // Set config_id for enabling LWS tuning
402  _config_id = "pooling_layer_";
403  _config_id += lower_string(string_from_data_type(data_type));
404  _config_id += "_";
406  _config_id += "_";
407  _config_id += support::cpp11::to_string(dst->dimension(idx_width));
408  _config_id += "_";
409  _config_id += support::cpp11::to_string(dst->dimension(idx_height));
410  _config_id += "_";
411  _config_id += support::cpp11::to_string(dst->dimension(idx_channel));
412  _config_id += "_";
413  _config_id += lower_string(string_from_data_layout(src->data_layout()));
414 
415  ARM_COMPUTE_ERROR_ON(src->data_layout() == DataLayout::NHWC && has_padding_changed(padding_info));
416 }
bool is_data_type_quantized(DataType dt)
Check if a given data type is of quantized type.
Definition: Utils.h:1168
#define ARM_COMPUTE_ERROR(msg)
Print the given message then throw an std::runtime_error.
Definition: Error.h:352
std::string to_string(T &&value)
Convert integer and float values to string.
1 channel, 1 F32 per channel
#define ARM_COMPUTE_ERROR_ON(cond)
If the condition is true then an error message is printed and an exception thrown.
Definition: Error.h:466
#define ARM_COMPUTE_ERROR_THROW_ON(status)
Definition: Error.h:455
std::string lower_string(const std::string &val)
Lower a given string.
Definition: Utils.cpp:350
SimpleTensor< float > src
Definition: DFT.cpp:155
int pool_stride_x
1 channel, 1 F16 per channel
1 channel, 1 S32 per channel
const DataType data_type
Definition: Im2Col.cpp:150
cl::Kernel create_kernel(const CLCompileContext &ctx, const std::string &kernel_name, const std::set< std::string > &build_opts=std::set< std::string >())
Creates an opencl kernel using a compile context.
Definition: CLHelpers.cpp:403
const std::string & string_from_data_type(DataType dt)
Convert a data type identity into a string.
Definition: Utils.cpp:135
std::string float_to_string_with_full_precision(float val)
Create a string with the float in full precision.
Definition: Utils.h:1262
std::string kernel_name
std::string get_cl_type_from_data_type(const DataType &dt)
Translates a tensor data type to the appropriate OpenCL type.
Definition: CLHelpers.cpp:37
bool has_padding_changed(const std::unordered_map< const ITensorInfo *, PaddingSize > &padding_map)
Check if the previously stored padding info has changed after configuring a kernel.
Definition: Utils.cpp:528
Num samples, channels, height, width.
bool is_data_type_quantized_asymmetric(DataType dt)
Check if a given data type is of asymmetric quantized type.
Definition: Utils.h:1190
__constant DATA_TYPE16 type_min
Definition: minmaxloc.cl:46
PoolingType
Available pooling types.
Definition: Types.h:610
const std::string & string_from_data_layout(DataLayout dl)
Convert a data layout identity into a string.
Definition: Utils.cpp:123
Num samples, height, width, channels.
std::unordered_map< const ITensorInfo *, PaddingSize > get_padding_info(std::initializer_list< const ITensorInfo *> infos)
Stores padding information before configuring a kernel.
Definition: Utils.cpp:513
Status validate_arguments(const ITensorInfo *input, const ITensorInfo *bias, const ITensorInfo *output, const GEMMLowpOutputStageInfo *output_stage)
#define ARM_COMPUTE_ERROR_ON_NULLPTR(...)
Definition: Validate.h:161
size_t get_data_layout_dimension_index(const DataLayout data_layout, const DataLayoutDimension data_layout_dimension)
Get the index of the given dimension.
Definition: Helpers.inl:193
DataType
Available data types.
Definition: Types.h:77
const std::string & string_from_pooling_type(PoolingType type)
Translates a given pooling type to a string.
Definition: Utils.cpp:248
std::tuple< PixelValue, PixelValue > get_min_max(DataType dt)
Compute the mininum and maximum values a data type can take.
Definition: Utils.h:564
bool is_data_type_float(DataType dt)
Check if a given data type is of floating point type.
Definition: Utils.h:1148

◆ run_op()

void run_op ( ITensorPack tensors,
const Window window,
cl::CommandQueue &  queue 
)
overridevirtual

Enqueue the OpenCL kernel to process the given window on the passed OpenCL command queue.

Note
The queue is not flushed by this method, and therefore the kernel will not have been executed by the time this method returns.
Parameters
[in]tensorsA vector containing the tensors to operato on.
[in]windowRegion on which to execute the kernel. (Must be a valid region of the window returned by window()).
[in,out]queueCommand queue on which to enqueue the kernel.

Reimplemented from ICLKernel.

Definition at line 426 of file ClPoolingKernel.cpp.

References ClPoolingKernel::_data_layout, ClPoolingKernel::_num_elems_processed_per_iteration, ClPoolingKernel::_pool_info, arm_compute::ACL_DST_0, arm_compute::ACL_DST_1, arm_compute::ACL_SRC, ICLKernel::add_3D_tensor_argument(), ICLKernel::add_4D_tensor_argument(), ARM_COMPUTE_ERROR, ARM_COMPUTE_ERROR_ON_INVALID_SUBWINDOW, ARM_COMPUTE_ERROR_ON_UNCONFIGURED_KERNEL, Window::collapse_if_possible(), Window::DimX, Window::DimY, Window::DimZ, Window::Dimension::end(), arm_compute::enqueue(), Window::first_slice_window_4D(), ITensorPack::get_const_tensor(), ITensorPack::get_tensor(), arm_compute::is_data_type_float(), ICLKernel::lws_hint(), arm_compute::MAX, arm_compute::NCHW, arm_compute::NHWC, PadStrideInfo::pad_left(), PoolingLayerInfo::pad_stride_info, PadStrideInfo::pad_top(), PoolingLayerInfo::pool_size, pool_stride_x, PoolingLayerInfo::pool_type, Window::set(), arm_compute::test::validation::reference::slice(), Window::slide_window_slice_4D(), Window::Dimension::start(), PadStrideInfo::stride(), IKernel::window(), Window::x(), and Window::y().

427 {
430 
431  unsigned int pool_stride_x = 0;
432  unsigned int pool_stride_y = 0;
433  std::tie(pool_stride_x, pool_stride_y) = _pool_info.pad_stride_info.stride();
434 
435  const auto src = utils::cast::polymorphic_downcast<const ICLTensor *>(tensors.get_const_tensor(TensorType::ACL_SRC));
436  auto dst = utils::cast::polymorphic_downcast<ICLTensor *>(tensors.get_tensor(TensorType::ACL_DST_0));
437  auto indices = utils::cast::polymorphic_downcast<ICLTensor *>(tensors.get_tensor(TensorType::ACL_DST_1));
438 
439  // Collapse window
440  Window window_collapsed = window.collapse_if_possible(ICLKernel::window(), Window::DimZ);
441 
442  switch(_data_layout)
443  {
444  case DataLayout::NCHW:
445  {
446  Window slice = window_collapsed.first_slice_window_3D();
447  do
448  {
449  // Upsample src by pool size
450  Window in_slice(slice);
451  in_slice.set(Window::DimX, Window::Dimension(in_slice.x().start() - _pool_info.pad_stride_info.pad_left(),
452  (in_slice.x().end() - _pool_info.pad_stride_info.pad_left()) * pool_stride_x,
453  pool_stride_x * _num_elems_processed_per_iteration));
454  in_slice.set(Window::DimY, Window::Dimension(in_slice.y().start() - _pool_info.pad_stride_info.pad_top(),
455  (in_slice.y().end() - _pool_info.pad_stride_info.pad_top()) * pool_stride_y,
456  pool_stride_y));
457 
458  // Set srcs
459  unsigned int idx = 0;
460  add_3D_tensor_argument(idx, src, in_slice);
461  add_3D_tensor_argument(idx, dst, slice);
462  if(indices && is_data_type_float(src->info()->data_type()) && (_pool_info.pool_size == Size2D(2, 2)))
463  {
464  add_3D_tensor_argument(idx, indices, slice);
465  }
466  enqueue(queue, *this, slice, lws_hint());
467  }
468  while(window_collapsed.slide_window_slice_3D(slice));
469  break;
470  }
471  case DataLayout::NHWC:
472  {
473  const size_t batch_size = dst->info()->tensor_shape().total_size_upper(3);
474 
475  Window slice = window_collapsed.first_slice_window_4D();
476  Window in_slice = window_collapsed.first_slice_window_4D();
477  in_slice.set(Window::DimX, Window::Dimension(0, src->info()->dimension(0), _num_elems_processed_per_iteration));
478  in_slice.set(Window::DimY, Window::Dimension(0, src->info()->dimension(1), pool_stride_x));
479  in_slice.set(Window::DimZ, Window::Dimension(0, src->info()->dimension(2), pool_stride_y));
480  in_slice.set(3, Window::Dimension(0, batch_size, 1));
481  do
482  {
483  // Set srcs
484  unsigned int idx = 0;
485  add_4D_tensor_argument(idx, src, in_slice);
486  add_4D_tensor_argument(idx, dst, slice);
487  if(indices && is_data_type_float(src->info()->data_type()) && (_pool_info.pool_type == PoolingType::MAX) && (_pool_info.pool_size == Size2D(2, 2)))
488  {
489  add_4D_tensor_argument(idx, indices, slice);
490  }
491  enqueue(queue, *this, slice, lws_hint());
492  }
493  while(window.slide_window_slice_4D(slice) && window.slide_window_slice_4D(in_slice));
494  break;
495  }
496  default:
497  ARM_COMPUTE_ERROR("Not implemented");
498  }
499 }
const Window & window() const
The maximum window the kernel can be executed on.
Definition: IKernel.cpp:28
void enqueue(IGCKernel &kernel, const Window &window, const gles::NDRange &lws=gles::NDRange(1U, 1U, 1U))
Add the kernel to the command queue with the given window.
Definition: IGCKernel.cpp:41
#define ARM_COMPUTE_ERROR(msg)
Print the given message then throw an std::runtime_error.
Definition: Error.h:352
cl::NDRange lws_hint() const
Return the Local-Workgroup-Size hint.
Definition: ICLKernel.h:276
unsigned int pad_top() const
Get the top padding.
Definition: Types.h:806
void add_3D_tensor_argument(unsigned int &idx, const ICLTensor *tensor, const Window &window)
Add the passed 3D tensor&#39;s parameters to the object&#39;s kernel&#39;s arguments starting from the index idx...
Definition: ICLKernel.h:172
SimpleTensor< float > src
Definition: DFT.cpp:155
int pool_stride_x
static constexpr size_t DimX
Alias for dimension 0 also known as X dimension.
Definition: Window.h:43
Window collapse_if_possible(const Window &full_window, size_t first, size_t last, bool *has_collapsed=nullptr) const
Collapse the dimensions between first and last if possible.
Definition: Window.inl:68
std::pair< unsigned int, unsigned int > stride() const
Get the stride.
Definition: Types.h:770
#define ARM_COMPUTE_ERROR_ON_UNCONFIGURED_KERNEL(k)
Definition: Validate.h:941
Num samples, channels, height, width.
static constexpr size_t DimY
Alias for dimension 1 also known as Y dimension.
Definition: Window.h:45
PadStrideInfo pad_stride_info
Definition: Types.h:1302
static constexpr size_t DimZ
Alias for dimension 2 also known as Z dimension.
Definition: Window.h:47
Num samples, height, width, channels.
bool slide_window_slice_4D(Window &slice) const
Slide the passed 4D window slice.
Definition: Window.h:347
void add_4D_tensor_argument(unsigned int &idx, const ICLTensor *tensor, const Window &window)
Add the passed 4D tensor&#39;s parameters to the object&#39;s kernel&#39;s arguments starting from the index idx...
Definition: ICLKernel.h:182
unsigned int pad_left() const
Get the left padding.
Definition: Types.h:796
bool is_data_type_float(DataType dt)
Check if a given data type is of floating point type.
Definition: Utils.h:1148
#define ARM_COMPUTE_ERROR_ON_INVALID_SUBWINDOW(f, s)
Definition: Validate.h:205
SimpleTensor< T > slice(const SimpleTensor< T > &src, Coordinates starts, Coordinates ends)

◆ validate()

Status validate ( const ITensorInfo src,
const ITensorInfo dst,
const PoolingLayerInfo pool_info,
const ITensorInfo indices = nullptr 
)
static

Static function to check if given info will lead to a valid configuration of ClPoolingKernel.

Parameters
[in]srcSource tensor info. Data types supported: QASYMM8/QASYMM8_SIGNED/F16/F32.
[in]dstDestination tensor info. Data types supported: same as src.
[in]pool_infoContains pooling operation information described in PoolingLayerInfo.
[in]indices(optional) The indices of the maximal values. Data type supported: U32.
Returns
a status

Definition at line 418 of file ClPoolingKernel.cpp.

References ARM_COMPUTE_RETURN_ON_ERROR, ICloneable< T >::clone(), and arm_compute::validate_arguments().

Referenced by ClPooling::validate().

419 {
420  ARM_COMPUTE_RETURN_ON_ERROR(validate_arguments(src, dst, pool_info, indices));
421  ARM_COMPUTE_RETURN_ON_ERROR(std::get<0>(validate_and_configure_window(src->clone().get(), dst->clone().get(), pool_info)));
422 
423  return Status{};
424 }
#define ARM_COMPUTE_RETURN_ON_ERROR(status)
Checks if a status contains an error and returns it.
Definition: Error.h:204
SimpleTensor< float > src
Definition: DFT.cpp:155
Status validate_arguments(const ITensorInfo *input, const ITensorInfo *bias, const ITensorInfo *output, const GEMMLowpOutputStageInfo *output_stage)

Field Documentation

◆ _border_size

BorderSize _border_size

Definition at line 73 of file ClPoolingKernel.h.

Referenced by ClPoolingKernel::border_size(), and ClPoolingKernel::configure().

◆ _data_layout

DataLayout _data_layout

Definition at line 72 of file ClPoolingKernel.h.

Referenced by ClPoolingKernel::configure(), and ClPoolingKernel::run_op().

◆ _num_elems_processed_per_iteration

unsigned int _num_elems_processed_per_iteration

Definition at line 74 of file ClPoolingKernel.h.

Referenced by ClPoolingKernel::configure(), and ClPoolingKernel::run_op().

◆ _pool_info

PoolingLayerInfo _pool_info

Definition at line 71 of file ClPoolingKernel.h.

Referenced by ClPoolingKernel::configure(), and ClPoolingKernel::run_op().


The documentation for this class was generated from the following files: