Compute Library
 22.11
ClPool2dKernel Class Reference

Interface for the pooling layer kernel. More...

#include <ClPool2dKernel.h>

Collaboration diagram for ClPool2dKernel: [diagram and legend omitted]

Public Member Functions

 ClPool2dKernel ()
 
 ARM_COMPUTE_DISALLOW_COPY_ALLOW_MOVE (ClPool2dKernel)
 
void configure (const ClCompileContext &compile_context, ITensorInfo *src, ITensorInfo *dst, const PoolingLayerInfo &pool_info, ITensorInfo *indices=nullptr)
 Configure kernel for a given list of arguments. More...
 
void run_op (ITensorPack &tensors, const Window &window, cl::CommandQueue &queue) override
 Enqueue the OpenCL kernel to process the given window on the passed OpenCL command queue. More...
 
- Public Member Functions inherited from ICLKernel
 ICLKernel ()
 Constructor. More...
 
cl::Kernel & kernel ()
 Returns a reference to the OpenCL kernel of this object. More...
 
CLKernelType type () const
 Returns the CL kernel type. More...
 
template<typename T >
void add_1D_array_argument (unsigned int &idx, const ICLArray< T > *array, const Strides &strides, unsigned int num_dimensions, const Window &window)
 Add the passed 1D array's parameters to the object's kernel's arguments starting from the index idx. More...
 
void add_1D_tensor_argument (unsigned int &idx, const ICLTensor *tensor, const Window &window)
 Add the passed 1D tensor's parameters to the object's kernel's arguments starting from the index idx. More...
 
void add_1D_tensor_argument_if (bool cond, unsigned int &idx, const ICLTensor *tensor, const Window &window)
 Add the passed 1D tensor's parameters to the object's kernel's arguments starting from the index idx if the condition is true. More...
 
void add_2D_tensor_argument (unsigned int &idx, const ICLTensor *tensor, const Window &window)
 Add the passed 2D tensor's parameters to the object's kernel's arguments starting from the index idx. More...
 
void add_2D_tensor_argument_if (bool cond, unsigned int &idx, const ICLTensor *tensor, const Window &window)
 Add the passed 2D tensor's parameters to the object's kernel's arguments starting from the index idx if the condition is true. More...
 
void add_3D_tensor_argument (unsigned int &idx, const ICLTensor *tensor, const Window &window)
 Add the passed 3D tensor's parameters to the object's kernel's arguments starting from the index idx. More...
 
void add_4D_tensor_argument (unsigned int &idx, const ICLTensor *tensor, const Window &window)
 Add the passed 4D tensor's parameters to the object's kernel's arguments starting from the index idx. More...
 
void add_5D_tensor_argument (unsigned int &idx, const ICLTensor *tensor, const Window &window)
 Add the passed 5D tensor's parameters to the object's kernel's arguments starting from the index idx. More...
 
void add_3d_tensor_nhw_argument (unsigned int &idx, const ICLTensor *tensor)
 Add the passed NHW 3D tensor's parameters to the object's kernel's arguments by passing strides, dimensions and the offset to the first valid element in bytes. More...
 
void add_4d_tensor_nhwc_argument (unsigned int &idx, const ICLTensor *tensor)
 Add the passed NHWC 4D tensor's parameters to the object's kernel's arguments by passing strides, dimensions and the offset to the first valid element in bytes. More...
 
virtual void run (const Window &window, cl::CommandQueue &queue)
 Enqueue the OpenCL kernel to process the given window on the passed OpenCL command queue. More...
 
virtual void run_composite_op (ITensorPack &tensors, const Window &window, cl::CommandQueue &queue, const experimental::dynamic_fusion::ClExecutionDescriptor &exec_desc)
 The execution is carried out through the run_op() method, but run_op() needs to be extended to include a ClExecutionDescriptor now that LWS/GWS tuning is separated from the IKernel. More...
 
template<typename T >
void add_argument (unsigned int &idx, T value)
 Add the passed parameters to the object's kernel's arguments starting from the index idx. More...
 
void set_lws_hint (const cl::NDRange &lws_hint)
 Set the Local-Workgroup-Size hint. More...
 
cl::NDRange lws_hint () const
 Return the Local-Workgroup-Size hint. More...
 
void set_wbsm_hint (const cl_int &wbsm_hint)
 Set the workgroup batch size modifier hint. More...
 
cl_int wbsm_hint () const
 Return the workgroup batch size modifier hint. More...
 
const std::string & config_id () const
 Get the configuration ID. More...
 
void set_target (GPUTarget target)
 Set the targeted GPU architecture. More...
 
void set_target (cl::Device &device)
 Set the targeted GPU architecture according to the CL device. More...
 
GPUTarget get_target () const
 Get the targeted GPU architecture. More...
 
size_t get_max_workgroup_size ()
 Get the maximum workgroup size for the device the CLKernelLibrary uses. More...
 
template<unsigned int dimension_size>
void add_tensor_argument (unsigned &idx, const ICLTensor *tensor, const Window &window)
 
template<typename T , unsigned int dimension_size>
void add_array_argument (unsigned &idx, const ICLArray< T > *array, const Strides &strides, unsigned int num_dimensions, const Window &window)
 Add the passed array's parameters to the object's kernel's arguments starting from the index idx. More...
 
- Public Member Functions inherited from IKernel
 IKernel ()
 Constructor. More...
 
virtual ~IKernel ()=default
 Destructor. More...
 
virtual bool is_parallelisable () const
 Indicates whether or not the kernel is parallelisable. More...
 
virtual BorderSize border_size () const
 The size of the border for that kernel. More...
 
const Window & window () const
 The maximum window the kernel can be executed on. More...
 
bool is_window_configured () const
 Function to check if the embedded window of this kernel has been configured. More...
 

Static Public Member Functions

static Status validate (const ITensorInfo *src, const ITensorInfo *dst, const PoolingLayerInfo &pool_info, const ITensorInfo *indices=nullptr)
 Static function to check if given info will lead to a valid configuration. More...
 
- Static Public Member Functions inherited from ICLKernel
static constexpr unsigned int num_arguments_per_3d_tensor_nhw ()
 Returns the number of arguments enqueued per NHW 3D Tensor object. More...
 
static constexpr unsigned int num_arguments_per_4d_tensor_nhwc ()
 Returns the number of arguments enqueued per NHWC 4D Tensor object. More...
 
static constexpr unsigned int num_arguments_per_1D_array ()
 Returns the number of arguments enqueued per 1D array object. More...
 
static constexpr unsigned int num_arguments_per_1D_tensor ()
 Returns the number of arguments enqueued per 1D tensor object. More...
 
static constexpr unsigned int num_arguments_per_2D_tensor ()
 Returns the number of arguments enqueued per 2D tensor object. More...
 
static constexpr unsigned int num_arguments_per_3D_tensor ()
 Returns the number of arguments enqueued per 3D tensor object. More...
 
static constexpr unsigned int num_arguments_per_4D_tensor ()
 Returns the number of arguments enqueued per 4D tensor object. More...
 
static cl::NDRange gws_from_window (const Window &window)
 Get the global work size given an execution window. More...
 

Data Fields

PoolingLayerInfo _pool_info {}
 
DataLayout _data_layout { DataLayout::UNKNOWN }
 
unsigned int _num_elems_processed_per_iteration { 1 }
 

Detailed Description

Interface for the pooling layer kernel.

Definition at line 38 of file ClPool2dKernel.h.

Constructor & Destructor Documentation

◆ ClPool2dKernel()

Definition at line 94 of file ClPool2dKernel.cpp.

References arm_compute::POOL.

95 {
96  _type = CLKernelType::POOL;
97 }

Member Function Documentation

◆ ARM_COMPUTE_DISALLOW_COPY_ALLOW_MOVE()

ARM_COMPUTE_DISALLOW_COPY_ALLOW_MOVE ( ClPool2dKernel  )

◆ configure()

void configure ( const ClCompileContext & compile_context,
ITensorInfo * src,
ITensorInfo * dst,
const PoolingLayerInfo & pool_info,
ITensorInfo * indices = nullptr
)

Configure kernel for a given list of arguments.

Parameters
[in]  compile_context  The compile context to be used.
[in]  src              Source tensor info. Data types supported: QASYMM8/QASYMM8_SIGNED/F16/F32.
[out] dst              Destination tensor info. Data types supported: same as src.
[in]  pool_info        Contains pooling operation information described in PoolingLayerInfo.
[out] indices          (optional) The indices of the maximal values. Data type supported: U32.

Definition at line 99 of file ClPool2dKernel.cpp.

References CLBuildOptions::add_option(), CLBuildOptions::add_option_if(), arm_compute::adjust_vec_size(), ARM_COMPUTE_ERROR, ARM_COMPUTE_ERROR_ON, ARM_COMPUTE_ERROR_ON_NULLPTR, ARM_COMPUTE_ERROR_THROW_ON, arm_compute::auto_init_if_empty(), arm_compute::BATCHES, arm_compute::calculate_max_window(), arm_compute::CHANNEL, ICloneable< T >::clone(), arm_compute::misc::shape_calculator::compute_pool_shape(), arm_compute::create_kernel(), ITensorInfo::data_layout(), PoolingLayerInfo::data_layout, arm_compute::test::validation::data_type, ITensorInfo::data_type(), ITensorInfo::dimension(), arm_compute::test::validation::dst, PoolingLayerInfo::exclude_padding, arm_compute::F16, arm_compute::F32, arm_compute::float_to_string_with_full_precision(), PoolingLayerInfo::fp_mixed_precision, arm_compute::get_cl_type_from_data_type(), arm_compute::get_data_layout_dimension_index(), arm_compute::get_min_max(), arm_compute::get_padding_info(), arm_compute::has_padding_changed(), Size2D::height, arm_compute::HEIGHT, arm_compute::is_data_type_float(), arm_compute::is_data_type_quantized(), arm_compute::is_data_type_quantized_asymmetric(), PoolingLayerInfo::is_global_pooling, kernel_name, arm_compute::lower_string(), arm_compute::support::cpp11::lowest(), arm_compute::MAX, arm_compute::NCHW, arm_compute::NHWC, UniformQuantizationInfo::offset, CLBuildOptions::options(), PadStrideInfo::pad_left(), PoolingLayerInfo::pad_stride_info, PadStrideInfo::pad_top(), PoolingLayerInfo::pool_size, PoolingLayerInfo::pool_type, ITensorInfo::quantization_info(), arm_compute::S32, UniformQuantizationInfo::scale, arm_compute::test::validation::src, PadStrideInfo::stride(), arm_compute::string_from_data_layout(), arm_compute::string_from_data_type(), arm_compute::string_from_pooling_type(), ITensorInfo::tensor_shape(), arm_compute::support::cpp11::to_string(), TensorShape::total_size_lower(), arm_compute::U32, QuantizationInfo::uniform(), arm_compute::UNKNOWN, arm_compute::cpu::kernels::validate_arguments(), Size2D::width, and arm_compute::WIDTH.

100 {
102  ARM_COMPUTE_ERROR_THROW_ON(validate_arguments(src, dst, pool_info, indices));
103 
104  auto padding_info = get_padding_info({ src, dst, indices });
105 
106  // Auto init if empty
107  TensorShape out_shape = compute_pool_shape(*src, pool_info);
108  auto_init_if_empty(*dst, src->clone()->set_tensor_shape(out_shape));
109  if(indices)
110  {
111  auto_init_if_empty(*indices, src->clone()->set_tensor_shape(out_shape).set_data_type(DataType::U32));
112  }
113 
114  // Set instance variables
115  _pool_info = pool_info;
116  _data_layout = pool_info.data_layout == DataLayout::UNKNOWN ? src->data_layout() : pool_info.data_layout;
117  _num_elems_processed_per_iteration = (_data_layout == DataLayout::NCHW) ? 1 : ((dst->data_type() == DataType::F32) ? 2 : 4);
119 
120  int pool_stride_x = 0;
121  int pool_stride_y = 0;
122  const PoolingType pool_type = pool_info.pool_type;
127  const int pool_size_x = pool_info.is_global_pooling ? src->dimension(idx_width) : pool_info.pool_size.width;
128  const int pool_size_y = pool_info.is_global_pooling ? src->dimension(idx_height) : pool_info.pool_size.height;
129  const PadStrideInfo pad_stride_info = pool_info.pad_stride_info;
130  const bool exclude_padding = pool_info.exclude_padding;
131  std::tie(pool_stride_x, pool_stride_y) = pad_stride_info.stride();
132  const int pool_pad_top = pad_stride_info.pad_top();
133  const int pool_pad_left = pad_stride_info.pad_left();
134  const DataType data_type = src->data_type();
135 
136  // Set build options
137  CLBuildOptions build_opts;
138  build_opts.add_option("-DVEC_SIZE=" + support::cpp11::to_string(_num_elems_processed_per_iteration));
139  build_opts.add_option("-DDATA_TYPE=" + get_cl_type_from_data_type(data_type));
140  build_opts.add_option("-DPOOL_" + string_from_pooling_type(pool_type));
141  build_opts.add_option("-DSTRIDE_X=" + support::cpp11::to_string(pool_stride_x));
142  build_opts.add_option("-DSTRIDE_Y=" + support::cpp11::to_string(pool_stride_y));
143  build_opts.add_option("-DPAD_X=" + support::cpp11::to_string(pool_pad_left));
144  build_opts.add_option("-DPAD_Y=" + support::cpp11::to_string(pool_pad_top));
145  build_opts.add_option("-DPOOL_SIZE_X=" + support::cpp11::to_string(pool_size_x));
146  build_opts.add_option("-DPOOL_SIZE_Y=" + support::cpp11::to_string(pool_size_y));
147  build_opts.add_option("-DSRC_WIDTH=" + support::cpp11::to_string(src->dimension(idx_width)));
148  build_opts.add_option("-DSRC_HEIGHT=" + support::cpp11::to_string(src->dimension(idx_height)));
149  build_opts.add_option("-DMAX_WIDTH=" + support::cpp11::to_string(src->dimension(idx_width) + (exclude_padding ? 0 : pool_pad_left)));
150  build_opts.add_option("-DMAX_HEIGHT=" + support::cpp11::to_string(src->dimension(idx_height) + (exclude_padding ? 0 : pool_pad_top)));
151 
152  // Tensor paddings are used to calculate the indices for MAX pooling
153  if(pool_info.pool_size == Size2D(2, 2) && pool_type == PoolingType::MAX && indices && is_data_type_float(data_type))
154  {
155  build_opts.add_option("-DSRC_BATCH=" + support::cpp11::to_string(src->tensor_shape().total_size_lower(3)));
156  }
157 
158  if(is_data_type_quantized_asymmetric(data_type))
159  {
160  build_opts.add_option("-DQUANTIZED");
161 
162  if(src->quantization_info() != dst->quantization_info())
163  {
164  const UniformQuantizationInfo iq_info = src->quantization_info().uniform();
165  const UniformQuantizationInfo oq_info = dst->quantization_info().uniform();
166 
167  build_opts.add_option("-DOFFSET_IN1=" + float_to_string_with_full_precision(iq_info.offset));
168  build_opts.add_option("-DOFFSET_OUT=" + float_to_string_with_full_precision(oq_info.offset));
169  build_opts.add_option("-DSCALE_IN1=" + float_to_string_with_full_precision(iq_info.scale));
170  build_opts.add_option("-DSCALE_OUT=" + float_to_string_with_full_precision(oq_info.scale));
171  }
172  }
173 
174  // Set the initial value for the pooling operation according to the data type
175  if(pool_type == PoolingType::MAX)
176  {
177  if(is_data_type_quantized(data_type))
178  {
179  PixelValue type_min{};
180  std::tie(type_min, std::ignore) = get_min_max(data_type);
181  build_opts.add_option("-DINITIAL_VALUE=" + support::cpp11::to_string(type_min.get<int32_t>()));
182  }
183  else
184  {
185  build_opts.add_option("-DINITIAL_VALUE=" + float_to_string_with_full_precision(std::numeric_limits<float>::lowest()));
186  }
187  }
188  else
189  {
190  // Pool AVG and Pool L2 initial value
191  build_opts.add_option("-DINITIAL_VALUE=0");
192  }
193 
194  // Create kernel
195  switch(_data_layout)
196  {
197  case DataLayout::NCHW:
198  {
199  const auto use_fp_mixed_precision = (data_type == DataType::F16) && pool_info.fp_mixed_precision;
200  const auto use_wider_accumulator = use_fp_mixed_precision && (pool_type != PoolingType::MAX);
201  const auto acc_data_type = get_cl_type_from_data_type(use_wider_accumulator ? DataType::F32 : (is_data_type_quantized(data_type) ? DataType::S32 : data_type));
202  build_opts.add_option("-DACC_DATA_TYPE=" + acc_data_type);
203  build_opts.add_option_if(use_wider_accumulator, "-DFP_MIXED_PRECISION");
204 
205  if(pool_type != PoolingType::MAX)
206  {
207  build_opts.add_option_if(exclude_padding, "-DEXCLUDE_PADDING");
208  }
209 
210  if(pool_info.pool_size == Size2D(2, 2) && pool_type == PoolingType::MAX && indices && is_data_type_float(data_type))
211  {
212  // For max pooling with pool2x2, store indices which will be used in max unpooling
213  std::string kernel_name = "pooling_layer_2_nchw_indices";
214  _kernel = create_kernel(compile_context, kernel_name, build_opts.options());
215  }
216  else // Run general case
217  {
218  std::string kernel_name = "pooling_layer_MxN_nchw";
219  _kernel = create_kernel(compile_context, kernel_name, build_opts.options());
220  }
221  break;
222  }
223  case DataLayout::NHWC:
224  {
225  // Floating point mixed precision is supported on F16 only
226  const auto use_fp_mixed_precision = (data_type == DataType::F16) && pool_info.fp_mixed_precision && pool_type != PoolingType::MAX;
227 
228  // Wider accumulation is required to avoid accuracy loss
229  // Case 1: Floating point mixed precision (fp16 src data and fp32 accumulation)
230  // Case 2: Quantized (int8/uint8 src data and int32 accumulation)
231  DataType acc_data_type = data_type;
232 
233  if(use_fp_mixed_precision)
234  {
235  acc_data_type = DataType::F32;
236  }
237  else if(is_data_type_quantized(data_type) && pool_type != PoolingType::MAX)
238  {
239  acc_data_type = DataType::S32;
240  }
241 
242  build_opts.add_option("-DACC_DATA_TYPE=" + get_cl_type_from_data_type(acc_data_type));
243  build_opts.add_option_if(use_fp_mixed_precision, "-DFP_MIXED_PRECISION");
244  build_opts.add_option_if(exclude_padding, "-DEXCLUDE_PADDING");
245  build_opts.add_option("-DSRC_WIDTH=" + support::cpp11::to_string(src->dimension(idx_width)));
246  build_opts.add_option("-DSRC_HEIGHT=" + support::cpp11::to_string(src->dimension(idx_height)));
247  build_opts.add_option("-DDST_HEIGHT=" + support::cpp11::to_string(dst->dimension(idx_height)));
248  build_opts.add_option("-DDST_CHANNELS=" + support::cpp11::to_string(dst->dimension(idx_channel)));
249  build_opts.add_option("-DDST_BATCH_SIZE=" + support::cpp11::to_string(dst->dimension(idx_batch_size)));
250  build_opts.add_option("-DVEC_SIZE_LEFTOVER=" + support::cpp11::to_string(src->dimension(0) % _num_elems_processed_per_iteration));
251  if(pool_info.pool_size == Size2D(2, 2) && is_data_type_float(data_type))
252  {
253  build_opts.add_option_if(indices != nullptr && pool_type == PoolingType::MAX, "-DEXTRACT_MAX_INDEX");
254 
255  std::string kernel_name = "pooling_layer_2x2_nhwc";
256  _kernel = create_kernel(compile_context, kernel_name, build_opts.options());
257  }
258  else
259  {
260  std::string kernel_name = is_data_type_quantized_asymmetric(data_type) ? "pooling_layer_MxN_quantized_nhwc" : "pooling_layer_MxN_nhwc";
261  _kernel = create_kernel(compile_context, kernel_name, build_opts.options());
262  }
263  break;
264  }
265  default:
266  ARM_COMPUTE_ERROR("Not implemented");
267  }
268 
269  // Configure kernel window
271  ICLKernel::configure_internal(win);
272 
273  // Set config_id for enabling LWS tuning
274  _config_id = "pooling_layer_";
275  _config_id += lower_string(string_from_data_type(data_type));
276  _config_id += "_";
278  _config_id += "_";
279  _config_id += support::cpp11::to_string(dst->dimension(idx_width));
280  _config_id += "_";
281  _config_id += support::cpp11::to_string(dst->dimension(idx_height));
282  _config_id += "_";
283  _config_id += support::cpp11::to_string(dst->dimension(idx_channel));
284  _config_id += "_";
285  _config_id += lower_string(string_from_data_layout(src->data_layout()));
286 
288 }
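A minimal, illustrative sketch of configuring ClPool2dKernel directly. ClPool2dKernel is an internal kernel normally reached through the ClPool2d operator or CLPoolingLayer; the include path, the arm_compute::opencl::kernels namespace, and the use of CLScheduler::default_init() and CLKernelLibrary::get().get_compile_context() to obtain a compile context are assumptions of this sketch, not statements about the public API.

#include "arm_compute/core/Error.h"
#include "arm_compute/core/TensorInfo.h"
#include "arm_compute/core/Types.h"
#include "arm_compute/core/CL/CLKernelLibrary.h"
#include "arm_compute/runtime/CL/CLScheduler.h"
#include "src/gpu/cl/kernels/ClPool2dKernel.h" // internal header; path assumed

using namespace arm_compute;

void configure_pool2d_kernel_sketch()
{
    // Assumption: initialise the default OpenCL context, device and command queue.
    CLScheduler::get().default_init();

    // NHWC source tensor info; Compute Library orders the shape as (C, W, H, N).
    TensorInfo src_info(TensorShape(64U, 32U, 32U, 1U), 1, DataType::F32, DataLayout::NHWC);
    // Left empty: configure() auto-initialises dst via compute_pool_shape().
    TensorInfo dst_info{};

    // 2x2 max pooling, stride 2, no padding.
    const PoolingLayerInfo pool_info(PoolingType::MAX, 2, DataLayout::NHWC, PadStrideInfo(2, 2, 0, 0));

    opencl::kernels::ClPool2dKernel pool_kernel;

    // Check the configuration first, then configure with the library's compile context.
    ARM_COMPUTE_ERROR_THROW_ON(opencl::kernels::ClPool2dKernel::validate(&src_info, &dst_info, pool_info));
    pool_kernel.configure(CLKernelLibrary::get().get_compile_context(), &src_info, &dst_info, pool_info);
}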

◆ run_op()

void run_op ( ITensorPack & tensors,
const Window & window,
cl::CommandQueue & queue
)
override virtual

Enqueue the OpenCL kernel to process the given window on the passed OpenCL command queue.

Note
The queue is not flushed by this method, and therefore the kernel will not have been executed by the time this method returns.
Parameters
[in]     tensors  A vector containing the tensors to operate on.
[in]     window   Region on which to execute the kernel. (Must be a valid region of the window returned by window()).
[in,out] queue    Command queue on which to enqueue the kernel.

Reimplemented from ICLKernel.

Definition at line 296 of file ClPool2dKernel.cpp.

References arm_compute::ACL_DST_0, arm_compute::ACL_DST_1, arm_compute::ACL_SRC, ARM_COMPUTE_ERROR, ARM_COMPUTE_ERROR_ON_INVALID_SUBWINDOW, ARM_COMPUTE_ERROR_ON_UNCONFIGURED_KERNEL, Window::collapse_if_possible(), Window::DimX, Window::DimY, Window::DimZ, arm_compute::enqueue(), Window::first_slice_window_4D(), ITensorPack::get_const_tensor(), ITensorPack::get_tensor(), arm_compute::is_data_type_float(), arm_compute::MAX, arm_compute::NCHW, arm_compute::NHWC, Window::set(), arm_compute::test::validation::reference::slice(), Window::slide_window_slice_4D(), and IKernel::window().

297 {
300 
301  unsigned int pool_stride_x = 0;
302  unsigned int pool_stride_y = 0;
303  std::tie(pool_stride_x, pool_stride_y) = _pool_info.pad_stride_info.stride();
304 
305  const auto src = utils::cast::polymorphic_downcast<const ICLTensor *>(tensors.get_const_tensor(TensorType::ACL_SRC));
306  auto dst = utils::cast::polymorphic_downcast<ICLTensor *>(tensors.get_tensor(TensorType::ACL_DST_0));
307  auto indices = utils::cast::polymorphic_downcast<ICLTensor *>(tensors.get_tensor(TensorType::ACL_DST_1));
308 
309  // Collapse window
310  Window window_collapsed = window.collapse_if_possible(ICLKernel::window(), Window::DimZ);
311 
312  switch(_data_layout)
313  {
314  case DataLayout::NCHW:
315  {
316  Window slice = window_collapsed.first_slice_window_3D();
317  do
318  {
319  // Set srcs
320  unsigned int idx = 0;
321  add_3D_tensor_argument(idx, src, slice);
322  add_3D_tensor_argument(idx, dst, slice);
323  if(indices && is_data_type_float(src->info()->data_type()) && (_pool_info.pool_size == Size2D(2, 2)))
324  {
325  add_3D_tensor_argument(idx, indices, slice);
326  }
327  enqueue(queue, *this, slice, lws_hint());
328  }
329  while(window_collapsed.slide_window_slice_3D(slice));
330  break;
331  }
332  case DataLayout::NHWC:
333  {
334  const size_t batch_size = dst->info()->tensor_shape().total_size_upper(3);
335 
336  Window slice = window_collapsed.first_slice_window_4D();
337  Window in_slice = window_collapsed.first_slice_window_4D();
338  in_slice.set(Window::DimX, Window::Dimension(0, src->info()->dimension(0), _num_elems_processed_per_iteration));
339  in_slice.set(Window::DimY, Window::Dimension(0, src->info()->dimension(1), pool_stride_x));
340  in_slice.set(Window::DimZ, Window::Dimension(0, src->info()->dimension(2), pool_stride_y));
341  in_slice.set(3, Window::Dimension(0, batch_size, 1));
342  do
343  {
344  // Set srcs
345  unsigned int idx = 0;
346  add_4D_tensor_argument(idx, src, in_slice);
347  add_4D_tensor_argument(idx, dst, slice);
348  if(indices && is_data_type_float(src->info()->data_type()) && (_pool_info.pool_type == PoolingType::MAX) && (_pool_info.pool_size == Size2D(2, 2)))
349  {
350  add_4D_tensor_argument(idx, indices, slice);
351  }
352  enqueue(queue, *this, slice, lws_hint());
353  }
354  while(window.slide_window_slice_4D(slice) && window.slide_window_slice_4D(in_slice));
355  break;
356  }
357  default:
358  ARM_COMPUTE_ERROR("Not implemented");
359  }
360 }
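A hedged sketch of dispatching a configured kernel through run_op(). It assumes pool_kernel has been configured as in the earlier sketch and that src, dst (and, only if requested at configure time, indices) are allocated CLTensor objects whose info matches the configured tensor infos; ACL_SRC / ACL_DST_0 / ACL_DST_1 are the identifiers referenced above.

// Gather the tensors the kernel expects into a pack.
ITensorPack pack;
pack.add_tensor(TensorType::ACL_SRC, &src);
pack.add_tensor(TensorType::ACL_DST_0, &dst);
pack.add_tensor(TensorType::ACL_DST_1, &indices); // only when indices were passed to configure()

// Enqueue over the kernel's maximum window. The queue is not flushed here,
// so the kernel has not necessarily executed when run_op() returns.
pool_kernel.run_op(pack, pool_kernel.window(), CLScheduler::get().queue());
CLScheduler::get().sync(); // block until the enqueued work has completed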

◆ validate()

Status validate ( const ITensorInfo * src,
const ITensorInfo * dst,
const PoolingLayerInfo & pool_info,
const ITensorInfo * indices = nullptr
)
static

Static function to check if given info will lead to a valid configuration.

Similar to ClPool2dKernel::configure()

Returns
a status

Definition at line 290 of file ClPool2dKernel.cpp.

References ARM_COMPUTE_RETURN_ON_ERROR, and arm_compute::cpu::kernels::validate_arguments().

Referenced by ClPool2d::validate().

291 {
292  ARM_COMPUTE_RETURN_ON_ERROR(validate_arguments(src, dst, pool_info, indices));
293  return Status{};
294 }
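validate() can also be used without throwing, by inspecting the returned Status directly. A short sketch, assuming src_info, dst_info and pool_info are set up as in the configure() sketch above:

#include <iostream>

// Query whether the proposed configuration is supported, without configuring anything.
const Status status = opencl::kernels::ClPool2dKernel::validate(&src_info, &dst_info, pool_info);
if(status.error_code() != ErrorCode::OK)
{
    std::cerr << "ClPool2dKernel rejected the configuration: " << status.error_description() << std::endl;
}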

Field Documentation

◆ _data_layout

DataLayout _data_layout { DataLayout::UNKNOWN }

Definition at line 67 of file ClPool2dKernel.h.

◆ _num_elems_processed_per_iteration

unsigned int _num_elems_processed_per_iteration { 1 }

Definition at line 68 of file ClPool2dKernel.h.

◆ _pool_info

PoolingLayerInfo _pool_info {}

Definition at line 66 of file ClPool2dKernel.h.


The documentation for this class was generated from the following files:
ClPool2dKernel.h
ClPool2dKernel.cpp