Compute Library
 21.08
CpuPool2dAssemblyWrapperKernel Class Reference (final)

This class is a wrapper for the assembly kernels.

#include <CpuPool2dAssemblyWrapperKernel.h>


Public Member Functions

 CpuPool2dAssemblyWrapperKernel ()=default
 Constructor.

 ARM_COMPUTE_DISALLOW_COPY_ALLOW_MOVE (CpuPool2dAssemblyWrapperKernel)

const char * name () const override
 Name of the kernel.

void configure (const ITensorInfo *src, ITensorInfo *dst, const PoolingLayerInfo &info, const CPUInfo &cpu_info)
 Initialise the kernel's src and dst.

void run_op (ITensorPack &tensors, const Window &window, const ThreadInfo &info) override
 Execute the kernel on the passed window.

size_t get_working_size (unsigned int num_threads) const
 Get size of the workspace needed by the assembly kernel.

bool is_configured () const
 Was the asm kernel successfully configured?
 
- Public Member Functions inherited from ICPPKernel
virtual ~ICPPKernel ()=default
 Default destructor.

virtual void run (const Window &window, const ThreadInfo &info)
 Execute the kernel on the passed window.

virtual void run_nd (const Window &window, const ThreadInfo &info, const Window &thread_locator)
 Legacy compatibility layer for implementations that do not support thread_locator. In these cases the interface is simply narrowed down to the legacy version.
 
- Public Member Functions inherited from IKernel
 IKernel ()
 Constructor.

virtual ~IKernel ()=default
 Destructor.

virtual bool is_parallelisable () const
 Indicates whether or not the kernel is parallelisable.

virtual BorderSize border_size () const
 The size of the border for that kernel.

const Window & window () const
 The maximum window the kernel can be executed on.

bool is_window_configured () const
 Function to check if the embedded window of this kernel has been configured.
 

Static Public Member Functions

static Status validate (const ITensorInfo *src, const ITensorInfo *dst, const PoolingLayerInfo &info)
 Static function to check if given info will lead to a valid configuration.
 

Detailed Description

This class is a wrapper for the assembly kernels.

Some kernels are written in assembly and highly optimised for specific CPUs such as the Cortex-A53 or Cortex-A55. The Arm Compute Library creates an instance of CpuPool2dAssemblyWrapperKernel, together with other auxiliary data structures, to execute a single assembly kernel in the context of an NEFunction.

Definition at line 48 of file CpuPool2dAssemblyWrapperKernel.h.

Constructor & Destructor Documentation

◆ CpuPool2dAssemblyWrapperKernel()

Constructor.

Member Function Documentation

◆ ARM_COMPUTE_DISALLOW_COPY_ALLOW_MOVE()

ARM_COMPUTE_DISALLOW_COPY_ALLOW_MOVE ( CpuPool2dAssemblyWrapperKernel  )

◆ configure()

void configure ( const ITensorInfo * src,
ITensorInfo * dst,
const PoolingLayerInfo & info,
const CPUInfo & cpu_info
)

Initialise the kernel's src and dst.

Parameters
[in]  src       Source tensor info. Data types supported: QASYMM8/QASYMM8_SIGNED/F16/F32.
[out] dst       Destination tensor info to store the result of pooling. Data types supported: same as src.
[in]  info      Pooling meta-data.
[in]  cpu_info  CPU information needed to select the most appropriate kernel.

Definition at line 44 of file CpuPool2dAssemblyWrapperKernel.cpp.

References ARM_COMPUTE_ERROR_ON_NULLPTR, ARM_COMPUTE_UNUSED, arm_compute::auto_init_if_empty(), arm_compute::calculate_max_window(), ICloneable< T >::clone(), arm_compute::misc::shape_calculator::compute_pool_shape(), ITensorInfo::data_type(), arm_compute::test::validation::dst, arm_compute::F16, arm_compute::F32, arm_compute::test::validation::info, arm_compute::QASYMM8, arm_compute::QASYMM8_SIGNED, ITensorInfo::quantization_info(), and arm_compute::test::validation::src.

Referenced by CpuPool2dAssemblyWrapperKernel::name().

{
    ARM_COMPUTE_UNUSED(cpu_info);
    ARM_COMPUTE_ERROR_ON_NULLPTR(src, dst);

    // dst initialization if not yet initialized
    auto_init_if_empty(*dst, src->clone()->set_tensor_shape(compute_pool_shape(*src, info)));

#if defined(__aarch64__)
    // Requantization is needed when src and dst carry different quantization info
    const bool requantize = src->quantization_info() != dst->quantization_info();

    switch(src->data_type())
    {
        case DataType::QASYMM8:
            if(requantize)
            {
                create_arm_pooling_requant<uint8_t, uint8_t>(src, dst, info, cpu_info);
            }
            else
            {
                create_arm_pooling<uint8_t, uint8_t>(src, dst, info, cpu_info);
            }
            break;
        case DataType::QASYMM8_SIGNED:
            if(requantize)
            {
                create_arm_pooling_requant<int8_t, int8_t>(src, dst, info, cpu_info);
            }
            else
            {
                create_arm_pooling<int8_t, int8_t>(src, dst, info, cpu_info);
            }
            break;
#ifdef __ARM_FEATURE_FP16_VECTOR_ARITHMETIC
        case DataType::F16:
            create_arm_pooling<float16_t, float16_t>(src, dst, info, cpu_info);
            break;
#endif /* __ARM_FEATURE_FP16_VECTOR_ARITHMETIC */
        case DataType::F32:
            create_arm_pooling<float, float>(src, dst, info, cpu_info);
            break;
        default:
            // Unsupported data type: no assembly kernel is created
            break;
    }
#endif // defined(__aarch64__)

    Window win = calculate_max_window(*dst, Steps());
    INEKernel::configure(win);
}
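The dispatch in configure() can be summarised as a small decision table: the data type picks the element types, and for the two quantized types the requantize flag picks between the plain and the requantizing factory. The following is a minimal, library-free sketch of that decision only. The enum, the function name select_pooling_factory and the has_fp16 flag (standing in for __ARM_FEATURE_FP16_VECTOR_ARITHMETIC) are hypothetical and are not part of the Compute Library; the sketch also assumes an AArch64 build, since the listing above creates no kernel otherwise.

```cpp
#include <string>

// Hypothetical stand-in for arm_compute::DataType, limited to the cases handled by configure().
enum class DataType { QASYMM8, QASYMM8_SIGNED, F16, F32, OTHER };

// Returns the text of the factory call configure() would make for the given inputs,
// or an empty string when no assembly kernel would be created.
// has_fp16 models whether __ARM_FEATURE_FP16_VECTOR_ARITHMETIC is defined (assumption).
std::string select_pooling_factory(DataType type, bool requantize, bool has_fp16)
{
    switch(type)
    {
        case DataType::QASYMM8:
            return requantize ? "create_arm_pooling_requant<uint8_t, uint8_t>"
                              : "create_arm_pooling<uint8_t, uint8_t>";
        case DataType::QASYMM8_SIGNED:
            return requantize ? "create_arm_pooling_requant<int8_t, int8_t>"
                              : "create_arm_pooling<int8_t, int8_t>";
        case DataType::F16:
            // Without FP16 vector arithmetic the F16 case is compiled out.
            return has_fp16 ? "create_arm_pooling<float16_t, float16_t>" : "";
        case DataType::F32:
            // The requantize flag is not consulted for floating-point types.
            return "create_arm_pooling<float, float>";
        default:
            return "";
    }
}
```

Because the default branch creates nothing, a caller would be expected to check is_configured() after configure() before relying on the assembly path.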

◆ get_working_size()

size_t get_working_size ( unsigned int  num_threads) const

Get size of the workspace needed by the assembly kernel.

Parameters
[in]  num_threads  Maximum number of threads that are going to be spawned.
Returns
size of workspace

Definition at line 176 of file CpuPool2dAssemblyWrapperKernel.cpp.

Referenced by CpuPool2dAssemblyWrapperKernel::name().

{
    return _kernel_asm->get_working_size(num_threads);
}

◆ is_configured()

◆ name()

◆ run_op()

void run_op ( ITensorPack & tensors,
const Window & window,
const ThreadInfo & info
)
override virtual

Execute the kernel on the passed window.

Warning
If is_parallelisable() returns false then the passed window must be equal to window().
Note
The window has to be a region within the window returned by the window() method.
The width of the window has to be a multiple of num_elems_processed_per_iteration().
Parameters
[in]  tensors  A vector containing the tensors to operate on.
[in]  window   Region on which to execute the kernel. (Must be a region of the window returned by window())
[in]  info     Info about executing thread and CPU.

Reimplemented from ICPPKernel.

Definition at line 142 of file CpuPool2dAssemblyWrapperKernel.cpp.

References arm_compute::ACL_DST, arm_compute::ACL_INT_0, arm_compute::ACL_SRC, ARM_COMPUTE_ERROR_ON, ARM_COMPUTE_ERROR_ON_NULLPTR, ARM_COMPUTE_ERROR_ON_UNCONFIGURED_KERNEL, ARM_COMPUTE_UNUSED, ITensor::buffer(), arm_compute::test::validation::dst, arm_compute::test::validation::dst_shape, ITensorPack::empty(), ITensorPack::get_const_tensor(), ITensorPack::get_tensor(), ITensor::info(), BorderSize::left, ThreadInfo::num_threads, ITensorInfo::offset_first_element_in_bytes(), ITensorInfo::padding(), arm_compute::test::validation::src, ITensorInfo::tensor_shape(), and ThreadInfo::thread_id.

Referenced by CpuPool2dAssemblyWrapperKernel::name().

{
    ARM_COMPUTE_ERROR_ON_NULLPTR(_kernel_asm.get());
    ARM_COMPUTE_ERROR_ON_UNCONFIGURED_KERNEL(this);
    ARM_COMPUTE_UNUSED(window);
    ARM_COMPUTE_UNUSED(info);

    ARM_COMPUTE_ERROR_ON(tensors.empty());

    const ITensor *src       = tensors.get_const_tensor(TensorType::ACL_SRC);
    ITensor       *dst       = tensors.get_tensor(TensorType::ACL_DST);
    ITensor       *workspace = tensors.get_tensor(TensorType::ACL_INT_0);

    // Pointers to the first element of each buffer
    const auto in_ptr        = src->buffer() + src->info()->offset_first_element_in_bytes();
    auto       out_ptr       = dst->buffer() + dst->info()->offset_first_element_in_bytes();
    auto       working_space = workspace->buffer() + workspace->info()->offset_first_element_in_bytes();

    const auto src_shape   = src->info()->tensor_shape();
    const auto dst_shape   = dst->info()->tensor_shape();
    const auto src_padding = src->info()->padding();
    const auto dst_padding = dst->info()->padding();

    // Leading dimensions (in elements), including any padding around each plane
    const size_t ld_src_col   = src_shape[0] + src_padding.left + src_padding.right;
    const size_t ld_src_row   = ld_src_col * (src_shape[1] + src_padding.top + src_padding.bottom);
    const size_t ld_src_batch = ld_src_row * src_shape[2];
    const size_t ld_dst_col   = dst_shape[0] + dst_padding.left + dst_padding.right;
    const size_t ld_dst_row   = ld_dst_col * (dst_shape[1] + dst_padding.top + dst_padding.bottom);
    const size_t ld_dst_batch = ld_dst_row * dst_shape[2];

    _kernel_asm->execute(in_ptr, ld_src_col, ld_src_row, ld_src_batch,
                         out_ptr, ld_dst_col, ld_dst_row, ld_dst_batch,
                         working_space, info.thread_id, info.num_threads);
}
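The three leading dimensions passed to execute() follow directly from the tensor shape and its padding. The sketch below repeats that arithmetic in isolation so it can be checked by hand. The Padding and LeadingDims structs and the function name compute_leading_dims are hypothetical stand-ins (Padding mirrors the top/right/bottom/left fields used above), not Compute Library types.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for the padding fields read in run_op().
struct Padding
{
    std::size_t top;
    std::size_t right;
    std::size_t bottom;
    std::size_t left;
};

// Leading dimensions in elements, named after the ld_*_col / ld_*_row / ld_*_batch variables.
struct LeadingDims
{
    std::size_t col;
    std::size_t row;
    std::size_t batch;
};

// Same arithmetic as run_op(): shape[0..2] are the three innermost tensor dimensions.
LeadingDims compute_leading_dims(const std::array<std::size_t, 3> &shape, const Padding &pad)
{
    LeadingDims ld{};
    ld.col   = shape[0] + pad.left + pad.right;             // one padded line along dimension 0
    ld.row   = ld.col * (shape[1] + pad.top + pad.bottom);  // one padded plane over dimensions 0 and 1
    ld.batch = ld.row * shape[2];                           // all planes along dimension 2
    return ld;
}
```

For example, a shape of {8, 4, 3} with two elements of left and right padding and no top or bottom padding gives col = 12, row = 48 and batch = 144. Since validate() only accepts NHWC, dimension 0 is the channel dimension in practice, followed by width and height.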

◆ validate()

Status validate ( const ITensorInfo * src,
const ITensorInfo * dst,
const PoolingLayerInfo & info
)
static

Static function to check if given info will lead to a valid configuration.

Similar to CpuPool2dAssemblyWrapperKernel::configure()

Returns
a status

Definition at line 94 of file CpuPool2dAssemblyWrapperKernel.cpp.

References ARM_COMPUTE_RETURN_ERROR_MSG, ARM_COMPUTE_RETURN_ERROR_ON, ARM_COMPUTE_RETURN_ERROR_ON_CPU_F16_UNSUPPORTED, ARM_COMPUTE_RETURN_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN, ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES, ARM_COMPUTE_RETURN_ERROR_ON_MSG, ARM_COMPUTE_RETURN_ERROR_ON_NULLPTR, arm_compute::AVG, arm_compute::quantization::calculate_quantized_multiplier(), ITensorInfo::data_layout(), PoolingLayerInfo::data_layout, ITensorInfo::data_type(), PoolingLayerInfo::exclude_padding, arm_compute::F16, arm_compute::F32, PadStrideInfo::has_padding(), arm_compute::MAX, arm_compute::NHWC, PoolingLayerInfo::pad_stride_info, PoolingLayerInfo::pool_type, arm_compute::QASYMM8, arm_compute::QASYMM8_SIGNED, ITensorInfo::quantization_info(), UniformQuantizationInfo::scale, ITensorInfo::total_size(), and QuantizationInfo::uniform().

Referenced by CpuPool2d::configure(), CpuPool2dAssemblyWrapperKernel::name(), and CpuPool2d::validate().

{
    ARM_COMPUTE_RETURN_ERROR_ON_NULLPTR(src, dst);

#ifndef __aarch64__
    ARM_COMPUTE_RETURN_ERROR_MSG("32-bit is not supported by assembly kernels");
#endif /* __aarch64__ */
    ARM_COMPUTE_RETURN_ERROR_ON_CPU_F16_UNSUPPORTED(src);
    ARM_COMPUTE_RETURN_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN(src, 1, DataType::QASYMM8, DataType::QASYMM8_SIGNED, DataType::F16, DataType::F32);
    ARM_COMPUTE_RETURN_ERROR_ON_MSG((src->data_layout() != DataLayout::NHWC) || (info.data_layout != DataLayout::NHWC), "Only NHWC is supported by assembly kernels");
    ARM_COMPUTE_RETURN_ERROR_ON_MSG((info.pool_type != PoolingType::AVG) && (info.pool_type != PoolingType::MAX),
                                    "Only AVG and MAX pooling are supported by assembly kernels");

    if(dst->total_size() > 0)
    {
        ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES(src, dst);

        const auto src_qinfo = src->quantization_info().uniform();
        const auto dst_qinfo = dst->quantization_info().uniform();

        if(src_qinfo != dst_qinfo)
        {
            // Requantization path: the scale ratio must be representable as a quantized multiplier
            const float multiplier = src_qinfo.scale / dst_qinfo.scale;
            int32_t     dst_multiplier{};
            int32_t     dst_shift{};
            ARM_COMPUTE_RETURN_ERROR_ON(quantization::calculate_quantized_multiplier(multiplier, &dst_multiplier, &dst_shift));
        }
        else
        {
            if(src->data_type() == DataType::QASYMM8)
            {
                const bool has_padding = info.pad_stride_info.has_padding();
                ARM_COMPUTE_RETURN_ERROR_ON_MSG(!info.exclude_padding && has_padding, "Assembly kernels do not support padding for QASYMM8 with same src/dst quantization info");
            }
        }
    }
    else
    {
        if(src->data_type() == DataType::QASYMM8)
        {
            // If dst is not configured, the quantization info are the same
            const bool has_padding = info.pad_stride_info.has_padding();
            ARM_COMPUTE_RETURN_ERROR_ON_MSG(!info.exclude_padding && has_padding, "Assembly kernels do not support padding for QASYMM8 with same src/dst quantization info");
        }
    }
    return Status{};
}
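In the requantization branch, validate() divides the source scale by the destination scale and asks quantization::calculate_quantized_multiplier() whether the ratio can be expressed as an integer multiplier and a shift. The sketch below illustrates one common way to perform such a decomposition, using std::frexp to split the ratio into a 31-bit mantissa and a power-of-two exponent. It is not the library's implementation: the struct, the function names and the sign convention of the exponent are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical result type: multiplier is approximately mantissa * 2^(exponent - 31).
struct FixedPointMultiplier
{
    std::int32_t mantissa; // in [2^30, 2^31), i.e. a fraction in [0.5, 1) scaled by 2^31
    int          exponent; // power of two applied on top of the mantissa
};

// Decompose src_scale / dst_scale into a 31-bit mantissa and an exponent.
FixedPointMultiplier decompose_multiplier(float src_scale, float dst_scale)
{
    assert(src_scale > 0.f && dst_scale > 0.f);
    const double multiplier = static_cast<double>(src_scale) / static_cast<double>(dst_scale);

    int          exponent = 0;
    const double fraction = std::frexp(multiplier, &exponent); // fraction in [0.5, 1)

    std::int64_t q = std::llround(fraction * static_cast<double>(1LL << 31));
    if(q == (1LL << 31))
    {
        // Rounding pushed the fraction up to 1.0: renormalise to 0.5 * 2^(exponent + 1)
        q /= 2;
        ++exponent;
    }
    return { static_cast<std::int32_t>(q), exponent };
}

// Convert the fixed-point pair back to a real number, for checking the round trip.
double reconstruct(const FixedPointMultiplier &m)
{
    return std::ldexp(static_cast<double>(m.mantissa), m.exponent - 31);
}
```

With a source scale of 0.5 and a destination scale of 1.0 the ratio is 0.5, which this sketch stores as a mantissa of 2^30 with an exponent of 0. Integer-only kernels can then apply the ratio with a multiply and a shift instead of a floating-point division.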

The documentation for this class was generated from the following files: