Compute Library
 21.02
CpuPoolingAssemblyWrapperKernel Class Reference (final)

This class is a wrapper for the assembly kernels.

#include <CpuPoolingAssemblyWrapperKernel.h>


Public Member Functions

 CpuPoolingAssemblyWrapperKernel ()=default
 Constructor.
 
 CpuPoolingAssemblyWrapperKernel (CpuPoolingAssemblyWrapperKernel &)=delete
 
 CpuPoolingAssemblyWrapperKernel (CpuPoolingAssemblyWrapperKernel &&)=default
 
CpuPoolingAssemblyWrapperKernel & operator= (CpuPoolingAssemblyWrapperKernel &)=delete
 
const char * name () const override
 Name of the kernel.
 
void configure (const ITensorInfo *src, ITensorInfo *dst, const PoolingLayerInfo &info, const CPUInfo &cpu_info)
 Initialise the kernel's src and dst.
 
void run_op (ITensorPack &tensors, const Window &window, const ThreadInfo &info) override
 Execute the kernel on the passed window.
 
size_t get_working_size (unsigned int num_threads) const
 Get size of the workspace needed by the assembly kernel.
 
bool is_configured () const
 Was the asm kernel successfully configured?
 
- Public Member Functions inherited from ICPPKernel
virtual ~ICPPKernel ()=default
 Default destructor.
 
virtual void run (const Window &window, const ThreadInfo &info)
 Execute the kernel on the passed window.
 
virtual void run_nd (const Window &window, const ThreadInfo &info, const Window &thread_locator)
 Legacy compatibility layer for implementations which do not support thread_locator. In these cases we simply narrow the interface down to the legacy version.
 
- Public Member Functions inherited from IKernel
 IKernel ()
 Constructor.
 
virtual ~IKernel ()=default
 Destructor.
 
virtual bool is_parallelisable () const
 Indicates whether or not the kernel is parallelisable.
 
virtual BorderSize border_size () const
 The size of the border for that kernel.
 
const Window & window () const
 The maximum window the kernel can be executed on.
 

Static Public Member Functions

static Status validate (const ITensorInfo *src, const ITensorInfo *dst, const PoolingLayerInfo &info)
 Indicates whether or not this function can be used to process the given parameters.
 

Detailed Description

This class is a wrapper for the assembly kernels.

Some kernels were written in assembly and highly optimised for specific CPUs such as the Cortex-A53 or Cortex-A55. The Arm Compute Library creates an instance of CpuPoolingAssemblyWrapperKernel, together with other auxiliary data structures, to execute a single assembly kernel in the context of an NEFunction.

Definition at line 48 of file CpuPoolingAssemblyWrapperKernel.h.

Constructor & Destructor Documentation

◆ CpuPoolingAssemblyWrapperKernel() [1/3]

Constructor.

◆ CpuPoolingAssemblyWrapperKernel() [2/3]

◆ CpuPoolingAssemblyWrapperKernel() [3/3]

Member Function Documentation

◆ configure()

void configure ( const ITensorInfo *      src,
                 ITensorInfo *            dst,
                 const PoolingLayerInfo & info,
                 const CPUInfo &          cpu_info
               )

Initialise the kernel's src and dst.

Parameters
[in]   src       Source tensor info. Data types supported: QASYMM8/QASYMM8_SIGNED/F16/F32.
[out]  dst       Destination tensor info to store the result of pooling. Data types supported: same as src.
[in]   info      Pooling meta-data.
[in]   cpu_info  CPU information needed to select the most appropriate kernel.

Definition at line 44 of file CpuPoolingAssemblyWrapperKernel.cpp.

References ARM_COMPUTE_ERROR_ON_NULLPTR, arm_compute::auto_init_if_empty(), arm_compute::calculate_max_window(), ICloneable< T >::clone(), arm_compute::misc::shape_calculator::compute_pool_shape(), ITensorInfo::data_type(), arm_compute::test::validation::dst, arm_compute::F16, arm_compute::F32, arm_compute::test::validation::info, arm_compute::QASYMM8, arm_compute::QASYMM8_SIGNED, ITensorInfo::quantization_info(), and arm_compute::test::validation::src.

Referenced by CpuPoolingAssemblyWrapperKernel::name().

{
    // dst initialization if not yet initialized
    auto_init_if_empty(*dst, src->clone()->set_tensor_shape(compute_pool_shape(*src, info)));

    const bool requantize = src->quantization_info() != dst->quantization_info();

    switch(src->data_type())
    {
        case DataType::QASYMM8:
            if(requantize)
            {
                create_arm_pooling_requant<uint8_t, uint8_t>(src, dst, info, cpu_info);
            }
            else
            {
                create_arm_pooling<uint8_t, uint8_t>(src, dst, info, cpu_info);
            }
            break;
        case DataType::QASYMM8_SIGNED:
            if(requantize)
            {
                create_arm_pooling_requant<int8_t, int8_t>(src, dst, info, cpu_info);
            }
            else
            {
                create_arm_pooling<int8_t, int8_t>(src, dst, info, cpu_info);
            }
            break;
#ifdef __ARM_FEATURE_FP16_VECTOR_ARITHMETIC
        case DataType::F16:
            create_arm_pooling<float16_t, float16_t>(src, dst, info, cpu_info);
            break;
#endif /* __ARM_FEATURE_FP16_VECTOR_ARITHMETIC */
        case DataType::F32:
            create_arm_pooling<float, float>(src, dst, info, cpu_info);
            break;
        default:
            break;
    }

    Window win = calculate_max_window(*dst, Steps());
    INEKernel::configure(win);
}
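
The selection logic in the listing above can be sketched in isolation. The following is a minimal, self-contained illustration, not library code: ElemType, QuantParams and select_variant are hypothetical names introduced here, and the returned strings merely label which template instantiation the listing would pick. The fp16_available flag stands in for the __ARM_FEATURE_FP16_VECTOR_ARITHMETIC guard.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for illustration; these are not Compute Library types.
enum class ElemType { U8, S8, F16, F32, Other };

struct QuantParams
{
    float scale;
    int   offset;
};

inline bool operator!=(const QuantParams &a, const QuantParams &b)
{
    return a.scale != b.scale || a.offset != b.offset;
}

// Returns a label for the kernel variant that would be created, or an empty
// string when no variant matches (the kernel then stays unconfigured).
std::string select_variant(ElemType type, const QuantParams &src_q, const QuantParams &dst_q, bool fp16_available)
{
    // Requantization is needed only when src and dst quantization differ.
    const bool requantize = src_q != dst_q;
    switch(type)
    {
        case ElemType::U8:
            return requantize ? "pooling_requant<uint8_t>" : "pooling<uint8_t>";
        case ElemType::S8:
            return requantize ? "pooling_requant<int8_t>" : "pooling<int8_t>";
        case ElemType::F16:
            // Mirrors the #ifdef guard: F16 is handled only when the target supports it.
            return fp16_available ? "pooling<float16_t>" : "";
        case ElemType::F32:
            return "pooling<float>";
        default:
            return "";
    }
}
```

An empty result corresponds to the default branch of the listing, where no assembly kernel is created; a caller would presumably detect that case through is_configured().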

◆ get_working_size()

size_t get_working_size ( unsigned int  num_threads) const

Get size of the workspace needed by the assembly kernel.

Parameters
[in]   num_threads  Maximum number of threads that are going to be spawned.
Returns
size of workspace

Definition at line 173 of file CpuPoolingAssemblyWrapperKernel.cpp.

Referenced by CpuPoolingAssemblyWrapperKernel::name().

{
    return _kernel_asm->get_working_size(num_threads);
}

◆ is_configured()

◆ name()

◆ operator=()

◆ run_op()

void run_op ( ITensorPack &      tensors,
              const Window &     window,
              const ThreadInfo & info
            )
override virtual

Execute the kernel on the passed window.

Warning
If is_parallelisable() returns false then the passed window must be equal to window()
Note
The window has to be a region within the window returned by the window() method
The width of the window has to be a multiple of num_elems_processed_per_iteration().
Parameters
[in]   tensors  A vector containing the tensors to operate on.
[in]   window   Region on which to execute the kernel. (Must be a region of the window returned by window())
[in]   info     Info about executing thread and CPU.

Reimplemented from ICPPKernel.

Definition at line 139 of file CpuPoolingAssemblyWrapperKernel.cpp.

References arm_compute::ACL_DST_0, arm_compute::ACL_DST_1, arm_compute::ACL_SRC, ARM_COMPUTE_ERROR_ON, ARM_COMPUTE_ERROR_ON_NULLPTR, ARM_COMPUTE_ERROR_ON_UNCONFIGURED_KERNEL, ARM_COMPUTE_UNUSED, ITensor::buffer(), arm_compute::test::validation::dst, arm_compute::test::validation::dst_shape, ITensorPack::empty(), ITensorPack::get_const_tensor(), ITensorPack::get_tensor(), ITensor::info(), BorderSize::left, ThreadInfo::num_threads, ITensorInfo::offset_first_element_in_bytes(), ITensorInfo::padding(), arm_compute::test::validation::src, ITensorInfo::tensor_shape(), and ThreadInfo::thread_id.

Referenced by CpuPoolingAssemblyWrapperKernel::name().

{
    ARM_COMPUTE_ERROR_ON_NULLPTR(_kernel_asm.get());

    ARM_COMPUTE_ERROR_ON(tensors.empty());

    const ITensor *src       = tensors.get_const_tensor(TensorType::ACL_SRC);
    ITensor       *dst       = tensors.get_tensor(TensorType::ACL_DST_0);
    ITensor       *workspace = tensors.get_tensor(TensorType::ACL_DST_1);

    const auto in_ptr        = src->buffer() + src->info()->offset_first_element_in_bytes();
    auto       out_ptr       = dst->buffer() + dst->info()->offset_first_element_in_bytes();
    auto       working_space = workspace->buffer() + workspace->info()->offset_first_element_in_bytes();

    const auto src_shape   = src->info()->tensor_shape();
    const auto dst_shape   = dst->info()->tensor_shape();
    const auto src_padding = src->info()->padding();
    const auto dst_padding = dst->info()->padding();

    const size_t ld_src_col   = src_shape[0] + src_padding.left + src_padding.right;
    const size_t ld_src_row   = ld_src_col * (src_shape[1] + src_padding.top + src_padding.bottom);
    const size_t ld_src_batch = ld_src_row * src_shape[2];
    const size_t ld_dst_col   = dst_shape[0] + dst_padding.left + dst_padding.right;
    const size_t ld_dst_row   = ld_dst_col * (dst_shape[1] + dst_padding.top + dst_padding.bottom);
    const size_t ld_dst_batch = ld_dst_row * dst_shape[2];

    _kernel_asm->execute(in_ptr, ld_src_col, ld_src_row, ld_src_batch,
                         out_ptr, ld_dst_col, ld_dst_row, ld_dst_batch,
                         working_space, info.thread_id, info.num_threads);
}
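
The leading-dimension arithmetic in the listing above (ld_*_col, ld_*_row, ld_*_batch) can be checked with a small standalone sketch. Padding, Strides and leading_dims are hypothetical names introduced for illustration and are not part of the library; the function only reproduces the three multiplications and additions from the listing, with all quantities counted in elements.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for illustration; these are not Compute Library types.
struct Padding
{
    std::size_t top, right, bottom, left;
};

struct Strides
{
    std::size_t col, row, batch;
};

// dims[0..2] are the first three tensor dimensions, in the same order the
// listing indexes src_shape / dst_shape.
Strides leading_dims(const std::array<std::size_t, 3> &dims, const Padding &pad)
{
    Strides s{};
    // One "column" step: dimension 0 plus left/right padding.
    s.col = dims[0] + pad.left + pad.right;
    // One "row" step: a column step times dimension 1 plus top/bottom padding.
    s.row = s.col * (dims[1] + pad.top + pad.bottom);
    // One "batch" step: a row step times dimension 2.
    s.batch = s.row * dims[2];
    return s;
}
```

For example, dimensions {8, 4, 3} with padding top = 1, right = 2, bottom = 1, left = 2 give a column step of 12, a row step of 72 and a batch step of 216.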

◆ validate()

Status validate ( const ITensorInfo *      src,
                  const ITensorInfo *      dst,
                  const PoolingLayerInfo & info
                )
static

Indicates whether or not this function can be used to process the given parameters.

Parameters
[in]   src   Source tensor info. Data types supported: QASYMM8/QASYMM8_SIGNED/F16/F32.
[in]   dst   Destination tensor to store the result of pooling. Data types supported: same as src.
[in]   info  Pooling meta-data.
Returns
A status.

Definition at line 91 of file CpuPoolingAssemblyWrapperKernel.cpp.

References ARM_COMPUTE_RETURN_ERROR_MSG, ARM_COMPUTE_RETURN_ERROR_ON, ARM_COMPUTE_RETURN_ERROR_ON_CPU_F16_UNSUPPORTED, ARM_COMPUTE_RETURN_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN, ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES, ARM_COMPUTE_RETURN_ERROR_ON_MSG, ARM_COMPUTE_RETURN_ERROR_ON_NULLPTR, arm_compute::AVG, arm_compute::quantization::calculate_quantized_multiplier(), ITensorInfo::data_layout(), PoolingLayerInfo::data_layout, ITensorInfo::data_type(), PoolingLayerInfo::exclude_padding, arm_compute::F16, arm_compute::F32, PadStrideInfo::has_padding(), arm_compute::MAX, arm_compute::NHWC, PoolingLayerInfo::pad_stride_info, PoolingLayerInfo::pool_type, arm_compute::QASYMM8, arm_compute::QASYMM8_SIGNED, ITensorInfo::quantization_info(), UniformQuantizationInfo::scale, ITensorInfo::total_size(), and QuantizationInfo::uniform().

Referenced by CpuPoolingAssemblyWrapperKernel::name(), and CpuPoolingAssemblyDispatch::validate().

{
#ifndef __aarch64__
    ARM_COMPUTE_RETURN_ERROR_MSG("32-bit is not supported by assembly kernels");
#endif /* __aarch64__ */
    ARM_COMPUTE_RETURN_ERROR_ON_MSG((src->data_layout() != DataLayout::NHWC) || (info.data_layout != DataLayout::NHWC), "Only NHWC is supported by assembly kernels");
    ARM_COMPUTE_RETURN_ERROR_ON_MSG((info.pool_type != PoolingType::AVG) && (info.pool_type != PoolingType::MAX),
                                    "Only AVG and MAX pooling are supported by assembly kernels");

    if(dst->total_size() > 0)
    {
        const auto src_qinfo = src->quantization_info().uniform();
        const auto dst_qinfo = dst->quantization_info().uniform();

        if(src_qinfo != dst_qinfo)
        {
            const float multiplier = src_qinfo.scale / dst_qinfo.scale;
            int32_t     dst_multiplier{};
            int32_t     dst_shift{};
            ARM_COMPUTE_RETURN_ERROR_ON(quantization::calculate_quantized_multiplier(multiplier, &dst_multiplier, &dst_shift));
        }
        else
        {
            if(src->data_type() == DataType::QASYMM8)
            {
                const bool has_padding = info.pad_stride_info.has_padding();
                ARM_COMPUTE_RETURN_ERROR_ON_MSG(!info.exclude_padding && has_padding, "Assembly kernels do not support padding for QASYMM8 with same src/dst quantization info");
            }
        }
    }
    else
    {
        if(src->data_type() == DataType::QASYMM8)
        {
            // If dst is not configured, the quantization info are the same
            const bool has_padding = info.pad_stride_info.has_padding();
            ARM_COMPUTE_RETURN_ERROR_ON_MSG(!info.exclude_padding && has_padding, "Assembly kernels do not support padding for QASYMM8 with same src/dst quantization info");
        }
    }
    return Status{};
}
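
When the source and destination quantization differ, the listing above forms multiplier = src scale / dst scale and asks calculate_quantized_multiplier() for a fixed-point representation. The sketch below shows one common way such a decomposition can be done, assuming a Q31 mantissa and a power-of-two exponent. It is not the library's implementation: QuantizedMultiplier and decompose_multiplier are hypothetical names, and the library's own sign convention for the shift and its handling of edge cases may differ.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical result type for illustration; not a Compute Library type.
struct QuantizedMultiplier
{
    std::int32_t value; // Q31 mantissa, in [2^30, 2^31)
    int          shift; // power-of-two exponent
};

// Decomposes a positive, finite multiplier so that
//   multiplier ~= value * 2^(shift - 31).
// Returns false for inputs this sketch does not handle.
bool decompose_multiplier(float multiplier, QuantizedMultiplier &out)
{
    if(!(multiplier > 0.0f) || !std::isfinite(multiplier))
    {
        return false;
    }
    int exponent = 0;
    // frexp yields a mantissa in [0.5, 1) with multiplier = mantissa * 2^exponent.
    const double mantissa = std::frexp(static_cast<double>(multiplier), &exponent);
    std::int64_t fixed    = std::llround(mantissa * static_cast<double>(1LL << 31));
    // Rounding can reach exactly 2^31; renormalise to keep the value in range.
    if(fixed == (1LL << 31))
    {
        fixed /= 2;
        ++exponent;
    }
    out.value = static_cast<std::int32_t>(fixed);
    out.shift = exponent;
    return true;
}
```

With src scale 0.5 and dst scale 1.0 the multiplier is 0.5, which this sketch represents as value = 2^30 and shift = 0.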

The documentation for this class was generated from the following files: