Compute Library 22.08
arm_compute::cpu::kernels Namespace Reference

Data Structures

struct  ActivationDataTypeISASelectorData
 
struct  CastDataTypeISASelectorData
 
class  CpuActivationKernel
 Interface for the activation kernel. More...
 
class  CpuAddKernel
 Interface for the kernel to perform addition between two tensors. More...
 
struct  CpuAddKernelDataTypeISASelectorData
 
class  CpuArithmeticKernel
 
class  CpuCastKernel
 Casts a given tensor to a new type. More...
 
class  CpuCol2ImKernel
 Kernel to perform col2im reshaping. More...
 
class  CpuComparisonKernel
 
class  CpuComplexMulKernel
 Interface for the complex pixelwise multiplication kernel. More...
 
class  CpuConcatenateBatchKernel
 Interface for the batch concatenate kernel. More...
 
class  CpuConcatenateDepthKernel
 Interface for the depth concatenate kernel. More...
 
class  CpuConcatenateHeightKernel
 Interface for the height concatenate kernel. More...
 
class  CpuConcatenateWidthKernel
 Interface for the width concatenate kernel. More...
 
class  CpuConvertFullyConnectedWeightsKernel
 Interface to convert the 2D Fully Connected weights from NCHW to NHWC or vice versa. More...
 
class  CpuConvertQuantizedSignednessKernel
 Kernel to convert asymmetric unsigned to asymmetric signed quantized values and vice-versa. More...
 
class  CpuCopyKernel
 Kernel to perform a copy between two tensors. More...
 
class  CpuDepthwiseConv2dAssemblyWrapperKernel
 This class is a wrapper for the depthwise convolution assembly kernels. More...
 
class  CpuDepthwiseConv2dNativeKernel
 Interface for the kernel to run a depthwise convolution native on a tensor. More...
 
class  CpuDequantizeKernel
 Interface for the dequantization layer kernel. More...
 
class  CpuDirectConv2dKernel
 Interface for the kernel to perform Direct Convolution Layer. More...
 
class  CpuDirectConv2dOutputStageKernel
 Kernel to accumulate the biases, if provided, or downscale in case of quantized input. More...
 
class  CpuDirectConv3dKernel
 Interface for the kernel to perform 3D Direct Convolution Layer. More...
 
class  CpuDivisionKernel
 
class  CpuElementwiseKernel
 Interface for an element-wise operation kernel. More...
 
class  CpuElementwiseUnaryKernel
 Interface for an element-wise unary operation kernel. More...
 
class  CpuFillKernel
 Kernel for filling a tensor with a given constant value. More...
 
class  CpuFloorKernel
 Cpu accelerated kernel to perform a floor operation. More...
 
class  CpuGemmInterleave4x4Kernel
 Kernel to interleave the elements of a matrix. More...
 
class  CpuGemmLowpMatrixAReductionKernel
 Kernel used to compute the row-vectors of sums of all the entries in each row of Matrix A. More...
 
class  CpuGemmLowpMatrixBReductionKernel
 Kernel used to compute the row-vectors of sums of all the entries in each column of Matrix B. More...
 
class  CpuGemmLowpMatrixMultiplyKernel
 Kernel to multiply matrices. More...
 
class  CpuGemmLowpOffsetContributionKernel
 Kernel used to add the offset contribution after CpuGemmLowpMatrixMultiplyKernel. More...
 
class  CpuGemmLowpOffsetContributionOutputStageKernel
 Kernel used to add the offset contribution and perform the output stage after CpuGemmLowpMatrixMultiplyKernel. More...
 
class  CpuGemmLowpQuantizeDownInt32ScaleKernel
 Kernel used to quantize down the int32 accumulator values of GEMMLowp to QASYMM8/QASYMM8_SIGNED. More...
 
class  CpuGemmLowpQuantizeDownInt32ToInt16ScaleByFixedPointKernel
 Kernel used to quantize down the int32 accumulator values of GEMMLowp to QSYMM16. More...
 
class  CpuGemmLowpQuantizeDownInt32ToInt8ScaleByFixedPointKernel
 Kernel used to quantize down the int32 accumulator values of GEMMLowp to QASYMM8_SIGNED. More...
 
class  CpuGemmLowpQuantizeDownInt32ToUint8ScaleByFixedPointKernel
 Kernel used to quantize down the int32 accumulator values of GEMMLowp to QASYMM8. More...
 
class  CpuGemmMatrixAdditionKernel
 Kernel to perform the in-place matrix addition between 2 matrices taking into account that the second matrix might be weighted by a scalar value beta: More...
 
class  CpuGemmMatrixMultiplyKernel
 Kernel to multiply two input matrices "A" and "B". More...
 
class  CpuGemmTranspose1xWKernel
 Kernel which transposes the elements of a matrix in chunks of 1xW, where W is equal to (16 / element size of the tensor) More...
 
class  CpuIm2ColKernel
 Interface for the im2col reshape kernel. More...
 
class  CpuLogits1DMaxKernel
 Interface for identifying the max value of 1D logits. More...
 
class  CpuLogits1DSoftmaxKernel
 Interface for softmax computation for QASYMM8 with pre-computed max. More...
 
class  CpuMaxUnpoolingLayerKernel
 Interface for the max unpooling layer kernel. More...
 
class  CpuMulKernel
 Interface for the kernel to perform multiplication between two tensors. More...
 
class  CpuPermuteKernel
 Kernel to perform tensor permutation given a permutation vector. More...
 
class  CpuPool2dAssemblyWrapperKernel
 This class is a wrapper for the 2D pooling assembly kernels. More...
 
class  CpuPool2dKernel
 Interface for the pooling layer kernel. More...
 
class  CpuPool3dKernel
 Interface for the kernel to perform Pooling 3D. More...
 
class  CpuPowerKernel
 
class  CpuQuantizeKernel
 Interface for the quantization layer kernel. More...
 
class  CpuReshapeKernel
 Interface for the kernel to perform tensor reshaping. More...
 
class  CpuScaleKernel
 Arm(R) Neon(TM) kernel to perform scaling on a tensor. More...
 
class  CpuSubKernel
 Interface for the kernel to perform subtraction between two tensors. More...
 
class  CpuTransposeKernel
 Kernel which transposes the elements of a matrix. More...
 
class  CpuWeightsReshapeKernel
 Kernel to perform reshaping on the weights used by convolution and locally connected layer. More...
 
struct  DataTypeDataLayoutISASelectorData
 
struct  DataTypeISASelectorData
 
struct  DepthwiseConv2dNativeDataTypeISASelectorData
 
struct  ElementwiseDataTypeISASelectorData
 
struct  PoolDataTypeISASelectorData
 

Typedefs

using DataTypeISASelectorPtr = std::add_pointer< bool(const DataTypeISASelectorData &data)>::type
 
using DataTypeDataLayoutSelectorPtr = std::add_pointer< bool(const DataTypeDataLayoutISASelectorData &data)>::type
 
using PoolDataTypeISASelectorPtr = std::add_pointer< bool(const PoolDataTypeISASelectorData &data)>::type
 
using ElementwiseDataTypeISASelectorPtr = std::add_pointer< bool(const ElementwiseDataTypeISASelectorData &data)>::type
 
using DepthwiseConv2dNativeDataTypeISASelectorPtr = std::add_pointer< bool(const DepthwiseConv2dNativeDataTypeISASelectorData &data)>::type
 
using CastDataTypeISASelectorDataPtr = std::add_pointer< bool(const CastDataTypeISASelectorData &data)>::type
 
using ActivationDataTypeISASelectorDataPtr = std::add_pointer< bool(const ActivationDataTypeISASelectorData &data)>::type
 
using CpuAddKernelDataTypeISASelectorDataPtr = std::add_pointer< bool(const CpuAddKernelDataTypeISASelectorData &data)>::type
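
A minimal usage sketch for these selector typedefs, assuming the selector structs expose the dt (DataType) and isa (CpuIsaInfo) members declared in CpuKernelSelectionTypes.h; the predicate below is hypothetical:

    // Hypothetical predicate: select an FP32 micro-kernel that requires SVE.
    // Assumes DataTypeISASelectorData has `dt` and `isa` members.
    bool is_fp32_sve(const DataTypeISASelectorData &data)
    {
        return data.dt == DataType::F32 && data.isa.sve;
    }

    // The typedef stores the predicate as a plain function pointer in a kernel's micro-kernel table.
    DataTypeISASelectorPtr selector = &is_fp32_sve;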
 

Functions

bool can_interpret_inputs_as_1d_array (const ITensorInfo &src0, const ITensorInfo &src1)
 
Status validate_arguments (const ITensorInfo *src, const ITensorInfo *weights, const ITensorInfo *dst, const PadStrideInfo &conv_info)
 
std::pair< Status, Window > validate_and_configure_window (ITensorInfo *src, ITensorInfo *dst)
 
void neon_fp32_nhwc_directconv2d (const Window &window, const ITensor *src, const ITensor *weights, ITensor *dst, const PadStrideInfo &conv_info)
 
void neon_fp16_nchw_directconv2d (const Window &window, const ITensor *src, const ITensor *weights, ITensor *dst, const PadStrideInfo &conv_info)
 
void neon_fp32_nchw_directconv2d (const Window &window, const ITensor *src, const ITensor *weights, ITensor *dst, const PadStrideInfo &conv_info)
 
template<typename T >
void convolve_nchw (const Window &window, const ITensor *src, const ITensor *weights, ITensor *dst, const PadStrideInfo &conv_info)
 
template void convolve_nchw< float > (const Window &window, const ITensor *src, const ITensor *weights, ITensor *dst, const PadStrideInfo &conv_info)
 
template<typename T >
void convolve_nhwc (const Window &window, const ITensor *src, const ITensor *weights, ITensor *dst, const PadStrideInfo &conv_info)
 
template void convolve_nhwc< float > (const Window &window, const ITensor *src, const ITensor *weights, ITensor *dst, const PadStrideInfo &conv_info)
 

Typedef Documentation

◆ ActivationDataTypeISASelectorDataPtr

using ActivationDataTypeISASelectorDataPtr = std::add_pointer<bool(const ActivationDataTypeISASelectorData &data)>::type

Definition at line 100 of file CpuKernelSelectionTypes.h.

◆ CastDataTypeISASelectorDataPtr

using CastDataTypeISASelectorDataPtr = std::add_pointer<bool(const CastDataTypeISASelectorData &data)>::type

Definition at line 99 of file CpuKernelSelectionTypes.h.

◆ CpuAddKernelDataTypeISASelectorDataPtr

using CpuAddKernelDataTypeISASelectorDataPtr = std::add_pointer<bool(const CpuAddKernelDataTypeISASelectorData &data)>::type

Definition at line 101 of file CpuKernelSelectionTypes.h.

◆ DataTypeDataLayoutSelectorPtr

using DataTypeDataLayoutSelectorPtr = std::add_pointer<bool(const DataTypeDataLayoutISASelectorData &data)>::type

Definition at line 95 of file CpuKernelSelectionTypes.h.

◆ DataTypeISASelectorPtr

using DataTypeISASelectorPtr = std::add_pointer<bool(const DataTypeISASelectorData &data)>::type

Definition at line 94 of file CpuKernelSelectionTypes.h.

◆ DepthwiseConv2dNativeDataTypeISASelectorPtr

using DepthwiseConv2dNativeDataTypeISASelectorPtr = std::add_pointer<bool(const DepthwiseConv2dNativeDataTypeISASelectorData &data)>::type

Definition at line 98 of file CpuKernelSelectionTypes.h.

◆ ElementwiseDataTypeISASelectorPtr

using ElementwiseDataTypeISASelectorPtr = std::add_pointer<bool(const ElementwiseDataTypeISASelectorData &data)>::type

Definition at line 97 of file CpuKernelSelectionTypes.h.

◆ PoolDataTypeISASelectorPtr

using PoolDataTypeISASelectorPtr = std::add_pointer<bool(const PoolDataTypeISASelectorData &data)>::type

Definition at line 96 of file CpuKernelSelectionTypes.h.

Function Documentation

◆ can_interpret_inputs_as_1d_array()

bool arm_compute::cpu::kernels::can_interpret_inputs_as_1d_array(const ITensorInfo &src0, const ITensorInfo &src1)

Definition at line 42 of file CpuAddKernel.cpp.

References arm_compute::cpu::add_fp16_neon(), arm_compute::cpu::add_fp16_neon_as_1d_array(), arm_compute::cpu::add_fp16_sve(), arm_compute::cpu::add_fp32_neon(), arm_compute::cpu::add_fp32_neon_as_1d_array(), arm_compute::cpu::add_fp32_sve(), arm_compute::cpu::add_qasymm8_neon(), arm_compute::cpu::add_qasymm8_signed_neon(), arm_compute::cpu::add_qasymm8_signed_sve2(), arm_compute::cpu::add_qasymm8_sve2(), arm_compute::cpu::add_qsymm16_neon(), arm_compute::cpu::add_qsymm16_sve2(), arm_compute::cpu::add_s16_neon(), arm_compute::cpu::add_s16_neon_as_1d_array(), arm_compute::cpu::add_s16_sve(), arm_compute::cpu::add_s32_neon(), arm_compute::cpu::add_s32_neon_as_1d_array(), arm_compute::cpu::add_s32_sve(), arm_compute::cpu::add_u8_neon(), arm_compute::cpu::add_u8_neon_as_1d_array(), arm_compute::cpu::add_u8_sve(), ARM_COMPUTE_RETURN_ERROR_ON, ARM_COMPUTE_RETURN_ERROR_ON_CPU_F16_UNSUPPORTED, ARM_COMPUTE_RETURN_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN, ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES, ARM_COMPUTE_RETURN_ERROR_ON_MSG, ARM_COMPUTE_UNUSED, TensorShape::broadcast_shape(), arm_compute::calculate_max_window(), ITensorInfo::data_type(), arm_compute::test::validation::dst, arm_compute::F16, arm_compute::F32, CPUInfo::get(), CPUInfo::get_isa(), ITensorInfo::has_padding(), arm_compute::detail::have_different_dimensions(), arm_compute::QASYMM8, arm_compute::QASYMM8_SIGNED, arm_compute::QSYMM16, REGISTER_FP16_NEON, REGISTER_FP16_SVE, REGISTER_FP32_NEON, REGISTER_FP32_SVE, REGISTER_INTEGER_NEON, REGISTER_INTEGER_SVE, REGISTER_QASYMM8_NEON, REGISTER_QASYMM8_SIGNED_NEON, REGISTER_QASYMM8_SIGNED_SVE2, REGISTER_QASYMM8_SVE2, REGISTER_QSYMM16_NEON, REGISTER_QSYMM16_SVE2, arm_compute::S16, arm_compute::S32, Window::set(), arm_compute::set_data_type_if_unknown(), arm_compute::set_shape_if_empty(), ITensorInfo::tensor_shape(), TensorShape::total_size(), ITensorInfo::total_size(), arm_compute::U8, validate_and_configure_window(), validate_arguments(), and Dimensions< T >::x().

Referenced by CpuAddKernel::configure(), and arm_compute::test::validation::DATA_TEST_CASE().

{
    return !src0.has_padding() && !src1.has_padding() && src0.tensor_shape() == src1.tensor_shape();
}
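
As the body shows, the fast path applies only when neither input carries padding and the two shapes match exactly, so both tensors can be walked as flat 1D arrays. A hedged usage sketch (shapes are illustrative):

    TensorInfo a(TensorShape(16U, 8U), 1, DataType::F32);
    TensorInfo b(TensorShape(16U, 8U), 1, DataType::F32);
    const bool flat = can_interpret_inputs_as_1d_array(a, b); // true: no padding, identical shapes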

◆ convolve_nchw()

template <typename T>
void convolve_nchw(const Window &window, const ITensor *src, const ITensor *weights, ITensor *dst, const PadStrideInfo &conv_info)

Definition at line 62 of file all.cpp.

References ARM_COMPUTE_UNUSED, ITensor::buffer(), arm_compute::calculate_max_window(), arm_compute::test::validation::conv_info, conv_pad_left, conv_pad_top, convolve_nchw< float >(), ITensorInfo::dimension(), Window::DimX, Window::DimY, Window::DimZ, arm_compute::test::validation::dst, ITensorInfo::element_size(), arm_compute::execute_window_loop(), ITensor::info(), ITensorInfo::offset_first_element_in_bytes(), PadStrideInfo::pad_left(), PadStrideInfo::pad_top(), Iterator::ptr(), Window::set(), arm_compute::test::validation::src, PadStrideInfo::stride(), ITensorInfo::strides_in_bytes(), type, arm_compute::wrapper::vdup_n(), arm_compute::wrapper::vloadq(), arm_compute::wrapper::vmla(), arm_compute::vreduce(), Dimensions< T >::x(), Dimensions< T >::y(), and Dimensions< T >::z().

{
    ARM_COMPUTE_UNUSED(conv_info);

    // Declare useful types
    using vtype       = wrapper::traits::neon_bitvector<T, wrapper::traits::BitWidth::W128>;
    using vector_type = typename vtype::type;
    using tag_type    = typename vtype::tag_type;

    // Scalar quantities
    const int element_size   = src->info()->element_size();
    const int input_stride_w = src->info()->strides_in_bytes()[0] / element_size;
    const int input_stride_h = src->info()->strides_in_bytes()[1] / element_size;
    const int input_stride_c = src->info()->strides_in_bytes()[2] / element_size;
    const int input_stride_n = src->info()->strides_in_bytes()[3] / element_size;

    const int input_dim_w = src->info()->dimension(0);
    const int input_dim_h = src->info()->dimension(1);

    const int output_stride_c = dst->info()->strides_in_bytes()[2];

    const unsigned int kernel_stride_w = weights->info()->strides_in_bytes().x() / element_size;
    const unsigned int kernel_stride_h = weights->info()->strides_in_bytes().y() / element_size;
    const unsigned int kernel_stride_c = weights->info()->strides_in_bytes().z() / element_size;

    const int kernel_dim_w = weights->info()->dimension(0);
    const int kernel_dim_h = weights->info()->dimension(1);

    const int conv_pad_top  = conv_info.pad_top();
    const int conv_pad_left = conv_info.pad_left();
    const int conv_stride_w = std::get<0>(conv_info.stride());
    const int conv_stride_h = std::get<1>(conv_info.stride());

    // Set up the window for the output iterator
    Window window_out = window;
    window_out.set(Window::DimZ, Window::Dimension(0, 1, 1));

    // Set up the window for the weights iterator
    Window window_w = calculate_max_window(*weights->info(), Steps());
    window_w.set(Window::DimX, Window::Dimension(0, 1, 1));
    window_w.set(Window::DimY, Window::Dimension(0, 1, 1));
    window_w.set(Window::DimZ, Window::Dimension(0, 1, 1));

    Iterator out(dst, window_out);
    Iterator wei(weights, window_w);

    constexpr int num_elems_read_per_iteration = 16 / sizeof(T);

    execute_window_loop(window_out, [&](const Coordinates & id)
    {
        // Compute the theoretical input starting points
        const int in_w_start_t = static_cast<int>(id.x()) * conv_stride_w - conv_pad_left;
        const int in_h_start_t = static_cast<int>(id.y()) * conv_stride_h - conv_pad_top;
        const int in_w_end_t   = in_w_start_t + kernel_dim_w;
        const int in_h_end_t   = in_h_start_t + kernel_dim_h;

        // Compute the valid initial and ending input points by clamping to the borders
        const int in_w_start = std::max(in_w_start_t, 0);
        const int in_h_start = std::max(in_h_start_t, 0);
        const int in_w_end   = std::min(in_w_end_t, input_dim_w);
        const int in_h_end   = std::min(in_h_end_t, input_dim_h);

        // Use the input points to select the valid weight points
        const int wei_w_start = in_w_start - in_w_start_t;
        const int wei_h_start = in_h_start - in_h_start_t;
        const int wei_h_end   = kernel_dim_h - (in_h_end_t - in_h_end);

        const int      index_c_end  = weights->info()->dimension(2);
        const T *const in_ptr_start = reinterpret_cast<const T *>(src->buffer() + src->info()->offset_first_element_in_bytes()) + id[3] * input_stride_n;
        execute_window_loop(window_w, [&](const Coordinates & id_w)
        {
            const T *const weights_ptr_start = reinterpret_cast<const T *>(wei.ptr());
            uint8_t       *out_ptr           = out.ptr() + id_w[3] * output_stride_c;
            T              out_temp          = static_cast<T>(0);

            for(int index_wei_c = 0, index_in_c = 0; index_wei_c < index_c_end; ++index_wei_c, ++index_in_c)
            {
                const T *const in_ptr_row_0      = in_ptr_start + index_in_c * input_stride_c;
                const T *const weights_ptr_row_0 = weights_ptr_start + index_wei_c * kernel_stride_c;
                for(int index_wei_h = wei_h_start, index_in_h = in_h_start; index_wei_h < wei_h_end; ++index_wei_h, ++index_in_h)
                {
                    const T    *in_ptr_row      = in_ptr_row_0 + index_in_h * input_stride_h;
                    const T    *weights_ptr_row = weights_ptr_row_0 + index_wei_h * kernel_stride_h;
                    int         index_w         = in_w_start;
                    int         index_wei_w     = wei_w_start;
                    vector_type out_temp_vec    = wrapper::vdup_n(static_cast<T>(0), tag_type());
                    for(; index_w <= ((in_w_end - num_elems_read_per_iteration)); index_w += num_elems_read_per_iteration, index_wei_w += num_elems_read_per_iteration)
                    {
                        const auto src_vec = wrapper::vloadq(in_ptr_row + index_w * input_stride_w);
                        const auto w_vec   = wrapper::vloadq(weights_ptr_row + index_wei_w * kernel_stride_w);
                        out_temp_vec       = wrapper::vmla(out_temp_vec, w_vec, src_vec);
                    }
                    out_temp += vreduce(out_temp_vec);
                    for(; index_w < in_w_end; ++index_w, ++index_wei_w)
                    {
                        const auto src_val = *(in_ptr_row + index_w * input_stride_w);
                        const auto w_val   = *(weights_ptr_row + index_wei_w * kernel_stride_w);
                        out_temp += src_val * w_val;
                    }
                }
            }
            *(reinterpret_cast<T *>(out_ptr)) = out_temp;
        },
        wei);
    },
    out);
}
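To make the border arithmetic in the loop above concrete, here is a small self-contained sketch; all values are illustrative rather than taken from the kernel:

    #include <algorithm>
    #include <cstdio>

    int main()
    {
        // Illustrative configuration: pad_left = 1, stride_w = 1, 3-wide kernel, 5-wide input.
        const int conv_pad_left = 1, conv_stride_w = 1, kernel_dim_w = 3, input_dim_w = 5;
        for(int out_x = 0; out_x < 5; ++out_x)
        {
            const int in_w_start_t = out_x * conv_stride_w - conv_pad_left; // may fall in the left halo
            const int in_w_end_t   = in_w_start_t + kernel_dim_w;
            const int in_w_start   = std::max(in_w_start_t, 0);             // clamp to the real input
            const int in_w_end     = std::min(in_w_end_t, input_dim_w);
            const int wei_w_start  = in_w_start - in_w_start_t;             // weight columns skipped over the halo
            std::printf("out_x=%d reads input [%d,%d) starting at weight column %d\n", out_x, in_w_start, in_w_end, wei_w_start);
        }
        return 0;
    }

For out_x = 0 the theoretical start is -1, so the kernel clamps to column 0 and begins at weight column 1, reproducing zero-padding semantics without ever reading out of bounds.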

◆ convolve_nchw< float >()

template void arm_compute::cpu::kernels::convolve_nchw<float>(const Window &window, const ITensor *src, const ITensor *weights, ITensor *dst, const PadStrideInfo &conv_info)

◆ convolve_nhwc()

template <typename T>
void convolve_nhwc(const Window &window, const ITensor *src, const ITensor *weights, ITensor *dst, const PadStrideInfo &conv_info)

Definition at line 57 of file impl.cpp.

References ITensor::buffer(), arm_compute::calculate_max_window(), arm_compute::test::validation::conv_info, conv_pad_left, conv_pad_top, convolve_nhwc< float >(), ITensorInfo::dimension(), Window::DimX, Window::DimY, Window::DimZ, arm_compute::test::validation::dst, ITensorInfo::element_size(), arm_compute::execute_window_loop(), ITensor::info(), ITensorInfo::offset_first_element_in_bytes(), PadStrideInfo::pad_left(), PadStrideInfo::pad_top(), Iterator::ptr(), Window::set(), arm_compute::test::validation::src, PadStrideInfo::stride(), ITensorInfo::strides_in_bytes(), type, arm_compute::wrapper::vdup_n(), arm_compute::wrapper::vloadq(), arm_compute::wrapper::vmla(), arm_compute::vreduce(), Dimensions< T >::x(), Dimensions< T >::y(), and Dimensions< T >::z().

{
    // Declare useful types
    using vtype       = wrapper::traits::neon_bitvector<T, wrapper::traits::BitWidth::W128>;
    using vector_type = typename vtype::type;
    using tag_type    = typename vtype::tag_type;

    // Scalar quantities
    const int element_size   = src->info()->element_size();
    const int input_stride_w = src->info()->strides_in_bytes().y() / element_size;
    const int input_stride_h = src->info()->strides_in_bytes().z() / element_size;
    const int input_stride_n = src->info()->strides_in_bytes()[3] / element_size;
    const int input_dim_w    = src->info()->dimension(1);
    const int input_dim_h    = src->info()->dimension(2);

    const int output_stride_c = dst->info()->strides_in_bytes().x();

    const unsigned int kernel_stride_w = weights->info()->strides_in_bytes().y() / element_size;
    const unsigned int kernel_stride_h = weights->info()->strides_in_bytes().z() / element_size;
    const int          kernel_dim_w    = weights->info()->dimension(1);
    const int          kernel_dim_h    = weights->info()->dimension(2);

    const int conv_pad_top  = conv_info.pad_top();
    const int conv_pad_left = conv_info.pad_left();
    const int conv_stride_w = std::get<0>(conv_info.stride());
    const int conv_stride_h = std::get<1>(conv_info.stride());

    // Set up the window for the output iterator
    Window window_out = window;
    window_out.set(Window::DimX, Window::Dimension(0, 1, 1));

    // Set up the window for the weights iterator
    Window window_w = calculate_max_window(*weights->info(), Steps());
    window_w.set(Window::DimX, Window::Dimension(0, 1, 1));
    window_w.set(Window::DimY, Window::Dimension(0, 1, 1));
    window_w.set(Window::DimZ, Window::Dimension(0, 1, 1));

    Iterator out(dst, window_out);
    Iterator wei(weights, window_w);

    constexpr int num_elems_read_per_iteration = 16 / sizeof(T);

    // nhwc optimized
    if(have_zero_x_internal_padding(src->info(), weights->info()))
    {
        // This path assumes that input and weights have no padding in the channel dimension

        /*
         * This implementation parallelizes the full WC plane of the input and
         * weights by treating them as a series of elements. For example, with
         * 3x3 weights and floating-point vector operations of 4 elements at a
         * time, the first 3 channel elements of the first row would be taken
         * together with the first element of the second row. The 9 elements in
         * each single WC weight plane would then require two 4-element vector
         * operations and one final single-element operation.
         *
         * This works because, when the input vector to multiply with the
         * weights is created, the exact required elements are loaded in the
         * same order. Therefore the multiplication works on the correct
         * input/weight elements.
         */
        execute_window_loop(
            window_out, [&](const Coordinates & id)
        {
            /*
             * Here we create theoretical indexes which we then validate for
             * both inputs and weights.
             * As a reminder, this loop takes each output point in NHW; C is
             * handled in the weights loop.
             */
            // Compute the theoretical input starting points
            const int in_w_start_t = static_cast<int>(id.y()) * conv_stride_w - conv_pad_left;
            const int in_h_start_t = static_cast<int>(id.z()) * conv_stride_h - conv_pad_top;
            const int in_w_end_t   = in_w_start_t + kernel_dim_w;
            const int in_h_end_t   = in_h_start_t + kernel_dim_h;

            // Compute the valid initial and ending input points by clamping to the borders
            const int in_w_start = std::max(in_w_start_t, 0);
            const int in_h_start = std::max(in_h_start_t, 0);
            const int in_w_end   = std::min(in_w_end_t, input_dim_w);
            const int in_h_end   = std::min(in_h_end_t, input_dim_h);

            // Use the input points to select the valid weight points
            const int index_wc_start = (in_w_start - in_w_start_t) * kernel_stride_w;
            const int index_h_start  = in_h_start - in_h_start_t;
            const int index_wc_end   = (kernel_dim_w - (in_w_end_t - in_w_end)) * kernel_stride_w;
            const int index_h_end    = kernel_dim_h - (in_h_end_t - in_h_end);

            execute_window_loop(
                window_w, [&](const Coordinates & id_w)
            {
                /*
                 * This is the loop over the weights, and it goes along N (the
                 * batches). As a reminder, the batches of the weights are
                 * translated into the channels of the output.
                 */
                const T *in_ptr_row = reinterpret_cast<const T *>(src->buffer() + src->info()->offset_first_element_in_bytes())
                                      + id[3] * input_stride_n + in_w_start * input_stride_w + in_h_start * input_stride_h;
                const T *weights_ptr_row = reinterpret_cast<const T *>(wei.ptr()) + index_h_start * kernel_stride_h;
                uint8_t *out_ptr         = out.ptr() + id_w[3] * output_stride_c;

                T out_temp = static_cast<T>(0);
                for(int index_h = index_h_start; index_h < index_h_end; ++index_h, in_ptr_row += input_stride_h, weights_ptr_row += kernel_stride_h)
                {
                    const T    *in_ptr_mover = in_ptr_row;
                    int         index_wc     = index_wc_start;
                    vector_type out_temp_vec = wrapper::vdup_n(static_cast<T>(0), tag_type());
                    for(; index_wc <= index_wc_end - num_elems_read_per_iteration; index_wc += num_elems_read_per_iteration, in_ptr_mover += num_elems_read_per_iteration)
                    {
                        const auto src_vec = wrapper::vloadq(in_ptr_mover);
                        const auto w_vec   = wrapper::vloadq(weights_ptr_row + index_wc);
                        out_temp_vec       = wrapper::vmla(out_temp_vec, w_vec, src_vec);
                    }
                    out_temp += vreduce(out_temp_vec);
                    for(; index_wc < index_wc_end; ++index_wc, ++in_ptr_mover)
                    {
                        const auto src_val = *(in_ptr_mover);
                        const auto w_val   = *(weights_ptr_row + index_wc);
                        out_temp += src_val * w_val;
                    }
                }
                *(reinterpret_cast<T *>(out_ptr)) = out_temp;
            },
            wei);
        },
        out);
    }
    else // nhwc non optimized
    {
        execute_window_loop(
            window_out, [&](const Coordinates & id)
        {
            // Compute the theoretical input starting points
            const int in_w_start_t = static_cast<int>(id.y()) * conv_stride_w - conv_pad_left;
            const int in_h_start_t = static_cast<int>(id.z()) * conv_stride_h - conv_pad_top;
            const int in_w_end_t   = in_w_start_t + kernel_dim_w;
            const int in_h_end_t   = in_h_start_t + kernel_dim_h;

            // Compute the valid initial and ending input points by clamping to the borders
            const int in_w_start = std::max(in_w_start_t, 0);
            const int in_h_start = std::max(in_h_start_t, 0);
            const int in_w_end   = std::min(in_w_end_t, input_dim_w);
            const int in_h_end   = std::min(in_h_end_t, input_dim_h);

            // Use the input points to select the valid weight points
            const int wei_w_start = in_w_start - in_w_start_t;
            const int wei_h_start = in_h_start - in_h_start_t;
            const int wei_w_end   = kernel_dim_w - (in_w_end_t - in_w_end);
            const int wei_h_end   = kernel_dim_h - (in_h_end_t - in_h_end);

            const int      index_c_end  = weights->info()->dimension(0);
            const T *const in_ptr_start = reinterpret_cast<const T *>(src->buffer() + src->info()->offset_first_element_in_bytes()) + id[3] * input_stride_n;

            execute_window_loop(
                window_w, [&](const Coordinates & id_w)
            {
                const T *const weights_ptr_start = reinterpret_cast<const T *>(wei.ptr());
                uint8_t       *out_ptr           = out.ptr() + id_w[3] * output_stride_c;

                T out_temp = static_cast<T>(0);
                for(int index_wei_h = wei_h_start, index_in_h = in_h_start; index_wei_h < wei_h_end; ++index_wei_h, ++index_in_h)
                {
                    const T *const in_ptr_row      = in_ptr_start + index_in_h * input_stride_h;
                    const T *const weights_ptr_row = weights_ptr_start + index_wei_h * kernel_stride_h;
                    for(int index_wei_w = wei_w_start, index_in_w = in_w_start; index_wei_w < wei_w_end; ++index_wei_w, ++index_in_w)
                    {
                        const T    *in_ptr_mover      = in_ptr_row + index_in_w * input_stride_w;
                        const T    *weights_ptr_mover = weights_ptr_row + index_wei_w * kernel_stride_w;
                        int         index_c           = 0;
                        vector_type out_temp_vec      = wrapper::vdup_n(static_cast<T>(0), tag_type());
                        for(; index_c <= index_c_end - num_elems_read_per_iteration; index_c += num_elems_read_per_iteration, in_ptr_mover += num_elems_read_per_iteration, weights_ptr_mover += num_elems_read_per_iteration)
                        {
                            const auto src_vec = wrapper::vloadq(in_ptr_mover);
                            const auto w_vec   = wrapper::vloadq(weights_ptr_mover);
                            out_temp_vec       = wrapper::vmla(out_temp_vec, w_vec, src_vec);
                        }
                        out_temp += vreduce(out_temp_vec);
                        for(; index_c < index_c_end; ++index_c, ++in_ptr_mover, ++weights_ptr_mover)
                        {
                            const auto src_val = *(in_ptr_mover);
                            const auto w_val   = *(weights_ptr_mover);
                            out_temp += src_val * w_val;
                        }
                    }
                }
                *(reinterpret_cast<T *>(out_ptr)) = out_temp;
            },
            wei);
        },
        out);
    }
}
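The optimized branch depends on W and C being contiguous in memory so that an entire W x C plane can be consumed as one flat run. A minimal sketch of that idea with plain arrays (function name and sizes are illustrative):

    #include <cstddef>

    // With no internal padding along x, a kernel_w x channels NHWC weight plane is one
    // contiguous run of kernel_w * channels elements, so a SIMD loop can stride straight
    // through it, crossing w boundaries mid-vector exactly as the comment above describes.
    float dot_wc_plane(const float *in, const float *wei, std::size_t kernel_w, std::size_t channels)
    {
        const std::size_t flat_len = kernel_w * channels; // e.g. 3 * 3 = 9 -> two 4-wide ops plus one scalar tail
        float             acc      = 0.f;
        for(std::size_t i = 0; i < flat_len; ++i)         // the real kernel vectorizes this with vloadq/vmla
        {
            acc += in[i] * wei[i];
        }
        return acc;
    }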

◆ convolve_nhwc< float >()

template void arm_compute::cpu::kernels::convolve_nhwc<float>(const Window &window, const ITensor *src, const ITensor *weights, ITensor *dst, const PadStrideInfo &conv_info)

◆ neon_fp16_nchw_directconv2d()

void arm_compute::cpu::kernels::neon_fp16_nchw_directconv2d(const Window &window, const ITensor *src, const ITensor *weights, ITensor *dst, const PadStrideInfo &conv_info)

◆ neon_fp32_nchw_directconv2d()

void neon_fp32_nchw_directconv2d(const Window &window, const ITensor *src, const ITensor *weights, ITensor *dst, const PadStrideInfo &conv_info)

Definition at line 56 of file all.cpp.

References arm_compute::test::validation::conv_info, convolve_nchw< float >(), arm_compute::test::validation::dst, and arm_compute::test::validation::src.

{
    convolve_nchw<float>(window, src, weights, dst, conv_info);
}

◆ neon_fp32_nhwc_directconv2d()

void neon_fp32_nhwc_directconv2d(const Window &window, const ITensor *src, const ITensor *weights, ITensor *dst, const PadStrideInfo &conv_info)

Definition at line 33 of file fp32.cpp.

References arm_compute::test::validation::conv_info, convolve_nhwc< float >(), arm_compute::test::validation::dst, and arm_compute::test::validation::src.

{
    convolve_nhwc<float>(window, src, weights, dst, conv_info);
}

◆ validate_and_configure_window()

std::pair<Status, Window> arm_compute::cpu::kernels::validate_and_configure_window(ITensorInfo *src, ITensorInfo *dst)

Definition at line 92 of file CpuDirectConv2dKernel.cpp.

References ARM_COMPUTE_CREATE_ERROR, ARM_COMPUTE_ERROR_ON, ARM_COMPUTE_UNUSED, arm_compute::calculate_max_window(), ITensorInfo::data_layout(), arm_compute::RUNTIME_ERROR, and arm_compute::UNKNOWN.

Referenced by can_interpret_inputs_as_1d_array(), CpuConvertQuantizedSignednessKernel::configure(), CpuCopyKernel::configure(), CpuActivationKernel::configure(), CpuPool2dKernel::configure(), ClGemmLowpMatrixMultiplyNativeKernel::configure(), CpuDirectConv2dKernel::configure(), NEInstanceNormalizationLayerKernel::configure(), ClGemmMatrixMultiplyNativeKernel::configure(), ClWinogradFilterTransformKernel::configure(), ClWinogradInputTransformKernel::configure(), NEFFTScaleKernel::configure(), NEFFTDigitReverseKernel::configure(), ClGemmMatrixMultiplyReshapedOnlyRhsMMULKernel::configure(), CLChannelShuffleLayerKernel::configure(), ClWinogradOutputTransformKernel::configure(), CLQLSTMLayerNormalizationKernel::configure(), CLGatherKernel::configure(), NEFFTRadixStageKernel::configure(), CLNormalizationLayerKernel::configure(), NEMeanStdDevNormalizationKernel::configure(), CLComparisonKernel::configure(), ClGemmLowpMatrixMultiplyReshapedKernel::configure(), NEStackLayerKernel::configure(), CLFFTDigitReverseKernel::configure(), ClGemmReshapeRhsMatrixKernel::configure(), CpuAddKernel::configure(), NEStridedSliceKernel::configure(), ClCol2ImKernel::configure(), CLFFTRadixStageKernel::configure(), CLPriorBoxLayerKernel::configure(), ClGemmLowpMatrixMultiplyReshapedOnlyRhsKernel::configure(), CLStackLayerKernel::configure(), ClGemmMatrixMultiplyReshapedOnlyRhsKernel::configure(), ClIm2ColKernel::configure(), CLDeconvolutionReshapeOutputKernel::configure(), ClGemmMatrixMultiplyReshapedKernel::configure(), CpuCopyKernel::validate(), CpuActivationKernel::validate(), CpuPool2dKernel::validate(), ClGemmLowpMatrixMultiplyNativeKernel::validate(), CpuDirectConv2dKernel::validate(), ClWinogradInputTransformKernel::validate(), ClWinogradFilterTransformKernel::validate(), ClGemmMatrixMultiplyNativeKernel::validate(), NEFFTScaleKernel::validate(), NEInstanceNormalizationLayerKernel::validate(), ClGemmMatrixMultiplyReshapedOnlyRhsMMULKernel::validate(), ClWinogradOutputTransformKernel::validate(), CLChannelShuffleLayerKernel::validate(), NEFFTDigitReverseKernel::validate(), NEFFTRadixStageKernel::validate(), ClGemmLowpMatrixMultiplyReshapedKernel::validate(), CLQLSTMLayerNormalizationKernel::validate(), ClGemmReshapeRhsMatrixKernel::validate(), NEMeanStdDevNormalizationKernel::validate(), ClGemmLowpMatrixMultiplyReshapedOnlyRhsMMULKernel::validate(), CLComparisonKernel::validate(), CpuAddKernel::validate(), CLNormalizationLayerKernel::validate(), CLGatherKernel::validate(), CLFFTDigitReverseKernel::validate(), ClCol2ImKernel::validate(), NEStackLayerKernel::validate(), CLFFTRadixStageKernel::validate(), ClGemmLowpMatrixMultiplyReshapedOnlyRhsKernel::validate(), CLPriorBoxLayerKernel::validate(), ClIm2ColKernel::validate(), NEStridedSliceKernel::validate(), and CLStackLayerKernel::validate().

{
    ARM_COMPUTE_ERROR_ON(src->data_layout() == DataLayout::UNKNOWN);
    ARM_COMPUTE_UNUSED(src);

    Window win{};
    bool   window_changed = false;

    // Configure window without any padding
    win = calculate_max_window(*dst, Steps());

    Status err = (window_changed) ? ARM_COMPUTE_CREATE_ERROR(ErrorCode::RUNTIME_ERROR, "Insufficient Padding!") : Status{};
    return std::make_pair(err, win);
}
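A hedged configure-time sketch of how the returned pair would typically be consumed (shapes are illustrative):

    TensorInfo src(TensorShape(32U, 32U, 3U), 1, DataType::F32);
    TensorInfo dst(TensorShape(30U, 30U, 8U), 1, DataType::F32);

    auto win_config = validate_and_configure_window(&src, &dst);
    ARM_COMPUTE_ERROR_THROW_ON(win_config.first); // surfaces "Insufficient Padding!" if it were ever set
    const Window win = win_config.second;         // maximum window over dst, configured without padding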

◆ validate_arguments()

Status arm_compute::cpu::kernels::validate_arguments(const ITensorInfo *src, const ITensorInfo *weights, const ITensorInfo *dst, const PadStrideInfo &conv_info)

Definition at line 60 of file CpuDirectConv2dKernel.cpp.

References ARM_COMPUTE_RETURN_ERROR_ON, ARM_COMPUTE_RETURN_ERROR_ON_CPU_F16_UNSUPPORTED, ARM_COMPUTE_RETURN_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN, ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES, ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DIMENSIONS, ARM_COMPUTE_RETURN_ERROR_ON_NULLPTR, ARM_COMPUTE_UNUSED, arm_compute::CHANNEL, arm_compute::misc::shape_calculator::compute_deep_convolution_shape(), ITensorInfo::data_layout(), arm_compute::test::validation::data_layout, arm_compute::test::validation::data_type, ITensorInfo::data_type(), ITensorInfo::dimension(), arm_compute::F16, arm_compute::F32, arm_compute::get_data_layout_dimension_index(), arm_compute::HEIGHT, arm_compute::NHWC, ITensorInfo::num_dimensions(), arm_compute::test::validation::output_shape, ITensorInfo::tensor_shape(), ITensorInfo::total_size(), arm_compute::UNKNOWN, and arm_compute::WIDTH.

Referenced by can_interpret_inputs_as_1d_array(), CpuConvertQuantizedSignednessKernel::configure(), CpuDequantizeKernel::configure(), CpuReshapeKernel::configure(), CpuCopyKernel::configure(), ClReshapeKernel::configure(), ClDequantizeKernel::configure(), ClFloorKernel::configure(), CpuConcatenateBatchKernel::configure(), CpuFloorKernel::configure(), CpuPermuteKernel::configure(), ClCopyKernel::configure(), ClElementWiseUnaryKernel::configure(), CpuConcatenateHeightKernel::configure(), CpuConcatenateWidthKernel::configure(), ClScaleKernel::configure(), ClWidthConcatenate2TensorsKernel::configure(), CpuQuantizeKernel::configure(), ClHeightConcatenateKernel::configure(), ClPool2dKernel::configure(), ClPool3dKernel::configure(), ClWidthConcatenateKernel::configure(), ClActivationKernel::configure(), CpuActivationKernel::configure(), ClQuantizeKernel::configure(), ClPermuteKernel::configure(), ClWidthConcatenate4TensorsKernel::configure(), CPPDetectionOutputLayer::configure(), CpuPool2dKernel::configure(), CLStridedSliceKernel::configure(), ClBatchConcatenateKernel::configure(), ClDepthConcatenateKernel::configure(), CpuConcatenateDepthKernel::configure(), CpuDirectConv2dOutputStageKernel::configure(), ClGemmLowpMatrixMultiplyNativeKernel::configure(), NEBatchToSpaceLayerKernel::configure(), CpuMaxUnpoolingLayerKernel::configure(), NEReverseKernel::configure(), NETileKernel::configure(), CpuDirectConv2dKernel::configure(), NEChannelShuffleLayerKernel::configure(), NEDepthToSpaceLayerKernel::configure(), NEPriorBoxLayerKernel::configure(), NESpaceToDepthLayerKernel::configure(), ClGemmReshapeLhsMatrixKernel::configure(), CPPTopKVKernel::configure(), CpuGemmLowpMatrixMultiplyKernel::configure(), CpuPool3dKernel::configure(), CpuScaleKernel::configure(), NEComputeAllAnchorsKernel::configure(), NEInstanceNormalizationLayerKernel::configure(), NEReorgLayerKernel::configure(), ClGemmMatrixMultiplyNativeKernel::configure(), ClWinogradFilterTransformKernel::configure(), ClWinogradInputTransformKernel::configure(), CLInstanceNormalizationLayerKernel::configure(), CLMaxUnpoolingLayerKernel::configure(), NEFFTDigitReverseKernel::configure(), NEFFTScaleKernel::configure(), NESpaceToBatchLayerKernel::configure(), ClGemmMatrixMultiplyReshapedOnlyRhsMMULKernel::configure(), CPPPermuteKernel::configure(), CLChannelShuffleLayerKernel::configure(), CLReverseKernel::configure(), CpuSubKernel::configure(), NENormalizationLayerKernel::configure(), CLSelectKernel::configure(), ClGemmLowpQuantizeDownInt32ScaleByFixedPointKernel::configure(), CLBatchToSpaceLayerKernel::configure(), ClWinogradOutputTransformKernel::configure(), NEPadLayerKernel::configure(), CLDepthToSpaceLayerKernel::configure(), CLSpaceToDepthLayerKernel::configure(), ClCastKernel::configure(), NERangeKernel::configure(), ClGemmLowpQuantizeDownInt32ScaleByFloatKernel::configure(), ClGemmLowpQuantizeDownInt32ScaleKernel::configure(), CLComputeAllAnchorsKernel::configure(), CpuCastKernel::configure(), CLNormalizationLayerKernel::configure(), CpuDepthwiseConv2dNativeKernel::configure(), CpuDirectConv3dKernel::configure(), NEBoundingBoxTransformKernel::configure(), CpuGemmLowpOffsetContributionKernel::configure(), CLFFTScaleKernel::configure(), CLQLSTMLayerNormalizationKernel::configure(), NEFFTRadixStageKernel::configure(), CLGatherKernel::configure(), NEROIPoolingLayerKernel::configure(), CLSpaceToBatchLayerKernel::configure(), CLTileKernel::configure(), CpuGemmLowpQuantizeDownInt32ScaleKernel::configure(), CLComparisonKernel::configure(), 
ClGemmLowpMatrixMultiplyReshapedKernel::configure(), NEStackLayerKernel::configure(), CLFFTDigitReverseKernel::configure(), CpuCol2ImKernel::configure(), CPPDetectionPostProcessLayer::configure(), CpuGemmLowpQuantizeDownInt32ToInt16ScaleByFixedPointKernel::configure(), CPPNonMaximumSuppressionKernel::configure(), NEReductionOperationKernel::configure(), CLReorgLayerKernel::configure(), ClGemmLowpMatrixMultiplyReshapedOnlyRhsMMULKernel::configure(), NEFuseBatchNormalizationKernel::configure(), ClGemmLowpOffsetContributionKernel::configure(), ClGemmLowpOffsetContributionOutputStageKernel::configure(), ClGemmReshapeRhsMatrixKernel::configure(), CpuAddKernel::configure(), NEBatchNormalizationLayerKernel::configure(), CLNormalizePlanarYUVLayerKernel::configure(), CpuGemmLowpQuantizeDownInt32ToInt8ScaleByFixedPointKernel::configure(), CpuGemmLowpQuantizeDownInt32ToUint8ScaleByFixedPointKernel::configure(), CpuGemmMatrixMultiplyKernel::configure(), CpuMulKernel::configure(), NEGatherKernel::configure(), CLRangeKernel::configure(), CLReductionOperationKernel::configure(), ClDirectConv2dKernel::configure(), NEROIAlignLayerKernel::configure(), CLPadLayerKernel::configure(), ClCol2ImKernel::configure(), NEStridedSliceKernel::configure(), CLFFTRadixStageKernel::configure(), CLPriorBoxLayerKernel::configure(), CLL2NormalizeLayerKernel::configure(), ClDirectConv3dKernel::configure(), ClGemmLowpMatrixMultiplyReshapedOnlyRhsKernel::configure(), ClMulKernel::configure(), CLBoundingBoxTransformKernel::configure(), CpuWeightsReshapeKernel::configure(), CLDepthwiseConvolutionLayerNativeKernel::configure(), CpuWinogradConv2d::configure(), CLStackLayerKernel::configure(), ClGemmMatrixMultiplyReshapedOnlyRhsKernel::configure(), ClWeightsReshapeKernel::configure(), CLArgMinMaxLayerKernel::configure(), ClIm2ColKernel::configure(), CpuIm2ColKernel::configure(), CLROIAlignLayerKernel::configure(), CLDeconvolutionReshapeOutputKernel::configure(), CLFuseBatchNormalizationKernel::configure(), CLBatchNormalizationLayerKernel::configure(), CpuGemmLowpOffsetContributionOutputStageKernel::configure(), ClGemmMatrixMultiplyReshapedKernel::configure(), ClWinogradConv2d::configure(), CpuFloorKernel::infer_window(), CpuConvertQuantizedSignednessKernel::validate(), CpuDequantizeKernel::validate(), CpuCopyKernel::validate(), CpuReshapeKernel::validate(), ClReshapeKernel::validate(), ClDequantizeKernel::validate(), ClFloorKernel::validate(), CpuConcatenateBatchKernel::validate(), CpuFloorKernel::validate(), CpuPermuteKernel::validate(), ClCopyKernel::validate(), ClElementWiseUnaryKernel::validate(), CpuConcatenateWidthKernel::validate(), CpuConcatenateHeightKernel::validate(), ClWidthConcatenate2TensorsKernel::validate(), CpuQuantizeKernel::validate(), ClScaleKernel::validate(), ClPool2dKernel::validate(), ClPool3dKernel::validate(), ClWidthConcatenateKernel::validate(), ClActivationKernel::validate(), ClHeightConcatenateKernel::validate(), CpuActivationKernel::validate(), ClQuantizeKernel::validate(), ClWidthConcatenate4TensorsKernel::validate(), CpuPool2dKernel::validate(), ClPermuteKernel::validate(), ClBatchConcatenateKernel::validate(), ClDepthConcatenateKernel::validate(), CpuConcatenateDepthKernel::validate(), CPPDetectionOutputLayer::validate(), CpuDirectConv2dOutputStageKernel::validate(), ClGemmLowpMatrixMultiplyNativeKernel::validate(), CpuDirectConv2dKernel::validate(), ClGemmReshapeLhsMatrixKernel::validate(), NETileKernel::validate(), CpuGemmLowpMatrixMultiplyKernel::validate(), CpuPool3dKernel::validate(), 
NEReverseKernel::validate(), NESpaceToDepthLayerKernel::validate(), NEChannelShuffleLayerKernel::validate(), NEDepthToSpaceLayerKernel::validate(), CpuMaxUnpoolingLayerKernel::validate(), CpuScaleKernel::validate(), ClWinogradFilterTransformKernel::validate(), ClWinogradInputTransformKernel::validate(), NEPriorBoxLayerKernel::validate(), CpuSubKernel::validate(), ClGemmLowpQuantizeDownInt32ScaleByFixedPointKernel::validate(), NEInstanceNormalizationLayerKernel::validate(), NEFFTScaleKernel::validate(), NEComputeAllAnchorsKernel::validate(), ClGemmMatrixMultiplyNativeKernel::validate(), ClGemmLowpQuantizeDownInt32ScaleByFloatKernel::validate(), CPPTopKVKernel::validate(), ClWinogradOutputTransformKernel::validate(), CLChannelShuffleLayerKernel::validate(), NEReorgLayerKernel::validate(), ClCastKernel::validate(), CPPPermuteKernel::validate(), ClGemmLowpQuantizeDownInt32ScaleKernel::validate(), CLInstanceNormalizationLayerKernel::validate(), ClGemmMatrixMultiplyReshapedOnlyRhsMMULKernel::validate(), CpuGemmLowpOffsetContributionKernel::validate(), CpuCastKernel::validate(), CLDepthToSpaceLayerKernel::validate(), CpuDepthwiseConv2dNativeKernel::validate(), CpuDirectConv3dKernel::validate(), NEFFTDigitReverseKernel::validate(), CLSelectKernel::validate(), CLReverseKernel::validate(), CLSpaceToDepthLayerKernel::validate(), CLStridedSliceKernel::validate(), CpuGemmLowpQuantizeDownInt32ScaleKernel::validate(), CpuCol2ImKernel::validate(), CLMaxUnpoolingLayerKernel::validate(), CLComputeAllAnchorsKernel::validate(), CpuGemmLowpQuantizeDownInt32ToInt16ScaleByFixedPointKernel::validate(), NEFFTRadixStageKernel::validate(), CLFFTScaleKernel::validate(), ClGemmLowpMatrixMultiplyReshapedKernel::validate(), NERangeKernel::validate(), NENormalizationLayerKernel::validate(), NEBatchToSpaceLayerKernel::validate(), ClGemmReshapeRhsMatrixKernel::validate(), NEMeanStdDevNormalizationKernel::validate(), CLQLSTMLayerNormalizationKernel::validate(), CpuAddKernel::validate(), CLNormalizationLayerKernel::validate(), NEPadLayerKernel::validate(), CpuMulKernel::validate(), CpuGemmLowpQuantizeDownInt32ToUint8ScaleByFixedPointKernel::validate(), CpuGemmLowpQuantizeDownInt32ToInt8ScaleByFixedPointKernel::validate(), CLComparisonKernel::validate(), CpuGemmMatrixMultiplyKernel::validate(), ClGemmLowpMatrixMultiplyReshapedOnlyRhsMMULKernel::validate(), CLGatherKernel::validate(), CLTileKernel::validate(), ClDirectConv2dKernel::validate(), ClGemmLowpOffsetContributionKernel::validate(), NEGatherKernel::validate(), CLFFTDigitReverseKernel::validate(), ClCol2ImKernel::validate(), ClGemmLowpOffsetContributionOutputStageKernel::validate(), CLMeanStdDevNormalizationKernel::validate(), NEReductionOperationKernel::validate(), ClDirectConv3dKernel::validate(), CPPNonMaximumSuppressionKernel::validate(), NEBoundingBoxTransformKernel::validate(), CLRangeKernel::validate(), CLReorgLayerKernel::validate(), CLFFTRadixStageKernel::validate(), NEStackLayerKernel::validate(), CpuWeightsReshapeKernel::validate(), ClGemmLowpMatrixMultiplyReshapedOnlyRhsKernel::validate(), CLNormalizePlanarYUVLayerKernel::validate(), NESpaceToBatchLayerKernel::validate(), ClMulKernel::validate(), CLReductionOperationKernel::validate(), CLPriorBoxLayerKernel::validate(), CPPDetectionPostProcessLayer::validate(), CLPadLayerKernel::validate(), NEROIPoolingLayerKernel::validate(), CpuWinogradConv2d::validate(), ClWeightsReshapeKernel::validate(), NEROIAlignLayerKernel::validate(), ClGemmMatrixMultiplyReshapedOnlyRhsKernel::validate(), 
NEBatchNormalizationLayerKernel::validate(), CLL2NormalizeLayerKernel::validate(), CLBoundingBoxTransformKernel::validate(), CpuIm2ColKernel::validate(), NEFuseBatchNormalizationKernel::validate(), NEStridedSliceKernel::validate(), ClIm2ColKernel::validate(), CLBatchToSpaceLayerKernel::validate(), CLStackLayerKernel::validate(), CLDepthwiseConvolutionLayerNativeKernel::validate(), CLArgMinMaxLayerKernel::validate(), CpuGemmLowpOffsetContributionOutputStageKernel::validate(), ClGemmMatrixMultiplyReshapedKernel::validate(), CLDeconvolutionReshapeOutputKernel::validate(), CLROIAlignLayerKernel::validate(), CLSpaceToBatchLayerKernel::validate(), CLBatchNormalizationLayerKernel::validate(), CLFuseBatchNormalizationKernel::validate(), and ClWinogradConv2d::validate().

{
    ARM_COMPUTE_RETURN_ERROR_ON_NULLPTR(src, weights, dst);
    ARM_COMPUTE_RETURN_ERROR_ON(src->data_layout() == DataLayout::UNKNOWN);
    ARM_COMPUTE_RETURN_ERROR_ON_CPU_F16_UNSUPPORTED(src);
    ARM_COMPUTE_RETURN_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN(src, 1, DataType::F16, DataType::F32);
    ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES(src, weights);

    const DataLayout data_layout = src->data_layout();
    const int        width_idx   = get_data_layout_dimension_index(data_layout, DataLayoutDimension::WIDTH);
    const int        height_idx  = get_data_layout_dimension_index(data_layout, DataLayoutDimension::HEIGHT);
    const int        channel_idx = get_data_layout_dimension_index(data_layout, DataLayoutDimension::CHANNEL);

    ARM_COMPUTE_RETURN_ERROR_ON(weights->dimension(channel_idx) != src->dimension(channel_idx));
    ARM_COMPUTE_RETURN_ERROR_ON(weights->dimension(width_idx) != weights->dimension(height_idx));
    ARM_COMPUTE_RETURN_ERROR_ON(weights->num_dimensions() > 4);
    ARM_COMPUTE_RETURN_ERROR_ON(data_layout == DataLayout::NHWC && src->data_type() != DataType::F32);
    ARM_COMPUTE_UNUSED(width_idx);
    // Checks performed when output is configured
    if(dst->total_size() != 0)
    {
        TensorShape output_shape = misc::shape_calculator::compute_deep_convolution_shape(src->tensor_shape(), src->data_layout(), weights->tensor_shape(), conv_info);

        DataType data_type = src->data_type();

        ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DIMENSIONS(dst->tensor_shape(), output_shape);
        ARM_COMPUTE_RETURN_ERROR_ON(dst->data_type() != data_type);
    }

    return Status{};
}
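A hedged sketch of these checks in action (shapes are illustrative): a square 3x3 FP32 kernel whose channel count matches the source passes, while an NHWC source with a non-F32 data type would be rejected by the layout/type check above; an empty dst skips the output checks.

    TensorInfo    src(TensorShape(32U, 32U, 3U), 1, DataType::F32);       // default NCHW layout: W, H, C
    TensorInfo    weights(TensorShape(3U, 3U, 3U, 8U), 1, DataType::F32); // square 3x3 kernel, 3 input channels
    TensorInfo    dst{};                                                  // total_size() == 0: output checks skipped
    PadStrideInfo conv_info(1, 1, 0, 0);

    const Status status = validate_arguments(&src, &weights, &dst, conv_info);
    // status.error_code() reports the first failed check; here it is ErrorCode::OK.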