Compute Library 22.08
arm_compute::cpu::kernels Namespace Reference

Data Structures

struct  ActivationDataTypeISASelectorData
 
struct  CastDataTypeISASelectorData
 
class  CpuActivationKernel
 Interface for the activation kernel. More...
 
class  CpuAddKernel
 Interface for the kernel to perform addition between two tensors. More...
 
struct  CpuAddKernelDataTypeISASelectorData
 
class  CpuArithmeticKernel
 
class  CpuCastKernel
 Casts a given tensor to a new type. More...
 
class  CpuCol2ImKernel
 Kernel to perform col2im reshaping. More...
 
class  CpuComparisonKernel
 
class  CpuComplexMulKernel
 Interface for the complex pixelwise multiplication kernel. More...
 
class  CpuConcatenateBatchKernel
 Interface for the batch concatenate kernel. More...
 
class  CpuConcatenateDepthKernel
 Interface for the depth concatenate kernel. More...
 
class  CpuConcatenateHeightKernel
 Interface for the height concatenate kernel. More...
 
class  CpuConcatenateWidthKernel
 Interface for the width concatenate kernel. More...
 
class  CpuConvertFullyConnectedWeightsKernel
 Interface to convert the 2D Fully Connected weights from NCHW to NHWC or vice versa. More...
 
class  CpuConvertQuantizedSignednessKernel
 Kernel to convert asymmetric unsigned to asymmetric signed quantized values and vice-versa. More...
 
class  CpuCopyKernel
 Kernel to perform a copy between two tensors. More...
 
class  CpuDepthwiseConv2dAssemblyWrapperKernel
 This class is a wrapper for the depthwise convolution assembly kernels. More...
 
class  CpuDepthwiseConv2dNativeKernel
 Interface for the kernel to run a depthwise convolution native on a tensor. More...
 
class  CpuDequantizeKernel
 Interface for the dequantization layer kernel. More...
 
class  CpuDirectConv2dKernel
 Interface for the kernel to perform Direct Convolution Layer. More...
 
class  CpuDirectConv2dOutputStageKernel
 Kernel to accumulate the biases, if provided, or downscale in case of quantized input. More...
 
class  CpuDirectConv3dKernel
 Interface for the kernel to perform 3D Direct Convolution Layer. More...
 
class  CpuDivisionKernel
 
class  CpuElementwiseKernel
 Interface for an element-wise operation kernel. More...
 
class  CpuElementwiseUnaryKernel
 Interface for an element-wise unary operation kernel. More...
 
class  CpuFillKernel
 Kernel for filling a tensor with a given constant value. More...
 
class  CpuFloorKernel
 Cpu accelerated kernel to perform a floor operation. More...
 
class  CpuGemmInterleave4x4Kernel
 Kernel to interleave the elements of a matrix. More...
 
class  CpuGemmLowpMatrixAReductionKernel
 Kernel used to compute the row-vectors of sums of all the entries in each row of Matrix A. More...
 
class  CpuGemmLowpMatrixBReductionKernel
 Kernel used to compute the row-vectors of sums of all the entries in each column of Matrix B. More...
 
class  CpuGemmLowpMatrixMultiplyKernel
 Kernel to multiply matrices. More...
 
class  CpuGemmLowpOffsetContributionKernel
 Kernel used to add the offset contribution after CpuGemmLowpMatrixMultiplyKernel. More...
 
class  CpuGemmLowpOffsetContributionOutputStageKernel
 Kernel used to add the offset contribution and perform the output stage after CpuGemmLowpMatrixMultiplyKernel. More...
 
class  CpuGemmLowpQuantizeDownInt32ScaleKernel
 Kernel used to quantize down the int32 accumulator values of GEMMLowp to QASYMM8/QASYMM8_SIGNED. More...
 
class  CpuGemmLowpQuantizeDownInt32ToInt16ScaleByFixedPointKernel
 Kernel used to quantize down the int32 accumulator values of GEMMLowp to QSYMM16. More...
 
class  CpuGemmLowpQuantizeDownInt32ToInt8ScaleByFixedPointKernel
 Kernel used to quantize down the int32 accumulator values of GEMMLowp to QASYMM8_SIGNED. More...
 
class  CpuGemmLowpQuantizeDownInt32ToUint8ScaleByFixedPointKernel
 Kernel used to quantize down the int32 accumulator values of GEMMLowp to QASYMM8. More...
 
class  CpuGemmMatrixAdditionKernel
 Kernel to perform the in-place matrix addition between 2 matrices taking into account that the second matrix might be weighted by a scalar value beta: More...
 
class  CpuGemmMatrixMultiplyKernel
 Kernel to multiply two input matrices "A" and "B". More...
 
class  CpuGemmTranspose1xWKernel
 Kernel which transposes the elements of a matrix in chunks of 1xW, where W is equal to (16 / element size of the tensor) More...
 
class  CpuIm2ColKernel
 Interface for the im2col reshape kernel. More...
 
class  CpuLogits1DMaxKernel
 Interface for identifying the max value of 1D logits. More...
 
class  CpuLogits1DSoftmaxKernel
 Interface for softmax computation for QASYMM8 with pre-computed max. More...
 
class  CpuMaxUnpoolingLayerKernel
 Interface for the max unpooling layer kernel. More...
 
class  CpuMulKernel
 Interface for the kernel to perform multiplication between two tensors. More...
 
class  CpuPermuteKernel
 Kernel to perform tensor permutation given a permutation vector. More...
 
class  CpuPool2dAssemblyWrapperKernel
 This class is a wrapper for the 2D pooling assembly kernels. More...
 
class  CpuPool2dKernel
 Interface for the pooling layer kernel. More...
 
class  CpuPool3dKernel
 Interface for the kernel to perform Pooling 3D. More...
 
class  CpuPowerKernel
 
class  CpuQuantizeKernel
 Interface for the quantization layer kernel. More...
 
class  CpuReshapeKernel
 Interface for the kernel to perform tensor reshaping. More...
 
class  CpuScaleKernel
 Arm(R) Neon(TM) kernel to perform scaling on a tensor. More...
 
class  CpuSubKernel
 Interface for the kernel to perform subtraction between two tensors. More...
 
class  CpuTransposeKernel
 Kernel which transposes the elements of a matrix. More...
 
class  CpuWeightsReshapeKernel
 Kernel to perform reshaping on the weights used by convolution and locally connected layer. More...
 
struct  DataTypeDataLayoutISASelectorData
 
struct  DataTypeISASelectorData
 
struct  DepthwiseConv2dNativeDataTypeISASelectorData
 
struct  ElementwiseDataTypeISASelectorData
 
struct  PoolDataTypeISASelectorData
 

Typedefs

using DataTypeISASelectorPtr = std::add_pointer< bool(const DataTypeISASelectorData &data)>::type
 
using DataTypeDataLayoutSelectorPtr = std::add_pointer< bool(const DataTypeDataLayoutISASelectorData &data)>::type
 
using PoolDataTypeISASelectorPtr = std::add_pointer< bool(const PoolDataTypeISASelectorData &data)>::type
 
using ElementwiseDataTypeISASelectorPtr = std::add_pointer< bool(const ElementwiseDataTypeISASelectorData &data)>::type
 
using DepthwiseConv2dNativeDataTypeISASelectorPtr = std::add_pointer< bool(const DepthwiseConv2dNativeDataTypeISASelectorData &data)>::type
 
using CastDataTypeISASelectorDataPtr = std::add_pointer< bool(const CastDataTypeISASelectorData &data)>::type
 
using ActivationDataTypeISASelectorDataPtr = std::add_pointer< bool(const ActivationDataTypeISASelectorData &data)>::type
 
using CpuAddKernelDataTypeISASelectorDataPtr = std::add_pointer< bool(const CpuAddKernelDataTypeISASelectorData &data)>::type
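
A minimal usage sketch for these selector typedefs, assuming the selector structs expose the dt (DataType) and isa (CpuIsaInfo) members declared in CpuKernelSelectionTypes.h; the predicate below is hypothetical:

    // Hypothetical predicate: select an FP32 micro-kernel that requires SVE.
    // Assumes DataTypeISASelectorData has `dt` and `isa` members.
    bool is_fp32_sve(const DataTypeISASelectorData &data)
    {
        return data.dt == DataType::F32 && data.isa.sve;
    }

    // The typedef stores the predicate as a plain function pointer in a kernel's micro-kernel table.
    DataTypeISASelectorPtr selector = &is_fp32_sve;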
 

Functions

bool can_interpret_inputs_as_1d_array (const ITensorInfo &src0, const ITensorInfo &src1)
 
Status validate_arguments (const ITensorInfo *src, const ITensorInfo *weights, const ITensorInfo *dst, const PadStrideInfo &conv_info)
 
std::pair< Status, Window > validate_and_configure_window (ITensorInfo *src, ITensorInfo *dst)
 
void neon_fp32_nhwc_directconv2d (const Window &window, const ITensor *src, const ITensor *weights, ITensor *dst, const PadStrideInfo &conv_info)
 
void neon_fp16_nchw_directconv2d (const Window &window, const ITensor *src, const ITensor *weights, ITensor *dst, const PadStrideInfo &conv_info)
 
void neon_fp32_nchw_directconv2d (const Window &window, const ITensor *src, const ITensor *weights, ITensor *dst, const PadStrideInfo &conv_info)
 
template<typename T >
void convolve_nchw (const Window &window, const ITensor *src, const ITensor *weights, ITensor *dst, const PadStrideInfo &conv_info)
 
template void convolve_nchw< float > (const Window &window, const ITensor *src, const ITensor *weights, ITensor *dst, const PadStrideInfo &conv_info)
 
template<typename T >
void convolve_nhwc (const Window &window, const ITensor *src, const ITensor *weights, ITensor *dst, const PadStrideInfo &conv_info)
 
template void convolve_nhwc< float > (const Window &window, const ITensor *src, const ITensor *weights, ITensor *dst, const PadStrideInfo &conv_info)
 

Typedef Documentation

◆ ActivationDataTypeISASelectorDataPtr

using ActivationDataTypeISASelectorDataPtr = std::add_pointer<bool(const ActivationDataTypeISASelectorData &data)>::type

Definition at line 100 of file CpuKernelSelectionTypes.h.

◆ CastDataTypeISASelectorDataPtr

using CastDataTypeISASelectorDataPtr = std::add_pointer<bool(const CastDataTypeISASelectorData &data)>::type

Definition at line 99 of file CpuKernelSelectionTypes.h.

◆ CpuAddKernelDataTypeISASelectorDataPtr

using CpuAddKernelDataTypeISASelectorDataPtr = std::add_pointer<bool(const CpuAddKernelDataTypeISASelectorData &data)>::type

Definition at line 101 of file CpuKernelSelectionTypes.h.

◆ DataTypeDataLayoutSelectorPtr

using DataTypeDataLayoutSelectorPtr = std::add_pointer<bool(const DataTypeDataLayoutISASelectorData &data)>::type

Definition at line 95 of file CpuKernelSelectionTypes.h.

◆ DataTypeISASelectorPtr

using DataTypeISASelectorPtr = std::add_pointer<bool(const DataTypeISASelectorData &data)>::type

Definition at line 94 of file CpuKernelSelectionTypes.h.

◆ DepthwiseConv2dNativeDataTypeISASelectorPtr

using DepthwiseConv2dNativeDataTypeISASelectorPtr = std::add_pointer<bool(const DepthwiseConv2dNativeDataTypeISASelectorData &data)>::type

Definition at line 98 of file CpuKernelSelectionTypes.h.

◆ ElementwiseDataTypeISASelectorPtr

using ElementwiseDataTypeISASelectorPtr = std::add_pointer<bool(const ElementwiseDataTypeISASelectorData &data)>::type

Definition at line 97 of file CpuKernelSelectionTypes.h.

◆ PoolDataTypeISASelectorPtr

using PoolDataTypeISASelectorPtr = std::add_pointer<bool(const PoolDataTypeISASelectorData &data)>::type

Definition at line 96 of file CpuKernelSelectionTypes.h.

Function Documentation

◆ can_interpret_inputs_as_1d_array()

bool arm_compute::cpu::kernels::can_interpret_inputs_as_1d_array(const ITensorInfo &src0, const ITensorInfo &src1)

Definition at line 42 of file CpuAddKernel.cpp.

References arm_compute::cpu::add_fp16_neon(), arm_compute::cpu::add_fp16_neon_as_1d_array(), arm_compute::cpu::add_fp16_sve(), arm_compute::cpu::add_fp32_neon(), arm_compute::cpu::add_fp32_neon_as_1d_array(), arm_compute::cpu::add_fp32_sve(), arm_compute::cpu::add_qasymm8_neon(), arm_compute::cpu::add_qasymm8_signed_neon(), arm_compute::cpu::add_qasymm8_signed_sve2(), arm_compute::cpu::add_qasymm8_sve2(), arm_compute::cpu::add_qsymm16_neon(), arm_compute::cpu::add_qsymm16_sve2(), arm_compute::cpu::add_s16_neon(), arm_compute::cpu::add_s16_neon_as_1d_array(), arm_compute::cpu::add_s16_sve(), arm_compute::cpu::add_s32_neon(), arm_compute::cpu::add_s32_neon_as_1d_array(), arm_compute::cpu::add_s32_sve(), arm_compute::cpu::add_u8_neon(), arm_compute::cpu::add_u8_neon_as_1d_array(), arm_compute::cpu::add_u8_sve(), ARM_COMPUTE_RETURN_ERROR_ON, ARM_COMPUTE_RETURN_ERROR_ON_CPU_F16_UNSUPPORTED, ARM_COMPUTE_RETURN_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN, ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES, ARM_COMPUTE_RETURN_ERROR_ON_MSG, ARM_COMPUTE_UNUSED, TensorShape::broadcast_shape(), arm_compute::calculate_max_window(), ITensorInfo::data_type(), arm_compute::test::validation::dst, arm_compute::F16, arm_compute::F32, CPUInfo::get(), CPUInfo::get_isa(), ITensorInfo::has_padding(), arm_compute::detail::have_different_dimensions(), arm_compute::QASYMM8, arm_compute::QASYMM8_SIGNED, arm_compute::QSYMM16, REGISTER_FP16_NEON, REGISTER_FP16_SVE, REGISTER_FP32_NEON, REGISTER_FP32_SVE, REGISTER_INTEGER_NEON, REGISTER_INTEGER_SVE, REGISTER_QASYMM8_NEON, REGISTER_QASYMM8_SIGNED_NEON, REGISTER_QASYMM8_SIGNED_SVE2, REGISTER_QASYMM8_SVE2, REGISTER_QSYMM16_NEON, REGISTER_QSYMM16_SVE2, arm_compute::S16, arm_compute::S32, Window::set(), arm_compute::set_data_type_if_unknown(), arm_compute::set_shape_if_empty(), ITensorInfo::tensor_shape(), TensorShape::total_size(), ITensorInfo::total_size(), arm_compute::U8, validate_and_configure_window(), validate_arguments(), and Dimensions< T >::x().

Referenced by CpuAddKernel::configure(), and arm_compute::test::validation::DATA_TEST_CASE().

{
    return !src0.has_padding() && !src1.has_padding() && src0.tensor_shape() == src1.tensor_shape();
}
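
As the body shows, the fast path applies only when neither input carries padding and the two shapes match exactly, so both tensors can be walked as flat 1D arrays. A hedged usage sketch (shapes are illustrative):

    TensorInfo a(TensorShape(16U, 8U), 1, DataType::F32);
    TensorInfo b(TensorShape(16U, 8U), 1, DataType::F32);
    const bool flat = can_interpret_inputs_as_1d_array(a, b); // true: no padding, identical shapes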

◆ convolve_nchw()

template <typename T>
void convolve_nchw(const Window &window, const ITensor *src, const ITensor *weights, ITensor *dst, const PadStrideInfo &conv_info)

Definition at line 62 of file all.cpp.

References ARM_COMPUTE_UNUSED, ITensor::buffer(), arm_compute::calculate_max_window(), arm_compute::test::validation::conv_info, conv_pad_left, conv_pad_top, convolve_nchw< float >(), ITensorInfo::dimension(), Window::DimX, Window::DimY, Window::DimZ, arm_compute::test::validation::dst, ITensorInfo::element_size(), arm_compute::execute_window_loop(), ITensor::info(), ITensorInfo::offset_first_element_in_bytes(), PadStrideInfo::pad_left(), PadStrideInfo::pad_top(), Iterator::ptr(), Window::set(), arm_compute::test::validation::src, PadStrideInfo::stride(), ITensorInfo::strides_in_bytes(), type, arm_compute::wrapper::vdup_n(), arm_compute::wrapper::vloadq(), arm_compute::wrapper::vmla(), arm_compute::vreduce(), Dimensions< T >::x(), Dimensions< T >::y(), and Dimensions< T >::z().

{
    ARM_COMPUTE_UNUSED(conv_info);

    // Declare useful types
    using vtype       = wrapper::traits::neon_bitvector<T, wrapper::traits::BitWidth::W128>;
    using vector_type = typename vtype::type;
    using tag_type    = typename vtype::tag_type;

    // Scalar quantities
    const int element_size   = src->info()->element_size();
    const int input_stride_w = src->info()->strides_in_bytes()[0] / element_size;
    const int input_stride_h = src->info()->strides_in_bytes()[1] / element_size;
    const int input_stride_c = src->info()->strides_in_bytes()[2] / element_size;
    const int input_stride_n = src->info()->strides_in_bytes()[3] / element_size;

    const int input_dim_w = src->info()->dimension(0);
    const int input_dim_h = src->info()->dimension(1);

    const int output_stride_c = dst->info()->strides_in_bytes()[2];

    const unsigned int kernel_stride_w = weights->info()->strides_in_bytes().x() / element_size;
    const unsigned int kernel_stride_h = weights->info()->strides_in_bytes().y() / element_size;
    const unsigned int kernel_stride_c = weights->info()->strides_in_bytes().z() / element_size;

    const int kernel_dim_w = weights->info()->dimension(0);
    const int kernel_dim_h = weights->info()->dimension(1);

    const int conv_pad_top  = conv_info.pad_top();
    const int conv_pad_left = conv_info.pad_left();
    const int conv_stride_w = std::get<0>(conv_info.stride());
    const int conv_stride_h = std::get<1>(conv_info.stride());

    // Set up the window for the output iterator
    Window window_out = window;
    window_out.set(Window::DimZ, Window::Dimension(0, 1, 1));

    // Set up the window for the weights iterator
    Window window_w = calculate_max_window(*weights->info(), Steps());
    window_w.set(Window::DimX, Window::Dimension(0, 1, 1));
    window_w.set(Window::DimY, Window::Dimension(0, 1, 1));
    window_w.set(Window::DimZ, Window::Dimension(0, 1, 1));

    Iterator out(dst, window_out);
    Iterator wei(weights, window_w);

    constexpr int num_elems_read_per_iteration = 16 / sizeof(T);

    execute_window_loop(window_out, [&](const Coordinates & id)
    {
        // Compute the theoretical input starting points
        const int in_w_start_t = static_cast<int>(id.x()) * conv_stride_w - conv_pad_left;
        const int in_h_start_t = static_cast<int>(id.y()) * conv_stride_h - conv_pad_top;
        const int in_w_end_t   = in_w_start_t + kernel_dim_w;
        const int in_h_end_t   = in_h_start_t + kernel_dim_h;

        // Compute the valid initial and ending input points by clamping to the borders
        const int in_w_start = std::max(in_w_start_t, 0);
        const int in_h_start = std::max(in_h_start_t, 0);
        const int in_w_end   = std::min(in_w_end_t, input_dim_w);
        const int in_h_end   = std::min(in_h_end_t, input_dim_h);

        // Use the input points to select the valid weight points
        const int wei_w_start = in_w_start - in_w_start_t;
        const int wei_h_start = in_h_start - in_h_start_t;
        const int wei_h_end   = kernel_dim_h - (in_h_end_t - in_h_end);

        const int      index_c_end  = weights->info()->dimension(2);
        const T *const in_ptr_start = reinterpret_cast<const T *>(src->buffer() + src->info()->offset_first_element_in_bytes()) + id[3] * input_stride_n;
        execute_window_loop(window_w, [&](const Coordinates & id_w)
        {
            const T *const weights_ptr_start = reinterpret_cast<const T *>(wei.ptr());
            uint8_t       *out_ptr           = out.ptr() + id_w[3] * output_stride_c;
            T              out_temp          = static_cast<T>(0);

            for(int index_wei_c = 0, index_in_c = 0; index_wei_c < index_c_end; ++index_wei_c, ++index_in_c)
            {
                const T *const in_ptr_row_0      = in_ptr_start + index_in_c * input_stride_c;
                const T *const weights_ptr_row_0 = weights_ptr_start + index_wei_c * kernel_stride_c;
                for(int index_wei_h = wei_h_start, index_in_h = in_h_start; index_wei_h < wei_h_end; ++index_wei_h, ++index_in_h)
                {
                    const T    *in_ptr_row      = in_ptr_row_0 + index_in_h * input_stride_h;
                    const T    *weights_ptr_row = weights_ptr_row_0 + index_wei_h * kernel_stride_h;
                    int         index_w         = in_w_start;
                    int         index_wei_w     = wei_w_start;
                    vector_type out_temp_vec    = wrapper::vdup_n(static_cast<T>(0), tag_type());
                    for(; index_w <= ((in_w_end - num_elems_read_per_iteration)); index_w += num_elems_read_per_iteration, index_wei_w += num_elems_read_per_iteration)
                    {
                        const auto src_vec = wrapper::vloadq(in_ptr_row + index_w * input_stride_w);
                        const auto w_vec   = wrapper::vloadq(weights_ptr_row + index_wei_w * kernel_stride_w);
                        out_temp_vec       = wrapper::vmla(out_temp_vec, w_vec, src_vec);
                    }
                    out_temp += vreduce(out_temp_vec);
                    for(; index_w < in_w_end; ++index_w, ++index_wei_w)
                    {
                        const auto src_val = *(in_ptr_row + index_w * input_stride_w);
                        const auto w_val   = *(weights_ptr_row + index_wei_w * kernel_stride_w);
                        out_temp += src_val * w_val;
                    }
                }
            }
            *(reinterpret_cast<T *>(out_ptr)) = out_temp;
        },
        wei);
    },
    out);
}
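To make the border arithmetic in the loop above concrete, here is a small self-contained sketch; all values are illustrative rather than taken from the kernel:

    #include <algorithm>
    #include <cstdio>

    int main()
    {
        // Illustrative configuration: pad_left = 1, stride_w = 1, 3-wide kernel, 5-wide input.
        const int conv_pad_left = 1, conv_stride_w = 1, kernel_dim_w = 3, input_dim_w = 5;
        for(int out_x = 0; out_x < 5; ++out_x)
        {
            const int in_w_start_t = out_x * conv_stride_w - conv_pad_left; // may fall in the left halo
            const int in_w_end_t   = in_w_start_t + kernel_dim_w;
            const int in_w_start   = std::max(in_w_start_t, 0);             // clamp to the real input
            const int in_w_end     = std::min(in_w_end_t, input_dim_w);
            const int wei_w_start  = in_w_start - in_w_start_t;             // weight columns skipped over the halo
            std::printf("out_x=%d reads input [%d,%d) starting at weight column %d\n", out_x, in_w_start, in_w_end, wei_w_start);
        }
        return 0;
    }

For out_x = 0 the theoretical start is -1, so the kernel clamps to column 0 and begins at weight column 1, reproducing zero-padding semantics without ever reading out of bounds.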

◆ convolve_nchw< float >()

template void arm_compute::cpu::kernels::convolve_nchw<float>(const Window &window, const ITensor *src, const ITensor *weights, ITensor *dst, const PadStrideInfo &conv_info)

◆ convolve_nhwc()

template <typename T>
void convolve_nhwc(const Window &window, const ITensor *src, const ITensor *weights, ITensor *dst, const PadStrideInfo &conv_info)

Definition at line 57 of file impl.cpp.

References ITensor::buffer(), arm_compute::calculate_max_window(), arm_compute::test::validation::conv_info, conv_pad_left, conv_pad_top, convolve_nhwc< float >(), ITensorInfo::dimension(), Window::DimX, Window::DimY, Window::DimZ, arm_compute::test::validation::dst, ITensorInfo::element_size(), arm_compute::execute_window_loop(), ITensor::info(), ITensorInfo::offset_first_element_in_bytes(), PadStrideInfo::pad_left(), PadStrideInfo::pad_top(), Iterator::ptr(), Window::set(), arm_compute::test::validation::src, PadStrideInfo::stride(), ITensorInfo::strides_in_bytes(), type, arm_compute::wrapper::vdup_n(), arm_compute::wrapper::vloadq(), arm_compute::wrapper::vmla(), arm_compute::vreduce(), Dimensions< T >::x(), Dimensions< T >::y(), and Dimensions< T >::z().

{
    // Declare useful types
    using vtype       = wrapper::traits::neon_bitvector<T, wrapper::traits::BitWidth::W128>;
    using vector_type = typename vtype::type;
    using tag_type    = typename vtype::tag_type;

    // Scalar quantities
    const int element_size   = src->info()->element_size();
    const int input_stride_w = src->info()->strides_in_bytes().y() / element_size;
    const int input_stride_h = src->info()->strides_in_bytes().z() / element_size;
    const int input_stride_n = src->info()->strides_in_bytes()[3] / element_size;
    const int input_dim_w    = src->info()->dimension(1);
    const int input_dim_h    = src->info()->dimension(2);

    const int output_stride_c = dst->info()->strides_in_bytes().x();

    const unsigned int kernel_stride_w = weights->info()->strides_in_bytes().y() / element_size;
    const unsigned int kernel_stride_h = weights->info()->strides_in_bytes().z() / element_size;
    const int          kernel_dim_w    = weights->info()->dimension(1);
    const int          kernel_dim_h    = weights->info()->dimension(2);

    const int conv_pad_top  = conv_info.pad_top();
    const int conv_pad_left = conv_info.pad_left();
    const int conv_stride_w = std::get<0>(conv_info.stride());
    const int conv_stride_h = std::get<1>(conv_info.stride());

    // Set up the window for the output iterator
    Window window_out = window;
    window_out.set(Window::DimX, Window::Dimension(0, 1, 1));

    // Set up the window for the weights iterator
    Window window_w = calculate_max_window(*weights->info(), Steps());
    window_w.set(Window::DimX, Window::Dimension(0, 1, 1));
    window_w.set(Window::DimY, Window::Dimension(0, 1, 1));
    window_w.set(Window::DimZ, Window::Dimension(0, 1, 1));

    Iterator out(dst, window_out);
    Iterator wei(weights, window_w);

    constexpr int num_elems_read_per_iteration = 16 / sizeof(T);

    // nhwc optimized
    if(have_zero_x_internal_padding(src->info(), weights->info()))
    {
        // This path assumes that input and weights have no padding in the channel dimension

        /*
         * This implementation parallelizes the full WC plane of the input and
         * weights by treating them as a series of elements. For example, with
         * 3x3 weights and floating-point vector operations of 4 elements at a
         * time, the first 3 channel elements of the first row would be taken
         * together with the first element of the second row. The 9 elements in
         * each single WC weight plane would then require two 4-element vector
         * operations and one final single-element operation.
         *
         * This works because, when the input vector to multiply with the
         * weights is created, the exact required elements are loaded in the
         * same order. Therefore the multiplication works on the correct
         * input/weight elements.
         */
        execute_window_loop(
            window_out, [&](const Coordinates & id)
        {
            /*
             * Here we create theoretical indexes which we then validate for
             * both inputs and weights.
             * As a reminder, this loop takes each output point in NHW; C is
             * handled in the weights loop.
             */
            // Compute the theoretical input starting points
            const int in_w_start_t = static_cast<int>(id.y()) * conv_stride_w - conv_pad_left;
            const int in_h_start_t = static_cast<int>(id.z()) * conv_stride_h - conv_pad_top;
            const int in_w_end_t   = in_w_start_t + kernel_dim_w;
            const int in_h_end_t   = in_h_start_t + kernel_dim_h;

            // Compute the valid initial and ending input points by clamping to the borders
            const int in_w_start = std::max(in_w_start_t, 0);
            const int in_h_start = std::max(in_h_start_t, 0);
            const int in_w_end   = std::min(in_w_end_t, input_dim_w);
            const int in_h_end   = std::min(in_h_end_t, input_dim_h);

            // Use the input points to select the valid weight points
            const int index_wc_start = (in_w_start - in_w_start_t) * kernel_stride_w;
            const int index_h_start  = in_h_start - in_h_start_t;
            const int index_wc_end   = (kernel_dim_w - (in_w_end_t - in_w_end)) * kernel_stride_w;
            const int index_h_end    = kernel_dim_h - (in_h_end_t - in_h_end);

            execute_window_loop(
                window_w, [&](const Coordinates & id_w)
            {
                /*
                 * This is the loop over the weights, and it goes along N (the
                 * batches). As a reminder, the batches of the weights are
                 * translated into the channels of the output.
                 */
                const T *in_ptr_row = reinterpret_cast<const T *>(src->buffer() + src->info()->offset_first_element_in_bytes())
                                      + id[3] * input_stride_n + in_w_start * input_stride_w + in_h_start * input_stride_h;
                const T *weights_ptr_row = reinterpret_cast<const T *>(wei.ptr()) + index_h_start * kernel_stride_h;
                uint8_t *out_ptr         = out.ptr() + id_w[3] * output_stride_c;

                T out_temp = static_cast<T>(0);
                for(int index_h = index_h_start; index_h < index_h_end; ++index_h, in_ptr_row += input_stride_h, weights_ptr_row += kernel_stride_h)
                {
                    const T    *in_ptr_mover = in_ptr_row;
                    int         index_wc     = index_wc_start;
                    vector_type out_temp_vec = wrapper::vdup_n(static_cast<T>(0), tag_type());
                    for(; index_wc <= index_wc_end - num_elems_read_per_iteration; index_wc += num_elems_read_per_iteration, in_ptr_mover += num_elems_read_per_iteration)
                    {
                        const auto src_vec = wrapper::vloadq(in_ptr_mover);
                        const auto w_vec   = wrapper::vloadq(weights_ptr_row + index_wc);
                        out_temp_vec       = wrapper::vmla(out_temp_vec, w_vec, src_vec);
                    }
                    out_temp += vreduce(out_temp_vec);
                    for(; index_wc < index_wc_end; ++index_wc, ++in_ptr_mover)
                    {
                        const auto src_val = *(in_ptr_mover);
                        const auto w_val   = *(weights_ptr_row + index_wc);
                        out_temp += src_val * w_val;
                    }
                }
                *(reinterpret_cast<T *>(out_ptr)) = out_temp;
            },
            wei);
        },
        out);
    }
    else // nhwc non optimized
    {
        execute_window_loop(
            window_out, [&](const Coordinates & id)
        {
            // Compute the theoretical input starting points
            const int in_w_start_t = static_cast<int>(id.y()) * conv_stride_w - conv_pad_left;
            const int in_h_start_t = static_cast<int>(id.z()) * conv_stride_h - conv_pad_top;
            const int in_w_end_t   = in_w_start_t + kernel_dim_w;
            const int in_h_end_t   = in_h_start_t + kernel_dim_h;

            // Compute the valid initial and ending input points by clamping to the borders
            const int in_w_start = std::max(in_w_start_t, 0);
            const int in_h_start = std::max(in_h_start_t, 0);
            const int in_w_end   = std::min(in_w_end_t, input_dim_w);
            const int in_h_end   = std::min(in_h_end_t, input_dim_h);

            // Use the input points to select the valid weight points
            const int wei_w_start = in_w_start - in_w_start_t;
            const int wei_h_start = in_h_start - in_h_start_t;
            const int wei_w_end   = kernel_dim_w - (in_w_end_t - in_w_end);
            const int wei_h_end   = kernel_dim_h - (in_h_end_t - in_h_end);

            const int      index_c_end  = weights->info()->dimension(0);
            const T *const in_ptr_start = reinterpret_cast<const T *>(src->buffer() + src->info()->offset_first_element_in_bytes()) + id[3] * input_stride_n;

            execute_window_loop(
                window_w, [&](const Coordinates & id_w)
            {
                const T *const weights_ptr_start = reinterpret_cast<const T *>(wei.ptr());
                uint8_t       *out_ptr           = out.ptr() + id_w[3] * output_stride_c;

                T out_temp = static_cast<T>(0);
                for(int index_wei_h = wei_h_start, index_in_h = in_h_start; index_wei_h < wei_h_end; ++index_wei_h, ++index_in_h)
                {
                    const T *const in_ptr_row      = in_ptr_start + index_in_h * input_stride_h;
                    const T *const weights_ptr_row = weights_ptr_start + index_wei_h * kernel_stride_h;
                    for(int index_wei_w = wei_w_start, index_in_w = in_w_start; index_wei_w < wei_w_end; ++index_wei_w, ++index_in_w)
                    {
                        const T    *in_ptr_mover      = in_ptr_row + index_in_w * input_stride_w;
                        const T    *weights_ptr_mover = weights_ptr_row + index_wei_w * kernel_stride_w;
                        int         index_c           = 0;
                        vector_type out_temp_vec      = wrapper::vdup_n(static_cast<T>(0), tag_type());
                        for(; index_c <= index_c_end - num_elems_read_per_iteration; index_c += num_elems_read_per_iteration, in_ptr_mover += num_elems_read_per_iteration, weights_ptr_mover += num_elems_read_per_iteration)
                        {
                            const auto src_vec = wrapper::vloadq(in_ptr_mover);
                            const auto w_vec   = wrapper::vloadq(weights_ptr_mover);
                            out_temp_vec       = wrapper::vmla(out_temp_vec, w_vec, src_vec);
                        }
                        out_temp += vreduce(out_temp_vec);
                        for(; index_c < index_c_end; ++index_c, ++in_ptr_mover, ++weights_ptr_mover)
                        {
                            const auto src_val = *(in_ptr_mover);
                            const auto w_val   = *(weights_ptr_mover);
                            out_temp += src_val * w_val;
                        }
                    }
                }
                *(reinterpret_cast<T *>(out_ptr)) = out_temp;
            },
            wei);
        },
        out);
    }
}
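The optimized branch depends on W and C being contiguous in memory so that an entire W x C plane can be consumed as one flat run. A minimal sketch of that idea with plain arrays (function name and sizes are illustrative):

    #include <cstddef>

    // With no internal padding along x, a kernel_w x channels NHWC weight plane is one
    // contiguous run of kernel_w * channels elements, so a SIMD loop can stride straight
    // through it, crossing w boundaries mid-vector exactly as the comment above describes.
    float dot_wc_plane(const float *in, const float *wei, std::size_t kernel_w, std::size_t channels)
    {
        const std::size_t flat_len = kernel_w * channels; // e.g. 3 * 3 = 9 -> two 4-wide ops plus one scalar tail
        float             acc      = 0.f;
        for(std::size_t i = 0; i < flat_len; ++i)         // the real kernel vectorizes this with vloadq/vmla
        {
            acc += in[i] * wei[i];
        }
        return acc;
    }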

◆ convolve_nhwc< float >()

template void arm_compute::cpu::kernels::convolve_nhwc<float>(const Window &window, const ITensor *src, const ITensor *weights, ITensor *dst, const PadStrideInfo &conv_info)

◆ neon_fp16_nchw_directconv2d()

void arm_compute::cpu::kernels::neon_fp16_nchw_directconv2d(const Window &window, const ITensor *src, const ITensor *weights, ITensor *dst, const PadStrideInfo &conv_info)

◆ neon_fp32_nchw_directconv2d()

void neon_fp32_nchw_directconv2d(const Window &window, const ITensor *src, const ITensor *weights, ITensor *dst, const PadStrideInfo &conv_info)

Definition at line 56 of file all.cpp.

References arm_compute::test::validation::conv_info, convolve_nchw< float >(), arm_compute::test::validation::dst, and arm_compute::test::validation::src.

{
    convolve_nchw<float>(window, src, weights, dst, conv_info);
}

◆ neon_fp32_nhwc_directconv2d()

void neon_fp32_nhwc_directconv2d(const Window &window, const ITensor *src, const ITensor *weights, ITensor *dst, const PadStrideInfo &conv_info)

Definition at line 33 of file fp32.cpp.

References arm_compute::test::validation::conv_info, convolve_nhwc< float >(), arm_compute::test::validation::dst, and arm_compute::test::validation::src.

{
    convolve_nhwc<float>(window, src, weights, dst, conv_info);
}

◆ validate_and_configure_window()

std::pair<Status, Window> arm_compute::cpu::kernels::validate_and_configure_window(ITensorInfo *src, ITensorInfo *dst)

Definition at line 92 of file CpuDirectConv2dKernel.cpp.

References ARM_COMPUTE_CREATE_ERROR, ARM_COMPUTE_ERROR_ON, ARM_COMPUTE_UNUSED, arm_compute::calculate_max_window(), ITensorInfo::data_layout(), arm_compute::RUNTIME_ERROR, and arm_compute::UNKNOWN.

Referenced by can_interpret_inputs_as_1d_array(), CpuConvertQuantizedSignednessKernel::configure(), CpuCopyKernel::configure(), CpuActivationKernel::configure(), CpuPool2dKernel::configure(), ClGemmLowpMatrixMultiplyNativeKernel::configure(), CpuDirectConv2dKernel::configure(), NEInstanceNormalizationLayerKernel::configure(), ClGemmMatrixMultiplyNativeKernel::configure(), ClWinogradFilterTransformKernel::configure(), ClWinogradInputTransformKernel::configure(), NEFFTScaleKernel::configure(), NEFFTDigitReverseKernel::configure(), ClGemmMatrixMultiplyReshapedOnlyRhsMMULKernel::configure(), CLChannelShuffleLayerKernel::configure(), ClWinogradOutputTransformKernel::configure(), CLQLSTMLayerNormalizationKernel::configure(), CLGatherKernel::configure(), NEFFTRadixStageKernel::configure(), CLNormalizationLayerKernel::configure(), NEMeanStdDevNormalizationKernel::configure(), CLComparisonKernel::configure(), ClGemmLowpMatrixMultiplyReshapedKernel::configure(), NEStackLayerKernel::configure(), CLFFTDigitReverseKernel::configure(), ClGemmReshapeRhsMatrixKernel::configure(), CpuAddKernel::configure(), NEStridedSliceKernel::configure(), ClCol2ImKernel::configure(), CLFFTRadixStageKernel::configure(), CLPriorBoxLayerKernel::configure(), ClGemmLowpMatrixMultiplyReshapedOnlyRhsKernel::configure(), CLStackLayerKernel::configure(), ClGemmMatrixMultiplyReshapedOnlyRhsKernel::configure(), ClIm2ColKernel::configure(), CLDeconvolutionReshapeOutputKernel::configure(), ClGemmMatrixMultiplyReshapedKernel::configure(), CpuCopyKernel::validate(), CpuActivationKernel::validate(), CpuPool2dKernel::validate(), ClGemmLowpMatrixMultiplyNativeKernel::validate(), CpuDirectConv2dKernel::validate(), ClWinogradInputTransformKernel::validate(), ClWinogradFilterTransformKernel::validate(), ClGemmMatrixMultiplyNativeKernel::validate(), NEFFTScaleKernel::validate(), NEInstanceNormalizationLayerKernel::validate(), ClGemmMatrixMultiplyReshapedOnlyRhsMMULKernel::validate(), ClWinogradOutputTransformKernel::validate(), CLChannelShuffleLayerKernel::validate(), NEFFTDigitReverseKernel::validate(), NEFFTRadixStageKernel::validate(), ClGemmLowpMatrixMultiplyReshapedKernel::validate(), CLQLSTMLayerNormalizationKernel::validate(), ClGemmReshapeRhsMatrixKernel::validate(), NEMeanStdDevNormalizationKernel::validate(), ClGemmLowpMatrixMultiplyReshapedOnlyRhsMMULKernel::validate(), CLComparisonKernel::validate(), CpuAddKernel::validate(), CLNormalizationLayerKernel::validate(), CLGatherKernel::validate(), CLFFTDigitReverseKernel::validate(), ClCol2ImKernel::validate(), NEStackLayerKernel::validate(), CLFFTRadixStageKernel::validate(), ClGemmLowpMatrixMultiplyReshapedOnlyRhsKernel::validate(), CLPriorBoxLayerKernel::validate(), ClIm2ColKernel::validate(), NEStridedSliceKernel::validate(), and CLStackLayerKernel::validate().

{
    ARM_COMPUTE_ERROR_ON(src->data_layout() == DataLayout::UNKNOWN);
    ARM_COMPUTE_UNUSED(src);

    Window win{};
    bool   window_changed = false;

    // Configure window without any padding
    win = calculate_max_window(*dst, Steps());

    Status err = (window_changed) ? ARM_COMPUTE_CREATE_ERROR(ErrorCode::RUNTIME_ERROR, "Insufficient Padding!") : Status{};
    return std::make_pair(err, win);
}
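A hedged configure-time sketch of how the returned pair would typically be consumed (shapes are illustrative):

    TensorInfo src(TensorShape(32U, 32U, 3U), 1, DataType::F32);
    TensorInfo dst(TensorShape(30U, 30U, 8U), 1, DataType::F32);

    auto win_config = validate_and_configure_window(&src, &dst);
    ARM_COMPUTE_ERROR_THROW_ON(win_config.first); // surfaces "Insufficient Padding!" if it were ever set
    const Window win = win_config.second;         // maximum window over dst, configured without padding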

◆ validate_arguments()

Status arm_compute::cpu::kernels::validate_arguments(const ITensorInfo *src, const ITensorInfo *weights, const ITensorInfo *dst, const PadStrideInfo &conv_info)

Definition at line 60 of file CpuDirectConv2dKernel.cpp.

References ARM_COMPUTE_RETURN_ERROR_ON, ARM_COMPUTE_RETURN_ERROR_ON_CPU_F16_UNSUPPORTED, ARM_COMPUTE_RETURN_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN, ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES, ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DIMENSIONS, ARM_COMPUTE_RETURN_ERROR_ON_NULLPTR, ARM_COMPUTE_UNUSED, arm_compute::CHANNEL, arm_compute::misc::shape_calculator::compute_deep_convolution_shape(), ITensorInfo::data_layout(), arm_compute::test::validation::data_layout, arm_compute::test::validation::data_type, ITensorInfo::data_type(), ITensorInfo::dimension(), arm_compute::F16, arm_compute::F32, arm_compute::get_data_layout_dimension_index(), arm_compute::HEIGHT, arm_compute::NHWC, ITensorInfo::num_dimensions(), arm_compute::test::validation::output_shape, ITensorInfo::tensor_shape(), ITensorInfo::total_size(), arm_compute::UNKNOWN, and arm_compute::WIDTH.

Referenced by can_interpret_inputs_as_1d_array(), CpuConvertQuantizedSignednessKernel::configure(), CpuDequantizeKernel::configure(), CpuReshapeKernel::configure(), CpuCopyKernel::configure(), ClReshapeKernel::configure(), ClDequantizeKernel::configure(), ClFloorKernel::configure(), CpuConcatenateBatchKernel::configure(), CpuFloorKernel::configure(), CpuPermuteKernel::configure(), ClCopyKernel::configure(), ClElementWiseUnaryKernel::configure(), CpuConcatenateHeightKernel::configure(), CpuConcatenateWidthKernel::configure(), ClScaleKernel::configure(), ClWidthConcatenate2TensorsKernel::configure(), CpuQuantizeKernel::configure(), ClHeightConcatenateKernel::configure(), ClPool2dKernel::configure(), ClPool3dKernel::configure(), ClWidthConcatenateKernel::configure(), ClActivationKernel::configure(), CpuActivationKernel::configure(), ClQuantizeKernel::configure(), ClPermuteKernel::configure(), ClWidthConcatenate4TensorsKernel::configure(), CPPDetectionOutputLayer::configure(), CpuPool2dKernel::configure(), CLStridedSliceKernel::configure(), ClBatchConcatenateKernel::configure(), ClDepthConcatenateKernel::configure(), CpuConcatenateDepthKernel::configure(), CpuDirectConv2dOutputStageKernel::configure(), ClGemmLowpMatrixMultiplyNativeKernel::configure(), NEBatchToSpaceLayerKernel::configure(), CpuMaxUnpoolingLayerKernel::configure(), NEReverseKernel::configure(), NETileKernel::configure(), CpuDirectConv2dKernel::configure(), NEChannelShuffleLayerKernel::configure(), NEDepthToSpaceLayerKernel::configure(), NEPriorBoxLayerKernel::configure(), NESpaceToDepthLayerKernel::configure(), ClGemmReshapeLhsMatrixKernel::configure(), CPPTopKVKernel::configure(), CpuGemmLowpMatrixMultiplyKernel::configure(), CpuPool3dKernel::configure(), CpuScaleKernel::configure(), NEComputeAllAnchorsKernel::configure(), NEInstanceNormalizationLayerKernel::configure(), NEReorgLayerKernel::configure(), ClGemmMatrixMultiplyNativeKernel::configure(), ClWinogradFilterTransformKernel::configure(), ClWinogradInputTransformKernel::configure(), CLInstanceNormalizationLayerKernel::configure(), CLMaxUnpoolingLayerKernel::configure(), NEFFTDigitReverseKernel::configure(), NEFFTScaleKernel::configure(), NESpaceToBatchLayerKernel::configure(), ClGemmMatrixMultiplyReshapedOnlyRhsMMULKernel::configure(), CPPPermuteKernel::configure(), CLChannelShuffleLayerKernel::configure(), CLReverseKernel::configure(), CpuSubKernel::configure(), NENormalizationLayerKernel::configure(), CLSelectKernel::configure(), ClGemmLowpQuantizeDownInt32ScaleByFixedPointKernel::configure(), CLBatchToSpaceLayerKernel::configure(), ClWinogradOutputTransformKernel::configure(), NEPadLayerKernel::configure(), CLDepthToSpaceLayerKernel::configure(), CLSpaceToDepthLayerKernel::configure(), ClCastKernel::configure(), NERangeKernel::configure(), ClGemmLowpQuantizeDownInt32ScaleByFloatKernel::configure(), ClGemmLowpQuantizeDownInt32ScaleKernel::configure(), CLComputeAllAnchorsKernel::configure(), CpuCastKernel::configure(), CLNormalizationLayerKernel::configure(), CpuDepthwiseConv2dNativeKernel::configure(), CpuDirectConv3dKernel::configure(), NEBoundingBoxTransformKernel::configure(), CpuGemmLowpOffsetContributionKernel::configure(), CLFFTScaleKernel::configure(), CLQLSTMLayerNormalizationKernel::configure(), NEFFTRadixStageKernel::configure(), CLGatherKernel::configure(), NEROIPoolingLayerKernel::configure(), CLSpaceToBatchLayerKernel::configure(), CLTileKernel::configure(), CpuGemmLowpQuantizeDownInt32ScaleKernel::configure(), CLComparisonKernel::configure(), 
ClGemmLowpMatrixMultiplyReshapedKernel::configure(), NEStackLayerKernel::configure(), CLFFTDigitReverseKernel::configure(), CpuCol2ImKernel::configure(), CPPDetectionPostProcessLayer::configure(), CpuGemmLowpQuantizeDownInt32ToInt16ScaleByFixedPointKernel::configure(), CPPNonMaximumSuppressionKernel::configure(), NEReductionOperationKernel::configure(), CLReorgLayerKernel::configure(), ClGemmLowpMatrixMultiplyReshapedOnlyRhsMMULKernel::configure(), NEFuseBatchNormalizationKernel::configure(), ClGemmLowpOffsetContributionKernel::configure(), ClGemmLowpOffsetContributionOutputStageKernel::configure(), ClGemmReshapeRhsMatrixKernel::configure(), CpuAddKernel::configure(), NEBatchNormalizationLayerKernel::configure(), CLNormalizePlanarYUVLayerKernel::configure(), CpuGemmLowpQuantizeDownInt32ToInt8ScaleByFixedPointKernel::configure(), CpuGemmLowpQuantizeDownInt32ToUint8ScaleByFixedPointKernel::configure(), CpuGemmMatrixMultiplyKernel::configure(), CpuMulKernel::configure(), NEGatherKernel::configure(), CLRangeKernel::configure(), CLReductionOperationKernel::configure(), ClDirectConv2dKernel::configure(), NEROIAlignLayerKernel::configure(), CLPadLayerKernel::configure(), ClCol2ImKernel::configure(), NEStridedSliceKernel::configure(), CLFFTRadixStageKernel::configure(), CLPriorBoxLayerKernel::configure(), CLL2NormalizeLayerKernel::configure(), ClDirectConv3dKernel::configure(), ClGemmLowpMatrixMultiplyReshapedOnlyRhsKernel::configure(), ClMulKernel::configure(), CLBoundingBoxTransformKernel::configure(), CpuWeightsReshapeKernel::configure(), CLDepthwiseConvolutionLayerNativeKernel::configure(), CpuWinogradConv2d::configure(), CLStackLayerKernel::configure(), ClGemmMatrixMultiplyReshapedOnlyRhsKernel::configure(), ClWeightsReshapeKernel::configure(), CLArgMinMaxLayerKernel::configure(), ClIm2ColKernel::configure(), CpuIm2ColKernel::configure(), CLROIAlignLayerKernel::configure(), CLDeconvolutionReshapeOutputKernel::configure(), CLFuseBatchNormalizationKernel::configure(), CLBatchNormalizationLayerKernel::configure(), CpuGemmLowpOffsetContributionOutputStageKernel::configure(), ClGemmMatrixMultiplyReshapedKernel::configure(), ClWinogradConv2d::configure(), CpuFloorKernel::infer_window(), CpuConvertQuantizedSignednessKernel::validate(), CpuDequantizeKernel::validate(), CpuCopyKernel::validate(), CpuReshapeKernel::validate(), ClReshapeKernel::validate(), ClDequantizeKernel::validate(), ClFloorKernel::validate(), CpuConcatenateBatchKernel::validate(), CpuFloorKernel::validate(), CpuPermuteKernel::validate(), ClCopyKernel::validate(), ClElementWiseUnaryKernel::validate(), CpuConcatenateWidthKernel::validate(), CpuConcatenateHeightKernel::validate(), ClWidthConcatenate2TensorsKernel::validate(), CpuQuantizeKernel::validate(), ClScaleKernel::validate(), ClPool2dKernel::validate(), ClPool3dKernel::validate(), ClWidthConcatenateKernel::validate(), ClActivationKernel::validate(), ClHeightConcatenateKernel::validate(), CpuActivationKernel::validate(), ClQuantizeKernel::validate(), ClWidthConcatenate4TensorsKernel::validate(), CpuPool2dKernel::validate(), ClPermuteKernel::validate(), ClBatchConcatenateKernel::validate(), ClDepthConcatenateKernel::validate(), CpuConcatenateDepthKernel::validate(), CPPDetectionOutputLayer::validate(), CpuDirectConv2dOutputStageKernel::validate(), ClGemmLowpMatrixMultiplyNativeKernel::validate(), CpuDirectConv2dKernel::validate(), ClGemmReshapeLhsMatrixKernel::validate(), NETileKernel::validate(), CpuGemmLowpMatrixMultiplyKernel::validate(), CpuPool3dKernel::validate(), 
NEReverseKernel::validate(), NESpaceToDepthLayerKernel::validate(), NEChannelShuffleLayerKernel::validate(), NEDepthToSpaceLayerKernel::validate(), CpuMaxUnpoolingLayerKernel::validate(), CpuScaleKernel::validate(), ClWinogradFilterTransformKernel::validate(), ClWinogradInputTransformKernel::validate(), NEPriorBoxLayerKernel::validate(), CpuSubKernel::validate(), ClGemmLowpQuantizeDownInt32ScaleByFixedPointKernel::validate(), NEInstanceNormalizationLayerKernel::validate(), NEFFTScaleKernel::validate(), NEComputeAllAnchorsKernel::validate(), ClGemmMatrixMultiplyNativeKernel::validate(), ClGemmLowpQuantizeDownInt32ScaleByFloatKernel::validate(), CPPTopKVKernel::validate(), ClWinogradOutputTransformKernel::validate(), CLChannelShuffleLayerKernel::validate(), NEReorgLayerKernel::validate(), ClCastKernel::validate(), CPPPermuteKernel::validate(), ClGemmLowpQuantizeDownInt32ScaleKernel::validate(), CLInstanceNormalizationLayerKernel::validate(), ClGemmMatrixMultiplyReshapedOnlyRhsMMULKernel::validate(), CpuGemmLowpOffsetContributionKernel::validate(), CpuCastKernel::validate(), CLDepthToSpaceLayerKernel::validate(), CpuDepthwiseConv2dNativeKernel::validate(), CpuDirectConv3dKernel::validate(), NEFFTDigitReverseKernel::validate(), CLSelectKernel::validate(), CLReverseKernel::validate(), CLSpaceToDepthLayerKernel::validate(), CLStridedSliceKernel::validate(), CpuGemmLowpQuantizeDownInt32ScaleKernel::validate(), CpuCol2ImKernel::validate(), CLMaxUnpoolingLayerKernel::validate(), CLComputeAllAnchorsKernel::validate(), CpuGemmLowpQuantizeDownInt32ToInt16ScaleByFixedPointKernel::validate(), NEFFTRadixStageKernel::validate(), CLFFTScaleKernel::validate(), ClGemmLowpMatrixMultiplyReshapedKernel::validate(), NERangeKernel::validate(), NENormalizationLayerKernel::validate(), NEBatchToSpaceLayerKernel::validate(), ClGemmReshapeRhsMatrixKernel::validate(), NEMeanStdDevNormalizationKernel::validate(), CLQLSTMLayerNormalizationKernel::validate(), CpuAddKernel::validate(), CLNormalizationLayerKernel::validate(), NEPadLayerKernel::validate(), CpuMulKernel::validate(), CpuGemmLowpQuantizeDownInt32ToUint8ScaleByFixedPointKernel::validate(), CpuGemmLowpQuantizeDownInt32ToInt8ScaleByFixedPointKernel::validate(), CLComparisonKernel::validate(), CpuGemmMatrixMultiplyKernel::validate(), ClGemmLowpMatrixMultiplyReshapedOnlyRhsMMULKernel::validate(), CLGatherKernel::validate(), CLTileKernel::validate(), ClDirectConv2dKernel::validate(), ClGemmLowpOffsetContributionKernel::validate(), NEGatherKernel::validate(), CLFFTDigitReverseKernel::validate(), ClCol2ImKernel::validate(), ClGemmLowpOffsetContributionOutputStageKernel::validate(), CLMeanStdDevNormalizationKernel::validate(), NEReductionOperationKernel::validate(), ClDirectConv3dKernel::validate(), CPPNonMaximumSuppressionKernel::validate(), NEBoundingBoxTransformKernel::validate(), CLRangeKernel::validate(), CLReorgLayerKernel::validate(), CLFFTRadixStageKernel::validate(), NEStackLayerKernel::validate(), CpuWeightsReshapeKernel::validate(), ClGemmLowpMatrixMultiplyReshapedOnlyRhsKernel::validate(), CLNormalizePlanarYUVLayerKernel::validate(), NESpaceToBatchLayerKernel::validate(), ClMulKernel::validate(), CLReductionOperationKernel::validate(), CLPriorBoxLayerKernel::validate(), CPPDetectionPostProcessLayer::validate(), CLPadLayerKernel::validate(), NEROIPoolingLayerKernel::validate(), CpuWinogradConv2d::validate(), ClWeightsReshapeKernel::validate(), NEROIAlignLayerKernel::validate(), ClGemmMatrixMultiplyReshapedOnlyRhsKernel::validate(), 
NEBatchNormalizationLayerKernel::validate(), CLL2NormalizeLayerKernel::validate(), CLBoundingBoxTransformKernel::validate(), CpuIm2ColKernel::validate(), NEFuseBatchNormalizationKernel::validate(), NEStridedSliceKernel::validate(), ClIm2ColKernel::validate(), CLBatchToSpaceLayerKernel::validate(), CLStackLayerKernel::validate(), CLDepthwiseConvolutionLayerNativeKernel::validate(), CLArgMinMaxLayerKernel::validate(), CpuGemmLowpOffsetContributionOutputStageKernel::validate(), ClGemmMatrixMultiplyReshapedKernel::validate(), CLDeconvolutionReshapeOutputKernel::validate(), CLROIAlignLayerKernel::validate(), CLSpaceToBatchLayerKernel::validate(), CLBatchNormalizationLayerKernel::validate(), CLFuseBatchNormalizationKernel::validate(), and ClWinogradConv2d::validate().

{
    ARM_COMPUTE_RETURN_ERROR_ON_NULLPTR(src, weights, dst);
    ARM_COMPUTE_RETURN_ERROR_ON(src->data_layout() == DataLayout::UNKNOWN);
    ARM_COMPUTE_RETURN_ERROR_ON_CPU_F16_UNSUPPORTED(src);
    ARM_COMPUTE_RETURN_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN(src, 1, DataType::F16, DataType::F32);
    ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES(src, weights);

    const DataLayout data_layout = src->data_layout();
    const int        width_idx   = get_data_layout_dimension_index(data_layout, DataLayoutDimension::WIDTH);
    const int        height_idx  = get_data_layout_dimension_index(data_layout, DataLayoutDimension::HEIGHT);
    const int        channel_idx = get_data_layout_dimension_index(data_layout, DataLayoutDimension::CHANNEL);

    ARM_COMPUTE_RETURN_ERROR_ON(weights->dimension(channel_idx) != src->dimension(channel_idx));
    ARM_COMPUTE_RETURN_ERROR_ON(weights->dimension(width_idx) != weights->dimension(height_idx));
    ARM_COMPUTE_RETURN_ERROR_ON(weights->num_dimensions() > 4);
    ARM_COMPUTE_RETURN_ERROR_ON(data_layout == DataLayout::NHWC && src->data_type() != DataType::F32);
    ARM_COMPUTE_UNUSED(width_idx);
    // Checks performed when output is configured
    if(dst->total_size() != 0)
    {
        TensorShape output_shape = misc::shape_calculator::compute_deep_convolution_shape(src->tensor_shape(), src->data_layout(), weights->tensor_shape(), conv_info);

        DataType data_type = src->data_type();

        ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DIMENSIONS(dst->tensor_shape(), output_shape);
        ARM_COMPUTE_RETURN_ERROR_ON(dst->data_type() != data_type);
    }

    return Status{};
}
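A hedged sketch of these checks in action (shapes are illustrative): a square 3x3 FP32 kernel whose channel count matches the source passes, while an NHWC source with a non-F32 data type would be rejected by the layout/type check above; an empty dst skips the output checks.

    TensorInfo    src(TensorShape(32U, 32U, 3U), 1, DataType::F32);       // default NCHW layout: W, H, C
    TensorInfo    weights(TensorShape(3U, 3U, 3U, 8U), 1, DataType::F32); // square 3x3 kernel, 3 input channels
    TensorInfo    dst{};                                                  // total_size() == 0: output checks skipped
    PadStrideInfo conv_info(1, 1, 0, 0);

    const Status status = validate_arguments(&src, &weights, &dst, conv_info);
    // status.error_code() reports the first failed check; here it is ErrorCode::OK.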