Neon kernel to perform Winograd input transform.

#include <NEWinogradConvolutionLayerKernel.h>

Public Types
using	WinogradBase = winograd::WinogradGEMM< OutputTileRows, OutputTileCols, KernelRows, KernelCols, winograd::WinogradRoots::Integers >
	Winograd base kernel.

using	WinogradConv = typename WinogradBase::template Convolution< T, T >
	Winograd convolution kernel.

Public Member Functions
	NEWinogradLayerTransformInputKernel (const NEWinogradLayerTransformInputKernel &)=delete
	Prevent instances of this class from being copied (As this class contains pointers)

NEWinogradLayerTransformInputKernel &	operator= (const NEWinogradLayerTransformInputKernel &)=delete
	Prevent instances of this class from being copied (As this class contains pointers)

	NEWinogradLayerTransformInputKernel (NEWinogradLayerTransformInputKernel &&)=default
	Allow instances of this class to be moved.

NEWinogradLayerTransformInputKernel &	operator= (NEWinogradLayerTransformInputKernel &&)=default
	Allow instances of this class to be moved.

	~NEWinogradLayerTransformInputKernel ()=default
	Default destructor.

unsigned int	get_input_storage_size (int num_batches, int num_channels, int num_rows, int num_cols, bool same_padding) const override
	Determine how much memory (in units of TIn) to allocate for the transformed input.

unsigned int	get_working_space_size (unsigned int num_threads) const override
	Get the working space required to perform the transformation.

int	get_matrix_stride (int num_batches, int num_channels, int num_rows, int num_cols, bool same_padding) const override
	Gets the stride between matrices in the input workspace.

	NEWinogradLayerTransformInputKernel ()
	Default constructor.

const char *	name () const override
	Name of the kernel.

void	configure (const ITensor *input_nhwc, const int num_batches, const int num_rows, const int num_cols, const int num_channels, const PaddingType padding, ITensor *output, const int matrix_stride, ITensor *workspace) override
	Configure the input transform kernel.

void	run (const Window &window, const ThreadInfo &info) override
	Execute the kernel on the passed window.

Public Member Functions inherited from INEWinogradLayerTransformInputKernel
virtual	~INEWinogradLayerTransformInputKernel ()
	Destructor.

Public Member Functions inherited from ICPPKernel
virtual	~ICPPKernel ()=default
	Default destructor.

virtual void	run_nd (const Window &window, const ThreadInfo &info, const Window &thread_locator)
	Legacy compatibility layer for implementations which do not support thread_locator. In these cases we simply narrow the interface down to the legacy version.

virtual void	run_op (ITensorPack &tensors, const Window &window, const ThreadInfo &info)
	Execute the kernel on the passed window.

Public Member Functions inherited from IKernel
	IKernel ()
	Constructor.

virtual	~IKernel ()=default
	Destructor.

virtual bool	is_parallelisable () const
	Indicates whether or not the kernel is parallelisable.

virtual BorderSize	border_size () const
	The size of the border for that kernel.

const Window &	window () const
	The maximum window the kernel can be executed on.

Static Public Member Functions
static Status	validate (const ITensorInfo *input, const ITensorInfo *output, const WinogradInfo &winograd_info)
	Static function to check if given info will lead to a valid configuration of NEWinogradLayerTransformInputKernel.

Detailed Description

template<typename T, int OutputTileRows, int OutputTileCols, int KernelRows, int KernelCols>
class arm_compute::NEWinogradLayerTransformInputKernel< T, OutputTileRows, OutputTileCols, KernelRows, KernelCols >

Neon kernel to perform Winograd input transform.

Definition at line 101 of file NEWinogradConvolutionLayerKernel.h.

Member Typedef Documentation

◆ WinogradBase

using WinogradBase = winograd::WinogradGEMM<OutputTileRows, OutputTileCols, KernelRows, KernelCols, winograd::WinogradRoots::Integers>

Winograd base kernel.

Definition at line 197 of file NEWinogradConvolutionLayerKernel.h.

◆ WinogradConv

using WinogradConv = typename WinogradBase::template Convolution<T, T>

Winograd convolution kernel.

Definition at line 199 of file NEWinogradConvolutionLayerKernel.h.

Constructor & Destructor Documentation

◆ NEWinogradLayerTransformInputKernel() [1/3]

NEWinogradLayerTransformInputKernel ( const NEWinogradLayerTransformInputKernel< T, OutputTileRows, OutputTileCols, KernelRows, KernelCols > & )

delete

Prevent instances of this class from being copied (As this class contains pointers)

◆ NEWinogradLayerTransformInputKernel() [2/3]

NEWinogradLayerTransformInputKernel ( NEWinogradLayerTransformInputKernel< T, OutputTileRows, OutputTileCols, KernelRows, KernelCols > && )

default

Allow instances of this class to be moved.

◆ ~NEWinogradLayerTransformInputKernel()

~NEWinogradLayerTransformInputKernel ( )

default

Default destructor.

◆ NEWinogradLayerTransformInputKernel() [3/3]

NEWinogradLayerTransformInputKernel ( )

Default constructor.

Definition at line 318 of file NEWinogradConvolutionLayerKernel.cpp.

     : _transform(nullptr), _input_nhwc(nullptr), _num_batches(0), _num_rows(0), _num_cols(0), _num_channels(0), _padding(), _output(nullptr), _matrix_stride(0), _padding_top(), _padding_left(),
       _padding_right(), _padding_bottom(), _workspace(nullptr)
 {
 }

Member Function Documentation

◆ configure()

void configure	(	const ITensor *	input_nhwc,
		const int	num_batches,
		const int	num_rows,
		const int	num_cols,
		const int	num_channels,
		const PaddingType	padding,
		ITensor *	output,
		const int	matrix_stride,
		ITensor *	workspace
	)

override virtual

Configure the input transform kernel.

Parameters

[in]	input_nhwc	Input tensor. Data types supported: F16/F32. Layout supported NHWC.
[in]	num_batches	Number of batches in input tensor.
[in]	num_rows	Number of rows in input tensor.
[in]	num_cols	Number of columns in input tensor.
[in]	num_channels	Number of channels in input tensor.
[in]	padding	Padding type.
[out]	output	Base of output matrices.
[in]	matrix_stride	Stride between output matrices.
[in]	workspace	Tensor to be used as the working space during the computation.

Implements INEWinogradLayerTransformInputKernel.

Definition at line 325 of file NEWinogradConvolutionLayerKernel.cpp.

References Window::DimX, and arm_gemm::iceildiv().

 {
     _input_nhwc    = input_nhwc;
     _num_batches   = num_batches;
     _num_rows      = num_rows;
     _num_cols      = num_cols;
     _num_channels  = num_channels;
     _padding       = padding;
     _output        = output;
     _matrix_stride = matrix_stride;
     _workspace     = workspace;
 
     _padding_top    = (padding == PADDING_SAME) ? (KernelRows - 1) / 2 : 0;
     _padding_left   = (padding == PADDING_SAME) ? (KernelCols - 1) / 2 : 0;
     _padding_bottom = (padding == PADDING_SAME) ? iceildiv(KernelRows - 1, 2) : 0;
     _padding_right  = (padding == PADDING_SAME) ? iceildiv(KernelCols - 1, 2) : 0;
 
     _transform = std::make_unique<InputTransform>(
                      KernelRows,
                      KernelCols,
                      num_batches,
                      num_rows,
                      num_cols,
                      num_channels,
                      _padding_top,    /**< Padding to apply to the top of the image. */
                      _padding_left,   /**< Padding to apply to the left of the image. */
                      _padding_bottom, /**< Padding to apply to the bottom of the image. */
                      _padding_right   /**< Padding to apply to the right of the image. */
                  );
 
     Window win;
     auto   win_last = _transform->get_window();
     win.set(Window::DimX, Window::Dimension(0, win_last, 1));
     INEKernel::configure(win);
 }
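
The padding arithmetic in the listing above can be tried in isolation. The following is a minimal sketch, not library code: `iceildiv_sketch`, `PaddingSketch` and `compute_padding_sketch` are hypothetical names, and `iceildiv_sketch` only assumes that `arm_gemm::iceildiv()` is an integer division that rounds up.

```cpp
#include <cassert>

// Hypothetical stand-in for arm_gemm::iceildiv: integer division rounding up.
inline int iceildiv_sketch(int a, int b)
{
    assert(b > 0);
    return (a + b - 1) / b;
}

// Hypothetical aggregate holding the four padding amounts.
struct PaddingSketch
{
    int top;
    int left;
    int bottom;
    int right;
};

// Mirrors the arithmetic shown in configure(): "SAME" padding splits the
// (kernel - 1) border between the two sides and gives the extra element to
// the bottom/right when (kernel - 1) is odd. "VALID" padding adds nothing.
inline PaddingSketch compute_padding_sketch(bool same_padding, int kernel_rows, int kernel_cols)
{
    if(!same_padding)
    {
        return PaddingSketch{ 0, 0, 0, 0 };
    }
    return PaddingSketch{
        (kernel_rows - 1) / 2,               // top
        (kernel_cols - 1) / 2,               // left
        iceildiv_sketch(kernel_rows - 1, 2), // bottom
        iceildiv_sketch(kernel_cols - 1, 2)  // right
    };
}
```

For odd kernel sizes such as 3x3 or 1x7 the two sides receive the same amount; the rounding only matters when the kernel dimension is even.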

◆ get_input_storage_size()

unsigned int get_input_storage_size	(	int	num_batches,
		int	num_channels,
		int	num_rows,
		int	num_cols,
		bool	same_padding
	)		const

override virtual

Determine how much memory (in units of TIn) to allocate for the transformed input.

Parameters

[in]	num_batches	Number of batches in the input tensor.
[in]	num_channels	Number of feature maps in the input tensor.
[in]	num_rows	Number of rows in each feature map.
[in]	num_cols	Number of columns in each feature map.
[in]	same_padding	Use "SAME" padding, otherwise use "VALID".

Returns: Storage size (in units of TIn) required.

Implements INEWinogradLayerTransformInputKernel.

Definition at line 285 of file NEWinogradConvolutionLayerKernel.cpp.

References arm_compute::test::validation::input_shape.

 {
     // Construct shapes for the input and kernel tensors.
     const Tensor4DShape input_shape(num_batches, num_rows, num_cols, num_channels);
     const KernelShape   kern_shape(1, KernelRows, KernelCols, num_channels);
     // Return the size, converted into units of TIn
     return static_cast<unsigned int>(WinogradConv::get_input_storage_size(num_batches, num_rows, num_cols, num_channels, same_padding) / sizeof(T));
 }

◆ get_matrix_stride()

int get_matrix_stride	(	int	num_batches,
		int	num_channels,
		int	num_rows,
		int	num_cols,
		bool	same_padding
	)		const

override virtual

Gets the stride between matrices in the input workspace.

Parameters

[in]	num_batches	Number of batches in the input tensor.
[in]	num_channels	Number of feature maps in the input tensor.
[in]	num_rows	Number of rows in each feature map.
[in]	num_cols	Number of columns in each feature map.
[in]	same_padding	Use "SAME" padding, otherwise use "VALID".

Returns: Stride expressed in bytes.

Implements INEWinogradLayerTransformInputKernel.

Definition at line 307 of file NEWinogradConvolutionLayerKernel.cpp.

 {
     return WinogradConv::get_input_matrix_stride(num_batches, num_rows, num_cols, num_channels, same_padding);
 }
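
To give a feel for what the "matrices" in the input workspace are, the sketch below works through the generic Winograd F(m x n, r x s) bookkeeping. It is an illustration under stated assumptions, not the formula used by `WinogradConv::get_input_matrix_stride()`: the names `ceil_div_sketch`, `InputMatrixLayoutSketch` and `input_matrix_layout_sketch` are hypothetical, and the library may round sizes up for alignment, so the stride it reports can be larger than the plain row-by-column product.

```cpp
#include <cassert>

// Integer division rounding up (hypothetical helper).
inline int ceil_div_sketch(int a, int b)
{
    assert(b > 0);
    return (a + b - 1) / b;
}

// Hypothetical description of the transformed-input matrices.
struct InputMatrixLayoutSketch
{
    int num_matrices;    // One matrix per element of the transformed input tile
    int rows_per_matrix; // One row per (batch, tile row, tile column)
    int cols_per_matrix; // One column per input channel
};

// Generic Winograd arithmetic, assuming unit stride and no dilation.
template <int OutputTileRows, int OutputTileCols, int KernelRows, int KernelCols>
InputMatrixLayoutSketch input_matrix_layout_sketch(int num_batches, int num_channels, int num_rows, int num_cols, bool same_padding)
{
    // Each output tile is computed from an input tile of this size.
    const int inner_tile_rows = OutputTileRows + KernelRows - 1;
    const int inner_tile_cols = OutputTileCols + KernelCols - 1;

    // "SAME" keeps the spatial size, "VALID" shrinks it by (kernel - 1).
    const int output_rows = same_padding ? num_rows : num_rows - KernelRows + 1;
    const int output_cols = same_padding ? num_cols : num_cols - KernelCols + 1;

    // Number of output tiles needed to cover the output feature map.
    const int tile_rows = ceil_div_sketch(output_rows, OutputTileRows);
    const int tile_cols = ceil_div_sketch(output_cols, OutputTileCols);

    return InputMatrixLayoutSketch{
        inner_tile_rows * inner_tile_cols,
        num_batches * tile_rows * tile_cols,
        num_channels
    };
}
```

For F(2x2, 3x3) on a single 8x8 input with 8 channels and "SAME" padding, this gives 16 matrices of 16 rows by 8 columns.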

◆ get_working_space_size()

unsigned int get_working_space_size ( unsigned int num_threads ) const

override virtual

Get the working space required to perform the transformation.

Note, the working space is only required when performing the transformation - hence it can be reused whenever the transformation is not running.

Parameters

[in] num_threads The greatest number of threads that will be used to execute the transform.

Returns: Size of working space required in bytes.

Implements INEWinogradLayerTransformInputKernel.

Definition at line 301 of file NEWinogradConvolutionLayerKernel.cpp.

 {
     return _transform->get_working_space_size(num_threads) / sizeof(T);
 }

◆ name()

const char* name ( ) const

inline override virtual

Name of the kernel.

Returns: Kernel name

Implements ICPPKernel.

Definition at line 165 of file NEWinogradConvolutionLayerKernel.h.

References INEWinogradLayerTransformInputKernel::configure(), arm_compute::test::validation::info, ICPPKernel::run(), and IKernel::window().

     {
         return "NEWinogradLayerTransformInputKernel";
     }

◆ operator=() [1/2]

NEWinogradLayerTransformInputKernel& operator= ( const NEWinogradLayerTransformInputKernel< T, OutputTileRows, OutputTileCols, KernelRows, KernelCols > & )

delete

Prevent instances of this class from being copied (As this class contains pointers)

◆ operator=() [2/2]

NEWinogradLayerTransformInputKernel& operator= ( NEWinogradLayerTransformInputKernel< T, OutputTileRows, OutputTileCols, KernelRows, KernelCols > && )

default

Allow instances of this class to be moved.
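
The copy and move rules documented above follow the usual pattern for a class that owns a resource through a pointer. The following is a minimal sketch with hypothetical names (`TransformSketch`, `MoveOnlyKernelSketch`, `move_transfers_ownership_sketch`). It assumes only what the listings on this page show, namely that the kernel holds its transform in a smart pointer.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for the transform object owned by the kernel.
struct TransformSketch
{
    int window_end;
};

class MoveOnlyKernelSketch
{
public:
    MoveOnlyKernelSketch() = default;
    // Copying is disabled: two kernels must not share one transform.
    MoveOnlyKernelSketch(const MoveOnlyKernelSketch &) = delete;
    MoveOnlyKernelSketch &operator=(const MoveOnlyKernelSketch &) = delete;
    // Moving is allowed: ownership of the transform is handed over.
    MoveOnlyKernelSketch(MoveOnlyKernelSketch &&) = default;
    MoveOnlyKernelSketch &operator=(MoveOnlyKernelSketch &&) = default;
    ~MoveOnlyKernelSketch() = default;

    void configure(int window_end)
    {
        assert(window_end >= 0);
        _transform = std::unique_ptr<TransformSketch>(new TransformSketch{ window_end });
    }

    bool is_configured() const
    {
        return _transform != nullptr;
    }

private:
    std::unique_ptr<TransformSketch> _transform{};
};

static_assert(!std::is_copy_constructible<MoveOnlyKernelSketch>::value, "copy must be disabled");
static_assert(std::is_move_constructible<MoveOnlyKernelSketch>::value, "move must be enabled");

// After a move, the destination owns the transform and the source is empty.
inline bool move_transfers_ownership_sketch()
{
    MoveOnlyKernelSketch source;
    source.configure(16);
    MoveOnlyKernelSketch destination(std::move(source));
    return destination.is_configured() && !source.is_configured();
}
```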

◆ run()

void run	(	const Window &	window,
		const ThreadInfo &	info
	)

override virtual

Execute the kernel on the passed window.

Warning: If is_parallelisable() returns false then the passed window must be equal to window().

Note: The window has to be a region within the window returned by the window() method; the width of the window has to be a multiple of num_elems_processed_per_iteration().

Parameters

[in]	window	Region on which to execute the kernel. (Must be a region of the window returned by window())
[in]	info	Info about executing thread and CPU.

Reimplemented from ICPPKernel.

Definition at line 371 of file NEWinogradConvolutionLayerKernel.cpp.

References ARM_COMPUTE_ERROR_ON_NULLPTR, ARM_COMPUTE_ERROR_ON_UNCONFIGURED_KERNEL, ARM_COMPUTE_UNUSED, ITensor::buffer(), ITensorInfo::element_size(), Window::Dimension::end(), ITensor::info(), ITensorInfo::offset_first_element_in_bytes(), Window::Dimension::start(), ITensorInfo::strides_in_bytes(), ThreadInfo::thread_id, Window::x(), Dimensions< T >::y(), and Dimensions< T >::z().

 {
     ARM_COMPUTE_UNUSED(info);
     ARM_COMPUTE_ERROR_ON_UNCONFIGURED_KERNEL(this);
     ARM_COMPUTE_ERROR_ON_NULLPTR(_workspace);
 
     const int  element_size_in_bytes = _input_nhwc->info()->element_size();
     const int  input_col_stride      = _input_nhwc->info()->strides_in_bytes().y() / element_size_in_bytes;
     const int  input_row_stride      = _input_nhwc->info()->strides_in_bytes().z() / element_size_in_bytes;
     const int  input_batch_stride    = _input_nhwc->info()->strides_in_bytes()[3] / element_size_in_bytes;
     const auto input_nhwc_ptr        = reinterpret_cast<const T *>(_input_nhwc->buffer() + _input_nhwc->info()->offset_first_element_in_bytes());
     auto       output_ptr            = reinterpret_cast<T *>(_output->buffer() + _output->info()->offset_first_element_in_bytes());
     ARM_COMPUTE_ERROR_ON_NULLPTR(output_ptr);
 
     _transform->set_input_tensor(input_nhwc_ptr, input_batch_stride, input_row_stride, input_col_stride);
     _transform->set_output_matrices(output_ptr, _matrix_stride, _num_channels);
 
     _transform->set_working_space(_workspace->buffer());
 
     // The code below cannot be moved to configure because biases hasn't been allocated at that point
     const size_t fst = window.x().start();
     const size_t lst = window.x().end();
     _transform->run(fst, lst, info.thread_id);
 }
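
The first lines of run() convert the tensor's byte strides into element strides before handing them to the transform. The sketch below reproduces that conversion on plain arrays. The names are hypothetical, and it assumes the dimension order implied by the listing for an NHWC tensor: index 0 is the channel, index 1 (`y()`) the column, index 2 (`z()`) the row and index 3 the batch.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical result type: strides counted in elements rather than bytes.
struct ElementStridesSketch
{
    int col;
    int row;
    int batch;
};

// Byte strides of a densely packed NHWC tensor (no padding between rows),
// ordered as [channel, column, row, batch]. Real tensors may carry padding,
// which is why run() reads the strides from the tensor info instead.
inline std::array<std::size_t, 4> dense_nhwc_strides_sketch(std::size_t channels, std::size_t cols, std::size_t rows, std::size_t element_size)
{
    return std::array<std::size_t, 4>{ { element_size,
                                         element_size * channels,
                                         element_size * channels * cols,
                                         element_size * channels * cols * rows } };
}

// Same division as in run(): each byte stride divided by the element size.
inline ElementStridesSketch to_element_strides_sketch(const std::array<std::size_t, 4> &strides_in_bytes, std::size_t element_size)
{
    assert(element_size > 0);
    return ElementStridesSketch{
        static_cast<int>(strides_in_bytes[1] / element_size),
        static_cast<int>(strides_in_bytes[2] / element_size),
        static_cast<int>(strides_in_bytes[3] / element_size)
    };
}
```

For a dense F32 tensor with 8 channels, 4 columns and 3 rows, the byte strides are 4, 32, 128 and 384, which become element strides of 8 (column), 32 (row) and 96 (batch).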

◆ validate()

Status validate	(	const ITensorInfo *	input,
		const ITensorInfo *	output,
		const WinogradInfo &	winograd_info
	)

static

Static function to check if given info will lead to a valid configuration of NEWinogradLayerTransformInputKernel.

Parameters

[in]	input	First tensor input info. Data types supported: F16/F32.
[in]	output	Output tensor info. Data types supported: same as `input`.
[in]	winograd_info	Contains Winograd's information described in WinogradInfo

Returns: a status

Definition at line 397 of file NEWinogradConvolutionLayerKernel.cpp.

References ARM_COMPUTE_RETURN_ON_ERROR, ICloneable< T >::clone(), and arm_compute::test::validation::winograd_info.

 {
     ARM_COMPUTE_RETURN_ON_ERROR(validate_arguments_winograd_input_trans(input, output, winograd_info));
     ARM_COMPUTE_RETURN_ON_ERROR(validate_and_configure_window_winograd_input_trans(input->clone().get(), output->clone().get(), winograd_info).first);
 
     return Status{};
 }

The documentation for this class was generated from the following files:

src/core/NEON/kernels/NEWinogradConvolutionLayerKernel.h
src/core/NEON/kernels/NEWinogradConvolutionLayerKernel.cpp
