Neon kernel to perform Winograd output transform. More...

#include <NEWinogradConvolutionLayerKernel.h>

Public Member Functions
const char *	name () const override
	Name of the kernel. More...

	NEWinogradLayerTransformOutputKernel ()
	Constructor. More...

	NEWinogradLayerTransformOutputKernel (const NEWinogradLayerTransformOutputKernel &)=delete
	Prevent instances of this class from being copied (as this class contains pointers). More...

NEWinogradLayerTransformOutputKernel &	operator= (const NEWinogradLayerTransformOutputKernel &)=delete
	Prevent instances of this class from being copied (as this class contains pointers). More...

	NEWinogradLayerTransformOutputKernel (NEWinogradLayerTransformOutputKernel &&)=default
	Allow instances of this class to be moved. More...

NEWinogradLayerTransformOutputKernel &	operator= (NEWinogradLayerTransformOutputKernel &&)=default
	Allow instances of this class to be moved. More...

	~NEWinogradLayerTransformOutputKernel ()=default
	Default destructor. More...

unsigned int	get_output_storage_size (int num_batches, int num_rows, int num_cols, int num_output_channels) const override
	Determine how much memory (in units of TOut) to allocate for the (Winograd domain) output. More...

int	get_matrix_stride (int num_batches, int num_rows, int num_cols, int num_output_channels) const override
	Gets the stride between matrices in the output workspace. More...

std::pair< unsigned int, unsigned int >	get_output_shape (int num_rows, int num_cols, bool padding_same) const override
	Get the output shape of a convolution. More...

unsigned int	get_working_space_size (unsigned int num_threads) const override
	Get the working space required to perform the transformation. More...

void	configure (const ITensor *biases, const ITensor *transformed_output, const int matrix_stride, ITensor *output_nhwc, const int num_batches, const int num_rows, const int num_cols, const int num_channels, ITensor *workspace, const arm_gemm::Activation &activation) override
	Configure the output transform kernel. More...

void	run (const Window &window, const ThreadInfo &info) override
	Execute the kernel on the passed window. More...

Public Member Functions inherited from INEWinogradLayerTransformOutputKernel
virtual	~INEWinogradLayerTransformOutputKernel ()

Public Member Functions inherited from ICPPKernel
virtual	~ICPPKernel ()=default
	Default destructor. More...

virtual void	run_nd (const Window &window, const ThreadInfo &info, const Window &thread_locator)
	Legacy compatibility layer for implementations which do not support thread_locator. In these cases we simply narrow the interface down to the legacy version. More...

virtual void	run_op (ITensorPack &tensors, const Window &window, const ThreadInfo &info)
	Execute the kernel on the passed window. More...

Public Member Functions inherited from IKernel
	IKernel ()
	Constructor. More...

virtual	~IKernel ()=default
	Destructor. More...

virtual bool	is_parallelisable () const
	Indicates whether or not the kernel is parallelisable. More...

virtual BorderSize	border_size () const
	The size of the border for that kernel. More...

const Window &	window () const
	The maximum window the kernel can be executed on. More...

Static Public Member Functions
static Status	validate (const ITensorInfo *input, const ITensorInfo *bias, const ITensorInfo *output, const WinogradInfo &winograd_info)
	Static function to check if given info will lead to a valid configuration of NEWinogradLayerTransformOutputKernel. More...

Detailed Description

template<typename T, int OutputTileRows, int OutputTileCols, int KernelRows, int KernelCols>
class arm_compute::NEWinogradLayerTransformOutputKernel< T, OutputTileRows, OutputTileCols, KernelRows, KernelCols >

Neon kernel to perform Winograd output transform.

Definition at line 315 of file NEWinogradConvolutionLayerKernel.h.

Constructor & Destructor Documentation

◆ NEWinogradLayerTransformOutputKernel() [1/3]

NEWinogradLayerTransformOutputKernel ( )

Constructor.

Definition at line 439 of file NEWinogradConvolutionLayerKernel.cpp.

     : _transform(nullptr), _biases(nullptr), _transformed_output(nullptr), _workspace(nullptr), _matrix_stride(0), _matrix_row_stride(0), _output_nhwc(nullptr), _num_batches(0), _num_rows(0),
       _num_cols(0), _num_channels(0)
 {
 }

◆ NEWinogradLayerTransformOutputKernel() [2/3]

NEWinogradLayerTransformOutputKernel ( const NEWinogradLayerTransformOutputKernel< T, OutputTileRows, OutputTileCols, KernelRows, KernelCols > & )

delete

Prevent instances of this class from being copied (as this class contains pointers).

◆ NEWinogradLayerTransformOutputKernel() [3/3]

NEWinogradLayerTransformOutputKernel ( NEWinogradLayerTransformOutputKernel< T, OutputTileRows, OutputTileCols, KernelRows, KernelCols > && )

default

Allow instances of this class to be moved.

◆ ~NEWinogradLayerTransformOutputKernel()

~NEWinogradLayerTransformOutputKernel ( )

default

Default destructor.

Member Function Documentation

◆ configure()

void configure	(	const ITensor *	biases,
		const ITensor *	transformed_output,
		const int	matrix_stride,
		ITensor *	output_nhwc,
		const int	num_batches,
		const int	num_rows,
		const int	num_cols,
		const int	num_channels,
		ITensor *	workspace,
		const arm_gemm::Activation &	activation
	)

overridevirtual

Configure the output transform kernel.

Parameters

[in]	biases	Pointer to the biases tensor.
[in]	transformed_output	Pointer to working space for the output tensor in the Winograd domain.
[in]	matrix_stride	Output matrix stride, can be computed with winograd::WinogradGEMM<2, 2, 3, 3>::Convolution<float, float>::get_output_matrix_stride()
[out]	output_nhwc	Pointer to a tensor with NHWC data layout, in the spatial domain.
[in]	num_batches	Number of batches in the input tensor.
[in]	num_rows	Number of rows in output tensor.
[in]	num_cols	Number of columns in output tensor.
[in]	num_channels	Number of feature maps in the output tensor.
[in]	workspace	Tensor to be used as the working space during the computation.
[in]	activation	Activation to be used

Implements INEWinogradLayerTransformOutputKernel.

Definition at line 472 of file NEWinogradConvolutionLayerKernel.cpp.

References Window::DimX, ITensor::info(), arm_gemm::roundup(), ITensorInfo::set_valid_region(), and ITensorInfo::tensor_shape().

 {
     _biases             = biases;
     _workspace          = workspace;
     _transformed_output = transformed_output;
     _matrix_stride      = matrix_stride;
     _matrix_row_stride  = roundup(num_channels, WinogradConv::N_BLOCK);
     _output_nhwc        = output_nhwc;
     _num_batches        = num_batches;
     _num_rows           = num_rows;
     _num_cols           = num_cols;
     _num_channels       = num_channels;
     // We don't have the biases buffer at this stage as it hasn't been allocated, we pass in nullptr. OutputTransform is only used here to compute the window.
     _transform = std::make_unique<OutputTransform>(num_batches, num_rows, num_cols, num_channels, activation);
     Window win;
     auto   win_last = _transform->get_window();
     win.set(Window::DimX, Window::Dimension(0, win_last, 1));
     _output_nhwc->info()->set_valid_region(ValidRegion(Coordinates(), _output_nhwc->info()->tensor_shape()));
 
     INEKernel::configure(win);
 }
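
The `_matrix_row_stride` assignment above pads the channel count up to a multiple of the GEMM block size. The following is a minimal standalone sketch of that rounding, not the library's code: `round_up_to_multiple` and `matrix_row_stride` are hypothetical names standing in for `arm_gemm::roundup()`, and the block size of 16 used in the usage note is an illustrative value rather than the actual `WinogradConv::N_BLOCK`.

```cpp
#include <cassert>

// Hypothetical stand-in for arm_gemm::roundup(): rounds `value` up to the
// nearest multiple of `multiple` (values already aligned are returned as-is).
template <typename T>
inline T round_up_to_multiple(T value, T multiple)
{
    assert(multiple > 0);
    const T remainder = value % multiple;
    return remainder == 0 ? value : value + (multiple - remainder);
}

// Row stride, in elements, of one Winograd-domain output matrix.
// `n_block` is an assumed block size supplied by the caller.
inline int matrix_row_stride(int num_channels, int n_block)
{
    return round_up_to_multiple(num_channels, n_block);
}
```

With an assumed block size of 16, `matrix_row_stride(30, 16)` yields 32 and `matrix_row_stride(32, 16)` stays at 32, so each matrix row starts on a block boundary.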

◆ get_matrix_stride()

int get_matrix_stride	(	int	num_batches,
		int	num_rows,
		int	num_cols,
		int	num_output_channels
	)		const

overridevirtual

Gets the stride between matrices in the output workspace.

Parameters

[in]	num_batches	Number of batches in the output tensor.
[in]	num_rows	Number of rows in each feature map of the input tensor.
[in]	num_cols	Number of columns in each feature map of the input tensor.
[in]	num_output_channels	Number of feature maps in the output tensor.

Returns: Stride expressed in bytes.

Implements INEWinogradLayerTransformOutputKernel.

Definition at line 452 of file NEWinogradConvolutionLayerKernel.cpp.

 {
     return WinogradConv::get_output_matrix_stride(num_batches, num_rows, num_cols, num_output_channels);
 }

◆ get_output_shape()

std::pair< unsigned int, unsigned int > get_output_shape	(	int	num_rows,
		int	num_cols,
		bool	padding_same
	)		const

overridevirtual

Get the output shape of a convolution.

Parameters

[in]	num_rows	Number of rows in each feature map of the input tensor.
[in]	num_cols	Number of columns in each feature map of the input tensor.
[in]	padding_same	True if padding is SAME, false otherwise

Returns: Shape of the output tensor

Implements INEWinogradLayerTransformOutputKernel.

Definition at line 463 of file NEWinogradConvolutionLayerKernel.cpp.

 {
     return WinogradConv::get_output_shape(std::make_pair<unsigned int, unsigned int>(num_rows, num_cols), padding_same);
 }
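
As a rough illustration of what the `padding_same` flag means for the returned shape, the sketch below assumes a stride-1 convolution: SAME padding preserves the spatial size, while VALID padding shrinks each dimension by the kernel size minus one. `conv_output_shape` is a hypothetical helper written for this page; the behaviour of `WinogradConv::get_output_shape()` is assumed rather than quoted from the library.

```cpp
#include <cassert>
#include <utility>

// Hypothetical helper: output (rows, cols) of a stride-1 convolution.
// Assumption: SAME padding keeps the input size; VALID padding removes
// (kernel - 1) elements from each spatial dimension.
inline std::pair<unsigned int, unsigned int> conv_output_shape(unsigned int in_rows,
                                                               unsigned int in_cols,
                                                               unsigned int kernel_rows,
                                                               unsigned int kernel_cols,
                                                               bool         padding_same)
{
    assert(kernel_rows >= 1 && kernel_cols >= 1);
    if(padding_same)
    {
        return { in_rows, in_cols };
    }
    // VALID padding only makes sense when the kernel fits inside the input.
    assert(in_rows >= kernel_rows && in_cols >= kernel_cols);
    return { in_rows - (kernel_rows - 1), in_cols - (kernel_cols - 1) };
}
```

For an 8x10 input and a 3x3 kernel, this gives 8x10 with SAME padding and 6x8 with VALID padding.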

◆ get_output_storage_size()

unsigned int get_output_storage_size	(	int	num_batches,
		int	num_rows,
		int	num_cols,
		int	num_output_channels
	)		const

overridevirtual

Determine how much memory (in units of TOut) to allocate for the (Winograd domain) output.

Parameters

[in]	num_batches	Number of batches in the output tensor.
[in]	num_rows	Number of rows in each feature map of the input tensor.
[in]	num_cols	Number of columns in each feature map of the input tensor.
[in]	num_output_channels	Number of feature maps in the output tensor.

Returns: Storage size (in units of TOut) required.

Implements INEWinogradLayerTransformOutputKernel.

Definition at line 423 of file NEWinogradConvolutionLayerKernel.cpp.

References arm_compute::test::validation::input_shape.

 {
     // Construct shapes for the input and kernel tensors.
     const Tensor4DShape input_shape(num_batches, num_rows, num_cols, 1);
     const KernelShape   kern_shape(num_output_channels, KernelRows, KernelCols, 1);
     // Return the size, converted into units of TOut
     return static_cast<unsigned int>(
                WinogradConv::get_output_storage_size(num_batches, num_rows, num_cols, num_output_channels) / sizeof(T));
 }

◆ get_working_space_size()

unsigned int get_working_space_size ( unsigned int num_threads ) const

overridevirtual

Get the working space required to perform the transformation.

Note, the working space is only required when performing the transformation - hence it can be reused whenever the transformation is not running.

Parameters

[in] num_threads The greatest number of threads that will be used to execute the transform.

Returns: Size of working space required in bytes.

Implements INEWinogradLayerTransformOutputKernel.

Definition at line 446 of file NEWinogradConvolutionLayerKernel.cpp.

 {
     return _transform->get_working_space_size(num_threads) / sizeof(T);
 }

◆ name()

const char* name ( ) const

inlineoverridevirtual

Name of the kernel.

Returns: Kernel name

Implements ICPPKernel.

Definition at line 318 of file NEWinogradConvolutionLayerKernel.h.

References INEWinogradLayerTransformInputKernel::configure(), INEWinogradLayerTransformInputKernel::get_matrix_stride(), INEWinogradLayerTransformInputKernel::get_working_space_size(), arm_compute::test::validation::info, arm_compute::test::validation::input, ICPPKernel::run(), arm_compute::validate(), IKernel::window(), and arm_compute::test::validation::winograd_info.

     {
         return "NEWinogradLayerTransformOutputKernel";
     }

◆ operator=() [1/2]

NEWinogradLayerTransformOutputKernel& operator= ( const NEWinogradLayerTransformOutputKernel< T, OutputTileRows, OutputTileCols, KernelRows, KernelCols > & )

delete

Prevent instances of this class from being copied (as this class contains pointers).

◆ operator=() [2/2]

NEWinogradLayerTransformOutputKernel& operator= ( NEWinogradLayerTransformOutputKernel< T, OutputTileRows, OutputTileCols, KernelRows, KernelCols > && )

default

Allow instances of this class to be moved.

◆ run()

void run	(	const Window &	window,
		const ThreadInfo &	info
	)

overridevirtual

Execute the kernel on the passed window.

Warning: If is_parallelisable() returns false then the passed window must be equal to window()

Note: The window has to be a region within the window returned by the window() method. The width of the window has to be a multiple of num_elems_processed_per_iteration().

Parameters

[in]	window	Region on which to execute the kernel. (Must be a region of the window returned by window())
[in]	info	Info about executing thread and CPU.
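
To make the "region of the window returned by window()" requirement concrete, here is a minimal sketch of how a one-dimensional range `[0, end)` could be divided into contiguous sub-ranges, one per thread. `split_range` is a hypothetical helper for illustration only; in practice the library's scheduler decides how the window is partitioned, and its policy may differ from this even split.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: splits [0, end) into at most `num_threads` contiguous,
// non-overlapping [start, stop) sub-ranges that together cover the whole range.
inline std::vector<std::pair<std::size_t, std::size_t>> split_range(std::size_t end, std::size_t num_threads)
{
    assert(num_threads > 0);
    std::vector<std::pair<std::size_t, std::size_t>> ranges;
    // Ceiling division so that the chunks cover every index.
    const std::size_t chunk = (end + num_threads - 1) / num_threads;
    for(std::size_t start = 0; start < end; start += chunk)
    {
        const std::size_t stop = (start + chunk < end) ? start + chunk : end;
        ranges.emplace_back(start, stop);
    }
    return ranges;
}
```

Each resulting pair plays the role of the `start()` and `end()` values of the X dimension of a sub-window handed to one thread.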

Reimplemented from ICPPKernel.

Definition at line 505 of file NEWinogradConvolutionLayerKernel.cpp.

References ARM_COMPUTE_ERROR_ON_NULLPTR, ARM_COMPUTE_ERROR_ON_UNCONFIGURED_KERNEL, ARM_COMPUTE_UNUSED, ITensor::buffer(), Window::Dimension::end(), ITensor::info(), ITensorInfo::offset_first_element_in_bytes(), Window::Dimension::start(), ITensorInfo::strides_in_bytes(), ThreadInfo::thread_id, and Window::x().

 {
     ARM_COMPUTE_UNUSED(info);
     ARM_COMPUTE_ERROR_ON_UNCONFIGURED_KERNEL(this);
     ARM_COMPUTE_ERROR_ON_NULLPTR(_workspace);
     ARM_COMPUTE_ERROR_ON_NULLPTR(_transformed_output);
     ARM_COMPUTE_ERROR_ON_NULLPTR(_output_nhwc);
 
     const int out_batch_stride = _output_nhwc->info()->strides_in_bytes()[3] / sizeof(T);
     const int out_row_stride   = _output_nhwc->info()->strides_in_bytes()[2] / sizeof(T);
     const int out_col_stride   = _output_nhwc->info()->strides_in_bytes()[1] / sizeof(T);
 
     _transform->set_input_matrices(_transformed_output->buffer(), _matrix_stride, _matrix_row_stride);
     _transform->set_bias((_biases ? reinterpret_cast<T *>(_biases->buffer() + _biases->info()->offset_first_element_in_bytes()) : nullptr));
     _transform->set_output_tensor(_output_nhwc->buffer() + _output_nhwc->info()->offset_first_element_in_bytes(), out_batch_stride, out_row_stride, out_col_stride);
     _transform->set_working_space(_workspace->buffer());
     // The code below cannot be moved to configure because biases hasn't been allocated at that point
     const size_t fst = window.x().start();
     const size_t lst = window.x().end();
     _transform->run(fst, lst, info.thread_id);
 }
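
The three `out_*_stride` values above are byte strides converted into element counts. The sketch below reproduces that conversion on a plain array, assuming a densely packed NHWC tensor whose dimensions are ordered `[C, W, H, N]` with no padding between rows. `ElementStrides`, `dense_nhwc_strides_in_bytes` and `to_element_strides` are hypothetical names, not part of the library.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical container for the element strides used by the transform.
struct ElementStrides
{
    int batch;
    int row;
    int col;
};

// Byte strides of a dense tensor with dimensions ordered [C, W, H, N].
// Assumption: no padding, so each stride is the product of the inner extents.
template <typename T>
std::array<std::size_t, 4> dense_nhwc_strides_in_bytes(std::size_t channels, std::size_t cols, std::size_t rows)
{
    return { { sizeof(T), channels * sizeof(T), channels * cols * sizeof(T), channels * cols * rows * sizeof(T) } };
}

// Converts byte strides into strides counted in elements of type T,
// using the same index mapping as run(): [3] batch, [2] row, [1] column.
template <typename T>
ElementStrides to_element_strides(const std::array<std::size_t, 4> &strides_in_bytes)
{
    for(std::size_t s : strides_in_bytes)
    {
        assert(s % sizeof(T) == 0); // strides must be whole numbers of elements
    }
    return { static_cast<int>(strides_in_bytes[3] / sizeof(T)),
             static_cast<int>(strides_in_bytes[2] / sizeof(T)),
             static_cast<int>(strides_in_bytes[1] / sizeof(T)) };
}
```

For a `float` tensor with 8 channels, 5 columns and 4 rows, the column stride is 8 elements, the row stride is 40 and the batch stride is 160.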

◆ validate()

Status validate	(	const ITensorInfo *	input,
		const ITensorInfo *	bias,
		const ITensorInfo *	output,
		const WinogradInfo &	winograd_info
	)

static

Static function to check if given info will lead to a valid configuration of NEWinogradLayerTransformOutputKernel.

Parameters

[in]	input	Source tensor info with shape [C, N, 16, batches] or [C, N, 36, batches]. Data types supported: F16/F32.
[in]	bias	Biases tensor info. Shared biases supported. Biases are 1D tensor with dimensions [OFM]. It can be a nullptr. Data type supported: same as `input`
[in]	output	Destination tensor info with shape [output_convolved_dims.width, output_convolved_dims.height, C, batches]. Data type supported: same as `input`
[in]	winograd_info	Contains Winograd's information described in WinogradInfo

Returns: a status

Definition at line 528 of file NEWinogradConvolutionLayerKernel.cpp.

References ARM_COMPUTE_RETURN_ON_ERROR, ICloneable< T >::clone(), and arm_compute::test::validation::winograd_info.

 {
     ARM_COMPUTE_RETURN_ON_ERROR(validate_arguments_winograd_output_trans(input, (bias != nullptr ? bias->clone().get() : nullptr), output, winograd_info));
     ARM_COMPUTE_RETURN_ON_ERROR(validate_and_configure_window_winograd_output_trans(input->clone().get(), output->clone().get(), winograd_info).first);
 
     return Status{};
 }

The documentation for this class was generated from the following files:

src/core/NEON/kernels/NEWinogradConvolutionLayerKernel.h
src/core/NEON/kernels/NEWinogradConvolutionLayerKernel.cpp
