21.02
|
Neon kernel to perform Winograd output transform. More...
#include <NEWinogradConvolutionLayerKernel.h>
Public Member Functions | |
const char * | name () const override |
Name of the kernel. More... | |
NEWinogradLayerTransformOutputKernel () | |
Constructor. More... | |
NEWinogradLayerTransformOutputKernel (const NEWinogradLayerTransformOutputKernel &)=delete | |
Prevent instances of this class from being copied (as this class contains pointers). More... | |
NEWinogradLayerTransformOutputKernel & | operator= (const NEWinogradLayerTransformOutputKernel &)=delete |
Prevent instances of this class from being copied (as this class contains pointers). More... | |
NEWinogradLayerTransformOutputKernel (NEWinogradLayerTransformOutputKernel &&)=default | |
Allow instances of this class to be moved. More... | |
NEWinogradLayerTransformOutputKernel & | operator= (NEWinogradLayerTransformOutputKernel &&)=default |
Allow instances of this class to be moved. More... | |
~NEWinogradLayerTransformOutputKernel ()=default | |
Default destructor. More... | |
unsigned int | get_output_storage_size (int num_batches, int num_rows, int num_cols, int num_output_channels) const override |
Determine how much memory (in units of TOut) to allocate for the (Winograd domain) output. More... | |
int | get_matrix_stride (int num_batches, int num_rows, int num_cols, int num_output_channels) const override |
Gets the stride between matrices in the output workspace. More... | |
std::pair< unsigned int, unsigned int > | get_output_shape (int num_rows, int num_cols, bool padding_same) const override |
Get the output shape of a convolution. More... | |
unsigned int | get_working_space_size (unsigned int num_threads) const override |
Get the working space required to perform the transformation. More... | |
void | configure (const ITensor *biases, const ITensor *transformed_output, const int matrix_stride, ITensor *output_nhwc, const int num_batches, const int num_rows, const int num_cols, const int num_channels, ITensor *workspace, const arm_gemm::Activation &activation) override |
Configure the output transform kernel. More... | |
void | run (const Window &window, const ThreadInfo &info) override |
Execute the kernel on the passed window. More... | |
Public Member Functions inherited from INEWinogradLayerTransformOutputKernel | |
virtual | ~INEWinogradLayerTransformOutputKernel () |
Public Member Functions inherited from ICPPKernel | |
virtual | ~ICPPKernel ()=default |
Default destructor. More... | |
virtual void | run_nd (const Window &window, const ThreadInfo &info, const Window &thread_locator) |
Legacy compatibility layer for implementations which do not support thread_locator. In these cases we simply narrow the interface down to the legacy version. More... | |
virtual void | run_op (ITensorPack &tensors, const Window &window, const ThreadInfo &info) |
Execute the kernel on the passed window. More... | |
Public Member Functions inherited from IKernel | |
IKernel () | |
Constructor. More... | |
virtual | ~IKernel ()=default |
Destructor. More... | |
virtual bool | is_parallelisable () const |
Indicates whether or not the kernel is parallelisable. More... | |
virtual BorderSize | border_size () const |
The size of the border for that kernel. More... | |
const Window & | window () const |
The maximum window the kernel can be executed on. More... | |
Static Public Member Functions | |
static Status | validate (const ITensorInfo *input, const ITensorInfo *bias, const ITensorInfo *output, const WinogradInfo &winograd_info) |
Static function to check if given info will lead to a valid configuration of NEWinogradLayerTransformOutputKernel. More... | |
Neon kernel to perform Winograd output transform.
Definition at line 315 of file NEWinogradConvolutionLayerKernel.h.
Constructor.
Definition at line 439 of file NEWinogradConvolutionLayerKernel.cpp.
|
delete |
Prevent instances of this class from being copied (as this class contains pointers).
|
default |
Allow instances of this class to be moved.
|
default |
Default destructor.
|
overridevirtual |
Configure the output transform kernel.
[in] | biases | Pointer to the biases tensor. |
[in] | transformed_output | Pointer to working space for the output tensor in the Winograd domain. |
[in] | matrix_stride | Output matrix stride, can be computed with winograd::WinogradGEMM<2, 2, 3, 3>::Convolution<float, float>::get_output_matrix_stride() |
[out] | output_nhwc | Pointer to a tensor with NHWC data layout, in the spatial domain. |
[in] | num_batches | Number of batches in the input tensor. |
[in] | num_rows | Number of rows in output tensor. |
[in] | num_cols | Number of columns in output tensor. |
[in] | num_channels | Number of feature maps in the output tensor. |
[in] | workspace | Tensor to be used as the working space during the computation. |
[in] | activation | Activation to be used |
Implements INEWinogradLayerTransformOutputKernel.
Definition at line 472 of file NEWinogradConvolutionLayerKernel.cpp.
References Window::DimX, ITensor::info(), arm_gemm::roundup(), ITensorInfo::set_valid_region(), and ITensorInfo::tensor_shape().
|
overridevirtual |
Gets the stride between matrices in the output workspace.
[in] | num_batches | Number of batches in the output tensor. |
[in] | num_rows | Number of rows in each feature map of the input tensor. |
[in] | num_cols | Number of columns in each feature map of the input tensor. |
[in] | num_output_channels | Number of feature maps in the output tensor. |
Implements INEWinogradLayerTransformOutputKernel.
Definition at line 452 of file NEWinogradConvolutionLayerKernel.cpp.
|
overridevirtual |
Get the output shape of a convolution.
[in] | num_rows | Number of rows in each feature map of the input tensor. |
[in] | num_cols | Number of columns in each feature map of the input tensor. |
[in] | padding_same | True if padding is SAME, false otherwise |
Implements INEWinogradLayerTransformOutputKernel.
Definition at line 463 of file NEWinogradConvolutionLayerKernel.cpp.
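The relation between input and output size can be illustrated with a minimal sketch. It assumes unit stride, and it takes the kernel size as explicit arguments, which is an assumption made for illustration only (the documented get_output_shape() takes just num_rows, num_cols and padding_same). The helper name conv_output_shape is hypothetical and is not part of the library.

```cpp
#include <cassert>
#include <utility>

// Hypothetical helper, not the library implementation.
// Assumes unit stride: SAME padding keeps the spatial size,
// VALID (no padding) shrinks each dimension by (kernel - 1).
std::pair<unsigned int, unsigned int> conv_output_shape(int num_rows, int num_cols,
                                                        int kernel_rows, int kernel_cols,
                                                        bool padding_same)
{
    assert(num_rows >= kernel_rows && num_cols >= kernel_cols);
    const int out_rows = padding_same ? num_rows : num_rows - kernel_rows + 1;
    const int out_cols = padding_same ? num_cols : num_cols - kernel_cols + 1;
    return std::make_pair(static_cast<unsigned int>(out_rows),
                          static_cast<unsigned int>(out_cols));
}
```

Under these assumptions an 8x8 feature map and a 3x3 kernel give an 8x8 output with SAME padding and a 6x6 output otherwise.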
|
overridevirtual |
Determine how much memory (in units of TOut) to allocate for the (Winograd domain) output.
[in] | num_batches | Number of batches in the output tensor. |
[in] | num_rows | Number of rows in each feature map of the input tensor. |
[in] | num_cols | Number of columns in each feature map of the input tensor. |
[in] | num_output_channels | Number of feature maps in the output tensor. |
Implements INEWinogradLayerTransformOutputKernel.
Definition at line 423 of file NEWinogradConvolutionLayerKernel.cpp.
References arm_compute::test::validation::input_shape.
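How the storage size and the matrix stride relate can be sketched roughly as follows. This is a simplified model, not the library's computation: it assumes square output tiles, SAME padding (so the output has the same spatial size as the input), and it ignores any alignment rounding the real implementation may apply. The names ceil_div, matrix_stride_elems and output_storage_elems are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Integer division rounding up.
constexpr int ceil_div(int a, int b)
{
    return (a + b - 1) / b;
}

// Hypothetical: elements in one Winograd-domain output matrix,
// i.e. one value per (batch, tile, output channel). No alignment padding.
std::size_t matrix_stride_elems(int num_batches, int num_rows, int num_cols,
                                int num_output_channels, int output_tile)
{
    assert(num_batches > 0 && num_rows > 0 && num_cols > 0);
    assert(num_output_channels > 0 && output_tile > 0);
    const int tiles = ceil_div(num_rows, output_tile) * ceil_div(num_cols, output_tile);
    return static_cast<std::size_t>(num_batches) * tiles * num_output_channels;
}

// Hypothetical: total elements across all Winograd-domain matrices.
// There are (output_tile + kernel_size - 1)^2 matrices, one stride apart.
std::size_t output_storage_elems(int num_batches, int num_rows, int num_cols,
                                 int num_output_channels, int output_tile, int kernel_size)
{
    const int side = output_tile + kernel_size - 1;
    return static_cast<std::size_t>(side) * side *
           matrix_stride_elems(num_batches, num_rows, num_cols, num_output_channels, output_tile);
}
```

For one batch of 8x8 feature maps with 16 output channels, 2x2 output tiles and a 3x3 kernel, this model gives 16 tiles, a stride of 256 elements and 16 matrices, so 4096 elements in total.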
|
overridevirtual |
Get the working space required to perform the transformation.
Note: the working space is only required while the transformation is running, so it can be reused whenever the transformation is not running.
[in] | num_threads | The greatest number of threads that will be used to execute the transform. |
Implements INEWinogradLayerTransformOutputKernel.
Definition at line 446 of file NEWinogradConvolutionLayerKernel.cpp.
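One plausible way to use a working space sized by the number of threads is to give each thread its own contiguous slice, indexed by thread id. The sketch below assumes exactly that layout; the per-thread size, the helper names total_working_space and thread_slice, and the layout itself are assumptions for illustration and are not taken from the library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical: total bytes needed when every thread gets its own slice.
std::size_t total_working_space(std::size_t per_thread_bytes, unsigned int num_threads)
{
    assert(num_threads > 0);
    return per_thread_bytes * num_threads;
}

// Hypothetical: start of the slice owned by the given thread.
std::uint8_t *thread_slice(std::uint8_t *base, std::size_t per_thread_bytes, unsigned int thread_id)
{
    assert(base != nullptr);
    return base + per_thread_bytes * thread_id;
}
```

Because the slices do not overlap, threads can write to their scratch memory without synchronisation, and the whole buffer can be handed to other work once the transform has finished.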
|
inlineoverridevirtual |
Name of the kernel.
Implements ICPPKernel.
Definition at line 318 of file NEWinogradConvolutionLayerKernel.h.
References INEWinogradLayerTransformInputKernel::configure(), INEWinogradLayerTransformInputKernel::get_matrix_stride(), INEWinogradLayerTransformInputKernel::get_working_space_size(), arm_compute::test::validation::info, arm_compute::test::validation::input, ICPPKernel::run(), arm_compute::validate(), IKernel::window(), and arm_compute::test::validation::winograd_info.
|
delete |
Prevent instances of this class from being copied (as this class contains pointers).
|
default |
Allow instances of this class to be moved.
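The deleted copy operations and defaulted move operations follow a common C++ pattern for classes that hold pointers. A minimal sketch of that pattern, using a hypothetical MoveOnlyKernel class that owns an illustrative std::unique_ptr member (not the real kernel's members):

```cpp
#include <memory>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for a kernel that holds a pointer to some state.
class MoveOnlyKernel
{
public:
    MoveOnlyKernel() : _state(std::make_unique<int>(0)) {}
    // Copying is disabled so two objects never share the same pointer.
    MoveOnlyKernel(const MoveOnlyKernel &) = delete;
    MoveOnlyKernel &operator=(const MoveOnlyKernel &) = delete;
    // Moving transfers ownership of the pointer to the destination.
    MoveOnlyKernel(MoveOnlyKernel &&) = default;
    MoveOnlyKernel &operator=(MoveOnlyKernel &&) = default;
    ~MoveOnlyKernel() = default;

    bool has_state() const { return _state != nullptr; }

private:
    std::unique_ptr<int> _state;
};

// Compile-time checks of the copy/move properties.
static_assert(!std::is_copy_constructible<MoveOnlyKernel>::value, "copy construction is deleted");
static_assert(!std::is_copy_assignable<MoveOnlyKernel>::value, "copy assignment is deleted");
static_assert(std::is_move_constructible<MoveOnlyKernel>::value, "move construction is allowed");
static_assert(std::is_move_assignable<MoveOnlyKernel>::value, "move assignment is allowed");
```

With this pattern, objects can still be returned from factory functions or stored in containers through std::move, while accidental copies fail to compile.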
|
overridevirtual |
Execute the kernel on the passed window.
[in] | window | Region on which to execute the kernel. (Must be a region of the window returned by window()) |
[in] | info | Info about executing thread and CPU. |
Reimplemented from ICPPKernel.
Definition at line 505 of file NEWinogradConvolutionLayerKernel.cpp.
References ARM_COMPUTE_ERROR_ON_NULLPTR, ARM_COMPUTE_ERROR_ON_UNCONFIGURED_KERNEL, ARM_COMPUTE_UNUSED, ITensor::buffer(), Window::Dimension::end(), ITensor::info(), ITensorInfo::offset_first_element_in_bytes(), Window::Dimension::start(), ITensorInfo::strides_in_bytes(), ThreadInfo::thread_id, and Window::x().
|
static |
Static function to check if given info will lead to a valid configuration of NEWinogradLayerTransformOutputKernel.
[in] | input | Source tensor info with shape [C, N, 16, batches] or [C, N, 36, batches]. Data types supported: F16/F32. |
[in] | bias | Biases tensor info. Shared biases supported. Biases are a 1D tensor with dimensions [OFM]. It can be a nullptr. Data type supported: same as input |
[in] | output | Destination tensor info with shape [output_convolved_dims.width, output_convolved_dims.height, C, batches]. Data type supported: same as input |
[in] | winograd_info | Contains the Winograd information described in WinogradInfo |
Definition at line 528 of file NEWinogradConvolutionLayerKernel.cpp.
References ARM_COMPUTE_RETURN_ON_ERROR, ICloneable< T >::clone(), and arm_compute::test::validation::winograd_info.
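The values 16 and 36 in the input shape are consistent with the standard Winograd relation in which an output tile of size m and a kernel of size r use an input tile of (m + r - 1) elements per side, giving (m + r - 1)^2 matrices in the Winograd domain. A small sketch of that arithmetic follows; the helper name winograd_num_matrices is hypothetical, and which tile and kernel combinations this kernel actually accepts is not stated here.

```cpp
#include <cassert>

// Hypothetical helper: number of Winograd-domain matrices for a square
// output tile of side `output_tile` and a square kernel of side `kernel_size`.
constexpr int winograd_num_matrices(int output_tile, int kernel_size)
{
    return (output_tile + kernel_size - 1) * (output_tile + kernel_size - 1);
}

// 2x2 output tile, 3x3 kernel: 4x4 input tile.
static_assert(winograd_num_matrices(2, 3) == 16, "F(2x2, 3x3) uses 16 matrices");
// 4x4 output tile, 3x3 kernel: 6x6 input tile.
static_assert(winograd_num_matrices(4, 3) == 36, "F(4x4, 3x3) uses 36 matrices");
```

Note that a 2x2 output tile with a 5x5 kernel also yields 36, so the third dimension alone does not identify the configuration.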