Basic function to simulate a convolution layer.

#include <NEWinogradConvolutionLayer.h>


Public Member Functions
	NEWinogradConvolutionLayer (const std::shared_ptr< IMemoryManager > &memory_manager=nullptr)
	Constructor.

	NEWinogradConvolutionLayer (NEWinogradConvolutionLayer &&)=delete
	Prevent instances of this class from being moved (as this class contains non-movable objects).

NEWinogradConvolutionLayer &	operator= (NEWinogradConvolutionLayer &&)=delete
	Prevent instances of this class from being moved (as this class contains non-movable objects).

	~NEWinogradConvolutionLayer ()=default
	Default destructor.

void	configure (const ITensor *input, const ITensor *weights, const ITensor *biases, ITensor *output, const PadStrideInfo &conv_info, const ActivationLayerInfo &act_info=ActivationLayerInfo(), bool enable_fast_math=false)
	Set the input and output tensors.

void	run () override
	Run the kernels contained in the function.

void	prepare () override
	Prepare the function for executing.

	NEWinogradConvolutionLayer (const NEWinogradConvolutionLayer &)=delete
	Prevent instances of this class from being copied (as this class contains pointers).

NEWinogradConvolutionLayer &	operator= (const NEWinogradConvolutionLayer &)=delete
	Prevent instances of this class from being copied (as this class contains pointers).

Public Member Functions inherited from IFunction
virtual	~IFunction ()=default
	Destructor.

Static Public Member Functions
static Status	validate (const ITensorInfo *input, const ITensorInfo *weights, const ITensorInfo *biases, const ITensorInfo *output, const PadStrideInfo &conv_info, const ActivationLayerInfo &act_info=ActivationLayerInfo(), bool enable_fast_math=false)
	Static function to check if given info will lead to a valid configuration of NEWinogradConvolutionLayer.

Detailed Description

Basic function to simulate a convolution layer.

This function calls the following Neon kernels:

NEWinogradLayerTransformWeightsKernel (executed only once, by prepare(), which run() calls on its first invocation)
NEWinogradLayerTransformInputKernel
NEWinogradLayerTransformOutputKernel
NEGEMMAssemblyDispatch
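CPPPermute (up to three times: the weights are always permuted; input and output are permuted only when the data layout is NCHW)
NEActivationLayer (only when the requested activation cannot be fused into the output transform)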

Note: Some Winograd configurations (e.g. F(2x2, 5x5), F(4x4, 5x5)) are supported only with enable_fast_math = true.

Definition at line 54 of file NEWinogradConvolutionLayer.h.

Constructor & Destructor Documentation

◆ NEWinogradConvolutionLayer() [1/3]

NEWinogradConvolutionLayer ( const std::shared_ptr< IMemoryManager > & memory_manager = nullptr )

Constructor.

Definition at line 303 of file NEWinogradConvolutionLayer.cpp.

     : _memory_group(memory_manager), _gemm_function(memory_manager), _transform_input_kernel(nullptr), _transform_output_kernel(nullptr), _transform_weights_kernel(nullptr), _activationlayer_function(),
       _permute_input(), _permute_weights(), _permute_output(), _input_transformed(), _output_transformed(), _input_workspace(), _output_workspace(), _kernel_storage(), _input_nhwc(), _output_nhwc(),
       _weights_hwio(), _input(), _weights(), _output(), _is_prepared(false), _is_activationlayer_enabled(false)
 {
 }

◆ NEWinogradConvolutionLayer() [2/3]

NEWinogradConvolutionLayer ( NEWinogradConvolutionLayer && )

delete

Prevent instances of this class from being moved (as this class contains non-movable objects).

◆ ~NEWinogradConvolutionLayer()

~NEWinogradConvolutionLayer ( )

default

Default destructor.

◆ NEWinogradConvolutionLayer() [3/3]

NEWinogradConvolutionLayer ( const NEWinogradConvolutionLayer & )

delete

Prevent instances of this class from being copied (as this class contains pointers).

Member Function Documentation

◆ configure()

void configure	(	const ITensor *	input,
		const ITensor *	weights,
		const ITensor *	biases,
		ITensor *	output,
		const PadStrideInfo &	conv_info,
		const ActivationLayerInfo &	act_info = `ActivationLayerInfo()`,
		bool	enable_fast_math = `false`
	)

Set the input and output tensors.

Parameters

[in]	input	Source tensor. The 3 lower dimensions represent a single input [width, height, IFM], while every optional dimension from the 4th upwards represents a batch of inputs. Data types supported: F16/F32.
[in]	weights	Weights tensor. Weights are a 4D tensor with dimensions [kernel_x, kernel_y, IFM, OFM]. Data type supported: Same as `input`. Supported kernel sizes: 3x3 and 5x5; for F32, configure() also handles 1x3, 3x1, 1x5, 5x1, 1x7 and 7x1 kernels, while F16 supports only 3x3.
[in]	biases	Biases tensor. Can be nullptr. Shared biases supported. Biases are a 1D tensor with dimensions [OFM]. Data type supported: Same as `weights`.
[out]	output	Destination tensor. The 3 lower dimensions represent a single output [width, height, OFM], while the remaining dimensions represent a batch of outputs. Data types supported: Same as `input`.
[in]	conv_info	Contains padding and stride information described in PadStrideInfo. Currently only unit strides are supported.
[in]	act_info	(Optional) Activation layer information in case of a fused activation.
[in]	enable_fast_math	(Optional) Enable fast math computation. If this flag is set, the function may dispatch the fastest available implementation, which can reduce accuracy. Default is false.

Definition at line 310 of file NEWinogradConvolutionLayer.cpp.

References TensorAllocator::allocate(), Tensor::allocator(), ARM_COMPUTE_ERROR, ARM_COMPUTE_ERROR_ON_MSG, ARM_COMPUTE_ERROR_ON_NULLPTR, ARM_COMPUTE_ERROR_THROW_ON, arm_compute::CHANNEL, CPPPermute::configure(), NEActivationLayer::configure(), NEGEMM::configure(), arm_compute::test::validation::conv_info, ITensorInfo::data_layout(), ITensorInfo::data_type(), ITensorInfo::dimension(), ITensorInfo::element_size(), ActivationLayerInfo::enabled(), arm_compute::F16, arm_compute::F32, Scheduler::get(), arm_compute::get_data_layout_dimension_index(), Size2D::height, arm_compute::HEIGHT, arm_gemm::iceildiv(), ITensor::info(), arm_compute::test::validation::info, TensorAllocator::init(), TensorInfo::init(), arm_compute::test::validation::input, MemoryGroup::manage(), arm_compute::NCHW, IScheduler::num_threads(), arm_compute::test::validation::output_shape, PadStrideInfo::pad_left(), PadStrideInfo::pad_top(), arm_gemm::roundup(), Dimensions< T >::set(), arm_compute::U, arm_compute::validate_arguments(), Size2D::width, and arm_compute::WIDTH.

 {
     ARM_COMPUTE_ERROR_ON_NULLPTR(input, weights, output);
     ARM_COMPUTE_ERROR_THROW_ON(validate_arguments(input->info(), weights->info(), (biases != nullptr) ? biases->info() : nullptr, output->info(), conv_info));
 
     // Get indices for the width and height
     const DataLayout   data_layout = input->info()->data_layout();
     const unsigned int width_idx   = get_data_layout_dimension_index(data_layout, DataLayoutDimension::WIDTH);
     const unsigned int height_idx  = get_data_layout_dimension_index(data_layout, DataLayoutDimension::HEIGHT);
     const unsigned int channel_idx = get_data_layout_dimension_index(data_layout, DataLayoutDimension::CHANNEL);
 
     const Size2D   input_dims  = Size2D(input->info()->dimension(width_idx), input->info()->dimension(height_idx));
     const Size2D   kernel_size = Size2D(weights->info()->dimension(width_idx), weights->info()->dimension(height_idx));
     const DataType data_type   = input->info()->data_type();
     const Size2D   output_tile = winograd_output_tile(input_dims, kernel_size, data_type);
 
     // Check if the Winograd configuration requires fast math
     if(!enable_fast_math)
     {
         ARM_COMPUTE_ERROR_ON_MSG(check_support_fast_math(output_tile, kernel_size, data_type),
                                  "This Winograd configuration requires enable_fast_math=true");
     }
 
     _weights     = weights;
     _input       = input;
     _output      = output;
     _is_prepared = false;
 
     int n_gemms = 0;
     int N_BLOCK = 0; // Size of block used by GEMM.
 
     std::unique_ptr<INEWinogradLayerTransformInputKernel>   transform_input_kernel;
     std::unique_ptr<INEWinogradLayerTransformWeightsKernel> transform_weights_kernel;
     std::unique_ptr<INEWinogradLayerTransformOutputKernel>  transform_output_kernel;
 
     if(data_type == DataType::F32)
     {
         if(kernel_size == Size2D(3, 3))
         {
             if(input->info()->dimension(width_idx) > 4 && input->info()->dimension(height_idx) > 4)
             {
                 using config             = NEWinogradLayerConfiguration<float, float, 4, 4, 3, 3>;
                 transform_input_kernel   = std::make_unique<config::TransformInputKernel>();
                 transform_weights_kernel = std::make_unique<config::TransformWeightsKernel>();
                 transform_output_kernel  = std::make_unique<config::TransformOutputKernel>();
                 n_gemms                  = config::WinogradBase::N_GEMMS;
                 N_BLOCK                  = config::WinogradConv::N_BLOCK;
             }
             else
             {
                 using config             = NEWinogradLayerConfiguration<float, float, 2, 2, 3, 3>;
                 transform_input_kernel   = std::make_unique<config::TransformInputKernel>();
                 transform_weights_kernel = std::make_unique<config::TransformWeightsKernel>();
                 transform_output_kernel  = std::make_unique<config::TransformOutputKernel>();
                 n_gemms                  = config::WinogradBase::N_GEMMS;
                 N_BLOCK                  = config::WinogradConv::N_BLOCK;
             }
         }
         else if(kernel_size == Size2D(5, 5))
         {
             using config             = NEWinogradLayerConfiguration<float, float, 2, 2, 5, 5>;
             transform_input_kernel   = std::make_unique<config::TransformInputKernel>();
             transform_weights_kernel = std::make_unique<config::TransformWeightsKernel>();
             transform_output_kernel  = std::make_unique<config::TransformOutputKernel>();
             n_gemms                  = config::WinogradBase::N_GEMMS;
             N_BLOCK                  = config::WinogradConv::N_BLOCK;
         }
         else if(kernel_size == Size2D(1, 3))
         {
             using config             = NEWinogradLayerConfiguration<float, float, 6, 1, 3, 1>;
             transform_input_kernel   = std::make_unique<config::TransformInputKernel>();
             transform_weights_kernel = std::make_unique<config::TransformWeightsKernel>();
             transform_output_kernel  = std::make_unique<config::TransformOutputKernel>();
             n_gemms                  = config::WinogradBase::N_GEMMS;
             N_BLOCK                  = config::WinogradConv::N_BLOCK;
         }
         else if(kernel_size == Size2D(3, 1))
         {
             using config             = NEWinogradLayerConfiguration<float, float, 1, 6, 1, 3>;
             transform_input_kernel   = std::make_unique<config::TransformInputKernel>();
             transform_weights_kernel = std::make_unique<config::TransformWeightsKernel>();
             transform_output_kernel  = std::make_unique<config::TransformOutputKernel>();
             n_gemms                  = config::WinogradBase::N_GEMMS;
             N_BLOCK                  = config::WinogradConv::N_BLOCK;
         }
         else if(kernel_size == Size2D(1, 5))
         {
             using config             = NEWinogradLayerConfiguration<float, float, 4, 1, 5, 1>;
             transform_input_kernel   = std::make_unique<config::TransformInputKernel>();
             transform_weights_kernel = std::make_unique<config::TransformWeightsKernel>();
             transform_output_kernel  = std::make_unique<config::TransformOutputKernel>();
             n_gemms                  = config::WinogradBase::N_GEMMS;
             N_BLOCK                  = config::WinogradConv::N_BLOCK;
         }
         else if(kernel_size == Size2D(5, 1))
         {
             using config             = NEWinogradLayerConfiguration<float, float, 1, 4, 1, 5>;
             transform_input_kernel   = std::make_unique<config::TransformInputKernel>();
             transform_weights_kernel = std::make_unique<config::TransformWeightsKernel>();
             transform_output_kernel  = std::make_unique<config::TransformOutputKernel>();
             n_gemms                  = config::WinogradBase::N_GEMMS;
             N_BLOCK                  = config::WinogradConv::N_BLOCK;
         }
         else if(kernel_size == Size2D(1, 7))
         {
             using config             = NEWinogradLayerConfiguration<float, float, 2, 1, 7, 1>;
             transform_input_kernel   = std::make_unique<config::TransformInputKernel>();
             transform_weights_kernel = std::make_unique<config::TransformWeightsKernel>();
             transform_output_kernel  = std::make_unique<config::TransformOutputKernel>();
             n_gemms                  = config::WinogradBase::N_GEMMS;
             N_BLOCK                  = config::WinogradConv::N_BLOCK;
         }
         else if(kernel_size == Size2D(7, 1))
         {
             using config             = NEWinogradLayerConfiguration<float, float, 1, 2, 1, 7>;
             transform_input_kernel   = std::make_unique<config::TransformInputKernel>();
             transform_weights_kernel = std::make_unique<config::TransformWeightsKernel>();
             transform_output_kernel  = std::make_unique<config::TransformOutputKernel>();
             n_gemms                  = config::WinogradBase::N_GEMMS;
             N_BLOCK                  = config::WinogradConv::N_BLOCK;
         }
         else
         {
             ARM_COMPUTE_ERROR("Not supported.");
         }
     }
 #ifdef __ARM_FEATURE_FP16_VECTOR_ARITHMETIC
     else if(data_type == DataType::F16)
     {
         if(kernel_size == Size2D(3, 3))
         {
             using config             = NEWinogradLayerConfiguration<__fp16, __fp16, 4, 4, 3, 3>;
             transform_input_kernel   = std::make_unique<config::TransformInputKernel>();
             transform_weights_kernel = std::make_unique<config::TransformWeightsKernel>();
             transform_output_kernel  = std::make_unique<config::TransformOutputKernel>();
             n_gemms                  = config::WinogradBase::N_GEMMS;
             N_BLOCK                  = config::WinogradConv::N_BLOCK;
         }
         else
         {
             ARM_COMPUTE_ERROR("Not supported.");
         }
     }
 #endif // __ARM_FEATURE_FP16_VECTOR_ARITHMETIC
 
     const PaddingType use_padding_type = (conv_info.pad_top() != 0u || conv_info.pad_left() != 0) ? PADDING_SAME : PADDING_VALID;
     const bool        use_same_padding = use_padding_type == PADDING_SAME;
 
     // Get convolved dimensions
     const int in_channels  = input->info()->dimension(channel_idx);
     const int out_channels = output->info()->dimension(channel_idx);
 
     const Tensor4DShape in_shape(internal_get_input_shape(input));
     const size_t        data_type_size = input->info()->element_size();
     // Get the memory required to instantiate a new Winograd operator.
     constexpr size_t storage_alignment = 64;
 
     // Kernel Storage
     const size_t kernel_storage_size = transform_weights_kernel->get_weight_storage_size(out_channels,
                                                                                          in_channels)
                                        * data_type_size;
 
     // Input storage
     const size_t input_storage_size = transform_input_kernel->get_input_storage_size(in_shape.n_batches, in_shape.n_channels, in_shape.n_rows, in_shape.n_cols,
                                                                                      use_same_padding)
                                       * data_type_size;
 
     // Output storage
     const size_t output_storage_size  = transform_output_kernel->get_output_storage_size(in_shape.n_batches, in_shape.n_rows, in_shape.n_cols, out_channels) * data_type_size;
     const int    kernel_matrix_stride = transform_weights_kernel->get_matrix_stride(out_channels, in_channels);
     const int    output_matrix_stride = transform_output_kernel->get_matrix_stride(in_shape.n_batches, in_shape.n_rows, in_shape.n_cols, out_channels);
     const auto   output_shape         = transform_output_kernel->get_output_shape(in_shape.n_rows, in_shape.n_cols, use_padding_type == PADDING_SAME);
     const int    input_matrix_stride  = transform_input_kernel->get_matrix_stride(in_shape.n_batches, in_channels, in_shape.n_rows, in_shape.n_cols, use_padding_type == PADDING_SAME);
 
     // Configure GEMM
     const int tile_rows                = iceildiv(output_shape.first, output_tile.height);
     const int tile_cols                = iceildiv(output_shape.second, output_tile.width);
     const int m                        = in_shape.n_batches * tile_rows * tile_cols;
     const int k                        = in_shape.n_channels;
     const int n                        = out_channels;
     const int kernel_matrix_row_stride = roundup(out_channels, N_BLOCK);
     const int output_matrix_row_stride = kernel_matrix_row_stride;
 
     TensorShape a_shape(k, m, 1, n_gemms);
     Strides     a_strides(data_type_size);
     a_strides.set(1, a_strides[0] * k);
     //a_strides.set(2, data_type_size * input_matrix_stride / n_gemms); FIXME: This is the real batch size, but RSH's code crashes if it's not 0.
     a_strides.set(2, 0);
     a_strides.set(3, data_type_size * input_matrix_stride);
 
     TensorShape b_shape(n, k, n_gemms);
     Strides     b_strides(data_type_size);
     b_strides.set(1, data_type_size * kernel_matrix_row_stride);
     b_strides.set(2, data_type_size * kernel_matrix_stride);
 
     TensorShape d_shape(n, m, 1, n_gemms);
     Strides     d_strides(data_type_size);
     d_strides.set(1, data_type_size * output_matrix_row_stride);
     //d_strides.set(2, data_type_size * output_matrix_stride / n_gemms); FIXME: This is the real batch size, but RSH's code crashes if it's not 0.
     d_strides.set(2, 0);
     d_strides.set(3, data_type_size * output_matrix_stride);
 
     TensorInfo a_info{};
     TensorInfo b_info{};
     TensorInfo d_info{};
     a_info.init(a_shape, 1, data_type, a_strides, 0, input_storage_size);
     b_info.init(b_shape, 1, data_type, b_strides, 0, kernel_storage_size);
     d_info.init(d_shape, 1, data_type, d_strides, 0, output_storage_size);
 
     _input_transformed.allocator()->init(a_info, storage_alignment);
     _kernel_storage.allocator()->init(b_info, storage_alignment);
     _output_transformed.allocator()->init(d_info, storage_alignment);
 
     // configure and allocate dst tensor to be used to convert from winograd domain to spatial domain when calling to reshape_output()
     TensorInfo info(TensorShape(_output->info()->dimension(2), _output->info()->dimension(0),
                                 _output->info()->dimension(1), _output->info()->dimension(3)),
                     1, _output->info()->data_type());
     _output_nhwc.allocator()->init(info);
 
     const ITensor     *input_to_use  = _input;
     ITensor           *output_to_use = _output;
     PermutationVector  weights_permutation_vector(3U, 0U, 1U, 2U);
     const unsigned int max_num_threads = NEScheduler::get().num_threads();
 
     // Configure the kernel to transform the input tensor from NCHW -> NHWC
     if(data_layout == DataLayout::NCHW)
     {
         _memory_group.manage(&_input_nhwc);
         _permute_input.configure(input, &_input_nhwc, PermutationVector(2U, 0U, 1U));
         input_to_use               = &_input_nhwc;
         weights_permutation_vector = PermutationVector(3U, 2U, 0U, 1U);
     }
 
     // Configure input transform kernel
     _memory_group.manage(&_input_transformed);
     _memory_group.manage(&_input_workspace);
     transform_input_kernel->configure(input_to_use, in_shape.n_batches, in_shape.n_rows, in_shape.n_cols, in_shape.n_channels, use_padding_type,
                                       &_input_transformed, input_matrix_stride, &_input_workspace);
     const size_t input_workspace_size = transform_input_kernel->get_working_space_size(max_num_threads);
     TensorInfo   input_workspace_info(TensorShape(input_workspace_size), 1, _input->info()->data_type());
     _input_workspace.allocator()->init(input_workspace_info);
     _input_workspace.allocator()->allocate();
     if(data_layout == DataLayout::NCHW)
     {
         _input_nhwc.allocator()->allocate();
     }
 
     // Re-order a weight tensor from [Output feature map x Input feature map x Height x Width] to [Height x Width x Input feature map x Output feature map]
     _permute_weights.configure(weights, &_weights_hwio, weights_permutation_vector);
     transform_weights_kernel->configure(&_weights_hwio, &_kernel_storage, kernel_matrix_stride, out_channels, in_channels);
 
     // Configure GEMM function
     _memory_group.manage(&_output_transformed);
     _gemm_function.configure(&_input_transformed, &_kernel_storage, nullptr, &_output_transformed, 1.0f, 0.f);
     _input_transformed.allocator()->allocate();
 
     // Configure output transform function
     // The biases tensor has not been allocated at this point in time, the output transform will add the biases to the final result in the run() method
     if(data_layout == DataLayout::NCHW)
     {
         _memory_group.manage(&_output_nhwc);
         output_to_use = &_output_nhwc;
     }
     const arm_gemm::Activation activation = arm_gemm_activation_from_acl_activation(act_info);
 
     transform_output_kernel->configure(biases,
                                        &_output_transformed,
                                        output_matrix_stride,
                                        output_to_use,
                                        in_shape.n_batches,
                                        output_shape.first,
                                        output_shape.second,
                                        out_channels,
                                        &_output_workspace,
                                        activation);
 
     const size_t output_workspace_size = transform_output_kernel->get_working_space_size(max_num_threads);
     TensorInfo   output_workspace_info(TensorShape(output_workspace_size), 1, _output->info()->data_type());
     _output_workspace.allocator()->init(output_workspace_info);
     _output_workspace.allocator()->allocate();
     _output_transformed.allocator()->allocate();
 
     // Reorder the convoluted output to ACL's ordering NCHW
     if(data_layout == DataLayout::NCHW)
     {
         _permute_output.configure(&_output_nhwc, _output, PermutationVector(1U, 2U, 0U));
         _output_nhwc.allocator()->allocate();
     }
 
     _transform_input_kernel   = std::move(transform_input_kernel);
     _transform_weights_kernel = std::move(transform_weights_kernel);
     _transform_output_kernel  = std::move(transform_output_kernel);
 
     //Configure Activation Layer
     _is_activationlayer_enabled = act_info.enabled() && !fuse_function_supported(act_info);
     if(_is_activationlayer_enabled)
     {
         _activationlayer_function.configure(_output, nullptr, act_info);
     }
 }

◆ operator=() [1/2]

NEWinogradConvolutionLayer& operator= ( NEWinogradConvolutionLayer && )

delete

Prevent instances of this class from being moved (as this class contains non-movable objects).

◆ operator=() [2/2]

NEWinogradConvolutionLayer& operator= ( const NEWinogradConvolutionLayer & )

delete

Prevent instances of this class from being copied (as this class contains pointers).

◆ prepare()

void prepare ( )

override virtual

Prepare the function for executing.

Any one-off pre-processing step required by the function is handled here.

Note: The prepare stage might not need all the function's buffers' backing memory to be available in order to execute.

Reimplemented from IFunction.

Definition at line 757 of file NEWinogradConvolutionLayer.cpp.

References TensorAllocator::allocate(), Tensor::allocator(), Window::DimX, TensorAllocator::free(), Scheduler::get(), ITensor::is_used(), ITensor::mark_as_unused(), NEGEMM::prepare(), ICPPSimpleFunction::run(), and IScheduler::schedule().

Referenced by NEWinogradConvolutionLayer::run().

 {
     if(!_is_prepared)
     {
         // Permute weights
         _weights_hwio.allocator()->allocate();
         _permute_weights.run();
         _weights->mark_as_unused();
 
         // Transform weights
         _kernel_storage.allocator()->allocate();
         NEScheduler::get().schedule(_transform_weights_kernel.get(), Window::DimX);
         _weights_hwio.allocator()->free();
 
         _gemm_function.prepare();
         if(!_kernel_storage.is_used())
         {
             _kernel_storage.allocator()->free();
         }
 
         _is_prepared = true;
     }
 }

◆ run()

void run ( )

override virtual

Run the kernels contained in the function.

For Neon kernels:

Multi-threading is used for the kernels which are parallelisable.
By default std::thread::hardware_concurrency() threads are used.

Note: CPPScheduler::set_num_threads() can be used to manually set the number of threads

For OpenCL kernels:

All the kernels are enqueued on the queue associated with CLScheduler.
The queue is then flushed.

Note: The function will not block until the kernels are executed. It is the user's responsibility to wait.

Note: prepare() is called on the first run if it has not already been done.

Implements IFunction.

Definition at line 612 of file NEWinogradConvolutionLayer.cpp.

References ITensorInfo::data_layout(), Window::DimX, Scheduler::get(), ITensor::info(), arm_compute::NCHW, NEWinogradConvolutionLayer::prepare(), ICPPSimpleFunction::run(), NEActivationLayer::run(), NEGEMM::run(), and IScheduler::schedule().

 {
     const DataLayout data_layout = _input->info()->data_layout();
 
     prepare();
 
     MemoryGroupResourceScope scope_mg(_memory_group);
 
     if(data_layout == DataLayout::NCHW)
     {
         //Bring channels to the front as Winograd code expects the tensor to be in the format NHWC
         _permute_input.run();
     }
 
     // Transform input tensor to the winograd domain
     NEScheduler::get().schedule(_transform_input_kernel.get(), Window::DimX);
 
     //Run 16 GEMMs in multiple threads, each kernel runs one or more GEMMs
     _gemm_function.run();
 
     // Transform output tensor to the spatial domain
     NEScheduler::get().schedule(_transform_output_kernel.get(), Window::DimX);
 
     if(data_layout == DataLayout::NCHW)
     {
         // Reorder the convoluted output to ACL's ordering NCHW
         _permute_output.run();
     }
 
     if(_is_activationlayer_enabled)
     {
         _activationlayer_function.run();
     }
 }

◆ validate()

Status validate	(	const ITensorInfo *	input,
		const ITensorInfo *	weights,
		const ITensorInfo *	biases,
		const ITensorInfo *	output,
		const PadStrideInfo &	conv_info,
		const ActivationLayerInfo &	act_info = `ActivationLayerInfo()`,
		bool	enable_fast_math = `false`
	)

static

Static function to check if given info will lead to a valid configuration of NEWinogradConvolutionLayer.

Parameters

[in]	input	Source tensor. The 3 lower dimensions represent a single input [width, height, IFM], while every optional dimension from the 4th upwards represents a batch of inputs. Data types supported: F16/F32.
[in]	weights	Weights tensor. Weights are a 4D tensor with dimensions [kernel_x, kernel_y, IFM, OFM]. Data type supported: Same as `input`. Supported kernel sizes: 3x3 and 5x5; for F32, configure() also handles 1x3, 3x1, 1x5, 5x1, 1x7 and 7x1 kernels, while F16 supports only 3x3.
[in]	biases	Biases tensor. Can be nullptr. Shared biases supported. Biases are a 1D tensor with dimensions [OFM]. Data type supported: Same as `weights`.
[in]	output	Destination tensor. The 3 lower dimensions represent a single output [width, height, OFM], while the remaining dimensions represent a batch of outputs. Data types supported: Same as `input`.
[in]	conv_info	Contains padding and stride information described in PadStrideInfo. Currently only unit strides are supported.
[in]	act_info	(Optional) Activation layer information in case of a fused activation.
[in]	enable_fast_math	(Optional) Enable fast math computation. If this flag is set, the function may dispatch the fastest available implementation, which can reduce accuracy. Default is false.

Returns: a status

Definition at line 647 of file NEWinogradConvolutionLayer.cpp.

References ARM_COMPUTE_RETURN_ERROR_MSG, ARM_COMPUTE_RETURN_ERROR_ON_MSG, ARM_COMPUTE_RETURN_ERROR_ON_NULLPTR, ARM_COMPUTE_RETURN_ON_ERROR, ICloneable< T >::clone(), TensorInfo::clone(), arm_compute::misc::shape_calculator::compute_winograd_filter_transform_shape(), arm_compute::misc::shape_calculator::compute_winograd_input_transform_shape(), ITensorInfo::data_layout(), ITensorInfo::data_type(), ITensorInfo::dimension(), arm_compute::get_data_layout_dimension_index(), arm_compute::HEIGHT, arm_compute::test::validation::idx_height, arm_compute::test::validation::idx_width, PadStrideInfo::pad_bottom(), PadStrideInfo::pad_left(), PadStrideInfo::pad_right(), PadStrideInfo::pad_top(), TensorInfo::tensor_shape(), arm_compute::validate_arguments(), and arm_compute::WIDTH.

Referenced by NEConvolutionLayer::get_convolution_method(), and NEConvolutionLayer::validate().

 {
     ARM_COMPUTE_RETURN_ERROR_ON_NULLPTR(input, weights, output);
     ARM_COMPUTE_RETURN_ON_ERROR(validate_arguments(input, weights, biases, output, conv_info));
 
     // Get indices for the width and height
     const size_t idx_width  = get_data_layout_dimension_index(input->data_layout(), DataLayoutDimension::WIDTH);
     const size_t idx_height = get_data_layout_dimension_index(input->data_layout(), DataLayoutDimension::HEIGHT);
 
     // Input shape, kernel size and output tile
     const Size2D   input_dims  = Size2D(input->dimension(idx_width), input->dimension(idx_height));
     const Size2D   kernel_size = Size2D(weights->dimension(idx_width), weights->dimension(idx_height));
     const DataType data_type   = input->data_type();
     const Size2D   output_tile = winograd_output_tile(input_dims, kernel_size, data_type);
 
     // Check if the Winograd configuration requires fast math
     if(!enable_fast_math)
     {
         ARM_COMPUTE_RETURN_ERROR_ON_MSG(check_support_fast_math(output_tile, kernel_size, data_type),
                                         "This Winograd configuration requires enable_fast_math=true");
     }
 
     const WinogradInfo winograd_info = WinogradInfo(output_tile,
                                                     kernel_size,
                                                     input_dims,
                                                     conv_info,
                                                     input->data_layout());
 
     // Validate input transform
     const TensorShape input0_shape = misc::shape_calculator::compute_winograd_input_transform_shape(*input, winograd_info);
     const TensorInfo  input0       = input->clone()->set_tensor_shape(input0_shape);
     // Validate filter transform
     const TensorShape input1_shape = misc::shape_calculator::compute_winograd_filter_transform_shape(*weights, winograd_info);
     const TensorInfo  input1       = weights->clone()->set_tensor_shape(input1_shape);
     // Validate batched matrix multiply
     TensorShape batched_mm_output_shape = input0.tensor_shape();
     batched_mm_output_shape[0]          = input1.tensor_shape()[0];
     const TensorInfo batched_mm_output  = input0.clone()->set_tensor_shape(batched_mm_output_shape);
 
     if(kernel_size == Size2D(3, 3))
     {
         ARM_COMPUTE_RETURN_ERROR_ON_MSG(conv_info.pad_top() != 0u && conv_info.pad_top() != 1, "Only SAME or VALID padding supported");
         ARM_COMPUTE_RETURN_ERROR_ON_MSG(conv_info.pad_bottom() != 0u && conv_info.pad_bottom() != 1, "Only SAME or VALID padding supported");
         ARM_COMPUTE_RETURN_ERROR_ON_MSG(conv_info.pad_left() != 0u && conv_info.pad_left() != 1, "Only SAME or VALID padding supported");
         ARM_COMPUTE_RETURN_ERROR_ON_MSG(conv_info.pad_right() != 0u && conv_info.pad_right() != 1, "Only SAME or VALID padding supported");
         ARM_COMPUTE_RETURN_ERROR_ON_MSG(conv_info.pad_right() != conv_info.pad_left(), "Only SAME or VALID padding supported");
         ARM_COMPUTE_RETURN_ERROR_ON_MSG(conv_info.pad_top() != conv_info.pad_bottom(), "Only SAME or VALID padding supported");
         ARM_COMPUTE_RETURN_ERROR_ON_MSG(conv_info.pad_top() != conv_info.pad_left(), "Only SAME or VALID padding supported");
         return validate_kernel_3x3(input_dims, input, &input0, &input1, &batched_mm_output, weights, biases, output, winograd_info, act_info);
     }
     else if(kernel_size == Size2D(5, 5))
     {
         ARM_COMPUTE_RETURN_ERROR_ON_MSG(conv_info.pad_top() != 0u && conv_info.pad_top() != 2, "Only SAME or VALID padding supported");
         ARM_COMPUTE_RETURN_ERROR_ON_MSG(conv_info.pad_left() != 0u && conv_info.pad_left() != 2, "Only SAME or VALID padding supported");
         ARM_COMPUTE_RETURN_ERROR_ON_MSG(conv_info.pad_bottom() != 0u && conv_info.pad_bottom() != 2, "Only SAME or VALID padding supported");
         ARM_COMPUTE_RETURN_ERROR_ON_MSG(conv_info.pad_right() != 0u && conv_info.pad_right() != 2, "Only SAME or VALID padding supported");
         ARM_COMPUTE_RETURN_ERROR_ON_MSG(conv_info.pad_right() != conv_info.pad_left(), "Only SAME or VALID padding supported");
         ARM_COMPUTE_RETURN_ERROR_ON_MSG(conv_info.pad_top() != conv_info.pad_bottom(), "Only SAME or VALID padding supported");
         ARM_COMPUTE_RETURN_ERROR_ON_MSG(conv_info.pad_top() != conv_info.pad_left(), "Only SAME or VALID padding supported");
         return validate_kernel_5x5(input, &input0, &input1, &batched_mm_output, weights, biases, output, winograd_info, act_info);
     }
     else if(kernel_size == Size2D(3, 1))
     {
         ARM_COMPUTE_RETURN_ERROR_ON_MSG(conv_info.pad_left() != 0u && conv_info.pad_left() != 1, "Only SAME or VALID padding supported");
         ARM_COMPUTE_RETURN_ERROR_ON_MSG(conv_info.pad_right() != 0u && conv_info.pad_right() != 1, "Only SAME or VALID padding supported");
         ARM_COMPUTE_RETURN_ERROR_ON_MSG(conv_info.pad_top() != 0u || conv_info.pad_bottom() != 0u, "Only SAME or VALID padding supported");
         return validate_kernel_3x1(input, &input0, &input1, &batched_mm_output, weights, biases, output, winograd_info, act_info);
     }
     else if(kernel_size == Size2D(1, 3))
     {
         ARM_COMPUTE_RETURN_ERROR_ON_MSG(conv_info.pad_top() != 0u && conv_info.pad_top() != 1, "Only SAME or VALID padding supported");
         ARM_COMPUTE_RETURN_ERROR_ON_MSG(conv_info.pad_bottom() != 0u && conv_info.pad_bottom() != 1, "Only SAME or VALID padding supported");
         ARM_COMPUTE_RETURN_ERROR_ON_MSG(conv_info.pad_left() != 0u || conv_info.pad_right() != 0u, "Only SAME or VALID padding supported");
         return validate_kernel_1x3(input, &input0, &input1, &batched_mm_output, weights, biases, output, winograd_info, act_info);
     }
     else if(kernel_size == Size2D(5, 1))
     {
         ARM_COMPUTE_RETURN_ERROR_ON_MSG(conv_info.pad_left() != 0u && conv_info.pad_left() != 2, "Only SAME or VALID padding supported");
         ARM_COMPUTE_RETURN_ERROR_ON_MSG(conv_info.pad_right() != 0u && conv_info.pad_right() != 2, "Only SAME or VALID padding supported");
         ARM_COMPUTE_RETURN_ERROR_ON_MSG(conv_info.pad_top() != 0u || conv_info.pad_bottom() != 0u, "Only SAME or VALID padding supported");
         return validate_kernel_5x1(input, &input0, &input1, &batched_mm_output, weights, biases, output, winograd_info, act_info);
     }
     else if(kernel_size == Size2D(1, 5))
     {
         ARM_COMPUTE_RETURN_ERROR_ON_MSG(conv_info.pad_top() != 0u && conv_info.pad_top() != 2, "Only SAME or VALID padding supported");
         ARM_COMPUTE_RETURN_ERROR_ON_MSG(conv_info.pad_bottom() != 0u && conv_info.pad_bottom() != 2, "Only SAME or VALID padding supported");
         ARM_COMPUTE_RETURN_ERROR_ON_MSG(conv_info.pad_left() != 0u || conv_info.pad_right() != 0u, "Only SAME or VALID padding supported");
         return validate_kernel_1x5(input, &input0, &input1, &batched_mm_output, weights, biases, output, winograd_info, act_info);
     }
     else if(kernel_size == Size2D(7, 1))
     {
         ARM_COMPUTE_RETURN_ERROR_ON_MSG(conv_info.pad_left() != 0u && conv_info.pad_left() != 3, "Only SAME or VALID padding supported");
         ARM_COMPUTE_RETURN_ERROR_ON_MSG(conv_info.pad_right() != 0u && conv_info.pad_right() != 3, "Only SAME or VALID padding supported");
         ARM_COMPUTE_RETURN_ERROR_ON_MSG(conv_info.pad_top() != 0u || conv_info.pad_bottom() != 0u, "Only SAME or VALID padding supported");
         return validate_kernel_7x1(input, &input0, &input1, &batched_mm_output, weights, biases, output, winograd_info, act_info);
     }
     else if(kernel_size == Size2D(1, 7))
     {
         ARM_COMPUTE_RETURN_ERROR_ON_MSG(conv_info.pad_top() != 0u && conv_info.pad_top() != 3, "Only SAME or VALID padding supported");
         ARM_COMPUTE_RETURN_ERROR_ON_MSG(conv_info.pad_bottom() != 0u && conv_info.pad_bottom() != 3, "Only SAME or VALID padding supported");
         ARM_COMPUTE_RETURN_ERROR_ON_MSG(conv_info.pad_left() != 0u || conv_info.pad_right() != 0u, "Only SAME or VALID padding supported");
         return validate_kernel_1x7(input, &input0, &input1, &batched_mm_output, weights, biases, output, winograd_info, act_info);
     }
     else
     {
         ARM_COMPUTE_RETURN_ERROR_MSG("Kernel shape not supported");
     }
 }
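The output tile returned by `winograd_output_tile()` fixes the size of every transformed tile, and the batched matrix multiply shape is built by copying the input-transform shape and overwriting dimension 0 with dimension 0 of the filter-transform shape. The standalone sketch below works through that arithmetic for a square Winograd F(m x m, r x r) convolution. It assumes an input-transform shape of {C_in, num_tiles, tile_elems} and a filter-transform shape of {C_out, C_in, tile_elems}. This ordering is an assumption made for illustration and is not taken from `compute_winograd_input_transform_shape` or `compute_winograd_filter_transform_shape`. `Shape3`, `WinogradShapes`, `winograd_shapes`, `ceil_div`, `direct_mults` and `winograd_mults` are hypothetical names and not part of the Compute Library API. The sketch also ignores 1D kernels and data layout.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical 3D shape used only for this sketch: {dim0, dim1, dim2}.
using Shape3 = std::array<std::size_t, 3>;

struct WinogradShapes
{
    Shape3 input_transform;  // assumed {C_in, num_tiles, tile_elems}
    Shape3 filter_transform; // assumed {C_out, C_in, tile_elems}
    Shape3 batched_mm;       // {C_out, num_tiles, tile_elems}
};

// Integer ceiling division, used to count (possibly partial) output tiles.
constexpr std::size_t ceil_div(std::size_t a, std::size_t b) { return (a + b - 1) / b; }

// m x m output tile, r x r kernel, out_w x out_h output plane.
WinogradShapes winograd_shapes(std::size_t m, std::size_t r,
                               std::size_t out_w, std::size_t out_h,
                               std::size_t c_in, std::size_t c_out)
{
    assert(m > 0 && r > 0); // tile and kernel sizes must be non-zero

    const std::size_t tile_side  = m + r - 1;             // edge of one input tile
    const std::size_t tile_elems = tile_side * tile_side; // one matrix multiply per element
    const std::size_t num_tiles  = ceil_div(out_w, m) * ceil_div(out_h, m);

    WinogradShapes s{};
    s.input_transform  = {c_in, num_tiles, tile_elems};
    s.filter_transform = {c_out, c_in, tile_elems};
    // Same rule as in validate(): start from the input-transform shape and
    // replace dimension 0 with dimension 0 of the filter-transform shape.
    s.batched_mm    = s.input_transform;
    s.batched_mm[0] = s.filter_transform[0];
    return s;
}

// Multiplications per output tile and per input/output channel pair:
// direct convolution versus the element-wise Winograd stage
// (the cost of the input/filter/output transforms is ignored).
constexpr std::size_t direct_mults(std::size_t m, std::size_t r) { return m * m * r * r; }
constexpr std::size_t winograd_mults(std::size_t m, std::size_t r) { return (m + r - 1) * (m + r - 1); }
```

For example, `winograd_shapes(4, 3, 8, 8, 16, 32)` (3x3 kernel, 4x4 output tile, 8x8 output, 16 to 32 channels) gives 2 x 2 = 4 tiles of 6 x 6 = 36 elements, so the batched multiply output is {32, 4, 36}. The element-wise stage then needs 36 multiplications per tile and channel pair instead of the 144 required by direct convolution.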

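Every kernel-specific branch of `validate()` enforces one underlying rule. Along an axis where the kernel extent is k, each padding value must be either 0 (VALID) or (k - 1) / 2 (SAME). For square kernels, SAME on one axis cannot be mixed with VALID on the other. The hypothetical `is_same_or_valid()` helper below restates that rule for any odd kernel size. It is a sketch, not library code. It also requires the padding to be symmetric on both axes, which the listing checks explicitly only for the 3x3 and 5x5 kernels. It does not check whether the kernel shape itself is supported.

```cpp
#include <cassert>

// Hypothetical padding description for this sketch (not PadStrideInfo).
struct Padding
{
    unsigned int left, right, top, bottom;
};

// One axis with kernel extent k: padding must be symmetric and equal to
// 0 (VALID) or (k - 1) / 2 (SAME). For k == 1 the SAME value is also 0.
bool axis_ok(unsigned int k, unsigned int pad_lo, unsigned int pad_hi)
{
    const unsigned int same = (k - 1) / 2;
    return pad_lo == pad_hi && (pad_lo == 0u || pad_lo == same);
}

bool is_same_or_valid(unsigned int kernel_w, unsigned int kernel_h, const Padding &p)
{
    assert(kernel_w > 0 && kernel_h > 0); // avoid unsigned underflow in (k - 1)

    const bool axes_ok = axis_ok(kernel_w, p.left, p.right) && axis_ok(kernel_h, p.top, p.bottom);
    // Square kernels: reject SAME on one axis combined with VALID on the other.
    const bool square_consistent = kernel_w != kernel_h || p.top == p.left;
    return axes_ok && square_consistent;
}
```

With this helper, a 3x3 kernel accepts {1, 1, 1, 1} and {0, 0, 0, 0} but rejects {1, 1, 0, 0}. A 7x1 kernel accepts {3, 3, 0, 0}. A 3x1 kernel rejects any non-zero top or bottom padding.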
The documentation for this class was generated from the following files:

arm_compute/runtime/NEON/functions/NEWinogradConvolutionLayer.h
src/runtime/NEON/functions/NEWinogradConvolutionLayer.cpp
