Basic function to execute FFT-based convolution on Neon.

#include <NEFFTConvolutionLayer.h>

Public Member Functions
	NEFFTConvolutionLayer (std::shared_ptr< IMemoryManager > memory_manager=nullptr)
	Default constructor.

	NEFFTConvolutionLayer (const NEFFTConvolutionLayer &)=delete
	Prevent instances of this class from being copied (as this class contains pointers).

	NEFFTConvolutionLayer (NEFFTConvolutionLayer &&)=delete
	Prevent instances of this class from being moved (as this class contains non-movable objects).

NEFFTConvolutionLayer &	operator= (const NEFFTConvolutionLayer &)=delete
	Prevent instances of this class from being copied (as this class contains pointers).

NEFFTConvolutionLayer &	operator= (NEFFTConvolutionLayer &&)=delete
	Prevent instances of this class from being moved (as this class contains non-movable objects).

	~NEFFTConvolutionLayer ()
	Default destructor.

void	configure (ITensor *input, const ITensor *weights, const ITensor *biases, ITensor *output, const PadStrideInfo &conv_info, const ActivationLayerInfo &act_info=ActivationLayerInfo(), bool enable_fast_math=false)
	Set the input and output tensors.

void	run () override
	Run the kernels contained in the function.

void	prepare () override
	Prepare the function for executing.

Public Member Functions inherited from IFunction
virtual	~IFunction ()=default
	Destructor.

Static Public Member Functions
static Status	validate (const ITensorInfo *input, const ITensorInfo *weights, const ITensorInfo *biases, const ITensorInfo *output, const PadStrideInfo &conv_info, const ActivationLayerInfo &act_info=ActivationLayerInfo(), bool enable_fast_math=false)
	Static function to check if given info will lead to a valid configuration of NEFFTConvolutionLayer.

Detailed Description

Basic function to execute FFT-based convolution on Neon.

This function calls the following Neon functions/kernels:

NEPermute Permute input if NHWC (only NCHW is supported).
NEPadLayer Pad input.
NEFFT2D Forward transform to the frequency domain.
NEComplexPixelWiseMultiplication Complex element-wise product of input and the weights.
NEReductionOperation Reduction across channels.
NEFFT2D Inverse transform back to the time domain.
NESlice Extract valid output.
NEArithmeticAddition Add bias.
NEActivationLayer Perform activation.
NEPermute Permute output if NHWC (only NCHW is supported).
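
The stages listed above (pad, forward transform, element-wise product, inverse transform, extract) can be illustrated in one dimension without the library. The following is a minimal sketch, not the library's implementation: `dft`, `fft_conv_same` and `direct_conv_same` are hypothetical names, a naive O(N^2) DFT stands in for NEFFT2D, and the kernel length is assumed to be odd with padding of half the kernel size on each side. The kernel is flipped before the transform so that the frequency-domain product matches the cross-correlation a convolution layer computes.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cvec = std::vector<std::complex<double>>;

// Naive O(N^2) DFT; a stand-in for NEFFT2D so the sketch has no dependencies.
cvec dft(const cvec &x, bool inverse)
{
    const std::size_t n    = x.size();
    const double      pi   = std::acos(-1.0);
    const double      sign = inverse ? 1.0 : -1.0;
    cvec              out(n);
    for(std::size_t k = 0; k < n; ++k)
    {
        std::complex<double> acc(0.0, 0.0);
        for(std::size_t t = 0; t < n; ++t)
        {
            const double angle = sign * 2.0 * pi * static_cast<double>((k * t) % n) / static_cast<double>(n);
            acc += x[t] * std::complex<double>(std::cos(angle), std::sin(angle));
        }
        out[k] = inverse ? acc / static_cast<double>(n) : acc;
    }
    return out;
}

// "Same"-padded 1D convolution layer computed in the frequency domain.
// Assumes an odd kernel length and pad = kernel.size() / 2 on both sides.
std::vector<double> fft_conv_same(const std::vector<double> &input, const std::vector<double> &kernel)
{
    assert(!input.empty() && kernel.size() % 2 == 1);
    const std::size_t n = input.size() + kernel.size() - 1; // full convolution length

    // Zero-pad the input and the flipped kernel to the full length
    cvec in_padded(n), k_padded(n);
    for(std::size_t i = 0; i < input.size(); ++i) in_padded[i] = input[i];
    for(std::size_t i = 0; i < kernel.size(); ++i) k_padded[i] = kernel[kernel.size() - 1 - i];

    // Forward transforms, element-wise complex product, inverse transform
    const cvec in_f = dft(in_padded, false);
    const cvec k_f  = dft(k_padded, false);
    cvec       prod(n);
    for(std::size_t i = 0; i < n; ++i) prod[i] = in_f[i] * k_f[i];
    const cvec full = dft(prod, true);

    // Extract the valid region, starting at kernel_size - pad - 1
    const std::size_t   pad   = kernel.size() / 2;
    const std::size_t   start = kernel.size() - pad - 1;
    std::vector<double> out(input.size());
    for(std::size_t i = 0; i < out.size(); ++i) out[i] = full[start + i].real();
    return out;
}

// Direct spatial-domain reference with the same zero padding.
std::vector<double> direct_conv_same(const std::vector<double> &input, const std::vector<double> &kernel)
{
    const std::ptrdiff_t n   = static_cast<std::ptrdiff_t>(input.size());
    const std::ptrdiff_t k   = static_cast<std::ptrdiff_t>(kernel.size());
    const std::ptrdiff_t pad = k / 2;
    std::vector<double>  out(input.size(), 0.0);
    for(std::ptrdiff_t i = 0; i < n; ++i)
    {
        for(std::ptrdiff_t j = 0; j < k; ++j)
        {
            const std::ptrdiff_t src = i + j - pad;
            if(src >= 0 && src < n) out[i] += input[src] * kernel[j];
        }
    }
    return out;
}
```

Calling both functions on the same input and kernel should give results that agree to within floating-point rounding. The library additionally reduces the product across input channels before the inverse transform, which this single-channel sketch omits.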

Definition at line 59 of file NEFFTConvolutionLayer.h.

Constructor & Destructor Documentation

◆ NEFFTConvolutionLayer() [1/3]

NEFFTConvolutionLayer ( std::shared_ptr< IMemoryManager > memory_manager = nullptr )

Default constructor.

Definition at line 61 of file NEFFTConvolutionLayer.cpp.

     : _memory_group(memory_manager),
       _flip_weights_func(),
       _permute_input_func(),
       _permute_output_func(),
       _permute_weights_func(),
       _permute_bias_func(),
       _pad_input_func(),
       _pad_weights_func(),
       _transform_input_func(memory_manager),
       _transform_weights_func(),
       _itransform_output_func(memory_manager),
       _prod_func(),
       _reduce_func(),
       _extract_output_func(),
       _bias_add_func(),
       _activation_layer_func(),
       _permuted_input(),
       _permuted_weights(),
       _permuted_bias(),
       _permuted_output(),
       _padded_input(),
       _padded_weights(),
       _flip_axis(),
       _flipped_weights(),
       _transformed_input(),
       _transformed_weights(),
       _input_weights_product(),
       _output_product(),
       _output_reduced(),
       _itransformed_output(),
       _reshaped_output(),
       _bias_output(),
       _original_weights(nullptr),
       _original_bias(nullptr),
       _is_activationlayer_enabled(false),
       _needs_permute(false),
       _has_bias(false),
       _is_prepared(false)
 {
 }

◆ NEFFTConvolutionLayer() [2/3]

NEFFTConvolutionLayer ( const NEFFTConvolutionLayer & )

delete

Prevent instances of this class from being copied (as this class contains pointers).

◆ NEFFTConvolutionLayer() [3/3]

NEFFTConvolutionLayer ( NEFFTConvolutionLayer && )

delete

Prevent instances of this class from being moved (as this class contains non-movable objects).

◆ ~NEFFTConvolutionLayer()

~NEFFTConvolutionLayer ( )

default

Default destructor.

Member Function Documentation

◆ configure()

void configure	(	ITensor *	input,
		const ITensor *	weights,
		const ITensor *	biases,
		ITensor *	output,
		const PadStrideInfo &	conv_info,
		const ActivationLayerInfo &	act_info = `ActivationLayerInfo()`,
		bool	enable_fast_math = `false`
	)

Set the input and output tensors.

Note: This function works only with square kernels (of any size) and unit strides, in both NCHW and NHWC data layouts.

Parameters

[in]	input	Source tensor. 3 lower dimensions represent a single input [width, height, IFM], while every optional dimension from 4 and above represents a batch of inputs. Data types supported: F32.
[in]	weights	Weights tensor. Weights are a 4D tensor with dimensions [kernel_x, kernel_y, IFM, OFM]. Data type supported: Same as `input`.
[in]	biases	Biases tensor. Shared biases supported. Biases are a 1D tensor with dimensions [OFM]. Data type supported: Same as `input`.
[out]	output	Destination tensor. 3 lower dimensions represent a single output [width, height, OFM], while the rest represent a batch of outputs. Data types supported: Same as `input`.
[in]	conv_info	Contains padding and stride information described in PadStrideInfo.
[in]	act_info	(Optional) Activation layer information in case of a fused activation.
[in]	enable_fast_math	(Optional) Enable fast math computation. Unused for Neon backend.

Definition at line 104 of file NEFFTConvolutionLayer.cpp.

References TensorAllocator::allocate(), Tensor::allocator(), ARM_COMPUTE_UNUSED, arm_compute::auto_init_if_empty(), Tensor::buffer(), ICloneable< T >::clone(), NEReverse::configure(), NEPermute::configure(), NEFFT2D::configure(), NEReductionOperation::configure(), NEActivationLayer::configure(), NEArithmeticAddition::configure(), NEPadLayer::configure(), NESlice::configure(), NEComplexPixelWiseMultiplication::configure(), ITensorInfo::data_layout(), FFT2DInfo::direction, ActivationLayerInfo::enabled(), arm_compute::get_data_layout_dimension_index(), arm_compute::HEIGHT, arm_compute::test::validation::idx_height, arm_compute::test::validation::idx_width, ITensor::info(), Tensor::info(), TensorAllocator::init(), arm_compute::test::validation::input, arm_compute::Inverse, MemoryGroup::manage(), arm_compute::NCHW, arm_compute::NHWC, PadStrideInfo::pad_bottom(), PadStrideInfo::pad_left(), PadStrideInfo::pad_right(), PadStrideInfo::pad_top(), TensorShape::remove_dimension(), ITensorInfo::set_data_layout(), arm_compute::SUM, ITensorInfo::tensor_shape(), arm_compute::U, arm_compute::U32, arm_compute::WIDTH, arm_compute::WRAP, Dimensions< T >::x(), and Dimensions< T >::y().

 {
     ARM_COMPUTE_UNUSED(enable_fast_math);
 
     _original_weights = weights;
     _original_bias    = biases;
 
     // Flag if bias addition is required
     _has_bias = biases != nullptr;
 
     // Get indices for the width and height
     const size_t idx_width  = get_data_layout_dimension_index(input->info()->data_layout(), DataLayoutDimension::WIDTH);
     const size_t idx_height = get_data_layout_dimension_index(input->info()->data_layout(), DataLayoutDimension::HEIGHT);
 
     // Input shape, kernel size and output tile
     const Size2D input_dims  = Size2D(input->info()->tensor_shape()[idx_width], input->info()->tensor_shape()[idx_height]);
     const Size2D kernel_size = Size2D(weights->info()->tensor_shape()[idx_width], weights->info()->tensor_shape()[idx_height]);
     const Size2D pad_valid   = Size2D(pad_decomposable(input_dims.x() + kernel_size.x() - 1),
                                       pad_decomposable(input_dims.y() + kernel_size.y() - 1));
     // Tensors to use
     ITensor       *input_to_use   = input;
     const ITensor *weights_to_use = weights;
     ITensor       *output_to_use  = _has_bias ? &_bias_output : output;
 
     // Permute bias
     if(biases != nullptr)
     {
         _permute_bias_func.configure(biases, &_permuted_bias, PermutationVector(1U, 2U, 0U));
         _permuted_bias.info()->set_data_layout(DataLayout::NCHW);
     }
 
     // Permute input if needed
     _needs_permute = input->info()->data_layout() == DataLayout::NHWC;
     if(_needs_permute)
     {
         _memory_group.manage(&_permuted_input);
         // Configure the function to transform the input tensor from NHWC -> NCHW
         _permute_input_func.configure(input, &_permuted_input, PermutationVector(1U, 2U, 0U));
         _permuted_input.info()->set_data_layout(DataLayout::NCHW);
 
         // Configure the function to transform the weights tensor from HWI -> IHW
         _permute_weights_func.configure(weights, &_permuted_weights, PermutationVector(1U, 2U, 0U));
         _permuted_weights.info()->set_data_layout(DataLayout::NCHW);
 
         input_to_use   = &_permuted_input;
         weights_to_use = &_permuted_weights;
     }
 
     // Flip weights
     _flipped_weights.allocator()->init(weights_to_use->info()->clone()->set_is_resizable(true).reset_padding());
     _flip_axis.allocator()->init(TensorInfo(TensorShape(2U), 1, DataType::U32));
     _flip_weights_func.configure(weights_to_use, &_flipped_weights, &_flip_axis);
 
     // Pad weights
     const PaddingList padding_w = { { 0, input_dims.x() + pad_valid.x() - 1 }, { 0, input_dims.y() + pad_valid.y() - 1 } };
     _pad_weights_func.configure(&_flipped_weights, &_padded_weights, padding_w);
 
     // Transform weights
     _transform_weights_func = std::make_unique<NEFFT2D>();
     _transform_weights_func->configure(&_padded_weights, &_transformed_weights, FFT2DInfo());
 
     // Pad input
     const PaddingList padding_in = { { 0, kernel_size.x() + pad_valid.x() - 1 }, { 0, kernel_size.y() + pad_valid.y() - 1 } };
     _memory_group.manage(&_padded_input);
     _pad_input_func.configure(input_to_use, &_padded_input, padding_in);
     if(_needs_permute)
     {
         _permuted_input.allocator()->allocate();
     }
 
     // Transform input
     _memory_group.manage(&_transformed_input);
     _transform_input_func.configure(&_padded_input, &_transformed_input, FFT2DInfo());
     _padded_input.allocator()->allocate();
 
     // Perform product
     _memory_group.manage(&_output_product);
     _prod_func.configure(&_transformed_input, &_transformed_weights, &_output_product);
     _transformed_input.allocator()->allocate();
 
     // Perform reduction
     _memory_group.manage(&_output_reduced);
     _reduce_func.configure(&_output_product, &_output_reduced, 2, ReductionOperation::SUM);
     _output_product.allocator()->allocate();
 
     // Transform output
     _memory_group.manage(&_itransformed_output);
     FFT2DInfo itranform_info;
     itranform_info.direction = FFTDirection::Inverse;
     _itransformed_output.allocator()->init(_output_reduced.info()->clone()->set_is_resizable(true).set_num_channels(1).reset_padding());
     _itransform_output_func.configure(&_output_reduced, &_itransformed_output, itranform_info);
     _output_reduced.allocator()->allocate();
 
     // Reshape output
     TensorShape reshaped_shape = _itransformed_output.info()->tensor_shape();
     reshaped_shape.remove_dimension(2);
     _reshaped_output.allocator()->init(_itransformed_output.info()->clone()->set_tensor_shape(reshaped_shape));
 
     // Extract correct region
     const int start_left = kernel_size.x() - conv_info.pad_left() - 1;
     const int start_top  = kernel_size.y() - conv_info.pad_top() - 1;
     const int end_right  = _reshaped_output.info()->tensor_shape().x() - (kernel_size.x() - conv_info.pad_right() - 1) - pad_valid.x();
     const int end_botton = _reshaped_output.info()->tensor_shape().y() - (kernel_size.y() - conv_info.pad_bottom() - 1) - pad_valid.y();
     if(_has_bias)
     {
         _memory_group.manage(&_bias_output);
     }
     else if(_needs_permute)
     {
         output_to_use = &_permuted_output;
         _memory_group.manage(&_permuted_output);
     }
     _extract_output_func.configure(&_reshaped_output, output_to_use, Coordinates(start_left, start_top), Coordinates(end_right, end_botton));
     _reshaped_output.allocator()->allocate();
     _itransformed_output.allocator()->allocate();
 
     // Add bias
     if(biases != nullptr)
     {
         output_to_use = output;
         if(_needs_permute)
         {
             output_to_use = &_permuted_output;
             _memory_group.manage(&_permuted_output);
         }
         auto_init_if_empty(*output_to_use->info(), *_bias_output.info());
         _bias_add_func.configure(&_bias_output, &_permuted_bias, output_to_use, ConvertPolicy::WRAP);
         _bias_output.allocator()->allocate();
     }
 
     // Permute output
     if(_needs_permute)
     {
         // Configure the function to transform the convoluted output to ACL's native ordering format NCHW
         _permuted_output.info()->set_data_layout(DataLayout::NCHW);
         _permute_output_func.configure(&_permuted_output, output, PermutationVector(2U, 0U, 1U));
 
         // Allocate tensors
         _permuted_output.allocator()->allocate();
     }
 
     // Configure Activation Layer
     _is_activationlayer_enabled = act_info.enabled();
     if(_is_activationlayer_enabled)
     {
         _activation_layer_func.configure(output, nullptr, act_info);
     }
 
     // Setup flip axis data
     _flip_axis.allocator()->allocate();
 
     auto axis_data = reinterpret_cast<uint32_t *>(_flip_axis.buffer());
     axis_data[0]   = 0;
     axis_data[1]   = 1;
 }
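
The padding and slice arithmetic in the listing above can be checked along a single axis. This is a minimal sketch under stated assumptions: `is_decomposable`, `extra_pad`, `Axis` and `plan_axis` are hypothetical names, and the rule that a length is usable when it factors into the radices {2, 3, 4, 5, 7, 8} is an assumption about the FFT backend, since the definition of `pad_decomposable` is not shown on this page.

```cpp
#include <cassert>

// Assumption: a length is FFT-friendly when it factors completely into the
// radices {2, 3, 4, 5, 7, 8}, i.e. its only prime factors are 2, 3, 5 and 7.
bool is_decomposable(unsigned int n)
{
    assert(n > 0);
    const unsigned int primes[] = { 2u, 3u, 5u, 7u };
    for(unsigned int p : primes)
    {
        while(n % p == 0) n /= p;
    }
    return n == 1;
}

// Extra padding needed to grow n to the next decomposable length
// (hypothetical counterpart of the pad_decomposable() call in the listing).
unsigned int extra_pad(unsigned int n)
{
    unsigned int pad = 0;
    while(!is_decomposable(n + pad)) ++pad;
    return pad;
}

struct Axis
{
    unsigned int pad_valid; // extra padding added to reach a decomposable length
    unsigned int padded;    // length of the padded input, padded weights and FFT output
    int          start;     // first element kept by the slice
    int          end;       // one past the last element kept by the slice
};

// Mirrors the per-axis arithmetic of configure() for one dimension.
Axis plan_axis(unsigned int in, unsigned int kernel, unsigned int pad_before, unsigned int pad_after)
{
    assert(pad_before < kernel && pad_after < kernel);
    Axis a{};
    a.pad_valid = extra_pad(in + kernel - 1);
    a.padded    = in + kernel - 1 + a.pad_valid;
    a.start     = static_cast<int>(kernel - pad_before - 1);
    a.end       = static_cast<int>(a.padded - (kernel - pad_after - 1) - a.pad_valid);
    return a;
}
```

With padding of `kernel / 2` on both sides, `end - start` equals the input length, which is consistent with the output-shape check in validate().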

◆ operator=() [1/2]

NEFFTConvolutionLayer& operator= ( const NEFFTConvolutionLayer & )

delete

Prevent instances of this class from being copied (as this class contains pointers).

◆ operator=() [2/2]

NEFFTConvolutionLayer& operator= ( NEFFTConvolutionLayer && )

delete

Prevent instances of this class from being moved (as this class contains non-movable objects).

◆ prepare()

void prepare ( )

override virtual

Prepare the function for executing.

Any one-off pre-processing step required by the function is handled here.

Note: The prepare stage might not need all of the function's buffers' backing memory to be available in order to execute.

Reimplemented from IFunction.

Definition at line 348 of file NEFFTConvolutionLayer.cpp.

References TensorAllocator::allocate(), Tensor::allocator(), ARM_COMPUTE_ERROR_ON, TensorAllocator::free(), ITensor::is_used(), ITensor::mark_as_unused(), INESimpleFunctionNoBorder::run(), NEPermute::run(), and NEPadLayer::run().

Referenced by NEFFTConvolutionLayer::run().

 {
     if(!_is_prepared)
     {
         // Permute bias to NCHW
         if(_original_bias != nullptr)
         {
             _permuted_bias.allocator()->allocate();
             _permute_bias_func.run();
             _original_bias->mark_as_unused();
         }
 
         const ITensor *cur_weights = _original_weights;
 
         // Permute weights
         if(_needs_permute)
         {
             ARM_COMPUTE_ERROR_ON(!cur_weights->is_used());
 
             _permuted_weights.allocator()->allocate();
             _permute_weights_func.run();
             cur_weights->mark_as_unused();
             cur_weights = &_permuted_weights;
         }
 
         // Flip weights
         _flipped_weights.allocator()->allocate();
         _flip_weights_func.run();
         cur_weights->mark_as_unused();
 
         // Pad weights
         _padded_weights.allocator()->allocate();
         _pad_weights_func.run();
         _flipped_weights.mark_as_unused();
         _flipped_weights.allocator()->free();
 
         // Transform weights to frequency domain
         _transformed_weights.allocator()->allocate();
         _transform_weights_func->run();
         _transform_weights_func.reset();
 
         _padded_weights.mark_as_unused();
         _padded_weights.allocator()->free();
 
         _is_prepared = true;
     }
 }
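
The one-off preparation pattern used by prepare() and run() can be reduced to a few lines. This is a minimal sketch, not library code: `PreparedScale` is a hypothetical class, and doubling each weight is a placeholder for the flip, pad and FFT steps applied to the weights.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical stand-in for a function with constant weights; not an ACL class.
class PreparedScale
{
public:
    explicit PreparedScale(std::vector<float> weights)
        : _weights(std::move(weights))
    {
    }

    // One-off step: transform the weights once, then release the originals.
    void prepare()
    {
        if(!_is_prepared)
        {
            _transformed.reserve(_weights.size());
            for(float w : _weights)
            {
                _transformed.push_back(w * 2.0f); // placeholder for flip / pad / FFT
            }
            _weights.clear();
            _weights.shrink_to_fit(); // original weights are no longer needed
            ++_prepare_count;
            _is_prepared = true;
        }
    }

    // Every run calls prepare(); only the first call does any work.
    float run(float x)
    {
        prepare();
        float acc = 0.0f;
        for(float t : _transformed)
        {
            acc += t * x;
        }
        return acc;
    }

    int prepare_count() const
    {
        return _prepare_count;
    }

private:
    std::vector<float> _weights;
    std::vector<float> _transformed{};
    int                _prepare_count{ 0 };
    bool               _is_prepared{ false };
};
```

Calling `run()` repeatedly leaves `prepare_count()` at 1. Because the guard sits inside prepare(), a caller may also invoke it explicitly ahead of the first run to move the weight transformation off the first inference.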

◆ run()

void run ( )

override virtual

Run the kernels contained in the function.

For Neon kernels:

Multi-threading is used for the kernels which are parallelisable.
By default std::thread::hardware_concurrency() threads are used.

Note: CPPScheduler::set_num_threads() can be used to manually set the number of threads

For OpenCL kernels:

All the kernels are enqueued on the queue associated with CLScheduler.
The queue is then flushed.

Note: The function will not block until the kernels are executed. It is the user's responsibility to wait. The first run calls prepare() if it has not already been called.

Implements IFunction.

Definition at line 307 of file NEFFTConvolutionLayer.cpp.

References Tensor::allocator(), Tensor::buffer(), TensorAllocator::import_memory(), NEFFTConvolutionLayer::prepare(), NEFFT2D::run(), NEPermute::run(), NEReductionOperation::run(), NEActivationLayer::run(), NEArithmeticAddition::run(), NEPadLayer::run(), NESlice::run(), and NEComplexPixelWiseMultiplication::run().

 {
     prepare();
 
     MemoryGroupResourceScope scope_mg(_memory_group);
 
     // Transform input
     if(_needs_permute)
     {
         _permute_input_func.run();
     }
     _pad_input_func.run();
     _transform_input_func.run();
 
     // Perform operations to frequency domain
     _prod_func.run();
 
     _reduce_func.run();
 
     // Transform output
     _itransform_output_func.run();
     _reshaped_output.allocator()->import_memory(_itransformed_output.buffer());
     _extract_output_func.run();
 
     // Add bias
     if(_has_bias)
     {
         _bias_add_func.run();
     }
     if(_needs_permute)
     {
         _permute_output_func.run();
     }
 
     // Run activation layer
     if(_is_activationlayer_enabled)
     {
         _activation_layer_func.run();
     }
 }

◆ validate()

Status validate	(	const ITensorInfo *	input,
		const ITensorInfo *	weights,
		const ITensorInfo *	biases,
		const ITensorInfo *	output,
		const PadStrideInfo &	conv_info,
		const ActivationLayerInfo &	act_info = `ActivationLayerInfo()`,
		bool	enable_fast_math = `false`
	)

static

Static function to check if given info will lead to a valid configuration of NEFFTConvolutionLayer.

Note: This function works only with square kernels (of any size) and unit strides, in both NCHW and NHWC data layouts.

Parameters

[in]	input	Source tensor. 3 lower dimensions represent a single input [width, height, IFM], while every optional dimension from 4 and above represents a batch of inputs. Data types supported: F32.
[in]	weights	Weights tensor. Weights are a 4D tensor with dimensions [kernel_x, kernel_y, IFM, OFM]. Data type supported: Same as `input`.
[in]	biases	Biases tensor. Shared biases supported. Biases are a 1D tensor with dimensions [OFM]. Data type supported: Same as `input`.
[in]	output	Destination tensor. 3 lower dimensions represent a single output [width, height, OFM], while the rest represent a batch of outputs. Data types supported: Same as `input`.
[in]	conv_info	Contains padding and stride information described in PadStrideInfo.
[in]	act_info	(Optional) Activation layer information in case of a fused activation.
[in]	enable_fast_math	(Optional) Enable fast math computation. Unused for Neon backend.

Returns: a status

Definition at line 261 of file NEFFTConvolutionLayer.cpp.

References ARM_COMPUTE_RETURN_ERROR_ON, ARM_COMPUTE_RETURN_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN, ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES, ARM_COMPUTE_RETURN_ON_ERROR, ARM_COMPUTE_UNUSED, arm_compute::CHANNEL, ITensorInfo::data_layout(), ActivationLayerInfo::enabled(), arm_compute::F32, arm_compute::get_data_layout_dimension_index(), arm_compute::HEIGHT, arm_compute::test::validation::idx_height, arm_compute::test::validation::idx_width, PadStrideInfo::pad_bottom(), PadStrideInfo::pad_left(), PadStrideInfo::pad_right(), PadStrideInfo::pad_top(), PadStrideInfo::stride(), ITensorInfo::tensor_shape(), ITensorInfo::total_size(), NEActivationLayer::validate(), arm_compute::WIDTH, and Dimensions< T >::x().

Referenced by NEConvolutionLayer::get_convolution_method(), and NEConvolutionLayer::validate().

 {
     ARM_COMPUTE_UNUSED(enable_fast_math);
 
     ARM_COMPUTE_RETURN_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN(input, 1, DataType::F32);
     ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES(input, weights);
 
     // Get indices for the width and height
     const size_t idx_width  = get_data_layout_dimension_index(input->data_layout(), DataLayoutDimension::WIDTH);
     const size_t idx_height = get_data_layout_dimension_index(input->data_layout(), DataLayoutDimension::HEIGHT);
 
     // Input shape, kernel size and output tile
     const Size2D kernel_size = Size2D(weights->tensor_shape()[idx_width], weights->tensor_shape()[idx_height]);
 
     // Strides
     const auto strides = conv_info.stride();
     ARM_COMPUTE_RETURN_ERROR_ON(strides.first != strides.second && strides.first != 1);
     ARM_COMPUTE_RETURN_ERROR_ON(kernel_size.x() != kernel_size.y());
     ARM_COMPUTE_RETURN_ERROR_ON(conv_info.pad_left() != (kernel_size.x() / 2) || conv_info.pad_right() != (kernel_size.x() / 2));
     ARM_COMPUTE_RETURN_ERROR_ON(conv_info.pad_top() != (kernel_size.y() / 2) || conv_info.pad_bottom() != (kernel_size.y() / 2));
 
     // Validate biases
     if(biases != nullptr)
     {
         const size_t idx_channels = get_data_layout_dimension_index(input->data_layout(), DataLayoutDimension::CHANNEL);
         ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES(input, biases);
         ARM_COMPUTE_RETURN_ERROR_ON(input->tensor_shape()[idx_channels] != biases->tensor_shape().x());
     }
 
     // Checks performed when output is configured
     if((output != nullptr) && (output->total_size() != 0))
     {
         ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES(input, output);
         ARM_COMPUTE_RETURN_ERROR_ON((input->tensor_shape()[idx_height] != output->tensor_shape()[idx_height]) || (input->tensor_shape()[idx_width] != output->tensor_shape()[idx_width]));
 
         // Validate Activation Layer
         if(act_info.enabled())
         {
             ARM_COMPUTE_RETURN_ON_ERROR(NEActivationLayer::validate(output, nullptr, act_info));
         }
     }
 
     return Status{};
 }
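
The documented constraints can be collected in a small predicate for checking parameters before calling validate(). This is a minimal sketch: `ConvParams` and `fft_conv_constraints_ok` are hypothetical names that are not part of the library, and the stride test follows the wording of the note (unit strides) rather than reproducing the expression in the listing literally.

```cpp
#include <cassert>

// Hypothetical plain-data view of the values read from the weights shape and PadStrideInfo.
struct ConvParams
{
    unsigned int kernel_w;
    unsigned int kernel_h;
    unsigned int stride_x;
    unsigned int stride_y;
    unsigned int pad_left;
    unsigned int pad_right;
    unsigned int pad_top;
    unsigned int pad_bottom;
};

// Returns true when the parameters satisfy the documented constraints:
// unit strides, a square kernel, and padding of kernel_size / 2 on every side.
bool fft_conv_constraints_ok(const ConvParams &p)
{
    const bool unit_stride = (p.stride_x == 1) && (p.stride_y == 1);
    const bool square      = (p.kernel_w == p.kernel_h);
    const bool same_pad_x  = (p.pad_left == p.kernel_w / 2) && (p.pad_right == p.kernel_w / 2);
    const bool same_pad_y  = (p.pad_top == p.kernel_h / 2) && (p.pad_bottom == p.kernel_h / 2);
    return unit_stride && square && same_pad_x && same_pad_y;
}
```

For example, a 3x3 kernel with stride 1 and padding 1 on every side passes, while the same kernel with stride 2, or a 5x5 kernel with padding 1, does not.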

The documentation for this class was generated from the following files:

arm_compute/runtime/NEON/functions/NEFFTConvolutionLayer.h
src/runtime/NEON/functions/NEFFTConvolutionLayer.cpp
