24.02.1
Function to run the deconvolution layer.
#include <NEDeconvolutionLayer.h>
Public Member Functions

NEDeconvolutionLayer(std::shared_ptr<IMemoryManager> memory_manager = nullptr)
    Constructor.
NEDeconvolutionLayer(const NEDeconvolutionLayer &) = delete
    Prevent instances of this class from being copied (as this class contains pointers).
NEDeconvolutionLayer(NEDeconvolutionLayer &&) = default
    Default move constructor.
NEDeconvolutionLayer &operator=(const NEDeconvolutionLayer &) = delete
    Prevent instances of this class from being copied (as this class contains pointers).
NEDeconvolutionLayer &operator=(NEDeconvolutionLayer &&) = default
    Default move assignment operator.
~NEDeconvolutionLayer() = default
    Default destructor.
void configure(ITensor *input, const ITensor *weights, const ITensor *bias, ITensor *output, const PadStrideInfo &info, bool enable_fast_math = false, const WeightsInfo &weights_info = WeightsInfo())
    Set the input, weights, biases and output tensors.
void run() override
    Run the kernels contained in the function.
void prepare() override
    Prepare the function for executing.

Public Member Functions inherited from IFunction

virtual ~IFunction() = default
    Destructor.

Static Public Member Functions

static Status validate(const ITensorInfo *input, const ITensorInfo *weights, const ITensorInfo *bias, const ITensorInfo *output, const PadStrideInfo &info, bool enable_fast_math = false, const WeightsInfo &weights_info = WeightsInfo())
    Static function to check if given info will lead to a valid configuration of NEDeconvolutionLayer.
Function to run the deconvolution layer.
Deconvolution Layer is the backward pass of Convolution Layer. First we transform the input depending on the stride and pad info, and then perform a 1x1 convolution pass. The input stride defines how many zeroes we should put between each element of the input, pad is the amount of padding, and finally a is a user-specified value, where a < stride - 1, that increases the padding at the top and right of the input image.
The relation between input to output is as follows:
\[ width\_output = (width\_input - 1) \cdot stride\_x - 2 \cdot padding\_x + kernel\_x \]
\[ height\_output = (height\_input - 1) \cdot stride\_y - 2 \cdot padding\_y + kernel\_y \]
where width_input and height_input are the sizes of the first and second input dimensions, width_output and height_output are the sizes of the first and second output dimensions, kernel_x and kernel_y are the convolution kernel sizes in x and y, and stride_x and stride_y are the input strides in the first and second dimensions.
The weights used by Deconvolution are supposed to be the same as the ones used for Convolution. Therefore, it will be necessary to use the weights in the reverse order to perform an actual convolution. This is achieved by using NEReverse.
This function calls the following kernels/functions:

CPPUpsample
NEReverse
NEConvolutionLayer
Definition at line 73 of file NEDeconvolutionLayer.h.
NEDeconvolutionLayer(std::shared_ptr<IMemoryManager> memory_manager = nullptr)
Constructor.
Definition at line 70 of file NEDeconvolutionLayer.cpp.
NEDeconvolutionLayer(const NEDeconvolutionLayer &) = delete
Prevent instances of this class from being copied (as this class contains pointers).
NEDeconvolutionLayer(NEDeconvolutionLayer &&) = default
Default move constructor.
~NEDeconvolutionLayer() = default
Default destructor.
void configure(ITensor             *input,
               const ITensor       *weights,
               const ITensor       *bias,
               ITensor             *output,
               const PadStrideInfo &info,
               bool                 enable_fast_math = false,
               const WeightsInfo   &weights_info     = WeightsInfo())
Set the input, weights, biases and output tensors.
Valid data layouts:
Valid data type configurations:
src0           | src1               | src2 | dst
---------------|--------------------|------|---------------
F16            | F16                | F16  | F16
F32            | F32                | F32  | F32
QASYMM8        | QASYMM8            | S32  | QASYMM8
QASYMM8        | QSYMM8_PER_CHANNEL | S32  | QASYMM8
QASYMM8_SIGNED | QASYMM8_SIGNED     | S32  | QASYMM8_SIGNED
QASYMM8_SIGNED | QSYMM8_PER_CHANNEL | S32  | QASYMM8_SIGNED
Parameters:
[in,out] input            Input tensor. 3 lower dimensions represent a single input, and an optional 4th dimension for batch of inputs. Data types supported: F32/F16/QASYMM8/QASYMM8_SIGNED.
[in]     weights          The 4d weights with dimensions [width, height, IFM, OFM]. Data type supported: same as input; can also be QSYMM8_PER_CHANNEL if input is QASYMM8/QASYMM8_SIGNED.
[in]     bias             Optional, ignored if NULL. The biases have one dimension. Data types supported: S32 for QASYMM8/QASYMM8_SIGNED input, F32 for F32 input, F16 for F16 input.
[out]    output           Output tensor. The output has the same number of dimensions as the input.
[in]     info             Contains padding and policies to be used in the deconvolution; this is described in PadStrideInfo.
[in]     enable_fast_math (Optional) Enable fast math computation. When this flag is set, the function may dispatch the fastest implementation available, which can reduce accuracy. Default is false.
[in]     weights_info     (Optional) Specifies the weight format. Default is unspecified. This parameter can be used to specify the weight format that is optimal for the GEMM convolution.
Definition at line 195 of file NEDeconvolutionLayer.cpp.
References TensorAllocator::allocate(), Tensor::allocator(), ARM_COMPUTE_ERROR_ON_NULLPTR, ARM_COMPUTE_ERROR_THROW_ON, ARM_COMPUTE_LOG_PARAMS, arm_compute::auto_init_if_empty(), bias, Tensor::buffer(), arm_compute::CEIL, ICloneable< T >::clone(), arm_compute::misc::shape_calculator::compute_deconvolution_output_shape(), arm_compute::misc::shape_calculator::compute_deconvolution_upsampled_shape(), CPPUpsample::configure(), NEReverse::configure(), NEConvolutionLayer::configure(), arm_compute::test::validation::conv_info, arm_compute::cpu::data_layout, arm_compute::deconvolution_output_dimensions(), ITensorInfo::dimension(), arm_compute::get_data_layout_dimension_index(), arm_compute::HEIGHT, arm_compute::cpu::height_idx, ITensor::info(), arm_compute::test::validation::info, TensorAllocator::init(), arm_compute::test::validation::input, MemoryGroup::manage(), arm_compute::test::validation::output_shape, PadStrideInfo::pad_bottom(), PadStrideInfo::pad_left(), PadStrideInfo::pad_right(), PadStrideInfo::pad_top(), TensorInfo::set_data_layout(), arm_compute::utils::cast::U, arm_compute::U32, NEDeconvolutionLayer::validate(), arm_compute::test::validation::weights_info, arm_compute::WIDTH, and arm_compute::cpu::width_idx.
NEDeconvolutionLayer &operator=(const NEDeconvolutionLayer &) = delete
Prevent instances of this class from being copied (as this class contains pointers).
NEDeconvolutionLayer &operator=(NEDeconvolutionLayer &&) = default
Default move assignment operator.
void prepare() override
Prepare the function for executing.
Any one-off pre-processing steps required by the function are handled here.
Reimplemented from IFunction.
Definition at line 294 of file NEDeconvolutionLayer.cpp.
References TensorAllocator::allocate(), Tensor::allocator(), ARM_COMPUTE_ERROR_ON, ITensor::is_used(), ITensor::mark_as_unused(), NEConvolutionLayer::prepare(), and INESimpleFunctionNoBorder::run().
Referenced by NEDeconvolutionLayer::run().
void run() override
Run the kernels contained in the function.
For CPU kernels:
For OpenCL kernels:
Implements IFunction.
Definition at line 281 of file NEDeconvolutionLayer.cpp.
References NEDeconvolutionLayer::prepare(), ICPPSimpleFunction::run(), and NEConvolutionLayer::run().
static Status validate(const ITensorInfo   *input,
                       const ITensorInfo   *weights,
                       const ITensorInfo   *bias,
                       const ITensorInfo   *output,
                       const PadStrideInfo &info,
                       bool                 enable_fast_math = false,
                       const WeightsInfo   &weights_info     = WeightsInfo())
Static function to check if given info will lead to a valid configuration of NEDeconvolutionLayer.
Parameters:
[in] input            Input tensor info. 3 lower dimensions represent a single input, and an optional 4th dimension for batch of inputs. Data types supported: F32/F16/QASYMM8/QASYMM8_SIGNED.
[in] weights          The 4d weights info with dimensions [width, height, IFM, OFM]. Data type supported: same as input; can also be QSYMM8_PER_CHANNEL if input is QASYMM8/QASYMM8_SIGNED.
[in] bias             (Optional) The biases have one dimension. Data types supported: S32 for QASYMM8/QASYMM8_SIGNED input, F32 for F32 input, F16 for F16 input.
[in] output           Output tensor info. The output has the same number of dimensions as the input.
[in] info             Contains padding and policies to be used in the deconvolution; this is described in PadStrideInfo.
[in] enable_fast_math (Optional) Enable fast math computation. When this flag is set, the function may dispatch the fastest implementation available, which can reduce accuracy. Default is false.
[in] weights_info     (Optional) Specifies the weight format. Default is unspecified. This parameter can be used to specify the weight format that is optimal for the GEMM convolution.
Definition at line 86 of file NEDeconvolutionLayer.cpp.
References ARM_COMPUTE_RETURN_ERROR_ON, ARM_COMPUTE_RETURN_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN, ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_LAYOUT, ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES, ARM_COMPUTE_RETURN_ERROR_ON_MSG, ARM_COMPUTE_RETURN_ERROR_ON_NULLPTR, ARM_COMPUTE_RETURN_ON_ERROR, arm_compute::BATCHES, bias, arm_compute::CEIL, arm_compute::CHANNEL, arm_compute::cpu::channel_idx, arm_compute::misc::shape_calculator::compute_deconvolution_output_shape(), arm_compute::misc::shape_calculator::compute_deconvolution_padding(), arm_compute::misc::shape_calculator::compute_deconvolution_upsampled_shape(), arm_compute::test::validation::conv_info, ITensorInfo::data_layout(), ITensorInfo::data_type(), arm_compute::deconvolution_output_dimensions(), ITensorInfo::dimension(), TensorInfo::dimension(), Window::DimX, Window::DimY, Window::DimZ, arm_compute::F16, arm_compute::F32, arm_compute::get_data_layout_dimension_index(), arm_compute::HEIGHT, arm_compute::cpu::height_idx, arm_compute::test::validation::info, arm_compute::test::validation::input, arm_compute::is_data_type_quantized(), arm_compute::is_data_type_quantized_asymmetric(), arm_compute::is_data_type_quantized_per_channel(), arm_compute::test::validation::output_shape, PadStrideInfo::pad_bottom(), PadStrideInfo::pad_left(), PadStrideInfo::pad_right(), PadStrideInfo::pad_top(), arm_compute::QASYMM8, arm_compute::QASYMM8_SIGNED, arm_compute::QSYMM8_PER_CHANNEL, arm_compute::S32, ITensorInfo::tensor_shape(), TensorShape::total_size(), arm_compute::utils::cast::U, NEConvolutionLayer::validate(), arm_compute::test::validation::weights_info, arm_compute::WIDTH, arm_compute::cpu::width_idx, Dimensions< T >::x(), Dimensions< T >::y(), and Dimensions< T >::z().
Referenced by NEDeconvolutionLayer::configure().