Neon kernel to accumulate the biases, if provided, or downscale in case of quantized input.

#include <NEDirectConvolutionLayerOutputStageKernel.h>

Public Member Functions
const char *	name () const override
	Name of the kernel.

	NEDirectConvolutionLayerOutputStageKernel ()
	Default constructor.

	NEDirectConvolutionLayerOutputStageKernel (const NEDirectConvolutionLayerOutputStageKernel &)=delete
	Prevent instances of this class from being copied (As this class contains pointers)

NEDirectConvolutionLayerOutputStageKernel &	operator= (const NEDirectConvolutionLayerOutputStageKernel &)=delete
	Prevent instances of this class from being copied (As this class contains pointers)

	NEDirectConvolutionLayerOutputStageKernel (NEDirectConvolutionLayerOutputStageKernel &&)=default
	Allow instances of this class to be moved.

NEDirectConvolutionLayerOutputStageKernel &	operator= (NEDirectConvolutionLayerOutputStageKernel &&)=default
	Allow instances of this class to be moved.

	~NEDirectConvolutionLayerOutputStageKernel ()=default
	Default destructor.

void	configure (ITensor *input, const ITensor *bias=nullptr, ITensor *output=nullptr, const DirectConvolutionLayerOutputStageKernelInfo &info=DirectConvolutionLayerOutputStageKernelInfo())
	Set the accumulate buffer and the biases of the kernel.

void	run (const Window &window, const ThreadInfo &info) override
	Execute the kernel on the passed window.

Public Member Functions inherited from ICPPKernel
virtual	~ICPPKernel ()=default
	Default destructor.

virtual void	run_nd (const Window &window, const ThreadInfo &info, const Window &thread_locator)
	Legacy compatibility layer for implementations that do not support thread_locator. In these cases the interface is simply narrowed down to the legacy version.

virtual void	run_op (ITensorPack &tensors, const Window &window, const ThreadInfo &info)
	Execute the kernel on the passed window.

Public Member Functions inherited from IKernel
	IKernel ()
	Constructor.

virtual	~IKernel ()=default
	Destructor.

virtual bool	is_parallelisable () const
	Indicates whether or not the kernel is parallelisable.

virtual BorderSize	border_size () const
	The size of the border for that kernel.

const Window &	window () const
	The maximum window the kernel can be executed on.

Static Public Member Functions
static Status	validate (const ITensorInfo *input, const ITensorInfo *bias=nullptr, const ITensorInfo *output=nullptr, const DirectConvolutionLayerOutputStageKernelInfo &info=DirectConvolutionLayerOutputStageKernelInfo())
	Static function to check if given info will lead to a valid configuration of NEDirectConvolutionLayerOutputStageKernel.

Detailed Description

Neon kernel to accumulate the biases, if provided, or downscale in case of quantized input.

Note: The bias is assumed to be shared. For quantized computations (i.e. input of S32 type), the output data type for auto-initialization must be passed as part of the DirectConvolutionLayerOutputStageKernelInfo.
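
To make the "downscale" path more concrete, the following is a minimal scalar sketch of what a quantized output stage of this kind typically computes for one S32 accumulator. It assumes a gemmlowp-style fixed-point scheme (Q31 multiplier, rounding right shift, offset, saturation); the kernel's actual Neon implementation is not shown on this page, so the helper names `rounding_doubling_high_mul`, `rounding_divide_by_pow2` and `downscale_s32` are hypothetical, and the parameters only mirror `result_fixedpoint_multiplier`, `result_shift` and `result_offset_after_shift` by name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: saturating, rounding, doubling high multiply of two Q31 values.
inline int32_t rounding_doubling_high_mul(int32_t a, int32_t b)
{
    if(a == std::numeric_limits<int32_t>::min() && b == std::numeric_limits<int32_t>::min())
    {
        return std::numeric_limits<int32_t>::max(); // the only case that would overflow
    }
    const int64_t ab    = static_cast<int64_t>(a) * static_cast<int64_t>(b);
    const int64_t nudge = (ab >= 0) ? (int64_t{1} << 30) : (1 - (int64_t{1} << 30));
    return static_cast<int32_t>((ab + nudge) / (int64_t{1} << 31));
}

// Hypothetical helper: divide by 2^exponent, rounding to nearest.
// Relies on arithmetic right shift of negative values (guaranteed since C++20).
inline int32_t rounding_divide_by_pow2(int32_t x, int exponent)
{
    assert(exponent >= 0 && exponent < 31);
    const int32_t mask      = (int32_t{1} << exponent) - 1;
    const int32_t remainder = x & mask;
    const int32_t threshold = (mask >> 1) + ((x < 0) ? 1 : 0);
    return (x >> exponent) + ((remainder > threshold) ? 1 : 0);
}

// Assumed order of operations: add bias, scale, shift, add offset, saturate to TOut.
template <typename TOut>
TOut downscale_s32(int32_t acc, int32_t bias, int32_t multiplier, int shift, int32_t offset_after_shift)
{
    const int32_t biased = acc + bias;
    const int32_t scaled = rounding_divide_by_pow2(rounding_doubling_high_mul(biased, multiplier), shift);
    const int32_t result = scaled + offset_after_shift;
    const int32_t lo     = std::numeric_limits<TOut>::min();
    const int32_t hi     = std::numeric_limits<TOut>::max();
    return static_cast<TOut>(std::min(std::max(result, lo), hi));
}
```

With a multiplier of `1 << 30` (0.5 in Q31) and a shift of 1, the accumulator is effectively scaled by 0.25 before the offset is added; instantiating `TOut` as `uint8_t` or `int8_t` corresponds to a QASYMM8 or QASYMM8_SIGNED output respectively.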

Definition at line 39 of file NEDirectConvolutionLayerOutputStageKernel.h.

Constructor & Destructor Documentation

◆ NEDirectConvolutionLayerOutputStageKernel() [1/3]

NEDirectConvolutionLayerOutputStageKernel ( )

Default constructor.

Definition at line 380 of file NEDirectConvolutionLayerOutputStageKernel.cpp.

Referenced by NEDirectConvolutionLayerOutputStageKernel::name().

     : _func(nullptr), _input(nullptr), _bias(nullptr), _output(nullptr), _result_fixedpoint_multiplier(0), _result_shift(0), _result_offset_after_shift(0)
 {
 }

◆ NEDirectConvolutionLayerOutputStageKernel() [2/3]

NEDirectConvolutionLayerOutputStageKernel ( const NEDirectConvolutionLayerOutputStageKernel & )

delete

Prevent instances of this class from being copied (As this class contains pointers)

◆ NEDirectConvolutionLayerOutputStageKernel() [3/3]

NEDirectConvolutionLayerOutputStageKernel ( NEDirectConvolutionLayerOutputStageKernel && )

default

Allow instances of this class to be moved.

◆ ~NEDirectConvolutionLayerOutputStageKernel()

~NEDirectConvolutionLayerOutputStageKernel ( )

default

Default destructor.

Referenced by NEDirectConvolutionLayerOutputStageKernel::name().

Member Function Documentation

◆ configure()

void configure	(	ITensor *	input,
		const ITensor *	bias = `nullptr`,
		ITensor *	output = `nullptr`,
		const DirectConvolutionLayerOutputStageKernelInfo &	info = `DirectConvolutionLayerOutputStageKernelInfo()`
	)

Set the accumulate buffer and the biases of the kernel.

Parameters

[in,out]	input	Input to add the bias to. If `output` is not specified then accumulation is done in-place. Data type supported: F16/F32/S32
[in]	bias	(Optional) The shared bias tensor to add. It must be 1D Tensor. Data type supported: Same as `input`
[out]	output	(Optional) If the output tensor is specified the accumulation is done out-of-place. (Defaults to nullptr) Note that in-place computation is only supported for F16/F32. For S32 this must not be nullptr. Data type supported: F16/F32 or QASYMM8/QASYMM8_SIGNED if `input` is S32
[in]	info	(Optional) DirectConvolutionLayerOutputStageKernel descriptor metadata

Definition at line 385 of file NEDirectConvolutionLayerOutputStageKernel.cpp.

References ARM_COMPUTE_ERROR, ARM_COMPUTE_ERROR_ON_NULLPTR, ARM_COMPUTE_ERROR_THROW_ON, arm_compute::auto_init_if_empty(), arm_compute::calculate_max_window(), ICloneable< T >::clone(), ITensorInfo::data_layout(), ITensorInfo::data_type(), arm_compute::F16, arm_compute::F32, ITensor::info(), arm_compute::test::validation::info, arm_compute::test::validation::input, arm_compute::is_data_type_quantized_asymmetric_signed(), arm_compute::NCHW, ITensorInfo::num_dimensions(), DirectConvolutionLayerOutputStageKernelInfo::output_data_type, DirectConvolutionLayerOutputStageKernelInfo::result_fixedpoint_multiplier, DirectConvolutionLayerOutputStageKernelInfo::result_offset_after_shift, DirectConvolutionLayerOutputStageKernelInfo::result_shift, arm_compute::S32, Dimensions< T >::set_num_dimensions(), ITensorInfo::set_valid_region(), ITensorInfo::tensor_shape(), ITensorInfo::total_size(), and arm_compute::validate_arguments().

Referenced by NEDirectConvolutionLayerOutputStageKernel::name().

 {
     // Perform validation step
     ARM_COMPUTE_ERROR_ON_NULLPTR(input);
     ARM_COMPUTE_ERROR_THROW_ON(validate_arguments(input->info(), (bias == nullptr) ? nullptr : bias->info(), (output == nullptr) ? nullptr : output->info(), info));
 
     _func                         = nullptr;
     _bias                         = bias;
     _input                        = input;
     _output                       = (output != nullptr) ? output : input;
     _result_fixedpoint_multiplier = info.result_fixedpoint_multiplier;
     _result_shift                 = info.result_shift;
     _result_offset_after_shift    = info.result_offset_after_shift;
 
     // Auto-initialize output if required
     if(output != nullptr && output->info() != nullptr)
     {
         // Work out expected output data type
         const DataType output_dt = (input->info()->data_type() == DataType::S32) ? info.output_data_type : DataType::S32;
         // Output tensor auto initialization if not yet initialized
         auto_init_if_empty(*output->info(), input->info()->clone()->set_data_type(output_dt));
     }
 
     Window      win = calculate_max_window(*input->info(), Steps());
     Coordinates coord;
     coord.set_num_dimensions(input->info()->num_dimensions());
 
     if(output != nullptr && (output->info()->total_size() != 0))
     {
         output->info()->set_valid_region(ValidRegion(coord, output->info()->tensor_shape()));
     }
     else
     {
         input->info()->set_valid_region(ValidRegion(coord, input->info()->tensor_shape()));
     }
 
     INEKernel::configure(win);
 
     const bool is_qasymm8_signed = (output != nullptr) ? is_data_type_quantized_asymmetric_signed(output->info()->data_type()) : false;
 
     // Set appropriate function
     if(input->info()->data_layout() == DataLayout::NCHW)
     {
         switch(input->info()->data_type())
         {
             case DataType::S32:
             {
                 if(is_qasymm8_signed)
                 {
                     _func = &output_stage_nchw<int8_t>;
                 }
                 else
                 {
                     _func = &output_stage_nchw<uint8_t>;
                 }
                 break;
             }
 #ifdef __ARM_FEATURE_FP16_VECTOR_ARITHMETIC
             case DataType::F16:
             {
                 _func = &output_stage_nchw<float16_t>;
                 break;
             }
 #endif /* __ARM_FEATURE_FP16_VECTOR_ARITHMETIC */
             case DataType::F32:
             {
                 _func = &output_stage_nchw<float>;
                 break;
             }
             default:
             {
                 ARM_COMPUTE_ERROR("Unsupported combination of types among the inputs.");
             }
         }
     }
     else
     {
         switch(input->info()->data_type())
         {
             case DataType::S32:
             {
                 if(is_qasymm8_signed)
                 {
                     _func = &output_stage_nhwc<int8_t>;
                 }
                 else
                 {
                     _func = &output_stage_nhwc<uint8_t>;
                 }
                 break;
             }
 #ifdef __ARM_FEATURE_FP16_VECTOR_ARITHMETIC
             case DataType::F16:
             {
                 _func = &output_stage_nhwc<float16_t>;
                 break;
             }
 #endif /* __ARM_FEATURE_FP16_VECTOR_ARITHMETIC */
             case DataType::F32:
             {
                 _func = &output_stage_nhwc<float>;
                 break;
             }
             default:
             {
                 ARM_COMPUTE_ERROR("Unsupported combination of types among the inputs.");
             }
         }
     }
 }
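
The function selection at the end of configure() can be summarised in a small standalone model. The sketch below is not library code: `Layout`, `Type` and `select_output_stage` are stand-ins invented for illustration, and the `fp16_enabled` flag models the `__ARM_FEATURE_FP16_VECTOR_ARITHMETIC` guard. It only reproduces the decision table visible in the listing above.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Stand-in enums for this sketch; not the library's definitions.
enum class Layout { NCHW, NHWC };
enum class Type { S32, F16, F32, QASYMM8, QASYMM8_SIGNED, U8 };

// Returns a label for the function the configure() listing would select.
// has_output / output_type model `output != nullptr` and the output data type.
inline std::string select_output_stage(Layout layout, Type input_type, bool has_output,
                                       Type output_type, bool fp16_enabled = true)
{
    // Any layout other than NCHW falls through to the NHWC functions, as in the listing.
    const std::string prefix    = (layout == Layout::NCHW) ? "output_stage_nchw" : "output_stage_nhwc";
    const bool        is_signed = has_output && (output_type == Type::QASYMM8_SIGNED);

    switch(input_type)
    {
        case Type::S32:
            return prefix + (is_signed ? "<int8_t>" : "<uint8_t>");
        case Type::F16:
            if(fp16_enabled)
            {
                return prefix + "<float16_t>";
            }
            break; // without FP16 support the case does not exist
        case Type::F32:
            return prefix + "<float>";
        default:
            break;
    }
    throw std::invalid_argument("Unsupported combination of types among the inputs.");
}
```

Note that for S32 input the template argument is chosen from the output data type, not the input: a QASYMM8_SIGNED output selects the `int8_t` variant and anything else selects `uint8_t`.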

◆ name()

const char* name ( ) const

inline override virtual

Name of the kernel.

Returns: Kernel name

Implements ICPPKernel.

Definition at line 42 of file NEDirectConvolutionLayerOutputStageKernel.h.

     {
         return "NEDirectConvolutionLayerOutputStageKernel";
     }

◆ operator=() [1/2]

NEDirectConvolutionLayerOutputStageKernel& operator= ( const NEDirectConvolutionLayerOutputStageKernel & )

delete

Prevent instances of this class from being copied (As this class contains pointers)

Referenced by NEDirectConvolutionLayerOutputStageKernel::name().

◆ operator=() [2/2]

NEDirectConvolutionLayerOutputStageKernel& operator= ( NEDirectConvolutionLayerOutputStageKernel && )

default

Allow instances of this class to be moved.

◆ run()

void run	(	const Window &	window,
		const ThreadInfo &	info
	)

override virtual

Execute the kernel on the passed window.

Warning: If is_parallelisable() returns false then the passed window must be equal to window()

Note: The window has to be a region within the window returned by the window() method. The width of the window has to be a multiple of num_elems_processed_per_iteration().

Parameters

[in]	window	Region on which to execute the kernel. (Must be a region of the window returned by window())
[in]	info	Info about executing thread and CPU.

Reimplemented from ICPPKernel.

Definition at line 505 of file NEDirectConvolutionLayerOutputStageKernel.cpp.

References ARM_COMPUTE_ERROR_ON, ARM_COMPUTE_ERROR_ON_INVALID_SUBWINDOW, ARM_COMPUTE_ERROR_ON_UNCONFIGURED_KERNEL, ARM_COMPUTE_UNUSED, arm_compute::test::validation::has_bias, and IKernel::window().

Referenced by NEDirectConvolutionLayerOutputStageKernel::name().

 {
     ARM_COMPUTE_UNUSED(info);
     ARM_COMPUTE_ERROR_ON_UNCONFIGURED_KERNEL(this);
     ARM_COMPUTE_ERROR_ON_INVALID_SUBWINDOW(INEKernel::window(), window);
     ARM_COMPUTE_ERROR_ON(_func == nullptr);
 
     const bool has_bias = _bias != nullptr;
     (*_func)(_input, _bias, window, _output, _result_fixedpoint_multiplier, _result_shift, _result_offset_after_shift, has_bias);
 }
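
run() forwards a `has_bias` flag to the selected layout-specific function. As a rough illustration of why separate NCHW and NHWC functions are needed, the sketch below adds a shared bias to a flat float buffer in both layouts. It assumes a single batch and that the shared 1D bias holds one value per channel; that assumption, and the helper names `add_bias_nchw` and `add_bias_nhwc`, are not taken from this page.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// NCHW: each channel is a contiguous plane, so one bias value covers a whole plane.
inline void add_bias_nchw(std::vector<float> &data, const std::vector<float> &bias,
                          std::size_t plane_size, bool has_bias)
{
    if(!has_bias)
    {
        return; // nothing to accumulate
    }
    assert(data.size() == bias.size() * plane_size);
    for(std::size_t c = 0; c < bias.size(); ++c)
    {
        for(std::size_t i = 0; i < plane_size; ++i)
        {
            data[c * plane_size + i] += bias[c];
        }
    }
}

// NHWC: channels are innermost, so the bias vector repeats for every pixel.
inline void add_bias_nhwc(std::vector<float> &data, const std::vector<float> &bias,
                          std::size_t num_pixels, bool has_bias)
{
    if(!has_bias)
    {
        return; // nothing to accumulate
    }
    const std::size_t channels = bias.size();
    assert(data.size() == num_pixels * channels);
    for(std::size_t p = 0; p < num_pixels; ++p)
    {
        for(std::size_t c = 0; c < channels; ++c)
        {
            data[p * channels + c] += bias[c];
        }
    }
}
```

The same four-element buffer with two channels therefore receives the bias in a different pattern depending on the layout, which is the distinction the two function families encode.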

◆ validate()

Status validate	(	const ITensorInfo *	input,
		const ITensorInfo *	bias = `nullptr`,
		const ITensorInfo *	output = `nullptr`,
		const DirectConvolutionLayerOutputStageKernelInfo &	info = `DirectConvolutionLayerOutputStageKernelInfo()`
	)

static

Static function to check if given info will lead to a valid configuration of NEDirectConvolutionLayerOutputStageKernel.

Parameters

[in]	input	Input to add the bias to. If `output` is not specified then accumulation is done in-place. Data type supported: F16/F32/S32
[in]	bias	(Optional) The shared bias tensor to add. It must be 1D Tensor. Data type supported: Same as `input`
[in]	output	(Optional) If the output tensor is specified the accumulation is done out-of-place. (Defaults to nullptr) Note that in-place computation is only supported for F16/F32. For S32 this must not be nullptr. Data type supported: F16/F32 or QASYMM8/QASYMM8_SIGNED if `input` is S32
[in]	info	(Optional) DirectConvolutionLayerOutputStageKernel descriptor metadata

Returns: a status

Definition at line 497 of file NEDirectConvolutionLayerOutputStageKernel.cpp.

References ARM_COMPUTE_RETURN_ON_ERROR, and arm_compute::validate_arguments().

Referenced by NEDirectConvolutionLayerOutputStageKernel::name(), and NEDirectConvolutionLayer::validate().

 {
     ARM_COMPUTE_RETURN_ON_ERROR(validate_arguments(input, bias, output, info));
 
     return Status{};
 }
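
The body of validate_arguments() is not reproduced on this page. The sketch below only encodes the constraints stated in the parameter descriptions above (supported input types, 1D bias of the same type, mandatory quantized output for S32 input) using hypothetical stand-in types `DType` and `TensorDesc`. It is a reading aid under those assumptions, not the library's validation logic, which may check more than this.

```cpp
#include <string>

// Stand-in types for this sketch; not the library's definitions.
enum class DType { F16, F32, S32, QASYMM8, QASYMM8_SIGNED };

struct TensorDesc
{
    DType data_type;
    int   num_dimensions;
};

// Returns an empty string if the documented constraints hold, otherwise a short reason.
inline std::string check_output_stage_args(const TensorDesc *input,
                                           const TensorDesc *bias   = nullptr,
                                           const TensorDesc *output = nullptr)
{
    if(input == nullptr)
    {
        return "input must not be nullptr";
    }
    const DType in = input->data_type;
    if(in != DType::F16 && in != DType::F32 && in != DType::S32)
    {
        return "input must be F16, F32 or S32";
    }
    if(bias != nullptr)
    {
        if(bias->num_dimensions != 1)
        {
            return "bias must be a 1D tensor";
        }
        if(bias->data_type != in)
        {
            return "bias must have the same data type as input";
        }
    }
    if(in == DType::S32)
    {
        // In-place computation is documented as unsupported for S32.
        if(output == nullptr)
        {
            return "S32 input requires an output tensor";
        }
        if(output->data_type != DType::QASYMM8 && output->data_type != DType::QASYMM8_SIGNED)
        {
            return "S32 input requires a QASYMM8 or QASYMM8_SIGNED output";
        }
    }
    else if(output != nullptr && output->data_type != DType::F16 && output->data_type != DType::F32)
    {
        return "floating-point input requires an F16 or F32 output";
    }
    return {};
}
```

As with validate(), the bias and output arguments default to `nullptr`, so a floating-point input on its own describes a valid in-place configuration while an S32 input on its own does not.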

The documentation for this class was generated from the following files:

src/core/NEON/kernels/NEDirectConvolutionLayerOutputStageKernel.h
src/core/NEON/kernels/NEDirectConvolutionLayerOutputStageKernel.cpp
