Kernel to accumulate the biases, if provided, or downscale in case of quantized input. More...

#include <CpuDirectConv2dOutputStageKernel.h>

Collaboration diagram for CpuDirectConv2dOutputStageKernel:

Public Member Functions
	CpuDirectConv2dOutputStageKernel ()=default

	ARM_COMPUTE_DISALLOW_COPY_ALLOW_MOVE (CpuDirectConv2dOutputStageKernel)

void	configure (ITensorInfo src, const ITensorInfo bias=nullptr, ITensorInfo *dst=nullptr, const DirectConvolutionLayerOutputStageKernelInfo &info=DirectConvolutionLayerOutputStageKernelInfo())
	Set the accumulate buffer and the biases of the kernel. More...

void	run_op (ITensorPack &tensors, const Window &window, const ThreadInfo &info) override
	Execute the kernel on the passed window. More...

const char *	name () const override
	Name of the kernel. More...

Public Member Functions inherited from ICPPKernel
virtual	~ICPPKernel ()=default
	Default destructor. More...

virtual void	run (const Window &window, const ThreadInfo &info)
	Execute the kernel on the passed window. More...

virtual void	run_nd (const Window &window, const ThreadInfo &info, const Window &thread_locator)
	legacy compatibility layer for implemantions which do not support thread_locator In these cases we simply narrow the interface down the legacy version More...

virtual size_t	get_mws (const CPUInfo &platform, size_t thread_count) const
	Return minimum workload size of the relevant kernel. More...

Public Member Functions inherited from IKernel
	IKernel ()
	Constructor. More...

virtual	~IKernel ()=default
	Destructor. More...

virtual bool	is_parallelisable () const
	Indicates whether or not the kernel is parallelisable. More...

virtual BorderSize	border_size () const
	The size of the border for that kernel. More...

const Window &	window () const
	The maximum window the kernel can be executed on. More...

bool	is_window_configured () const
	Function to check if the embedded window of this kernel has been configured. More...

Static Public Member Functions
static Status	validate (const ITensorInfo src, const ITensorInfo bias=nullptr, const ITensorInfo *dst=nullptr, const DirectConvolutionLayerOutputStageKernelInfo &info=DirectConvolutionLayerOutputStageKernelInfo())
	Static function to check if given info will lead to a valid configuration. More...

Static Public Member Functions inherited from ICpuKernel< CpuDirectConv2dOutputStageKernel >
static const auto *	get_implementation (const SelectorType &selector, KernelSelectionType selection_type=KernelSelectionType::Supported)
	Micro-kernel selector. More...

Additional Inherited Members
Static Public Attributes inherited from ICPPKernel
static constexpr size_t	default_mws = 1

Detailed Description

Kernel to accumulate the biases, if provided, or downscale in case of quantized input.

Note: We assume bias to be shared; For quantized computations (i.e. src of S32 type) the output data type for auto-initialization must be passed as part of the DirectConvolutionLayerOutputStageKernelInfo.

Definition at line 44 of file CpuDirectConv2dOutputStageKernel.h.

Constructor & Destructor Documentation

◆ CpuDirectConv2dOutputStageKernel()

CpuDirectConv2dOutputStageKernel ( )

default

Member Function Documentation

◆ ARM_COMPUTE_DISALLOW_COPY_ALLOW_MOVE()

ARM_COMPUTE_DISALLOW_COPY_ALLOW_MOVE ( CpuDirectConv2dOutputStageKernel )

◆ configure()

void configure	(	ITensorInfo *	src,
		const ITensorInfo *	bias = `nullptr`,
		ITensorInfo *	dst = `nullptr`,
		const DirectConvolutionLayerOutputStageKernelInfo &	info = `DirectConvolutionLayerOutputStageKernelInfo()`
	)

Set the accumulate buffer and the biases of the kernel.

Parameters

[in,out]	src	Input to add the bias to. If `dst` is not specified then accumulation is done in-place. Data type supported: F16/F32/S32
[in]	bias	(Optional) The shared bias tensor to add. It must be 1D Tensor. Data type supported: Same as `src`
[out]	dst	(Optional) If the dst tensor is specified the accumulation is done out-of-place. (Defaults to nullptr) Note that in-place computation is only supported for F16/F32. For S32 this must not be nullptr. Data type supported: F16/F32 or QASYMM8/QASYMM8_SIGNED if `src` is S32
[in]	info	(Optional) DirectConvolutionLayerOutputStageKernel descriptor metadata

Definition at line 410 of file CpuDirectConv2dOutputStageKernel.cpp.

 {
     ARM_COMPUTE_UNUSED(bias);
     // Perform validation step
     ARM_COMPUTE_ERROR_ON_NULLPTR(src);
     ARM_COMPUTE_ERROR_THROW_ON(validate_arguments(src, bias, dst, info));
  
     _func                         = nullptr;
     _result_fixedpoint_multiplier = info.result_fixedpoint_multiplier;
     _result_shift                 = info.result_shift;
     _result_offset_after_shift    = info.result_offset_after_shift;
  
     // Auto-initialize output output if required
     if (dst != nullptr)
     {
         // Work out expected output data type
         const DataType output_dt = (src->data_type() == DataType::S32) ? info.output_data_type : DataType::S32;
         // Output tensor auto initialization if not yet initialized
         auto_init_if_empty(*dst, src->clone()->set_data_type(output_dt));
     }
  
     Window win = calculate_max_window(*src, Steps());
  
     ICpuKernel::configure(win);
  
     const bool is_qasymm8_signed =
         (dst != nullptr) ? is_data_type_quantized_asymmetric_signed(dst->data_type()) : false;
  
     // Set appropriate function
     if (src->data_layout() == DataLayout::NCHW)
     {
         switch (src->data_type())
         {
             case DataType::S32:
             {
                 if (is_qasymm8_signed)
                 {
                     _func = &output_stage_nchw<int8_t>;
                 }
                 else
                 {
                     _func = &output_stage_nchw<uint8_t>;
                 }
                 break;
             }
 #ifdef __ARM_FEATURE_FP16_VECTOR_ARITHMETIC
             case DataType::F16:
             {
                 _func = &output_stage_nchw<float16_t>;
                 break;
             }
 #endif /* __ARM_FEATURE_FP16_VECTOR_ARITHMETIC */
             case DataType::F32:
             {
                 _func = &output_stage_nchw<float>;
                 break;
             }
             default:
             {
                 ARM_COMPUTE_ERROR("Unsupported combination of types among the inputs.");
             }
         }
     }
     else
     {
         switch (src->data_type())
         {
             case DataType::S32:
             {
                 if (is_qasymm8_signed)
                 {
                     _func = &output_stage_nhwc<int8_t>;
                 }
                 else
                 {
                     _func = &output_stage_nhwc<uint8_t>;
                 }
                 break;
             }
 #ifdef __ARM_FEATURE_FP16_VECTOR_ARITHMETIC
             case DataType::F16:
             {
                 _func = &output_stage_nhwc<float16_t>;
                 break;
             }
 #endif /* __ARM_FEATURE_FP16_VECTOR_ARITHMETIC */
             case DataType::F32:
             {
                 _func = &output_stage_nhwc<float>;
                 break;
             }
             default:
             {
                 ARM_COMPUTE_ERROR("Unsupported combination of types among the inputs.");
             }
         }
     }
 }

References ARM_COMPUTE_ERROR, ARM_COMPUTE_ERROR_ON_NULLPTR, ARM_COMPUTE_ERROR_THROW_ON, ARM_COMPUTE_UNUSED, arm_compute::auto_init_if_empty(), bias, arm_compute::calculate_max_window(), arm_compute::test::validation::dst, arm_compute::F16, arm_compute::F32, arm_compute::test::validation::info, arm_compute::is_data_type_quantized_asymmetric_signed(), arm_compute::NCHW, arm_compute::S32, arm_compute::test::validation::src, and arm_compute::cpu::kernels::validate_arguments().

◆ name()

const char * name ( ) const

overridevirtual

Name of the kernel.

Returns: Kernel name

Implements ICPPKernel.

Definition at line 535 of file CpuDirectConv2dOutputStageKernel.cpp.

 {
     return "CpuDirectConv2dOutputStageKernel";
 }

◆ run_op()

void run_op	(	ITensorPack &	tensors,
		const Window &	window,
		const ThreadInfo &	info
	)

overridevirtual

Execute the kernel on the passed window.

Warning: If is_parallelisable() returns false then the passed window must be equal to window()

Note: The window has to be a region within the window returned by the window() method; The width of the window has to be a multiple of num_elems_processed_per_iteration().

Parameters

[in]	tensors	A vector containing the tensors to operate on.
[in]	window	Region on which to execute the kernel. (Must be a region of the window returned by window())
[in]	info	Info about executing thread and CPU.

Reimplemented from ICPPKernel.

Definition at line 521 of file CpuDirectConv2dOutputStageKernel.cpp.

 {
     ARM_COMPUTE_UNUSED(info);
     ARM_COMPUTE_ERROR_ON_UNCONFIGURED_KERNEL(this);
     ARM_COMPUTE_ERROR_ON_INVALID_SUBWINDOW(ICpuKernel::window(), window);
     ARM_COMPUTE_ERROR_ON(_func == nullptr);
  
     auto src  = tensors.get_tensor(TensorType::ACL_SRC_0);
     auto bias = tensors.get_const_tensor(TensorType::ACL_SRC_1);
     auto dst  = tensors.get_tensor(TensorType::ACL_DST);
  
     (*_func)(src, bias, window, dst, _result_fixedpoint_multiplier, _result_shift, _result_offset_after_shift);
 }

References arm_compute::ACL_DST, arm_compute::ACL_SRC_0, arm_compute::ACL_SRC_1, ARM_COMPUTE_ERROR_ON, ARM_COMPUTE_ERROR_ON_INVALID_SUBWINDOW, ARM_COMPUTE_ERROR_ON_UNCONFIGURED_KERNEL, ARM_COMPUTE_UNUSED, bias, arm_compute::test::validation::dst, ITensorPack::get_const_tensor(), ITensorPack::get_tensor(), arm_compute::test::validation::info, arm_compute::test::validation::src, and IKernel::window().

◆ validate()

Status validate	(	const ITensorInfo *	src,
		const ITensorInfo *	bias = `nullptr`,
		const ITensorInfo *	dst = `nullptr`,
		const DirectConvolutionLayerOutputStageKernelInfo &	info = `DirectConvolutionLayerOutputStageKernelInfo()`
	)

static

Static function to check if given info will lead to a valid configuration.

Returns: a status

Definition at line 512 of file CpuDirectConv2dOutputStageKernel.cpp.

 {
     ARM_COMPUTE_RETURN_ON_ERROR(validate_arguments(src, bias, dst, info));
     return Status{};
 }

References ARM_COMPUTE_RETURN_ON_ERROR, bias, arm_compute::test::validation::dst, arm_compute::test::validation::info, arm_compute::test::validation::src, and arm_compute::cpu::kernels::validate_arguments().

Referenced by CpuDirectConv2d::validate().

The documentation for this class was generated from the following files:

src/cpu/kernels/CpuDirectConv2dOutputStageKernel.h
src/cpu/kernels/CpuDirectConv2dOutputStageKernel.cpp

Public Member Functions

Static Public Member Functions

Additional Inherited Members