Kernel used to quantize down the int32 accumulator values of GEMMLowp to QSYMM16. More...

#include <CpuGemmLowpQuantizeDownInt32ToInt16ScaleByFixedPointKernel.h>

Collaboration diagram for CpuGemmLowpQuantizeDownInt32ToInt16ScaleByFixedPointKernel:

Public Member Functions
	CpuGemmLowpQuantizeDownInt32ToInt16ScaleByFixedPointKernel ()=default

	ARM_COMPUTE_DISALLOW_COPY_ALLOW_MOVE (CpuGemmLowpQuantizeDownInt32ToInt16ScaleByFixedPointKernel)

void	configure (ITensorInfo src, ITensorInfo bias, ITensorInfo *dst, int result_fixedpoint_multiplier, int result_shift, int min=0, int max=0)
	Initialise the kernel's input and output. More...

void	run_op (ITensorPack &tensors, const Window &window, const ThreadInfo &info) override
	Execute the kernel on the passed window. More...

const char *	name () const override
	Name of the kernel. More...

Public Member Functions inherited from ICPPKernel
virtual	~ICPPKernel ()=default
	Default destructor. More...

virtual void	run (const Window &window, const ThreadInfo &info)
	Execute the kernel on the passed window. More...

virtual void	run_nd (const Window &window, const ThreadInfo &info, const Window &thread_locator)
	legacy compatibility layer for implemantions which do not support thread_locator In these cases we simply narrow the interface down the legacy version More...

virtual size_t	get_mws (const CPUInfo &platform, size_t thread_count) const
	Return minimum workload size of the relevant kernel. More...

Public Member Functions inherited from IKernel
	IKernel ()
	Constructor. More...

virtual	~IKernel ()=default
	Destructor. More...

virtual bool	is_parallelisable () const
	Indicates whether or not the kernel is parallelisable. More...

virtual BorderSize	border_size () const
	The size of the border for that kernel. More...

const Window &	window () const
	The maximum window the kernel can be executed on. More...

bool	is_window_configured () const
	Function to check if the embedded window of this kernel has been configured. More...

Static Public Member Functions
static Status	validate (const ITensorInfo src, const ITensorInfo bias, const ITensorInfo *dst, int min=0, int max=0)
	Static function to check if given info will lead to a valid configuration. More...

Static Public Member Functions inherited from ICpuKernel< CpuGemmLowpQuantizeDownInt32ToInt16ScaleByFixedPointKernel >
static const auto *	get_implementation (const SelectorType &selector, KernelSelectionType selection_type=KernelSelectionType::Supported)
	Micro-kernel selector. More...

Additional Inherited Members
Static Public Attributes inherited from ICPPKernel
static constexpr size_t	default_mws = 1

Detailed Description

Kernel used to quantize down the int32 accumulator values of GEMMLowp to QSYMM16.

This kernel takes a final int32 accumulator value (the output of CpuGemmLowpMatrixMultiplyKernel), and processes it to obtain the final QSYMM16 value. The following computations will be performed by the kernel:

Compute fixed point multiplication between each entry of input by result_fixedpoint_multiplier
Add bias to final result if bias tensor is not a nullptr
Round to nearest division by a power-of-two using result_shift
Clamp the value between the specified min and max bounds
Clamp the resulting int32 values to the [-32768, 32767] range and cast to QSYMM16.

Definition at line 52 of file CpuGemmLowpQuantizeDownInt32ToInt16ScaleByFixedPointKernel.h.

Constructor & Destructor Documentation

◆ CpuGemmLowpQuantizeDownInt32ToInt16ScaleByFixedPointKernel()

CpuGemmLowpQuantizeDownInt32ToInt16ScaleByFixedPointKernel ( )

default

Member Function Documentation

◆ ARM_COMPUTE_DISALLOW_COPY_ALLOW_MOVE()

ARM_COMPUTE_DISALLOW_COPY_ALLOW_MOVE ( CpuGemmLowpQuantizeDownInt32ToInt16ScaleByFixedPointKernel )

◆ configure()

void configure	(	ITensorInfo *	src,
		ITensorInfo *	bias,
		ITensorInfo *	dst,
		int	result_fixedpoint_multiplier,
		int	result_shift,
		int	min = `0`,
		int	max = `0`
	)

Initialise the kernel's input and output.

Parameters

[in]	src	Input tensor info. Data type supported: S32
[in]	bias	Biases tensor info. Only shared biases supported and it can be a nullptr if the biases addition is not required. Biases are 1D tensor with dimensions [OFM]. Data type supported: Same as `input`.
[out]	dst	Output tensor info. Data type supported: Data type supported: QSYMM16
[in]	result_fixedpoint_multiplier	Fixed point value to be multiplied to each element of the input matrix when once the result_offset has been add
[in]	result_shift	Integer value used to round to nearest division by a power-of-two the result after the fixed point multiplication
[in]	min	(Optional) Min value used to saturate down the output result before converting back to QSYMM16. Defaults to 0.
[in]	max	(Optional) Max value used to saturate up the output result before converting back to QSYMM16. Along with `min`, this value can be used to implement "rectified linear unit" activation functions. Defaults to 0.

Definition at line 174 of file CpuGemmLowpQuantizeDownInt32ToInt16ScaleByFixedPointKernel.cpp.

 {
     // Perform validate step
     ARM_COMPUTE_UNUSED(bias, dst);
     ARM_COMPUTE_ERROR_ON_NULLPTR(src, dst);
     ARM_COMPUTE_ERROR_THROW_ON(validate_arguments(src, bias, dst, min, max));
  
     _result_fixedpoint_multiplier = result_fixedpoint_multiplier;
     _result_shift                 = result_shift;
     _min                          = min;
     _max                          = max;
  
     // Output auto inizialitation if not yet initialized
     auto_init_if_empty(*src, src->clone()->set_data_type(DataType::QSYMM16));
     // Configure kernel window
     Window win_config = calculate_max_window(*src, Steps());
     ICpuKernel::configure(win_config);
  
     // Check if we need to clamp the result using min and max
     const bool is_bounded_relu = !(min <= -32768 && max >= 32767);
     _func = is_bounded_relu ? &CpuGemmLowpQuantizeDownInt32ToInt16ScaleByFixedPointKernel::run_internal<true>
                             : &CpuGemmLowpQuantizeDownInt32ToInt16ScaleByFixedPointKernel::run_internal<false>;
 }

References ARM_COMPUTE_ERROR_ON_NULLPTR, ARM_COMPUTE_ERROR_THROW_ON, ARM_COMPUTE_UNUSED, arm_compute::auto_init_if_empty(), bias, arm_compute::calculate_max_window(), arm_compute::test::validation::dst, arm_compute::QSYMM16, arm_compute::test::validation::src, and arm_compute::cpu::kernels::validate_arguments().

◆ name()

const char * name ( ) const

overridevirtual

Name of the kernel.

Returns: Kernel name

Implements ICPPKernel.

Definition at line 228 of file CpuGemmLowpQuantizeDownInt32ToInt16ScaleByFixedPointKernel.cpp.

 {
     return "CpuGemmLowpQuantizeDownInt32ToInt16ScaleByFixedPointKernel";
 }

◆ run_op()

void run_op	(	ITensorPack &	tensors,
		const Window &	window,
		const ThreadInfo &	info
	)

overridevirtual

Execute the kernel on the passed window.

Warning: If is_parallelisable() returns false then the passed window must be equal to window()

Note: The window has to be a region within the window returned by the window() method; The width of the window has to be a multiple of num_elems_processed_per_iteration().

Parameters

[in]	tensors	A vector containing the tensors to operate on.
[in]	window	Region on which to execute the kernel. (Must be a region of the window returned by window())
[in]	info	Info about executing thread and CPU.

Reimplemented from ICPPKernel.

Definition at line 212 of file CpuGemmLowpQuantizeDownInt32ToInt16ScaleByFixedPointKernel.cpp.

 {
     ARM_COMPUTE_UNUSED(info);
     ARM_COMPUTE_ERROR_ON_UNCONFIGURED_KERNEL(this);
     ARM_COMPUTE_ERROR_ON_INVALID_SUBWINDOW(ICpuKernel::window(), window);
     ARM_COMPUTE_ERROR_ON_MSG(tensors.empty(), "No inputs provided");
  
     auto src  = tensors.get_const_tensor(TensorType::ACL_SRC);
     auto bias = tensors.get_const_tensor(TensorType::ACL_BIAS);
     auto dst  = tensors.get_tensor(TensorType::ACL_DST);
  
     (this->*_func)(src, bias, dst, window);
 }

References arm_compute::ACL_BIAS, arm_compute::ACL_DST, arm_compute::ACL_SRC, ARM_COMPUTE_ERROR_ON_INVALID_SUBWINDOW, ARM_COMPUTE_ERROR_ON_MSG, ARM_COMPUTE_ERROR_ON_UNCONFIGURED_KERNEL, ARM_COMPUTE_UNUSED, bias, arm_compute::test::validation::dst, ITensorPack::empty(), ITensorPack::get_const_tensor(), ITensorPack::get_tensor(), arm_compute::test::validation::info, arm_compute::test::validation::src, and IKernel::window().

◆ validate()

Status validate	(	const ITensorInfo *	src,
		const ITensorInfo *	bias,
		const ITensorInfo *	dst,
		int	min = `0`,
		int	max = `0`
	)

static

Static function to check if given info will lead to a valid configuration.

Returns: a status

Definition at line 204 of file CpuGemmLowpQuantizeDownInt32ToInt16ScaleByFixedPointKernel.cpp.

 {
     ARM_COMPUTE_ERROR_ON_NULLPTR(input, output);
     ARM_COMPUTE_RETURN_ON_ERROR(validate_arguments(input, bias, output, min, max));
     return Status{};
 }

References ARM_COMPUTE_ERROR_ON_NULLPTR, ARM_COMPUTE_RETURN_ON_ERROR, bias, arm_compute::test::validation::input, and arm_compute::cpu::kernels::validate_arguments().

Referenced by CpuGemmLowpOutputStage::validate().

The documentation for this class was generated from the following files:

src/cpu/kernels/CpuGemmLowpQuantizeDownInt32ToInt16ScaleByFixedPointKernel.h
src/cpu/kernels/CpuGemmLowpQuantizeDownInt32ToInt16ScaleByFixedPointKernel.cpp

Public Member Functions

Static Public Member Functions

Additional Inherited Members