Compute Library
 21.08
CpuMulKernel Class Reference

Interface for the kernel to perform multiplication between two tensors. More...

#include <CpuMulKernel.h>

Collaboration diagram for CpuMulKernel:

Public Member Functions

 CpuMulKernel ()=default
 
 ARM_COMPUTE_DISALLOW_COPY_ALLOW_MOVE (CpuMulKernel)
 
void configure (ITensorInfo *src1, ITensorInfo *src2, ITensorInfo *dst, float scale, ConvertPolicy overflow_policy, RoundingPolicy rounding_policy)
 Initialise the kernel's input, dst and border mode. More...
 
void run_op (ITensorPack &tensors, const Window &window, const ThreadInfo &info) override
 Execute the kernel on the passed window. More...
 
const char * name () const override
 Name of the kernel. More...
 
- Public Member Functions inherited from ICPPKernel
virtual ~ICPPKernel ()=default
 Default destructor. More...
 
virtual void run (const Window &window, const ThreadInfo &info)
 Execute the kernel on the passed window. More...
 
virtual void run_nd (const Window &window, const ThreadInfo &info, const Window &thread_locator)
 Legacy compatibility layer for implementations that do not support thread_locator; in these cases we simply narrow the interface down to the legacy version. More...
 
- Public Member Functions inherited from IKernel
 IKernel ()
 Constructor. More...
 
virtual ~IKernel ()=default
 Destructor. More...
 
virtual bool is_parallelisable () const
 Indicates whether or not the kernel is parallelisable. More...
 
virtual BorderSize border_size () const
 The size of the border for that kernel. More...
 
const Window & window () const
 The maximum window the kernel can be executed on. More...
 
bool is_window_configured () const
 Function to check if the embedded window of this kernel has been configured. More...
 

Static Public Member Functions

static Status validate (const ITensorInfo *src1, const ITensorInfo *src2, const ITensorInfo *dst, float scale, ConvertPolicy overflow_policy, RoundingPolicy rounding_policy)
 Static function to check if given info will lead to a valid configuration. More...
 

Detailed Description

Interface for the kernel to perform multiplication between two tensors.

Definition at line 37 of file CpuMulKernel.h.
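
As a point of reference only, the per-element behaviour of one documented path can be sketched in scalar C++. This is not the library's vectorised implementation; the function name and the (S16,S16) -> S16 / SATURATE / round-to-zero choices are illustrative assumptions:

  #include <algorithm>
  #include <cmath>
  #include <cstdint>

  // Scalar sketch of dst = saturate(round_to_zero(src1 * src2 * scale))
  // for the (S16,S16) -> S16 path with ConvertPolicy::SATURATE.
  int16_t mul_s16_reference(int16_t a, int16_t b, float scale)
  {
      const float product = static_cast<float>(a) * static_cast<float>(b) * scale;
      const float rounded = std::trunc(product); // round-to-zero, see the Note in configure()
      const float clamped = std::clamp(rounded, -32768.0f, 32767.0f);
      return static_cast<int16_t>(clamped);
  }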

Constructor & Destructor Documentation

◆ CpuMulKernel()

CpuMulKernel ( )
default

Member Function Documentation

◆ ARM_COMPUTE_DISALLOW_COPY_ALLOW_MOVE()

ARM_COMPUTE_DISALLOW_COPY_ALLOW_MOVE ( CpuMulKernel  )

◆ configure()

void configure ( ITensorInfo *  src1,
ITensorInfo *  src2,
ITensorInfo *  dst,
float  scale,
ConvertPolicy  overflow_policy,
RoundingPolicy  rounding_policy 
)

Initialise the kernel's input, dst and border mode.

Valid configurations (Src1, Src2) -> Dst:

  Src1             Src2             Dst               Broadcast?   Scale = 1/255?
  U8               U8               U8, S16           N            Y
  U8               S16              S16               N            Y
  S16              U8               S16               N            Y
  S16              S16              S16               N            Y
  S32              S32              S32               Y            N
  F16              F16              F16               N            Y
  F32              F32              F32               Y            Y
  QASYMM8          QASYMM8          QASYMM8           Y            Y
  QASYMM8_SIGNED   QASYMM8_SIGNED   QASYMM8_SIGNED    Y            Y
  QSYMM16          QSYMM16          QSYMM16, S32      N            Y
Note
For a scale equal to 1/255, only round-to-nearest-even (implemented as round half up) is supported. For all other scale values, only round-to-zero (implemented as round towards minus infinity) is supported.
Parameters
[in]  src1             First input tensor. Data types supported: U8/QASYMM8/QASYMM8_SIGNED/S16/S32/QSYMM16/F16/F32
[in]  src2             Second input tensor. Data types supported: U8/QASYMM8/QASYMM8_SIGNED/S16/S32/QSYMM16/F16/F32
[out] dst              Dst tensor. Data types supported: U8/QASYMM8/QASYMM8_SIGNED/S16/S32/QSYMM16/F16/F32
[in]  scale            Scale to apply after multiplication. Scale must be positive and its value must be either 1/255 or 1/2^n where n is between 0 and 15. If src1, src2 and dst are all of data type S32, scale cannot be 1/255.
[in]  overflow_policy  Overflow policy. ConvertPolicy cannot be WRAP if any of the inputs is of quantized data type.
[in]  rounding_policy  Rounding policy.

Definition at line 1478 of file CpuMulKernel.cpp.

References ARM_COMPUTE_ERROR, ARM_COMPUTE_ERROR_ON_NULLPTR, ARM_COMPUTE_ERROR_THROW_ON, ARM_COMPUTE_UNUSED, TensorShape::broadcast_shape(), arm_compute::calculate_max_window(), ITensorInfo::data_type(), arm_compute::F16, arm_compute::F32, arm_compute::QASYMM8, arm_compute::QASYMM8_SIGNED, arm_compute::QSYMM16, arm_compute::S16, arm_compute::S32, arm_compute::SATURATE, arm_compute::test::validation::scale, arm_compute::set_shape_if_empty(), ITensorInfo::tensor_shape(), and arm_compute::U8.

1479 {
1480  ARM_COMPUTE_UNUSED(rounding_policy);
1481  ARM_COMPUTE_ERROR_ON_NULLPTR(src1, src2, dst);
1482 
1483  ARM_COMPUTE_ERROR_THROW_ON(validate_arguments(src1, src2, dst, scale, overflow_policy, rounding_policy));
1484 
1485  const TensorShape &out_shape = TensorShape::broadcast_shape(src1->tensor_shape(), src2->tensor_shape());
1486 
1487  // Auto initialize dst if not initialized
1488  set_shape_if_empty(*dst, out_shape);
1489 
1490  _scale = scale;
1491  _scale_exponent = 0;
1492  _func_quantized = nullptr;
1493  _func_int = nullptr;
1494  _func_float = nullptr;
1495 
1496  bool is_scale_255 = false;
1497  // Check and validate scaling factor
1498  if(std::abs(scale - scale255_constant) < 0.00001f)
1499  {
1500  is_scale_255 = true;
1501  }
1502  else
1503  {
1504  int exponent = 0;
1505 
1506  std::frexp(scale, &exponent);
1507 
1508  // Store the positive exponent. We know that we compute 1/2^n
1509  // Additionally we need to subtract 1 to compensate that frexp used a mantissa of 0.5
1510  _scale_exponent = std::abs(exponent - 1);
1511  }
1512 
1513  const DataType dt_input1 = src1->data_type();
1514  const DataType dt_input2 = src2->data_type();
1515  const DataType dt_output = dst->data_type();
1516  const bool is_sat = (overflow_policy == ConvertPolicy::SATURATE);
1517 
1518  switch(dt_input1)
1519  {
1520  case DataType::QASYMM8:
1521  if(dt_input2 == DataType::QASYMM8 && dt_output == DataType::QASYMM8)
1522  {
1523  _func_quantized = &mul_saturate_quantized_8<uint8_t>;
1524  }
1525  break;
1526  case DataType::QASYMM8_SIGNED:
1527  if(dt_input2 == DataType::QASYMM8_SIGNED)
1528  {
1529  _func_quantized = &mul_saturate_quantized_8<int8_t>;
1530  ;
1531  }
1532  break;
1533  case DataType::QSYMM16:
1534  if(dt_input2 == DataType::QSYMM16 && dt_output == DataType::QSYMM16)
1535  {
1536  _func_quantized = &mul_saturate_QSYMM16_QSYMM16_QSYMM16;
1537  }
1538  else if(dt_input2 == DataType::QSYMM16 && dt_output == DataType::S32)
1539  {
1540  _func_int = &mul_QSYMM16_QSYMM16_S32;
1541  }
1542  break;
1543  case DataType::S16:
1544  if(DataType::U8 == dt_input2 && DataType::S16 == dt_output)
1545  {
1546  if(is_scale_255)
1547  {
1548  _func_int = is_sat ? &mul_S16_U8_S16<true, true> : &mul_S16_U8_S16<true, false>;
1549  }
1550  else
1551  {
1552  _func_int = is_sat ? &mul_S16_U8_S16<false, true> : &mul_S16_U8_S16<false, false>;
1553  }
1554  }
1555  if(DataType::S16 == dt_input2 && DataType::S16 == dt_output)
1556  {
1557  if(is_scale_255)
1558  {
1559  _func_int = is_sat ? &mul_S16_S16_S16<true, true> : &mul_S16_S16_S16<true, false>;
1560  }
1561  else
1562  {
1563  _func_int = is_sat ? &mul_S16_S16_S16<false, true> : &mul_S16_S16_S16<false, false>;
1564  }
1565  }
1566  break;
1567  case DataType::S32:
1568  if(DataType::S32 == dt_input2 && DataType::S32 == dt_output)
1569  {
1570  _func_int = is_sat ? &mul_S32_S32_S32<true> : &mul_S32_S32_S32<false>;
1571  }
1572  break;
1573  case DataType::U8:
1574  if(DataType::U8 == dt_input2 && DataType::U8 == dt_output)
1575  {
1576  if(is_scale_255)
1577  {
1578  _func_int = is_sat ? &mul_U8_U8_U8<true, true> : &mul_U8_U8_U8<true, false>;
1579  }
1580  else
1581  {
1582  _func_int = is_sat ? &mul_U8_U8_U8<false, true> : &mul_U8_U8_U8<false, false>;
1583  }
1584  }
1585  else if(DataType::U8 == dt_input2 && DataType::S16 == dt_output)
1586  {
1587  if(is_scale_255)
1588  {
1589  _func_int = is_sat ? &mul_U8_U8_S16<true, true> : &mul_U8_U8_S16<true, false>;
1590  }
1591  else
1592  {
1593  _func_int = is_sat ? &mul_U8_U8_S16<false, true> : &mul_U8_U8_S16<false, false>;
1594  }
1595  }
1596  else if(DataType::S16 == dt_input2 && DataType::S16 == dt_output)
1597  {
1598  if(is_scale_255)
1599  {
1600  _func_int = is_sat ? &mul_U8_S16_S16<true, true> : &mul_U8_S16_S16<true, false>;
1601  }
1602  else
1603  {
1604  _func_int = is_sat ? &mul_U8_S16_S16<false, true> : &mul_U8_S16_S16<false, false>;
1605  }
1606  }
1607  break;
1608 #ifdef __ARM_FEATURE_FP16_VECTOR_ARITHMETIC
1609  case DataType::F16:
1610  _func_float = &mul_F16_F16_F16;
1611  break;
1612 #endif /* __ARM_FEATURE_FP16_VECTOR_ARITHMETIC */
1613  case DataType::F32:
1614  _func_float = &mul_F32_F32_F32;
1615  break;
1616  default:
1617  ARM_COMPUTE_ERROR("You called with the wrong img formats");
1618  }
1619 
1620  // Configure kernel window
1621  Window win = calculate_max_window(out_shape);
1622 
1623  ICpuKernel::configure(win);
1624 }
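
A hedged usage sketch of configure() follows; the tensor shapes, the 1/8 scale and the internal header path are illustrative assumptions and may differ between releases:

  #include "arm_compute/core/TensorInfo.h"
  #include "arm_compute/core/Types.h"
  #include "src/cpu/kernels/CpuMulKernel.h" // internal header; its location varies per release

  using namespace arm_compute;

  // (F32,F32) -> F32 supports broadcasting and any scale of the form 1/2^n.
  TensorInfo src1(TensorShape(32U, 16U), 1, DataType::F32);
  TensorInfo src2(TensorShape(32U, 16U), 1, DataType::F32);
  TensorInfo dst(TensorShape(32U, 16U), 1, DataType::F32);

  cpu::kernels::CpuMulKernel mul;
  mul.configure(&src1, &src2, &dst, 1.0f / 8.0f, ConvertPolicy::SATURATE, RoundingPolicy::TO_ZERO);
  // Internally std::frexp(0.125f) returns an exponent of -2, so _scale_exponent = |(-2) - 1| = 3,
  // i.e. the product is shifted right by 3 bits on the integer paths.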

◆ name()

const char * name ( ) const
overridevirtual

Name of the kernel.
◆ run_op()

void run_op ( ITensorPack &  tensors,
const Window &  window,
const ThreadInfo &  info 
)
overridevirtual

Execute the kernel on the passed window.

Warning
If is_parallelisable() returns false then the passed window must be equal to window()
Note
The window has to be a region within the window returned by the window() method
The width of the window has to be a multiple of num_elems_processed_per_iteration().
Parameters
[in]  tensors  A vector containing the tensors to operate on.
[in]  window   Region on which to execute the kernel. (Must be a region of the window returned by window())
[in]  info     Info about executing thread and CPU.

Reimplemented from ICPPKernel.

Definition at line 1635 of file CpuMulKernel.cpp.

References arm_compute::ACL_DST, arm_compute::ACL_SRC_0, arm_compute::ACL_SRC_1, ARM_COMPUTE_ERROR_ON, ARM_COMPUTE_ERROR_ON_INVALID_SUBWINDOW, ARM_COMPUTE_ERROR_ON_UNCONFIGURED_KERNEL, ARM_COMPUTE_UNUSED, arm_compute::test::validation::dst, ITensorPack::get_const_tensor(), ITensorPack::get_tensor(), and IKernel::window().

1636 {
1637  ARM_COMPUTE_UNUSED(info);
1638  ARM_COMPUTE_ERROR_ON_UNCONFIGURED_KERNEL(this);
1639  ARM_COMPUTE_ERROR_ON_INVALID_SUBWINDOW(IKernel::window(), window);
1640 
1641  auto src1 = tensors.get_const_tensor(TensorType::ACL_SRC_0);
1642  auto src2 = tensors.get_const_tensor(TensorType::ACL_SRC_1);
1643  auto dst = tensors.get_tensor(TensorType::ACL_DST);
1644 
1645  if(_func_quantized != nullptr)
1646  {
1647  (*_func_quantized)(src1, src2, dst, window, _scale);
1648  }
1649  else if(_func_int != nullptr)
1650  {
1651  (*_func_int)(src1, src2, dst, window, _scale_exponent);
1652  }
1653  else
1654  {
1655  ARM_COMPUTE_ERROR_ON(_func_float == nullptr);
1656  (*_func_float)(src1, src2, dst, window, _scale);
1657  }
1658 }
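
A minimal sketch of dispatching the configured kernel on its full window. The Tensor objects, the single-threaded ThreadInfo and the reuse of the infos from the configure() sketch above are assumptions:

  #include "arm_compute/core/ITensorPack.h"
  #include "arm_compute/runtime/Tensor.h"

  Tensor t_src1, t_src2, t_dst;
  t_src1.allocator()->init(src1); // src1/src2/dst are the TensorInfo objects used in configure()
  t_src2.allocator()->init(src2);
  t_dst.allocator()->init(dst);
  t_src1.allocator()->allocate();
  t_src2.allocator()->allocate();
  t_dst.allocator()->allocate();

  ITensorPack pack;
  pack.add_const_tensor(TensorType::ACL_SRC_0, &t_src1);
  pack.add_const_tensor(TensorType::ACL_SRC_1, &t_src2);
  pack.add_tensor(TensorType::ACL_DST, &t_dst);

  ThreadInfo info{}; // single-threaded execution; a scheduler would normally fill this in
  mul.run_op(pack, mul.window(), info); // window() is the maximum window set by configure()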

◆ validate()

Status validate ( const ITensorInfo *  src1,
const ITensorInfo *  src2,
const ITensorInfo *  dst,
float  scale,
ConvertPolicy  overflow_policy,
RoundingPolicy  rounding_policy 
)
static

Static function to check if given info will lead to a valid configuration.

Similar to CpuMulKernel::configure()

Returns
a status

Definition at line 1626 of file CpuMulKernel.cpp.

References ARM_COMPUTE_ERROR_ON_NULLPTR, and ARM_COMPUTE_RETURN_ON_ERROR.

Referenced by CpuMul::validate().

1628 {
1629  ARM_COMPUTE_ERROR_ON_NULLPTR(src1, src2, dst);
1630  ARM_COMPUTE_RETURN_ON_ERROR(validate_arguments(src1, src2, dst, scale, overflow_policy, rounding_policy));
1631 
1632  return Status{};
1633 }
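
A short sketch of checking a configuration before committing to it; the arguments reuse the assumed TensorInfo objects from the configure() example above:

  #include <iostream>

  const Status st = cpu::kernels::CpuMulKernel::validate(&src1, &src2, &dst, 1.0f / 8.0f,
                                                          ConvertPolicy::SATURATE, RoundingPolicy::TO_ZERO);
  if(st.error_code() != ErrorCode::OK)
  {
      std::cerr << st.error_description() << std::endl;
  }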

The documentation for this class was generated from the following files:

CpuMulKernel.h
CpuMulKernel.cpp