Compute Library
 21.08
CpuMulKernel Class Reference

Interface for the kernel to perform multiplication between two tensors. More...

#include <CpuMulKernel.h>

Collaboration diagram for CpuMulKernel:

Public Member Functions

 CpuMulKernel ()=default
 
 ARM_COMPUTE_DISALLOW_COPY_ALLOW_MOVE (CpuMulKernel)
 
void configure (ITensorInfo *src1, ITensorInfo *src2, ITensorInfo *dst, float scale, ConvertPolicy overflow_policy, RoundingPolicy rounding_policy)
 Initialise the kernel's input, dst and border mode. More...
 
void run_op (ITensorPack &tensors, const Window &window, const ThreadInfo &info) override
 Execute the kernel on the passed window. More...
 
const char * name () const override
 Name of the kernel. More...
 
- Public Member Functions inherited from ICPPKernel
virtual ~ICPPKernel ()=default
 Default destructor. More...
 
virtual void run (const Window &window, const ThreadInfo &info)
 Execute the kernel on the passed window. More...
 
virtual void run_nd (const Window &window, const ThreadInfo &info, const Window &thread_locator)
 Legacy compatibility layer for implementations that do not support thread_locator; in these cases we simply narrow the interface down to the legacy version. More...
 
- Public Member Functions inherited from IKernel
 IKernel ()
 Constructor. More...
 
virtual ~IKernel ()=default
 Destructor. More...
 
virtual bool is_parallelisable () const
 Indicates whether or not the kernel is parallelisable. More...
 
virtual BorderSize border_size () const
 The size of the border for that kernel. More...
 
const Window & window () const
 The maximum window the kernel can be executed on. More...
 
bool is_window_configured () const
 Function to check if the embedded window of this kernel has been configured. More...
 

Static Public Member Functions

static Status validate (const ITensorInfo *src1, const ITensorInfo *src2, const ITensorInfo *dst, float scale, ConvertPolicy overflow_policy, RoundingPolicy rounding_policy)
 Static function to check if given info will lead to a valid configuration. More...
 

Detailed Description

Interface for the kernel to perform multiplication between two tensors.

Definition at line 37 of file CpuMulKernel.h.
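
As a point of reference only, the per-element behaviour of one documented path can be sketched in scalar C++. This is not the library's vectorised implementation; the function name and the (S16,S16) -> S16 / SATURATE / round-to-zero choices are illustrative assumptions:

  #include <algorithm>
  #include <cmath>
  #include <cstdint>

  // Scalar sketch of dst = saturate(round_to_zero(src1 * src2 * scale))
  // for the (S16,S16) -> S16 path with ConvertPolicy::SATURATE.
  int16_t mul_s16_reference(int16_t a, int16_t b, float scale)
  {
      const float product = static_cast<float>(a) * static_cast<float>(b) * scale;
      const float rounded = std::trunc(product); // round-to-zero, see the Note in configure()
      const float clamped = std::clamp(rounded, -32768.0f, 32767.0f);
      return static_cast<int16_t>(clamped);
  }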

Constructor & Destructor Documentation

◆ CpuMulKernel()

CpuMulKernel ( )
default

Member Function Documentation

◆ ARM_COMPUTE_DISALLOW_COPY_ALLOW_MOVE()

ARM_COMPUTE_DISALLOW_COPY_ALLOW_MOVE ( CpuMulKernel  )

◆ configure()

void configure ( ITensorInfo *  src1,
ITensorInfo *  src2,
ITensorInfo *  dst,
float  scale,
ConvertPolicy  overflow_policy,
RoundingPolicy  rounding_policy 
)

Initialise the kernel's input, dst and border mode.

Valid configurations (Src1, Src2) -> Dst:

  Src1             Src2             Dst               Broadcast?   Scale = 1/255?
  U8               U8               U8, S16           N            Y
  U8               S16              S16               N            Y
  S16              U8               S16               N            Y
  S16              S16              S16               N            Y
  S32              S32              S32               Y            N
  F16              F16              F16               N            Y
  F32              F32              F32               Y            Y
  QASYMM8          QASYMM8          QASYMM8           Y            Y
  QASYMM8_SIGNED   QASYMM8_SIGNED   QASYMM8_SIGNED    Y            Y
  QSYMM16          QSYMM16          QSYMM16, S32      N            Y
Note
For a scale equal to 1/255, only round-to-nearest-even (implemented as round half up) is supported. For all other scale values, only round-to-zero (implemented as round towards minus infinity) is supported.
Parameters
[in]  src1             First input tensor. Data types supported: U8/QASYMM8/QASYMM8_SIGNED/S16/S32/QSYMM16/F16/F32
[in]  src2             Second input tensor. Data types supported: U8/QASYMM8/QASYMM8_SIGNED/S16/S32/QSYMM16/F16/F32
[out] dst              Dst tensor. Data types supported: U8/QASYMM8/QASYMM8_SIGNED/S16/S32/QSYMM16/F16/F32
[in]  scale            Scale to apply after multiplication. Scale must be positive and its value must be either 1/255 or 1/2^n where n is between 0 and 15. If src1, src2 and dst are all of data type S32, scale cannot be 1/255.
[in]  overflow_policy  Overflow policy. ConvertPolicy cannot be WRAP if any of the inputs is of quantized data type.
[in]  rounding_policy  Rounding policy.

Definition at line 1478 of file CpuMulKernel.cpp.

References ARM_COMPUTE_ERROR, ARM_COMPUTE_ERROR_ON_NULLPTR, ARM_COMPUTE_ERROR_THROW_ON, ARM_COMPUTE_UNUSED, TensorShape::broadcast_shape(), arm_compute::calculate_max_window(), ITensorInfo::data_type(), arm_compute::F16, arm_compute::F32, arm_compute::QASYMM8, arm_compute::QASYMM8_SIGNED, arm_compute::QSYMM16, arm_compute::S16, arm_compute::S32, arm_compute::SATURATE, arm_compute::test::validation::scale, arm_compute::set_shape_if_empty(), ITensorInfo::tensor_shape(), and arm_compute::U8.

1479 {
1480  ARM_COMPUTE_UNUSED(rounding_policy);
1481  ARM_COMPUTE_ERROR_ON_NULLPTR(src1, src2, dst);
1482 
1483  ARM_COMPUTE_ERROR_THROW_ON(validate_arguments(src1, src2, dst, scale, overflow_policy, rounding_policy));
1484 
1485  const TensorShape &out_shape = TensorShape::broadcast_shape(src1->tensor_shape(), src2->tensor_shape());
1486 
1487  // Auto initialize dst if not initialized
1488  set_shape_if_empty(*dst, out_shape);
1489 
1490  _scale = scale;
1491  _scale_exponent = 0;
1492  _func_quantized = nullptr;
1493  _func_int = nullptr;
1494  _func_float = nullptr;
1495 
1496  bool is_scale_255 = false;
1497  // Check and validate scaling factor
1498  if(std::abs(scale - scale255_constant) < 0.00001f)
1499  {
1500  is_scale_255 = true;
1501  }
1502  else
1503  {
1504  int exponent = 0;
1505 
1506  std::frexp(scale, &exponent);
1507 
1508  // Store the positive exponent. We know that we compute 1/2^n
1509  // Additionally we need to subtract 1 to compensate that frexp used a mantissa of 0.5
1510  _scale_exponent = std::abs(exponent - 1);
1511  }
1512 
1513  const DataType dt_input1 = src1->data_type();
1514  const DataType dt_input2 = src2->data_type();
1515  const DataType dt_output = dst->data_type();
1516  const bool is_sat = (overflow_policy == ConvertPolicy::SATURATE);
1517 
1518  switch(dt_input1)
1519  {
1520  case DataType::QASYMM8:
1521  if(dt_input2 == DataType::QASYMM8 && dt_output == DataType::QASYMM8)
1522  {
1523  _func_quantized = &mul_saturate_quantized_8<uint8_t>;
1524  }
1525  break;
1526  case DataType::QASYMM8_SIGNED:
1527  if(dt_input2 == DataType::QASYMM8_SIGNED)
1528  {
1529  _func_quantized = &mul_saturate_quantized_8<int8_t>;
1530  ;
1531  }
1532  break;
1533  case DataType::QSYMM16:
1534  if(dt_input2 == DataType::QSYMM16 && dt_output == DataType::QSYMM16)
1535  {
1536  _func_quantized = &mul_saturate_QSYMM16_QSYMM16_QSYMM16;
1537  }
1538  else if(dt_input2 == DataType::QSYMM16 && dt_output == DataType::S32)
1539  {
1540  _func_int = &mul_QSYMM16_QSYMM16_S32;
1541  }
1542  break;
1543  case DataType::S16:
1544  if(DataType::U8 == dt_input2 && DataType::S16 == dt_output)
1545  {
1546  if(is_scale_255)
1547  {
1548  _func_int = is_sat ? &mul_S16_U8_S16<true, true> : &mul_S16_U8_S16<true, false>;
1549  }
1550  else
1551  {
1552  _func_int = is_sat ? &mul_S16_U8_S16<false, true> : &mul_S16_U8_S16<false, false>;
1553  }
1554  }
1555  if(DataType::S16 == dt_input2 && DataType::S16 == dt_output)
1556  {
1557  if(is_scale_255)
1558  {
1559  _func_int = is_sat ? &mul_S16_S16_S16<true, true> : &mul_S16_S16_S16<true, false>;
1560  }
1561  else
1562  {
1563  _func_int = is_sat ? &mul_S16_S16_S16<false, true> : &mul_S16_S16_S16<false, false>;
1564  }
1565  }
1566  break;
1567  case DataType::S32:
1568  if(DataType::S32 == dt_input2 && DataType::S32 == dt_output)
1569  {
1570  _func_int = is_sat ? &mul_S32_S32_S32<true> : &mul_S32_S32_S32<false>;
1571  }
1572  break;
1573  case DataType::U8:
1574  if(DataType::U8 == dt_input2 && DataType::U8 == dt_output)
1575  {
1576  if(is_scale_255)
1577  {
1578  _func_int = is_sat ? &mul_U8_U8_U8<true, true> : &mul_U8_U8_U8<true, false>;
1579  }
1580  else
1581  {
1582  _func_int = is_sat ? &mul_U8_U8_U8<false, true> : &mul_U8_U8_U8<false, false>;
1583  }
1584  }
1585  else if(DataType::U8 == dt_input2 && DataType::S16 == dt_output)
1586  {
1587  if(is_scale_255)
1588  {
1589  _func_int = is_sat ? &mul_U8_U8_S16<true, true> : &mul_U8_U8_S16<true, false>;
1590  }
1591  else
1592  {
1593  _func_int = is_sat ? &mul_U8_U8_S16<false, true> : &mul_U8_U8_S16<false, false>;
1594  }
1595  }
1596  else if(DataType::S16 == dt_input2 && DataType::S16 == dt_output)
1597  {
1598  if(is_scale_255)
1599  {
1600  _func_int = is_sat ? &mul_U8_S16_S16<true, true> : &mul_U8_S16_S16<true, false>;
1601  }
1602  else
1603  {
1604  _func_int = is_sat ? &mul_U8_S16_S16<false, true> : &mul_U8_S16_S16<false, false>;
1605  }
1606  }
1607  break;
1608 #ifdef __ARM_FEATURE_FP16_VECTOR_ARITHMETIC
1609  case DataType::F16:
1610  _func_float = &mul_F16_F16_F16;
1611  break;
1612 #endif /* __ARM_FEATURE_FP16_VECTOR_ARITHMETIC */
1613  case DataType::F32:
1614  _func_float = &mul_F32_F32_F32;
1615  break;
1616  default:
1617  ARM_COMPUTE_ERROR("You called with the wrong img formats");
1618  }
1619 
1620  // Configure kernel window
1621  Window win = calculate_max_window(out_shape);
1622 
1623  ICpuKernel::configure(win);
1624 }
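
A hedged usage sketch of configure() follows; the tensor shapes, the 1/8 scale and the internal header path are illustrative assumptions and may differ between releases:

  #include "arm_compute/core/TensorInfo.h"
  #include "arm_compute/core/Types.h"
  #include "src/cpu/kernels/CpuMulKernel.h" // internal header; its location varies per release

  using namespace arm_compute;

  // (F32,F32) -> F32 supports broadcasting and any scale of the form 1/2^n.
  TensorInfo src1(TensorShape(32U, 16U), 1, DataType::F32);
  TensorInfo src2(TensorShape(32U, 16U), 1, DataType::F32);
  TensorInfo dst(TensorShape(32U, 16U), 1, DataType::F32);

  cpu::kernels::CpuMulKernel mul;
  mul.configure(&src1, &src2, &dst, 1.0f / 8.0f, ConvertPolicy::SATURATE, RoundingPolicy::TO_ZERO);
  // Internally std::frexp(0.125f) returns an exponent of -2, so _scale_exponent = |(-2) - 1| = 3,
  // i.e. the product is shifted right by 3 bits on the integer paths.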

◆ name()

const char * name ( ) const
overridevirtual

Name of the kernel.
◆ run_op()

void run_op ( ITensorPack &  tensors,
const Window &  window,
const ThreadInfo &  info 
)
overridevirtual

Execute the kernel on the passed window.

Warning
If is_parallelisable() returns false then the passed window must be equal to window()
Note
The window has to be a region within the window returned by the window() method
The width of the window has to be a multiple of num_elems_processed_per_iteration().
Parameters
[in]  tensors  A vector containing the tensors to operate on.
[in]  window   Region on which to execute the kernel. (Must be a region of the window returned by window())
[in]  info     Info about executing thread and CPU.

Reimplemented from ICPPKernel.

Definition at line 1635 of file CpuMulKernel.cpp.

References arm_compute::ACL_DST, arm_compute::ACL_SRC_0, arm_compute::ACL_SRC_1, ARM_COMPUTE_ERROR_ON, ARM_COMPUTE_ERROR_ON_INVALID_SUBWINDOW, ARM_COMPUTE_ERROR_ON_UNCONFIGURED_KERNEL, ARM_COMPUTE_UNUSED, arm_compute::test::validation::dst, ITensorPack::get_const_tensor(), ITensorPack::get_tensor(), and IKernel::window().

1636 {
1637  ARM_COMPUTE_UNUSED(info);
1638  ARM_COMPUTE_ERROR_ON_UNCONFIGURED_KERNEL(this);
1639  ARM_COMPUTE_ERROR_ON_INVALID_SUBWINDOW(IKernel::window(), window);
1640 
1641  auto src1 = tensors.get_const_tensor(TensorType::ACL_SRC_0);
1642  auto src2 = tensors.get_const_tensor(TensorType::ACL_SRC_1);
1643  auto dst = tensors.get_tensor(TensorType::ACL_DST);
1644 
1645  if(_func_quantized != nullptr)
1646  {
1647  (*_func_quantized)(src1, src2, dst, window, _scale);
1648  }
1649  else if(_func_int != nullptr)
1650  {
1651  (*_func_int)(src1, src2, dst, window, _scale_exponent);
1652  }
1653  else
1654  {
1655  ARM_COMPUTE_ERROR_ON(_func_float == nullptr);
1656  (*_func_float)(src1, src2, dst, window, _scale);
1657  }
1658 }
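
A minimal sketch of dispatching the configured kernel on its full window. The Tensor objects, the single-threaded ThreadInfo and the reuse of the infos from the configure() sketch above are assumptions:

  #include "arm_compute/core/ITensorPack.h"
  #include "arm_compute/runtime/Tensor.h"

  Tensor t_src1, t_src2, t_dst;
  t_src1.allocator()->init(src1); // src1/src2/dst are the TensorInfo objects used in configure()
  t_src2.allocator()->init(src2);
  t_dst.allocator()->init(dst);
  t_src1.allocator()->allocate();
  t_src2.allocator()->allocate();
  t_dst.allocator()->allocate();

  ITensorPack pack;
  pack.add_const_tensor(TensorType::ACL_SRC_0, &t_src1);
  pack.add_const_tensor(TensorType::ACL_SRC_1, &t_src2);
  pack.add_tensor(TensorType::ACL_DST, &t_dst);

  ThreadInfo info{}; // single-threaded execution; a scheduler would normally fill this in
  mul.run_op(pack, mul.window(), info); // window() is the maximum window set by configure()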

◆ validate()

Status validate ( const ITensorInfo *  src1,
const ITensorInfo *  src2,
const ITensorInfo *  dst,
float  scale,
ConvertPolicy  overflow_policy,
RoundingPolicy  rounding_policy 
)
static

Static function to check if given info will lead to a valid configuration.

Similar to CpuMulKernel::configure()

Returns
a status

Definition at line 1626 of file CpuMulKernel.cpp.

References ARM_COMPUTE_ERROR_ON_NULLPTR, and ARM_COMPUTE_RETURN_ON_ERROR.

Referenced by CpuMul::validate().

1628 {
1629  ARM_COMPUTE_ERROR_ON_NULLPTR(src1, src2, dst);
1630  ARM_COMPUTE_RETURN_ON_ERROR(validate_arguments(src1, src2, dst, scale, overflow_policy, rounding_policy));
1631 
1632  return Status{};
1633 }
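
A short sketch of checking a configuration before committing to it; the arguments reuse the assumed TensorInfo objects from the configure() example above:

  #include <iostream>

  const Status st = cpu::kernels::CpuMulKernel::validate(&src1, &src2, &dst, 1.0f / 8.0f,
                                                          ConvertPolicy::SATURATE, RoundingPolicy::TO_ZERO);
  if(st.error_code() != ErrorCode::OK)
  {
      std::cerr << st.error_description() << std::endl;
  }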

The documentation for this class was generated from the following files:

CpuMulKernel.h
CpuMulKernel.cpp