Compute Library
 21.02
NEGEMMLowpMatrixAReductionKernel Class Reference

Neon kernel used to compute the row-vectors of sums of all the entries in each row of Matrix A.

#include <NEGEMMLowpReductionKernel.h>

Public Member Functions

const char * name () const override
 Name of the kernel.
 
 NEGEMMLowpMatrixAReductionKernel ()=default
 Default constructor.
 
 NEGEMMLowpMatrixAReductionKernel (const NEGEMMLowpMatrixAReductionKernel &)=delete
 Prevent instances of this class from being copied.
 
NEGEMMLowpMatrixAReductionKernel & operator= (const NEGEMMLowpMatrixAReductionKernel &)=delete
 Prevent instances of this class from being copied.
 
 NEGEMMLowpMatrixAReductionKernel (NEGEMMLowpMatrixAReductionKernel &&)=default
 Allow instances of this class to be moved.
 
NEGEMMLowpMatrixAReductionKernel & operator= (NEGEMMLowpMatrixAReductionKernel &&)=default
 Allow instances of this class to be moved.
 
 ~NEGEMMLowpMatrixAReductionKernel ()=default
 Default destructor.
 
void configure (const ITensor *mtx_a, ITensor *vector_sum_row, const GEMMLowpReductionKernelInfo &info) override
 Initialise the kernel's input and output.
 
void run (const Window &window, const ThreadInfo &info) override
 Execute the kernel on the passed window.
 
Public Member Functions inherited from INEGEMMLowpReductionKernel
 INEGEMMLowpReductionKernel ()
 Constructor.
 
 INEGEMMLowpReductionKernel (const INEGEMMLowpReductionKernel &)=delete
 Prevent instances of this class from being copied (as this class contains pointers).
 
INEGEMMLowpReductionKernel & operator= (const INEGEMMLowpReductionKernel &)=delete
 Prevent instances of this class from being copied (as this class contains pointers).
 
 INEGEMMLowpReductionKernel (INEGEMMLowpReductionKernel &&)=default
 Allow instances of this class to be moved.
 
INEGEMMLowpReductionKernel & operator= (INEGEMMLowpReductionKernel &&)=default
 Allow instances of this class to be moved.
 
virtual ~INEGEMMLowpReductionKernel ()=default
 Default destructor.
 
Public Member Functions inherited from ICPPKernel
virtual ~ICPPKernel ()=default
 Default destructor.
 
virtual void run_nd (const Window &window, const ThreadInfo &info, const Window &thread_locator)
 Legacy compatibility layer for implementations which do not support thread_locator. In these cases we simply narrow the interface down to the legacy version.
 
virtual void run_op (ITensorPack &tensors, const Window &window, const ThreadInfo &info)
 Execute the kernel on the passed window.
 
Public Member Functions inherited from IKernel
 IKernel ()
 Constructor.
 
virtual ~IKernel ()=default
 Destructor.
 
virtual bool is_parallelisable () const
 Indicates whether or not the kernel is parallelisable.
 
virtual BorderSize border_size () const
 The size of the border for that kernel.
 
const Window & window () const
 The maximum window the kernel can be executed on.
 

Static Public Member Functions

static Status validate (const ITensorInfo *mtx_a, const ITensorInfo *vector_sum_row, const GEMMLowpReductionKernelInfo &info)
 Static function to check if given info will lead to a valid configuration of NEGEMMLowpMatrixAReductionKernel.
 

Detailed Description

Neon kernel used to compute the row-vectors of sums of all the entries in each row of Matrix A.

Note
This stage is needed to handle the offset of the matrix product. See https://github.com/google/gemmlowp/blob/master/doc/low-precision.md
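As a minimal sketch of why the row sums are needed, assume the common affine convention in which a quantised value q stands for scale * (q - zero_point). Expanding the sum over p of (a[i][p] - za) * (b[p][j] - zb) gives the raw integer product, minus zb times the row sum of A, minus za times the column sum of B, plus K * za * zb. The function names, the row-major layout and the sign convention of the zero points below are assumptions made for illustration; this is not the library's implementation.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical reference model: A is m x k and B is k x n, both row-major uint8.

// Direct form: subtract the zero points (za, zb) before multiplying.
std::vector<int32_t> gemm_direct(const std::vector<uint8_t> &a, const std::vector<uint8_t> &b,
                                 int m, int k, int n, int32_t za, int32_t zb)
{
    std::vector<int32_t> c(static_cast<size_t>(m) * n, 0);
    for(int i = 0; i < m; ++i)
        for(int j = 0; j < n; ++j)
            for(int p = 0; p < k; ++p)
                c[i * n + j] += (a[i * k + p] - za) * (b[p * n + j] - zb);
    return c;
}

// Factored form: raw integer product plus corrections built from the
// row sums of A and the column sums of B.
std::vector<int32_t> gemm_with_reductions(const std::vector<uint8_t> &a, const std::vector<uint8_t> &b,
                                          int m, int k, int n, int32_t za, int32_t zb)
{
    std::vector<int32_t> row_sum_a(static_cast<size_t>(m), 0);
    std::vector<int32_t> col_sum_b(static_cast<size_t>(n), 0);
    for(int i = 0; i < m; ++i)
        for(int p = 0; p < k; ++p)
            row_sum_a[i] += a[i * k + p];
    for(int p = 0; p < k; ++p)
        for(int j = 0; j < n; ++j)
            col_sum_b[j] += b[p * n + j];

    std::vector<int32_t> c(static_cast<size_t>(m) * n, 0);
    for(int i = 0; i < m; ++i)
    {
        for(int j = 0; j < n; ++j)
        {
            int32_t acc = 0;
            for(int p = 0; p < k; ++p)
                acc += static_cast<int32_t>(a[i * k + p]) * static_cast<int32_t>(b[p * n + j]);
            // Offset contribution: only the row sum of A depends on i.
            c[i * n + j] = acc - zb * row_sum_a[i] - za * col_sum_b[j] + k * za * zb;
        }
    }
    return c;
}
```

Both functions return the same matrix, but the factored form lets the inner product run on raw 8-bit inputs, with the row sums computed once per row of A.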

Definition at line 77 of file NEGEMMLowpReductionKernel.h.

Constructor & Destructor Documentation

◆ NEGEMMLowpMatrixAReductionKernel() [1/3]

Default constructor.

◆ NEGEMMLowpMatrixAReductionKernel() [2/3]

Prevent instances of this class from being copied.

◆ NEGEMMLowpMatrixAReductionKernel() [3/3]

Allow instances of this class to be moved.

◆ ~NEGEMMLowpMatrixAReductionKernel()

Default destructor.

Member Function Documentation

◆ configure()

void configure ( const ITensor * mtx_a,
ITensor * vector_sum_row,
const GEMMLowpReductionKernelInfo & info
)
override virtual

Initialise the kernel's input and output.

Parameters
[in]  mtx_a  Input tensor. Data type supported: QASYMM8/QASYMM8_SIGNED/QSYMM8/QSYMM8_PER_CHANNEL
[out]  vector_sum_row  Output row-vector of sums of all the entries in each row of mtx_a. Data type supported: S32
[in]  info  Kernel metadata:
  • k (num_mtx_a_cols) Number of matrix A columns.
  • is_reshaped (is_interleaved4x4) True if the matrix A has been interleaved4x4.
  • scalar Scalar value to multiply each reduced row by.
  • mul_by_scalar True if each reduced row must be multiplied by a scalar value.
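The effect of k, scalar and mul_by_scalar can be illustrated with a scalar reference model. The struct and function below are hypothetical stand-ins whose field names mirror the documented metadata; they assume a row-major, non-interleaved input and are a sketch, not the Neon implementation.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the kernel metadata described above.
struct ReductionInfoSketch
{
    int32_t k;             // number of columns of matrix A
    bool    is_reshaped;   // interleaved input (not handled by this sketch)
    int32_t scalar;        // multiplier applied to each reduced row
    bool    mul_by_scalar; // whether the multiplier is applied
};

// Sums each row of a row-major (rows x info.k) matrix into a 32-bit accumulator.
template <typename T>
std::vector<int32_t> reduce_rows(const std::vector<T> &mtx_a, int rows, const ReductionInfoSketch &info)
{
    std::vector<int32_t> vector_sum_row(static_cast<size_t>(rows), 0);
    for(int r = 0; r < rows; ++r)
    {
        int32_t sum = 0;
        for(int32_t c = 0; c < info.k; ++c)
        {
            sum += static_cast<int32_t>(mtx_a[static_cast<size_t>(r) * info.k + c]);
        }
        // The scalar is applied once per reduced row, not per element.
        vector_sum_row[r] = info.mul_by_scalar ? sum * info.scalar : sum;
    }
    return vector_sum_row;
}
```

The output has one 32-bit entry per row of the input, which is why the 8-bit inputs are paired with an S32 output.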

Implements INEGEMMLowpReductionKernel.

Definition at line 69 of file NEGEMMLowpReductionKernel.cpp.

References ARM_COMPUTE_ERROR_ON_MSG, ARM_COMPUTE_ERROR_ON_NULLPTR, ARM_COMPUTE_ERROR_THROW_ON, arm_compute::auto_init_if_empty(), arm_compute::calculate_max_window(), ITensorInfo::dimension(), ITensor::info(), GEMMLowpReductionKernelInfo::is_reshaped, GEMMLowpReductionKernelInfo::k, GEMMLowpReductionKernelInfo::mul_by_scalar, arm_compute::S32, GEMMLowpReductionKernelInfo::scalar, ITensorInfo::set_valid_region(), and ITensorInfo::tensor_shape().

{
    // Perform validate step
    ARM_COMPUTE_ERROR_ON_NULLPTR(mtx_a, vector_sum_row);
    ARM_COMPUTE_ERROR_ON_MSG(info.is_reshaped == true, "Not supported");
    ARM_COMPUTE_ERROR_THROW_ON(validate_arguments_matrix_a_reduction(mtx_a->info(), vector_sum_row->info()));
    _input         = mtx_a;
    _output        = vector_sum_row;
    _k             = info.k;
    _scalar        = info.scalar;
    _mul_by_scalar = info.mul_by_scalar;

    // Output auto initialization if not yet initialized
    auto_init_if_empty(*_output->info(), TensorShape(_input->info()->dimension(1)), 1, DataType::S32);

    Window win = calculate_max_window(*_output->info(), Steps(1));
    _output->info()->set_valid_region(ValidRegion(Coordinates(), _output->info()->tensor_shape()));

    INEKernel::configure(win);
}

◆ name()

const char* name ( ) const
inline override virtual

Name of the kernel.

Returns
Kernel name

Implements ICPPKernel.

Definition at line 80 of file NEGEMMLowpReductionKernel.h.

References INEGEMMLowpReductionKernel::configure(), arm_compute::test::validation::info, INEGEMMLowpReductionKernel::operator=(), ICPPKernel::run(), arm_compute::validate(), and IKernel::window().

{
    return "NEGEMMLowpMatrixAReductionKernel";
}

◆ operator=() [1/2]

Prevent instances of this class from being copied.

◆ operator=() [2/2]

Allow instances of this class to be moved.

◆ run()

void run ( const Window & window,
const ThreadInfo & info
)
override virtual

Execute the kernel on the passed window.

Warning
If is_parallelisable() returns false then the passed window must be equal to window()
Note
The window has to be a region within the window returned by the window() method
The width of the window has to be a multiple of num_elems_processed_per_iteration().
Parameters
[in]  window  Region on which to execute the kernel. (Must be a region of the window returned by window())
[in]  info  Info about executing thread and CPU.
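The requirement that each passed window is a region of window() can be pictured as splitting the full row range into contiguous sub-ranges and running the reduction on each one. The helpers below are hypothetical and use plain index ranges instead of Window objects; they are a sketch of the idea under that assumption, not the library's scheduler.

```cpp
#include <algorithm>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper: splits the half-open row range [0, total) into at most
// num_parts contiguous sub-ranges, each lying inside the full range.
std::vector<std::pair<int, int>> split_rows(int total, int num_parts)
{
    std::vector<std::pair<int, int>> parts;
    num_parts       = std::max(num_parts, 1);
    const int chunk = (total + num_parts - 1) / num_parts; // ceiling division
    for(int start = 0; start < total; start += chunk)
    {
        parts.emplace_back(start, std::min(start + chunk, total));
    }
    return parts;
}

// Sums rows [begin, end) of a row-major (rows x k) matrix into out.
// Each call writes only the output entries of its own sub-range.
void sum_rows_in_range(const std::vector<uint8_t> &a, int k, int begin, int end, std::vector<int32_t> &out)
{
    for(int r = begin; r < end; ++r)
    {
        int32_t sum = 0;
        for(int c = 0; c < k; ++c)
        {
            sum += a[r * k + c];
        }
        out[r] = sum;
    }
}
```

Because the sub-ranges are disjoint and each row is reduced independently, processing them one after another (or on separate threads) gives the same result as a single pass over the full range.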

Reimplemented from ICPPKernel.

Definition at line 165 of file NEGEMMLowpReductionKernel.cpp.

References ARM_COMPUTE_ERROR, ARM_COMPUTE_ERROR_ON_INVALID_SUBWINDOW, ARM_COMPUTE_ERROR_ON_UNCONFIGURED_KERNEL, ARM_COMPUTE_UNUSED, ITensorInfo::data_type(), ITensor::info(), arm_compute::QASYMM8, arm_compute::QASYMM8_SIGNED, arm_compute::QSYMM8, arm_compute::QSYMM8_PER_CHANNEL, and IKernel::window().

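{
    ARM_COMPUTE_UNUSED(info);
    ARM_COMPUTE_ERROR_ON_UNCONFIGURED_KERNEL(this);
    ARM_COMPUTE_ERROR_ON_INVALID_SUBWINDOW(INEKernel::window(), window);

    switch(_input->info()->data_type())
    {
        case DataType::QASYMM8:
            run_internal<uint8_t>(window);
            break;
        case DataType::QASYMM8_SIGNED:
        case DataType::QSYMM8:
        case DataType::QSYMM8_PER_CHANNEL:
            run_internal<int8_t>(window);
            break;
        default:
            ARM_COMPUTE_ERROR("Unsupported data type");
    }
}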
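The dispatch on data type matters because the same byte sums differently depending on signedness. The sketch below assumes that the three signed types share the int8_t path and that only QASYMM8 is read as uint8_t; the enum and function names are hypothetical mirrors made up for illustration, not the library's types.

```cpp
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical mirror of the 8-bit quantised types named on this page.
enum class DataTypeSketch { QASYMM8, QASYMM8_SIGNED, QSYMM8, QSYMM8_PER_CHANNEL, S32 };

// Sums raw bytes after reinterpreting each one as T (uint8_t or int8_t).
template <typename T>
int32_t sum_as(const std::vector<uint8_t> &bytes)
{
    int32_t sum = 0;
    for(uint8_t byte : bytes)
    {
        T value;
        std::memcpy(&value, &byte, sizeof(T));
        sum += value;
    }
    return sum;
}

// Selects the element type at run time, then calls the typed template.
int32_t sum_row(const std::vector<uint8_t> &bytes, DataTypeSketch type)
{
    switch(type)
    {
        case DataTypeSketch::QASYMM8:
            return sum_as<uint8_t>(bytes);
        case DataTypeSketch::QASYMM8_SIGNED:
        case DataTypeSketch::QSYMM8:
        case DataTypeSketch::QSYMM8_PER_CHANNEL:
            return sum_as<int8_t>(bytes);
        default:
            throw std::runtime_error("Unsupported data type");
    }
}
```

For the bytes 0xFF and 0x01, the unsigned path sums 255 + 1 while the signed path sums -1 + 1, so picking the wrong instantiation would silently corrupt the offset correction.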

◆ validate()

Status validate ( const ITensorInfo * mtx_a,
const ITensorInfo * vector_sum_row,
const GEMMLowpReductionKernelInfo & info
)
static

Static function to check if given info will lead to a valid configuration of NEGEMMLowpMatrixAReductionKernel.

Parameters
[in]  mtx_a  Input tensor. Data type supported: QASYMM8/QASYMM8_SIGNED/QSYMM8/QSYMM8_PER_CHANNEL
[in]  vector_sum_row  Output row-vector of sums of all the entries in each row of mtx_a. Data type supported: S32
[in]  info  Kernel metadata:
  • k (num_mtx_a_cols) Number of matrix A columns.
  • is_reshaped (is_interleaved4x4) True if the matrix A has been interleaved4x4.
  • scalar Scalar value to multiply each reduced row by.
  • mul_by_scalar True if each reduced row must be multiplied by a scalar value.
Returns
a status

Definition at line 90 of file NEGEMMLowpReductionKernel.cpp.

References ARM_COMPUTE_RETURN_ON_ERROR, ARM_COMPUTE_UNUSED, Window::collapse_if_possible(), Window::DimX, Window::DimY, Window::DimZ, arm_compute::execute_window_loop(), ITensor::info(), Iterator::ptr(), Window::set(), ITensorInfo::strides_in_bytes(), arm_compute::wrapper::vadd(), arm_compute::wrapper::vaddl(), arm_compute::wrapper::vdup_n(), arm_compute::wrapper::vgethigh(), arm_compute::wrapper::vgetlane(), arm_compute::wrapper::vgetlow(), arm_compute::wrapper::vloadq(), arm_compute::wrapper::vpadd(), arm_compute::wrapper::vpaddl(), IKernel::window(), and Dimensions< T >::y().

Referenced by NEGEMMLowpMatrixMultiplyCore::validate(), and NEQLSTMLayer::validate().

{
    ARM_COMPUTE_UNUSED(info);
    ARM_COMPUTE_RETURN_ON_ERROR(validate_arguments_matrix_a_reduction(mtx_a, vector_sum_row));
    return Status{};
}
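The body of validate_arguments_matrix_a_reduction is not shown on this page. As a hedged sketch of the kind of checks the parameter documentation implies (a supported 8-bit input type, an S32 output, and one output entry per input row), the model below returns a status value instead of throwing. All types and names here are hypothetical, and the exact checks performed by the library may differ.

```cpp
#include <cstddef>
#include <string>

// Hypothetical, simplified stand-ins for the library's data type, tensor info and status.
enum class DataTypeSketch { QASYMM8, QASYMM8_SIGNED, QSYMM8, QSYMM8_PER_CHANNEL, S32, F32 };

struct TensorInfoSketch
{
    DataTypeSketch data_type;
    size_t         width;       // dimension 0
    size_t         height;      // dimension 1
    bool           initialised; // false if the output is still empty
};

struct StatusSketch
{
    bool        ok;
    std::string message;
};

// Checks a candidate configuration without touching any tensor memory.
StatusSketch validate_sketch(const TensorInfoSketch &mtx_a, const TensorInfoSketch &vector_sum_row)
{
    switch(mtx_a.data_type)
    {
        case DataTypeSketch::QASYMM8:
        case DataTypeSketch::QASYMM8_SIGNED:
        case DataTypeSketch::QSYMM8:
        case DataTypeSketch::QSYMM8_PER_CHANNEL:
            break;
        default:
            return { false, "mtx_a must be an 8-bit quantised type" };
    }
    // An empty output is accepted here because configure() auto-initialises it.
    if(vector_sum_row.initialised)
    {
        if(vector_sum_row.data_type != DataTypeSketch::S32)
        {
            return { false, "vector_sum_row must be S32" };
        }
        if(vector_sum_row.width != mtx_a.height)
        {
            return { false, "vector_sum_row needs one entry per row of mtx_a" };
        }
    }
    return { true, "" };
}
```

Returning a status rather than throwing lets a caller probe a configuration before committing to it, which matches the static, info-only signature of validate().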

The documentation for this class was generated from the following files: