Compute Library
 21.02
NEGEMMLowpMatrixBReductionKernel Class Reference

Neon kernel used to compute the row-vectors of sums of all the entries in each column of Matrix B.

#include <NEGEMMLowpReductionKernel.h>

Public Member Functions

const char * name () const override
 Name of the kernel.
 
 NEGEMMLowpMatrixBReductionKernel ()=default
 Default constructor.
 
 NEGEMMLowpMatrixBReductionKernel (const NEGEMMLowpMatrixBReductionKernel &)=delete
 Prevent instances of this class from being copied (As this class contains pointers)
 
NEGEMMLowpMatrixBReductionKernel & operator= (const NEGEMMLowpMatrixBReductionKernel &)=delete
 Prevent instances of this class from being copied (As this class contains pointers)
 
 NEGEMMLowpMatrixBReductionKernel (NEGEMMLowpMatrixBReductionKernel &&)=default
 Allow instances of this class to be moved.
 
NEGEMMLowpMatrixBReductionKernel & operator= (NEGEMMLowpMatrixBReductionKernel &&)=default
 Allow instances of this class to be moved.
 
 ~NEGEMMLowpMatrixBReductionKernel ()=default
 Default destructor.
 
void configure (const ITensor *mtx_b, ITensor *vector_sum_col, const GEMMLowpReductionKernelInfo &info) override
 Initialise the kernel's input and output.
 
void run (const Window &window, const ThreadInfo &info) override
 Execute the kernel on the passed window.
 
- Public Member Functions inherited from INEGEMMLowpReductionKernel
 INEGEMMLowpReductionKernel ()
 Constructor.
 
 INEGEMMLowpReductionKernel (const INEGEMMLowpReductionKernel &)=delete
 Prevent instances of this class from being copied (As this class contains pointers)
 
INEGEMMLowpReductionKernel & operator= (const INEGEMMLowpReductionKernel &)=delete
 Prevent instances of this class from being copied (As this class contains pointers)
 
 INEGEMMLowpReductionKernel (INEGEMMLowpReductionKernel &&)=default
 Allow instances of this class to be moved.
 
INEGEMMLowpReductionKernel & operator= (INEGEMMLowpReductionKernel &&)=default
 Allow instances of this class to be moved.
 
virtual ~INEGEMMLowpReductionKernel ()=default
 Default destructor.
 
- Public Member Functions inherited from ICPPKernel
virtual ~ICPPKernel ()=default
 Default destructor.
 
virtual void run_nd (const Window &window, const ThreadInfo &info, const Window &thread_locator)
 Legacy compatibility layer for implementations which do not support thread_locator. In these cases the interface is simply narrowed down to the legacy version.
 
virtual void run_op (ITensorPack &tensors, const Window &window, const ThreadInfo &info)
 Execute the kernel on the passed window.
 
- Public Member Functions inherited from IKernel
 IKernel ()
 Constructor.
 
virtual ~IKernel ()=default
 Destructor.
 
virtual bool is_parallelisable () const
 Indicates whether or not the kernel is parallelisable.
 
virtual BorderSize border_size () const
 The size of the border for that kernel.
 
const Window & window () const
 The maximum window the kernel can be executed on.
 

Static Public Member Functions

static Status validate (const ITensorInfo *mtx_b, const ITensorInfo *vector_sum_col, const GEMMLowpReductionKernelInfo &info)
 Static function to check if given info will lead to a valid configuration of NEGEMMLowpMatrixBReductionKernel.
 

Detailed Description

Neon kernel used to compute the row-vectors of sums of all the entries in each column of Matrix B.
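
As a minimal scalar sketch of the reduction described above (not the Neon implementation), the following computes one 32-bit sum per column of a row-major matrix. The function name `column_sums` and the row-major `std::vector` layout are assumptions made for illustration and are not part of Compute Library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not part of Compute Library.
// b holds a row-major k x n matrix of 8-bit quantized values.
// Returns a row-vector with n entries: entry j is the sum of column j of b.
std::vector<int32_t> column_sums(const std::vector<uint8_t> &b, std::size_t k, std::size_t n)
{
    assert(b.size() == k * n);

    std::vector<int32_t> sums(n, 0);
    for(std::size_t row = 0; row < k; ++row)
    {
        for(std::size_t col = 0; col < n; ++col)
        {
            // Widen each 8-bit value to 32 bits before accumulating.
            sums[col] += static_cast<int32_t>(b[row * n + col]);
        }
    }
    return sums;
}
```

For a 2 x 3 matrix with rows {1, 2, 3} and {4, 5, 6}, the result is the row-vector {5, 7, 9}.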

Note
This stage is needed to handle the offset of matrix product https://github.com/google/gemmlowp/blob/master/doc/low-precision.md
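
The role of the column sums can be seen by expanding the quantized product. The identity below is a sketch that assumes the usual zero-point convention, in which a quantized value q represents scale * (q - zero_point); the symbols a_0 and b_0 (zero points of matrices A and B) and K (number of rows of B) are names introduced here for illustration.

```latex
\sum_{k=0}^{K-1} (a_{ik} - a_0)(b_{kj} - b_0)
  = \sum_{k=0}^{K-1} a_{ik} b_{kj}
  - a_0 \sum_{k=0}^{K-1} b_{kj}
  - b_0 \sum_{k=0}^{K-1} a_{ik}
  + K a_0 b_0
```

The second term on the right is the zero point of A multiplied by the sum of column j of B, which is the per-column quantity this kernel writes to vector_sum_col.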

Definition at line 138 of file NEGEMMLowpReductionKernel.h.

Constructor & Destructor Documentation

◆ NEGEMMLowpMatrixBReductionKernel() [1/3]

Default constructor.

◆ NEGEMMLowpMatrixBReductionKernel() [2/3]

Prevent instances of this class from being copied (As this class contains pointers)

◆ NEGEMMLowpMatrixBReductionKernel() [3/3]

Allow instances of this class to be moved.

◆ ~NEGEMMLowpMatrixBReductionKernel()

Default destructor.

Member Function Documentation

◆ configure()

void configure ( const ITensor * mtx_b,
ITensor * vector_sum_col,
const GEMMLowpReductionKernelInfo & info 
)
override virtual

Initialise the kernel's input and output.

Parameters
[in] mtx_b Input tensor. Data type supported: QASYMM8/QASYMM8_SIGNED/QSYMM8/QSYMM8_PER_CHANNEL
[out] vector_sum_col Output row-vector of sums of all the entries in each column of mtx_b. Data type supported: S32
[in] info Kernel metadata:
  • k (num_mtx_b_rows) Number of matrix B rows.
  • is_reshaped (is_transposed1xW) True if the input tensor is transposed 1xW.
  • scalar Scalar value to multiply each reduced row by.
  • mul_by_scalar True if each reduced row must be multiplied by a scalar value.
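
To illustrate how the four metadata fields interact, here is a hedged, self-contained sketch. `ReductionInfo` and `reduce_b` are hypothetical stand-ins, not the library's `GEMMLowpReductionKernelInfo` or kernel, and the field types are assumed. The rejection of `is_reshaped` mirrors the "Not supported" check in the configure() listing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical mirror of the four documented fields (types are assumptions).
struct ReductionInfo
{
    int32_t k;             // number of rows of matrix B
    bool    is_reshaped;   // true if the input is transposed 1xW
    int32_t scalar;        // value each reduced sum is multiplied by
    bool    mul_by_scalar; // true if the multiplication must be applied
};

// Hypothetical scalar reduction over a row-major k x n matrix of signed 8-bit values.
std::vector<int32_t> reduce_b(const std::vector<int8_t> &b, std::size_t n, const ReductionInfo &info)
{
    if(info.is_reshaped)
    {
        throw std::invalid_argument("Not supported");
    }
    const std::size_t k = static_cast<std::size_t>(info.k);
    assert(b.size() == k * n);

    std::vector<int32_t> sums(n, 0);
    for(std::size_t row = 0; row < k; ++row)
    {
        for(std::size_t col = 0; col < n; ++col)
        {
            sums[col] += static_cast<int32_t>(b[row * n + col]);
        }
    }
    if(info.mul_by_scalar)
    {
        for(int32_t &s : sums)
        {
            s *= info.scalar;
        }
    }
    return sums;
}
```

With rows {1, -2} and {3, 4}, k = 2, scalar = -3 and mul_by_scalar set, the column sums {4, 2} become {-12, -6}; with mul_by_scalar cleared they stay {4, 2}.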

Implements INEGEMMLowpReductionKernel.

Definition at line 186 of file NEGEMMLowpReductionKernel.cpp.

References ARM_COMPUTE_ERROR_ON_MSG, ARM_COMPUTE_ERROR_ON_NULLPTR, ARM_COMPUTE_ERROR_THROW_ON, arm_compute::auto_init_if_empty(), arm_compute::calculate_max_window_horizontal(), ITensorInfo::dimension(), ITensor::info(), GEMMLowpReductionKernelInfo::is_reshaped, GEMMLowpReductionKernelInfo::k, GEMMLowpReductionKernelInfo::mul_by_scalar, num_elems_processed_per_iteration, arm_compute::S32, GEMMLowpReductionKernelInfo::scalar, ITensorInfo::set_valid_region(), and ITensorInfo::tensor_shape().

{
    ARM_COMPUTE_ERROR_ON_NULLPTR(mtx_b, vector_sum_col);
    ARM_COMPUTE_ERROR_ON_MSG(info.is_reshaped == true, "Not supported");

    ARM_COMPUTE_ERROR_THROW_ON(validate_arguments_matrix_b_reduction(mtx_b->info(), vector_sum_col->info()));

    _input         = mtx_b;
    _output        = vector_sum_col;
    _k             = info.k;
    _scalar        = info.scalar;
    _mul_by_scalar = info.mul_by_scalar;

    // Configure kernel window
    constexpr unsigned int num_elems_processed_per_iteration = 16;

    // Output auto initialization if not yet initialized
    auto_init_if_empty(*_output->info(), TensorShape(_input->info()->dimension(0)), 1, DataType::S32);

    // Configure kernel window
    Window win = calculate_max_window_horizontal(*_output->info(), Steps(num_elems_processed_per_iteration));
    _output->info()->set_valid_region(ValidRegion(Coordinates(), _output->info()->tensor_shape()));
    INEKernel::configure(win);
}
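
The window in configure() is built with a step of 16 elements. As a sketch of the arithmetic this implies for an output of a given width, the hypothetical helpers below count the full 16-element blocks, the leftover tail, and the width rounded up to the next multiple of the step. How the kernel itself treats a width that is not a multiple of 16 is not shown in the listing, so nothing here should be read as its behaviour.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: round value up to the next multiple of divisor.
constexpr std::size_t round_up(std::size_t value, std::size_t divisor)
{
    return ((value + divisor - 1) / divisor) * divisor;
}

// Hypothetical description of how a row of `width` elements splits into steps.
struct Blocking
{
    std::size_t full_blocks;  // number of complete steps
    std::size_t tail;         // elements left after the last complete step
    std::size_t padded_width; // width rounded up to a multiple of the step
};

Blocking split_width(std::size_t width, std::size_t step = 16)
{
    assert(step > 0);
    return Blocking{ width / step, width % step, round_up(width, step) };
}
```

A width of 40 gives two full blocks, a tail of 8 elements and a padded width of 48; a width of 32 gives two full blocks and no tail.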

◆ name()

const char* name ( ) const
inline override virtual

Name of the kernel.

Returns
Kernel name

Implements ICPPKernel.

Definition at line 141 of file NEGEMMLowpReductionKernel.h.

References INEGEMMLowpReductionKernel::configure(), arm_compute::test::validation::info, INEGEMMLowpReductionKernel::operator=(), ICPPKernel::run(), arm_compute::validate(), and IKernel::window().

    {
        return "NEGEMMLowpMatrixBReductionKernel";
    }

◆ operator=() [1/2]

Prevent instances of this class from being copied (As this class contains pointers)

◆ operator=() [2/2]

Allow instances of this class to be moved.

◆ run()

void run ( const Window & window,
const ThreadInfo & info 
)
override virtual

Execute the kernel on the passed window.

Warning
If is_parallelisable() returns false then the passed window must be equal to window()
Note
The window has to be a region within the window returned by the window() method
The width of the window has to be a multiple of num_elems_processed_per_iteration().
Parameters
[in] window Region on which to execute the kernel. (Must be a region of the window returned by window())
[in] info Info about executing thread and CPU.
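
The two conditions in the note above (the region lies within the maximum window, and its width is a multiple of the step) can be sketched in one dimension as follows. `Range1D` and `is_valid_subrange` are hypothetical names for illustration and do not correspond to the library's `Window` API.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical one-dimensional stand-in for a single window dimension.
struct Range1D
{
    std::size_t start; // first element (inclusive)
    std::size_t end;   // last element (exclusive)
    std::size_t step;  // elements processed per iteration
};

// Returns true if sub lies inside full and its width is a multiple of full.step.
bool is_valid_subrange(const Range1D &full, const Range1D &sub)
{
    assert(full.step > 0);

    const bool inside = sub.start >= full.start && sub.end <= full.end && sub.start <= sub.end;
    if(!inside)
    {
        return false;
    }
    return (sub.end - sub.start) % full.step == 0;
}
```

With a maximum range of [0, 48) and a step of 16, the sub-range [16, 32) passes, [16, 40) fails because its width of 24 is not a multiple of 16, and [32, 64) fails because it extends past the maximum range.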

Reimplemented from ICPPKernel.

Definition at line 365 of file NEGEMMLowpReductionKernel.cpp.

References ARM_COMPUTE_ERROR, ARM_COMPUTE_ERROR_ON_INVALID_SUBWINDOW, ARM_COMPUTE_ERROR_ON_UNCONFIGURED_KERNEL, ARM_COMPUTE_UNUSED, ITensorInfo::data_type(), ITensor::info(), arm_compute::test::validation::info, arm_compute::QASYMM8, arm_compute::QASYMM8_SIGNED, arm_compute::QSYMM8, arm_compute::QSYMM8_PER_CHANNEL, and IKernel::window().

{
    switch(_input->info()->data_type())
    {
        case DataType::QASYMM8:
            run_internal<uint8_t>(window, info);
            break;
        case DataType::QASYMM8_SIGNED:
        case DataType::QSYMM8:
        case DataType::QSYMM8_PER_CHANNEL:
            run_internal<int8_t>(window, info);
            break;
        default:
            ARM_COMPUTE_ERROR("Unsupported data type");
    }
}
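
The switch in run() selects a `uint8_t` or `int8_t` instantiation of `run_internal` from the input data type. The sketch below shows why that choice matters: the same bytes sum differently when read as unsigned or signed. The `DataType` enum, `sum_as` and `sum_bytes` are hypothetical stand-ins, and mapping QASYMM8_SIGNED and QSYMM8_PER_CHANNEL to `int8_t` is an assumption based on the list of supported data types.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>

// Hypothetical stand-in for the library's data type enumeration.
enum class DataType { QASYMM8, QASYMM8_SIGNED, QSYMM8, QSYMM8_PER_CHANNEL, S32 };

// Sum len bytes, interpreting each one as the 8-bit type T.
template <typename T>
int32_t sum_as(const uint8_t *data, std::size_t len)
{
    static_assert(sizeof(T) == 1, "T must be an 8-bit type");
    int32_t acc = 0;
    for(std::size_t i = 0; i < len; ++i)
    {
        T value;
        std::memcpy(&value, data + i, 1); // reinterpret the raw byte as T
        acc += static_cast<int32_t>(value);
    }
    return acc;
}

// Pick the element type from the data type, as the switch in run() does.
int32_t sum_bytes(DataType type, const uint8_t *data, std::size_t len)
{
    assert(data != nullptr || len == 0);
    switch(type)
    {
        case DataType::QASYMM8:
            return sum_as<uint8_t>(data, len);
        case DataType::QASYMM8_SIGNED:
        case DataType::QSYMM8:
        case DataType::QSYMM8_PER_CHANNEL:
            return sum_as<int8_t>(data, len);
        default:
            throw std::runtime_error("Unsupported data type");
    }
}
```

For the bytes {200, 100}, the unsigned interpretation sums to 300, while the signed interpretation reads 200 as -56 and sums to 44.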

◆ validate()

Status validate ( const ITensorInfo * mtx_b,
const ITensorInfo * vector_sum_col,
const GEMMLowpReductionKernelInfo & info 
)
static

Static function to check if given info will lead to a valid configuration of NEGEMMLowpMatrixBReductionKernel.

Parameters
[in] mtx_b Input tensor. Data type supported: QASYMM8/QASYMM8_SIGNED/QSYMM8/QSYMM8_PER_CHANNEL
[in] vector_sum_col Output row-vector of sums of all the entries in each column of mtx_b. Data type supported: S32
[in] info Kernel metadata:
  • k (num_mtx_b_rows) Number of matrix B rows.
  • is_reshaped (is_transposed1xW) True if the input tensor is transposed 1xW.
  • scalar Scalar value to multiply each reduced row by.
  • mul_by_scalar True if each reduced row must be multiplied by a scalar value.
Returns
a status

Definition at line 211 of file NEGEMMLowpReductionKernel.cpp.

References ARM_COMPUTE_RETURN_ON_ERROR, ARM_COMPUTE_UNUSED, arm_compute::ceil_to_multiple(), Window::collapse_if_possible(), ITensorInfo::dimension(), Window::DimX, Window::DimY, Window::DimZ, arm_compute::execute_window_loop(), ITensor::info(), arm_compute::test::validation::info, ThreadInfo::num_threads, Iterator::ptr(), Window::set(), ITensorInfo::strides_in_bytes(), ThreadInfo::thread_id, arm_compute::wrapper::vaddw(), arm_compute::wrapper::vdup_n(), arm_compute::wrapper::vgethigh(), arm_compute::wrapper::vgetlow(), arm_compute::wrapper::vloadq(), arm_compute::wrapper::vmovl(), arm_compute::wrapper::vmul(), arm_compute::wrapper::vreinterpret(), arm_compute::wrapper::vstore(), and IKernel::window().

Referenced by NEGEMMLowpMatrixMultiplyCore::validate().

{
    ARM_COMPUTE_RETURN_ON_ERROR(validate_arguments_matrix_b_reduction(mtx_b, vector_sum_col));

    return Status{};
}
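
The body of `validate_arguments_matrix_b_reduction` is not part of this listing. As a hedged sketch of the kind of checks a static validate() can perform before configure() is called, the example below uses hypothetical types (`SimpleStatus`, `SimpleTensorInfo`) and checks that are assumptions derived only from the parameter documentation: a supported 8-bit input type, an S32 output, and an output width equal to the input width.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-ins; none of these are Compute Library types.
enum class DataType { QASYMM8, QASYMM8_SIGNED, QSYMM8, QSYMM8_PER_CHANNEL, S32, F32 };

struct SimpleStatus
{
    bool        ok;
    std::string message;
};

struct SimpleTensorInfo
{
    DataType    type;
    std::size_t width; // size of dimension 0
};

// Assumed checks, taken from the documented parameter constraints only.
SimpleStatus validate_b_reduction(const SimpleTensorInfo *mtx_b, const SimpleTensorInfo *vector_sum_col)
{
    assert(mtx_b != nullptr && vector_sum_col != nullptr);

    switch(mtx_b->type)
    {
        case DataType::QASYMM8:
        case DataType::QASYMM8_SIGNED:
        case DataType::QSYMM8:
        case DataType::QSYMM8_PER_CHANNEL:
            break;
        default:
            return SimpleStatus{ false, "Unsupported input data type" };
    }
    if(vector_sum_col->type != DataType::S32)
    {
        return SimpleStatus{ false, "Output must be S32" };
    }
    if(vector_sum_col->width != mtx_b->width)
    {
        return SimpleStatus{ false, "Output width must match the number of columns of mtx_b" };
    }
    return SimpleStatus{ true, "" };
}
```

Returning a status object rather than throwing lets a caller query whether a configuration is valid without constructing or configuring a kernel.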

The documentation for this class was generated from the following files: