Compute Library 21.02
NEGEMMMatrixMultiplyKernel Class Reference

Neon kernel to multiply two input matrices "A" and "B". More...

#include <NEGEMMMatrixMultiplyKernel.h>

Collaboration diagram for NEGEMMMatrixMultiplyKernel:

Public Member Functions

const char * name () const override
 Name of the kernel. More...
 
 NEGEMMMatrixMultiplyKernel ()
 Constructor. More...
 
 NEGEMMMatrixMultiplyKernel (const NEGEMMMatrixMultiplyKernel &)=delete
 Prevent instances of this class from being copied (As this class contains pointers) More...
 
NEGEMMMatrixMultiplyKernel & operator= (const NEGEMMMatrixMultiplyKernel &)=delete
 Prevent instances of this class from being copied (As this class contains pointers) More...
 
 NEGEMMMatrixMultiplyKernel (NEGEMMMatrixMultiplyKernel &&)=default
 Allow instances of this class to be moved. More...
 
NEGEMMMatrixMultiplyKernel & operator= (NEGEMMMatrixMultiplyKernel &&)=default
 Allow instances of this class to be moved. More...
 
void configure (const ITensor *input0, const ITensor *input1, ITensor *output, float alpha, bool is_interleaved, const GEMMReshapeInfo &reshape_info=GEMMReshapeInfo())
 Initialise the kernel's input and output. More...
 
void run (const Window &window, const ThreadInfo &info) override
 Execute the kernel on the passed window. More...
 
- Public Member Functions inherited from ICPPKernel
virtual ~ICPPKernel ()=default
 Default destructor. More...
 
virtual void run_nd (const Window &window, const ThreadInfo &info, const Window &thread_locator)
 Legacy compatibility layer for implementations that do not support thread_locator. In these cases we simply narrow the interface down to the legacy version. More...
 
virtual void run_op (ITensorPack &tensors, const Window &window, const ThreadInfo &info)
 Execute the kernel on the passed window. More...
 
- Public Member Functions inherited from IKernel
 IKernel ()
 Constructor. More...
 
virtual ~IKernel ()=default
 Destructor. More...
 
virtual bool is_parallelisable () const
 Indicates whether or not the kernel is parallelisable. More...
 
virtual BorderSize border_size () const
 The size of the border for that kernel. More...
 
const Window & window () const
 The maximum window the kernel can be executed on. More...
 

Static Public Member Functions

static Status validate (const ITensorInfo *input0, const ITensorInfo *input1, const ITensorInfo *output, float alpha, bool is_interleaved, const GEMMReshapeInfo &reshape_info)
 Static function to check if given info will lead to a valid configuration of NEGEMMMatrixMultiplyKernel. More...
 

Detailed Description

Neon kernel to multiply two input matrices "A" and "B".

All elements of the output matrix/vector will be multiplied by alpha after the matrix multiplication

Note
If the output tensor is a matrix, the implementation assumes that the input tensors input0 and input1 are both matrices reshaped respectively with NEGEMMInterleave4x4Kernel and NEGEMMTranspose1xWKernel
If the output tensor is a vector and the data type is F32, the implementation assumes that the first input tensor input0 is a vector and the second input tensor input1 is a matrix. The implementation also assumes that both tensors have not been reshaped

Definition at line 39 of file NEGEMMMatrixMultiplyKernel.h.
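For orientation, the following is a minimal sketch of how this kernel could be chained with the two reshape kernels mentioned above; in practice the NEGEMM function performs this wiring internally. The tensor objects (a, b, a_interleaved, b_transposed, dst), the sizes M, N, K and the alpha value are assumptions for illustration, not part of this class's interface.

// Sketch only: reshape Matrix A and Matrix B, then run the matrix multiplication kernel.
// Assumes all ITensor objects have already been initialised and allocated with suitable shapes.
NEGEMMInterleave4x4Kernel  interleave_kernel;
NEGEMMTranspose1xWKernel   transpose_kernel;
NEGEMMMatrixMultiplyKernel mm_kernel;

interleave_kernel.configure(&a, &a_interleaved); // 4x4 interleave of Matrix A
transpose_kernel.configure(&b, &b_transposed);   // 1xW transpose of Matrix B

const float alpha = 1.0f;
mm_kernel.configure(&a_interleaved, &b_transposed, &dst, alpha, true /* is_interleaved */, GEMMReshapeInfo(M, N, K));

// Execute: reshape kernels first, then the multiplication kernel.
NEScheduler::get().schedule(&interleave_kernel, Window::DimY);
NEScheduler::get().schedule(&transpose_kernel, Window::DimY);
NEScheduler::get().schedule(&mm_kernel, Window::DimY);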

Constructor & Destructor Documentation

◆ NEGEMMMatrixMultiplyKernel() [1/3]

Constructor.

Definition at line 1088 of file NEGEMMMatrixMultiplyKernel.cpp.

Referenced by NEGEMMMatrixMultiplyKernel::name().

NEGEMMMatrixMultiplyKernel::NEGEMMMatrixMultiplyKernel()
    : _input0(nullptr), _input1(nullptr), _output(nullptr), _alpha(1.0f)
{
}

◆ NEGEMMMatrixMultiplyKernel() [2/3]

Prevent instances of this class from being copied (As this class contains pointers)

◆ NEGEMMMatrixMultiplyKernel() [3/3]

Allow instances of this class to be moved.

Member Function Documentation

◆ configure()

void configure ( const ITensor * input0,
const ITensor * input1,
ITensor * output,
float  alpha,
bool  is_interleaved,
const GEMMReshapeInfo & reshape_info = GEMMReshapeInfo()
)

Initialise the kernel's input and output.

Note
If the output tensor is a matrix, the input matrices input0 and input1 should be the output of the kernels NEGEMMInterleave4x4Kernel and NEGEMMTranspose1xWKernel. These two kernels change the layout of the original matrices to be more cache-friendly.
Parameters
[in]  input0          Input tensor containing the interleaved Matrix A or the vector A. Data types supported: F16/F32
[in]  input1          Input tensor containing the transposed Matrix B if the first input tensor A is not a vector. If the output tensor is a vector, input1 must contain the matrix B not reshaped. Data type supported: same as input0
[out] output          Output tensor to store the result of matrix multiplication. Data type supported: same as input0.
[in]  alpha           Weight of the matrix product
[in]  is_interleaved  (Optional) True if input0 and input1 have been reshaped respectively using NEGEMMInterleave4x4Kernel and NEGEMMTranspose1xWKernel
[in]  reshape_info    (Optional) GEMM reshape info. If is_interleaved_transposed = true, this object must contain the information to understand how the matrix A and matrix B have been reshaped

Definition at line 1093 of file NEGEMMMatrixMultiplyKernel.cpp.

References ARM_COMPUTE_ERROR_ON_NULLPTR, ARM_COMPUTE_ERROR_THROW_ON, arm_compute::auto_init_if_empty(), arm_compute::calculate_max_window(), ICloneable< T >::clone(), ITensorInfo::data_type(), ITensorInfo::dimension(), arm_compute::F32, ITensor::info(), GEMMReshapeInfo::m(), GEMMReshapeInfo::n(), ITensorInfo::num_dimensions(), TensorShape::set(), Dimensions< T >::set_num_dimensions(), ITensorInfo::set_valid_region(), ITensorInfo::tensor_shape(), and arm_compute::validate_arguments().

Referenced by NEGEMMMatrixMultiplyKernel::name().

void NEGEMMMatrixMultiplyKernel::configure(const ITensor *input0, const ITensor *input1, ITensor *output, float alpha, bool is_interleaved, const GEMMReshapeInfo &reshape_info)
{
    ARM_COMPUTE_ERROR_ON_NULLPTR(input0, input1, output);

    // Output tensor auto initialization if not yet initialized
    TensorShape tensor_shape{ input0->info()->tensor_shape() };
    tensor_shape.set(0, is_interleaved ? reshape_info.n() : input1->info()->dimension(0));
    tensor_shape.set(1, is_interleaved ? reshape_info.m() : input0->info()->dimension(1));

    auto_init_if_empty(*output->info(), input0->info()->clone()->set_tensor_shape(tensor_shape));

    // Perform validate step
    ARM_COMPUTE_ERROR_THROW_ON(validate_arguments(input0->info(), input1->info(), output->info(), alpha, is_interleaved, reshape_info));

    _input0 = input0;
    _input1 = input1;
    _output = output;
    _alpha  = alpha;

    // Configure kernel window
    Window win{};

    // Check if the output tensor is a vector. If so, the kernel runs the vector-matrix multiplication
    if((output->info()->dimension(1) == 1))
    {
        const unsigned int num_elems_processed_per_iteration_x = (input0->info()->data_type() == DataType::F32) ? 16 : 32;

        win = calculate_max_window(*output->info(), Steps(num_elems_processed_per_iteration_x));
    }
    else
    {
        constexpr unsigned int num_elems_processed_per_iteration_x = 8;
        constexpr unsigned int num_elems_processed_per_iteration_y = 4;

        win = calculate_max_window(*output->info(), Steps(num_elems_processed_per_iteration_x, num_elems_processed_per_iteration_y));
    }

    Coordinates coord;
    coord.set_num_dimensions(output->info()->num_dimensions());
    output->info()->set_valid_region(ValidRegion(coord, output->info()->tensor_shape()));
    INEKernel::configure(win);
}
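As the branch on output->info()->dimension(1) above shows, when the destination has a single row the kernel performs a vector-matrix product on non-reshaped inputs. A minimal configuration sketch for that case, assuming vec_a (a length-K F32 vector), mat_b (a K x N matrix, not reshaped) and dst (a length-N vector) are ITensor objects that have already been initialised and allocated:

// Sketch only: F32 vector-matrix product, no reshaping of the inputs.
NEGEMMMatrixMultiplyKernel gemv_kernel;
gemv_kernel.configure(&vec_a, &mat_b, &dst, 1.0f /* alpha */, false /* is_interleaved */);
// The window of a vector output only spans the X dimension, so split along X when scheduling.
NEScheduler::get().schedule(&gemv_kernel, Window::DimX);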

◆ name()

const char * name ( ) const
inline override virtual

Name of the kernel.

◆ operator=() [1/2]

NEGEMMMatrixMultiplyKernel & operator= ( const NEGEMMMatrixMultiplyKernel & )
delete

Prevent instances of this class from being copied (As this class contains pointers)

Referenced by NEGEMMMatrixMultiplyKernel::name().

◆ operator=() [2/2]

Allow instances of this class to be moved.

◆ run()

void run ( const Window & window,
const ThreadInfo & info
)
override virtual

Execute the kernel on the passed window.

Warning
If is_parallelisable() returns false then the passed window must be equal to window()
Note
The window has to be a region within the window returned by the window() method
The width of the window has to be a multiple of num_elems_processed_per_iteration().
Parameters
[in]  window  Region on which to execute the kernel. (Must be a region of the window returned by window())
[in]  info    Info about executing thread and CPU.

Reimplemented from ICPPKernel.

Definition at line 1144 of file NEGEMMMatrixMultiplyKernel.cpp.

References ARM_COMPUTE_ERROR, ARM_COMPUTE_ERROR_ON_INVALID_SUBWINDOW, ARM_COMPUTE_ERROR_ON_UNCONFIGURED_KERNEL, ITensorInfo::data_type(), ITensorInfo::dimension(), arm_compute::F16, arm_compute::F32, ITensor::info(), and IKernel::window().

Referenced by NEGEMMMatrixMultiplyKernel::name().

void NEGEMMMatrixMultiplyKernel::run(const Window &window, const ThreadInfo &info)
{
    ARM_COMPUTE_ERROR_ON_UNCONFIGURED_KERNEL(this);
    ARM_COMPUTE_ERROR_ON_INVALID_SUBWINDOW(INEKernel::window(), window);

    // Check if the output tensor is a vector. If so, the kernel runs the vector-matrix multiplication
    const bool is_output_vector = (_output->info()->dimension(1) == 1);
    switch(_input0->info()->data_type())
    {
        case DataType::F32:
        {
            is_output_vector ? vector_matrix_multiply_f32(_input0, _input1, _output, window, info, _alpha) :
                               matrix_matrix_multiply_f32(_input0, _input1, _output, window, _alpha);
            break;
        }
#ifdef __ARM_FEATURE_FP16_VECTOR_ARITHMETIC
        case DataType::F16:
        {
            is_output_vector ? vector_matrix_multiply_f16(_input0, _input1, _output, window, info, _alpha) :
                               matrix_matrix_multiply_f16(_input0, _input1, _output, window, _alpha);
            break;
        }
#endif /* __ARM_FEATURE_FP16_VECTOR_ARITHMETIC */
        default:
        {
            ARM_COMPUTE_ERROR("Data type not supported");
            break;
        }
    }
}
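run() is normally not called directly; a function such as NEGEMM hands the configured kernel to the CPU scheduler, which splits window() into sub-windows and calls run() once per thread. A minimal dispatch sketch, assuming mm_kernel has already been configured:

// Sketch only: let the scheduler split the kernel's window across threads (here along Y).
NEScheduler::get().schedule(&mm_kernel, Window::DimY);

// Single-threaded alternative: execute over the full maximum window yourself.
ThreadInfo info;
info.cpu_info = &NEScheduler::get().cpu_info();
mm_kernel.run(mm_kernel.window(), info);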

◆ validate()

Status validate ( const ITensorInfo * input0,
const ITensorInfo * input1,
const ITensorInfo * output,
float  alpha,
bool  is_interleaved,
const GEMMReshapeInfo & reshape_info
)
static

Static function to check if given info will lead to a valid configuration of NEGEMMMatrixMultiplyKernel.

Parameters
[in]  input0          Input tensor containing the interleaved Matrix A or the vector A. Data types supported: F16/F32
[in]  input1          Input tensor containing the transposed Matrix B if the first input tensor A is not a vector. If the output tensor is a vector, input1 must contain the matrix B not reshaped. Data type supported: same as input0
[out] output          Output tensor to store the result of matrix multiplication. Data type supported: same as input0.
[in]  alpha           Weight of the matrix product
[in]  is_interleaved  (Optional) True if input0 and input1 have been reshaped respectively using NEGEMMInterleave4x4Kernel and NEGEMMTranspose1xWKernel
[in]  reshape_info    (Optional) GEMM reshape info. If is_interleaved_transposed = true, this object must contain the information to understand how the matrix A and matrix B have been reshaped
Returns
a status

Definition at line 1136 of file NEGEMMMatrixMultiplyKernel.cpp.

References ARM_COMPUTE_RETURN_ON_ERROR, and arm_compute::validate_arguments().

Referenced by NEGEMMMatrixMultiplyKernel::name(), and NEGEMM::validate().

Status NEGEMMMatrixMultiplyKernel::validate(const ITensorInfo *input0, const ITensorInfo *input1, const ITensorInfo *output, float alpha, bool is_interleaved,
                                            const GEMMReshapeInfo &reshape_info)
{
    ARM_COMPUTE_RETURN_ON_ERROR(validate_arguments(input0, input1, output, alpha, is_interleaved, reshape_info));

    return Status{};
}
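validate() mirrors the checks performed during configure() without touching any state, so it can be used to reject an unsupported configuration before any tensors are allocated. A minimal sketch, assuming a_info, b_info and dst_info are TensorInfo objects describing the reshaped Matrix A, the reshaped Matrix B and the destination, and M, N, K are the original GEMM sizes:

// Sketch only: check the configuration up front, then throw (or fall back) on error.
const Status status = NEGEMMMatrixMultiplyKernel::validate(&a_info, &b_info, &dst_info,
                                                           1.0f /* alpha */, true /* is_interleaved */,
                                                           GEMMReshapeInfo(M, N, K));
ARM_COMPUTE_ERROR_THROW_ON(status); // or inspect status.error_description() and choose another path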

The documentation for this class was generated from the following files:

NEGEMMMatrixMultiplyKernel.h
NEGEMMMatrixMultiplyKernel.cpp