Compute Library 19.08
NEGEMMLowpMatrixMultiplyCore Class Reference

Basic function to execute GEMMLowpMatrixMultiplyCore on NEON. More...

#include <NEGEMMLowpMatrixMultiplyCore.h>

Collaboration diagram for NEGEMMLowpMatrixMultiplyCore:

Public Member Functions

 NEGEMMLowpMatrixMultiplyCore (std::shared_ptr< IMemoryManager > memory_manager=nullptr)
 Constructor. More...
 
 NEGEMMLowpMatrixMultiplyCore (const NEGEMMLowpMatrixMultiplyCore &)=delete
 Prevent instances of this class from being copied (As this class contains pointers) More...
 
 NEGEMMLowpMatrixMultiplyCore (NEGEMMLowpMatrixMultiplyCore &&)=default
 Default move constructor. More...
 
NEGEMMLowpMatrixMultiplyCore & operator= (const NEGEMMLowpMatrixMultiplyCore &)=delete
 Prevent instances of this class from being copied (As this class contains pointers) More...
 
NEGEMMLowpMatrixMultiplyCore & operator= (NEGEMMLowpMatrixMultiplyCore &&)=default
 Default move assignment operator. More...
 
void configure (const ITensor *a, const ITensor *b, const ITensor *c, ITensor *output, const GEMMInfo &gemm_info=GEMMInfo())
 Initialise the kernel's inputs, output. More...
 
void run () override
 Run the kernels contained in the function. More...
 
void prepare () override
 Prepare the function for executing. More...
 
- Public Member Functions inherited from IFunction
virtual ~IFunction ()=default
 Destructor. More...
 

Static Public Member Functions

static Status validate (const ITensorInfo *a, const ITensorInfo *b, const ITensorInfo *c, const ITensorInfo *output, const GEMMInfo &gemm_info=GEMMInfo())
 Static function to check if given info will lead to a valid configuration of NEGEMMLowpMatrixMultiplyCore. More...
 

Detailed Description

Basic function to execute GEMMLowpMatrixMultiplyCore on NEON.

This function calls the following NEON kernels if the DOT product instruction is not available:

  1. NEGEMMInterleave4x4Kernel
  2. NEGEMMTranspose1xWKernel
  3. NEGEMMLowpMatrixMultiplyKernel
  4. NEGEMMLowpOffsetContributionKernel

otherwise if the DOT product instruction is available:

  1. NEGEMMLowpOffsetContributionKernel

Definition at line 55 of file NEGEMMLowpMatrixMultiplyCore.h.
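
Why the offset contribution kernel exists follows from the usual GEMMLowp expansion of the quantized product. As a sketch in my own notation (o_a and o_b are the quantization offsets of A and B, K is the depth of each dot product):

    c_{ij} = \sum_{k=1}^{K} (a_{ik} + o_a)(b_{kj} + o_b)
           = \sum_{k} a_{ik} b_{kj} \;+\; o_a \sum_{k} b_{kj} \;+\; o_b \sum_{k} a_{ik} \;+\; K\, o_a\, o_b

The first term is the raw int32 matrix product computed by NEGEMMLowpMatrixMultiplyKernel (or the assembly path); the column sums of B and the row sums of A are produced by the matrix B and matrix A reduction kernels (the _vector_sum_col and _vector_sum_row tensors in the implementation below), and NEGEMMLowpOffsetContributionKernel adds the three offset terms to the product.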

Constructor & Destructor Documentation

◆ NEGEMMLowpMatrixMultiplyCore() [1/3]

NEGEMMLowpMatrixMultiplyCore ( std::shared_ptr< IMemoryManager > memory_manager = nullptr )

Constructor.

Definition at line 43 of file NEGEMMLowpMatrixMultiplyCore.cpp.

44  : _memory_group(memory_manager), _asm_glue(memory_manager), _mm_kernel(nullptr), _mtx_a_reshape_kernel(nullptr), _mtx_b_reshape_kernel(nullptr), _mtx_a_reduction_kernel(), _mtx_b_reduction_kernel(),
45  _offset_contribution_kernel(), _offset_contribution_output_stage_kernel(), _vector_sum_col(), _vector_sum_row(), _tmp_a(), _tmp_b(), _mm_result_s32(), _original_b(nullptr), _a_offset(0), _b_offset(0),
46  _run_vector_matrix_multiplication(false), _assembly_path(false), _fused_assembly_path(false), _reshape_b_only_on_first_run(false), _is_prepared(false), _fuse_output_stage(false)
47 {
48 }
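
A sketch of supplying a memory manager so several functions can share backing memory for their internal tensors (the types below are the runtime's memory-manager classes; how many pools to populate, and with which allocator, is left to the caller and omitted here):

    #include <memory>
    #include "arm_compute/runtime/BlobLifetimeManager.h"
    #include "arm_compute/runtime/MemoryManagerOnDemand.h"
    #include "arm_compute/runtime/PoolManager.h"

    using namespace arm_compute;

    auto lifetime_mgr = std::make_shared<BlobLifetimeManager>();
    auto pool_mgr     = std::make_shared<PoolManager>();
    auto memory_mgr   = std::make_shared<MemoryManagerOnDemand>(lifetime_mgr, pool_mgr);

    NEGEMMLowpMatrixMultiplyCore gemmlowp(memory_mgr); // intermediate tensors are managed by memory_mgr
    // ... configure(), populate the manager's pools, then run() ...

With the default nullptr the function still works; its intermediate tensors are simply allocated on their own instead of through a shared pool.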

◆ NEGEMMLowpMatrixMultiplyCore() [2/3]

Prevent instances of this class from being copied (As this class contains pointers)

◆ NEGEMMLowpMatrixMultiplyCore() [3/3]

Default move constructor.

Member Function Documentation

◆ configure()

void configure ( const ITensor * a,
const ITensor * b,
const ITensor * c,
ITensor * output,
const GEMMInfo & gemm_info = GEMMInfo() 
)

Initialise the kernel's inputs, output.

Note
GEMM_LOWP: low precision GEMM kernel. This kernel performs the following computations:
  1. Convert a values from QASYMM8 to int32 and add a_offset to each of them.
  2. Convert b values from QASYMM8 to int32 and add b_offset to each of them.
  3. Compute the matrix product of the resulting a * b in int32.
Note
The output type is S32 if gemm_info.type == GEMMLowpOutputStageType::NONE. It is QASYMM8 otherwise
Parameters
  [in]  a          First input tensor (Matrix A). Data type supported: QASYMM8.
  [in]  b          Second input tensor (Matrix B). Data type supported: same as a
  [in]  c          Third input tensor (Matrix C). It can be a nullptr. Data type supported: S32
  [out] output     Output tensor. Data type supported: S32/QASYMM8
  [in]  gemm_info  (Optional) Specifies if the matrix A and/or matrix B have been reshaped and if the reshape of matrix B should be executed only for the first run

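A minimal usage sketch (tensor shapes, quantization parameters and variable names are illustrative, not taken from the library; with the default GEMMInfo there is no output stage, so the output stays S32):

    #include "arm_compute/core/TensorInfo.h"
    #include "arm_compute/core/Types.h"
    #include "arm_compute/runtime/NEON/functions/NEGEMMLowpMatrixMultiplyCore.h"
    #include "arm_compute/runtime/Tensor.h"

    using namespace arm_compute;

    // A: 16x32 (M x K), B: 32x8 (K x N), output: 16x8 (M x N); TensorShape is (width, height)
    Tensor a, b, output;
    a.allocator()->init(TensorInfo(TensorShape(32U, 16U), 1, DataType::QASYMM8, QuantizationInfo(0.5f, 10)));
    b.allocator()->init(TensorInfo(TensorShape(8U, 32U), 1, DataType::QASYMM8, QuantizationInfo(0.25f, 3)));
    output.allocator()->init(TensorInfo(TensorShape(8U, 16U), 1, DataType::S32));

    NEGEMMLowpMatrixMultiplyCore gemmlowp;
    gemmlowp.configure(&a, &b, nullptr, &output); // default GEMMInfo: no fused output stage

    a.allocator()->allocate();
    b.allocator()->allocate();
    output.allocator()->allocate();
    // ... fill a and b with quantized data ...
    gemmlowp.run();
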
Definition at line 50 of file NEGEMMLowpMatrixMultiplyCore.cpp.

51 {
52  ARM_COMPUTE_ERROR_ON_NULLPTR(a, b, output);
53  ARM_COMPUTE_UNUSED(c);
54  ARM_COMPUTE_ERROR_THROW_ON(NEGEMMLowpMatrixMultiplyCore::validate(a->info(), b->info(), c != nullptr ? c->info() : nullptr, output->info(), gemm_info));
55 
56  const ITensor *matrix_a = a;
57  const ITensor *matrix_b = b;
58 
59  // Clear state
60  _mtx_a_reshape_kernel = nullptr;
61  _mtx_b_reshape_kernel = nullptr;
62 
63  // Set internal variables
64  _a_offset = a->info()->quantization_info().uniform().offset;
65  _b_offset = b->info()->quantization_info().uniform().offset;
66  _run_vector_matrix_multiplication = a->info()->dimension(1) < 2;
67  _reshape_b_only_on_first_run = gemm_info.reshape_b_only_on_first_run();
68  _is_prepared = false;
69  _fused_assembly_path = false;
70  _original_b = b;
71 
72  // If GEMMLowpOutputStage != NONE, fuse the offset contribution with the output stage
73  if(gemm_info.gemmlowp_output_stage().type != GEMMLowpOutputStageType::NONE)
74  {
75  _fuse_output_stage = true;
76  _memory_group.manage(&_mm_result_s32);
77  TensorInfo info_mm_result_s32(output->info()->tensor_shape(), 1, DataType::S32);
78  _mm_result_s32.allocator()->init(info_mm_result_s32);
79  }
80 
81 #ifdef __aarch64__
82  switch(a->info()->data_type())
83  {
84  case DataType::QASYMM8:
85  case DataType::U8:
86  case DataType::S8:
87  {
88  if(a->info()->data_type() == DataType::QASYMM8 && gemm_info.gemmlowp_output_stage().type == GEMMLowpOutputStageType::QUANTIZE_DOWN_FIXEDPOINT)
89  {
90  _asm_glue.configure(a, b, c, output, 1.f, 0.f, gemm_info);
91  _fused_assembly_path = _asm_glue.is_configured();
92  }
93  else
94  {
95  _asm_glue.configure(a, b, nullptr, _fuse_output_stage ? &_mm_result_s32 : output, 1.f, 0.f, gemm_info);
96  }
97  _assembly_path = _asm_glue.is_configured();
98  break;
99  }
100  default:
101  {
102  ARM_COMPUTE_ERROR("Datatype not supported");
103  break;
104  }
105  }
106 #endif /* __aarch64__ */
107  if(!(_assembly_path || _run_vector_matrix_multiplication))
108  {
109  matrix_a = &_tmp_a;
110  matrix_b = &_tmp_b;
111 
112  // The interleaved output matrix will have the following shape: [ a_height * 4, ceil(a_width / 4.0f) ]
113  TensorInfo a_info(compute_interleaved_shape(*a->info()), 1, a->info()->data_type(), a->info()->quantization_info());
114  // The transpose1xW output matrix will have the following shape: [ b_height * 16, ceil(b_width / 16.0f) ]
115  TensorInfo b_info(compute_transpose1xW_shape(*b->info()), 1, b->info()->data_type(), b->info()->quantization_info());
116  _tmp_a.allocator()->init(a_info);
117  _tmp_b.allocator()->init(b_info);
118  _memory_group.manage(&_tmp_a);
119  if(!_reshape_b_only_on_first_run)
120  {
121  _memory_group.manage(&_tmp_b);
122  }
123 
124  // Configure interleave kernel
125  {
126  auto k = arm_compute::support::cpp14::make_unique<NEGEMMInterleave4x4Kernel>();
127  k->configure(a, &_tmp_a);
128  _mtx_a_reshape_kernel = std::move(k);
129  }
130 
131  // Configure transpose kernel
132  {
133  auto k = arm_compute::support::cpp14::make_unique<NEGEMMTranspose1xWKernel>();
134  k->configure(b, &_tmp_b);
135  _mtx_b_reshape_kernel = std::move(k);
136  }
137  }
138 
139  if(!_fused_assembly_path)
140  {
141  // Initialize matrix B reduction kernel only if _a_offset is not equal to 0
142  if(_a_offset != 0)
143  {
144  TensorInfo info_vector_sum_col(compute_reductionA_shape(*b->info()), 1, DataType::S32);
145 
146  _vector_sum_col.allocator()->init(info_vector_sum_col);
147  if(!_reshape_b_only_on_first_run)
148  {
149  _memory_group.manage(&_vector_sum_col);
150  }
151 
152  // Configure Matrix B reduction kernel
153  _mtx_b_reduction_kernel.configure(b, &_vector_sum_col, a->info()->dimension(0), false);
154  }
155 
156  // Initialize Matrix A reduction kernel only if _b_offset is not equal to 0
157  if(_b_offset != 0)
158  {
159  TensorInfo info_vector_sum_row(compute_reductionB_shape(*a->info()), 1, DataType::S32);
160 
161  _vector_sum_row.allocator()->init(info_vector_sum_row);
162  _memory_group.manage(&_vector_sum_row);
163 
164  // Configure matrix A reduction kernel
165  _mtx_a_reduction_kernel.configure(a, &_vector_sum_row, a->info()->dimension(0), false);
166  }
167 
168  if(_fuse_output_stage)
169  {
170  // Configure matrix multiply kernel
171  if(!_assembly_path)
172  {
173  auto k = arm_compute::support::cpp14::make_unique<NEGEMMLowpMatrixMultiplyKernel>();
174  k->configure(matrix_a, matrix_b, &_mm_result_s32);
175  _mm_kernel = std::move(k);
176  }
177 
178  _offset_contribution_output_stage_kernel.configure(&_mm_result_s32, _a_offset == 0 ? nullptr : &_vector_sum_col, _b_offset == 0 ? nullptr : &_vector_sum_row, c, output, a->info()->dimension(0),
179  _a_offset, _b_offset, gemm_info.gemmlowp_output_stage());
180  }
181  else
182  {
183  // Configure matrix multiply kernel
184  if(!_assembly_path)
185  {
186  auto k = arm_compute::support::cpp14::make_unique<NEGEMMLowpMatrixMultiplyKernel>();
187  k->configure(matrix_a, matrix_b, output);
188  _mm_kernel = std::move(k);
189  }
190  // Configure offset contribution kernel
191  _offset_contribution_kernel.configure(output, _a_offset == 0 ? nullptr : &_vector_sum_col, _b_offset == 0 ? nullptr : &_vector_sum_row, a->info()->dimension(0), _a_offset, _b_offset);
192  }
193  }
194 
195  // Allocate tensors
196  if(!_assembly_path && !_run_vector_matrix_multiplication)
197  {
198  _tmp_a.allocator()->allocate();
199  if(!_reshape_b_only_on_first_run)
200  {
201  _tmp_b.allocator()->allocate();
202  }
203  }
204 
205  if(!_fused_assembly_path)
206  {
207  if(_a_offset != 0 && !_reshape_b_only_on_first_run)
208  {
209  _vector_sum_col.allocator()->allocate();
210  }
211 
212  if(_b_offset != 0)
213  {
214  _vector_sum_row.allocator()->allocate();
215  }
216  }
217 
218  if(_fuse_output_stage)
219  {
220  _mm_result_s32.allocator()->allocate();
221  }
222 }

References TensorAllocator::allocate(), Tensor::allocator(), ARM_COMPUTE_ERROR, ARM_COMPUTE_ERROR_ON_NULLPTR, ARM_COMPUTE_ERROR_THROW_ON, ARM_COMPUTE_UNUSED, arm_compute::test::validation::b, arm_compute::misc::shape_calculator::compute_interleaved_shape(), arm_compute::misc::shape_calculator::compute_reductionA_shape(), arm_compute::misc::shape_calculator::compute_reductionB_shape(), arm_compute::misc::shape_calculator::compute_transpose1xW_shape(), NEGEMMLowpOffsetContributionKernel::configure(), NEGEMMLowpMatrixAReductionKernel::configure(), NEGEMMLowpOffsetContributionOutputStageKernel::configure(), NEGEMMAssemblyDispatch::configure(), NEGEMMLowpMatrixBReductionKernel::configure(), ITensorInfo::data_type(), ITensorInfo::dimension(), GEMMInfo::gemmlowp_output_stage(), ITensor::info(), TensorAllocator::init(), NEGEMMAssemblyDispatch::is_configured(), MemoryGroupBase< TensorType >::manage(), arm_compute::NONE, UniformQuantizationInfo::offset, arm_compute::QASYMM8, ITensorInfo::quantization_info(), arm_compute::QUANTIZE_DOWN_FIXEDPOINT, GEMMInfo::reshape_b_only_on_first_run(), arm_compute::S32, arm_compute::S8, ITensorInfo::tensor_shape(), GEMMLowpOutputStageInfo::type, arm_compute::U8, QuantizationInfo::uniform(), and NEGEMMLowpMatrixMultiplyCore::validate().

Referenced by NELSTMLayerQuantized::configure(), and arm_compute::test::validation::DATA_TEST_CASE().

◆ operator=() [1/2]

Prevent instances of this class from being copied (As this class contains pointers)

◆ operator=() [2/2]

Default move assignment operator.

◆ prepare()

void prepare ( )
override virtual

Prepare the function for executing.

Any one-off pre-processing step required by the function is handled here.

Note
Prepare stage might not need all the function's buffers' backing memory to be available in order to execute

Reimplemented from IFunction.
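
As a sketch of when calling prepare() explicitly pays off, assume constant weights in b and GEMMInfo's reshape_b_only_on_first_run flag (reusing the illustrative tensors from the configure() sketch above; the loop and num_runs are placeholders):

    GEMMInfo info(false /*is_a_reshaped*/, false /*is_b_reshaped*/, true /*reshape_b_only_on_first_run*/);
    NEGEMMLowpMatrixMultiplyCore gemmlowp;
    gemmlowp.configure(&a, &b, nullptr, &output, info);

    gemmlowp.prepare();              // optional: reshape b and compute its reduction once, up front
    for(int i = 0; i < num_runs; ++i)
    {
        // ... refresh the contents of a ...
        gemmlowp.run();              // run() would call prepare() itself on the first iteration
    }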

Definition at line 420 of file NEGEMMLowpMatrixMultiplyCore.cpp.

421 {
422  if(!_is_prepared)
423  {
424  // Run assembly reshape
425  if(_asm_glue.is_configured() && _reshape_b_only_on_first_run)
426  {
427  ARM_COMPUTE_ERROR_ON(!_original_b->is_used());
428 
429  _asm_glue.prepare();
430  _original_b->mark_as_unused();
431  }
432  // Run non-assembly reshape
433  else if(_mtx_b_reshape_kernel && _reshape_b_only_on_first_run)
434  {
435  ARM_COMPUTE_ERROR_ON(!_original_b->is_used());
436 
437  // Run reshape kernel and mark original weights tensor as unused
438  _tmp_b.allocator()->allocate();
439  NEScheduler::get().schedule(_mtx_b_reshape_kernel.get(), Window::DimY);
440  _original_b->mark_as_unused();
441  }
442 
443  // Run matrix B reduction kernel only if _a_offset is not equal to 0
444  if(_a_offset != 0 && _reshape_b_only_on_first_run)
445  {
446  _vector_sum_col.allocator()->allocate();
447  NEScheduler::get().schedule(&_mtx_b_reduction_kernel, Window::DimX);
448  }
449 
450  _is_prepared = true;
451  }
452 }

References TensorAllocator::allocate(), Tensor::allocator(), ARM_COMPUTE_ERROR_ON, Window::DimX, Window::DimY, Scheduler::get(), NEGEMMAssemblyDispatch::is_configured(), ITensor::is_used(), ITensor::mark_as_unused(), NEGEMMAssemblyDispatch::prepare(), and IScheduler::schedule().

Referenced by NEGEMMConvolutionLayer::prepare(), and NEGEMMLowpMatrixMultiplyCore::run().

◆ run()

void run ( )
override virtual

Run the kernels contained in the function.

For NEON kernels:

  • Multi-threading is used for the kernels which are parallelisable.
  • By default std::thread::hardware_concurrency() threads are used.
Note
CPPScheduler::set_num_threads() can be used to manually set the number of threads

For OpenCL kernels:

  • All the kernels are enqueued on the queue associated with CLScheduler.
  • The queue is then flushed.
Note
The function will not block until the kernels are executed. It is the user's responsibility to wait.
Will call prepare() on first run if it hasn't been done

Implements IFunction.
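
A minimal sketch of the threading note above (the thread count is arbitrary; NEScheduler::get() returns the scheduler that runs the NEON kernels):

    #include "arm_compute/runtime/NEON/NEScheduler.h"

    arm_compute::NEScheduler::get().set_num_threads(4); // cap the workers used by the parallelisable kernels
    gemmlowp.run();                                     // gemmlowp configured as in the sketches above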

Definition at line 367 of file NEGEMMLowpMatrixMultiplyCore.cpp.

368 {
369  prepare();
370 
371  MemoryGroupResourceScope scope_mg(_memory_group);
372 
373  // Reshape inputs
374  if(_mtx_a_reshape_kernel)
375  {
376  NEScheduler::get().schedule(_mtx_a_reshape_kernel.get(), Window::DimY);
377  }
378  if(_mtx_b_reshape_kernel && !_reshape_b_only_on_first_run)
379  {
380  NEScheduler::get().schedule(_mtx_b_reshape_kernel.get(), Window::DimY);
381  }
382 
383  // Run GEMM
384  if(_asm_glue.is_configured())
385  {
386  _asm_glue.run();
387  }
388  else
389  {
390  NEScheduler::get().schedule(_mm_kernel.get(), Window::DimY);
391  }
392 
393  if(!_fused_assembly_path)
394  {
395  // Run matrix A reduction kernel only if _b_offset is not equal to 0
396  if(_b_offset != 0)
397  {
398  NEScheduler::get().schedule(&_mtx_a_reduction_kernel, Window::DimX);
399  }
400 
401  // Run matrix B reduction kernel only if _a_offset is not equal to 0
402  if(_a_offset != 0 && !_reshape_b_only_on_first_run)
403  {
404  NEScheduler::get().schedule(&_mtx_b_reduction_kernel, Window::DimX);
405  }
406 
407  if(_fuse_output_stage)
408  {
409  // Run offset contribution kernel
410  NEScheduler::get().schedule(&_offset_contribution_output_stage_kernel, Window::DimY);
411  }
412  else
413  {
414  // Run offset contribution kernel
415  NEScheduler::get().schedule(&_offset_contribution_kernel, Window::DimY);
416  }
417  }
418 }

References Window::DimX, Window::DimY, Scheduler::get(), NEGEMMAssemblyDispatch::is_configured(), NEGEMMLowpMatrixMultiplyCore::prepare(), NEGEMMAssemblyDispatch::run(), and IScheduler::schedule().

Referenced by NEFullyConnectedLayer::run(), NELSTMLayerQuantized::run(), and NEGEMMConvolutionLayer::run().

◆ validate()

Status validate ( const ITensorInfo * a,
const ITensorInfo * b,
const ITensorInfo * c,
const ITensorInfo * output,
const GEMMInfo & gemm_info = GEMMInfo() 
)
static

Static function to check if given info will lead to a valid configuration of NEGEMMLowpMatrixMultiplyCore.

Note
The output type is S32 if gemm_info.type == GEMMLowpOutputStageType::NONE. It is QASYMM8 otherwise
Parameters
  [in]  a          First input tensor info (Matrix A). Data type supported: QASYMM8.
  [in]  b          Second input tensor info (Matrix B). Data type supported: same as a
  [in]  c          Third input tensor info (Matrix C). It can be a nullptr. Data type supported: S32
  [in]  output     Output tensor info. Data type supported: S32/QASYMM8
  [in]  gemm_info  (Optional) Specifies if the matrix A and/or matrix B have been reshaped and if the reshape of matrix B should be executed only for the first run
Returns
a status
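
A sketch of the typical pre-flight check (the tensor infos are illustrative; Status converts to bool, exactly as the implementation below relies on):

    TensorInfo a_info(TensorShape(32U, 16U), 1, DataType::QASYMM8, QuantizationInfo(0.5f, 10));
    TensorInfo b_info(TensorShape(8U, 32U), 1, DataType::QASYMM8, QuantizationInfo(0.25f, 3));
    TensorInfo out_info(TensorShape(8U, 16U), 1, DataType::S32);

    const Status status = NEGEMMLowpMatrixMultiplyCore::validate(&a_info, &b_info, nullptr, &out_info);
    if(bool(status))
    {
        // configuration is valid: configure() with matching tensors will not throw
    }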

Definition at line 224 of file NEGEMMLowpMatrixMultiplyCore.cpp.

225 {
229  ARM_COMPUTE_RETURN_ERROR_ON_MSG(c != nullptr && gemm_info.gemmlowp_output_stage().type == GEMMLowpOutputStageType::NONE, "Bias addition not supported in NEGEMMLowpMatrixMultiplyCore for output S32");
230  ARM_COMPUTE_RETURN_ERROR_ON_MSG((a)->dimension(0) != (b)->dimension(1),
231  "The product AB is defined only if the number of columns in A is equal to the number of rows in B");
232  ARM_COMPUTE_RETURN_ERROR_ON_MSG(gemm_info.is_a_reshaped(), "Matrix A already reshaped is not supported");
233  ARM_COMPUTE_RETURN_ERROR_ON_MSG(gemm_info.is_b_reshaped(), "Matrix B already reshaped is not supported");
234 
235  const ITensorInfo *matrix_a_info = a;
236  const ITensorInfo *matrix_b_info = b;
237 
238  TensorInfo tmp_a_info{};
239  TensorInfo tmp_b_info{};
240  TensorInfo mm_result_s32_info{};
241 
242  int32_t a_offset = a->quantization_info().uniform().offset;
243  int32_t b_offset = b->quantization_info().uniform().offset;
244 
245  bool fuse_output_stage = gemm_info.gemmlowp_output_stage().type != GEMMLowpOutputStageType::NONE;
246  if(fuse_output_stage)
247  {
248  auto_init_if_empty(mm_result_s32_info, a->clone()->set_tensor_shape(output->tensor_shape()).set_data_type(DataType::S32));
249  }
250 
251  // Check if we need to run the optimized assembly kernel
252  bool run_optimised = false;
253  bool run_optimised_requantized = false;
254  if(is_data_type_quantized_asymmetric(a->data_type()))
255  {
256  run_optimised = bool(NEGEMMAssemblyDispatch::validate(a, b, c, output, 1.f, 0.f, gemm_info));
257  run_optimised_requantized = run_optimised;
258  }
259  else
260  {
261  run_optimised = bool(NEGEMMAssemblyDispatch::validate(a, b, nullptr, fuse_output_stage ? &mm_result_s32_info : output, 1.f, 0.f, gemm_info));
262  }
263 
264  if(run_optimised)
265  {
266  ARM_COMPUTE_RETURN_ERROR_ON(b->dimension(0) != output->dimension(0));
267  if(gemm_info.depth_output_gemm3d() != 0)
268  {
269  if(gemm_info.reinterpret_input_as_3d())
270  {
271  ARM_COMPUTE_RETURN_ERROR_ON(a->dimension(1) != output->dimension(1));
272  ARM_COMPUTE_RETURN_ERROR_ON(a->dimension(2) != output->dimension(2));
273  }
274  else
275  {
276  ARM_COMPUTE_RETURN_ERROR_ON(a->dimension(1) != output->dimension(1) * output->dimension(2));
277  }
278  }
279  else
280  {
281  ARM_COMPUTE_RETURN_ERROR_ON(a->dimension(1) != output->dimension(1));
282  }
283  }
284  else
285  {
286  ARM_COMPUTE_RETURN_ERROR_ON_MSG(gemm_info.reinterpret_input_as_3d(), "NEGEMM cannot reinterpret the input tensor as 3D");
287  ARM_COMPUTE_RETURN_ERROR_ON_MSG(gemm_info.depth_output_gemm3d() != 0, "NEGEMM cannot reinterpret the output tensor as 3D");
288 
289  const bool run_vector_matrix_multiplication = a->dimension(1) < 2;
290  if(!run_vector_matrix_multiplication)
291  {
292  matrix_a_info = &tmp_a_info;
293  matrix_b_info = &tmp_b_info;
294 
295  // The interleaved output matrix will have the following shape: [ a_height * 4, ceil(a_width / 4.0f) ]
296  TensorShape shape_tmp_a = a->tensor_shape();
297  shape_tmp_a.set(0, a->dimension(0) * 4);
298  shape_tmp_a.set(1, std::ceil(a->dimension(1) / 4.f));
299 
300  // The transpose1xW output matrix will have the following shape: [ b_height * 16, ceil(b_width / 16.0f) ]
301  TensorShape shape_tmp_b = b->tensor_shape();
302  shape_tmp_b.set(0, b->dimension(1) * 16);
303  shape_tmp_b.set(1, std::ceil(b->dimension(0) / 16.f));
304 
305  // Validate interleave kernel
306  auto_init_if_empty(tmp_a_info, a->clone()->set_tensor_shape(shape_tmp_a));
307  auto_init_if_empty(tmp_b_info, b->clone()->set_tensor_shape(shape_tmp_b));
308 
309  ARM_COMPUTE_RETURN_ON_ERROR(NEGEMMInterleave4x4Kernel::validate(a, &tmp_a_info));
310  ARM_COMPUTE_RETURN_ON_ERROR(NEGEMMTranspose1xWKernel::validate(b, &tmp_b_info));
311  }
312  }
313 
314  if(!run_optimised_requantized)
315  {
316  TensorInfo info_vector_sum_col{};
317  TensorInfo info_vector_sum_row{};
318 
319  // Validate matrix B reduction kernel only if _a_offset is not equal to 0
320  if(a_offset != 0)
321  {
322  info_vector_sum_col = TensorInfo(compute_reductionA_shape(*b), 1, DataType::S32);
323 
324  // Configure Matrix B reduction kernel
325  ARM_COMPUTE_RETURN_ON_ERROR(NEGEMMLowpMatrixBReductionKernel::validate(b, &info_vector_sum_col, a->dimension(0), false));
326  }
327 
328  // Validate Matrix A reduction kernel only if _b_offset is not equal to 0
329  if(b_offset != 0)
330  {
331  info_vector_sum_row = TensorInfo(compute_reductionB_shape(*a), 1, DataType::S32);
332 
333  // Configure matrix A reduction kernel
334  ARM_COMPUTE_RETURN_ON_ERROR(NEGEMMLowpMatrixAReductionKernel::validate(a, &info_vector_sum_row, a->dimension(0), false));
335  }
336 
337  if(fuse_output_stage)
338  {
339  if(!run_optimised)
340  {
341  ARM_COMPUTE_RETURN_ON_ERROR(NEGEMMLowpMatrixMultiplyKernel::validate(matrix_a_info, matrix_b_info, &mm_result_s32_info));
342  }
343 
344  // Validate offset contribution kernel
345  ARM_COMPUTE_RETURN_ON_ERROR(NEGEMMLowpOffsetContributionOutputStageKernel::validate(&mm_result_s32_info,
346  a_offset == 0 ? nullptr : &info_vector_sum_col,
347  b_offset == 0 ? nullptr : &info_vector_sum_row,
348  c, output, a_offset, b_offset,
349  gemm_info.gemmlowp_output_stage()));
350  }
351  else
352  {
353  if(!run_optimised)
354  {
355  ARM_COMPUTE_RETURN_ON_ERROR(NEGEMMLowpMatrixMultiplyKernel::validate(matrix_a_info, matrix_b_info, output));
356  }
357  // Validate offset contribution kernel
358  ARM_COMPUTE_RETURN_ON_ERROR(NEGEMMLowpOffsetContributionKernel::validate(output,
359  a_offset == 0 ? nullptr : &info_vector_sum_col,
360  b_offset == 0 ? nullptr : &info_vector_sum_row,
361  a_offset, b_offset));
362  }
363  }
364  return Status{};
365 }

References ARM_COMPUTE_RETURN_ERROR_ON, ARM_COMPUTE_RETURN_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN, ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES, ARM_COMPUTE_RETURN_ERROR_ON_MSG, ARM_COMPUTE_RETURN_ON_ERROR, arm_compute::auto_init_if_empty(), arm_compute::test::validation::b, ICloneable< T >::clone(), arm_compute::misc::shape_calculator::compute_reductionA_shape(), arm_compute::misc::shape_calculator::compute_reductionB_shape(), ITensorInfo::data_type(), GEMMInfo::depth_output_gemm3d(), ITensorInfo::dimension(), GEMMInfo::gemmlowp_output_stage(), GEMMInfo::is_a_reshaped(), GEMMInfo::is_b_reshaped(), arm_compute::is_data_type_quantized_asymmetric(), arm_compute::NONE, UniformQuantizationInfo::offset, arm_compute::QASYMM8, ITensorInfo::quantization_info(), GEMMInfo::reinterpret_input_as_3d(), arm_compute::S32, TensorShape::set(), ITensorInfo::tensor_shape(), GEMMLowpOutputStageInfo::type, QuantizationInfo::uniform(), NEGEMMInterleave4x4Kernel::validate(), NEGEMMLowpMatrixMultiplyKernel::validate(), NEGEMMLowpOffsetContributionKernel::validate(), NEGEMMTranspose1xWKernel::validate(), NEGEMMLowpMatrixAReductionKernel::validate(), NEGEMMAssemblyDispatch::validate(), NEGEMMLowpOffsetContributionOutputStageKernel::validate(), and NEGEMMLowpMatrixBReductionKernel::validate().

Referenced by NEGEMMLowpMatrixMultiplyCore::configure(), and NELSTMLayerQuantized::validate().


The documentation for this class was generated from the following files:

  • NEGEMMLowpMatrixMultiplyCore.h
  • NEGEMMLowpMatrixMultiplyCore.cpp