Compute Library
NEGEMMLowpAssemblyMatrixMultiplyCore Class Reference

Basic function to execute matrix multiply assembly kernels.

#include <NEGEMMLowpAssemblyMatrixMultiplyCore.h>

Public Member Functions

 NEGEMMLowpAssemblyMatrixMultiplyCore (std::shared_ptr< IMemoryManager > memory_manager=nullptr)
 Constructor.
void configure (const ITensor *a, const ITensor *b, const ITensor *c, ITensor *output)
 Initialise the kernel's inputs and output.
void run () override
 Run the kernels contained in the function.

Public Member Functions inherited from IFunction

virtual ~IFunction ()=default
 Destructor.
virtual void prepare ()
 Prepare the function for executing.

Detailed Description

Basic function to execute matrix multiply assembly kernels.

Definition at line 42 of file NEGEMMLowpAssemblyMatrixMultiplyCore.h.

Constructor & Destructor Documentation

◆ NEGEMMLowpAssemblyMatrixMultiplyCore()

NEGEMMLowpAssemblyMatrixMultiplyCore (std::shared_ptr< IMemoryManager > memory_manager = nullptr)


Definition at line 41 of file NEGEMMLowpAssemblyMatrixMultiplyCore.cpp.

    : _memory_group(memory_manager), _asm_glue(memory_manager), _mm_kernel(nullptr), _mtx_a_reshape_kernel(nullptr), _mtx_b_reshape_kernel(nullptr), _tmp_a(), _tmp_b()
{
}

Member Function Documentation

◆ configure()

void configure (const ITensor *a, const ITensor *b, const ITensor *c, ITensor *output)

Initialise the kernel's inputs and output.

Parameters
    [in]   a       First input tensor (Matrix A). Data type supported: U8, S8.
    [in]   b       Second input tensor (Matrix B). Data type supported: same as a.
    [in]   c       Third input tensor (Matrix C). Data type supported: same as a.
    [out]  output  Output tensor. Data type supported: U32, S32.
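
To make the data-type contract above concrete, here is a minimal reference sketch. It is not the library's NEON implementation: it multiplies two row-major U8 matrices and accumulates into U32, mirroring the U8 inputs and U32 output listed in the parameters. The name gemm_lowp_reference and the row-major std::vector layout are assumptions made purely for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical reference (not part of the library):
// out[m x n] = a[m x k] * b[k x n], all matrices stored row-major.
std::vector<std::uint32_t> gemm_lowp_reference(const std::vector<std::uint8_t> &a,
                                               const std::vector<std::uint8_t> &b,
                                               std::size_t m, std::size_t k, std::size_t n)
{
    // Same idea as the dimension checks in configure(): the operands must agree on k.
    if(a.size() != m * k || b.size() != k * n)
    {
        throw std::invalid_argument("matrix sizes do not match the given dimensions");
    }

    std::vector<std::uint32_t> out(m * n, 0);
    for(std::size_t row = 0; row < m; ++row)
    {
        for(std::size_t col = 0; col < n; ++col)
        {
            // Widen each 8-bit operand before multiplying so the product cannot overflow.
            std::uint32_t acc = 0;
            for(std::size_t i = 0; i < k; ++i)
            {
                acc += static_cast<std::uint32_t>(a[row * k + i]) * static_cast<std::uint32_t>(b[i * n + col]);
            }
            out[row * n + col] = acc;
        }
    }
    return out;
}
```

The wider accumulator is the reason the output type differs from the input type: a single product of two U8 values can already exceed the 8-bit range.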

Definition at line 46 of file NEGEMMLowpAssemblyMatrixMultiplyCore.cpp.

{
    ARM_COMPUTE_ERROR_ON_MSG((a)->info()->dimension(0) != (b)->info()->dimension(1), "The product AB is defined only if the number of columns in A is equal to the number of rows in B");
    ARM_COMPUTE_ERROR_ON_MSG((a)->info()->dimension(1) != (output)->info()->dimension(1), "The output matrix must have the same number of rows as the matrix A");
    ARM_COMPUTE_ERROR_ON_MSG((b)->info()->dimension(0) != (output)->info()->dimension(0), "The output matrix must have the same number of columns as the matrix B");

    bool run_optimised = false;
    switch(a->info()->data_type())
    {
        case DataType::S8:
        case DataType::QASYMM8:
        case DataType::U8:
        {
            _asm_glue.configure(a, b, c, output, 1.f, 0.f, GEMMInfo(false, false, true));
            run_optimised = _asm_glue.is_configured();
            break;
        }
        default:
        {
            ARM_COMPUTE_ERROR("Datatype not supported");
            break;
        }
    }
    if(!run_optimised)
    {
        // The interleaved output matrix will have the following shape: [ a_width * 4, ceil(a_height / 4.0f) ]
        TensorShape shape_tmp_a = a->info()->tensor_shape();
        shape_tmp_a.set(0, a->info()->dimension(0) * 4);
        shape_tmp_a.set(1, std::ceil(a->info()->dimension(1) / 4.f));

        // The transpose1xW output matrix will have the following shape: [ b_height * 16, ceil(b_width / 16.0f) ]
        TensorShape shape_tmp_b = b->info()->tensor_shape();
        shape_tmp_b.set(0, b->info()->dimension(1) * 16);
        shape_tmp_b.set(1, std::ceil(b->info()->dimension(0) / 16.f));

        TensorInfo info_a(shape_tmp_a, 1, a->info()->data_type());
        TensorInfo info_b(shape_tmp_b, 1, b->info()->data_type());
        _tmp_a.allocator()->init(info_a);
        _tmp_b.allocator()->init(info_b);
        _memory_group.manage(&_tmp_a);
        _memory_group.manage(&_tmp_b);

        // Configure interleave kernel
        {
            auto k = arm_compute::support::cpp14::make_unique<NEGEMMInterleave4x4Kernel>();
            k->configure(a, &_tmp_a);
            _mtx_a_reshape_kernel = std::move(k);
        }

        // Configure transpose kernel
        {
            auto k = arm_compute::support::cpp14::make_unique<NEGEMMTranspose1xWKernel>();
            k->configure(b, &_tmp_b);
            _mtx_b_reshape_kernel = std::move(k);
        }

        // Configure matrix multiply kernel
        {
            auto k = arm_compute::support::cpp14::make_unique<NEGEMMLowpMatrixMultiplyKernel>();
            k->configure(&_tmp_a, &_tmp_b, output);
            _mm_kernel = std::move(k);
        }

        // Allocate tensors
        _tmp_a.allocator()->allocate();
        _tmp_b.allocator()->allocate();
    }
}
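
The shapes of the two temporary tensors in the fallback path can be worked out in isolation. The sketch below reproduces only that arithmetic, taking dimension(0) as the width and dimension(1) as the height, as the error messages in the listing imply. The names Shape2D, ceil_div, interleave4x4_shape and transpose1xw_shape are hypothetical helpers, and integer ceiling division stands in for the std::ceil call on a float used in the listing.

```cpp
#include <cstddef>
#include <utility>

// (width, height) pair, standing in for dimension(0) and dimension(1).
using Shape2D = std::pair<std::size_t, std::size_t>;

// Integer ceiling division (divisor must be non-zero).
constexpr std::size_t ceil_div(std::size_t value, std::size_t divisor)
{
    return (value + divisor - 1) / divisor;
}

// Shape of the interleaved copy of A, following the two set() calls on
// shape_tmp_a: [ a_width * 4, ceil(a_height / 4) ].
Shape2D interleave4x4_shape(std::size_t a_width, std::size_t a_height)
{
    return { a_width * 4, ceil_div(a_height, 4) };
}

// Shape of the transposed copy of B, following the two set() calls on
// shape_tmp_b: [ b_height * 16, ceil(b_width / 16) ].
Shape2D transpose1xw_shape(std::size_t b_width, std::size_t b_height)
{
    return { b_height * 16, ceil_div(b_width, 16) };
}
```

For example, an A matrix 8 columns wide and 6 rows high would give an interleaved shape of [32, 2], and a B matrix 20 columns wide and 8 rows high would give a transposed shape of [128, 2].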

References TensorAllocator::allocate(), Tensor::allocator(), ARM_COMPUTE_ERROR, ARM_COMPUTE_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN, ARM_COMPUTE_ERROR_ON_MISMATCHING_DATA_TYPES, ARM_COMPUTE_ERROR_ON_MSG, arm_compute::test::validation::b, NEGEMMAssemblyDispatch::configure(), ITensorInfo::data_type(), ITensorInfo::dimension(), ITensor::info(), arm_compute::test::validation::info, TensorAllocator::init(), NEGEMMAssemblyDispatch::is_configured(), MemoryGroupBase< TensorType >::manage(), arm_compute::QASYMM8, arm_compute::S32, arm_compute::S8, TensorShape::set(), ITensorInfo::tensor_shape(), arm_compute::U32, and arm_compute::U8.

◆ run()

void run ( )

Run the kernels contained in the function.

For NEON kernels:

  • Multi-threading is used for the kernels which are parallelisable.
  • By default std::thread::hardware_concurrency() threads are used.

Note: CPPScheduler::set_num_threads() can be used to manually set the number of threads.

For OpenCL kernels:

  • All the kernels are enqueued on the queue associated with CLScheduler.
  • The queue is then flushed.

Note: The function will not block until the kernels are executed. It is the user's responsibility to wait.
Note: prepare() will be called on the first run if it has not been called already.
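
As an illustration of the threading default described above, the following sketch shows one way a row range could be divided among workers, falling back to std::thread::hardware_concurrency() when no thread count is given. It is not the scheduler's actual splitting logic; split_rows and its even-chunk policy are assumptions made for this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical helper: splits rows [0, total) into at most num_threads
// contiguous [begin, end) ranges. Passing 0 selects the hardware default.
std::vector<std::pair<std::size_t, std::size_t>> split_rows(std::size_t total, unsigned int num_threads = 0)
{
    if(num_threads == 0)
    {
        // hardware_concurrency() may return 0 if the count cannot be determined.
        num_threads = std::max(1u, std::thread::hardware_concurrency());
    }

    // Even chunk size, rounded up so that every row is covered.
    const std::size_t chunk = (total + num_threads - 1) / num_threads;

    std::vector<std::pair<std::size_t, std::size_t>> ranges;
    for(std::size_t begin = 0; begin < total; begin += chunk)
    {
        ranges.emplace_back(begin, std::min(total, begin + chunk));
    }
    return ranges;
}
```

Passing an explicit count plays the role that CPPScheduler::set_num_threads() plays in the library: it overrides the hardware-derived default.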

Implements IFunction.

Definition at line 118 of file NEGEMMLowpAssemblyMatrixMultiplyCore.cpp.

{
    MemoryGroupResourceScope scope_mg(_memory_group);
    if(_mtx_a_reshape_kernel)
    {
        NEScheduler::get().schedule(_mtx_a_reshape_kernel.get(), Window::DimY);
    }

    if(_mtx_b_reshape_kernel)
    {
        NEScheduler::get().schedule(_mtx_b_reshape_kernel.get(), Window::DimY);
    }

    if(_asm_glue.is_configured())
    {
        _asm_glue.run();
    }
    else
    {
        NEScheduler::get().schedule(_mm_kernel.get(), Window::DimY);
    }
}
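
The control flow of run() can be summarised with a small stand-alone model. It is only a sketch of the dispatch pattern, with hypothetical names (StepSketch, DispatchSketch, make_dispatch) and strings in place of real kernels: the reshape steps exist only when the fallback path was configured, and the final step is either the assembly path or the generic multiply kernel.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for a kernel: only carries a name.
struct StepSketch
{
    std::string name;
};

// Hypothetical model of the function object's state after configure().
struct DispatchSketch
{
    bool                        asm_configured = false; // role of _asm_glue.is_configured()
    std::unique_ptr<StepSketch> reshape_a;              // role of _mtx_a_reshape_kernel
    std::unique_ptr<StepSketch> reshape_b;              // role of _mtx_b_reshape_kernel
    std::unique_ptr<StepSketch> mm_kernel;              // role of _mm_kernel

    // Returns the names of the steps in the order run() would execute them.
    std::vector<std::string> run() const
    {
        std::vector<std::string> executed;
        if(reshape_a) executed.push_back(reshape_a->name);
        if(reshape_b) executed.push_back(reshape_b->name);
        if(asm_configured)
        {
            executed.push_back("assembly");
        }
        else if(mm_kernel)
        {
            executed.push_back(mm_kernel->name);
        }
        return executed;
    }
};

// Builds the state that configure() would leave behind for each path.
DispatchSketch make_dispatch(bool asm_available)
{
    DispatchSketch d;
    d.asm_configured = asm_available;
    if(!asm_available)
    {
        d.reshape_a.reset(new StepSketch{ "interleave4x4" });
        d.reshape_b.reset(new StepSketch{ "transpose1xW" });
        d.mm_kernel.reset(new StepSketch{ "matrix_multiply" });
    }
    return d;
}
```

In this model the null checks on the reshape pointers are what make the same run() body serve both paths, which matches the shape of the listing above.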

References Window::DimY, Scheduler::get(), NEGEMMAssemblyDispatch::is_configured(), NEGEMMAssemblyDispatch::run(), and IScheduler::schedule().

The documentation for this class was generated from the following files: