NEGEMMLowpAssemblyMatrixMultiplyCore Class Reference

Basic function to execute matrix multiply assembly kernels.

#include <NEGEMMLowpAssemblyMatrixMultiplyCore.h>

Public Member Functions

 NEGEMMLowpAssemblyMatrixMultiplyCore (std::shared_ptr< IMemoryManager > memory_manager=nullptr)
 Constructor.
void configure (const ITensor *a, const ITensor *b, const ITensor *c, ITensor *output)
 Initialise the kernel's inputs and output.
void run () override
 Run the kernels contained in the function.
Public Member Functions inherited from IFunction
virtual ~IFunction ()=default
 Destructor.
virtual void prepare ()
 Prepare the function for executing.

Detailed Description

Basic function to execute matrix multiply assembly kernels.

Constructor & Destructor Documentation

◆ NEGEMMLowpAssemblyMatrixMultiplyCore()

NEGEMMLowpAssemblyMatrixMultiplyCore ( std::shared_ptr< IMemoryManager > memory_manager = nullptr )


    : _memory_group(memory_manager), _asm_glue(memory_manager), _mm_kernel(nullptr), _mtx_a_reshape_kernel(nullptr), _mtx_b_reshape_kernel(nullptr), _tmp_a(), _tmp_b()
{
}

Member Function Documentation

◆ configure()

void configure ( const ITensor * a,
                 const ITensor * b,
                 const ITensor * c,
                 ITensor * output
               )

Initialise the kernel's inputs and output.

[in]  a       First input tensor (Matrix A). Data type supported: U8, S8.
[in]  b       Second input tensor (Matrix B). Data type supported: same as a.
[in]  c       Third input tensor (Matrix C). Data type supported: same as a.
[out] output  Output tensor. Data type supported: U32, S32.
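
To make the data types above concrete, the following is a minimal reference sketch of a low-precision matrix product: 8-bit unsigned inputs accumulated into 32-bit unsigned outputs. The helper name gemm_u8_reference and the row-major std::vector layout are assumptions made for illustration; they are not part of the library, and the sketch ignores the c tensor and any assembly-level optimisation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not a library function: row-major reference GEMM.
// a is m x k, b is k x n, the result is m x n.
std::vector<std::uint32_t> gemm_u8_reference(const std::vector<std::uint8_t> &a,
                                             const std::vector<std::uint8_t> &b,
                                             std::size_t m, std::size_t k, std::size_t n)
{
    // The inner dimension of A must match the number of rows of B.
    assert(a.size() == m * k);
    assert(b.size() == k * n);

    std::vector<std::uint32_t> out(m * n, 0);
    for(std::size_t row = 0; row < m; ++row)
    {
        for(std::size_t col = 0; col < n; ++col)
        {
            // Widen to 32 bits before multiplying so products do not wrap at 8 bits.
            std::uint32_t acc = 0;
            for(std::size_t i = 0; i < k; ++i)
            {
                acc += static_cast<std::uint32_t>(a[row * k + i]) * static_cast<std::uint32_t>(b[i * n + col]);
            }
            out[row * n + col] = acc;
        }
    }
    return out;
}
```

A single product of two U8 values can reach 255 * 255 = 65025, which is why the output uses a 32-bit type rather than the input type.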

{
    ARM_COMPUTE_ERROR_ON_MSG((a)->info()->dimension(0) != (b)->info()->dimension(1), "The product AB is defined only if the number of columns in A is equal to the number of rows in B");
    ARM_COMPUTE_ERROR_ON_MSG((a)->info()->dimension(1) != (output)->info()->dimension(1), "The output matrix must have the same number of rows as the matrix A");
    ARM_COMPUTE_ERROR_ON_MSG((b)->info()->dimension(0) != (output)->info()->dimension(0), "The output matrix must have the same number of columns as the matrix B");

    bool run_optimised = false;
    switch(a->info()->data_type())
    {
        case DataType::S8:
        case DataType::QASYMM8:
        case DataType::U8:
        {
            // Try the optimised assembly path first
            _asm_glue.configure(a, b, c, output, GEMMInfo(false, false, true));
            run_optimised = _asm_glue.is_configured();
            break;
        }
        default:
        {
            ARM_COMPUTE_ERROR("Datatype not supported");
            break;
        }
    }
    if(!run_optimised)
    {
        // The interleaved output matrix will have the following shape: [ a_width * 4, ceil(a_height / 4.0f) ]
        TensorShape shape_tmp_a = a->info()->tensor_shape();
        shape_tmp_a.set(0, a->info()->dimension(0) * 4);
        shape_tmp_a.set(1, std::ceil(a->info()->dimension(1) / 4.f));

        // The transpose1xW output matrix will have the following shape: [ b_height * 16, ceil(b_width / 16.0f) ]
        TensorShape shape_tmp_b = b->info()->tensor_shape();
        shape_tmp_b.set(0, b->info()->dimension(1) * 16);
        shape_tmp_b.set(1, std::ceil(b->info()->dimension(0) / 16.f));

        TensorInfo info_a(shape_tmp_a, 1, a->info()->data_type());
        TensorInfo info_b(shape_tmp_b, 1, b->info()->data_type());
        _tmp_a.allocator()->init(info_a);
        _tmp_b.allocator()->init(info_b);
        _memory_group.manage(&_tmp_a);
        _memory_group.manage(&_tmp_b);

        // Configure interleave kernel
        {
            auto k = arm_compute::support::cpp14::make_unique<NEGEMMInterleave4x4Kernel>();
            k->configure(a, &_tmp_a);
            _mtx_a_reshape_kernel = std::move(k);
        }

        // Configure transpose kernel
        {
            auto k = arm_compute::support::cpp14::make_unique<NEGEMMTranspose1xWKernel>();
            k->configure(b, &_tmp_b);
            _mtx_b_reshape_kernel = std::move(k);
        }

        // Configure matrix multiply kernel
        {
            auto k = arm_compute::support::cpp14::make_unique<NEGEMMLowpMatrixMultiplyKernel>();
            k->configure(&_tmp_a, &_tmp_b, output);
            _mm_kernel = std::move(k);
        }

        // Allocate tensors
        _tmp_a.allocator()->allocate();
        _tmp_b.allocator()->allocate();
    }
}
References TensorAllocator::allocate(), Tensor::allocator(), ARM_COMPUTE_ERROR, ARM_COMPUTE_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN, ARM_COMPUTE_ERROR_ON_MISMATCHING_DATA_TYPES, ARM_COMPUTE_ERROR_ON_MSG, arm_compute::test::validation::b, NEGEMMAssemblyDispatch::configure(), ITensorInfo::data_type(), ITensorInfo::dimension(), ITensor::info(), arm_compute::test::validation::info, TensorAllocator::init(), NEGEMMAssemblyDispatch::is_configured(), MemoryGroup::manage(), arm_compute::QASYMM8, arm_compute::S32, arm_compute::S8, TensorShape::set(), ITensorInfo::tensor_shape(), arm_compute::U32, and arm_compute::U8.
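
The shape arithmetic in configure() can be checked in isolation. The sketch below mirrors the three dimension checks and the two temporary shapes used on the fallback path, using a hypothetical Shape2D struct in place of TensorShape (x stands for dimension(0), the number of columns; y stands for dimension(1), the number of rows). None of these names exist in the library.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical stand-in for a 2D tensor shape: x = columns, y = rows.
struct Shape2D
{
    std::size_t x;
    std::size_t y;
};

// Mirrors the three ARM_COMPUTE_ERROR_ON_MSG conditions in configure():
// columns of A == rows of B, rows of output == rows of A,
// columns of output == columns of B.
bool shapes_are_compatible(const Shape2D &a, const Shape2D &b, const Shape2D &out)
{
    return a.x == b.y && a.y == out.y && b.x == out.x;
}

// Mirrors shape_tmp_a: dimension 0 is multiplied by 4,
// dimension 1 is divided by 4 and rounded up.
Shape2D interleaved_shape(const Shape2D &a)
{
    return { a.x * 4, static_cast<std::size_t>(std::ceil(a.y / 4.f)) };
}

// Mirrors shape_tmp_b: dimension 0 becomes the rows of B times 16,
// dimension 1 becomes the columns of B divided by 16 and rounded up.
Shape2D transposed_1xw_shape(const Shape2D &b)
{
    return { b.y * 16, static_cast<std::size_t>(std::ceil(b.x / 16.f)) };
}
```

For example, a 6-row by 8-column matrix A gives an interleaved shape of 32 by 2, and an 8-row by 10-column matrix B gives a transposed shape of 128 by 1. The rounding up means the temporary tensors can hold more elements than the inputs when a dimension is not a multiple of the block size.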

◆ run()

void run ( )

Run the kernels contained in the function.

For NEON kernels:

  • Multi-threading is used for the kernels which are parallelisable.
  • By default std::thread::hardware_concurrency() threads are used.
CPPScheduler::set_num_threads() can be used to set the number of threads manually.

For OpenCL kernels:

  • All the kernels are enqueued on the queue associated with CLScheduler.
  • The queue is then flushed.
The function will not block until the kernels are executed. It is the user's responsibility to wait.
prepare() is called on the first run if it has not been called already.
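
As a small illustration of the default thread count mentioned above, the sketch below queries std::thread::hardware_concurrency() and guards against the case where the standard library cannot determine a value and returns 0. The helper name default_thread_count is hypothetical, and whether the library applies a similar guard internally is an assumption, not something this page states.

```cpp
#include <algorithm>
#include <thread>

// Hypothetical helper, not a library function.
unsigned int default_thread_count()
{
    // hardware_concurrency() is only a hint and may return 0,
    // so fall back to a single thread in that case.
    return std::max(1u, std::thread::hardware_concurrency());
}
```

A value obtained this way could then be adjusted and passed to CPPScheduler::set_num_threads() when a different degree of parallelism is wanted.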

Implements IFunction.

{
    MemoryGroupResourceScope scope_mg(_memory_group);
    if(_mtx_a_reshape_kernel)
    {
        NEScheduler::get().schedule(_mtx_a_reshape_kernel.get(), Window::DimY);
    }

    if(_mtx_b_reshape_kernel)
    {
        NEScheduler::get().schedule(_mtx_b_reshape_kernel.get(), Window::DimY);
    }

    if(_asm_glue.is_configured())
    {
        // Optimised assembly path
        _asm_glue.run();
    }
    else
    {
        NEScheduler::get().schedule(_mm_kernel.get(), Window::DimY);
    }
}
References Window::DimY, Scheduler::get(), NEGEMMAssemblyDispatch::is_configured(), NEGEMMAssemblyDispatch::run(), and IScheduler::schedule().
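
The control flow of configure() and run() amounts to a two-path dispatch: use the optimised assembly path when it could be configured, otherwise reshape both inputs and run the generic kernel. The sketch below models only that decision with a hypothetical TwoPathFunction class that records step names instead of scheduling kernels; the class and the step names are illustrative assumptions.

```cpp
#include <string>
#include <vector>

// Hypothetical model of the dispatch logic; it schedules nothing.
class TwoPathFunction
{
public:
    // optimised_available models the result of _asm_glue.is_configured().
    explicit TwoPathFunction(bool optimised_available)
        : _optimised(optimised_available)
    {
    }

    // Returns the names of the steps that would execute, in order.
    std::vector<std::string> run() const
    {
        std::vector<std::string> steps;
        if(_optimised)
        {
            // The reshape kernels are never created on this path.
            steps.push_back("assembly");
        }
        else
        {
            steps.push_back("reshape_a");
            steps.push_back("reshape_b");
            steps.push_back("matrix_multiply");
        }
        return steps;
    }

private:
    bool _optimised;
};
```

Because the reshape kernels are only created when the assembly path is unavailable, the null checks in run() skip them automatically on the optimised path.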

