Compute Library
 19.08
CLGEMMLowpMatrixMultiplyCore Class Reference

Basic function to execute GEMMLowpMatrixMultiplyCore on OpenCL. More...

#include <CLGEMMLowpMatrixMultiplyCore.h>

Collaboration diagram for CLGEMMLowpMatrixMultiplyCore:

Public Member Functions

 CLGEMMLowpMatrixMultiplyCore (std::shared_ptr< IMemoryManager > memory_manager=nullptr)
 Constructor. More...
 
 CLGEMMLowpMatrixMultiplyCore (const CLGEMMLowpMatrixMultiplyCore &)=delete
 Prevent instances of this class from being copied (As this class contains pointers) More...
 
 CLGEMMLowpMatrixMultiplyCore (CLGEMMLowpMatrixMultiplyCore &&)=default
 Default move constructor. More...
 
CLGEMMLowpMatrixMultiplyCore & operator= (const CLGEMMLowpMatrixMultiplyCore &)=delete
 Prevent instances of this class from being copied (As this class contains pointers) More...
 
CLGEMMLowpMatrixMultiplyCore & operator= (CLGEMMLowpMatrixMultiplyCore &&)=default
 Default move assignment operator. More...
 
void configure (const ICLTensor *a, const ICLTensor *b, const ICLTensor *c, ICLTensor *output, const GEMMInfo &gemm_info=GEMMInfo())
 Initialise the kernel's inputs and output. More...
 
void run () override
 Run the kernels contained in the function. More...
 
void prepare () override
 Prepare the function for executing. More...
 
- Public Member Functions inherited from IFunction
virtual ~IFunction ()=default
 Destructor. More...
 

Static Public Member Functions

static Status validate (const ITensorInfo *a, const ITensorInfo *b, const ITensorInfo *c, const ITensorInfo *output, const GEMMInfo &gemm_info=GEMMInfo())
 Static function to check if given info will lead to a valid configuration of CLGEMMLowpMatrixMultiplyCore. More...
 

Detailed Description

Basic function to execute GEMMLowpMatrixMultiplyCore on OpenCL.

This function calls the following OpenCL kernels:

  1. CLGEMMReshapeRHSMatrixKernel (if the output tensor is a matrix)
  2. CLGEMMLowpMatrixMultiplyKernel (if the parameter "reshape_b_only_on_first_run" of GEMMInfo is FALSE)
  3. CLGEMMLowpMatrixMultiplyReshapedOnlyRHSKernel (if the parameter "reshape_b_only_on_first_run" of GEMMInfo is TRUE)
  4. CLGEMMLowpMatrixAReductionKernel (if the offset of matrix B is not 0)
  5. CLGEMMLowpMatrixBReductionKernel (if the offset of matrix A is not 0)
  6. CLGEMMLowpOffsetContributionKernel (if gemm_info.gemmlowp_output_stage == NONE)
  7. CLGEMMLowpOffsetContributionOutputStageKernel (if gemm_info.gemmlowp_output_stage != NONE)

Definition at line 54 of file CLGEMMLowpMatrixMultiplyCore.h.

Constructor & Destructor Documentation

◆ CLGEMMLowpMatrixMultiplyCore() [1/3]

CLGEMMLowpMatrixMultiplyCore ( std::shared_ptr< IMemoryManager > memory_manager = nullptr )

Constructor.

Definition at line 50 of file CLGEMMLowpMatrixMultiplyCore.cpp.

51  : _memory_group(std::move(memory_manager)),
52  _mm_midgard_kernel(),
53  _mm_native_kernel(),
54  _mm_reshaped_only_rhs_kernel(),
55  _mtx_b_reshape_kernel(),
56  _mtx_a_reduction_kernel(),
57  _mtx_b_reduction_kernel(),
58  _offset_contribution_kernel(),
59  _offset_contribution_output_stage_kernel(),
60  _vector_sum_col(),
61  _vector_sum_row(),
62  _tmp_b(),
63  _mm_result_s32(),
64  _original_b(nullptr),
65  _a_offset(0),
66  _b_offset(0),
67  _is_gemm_reshaped(true),
68  _is_midgard(false),
69  _reshape_b_only_on_first_run(false),
70  _is_prepared(false),
71  _fuse_output_stage(false)
72 {
73 }

◆ CLGEMMLowpMatrixMultiplyCore() [2/3]

Prevent instances of this class from being copied (As this class contains pointers)

◆ CLGEMMLowpMatrixMultiplyCore() [3/3]

Default move constructor.

Member Function Documentation

◆ configure()

void configure ( const ICLTensor *  a,
                 const ICLTensor *  b,
                 const ICLTensor *  c,
                 ICLTensor *        output,
                 const GEMMInfo &   gemm_info = GEMMInfo()
               )

Initialise the kernel's inputs and output.

Note
GEMMLowp: low precision GEMM kernel [A * B + C]. This function performs the following computations:
  1. Convert a values from QASYMM8 to int32 and add a_offset to each of them.
  2. Convert b values from QASYMM8 to int32 and add b_offset to each of them.
  3. Compute the matrix product of the resulting a * b in int32.
  4. Quantize to uint8 if gemm_info.gemmlowp_output_stage != NONE
Parameters
    [in]  a          First input tensor (Matrix A). Data type supported: QASYMM8.
    [in]  b          Second input tensor (Matrix B). Data type supported: same as a
    [in]  c          Third input tensor (Matrix C). It can be a nullptr. Data type supported: S32
    [out] output     Output tensor. Data type supported: S32 or QASYMM8 if gemm_info.gemmlowp_output_stage != NONE
    [in]  gemm_info  (Optional) Specifies if the matrix A and/or matrix B have been reshaped and if the reshape of matrix B should be executed only for the first run

Definition at line 75 of file CLGEMMLowpMatrixMultiplyCore.cpp.

76 {
77  ARM_COMPUTE_ERROR_ON_NULLPTR(a, b, output);
78  ARM_COMPUTE_ERROR_THROW_ON(CLGEMMLowpMatrixMultiplyCore::validate(a->info(), b->info(), c != nullptr ? c->info() : nullptr, output->info(), gemm_info));
79 
80  _is_prepared = false;
81  _original_b = b;
82  _reshape_b_only_on_first_run = gemm_info.reshape_b_only_on_first_run();
83  _a_offset = a->info()->quantization_info().uniform().offset;
84  _b_offset = b->info()->quantization_info().uniform().offset;
85 
86  // Get the GPU target
87  const GPUTarget gpu_target = CLScheduler::get().target();
88 
89  // Set the target for the kernels
90  _mm_midgard_kernel.set_target(gpu_target);
91  _mm_native_kernel.set_target(gpu_target);
92  _mm_reshaped_only_rhs_kernel.set_target(gpu_target);
93 
94  const ICLTensor *matrix_a = a;
95  const ICLTensor *matrix_b = b;
96  GEMMRHSMatrixInfo rhs_info;
97  GEMMLHSMatrixInfo lhs_info;
98 
99  // Arguments used by GEMMReshapeInfo
100  // If we pass the matrix A and matrix B reshaped to CLGEMMMatrixMultiplyKernel, we need to pass m, n, k, mult_transpose1xW_width and mult_interleave4x4_height to CLGEMMReshapeInfo
101  // in order to know how the matrices have been reshaped
102  bool reinterpret_input_as_3d = gemm_info.reinterpret_input_as_3d();
103  const unsigned int m = reinterpret_input_as_3d ? (a->info()->dimension(1) * a->info()->dimension(2)) : a->info()->dimension(1);
104  const unsigned int n = b->info()->dimension(0);
105  const unsigned int k = a->info()->dimension(0);
106  const unsigned int batch_size = reinterpret_input_as_3d ? a->info()->dimension(3) : a->info()->dimension(2);
107  const int depth_output_gemm3d = gemm_info.depth_output_gemm3d();
108 
109  // Check if we need to reshape the matrix A and matrix B
110  _is_gemm_reshaped = is_gemm_reshaped(_reshape_b_only_on_first_run, gpu_target);
111  _is_midgard = gpu_target == GPUTarget::MIDGARD;
112 
113  if(_is_gemm_reshaped)
114  {
115  matrix_b = &_tmp_b;
116 
117  if(!_reshape_b_only_on_first_run)
118  {
119  _memory_group.manage(&_tmp_b);
120  }
121 
122  // Pick up the GEMM configuration
123  std::tie(lhs_info, rhs_info) = CLGEMMReshapedOnlyRHSKernelConfigurationFactory::create(gpu_target)->configure(m, n, k, batch_size, DataType::QASYMM8);
124 
125  // Configure reshape RHS kernel
126  _mtx_b_reshape_kernel.configure(b, &_tmp_b, rhs_info);
127  }
128 
129  // Initialize matrix B reduction kernel only if _a_offset is not equal to 0
130  if(_a_offset != 0)
131  {
132  TensorInfo info_vector_sum_col(compute_reductionA_shape(*b->info()), 1, DataType::S32);
133  _vector_sum_col.allocator()->init(info_vector_sum_col);
134  if(!_reshape_b_only_on_first_run)
135  {
136  _memory_group.manage(&_vector_sum_col);
137  }
138 
139  // Configure Matrix B reduction kernel
140  _mtx_b_reduction_kernel.configure(b, &_vector_sum_col);
141  }
142 
143  // Initialize Matrix A reduction kernel only if _b_offset is not equal to 0
144  if(_b_offset != 0)
145  {
146  TensorInfo info_vector_sum_row(compute_reductionB_shape(*a->info()), 1, DataType::S32);
147  _vector_sum_row.allocator()->init(info_vector_sum_row);
148  _memory_group.manage(&_vector_sum_row);
149 
150  // Configure matrix A reduction kernel
151  _mtx_a_reduction_kernel.configure(a, &_vector_sum_row);
152  }
153 
154  // If GEMMLowpOutputStage != NONE, fuse the offset contribution with the output stage
155  if(gemm_info.gemmlowp_output_stage().type != GEMMLowpOutputStageType::NONE)
156  {
157  _fuse_output_stage = true;
158 
159  _memory_group.manage(&_mm_result_s32);
160 
161  if(_is_gemm_reshaped)
162  {
163  // Configure and tune matrix multiply kernel
164  _mm_reshaped_only_rhs_kernel.configure(matrix_a, matrix_b, &_mm_result_s32, lhs_info, rhs_info, GEMMReshapeInfo(m, n, k, 1, 1, depth_output_gemm3d, reinterpret_input_as_3d));
165  }
166  else
167  {
168  if(_is_midgard)
169  {
170  // Configure matrix multiply kernel
171  _mm_midgard_kernel.configure(matrix_a, matrix_b, &_mm_result_s32, GEMMReshapeInfo(m, n, k, 1, 1, depth_output_gemm3d, reinterpret_input_as_3d));
172  }
173  else
174  {
175  // Pick up the GEMM configuration
176  std::tie(lhs_info, rhs_info) = CLGEMMNativeKernelConfigurationFactory::create(gpu_target)->configure(m, n, k, batch_size, DataType::QASYMM8);
177 
178  // Configure matrix multiply kernel
179  _mm_native_kernel.configure(matrix_a, matrix_b, &_mm_result_s32, lhs_info, rhs_info, GEMMReshapeInfo(m, n, k, 1, 1, depth_output_gemm3d, reinterpret_input_as_3d));
180  }
181  }
182 
183  // Configure offset contribution kernel
184  _offset_contribution_output_stage_kernel.configure(&_mm_result_s32, _a_offset == 0 ? nullptr : &_vector_sum_col, _b_offset == 0 ? nullptr : &_vector_sum_row, c, output, a->info()->dimension(0),
185  _a_offset, _b_offset, gemm_info.gemmlowp_output_stage());
186 
187  _mm_result_s32.allocator()->allocate();
188  }
189  else
190  {
191  if(_is_gemm_reshaped)
192  {
193  // Configure and tune matrix multiply kernel
194  _mm_reshaped_only_rhs_kernel.configure(matrix_a, matrix_b, output, lhs_info, rhs_info, GEMMReshapeInfo(m, n, k, 1, 1, depth_output_gemm3d, reinterpret_input_as_3d));
195  }
196  else
197  {
198  if(_is_midgard)
199  {
200  // Configure matrix multiply kernel
201  _mm_midgard_kernel.configure(matrix_a, matrix_b, output, GEMMReshapeInfo(m, n, k, 1, 1, depth_output_gemm3d, reinterpret_input_as_3d));
202  }
203  else
204  {
205  // Pick up the GEMM configuration
206  std::tie(lhs_info, rhs_info) = CLGEMMNativeKernelConfigurationFactory::create(gpu_target)->configure(m, n, k, batch_size, DataType::QASYMM8);
207 
208  // Configure matrix multiply kernel
209  _mm_native_kernel.configure(matrix_a, matrix_b, output, lhs_info, rhs_info, GEMMReshapeInfo(m, n, k, 1, 1, depth_output_gemm3d, reinterpret_input_as_3d));
210  }
211  }
212 
213  // Configure offset contribution kernel
214  _offset_contribution_kernel.configure(output, _a_offset == 0 ? nullptr : &_vector_sum_col, _b_offset == 0 ? nullptr : &_vector_sum_row, c, a->info()->dimension(0), _a_offset, _b_offset);
215  }
216 
217  // Allocate tensors
218  if(_is_gemm_reshaped)
219  {
220  if(!_reshape_b_only_on_first_run)
221  {
222  _tmp_b.allocator()->allocate();
223  }
224  }
225 
226  if(_a_offset != 0 && !_reshape_b_only_on_first_run)
227  {
228  _vector_sum_col.allocator()->allocate();
229  }
230 
231  if(_b_offset != 0)
232  {
233  _vector_sum_row.allocator()->allocate();
234  }
235 }

References CLTensorAllocator::allocate(), CLTensor::allocator(), ARM_COMPUTE_ERROR_ON_NULLPTR, ARM_COMPUTE_ERROR_THROW_ON, arm_compute::test::validation::b, arm_compute::misc::shape_calculator::compute_reductionA_shape(), arm_compute::misc::shape_calculator::compute_reductionB_shape(), CLGEMMLowpMatrixMultiplyNativeKernel::configure(), CLGEMMReshapeRHSMatrixKernel::configure(), CLGEMMLowpMatrixMultiplyReshapedOnlyRHSKernel::configure(), CLGEMMLowpMatrixMultiplyKernel::configure(), CLGEMMLowpOffsetContributionOutputStageKernel::configure(), CLGEMMLowpOffsetContributionKernel::configure(), CLGEMMLowpMatrixAReductionKernel::configure(), CLGEMMLowpMatrixBReductionKernel::configure(), CLGEMMNativeKernelConfigurationFactory::create(), CLGEMMReshapedOnlyRHSKernelConfigurationFactory::create(), GEMMInfo::depth_output_gemm3d(), ITensorInfo::dimension(), GEMMInfo::gemmlowp_output_stage(), CLScheduler::get(), ITensor::info(), ITensorAllocator::init(), MemoryGroupBase< TensorType >::manage(), arm_compute::MIDGARD, arm_compute::NONE, UniformQuantizationInfo::offset, arm_compute::QASYMM8, ITensorInfo::quantization_info(), GEMMInfo::reinterpret_input_as_3d(), GEMMInfo::reshape_b_only_on_first_run(), arm_compute::S32, ICLKernel::set_target(), CLScheduler::target(), GEMMLowpOutputStageInfo::type, QuantizationInfo::uniform(), and CLGEMMLowpMatrixMultiplyCore::validate().

Referenced by CLLSTMLayerQuantized::configure(), CLGEMMDeconvolutionLayer::configure(), and arm_compute::test::validation::DATA_TEST_CASE().

◆ operator=() [1/2]

Prevent instances of this class from being copied (As this class contains pointers)

◆ operator=() [2/2]

Default move assignment operator.

◆ prepare()

void prepare ( )
override virtual

Prepare the function for executing.

Any one off pre-processing step required by the function is handled here

Note
Prepare stage might not need all the function's buffers' backing memory to be available in order to execute

Reimplemented from IFunction.

Definition at line 437 of file CLGEMMLowpMatrixMultiplyCore.cpp.

438 {
439  if(!_is_prepared)
440  {
441  if(_is_gemm_reshaped && _reshape_b_only_on_first_run)
442  {
443  ARM_COMPUTE_ERROR_ON(!_original_b->is_used());
444 
445  // Run reshape kernel and mark original weights tensor as unused
446  _tmp_b.allocator()->allocate();
447  CLScheduler::get().enqueue(_mtx_b_reshape_kernel, false);
448  _original_b->mark_as_unused();
449  }
450 
451  // Run matrix B reduction kernel only if _a_offset is not equal to 0
452  if(_a_offset != 0 && _reshape_b_only_on_first_run)
453  {
454  _vector_sum_col.allocator()->allocate();
455  CLScheduler::get().enqueue(_mtx_b_reduction_kernel, false);
456  }
457 
458  CLScheduler::get().queue().finish();
459  _is_prepared = true;
460  }
461 }

References CLTensorAllocator::allocate(), CLTensor::allocator(), ARM_COMPUTE_ERROR_ON, CLScheduler::enqueue(), CLScheduler::get(), ITensor::is_used(), ITensor::mark_as_unused(), and CLScheduler::queue().

Referenced by CLGEMMDeconvolutionLayer::prepare(), CLGEMMConvolutionLayer::prepare(), and CLGEMMLowpMatrixMultiplyCore::run().

◆ run()

void run ( )
override virtual

Run the kernels contained in the function.

For NEON kernels:

  • Multi-threading is used for the kernels which are parallelisable.
  • By default std::thread::hardware_concurrency() threads are used.
Note
CPPScheduler::set_num_threads() can be used to manually set the number of threads

For OpenCL kernels:

  • All the kernels are enqueued on the queue associated with CLScheduler.
  • The queue is then flushed.
Note
The function will not block until the kernels are executed. It is the user's responsibility to wait.
Will call prepare() on first run if hasn't been done

Implements IFunction.

Definition at line 381 of file CLGEMMLowpMatrixMultiplyCore.cpp.

382 {
383  prepare();
384 
385  MemoryGroupResourceScope scope_mg(_memory_group);
386 
387  if(_is_gemm_reshaped)
388  {
389  if(!_reshape_b_only_on_first_run)
390  {
391  // Run reshape matrix B
392  CLScheduler::get().enqueue(_mtx_b_reshape_kernel, false);
393  }
394  }
395 
396  // Run matrix B reduction kernel only if _a_offset is not equal to 0
397  if(_a_offset != 0 && !_reshape_b_only_on_first_run)
398  {
399  CLScheduler::get().enqueue(_mtx_b_reduction_kernel, false);
400  }
401 
402  // Run matrix multiply
403  if(_is_gemm_reshaped)
404  {
405  CLScheduler::get().enqueue(_mm_reshaped_only_rhs_kernel, false);
406  }
407  else
408  {
409  if(_is_midgard)
410  {
411  CLScheduler::get().enqueue(_mm_midgard_kernel, false);
412  }
413  else
414  {
415  CLScheduler::get().enqueue(_mm_native_kernel, false);
416  }
417  }
418 
419  // Run matrix A reduction kernel only if _b_offset is not equal to 0
420  if(_b_offset != 0)
421  {
422  CLScheduler::get().enqueue(_mtx_a_reduction_kernel, false);
423  }
424 
425  if(_fuse_output_stage)
426  {
427  // Run offset contribution/output stage kernel
428  CLScheduler::get().enqueue(_offset_contribution_output_stage_kernel, true);
429  }
430  else
431  {
432  // Run offset contribution kernel
433  CLScheduler::get().enqueue(_offset_contribution_kernel, true);
434  }
435 }

References CLScheduler::enqueue(), CLScheduler::get(), and CLGEMMLowpMatrixMultiplyCore::prepare().

Referenced by CLGEMMDeconvolutionLayer::run(), CLFullyConnectedLayer::run(), CLLSTMLayerQuantized::run(), and CLGEMMConvolutionLayer::run().

◆ validate()

static Status validate ( const ITensorInfo *  a,
                         const ITensorInfo *  b,
                         const ITensorInfo *  c,
                         const ITensorInfo *  output,
                         const GEMMInfo &     gemm_info = GEMMInfo()
                       )
static

Static function to check if given info will lead to a valid configuration of CLGEMMLowpMatrixMultiplyCore.

Parameters
    [in]  a          First input tensor (Matrix A). Data type supported: QASYMM8.
    [in]  b          Second input tensor (Matrix B). Data type supported: same as a
    [in]  c          Third input tensor (Matrix C). It can be a nullptr. Data type supported: S32
    [in]  output     Output tensor. Data type supported: S32 or QASYMM8 if gemm_info.gemmlowp_output_stage != NONE
    [in]  gemm_info  (Optional) Specifies if the matrix A and/or matrix B have been reshaped and if the reshape of matrix B should be executed only for the first run
Returns
a status

Definition at line 237 of file CLGEMMLowpMatrixMultiplyCore.cpp.

238 {
239  ARM_COMPUTE_RETURN_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN(a, 1, DataType::QASYMM8);
240  ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES(a, b);
241  ARM_COMPUTE_RETURN_ERROR_ON_MSG(gemm_info.is_a_reshaped(), "Matrix A already reshaped is not supported");
242  ARM_COMPUTE_RETURN_ERROR_ON_MSG(gemm_info.is_b_reshaped(), "Matrix B already reshaped is not supported");
243 
244  int32_t a_offset = a->quantization_info().uniform().offset;
245  int32_t b_offset = b->quantization_info().uniform().offset;
246 
247  const ITensorInfo *matrix_a_info = a;
248  const ITensorInfo *matrix_b_info = b;
249 
250  TensorInfo tmp_b_info{};
251  GEMMRHSMatrixInfo rhs_info;
252  GEMMLHSMatrixInfo lhs_info;
253 
254  // Get the GPU target
255  const GPUTarget gpu_target = CLScheduler::get().target();
256 
257  bool reinterpret_input_as_3d = gemm_info.reinterpret_input_as_3d();
258  const unsigned int m = reinterpret_input_as_3d ? (a->dimension(1) * a->dimension(2)) : a->dimension(1);
259  const unsigned int n = b->dimension(0);
260  const unsigned int k = a->dimension(0);
261  const unsigned int batch_size = reinterpret_input_as_3d ? a->dimension(3) : a->dimension(2);
262  const int depth_output_gemm3d = gemm_info.depth_output_gemm3d();
263  const bool is_midgard = gpu_target == GPUTarget::MIDGARD;
264 
265  bool reshape_matrix_b = is_gemm_reshaped(gemm_info.reshape_b_only_on_first_run(), CLScheduler::get().target());
266 
267  const GEMMReshapeInfo reshape_info = GEMMReshapeInfo(m, n, k, 1, 1, depth_output_gemm3d, reinterpret_input_as_3d);
268 
269  if(reshape_matrix_b)
270  {
271  matrix_b_info = &tmp_b_info;
272 
273  // Pick up the GEMM configuration
274  std::tie(lhs_info, rhs_info) = CLGEMMReshapedOnlyRHSKernelConfigurationFactory::create(gpu_target)->configure(m, n, k, batch_size, DataType::QASYMM8);
275 
276  // Validate reshape RHS kernel
277  auto_init_if_empty(tmp_b_info, b->clone()->set_tensor_shape(compute_rhs_reshaped_shape(*b, rhs_info)));
278  ARM_COMPUTE_RETURN_ON_ERROR(CLGEMMReshapeRHSMatrixKernel::validate(b, &tmp_b_info, rhs_info));
279  }
280 
281  TensorInfo info_vector_sum_col{};
282  TensorInfo info_vector_sum_row{};
283 
284  // Validate matrix B reduction kernel only if _a_offset is not equal to 0
285  if(a_offset != 0)
286  {
287  info_vector_sum_col = TensorInfo(compute_reductionA_shape(*b), 1, DataType::S32);
288 
289  // Configure Matrix B reduction kernel
290  ARM_COMPUTE_RETURN_ON_ERROR(CLGEMMLowpMatrixBReductionKernel::validate(b, &info_vector_sum_col));
291  }
292 
293  // Validate Matrix A reduction kernel only if _b_offset is not equal to 0
294  if(b_offset != 0)
295  {
296  info_vector_sum_row = TensorInfo(compute_reductionB_shape(*a), 1, DataType::S32);
297 
298  // Configure matrix A reduction kernel
299  ARM_COMPUTE_RETURN_ON_ERROR(CLGEMMLowpMatrixAReductionKernel::validate(a, &info_vector_sum_row));
300  }
301 
302  if(gemm_info.gemmlowp_output_stage().type != GEMMLowpOutputStageType::NONE)
303  {
304  TensorInfo mm_result_s32_info{};
305 
306  if(reshape_matrix_b)
307  {
308  // Output tensor auto inizialitation if not yet initialized
309  auto_init_if_empty(mm_result_s32_info, a->clone()->set_tensor_shape(compute_mm_shape(*matrix_a_info, *matrix_b_info, reshape_info)).set_data_type(DataType::S32));
310 
311  // Validate matrix multiply
312  ARM_COMPUTE_RETURN_ON_ERROR(CLGEMMLowpMatrixMultiplyReshapedOnlyRHSKernel::validate(matrix_a_info, matrix_b_info, &mm_result_s32_info, lhs_info, rhs_info, reshape_info));
313  }
314  else
315  {
316  // Output tensor auto inizialitation if not yet initialized
317  auto_init_if_empty(mm_result_s32_info, a->clone()->set_tensor_shape(compute_mm_shape(*matrix_a_info, *matrix_b_info, false, reshape_info)).set_data_type(DataType::S32));
318 
319  if(is_midgard)
320  {
321  // Validate matrix multiply
322  ARM_COMPUTE_RETURN_ON_ERROR(CLGEMMLowpMatrixMultiplyKernel::validate(matrix_a_info, matrix_b_info, &mm_result_s32_info, reshape_info));
323  }
324  else
325  {
326  // Pick up the GEMM configuration
327  std::tie(lhs_info, rhs_info) = CLGEMMNativeKernelConfigurationFactory::create(gpu_target)->configure(m, n, k, batch_size, DataType::QASYMM8);
328 
329  // Validate matrix multiply
330  ARM_COMPUTE_RETURN_ON_ERROR(CLGEMMLowpMatrixMultiplyNativeKernel::validate(matrix_a_info, matrix_b_info, &mm_result_s32_info, lhs_info, rhs_info, reshape_info));
331  }
332  }
333 
334  // Validate offset contribution kernel
335  ARM_COMPUTE_RETURN_ON_ERROR(CLGEMMLowpOffsetContributionOutputStageKernel::validate(&mm_result_s32_info,
336  a_offset == 0 ? nullptr : &info_vector_sum_col,
337  b_offset == 0 ? nullptr : &info_vector_sum_row,
338  c,
339  output,
340  a_offset, b_offset,
341  gemm_info.gemmlowp_output_stage()));
342  }
343  else
344  {
345  if(reshape_matrix_b)
346  {
347  // Validate matrix multiply
348  ARM_COMPUTE_RETURN_ON_ERROR(CLGEMMLowpMatrixMultiplyReshapedOnlyRHSKernel::validate(matrix_a_info, matrix_b_info, output, lhs_info, rhs_info, reshape_info));
349  }
350  else
351  {
352  if(is_midgard)
353  {
354  // Validate matrix multiply
355  ARM_COMPUTE_RETURN_ON_ERROR(CLGEMMLowpMatrixMultiplyKernel::validate(matrix_a_info, matrix_b_info, output, reshape_info));
356  }
357  else
358  {
359  // Pick up the GEMM configuration
360  std::tie(lhs_info, rhs_info) = CLGEMMNativeKernelConfigurationFactory::create(gpu_target)->configure(m, n, k, batch_size, DataType::QASYMM8);
361 
362  // Validate matrix multiply
363  ARM_COMPUTE_RETURN_ON_ERROR(CLGEMMLowpMatrixMultiplyNativeKernel::validate(matrix_a_info, matrix_b_info, output, lhs_info, rhs_info, reshape_info));
364  }
365  }
366 
367  if(output->total_size() != 0)
368  {
369  // Validate offset contribution kernel
370  ARM_COMPUTE_RETURN_ON_ERROR(CLGEMMLowpOffsetContributionKernel::validate(output,
371  a_offset == 0 ? nullptr : &info_vector_sum_col,
372  b_offset == 0 ? nullptr : &info_vector_sum_row,
373  c,
374  a_offset, b_offset));
375  }
376  }
377 
378  return Status{};
379 }

References ARM_COMPUTE_RETURN_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN, ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES, ARM_COMPUTE_RETURN_ERROR_ON_MSG, ARM_COMPUTE_RETURN_ON_ERROR, arm_compute::auto_init_if_empty(), arm_compute::test::validation::b, ICloneable< T >::clone(), arm_compute::misc::shape_calculator::compute_mm_shape(), arm_compute::misc::shape_calculator::compute_reductionA_shape(), arm_compute::misc::shape_calculator::compute_reductionB_shape(), arm_compute::misc::shape_calculator::compute_rhs_reshaped_shape(), CLGEMMNativeKernelConfigurationFactory::create(), CLGEMMReshapedOnlyRHSKernelConfigurationFactory::create(), GEMMInfo::depth_output_gemm3d(), ITensorInfo::dimension(), GEMMInfo::gemmlowp_output_stage(), CLScheduler::get(), GEMMInfo::is_a_reshaped(), GEMMInfo::is_b_reshaped(), arm_compute::MIDGARD, arm_compute::NONE, UniformQuantizationInfo::offset, arm_compute::QASYMM8, ITensorInfo::quantization_info(), GEMMInfo::reinterpret_input_as_3d(), GEMMInfo::reshape_b_only_on_first_run(), arm_compute::S32, CLScheduler::target(), ITensorInfo::total_size(), GEMMLowpOutputStageInfo::type, QuantizationInfo::uniform(), CLGEMMReshapeRHSMatrixKernel::validate(), CLGEMMLowpMatrixMultiplyKernel::validate(), CLGEMMLowpMatrixMultiplyNativeKernel::validate(), CLGEMMLowpMatrixMultiplyReshapedOnlyRHSKernel::validate(), CLGEMMLowpMatrixAReductionKernel::validate(), CLGEMMLowpOffsetContributionOutputStageKernel::validate(), CLGEMMLowpOffsetContributionKernel::validate(), and CLGEMMLowpMatrixBReductionKernel::validate().

Referenced by CLGEMMLowpMatrixMultiplyCore::configure(), CLGEMMDeconvolutionLayer::validate(), and CLLSTMLayerQuantized::validate().


The documentation for this class was generated from the following files:

CLGEMMLowpMatrixMultiplyCore.h
CLGEMMLowpMatrixMultiplyCore.cpp