Compute Library 19.11
CLGEMMLowpMatrixMultiplyCore Class Reference

Basic function to execute GEMMLowpMatrixMultiplyCore on OpenCL. More...

#include <CLGEMMLowpMatrixMultiplyCore.h>

Collaboration diagram for CLGEMMLowpMatrixMultiplyCore:

Public Member Functions

 CLGEMMLowpMatrixMultiplyCore (std::shared_ptr< IMemoryManager > memory_manager=nullptr)
 Constructor. More...
 
 CLGEMMLowpMatrixMultiplyCore (const CLGEMMLowpMatrixMultiplyCore &)=delete
 Prevent instances of this class from being copied (As this class contains pointers) More...
 
 CLGEMMLowpMatrixMultiplyCore (CLGEMMLowpMatrixMultiplyCore &&)=default
 Default move constructor. More...
 
CLGEMMLowpMatrixMultiplyCore & operator= (const CLGEMMLowpMatrixMultiplyCore &)=delete
 Prevent instances of this class from being copied (As this class contains pointers) More...
 
CLGEMMLowpMatrixMultiplyCore & operator= (CLGEMMLowpMatrixMultiplyCore &&)=default
 Default move assignment operator. More...
 
void configure (const ICLTensor *a, const ICLTensor *b, const ICLTensor *c, ICLTensor *output, const GEMMInfo &gemm_info=GEMMInfo())
 Initialise the kernel's inputs, output. More...
 
void run () override
 Run the kernels contained in the function. More...
 
void prepare () override
 Prepare the function for executing. More...
 
- Public Member Functions inherited from IFunction
virtual ~IFunction ()=default
 Destructor. More...
 

Static Public Member Functions

static Status validate (const ITensorInfo *a, const ITensorInfo *b, const ITensorInfo *c, const ITensorInfo *output, const GEMMInfo &gemm_info=GEMMInfo())
 Static function to check if given info will lead to a valid configuration of CLGEMMLowpMatrixMultiplyCore. More...
 

Detailed Description

Basic function to execute GEMMLowpMatrixMultiplyCore on OpenCL.

This function calls the following OpenCL kernels:

  1. CLGEMMReshapeRHSMatrixKernel (if the output tensor is a matrix)
  2. CLGEMMLowpMatrixMultiplyKernel (if the parameter "reshape_b_only_on_first_run" of GEMMInfo is FALSE)
  3. CLGEMMLowpMatrixMultiplyReshapedOnlyRHSKernel (if the parameter "reshape_b_only_on_first_run" of GEMMInfo is TRUE)
  4. CLGEMMLowpMatrixAReductionKernel (if the offset of matrix B is not 0)
  5. CLGEMMLowpMatrixBReductionKernel (if the offset of matrix A is not 0)
  6. CLGEMMLowpOffsetContributionKernel (if gemm_info.gemmlowp_output_stage == NONE)
  7. CLGEMMLowpOffsetContributionOutputStageKernel (if gemm_info.gemmlowp_output_stage != NONE)
  8. CLDepthConvertLayerKernel

Definition at line 56 of file CLGEMMLowpMatrixMultiplyCore.h.
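To illustrate the typical call sequence, here is a minimal, hypothetical usage sketch. The shapes, quantization parameters and the S32 output (no fused output stage) are illustrative, not taken from this page; it assumes the CL context is set up with CLScheduler::default_init().

#include "arm_compute/runtime/CL/CLScheduler.h"
#include "arm_compute/runtime/CL/CLTensor.h"
#include "arm_compute/runtime/CL/functions/CLGEMMLowpMatrixMultiplyCore.h"

using namespace arm_compute;

int main()
{
    CLScheduler::get().default_init(); // create context, queue and kernel library

    // Illustrative sizes: A is MxK, B is KxN, output is MxN (TensorShape is (width, height)).
    const unsigned int M = 4, N = 8, K = 16;

    CLTensor a, b, dst;
    a.allocator()->init(TensorInfo(TensorShape(K, M), 1, DataType::QASYMM8, QuantizationInfo(0.5f, 10)));
    b.allocator()->init(TensorInfo(TensorShape(N, K), 1, DataType::QASYMM8, QuantizationInfo(0.25f, 3)));
    dst.allocator()->init(TensorInfo(TensorShape(N, M), 1, DataType::S32)); // S32 when no output stage is fused

    CLGEMMLowpMatrixMultiplyCore gemmlowp;
    gemmlowp.configure(&a, &b, nullptr, &dst, GEMMInfo());

    a.allocator()->allocate();
    b.allocator()->allocate();
    dst.allocator()->allocate();

    // ... fill a and b (e.g. by mapping them and writing QASYMM8 values) ...

    gemmlowp.run();            // enqueues the kernels listed above
    CLScheduler::get().sync(); // run() does not block; wait explicitly before reading dst
    return 0;
}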

Constructor & Destructor Documentation

◆ CLGEMMLowpMatrixMultiplyCore() [1/3]

CLGEMMLowpMatrixMultiplyCore ( std::shared_ptr< IMemoryManager > memory_manager = nullptr )

Constructor.

Definition at line 51 of file CLGEMMLowpMatrixMultiplyCore.cpp.

52  : _memory_group(std::move(memory_manager)),
53  _weights_to_qasymm8(),
54  _mm_midgard_kernel(),
55  _mm_native_kernel(),
56  _mm_reshaped_only_rhs_kernel(),
57  _mtx_b_reshape_kernel(),
58  _mtx_a_reduction_kernel(),
59  _mtx_b_reduction_kernel(),
60  _offset_contribution_kernel(),
61  _offset_contribution_output_stage_kernel(),
62  _qasymm8_weights(),
63  _vector_sum_col(),
64  _vector_sum_row(),
65  _tmp_b(),
66  _mm_result_s32(),
67  _gemm_output_stage_multipliers(),
68  _gemm_output_stage_shifts(),
69  _matrix_a(nullptr),
70  _original_b(nullptr),
71  _output(nullptr),
72  _a_offset(0),
73  _b_offset(0),
74  _is_gemm_reshaped(true),
75  _is_midgard(false),
76  _reshape_b_only_on_first_run(false),
77  _is_prepared(false),
78  _fuse_output_stage(false),
79  _convert_to_qasymm8(false)
80 {
81 }

◆ CLGEMMLowpMatrixMultiplyCore() [2/3]

Prevent instances of this class from being copied (As this class contains pointers)

◆ CLGEMMLowpMatrixMultiplyCore() [3/3]

Default move constructor.

Member Function Documentation

◆ configure()

void configure ( const ICLTensor * a,
const ICLTensor * b,
const ICLTensor * c,
ICLTensor * output,
const GEMMInfo & gemm_info = GEMMInfo() 
)

Initialise the kernel's inputs, output.

Note
GEMMLowp: low precision GEMM kernel. [A * B + C] This kernel performs the following computations:
  1. Convert a values from QASYMM8 to int32 and add a_offset to each of them.
  2. Convert b values from QASYMM8 to int32 and add b_offset to each of them.
  3. Compute the matrix product of the resulting a * b in int32.
  4. Quantize to uint8 if gemm_info.gemmlowp_output_stage != NONE
Parameters
[in]   a          First input tensor (Matrix A). Data type supported: QASYMM8.
[in]   b          Second input tensor (Matrix B). Data type supported: same as a
[in]   c          Third input tensor (Matrix C). It can be a nullptr. Data type supported: S32
[out]  output     Output tensor. Data type supported: S32 or QASYMM8 if gemm_info.gemmlowp_output_stage != NONE
[in]   gemm_info  (Optional) Specifies if the matrix A and/or matrix B have been reshaped and if the reshape of matrix B should be executed only for the first run
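
The arithmetic described in the note above can be written as a scalar reference. This is an illustrative, row-major sketch of what the function computes before any output stage; the helper name, the layouts and the treatment of c as a full M x N matrix are assumptions for illustration, not library code.

// Scalar reference of the GEMMLowp core computation (illustrative).
// a is M x K (QASYMM8), b is K x N (QASYMM8), c (optional) and dst are M x N (int32).
void gemmlowp_reference(const uint8_t *a, const uint8_t *b, const int32_t *c, int32_t *dst,
                        int M, int N, int K, int32_t a_offset, int32_t b_offset)
{
    for(int i = 0; i < M; ++i)
    {
        for(int j = 0; j < N; ++j)
        {
            int32_t acc = 0;
            for(int k = 0; k < K; ++k)
            {
                // Steps 1-3 of the note: convert each value to int32, add its offset, accumulate the product
                acc += (static_cast<int32_t>(a[i * K + k]) + a_offset) *
                       (static_cast<int32_t>(b[k * N + j]) + b_offset);
            }
            dst[i * N + j] = acc + (c != nullptr ? c[i * N + j] : 0);
        }
    }
}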

Definition at line 83 of file CLGEMMLowpMatrixMultiplyCore.cpp.

84 {
85  ARM_COMPUTE_ERROR_ON_NULLPTR(a, b, output);
86  ARM_COMPUTE_ERROR_THROW_ON(CLGEMMLowpMatrixMultiplyCore::validate(a->info(), b->info(), c != nullptr ? c->info() : nullptr, output->info(), gemm_info));
87 
88  _is_prepared = false;
89  _original_b = b;
90  _reshape_b_only_on_first_run = gemm_info.reshape_b_only_on_first_run();
91  _a_offset = a->info()->quantization_info().uniform().offset;
92  _matrix_a = a;
93  _output = output;
94 
95  _convert_to_qasymm8 = is_data_type_quantized_per_channel(b->info()->data_type()) && is_data_type_quantized_symmetric(b->info()->data_type())
96  && is_data_type_quantized_asymmetric(a->info()->data_type());
97  _b_offset = _convert_to_qasymm8 ? -128 : b->info()->quantization_info().uniform().offset;
98 
99  // Get the GPU target
100  const GPUTarget gpu_target = CLScheduler::get().target();
101 
102  // Set the target for the kernels
103  _mm_midgard_kernel.set_target(gpu_target);
104  _mm_native_kernel.set_target(gpu_target);
105  _mm_reshaped_only_rhs_kernel.set_target(gpu_target);
106 
107  GEMMRHSMatrixInfo rhs_info;
108  GEMMLHSMatrixInfo lhs_info;
109 
110  // Arguments used by GEMMReshapeInfo
111  // If we pass the matrix A and matrix B reshaped to CLGEMMMatrixMultiplyKernel, we need to pass m, n, k, mult_transpose1xW_width and mult_interleave4x4_height to CLGEMMReshapeInfo
112  // in order to know how the matrices have been reshaped
113  bool reinterpret_input_as_3d = gemm_info.reinterpret_input_as_3d();
114  const unsigned int m = reinterpret_input_as_3d ? (a->info()->dimension(1) * a->info()->dimension(2)) : a->info()->dimension(1);
115  const unsigned int n = b->info()->dimension(0);
116  const unsigned int k = a->info()->dimension(0);
117  const unsigned int batch_size = reinterpret_input_as_3d ? a->info()->dimension(3) : a->info()->dimension(2);
118  const int depth_output_gemm3d = gemm_info.depth_output_gemm3d();
119 
120  // Check if we need to reshape the matrix A and matrix B
121  _is_gemm_reshaped = is_gemm_reshaped(_reshape_b_only_on_first_run, gpu_target);
122  _is_midgard = gpu_target == GPUTarget::MIDGARD;
123 
124  if(_convert_to_qasymm8)
125  {
126  // Set data type for converted weights
127  TensorInfo weights_info(*b->info());
128  weights_info.set_data_type(DataType::QASYMM8);
129  _qasymm8_weights.allocator()->init(weights_info);
130  _weights_to_qasymm8.configure(b, &_qasymm8_weights, ConvertPolicy::WRAP, 0);
131  }
132 
133  const ICLTensor *matrix_b = _convert_to_qasymm8 ? &_qasymm8_weights : b;
134  if(_is_gemm_reshaped)
135  {
136  matrix_b = &_tmp_b;
137 
138  if(!_reshape_b_only_on_first_run)
139  {
140  _memory_group.manage(&_tmp_b);
141  }
142 
143  // Pick up the GEMM configuration
144  std::tie(lhs_info, rhs_info) = CLGEMMReshapedOnlyRHSKernelConfigurationFactory::create(gpu_target)->configure(m, n, k, batch_size, DataType::QASYMM8);
145 
146  // Configure reshape RHS kernel
147  _mtx_b_reshape_kernel.configure(_convert_to_qasymm8 ? &_qasymm8_weights : b, &_tmp_b, rhs_info);
148  }
149 
150  // Initialize matrix B reduction kernel only if _a_offset is not equal to 0
151  if(_a_offset != 0)
152  {
153  TensorInfo info_vector_sum_col(compute_reductionA_shape(*b->info()), 1, DataType::S32);
154  _vector_sum_col.allocator()->init(info_vector_sum_col);
155  if(!_reshape_b_only_on_first_run)
156  {
157  _memory_group.manage(&_vector_sum_col);
158  }
159 
160  // Configure Matrix B reduction kernel
161  _mtx_b_reduction_kernel.configure(_convert_to_qasymm8 ? &_qasymm8_weights : b, &_vector_sum_col);
162  }
163 
164  // Initialize Matrix A reduction kernel only if _b_offset is not equal to 0
165  if(_b_offset != 0)
166  {
167  TensorInfo info_vector_sum_row(compute_reductionB_shape(*a->info()), 1, DataType::S32);
168  _vector_sum_row.allocator()->init(info_vector_sum_row);
169  _memory_group.manage(&_vector_sum_row);
170 
171  // Configure matrix A reduction kernel
172  _mtx_a_reduction_kernel.configure(a, &_vector_sum_row);
173  }
174 
175  // If GEMMLowpOutputStage != NONE, fuse the offset contribution with the output stage
176  if(gemm_info.gemmlowp_output_stage().type != GEMMLowpOutputStageType::NONE)
177  {
178  _fuse_output_stage = true;
179 
180  _memory_group.manage(&_mm_result_s32);
181 
182  if(_is_gemm_reshaped)
183  {
184  // Configure and tune matrix multiply kernel
185  _mm_reshaped_only_rhs_kernel.configure(_matrix_a, matrix_b, &_mm_result_s32, lhs_info, rhs_info, GEMMReshapeInfo(m, n, k, 1, 1, depth_output_gemm3d, reinterpret_input_as_3d));
186  }
187  else
188  {
189  if(_is_midgard)
190  {
191  // Configure matrix multiply kernel
192  _mm_midgard_kernel.configure(_matrix_a, matrix_b, &_mm_result_s32, GEMMReshapeInfo(m, n, k, 1, 1, depth_output_gemm3d, reinterpret_input_as_3d));
193  }
194  else
195  {
196  // Pick up the GEMM configuration
197  std::tie(lhs_info, rhs_info) = CLGEMMNativeKernelConfigurationFactory::create(gpu_target)->configure(m, n, k, batch_size, DataType::QASYMM8);
198 
199  // Configure matrix multiply kernel
200  _mm_native_kernel.configure(_matrix_a, matrix_b, &_mm_result_s32, lhs_info, rhs_info, GEMMReshapeInfo(m, n, k, 1, 1, depth_output_gemm3d, reinterpret_input_as_3d));
201  }
202  }
203  // Configure offset contribution kernel
204  const size_t num_filters = (gemm_info.gemmlowp_output_stage().is_quantized_per_channel) ? gemm_info.gemmlowp_output_stage().gemmlowp_multipliers.size() : 1;
205 
206  _gemm_output_stage_multipliers.allocator()->init(TensorInfo(TensorShape(num_filters), 1, DataType::S32));
207  _gemm_output_stage_shifts.allocator()->init(TensorInfo(TensorShape(num_filters), 1, DataType::S32));
208 
209  _offset_contribution_output_stage_kernel.configure(&_mm_result_s32, _a_offset == 0 ? nullptr : &_vector_sum_col, _b_offset == 0 ? nullptr : &_vector_sum_row, c, output, a->info()->dimension(0),
210  _a_offset, _b_offset, gemm_info.gemmlowp_output_stage(), &_gemm_output_stage_multipliers, &_gemm_output_stage_shifts);
211 
212  _gemm_output_stage_multipliers.allocator()->allocate();
213  _gemm_output_stage_shifts.allocator()->allocate();
214  // Compute GEMM output multipliers and shifts for output stage
215  _gemm_output_stage_multipliers.map();
216  _gemm_output_stage_shifts.map();
217  std::memcpy(_gemm_output_stage_multipliers.ptr_to_element(Coordinates(0)), gemm_info.gemmlowp_output_stage().gemmlowp_multipliers.data(), num_filters * sizeof(int32_t));
218  std::memcpy(_gemm_output_stage_shifts.ptr_to_element(Coordinates(0)), gemm_info.gemmlowp_output_stage().gemmlowp_shifts.data(), num_filters * sizeof(int32_t));
219  _gemm_output_stage_multipliers.unmap();
220  _gemm_output_stage_shifts.unmap();
221 
222  _mm_result_s32.allocator()->allocate();
223  }
224  else
225  {
226  if(_is_gemm_reshaped)
227  {
228  // Configure and tune matrix multiply kernel
229  _mm_reshaped_only_rhs_kernel.configure(_matrix_a, matrix_b, output, lhs_info, rhs_info, GEMMReshapeInfo(m, n, k, 1, 1, depth_output_gemm3d, reinterpret_input_as_3d));
230  }
231  else
232  {
233  if(_is_midgard)
234  {
235  // Configure matrix multiply kernel
236  _mm_midgard_kernel.configure(_matrix_a, matrix_b, output, GEMMReshapeInfo(m, n, k, 1, 1, depth_output_gemm3d, reinterpret_input_as_3d));
237  }
238  else
239  {
240  // Pick up the GEMM configuration
241  std::tie(lhs_info, rhs_info) = CLGEMMNativeKernelConfigurationFactory::create(gpu_target)->configure(m, n, k, batch_size, DataType::QASYMM8);
242 
243  // Configure matrix multiply kernel
244  _mm_native_kernel.configure(_matrix_a, matrix_b, output, lhs_info, rhs_info, GEMMReshapeInfo(m, n, k, 1, 1, depth_output_gemm3d, reinterpret_input_as_3d));
245  }
246  }
247 
248  // Configure offset contribution kernel
249  _offset_contribution_kernel.configure(output, _a_offset == 0 ? nullptr : &_vector_sum_col, _b_offset == 0 ? nullptr : &_vector_sum_row, c, a->info()->dimension(0), _a_offset, _b_offset);
250  }
251 
252  // Allocate tensors
253  if(_is_gemm_reshaped)
254  {
255  if(!_reshape_b_only_on_first_run)
256  {
257  _tmp_b.allocator()->allocate();
258  }
259  }
260 
261  if(_a_offset != 0 && !_reshape_b_only_on_first_run)
262  {
263  _vector_sum_col.allocator()->allocate();
264  }
265 
266  if(_b_offset != 0)
267  {
268  _vector_sum_row.allocator()->allocate();
269  }
270 }

References CLTensorAllocator::allocate(), CLTensor::allocator(), ARM_COMPUTE_ERROR_ON_NULLPTR, ARM_COMPUTE_ERROR_THROW_ON, arm_compute::test::validation::b, arm_compute::misc::shape_calculator::compute_reductionA_shape(), arm_compute::misc::shape_calculator::compute_reductionB_shape(), CLDepthConvertLayerKernel::configure(), CLGEMMLowpMatrixMultiplyNativeKernel::configure(), CLGEMMReshapeRHSMatrixKernel::configure(), CLGEMMLowpMatrixMultiplyReshapedOnlyRHSKernel::configure(), CLGEMMLowpMatrixMultiplyKernel::configure(), CLGEMMLowpOffsetContributionOutputStageKernel::configure(), CLGEMMLowpOffsetContributionKernel::configure(), CLGEMMLowpMatrixAReductionKernel::configure(), CLGEMMLowpMatrixBReductionKernel::configure(), CLGEMMNativeKernelConfigurationFactory::create(), CLGEMMReshapedOnlyRHSKernelConfigurationFactory::create(), ITensorInfo::data_type(), GEMMInfo::depth_output_gemm3d(), ITensorInfo::dimension(), GEMMLowpOutputStageInfo::gemmlowp_multipliers, GEMMInfo::gemmlowp_output_stage(), GEMMLowpOutputStageInfo::gemmlowp_shifts, CLScheduler::get(), ITensor::info(), ITensorAllocator::init(), arm_compute::is_data_type_quantized_asymmetric(), arm_compute::is_data_type_quantized_per_channel(), arm_compute::is_data_type_quantized_symmetric(), GEMMLowpOutputStageInfo::is_quantized_per_channel, MemoryGroup::manage(), CLTensor::map(), arm_compute::MIDGARD, arm_compute::NONE, UniformQuantizationInfo::offset, ITensor::ptr_to_element(), arm_compute::QASYMM8, ITensorInfo::quantization_info(), GEMMInfo::reinterpret_input_as_3d(), GEMMInfo::reshape_b_only_on_first_run(), arm_compute::S32, ICLKernel::set_target(), CLScheduler::target(), GEMMLowpOutputStageInfo::type, QuantizationInfo::uniform(), CLTensor::unmap(), CLGEMMLowpMatrixMultiplyCore::validate(), arm_compute::test::validation::weights_info, and arm_compute::WRAP.

Referenced by CLLSTMLayerQuantized::configure(), CLGEMMDeconvolutionLayer::configure(), and arm_compute::test::validation::DATA_TEST_CASE().

◆ operator=() [1/2]

Prevent instances of this class from being copied (As this class contains pointers)

◆ operator=() [2/2]

Default move assignment operator.

◆ prepare()

void prepare ( )
override virtual

Prepare the function for executing.

Any one-off pre-processing step required by the function is handled here.

Note
Prepare stage might not need all the function's buffers' backing memory to be available in order to execute

Reimplemented from IFunction.
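
As a hedged example, prepare() can be called explicitly after configure() so the one-off work (weight conversion, RHS reshape, matrix B reduction) is not paid inside the first run(). The snippet continues the illustrative sketch from the detailed description and assumes the tensors are already allocated and the weights uploaded; num_runs is an illustrative variable.

gemmlowp.configure(&a, &b, nullptr, &dst, GEMMInfo()); // default GEMMInfo reshapes B only on the first run
// ... allocate tensors and upload the weights in b ...
gemmlowp.prepare();   // front-load the one-off B reshape/reduction kernels
for(int i = 0; i < num_runs; ++i)
{
    gemmlowp.run();   // subsequent runs reuse the prepared data
}
CLScheduler::get().sync();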

Definition at line 495 of file CLGEMMLowpMatrixMultiplyCore.cpp.

496 {
497  if(!_is_prepared)
498  {
499  if(_convert_to_qasymm8)
500  {
501  _qasymm8_weights.allocator()->allocate();
502  CLScheduler::get().enqueue(_weights_to_qasymm8, false);
503  }
504 
505  if(_is_gemm_reshaped && _reshape_b_only_on_first_run)
506  {
507  ARM_COMPUTE_ERROR_ON(!_original_b->is_used());
508 
509  // Run reshape kernel and mark original weights tensor as unused
510  _tmp_b.allocator()->allocate();
511  CLScheduler::get().enqueue(_mtx_b_reshape_kernel, false);
512  _original_b->mark_as_unused();
513  }
514 
515  // Run matrix B reduction kernel only if _a_offset is not equal to 0
516  if(_a_offset != 0 && _reshape_b_only_on_first_run)
517  {
518  _vector_sum_col.allocator()->allocate();
519  CLScheduler::get().enqueue(_mtx_b_reduction_kernel, false);
520  }
521 
522  CLScheduler::get().queue().finish();
523  _is_prepared = true;
524  }
525 }

References CLTensorAllocator::allocate(), CLTensor::allocator(), ARM_COMPUTE_ERROR_ON, CLScheduler::enqueue(), CLScheduler::get(), ITensor::is_used(), ITensor::mark_as_unused(), and CLScheduler::queue().

Referenced by CLGEMMDeconvolutionLayer::prepare(), CLGEMMConvolutionLayer::prepare(), and CLGEMMLowpMatrixMultiplyCore::run().

◆ run()

void run ( )
override virtual

Run the kernels contained in the function.

For NEON kernels:

  • Multi-threading is used for the kernels which are parallelisable.
  • By default std::thread::hardware_concurrency() threads are used.
Note
CPPScheduler::set_num_threads() can be used to manually set the number of threads

For OpenCL kernels:

  • All the kernels are enqueued on the queue associated with CLScheduler.
  • The queue is then flushed.
Note
The function will not block until the kernels are executed. It is the user's responsibility to wait.
Will call prepare() on the first run if it has not already been done.
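
Since the call does not block, a host-side wait is needed before reading the results; a minimal sketch, continuing the earlier illustrative example:

gemmlowp.run();             // enqueue and flush the kernels
CLScheduler::get().sync();  // block until the queue has finished
// dst can now be mapped and read back safely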

Implements IFunction.

Definition at line 439 of file CLGEMMLowpMatrixMultiplyCore.cpp.

440 {
441  prepare();
442 
443  MemoryGroupResourceScope scope_mg(_memory_group);
444 
445  if(_is_gemm_reshaped)
446  {
447  if(!_reshape_b_only_on_first_run)
448  {
449  // Run reshape matrix B
450  CLScheduler::get().enqueue(_mtx_b_reshape_kernel, false);
451  }
452  }
453 
454  // Run matrix B reduction kernel only if _a_offset is not equal to 0
455  if(_a_offset != 0 && !_reshape_b_only_on_first_run)
456  {
457  CLScheduler::get().enqueue(_mtx_b_reduction_kernel, false);
458  }
459 
460  // Run matrix multiply
461  if(_is_gemm_reshaped)
462  {
463  CLScheduler::get().enqueue(_mm_reshaped_only_rhs_kernel, false);
464  }
465  else
466  {
467  if(_is_midgard)
468  {
469  CLScheduler::get().enqueue(_mm_midgard_kernel, false);
470  }
471  else
472  {
473  CLScheduler::get().enqueue(_mm_native_kernel, false);
474  }
475  }
476 
477  // Run matrix A reduction kernel only if _b_offset is not equal to 0
478  if(_b_offset != 0)
479  {
480  CLScheduler::get().enqueue(_mtx_a_reduction_kernel, false);
481  }
482 
483  if(_fuse_output_stage)
484  {
485  // Run offset contribution/output stage kernel
486  CLScheduler::get().enqueue(_offset_contribution_output_stage_kernel, true);
487  }
488  else
489  {
490  // Run offset contribution kernel
491  CLScheduler::get().enqueue(_offset_contribution_kernel, true);
492  }
493 }

References CLScheduler::enqueue(), CLScheduler::get(), and CLGEMMLowpMatrixMultiplyCore::prepare().

Referenced by CLGEMMDeconvolutionLayer::run(), CLLSTMLayerQuantized::run(), CLFullyConnectedLayer::run(), and CLGEMMConvolutionLayer::run().

◆ validate()

Status validate ( const ITensorInfo * a,
const ITensorInfo * b,
const ITensorInfo * c,
const ITensorInfo * output,
const GEMMInfo & gemm_info = GEMMInfo() 
)
static

Static function to check if given info will lead to a valid configuration of CLGEMMLowpMatrixMultiplyCore.

Parameters
[in]  a          First input tensor info (Matrix A). Data type supported: QASYMM8.
[in]  b          Second input tensor info (Matrix B). Data type supported: same as a
[in]  c          Third input tensor info (Matrix C). It can be a nullptr. Data type supported: S32
[in]  output     Output tensor info. Data type supported: S32 or QASYMM8 if gemm_info.gemmlowp_output_stage != NONE
[in]  gemm_info  (Optional) Specifies if the matrix A and/or matrix B have been reshaped and if the reshape of matrix B should be executed only for the first run
Returns
a status
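
A hypothetical way to use this check before configuring the function (tensor names continue the earlier illustrative sketch):

const Status status = CLGEMMLowpMatrixMultiplyCore::validate(a.info(), b.info(), nullptr, dst.info(), GEMMInfo());
if(status.error_code() != ErrorCode::OK)
{
    // The configuration is not supported; report the reason instead of calling configure()
    std::cerr << "Unsupported GEMMLowp configuration: " << status.error_description() << std::endl;
}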

Definition at line 272 of file CLGEMMLowpMatrixMultiplyCore.cpp.

273 {
275  if(b->data_type() == DataType::QSYMM8_PER_CHANNEL)
276  {
277  //DataType::QSYMM8_PER_CHANNEL supported only for weights
278  ARM_COMPUTE_RETURN_ERROR_ON_MSG(a->data_type() != DataType::QASYMM8, "Matrix A is not quantized while Matrix B is");
279  }
280  else
281  {
283  }
284  ARM_COMPUTE_RETURN_ERROR_ON_MSG(gemm_info.is_a_reshaped(), "Matrix A already reshaped is not supported");
285  ARM_COMPUTE_RETURN_ERROR_ON_MSG(gemm_info.is_b_reshaped(), "Matrix B already reshaped is not supported");
286 
287  int32_t a_offset = a->quantization_info().uniform().offset;
288  int32_t b_offset = b->quantization_info().uniform().offset;
289 
290  const ITensorInfo *matrix_a_info = a;
291 
292  TensorInfo tmp_b_info{};
293  GEMMRHSMatrixInfo rhs_info;
294  GEMMLHSMatrixInfo lhs_info;
295 
296  // Get the GPU target
297  const GPUTarget gpu_target = CLScheduler::get().target();
298 
299  bool reinterpret_input_as_3d = gemm_info.reinterpret_input_as_3d();
300  const unsigned int m = reinterpret_input_as_3d ? (a->dimension(1) * a->dimension(2)) : a->dimension(1);
301  const unsigned int n = b->dimension(0);
302  const unsigned int k = a->dimension(0);
303  const unsigned int batch_size = reinterpret_input_as_3d ? a->dimension(3) : a->dimension(2);
304  const int depth_output_gemm3d = gemm_info.depth_output_gemm3d();
305  const bool is_midgard = gpu_target == GPUTarget::MIDGARD;
306 
307  bool reshape_matrix_b = is_gemm_reshaped(gemm_info.reshape_b_only_on_first_run(), CLScheduler::get().target());
308 
309  const GEMMReshapeInfo reshape_info = GEMMReshapeInfo(m, n, k, 1, 1, depth_output_gemm3d, reinterpret_input_as_3d);
310 
311  bool convert_to_qasymm8 = is_data_type_quantized_per_channel(b->data_type()) && is_data_type_quantized_symmetric(b->data_type())
312  && is_data_type_quantized_asymmetric(a->data_type());
313  TensorInfo weights_info(*b);
314  if(convert_to_qasymm8)
315  {
316  b_offset = -128;
317  weights_info.set_data_type(DataType::QASYMM8);
319  }
320  const ITensorInfo *matrix_b_info = &weights_info;
321  if(reshape_matrix_b)
322  {
323  matrix_b_info = &tmp_b_info;
324 
325  // Pick up the GEMM configuration
326  std::tie(lhs_info, rhs_info) = CLGEMMReshapedOnlyRHSKernelConfigurationFactory::create(gpu_target)->configure(m, n, k, batch_size, DataType::QASYMM8);
327 
328  // Validate reshape RHS kernel
329  auto_init_if_empty(tmp_b_info, weights_info.clone()->set_tensor_shape(compute_rhs_reshaped_shape(weights_info, rhs_info)));
331  }
332 
333  TensorInfo info_vector_sum_col{};
334  TensorInfo info_vector_sum_row{};
335 
336  // Validate matrix B reduction kernel only if _a_offset is not equal to 0
337  if(a_offset != 0)
338  {
339  info_vector_sum_col = TensorInfo(compute_reductionA_shape(weights_info), 1, DataType::S32);
340 
341  // Configure Matrix B reduction kernel
343  }
344 
345  // Validate Matrix A reduction kernel only if _b_offset is not equal to 0
346  if(b_offset != 0)
347  {
348  info_vector_sum_row = TensorInfo(compute_reductionB_shape(*a), 1, DataType::S32);
349 
350  // Configure matrix A reduction kernel
352  }
353 
354  if(gemm_info.gemmlowp_output_stage().type != GEMMLowpOutputStageType::NONE)
355  {
356  TensorInfo mm_result_s32_info{};
357 
358  if(reshape_matrix_b)
359  {
360  // Output tensor auto inizialitation if not yet initialized
361  auto_init_if_empty(mm_result_s32_info, a->clone()->set_tensor_shape(compute_mm_shape(*matrix_a_info, *matrix_b_info, reshape_info)).set_data_type(DataType::S32));
362 
363  // Validate matrix multiply
364  ARM_COMPUTE_RETURN_ON_ERROR(CLGEMMLowpMatrixMultiplyReshapedOnlyRHSKernel::validate(matrix_a_info, matrix_b_info, &mm_result_s32_info, lhs_info, rhs_info, reshape_info));
365  }
366  else
367  {
368  // Output tensor auto inizialitation if not yet initialized
369  auto_init_if_empty(mm_result_s32_info, a->clone()->set_tensor_shape(compute_mm_shape(*matrix_a_info, *matrix_b_info, false, reshape_info)).set_data_type(DataType::S32));
370 
371  if(is_midgard)
372  {
373  // Validate matrix multiply
374  ARM_COMPUTE_RETURN_ON_ERROR(CLGEMMLowpMatrixMultiplyKernel::validate(matrix_a_info, matrix_b_info, &mm_result_s32_info, reshape_info));
375  }
376  else
377  {
378  // Pick up the GEMM configuration
379  std::tie(lhs_info, rhs_info) = CLGEMMNativeKernelConfigurationFactory::create(gpu_target)->configure(m, n, k, batch_size, DataType::QASYMM8);
380 
381  // Validate matrix multiply
382  ARM_COMPUTE_RETURN_ON_ERROR(CLGEMMLowpMatrixMultiplyNativeKernel::validate(matrix_a_info, matrix_b_info, &mm_result_s32_info, lhs_info, rhs_info, reshape_info));
383  }
384  }
385 
386  // Validate offset contribution kernel
387  const size_t num_filters = (gemm_info.gemmlowp_output_stage().is_quantized_per_channel) ? gemm_info.gemmlowp_output_stage().gemmlowp_multipliers.size() : 1;
388 
389  const TensorInfo gemm_output_stage_multipliers_shifts_info(TensorInfo(TensorShape(num_filters), 1, DataType::S32));
390 
392  a_offset == 0 ? nullptr : &info_vector_sum_col,
393  b_offset == 0 ? nullptr : &info_vector_sum_row,
394  c,
395  output,
396  a_offset, b_offset,
397  gemm_info.gemmlowp_output_stage(),
398  &gemm_output_stage_multipliers_shifts_info,
399  &gemm_output_stage_multipliers_shifts_info));
400  }
401  else
402  {
403  if(reshape_matrix_b)
404  {
405  // Validate matrix multiply
406  ARM_COMPUTE_RETURN_ON_ERROR(CLGEMMLowpMatrixMultiplyReshapedOnlyRHSKernel::validate(matrix_a_info, matrix_b_info, output, lhs_info, rhs_info, reshape_info));
407  }
408  else
409  {
410  if(is_midgard)
411  {
412  // Validate matrix multiply
413  ARM_COMPUTE_RETURN_ON_ERROR(CLGEMMLowpMatrixMultiplyKernel::validate(matrix_a_info, matrix_b_info, output, reshape_info));
414  }
415  else
416  {
417  // Pick up the GEMM configuration
418  std::tie(lhs_info, rhs_info) = CLGEMMNativeKernelConfigurationFactory::create(gpu_target)->configure(m, n, k, batch_size, DataType::QASYMM8);
419 
420  // Validate matrix multiply
421  ARM_COMPUTE_RETURN_ON_ERROR(CLGEMMLowpMatrixMultiplyNativeKernel::validate(matrix_a_info, matrix_b_info, output, lhs_info, rhs_info, reshape_info));
422  }
423  }
424 
425  if(output->total_size() != 0)
426  {
427  // Validate offset contribution kernel
429  a_offset == 0 ? nullptr : &info_vector_sum_col,
430  b_offset == 0 ? nullptr : &info_vector_sum_row,
431  c,
432  a_offset, b_offset));
433  }
434  }
435 
436  return Status{};
437 }

References ARM_COMPUTE_RETURN_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN, ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES, ARM_COMPUTE_RETURN_ERROR_ON_MSG, ARM_COMPUTE_RETURN_ON_ERROR, arm_compute::auto_init_if_empty(), arm_compute::test::validation::b, ICloneable< T >::clone(), arm_compute::misc::shape_calculator::compute_mm_shape(), arm_compute::misc::shape_calculator::compute_reductionA_shape(), arm_compute::misc::shape_calculator::compute_reductionB_shape(), arm_compute::misc::shape_calculator::compute_rhs_reshaped_shape(), CLGEMMNativeKernelConfigurationFactory::create(), CLGEMMReshapedOnlyRHSKernelConfigurationFactory::create(), ITensorInfo::data_type(), GEMMInfo::depth_output_gemm3d(), ITensorInfo::dimension(), GEMMLowpOutputStageInfo::gemmlowp_multipliers, GEMMInfo::gemmlowp_output_stage(), CLScheduler::get(), GEMMInfo::is_a_reshaped(), GEMMInfo::is_b_reshaped(), arm_compute::is_data_type_quantized_asymmetric(), arm_compute::is_data_type_quantized_per_channel(), arm_compute::is_data_type_quantized_symmetric(), GEMMLowpOutputStageInfo::is_quantized_per_channel, arm_compute::MIDGARD, arm_compute::NONE, UniformQuantizationInfo::offset, arm_compute::QASYMM8, arm_compute::QSYMM8_PER_CHANNEL, ITensorInfo::quantization_info(), GEMMInfo::reinterpret_input_as_3d(), GEMMInfo::reshape_b_only_on_first_run(), arm_compute::S32, CLScheduler::target(), ITensorInfo::total_size(), GEMMLowpOutputStageInfo::type, QuantizationInfo::uniform(), CLDepthConvertLayerKernel::validate(), CLGEMMReshapeRHSMatrixKernel::validate(), CLGEMMLowpMatrixMultiplyNativeKernel::validate(), CLGEMMLowpMatrixMultiplyKernel::validate(), CLGEMMLowpMatrixAReductionKernel::validate(), CLGEMMLowpMatrixMultiplyReshapedOnlyRHSKernel::validate(), CLGEMMLowpOffsetContributionKernel::validate(), CLGEMMLowpOffsetContributionOutputStageKernel::validate(), CLGEMMLowpMatrixBReductionKernel::validate(), arm_compute::test::validation::weights_info, and arm_compute::WRAP.

Referenced by CLGEMMLowpMatrixMultiplyCore::configure(), CLGEMMDeconvolutionLayer::validate(), and CLLSTMLayerQuantized::validate().


The documentation for this class was generated from the following files:

  • CLGEMMLowpMatrixMultiplyCore.h
  • CLGEMMLowpMatrixMultiplyCore.cpp