Compute Library 21.02
CLGEMMLowpMatrixMultiplyCore Class Reference

Basic function to execute GEMMLowpMatrixMultiplyCore on OpenCL. More...

#include <CLGEMMLowpMatrixMultiplyCore.h>

Collaboration diagram for CLGEMMLowpMatrixMultiplyCore:

Public Member Functions

 CLGEMMLowpMatrixMultiplyCore (std::shared_ptr< IMemoryManager > memory_manager=nullptr)
 Constructor. More...
 
 CLGEMMLowpMatrixMultiplyCore (const CLGEMMLowpMatrixMultiplyCore &)=delete
 Prevent instances of this class from being copied (As this class contains pointers) More...
 
 CLGEMMLowpMatrixMultiplyCore (CLGEMMLowpMatrixMultiplyCore &&)=default
 Default move constructor. More...
 
CLGEMMLowpMatrixMultiplyCore & operator= (const CLGEMMLowpMatrixMultiplyCore &)=delete
 Prevent instances of this class from being copied (As this class contains pointers) More...
 
CLGEMMLowpMatrixMultiplyCore & operator= (CLGEMMLowpMatrixMultiplyCore &&)=default
 Default move assignment operator. More...
 
 ~CLGEMMLowpMatrixMultiplyCore ()
 Default destructor. More...
 
void configure (const ICLTensor *a, const ICLTensor *b, const ICLTensor *c, ICLTensor *output, const GEMMInfo &gemm_info=GEMMInfo())
 Initialise the kernel's inputs, output. More...
 
void configure (const CLCompileContext &compile_context, const ICLTensor *a, const ICLTensor *b, const ICLTensor *c, ICLTensor *output, const GEMMInfo &gemm_info=GEMMInfo())
 Initialise the kernel's inputs, output. More...
 
void run () override
 Run the kernels contained in the function. More...
 
void prepare () override
 Prepare the function for executing. More...
 
- Public Member Functions inherited from IFunction
virtual ~IFunction ()=default
 Destructor. More...
 

Static Public Member Functions

static Status validate (const ITensorInfo *a, const ITensorInfo *b, const ITensorInfo *c, const ITensorInfo *output, const GEMMInfo &gemm_info=GEMMInfo())
 Static function to check if given info will lead to a valid configuration of CLGEMMLowpMatrixMultiplyCore. More...
 

Detailed Description

Basic function to execute GEMMLowpMatrixMultiplyCore on OpenCL.

Definition at line 47 of file CLGEMMLowpMatrixMultiplyCore.h.
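
A minimal usage sketch (hypothetical shapes and quantization parameters; assumes the CL backend is initialised through CLScheduler):

#include "arm_compute/core/Types.h"
#include "arm_compute/runtime/CL/CLScheduler.h"
#include "arm_compute/runtime/CL/CLTensor.h"
#include "arm_compute/runtime/CL/functions/CLGEMMLowpMatrixMultiplyCore.h"

using namespace arm_compute;

int main()
{
    CLScheduler::get().default_init();

    // Hypothetical LHS (M=16, K=32), RHS (K=32, N=8) and S32 output (no output stage).
    CLTensor a, b, output;
    a.allocator()->init(TensorInfo(TensorShape(32U, 16U), 1, DataType::QASYMM8, QuantizationInfo(0.5f, 10)));
    b.allocator()->init(TensorInfo(TensorShape(8U, 32U), 1, DataType::QASYMM8, QuantizationInfo(0.25f, 5)));
    output.allocator()->init(TensorInfo(TensorShape(8U, 16U), 1, DataType::S32));

    CLGEMMLowpMatrixMultiplyCore gemm;
    gemm.configure(&a, &b, nullptr, &output, GEMMInfo()); // c == nullptr

    a.allocator()->allocate();
    b.allocator()->allocate();
    output.allocator()->allocate();
    // ... fill a and b ...

    gemm.run();
    CLScheduler::get().sync(); // run() does not block
    return 0;
}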

Constructor & Destructor Documentation

◆ CLGEMMLowpMatrixMultiplyCore() [1/3]

CLGEMMLowpMatrixMultiplyCore ( std::shared_ptr< IMemoryManager > memory_manager = nullptr )

Constructor.

Definition at line 190 of file CLGEMMLowpMatrixMultiplyCore.cpp.

191  : _memory_group(std::move(memory_manager)),
192  _weights_to_qasymm8(std::make_unique<CLDepthConvertLayerKernel>()),
193  _mm_native_kernel(std::make_unique<CLGEMMLowpMatrixMultiplyNativeKernel>()),
194  _mm_reshaped_only_rhs_kernel(std::make_unique<CLGEMMLowpMatrixMultiplyReshapedOnlyRHSKernel>()),
195  _mtx_b_reshape_kernel(std::make_unique<CLGEMMReshapeRHSMatrixKernel>()),
196  _mtx_a_reduction_kernel(std::make_unique<CLGEMMLowpMatrixAReductionKernel>()),
197  _mtx_b_reduction_kernel(std::make_unique<CLGEMMLowpMatrixBReductionKernel>()),
198  _offset_contribution_kernel(std::make_unique<CLGEMMLowpOffsetContributionKernel>()),
199  _offset_contribution_output_stage_kernel(std::make_unique<CLGEMMLowpOffsetContributionOutputStageKernel>()),
200  _qasymm8_weights(),
201  _vector_sum_col(),
202  _vector_sum_row(),
203  _tmp_b(),
204  _mm_result_s32(),
205  _gemm_output_stage_multipliers(),
206  _gemm_output_stage_shifts(),
207  _matrix_a(nullptr),
208  _original_b(nullptr),
209  _output(nullptr),
210  _a_offset(0),
211  _b_offset(0),
212  _is_gemm_reshaped(true),
213  _reshape_b_only_on_first_run(false),
214  _is_prepared(false),
215  _run_output_stage(false),
216  _convert_to_qasymm8(false),
217  _run_offset_contribution(false)
218 {
219 }

◆ CLGEMMLowpMatrixMultiplyCore() [2/3]

CLGEMMLowpMatrixMultiplyCore ( const CLGEMMLowpMatrixMultiplyCore & ) =delete

Prevent instances of this class from being copied (As this class contains pointers)

◆ CLGEMMLowpMatrixMultiplyCore() [3/3]

CLGEMMLowpMatrixMultiplyCore ( CLGEMMLowpMatrixMultiplyCore && ) =default

Default move constructor.

◆ ~CLGEMMLowpMatrixMultiplyCore()

~CLGEMMLowpMatrixMultiplyCore ( )

Default destructor.

Member Function Documentation

◆ configure() [1/2]

void configure ( const ICLTensor * a,
const ICLTensor * b,
const ICLTensor * c,
ICLTensor * output,
const GEMMInfo & gemm_info = GEMMInfo()
)

Initialise the kernel's inputs, output.

Note
GEMMLowp: low precision GEMM kernel. [A * B + C] This kernel performs the following computations:
  1. Convert a values from 8-bit quantized to int32 and add a_offset to each of them.
  2. Convert b values from 8-bit quantized to int32 and add b_offset to each of them.
  3. Compute the matrix product of the resulting a * b in int32.
  4. Quantize to uint8 if gemm_info.gemmlowp_output_stage != NONE
Parameters
[in]  a          First input tensor (Matrix A). Data type supported: QASYMM8/QASYMM8_SIGNED.
[in]  b          Second input tensor (Matrix B). Data type supported: QASYMM8/QASYMM8_SIGNED/QSYMM8/QSYMM8_PER_CHANNEL
[in]  c          Third input tensor (Matrix C). It can be a nullptr. Data type supported: S32
[out] output     Output tensor. Data type supported: S32 or QASYMM8/QASYMM8_SIGNED if gemm_info.gemmlowp_output_stage != NONE
[in]  gemm_info  (Optional) Specifies if the matrix A and/or matrix B have been reshaped and if the reshape of matrix B should be executed only for the first run

Definition at line 223 of file CLGEMMLowpMatrixMultiplyCore.cpp.

References CLKernelLibrary::get().

Referenced by CLQLSTMLayer::CLQLSTMLayer(), CLGEMMDeconvolutionLayer::configure(), and CLLSTMLayerQuantized::configure().

224 {
225  configure(CLKernelLibrary::get().get_compile_context(), a, b, c, output, gemm_info);
226 }
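
For reference, the computation in the note above can be modelled in scalar C++ (an illustrative sketch only; the real work happens in OpenCL kernels, which may also fuse the output stage):

#include <cstdint>
#include <vector>

// Hypothetical scalar model of steps 1-3, with row-major A (m x k) and B (k x n).
std::vector<int32_t> gemmlowp_reference(const std::vector<uint8_t> &a, const std::vector<uint8_t> &b,
                                        int m, int n, int k, int32_t a_offset, int32_t b_offset)
{
    std::vector<int32_t> out(static_cast<size_t>(m) * n, 0);
    for(int i = 0; i < m; ++i)
    {
        for(int j = 0; j < n; ++j)
        {
            int32_t acc = 0;
            for(int p = 0; p < k; ++p)
            {
                const int32_t va = static_cast<int32_t>(a[i * k + p]) + a_offset; // step 1
                const int32_t vb = static_cast<int32_t>(b[p * n + j]) + b_offset; // step 2
                acc += va * vb;                                                   // step 3
            }
            out[static_cast<size_t>(i) * n + j] = acc;
        }
    }
    return out;
}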

◆ configure() [2/2]

void configure ( const CLCompileContext & compile_context,
const ICLTensor * a,
const ICLTensor * b,
const ICLTensor * c,
ICLTensor * output,
const GEMMInfo & gemm_info = GEMMInfo()
)

Initialise the kernel's inputs, output.

Note
GEMMLowp: low precision GEMM kernel. [A * B + C] This kernel performs the following computations:
  1. Convert a values from 8-bit quantized to int32 and add a_offset to each of them.
  2. Convert b values from 8-bit quantized to int32 and add b_offset to each of them.
  3. Compute the matrix product of the resulting a * b in int32.
  4. Quantize to uint8 if gemm_info.gemmlowp_output_stage != NONE
Parameters
[in]  compile_context  The compile context to be used.
[in]  a                First input tensor (Matrix A). Data type supported: QASYMM8/QASYMM8_SIGNED.
[in]  b                Second input tensor (Matrix B). Data type supported: QASYMM8/QASYMM8_SIGNED/QSYMM8/QSYMM8_PER_CHANNEL
[in]  c                Third input tensor (Matrix C). It can be a nullptr. Data type supported: S32
[out] output           Output tensor. Data type supported: S32 or QASYMM8/QASYMM8_SIGNED if gemm_info.gemmlowp_output_stage != NONE
[in]  gemm_info        (Optional) Specifies if the matrix A and/or matrix B have been reshaped and if the reshape of matrix B should be executed only for the first run

Definition at line 228 of file CLGEMMLowpMatrixMultiplyCore.cpp.

References GEMMKernelInfo::a_offset, CLTensorAllocator::allocate(), CLTensor::allocator(), ARM_COMPUTE_ERROR_ON_NULLPTR, ARM_COMPUTE_ERROR_THROW_ON, arm_compute::test::validation::b, GEMMKernelInfo::b_offset, arm_compute::misc::shape_calculator::compute_reductionA_shape(), arm_compute::misc::shape_calculator::compute_reductionB_shape(), ITensorInfo::data_type(), GEMMKernelInfo::depth_output_gemm3d, GEMMInfo::depth_output_gemm3d(), ITensorInfo::dimension(), GEMMLowpOutputStageInfo::gemmlowp_multipliers, GEMMInfo::gemmlowp_output_stage(), GEMMLowpOutputStageInfo::gemmlowp_shifts, CLScheduler::get(), ITensor::info(), CLTensor::info(), ITensorAllocator::init(), arm_compute::is_data_type_quantized_per_channel(), arm_compute::is_data_type_quantized_symmetric(), GEMMLowpOutputStageInfo::is_quantized_per_channel, GEMMKernelInfo::k, GEMMKernelInfo::lhs_info, GEMMKernelInfo::m, MemoryGroup::manage(), CLTensor::map(), GEMMKernelInfo::n, arm_compute::NONE, UniformQuantizationInfo::offset, GEMMLowpOutputStageInfo::output_data_type, GEMMKernelInfo::output_stage, ITensor::ptr_to_element(), arm_compute::QASYMM8, ITensorInfo::quantization_info(), arm_compute::QUANTIZE_DOWN_FIXEDPOINT, GEMMKernelInfo::reinterpret_input_as_3d, GEMMInfo::reinterpret_input_as_3d(), GEMMInfo::reshape_b_only_on_first_run(), GEMMKernelInfo::rhs_info, arm_compute::S32, CLScheduler::target(), GEMMLowpOutputStageInfo::type, QuantizationInfo::uniform(), CLTensor::unmap(), CLGEMMLowpMatrixMultiplyCore::validate(), arm_compute::test::validation::weights_info, and arm_compute::WRAP.

229 {
230  ARM_COMPUTE_ERROR_ON_NULLPTR(a, b, output);
231  ARM_COMPUTE_ERROR_THROW_ON(CLGEMMLowpMatrixMultiplyCore::validate(a->info(), b->info(), c != nullptr ? c->info() : nullptr, output->info(), gemm_info));
232 
233  _is_prepared = false;
234  _original_b = b;
235  _reshape_b_only_on_first_run = gemm_info.reshape_b_only_on_first_run();
236  _a_offset = a->info()->quantization_info().uniform().offset;
237  _matrix_a = a;
238  _output = output;
239 
240  _convert_to_qasymm8 = is_data_type_quantized_per_channel(b->info()->data_type()) && is_data_type_quantized_symmetric(b->info()->data_type())
241  && a->info()->data_type() == DataType::QASYMM8;
242  _b_offset = _convert_to_qasymm8 ? -128 : b->info()->quantization_info().uniform().offset;
243 
244  // Get the GPU target
245  const GPUTarget gpu_target = CLScheduler::get().target();
246 
247  // Set the target for the kernels
248  _mm_native_kernel->set_target(gpu_target);
249  _mm_reshaped_only_rhs_kernel->set_target(gpu_target);
250 
251  GEMMRHSMatrixInfo rhs_info;
252  GEMMLHSMatrixInfo lhs_info;
253 
254  // Arguments used by GEMMReshapeInfo
255  // If we pass the matrix A and matrix B reshaped to CLGEMMMatrixMultiplyKernel, we need to pass m, n, k, mult_transpose1xW_width and mult_interleave4x4_height to CLGEMMReshapeInfo
256  // in order to know how the matrices have been reshaped
257  bool reinterpret_input_as_3d = gemm_info.reinterpret_input_as_3d();
258  const unsigned int m = reinterpret_input_as_3d ? (a->info()->dimension(1) * a->info()->dimension(2)) : a->info()->dimension(1);
259  const unsigned int n = b->info()->dimension(0);
260  const unsigned int k = a->info()->dimension(0);
261  const unsigned int batch_size = reinterpret_input_as_3d ? a->info()->dimension(3) : a->info()->dimension(2);
262  const int depth_output_gemm3d = gemm_info.depth_output_gemm3d();
263 
264  const auto reshape_info = GEMMReshapeInfo(m, n, k, 1, 1, depth_output_gemm3d, reinterpret_input_as_3d);
265 
266  // Check if we need to reshape the matrix A and matrix B
267  _is_gemm_reshaped = is_gemm_reshaped(auto_select_gemm_kernel(auto_heuristics::CommonQuery{ gpu_target, a->info()->data_type(), m, n, k, batch_size }, _reshape_b_only_on_first_run));
268 
269  if(_convert_to_qasymm8)
270  {
271  // Set data type for converted weights
272  TensorInfo weights_info(*b->info());
273  weights_info.set_data_type(DataType::QASYMM8);
274  _qasymm8_weights.allocator()->init(weights_info);
275  _weights_to_qasymm8->configure(compile_context, b, &_qasymm8_weights, ConvertPolicy::WRAP, 0);
276  }
277 
278  const ICLTensor *matrix_b = _convert_to_qasymm8 ? &_qasymm8_weights : b;
279  if(_is_gemm_reshaped)
280  {
281  matrix_b = &_tmp_b;
282 
283  if(!_reshape_b_only_on_first_run)
284  {
285  _memory_group.manage(&_tmp_b);
286  }
287 
288  // Pick up the GEMM configuration
289  // It doesn't matter whether the DataType is DataType::QASYMM8 or DataType::QASYMM8_SIGNED, since it only affects the shape configuration
290  std::tie(lhs_info, rhs_info) = auto_select_gemm_config_reshaped_only_rhs(auto_heuristics::CommonQuery{ gpu_target, DataType::QASYMM8, m, n, k, batch_size }, reinterpret_input_as_3d,
291  depth_output_gemm3d,
292  a->info(), _convert_to_qasymm8 ? _qasymm8_weights.info() : b->info(), output->info());
293 
294  // Configure reshape RHS kernel
295  _mtx_b_reshape_kernel->configure(compile_context, _convert_to_qasymm8 ? &_qasymm8_weights : b, &_tmp_b, rhs_info);
296  }
297 
298  // Using default reduction info
299  const GEMMLowpReductionKernelInfo reduction_info {};
300 
301  // Initialize matrix B reduction kernel only if _a_offset is not equal to 0
302  if(_a_offset != 0)
303  {
304  TensorInfo info_vector_sum_col(compute_reductionA_shape(*b->info()), 1, DataType::S32);
305  _vector_sum_col.allocator()->init(info_vector_sum_col);
306  if(!_reshape_b_only_on_first_run)
307  {
308  _memory_group.manage(&_vector_sum_col);
309  }
310 
311  // Configure Matrix B reduction kernel
312  _mtx_b_reduction_kernel->configure(compile_context, _convert_to_qasymm8 ? &_qasymm8_weights : b, &_vector_sum_col, reduction_info);
313  }
314 
315  // Initialize Matrix A reduction kernel only if _b_offset is not equal to 0
316  if(_b_offset != 0)
317  {
318  TensorInfo info_vector_sum_row(compute_reductionB_shape(*a->info()), 1, DataType::S32);
319  _vector_sum_row.allocator()->init(info_vector_sum_row);
320  _memory_group.manage(&_vector_sum_row);
321 
322  // Configure matrix A reduction kernel
323  _mtx_a_reduction_kernel->configure(compile_context, a, &_vector_sum_row, reduction_info);
324  }
325 
326  GEMMKernelInfo gemm_kernel_info;
327  gemm_kernel_info.m = m;
328  gemm_kernel_info.n = n;
329  gemm_kernel_info.k = k;
330  gemm_kernel_info.depth_output_gemm3d = depth_output_gemm3d;
331  gemm_kernel_info.reinterpret_input_as_3d = reinterpret_input_as_3d;
332  gemm_kernel_info.lhs_info = lhs_info;
333  gemm_kernel_info.rhs_info = rhs_info;
334  gemm_kernel_info.a_offset = _a_offset;
335  gemm_kernel_info.b_offset = _b_offset;
336  // If GEMMLowpOutputStage != NONE, fuse the offset contribution with the output stage
337  if(gemm_info.gemmlowp_output_stage().type != GEMMLowpOutputStageType::NONE)
338  {
339  // Configure offset contribution kernel
340  const size_t num_filters = (gemm_info.gemmlowp_output_stage().is_quantized_per_channel) ? gemm_info.gemmlowp_output_stage().gemmlowp_multipliers.size() : 1;
341 
342  _gemm_output_stage_multipliers.allocator()->init(TensorInfo(TensorShape(num_filters), 1, DataType::S32));
343  _gemm_output_stage_shifts.allocator()->init(TensorInfo(TensorShape(num_filters), 1, DataType::S32));
344 
345  GEMMLowpOutputStageInfo gemmlowp_output_stage = gemm_info.gemmlowp_output_stage();
346  gemmlowp_output_stage.output_data_type = _matrix_a->info()->data_type();
347 
348  gemm_kernel_info.output_stage = gemmlowp_output_stage;
349 
350  if(_is_gemm_reshaped && gemmlowp_output_stage.type == GEMMLowpOutputStageType::QUANTIZE_DOWN_FIXEDPOINT)
351  {
352  // Configure and tune matrix multiply kernel with fused output stage
353  _mm_reshaped_only_rhs_kernel->configure(compile_context, _matrix_a, matrix_b, output, gemm_kernel_info, _a_offset == 0 ? nullptr : &_vector_sum_col,
354  _b_offset == 0 ? nullptr : &_vector_sum_row, c, &_gemm_output_stage_multipliers, &_gemm_output_stage_shifts);
355  }
356  else
357  {
358  _run_output_stage = true;
359 
360  _memory_group.manage(&_mm_result_s32);
361 
362  if(_is_gemm_reshaped)
363  {
364  _mm_reshaped_only_rhs_kernel->configure(compile_context, _matrix_a, matrix_b, &_mm_result_s32, gemm_kernel_info);
365  }
366  else
367  {
368  // Pick up the GEMM configuration
369  // It doesn't matter whether the DataType is DataType::QASYMM8 or DataType::QASYMM8_SIGNED, since it only affects the shape configuration
370  std::tie(lhs_info, rhs_info) = auto_select_gemm_config_native(auto_heuristics::CommonQuery{ gpu_target, DataType::QASYMM8, m, n, k, batch_size },
371  _matrix_a->info(), _convert_to_qasymm8 ? _qasymm8_weights.info() : matrix_b->info(), reshape_info);
372 
373  // Configure matrix multiply kernel
374  _mm_native_kernel->configure(compile_context, _matrix_a, matrix_b, &_mm_result_s32, lhs_info, rhs_info, reshape_info);
375 
376  _offset_contribution_output_stage_kernel->configure(compile_context, &_mm_result_s32, _a_offset == 0 ? nullptr : &_vector_sum_col, _b_offset == 0 ? nullptr : &_vector_sum_row, c, output,
377  a->info()->dimension(0),
378  _a_offset, _b_offset, gemmlowp_output_stage, &_gemm_output_stage_multipliers, &_gemm_output_stage_shifts);
379  _mm_result_s32.allocator()->allocate();
380  }
381  }
382 
383  _gemm_output_stage_multipliers.allocator()->allocate();
384  _gemm_output_stage_shifts.allocator()->allocate();
385  // Compute GEMM output multipliers and shifts for output stage
386  _gemm_output_stage_multipliers.map();
387  _gemm_output_stage_shifts.map();
388  std::memcpy(_gemm_output_stage_multipliers.ptr_to_element(Coordinates(0)), gemm_info.gemmlowp_output_stage().gemmlowp_multipliers.data(), num_filters * sizeof(int32_t));
389  std::memcpy(_gemm_output_stage_shifts.ptr_to_element(Coordinates(0)), gemm_info.gemmlowp_output_stage().gemmlowp_shifts.data(), num_filters * sizeof(int32_t));
390  _gemm_output_stage_multipliers.unmap();
391  _gemm_output_stage_shifts.unmap();
392  }
393  else
394  {
395  _run_offset_contribution = true;
396  if(_is_gemm_reshaped)
397  {
398  // Configure and tune matrix multiply kernel
399  _mm_reshaped_only_rhs_kernel->configure(compile_context, _matrix_a, matrix_b, output, gemm_kernel_info);
400  }
401  else
402  {
403  // Pick up the GEMM configuration
404  // It doesn't matter whether the DataType is DataType::QASYMM8 or DataType::QASYMM8_SIGNED, since it only affects the shape configuration
405  std::tie(lhs_info, rhs_info) = auto_select_gemm_config_native(auto_heuristics::CommonQuery{ gpu_target, DataType::QASYMM8, m, n, k, batch_size },
406  a->info(), _convert_to_qasymm8 ? _qasymm8_weights.info() : b->info(), reshape_info);
407 
408  // Configure matrix multiply kernel
409  _mm_native_kernel->configure(compile_context, _matrix_a, matrix_b, output, lhs_info, rhs_info, reshape_info);
410  }
411 
412  // Configure offset contribution kernel
413  _offset_contribution_kernel->configure(compile_context, output, _a_offset == 0 ? nullptr : &_vector_sum_col, _b_offset == 0 ? nullptr : &_vector_sum_row, c, a->info()->dimension(0), _a_offset,
414  _b_offset);
415  }
416 
417  // Allocate tensors
418  if(_is_gemm_reshaped)
419  {
420  if(!_reshape_b_only_on_first_run)
421  {
422  _tmp_b.allocator()->allocate();
423  }
424  }
425 
426  if(_a_offset != 0 && !_reshape_b_only_on_first_run)
427  {
428  _vector_sum_col.allocator()->allocate();
429  }
430 
431  if(_b_offset != 0)
432  {
433  _vector_sum_row.allocator()->allocate();
434  }
435 }
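
Which of the branches above is taken depends on gemm_info.gemmlowp_output_stage(). A sketch of requesting a fused int32-to-QASYMM8 requantization stage, continuing the earlier example (the multiplier/shift/offset values are placeholders, not derived from real quantization parameters):

// Hypothetical output stage: quantize the int32 accumulators down to QASYMM8.
GEMMLowpOutputStageInfo stage_info{};
stage_info.type                 = GEMMLowpOutputStageType::QUANTIZE_DOWN_FIXEDPOINT;
stage_info.gemmlowp_multiplier  = 1073741824; // placeholder fixed-point multiplier
stage_info.gemmlowp_shift       = 1;          // placeholder right shift
stage_info.gemmlowp_offset      = 0;
stage_info.gemmlowp_min_bound   = 0;
stage_info.gemmlowp_max_bound   = 255;
stage_info.gemmlowp_multipliers = { stage_info.gemmlowp_multiplier }; // per-channel vectors mirror the scalars here
stage_info.gemmlowp_shifts      = { stage_info.gemmlowp_shift };

// reshape_b_only_on_first_run = true; the output tensor must now be QASYMM8/QASYMM8_SIGNED.
GEMMInfo gemm_info(false, false, true, 0, false, false, stage_info);
gemm.configure(&a, &b, nullptr, &output, gemm_info);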

◆ operator=() [1/2]

CLGEMMLowpMatrixMultiplyCore & operator= ( const CLGEMMLowpMatrixMultiplyCore & ) =delete

Prevent instances of this class from being copied (As this class contains pointers)

◆ operator=() [2/2]

CLGEMMLowpMatrixMultiplyCore & operator= ( CLGEMMLowpMatrixMultiplyCore && ) =default

Default move assignment operator.

◆ prepare()

void prepare ( )
override virtual

Prepare the function for executing.

Any one-off pre-processing step required by the function is handled here.

Note
Prepare stage might not need all the function's buffers' backing memory to be available in order to execute

Reimplemented from IFunction.

Definition at line 668 of file CLGEMMLowpMatrixMultiplyCore.cpp.

References CLTensorAllocator::allocate(), CLTensor::allocator(), ARM_COMPUTE_ERROR_ON, CLScheduler::enqueue(), CLScheduler::get(), ITensor::is_used(), ITensor::mark_as_unused(), and CLScheduler::queue().

Referenced by CLGEMMDeconvolutionLayer::prepare(), CLGEMMConvolutionLayer::prepare(), and CLGEMMLowpMatrixMultiplyCore::run().

669 {
670  if(!_is_prepared)
671  {
672  if(_convert_to_qasymm8)
673  {
674  _qasymm8_weights.allocator()->allocate();
675  CLScheduler::get().enqueue(*_weights_to_qasymm8, false);
676  }
677 
678  if(_is_gemm_reshaped && _reshape_b_only_on_first_run)
679  {
680  ARM_COMPUTE_ERROR_ON(!_original_b->is_used());
681 
682  // Run reshape kernel and mark original weights tensor as unused
683  _tmp_b.allocator()->allocate();
684  CLScheduler::get().enqueue(*_mtx_b_reshape_kernel, false);
685  _original_b->mark_as_unused();
686  }
687 
688  // Run matrix B reduction kernel only if _a_offset is not equal to 0
689  if(_a_offset != 0 && _reshape_b_only_on_first_run)
690  {
691  _vector_sum_col.allocator()->allocate();
692  CLScheduler::get().enqueue(*_mtx_b_reduction_kernel, false);
693  }
694 
695  CLScheduler::get().queue().finish();
696  _is_prepared = true;
697  }
698 }
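
In a typical deployment loop, prepare() runs once so that the weight conversion, reshape and B reduction are not repeated, continuing the earlier sketch (and assuming GEMMInfo requested reshape_b_only_on_first_run):

gemm.prepare(); // one-off: converts/reshapes B, reduces it, then marks the original B unused
for(int i = 0; i < 100; ++i) // hypothetical iteration count
{
    // ... upload new data into tensor a ...
    gemm.run(); // reuses the B prepared above; no reshape on subsequent runs
}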

◆ run()

void run ( )
override virtual

Run the kernels contained in the function.

For Neon kernels:

  • Multi-threading is used for the kernels which are parallelisable.
  • By default std::thread::hardware_concurrency() threads are used.
Note
CPPScheduler::set_num_threads() can be used to manually set the number of threads

For OpenCL kernels:

  • All the kernels are enqueued on the queue associated with CLScheduler.
  • The queue is then flushed.
Note
The function will not block until the kernels are executed. It is the user's responsibility to wait.
Will call prepare() on first run if it hasn't been done

Implements IFunction.

Definition at line 620 of file CLGEMMLowpMatrixMultiplyCore.cpp.

References CLScheduler::enqueue(), CLScheduler::get(), and CLGEMMLowpMatrixMultiplyCore::prepare().

Referenced by CLGEMMDeconvolutionLayer::run(), CLLSTMLayerQuantized::run(), CLFullyConnectedLayer::run(), CLQLSTMLayer::run(), and CLGEMMConvolutionLayer::run().

621 {
622  prepare();
623 
624  MemoryGroupResourceScope scope_mg(_memory_group);
625 
626  if(_is_gemm_reshaped)
627  {
628  if(!_reshape_b_only_on_first_run)
629  {
630  // Run reshape matrix B
631  CLScheduler::get().enqueue(*_mtx_b_reshape_kernel, false);
632  }
633  }
634 
635  // Run matrix B reduction kernel only if _a_offset is not equal to 0
636  if(_a_offset != 0 && !_reshape_b_only_on_first_run)
637  {
638  CLScheduler::get().enqueue(*_mtx_b_reduction_kernel, false);
639  }
640 
641  // Run matrix A reduction kernel only if _b_offset is not equal to 0
642  if(_b_offset != 0)
643  {
644  CLScheduler::get().enqueue(*_mtx_a_reduction_kernel, false);
645  }
646 
647  // Run matrix multiply
648  if(_is_gemm_reshaped)
649  {
650  CLScheduler::get().enqueue(*_mm_reshaped_only_rhs_kernel, false);
651  }
652  else
653  {
654  CLScheduler::get().enqueue(*_mm_native_kernel, false);
655  }
656  if(_run_output_stage)
657  {
658  // Run offset contribution/output stage kernel
659  CLScheduler::get().enqueue(*_offset_contribution_output_stage_kernel, true);
660  }
661  if(_run_offset_contribution)
662  {
663  // Run offset contribution kernel
664  CLScheduler::get().enqueue(*_offset_contribution_kernel, true);
665  }
666 }
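
Since run() only enqueues and flushes, the caller synchronises before reading the result, e.g.:

gemm.run();                // enqueue the kernels; does not block
CLScheduler::get().sync(); // wait for completion before mapping/reading output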

◆ validate()

Status validate ( const ITensorInfo * a,
const ITensorInfo * b,
const ITensorInfo * c,
const ITensorInfo * output,
const GEMMInfo & gemm_info = GEMMInfo()
)
static

Static function to check if given info will lead to a valid configuration of CLGEMMLowpMatrixMultiplyCore.

Parameters
[in]  a          First input tensor info (Matrix A). Data type supported: QASYMM8/QASYMM8_SIGNED.
[in]  b          Second input tensor info (Matrix B). Data type supported: QASYMM8/QASYMM8_SIGNED/QSYMM8/QSYMM8_PER_CHANNEL
[in]  c          Third input tensor info (Matrix C). It can be a nullptr. Data type supported: S32
[in]  output     Output tensor info. Data type supported: S32 or QASYMM8/QASYMM8_SIGNED if gemm_info.gemmlowp_output_stage != NONE
[in]  gemm_info  (Optional) Specifies if the matrix A and/or matrix B have been reshaped and if the reshape of matrix B should be executed only for the first run
Returns
a status

Definition at line 437 of file CLGEMMLowpMatrixMultiplyCore.cpp.

References GEMMKernelInfo::a_offset, ARM_COMPUTE_ERROR_ON_NULLPTR, ARM_COMPUTE_RETURN_ERROR_ON, ARM_COMPUTE_RETURN_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN, ARM_COMPUTE_RETURN_ERROR_ON_MSG, ARM_COMPUTE_RETURN_ON_ERROR, arm_compute::auto_init_if_empty(), GEMMKernelInfo::b_offset, ICloneable< T >::clone(), arm_compute::misc::shape_calculator::compute_mm_shape(), arm_compute::misc::shape_calculator::compute_reductionA_shape(), arm_compute::misc::shape_calculator::compute_reductionB_shape(), arm_compute::misc::shape_calculator::compute_rhs_reshaped_shape(), ITensorInfo::data_type(), GEMMKernelInfo::depth_output_gemm3d, GEMMInfo::depth_output_gemm3d(), ITensorInfo::dimension(), GEMMLowpOutputStageInfo::gemmlowp_multipliers, GEMMInfo::gemmlowp_output_stage(), CLScheduler::get(), GEMMInfo::is_a_reshaped(), GEMMInfo::is_b_reshaped(), arm_compute::is_data_type_quantized_asymmetric(), arm_compute::is_data_type_quantized_per_channel(), arm_compute::is_data_type_quantized_symmetric(), GEMMLowpOutputStageInfo::is_quantized_per_channel, GEMMKernelInfo::k, GEMMKernelInfo::lhs_info, GEMMKernelInfo::m, GEMMKernelInfo::n, arm_compute::NONE, UniformQuantizationInfo::offset, GEMMLowpOutputStageInfo::output_data_type, GEMMKernelInfo::output_stage, arm_compute::QASYMM8, arm_compute::QASYMM8_SIGNED, arm_compute::QSYMM8, arm_compute::QSYMM8_PER_CHANNEL, ITensorInfo::quantization_info(), arm_compute::QUANTIZE_DOWN_FIXEDPOINT, GEMMKernelInfo::reinterpret_input_as_3d, GEMMInfo::reinterpret_input_as_3d(), GEMMInfo::reshape_b_only_on_first_run(), GEMMKernelInfo::rhs_info, arm_compute::S32, arm_compute::cl_gemm::auto_heuristics::select_default_gemm_config_native(), arm_compute::cl_gemm::auto_heuristics::select_default_gemm_config_reshaped_only_rhs(), CLScheduler::target(), ITensorInfo::total_size(), GEMMLowpOutputStageInfo::type, QuantizationInfo::uniform(), CLDepthConvertLayerKernel::validate(), CLGEMMLowpMatrixMultiplyNativeKernel::validate(), CLGEMMLowpOffsetContributionKernel::validate(), CLGEMMLowpOffsetContributionOutputStageKernel::validate(), CLGEMMLowpMatrixAReductionKernel::validate(), CLGEMMReshapeRHSMatrixKernel::validate(), CLGEMMLowpMatrixMultiplyReshapedOnlyRHSKernel::validate(), CLGEMMLowpMatrixBReductionKernel::validate(), arm_compute::test::validation::weights_info, and arm_compute::WRAP.

Referenced by CLGEMMLowpMatrixMultiplyCore::configure(), CLGEMMDeconvolutionLayer::validate(), and CLLSTMLayerQuantized::validate().

438 {
439  ARM_COMPUTE_ERROR_ON_NULLPTR(a, b, output);
440  ARM_COMPUTE_RETURN_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN(a, 1, DataType::QASYMM8, DataType::QASYMM8_SIGNED);
441  ARM_COMPUTE_RETURN_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN(b, 1, DataType::QASYMM8, DataType::QASYMM8_SIGNED, DataType::QSYMM8, DataType::QSYMM8_PER_CHANNEL);
442  ARM_COMPUTE_RETURN_ERROR_ON(a->data_type() == DataType::QASYMM8 && b->data_type() == DataType::QASYMM8_SIGNED);
443  ARM_COMPUTE_RETURN_ERROR_ON(a->data_type() == DataType::QASYMM8_SIGNED && b->data_type() == DataType::QASYMM8);
444  ARM_COMPUTE_RETURN_ERROR_ON_MSG(gemm_info.is_a_reshaped(), "Matrix A already reshaped is not supported");
445  ARM_COMPUTE_RETURN_ERROR_ON_MSG(gemm_info.is_b_reshaped(), "Matrix B already reshaped is not supported");
446 
447  int32_t a_offset = a->quantization_info().uniform().offset;
448  int32_t b_offset = b->quantization_info().uniform().offset;
449 
450  const ITensorInfo *matrix_a_info = a;
451 
452  TensorInfo tmp_b_info{};
453  GEMMRHSMatrixInfo rhs_info;
454  GEMMLHSMatrixInfo lhs_info;
455 
456  // Get the GPU target
457  const GPUTarget gpu_target = CLScheduler::get().target();
458 
459  bool reinterpret_input_as_3d = gemm_info.reinterpret_input_as_3d();
460  const unsigned int m = reinterpret_input_as_3d ? (a->dimension(1) * a->dimension(2)) : a->dimension(1);
461  const unsigned int n = b->dimension(0);
462  const unsigned int k = a->dimension(0);
463  const unsigned int batch_size = reinterpret_input_as_3d ? a->dimension(3) : a->dimension(2);
464  const int depth_output_gemm3d = gemm_info.depth_output_gemm3d();
465 
466  bool reshape_matrix_b = is_gemm_reshaped(auto_select_gemm_kernel(auto_heuristics::CommonQuery{ gpu_target, a->data_type(), m, n, k, batch_size }, gemm_info.reshape_b_only_on_first_run()));
467 
468  const GEMMReshapeInfo reshape_info = GEMMReshapeInfo(m, n, k, 1, 1, depth_output_gemm3d, reinterpret_input_as_3d);
469 
470  bool convert_to_qasymm8 = is_data_type_quantized_per_channel(b->data_type()) && is_data_type_quantized_symmetric(b->data_type())
471  && is_data_type_quantized_asymmetric(a->data_type());
472  TensorInfo weights_info(*b);
473  if(convert_to_qasymm8)
474  {
475  b_offset = -128;
476  weights_info.set_data_type(DataType::QASYMM8);
477  ARM_COMPUTE_RETURN_ON_ERROR(CLDepthConvertLayerKernel::validate(b, &weights_info, ConvertPolicy::WRAP, 0));
478  }
479  const ITensorInfo *matrix_b_info = &weights_info;
480  if(reshape_matrix_b)
481  {
482  matrix_b_info = &tmp_b_info;
483 
484  // Pick up the GEMM configuration
485  // NOTE: No need to validate mlgo configurations as they automatically fall back to default heuristics if validation fails
486  // It doesn't matter whether the DataType is DataType::QASYMM8 or DataType::QASYMM8_SIGNED, since it only affects the shape configuration
487  const auto res = select_default_gemm_config_reshaped_only_rhs(auto_heuristics::CommonQuery{ gpu_target, DataType::QASYMM8, m, n, k, batch_size });
488  lhs_info = res.lhs_info;
489  rhs_info = res.rhs_info;
490 
491  // Validate reshape RHS kernel
492  auto_init_if_empty(tmp_b_info, weights_info.clone()->set_tensor_shape(compute_rhs_reshaped_shape(weights_info, rhs_info)));
493  ARM_COMPUTE_RETURN_ON_ERROR(CLGEMMReshapeRHSMatrixKernel::validate(&weights_info, &tmp_b_info, rhs_info));
494  }
495 
496  TensorInfo info_vector_sum_col{};
497  TensorInfo info_vector_sum_row{};
498 
499  const GEMMLowpReductionKernelInfo reduction_info;
500  // Validate matrix B reduction kernel only if _a_offset is not equal to 0
501  if(a_offset != 0)
502  {
503  info_vector_sum_col = TensorInfo(compute_reductionA_shape(weights_info), 1, DataType::S32);
504 
505  // Configure Matrix B reduction kernel
506  ARM_COMPUTE_RETURN_ON_ERROR(CLGEMMLowpMatrixBReductionKernel::validate(&weights_info, &info_vector_sum_col, reduction_info));
507  }
508 
509  // Validate Matrix A reduction kernel only if _b_offset is not equal to 0
510  if(b_offset != 0)
511  {
512  info_vector_sum_row = TensorInfo(compute_reductionB_shape(*a), 1, DataType::S32);
513 
514  // Configure matrix A reduction kernel
515  ARM_COMPUTE_RETURN_ON_ERROR(CLGEMMLowpMatrixAReductionKernel::validate(a, &info_vector_sum_row, reduction_info));
516  }
517 
518  GEMMKernelInfo gemm_kernel_info;
519  gemm_kernel_info.m = m;
520  gemm_kernel_info.n = n;
521  gemm_kernel_info.k = k;
522  gemm_kernel_info.depth_output_gemm3d = depth_output_gemm3d;
523  gemm_kernel_info.reinterpret_input_as_3d = reinterpret_input_as_3d;
524  gemm_kernel_info.lhs_info = lhs_info;
525  gemm_kernel_info.rhs_info = rhs_info;
526  gemm_kernel_info.a_offset = a_offset;
527  gemm_kernel_info.b_offset = b_offset;
528  if(gemm_info.gemmlowp_output_stage().type != GEMMLowpOutputStageType::NONE)
529  {
530  const size_t num_filters = (gemm_info.gemmlowp_output_stage().is_quantized_per_channel) ? gemm_info.gemmlowp_output_stage().gemmlowp_multipliers.size() : 1;
531 
532  const TensorInfo gemm_output_stage_multipliers_shifts_info(TensorInfo(TensorShape(num_filters), 1, DataType::S32));
533 
534  GEMMLowpOutputStageInfo gemmlowp_output_stage = gemm_info.gemmlowp_output_stage();
535  gemmlowp_output_stage.output_data_type = a->data_type();
536 
537  gemm_kernel_info.output_stage = gemmlowp_output_stage;
538  if(reshape_matrix_b && gemm_info.gemmlowp_output_stage().type == GEMMLowpOutputStageType::QUANTIZE_DOWN_FIXEDPOINT)
539  {
540  ARM_COMPUTE_RETURN_ON_ERROR(CLGEMMLowpMatrixMultiplyReshapedOnlyRHSKernel::validate(matrix_a_info, matrix_b_info, output, gemm_kernel_info,
541  a_offset == 0 ? nullptr : &info_vector_sum_col,
542  b_offset == 0 ? nullptr : &info_vector_sum_row,
543  c,
544  &gemm_output_stage_multipliers_shifts_info,
545  &gemm_output_stage_multipliers_shifts_info));
546  }
547  else
548  {
549  TensorInfo mm_result_s32_info{};
550 
551  if(reshape_matrix_b)
552  {
553  // Output tensor auto-initialization if not yet initialized
554  auto_init_if_empty(mm_result_s32_info, a->clone()->set_tensor_shape(compute_mm_shape(*matrix_a_info, *matrix_b_info, reshape_info)).set_data_type(DataType::S32));
555 
556  // Validate matrix multiply
557  ARM_COMPUTE_RETURN_ON_ERROR(CLGEMMLowpMatrixMultiplyReshapedOnlyRHSKernel::validate(matrix_a_info, matrix_b_info, &mm_result_s32_info, gemm_kernel_info));
558  }
559  else
560  {
561  // Output tensor auto-initialization if not yet initialized
562  auto_init_if_empty(mm_result_s32_info, a->clone()->set_tensor_shape(compute_mm_shape(*matrix_a_info, *matrix_b_info, false, reshape_info)).set_data_type(DataType::S32));
563 
564  // Pick up the GEMM configuration
565  // NOTE: No need to validate mlgo configurations as they automatically fall back to default heuristics if validation fails
566  // It doesn't matter whether the DataType is DataType::QASYMM8 or DataType::QASYMM8_SIGNED, since it only affects the shape configuration
567  const auto res = select_default_gemm_config_native(auto_heuristics::CommonQuery{ gpu_target, DataType::QASYMM8, m, n, k, batch_size });
568  lhs_info = res.lhs_info;
569  rhs_info = res.rhs_info;
570 
571  // Validate matrix multiply
572  ARM_COMPUTE_RETURN_ON_ERROR(CLGEMMLowpMatrixMultiplyNativeKernel::validate(matrix_a_info, matrix_b_info, &mm_result_s32_info, lhs_info, rhs_info, reshape_info));
573  }
574 
575  // Validate offset contribution kernel
576  ARM_COMPUTE_RETURN_ON_ERROR(CLGEMMLowpOffsetContributionOutputStageKernel::validate(&mm_result_s32_info,
577  a_offset == 0 ? nullptr : &info_vector_sum_col,
578  b_offset == 0 ? nullptr : &info_vector_sum_row,
579  c,
580  output,
581  a_offset, b_offset,
582  gemmlowp_output_stage,
583  &gemm_output_stage_multipliers_shifts_info,
584  &gemm_output_stage_multipliers_shifts_info));
585  }
586  }
587  else
588  {
589  if(reshape_matrix_b)
590  {
591  // Validate matrix multiply
592  ARM_COMPUTE_RETURN_ON_ERROR(CLGEMMLowpMatrixMultiplyReshapedOnlyRHSKernel::validate(matrix_a_info, matrix_b_info, output, gemm_kernel_info));
593  }
594  else
595  {
596  // Pick up the GEMM configuration
597  // It doesn't matter whether the DataType is DataType::QASYMM8 or DataType::QASYMM8_SIGNED, since it only affects the shape configuration
598  const auto res = select_default_gemm_config_native(auto_heuristics::CommonQuery{ gpu_target, DataType::QASYMM8, m, n, k, batch_size });
599  lhs_info = res.lhs_info;
600  rhs_info = res.rhs_info;
601 
602  // Validate matrix multiply
603  ARM_COMPUTE_RETURN_ON_ERROR(CLGEMMLowpMatrixMultiplyNativeKernel::validate(matrix_a_info, matrix_b_info, output, lhs_info, rhs_info, reshape_info));
604  }
605 
606  if(output->total_size() != 0)
607  {
608  // Validate offset contribution kernel
609  ARM_COMPUTE_RETURN_ON_ERROR(CLGEMMLowpOffsetContributionKernel::validate(output,
610  a_offset == 0 ? nullptr : &info_vector_sum_col,
611  b_offset == 0 ? nullptr : &info_vector_sum_row,
612  c,
613  a_offset, b_offset));
614  }
615  }
616 
617  return Status{};
618 }
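
A sketch of the usual validate-before-configure pattern, with the tensors and gemm_info from the earlier hypothetical examples:

#include <iostream>

const Status status = CLGEMMLowpMatrixMultiplyCore::validate(a.info(), b.info(), nullptr, output.info(), gemm_info);
if(status.error_code() != ErrorCode::OK)
{
    std::cerr << "GEMMLowp configuration rejected: " << status.error_description() << std::endl;
}
else
{
    gemm.configure(&a, &b, nullptr, &output, gemm_info);
}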

The documentation for this class was generated from the following files:

CLGEMMLowpMatrixMultiplyCore.h
CLGEMMLowpMatrixMultiplyCore.cpp