Compute Library
 22.05
ClGemm Class Reference

Basic function to execute GEMM on OpenCL.

#include <ClGemm.h>

Public Member Functions

 ClGemm ()
 Constructor.
 
void configure (const CLCompileContext &compile_context, ITensorInfo *a, ITensorInfo *b, ITensorInfo *c, ITensorInfo *output, float alpha, float beta, const GEMMInfo &gemm_info)
 Initialise the kernel's inputs and output.
 
void run (ITensorPack &tensors) override
 Run the kernels contained in the function.
 
void prepare (ITensorPack &constants) override
 Prepare the function for executing.
 
experimental::MemoryRequirements workspace () const override
 Return the memory requirements required by the workspace.
 
Public Member Functions inherited from ICLOperator
 ICLOperator (IRuntimeContext *ctx=nullptr)
 Constructor.
 
 ICLOperator (const ICLOperator &)=delete
 Prevent instances of this class from being copied (as this class contains pointers).
 
 ICLOperator (ICLOperator &&)=default
 Default move constructor.
 
ICLOperator & operator= (const ICLOperator &)=delete
 Prevent instances of this class from being copied (as this class contains pointers).
 
ICLOperator & operator= (ICLOperator &&)=default
 Default move assignment operator.
 
Public Member Functions inherited from IOperator
virtual ~IOperator ()=default
 Destructor.
 

Static Public Member Functions

static Status validate (const ITensorInfo *a, const ITensorInfo *b, const ITensorInfo *c, const ITensorInfo *output, float alpha, float beta, const GEMMInfo &gemm_info)
 Static function to check if given info will lead to a valid configuration.
 

Detailed Description

Basic function to execute GEMM on OpenCL.

This function calls the following OpenCL kernels:

  1. kernels::ClGemmReshapeLhsMatrixKernel (only if RESHAPED is selected by the heuristic model)
  2. kernels::ClGemmReshapeRhsMatrixKernel (only if either RESHAPED or RESHAPED_ONLY_RHS is selected by the select_gemm_kernel() method)
  3. kernels::ClGemmMatrixMultiplyNativeKernel (only if NATIVE is selected by the select_gemm_kernel() method)
  4. kernels::ClGemmMatrixMultiplyReshapedKernel (only if RESHAPED is selected by the select_gemm_kernel() method)
  5. kernels::ClGemmMatrixMultiplyReshapedOnlyRhsKernel (only if RESHAPED_ONLY_RHS is selected by the select_gemm_kernel() method)
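
The mapping in the list above can be summarised as a small lookup. The following is a minimal sketch in plain C++, not library code: `GemmKernelType` and `kernels_for` are hypothetical names introduced here for illustration, and the strings simply repeat the kernel names from the list.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the kernel type chosen by the heuristic.
enum class GemmKernelType { NATIVE, RESHAPED, RESHAPED_ONLY_RHS };

// Returns the kernels that the list above associates with each kernel type,
// with reshape kernels placed before the matrix multiply kernel.
std::vector<std::string> kernels_for(GemmKernelType type)
{
    switch(type)
    {
        case GemmKernelType::NATIVE:
            return { "ClGemmMatrixMultiplyNativeKernel" };
        case GemmKernelType::RESHAPED:
            return { "ClGemmReshapeLhsMatrixKernel", "ClGemmReshapeRhsMatrixKernel", "ClGemmMatrixMultiplyReshapedKernel" };
        case GemmKernelType::RESHAPED_ONLY_RHS:
            return { "ClGemmReshapeRhsMatrixKernel", "ClGemmMatrixMultiplyReshapedOnlyRhsKernel" };
    }
    return {};
}

// Usage: the NATIVE path needs no reshape kernel at all.
void example_kernels_for()
{
    assert(kernels_for(GemmKernelType::NATIVE).size() == 1);
}
```

Only the LHS reshape is exclusive to RESHAPED; the RHS reshape is shared by both reshaped variants.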

Definition at line 54 of file ClGemm.h.

Constructor & Destructor Documentation

◆ ClGemm()

ClGemm ( )

Constructor.

Definition at line 188 of file ClGemm.cpp.

References arm_compute::NATIVE.

    : _reshape_lhs_kernel(std::make_unique<ClGemmReshapeLhsMatrixKernel>()),
      _reshape_rhs_kernel(std::make_unique<ClGemmReshapeRhsMatrixKernel>()),
      _mm_native_kernel(std::make_unique<ClGemmMatrixMultiplyNativeKernel>()),
      _mm_reshaped_kernel(std::make_unique<ClGemmMatrixMultiplyReshapedKernel>()),
      _mm_reshaped_only_rhs_kernel(std::make_unique<ClGemmMatrixMultiplyReshapedOnlyRhsKernel>()),
      _tmp_a(),
      _tmp_b(),
      _reshape_b_only_on_first_run(false),
      _gemm_kernel_type(CLGEMMKernelType::NATIVE),
      _is_prepared(false),
      _aux_mem(AuxTensorIdx::Count)
{
}

Member Function Documentation

◆ configure()

void configure ( const CLCompileContext & compile_context,
                 ITensorInfo * a,
                 ITensorInfo * b,
                 ITensorInfo * c,
                 ITensorInfo * output,
                 float alpha,
                 float beta,
                 const GEMMInfo & gemm_info
               )

Initialise the kernel's inputs and output.

Valid data layouts:

  • All

Valid data type configurations:

src0    src1    src2    dst
F32     F32     F32     F32
F16     F16     F16     F16
Note
  GEMM: General Matrix Multiply - [alpha * A * B + beta * C].
  All tensors must have the same data type.
  Whilst the first input tensor can be a vector, the second input tensor must be at least a matrix.
Parameters
  [in]   compile_context  The compile context to be used.
  [in]   a                First input tensor (Matrix or Vector A). Data types supported: F16/F32.
  [in]   b                Second input tensor (Matrix B). Data type supported: same as a.
  [in]   c                Third input tensor (Matrix C). It can be a nullptr if just the multiplication between a and b is needed. Data type supported: same as a.
  [out]  output           Output tensor. Data type supported: same as a.
  [in]   alpha            Weight of the matrix product.
  [in]   beta             Weight of matrix C.
  [in]   gemm_info        (Optional) Specifies if matrix A and/or matrix B have been reshaped and if the reshape of matrix B should happen only for the first run. GEMMInfo also contains information about the reshaping in case matrix A and matrix B have already been transformed.
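
As a reading aid for the formula in the note above, here is a minimal CPU reference sketch of alpha * A * B + beta * C. It is plain C++ and does not use the library: `reference_gemm`, the row-major storage, and the convention that an empty vector stands for a null C are all assumptions made for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major reference: out = alpha * (A x B) + beta * C.
// A is m x k, B is k x n, C and out are m x n.
// An empty c stands in for the "c is nullptr" case (assumption for this sketch).
std::vector<float> reference_gemm(const std::vector<float> &a, const std::vector<float> &b, const std::vector<float> &c,
                                  std::size_t m, std::size_t n, std::size_t k, float alpha, float beta)
{
    std::vector<float> out(m * n, 0.0f);
    for(std::size_t row = 0; row < m; ++row)
    {
        for(std::size_t col = 0; col < n; ++col)
        {
            float acc = 0.0f;
            for(std::size_t i = 0; i < k; ++i)
            {
                acc += a[row * k + i] * b[i * n + col];
            }
            const float bias = c.empty() ? 0.0f : beta * c[row * n + col];
            out[row * n + col] = alpha * acc + bias;
        }
    }
    return out;
}

// Usage: with C omitted, only the scaled product remains.
void example_reference_gemm()
{
    const std::vector<float> out = reference_gemm({ 1.0f, 2.0f }, { 3.0f, 4.0f }, {}, 1, 1, 2, 1.0f, 0.0f);
    assert(out[0] == 11.0f); // 1*3 + 2*4
}
```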

Definition at line 461 of file ClGemm.cpp.

References ITensorInfo::are_values_constant(), ARM_COMPUTE_ERROR, ARM_COMPUTE_ERROR_ON_NULLPTR, ARM_COMPUTE_ERROR_THROW_ON, ARM_COMPUTE_LOG_PARAMS, ITensorInfo::data_type(), ITensorInfo::dimension(), CLScheduler::get(), arm_compute::helpers::float_ops::is_zero(), arm_compute::test::validation::k, arm_compute::test::validation::m, arm_compute::test::validation::n, arm_compute::NATIVE, GEMMInfo::reinterpret_input_as_3d(), GEMMInfo::reshape_b_only_on_first_run(), arm_compute::RESHAPED, arm_compute::RESHAPED_ONLY_RHS, GEMMInfo::retain_internal_weights(), CLScheduler::target(), and ClGemm::validate().

Referenced by ClWinogradConv2d::configure().

{
    ARM_COMPUTE_ERROR_ON_NULLPTR(a, b, output);

    // Perform validation step
    ARM_COMPUTE_ERROR_THROW_ON(validate(a, b, c, output, alpha, beta, gemm_info));
    ARM_COMPUTE_LOG_PARAMS(a, b, c, output, alpha, beta, gemm_info);

    // Check if we need to reshape the matrix B only on the first run
    _reshape_b_only_on_first_run = gemm_info.reshape_b_only_on_first_run();
    _is_prepared = gemm_info.retain_internal_weights();

    bool reinterpret_input_as_3d = gemm_info.reinterpret_input_as_3d();
    const unsigned int m = reinterpret_input_as_3d ? (a->dimension(1) * a->dimension(2)) : a->dimension(1);
    const unsigned int n = b->dimension(0);
    const unsigned int k = a->dimension(0);
    const unsigned int batch_size = reinterpret_input_as_3d ? a->dimension(3) : a->dimension(2);

    // Select GEMMType
    _gemm_kernel_type = auto_select_gemm_kernel(auto_heuristics::CommonQuery{ CLScheduler::get().target(), a->data_type(), m, n, k, batch_size }, _reshape_b_only_on_first_run,
                                                b->are_values_constant());

    const bool fuse_add_c = (!(helpers::float_ops::is_zero(beta)) && c != nullptr);

    ITensorInfo *c_to_use = fuse_add_c ? c : nullptr;

    switch(_gemm_kernel_type)
    {
        case CLGEMMKernelType::NATIVE:
        {
            configure_native(compile_context, a, b, c_to_use, output, alpha, beta, gemm_info);
            break;
        }
        case CLGEMMKernelType::RESHAPED:
        {
            configure_reshaped(compile_context, a, b, c_to_use, output, alpha, beta, gemm_info);
            break;
        }
        case CLGEMMKernelType::RESHAPED_ONLY_RHS:
        {
            configure_reshaped_only_rhs(compile_context, a, b, c_to_use, output, alpha, beta, gemm_info);
            break;
        }
        default:
        {
            ARM_COMPUTE_ERROR("GEMMType not supported");
        }
    }
}
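
The way the listing derives m, n, k and batch_size from the tensor dimensions can be tried in isolation. The sketch below is plain C++ under stated assumptions: `GemmDims` and `derive_dims` are hypothetical names, and a four-element array stands in for ITensorInfo::dimension(i), with index 0 as the innermost dimension.

```cpp
#include <array>
#include <cassert>

// Hypothetical holder for the values passed to the kernel-selection heuristic.
struct GemmDims
{
    unsigned int m;
    unsigned int n;
    unsigned int k;
    unsigned int batch_size;
};

// shape[i] plays the role of ITensorInfo::dimension(i) in the listing.
GemmDims derive_dims(const std::array<unsigned int, 4> &a_shape, const std::array<unsigned int, 4> &b_shape, bool reinterpret_input_as_3d)
{
    GemmDims d{};
    // When A is reinterpreted as 3D, dimensions 1 and 2 are collapsed into the row count.
    d.m = reinterpret_input_as_3d ? (a_shape[1] * a_shape[2]) : a_shape[1];
    d.n = b_shape[0];
    d.k = a_shape[0];
    // The batch dimension moves up by one index in the 3D case.
    d.batch_size = reinterpret_input_as_3d ? a_shape[3] : a_shape[2];
    return d;
}

// Usage: a 64-wide, 8-high A with 4 planes and 2 batches.
void example_derive_dims()
{
    const std::array<unsigned int, 4> a_shape{ { 64, 8, 4, 2 } };
    const std::array<unsigned int, 4> b_shape{ { 32, 64, 1, 1 } };
    assert(derive_dims(a_shape, b_shape, true).m == 32);
}
```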

◆ prepare()

void prepare ( ITensorPack & constants )
override virtual

Prepare the function for executing.

Any one-off pre-processing step required by the function is handled here.

Parameters
  [in]  constants  Vector that contains the constant tensors.
Note
  The prepare stage might not need all of the function's buffers' backing memory to be available in order to execute.

Reimplemented from ICLOperator.

Definition at line 637 of file ClGemm.cpp.

References arm_compute::ACL_DST, arm_compute::ACL_SRC, arm_compute::ACL_SRC_1, ARM_COMPUTE_ERROR_ON, ARM_COMPUTE_LOG_INFO_WITH_FUNCNAME_ACL, ICLTensor::cl_buffer(), CLScheduler::enqueue_op(), CLScheduler::get(), CLAuxTensorHandler::get(), ITensorPack::get_const_tensor(), ITensorPack::get_tensor(), and arm_compute::offset_int_vec().

Referenced by ClWinogradConv2d::prepare(), and ClGemm::run().

{
    if(!_is_prepared)
    {
        const ITensor *src1 = constants.get_const_tensor(ACL_SRC_1);
        ICLTensor *rhs_aux = utils::cast::polymorphic_downcast<ICLTensor *>(constants.get_tensor(offset_int_vec(RhsReshape)));

        // If memory for RHS is persistent and src1 is provided re-transform else assume that RHS is transformed
        if((_aux_mem[AuxTensorIdx::RhsReshape].lifetime == MemoryLifetime::Persistent) && (src1 != nullptr && rhs_aux != nullptr) && rhs_aux)
        {
            ARM_COMPUTE_LOG_INFO_WITH_FUNCNAME_ACL("Transforming RHS Matrix!");

            CLAuxTensorHandler rhs_reshaped(_tmp_b, *rhs_aux);
            ARM_COMPUTE_ERROR_ON(rhs_reshaped.get()->cl_buffer().get() == nullptr);

            ITensorPack reshape_rhs_pack{ { ACL_SRC, src1 }, { ACL_DST, rhs_reshaped.get() } };
            CLScheduler::get().enqueue_op(*_reshape_rhs_kernel, reshape_rhs_pack, true);
        }
        _is_prepared = true;
    }
}
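
The one-off behaviour of the prepare stage can be illustrated without OpenCL. This is a minimal sketch, not the library's implementation: `PrepareOnce` is a hypothetical class, a counter stands in for enqueuing the RHS reshape kernel, and the single `can_reshape_rhs` flag collapses the lifetime and null checks from the listing into one condition.

```cpp
#include <cassert>

// Hypothetical stand-in for an operator whose RHS reshape runs at most once.
class PrepareOnce
{
public:
    // Mirrors configure() seeding _is_prepared from GEMMInfo::retain_internal_weights().
    explicit PrepareOnce(bool retain_internal_weights)
        : _is_prepared(retain_internal_weights)
    {
    }

    // can_reshape_rhs summarises "memory is persistent and the tensors are provided".
    void prepare(bool can_reshape_rhs)
    {
        if(!_is_prepared)
        {
            if(can_reshape_rhs)
            {
                ++_reshape_count; // stands in for enqueuing the RHS reshape kernel
            }
            // The flag is set even if nothing was reshaped, as in the listing.
            _is_prepared = true;
        }
    }

    // run() always calls prepare() first; later calls become no-ops.
    void run(bool can_reshape_rhs)
    {
        prepare(can_reshape_rhs);
    }

    int reshape_count() const
    {
        return _reshape_count;
    }

private:
    bool _is_prepared;
    int  _reshape_count{ 0 };
};

// Usage: two runs trigger a single reshape.
void example_prepare_once()
{
    PrepareOnce op(false);
    op.run(true);
    op.run(true);
    assert(op.reshape_count() == 1);
}
```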

◆ run()

void run ( ITensorPack & tensors )
override virtual

Run the kernels contained in the function.

Parameters
  [in]  tensors  Vector that contains the tensors to operate on.

Reimplemented from ICLOperator.

Definition at line 557 of file ClGemm.cpp.

References arm_compute::ACL_DST, arm_compute::ACL_SRC, arm_compute::ACL_SRC_0, arm_compute::ACL_SRC_1, ITensorPack::add_const_tensor(), ARM_COMPUTE_ERROR, ARM_COMPUTE_ERROR_ON, ARM_COMPUTE_ERROR_ON_NULLPTR, BorderSize::bottom, arm_compute::test::validation::dst, CLScheduler::enqueue_op(), CLScheduler::get(), CLAuxTensorHandler::get(), ITensorPack::get_const_tensor(), ITensorPack::get_tensor(), ITensor::info(), arm_compute::NATIVE, arm_compute::offset_int_vec(), ITensorInfo::padding(), ClGemm::prepare(), arm_compute::RESHAPED, arm_compute::RESHAPED_ONLY_RHS, and BorderSize::top.

Referenced by ClWinogradConv2d::run().

{
    const ITensor *lhs = tensors.get_const_tensor(ACL_SRC_0);
    const ITensor *rhs = tensors.get_const_tensor(ACL_SRC_1);
    ITensor *dst = tensors.get_tensor(ACL_DST);

    CLAuxTensorHandler lhs_reshaped(offset_int_vec(LhsReshape), _tmp_a, tensors, true);
    CLAuxTensorHandler rhs_reshaped(offset_int_vec(RhsReshape), _tmp_b, tensors, true);

    // Prepare the consts if needed
    prepare(tensors);

    // Run matrix multiply kernel
    switch(_gemm_kernel_type)
    {
        case CLGEMMKernelType::NATIVE:
        {
            CLScheduler::get().enqueue_op(*_mm_native_kernel, tensors, true);
            break;
        }
        case CLGEMMKernelType::RESHAPED:
        {
            // Run interleave kernel
            ITensorPack reshape_lhs_pack{ { ACL_SRC, lhs }, { ACL_DST, lhs_reshaped.get() } };
            CLScheduler::get().enqueue_op(*_reshape_lhs_kernel, reshape_lhs_pack, false);

            if(!_reshape_b_only_on_first_run)
            {
                // Run transpose kernel
                ITensorPack reshape_rhs_pack{ { ACL_SRC, rhs }, { ACL_DST, rhs_reshaped.get() } };
                CLScheduler::get().enqueue_op(*_reshape_rhs_kernel, reshape_rhs_pack, false);
            }
            // Copy original tensor pack and overwrite lhs and rhs with reshaped counterparts
            ITensorPack gemm_reshaped_pack(tensors);
            gemm_reshaped_pack.add_const_tensor(ACL_SRC_0, lhs_reshaped.get());
            gemm_reshaped_pack.add_const_tensor(ACL_SRC_1, rhs_reshaped.get());

            if(_gemm_kernel_type == CLGEMMKernelType::RESHAPED)
            {
                CLScheduler::get().enqueue_op(*_mm_reshaped_kernel, gemm_reshaped_pack, true);
            }
            break;
        }
        case CLGEMMKernelType::RESHAPED_ONLY_RHS:
        {
            if(!_reshape_b_only_on_first_run)
            {
                // Run transpose kernel
                ITensorPack reshape_rhs_pack{ { ACL_SRC, rhs }, { ACL_DST, rhs_reshaped.get() } };
                CLScheduler::get().enqueue_op(*_reshape_rhs_kernel, reshape_rhs_pack, false);
            }
            // In case of RESHAPED_ONLY_RHS, we need to check the padding requirement
            // Check if the lhs or dst tensors have padding
            const unsigned int cross_plane_pad_lhs = lhs->info()->padding().top + lhs->info()->padding().bottom;
            const unsigned int cross_plane_pad_dst = dst->info()->padding().top + dst->info()->padding().bottom;
            bool has_pad_y = (cross_plane_pad_lhs != 0) || (cross_plane_pad_dst != 0);

            // Copy original tensor pack and overwrite rhs with reshaped counterpart
            ITensorPack gemm_reshaped_onlyrhs_pack(tensors);
            gemm_reshaped_onlyrhs_pack.add_const_tensor(ACL_SRC_1, rhs_reshaped.get());

            if(has_pad_y)
            {
                ARM_COMPUTE_ERROR_ON(has_pad_y);
            }
            else
            {
                CLScheduler::get().enqueue_op(*_mm_reshaped_only_rhs_kernel, gemm_reshaped_onlyrhs_pack, true);
            }
            break;
        }
        default:
        {
            ARM_COMPUTE_ERROR("GEMMType not supported");
        }
    }
}
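
The padding check on the RESHAPED_ONLY_RHS path can be isolated as a small predicate. This is a sketch in plain C++: `Padding` is a hypothetical struct standing in for the value returned by ITensorInfo::padding(), of which the listing only reads `top` and `bottom`, and `has_cross_plane_padding` is a name introduced here.

```cpp
#include <cassert>

// Hypothetical stand-in for the border/padding description of a tensor.
struct Padding
{
    unsigned int top;
    unsigned int right;
    unsigned int bottom;
    unsigned int left;
};

// True if either the LHS or the destination has any padding along Y,
// which is the case the listing treats as an error for RESHAPED_ONLY_RHS.
bool has_cross_plane_padding(const Padding &lhs, const Padding &dst)
{
    const unsigned int cross_plane_pad_lhs = lhs.top + lhs.bottom;
    const unsigned int cross_plane_pad_dst = dst.top + dst.bottom;
    return (cross_plane_pad_lhs != 0) || (cross_plane_pad_dst != 0);
}

// Usage: padding on the X axis alone does not trip the check.
void example_has_cross_plane_padding()
{
    const Padding x_only{ 0, 4, 0, 4 };
    assert(!has_cross_plane_padding(x_only, x_only));
}
```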

◆ validate()

Status validate ( const ITensorInfo * a,
                  const ITensorInfo * b,
                  const ITensorInfo * c,
                  const ITensorInfo * output,
                  float alpha,
                  float beta,
                  const GEMMInfo & gemm_info
                )
static

Static function to check if given info will lead to a valid configuration.

Similar to ClGemm::configure()

Returns
a status

Definition at line 511 of file ClGemm.cpp.

References ITensorInfo::are_values_constant(), ARM_COMPUTE_RETURN_ERROR_MSG, ARM_COMPUTE_RETURN_ON_ERROR, ITensorInfo::data_type(), ITensorInfo::dimension(), CLScheduler::get(), arm_compute::helpers::float_ops::is_zero(), arm_compute::test::validation::k, arm_compute::test::validation::m, arm_compute::test::validation::n, arm_compute::NATIVE, GEMMInfo::reinterpret_input_as_3d(), GEMMInfo::reshape_b_only_on_first_run(), arm_compute::RESHAPED, arm_compute::RESHAPED_ONLY_RHS, and CLScheduler::target().

Referenced by ClGemm::configure(), NEElementwiseUnaryLayer< op >::validate(), NEPReluLayer::validate(), CLPReluLayer::validate(), CLSoftmaxLayerGeneric< IS_LOG >::validate(), NEGEMMConv2d::validate(), CLGEMM::validate(), and CLGEMMLowpMatrixMultiplyCore::validate().

{
    // Get the GPU target
    bool reinterpret_input_as_3d = gemm_info.reinterpret_input_as_3d();
    const unsigned int m = reinterpret_input_as_3d ? (a->dimension(1) * a->dimension(2)) : a->dimension(1);
    const unsigned int n = b->dimension(0);
    const unsigned int k = a->dimension(0);
    const unsigned int batch_size = reinterpret_input_as_3d ? a->dimension(3) : a->dimension(2);

    // Select GEMMType
    CLGEMMKernelType gemm_kernel_type = auto_select_gemm_kernel(auto_heuristics::CommonQuery
    {
        CLScheduler::get().target(), a->data_type(), m, n, k, batch_size,
    },
    gemm_info.reshape_b_only_on_first_run(), b->are_values_constant());

    const bool fuse_add_c = (!(helpers::float_ops::is_zero(beta)) && c != nullptr);

    const ITensorInfo *c_to_use = fuse_add_c ? c : nullptr;

    switch(gemm_kernel_type)
    {
        case CLGEMMKernelType::NATIVE:
        {
            ARM_COMPUTE_RETURN_ON_ERROR(validate_native(a, b, c_to_use, output, alpha, beta, gemm_info));
            break;
        }
        case CLGEMMKernelType::RESHAPED:
        {
            ARM_COMPUTE_RETURN_ON_ERROR(validate_reshaped(a, b, c_to_use, output, alpha, beta, gemm_info));
            break;
        }
        case CLGEMMKernelType::RESHAPED_ONLY_RHS:
        {
            ARM_COMPUTE_RETURN_ON_ERROR(validate_reshaped_only_rhs(a, b, c_to_use, output, alpha, beta, gemm_info));
            break;
        }
        default:
        {
            ARM_COMPUTE_RETURN_ERROR_MSG("GEMMType not supported");
        }
    }

    return Status{};
}
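
Both configure() and validate() drop matrix C when beta is effectively zero or when C is absent. The sketch below models that decision in plain C++. The body of `is_zero` is an assumption (an absolute-value comparison against the epsilon); only the idea of a tolerance-based zero test comes from the listing, and `fuse_add_c` is written here as a free function for illustration.

```cpp
#include <cassert>
#include <cmath>

// Assumed implementation of a tolerance-based zero test; the default epsilon is a choice for this sketch.
bool is_zero(float a, float epsilon = 0.00001f)
{
    return std::abs(a) <= epsilon;
}

// C is fused into the GEMM only if it exists and beta is not (approximately) zero.
bool fuse_add_c(float beta, const float *c)
{
    return !is_zero(beta) && c != nullptr;
}

// Usage: a zero beta makes C irrelevant even when it is provided.
void example_fuse_add_c()
{
    const float c_data = 1.0f;
    assert(!fuse_add_c(0.0f, &c_data));
}
```

In the listings this result selects between passing `c` and passing `nullptr` as `c_to_use` to the per-type configure and validate helpers.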

◆ workspace()

experimental::MemoryRequirements workspace ( ) const
override virtual

Return the memory requirements required by the workspace.

Reimplemented from ICLOperator.

Definition at line 659 of file ClGemm.cpp.

Referenced by ClWinogradConv2d::configure().

{
    return _aux_mem;
}

The documentation for this class was generated from the following files: