Compute Library
 22.11
ClGemm Class Reference

Basic function to execute GEMM on OpenCL.

#include <ClGemm.h>


Public Member Functions

 ClGemm ()
 Constructor.

void configure (const CLCompileContext &compile_context, ITensorInfo *a, ITensorInfo *b, ITensorInfo *c, ITensorInfo *output, float alpha, float beta, const GEMMInfo &gemm_info)
 Initialise the kernel's inputs and output.

void run (ITensorPack &tensors) override
 Run the kernels contained in the function.

void prepare (ITensorPack &constants) override
 Prepare the function for executing.

experimental::MemoryRequirements workspace () const override
 Return the memory requirements of the workspace.

- Public Member Functions inherited from ICLOperator
 ICLOperator (IRuntimeContext *ctx=nullptr)
 Constructor.

 ICLOperator (const ICLOperator &)=delete
 Prevent instances of this class from being copied (as this class contains pointers).

 ICLOperator (ICLOperator &&)=default
 Default move constructor.

ICLOperator & operator= (const ICLOperator &)=delete
 Prevent instances of this class from being copied (as this class contains pointers).

ICLOperator & operator= (ICLOperator &&)=default
 Default move assignment operator.

- Public Member Functions inherited from IOperator
virtual ~IOperator ()=default
 Destructor.


Static Public Member Functions

static Status validate (const ITensorInfo *a, const ITensorInfo *b, const ITensorInfo *c, const ITensorInfo *output, float alpha, float beta, const GEMMInfo &gemm_info)
 Static function to check if given info will lead to a valid configuration.
 

Detailed Description

Basic function to execute GEMM on OpenCL.

This function calls the following OpenCL kernels:

  1. kernels::ClGemmReshapeLhsMatrixKernel (only if RESHAPED is selected by the heuristic model)
  2. kernels::ClGemmReshapeRhsMatrixKernel (only if either RESHAPED or RESHAPED_ONLY_RHS is selected by the select_gemm_kernel() method)
  3. kernels::ClGemmMatrixMultiplyNativeKernel (only if NATIVE is selected by the select_gemm_kernel() method)
  4. kernels::ClGemmMatrixMultiplyReshapedKernel (only if RESHAPED is selected by the select_gemm_kernel() method)
  5. kernels::ClGemmMatrixMultiplyReshapedOnlyRhsKernel (only if RESHAPED_ONLY_RHS is selected by the select_gemm_kernel() method)
  6. kernels::ClGemmMatrixMultiplyReshapedOnlyRhsMMULKernel (only if RESHAPED_ONLY_RHS_MMUL is selected by the select_gemm_kernel() method)
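
The selection rules listed above can be read as a mapping from the chosen kernel type to the sequence of kernels that gets dispatched. The following is a minimal, self-contained sketch of that mapping, not library code: the `KernelType` enum and the `kernels_for()` helper are hypothetical names introduced for illustration, and only the kernel names are taken from the list. Following the run() listing on this page, the RESHAPED_ONLY_RHS_MMUL path is assumed to reshape the RHS as well.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for CLGEMMKernelType; not the library's enum.
enum class KernelType
{
    Native,
    Reshaped,
    ReshapedOnlyRhs,
    ReshapedOnlyRhsMmul
};

// Returns the names of the kernels dispatched for a given type, in order.
// The RHS reshape may be skipped on later runs when it only has to happen once.
std::vector<std::string> kernels_for(KernelType type)
{
    switch (type)
    {
        case KernelType::Native:
            return { "ClGemmMatrixMultiplyNativeKernel" };
        case KernelType::Reshaped:
            return { "ClGemmReshapeLhsMatrixKernel",
                     "ClGemmReshapeRhsMatrixKernel",
                     "ClGemmMatrixMultiplyReshapedKernel" };
        case KernelType::ReshapedOnlyRhs:
            return { "ClGemmReshapeRhsMatrixKernel",
                     "ClGemmMatrixMultiplyReshapedOnlyRhsKernel" };
        case KernelType::ReshapedOnlyRhsMmul:
            return { "ClGemmReshapeRhsMatrixKernel",
                     "ClGemmMatrixMultiplyReshapedOnlyRhsMMULKernel" };
    }
    return {}; // unreachable for valid enum values
}
```

For example, `kernels_for(KernelType::Reshaped)` yields three entries, with the matrix multiply kernel last.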

Definition at line 56 of file ClGemm.h.

Constructor & Destructor Documentation

◆ ClGemm()

ClGemm ( )

Constructor.

Definition at line 188 of file ClGemm.cpp.

References arm_compute::NATIVE.

ClGemm::ClGemm()
    : _reshape_lhs_kernel(std::make_unique<ClGemmReshapeLhsMatrixKernel>()),
      _reshape_rhs_kernel(std::make_unique<ClGemmReshapeRhsMatrixKernel>()),
      _mm_native_kernel(std::make_unique<ClGemmMatrixMultiplyNativeKernel>()),
      _mm_reshaped_kernel(std::make_unique<ClGemmMatrixMultiplyReshapedKernel>()),
      _mm_reshaped_only_rhs_kernel(std::make_unique<ClGemmMatrixMultiplyReshapedOnlyRhsKernel>()),
      _mm_reshaped_only_rhs_mmul_kernel(std::make_unique<ClGemmMatrixMultiplyReshapedOnlyRhsMMULKernel>()),
      _tmp_a(),
      _tmp_b(),
      _reshape_b_only_on_first_run(false),
      _gemm_kernel_type(CLGEMMKernelType::NATIVE),
      _is_prepared(false),
      _aux_mem(AuxTensorIdx::Count)
{
}

Member Function Documentation

◆ configure()

void configure ( const CLCompileContext & compile_context,
                 ITensorInfo *            a,
                 ITensorInfo *            b,
                 ITensorInfo *            c,
                 ITensorInfo *            output,
                 float                    alpha,
                 float                    beta,
                 const GEMMInfo &         gemm_info
               )

Initialise the kernel's inputs and output.

Valid data layouts:

  • All

Valid data type configurations:

src0  src1  src2  dst
F32   F32   F32   F32
F16   F16   F16   F16
Note
GEMM: General Matrix Multiply - [alpha * A * B + beta * C].
All tensors must have the same data type.
Whilst the first input tensor can be a vector, the second input tensor must be at least a matrix.
Parameters
[in]   compile_context  The compile context to be used.
[in]   a                First input tensor (Matrix or Vector A). Data types supported: F16/F32.
[in]   b                Second input tensor (Matrix B). Data type supported: same as a.
[in]   c                Third input tensor (Matrix C). It can be a nullptr if just the multiplication between a and b is needed. Data type supported: same as a.
[out]  output           Output tensor. Data type supported: same as a.
[in]   alpha            Weight of the matrix product.
[in]   beta             Weight of matrix C.
[in]   gemm_info        (Optional) Specifies if the matrix A and/or matrix B have been reshaped and if the reshape of matrix B should happen only for the first run. GEMMInfo also contains information about the reshaping in case matrix A and matrix B have been already transformed.
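
The formula in the note, alpha * A * B + beta * C, can be checked against a plain CPU reference. The sketch below is only an illustration of the arithmetic and does not use or mirror any library kernel: `gemm_reference()` is a hypothetical helper, the row-major layout is an assumption made for readability, and an empty `c` vector stands in for passing a nullptr as c.

```cpp
#include <cstddef>
#include <vector>

// Reference GEMM: returns alpha * A * B + beta * C.
// A is m x k, B is k x n, C is m x n; all are stored row-major (an assumption
// of this sketch). An empty C models the "c is nullptr" case.
std::vector<float> gemm_reference(const std::vector<float> &a,
                                  const std::vector<float> &b,
                                  const std::vector<float> &c,
                                  std::size_t m, std::size_t n, std::size_t k,
                                  float alpha, float beta)
{
    std::vector<float> out(m * n, 0.0f);
    for (std::size_t row = 0; row < m; ++row)
    {
        for (std::size_t col = 0; col < n; ++col)
        {
            // Dot product of one row of A with one column of B.
            float acc = 0.0f;
            for (std::size_t i = 0; i < k; ++i)
            {
                acc += a[row * k + i] * b[i * n + col];
            }
            const float bias = c.empty() ? 0.0f : beta * c[row * n + col];
            out[row * n + col] = alpha * acc + bias;
        }
    }
    return out;
}
```

With A = [[1, 2], [3, 4]], B the 2x2 identity, C filled with ones, alpha = 2 and beta = 3, every output element is 2 * A + 3.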

Definition at line 557 of file ClGemm.cpp.

References ITensorInfo::are_values_constant(), ARM_COMPUTE_ERROR, ARM_COMPUTE_ERROR_ON_NULLPTR, ARM_COMPUTE_ERROR_THROW_ON, ARM_COMPUTE_LOG_PARAMS, ITensorInfo::data_type(), ITensorInfo::dimension(), CLScheduler::get(), arm_compute::helpers::float_ops::is_zero(), arm_compute::test::validation::k, arm_compute::test::validation::m, arm_compute::test::validation::n, arm_compute::NATIVE, GEMMInfo::reinterpret_input_as_3d(), GEMMInfo::reshape_b_only_on_first_run(), arm_compute::RESHAPED, arm_compute::RESHAPED_ONLY_RHS, arm_compute::RESHAPED_ONLY_RHS_MMUL, GEMMInfo::retain_internal_weights(), CLScheduler::target(), and ClGemm::validate().

Referenced by ClWinogradConv2d::configure().

void ClGemm::configure(const CLCompileContext &compile_context, ITensorInfo *a, ITensorInfo *b, ITensorInfo *c, ITensorInfo *output, float alpha, float beta, const GEMMInfo &gemm_info)
{
    ARM_COMPUTE_ERROR_ON_NULLPTR(a, b, output);

    // Perform validation step
    ARM_COMPUTE_ERROR_THROW_ON(validate(a, b, c, output, alpha, beta, gemm_info));
    ARM_COMPUTE_LOG_PARAMS(a, b, c, output, alpha, beta, gemm_info);

    // Check if we need to reshape the matrix B only on the first run
    _reshape_b_only_on_first_run = gemm_info.reshape_b_only_on_first_run();
    _is_prepared                 = gemm_info.retain_internal_weights();

    bool               reinterpret_input_as_3d = gemm_info.reinterpret_input_as_3d();
    const unsigned int m                       = reinterpret_input_as_3d ? (a->dimension(1) * a->dimension(2)) : a->dimension(1);
    const unsigned int n                       = b->dimension(0);
    const unsigned int k                       = a->dimension(0);
    const unsigned int batch_size              = reinterpret_input_as_3d ? a->dimension(3) : a->dimension(2);

    // Select GEMMType
    _gemm_kernel_type = auto_select_gemm_kernel(auto_heuristics::CommonQuery{ CLScheduler::get().target(), a->data_type(), m, n, k, batch_size }, _reshape_b_only_on_first_run,
                                                b->are_values_constant());

    const bool fuse_add_c = (!(helpers::float_ops::is_zero(beta)) && c != nullptr);

    ITensorInfo *c_to_use = fuse_add_c ? c : nullptr;

    switch(_gemm_kernel_type)
    {
        case CLGEMMKernelType::NATIVE:
        {
            configure_native(compile_context, a, b, c_to_use, output, alpha, beta, gemm_info);
            break;
        }
        case CLGEMMKernelType::RESHAPED:
        {
            configure_reshaped(compile_context, a, b, c_to_use, output, alpha, beta, gemm_info);
            break;
        }
        case CLGEMMKernelType::RESHAPED_ONLY_RHS:
        {
            configure_reshaped_only_rhs(compile_context, a, b, c_to_use, output, alpha, beta, gemm_info);
            break;
        }
        case CLGEMMKernelType::RESHAPED_ONLY_RHS_MMUL:
        {
            configure_reshaped_only_rhs_mmul(compile_context, a, b, c_to_use, output, alpha, beta, gemm_info);
            break;
        }
        default:
        {
            ARM_COMPUTE_ERROR("GEMMType not supported");
        }
    }
}
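
The way configure() derives the problem size and decides whether to fuse the addition of C can be isolated from the OpenCL machinery. The sketch below restates those two steps with plain types. `GemmDims`, `derive_dims()` and `should_fuse_add_c()` are hypothetical names introduced here; the shape arrays are assumed to be indexed the same way as ITensorInfo::dimension(), and the default epsilon is copied from the is_zero() signature shown on this page.

```cpp
#include <array>
#include <cmath>

// Hypothetical summary of the GEMM problem size; not a library type.
struct GemmDims
{
    unsigned int m;
    unsigned int n;
    unsigned int k;
    unsigned int batch_size;
};

// a_shape and b_shape are assumed to be indexed like ITensorInfo::dimension(i).
GemmDims derive_dims(const std::array<unsigned int, 4> &a_shape,
                     const std::array<unsigned int, 4> &b_shape,
                     bool reinterpret_input_as_3d)
{
    GemmDims dims{};
    // When the input is reinterpreted as 3D, dimensions 1 and 2 of A are collapsed into M.
    dims.m          = reinterpret_input_as_3d ? a_shape[1] * a_shape[2] : a_shape[1];
    dims.n          = b_shape[0];
    dims.k          = a_shape[0];
    dims.batch_size = reinterpret_input_as_3d ? a_shape[3] : a_shape[2];
    return dims;
}

// C is fused only when it is present and beta is not (approximately) zero.
bool should_fuse_add_c(float beta, bool has_c, float epsilon = 0.00001f)
{
    const bool beta_is_zero = std::fabs(beta) <= epsilon;
    return !beta_is_zero && has_c;
}
```

For an A shape of {8, 4, 3, 2} and a B shape of {5, 8, 1, 1}, reinterpreting the input as 3D gives m = 12 and batch_size = 2, whereas the default gives m = 4 and batch_size = 3.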

◆ prepare()

void prepare ( ITensorPack & constants )
override virtual

Prepare the function for executing.

Any one-off pre-processing step required by the function is handled here.

Parameters
[in]  constants  Vector that contains the constant tensors.
Note
The prepare stage might not need all of the function's buffers' backing memory to be available in order to execute.

Reimplemented from ICLOperator.

Definition at line 771 of file ClGemm.cpp.

References arm_compute::ACL_DST, arm_compute::ACL_SRC, arm_compute::ACL_SRC_1, ARM_COMPUTE_ERROR_ON, ARM_COMPUTE_LOG_INFO_WITH_FUNCNAME_ACL, ICLTensor::cl_buffer(), CLScheduler::enqueue_op(), CLScheduler::get(), CLAuxTensorHandler::get(), ITensorPack::get_const_tensor(), ITensorPack::get_tensor(), and arm_compute::offset_int_vec().

Referenced by ClWinogradConv2d::prepare(), and ClGemm::run().

void ClGemm::prepare(ITensorPack &constants)
{
    if(!_is_prepared)
    {
        const ITensor *src1    = constants.get_const_tensor(ACL_SRC_1);
        ICLTensor     *rhs_aux = utils::cast::polymorphic_downcast<ICLTensor *>(constants.get_tensor(offset_int_vec(RhsReshape)));

        // If memory for RHS is persistent and src1 is provided re-transform else assume that RHS is transformed
        if((_aux_mem[AuxTensorIdx::RhsReshape].lifetime == MemoryLifetime::Persistent) && (src1 != nullptr && rhs_aux != nullptr) && rhs_aux)
        {
            ARM_COMPUTE_LOG_INFO_WITH_FUNCNAME_ACL("Transforming RHS Matrix!");

            CLAuxTensorHandler rhs_reshaped(_tmp_b, *rhs_aux);
            ARM_COMPUTE_ERROR_ON(rhs_reshaped.get()->cl_buffer().get() == nullptr);

            ITensorPack reshape_rhs_pack{ { ACL_SRC, src1 }, { ACL_DST, rhs_reshaped.get() } };
            CLScheduler::get().enqueue_op(*_reshape_rhs_kernel, reshape_rhs_pack, true);
        }
        _is_prepared = true;
    }
}
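
The one-off behaviour of prepare() comes down to a flag that is checked on every call and set after the first pass. The sketch below shows that pattern in isolation. `OneOffPrepare` is a hypothetical class, the counters stand in for enqueuing kernels, and the additional conditions in the listing above (persistent auxiliary memory, src1 being provided) are deliberately left out.

```cpp
// Hypothetical operator illustrating the one-off prepare pattern; not library code.
class OneOffPrepare
{
public:
    // Seeding the flag mirrors how configure() initialises _is_prepared
    // from GEMMInfo::retain_internal_weights().
    explicit OneOffPrepare(bool retain_internal_weights)
        : _is_prepared(retain_internal_weights)
    {
    }

    // Performs the one-off work only while the flag is unset.
    void prepare()
    {
        if (!_is_prepared)
        {
            ++_reshape_count; // stands in for enqueuing the RHS reshape kernel
            _is_prepared = true;
        }
    }

    // Every run calls prepare() first, as run() does on this page.
    void run()
    {
        prepare();
        ++_run_count; // stands in for enqueuing the matrix multiply kernel
    }

    int reshape_count() const { return _reshape_count; }
    int run_count() const { return _run_count; }

private:
    bool _is_prepared;
    int  _reshape_count = 0;
    int  _run_count     = 0;
};
```

Calling run() three times on an instance constructed with `false` performs the stand-in reshape once; an instance constructed with `true` never performs it.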

◆ run()

void run ( ITensorPack & tensors )
override virtual

Run the kernels contained in the function.

Parameters
[in]  tensors  Vector that contains the tensors to operate on.

Reimplemented from ICLOperator.

Definition at line 663 of file ClGemm.cpp.

References arm_compute::ACL_DST, arm_compute::ACL_SRC, arm_compute::ACL_SRC_0, arm_compute::ACL_SRC_1, ITensorPack::add_const_tensor(), ARM_COMPUTE_ERROR, ARM_COMPUTE_ERROR_ON, ARM_COMPUTE_ERROR_ON_NULLPTR, BorderSize::bottom, arm_compute::test::validation::dst, CLScheduler::enqueue_op(), CLScheduler::get(), CLAuxTensorHandler::get(), ITensorPack::get_const_tensor(), ITensorPack::get_tensor(), ITensor::info(), arm_compute::NATIVE, arm_compute::offset_int_vec(), ITensorInfo::padding(), ClGemm::prepare(), arm_compute::RESHAPED, arm_compute::RESHAPED_ONLY_RHS, arm_compute::RESHAPED_ONLY_RHS_MMUL, and BorderSize::top.

Referenced by ClWinogradConv2d::run().

void ClGemm::run(ITensorPack &tensors)
{
    const ITensor *lhs = tensors.get_const_tensor(ACL_SRC_0);
    const ITensor *rhs = tensors.get_const_tensor(ACL_SRC_1);
    ITensor       *dst = tensors.get_tensor(ACL_DST);

    CLAuxTensorHandler lhs_reshaped(offset_int_vec(LhsReshape), _tmp_a, tensors, true);
    CLAuxTensorHandler rhs_reshaped(offset_int_vec(RhsReshape), _tmp_b, tensors, true);

    // Prepare the consts if needed
    prepare(tensors);

    // Run matrix multiply kernel
    switch(_gemm_kernel_type)
    {
        case CLGEMMKernelType::NATIVE:
        {
            CLScheduler::get().enqueue_op(*_mm_native_kernel, tensors, true);
            break;
        }
        case CLGEMMKernelType::RESHAPED:
        {
            // Run interleave kernel
            ITensorPack reshape_lhs_pack{ { ACL_SRC, lhs }, { ACL_DST, lhs_reshaped.get() } };
            CLScheduler::get().enqueue_op(*_reshape_lhs_kernel, reshape_lhs_pack, false);

            if(!_reshape_b_only_on_first_run)
            {
                // Run transpose kernel
                ITensorPack reshape_rhs_pack{ { ACL_SRC, rhs }, { ACL_DST, rhs_reshaped.get() } };
                CLScheduler::get().enqueue_op(*_reshape_rhs_kernel, reshape_rhs_pack, false);
            }
            // Copy original tensor pack and overwrite lhs and rhs with reshaped counterparts
            ITensorPack gemm_reshaped_pack(tensors);
            gemm_reshaped_pack.add_const_tensor(ACL_SRC_0, lhs_reshaped.get());
            gemm_reshaped_pack.add_const_tensor(ACL_SRC_1, rhs_reshaped.get());

            if(_gemm_kernel_type == CLGEMMKernelType::RESHAPED)
            {
                CLScheduler::get().enqueue_op(*_mm_reshaped_kernel, gemm_reshaped_pack, true);
            }
            break;
        }
        case CLGEMMKernelType::RESHAPED_ONLY_RHS:
        {
            if(!_reshape_b_only_on_first_run)
            {
                // Run transpose kernel
                ITensorPack reshape_rhs_pack{ { ACL_SRC, rhs }, { ACL_DST, rhs_reshaped.get() } };
                CLScheduler::get().enqueue_op(*_reshape_rhs_kernel, reshape_rhs_pack, false);
            }
            // In case of RESHAPED_ONLY_RHS, we need to check the padding requirement
            // Check if the lhs or dst tensors have padding
            const unsigned int cross_plane_pad_lhs = lhs->info()->padding().top + lhs->info()->padding().bottom;
            const unsigned int cross_plane_pad_dst = dst->info()->padding().top + dst->info()->padding().bottom;
            bool               has_pad_y           = (cross_plane_pad_lhs != 0) || (cross_plane_pad_dst != 0);

            // Copy original tensor pack and overwrite rhs with reshaped counterpart
            ITensorPack gemm_reshaped_onlyrhs_pack(tensors);
            gemm_reshaped_onlyrhs_pack.add_const_tensor(ACL_SRC_1, rhs_reshaped.get());

            if(has_pad_y)
            {
                ARM_COMPUTE_ERROR_ON(has_pad_y);
            }
            else
            {
                CLScheduler::get().enqueue_op(*_mm_reshaped_only_rhs_kernel, gemm_reshaped_onlyrhs_pack, true);
            }
            break;
        }
        case CLGEMMKernelType::RESHAPED_ONLY_RHS_MMUL:
        {
            if(!_reshape_b_only_on_first_run)
            {
                // Run transpose kernel
                ITensorPack reshape_rhs_pack{ { ACL_SRC, rhs }, { ACL_DST, rhs_reshaped.get() } };
                CLScheduler::get().enqueue_op(*_reshape_rhs_kernel, reshape_rhs_pack, false);
            }
            // In case of RESHAPED_ONLY_RHS, we need to check the padding requirement
            // Check if the lhs or dst tensors have padding
            const unsigned int cross_plane_pad_lhs = lhs->info()->padding().top + lhs->info()->padding().bottom;
            const unsigned int cross_plane_pad_dst = dst->info()->padding().top + dst->info()->padding().bottom;
            bool               has_pad_y           = (cross_plane_pad_lhs != 0) || (cross_plane_pad_dst != 0);

            // Copy original tensor pack and overwrite rhs with reshaped counterpart
            ITensorPack gemm_reshaped_onlyrhs_pack(tensors);
            gemm_reshaped_onlyrhs_pack.add_const_tensor(ACL_SRC_1, rhs_reshaped.get());

            if(has_pad_y)
            {
                ARM_COMPUTE_ERROR_ON(has_pad_y);
            }
            else
            {
                CLScheduler::get().enqueue_op(*_mm_reshaped_only_rhs_mmul_kernel, gemm_reshaped_onlyrhs_pack, true);
            }
            break;
        }
        default:
        {
            ARM_COMPUTE_ERROR("GEMMType not supported");
        }
    }
}
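
In the two RESHAPED_ONLY_RHS paths above, the matrix multiply kernel is only enqueued when neither the LHS nor the destination tensor carries padding along the Y axis. That check can be expressed as a small predicate, sketched here with a hypothetical `Padding` struct that stands in for the top and bottom fields of BorderSize; `has_cross_plane_padding()` is likewise a name introduced for illustration.

```cpp
// Hypothetical stand-in for the top/bottom fields of BorderSize.
struct Padding
{
    unsigned int top;
    unsigned int bottom;
};

// True when either tensor has any padding above or below its rows,
// which is the condition stored in has_pad_y in the listing.
bool has_cross_plane_padding(const Padding &lhs, const Padding &dst)
{
    const unsigned int cross_plane_pad_lhs = lhs.top + lhs.bottom;
    const unsigned int cross_plane_pad_dst = dst.top + dst.bottom;
    return (cross_plane_pad_lhs != 0) || (cross_plane_pad_dst != 0);
}
```

A caller could evaluate this predicate up front, before dispatching work, and choose a different path when it returns true; whether that is appropriate depends on the surrounding code, which this page does not show.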

◆ validate()

Status validate ( const ITensorInfo * a,
                  const ITensorInfo * b,
                  const ITensorInfo * c,
                  const ITensorInfo * output,
                  float               alpha,
                  float               beta,
                  const GEMMInfo &    gemm_info
                )
static

Static function to check if given info will lead to a valid configuration.

The parameters are similar to those of ClGemm::configure(), except that no compile context is passed.

Returns
A status.

Definition at line 612 of file ClGemm.cpp.

References ITensorInfo::are_values_constant(), ARM_COMPUTE_RETURN_ERROR_MSG, ARM_COMPUTE_RETURN_ON_ERROR, ITensorInfo::data_type(), ITensorInfo::dimension(), CLScheduler::get(), arm_compute::helpers::float_ops::is_zero(), arm_compute::test::validation::k, arm_compute::test::validation::m, arm_compute::test::validation::n, arm_compute::NATIVE, GEMMInfo::reinterpret_input_as_3d(), GEMMInfo::reshape_b_only_on_first_run(), arm_compute::RESHAPED, arm_compute::RESHAPED_ONLY_RHS, arm_compute::RESHAPED_ONLY_RHS_MMUL, and CLScheduler::target().

Referenced by ClGemm::configure(), NEElementwiseUnaryLayer< op >::validate(), NEPReluLayer::validate(), CLPReluLayer::validate(), CLSoftmaxLayerGeneric< IS_LOG >::validate(), NEGEMMConv2d::validate(), CLGEMM::validate(), and CLGEMMLowpMatrixMultiplyCore::validate().

Status ClGemm::validate(const ITensorInfo *a, const ITensorInfo *b, const ITensorInfo *c, const ITensorInfo *output, float alpha, float beta, const GEMMInfo &gemm_info)
{
    // Get the GPU target
    bool               reinterpret_input_as_3d = gemm_info.reinterpret_input_as_3d();
    const unsigned int m                       = reinterpret_input_as_3d ? (a->dimension(1) * a->dimension(2)) : a->dimension(1);
    const unsigned int n                       = b->dimension(0);
    const unsigned int k                       = a->dimension(0);
    const unsigned int batch_size              = reinterpret_input_as_3d ? a->dimension(3) : a->dimension(2);

    // Select GEMMType
    CLGEMMKernelType gemm_kernel_type = auto_select_gemm_kernel(auto_heuristics::CommonQuery
    {
        CLScheduler::get().target(), a->data_type(), m, n, k, batch_size,
    },
    gemm_info.reshape_b_only_on_first_run(), b->are_values_constant());

    const bool fuse_add_c = (!(helpers::float_ops::is_zero(beta)) && c != nullptr);

    const ITensorInfo *c_to_use = fuse_add_c ? c : nullptr;

    switch(gemm_kernel_type)
    {
        case CLGEMMKernelType::NATIVE:
        {
            ARM_COMPUTE_RETURN_ON_ERROR(validate_native(a, b, c_to_use, output, alpha, beta, gemm_info));
            break;
        }
        case CLGEMMKernelType::RESHAPED:
        {
            ARM_COMPUTE_RETURN_ON_ERROR(validate_reshaped(a, b, c_to_use, output, alpha, beta, gemm_info));
            break;
        }
        case CLGEMMKernelType::RESHAPED_ONLY_RHS:
        {
            ARM_COMPUTE_RETURN_ON_ERROR(validate_reshaped_only_rhs(a, b, c_to_use, output, alpha, beta, gemm_info));
            break;
        }
        case CLGEMMKernelType::RESHAPED_ONLY_RHS_MMUL:
        {
            ARM_COMPUTE_RETURN_ON_ERROR(validate_reshaped_only_rhs_mmul(a, b, c_to_use, output, alpha, beta, gemm_info));
            break;
        }
        default:
        {
            ARM_COMPUTE_RETURN_ERROR_MSG("GEMMType not supported");
        }
    }

    return Status{};
}
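
The relationship between validate() and configure() follows a common pattern: a static check returns a status object, and the configuring call turns a failed status into an exception. The sketch below illustrates that pattern with plain types only. `Status`, `GemmKind`, `validate_kind()` and `configure_kind()` are hypothetical stand-ins, and the single `shapes_compatible` flag replaces the per-kernel validate_*() helpers, whose bodies are not shown on this page.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-in for arm_compute::Status.
struct Status
{
    bool        ok;
    std::string message;
};

// Hypothetical stand-in for CLGEMMKernelType, with an extra value to reach the default branch.
enum class GemmKind
{
    Native,
    Reshaped,
    ReshapedOnlyRhs,
    ReshapedOnlyRhsMmul,
    Unknown
};

// Returns an error status instead of throwing, so callers can probe a configuration.
Status validate_kind(GemmKind kind, bool shapes_compatible)
{
    switch (kind)
    {
        case GemmKind::Native:
        case GemmKind::Reshaped:
        case GemmKind::ReshapedOnlyRhs:
        case GemmKind::ReshapedOnlyRhsMmul:
            if (!shapes_compatible)
            {
                return Status{ false, "Incompatible shapes" };
            }
            break;
        default:
            return Status{ false, "GEMMType not supported" };
    }
    return Status{ true, "" };
}

// Validates first and throws on failure, in the spirit of ARM_COMPUTE_ERROR_THROW_ON.
void configure_kind(GemmKind kind, bool shapes_compatible)
{
    const Status status = validate_kind(kind, shapes_compatible);
    if (!status.ok)
    {
        throw std::runtime_error(status.message);
    }
}
```

Keeping the check separate lets a caller test whether a configuration would be accepted without committing to it.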

◆ workspace()

experimental::MemoryRequirements workspace ( ) const
override virtual

Return the memory requirements of the workspace.

Reimplemented from ICLOperator.

Definition at line 793 of file ClGemm.cpp.

Referenced by ClWinogradConv2d::configure().

experimental::MemoryRequirements ClGemm::workspace() const
{
    return _aux_mem;
}

The documentation for this class was generated from the following files: