Compute Library
 21.08
ClGemm Class Reference

Basic function to execute GEMM on OpenCL.

#include <ClGemm.h>


Public Member Functions

 ClGemm ()
 Constructor.
 
void configure (const CLCompileContext &compile_context, ITensorInfo *a, ITensorInfo *b, ITensorInfo *c, ITensorInfo *output, float alpha, float beta, const GEMMInfo &gemm_info)
 Initialise the kernel's inputs and output.
 
void run (ITensorPack &tensors) override
 Run the kernels contained in the function.
 
void prepare (ITensorPack &constants) override
 Prepare the function for executing.
 
experimental::MemoryRequirements workspace () const override
 Return the memory requirements of the workspace.
 
- Public Member Functions inherited from ICLOperator
 ICLOperator (IRuntimeContext *ctx=nullptr)
 Constructor.
 
 ICLOperator (const ICLOperator &)=delete
 Prevent instances of this class from being copied (as this class contains pointers).
 
 ICLOperator (ICLOperator &&)=default
 Default move constructor.
 
ICLOperator & operator= (const ICLOperator &)=delete
 Prevent instances of this class from being copied (as this class contains pointers).
 
ICLOperator & operator= (ICLOperator &&)=default
 Default move assignment operator.
 
- Public Member Functions inherited from IOperator
virtual ~IOperator ()=default
 Destructor.
 

Static Public Member Functions

static Status validate (const ITensorInfo *a, const ITensorInfo *b, const ITensorInfo *c, const ITensorInfo *output, float alpha, float beta, const GEMMInfo &gemm_info)
 Static function to check if given info will lead to a valid configuration.
 

Detailed Description

Basic function to execute GEMM on OpenCL.

This function calls the following OpenCL kernels:

  1. kernels::ClGemmReshapeLhsMatrixKernel (only if RESHAPED_V1 or RESHAPED is selected by the kernel-selection heuristic)
  2. kernels::ClGemmReshapeRhsMatrixKernel (only if RESHAPED_V1, RESHAPED or RESHAPED_ONLY_RHS is selected)
  3. kernels::ClGemmMatrixMultiplyKernel (only if NATIVE_V1 or RESHAPED_V1 is selected)
  4. kernels::ClGemmMatrixMultiplyReshapedKernel (only if RESHAPED is selected)
  5. kernels::ClGemmMatrixMultiplyReshapedOnlyRhsKernel (only if RESHAPED_ONLY_RHS is selected)

Definition at line 55 of file ClGemm.h.
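
The kernel list above can be cross-checked against the switch in ClGemm::run() documented further down this page. The following is a minimal standalone sketch of that mapping, not library code: GemmKernelType and kernels_for() are hypothetical names introduced for illustration, and the padding-dependent fallback for RESHAPED_ONLY_RHS is left out.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for arm_compute::CLGEMMKernelType, limited to the
// four values referenced on this page.
enum class GemmKernelType
{
    NATIVE_V1,
    RESHAPED_V1,
    RESHAPED,
    RESHAPED_ONLY_RHS
};

// Lists, in order, the kernels one run() call would enqueue for a given type.
// Padding-driven fallback selection is ignored here.
std::vector<std::string> kernels_for(GemmKernelType type, bool reshape_b_only_on_first_run)
{
    std::vector<std::string> sequence;

    const bool reshapes_lhs = (type == GemmKernelType::RESHAPED_V1) || (type == GemmKernelType::RESHAPED);
    const bool reshapes_rhs = reshapes_lhs || (type == GemmKernelType::RESHAPED_ONLY_RHS);

    if(reshapes_lhs)
    {
        sequence.push_back("ClGemmReshapeLhsMatrixKernel");
    }
    // run() does not enqueue the RHS reshape when reshape_b_only_on_first_run is set.
    if(reshapes_rhs && !reshape_b_only_on_first_run)
    {
        sequence.push_back("ClGemmReshapeRhsMatrixKernel");
    }

    switch(type)
    {
        case GemmKernelType::NATIVE_V1:
        case GemmKernelType::RESHAPED_V1:
            sequence.push_back("ClGemmMatrixMultiplyKernel");
            break;
        case GemmKernelType::RESHAPED:
            sequence.push_back("ClGemmMatrixMultiplyReshapedKernel");
            break;
        case GemmKernelType::RESHAPED_ONLY_RHS:
            sequence.push_back("ClGemmMatrixMultiplyReshapedOnlyRhsKernel");
            break;
    }
    return sequence;
}
```

Calling kernels_for(GemmKernelType::RESHAPED, false) yields both reshape kernels followed by the reshaped matrix-multiply kernel, which is the longest sequence the function can enqueue in one run.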

Constructor & Destructor Documentation

◆ ClGemm()

ClGemm ( )

Constructor.

Definition at line 200 of file ClGemm.cpp.

References arm_compute::NATIVE_V1.

201  : _mm_kernel(std::make_unique<ClGemmMatrixMultiplyKernel>()),
202  _reshape_lhs_kernel(std::make_unique<ClGemmReshapeLhsMatrixKernel>()),
203  _reshape_rhs_kernel(std::make_unique<ClGemmReshapeRhsMatrixKernel>()),
204  _mm_reshaped_kernel(std::make_unique<ClGemmMatrixMultiplyReshapedKernel>()),
205  _mm_reshaped_only_rhs_kernel(std::make_unique<ClGemmMatrixMultiplyReshapedOnlyRhsKernel>()),
206  _mm_reshaped_only_rhs_fallback_kernel(std::make_unique<ClGemmMatrixMultiplyReshapedOnlyRhsKernel>()),
207  _tmp_a(),
208  _tmp_b(),
209  _reshape_b_only_on_first_run(false),
210  _gemm_kernel_type(CLGEMMKernelType::NATIVE_V1),
211  _is_prepared(false),
212  _aux_mem(AuxTensorIdx::Count)
213 {
214 }

Member Function Documentation

◆ configure()

void configure ( const CLCompileContext & compile_context,
ITensorInfo * a,
ITensorInfo * b,
ITensorInfo * c,
ITensorInfo * output,
float  alpha,
float  beta,
const GEMMInfo & gemm_info 
)

Initialise the kernel's inputs and output.

Valid data layouts:

  • All

Valid data type configurations:

src0 src1 src2 dst
F32 F32 F32 F32
F16 F16 F16 F16
Note
GEMM: General Matrix Multiply - [alpha * A * B + beta * C].
All tensors must have the same data type.
Whilst the first input tensor can be a vector, the second input tensor must be at least a matrix
Parameters
[in] compile_context  The compile context to be used.
[in] a  First input tensor (Matrix or Vector A). Data types supported: F16/F32.
[in] b  Second input tensor (Matrix B). Data type supported: same as a.
[in] c  Third input tensor (Matrix C). It can be a nullptr if just the multiplication between a and b is needed. Data type supported: same as a.
[out] output  Output tensor. Data type supported: same as a.
[in] alpha  Weight of the matrix product.
[in] beta  Weight of matrix C.
[in] gemm_info  (Optional) Specifies if the matrix A and/or matrix B have been reshaped and if the reshape of matrix B should happen only for the first run. GEMMInfo also contains information about the reshaping in case matrix A and matrix B have been already transformed.
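
To make the formula in the note concrete, here is a plain CPU reference for alpha * A * B + beta * C. It is a sketch for checking results by hand, not the OpenCL implementation: reference_gemm() is a hypothetical helper, and the row-major std::vector layout is an assumption chosen for readability.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major reference GEMM: returns alpha * A * B + beta * C.
// A is m x k, B is k x n, C is m x n. Pass an empty vector for C to get
// alpha * A * B only, mirroring the "c can be nullptr" case.
std::vector<float> reference_gemm(const std::vector<float> &a,
                                  const std::vector<float> &b,
                                  const std::vector<float> &c,
                                  std::size_t m, std::size_t n, std::size_t k,
                                  float alpha, float beta)
{
    std::vector<float> out(m * n, 0.0f);
    for(std::size_t row = 0; row < m; ++row)
    {
        for(std::size_t col = 0; col < n; ++col)
        {
            // Dot product of one row of A with one column of B.
            float acc = 0.0f;
            for(std::size_t i = 0; i < k; ++i)
            {
                acc += a[row * k + i] * b[i * n + col];
            }
            const float bias = c.empty() ? 0.0f : beta * c[row * n + col];
            out[row * n + col] = alpha * acc + bias;
        }
    }
    return out;
}
```

For A = [1 2; 3 4], B = [5 6; 7 8], C filled with ones, alpha = 2 and beta = 3, the product A * B is [19 22; 43 50], so the result is [41 47; 89 103].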

Definition at line 558 of file ClGemm.cpp.

References ARM_COMPUTE_ERROR, ARM_COMPUTE_ERROR_ON_NULLPTR, ARM_COMPUTE_ERROR_THROW_ON, GEMMInfo::constant_weights(), ITensorInfo::data_type(), ITensorInfo::dimension(), CLScheduler::get(), arm_compute::helpers::float_ops::is_zero(), arm_compute::NATIVE_V1, GEMMInfo::reinterpret_input_as_3d(), GEMMInfo::reshape_b_only_on_first_run(), arm_compute::RESHAPED, arm_compute::RESHAPED_ONLY_RHS, arm_compute::RESHAPED_V1, GEMMInfo::retain_internal_weights(), CLScheduler::target(), and ClGemm::validate().

Referenced by ClWinogradConv2d::configure().

559 {
560  ARM_COMPUTE_ERROR_ON_NULLPTR(a, b, output);
561 
562  // Perform validation step
563  ARM_COMPUTE_ERROR_THROW_ON(validate(a, b, c, output, alpha, beta, gemm_info));
564 
565  // Check if we need to reshape the matrix B only on the first run
566  _reshape_b_only_on_first_run = gemm_info.reshape_b_only_on_first_run();
567  _is_prepared = gemm_info.retain_internal_weights();
568 
569  bool reinterpret_input_as_3d = gemm_info.reinterpret_input_as_3d();
570  const unsigned int m = reinterpret_input_as_3d ? (a->dimension(1) * a->dimension(2)) : a->dimension(1);
571  const unsigned int n = b->dimension(0);
572  const unsigned int k = a->dimension(0);
573  const unsigned int batch_size = reinterpret_input_as_3d ? a->dimension(3) : a->dimension(2);
574 
575  // Select GEMMType
576  _gemm_kernel_type = auto_select_gemm_kernel(auto_heuristics::CommonQuery{ CLScheduler::get().target(), a->data_type(), m, n, k, batch_size }, _reshape_b_only_on_first_run,
577  gemm_info.constant_weights());
578 
579  const bool fuse_add_c = (!(helpers::float_ops::is_zero(beta)) && c != nullptr);
580 
581  ITensorInfo *c_to_use = fuse_add_c ? c : nullptr;
582 
583  switch(_gemm_kernel_type)
584  {
585  case CLGEMMKernelType::NATIVE_V1:
586  {
587  configure_native_v1(compile_context, a, b, c_to_use, output, alpha, beta, gemm_info);
588  break;
589  }
590  case CLGEMMKernelType::RESHAPED_V1:
591  {
592  configure_reshaped_v1(compile_context, a, b, c_to_use, output, alpha, beta, gemm_info);
593  break;
594  }
595  case CLGEMMKernelType::RESHAPED:
596  {
597  configure_reshaped_v2(compile_context, a, b, c_to_use, output, alpha, beta, gemm_info);
598  break;
599  }
600  case CLGEMMKernelType::RESHAPED_ONLY_RHS:
601  {
602  configure_reshaped_only_rhs(compile_context, a, b, c_to_use, output, alpha, beta, gemm_info);
603  break;
604  }
605  default:
606  {
607  ARM_COMPUTE_ERROR("GEMMType not supported");
608  }
609  }
610 }
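
The way configure() derives m, n, k and batch_size from the tensor shapes can be reproduced without the library. The sketch below assumes shapes are given as four-element arrays indexed like ITensorInfo::dimension(), with index 0 as the innermost dimension; GemmDims and derive_gemm_dims() are hypothetical names.

```cpp
#include <array>
#include <cassert>

// Problem sizes handed to the kernel-selection heuristic.
struct GemmDims
{
    unsigned int m;
    unsigned int n;
    unsigned int k;
    unsigned int batch_size;
};

// a_shape and b_shape are indexed like ITensorInfo::dimension(i).
GemmDims derive_gemm_dims(const std::array<unsigned int, 4> &a_shape,
                          const std::array<unsigned int, 4> &b_shape,
                          bool reinterpret_input_as_3d)
{
    GemmDims dims{};
    // When A is reinterpreted as 3D, dimensions 1 and 2 are collapsed into the row count.
    dims.m          = reinterpret_input_as_3d ? (a_shape[1] * a_shape[2]) : a_shape[1];
    dims.n          = b_shape[0];
    dims.k          = a_shape[0];
    dims.batch_size = reinterpret_input_as_3d ? a_shape[3] : a_shape[2];
    return dims;
}
```

With a_shape = {64, 8, 4, 2} and b_shape = {32, 64, 1, 1}, the default interpretation gives m = 8 and batch_size = 4, while reinterpret_input_as_3d gives m = 32 and batch_size = 2; n = 32 and k = 64 in both cases.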

◆ prepare()

void prepare ( ITensorPack & constants)
override virtual

Prepare the function for executing.

Any one-off pre-processing step required by the function is handled here.

Parameters
[in] constants  Vector that contains the constant tensors.
Note
The prepare stage might not need all of the function's buffers' backing memory to be available in order to execute.

Reimplemented from ICLOperator.

Definition at line 744 of file ClGemm.cpp.

References arm_compute::ACL_DST, arm_compute::ACL_SRC, arm_compute::ACL_SRC_1, ARM_COMPUTE_ERROR_ON, ARM_COMPUTE_LOG_INFO_WITH_FUNCNAME_ACL, ICLTensor::cl_buffer(), CLScheduler::enqueue_op(), CLScheduler::get(), CLAuxTensorHandler::get(), ITensorPack::get_const_tensor(), ITensorPack::get_tensor(), and arm_compute::offset_int_vec().

Referenced by ClWinogradConv2d::prepare(), and ClGemm::run().

745 {
746  if(!_is_prepared)
747  {
748  const ITensor *src1 = constants.get_const_tensor(ACL_SRC_1);
749  ICLTensor *rhs_aux = utils::cast::polymorphic_downcast<ICLTensor *>(constants.get_tensor(offset_int_vec(RhsReshape)));
750 
751  // If memory for RHS is persistent and src1 is provided re-transform else assume that RHS is transformed
752  if((_aux_mem[AuxTensorIdx::RhsReshape].lifetime == MemoryLifetime::Persistent) && (src1 != nullptr && rhs_aux != nullptr) && rhs_aux)
753  {
754  ARM_COMPUTE_LOG_INFO_WITH_FUNCNAME_ACL("Transforming RHS Matrix!");
755 
756  CLAuxTensorHandler rhs_reshaped(_tmp_b, *rhs_aux);
757  ARM_COMPUTE_ERROR_ON(rhs_reshaped.get()->cl_buffer().get() == nullptr);
758 
759  ITensorPack reshape_rhs_pack{ { ACL_SRC, src1 }, { ACL_DST, rhs_reshaped.get() } };
760  CLScheduler::get().enqueue_op(*_reshape_rhs_kernel, reshape_rhs_pack, true);
761  }
762  _is_prepared = true;
763  }
764 }
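
The listing follows a common "prepare once" pattern: run() always calls prepare(), and a flag makes every call after the first a no-op. A minimal model of that contract is sketched below. OneOffPrepare is a hypothetical class, counters stand in for the enqueued kernels, and the single can_transform_rhs argument condenses the lifetime and null checks of the real condition.

```cpp
#include <cassert>

// Hypothetical miniature of the prepare()/run() contract; not the library's class.
class OneOffPrepare
{
public:
    // retain_internal_weights mirrors GEMMInfo::retain_internal_weights():
    // when true the operator starts out prepared and never transforms the RHS.
    explicit OneOffPrepare(bool retain_internal_weights)
        : _is_prepared(retain_internal_weights)
    {
    }

    // can_transform_rhs condenses the "persistent memory and tensors present" check.
    void prepare(bool can_transform_rhs)
    {
        if(!_is_prepared)
        {
            if(can_transform_rhs)
            {
                ++_rhs_transforms; // stands in for enqueueing the RHS reshape kernel
            }
            // The flag is set even if nothing was transformed, as in the listing.
            _is_prepared = true;
        }
    }

    void run(bool can_transform_rhs)
    {
        prepare(can_transform_rhs);
        ++_runs;
    }

    int rhs_transforms() const { return _rhs_transforms; }
    int runs() const { return _runs; }

private:
    bool _is_prepared;
    int  _rhs_transforms{ 0 };
    int  _runs{ 0 };
};
```

Running such an object twice performs the RHS transform only once, and an object constructed with retain_internal_weights set never performs it.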

◆ run()

void run ( ITensorPack & tensors)
override virtual

Run the kernels contained in the function.

Parameters
[in] tensors  Vector that contains the tensors to operate on.

Reimplemented from ICLOperator.

Definition at line 663 of file ClGemm.cpp.

References arm_compute::ACL_DST, arm_compute::ACL_SRC, arm_compute::ACL_SRC_0, arm_compute::ACL_SRC_1, arm_compute::ACL_SRC_2, ARM_COMPUTE_ERROR, ARM_COMPUTE_ERROR_ON_NULLPTR, BorderSize::bottom, arm_compute::test::validation::dst, CLScheduler::enqueue_op(), CLScheduler::get(), CLAuxTensorHandler::get(), ITensorPack::get_const_tensor(), ITensorPack::get_tensor(), ITensor::info(), arm_compute::NATIVE_V1, arm_compute::offset_int_vec(), ITensorInfo::padding(), ClGemm::prepare(), arm_compute::RESHAPED, arm_compute::RESHAPED_ONLY_RHS, arm_compute::RESHAPED_V1, and BorderSize::top.

Referenced by ClWinogradConv2d::run().

664 {
665  const ITensor *lhs = tensors.get_const_tensor(ACL_SRC_0);
666  const ITensor *rhs = tensors.get_const_tensor(ACL_SRC_1);
667  const ITensor *src2 = tensors.get_const_tensor(ACL_SRC_2);
668  ITensor *dst = tensors.get_tensor(ACL_DST);
669 
671 
672  CLAuxTensorHandler lhs_reshaped(offset_int_vec(LhsReshape), _tmp_a, tensors, true);
673  CLAuxTensorHandler rhs_reshaped(offset_int_vec(RhsReshape), _tmp_b, tensors, true);
674 
675  // Prepare the consts if needed
676  prepare(tensors);
677 
678  // Run matrix multiply kernel
679  switch(_gemm_kernel_type)
680  {
681  case CLGEMMKernelType::NATIVE_V1:
682  {
683  CLScheduler::get().enqueue_op(*_mm_kernel, tensors, true);
684  break;
685  }
686  case CLGEMMKernelType::RESHAPED_V1:
687  case CLGEMMKernelType::RESHAPED:
688  {
689  // Run interleave kernel
690  ITensorPack reshape_lhs_pack{ { ACL_SRC, lhs }, { ACL_DST, lhs_reshaped.get() } };
691  CLScheduler::get().enqueue_op(*_reshape_lhs_kernel, reshape_lhs_pack, false);
692 
693  if(!_reshape_b_only_on_first_run)
694  {
695  // Run transpose kernel
696  ITensorPack reshape_rhs_pack{ { ACL_SRC, rhs }, { ACL_DST, rhs_reshaped.get() } };
697  CLScheduler::get().enqueue_op(*_reshape_rhs_kernel, reshape_rhs_pack, false);
698  }
699 
700  ITensorPack gemm_reshaped_pack{ { ACL_SRC_0, lhs_reshaped.get() }, { ACL_SRC_1, rhs_reshaped.get() }, { ACL_SRC_2, src2 }, { ACL_DST, dst } };
701 
702  if(_gemm_kernel_type == CLGEMMKernelType::RESHAPED)
703  {
704  CLScheduler::get().enqueue_op(*_mm_reshaped_kernel, gemm_reshaped_pack, true);
705  }
706  else
707  {
708  CLScheduler::get().enqueue_op(*_mm_kernel, gemm_reshaped_pack, true);
709  }
710  break;
711  }
712  case CLGEMMKernelType::RESHAPED_ONLY_RHS:
713  {
714  if(!_reshape_b_only_on_first_run)
715  {
716  // Run transpose kernel
717  ITensorPack reshape_rhs_pack{ { ACL_SRC, rhs }, { ACL_DST, rhs_reshaped.get() } };
718  CLScheduler::get().enqueue_op(*_reshape_rhs_kernel, reshape_rhs_pack, false);
719  }
720  // In case of RESHAPED_ONLY_RHS, we need to check the padding requirement
721  // Check if the lhs or dst tensors have padding
722  const unsigned int cross_plane_pad_lhs = lhs->info()->padding().top + lhs->info()->padding().bottom;
723  const unsigned int cross_plane_pad_dst = dst->info()->padding().top + dst->info()->padding().bottom;
724  bool has_pad_y = (cross_plane_pad_lhs != 0) || (cross_plane_pad_dst != 0);
725 
726  ITensorPack gemm_reshaped_onlyrhs_pack{ { ACL_SRC_0, lhs }, { ACL_SRC_1, rhs_reshaped.get() }, { ACL_SRC_2, src2 }, { ACL_DST, dst } };
727  if(has_pad_y)
728  {
729  CLScheduler::get().enqueue_op(*_mm_reshaped_only_rhs_fallback_kernel, gemm_reshaped_onlyrhs_pack, true);
730  }
731  else
732  {
733  CLScheduler::get().enqueue_op(*_mm_reshaped_only_rhs_kernel, gemm_reshaped_onlyrhs_pack, true);
734  }
735  break;
736  }
737  default:
738  {
739  ARM_COMPUTE_ERROR("GEMMType not supported");
740  }
741  }
742 }
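
In the RESHAPED_ONLY_RHS branch, run() chooses between two instances of the same kernel class depending on whether the LHS or the destination carries top or bottom padding. That check can be isolated as follows; VerticalPadding, OnlyRhsVariant and select_only_rhs_variant() are hypothetical names, and only the two padding fields read by the listing are modelled.

```cpp
#include <cassert>

// Hypothetical reduction of a tensor's padding to the two fields run() reads.
struct VerticalPadding
{
    unsigned int top;
    unsigned int bottom;
};

// Which of the two reshaped-only-RHS kernel instances would be enqueued.
enum class OnlyRhsVariant
{
    Default,
    Fallback
};

OnlyRhsVariant select_only_rhs_variant(const VerticalPadding &lhs, const VerticalPadding &dst)
{
    const unsigned int cross_plane_pad_lhs = lhs.top + lhs.bottom;
    const unsigned int cross_plane_pad_dst = dst.top + dst.bottom;
    // Any vertical padding on either tensor selects the fallback instance.
    const bool has_pad_y = (cross_plane_pad_lhs != 0) || (cross_plane_pad_dst != 0);
    return has_pad_y ? OnlyRhsVariant::Fallback : OnlyRhsVariant::Default;
}
```

The choice is made on every call to run() rather than once in configure(), since it depends on the padding of the tensors actually passed in.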

◆ validate()

Status validate ( const ITensorInfo * a,
const ITensorInfo * b,
const ITensorInfo * c,
const ITensorInfo * output,
float  alpha,
float  beta,
const GEMMInfo & gemm_info 
)
static

Static function to check if given info will lead to a valid configuration.

Similar to ClGemm::configure()

Returns
a status

Definition at line 612 of file ClGemm.cpp.

References ARM_COMPUTE_RETURN_ERROR_MSG, ARM_COMPUTE_RETURN_ON_ERROR, GEMMInfo::constant_weights(), ITensorInfo::data_type(), ITensorInfo::dimension(), CLScheduler::get(), arm_compute::helpers::float_ops::is_zero(), arm_compute::NATIVE_V1, GEMMInfo::reinterpret_input_as_3d(), GEMMInfo::reshape_b_only_on_first_run(), arm_compute::RESHAPED, arm_compute::RESHAPED_ONLY_RHS, arm_compute::RESHAPED_V1, and CLScheduler::target().

Referenced by ClGemm::configure(), NEElementwiseUnaryLayer< op >::validate(), NEPReluLayer::validate(), CLPReluLayer::validate(), CLSoftmaxLayerGeneric< IS_LOG >::validate(), NEGEMMConv2d::validate(), CLGEMM::validate(), and CLGEMMLowpMatrixMultiplyCore::validate().

613 {
614  // Get the GPU target
615  bool reinterpret_input_as_3d = gemm_info.reinterpret_input_as_3d();
616  const unsigned int m = reinterpret_input_as_3d ? (a->dimension(1) * a->dimension(2)) : a->dimension(1);
617  const unsigned int n = b->dimension(0);
618  const unsigned int k = a->dimension(0);
619  const unsigned int batch_size = reinterpret_input_as_3d ? a->dimension(3) : a->dimension(2);
620 
621  // Select GEMMType
622  CLGEMMKernelType gemm_kernel_type = auto_select_gemm_kernel(auto_heuristics::CommonQuery
623  {
624  CLScheduler::get().target(), a->data_type(), m, n, k, batch_size,
625  },
626  gemm_info.reshape_b_only_on_first_run(), gemm_info.constant_weights());
627 
628  const bool fuse_add_c = (!(helpers::float_ops::is_zero(beta)) && c != nullptr);
629 
630  const ITensorInfo *c_to_use = fuse_add_c ? c : nullptr;
631 
632  switch(gemm_kernel_type)
633  {
634  case CLGEMMKernelType::NATIVE_V1:
635  {
636  ARM_COMPUTE_RETURN_ON_ERROR(validate_native_v1(a, b, c_to_use, output, alpha, beta, gemm_info));
637  break;
638  }
639  case CLGEMMKernelType::RESHAPED_V1:
640  {
641  ARM_COMPUTE_RETURN_ON_ERROR(validate_reshaped_v1(a, b, c_to_use, output, alpha, beta, gemm_info));
642  break;
643  }
644  case CLGEMMKernelType::RESHAPED:
645  {
646  ARM_COMPUTE_RETURN_ON_ERROR(validate_reshaped(a, b, c_to_use, output, alpha, beta, gemm_info));
647  break;
648  }
649  case CLGEMMKernelType::RESHAPED_ONLY_RHS:
650  {
651  ARM_COMPUTE_RETURN_ON_ERROR(validate_reshaped_only_rhs(a, b, c_to_use, output, alpha, beta, gemm_info));
652  break;
653  }
654  default:
655  {
656  ARM_COMPUTE_RETURN_ERROR_MSG("GEMMType not supported");
657  }
658  }
659 
660  return Status{};
661 }
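
Both validate() and configure() pass matrix C on to the kernels only when beta is non-zero and c is non-null. The sketch below isolates that decision. is_zero_approx() is a hypothetical stand-in for helpers::float_ops::is_zero(): the default epsilon of 0.00001f comes from the signature referenced on this page, but the exact comparison used by the library is an assumption here.

```cpp
#include <cassert>
#include <cmath>

// Assumed behaviour: a value counts as zero when its magnitude is within epsilon.
bool is_zero_approx(float value, float epsilon = 0.00001f)
{
    return std::fabs(value) <= epsilon;
}

// c is an opaque pointer standing in for the ITensorInfo of matrix C.
bool should_fuse_add_c(float beta, const void *c)
{
    return !is_zero_approx(beta) && (c != nullptr);
}
```

A consequence worth noting is that a very small but non-zero beta is treated as zero, so matrix C is dropped from the computation entirely in that case.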

◆ workspace()

experimental::MemoryRequirements workspace ( ) const
override virtual

Return the memory requirements of the workspace.

Reimplemented from ICLOperator.

Definition at line 766 of file ClGemm.cpp.

Referenced by ClWinogradConv2d::configure().

767 {
768  return _aux_mem;
769 }

The documentation for this class was generated from the following files: