Compute Library
 19.11
CLGEMM Class Reference

Basic function to execute GEMM on OpenCL.

#include <CLGEMM.h>

Collaboration diagram for CLGEMM: (diagram omitted)

Public Member Functions

 	CLGEMM (std::shared_ptr< IMemoryManager > memory_manager=nullptr, IWeightsManager *weights_manager=nullptr)
 	Default constructor.

 	CLGEMM (const CLGEMM &)=delete
 	Prevent instances of this class from being copied (as this class contains pointers).

 	CLGEMM (CLGEMM &&)=default
 	Default move constructor.

CLGEMM & 	operator= (const CLGEMM &)=delete
 	Prevent instances of this class from being copied (as this class contains pointers).

CLGEMM & 	operator= (CLGEMM &&)=default
 	Default move assignment operator.

void 	configure (const ICLTensor *a, const ICLTensor *b, const ICLTensor *c, ICLTensor *output, float alpha, float beta, const GEMMInfo &gemm_info=GEMMInfo())
 	Initialise the kernel's inputs and output.

void 	run () override
 	Run the kernels contained in the function.

void 	prepare () override
 	Prepare the function for executing.

Public Member Functions inherited from IFunction

virtual 	~IFunction ()=default
 	Destructor.

Static Public Member Functions

static Status 	validate (const ITensorInfo *a, const ITensorInfo *b, const ITensorInfo *c, const ITensorInfo *output, float alpha, float beta, const GEMMInfo &gemm_info=GEMMInfo())
 	Static function to check if given info will lead to a valid configuration of CLGEMM.


Detailed Description

Basic function to execute GEMM on OpenCL.

This function calls the following OpenCL kernels:

  1. CLGEMMReshapeLHSMatrixKernel (only if RESHAPED_V1 or RESHAPED_V2 is selected by the select_gemm_type() heuristic)
  2. CLGEMMReshapeRHSMatrixKernel (only if RESHAPED_V1, RESHAPED_V2 or RESHAPED_ONLY_RHS is selected by the select_gemm_type() heuristic)
  3. CLGEMMMatrixMultiplyKernel (only if NATIVE or RESHAPED_V1 is selected by the select_gemm_type() heuristic)
  4. CLGEMMMatrixMultiplyReshapedKernel (only if RESHAPED_V2 is selected by the select_gemm_type() heuristic)
  5. CLGEMMMatrixMultiplyReshapedOnlyRHSKernel (only if RESHAPED_ONLY_RHS is selected by the select_gemm_type() heuristic)

Definition at line 100 of file CLGEMM.h.
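The kernel choice above is driven by an internal heuristic (select_gemm_type()) that is not part of the public API. The following is a hypothetical sketch of such a dispatch with invented thresholds; the real heuristic also takes the GPU target and the data type into account, so treat this purely as an illustration of the idea.

```cpp
#include <cassert>

// Invented stand-in for the internal GEMMType enum and heuristic.
enum class GEMMType { NATIVE, RESHAPED_V1, RESHAPED_ONLY_RHS };

GEMMType select_gemm_type_sketch(unsigned int m, unsigned int n, unsigned int k,
                                 bool reshape_b_only_on_first_run)
{
    // A vector-by-matrix product (m == 1) gains nothing from reshaping the LHS;
    // reshaping only the RHS pays off when it can be cached across runs.
    if(m == 1)
    {
        return reshape_b_only_on_first_run ? GEMMType::RESHAPED_ONLY_RHS
                                           : GEMMType::NATIVE;
    }
    // Small problems: the reshape overhead outweighs the faster kernel.
    if(m * n * k < 128u * 128u * 128u)
    {
        return GEMMType::NATIVE;
    }
    // Large problems: reshape both operands into a blocked layout.
    return GEMMType::RESHAPED_V1;
}
```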

Constructor & Destructor Documentation

◆ CLGEMM() [1/3]

CLGEMM (std::shared_ptr< IMemoryManager > memory_manager = nullptr,
        IWeightsManager *                 weights_manager = nullptr
       )

Default constructor.

Parameters
    [in]  memory_manager   (Optional) Memory manager.
    [in]  weights_manager  (Optional) Weights manager.

Definition at line 50 of file CLGEMM.cpp.

    : _memory_group(std::move(memory_manager)),
      _weights_manager(weights_manager),
      _mm_kernel(),
      _reshape_lhs_kernel(),
      _reshape_rhs_kernel(),
      _reshape_rhs_kernel_managed(),
      _mm_reshaped_kernel(),
      _mm_reshaped_only_rhs_kernel(),
      _tmp_a(),
      _tmp_b(),
      _original_b(nullptr),
      _reshape_b_only_on_first_run(false),
      _is_prepared(false),
      _gemm_type(GEMMType::NATIVE)
{
}

◆ CLGEMM() [2/3]

CLGEMM (const CLGEMM &)
delete

Prevent instances of this class from being copied (As this class contains pointers)

◆ CLGEMM() [3/3]

CLGEMM ( CLGEMM &&  )
default

Default move constructor.

Member Function Documentation

◆ configure()

void configure (const ICLTensor * a,
                const ICLTensor * b,
                const ICLTensor * c,
                ICLTensor *       output,
                float             alpha,
                float             beta,
                const GEMMInfo &  gemm_info = GEMMInfo()
               )

Initialise the kernel's inputs and output.

Note
    GEMM: General Matrix Multiply - [alpha * A * B + beta * C].
    All tensors must have the same data type.
    Whilst the first input tensor can be a vector, the second input tensor must be at least a matrix.

Parameters
    [in]   a          First input tensor (Matrix or Vector A). Data types supported: F16/F32
    [in]   b          Second input tensor (Matrix B). Data type supported: same as a.
    [in]   c          Third input tensor (Matrix C). It can be a nullptr if just the multiplication between a and b is needed. Data type supported: same as a.
    [out]  output     Output tensor. Data type supported: same as a.
    [in]   alpha      Weight of the matrix product.
    [in]   beta       Weight of matrix C.
    [in]   gemm_info  (Optional) Specifies if the matrix A and/or matrix B have been reshaped and if the reshape of matrix B should happen only for the first run. GEMMInfo also contains information about the reshaping in case matrix A and matrix B have been already transformed.
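The operation configure() sets up can be stated as plain CPU reference code. The sketch below illustrates only the GEMM math (alpha * A * B + beta * C with row-major storage, and the documented nullptr behaviour of c), not the OpenCL implementation; gemm_reference is an invented helper name.

```cpp
#include <vector>

// Reference semantics of CLGEMM: output = alpha * A * B + beta * C,
// A is MxK, B is KxN, C and output are MxN, all row-major.
std::vector<float> gemm_reference(const std::vector<float> &a,
                                  const std::vector<float> &b,
                                  const std::vector<float> *c, // may be nullptr
                                  unsigned int m, unsigned int n, unsigned int k,
                                  float alpha, float beta)
{
    std::vector<float> out(m * n, 0.0f);
    for(unsigned int i = 0; i < m; ++i)
    {
        for(unsigned int j = 0; j < n; ++j)
        {
            float acc = 0.0f;
            for(unsigned int p = 0; p < k; ++p)
            {
                acc += a[i * k + p] * b[p * n + j];
            }
            // When c is nullptr only the matrix product is computed,
            // mirroring the nullptr case documented for parameter c.
            out[i * n + j] = alpha * acc + (c != nullptr ? beta * (*c)[i * n + j] : 0.0f);
        }
    }
    return out;
}
```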

Definition at line 530 of file CLGEMM.cpp.

{
    ARM_COMPUTE_ERROR_ON_NULLPTR(a, b, output);

    // Perform validation step
    ARM_COMPUTE_ERROR_THROW_ON(validate(a->info(), b->info(), c != nullptr ? c->info() : nullptr, output->info(), alpha, beta, gemm_info));

    // Check if we need to reshape the matrix B only on the first run
    _reshape_b_only_on_first_run = gemm_info.reshape_b_only_on_first_run();
    _is_prepared                 = gemm_info.retain_internal_weights();
    _original_b                  = b;

    // Get the GPU target
    const GPUTarget    gpu_target              = CLScheduler::get().target();
    bool               reinterpret_input_as_3d = gemm_info.reinterpret_input_as_3d();
    const unsigned int m                       = reinterpret_input_as_3d ? (a->info()->dimension(1) * a->info()->dimension(2)) : a->info()->dimension(1);
    const unsigned int n                       = b->info()->dimension(0);
    const unsigned int k                       = a->info()->dimension(0);

    // Select GEMMType
    _gemm_type = select_gemm_type(m, n, k, a->info()->data_type(), _reshape_b_only_on_first_run, gpu_target);

    const bool fuse_add_c = (!(helpers::float_ops::is_zero(beta)) && c != nullptr);

    const ICLTensor *c_to_use = fuse_add_c ? c : nullptr;

    switch(_gemm_type)
    {
        case GEMMType::NATIVE:
        {
            configure_native(a, b, c_to_use, output, alpha, beta, gemm_info);
            break;
        }
        case GEMMType::RESHAPED_V1:
        {
            configure_reshaped_v1(a, b, c_to_use, output, alpha, beta, gemm_info);
            break;
        }
        case GEMMType::RESHAPED_V2:
        {
            configure_reshaped_v2(a, b, c_to_use, output, alpha, beta, gemm_info);
            break;
        }
        case GEMMType::RESHAPED_ONLY_RHS:
        {
            configure_reshaped_only_rhs(a, b, c_to_use, output, alpha, beta, gemm_info);
            break;
        }
        default:
        {
            ARM_COMPUTE_ERROR("GEMMType not supported");
        }
    }
}

References arm_compute::test::validation::alpha, ARM_COMPUTE_ERROR, ARM_COMPUTE_ERROR_ON_NULLPTR, ARM_COMPUTE_ERROR_THROW_ON, arm_compute::test::validation::b, ITensorInfo::data_type(), ITensorInfo::dimension(), CLScheduler::get(), ITensor::info(), arm_compute::helpers::float_ops::is_zero(), GEMMInfo::reinterpret_input_as_3d(), GEMMInfo::reshape_b_only_on_first_run(), GEMMInfo::retain_internal_weights(), CLScheduler::target(), and CLGEMM::validate().

Referenced by CLRNNLayer::configure(), CLWinogradConvolutionLayer::configure(), CLLSTMLayer::configure(), and CLGEMMDeconvolutionLayer::configure().

◆ operator=() [1/2]

CLGEMM& operator= (const CLGEMM &)
delete

Prevent instances of this class from being copied (As this class contains pointers)

◆ operator=() [2/2]

CLGEMM& operator= ( CLGEMM &&  )
default

Default move assignment operator.

◆ prepare()

void prepare ( )
overridevirtual

Prepare the function for executing.

Any one-off pre-processing step required by the function is handled here.

Note
Prepare stage might not need all the function's buffers' backing memory to be available in order to execute

Reimplemented from IFunction.
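The one-off behaviour described above reduces to a small idiom: guard the expensive transformation behind an _is_prepared flag so that repeated calls are cheap. A self-contained sketch of that idiom (class and member names are illustrative, not the library's; a counter stands in for the actual reshape/allocation work):

```cpp
// Minimal sketch of the "prepare once" idiom used by CLGEMM::prepare().
class PrepareOnce
{
public:
    void prepare()
    {
        if(!_is_prepared)
        {
            // Stands in for reshaping matrix B, allocating _tmp_b,
            // and marking the original weights tensor as unused.
            ++_reshape_count;
            _is_prepared = true;
        }
    }
    int reshape_count() const
    {
        return _reshape_count;
    }

private:
    bool _is_prepared   = false;
    int  _reshape_count = 0;
};
```

run() relies on exactly this guard: it calls prepare() unconditionally on every invocation, and the flag makes every call after the first a no-op.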

Definition at line 713 of file CLGEMM.cpp.

{
    if(!_is_prepared)
    {
        if(_gemm_type != GEMMType::NATIVE && _reshape_b_only_on_first_run)
        {
            if(_weights_manager && _weights_manager->are_weights_managed(_original_b))
            {
                _weights_manager->run(_original_b, &_reshape_rhs_kernel_managed);
            }
            else
            {
                // Run transpose kernel and mark original weights tensor as unused
                _tmp_b.allocator()->allocate();
                CLScheduler::get().enqueue(_reshape_rhs_kernel, false);
                _original_b->mark_as_unused();
            }
        }
        CLScheduler::get().queue().finish();
        _is_prepared = true;
    }
}

References CLTensorAllocator::allocate(), CLTensor::allocator(), IWeightsManager::are_weights_managed(), CLScheduler::enqueue(), CLScheduler::get(), ITensor::mark_as_unused(), CLScheduler::queue(), and IWeightsManager::run().

Referenced by CLRNNLayer::prepare(), CLWinogradConvolutionLayer::prepare(), CLGEMMDeconvolutionLayer::prepare(), CLFullyConnectedLayer::prepare(), CLGEMMConvolutionLayer::prepare(), and CLGEMM::run().

◆ run()

void run ( )
overridevirtual

Run the kernels contained in the function.

For NEON kernels:

  • Multi-threading is used for the kernels which are parallelisable.
  • By default std::thread::hardware_concurrency() threads are used.
Note
CPPScheduler::set_num_threads() can be used to manually set the number of threads

For OpenCL kernels:

  • All the kernels are enqueued on the queue associated with CLScheduler.
  • The queue is then flushed.
Note
The function will not block until the kernels are executed. It is the user's responsibility to wait.
Will call prepare() on the first run if it hasn't already been done.

Implements IFunction.

Definition at line 632 of file CLGEMM.cpp.

{
    prepare();

    MemoryGroupResourceScope scope_mg(_memory_group);

    // Run matrix multiply kernel
    switch(_gemm_type)
    {
        case GEMMType::NATIVE:
        {
            CLScheduler::get().enqueue(_mm_kernel, true);
            break;
        }
        case GEMMType::RESHAPED_V1:
        {
            // Run interleave kernel
            CLScheduler::get().enqueue(_reshape_lhs_kernel, false);

            if(!_reshape_b_only_on_first_run)
            {
                // Run transpose kernel
                if(_weights_manager && _weights_manager->are_weights_managed(_original_b))
                {
                    _weights_manager->run(_original_b, &_reshape_rhs_kernel_managed);
                }
                else
                {
                    CLScheduler::get().enqueue(_reshape_rhs_kernel, false);
                }
            }

            CLScheduler::get().enqueue(_mm_kernel, true);
            break;
        }
        case GEMMType::RESHAPED_V2:
        {
            // Run interleave kernel
            CLScheduler::get().enqueue(_reshape_lhs_kernel, false);

            if(!_reshape_b_only_on_first_run)
            {
                // Run transpose kernel
                if(_weights_manager && _weights_manager->are_weights_managed(_original_b))
                {
                    _weights_manager->run(_original_b, &_reshape_rhs_kernel_managed);
                }
                else
                {
                    CLScheduler::get().enqueue(_reshape_rhs_kernel, false);
                }
            }

            CLScheduler::get().enqueue(_mm_reshaped_kernel, true);
            break;
        }
        case GEMMType::RESHAPED_ONLY_RHS:
        {
            if(!_reshape_b_only_on_first_run)
            {
                // Run transpose kernel
                if(_weights_manager && _weights_manager->are_weights_managed(_original_b))
                {
                    _weights_manager->run(_original_b, &_reshape_rhs_kernel_managed);
                }
                else
                {
                    CLScheduler::get().enqueue(_reshape_rhs_kernel, false);
                }
            }

            CLScheduler::get().enqueue(_mm_reshaped_only_rhs_kernel, true);
            break;
        }
        default:
        {
            ARM_COMPUTE_ERROR("GEMMType not supported");
        }
    }
}

References IWeightsManager::are_weights_managed(), ARM_COMPUTE_ERROR, CLScheduler::enqueue(), CLScheduler::get(), CLGEMM::prepare(), and IWeightsManager::run().

Referenced by CLRNNLayer::run(), CLWinogradConvolutionLayer::run(), CLGEMMDeconvolutionLayer::run(), CLLSTMLayer::run(), CLFullyConnectedLayer::run(), and CLGEMMConvolutionLayer::run().

◆ validate()

Status validate (const ITensorInfo * a,
                 const ITensorInfo * b,
                 const ITensorInfo * c,
                 const ITensorInfo * output,
                 float               alpha,
                 float               beta,
                 const GEMMInfo &    gemm_info = GEMMInfo()
                )
static

Static function to check if given info will lead to a valid configuration of CLGEMM.

Parameters
    [in]  a          First input tensor info (Matrix or Vector A). Data types supported: F16/F32
    [in]  b          Second input tensor info (Matrix B). Data type supported: same as a.
    [in]  c          Third input tensor info (Matrix C). It can be a nullptr if just the multiplication between a and b is needed. Data type supported: same as a.
    [in]  output     Output tensor info. Data type supported: same as a.
    [in]  alpha      Weight of the matrix product.
    [in]  beta       Weight of matrix C.
    [in]  gemm_info  (Optional) Specifies if the matrix A and/or matrix B have been reshaped and if the reshape of matrix B should happen only for the first run.
Returns
a status
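validate() enables a check-before-configure pattern: a candidate configuration can be rejected cheaply from tensor *info* alone, before any OpenCL resources are created. The sketch below shows the shape of that pattern with a simplified stand-in Status type (the real arm_compute::Status carries an ErrorCode plus a description, and CLGEMM::validate() checks far more than one rule); validate_sketch and its single shape rule are invented for illustration.

```cpp
#include <string>

// Simplified stand-in for arm_compute::Status: empty description means OK.
struct Status
{
    std::string error_description;
    explicit operator bool() const
    {
        return error_description.empty();
    }
};

// Hypothetical check: GEMM needs the inner dimensions to agree
// (A is MxK, B is KxN), so A's column count must equal B's row count.
Status validate_sketch(unsigned int a_cols, unsigned int b_rows)
{
    if(a_cols != b_rows)
    {
        return Status{"inner dimensions of A and B do not match"};
    }
    return Status{}; // valid configuration
}
```

In application code the idiom is: call validate() with the tensors' info objects, and only call configure() when the returned status converts to true.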

Definition at line 585 of file CLGEMM.cpp.

{
    // Get the GPU target
    const GPUTarget    gpu_target              = CLScheduler::get().target();
    bool               reinterpret_input_as_3d = gemm_info.reinterpret_input_as_3d();
    const unsigned int m                       = reinterpret_input_as_3d ? (a->dimension(1) * a->dimension(2)) : a->dimension(1);
    const unsigned int n                       = b->dimension(0);
    const unsigned int k                       = a->dimension(0);

    // Select GEMMType
    GEMMType gemm_type = select_gemm_type(m, n, k, a->data_type(), gemm_info.reshape_b_only_on_first_run(), gpu_target);

    const bool fuse_add_c = (!(helpers::float_ops::is_zero(beta)) && c != nullptr);

    const ITensorInfo *c_to_use = fuse_add_c ? c : nullptr;

    switch(gemm_type)
    {
        case GEMMType::NATIVE:
        {
            ARM_COMPUTE_RETURN_ON_ERROR(validate_native(a, b, c_to_use, output, alpha, beta, gemm_info));
            break;
        }
        case GEMMType::RESHAPED_V1:
        {
            ARM_COMPUTE_RETURN_ON_ERROR(validate_reshaped_v1(a, b, c_to_use, output, alpha, beta, gemm_info));
            break;
        }
        case GEMMType::RESHAPED_V2:
        {
            ARM_COMPUTE_RETURN_ON_ERROR(validate_reshaped_v2(a, b, c_to_use, output, alpha, beta, gemm_info));
            break;
        }
        case GEMMType::RESHAPED_ONLY_RHS:
        {
            ARM_COMPUTE_RETURN_ON_ERROR(validate_reshaped_only_rhs(a, b, c_to_use, output, alpha, beta, gemm_info));
            break;
        }
        default:
        {
            ARM_COMPUTE_RETURN_ERROR_MSG("GEMMType not supported");
        }
    }

    return Status{};
}

References arm_compute::test::validation::alpha, ARM_COMPUTE_RETURN_ERROR_MSG, ARM_COMPUTE_RETURN_ON_ERROR, arm_compute::test::validation::b, ITensorInfo::data_type(), ITensorInfo::dimension(), CLScheduler::get(), arm_compute::helpers::float_ops::is_zero(), GEMMInfo::reinterpret_input_as_3d(), GEMMInfo::reshape_b_only_on_first_run(), and CLScheduler::target().

Referenced by CLGEMM::configure(), CLRNNLayer::validate(), CLWinogradConvolutionLayer::validate(), CLGEMMDeconvolutionLayer::validate(), and CLLSTMLayer::validate().


The documentation for this class was generated from the following files: