Compute Library
 19.08
CLGEMM Class Reference

Basic function to execute GEMM on OpenCL. More...

#include <CLGEMM.h>


Public Member Functions

 CLGEMM (std::shared_ptr< IMemoryManager > memory_manager=nullptr)
 Default constructor. More...
 
 CLGEMM (const CLGEMM &)=delete
 Prevent instances of this class from being copied (As this class contains pointers) More...
 
 CLGEMM (CLGEMM &&)=default
 Default move constructor. More...
 
CLGEMM & operator= (const CLGEMM &)=delete
 Prevent instances of this class from being copied (As this class contains pointers) More...
 
CLGEMM & operator= (CLGEMM &&)=default
 Default move assignment operator. More...
 
void configure (const ICLTensor *a, const ICLTensor *b, const ICLTensor *c, ICLTensor *output, float alpha, float beta, const GEMMInfo &gemm_info=GEMMInfo())
 Initialise the kernel's inputs and output. More...
 
void run () override
 Run the kernels contained in the function. More...
 
void prepare () override
 Prepare the function for executing. More...
 
- Public Member Functions inherited from IFunction
virtual ~IFunction ()=default
 Destructor. More...
 

Static Public Member Functions

static Status validate (const ITensorInfo *a, const ITensorInfo *b, const ITensorInfo *c, const ITensorInfo *output, float alpha, float beta, const GEMMInfo &gemm_info=GEMMInfo())
 Static function to check if given info will lead to a valid configuration of CLGEMM. More...
 

Detailed Description

Basic function to execute GEMM on OpenCL.

This function calls the following OpenCL kernels:

  1. CLGEMMReshapeLHSMatrixKernel (only if RESHAPED_V1 or RESHAPED_V2 is selected by the select_gemm_type() method)
  2. CLGEMMReshapeRHSMatrixKernel (only if RESHAPED_V1, RESHAPED_V2 or RESHAPED_ONLY_RHS is selected by the select_gemm_type() method)
  3. CLGEMMMatrixMultiplyKernel (only if NATIVE or RESHAPED_V1 is selected by the select_gemm_type() method)
  4. CLGEMMMatrixMultiplyReshapedKernel (only if RESHAPED_V2 is selected by the select_gemm_type() method)
  5. CLGEMMMatrixMultiplyReshapedOnlyRHSKernel (only if RESHAPED_ONLY_RHS is selected by the select_gemm_type() method)
  6. CLGEMMMatrixAdditionKernel (if c != nullptr and beta != 0.0)

Definition at line 52 of file CLGEMM.h.
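The operation this class computes on the GPU can be pinned down with a plain CPU reference. The sketch below is illustrative only (the helper `gemm_reference` is hypothetical, not part of the library); it mirrors the documented semantics — output = alpha * A * B + beta * C — including the condition from configure() that the C term is only fused in when c is non-null and beta is non-zero:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// CPU reference of what CLGEMM evaluates: out = alpha * A * B + beta * C.
// A is (M x K), B is (K x N), C and out are (M x N), all row-major.
std::vector<float> gemm_reference(const std::vector<float> &a,
                                  const std::vector<float> &b,
                                  const std::vector<float> *c,
                                  int M, int N, int K,
                                  float alpha, float beta)
{
    std::vector<float> out(M * N, 0.0f);
    for(int i = 0; i < M; ++i)
    {
        for(int j = 0; j < N; ++j)
        {
            float acc = 0.0f;
            for(int k = 0; k < K; ++k)
            {
                acc += a[i * K + k] * b[k * N + j];
            }
            // The beta * C term is only added when C is provided and beta != 0,
            // mirroring the fuse_add_c condition in CLGEMM::configure().
            const bool add_c    = (c != nullptr) && (beta != 0.0f);
            out[i * N + j]      = alpha * acc + (add_c ? beta * (*c)[i * N + j] : 0.0f);
        }
    }
    return out;
}
```

With A = [[1,2],[3,4]], B = [[5,6],[7,8]], C all ones, alpha = beta = 1, this yields [[20,23],[44,51]].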

Constructor & Destructor Documentation

◆ CLGEMM() [1/3]

CLGEMM ( std::shared_ptr< IMemoryManager >  memory_manager = nullptr )

Default constructor.

Parameters
[in]  memory_manager  (Optional) Memory manager.

Definition at line 48 of file CLGEMM.cpp.

49  : _memory_group(std::move(memory_manager)),
50  _mm_kernel(),
51  _reshape_lhs_kernel(),
52  _reshape_rhs_kernel(),
53  _mm_reshaped_kernel(),
54  _mm_reshaped_only_rhs_kernel(),
55  _tmp_a(),
56  _tmp_b(),
57  _original_b(nullptr),
58  _reshape_b_only_on_first_run(false),
59  _is_prepared(false),
60  _gemm_type(GEMMType::NATIVE)
61 {
62 }
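Passing a memory manager lets the internal tensors (_tmp_a, _tmp_b) draw from a shared pool instead of owning their own allocations. A minimal sketch of wiring one up, under the assumption that the usual BlobLifetimeManager/PoolManager/MemoryManagerOnDemand combination from the library's runtime is used (check the runtime headers for the exact constructors in your version):

```cpp
#include "arm_compute/runtime/BlobLifetimeManager.h"
#include "arm_compute/runtime/MemoryManagerOnDemand.h"
#include "arm_compute/runtime/PoolManager.h"
#include "arm_compute/runtime/CL/functions/CLGEMM.h"

using namespace arm_compute;

auto lifetime_mgr = std::make_shared<BlobLifetimeManager>();
auto pool_mgr     = std::make_shared<PoolManager>();
auto memory_mgr   = std::make_shared<MemoryManagerOnDemand>(lifetime_mgr, pool_mgr);

CLGEMM gemm(memory_mgr); // internal reshape buffers are served from the shared pool
```

Omitting the argument (the nullptr default) simply makes the function manage its own memory.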

◆ CLGEMM() [2/3]

CLGEMM ( const CLGEMM & )
delete

Prevent instances of this class from being copied (As this class contains pointers)

◆ CLGEMM() [3/3]

CLGEMM ( CLGEMM &&  )
default

Default move constructor.

Member Function Documentation

◆ configure()

void configure ( const ICLTensor *  a,
const ICLTensor *  b,
const ICLTensor *  c,
ICLTensor *  output,
float  alpha,
float  beta,
const GEMMInfo &  gemm_info = GEMMInfo() 
)

Initialise the kernel's inputs and output.

Note
GEMM: General Matrix Multiply - [alpha * A * B + beta * C].
All tensors must have the same data type.
Whilst the first input tensor can be a vector, the second input tensor must be at least a matrix.
Parameters
[in]   a          First input tensor (Matrix or Vector A). Data types supported: F16/F32
[in]   b          Second input tensor (Matrix B). Data type supported: same as a.
[in]   c          Third input tensor (Matrix C). It can be a nullptr if just the multiplication between a and b is needed. Data type supported: same as a.
[out]  output     Output tensor. Data type supported: same as a.
[in]   alpha      Weight of the matrix product.
[in]   beta       Weight of matrix C.
[in]   gemm_info  (Optional) Specifies if matrix A and/or matrix B have been reshaped and if the reshape of matrix B should happen only on the first run. GEMMInfo also contains information about the reshaping in case matrix A and matrix B have already been transformed.

Definition at line 470 of file CLGEMM.cpp.

471 {
472  ARM_COMPUTE_ERROR_ON_NULLPTR(a, b, output);
473 
474  // Perform validation step
475  ARM_COMPUTE_ERROR_THROW_ON(validate(a->info(), b->info(), c != nullptr ? c->info() : nullptr, output->info(), alpha, beta, gemm_info));
476 
477  // Check if we need to reshape the matrix B only on the first run
478  _reshape_b_only_on_first_run = gemm_info.reshape_b_only_on_first_run();
479  _is_prepared = gemm_info.retain_internal_weights();
480  _original_b = b;
481 
482  // Get the GPU target
483  const GPUTarget gpu_target = CLScheduler::get().target();
484  bool reinterpret_input_as_3d = gemm_info.reinterpret_input_as_3d();
485  const unsigned int m = reinterpret_input_as_3d ? (a->info()->dimension(1) * a->info()->dimension(2)) : a->info()->dimension(1);
486  const unsigned int n = b->info()->dimension(0);
487  const unsigned int k = a->info()->dimension(0);
488 
489  // Select GEMMType
490  _gemm_type = select_gemm_type(m, n, k, a->info()->data_type(), _reshape_b_only_on_first_run, gpu_target);
491 
492  const bool fuse_add_c = (!(helpers::float_ops::is_zero(beta)) && c != nullptr);
493 
494  const ICLTensor *c_to_use = fuse_add_c ? c : nullptr;
495 
496  switch(_gemm_type)
497  {
498  case GEMMType::NATIVE:
499  {
500  configure_native(a, b, c_to_use, output, alpha, beta, gemm_info);
501  break;
502  }
503  case GEMMType::RESHAPED_V1:
504  {
505  configure_reshaped_v1(a, b, c_to_use, output, alpha, beta, gemm_info);
506  break;
507  }
508  case GEMMType::RESHAPED_V2:
509  {
510  configure_reshaped_v2(a, b, c_to_use, output, alpha, beta, gemm_info);
511  break;
512  }
513  case GEMMType::RESHAPED_ONLY_RHS:
514  {
515  configure_reshaped_only_rhs(a, b, c_to_use, output, alpha, beta, gemm_info);
516  break;
517  }
518  default:
519  {
520  ARM_COMPUTE_ERROR("GEMMType not supported");
521  }
522  }
523 }

References arm_compute::test::validation::alpha, ARM_COMPUTE_ERROR, ARM_COMPUTE_ERROR_ON_NULLPTR, ARM_COMPUTE_ERROR_THROW_ON, arm_compute::test::validation::b, ITensorInfo::data_type(), ITensorInfo::dimension(), CLScheduler::get(), ITensor::info(), arm_compute::helpers::float_ops::is_zero(), GEMMInfo::reinterpret_input_as_3d(), GEMMInfo::reshape_b_only_on_first_run(), GEMMInfo::retain_internal_weights(), CLScheduler::target(), and CLGEMM::validate().

Referenced by CLRNNLayer::configure(), CLWinogradConvolutionLayer::configure(), CLLSTMLayer::configure(), and CLGEMMDeconvolutionLayer::configure().
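A typical end-to-end use of configure() follows the usual Compute Library pattern: initialise the tensor infos, call configure() once, allocate, then run. A minimal sketch — the shapes and the surrounding initialisation are illustrative assumptions, not taken from this page (note that TensorShape is (width, height), so an M x K matrix A is declared as TensorShape(K, M)):

```cpp
#include "arm_compute/runtime/CL/CLScheduler.h"
#include "arm_compute/runtime/CL/CLTensor.h"
#include "arm_compute/runtime/CL/functions/CLGEMM.h"

using namespace arm_compute;

void example_gemm()
{
    CLScheduler::get().default_init(); // set up the CL context/queue once per application

    CLTensor a, b, output;
    a.allocator()->init(TensorInfo(TensorShape(4U /* K */, 8U /* M */), 1, DataType::F32));
    b.allocator()->init(TensorInfo(TensorShape(16U /* N */, 4U /* K */), 1, DataType::F32));
    output.allocator()->init(TensorInfo(TensorShape(16U /* N */, 8U /* M */), 1, DataType::F32));

    CLGEMM gemm;
    // output = 1.0f * a * b; c is nullptr, so only the multiplication is performed
    gemm.configure(&a, &b, nullptr, &output, 1.0f, 0.0f);

    a.allocator()->allocate();
    b.allocator()->allocate();
    output.allocator()->allocate();

    // ... map the tensors and fill a and b ...

    gemm.run();
    CLScheduler::get().sync(); // run() does not block; wait before reading output
}
```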

◆ operator=() [1/2]

CLGEMM & operator= ( const CLGEMM & )
delete

Prevent instances of this class from being copied (As this class contains pointers)

◆ operator=() [2/2]

CLGEMM& operator= ( CLGEMM &&  )
default

Default move assignment operator.

◆ prepare()

void prepare ( )
overridevirtual

Prepare the function for executing.

Any one-off pre-processing step required by the function is handled here.

Note
The prepare stage might not require all of the function's buffers' backing memory to be available in order to execute.

Reimplemented from IFunction.

Definition at line 632 of file CLGEMM.cpp.

633 {
634  if(!_is_prepared)
635  {
636  if(_gemm_type != GEMMType::NATIVE && _reshape_b_only_on_first_run)
637  {
638  // Run transpose kernel and mark original weights tensor as unused
639  _tmp_b.allocator()->allocate();
640  CLScheduler::get().enqueue(_reshape_rhs_kernel, false);
641  _original_b->mark_as_unused();
642  }
643  CLScheduler::get().queue().finish();
644  _is_prepared = true;
645  }
646 }

References CLTensorAllocator::allocate(), CLTensor::allocator(), CLScheduler::enqueue(), CLScheduler::get(), ITensor::mark_as_unused(), and CLScheduler::queue().

Referenced by CLRNNLayer::prepare(), CLWinogradConvolutionLayer::prepare(), CLGEMMDeconvolutionLayer::prepare(), CLFullyConnectedLayer::prepare(), CLGEMMConvolutionLayer::prepare(), and CLGEMM::run().
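The listing above shows that the one-off B reshape only happens when reshape_b_only_on_first_run is set, which is useful when B holds constant weights reused across many runs. A hedged sketch of opting into that behaviour via GEMMInfo and triggering the reshape up front (the GEMMInfo constructor arguments shown follow the 19.08 runtime headers; verify against your version):

```cpp
GEMMInfo info(false /* is_a_reshaped */,
              false /* is_b_reshaped */,
              true  /* reshape_b_only_on_first_run */);

gemm.configure(&a, &b, nullptr, &output, 1.0f, 0.0f, info);

// Explicitly reshape B once, outside any timed loop; subsequent run()
// calls skip the RHS reshape and B is marked as unused afterwards.
gemm.prepare();

for(int i = 0; i < num_batches; ++i)
{
    // ... update a ...
    gemm.run();
}
CLScheduler::get().sync();
```

Calling prepare() explicitly is optional — run() invokes it on the first iteration — but doing so keeps the one-off cost out of per-run measurements.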

◆ run()

void run ( )
overridevirtual

Run the kernels contained in the function.

For NEON kernels:

  • Multi-threading is used for the kernels which are parallelisable.
  • By default std::thread::hardware_concurrency() threads are used.
Note
CPPScheduler::set_num_threads() can be used to manually set the number of threads

For OpenCL kernels:

  • All the kernels are enqueued on the queue associated with CLScheduler.
  • The queue is then flushed.
Note
The function does not block until the kernels have finished executing. It is the user's responsibility to wait.
prepare() is called on the first run if it has not already been done.

Implements IFunction.

Definition at line 572 of file CLGEMM.cpp.

573 {
574  prepare();
575 
576  MemoryGroupResourceScope scope_mg(_memory_group);
577 
578  // Run matrix multiply kernel
579  switch(_gemm_type)
580  {
581  case GEMMType::NATIVE:
582  {
583  CLScheduler::get().enqueue(_mm_kernel, true);
584  break;
585  }
586  case GEMMType::RESHAPED_V1:
587  {
588  // Run interleave kernel
589  CLScheduler::get().enqueue(_reshape_lhs_kernel, false);
590 
591  if(!_reshape_b_only_on_first_run)
592  {
593  // Run transpose kernel
594  CLScheduler::get().enqueue(_reshape_rhs_kernel, false);
595  }
596 
597  CLScheduler::get().enqueue(_mm_kernel, true);
598  break;
599  }
600  case GEMMType::RESHAPED_V2:
601  {
602  // Run interleave kernel
603  CLScheduler::get().enqueue(_reshape_lhs_kernel, false);
604 
605  if(!_reshape_b_only_on_first_run)
606  {
607  // Run transpose kernel
608  CLScheduler::get().enqueue(_reshape_rhs_kernel, false);
609  }
610 
611  CLScheduler::get().enqueue(_mm_reshaped_kernel, true);
612  break;
613  }
614  case GEMMType::RESHAPED_ONLY_RHS:
615  {
616  if(!_reshape_b_only_on_first_run)
617  {
618  // Run transpose kernel
619  CLScheduler::get().enqueue(_reshape_rhs_kernel, false);
620  }
621 
622  CLScheduler::get().enqueue(_mm_reshaped_only_rhs_kernel, true);
623  break;
624  }
625  default:
626  {
627  ARM_COMPUTE_ERROR("GEMMType not supported");
628  }
629  }
630 }

References ARM_COMPUTE_ERROR, CLScheduler::enqueue(), CLScheduler::get(), and CLGEMM::prepare().

Referenced by CLRNNLayer::run(), CLWinogradConvolutionLayer::run(), CLGEMMDeconvolutionLayer::run(), CLFullyConnectedLayer::run(), CLGEMMConvolutionLayer::run(), and CLLSTMLayer::run().
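Because run() only enqueues and flushes, a caller that needs the results on the host must synchronise explicitly. A minimal sketch (the gemm object and its tensors are assumed to be configured and allocated already):

```cpp
gemm.run();                 // enqueue the kernels and flush the queue
CLScheduler::get().sync();  // block until all enqueued kernels have finished
// output can now be safely mapped and read on the host
```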

◆ validate()

Status validate ( const ITensorInfo *  a,
const ITensorInfo *  b,
const ITensorInfo *  c,
const ITensorInfo *  output,
float  alpha,
float  beta,
const GEMMInfo &  gemm_info = GEMMInfo() 
)
static

Static function to check if given info will lead to a valid configuration of CLGEMM.

Parameters
[in]  a          First input tensor info (Matrix or Vector A). Data types supported: F16/F32
[in]  b          Second input tensor info (Matrix B). Data type supported: same as a.
[in]  c          Third input tensor info (Matrix C). It can be a nullptr if just the multiplication between a and b is needed. Data type supported: same as a.
[in]  output     Output tensor info. Data type supported: same as a.
[in]  alpha      Weight of the matrix product.
[in]  beta       Weight of matrix C.
[in]  gemm_info  (Optional) Specifies if matrix A and/or matrix B have been reshaped and if the reshape of matrix B should happen only on the first run.
Returns
a status

Definition at line 525 of file CLGEMM.cpp.

526 {
527  // Get the GPU target
528  const GPUTarget gpu_target = CLScheduler::get().target();
529  bool reinterpret_input_as_3d = gemm_info.reinterpret_input_as_3d();
530  const unsigned int m = reinterpret_input_as_3d ? (a->dimension(1) * a->dimension(2)) : a->dimension(1);
531  const unsigned int n = b->dimension(0);
532  const unsigned int k = a->dimension(0);
533 
534  // Select GEMMType
535  GEMMType gemm_type = select_gemm_type(m, n, k, a->data_type(), gemm_info.reshape_b_only_on_first_run(), gpu_target);
536 
537  const bool fuse_add_c = (!(helpers::float_ops::is_zero(beta)) && c != nullptr);
538 
539  const ITensorInfo *c_to_use = fuse_add_c ? c : nullptr;
540 
541  switch(gemm_type)
542  {
543  case GEMMType::NATIVE:
544  {
545  ARM_COMPUTE_RETURN_ON_ERROR(validate_native(a, b, c_to_use, output, alpha, beta, gemm_info));
546  break;
547  }
548  case GEMMType::RESHAPED_V1:
549  {
550  ARM_COMPUTE_RETURN_ON_ERROR(validate_reshaped_v1(a, b, c_to_use, output, alpha, beta, gemm_info));
551  break;
552  }
553  case GEMMType::RESHAPED_V2:
554  {
555  ARM_COMPUTE_RETURN_ON_ERROR(validate_reshaped_v2(a, b, c_to_use, output, alpha, beta, gemm_info));
556  break;
557  }
558  case GEMMType::RESHAPED_ONLY_RHS:
559  {
560  ARM_COMPUTE_RETURN_ON_ERROR(validate_reshaped_only_rhs(a, b, c_to_use, output, alpha, beta, gemm_info));
561  break;
562  }
563  default:
564  {
565  ARM_COMPUTE_RETURN_ERROR_MSG("GEMMType not supported");
566  }
567  }
568 
569  return Status{};
570 }

References arm_compute::test::validation::alpha, ARM_COMPUTE_RETURN_ERROR_MSG, ARM_COMPUTE_RETURN_ON_ERROR, arm_compute::test::validation::b, ITensorInfo::data_type(), ITensorInfo::dimension(), CLScheduler::get(), arm_compute::helpers::float_ops::is_zero(), GEMMInfo::reinterpret_input_as_3d(), GEMMInfo::reshape_b_only_on_first_run(), and CLScheduler::target().

Referenced by CLGEMM::configure(), CLRNNLayer::validate(), CLWinogradConvolutionLayer::validate(), CLGEMMDeconvolutionLayer::validate(), and CLLSTMLayer::validate().
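Since validate() is static and works on ITensorInfo rather than allocated tensors, it can be used to reject a configuration cheaply before any CL resources are created. A hedged sketch (shapes are illustrative; Status's boolean conversion and error_description() accessor follow the library's Status class):

```cpp
using namespace arm_compute;

TensorInfo a_info(TensorShape(4U /* K */, 8U /* M */), 1, DataType::F32);
TensorInfo b_info(TensorShape(16U /* N */, 4U /* K */), 1, DataType::F32);
TensorInfo out_info(TensorShape(16U /* N */, 8U /* M */), 1, DataType::F32);

const Status st = CLGEMM::validate(&a_info, &b_info, nullptr, &out_info, 1.0f, 0.0f);
if(!st)
{
    // The configuration would be rejected; st.error_description() explains why.
}
```

This is the same check configure() performs internally via ARM_COMPUTE_ERROR_THROW_ON, so passing validate() first guarantees configure() will not throw for these inputs.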


The documentation for this class was generated from the following files:

  • CLGEMM.h
  • CLGEMM.cpp