Basic function to execute GEMM on OpenCL.

#include <CLGEMM.h>

Public Member Functions
	CLGEMM (std::shared_ptr< IMemoryManager > memory_manager=nullptr, IWeightsManager *weights_manager=nullptr)
	Default constructor.

	CLGEMM (const CLGEMM &)=delete
	Prevent instances of this class from being copied (as this class contains pointers).

	CLGEMM (CLGEMM &&)=default
	Default move constructor.

CLGEMM &	operator= (const CLGEMM &)=delete
	Prevent instances of this class from being copied (as this class contains pointers).

CLGEMM &	operator= (CLGEMM &&)=default
	Default move assignment operator.

	~CLGEMM ()
	Default destructor.

void	configure (const ICLTensor *a, const ICLTensor *b, const ICLTensor *c, ICLTensor *output, float alpha, float beta, const GEMMInfo &gemm_info=GEMMInfo())
	Initialise the kernel's inputs and output.

void	configure (const CLCompileContext &compile_context, const ICLTensor *a, const ICLTensor *b, const ICLTensor *c, ICLTensor *output, float alpha, float beta, const GEMMInfo &gemm_info=GEMMInfo())
	Initialise the kernel's inputs and output.

void	run () override
	Run the kernels contained in the function.

void	prepare () override
	Prepare the function for executing.

Public Member Functions inherited from IFunction
virtual	~IFunction ()=default
	Destructor.

Static Public Member Functions
static Status	validate (const ITensorInfo *a, const ITensorInfo *b, const ITensorInfo *c, const ITensorInfo *output, float alpha, float beta, const GEMMInfo &gemm_info=GEMMInfo())
	Static function to check if given info will lead to a valid configuration of CLGEMM.

Detailed Description

Basic function to execute GEMM on OpenCL.

This function calls the following OpenCL kernels, depending on the kernel type selected by auto_select_gemm_kernel():

CLGEMMReshapeLHSMatrixKernel (only if RESHAPED_V1 or RESHAPED is selected)
CLGEMMReshapeRHSMatrixKernel (only if RESHAPED_V1, RESHAPED or RESHAPED_ONLY_RHS is selected)
CLGEMMMatrixMultiplyKernel (only if NATIVE_V1 or RESHAPED_V1 is selected)
CLGEMMMatrixMultiplyReshapedKernel (only if RESHAPED is selected)
CLGEMMMatrixMultiplyReshapedOnlyRHSKernel (only if RESHAPED_ONLY_RHS is selected; a separate fallback instance is used when the LHS or destination tensor has cross-plane padding)
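The kernel list can be read as a dispatch table from the selected kernel type to the kernels that a single run() enqueues. The sketch below is a plain C++ model of that dispatch, written from the run() and prepare() listings on this page. The enum `GemmKernelType` and the function `kernels_enqueued` are hypothetical illustration names, not Compute Library API, and the model ignores the weights manager.

```cpp
#include <string>
#include <vector>

// Hypothetical mirror of the CLGEMMKernelType values used by CLGEMM (illustration only).
enum class GemmKernelType { NATIVE_V1, RESHAPED_V1, RESHAPED, RESHAPED_ONLY_RHS };

// Returns the kernel names one run() call would enqueue, assuming prepare() has
// already performed the one-off RHS reshape when reshape_b_only_on_first_run is set.
// has_pad_y models the cross-plane padding check in the RESHAPED_ONLY_RHS branch.
std::vector<std::string> kernels_enqueued(GemmKernelType type,
                                          bool reshape_b_only_on_first_run,
                                          bool has_pad_y = false)
{
    std::vector<std::string> k;
    switch(type)
    {
        case GemmKernelType::NATIVE_V1:
            k.push_back("CLGEMMMatrixMultiplyKernel");
            break;
        case GemmKernelType::RESHAPED_V1:
        case GemmKernelType::RESHAPED:
            k.push_back("CLGEMMReshapeLHSMatrixKernel");
            if(!reshape_b_only_on_first_run)
            {
                k.push_back("CLGEMMReshapeRHSMatrixKernel");
            }
            // RESHAPED_V1 reuses the basic multiply kernel; RESHAPED uses the reshaped one.
            k.push_back(type == GemmKernelType::RESHAPED_V1 ? "CLGEMMMatrixMultiplyKernel"
                                                            : "CLGEMMMatrixMultiplyReshapedKernel");
            break;
        case GemmKernelType::RESHAPED_ONLY_RHS:
            if(!reshape_b_only_on_first_run)
            {
                k.push_back("CLGEMMReshapeRHSMatrixKernel");
            }
            // Both variants are CLGEMMMatrixMultiplyReshapedOnlyRHSKernel instances.
            k.push_back(has_pad_y ? "CLGEMMMatrixMultiplyReshapedOnlyRHSKernel (fallback instance)"
                                  : "CLGEMMMatrixMultiplyReshapedOnlyRHSKernel");
            break;
    }
    return k;
}
```

For example, with RESHAPED and a one-off RHS reshape, every run after the first enqueues only the LHS reshape and the reshaped multiply kernel.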

Definition at line 108 of file CLGEMM.h.

Constructor & Destructor Documentation

◆ CLGEMM() [1/3]

CLGEMM	(	std::shared_ptr< IMemoryManager >	memory_manager = `nullptr`,
		IWeightsManager *	weights_manager = `nullptr`
	)

Default constructor.

Parameters

[in]	memory_manager	(Optional) Memory manager.
[in]	weights_manager	(Optional) Weights manager.

Definition at line 233 of file CLGEMM.cpp.

     : _memory_group(std::move(memory_manager)),
       _weights_manager(weights_manager),
       _mm_kernel(std::make_unique<CLGEMMMatrixMultiplyKernel>()),
       _reshape_lhs_kernel(std::make_unique<CLGEMMReshapeLHSMatrixKernel>()),
       _reshape_rhs_kernel(std::make_unique<CLGEMMReshapeRHSMatrixKernel>()),
       _reshape_rhs_kernel_managed(std::make_unique<weights_transformations::CLGEMMReshapeRHSMatrixKernelManaged>()),
       _mm_reshaped_kernel(std::make_unique<CLGEMMMatrixMultiplyReshapedKernel>()),
       _mm_reshaped_only_rhs_kernel(std::make_unique<CLGEMMMatrixMultiplyReshapedOnlyRHSKernel>()),
       _mm_reshaped_only_rhs_fallback_kernel(std::make_unique<CLGEMMMatrixMultiplyReshapedOnlyRHSKernel>()),
       _tmp_a(),
       _tmp_b(),
       _original_b(nullptr),
       _lhs(nullptr),
       _dst(nullptr),
       _reshape_b_only_on_first_run(false),
       _is_prepared(false),
       _gemm_kernel_type(CLGEMMKernelType::NATIVE_V1)
 {
 }

◆ CLGEMM() [2/3]

CLGEMM ( const CLGEMM & )

delete

Prevent instances of this class from being copied (as this class contains pointers).

◆ CLGEMM() [3/3]

CLGEMM ( CLGEMM && )

default

Default move constructor.

◆ ~CLGEMM()

~CLGEMM ( )

default

Default destructor.

Member Function Documentation

◆ configure() [1/2]

void configure	(	const ICLTensor *	a,
		const ICLTensor *	b,
		const ICLTensor *	c,
		ICLTensor *	output,
		float	alpha,
		float	beta,
		const GEMMInfo &	gemm_info = `GEMMInfo()`
	)

Initialise the kernel's inputs and output.

Note:
GEMM: General Matrix Multiply - [alpha * A * B + beta * C].
All tensors must have the same data type.
Whilst the first input tensor can be a vector, the second input tensor must be at least a matrix.

Parameters

[in]	a	First input tensor (Matrix or Vector A). Data types supported: F16/F32
[in]	b	Second input tensor (Matrix B). Data type supported: same as `a`.
[in]	c	Third input tensor (Matrix C). It can be a nullptr if just the multiplication between `a` and `b` is needed; it is also ignored when `beta` is zero. Data type supported: same as `a`.
[out]	output	Output tensor. Data type supported: same as `a`.
[in]	alpha	Weight of the matrix product
[in]	beta	Weight of matrix C
[in]	gemm_info	(Optional) Specifies if matrix A and/or matrix B have been reshaped and if the reshape of matrix B should happen only on the first run. GEMMInfo also contains information about the reshaping in case matrix A and matrix B have already been transformed.
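To make the formula and the parameter semantics concrete, the sketch below is a small CPU reference for output = alpha * A * B + beta * C using row-major std::vector storage. Sizes follow the convention visible in the configure() listing (dimension(0) of A is K, dimension(0) of B is N, dimension(1) of A is M). The function name `reference_gemm` is hypothetical; it is a correctness oracle, not a description of how the OpenCL kernels work.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference for output = alpha * A * B + beta * C (illustration only).
// A is M x K, B is K x N, C (optional) and the result are M x N, all row-major.
// M == 1 covers the case where the first input is a vector.
// The C term is skipped when c is nullptr or beta is zero, mirroring fuse_add_c in
// configure() (this sketch uses an exact zero comparison).
std::vector<float> reference_gemm(const std::vector<float> &a, const std::vector<float> &b,
                                  const std::vector<float> *c, std::size_t m, std::size_t n,
                                  std::size_t k, float alpha, float beta)
{
    assert(a.size() == m * k && b.size() == k * n);
    assert(c == nullptr || c->size() == m * n);
    const bool add_c = (c != nullptr) && beta != 0.0f;

    std::vector<float> out(m * n, 0.0f);
    for(std::size_t i = 0; i < m; ++i)
    {
        for(std::size_t j = 0; j < n; ++j)
        {
            float acc = 0.0f; // Dot product of row i of A with column j of B
            for(std::size_t p = 0; p < k; ++p)
            {
                acc += a[i * k + p] * b[p * n + j];
            }
            out[i * n + j] = alpha * acc + (add_c ? beta * (*c)[i * n + j] : 0.0f);
        }
    }
    return out;
}
```

A reference like this is handy for checking the device output of a small configuration after the queue has finished.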

Definition at line 666 of file CLGEMM.cpp.

References CLKernelLibrary::get().

Referenced by CLRNNLayer::configure(), CLWinogradConvolutionLayer::configure(), CLGEMMDeconvolutionLayer::configure(), and CLLSTMLayer::configure().

 {
     configure(CLKernelLibrary::get().get_compile_context(), a, b, c, output, alpha, beta, gemm_info);
 }

◆ configure() [2/2]

void configure	(	const CLCompileContext &	compile_context,
		const ICLTensor *	a,
		const ICLTensor *	b,
		const ICLTensor *	c,
		ICLTensor *	output,
		float	alpha,
		float	beta,
		const GEMMInfo &	gemm_info = `GEMMInfo()`
	)

Initialise the kernel's inputs and output.

Note:
GEMM: General Matrix Multiply - [alpha * A * B + beta * C].
All tensors must have the same data type.
Whilst the first input tensor can be a vector, the second input tensor must be at least a matrix.

Parameters

[in]	compile_context	The compile context to be used.
[in]	a	First input tensor (Matrix or Vector A). Data types supported: F16/F32
[in]	b	Second input tensor (Matrix B). Data type supported: same as `a`.
[in]	c	Third input tensor (Matrix C). It can be a nullptr if just the multiplication between `a` and `b` is needed; it is also ignored when `beta` is zero. Data type supported: same as `a`.
[out]	output	Output tensor. Data type supported: same as `a`.
[in]	alpha	Weight of the matrix product
[in]	beta	Weight of matrix C
[in]	gemm_info	(Optional) Specifies if matrix A and/or matrix B have been reshaped and if the reshape of matrix B should happen only on the first run. GEMMInfo also contains information about the reshaping in case matrix A and matrix B have already been transformed.

Definition at line 671 of file CLGEMM.cpp.

References ARM_COMPUTE_ERROR, ARM_COMPUTE_ERROR_ON_NULLPTR, ARM_COMPUTE_ERROR_THROW_ON, ITensorInfo::data_type(), ITensorInfo::dimension(), CLScheduler::get(), ITensor::info(), arm_compute::helpers::float_ops::is_zero(), arm_compute::NATIVE_V1, GEMMInfo::reinterpret_input_as_3d(), GEMMInfo::reshape_b_only_on_first_run(), arm_compute::RESHAPED, arm_compute::RESHAPED_ONLY_RHS, arm_compute::RESHAPED_V1, GEMMInfo::retain_internal_weights(), CLScheduler::target(), and CLGEMM::validate().

 {
     ARM_COMPUTE_ERROR_ON_NULLPTR(a, b, output);
 
     // Perform validation step
     ARM_COMPUTE_ERROR_THROW_ON(validate(a->info(), b->info(), c != nullptr ? c->info() : nullptr, output->info(), alpha, beta, gemm_info));
 
     // Check if we need to reshape the matrix B only on the first run
     _reshape_b_only_on_first_run = gemm_info.reshape_b_only_on_first_run();
     _is_prepared                 = gemm_info.retain_internal_weights();
     _original_b                  = b;
     _lhs                         = a;
     _dst                         = output;
 
     bool               reinterpret_input_as_3d = gemm_info.reinterpret_input_as_3d();
     const unsigned int m                       = reinterpret_input_as_3d ? (a->info()->dimension(1) * a->info()->dimension(2)) : a->info()->dimension(1);
     const unsigned int n                       = b->info()->dimension(0);
     const unsigned int k                       = a->info()->dimension(0);
     const unsigned int batch_size              = reinterpret_input_as_3d ? a->info()->dimension(3) : a->info()->dimension(2);
 
     // Select GEMMType
     _gemm_kernel_type = auto_select_gemm_kernel(auto_heuristics::CommonQuery{ CLScheduler::get().target(), a->info()->data_type(), m, n, k, batch_size }, _reshape_b_only_on_first_run);
 
     const bool fuse_add_c = (!(helpers::float_ops::is_zero(beta)) && c != nullptr);
 
     const ICLTensor *c_to_use = fuse_add_c ? c : nullptr;
 
     switch(_gemm_kernel_type)
     {
         case CLGEMMKernelType::NATIVE_V1:
         {
             configure_native_v1(compile_context, a, b, c_to_use, output, alpha, beta, gemm_info);
             break;
         }
         case CLGEMMKernelType::RESHAPED_V1:
         {
             configure_reshaped_v1(compile_context, a, b, c_to_use, output, alpha, beta, gemm_info);
             break;
         }
         case CLGEMMKernelType::RESHAPED:
         {
             configure_reshaped_v2(compile_context, a, b, c_to_use, output, alpha, beta, gemm_info);
             break;
         }
         case CLGEMMKernelType::RESHAPED_ONLY_RHS:
         {
             configure_reshaped_only_rhs(compile_context, a, b, c_to_use, output, alpha, beta, gemm_info);
             break;
         }
         default:
         {
             ARM_COMPUTE_ERROR("GEMMType not supported");
         }
     }
 }
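The size computation and the decision to fuse C in the listing above can be isolated as two small functions. The sketch below mirrors them in plain C++; `Dims`, `GemmSizes`, `gemm_sizes` and `fuse_add_c` are hypothetical illustration names, and the 1e-5 tolerance is an assumption standing in for helpers::float_ops::is_zero().

```cpp
#include <array>
#include <cmath>

// Hypothetical stand-in for the first four ITensorInfo dimensions (illustration only).
using Dims = std::array<unsigned int, 4>;

struct GemmSizes
{
    unsigned int m, n, k, batch_size;
};

// Mirrors configure(): with reinterpret_input_as_3d, dimensions 1 and 2 of A are
// folded into M and dimension 3 becomes the batch; otherwise dimension 2 is the batch.
GemmSizes gemm_sizes(const Dims &a, const Dims &b, bool reinterpret_input_as_3d)
{
    GemmSizes s{};
    s.m          = reinterpret_input_as_3d ? a[1] * a[2] : a[1];
    s.n          = b[0];
    s.k          = a[0];
    s.batch_size = reinterpret_input_as_3d ? a[3] : a[2];
    return s;
}

// Mirrors fuse_add_c: C is only passed on when it exists and beta is non-zero.
// The tolerance value is an assumption for this sketch.
bool fuse_add_c(float beta, bool has_c)
{
    return !(std::fabs(beta) <= 1e-5f) && has_c;
}
```

Both values feed the heuristic query, so the same A tensor can lead to a different kernel type depending on reinterpret_input_as_3d.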

◆ operator=() [1/2]

CLGEMM& operator= ( const CLGEMM & )

delete

Prevent instances of this class from being copied (as this class contains pointers).

◆ operator=() [2/2]

CLGEMM& operator= ( CLGEMM && )

default

Default move assignment operator.

◆ prepare()

void prepare ( )

override virtual

Prepare the function for executing.

Any one-off pre-processing step required by the function is handled here.

Note: The prepare stage might not need the backing memory of all the function's buffers to be available in order to execute.

Reimplemented from IFunction.

Definition at line 870 of file CLGEMM.cpp.

References CLTensorAllocator::allocate(), CLTensor::allocator(), IWeightsManager::are_weights_managed(), CLScheduler::enqueue(), CLScheduler::get(), ITensor::mark_as_unused(), arm_compute::NATIVE_V1, CLScheduler::queue(), and IWeightsManager::run().

Referenced by CLRNNLayer::prepare(), CLWinogradConvolutionLayer::prepare(), CLGEMMDeconvolutionLayer::prepare(), CLFullyConnectedLayer::prepare(), CLGEMMConvolutionLayer::prepare(), and CLGEMM::run().

 {
     if(!_is_prepared)
     {
         if(_gemm_kernel_type != CLGEMMKernelType::NATIVE_V1 && _reshape_b_only_on_first_run)
         {
             if(_weights_manager && _weights_manager->are_weights_managed(_original_b))
             {
                 _weights_manager->run(_original_b, _reshape_rhs_kernel_managed.get());
             }
             else
             {
                 // Run transpose kernel and mark original weights tensor as unused
                 _tmp_b.allocator()->allocate();
                 CLScheduler::get().enqueue(*_reshape_rhs_kernel, false);
                 _original_b->mark_as_unused();
             }
         }
         CLScheduler::get().queue().finish();
         _is_prepared = true;
     }
 }
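The interaction between prepare(), run() and reshape_b_only_on_first_run can be modelled with a counter. The class below is a hypothetical plain C++ model (no OpenCL) written from the prepare() and run() listings: when the RHS reshape is one-off, the first run() triggers it through prepare() and later runs skip it; otherwise every run() of a reshaped kernel type repeats it; NATIVE_V1 never reshapes. It ignores the weights manager and the retain_internal_weights() shortcut, which starts the function in the prepared state.

```cpp
// Hypothetical model of CLGEMM's one-off RHS reshape (illustration only, no OpenCL).
class GemmPrepareModel
{
public:
    GemmPrepareModel(bool native, bool reshape_b_only_on_first_run)
        : _native(native), _reshape_b_only_on_first_run(reshape_b_only_on_first_run)
    {
    }

    // One-off work: reshape B once (the real function then marks the original B unused).
    void prepare()
    {
        if(!_is_prepared)
        {
            if(!_native && _reshape_b_only_on_first_run)
            {
                ++rhs_reshapes;
            }
            _is_prepared = true;
        }
    }

    // Every run calls prepare() first, then repeats the RHS reshape only if it is not one-off.
    void run()
    {
        prepare();
        if(!_native && !_reshape_b_only_on_first_run)
        {
            ++rhs_reshapes;
        }
        ++matmuls;
    }

    int rhs_reshapes = 0;
    int matmuls      = 0;

private:
    bool _native;
    bool _reshape_b_only_on_first_run;
    bool _is_prepared = false;
};
```

This is why the option suits constant weights: the reshape cost is paid once, at the price of the original B no longer being usable afterwards.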

◆ run()

void run ( )

override virtual

Run the kernels contained in the function.

For Neon kernels:

Multi-threading is used for the kernels which are parallelisable.
By default std::thread::hardware_concurrency() threads are used.

Note: CPPScheduler::set_num_threads() can be used to manually set the number of threads

For OpenCL kernels:

All the kernels are enqueued on the queue associated with CLScheduler.
The queue is then flushed.

Note:
The function will not block until the kernels are executed. It is the user's responsibility to wait.
Will call prepare() on the first run if it has not been done already.
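Because run() returns once the kernels are enqueued, the output is only safe to read after the caller waits on the queue, for example with CLScheduler::get().queue().finish() as prepare() does. The sketch below is a hypothetical single-threaded toy queue (no OpenCL; `ToyQueue` and `toy_run` are illustration names) that defers every command until finish() to show why reading the output straight after run() is unsafe. Real OpenCL queues execute asynchronously rather than strictly at finish().

```cpp
#include <functional>
#include <utility>
#include <vector>

// Hypothetical toy command queue (illustration only): enqueue() records work,
// a flush only counts as "submitted", and finish() is where work completes.
class ToyQueue
{
public:
    void enqueue(std::function<void()> cmd, bool flush)
    {
        _pending.push_back(std::move(cmd));
        if(flush)
        {
            ++flushes; // Submission does not imply completion
        }
    }

    // Waits (here: executes) until every pending command has completed.
    void finish()
    {
        for(auto &cmd : _pending)
        {
            cmd();
        }
        _pending.clear();
    }

    int flushes = 0;

private:
    std::vector<std::function<void()>> _pending;
};

// A run() in the style of CLGEMM::run(): enqueue kernels, flush on the last one, return.
void toy_run(ToyQueue &q, float &output)
{
    q.enqueue([&output] { output = 0.0f; }, false);
    q.enqueue([&output] { output += 42.0f; }, true);
}
```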

Implements IFunction.

Definition at line 778 of file CLGEMM.cpp.

References IWeightsManager::are_weights_managed(), ARM_COMPUTE_ERROR, BorderSize::bottom, CLScheduler::enqueue(), CLScheduler::get(), ITensor::info(), arm_compute::NATIVE_V1, ITensorInfo::padding(), CLGEMM::prepare(), arm_compute::RESHAPED, arm_compute::RESHAPED_ONLY_RHS, arm_compute::RESHAPED_V1, IWeightsManager::run(), and BorderSize::top.

Referenced by CLRNNLayer::run(), CLWinogradConvolutionLayer::run(), CLGEMMDeconvolutionLayer::run(), CLFullyConnectedLayer::run(), CLLSTMLayer::run(), and CLGEMMConvolutionLayer::run().

 {
     prepare();
     MemoryGroupResourceScope scope_mg(_memory_group);
 
     // Run matrix multiply kernel
     switch(_gemm_kernel_type)
     {
         case CLGEMMKernelType::NATIVE_V1:
         {
             CLScheduler::get().enqueue(*_mm_kernel, true);
             break;
         }
         case CLGEMMKernelType::RESHAPED_V1:
         {
             // Run interleave kernel
             CLScheduler::get().enqueue(*_reshape_lhs_kernel, false);
 
             if(!_reshape_b_only_on_first_run)
             {
                 // Run transpose kernel
                 if(_weights_manager && _weights_manager->are_weights_managed(_original_b))
                 {
                     _weights_manager->run(_original_b, _reshape_rhs_kernel_managed.get());
                 }
                 else
                 {
                     CLScheduler::get().enqueue(*_reshape_rhs_kernel, false);
                 }
             }
 
             CLScheduler::get().enqueue(*_mm_kernel, true);
             break;
         }
         case CLGEMMKernelType::RESHAPED:
         {
             // Run interleave kernel
             CLScheduler::get().enqueue(*_reshape_lhs_kernel, false);
 
             if(!_reshape_b_only_on_first_run)
             {
                 // Run transpose kernel
                 if(_weights_manager && _weights_manager->are_weights_managed(_original_b))
                 {
                     _weights_manager->run(_original_b, _reshape_rhs_kernel_managed.get());
                 }
                 else
                 {
                     CLScheduler::get().enqueue(*_reshape_rhs_kernel, false);
                 }
             }
 
             CLScheduler::get().enqueue(*_mm_reshaped_kernel, true);
             break;
         }
         case CLGEMMKernelType::RESHAPED_ONLY_RHS:
         {
             if(!_reshape_b_only_on_first_run)
             {
                 // Run transpose kernel
                 if(_weights_manager && _weights_manager->are_weights_managed(_original_b))
                 {
                     _weights_manager->run(_original_b, _reshape_rhs_kernel_managed.get());
                 }
                 else
                 {
                     CLScheduler::get().enqueue(*_reshape_rhs_kernel, false);
                 }
             }
             // In case of RESHAPED_ONLY_RHS, we need to check the padding requirement
             // Check if the lhs or dst tensors have padding
             const unsigned int cross_plane_pad_lhs = _lhs->info()->padding().top + _lhs->info()->padding().bottom;
             const unsigned int cross_plane_pad_dst = _dst->info()->padding().top + _dst->info()->padding().bottom;
 
             bool has_pad_y = (cross_plane_pad_lhs != 0) || (cross_plane_pad_dst != 0);
             if(has_pad_y)
             {
                 CLScheduler::get().enqueue(*_mm_reshaped_only_rhs_fallback_kernel, true);
             }
             else
             {
                 CLScheduler::get().enqueue(*_mm_reshaped_only_rhs_kernel, true);
             }
             break;
         }
         default:
         {
             ARM_COMPUTE_ERROR("GEMMType not supported");
         }
     }
 }

◆ validate()

Status validate	(	const ITensorInfo *	a,
		const ITensorInfo *	b,
		const ITensorInfo *	c,
		const ITensorInfo *	output,
		float	alpha,
		float	beta,
		const GEMMInfo &	gemm_info = `GEMMInfo()`
	)

static

Static function to check if given info will lead to a valid configuration of CLGEMM.

Parameters

[in]	a	First input tensor info (Matrix or Vector A). Data types supported: F16/F32
[in]	b	Second input tensor info (Matrix B). Data type supported: same as `a`.
[in]	c	Third input tensor info (Matrix C). It can be a nullptr if just the multiplication between `a` and `b` is needed; it is also ignored when `beta` is zero. Data type supported: same as `a`.
[in]	output	Output tensor info. Data type supported: same as `a`.
[in]	alpha	Weight of the matrix product
[in]	beta	Weight of matrix C
[in]	gemm_info	(Optional) Specifies if matrix A and/or matrix B have been reshaped and if the reshape of matrix B should happen only on the first run.

Returns: A status indicating whether the given configuration is valid.
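validate() lets callers reject a configuration before any OpenCL resources are created, and configure() calls it itself through ARM_COMPUTE_ERROR_THROW_ON. The sketch below models that validate-first pattern in plain C++ with hypothetical `ToyStatus`, `ToyTensorInfo` and `toy_validate` types (not the Compute Library Status or ITensorInfo classes). It checks only the F16/F32 and same-data-type rules stated on this page plus the inner-dimension match any GEMM needs; the real validate() also delegates to per-kernel validators.

```cpp
#include <string>

// Hypothetical stand-ins for DataType, ITensorInfo and Status (illustration only).
enum class ToyDataType { F16, F32, QASYMM8 };

struct ToyTensorInfo
{
    ToyDataType  data_type;
    unsigned int dim0; // Width: K for A, N for B and the output
    unsigned int dim1; // Height: M for A and the output, K for B
};

struct ToyStatus
{
    bool        ok;
    std::string error;
};

// Checks a subset of the constraints documented for CLGEMM::validate().
ToyStatus toy_validate(const ToyTensorInfo &a, const ToyTensorInfo &b, const ToyTensorInfo *c,
                       const ToyTensorInfo &output)
{
    if(a.data_type != ToyDataType::F16 && a.data_type != ToyDataType::F32)
    {
        return {false, "Data type of a must be F16 or F32"};
    }
    const bool same_type = b.data_type == a.data_type && output.data_type == a.data_type &&
                           (c == nullptr || c->data_type == a.data_type);
    if(!same_type)
    {
        return {false, "All tensors must have the same data type"};
    }
    if(a.dim0 != b.dim1)
    {
        return {false, "Inner dimension K of a and b does not match"};
    }
    return {true, ""};
}
```

A caller would check `ok` and only then configure, which is the same order configure() enforces internally.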

Definition at line 727 of file CLGEMM.cpp.

References ARM_COMPUTE_RETURN_ERROR_MSG, ARM_COMPUTE_RETURN_ON_ERROR, ITensorInfo::data_type(), ITensorInfo::dimension(), CLScheduler::get(), arm_compute::helpers::float_ops::is_zero(), arm_compute::NATIVE_V1, GEMMInfo::reinterpret_input_as_3d(), GEMMInfo::reshape_b_only_on_first_run(), arm_compute::RESHAPED, arm_compute::RESHAPED_ONLY_RHS, arm_compute::RESHAPED_V1, and CLScheduler::target().

Referenced by CLGEMM::configure(), CLRNNLayer::validate(), CLWinogradConvolutionLayer::validate(), CLGEMMDeconvolutionLayer::validate(), and CLLSTMLayer::validate().

 {
     // Get the GPU target
     bool               reinterpret_input_as_3d = gemm_info.reinterpret_input_as_3d();
     const unsigned int m                       = reinterpret_input_as_3d ? (a->dimension(1) * a->dimension(2)) : a->dimension(1);
     const unsigned int n                       = b->dimension(0);
     const unsigned int k                       = a->dimension(0);
     const unsigned int batch_size              = reinterpret_input_as_3d ? a->dimension(3) : a->dimension(2);
 
     // Select GEMMType
     CLGEMMKernelType gemm_kernel_type = auto_select_gemm_kernel(auto_heuristics::CommonQuery
     {
         CLScheduler::get().target(), a->data_type(), m, n, k, batch_size,
     },
     gemm_info.reshape_b_only_on_first_run());
 
     const bool fuse_add_c = (!(helpers::float_ops::is_zero(beta)) && c != nullptr);
 
     const ITensorInfo *c_to_use = fuse_add_c ? c : nullptr;
 
     switch(gemm_kernel_type)
     {
         case CLGEMMKernelType::NATIVE_V1:
         {
             ARM_COMPUTE_RETURN_ON_ERROR(validate_native_v1(a, b, c_to_use, output, alpha, beta, gemm_info));
             break;
         }
         case CLGEMMKernelType::RESHAPED_V1:
         {
             ARM_COMPUTE_RETURN_ON_ERROR(validate_reshaped_v1(a, b, c_to_use, output, alpha, beta, gemm_info));
             break;
         }
         case CLGEMMKernelType::RESHAPED:
         {
             ARM_COMPUTE_RETURN_ON_ERROR(validate_reshaped(a, b, c_to_use, output, alpha, beta, gemm_info));
             break;
         }
         case CLGEMMKernelType::RESHAPED_ONLY_RHS:
         {
             ARM_COMPUTE_RETURN_ON_ERROR(validate_reshaped_only_rhs(a, b, c_to_use, output, alpha, beta, gemm_info));
             break;
         }
         default:
         {
             ARM_COMPUTE_RETURN_ERROR_MSG("GEMMType not supported");
         }
     }
 
     return Status{};
 }

The documentation for this class was generated from the following files:

arm_compute/runtime/CL/functions/CLGEMM.h
src/runtime/CL/functions/CLGEMM.cpp
