Basic function to execute GEMM on OpenCL.

#include <ClGemm.h>

Public Member Functions
	ClGemm ()
	Constructor.

void	configure (const CLCompileContext &compile_context, ITensorInfo *a, ITensorInfo *b, ITensorInfo *c, ITensorInfo *output, float alpha, float beta, const GEMMInfo &gemm_info)
	Initialise the kernel's inputs and output.

void	run (ITensorPack &tensors) override
	Run the kernels contained in the function.

void	prepare (ITensorPack &constants) override
	Prepare the function for executing.

experimental::MemoryRequirements	workspace () const override
	Return the memory requirements of the workspace.

Public Member Functions inherited from ICLOperator
	ICLOperator (IRuntimeContext *ctx=nullptr)
	Constructor.

	ICLOperator (const ICLOperator &)=delete
	Prevent instances of this class from being copied (As this class contains pointers)

	ICLOperator (ICLOperator &&)=default
	Default move constructor.

ICLOperator &	operator= (const ICLOperator &)=delete
	Prevent instances of this class from being copied (As this class contains pointers)

ICLOperator &	operator= (ICLOperator &&)=default
	Default move assignment operator.

Public Member Functions inherited from IOperator
virtual	~IOperator ()=default
	Destructor.

Static Public Member Functions
static Status	validate (const ITensorInfo *a, const ITensorInfo *b, const ITensorInfo *c, const ITensorInfo *output, float alpha, float beta, const GEMMInfo &gemm_info)
	Static function to check if given info will lead to a valid configuration.

Detailed Description

Basic function to execute GEMM on OpenCL.

This function calls the following OpenCL kernels:

kernels::ClGemmReshapeLhsMatrixKernel (only if RESHAPED is selected by the select_gemm_kernel() method)
kernels::ClGemmReshapeRhsMatrixKernel (only if RESHAPED, RESHAPED_ONLY_RHS or RESHAPED_ONLY_RHS_MMUL is selected by the select_gemm_kernel() method)
kernels::ClGemmMatrixMultiplyNativeKernel (only if NATIVE is selected by the select_gemm_kernel() method)
kernels::ClGemmMatrixMultiplyReshapedKernel (only if RESHAPED is selected by the select_gemm_kernel() method)
kernels::ClGemmMatrixMultiplyReshapedOnlyRhsKernel (only if RESHAPED_ONLY_RHS is selected by the select_gemm_kernel() method)
kernels::ClGemmMatrixMultiplyReshapedOnlyRhsMMULKernel (only if RESHAPED_ONLY_RHS_MMUL is selected by the select_gemm_kernel() method)
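
The mapping from the selected GEMM type to the kernels involved can be sketched in plain C++. This is an illustration only: GemmKernelType and kernels_for() are hypothetical names, not part of the library, and the RESHAPED_ONLY_RHS_MMUL entry includes the RHS reshape because the run() listing on this page enqueues it for that case.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for CLGEMMKernelType, limited to the four values named above.
enum class GemmKernelType
{
    Native,
    Reshaped,
    ReshapedOnlyRhs,
    ReshapedOnlyRhsMmul
};

// Returns the kernels associated with a selection, in the order run() enqueues them.
// The RHS reshape may instead run once in prepare(), depending on the configuration.
std::vector<std::string> kernels_for(GemmKernelType type)
{
    switch (type)
    {
        case GemmKernelType::Native:
            return {"ClGemmMatrixMultiplyNativeKernel"};
        case GemmKernelType::Reshaped:
            return {"ClGemmReshapeLhsMatrixKernel", "ClGemmReshapeRhsMatrixKernel",
                    "ClGemmMatrixMultiplyReshapedKernel"};
        case GemmKernelType::ReshapedOnlyRhs:
            return {"ClGemmReshapeRhsMatrixKernel", "ClGemmMatrixMultiplyReshapedOnlyRhsKernel"};
        case GemmKernelType::ReshapedOnlyRhsMmul:
            return {"ClGemmReshapeRhsMatrixKernel", "ClGemmMatrixMultiplyReshapedOnlyRhsMMULKernel"};
    }
    return {}; // Not reached for valid enum values
}
```

Only the RESHAPED path touches the LHS reshape kernel; the two RESHAPED_ONLY_RHS variants leave the LHS untouched.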

Definition at line 57 of file ClGemm.h.

Constructor & Destructor Documentation

◆ ClGemm()

ClGemm ( )

Constructor.

Definition at line 223 of file ClGemm.cpp.

 ClGemm::ClGemm()
     : _reshape_lhs_kernel(std::make_unique<ClGemmReshapeLhsMatrixKernel>()),
       _reshape_rhs_kernel(std::make_unique<ClGemmReshapeRhsMatrixKernel>()),
       _mm_native_kernel(std::make_unique<ClGemmMatrixMultiplyNativeKernel>()),
       _mm_reshaped_kernel(std::make_unique<ClGemmMatrixMultiplyReshapedKernel>()),
       _mm_reshaped_only_rhs_kernel(std::make_unique<ClGemmMatrixMultiplyReshapedOnlyRhsKernel>()),
       _mm_reshaped_only_rhs_mmul_kernel(std::make_unique<ClGemmMatrixMultiplyReshapedOnlyRhsMMULKernel>()),
       _tmp_a(),
       _tmp_b(),
       _reshape_b_only_on_first_run(false),
       _gemm_kernel_type(CLGEMMKernelType::NATIVE),
       _is_prepared(false),
       _aux_mem(AuxTensorIdx::Count)
 {
 }

References arm_compute::NATIVE.

Member Function Documentation

◆ configure()

void configure	(	const CLCompileContext &	compile_context,
		ITensorInfo *	a,
		ITensorInfo *	b,
		ITensorInfo *	c,
		ITensorInfo *	output,
		float	alpha,
		float	beta,
		const GEMMInfo &	gemm_info
	)

Initialise the kernel's inputs and output.

Valid data layouts:

All

Valid data type configurations:

src0	src1	src2	dst
F32	F32	F32	F32
F16	F16	F16	F16

Note:

GEMM: General Matrix Multiply - [alpha * A * B + beta * C].
All tensors must have the same data type.
Whilst the first input tensor can be a vector, the second input tensor must be at least a matrix.
Batched GEMM only allows RHS tensor's rank to be <= 3.
Batched GEMM only supports broadcasting cases where RHS rank < LHS rank but not the other way around.
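
As a point of reference for the formula alpha * A * B + beta * C, the following is a minimal CPU sketch on row-major std::vector buffers. It does not use the library; reference_gemm() and its parameter layout are assumptions made for illustration, and passing a null pointer for C mirrors the case where only the product of A and B is needed.

```cpp
#include <cstddef>
#include <vector>

// Reference GEMM on row-major buffers: D = alpha * A * B + beta * C.
// A is m x k, B is k x n, C and D are m x n.
// Pass c == nullptr to compute alpha * A * B only.
std::vector<float> reference_gemm(const std::vector<float> &a,
                                  const std::vector<float> &b,
                                  const std::vector<float> *c,
                                  std::size_t m, std::size_t n, std::size_t k,
                                  float alpha, float beta)
{
    std::vector<float> d(m * n, 0.0f);
    for (std::size_t row = 0; row < m; ++row)
    {
        for (std::size_t col = 0; col < n; ++col)
        {
            // Dot product of one row of A with one column of B
            float acc = 0.0f;
            for (std::size_t i = 0; i < k; ++i)
            {
                acc += a[row * k + i] * b[i * n + col];
            }
            const float bias = (c != nullptr) ? beta * (*c)[row * n + col] : 0.0f;
            d[row * n + col] = alpha * acc + bias;
        }
    }
    return d;
}
```

A reference of this kind is mainly useful for checking the output of an accelerated path on small inputs.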

Parameters

[in]	compile_context	The compile context to be used.
[in]	a	First input tensor (Matrix or Vector A). Data types supported: F16/F32
[in]	b	Second input tensor (Matrix B). Data type supported: same as `a`.
[in]	c	Third input tensor (Matrix C). It can be a nullptr if just the multiplication between `a` and `b` is needed. Data type supported: same as `a`.
[out]	output	Output tensor. Data type supported: same as `a`
[in]	alpha	Weight of the matrix product
[in]	beta	Weight of matrix C
[in]	gemm_info	(Optional) Specifies if the matrix A and/or matrix B have been reshaped and if the reshape of matrix B should happen only for the first run. GEMMInfo also contains information about the reshaping in case matrix A and matrix B have been already transformed.

Definition at line 657 of file ClGemm.cpp.

 {
     ARM_COMPUTE_ERROR_ON_NULLPTR(a, b, output);
  
     // Perform validation step
     ARM_COMPUTE_ERROR_THROW_ON(validate(a, b, c, output, alpha, beta, gemm_info));
     ARM_COMPUTE_LOG_PARAMS(a, b, c, output, alpha, beta, gemm_info);
  
     // Check if we need to reshape the matrix B only on the first run
     _reshape_b_only_on_first_run = gemm_info.reshape_b_only_on_first_run();
     _is_prepared                 = gemm_info.retain_internal_weights();
  
     bool               reinterpret_input_as_3d = gemm_info.reinterpret_input_as_3d();
     const unsigned int m          = reinterpret_input_as_3d ? (a->dimension(1) * a->dimension(2)) : a->dimension(1);
     const unsigned int n          = b->dimension(0);
     const unsigned int k          = a->dimension(0);
     const unsigned int batch_size = reinterpret_input_as_3d ? a->dimension(3) : a->dimension(2);
  
     // Select GEMMType
     _gemm_kernel_type = auto_select_gemm_kernel(
         auto_heuristics::CommonQuery{CLScheduler::get().target(), a->data_type(), m, n, k, batch_size},
         _reshape_b_only_on_first_run, b->are_values_constant());
  
     const bool fuse_add_c = (!(helpers::float_ops::is_zero(beta)) && c != nullptr);
  
     ITensorInfo *c_to_use = fuse_add_c ? c : nullptr;
  
     switch (_gemm_kernel_type)
     {
         case CLGEMMKernelType::NATIVE:
         {
             configure_native(compile_context, a, b, c_to_use, output, alpha, beta, gemm_info);
             break;
         }
         case CLGEMMKernelType::RESHAPED:
         {
             configure_reshaped(compile_context, a, b, c_to_use, output, alpha, beta, gemm_info);
             break;
         }
         case CLGEMMKernelType::RESHAPED_ONLY_RHS:
         {
             configure_reshaped_only_rhs(compile_context, a, b, c_to_use, output, alpha, beta, gemm_info);
             break;
         }
         case CLGEMMKernelType::RESHAPED_ONLY_RHS_MMUL:
         {
             configure_reshaped_only_rhs_mmul(compile_context, a, b, c_to_use, output, alpha, beta, gemm_info);
             break;
         }
         default:
         {
             ARM_COMPUTE_ERROR("GEMMType not supported");
         }
     }
 }

References ARM_COMPUTE_ERROR, ARM_COMPUTE_ERROR_ON_NULLPTR, ARM_COMPUTE_ERROR_THROW_ON, ARM_COMPUTE_LOG_PARAMS, arm_compute::test::validation::b, ITensorInfo::data_type(), ITensorInfo::dimension(), CLScheduler::get(), arm_compute::helpers::float_ops::is_zero(), arm_compute::NATIVE, GEMMInfo::reinterpret_input_as_3d(), GEMMInfo::reshape_b_only_on_first_run(), arm_compute::RESHAPED, arm_compute::RESHAPED_ONLY_RHS, arm_compute::RESHAPED_ONLY_RHS_MMUL, GEMMInfo::retain_internal_weights(), CLScheduler::target(), and ClGemm::validate().

Referenced by ClWinogradConv2d::configure().
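
The way configure() derives m, n, k and the batch size, and decides whether C is fused into the multiplication, can be illustrated without the library. In this sketch GemmShape, derive_gemm_shape() and should_fuse_add_c() are hypothetical names, the arrays hold dimension(0) to dimension(3) of each tensor, and the epsilon comparison is only an assumed stand-in for helpers::float_ops::is_zero().

```cpp
#include <array>
#include <cmath>
#include <limits>

// Hypothetical plain struct holding the values configure() computes.
struct GemmShape
{
    unsigned int m;
    unsigned int n;
    unsigned int k;
    unsigned int batch_size;
};

// a_dims and b_dims hold dimension(0)..dimension(3) of the LHS and RHS tensors.
GemmShape derive_gemm_shape(const std::array<unsigned int, 4> &a_dims,
                            const std::array<unsigned int, 4> &b_dims,
                            bool reinterpret_input_as_3d)
{
    GemmShape shape{};
    // When the input is reinterpreted as 3D, dimensions 1 and 2 collapse into m
    shape.m          = reinterpret_input_as_3d ? a_dims[1] * a_dims[2] : a_dims[1];
    shape.n          = b_dims[0];
    shape.k          = a_dims[0];
    shape.batch_size = reinterpret_input_as_3d ? a_dims[3] : a_dims[2];
    return shape;
}

// C is passed to the kernel only when it exists and beta is non-zero.
// Assumption: an epsilon test approximates helpers::float_ops::is_zero().
bool should_fuse_add_c(bool has_c, float beta)
{
    const bool beta_is_zero = std::fabs(beta) <= std::numeric_limits<float>::epsilon();
    return has_c && !beta_is_zero;
}
```

With a_dims = {64, 8, 4, 2} the shape is m = 8, batch_size = 4 in the default case, and m = 32, batch_size = 2 when the input is reinterpreted as 3D.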

◆ prepare()

void prepare ( ITensorPack & constants )

override virtual

Prepare the function for executing.

Any one-off pre-processing step required by the function is handled here.

Parameters

[in] constants Vector that contains the constant tensors.

Note: The prepare stage might not need all the function's buffers' backing memory to be available in order to execute.

Reimplemented from ICLOperator.

Definition at line 894 of file ClGemm.cpp.

 {
     if (!_is_prepared)
     {
         const ITensor *src1 = constants.get_const_tensor(ACL_SRC_1);
         ICLTensor     *rhs_aux =
             utils::cast::polymorphic_downcast<ICLTensor *>(constants.get_tensor(offset_int_vec(RhsReshape)));
  
         // If memory for RHS is persistent and src1 is provided re-transform else assume that RHS is transformed
         if ((_aux_mem[AuxTensorIdx::RhsReshape].lifetime == MemoryLifetime::Persistent) &&
             (src1 != nullptr && rhs_aux != nullptr) && rhs_aux)
         {
             ARM_COMPUTE_LOG_INFO_WITH_FUNCNAME_ACL("Transforming RHS Matrix!");
  
             CLAuxTensorHandler rhs_reshaped(_tmp_b, *rhs_aux);
             ARM_COMPUTE_ERROR_ON(rhs_reshaped.get()->cl_buffer().get() == nullptr);
  
             ITensorPack reshape_rhs_pack{{ACL_SRC, src1}, {ACL_DST, rhs_reshaped.get()}};
             CLScheduler::get().enqueue_op(*_reshape_rhs_kernel, reshape_rhs_pack, true);
         }
         _is_prepared = true;
     }
 }

References arm_compute::ACL_DST, arm_compute::ACL_SRC, arm_compute::ACL_SRC_1, ARM_COMPUTE_ERROR_ON, ARM_COMPUTE_LOG_INFO_WITH_FUNCNAME_ACL, ICLTensor::cl_buffer(), CLScheduler::enqueue_op(), CLScheduler::get(), CLAuxTensorHandler::get(), ITensorPack::get_const_tensor(), ITensorPack::get_tensor(), and arm_compute::offset_int_vec().

Referenced by ClWinogradConv2d::prepare(), and ClGemm::run().
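
The one-off pattern used by prepare() (a guard flag, an optional transform of the constant input, then marking the operator as prepared) can be sketched as follows. OneOffPrepare is a hypothetical class, and the reversal is only a placeholder for the RHS reshape kernel.

```cpp
#include <vector>

// Hypothetical operator: transforms constant weights once, then reuses the result.
class OneOffPrepare
{
public:
    // Returns true only when the transform actually ran during this call.
    bool prepare(const std::vector<float> *weights)
    {
        if (_is_prepared)
        {
            return false; // Already prepared: nothing to do
        }
        bool transformed = false;
        if (weights != nullptr)
        {
            // Placeholder transform (a reversal) standing in for the RHS reshape
            _reshaped.assign(weights->rbegin(), weights->rend());
            transformed = true;
        }
        // The flag is set even without weights: they are assumed to be transformed already
        _is_prepared = true;
        return transformed;
    }

    const std::vector<float> &reshaped() const
    {
        return _reshaped;
    }

private:
    std::vector<float> _reshaped{};
    bool               _is_prepared{false};
};
```

Because run() calls prepare() on every invocation, the guard flag is what keeps the transform from being repeated.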

◆ run()

void run ( ITensorPack & tensors )

override virtual

Run the kernels contained in the function.

Parameters

[in] tensors Vector that contains the tensors to operate on.

Reimplemented from ICLOperator.

Definition at line 786 of file ClGemm.cpp.

 {
     const ITensor *lhs = tensors.get_const_tensor(ACL_SRC_0);
     const ITensor *rhs = tensors.get_const_tensor(ACL_SRC_1);
     ITensor       *dst = tensors.get_tensor(ACL_DST);
  
     ARM_COMPUTE_ERROR_ON_NULLPTR(lhs, dst);
  
     CLAuxTensorHandler lhs_reshaped(offset_int_vec(LhsReshape), _tmp_a, tensors, true);
     CLAuxTensorHandler rhs_reshaped(offset_int_vec(RhsReshape), _tmp_b, tensors, true);
  
     // Prepare the consts if needed
     prepare(tensors);
  
     // Run matrix multiply kernel
     switch (_gemm_kernel_type)
     {
         case CLGEMMKernelType::NATIVE:
         {
             CLScheduler::get().enqueue_op(*_mm_native_kernel, tensors, true);
             break;
         }
         case CLGEMMKernelType::RESHAPED:
         {
             // Run interleave kernel
             ITensorPack reshape_lhs_pack{{ACL_SRC, lhs}, {ACL_DST, lhs_reshaped.get()}};
             CLScheduler::get().enqueue_op(*_reshape_lhs_kernel, reshape_lhs_pack, false);
  
             if (!_reshape_b_only_on_first_run)
             {
                 // Run transpose kernel
                 ITensorPack reshape_rhs_pack{{ACL_SRC, rhs}, {ACL_DST, rhs_reshaped.get()}};
                 CLScheduler::get().enqueue_op(*_reshape_rhs_kernel, reshape_rhs_pack, false);
             }
             // Copy original tensor pack and overwrite lhs and rhs with reshaped counterparts
             ITensorPack gemm_reshaped_pack(tensors);
             gemm_reshaped_pack.add_const_tensor(ACL_SRC_0, lhs_reshaped.get());
             gemm_reshaped_pack.add_const_tensor(ACL_SRC_1, rhs_reshaped.get());
  
             if (_gemm_kernel_type == CLGEMMKernelType::RESHAPED)
             {
                 CLScheduler::get().enqueue_op(*_mm_reshaped_kernel, gemm_reshaped_pack, true);
             }
             break;
         }
         case CLGEMMKernelType::RESHAPED_ONLY_RHS:
         {
             if (!_reshape_b_only_on_first_run)
             {
                 // Run transpose kernel
                 ITensorPack reshape_rhs_pack{{ACL_SRC, rhs}, {ACL_DST, rhs_reshaped.get()}};
                 CLScheduler::get().enqueue_op(*_reshape_rhs_kernel, reshape_rhs_pack, false);
             }
             // In case of RESHAPED_ONLY_RHS, we need to check the padding requirement
             // Check if the lhs or dst tensors have padding
             const unsigned int cross_plane_pad_lhs = lhs->info()->padding().top + lhs->info()->padding().bottom;
             const unsigned int cross_plane_pad_dst = dst->info()->padding().top + dst->info()->padding().bottom;
             bool               has_pad_y           = (cross_plane_pad_lhs != 0) || (cross_plane_pad_dst != 0);
  
             // Copy original tensor pack and overwrite rhs with reshaped counterpart
             ITensorPack gemm_reshaped_onlyrhs_pack(tensors);
             gemm_reshaped_onlyrhs_pack.add_const_tensor(ACL_SRC_1, rhs_reshaped.get());
  
             if (has_pad_y)
             {
                 ARM_COMPUTE_ERROR_ON(has_pad_y);
             }
             else
             {
                 CLScheduler::get().enqueue_op(*_mm_reshaped_only_rhs_kernel, gemm_reshaped_onlyrhs_pack, true);
             }
             break;
         }
         case CLGEMMKernelType::RESHAPED_ONLY_RHS_MMUL:
         {
             if (!_reshape_b_only_on_first_run)
             {
                 // Run transpose kernel
                 ITensorPack reshape_rhs_pack{{ACL_SRC, rhs}, {ACL_DST, rhs_reshaped.get()}};
                 CLScheduler::get().enqueue_op(*_reshape_rhs_kernel, reshape_rhs_pack, false);
             }
             // In case of RESHAPED_ONLY_RHS_MMUL, we need to check the padding requirement
             // Check if the lhs or dst tensors have padding
             const unsigned int cross_plane_pad_lhs = lhs->info()->padding().top + lhs->info()->padding().bottom;
             const unsigned int cross_plane_pad_dst = dst->info()->padding().top + dst->info()->padding().bottom;
             bool               has_pad_y           = (cross_plane_pad_lhs != 0) || (cross_plane_pad_dst != 0);
  
             // Copy original tensor pack and overwrite rhs with reshaped counterpart
             ITensorPack gemm_reshaped_onlyrhs_pack(tensors);
             gemm_reshaped_onlyrhs_pack.add_const_tensor(ACL_SRC_1, rhs_reshaped.get());
  
             if (has_pad_y)
             {
                 ARM_COMPUTE_ERROR_ON(has_pad_y);
             }
             else
             {
                 CLScheduler::get().enqueue_op(*_mm_reshaped_only_rhs_mmul_kernel, gemm_reshaped_onlyrhs_pack, true);
             }
             break;
         }
         default:
         {
             ARM_COMPUTE_ERROR("GEMMType not supported");
         }
     }
 }

References arm_compute::ACL_DST, arm_compute::ACL_SRC, arm_compute::ACL_SRC_0, arm_compute::ACL_SRC_1, ITensorPack::add_const_tensor(), ARM_COMPUTE_ERROR, ARM_COMPUTE_ERROR_ON, ARM_COMPUTE_ERROR_ON_NULLPTR, BorderSize::bottom, arm_compute::test::validation::dst, CLScheduler::enqueue_op(), CLScheduler::get(), CLAuxTensorHandler::get(), ITensorPack::get_const_tensor(), ITensorPack::get_tensor(), ITensor::info(), arm_compute::NATIVE, arm_compute::offset_int_vec(), ITensorInfo::padding(), ClGemm::prepare(), arm_compute::RESHAPED, arm_compute::RESHAPED_ONLY_RHS, arm_compute::RESHAPED_ONLY_RHS_MMUL, and BorderSize::top.

Referenced by ClWinogradConv2d::run().
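
In the RESHAPED_ONLY_RHS and RESHAPED_ONLY_RHS_MMUL branches of run(), the matrix multiply kernel is enqueued only when neither the LHS nor the destination tensor has top or bottom padding. A minimal sketch of that check, using a hypothetical Padding struct in place of BorderSize:

```cpp
// Hypothetical stand-in for BorderSize, reduced to the two fields used by the check.
struct Padding
{
    unsigned int top;
    unsigned int bottom;
};

// Returns true when either tensor carries padding along the Y axis,
// which is the condition run() treats as an error for these kernels.
bool has_cross_plane_padding(const Padding &lhs, const Padding &dst)
{
    const unsigned int cross_plane_pad_lhs = lhs.top + lhs.bottom;
    const unsigned int cross_plane_pad_dst = dst.top + dst.bottom;
    return (cross_plane_pad_lhs != 0) || (cross_plane_pad_dst != 0);
}
```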

◆ validate()

Status validate	(	const ITensorInfo *	a,
		const ITensorInfo *	b,
		const ITensorInfo *	c,
		const ITensorInfo *	output,
		float	alpha,
		float	beta,
		const GEMMInfo &	gemm_info
	)

static

Static function to check if given info will lead to a valid configuration.

Similar to ClGemm::configure()

Returns: a status

Definition at line 720 of file ClGemm.cpp.

 {
     // Derive the GEMM dimensions (m, n, k) and the batch size
     bool               reinterpret_input_as_3d = gemm_info.reinterpret_input_as_3d();
     const unsigned int m          = reinterpret_input_as_3d ? (a->dimension(1) * a->dimension(2)) : a->dimension(1);
     const unsigned int n          = b->dimension(0);
     const unsigned int k          = a->dimension(0);
     const unsigned int batch_size = reinterpret_input_as_3d ? a->dimension(3) : a->dimension(2);
  
     // Check data type early because the auto_select_gemm_kernel has assertions on supported data types
     ARM_COMPUTE_RETURN_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN(a, 1, DataType::F32, DataType::F16);
  
     // Select GEMMType
     CLGEMMKernelType gemm_kernel_type = auto_select_gemm_kernel(
         auto_heuristics::CommonQuery{
             CLScheduler::get().target(),
             a->data_type(),
             m,
             n,
             k,
             batch_size,
         },
         gemm_info.reshape_b_only_on_first_run(), b->are_values_constant());
  
     const bool fuse_add_c = (!(helpers::float_ops::is_zero(beta)) && c != nullptr);
  
     const ITensorInfo *c_to_use = fuse_add_c ? c : nullptr;
  
     switch (gemm_kernel_type)
     {
         case CLGEMMKernelType::NATIVE:
         {
             ARM_COMPUTE_RETURN_ON_ERROR(validate_native(a, b, c_to_use, output, alpha, beta, gemm_info));
             break;
         }
         case CLGEMMKernelType::RESHAPED:
         {
             ARM_COMPUTE_RETURN_ON_ERROR(validate_reshaped(a, b, c_to_use, output, alpha, beta, gemm_info));
             break;
         }
         case CLGEMMKernelType::RESHAPED_ONLY_RHS:
         {
             ARM_COMPUTE_RETURN_ON_ERROR(validate_reshaped_only_rhs(a, b, c_to_use, output, alpha, beta, gemm_info));
             break;
         }
         case CLGEMMKernelType::RESHAPED_ONLY_RHS_MMUL:
         {
             ARM_COMPUTE_RETURN_ON_ERROR(
                 validate_reshaped_only_rhs_mmul(a, b, c_to_use, output, alpha, beta, gemm_info));
             break;
         }
         default:
         {
             ARM_COMPUTE_RETURN_ERROR_MSG("GEMMType not supported");
         }
     }
  
     return Status{};
 }

References ARM_COMPUTE_RETURN_ERROR_MSG, ARM_COMPUTE_RETURN_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN, ARM_COMPUTE_RETURN_ON_ERROR, arm_compute::test::validation::b, ITensorInfo::data_type(), ITensorInfo::dimension(), arm_compute::F16, arm_compute::F32, CLScheduler::get(), arm_compute::helpers::float_ops::is_zero(), arm_compute::NATIVE, GEMMInfo::reinterpret_input_as_3d(), GEMMInfo::reshape_b_only_on_first_run(), arm_compute::RESHAPED, arm_compute::RESHAPED_ONLY_RHS, arm_compute::RESHAPED_ONLY_RHS_MMUL, and CLScheduler::target().

Referenced by ClGemm::configure(), NEElementwiseUnaryLayer< op >::validate(), NEPReluLayer::validate(), CLPReluLayer::validate(), CLSoftmaxLayerGeneric< IS_LOG >::validate(), NEGEMMConv2d::validate(), CLMatMul::validate(), CLGEMM::validate(), and CLGEMMLowpMatrixMultiplyCore::validate().
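
The validate-before-configure idiom returns a status object instead of aborting, so callers can test a configuration cheaply. The sketch below covers only the data type rules stated on this page (F16 or F32, and the same type for every tensor). SimpleStatus, SimpleDataType and validate_gemm_types() are hypothetical names and do not reproduce the library's Status class.

```cpp
#include <string>

// Hypothetical, simplified status type: ok flag plus an error description.
struct SimpleStatus
{
    bool        ok;
    std::string message;
};

// Hypothetical subset of data types, with one unsupported entry for illustration.
enum class SimpleDataType
{
    F16,
    F32,
    S32
};

// Checks the data type rules from the configuration table above.
SimpleStatus validate_gemm_types(SimpleDataType a, SimpleDataType b, SimpleDataType output)
{
    if (a != SimpleDataType::F16 && a != SimpleDataType::F32)
    {
        return SimpleStatus{false, "a must be F16 or F32"};
    }
    if (b != a || output != a)
    {
        return SimpleStatus{false, "all tensors must have the same data type"};
    }
    return SimpleStatus{true, ""};
}
```

A caller would typically check the returned ok flag first and only then proceed to configure the operator.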

◆ workspace()

experimental::MemoryRequirements workspace ( ) const

override virtual

Return the memory requirements of the workspace.

Reimplemented from ICLOperator.

Definition at line 918 of file ClGemm.cpp.

 {
     return _aux_mem;
 }

Referenced by ClWinogradConv2d::configure().

The documentation for this class was generated from the following files:

src/gpu/cl/operators/ClGemm.h
src/gpu/cl/operators/ClGemm.cpp
