Kernel to multiply two input matrices "A" and "B".

#include <CpuGemmMatrixMultiplyKernel.h>

Data Structures
struct	GemmMatrixMulKernel

Public Member Functions
	CpuGemmMatrixMultiplyKernel ()=default

	ARM_COMPUTE_DISALLOW_COPY_ALLOW_MOVE (CpuGemmMatrixMultiplyKernel)

void	configure (const ITensorInfo *lhs, const ITensorInfo *rhs, ITensorInfo *dst, float alpha, bool is_interleaved, const GEMMReshapeInfo &reshape_info=GEMMReshapeInfo())
	Initialise the kernel's input and output.

void	run_op (ITensorPack &tensors, const Window &window, const ThreadInfo &info) override
	Execute the kernel on the passed window.

const char *	name () const override
	Name of the kernel.

Public Member Functions inherited from ICPPKernel
virtual	~ICPPKernel ()=default
	Default destructor.

virtual void	run (const Window &window, const ThreadInfo &info)
	Execute the kernel on the passed window.

virtual void	run_nd (const Window &window, const ThreadInfo &info, const Window &thread_locator)
	Legacy compatibility layer for implementations which do not support thread_locator. In these cases we simply narrow the interface down to the legacy version.

virtual size_t	get_mws (const CPUInfo &platform, size_t thread_count) const
	Return minimum workload size of the relevant kernel.

Public Member Functions inherited from IKernel
	IKernel ()
	Constructor.

virtual	~IKernel ()=default
	Destructor.

virtual bool	is_parallelisable () const
	Indicates whether or not the kernel is parallelisable.

virtual BorderSize	border_size () const
	The size of the border for that kernel.

const Window &	window () const
	The maximum window the kernel can be executed on.

bool	is_window_configured () const
	Function to check if the embedded window of this kernel has been configured.

Static Public Member Functions
static Status	validate (const ITensorInfo *lhs, const ITensorInfo *rhs, const ITensorInfo *dst, float alpha, bool is_interleaved, const GEMMReshapeInfo &reshape_info)
	Static function to check if given info will lead to a valid configuration of CpuGemmMatrixMultiplyKernel.

static const std::vector< GemmMatrixMulKernel > &	get_available_kernels ()

Static Public Member Functions inherited from ICpuKernel< CpuGemmMatrixMultiplyKernel >
static const auto *	get_implementation (const SelectorType &selector, KernelSelectionType selection_type=KernelSelectionType::Supported)
	Micro-kernel selector.

Additional Inherited Members
Static Public Attributes inherited from ICPPKernel
static constexpr size_t	default_mws = 1

Detailed Description

Kernel to multiply two input matrices "A" and "B".

All elements of the output matrix/vector are multiplied by alpha after the matrix multiplication.

Note: If the output tensor is a matrix, the implementation assumes that the input tensors lhs and rhs are both matrices that have been reshaped with CpuGemmInterleave4x4Kernel and CpuGemmTranspose1xWKernel, respectively. If the output tensor is a vector and the data type is F32, the implementation assumes that the first input tensor lhs is a vector, that the second input tensor rhs is a matrix, and that neither tensor has been reshaped.
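
The arithmetic described above can be illustrated with a plain reference loop. This is only a sketch of the mathematical result, dst = alpha * (A x B), on unreshaped row-major data; the function name `scaled_matmul` and the flat `std::vector` storage are assumptions made for illustration and are not part of the library.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference computation only (hypothetical helper, not a library function):
// dst = alpha * (A x B) with row-major storage.
// A is m x k, B is k x n, dst is m x n. No reshaping is modelled here.
std::vector<float> scaled_matmul(const std::vector<float> &a,
                                 const std::vector<float> &b,
                                 std::size_t m, std::size_t k, std::size_t n,
                                 float alpha)
{
    assert(a.size() == m * k && b.size() == k * n);
    std::vector<float> dst(m * n, 0.0f);
    for (std::size_t row = 0; row < m; ++row)
    {
        for (std::size_t col = 0; col < n; ++col)
        {
            float acc = 0.0f;
            for (std::size_t i = 0; i < k; ++i)
            {
                acc += a[row * k + i] * b[i * n + col];
            }
            // alpha is applied once, after the accumulation has finished
            dst[row * n + col] = alpha * acc;
        }
    }
    return dst;
}
```

Calling it with m = 1 corresponds to the vector-by-matrix case mentioned in the note.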

Definition at line 42 of file CpuGemmMatrixMultiplyKernel.h.

Constructor & Destructor Documentation

◆ CpuGemmMatrixMultiplyKernel()

CpuGemmMatrixMultiplyKernel ( )

default

Member Function Documentation

◆ ARM_COMPUTE_DISALLOW_COPY_ALLOW_MOVE()

ARM_COMPUTE_DISALLOW_COPY_ALLOW_MOVE ( CpuGemmMatrixMultiplyKernel )

◆ configure()

void configure	(	const ITensorInfo *	lhs,
		const ITensorInfo *	rhs,
		ITensorInfo *	dst,
		float	alpha,
		bool	is_interleaved,
		const GEMMReshapeInfo &	reshape_info = GEMMReshapeInfo()
	)

Initialise the kernel's input and output.

Note: If the output tensor is a matrix, the input matrices lhs and rhs should be the outputs of CpuGemmInterleave4x4Kernel and CpuGemmTranspose1xWKernel, respectively. These two kernels change the layout of the original matrices to be more cache-friendly.
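
To give a feel for what such a layout change can look like, here is a minimal sketch for a single 4x4 block. It assumes that the interleave stores the four values of each block column next to each other (a00 a10 a20 a30 a01 ...); the exact layout is defined by CpuGemmInterleave4x4Kernel and is not described on this page. The helper name `interleave_4x4_block` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: rearranges one row-major 4x4 block so that the values
// of each column end up contiguous. This is an assumed layout for illustration.
std::vector<float> interleave_4x4_block(const std::vector<float> &block)
{
    assert(block.size() == 16);
    std::vector<float> out(16);
    for (std::size_t row = 0; row < 4; ++row)
    {
        for (std::size_t col = 0; col < 4; ++col)
        {
            out[col * 4 + row] = block[row * 4 + col];
        }
    }
    return out;
}
```

With this arrangement, a multiply loop that needs one value from each of four consecutive rows reads them from adjacent memory locations.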

Parameters

[in]	lhs	Left-hand side tensor info containing the interleaved Matrix A or the vector A. Data types supported: F16/F32
[in]	rhs	Right-hand side tensor info containing the transposed Matrix B if the first input tensor A is not a vector. If the output tensor is a vector, rhs must contain the matrix B not reshaped. Data type supported: same as `lhs`
[out]	dst	Output tensor to store the result of matrix multiplication. Data type supported: same as `lhs`.
[in]	alpha	Weight of the matrix product
[in]	is_interleaved	(Optional) True if lhs and rhs have been reshaped respectively using CpuGemmInterleave4x4Kernel and CpuGemmTranspose1xWKernel
[in]	reshape_info	(Optional) GEMM reshape info. If is_interleaved = true, this object must contain the information to understand how `lhs` and `rhs` have been reshaped
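
The role of `is_interleaved` and `reshape_info` in sizing the output can be sketched as follows. The logic mirrors the two `tensor_shape.set(...)` calls in the listing for this function; the types `Shape2D` and `ReshapeDims`, the function name, and the sanity check are hypothetical stand-ins, not library code. The listing copies the full lhs shape first, so any higher dimensions are carried over; this sketch only models the first two.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for dimensions 0 and 1 of a tensor shape and for the
// m() and n() accessors of the reshape info.
struct Shape2D
{
    std::size_t width;  // dimension 0
    std::size_t height; // dimension 1
};

struct ReshapeDims
{
    std::size_t m;
    std::size_t n;
};

Shape2D expected_dst_shape(const Shape2D &lhs, const Shape2D &rhs,
                           bool is_interleaved, const ReshapeDims &reshape)
{
    // Sketch-only sanity check: reshaped inputs no longer carry the original
    // matrix sizes, so m and n must come from the reshape info.
    assert(!is_interleaved || (reshape.m > 0 && reshape.n > 0));

    Shape2D dst{};
    dst.width  = is_interleaved ? reshape.n : rhs.width;
    dst.height = is_interleaved ? reshape.m : lhs.height;
    return dst;
}
```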

Definition at line 125 of file CpuGemmMatrixMultiplyKernel.cpp.

 {
     ARM_COMPUTE_ERROR_ON_NULLPTR(lhs, rhs, dst);
  
     // Auto-initialize the dst tensor if it has not been initialized yet
     TensorShape tensor_shape{lhs->tensor_shape()};
     tensor_shape.set(0, is_interleaved ? reshape_info.n() : rhs->dimension(0));
     tensor_shape.set(1, is_interleaved ? reshape_info.m() : lhs->dimension(1));
  
     auto_init_if_empty(*dst, lhs->clone()->set_tensor_shape(tensor_shape));
  
     // Perform validate step
     ARM_COMPUTE_ERROR_THROW_ON(validate_arguments(lhs, rhs, dst, alpha, is_interleaved, reshape_info));
  
     _alpha = alpha;
  
     // Configure kernel window
     Window win{};
  
     // Check if the dst tensor is a vector. If so, the kernel runs the vector-matrix multiplication
     const bool is_dst_vector = (dst->dimension(1) == 1);
     if (is_dst_vector)
     {
         const unsigned int num_elems_processed_per_iteration_x = (lhs->data_type() == DataType::F32) ? 16 : 32;
  
         win = calculate_max_window(*dst, Steps(num_elems_processed_per_iteration_x));
     }
     else
     {
         constexpr unsigned int num_elems_processed_per_iteration_x = 8;
         constexpr unsigned int num_elems_processed_per_iteration_y = 4;
  
         win =
             calculate_max_window(*dst, Steps(num_elems_processed_per_iteration_x, num_elems_processed_per_iteration_y));
     }
  
     const auto uk = CpuGemmMatrixMultiplyKernel::get_implementation(
         DataTypeISASelectorData{lhs->data_type(), CPUInfo::get().get_isa()});
     ARM_COMPUTE_ERROR_ON_NULLPTR(uk);
     _func = uk->ukernel;
  
     ICPPKernel::configure(win);
 }

References ARM_COMPUTE_ERROR_ON_NULLPTR, ARM_COMPUTE_ERROR_THROW_ON, arm_compute::auto_init_if_empty(), arm_compute::calculate_max_window(), ICloneable< T >::clone(), ITensorInfo::data_type(), ITensorInfo::dimension(), arm_compute::test::validation::dst, arm_compute::F32, CPUInfo::get(), ICpuKernel< CpuGemmMatrixMultiplyKernel >::get_implementation(), GEMMReshapeInfo::m(), GEMMReshapeInfo::n(), TensorShape::set(), ITensorInfo::tensor_shape(), and arm_compute::cpu::kernels::validate_arguments().
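
The step selection in the listing above can be restated as a small standalone sketch. The step values (16 or 32 for the vector path, 8 x 4 for the matrix path) are taken from the listing; the types `ElemType` and `Steps2D` and both function names are hypothetical, and the rounding in `num_iterations` is an assumption about how the window end is padded rather than something stated on this page.

```cpp
#include <cassert>
#include <cstddef>

enum class ElemType { F16, F32 }; // hypothetical stand-in for the data type enum

struct Steps2D
{
    std::size_t x;
    std::size_t y;
};

// Mirrors the branch in the listing: the vector path steps only along x,
// the matrix path walks the output in 8 x 4 blocks.
Steps2D select_steps(ElemType type, bool is_dst_vector)
{
    if (is_dst_vector)
    {
        const std::size_t step_x = (type == ElemType::F32) ? 16 : 32;
        return Steps2D{step_x, 1};
    }
    return Steps2D{8, 4};
}

// Assumption: the extent is rounded up to a whole number of steps.
std::size_t num_iterations(std::size_t extent, std::size_t step)
{
    assert(step > 0);
    return (extent + step - 1) / step;
}
```

Under that rounding assumption, a 100-element F32 output vector would take 7 iterations along x.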

◆ get_available_kernels()

const std::vector< CpuGemmMatrixMultiplyKernel::GemmMatrixMulKernel > & get_available_kernels ( )

static

Definition at line 207 of file CpuGemmMatrixMultiplyKernel.cpp.

 {
     return available_kernels;
 }
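
A table of available kernels combined with a selector, as used by get_implementation() in configure(), can be pictured as a "first match wins" lookup. The sketch below is an illustration of that general pattern only: every name in it (`KernelEntry`, `Selector`, `demo_kernels`, `select_kernel`, and the demo entries) is hypothetical, and the real fields of GemmMatrixMulKernel are not shown on this page.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class ElemType { F16, F32 };

struct Selector
{
    ElemType type;
    bool     has_fp16; // hypothetical ISA flag
};

struct KernelEntry
{
    const char *name;
    bool (*is_selected)(const Selector &);
    void (*ukernel)(); // nullptr models a kernel that was not built
};

void run_fp16_demo() {}
void run_fp32_demo() {}
bool wants_fp16(const Selector &s) { return s.type == ElemType::F16 && s.has_fp16; }
bool wants_fp32(const Selector &s) { return s.type == ElemType::F32; }

const std::vector<KernelEntry> &demo_kernels()
{
    static const std::vector<KernelEntry> table = {
        {"demo_fp16", &wants_fp16, &run_fp16_demo},
        {"demo_fp32", &wants_fp32, &run_fp32_demo},
    };
    return table;
}

// Returns the first entry whose predicate accepts the selector, or nullptr.
const KernelEntry *select_kernel(const Selector &sel)
{
    for (const KernelEntry &entry : demo_kernels())
    {
        assert(entry.is_selected != nullptr);
        if (entry.is_selected(sel) && entry.ukernel != nullptr)
        {
            return &entry;
        }
    }
    return nullptr;
}

std::string selected_name(const Selector &sel)
{
    const KernelEntry *entry = select_kernel(sel);
    return entry != nullptr ? std::string(entry->name) : std::string("none");
}
```

A caller would then check the result for nullptr before storing the function pointer, which is what the ARM_COMPUTE_ERROR_ON_NULLPTR(uk) line in configure() does.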

◆ name()

const char * name ( ) const

override virtual

Name of the kernel.

Returns: Kernel name

Implements ICPPKernel.

Definition at line 201 of file CpuGemmMatrixMultiplyKernel.cpp.

 {
     return "CpuGemmMatrixMultiplyKernel";
 }

◆ run_op()

void run_op	(	ITensorPack &	tensors,
		const Window &	window,
		const ThreadInfo &	info
	)

override virtual

Execute the kernel on the passed window.

Warning: If is_parallelisable() returns false then the passed window must be equal to window()

Note: The window has to be a region within the window returned by the window() method; The width of the window has to be a multiple of num_elems_processed_per_iteration().

Parameters

[in]	tensors	A vector containing the tensors to operate on.
[in]	window	Region on which to execute the kernel. (Must be a region of the window returned by window())
[in]	info	Info about executing thread and CPU.
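
The window constraints in the note above can be read as a containment and alignment check. The following is one possible reading in one dimension, written as a minimal sketch; `Range1D` and `is_valid_subrange` are hypothetical, and the checks actually performed by the library are not shown on this page.

```cpp
#include <cassert>

// Hypothetical 1D window: [start, end) traversed in increments of step.
struct Range1D
{
    int start;
    int end;
    int step;
};

// Accepts a sub-range if it lies inside the full range, uses the same step,
// and its start and length are whole multiples of that step.
bool is_valid_subrange(const Range1D &full, const Range1D &sub)
{
    assert(full.step > 0);
    const bool inside    = sub.start >= full.start && sub.end <= full.end && sub.start <= sub.end;
    const bool same_step = sub.step == full.step;
    const bool aligned   = (sub.start - full.start) % full.step == 0 &&
                           (sub.end - sub.start) % full.step == 0;
    return inside && same_step && aligned;
}
```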

Reimplemented from ICPPKernel.

Definition at line 186 of file CpuGemmMatrixMultiplyKernel.cpp.

 {
     ARM_COMPUTE_ERROR_ON_UNCONFIGURED_KERNEL(this);
     ARM_COMPUTE_ERROR_ON_INVALID_SUBWINDOW(IKernel::window(), window);
     ARM_COMPUTE_ERROR_ON(tensors.empty());
     ARM_COMPUTE_ERROR_ON(_func == nullptr);
  
     const ITensor *lhs = tensors.get_const_tensor(TensorType::ACL_SRC_0);
     const ITensor *rhs = tensors.get_const_tensor(TensorType::ACL_SRC_1);
     ITensor       *dst = tensors.get_tensor(TensorType::ACL_DST);
  
     const bool is_dst_vector = (dst->info()->dimension(1) == 1);
     (*_func)(lhs, rhs, dst, window, info, _alpha, is_dst_vector);
 }

References arm_compute::ACL_DST, arm_compute::ACL_SRC_0, arm_compute::ACL_SRC_1, ARM_COMPUTE_ERROR_ON, ARM_COMPUTE_ERROR_ON_INVALID_SUBWINDOW, ARM_COMPUTE_ERROR_ON_UNCONFIGURED_KERNEL, arm_compute::test::validation::dst, ITensorPack::empty(), ITensorPack::get_const_tensor(), ITensorPack::get_tensor(), arm_compute::test::validation::info, and IKernel::window().

◆ validate()

Status validate	(	const ITensorInfo *	lhs,
		const ITensorInfo *	rhs,
		const ITensorInfo *	dst,
		float	alpha,
		bool	is_interleaved,
		const GEMMReshapeInfo &	reshape_info
	)

static

Static function to check if given info will lead to a valid configuration of CpuGemmMatrixMultiplyKernel.

Returns: a status

Definition at line 174 of file CpuGemmMatrixMultiplyKernel.cpp.

 {
     ARM_COMPUTE_RETURN_ON_ERROR(validate_arguments(lhs, rhs, dst, alpha, is_interleaved, reshape_info));
  
     return Status{};
 }

References ARM_COMPUTE_RETURN_ON_ERROR, arm_compute::test::validation::dst, and arm_compute::cpu::kernels::validate_arguments().

Referenced by CpuGemm::validate().
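
The two listings show the same argument check used in two ways: validate() returns the status to the caller, while configure() treats a failed check as a hard error. A minimal sketch of that split is given below. It only covers the data type rules stated in the parameter table of configure() (F16/F32, with rhs and dst matching lhs); the `Status` struct, `ElemType` enum and both function names are hypothetical, and the full set of checks in validate_arguments() is not shown on this page.

```cpp
#include <cassert>
#include <string>

enum class ElemType { F16, F32, S32 }; // S32 stands for any unsupported type

struct Status
{
    bool        ok;
    std::string message;
};

// Reports problems to the caller instead of aborting, like validate().
Status check_types(ElemType lhs, ElemType rhs, ElemType dst)
{
    if (lhs != ElemType::F16 && lhs != ElemType::F32)
    {
        return {false, "lhs must be F16 or F32"};
    }
    if (rhs != lhs)
    {
        return {false, "rhs must have the same data type as lhs"};
    }
    if (dst != lhs)
    {
        return {false, "dst must have the same data type as lhs"};
    }
    return {true, ""};
}

// Treats an invalid configuration as a programming error, like configure().
void configure_checked(ElemType lhs, ElemType rhs, ElemType dst)
{
    const Status status = check_types(lhs, rhs, dst);
    assert(status.ok);
    (void)status; // silences unused-variable warnings when asserts are disabled
}
```

Calling the status-returning check first lets an application reject an unsupported configuration gracefully before it reaches the asserting path.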

The documentation for this class was generated from the following files:

src/cpu/kernels/CpuGemmMatrixMultiplyKernel.h
src/cpu/kernels/CpuGemmMatrixMultiplyKernel.cpp
