GLES Compute kernel to multiply two input matrices "A" and "B" or to multiply a vector "A" by a matrix "B".

#include <GCGEMMMatrixMultiplyKernel.h>


Public Member Functions
	GCGEMMMatrixMultiplyKernel ()
	Default constructor.

	GCGEMMMatrixMultiplyKernel (const GCGEMMMatrixMultiplyKernel &)=delete
	Prevent instances of this class from being copied (as this class contains pointers).

GCGEMMMatrixMultiplyKernel &	operator= (const GCGEMMMatrixMultiplyKernel &)=delete
	Prevent instances of this class from being copied (as this class contains pointers).

	GCGEMMMatrixMultiplyKernel (GCGEMMMatrixMultiplyKernel &&)=default
	Allow instances of this class to be moved.

GCGEMMMatrixMultiplyKernel &	operator= (GCGEMMMatrixMultiplyKernel &&)=default
	Allow instances of this class to be moved.

void	configure (const IGCTensor *input0, const IGCTensor *input1, IGCTensor *output, float alpha, bool is_interleaved_transposed=true, const GEMMReshapeInfo &reshape_info=GEMMReshapeInfo())
	Initialise the kernel's input, output and alpha.

void	run (const Window &window) override
	Enqueue the OpenGL ES shader to process the given window.

Public Member Functions inherited from IGCKernel
	IGCKernel ()
	Constructor.

GCKernel &	kernel ()
	Returns a reference to the GLES kernel of this object.

void	add_1D_tensor_argument (unsigned int &idx, const IGCTensor *tensor, const unsigned int binding_point, const Window &window)
	Add the passed 1D tensor's parameters to the object's kernel's arguments starting from the index idx.

void	add_2D_tensor_argument (unsigned int &idx, const IGCTensor *tensor, const unsigned int binding_point, const Window &window)
	Add the passed 2D tensor's parameters to the object's kernel's arguments starting from the index idx.

void	add_3D_tensor_argument (unsigned int &idx, const IGCTensor *tensor, const unsigned int binding_point, const Window &window)
	Add the passed 3D tensor's parameters to the object's kernel's arguments starting from the index idx.

unsigned int	num_arguments_per_1D_tensor () const
	Returns the number of arguments enqueued per 1D tensor object.

unsigned int	num_arguments_per_2D_tensor () const
	Returns the number of arguments enqueued per 2D tensor object.

unsigned int	num_arguments_per_3D_tensor () const
	Returns the number of arguments enqueued per 3D tensor object.

void	set_lws_hint (gles::NDRange &lws_hint)
	Set the Local-Workgroup-Size hint.

void	set_target (GPUTarget target)
	Set the targeted GPU architecture.

GPUTarget	get_target () const
	Get the targeted GPU architecture.

Public Member Functions inherited from IKernel
	IKernel ()
	Constructor.

virtual	~IKernel ()=default
	Destructor.

virtual bool	is_parallelisable () const
	Indicates whether or not the kernel is parallelisable.

virtual BorderSize	border_size () const
	The size of the border for that kernel.

const Window &	window () const
	The maximum window the kernel can be executed on.

Static Public Member Functions
static Status	validate (const ITensorInfo *input0, const ITensorInfo *input1, const ITensorInfo *output, float alpha, bool is_interleaved_transposed, const GEMMReshapeInfo &reshape_info, GPUTarget gpu_target)
	Static function to check if given info will lead to a valid configuration of GCGEMMMatrixMultiplyKernel.

Detailed Description

GLES Compute kernel to multiply two input matrices "A" and "B" or to multiply a vector "A" by a matrix "B".

All elements of the output matrix/vector are multiplied by alpha.

Attention: The second input tensor must have at least 2 dimensions (a matrix).
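As a CPU reference for the semantics described above, the following standalone C++ sketch computes the same result: every output element is alpha times the corresponding element of A x B, and a vector A is simply the 1 x K case. The `Matrix` struct and `gemm_reference` function are hypothetical helpers, not part of the library; they use plain row-major storage and ignore both the reshaped layouts and GPU execution.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical row-major dense matrix (not an Arm Compute Library type).
struct Matrix
{
    std::size_t        rows;
    std::size_t        cols;
    std::vector<float> data; // element (r, c) is stored at data[r * cols + c]
};

// Reference computation of output = alpha * (A x B).
// A vector "A" is the special case a.rows == 1.
Matrix gemm_reference(const Matrix &a, const Matrix &b, float alpha)
{
    assert(a.cols == b.rows); // inner dimensions must agree
    Matrix out{ a.rows, b.cols, std::vector<float>(a.rows * b.cols, 0.0f) };
    for(std::size_t r = 0; r < a.rows; ++r)
    {
        for(std::size_t c = 0; c < b.cols; ++c)
        {
            float acc = 0.0f;
            for(std::size_t k = 0; k < a.cols; ++k)
            {
                acc += a.data[r * a.cols + k] * b.data[k * b.cols + c];
            }
            out.data[r * out.cols + c] = alpha * acc; // every output element is scaled by alpha
        }
    }
    return out;
}
```

For example, the vector [1, 2] times the matrix [[3, 4], [5, 6]] with alpha = 2 gives [26, 32].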

Definition at line 39 of file GCGEMMMatrixMultiplyKernel.h.

Constructor & Destructor Documentation

◆ GCGEMMMatrixMultiplyKernel() [1/3]

GCGEMMMatrixMultiplyKernel ( )

Default constructor.

Definition at line 183 of file GCGEMMMatrixMultiplyKernel.cpp.

     : _input0(nullptr), _input1(nullptr), _output(nullptr)
 {
 }

◆ GCGEMMMatrixMultiplyKernel() [2/3]

GCGEMMMatrixMultiplyKernel ( const GCGEMMMatrixMultiplyKernel & )

delete

Prevent instances of this class from being copied (as this class contains pointers).

◆ GCGEMMMatrixMultiplyKernel() [3/3]

GCGEMMMatrixMultiplyKernel ( GCGEMMMatrixMultiplyKernel && )

default

Allow instances of this class to be moved.
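Deleting the copy operations while defaulting the move operations is a common C++ pattern for objects that hold non-owning pointers. The sketch below reproduces the pattern with a hypothetical `KernelLike` class and an empty `Tensor` placeholder (neither is a library type); the `static_assert`s document the resulting type traits.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical placeholder; only the pointer semantics matter here.
struct Tensor
{
};

// Mirrors the special-member pattern documented above:
// non-owning pointers, copying disabled, moving allowed.
class KernelLike
{
public:
    KernelLike()                              = default;
    KernelLike(const KernelLike &)            = delete;
    KernelLike &operator=(const KernelLike &) = delete;
    KernelLike(KernelLike &&)                 = default;
    KernelLike &operator=(KernelLike &&)      = default;

    void configure(const Tensor *in0, const Tensor *in1, Tensor *out)
    {
        _input0 = in0;
        _input1 = in1;
        _output = out;
    }

    const Tensor *input0() const
    {
        return _input0;
    }

private:
    const Tensor *_input0{ nullptr };
    const Tensor *_input1{ nullptr };
    Tensor       *_output{ nullptr };
};

static_assert(!std::is_copy_constructible<KernelLike>::value, "copy construction is disabled");
static_assert(!std::is_copy_assignable<KernelLike>::value, "copy assignment is disabled");
static_assert(std::is_move_constructible<KernelLike>::value, "move construction is allowed");
static_assert(std::is_move_assignable<KernelLike>::value, "move assignment is allowed");
```

A defaulted move simply copies the pointer values into the new object; deleting the copy operations turns accidental duplication of a configured kernel into a compile-time error.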

Member Function Documentation

◆ configure()

void configure	(	const IGCTensor *	input0,
		const IGCTensor *	input1,
		IGCTensor *	output,
		float	alpha,
		bool	is_interleaved_transposed = `true`,
		const GEMMReshapeInfo &	reshape_info = `GEMMReshapeInfo()`
	)

Initialise the kernel's input, output and alpha.

Parameters

[in]	input0	Input tensor containing the interleaved Matrix A or the vector A. Data types supported: F16/F32
[in]	input1	Input tensor containing the transposed Matrix B if the first input tensor A is not a vector. If the output tensor is a vector, input1 must contain the non-reshaped matrix B. Data type supported: same as `input0`
[out]	output	Output tensor to store the result of matrix multiplication. Data type supported: same as `input0`
[in]	alpha	Weight of the matrix product
[in]	is_interleaved_transposed	(Optional) True if input0 and input1 have been reshaped respectively using GCGEMMInterleave4x4Kernel and GCGEMMTranspose1xWKernel
[in]	reshape_info	(Optional) GEMM reshape info. If is_interleaved_transposed = true, this object must contain the information to understand how the matrix A and matrix B have been reshaped
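When `is_interleaved_transposed` is true, `input0` and `input1` are expected to have been reshaped by GCGEMMInterleave4x4Kernel and GCGEMMTranspose1xWKernel. The sketch below illustrates on the CPU what these two reshapes do, assuming row-major storage, multiplier values of 1 (no extra scaling of the block sizes) and zero-filling of incomplete blocks. The functions `interleave_4x4` and `transpose_1xW` are hypothetical; the GLES kernels' actual memory layout, padding and block width W may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Interleave blocks of 4 rows: each output row holds, column by column,
// the 4 elements of one 4-row block (a00 a10 a20 a30 a01 a11 ...).
// Rows past the end of the matrix are zero-filled (assumption of this sketch).
std::vector<float> interleave_4x4(const std::vector<float> &src, std::size_t rows, std::size_t cols)
{
    const std::size_t  out_rows = (rows + 3) / 4;
    const std::size_t  out_cols = cols * 4;
    std::vector<float> dst(out_rows * out_cols, 0.0f);
    for(std::size_t block = 0; block < out_rows; ++block)
    {
        for(std::size_t c = 0; c < cols; ++c)
        {
            for(std::size_t i = 0; i < 4; ++i)
            {
                const std::size_t r = block * 4 + i;
                if(r < rows)
                {
                    dst[block * out_cols + c * 4 + i] = src[r * cols + c];
                }
            }
        }
    }
    return dst;
}

// Transpose in 1xW blocks: output row j concatenates, for every input row r,
// the W elements src[r][j*W .. j*W + W). Columns past the end are zero-filled.
std::vector<float> transpose_1xW(const std::vector<float> &src, std::size_t rows, std::size_t cols, std::size_t w)
{
    assert(w > 0);
    const std::size_t  out_rows = (cols + w - 1) / w;
    const std::size_t  out_cols = rows * w;
    std::vector<float> dst(out_rows * out_cols, 0.0f);
    for(std::size_t j = 0; j < out_rows; ++j)
    {
        for(std::size_t r = 0; r < rows; ++r)
        {
            for(std::size_t k = 0; k < w; ++k)
            {
                const std::size_t c = j * w + k;
                if(c < cols)
                {
                    dst[j * out_cols + r * w + k] = src[r * cols + c];
                }
            }
        }
    }
    return dst;
}
```

For example, interleaving the 4x2 matrix [[1, 2], [3, 4], [5, 6], [7, 8]] gives the single row 1 3 5 7 2 4 6 8, so the four rows of A needed for one output block can be read contiguously.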

Definition at line 188 of file GCGEMMMatrixMultiplyKernel.cpp.

References ARM_COMPUTE_ERROR, ARM_COMPUTE_ERROR_ON_NULLPTR, ARM_COMPUTE_ERROR_THROW_ON, arm_compute::BIFROST, GCKernelLibrary::create_kernel(), ITensorInfo::data_type(), ITensorInfo::dimension(), arm_compute::F16, arm_compute::F32, arm_compute::float_to_string_with_full_precision(), GCKernelLibrary::get(), arm_compute::get_arch_from_target(), IGCKernel::get_target(), ITensor::info(), kernel_name, ITensorInfo::num_dimensions(), arm_compute::support::cpp11::to_string(), and arm_compute::validate_arguments().

Referenced by GCGEMM::configure().

 {
     ARM_COMPUTE_ERROR_ON_NULLPTR(input0, input1, output);
 
     // Perform validate step
     ARM_COMPUTE_ERROR_THROW_ON(validate_arguments(input0->info(), input1->info(), output->info(), is_interleaved_transposed, reshape_info));
 
     _input0 = input0;
     _input1 = input1;
     _output = output;
 
     // Get target architecture
     GPUTarget gpu_target = get_target();
 
     ElementsProcessed num_elements_processed{};
 
     // Configure kernel window
     auto win_config = validate_and_configure_window(input0->info(), input1->info(), output->info(), is_interleaved_transposed, reshape_info, gpu_target, num_elements_processed);
     ARM_COMPUTE_ERROR_THROW_ON(win_config.first);
     IGCKernel::configure(win_config.second);
 
     // Create build options
     std::set<std::string> build_opts;
     std::string           kernel_name;
 
     build_opts.emplace("#define LOCAL_SIZE_X " + support::cpp11::to_string(1));
     build_opts.emplace("#define LOCAL_SIZE_Y " + support::cpp11::to_string(1));
     build_opts.emplace("#define LOCAL_SIZE_Z " + support::cpp11::to_string(1));
     build_opts.emplace("#define COLS_A " + support::cpp11::to_string(input0->info()->dimension(0)));
     build_opts.emplace("#define COLS_B " + support::cpp11::to_string(input1->info()->dimension(0)));
     build_opts.emplace("#define ALPHA " + float_to_string_with_full_precision(alpha));
 
     // Select the shader variant: the interleaved/transposed kernel if the inputs were reshaped, otherwise the floating-point kernel (which also handles the vector-matrix case)
     if(is_interleaved_transposed)
     {
         const int mult_transpose1xW_width   = reshape_info.mult_transpose1xW_width();
         const int mult_interleave4x4_height = reshape_info.mult_interleave4x4_height();
 
         build_opts.emplace("#define MULT_TRANSPOSE1XW_WIDTH " + support::cpp11::to_string(mult_transpose1xW_width));
         build_opts.emplace("#define MULT_INTERLEAVE4X4_HEIGHT " + support::cpp11::to_string(mult_interleave4x4_height));
 
         switch(input0->info()->data_type())
         {
             case DataType::F16:
                 build_opts.emplace("#define DATA_TYPE_FP16");
                 break;
 
             case DataType::F32:
                 build_opts.emplace("#define DATA_TYPE_FP32");
                 break;
 
             default:
                 ARM_COMPUTE_ERROR("Current data type is not supported");
                 break;
         }
 
         build_opts.emplace("#define GEMM_MM_INTERLEAVED_TRANSPOSED");
 
         kernel_name = "gemm_mm_interleaved_transposed";
     }
     else
     {
         // Special case for 1xN, 2xN, 3xN and 4xN input0 tensor
 
         GPUTarget arch_target = get_arch_from_target(gpu_target);
         switch(input0->info()->data_type())
         {
             case DataType::F16:
                 build_opts.emplace("#define DATA_TYPE_FP16");
                 build_opts.emplace("#define MM_PROCESS_4X_OPTIMIZED");
                 build_opts.emplace("#define GEMM_MM_FLOATING_POINT");
                 break;
 
             case DataType::F32:
                 build_opts.emplace("#define DATA_TYPE_FP32");
 
                 if(arch_target == GPUTarget::BIFROST && input0->info()->num_dimensions() != 1)
                 {
                     build_opts.emplace("#define GEMM_MM_FLOATING_POINT_BIFROST");
                 }
                 else
                 {
                     build_opts.emplace("#define GEMM_MM_FLOATING_POINT");
                 }
                 break;
 
             default:
                 ARM_COMPUTE_ERROR("Current data type is not supported");
                 break;
         }
 
         build_opts.emplace("#define NUM_ELEMS_PROCESSED_PER_THREAD_X " + support::cpp11::to_string(num_elements_processed.x()));
         build_opts.emplace("#define NUM_ELEMS_PROCESSED_PER_THREAD_Y " + support::cpp11::to_string(num_elements_processed.y()));
 
         kernel_name = "gemm_mm_floating_point";
     }
 
     // Create kernel
     _kernel = GCKernelLibrary::get().create_kernel(kernel_name, build_opts);
 }
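The branch structure of configure() reduces to choosing a shader name and a set of `#define` lines. The sketch below mirrors that decision in standalone C++, omitting the LOCAL_SIZE_*, MULT_* and NUM_ELEMS_PROCESSED_PER_THREAD_* options and the error path for unsupported data types. `DataType`, `Arch`, `KernelChoice`, `choose_kernel` and `float_full_precision` are hypothetical stand-ins; in particular, `float_full_precision` only approximates the intent of `float_to_string_with_full_precision()` by printing `max_digits10` significant digits so that `ALPHA` round-trips exactly.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <set>
#include <sstream>
#include <string>

// Hypothetical simplified enums (not the library's types).
enum class DataType { F16, F32 };
enum class Arch { MIDGARD, BIFROST };

struct KernelChoice
{
    std::string           kernel_name;
    std::set<std::string> build_opts;
};

// Print a float with enough significant digits to round-trip exactly.
std::string float_full_precision(float v)
{
    std::ostringstream ss;
    ss.precision(std::numeric_limits<float>::max_digits10);
    ss << v;
    return ss.str();
}

// Mirrors the branches of configure(): reshaped inputs select the
// interleaved/transposed shader, otherwise the floating-point shader,
// with a Bifrost-specific path for F32 when input0 is not a vector.
KernelChoice choose_kernel(DataType dt, bool is_interleaved_transposed, Arch arch,
                           std::size_t input0_num_dims, std::size_t cols_a, std::size_t cols_b, float alpha)
{
    KernelChoice kc;
    kc.build_opts.emplace("#define COLS_A " + std::to_string(cols_a));
    kc.build_opts.emplace("#define COLS_B " + std::to_string(cols_b));
    kc.build_opts.emplace("#define ALPHA " + float_full_precision(alpha));
    kc.build_opts.emplace(dt == DataType::F16 ? "#define DATA_TYPE_FP16" : "#define DATA_TYPE_FP32");

    if(is_interleaved_transposed)
    {
        kc.build_opts.emplace("#define GEMM_MM_INTERLEAVED_TRANSPOSED");
        kc.kernel_name = "gemm_mm_interleaved_transposed";
    }
    else
    {
        if(dt == DataType::F16)
        {
            kc.build_opts.emplace("#define MM_PROCESS_4X_OPTIMIZED");
            kc.build_opts.emplace("#define GEMM_MM_FLOATING_POINT");
        }
        else if(arch == Arch::BIFROST && input0_num_dims != 1)
        {
            kc.build_opts.emplace("#define GEMM_MM_FLOATING_POINT_BIFROST");
        }
        else
        {
            kc.build_opts.emplace("#define GEMM_MM_FLOATING_POINT");
        }
        kc.kernel_name = "gemm_mm_floating_point";
    }
    return kc;
}
```

Note that both non-reshaped variants share the shader name `gemm_mm_floating_point`; the defines alone select the code path inside the shader.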

◆ operator=() [1/2]

GCGEMMMatrixMultiplyKernel& operator= ( const GCGEMMMatrixMultiplyKernel & )

delete

Prevent instances of this class from being copied (as this class contains pointers).

◆ operator=() [2/2]

GCGEMMMatrixMultiplyKernel& operator= ( GCGEMMMatrixMultiplyKernel && )

default

Allow instances of this class to be moved.

◆ run()

void run ( const Window & window )

override virtual

Enqueue the OpenGL ES shader to process the given window.

Parameters

[in]	window	Region on which to execute the kernel. Must be a valid region of the window returned by window().

Implements IGCKernel.

Definition at line 306 of file GCGEMMMatrixMultiplyKernel.cpp.

References IGCKernel::add_2D_tensor_argument(), ARM_COMPUTE_ERROR_ON_INVALID_SUBWINDOW, ARM_COMPUTE_ERROR_ON_UNCONFIGURED_KERNEL, Window::DimX, Window::DimY, arm_compute::enqueue(), Window::first_slice_window_2D(), ITensor::info(), ITensorInfo::num_dimensions(), Window::set(), Window::slide_window_slice_2D(), and IKernel::window().

 {
     ARM_COMPUTE_ERROR_ON_UNCONFIGURED_KERNEL(this);
     ARM_COMPUTE_ERROR_ON_INVALID_SUBWINDOW(IGCKernel::window(), window);
 
     _kernel.use();
 
     Window slice          = window.first_slice_window_2D();
     Window slice_matrix_b = slice;
 
     slice_matrix_b.set(Window::DimX, Window::Dimension(0, 1, 1));
     slice_matrix_b.set(Window::DimY, Window::Dimension(0, 1, 1));
 
     do
     {
         Window slice_b = slice;
         // Don't slice matrix B along the z dimension if matrix B has just 2 dimensions and matrix A more than 2
         // This scenario can happen when the matrix multiplication is used to perform a convolution operation
         if(_input1->info()->num_dimensions() < 3)
         {
             slice_b = slice_matrix_b;
         }
 
         unsigned int idx = 0;
 
         add_2D_tensor_argument(idx, _input0, 1, slice);
         add_2D_tensor_argument(idx, _input1, 2, slice_b);
         add_2D_tensor_argument(idx, _output, 3, slice);
         _kernel.update_shader_params();
         enqueue(*this, slice);
     }
     while(window.slide_window_slice_2D(slice));
 }
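run() walks the execution window one 2D slice at a time and, when `input1` has fewer than three dimensions, binds the same matrix B to every slice. The CPU sketch below shows the equivalent batched computation. `batched_gemm` is a hypothetical helper that assumes contiguous row-major slices; it does not model the library's tensor strides, padding or window handling.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A holds `batches` slices of m x k, B holds `b_batches` slices of k x n,
// and the result holds `batches` slices of m x n (all row-major).
// With b_batches == 1, the single B slice is reused for every batch index,
// mirroring how run() keeps slice_matrix_b fixed when B has < 3 dimensions.
std::vector<float> batched_gemm(const std::vector<float> &a, const std::vector<float> &b,
                                std::size_t batches, std::size_t b_batches,
                                std::size_t m, std::size_t k, std::size_t n, float alpha)
{
    assert(b_batches == 1 || b_batches == batches);
    std::vector<float> out(batches * m * n, 0.0f);
    for(std::size_t z = 0; z < batches; ++z) // one iteration per 2D slice
    {
        const float *a_slice = a.data() + z * m * k;
        const float *b_slice = b.data() + (b_batches == 1 ? 0 : z * k * n); // B not sliced along z when 2D
        float       *o_slice = out.data() + z * m * n;
        for(std::size_t r = 0; r < m; ++r)
        {
            for(std::size_t c = 0; c < n; ++c)
            {
                float acc = 0.0f;
                for(std::size_t i = 0; i < k; ++i)
                {
                    acc += a_slice[r * k + i] * b_slice[i * n + c];
                }
                o_slice[r * n + c] = alpha * acc;
            }
        }
    }
    return out;
}
```

For example, two 1x2 slices of A, [1, 2] and [3, 4], multiplied by one shared 2x1 matrix B = [10, 100] with alpha = 1 give the outputs 210 and 430.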

◆ validate()

Status validate	(	const ITensorInfo *	input0,
		const ITensorInfo *	input1,
		const ITensorInfo *	output,
		float	alpha,
		bool	is_interleaved_transposed,
		const GEMMReshapeInfo &	reshape_info,
		GPUTarget	gpu_target
	)

static

Static function to check if given info will lead to a valid configuration of GCGEMMMatrixMultiplyKernel.

Parameters

[in]	input0	Input tensor containing the Matrix A. Data types supported: F16/F32
[in]	input1	Input tensor containing the Matrix B. Data type supported: same as `input0`
[in]	output	Output tensor to store the result of matrix multiplication. Data type supported: same as `input0`
[in]	alpha	Weight of the matrix product
[in]	is_interleaved_transposed	True if input0 and input1 have been reshaped respectively using GCGEMMInterleave4x4Kernel and GCGEMMTranspose1xWKernel
[in]	reshape_info	GEMM reshape info. If is_interleaved_transposed = true, this object must contain the information to understand how the matrix A and matrix B have been reshaped
[in]	gpu_target	GPU Target

Returns: a Status, which holds an error if the given configuration is invalid.

Definition at line 289 of file GCGEMMMatrixMultiplyKernel.cpp.

References ARM_COMPUTE_RETURN_ON_ERROR, ARM_COMPUTE_UNUSED, ICloneable< T >::clone(), and arm_compute::validate_arguments().

 {
     ARM_COMPUTE_UNUSED(alpha);
     ElementsProcessed num_elements_processed{};
     ARM_COMPUTE_RETURN_ON_ERROR(validate_arguments(input0, input1, output, is_interleaved_transposed, reshape_info));
     ARM_COMPUTE_RETURN_ON_ERROR(validate_and_configure_window(input0->clone().get(),
                                                               input1->clone().get(),
                                                               output->clone().get(),
                                                               is_interleaved_transposed,
                                                               reshape_info,
                                                               gpu_target,
                                                               num_elements_processed)
                                 .first);
     return Status{};
 }
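validate() lets a caller check a configuration up front and receive an error Status, instead of hitting the error that configure() raises through ARM_COMPUTE_ERROR_THROW_ON. The sketch below shows that pattern for the non-reshaped case, assuming the library's dimension convention (dimension 0 is the width, i.e. the number of columns). `SimpleStatus`, `Shape2D` and `validate_plain_gemm` are hypothetical; the real validate_arguments() presumably also checks data types (F16/F32) and the reshaped layouts.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical minimal status type; the library's Status class is richer.
struct SimpleStatus
{
    bool        ok;
    std::string error;
};

// 2D shape in the assumed convention: dim0 = width (columns), dim1 = height (rows).
struct Shape2D
{
    std::size_t dim0;
    std::size_t dim1;
};

// Shape checks for output = A x B with A: M x K, B: K x N, output: M x N.
// Returns an error description instead of throwing.
SimpleStatus validate_plain_gemm(Shape2D a, Shape2D b, Shape2D out)
{
    if(a.dim0 != b.dim1)
    {
        return { false, "columns of A must equal rows of B" };
    }
    if(out.dim0 != b.dim0 || out.dim1 != a.dim1)
    {
        return { false, "output must be M x N" };
    }
    return { true, "" };
}
```

A caller can run such a check before allocating GPU resources and report the message, which is the same division of labour as validate() followed by configure().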

The documentation for this class was generated from the following files:

arm_compute/core/GLES_COMPUTE/kernels/GCGEMMMatrixMultiplyKernel.h
src/core/GLES_COMPUTE/kernels/GCGEMMMatrixMultiplyKernel.cpp
