Compute Library
 21.02
GCGEMMMatrixMultiplyKernel Class Reference

GLES Compute kernel to multiply two input matrices "A" and "B" or to multiply a vector "A" by a matrix "B".

#include <GCGEMMMatrixMultiplyKernel.h>

Public Member Functions

 GCGEMMMatrixMultiplyKernel ()
 Default constructor.
 
 GCGEMMMatrixMultiplyKernel (const GCGEMMMatrixMultiplyKernel &)=delete
 Prevent instances of this class from being copied (as this class contains pointers).
 
GCGEMMMatrixMultiplyKernel & operator= (const GCGEMMMatrixMultiplyKernel &)=delete
 Prevent instances of this class from being copied (as this class contains pointers).
 
 GCGEMMMatrixMultiplyKernel (GCGEMMMatrixMultiplyKernel &&)=default
 Allow instances of this class to be moved.
 
GCGEMMMatrixMultiplyKernel & operator= (GCGEMMMatrixMultiplyKernel &&)=default
 Allow instances of this class to be moved.
 
void configure (const IGCTensor *input0, const IGCTensor *input1, IGCTensor *output, float alpha, bool is_interleaved_transposed=true, const GEMMReshapeInfo &reshape_info=GEMMReshapeInfo())
 Initialise the kernel's input, output and alpha.
 
void run (const Window &window) override
 Enqueue the OpenGL ES shader to process the given window.
 
- Public Member Functions inherited from IGCKernel
 IGCKernel ()
 Constructor.
 
GCKernel & kernel ()
 Returns a reference to the GLES kernel of this object.
 
void add_1D_tensor_argument (unsigned int &idx, const IGCTensor *tensor, const unsigned int binding_point, const Window &window)
 Add the passed 1D tensor's parameters to the object's kernel's arguments starting from the index idx.
 
void add_2D_tensor_argument (unsigned int &idx, const IGCTensor *tensor, const unsigned int binding_point, const Window &window)
 Add the passed 2D tensor's parameters to the object's kernel's arguments starting from the index idx.
 
void add_3D_tensor_argument (unsigned int &idx, const IGCTensor *tensor, const unsigned int binding_point, const Window &window)
 Add the passed 3D tensor's parameters to the object's kernel's arguments starting from the index idx.
 
unsigned int num_arguments_per_1D_tensor () const
 Returns the number of arguments enqueued per 1D tensor object.
 
unsigned int num_arguments_per_2D_tensor () const
 Returns the number of arguments enqueued per 2D tensor object.
 
unsigned int num_arguments_per_3D_tensor () const
 Returns the number of arguments enqueued per 3D tensor object.
 
void set_lws_hint (gles::NDRange &lws_hint)
 Set the Local-Workgroup-Size hint.
 
void set_target (GPUTarget target)
 Set the targeted GPU architecture.
 
GPUTarget get_target () const
 Get the targeted GPU architecture.
 
- Public Member Functions inherited from IKernel
 IKernel ()
 Constructor.
 
virtual ~IKernel ()=default
 Destructor.
 
virtual bool is_parallelisable () const
 Indicates whether or not the kernel is parallelisable.
 
virtual BorderSize border_size () const
 The size of the border for that kernel.
 
const Window & window () const
 The maximum window the kernel can be executed on.
 

Static Public Member Functions

static Status validate (const ITensorInfo *input0, const ITensorInfo *input1, const ITensorInfo *output, float alpha, bool is_interleaved_transposed, const GEMMReshapeInfo &reshape_info, GPUTarget gpu_target)
 Static function to check if given info will lead to a valid configuration of GCGEMMMatrixMultiplyKernel.
 

Detailed Description

GLES Compute kernel to multiply two input matrices "A" and "B" or to multiply a vector "A" by a matrix "B".

All elements of the output matrix/vector are multiplied by alpha.

Attention
The second input tensor must have at least 2 dimensions (matrix).

Definition at line 39 of file GCGEMMMatrixMultiplyKernel.h.
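
As a plain CPU illustration of the operation described above, the following sketch computes alpha * (A * B) for row-major matrices. The function name reference_gemm and its layout conventions are assumptions made for this example only; they are not part of Compute Library and do not reflect how the GLES shader is written.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference, not part of Compute Library.
// a is M x K and b is K x N, both stored row-major.
// Returns the M x N product with every element scaled by alpha.
std::vector<float> reference_gemm(const std::vector<float> &a,
                                  const std::vector<float> &b,
                                  std::size_t m, std::size_t k, std::size_t n,
                                  float alpha)
{
    assert(a.size() == m * k && b.size() == k * n);

    std::vector<float> out(m * n, 0.0f);
    for(std::size_t row = 0; row < m; ++row)
    {
        for(std::size_t col = 0; col < n; ++col)
        {
            float acc = 0.0f;
            for(std::size_t i = 0; i < k; ++i)
            {
                acc += a[row * k + i] * b[i * n + col];
            }
            // alpha weights the whole product, as in the description above
            out[row * n + col] = alpha * acc;
        }
    }
    return out;
}
```

Setting m to 1 gives the vector-by-matrix case mentioned in the description.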

Constructor & Destructor Documentation

◆ GCGEMMMatrixMultiplyKernel() [1/3]

Default constructor.

Definition at line 183 of file GCGEMMMatrixMultiplyKernel.cpp.

GCGEMMMatrixMultiplyKernel::GCGEMMMatrixMultiplyKernel()
    : _input0(nullptr), _input1(nullptr), _output(nullptr)
{
}

◆ GCGEMMMatrixMultiplyKernel() [2/3]

Prevent instances of this class from being copied (As this class contains pointers)

◆ GCGEMMMatrixMultiplyKernel() [3/3]

Allow instances of this class to be moved.

Member Function Documentation

◆ configure()

void configure ( const IGCTensor * input0,
                 const IGCTensor * input1,
                 IGCTensor * output,
                 float alpha,
                 bool is_interleaved_transposed = true,
                 const GEMMReshapeInfo & reshape_info = GEMMReshapeInfo()
               )

Initialise the kernel's input, output and alpha.

Parameters
[in] input0: Input tensor containing the interleaved Matrix A or the vector A. Data types supported: F16/F32
[in] input1: Input tensor containing the transposed Matrix B if the first input tensor A is not a vector. If the output tensor is a vector, input1 must contain the matrix B not reshaped. Data type supported: same as input0
[out] output: Output tensor to store the result of matrix multiplication. Data type supported: same as input0
[in] alpha: Weight of the matrix product
[in] is_interleaved_transposed: (Optional) True if input0 and input1 have been reshaped using GCGEMMInterleave4x4Kernel and GCGEMMTranspose1xWKernel respectively
[in] reshape_info: (Optional) GEMM reshape info. If is_interleaved_transposed = true, this object must contain the information needed to understand how matrix A and matrix B have been reshaped
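
To make the "interleaved" layout expected for input0 more concrete, here is a minimal CPU sketch of a 4x4 interleave. It assumes the layout in which each group of four rows is flattened column by column (a00 a10 a20 a30 a01 a11 ...), and it assumes the row count is a multiple of 4. The function interleave4x4 is hypothetical; the actual GCGEMMInterleave4x4Kernel is a shader and may handle padding and data types differently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU sketch of a 4x4 interleave, not the library's kernel.
// src is rows x cols, row-major; rows is assumed to be a multiple of 4.
// The result is (rows / 4) x (cols * 4), row-major.
std::vector<float> interleave4x4(const std::vector<float> &src, std::size_t rows, std::size_t cols)
{
    assert(rows % 4 == 0 && src.size() == rows * cols);

    std::vector<float> dst(src.size());
    std::size_t        out = 0;
    for(std::size_t block = 0; block < rows; block += 4)
    {
        for(std::size_t c = 0; c < cols; ++c)
        {
            // Emit the four values of column c from the current group of four rows
            for(std::size_t r = 0; r < 4; ++r)
            {
                dst[out++] = src[(block + r) * cols + c];
            }
        }
    }
    return dst;
}
```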

Definition at line 188 of file GCGEMMMatrixMultiplyKernel.cpp.

References ARM_COMPUTE_ERROR, ARM_COMPUTE_ERROR_ON_NULLPTR, ARM_COMPUTE_ERROR_THROW_ON, arm_compute::BIFROST, GCKernelLibrary::create_kernel(), ITensorInfo::data_type(), ITensorInfo::dimension(), arm_compute::F16, arm_compute::F32, arm_compute::float_to_string_with_full_precision(), GCKernelLibrary::get(), arm_compute::get_arch_from_target(), IGCKernel::get_target(), ITensor::info(), kernel_name, ITensorInfo::num_dimensions(), arm_compute::support::cpp11::to_string(), and arm_compute::validate_arguments().

Referenced by GCGEMM::configure().

void GCGEMMMatrixMultiplyKernel::configure(const IGCTensor *input0, const IGCTensor *input1, IGCTensor *output, float alpha, bool is_interleaved_transposed, const GEMMReshapeInfo &reshape_info)
{
    ARM_COMPUTE_ERROR_ON_NULLPTR(input0, input1, output);

    // Perform validate step
    ARM_COMPUTE_ERROR_THROW_ON(validate_arguments(input0->info(), input1->info(), output->info(), is_interleaved_transposed, reshape_info));

    _input0 = input0;
    _input1 = input1;
    _output = output;

    // Get target architecture
    GPUTarget gpu_target = get_target();

    ElementsProcessed num_elements_processed{};

    // Configure kernel window
    auto win_config = validate_and_configure_window(input0->info(), input1->info(), output->info(), is_interleaved_transposed, reshape_info, gpu_target, num_elements_processed);
    ARM_COMPUTE_ERROR_THROW_ON(win_config.first);
    IGCKernel::configure(win_config.second);

    // Create build options
    std::set<std::string> build_opts;
    std::string           kernel_name;

    build_opts.emplace("#define LOCAL_SIZE_X " + support::cpp11::to_string(1));
    build_opts.emplace("#define LOCAL_SIZE_Y " + support::cpp11::to_string(1));
    build_opts.emplace("#define LOCAL_SIZE_Z " + support::cpp11::to_string(1));
    build_opts.emplace("#define COLS_A " + support::cpp11::to_string(input0->info()->dimension(0)));
    build_opts.emplace("#define COLS_B " + support::cpp11::to_string(input1->info()->dimension(0)));
    build_opts.emplace("#define ALPHA " + float_to_string_with_full_precision(alpha));

    // Select the shader variant depending on whether the inputs have been reshaped
    if(is_interleaved_transposed)
    {
        const int mult_transpose1xW_width   = reshape_info.mult_transpose1xW_width();
        const int mult_interleave4x4_height = reshape_info.mult_interleave4x4_height();

        build_opts.emplace("#define MULT_TRANSPOSE1XW_WIDTH " + support::cpp11::to_string(mult_transpose1xW_width));
        build_opts.emplace("#define MULT_INTERLEAVE4X4_HEIGHT " + support::cpp11::to_string(mult_interleave4x4_height));

        switch(input0->info()->data_type())
        {
            case DataType::F16:
                build_opts.emplace("#define DATA_TYPE_FP16");
                break;

            case DataType::F32:
                build_opts.emplace("#define DATA_TYPE_FP32");
                break;

            default:
                ARM_COMPUTE_ERROR("Current data type is not supported");
                break;
        }

        build_opts.emplace("#define GEMM_MM_INTERLEAVED_TRANSPOSED");

        kernel_name = "gemm_mm_interleaved_transposed";
    }
    else
    {
        // Special case for 1xN, 2xN, 3xN and 4xN input0 tensor

        GPUTarget arch_target = get_arch_from_target(gpu_target);
        switch(input0->info()->data_type())
        {
            case DataType::F16:
                build_opts.emplace("#define DATA_TYPE_FP16");
                build_opts.emplace("#define MM_PROCESS_4X_OPTIMIZED");
                build_opts.emplace("#define GEMM_MM_FLOATING_POINT");
                break;

            case DataType::F32:
                build_opts.emplace("#define DATA_TYPE_FP32");

                if(arch_target == GPUTarget::BIFROST && input0->info()->num_dimensions() != 1)
                {
                    build_opts.emplace("#define GEMM_MM_FLOATING_POINT_BIFROST");
                }
                else
                {
                    build_opts.emplace("#define GEMM_MM_FLOATING_POINT");
                }
                break;

            default:
                ARM_COMPUTE_ERROR("Current data type is not supported");
                break;
        }

        build_opts.emplace("#define NUM_ELEMS_PROCESSED_PER_THREAD_X " + support::cpp11::to_string(num_elements_processed.x()));
        build_opts.emplace("#define NUM_ELEMS_PROCESSED_PER_THREAD_Y " + support::cpp11::to_string(num_elements_processed.y()));

        kernel_name = "gemm_mm_floating_point";
    }

    // Create kernel
    _kernel = GCKernelLibrary::get().create_kernel(kernel_name, build_opts);
}
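
The branching in configure() can be summarised as a small decision function. The sketch below is a standalone model of that selection only: SketchDataType, SketchArch and select_shader are hypothetical names introduced here, and the common options (LOCAL_SIZE_*, COLS_A, COLS_B, ALPHA, the reshape multipliers and the per-thread element counts) are deliberately left out to keep it short.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <utility>

// Hypothetical stand-ins for the library's DataType and GPUTarget enums.
enum class SketchDataType { F16, F32 };
enum class SketchArch { MIDGARD, BIFROST };

// Returns {shader name, variant-specific build options}, mirroring the
// if/else and switch structure of configure() shown above.
std::pair<std::string, std::set<std::string>> select_shader(SketchDataType dt,
                                                            bool           is_interleaved_transposed,
                                                            SketchArch     arch,
                                                            std::size_t    input0_num_dimensions)
{
    std::set<std::string> opts;
    opts.emplace(dt == SketchDataType::F16 ? "#define DATA_TYPE_FP16" : "#define DATA_TYPE_FP32");

    if(is_interleaved_transposed)
    {
        // Reshaped inputs always use the interleaved/transposed shader
        opts.emplace("#define GEMM_MM_INTERLEAVED_TRANSPOSED");
        return { "gemm_mm_interleaved_transposed", opts };
    }

    if(dt == SketchDataType::F16)
    {
        opts.emplace("#define MM_PROCESS_4X_OPTIMIZED");
        opts.emplace("#define GEMM_MM_FLOATING_POINT");
    }
    else if(arch == SketchArch::BIFROST && input0_num_dimensions != 1)
    {
        // F32 on Bifrost with a non-vector input0 gets the Bifrost variant
        opts.emplace("#define GEMM_MM_FLOATING_POINT_BIFROST");
    }
    else
    {
        opts.emplace("#define GEMM_MM_FLOATING_POINT");
    }
    return { "gemm_mm_floating_point", opts };
}
```

Note that both non-reshaped variants share the shader name gemm_mm_floating_point; only the preprocessor defines differ.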

◆ operator=() [1/2]

GCGEMMMatrixMultiplyKernel & operator= ( const GCGEMMMatrixMultiplyKernel & ) = delete

Prevent instances of this class from being copied (As this class contains pointers)

◆ operator=() [2/2]

Allow instances of this class to be moved.

◆ run()

void run ( const Window & window )
override virtual

Enqueue the OpenGL ES shader to process the given window.

Parameters
[in] window: Region on which to execute the kernel. (Must be a valid region of the window returned by window()).

Implements IGCKernel.

Definition at line 306 of file GCGEMMMatrixMultiplyKernel.cpp.

References IGCKernel::add_2D_tensor_argument(), ARM_COMPUTE_ERROR_ON_INVALID_SUBWINDOW, ARM_COMPUTE_ERROR_ON_UNCONFIGURED_KERNEL, Window::DimX, Window::DimY, arm_compute::enqueue(), Window::first_slice_window_2D(), ITensor::info(), ITensorInfo::num_dimensions(), Window::set(), arm_compute::test::validation::reference::slice(), Window::slide_window_slice_2D(), and IKernel::window().

void GCGEMMMatrixMultiplyKernel::run(const Window &window)
{
    ARM_COMPUTE_ERROR_ON_UNCONFIGURED_KERNEL(this);
    ARM_COMPUTE_ERROR_ON_INVALID_SUBWINDOW(IKernel::window(), window);

    _kernel.use();

    Window slice          = window.first_slice_window_2D();
    Window slice_matrix_b = slice;

    slice_matrix_b.set(Window::DimX, Window::Dimension(0, 1, 1));
    slice_matrix_b.set(Window::DimY, Window::Dimension(0, 1, 1));

    do
    {
        Window slice_b = slice;
        // Don't slice matrix B along the z dimension if matrix B has just 2 dimensions and matrix A more than 2
        // This scenario can happen when the matrix multiplication is used to perform a convolution operation
        if(_input1->info()->num_dimensions() < 3)
        {
            slice_b = slice_matrix_b;
        }

        unsigned int idx = 0;

        add_2D_tensor_argument(idx, _input0, 1, slice);
        add_2D_tensor_argument(idx, _input1, 2, slice_b);
        add_2D_tensor_argument(idx, _output, 3, slice);
        _kernel.update_shader_params();
        enqueue(*this, slice);
    }
    while(window.slide_window_slice_2D(slice));
}
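
The loop in run() dispatches the shader once per 2D slice and reuses a fixed slice of matrix B when B has fewer than three dimensions. The following standalone sketch models just that bookkeeping. plan_dispatches is a hypothetical helper, and it assumes the window starts at Z index 0 with a step of 1, which the real Window class does not require.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical model of the slice loop in run(), not library code.
// For each dispatch, returns the Z index used for A/output (first) and
// the Z index used for B (second).
std::vector<std::pair<std::size_t, std::size_t>> plan_dispatches(std::size_t num_z_slices,
                                                                 std::size_t input1_num_dimensions)
{
    std::vector<std::pair<std::size_t, std::size_t>> plan;
    for(std::size_t z = 0; z < num_z_slices; ++z)
    {
        // A 2D matrix B is shared by every slice of A, so its Z index never advances
        const std::size_t b_z = (input1_num_dimensions < 3) ? 0 : z;
        plan.emplace_back(z, b_z);
    }
    return plan;
}
```

With a batched A and a 2D B, as in the convolution scenario mentioned in the code comment, every dispatch pairs a new slice of A with the same slice of B.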

◆ validate()

Status validate ( const ITensorInfo * input0,
                  const ITensorInfo * input1,
                  const ITensorInfo * output,
                  float alpha,
                  bool is_interleaved_transposed,
                  const GEMMReshapeInfo & reshape_info,
                  GPUTarget gpu_target
                )
static

Static function to check if given info will lead to a valid configuration of GCGEMMMatrixMultiplyKernel.

Parameters
[in] input0: Input tensor info for Matrix A. Data types supported: F16/F32
[in] input1: Input tensor info for Matrix B. Data type supported: same as input0
[in] output: Output tensor info for the result of the matrix multiplication. Data type supported: same as input0
[in] alpha: Weight of the matrix product
[in] is_interleaved_transposed: True if input0 and input1 have been reshaped using GCGEMMInterleave4x4Kernel and GCGEMMTranspose1xWKernel respectively
[in] reshape_info: GEMM reshape info. If is_interleaved_transposed = true, this object must contain the information needed to understand how matrix A and matrix B have been reshaped
[in] gpu_target: GPU Target
Returns
a status

Definition at line 289 of file GCGEMMMatrixMultiplyKernel.cpp.

References ARM_COMPUTE_RETURN_ON_ERROR, ARM_COMPUTE_UNUSED, ICloneable< T >::clone(), and arm_compute::validate_arguments().

Status GCGEMMMatrixMultiplyKernel::validate(const ITensorInfo *input0, const ITensorInfo *input1, const ITensorInfo *output, float alpha, bool is_interleaved_transposed,
                                            const GEMMReshapeInfo &reshape_info, GPUTarget gpu_target)
{
    ARM_COMPUTE_UNUSED(alpha);
    ElementsProcessed num_elements_processed{};
    ARM_COMPUTE_RETURN_ON_ERROR(validate_arguments(input0, input1, output, is_interleaved_transposed, reshape_info));
    ARM_COMPUTE_RETURN_ON_ERROR(validate_and_configure_window(input0->clone().get(),
                                                              input1->clone().get(),
                                                              output->clone().get(),
                                                              is_interleaved_transposed,
                                                              reshape_info,
                                                              gpu_target,
                                                              num_elements_processed)
                                .first);
    return Status{};
}
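
validate() passes clones of the tensor infos to validate_and_configure_window() rather than the originals. A plausible reading, offered here as an assumption rather than a documented guarantee, is that window configuration may modify the metadata it receives, so validation works on temporary copies to leave the caller's objects untouched. The sketch below illustrates that pattern with hypothetical types (SketchInfo, configure_window_sketch, validate_sketch) that only stand in for the library's classes.

```cpp
#include <cstddef>
#include <memory>

// Hypothetical, minimal stand-in for ITensorInfo.
struct SketchInfo
{
    std::size_t padding_right = 0;

    // Mirrors the clone() used in validate(): returns an owning copy
    std::unique_ptr<SketchInfo> clone() const
    {
        return std::make_unique<SketchInfo>(*this);
    }
};

// Hypothetical window-configuration step that mutates the metadata it is given.
bool configure_window_sketch(SketchInfo *info)
{
    info->padding_right = 4;
    return true;
}

// Validation runs the mutating step on a temporary copy, so the
// caller's object is left unchanged.
bool validate_sketch(const SketchInfo *info)
{
    return configure_window_sketch(info->clone().get());
}
```

The temporary returned by clone() lives until the end of the full expression, so the raw pointer from get() stays valid for the duration of the call.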

The documentation for this class was generated from the following files: