Compute Library 19.08
GCGEMMMatrixMultiplyKernel Class Reference

GLES Compute kernel to multiply two input matrices "A" and "B" or to multiply a vector "A" by a matrix "B".

#include <GCGEMMMatrixMultiplyKernel.h>


Public Member Functions

GCGEMMMatrixMultiplyKernel ()
Default constructor.

GCGEMMMatrixMultiplyKernel (const GCGEMMMatrixMultiplyKernel &)=delete
Prevent instances of this class from being copied (as this class contains pointers).

GCGEMMMatrixMultiplyKernel & operator= (const GCGEMMMatrixMultiplyKernel &)=delete
Prevent instances of this class from being copied (as this class contains pointers).

GCGEMMMatrixMultiplyKernel (GCGEMMMatrixMultiplyKernel &&)=default
Allow instances of this class to be moved.

GCGEMMMatrixMultiplyKernel & operator= (GCGEMMMatrixMultiplyKernel &&)=default
Allow instances of this class to be moved.

void configure (const IGCTensor *input0, const IGCTensor *input1, IGCTensor *output, float alpha, bool is_interleaved_transposed=true, const GEMMReshapeInfo &reshape_info=GEMMReshapeInfo())
Initialise the kernel's input, output and alpha.

void run (const Window &window) override
Enqueue the OpenGL ES shader to process the given window.

 
- Public Member Functions inherited from IGCKernel
IGCKernel ()
Constructor.

GCKernel & kernel ()
Returns a reference to the GLES kernel of this object.

void add_1D_tensor_argument (unsigned int &idx, const IGCTensor *tensor, const unsigned int binding_point, const Window &window)
Add the passed 1D tensor's parameters to the object's kernel's arguments starting from the index idx.

void add_2D_tensor_argument (unsigned int &idx, const IGCTensor *tensor, const unsigned int binding_point, const Window &window)
Add the passed 2D tensor's parameters to the object's kernel's arguments starting from the index idx.

void add_3D_tensor_argument (unsigned int &idx, const IGCTensor *tensor, const unsigned int binding_point, const Window &window)
Add the passed 3D tensor's parameters to the object's kernel's arguments starting from the index idx.

unsigned int num_arguments_per_1D_tensor () const
Returns the number of arguments enqueued per 1D tensor object.

unsigned int num_arguments_per_2D_tensor () const
Returns the number of arguments enqueued per 2D tensor object.

unsigned int num_arguments_per_3D_tensor () const
Returns the number of arguments enqueued per 3D tensor object.

void set_lws_hint (gles::NDRange &lws_hint)
Set the Local-Workgroup-Size hint.

void set_target (GPUTarget target)
Set the targeted GPU architecture.

GPUTarget get_target () const
Get the targeted GPU architecture.

- Public Member Functions inherited from IKernel
IKernel ()
Constructor.

virtual ~IKernel ()=default
Destructor.

virtual bool is_parallelisable () const
Indicates whether or not the kernel is parallelisable.

virtual BorderSize border_size () const
The size of the border for that kernel.

const Window & window () const
The maximum window the kernel can be executed on.
 

Static Public Member Functions

static Status validate (const ITensorInfo *input0, const ITensorInfo *input1, const ITensorInfo *output, float alpha, bool is_interleaved_transposed, const GEMMReshapeInfo &reshape_info, GPUTarget gpu_target)
Static function to check if given info will lead to a valid configuration of GCGEMMMatrixMultiplyKernel.
 

Detailed Description

GLES Compute kernel to multiply two input matrices "A" and "B" or to multiply a vector "A" by a matrix "B".

All elements of the output matrix/vector are multiplied by alpha.

Attention
The second input tensor must have at least 2 dimensions (matrix).

Definition at line 39 of file GCGEMMMatrixMultiplyKernel.h.
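As a point of reference, the result this kernel computes can be sketched on the CPU as a plain row-major matrix product scaled by alpha. This is a minimal sketch assuming non-reshaped, dense, row-major F32 data; the function name reference_gemm is hypothetical and not part of Compute Library. The vector-by-matrix case is simply rows_a == 1.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference (not a Compute Library function): C = alpha * (A x B).
// A is rows_a x cols_a, B is cols_a x cols_b, C is rows_a x cols_b, all row-major.
// Passing rows_a == 1 covers the vector-by-matrix case.
std::vector<float> reference_gemm(const std::vector<float> &a, const std::vector<float> &b,
                                  std::size_t rows_a, std::size_t cols_a, std::size_t cols_b,
                                  float alpha)
{
    assert(a.size() == rows_a * cols_a);
    assert(b.size() == cols_a * cols_b);
    std::vector<float> c(rows_a * cols_b, 0.0f);
    for(std::size_t i = 0; i < rows_a; ++i)
    {
        for(std::size_t j = 0; j < cols_b; ++j)
        {
            float acc = 0.0f;
            for(std::size_t k = 0; k < cols_a; ++k)
            {
                acc += a[i * cols_a + k] * b[k * cols_b + j];
            }
            // Every output element is scaled by alpha.
            c[i * cols_b + j] = alpha * acc;
        }
    }
    return c;
}
```

Such a reference is mainly useful for checking the GPU output of small test cases, not as a replacement for the kernel.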

Constructor & Destructor Documentation

◆ GCGEMMMatrixMultiplyKernel() [1/3]

Default constructor.

Definition at line 180 of file GCGEMMMatrixMultiplyKernel.cpp.

GCGEMMMatrixMultiplyKernel::GCGEMMMatrixMultiplyKernel()
    : _input0(nullptr), _input1(nullptr), _output(nullptr)
{
}

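◆ GCGEMMMatrixMultiplyKernel() [2/3]

GCGEMMMatrixMultiplyKernel (const GCGEMMMatrixMultiplyKernel &) = delete

Prevent instances of this class from being copied (as this class contains pointers).

◆ GCGEMMMatrixMultiplyKernel() [3/3]

GCGEMMMatrixMultiplyKernel (GCGEMMMatrixMultiplyKernel &&) = default

Allow instances of this class to be moved.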

Member Function Documentation

◆ configure()

void configure (const IGCTensor *input0,
                const IGCTensor *input1,
                IGCTensor *output,
                float alpha,
                bool is_interleaved_transposed = true,
                const GEMMReshapeInfo &reshape_info = GEMMReshapeInfo())

Initialise the kernel's input, output and alpha.

Parameters
[in] input0: Input tensor containing the interleaved matrix A or the vector A. Data types supported: F16/F32.
[in] input1: Input tensor containing the transposed matrix B if the first input tensor A is not a vector. If the output tensor is a vector, input1 must contain the matrix B not reshaped. Data type supported: same as input0.
[out] output: Output tensor to store the result of the matrix multiplication. Data type supported: same as input0.
[in] alpha: Weight of the matrix product.
[in] is_interleaved_transposed: (Optional) True if input0 and input1 have been reshaped using GCGEMMInterleave4x4Kernel and GCGEMMTranspose1xWKernel respectively.
[in] reshape_info: (Optional) GEMM reshape info. If is_interleaved_transposed = true, this object must contain the information describing how matrix A and matrix B have been reshaped.
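When is_interleaved_transposed is true, the kernel expects A and B in the reshaped layouts produced by GCGEMMInterleave4x4Kernel and GCGEMMTranspose1xWKernel. The CPU sketch below illustrates those layouts under stated assumptions: both reshape multipliers (mult_interleave4x4_height, mult_transpose1xW_width) are 1, data is dense row-major, tails are zero-padded, and the block width W is passed in rather than derived from the data type. The functions interleave4x4 and transpose1xW are hypothetical helpers, not library APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch of the interleave-4x4 layout (multiplier assumed to be 1):
// each group of 4 rows of A becomes one output row in which the 4 values of
// every column are stored next to each other. Missing rows are zero-padded.
std::vector<float> interleave4x4(const std::vector<float> &a, std::size_t rows, std::size_t cols)
{
    assert(a.size() == rows * cols);
    const std::size_t out_rows = (rows + 3) / 4;
    std::vector<float> out(out_rows * cols * 4, 0.0f);
    for(std::size_t r = 0; r < rows; ++r)
    {
        for(std::size_t c = 0; c < cols; ++c)
        {
            const std::size_t block = r / 4; // output row
            const std::size_t lane  = r % 4; // position inside the group of 4
            out[block * cols * 4 + c * 4 + lane] = a[r * cols + c];
        }
    }
    return out;
}

// Sketch of the transpose-1xW layout (multiplier assumed to be 1):
// the 1xW block starting at column k*W of every row of B is copied, row after
// row, into output row k. The output has ceil(cols / W) rows of rows * W values.
std::vector<float> transpose1xW(const std::vector<float> &b, std::size_t rows, std::size_t cols, std::size_t w)
{
    assert(b.size() == rows * cols && w > 0);
    const std::size_t out_rows = (cols + w - 1) / w;
    const std::size_t out_cols = rows * w;
    std::vector<float> out(out_rows * out_cols, 0.0f);
    for(std::size_t r = 0; r < rows; ++r)
    {
        for(std::size_t c = 0; c < cols; ++c)
        {
            const std::size_t block  = c / w; // output row
            const std::size_t within = c % w; // offset inside the 1xW block
            out[block * out_cols + r * w + within] = b[r * cols + c];
        }
    }
    return out;
}
```

For a 4x4 matrix holding 0..15, interleave4x4 yields the single row 0, 4, 8, 12, 1, 5, 9, 13, ... so that one shader invocation can read four rows of A with contiguous loads.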

Definition at line 185 of file GCGEMMMatrixMultiplyKernel.cpp.

void GCGEMMMatrixMultiplyKernel::configure(const IGCTensor *input0, const IGCTensor *input1, IGCTensor *output, float alpha, bool is_interleaved_transposed, const GEMMReshapeInfo &reshape_info)
{
    ARM_COMPUTE_ERROR_ON_NULLPTR(input0, input1, output);

    // Perform validate step
    ARM_COMPUTE_ERROR_THROW_ON(validate_arguments(input0->info(), input1->info(), output->info(), is_interleaved_transposed, reshape_info));

    _input0 = input0;
    _input1 = input1;
    _output = output;

    // Get target architecture
    GPUTarget gpu_target = get_target();

    ElementsProcessed num_elements_processed{};

    // Configure kernel window
    auto win_config = validate_and_configure_window(input0->info(), input1->info(), output->info(), is_interleaved_transposed, reshape_info, gpu_target, num_elements_processed);
    ARM_COMPUTE_ERROR_THROW_ON(win_config.first);
    IGCKernel::configure(win_config.second);

    // Create build options
    std::set<std::string> build_opts;
    std::string           kernel_name;

    build_opts.emplace("#define LOCAL_SIZE_X " + support::cpp11::to_string(1));
    build_opts.emplace("#define LOCAL_SIZE_Y " + support::cpp11::to_string(1));
    build_opts.emplace("#define LOCAL_SIZE_Z " + support::cpp11::to_string(1));
    build_opts.emplace("#define COLS_A " + support::cpp11::to_string(input0->info()->dimension(0)));
    build_opts.emplace("#define COLS_B " + support::cpp11::to_string(input1->info()->dimension(0)));
    build_opts.emplace("#define ALPHA " + float_to_string_with_full_precision(alpha));

    // If the inputs have been reshaped (interleaved A, transposed B), run the interleaved/transposed shader;
    // otherwise run the floating-point shader, which also covers the vector-matrix case
    if(is_interleaved_transposed)
    {
        const int mult_transpose1xW_width   = reshape_info.mult_transpose1xW_width();
        const int mult_interleave4x4_height = reshape_info.mult_interleave4x4_height();

        build_opts.emplace("#define MULT_TRANSPOSE1XW_WIDTH " + support::cpp11::to_string(mult_transpose1xW_width));
        build_opts.emplace("#define MULT_INTERLEAVE4X4_HEIGHT " + support::cpp11::to_string(mult_interleave4x4_height));

        switch(input0->info()->data_type())
        {
            case DataType::F16:
                build_opts.emplace("#define DATA_TYPE_FP16");
                break;

            case DataType::F32:
                build_opts.emplace("#define DATA_TYPE_FP32");
                break;

            default:
                ARM_COMPUTE_ERROR("Current data type is not supported");
                break;
        }

        build_opts.emplace("#define GEMM_MM_INTERLEAVED_TRANSPOSED");

        kernel_name = "gemm_mm_interleaved_transposed";
    }
    else
    {
        // Special case for 1xN, 2xN, 3xN and 4xN input0 tensor

        GPUTarget arch_target = get_arch_from_target(gpu_target);
        switch(input0->info()->data_type())
        {
            case DataType::F16:
                build_opts.emplace("#define DATA_TYPE_FP16");
                build_opts.emplace("#define MM_PROCESS_4X_OPTIMIZED");
                build_opts.emplace("#define GEMM_MM_FLOATING_POINT");
                break;

            case DataType::F32:
                build_opts.emplace("#define DATA_TYPE_FP32");

                if(arch_target == GPUTarget::BIFROST && input0->info()->num_dimensions() != 1)
                {
                    build_opts.emplace("#define GEMM_MM_FLOATING_POINT_BIFROST");
                }
                else
                {
                    build_opts.emplace("#define GEMM_MM_FLOATING_POINT");
                }
                break;

            default:
                ARM_COMPUTE_ERROR("Current data type is not supported");
                break;
        }

        build_opts.emplace("#define NUM_ELEMS_PROCESSED_PER_THREAD_X " + support::cpp11::to_string(num_elements_processed.x()));
        build_opts.emplace("#define NUM_ELEMS_PROCESSED_PER_THREAD_Y " + support::cpp11::to_string(num_elements_processed.y()));

        kernel_name = "gemm_mm_floating_point";
    }

    // Create kernel
    _kernel = GCKernelLibrary::get().create_kernel(kernel_name, build_opts);
}

References ARM_COMPUTE_ERROR, ARM_COMPUTE_ERROR_ON_NULLPTR, ARM_COMPUTE_ERROR_THROW_ON, arm_compute::BIFROST, GCKernelLibrary::create_kernel(), ITensorInfo::data_type(), ITensorInfo::dimension(), arm_compute::F16, arm_compute::F32, arm_compute::float_to_string_with_full_precision(), GCKernelLibrary::get(), arm_compute::get_arch_from_target(), IGCKernel::get_target(), ITensor::info(), ITensorInfo::num_dimensions(), arm_compute::support::cpp11::to_string(), and arm_compute::validate_and_configure_window().

Referenced by GCGEMM::configure().
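The shader-variant selection in configure() can be restated as a small pure function, which makes the branch structure easier to see and to test. This is a sketch only: DataTypeSketch, ArchSketch, select_gemm_variant and alpha_to_string are hypothetical names, the LOCAL_SIZE/COLS/multiplier/NUM_ELEMS defines and the unsupported-type error path are omitted, and alpha_to_string assumes that "full precision" means std::numeric_limits<float>::max_digits10 significant digits, which is enough for the value to round-trip when the shader compiler parses it back.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <set>
#include <sstream>
#include <string>

// Hypothetical stand-ins for the library's DataType and GPU architecture enums.
enum class DataTypeSketch { F16, F32 };
enum class ArchSketch { MIDGARD, BIFROST };

// Formats alpha with max_digits10 significant digits so that parsing the
// string back yields exactly the same float.
std::string alpha_to_string(float alpha)
{
    std::ostringstream ss;
    ss.precision(std::numeric_limits<float>::max_digits10);
    ss << alpha;
    return ss.str();
}

// Mirrors the branches of configure(): adds the data-type and variant defines
// to build_opts and returns the name of the shader to build.
std::string select_gemm_variant(DataTypeSketch dt, bool is_interleaved_transposed, ArchSketch arch,
                                std::size_t input0_num_dims, std::set<std::string> &build_opts)
{
    build_opts.emplace(dt == DataTypeSketch::F16 ? "#define DATA_TYPE_FP16" : "#define DATA_TYPE_FP32");
    if(is_interleaved_transposed)
    {
        build_opts.emplace("#define GEMM_MM_INTERLEAVED_TRANSPOSED");
        return "gemm_mm_interleaved_transposed";
    }
    if(dt == DataTypeSketch::F16)
    {
        build_opts.emplace("#define MM_PROCESS_4X_OPTIMIZED");
        build_opts.emplace("#define GEMM_MM_FLOATING_POINT");
    }
    else if(arch == ArchSketch::BIFROST && input0_num_dims != 1)
    {
        // F32 on Bifrost with a non-vector input0 uses the Bifrost-specific path.
        build_opts.emplace("#define GEMM_MM_FLOATING_POINT_BIFROST");
    }
    else
    {
        build_opts.emplace("#define GEMM_MM_FLOATING_POINT");
    }
    return "gemm_mm_floating_point";
}
```

Note that the non-reshaped path always builds the shader named gemm_mm_floating_point; only the defines distinguish the F16, generic F32 and Bifrost F32 variants.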

◆ operator=() [1/2]

GCGEMMMatrixMultiplyKernel & operator= (const GCGEMMMatrixMultiplyKernel &) = delete

Prevent instances of this class from being copied (as this class contains pointers).

◆ operator=() [2/2]

GCGEMMMatrixMultiplyKernel & operator= (GCGEMMMatrixMultiplyKernel &&) = default

Allow instances of this class to be moved.
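The copy/move policy documented for the constructors and assignment operators is a standard C++ pattern for classes that hold non-owning pointers. The sketch below reproduces it on a hypothetical KernelSketch class (not part of the library) and checks the resulting properties at compile time with the standard type traits.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

struct TensorSketch { int id; };

// Hypothetical class that, like GCGEMMMatrixMultiplyKernel, keeps non-owning
// pointers to its tensors: copy operations are deleted, move operations are
// defaulted (a move simply transfers the stored pointers).
class KernelSketch
{
public:
    KernelSketch() = default;
    KernelSketch(const KernelSketch &) = delete;
    KernelSketch &operator=(const KernelSketch &) = delete;
    KernelSketch(KernelSketch &&) = default;
    KernelSketch &operator=(KernelSketch &&) = default;

    void configure(const TensorSketch *input, TensorSketch *output)
    {
        _input  = input;
        _output = output;
    }
    const TensorSketch *input() const { return _input; }
    TensorSketch *output() const { return _output; }

private:
    const TensorSketch *_input{ nullptr };
    TensorSketch       *_output{ nullptr };
};

// Compile-time checks of the policy.
static_assert(!std::is_copy_constructible<KernelSketch>::value, "copy construction is deleted");
static_assert(!std::is_copy_assignable<KernelSketch>::value, "copy assignment is deleted");
static_assert(std::is_move_constructible<KernelSketch>::value, "move construction is allowed");
static_assert(std::is_move_assignable<KernelSketch>::value, "move assignment is allowed");
```

Because moves are allowed, such kernels can still be stored in containers or returned by value from factory code.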

◆ run()

void run (const Window &window)
override virtual

Enqueue the OpenGL ES shader to process the given window.

Parameters
[in] window: Region on which to execute the kernel. Must be a valid region of the window returned by window().

Implements IGCKernel.

Definition at line 303 of file GCGEMMMatrixMultiplyKernel.cpp.

void GCGEMMMatrixMultiplyKernel::run(const Window &window)
{
    ARM_COMPUTE_ERROR_ON_UNCONFIGURED_KERNEL(this);
    ARM_COMPUTE_ERROR_ON_INVALID_SUBWINDOW(IKernel::window(), window);

    _kernel.use();

    Window slice          = window.first_slice_window_2D();
    Window slice_matrix_b = slice;

    slice_matrix_b.set(Window::DimX, Window::Dimension(0, 1, 1));
    slice_matrix_b.set(Window::DimY, Window::Dimension(0, 1, 1));

    do
    {
        Window slice_b = slice;
        // Don't slice matrix B along the z dimension if matrix B has just 2 dimensions and matrix A more than 2
        // This scenario can happen when the matrix multiplication is used to perform a convolution operation
        if(_input1->info()->num_dimensions() < 3)
        {
            slice_b = slice_matrix_b;
        }

        unsigned int idx = 0;

        add_2D_tensor_argument(idx, _input0, 1, slice);
        add_2D_tensor_argument(idx, _input1, 2, slice_b);
        add_2D_tensor_argument(idx, _output, 3, slice);
        _kernel.update_shader_params();
        enqueue(*this, slice);
    }
    while(window.slide_window_slice_2D(slice));
}

References IGCKernel::add_2D_tensor_argument(), ARM_COMPUTE_ERROR_ON_INVALID_SUBWINDOW, ARM_COMPUTE_ERROR_ON_UNCONFIGURED_KERNEL, Window::DimX, Window::DimY, arm_compute::enqueue(), Window::first_slice_window_2D(), ITensor::info(), ITensorInfo::num_dimensions(), Window::set(), Window::slide_window_slice_2D(), and IKernel::window().
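The slicing in run() amounts to a batched matrix multiplication in which A and the output advance one 2D slice (one Z index) per iteration, while a 2D matrix B stays fixed at its first slice and is reused for every batch. The CPU sketch below models that behaviour; batched_gemm is a hypothetical helper, and it assumes dense row-major slices stacked contiguously along Z, which need not match the strides of real tensors.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of run()'s slice loop: out[z] = alpha * (A[z] x B'),
// where B' is B[z] if B is 3D and the single 2D matrix B otherwise.
// A[z] is m x k, B is k x n, out[z] is m x n.
std::vector<float> batched_gemm(const std::vector<float> &a, const std::vector<float> &b,
                                std::size_t m, std::size_t k, std::size_t n,
                                std::size_t batches, bool b_is_3d, float alpha)
{
    assert(a.size() == batches * m * k);
    assert(b.size() == (b_is_3d ? batches : 1) * k * n);
    std::vector<float> out(batches * m * n, 0.0f);
    for(std::size_t z = 0; z < batches; ++z) // one iteration per 2D slice
    {
        const float *a_slice = a.data() + z * m * k;
        // Equivalent of slice_b = slice_matrix_b: a 2D B never moves along Z.
        const float *b_slice = b.data() + (b_is_3d ? z * k * n : 0);
        float       *o_slice = out.data() + z * m * n;
        for(std::size_t i = 0; i < m; ++i)
        {
            for(std::size_t j = 0; j < n; ++j)
            {
                float acc = 0.0f;
                for(std::size_t p = 0; p < k; ++p)
                {
                    acc += a_slice[i * k + p] * b_slice[p * n + j];
                }
                o_slice[i * n + j] = alpha * acc;
            }
        }
    }
    return out;
}
```

With a 2D B this is the convolution-style case mentioned in the source comment: many input slices multiplied by one shared weight matrix.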

◆ validate()

Status validate (const ITensorInfo *input0,
                 const ITensorInfo *input1,
                 const ITensorInfo *output,
                 float alpha,
                 bool is_interleaved_transposed,
                 const GEMMReshapeInfo &reshape_info,
                 GPUTarget gpu_target)
static

Static function to check if given info will lead to a valid configuration of GCGEMMMatrixMultiplyKernel.

Parameters
[in] input0: Input tensor containing the matrix A. Data types supported: F16/F32.
[in] input1: Input tensor containing the matrix B. Data type supported: same as input0.
[in] output: Output tensor to store the result of the matrix multiplication. Data type supported: same as input0.
[in] alpha: Weight of the matrix product.
[in] is_interleaved_transposed: True if input0 and input1 have been reshaped using GCGEMMInterleave4x4Kernel and GCGEMMTranspose1xWKernel respectively.
[in] reshape_info: GEMM reshape info. If is_interleaved_transposed = true, this object must contain the information describing how matrix A and matrix B have been reshaped.
[in] gpu_target: GPU target.
Returns
A status.

Definition at line 286 of file GCGEMMMatrixMultiplyKernel.cpp.

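Status GCGEMMMatrixMultiplyKernel::validate(const ITensorInfo *input0, const ITensorInfo *input1, const ITensorInfo *output, float alpha, bool is_interleaved_transposed,
                                            const GEMMReshapeInfo &reshape_info, GPUTarget gpu_target)
{
    ARM_COMPUTE_UNUSED(alpha);
    ElementsProcessed num_elements_processed{};
    ARM_COMPUTE_RETURN_ON_ERROR(validate_arguments(input0, input1, output, is_interleaved_transposed, reshape_info));
    ARM_COMPUTE_RETURN_ON_ERROR(validate_and_configure_window(input0->clone().get(),
                                                              input1->clone().get(),
                                                              output->clone().get(),
                                                              is_interleaved_transposed,
                                                              reshape_info,
                                                              gpu_target,
                                                              num_elements_processed)
                                .first);
    return Status{};
}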

References ARM_COMPUTE_RETURN_ON_ERROR, ARM_COMPUTE_UNUSED, ICloneable< T >::clone(), and arm_compute::validate_and_configure_window().
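validate() runs the window-configuration step on clones of the tensor infos, presumably because that step can modify the metadata it is given (for example, by extending padding), and a validation call should leave the caller's metadata untouched. The sketch below shows that pattern with hypothetical, heavily simplified types (InfoSketch, StatusSketch, configure_window_sketch, validate_sketch); none of them are Compute Library APIs, and the padding rule is an illustrative assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical, heavily simplified tensor metadata.
struct InfoSketch
{
    std::size_t width{ 0 };
    std::size_t padding_right{ 0 };
    std::unique_ptr<InfoSketch> clone() const { return std::make_unique<InfoSketch>(*this); }
};

// Hypothetical status type: an empty message means success.
struct StatusSketch
{
    std::string error;
    explicit operator bool() const { return error.empty(); }
};

// Stand-in for a window-configuration step: it may grow the right padding of
// the info it is given so that whole vectors of `step` elements can be read.
StatusSketch configure_window_sketch(InfoSketch *info, std::size_t step)
{
    if(info->width == 0)
    {
        return { "empty tensor" };
    }
    const std::size_t rem = info->width % step;
    if(rem != 0)
    {
        info->padding_right = std::max(info->padding_right, step - rem);
    }
    return {};
}

// Validation runs the same logic on a clone, so the caller's info is unchanged.
// The temporary unique_ptr lives until the end of the full expression.
StatusSketch validate_sketch(const InfoSketch *info, std::size_t step)
{
    return configure_window_sketch(info->clone().get(), step);
}
```

The same reasoning explains why the library's validate() can take const ITensorInfo pointers while the configuration step it reuses operates on mutable ones.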

