Compute Library
 19.08
CLGEMMMatrixMultiplyKernel Class Reference

OpenCL kernel to multiply two input matrices "A" and "B" and add a matrix "C" if provided. More...

#include <CLGEMMMatrixMultiplyKernel.h>

Collaboration diagram for CLGEMMMatrixMultiplyKernel:

Public Member Functions

 CLGEMMMatrixMultiplyKernel ()
 Default constructor. More...
 
 CLGEMMMatrixMultiplyKernel (const CLGEMMMatrixMultiplyKernel &)=delete
 Prevent instances of this class from being copied (As this class contains pointers) More...
 
CLGEMMMatrixMultiplyKernel & operator= (const CLGEMMMatrixMultiplyKernel &)=delete
 Prevent instances of this class from being copied (As this class contains pointers) More...
 
 CLGEMMMatrixMultiplyKernel (CLGEMMMatrixMultiplyKernel &&)=default
 Allow instances of this class to be moved. More...
 
CLGEMMMatrixMultiplyKernel & operator= (CLGEMMMatrixMultiplyKernel &&)=default
 Allow instances of this class to be moved. More...
 
void configure (const ICLTensor *input0, const ICLTensor *input1, const ICLTensor *input2, ICLTensor *output, float alpha, float beta=0.f, bool is_interleaved_transposed=true, const GEMMReshapeInfo &reshape_info=GEMMReshapeInfo(), bool fp_mixed_precision=false, const ActivationLayerInfo &activation_info=ActivationLayerInfo())
 Initialise the kernel's input, output and alpha. More...
 
void run (const Window &window, cl::CommandQueue &queue) override
 Enqueue the OpenCL kernel to process the given window on the passed OpenCL command queue. More...
 
- Public Member Functions inherited from ICLKernel
 ICLKernel ()
 Constructor. More...
 
cl::Kernel & kernel ()
 Returns a reference to the OpenCL kernel of this object. More...
 
template<typename T >
void add_1D_array_argument (unsigned int &idx, const ICLArray< T > *array, const Strides &strides, unsigned int num_dimensions, const Window &window)
 Add the passed 1D array's parameters to the object's kernel's arguments starting from the index idx. More...
 
void add_1D_tensor_argument (unsigned int &idx, const ICLTensor *tensor, const Window &window)
 Add the passed 1D tensor's parameters to the object's kernel's arguments starting from the index idx. More...
 
void add_1D_tensor_argument_if (bool cond, unsigned int &idx, const ICLTensor *tensor, const Window &window)
 Add the passed 1D tensor's parameters to the object's kernel's arguments starting from the index idx if the condition is true. More...
 
void add_2D_tensor_argument (unsigned int &idx, const ICLTensor *tensor, const Window &window)
 Add the passed 2D tensor's parameters to the object's kernel's arguments starting from the index idx. More...
 
void add_2D_tensor_argument_if (bool cond, unsigned int &idx, const ICLTensor *tensor, const Window &window)
 Add the passed 2D tensor's parameters to the object's kernel's arguments starting from the index idx if the condition is true. More...
 
void add_3D_tensor_argument (unsigned int &idx, const ICLTensor *tensor, const Window &window)
 Add the passed 3D tensor's parameters to the object's kernel's arguments starting from the index idx. More...
 
void add_4D_tensor_argument (unsigned int &idx, const ICLTensor *tensor, const Window &window)
 Add the passed 4D tensor's parameters to the object's kernel's arguments starting from the index idx. More...
 
template<typename T >
void add_argument (unsigned int &idx, T value)
 Add the passed parameters to the object's kernel's arguments starting from the index idx. More...
 
void set_lws_hint (const cl::NDRange &lws_hint)
 Set the Local-Workgroup-Size hint. More...
 
cl::NDRange lws_hint () const
 Return the Local-Workgroup-Size hint. More...
 
const std::string & config_id () const
 Get the configuration ID. More...
 
void set_target (GPUTarget target)
 Set the targeted GPU architecture. More...
 
void set_target (cl::Device &device)
 Set the targeted GPU architecture according to the CL device. More...
 
GPUTarget get_target () const
 Get the targeted GPU architecture. More...
 
size_t get_max_workgroup_size ()
 Get the maximum workgroup size for the device the CLKernelLibrary uses. More...
 
template<typename T , unsigned int dimension_size>
void add_array_argument (unsigned &idx, const ICLArray< T > *array, const Strides &strides, unsigned int num_dimensions, const Window &window)
 Add the passed array's parameters to the object's kernel's arguments starting from the index idx. More...
 
template<unsigned int dimension_size>
void add_tensor_argument (unsigned &idx, const ICLTensor *tensor, const Window &window)
 
- Public Member Functions inherited from IKernel
 IKernel ()
 Constructor. More...
 
virtual ~IKernel ()=default
 Destructor. More...
 
virtual bool is_parallelisable () const
 Indicates whether or not the kernel is parallelisable. More...
 
virtual BorderSize border_size () const
 The size of the border for that kernel. More...
 
const Window & window () const
 The maximum window the kernel can be executed on. More...
 

Static Public Member Functions

static Status validate (const ITensorInfo *input0, const ITensorInfo *input1, const ITensorInfo *input2, const ITensorInfo *output, float alpha, float beta, bool is_interleaved_transposed, const GEMMReshapeInfo &reshape_info, GPUTarget gpu_target, bool fp_mixed_precision=false, const ActivationLayerInfo &activation_info=ActivationLayerInfo())
 Static function to check if given info will lead to a valid configuration of CLGEMMMatrixMultiplyKernel. More...
 
- Static Public Member Functions inherited from ICLKernel
static constexpr unsigned int num_arguments_per_1D_array ()
 Returns the number of arguments enqueued per 1D array object. More...
 
static constexpr unsigned int num_arguments_per_1D_tensor ()
 Returns the number of arguments enqueued per 1D tensor object. More...
 
static constexpr unsigned int num_arguments_per_2D_tensor ()
 Returns the number of arguments enqueued per 2D tensor object. More...
 
static constexpr unsigned int num_arguments_per_3D_tensor ()
 Returns the number of arguments enqueued per 3D tensor object. More...
 
static constexpr unsigned int num_arguments_per_4D_tensor ()
 Returns the number of arguments enqueued per 4D tensor object. More...
 
static cl::NDRange gws_from_window (const Window &window)
 Get the global work size given an execution window. More...
 

Data Fields

const ICLTensor * _input0
 
const ICLTensor * _input1
 
const ICLTensor * _input2
 
ICLTensor * _output
 
bool _slide_matrix_b
 
bool _reinterpret_input_as_3d
 
bool _reinterpret_output_as_3d
 
bool _add_bias
 
bool _broadcast_bias
 

Detailed Description

OpenCL kernel to multiply two input matrices "A" and "B" and add a matrix "C" if provided.

All elements of the output matrix will be multiplied by alpha. If matrix C is passed, it will be added to the previous result. For matrix C, broadcast addition is supported if the flag "broadcast_bias" is set in the GEMMReshapeInfo object.
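Put differently, the kernel computes dst = alpha * (A * B) and, when a bias tensor C is provided, adds beta * C on top (note that configure() currently only supports beta = 1 when a bias is used).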

Note
If the input tensors input0 and input1 have been reshaped respectively with CLGEMMReshapeLHSMatrixKernel and CLGEMMReshapeRHSMatrixKernel, the flag is_interleaved_transposed must be set to true.
Attention
input1 tensor must have at least 2 dimensions (matrix)

Definition at line 42 of file CLGEMMMatrixMultiplyKernel.h.

Constructor & Destructor Documentation

◆ CLGEMMMatrixMultiplyKernel() [1/3]

Default constructor.
◆ CLGEMMMatrixMultiplyKernel() [2/3]

Prevent instances of this class from being copied (As this class contains pointers)

◆ CLGEMMMatrixMultiplyKernel() [3/3]

Allow instances of this class to be moved.

Member Function Documentation

◆ configure()

void configure ( const ICLTensor *          input0,
                 const ICLTensor *          input1,
                 const ICLTensor *          input2,
                 ICLTensor *                output,
                 float                      alpha,
                 float                      beta = 0.f,
                 bool                       is_interleaved_transposed = true,
                 const GEMMReshapeInfo &    reshape_info = GEMMReshapeInfo(),
                 bool                       fp_mixed_precision = false,
                 const ActivationLayerInfo &activation_info = ActivationLayerInfo() 
               )

Initialise the kernel's input, output and alpha.

Parameters
[in]  input0                     Input tensor containing the Matrix A. Data types supported: F16/F32
[in]  input1                     Input tensor containing the Matrix B. Data type supported: same as input0
[in]  input2                     Input tensor containing the Matrix C (bias). Can be nullptr. Data type supported: same as input0
[out] output                     Output tensor to store the result of matrix multiplication. Data type supported: same as input0
[in]  alpha                      Weight of the matrix product
[in]  beta                       (Optional) Weight of vector C. Default value is 0. Only beta = 1 is currently supported.
[in]  is_interleaved_transposed  (Optional) True if input0 and input1 have been reshaped respectively using CLGEMMReshapeLHSMatrixKernel and CLGEMMReshapeRHSMatrixKernel
[in]  reshape_info               (Optional) GEMM reshape info. If is_interleaved_transposed = true, this object must contain the information to understand how the matrix A and matrix B have been reshaped
[in]  fp_mixed_precision         (Optional) Use wider accumulators (32 bit instead of 16 for FP16) to improve accuracy
[in]  activation_info            (Optional) Activation to apply after the matrix multiplication
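A minimal usage sketch follows. It is illustrative rather than taken from this page: the tensor shapes, the CLScheduler/CLTensor setup and the non-reshaped F32 configuration are assumptions, and applications would normally reach this kernel through the higher-level CLGEMM function.

#include "arm_compute/core/TensorInfo.h"
#include "arm_compute/core/CL/kernels/CLGEMMMatrixMultiplyKernel.h"
#include "arm_compute/runtime/CL/CLScheduler.h"
#include "arm_compute/runtime/CL/CLTensor.h"

using namespace arm_compute;

int main()
{
    CLScheduler::get().default_init(); // Create context and queue for the default OpenCL device

    // A: M x K, B: K x N, Dst: M x N (TensorShape lists the width/column dimension first)
    constexpr unsigned int M = 64, N = 128, K = 256;
    CLTensor a, b, dst;
    a.allocator()->init(TensorInfo(TensorShape(K, M), 1, DataType::F32));
    b.allocator()->init(TensorInfo(TensorShape(N, K), 1, DataType::F32));
    dst.allocator()->init(TensorInfo(TensorShape(N, M), 1, DataType::F32));

    CLGEMMMatrixMultiplyKernel mm_kernel;
    mm_kernel.set_target(CLScheduler::get().target());
    // No bias, alpha = 1, beta = 0, inputs not reshaped (is_interleaved_transposed = false)
    mm_kernel.configure(&a, &b, nullptr, &dst, 1.0f, 0.0f, false);

    a.allocator()->allocate();
    b.allocator()->allocate();
    dst.allocator()->allocate();
    // ... map the tensors and fill a and b here ...

    CLScheduler::get().enqueue(mm_kernel); // Runs the kernel over its maximum window
    CLScheduler::get().sync();
    return 0;
}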

Definition at line 305 of file CLGEMMMatrixMultiplyKernel.cpp.

{
    ARM_COMPUTE_ERROR_ON_NULLPTR(input0, input1, output);

    // Perform validate step
    ARM_COMPUTE_ERROR_THROW_ON(validate_arguments(input0->info(), input1->info(), (input2 != nullptr) ? input2->info() : nullptr, output->info(), beta,
                                                  is_interleaved_transposed, reshape_info, fp_mixed_precision));

    _input0                   = input0;
    _input1                   = input1;
    _input2                   = helpers::float_ops::is_zero(beta) ? nullptr : input2;
    _output                   = output;
    _reinterpret_input_as_3d  = reshape_info.reinterpret_input_as_3d();
    _reinterpret_output_as_3d = (reshape_info.depth_output_gemm3d() != 0);
    _add_bias                 = _input2 != nullptr;
    _broadcast_bias           = reshape_info.broadcast_bias();

    // In case both input and output have to be reinterpreted as 3D tensors,
    // force reinterpret_input_as_3d and reinterpret_output_as_3d to be false.
    if(_reinterpret_input_as_3d == _reinterpret_output_as_3d)
    {
        _reinterpret_input_as_3d  = false;
        _reinterpret_output_as_3d = false;
    }

    // Check if we need to slide the matrix B
    const unsigned int num_dimensions_input0 = _reinterpret_input_as_3d ? _input0->info()->num_dimensions() - 1 : _input0->info()->num_dimensions();

    _slide_matrix_b = (_input1->info()->num_dimensions() >= num_dimensions_input0);

    const DataType data_type = input0->info()->data_type();

    // Get target architecture
    GPUTarget gpu_target = get_target();

    ElementsProcessed num_elements_processed{};

    // Configure kernel window
    auto win_config = validate_and_configure_window(input0->info(), input1->info(), (input2 != nullptr) ? input2->info() : nullptr, output->info(), beta, is_interleaved_transposed, reshape_info,
                                                    gpu_target, num_elements_processed);
    ARM_COMPUTE_ERROR_THROW_ON(win_config.first);
    ICLKernel::configure_internal(win_config.second);

    // Create build options
    CLBuildOptions build_opts;

    build_opts.add_option_if(!(helpers::float_ops::is_one(alpha)), "-DALPHA=" + float_to_string_with_full_precision(alpha));
    build_opts.add_option_if(_input2 != nullptr, "-DBETA=" + float_to_string_with_full_precision(beta));
    build_opts.add_option_if(helpers::float_ops::is_one(beta), "-DUNIT_BETA");
    build_opts.add_option_if(reshape_info.broadcast_bias(), "-DBROADCAST_BIAS");
    build_opts.add_option_if(_reinterpret_input_as_3d, "-DREINTERPRET_INPUT_AS_3D");
    build_opts.add_option_if(_reinterpret_output_as_3d, "-DREINTERPRET_OUTPUT_AS_3D");
    build_opts.add_option_if(_reinterpret_input_as_3d || _reinterpret_output_as_3d, "-DHEIGHT_GEMM3D=" + support::cpp11::to_string(output->info()->dimension(1)));
    build_opts.add_option_if(_reinterpret_input_as_3d || _reinterpret_output_as_3d, "-DDEPTH_GEMM3D=" + support::cpp11::to_string(output->info()->dimension(2)));
    build_opts.add_option_if(!_slide_matrix_b, "-DMATRIX_B_DEPTH=" + support::cpp11::to_string(input1->info()->dimension(2)));
    build_opts.add_option_if(activation_info.enabled(), "-DACTIVATION_TYPE=" + lower_string(string_from_activation_func(activation_info.activation())));
    build_opts.add_option_if(activation_info.enabled(), "-DA_VAL=" + float_to_string_with_full_precision(activation_info.a()));
    build_opts.add_option_if(activation_info.enabled(), "-DB_VAL=" + float_to_string_with_full_precision(activation_info.b()));

    const bool is_bifrost = get_arch_from_target(gpu_target) == GPUTarget::BIFROST;

    std::string kernel_name;
    if(is_interleaved_transposed)
    {
        const int mult_transpose1xW_width   = reshape_info.mult_transpose1xW_width();
        const int mult_interleave4x4_height = reshape_info.mult_interleave4x4_height();

        build_opts.add_option("-DCOLS_B=" + support::cpp11::to_string(input1->info()->dimension(0)));
        build_opts.add_option("-DMULT_TRANSPOSE1XW_WIDTH=" + support::cpp11::to_string(mult_transpose1xW_width));
        build_opts.add_option("-DMULT_INTERLEAVE4X4_HEIGHT=" + support::cpp11::to_string(mult_interleave4x4_height));

        if(is_data_type_float(data_type) && is_bifrost)
        {
            kernel_name = "gemm_mm_interleaved_transposed_" + lower_string(string_from_data_type(data_type)) + "_bifrost";
        }
        else
        {
            kernel_name = "gemm_mm_interleaved_transposed_" + lower_string(string_from_data_type(data_type));
            if(fp_mixed_precision && data_type == DataType::F16)
            {
                // currently wider accumulator is only supported for fp16 kernels.
                kernel_name += "_acc32";
            }
        }
    }
    else // The input tensors have not been reshaped
    {
        build_opts.add_option("-DCOLS_A=" + support::cpp11::to_string(input0->info()->dimension(0)));
        build_opts.add_option("-DDATA_TYPE=" + get_cl_type_from_data_type(data_type));

        // Create kernels according to the architecture, data type and input size.
        if(is_data_type_float(data_type) && is_bifrost)
        {
            kernel_name = "gemm_mm_floating_point";

            if(input0->info()->num_dimensions() != 1)
            {
                kernel_name += "_" + lower_string(string_from_data_type(data_type)) + "_bifrost";
                if(fp_mixed_precision && data_type == DataType::F16)
                {
                    // currently wider accumulator is only supported for fp16 kernels.
                    kernel_name += "_acc32";
                }
            }
            else if(input1->info()->dimension(0) <= 1000 && data_type == DataType::F32)
            {
                // The first kernel is optimized for the case of 1000 or less output elements (e.g. FC8 of AlexNet and VGG-16, and
                // FC1 of Inception v3). The second kernel is optimized for the case of greater than 1000 output elements (e.g.
                // FC6 and FC7 of AlexNet and VGG-16).
                kernel_name += "_" + lower_string(string_from_data_type(data_type)) + "_bifrost_1000";
            }

            // The work-group size equal to the Bifrost quad size has been proved to be optimal for these kernels
            // via exhaustive autotuning over a range of representative layer configurations.
            set_lws_hint(cl::NDRange(4));
        }
        else // (MIDGARD and F32) or (F16)
        {
            kernel_name = "gemm_mm_floating_point";
        }
        build_opts.add_option("-DNUM_ELEMS_PROCESSED_PER_THREAD_Y=" + support::cpp11::to_string(num_elements_processed.y()));
        build_opts.add_option("-DNUM_ELEMS_PROCESSED_PER_THREAD_X=" + support::cpp11::to_string(num_elements_processed.x()));
    }

    // Create kernel
    _kernel = static_cast<cl::Kernel>(CLKernelLibrary::get().create_kernel(kernel_name, build_opts.options()));

    // Set config_id for enabling LWS tuning
    _config_id = "gemm_";
    _config_id += (is_interleaved_transposed ? "reshaped_" : "");
    _config_id += (_add_bias ? "add_bias_" : "");
    _config_id += (_broadcast_bias ? "broadcast_bias_" : "");
    _config_id += (fp_mixed_precision ? "fp_mixed_" : "");
    _config_id += (_reinterpret_input_as_3d ? "3di_" : "");
    _config_id += (_reinterpret_output_as_3d ? "3do_" : "");
    _config_id += lower_string(string_from_data_type(input0->info()->data_type()));
    _config_id += "_";
    _config_id += support::cpp11::to_string(output->info()->dimension(1));
    _config_id += "_";
    _config_id += support::cpp11::to_string(output->info()->dimension(0));
    _config_id += "_";
    _config_id += support::cpp11::to_string(output->info()->dimension(2));
    _config_id += "_";
    _config_id += support::cpp11::to_string(output->info()->dimension(3));
    _config_id += "_";
    _config_id += (is_interleaved_transposed ? support::cpp11::to_string(input1->info()->dimension(0)) : support::cpp11::to_string(input1->info()->dimension(1)));
}

References CLGEMMMatrixMultiplyKernel::_add_bias, CLGEMMMatrixMultiplyKernel::_broadcast_bias, CLGEMMMatrixMultiplyKernel::_input0, CLGEMMMatrixMultiplyKernel::_input1, CLGEMMMatrixMultiplyKernel::_input2, CLGEMMMatrixMultiplyKernel::_output, CLGEMMMatrixMultiplyKernel::_reinterpret_input_as_3d, CLGEMMMatrixMultiplyKernel::_reinterpret_output_as_3d, CLGEMMMatrixMultiplyKernel::_slide_matrix_b, ActivationLayerInfo::a(), ActivationLayerInfo::activation(), CLBuildOptions::add_option(), CLBuildOptions::add_option_if(), arm_compute::test::validation::alpha, ARM_COMPUTE_ERROR_ON_NULLPTR, ARM_COMPUTE_ERROR_THROW_ON, ActivationLayerInfo::b(), arm_compute::BIFROST, arm_compute::create_kernel(), arm_compute::test::validation::data_type, ITensorInfo::data_type(), ITensorInfo::dimension(), ActivationLayerInfo::enabled(), arm_compute::F16, arm_compute::F32, arm_compute::float_to_string_with_full_precision(), CLKernelLibrary::get(), arm_compute::get_arch_from_target(), arm_compute::get_cl_type_from_data_type(), ICLKernel::get_target(), ITensor::info(), arm_compute::is_data_type_float(), arm_compute::helpers::float_ops::is_one(), arm_compute::helpers::float_ops::is_zero(), arm_compute::lower_string(), ITensorInfo::num_dimensions(), CLBuildOptions::options(), ICLKernel::set_lws_hint(), arm_compute::string_from_activation_func(), arm_compute::string_from_data_type(), arm_compute::support::cpp11::to_string(), and arm_compute::validate_and_configure_window().

◆ operator=() [1/2]

CLGEMMMatrixMultiplyKernel & operator= ( const CLGEMMMatrixMultiplyKernel & )
delete

Prevent instances of this class from being copied (As this class contains pointers)

◆ operator=() [2/2]

Allow instances of this class to be moved.

◆ run()

void run ( const Window &     window,
           cl::CommandQueue & queue 
         )
override virtual

Enqueue the OpenCL kernel to process the given window on the passed OpenCL command queue.

Note
The queue is not flushed by this method, and therefore the kernel will not have been executed by the time this method returns.
Parameters
[in]     window  Region on which to execute the kernel. (Must be a valid region of the window returned by window()).
[in,out] queue   Command queue on which to enqueue the kernel.
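As a rough sketch of invoking the kernel (reusing the hypothetical mm_kernel object from the configure() sketch above; the explicit finish() call is an illustrative way of forcing completion, not something this method does for you):

// Typical path: the scheduler picks the kernel's maximum window and its own queue
CLScheduler::get().enqueue(mm_kernel);

// Roughly equivalent manual call; run() does not flush the queue, so force completion explicitly if needed
mm_kernel.run(mm_kernel.window(), CLScheduler::get().queue());
CLScheduler::get().queue().finish();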

Implements ICLKernel.

Definition at line 476 of file CLGEMMMatrixMultiplyKernel.cpp.

{
    ARM_COMPUTE_ERROR_ON_UNCONFIGURED_KERNEL(this);
    ARM_COMPUTE_ERROR_ON_INVALID_SUBWINDOW(ICLKernel::window(), window);

    if(_input1->info()->num_dimensions() < 3)
    {
        // The stride_z for matrix B must be zero if we do not slice
        ARM_COMPUTE_ERROR_ON(_input1->info()->strides_in_bytes()[3] != 0);
    }

    Window slice          = window.first_slice_window_3D();
    Window slice_matrix_b = slice;

    slice_matrix_b.set(Window::DimX, Window::Dimension(0, 1, 1));
    slice_matrix_b.set(Window::DimY, Window::Dimension(0, 1, 1));

    const unsigned int num_arguments_bias = _add_bias ? num_arguments_per_2D_tensor() + 1 : 0;

    if(_reinterpret_input_as_3d)
    {
        // Pass bottom paddings to the kernel if the input has to be reinterpreted as 3D tensor
        const unsigned int idx0                  = 3 * num_arguments_per_2D_tensor() + 3 + num_arguments_bias;
        const unsigned int total_cross_plane_pad = _input0->info()->padding().top + _input0->info()->padding().bottom;
        _kernel.setArg<cl_uint>(idx0, static_cast<unsigned int>(total_cross_plane_pad));
    }

    if(_reinterpret_output_as_3d)
    {
        // Pass bottom paddings to the kernel if the output has to be reinterpreted as 3D tensor
        const unsigned int idx0                  = 3 * num_arguments_per_2D_tensor() + 3 + (_reinterpret_input_as_3d ? 1 : 0) + num_arguments_bias;
        const unsigned int total_cross_plane_pad = _output->info()->padding().top + _output->info()->padding().bottom;
        _kernel.setArg<cl_uint>(idx0, static_cast<unsigned int>(total_cross_plane_pad));
    }

    do
    {
        Window slice_b = slice;
        // Don't slice matrix B along the z dimension if matrix B has just 2 dimensions and matrix A more than 2
        // This scenario can happen when the matrix multiplication is used to perform a convolution operation
        if(!_slide_matrix_b)
        {
            slice_b = slice_matrix_b;
        }

        unsigned int idx = 0;
        add_2D_tensor_argument(idx, _input0, slice);
        add_2D_tensor_argument(idx, _input1, slice_b);
        if(_add_bias)
        {
            add_2D_tensor_argument(idx, _input2, slice);
        }
        add_2D_tensor_argument(idx, _output, slice);
        _kernel.setArg<cl_uint>(idx++, static_cast<unsigned int>(_input0->info()->strides_in_bytes()[2]));
        _kernel.setArg<cl_uint>(idx++, static_cast<unsigned int>(_input1->info()->strides_in_bytes()[2]));
        if(_add_bias)
        {
            _kernel.setArg<cl_uint>(idx++, static_cast<unsigned int>(_input2->info()->strides_in_bytes()[2]));
        }
        _kernel.setArg<cl_uint>(idx++, static_cast<unsigned int>(_output->info()->strides_in_bytes()[2]));
        enqueue(queue, *this, slice, lws_hint());
    }
    while(window.slide_window_slice_3D(slice));
}

References CLGEMMMatrixMultiplyKernel::_add_bias, CLGEMMMatrixMultiplyKernel::_input0, CLGEMMMatrixMultiplyKernel::_input1, CLGEMMMatrixMultiplyKernel::_input2, CLGEMMMatrixMultiplyKernel::_output, CLGEMMMatrixMultiplyKernel::_reinterpret_input_as_3d, CLGEMMMatrixMultiplyKernel::_reinterpret_output_as_3d, CLGEMMMatrixMultiplyKernel::_slide_matrix_b, ICLKernel::add_2D_tensor_argument(), ARM_COMPUTE_ERROR_ON, ARM_COMPUTE_ERROR_ON_INVALID_SUBWINDOW, ARM_COMPUTE_ERROR_ON_UNCONFIGURED_KERNEL, BorderSize::bottom, Window::DimX, Window::DimY, arm_compute::enqueue(), Window::first_slice_window_3D(), ITensor::info(), ICLKernel::lws_hint(), ICLKernel::num_arguments_per_2D_tensor(), ITensorInfo::num_dimensions(), ITensorInfo::padding(), Window::set(), arm_compute::test::validation::reference::slice(), Window::slide_window_slice_3D(), ITensorInfo::strides_in_bytes(), BorderSize::top, and IKernel::window().

◆ validate()

Status validate ( const ITensorInfo *        input0,
                  const ITensorInfo *        input1,
                  const ITensorInfo *        input2,
                  const ITensorInfo *        output,
                  float                      alpha,
                  float                      beta,
                  bool                       is_interleaved_transposed,
                  const GEMMReshapeInfo &    reshape_info,
                  GPUTarget                  gpu_target,
                  bool                       fp_mixed_precision = false,
                  const ActivationLayerInfo &activation_info = ActivationLayerInfo() 
                )
static

Static function to check if given info will lead to a valid configuration of CLGEMMMatrixMultiplyKernel.

Parameters
[in]  input0                     Input tensor containing the Matrix A info. Data types supported: F16/F32
[in]  input1                     Input tensor containing the Matrix B info. Data type supported: same as input0
[in]  input2                     Input tensor containing the Matrix C (bias) info. Can be nullptr. Data type supported: same as input0
[in]  output                     Output tensor to store the result of matrix multiplication. Data type supported: same as input0
[in]  alpha                      Weight of the matrix product
[in]  beta                       Weight of vector C. Default value is 0. Only beta = 1 is currently supported.
[in]  is_interleaved_transposed  True if input0 and input1 have been reshaped respectively using CLGEMMReshapeLHSMatrixKernel and CLGEMMReshapeRHSMatrixKernel
[in]  reshape_info               GEMM reshape info. If is_interleaved_transposed = true, this object must contain the information to understand how the matrix A and matrix B have been reshaped
[in]  gpu_target                 GPU Target
[in]  fp_mixed_precision         (Optional) Use wider accumulators (32 bit instead of 16 for FP16) to improve accuracy
[in]  activation_info            (Optional) Activation to apply after the matrix multiplication
Returns
a status
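A short sketch of the validate-before-configure pattern (reusing the tensors from the configure() sketch above; the argument values are illustrative assumptions):

// Check the configuration using metadata only; no allocations are required
const Status status = CLGEMMMatrixMultiplyKernel::validate(a.info(), b.info(), nullptr, dst.info(),
                                                           1.0f /* alpha */, 0.0f /* beta */,
                                                           false /* is_interleaved_transposed */,
                                                           GEMMReshapeInfo(), CLScheduler::get().target());
ARM_COMPUTE_ERROR_THROW_ON(status); // Raises an error with the failure description if the setup is invalid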

Definition at line 454 of file CLGEMMMatrixMultiplyKernel.cpp.

{
    // Note: num_elements_processed will be set in validate_and_configure_window()
    ElementsProcessed num_elements_processed{};
    ARM_COMPUTE_UNUSED(alpha);
    ARM_COMPUTE_UNUSED(activation_info);
    ARM_COMPUTE_RETURN_ON_ERROR(validate_arguments(input0, input1, input2, output, beta, is_interleaved_transposed, reshape_info, fp_mixed_precision));
    ARM_COMPUTE_RETURN_ON_ERROR(validate_and_configure_window(input0->clone().get(),
                                                              input1->clone().get(),
                                                              (input2 != nullptr) ? input2->clone().get() : nullptr,
                                                              output->clone().get(),
                                                              beta,
                                                              is_interleaved_transposed,
                                                              reshape_info,
                                                              gpu_target,
                                                              num_elements_processed)
                                .first);

    return Status{};
}

References arm_compute::test::validation::alpha, ARM_COMPUTE_RETURN_ON_ERROR, ARM_COMPUTE_UNUSED, ICloneable< T >::clone(), and arm_compute::validate_and_configure_window().

Field Documentation

◆ _add_bias

◆ _broadcast_bias

bool _broadcast_bias

Definition at line 102 of file CLGEMMMatrixMultiplyKernel.h.

Referenced by CLGEMMMatrixMultiplyKernel::configure().

◆ _input0

◆ _input1

◆ _input2

◆ _output

◆ _reinterpret_input_as_3d

bool _reinterpret_input_as_3d

◆ _reinterpret_output_as_3d

bool _reinterpret_output_as_3d

◆ _slide_matrix_b

bool _slide_matrix_b

The documentation for this class was generated from the following files:

CLGEMMMatrixMultiplyKernel.h
CLGEMMMatrixMultiplyKernel.cpp