Compute Library
 21.02
CLGEMMLowpMatrixMultiplyReshapedOnlyRHSKernel Class Reference

OpenCL kernel to multiply matrices with QASYMM8 data type when only the input matrix RHS (input1) has been reshaped. More...

#include <CLGEMMLowpMatrixMultiplyReshapedOnlyRHSKernel.h>

Collaboration diagram for CLGEMMLowpMatrixMultiplyReshapedOnlyRHSKernel:

Public Member Functions

 CLGEMMLowpMatrixMultiplyReshapedOnlyRHSKernel ()
 Default Constructor. More...
 
 CLGEMMLowpMatrixMultiplyReshapedOnlyRHSKernel (const CLGEMMLowpMatrixMultiplyReshapedOnlyRHSKernel &)=delete
 Prevent instances of this class from being copied (As this class contains pointers) More...
 
CLGEMMLowpMatrixMultiplyReshapedOnlyRHSKernel & operator= (const CLGEMMLowpMatrixMultiplyReshapedOnlyRHSKernel &)=delete
 Prevent instances of this class from being copied (As this class contains pointers) More...
 
 CLGEMMLowpMatrixMultiplyReshapedOnlyRHSKernel (CLGEMMLowpMatrixMultiplyReshapedOnlyRHSKernel &&)=default
 Allow instances of this class to be moved. More...
 
CLGEMMLowpMatrixMultiplyReshapedOnlyRHSKernel & operator= (CLGEMMLowpMatrixMultiplyReshapedOnlyRHSKernel &&)=default
 Allow instances of this class to be moved. More...
 
void configure (const ICLTensor *input0, const ICLTensor *input1, ICLTensor *output, const GEMMKernelInfo &gemm_info, const ICLTensor *vector_sum_col=nullptr, const ICLTensor *vector_sum_row=nullptr, const ICLTensor *bias=nullptr, const ICLTensor *output_multipliers=nullptr, const ICLTensor *output_shifts=nullptr)
 Initialise the kernel's input and output. More...
 
void configure (const CLCompileContext &compile_context, const ICLTensor *input0, const ICLTensor *input1, ICLTensor *output, const GEMMKernelInfo &gemm_info, const ICLTensor *vector_sum_col=nullptr, const ICLTensor *vector_sum_row=nullptr, const ICLTensor *bias=nullptr, const ICLTensor *output_multipliers=nullptr, const ICLTensor *output_shifts=nullptr)
 Initialise the kernel's input and output. More...
 
void run (const Window &window, cl::CommandQueue &queue) override
 Enqueue the OpenCL kernel to process the given window on the passed OpenCL command queue. More...
 
- Public Member Functions inherited from ICLKernel
 ICLKernel ()
 Constructor. More...
 
cl::Kernel & kernel ()
 Returns a reference to the OpenCL kernel of this object. More...
 
template<typename T >
void add_1D_array_argument (unsigned int &idx, const ICLArray< T > *array, const Strides &strides, unsigned int num_dimensions, const Window &window)
 Add the passed 1D array's parameters to the object's kernel's arguments starting from the index idx. More...
 
void add_1D_tensor_argument (unsigned int &idx, const ICLTensor *tensor, const Window &window)
 Add the passed 1D tensor's parameters to the object's kernel's arguments starting from the index idx. More...
 
void add_1D_tensor_argument_if (bool cond, unsigned int &idx, const ICLTensor *tensor, const Window &window)
 Add the passed 1D tensor's parameters to the object's kernel's arguments starting from the index idx if the condition is true. More...
 
void add_2D_tensor_argument (unsigned int &idx, const ICLTensor *tensor, const Window &window)
 Add the passed 2D tensor's parameters to the object's kernel's arguments starting from the index idx. More...
 
void add_2D_tensor_argument_if (bool cond, unsigned int &idx, const ICLTensor *tensor, const Window &window)
 Add the passed 2D tensor's parameters to the object's kernel's arguments starting from the index idx if the condition is true. More...
 
void add_3D_tensor_argument (unsigned int &idx, const ICLTensor *tensor, const Window &window)
 Add the passed 3D tensor's parameters to the object's kernel's arguments starting from the index idx. More...
 
void add_4D_tensor_argument (unsigned int &idx, const ICLTensor *tensor, const Window &window)
 Add the passed 4D tensor's parameters to the object's kernel's arguments starting from the index idx. More...
 
virtual void run_op (ITensorPack &tensors, const Window &window, cl::CommandQueue &queue)
 Enqueue the OpenCL kernel to process the given window on the passed OpenCL command queue. More...
 
template<typename T >
void add_argument (unsigned int &idx, T value)
 Add the passed parameters to the object's kernel's arguments starting from the index idx. More...
 
void set_lws_hint (const cl::NDRange &lws_hint)
 Set the Local-Workgroup-Size hint. More...
 
cl::NDRange lws_hint () const
 Return the Local-Workgroup-Size hint. More...
 
void set_wbsm_hint (const cl_int &wbsm_hint)
 Set the workgroup batch size modifier hint. More...
 
cl_int wbsm_hint () const
 Return the workgroup batch size modifier hint. More...
 
const std::string & config_id () const
 Get the configuration ID. More...
 
void set_target (GPUTarget target)
 Set the targeted GPU architecture. More...
 
void set_target (cl::Device &device)
 Set the targeted GPU architecture according to the CL device. More...
 
GPUTarget get_target () const
 Get the targeted GPU architecture. More...
 
size_t get_max_workgroup_size ()
 Get the maximum workgroup size for the device the CLKernelLibrary uses. More...
 
template<unsigned int dimension_size>
void add_tensor_argument (unsigned &idx, const ICLTensor *tensor, const Window &window)
 
template<typename T , unsigned int dimension_size>
void add_array_argument (unsigned &idx, const ICLArray< T > *array, const Strides &strides, unsigned int num_dimensions, const Window &window)
 Add the passed array's parameters to the object's kernel's arguments starting from the index idx. More...
 
- Public Member Functions inherited from IKernel
 IKernel ()
 Constructor. More...
 
virtual ~IKernel ()=default
 Destructor. More...
 
virtual bool is_parallelisable () const
 Indicates whether or not the kernel is parallelisable. More...
 
virtual BorderSize border_size () const
 The size of the border for that kernel. More...
 
const Window & window () const
 The maximum window the kernel can be executed on. More...
 

Static Public Member Functions

static Status validate (const ITensorInfo *input0, const ITensorInfo *input1, const ITensorInfo *output, const GEMMKernelInfo &gemm_info, const ITensorInfo *vector_sum_col=nullptr, const ITensorInfo *vector_sum_row=nullptr, const ITensorInfo *bias=nullptr, const ITensorInfo *output_multipliers=nullptr, const ITensorInfo *output_shifts=nullptr)
 Static function to check if given info will lead to a valid configuration of CLGEMMLowpMatrixMultiplyReshapedOnlyRHSKernel. More...
 
- Static Public Member Functions inherited from ICLKernel
static constexpr unsigned int num_arguments_per_1D_array ()
 Returns the number of arguments enqueued per 1D array object. More...
 
static constexpr unsigned int num_arguments_per_1D_tensor ()
 Returns the number of arguments enqueued per 1D tensor object. More...
 
static constexpr unsigned int num_arguments_per_2D_tensor ()
 Returns the number of arguments enqueued per 2D tensor object. More...
 
static constexpr unsigned int num_arguments_per_3D_tensor ()
 Returns the number of arguments enqueued per 3D tensor object. More...
 
static constexpr unsigned int num_arguments_per_4D_tensor ()
 Returns the number of arguments enqueued per 4D tensor object. More...
 
static cl::NDRange gws_from_window (const Window &window)
 Get the global work size given an execution window. More...
 

Detailed Description

OpenCL kernel to multiply matrices with QASYMM8 data type when only the input matrix RHS (input1) has been reshaped.

Note
The input matrix input1 must be reshaped through CLGEMMReshapeRHSMatrixKernel.
For the fused output stage, only the GEMMLowpOutputStageType::QUANTIZE_DOWN_FIXEDPOINT type is supported.

Definition at line 39 of file CLGEMMLowpMatrixMultiplyReshapedOnlyRHSKernel.h.
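A minimal, non-runnable usage sketch (tensor names `lhs`, `rhs_reshaped`, `dst` and the populated `gemm_info` are placeholders; a CL context/scheduler is assumed to be initialised, and in practice this kernel is normally driven through CLGEMMLowpMatrixMultiplyCore rather than used directly):

```cpp
// Sketch only: tensor allocation, GEMMKernelInfo population, and the prior
// CLGEMMReshapeRHSMatrixKernel pass on the RHS are assumed to have happened.
CLGEMMLowpMatrixMultiplyReshapedOnlyRHSKernel mm_kernel;

// Mandatory tensors only; vector_sum_col / vector_sum_row / bias /
// output_multipliers / output_shifts default to nullptr when no output
// stage is fused.
mm_kernel.configure(&lhs, &rhs_reshaped, &dst, gemm_info);

// Enqueue over the kernel's maximum execution window; the queue is not
// flushed, so the kernel may not have executed when run() returns.
mm_kernel.run(mm_kernel.window(), CLScheduler::get().queue());
```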

Constructor & Destructor Documentation

◆ CLGEMMLowpMatrixMultiplyReshapedOnlyRHSKernel() [1/3]

Default Constructor.

Definition at line 278 of file CLGEMMLowpMatrixMultiplyReshapedOnlyRHSKernel.cpp.

279  : _input0(nullptr),
280  _input1(nullptr),
281  _output(nullptr),
282  _vector_sum_col(nullptr),
283  _vector_sum_row(nullptr),
284  _bias(nullptr),
285  _output_multipliers(nullptr),
286  _output_shifts(nullptr),
287  _slide_matrix_b(true),
288  _reinterpret_input_as_3d(false),
289  _reinterpret_output_as_3d(false),
290  _use_dummy_work_items(false),
291  _is_quantized_per_channel(false),
292  _fuse_output_stage(false)
293 {
294 }

◆ CLGEMMLowpMatrixMultiplyReshapedOnlyRHSKernel() [2/3]

Prevent instances of this class from being copied (As this class contains pointers)

◆ CLGEMMLowpMatrixMultiplyReshapedOnlyRHSKernel() [3/3]

Allow instances of this class to be moved.

Member Function Documentation

◆ configure() [1/2]

void configure ( const ICLTensor * input0,
const ICLTensor * input1,
ICLTensor * output,
const GEMMKernelInfo & gemm_info,
const ICLTensor * vector_sum_col = nullptr,
const ICLTensor * vector_sum_row = nullptr,
const ICLTensor * bias = nullptr,
const ICLTensor * output_multipliers = nullptr,
const ICLTensor * output_shifts = nullptr 
)

Initialise the kernel's input and output.

Parameters
	[in]	input0	Input tensor containing the LHS matrix. Data type supported: QASYMM8/QASYMM8_SIGNED
	[in]	input1	Input tensor containing the RHS reshaped matrix. Data type supported: QASYMM8/QASYMM8_SIGNED/QSYMM8/QSYMM8_PER_CHANNEL
	[out]	output	Output tensor. Data type supported: QASYMM8/QASYMM8_SIGNED/S32.
	[in]	gemm_info	GEMM information used to retrieve the original dimensions of the input matrices, output stage information and RHS/LHS info. Only the following values are supported for LHS info: lhs_info.m0: 2,3,4,5,6,7,8; lhs_info.k0: 2,3,4,8,16. Only the following values are supported for RHS info: rhs_info.n0: 2,3,4,8,16; rhs_info.k0: same as lhs_info.k0; rhs_info.transpose: true.
	[in]	vector_sum_col	(Optional) Input row-vector of sums of all the entries in each column of matrix B. Note: vector_sum_col can be a nullptr in case a_offset = 0. Data type supported: S32
	[in]	vector_sum_row	(Optional) Input row-vector of sums of all the entries in each row of matrix A. Note: vector_sum_row can be a nullptr in case b_offset = 0. Data type supported: S32
	[in]	bias	(Optional) Biases tensor. Only shared biases are supported and the tensor can be a nullptr if the addition of biases is not required. Biases are a 1D tensor with dimensions [OFM]. Data type supported: S32.
	[in]	output_multipliers	(Optional) Output multipliers tensor. In case of per-channel quantization, the number of multipliers must be equal to the number of filters (OFM). Supported data types: S32.
	[in]	output_shifts	(Optional) Output shifts tensor. In case of per-channel quantization, the number of shifts must be equal to the number of filters (OFM). Supported data types: S32.

Definition at line 296 of file CLGEMMLowpMatrixMultiplyReshapedOnlyRHSKernel.cpp.

References CLKernelLibrary::get().

299 {
300  configure(CLKernelLibrary::get().get_compile_context(), input0, input1, output, gemm_info, vector_sum_col, vector_sum_row, bias, output_multipliers, output_shifts);
301 }

◆ configure() [2/2]

void configure ( const CLCompileContext & compile_context,
const ICLTensor * input0,
const ICLTensor * input1,
ICLTensor * output,
const GEMMKernelInfo & gemm_info,
const ICLTensor * vector_sum_col = nullptr,
const ICLTensor * vector_sum_row = nullptr,
const ICLTensor * bias = nullptr,
const ICLTensor * output_multipliers = nullptr,
const ICLTensor * output_shifts = nullptr 
)

Initialise the kernel's input and output.

Parameters
	[in]	compile_context	The compile context to be used.
	[in]	input0	Input tensor containing the LHS matrix. Data type supported: QASYMM8/QASYMM8_SIGNED
	[in]	input1	Input tensor containing the RHS reshaped matrix. Data type supported: same as input0
	[out]	output	Output tensor. Data type supported: QASYMM8/QASYMM8_SIGNED/S32.
	[in]	gemm_info	GEMM information used to retrieve the original dimensions of the input matrices, output stage information and RHS/LHS info. Only the following values are supported for LHS info: lhs_info.m0: 2,3,4,5,6,7,8; lhs_info.k0: 2,3,4,8,16. Only the following values are supported for RHS info: rhs_info.n0: 2,3,4,8,16; rhs_info.k0: same as lhs_info.k0; rhs_info.transpose: true.
	[in]	vector_sum_col	(Optional) Input row-vector of sums of all the entries in each column of matrix B. Note: vector_sum_col can be a nullptr in case a_offset = 0. Data type supported: S32
	[in]	vector_sum_row	(Optional) Input row-vector of sums of all the entries in each row of matrix A. Note: vector_sum_row can be a nullptr in case b_offset = 0. Data type supported: S32
	[in]	bias	(Optional) Biases tensor. Only shared biases are supported and the tensor can be a nullptr if the addition of biases is not required. Biases are a 1D tensor with dimensions [OFM]. Data type supported: S32.
	[in]	output_multipliers	(Optional) Output multipliers tensor. In case of per-channel quantization, the number of multipliers must be equal to the number of filters (OFM). Supported data types: S32.
	[in]	output_shifts	(Optional) Output shifts tensor. In case of per-channel quantization, the number of shifts must be equal to the number of filters (OFM). Supported data types: S32.

Definition at line 303 of file CLGEMMLowpMatrixMultiplyReshapedOnlyRHSKernel.cpp.

References GEMMKernelInfo::a_offset, CLBuildOptions::add_option(), CLBuildOptions::add_option_if(), ARM_COMPUTE_ERROR_ON, ARM_COMPUTE_ERROR_ON_NULLPTR, ARM_COMPUTE_ERROR_THROW_ON, GEMMKernelInfo::b_offset, arm_compute::create_kernel(), ITensorInfo::data_type(), GEMMKernelInfo::depth_output_gemm3d, ITensorInfo::dimension(), arm_compute::dot8_supported(), GEMMLowpOutputStageInfo::gemmlowp_max_bound, GEMMLowpOutputStageInfo::gemmlowp_min_bound, GEMMLowpOutputStageInfo::gemmlowp_multipliers, GEMMLowpOutputStageInfo::gemmlowp_offset, GEMMLowpOutputStageInfo::gemmlowp_shifts, CLKernelLibrary::get(), arm_compute::get_cl_dot8_acc_type_from_data_type(), arm_compute::get_cl_type_from_data_type(), arm_compute::get_min_max(), arm_compute::get_padding_info(), arm_compute::has_padding_changed(), ITensor::info(), GEMMLowpOutputStageInfo::is_quantized_per_channel, GEMMKernelInfo::k, kernel_name, GEMMKernelInfo::lhs_info, GEMMKernelInfo::m, GEMMLHSMatrixInfo::m0, GEMMKernelInfo::n, Dimensions< T >::num_dimensions(), ITensorInfo::num_dimensions(), CLBuildOptions::options(), GEMMKernelInfo::output_stage, arm_compute::preferred_dummy_work_items_support(), arm_compute::QUANTIZE_DOWN_FIXEDPOINT, GEMMKernelInfo::reinterpret_input_as_3d, GEMMKernelInfo::rhs_info, ITensorInfo::tensor_shape(), arm_compute::support::cpp11::to_string(), GEMMLowpOutputStageInfo::type, and arm_compute::validate_arguments().

307 {
308  ARM_COMPUTE_ERROR_ON_NULLPTR(input0, input1, output);
309  ARM_COMPUTE_ERROR_THROW_ON(validate_arguments(input0->info(),
310  input1->info(),
311  output->info(),
312  gemm_info,
313  vector_sum_col != nullptr ? vector_sum_col->info() : nullptr,
314  vector_sum_row != nullptr ? vector_sum_row->info() : nullptr,
315  bias != nullptr ? bias->info() : nullptr,
316  output_multipliers != nullptr ? output_multipliers->info() : nullptr,
317  output_shifts != nullptr ? output_shifts->info() : nullptr));
318 
319  auto padding_info = get_padding_info({ input0, input1, output, vector_sum_row });
320  const GEMMRHSMatrixInfo rhs_info = gemm_info.rhs_info;
321  const GEMMLHSMatrixInfo lhs_info = gemm_info.lhs_info;
322  const GEMMLowpOutputStageInfo output_stage = gemm_info.output_stage;
323  const int32_t a_offset = gemm_info.a_offset;
324  const int32_t b_offset = gemm_info.b_offset;
325 
326  _input0 = input0;
327  _input1 = input1;
328  _output = output;
329  _vector_sum_col = vector_sum_col;
330  _vector_sum_row = vector_sum_row;
331  _bias = bias;
332  _output_multipliers = output_multipliers;
333  _output_shifts = output_shifts;
334  _reinterpret_input_as_3d = gemm_info.reinterpret_input_as_3d;
335  _reinterpret_output_as_3d = (gemm_info.depth_output_gemm3d != 0);
336  _use_dummy_work_items = preferred_dummy_work_items_support(CLKernelLibrary::get().get_device());
337  _is_quantized_per_channel = output_stage.is_quantized_per_channel;
338 
339  // In case both input and output have to be reinterpreted as 3D tensors,
340  // force reinterpret_input_as_3d and reinterpret_output_as_3d to be false.
341  if(_reinterpret_input_as_3d == _reinterpret_output_as_3d)
342  {
343  _reinterpret_input_as_3d = false;
344  _reinterpret_output_as_3d = false;
345  }
346 
347  // Check if we need to slide the matrix B
348  const unsigned int num_dimensions_input0 = _input0->info()->num_dimensions();
349  _slide_matrix_b = (_input1->info()->num_dimensions() >= num_dimensions_input0);
350 
351  ElementsProcessed num_elements_processed{};
352 
353  // Configure kernel window
354  auto win_config = validate_and_configure_window(input0->info(),
355  input1->info(),
356  output->info(),
357  gemm_info,
358  vector_sum_col != nullptr ? vector_sum_col->info() : nullptr,
359  vector_sum_row != nullptr ? vector_sum_row->info() : nullptr,
360  bias != nullptr ? bias->info() : nullptr,
361  output_multipliers != nullptr ? output_multipliers->info() : nullptr,
362  output_shifts != nullptr ? output_shifts->info() : nullptr,
363  num_elements_processed);
364  ARM_COMPUTE_ERROR_THROW_ON(win_config.first);
365  ICLKernel::configure_internal(win_config.second);
366 
367  // If _reinterpret_input_as_3d = _reinterpret_output_as_3d = true,
368  // we will dispatch a batched-GEMM to reduce the complexity of the address calculation within the OpenCL kernel.
369  // This means that the actual m used by the kernel is given by output->info()->dimension(1) and not by gemm_info.m
370  const unsigned int internal_m = _reinterpret_output_as_3d ? gemm_info.m : output->info()->dimension(1);
371 
372  // Shrink M0 to be always <= M (internal_m) to prevent out-of-bounds reads.
373  // NOTE: This might have implications on heuristics and performance
374  const unsigned int internal_m0 = std::min(internal_m, lhs_info.m0);
375 
376  // Calculate partial (store instead of load) M0 and partial N0 for the partial blocks at the end of a row/column if any. This is to avoid padding.
377  const unsigned int partial_store_m0 = internal_m % internal_m0;
378  const unsigned int partial_store_n0 = gemm_info.n % rhs_info.n0;
379 
380  // Create build options
381  CLBuildOptions build_opts;
382  build_opts.add_option_if(_reinterpret_input_as_3d, "-DREINTERPRET_INPUT_AS_3D");
383  build_opts.add_option_if(_reinterpret_output_as_3d, "-DREINTERPRET_OUTPUT_AS_3D");
384  build_opts.add_option_if(_reinterpret_input_as_3d || _reinterpret_output_as_3d, "-DHEIGHT_GEMM3D=" + support::cpp11::to_string(output->info()->dimension(1)));
385  build_opts.add_option_if(_reinterpret_input_as_3d || _reinterpret_output_as_3d, "-DDEPTH_GEMM3D=" + support::cpp11::to_string(output->info()->dimension(2)));
386  build_opts.add_option_if(!_slide_matrix_b, "-DMATRIX_B_DEPTH=" + support::cpp11::to_string(input1->info()->dimension(2)));
387  build_opts.add_option_if(rhs_info.interleave, "-DRHS_INTERLEAVE");
388  build_opts.add_option_if(_use_dummy_work_items, "-DDUMMY_WORK_ITEMS");
389  build_opts.add_option("-DM=" + support::cpp11::to_string(internal_m));
390  build_opts.add_option("-DN=" + support::cpp11::to_string(gemm_info.n));
391  build_opts.add_option("-DK=" + support::cpp11::to_string(gemm_info.k));
392  build_opts.add_option("-DM0=" + support::cpp11::to_string(internal_m0));
393  build_opts.add_option("-DN0=" + support::cpp11::to_string(rhs_info.n0));
394  build_opts.add_option("-DK0=" + support::cpp11::to_string(rhs_info.k0));
395  build_opts.add_option("-DH0=" + support::cpp11::to_string(rhs_info.h0));
396  build_opts.add_option("-DPARTIAL_STORE_M0=" + support::cpp11::to_string(partial_store_m0));
397  build_opts.add_option("-DPARTIAL_STORE_N0=" + support::cpp11::to_string(partial_store_n0));
398  build_opts.add_option("-DDATA_TYPE=" + get_cl_type_from_data_type(input0->info()->data_type()));
399  build_opts.add_option("-DACC_DATA_TYPE=" + get_cl_dot8_acc_type_from_data_type(input0->info()->data_type()));
400 
401  std::string kernel_name("gemmlowp_mm_reshaped_only_rhs_");
402  kernel_name += rhs_info.transpose ? "t" : "nt";
403 
404  if(output_stage.type == GEMMLowpOutputStageType::QUANTIZE_DOWN_FIXEDPOINT)
405  {
406  kernel_name += "_fused_output_stage_fixedpoint";
407  _fuse_output_stage = true;
408  // If a_offset == 0, vector_sum_col can be a nullptr
409  if(a_offset != 0)
410  {
411  build_opts.add_option("-DA_OFFSET=" + support::cpp11::to_string(a_offset));
412  build_opts.add_option_if(vector_sum_col->info()->tensor_shape().num_dimensions() > 1, "-DSUM_COL_HAS_BATCHES");
413  }
414  // If b_offset == 0, vector_sum_row can be a nullptr
415  build_opts.add_option_if(b_offset != 0, "-DB_OFFSET=" + support::cpp11::to_string(b_offset));
416  build_opts.add_option("-DK_OFFSET=" + support::cpp11::to_string(a_offset * b_offset * input0->info()->dimension(0)));
417  build_opts.add_option_if(bias != nullptr, "-DADD_BIAS");
418  build_opts.add_option("-DRESULT_OFFSET=" + support::cpp11::to_string(output_stage.gemmlowp_offset));
419  build_opts.add_option("-DRESULT_MULTIPLIER=" + support::cpp11::to_string(output_stage.gemmlowp_multipliers[0]));
420  build_opts.add_option("-DRESULT_SHIFT=" + support::cpp11::to_string(output_stage.gemmlowp_shifts[0]));
421  build_opts.add_option_if(_is_quantized_per_channel, "-DPER_CHANNEL_QUANTIZATION");
422 
423  const int min = output_stage.gemmlowp_min_bound;
424  const int max = output_stage.gemmlowp_max_bound;
425 
426  PixelValue min_val{};
427  PixelValue max_val{};
428  std::tie(min_val, max_val) = get_min_max(output->info()->data_type());
429  build_opts.add_option_if(min != min_val.get<int32_t>(), "-DMIN_BOUND=" + support::cpp11::to_string(min));
430  build_opts.add_option_if(max != max_val.get<int32_t>(), "-DMAX_BOUND=" + support::cpp11::to_string(max));
431  }
432 
433  // Create kernel
434  _kernel = create_kernel(compile_context, kernel_name, build_opts.options());
435 
436  // Set config_id for enabling LWS tuning
437  _config_id = kernel_name;
438  _config_id += "_";
439  _config_id += dot8_supported(CLKernelLibrary::get().get_device()) ? "_dot8" : "";
440  _config_id += "_";
441  _config_id += (_reinterpret_input_as_3d ? "3di_" : "");
442  _config_id += (_reinterpret_output_as_3d ? "3do_" : "");
443  _config_id += support::cpp11::to_string(output->info()->dimension(1));
444  _config_id += "_";
445  _config_id += support::cpp11::to_string(output->info()->dimension(0));
446  _config_id += "_";
447  _config_id += support::cpp11::to_string(gemm_info.k);
448  _config_id += "_";
449  _config_id += support::cpp11::to_string(output->info()->dimension(2));
450  _config_id += "_";
451  _config_id += support::cpp11::to_string(lhs_info.m0);
452  _config_id += "_";
453  _config_id += support::cpp11::to_string(rhs_info.n0);
454  _config_id += "_";
455  _config_id += support::cpp11::to_string(rhs_info.k0);
456  _config_id += "_";
457  _config_id += support::cpp11::to_string(rhs_info.h0);
458  _config_id += "_";
459  _config_id += support::cpp11::to_string(rhs_info.interleave);
460  ARM_COMPUTE_ERROR_ON(has_padding_changed(padding_info));
461 }
virtual size_t num_dimensions() const =0
The number of dimensions of the tensor (rank)
Quantize using a fixed point multiplication.
bool dot8_supported(const cl::Device &device)
Helper function to check whether the cl_arm_integer_dot_product_int8 extension is supported...
Definition: CLHelpers.cpp:239
bool preferred_dummy_work_items_support(const cl::Device &device)
Helper function to check if "dummy work-items" are preferred to have a power of two NDRange In case d...
Definition: CLHelpers.cpp:361
std::string get_cl_dot8_acc_type_from_data_type(const DataType &dt)
Translates a tensor data type to the appropriate OpenCL dot8 accumulator type.
Definition: CLHelpers.cpp:173
std::string to_string(T &&value)
Convert integer and float values to string.
#define ARM_COMPUTE_ERROR_ON(cond)
If the condition is true then an error message is printed and an exception thrown.
Definition: Error.h:466
static CLKernelLibrary & get()
Access the KernelLibrary singleton.
#define ARM_COMPUTE_ERROR_THROW_ON(status)
Definition: Error.h:455
cl::Kernel create_kernel(const CLCompileContext &ctx, const std::string &kernel_name, const std::set< std::string > &build_opts=std::set< std::string >())
Creates an opencl kernel using a compile context.
Definition: CLHelpers.cpp:403
std::string kernel_name
std::string get_cl_type_from_data_type(const DataType &dt)
Translates a tensor data type to the appropriate OpenCL type.
Definition: CLHelpers.cpp:37
virtual ITensorInfo * info() const =0
Interface to be implemented by the child class to return the tensor&#39;s metadata.
bool has_padding_changed(const std::unordered_map< const ITensorInfo *, PaddingSize > &padding_map)
Check if the previously stored padding info has changed after configuring a kernel.
Definition: Utils.cpp:528
std::unordered_map< const ITensorInfo *, PaddingSize > get_padding_info(std::initializer_list< const ITensorInfo *> infos)
Stores padding information before configuring a kernel.
Definition: Utils.cpp:513
Status validate_arguments(const ITensorInfo *input, const ITensorInfo *bias, const ITensorInfo *output, const GEMMLowpOutputStageInfo *output_stage)
#define ARM_COMPUTE_ERROR_ON_NULLPTR(...)
Definition: Validate.h:161
std::tuple< PixelValue, PixelValue > get_min_max(DataType dt)
Compute the mininum and maximum values a data type can take.
Definition: Utils.h:564

◆ operator=() [1/2]

Prevent instances of this class from being copied (As this class contains pointers)

◆ operator=() [2/2]

Allow instances of this class to be moved.

◆ run()

void run ( const Window & window,
cl::CommandQueue &  queue 
)
override virtual

Enqueue the OpenCL kernel to process the given window on the passed OpenCL command queue.

Note
The queue is not flushed by this method, and therefore the kernel will not have been executed by the time this method returns.
Parameters
[in]windowRegion on which to execute the kernel. (Must be a valid region of the window returned by window()).
[in,out]queueCommand queue on which to enqueue the kernel.

Reimplemented from ICLKernel.

Definition at line 484 of file CLGEMMLowpMatrixMultiplyReshapedOnlyRHSKernel.cpp.

References ICLKernel::add_1D_tensor_argument_if(), ICLKernel::add_2D_tensor_argument(), ICLKernel::add_2D_tensor_argument_if(), ARM_COMPUTE_ERROR_ON, ARM_COMPUTE_ERROR_ON_INVALID_SUBWINDOW, ARM_COMPUTE_ERROR_ON_UNCONFIGURED_KERNEL, BorderSize::bottom, Window::DimX, Window::DimY, Window::DimZ, arm_compute::enqueue(), Window::first_slice_window_3D(), ITensor::info(), ICLKernel::lws_hint(), ICLKernel::num_arguments_per_2D_tensor(), ITensorInfo::num_dimensions(), ITensorInfo::padding(), Window::set(), arm_compute::test::validation::reference::slice(), Window::slide_window_slice_3D(), ITensorInfo::strides_in_bytes(), BorderSize::top, and IKernel::window().

485 {
486  ARM_COMPUTE_ERROR_ON_UNCONFIGURED_KERNEL(this);
487  ARM_COMPUTE_ERROR_ON_INVALID_SUBWINDOW(IKernel::window(), window);
488 
489  if(_input1->info()->num_dimensions() < 3)
490  {
491  // The stride_z for matrix B must be zero if we do not slice
492  ARM_COMPUTE_ERROR_ON(_input1->info()->strides_in_bytes()[3] != 0);
493  }
494 
495  Window slice = window.first_slice_window_3D();
496  Window slice_matrix_b = slice;
497 
498  slice_matrix_b.set(Window::DimX, Window::Dimension(0, 1, 1));
499  slice_matrix_b.set(Window::DimY, Window::Dimension(0, 1, 1));
500 
501  if(_reinterpret_input_as_3d)
502  {
503  // Pass bottom paddings to the kernel if the input has to be reinterpreted as 3D tensor
504  const unsigned int idx0 = 3 * num_arguments_per_2D_tensor() + 3;
505  const unsigned int total_cross_plane_pad = _input0->info()->padding().top + _input0->info()->padding().bottom;
506  _kernel.setArg<cl_uint>(idx0, static_cast<unsigned int>(total_cross_plane_pad));
507  }
508 
509  if(_reinterpret_output_as_3d)
510  {
511  // Pass bottom paddings to the kernel if the output has to be reinterpreted as 3D tensor
512  const unsigned int idx0 = 3 * num_arguments_per_2D_tensor() + 3 + (_reinterpret_input_as_3d ? 1 : 0);
513  const unsigned int total_cross_plane_pad = _output->info()->padding().top + _output->info()->padding().bottom;
514  _kernel.setArg<cl_uint>(idx0, static_cast<unsigned int>(total_cross_plane_pad));
515  }
516 
517  // Set window for vector_sum_col
518  Window win_vector_sum_col = slice;
519  win_vector_sum_col.set(Window::DimY, Window::Dimension(0, 0, 0));
520  win_vector_sum_col.set(Window::DimZ, Window::Dimension(0, 0, 0));
521 
522  // Set window for vector_sum_row
523  Window win_vector_sum_row = slice;
524  win_vector_sum_row.set(Window::DimX, Window::Dimension(0, 0, 0));
525  win_vector_sum_row.set(Window::DimY, Window::Dimension(0, 0, 0));
526  win_vector_sum_row.set(Window::DimZ, Window::Dimension(0, 0, 0));
527 
528  Window biases_slice = slice;
529  biases_slice.set(Window::DimY, Window::Dimension(0, 1, 1));
530  biases_slice.set(Window::DimZ, Window::Dimension(0, 1, 1));
531 
532  do
533  {
534  Window slice_b = slice;
535  // Don't slice matrix B along the z dimension if matrix B has just 2 dimensions and matrix A more than 2
536  // This scenario can happen when the matrix multiplication is used to perform a convolution operation
537  if(!_slide_matrix_b)
538  {
539  slice_b = slice_matrix_b;
540  }
541 
542  unsigned int idx = 0;
543  add_2D_tensor_argument(idx, _input0, slice);
544  add_2D_tensor_argument(idx, _input1, slice_b);
545  add_2D_tensor_argument(idx, _output, slice);
546  _kernel.setArg<cl_uint>(idx++, static_cast<unsigned int>(_input0->info()->strides_in_bytes()[2]));
547  _kernel.setArg<cl_uint>(idx++, static_cast<unsigned int>(_input1->info()->strides_in_bytes()[2]));
548  _kernel.setArg<cl_uint>(idx++, static_cast<unsigned int>(_output->info()->strides_in_bytes()[2]));
549  if(_reinterpret_input_as_3d)
550  {
551  // Pass bottom paddings to the kernel if the input has to be reinterpreted as 3D tensor
552  idx++;
553  }
554 
555  if(_reinterpret_output_as_3d)
556  {
557  // Pass bottom paddings to the kernel if the output has to be reinterpreted as 3D tensor
558  idx++;
559  }
560 
561  if(_fuse_output_stage)
562  {
563  add_2D_tensor_argument_if((_vector_sum_col != nullptr), idx, _vector_sum_col, win_vector_sum_col);
564  add_2D_tensor_argument_if((_vector_sum_row != nullptr), idx, _vector_sum_row, win_vector_sum_row);
565  add_1D_tensor_argument_if((_bias != nullptr), idx, _bias, biases_slice);
566  add_1D_tensor_argument_if(_is_quantized_per_channel, idx, _output_multipliers, biases_slice);
567  add_1D_tensor_argument_if(_is_quantized_per_channel, idx, _output_shifts, biases_slice);
568  }
569  enqueue(queue, *this, slice, lws_hint(), _use_dummy_work_items);
570  }
571  while(window.slide_window_slice_3D(slice));
572 }

◆ validate()

static Status validate(const ITensorInfo    *input0,
                       const ITensorInfo    *input1,
                       const ITensorInfo    *output,
                       const GEMMKernelInfo &gemm_info,
                       const ITensorInfo    *vector_sum_col     = nullptr,
                       const ITensorInfo    *vector_sum_row     = nullptr,
                       const ITensorInfo    *bias               = nullptr,
                       const ITensorInfo    *output_multipliers = nullptr,
                       const ITensorInfo    *output_shifts      = nullptr)

Static function to check if given info will lead to a valid configuration of CLGEMMLowpMatrixMultiplyReshapedOnlyRHSKernel.

Parameters
  [in] input0             Input tensor info for the LHS matrix. Data type supported: QASYMM8/QASYMM8_SIGNED
  [in] input1             Input tensor info for the RHS reshaped matrix. Data type supported: QASYMM8/QASYMM8_SIGNED/QSYMM8/QSYMM8_PER_CHANNEL
  [in] output             Output tensor info. Data type supported: QASYMM8/QASYMM8_SIGNED/S32
  [in] gemm_info          GEMM information used to retrieve the original dimensions of the input matrices, output stage information and RHS/LHS info.
                          Only the following values are supported for LHS info:
                            lhs_info.m0: 2,3,4,5,6,7,8
                            lhs_info.k0: 2,3,4,8,16
                          Only the following values are supported for RHS info:
                            rhs_info.n0: 2,3,4,8,16
                            rhs_info.k0: same as lhs_info.k0
                            rhs_info.transpose: true
  [in] vector_sum_col     (Optional) Input row-vector info of sums of all the entries in each column of matrix B. Can be a nullptr if a_offset = 0. Data type supported: S32
  [in] vector_sum_row     (Optional) Input row-vector info of sums of all the entries in each row of matrix A. Can be a nullptr if b_offset = 0. Data type supported: S32
  [in] bias               (Optional) Biases tensor info. Only shared biases are supported; can be a nullptr if the addition of biases is not required. Biases are a 1D tensor with dimensions [OFM]. Data type supported: S32
  [in] output_multipliers (Optional) Output multipliers tensor info. In case of per-channel quantization, the number of multipliers must be equal to the number of filters (OFM). Data type supported: S32
  [in] output_shifts      (Optional) Output shifts tensor info. In case of per-channel quantization, the number of shifts must be equal to the number of filters (OFM). Data type supported: S32
Returns
a status
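For per-channel quantization, output_multipliers and output_shifts together encode, per output channel, the real requantization scale (input_scale * weights_scale / output_scale) as a fixed-point multiplier plus a right shift. A hedged sketch of how such a pair can be derived from a float scale, in the gemmlowp style of normalization (illustrative only, not the library's exact routine):

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <utility>

// Decompose a real scale into a Q0.31 fixed-point multiplier and a right
// shift, so that scale ~= multiplier * 2^-31 * 2^-shift. The significand is
// normalised into [0.5, 1.0), i.e. the multiplier lands in [2^30, 2^31).
std::pair<int32_t, int> quantize_scale(double scale)
{
    int exponent = 0;
    const double significand = std::frexp(scale, &exponent); // scale = significand * 2^exponent
    auto multiplier = static_cast<int64_t>(std::llround(significand * (1ll << 31)));
    if(multiplier == (1ll << 31)) // rounding can push the significand up to 1.0
    {
        multiplier /= 2;
        ++exponent;
    }
    // A positive return value is the number of right shifts to apply.
    return { static_cast<int32_t>(multiplier), -exponent };
}
```

For example, a scale of 0.25 becomes multiplier 2^30 with a right shift of 1, since 2^30 * 2^-31 * 2^-1 = 0.25.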

Definition at line 463 of file CLGEMMLowpMatrixMultiplyReshapedOnlyRHSKernel.cpp.

References ARM_COMPUTE_RETURN_ON_ERROR, ICloneable< T >::clone(), and arm_compute::validate_arguments().

Referenced by CLGEMMLowpMatrixMultiplyCore::validate().

{
    ElementsProcessed num_elements_processed{};
    ARM_COMPUTE_RETURN_ON_ERROR(validate_arguments(input0, input1, output, gemm_info, vector_sum_col, vector_sum_row, bias, output_multipliers, output_shifts));
    ARM_COMPUTE_RETURN_ON_ERROR(validate_and_configure_window(input0->clone().get(),
                                                              input1->clone().get(),
                                                              output->clone().get(),
                                                              gemm_info,
                                                              vector_sum_col != nullptr ? vector_sum_col->clone().get() : nullptr,
                                                              vector_sum_row != nullptr ? vector_sum_row->clone().get() : nullptr,
                                                              bias != nullptr ? bias->clone().get() : nullptr,
                                                              output_multipliers != nullptr ? output_multipliers->clone().get() : nullptr,
                                                              output_shifts != nullptr ? output_shifts->clone().get() : nullptr,
                                                              num_elements_processed)
                                .first);

    return Status{};
}
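Because validate() mirrors configure()'s parameters and has no side effects, callers can reject an unsupported configuration up front, before any kernel or tensor resources are touched. A minimal, self-contained illustration of that validate-first idiom using a stand-in Status type and the block sizes documented above (hypothetical names, not the library's code):

```cpp
#include <cassert>
#include <string>

// Stand-in for arm_compute::Status: an empty error string means success.
struct Status
{
    std::string error;
    bool ok() const { return error.empty(); }
};

// Mirrors the spirit of the kernel's static validate(): check the documented
// constraints (rhs_info.n0 and k0 in {2,3,4,8,16}) without configuring anything.
Status validate_block_sizes(int n0, int k0)
{
    auto supported = [](int v) { return v == 2 || v == 3 || v == 4 || v == 8 || v == 16; };
    if(!supported(n0))
    {
        return { "unsupported n0" };
    }
    if(!supported(k0))
    {
        return { "unsupported k0" };
    }
    return {};
}
```

In the library itself the same check runs inside configure() as well; calling the static validate() first simply surfaces the failure without constructing the kernel.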
The documentation for this class was generated from the following files: