Compute Library
 23.11
CpuGemm Class Reference

Basic function to execute GEMM. More...

#include <CpuGemm.h>

Collaboration diagram for CpuGemm

Public Member Functions

 CpuGemm ()=default
 Default constructor. More...
 
 ~CpuGemm ()=default
 Default destructor. More...
 
void configure (const ITensorInfo *a, const ITensorInfo *b, const ITensorInfo *c, ITensorInfo *d, float alpha, float beta, const GEMMInfo &gemm_info=GEMMInfo())
 Configure operator for a given list of arguments. More...
 
void run (ITensorPack &tensors) override
 Run the kernels contained in the function. More...
 
void prepare (ITensorPack &constants) override
 Prepare the function for executing. More...
 
experimental::MemoryRequirements workspace () const override
 Return the memory requirements required by the workspace. More...
 
bool isVarWeightsKernel () const
 Indicates if the convolution executes in variable weights mode. More...
 
- Public Member Functions inherited from INEOperator
 INEOperator (IRuntimeContext *ctx=nullptr)
 Constructor. More...
 
 INEOperator (const INEOperator &)=delete
 Prevent instances of this class from being copied (As this class contains pointers) More...
 
 INEOperator (INEOperator &&)=default
 Default move constructor. More...
 
INEOperator & operator= (const INEOperator &)=delete
 Prevent instances of this class from being copied (As this class contains pointers) More...
 
INEOperator & operator= (INEOperator &&)=default
 Default move assignment operator. More...
 
 ~INEOperator ()
 Default destructor. More...
 
- Public Member Functions inherited from IOperator
virtual ~IOperator ()=default
 Destructor. More...
 

Static Public Member Functions

static Status validate (const ITensorInfo *a, const ITensorInfo *b, const ITensorInfo *c, const ITensorInfo *d, float alpha, float beta, const GEMMInfo &gemm_info=GEMMInfo())
 Static function to check if given info will lead to a valid configuration of CpuGemm. More...
 
static Status has_opt_impl (arm_compute::WeightFormat &weight_format, const ITensorInfo *a, const ITensorInfo *b, const ITensorInfo *c, const ITensorInfo *d, const GEMMInfo &gemm_info=GEMMInfo())
 Indicates whether or not there is an optimal assembly implementation that can be used to process the given parameters. More...
 

Detailed Description

Basic function to execute GEMM.

This function calls the following kernels:

If optimized assembly is available:

  1. cpu::CpuGemmAssemblyDispatch
  2. cpu::CpuActivation (if alpha != 1.0)

Else:

  1. cpu::kernels::CpuGemmInterleave4x4Kernel (if the output tensor is a matrix)
  2. cpu::kernels::CpuGemmTranspose1xWKernel (if the output tensor is a matrix)
  3. cpu::kernels::CpuGemmMatrixMultiplyKernel

In both cases:

  1. cpu::kernels::CpuGemmMatrixAdditionKernel (if c != nullptr and beta != 0.0 and is not reshaped once)

Else:

  1. cpu::CpuAdd (if c != nullptr and is reshaped once and not optimized assembly in place)

Followed by:

  1. cpu::CpuActivation (if an activation is specified in GEMMInfo)

Definition at line 64 of file CpuGemm.h.
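For reference, every path listed above computes the same GEMM update, d = alpha * a * b + beta * c. A plain, standalone C++ sketch of that arithmetic for row-major matrices (an illustration of the maths only, not of how the kernels above are implemented):

#include <cstddef>
#include <vector>

// Reference computation of d = alpha * a * b + beta * c.
// a is M x K, b is K x N, c and d are M x N, all row-major.
void reference_gemm(const std::vector<float> &a,
                    const std::vector<float> &b,
                    const std::vector<float> &c, // pass an empty vector when there is no C term
                    std::vector<float>       &d,
                    std::size_t m, std::size_t n, std::size_t k,
                    float alpha, float beta)
{
    for (std::size_t i = 0; i < m; ++i)
    {
        for (std::size_t j = 0; j < n; ++j)
        {
            float acc = 0.f;
            for (std::size_t p = 0; p < k; ++p)
            {
                acc += a[i * k + p] * b[p * n + j];
            }
            d[i * n + j] = alpha * acc + (c.empty() ? 0.f : beta * c[i * n + j]);
        }
    }
}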

Constructor & Destructor Documentation

◆ CpuGemm()

CpuGemm ( )
default

Default constructor.

◆ ~CpuGemm()

~CpuGemm ( )
default

Default destructor.

Member Function Documentation

◆ configure()

void configure ( const ITensorInfo * a,
const ITensorInfo * b,
const ITensorInfo * c,
ITensorInfo * d,
float  alpha,
float  beta,
const GEMMInfo & gemm_info = GEMMInfo()
)

Configure operator for a given list of arguments.

Valid data layouts:

  • All

Valid data type configurations:

src0 src1 src2 dst
F32 F32 F32 F32
F16 F16 F16 F16
BFLOAT16 BFLOAT16 BFLOAT16 FP32
Note
GEMM: General Matrix Multiply - [alpha * A * B + beta * C].
GEMM: The tensors a, b, c, d must have the same data type. You should not mix data types when calling this function.
Batched GEMM only supports broadcasting cases where the RHS rank is lower than the LHS rank, not the other way around.
Parameters
[in] a First input tensor info (Matrix A or Vector A). Data type supported: BFLOAT16/F16/F32
[in] b Second input tensor info (Matrix B). Data type supported: same as a
[in] c Third input tensor info (Matrix C). It can be a nullptr if just the multiplication between a and b is needed. Data type supported: same as a
[out] d Output tensor info. Data type supported: same as a
[in] alpha Weight of the matrix product
[in] beta Weight of matrix C
[in] gemm_info (Optional) Specifies if the matrix A and/or matrix B have been reshaped and if the reshape of matrix B should happen only for the first run

Definition at line 63 of file CpuGemm.cpp.

70 {
71  ARM_COMPUTE_ERROR_ON_NULLPTR(a, b, d);
72  ARM_COMPUTE_ERROR_THROW_ON(CpuGemm::validate(a, b, c, d, alpha, beta, gemm_info));
73  ARM_COMPUTE_LOG_PARAMS(a, b, c, d, alpha, beta, gemm_info);
74 
75  const cpu::AsmGemmInfo asm_info = init_assembly_metadata(gemm_info);
76  const bool is_c_bias = beta == 1 && c != nullptr;
77  const bool run_optimised =
78  bool(cpu::CpuGemmAssemblyDispatch::validate(a, b, (is_c_bias) ? c : nullptr, d, asm_info)) &&
79  (c == nullptr || beta == 0.f || beta == 1.f) && // Optimized GeMM doesn't support beta coefficient.
80  !(!b->are_values_constant() &&
81  b->tensor_shape().z() > 1); // Disable batch matmul as optimized GeMM handles batching differently.
82 
83  // Check if we need to reshape the matrix B only on the first run
84  _is_prepared = false;
85  _reshape_b_only_on_first_run = b->are_values_constant();
86  _run_vector_matrix_multiplication = a->dimension(1) < 2;
87  _run_alpha_scale = alpha != 1.f;
88  _run_bias_addition = is_c_bias;
89  _run_addition = beta != 0 && beta != 1 && c != nullptr;
90  _run_activation =
91  gemm_info.activation_info().enabled() &&
92  (!run_optimised ||
93  (run_optimised && !cpu::CpuGemmAssemblyDispatch::is_activation_supported(gemm_info.activation_info())));
94 
95  if (run_optimised)
96  {
97  _run_interleave_transpose = false;
98  const ITensorInfo *c_to_use = is_c_bias ? c : nullptr;
99  _asm_glue = std::make_unique<cpu::CpuGemmAssemblyDispatch>();
100  _asm_glue->configure(a, b, c_to_use, d, asm_info);
101  ARM_COMPUTE_ERROR_ON(!_asm_glue->is_configured());
102 
103  const auto asm_mem_req = _asm_glue->workspace();
104  for (unsigned int slot = 0; slot < asm_mem_req.size(); ++slot)
105  {
106  _aux_mem[slot] = asm_mem_req[slot];
107  }
108 
109  // Scale product by alpha
110  if (_run_alpha_scale)
111  {
112  _alpha_scale_func = std::make_unique<cpu::CpuActivation>();
113  _alpha_scale_func->configure(
114  d, nullptr, ActivationLayerInfo(ActivationLayerInfo::ActivationFunction::LINEAR, alpha, 0.f));
115  }
116  }
117  else
118  {
119  _run_interleave_transpose = !_run_vector_matrix_multiplication;
120  // Pick output tensor in case bias addition should be performed
121  ITensorInfo *gemm_output_to_use = (_run_bias_addition) ? &_tmp_d : d;
122  // Pick b tensor in case pretranspose should be performed
123  const ITensorInfo *b_to_use = b;
124 
125  _mm_kernel = std::make_unique<cpu::kernels::CpuGemmMatrixMultiplyKernel>();
126 
127  // Configure rhs pretranspose
128  if (gemm_info.pretranspose_B())
129  {
130  _pretranspose_b_func = std::make_unique<CpuTranspose>();
131  _pretranspose_b_func->configure(b_to_use, &_pretransposed_b);
132  MemoryLifetime lifetime;
133  if (_reshape_b_only_on_first_run)
134  {
135  if (_run_interleave_transpose)
136  {
137  // PreTransposedRHS tensor is only used in prepare(), but is then succeeded by Transposed1xWRHS
138  // So PreTransposedRHS can be freed inside prepare()
139  lifetime = MemoryLifetime::Prepare;
140  }
141  else
142  {
143  // PreTransposedRHS tensor is only used in prepare(), but is the final transformation of rhs
144  // So PreTransposedRHS needs to persist beyond prepare()
145  lifetime = MemoryLifetime::Persistent;
146  }
147  }
148  else
149  {
150  // PreTransposedRHS tensor is always used in run() and doesn't need to persist
151  lifetime = MemoryLifetime::Temporary;
152  }
153  _aux_mem[PreTransposedRHS] =
154  MemoryInfo(offset_int_vec(PreTransposedRHS), lifetime, _pretransposed_b.total_size());
155  b_to_use = &_pretransposed_b;
156  }
157 
158  // Select between GEMV and GEMM
159  if (_run_vector_matrix_multiplication)
160  {
161  // Configure the matrix multiply kernel
162  _mm_kernel->configure(a, b_to_use, gemm_output_to_use, alpha, false);
163  }
164  else
165  {
166  ARM_COMPUTE_ERROR_ON(!_run_interleave_transpose);
167  // Configure interleave kernel
168  _interleave_kernel = std::make_unique<cpu::kernels::CpuGemmInterleave4x4Kernel>();
169  _interleave_kernel->configure(a, &_tmp_a);
170  _aux_mem[InterleavedLHS] =
171  MemoryInfo(offset_int_vec(InterleavedLHS), MemoryLifetime::Temporary, _tmp_a.total_size());
172 
173  // Configure rhs transpose1xw kernel
174  _transpose1xW_b_kernel = std::make_unique<cpu::kernels::CpuGemmTranspose1xWKernel>();
175  _transpose1xW_b_kernel->configure(b_to_use, &_tmp_b);
176  _aux_mem[Transposed1xWRHS] =
177  MemoryInfo(offset_int_vec(Transposed1xWRHS), MemoryLifetime::Persistent, _tmp_b.total_size());
178 
179  // Use a and b here instead of _tmp_a and _tmp_b because CpuGemmMatrixMultiplyKernel requires the original m,n,k in case of interleaved a and transposed1xw b
180  const int m = a->dimension(1);
181  const int n = b_to_use->dimension(0);
182  const int k = a->dimension(0);
183 
184  // Configure matrix multiplication kernel
185  _mm_kernel->configure(&_tmp_a, &_tmp_b, gemm_output_to_use, alpha, _run_interleave_transpose,
186  GEMMReshapeInfo(m, n, k));
187  }
188 
189  if (_run_bias_addition)
190  {
191  _add_bias = std::make_unique<cpu::CpuAdd>();
192  _add_bias->configure(gemm_output_to_use, c, d, ConvertPolicy::SATURATE);
193  _aux_mem[TempResult] =
194  MemoryInfo(offset_int_vec(TempResult), MemoryLifetime::Temporary, _tmp_d.total_size());
195  }
196  }
197 
198  // Configure matrix addition kernel
199  if (_run_addition)
200  {
201  _ma_kernel = std::make_unique<cpu::kernels::CpuGemmMatrixAdditionKernel>();
202  _ma_kernel->configure(c, d, beta);
203  }
204 
205  // Configure activation
206  if (_run_activation)
207  {
208  _activation_func = std::make_unique<cpu::CpuActivation>();
209  _activation_func->configure(d, nullptr, gemm_info.activation_info());
210  }
211 }

References ARM_COMPUTE_ERROR_ON_NULLPTR, ARM_COMPUTE_ERROR_THROW_ON, ARM_COMPUTE_LOG_PARAMS, arm_compute::test::validation::b, CpuGemmAssemblyDispatch::validate(), and arm_compute::validate().
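As a usage illustration, a minimal configuration sketch follows. CpuGemm is an internal operator (applications normally reach it through NEGEMM), so the include paths and build setup below are assumptions:

#include "arm_compute/core/TensorInfo.h"
#include "arm_compute/core/Types.h"
#include "src/cpu/operators/CpuGemm.h" // assumed location of the internal header

using namespace arm_compute;

void configure_cpu_gemm_example()
{
    // d = alpha * a * b + beta * c with a: MxK, b: KxN, c and d: MxN.
    // TensorShape is given as (width, height), i.e. (columns, rows).
    const unsigned int M = 64, N = 32, K = 128;
    TensorInfo a_info(TensorShape(K, M), 1, DataType::F32);
    TensorInfo b_info(TensorShape(N, K), 1, DataType::F32);
    TensorInfo c_info(TensorShape(N, M), 1, DataType::F32);
    TensorInfo d_info(TensorShape(N, M), 1, DataType::F32);

    const float    alpha = 1.0f;
    const float    beta  = 1.0f; // beta == 1 lets c be treated as a bias (see is_c_bias above)
    const GEMMInfo gemm_info{};

    // Check the configuration first, then configure the operator once.
    ARM_COMPUTE_ERROR_THROW_ON(cpu::CpuGemm::validate(&a_info, &b_info, &c_info, &d_info, alpha, beta, gemm_info));

    cpu::CpuGemm gemm;
    gemm.configure(&a_info, &b_info, &c_info, &d_info, alpha, beta, gemm_info);
}

After configure(), the actual buffers are supplied at execution time through an ITensorPack (see run() below).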

◆ has_opt_impl()

Status has_opt_impl ( arm_compute::WeightFormat & weight_format,
const ITensorInfo * a,
const ITensorInfo * b,
const ITensorInfo * c,
const ITensorInfo * d,
const GEMMInfo & gemm_info = GEMMInfo()
)
static

Indicates whether or not there is an optimal assembly implementation that can be used to process the given parameters.

This method has the same use as NEGEMMConvolutionLayer::has_opt_impl, with the only caveat that the value of arm_compute::WeightFormat needs to be passed via the gemm_info parameter.

Definition at line 539 of file CpuGemm.cpp.

545 {
546  const cpu::AsmGemmInfo asm_info = init_assembly_metadata(gemm_info);
547 
548  return CpuGemmAssemblyDispatch::has_opt_impl(expected_weight_format, a, b, c, d, asm_info);
549 }

References arm_compute::test::validation::b.

Referenced by NEGEMM::has_opt_impl(), CpuFullyConnected::has_opt_impl(), and CpuGemmConv2d::has_opt_impl().
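A sketch of querying for an optimized implementation, reusing the tensor infos from the configuration sketch under configure(); the GEMMInfo setters for fixed-format weights (set_fixed_format(), set_weight_format()) are assumed here, mirroring the NEGEMMConvolutionLayer::has_opt_impl usage mentioned above:

// Ask whether an optimized assembly kernel exists for these shapes/types and,
// if so, which (possibly blocked) weight format it expects B to be stored in.
arm_compute::WeightFormat expected_wf = arm_compute::WeightFormat::ANY;

GEMMInfo query_info{};
query_info.set_fixed_format(true);         // assumed setter: request a fixed-format kernel
query_info.set_weight_format(expected_wf); // assumed setter: the queried format travels via gemm_info

const Status status = cpu::CpuGemm::has_opt_impl(expected_wf, &a_info, &b_info, nullptr, &d_info, query_info);
if (bool(status))
{
    // expected_wf now holds the weight format required by the optimal assembly kernel.
}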

◆ isVarWeightsKernel()

bool isVarWeightsKernel ( ) const

Indicates if the convolution executes in variable weights mode.

When ACL executes convolution in variable weights mode, it does not perform any processing of the weights tensor. Instead, it uses the data exactly as given by the user.

Definition at line 551 of file CpuGemm.cpp.

552 {
553  return _asm_glue && _asm_glue->isVarWeightsKernel();
554 }

◆ prepare()

void prepare ( ITensorPack & constants)
override virtual

Prepare the function for executing.

Any one-off pre-processing step required by the function is handled here.

Parameters
[in] constants Vector that contains the constant tensors.
Note
The prepare stage might not need all the function's buffers' backing memory to be available in order to execute.

Reimplemented from INEOperator.

Definition at line 495 of file CpuGemm.cpp.

496 {
497  if (!_is_prepared)
498  {
499  if (_asm_glue && _asm_glue->is_configured())
500  {
501  _asm_glue->prepare(tensors);
502  }
503  else if (_reshape_b_only_on_first_run)
504  {
505  const ITensor *b = tensors.get_const_tensor(ACL_SRC_1);
506  const ITensor *b_to_use = b;
507  CpuAuxTensorHandler pretransposed_b(
508  offset_int_vec(PreTransposedRHS), _pretransposed_b, tensors,
509  false /*pack_inject: no need to inject into tensors*/,
510  _pretranspose_b_func ==
511  nullptr /*bypass_alloc: no need to allocate if _pretranspose_b_func is not run*/);
512  CpuAuxTensorHandler transposed1xw_b(offset_int_vec(Transposed1xWRHS), _tmp_b, tensors,
513  false /*pack_inject*/, !_run_interleave_transpose /*bypass_alloc*/);
514 
515  if (_pretranspose_b_func)
516  {
517  // Run pretranspose kernel
518  ITensorPack pretranspose_pack{{ACL_SRC, b_to_use}, {ACL_DST, pretransposed_b.get()}};
519  _pretranspose_b_func->run(pretranspose_pack);
520  b_to_use = pretransposed_b.get();
521  }
522  if (_run_interleave_transpose)
523  {
524  // Run transpose kernel
525  ITensorPack transpose_pack{{ACL_SRC, b_to_use}, {ACL_DST, transposed1xw_b.get()}};
526  NEScheduler::get().schedule_op(_transpose1xW_b_kernel.get(), Window::DimY,
527  _transpose1xW_b_kernel->window(), transpose_pack);
528  }
529  }
530  _is_prepared = true;
531  }
532 }
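For example, when B holds constant weights the one-off reshape can be paid up front by calling prepare() explicitly; run() would otherwise trigger it on its first call, as the excerpt under run() shows. The tensors below are hypothetical ITensor objects matching the infos passed to configure():

ITensorPack pack{{ACL_SRC_0, &a_tensor}, {ACL_SRC_1, &b_tensor},
                 {ACL_SRC_2, &c_tensor}, {ACL_DST, &d_tensor}};

gemm.prepare(pack); // reshapes/pretransposes the constant B exactly once

In practice the auxiliary workspace tensors reported by workspace() should be added to the same pack before this call; a sketch of that bookkeeping follows the run() documentation below.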

◆ run()

void run ( ITensorPack & tensors)
override virtual

Run the kernels contained in the function.

Parameters
[in] tensors Vector that contains the tensors to operate on.

Reimplemented from INEOperator.

Definition at line 403 of file CpuGemm.cpp.

404 {
405  prepare(tensors);
406 
407  auto a = tensors.get_const_tensor(ACL_SRC_0);
408  auto b = tensors.get_const_tensor(ACL_SRC_1);
409  auto c = tensors.get_const_tensor(ACL_SRC_2);
410  auto d = tensors.get_tensor(ACL_DST);
411 
412  if (_asm_glue && _asm_glue->is_configured())
413  {
414  // Pass c to asm dispatch only if it's the bias tensor
415  ITensorPack asm_pack = tensors;
416  asm_pack.add_const_tensor(ACL_SRC_2, _run_bias_addition ? c : nullptr);
417  _asm_glue->run(asm_pack);
418  if (_run_alpha_scale)
419  {
420  ITensorPack pack{{ACL_SRC, d}, {ACL_DST, d}};
421  _alpha_scale_func->run(pack);
422  }
423  }
424  else
425  {
426  CpuAuxTensorHandler interleaved_a(offset_int_vec(InterleavedLHS), _tmp_a, tensors, true);
427  CpuAuxTensorHandler pretransposed_b(offset_int_vec(PreTransposedRHS), _pretransposed_b, tensors);
428  CpuAuxTensorHandler transposed1xw_b(offset_int_vec(Transposed1xWRHS), _tmp_b, tensors, true);
429  CpuAuxTensorHandler temp_d(offset_int_vec(TempResult), _tmp_d, tensors, true);
430 
431  ITensorPack mm_pack{{ACL_SRC_0, a}, {ACL_SRC_1, b}, {ACL_DST, (_run_bias_addition) ? temp_d.get() : d}};
432 
433  if (_run_interleave_transpose)
434  {
435  // Run interleave kernel
436  ITensorPack interleave_pack{{ACL_SRC, a}, {ACL_DST, interleaved_a.get()}};
437  NEScheduler::get().schedule_op(_interleave_kernel.get(), Window::DimY, _interleave_kernel->window(),
438  interleave_pack);
439  // Use reshaped matrices
440  mm_pack.add_const_tensor(ACL_SRC_0, interleaved_a.get());
441  }
442 
443  const ITensor *b_to_use = b;
444  if (_pretranspose_b_func)
445  {
446  if (!_reshape_b_only_on_first_run)
447  {
448  // Run pretranspose kernel
449  ITensorPack pretranspose_pack{{ACL_SRC, b_to_use}, {ACL_DST, pretransposed_b.get()}};
450  _pretranspose_b_func->run(pretranspose_pack);
451  }
452  b_to_use = pretransposed_b.get();
453  }
454  if (_run_interleave_transpose)
455  {
456  if (!_reshape_b_only_on_first_run)
457  {
458  // Run transpose1xw kernel
459  ITensorPack transpose_pack{{ACL_SRC, b_to_use}, {ACL_DST, transposed1xw_b.get()}};
460  NEScheduler::get().schedule_op(_transpose1xW_b_kernel.get(), Window::DimY,
461  _transpose1xW_b_kernel->window(), transpose_pack);
462  }
463  b_to_use = transposed1xw_b.get();
464  }
465  // Use reshaped matrices
466  mm_pack.add_const_tensor(ACL_SRC_1, b_to_use);
467 
468  NEScheduler::get().schedule_op(_mm_kernel.get(),
469  _run_vector_matrix_multiplication ? Window::DimX : Window::DimY,
470  _mm_kernel->window(), mm_pack);
471 
472  // Run bias addition kernel
473  if (_run_bias_addition)
474  {
475  ITensorPack pack{{ACL_SRC_0, temp_d.get()}, {ACL_SRC_1, c}, {ACL_DST, d}};
476  _add_bias->run(pack);
477  }
478  }
479 
480  // Run matrix addition kernel
481  if (_run_addition)
482  {
483  ITensorPack c_add_pack{{ACL_SRC, c}, {ACL_DST, d}};
484  NEScheduler::get().schedule_op(_ma_kernel.get(), Window::DimY, _ma_kernel->window(), c_add_pack);
485  }
486 
487  // Run activation function
488  if (_run_activation)
489  {
490  ITensorPack pack{{ACL_SRC, d}, {ACL_DST, d}};
491  _activation_func->run(pack);
492  }
493 }

References arm_compute::ACL_DST, arm_compute::ACL_SRC, arm_compute::ACL_SRC_0, arm_compute::ACL_SRC_1, arm_compute::ACL_SRC_2, ITensorPack::add_const_tensor(), arm_compute::test::validation::b, Window::DimX, Window::DimY, Scheduler::get(), CpuAuxTensorHandler::get(), ITensorPack::get_const_tensor(), ITensorPack::get_tensor(), arm_compute::offset_int_vec(), arm_compute::test::validation::pack, and IScheduler::schedule_op().
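Putting it together, a hedged sketch of driving the operator directly: back every workspace slot reported by workspace() with a plain buffer, add everything to one ITensorPack, then prepare and run. The MemoryInfo field names (slot, size) and the U8 backing tensors are assumptions; NEGEMM performs the equivalent bookkeeping internally:

#include <memory>
#include <vector>

std::vector<std::unique_ptr<Tensor>> aux_tensors;

ITensorPack pack{{ACL_SRC_0, &a_tensor}, {ACL_SRC_1, &b_tensor},
                 {ACL_SRC_2, &c_tensor}, {ACL_DST, &d_tensor}};

// Back every requested workspace slot with a U8 tensor of the requested size.
for (const auto &req : gemm.workspace())
{
    if (req.size == 0)
    {
        continue;
    }
    auto aux = std::make_unique<Tensor>();
    aux->allocator()->init(TensorInfo(TensorShape(req.size), 1, DataType::U8));
    aux->allocator()->allocate();
    pack.add_tensor(req.slot, aux.get());
    aux_tensors.emplace_back(std::move(aux));
}

gemm.prepare(pack); // one-off reshape of the constant B (also triggered by the first run())
gemm.run(pack);     // can be called repeatedly, e.g. after writing fresh data into a_tensor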

◆ validate()

Status validate ( const ITensorInfo * a,
const ITensorInfo * b,
const ITensorInfo * c,
const ITensorInfo * d,
float  alpha,
float  beta,
const GEMMInfo & gemm_info = GEMMInfo()
)
static

Static function to check if given info will lead to a valid configuration of CpuGemm.

Similar to CpuGemm::configure()

Returns
a status

Definition at line 213 of file CpuGemm.cpp.

220 {
221  ARM_COMPUTE_UNUSED(alpha);
222  const bool is_c_bias = beta == 1 && c != nullptr;
223  const bool run_addition = c != nullptr && beta != 0 && beta != 1;
224  // Check if we should use the pretransposed_b or original b
225  // TODO: COMPMID-6597
226  // Note that this check should only apply to the non-optimized path. The reason we brought this at the beginning
227  // instead of only for the fallback path is because of the checks performed below, between here and the run_optimised decision
228  // We should simplify this by
229  // 1. Moving the checks between "fix-start" and "fix-end" into their corresponding ops / kernels (e.g. the weights format checks can and should be moved into CpuGemmAssemblyDispatch)
230  // 2. Moving this b_to_use check back into the non-optimized path
231  TensorInfo pretransposed_b = b->clone()->set_tensor_shape(misc::shape_calculator::compute_transposed_shape(*b));
232  const ITensorInfo *b_to_use = gemm_info.pretranspose_B() ? &pretransposed_b : b;
233  // TODO: COMPMID-6597 fix-start
234 
235  ARM_COMPUTE_RETURN_ERROR_ON_CPU_F16_UNSUPPORTED(a);
236  ARM_COMPUTE_RETURN_ERROR_ON_CPU_BF16_UNSUPPORTED(a);
237  ARM_COMPUTE_RETURN_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN(a, 1, DataType::BFLOAT16, DataType::F16, DataType::F32);
238 
239  if (is_fixed_format_fast_math(gemm_info.weight_format()))
240  {
241  ARM_COMPUTE_RETURN_ERROR_ON_DATA_TYPE_NOT_IN(a, DataType::F32);
242  ARM_COMPUTE_RETURN_ERROR_ON_DATA_TYPE_NOT_IN(b_to_use, DataType::BFLOAT16);
243  }
244  else
245  {
246  ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES(a, b_to_use);
247  }
248 
249  const int block_by = arm_compute::block_by(gemm_info.weight_format());
250  // test if im2col has changed the dimensions that are needed for padding
251  if (a->dimension(0) != b_to_use->dimension(1) && block_by > 1)
252  {
253  // have to verify bias
254  const size_t dim0_sz = a->dimension(0);
255  ARM_COMPUTE_RETURN_ERROR_ON_MSG(
256  (dim0_sz % block_by) != 0,
257  ("The matrix A number of columns must be a multiple of block_by=" + std::to_string(block_by)).c_str());
258  // a->dimension(0) = kernel_area * input_channel + kernel_area * input_pad_right
259  // b_to_use->dimension(1) = kernel_area * input_channel
260  // a->dimension(0) = b_to_use->dimension(1) + kernel_area * input_pad_right
261  const size_t input_pad_right = (dim0_sz - b_to_use->dimension(1)) % block_by;
262  const size_t kernel_area = (dim0_sz - b_to_use->dimension(1)) / input_pad_right;
263  ARM_COMPUTE_RETURN_ERROR_ON_MSG(
264  (dim0_sz - kernel_area * input_pad_right) != b_to_use->dimension(1),
265  "The product AB is defined only if A number of columns and B number of rows are related");
266  }
267  else
268  {
269  ARM_COMPUTE_RETURN_ERROR_ON_MSG(
270  a->dimension(0) != b_to_use->dimension(1),
271  "The product AB is defined only if the number of columns in A is equal to the number of rows in B");
272  }
273 
274  ARM_COMPUTE_RETURN_ERROR_ON_MSG(gemm_info.is_a_reshaped(), "Matrix A already reshaped is not supported");
275  ARM_COMPUTE_RETURN_ERROR_ON_MSG(gemm_info.is_b_reshaped(), "Matrix B already reshaped is not supported");
276  if (a->data_type() != DataType::BFLOAT16)
277  {
278  ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES(a, d);
279  }
280 
281  if (run_addition)
282  {
283  ARM_COMPUTE_RETURN_ERROR_ON(gemm_info.depth_output_gemm3d() != 0);
284  ARM_COMPUTE_RETURN_ERROR_ON(gemm_info.reinterpret_input_as_3d());
286  ARM_COMPUTE_RETURN_ERROR_ON_MSG(a->dimension(1) != c->dimension(1),
287  "The C matrix must have the same number of rows as the matrix A");
288  ARM_COMPUTE_RETURN_ERROR_ON_MSG(b_to_use->dimension(0) != c->dimension(0),
289  "The C matrix must have the same number of columns as the matrix B");
290  }
291 
292  if (d->total_size() != 0)
293  {
294  // For fixed format we are expecting some kind of blocked format for B/RHS so the dimension won't necessarily match the result matrix any more.
295  ARM_COMPUTE_RETURN_ERROR_ON(!gemm_info.fixed_format() && b_to_use->dimension(0) != d->dimension(0));
296  if (gemm_info.depth_output_gemm3d() != 0)
297  {
298  if (gemm_info.reinterpret_input_as_3d())
299  {
300  ARM_COMPUTE_RETURN_ERROR_ON(a->dimension(1) != d->dimension(1));
301  ARM_COMPUTE_RETURN_ERROR_ON(a->dimension(2) != d->dimension(2));
302  }
303  else
304  {
305  ARM_COMPUTE_RETURN_ERROR_ON(a->dimension(1) != d->dimension(1) * d->dimension(2));
306  }
307  }
308  else
309  {
310  ARM_COMPUTE_RETURN_ERROR_ON(a->dimension(1) != d->dimension(1));
311  }
312  }
313  // TODO: COMPMID-6597 fix-end
314 
315  // Check if we need to run the optimized assembly kernel
316  cpu::AsmGemmInfo asm_info = init_assembly_metadata(gemm_info);
317 
318  // Note we use b instead of b_to_use here because asm_info also captures the pretranspose_b() flag
319  // so we pass the original b to CpuGemmAssemblyDispatch
320  const bool run_optimised =
321  bool(cpu::CpuGemmAssemblyDispatch::validate(a, b, is_c_bias ? c : nullptr, d, asm_info)) &&
322  (c == nullptr || beta == 0.f || beta == 1.f) && // Optimized GeMM doesn't support beta coefficient.
323  !(!b->are_values_constant() &&
324  b->tensor_shape().z() > 1); // Disable batch matmul as optimized GeMM handles batching differently.
325 
326  if (!run_optimised)
327  {
328  ARM_COMPUTE_RETURN_ERROR_ON_MSG(gemm_info.reinterpret_input_as_3d(),
329  "CpuGemm cannot reinterpret the input tensor as 3D");
330  ARM_COMPUTE_RETURN_ERROR_ON_MSG(gemm_info.depth_output_gemm3d() != 0,
331  "CpuGemm cannot reinterpret the output tensor as 3D");
332 
333  // Check if the first input tensor is a vector.
334  const bool run_vector_matrix_multiplication = a->dimension(1) < 2;
335  // Check if we need to reshape the matrix A and matrix B
336  const bool run_interleave_transpose = !run_vector_matrix_multiplication;
337 
338  // Arguments used by GEMMReshapeInfo
339  // If we pass the matrix A and matrix B reshaped to CpuGemmMatrixMultiplyKernel, we need to pass m, n, k, mult_transpose1xW_width and mult_interleave4x4_height to GEMMReshapeInfo
340  // in order to know how the matrices have been reshaped
341  const int m = a->dimension(1);
342  const int n = b_to_use->dimension(0);
343  const int k = a->dimension(0);
344  int mult_transpose1xW_width = 1;
345  int mult_interleave4x4_height = 1;
346 
347  const GEMMReshapeInfo reshape_info = GEMMReshapeInfo(
348  m, n, k, mult_transpose1xW_width, mult_interleave4x4_height, gemm_info.depth_output_gemm3d());
349 
350  const ITensorInfo *matrix_a_info = a;
351  const ITensorInfo *matrix_b_info = b_to_use;
352 
353  TensorInfo tmp_a_info{};
354  TensorInfo tmp_b_info{};
355  TensorInfo tmp_output_info = *d->clone();
356 
357  if (run_interleave_transpose)
358  {
359  matrix_a_info = &tmp_a_info;
360  matrix_b_info = &tmp_b_info;
361 
362  // Validate interleave kernel
363  auto_init_if_empty(tmp_a_info, a->clone()->set_tensor_shape(compute_interleaved_shape(
364  *a, mult_interleave4x4_height, gemm_info.reinterpret_input_as_3d())));
365  ARM_COMPUTE_RETURN_ON_ERROR(cpu::kernels::CpuGemmInterleave4x4Kernel::validate(a, &tmp_a_info));
366 
367  // Validate transpose kernel
368  auto_init_if_empty(tmp_b_info,
369  b_to_use->clone()->set_tensor_shape(
370  compute_transpose1xW_with_element_size_shape(*b_to_use, mult_transpose1xW_width)));
371  ARM_COMPUTE_RETURN_ON_ERROR(cpu::kernels::CpuGemmTranspose1xWKernel::validate(b_to_use, &tmp_b_info));
372  }
373 
374  // Validate matrix multiply
375  auto_init_if_empty(tmp_output_info,
376  matrix_a_info->clone()->set_tensor_shape(compute_mm_shape(
377  *matrix_a_info, *matrix_b_info, run_interleave_transpose, reshape_info)));
378  ARM_COMPUTE_RETURN_ON_ERROR(cpu::kernels::CpuGemmMatrixMultiplyKernel::validate(
379  matrix_a_info, matrix_b_info, &tmp_output_info, alpha, run_interleave_transpose, reshape_info));
380 
381  if (is_c_bias)
382  {
383  ARM_COMPUTE_RETURN_ON_ERROR(cpu::CpuAdd::validate(&tmp_output_info, c, d, ConvertPolicy::SATURATE));
384  }
385  }
386 
387  // Validate matrix addition kernel
388  if (run_addition)
389  {
390  ARM_COMPUTE_RETURN_ON_ERROR(cpu::kernels::CpuGemmMatrixAdditionKernel::validate(c, d, beta));
391  }
392 
393  // Validate activation
394  const ActivationLayerInfo &activation = gemm_info.activation_info();
395  if (activation.enabled())
396  {
397  ARM_COMPUTE_RETURN_ON_ERROR(cpu::CpuActivation::validate(d, nullptr, activation));
398  }
399 
400  return Status{};
401 }

References GEMMInfo::activation_info(), ARM_COMPUTE_RETURN_ERROR_ON, ARM_COMPUTE_RETURN_ERROR_ON_CPU_BF16_UNSUPPORTED, ARM_COMPUTE_RETURN_ERROR_ON_CPU_F16_UNSUPPORTED, ARM_COMPUTE_RETURN_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN, ARM_COMPUTE_RETURN_ERROR_ON_DATA_TYPE_NOT_IN, ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES, ARM_COMPUTE_RETURN_ERROR_ON_MSG, ARM_COMPUTE_RETURN_ON_ERROR, ARM_COMPUTE_UNUSED, arm_compute::auto_init_if_empty(), arm_compute::test::validation::b, arm_compute::BFLOAT16, arm_compute::block_by(), ICloneable< T >::clone(), arm_compute::misc::shape_calculator::compute_interleaved_shape(), arm_compute::misc::shape_calculator::compute_mm_shape(), arm_compute::misc::shape_calculator::compute_transpose1xW_with_element_size_shape(), arm_compute::misc::shape_calculator::compute_transposed_shape(), ITensorInfo::data_type(), GEMMInfo::depth_output_gemm3d(), ITensorInfo::dimension(), ActivationLayerInfo::enabled(), arm_compute::F16, arm_compute::F32, GEMMInfo::fixed_format(), GEMMInfo::is_a_reshaped(), GEMMInfo::is_b_reshaped(), arm_compute::is_fixed_format_fast_math(), GEMMInfo::pretranspose_B(), GEMMInfo::reinterpret_input_as_3d(), arm_compute::SATURATE, arm_compute::to_string(), ITensorInfo::total_size(), CpuActivation::validate(), CpuAdd::validate(), CpuGemmInterleave4x4Kernel::validate(), CpuGemmMatrixAdditionKernel::validate(), CpuGemmMatrixMultiplyKernel::validate(), CpuGemmTranspose1xWKernel::validate(), CpuGemmAssemblyDispatch::validate(), and GEMMInfo::weight_format().

Referenced by NEGEMM::configure(), and NEGEMM::validate().
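A minimal pre-flight check with the static validate() before committing to configure(), reusing the tensor infos from the configuration sketch above; Status::error_description() is used for reporting:

// Fragment: requires <iostream> and the setup from the configure() sketch.
const Status st = cpu::CpuGemm::validate(&a_info, &b_info, &c_info, &d_info, alpha, beta, gemm_info);
if (!bool(st))
{
    std::cerr << "CpuGemm configuration rejected: " << st.error_description() << std::endl;
    return;
}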

◆ workspace()

experimental::MemoryRequirements workspace ( ) const
override virtual

Return the memory requirements required by the workspace.

Reimplemented from INEOperator.

Definition at line 534 of file CpuGemm.cpp.

535 {
536  return _aux_mem;
537 }
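For example, the returned requirements can be inspected directly; the MemoryInfo member names (slot, lifetime, size) are assumed from the MemoryInfo(...) constructor calls shown in configure() above:

// Fragment: requires <iostream> and a configured CpuGemm instance named gemm.
for (const experimental::MemoryInfo &req : gemm.workspace())
{
    const char *lifetime = req.lifetime == experimental::MemoryLifetime::Temporary  ? "Temporary"
                         : req.lifetime == experimental::MemoryLifetime::Persistent ? "Persistent"
                                                                                    : "Prepare";
    std::cout << "aux slot " << req.slot << " (" << lifetime << "): " << req.size << " bytes" << std::endl;
}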

The documentation for this class was generated from the following files:

  • CpuGemm.h
  • CpuGemm.cpp