Compute Library
 22.08
CpuGemm Class Reference

Basic function to execute GEMM.

#include <CpuGemm.h>

Collaboration diagram for CpuGemm (diagram omitted).

Public Member Functions

 CpuGemm ()=default
 Default constructor.
 
 ~CpuGemm ()=default
 Default destructor.
 
void configure (const ITensorInfo *a, const ITensorInfo *b, const ITensorInfo *c, ITensorInfo *d, float alpha, float beta, const GEMMInfo &gemm_info=GEMMInfo())
 Configure operator for a given list of arguments.
 
void run (ITensorPack &tensors) override
 Run the kernels contained in the function.
 
void prepare (ITensorPack &constants) override
 Prepare the function for executing.
 
experimental::MemoryRequirements workspace () const override
 Return the memory requirements required by the workspace.
 
bool isVarWeightsKernel () const
 Indicates if the convolution executes in variable weights mode.
 
- Public Member Functions inherited from INEOperator
 INEOperator (IRuntimeContext *ctx=nullptr)
 Constructor.
 
 INEOperator (const INEOperator &)=delete
 Prevent instances of this class from being copied (As this class contains pointers)
 
 INEOperator (INEOperator &&)=default
 Default move constructor.
 
INEOperator & operator= (const INEOperator &)=delete
 Prevent instances of this class from being copied (As this class contains pointers)
 
INEOperator & operator= (INEOperator &&)=default
 Default move assignment operator.
 
 ~INEOperator ()
 Default destructor.
 
- Public Member Functions inherited from IOperator
virtual ~IOperator ()=default
 Destructor.
 

Static Public Member Functions

static Status validate (const ITensorInfo *a, const ITensorInfo *b, const ITensorInfo *c, const ITensorInfo *d, float alpha, float beta, const GEMMInfo &gemm_info=GEMMInfo())
 Static function to check if given info will lead to a valid configuration of CpuGemm.
 
static Status has_opt_impl (arm_compute::WeightFormat &weight_format, const ITensorInfo *a, const ITensorInfo *b, const ITensorInfo *c, const ITensorInfo *d, const GEMMInfo &gemm_info=GEMMInfo())
 Indicates whether or not there is an optimal assembly implementation that can be used to process the given parameters.
 

Detailed Description

Basic function to execute GEMM.

This function calls the following kernels:

If optimized assembly is available:

  1. cpu::CpuGemmAssemblyDispatch
  2. cpu::CpuActivation (if alpha != 1.0)

Else:

  1. cpu::kernels::CpuGemmInterleave4x4Kernel (if the output tensor is a matrix)
  2. cpu::kernels::CpuGemmTranspose1xWKernel (if the output tensor is a matrix)
  3. cpu::kernels::CpuGemmMatrixMultiplyKernel

In both cases:

  1. cpu::kernels::CpuGemmMatrixAdditionKernel (if c != nullptr and beta != 0.0 and is not reshaped once)

Else:

  1. cpu::CpuAdd (if c != nullptr and is reshaped once and not optimized assembly in place)

And finally, in both cases:

  1. cpu::CpuActivation (if an activation is specified in GEMMInfo)

Definition at line 62 of file CpuGemm.h.
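As a stateless operator, CpuGemm is configured on ITensorInfo descriptors and executed on an ITensorPack carrying the actual tensors. The following is a minimal lifecycle sketch, not a definitive recipe: the shapes, the F32 setup, and the internal include path are illustrative assumptions, and error handling is elided.

    // Sketch only: CpuGemm is an internal operator, so its header lives under src/.
    #include "arm_compute/runtime/Tensor.h"
    #include "src/cpu/operators/CpuGemm.h"

    using namespace arm_compute;

    // d = alpha * a * b + beta * c, with a: MxK, b: KxN, d: MxN.
    // An ACL TensorShape is (width, height), i.e. (columns, rows).
    const unsigned int M = 64, N = 32, K = 128;
    TensorInfo a_info(TensorShape(K, M), 1, DataType::F32);
    TensorInfo b_info(TensorShape(N, K), 1, DataType::F32);
    TensorInfo d_info(TensorShape(N, M), 1, DataType::F32);

    cpu::CpuGemm gemm;
    // c == nullptr: plain multiplication, no bias or matrix addition.
    gemm.configure(&a_info, &b_info, nullptr, &d_info, 1.f /* alpha */, 0.f /* beta */);

    // Back the descriptors with real tensors and run.
    Tensor a, b, d;
    a.allocator()->init(a_info);
    b.allocator()->init(b_info);
    d.allocator()->init(d_info);
    a.allocator()->allocate();
    b.allocator()->allocate();
    d.allocator()->allocate();

    ITensorPack pack{ { ACL_SRC_0, &a }, { ACL_SRC_1, &b }, { ACL_DST, &d } };
    gemm.run(pack); // run() calls prepare() internally on first use

Auxiliary workspace buffers can additionally be supplied in the same pack, as sketched under workspace() below.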

Constructor & Destructor Documentation

◆ CpuGemm()

CpuGemm ( )
default

Default constructor.

◆ ~CpuGemm()

~CpuGemm ( )
default

Default destructor.

Member Function Documentation

◆ configure()

void configure ( const ITensorInfo *  a,
const ITensorInfo *  b,
const ITensorInfo *  c,
ITensorInfo *  d,
float  alpha,
float  beta,
const GEMMInfo &  gemm_info = GEMMInfo() 
)

Configure operator for a given list of arguments.

Valid data layouts:

  • All

Valid data type configurations:

src0 src1 src2 dst
F32 F32 F32 F32
F16 F16 F16 F16
BFLOAT16 BFLOAT16 BFLOAT16 BFLOAT16
Note
    GEMM: General Matrix Multiply - [alpha * A * B + beta * C].
    The tensors a, b, c, d must all have the same data type; do not mix data types when calling this function.
Parameters
    [in]  a          First input tensor info (Matrix A or Vector A). Data type supported: BFLOAT16/F16/F32
    [in]  b          Second input tensor info (Matrix B). Data type supported: same as a
    [in]  c          Third input tensor info (Matrix C). It can be a nullptr if just the multiplication between a and b is needed. Data type supported: same as a
    [out] d          Output tensor info. Data type supported: same as a
    [in]  alpha      Weight of the matrix product
    [in]  beta       Weight of matrix C
    [in]  gemm_info  (Optional) Specifies if the matrix A and/or matrix B have been reshaped and if the reshape of matrix B should happen only for the first run

Definition at line 60 of file CpuGemm.cpp.
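For example, a caller can fuse an activation into the GEMM through gemm_info; a hypothetical sketch, reusing the tensor infos from the earlier lifecycle example and assuming GEMMInfo::set_activation_info() as provided by this library version:

    // Fuse a bounded ReLU; configure() decides whether the assembly backend can
    // absorb it or whether a separate cpu::CpuActivation stage is needed.
    GEMMInfo gemm_info;
    gemm_info.set_activation_info(ActivationLayerInfo(ActivationLayerInfo::ActivationFunction::BOUNDED_RELU, 6.f));

    cpu::CpuGemm gemm;
    gemm.configure(&a_info, &b_info, nullptr, &d_info, 1.f, 0.f, gemm_info);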

References ARM_COMPUTE_ERROR_ON_NULLPTR, ARM_COMPUTE_ERROR_THROW_ON, ARM_COMPUTE_LOG_PARAMS, GEMMInfo::reshape_b_only_on_first_run(), CpuGemm::validate(), and CpuGemmAssemblyDispatch::validate().

{
    ARM_COMPUTE_ERROR_ON_NULLPTR(a, b, d);
    ARM_COMPUTE_ERROR_THROW_ON(CpuGemm::validate(a, b, c, d, alpha, beta, gemm_info));
    ARM_COMPUTE_LOG_PARAMS(a, b, c, d, alpha, beta, gemm_info);

    const cpu::AsmGemmInfo asm_info      = init_assembly_metadata(gemm_info);
    const bool             is_c_bias     = gemm_info.reshape_b_only_on_first_run();
    bool                   run_optimised = bool(cpu::CpuGemmAssemblyDispatch::validate(a, b, (is_c_bias) ? c : nullptr, d, asm_info));

    // Check if we need to reshape the matrix B only on the first run
    _is_prepared                      = false;
    _reshape_b_only_on_first_run      = gemm_info.reshape_b_only_on_first_run();
    _run_vector_matrix_multiplication = a->dimension(1) < 2;
    _run_alpha_scale                  = alpha != 1.f;
    _run_bias_addition                = c != nullptr && gemm_info.reshape_b_only_on_first_run();
    _run_addition                     = beta != 0 && c != nullptr && !gemm_info.reshape_b_only_on_first_run();
    _run_activation                   = gemm_info.activation_info().enabled() && (!run_optimised || (run_optimised && !cpu::CpuGemmAssemblyDispatch::is_activation_supported(gemm_info.activation_info())));

    if(run_optimised)
    {
        const ITensorInfo *c_to_use = is_c_bias ? c : nullptr;
        _asm_glue                   = std::make_unique<cpu::CpuGemmAssemblyDispatch>();
        _asm_glue->configure(a, b, c_to_use, d, asm_info);
        ARM_COMPUTE_ERROR_ON(!_asm_glue->is_configured());

        auto asm_mem_req           = _asm_glue->workspace();
        _aux_mem[AsmGemmWorkspace] = asm_mem_req[AsmGemmWorkspace];
        _aux_mem[Pretraspose]      = asm_mem_req[Pretraspose];

        // Scale product by alpha
        if(_run_alpha_scale)
        {
            _alpha_scale_func = std::make_unique<cpu::CpuActivation>();
            _alpha_scale_func->configure(d, nullptr, ActivationLayerInfo(ActivationLayerInfo::ActivationFunction::LINEAR, alpha, 0.f));
        }
    }
    else
    {
        // Pick output tensor in case bias addition should be performed
        ITensorInfo *gemm_output_to_use = (_run_bias_addition) ? &_tmp_d : d;

        _mm_kernel = std::make_unique<cpu::kernels::CpuGemmMatrixMultiplyKernel>();

        // Select between GEMV and GEMM
        if(_run_vector_matrix_multiplication)
        {
            // Configure the matrix multiply kernel
            _mm_kernel->configure(a, b, gemm_output_to_use, alpha, false);
        }
        else
        {
            const int m = a->dimension(1);
            const int n = b->dimension(0);
            const int k = a->dimension(0);

            // Configure interleave kernel
            _interleave_kernel = std::make_unique<cpu::kernels::CpuGemmInterleave4x4Kernel>();
            _interleave_kernel->configure(a, &_tmp_a);
            _aux_mem[InterleavedLHS] = MemoryInfo(offset_int_vec(InterleavedLHS), MemoryLifetime::Temporary, _tmp_a.total_size());

            // Configure transpose kernel
            _transpose_kernel = std::make_unique<cpu::kernels::CpuGemmTranspose1xWKernel>();
            _transpose_kernel->configure(b, &_tmp_b);
            _aux_mem[TransposedRHS] = MemoryInfo(offset_int_vec(TransposedRHS), MemoryLifetime::Persistent, _tmp_b.total_size());

            // Configure matrix multiplication kernel
            _mm_kernel->configure(&_tmp_a, &_tmp_b, gemm_output_to_use, alpha, true, GEMMReshapeInfo(m, n, k));
        }

        if(_run_bias_addition)
        {
            _add_bias = std::make_unique<cpu::CpuAdd>();
            _add_bias->configure(gemm_output_to_use, c, d, ConvertPolicy::SATURATE);
            _aux_mem[TempResult] = MemoryInfo(offset_int_vec(TempResult), MemoryLifetime::Temporary, _tmp_d.total_size());
        }
    }

    // Configure matrix addition kernel
    if(_run_addition)
    {
        _ma_kernel = std::make_unique<cpu::kernels::CpuGemmMatrixAdditionKernel>();
        _ma_kernel->configure(c, d, beta);
    }

    // Configure activation
    if(_run_activation)
    {
        _activation_func = std::make_unique<cpu::CpuActivation>();
        _activation_func->configure(d, nullptr, gemm_info.activation_info());
    }
}

◆ has_opt_impl()

Status has_opt_impl ( arm_compute::WeightFormat &  weight_format,
const ITensorInfo *  a,
const ITensorInfo *  b,
const ITensorInfo *  c,
const ITensorInfo *  d,
const GEMMInfo &  gemm_info = GEMMInfo() 
)
static

Indicates whether or not there is an optimal assembly implementation that can be used to process the given parameters.

This method has the same use as NEGEMMConvolutionLayer::has_opt_impl, with the only caveat that the value of arm_compute::WeightFormat needs to be passed via the gemm_info parameter.

Definition at line 371 of file CpuGemm.cpp.

Referenced by NEGEMM::has_opt_impl(), CpuFullyConnected::has_opt_impl(), and CpuGemmConv2d::has_opt_impl().

{
    const cpu::AsmGemmInfo asm_info = init_assembly_metadata(gemm_info);

    return CpuGemmAssemblyDispatch::has_opt_impl(expected_weight_format, a, b, c, d, asm_info);
}
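A sketch of how this query might be used to discover the weight layout an optimized kernel expects, before anything is configured. The fixed-format setters on GEMMInfo are assumed to be available in this library version, and the tensor infos come from the earlier sketches:

    GEMMInfo gemm_info;
    gemm_info.set_fixed_format(true);
    gemm_info.set_weight_format(WeightFormat::ANY); // accept whatever blocked layout is fastest

    WeightFormat expected_wf = WeightFormat::ANY;
    const Status s = cpu::CpuGemm::has_opt_impl(expected_wf, &a_info, &b_info, nullptr, &d_info, gemm_info);
    if(bool(s))
    {
        // expected_wf now names the blocked layout the assembly kernel wants for B.
    }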

◆ isVarWeightsKernel()

bool isVarWeightsKernel ( ) const

Indicates if the convolution executes in variable weights mode.

When ACL executes convolution in variable weights mode, it does not perform any processing of the weights tensor. Instead, it utilizes the data as it is given by the user.

Definition at line 379 of file CpuGemm.cpp.

{
    return _asm_glue && _asm_glue->isVarWeightsKernel();
}

◆ prepare()

void prepare ( ITensorPack &  constants )
override virtual

Prepare the function for executing.

Any one-off pre-processing step required by the function is handled here.

Parameters
    [in]  constants  Vector that contains the constant tensors.
Note
Prepare stage might not need all the function's buffers' backing memory to be available in order to execute

Reimplemented from INEOperator.

Definition at line 344 of file CpuGemm.cpp.

{
    if(!_is_prepared)
    {
        if(_asm_glue && _asm_glue->is_configured())
        {
            _asm_glue->prepare(tensors);
        }
        else if(_reshape_b_only_on_first_run && !_run_vector_matrix_multiplication)
        {
            const ITensor *b     = tensors.get_const_tensor(ACL_SRC_1);
            ITensor       *b_aux = utils::cast::polymorphic_cast<ITensor *>(tensors.get_tensor(offset_int_vec(TransposedRHS)));
            ARM_COMPUTE_ERROR_ON_NULLPTR(b, b_aux);

            CpuAuxTensorHandler transposed_b(_tmp_b, *b_aux);
            ITensorPack transpose_pack{ { ACL_SRC, b }, { ACL_DST, transposed_b.get() } };
            NEScheduler::get().schedule_op(_transpose_kernel.get(), Window::DimY, _transpose_kernel->window(), transpose_pack);
        }
        _is_prepared = true;
    }
}
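Latency-sensitive callers can trigger this one-off work explicitly instead of paying for it on the first run(). A sketch, assuming the gemm and pack objects from the earlier lifecycle example and a constant B:

    const int num_inferences = 100;     // hypothetical loop bound
    gemm.prepare(pack);                 // reshapes/pretransposes B once, outside the hot path
    for(int i = 0; i < num_inferences; ++i)
    {
        // ... refresh the contents of tensor a ...
        gemm.run(pack);                 // _is_prepared is already set, so B is not processed again
    }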

◆ run()

void run ( ITensorPack &  tensors )
override virtual

Run the kernels contained in the function.

Parameters
    [in]  tensors  Vector that contains the tensors to operate on.

Reimplemented from INEOperator.

Definition at line 273 of file CpuGemm.cpp.

References arm_compute::ACL_DST, arm_compute::ACL_SRC, arm_compute::ACL_SRC_0, arm_compute::ACL_SRC_1, arm_compute::ACL_SRC_2, ITensorPack::add_const_tensor(), Window::DimX, Window::DimY, Scheduler::get(), CpuAuxTensorHandler::get(), ITensorPack::get_const_tensor(), ITensorPack::get_tensor(), arm_compute::offset_int_vec(), and IScheduler::schedule_op().

{
    prepare(tensors);

    auto a = tensors.get_const_tensor(ACL_SRC_0);
    auto b = tensors.get_const_tensor(ACL_SRC_1);
    auto c = tensors.get_const_tensor(ACL_SRC_2);
    auto d = tensors.get_tensor(ACL_DST);

    if(_asm_glue && _asm_glue->is_configured())
    {
        // Pass c to asm dispatch only if it's the bias tensor
        ITensorPack asm_pack = tensors;
        asm_pack.add_const_tensor(ACL_SRC_2, (_reshape_b_only_on_first_run) ? c : nullptr);
        _asm_glue->run(asm_pack);
        if(_run_alpha_scale)
        {
            ITensorPack pack{ { ACL_SRC, d }, { ACL_DST, d } };
            _alpha_scale_func->run(pack);
        }
    }
    else
    {
        CpuAuxTensorHandler interleaved_a(offset_int_vec(InterleavedLHS), _tmp_a, tensors, true);
        CpuAuxTensorHandler transposed_b(offset_int_vec(TransposedRHS), _tmp_b, tensors, true);
        CpuAuxTensorHandler temp_d(offset_int_vec(TempResult), _tmp_d, tensors, true);

        ITensorPack mm_pack{ { ACL_SRC_0, a }, { ACL_SRC_1, b }, { ACL_DST, (_run_bias_addition) ? temp_d.get() : d } };
        if(!_run_vector_matrix_multiplication)
        {
            // Run interleave kernel
            ITensorPack interleave_pack{ { ACL_SRC, a }, { ACL_DST, interleaved_a.get() } };
            NEScheduler::get().schedule_op(_interleave_kernel.get(), Window::DimY, _interleave_kernel->window(), interleave_pack);

            if(!_reshape_b_only_on_first_run)
            {
                // Run transpose kernel
                ITensorPack transpose_pack{ { ACL_SRC, b }, { ACL_DST, transposed_b.get() } };
                NEScheduler::get().schedule_op(_transpose_kernel.get(), Window::DimY, _transpose_kernel->window(), transpose_pack);
            }

            // Use reshaped matrices
            mm_pack.add_const_tensor(ACL_SRC_0, interleaved_a.get());
            mm_pack.add_const_tensor(ACL_SRC_1, transposed_b.get());
        }

        NEScheduler::get().schedule_op(_mm_kernel.get(), _run_vector_matrix_multiplication ? Window::DimX : Window::DimY, _mm_kernel->window(), mm_pack);

        // Run bias addition kernel
        if(_run_bias_addition)
        {
            ITensorPack pack{ { ACL_SRC_0, temp_d.get() }, { ACL_SRC_1, c }, { ACL_DST, d } };
            _add_bias->run(pack);
        }
    }

    // Run matrix addition kernel
    if(_run_addition)
    {
        ITensorPack c_add_pack{ { ACL_SRC, c }, { ACL_DST, d } };
        NEScheduler::get().schedule_op(_ma_kernel.get(), Window::DimY, _ma_kernel->window(), c_add_pack);
    }

    // Run activation function
    if(_run_activation)
    {
        ITensorPack pack{ { ACL_SRC, d }, { ACL_DST, d } };
        _activation_func->run(pack);
    }
}
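At the call site, the bias/addend c travels in the same pack under ACL_SRC_2. A sketch, assuming tensors a, b, c and d allocated as in the lifecycle example (with c shaped and allocated like the output):

    ITensorPack pack;
    pack.add_const_tensor(ACL_SRC_0, &a);
    pack.add_const_tensor(ACL_SRC_1, &b);
    pack.add_const_tensor(ACL_SRC_2, &c); // consumed as bias or via the addition kernel, per configure()
    pack.add_tensor(ACL_DST, &d);
    gemm.run(pack);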

◆ validate()

Status validate ( const ITensorInfo *  a,
const ITensorInfo *  b,
const ITensorInfo *  c,
const ITensorInfo *  d,
float  alpha,
float  beta,
const GEMMInfo &  gemm_info = GEMMInfo() 
)
static

Static function to check if given info will lead to a valid configuration of CpuGemm.

Similar to CpuGemm::configure()

Returns
a status

Definition at line 153 of file CpuGemm.cpp.

References GEMMInfo::activation_info(), ARM_COMPUTE_RETURN_ERROR_ON, ARM_COMPUTE_RETURN_ERROR_ON_CPU_BF16_UNSUPPORTED, ARM_COMPUTE_RETURN_ERROR_ON_CPU_F16_UNSUPPORTED, ARM_COMPUTE_RETURN_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN, ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES, ARM_COMPUTE_RETURN_ERROR_ON_MSG, ARM_COMPUTE_RETURN_ON_ERROR, ARM_COMPUTE_UNUSED, arm_compute::auto_init_if_empty(), arm_compute::test::validation::b, arm_compute::BFLOAT16, ICloneable< T >::clone(), TensorInfo::clone(), arm_compute::misc::shape_calculator::compute_interleaved_shape(), arm_compute::misc::shape_calculator::compute_mm_shape(), arm_compute::misc::shape_calculator::compute_transpose1xW_with_element_size_shape(), ITensorInfo::data_type(), GEMMInfo::depth_output_gemm3d(), ITensorInfo::dimension(), ActivationLayerInfo::enabled(), arm_compute::F16, arm_compute::F32, GEMMInfo::fixed_format(), GEMMInfo::is_a_reshaped(), GEMMInfo::is_b_reshaped(), arm_compute::test::validation::k, arm_compute::test::validation::m, arm_compute::test::validation::n, GEMMInfo::reinterpret_input_as_3d(), GEMMInfo::reshape_b_only_on_first_run(), arm_compute::SATURATE, ITensorInfo::total_size(), CpuActivation::validate(), CpuAdd::validate(), CpuGemmInterleave4x4Kernel::validate(), CpuGemmMatrixAdditionKernel::validate(), CpuGemmMatrixMultiplyKernel::validate(), CpuGemmTranspose1xWKernel::validate(), and CpuGemmAssemblyDispatch::validate().

Referenced by NEGEMM::configure(), and NEGEMM::validate().

{
    ARM_COMPUTE_UNUSED(alpha);
    const bool is_c_bias = gemm_info.reshape_b_only_on_first_run();

    ARM_COMPUTE_RETURN_ERROR_ON_CPU_F16_UNSUPPORTED(a);
    ARM_COMPUTE_RETURN_ERROR_ON_CPU_BF16_UNSUPPORTED(a);
    ARM_COMPUTE_RETURN_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN(a, 1, DataType::BFLOAT16, DataType::F16, DataType::F32);
    ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES(a, b);
    ARM_COMPUTE_RETURN_ERROR_ON_MSG(a->dimension(0) != b->dimension(1), "The product AB is defined only if the number of columns in A is equal to the number of rows in B");
    ARM_COMPUTE_RETURN_ERROR_ON_MSG(gemm_info.is_a_reshaped(), "Matrix A already reshaped is not supported");
    ARM_COMPUTE_RETURN_ERROR_ON_MSG(gemm_info.is_b_reshaped(), "Matrix B already reshaped is not supported");
    if(a->data_type() != DataType::BFLOAT16)
    {
        ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES(a, d);
    }

    if(c != nullptr && !is_c_bias)
    {
        ARM_COMPUTE_RETURN_ERROR_ON(gemm_info.depth_output_gemm3d() != 0);
        ARM_COMPUTE_RETURN_ERROR_ON(gemm_info.reinterpret_input_as_3d());
        ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES(c, d);
        ARM_COMPUTE_RETURN_ERROR_ON_MSG(a->dimension(1) != c->dimension(1), "The C matrix must have the same number of rows as the matrix A");
        ARM_COMPUTE_RETURN_ERROR_ON_MSG(b->dimension(0) != c->dimension(0), "The C matrix must have the same number of columns as the matrix B");
    }

    if(d->total_size() != 0)
    {
        // For fixed format we are expecting some kind of blocked format for B/RHS so the dimension won't necessarily match the result matrix any more.
        ARM_COMPUTE_RETURN_ERROR_ON(!gemm_info.fixed_format() && b->dimension(0) != d->dimension(0));
        if(gemm_info.depth_output_gemm3d() != 0)
        {
            if(gemm_info.reinterpret_input_as_3d())
            {
                ARM_COMPUTE_RETURN_ERROR_ON(a->dimension(1) != d->dimension(1));
                ARM_COMPUTE_RETURN_ERROR_ON(a->dimension(2) != d->dimension(2));
            }
            else
            {
                ARM_COMPUTE_RETURN_ERROR_ON(a->dimension(1) != d->dimension(1) * d->dimension(2));
            }
        }
        else
        {
            ARM_COMPUTE_RETURN_ERROR_ON(a->dimension(1) != d->dimension(1));
        }
    }

    // Check if we need to run the optimized assembly kernel
    cpu::AsmGemmInfo asm_info      = init_assembly_metadata(gemm_info);
    const bool       run_optimised = bool(cpu::CpuGemmAssemblyDispatch::validate(a, b, is_c_bias ? c : nullptr, d, asm_info));

    if(!run_optimised)
    {
        ARM_COMPUTE_RETURN_ERROR_ON_MSG(gemm_info.reinterpret_input_as_3d(), "CpuGemm cannot reinterpret the input tensor as 3D");
        ARM_COMPUTE_RETURN_ERROR_ON_MSG(gemm_info.depth_output_gemm3d() != 0, "CpuGemm cannot reinterpret the output tensor as 3D");

        // Check if the first input tensor is a vector.
        const bool run_vector_matrix_multiplication = a->dimension(1) < 2;
        // Check if we need to reshape the matrix A and matrix B
        const bool run_interleave_transpose = !run_vector_matrix_multiplication && !(gemm_info.reshape_b_only_on_first_run());

        // Arguments used by GEMMReshapeInfo
        // If we pass the matrix A and matrix B reshaped to CpuGemmMatrixMultiplyKernel, we need to pass m, n, k, mult_transpose1xW_width and mult_interleave4x4_height to GEMMReshapeInfo
        // in order to know how the matrices have been reshaped
        const int m                         = a->dimension(1);
        const int n                         = b->dimension(0);
        const int k                         = a->dimension(0);
        int       mult_transpose1xW_width   = 1;
        int       mult_interleave4x4_height = 1;

        const GEMMReshapeInfo reshape_info = GEMMReshapeInfo(m, n, k, mult_transpose1xW_width, mult_interleave4x4_height, gemm_info.depth_output_gemm3d());

        const ITensorInfo *matrix_a_info = a;
        const ITensorInfo *matrix_b_info = b;

        TensorInfo tmp_a_info{};
        TensorInfo tmp_b_info{};
        TensorInfo tmp_output_info = *d->clone();

        if(run_interleave_transpose)
        {
            matrix_a_info = &tmp_a_info;
            matrix_b_info = &tmp_b_info;

            // Validate interleave kernel
            auto_init_if_empty(tmp_a_info, a->clone()->set_tensor_shape(compute_interleaved_shape(*a, mult_interleave4x4_height, gemm_info.reinterpret_input_as_3d())));
            ARM_COMPUTE_RETURN_ON_ERROR(cpu::kernels::CpuGemmInterleave4x4Kernel::validate(a, &tmp_a_info));

            // Validate transpose kernel
            auto_init_if_empty(tmp_b_info, b->clone()->set_tensor_shape(compute_transpose1xW_with_element_size_shape(*b, mult_transpose1xW_width)));
            ARM_COMPUTE_RETURN_ON_ERROR(cpu::kernels::CpuGemmTranspose1xWKernel::validate(b, &tmp_b_info));
        }

        // Validate matrix multiply
        auto_init_if_empty(tmp_output_info, matrix_a_info->clone()->set_tensor_shape(compute_mm_shape(*matrix_a_info, *matrix_b_info, run_interleave_transpose, reshape_info)));
        ARM_COMPUTE_RETURN_ON_ERROR(cpu::kernels::CpuGemmMatrixMultiplyKernel::validate(matrix_a_info, matrix_b_info, &tmp_output_info, alpha, run_interleave_transpose, reshape_info));

        if(c != nullptr && gemm_info.reshape_b_only_on_first_run())
        {
            ARM_COMPUTE_RETURN_ON_ERROR(cpu::CpuAdd::validate(&tmp_output_info, c, d, ConvertPolicy::SATURATE));
        }
    }

    // Validate matrix addition kernel
    if(beta != 0 && c != nullptr && !is_c_bias)
    {
        ARM_COMPUTE_RETURN_ON_ERROR(cpu::kernels::CpuGemmMatrixAdditionKernel::validate(c, d, beta));
    }

    // Validate activation
    const ActivationLayerInfo &activation = gemm_info.activation_info();
    if(activation.enabled())
    {
        ARM_COMPUTE_RETURN_ON_ERROR(cpu::CpuActivation::validate(d, nullptr, activation));
    }

    return Status{};
}
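Since validate() returns a Status instead of throwing, a caller would typically gate configuration on it. A sketch, reusing the infos and gemm_info from the earlier examples:

    const Status status = cpu::CpuGemm::validate(&a_info, &b_info, nullptr, &d_info, 1.f, 0.f, gemm_info);
    if(!bool(status))
    {
        // Configuration is unsupported; error_description() explains why.
        std::cerr << "CpuGemm: " << status.error_description() << std::endl; // needs <iostream>
        return;
    }
    gemm.configure(&a_info, &b_info, nullptr, &d_info, 1.f, 0.f, gemm_info);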

◆ workspace()

experimental::MemoryRequirements workspace ( ) const
override virtual

Return the memory requirements required by the workspace.

Reimplemented from INEOperator.

Definition at line 366 of file CpuGemm.cpp.

{
    return _aux_mem;
}
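experimental::MemoryRequirements is a vector of MemoryInfo entries (slot, lifetime, size, alignment). A sketch of caller-managed workspace, importing each requirement into the run-time pack from the earlier examples; the allocation strategy here is an assumption, not the library's only option (needs <memory> and <vector>):

    std::vector<std::unique_ptr<Tensor>> aux_tensors;
    for(const experimental::MemoryInfo &req : gemm.workspace())
    {
        auto aux = std::make_unique<Tensor>();
        // A raw byte buffer is enough; pad by the requested alignment.
        aux->allocator()->init(TensorInfo(TensorShape(req.size + req.alignment), 1, DataType::U8));
        aux->allocator()->allocate();
        pack.add_tensor(req.slot, aux.get()); // slot matches what run()/prepare() look up
        aux_tensors.emplace_back(std::move(aux));
    }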

The documentation for this class was generated from the following files:

  • CpuGemm.h
  • CpuGemm.cpp