Compute Library 21.02
NEGEMM Class Reference

Basic function to execute GEMM on Neon.

#include <NEGEMM.h>


Public Member Functions

 	NEGEMM (std::shared_ptr< IMemoryManager > memory_manager=nullptr, IWeightsManager *weights_manager=nullptr)
 	Constructor.
 
 	NEGEMM (const NEGEMM &)=delete
 	Prevent instances of this class from being copied (as this class contains pointers).
 
 	NEGEMM (NEGEMM &&)=default
 	Default move constructor.
 
NEGEMM & 	operator= (const NEGEMM &)=delete
 	Prevent instances of this class from being copied (as this class contains pointers).
 
NEGEMM & 	operator= (NEGEMM &&)=default
 	Default move assignment operator.
 
 	~NEGEMM ()
 	Default destructor.
 
void 	configure (const ITensor *a, const ITensor *b, const ITensor *c, ITensor *d, float alpha, float beta, const GEMMInfo &gemm_info=GEMMInfo())
 	Initialise the kernel's inputs and output.
 
void 	run () override
 	Run the kernels contained in the function.
 
void 	prepare () override
 	Prepare the function for executing.
 
- Public Member Functions inherited from IFunction
virtual 	~IFunction ()=default
 	Destructor.
 

Static Public Member Functions

static Status validate (const ITensorInfo *a, const ITensorInfo *b, const ITensorInfo *c, const ITensorInfo *output, float alpha, float beta, const GEMMInfo &gemm_info=GEMMInfo())
 	Static function to check if given info will lead to a valid configuration of NEGEMM.
 

Detailed Description

Basic function to execute GEMM on Neon.

This function calls the following Neon kernels:

If optimized assembly is available:

  1. NEGEMMAssemblyDispatch
  2. NEActivationLayer (if alpha != 1.0)

Else:

  3. NEGEMMInterleave4x4Kernel (if the output tensor is a matrix)
  4. NEGEMMTranspose1xWKernel (if the output tensor is a matrix)
  5. NEGEMMMatrixMultiplyKernel

In both cases:

  6. NEGEMMMatrixAdditionKernel (if c != nullptr and beta != 0.0 and is not reshaped once)

Else:

  7. NEArithmeticAddition (if c != nullptr and is reshaped once and not optimized assembly in place)

  8. NEActivationLayer (if activation is specified in GEMMInfo)

Definition at line 62 of file NEGEMM.h.
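For orientation, here is a minimal usage sketch (not part of the generated reference; the shapes, the F32 data type and the alpha/beta values are illustrative assumptions):

    #include "arm_compute/core/Types.h"
    #include "arm_compute/runtime/NEON/functions/NEGEMM.h"
    #include "arm_compute/runtime/Tensor.h"

    using namespace arm_compute;

    int main()
    {
        // Computes D = alpha * A * B + beta * C with M = 3, N = 5, K = 4.
        // TensorShape is given as (width, height), i.e. (columns, rows).
        Tensor a, b, c, d;
        a.allocator()->init(TensorInfo(TensorShape(4U, 3U), 1, DataType::F32)); // A: 3 x 4
        b.allocator()->init(TensorInfo(TensorShape(5U, 4U), 1, DataType::F32)); // B: 4 x 5
        c.allocator()->init(TensorInfo(TensorShape(5U, 3U), 1, DataType::F32)); // C: 3 x 5
        d.allocator()->init(TensorInfo(TensorShape(5U, 3U), 1, DataType::F32)); // D: 3 x 5

        NEGEMM gemm;
        gemm.configure(&a, &b, &c, &d, 1.0f, 1.0f); // alpha = beta = 1

        // Allocate backing memory only after all functions are configured.
        a.allocator()->allocate();
        b.allocator()->allocate();
        c.allocator()->allocate();
        d.allocator()->allocate();

        // ... fill a, b and c with data ...

        gemm.run(); // dispatches the kernels listed above
        return 0;
    }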

Constructor & Destructor Documentation

◆ NEGEMM() [1/3]

NEGEMM ( std::shared_ptr< IMemoryManager >  memory_manager = nullptr,
         IWeightsManager *                  weights_manager = nullptr 
)

Constructor.

Definition at line 63 of file NEGEMM.cpp.

64  : _memory_group(memory_manager), _weights_manager(weights_manager), _interleave_kernel(), _transpose_kernel(), _mm_kernel(), _asm_glue(std::make_unique<NEGEMMAssemblyDispatch>()), _ma_kernel(),
65  _alpha_scale_func(nullptr), _add_bias(), _activation_func(), _tmp_a(), _tmp_b(), _tmp_d(), _original_b(nullptr), _run_vector_matrix_multiplication(false), _run_alpha_scale(false),
66  _run_addition(false), _run_bias_addition(false), _run_activation(false), _reshape_b_only_on_first_run(false), _is_prepared(false)
67 {
68 }

◆ NEGEMM() [2/3]

NEGEMM ( const NEGEMM &  )
delete

Prevent instances of this class from being copied (As this class contains pointers)

◆ NEGEMM() [3/3]

NEGEMM ( NEGEMM &&  )
default

Default move constructor.

◆ ~NEGEMM()

~NEGEMM ( )
default

Default destructor.

Member Function Documentation

◆ configure()

void configure ( const ITensor *  a,
                 const ITensor *  b,
                 const ITensor *  c,
                 ITensor *        d,
                 float            alpha,
                 float            beta,
                 const GEMMInfo & gemm_info = GEMMInfo() 
)

Initialise the kernel's inputs and output.

Note
GEMM: General Matrix Multiply - [alpha * A * B + beta * C].
GEMM: The tensors a, b, c, d must have the same data type. You should not mix data types when calling this function.
Parameters
    [in]  a          First input tensor (Matrix A or Vector A). Data type supported: BFLOAT16/F16/F32
    [in]  b          Second input tensor (Matrix B). Data type supported: same as a
    [in]  c          Third input tensor (Matrix C). It can be a nullptr if just the multiplication between a and b is needed. Data type supported: same as a
    [out] d          Output tensor. Data type supported: same as a
    [in]  alpha      Weight of the matrix product
    [in]  beta       Weight of matrix C
    [in]  gemm_info  (Optional) Specifies if the matrix A and/or matrix B have been reshaped and if the reshape of matrix B should happen only for the first run
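As an illustration of the optional parameters (a sketch reusing the tensors a, b and d from the example in the detailed description; the GEMMInfo flags shown are an assumption about typical use, not prescribed values):

    // Plain product D = A * B: no accumulator term, so pass nullptr for c and beta = 0.
    NEGEMM gemm_ab;
    gemm_ab.configure(&a, &b, nullptr, &d, 1.0f, 0.0f);

    // If B holds constant weights, ask for its reshape to happen only on the first run;
    // the one-off work is then done in prepare() rather than on every run().
    const GEMMInfo info(false /* is_a_reshaped */, false /* is_b_reshaped */, true /* reshape_b_only_on_first_run */);
    NEGEMM gemm_const_b;
    gemm_const_b.configure(&a, &b, nullptr, &d, 1.0f, 0.0f, info);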

Definition at line 72 of file NEGEMM.cpp.

References GEMMInfo::activation_info(), TensorAllocator::allocate(), Tensor::allocator(), ARM_COMPUTE_ERROR_ON, ARM_COMPUTE_ERROR_THROW_ON, arm_compute::test::validation::b, ICloneable< T >::clone(), NEActivationLayer::configure(), NEArithmeticAddition::configure(), arm_compute::data_size_from_type(), ITensorInfo::data_type(), ITensorInfo::dimension(), ActivationLayerInfo::enabled(), ITensor::info(), TensorAllocator::init(), NEGEMMAssemblyDispatch::is_activation_supported(), ActivationLayerInfo::LINEAR, MemoryGroup::manage(), GEMMInfo::reshape_b_only_on_first_run(), arm_compute::SATURATE, TensorShape::set(), ITensorInfo::tensor_shape(), NEGEMMAssemblyDispatch::validate(), and NEGEMM::validate().

Referenced by NERNNLayer::configure(), NEWinogradConvolutionLayer::configure(), and NELSTMLayer::configure().

73 {
74  ARM_COMPUTE_ERROR_THROW_ON(NEGEMM::validate(a->info(), b->info(), (c != nullptr) ? c->info() : nullptr, d->info(), alpha, beta, gemm_info));
75 
76  const AsmGemmInfo asm_info = init_assembly_metadata(gemm_info);
77  const bool is_c_bias = gemm_info.reshape_b_only_on_first_run();
78  bool run_optimised = bool(NEGEMMAssemblyDispatch::validate(a->info(), b->info(), (is_c_bias && c != nullptr) ? c->info() : nullptr, d->info(), asm_info));
79 
80  // Check if we need to reshape the matrix B only on the first run
81  _is_prepared = false;
82  _reshape_b_only_on_first_run = gemm_info.reshape_b_only_on_first_run();
83  _run_vector_matrix_multiplication = a->info()->dimension(1) < 2;
84  _original_b = b;
85  _run_alpha_scale = alpha != 1.f;
86  _run_bias_addition = c != nullptr && gemm_info.reshape_b_only_on_first_run();
87  _run_addition = beta != 0 && c != nullptr && !gemm_info.reshape_b_only_on_first_run();
88  _run_activation = gemm_info.activation_info().enabled() && (!run_optimised || (run_optimised && !NEGEMMAssemblyDispatch::is_activation_supported(gemm_info.activation_info())));
89 
90  if(run_optimised)
91  {
92  const ITensor *c_to_use = is_c_bias ? c : nullptr;
93  _asm_glue->configure(a, b, c_to_use, d, asm_info);
94  ARM_COMPUTE_ERROR_ON(!_asm_glue->is_configured());
95 
96  // Scale product by alpha
97  if(_run_alpha_scale)
98  {
99  _alpha_scale_func.configure(d, nullptr, ActivationLayerInfo(ActivationLayerInfo::ActivationFunction::LINEAR, alpha, 0.f));
100  }
101  }
102  else
103  {
104  // Pick output tensor in case bias addition should be performed
105  ITensor *gemm_output_to_use = d;
106  if(_run_bias_addition)
107  {
108  gemm_output_to_use = &_tmp_d;
109  _memory_group.manage(&_tmp_d);
110  }
111 
112  _mm_kernel = std::make_unique<NEGEMMMatrixMultiplyKernel>();
113 
114  // Select between GEMV and GEMM
115  if(_run_vector_matrix_multiplication)
116  {
117  // Configure the matrix multiply kernel
118  _mm_kernel->configure(a, b, gemm_output_to_use, alpha, false);
119  }
120  else
121  {
122  TensorShape shape_tmp_a = a->info()->tensor_shape();
123  TensorShape shape_tmp_b = b->info()->tensor_shape();
124 
125  shape_tmp_a.set(0, a->info()->dimension(0) * 4);
126  shape_tmp_a.set(1, std::ceil(a->info()->dimension(1) / 4.0f));
127 
128  const unsigned int transpose_w = 16 / data_size_from_type(b->info()->data_type());
129  shape_tmp_b.set(0, b->info()->dimension(1) * transpose_w);
130  shape_tmp_b.set(1, std::ceil(b->info()->dimension(0) / static_cast<float>(transpose_w)));
131 
132  TensorInfo info_a = a->info()->clone()->set_tensor_shape(shape_tmp_a).set_is_resizable(true);
133  TensorInfo info_b = b->info()->clone()->set_tensor_shape(shape_tmp_b).set_is_resizable(true);
134 
135  _tmp_a.allocator()->init(info_a);
136  _tmp_b.allocator()->init(info_b);
137 
138  // Manage intermediate buffers
139  _memory_group.manage(&_tmp_a);
140  if(!_reshape_b_only_on_first_run)
141  {
142  _memory_group.manage(&_tmp_b);
143  }
144 
145  int m = a->info()->dimension(1);
146  int n = b->info()->dimension(0);
147  int k = a->info()->dimension(0);
148 
149  // Configure interleave kernel
150  _interleave_kernel = std::make_unique<NEGEMMInterleave4x4Kernel>();
151  _interleave_kernel->configure(a, &_tmp_a);
152 
153  // Configure transpose kernel
154  _transpose_kernel = std::make_unique<NEGEMMTranspose1xWKernel>();
155  _transpose_kernel->configure(b, &_tmp_b);
156 
157  // Configure matrix multiplication kernel
158  _mm_kernel->configure(&_tmp_a, &_tmp_b, gemm_output_to_use, alpha, true, GEMMReshapeInfo(m, n, k));
159 
160  // Allocate once all the configure methods have been called
161  _tmp_a.allocator()->allocate();
162  if(!_reshape_b_only_on_first_run)
163  {
164  _tmp_b.allocator()->allocate();
165  }
166  }
167 
168  if(_run_bias_addition)
169  {
170  _add_bias.configure(gemm_output_to_use, c, d, ConvertPolicy::SATURATE);
171  _tmp_d.allocator()->allocate();
172  }
173  }
174 
175  // Configure matrix addition kernel
176  if(_run_addition)
177  {
178  _ma_kernel = std::make_unique<NEGEMMMatrixAdditionKernel>();
179  _ma_kernel->configure(c, d, beta);
180  }
181 
182  // Configure activation
183  const ActivationLayerInfo &activation = gemm_info.activation_info();
184  if(_run_activation)
185  {
186  _activation_func.configure(d, nullptr, activation);
187  }
188 }

◆ operator=() [1/2]

NEGEMM & operator= ( const NEGEMM &  )
delete

Prevent instances of this class from being copied (As this class contains pointers)

◆ operator=() [2/2]

NEGEMM& operator= ( NEGEMM &&  )
default

Default move assignment operator.

◆ prepare()

void prepare ( )
overridevirtual

Prepare the function for executing.

Any one-off pre-processing step required by the function is handled here.

Note
Prepare stage might not need all the function's buffers' backing memory to be available in order to execute
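A sketch of calling prepare() explicitly so that the one-off work (e.g. reshaping matrix B) is paid before a latency-sensitive loop; num_batches and the tensor refresh are hypothetical:

    gemm.prepare(); // one-off work happens here

    for(int i = 0; i < num_batches; ++i)
    {
        // ... refresh the contents of the input tensors ...
        gemm.run(); // run() itself calls prepare() on first use if it has not been called
    }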

Reimplemented from IFunction.

Definition at line 359 of file NEGEMM.cpp.

References TensorAllocator::allocate(), Tensor::allocator(), IWeightsManager::are_weights_managed(), ARM_COMPUTE_ERROR_ON, Window::DimY, Scheduler::get(), ITensor::is_used(), ITensor::mark_as_unused(), and IScheduler::schedule().

Referenced by NERNNLayer::prepare(), NEWinogradConvolutionLayer::prepare(), NEFullyConnectedLayer::prepare(), NEGEMMConvolutionLayer::prepare(), and NEGEMM::run().

360 {
361  if(!_is_prepared)
362  {
363  const bool original_b_managed_by_weights_manager = _weights_manager && _weights_manager->are_weights_managed(_original_b);
364  if(_asm_glue->is_configured())
365  {
366  if(!original_b_managed_by_weights_manager)
367  {
368  ARM_COMPUTE_ERROR_ON(!_original_b->is_used());
369  }
370 
371  _asm_glue->prepare();
372  if(!original_b_managed_by_weights_manager)
373  {
374  _original_b->mark_as_unused();
375  }
376  }
377  else if(_reshape_b_only_on_first_run && !_run_vector_matrix_multiplication && !_asm_glue->is_configured())
378  {
379  if(!original_b_managed_by_weights_manager)
380  {
381  ARM_COMPUTE_ERROR_ON(!_original_b->is_used());
382  }
383 
384  _tmp_b.allocator()->allocate();
385  NEScheduler::get().schedule(_transpose_kernel.get(), Window::DimY);
386  if(!original_b_managed_by_weights_manager)
387  {
388  _original_b->mark_as_unused();
389  }
390  }
391 
392  _is_prepared = true;
393  }
394 }

◆ run()

void run ( )
overridevirtual

Run the kernels contained in the function.

For Neon kernels:

  • Multi-threading is used for the kernels which are parallelisable.
  • By default std::thread::hardware_concurrency() threads are used.
Note
CPPScheduler::set_num_threads() can be used to manually set the number of threads

For OpenCL kernels:

  • All the kernels are enqueued on the queue associated with CLScheduler.
  • The queue is then flushed.
Note
The function will not block until the kernels are executed. It is the user's responsibility to wait.
Will call prepare() on the first run if it hasn't been done already.
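For example, the thread count can be capped through the scheduler singleton before calling run() (a sketch; the choice of two threads is arbitrary):

    #include "arm_compute/runtime/Scheduler.h"

    // Limit the threads used by parallelisable Neon kernels, then execute.
    Scheduler::get().set_num_threads(2);
    gemm.run();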

Implements IFunction.

Definition at line 309 of file NEGEMM.cpp.

References Window::DimX, Window::DimY, Scheduler::get(), NEGEMM::prepare(), NEActivationLayer::run(), NEArithmeticAddition::run(), and IScheduler::schedule().

Referenced by NERNNLayer::run(), NEWinogradConvolutionLayer::run(), NELSTMLayer::run(), NEFullyConnectedLayer::run(), and NEGEMMConvolutionLayer::run().

310 {
311  prepare();
312 
313  MemoryGroupResourceScope scope_mg(_memory_group);
314 
315  if(_asm_glue->is_configured())
316  {
317  _asm_glue->run();
318  if(_run_alpha_scale)
319  {
320  _alpha_scale_func.run();
321  }
322  }
323  else
324  {
325  if(!_run_vector_matrix_multiplication)
326  {
327  // Run interleave kernel
328  NEScheduler::get().schedule(_interleave_kernel.get(), Window::DimY);
329 
330  if(!_reshape_b_only_on_first_run)
331  {
332  // Run transpose kernel
333  NEScheduler::get().schedule(_transpose_kernel.get(), Window::DimY);
334  }
335  }
336 
337  NEScheduler::get().schedule(_mm_kernel.get(), _run_vector_matrix_multiplication ? Window::DimX : Window::DimY);
338 
339  // Run bias addition kernel
340  if(_run_bias_addition)
341  {
342  _add_bias.run();
343  }
344  }
345 
346  // Run matrix addition kernel
347  if(_run_addition)
348  {
349  NEScheduler::get().schedule(_ma_kernel.get(), Window::DimY);
350  }
351 
352  // Run activation function
353  if(_run_activation)
354  {
355  _activation_func.run();
356  }
357 }

◆ validate()

static Status validate ( const ITensorInfo *  a,
                         const ITensorInfo *  b,
                         const ITensorInfo *  c,
                         const ITensorInfo *  output,
                         float                alpha,
                         float                beta,
                         const GEMMInfo &     gemm_info = GEMMInfo() 
)

Static function to check if given info will lead to a valid configuration of NEGEMM.

Parameters
    [in]  a          First input tensor info (Matrix or Vector A). Data types supported: BFLOAT16/F16/F32
    [in]  b          Second input tensor info (Matrix B). Data type supported: same as a.
    [in]  c          Third input tensor info (Matrix C). It can be a nullptr if just the multiplication between a and b is needed. Data type supported: same as a.
    [out] output     Output tensor info. Data type supported: same as a
    [in]  alpha      Weight of the matrix product
    [in]  beta       Weight of matrix C
    [in]  gemm_info  (Optional) Specifies if the matrix A and/or matrix B have been reshaped and if the reshape of matrix B should happen only for the first run

Returns
    a status
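Typical use is to gate configuration on the returned Status (a sketch; the error-handling style is the caller's choice):

    #include <iostream>

    const Status status = NEGEMM::validate(a.info(), b.info(), nullptr, d.info(), 1.0f, 0.0f);
    if(bool(status))
    {
        NEGEMM gemm;
        gemm.configure(&a, &b, nullptr, &d, 1.0f, 0.0f);
    }
    else
    {
        std::cerr << "NEGEMM rejected the configuration: " << status.error_description() << std::endl;
    }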

Definition at line 190 of file NEGEMM.cpp.

References GEMMInfo::activation_info(), ARM_COMPUTE_RETURN_ERROR_ON, ARM_COMPUTE_RETURN_ERROR_ON_CPU_BF16_UNSUPPORTED, ARM_COMPUTE_RETURN_ERROR_ON_CPU_F16_UNSUPPORTED, ARM_COMPUTE_RETURN_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN, ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES, ARM_COMPUTE_RETURN_ERROR_ON_MSG, ARM_COMPUTE_RETURN_ON_ERROR, ARM_COMPUTE_UNUSED, arm_compute::auto_init_if_empty(), arm_compute::test::validation::b, arm_compute::BFLOAT16, ICloneable< T >::clone(), TensorInfo::clone(), arm_compute::misc::shape_calculator::compute_interleaved_shape(), arm_compute::misc::shape_calculator::compute_mm_shape(), arm_compute::misc::shape_calculator::compute_transpose1xW_with_element_size_shape(), ITensorInfo::data_type(), GEMMInfo::depth_output_gemm3d(), ITensorInfo::dimension(), ActivationLayerInfo::enabled(), arm_compute::F16, arm_compute::F32, GEMMInfo::is_a_reshaped(), GEMMInfo::is_b_reshaped(), GEMMInfo::reinterpret_input_as_3d(), GEMMInfo::reshape_b_only_on_first_run(), arm_compute::SATURATE, ITensorInfo::total_size(), NEGEMMMatrixAdditionKernel::validate(), NEGEMMMatrixMultiplyKernel::validate(), NEActivationLayer::validate(), NEGEMMInterleave4x4Kernel::validate(), NEArithmeticAddition::validate(), NEGEMMTranspose1xWKernel::validate(), and NEGEMMAssemblyDispatch::validate().

Referenced by NEGEMM::configure(), and NELSTMLayer::validate().

191 {
192  ARM_COMPUTE_UNUSED(alpha);
193  const bool is_c_bias = gemm_info.reshape_b_only_on_first_run();
194 
195  ARM_COMPUTE_RETURN_ERROR_ON_CPU_BF16_UNSUPPORTED(a);
196  ARM_COMPUTE_RETURN_ERROR_ON_CPU_F16_UNSUPPORTED(a);
197  ARM_COMPUTE_RETURN_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN(a, 1, DataType::BFLOAT16, DataType::F16, DataType::F32);
198  ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES(a, b);
199  ARM_COMPUTE_RETURN_ERROR_ON_MSG(a->dimension(0) != b->dimension(1), "The product AB is defined only if the number of columns in A is equal to the number of rows in B");
200  ARM_COMPUTE_RETURN_ERROR_ON_MSG(gemm_info.is_a_reshaped(), "Matrix A already reshaped is not supported");
201  ARM_COMPUTE_RETURN_ERROR_ON_MSG(gemm_info.is_b_reshaped(), "Matrix B already reshaped is not supported");
202  if(a->data_type() != DataType::BFLOAT16)
203  {
204  ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES(a, output);
205  }
206 
207  if(c != nullptr && !is_c_bias)
208  {
209  ARM_COMPUTE_RETURN_ERROR_ON(gemm_info.depth_output_gemm3d() != 0);
210  ARM_COMPUTE_RETURN_ERROR_ON(gemm_info.reinterpret_input_as_3d());
211  ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES(c, output);
212  ARM_COMPUTE_RETURN_ERROR_ON_MSG(a->dimension(1) != c->dimension(1), "The C matrix must have the same number of rows as the matrix A");
213  ARM_COMPUTE_RETURN_ERROR_ON_MSG(b->dimension(0) != c->dimension(0), "The C matrix must have the same number of columns as the matrix B");
214  }
215 
216  if(output->total_size() != 0)
217  {
218  ARM_COMPUTE_RETURN_ERROR_ON(b->dimension(0) != output->dimension(0));
219  if(gemm_info.depth_output_gemm3d() != 0)
220  {
221  if(gemm_info.reinterpret_input_as_3d())
222  {
223  ARM_COMPUTE_RETURN_ERROR_ON(a->dimension(1) != output->dimension(1));
224  ARM_COMPUTE_RETURN_ERROR_ON(a->dimension(2) != output->dimension(2));
225  }
226  else
227  {
228  ARM_COMPUTE_RETURN_ERROR_ON(a->dimension(1) != output->dimension(1) * output->dimension(2));
229  }
230  }
231  else
232  {
233  ARM_COMPUTE_RETURN_ERROR_ON(a->dimension(1) != output->dimension(1));
234  }
235  }
236 
237  // Check if we need to run the optimized assembly kernel
238  AsmGemmInfo asm_info = init_assembly_metadata(gemm_info);
239  const bool run_optimised = bool(NEGEMMAssemblyDispatch::validate(a, b, is_c_bias ? c : nullptr, output, asm_info));
240 
241  if(!run_optimised)
242  {
243  ARM_COMPUTE_RETURN_ERROR_ON_MSG(gemm_info.reinterpret_input_as_3d(), "NEGEMM cannot reinterpret the input tensor as 3D");
244  ARM_COMPUTE_RETURN_ERROR_ON_MSG(gemm_info.depth_output_gemm3d() != 0, "NEGEMM cannot reinterpret the output tensor as 3D");
245 
246  // Check if the first input tensor is a vector.
247  const bool run_vector_matrix_multiplication = a->dimension(1) < 2;
248  // Check if we need to reshape the matrix A and matrix B
249  const bool run_interleave_transpose = !run_vector_matrix_multiplication && !(gemm_info.reshape_b_only_on_first_run());
250 
251  // Arguments used by GEMMReshapeInfo
252  // If we pass the matrix A and matrix B reshaped to NEGEMMMatrixMultiplyKernel, we need to pass m, n, k, mult_transpose1xW_width and mult_interleave4x4_height to NEGEMMReshapeInfo
253  // in order to know how the matrices have been reshaped
254  const int m = a->dimension(1);
255  const int n = b->dimension(0);
256  const int k = a->dimension(0);
257  int mult_transpose1xW_width = 1;
258  int mult_interleave4x4_height = 1;
259 
260  const GEMMReshapeInfo reshape_info = GEMMReshapeInfo(m, n, k, mult_transpose1xW_width, mult_interleave4x4_height, gemm_info.depth_output_gemm3d());
261 
262  const ITensorInfo *matrix_a_info = a;
263  const ITensorInfo *matrix_b_info = b;
264 
265  TensorInfo tmp_a_info{};
266  TensorInfo tmp_b_info{};
267  TensorInfo tmp_output_info = *output->clone();
268 
269  if(run_interleave_transpose)
270  {
271  matrix_a_info = &tmp_a_info;
272  matrix_b_info = &tmp_b_info;
273 
274  // Validate interleave kernel
275  auto_init_if_empty(tmp_a_info, a->clone()->set_tensor_shape(compute_interleaved_shape(*a, mult_interleave4x4_height, gemm_info.reinterpret_input_as_3d())));
276  ARM_COMPUTE_RETURN_ON_ERROR(NEGEMMInterleave4x4Kernel::validate(a, &tmp_a_info));
277 
278  // Validate transpose kernel
279  auto_init_if_empty(tmp_b_info, b->clone()->set_tensor_shape(compute_transpose1xW_with_element_size_shape(*b, mult_transpose1xW_width)));
280  ARM_COMPUTE_RETURN_ON_ERROR(NEGEMMTranspose1xWKernel::validate(b, &tmp_b_info));
281  }
282 
283  // Validate matrix multiply
284  auto_init_if_empty(tmp_output_info, matrix_a_info->clone()->set_tensor_shape(compute_mm_shape(*matrix_a_info, *matrix_b_info, run_interleave_transpose, reshape_info)));
285  ARM_COMPUTE_RETURN_ON_ERROR(NEGEMMMatrixMultiplyKernel::validate(matrix_a_info, matrix_b_info, &tmp_output_info, alpha, run_interleave_transpose, reshape_info));
286 
287  if(c != nullptr && gemm_info.reshape_b_only_on_first_run())
288  {
289  ARM_COMPUTE_RETURN_ON_ERROR(NEArithmeticAddition::validate(&tmp_output_info, c, output, ConvertPolicy::SATURATE));
290  }
291  }
292 
293  // Validate matrix addition kernel
294  if(beta != 0 && c != nullptr && !is_c_bias)
295  {
296  ARM_COMPUTE_RETURN_ON_ERROR(NEGEMMMatrixAdditionKernel::validate(c, output, beta));
297  }
298 
299  // Validate activation
300  const ActivationLayerInfo &activation = gemm_info.activation_info();
301  if(activation.enabled())
302  {
303  ARM_COMPUTE_RETURN_ON_ERROR(NEActivationLayer::validate(output, nullptr, activation));
304  }
305 
306  return Status{};
307 }

The documentation for this class was generated from the following files:

  • NEGEMM.h
  • NEGEMM.cpp