Compute Library 22.08
CpuGemmLowpMatrixMultiplyCore Class Reference

Basic function to execute GEMMLowpMatrixMultiplyCore. More...

#include <CpuGemmLowpMatrixMultiplyCore.h>

Collaboration diagram for CpuGemmLowpMatrixMultiplyCore: [diagram omitted]

Public Member Functions

 CpuGemmLowpMatrixMultiplyCore ()
 Constructor. More...
 
 ARM_COMPUTE_DISALLOW_COPY_ALLOW_MOVE (CpuGemmLowpMatrixMultiplyCore)
 
 ~CpuGemmLowpMatrixMultiplyCore ()
 Destructor. More...
 
void configure (const ITensorInfo *a, const ITensorInfo *b, const ITensorInfo *c, ITensorInfo *dst, const GEMMInfo &gemm_info=GEMMInfo())
 Initialise the kernel's inputs and output. More...
 
void run (ITensorPack &tensors) override
 Run the kernels contained in the function. More...
 
void prepare (ITensorPack &tensors) override
 Prepare the function for executing. More...
 
experimental::MemoryRequirements workspace () const override
 Return the memory requirements of the workspace. More...
 
- Public Member Functions inherited from INEOperator
 INEOperator (IRuntimeContext *ctx=nullptr)
 Constructor. More...
 
 INEOperator (const INEOperator &)=delete
 Prevent instances of this class from being copied (As this class contains pointers) More...
 
 INEOperator (INEOperator &&)=default
 Default move constructor. More...
 
INEOperator & operator= (const INEOperator &)=delete
 Prevent instances of this class from being copied (As this class contains pointers) More...
 
INEOperator & operator= (INEOperator &&)=default
 Default move assignment operator. More...
 
 ~INEOperator ()
 Default destructor. More...
 
- Public Member Functions inherited from IOperator
virtual ~IOperator ()=default
 Destructor. More...
 

Static Public Member Functions

static Status validate (const ITensorInfo *a, const ITensorInfo *b, const ITensorInfo *c, const ITensorInfo *dst, const GEMMInfo &gemm_info=GEMMInfo())
 Static function to check if given info will lead to a valid configuration. More...
 

Detailed Description

Basic function to execute GEMMLowpMatrixMultiplyCore.

This function calls the following kernels if the DOT product instruction is not available:

  1. kernels::CpuGemmInterleave4x4Kernel
  2. kernels::CpuGemmTranspose1xWKernel
  3. kernels::CpuGemmLowpMatrixMultiplyKernel
  4. kernels::CpuGemmLowpOffsetContributionKernel
  5. CpuActivation

otherwise if the DOT product instruction is available:

  1. kernels::CpuGemmLowpOffsetContributionKernel

Definition at line 64 of file CpuGemmLowpMatrixMultiplyCore.h.
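
The operator follows the stateless Cpu operator pattern: it is configured on ITensorInfo objects and executed on an ITensorPack (see configure(), run(), prepare() and workspace() below). The following is a minimal, hedged usage sketch; the shapes, quantization parameters and the internal include path are illustrative assumptions, not sample code shipped with the library.

    // Hedged usage sketch, not official sample code: tensor shapes, quantization
    // parameters and the internal include path are assumptions made for illustration.
    #include "src/cpu/operators/CpuGemmLowpMatrixMultiplyCore.h" // assumed internal header location
    #include "arm_compute/core/ITensorPack.h"
    #include "arm_compute/runtime/Tensor.h"

    void run_gemmlowp_example()
    {
        using namespace arm_compute;

        // A: 32x64 QASYMM8, B: 64x16 QASYMM8, Dst: 32x16 S32 (no fused output stage).
        TensorInfo a_info(TensorShape(64U, 32U), 1, DataType::QASYMM8, QuantizationInfo(0.5f, 10));
        TensorInfo b_info(TensorShape(16U, 64U), 1, DataType::QASYMM8, QuantizationInfo(0.25f, 3));
        TensorInfo dst_info(TensorShape(16U, 32U), 1, DataType::S32);

        // Validate first, then configure on tensor infos only (no backing memory yet).
        if(!bool(cpu::CpuGemmLowpMatrixMultiplyCore::validate(&a_info, &b_info, nullptr, &dst_info)))
        {
            return;
        }
        cpu::CpuGemmLowpMatrixMultiplyCore gemm;
        gemm.configure(&a_info, &b_info, nullptr, &dst_info);

        // Backing tensors are bound only at execution time through an ITensorPack.
        Tensor a, b, dst;
        a.allocator()->init(a_info);
        b.allocator()->init(b_info);
        dst.allocator()->init(dst_info);
        a.allocator()->allocate();
        b.allocator()->allocate();
        dst.allocator()->allocate();

        ITensorPack pack = { { TensorType::ACL_SRC_0, &a },
                             { TensorType::ACL_SRC_1, &b },
                             { TensorType::ACL_DST, &dst } };
        // The auxiliary buffers reported by gemm.workspace() would also need to be
        // allocated and added to the pack under their slot ids (omitted for brevity).
        gemm.prepare(pack);
        gemm.run(pack);
    }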

Constructor & Destructor Documentation

◆ CpuGemmLowpMatrixMultiplyCore()

Constructor.

Definition at line 73 of file CpuGemmLowpMatrixMultiplyCore.cpp.

74  : _asm_glue(std::make_unique<CpuGemmAssemblyDispatch>()),
75  _mm_kernel(),
76  _mtx_a_reshape_kernel(),
77  _mtx_b_reshape_kernel(),
78  _mtx_a_reduction_kernel(),
79  _mtx_b_reduction_kernel(),
80  _offset_contribution_kernel(),
81  _offset_contribution_output_stage_kernel(),
82  _activation_func(),
83  _convert_to_signed_asymm(),
84  _convert_from_signed_asymm(),
85  _vector_sum_col(),
86  _vector_sum_row(),
87  _tmp_a(),
88  _tmp_b(),
89  _mm_result_s32(),
90  _signed_a(),
91  _signed_output(),
92  _a_offset(0),
93  _b_offset(0),
94  _run_vector_matrix_multiplication(false),
95  _assembly_path(false),
96  _fused_assembly_path(false),
97  _reshape_b_only_on_first_run(false),
98  _is_prepared(false),
99  _fuse_output_stage(false),
100  _run_activation(false),
101  _flip_signedness(false),
102  _gemm_info(),
103  _aux_mem(Count)
104 {
105 }

◆ ~CpuGemmLowpMatrixMultiplyCore()

Destructor.

Member Function Documentation

◆ ARM_COMPUTE_DISALLOW_COPY_ALLOW_MOVE()

ARM_COMPUTE_DISALLOW_COPY_ALLOW_MOVE ( CpuGemmLowpMatrixMultiplyCore  )

◆ configure()

void configure (const ITensorInfo *a, const ITensorInfo *b, const ITensorInfo *c, ITensorInfo *dst, const GEMMInfo &gemm_info = GEMMInfo())

Initialise the kernel's inputs and output.

Valid data layouts:

  • NHWC
  • NCHW

Valid data type configurations:

src0            src1                src2  dst
QASYMM8         QASYMM8             S32   QASYMM8
QASYMM8         QSYMM8_PER_CHANNEL  S32   QASYMM8
QASYMM8         QSYMM8              S32   QASYMM8
QASYMM8         QASYMM8             S32   S32
QASYMM8         QSYMM8_PER_CHANNEL  S32   S32
QASYMM8         QSYMM8              S32   S32
QASYMM8_SIGNED  QASYMM8_SIGNED      S32   QASYMM8_SIGNED
QASYMM8_SIGNED  QSYMM8_PER_CHANNEL  S32   QASYMM8_SIGNED
QASYMM8_SIGNED  QSYMM8              S32   QASYMM8_SIGNED
QASYMM8_SIGNED  QASYMM8_SIGNED      S32   S32
QASYMM8_SIGNED  QSYMM8_PER_CHANNEL  S32   S32
QASYMM8_SIGNED  QSYMM8              S32   S32
Note
GEMM_LOWP: low-precision GEMM kernel. This kernel performs the following computations:
  1. Convert a values from QASYMM8 to int32 and add a_offset to each of them.
  2. Convert b values from QASYMM8 to int32 and add b_offset to each of them.
  3. Compute the matrix product of the resulting a * b in int32.
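
For reference, the computation described in the note above can be written as a plain int32 loop. The sketch below is illustrative only (a hypothetical helper, not the library's kernel code); the algebraic expansion in its comments is the standard GEMMLowp identity that motivates the reduction and offset-contribution kernels used by this operator.

    #include <cstdint>
    #include <vector>

    // Computes one output element the way the note describes it: widen to int32,
    // add the quantization offsets, then multiply-accumulate.
    int32_t gemmlowp_reference_element(const std::vector<int32_t> &a_row,   // row m of A, already widened to int32
                                       const std::vector<int32_t> &b_col,   // column n of B, already widened to int32
                                       int32_t a_offset, int32_t b_offset)
    {
        int32_t acc = 0;
        for(size_t k = 0; k < a_row.size(); ++k)
        {
            acc += (a_row[k] + a_offset) * (b_col[k] + b_offset);
        }
        // Expanding the product explains the kernel split used by this operator:
        //   acc = sum_k a*b                  -> matrix multiply kernel / assembly GEMM
        //       + b_offset * sum_k a         -> matrix A reduction (vector_sum_row), needed only when b_offset != 0
        //       + a_offset * sum_k b         -> matrix B reduction (vector_sum_col), needed only when a_offset != 0
        //       + K * a_offset * b_offset    -> constant term
        // The offset-contribution (output-stage) kernel adds the last three terms to the raw product.
        return acc;
    }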
Note
The output type is S32 if gemm_info.gemmlowp_output_stage().type == GEMMLowpOutputStageType::NONE. It is QASYMM8/QASYMM8_SIGNED otherwise.
Parameters
  [in]  a          First input tensor info (Matrix A). Data type supported: QASYMM8/QASYMM8_SIGNED.
  [in]  b          Second input tensor info (Matrix B). Data type supported: QASYMM8/QASYMM8_SIGNED/QSYMM8/QSYMM8_PER_CHANNEL.
  [in]  c          Third input tensor info (Matrix C). It can be a nullptr. Data type supported: S32.
  [out] dst        Output tensor info. Data type supported: S32/QASYMM8/QASYMM8_SIGNED.
  [in]  gemm_info  (Optional) Specifies if the matrix A and/or matrix B have been reshaped and if the reshape of matrix B should be executed only for the first run.
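
As the notes above state, requesting a GEMMLowp output stage other than NONE makes the operator produce a quantized (QASYMM8/QASYMM8_SIGNED) result instead of raw S32. A hedged sketch of building such a GEMMInfo follows; the GEMMLowpOutputStageInfo field names are the ones the author believes the library exposes, and the multiplier/shift values are placeholders that would normally be derived from the input, weight and output scales.

    #include <cstdint>
    #include "arm_compute/core/Types.h"

    // Hedged sketch: request the fused requantization path so dst can be QASYMM8.
    arm_compute::GEMMInfo make_requantizing_gemm_info(int32_t dst_offset)
    {
        using namespace arm_compute;

        GEMMLowpOutputStageInfo stage{};
        stage.type                = GEMMLowpOutputStageType::QUANTIZE_DOWN_FIXEDPOINT;
        stage.gemmlowp_offset     = dst_offset; // zero point of the quantized destination
        stage.gemmlowp_multiplier = 1 << 30;    // placeholder fixed-point multiplier
        stage.gemmlowp_shift      = 1;          // placeholder right shift
        stage.gemmlowp_min_bound  = 0;          // clamp to the QASYMM8 range
        stage.gemmlowp_max_bound  = 255;
        stage.output_data_type    = DataType::QASYMM8;

        GEMMInfo gemm_info{};
        gemm_info.set_gemmlowp_output_stage(stage); // with a non-NONE stage the output becomes QASYMM8/QASYMM8_SIGNED
        return gemm_info;
    }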

Definition at line 108 of file CpuGemmLowpMatrixMultiplyCore.cpp.

References GEMMInfo::activation_info(), ARM_COMPUTE_ERROR, ARM_COMPUTE_ERROR_ON_NULLPTR, ARM_COMPUTE_ERROR_THROW_ON, ARM_COMPUTE_LOG_PARAMS, arm_compute::test::validation::b, ICloneable< T >::clone(), TensorInfo::clone(), arm_compute::misc::shape_calculator::compute_interleaved_shape(), arm_compute::misc::shape_calculator::compute_reductionA_shape(), arm_compute::misc::shape_calculator::compute_reductionB_shape(), arm_compute::misc::shape_calculator::compute_transpose1xW_shape(), ITensorInfo::data_type(), ITensorInfo::dimension(), TensorInfo::dimension(), arm_compute::test::validation::dst, dt, ActivationLayerInfo::enabled(), arm_compute::test::validation::gemm_info, GEMMLowpOutputStageInfo::gemmlowp_max_bound, GEMMLowpOutputStageInfo::gemmlowp_min_bound, GEMMLowpOutputStageInfo::gemmlowp_offset, GEMMInfo::gemmlowp_output_stage(), CpuGemmAssemblyDispatch::is_activation_supported(), arm_compute::is_data_type_quantized_asymmetric(), arm_compute::is_data_type_quantized_per_channel(), arm_compute::NONE, UniformQuantizationInfo::offset, arm_compute::offset_int_vec(), arm_compute::QASYMM8, arm_compute::QASYMM8_SIGNED, ITensorInfo::quantization_info(), TensorInfo::quantization_info(), arm_compute::QUANTIZE_DOWN_FIXEDPOINT, GEMMInfo::reshape_b_only_on_first_run(), arm_compute::S32, arm_compute::S8, UniformQuantizationInfo::scale, GEMMInfo::set_gemmlowp_output_stage(), ITensorInfo::tensor_shape(), TensorInfo::total_size(), GEMMLowpOutputStageInfo::type, arm_compute::U8, QuantizationInfo::uniform(), and CpuGemmLowpMatrixMultiplyCore::validate().

109 {
113 
114  const ITensorInfo *matrix_a = a;
115  const ITensorInfo *matrix_b = b;
116  GEMMInfo info = gemm_info;
117 
118  // Set internal variables
119  _a_offset = a->quantization_info().uniform().offset;
120  _b_offset = b->quantization_info().uniform().offset;
121  _run_vector_matrix_multiplication = a->dimension(1) < 2;
122  _reshape_b_only_on_first_run = info.reshape_b_only_on_first_run();
123  _is_prepared = false;
124  _fused_assembly_path = false;
125  _flip_signedness = is_data_type_quantized_per_channel(b->data_type()) && (a->data_type() == DataType::QASYMM8) && _reshape_b_only_on_first_run;
126  _gemm_info = gemm_info;
127 
128  _asm_glue = std::make_unique<cpu::CpuGemmAssemblyDispatch>();
129 
130  const ITensorInfo *a_to_use = a;
131 
132  // Convert to QASYMM8 -> QASYMM8_SIGNED and back
133  if(_flip_signedness)
134  {
135  const int32_t offset_correction = 128;
136  const DataType dt = DataType::QASYMM8_SIGNED;
137  const UniformQuantizationInfo iqinfo = a_to_use->quantization_info().uniform();
138 
139  _signed_a = a_to_use->clone()->set_data_type(dt).set_quantization_info(QuantizationInfo(iqinfo.scale, iqinfo.offset + offset_correction));
140  _convert_to_signed_asymm = std::make_unique<kernels::CpuConvertQuantizedSignednessKernel>();
141  _convert_to_signed_asymm->configure(a_to_use, &_signed_a);
142  a_to_use = &_signed_a;
143  _a_offset = _signed_a.quantization_info().uniform().offset;
144 
145  const UniformQuantizationInfo oqinfo = dst->quantization_info().uniform();
146  _signed_output = dst->clone()->set_data_type(dt).set_quantization_info(QuantizationInfo(oqinfo.scale, oqinfo.offset - offset_correction));
147 
148  // Output stage correction
149  GEMMLowpOutputStageInfo output_stage_corr = info.gemmlowp_output_stage();
150  output_stage_corr.gemmlowp_offset = _signed_output.quantization_info().uniform().offset;
151  output_stage_corr.gemmlowp_min_bound -= offset_correction;
152  output_stage_corr.gemmlowp_max_bound -= offset_correction;
153  info.set_gemmlowp_output_stage(output_stage_corr);
154 
155  // Update matrix a
156  matrix_a = &_signed_a;
157  }
158 
159  // If GEMMLowpOutputStage != NONE, fuse the offset contribution with the output stage
160  if(info.gemmlowp_output_stage().type != GEMMLowpOutputStageType::NONE)
161  {
162  _fuse_output_stage = true;
163  _mm_result_s32 = TensorInfo(dst->tensor_shape(), 1, DataType::S32);
164  }
165 
166  // Initialize assembly kernel meta-data
167  const cpu::AsmGemmInfo asm_info = init_assembly_metadata(gemm_info);
168 #ifdef __aarch64__
169  switch(a->data_type())
170  {
171  case DataType::QASYMM8:
172  case DataType::QASYMM8_SIGNED:
173  case DataType::U8:
174  case DataType::S8:
175  {
176  if(is_data_type_quantized_asymmetric(a_to_use->data_type()) && info.gemmlowp_output_stage().type == GEMMLowpOutputStageType::QUANTIZE_DOWN_FIXEDPOINT)
177  {
178  auto c_info_to_use = c == nullptr ? nullptr : c;
179  _asm_glue->configure(a_to_use, b, c_info_to_use, dst, asm_info);
180  _fused_assembly_path = _asm_glue->is_configured();
181  }
182  else
183  {
184  auto output_to_use = (_fuse_output_stage ? &_mm_result_s32 : dst);
185  _asm_glue->configure(a_to_use, b, nullptr, output_to_use, asm_info);
186  }
187  _assembly_path = _asm_glue->is_configured();
188  break;
189  }
190  default:
191  {
192  ARM_COMPUTE_ERROR("Datatype not supported");
193  break;
194  }
195  }
196 #endif /* __aarch64__ */
197  if(!(_assembly_path || _run_vector_matrix_multiplication))
198  {
199  matrix_a = &_tmp_a;
200  matrix_b = &_tmp_b;
201 
202  // The interleaved output matrix will have the following shape: [ a_height * 4, ceil(a_width / 4.0f) ]
203  _tmp_a = TensorInfo(compute_interleaved_shape(*a_to_use), 1, a_to_use->data_type(), a_to_use->quantization_info());
204  // The transpose1xW output matrix will have the following shape: [ b_height * 16, ceil(b_width / 16.0f) ]
205  _tmp_b = TensorInfo(compute_transpose1xW_shape(*b), 1, b->data_type(), b->quantization_info());
206 
207  // Configure interleave kernel
208  _mtx_a_reshape_kernel = std::make_unique<kernels::CpuGemmInterleave4x4Kernel>();
209  _mtx_a_reshape_kernel->configure(a_to_use, &_tmp_a);
210 
211  // Configure transpose kernel
212  _mtx_b_reshape_kernel = std::make_unique<kernels::CpuGemmTranspose1xWKernel>();
213  _mtx_b_reshape_kernel->configure(b, &_tmp_b);
214  }
215 
216  if(!_fused_assembly_path)
217  {
218  // Build reduction info
219  const GEMMLowpReductionKernelInfo reduction_info(a_to_use->dimension(0), false, 0, false);
220 
221  // Initialize matrix B reduction kernel only if _a_offset is not equal to 0
222  if(_a_offset != 0)
223  {
224  _vector_sum_col = TensorInfo(compute_reductionA_shape(*b), 1, DataType::S32);
225 
226  // Configure Matrix B reduction kernel
227  _mtx_b_reduction_kernel = std::make_unique<kernels::CpuGemmLowpMatrixBReductionKernel>();
228  _mtx_b_reduction_kernel->configure(b, &_vector_sum_col, reduction_info);
229  }
230 
231  // Initialize Matrix A reduction kernel only if _b_offset is not equal to 0
232  if(_b_offset != 0)
233  {
234  _vector_sum_row = TensorInfo(compute_reductionB_shape(*a_to_use), 1, DataType::S32);
235 
236  // Configure matrix A reduction kernel
237  _mtx_a_reduction_kernel = std::make_unique<kernels::CpuGemmLowpMatrixAReductionKernel>();
238  _mtx_a_reduction_kernel->configure(a_to_use, &_vector_sum_row, reduction_info);
239  }
240 
241  if(_fuse_output_stage)
242  {
243  // Configure matrix multiply kernel
244  if(!_assembly_path)
245  {
246  _mm_kernel = std::make_unique<kernels::CpuGemmLowpMatrixMultiplyKernel>();
247  _mm_kernel->configure(matrix_a, matrix_b, &_mm_result_s32);
248  }
249 
250  _offset_contribution_output_stage_kernel = std::make_unique<kernels::CpuGemmLowpOffsetContributionOutputStageKernel>();
251  _offset_contribution_output_stage_kernel->configure(&_mm_result_s32,
252  _a_offset == 0 ? nullptr : &_vector_sum_col,
253  _b_offset == 0 ? nullptr : &_vector_sum_row, c,
254  _flip_signedness ? &_signed_output : dst,
255  a->dimension(0),
256  _a_offset, _b_offset, info.gemmlowp_output_stage());
257 
258  if(_flip_signedness)
259  {
260  _convert_from_signed_asymm = std::make_unique<kernels::CpuConvertQuantizedSignednessKernel>();
261  _convert_from_signed_asymm->configure(&_signed_output, dst);
262  }
263  }
264  else
265  {
266  // Configure matrix multiply kernel
267  if(!_assembly_path)
268  {
269  _mm_kernel = std::make_unique<kernels::CpuGemmLowpMatrixMultiplyKernel>();
270  _mm_kernel->configure(matrix_a, matrix_b, dst);
271  }
272  // Configure offset contribution kernel
273  _offset_contribution_kernel = std::make_unique<kernels::CpuGemmLowpOffsetContributionKernel>();
274  _offset_contribution_kernel->configure(dst, _a_offset == 0 ? nullptr : &_vector_sum_col, _b_offset == 0 ? nullptr : &_vector_sum_row, a_to_use->dimension(0),
275  _a_offset, _b_offset);
276  }
277  }
278  // Configure activation
279  const ActivationLayerInfo &activation = gemm_info.activation_info();
280  _run_activation = activation.enabled() && (!_assembly_path || !cpu::CpuGemmAssemblyDispatch::is_activation_supported(activation));
281  if(_run_activation)
282  {
283  _activation_func = std::make_unique<CpuActivation>();
284  _activation_func->configure(dst, nullptr, activation);
285  }
286 
287  if(_assembly_path)
288  {
289  auto asm_mem_req = _asm_glue->workspace();
290  _aux_mem[AsmGemmWorkspace] = asm_mem_req[AsmGemmWorkspace];
291  _aux_mem[Pretranspose] = asm_mem_req[Pretranspose];
292  }
293 
294  // Request memory for LHS and RHS reshape matrix
295  _aux_mem[VectorSumCol] = MemoryInfo(offset_int_vec(VectorSumCol), !_fused_assembly_path && _a_offset != 0
296  && _reshape_b_only_on_first_run ?
297  MemoryLifetime::Persistent :
298  MemoryLifetime::Temporary,
299  _vector_sum_col.total_size());
300  _aux_mem[VectorSumRow] = MemoryInfo(offset_int_vec(VectorSumRow), MemoryLifetime::Temporary, _vector_sum_row.total_size());
301  _aux_mem[TmpA] = MemoryInfo(offset_int_vec(TmpA), MemoryLifetime::Temporary, _tmp_a.total_size());
302  _aux_mem[TmpB] = MemoryInfo(offset_int_vec(TmpB), _reshape_b_only_on_first_run ? MemoryLifetime::Persistent : MemoryLifetime::Temporary, _tmp_b.total_size());
303  _aux_mem[MMResultS32] = MemoryInfo(offset_int_vec(MMResultS32), MemoryLifetime::Temporary, _mm_result_s32.total_size());
304  _aux_mem[SignedA] = MemoryInfo(offset_int_vec(SignedA), MemoryLifetime::Temporary, _signed_a.total_size());
305  _aux_mem[SignedOutput] = MemoryInfo(offset_int_vec(SignedOutput), MemoryLifetime::Temporary, _signed_output.total_size());
306 }

◆ prepare()

void prepare (ITensorPack &constants) override

Prepare the function for executing.

Any one-off pre-processing step required by the function is handled here.

Parameters
  [in]  constants  Vector that contains the constant tensors.
Note
The prepare stage might not need all of the function's buffers' backing memory to be available in order to execute.

Reimplemented from INEOperator.

Definition at line 669 of file CpuGemmLowpMatrixMultiplyCore.cpp.

References arm_compute::ACL_DST, arm_compute::ACL_SRC, arm_compute::ACL_SRC_1, Window::DimX, Window::DimY, Scheduler::get(), ITensorPack::get_const_tensor(), ITensorPack::get_tensor(), arm_compute::offset_int_vec(), arm_compute::test::validation::pack, and IScheduler::schedule_op().

Referenced by CpuGemmLowpMatrixMultiplyCore::run().

670 {
671  if(!_is_prepared)
672  {
673  auto original_b = tensors.get_const_tensor(TensorType::ACL_SRC_1);
674  // Run assembly reshape
675  if(_asm_glue->is_configured())
676  {
677  _asm_glue->prepare(tensors);
678  }
679  // Run non-assembly reshape
680  else if(_reshape_b_only_on_first_run && !_run_vector_matrix_multiplication && !_asm_glue->is_configured())
681  {
682  // Run reshape kernel and mark original weights tensor as unused
683  ITensor *tmp_b_p = utils::cast::polymorphic_downcast<ITensor *>(tensors.get_tensor(offset_int_vec(TmpB)));
684  CpuAuxTensorHandler tmp_b(_tmp_b, *tmp_b_p);
685  ITensorPack pack =
686  {
687  { TensorType::ACL_SRC, original_b },
688  { TensorType::ACL_DST, tmp_b.get() }
689  };
690  NEScheduler::get().schedule_op(_mtx_b_reshape_kernel.get(), Window::DimY, _mtx_b_reshape_kernel->window(), pack);
691  }
692 
693  // Run matrix B reduction kernel only if _a_offset is not equal to 0
694  if(!_fused_assembly_path && _a_offset != 0 && _reshape_b_only_on_first_run)
695  {
696  ITensor *vector_sum_col_p = utils::cast::polymorphic_downcast<ITensor *>(tensors.get_tensor(offset_int_vec(VectorSumCol)));
697  CpuAuxTensorHandler vector_sum_col(_vector_sum_col, *vector_sum_col_p);
698  ITensorPack pack =
699  {
700  { TensorType::ACL_SRC, original_b },
701  { TensorType::ACL_DST, vector_sum_col.get() }
702  };
703  NEScheduler::get().schedule_op(_mtx_b_reduction_kernel.get(), Window::DimX, _mtx_b_reduction_kernel->window(), pack);
704  }
705  _is_prepared = true;
706  }
707 }
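
The code above only performs the one-off work (the assembly pretranspose, or the B reshape and B reduction) the first time it is called, which is what makes a prepare-once/run-many pattern worthwhile when the B matrix is constant. A sketch of that pattern under stated assumptions:

    #include <functional>
    #include "arm_compute/core/ITensorPack.h"
    #include "src/cpu/operators/CpuGemmLowpMatrixMultiplyCore.h" // assumed internal header location

    // Sketch only: the operator is already configured, 'pack' already holds
    // ACL_SRC_0/ACL_SRC_1 (and ACL_SRC_2 if a bias is used), ACL_DST and any
    // workspace buffers, and B does not change between calls
    // (GEMMInfo::reshape_b_only_on_first_run). 'refresh_lhs' is a hypothetical
    // callback that rewrites the contents of the LHS tensor between runs.
    void run_many(arm_compute::cpu::CpuGemmLowpMatrixMultiplyCore &gemm,
                  arm_compute::ITensorPack                        &pack,
                  const std::function<void()>                     &refresh_lhs,
                  int                                              iterations)
    {
        gemm.prepare(pack); // one-off work: assembly pretranspose, or B reshape + B reduction
        for(int i = 0; i < iterations; ++i)
        {
            refresh_lhs();
            gemm.run(pack); // run() calls prepare() itself, but it is a no-op after the first time
        }
    }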

◆ run()

void run (ITensorPack &tensors) override

Run the kernels contained in the function.

Parameters
  [in]  tensors  Vector that contains the tensors to operate on.

Reimplemented from INEOperator.

Definition at line 504 of file CpuGemmLowpMatrixMultiplyCore.cpp.

References arm_compute::ACL_DST, arm_compute::ACL_SRC, arm_compute::ACL_SRC_0, arm_compute::ACL_SRC_1, arm_compute::ACL_SRC_2, arm_compute::ACL_SRC_3, ITensorPack::add_const_tensor(), ITensorPack::add_tensor(), arm_compute::test::validation::b, Window::DimX, Window::DimY, arm_compute::test::validation::dst, GEMMInfo::gemmlowp_output_stage(), Scheduler::get(), CpuAuxTensorHandler::get(), ITensorPack::get_const_tensor(), ITensorPack::get_tensor(), arm_compute::is_data_type_quantized_asymmetric(), arm_compute::offset_int_vec(), arm_compute::test::validation::pack, CpuGemmLowpMatrixMultiplyCore::prepare(), arm_compute::QUANTIZE_DOWN_FIXEDPOINT, IScheduler::schedule_op(), and GEMMLowpOutputStageInfo::type.

505 {
506  prepare(tensors);
507 
508  auto a = tensors.get_const_tensor(TensorType::ACL_SRC_0);
509  auto b = tensors.get_const_tensor(TensorType::ACL_SRC_1);
510  auto c = tensors.get_const_tensor(TensorType::ACL_SRC_2);
511  auto dst = tensors.get_tensor(TensorType::ACL_DST);
512  auto a_to_use = a;
513  auto matrix_a = a;
514  auto matrix_b = b;
515 
516  CpuAuxTensorHandler vector_sum_col(offset_int_vec(VectorSumCol), _vector_sum_col, tensors, false);
517  CpuAuxTensorHandler vector_sum_row(offset_int_vec(VectorSumRow), _vector_sum_row, tensors, false);
518  CpuAuxTensorHandler tmp_a(offset_int_vec(TmpA), _tmp_a, tensors, false);
519  CpuAuxTensorHandler tmp_b(offset_int_vec(TmpB), _tmp_b, tensors, true);
520  CpuAuxTensorHandler mm_result_s32(offset_int_vec(MMResultS32), _mm_result_s32, tensors, false);
521  CpuAuxTensorHandler signed_a(offset_int_vec(SignedA), _signed_a, tensors, false);
522  CpuAuxTensorHandler signed_output(offset_int_vec(SignedOutput), _signed_output, tensors, false);
523 
524  // Convert QASYMM8->QASYMM8_SIGNED
525  if(_flip_signedness)
526  {
527  ITensorPack pack =
528  {
529  { TensorType::ACL_SRC, a },
530  { TensorType::ACL_DST, signed_a.get() }
531  };
532  NEScheduler::get().schedule_op(_convert_to_signed_asymm.get(), Window::DimY, _convert_to_signed_asymm->window(), pack);
533  a_to_use = signed_a.get();
534  matrix_a = signed_a.get();
535  }
536 
537  // Run GEMM
538  if(_asm_glue->is_configured())
539  {
540  ITensorPack asm_glue_tensors = tensors;
541  auto output_to_use = (_fuse_output_stage ? mm_result_s32.get() : dst);
542  if(is_data_type_quantized_asymmetric(a_to_use->info()->data_type()) && _gemm_info.gemmlowp_output_stage().type == GEMMLowpOutputStageType::QUANTIZE_DOWN_FIXEDPOINT)
543  {
544  asm_glue_tensors.add_const_tensor(TensorType::ACL_SRC_0, a_to_use);
545  asm_glue_tensors.add_const_tensor(TensorType::ACL_SRC_1, b);
546  asm_glue_tensors.add_const_tensor(TensorType::ACL_SRC_2, c);
547  asm_glue_tensors.add_tensor(TensorType::ACL_DST, dst);
548  }
549  else
550  {
551  asm_glue_tensors.add_const_tensor(TensorType::ACL_SRC_0, a_to_use);
552  asm_glue_tensors.add_const_tensor(TensorType::ACL_SRC_1, b);
553  asm_glue_tensors.add_tensor(TensorType::ACL_DST, output_to_use);
554  }
555  _asm_glue->run(asm_glue_tensors);
556  }
557  else
558  {
559  if(!_run_vector_matrix_multiplication)
560  {
561  matrix_a = tmp_a.get();
562  matrix_b = tmp_b.get();
563  // Run interleave kernel
564  ITensorPack pack_a =
565  {
566  { TensorType::ACL_SRC, a_to_use },
567  { TensorType::ACL_DST, tmp_a.get() }
568  };
569  NEScheduler::get().schedule_op(_mtx_a_reshape_kernel.get(), Window::DimY, _mtx_a_reshape_kernel->window(), pack_a);
570 
571  if(!_reshape_b_only_on_first_run)
572  {
573  ITensorPack pack_b =
574  {
575  { TensorType::ACL_SRC, b },
576  { TensorType::ACL_DST, tmp_b.get() }
577  };
578  // Run transpose kernel
579  NEScheduler::get().schedule_op(_mtx_b_reshape_kernel.get(), Window::DimY, _mtx_b_reshape_kernel->window(), pack_b);
580  }
581  }
582  ITensorPack pack_mm =
583  {
584  { TensorType::ACL_SRC_0, matrix_a },
585  { TensorType::ACL_SRC_1, matrix_b }
586  };
587  if(_fuse_output_stage)
588  {
589  pack_mm.add_tensor(TensorType::ACL_DST, mm_result_s32.get());
590  }
591  else
592  {
593  pack_mm.add_tensor(TensorType::ACL_DST, dst);
594  }
595  NEScheduler::get().schedule_op(_mm_kernel.get(), Window::DimY, _mm_kernel->window(), pack_mm);
596  }
597 
598  if(!_fused_assembly_path)
599  {
600  // Run matrix A reduction kernel only if _b_offset is not equal to 0
601  if(_b_offset != 0)
602  {
603  ITensorPack pack =
604  {
605  { TensorType::ACL_SRC, a_to_use },
606  { TensorType::ACL_DST, vector_sum_row.get() }
607  };
608  NEScheduler::get().schedule_op(_mtx_a_reduction_kernel.get(), Window::DimX, _mtx_a_reduction_kernel->window(), pack);
609  }
610 
611  // Run matrix B reduction kernel only if _a_offset is not equal to 0
612  if(_a_offset != 0 && !_reshape_b_only_on_first_run)
613  {
614  ITensorPack pack =
615  {
616  { TensorType::ACL_SRC, b },
617  { TensorType::ACL_DST, vector_sum_col.get() }
618  };
619  NEScheduler::get().schedule_op(_mtx_b_reduction_kernel.get(), Window::DimX, _mtx_b_reduction_kernel->window(), pack);
620  }
621 
622  if(_fuse_output_stage)
623  {
624  ITensorPack pack;
625  pack.add_tensor(TensorType::ACL_SRC_0, mm_result_s32.get());
626  pack.add_tensor(TensorType::ACL_SRC_1, _a_offset == 0 ? nullptr : vector_sum_col.get());
627  pack.add_tensor(TensorType::ACL_SRC_2, _b_offset == 0 ? nullptr : vector_sum_row.get());
628  pack.add_tensor(TensorType::ACL_SRC_3, c);
629  pack.add_tensor(TensorType::ACL_DST, _flip_signedness ? signed_output.get() : dst);
630 
631  // Run offset contribution kernel
632  NEScheduler::get().schedule_op(_offset_contribution_output_stage_kernel.get(), Window::DimY, _offset_contribution_output_stage_kernel->window(), pack);
633  }
634  else
635  {
636  ITensorPack pack;
637  pack.add_tensor(TensorType::ACL_SRC_0, _a_offset == 0 ? nullptr : vector_sum_col.get());
638  pack.add_tensor(TensorType::ACL_SRC_1, _b_offset == 0 ? nullptr : vector_sum_row.get());
639  pack.add_tensor(TensorType::ACL_DST, dst);
640 
641  // Run offset contribution kernel
642  NEScheduler::get().schedule_op(_offset_contribution_kernel.get(), Window::DimY, _offset_contribution_kernel->window(), pack);
643  }
644  }
645 
646  // Convert QASYMM8_SIGNED->QASYMM8
647  if(!_fused_assembly_path && _fuse_output_stage && _flip_signedness)
648  {
649  ITensorPack pack =
650  {
651  { TensorType::ACL_SRC, signed_output.get() },
652  { TensorType::ACL_DST, dst }
653  };
654  NEScheduler::get().schedule_op(_convert_from_signed_asymm.get(), Window::DimY, _convert_from_signed_asymm->window(), pack);
655  }
656 
657  // Run fused activation unless already run in the fused assembly
658  if(_run_activation)
659  {
660  ITensorPack pack =
661  {
662  { TensorType::ACL_SRC, dst },
663  { TensorType::ACL_DST, dst }
664  };
665  _activation_func->run(pack);
666  }
667 }

◆ validate()

static Status validate (const ITensorInfo *a, const ITensorInfo *b, const ITensorInfo *c, const ITensorInfo *dst, const GEMMInfo &gemm_info = GEMMInfo())

Static function to check if given info will lead to a valid configuration.

Similar to CpuGemmLowpMatrixMultiplyCore::configure()

Returns
a status

Definition at line 308 of file CpuGemmLowpMatrixMultiplyCore.cpp.

References GEMMInfo::activation_info(), ARM_COMPUTE_RETURN_ERROR_ON, ARM_COMPUTE_RETURN_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN, ARM_COMPUTE_RETURN_ERROR_ON_MSG, ARM_COMPUTE_RETURN_ON_ERROR, arm_compute::auto_init_if_empty(), arm_compute::test::validation::b, ICloneable< T >::clone(), arm_compute::misc::shape_calculator::compute_reductionA_shape(), arm_compute::misc::shape_calculator::compute_reductionB_shape(), ITensorInfo::data_type(), ITensorInfo::dimension(), dt, ActivationLayerInfo::enabled(), arm_compute::test::validation::gemm_info, GEMMLowpOutputStageInfo::gemmlowp_max_bound, GEMMLowpOutputStageInfo::gemmlowp_min_bound, GEMMLowpOutputStageInfo::gemmlowp_offset, GEMMInfo::gemmlowp_output_stage(), GEMMInfo::is_a_reshaped(), GEMMInfo::is_b_reshaped(), arm_compute::is_data_type_quantized_asymmetric(), arm_compute::is_data_type_quantized_per_channel(), arm_compute::NONE, UniformQuantizationInfo::offset, arm_compute::QASYMM8, arm_compute::QASYMM8_SIGNED, arm_compute::QSYMM8, arm_compute::QSYMM8_PER_CHANNEL, ITensorInfo::quantization_info(), arm_compute::QUANTIZE_DOWN_FIXEDPOINT, arm_compute::S32, UniformQuantizationInfo::scale, TensorShape::set(), ITensorInfo::tensor_shape(), GEMMLowpOutputStageInfo::type, QuantizationInfo::uniform(), CpuActivation::validate(), CpuConvertQuantizedSignednessKernel::validate(), CpuGemmLowpMatrixAReductionKernel::validate(), CpuGemmLowpMatrixMultiplyKernel::validate(), CpuGemmInterleave4x4Kernel::validate(), CpuGemmLowpOffsetContributionKernel::validate(), CpuGemmTranspose1xWKernel::validate(), CpuGemmLowpOffsetContributionOutputStageKernel::validate(), CpuGemmAssemblyDispatch::validate(), and CpuGemmLowpMatrixBReductionKernel::validate().

Referenced by CpuGemmLowpMatrixMultiplyCore::configure(), and NEGEMMLowpMatrixMultiplyCore::validate().

309 {
313  ARM_COMPUTE_RETURN_ERROR_ON_MSG(c != nullptr && gemm_info.gemmlowp_output_stage().type == GEMMLowpOutputStageType::NONE, "Bias addition not supported in NEGEMMLowpMatrixMultiplyCore for output S32");
314  ARM_COMPUTE_RETURN_ERROR_ON_MSG((a)->dimension(0) != (b)->dimension(1),
315  "The product AB is defined only if the number of columns in A is equal to the number of rows in B");
316  ARM_COMPUTE_RETURN_ERROR_ON_MSG(gemm_info.is_a_reshaped(), "Matrix A already reshaped is not supported");
317  ARM_COMPUTE_RETURN_ERROR_ON_MSG(gemm_info.is_b_reshaped(), "Matrix B already reshaped is not supported");
318 
319  GEMMInfo info = gemm_info;
320  const ITensorInfo *matrix_a_info = a;
321  const ITensorInfo *matrix_b_info = b;
322 
323  const ITensorInfo *a_to_use = a;
324 
325  TensorInfo tmp_a_info{};
326  TensorInfo tmp_b_info{};
327  TensorInfo mm_result_s32_info{};
328 
329  int32_t a_offset = a->quantization_info().uniform().offset;
330  int32_t b_offset = b->quantization_info().uniform().offset;
331 
332  bool fuse_output_stage = info.gemmlowp_output_stage().type != GEMMLowpOutputStageType::NONE;
333  if(fuse_output_stage)
334  {
335  auto_init_if_empty(mm_result_s32_info, a->clone()->set_tensor_shape(output->tensor_shape()).set_data_type(DataType::S32));
336  }
337 
338  // Convert QASYMM8->QASYMM8_SIGNED
339  TensorInfo signed_a{};
340  TensorInfo signed_output{};
341  bool flip_signedness = is_data_type_quantized_per_channel(b->data_type()) && (a->data_type() == DataType::QASYMM8) && info.reshape_b_only_on_first_run();
342  if(flip_signedness)
343  {
344  const int32_t offset_correction = 128;
345  const DataType dt = DataType::QASYMM8_SIGNED;
346  const UniformQuantizationInfo iqinfo = a_to_use->quantization_info().uniform();
347 
348  signed_a = a_to_use->clone()->set_data_type(dt).set_quantization_info(QuantizationInfo(iqinfo.scale, iqinfo.offset + offset_correction));
350  a_to_use = &signed_a;
351  a_offset = signed_a.quantization_info().uniform().offset;
352 
353  const UniformQuantizationInfo oqinfo = output->quantization_info().uniform();
354  signed_output = output->clone()->set_data_type(dt).set_quantization_info(QuantizationInfo(oqinfo.scale, oqinfo.offset - offset_correction));
355 
356  // Output stage correction
357  GEMMLowpOutputStageInfo output_stage_corr = info.gemmlowp_output_stage();
358  output_stage_corr.gemmlowp_offset = signed_output.quantization_info().uniform().offset;
359  output_stage_corr.gemmlowp_min_bound -= offset_correction;
360  output_stage_corr.gemmlowp_max_bound -= offset_correction;
361  info.set_gemmlowp_output_stage(output_stage_corr);
362 
363  // Update matrix a
364  matrix_a_info = &signed_a;
365  }
366 
367  // Initialize assembly kernel meta-data
368  const AsmGemmInfo asm_info = init_assembly_metadata(info);
369 
370  // Check if we need to run the optimized assembly kernel
371  bool run_optimised = false;
372  bool run_optimised_requantized = false;
373  if(is_data_type_quantized_asymmetric(a_to_use->data_type()) && info.gemmlowp_output_stage().type == GEMMLowpOutputStageType::QUANTIZE_DOWN_FIXEDPOINT)
374  {
375  run_optimised = bool(CpuGemmAssemblyDispatch::validate(a_to_use, b, c, output, asm_info));
376  run_optimised_requantized = run_optimised;
377  }
378  else
379  {
380  run_optimised = bool(CpuGemmAssemblyDispatch::validate(a_to_use, b, nullptr, fuse_output_stage ? &mm_result_s32_info : output, asm_info));
381  }
382 
383  if(run_optimised)
384  {
385  ARM_COMPUTE_RETURN_ERROR_ON(b->dimension(0) != output->dimension(0));
386  if(info.depth_output_gemm3d() != 0)
387  {
388  if(info.reinterpret_input_as_3d())
389  {
390  ARM_COMPUTE_RETURN_ERROR_ON(a->dimension(1) != output->dimension(1));
391  ARM_COMPUTE_RETURN_ERROR_ON(a->dimension(2) != output->dimension(2));
392  }
393  else
394  {
395  ARM_COMPUTE_RETURN_ERROR_ON(a->dimension(1) != output->dimension(1) * output->dimension(2));
396  }
397  }
398  else
399  {
400  ARM_COMPUTE_RETURN_ERROR_ON(a->dimension(1) != output->dimension(1));
401  }
402  }
403  else
404  {
405  ARM_COMPUTE_RETURN_ERROR_ON_MSG(info.reinterpret_input_as_3d(), "NEGEMM cannot reinterpret the input tensor as 3D");
406  ARM_COMPUTE_RETURN_ERROR_ON_MSG(info.depth_output_gemm3d() != 0, "NEGEMM cannot reinterpret the output tensor as 3D");
407 
408  const bool run_vector_matrix_multiplication = a->dimension(1) < 2;
409  if(!run_vector_matrix_multiplication)
410  {
411  matrix_a_info = &tmp_a_info;
412  matrix_b_info = &tmp_b_info;
413 
414  // The interleaved output matrix will have the following shape: [ a_height * 4, ceil(a_width / 4.0f) ]
415  TensorShape shape_tmp_a = a->tensor_shape();
416  shape_tmp_a.set(0, a->dimension(0) * 4);
417  shape_tmp_a.set(1, std::ceil(a->dimension(1) / 4.f));
418 
419  // The transpose1xW output matrix will have the following shape: [ b_height * 16, ceil(b_width / 16.0f) ]
420  TensorShape shape_tmp_b = b->tensor_shape();
421  shape_tmp_b.set(0, b->dimension(1) * 16);
422  shape_tmp_b.set(1, std::ceil(b->dimension(0) / 16.f));
423 
424  // Validate interleave kernel
425  auto_init_if_empty(tmp_a_info, a_to_use->clone()->set_tensor_shape(shape_tmp_a));
426  auto_init_if_empty(tmp_b_info, b->clone()->set_tensor_shape(shape_tmp_b));
427 
430  }
431  }
432 
433  if(!run_optimised_requantized)
434  {
435  TensorInfo info_vector_sum_col{};
436  TensorInfo info_vector_sum_row{};
437 
438  const GEMMLowpReductionKernelInfo reduction_info(a_to_use->dimension(0), false, 0, false);
439 
440  // Validate matrix B reduction kernel only if _a_offset is not equal to 0
441  if(a_offset != 0)
442  {
443  info_vector_sum_col = TensorInfo(compute_reductionA_shape(*b), 1, DataType::S32);
444 
445  // Configure Matrix B reduction kernel
447  }
448 
449  // Validate Matrix A reduction kernel only if _b_offset is not equal to 0
450  if(b_offset != 0)
451  {
452  info_vector_sum_row = TensorInfo(compute_reductionB_shape(*a), 1, DataType::S32);
453 
454  // Configure matrix A reduction kernel
455  ARM_COMPUTE_RETURN_ON_ERROR(kernels::CpuGemmLowpMatrixAReductionKernel::validate(a_to_use, &info_vector_sum_row, reduction_info));
456  }
457 
458  if(fuse_output_stage)
459  {
460  if(!run_optimised)
461  {
462  ARM_COMPUTE_RETURN_ERROR_ON_MSG(info.reinterpret_input_as_3d(), "CpuGemmLowpMatrixMultiplyKernel cannot reinterpret the input tensor as 3D");
463  ARM_COMPUTE_RETURN_ERROR_ON_MSG(info.depth_output_gemm3d() != 0, "CpuGemmLowpMatrixMultiplyKernel cannot reinterpret the output tensor as 3D");
464 
465  ARM_COMPUTE_RETURN_ON_ERROR(kernels::CpuGemmLowpMatrixMultiplyKernel::validate(matrix_a_info, matrix_b_info, &mm_result_s32_info));
466  }
467 
468  // Validate offset contribution kernel
470  a_offset == 0 ? nullptr : &info_vector_sum_col,
471  b_offset == 0 ? nullptr : &info_vector_sum_row,
472  c,
473  flip_signedness ? &signed_output : output,
474  a_offset, b_offset,
475  info.gemmlowp_output_stage()));
476  }
477  else
478  {
479  if(!run_optimised)
480  {
481  ARM_COMPUTE_RETURN_ERROR_ON_MSG(info.reinterpret_input_as_3d(), "CpuGemmLowpMatrixMultiplyKernel cannot reinterpret the input tensor as 3D");
482  ARM_COMPUTE_RETURN_ERROR_ON_MSG(info.depth_output_gemm3d() != 0, "CpuGemmLowpMatrixMultiplyKernel cannot reinterpret the output tensor as 3D");
483 
485  }
486  // Validate offset contribution kernel
488  a_offset == 0 ? nullptr : &info_vector_sum_col,
489  b_offset == 0 ? nullptr : &info_vector_sum_row,
490  a_offset, b_offset));
491  }
492  }
493 
494  // Validate activation
495  const ActivationLayerInfo &activation = gemm_info.activation_info();
496  if(activation.enabled())
497  {
498  ARM_COMPUTE_RETURN_ON_ERROR(CpuActivation::validate(output, nullptr, activation));
499  }
500 
501  return Status{};
502 }

◆ workspace()

experimental::MemoryRequirements workspace () const override

Return the memory requirements of the workspace.

Reimplemented from INEOperator.

Definition at line 708 of file CpuGemmLowpMatrixMultiplyCore.cpp.

709 {
710  return _aux_mem;
711 }

The documentation for this class was generated from the following files:

  • CpuGemmLowpMatrixMultiplyCore.h
  • CpuGemmLowpMatrixMultiplyCore.cpp