Compute Library
 22.08
CpuGemmConv2d Class Reference

Basic function to compute the convolution layer.

#include <CpuGemmConv2d.h>


Public Member Functions

 CpuGemmConv2d ()
 Constructor.
 
 CpuGemmConv2d (const CpuGemmConv2d &)=delete
 Prevent instances of this class from being copied (as this class contains pointers).
 
 CpuGemmConv2d (CpuGemmConv2d &&)=delete
 Prevent instances of this class from being moved (as this class contains non-movable objects).
 
CpuGemmConv2d & operator= (const CpuGemmConv2d &)=delete
 Prevent instances of this class from being copied (as this class contains pointers).
 
CpuGemmConv2d & operator= (CpuGemmConv2d &&)=delete
 Prevent instances of this class from being moved (as this class contains non-movable objects).
 
 ~CpuGemmConv2d ()
 Destructor.
 
void configure (const ITensorInfo *src, const ITensorInfo *weights, const ITensorInfo *biases, ITensorInfo *dst, const PadStrideInfo &conv_info, const WeightsInfo &weights_info=WeightsInfo(), const Size2D &dilation=Size2D(1U, 1U), const ActivationLayerInfo &act_info=ActivationLayerInfo(), bool enable_fast_math=false, unsigned int num_groups=1)
 Set the input and output tensors.
 
void run (ITensorPack &tensors) override
 Run the kernels contained in the function.
 
void prepare (ITensorPack &tensors) override
 Prepare the function for executing.
 
experimental::MemoryRequirements workspace () const override
 Return the memory requirements required by the workspace.
 
- Public Member Functions inherited from INEOperator
 INEOperator (IRuntimeContext *ctx=nullptr)
 Constructor.
 
 INEOperator (const INEOperator &)=delete
 Prevent instances of this class from being copied (as this class contains pointers).
 
 INEOperator (INEOperator &&)=default
 Default move constructor.
 
INEOperator & operator= (const INEOperator &)=delete
 Prevent instances of this class from being copied (as this class contains pointers).
 
INEOperator & operator= (INEOperator &&)=default
 Default move assignment operator.
 
 ~INEOperator ()
 Default destructor.
 
- Public Member Functions inherited from IOperator

virtual ~IOperator ()=default
 Destructor.
 

Static Public Member Functions

static Status validate (const ITensorInfo *src, const ITensorInfo *weights, const ITensorInfo *biases, const ITensorInfo *output, const PadStrideInfo &conv_info, const WeightsInfo &weights_info=WeightsInfo(), const Size2D &dilation=Size2D(1U, 1U), const ActivationLayerInfo &act_info=ActivationLayerInfo(), bool enable_fast_math=false, unsigned int num_groups=1)
 Static function to check if given info will lead to a valid configuration.
 
static Status has_opt_impl (arm_compute::WeightFormat &expected_weight_format, const ITensorInfo *src, const ITensorInfo *weights, const ITensorInfo *biases, const ITensorInfo *output, const PadStrideInfo &conv_info, const WeightsInfo &weights_info=WeightsInfo(), const Size2D &dilation=Size2D(1U, 1U), const ActivationLayerInfo &act_info=ActivationLayerInfo(), const bool enable_fast_math=false)
 Indicates whether or not there is an optimal assembly implementation that can be used to process the given parameters.
 

Detailed Description

Basic function to compute the convolution layer.

This function calls the following kernels/functions:

  1. cpu::kernels::CpuIm2ColKernel
  2. CpuGemm (if the data type is BFLOAT16/FP16/FP32)
  3. CpuGemmLowpMatrixMultiplyCore (if the data type is QASYMM8/QASYMM8_SIGNED)
  4. CpuGemmLowpOutputStage (if the data type is QASYMM8/QASYMM8_SIGNED)
  5. cpu::kernels::CpuCol2ImKernel (if NCHW data layout)
  6. kernels::CpuWeightsReshapeKernel
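
The kernel pipeline above maps a convolution onto a single matrix multiply. As an illustration (not library code; `conv_as_gemm` and its parameters are invented for this sketch), the shapes of the matrices produced by im2col and consumed by the GEMM can be derived as follows:

```cpp
#include <cassert>
#include <cstddef>

// Shapes of the GEMM that the im2col decomposition produces:
//   M = number of output spatial positions (conv_w * conv_h)
//   N = number of kernels (OFM)
//   K = elements per im2col patch (kernel_x * kernel_y * IFM)
struct GemmShapes
{
    std::size_t M;
    std::size_t N;
    std::size_t K;
};

GemmShapes conv_as_gemm(std::size_t in_w, std::size_t in_h, std::size_t ifm,
                        std::size_t k_w, std::size_t k_h, std::size_t ofm,
                        std::size_t stride, std::size_t pad)
{
    // Output spatial size with symmetric padding (floor rounding)
    const std::size_t conv_w = (in_w + 2 * pad - k_w) / stride + 1;
    const std::size_t conv_h = (in_h + 2 * pad - k_h) / stride + 1;
    return { conv_w * conv_h, ofm, k_w * k_h * ifm };
}
```

For example, a 224x224x3 input convolved with 64 7x7 kernels at stride 2 and padding 3 yields a 112x112 output, i.e. a GEMM with M = 12544, N = 64, K = 147.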

Definition at line 58 of file CpuGemmConv2d.h.

Constructor & Destructor Documentation

◆ CpuGemmConv2d() [1/3]

Constructor.

Definition at line 94 of file CpuGemmConv2d.cpp.

References arm_compute::NCHW.

    : _weights_reshape_kernel(nullptr), _im2col_kernel(), _mm_gemm(), _mm_gemmlowp(), _col2im_kernel(), _reshape_kernel(), _im2col_output(), _weights_reshaped(), _gemm_output(), _gemm_output_3d(),
      _data_layout(DataLayout::NCHW), _skip_im2col(false), _skip_col2im(false), _is_quantized(false), _is_prepared(false), _aux_mem(AuxTensorIdx::Count)
{
}

◆ CpuGemmConv2d() [2/3]

CpuGemmConv2d (const CpuGemmConv2d &)
delete

Prevent instances of this class from being copied (As this class contains pointers)

◆ CpuGemmConv2d() [3/3]

CpuGemmConv2d (CpuGemmConv2d &&)
delete

Prevent instances of this class from being moved (As this class contains non movable objects)

◆ ~CpuGemmConv2d()

~CpuGemmConv2d ( )
default

Destructor.

Member Function Documentation

◆ configure()

void configure (const ITensorInfo *src,
                const ITensorInfo *weights,
                const ITensorInfo *biases,
                ITensorInfo *dst,
                const PadStrideInfo &conv_info,
                const WeightsInfo &weights_info = WeightsInfo(),
                const Size2D &dilation = Size2D(1U, 1U),
                const ActivationLayerInfo &act_info = ActivationLayerInfo(),
                bool enable_fast_math = false,
                unsigned int num_groups = 1)

Set the input and output tensors.

Valid data layouts:

  • NHWC
  • NCHW

Valid data type configurations:

src0            src1                src2  dst
F16             F16                 F16   F16
F32             F32                 F32   F32
BFLOAT16        BFLOAT16            BFLOAT16  BFLOAT16
QASYMM8         QASYMM8             S32   QASYMM8
QASYMM8         QSYMM8_PER_CHANNEL  S32   QASYMM8
QASYMM8_SIGNED  QASYMM8_SIGNED      S32   QASYMM8_SIGNED
QASYMM8_SIGNED  QSYMM8_PER_CHANNEL  S32   QASYMM8_SIGNED
Parameters
[in]  src               Source tensor info. The 3 lower dimensions represent a single input [width, height, IFM], while every optional dimension from 4 and above represents a batch of inputs. Data types supported: QASYMM8/QASYMM8_SIGNED/BFLOAT16/F16/F32.
[in]  weights           Weights tensor info. Weights are a 4D tensor with dimensions [kernel_x, kernel_y, IFM, OFM]. Data types supported: QASYMM8/QASYMM8_SIGNED/QSYMM8_PER_CHANNEL/BFLOAT16/F16/F32.
[in]  biases            Biases tensor info. Shared biases are supported. Biases are a 1D tensor with dimensions [OFM]. Data type supported: should match the input data type, except for inputs of QASYMM8/QASYMM8_SIGNED type, where biases should be of S32 type.
[out] dst               Destination tensor info. The 3 lower dimensions represent a single output [width, height, OFM], while the rest represent a batch of outputs. Data types supported: same as input.
[in]  conv_info         Contains padding and stride information described in PadStrideInfo.
[in]  weights_info      Specifies if the weights tensor has been reshaped with NEWeightsReshapeKernel. If this is not part of the fully connected layer, the weights tensor has also been transposed with cpu::kernels::CpuGemmTranspose1xWKernel. Data type supported: same as input.
[in]  dilation          (Optional) Dilation, in elements, across x and y. Defaults to (1, 1).
[in]  act_info          (Optional) Activation layer information in case of a fused activation. Only RELU, BOUNDED_RELU and LU_BOUNDED_RELU are supported.
[in]  enable_fast_math  (Optional) Enable fast math computation. If set, the function may dispatch the fastest implementation available, which may introduce a drop in accuracy. Defaults to false.
[in]  num_groups        (Optional) Number of groups when performing a grouped convolution. num_groups != 1 is not supported.
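
The conv_info and dilation parameters jointly determine the spatial output size. The sketch below re-derives the arithmetic behind scaled_dimensions() assuming the default floor rounding; it is an illustrative re-implementation with invented names, not the library function:

```cpp
#include <cassert>
#include <utility>

// Output width/height of a convolution, assuming floor rounding.
// The effective kernel extent grows with dilation: dil * (k - 1) + 1.
std::pair<unsigned int, unsigned int>
conv_output_size(unsigned int w, unsigned int h,
                 unsigned int k_w, unsigned int k_h,
                 unsigned int stride_x, unsigned int stride_y,
                 unsigned int pad_x, unsigned int pad_y,
                 unsigned int dil_x = 1, unsigned int dil_y = 1)
{
    const unsigned int eff_kw = dil_x * (k_w - 1) + 1;
    const unsigned int eff_kh = dil_y * (k_h - 1) + 1;
    // Integer division performs the floor rounding
    const unsigned int out_w = (w + 2 * pad_x - eff_kw) / stride_x + 1;
    const unsigned int out_h = (h + 2 * pad_y - eff_kh) / stride_y + 1;
    return { out_w, out_h };
}
```

A 3x3 kernel with padding 1 and stride 1 preserves a 56x56 input; raising the dilation to 2 shrinks the output to 54x54 because the effective kernel extent becomes 5.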

Definition at line 256 of file CpuGemmConv2d.cpp.

References ARM_COMPUTE_ERROR_ON_MSG, ARM_COMPUTE_ERROR_ON_NULLPTR, ARM_COMPUTE_ERROR_THROW_ON, ARM_COMPUTE_LOG_PARAMS, ARM_COMPUTE_UNUSED, arm_compute::BATCHES, arm_compute::BFLOAT16, arm_compute::test::validation::conv_info, ITensorInfo::data_layout(), arm_compute::test::validation::data_layout, arm_compute::test::validation::data_type, ITensorInfo::data_type(), ITensorInfo::dimension(), arm_compute::test::validation::dst, arm_compute::F32, arm_compute::get_data_layout_dimension_index(), arm_compute::HEIGHT, arm_compute::is_data_type_quantized_asymmetric(), arm_compute::NCHW, arm_compute::NHWC, arm_compute::offset_int_vec(), arm_compute::experimental::Prepare, ITensorInfo::quantization_info(), WeightsInfo::retain_internal_weights(), arm_compute::scaled_dimensions(), TensorShape::set(), ITensorInfo::set_data_layout(), arm_compute::test::validation::set_data_layout(), TensorInfo::set_data_type(), TensorInfo::set_quantization_info(), arm_compute::test::validation::src, PadStrideInfo::stride(), TensorInfo::tensor_shape(), TensorInfo::total_size(), arm_compute::UNSPECIFIED, CpuGemmConv2d::validate(), WeightsInfo::weight_format(), and arm_compute::WIDTH.

{
    ARM_COMPUTE_ERROR_ON_NULLPTR(src, weights, dst);
    ARM_COMPUTE_ERROR_THROW_ON(CpuGemmConv2d::validate(src,
                                                       weights,
                                                       biases,
                                                       dst,
                                                       conv_info,
                                                       weights_info,
                                                       dilation,
                                                       act_info,
                                                       enable_fast_math,
                                                       num_groups));
    ARM_COMPUTE_LOG_PARAMS(src, weights, biases, dst, conv_info, weights_info, dilation, act_info, enable_fast_math, num_groups);

    const DataType   data_type   = src->data_type();
    const DataLayout data_layout = src->data_layout();
    const int        idx_width   = get_data_layout_dimension_index(data_layout, DataLayoutDimension::WIDTH);
    const int        idx_height  = get_data_layout_dimension_index(data_layout, DataLayoutDimension::HEIGHT);
    const int        idx_kernels = get_data_layout_dimension_index(data_layout, DataLayoutDimension::BATCHES);

    const unsigned int kernel_width  = weights->dimension(idx_width);
    const unsigned int kernel_height = weights->dimension(idx_height);

    _is_prepared  = weights_info.retain_internal_weights();
    _is_quantized = is_data_type_quantized_asymmetric(src->data_type());
    _data_layout  = data_layout;
    _skip_im2col  = (data_layout == DataLayout::NHWC && kernel_width == 1 && kernel_height == 1 && conv_info.stride().first == 1 && conv_info.stride().second == 1);

    const ITensorInfo *gemm_input_to_use  = src;
    ITensorInfo       *gemm_output_to_use = dst;

    // Get convolved dimensions
    unsigned int conv_w = 0;
    unsigned int conv_h = 0;
    std::tie(conv_w, conv_h) = scaled_dimensions(src->dimension(idx_width),
                                                 src->dimension(idx_height),
                                                 kernel_width,
                                                 kernel_height,
                                                 conv_info,
                                                 dilation);

    ARM_COMPUTE_ERROR_ON_MSG((dst->dimension(idx_width) != conv_w) || (dst->dimension(idx_height) != conv_h),
                             "Output shape does not match the expected one");

    // Check if GEMM3D is supported
    const CpuGemmConv2d::SkipInfo skip_info = CpuGemmConv2d::skip_im_col_info(src, weights, conv_info, dilation, act_info);
    _skip_im2col = skip_info.skip_im2col;
    _skip_col2im = skip_info.skip_col2im;

    // Get parameters from conv_info
    unsigned int stride_x = 0;
    unsigned int stride_y = 0;
    std::tie(stride_x, stride_y) = conv_info.stride();

    unsigned int mat_weights_cols = weights->dimension(idx_kernels);

    // _weights_reshaped will be auto configured in the kernel.
    // Just append biases and do not transpose 1xW as it will be reshaped in CpuGemm
    _weights_reshape_kernel = std::make_unique<kernels::CpuWeightsReshapeKernel>();
    _weights_reshape_kernel->configure(weights, nullptr, &_weights_reshaped);
    _weights_reshaped.set_quantization_info(weights->quantization_info());

    // Create tensor to store im2col reshaped inputs
    if(!_skip_im2col)
    {
        // Configure
        _im2col_kernel = std::make_unique<kernels::CpuIm2ColKernel>();
        _im2col_kernel->configure(src, &_im2col_output, Size2D(kernel_width, kernel_height), conv_info, false, dilation);

        // Update GEMM input
        gemm_input_to_use = &_im2col_output;
    }

    // Create temporary GEMM output tensor in case we cannot skip col2im
    const DataType output_data_type = data_type == DataType::BFLOAT16 ? DataType::F32 : data_type;
    if(!_skip_col2im)
    {
        TensorShape shape_gemm;

        // Calculate GEMM output shape
        shape_gemm = _im2col_output.tensor_shape();
        shape_gemm.set(0, mat_weights_cols);
        shape_gemm.set(1, conv_w * conv_h);

        _gemm_output = TensorInfo(shape_gemm, 1, output_data_type);
        _gemm_output.set_quantization_info(dst->quantization_info()).set_data_layout(src->data_layout());
        _gemm_output_3d = TensorInfo(_gemm_output);

        // Update GEMM output
        gemm_output_to_use = &_gemm_output;
    }
    else
    {
        _gemm_output_3d = TensorInfo(*dst);
        _gemm_output_3d.set_data_type(output_data_type).set_data_layout(src->data_layout()).set_is_resizable(true);
        _gemm_output = TensorInfo(_gemm_output_3d);

        // Update GEMM output
        gemm_output_to_use = &_gemm_output_3d;
    }

    // Configure GEMM
    // In case we need to skip col2im, GEMM3D (gemm_3d_depth != 0) must be called in order to avoid reshaping the output matrix
    const unsigned int gemm_3d_depth = _skip_col2im ? conv_h : 0;
    const bool         fixed_format  = weights_info.weight_format() != arm_compute::WeightFormat::UNSPECIFIED;
    configure_mm(gemm_input_to_use, &_weights_reshaped, biases, gemm_output_to_use, act_info, enable_fast_math, gemm_3d_depth, fixed_format, weights_info.weight_format());

    if(!_skip_col2im && _data_layout == DataLayout::NCHW)
    {
        // Configure col2im
        _col2im_kernel = std::make_unique<kernels::CpuCol2ImKernel>();
        _col2im_kernel->configure(gemm_output_to_use, dst, Size2D(conv_w, conv_h));
    }
    else
    {
        // Configure reshape layer
        _reshape_kernel = std::make_unique<kernels::CpuReshapeKernel>();
        _reshape_kernel->configure(gemm_output_to_use, dst);
    }

    // Check if GEMM transforms weights
    // Modernise through COMPMID-4535
    bool gemm_trans_wei = _aux_mem[1].size > 0;                                            // Asm Pretranspose
    gemm_trans_wei      = _mm_gemm != nullptr ? _aux_mem[3].size > 0 : gemm_trans_wei;     // Transpose RHS
    gemm_trans_wei      = _mm_gemmlowp != nullptr ? _aux_mem[5].size > 0 : gemm_trans_wei; // Transpose RHS

    // Check lifetime
    _aux_mem[Im2ColOutput]    = MemoryInfo(offset_int_vec(Im2ColOutput), MemoryLifetime::Temporary, _im2col_output.total_size());
    _aux_mem[WeightsReshaped] = MemoryInfo(offset_int_vec(WeightsReshaped), gemm_trans_wei ? MemoryLifetime::Prepare : MemoryLifetime::Persistent, _weights_reshaped.total_size());
    _aux_mem[GemmOutput]      = MemoryInfo(offset_int_vec(GemmOutput), MemoryLifetime::Temporary, _gemm_output.total_size());
}
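
The _skip_im2col condition set in configure() can be read as a standalone predicate. A minimal sketch (can_skip_im2col and Layout are invented names; the library's actual decision lives in skip_im_col_info): im2col is skippable for NHWC 1x1 unit-stride convolutions, because each input element already forms a complete GEMM row.

```cpp
#include <cassert>

enum class Layout { NCHW, NHWC };

// For NHWC, a 1x1 kernel with unit stride reads each input position exactly
// once and in channel-contiguous order, so no patch extraction is needed.
bool can_skip_im2col(Layout layout, unsigned int k_w, unsigned int k_h,
                     unsigned int stride_x, unsigned int stride_y)
{
    return layout == Layout::NHWC && k_w == 1 && k_h == 1 && stride_x == 1 && stride_y == 1;
}
```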

◆ has_opt_impl()

Status has_opt_impl (arm_compute::WeightFormat &expected_weight_format,
                     const ITensorInfo *src,
                     const ITensorInfo *weights,
                     const ITensorInfo *biases,
                     const ITensorInfo *output,
                     const PadStrideInfo &conv_info,
                     const WeightsInfo &weights_info = WeightsInfo(),
                     const Size2D &dilation = Size2D(1U, 1U),
                     const ActivationLayerInfo &act_info = ActivationLayerInfo(),
                     const bool enable_fast_math = false)
static

Indicates whether or not there is an optimal assembly implementation that can be used to process the given parameters.

The parameter list is the same as NEGEMMConvolutionLayer::has_opt_impl.

Returns
a status.

Definition at line 391 of file CpuGemmConv2d.cpp.

References arm_compute::test::validation::conv_info, ITensorInfo::data_layout(), ITensorInfo::dimension(), arm_compute::test::validation::gemm_info, arm_compute::get_data_layout_dimension_index(), CpuGemm::has_opt_impl(), arm_compute::HEIGHT, arm_compute::scaled_dimensions(), arm_compute::UNSPECIFIED, WeightsInfo::weight_format(), and arm_compute::WIDTH.

Referenced by NEGEMMConvolutionLayer::has_opt_impl().

{
    const DataLayout data_layout = src->data_layout();
    const int        idx_width   = get_data_layout_dimension_index(data_layout, DataLayoutDimension::WIDTH);
    const int        idx_height  = get_data_layout_dimension_index(data_layout, DataLayoutDimension::HEIGHT);
    const unsigned int kernel_width  = weights->dimension(idx_width);
    const unsigned int kernel_height = weights->dimension(idx_height);
    unsigned int conv_w = 0;
    unsigned int conv_h = 0;
    std::tie(conv_w, conv_h) = scaled_dimensions(src->dimension(idx_width),
                                                 src->dimension(idx_height),
                                                 kernel_width,
                                                 kernel_height,
                                                 conv_info,
                                                 dilation);

    const CpuGemmConv2d::SkipInfo skip_info = CpuGemmConv2d::skip_im_col_info(src, weights, conv_info,
                                                                              dilation, act_info);

    const bool         skip_im2col   = skip_info.skip_im2col;
    const bool         skip_col2im   = skip_info.skip_col2im;
    const unsigned int gemm_3d_depth = skip_col2im ? conv_h : 0;
    const bool         fixed_format  = weights_info.weight_format() != arm_compute::WeightFormat::UNSPECIFIED;
    const GEMMInfo     gemm_info     = GEMMInfo(false, false, true /* Reshape weights only for the first run */,
                                                gemm_3d_depth, skip_im2col /* Reinterpret the input as 3D if im2col is skipped */,
                                                false, GEMMLowpOutputStageInfo(), false, enable_fast_math, false, act_info, experimental::PostOpList<ITensorInfo *>(), fixed_format, weights_info.weight_format());

    return CpuGemm::has_opt_impl(expected_weight_format, src, weights, biases, dst, gemm_info);
}

◆ operator=() [1/2]

CpuGemmConv2d & operator= (const CpuGemmConv2d &)
delete

Prevent instances of this class from being copied (As this class contains pointers)

◆ operator=() [2/2]

CpuGemmConv2d & operator= (CpuGemmConv2d &&)
delete

Prevent instances of this class from being moved (As this class contains non movable objects)

◆ prepare()

void prepare (ITensorPack &constants)
overridevirtual

Prepare the function for executing.

Any one-off pre-processing step required by the function is handled here.

Parameters
[in]constantsVector that contains the constants tensors.
Note
Prepare stage might not need all the function's buffers' backing memory to be available in order to execute
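
The split between one-off and per-call work can be illustrated with a toy operator (all names here are invented; this is not the library API): prepare() performs the expensive transform exactly once, and run() calls it defensively first, mirroring how CpuGemmConv2d reshapes weights a single time and reuses the result.

```cpp
#include <cassert>
#include <vector>

// Toy two-phase operator: prepare() does the one-off "weight reshaping"
// (here just a copy) and is a cheap no-op on every later call.
class TwoPhaseOp
{
public:
    void prepare(const std::vector<float> &weights)
    {
        if(!_is_prepared)
        {
            _reshaped = weights; // stand-in for weight reshaping
            ++_prepare_count;    // track the one-off work for the demo
            _is_prepared = true;
        }
    }
    float run(const std::vector<float> &weights, float x)
    {
        prepare(weights); // defensive call, no-op after the first time
        float acc = 0.f;
        for(float w : _reshaped)
        {
            acc += w * x;
        }
        return acc;
    }
    int prepare_count() const { return _prepare_count; }

private:
    std::vector<float> _reshaped{};
    bool               _is_prepared{ false };
    int                _prepare_count{ 0 };
};
```

Running the operator twice performs the preparation only once, which is the property the MemoryLifetime::Prepare auxiliary-tensor lifetime above relies on.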

Reimplemented from INEOperator.

Definition at line 639 of file CpuGemmConv2d.cpp.

References arm_compute::ACL_DST, arm_compute::ACL_SRC, arm_compute::ACL_SRC_1, ITensorPack::add_const_tensor(), Scheduler::get(), CpuAuxTensorHandler::get(), ITensorPack::get_const_tensor(), arm_compute::offset_int_vec(), arm_compute::test::validation::pack, and IScheduler::schedule_op().

Referenced by CpuGemmConv2d::run().

{
    if(!_is_prepared)
    {
        // Variable weights executions that use fixed-format kernels
        // need no reshaping of the weights.
        if(this->isVarWeightsKernel())
        {
            _is_quantized ? _mm_gemmlowp->prepare(tensors) : _mm_gemm->prepare(tensors);
            _is_prepared = true;
            return;
        }

        // Run weights reshaping and mark original weights tensor as unused
        CpuAuxTensorHandler weights_reshaped(offset_int_vec(WeightsReshaped), _weights_reshaped, tensors);
        auto weights = tensors.get_const_tensor(TensorType::ACL_SRC_1);
        ITensorPack pack =
        {
            { TensorType::ACL_SRC, weights },
            { TensorType::ACL_DST, weights_reshaped.get() }
        };
        NEScheduler::get().schedule_op(_weights_reshape_kernel.get(), 3, _weights_reshape_kernel->window(), pack);
        weights->mark_as_unused();
        ITensorPack gemm_pack = tensors;
        gemm_pack.add_const_tensor(TensorType::ACL_SRC_1, weights_reshaped.get());
        _is_quantized ? _mm_gemmlowp->prepare(gemm_pack) : _mm_gemm->prepare(gemm_pack);
        _is_prepared = true;
    }
}

◆ run()

void run (ITensorPack &tensors)
overridevirtual

Run the kernels contained in the function.

Parameters
[in]tensorsVector that contains the tensors to operate on.

Reimplemented from INEOperator.

Definition at line 544 of file CpuGemmConv2d.cpp.

References arm_compute::ACL_DST, arm_compute::ACL_SRC, arm_compute::ACL_SRC_0, arm_compute::ACL_SRC_1, ITensorPack::add_const_tensor(), ITensorPack::add_tensor(), Tensor::allocator(), BorderSize::bottom, ITensor::buffer(), Window::DimY, arm_compute::test::validation::dst, TensorInfo::extend_padding(), Scheduler::get(), CpuAuxTensorHandler::get(), ITensorPack::get_const_tensor(), arm_compute::get_data_layout_dimension_index(), ITensorPack::get_tensor(), arm_compute::HEIGHT, TensorAllocator::import_memory(), ITensor::info(), arm_compute::NCHW, arm_compute::offset_int_vec(), arm_compute::test::validation::pack, ITensorInfo::padding(), CpuGemmConv2d::prepare(), IScheduler::schedule_op(), ITensorAllocator::soft_init(), arm_compute::test::validation::src, and BorderSize::top.

{
    prepare(tensors);

    auto src = tensors.get_const_tensor(ACL_SRC_0);
    auto dst = tensors.get_tensor(ACL_DST);
    auto gemm_input_to_use = src;

    CpuAuxTensorHandler im2col_output(offset_int_vec(Im2ColOutput), _im2col_output, tensors, false);
    CpuAuxTensorHandler gemm_output(offset_int_vec(GemmOutput), _gemm_output, tensors, false);
    CpuAuxTensorHandler reshaped_wei(offset_int_vec(WeightsReshaped), _weights_reshaped, tensors, false);

    bool out_has_padding = _skip_col2im && (dst->info()->padding().bottom != 0 || dst->info()->padding().top != 0);
    if(!_skip_im2col)
    {
        // Run input reshaping
        unsigned int y_dim = get_data_layout_dimension_index(_data_layout, DataLayoutDimension::HEIGHT);
        ITensorPack pack =
        {
            { TensorType::ACL_SRC, src },
            { TensorType::ACL_DST, im2col_output.get() }
        };
        NEScheduler::get().schedule_op(_im2col_kernel.get(), y_dim, _im2col_kernel->window(), pack);
        gemm_input_to_use = im2col_output.get();
    }

    // Handle the case where output has top/bottom padding
    const ITensor *out_to_use = out_has_padding ? gemm_output.get() : dst;
    Tensor gemm3d;
    _gemm_output_3d.extend_padding(out_to_use->info()->padding());
    gemm3d.allocator()->soft_init(_gemm_output_3d);
    gemm3d.allocator()->import_memory(out_to_use->buffer());
    auto gemm_output_to_use = gemm_output.get();

    if(_skip_im2col)
    {
        gemm_output_to_use = &gemm3d;
    }
    if(_skip_col2im && !out_has_padding)
    {
        gemm_output_to_use = dst;
    }

    // Runs CpuGemm or CpuGemmLowpMatrixMultiplyCore functions
    ITensorPack pack_mm = tensors;
    pack_mm.add_const_tensor(TensorType::ACL_SRC_0, gemm_input_to_use);
    if(!this->isVarWeightsKernel())
    {
        pack_mm.add_const_tensor(TensorType::ACL_SRC_1, reshaped_wei.get());
    }
    pack_mm.add_tensor(TensorType::ACL_DST, gemm_output_to_use);
    if(_is_quantized)
    {
        // Run gemmlowp
        _mm_gemmlowp->run(pack_mm);
    }
    else
    {
        // Run gemm
        _mm_gemm->run(pack_mm);
    }

    // Reshape output matrix
    if(!_skip_col2im)
    {
        if(_data_layout == DataLayout::NCHW)
        {
            ITensorPack pack =
            {
                { TensorType::ACL_SRC, gemm_output.get() },
                { TensorType::ACL_DST, dst }
            };
            NEScheduler::get().schedule_op(_col2im_kernel.get(), Window::DimY, _col2im_kernel->window(), pack);
        }
        else
        {
            ITensorPack pack =
            {
                { TensorType::ACL_SRC, gemm_output_to_use },
                { TensorType::ACL_DST, dst }
            };
            NEScheduler::get().schedule_op(_reshape_kernel.get(), Window::DimY, _reshape_kernel->window(), pack);
        }
    }
    else if(out_has_padding)
    {
        ITensorPack pack =
        {
            { TensorType::ACL_SRC, gemm_output_to_use },
            { TensorType::ACL_DST, dst }
        };
        NEScheduler::get().schedule_op(_reshape_kernel.get(), Window::DimY, _reshape_kernel->window(), pack);
    }
}
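
The soft_init()/import_memory() pair above reinterprets an existing buffer under a different shape without copying any data. The idea can be sketched with plain strided indexing (the helper names at2d/at3d are invented for this sketch): the same storage is addressed either as a 2D [rows, cols] matrix or as a 3D [H, W, C] tensor by recomputing the linear offset.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// 2D view: rows are spatial positions, columns are channels.
float &at2d(std::vector<float> &buf, std::size_t n_cols, std::size_t row, std::size_t col)
{
    return buf[row * n_cols + col];
}

// 3D view of the SAME buffer: element (y, x, c) of an [H, W, C] tensor
// lands at the same linear offset as 2D element (row = y * W + x, col = c).
float &at3d(std::vector<float> &buf, std::size_t W, std::size_t C,
            std::size_t y, std::size_t x, std::size_t c)
{
    return buf[(y * W + x) * C + c];
}
```

Writing through the 2D view and reading through the 3D view touches the same element, which is exactly the zero-copy aliasing that lets run() feed the GEMM output straight into the destination tensor.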

◆ validate()

Status validate (const ITensorInfo *src,
                 const ITensorInfo *weights,
                 const ITensorInfo *biases,
                 const ITensorInfo *output,
                 const PadStrideInfo &conv_info,
                 const WeightsInfo &weights_info = WeightsInfo(),
                 const Size2D &dilation = Size2D(1U, 1U),
                 const ActivationLayerInfo &act_info = ActivationLayerInfo(),
                 bool enable_fast_math = false,
                 unsigned int num_groups = 1)
static

Static function to check if given info will lead to a valid configuration.

Similar to CpuGemmConv2d::configure()

Returns
a status
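
The validate-before-configure idiom lets callers reject unsupported configurations cheaply, over metadata only, before any operator state is built. A generic sketch under invented names (Status here is a simplified stand-in for arm_compute::Status), echoing checks such as the num_groups restriction documented above:

```cpp
#include <cassert>
#include <string>

// Simplified status type: ok flag plus an error description.
struct Status
{
    bool        ok;
    std::string msg;
};

// Allocation-free metadata check, analogous in spirit to a static validate().
Status validate_conv(unsigned int kernel_w, unsigned int kernel_h, unsigned int num_groups)
{
    if(num_groups != 1)
    {
        return { false, "Grouping (num_groups != 1) is not supported" };
    }
    if(kernel_w == 0 || kernel_h == 0)
    {
        return { false, "Kernel dimensions must be non-zero" };
    }
    return { true, "" };
}
```

A caller would invoke this with the same arguments it intends to pass to configure() and only proceed on success, which is exactly how NEGEMMConvolutionLayer::validate() uses CpuGemmConv2d::validate().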

Definition at line 423 of file CpuGemmConv2d.cpp.

References WeightsInfo::are_reshaped(), ARM_COMPUTE_RETURN_ERROR_ON, ARM_COMPUTE_RETURN_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN, ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_LAYOUT, ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES, ARM_COMPUTE_RETURN_ERROR_ON_MSG, ARM_COMPUTE_RETURN_ERROR_ON_NULLPTR, ARM_COMPUTE_RETURN_ON_ERROR, arm_compute::BATCHES, arm_compute::BFLOAT16, arm_compute::CHANNEL, arm_compute::misc::shape_calculator::compute_weights_reshaped_shape(), arm_compute::test::validation::conv_info, ITensorInfo::data_layout(), arm_compute::test::validation::data_type, ITensorInfo::data_type(), ITensorInfo::dimension(), arm_compute::test::validation::dst, arm_compute::F16, arm_compute::F32, arm_compute::get_data_layout_dimension_index(), arm_compute::HEIGHT, arm_compute::is_data_type_quantized_asymmetric(), arm_compute::NCHW, ITensorInfo::num_dimensions(), arm_compute::QASYMM8, arm_compute::QASYMM8_SIGNED, arm_compute::QSYMM8_PER_CHANNEL, ITensorInfo::quantization_info(), arm_compute::S32, arm_compute::scaled_dimensions(), TensorShape::set(), arm_compute::test::validation::set_data_layout(), TensorInfo::set_quantization_info(), arm_compute::test::validation::src, ITensorInfo::tensor_shape(), arm_compute::UNSPECIFIED, CpuCol2ImKernel::validate(), CpuIm2ColKernel::validate(), WeightsInfo::weight_format(), and arm_compute::WIDTH.

Referenced by CpuGemmConv2d::configure(), CpuConv2d::validate(), and NEGEMMConvolutionLayer::validate().

{
    ARM_COMPUTE_RETURN_ERROR_ON_NULLPTR(src, weights, dst);
    ARM_COMPUTE_RETURN_ERROR_ON_MSG(weights_info.are_reshaped(), "Weights already reshaped are not supported!");
    // (data type and data layout validation macros elided in this listing)
    ARM_COMPUTE_RETURN_ERROR_ON_MSG(num_groups > 1, "Grouping (num_groups != 1) is not supported");

    const DataLayout data_layout = src->data_layout();
    const DataType   data_type   = src->data_type();
    const int        idx_width   = get_data_layout_dimension_index(data_layout, DataLayoutDimension::WIDTH);
    const int        idx_height  = get_data_layout_dimension_index(data_layout, DataLayoutDimension::HEIGHT);
    const int        idx_channel = get_data_layout_dimension_index(data_layout, DataLayoutDimension::CHANNEL);
    const int        idx_kernels = get_data_layout_dimension_index(data_layout, DataLayoutDimension::BATCHES);

    const unsigned int kernel_width  = weights->dimension(idx_width);
    const unsigned int kernel_height = weights->dimension(idx_height);

    TensorInfo         im2col_reshaped_info{};
    TensorInfo         info_gemm{};
    TensorInfo         tmp_info{};
    TensorInfo         weights_reshaped_info{};
    const ITensorInfo *gemm_input_to_use  = src;
    const ITensorInfo *gemm_output_to_use = dst;
    const ITensorInfo *weights_to_use     = weights;

    const bool append_bias  = false;
    const bool is_quantized = is_data_type_quantized_asymmetric(data_type);
    const bool is_bf16      = data_type == DataType::BFLOAT16;

    // Get convolved dimensions
    unsigned int conv_w = 0;
    unsigned int conv_h = 0;

    std::tie(conv_w, conv_h) = scaled_dimensions(src->dimension(idx_width),
                                                 src->dimension(idx_height),
                                                 kernel_width,
                                                 kernel_height,
                                                 conv_info,
                                                 dilation);

    // Check if GEMM3D is supported
    const CpuGemmConv2d::SkipInfo skip_info = CpuGemmConv2d::skip_im_col_info(src, weights, conv_info,
                                                                              dilation, act_info);
    const bool skip_im2col = skip_info.skip_im2col, skip_col2im = skip_info.skip_col2im;

    ARM_COMPUTE_RETURN_ERROR_ON(weights->dimension(idx_channel) != src->dimension(idx_channel));
    ARM_COMPUTE_RETURN_ERROR_ON(weights->num_dimensions() > 4);

    // Validate biases
    if(biases != nullptr)
    {
        if(is_quantized)
        {
            ARM_COMPUTE_RETURN_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN(biases, 1, DataType::S32);
        }
        else if(is_bf16)
        {
            ARM_COMPUTE_RETURN_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN(biases, 1, DataType::F32);
        }
        else
        {
            ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES(src, biases);
        }
        ARM_COMPUTE_RETURN_ERROR_ON(biases->dimension(0) != dst->dimension(idx_channel));
        ARM_COMPUTE_RETURN_ERROR_ON(biases->num_dimensions() > 1);
    }

    unsigned int mat_weights_cols = weights->dimension(idx_kernels);
    unsigned int mat_weights_rows = weights->dimension(idx_width) * weights->dimension(idx_height) * weights->dimension(idx_channel);

    weights_reshaped_info = TensorInfo(compute_weights_reshaped_shape(*weights, append_bias), 1, data_type);
    weights_reshaped_info.set_quantization_info(weights->quantization_info());
    weights_to_use = &weights_reshaped_info;

    if(!skip_im2col)
    {
        // Create tensor info for im2col reshaped inputs
        // For CPU, the batch size is on the fourth dimension
        TensorShape shape_im2col = src->tensor_shape();
        shape_im2col.set(0, mat_weights_rows);
        shape_im2col.set(1, conv_w * conv_h);
        shape_im2col.set(2, 1);

        im2col_reshaped_info = TensorInfo(shape_im2col, 1, data_type);
        im2col_reshaped_info.set_quantization_info(src->quantization_info());
        ARM_COMPUTE_RETURN_ON_ERROR(kernels::CpuIm2ColKernel::validate(src, &im2col_reshaped_info, Size2D(kernel_width, kernel_height), conv_info, append_bias, dilation, 1));
        gemm_input_to_use = &im2col_reshaped_info;
    }

    // Create temporary GEMM output tensor in case we cannot skip col2im
    const DataType output_data_type = data_type == DataType::BFLOAT16 ? DataType::F32 : data_type;
    if(!skip_col2im)
    {
        TensorShape shape_gemm = gemm_input_to_use->tensor_shape();
        shape_gemm.set(0, mat_weights_cols);
        shape_gemm.set(1, conv_w * conv_h);
        info_gemm = TensorInfo(shape_gemm, 1, output_data_type);
    }
    else
    {
        info_gemm = TensorInfo(dst->tensor_shape(), 1, output_data_type);
    }
    info_gemm.set_quantization_info(dst->quantization_info()).set_data_layout(src->data_layout());
    gemm_output_to_use      = &info_gemm;
    const bool fixed_format = weights_info.weight_format() != arm_compute::WeightFormat::UNSPECIFIED;

    ARM_COMPUTE_RETURN_ON_ERROR(validate_mm(gemm_input_to_use, weights_to_use, biases, gemm_output_to_use, act_info, enable_fast_math, skip_col2im ? conv_h : 0, skip_im2col, fixed_format,
                                            weights_info.weight_format()));

    // Validate Col2Im/ReshapeLayer
    if(!skip_col2im && (data_layout == DataLayout::NCHW)
537  {
538  ARM_COMPUTE_RETURN_ON_ERROR(kernels::CpuCol2ImKernel::validate(gemm_output_to_use, dst, Size2D(conv_w, conv_h)));
539  }
540 
541  return Status{};
542 }

◆ workspace()

experimental::MemoryRequirements workspace ( ) const  [override, virtual]

Return the memory requirements required by the workspace.

Reimplemented from INEOperator.

Definition at line 668 of file CpuGemmConv2d.cpp.

669 {
670  return _aux_mem;
671 }

The documentation for this class was generated from the following files:

CpuGemmConv2d.h
CpuGemmConv2d.cpp