Compute Library
 23.05
CpuGemmConv2d Class Reference

Basic function to compute the convolution layer. More...

#include <CpuGemmConv2d.h>

Collaboration diagram for CpuGemmConv2d (diagram omitted in this text rendering)

Public Member Functions

 CpuGemmConv2d ()
 Constructor. More...
 
 CpuGemmConv2d (const CpuGemmConv2d &)=delete
 Prevent instances of this class from being copied (As this class contains pointers) More...
 
 CpuGemmConv2d (CpuGemmConv2d &&)=delete
 Prevent instances of this class from being moved (As this class contains non movable objects) More...
 
CpuGemmConv2d & operator= (const CpuGemmConv2d &)=delete
 Prevent instances of this class from being copied (As this class contains pointers) More...
 
CpuGemmConv2d & operator= (CpuGemmConv2d &&)=delete
 Prevent instances of this class from being moved (As this class contains non movable objects) More...
 
 ~CpuGemmConv2d ()
 Destructor. More...
 
void configure (const ITensorInfo *src, const ITensorInfo *weights, const ITensorInfo *biases, ITensorInfo *dst, const PadStrideInfo &conv_info, const WeightsInfo &weights_info=WeightsInfo(), const Size2D &dilation=Size2D(1U, 1U), const ActivationLayerInfo &act_info=ActivationLayerInfo(), bool enable_fast_math=false, unsigned int num_groups=1)
 Set the input and output tensors. More...
 
void run (ITensorPack &tensors) override
 Run the kernels contained in the function. More...
 
void prepare (ITensorPack &tensors) override
 Prepare the function for executing. More...
 
experimental::MemoryRequirements workspace () const override
 Return the memory requirements required by the workspace. More...
 
- Public Member Functions inherited from INEOperator
 INEOperator (IRuntimeContext *ctx=nullptr)
 Constructor. More...
 
 INEOperator (const INEOperator &)=delete
 Prevent instances of this class from being copied (As this class contains pointers) More...
 
 INEOperator (INEOperator &&)=default
 Default move constructor. More...
 
INEOperator & operator= (const INEOperator &)=delete
 Prevent instances of this class from being copied (As this class contains pointers) More...
 
INEOperator & operator= (INEOperator &&)=default
 Default move assignment operator. More...
 
 ~INEOperator ()
 Default destructor. More...
 
- Public Member Functions inherited from IOperator
virtual ~IOperator ()=default
 Destructor. More...
 

Static Public Member Functions

static Status validate (const ITensorInfo *src, const ITensorInfo *weights, const ITensorInfo *biases, const ITensorInfo *output, const PadStrideInfo &conv_info, const WeightsInfo &weights_info=WeightsInfo(), const Size2D &dilation=Size2D(1U, 1U), const ActivationLayerInfo &act_info=ActivationLayerInfo(), bool enable_fast_math=false, unsigned int num_groups=1)
 Static function to check if given info will lead to a valid configuration. More...
 
static Status has_opt_impl (arm_compute::WeightFormat &expected_weight_format, const ITensorInfo *src, const ITensorInfo *weights, const ITensorInfo *biases, const ITensorInfo *output, const PadStrideInfo &conv_info, const WeightsInfo &weights_info=WeightsInfo(), const Size2D &dilation=Size2D(1U, 1U), const ActivationLayerInfo &act_info=ActivationLayerInfo(), const bool enable_fast_math=false)
 Indicates whether or not there is an optimal assembly implementation that can be used to process the given parameters. More...
 

Detailed Description

Basic function to compute the convolution layer.

This function calls the following kernels/functions:

  1. cpu::kernels::CpuIm2ColKernel
  2. CpuGemm (if the data type is BFLOAT16/FP16/FP32)
  3. CpuGemmLowpMatrixMultiplyCore (if the data type is QASYMM8/QASYMM8_SIGNED)
  4. CpuGemmLowpOutputStage (if the data type is QASYMM8/QASYMM8_SIGNED)
  5. cpu::kernels::CpuCol2ImKernel (if NCHW data layout)
  6. kernels::CpuWeightsReshapeKernel

Definition at line 58 of file CpuGemmConv2d.h.
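
Below is a minimal configuration sketch. This operator is internal (applications normally reach it through NEGEMMConvolutionLayer); the include path, tensor shapes and stride/pad values are assumptions for illustration.

    #include "arm_compute/core/TensorInfo.h"
    #include "arm_compute/core/Types.h"
    #include "src/cpu/operators/CpuGemmConv2d.h" // internal header, assumed build setup

    using namespace arm_compute;

    void configure_example()
    {
        // NHWC shapes are ordered [C, W, H, N] in ACL's TensorShape
        TensorInfo src(TensorShape(16U, 32U, 32U, 1U), 1, DataType::F32);
        TensorInfo weights(TensorShape(16U, 3U, 3U, 64U), 1, DataType::F32);
        TensorInfo biases(TensorShape(64U), 1, DataType::F32);
        TensorInfo dst(TensorShape(64U, 32U, 32U, 1U), 1, DataType::F32);
        src.set_data_layout(DataLayout::NHWC);
        weights.set_data_layout(DataLayout::NHWC);
        dst.set_data_layout(DataLayout::NHWC);

        const PadStrideInfo conv_info(1, 1, 1, 1); // stride (1,1), pad (1,1): same-size output

        cpu::CpuGemmConv2d conv;
        conv.configure(&src, &weights, &biases, &dst, conv_info);
    }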

Constructor & Destructor Documentation

◆ CpuGemmConv2d() [1/3]

Constructor.

Definition at line 94 of file CpuGemmConv2d.cpp.

References arm_compute::NCHW.

    : _weights_reshape_kernel(nullptr), _im2col_kernel(), _mm_gemm(), _mm_gemmlowp(), _col2im_kernel(), _reshape_kernel(), _im2col_output(), _weights_reshaped(), _gemm_output(), _gemm_output_3d(),
      _data_layout(DataLayout::NCHW), _skip_im2col(false), _skip_col2im(false), _is_quantized(false), _is_prepared(false), _aux_mem(AuxTensorIdx::Count)
{
}

◆ CpuGemmConv2d() [2/3]

CpuGemmConv2d (const CpuGemmConv2d &)
delete

Prevent instances of this class from being copied (As this class contains pointers)

◆ CpuGemmConv2d() [3/3]

CpuGemmConv2d (CpuGemmConv2d &&)
delete

Prevent instances of this class from being moved (As this class contains non movable objects)

◆ ~CpuGemmConv2d()

~CpuGemmConv2d ( )
default

Destructor.

Member Function Documentation

◆ configure()

void configure (const ITensorInfo * src,
                const ITensorInfo * weights,
                const ITensorInfo * biases,
                ITensorInfo * dst,
                const PadStrideInfo & conv_info,
                const WeightsInfo & weights_info = WeightsInfo(),
                const Size2D & dilation = Size2D(1U, 1U),
                const ActivationLayerInfo & act_info = ActivationLayerInfo(),
                bool enable_fast_math = false,
                unsigned int num_groups = 1)

Set the input and output tensors.

Valid data layouts:

  • NHWC
  • NCHW

Valid data type configurations:

src0            src1                src2      dst
F16             F16                 F16       F16
F32             F32                 F32       F32
BFLOAT16        BFLOAT16            BFLOAT16  BFLOAT16
QASYMM8         QASYMM8             S32       QASYMM8
QASYMM8         QSYMM8_PER_CHANNEL  S32       QASYMM8
QASYMM8_SIGNED  QASYMM8_SIGNED      S32       QASYMM8_SIGNED
QASYMM8_SIGNED  QSYMM8_PER_CHANNEL  S32       QASYMM8_SIGNED
Parameters
[in]  src               Source tensor info. 3 lower dimensions represent a single input [width, height, IFM], while every optional dimension from 4 and above represents a batch of inputs. Data types supported: QASYMM8/QASYMM8_SIGNED/BFLOAT16/F16/F32.
[in]  weights           Weights tensor info. Weights are 4D tensors with dimensions [kernel_x, kernel_y, IFM, OFM]. Data type supported: QASYMM8/QASYMM8_SIGNED/QSYMM8_PER_CHANNEL/BFLOAT16/F16/F32.
[in]  biases            Biases tensor info. Shared biases supported. Biases are 1D tensors with dimensions [OFM]. Data type supported: Should match input data type, except for input of QASYMM8/QASYMM8_SIGNED type where biases should be of S32 type.
[out] dst               Destination tensor info. 3 lower dimensions represent a single output [width, height, OFM], while the rest represent a batch of outputs. Data types supported: Same as input.
[in]  conv_info         Contains padding and stride information described in PadStrideInfo.
[in]  weights_info      Specifies if the weights tensor has been reshaped with NEWeightsReshapeKernel. If this is not part of the fully connected layer the weights tensor has also been transposed with cpu::kernels::CpuGemmTranspose1xWKernel. Data type supported: Same as input.
[in]  dilation          (Optional) Dilation, in elements, across x and y. Defaults to (1, 1).
[in]  act_info          (Optional) Activation layer information in case of a fused activation. Only RELU, BOUNDED_RELU and LU_BOUNDED_RELU are supported.
[in]  enable_fast_math  (Optional) Enable fast math computation. When enabled, the function may dispatch the fastest implementation available, which can reduce accuracy. Defaults to false.
[in]  num_groups        (Optional) Number of groups when performing a grouped convolution. num_groups != 1 is not supported.

Definition at line 256 of file CpuGemmConv2d.cpp.

References ARM_COMPUTE_ERROR_ON_MSG, ARM_COMPUTE_ERROR_ON_NULLPTR, ARM_COMPUTE_ERROR_THROW_ON, ARM_COMPUTE_LOG_PARAMS, ARM_COMPUTE_UNUSED, arm_compute::BATCHES, arm_compute::BFLOAT16, arm_compute::block_by(), arm_compute::CHANNEL, arm_compute::test::validation::conv_info, ITensorInfo::data_layout(), arm_compute::test::validation::data_layout, arm_compute::test::validation::data_type, ITensorInfo::data_type(), ITensorInfo::dimension(), arm_compute::test::validation::dst, arm_compute::F32, arm_compute::get_data_layout_dimension_index(), arm_compute::HEIGHT, arm_compute::is_data_type_quantized_asymmetric(), arm_compute::NCHW, arm_compute::NHWC, arm_compute::offset_int_vec(), arm_compute::experimental::Prepare, ITensorInfo::quantization_info(), WeightsInfo::retain_internal_weights(), arm_compute::scaled_dimensions(), TensorShape::set(), ITensorInfo::set_data_layout(), arm_compute::test::validation::set_data_layout(), TensorInfo::set_data_type(), TensorInfo::set_quantization_info(), arm_compute::test::validation::src, PadStrideInfo::stride(), TensorInfo::tensor_shape(), TensorInfo::total_size(), arm_compute::UNSPECIFIED, CpuGemmConv2d::validate(), WeightsInfo::weight_format(), and arm_compute::WIDTH.

{
    ARM_COMPUTE_ERROR_ON_NULLPTR(src, weights, dst);
    ARM_COMPUTE_UNUSED(num_groups);
    ARM_COMPUTE_ERROR_THROW_ON(CpuGemmConv2d::validate(src,
                                                       weights,
                                                       biases,
                                                       dst,
                                                       conv_info,
                                                       weights_info,
                                                       dilation,
                                                       act_info,
                                                       enable_fast_math,
                                                       num_groups));
    ARM_COMPUTE_LOG_PARAMS(src, weights, biases, dst, conv_info, weights_info, dilation, act_info, enable_fast_math, num_groups);

    const DataType   data_type   = src->data_type();
    const DataLayout data_layout = src->data_layout();
    const int        idx_width   = get_data_layout_dimension_index(data_layout, DataLayoutDimension::WIDTH);
    const int        idx_height  = get_data_layout_dimension_index(data_layout, DataLayoutDimension::HEIGHT);
    const int        idx_channel = get_data_layout_dimension_index(data_layout, DataLayoutDimension::CHANNEL);
    const int        idx_kernels = get_data_layout_dimension_index(data_layout, DataLayoutDimension::BATCHES);

    const unsigned int kernel_width  = weights->dimension(idx_width);
    const unsigned int kernel_height = weights->dimension(idx_height);

    _is_prepared  = weights_info.retain_internal_weights();
    _is_quantized = is_data_type_quantized_asymmetric(src->data_type());
    _data_layout  = data_layout;
    _skip_im2col  = (data_layout == DataLayout::NHWC && kernel_width == 1 && kernel_height == 1 && conv_info.stride().first == 1 && conv_info.stride().second == 1);

    const ITensorInfo *gemm_input_to_use  = src;
    ITensorInfo       *gemm_output_to_use = dst;

    // Get convolved dimensions
    unsigned int conv_w = 0;
    unsigned int conv_h = 0;
    std::tie(conv_w, conv_h) = scaled_dimensions(src->dimension(idx_width),
                                                 src->dimension(idx_height),
                                                 kernel_width,
                                                 kernel_height,
                                                 conv_info,
                                                 dilation);

    ARM_COMPUTE_ERROR_ON_MSG((dst->dimension(idx_width) != conv_w) || (dst->dimension(idx_height) != conv_h),
                             "Output shape does not match the expected one");

    // Check if GEMM3D is supported
    const CpuGemmConv2d::SkipInfo skip_info = CpuGemmConv2d::skip_im_col_info(src, weights, conv_info, dilation, act_info);
    _skip_im2col = skip_info.skip_im2col;
    _skip_col2im = skip_info.skip_col2im;

    // Get parameters from conv_info
    unsigned int stride_x = 0;
    unsigned int stride_y = 0;
    std::tie(stride_x, stride_y) = conv_info.stride();

    unsigned int mat_weights_cols = weights->dimension(idx_kernels);

    // _weights_reshaped will be auto configured in the kernel.
    // Just append biases and do not transpose 1xW as it will be reshaped in CpuGemm
    _weights_reshape_kernel = std::make_unique<kernels::CpuWeightsReshapeKernel>();
    _weights_reshape_kernel->configure(weights, nullptr, &_weights_reshaped);
    _weights_reshaped.set_quantization_info(weights->quantization_info());

    // Create tensor to store im2col reshaped inputs
    if(!_skip_im2col)
    {
        const int    block_by        = arm_compute::block_by(weights_info.weight_format());
        unsigned int input_pad_right = 0;
        if(block_by > 1)
        {
            input_pad_right = (src->dimension(idx_channel) % block_by) == 0 ? 0 : block_by - (src->dimension(idx_channel) % block_by);
        }
        // Configure
        _im2col_kernel = std::make_unique<kernels::CpuIm2ColKernel>();
        _im2col_kernel->configure(src, &_im2col_output, Size2D(kernel_width, kernel_height), conv_info, false, dilation, num_groups, input_pad_right);

        // Update GEMM input
        gemm_input_to_use = &_im2col_output;
    }

    // Create temporary GEMM output tensor in case we cannot skip col2im
    const DataType output_data_type = data_type == DataType::BFLOAT16 ? DataType::F32 : data_type;
    if(!_skip_col2im)
    {
        TensorShape shape_gemm;

        // Calculate GEMM output shape
        shape_gemm = _im2col_output.tensor_shape();
        shape_gemm.set(0, mat_weights_cols);
        shape_gemm.set(1, conv_w * conv_h);

        _gemm_output = TensorInfo(shape_gemm, 1, output_data_type);
        _gemm_output.set_quantization_info(dst->quantization_info()).set_data_layout(src->data_layout());
        _gemm_output_3d = TensorInfo(_gemm_output);

        // Update GEMM output
        gemm_output_to_use = &_gemm_output;
    }
    else
    {
        _gemm_output_3d = TensorInfo(*dst);
        _gemm_output_3d.set_data_type(output_data_type).set_data_layout(src->data_layout()).set_is_resizable(true);
        _gemm_output = TensorInfo(_gemm_output_3d);

        // Update GEMM output
        gemm_output_to_use = &_gemm_output_3d;
    }

    // Configure GEMM
    // In case we need to skip col2im, GEMM3D (gemm_3d_depth != 0) must be called in order to avoid reshaping the output matrix
    const unsigned int gemm_3d_depth = _skip_col2im ? conv_h : 0;
    const bool         fixed_format  = weights_info.weight_format() != arm_compute::WeightFormat::UNSPECIFIED;
    configure_mm(gemm_input_to_use, &_weights_reshaped, biases, gemm_output_to_use, act_info, enable_fast_math, gemm_3d_depth, fixed_format, weights_info.weight_format());

    if(!_skip_col2im && _data_layout == DataLayout::NCHW)
    {
        // Configure col2im
        _col2im_kernel = std::make_unique<kernels::CpuCol2ImKernel>();
        _col2im_kernel->configure(gemm_output_to_use, dst, Size2D(conv_w, conv_h));
    }
    else
    {
        // Configure reshape layer
        _reshape_kernel = std::make_unique<kernels::CpuReshapeKernel>();
        _reshape_kernel->configure(gemm_output_to_use, dst);
    }

    // Check if GEMM transforms weights
    // Modernise through COMPMID-4535
    bool gemm_trans_wei = _aux_mem[1].size > 0;                                            // Asm Pretranspose
    gemm_trans_wei      = _mm_gemm != nullptr ? _aux_mem[3].size > 0 : gemm_trans_wei;     // Transpose RHS
    gemm_trans_wei      = _mm_gemmlowp != nullptr ? _aux_mem[5].size > 0 : gemm_trans_wei; // Transpose RHS

    // Check lifetime
    _aux_mem[Im2ColOutput]    = MemoryInfo(offset_int_vec(Im2ColOutput), MemoryLifetime::Temporary, _im2col_output.total_size());
    _aux_mem[WeightsReshaped] = MemoryInfo(offset_int_vec(WeightsReshaped), gemm_trans_wei ? MemoryLifetime::Prepare : MemoryLifetime::Persistent, _weights_reshaped.total_size());
    _aux_mem[GemmOutput]      = MemoryInfo(offset_int_vec(GemmOutput), MemoryLifetime::Temporary, _gemm_output.total_size());
}

◆ has_opt_impl()

Status has_opt_impl (arm_compute::WeightFormat & expected_weight_format,
                     const ITensorInfo * src,
                     const ITensorInfo * weights,
                     const ITensorInfo * biases,
                     const ITensorInfo * output,
                     const PadStrideInfo & conv_info,
                     const WeightsInfo & weights_info = WeightsInfo(),
                     const Size2D & dilation = Size2D(1U, 1U),
                     const ActivationLayerInfo & act_info = ActivationLayerInfo(),
                     const bool enable_fast_math = false)
static

Indicates whether or not there is an optimal assembly implementation that can be used to process the given parameters.

The parameter list is the same as NEGEMMConvolutionLayer::has_opt_impl.

Returns
a status.

Definition at line 398 of file CpuGemmConv2d.cpp.

References arm_compute::test::validation::conv_info, ITensorInfo::data_layout(), ITensorInfo::dimension(), arm_compute::test::validation::gemm_info, arm_compute::get_data_layout_dimension_index(), CpuGemm::has_opt_impl(), arm_compute::HEIGHT, arm_compute::scaled_dimensions(), arm_compute::UNSPECIFIED, WeightsInfo::weight_format(), and arm_compute::WIDTH.

Referenced by NEGEMMConvolutionLayer::has_opt_impl().

{
    const DataLayout   data_layout   = src->data_layout();
    const int          idx_width     = get_data_layout_dimension_index(data_layout, DataLayoutDimension::WIDTH);
    const int          idx_height    = get_data_layout_dimension_index(data_layout, DataLayoutDimension::HEIGHT);
    const unsigned int kernel_width  = weights->dimension(idx_width);
    const unsigned int kernel_height = weights->dimension(idx_height);
    unsigned int       conv_w        = 0;
    unsigned int       conv_h        = 0;
    std::tie(conv_w, conv_h) = scaled_dimensions(src->dimension(idx_width),
                                                 src->dimension(idx_height),
                                                 kernel_width,
                                                 kernel_height,
                                                 conv_info,
                                                 dilation);

    const CpuGemmConv2d::SkipInfo skip_info = CpuGemmConv2d::skip_im_col_info(src, weights, conv_info,
                                                                              dilation, act_info);

    const bool         skip_im2col   = skip_info.skip_im2col;
    const bool         skip_col2im   = skip_info.skip_col2im;
    const unsigned int gemm_3d_depth = skip_col2im ? conv_h : 0;
    const bool         fixed_format  = weights_info.weight_format() != arm_compute::WeightFormat::UNSPECIFIED;
    const GEMMInfo     gemm_info     = GEMMInfo(false, false, true /* Reshape weights only for the first run */,
                                                gemm_3d_depth, skip_im2col /* Reinterpret the input as 3D if im2col is skipped */,
                                                false, GEMMLowpOutputStageInfo(), false, enable_fast_math, false, act_info, experimental::PostOpList<ITensorInfo *>(), fixed_format, weights_info.weight_format());

    return CpuGemm::has_opt_impl(expected_weight_format, src, weights, biases, dst, gemm_info);
}
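
A hedged sketch of probing for the fast path before committing to a weight layout (pick_fast_path is a hypothetical helper; the WeightsInfo kernel dimensions are assumptions):

    #include "arm_compute/core/Types.h"
    #include "src/cpu/operators/CpuGemmConv2d.h" // internal header, assumed build setup

    using namespace arm_compute;

    bool pick_fast_path(const ITensorInfo *src, const ITensorInfo *weights, const ITensorInfo *biases,
                        const ITensorInfo *dst, const PadStrideInfo &conv_info)
    {
        WeightFormat      expected_wf = WeightFormat::ANY; // let the backend choose
        const WeightsInfo winfo(false /* are_reshaped */, 3U, 3U, 64U /* num_kernels */,
                                false /* retain_internal_weights */, WeightFormat::ANY);
        const Status st = cpu::CpuGemmConv2d::has_opt_impl(expected_wf, src, weights, biases, dst,
                                                           conv_info, winfo);
        // On success, expected_wf names the blocked layout the assembly kernel wants;
        // callers re-order the weights into it and call configure() with the same WeightsInfo.
        return bool(st);
    }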

◆ operator=() [1/2]

CpuGemmConv2d & operator= (const CpuGemmConv2d &)
delete

Prevent instances of this class from being copied (As this class contains pointers)

◆ operator=() [2/2]

CpuGemmConv2d & operator= (CpuGemmConv2d &&)
delete

Prevent instances of this class from being moved (As this class contains non movable objects)

◆ prepare()

void prepare (ITensorPack & constants)
override virtual

Prepare the function for executing.

Any one-off pre-processing step required by the function is handled here.

Parameters
[in] constants Vector that contains the constant tensors.
Note
Prepare stage might not need all the function's buffers' backing memory to be available in order to execute

Reimplemented from INEOperator.

Definition at line 659 of file CpuGemmConv2d.cpp.

References arm_compute::ACL_DST, arm_compute::ACL_SRC, arm_compute::ACL_SRC_1, ITensorPack::add_const_tensor(), Scheduler::get(), CpuAuxTensorHandler::get(), ITensorPack::get_const_tensor(), arm_compute::offset_int_vec(), arm_compute::test::validation::pack, and IScheduler::schedule_op().

Referenced by CpuGemmConv2d::run().

{
    if(!_is_prepared)
    {
        // Variable weights executions that use fixed-format kernels
        // need no reshaping of the weights.
        if(this->isVarWeightsKernel())
        {
            _is_quantized ? _mm_gemmlowp->prepare(tensors) : _mm_gemm->prepare(tensors);
            _is_prepared = true;
            return;
        }

        // Run weights reshaping and mark original weights tensor as unused
        CpuAuxTensorHandler weights_reshaped(offset_int_vec(WeightsReshaped), _weights_reshaped, tensors);
        auto                weights = tensors.get_const_tensor(TensorType::ACL_SRC_1);
        ITensorPack         pack =
        {
            { TensorType::ACL_SRC, weights },
            { TensorType::ACL_DST, weights_reshaped.get() }
        };
        NEScheduler::get().schedule_op(_weights_reshape_kernel.get(), 3, _weights_reshape_kernel->window(), pack);
        weights->mark_as_unused();
        ITensorPack gemm_pack = tensors;
        gemm_pack.add_const_tensor(TensorType::ACL_SRC_1, weights_reshaped.get());
        _is_quantized ? _mm_gemmlowp->prepare(gemm_pack) : _mm_gemm->prepare(gemm_pack);
        _is_prepared = true;
    }
}
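
A usage sketch of the prepare-once pattern (tensor names are assumptions; slots follow the ACL_SRC_0/ACL_SRC_1/ACL_SRC_2/ACL_DST convention visible in the listing above). Auxiliary workspace tensors can also be packed under the slot ids reported by workspace(); when they are absent, CpuAuxTensorHandler falls back to allocating its own memory.

    #include "arm_compute/runtime/Tensor.h"
    #include "src/cpu/operators/CpuGemmConv2d.h" // internal header, assumed build setup

    using namespace arm_compute;

    void prepare_once(cpu::CpuGemmConv2d &conv, Tensor &input, Tensor &weights, Tensor &biases, Tensor &output)
    {
        ITensorPack pack;
        pack.add_const_tensor(TensorType::ACL_SRC_0, &input);
        pack.add_const_tensor(TensorType::ACL_SRC_1, &weights);
        pack.add_const_tensor(TensorType::ACL_SRC_2, &biases);
        pack.add_tensor(TensorType::ACL_DST, &output);

        conv.prepare(pack); // reshapes the weights; the original tensor is marked unused
    }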

◆ run()

void run (ITensorPack & tensors)
override virtual

Run the kernels contained in the function.

Parameters
[in] tensors Vector that contains the tensors to operate on.

Reimplemented from INEOperator.

Definition at line 564 of file CpuGemmConv2d.cpp.

References arm_compute::ACL_DST, arm_compute::ACL_SRC, arm_compute::ACL_SRC_0, arm_compute::ACL_SRC_1, ITensorPack::add_const_tensor(), ITensorPack::add_tensor(), Tensor::allocator(), BorderSize::bottom, ITensor::buffer(), Window::DimY, arm_compute::test::validation::dst, TensorInfo::extend_padding(), Scheduler::get(), CpuAuxTensorHandler::get(), ITensorPack::get_const_tensor(), arm_compute::get_data_layout_dimension_index(), ITensorPack::get_tensor(), arm_compute::HEIGHT, TensorAllocator::import_memory(), ITensor::info(), arm_compute::NCHW, arm_compute::offset_int_vec(), arm_compute::test::validation::pack, ITensorInfo::padding(), CpuGemmConv2d::prepare(), IScheduler::schedule_op(), ITensorAllocator::soft_init(), arm_compute::test::validation::src, and BorderSize::top.

{
    prepare(tensors);

    auto src = tensors.get_const_tensor(ACL_SRC_0);
    auto dst = tensors.get_tensor(ACL_DST);
    auto gemm_input_to_use = src;

    CpuAuxTensorHandler im2col_output(offset_int_vec(Im2ColOutput), _im2col_output, tensors, false);
    CpuAuxTensorHandler gemm_output(offset_int_vec(GemmOutput), _gemm_output, tensors, false);
    CpuAuxTensorHandler reshaped_wei(offset_int_vec(WeightsReshaped), _weights_reshaped, tensors, false);

    bool out_has_padding = _skip_col2im && (dst->info()->padding().bottom != 0 || dst->info()->padding().top != 0);
    if(!_skip_im2col)
    {
        // Run input reshaping
        unsigned int y_dim = get_data_layout_dimension_index(_data_layout, DataLayoutDimension::HEIGHT);
        ITensorPack  pack =
        {
            { TensorType::ACL_SRC, src },
            { TensorType::ACL_DST, im2col_output.get() }
        };
        NEScheduler::get().schedule_op(_im2col_kernel.get(), y_dim, _im2col_kernel->window(), pack);
        gemm_input_to_use = im2col_output.get();
    }

    // Handle the case where output has top/bottom padding
    const ITensor *out_to_use = out_has_padding ? gemm_output.get() : dst;
    Tensor         gemm3d;
    _gemm_output_3d.extend_padding(out_to_use->info()->padding());
    gemm3d.allocator()->soft_init(_gemm_output_3d);
    gemm3d.allocator()->import_memory(out_to_use->buffer());
    auto gemm_output_to_use = gemm_output.get();

    if(_skip_im2col)
    {
        gemm_output_to_use = &gemm3d;
    }
    if(_skip_col2im && !out_has_padding)
    {
        gemm_output_to_use = dst;
    }

    // Runs CpuGemm or CpuGemmLowpMatrixMultiplyCore functions
    ITensorPack pack_mm = tensors;
    pack_mm.add_const_tensor(TensorType::ACL_SRC_0, gemm_input_to_use);
    if(!this->isVarWeightsKernel())
    {
        pack_mm.add_const_tensor(TensorType::ACL_SRC_1, reshaped_wei.get());
    }
    pack_mm.add_tensor(TensorType::ACL_DST, gemm_output_to_use);
    if(_is_quantized)
    {
        // Run gemmlowp
        _mm_gemmlowp->run(pack_mm);
    }
    else
    {
        // Run gemm
        _mm_gemm->run(pack_mm);
    }

    // Reshape output matrix
    if(!_skip_col2im)
    {
        if(_data_layout == DataLayout::NCHW)
        {
            ITensorPack pack =
            {
                { TensorType::ACL_SRC, gemm_output.get() },
                { TensorType::ACL_DST, dst }
            };
            NEScheduler::get().schedule_op(_col2im_kernel.get(), Window::DimY, _col2im_kernel->window(), pack);
        }
        else
        {
            ITensorPack pack =
            {
                { TensorType::ACL_SRC, gemm_output_to_use },
                { TensorType::ACL_DST, dst }
            };
            NEScheduler::get().schedule_op(_reshape_kernel.get(), Window::DimY, _reshape_kernel->window(), pack);
        }
    }
    else if(out_has_padding)
    {
        ITensorPack pack =
        {
            { TensorType::ACL_SRC, gemm_output_to_use },
            { TensorType::ACL_DST, dst }
        };
        NEScheduler::get().schedule_op(_reshape_kernel.get(), Window::DimY, _reshape_kernel->window(), pack);
    }
}
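
Continuing the prepare_once() sketch above, steady-state execution is a plain loop; run() calls prepare() itself on first use, so the explicit call is an optimisation rather than a requirement:

    // Sketch: repeated inference over the same pack (names are assumptions).
    void run_many(cpu::CpuGemmConv2d &conv, ITensorPack &pack, size_t iterations)
    {
        for(size_t i = 0; i < iterations; ++i)
        {
            // refresh the input tensor's contents here, then:
            conv.run(pack); // im2col -> (CpuGemm | CpuGemmLowpMatrixMultiplyCore) -> col2im/reshape
        }
    }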

◆ validate()

Status validate (const ITensorInfo * src,
                 const ITensorInfo * weights,
                 const ITensorInfo * biases,
                 const ITensorInfo * output,
                 const PadStrideInfo & conv_info,
                 const WeightsInfo & weights_info = WeightsInfo(),
                 const Size2D & dilation = Size2D(1U, 1U),
                 const ActivationLayerInfo & act_info = ActivationLayerInfo(),
                 bool enable_fast_math = false,
                 unsigned int num_groups = 1)
static

Static function to check if given info will lead to a valid configuration.

Similar to CpuGemmConv2d::configure()

Returns
a status

Definition at line 430 of file CpuGemmConv2d.cpp.

References WeightsInfo::are_reshaped(), ARM_COMPUTE_RETURN_ERROR_ON, ARM_COMPUTE_RETURN_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN, ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_LAYOUT, ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES, ARM_COMPUTE_RETURN_ERROR_ON_MSG, ARM_COMPUTE_RETURN_ERROR_ON_NULLPTR, ARM_COMPUTE_RETURN_ON_ERROR, arm_compute::BATCHES, arm_compute::BFLOAT16, arm_compute::block_by(), arm_compute::CHANNEL, arm_compute::misc::shape_calculator::compute_weights_reshaped_shape(), arm_compute::test::validation::conv_info, ITensorInfo::data_layout(), arm_compute::test::validation::data_type, ITensorInfo::data_type(), ITensorInfo::dimension(), arm_compute::test::validation::dst, arm_compute::F16, arm_compute::F32, arm_compute::get_data_layout_dimension_index(), arm_compute::HEIGHT, arm_compute::is_data_type_quantized_asymmetric(), arm_compute::is_fixed_format(), arm_compute::NCHW, ITensorInfo::num_dimensions(), arm_compute::QASYMM8, arm_compute::QASYMM8_SIGNED, arm_compute::QSYMM8_PER_CHANNEL, ITensorInfo::quantization_info(), arm_compute::S32, arm_compute::scaled_dimensions(), TensorShape::set(), arm_compute::test::validation::set_data_layout(), TensorInfo::set_quantization_info(), arm_compute::test::validation::src, ITensorInfo::tensor_shape(), arm_compute::UNSPECIFIED, CpuCol2ImKernel::validate(), CpuIm2ColKernel::validate(), WeightsInfo::weight_format(), and arm_compute::WIDTH.

Referenced by CpuGemmConv2d::configure(), CpuConv2d::validate(), and NEGEMMConvolutionLayer::validate().

{
    ARM_COMPUTE_RETURN_ERROR_ON_NULLPTR(src, weights, dst);
    ARM_COMPUTE_RETURN_ERROR_ON_MSG(weights_info.are_reshaped(), "Weights already reshaped are not supported!");
    ARM_COMPUTE_RETURN_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN(src, 1, DataType::QASYMM8, DataType::QASYMM8_SIGNED, DataType::BFLOAT16, DataType::F16, DataType::F32);
    ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_LAYOUT(src, weights);

    if(!is_fixed_format(weights_info.weight_format()))
    {
        ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES(src, weights);
    }

    ARM_COMPUTE_RETURN_ERROR_ON_MSG(num_groups > 1, "Grouping (num_groups != 1) is not supported");

    const DataLayout data_layout = src->data_layout();
    const DataType   data_type   = src->data_type();
    const int        idx_width   = get_data_layout_dimension_index(data_layout, DataLayoutDimension::WIDTH);
    const int        idx_height  = get_data_layout_dimension_index(data_layout, DataLayoutDimension::HEIGHT);
    const int        idx_channel = get_data_layout_dimension_index(data_layout, DataLayoutDimension::CHANNEL);
    const int        idx_kernels = get_data_layout_dimension_index(data_layout, DataLayoutDimension::BATCHES);

    const unsigned int kernel_width  = weights->dimension(idx_width);
    const unsigned int kernel_height = weights->dimension(idx_height);

    TensorInfo         im2col_reshaped_info{};
    TensorInfo         info_gemm{};
    TensorInfo         tmp_info{};
    TensorInfo         weights_reshaped_info{};
    const ITensorInfo *gemm_input_to_use  = src;
    const ITensorInfo *gemm_output_to_use = dst;
    const ITensorInfo *weights_to_use     = weights;

    const bool append_bias  = false;
    const bool is_quantized = is_data_type_quantized_asymmetric(data_type);
    const bool is_bf16      = data_type == DataType::BFLOAT16;

    // Get convolved dimensions
    unsigned int conv_w = 0;
    unsigned int conv_h = 0;

    std::tie(conv_w, conv_h) = scaled_dimensions(src->dimension(idx_width),
                                                 src->dimension(idx_height),
                                                 kernel_width,
                                                 kernel_height,
                                                 conv_info,
                                                 dilation);

    // Check if GEMM3D is supported
    const CpuGemmConv2d::SkipInfo skip_info = CpuGemmConv2d::skip_im_col_info(src, weights, conv_info,
                                                                              dilation, act_info);
    const bool skip_im2col = skip_info.skip_im2col, skip_col2im = skip_info.skip_col2im;

    ARM_COMPUTE_RETURN_ERROR_ON(weights->dimension(idx_channel) != src->dimension(idx_channel));
    ARM_COMPUTE_RETURN_ERROR_ON(weights->num_dimensions() > 4);

    // Validate biases
    if(biases != nullptr)
    {
        if(is_quantized)
        {
            ARM_COMPUTE_RETURN_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN(biases, 1, DataType::S32);
        }
        else if(is_bf16)
        {
            ARM_COMPUTE_RETURN_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN(biases, 1, DataType::F32);
        }
        else
        {
            ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES(src, biases);
        }
        ARM_COMPUTE_RETURN_ERROR_ON(biases->dimension(0) != dst->dimension(idx_channel));
        ARM_COMPUTE_RETURN_ERROR_ON(biases->num_dimensions() > 1);
    }

    unsigned int mat_weights_cols = weights->dimension(idx_kernels);
    unsigned int mat_weights_rows = weights->dimension(idx_width) * weights->dimension(idx_height) * weights->dimension(idx_channel);

    weights_reshaped_info = TensorInfo(compute_weights_reshaped_shape(*weights, append_bias), 1, weights->data_type());
    weights_reshaped_info.set_quantization_info(weights->quantization_info());
    weights_to_use = &weights_reshaped_info;

    if(!skip_im2col)
    {
        const int block_by        = arm_compute::block_by(weights_info.weight_format());
        int       input_pad_right = 0;
        if(block_by > 1)
        {
            input_pad_right  = (src->dimension(idx_channel) % block_by) == 0 ? 0 : block_by - (src->dimension(idx_channel) % block_by);
            mat_weights_rows = weights->dimension(idx_width) * weights->dimension(idx_height) * (weights->dimension(idx_channel) + input_pad_right);
        }

        // Create tensor info for im2col reshaped inputs
        // For CPU, the batch size is on the fourth dimension
        TensorShape shape_im2col = src->tensor_shape();
        shape_im2col.set(0, mat_weights_rows);
        shape_im2col.set(1, conv_w * conv_h);
        shape_im2col.set(2, 1);

        im2col_reshaped_info = TensorInfo(shape_im2col, 1, data_type);
        im2col_reshaped_info.set_quantization_info(src->quantization_info());
        ARM_COMPUTE_RETURN_ON_ERROR(kernels::CpuIm2ColKernel::validate(src, &im2col_reshaped_info, Size2D(kernel_width, kernel_height), conv_info, append_bias, dilation, num_groups, input_pad_right));
        gemm_input_to_use = &im2col_reshaped_info;
    }

    // Create temporary GEMM output tensor in case we cannot skip col2im
    const DataType output_data_type = data_type == DataType::BFLOAT16 ? DataType::F32 : data_type;
    if(!skip_col2im)
    {
        TensorShape shape_gemm = gemm_input_to_use->tensor_shape();
        shape_gemm.set(0, mat_weights_cols);
        shape_gemm.set(1, conv_w * conv_h);
        info_gemm = TensorInfo(shape_gemm, 1, output_data_type);
    }
    else
    {
        info_gemm = TensorInfo(dst->tensor_shape(), 1, output_data_type);
    }
    info_gemm.set_quantization_info(dst->quantization_info()).set_data_layout(src->data_layout());
    gemm_output_to_use = &info_gemm;
    const bool fixed_format = weights_info.weight_format() != arm_compute::WeightFormat::UNSPECIFIED;

    ARM_COMPUTE_RETURN_ON_ERROR(validate_mm(gemm_input_to_use, weights_to_use, biases, gemm_output_to_use, act_info, enable_fast_math, skip_col2im ? conv_h : 0, skip_im2col, fixed_format,
                                            weights_info.weight_format()));

    // Validate Col2Im/ReshapeLayer
    if(!skip_col2im && (data_layout == DataLayout::NCHW))
    {
        ARM_COMPUTE_RETURN_ON_ERROR(kernels::CpuCol2ImKernel::validate(gemm_output_to_use, dst, Size2D(conv_w, conv_h)));
    }

    return Status{};
}
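
Because validate() performs the full shape and type checking without allocating any memory, it is cheap to call at graph-construction time. A sketch (shapes are assumptions):

    #include <iostream>
    #include "src/cpu/operators/CpuGemmConv2d.h" // internal header, assumed build setup

    using namespace arm_compute;

    bool can_run_f32_3x3()
    {
        TensorInfo src(TensorShape(16U, 32U, 32U, 1U), 1, DataType::F32);
        TensorInfo weights(TensorShape(16U, 3U, 3U, 64U), 1, DataType::F32);
        TensorInfo biases(TensorShape(64U), 1, DataType::F32);
        TensorInfo dst(TensorShape(64U, 32U, 32U, 1U), 1, DataType::F32);
        src.set_data_layout(DataLayout::NHWC);
        weights.set_data_layout(DataLayout::NHWC);
        dst.set_data_layout(DataLayout::NHWC);

        const Status st = cpu::CpuGemmConv2d::validate(&src, &weights, &biases, &dst,
                                                       PadStrideInfo(1, 1, 1, 1));
        if(!bool(st))
        {
            std::cerr << st.error_description() << '\n';
        }
        return bool(st);
    }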

◆ workspace()

experimental::MemoryRequirements workspace () const
override virtual

Return the memory requirements required by the workspace.

Reimplemented from INEOperator.

Definition at line 688 of file CpuGemmConv2d.cpp.

{
    return _aux_mem;
}
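
Each experimental::MemoryInfo in the returned vector carries a slot id, a lifetime and a byte size; the backing tensor is added to the run-time pack under that slot id. A sketch of servicing the requirements by hand (production code uses the MemoryGroup/CpuAuxTensorHandler helpers rather than raw U8 tensors, and also honours the requested alignment):

    #include <memory>
    #include <vector>
    #include "arm_compute/core/ITensorPack.h"
    #include "arm_compute/core/experimental/Types.h"
    #include "arm_compute/runtime/Tensor.h"

    using namespace arm_compute;

    void add_workspace(const experimental::MemoryRequirements &reqs, ITensorPack &pack,
                       std::vector<std::unique_ptr<Tensor>> &keep_alive)
    {
        for(const auto &req : reqs)
        {
            if(req.size == 0)
            {
                continue; // slot not needed for this configuration
            }
            auto aux = std::make_unique<Tensor>();
            aux->allocator()->init(TensorInfo(TensorShape(req.size), 1, DataType::U8));
            aux->allocator()->allocate();
            pack.add_tensor(req.slot, aux.get()); // slot matches offset_int_vec(<AuxTensorIdx>)
            keep_alive.emplace_back(std::move(aux));
        }
    }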

The documentation for this class was generated from the following files:

  • CpuGemmConv2d.h
  • CpuGemmConv2d.cpp