Compute Library
 23.05
CpuGemmConv2d Class Reference

Basic function to compute the convolution layer. More...

#include <CpuGemmConv2d.h>

Collaboration diagram for CpuGemmConv2d (diagram omitted in this text rendering)

Public Member Functions

 CpuGemmConv2d ()
 Constructor. More...
 
 CpuGemmConv2d (const CpuGemmConv2d &)=delete
 Prevent instances of this class from being copied (As this class contains pointers) More...
 
 CpuGemmConv2d (CpuGemmConv2d &&)=delete
 Prevent instances of this class from being moved (As this class contains non movable objects) More...
 
CpuGemmConv2d & operator= (const CpuGemmConv2d &)=delete
 Prevent instances of this class from being copied (As this class contains pointers) More...
 
CpuGemmConv2d & operator= (CpuGemmConv2d &&)=delete
 Prevent instances of this class from being moved (As this class contains non movable objects) More...
 
 ~CpuGemmConv2d ()
 Destructor. More...
 
void configure (const ITensorInfo *src, const ITensorInfo *weights, const ITensorInfo *biases, ITensorInfo *dst, const PadStrideInfo &conv_info, const WeightsInfo &weights_info=WeightsInfo(), const Size2D &dilation=Size2D(1U, 1U), const ActivationLayerInfo &act_info=ActivationLayerInfo(), bool enable_fast_math=false, unsigned int num_groups=1)
 Set the input and output tensors. More...
 
void run (ITensorPack &tensors) override
 Run the kernels contained in the function. More...
 
void prepare (ITensorPack &tensors) override
 Prepare the function for executing. More...
 
experimental::MemoryRequirements workspace () const override
 Return the memory requirements required by the workspace. More...
 
- Public Member Functions inherited from INEOperator
 INEOperator (IRuntimeContext *ctx=nullptr)
 Constructor. More...
 
 INEOperator (const INEOperator &)=delete
 Prevent instances of this class from being copied (As this class contains pointers) More...
 
 INEOperator (INEOperator &&)=default
 Default move constructor. More...
 
INEOperator & operator= (const INEOperator &)=delete
 Prevent instances of this class from being copied (As this class contains pointers) More...
 
INEOperator & operator= (INEOperator &&)=default
 Default move assignment operator. More...
 
 ~INEOperator ()
 Default destructor. More...
 
- Public Member Functions inherited from IOperator
virtual ~IOperator ()=default
 Destructor. More...
 

Static Public Member Functions

static Status validate (const ITensorInfo *src, const ITensorInfo *weights, const ITensorInfo *biases, const ITensorInfo *output, const PadStrideInfo &conv_info, const WeightsInfo &weights_info=WeightsInfo(), const Size2D &dilation=Size2D(1U, 1U), const ActivationLayerInfo &act_info=ActivationLayerInfo(), bool enable_fast_math=false, unsigned int num_groups=1)
 Static function to check if given info will lead to a valid configuration. More...
 
static Status has_opt_impl (arm_compute::WeightFormat &expected_weight_format, const ITensorInfo *src, const ITensorInfo *weights, const ITensorInfo *biases, const ITensorInfo *output, const PadStrideInfo &conv_info, const WeightsInfo &weights_info=WeightsInfo(), const Size2D &dilation=Size2D(1U, 1U), const ActivationLayerInfo &act_info=ActivationLayerInfo(), const bool enable_fast_math=false)
 Indicates whether or not there is an optimal assembly implementation that can be used to process the given parameters. More...
 

Detailed Description

Basic function to compute the convolution layer.

This function calls the following kernels/functions:

  1. cpu::kernels::CpuIm2ColKernel
  2. CpuGemm (if the data type is BFLOAT16/FP16/FP32)
  3. CpuGemmLowpMatrixMultiplyCore (if the data type is QASYMM8/QASYMM8_SIGNED)
  4. CpuGemmLowpOutputStage (if the data type is QASYMM8/QASYMM8_SIGNED)
  5. cpu::kernels::CpuCol2ImKernel (if NCHW data layout)
  6. kernels::CpuWeightsReshapeKernel

Definition at line 58 of file CpuGemmConv2d.h.
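
Below is a minimal configuration sketch. This operator is internal (applications normally reach it through NEGEMMConvolutionLayer); the include path, tensor shapes and stride/pad values are assumptions for illustration.

    #include "arm_compute/core/TensorInfo.h"
    #include "arm_compute/core/Types.h"
    #include "src/cpu/operators/CpuGemmConv2d.h" // internal header, assumed build setup

    using namespace arm_compute;

    void configure_example()
    {
        // NHWC shapes are ordered [C, W, H, N] in ACL's TensorShape
        TensorInfo src(TensorShape(16U, 32U, 32U, 1U), 1, DataType::F32);
        TensorInfo weights(TensorShape(16U, 3U, 3U, 64U), 1, DataType::F32);
        TensorInfo biases(TensorShape(64U), 1, DataType::F32);
        TensorInfo dst(TensorShape(64U, 32U, 32U, 1U), 1, DataType::F32);
        src.set_data_layout(DataLayout::NHWC);
        weights.set_data_layout(DataLayout::NHWC);
        dst.set_data_layout(DataLayout::NHWC);

        const PadStrideInfo conv_info(1, 1, 1, 1); // stride (1,1), pad (1,1): same-size output

        cpu::CpuGemmConv2d conv;
        conv.configure(&src, &weights, &biases, &dst, conv_info);
    }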

Constructor & Destructor Documentation

◆ CpuGemmConv2d() [1/3]

Constructor.

Definition at line 94 of file CpuGemmConv2d.cpp.

References arm_compute::NCHW.

    : _weights_reshape_kernel(nullptr), _im2col_kernel(), _mm_gemm(), _mm_gemmlowp(), _col2im_kernel(), _reshape_kernel(), _im2col_output(), _weights_reshaped(), _gemm_output(), _gemm_output_3d(),
      _data_layout(DataLayout::NCHW), _skip_im2col(false), _skip_col2im(false), _is_quantized(false), _is_prepared(false), _aux_mem(AuxTensorIdx::Count)
{
}

◆ CpuGemmConv2d() [2/3]

CpuGemmConv2d (const CpuGemmConv2d &)
delete

Prevent instances of this class from being copied (As this class contains pointers)

◆ CpuGemmConv2d() [3/3]

CpuGemmConv2d (CpuGemmConv2d &&)
delete

Prevent instances of this class from being moved (As this class contains non movable objects)

◆ ~CpuGemmConv2d()

~CpuGemmConv2d ( )
default

Destructor.

Member Function Documentation

◆ configure()

void configure (const ITensorInfo * src,
                const ITensorInfo * weights,
                const ITensorInfo * biases,
                ITensorInfo * dst,
                const PadStrideInfo & conv_info,
                const WeightsInfo & weights_info = WeightsInfo(),
                const Size2D & dilation = Size2D(1U, 1U),
                const ActivationLayerInfo & act_info = ActivationLayerInfo(),
                bool enable_fast_math = false,
                unsigned int num_groups = 1)

Set the input and output tensors.

Valid data layouts:

  • NHWC
  • NCHW

Valid data type configurations:

src0            src1                src2      dst
F16             F16                 F16       F16
F32             F32                 F32       F32
BFLOAT16        BFLOAT16            BFLOAT16  BFLOAT16
QASYMM8         QASYMM8             S32       QASYMM8
QASYMM8         QSYMM8_PER_CHANNEL  S32       QASYMM8
QASYMM8_SIGNED  QASYMM8_SIGNED      S32       QASYMM8_SIGNED
QASYMM8_SIGNED  QSYMM8_PER_CHANNEL  S32       QASYMM8_SIGNED
Parameters
[in]  src               Source tensor info. 3 lower dimensions represent a single input [width, height, IFM], while every optional dimension from 4 and above represents a batch of inputs. Data types supported: QASYMM8/QASYMM8_SIGNED/BFLOAT16/F16/F32.
[in]  weights           Weights tensor info. Weights are 4D tensors with dimensions [kernel_x, kernel_y, IFM, OFM]. Data type supported: QASYMM8/QASYMM8_SIGNED/QSYMM8_PER_CHANNEL/BFLOAT16/F16/F32.
[in]  biases            Biases tensor info. Shared biases supported. Biases are 1D tensors with dimensions [OFM]. Data type supported: Should match input data type, except for input of QASYMM8/QASYMM8_SIGNED type where biases should be of S32 type.
[out] dst               Destination tensor info. 3 lower dimensions represent a single output [width, height, OFM], while the rest represent a batch of outputs. Data types supported: Same as input.
[in]  conv_info         Contains padding and stride information described in PadStrideInfo.
[in]  weights_info      Specifies if the weights tensor has been reshaped with NEWeightsReshapeKernel. If this is not part of the fully connected layer the weights tensor has also been transposed with cpu::kernels::CpuGemmTranspose1xWKernel. Data type supported: Same as input.
[in]  dilation          (Optional) Dilation, in elements, across x and y. Defaults to (1, 1).
[in]  act_info          (Optional) Activation layer information in case of a fused activation. Only RELU, BOUNDED_RELU and LU_BOUNDED_RELU are supported.
[in]  enable_fast_math  (Optional) Enable fast math computation. When enabled, the function may dispatch the fastest implementation available, which can reduce accuracy. Defaults to false.
[in]  num_groups        (Optional) Number of groups when performing a grouped convolution. num_groups != 1 is not supported.

Definition at line 256 of file CpuGemmConv2d.cpp.

References ARM_COMPUTE_ERROR_ON_MSG, ARM_COMPUTE_ERROR_ON_NULLPTR, ARM_COMPUTE_ERROR_THROW_ON, ARM_COMPUTE_LOG_PARAMS, ARM_COMPUTE_UNUSED, arm_compute::BATCHES, arm_compute::BFLOAT16, arm_compute::block_by(), arm_compute::CHANNEL, arm_compute::test::validation::conv_info, ITensorInfo::data_layout(), arm_compute::test::validation::data_layout, arm_compute::test::validation::data_type, ITensorInfo::data_type(), ITensorInfo::dimension(), arm_compute::test::validation::dst, arm_compute::F32, arm_compute::get_data_layout_dimension_index(), arm_compute::HEIGHT, arm_compute::is_data_type_quantized_asymmetric(), arm_compute::NCHW, arm_compute::NHWC, arm_compute::offset_int_vec(), arm_compute::experimental::Prepare, ITensorInfo::quantization_info(), WeightsInfo::retain_internal_weights(), arm_compute::scaled_dimensions(), TensorShape::set(), ITensorInfo::set_data_layout(), arm_compute::test::validation::set_data_layout(), TensorInfo::set_data_type(), TensorInfo::set_quantization_info(), arm_compute::test::validation::src, PadStrideInfo::stride(), TensorInfo::tensor_shape(), TensorInfo::total_size(), arm_compute::UNSPECIFIED, CpuGemmConv2d::validate(), WeightsInfo::weight_format(), and arm_compute::WIDTH.

{
    ARM_COMPUTE_ERROR_ON_NULLPTR(src, weights, dst);
    ARM_COMPUTE_UNUSED(num_groups);
    ARM_COMPUTE_ERROR_THROW_ON(CpuGemmConv2d::validate(src,
                                                       weights,
                                                       biases,
                                                       dst,
                                                       conv_info,
                                                       weights_info,
                                                       dilation,
                                                       act_info,
                                                       enable_fast_math,
                                                       num_groups));
    ARM_COMPUTE_LOG_PARAMS(src, weights, biases, dst, conv_info, weights_info, dilation, act_info, enable_fast_math, num_groups);

    const DataType   data_type   = src->data_type();
    const DataLayout data_layout = src->data_layout();
    const int        idx_width   = get_data_layout_dimension_index(data_layout, DataLayoutDimension::WIDTH);
    const int        idx_height  = get_data_layout_dimension_index(data_layout, DataLayoutDimension::HEIGHT);
    const int        idx_channel = get_data_layout_dimension_index(data_layout, DataLayoutDimension::CHANNEL);
    const int        idx_kernels = get_data_layout_dimension_index(data_layout, DataLayoutDimension::BATCHES);

    const unsigned int kernel_width  = weights->dimension(idx_width);
    const unsigned int kernel_height = weights->dimension(idx_height);

    _is_prepared  = weights_info.retain_internal_weights();
    _is_quantized = is_data_type_quantized_asymmetric(src->data_type());
    _data_layout  = data_layout;
    _skip_im2col  = (data_layout == DataLayout::NHWC && kernel_width == 1 && kernel_height == 1 && conv_info.stride().first == 1 && conv_info.stride().second == 1);

    const ITensorInfo *gemm_input_to_use  = src;
    ITensorInfo       *gemm_output_to_use = dst;

    // Get convolved dimensions
    unsigned int conv_w = 0;
    unsigned int conv_h = 0;
    std::tie(conv_w, conv_h) = scaled_dimensions(src->dimension(idx_width),
                                                 src->dimension(idx_height),
                                                 kernel_width,
                                                 kernel_height,
                                                 conv_info,
                                                 dilation);

    ARM_COMPUTE_ERROR_ON_MSG((dst->dimension(idx_width) != conv_w) || (dst->dimension(idx_height) != conv_h),
                             "Output shape does not match the expected one");

    // Check if GEMM3D is supported
    const CpuGemmConv2d::SkipInfo skip_info = CpuGemmConv2d::skip_im_col_info(src, weights, conv_info, dilation, act_info);
    _skip_im2col = skip_info.skip_im2col;
    _skip_col2im = skip_info.skip_col2im;

    // Get parameters from conv_info
    unsigned int stride_x = 0;
    unsigned int stride_y = 0;
    std::tie(stride_x, stride_y) = conv_info.stride();

    unsigned int mat_weights_cols = weights->dimension(idx_kernels);

    // _weights_reshaped will be auto configured in the kernel.
    // Just append biases and do not transpose 1xW as it will be reshaped in CpuGemm
    _weights_reshape_kernel = std::make_unique<kernels::CpuWeightsReshapeKernel>();
    _weights_reshape_kernel->configure(weights, nullptr, &_weights_reshaped);
    _weights_reshaped.set_quantization_info(weights->quantization_info());

    // Create tensor to store im2col reshaped inputs
    if(!_skip_im2col)
    {
        const int    block_by        = arm_compute::block_by(weights_info.weight_format());
        unsigned int input_pad_right = 0;
        if(block_by > 1)
        {
            input_pad_right = (src->dimension(idx_channel) % block_by) == 0 ? 0 : block_by - (src->dimension(idx_channel) % block_by);
        }
        // Configure
        _im2col_kernel = std::make_unique<kernels::CpuIm2ColKernel>();
        _im2col_kernel->configure(src, &_im2col_output, Size2D(kernel_width, kernel_height), conv_info, false, dilation, num_groups, input_pad_right);

        // Update GEMM input
        gemm_input_to_use = &_im2col_output;
    }

    // Create temporary GEMM output tensor in case we cannot skip col2im
    const DataType output_data_type = data_type == DataType::BFLOAT16 ? DataType::F32 : data_type;
    if(!_skip_col2im)
    {
        TensorShape shape_gemm;

        // Calculate GEMM output shape
        shape_gemm = _im2col_output.tensor_shape();
        shape_gemm.set(0, mat_weights_cols);
        shape_gemm.set(1, conv_w * conv_h);

        _gemm_output = TensorInfo(shape_gemm, 1, output_data_type);
        _gemm_output.set_quantization_info(dst->quantization_info()).set_data_layout(src->data_layout());
        _gemm_output_3d = TensorInfo(_gemm_output);

        // Update GEMM output
        gemm_output_to_use = &_gemm_output;
    }
    else
    {
        _gemm_output_3d = TensorInfo(*dst);
        _gemm_output_3d.set_data_type(output_data_type).set_data_layout(src->data_layout()).set_is_resizable(true);
        _gemm_output = TensorInfo(_gemm_output_3d);

        // Update GEMM output
        gemm_output_to_use = &_gemm_output_3d;
    }

    // Configure GEMM
    // In case we need to skip col2im, GEMM3D (gemm_3d_depth != 0) must be called in order to avoid reshaping the output matrix
    const unsigned int gemm_3d_depth = _skip_col2im ? conv_h : 0;
    const bool         fixed_format  = weights_info.weight_format() != arm_compute::WeightFormat::UNSPECIFIED;
    configure_mm(gemm_input_to_use, &_weights_reshaped, biases, gemm_output_to_use, act_info, enable_fast_math, gemm_3d_depth, fixed_format, weights_info.weight_format());

    if(!_skip_col2im && _data_layout == DataLayout::NCHW)
    {
        // Configure col2im
        _col2im_kernel = std::make_unique<kernels::CpuCol2ImKernel>();
        _col2im_kernel->configure(gemm_output_to_use, dst, Size2D(conv_w, conv_h));
    }
    else
    {
        // Configure reshape layer
        _reshape_kernel = std::make_unique<kernels::CpuReshapeKernel>();
        _reshape_kernel->configure(gemm_output_to_use, dst);
    }

    // Check if GEMM transforms weights
    // Modernise through COMPMID-4535
    bool gemm_trans_wei = _aux_mem[1].size > 0;                                            // Asm Pretranspose
    gemm_trans_wei      = _mm_gemm != nullptr ? _aux_mem[3].size > 0 : gemm_trans_wei;     // Transpose RHS
    gemm_trans_wei      = _mm_gemmlowp != nullptr ? _aux_mem[5].size > 0 : gemm_trans_wei; // Transpose RHS

    // Check lifetime
    _aux_mem[Im2ColOutput]    = MemoryInfo(offset_int_vec(Im2ColOutput), MemoryLifetime::Temporary, _im2col_output.total_size());
    _aux_mem[WeightsReshaped] = MemoryInfo(offset_int_vec(WeightsReshaped), gemm_trans_wei ? MemoryLifetime::Prepare : MemoryLifetime::Persistent, _weights_reshaped.total_size());
    _aux_mem[GemmOutput]      = MemoryInfo(offset_int_vec(GemmOutput), MemoryLifetime::Temporary, _gemm_output.total_size());
}

◆ has_opt_impl()

Status has_opt_impl (arm_compute::WeightFormat & expected_weight_format,
                     const ITensorInfo * src,
                     const ITensorInfo * weights,
                     const ITensorInfo * biases,
                     const ITensorInfo * output,
                     const PadStrideInfo & conv_info,
                     const WeightsInfo & weights_info = WeightsInfo(),
                     const Size2D & dilation = Size2D(1U, 1U),
                     const ActivationLayerInfo & act_info = ActivationLayerInfo(),
                     const bool enable_fast_math = false)
static

Indicates whether or not there is an optimal assembly implementation that can be used to process the given parameters.

The parameter list is the same as NEGEMMConvolutionLayer::has_opt_impl.

Returns
a status.

Definition at line 398 of file CpuGemmConv2d.cpp.

References arm_compute::test::validation::conv_info, ITensorInfo::data_layout(), ITensorInfo::dimension(), arm_compute::test::validation::gemm_info, arm_compute::get_data_layout_dimension_index(), CpuGemm::has_opt_impl(), arm_compute::HEIGHT, arm_compute::scaled_dimensions(), arm_compute::UNSPECIFIED, WeightsInfo::weight_format(), and arm_compute::WIDTH.

Referenced by NEGEMMConvolutionLayer::has_opt_impl().

{
    const DataLayout   data_layout   = src->data_layout();
    const int          idx_width     = get_data_layout_dimension_index(data_layout, DataLayoutDimension::WIDTH);
    const int          idx_height    = get_data_layout_dimension_index(data_layout, DataLayoutDimension::HEIGHT);
    const unsigned int kernel_width  = weights->dimension(idx_width);
    const unsigned int kernel_height = weights->dimension(idx_height);
    unsigned int       conv_w        = 0;
    unsigned int       conv_h        = 0;
    std::tie(conv_w, conv_h) = scaled_dimensions(src->dimension(idx_width),
                                                 src->dimension(idx_height),
                                                 kernel_width,
                                                 kernel_height,
                                                 conv_info,
                                                 dilation);

    const CpuGemmConv2d::SkipInfo skip_info = CpuGemmConv2d::skip_im_col_info(src, weights, conv_info,
                                                                              dilation, act_info);

    const bool         skip_im2col   = skip_info.skip_im2col;
    const bool         skip_col2im   = skip_info.skip_col2im;
    const unsigned int gemm_3d_depth = skip_col2im ? conv_h : 0;
    const bool         fixed_format  = weights_info.weight_format() != arm_compute::WeightFormat::UNSPECIFIED;
    const GEMMInfo     gemm_info     = GEMMInfo(false, false, true /* Reshape weights only for the first run */,
                                                gemm_3d_depth, skip_im2col /* Reinterpret the input as 3D if im2col is skipped */,
                                                false, GEMMLowpOutputStageInfo(), false, enable_fast_math, false, act_info, experimental::PostOpList<ITensorInfo *>(), fixed_format, weights_info.weight_format());

    return CpuGemm::has_opt_impl(expected_weight_format, src, weights, biases, dst, gemm_info);
}
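
A hedged sketch of probing for the fast path before committing to a weight layout (pick_fast_path is a hypothetical helper; the WeightsInfo kernel dimensions are assumptions):

    #include "arm_compute/core/Types.h"
    #include "src/cpu/operators/CpuGemmConv2d.h" // internal header, assumed build setup

    using namespace arm_compute;

    bool pick_fast_path(const ITensorInfo *src, const ITensorInfo *weights, const ITensorInfo *biases,
                        const ITensorInfo *dst, const PadStrideInfo &conv_info)
    {
        WeightFormat      expected_wf = WeightFormat::ANY; // let the backend choose
        const WeightsInfo winfo(false /* are_reshaped */, 3U, 3U, 64U /* num_kernels */,
                                false /* retain_internal_weights */, WeightFormat::ANY);
        const Status st = cpu::CpuGemmConv2d::has_opt_impl(expected_wf, src, weights, biases, dst,
                                                           conv_info, winfo);
        // On success, expected_wf names the blocked layout the assembly kernel wants;
        // callers re-order the weights into it and call configure() with the same WeightsInfo.
        return bool(st);
    }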

◆ operator=() [1/2]

CpuGemmConv2d & operator= (const CpuGemmConv2d &)
delete

Prevent instances of this class from being copied (As this class contains pointers)

◆ operator=() [2/2]

CpuGemmConv2d & operator= (CpuGemmConv2d &&)
delete

Prevent instances of this class from being moved (As this class contains non movable objects)

◆ prepare()

void prepare (ITensorPack & constants)
override virtual

Prepare the function for executing.

Any one-off pre-processing step required by the function is handled here.

Parameters
[in] constants Vector that contains the constant tensors.
Note
Prepare stage might not need all the function's buffers' backing memory to be available in order to execute

Reimplemented from INEOperator.

Definition at line 659 of file CpuGemmConv2d.cpp.

References arm_compute::ACL_DST, arm_compute::ACL_SRC, arm_compute::ACL_SRC_1, ITensorPack::add_const_tensor(), Scheduler::get(), CpuAuxTensorHandler::get(), ITensorPack::get_const_tensor(), arm_compute::offset_int_vec(), arm_compute::test::validation::pack, and IScheduler::schedule_op().

Referenced by CpuGemmConv2d::run().

{
    if(!_is_prepared)
    {
        // Variable weights executions that use fixed-format kernels
        // need no reshaping of the weights.
        if(this->isVarWeightsKernel())
        {
            _is_quantized ? _mm_gemmlowp->prepare(tensors) : _mm_gemm->prepare(tensors);
            _is_prepared = true;
            return;
        }

        // Run weights reshaping and mark original weights tensor as unused
        CpuAuxTensorHandler weights_reshaped(offset_int_vec(WeightsReshaped), _weights_reshaped, tensors);
        auto                weights = tensors.get_const_tensor(TensorType::ACL_SRC_1);
        ITensorPack         pack =
        {
            { TensorType::ACL_SRC, weights },
            { TensorType::ACL_DST, weights_reshaped.get() }
        };
        NEScheduler::get().schedule_op(_weights_reshape_kernel.get(), 3, _weights_reshape_kernel->window(), pack);
        weights->mark_as_unused();
        ITensorPack gemm_pack = tensors;
        gemm_pack.add_const_tensor(TensorType::ACL_SRC_1, weights_reshaped.get());
        _is_quantized ? _mm_gemmlowp->prepare(gemm_pack) : _mm_gemm->prepare(gemm_pack);
        _is_prepared = true;
    }
}
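
A usage sketch of the prepare-once pattern (tensor names are assumptions; slots follow the ACL_SRC_0/ACL_SRC_1/ACL_SRC_2/ACL_DST convention visible in the listing above). Auxiliary workspace tensors can also be packed under the slot ids reported by workspace(); when they are absent, CpuAuxTensorHandler falls back to allocating its own memory.

    #include "arm_compute/runtime/Tensor.h"
    #include "src/cpu/operators/CpuGemmConv2d.h" // internal header, assumed build setup

    using namespace arm_compute;

    void prepare_once(cpu::CpuGemmConv2d &conv, Tensor &input, Tensor &weights, Tensor &biases, Tensor &output)
    {
        ITensorPack pack;
        pack.add_const_tensor(TensorType::ACL_SRC_0, &input);
        pack.add_const_tensor(TensorType::ACL_SRC_1, &weights);
        pack.add_const_tensor(TensorType::ACL_SRC_2, &biases);
        pack.add_tensor(TensorType::ACL_DST, &output);

        conv.prepare(pack); // reshapes the weights; the original tensor is marked unused
    }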

◆ run()

void run (ITensorPack & tensors)
override virtual

Run the kernels contained in the function.

Parameters
[in] tensors Vector that contains the tensors to operate on.

Reimplemented from INEOperator.

Definition at line 564 of file CpuGemmConv2d.cpp.

References arm_compute::ACL_DST, arm_compute::ACL_SRC, arm_compute::ACL_SRC_0, arm_compute::ACL_SRC_1, ITensorPack::add_const_tensor(), ITensorPack::add_tensor(), Tensor::allocator(), BorderSize::bottom, ITensor::buffer(), Window::DimY, arm_compute::test::validation::dst, TensorInfo::extend_padding(), Scheduler::get(), CpuAuxTensorHandler::get(), ITensorPack::get_const_tensor(), arm_compute::get_data_layout_dimension_index(), ITensorPack::get_tensor(), arm_compute::HEIGHT, TensorAllocator::import_memory(), ITensor::info(), arm_compute::NCHW, arm_compute::offset_int_vec(), arm_compute::test::validation::pack, ITensorInfo::padding(), CpuGemmConv2d::prepare(), IScheduler::schedule_op(), ITensorAllocator::soft_init(), arm_compute::test::validation::src, and BorderSize::top.

{
    prepare(tensors);

    auto src = tensors.get_const_tensor(ACL_SRC_0);
    auto dst = tensors.get_tensor(ACL_DST);
    auto gemm_input_to_use = src;

    CpuAuxTensorHandler im2col_output(offset_int_vec(Im2ColOutput), _im2col_output, tensors, false);
    CpuAuxTensorHandler gemm_output(offset_int_vec(GemmOutput), _gemm_output, tensors, false);
    CpuAuxTensorHandler reshaped_wei(offset_int_vec(WeightsReshaped), _weights_reshaped, tensors, false);

    bool out_has_padding = _skip_col2im && (dst->info()->padding().bottom != 0 || dst->info()->padding().top != 0);
    if(!_skip_im2col)
    {
        // Run input reshaping
        unsigned int y_dim = get_data_layout_dimension_index(_data_layout, DataLayoutDimension::HEIGHT);
        ITensorPack  pack =
        {
            { TensorType::ACL_SRC, src },
            { TensorType::ACL_DST, im2col_output.get() }
        };
        NEScheduler::get().schedule_op(_im2col_kernel.get(), y_dim, _im2col_kernel->window(), pack);
        gemm_input_to_use = im2col_output.get();
    }

    // Handle the case where output has top/bottom padding
    const ITensor *out_to_use = out_has_padding ? gemm_output.get() : dst;
    Tensor         gemm3d;
    _gemm_output_3d.extend_padding(out_to_use->info()->padding());
    gemm3d.allocator()->soft_init(_gemm_output_3d);
    gemm3d.allocator()->import_memory(out_to_use->buffer());
    auto gemm_output_to_use = gemm_output.get();

    if(_skip_im2col)
    {
        gemm_output_to_use = &gemm3d;
    }
    if(_skip_col2im && !out_has_padding)
    {
        gemm_output_to_use = dst;
    }

    // Runs CpuGemm or CpuGemmLowpMatrixMultiplyCore functions
    ITensorPack pack_mm = tensors;
    pack_mm.add_const_tensor(TensorType::ACL_SRC_0, gemm_input_to_use);
    if(!this->isVarWeightsKernel())
    {
        pack_mm.add_const_tensor(TensorType::ACL_SRC_1, reshaped_wei.get());
    }
    pack_mm.add_tensor(TensorType::ACL_DST, gemm_output_to_use);
    if(_is_quantized)
    {
        // Run gemmlowp
        _mm_gemmlowp->run(pack_mm);
    }
    else
    {
        // Run gemm
        _mm_gemm->run(pack_mm);
    }

    // Reshape output matrix
    if(!_skip_col2im)
    {
        if(_data_layout == DataLayout::NCHW)
        {
            ITensorPack pack =
            {
                { TensorType::ACL_SRC, gemm_output.get() },
                { TensorType::ACL_DST, dst }
            };
            NEScheduler::get().schedule_op(_col2im_kernel.get(), Window::DimY, _col2im_kernel->window(), pack);
        }
        else
        {
            ITensorPack pack =
            {
                { TensorType::ACL_SRC, gemm_output_to_use },
                { TensorType::ACL_DST, dst }
            };
            NEScheduler::get().schedule_op(_reshape_kernel.get(), Window::DimY, _reshape_kernel->window(), pack);
        }
    }
    else if(out_has_padding)
    {
        ITensorPack pack =
        {
            { TensorType::ACL_SRC, gemm_output_to_use },
            { TensorType::ACL_DST, dst }
        };
        NEScheduler::get().schedule_op(_reshape_kernel.get(), Window::DimY, _reshape_kernel->window(), pack);
    }
}
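
Continuing the prepare_once() sketch above, steady-state execution is a plain loop; run() calls prepare() itself on first use, so the explicit call is an optimisation rather than a requirement:

    // Sketch: repeated inference over the same pack (names are assumptions).
    void run_many(cpu::CpuGemmConv2d &conv, ITensorPack &pack, size_t iterations)
    {
        for(size_t i = 0; i < iterations; ++i)
        {
            // refresh the input tensor's contents here, then:
            conv.run(pack); // im2col -> (CpuGemm | CpuGemmLowpMatrixMultiplyCore) -> col2im/reshape
        }
    }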

◆ validate()

Status validate (const ITensorInfo * src,
                 const ITensorInfo * weights,
                 const ITensorInfo * biases,
                 const ITensorInfo * output,
                 const PadStrideInfo & conv_info,
                 const WeightsInfo & weights_info = WeightsInfo(),
                 const Size2D & dilation = Size2D(1U, 1U),
                 const ActivationLayerInfo & act_info = ActivationLayerInfo(),
                 bool enable_fast_math = false,
                 unsigned int num_groups = 1)
static

Static function to check if given info will lead to a valid configuration.

Similar to CpuGemmConv2d::configure()

Returns
a status

Definition at line 430 of file CpuGemmConv2d.cpp.

References WeightsInfo::are_reshaped(), ARM_COMPUTE_RETURN_ERROR_ON, ARM_COMPUTE_RETURN_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN, ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_LAYOUT, ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES, ARM_COMPUTE_RETURN_ERROR_ON_MSG, ARM_COMPUTE_RETURN_ERROR_ON_NULLPTR, ARM_COMPUTE_RETURN_ON_ERROR, arm_compute::BATCHES, arm_compute::BFLOAT16, arm_compute::block_by(), arm_compute::CHANNEL, arm_compute::misc::shape_calculator::compute_weights_reshaped_shape(), arm_compute::test::validation::conv_info, ITensorInfo::data_layout(), arm_compute::test::validation::data_type, ITensorInfo::data_type(), ITensorInfo::dimension(), arm_compute::test::validation::dst, arm_compute::F16, arm_compute::F32, arm_compute::get_data_layout_dimension_index(), arm_compute::HEIGHT, arm_compute::is_data_type_quantized_asymmetric(), arm_compute::is_fixed_format(), arm_compute::NCHW, ITensorInfo::num_dimensions(), arm_compute::QASYMM8, arm_compute::QASYMM8_SIGNED, arm_compute::QSYMM8_PER_CHANNEL, ITensorInfo::quantization_info(), arm_compute::S32, arm_compute::scaled_dimensions(), TensorShape::set(), arm_compute::test::validation::set_data_layout(), TensorInfo::set_quantization_info(), arm_compute::test::validation::src, ITensorInfo::tensor_shape(), arm_compute::UNSPECIFIED, CpuCol2ImKernel::validate(), CpuIm2ColKernel::validate(), WeightsInfo::weight_format(), and arm_compute::WIDTH.

Referenced by CpuGemmConv2d::configure(), CpuConv2d::validate(), and NEGEMMConvolutionLayer::validate().

{
    ARM_COMPUTE_RETURN_ERROR_ON_NULLPTR(src, weights, dst);
    ARM_COMPUTE_RETURN_ERROR_ON_MSG(weights_info.are_reshaped(), "Weights already reshaped are not supported!");
    ARM_COMPUTE_RETURN_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN(src, 1, DataType::QASYMM8, DataType::QASYMM8_SIGNED, DataType::BFLOAT16, DataType::F16, DataType::F32);
    ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_LAYOUT(src, weights);

    if(!is_fixed_format(weights_info.weight_format()))
    {
        ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES(src, weights);
    }

    ARM_COMPUTE_RETURN_ERROR_ON_MSG(num_groups > 1, "Grouping (num_groups != 1) is not supported");

    const DataLayout data_layout = src->data_layout();
    const DataType   data_type   = src->data_type();
    const int        idx_width   = get_data_layout_dimension_index(data_layout, DataLayoutDimension::WIDTH);
    const int        idx_height  = get_data_layout_dimension_index(data_layout, DataLayoutDimension::HEIGHT);
    const int        idx_channel = get_data_layout_dimension_index(data_layout, DataLayoutDimension::CHANNEL);
    const int        idx_kernels = get_data_layout_dimension_index(data_layout, DataLayoutDimension::BATCHES);

    const unsigned int kernel_width  = weights->dimension(idx_width);
    const unsigned int kernel_height = weights->dimension(idx_height);

    TensorInfo         im2col_reshaped_info{};
    TensorInfo         info_gemm{};
    TensorInfo         tmp_info{};
    TensorInfo         weights_reshaped_info{};
    const ITensorInfo *gemm_input_to_use  = src;
    const ITensorInfo *gemm_output_to_use = dst;
    const ITensorInfo *weights_to_use     = weights;

    const bool append_bias  = false;
    const bool is_quantized = is_data_type_quantized_asymmetric(data_type);
    const bool is_bf16      = data_type == DataType::BFLOAT16;

    // Get convolved dimensions
    unsigned int conv_w = 0;
    unsigned int conv_h = 0;

    std::tie(conv_w, conv_h) = scaled_dimensions(src->dimension(idx_width),
                                                 src->dimension(idx_height),
                                                 kernel_width,
                                                 kernel_height,
                                                 conv_info,
                                                 dilation);

    // Check if GEMM3D is supported
    const CpuGemmConv2d::SkipInfo skip_info = CpuGemmConv2d::skip_im_col_info(src, weights, conv_info,
                                                                              dilation, act_info);
    const bool skip_im2col = skip_info.skip_im2col, skip_col2im = skip_info.skip_col2im;

    ARM_COMPUTE_RETURN_ERROR_ON(weights->dimension(idx_channel) != src->dimension(idx_channel));
    ARM_COMPUTE_RETURN_ERROR_ON(weights->num_dimensions() > 4);

    // Validate biases
    if(biases != nullptr)
    {
        if(is_quantized)
        {
            ARM_COMPUTE_RETURN_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN(biases, 1, DataType::S32);
        }
        else if(is_bf16)
        {
            ARM_COMPUTE_RETURN_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN(biases, 1, DataType::F32);
        }
        else
        {
            ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES(src, biases);
        }
        ARM_COMPUTE_RETURN_ERROR_ON(biases->dimension(0) != dst->dimension(idx_channel));
        ARM_COMPUTE_RETURN_ERROR_ON(biases->num_dimensions() > 1);
    }

    unsigned int mat_weights_cols = weights->dimension(idx_kernels);
    unsigned int mat_weights_rows = weights->dimension(idx_width) * weights->dimension(idx_height) * weights->dimension(idx_channel);

    weights_reshaped_info = TensorInfo(compute_weights_reshaped_shape(*weights, append_bias), 1, weights->data_type());
    weights_reshaped_info.set_quantization_info(weights->quantization_info());
    weights_to_use = &weights_reshaped_info;

    if(!skip_im2col)
    {
        const int block_by        = arm_compute::block_by(weights_info.weight_format());
        int       input_pad_right = 0;
        if(block_by > 1)
        {
            input_pad_right  = (src->dimension(idx_channel) % block_by) == 0 ? 0 : block_by - (src->dimension(idx_channel) % block_by);
            mat_weights_rows = weights->dimension(idx_width) * weights->dimension(idx_height) * (weights->dimension(idx_channel) + input_pad_right);
        }

        // Create tensor info for im2col reshaped inputs
        // For CPU, the batch size is on the fourth dimension
        TensorShape shape_im2col = src->tensor_shape();
        shape_im2col.set(0, mat_weights_rows);
        shape_im2col.set(1, conv_w * conv_h);
        shape_im2col.set(2, 1);

        im2col_reshaped_info = TensorInfo(shape_im2col, 1, data_type);
        im2col_reshaped_info.set_quantization_info(src->quantization_info());
        ARM_COMPUTE_RETURN_ON_ERROR(kernels::CpuIm2ColKernel::validate(src, &im2col_reshaped_info, Size2D(kernel_width, kernel_height), conv_info, append_bias, dilation, num_groups, input_pad_right));
        gemm_input_to_use = &im2col_reshaped_info;
    }

    // Create temporary GEMM output tensor in case we cannot skip col2im
    const DataType output_data_type = data_type == DataType::BFLOAT16 ? DataType::F32 : data_type;
    if(!skip_col2im)
    {
        TensorShape shape_gemm = gemm_input_to_use->tensor_shape();
        shape_gemm.set(0, mat_weights_cols);
        shape_gemm.set(1, conv_w * conv_h);
        info_gemm = TensorInfo(shape_gemm, 1, output_data_type);
    }
    else
    {
        info_gemm = TensorInfo(dst->tensor_shape(), 1, output_data_type);
    }
    info_gemm.set_quantization_info(dst->quantization_info()).set_data_layout(src->data_layout());
    gemm_output_to_use = &info_gemm;
    const bool fixed_format = weights_info.weight_format() != arm_compute::WeightFormat::UNSPECIFIED;

    ARM_COMPUTE_RETURN_ON_ERROR(validate_mm(gemm_input_to_use, weights_to_use, biases, gemm_output_to_use, act_info, enable_fast_math, skip_col2im ? conv_h : 0, skip_im2col, fixed_format,
                                            weights_info.weight_format()));

    // Validate Col2Im/ReshapeLayer
    if(!skip_col2im && (data_layout == DataLayout::NCHW))
    {
        ARM_COMPUTE_RETURN_ON_ERROR(kernels::CpuCol2ImKernel::validate(gemm_output_to_use, dst, Size2D(conv_w, conv_h)));
    }

    return Status{};
}
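
Because validate() performs the full shape and type checking without allocating any memory, it is cheap to call at graph-construction time. A sketch (shapes are assumptions):

    #include <iostream>
    #include "src/cpu/operators/CpuGemmConv2d.h" // internal header, assumed build setup

    using namespace arm_compute;

    bool can_run_f32_3x3()
    {
        TensorInfo src(TensorShape(16U, 32U, 32U, 1U), 1, DataType::F32);
        TensorInfo weights(TensorShape(16U, 3U, 3U, 64U), 1, DataType::F32);
        TensorInfo biases(TensorShape(64U), 1, DataType::F32);
        TensorInfo dst(TensorShape(64U, 32U, 32U, 1U), 1, DataType::F32);
        src.set_data_layout(DataLayout::NHWC);
        weights.set_data_layout(DataLayout::NHWC);
        dst.set_data_layout(DataLayout::NHWC);

        const Status st = cpu::CpuGemmConv2d::validate(&src, &weights, &biases, &dst,
                                                       PadStrideInfo(1, 1, 1, 1));
        if(!bool(st))
        {
            std::cerr << st.error_description() << '\n';
        }
        return bool(st);
    }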

◆ workspace()

experimental::MemoryRequirements workspace () const
override virtual

Return the memory requirements required by the workspace.

Reimplemented from INEOperator.

Definition at line 688 of file CpuGemmConv2d.cpp.

{
    return _aux_mem;
}
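
Each experimental::MemoryInfo in the returned vector carries a slot id, a lifetime and a byte size; the backing tensor is added to the run-time pack under that slot id. A sketch of servicing the requirements by hand (production code uses the MemoryGroup/CpuAuxTensorHandler helpers rather than raw U8 tensors, and also honours the requested alignment):

    #include <memory>
    #include <vector>
    #include "arm_compute/core/ITensorPack.h"
    #include "arm_compute/core/experimental/Types.h"
    #include "arm_compute/runtime/Tensor.h"

    using namespace arm_compute;

    void add_workspace(const experimental::MemoryRequirements &reqs, ITensorPack &pack,
                       std::vector<std::unique_ptr<Tensor>> &keep_alive)
    {
        for(const auto &req : reqs)
        {
            if(req.size == 0)
            {
                continue; // slot not needed for this configuration
            }
            auto aux = std::make_unique<Tensor>();
            aux->allocator()->init(TensorInfo(TensorShape(req.size), 1, DataType::U8));
            aux->allocator()->allocate();
            pack.add_tensor(req.slot, aux.get()); // slot matches offset_int_vec(<AuxTensorIdx>)
            keep_alive.emplace_back(std::move(aux));
        }
    }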

The documentation for this class was generated from the following files:

  • CpuGemmConv2d.h
  • CpuGemmConv2d.cpp