Compute Library 23.08
NEGEMMConvolutionLayer Class Reference

Basic function to compute the convolution layer.

#include <NEGEMMConvolutionLayer.h>


Public Member Functions

 NEGEMMConvolutionLayer (const std::shared_ptr< IMemoryManager > &memory_manager=nullptr, IWeightsManager *weights_manager=nullptr)
 	Constructor.

 NEGEMMConvolutionLayer (const NEGEMMConvolutionLayer &)=delete
 	Prevent instances of this class from being copied (As this class contains pointers)

 NEGEMMConvolutionLayer (NEGEMMConvolutionLayer &&)=delete
 	Prevent instances of this class from being moved (As this class contains non-movable objects)

 NEGEMMConvolutionLayer & operator= (const NEGEMMConvolutionLayer &)=delete
 	Prevent instances of this class from being copied (As this class contains pointers)

 NEGEMMConvolutionLayer & operator= (NEGEMMConvolutionLayer &&)=delete
 	Prevent instances of this class from being moved (As this class contains non-movable objects)

 ~NEGEMMConvolutionLayer ()
 	Default destructor.
 
void configure (const ITensor *input, const ITensor *weights, const ITensor *biases, ITensor *output, const PadStrideInfo &conv_info, const WeightsInfo &weights_info=WeightsInfo(), const Size2D &dilation=Size2D(1U, 1U), const ActivationLayerInfo &act_info=ActivationLayerInfo(), bool enable_fast_math=false, unsigned int num_groups=1)
 	Set the input and output tensors.

void run () override
 	Run the kernels contained in the function.

void prepare () override
 	Prepare the function for executing.
 
- Public Member Functions inherited from IFunction
virtual ~IFunction ()=default
 	Destructor.
 

Static Public Member Functions

static Status validate (const ITensorInfo *input, const ITensorInfo *weights, const ITensorInfo *biases, const ITensorInfo *output, const PadStrideInfo &conv_info, const WeightsInfo &weights_info=WeightsInfo(), const Size2D &dilation=Size2D(1U, 1U), const ActivationLayerInfo &act_info=ActivationLayerInfo(), bool enable_fast_math=false, unsigned int num_groups=1)
 	Static function to check if the given info will lead to a valid configuration of NEGEMMConvolutionLayer.
 
static Status has_opt_impl (arm_compute::WeightFormat &expected_weight_format, const ITensorInfo *src, const ITensorInfo *weights, const ITensorInfo *biases, const ITensorInfo *dst, const PadStrideInfo &conv_info, const WeightsInfo &weights_info=WeightsInfo(), const Size2D &dilation=Size2D(1U, 1U), const ActivationLayerInfo &act_info=ActivationLayerInfo(), bool enable_fast_math=false)
 	Static function to check if there is an optimized version of GEMM available for the input parameters.
 

Detailed Description

Basic function to compute the convolution layer.

This function calls the following kernels/functions:

  1. cpu::CpuGemmConv2d

Definition at line 48 of file NEGEMMConvolutionLayer.h.
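
For orientation, here is a minimal end-to-end sketch of driving this function: initialise tensor metadata, configure, allocate, then run. This is an illustrative sketch rather than an official library example; the shapes (a 3x3 kernel over a 64x64x16 F32 input producing 32 OFM) and the padding/stride values are placeholder choices.

#include "arm_compute/runtime/NEON/NEFunctions.h"
#include "arm_compute/runtime/Tensor.h"

using namespace arm_compute;

int main()
{
    // Tensor metadata: src is [width, height, IFM], weights are [kernel_x, kernel_y, IFM, OFM].
    Tensor src, weights, biases, dst;
    src.allocator()->init(TensorInfo(TensorShape(64U, 64U, 16U), 1, DataType::F32));
    weights.allocator()->init(TensorInfo(TensorShape(3U, 3U, 16U, 32U), 1, DataType::F32));
    biases.allocator()->init(TensorInfo(TensorShape(32U), 1, DataType::F32));
    dst.allocator()->init(TensorInfo(TensorShape(64U, 64U, 32U), 1, DataType::F32));

    // Configure before allocating, so the function can set any padding requirements.
    NEGEMMConvolutionLayer conv;
    conv.configure(&src, &weights, &biases, &dst, PadStrideInfo(1, 1, 1, 1)); // stride 1x1, pad 1x1

    src.allocator()->allocate();
    weights.allocator()->allocate();
    biases.allocator()->allocate();
    dst.allocator()->allocate();

    // ... fill src, weights and biases ...

    conv.run(); // prepare() is called internally on the first run
    return 0;
}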

Constructor & Destructor Documentation

◆ NEGEMMConvolutionLayer() [1/3]

NEGEMMConvolutionLayer ( const std::shared_ptr< IMemoryManager > &  memory_manager = nullptr,
IWeightsManager *  weights_manager = nullptr 
)

Constructor.

Definition at line 49 of file NEGEMMConvolutionLayer.cpp.

 : _impl(std::make_unique<Impl>())
{
    _impl->weights_manager = weights_manager;
    _impl->memory_group    = MemoryGroup(memory_manager);
}

◆ NEGEMMConvolutionLayer() [2/3]

Prevent instances of this class from being copied (As this class contains pointers)

◆ NEGEMMConvolutionLayer() [3/3]

Prevent instances of this class from being moved (As this class contains non-movable objects)

◆ ~NEGEMMConvolutionLayer()

~NEGEMMConvolutionLayer ( )
default

Default destructor.

Member Function Documentation

◆ configure()

void configure ( const ITensor *  input,
const ITensor *  weights,
const ITensor *  biases,
ITensor *  output,
const PadStrideInfo &  conv_info,
const WeightsInfo &  weights_info = WeightsInfo(),
const Size2D &  dilation = Size2D(1U, 1U),
const ActivationLayerInfo &  act_info = ActivationLayerInfo(),
bool  enable_fast_math = false,
unsigned int  num_groups = 1 
)

Set the input and output tensors.

Valid data layouts:

  • NHWC
  • NCHW

Valid data type configurations:

src0            src1                src2      dst
F16             F16                 F16       F16
F32             F32                 F32       F32
BFLOAT16        BFLOAT16            BFLOAT16  BFLOAT16
QASYMM8         QASYMM8             S32       QASYMM8
QASYMM8         QSYMM8_PER_CHANNEL  S32       QASYMM8
QASYMM8_SIGNED  QASYMM8_SIGNED      S32       QASYMM8_SIGNED
QASYMM8_SIGNED  QSYMM8_PER_CHANNEL  S32       QASYMM8_SIGNED
Parameters
    [in]  input             Source tensor. 3 lower dimensions represent a single input [width, height, IFM], while every optional dimension from 4 and above represents a batch of inputs. Data types supported: QASYMM8/QASYMM8_SIGNED/BFLOAT16/F16/F32.
    [in]  weights           Weights tensor. Weights are a 4D tensor with dimensions [kernel_x, kernel_y, IFM, OFM]. Data type supported: QASYMM8/QASYMM8_SIGNED/QSYMM8_PER_CHANNEL/BFLOAT16/F16/F32.
    [in]  biases            Biases tensor. Shared biases are supported. Biases are a 1D tensor with dimensions [OFM]. Data type supported: should match the input data type, except for input of QASYMM8/QASYMM8_SIGNED type, where biases should be of S32 type.
    [out] output            Destination tensor. 3 lower dimensions represent a single output [width, height, OFM], while the rest represent a batch of outputs. Data types supported: same as input.
    [in]  conv_info         Contains padding and stride information described in PadStrideInfo.
    [in]  weights_info      Specifies if the weights tensor has been reshaped with NEWeightsReshapeKernel. If this is not part of the fully connected layer, the weights tensor has also been transposed with cpu::kernels::CpuGemmTranspose1xWKernel. Data type supported: same as input.
    [in]  dilation          (Optional) Dilation, in elements, across x and y. Defaults to (1, 1).
    [in]  act_info          (Optional) Activation layer information in case of a fused activation. Only RELU, BOUNDED_RELU and LU_BOUNDED_RELU are supported.
    [in]  enable_fast_math  (Optional) Enable fast math computation. If this flag is set, the function may dispatch the fastest implementation available, which can introduce a drop in accuracy. Defaults to false.
    [in]  num_groups        (Optional) Number of groups when performing a grouped convolution. num_groups != 1 is not supported.

Definition at line 57 of file NEGEMMConvolutionLayer.cpp.

{
    ARM_COMPUTE_ERROR_ON_NULLPTR(input, weights, output);

    _impl->weights = weights;
    _impl->op      = std::make_unique<cpu::CpuGemmConv2d>();
    _impl->op->configure(input->info(), weights->info(), (biases != nullptr ? biases->info() : nullptr), output->info(), conv_info, weights_info, dilation, act_info, enable_fast_math, num_groups);

    _impl->run_pack =
    {
        { TensorType::ACL_SRC_0, input },
        { TensorType::ACL_SRC_1, weights },
        { TensorType::ACL_SRC_2, biases },
        { TensorType::ACL_DST, output }
    };
    _impl->aux_mem_req       = _impl->op->workspace();
    _impl->workspace_tensors = manage_workspace<Tensor>(_impl->aux_mem_req, _impl->memory_group, _impl->run_pack, _impl->run_pack);
}

References arm_compute::ACL_DST, arm_compute::ACL_SRC_0, arm_compute::ACL_SRC_1, arm_compute::ACL_SRC_2, arm_compute::test::validation::act_info, ARM_COMPUTE_ERROR_ON_NULLPTR, arm_compute::test::validation::conv_info, ITensor::info(), arm_compute::test::validation::input, arm_compute::test::validation::num_groups, and arm_compute::test::validation::weights_info.
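
The data type table above pairs QASYMM8/QASYMM8_SIGNED inputs with S32 biases. Below is a hedged sketch of such a quantized configuration, reusing the setup pattern from the earlier example; the QuantizationInfo scale/offset values are placeholders.

// Quantized setup: QASYMM8 src/weights/dst with S32 biases (see the table above).
Tensor src, weights, biases, dst;
src.allocator()->init(TensorInfo(TensorShape(64U, 64U, 16U), 1, DataType::QASYMM8, QuantizationInfo(0.05f, 128)));
weights.allocator()->init(TensorInfo(TensorShape(3U, 3U, 16U, 32U), 1, DataType::QASYMM8, QuantizationInfo(0.01f, 100)));
biases.allocator()->init(TensorInfo(TensorShape(32U), 1, DataType::S32)); // S32 biases, as required for QASYMM8 inputs
dst.allocator()->init(TensorInfo(TensorShape(64U, 64U, 32U), 1, DataType::QASYMM8, QuantizationInfo(0.1f, 120)));

NEGEMMConvolutionLayer conv;
conv.configure(&src, &weights, &biases, &dst, PadStrideInfo(1, 1, 1, 1));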

◆ has_opt_impl()

Status has_opt_impl ( arm_compute::WeightFormat &  expected_weight_format,
const ITensorInfo *  src,
const ITensorInfo *  weights,
const ITensorInfo *  biases,
const ITensorInfo *  dst,
const PadStrideInfo &  conv_info,
const WeightsInfo &  weights_info = WeightsInfo(),
const Size2D &  dilation = Size2D(1U, 1U),
const ActivationLayerInfo &  act_info = ActivationLayerInfo(),
bool  enable_fast_math = false 
)
static

Static function to check if there is an optimized version of GEMM available for the input parameters.

The method is intended to be used to find out the optimal memory layout to be used for the weights tensor when running variable weights execution.

The user can query the database of optimised kernels in arm_gemm by specifying one of the enumerations of arm_compute::WeightFormat in the weight_format field of the input parameter weights_info. In case of success, the method writes the expected format in the output parameter expected_weight_format. The expected_weight_format can then be used in the configure method of the class for retrieving the optimal kernel.

Use case one - query for a specific format:

WeightsInfo weights_info(..., arm_compute::WeightFormat::OHWIo4, ...); // Set the value of the input query.
arm_compute::WeightFormat expected_wf;
if (NEGEMMConvolutionLayer::has_opt_impl(expected_wf, ...., weights_info, ...))
{
  auto conv = std::make_unique<NEGEMMConvolutionLayer>();
  conv->configure(..., weights_info, ...);  // Uses the same WeightFormat the user requested originally, OHWIo4.
  conv->run();
}

Use case two - query for any format that would be optimal for the GEMM to execute:

WeightsInfo weights_info(..., arm_compute::WeightFormat::ANY, ...); // Set the value of the input query.
arm_compute::WeightFormat expected_wf;
if (NEGEMMConvolutionLayer::has_opt_impl(expected_wf, ...., weights_info, ...))
{
  auto conv = std::make_unique<NEGEMMConvolutionLayer>();
  // ... code to convert the layout of the weights tensor to the layout returned by has_opt_impl
  WeightsInfo new_weights_info(..., expected_wf, ...); // Set the value of the WeightFormat returned by has_opt_impl.
  conv->configure(..., new_weights_info, ...);
  conv->run();
}

Note that a GEMM configured with a WeightFormat other than UNSPECIFIED will run in variable weights mode.

Parameters
    [out] expected_weight_format  The arm_compute::WeightFormat expected by the kernel.
    [in]  src               Source tensor info.
    [in]  weights           Weights tensor info.
    [in]  biases            Biases tensor info. Shared biases are supported.
    [in]  dst               Destination tensor info.
    [in]  conv_info         Contains padding and stride information described in PadStrideInfo.
    [in]  weights_info      (Optional) Specifies additional configuration parameters for the weights of the GEMM computation.
    [in]  dilation          (Optional) Dilation, in elements, across x and y. Defaults to (1, 1).
    [in]  act_info          (Optional) Activation layer information in case of a fused activation. Only RELU, BOUNDED_RELU and LU_BOUNDED_RELU are supported, as well as no activation (i.e. linear), which is the default.
    [in]  enable_fast_math  (Optional) Enable fast math computation. If this flag is set, the function may dispatch the fastest implementation available, which can introduce a drop in accuracy. Defaults to false.
Returns
a Status

Definition at line 83 of file NEGEMMConvolutionLayer.cpp.

{
    return cpu::CpuGemmConv2d::has_opt_impl(expected_weight_format, src, weights, biases, dst, conv_info, weights_info, dilation, act_info, enable_fast_math);
}

References arm_compute::test::validation::act_info, arm_compute::test::validation::conv_info, arm_compute::test::validation::dst, CpuGemmConv2d::has_opt_impl(), arm_compute::test::validation::src, and arm_compute::test::validation::weights_info.
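
Putting the two use cases together, the following sketch queries for any optimal layout before configuring. It reuses the tensors from the earlier configure() example; the WeightsInfo constructor argument order (are_reshaped, kernel width/height, number of kernels, retain_internal_weights, weight_format) is an assumption to be checked against the WeightsInfo definition.

arm_compute::WeightFormat expected_wf;
WeightsInfo query_info(false /* are_reshaped */, 3U, 3U, 32U, false /* retain_internal_weights */,
                       arm_compute::WeightFormat::ANY); // assumed constructor order, see WeightsInfo
Status s = NEGEMMConvolutionLayer::has_opt_impl(expected_wf, src.info(), weights.info(), biases.info(), dst.info(),
                                                PadStrideInfo(1, 1, 1, 1), query_info);
if(bool(s))
{
    // expected_wf now names a concrete fixed format, e.g. WeightFormat::OHWIo4;
    // reorder the weights tensor accordingly and pass expected_wf in the
    // WeightsInfo given to configure().
}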

◆ operator=() [1/2]

NEGEMMConvolutionLayer& operator= ( const NEGEMMConvolutionLayer &  )
delete

Prevent instances of this class from being copied (As this class contains pointers)

◆ operator=() [2/2]

NEGEMMConvolutionLayer& operator= ( NEGEMMConvolutionLayer &&  )
delete

Prevent instances of this class from being moved (As this class contains non-movable objects)

◆ prepare()

void prepare ( )
overridevirtual

Prepare the function for executing.

Any one-off pre-processing step required by the function is handled here.

Note
The prepare stage might not need all the function's buffers' backing memory to be available in order to execute.

Reimplemented from IFunction.

Definition at line 97 of file NEGEMMConvolutionLayer.cpp.

{
    if(!_impl->is_prepared)
    {
        _impl->op->prepare(_impl->run_pack);

        // Release temporary tensors that are only used in prepare stage
        release_temporaries<Tensor>(_impl->aux_mem_req, _impl->workspace_tensors);
        _impl->is_prepared = true;
    }
}

Referenced by NEGEMMConvolutionLayer::run().
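
Although run() calls prepare() automatically, prepare() can also be invoked explicitly to take the one-off weight transformation out of the latency-critical first inference; a small sketch under that assumption, continuing the earlier example:

conv.configure(&src, &weights, &biases, &dst, PadStrideInfo(1, 1, 1, 1));
// ... allocate tensors and fill the weights ...

conv.prepare(); // one-off weight reshaping and workspace release happen here

// Steady state: is_prepared is already set, so run() skips the prepare work.
conv.run();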

◆ run()

void run ( )
overridevirtual

Run the kernels contained in the function.

For CPU kernels:

  • Multi-threading is used for the kernels which are parallelisable.
  • By default std::thread::hardware_concurrency() threads are used.
Note
CPPScheduler::set_num_threads() can be used to manually set the number of threads

For OpenCL kernels:

  • All the kernels are enqueued on the queue associated with CLScheduler.
  • The queue is then flushed.
Note
The function will not block until the kernels are executed. It is the user's responsibility to wait.
Will call prepare() on the first run if it hasn't already been done

Implements IFunction.

Definition at line 90 of file NEGEMMConvolutionLayer.cpp.

{
    prepare();
    MemoryGroupResourceScope scope_mg(_impl->memory_group);
    _impl->op->run(_impl->run_pack);
}

References NEGEMMConvolutionLayer::prepare().
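
As the note above says, the worker-thread count can be overridden before running; a minimal sketch using the CPPScheduler singleton, continuing the earlier example:

#include "arm_compute/runtime/CPP/CPPScheduler.h"

// Default is std::thread::hardware_concurrency(); cap the pool at 4 threads.
arm_compute::CPPScheduler::get().set_num_threads(4);
conv.run();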

◆ validate()

Status validate ( const ITensorInfo *  input,
const ITensorInfo *  weights,
const ITensorInfo *  biases,
const ITensorInfo *  output,
const PadStrideInfo &  conv_info,
const WeightsInfo &  weights_info = WeightsInfo(),
const Size2D &  dilation = Size2D(1U, 1U),
const ActivationLayerInfo &  act_info = ActivationLayerInfo(),
bool  enable_fast_math = false,
unsigned int  num_groups = 1 
)
static

Static function to check if the given info will lead to a valid configuration of NEGEMMConvolutionLayer.

Parameters
    [in]  input             Source tensor info. 3 lower dimensions represent a single input [width, height, IFM], while every optional dimension from 4 and above represents a batch of inputs. Data types supported: QASYMM8/QASYMM8_SIGNED/BFLOAT16/F16/F32.
    [in]  weights           Weights tensor info. Weights are a 4D tensor with dimensions [kernel_x, kernel_y, IFM, OFM]. Data type supported: QASYMM8/QASYMM8_SIGNED/QSYMM8_PER_CHANNEL/BFLOAT16/F16/F32.
    [in]  biases            Biases tensor info. Shared biases are supported. Biases are a 1D tensor with dimensions [OFM]. Data type supported: should match the input data type, except for input of QASYMM8/QASYMM8_SIGNED type, where biases should be of S32 type.
    [in]  output            Destination tensor info. 3 lower dimensions represent a single output [width, height, OFM], while the rest represent a batch of outputs. Data types supported: same as input.
    [in]  conv_info         Contains padding and stride information described in PadStrideInfo.
    [in]  weights_info      Specifies if the weights tensor has been reshaped with NEWeightsReshapeKernel. If this is not part of the fully connected layer, the weights tensor has also been transposed with cpu::kernels::CpuGemmTranspose1xWKernel. Data type supported: same as input.
    [in]  dilation          (Optional) Dilation, in elements, across x and y. Defaults to (1, 1).
    [in]  act_info          (Optional) Activation layer information in case of a fused activation. Only RELU, BOUNDED_RELU and LU_BOUNDED_RELU are supported.
    [in]  enable_fast_math  (Optional) Enable fast math computation. If this flag is set, the function may dispatch the fastest implementation available, which can introduce a drop in accuracy. Defaults to false.
    [in]  num_groups        (Optional) Number of groups when performing a grouped convolution. num_groups != 1 is not supported.
Returns
a status

Definition at line 77 of file NEGEMMConvolutionLayer.cpp.

{
    return cpu::CpuGemmConv2d::validate(input, weights, biases, output, conv_info, weights_info, dilation, act_info, enable_fast_math, num_groups);
}

References arm_compute::test::validation::act_info, arm_compute::test::validation::conv_info, arm_compute::test::validation::input, arm_compute::test::validation::num_groups, CpuGemmConv2d::validate(), and arm_compute::test::validation::weights_info.
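
Because validate() only consumes ITensorInfo, a configuration can be vetted before any backing memory is allocated; a sketch reusing the tensors from the configure() example:

#include <iostream>

Status status = NEGEMMConvolutionLayer::validate(src.info(), weights.info(), biases.info(), dst.info(),
                                                 PadStrideInfo(1, 1, 1, 1));
if(!bool(status))
{
    // error_description() carries the reason the configuration was rejected.
    std::cerr << status.error_description() << std::endl;
}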


The documentation for this class was generated from the following files:
  • NEGEMMConvolutionLayer.h
  • NEGEMMConvolutionLayer.cpp