Compute Library 19.08
NEDepthwiseConvolutionLayer Class Reference

Basic function to execute a generic depthwise convolution. More...

#include <NEDepthwiseConvolutionLayer.h>


Public Member Functions

 NEDepthwiseConvolutionLayer ()
 Default constructor. More...
 
 NEDepthwiseConvolutionLayer (const NEDepthwiseConvolutionLayer &)=delete
 Prevent instances of this class from being copied (As this class contains pointers) More...
 
 NEDepthwiseConvolutionLayer (NEDepthwiseConvolutionLayer &&)=default
 Default move constructor. More...
 
NEDepthwiseConvolutionLayer & operator= (const NEDepthwiseConvolutionLayer &)=delete
 Prevent instances of this class from being copied (As this class contains pointers) More...
 
NEDepthwiseConvolutionLayer & operator= (NEDepthwiseConvolutionLayer &&)=default
 Default move assignment operator. More...
 
void configure (ITensor *input, const ITensor *weights, const ITensor *biases, ITensor *output, const PadStrideInfo &conv_info, unsigned int depth_multiplier=1, const ActivationLayerInfo &act_info=ActivationLayerInfo(), const Size2D &dilation=Size2D(1U, 1U))
 Initialize the function's source, destination, weights and convolution information. More...
 
void run () override
 Run the kernels contained in the function. More...
 
void prepare () override
 Prepare the function for executing. More...
 
- Public Member Functions inherited from IFunction
virtual ~IFunction ()=default
 Destructor. More...
 

Static Public Member Functions

static Status validate (const ITensorInfo *input, const ITensorInfo *weights, const ITensorInfo *biases, const ITensorInfo *output, const PadStrideInfo &conv_info, unsigned int depth_multiplier=1, const ActivationLayerInfo &act_info=ActivationLayerInfo(), const Size2D &dilation=Size2D(1U, 1U))
 Static function to check if given info will lead to a valid configuration of NEDepthwiseConvolutionLayer. More...
 

Detailed Description

Basic function to execute a generic depthwise convolution.

This function calls the following NEON kernels:

If data type is F32 and data layout is NHWC:

  1. NEDepthwiseConvolutionLayerNativeKernel

Otherwise:

  1. NEDepthwiseIm2ColKernel
  2. NEDepthwiseWeightsReshapeKernel
  3. NEGEMMMatrixVectorMultiplyKernel
  4. NEFillBorderKernel (if pad_x or pad_y > 0)

Definition at line 294 of file NEDepthwiseConvolutionLayer.h.

Constructor & Destructor Documentation

◆ NEDepthwiseConvolutionLayer() [1/3]

Default constructor.

Definition at line 703 of file NEDepthwiseConvolutionLayer.cpp.

704  : _im2col_kernel(), _weights_reshape_kernel(), _v2mm_kernel(), _depthwise_conv_kernel(), _vector_to_tensor_kernel(), _output_stage_kernel(), _fill_border(), _v2mm_input_fill_border(),
705  _v2mm_weights_fill_border(), _permute_input(), _permute_weights(), _permute_output(), _activationlayer_function(), _input_reshaped(), _weights_reshaped(), _v2mm_output(), _output_reshaped(),
706  _permuted_input(), _permuted_weights(), _permuted_output(), _is_prepared(false), _is_quantized(false), _is_nhwc(false), _is_activationlayer_enabled(false), _is_optimized(false),
707  _original_weights(nullptr)
708 {
709 }

◆ NEDepthwiseConvolutionLayer() [2/3]

Prevent instances of this class from being copied (As this class contains pointers)

◆ NEDepthwiseConvolutionLayer() [3/3]

Default move constructor.

Member Function Documentation

◆ configure()

void configure ( ITensor *input,
                 const ITensor *weights,
                 const ITensor *biases,
                 ITensor *output,
                 const PadStrideInfo &conv_info,
                 unsigned int depth_multiplier = 1,
                 const ActivationLayerInfo &act_info = ActivationLayerInfo(),
                 const Size2D &dilation = Size2D(1U, 1U)
               )

Initialize the function's source, destination, weights and convolution information.

Parameters
[in,out]  input             Source tensor. Data type supported: QASYMM8/F16/F32. (Written to only for border filling.)
[out]     output            Destination tensor. Data type supported: same as input.
[in]      weights           Weights tensor. These are 3D tensors with shape [kernel_x, kernel_y, IFM]. Data type supported: same as input.
[in]      biases            (Optional) Biases tensor. A 1D tensor with shape [IFM]. Must be nullptr if not needed. Data type supported: same as input, or S32 when input is QASYMM8.
[in]      conv_info         Padding and stride information to use for the convolution.
[in]      depth_multiplier  (Optional) Multiplier to apply to the input's depth in order to retrieve the output's depth. Defaults to 1.
[in]      act_info          (Optional) Activation layer information in case of a fused activation.
[in]      dilation          (Optional) Dilation, in elements, across x and y. Defaults to (1, 1).

Definition at line 711 of file NEDepthwiseConvolutionLayer.cpp.

713 {
714  ARM_COMPUTE_ERROR_ON_NULLPTR(input, weights, output);
715  // Perform validation step
716  ARM_COMPUTE_ERROR_THROW_ON(NEDepthwiseConvolutionLayer::validate(input->info(), weights->info(), (biases == nullptr) ? nullptr : biases->info(),
717  output->info(), conv_info, depth_multiplier, act_info, dilation));
718 
719  _is_nhwc = input->info()->data_layout() == DataLayout::NHWC;
720  _is_optimized = _is_nhwc && input->info()->data_type() == DataType::F32;
721 
722  if(!_is_optimized)
723  {
724  ITensor *input_to_use = input;
725  const ITensor *weights_to_use = weights;
726  ITensor *output_to_use = output;
727 
728  if(_is_nhwc)
729  {
730  _permute_input.configure(input, &_permuted_input, PermutationVector(1U, 2U, 0U));
731  _permuted_input.info()->set_data_layout(DataLayout::NCHW);
732  input_to_use = &_permuted_input;
733 
734  _permute_weights.configure(weights, &_permuted_weights, PermutationVector(1U, 2U, 0U));
735  _permuted_weights.info()->set_data_layout(DataLayout::NCHW);
736  weights_to_use = &_permuted_weights;
737  }
738 
739  const size_t weights_w = weights_to_use->info()->dimension(0);
740  const size_t weights_h = weights_to_use->info()->dimension(1);
741  const size_t weights_z = weights_to_use->info()->dimension(2);
742 
743  _is_quantized = is_data_type_quantized_asymmetric(input->info()->data_type());
744  _is_prepared = false;
745  _original_weights = weights_to_use;
746 
747  // Should bias be appended ?
748  bool append_bias = (biases != nullptr) && !_is_quantized;
749 
750  // Calculate output shape
751  TensorShape output_shape = shape_calculator::compute_depthwise_convolution_shape(*input->info(), *weights->info(), conv_info, depth_multiplier, dilation);
752 
753  // Output auto initialization if not yet initialized
754  auto_init_if_empty(*output->info(), input->info()->clone()->set_tensor_shape(output_shape));
755  ARM_COMPUTE_ERROR_ON_MISMATCHING_DIMENSIONS(output->info()->tensor_shape(), output_shape);
756 
757  if(_is_nhwc)
758  {
760  _permuted_output.allocator()->init(output->info()->clone()->set_is_resizable(true).reset_padding().set_tensor_shape(output_shape));
761  _permuted_output.info()->set_data_layout(DataLayout::NCHW);
762  _permuted_output.info()->set_quantization_info(output->info()->quantization_info());
763  output_to_use = &_permuted_output;
764  }
765 
766  // Output width and height
767  const unsigned int conv_w = output_shape.x();
768  const unsigned int conv_h = output_shape.y();
769 
770  // Set up intermediate tensors
771  const size_t patch_size = weights_w * weights_h + (append_bias ? 1 : 0);
772  const size_t conv_size = conv_w * conv_h;
773 
774  // Im2Col configuration
775  TensorShape shape_im2col = input_to_use->info()->tensor_shape();
776  shape_im2col.set(0, patch_size);
777  shape_im2col.set(1, conv_size);
778  shape_im2col.set(2, weights_z);
779  _input_reshaped.allocator()->init(input->info()->clone()->set_is_resizable(true).reset_padding().set_tensor_shape(shape_im2col).set_data_layout(DataLayout::NCHW));
780  _im2col_kernel.configure(input_to_use, &_input_reshaped, Size2D(weights_w, weights_h), conv_info, append_bias, depth_multiplier, dilation);
781 
782  // Weights reshape configuration
783  const TensorShape shape_weights_reshape(patch_size, weights_z);
784  _weights_reshaped.allocator()->init(weights->info()->clone()->set_is_resizable(true).reset_padding().set_tensor_shape(shape_weights_reshape).set_data_layout(DataLayout::NCHW));
785  _weights_reshape_kernel.configure(weights_to_use, &_weights_reshaped, append_bias ? biases : nullptr);
786 
787  // GEMV configuration
788  DataType v2mm_dt = (input->info()->data_type() == DataType::QASYMM8) ? DataType::S32 : input->info()->data_type();
789  TensorShape shape_v2mm_out = input_to_use->info()->tensor_shape();
790  shape_v2mm_out.set(0, conv_size * weights_z);
791  shape_v2mm_out.set(1, 1);
792  shape_v2mm_out.set(2, 1);
793  _v2mm_output.allocator()->init(input->info()->clone()->set_is_resizable(true).reset_padding().set_data_type(v2mm_dt).set_tensor_shape(shape_v2mm_out).set_data_layout(DataLayout::NCHW));
794  _v2mm_kernel.configure(&_input_reshaped, &_weights_reshaped, &_v2mm_output);
795  _output_reshaped.allocator()->init(_v2mm_output.info()->clone()->set_is_resizable(true).reset_padding().set_tensor_shape(output_shape));
796  _vector_to_tensor_kernel.configure(&_v2mm_output, (_is_quantized) ? &_output_reshaped : output_to_use, conv_w, conv_h);
797 
798  // Output staged configuration
799  if(_is_quantized)
800  {
801  const UniformQuantizationInfo iq_info = input->info()->quantization_info().uniform();
802  const UniformQuantizationInfo wq_info = weights->info()->quantization_info().uniform();
803  const UniformQuantizationInfo oq_info = output->info()->quantization_info().uniform();
804 
805  float multiplier = (iq_info.scale * wq_info.scale) / oq_info.scale;
806  int output_multiplier;
807  int output_shift;
808  quantization::calculate_quantized_multiplier_less_than_one(multiplier, &output_multiplier, &output_shift);
809  _output_stage_kernel.configure(&_output_reshaped, biases, output_to_use, output_multiplier, output_shift, oq_info.offset);
810  _output_reshaped.allocator()->allocate();
811  }
812 
813  if(_is_nhwc)
814  {
815  _permute_output.configure(&_permuted_output, output, PermutationVector(2U, 0U, 1U));
816 
817  _permuted_input.allocator()->allocate();
818  _permuted_weights.allocator()->allocate();
819  _permuted_output.allocator()->allocate();
820  }
821 
822  // Fill borders on inputs
823  PixelValue zero_in(static_cast<int32_t>(0));
824  PixelValue zero_w(static_cast<int32_t>(0));
825  if(_is_quantized)
826  {
827  zero_in = PixelValue(static_cast<int32_t>(input->info()->quantization_info().uniform().offset));
828  zero_w = PixelValue(static_cast<int32_t>(weights->info()->quantization_info().uniform().offset));
829  }
830  BorderSize border_size = _v2mm_kernel.border_size();
831  _v2mm_input_fill_border.configure(&_input_reshaped, border_size, BorderMode::CONSTANT, zero_in);
832 
833  border_size.bottom = 0;
834  _v2mm_weights_fill_border.configure(&_weights_reshaped, border_size, BorderMode::CONSTANT, zero_w);
835 
836  // Allocate intermediate tensors
837  _input_reshaped.allocator()->allocate();
838  _v2mm_output.allocator()->allocate();
839  }
840  else
841  {
842  // Configure kernel
843  _depthwise_conv_kernel.configure(input, weights, biases, output, conv_info, depth_multiplier, dilation);
844 
845  // Fill input borders
846  _fill_border.configure(input, _depthwise_conv_kernel.border_size(), BorderMode::CONSTANT, PixelValue(static_cast<uint64_t>(0), input->info()->data_type()));
847  }
848 
849  //Configure Activation Layer
850  _is_activationlayer_enabled = act_info.enabled();
851 
852  if(_is_activationlayer_enabled)
853  {
854  _activationlayer_function.configure(output, nullptr, act_info);
855  }
856 }

References arm_compute::test::validation::act_info, TensorAllocator::allocate(), Tensor::allocator(), ARM_COMPUTE_ERROR_ON_MISMATCHING_DIMENSIONS, ARM_COMPUTE_ERROR_ON_NULLPTR, ARM_COMPUTE_ERROR_THROW_ON, arm_compute::auto_init_if_empty(), NEGEMMMatrixVectorMultiplyKernel::border_size(), NEDepthwiseConvolutionLayerNativeKernel::border_size(), BorderSize::bottom, arm_compute::quantization::calculate_quantized_multiplier_less_than_one(), ICloneable< T >::clone(), TensorInfo::clone(), arm_compute::misc::shape_calculator::compute_depthwise_convolution_shape(), NEPermute::configure(), NEActivationLayer::configure(), NEGEMMMatrixVectorMultiplyKernel::configure(), NEDepthwiseWeightsReshapeKernel::configure(), NEDepthwiseVectorToTensorKernel::configure(), NEDepthwiseConvolutionLayerNativeKernel::configure(), NEDirectConvolutionLayerOutputStageKernel::configure(), NEFillBorderKernel::configure(), NEDepthwiseIm2ColKernel::configure(), arm_compute::CONSTANT, arm_compute::test::validation::conv_info, ITensorInfo::data_layout(), ITensorInfo::data_type(), arm_compute::test::validation::dilation, ITensorInfo::dimension(), arm_compute::F32, ITensor::info(), Tensor::info(), CLTensor::info(), TensorAllocator::init(), arm_compute::is_data_type_quantized_asymmetric(), arm_compute::NCHW, arm_compute::NHWC, UniformQuantizationInfo::offset, arm_compute::test::validation::output_shape, arm_compute::permute(), arm_compute::QASYMM8, ITensorInfo::quantization_info(), TensorInfo::quantization_info(), arm_compute::S32, UniformQuantizationInfo::scale, TensorShape::set(), ITensorInfo::set_data_layout(), ITensorInfo::tensor_shape(), arm_compute::U, QuantizationInfo::uniform(), NEDepthwiseConvolutionLayer::validate(), and arm_compute::test::validation::weights.

Referenced by NEDepthwiseSeparableConvolutionLayer::configure().

◆ operator=() [1/2]

Prevent instances of this class from being copied (As this class contains pointers)

◆ operator=() [2/2]

Default move assignment operator.

◆ prepare()

void prepare ( )
override virtual

Prepare the function for executing.

Any one-off pre-processing step required by the function is handled here.

Note
Prepare stage might not need all the function's buffers' backing memory to be available in order to execute

Reimplemented from IFunction.

Definition at line 1011 of file NEDepthwiseConvolutionLayer.cpp.

1012 {
1013  if(!_is_prepared && !_is_optimized)
1014  {
1015  ARM_COMPUTE_ERROR_ON(!_original_weights->is_used());
1016 
1017  if(_is_nhwc)
1018  {
1019  _permute_weights.run();
1020  }
1021 
1022  // Run reshape and mark original weights as unused
1023  _weights_reshaped.allocator()->allocate();
1024  NEScheduler::get().schedule(&_weights_reshape_kernel, Window::DimX);
1025  NEScheduler::get().schedule(&_v2mm_weights_fill_border, Window::DimX);
1026  _original_weights->mark_as_unused();
1027 
1028  _is_prepared = true;
1029  }
1030 }

References TensorAllocator::allocate(), Tensor::allocator(), ARM_COMPUTE_ERROR_ON, Window::DimX, Scheduler::get(), ITensor::is_used(), ITensor::mark_as_unused(), INESimpleFunctionNoBorder::run(), and IScheduler::schedule().

Referenced by NEDepthwiseSeparableConvolutionLayer::prepare(), and NEDepthwiseConvolutionLayer::run().

◆ run()

void run ( )
override virtual

Run the kernels contained in the function.

For NEON kernels:

  • Multi-threading is used for the kernels which are parallelisable.
  • By default std::thread::hardware_concurrency() threads are used.
Note
CPPScheduler::set_num_threads() can be used to manually set the number of threads

For OpenCL kernels:

  • All the kernels are enqueued on the queue associated with CLScheduler.
  • The queue is then flushed.
Note
The function will not block until the kernels are executed. It is the user's responsibility to wait.
Will call prepare() on the first run if it hasn't been done already.

Implements IFunction.

Definition at line 974 of file NEDepthwiseConvolutionLayer.cpp.

975 {
976  if(!_is_optimized)
977  {
978  prepare();
979 
980  if(_is_nhwc)
981  {
982  _permute_input.run();
983  }
984 
985  NEScheduler::get().schedule(&_im2col_kernel, Window::DimX);
986  NEScheduler::get().schedule(&_v2mm_input_fill_border, Window::DimX);
987  NEScheduler::get().schedule(&_v2mm_kernel, Window::DimX);
988  NEScheduler::get().schedule(&_vector_to_tensor_kernel, Window::DimX);
989  if(_is_quantized)
990  {
991  NEScheduler::get().schedule(&_output_stage_kernel, Window::DimX);
992  }
993 
994  if(_is_nhwc)
995  {
996  _permute_output.run();
997  }
998  }
999  else
1000  {
1001  NEScheduler::get().schedule(&_fill_border, Window::DimX);
1002  NEScheduler::get().schedule(&_depthwise_conv_kernel, Window::DimY);
1003  }
1004 
1005  if(_is_activationlayer_enabled)
1006  {
1007  _activationlayer_function.run();
1008  }
1009 }

References Window::DimX, Window::DimY, Scheduler::get(), NEDepthwiseConvolutionLayer::prepare(), INESimpleFunctionNoBorder::run(), and IScheduler::schedule().

Referenced by NEDepthwiseSeparableConvolutionLayer::run().

◆ validate()

static Status validate ( const ITensorInfo *input,
                         const ITensorInfo *weights,
                         const ITensorInfo *biases,
                         const ITensorInfo *output,
                         const PadStrideInfo &conv_info,
                         unsigned int depth_multiplier = 1,
                         const ActivationLayerInfo &act_info = ActivationLayerInfo(),
                         const Size2D &dilation = Size2D(1U, 1U)
                       )

Static function to check if given info will lead to a valid configuration of NEDepthwiseConvolutionLayer.

Parameters
[in]  input             Source tensor. Data type supported: QASYMM8/F16/F32. (Written to only for border filling.)
[in]  output            Destination tensor. Data type supported: same as input.
[in]  weights           Weights tensor. These are 3D tensors with shape [kernel_x, kernel_y, IFM]. Data type supported: same as input.
[in]  biases            (Optional) Biases tensor. A 1D tensor with shape [IFM]. Must be nullptr if not needed. Data type supported: same as input, or S32 when input is QASYMM8.
[in]  conv_info         Padding and stride information to use for the convolution.
[in]  depth_multiplier  (Optional) Multiplier to apply to the input's depth in order to retrieve the output's depth. Defaults to 1.
[in]  act_info          (Optional) Activation layer information in case of a fused activation.
[in]  dilation          (Optional) Dilation, in elements, across x and y. Defaults to (1, 1).
Returns
a status

Definition at line 858 of file NEDepthwiseConvolutionLayer.cpp.

860 {
862  ARM_COMPUTE_RETURN_ERROR_ON(input->data_layout() == DataLayout::UNKNOWN);
863  ARM_COMPUTE_RETURN_ERROR_ON(dilation.x() < 1 || dilation.y() < 1);
864 
865  const unsigned int width_idx = get_data_layout_dimension_index(input->data_layout(), DataLayoutDimension::WIDTH);
866  const unsigned int height_idx = get_data_layout_dimension_index(input->data_layout(), DataLayoutDimension::HEIGHT);
867  const unsigned int channel_idx = get_data_layout_dimension_index(input->data_layout(), DataLayoutDimension::CHANNEL);
868 
869  ARM_COMPUTE_RETURN_ERROR_ON(weights->dimension(width_idx) + (weights->dimension(width_idx) - 1) * (dilation.x() - 1) > input->dimension(width_idx) + conv_info.pad_left() + conv_info.pad_right());
870  ARM_COMPUTE_RETURN_ERROR_ON(weights->dimension(height_idx) + (weights->dimension(height_idx) - 1) * (dilation.y() - 1) > input->dimension(height_idx) + conv_info.pad_top() + conv_info.pad_bottom());
871  ARM_COMPUTE_RETURN_ERROR_ON((input->dimension(channel_idx) * depth_multiplier) != weights->dimension(channel_idx));
872 
873  if(input->data_layout() != DataLayout::NHWC || input->data_type() != DataType::F32)
874  {
875  // Clone output to use auto init
876  auto output_clone = output->clone();
877 
878  const ITensorInfo *input_to_use = input;
879  const ITensorInfo *weights_to_use = weights;
880  const ITensorInfo *output_to_use = output_clone.get();
881 
882  TensorShape permuted_input_shape = input->tensor_shape();
883  TensorShape permuted_weights_shape = weights->tensor_shape();
884  TensorInfo permuted_input;
885  TensorInfo permuted_weights;
886 
887  if(input->data_layout() == DataLayout::NHWC)
888  {
889  permute(permuted_input_shape, PermutationVector(1U, 2U, 0U));
890  permute(permuted_weights_shape, PermutationVector(1U, 2U, 0U));
891 
892  permuted_input = TensorInfo(input->clone()->set_is_resizable(true).reset_padding().set_tensor_shape(permuted_input_shape).set_data_layout(DataLayout::NCHW));
893  permuted_weights = TensorInfo(weights->clone()->set_is_resizable(true).reset_padding().set_tensor_shape(permuted_weights_shape).set_data_layout(DataLayout::NCHW));
894 
895  input_to_use = &permuted_input;
896  weights_to_use = &permuted_weights;
897  }
898 
899  const bool is_quantized = is_data_type_quantized_asymmetric(input->data_type());
900  const bool append_bias = (biases != nullptr) && !is_quantized;
902  const size_t weights_w = weights_to_use->dimension(0);
903  const size_t weights_h = weights_to_use->dimension(1);
904  const size_t weights_z = weights_to_use->dimension(2);
905  const unsigned int conv_w = output_shape[width_idx];
906  const unsigned int conv_h = output_shape[height_idx];
907  const size_t patch_size = weights_w * weights_h + (append_bias ? 1 : 0);
908  const size_t conv_size = conv_w * conv_h;
909 
910  // Output auto initialization if not yet initialized
911  auto_init_if_empty(*output_clone, input->clone()->set_tensor_shape(output_shape));
913 
914  TensorInfo permuted_output;
915  if(input->data_layout() == DataLayout::NHWC)
916  {
918  permuted_output = TensorInfo(output_clone->clone()->set_is_resizable(true).reset_padding().set_tensor_shape(output_shape).set_data_layout(DataLayout::NCHW));
919  output_to_use = &permuted_output;
920  }
921 
922  // Im2Col configuration
923  TensorShape shape_im2col = input_to_use->tensor_shape();
924  shape_im2col.set(0, patch_size);
925  shape_im2col.set(1, conv_size);
926  shape_im2col.set(2, weights_z);
927  TensorInfo input_reshaped(input->clone()->set_is_resizable(true).reset_padding().set_tensor_shape(shape_im2col).set_data_layout(DataLayout::NCHW));
928  ARM_COMPUTE_RETURN_ON_ERROR(NEDepthwiseIm2ColKernel::validate(input_to_use, &input_reshaped, Size2D(weights_w, weights_h), conv_info, append_bias, depth_multiplier, dilation));
929 
930  // Weights reshape configuration
931  const TensorShape shape_weights_reshape(patch_size, weights_z);
932  TensorInfo weights_reshaped(weights->clone()->set_is_resizable(true).reset_padding().set_tensor_shape(shape_weights_reshape).set_data_layout(DataLayout::NCHW));
933  ARM_COMPUTE_RETURN_ON_ERROR(NEDepthwiseWeightsReshapeKernel::validate(weights_to_use, &weights_reshaped, append_bias ? biases : nullptr));
934 
935  // GEMV configuration
936  DataType v2mm_dt = (input->data_type() == DataType::QASYMM8) ? DataType::S32 : input->data_type();
937  TensorShape shape_v2mm_out = input_to_use->tensor_shape();
938  shape_v2mm_out.set(0, conv_size * weights_z);
939  shape_v2mm_out.set(1, 1);
940  shape_v2mm_out.set(2, 1);
941  TensorInfo v2mm_output(input->clone()->set_is_resizable(true).reset_padding().set_data_type(v2mm_dt).set_tensor_shape(shape_v2mm_out).set_data_layout(DataLayout::NCHW));
942  ARM_COMPUTE_RETURN_ON_ERROR(NEGEMMMatrixVectorMultiplyKernel::validate(&input_reshaped, &weights_reshaped, &v2mm_output));
943 
944  TensorInfo output_reshaped(v2mm_output.clone()->set_is_resizable(true).reset_padding().set_tensor_shape(output_to_use->tensor_shape()));
945  ARM_COMPUTE_RETURN_ON_ERROR(NEDepthwiseVectorToTensorKernel::validate(&v2mm_output, (is_quantized) ? &output_reshaped : output_to_use, conv_w, conv_h));
946 
947  if(is_quantized)
948  {
949  const UniformQuantizationInfo iq_info = input->quantization_info().uniform();
950  const UniformQuantizationInfo wq_info = weights->quantization_info().uniform();
951  const UniformQuantizationInfo oq_info = output->quantization_info().uniform();
952 
953  float multiplier = (iq_info.scale * wq_info.scale) / oq_info.scale;
954  int output_multiplier;
955  int output_shift;
956  ARM_COMPUTE_RETURN_ON_ERROR(quantization::calculate_quantized_multiplier_less_than_one(multiplier, &output_multiplier, &output_shift));
957  ARM_COMPUTE_RETURN_ON_ERROR(NEDirectConvolutionLayerOutputStageKernel::validate(&output_reshaped, biases, output_to_use, output_multiplier, output_shift, oq_info.offset));
958  }
959  }
960  else
961  {
963  }
964 
965  // Validate Activation Layer
966  if(act_info.enabled())
967  {
969  }
970 
971  return Status{};
972 }

References arm_compute::test::validation::act_info, ARM_COMPUTE_RETURN_ERROR_ON, ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DIMENSIONS, ARM_COMPUTE_RETURN_ERROR_ON_NULLPTR, ARM_COMPUTE_RETURN_ON_ERROR, arm_compute::auto_init_if_empty(), arm_compute::quantization::calculate_quantized_multiplier_less_than_one(), arm_compute::CHANNEL, ICloneable< T >::clone(), arm_compute::misc::shape_calculator::compute_depthwise_convolution_shape(), arm_compute::test::validation::conv_info, ITensorInfo::data_layout(), ITensorInfo::data_type(), arm_compute::test::validation::dilation, ITensorInfo::dimension(), arm_compute::F32, arm_compute::get_data_layout_dimension_index(), arm_compute::HEIGHT, arm_compute::is_data_type_quantized_asymmetric(), arm_compute::NCHW, arm_compute::NHWC, UniformQuantizationInfo::offset, arm_compute::test::validation::output_shape, arm_compute::permute(), arm_compute::QASYMM8, ITensorInfo::quantization_info(), arm_compute::S32, UniformQuantizationInfo::scale, TensorShape::set(), ITensorInfo::tensor_shape(), arm_compute::U, QuantizationInfo::uniform(), arm_compute::UNKNOWN, NEActivationLayer::validate(), NEGEMMMatrixVectorMultiplyKernel::validate(), NEDepthwiseWeightsReshapeKernel::validate(), NEDepthwiseVectorToTensorKernel::validate(), NEDepthwiseConvolutionLayerNativeKernel::validate(), NEDirectConvolutionLayerOutputStageKernel::validate(), NEDepthwiseIm2ColKernel::validate(), arm_compute::test::validation::weights, and arm_compute::WIDTH.

Referenced by NEDepthwiseConvolutionLayer::configure().


The documentation for this class was generated from the following files:

  • NEDepthwiseConvolutionLayer.h
  • NEDepthwiseConvolutionLayer.cpp