Compute Library
 19.08
NEDirectConvolutionLayerKernel Class Reference

NEON interface for Direct Convolution Layer kernel.

#include <NEDirectConvolutionLayerKernel.h>

Public Member Functions

const char * name () const override
 Name of the kernel.

 NEDirectConvolutionLayerKernel ()
 Default constructor.

 NEDirectConvolutionLayerKernel (const NEDirectConvolutionLayerKernel &)=delete
 Prevent instances of this class from being copied (as this class contains pointers).

NEDirectConvolutionLayerKernel & operator= (const NEDirectConvolutionLayerKernel &)=delete
 Prevent instances of this class from being copied (as this class contains pointers).

 NEDirectConvolutionLayerKernel (NEDirectConvolutionLayerKernel &&)=default
 Allow instances of this class to be moved.

NEDirectConvolutionLayerKernel & operator= (NEDirectConvolutionLayerKernel &&)=default
 Allow instances of this class to be moved.

 ~NEDirectConvolutionLayerKernel ()=default
 Default destructor.

void configure (const ITensor *input, const ITensor *weights, ITensor *output, const PadStrideInfo &conv_info)
 Set the input, weights, and output tensors.

void run (const Window &window, const ThreadInfo &info) override
 Execute the kernel on the passed window.

BorderSize border_size () const override
 The size of the border for that kernel.

Public Member Functions inherited from ICPPKernel
virtual ~ICPPKernel ()=default
 Default destructor.

Public Member Functions inherited from IKernel
 IKernel ()
 Constructor.

virtual ~IKernel ()=default
 Destructor.

virtual bool is_parallelisable () const
 Indicates whether or not the kernel is parallelisable.

const Window & window () const
 The maximum window the kernel can be executed on.


Static Public Member Functions

static Status validate (const ITensorInfo *input, const ITensorInfo *weights, const ITensorInfo *output, const PadStrideInfo &conv_info)
 Static function to check if the given info will lead to a valid configuration of NEDirectConvolutionLayerKernel.


Detailed Description

NEON interface for Direct Convolution Layer kernel.

Definition at line 34 of file NEDirectConvolutionLayerKernel.h.

Constructor & Destructor Documentation

◆ NEDirectConvolutionLayerKernel() [1/3]

Default constructor.

Definition at line 1453 of file NEDirectConvolutionLayerKernel.cpp.

1454  : _input(nullptr), _weights(nullptr), _output(nullptr), _conv_info(), _border_size(0), _kernel_size(0), _num_weight_elems_read_per_row(0), _num_elems_read_per_iteration(0),
1455  _num_elems_written_per_iteration(0)
1456 {
1457 }

◆ NEDirectConvolutionLayerKernel() [2/3]

Prevent instances of this class from being copied (as this class contains pointers).

◆ NEDirectConvolutionLayerKernel() [3/3]

Allow instances of this class to be moved.

◆ ~NEDirectConvolutionLayerKernel()

Default destructor.

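
The copy and move policy listed above (copy operations deleted, move operations defaulted) can be illustrated with a minimal standalone sketch. `PointerHoldingKernel` is a hypothetical stand-in that stores a non-owning pointer; it is not part of the Compute Library, and the `float` pointer merely stands in for the tensor pointers the real kernel holds.

```cpp
#include <type_traits>
#include <utility>

// Hypothetical stand-in for a kernel that stores non-owning pointers.
// It mirrors only the copy/move declarations documented above.
class PointerHoldingKernel
{
public:
    PointerHoldingKernel() = default;
    // Copying is deleted, so two kernels cannot silently alias the same configuration.
    PointerHoldingKernel(const PointerHoldingKernel &) = delete;
    PointerHoldingKernel &operator=(const PointerHoldingKernel &) = delete;
    // Moving is allowed; the defaulted move copies the raw pointer value.
    PointerHoldingKernel(PointerHoldingKernel &&) = default;
    PointerHoldingKernel &operator=(PointerHoldingKernel &&) = default;
    ~PointerHoldingKernel() = default;

    void set_input(const float *input) { _input = input; }
    const float *input() const { return _input; }

private:
    const float *_input{ nullptr }; // non-owning
};

// Compile-time checks of the policy.
static_assert(!std::is_copy_constructible<PointerHoldingKernel>::value, "copy construction is deleted");
static_assert(!std::is_copy_assignable<PointerHoldingKernel>::value, "copy assignment is deleted");
static_assert(std::is_move_constructible<PointerHoldingKernel>::value, "move construction is allowed");
static_assert(std::is_move_assignable<PointerHoldingKernel>::value, "move assignment is allowed");
```

With this policy, a kernel can be handed over with `std::move` (for example into a container), while an accidental copy fails to compile.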
Member Function Documentation

◆ border_size()

BorderSize border_size ( ) const
override virtual

The size of the border for that kernel.

Returns
The width in number of elements of the border.

Reimplemented from IKernel.

Definition at line 1459 of file NEDirectConvolutionLayerKernel.cpp.

1460 {
1461  return _border_size;
1462 }

Referenced by NEDirectConvolutionLayer::configure(), and NEDirectConvolutionLayerKernel::validate().
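
The value returned here is set in configure(), which passes the padding values to BorderSize in the order top, right, bottom, left. A minimal standalone sketch of that mapping follows; `PaddingSketch`, `BorderSketch` and `border_from_padding` are hypothetical names used for illustration and are not the library's types.

```cpp
// Hypothetical stand-ins; not the library's PadStrideInfo or BorderSize.
struct PaddingSketch
{
    unsigned left;
    unsigned top;
    unsigned right;
    unsigned bottom;
};

struct BorderSketch
{
    unsigned top;
    unsigned right;
    unsigned bottom;
    unsigned left;
};

// Reorders the padding values into top, right, bottom, left,
// the argument order used when configure() builds the border.
BorderSketch border_from_padding(const PaddingSketch &pad)
{
    return BorderSketch{ pad.top, pad.right, pad.bottom, pad.left };
}
```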

◆ configure()

void configure ( const ITensor * input,
const ITensor * weights,
ITensor * output,
const PadStrideInfo & conv_info
)

Set the input, weights, and output tensors.

Note
DirectConvolution only works in the following configurations:
 1x1 convolution with stride_x = 1/2/3, stride_y = 1/2/3
 3x3 convolution with stride_x = 1/2/3, stride_y = 1/2/3
Parameters
[in] input: The input tensor to convolve. The 3 lower dimensions represent a single input [width, height, IFM], while every optional dimension from the 4th onwards represents a batch of inputs. Data types supported: F16/F32.
[in] weights: Weights tensor. Weights are a 4D tensor with dimensions [kernel_x, kernel_y, IFM, OFM]. The 3rd dimension must be the same as the 3rd dimension of the input volume. Data type supported: same as input.
[out] output: Output tensor. The 3rd dimension must be equal to the 4th dimension of the weights tensor. Data types supported: F16/F32.
[in] conv_info: Contains padding and stride information described in PadStrideInfo.

Definition at line 1464 of file NEDirectConvolutionLayerKernel.cpp.

1465 {
1466  ARM_COMPUTE_ERROR_ON_NULLPTR(input, weights, output);
1467 
1468  _input = input;
1469  _weights = weights;
1470  _output = output;
1471  _conv_info = conv_info;
1472  _kernel_size = weights->info()->dimension(get_data_layout_dimension_index(weights->info()->data_layout(), DataLayoutDimension::WIDTH));
1473 
1474  const unsigned int conv_pad_left = conv_info.pad_left();
1475  const unsigned int conv_pad_top = conv_info.pad_top();
1476  const unsigned int conv_pad_right = conv_info.pad_right();
1477  const unsigned int conv_pad_bottom = conv_info.pad_bottom();
1478  _border_size = BorderSize(conv_pad_top, conv_pad_right, conv_pad_bottom, conv_pad_left);
1479 
1480  // Get convolved dimensions
1481  TensorShape output_shape = misc::shape_calculator::compute_deep_convolution_shape(*input->info(), *weights->info(), conv_info);
1482 
1483  DataType data_type = input->info()->data_type();
1484 
1485  // Output auto initialization if not yet initialized
1486  auto_init_if_empty(*output->info(), output_shape, 1, data_type);
1487 
1488  // Perform validation step
1489  ARM_COMPUTE_ERROR_THROW_ON(validate_arguments(input->info(), weights->info(), output->info(), conv_info));
1490 
1491  // Configure kernel window
1492  auto win_config = validate_and_configure_window(input->info(), weights->info(), output->info(), conv_info, _num_weight_elems_read_per_row,
1493  _num_elems_read_per_iteration, _num_elems_written_per_iteration, _border_size);
1494  ARM_COMPUTE_ERROR_THROW_ON(win_config.first);
1495  INEKernel::configure(win_config.second);
1496 }

References ARM_COMPUTE_ERROR_ON_NULLPTR, ARM_COMPUTE_ERROR_THROW_ON, arm_compute::auto_init_if_empty(), arm_compute::misc::shape_calculator::compute_deep_convolution_shape(), arm_compute::test::validation::conv_info, TensorInfo::data_layout(), arm_compute::test::validation::data_type, ITensorInfo::data_type(), TensorInfo::dimension(), arm_compute::get_data_layout_dimension_index(), ITensor::info(), CLTensor::info(), arm_compute::test::validation::output_shape, arm_compute::validate_and_configure_window(), arm_compute::test::validation::weights, and arm_compute::WIDTH.

Referenced by NEDirectConvolutionLayer::configure().
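
The "Get convolved dimensions" step in configure() derives the output width and height from the input size, kernel size, padding and stride. The sketch below shows the usual arithmetic in standalone form. It assumes floor rounding of the division, and `ConvGeometry` and `convolved_dims` are hypothetical names; this is not the library's compute_deep_convolution_shape().

```cpp
#include <cassert>
#include <utility>

// Hypothetical holder for the stride and padding values carried by conv_info.
struct ConvGeometry
{
    unsigned stride_x;
    unsigned stride_y;
    unsigned pad_left;
    unsigned pad_right;
    unsigned pad_top;
    unsigned pad_bottom;
};

// Returns {output_width, output_height}, assuming floor rounding:
//   out = (in + pad_before + pad_after - kernel) / stride + 1
std::pair<unsigned, unsigned> convolved_dims(unsigned in_w, unsigned in_h,
                                             unsigned kernel_w, unsigned kernel_h,
                                             const ConvGeometry &g)
{
    const unsigned padded_w = in_w + g.pad_left + g.pad_right;
    const unsigned padded_h = in_h + g.pad_top + g.pad_bottom;

    // Guard against division by zero and unsigned underflow.
    assert(g.stride_x > 0 && g.stride_y > 0);
    assert(padded_w >= kernel_w && padded_h >= kernel_h);

    const unsigned out_w = (padded_w - kernel_w) / g.stride_x + 1;
    const unsigned out_h = (padded_h - kernel_h) / g.stride_y + 1;
    return std::make_pair(out_w, out_h);
}
```

For example, a 224x224 input with a 3x3 kernel, stride 2 and one element of padding on each side gives (226 - 3) / 2 + 1 = 112 in each dimension.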

◆ name()

const char* name ( ) const
inline override virtual

Name of the kernel.

Returns
Kernel name

Implements ICPPKernel.

Definition at line 37 of file NEDirectConvolutionLayerKernel.h.

38  {
39  return "NEDirectConvolutionLayerKernel";
40  }

◆ operator=() [1/2]

Prevent instances of this class from being copied (as this class contains pointers).

◆ operator=() [2/2]

Allow instances of this class to be moved.

◆ run()

void run ( const Window & window,
const ThreadInfo & info
)
override virtual

Execute the kernel on the passed window.

Warning
If is_parallelisable() returns false, the passed window must be equal to window().
Note
The window has to be a region within the window returned by the window() method.
The width of the window has to be a multiple of num_elems_processed_per_iteration().
Parameters
[in] window: Region on which to execute the kernel. (Must be a region of the window returned by window().)
[in] info: Info about the executing thread and CPU.

Implements ICPPKernel.

Definition at line 1518 of file NEDirectConvolutionLayerKernel.cpp.

1519 {
1520  ARM_COMPUTE_UNUSED(info);
1521  ARM_COMPUTE_ERROR_ON_UNCONFIGURED_KERNEL(this);
1522  ARM_COMPUTE_ERROR_ON_INVALID_SUBWINDOW(INEKernel::window(), window);
1523  ARM_COMPUTE_ERROR_ON(_input->buffer() == nullptr);
1524 
1525  const int kernel_size = _weights->info()->dimension(get_data_layout_dimension_index(_weights->info()->data_layout(), DataLayoutDimension::WIDTH));
1526 
1527  if(_input->info()->data_layout() == DataLayout::NCHW)
1528  {
1529  switch(kernel_size)
1530  {
1531  case 1:
1532  {
1533  switch(_input->info()->data_type())
1534  {
1535  case DataType::F32:
1536  convolve_1x1<float, float>(window, _num_elems_read_per_iteration, _num_elems_written_per_iteration, _input, _weights, _output, _conv_info);
1537  break;
1538 #ifdef __ARM_FEATURE_FP16_VECTOR_ARITHMETIC
1539  case DataType::F16:
1540  convolve_1x1<float16_t, float16_t>(window, _num_elems_read_per_iteration, _num_elems_written_per_iteration, _input, _weights, _output, _conv_info);
1541  break;
1542 #endif /* __ARM_FEATURE_FP16_VECTOR_ARITHMETIC */
1543  default:
1544  ARM_COMPUTE_ERROR("Data type not supported");
1545  break;
1546  }
1547  break;
1548  }
1549  case 3:
1550  {
1551  switch(_input->info()->data_type())
1552  {
1553  case DataType::F32:
1554  convolve_3x3<float, float>(window, _num_elems_read_per_iteration, _num_elems_written_per_iteration, _input, _weights, _output, _conv_info);
1555  break;
1556 #ifdef __ARM_FEATURE_FP16_VECTOR_ARITHMETIC
1557  case DataType::F16:
1558  convolve_3x3<float16_t, float16_t>(window, _num_elems_read_per_iteration, _num_elems_written_per_iteration, _input, _weights, _output, _conv_info);
1559  break;
1560 #endif /* __ARM_FEATURE_FP16_VECTOR_ARITHMETIC */
1561  default:
1562  ARM_COMPUTE_ERROR("Data type not supported");
1563  break;
1564  }
1565  break;
1566  }
1567  case 5:
1568  {
1569  switch(_input->info()->data_type())
1570  {
1571  case DataType::F32:
1572  convolve_5x5<float, float>(window, _num_elems_read_per_iteration, _num_elems_written_per_iteration, _input, _weights, _output, _conv_info);
1573  break;
1574  default:
1575  ARM_COMPUTE_ERROR("Data type not supported");
1576  break;
1577  }
1578  break;
1579  }
1580  default:
1581  {
1582  ARM_COMPUTE_ERROR("Only kernel sizes 1x1, 3x3 and 5x5 are supported.");
1583  break;
1584  }
1585  }
1586  }
1587  else
1588  {
1589  const int kernel_size = _weights->info()->dimension(get_data_layout_dimension_index(_weights->info()->data_layout(), DataLayoutDimension::WIDTH));
1590  const int stride_x = std::get<0>(_conv_info.stride());
1591  const int stride_y = std::get<1>(_conv_info.stride());
1592 
1593  switch(_input->info()->data_type())
1594  {
1595  case DataType::F32:
1596  {
1597  if(kernel_size == 9 && stride_x == 1 && stride_y == 1)
1598  {
1600  convolve_9x9_nhwc<vtype>(window, _num_elems_read_per_iteration, _input, _weights, _output, _conv_info);
1601  }
1602  else
1603  {
1604  convolver_nhwc<float>::convolve(window, kernel_size, _num_elems_read_per_iteration, _input, _weights, _output, _conv_info);
1605  }
1606  break;
1607  }
1608  default:
1609  ARM_COMPUTE_ERROR("Data type not supported");
1610  break;
1611  }
1612  }
1613 }

References ARM_COMPUTE_ERROR, ARM_COMPUTE_ERROR_ON, ARM_COMPUTE_ERROR_ON_INVALID_SUBWINDOW, ARM_COMPUTE_ERROR_ON_UNCONFIGURED_KERNEL, ARM_COMPUTE_UNUSED, ITensor::buffer(), ITensorInfo::data_layout(), ITensorInfo::data_type(), ITensorInfo::dimension(), arm_compute::F16, arm_compute::F32, arm_compute::get_data_layout_dimension_index(), ITensor::info(), arm_compute::test::validation::info, arm_compute::NCHW, PadStrideInfo::stride(), arm_compute::WIDTH, and IKernel::window().
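
The dispatch performed by run() can be summarised as a decision over data layout, kernel size, data type and stride. The standalone sketch below mirrors the branches of the listing above and returns the name of the routine that would be selected. The enums, the `select_path` function and the `has_fp16_vector_arithmetic` flag (standing in for the `__ARM_FEATURE_FP16_VECTOR_ARITHMETIC` guard) are hypothetical; the real kernel calls the routines directly instead of returning strings.

```cpp
#include <string>

// Hypothetical stand-ins for DataLayout and DataType.
enum class Layout { NCHW, NHWC };
enum class Type { F16, F32, Other };

const char *const kUnsupportedType = "Data type not supported";
const char *const kUnsupportedSize = "Only kernel sizes 1x1, 3x3 and 5x5 are supported.";

// Returns the name of the routine the listing above would select,
// or the error message it would raise.
std::string select_path(Layout layout, Type type, int kernel_size,
                        int stride_x, int stride_y, bool has_fp16_vector_arithmetic)
{
    if(layout == Layout::NCHW)
    {
        const bool f32 = (type == Type::F32);
        // F16 cases only exist when FP16 vector arithmetic is compiled in.
        const bool f16 = (type == Type::F16) && has_fp16_vector_arithmetic;
        switch(kernel_size)
        {
            case 1:
                return (f32 || f16) ? "convolve_1x1" : kUnsupportedType;
            case 3:
                return (f32 || f16) ? "convolve_3x3" : kUnsupportedType;
            case 5:
                // The 5x5 branch handles F32 only.
                return f32 ? "convolve_5x5" : kUnsupportedType;
            default:
                return kUnsupportedSize;
        }
    }

    // NHWC branch: F32 only, with a specialised path for 9x9 at unit stride.
    if(type != Type::F32)
    {
        return kUnsupportedType;
    }
    if(kernel_size == 9 && stride_x == 1 && stride_y == 1)
    {
        return "convolve_9x9_nhwc";
    }
    return "convolver_nhwc";
}
```

Reading the branches this way shows that, in the listing, the NCHW path accepts 5x5 kernels for F32 and that the NHWC path does not restrict the kernel size at this point.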

◆ validate()

Status validate ( const ITensorInfo * input,
const ITensorInfo * weights,
const ITensorInfo * output,
const PadStrideInfo & conv_info
)
static

Static function to check if the given info will lead to a valid configuration of NEDirectConvolutionLayerKernel.

Parameters
[in] input: The input tensor to convolve. The 3 lower dimensions represent a single input [width, height, IFM], while every optional dimension from the 4th onwards represents a batch of inputs. Data types supported: F16/F32.
[in] weights: Weights tensor. Weights are a 4D tensor with dimensions [kernel_x, kernel_y, IFM, OFM]. The 3rd dimension must be the same as the 3rd dimension of the input volume. Data type supported: same as input.
[in] output: Output tensor. The 3rd dimension must be equal to the 4th dimension of the weights tensor. Data types supported: F16/F32.
[in] conv_info: Contains padding and stride information described in PadStrideInfo.
Returns
a status

Definition at line 1498 of file NEDirectConvolutionLayerKernel.cpp.

1499 {
1500  unsigned int num_weight_elems_read_per_row = 0;
1501  unsigned int num_elems_read_per_iteration = 0;
1502  unsigned int num_elems_written_per_iteration = 0;
1503  BorderSize border_size = {};
1504  ARM_COMPUTE_RETURN_ON_ERROR(validate_arguments(input, weights, output, conv_info));
1505  ARM_COMPUTE_RETURN_ON_ERROR(validate_and_configure_window(input->clone().get(),
1506  weights->clone().get(),
1507  output->clone().get(),
1508  conv_info,
1509  num_weight_elems_read_per_row,
1510  num_elems_read_per_iteration,
1511  num_elems_written_per_iteration,
1512  border_size)
1513  .first);
1514 
1515  return Status{};
1516 }

References ARM_COMPUTE_RETURN_ON_ERROR, NEDirectConvolutionLayerKernel::border_size(), ICloneable< T >::clone(), arm_compute::test::validation::conv_info, arm_compute::validate_and_configure_window(), and arm_compute::test::validation::weights.

Referenced by NEDirectConvolutionLayer::validate().
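
The shape of validate() (run each check, return the first error, otherwise return an empty status) can be sketched in standalone form. Everything below is hypothetical: `SimpleStatus`, `ShapeInfo`, `RETURN_ON_ERROR` and the helper functions are illustrative names, and the two checks are limited to the channel constraints stated in the parameter descriptions above. The library's own validate_arguments() is not reproduced here.

```cpp
#include <string>

// Hypothetical status type: an empty error string means success.
struct SimpleStatus
{
    std::string error;
    bool ok() const { return error.empty(); }
};

// Hypothetical 4D shape; for weights the fields are [kernel_x, kernel_y, IFM, OFM].
struct ShapeInfo
{
    unsigned d0;
    unsigned d1;
    unsigned d2;
    unsigned d3;
};

// Return early with the first failing status (same idea as a return-on-error macro).
#define RETURN_ON_ERROR(expr)              \
    do                                     \
    {                                      \
        const SimpleStatus status_ = (expr); \
        if(!status_.ok())                  \
        {                                  \
            return status_;                \
        }                                  \
    } while(false)

SimpleStatus check_pointers(const ShapeInfo *input, const ShapeInfo *weights, const ShapeInfo *output)
{
    if(input == nullptr || weights == nullptr || output == nullptr)
    {
        return SimpleStatus{ "null tensor info" };
    }
    return SimpleStatus{};
}

SimpleStatus check_channels(const ShapeInfo &input, const ShapeInfo &weights, const ShapeInfo &output)
{
    // The weights' 3rd dimension (IFM) must match the input's 3rd dimension.
    if(weights.d2 != input.d2)
    {
        return SimpleStatus{ "weights IFM does not match input IFM" };
    }
    // The output's 3rd dimension must equal the weights' 4th dimension (OFM).
    if(output.d2 != weights.d3)
    {
        return SimpleStatus{ "output depth does not match weights OFM" };
    }
    return SimpleStatus{};
}

SimpleStatus validate_sketch(const ShapeInfo *input, const ShapeInfo *weights, const ShapeInfo *output)
{
    RETURN_ON_ERROR(check_pointers(input, weights, output));
    RETURN_ON_ERROR(check_channels(*input, *weights, *output));
    return SimpleStatus{};
}
```

Because the function is static and works on shape metadata only, a caller can reject an unsupported configuration before allocating tensors or constructing a kernel.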


The documentation for this class was generated from the following files: