Compute Library
 21.02
NEConvolutionSquare< matrix_size > Class Template Reference

Basic function to execute convolution of size 5x5, 7x7, 9x9.

#include <NEConvolution.h>


Public Member Functions

 NEConvolutionSquare (std::shared_ptr< IMemoryManager > memory_manager=nullptr)
 Default constructor.
 
 NEConvolutionSquare (const NEConvolutionSquare &)=delete
 Prevent instances of this class from being copied (as this class contains pointers).
 
NEConvolutionSquare & operator= (const NEConvolutionSquare &)=delete
 Prevent instances of this class from being copied (as this class contains pointers).
 
 NEConvolutionSquare (NEConvolutionSquare &&)=delete
 Prevent instances of this class from being moved (as this class contains non-movable objects).
 
NEConvolutionSquare & operator= (NEConvolutionSquare &&)=delete
 Prevent instances of this class from being moved (as this class contains non-movable objects).
 
 ~NEConvolutionSquare ()
 Default destructor.
 
void configure (ITensor *input, ITensor *output, const int16_t *conv, uint32_t scale, BorderMode border_mode, uint8_t constant_border_value=0)
 Initialize the function's source, destination, conv and border_mode.
 
void run () override
 Run the kernels contained in the function.
 
Public Member Functions inherited from IFunction
virtual ~IFunction ()=default
 Destructor.
 
virtual void prepare ()
 Prepare the function for executing.
 
 

Detailed Description

template<unsigned int matrix_size>
class arm_compute::NEConvolutionSquare< matrix_size >

Basic function to execute convolution of size 5x5, 7x7, 9x9.

This function calls the following Neon kernels:

  1. NEFillBorderKernel (executed if border_mode == CONSTANT or border_mode == REPLICATE)
  2. NEConvolutionKernel, or NESeparableConvolutionHorKernel and NESeparableConvolutionVertKernel (if the convolution matrix is separable)
Deprecated:
This function is deprecated and is intended to be removed in the 21.05 release.

Definition at line 93 of file NEConvolution.h.
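
The kernel choice above depends on whether the convolution matrix is separable, that is, whether it can be written as the outer product of a column vector and a row vector. The following is a minimal sketch of such a check in plain C++, assuming integer factors are wanted. `try_separate` and `gcd_int` are hypothetical helpers written for illustration; this is not the library's `separate_matrix()` and may differ from it in the factors it returns.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Hypothetical helper: greatest common divisor of two non-negative ints.
static int gcd_int(int a, int b)
{
    while(b != 0)
    {
        const int t = a % b;
        a = b;
        b = t;
    }
    return a;
}

// Hypothetical helper, not the library's separate_matrix().
// Tries to write the row-major n x n matrix m as col * row^T with integer factors.
// Returns true on success; on failure the contents of col and row are unspecified.
bool try_separate(const std::vector<int16_t> &m, std::size_t n,
                  std::vector<int16_t> &col, std::vector<int16_t> &row)
{
    assert(m.size() == n * n);
    col.assign(n, 0);
    row.assign(n, 0);

    // Find a pivot: the first non-zero coefficient.
    std::size_t pi = n;
    std::size_t pj = n;
    for(std::size_t k = 0; k < n * n && pi == n; ++k)
    {
        if(m[k] != 0)
        {
            pi = k / n;
            pj = k % n;
        }
    }
    if(pi == n)
    {
        return true; // an all-zero matrix is trivially separable
    }

    // Row factor: the pivot row divided by the gcd of its entries.
    int g = 0;
    for(std::size_t j = 0; j < n; ++j)
    {
        g = gcd_int(g, std::abs(static_cast<int>(m[pi * n + j])));
    }
    for(std::size_t j = 0; j < n; ++j)
    {
        row[j] = static_cast<int16_t>(m[pi * n + j] / g);
    }

    // Column factor: the pivot column divided by the row factor's pivot entry.
    for(std::size_t i = 0; i < n; ++i)
    {
        const int v = m[i * n + pj];
        if(v % row[pj] != 0)
        {
            return false;
        }
        col[i] = static_cast<int16_t>(v / row[pj]);
    }

    // Verify that the outer product reproduces every coefficient.
    for(std::size_t i = 0; i < n; ++i)
    {
        for(std::size_t j = 0; j < n; ++j)
        {
            if(static_cast<int>(col[i]) * static_cast<int>(row[j]) != static_cast<int>(m[i * n + j]))
            {
                return false;
            }
        }
    }
    return true;
}
```

A 5x5 matrix built as the outer product of {1, 2, 3, 2, 1} and {1, 4, 6, 4, 1} passes this check; changing a single coefficient generally makes it fail, in which case the single-pass kernel would be the one to use.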

Constructor & Destructor Documentation

◆ NEConvolutionSquare() [1/3]

NEConvolutionSquare ( std::shared_ptr< IMemoryManager >  memory_manager = nullptr )

Default constructor.

Definition at line 60 of file NEConvolution.cpp.

    : _memory_group(std::move(memory_manager)), _tmp(), _is_separable(false), _kernel_hor(), _kernel_vert(), _kernel(), _border_handler()
{
}

◆ NEConvolutionSquare() [2/3]

NEConvolutionSquare ( const NEConvolutionSquare< matrix_size > &  )
delete

Prevent instances of this class from being copied (As this class contains pointers)

◆ NEConvolutionSquare() [3/3]

NEConvolutionSquare ( NEConvolutionSquare< matrix_size > &&  )
delete

Prevent instances of this class from being moved (As this class contains non movable objects)

◆ ~NEConvolutionSquare()

~NEConvolutionSquare ( )
default

Default destructor.

Referenced by NEConvolution3x3::configure().

Member Function Documentation

◆ configure()

void configure ( ITensor *  input,
ITensor *  output,
const int16_t *  conv,
uint32_t  scale,
BorderMode  border_mode,
uint8_t  constant_border_value = 0 
)

Initialize the function's source, destination, conv and border_mode.

Parameters
[in,out]  input                  Source tensor. Data type supported: U8. (Written to only for border_mode != UNDEFINED)
[out]     output                 Destination tensor. Data types supported: U8 or S16.
[in]      conv                   matrix_size x matrix_size S16 coefficients structured as a row-major 2D array in a linear buffer.
[in]      scale                  Scale of the convolution matrix. If 0 is passed, it will be set to the sum of the coefficients of the convolution or 1 if they add up to 0.
[in]      border_mode            Strategy to use for borders.
[in]      constant_border_value  (Optional) Constant value to use for borders if border_mode is set to CONSTANT.
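
The conv layout and the scale == 0 rule described above can be sketched in plain C++ as follows. `default_scale` and `coeff_at` are hypothetical helpers written for illustration, not library functions. Taking the magnitude of a negative sum is an assumption of this sketch; the parameter description only covers the zero-sum case.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <numeric>

// Hypothetical helper: the scale used when the caller passes scale == 0.
// Follows the documented rule: sum of the coefficients, or 1 if they add up to 0.
uint32_t default_scale(const int16_t *conv, unsigned int matrix_size)
{
    assert(conv != nullptr);
    const std::size_t count = static_cast<std::size_t>(matrix_size) * matrix_size;
    const int         sum   = std::accumulate(conv, conv + count, 0);
    // Assumption of this sketch: a negative sum is mapped to its magnitude.
    return sum == 0 ? 1u : static_cast<uint32_t>(std::abs(sum));
}

// Hypothetical helper: coefficient at (row, col) of a row-major matrix
// stored in a linear buffer, as expected by the conv parameter.
int16_t coeff_at(const int16_t *conv, unsigned int matrix_size, unsigned int row, unsigned int col)
{
    assert(conv != nullptr && row < matrix_size && col < matrix_size);
    return conv[row * matrix_size + col];
}
```

For a 5x5 box filter of ones, `default_scale` gives 25, so the output is the mean of the window. For a zero-sum matrix, such as an edge detector, it gives 1 and the response is left unscaled.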

Definition at line 66 of file NEConvolution.cpp.

References TensorAllocator::allocate(), Tensor::allocator(), ARM_COMPUTE_ERROR_ON, ARM_COMPUTE_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN, arm_compute::test::validation::b, arm_compute::calculate_matrix_scale(), arm_compute::data_type_for_convolution(), ITensor::info(), TensorAllocator::init(), MemoryGroup::manage(), arm_compute::S16, arm_compute::test::validation::scale, arm_compute::separate_matrix(), ITensorInfo::tensor_shape(), arm_compute::U8, arm_compute::UNDEFINED, and arm_compute::UNKNOWN.

{
    ARM_COMPUTE_ERROR_ON(conv == nullptr);

    std::array<int16_t, matrix_size> conv_col{ { 0 } };
    std::array<int16_t, matrix_size> conv_row{ { 0 } };

    _is_separable = separate_matrix(conv, conv_col.data(), conv_row.data(), matrix_size);

    auto b = std::make_unique<NEFillBorderKernel>();
    if(_is_separable)
    {
        DataType intermediate_type = DataType::UNKNOWN;
        std::tie(std::ignore, intermediate_type) = data_type_for_convolution(conv_col.data(), conv_row.data(), matrix_size);

        _tmp.allocator()->init(TensorInfo(input->info()->tensor_shape(), 1, intermediate_type));

        // Manage intermediate buffers
        _memory_group.manage(&_tmp);

        // Calculate scale
        if(scale == 0)
        {
            scale = calculate_matrix_scale(conv, matrix_size);
        }

        _kernel_hor  = std::make_unique<NESeparableConvolutionHorKernel<matrix_size>>();
        _kernel_vert = std::make_unique<NESeparableConvolutionVertKernel<matrix_size>>();

        _kernel_hor->configure(input, &_tmp, conv_row.data(), border_mode == BorderMode::UNDEFINED);
        _kernel_vert->configure(&_tmp, output, conv_col.data(), scale, border_mode == BorderMode::UNDEFINED);

        _tmp.allocator()->allocate();

        b->configure(input, _kernel_hor->border_size(), border_mode, PixelValue(constant_border_value));
    }
    else
    {
        _kernel = std::make_unique<NEConvolutionKernel<matrix_size>>();
        _kernel->configure(input, output, conv, scale, border_mode == BorderMode::UNDEFINED);
        b->configure(input, _kernel->border_size(), border_mode, PixelValue(constant_border_value));
    }
    _border_handler = std::move(b);
}

◆ operator=() [1/2]

NEConvolutionSquare& operator= ( const NEConvolutionSquare< matrix_size > &  )
delete

Prevent instances of this class from being copied (As this class contains pointers)

◆ operator=() [2/2]

NEConvolutionSquare& operator= ( NEConvolutionSquare< matrix_size > &&  )
delete

Prevent instances of this class from being moved (As this class contains non movable objects)

◆ run()

void run ( )
override virtual

Run the kernels contained in the function.

For Neon kernels:

  • Multi-threading is used for the kernels which are parallelisable.
  • By default std::thread::hardware_concurrency() threads are used.
Note
CPPScheduler::set_num_threads() can be used to manually set the number of threads

For OpenCL kernels:

  • All the kernels are enqueued on the queue associated with CLScheduler.
  • The queue is then flushed.
Note
The function will not block until the kernels are executed. It is the user's responsibility to wait.
Will call prepare() on the first run if it has not been done already.

Implements IFunction.

Definition at line 115 of file NEConvolution.cpp.

References Window::DimY, Window::DimZ, Scheduler::get(), IScheduler::schedule(), and NEConvolutionRectangle::~NEConvolutionRectangle().

{
    NEScheduler::get().schedule(_border_handler.get(), Window::DimZ);

    if(_is_separable)
    {
        MemoryGroupResourceScope scope_mg(_memory_group);

        NEScheduler::get().schedule(_kernel_hor.get(), Window::DimY);
        NEScheduler::get().schedule(_kernel_vert.get(), Window::DimY);
    }
    else
    {
        NEScheduler::get().schedule(_kernel.get(), Window::DimY);
    }
}
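
The two branches in run() produce the same result for a separable matrix: a horizontal pass into an intermediate buffer followed by a vertical pass is equivalent to one pass with the full matrix. The following is a minimal scalar sketch of that equivalence, not the Neon kernels. `Image`, `convolve_direct` and `convolve_separable` are hypothetical names. The sketch leaves border pixels at 0 as a stand-in for BorderMode::UNDEFINED, applies the coefficients without flipping them, and does not model saturation to U8 or S16; all of these are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical plain-C++ model of a single-channel image; not an arm_compute type.
struct Image
{
    std::size_t          w;
    std::size_t          h;
    std::vector<int32_t> px; // row-major pixels

    int32_t at(std::size_t x, std::size_t y) const { return px[y * w + x]; }
};

// Single pass with the full n x n matrix (models the non-separable branch).
Image convolve_direct(const Image &in, const std::vector<int16_t> &conv, std::size_t n, int32_t scale)
{
    assert(conv.size() == n * n && scale != 0);
    Image             out{ in.w, in.h, std::vector<int32_t>(in.w * in.h, 0) };
    const std::size_t r = n / 2;
    for(std::size_t y = r; y + r < in.h; ++y)
    {
        for(std::size_t x = r; x + r < in.w; ++x)
        {
            int32_t acc = 0;
            for(std::size_t i = 0; i < n; ++i)
            {
                for(std::size_t j = 0; j < n; ++j)
                {
                    acc += conv[i * n + j] * in.at(x + j - r, y + i - r);
                }
            }
            out.px[y * in.w + x] = acc / scale;
        }
    }
    return out;
}

// Two passes (models the separable branch): horizontal into tmp, then vertical with scaling.
Image convolve_separable(const Image &in, const std::vector<int16_t> &col,
                         const std::vector<int16_t> &row, int32_t scale)
{
    assert(col.size() == row.size() && scale != 0);
    const std::size_t n = row.size();
    const std::size_t r = n / 2;
    Image             tmp{ in.w, in.h, std::vector<int32_t>(in.w * in.h, 0) };
    Image             out{ in.w, in.h, std::vector<int32_t>(in.w * in.h, 0) };

    // Horizontal pass: every image row, interior columns only.
    for(std::size_t y = 0; y < in.h; ++y)
    {
        for(std::size_t x = r; x + r < in.w; ++x)
        {
            int32_t acc = 0;
            for(std::size_t j = 0; j < n; ++j)
            {
                acc += row[j] * in.at(x + j - r, y);
            }
            tmp.px[y * in.w + x] = acc;
        }
    }

    // Vertical pass over the intermediate buffer; the scale is applied here.
    for(std::size_t y = r; y + r < in.h; ++y)
    {
        for(std::size_t x = r; x + r < in.w; ++x)
        {
            int32_t acc = 0;
            for(std::size_t i = 0; i < n; ++i)
            {
                acc += col[i] * tmp.at(x, y + i - r);
            }
            out.px[y * in.w + x] = acc / scale;
        }
    }
    return out;
}
```

The separable form needs 2n multiplications per output pixel instead of n x n, which is the usual reason for preferring it when the matrix allows it.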

The documentation for this class was generated from the following files: