Compute Library
 21.02
CLConvolutionSquare< matrix_size > Class Template Reference

Basic function to execute square convolution. It currently supports 5x5, 7x7 and 9x9 matrices.

#include <CLConvolution.h>


Public Member Functions

CLConvolutionSquare (std::shared_ptr< IMemoryManager > memory_manager=nullptr)
Default constructor.

CLConvolutionSquare (const CLConvolutionSquare &)=delete
Prevent instances of this class from being copied (As this class contains pointers)

CLConvolutionSquare (CLConvolutionSquare &&)=default
Default move constructor.

CLConvolutionSquare & operator= (const CLConvolutionSquare &)=delete
Prevent instances of this class from being copied (As this class contains pointers)

CLConvolutionSquare & operator= (CLConvolutionSquare &&)=default
Default move assignment operator.

~CLConvolutionSquare ()
Default destructor.

void configure (ICLTensor *input, ICLTensor *output, const int16_t *conv, uint32_t scale, BorderMode border_mode, uint8_t constant_border_value=0)
Initialize the function's source, destination, conv and border_mode.

void configure (const CLCompileContext &compile_context, ICLTensor *input, ICLTensor *output, const int16_t *conv, uint32_t scale, BorderMode border_mode, uint8_t constant_border_value=0)
Initialize the function's source, destination, conv and border_mode.

void run () override
Run the kernels contained in the function.
 
- Public Member Functions inherited from IFunction
virtual ~IFunction ()=default
Destructor.

virtual void prepare ()
Prepare the function for executing.
 

Detailed Description

template<unsigned int matrix_size>
class arm_compute::CLConvolutionSquare< matrix_size >

Basic function to execute square convolution. It currently supports 5x5, 7x7 and 9x9 matrices.

This function calls the following OpenCL kernels:

  1. CLFillBorderKernel (executed if border_mode == CONSTANT or border_mode == REPLICATE)
  2. CLConvolutionKernel, or
    CLSeparableConvolutionHorKernel and CLSeparableConvolutionVertKernel (if the convolution matrix is separable)
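The two execution paths listed above compute the same result whenever the matrix is separable, i.e. it equals the outer product of a column vector and a row vector. The standalone sketch below (plain C++ with no Compute Library types; every name in it is hypothetical) checks this on a 5x5 binomial kernel built from [1 4 6 4 1]. One 2D pass costs matrix_size x matrix_size multiply-adds per pixel, while a horizontal pass into an intermediate image followed by a vertical pass costs 2 x matrix_size. Border pixels are simply skipped here rather than handled by a border mode.

```cpp
#include <array>
#include <cassert>
#include <vector>

// Hypothetical sketch: a separable kernel k = col * row^T applied as two 1D
// passes gives the same result as one 2D pass. Not a Compute Library API.
constexpr int N = 5;     // kernel size (matrix_size)
constexpr int R = N / 2; // kernel radius; pixels closer than R to an edge stay 0

using Image = std::vector<std::vector<int>>;
using Vec   = std::array<int, N>;

// One 2D pass: N * N multiply-adds per output pixel.
Image full_pass(const Image &in, const std::array<int, N * N> &k)
{
    const int h = static_cast<int>(in.size());
    const int w = static_cast<int>(in[0].size());
    Image out(h, std::vector<int>(w, 0));
    for(int y = R; y < h - R; ++y)
        for(int x = R; x < w - R; ++x)
            for(int i = 0; i < N; ++i)
                for(int j = 0; j < N; ++j)
                    out[y][x] += k[i * N + j] * in[y + i - R][x + j - R];
    return out;
}

// Horizontal pass into an intermediate image (the role of _tmp in this class).
Image horizontal_pass(const Image &in, const Vec &row)
{
    const int h = static_cast<int>(in.size());
    const int w = static_cast<int>(in[0].size());
    Image tmp(h, std::vector<int>(w, 0));
    for(int y = 0; y < h; ++y)
        for(int x = R; x < w - R; ++x)
            for(int j = 0; j < N; ++j)
                tmp[y][x] += row[j] * in[y][x + j - R];
    return tmp;
}

// Vertical pass over the intermediate image: 2 * N multiply-adds per pixel overall.
Image vertical_pass(const Image &tmp, const Vec &col)
{
    const int h = static_cast<int>(tmp.size());
    const int w = static_cast<int>(tmp[0].size());
    Image out(h, std::vector<int>(w, 0));
    for(int y = R; y < h - R; ++y)
        for(int x = R; x < w - R; ++x)
            for(int i = 0; i < N; ++i)
                out[y][x] += col[i] * tmp[y + i - R][x];
    return out;
}

// Builds k = col * row^T and compares both strategies on a small test image.
bool two_passes_match_full_pass()
{
    const Vec col{ 1, 4, 6, 4, 1 };
    const Vec row{ 1, 4, 6, 4, 1 };
    std::array<int, N * N> k{};
    for(int i = 0; i < N; ++i)
        for(int j = 0; j < N; ++j)
            k[i * N + j] = col[i] * row[j];

    Image img(9, std::vector<int>(9));
    for(int y = 0; y < 9; ++y)
        for(int x = 0; x < 9; ++x)
            img[y][x] = (y * 7 + x * 3) % 256;

    // The image must be larger than the kernel footprint.
    assert(static_cast<int>(img.size()) > 2 * R);
    return full_pass(img, k) == vertical_pass(horizontal_pass(img, row), col);
}
```

Calling two_passes_match_full_pass() returns true. With a constant image of ones, every interior output of the two-pass version equals 16 x 16 = 256, since each 1D vector sums to 16.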
Deprecated:
This function is deprecated and is intended to be removed in the 21.05 release.

Definition at line 92 of file CLConvolution.h.

Constructor & Destructor Documentation

◆ CLConvolutionSquare() [1/3]

CLConvolutionSquare ( std::shared_ptr< IMemoryManager > memory_manager = nullptr )

Default constructor.

Definition at line 56 of file CLConvolution.cpp.

References CLConvolutionSquare< matrix_size >::~CLConvolutionSquare().

    : _memory_group(std::move(memory_manager)), _tmp(), _is_separable(false), _kernel_hor(std::make_unique<CLSeparableConvolutionHorKernel<matrix_size>>()),
      _kernel_vert(std::make_unique<CLSeparableConvolutionVertKernel<matrix_size>>()), _kernel(std::make_unique<CLConvolutionKernel<matrix_size>>()), _border_handler(std::make_unique<CLFillBorderKernel>())
{
}

◆ CLConvolutionSquare() [2/3]

CLConvolutionSquare ( const CLConvolutionSquare< matrix_size > &  )
delete

Prevent instances of this class from being copied (As this class contains pointers)

◆ CLConvolutionSquare() [3/3]

CLConvolutionSquare ( CLConvolutionSquare< matrix_size > &&  )
default

Default move constructor.

◆ ~CLConvolutionSquare()

~CLConvolutionSquare ( )
default

Default destructor.

Member Function Documentation

◆ configure() [1/2]

void configure ( ICLTensor * input,
ICLTensor * output,
const int16_t * conv,
uint32_t scale,
BorderMode border_mode,
uint8_t constant_border_value = 0
)

Initialize the function's source, destination, conv and border_mode.

Parameters
[in,out] input: Source tensor. Data types supported: U8. (Written to only for border_mode != UNDEFINED.)
[out] output: Destination tensor. Data types supported: U8 or S16.
[in] conv: matrix_size x matrix_size S16 coefficients structured as a row-major 2D array in a linear buffer.
[in] scale: Scale of the convolution matrix. If 0 is passed, it will be set to the sum of the coefficients of the convolution, or to 1 if they add up to 0.
[in] border_mode: Strategy to use for borders.
[in] constant_border_value: (Optional) Constant value to use for borders if border_mode is set to CONSTANT.
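The rule for scale can be made concrete with a small sketch. The helper names effective_scale and to_u8 are hypothetical. The division and clamping in to_u8 describe one common way of turning a weighted sum into a U8 pixel and are assumptions here, not a statement of exactly what the OpenCL kernels do.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <numeric>
#include <vector>

// Hypothetical helper following the documented rule for scale == 0: use the
// sum of the coefficients, or 1 if they add up to 0. Taking the absolute value
// of a negative sum is an assumption of this sketch, not documented behaviour.
uint32_t effective_scale(const std::vector<int16_t> &conv, uint32_t scale)
{
    if(scale != 0)
        return scale; // an explicit scale is used as-is
    const int sum = std::accumulate(conv.begin(), conv.end(), 0);
    return sum == 0 ? 1u : static_cast<uint32_t>(std::abs(sum));
}

// Hypothetical helper: turn the weighted sum for one pixel into a U8 value by
// dividing by the scale and clamping to [0, 255]. Truncating integer division
// and clamping are assumptions of this sketch.
uint8_t to_u8(int weighted_sum, uint32_t scale)
{
    assert(scale != 0);
    const int v = weighted_sum / static_cast<int>(scale);
    return static_cast<uint8_t>(std::min(std::max(v, 0), 255));
}
```

For example, a 5x5 box filter of ones gets a scale of 25, so a neighbourhood with constant value 10 (weighted sum 250) maps back to 10, while a zero-sum edge kernel (24 at the centre, -1 elsewhere) falls back to a scale of 1.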

Definition at line 66 of file CLConvolution.cpp.

References CLKernelLibrary::get().

Referenced by CLConvolutionRectangle::configure().

{
    configure(CLKernelLibrary::get().get_compile_context(), input, output, conv, scale, border_mode, constant_border_value);
}

◆ configure() [2/2]

void configure ( const CLCompileContext & compile_context,
ICLTensor * input,
ICLTensor * output,
const int16_t * conv,
uint32_t scale,
BorderMode border_mode,
uint8_t constant_border_value = 0
)

Initialize the function's source, destination, conv and border_mode.

Parameters
[in] compile_context: The compile context to be used.
[in,out] input: Source tensor. Data types supported: U8. (Written to only for border_mode != UNDEFINED.)
[out] output: Destination tensor. Data types supported: U8 or S16.
[in] conv: matrix_size x matrix_size S16 coefficients structured as a row-major 2D array in a linear buffer.
[in] scale: Scale of the convolution matrix. If 0 is passed, it will be set to the sum of the coefficients of the convolution, or to 1 if they add up to 0.
[in] border_mode: Strategy to use for borders.
[in] constant_border_value: (Optional) Constant value to use for borders if border_mode is set to CONSTANT.
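A kernel of size matrix_size reads up to matrix_size / 2 pixels beyond each image edge, and border_mode decides what those pixels contain. The sketch below models the three strategies on a plain row-major buffer. The Border enum and fetch() are hypothetical stand-ins, not the library's BorderMode or CLFillBorderKernel; the library instead fills the input tensor's border region before convolving, which is presumably why input is documented as [in,out].

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical mirror of the three border strategies named in this
// documentation; it is not the library's BorderMode type.
enum class Border { Undefined, Constant, Replicate };

// Reads pixel (y, x) of a row-major h x w U8 image, extending it past its edges.
// Returns -1 for Border::Undefined outside the image, meaning "no value defined".
int fetch(const std::vector<uint8_t> &img, int h, int w, int y, int x,
          Border mode, uint8_t constant_border_value = 0)
{
    assert(static_cast<int>(img.size()) == h * w);
    if(y >= 0 && y < h && x >= 0 && x < w)
        return img[y * w + x];
    switch(mode)
    {
        case Border::Constant:
            return constant_border_value;
        case Border::Replicate:
        {
            // Clamp the coordinates to the nearest edge pixel.
            const int cy = std::min(std::max(y, 0), h - 1);
            const int cx = std::min(std::max(x, 0), w - 1);
            return img[cy * w + cx];
        }
        case Border::Undefined:
        default:
            return -1;
    }
}
```

With REPLICATE the outermost pixels are repeated outward, with CONSTANT every out-of-range read yields constant_border_value, and with UNDEFINED no value exists, so output pixels near the edge cannot be computed reliably.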

Definition at line 73 of file CLConvolution.cpp.

References CLTensorAllocator::allocate(), CLTensor::allocator(), ARM_COMPUTE_ERROR_ON, ARM_COMPUTE_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN, arm_compute::calculate_matrix_scale(), arm_compute::data_type_for_convolution(), ITensor::info(), ITensorAllocator::init(), MemoryGroup::manage(), arm_compute::test::validation::scale, arm_compute::separate_matrix(), ITensorInfo::tensor_shape(), arm_compute::U8, and arm_compute::UNDEFINED.

{
    // ...
    ARM_COMPUTE_ERROR_ON(conv == nullptr);
    std::array<int16_t, matrix_size> conv_col{ 0 };
    std::array<int16_t, matrix_size> conv_row{ 0 };
    _is_separable = separate_matrix(conv, conv_col.data(), conv_row.data(), matrix_size);

    if(_is_separable)
    {
        std::pair<DataType, DataType> type_pair = data_type_for_convolution(conv_col.data(), conv_row.data(), matrix_size);
        _tmp.allocator()->init(TensorInfo(input->info()->tensor_shape(), 1, type_pair.first));

        // Manage intermediate buffers
        _memory_group.manage(&_tmp);

        if(scale == 0)
        {
            scale = calculate_matrix_scale(conv, matrix_size);
        }

        _kernel_hor->configure(compile_context, input, &_tmp, conv_row.data(), border_mode == BorderMode::UNDEFINED);
        _kernel_vert->configure(compile_context, &_tmp, output, conv_col.data(), scale, border_mode == BorderMode::UNDEFINED, type_pair.second);
        _border_handler->configure(compile_context, input, _kernel_hor->border_size(), border_mode, PixelValue(constant_border_value));

        // Allocate intermediate buffer
        _tmp.allocator()->allocate();
    }
    else
    {
        _kernel->configure(compile_context, input, output, conv, scale, border_mode == BorderMode::UNDEFINED);
        _border_handler->configure(compile_context, input, _kernel->border_size(), border_mode, PixelValue(constant_border_value));
    }
}
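The configure() listing above only creates the two 1D kernels when separate_matrix() succeeds. How that function decides is not shown on this page, so the following is a hypothetical rank-1 test with the same shape of interface (matrix in, column and row vectors out), not the library's implementation. It takes the first non-zero row, divides it by the gcd of its entries, derives each column coefficient by exact division, and finally verifies that col[i] * row[j] reproduces every coefficient.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Greatest common divisor of |a| and |b| (Euclid's algorithm).
int gcd_int(int a, int b)
{
    a = std::abs(a);
    b = std::abs(b);
    while(b != 0)
    {
        const int t = a % b;
        a           = b;
        b           = t;
    }
    return a;
}

// Hypothetical rank-1 test, not the library's separate_matrix().
// On success, conv[i * n + j] == col[i] * row[j] for all i, j.
bool split_rank1(const int16_t *conv, int16_t *col, int16_t *row, int n)
{
    // 1. Find the first row that contains a non-zero coefficient.
    int r0 = -1;
    for(int k = 0; k < n * n; ++k)
    {
        if(conv[k] != 0)
        {
            r0 = k / n;
            break;
        }
    }
    if(r0 < 0)
        return false; // an all-zero matrix is treated as not separable

    // 2. Make that row primitive by dividing out the gcd of its entries.
    int g = 0;
    for(int j = 0; j < n; ++j)
        g = gcd_int(g, conv[r0 * n + j]);
    int j0 = -1; // a column where the primitive row is non-zero
    for(int j = 0; j < n; ++j)
    {
        row[j] = static_cast<int16_t>(conv[r0 * n + j] / g);
        if(j0 < 0 && row[j] != 0)
            j0 = j;
    }

    // 3. Each column coefficient must be an exact multiple of row[j0].
    for(int i = 0; i < n; ++i)
    {
        if(conv[i * n + j0] % row[j0] != 0)
            return false;
        col[i] = static_cast<int16_t>(conv[i * n + j0] / row[j0]);
    }

    // 4. Verify the outer product reproduces the whole matrix.
    for(int i = 0; i < n; ++i)
        for(int j = 0; j < n; ++j)
            if(col[i] * row[j] != conv[i * n + j])
                return false;
    return true;
}
```

A 5x5 binomial kernel (the outer product of [1 4 6 4 1] with itself) is accepted, while the zero-sum edge kernel with 24 at the centre and -1 elsewhere is rejected and would take the single CLConvolutionKernel path.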

◆ operator=() [1/2]

CLConvolutionSquare& operator= ( const CLConvolutionSquare< matrix_size > &  )
delete

Prevent instances of this class from being copied (As this class contains pointers)

◆ operator=() [2/2]

CLConvolutionSquare& operator= ( CLConvolutionSquare< matrix_size > &&  )
default

Default move assignment operator.

◆ run()

void run ( )
override virtual

Run the kernels contained in the function.

For Neon kernels:

  • Multi-threading is used for the kernels which are parallelisable.
  • By default std::thread::hardware_concurrency() threads are used.
Note
CPPScheduler::set_num_threads() can be used to manually set the number of threads.

For OpenCL kernels:

  • All the kernels are enqueued on the queue associated with CLScheduler.
  • The queue is then flushed.
Note
The function does not block until the kernels have finished executing; it is the user's responsibility to wait for completion.
Calls prepare() on the first run if it has not already been called.

Implements IFunction.

Definition at line 110 of file CLConvolution.cpp.

References CLScheduler::enqueue(), and CLScheduler::get().

{
    CLScheduler::get().enqueue(*_border_handler);

    if(_is_separable)
    {
        MemoryGroupResourceScope scope_mg(_memory_group);

        CLScheduler::get().enqueue(*_kernel_hor, false);
        CLScheduler::get().enqueue(*_kernel_vert);
    }
    else
    {
        CLScheduler::get().enqueue(*_kernel);
    }
}
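The ordering in run() can be illustrated without a GPU. In the sketch below, FakeQueue and ScopeGuard are hypothetical stand-ins for CLScheduler and MemoryGroupResourceScope that only record events: the border fill is enqueued and flushed, the intermediate buffer is held for the duration of the separable branch, and the horizontal pass is enqueued with flush = false so that it is submitted together with the vertical pass. Because run() does not wait for completion, a caller using the Compute Library would typically call CLScheduler::get().sync() before reading the output.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a command queue: it only records what happens.
struct FakeQueue
{
    std::vector<std::string> log;

    void enqueue(const std::string &kernel, bool flush = true)
    {
        log.push_back("enqueue " + kernel);
        if(flush)
            log.push_back("flush");
    }
};

// Hypothetical RAII guard playing the role of MemoryGroupResourceScope:
// intermediate memory is acquired on construction and released on destruction.
struct ScopeGuard
{
    FakeQueue &q;
    explicit ScopeGuard(FakeQueue &queue)
        : q(queue)
    {
        q.log.push_back("acquire tmp");
    }
    ~ScopeGuard()
    {
        q.log.push_back("release tmp");
    }
};

// Mirrors the control flow of the run() listing on this page.
void run_model(FakeQueue &q, bool is_separable)
{
    q.enqueue("fill_border");
    if(is_separable)
    {
        ScopeGuard scope(q);
        q.enqueue("horizontal", false); // not flushed: batched with the vertical pass
        q.enqueue("vertical");
    }
    else
    {
        q.enqueue("convolution");
    }
}
```

For the separable case the recorded sequence is: enqueue fill_border, flush, acquire tmp, enqueue horizontal, enqueue vertical, flush, release tmp.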

The documentation for this class was generated from the following files: