Compute Library
 19.08
NEHOGDetectorKernel Class Reference

NEON kernel that performs HOG detection using a linear SVM.

#include <NEHOGDetectorKernel.h>


Public Member Functions

const char * name () const override
 Name of the kernel.

 NEHOGDetectorKernel ()
 Default constructor.

 NEHOGDetectorKernel (const NEHOGDetectorKernel &)=delete
 Prevent instances of this class from being copied (as this class contains pointers).

NEHOGDetectorKernel & operator= (const NEHOGDetectorKernel &)=delete
 Prevent instances of this class from being copied (as this class contains pointers).

 NEHOGDetectorKernel (NEHOGDetectorKernel &&)=default
 Allow instances of this class to be moved.

NEHOGDetectorKernel & operator= (NEHOGDetectorKernel &&)=default
 Allow instances of this class to be moved.

 ~NEHOGDetectorKernel ()=default
 Default destructor.

void configure (const ITensor *input, const IHOG *hog, IDetectionWindowArray *detection_windows, const Size2D &detection_window_stride, float threshold=0.0f, uint16_t idx_class=0)
 Initialise the kernel's input, HOG data-object, detection window array, the stride of the detection window, the threshold and the index of the object to detect.

void run (const Window &window, const ThreadInfo &info) override
 Execute the kernel on the passed window.
 
Public Member Functions inherited from ICPPKernel
virtual ~ICPPKernel ()=default
 Default destructor.

Public Member Functions inherited from IKernel
 IKernel ()
 Constructor.

virtual ~IKernel ()=default
 Destructor.

virtual bool is_parallelisable () const
 Indicates whether or not the kernel is parallelisable.

virtual BorderSize border_size () const
 The size of the border for that kernel.

const Window & window () const
 The maximum window the kernel can be executed on.


Detailed Description

NEON kernel that performs HOG detection using a linear SVM.

Definition at line 37 of file NEHOGDetectorKernel.h.

Constructor & Destructor Documentation

◆ NEHOGDetectorKernel() [1/3]

Default constructor.

Definition at line 36 of file NEHOGDetectorKernel.cpp.

    : _input(nullptr), _detection_windows(), _hog_descriptor(nullptr), _bias(0.0f), _threshold(0.0f), _idx_class(0), _num_bins_per_descriptor_x(0), _num_blocks_per_descriptor_y(0), _block_stride_width(0),
      _block_stride_height(0), _detection_window_width(0), _detection_window_height(0), _max_num_detection_windows(0), _mutex()
{
}

◆ NEHOGDetectorKernel() [2/3]

Prevent instances of this class from being copied (as this class contains pointers).

◆ NEHOGDetectorKernel() [3/3]

Allow instances of this class to be moved.

◆ ~NEHOGDetectorKernel()

~NEHOGDetectorKernel ( )
default

Default destructor.

Member Function Documentation

◆ configure()

void configure ( const ITensor * input,
const IHOG * hog,
IDetectionWindowArray * detection_windows,
const Size2D & detection_window_stride,
float threshold = 0.0f,
uint16_t idx_class = 0
)

Initialise the kernel's input, HOG data-object, detection window, the stride of the detection window, the threshold and index of the object to detect.

Parameters
[in]   input                    Input tensor which stores the HOG descriptor obtained with NEHOGOrientationBinningKernel. Data type supported: F32. Number of channels supported: equal to the number of histogram bins per block
[in]   hog                      HOG data object used by NEHOGOrientationBinningKernel and NEHOGBlockNormalizationKernel
[out]  detection_windows        Array of DetectionWindow. This array stores all the detected objects
[in]   detection_window_stride  Distance in pixels between 2 consecutive detection windows in x and y directions. It must be a multiple of hog->info()->block_stride()
[in]   threshold                (Optional) Threshold for the distance between features and SVM classifying plane
[in]   idx_class                (Optional) Index of the class used for evaluating which class the detection window belongs to
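
The constraint on detection_window_stride can be illustrated with a small standalone sketch. It mirrors the divisibility check that configure() asserts and converts the pixel stride into a step measured in blocks. Size2DSketch and window_step_in_blocks are hypothetical names introduced for illustration only; they are not part of Compute Library.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for arm_compute::Size2D (illustration only).
struct Size2DSketch
{
    std::size_t width;
    std::size_t height;
};

// Hypothetical helper: converts a detection window stride given in pixels
// into a step measured in blocks. Returns false (leaving step untouched)
// when the stride is not a multiple of the block stride in both directions.
bool window_step_in_blocks(const Size2DSketch &window_stride, const Size2DSketch &block_stride, Size2DSketch &step)
{
    assert(block_stride.width != 0 && block_stride.height != 0);

    if((window_stride.width % block_stride.width) != 0 || (window_stride.height % block_stride.height) != 0)
    {
        return false;
    }

    step.width  = window_stride.width / block_stride.width;
    step.height = window_stride.height / block_stride.height;
    return true;
}
```

With a block stride of 8x8 pixels, a detection window stride of 16x8 is accepted and yields a step of 2 blocks in x and 1 block in y, whereas a stride of 12x8 is rejected.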

Definition at line 42 of file NEHOGDetectorKernel.cpp.

{
    ARM_COMPUTE_ERROR_ON(hog == nullptr);
    ARM_COMPUTE_ERROR_ON(detection_windows == nullptr);
    ARM_COMPUTE_ERROR_ON((detection_window_stride.width % hog->info()->block_stride().width) != 0);
    ARM_COMPUTE_ERROR_ON((detection_window_stride.height % hog->info()->block_stride().height) != 0);

    const Size2D &detection_window_size = hog->info()->detection_window_size();
    const Size2D &block_size = hog->info()->block_size();
    const Size2D &block_stride = hog->info()->block_stride();

    _input = input;
    _detection_windows = detection_windows;
    _threshold = threshold;
    _idx_class = idx_class;
    _hog_descriptor = hog->descriptor();
    _bias = _hog_descriptor[hog->info()->descriptor_size() - 1];
    _num_bins_per_descriptor_x = ((detection_window_size.width - block_size.width) / block_stride.width + 1) * input->info()->num_channels();
    _num_blocks_per_descriptor_y = (detection_window_size.height - block_size.height) / block_stride.height + 1;
    _block_stride_width = block_stride.width;
    _block_stride_height = block_stride.height;
    _detection_window_width = detection_window_size.width;
    _detection_window_height = detection_window_size.height;
    _max_num_detection_windows = detection_windows->max_num_values();

    ARM_COMPUTE_ERROR_ON((_num_bins_per_descriptor_x * _num_blocks_per_descriptor_y + 1) != hog->info()->descriptor_size());

    // Get the number of blocks along the x and y directions of the input tensor
    const ValidRegion &valid_region = input->info()->valid_region();
    const size_t num_blocks_x = valid_region.shape[0];
    const size_t num_blocks_y = valid_region.shape[1];

    // Get the number of blocks along the x and y directions of the detection window
    const size_t num_blocks_per_detection_window_x = detection_window_size.width / block_stride.width;
    const size_t num_blocks_per_detection_window_y = detection_window_size.height / block_stride.height;

    const size_t window_step_x = detection_window_stride.width / block_stride.width;
    const size_t window_step_y = detection_window_stride.height / block_stride.height;

    // Configure kernel window
    Window win;
    win.set(Window::DimX, Window::Dimension(0, floor_to_multiple(num_blocks_x - num_blocks_per_detection_window_x, window_step_x) + window_step_x, window_step_x));
    win.set(Window::DimY, Window::Dimension(0, floor_to_multiple(num_blocks_y - num_blocks_per_detection_window_y, window_step_y) + window_step_y, window_step_y));

    constexpr unsigned int num_elems_read_per_iteration = 1;
    const unsigned int num_rows_read_per_iteration = _num_blocks_per_descriptor_y;

    update_window_and_padding(win, AccessWindowRectangle(input->info(), 0, 0, num_elems_read_per_iteration, num_rows_read_per_iteration));

    INEKernel::configure(win);
}

References ARM_COMPUTE_ERROR_ON, ARM_COMPUTE_ERROR_ON_DATA_TYPE_NOT_IN, HOGInfo::block_size(), HOGInfo::block_stride(), IHOG::descriptor(), HOGInfo::descriptor_size(), HOGInfo::detection_window_size(), Window::DimX, Window::DimY, arm_compute::F32, arm_compute::floor_to_multiple(), Size2D::height, IHOG::info(), ITensor::info(), IArray< T >::max_num_values(), ITensorInfo::num_channels(), Window::set(), ValidRegion::shape, arm_compute::test::validation::reference::threshold(), arm_compute::update_window_and_padding(), arm_compute::test::validation::valid_region, ITensorInfo::valid_region(), and Size2D::width.
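
The size arithmetic performed in configure() can be checked by hand with a standalone sketch. The code below reproduces, in plain C++, the formulas for the number of bins per descriptor row, the number of block rows, the descriptor size including the trailing bias, and the exclusive end of the execution window along one dimension. HOGLayoutSketch and the free functions are hypothetical names introduced for illustration; the sample numbers in the note afterwards are assumptions, not values taken from the library.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical parameter bundle (illustration only, not a Compute Library type).
struct HOGLayoutSketch
{
    std::size_t window_w, window_h; // detection window size in pixels
    std::size_t block_w, block_h;   // block size in pixels
    std::size_t stride_w, stride_h; // block stride in pixels
    std::size_t num_channels;       // histogram bins per block (input tensor channels)
};

// Number of float values in one row of the descriptor.
std::size_t bins_per_descriptor_x(const HOGLayoutSketch &l)
{
    return ((l.window_w - l.block_w) / l.stride_w + 1) * l.num_channels;
}

// Number of block rows covered by one detection window.
std::size_t blocks_per_descriptor_y(const HOGLayoutSketch &l)
{
    return (l.window_h - l.block_h) / l.stride_h + 1;
}

// Descriptor size checked by configure(): all bins plus one bias term.
std::size_t expected_descriptor_size(const HOGLayoutSketch &l)
{
    return bins_per_descriptor_x(l) * blocks_per_descriptor_y(l) + 1;
}

// Exclusive end of the execution window along one dimension:
// floor_to_multiple(num_blocks - blocks_per_window, step) + step.
std::size_t window_end(std::size_t num_blocks, std::size_t blocks_per_window, std::size_t step)
{
    assert(step != 0 && num_blocks >= blocks_per_window);
    const std::size_t last_start = ((num_blocks - blocks_per_window) / step) * step;
    return last_start + step;
}
```

For an assumed 64x128 detection window with 16x16 blocks, an 8x8 block stride and 36 bins per block, this gives 252 values per row, 15 block rows and a descriptor size of 3781 (3780 coefficients plus the bias).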

◆ name()

const char* name ( ) const
inline override virtual

Name of the kernel.

Returns
Kernel name

Implements ICPPKernel.

Definition at line 40 of file NEHOGDetectorKernel.h.

    {
        return "NEHOGDetectorKernel";
    }

◆ operator=() [1/2]

NEHOGDetectorKernel& operator= ( const NEHOGDetectorKernel & )
delete

Prevent instances of this class from being copied (as this class contains pointers).

◆ operator=() [2/2]

NEHOGDetectorKernel& operator= ( NEHOGDetectorKernel &&  )
default

Allow instances of this class to be moved.

◆ run()

void run ( const Window & window,
const ThreadInfo & info
)
override virtual

Execute the kernel on the passed window.

Warning
If is_parallelisable() returns false then the passed window must be equal to window()
Note
The window has to be a region within the window returned by the window() method
The width of the window has to be a multiple of num_elems_processed_per_iteration().
Parameters
[in]   window  Region on which to execute the kernel. (Must be a region of the window returned by window())
[in]   info    Info about executing thread and CPU.

Implements ICPPKernel.

Definition at line 95 of file NEHOGDetectorKernel.cpp.

{
    ARM_COMPUTE_ERROR_ON(_hog_descriptor == nullptr);

    const size_t in_step_y = _input->info()->strides_in_bytes()[Window::DimY] / data_size_from_type(_input->info()->data_type());

    Iterator in(_input, window);

    execute_window_loop(window, [&](const Coordinates & id)
    {
        const auto *in_row_ptr = reinterpret_cast<const float *>(in.ptr());

        // Init score_f32 with 0
        float32x4_t score_f32 = vdupq_n_f32(0.0f);

        // Init score with bias
        float score = _bias;

        // Compute Linear SVM
        for(size_t yb = 0; yb < _num_blocks_per_descriptor_y; ++yb, in_row_ptr += in_step_y)
        {
            int32_t xb = 0;

            const int32_t offset_y = yb * _num_bins_per_descriptor_x;

            for(; xb < static_cast<int32_t>(_num_bins_per_descriptor_x) - 16; xb += 16)
            {
                // Load descriptor values
                const float32x4x4_t a_f32 =
                {
                    {
                        vld1q_f32(&in_row_ptr[xb + 0]),
                        vld1q_f32(&in_row_ptr[xb + 4]),
                        vld1q_f32(&in_row_ptr[xb + 8]),
                        vld1q_f32(&in_row_ptr[xb + 12])
                    }
                };

                // Load detector values
                const float32x4x4_t b_f32 =
                {
                    {
                        vld1q_f32(&_hog_descriptor[xb + 0 + offset_y]),
                        vld1q_f32(&_hog_descriptor[xb + 4 + offset_y]),
                        vld1q_f32(&_hog_descriptor[xb + 8 + offset_y]),
                        vld1q_f32(&_hog_descriptor[xb + 12 + offset_y])
                    }
                };

                // Multiply accumulate
                score_f32 = vmlaq_f32(score_f32, a_f32.val[0], b_f32.val[0]);
                score_f32 = vmlaq_f32(score_f32, a_f32.val[1], b_f32.val[1]);
                score_f32 = vmlaq_f32(score_f32, a_f32.val[2], b_f32.val[2]);
                score_f32 = vmlaq_f32(score_f32, a_f32.val[3], b_f32.val[3]);
            }

            for(; xb < static_cast<int32_t>(_num_bins_per_descriptor_x); ++xb)
            {
                const float a = in_row_ptr[xb];
                const float b = _hog_descriptor[xb + offset_y];

                score += a * b;
            }
        }

        score += vgetq_lane_f32(score_f32, 0);
        score += vgetq_lane_f32(score_f32, 1);
        score += vgetq_lane_f32(score_f32, 2);
        score += vgetq_lane_f32(score_f32, 3);

        if(score > _threshold)
        {
            if(_detection_windows->num_values() < _max_num_detection_windows)
            {
                DetectionWindow win;
                win.x = (id.x() * _block_stride_width);
                win.y = (id.y() * _block_stride_height);
                win.width = _detection_window_width;
                win.height = _detection_window_height;
                win.idx_class = _idx_class;
                win.score = score;

                std::unique_lock<arm_compute::Mutex> lock(_mutex);
                _detection_windows->push_back(win);
                lock.unlock();
            }
        }
    },
    in);
}

References ARM_COMPUTE_ERROR_ON, ARM_COMPUTE_ERROR_ON_INVALID_SUBWINDOW, ARM_COMPUTE_ERROR_ON_UNCONFIGURED_KERNEL, ARM_COMPUTE_UNUSED, arm_compute::test::validation::b, arm_compute::data_size_from_type(), ITensorInfo::data_type(), Window::DimY, arm_compute::execute_window_loop(), DetectionWindow::height, DetectionWindow::idx_class, ITensor::info(), arm_compute::test::validation::info, IArray< T >::num_values(), IArray< T >::push_back(), DetectionWindow::score, ITensorInfo::strides_in_bytes(), DetectionWindow::width, IKernel::window(), DetectionWindow::x, and DetectionWindow::y.
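
For readers without a NEON target at hand, the score computed by run() can be approximated by a scalar reference: the bias plus the dot product of the window's HOG features with the SVM coefficients, compared against the threshold. The sketch below is a simplified stand-in under stated assumptions, not the library's implementation. linear_svm_score and is_detection are hypothetical names, and row_stride plays the role of in_step_y (the number of floats between consecutive feature rows).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar reference of the per-window score (illustration only).
// features:   pointer to the first feature of the window; consecutive rows
//             are row_stride floats apart
// svm:        bins_x * blocks_y coefficients followed by one bias value
float linear_svm_score(const float *features, std::size_t row_stride, const std::vector<float> &svm, std::size_t bins_x, std::size_t blocks_y)
{
    assert(svm.size() == bins_x * blocks_y + 1);

    // The bias is stored as the last element of the descriptor
    float score = svm.back();

    for(std::size_t yb = 0; yb < blocks_y; ++yb)
    {
        const float      *row      = features + yb * row_stride;
        const std::size_t offset_y = yb * bins_x;

        for(std::size_t xb = 0; xb < bins_x; ++xb)
        {
            score += row[xb] * svm[xb + offset_y];
        }
    }
    return score;
}

// A window is reported only when its score is strictly above the threshold.
bool is_detection(float score, float threshold)
{
    return score > threshold;
}
```

Because floating-point addition is not associative, a scalar loop like this may differ from the vectorised kernel in the last bits of the score; comparisons between the two are best made with a small tolerance.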


The documentation for this class was generated from the following files: