Compute Library
 19.08
CPPBoxWithNonMaximaSuppressionLimitKernel Class Reference

CPP kernel to perform computation of BoxWithNonMaximaSuppressionLimit.

#include <CPPBoxWithNonMaximaSuppressionLimitKernel.h>

Public Member Functions

const char * name () const override
 Name of the kernel.

 CPPBoxWithNonMaximaSuppressionLimitKernel ()
 Default constructor.

 CPPBoxWithNonMaximaSuppressionLimitKernel (const CPPBoxWithNonMaximaSuppressionLimitKernel &)=delete
 Prevent instances of this class from being copied (As this class contains pointers)

CPPBoxWithNonMaximaSuppressionLimitKernel & operator= (const CPPBoxWithNonMaximaSuppressionLimitKernel &)=delete
 Prevent instances of this class from being copied (As this class contains pointers)

 CPPBoxWithNonMaximaSuppressionLimitKernel (CPPBoxWithNonMaximaSuppressionLimitKernel &&)=default
 Allow instances of this class to be moved.

CPPBoxWithNonMaximaSuppressionLimitKernel & operator= (CPPBoxWithNonMaximaSuppressionLimitKernel &&)=default
 Allow instances of this class to be moved.

void configure (const ITensor *scores_in, const ITensor *boxes_in, const ITensor *batch_splits_in, ITensor *scores_out, ITensor *boxes_out, ITensor *classes, ITensor *batch_splits_out=nullptr, ITensor *keeps=nullptr, ITensor *keeps_size=nullptr, const BoxNMSLimitInfo info=BoxNMSLimitInfo())
 Initialise the kernel's input and output tensors.

void run (const Window &window, const ThreadInfo &info) override
 Execute the kernel on the passed window.

bool is_parallelisable () const override
 Indicates whether or not the kernel is parallelisable.

template<typename T >
void run_nmslimit ()

Public Member Functions inherited from ICPPKernel
virtual ~ICPPKernel ()=default
 Default destructor.

Public Member Functions inherited from IKernel
 IKernel ()
 Constructor.

virtual ~IKernel ()=default
 Destructor.

virtual BorderSize border_size () const
 The size of the border for that kernel.

const Window & window () const
 The maximum window the kernel can be executed on.
 

Detailed Description

CPP kernel to perform computation of BoxWithNonMaximaSuppressionLimit.

Definition at line 37 of file CPPBoxWithNonMaximaSuppressionLimitKernel.h.

Constructor & Destructor Documentation

◆ CPPBoxWithNonMaximaSuppressionLimitKernel() [1/3]

Default constructor.

Definition at line 186 of file CPPBoxWithNonMaximaSuppressionLimitKernel.cpp.

    : _scores_in(nullptr), _boxes_in(nullptr), _batch_splits_in(nullptr), _scores_out(nullptr), _boxes_out(nullptr), _classes(nullptr), _batch_splits_out(nullptr), _keeps(nullptr), _keeps_size(nullptr),
      _info()
{
}

◆ CPPBoxWithNonMaximaSuppressionLimitKernel() [2/3]

Prevent instances of this class from being copied (As this class contains pointers)

◆ CPPBoxWithNonMaximaSuppressionLimitKernel() [3/3]

Allow instances of this class to be moved.
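
The deleted copy operations and defaulted move operations listed above make the kernel a move-only type. The following is a minimal standalone sketch of that pattern, not library code: `MoveOnlyKernel`, `set_input`, `input` and `move_preserves_pointer` are hypothetical names introduced for illustration. The standard type traits check the copy/move properties at compile time.

```cpp
#include <type_traits>
#include <utility>

// Hypothetical stand-in for a kernel that stores a non-owning tensor pointer.
class MoveOnlyKernel
{
public:
    MoveOnlyKernel() = default;
    // Copying is disabled, mirroring the =delete declarations above.
    MoveOnlyKernel(const MoveOnlyKernel &) = delete;
    MoveOnlyKernel &operator=(const MoveOnlyKernel &) = delete;
    // Moving is allowed, mirroring the =default declarations above.
    MoveOnlyKernel(MoveOnlyKernel &&) = default;
    MoveOnlyKernel &operator=(MoveOnlyKernel &&) = default;

    void set_input(const float *in) { _in = in; }
    const float *input() const { return _in; }

private:
    const float *_in{ nullptr };
};

static_assert(!std::is_copy_constructible<MoveOnlyKernel>::value, "copy construction must be disabled");
static_assert(!std::is_copy_assignable<MoveOnlyKernel>::value, "copy assignment must be disabled");
static_assert(std::is_move_constructible<MoveOnlyKernel>::value, "move construction must be enabled");
static_assert(std::is_move_assignable<MoveOnlyKernel>::value, "move assignment must be enabled");

// Returns true if the stored pointer value is carried over by a move.
inline bool move_preserves_pointer()
{
    static const float value = 1.0f;
    MoveOnlyKernel a;
    a.set_input(&value);
    MoveOnlyKernel b(std::move(a));
    return b.input() == &value;
}
```

Note that a defaulted move of a raw pointer member copies the pointer value; it does not reset the moved-from object.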

Member Function Documentation

◆ configure()

void configure ( const ITensor * scores_in,
const ITensor * boxes_in,
const ITensor * batch_splits_in,
ITensor * scores_out,
ITensor * boxes_out,
ITensor * classes,
ITensor * batch_splits_out = nullptr,
ITensor * keeps = nullptr,
ITensor * keeps_size = nullptr,
const BoxNMSLimitInfo info = BoxNMSLimitInfo()
)

Initialise the kernel's input and output tensors.

Parameters
[in] scores_in  The scores input tensor of size [num_classes, count]. Data types supported: F16/F32
[in] boxes_in  The boxes input tensor of size [num_classes * 4, count]. Data types supported: Same as scores_in
[in] batch_splits_in  The batch splits input tensor of size [batch_size]. Data types supported: Same as scores_in
Note
batch_splits_in can be a nullptr. If it is not a nullptr, scores_in and boxes_in have items from multiple images.
Parameters
[out] scores_out  The scores output tensor of size [N]. Data types supported: Same as scores_in
[out] boxes_out  The boxes output tensor of size [4, N]. Data types supported: Same as scores_in
[out] classes  The classes output tensor of size [N]. Data types supported: Same as scores_in
[out] batch_splits_out  (Optional) The batch splits output tensor of size [batch_size]. Data types supported: Same as scores_in
[out] keeps  (Optional) The keeps output tensor of size [N]. Data types supported: Same as scores_in
[out] keeps_size  (Optional) Number of filtered indices per class tensor of size [num_classes]. Data types supported: Same as scores_in
[in] info  (Optional) BoxNMSLimitInfo information.

Definition at line 349 of file CPPBoxWithNonMaximaSuppressionLimitKernel.cpp.

{
    ARM_COMPUTE_ERROR_ON_NULLPTR(scores_in, boxes_in, scores_out, boxes_out, classes);
    const unsigned int num_classes = scores_in->info()->dimension(0);

    ARM_COMPUTE_UNUSED(num_classes);
    ARM_COMPUTE_ERROR_ON_MSG((4 * num_classes) != boxes_in->info()->dimension(0), "First dimension of input boxes must be of size 4*num_classes");
    ARM_COMPUTE_ERROR_ON_MSG(scores_in->info()->dimension(1) != boxes_in->info()->dimension(1), "Input scores and input boxes must have the same number of rows");

    ARM_COMPUTE_ERROR_ON(scores_out->info()->dimension(0) != boxes_out->info()->dimension(1));
    ARM_COMPUTE_ERROR_ON(boxes_out->info()->dimension(0) != 4);
    if(keeps != nullptr)
    {
        ARM_COMPUTE_ERROR_ON_MSG(keeps_size == nullptr, "keeps_size cannot be nullptr if keeps has to be provided as output");
        ARM_COMPUTE_ERROR_ON(scores_out->info()->dimension(0) != keeps->info()->dimension(0));
        ARM_COMPUTE_ERROR_ON(num_classes != keeps_size->info()->dimension(0));
    }
    if(batch_splits_in != nullptr)
    {
        ARM_COMPUTE_ERROR_ON_MISMATCHING_DATA_TYPES(scores_in, batch_splits_in);
    }
    if(batch_splits_out != nullptr)
    {
        ARM_COMPUTE_ERROR_ON_MISMATCHING_DATA_TYPES(scores_in, batch_splits_out);
    }

    _scores_in = scores_in;
    _boxes_in = boxes_in;
    _batch_splits_in = batch_splits_in;
    _scores_out = scores_out;
    _boxes_out = boxes_out;
    _classes = classes;
    _batch_splits_out = batch_splits_out;
    _keeps = keeps;
    _keeps_size = keeps_size;
    _info = info;

    // Configure kernel window
    Window win = calculate_max_window(*scores_in->info(), Steps(scores_in->info()->dimension(0)));

    IKernel::configure(win);
}

References ARM_COMPUTE_ERROR_ON, ARM_COMPUTE_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN, ARM_COMPUTE_ERROR_ON_MISMATCHING_DATA_TYPES, ARM_COMPUTE_ERROR_ON_MSG, ARM_COMPUTE_ERROR_ON_NULLPTR, ARM_COMPUTE_UNUSED, arm_compute::calculate_max_window(), ITensorInfo::dimension(), arm_compute::F16, arm_compute::F32, ITensor::info(), arm_compute::test::validation::info, and arm_compute::U32.

Referenced by CLGenerateProposalsLayer::configure().
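
The shape relationships asserted in configure() can be summarised in a small standalone check. This is a sketch under stated assumptions, not library code: `Shape2D` and `check_box_nms_shapes` are hypothetical names, shapes are passed as plain integers rather than through ITensorInfo, and the messages for the two output checks are invented here because the listing asserts them without a message.

```cpp
#include <cstddef>
#include <string>

// Hypothetical 2D shape; dim0 corresponds to dimension(0) in the listing above.
struct Shape2D
{
    std::size_t dim0;
    std::size_t dim1;
};

// Returns an empty string if the shapes are consistent,
// otherwise a description of the first violated rule.
inline std::string check_box_nms_shapes(const Shape2D &scores_in, const Shape2D &boxes_in,
                                        std::size_t scores_out_len, const Shape2D &boxes_out)
{
    const std::size_t num_classes = scores_in.dim0;
    if(boxes_in.dim0 != 4 * num_classes)
    {
        return "First dimension of input boxes must be of size 4*num_classes";
    }
    if(scores_in.dim1 != boxes_in.dim1)
    {
        return "Input scores and input boxes must have the same number of rows";
    }
    if(scores_out_len != boxes_out.dim1)
    {
        return "scores_out length must match the second dimension of boxes_out"; // message is illustrative
    }
    if(boxes_out.dim0 != 4)
    {
        return "First dimension of boxes_out must be 4"; // message is illustrative
    }
    return "";
}
```

For example, 3 classes and 10 candidate rows give scores_in of [3, 10] and boxes_in of [12, 10]; with N = 5 outputs, scores_out has length 5 and boxes_out is [4, 5].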

◆ is_parallelisable()

bool is_parallelisable ( ) const
override virtual

Indicates whether or not the kernel is parallelisable.

If the kernel is parallelisable then the window returned by window() can be split into sub-windows which can then be run in parallel.

If the kernel is not parallelisable then only the window returned by window() can be passed to run().

Returns
True if the kernel is parallelisable

Reimplemented from IKernel.

Definition at line 192 of file CPPBoxWithNonMaximaSuppressionLimitKernel.cpp.

{
    return false;
}
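
To illustrate what the flag means for a caller, here is a minimal sketch of how a scheduler might plan work depending on is_parallelisable(). It is not the library's scheduler: `Range` and `plan_windows` are hypothetical, and a window is reduced to a one-dimensional half-open range for brevity.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical 1D window: half-open range [first, second).
using Range = std::pair<std::size_t, std::size_t>;

// Returns the sub-windows that could be handed to worker threads.
// A non-parallelisable kernel always receives the full window unchanged.
inline std::vector<Range> plan_windows(Range full, bool parallelisable, std::size_t num_threads)
{
    if(!parallelisable || num_threads <= 1)
    {
        return std::vector<Range>(1, full);
    }

    std::vector<Range> parts;
    const std::size_t total = full.second - full.first;
    const std::size_t chunk = (total + num_threads - 1) / num_threads; // ceiling division
    for(std::size_t start = full.first; start < full.second; start += chunk)
    {
        parts.emplace_back(start, std::min(start + chunk, full.second));
    }
    return parts;
}
```

Because this kernel returns false, such a caller would pass the whole window to a single run() invocation.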

◆ name()

const char* name ( ) const
inline override virtual

Name of the kernel.

Returns
Kernel name

Implements ICPPKernel.

Definition at line 40 of file CPPBoxWithNonMaximaSuppressionLimitKernel.h.

    {
        return "CPPBoxWithNonMaximaSuppressionLimitKernel";
    }

◆ operator=() [1/2]

Prevent instances of this class from being copied (As this class contains pointers)

◆ operator=() [2/2]

Allow instances of this class to be moved.

◆ run()

void run ( const Window & window,
const ThreadInfo & info
)
override virtual

Execute the kernel on the passed window.

Warning
If is_parallelisable() returns false then the passed window must be equal to window().
Note
The window has to be a region within the window returned by the window() method.
The width of the window has to be a multiple of num_elems_processed_per_iteration().
Parameters
[in] window  Region on which to execute the kernel. (Must be a region of the window returned by window())
[in] info  Info about executing thread and CPU.

Implements ICPPKernel.

Definition at line 396 of file CPPBoxWithNonMaximaSuppressionLimitKernel.cpp.

{
    switch(_scores_in->info()->data_type())
    {
        case DataType::F32:
            run_nmslimit<float>();
            break;
        case DataType::F16:
            run_nmslimit<half>();
            break;
        default:
            ARM_COMPUTE_ERROR("Not supported");
    }
}

References ARM_COMPUTE_ERROR, ARM_COMPUTE_ERROR_ON_MISMATCHING_WINDOWS, ARM_COMPUTE_ERROR_ON_UNCONFIGURED_KERNEL, ARM_COMPUTE_UNUSED, ITensorInfo::data_type(), arm_compute::F16, arm_compute::F32, ITensor::info(), arm_compute::test::validation::info, and IKernel::window().
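
The run() listing selects a template instantiation from a runtime data-type tag. A standalone sketch of that dispatch pattern follows; `ElemType`, `Half16`, `run_typed` and `dispatch` are hypothetical stand-ins (the library uses DataType, half and run_nmslimit), and the unsupported case throws std::runtime_error, as the documentation of ARM_COMPUTE_ERROR describes.

```cpp
#include <cstddef>
#include <stdexcept>

// Hypothetical runtime tag standing in for the library's DataType.
enum class ElemType
{
    F16,
    F32,
    U8
};

// Hypothetical 16-bit storage type, used only so the two branches differ.
struct Half16
{
    unsigned short bits;
};

// Typed worker: here it only reports the element size it was instantiated with.
template <typename T>
std::size_t run_typed()
{
    return sizeof(T);
}

// Maps the runtime tag to a compile-time type, rejecting unsupported tags.
inline std::size_t dispatch(ElemType type)
{
    switch(type)
    {
        case ElemType::F32:
            return run_typed<float>();
        case ElemType::F16:
            return run_typed<Half16>();
        default:
            throw std::runtime_error("Not supported");
    }
}
```

Keeping the switch in one place means the typed worker can be written once as a template and never needs to inspect the tag itself.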

◆ run_nmslimit()

template<typename T >
void run_nmslimit ( )

Definition at line 198 of file CPPBoxWithNonMaximaSuppressionLimitKernel.cpp.

{
    const int batch_size = _batch_splits_in == nullptr ? 1 : _batch_splits_in->info()->dimension(0);
    const int num_classes = _scores_in->info()->dimension(0);
    const int scores_count = _scores_in->info()->dimension(1);
    std::vector<int> total_keep_per_batch(batch_size);
    std::vector<std::vector<int>> keeps(num_classes);
    int total_keep_count = 0;

    std::vector<std::vector<T>> in_scores(num_classes, std::vector<T>(scores_count));
    for(int i = 0; i < scores_count; ++i)
    {
        for(int j = 0; j < num_classes; ++j)
        {
            in_scores[j][i] = *reinterpret_cast<const T *>(_scores_in->ptr_to_element(Coordinates(j, i)));
        }
    }

    int offset = 0;
    int cur_start_idx = 0;
    for(int b = 0; b < batch_size; ++b)
    {
        const int num_boxes = _batch_splits_in == nullptr ? 1 : static_cast<int>(*reinterpret_cast<T *>(_batch_splits_in->ptr_to_element(Coordinates(b))));
        // Skip first class if there is more than 1 except if the number of classes is 1.
        const int j_start = (num_classes == 1 ? 0 : 1);
        for(int j = j_start; j < num_classes; ++j)
        {
            std::vector<T> cur_scores(scores_count);
            std::vector<int> inds;
            for(int i = 0; i < scores_count; ++i)
            {
                const T score = in_scores[j][i];
                cur_scores[i] = score;

                if(score > _info.score_thresh())
                {
                    inds.push_back(i);
                }
            }
            if(_info.soft_nms_enabled())
            {
                keeps[j] = SoftNMS(_boxes_in, in_scores, inds, _info, j);
            }
            else
            {
                std::sort(inds.data(), inds.data() + inds.size(),
                          [&cur_scores](int lhs, int rhs)
                {
                    return cur_scores[lhs] > cur_scores[rhs];
                });

                keeps[j] = NonMaximaSuppression<T>(_boxes_in, inds, _info, j);
            }
            total_keep_count += keeps[j].size();
        }

        if(_info.detections_per_im() > 0 && total_keep_count > _info.detections_per_im())
        {
            // merge all scores (represented by indices) together and sort
            auto get_all_scores_sorted = [&in_scores, &keeps, total_keep_count]()
            {
                std::vector<T> ret(total_keep_count);

                int ret_idx = 0;
                for(unsigned int i = 1; i < keeps.size(); ++i)
                {
                    auto &cur_keep = keeps[i];
                    for(auto &ckv : cur_keep)
                    {
                        ret[ret_idx++] = in_scores[i][ckv];
                    }
                }

                std::sort(ret.data(), ret.data() + ret.size());

                return ret;
            };

            auto all_scores_sorted = get_all_scores_sorted();
            const T image_thresh = all_scores_sorted[all_scores_sorted.size() - _info.detections_per_im()];
            for(int j = 1; j < num_classes; ++j)
            {
                auto &cur_keep = keeps[j];
                std::vector<int> new_keeps_j;
                for(auto &k : cur_keep)
                {
                    if(in_scores[j][k] >= image_thresh)
                    {
                        new_keeps_j.push_back(k);
                    }
                }
                keeps[j] = new_keeps_j;
            }
            total_keep_count = _info.detections_per_im();
        }

        total_keep_per_batch[b] = total_keep_count;

        // Write results
        int cur_out_idx = 0;
        for(int j = j_start; j < num_classes; ++j)
        {
            auto &cur_keep = keeps[j];
            auto cur_out_scores = reinterpret_cast<T *>(_scores_out->ptr_to_element(Coordinates(cur_start_idx + cur_out_idx)));
            auto cur_out_classes = reinterpret_cast<T *>(_classes->ptr_to_element(Coordinates(cur_start_idx + cur_out_idx)));
            const int box_column = (cur_start_idx + cur_out_idx) * 4;

            for(unsigned int k = 0; k < cur_keep.size(); ++k)
            {
                cur_out_scores[k] = in_scores[j][cur_keep[k]];
                cur_out_classes[k] = static_cast<T>(j);
                auto cur_out_box_row0 = reinterpret_cast<T *>(_boxes_out->ptr_to_element(Coordinates(box_column + 0, k)));
                auto cur_out_box_row1 = reinterpret_cast<T *>(_boxes_out->ptr_to_element(Coordinates(box_column + 1, k)));
                auto cur_out_box_row2 = reinterpret_cast<T *>(_boxes_out->ptr_to_element(Coordinates(box_column + 2, k)));
                auto cur_out_box_row3 = reinterpret_cast<T *>(_boxes_out->ptr_to_element(Coordinates(box_column + 3, k)));
                *cur_out_box_row0 = *reinterpret_cast<const T *>(_boxes_in->ptr_to_element(Coordinates(j * 4 + 0, cur_keep[k])));
                *cur_out_box_row1 = *reinterpret_cast<const T *>(_boxes_in->ptr_to_element(Coordinates(j * 4 + 1, cur_keep[k])));
                *cur_out_box_row2 = *reinterpret_cast<const T *>(_boxes_in->ptr_to_element(Coordinates(j * 4 + 2, cur_keep[k])));
                *cur_out_box_row3 = *reinterpret_cast<const T *>(_boxes_in->ptr_to_element(Coordinates(j * 4 + 3, cur_keep[k])));
            }

            cur_out_idx += cur_keep.size();
        }

        if(_keeps != nullptr)
        {
            cur_out_idx = 0;
            for(int j = 0; j < num_classes; ++j)
            {
                for(unsigned int i = 0; i < keeps[j].size(); ++i)
                {
                    *reinterpret_cast<T *>(_keeps->ptr_to_element(Coordinates(cur_start_idx + cur_out_idx + i))) = static_cast<T>(keeps[j].at(i));
                }
                *reinterpret_cast<uint32_t *>(_keeps_size->ptr_to_element(Coordinates(j + b * num_classes))) = keeps[j].size();
                cur_out_idx += keeps[j].size();
            }
        }

        offset += num_boxes;
        cur_start_idx += total_keep_count;
    }

    if(_batch_splits_out != nullptr)
    {
        for(int b = 0; b < batch_size; ++b)
        {
            *reinterpret_cast<float *>(_batch_splits_out->ptr_to_element(Coordinates(b))) = total_keep_per_batch[b];
        }
    }
}

References arm_compute::test::validation::b, BoxNMSLimitInfo::detections_per_im(), ITensorInfo::dimension(), ITensor::info(), offset(), ITensor::ptr_to_element(), BoxNMSLimitInfo::score_thresh(), and BoxNMSLimitInfo::soft_nms_enabled().
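
The detections_per_im() step in the listing caps the number of detections by deriving a score threshold from the sorted kept scores. Below is a simplified standalone sketch of that idea, not the library's implementation: `limit_detections` is a hypothetical helper, scores are plain `float` vectors instead of tensors, and, unlike the listing, it considers every class starting from index 0 rather than skipping the first one.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// scores[c][i] is the score of candidate i for class c.
// keeps[c] holds the candidate indices that survived suppression for class c.
// If more than max_detections candidates are kept in total, drop those whose
// score falls below the max_detections-th highest kept score.
inline void limit_detections(const std::vector<std::vector<float>> &scores,
                             std::vector<std::vector<int>> &keeps,
                             std::size_t max_detections)
{
    // Gather the scores of all kept candidates across classes.
    std::vector<float> kept_scores;
    for(std::size_t c = 0; c < keeps.size(); ++c)
    {
        for(int idx : keeps[c])
        {
            kept_scores.push_back(scores[c][idx]);
        }
    }

    // A limit of 0 means "no limit", matching the > 0 test in the listing.
    if(max_detections == 0 || kept_scores.size() <= max_detections)
    {
        return;
    }

    // Ascending sort: the threshold is the max_detections-th score from the top.
    std::sort(kept_scores.begin(), kept_scores.end());
    const float thresh = kept_scores[kept_scores.size() - max_detections];

    for(std::size_t c = 0; c < keeps.size(); ++c)
    {
        std::vector<int> filtered;
        for(int idx : keeps[c])
        {
            if(scores[c][idx] >= thresh)
            {
                filtered.push_back(idx);
            }
        }
        keeps[c] = filtered;
    }
}
```

Because the comparison is `>=`, candidates whose scores tie with the threshold are all retained, so the filtered total can exceed the limit when ties occur.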


The documentation for this class was generated from the following files: