Compute Library
 22.11
CPPBoxWithNonMaximaSuppressionLimitKernel Class Reference

CPP kernel to perform computation of BoxWithNonMaximaSuppressionLimit. More...

#include <CPPBoxWithNonMaximaSuppressionLimitKernel.h>

Collaboration diagram for CPPBoxWithNonMaximaSuppressionLimitKernel:

Public Member Functions

const char * name () const override
 Name of the kernel. More...
 
 CPPBoxWithNonMaximaSuppressionLimitKernel ()
 Default constructor. More...
 
 CPPBoxWithNonMaximaSuppressionLimitKernel (const CPPBoxWithNonMaximaSuppressionLimitKernel &)=delete
 Prevent instances of this class from being copied (As this class contains pointers) More...
 
CPPBoxWithNonMaximaSuppressionLimitKernel & operator= (const CPPBoxWithNonMaximaSuppressionLimitKernel &)=delete
 Prevent instances of this class from being copied (As this class contains pointers) More...
 
 CPPBoxWithNonMaximaSuppressionLimitKernel (CPPBoxWithNonMaximaSuppressionLimitKernel &&)=default
 Allow instances of this class to be moved. More...
 
CPPBoxWithNonMaximaSuppressionLimitKernel & operator= (CPPBoxWithNonMaximaSuppressionLimitKernel &&)=default
 Allow instances of this class to be moved. More...
 
void configure (const ITensor *scores_in, const ITensor *boxes_in, const ITensor *batch_splits_in, ITensor *scores_out, ITensor *boxes_out, ITensor *classes, ITensor *batch_splits_out=nullptr, ITensor *keeps=nullptr, ITensor *keeps_size=nullptr, const BoxNMSLimitInfo info=BoxNMSLimitInfo())
 Initialise the kernel's input and output tensors. More...
 
void run (const Window &window, const ThreadInfo &info) override
 Execute the kernel on the passed window. More...
 
bool is_parallelisable () const override
 Indicates whether or not the kernel is parallelisable. More...
 
template<typename T >
void run_nmslimit ()
 
- Public Member Functions inherited from ICPPKernel
virtual ~ICPPKernel ()=default
 Default destructor. More...
 
virtual void run_nd (const Window &window, const ThreadInfo &info, const Window &thread_locator)
 Legacy compatibility layer for implementations that do not support thread_locator. In these cases we simply narrow the interface down to the legacy version. More...
 
virtual void run_op (ITensorPack &tensors, const Window &window, const ThreadInfo &info)
 Execute the kernel on the passed window. More...
 
virtual size_t get_mws (const CPUInfo &platform, size_t thread_count) const
 Return minimum workload size of the relevant kernel. More...
 
- Public Member Functions inherited from IKernel
 IKernel ()
 Constructor. More...
 
virtual ~IKernel ()=default
 Destructor. More...
 
virtual BorderSize border_size () const
 The size of the border for that kernel. More...
 
const Window & window () const
 The maximum window the kernel can be executed on. More...
 
bool is_window_configured () const
 Function to check if the embedded window of this kernel has been configured. More...
 

Additional Inherited Members

- Static Public Attributes inherited from ICPPKernel
static constexpr size_t default_mws = 1
 

Detailed Description

CPP kernel to perform computation of BoxWithNonMaximaSuppressionLimit.

Definition at line 35 of file CPPBoxWithNonMaximaSuppressionLimitKernel.h.

Constructor & Destructor Documentation

◆ CPPBoxWithNonMaximaSuppressionLimitKernel() [1/3]

Default constructor.

Definition at line 186 of file CPPBoxWithNonMaximaSuppressionLimitKernel.cpp.

Referenced by CPPBoxWithNonMaximaSuppressionLimitKernel::name().

187  : _scores_in(nullptr), _boxes_in(nullptr), _batch_splits_in(nullptr), _scores_out(nullptr), _boxes_out(nullptr), _classes(nullptr), _batch_splits_out(nullptr), _keeps(nullptr), _keeps_size(nullptr),
188  _info()
189 {
190 }

◆ CPPBoxWithNonMaximaSuppressionLimitKernel() [2/3]

Prevent instances of this class from being copied (As this class contains pointers)

◆ CPPBoxWithNonMaximaSuppressionLimitKernel() [3/3]

Allow instances of this class to be moved.

Member Function Documentation

◆ configure()

void configure ( const ITensor * scores_in,
const ITensor * boxes_in,
const ITensor * batch_splits_in,
ITensor * scores_out,
ITensor * boxes_out,
ITensor * classes,
ITensor * batch_splits_out = nullptr,
ITensor * keeps = nullptr,
ITensor * keeps_size = nullptr,
const BoxNMSLimitInfo  info = BoxNMSLimitInfo() 
)

Initialise the kernel's input and output tensors.

Parameters
[in]  scores_in        The scores input tensor of size [num_classes, count]. Data types supported: F16/F32
[in]  boxes_in         The boxes input tensor of size [num_classes * 4, count]. Data types supported: Same as scores_in
[in]  batch_splits_in  The batch splits input tensor of size [batch_size]. Data types supported: Same as scores_in
Note
Can be a nullptr. If not a nullptr, scores_in and boxes_in have items from multiple images.
Parameters
[out] scores_out       The scores output tensor of size [N]. Data types supported: Same as scores_in
[out] boxes_out        The boxes output tensor of size [4, N]. Data types supported: Same as scores_in
[out] classes          The classes output tensor of size [N]. Data types supported: Same as scores_in
[out] batch_splits_out (Optional) The batch splits output tensor of size [batch_size]. Data types supported: Same as scores_in
[out] keeps            (Optional) The keeps output tensor of size [N]. Data types supported: Same as scores_in
[out] keeps_size       (Optional) Number of filtered indices per class tensor of size [num_classes]. Data types supported: U32
[in]  info             (Optional) BoxNMSLimitInfo information.

Definition at line 346 of file CPPBoxWithNonMaximaSuppressionLimitKernel.cpp.

References ARM_COMPUTE_ERROR_ON, ARM_COMPUTE_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN, ARM_COMPUTE_ERROR_ON_MISMATCHING_DATA_TYPES, ARM_COMPUTE_ERROR_ON_MSG, ARM_COMPUTE_ERROR_ON_NULLPTR, ARM_COMPUTE_UNUSED, arm_compute::calculate_max_window(), ITensorInfo::dimension(), arm_compute::F16, arm_compute::F32, ITensor::info(), arm_compute::test::validation::info, and arm_compute::U32.

Referenced by CPPBoxWithNonMaximaSuppressionLimit::configure(), and CPPBoxWithNonMaximaSuppressionLimitKernel::name().

348 {
349  ARM_COMPUTE_ERROR_ON_NULLPTR(scores_in, boxes_in, scores_out, boxes_out, classes);
351  ARM_COMPUTE_ERROR_ON_MISMATCHING_DATA_TYPES(scores_in, boxes_in, scores_out);
352  const unsigned int num_classes = scores_in->info()->dimension(0);
353 
354  ARM_COMPUTE_UNUSED(num_classes);
355  ARM_COMPUTE_ERROR_ON_MSG((4 * num_classes) != boxes_in->info()->dimension(0), "First dimension of input boxes must be of size 4*num_classes");
356  ARM_COMPUTE_ERROR_ON_MSG(scores_in->info()->dimension(1) != boxes_in->info()->dimension(1), "Input scores and input boxes must have the same number of rows");
357 
358  ARM_COMPUTE_ERROR_ON(scores_out->info()->dimension(0) != boxes_out->info()->dimension(1));
359  ARM_COMPUTE_ERROR_ON(boxes_out->info()->dimension(0) != 4);
360  ARM_COMPUTE_ERROR_ON(scores_out->info()->dimension(0) != classes->info()->dimension(0));
361  if(keeps != nullptr)
362  {
363  ARM_COMPUTE_ERROR_ON_MSG(keeps_size == nullptr, "keeps_size cannot be nullptr if keeps has to be provided as output");
366  ARM_COMPUTE_ERROR_ON(scores_out->info()->dimension(0) != keeps->info()->dimension(0));
367  ARM_COMPUTE_ERROR_ON(num_classes != keeps_size->info()->dimension(0));
368  }
369  if(batch_splits_in != nullptr)
370  {
371  ARM_COMPUTE_ERROR_ON_MISMATCHING_DATA_TYPES(scores_in, batch_splits_in);
372  }
373  if(batch_splits_out != nullptr)
374  {
375  ARM_COMPUTE_ERROR_ON_MISMATCHING_DATA_TYPES(scores_in, batch_splits_out);
376  }
377 
378  _scores_in = scores_in;
379  _boxes_in = boxes_in;
380  _batch_splits_in = batch_splits_in;
381  _scores_out = scores_out;
382  _boxes_out = boxes_out;
383  _classes = classes;
384  _batch_splits_out = batch_splits_out;
385  _keeps = keeps;
386  _keeps_size = keeps_size;
387  _info = info;
388 
389  // Configure kernel window
390  Window win = calculate_max_window(*scores_in->info(), Steps(scores_in->info()->dimension(0)));
391 
392  IKernel::configure(win);
393 }
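
The tensor-size contract that configure() enforces with its ARM_COMPUTE_ERROR_ON* checks can be sketched in plain C++. Shape2D and shapes_are_valid below are illustrative helpers, not part of the Compute Library API:

```cpp
#include <cassert>
#include <cstddef>

// Illustrative sketch (not Compute Library API): the tensor-size contract
// that configure() enforces with its ARM_COMPUTE_ERROR_ON* checks.
struct Shape2D
{
    std::size_t dim0; // fastest-changing dimension
    std::size_t dim1;
};

// scores_in: [num_classes, count], boxes_in: [num_classes * 4, count],
// scores_out/classes: [N], boxes_out: [4, N]
bool shapes_are_valid(const Shape2D &scores_in, const Shape2D &boxes_in,
                      std::size_t scores_out_len, const Shape2D &boxes_out,
                      std::size_t classes_len)
{
    const std::size_t num_classes = scores_in.dim0;
    if(boxes_in.dim0 != 4 * num_classes) return false; // 4 box coordinates per class
    if(boxes_in.dim1 != scores_in.dim1) return false;  // same number of rows as scores
    if(boxes_out.dim0 != 4) return false;              // one 4-coordinate box per detection
    if(boxes_out.dim1 != scores_out_len) return false; // N detections out
    if(classes_len != scores_out_len) return false;    // one class label per detection
    return true;
}
```

For example, with 81 classes and 100 candidate boxes, a valid call pairs scores of shape [81, 100] with boxes of shape [324, 100].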

◆ is_parallelisable()

bool is_parallelisable ( ) const
overridevirtual

Indicates whether or not the kernel is parallelisable.

If the kernel is parallelisable then the window returned by window() can be split into sub-windows which can then be run in parallel.

If the kernel is not parallelisable then only the window returned by window() can be passed to run()

Returns
True if the kernel is parallelisable

Reimplemented from IKernel.

Definition at line 192 of file CPPBoxWithNonMaximaSuppressionLimitKernel.cpp.

Referenced by CPPBoxWithNonMaximaSuppressionLimitKernel::name().

193 {
194  return false;
195 }

◆ name()

◆ operator=() [1/2]

Prevent instances of this class from being copied (As this class contains pointers)

Referenced by CPPBoxWithNonMaximaSuppressionLimitKernel::name().

◆ operator=() [2/2]

Allow instances of this class to be moved.

◆ run()

void run ( const Window & window,
const ThreadInfo & info 
)
overridevirtual

Execute the kernel on the passed window.

Warning
If is_parallelisable() returns false then the passed window must be equal to window()
Note
The window has to be a region within the window returned by the window() method
The width of the window has to be a multiple of num_elems_processed_per_iteration().
Parameters
[in]windowRegion on which to execute the kernel. (Must be a region of the window returned by window())
[in]infoInfo about executing thread and CPU.

Reimplemented from ICPPKernel.

Definition at line 395 of file CPPBoxWithNonMaximaSuppressionLimitKernel.cpp.

References ARM_COMPUTE_ERROR, ARM_COMPUTE_ERROR_ON_MISMATCHING_WINDOWS, ARM_COMPUTE_ERROR_ON_UNCONFIGURED_KERNEL, ARM_COMPUTE_UNUSED, ITensorInfo::data_type(), arm_compute::F16, arm_compute::F32, ITensor::info(), and IKernel::window().

Referenced by CPPBoxWithNonMaximaSuppressionLimitKernel::name().

396 {
401 
402  switch(_scores_in->info()->data_type())
403  {
404  case DataType::F32:
405  run_nmslimit<float>();
406  break;
407  case DataType::F16:
408  run_nmslimit<half>();
409  break;
410  default:
411  ARM_COMPUTE_ERROR("Not supported");
412  }
413 }
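
The body of run() is a type dispatch: a switch on the tensor's runtime data type selects the templated worker run_nmslimit<T>(). A minimal self-contained sketch of the same pattern (DataType, half16 and run_worker here are stand-ins, not Compute Library types):

```cpp
#include <cstdint>
#include <string>

// Stand-in types (not Compute Library types) showing the dispatch pattern
// run() uses: switch on the runtime data type, call the templated worker.
enum class DataType { F32, F16 };

struct half16 { std::uint16_t bits; }; // placeholder for a 16-bit float type

template <typename T>
std::string run_worker()
{
    // The real kernel would process the tensors element-wise as T here.
    return sizeof(T) == 4 ? "ran as float" : "ran as half";
}

std::string dispatch(DataType dt)
{
    switch(dt)
    {
        case DataType::F32: return run_worker<float>();
        case DataType::F16: return run_worker<half16>();
        default:            return "Not supported";
    }
}
```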

◆ run_nmslimit()

void run_nmslimit ( )

Definition at line 198 of file CPPBoxWithNonMaximaSuppressionLimitKernel.cpp.

References arm_compute::test::validation::b, BoxNMSLimitInfo::detections_per_im(), ITensorInfo::dimension(), ITensor::info(), arm_compute::test::validation::k, ITensor::ptr_to_element(), BoxNMSLimitInfo::score_thresh(), and BoxNMSLimitInfo::soft_nms_enabled().

Referenced by CPPBoxWithNonMaximaSuppressionLimitKernel::name().

199 {
200  const int batch_size = _batch_splits_in == nullptr ? 1 : _batch_splits_in->info()->dimension(0);
201  const int num_classes = _scores_in->info()->dimension(0);
202  const int scores_count = _scores_in->info()->dimension(1);
203  std::vector<int> total_keep_per_batch(batch_size);
204  std::vector<std::vector<int>> keeps(num_classes);
205  int total_keep_count = 0;
206 
207  std::vector<std::vector<T>> in_scores(num_classes, std::vector<T>(scores_count));
208  for(int i = 0; i < scores_count; ++i)
209  {
210  for(int j = 0; j < num_classes; ++j)
211  {
212  in_scores[j][i] = *reinterpret_cast<const T *>(_scores_in->ptr_to_element(Coordinates(j, i)));
213  }
214  }
215 
216  int cur_start_idx = 0;
217  for(int b = 0; b < batch_size; ++b)
218  {
219  // Skip the first class (background) unless there is only one class.
220  const int j_start = (num_classes == 1 ? 0 : 1);
221  for(int j = j_start; j < num_classes; ++j)
222  {
223  std::vector<T> cur_scores(scores_count);
224  std::vector<int> inds;
225  for(int i = 0; i < scores_count; ++i)
226  {
227  const T score = in_scores[j][i];
228  cur_scores[i] = score;
229 
230  if(score > _info.score_thresh())
231  {
232  inds.push_back(i);
233  }
234  }
235  if(_info.soft_nms_enabled())
236  {
237  keeps[j] = SoftNMS(_boxes_in, in_scores, inds, _info, j);
238  }
239  else
240  {
241  std::sort(inds.data(), inds.data() + inds.size(),
242  [&cur_scores](int lhs, int rhs)
243  {
244  return cur_scores[lhs] > cur_scores[rhs];
245  });
246 
247  keeps[j] = NonMaximaSuppression<T>(_boxes_in, inds, _info, j);
248  }
249  total_keep_count += keeps[j].size();
250  }
251 
252  if(_info.detections_per_im() > 0 && total_keep_count > _info.detections_per_im())
253  {
254  // merge all scores (represented by indices) together and sort
255  auto get_all_scores_sorted = [&in_scores, &keeps, total_keep_count]()
256  {
257  std::vector<T> ret(total_keep_count);
258 
259  int ret_idx = 0;
260  for(unsigned int i = 1; i < keeps.size(); ++i)
261  {
262  auto &cur_keep = keeps[i];
263  for(auto &ckv : cur_keep)
264  {
265  ret[ret_idx++] = in_scores[i][ckv];
266  }
267  }
268 
269  std::sort(ret.data(), ret.data() + ret.size());
270 
271  return ret;
272  };
273 
274  auto all_scores_sorted = get_all_scores_sorted();
275  const T image_thresh = all_scores_sorted[all_scores_sorted.size() - _info.detections_per_im()];
276  for(int j = 1; j < num_classes; ++j)
277  {
278  auto &cur_keep = keeps[j];
279  std::vector<int> new_keeps_j;
280  for(auto &k : cur_keep)
281  {
282  if(in_scores[j][k] >= image_thresh)
283  {
284  new_keeps_j.push_back(k);
285  }
286  }
287  keeps[j] = new_keeps_j;
288  }
289  total_keep_count = _info.detections_per_im();
290  }
291 
292  total_keep_per_batch[b] = total_keep_count;
293 
294  // Write results
295  int cur_out_idx = 0;
296  for(int j = j_start; j < num_classes; ++j)
297  {
298  auto &cur_keep = keeps[j];
299  auto cur_out_scores = reinterpret_cast<T *>(_scores_out->ptr_to_element(Coordinates(cur_start_idx + cur_out_idx)));
300  auto cur_out_classes = reinterpret_cast<T *>(_classes->ptr_to_element(Coordinates(cur_start_idx + cur_out_idx)));
301  const int box_column = (cur_start_idx + cur_out_idx) * 4;
302 
303  for(unsigned int k = 0; k < cur_keep.size(); ++k)
304  {
305  cur_out_scores[k] = in_scores[j][cur_keep[k]];
306  cur_out_classes[k] = static_cast<T>(j);
307  auto cur_out_box_row0 = reinterpret_cast<T *>(_boxes_out->ptr_to_element(Coordinates(box_column + 0, k)));
308  auto cur_out_box_row1 = reinterpret_cast<T *>(_boxes_out->ptr_to_element(Coordinates(box_column + 1, k)));
309  auto cur_out_box_row2 = reinterpret_cast<T *>(_boxes_out->ptr_to_element(Coordinates(box_column + 2, k)));
310  auto cur_out_box_row3 = reinterpret_cast<T *>(_boxes_out->ptr_to_element(Coordinates(box_column + 3, k)));
311  *cur_out_box_row0 = *reinterpret_cast<const T *>(_boxes_in->ptr_to_element(Coordinates(j * 4 + 0, cur_keep[k])));
312  *cur_out_box_row1 = *reinterpret_cast<const T *>(_boxes_in->ptr_to_element(Coordinates(j * 4 + 1, cur_keep[k])));
313  *cur_out_box_row2 = *reinterpret_cast<const T *>(_boxes_in->ptr_to_element(Coordinates(j * 4 + 2, cur_keep[k])));
314  *cur_out_box_row3 = *reinterpret_cast<const T *>(_boxes_in->ptr_to_element(Coordinates(j * 4 + 3, cur_keep[k])));
315  }
316 
317  cur_out_idx += cur_keep.size();
318  }
319 
320  if(_keeps != nullptr)
321  {
322  cur_out_idx = 0;
323  for(int j = 0; j < num_classes; ++j)
324  {
325  for(unsigned int i = 0; i < keeps[j].size(); ++i)
326  {
327  *reinterpret_cast<T *>(_keeps->ptr_to_element(Coordinates(cur_start_idx + cur_out_idx + i))) = static_cast<T>(keeps[j].at(i));
328  }
329  *reinterpret_cast<uint32_t *>(_keeps_size->ptr_to_element(Coordinates(j + b * num_classes))) = keeps[j].size();
330  cur_out_idx += keeps[j].size();
331  }
332  }
333 
334  cur_start_idx += total_keep_count;
335  }
336 
337  if(_batch_splits_out != nullptr)
338  {
339  for(int b = 0; b < batch_size; ++b)
340  {
341  *reinterpret_cast<float *>(_batch_splits_out->ptr_to_element(Coordinates(b))) = total_keep_per_batch[b];
342  }
343  }
344 }
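
The scalar selection logic of run_nmslimit() — score thresholding, per-class sort by descending score, and the detections_per_im cap via a global k-th-highest-score cut-off — can be sketched in isolation. The IoU-based NonMaximaSuppression/SoftNMS step and the background-class skip are elided, and select_detections is an illustrative name:

```cpp
#include <algorithm>
#include <vector>

// Illustrative sketch of run_nmslimit()'s selection steps (NMS itself elided).
// Returns, per class, the kept candidate indices.
std::vector<std::vector<int>> select_detections(const std::vector<std::vector<float>> &in_scores,
                                                float score_thresh, int detections_per_im)
{
    const int num_classes = static_cast<int>(in_scores.size());
    std::vector<std::vector<int>> keeps(num_classes);
    int total_keep_count = 0;
    for(int j = 0; j < num_classes; ++j)
    {
        // Keep indices whose score exceeds the threshold...
        for(int i = 0; i < static_cast<int>(in_scores[j].size()); ++i)
        {
            if(in_scores[j][i] > score_thresh) keeps[j].push_back(i);
        }
        // ...sorted by descending score (the real kernel then runs NMS on these).
        std::sort(keeps[j].begin(), keeps[j].end(),
                  [&](int lhs, int rhs) { return in_scores[j][lhs] > in_scores[j][rhs]; });
        total_keep_count += static_cast<int>(keeps[j].size());
    }
    if(detections_per_im > 0 && total_keep_count > detections_per_im)
    {
        // Merge all kept scores, sort, and use the k-th highest as image threshold.
        std::vector<float> all_scores;
        for(int j = 0; j < num_classes; ++j)
            for(int k : keeps[j]) all_scores.push_back(in_scores[j][k]);
        std::sort(all_scores.begin(), all_scores.end());
        const float image_thresh = all_scores[all_scores.size() - detections_per_im];
        for(int j = 0; j < num_classes; ++j)
        {
            std::vector<int> new_keeps_j;
            for(int k : keeps[j])
                if(in_scores[j][k] >= image_thresh) new_keeps_j.push_back(k);
            keeps[j] = new_keeps_j;
        }
    }
    return keeps;
}
```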

The documentation for this class was generated from the following files:

CPPBoxWithNonMaximaSuppressionLimitKernel.h
CPPBoxWithNonMaximaSuppressionLimitKernel.cpp