Compute Library
 21.11
CPPDetectionPostProcessLayer Class Reference

CPP function to generate the detection output based on center-size encoded boxes, class predictions and anchors, by applying non-maximum suppression. More...

#include <CPPDetectionPostProcessLayer.h>


Public Member Functions

 CPPDetectionPostProcessLayer (std::shared_ptr< IMemoryManager > memory_manager=nullptr)
 Constructor. More...
 
 CPPDetectionPostProcessLayer (const CPPDetectionPostProcessLayer &)=delete
 Prevent instances of this class from being copied (As this class contains pointers) More...
 
CPPDetectionPostProcessLayer & operator= (const CPPDetectionPostProcessLayer &)=delete
 Prevent instances of this class from being copied (As this class contains pointers) More...
 
void configure (const ITensor *input_box_encoding, const ITensor *input_score, const ITensor *input_anchors, ITensor *output_boxes, ITensor *output_classes, ITensor *output_scores, ITensor *num_detection, DetectionPostProcessLayerInfo info=DetectionPostProcessLayerInfo())
 Configure the detection output layer CPP function. More...
 
void run () override
 Run the kernels contained in the function. More...
 
- Public Member Functions inherited from IFunction
virtual ~IFunction ()=default
 Destructor. More...
 
virtual void prepare ()
 Prepare the function for executing. More...
 

Static Public Member Functions

static Status validate (const ITensorInfo *input_box_encoding, const ITensorInfo *input_class_score, const ITensorInfo *input_anchors, ITensorInfo *output_boxes, ITensorInfo *output_classes, ITensorInfo *output_scores, ITensorInfo *num_detection, DetectionPostProcessLayerInfo info=DetectionPostProcessLayerInfo())
 Static function to check if given info will lead to a valid configuration of CPPDetectionPostProcessLayer. More...
 

Detailed Description

CPP function to generate the detection output based on center-size encoded boxes, class predictions and anchors, by applying non-maximum suppression.

Note
Intended for use with MultiBox detection method.

Definition at line 46 of file CPPDetectionPostProcessLayer.h.

Constructor & Destructor Documentation

◆ CPPDetectionPostProcessLayer() [1/2]

CPPDetectionPostProcessLayer ( std::shared_ptr< IMemoryManager > memory_manager = nullptr )

Constructor.

Definition at line 211 of file CPPDetectionPostProcessLayer.cpp.

212  : _memory_group(std::move(memory_manager)), _nms(), _input_box_encoding(nullptr), _input_scores(nullptr), _input_anchors(nullptr), _output_boxes(nullptr), _output_classes(nullptr),
213  _output_scores(nullptr), _num_detection(nullptr), _info(), _num_boxes(), _num_classes_with_background(), _num_max_detected_boxes(), _dequantize_scores(false), _decoded_boxes(), _decoded_scores(),
214  _selected_indices(), _class_scores(), _input_scores_to_use(nullptr)
215 {
216 }

◆ CPPDetectionPostProcessLayer() [2/2]

Prevent instances of this class from being copied (As this class contains pointers)

Member Function Documentation

◆ configure()

void configure ( const ITensor * input_box_encoding,
const ITensor * input_score,
const ITensor * input_anchors,
ITensor * output_boxes,
ITensor * output_classes,
ITensor * output_scores,
ITensor * num_detection,
DetectionPostProcessLayerInfo  info = DetectionPostProcessLayerInfo() 
)

Configure the detection output layer CPP function.

Parameters
[in]  input_box_encoding  The bounding box input tensor. Data types supported: F32/QASYMM8/QASYMM8_SIGNED.
[in]  input_score         The class prediction input tensor. Data types supported: Same as input_box_encoding.
[in]  input_anchors       The anchors input tensor. Data types supported: Same as input_box_encoding.
[out] output_boxes        The boxes output tensor. Data types supported: F32.
[out] output_classes      The classes output tensor. Data types supported: Same as output_boxes.
[out] output_scores       The scores output tensor. Data types supported: Same as output_boxes.
[out] num_detection       The number of output detections. Data types supported: Same as output_boxes.
[in]  info                (Optional) DetectionPostProcessLayerInfo information.
Note
The output tensors are sized for the maximum number of detections; only the entries covered by the valid region are meaningful.

Definition at line 218 of file CPPDetectionPostProcessLayer.cpp.

References TensorAllocator::allocate(), Tensor::allocator(), ARM_COMPUTE_ERROR_ON_NULLPTR, ARM_COMPUTE_ERROR_THROW_ON, ARM_COMPUTE_LOG_PARAMS, arm_compute::auto_init_if_empty(), CPPNonMaximumSuppression::configure(), ITensorInfo::data_type(), DetectionPostProcessLayerInfo::dequantize_scores(), DetectionPostProcessLayerInfo::detection_per_class(), ITensorInfo::dimension(), arm_compute::F32, ITensor::info(), Tensor::info(), arm_compute::test::validation::info, DetectionPostProcessLayerInfo::iou_threshold(), arm_compute::is_data_type_quantized(), MemoryGroup::manage(), DetectionPostProcessLayerInfo::max_classes_per_detection(), DetectionPostProcessLayerInfo::max_detections(), DetectionPostProcessLayerInfo::nms_score_threshold(), DetectionPostProcessLayerInfo::num_classes(), arm_compute::S32, arm_compute::U, and DetectionPostProcessLayerInfo::use_regular_nms().

Referenced by NEDetectionPostProcessLayer::configure().

221 {
222  ARM_COMPUTE_ERROR_ON_NULLPTR(input_box_encoding, input_scores, input_anchors, output_boxes, output_classes, output_scores);
223  ARM_COMPUTE_LOG_PARAMS(input_box_encoding, input_scores, input_anchors, output_boxes, output_classes, output_scores,
224  num_detection, info);
225 
226  _num_max_detected_boxes = info.max_detections() * info.max_classes_per_detection();
227 
228  auto_init_if_empty(*output_boxes->info(), TensorInfo(TensorShape(_kNumCoordBox, _num_max_detected_boxes, _kBatchSize), 1, DataType::F32));
229  auto_init_if_empty(*output_classes->info(), TensorInfo(TensorShape(_num_max_detected_boxes, _kBatchSize), 1, DataType::F32));
230  auto_init_if_empty(*output_scores->info(), TensorInfo(TensorShape(_num_max_detected_boxes, _kBatchSize), 1, DataType::F32));
231  auto_init_if_empty(*num_detection->info(), TensorInfo(TensorShape(1U), 1, DataType::F32));
232 
233  // Perform validation step
234  ARM_COMPUTE_ERROR_THROW_ON(validate_arguments(input_box_encoding->info(), input_scores->info(), input_anchors->info(), output_boxes->info(), output_classes->info(), output_scores->info(),
235  num_detection->info(),
236  info, _kBatchSize, _kNumCoordBox));
237 
238  _input_box_encoding = input_box_encoding;
239  _input_scores = input_scores;
240  _input_anchors = input_anchors;
241  _output_boxes = output_boxes;
242  _output_classes = output_classes;
243  _output_scores = output_scores;
244  _num_detection = num_detection;
245  _info = info;
246  _num_boxes = input_box_encoding->info()->dimension(1);
247  _num_classes_with_background = _input_scores->info()->dimension(0);
248  _dequantize_scores = (info.dequantize_scores() && is_data_type_quantized(input_box_encoding->info()->data_type()));
249 
250  auto_init_if_empty(*_decoded_boxes.info(), TensorInfo(TensorShape(_kNumCoordBox, _input_box_encoding->info()->dimension(1), _kBatchSize), 1, DataType::F32));
251  auto_init_if_empty(*_decoded_scores.info(), TensorInfo(TensorShape(_input_scores->info()->dimension(0), _input_scores->info()->dimension(1), _kBatchSize), 1, DataType::F32));
252  auto_init_if_empty(*_selected_indices.info(), TensorInfo(TensorShape(info.use_regular_nms() ? info.detection_per_class() : info.max_detections()), 1, DataType::S32));
253  const unsigned int num_classes_per_box = std::min(info.max_classes_per_detection(), info.num_classes());
254  auto_init_if_empty(*_class_scores.info(), TensorInfo(info.use_regular_nms() ? TensorShape(_num_boxes) : TensorShape(_num_boxes * num_classes_per_box), 1, DataType::F32));
255 
256  _input_scores_to_use = _dequantize_scores ? &_decoded_scores : _input_scores;
257 
258  // Manage intermediate buffers
259  _memory_group.manage(&_decoded_boxes);
260  _memory_group.manage(&_decoded_scores);
261  _memory_group.manage(&_selected_indices);
262  _memory_group.manage(&_class_scores);
263  _nms.configure(&_decoded_boxes, &_class_scores, &_selected_indices, info.use_regular_nms() ? info.detection_per_class() : info.max_detections(), info.nms_score_threshold(), info.iou_threshold());
264 
265  // Allocate and reserve intermediate tensors and vectors
266  _decoded_boxes.allocator()->allocate();
267  _decoded_scores.allocator()->allocate();
268  _selected_indices.allocator()->allocate();
269  _class_scores.allocator()->allocate();
270 }

◆ operator=()

Prevent instances of this class from being copied (As this class contains pointers)

◆ run()

void run ( )
override virtual

Run the kernels contained in the function.

For CPU kernels:

  • Multi-threading is used for the kernels which are parallelisable.
  • By default std::thread::hardware_concurrency() threads are used.
Note
CPPScheduler::set_num_threads() can be used to manually set the number of threads

For OpenCL kernels:

  • All the kernels are enqueued on the queue associated with CLScheduler.
  • The queue is then flushed.
Note
The function will not block until the kernels are executed. It is the user's responsibility to wait.
The function will call prepare() on the first run if it has not already been done.

Implements IFunction.

Definition at line 288 of file CPPDetectionPostProcessLayer.cpp.

References arm_compute::test::validation::b, Tensor::buffer(), ITensorInfo::data_type(), arm_compute::dequantize_qasymm8(), arm_compute::dequantize_qasymm8_signed(), DetectionPostProcessLayerInfo::detection_per_class(), ITensor::info(), DetectionPostProcessLayerInfo::max_classes_per_detection(), DetectionPostProcessLayerInfo::max_detections(), DetectionPostProcessLayerInfo::num_classes(), ITensor::ptr_to_element(), arm_compute::QASYMM8, arm_compute::QASYMM8_SIGNED, ITensorInfo::quantization_info(), ICPPSimpleFunction::run(), and DetectionPostProcessLayerInfo::use_regular_nms().

Referenced by NEDetectionPostProcessLayer::run().

289 {
290  const unsigned int num_classes = _info.num_classes();
291  const unsigned int max_detections = _info.max_detections();
292 
293  DecodeCenterSizeBoxes(_input_box_encoding, _input_anchors, _info, &_decoded_boxes);
294 
295  // Decode scores if necessary
296  if(_dequantize_scores)
297  {
298  if(_input_box_encoding->info()->data_type() == DataType::QASYMM8)
299  {
300  for(unsigned int idx_c = 0; idx_c < _num_classes_with_background; ++idx_c)
301  {
302  for(unsigned int idx_b = 0; idx_b < _num_boxes; ++idx_b)
303  {
304  *(reinterpret_cast<float *>(_decoded_scores.ptr_to_element(Coordinates(idx_c, idx_b)))) =
305  dequantize_qasymm8(*(reinterpret_cast<qasymm8_t *>(_input_scores->ptr_to_element(Coordinates(idx_c, idx_b)))), _input_scores->info()->quantization_info());
306  }
307  }
308  }
309  else if(_input_box_encoding->info()->data_type() == DataType::QASYMM8_SIGNED)
310  {
311  for(unsigned int idx_c = 0; idx_c < _num_classes_with_background; ++idx_c)
312  {
313  for(unsigned int idx_b = 0; idx_b < _num_boxes; ++idx_b)
314  {
315  *(reinterpret_cast<float *>(_decoded_scores.ptr_to_element(Coordinates(idx_c, idx_b)))) =
316  dequantize_qasymm8_signed(*(reinterpret_cast<qasymm8_signed_t *>(_input_scores->ptr_to_element(Coordinates(idx_c, idx_b)))), _input_scores->info()->quantization_info());
317  }
318  }
319  }
320  }
321 
322  // Regular NMS
323  if(_info.use_regular_nms())
324  {
325  std::vector<int> result_idx_boxes_after_nms;
326  std::vector<int> result_classes_after_nms;
327  std::vector<float> result_scores_after_nms;
328  std::vector<unsigned int> sorted_indices;
329 
330  for(unsigned int c = 0; c < num_classes; ++c)
331  {
332  // For each box, gather its score for class c
333  for(unsigned int i = 0; i < _num_boxes; ++i)
334  {
335  *(reinterpret_cast<float *>(_class_scores.ptr_to_element(Coordinates(i)))) =
336  *(reinterpret_cast<float *>(_input_scores_to_use->ptr_to_element(Coordinates(c + 1, i)))); // i * _num_classes_with_background + c + 1
337  }
338 
339  // Run Non-maxima Suppression
340  _nms.run();
341 
342  for(unsigned int i = 0; i < _info.detection_per_class(); ++i)
343  {
344  const auto selected_index = *(reinterpret_cast<int *>(_selected_indices.ptr_to_element(Coordinates(i))));
345  if(selected_index == -1)
346  {
347  // NMS fills the invalid tail of the output with -1
348  break;
349  }
350  result_idx_boxes_after_nms.emplace_back(selected_index);
351  result_scores_after_nms.emplace_back((reinterpret_cast<float *>(_class_scores.buffer()))[selected_index]);
352  result_classes_after_nms.emplace_back(c);
353  }
354  }
355 
356  // Keep at most max_detections of the highest-scoring detections across all classes
357  const auto num_selected = result_scores_after_nms.size();
358  const auto num_output = std::min<unsigned int>(max_detections, num_selected);
359 
360  // Sort selected indices based on result scores
361  sorted_indices.resize(num_selected);
362  std::iota(sorted_indices.begin(), sorted_indices.end(), 0);
363  std::partial_sort(sorted_indices.data(),
364  sorted_indices.data() + num_output,
365  sorted_indices.data() + num_selected,
366  [&](unsigned int first, unsigned int second)
367  {
368 
369  return result_scores_after_nms[first] > result_scores_after_nms[second];
370  });
371 
372  SaveOutputs(&_decoded_boxes, result_idx_boxes_after_nms, result_scores_after_nms, result_classes_after_nms, sorted_indices,
373  num_output, max_detections, _output_boxes, _output_classes, _output_scores, _num_detection);
374  }
375  // Fast NMS
376  else
377  {
378  const unsigned int num_classes_per_box = std::min<unsigned int>(_info.max_classes_per_detection(), _info.num_classes());
379  std::vector<float> max_scores;
380  std::vector<int> box_indices;
381  std::vector<int> max_score_classes;
382 
383  for(unsigned int b = 0; b < _num_boxes; ++b)
384  {
385  std::vector<float> box_scores;
386  for(unsigned int c = 0; c < num_classes; ++c)
387  {
388  box_scores.emplace_back(*(reinterpret_cast<float *>(_input_scores_to_use->ptr_to_element(Coordinates(c + 1, b)))));
389  }
390 
391  std::vector<unsigned int> max_score_indices;
392  max_score_indices.resize(_info.num_classes());
393  std::iota(max_score_indices.data(), max_score_indices.data() + _info.num_classes(), 0);
394  std::partial_sort(max_score_indices.data(),
395  max_score_indices.data() + num_classes_per_box,
396  max_score_indices.data() + num_classes,
397  [&](unsigned int first, unsigned int second)
398  {
399  return box_scores[first] > box_scores[second];
400  });
401 
402  for(unsigned int i = 0; i < num_classes_per_box; ++i)
403  {
404  const float score_to_add = box_scores[max_score_indices[i]];
405  *(reinterpret_cast<float *>(_class_scores.ptr_to_element(Coordinates(b * num_classes_per_box + i)))) = score_to_add;
406  max_scores.emplace_back(score_to_add);
407  box_indices.emplace_back(b);
408  max_score_classes.emplace_back(max_score_indices[i]);
409  }
410  }
411 
412  // Run Non-maxima Suppression
413  _nms.run();
414  std::vector<unsigned int> selected_indices;
415  for(unsigned int i = 0; i < max_detections; ++i)
416  {
417  // NMS returns M valid indices; the invalid tail is filled with -1
418  if(*(reinterpret_cast<int *>(_selected_indices.ptr_to_element(Coordinates(i)))) == -1)
419  {
420  // Stop at the first -1, as the remaining entries are invalid
421  break;
422  }
423  selected_indices.emplace_back(*(reinterpret_cast<int *>(_selected_indices.ptr_to_element(Coordinates(i)))));
424  }
425  // Keep at most max_detections of the highest-scoring detections
426  const auto num_output = std::min<unsigned int>(_info.max_detections(), selected_indices.size());
427 
428  SaveOutputs(&_decoded_boxes, box_indices, max_scores, max_score_classes, selected_indices,
429  num_output, max_detections, _output_boxes, _output_classes, _output_scores, _num_detection);
430  }
431 }

◆ validate()

Status validate ( const ITensorInfo * input_box_encoding,
const ITensorInfo * input_class_score,
const ITensorInfo * input_anchors,
ITensorInfo * output_boxes,
ITensorInfo * output_classes,
ITensorInfo * output_scores,
ITensorInfo * num_detection,
DetectionPostProcessLayerInfo  info = DetectionPostProcessLayerInfo() 
)
static

Static function to check if given info will lead to a valid configuration of CPPDetectionPostProcessLayer.

Parameters
[in]  input_box_encoding  The bounding box input tensor info. Data types supported: F32/QASYMM8/QASYMM8_SIGNED.
[in]  input_class_score   The class prediction input tensor info. Data types supported: Same as input_box_encoding.
[in]  input_anchors       The anchors input tensor info. Data types supported: F32, QASYMM8.
[out] output_boxes        The boxes output tensor info. Data types supported: F32.
[out] output_classes      The classes output tensor info. Data types supported: Same as output_boxes.
[out] output_scores       The scores output tensor info. Data types supported: Same as output_boxes.
[out] num_detection       The number of output detections. Data types supported: Same as output_boxes.
[in]  info                (Optional) DetectionPostProcessLayerInfo information.
Returns
a status

Definition at line 272 of file CPPDetectionPostProcessLayer.cpp.

References ARM_COMPUTE_RETURN_ON_ERROR, ITensorInfo::dimension(), arm_compute::F32, DetectionPostProcessLayerInfo::iou_threshold(), DetectionPostProcessLayerInfo::max_detections(), DetectionPostProcessLayerInfo::nms_score_threshold(), arm_compute::S32, and CPPNonMaximumSuppression::validate().

Referenced by arm_compute::test::validation::DATA_TEST_CASE(), and NEDetectionPostProcessLayer::validate().

274 {
275  constexpr unsigned int kBatchSize = 1;
276  constexpr unsigned int kNumCoordBox = 4;
277  const TensorInfo _decoded_boxes_info = TensorInfo(TensorShape(kNumCoordBox, input_box_encoding->dimension(1)), 1, DataType::F32);
278  const TensorInfo _decoded_scores_info = TensorInfo(TensorShape(input_box_encoding->dimension(1)), 1, DataType::F32);
279  const TensorInfo _selected_indices_info = TensorInfo(TensorShape(info.max_detections()), 1, DataType::S32);
280 
281  ARM_COMPUTE_RETURN_ON_ERROR(CPPNonMaximumSuppression::validate(&_decoded_boxes_info, &_decoded_scores_info, &_selected_indices_info, info.max_detections(), info.nms_score_threshold(),
282  info.iou_threshold()));
283  ARM_COMPUTE_RETURN_ON_ERROR(validate_arguments(input_box_encoding, input_class_score, input_anchors, output_boxes, output_classes, output_scores, num_detection, info, kBatchSize, kNumCoordBox));
284 
285  return Status{};
286 }

The documentation for this class was generated from the following files:
CPPDetectionPostProcessLayer.h
CPPDetectionPostProcessLayer.cpp