Compute Library
 21.02
CPPDetectionPostProcessLayer Class Reference

CPP Function to generate the detection output based on center-size encoded boxes, class predictions and anchors by performing non-maximum suppression. More...

#include <CPPDetectionPostProcessLayer.h>


Public Member Functions

 	CPPDetectionPostProcessLayer (std::shared_ptr< IMemoryManager > memory_manager=nullptr)
 	Constructor.
 
 	CPPDetectionPostProcessLayer (const CPPDetectionPostProcessLayer &)=delete
 	Prevent instances of this class from being copied (as this class contains pointers).
 
CPPDetectionPostProcessLayer & 	operator= (const CPPDetectionPostProcessLayer &)=delete
 	Prevent instances of this class from being copied (as this class contains pointers).
 
void 	configure (const ITensor *input_box_encoding, const ITensor *input_score, const ITensor *input_anchors, ITensor *output_boxes, ITensor *output_classes, ITensor *output_scores, ITensor *num_detection, DetectionPostProcessLayerInfo info=DetectionPostProcessLayerInfo())
 	Configure the detection output layer CPP function.
 
void 	run () override
 	Run the kernels contained in the function.
 
- Public Member Functions inherited from IFunction
virtual 	~IFunction ()=default
 	Destructor.
 
virtual void 	prepare ()
 	Prepare the function for executing.
 

Static Public Member Functions

static Status 	validate (const ITensorInfo *input_box_encoding, const ITensorInfo *input_class_score, const ITensorInfo *input_anchors, ITensorInfo *output_boxes, ITensorInfo *output_classes, ITensorInfo *output_scores, ITensorInfo *num_detection, DetectionPostProcessLayerInfo info=DetectionPostProcessLayerInfo())
 	Static function to check if the given info will lead to a valid configuration of CPPDetectionPostProcessLayer.
 

Detailed Description

CPP Function to generate the detection output based on center-size encoded boxes, class predictions and anchors by performing non-maximum suppression.

Note
Intended for use with the MultiBox detection method.

Definition at line 46 of file CPPDetectionPostProcessLayer.h.
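The "center-size encoded boxes" mentioned above are regressions relative to anchors, as in MultiBox/SSD-style detectors. As a standalone illustration only (this is not the library's internal code; the default scale factors are an assumption borrowed from the common TFLite DetectionPostProcess convention), decoding one box against its anchor looks like:

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Decode one center-size encoded box [ty, tx, th, tw] against an anchor
// [y_center, x_center, h, w] into corner form [ymin, xmin, ymax, xmax].
// Scale factors (10, 10, 5, 5) mirror a common MultiBox convention and are
// an illustrative assumption, not values taken from the library.
std::array<float, 4> decode_box(const std::array<float, 4> &t,
                                const std::array<float, 4> &anchor,
                                float scale_y = 10.f, float scale_x = 10.f,
                                float scale_h = 5.f, float scale_w = 5.f)
{
    const float ycenter = t[0] / scale_y * anchor[2] + anchor[0];
    const float xcenter = t[1] / scale_x * anchor[3] + anchor[1];
    const float h       = std::exp(t[2] / scale_h) * anchor[2];
    const float w       = std::exp(t[3] / scale_w) * anchor[3];
    return { ycenter - h / 2.f, xcenter - w / 2.f,
             ycenter + h / 2.f, xcenter + w / 2.f };
}
```

With all-zero offsets the decoded box is simply the anchor converted to corner form, e.g. anchor (0.5, 0.5, 1, 1) decodes to (0, 0, 1, 1).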

Constructor & Destructor Documentation

◆ CPPDetectionPostProcessLayer() [1/2]

CPPDetectionPostProcessLayer ( std::shared_ptr< IMemoryManager > 	memory_manager = nullptr )

Constructor.

Definition at line 209 of file CPPDetectionPostProcessLayer.cpp.

210  : _memory_group(std::move(memory_manager)), _nms(), _input_box_encoding(nullptr), _input_scores(nullptr), _input_anchors(nullptr), _output_boxes(nullptr), _output_classes(nullptr),
211  _output_scores(nullptr), _num_detection(nullptr), _info(), _num_boxes(), _num_classes_with_background(), _num_max_detected_boxes(), _dequantize_scores(false), _decoded_boxes(), _decoded_scores(),
212  _selected_indices(), _class_scores(), _input_scores_to_use(nullptr)
213 {
214 }

◆ CPPDetectionPostProcessLayer() [2/2]

CPPDetectionPostProcessLayer ( const CPPDetectionPostProcessLayer & )
delete

Prevent instances of this class from being copied (as this class contains pointers).

Member Function Documentation

◆ configure()

void configure ( const ITensor * 	input_box_encoding,
		const ITensor * 	input_score,
		const ITensor * 	input_anchors,
		ITensor * 	output_boxes,
		ITensor * 	output_classes,
		ITensor * 	output_scores,
		ITensor * 	num_detection,
		DetectionPostProcessLayerInfo 	info = DetectionPostProcessLayerInfo() 
	)

Configure the detection output layer CPP function.

Parameters
[in]	input_box_encoding	The bounding box input tensor. Data types supported: F32/QASYMM8/QASYMM8_SIGNED.
[in]	input_score	The class prediction input tensor. Data types supported: Same as input_box_encoding.
[in]	input_anchors	The anchors input tensor. Data types supported: Same as input_box_encoding.
[out]	output_boxes	The boxes output tensor. Data types supported: F32.
[out]	output_classes	The classes output tensor. Data types supported: Same as output_boxes.
[out]	output_scores	The scores output tensor. Data types supported: Same as output_boxes.
[out]	num_detection	The number of output detections. Data types supported: Same as output_boxes.
[in]	info	(Optional) DetectionPostProcessLayerInfo information.
Note
Output contains all the detections. Of those, only the ones selected by the valid region are valid.

Definition at line 216 of file CPPDetectionPostProcessLayer.cpp.

References TensorAllocator::allocate(), Tensor::allocator(), ARM_COMPUTE_ERROR_ON_NULLPTR, ARM_COMPUTE_ERROR_THROW_ON, arm_compute::auto_init_if_empty(), CPPNonMaximumSuppression::configure(), ITensorInfo::data_type(), DetectionPostProcessLayerInfo::dequantize_scores(), DetectionPostProcessLayerInfo::detection_per_class(), ITensorInfo::dimension(), arm_compute::F32, ITensor::info(), Tensor::info(), arm_compute::test::validation::info, DetectionPostProcessLayerInfo::iou_threshold(), arm_compute::is_data_type_quantized(), MemoryGroup::manage(), DetectionPostProcessLayerInfo::max_classes_per_detection(), DetectionPostProcessLayerInfo::max_detections(), DetectionPostProcessLayerInfo::nms_score_threshold(), DetectionPostProcessLayerInfo::num_classes(), arm_compute::S32, arm_compute::U, DetectionPostProcessLayerInfo::use_regular_nms(), and arm_compute::validate_arguments().

Referenced by NEDetectionPostProcessLayer::configure().

218 {
219  ARM_COMPUTE_ERROR_ON_NULLPTR(input_box_encoding, input_scores, input_anchors, output_boxes, output_classes, output_scores);
220  _num_max_detected_boxes = info.max_detections() * info.max_classes_per_detection();
221 
222  auto_init_if_empty(*output_boxes->info(), TensorInfo(TensorShape(_kNumCoordBox, _num_max_detected_boxes, _kBatchSize), 1, DataType::F32));
223  auto_init_if_empty(*output_classes->info(), TensorInfo(TensorShape(_num_max_detected_boxes, _kBatchSize), 1, DataType::F32));
224  auto_init_if_empty(*output_scores->info(), TensorInfo(TensorShape(_num_max_detected_boxes, _kBatchSize), 1, DataType::F32));
225  auto_init_if_empty(*num_detection->info(), TensorInfo(TensorShape(1U), 1, DataType::F32));
226 
227  // Perform validation step
228  ARM_COMPUTE_ERROR_THROW_ON(validate_arguments(input_box_encoding->info(), input_scores->info(), input_anchors->info(), output_boxes->info(), output_classes->info(), output_scores->info(),
229  num_detection->info(),
230  info, _kBatchSize, _kNumCoordBox));
231 
232  _input_box_encoding = input_box_encoding;
233  _input_scores = input_scores;
234  _input_anchors = input_anchors;
235  _output_boxes = output_boxes;
236  _output_classes = output_classes;
237  _output_scores = output_scores;
238  _num_detection = num_detection;
239  _info = info;
240  _num_boxes = input_box_encoding->info()->dimension(1);
241  _num_classes_with_background = _input_scores->info()->dimension(0);
242  _dequantize_scores = (info.dequantize_scores() && is_data_type_quantized(input_box_encoding->info()->data_type()));
243 
244  auto_init_if_empty(*_decoded_boxes.info(), TensorInfo(TensorShape(_kNumCoordBox, _input_box_encoding->info()->dimension(1), _kBatchSize), 1, DataType::F32));
245  auto_init_if_empty(*_decoded_scores.info(), TensorInfo(TensorShape(_input_scores->info()->dimension(0), _input_scores->info()->dimension(1), _kBatchSize), 1, DataType::F32));
246  auto_init_if_empty(*_selected_indices.info(), TensorInfo(TensorShape(info.use_regular_nms() ? info.detection_per_class() : info.max_detections()), 1, DataType::S32));
247  const unsigned int num_classes_per_box = std::min(info.max_classes_per_detection(), info.num_classes());
248  auto_init_if_empty(*_class_scores.info(), TensorInfo(info.use_regular_nms() ? TensorShape(_num_boxes) : TensorShape(_num_boxes * num_classes_per_box), 1, DataType::F32));
249 
250  _input_scores_to_use = _dequantize_scores ? &_decoded_scores : _input_scores;
251 
252  // Manage intermediate buffers
253  _memory_group.manage(&_decoded_boxes);
254  _memory_group.manage(&_decoded_scores);
255  _memory_group.manage(&_selected_indices);
256  _memory_group.manage(&_class_scores);
257  _nms.configure(&_decoded_boxes, &_class_scores, &_selected_indices, info.use_regular_nms() ? info.detection_per_class() : info.max_detections(), info.nms_score_threshold(), info.iou_threshold());
258 
259  // Allocate and reserve intermediate tensors and vectors
260  _decoded_boxes.allocator()->allocate();
261  _decoded_scores.allocator()->allocate();
262  _selected_indices.allocator()->allocate();
263  _class_scores.allocator()->allocate();
264 }

◆ operator=()

CPPDetectionPostProcessLayer & operator= ( const CPPDetectionPostProcessLayer & )
delete

Prevent instances of this class from being copied (as this class contains pointers).

◆ run()

void run ( )
override virtual

Run the kernels contained in the function.

For Neon kernels:

  • Multi-threading is used for the kernels which are parallelisable.
  • By default std::thread::hardware_concurrency() threads are used.
Note
CPPScheduler::set_num_threads() can be used to manually set the number of threads

For OpenCL kernels:

  • All the kernels are enqueued on the queue associated with CLScheduler.
  • The queue is then flushed.
Note
The function will not block until the kernels are executed. It is the user's responsibility to wait.
Will call prepare() on the first run if it hasn't been done already.

Implements IFunction.

Definition at line 282 of file CPPDetectionPostProcessLayer.cpp.

References arm_compute::test::validation::b, Tensor::buffer(), ITensorInfo::data_type(), arm_compute::dequantize_qasymm8(), arm_compute::dequantize_qasymm8_signed(), DetectionPostProcessLayerInfo::detection_per_class(), ITensor::info(), DetectionPostProcessLayerInfo::max_classes_per_detection(), DetectionPostProcessLayerInfo::max_detections(), DetectionPostProcessLayerInfo::num_classes(), ITensor::ptr_to_element(), arm_compute::QASYMM8, arm_compute::QASYMM8_SIGNED, ITensorInfo::quantization_info(), ICPPSimpleFunction::run(), and DetectionPostProcessLayerInfo::use_regular_nms().

Referenced by NEDetectionPostProcessLayer::run().
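When dequantize_scores() is set and the inputs are quantized, run() first converts the QASYMM8/QASYMM8_SIGNED scores to F32 (see the loops in the listing below). In isolation, the affine dequantization this relies on is just (q - zero_point) * scale; the sketch below passes scale and zero-point explicitly with illustrative values, whereas the library's dequantize_qasymm8() reads them from a QuantizationInfo object:

```cpp
#include <cassert>
#include <cstdint>

// Affine dequantization for 8-bit asymmetric quantization:
// real_value = (quantized_value - zero_point) * scale.
inline float dequantize_u8(uint8_t q, float scale, int32_t zero_point)
{
    return static_cast<float>(static_cast<int32_t>(q) - zero_point) * scale;
}

// Same formula for the signed 8-bit variant.
inline float dequantize_s8(int8_t q, float scale, int32_t zero_point)
{
    return static_cast<float>(static_cast<int32_t>(q) - zero_point) * scale;
}
```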

283 {
284  const unsigned int num_classes = _info.num_classes();
285  const unsigned int max_detections = _info.max_detections();
286 
287  DecodeCenterSizeBoxes(_input_box_encoding, _input_anchors, _info, &_decoded_boxes);
288 
289  // Decode scores if necessary
290  if(_dequantize_scores)
291  {
292  if(_input_box_encoding->info()->data_type() == DataType::QASYMM8)
293  {
294  for(unsigned int idx_c = 0; idx_c < _num_classes_with_background; ++idx_c)
295  {
296  for(unsigned int idx_b = 0; idx_b < _num_boxes; ++idx_b)
297  {
298  *(reinterpret_cast<float *>(_decoded_scores.ptr_to_element(Coordinates(idx_c, idx_b)))) =
299  dequantize_qasymm8(*(reinterpret_cast<qasymm8_t *>(_input_scores->ptr_to_element(Coordinates(idx_c, idx_b)))), _input_scores->info()->quantization_info());
300  }
301  }
302  }
303  else if(_input_box_encoding->info()->data_type() == DataType::QASYMM8_SIGNED)
304  {
305  for(unsigned int idx_c = 0; idx_c < _num_classes_with_background; ++idx_c)
306  {
307  for(unsigned int idx_b = 0; idx_b < _num_boxes; ++idx_b)
308  {
309  *(reinterpret_cast<float *>(_decoded_scores.ptr_to_element(Coordinates(idx_c, idx_b)))) =
310  dequantize_qasymm8_signed(*(reinterpret_cast<qasymm8_signed_t *>(_input_scores->ptr_to_element(Coordinates(idx_c, idx_b)))), _input_scores->info()->quantization_info());
311  }
312  }
313  }
314  }
315 
316  // Regular NMS
317  if(_info.use_regular_nms())
318  {
319  std::vector<int> result_idx_boxes_after_nms;
320  std::vector<int> result_classes_after_nms;
321  std::vector<float> result_scores_after_nms;
322  std::vector<unsigned int> sorted_indices;
323 
324  for(unsigned int c = 0; c < num_classes; ++c)
325  {
326  // For each box, get its score for class c
327  for(unsigned int i = 0; i < _num_boxes; ++i)
328  {
329  *(reinterpret_cast<float *>(_class_scores.ptr_to_element(Coordinates(i)))) =
330  *(reinterpret_cast<float *>(_input_scores_to_use->ptr_to_element(Coordinates(c + 1, i)))); // i * _num_classes_with_background + c + 1
331  }
332 
333  // Run Non-maxima Suppression
334  _nms.run();
335 
336  for(unsigned int i = 0; i < _info.detection_per_class(); ++i)
337  {
338  const auto selected_index = *(reinterpret_cast<int *>(_selected_indices.ptr_to_element(Coordinates(i))));
339  if(selected_index == -1)
340  {
341  // NMS fills the invalid tail of the output with -1
342  break;
343  }
344  result_idx_boxes_after_nms.emplace_back(selected_index);
345  result_scores_after_nms.emplace_back((reinterpret_cast<float *>(_class_scores.buffer()))[selected_index]);
346  result_classes_after_nms.emplace_back(c);
347  }
348  }
349 
350  // Keep at most max_detections of the highest-scoring results across all classes
351  const auto num_selected = result_scores_after_nms.size();
352  const auto num_output = std::min<unsigned int>(max_detections, num_selected);
353 
354  // Sort selected indices based on result scores
355  sorted_indices.resize(num_selected);
356  std::iota(sorted_indices.begin(), sorted_indices.end(), 0);
357  std::partial_sort(sorted_indices.data(),
358  sorted_indices.data() + num_output,
359  sorted_indices.data() + num_selected,
360  [&](unsigned int first, unsigned int second)
361  {
362 
363  return result_scores_after_nms[first] > result_scores_after_nms[second];
364  });
365 
366  SaveOutputs(&_decoded_boxes, result_idx_boxes_after_nms, result_scores_after_nms, result_classes_after_nms, sorted_indices,
367  num_output, max_detections, _output_boxes, _output_classes, _output_scores, _num_detection);
368  }
369  // Fast NMS
370  else
371  {
372  const unsigned int num_classes_per_box = std::min<unsigned int>(_info.max_classes_per_detection(), _info.num_classes());
373  std::vector<float> max_scores;
374  std::vector<int> box_indices;
375  std::vector<int> max_score_classes;
376 
377  for(unsigned int b = 0; b < _num_boxes; ++b)
378  {
379  std::vector<float> box_scores;
380  for(unsigned int c = 0; c < num_classes; ++c)
381  {
382  box_scores.emplace_back(*(reinterpret_cast<float *>(_input_scores_to_use->ptr_to_element(Coordinates(c + 1, b)))));
383  }
384 
385  std::vector<unsigned int> max_score_indices;
386  max_score_indices.resize(_info.num_classes());
387  std::iota(max_score_indices.data(), max_score_indices.data() + _info.num_classes(), 0);
388  std::partial_sort(max_score_indices.data(),
389  max_score_indices.data() + num_classes_per_box,
390  max_score_indices.data() + num_classes,
391  [&](unsigned int first, unsigned int second)
392  {
393  return box_scores[first] > box_scores[second];
394  });
395 
396  for(unsigned int i = 0; i < num_classes_per_box; ++i)
397  {
398  const float score_to_add = box_scores[max_score_indices[i]];
399  *(reinterpret_cast<float *>(_class_scores.ptr_to_element(Coordinates(b * num_classes_per_box + i)))) = score_to_add;
400  max_scores.emplace_back(score_to_add);
401  box_indices.emplace_back(b);
402  max_score_classes.emplace_back(max_score_indices[i]);
403  }
404  }
405 
406  // Run Non-maxima Suppression
407  _nms.run();
408  std::vector<unsigned int> selected_indices;
409  for(unsigned int i = 0; i < max_detections; ++i)
410  {
411  // NMS returns M valid indices; the invalid tail is filled with -1
412  if(*(reinterpret_cast<int *>(_selected_indices.ptr_to_element(Coordinates(i)))) == -1)
413  {
414  // Stop at the first invalid (-1) entry
415  break;
416  }
417  selected_indices.emplace_back(*(reinterpret_cast<int *>(_selected_indices.ptr_to_element(Coordinates(i)))));
418  }
419  // Keep at most max_detections of the highest-scoring results across all classes
420  const auto num_output = std::min<unsigned int>(_info.max_detections(), selected_indices.size());
421 
422  SaveOutputs(&_decoded_boxes, box_indices, max_scores, max_score_classes, selected_indices,
423  num_output, max_detections, _output_boxes, _output_classes, _output_scores, _num_detection);
424  }
425 }
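Both NMS paths above select top-scoring entries with the same std::iota + std::partial_sort idiom: only the first k of n indices are ordered by score, which is O(n log k) rather than a full sort. Extracted as a self-contained helper (the function name is ours, not the library's):

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Return the indices of the k highest scores, highest first.
// Mirrors the index-sorting idiom used in run() above.
std::vector<unsigned int> top_k_indices(const std::vector<float> &scores, unsigned int k)
{
    std::vector<unsigned int> idx(scores.size());
    std::iota(idx.begin(), idx.end(), 0); // 0, 1, 2, ...
    k = std::min(k, static_cast<unsigned int>(idx.size()));
    // Order only the first k positions; the tail is left unordered.
    std::partial_sort(idx.begin(), idx.begin() + k, idx.end(),
                      [&](unsigned int a, unsigned int b) { return scores[a] > scores[b]; });
    idx.resize(k);
    return idx;
}
```

The partial sort matters here because _num_boxes can be large while only max_detections (or num_classes_per_box) results are kept.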

◆ validate()

Status validate ( const ITensorInfo * 	input_box_encoding,
		const ITensorInfo * 	input_class_score,
		const ITensorInfo * 	input_anchors,
		ITensorInfo * 	output_boxes,
		ITensorInfo * 	output_classes,
		ITensorInfo * 	output_scores,
		ITensorInfo * 	num_detection,
		DetectionPostProcessLayerInfo 	info = DetectionPostProcessLayerInfo() 
	)
static

Static function to check if given info will lead to a valid configuration of CPPDetectionPostProcessLayer.

Parameters
[in]	input_box_encoding	The bounding box input tensor info. Data types supported: F32/QASYMM8/QASYMM8_SIGNED.
[in]	input_class_score	The class prediction input tensor info. Data types supported: Same as input_box_encoding.
[in]	input_anchors	The anchors input tensor info. Data types supported: F32, QASYMM8.
[out]	output_boxes	The output tensor info. Data types supported: F32.
[out]	output_classes	The output tensor info. Data types supported: Same as output_boxes.
[out]	output_scores	The output tensor info. Data types supported: Same as output_boxes.
[out]	num_detection	The number of output detections. Data types supported: Same as output_boxes.
[in]	info	(Optional) DetectionPostProcessLayerInfo information.
Returns
a status

Definition at line 266 of file CPPDetectionPostProcessLayer.cpp.

References ARM_COMPUTE_RETURN_ON_ERROR, ITensorInfo::dimension(), arm_compute::F32, DetectionPostProcessLayerInfo::iou_threshold(), DetectionPostProcessLayerInfo::max_detections(), DetectionPostProcessLayerInfo::nms_score_threshold(), arm_compute::S32, CPPNonMaximumSuppression::validate(), and arm_compute::validate_arguments().

Referenced by arm_compute::test::validation::DATA_TEST_CASE(), and NEDetectionPostProcessLayer::validate().

268 {
269  constexpr unsigned int kBatchSize = 1;
270  constexpr unsigned int kNumCoordBox = 4;
271  const TensorInfo _decoded_boxes_info = TensorInfo(TensorShape(kNumCoordBox, input_box_encoding->dimension(1)), 1, DataType::F32);
272  const TensorInfo _decoded_scores_info = TensorInfo(TensorShape(input_box_encoding->dimension(1)), 1, DataType::F32);
273  const TensorInfo _selected_indices_info = TensorInfo(TensorShape(info.max_detections()), 1, DataType::S32);
274 
275  ARM_COMPUTE_RETURN_ON_ERROR(CPPNonMaximumSuppression::validate(&_decoded_boxes_info, &_decoded_scores_info, &_selected_indices_info, info.max_detections(), info.nms_score_threshold(),
276  info.iou_threshold()));
277  ARM_COMPUTE_RETURN_ON_ERROR(validate_arguments(input_box_encoding, input_class_score, input_anchors, output_boxes, output_classes, output_scores, num_detection, info, kBatchSize, kNumCoordBox));
278 
279  return Status{};
280 }

The documentation for this class was generated from the following files: