Compute Library
 22.11
CPPBoxWithNonMaximaSuppressionLimitKernel Class Reference

CPP kernel to perform computation of BoxWithNonMaximaSuppressionLimit. More...

#include <CPPBoxWithNonMaximaSuppressionLimitKernel.h>

Collaboration diagram for CPPBoxWithNonMaximaSuppressionLimitKernel:

Public Member Functions

const char * name () const override
 Name of the kernel. More...
 
 CPPBoxWithNonMaximaSuppressionLimitKernel ()
 Default constructor. More...
 
 CPPBoxWithNonMaximaSuppressionLimitKernel (const CPPBoxWithNonMaximaSuppressionLimitKernel &)=delete
 Prevent instances of this class from being copied (As this class contains pointers) More...
 
CPPBoxWithNonMaximaSuppressionLimitKernel & operator= (const CPPBoxWithNonMaximaSuppressionLimitKernel &)=delete
 Prevent instances of this class from being copied (As this class contains pointers) More...
 
 CPPBoxWithNonMaximaSuppressionLimitKernel (CPPBoxWithNonMaximaSuppressionLimitKernel &&)=default
 Allow instances of this class to be moved. More...
 
CPPBoxWithNonMaximaSuppressionLimitKernel & operator= (CPPBoxWithNonMaximaSuppressionLimitKernel &&)=default
 Allow instances of this class to be moved. More...
 
void configure (const ITensor *scores_in, const ITensor *boxes_in, const ITensor *batch_splits_in, ITensor *scores_out, ITensor *boxes_out, ITensor *classes, ITensor *batch_splits_out=nullptr, ITensor *keeps=nullptr, ITensor *keeps_size=nullptr, const BoxNMSLimitInfo info=BoxNMSLimitInfo())
 Initialise the kernel's input and output tensors. More...
 
void run (const Window &window, const ThreadInfo &info) override
 Execute the kernel on the passed window. More...
 
bool is_parallelisable () const override
 Indicates whether or not the kernel is parallelisable. More...
 
template<typename T >
void run_nmslimit ()
 
- Public Member Functions inherited from ICPPKernel
virtual ~ICPPKernel ()=default
 Default destructor. More...
 
virtual void run_nd (const Window &window, const ThreadInfo &info, const Window &thread_locator)
 Legacy compatibility layer for implementations that do not support thread_locator. In these cases we simply narrow the interface down to the legacy version. More...
 
virtual void run_op (ITensorPack &tensors, const Window &window, const ThreadInfo &info)
 Execute the kernel on the passed window. More...
 
virtual size_t get_mws (const CPUInfo &platform, size_t thread_count) const
 Return minimum workload size of the relevant kernel. More...
 
- Public Member Functions inherited from IKernel
 IKernel ()
 Constructor. More...
 
virtual ~IKernel ()=default
 Destructor. More...
 
virtual BorderSize border_size () const
 The size of the border for that kernel. More...
 
const Window & window () const
 The maximum window the kernel can be executed on. More...
 
bool is_window_configured () const
 Function to check if the embedded window of this kernel has been configured. More...
 

Additional Inherited Members

- Static Public Attributes inherited from ICPPKernel
static constexpr size_t default_mws = 1
 

Detailed Description

CPP kernel to perform computation of BoxWithNonMaximaSuppressionLimit.

Definition at line 35 of file CPPBoxWithNonMaximaSuppressionLimitKernel.h.

Constructor & Destructor Documentation

◆ CPPBoxWithNonMaximaSuppressionLimitKernel() [1/3]

Default constructor.

Definition at line 186 of file CPPBoxWithNonMaximaSuppressionLimitKernel.cpp.

Referenced by CPPBoxWithNonMaximaSuppressionLimitKernel::name().

187  : _scores_in(nullptr), _boxes_in(nullptr), _batch_splits_in(nullptr), _scores_out(nullptr), _boxes_out(nullptr), _classes(nullptr), _batch_splits_out(nullptr), _keeps(nullptr), _keeps_size(nullptr),
188  _info()
189 {
190 }

◆ CPPBoxWithNonMaximaSuppressionLimitKernel() [2/3]

Prevent instances of this class from being copied (As this class contains pointers)

◆ CPPBoxWithNonMaximaSuppressionLimitKernel() [3/3]

Allow instances of this class to be moved.

Member Function Documentation

◆ configure()

void configure ( const ITensor * scores_in,
const ITensor * boxes_in,
const ITensor * batch_splits_in,
ITensor * scores_out,
ITensor * boxes_out,
ITensor * classes,
ITensor * batch_splits_out = nullptr,
ITensor * keeps = nullptr,
ITensor * keeps_size = nullptr,
const BoxNMSLimitInfo  info = BoxNMSLimitInfo() 
)

Initialise the kernel's input and output tensors.

Parameters
[in]  scores_in        The scores input tensor of size [num_classes, count]. Data types supported: F16/F32
[in]  boxes_in         The boxes input tensor of size [num_classes * 4, count]. Data types supported: Same as scores_in
[in]  batch_splits_in  The batch splits input tensor of size [batch_size]. Data types supported: Same as scores_in
Note
Can be a nullptr. If not a nullptr, scores_in and boxes_in have items from multiple images.
Parameters
[out] scores_out       The scores output tensor of size [N]. Data types supported: Same as scores_in
[out] boxes_out        The boxes output tensor of size [4, N]. Data types supported: Same as scores_in
[out] classes          The classes output tensor of size [N]. Data types supported: Same as scores_in
[out] batch_splits_out (Optional) The batch splits output tensor of size [batch_size]. Data types supported: Same as scores_in
[out] keeps            (Optional) The keeps output tensor of size [N]. Data types supported: Same as scores_in
[out] keeps_size       (Optional) Number of filtered indices per class tensor of size [num_classes]. Data types supported: U32
[in]  info             (Optional) BoxNMSLimitInfo information.

Definition at line 346 of file CPPBoxWithNonMaximaSuppressionLimitKernel.cpp.

References ARM_COMPUTE_ERROR_ON, ARM_COMPUTE_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN, ARM_COMPUTE_ERROR_ON_MISMATCHING_DATA_TYPES, ARM_COMPUTE_ERROR_ON_MSG, ARM_COMPUTE_ERROR_ON_NULLPTR, ARM_COMPUTE_UNUSED, arm_compute::calculate_max_window(), ITensorInfo::dimension(), arm_compute::F16, arm_compute::F32, ITensor::info(), arm_compute::test::validation::info, and arm_compute::U32.

Referenced by CPPBoxWithNonMaximaSuppressionLimit::configure(), and CPPBoxWithNonMaximaSuppressionLimitKernel::name().

348 {
349  ARM_COMPUTE_ERROR_ON_NULLPTR(scores_in, boxes_in, scores_out, boxes_out, classes);
351  ARM_COMPUTE_ERROR_ON_MISMATCHING_DATA_TYPES(scores_in, boxes_in, scores_out);
352  const unsigned int num_classes = scores_in->info()->dimension(0);
353 
354  ARM_COMPUTE_UNUSED(num_classes);
355  ARM_COMPUTE_ERROR_ON_MSG((4 * num_classes) != boxes_in->info()->dimension(0), "First dimension of input boxes must be of size 4*num_classes");
356  ARM_COMPUTE_ERROR_ON_MSG(scores_in->info()->dimension(1) != boxes_in->info()->dimension(1), "Input scores and input boxes must have the same number of rows");
357 
358  ARM_COMPUTE_ERROR_ON(scores_out->info()->dimension(0) != boxes_out->info()->dimension(1));
359  ARM_COMPUTE_ERROR_ON(boxes_out->info()->dimension(0) != 4);
360  ARM_COMPUTE_ERROR_ON(scores_out->info()->dimension(0) != classes->info()->dimension(0));
361  if(keeps != nullptr)
362  {
363  ARM_COMPUTE_ERROR_ON_MSG(keeps_size == nullptr, "keeps_size cannot be nullptr if keeps has to be provided as output");
366  ARM_COMPUTE_ERROR_ON(scores_out->info()->dimension(0) != keeps->info()->dimension(0));
367  ARM_COMPUTE_ERROR_ON(num_classes != keeps_size->info()->dimension(0));
368  }
369  if(batch_splits_in != nullptr)
370  {
371  ARM_COMPUTE_ERROR_ON_MISMATCHING_DATA_TYPES(scores_in, batch_splits_in);
372  }
373  if(batch_splits_out != nullptr)
374  {
375  ARM_COMPUTE_ERROR_ON_MISMATCHING_DATA_TYPES(scores_in, batch_splits_out);
376  }
377 
378  _scores_in = scores_in;
379  _boxes_in = boxes_in;
380  _batch_splits_in = batch_splits_in;
381  _scores_out = scores_out;
382  _boxes_out = boxes_out;
383  _classes = classes;
384  _batch_splits_out = batch_splits_out;
385  _keeps = keeps;
386  _keeps_size = keeps_size;
387  _info = info;
388 
389  // Configure kernel window
390  Window win = calculate_max_window(*scores_in->info(), Steps(scores_in->info()->dimension(0)));
391 
392  IKernel::configure(win);
393 }
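
The tensor-size contract that configure() enforces with its ARM_COMPUTE_ERROR_ON* checks can be sketched in plain C++. Shape2D and shapes_are_valid below are illustrative helpers, not part of the Compute Library API:

```cpp
#include <cassert>
#include <cstddef>

// Illustrative sketch (not Compute Library API): the tensor-size contract
// that configure() enforces with its ARM_COMPUTE_ERROR_ON* checks.
struct Shape2D
{
    std::size_t dim0; // fastest-changing dimension
    std::size_t dim1;
};

// scores_in: [num_classes, count], boxes_in: [num_classes * 4, count],
// scores_out/classes: [N], boxes_out: [4, N]
bool shapes_are_valid(const Shape2D &scores_in, const Shape2D &boxes_in,
                      std::size_t scores_out_len, const Shape2D &boxes_out,
                      std::size_t classes_len)
{
    const std::size_t num_classes = scores_in.dim0;
    if(boxes_in.dim0 != 4 * num_classes) return false; // 4 box coordinates per class
    if(boxes_in.dim1 != scores_in.dim1) return false;  // same number of rows as scores
    if(boxes_out.dim0 != 4) return false;              // one 4-coordinate box per detection
    if(boxes_out.dim1 != scores_out_len) return false; // N detections out
    if(classes_len != scores_out_len) return false;    // one class label per detection
    return true;
}
```

For example, with 81 classes and 100 candidate boxes, a valid call pairs scores of shape [81, 100] with boxes of shape [324, 100].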

◆ is_parallelisable()

bool is_parallelisable ( ) const
overridevirtual

Indicates whether or not the kernel is parallelisable.

If the kernel is parallelisable then the window returned by window() can be split into sub-windows which can then be run in parallel.

If the kernel is not parallelisable then only the window returned by window() can be passed to run()

Returns
True if the kernel is parallelisable

Reimplemented from IKernel.

Definition at line 192 of file CPPBoxWithNonMaximaSuppressionLimitKernel.cpp.

Referenced by CPPBoxWithNonMaximaSuppressionLimitKernel::name().

193 {
194  return false;
195 }

◆ name()

◆ operator=() [1/2]

Prevent instances of this class from being copied (As this class contains pointers)

Referenced by CPPBoxWithNonMaximaSuppressionLimitKernel::name().

◆ operator=() [2/2]

Allow instances of this class to be moved.

◆ run()

void run ( const Window & window,
const ThreadInfo & info 
)
overridevirtual

Execute the kernel on the passed window.

Warning
If is_parallelisable() returns false then the passed window must be equal to window()
Note
The window has to be a region within the window returned by the window() method
The width of the window has to be a multiple of num_elems_processed_per_iteration().
Parameters
[in]windowRegion on which to execute the kernel. (Must be a region of the window returned by window())
[in]infoInfo about executing thread and CPU.

Reimplemented from ICPPKernel.

Definition at line 395 of file CPPBoxWithNonMaximaSuppressionLimitKernel.cpp.

References ARM_COMPUTE_ERROR, ARM_COMPUTE_ERROR_ON_MISMATCHING_WINDOWS, ARM_COMPUTE_ERROR_ON_UNCONFIGURED_KERNEL, ARM_COMPUTE_UNUSED, ITensorInfo::data_type(), arm_compute::F16, arm_compute::F32, ITensor::info(), and IKernel::window().

Referenced by CPPBoxWithNonMaximaSuppressionLimitKernel::name().

396 {
401 
402  switch(_scores_in->info()->data_type())
403  {
404  case DataType::F32:
405  run_nmslimit<float>();
406  break;
407  case DataType::F16:
408  run_nmslimit<half>();
409  break;
410  default:
411  ARM_COMPUTE_ERROR("Not supported");
412  }
413 }
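
The body of run() is a type dispatch: a switch on the tensor's runtime data type selects the templated worker run_nmslimit<T>(). A minimal self-contained sketch of the same pattern (DataType, half16 and run_worker here are stand-ins, not Compute Library types):

```cpp
#include <cstdint>
#include <string>

// Stand-in types (not Compute Library types) showing the dispatch pattern
// run() uses: switch on the runtime data type, call the templated worker.
enum class DataType { F32, F16 };

struct half16 { std::uint16_t bits; }; // placeholder for a 16-bit float type

template <typename T>
std::string run_worker()
{
    // The real kernel would process the tensors element-wise as T here.
    return sizeof(T) == 4 ? "ran as float" : "ran as half";
}

std::string dispatch(DataType dt)
{
    switch(dt)
    {
        case DataType::F32: return run_worker<float>();
        case DataType::F16: return run_worker<half16>();
        default:            return "Not supported";
    }
}
```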

◆ run_nmslimit()

void run_nmslimit ( )

Definition at line 198 of file CPPBoxWithNonMaximaSuppressionLimitKernel.cpp.

References arm_compute::test::validation::b, BoxNMSLimitInfo::detections_per_im(), ITensorInfo::dimension(), ITensor::info(), arm_compute::test::validation::k, ITensor::ptr_to_element(), BoxNMSLimitInfo::score_thresh(), and BoxNMSLimitInfo::soft_nms_enabled().

Referenced by CPPBoxWithNonMaximaSuppressionLimitKernel::name().

199 {
200  const int batch_size = _batch_splits_in == nullptr ? 1 : _batch_splits_in->info()->dimension(0);
201  const int num_classes = _scores_in->info()->dimension(0);
202  const int scores_count = _scores_in->info()->dimension(1);
203  std::vector<int> total_keep_per_batch(batch_size);
204  std::vector<std::vector<int>> keeps(num_classes);
205  int total_keep_count = 0;
206 
207  std::vector<std::vector<T>> in_scores(num_classes, std::vector<T>(scores_count));
208  for(int i = 0; i < scores_count; ++i)
209  {
210  for(int j = 0; j < num_classes; ++j)
211  {
212  in_scores[j][i] = *reinterpret_cast<const T *>(_scores_in->ptr_to_element(Coordinates(j, i)));
213  }
214  }
215 
216  int cur_start_idx = 0;
217  for(int b = 0; b < batch_size; ++b)
218  {
219  // Skip the first class (background) unless there is only one class.
220  const int j_start = (num_classes == 1 ? 0 : 1);
221  for(int j = j_start; j < num_classes; ++j)
222  {
223  std::vector<T> cur_scores(scores_count);
224  std::vector<int> inds;
225  for(int i = 0; i < scores_count; ++i)
226  {
227  const T score = in_scores[j][i];
228  cur_scores[i] = score;
229 
230  if(score > _info.score_thresh())
231  {
232  inds.push_back(i);
233  }
234  }
235  if(_info.soft_nms_enabled())
236  {
237  keeps[j] = SoftNMS(_boxes_in, in_scores, inds, _info, j);
238  }
239  else
240  {
241  std::sort(inds.data(), inds.data() + inds.size(),
242  [&cur_scores](int lhs, int rhs)
243  {
244  return cur_scores[lhs] > cur_scores[rhs];
245  });
246 
247  keeps[j] = NonMaximaSuppression<T>(_boxes_in, inds, _info, j);
248  }
249  total_keep_count += keeps[j].size();
250  }
251 
252  if(_info.detections_per_im() > 0 && total_keep_count > _info.detections_per_im())
253  {
254  // merge all scores (represented by indices) together and sort
255  auto get_all_scores_sorted = [&in_scores, &keeps, total_keep_count]()
256  {
257  std::vector<T> ret(total_keep_count);
258 
259  int ret_idx = 0;
260  for(unsigned int i = 1; i < keeps.size(); ++i)
261  {
262  auto &cur_keep = keeps[i];
263  for(auto &ckv : cur_keep)
264  {
265  ret[ret_idx++] = in_scores[i][ckv];
266  }
267  }
268 
269  std::sort(ret.data(), ret.data() + ret.size());
270 
271  return ret;
272  };
273 
274  auto all_scores_sorted = get_all_scores_sorted();
275  const T image_thresh = all_scores_sorted[all_scores_sorted.size() - _info.detections_per_im()];
276  for(int j = 1; j < num_classes; ++j)
277  {
278  auto &cur_keep = keeps[j];
279  std::vector<int> new_keeps_j;
280  for(auto &k : cur_keep)
281  {
282  if(in_scores[j][k] >= image_thresh)
283  {
284  new_keeps_j.push_back(k);
285  }
286  }
287  keeps[j] = new_keeps_j;
288  }
289  total_keep_count = _info.detections_per_im();
290  }
291 
292  total_keep_per_batch[b] = total_keep_count;
293 
294  // Write results
295  int cur_out_idx = 0;
296  for(int j = j_start; j < num_classes; ++j)
297  {
298  auto &cur_keep = keeps[j];
299  auto cur_out_scores = reinterpret_cast<T *>(_scores_out->ptr_to_element(Coordinates(cur_start_idx + cur_out_idx)));
300  auto cur_out_classes = reinterpret_cast<T *>(_classes->ptr_to_element(Coordinates(cur_start_idx + cur_out_idx)));
301  const int box_column = (cur_start_idx + cur_out_idx) * 4;
302 
303  for(unsigned int k = 0; k < cur_keep.size(); ++k)
304  {
305  cur_out_scores[k] = in_scores[j][cur_keep[k]];
306  cur_out_classes[k] = static_cast<T>(j);
307  auto cur_out_box_row0 = reinterpret_cast<T *>(_boxes_out->ptr_to_element(Coordinates(box_column + 0, k)));
308  auto cur_out_box_row1 = reinterpret_cast<T *>(_boxes_out->ptr_to_element(Coordinates(box_column + 1, k)));
309  auto cur_out_box_row2 = reinterpret_cast<T *>(_boxes_out->ptr_to_element(Coordinates(box_column + 2, k)));
310  auto cur_out_box_row3 = reinterpret_cast<T *>(_boxes_out->ptr_to_element(Coordinates(box_column + 3, k)));
311  *cur_out_box_row0 = *reinterpret_cast<const T *>(_boxes_in->ptr_to_element(Coordinates(j * 4 + 0, cur_keep[k])));
312  *cur_out_box_row1 = *reinterpret_cast<const T *>(_boxes_in->ptr_to_element(Coordinates(j * 4 + 1, cur_keep[k])));
313  *cur_out_box_row2 = *reinterpret_cast<const T *>(_boxes_in->ptr_to_element(Coordinates(j * 4 + 2, cur_keep[k])));
314  *cur_out_box_row3 = *reinterpret_cast<const T *>(_boxes_in->ptr_to_element(Coordinates(j * 4 + 3, cur_keep[k])));
315  }
316 
317  cur_out_idx += cur_keep.size();
318  }
319 
320  if(_keeps != nullptr)
321  {
322  cur_out_idx = 0;
323  for(int j = 0; j < num_classes; ++j)
324  {
325  for(unsigned int i = 0; i < keeps[j].size(); ++i)
326  {
327  *reinterpret_cast<T *>(_keeps->ptr_to_element(Coordinates(cur_start_idx + cur_out_idx + i))) = static_cast<T>(keeps[j].at(i));
328  }
329  *reinterpret_cast<uint32_t *>(_keeps_size->ptr_to_element(Coordinates(j + b * num_classes))) = keeps[j].size();
330  cur_out_idx += keeps[j].size();
331  }
332  }
333 
334  cur_start_idx += total_keep_count;
335  }
336 
337  if(_batch_splits_out != nullptr)
338  {
339  for(int b = 0; b < batch_size; ++b)
340  {
341  *reinterpret_cast<float *>(_batch_splits_out->ptr_to_element(Coordinates(b))) = total_keep_per_batch[b];
342  }
343  }
344 }
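
The scalar selection logic of run_nmslimit() — score thresholding, per-class sort by descending score, and the detections_per_im cap via a global k-th-highest-score cut-off — can be sketched in isolation. The IoU-based NonMaximaSuppression/SoftNMS step and the background-class skip are elided, and select_detections is an illustrative name:

```cpp
#include <algorithm>
#include <vector>

// Illustrative sketch of run_nmslimit()'s selection steps (NMS itself elided).
// Returns, per class, the kept candidate indices.
std::vector<std::vector<int>> select_detections(const std::vector<std::vector<float>> &in_scores,
                                                float score_thresh, int detections_per_im)
{
    const int num_classes = static_cast<int>(in_scores.size());
    std::vector<std::vector<int>> keeps(num_classes);
    int total_keep_count = 0;
    for(int j = 0; j < num_classes; ++j)
    {
        // Keep indices whose score exceeds the threshold...
        for(int i = 0; i < static_cast<int>(in_scores[j].size()); ++i)
        {
            if(in_scores[j][i] > score_thresh) keeps[j].push_back(i);
        }
        // ...sorted by descending score (the real kernel then runs NMS on these).
        std::sort(keeps[j].begin(), keeps[j].end(),
                  [&](int lhs, int rhs) { return in_scores[j][lhs] > in_scores[j][rhs]; });
        total_keep_count += static_cast<int>(keeps[j].size());
    }
    if(detections_per_im > 0 && total_keep_count > detections_per_im)
    {
        // Merge all kept scores, sort, and use the k-th highest as image threshold.
        std::vector<float> all_scores;
        for(int j = 0; j < num_classes; ++j)
            for(int k : keeps[j]) all_scores.push_back(in_scores[j][k]);
        std::sort(all_scores.begin(), all_scores.end());
        const float image_thresh = all_scores[all_scores.size() - detections_per_im];
        for(int j = 0; j < num_classes; ++j)
        {
            std::vector<int> new_keeps_j;
            for(int k : keeps[j])
                if(in_scores[j][k] >= image_thresh) new_keeps_j.push_back(k);
            keeps[j] = new_keeps_j;
        }
    }
    return keeps;
}
```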

The documentation for this class was generated from the following files:

CPPBoxWithNonMaximaSuppressionLimitKernel.h
CPPBoxWithNonMaximaSuppressionLimitKernel.cpp