Common interface for all kernels implemented in C++.

#include <ICPPKernel.h>

Public Member Functions
virtual	~ICPPKernel ()=default
	Default destructor.

virtual void	run (const Window &window, const ThreadInfo &info)
	Execute the kernel on the passed window.

virtual void	run_nd (const Window &window, const ThreadInfo &info, const Window &thread_locator)
	Legacy compatibility layer for implementations which do not support thread_locator. In these cases we simply narrow the interface down to the legacy version.

virtual void	run_op (ITensorPack &tensors, const Window &window, const ThreadInfo &info)
	Execute the kernel on the passed window.

virtual size_t	get_mws (const CPUInfo &platform, size_t thread_count) const
	Return minimum workload size of the relevant kernel.

virtual const char *	name () const =0
	Name of the kernel.

Public Member Functions inherited from IKernel
	IKernel ()
	Constructor.

virtual	~IKernel ()=default
	Destructor.

virtual bool	is_parallelisable () const
	Indicates whether or not the kernel is parallelisable.

virtual BorderSize	border_size () const
	The size of the border for that kernel.

const Window &	window () const
	The maximum window the kernel can be executed on.

bool	is_window_configured () const
	Function to check if the embedded window of this kernel has been configured.

Static Public Attributes
static constexpr size_t	default_mws = 1

Detailed Description

Common interface for all kernels implemented in C++.

Definition at line 38 of file ICPPKernel.h.

Constructor & Destructor Documentation

◆ ~ICPPKernel()

virtual ~ICPPKernel ( )

virtual default

Default destructor.

Member Function Documentation

◆ get_mws()

virtual size_t get_mws	(	const CPUInfo &	platform,
		size_t	thread_count
	)		const

inline virtual

Return minimum workload size of the relevant kernel.

Parameters

[in]	platform	The CPU platform used to create the context.
[in]	thread_count	Number of threads in the execution.

Returns: Minimum workload size for requested configuration.

Reimplemented in CpuDivisionKernel, CpuDepthwiseConv2dAssemblyWrapperKernel, CpuGemmAssemblyWrapperKernel< TypeInput, TypeOutput >, CpuIm2ColKernel, CpuArithmeticKernel, CpuMulKernel, NEPadLayerKernel, CpuAddKernel, CpuSubKernel, CpuReshapeKernel, and CpuActivationKernel.

Definition at line 100 of file ICPPKernel.h.

     {
         ARM_COMPUTE_UNUSED(platform, thread_count);
  
         return default_mws;
     }

References ARM_COMPUTE_UNUSED, and ICPPKernel::default_mws.
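The override pattern for get_mws() can be sketched with stand-in types. Everything below is hypothetical and only mirrors the shape of the interface: CpuInfoStub, KernelBaseStub, HypotheticalAddKernel, the value 1024, and the splits_for() helper are assumptions made for illustration, not library code or the library's scheduling policy.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Stand-in for arm_compute::CPUInfo (hypothetical, illustration only).
struct CpuInfoStub {};

// Stand-in that mirrors only the get_mws() part of ICPPKernel.
struct KernelBaseStub
{
    static constexpr std::size_t default_mws = 1;

    virtual ~KernelBaseStub() = default;

    // Same behaviour as the documented default: ignore the arguments
    // and return default_mws.
    virtual std::size_t get_mws(const CpuInfoStub &platform, std::size_t thread_count) const
    {
        (void)platform;
        (void)thread_count;
        return default_mws;
    }
};

// Hypothetical kernel that requests a larger minimum workload per thread.
struct HypotheticalAddKernel : KernelBaseStub
{
    std::size_t get_mws(const CpuInfoStub &platform, std::size_t thread_count) const override
    {
        (void)platform;
        if (thread_count > 1)
        {
            return 1024; // illustrative value, not a tuned constant
        }
        return default_mws;
    }
};

// One possible way for a caller to turn a minimum workload size into a
// number of splits: never more splits than threads, never fewer than one.
inline std::size_t splits_for(const KernelBaseStub &kernel, std::size_t total_work, std::size_t threads)
{
    assert(threads > 0);
    const std::size_t mws = std::max<std::size_t>(1, kernel.get_mws(CpuInfoStub{}, threads));
    return std::max<std::size_t>(1, std::min(threads, total_work / mws));
}
```

With these stand-ins, a kernel that keeps the default is split across all available threads, while the hypothetical kernel is split only when each thread would receive at least 1024 units of work.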

◆ name()

virtual const char* name ( ) const

pure virtual

Name of the kernel.

Returns: Kernel name

Implemented in CpuComplexMulKernel, CpuGemmLowpMatrixBReductionKernel, CpuGemmLowpOffsetContributionOutputStageKernel, CpuIm2ColKernel, CpuWinogradConv2dTransformOutputKernel, CpuAddMulAddKernel, CpuMulKernel, CpuGemmMatrixMultiplyKernel, CpuGemmTranspose1xWKernel, CpuScaleKernel, CpuGemmLowpQuantizeDownInt32ToInt8ScaleByFixedPointKernel, CpuGemmLowpQuantizeDownInt32ToUint8ScaleByFixedPointKernel, CpuDepthwiseConv2dAssemblyWrapperKernel, CpuDepthwiseConv2dNativeKernel, CpuGemmLowpQuantizeDownInt32ToInt16ScaleByFixedPointKernel, CpuMaxUnpoolingLayerKernel, CpuDirectConv3dKernel, CpuGemmLowpOffsetContributionKernel, CpuWeightsReshapeKernel, CpuAddKernel, CpuGemmLowpQuantizeDownInt32ScaleKernel, CpuCastKernel, CpuCol2ImKernel, CpuGemmMatrixAdditionKernel, CpuActivationKernel, CpuDirectConv2dOutputStageKernel, CpuSubKernel, CpuDirectConv2dKernel, CpuGemmInterleave4x4Kernel, CpuPool3dKernel, CpuConvertFullyConnectedWeightsKernel, CpuElementwiseUnaryKernel, CpuGemmLowpMatrixMultiplyKernel, CpuPool2dKernel, CpuGemmLowpMatrixAReductionKernel, CpuGemmAssemblyWrapperKernel< TypeInput, TypeOutput >, CpuConcatenateDepthKernel, CpuFloorKernel, CpuSoftmaxKernel, NELogicalKernel, CpuWinogradConv2dTransformInputKernel, CpuQuantizeKernel, CpuConcatenateHeightKernel, CpuConcatenateWidthKernel, CpuConcatenateBatchKernel, CpuPermuteKernel, CpuCopyKernel, CpuReshapeKernel, NEGatherKernel, CpuConvertQuantizedSignednessKernel, CpuDequantizeKernel, CpuTransposeKernel, CpuPool2dAssemblyWrapperKernel, CpuElementwiseKernel< Derived >, NECol2ImKernel, CpuFillKernel, NETileKernel, NESelectKernel, NEFFTRadixStageKernel, NERangeKernel, NEReductionOperationKernel, NEStackLayerKernel, NEStridedSliceKernel, NEBatchNormalizationLayerKernel, NEBitwiseAndKernel, NEBitwiseNotKernel, NEBitwiseOrKernel, NEBitwiseXorKernel, NEFillBorderKernel, NEMeanStdDevNormalizationKernel, CPPNonMaximumSuppressionKernel, CPPPermuteKernel, NECropKernel, NEFFTDigitReverseKernel, NEFFTScaleKernel, 
NESpaceToBatchLayerKernel, CPPUpsampleKernel, NEBatchToSpaceLayerKernel, NEPadLayerKernel, NEQLSTMLayerNormalizationKernel, NESpaceToDepthLayerKernel, CPPBoxWithNonMaximaSuppressionLimitKernel, NEChannelShuffleLayerKernel, NEDepthToSpaceLayerKernel, NEFuseBatchNormalizationKernel, NEInstanceNormalizationLayerKernel, NENormalizationLayerKernel, NEReorgLayerKernel, NEROIAlignLayerKernel, CPPTopKVKernel, NEBoundingBoxTransformKernel, NEL2NormalizeLayerKernel, NEPriorBoxLayerKernel, NEReverseKernel, NEComputeAllAnchorsKernel, and NEROIPoolingLayerKernel.

◆ run()

virtual void run	(	const Window &	window,
		const ThreadInfo &	info
	)

inline virtual

Execute the kernel on the passed window.

Warning: If is_parallelisable() returns false, the passed window must be equal to window().

Note: The window has to be a region within the window returned by the window() method. The width of the window has to be a multiple of num_elems_processed_per_iteration().

Parameters

[in]	window	Region on which to execute the kernel. (Must be a region of the window returned by window())
[in]	info	Info about executing thread and CPU.

Reimplemented in NESpaceToBatchLayerKernel, NEFuseBatchNormalizationKernel, NEBatchNormalizationLayerKernel, NEBatchToSpaceLayerKernel, NECropKernel, NECol2ImKernel, CPPNonMaximumSuppressionKernel, NEROIAlignLayerKernel, NEPadLayerKernel, NEFillBorderKernel, NEBoundingBoxTransformKernel, NEStackLayerKernel, NEFFTRadixStageKernel, NEReductionOperationKernel, NESelectKernel, NEGatherKernel, NENormalizationLayerKernel, NEL2NormalizeLayerKernel, NEFFTDigitReverseKernel, NEMeanStdDevNormalizationKernel, NERangeKernel, CPPBoxWithNonMaximaSuppressionLimitKernel, CPPTopKVKernel, NEPriorBoxLayerKernel, NEQLSTMLayerNormalizationKernel, CPPPermuteKernel, NEDepthToSpaceLayerKernel, NEInstanceNormalizationLayerKernel, NEReorgLayerKernel, NEReverseKernel, NEFFTScaleKernel, NEComputeAllAnchorsKernel, CpuGemmAssemblyWrapperKernel< TypeInput, TypeOutput >, NESpaceToDepthLayerKernel, NEChannelShuffleLayerKernel, NETileKernel, NEROIPoolingLayerKernel, NEBitwiseAndKernel, NEBitwiseOrKernel, NEBitwiseXorKernel, CPPUpsampleKernel, and NEBitwiseNotKernel.

Definition at line 57 of file ICPPKernel.h.

     {
         ARM_COMPUTE_UNUSED(window, info);
         ARM_COMPUTE_ERROR("default implementation of legacy run() virtual member function invoked");
     }

References ARM_COMPUTE_ERROR, ARM_COMPUTE_UNUSED, arm_compute::test::validation::info, and IKernel::window().

Referenced by ICPPKernel::run_nd(), and SingleThreadScheduler::schedule().
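The window constraints stated in the warning and note above can be written down as a small check. This is a one-dimensional sketch under stated assumptions: Range1D and is_valid_sub_window() are hypothetical names, ranges are half-open, and the real Window class is multi-dimensional.

```cpp
#include <cstddef>

// Hypothetical 1-D stand-in for a window: the half-open range [start, end).
struct Range1D
{
    std::size_t start;
    std::size_t end;
};

// Returns true when `sub` is an acceptable execution window:
//  - if the kernel is not parallelisable, `sub` must equal `full`;
//  - otherwise `sub` must lie inside `full` and its width must be a
//    multiple of the number of elements processed per iteration.
constexpr bool is_valid_sub_window(Range1D full, Range1D sub, std::size_t elems_per_iteration, bool parallelisable)
{
    return elems_per_iteration != 0 && sub.start <= sub.end &&
           (parallelisable
                ? (sub.start >= full.start && sub.end <= full.end &&
                   (sub.end - sub.start) % elems_per_iteration == 0)
                : (sub.start == full.start && sub.end == full.end));
}

// Compile-time examples of the rules above.
static_assert(is_valid_sub_window(Range1D{0, 64}, Range1D{16, 32}, 8, true), "aligned sub-region is accepted");
static_assert(!is_valid_sub_window(Range1D{0, 64}, Range1D{16, 30}, 8, true), "width 14 is not a multiple of 8");
static_assert(!is_valid_sub_window(Range1D{0, 64}, Range1D{48, 72}, 8, true), "sub-region leaves the full window");
static_assert(!is_valid_sub_window(Range1D{0, 64}, Range1D{0, 32}, 8, false), "non-parallelisable needs the full window");
static_assert(is_valid_sub_window(Range1D{0, 64}, Range1D{0, 64}, 8, false), "full window is always accepted");
```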

◆ run_nd()

virtual void run_nd	(	const Window &	window,
		const ThreadInfo &	info,
		const Window &	thread_locator
	)

inline virtual

Legacy compatibility layer for implementations which do not support thread_locator. In these cases we simply narrow the interface down to the legacy version.

Parameters

[in]	window	Region on which to execute the kernel. (Must be a region of the window returned by window())
[in]	info	Info about executing thread and CPU.
[in]	thread_locator	Specifies "where" the current thread is in the multi-dimensional space

Reimplemented in CpuGemmAssemblyWrapperKernel< TypeInput, TypeOutput >.

Definition at line 70 of file ICPPKernel.h.

     {
         ARM_COMPUTE_UNUSED(thread_locator);
         run(window, info);
     }

References ARM_COMPUTE_UNUSED, arm_compute::test::validation::info, ICPPKernel::run(), and IKernel::window().
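The forwarding behaviour shown in the definition above can be illustrated with stand-in types. WindowStub, ThreadInfoStub, KernelStub, the two derived kernels, and the last_call field are all hypothetical; the sketch only assumes what the default body shows, namely that run_nd() drops thread_locator and calls run().

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for Window and ThreadInfo.
struct WindowStub
{
    int start;
    int end;
};

struct ThreadInfoStub
{
    int thread_id;
};

// Stand-in mirroring the run()/run_nd() pair of ICPPKernel.
struct KernelStub
{
    std::string last_call; // records which entry point ran (demo aid only)

    virtual ~KernelStub() = default;

    virtual void run(const WindowStub &window, const ThreadInfoStub &info)
    {
        (void)window;
        (void)info;
        last_call = "base run";
    }

    // Default: ignore thread_locator and fall back to the legacy run().
    virtual void run_nd(const WindowStub &window, const ThreadInfoStub &info, const WindowStub &thread_locator)
    {
        (void)thread_locator;
        run(window, info);
    }
};

// A legacy kernel only knows about run().
struct LegacyKernelStub : KernelStub
{
    void run(const WindowStub &window, const ThreadInfoStub &info) override
    {
        (void)window;
        (void)info;
        last_call = "run";
    }
};

// A kernel that understands thread_locator overrides run_nd() directly.
struct NdAwareKernelStub : KernelStub
{
    void run_nd(const WindowStub &window, const ThreadInfoStub &info, const WindowStub &thread_locator) override
    {
        (void)window;
        (void)info;
        (void)thread_locator;
        last_call = "run_nd";
    }
};

// A caller can always use run_nd(); legacy kernels still work.
inline void demo_run_nd_forwarding()
{
    const WindowStub     window{0, 8};
    const WindowStub     locator{0, 1};
    const ThreadInfoStub info{0};

    LegacyKernelStub  legacy;
    NdAwareKernelStub nd_aware;
    KernelStub       *kernels[] = {&legacy, &nd_aware};
    for (KernelStub *kernel : kernels)
    {
        kernel->run_nd(window, info, locator);
    }

    assert(legacy.last_call == "run");
    assert(nd_aware.last_call == "run_nd");
}
```

The design choice this illustrates is that a scheduler needs only one call site: kernels written against the older two-argument interface are reached through the default run_nd().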

◆ run_op()

virtual void run_op	(	ITensorPack &	tensors,
		const Window &	window,
		const ThreadInfo &	info
	)

inline virtual

Execute the kernel on the passed window.

Warning: If is_parallelisable() returns false, the passed window must be equal to window().

Note: The window has to be a region within the window returned by the window() method. The width of the window has to be a multiple of num_elems_processed_per_iteration().

Parameters

[in]	tensors	A tensor pack containing the tensors to operate on.
[in]	window	Region on which to execute the kernel. (Must be a region of the window returned by window())
[in]	info	Info about executing thread and CPU.

Reimplemented in CpuComplexMulKernel, CpuGemmLowpMatrixBReductionKernel, CpuGemmLowpOffsetContributionOutputStageKernel, CpuIm2ColKernel, NEStridedSliceKernel, CpuWinogradConv2dTransformOutputKernel, CpuAddMulAddKernel, CpuMulKernel, CpuGemmMatrixMultiplyKernel, CpuGemmTranspose1xWKernel, CpuScaleKernel, CpuGemmLowpQuantizeDownInt32ToInt8ScaleByFixedPointKernel, CpuGemmLowpQuantizeDownInt32ToUint8ScaleByFixedPointKernel, CpuDepthwiseConv2dAssemblyWrapperKernel, NEFillBorderKernel, CpuDepthwiseConv2dNativeKernel, CpuGemmLowpQuantizeDownInt32ToInt16ScaleByFixedPointKernel, CpuDirectConv3dKernel, CpuGemmLowpOffsetContributionKernel, CpuWeightsReshapeKernel, CpuAddKernel, CpuGemmLowpQuantizeDownInt32ScaleKernel, CpuPool2dAssemblyWrapperKernel, CpuCastKernel, CpuCol2ImKernel, CpuGemmMatrixAdditionKernel, CpuActivationKernel, CpuDirectConv2dOutputStageKernel, CpuMaxUnpoolingLayerKernel, CpuSubKernel, CpuDirectConv2dKernel, CpuGemmInterleave4x4Kernel, CpuPool3dKernel, CpuConvertFullyConnectedWeightsKernel, CpuElementwiseUnaryKernel, CpuGemmLowpMatrixMultiplyKernel, CpuPool2dKernel, CpuGemmLowpMatrixAReductionKernel, CpuConcatenateDepthKernel, CpuFloorKernel, CpuSoftmaxKernel, NELogicalKernel, CpuQuantizeKernel, CpuWinogradConv2dTransformInputKernel, CpuConcatenateHeightKernel, CpuConcatenateWidthKernel, CpuConcatenateBatchKernel, CpuPermuteKernel, CpuCopyKernel, CpuReshapeKernel, CpuConvertQuantizedSignednessKernel, CpuDequantizeKernel, CpuTransposeKernel, CpuElementwiseKernel< Derived >, CpuElementwiseKernel< CpuComparisonKernel >, CpuElementwiseKernel< CpuArithmeticKernel >, and CpuFillKernel.

Definition at line 88 of file ICPPKernel.h.

     {
         ARM_COMPUTE_UNUSED(tensors, window, info);
     }

References ARM_COMPUTE_UNUSED, arm_compute::test::validation::info, and IKernel::window().

Referenced by SingleThreadScheduler::schedule_op(), and OMPScheduler::schedule_op().
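A kernel that overrides run_op() receives its tensors at execution time instead of holding them as members. The sketch below illustrates that idea with stand-ins only: SlotStub, TensorPackStub, WindowStub, ThreadInfoStub, OpKernelStub, and HypotheticalScaleKernel are assumptions for illustration, and the map-based pack is not how ITensorPack is implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical slot identifiers and tensor pack (illustration only).
enum class SlotStub
{
    Src,
    Dst
};

struct TensorPackStub
{
    std::map<SlotStub, std::vector<float> *> slots;
};

// Hypothetical 1-D window [start, end) and thread info.
struct WindowStub
{
    std::size_t start;
    std::size_t end;
};

struct ThreadInfoStub
{
    int thread_id;
};

// Stand-in mirroring the run_op() entry point; the default does nothing,
// like the documented default body.
struct OpKernelStub
{
    virtual ~OpKernelStub() = default;

    virtual void run_op(TensorPackStub &tensors, const WindowStub &window, const ThreadInfoStub &info)
    {
        (void)tensors;
        (void)window;
        (void)info;
    }
};

// Hypothetical stateless kernel: dst[i] = 2 * src[i] inside the window.
struct HypotheticalScaleKernel : OpKernelStub
{
    void run_op(TensorPackStub &tensors, const WindowStub &window, const ThreadInfoStub &info) override
    {
        (void)info;
        const std::vector<float> &src = *tensors.slots.at(SlotStub::Src);
        std::vector<float>       &dst = *tensors.slots.at(SlotStub::Dst);
        assert(window.start <= window.end);
        assert(window.end <= src.size() && window.end <= dst.size());

        // Only the elements inside the passed window are touched.
        for (std::size_t i = window.start; i < window.end; ++i)
        {
            dst[i] = 2.0f * src[i];
        }
    }
};
```

Because the kernel keeps no tensor pointers of its own, the same instance can be run on different packs, and different threads can be handed disjoint windows over the same pack.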

Field Documentation

◆ default_mws

constexpr size_t default_mws = 1

static constexpr

Definition at line 41 of file ICPPKernel.h.

Referenced by CpuActivationKernel::get_mws(), CpuReshapeKernel::get_mws(), CpuSubKernel::get_mws(), CpuAddKernel::get_mws(), NEPadLayerKernel::get_mws(), ICPPKernel::get_mws(), CpuMulKernel::get_mws(), CpuArithmeticKernel::get_mws(), CpuIm2ColKernel::get_mws(), CpuGemmAssemblyWrapperKernel< TypeInput, TypeOutput >::get_mws(), CpuDepthwiseConv2dAssemblyWrapperKernel::get_mws(), and CpuDivisionKernel::get_mws().

The documentation for this class was generated from the following file:

arm_compute/core/CPP/ICPPKernel.h
