Compute Library
 19.08
CLKernelLibrary Class Reference

CLKernelLibrary class. More...

#include <CLKernelLibrary.h>

Public Member Functions

 CLKernelLibrary (const CLKernelLibrary &)=delete
 Prevent instances of this class from being copied. More...
 
const CLKernelLibraryoperator= (const CLKernelLibrary &)=delete
 Prevent instances of this class from being copied. More...
 
void init (std::string kernel_path, cl::Context context, cl::Device device)
 Initialises the kernel library. More...
 
void set_kernel_path (const std::string &kernel_path)
 Sets the path that the kernels reside in. More...
 
std::string get_kernel_path ()
 Gets the path that the kernels reside in. More...
 
std::string get_program_source (const std::string &program_name)
 Gets the source of the selected program. More...
 
void set_context (cl::Context context)
 Sets the CL context used to create programs. More...
 
cl::Context & context ()
 Accessor for the associated CL context. More...
 
cl::Device & get_device ()
 Gets the CL device for which the programs are created. More...
 
void set_device (cl::Device device)
 Sets the CL device for which the programs are created. More...
 
std::string get_device_version ()
 Return the device version. More...
 
cl_uint get_num_compute_units ()
 Return the maximum number of compute units in the device. More...
 
Kernel create_kernel (const std::string &kernel_name, const StringSet &build_options_set={}) const
 Creates a kernel from the kernel library. More...
 
size_t max_local_workgroup_size (const cl::Kernel &kernel) const
 Find the maximum number of local work items in a workgroup can be supported for the kernel. More...
 
cl::NDRange default_ndrange () const
 Return the default NDRange for the device. More...
 
void clear_programs_cache ()
 Clear the library's cache of binary programs. More...
 
const std::map< std::string, cl::Program > & get_built_programs () const
 Access the cache of built OpenCL programs. More...
 
void add_built_program (const std::string &built_program_name, const cl::Program &program)
 Add a new built program to the cache. More...
 
bool fp16_supported () const
 Returns true if FP16 is supported by the CL device. More...
 
bool int64_base_atomics_supported () const
 Returns true if int64_base_atomics extension is supported by the CL device. More...
 

Static Public Member Functions

static CLKernelLibraryget ()
 Access the KernelLibrary singleton. More...
 

Detailed Description

CLKernelLibrary class.

Definition at line 192 of file CLKernelLibrary.h.

Constructor & Destructor Documentation

◆ CLKernelLibrary()

CLKernelLibrary ( const CLKernelLibrary )
delete

Prevent instances of this class from being copied.

Member Function Documentation

◆ add_built_program()

void add_built_program ( const std::string &  built_program_name,
const cl::Program &  program 
)

Add a new built program to the cache.

Parameters
[in]built_program_nameName of the program
[in]programBuilt program to add to the cache

Definition at line 1120 of file CLKernelLibrary.cpp.

1121 {
1122  _built_programs_map.emplace(built_program_name, program);
1123 }

Referenced by arm_compute::utils::restore_program_cache_from_file().

◆ clear_programs_cache()

void clear_programs_cache ( )
inline

Clear the library's cache of binary programs.

Definition at line 324 of file CLKernelLibrary.h.

325  {
326  _programs_map.clear();
327  _built_programs_map.clear();
328  }

Referenced by Framework::run().

◆ context()

cl::Context& context ( )
inline

Accessor for the associated CL context.

Returns
A CL context.

Definition at line 275 of file CLKernelLibrary.h.

276  {
277  return _context;
278  }

Referenced by CLScheduler::context(), CLKernelLibrary::init(), and CLKernelLibrary::set_context().

◆ create_kernel()

Kernel create_kernel ( const std::string &  kernel_name,
const StringSet &  build_options_set = {} 
) const

Creates a kernel from the kernel library.

Parameters
[in]kernel_nameKernel name.
[in]build_options_setKernel build options as a set.
Returns
The created kernel.

Definition at line 1043 of file CLKernelLibrary.cpp.

1044 {
1045  // Find which program contains the kernel
1046  auto kernel_program_it = _kernel_program_map.find(kernel_name);
1047 
1048  if(_kernel_program_map.end() == kernel_program_it)
1049  {
1050  ARM_COMPUTE_ERROR("Kernel %s not found in the CLKernelLibrary", kernel_name.c_str());
1051  }
1052  std::string concat_str;
1053 
1054 #if defined(ARM_COMPUTE_DEBUG_ENABLED)
1055  // Enable debug properties in CL kernels
1056  concat_str += " -DARM_COMPUTE_DEBUG_ENABLED";
1057 #endif // defined(ARM_COMPUTE_DEBUG_ENABLED)
1058 
1060  concat_str += " -DGPU_ARCH=" + support::cpp11::to_string(
1061  static_cast<std::underlying_type<GPUTarget>::type>(gpu_arch));
1062  if(fp16_supported())
1063  {
1064  concat_str += " -DARM_COMPUTE_OPENCL_FP16_ENABLED=1 ";
1065  }
1066 
1067  if(dot8_supported(_device))
1068  {
1069  concat_str += " -DARM_COMPUTE_OPENCL_DOT8_ENABLED=1 ";
1070  }
1071 
1072  if(dot8_acc_supported(_device))
1073  {
1074  concat_str += " -DARM_COMPUTE_OPENCL_DOT8_ACC_ENABLED=1 ";
1075  }
1076 
1077  if(get_cl_version(_device) == CLVersion::CL20)
1078  {
1079  concat_str += " -cl-std=CL2.0 ";
1080  }
1081  else if(arm_non_uniform_workgroup_supported(_device))
1082  {
1083  concat_str += " -cl-arm-non-uniform-work-group-size ";
1084  }
1085  else
1086  {
1087  ARM_COMPUTE_ERROR("Non uniform workgroup size is not supported!!");
1088  }
1089 
1090  // Check if the program has been built before with same build options.
1091  const std::string program_name = kernel_program_it->second;
1092  const std::string build_options = stringify_set(build_options_set) + concat_str;
1093 
1094  const std::string built_program_name = program_name + "_" + build_options;
1095  auto built_program_it = _built_programs_map.find(built_program_name);
1096 
1097  cl::Program cl_program;
1098 
1099  if(_built_programs_map.end() != built_program_it)
1100  {
1101  // If program has been built, retrieve to create kernel from it
1102  cl_program = built_program_it->second;
1103  }
1104  else
1105  {
1106  // Get program
1107  Program program = load_program(program_name);
1108 
1109  // Build program
1110  cl_program = program.build(build_options);
1111 
1112  // Add built program to internal map
1113  _built_programs_map.emplace(built_program_name, cl_program);
1114  }
1115 
1116  // Create and return kernel
1117  return Kernel(kernel_name, cl_program);
1118 }
#define ARM_COMPUTE_ERROR(...)
Print the given message then throw an std::runtime_error.
Definition: Error.h:261
bool dot8_acc_supported(const cl::Device &device)
Helper function to check whether the cl_arm_integer_dot_product_accumulate_int8 extension is supporte...
Definition: CLHelpers.cpp:159
bool dot8_supported(const cl::Device &device)
Helper function to check whether the cl_arm_integer_dot_product_int8 extension is supported.
Definition: CLHelpers.cpp:149
std::string to_string(T &&value)
Convert integer and float values to string.
GPUTarget get_arch_from_target(GPUTarget target)
Helper function to get the GPU arch.
Definition: GPUTarget.cpp:189
static bool build(const cl::Program &program, const std::string &build_options="")
Build the given CL program.
GPUTarget get_target_from_device(const cl::Device &device)
Helper function to get the GPU target from CL device.
Definition: CLHelpers.cpp:131
CLVersion get_cl_version(const cl::Device &device)
Helper function to get the highest OpenCL version supported.
Definition: CLHelpers.cpp:164
GPUTarget
Available GPU Targets.
Definition: GPUTarget.h:34
bool fp16_supported() const
Returns true if FP16 is supported by the CL device.
bool arm_non_uniform_workgroup_supported(const cl::Device &device)
Helper function to check whether the arm_non_uniform_work_group_size extension is supported.
Definition: CLHelpers.cpp:139

References ARM_COMPUTE_ERROR, arm_compute::arm_non_uniform_workgroup_supported(), Program::build(), arm_compute::CL20, arm_compute::dot8_acc_supported(), arm_compute::dot8_supported(), CLKernelLibrary::fp16_supported(), arm_compute::get_arch_from_target(), arm_compute::get_cl_version(), arm_compute::get_target_from_device(), and arm_compute::support::cpp11::to_string().

◆ default_ndrange()

cl::NDRange default_ndrange ( ) const

Return the default NDRange for the device.

Definition at line 1221 of file CLKernelLibrary.cpp.

1222 {
1223  GPUTarget _target = get_target_from_device(_device);
1224  cl::NDRange default_range;
1225 
1226  switch(_target)
1227  {
1228  case GPUTarget::MIDGARD:
1229  case GPUTarget::T600:
1230  case GPUTarget::T700:
1231  case GPUTarget::T800:
1232  default_range = cl::NDRange(128u, 1);
1233  break;
1234  default:
1235  default_range = cl::NullRange;
1236  }
1237 
1238  return default_range;
1239 }
GPUTarget get_target_from_device(const cl::Device &device)
Helper function to get the GPU target from CL device.
Definition: CLHelpers.cpp:131
GPUTarget
Available GPU Targets.
Definition: GPUTarget.h:34

References arm_compute::get_target_from_device(), arm_compute::MIDGARD, arm_compute::T600, arm_compute::T700, and arm_compute::T800.

Referenced by CLReductionOperationKernel::configure().

◆ fp16_supported()

bool fp16_supported ( ) const

Returns true if FP16 is supported by the CL device.

Returns
true if the CL device supports FP16

Definition at line 1125 of file CLKernelLibrary.cpp.

1126 {
1127  return ::fp16_supported(_device);
1128 }
bool fp16_supported(const cl::Device &device)
Helper function to check whether the cl_khr_fp16 extension is supported.
Definition: CLHelpers.cpp:144

References arm_compute::fp16_supported().

Referenced by CLKernelLibrary::create_kernel().

◆ get()

CLKernelLibrary & get ( )
static

Access the KernelLibrary singleton.

Returns
The KernelLibrary instance.

Definition at line 1037 of file CLKernelLibrary.cpp.

1038 {
1039  static CLKernelLibrary _kernel_library;
1040  return _kernel_library;
1041 }
CLKernelLibrary class.

Referenced by CLIntegralImageHorKernel::configure(), CLTableLookupKernel::configure(), CLWarpPerspectiveKernel::configure(), CLBox3x3Kernel::configure(), CLDilateKernel::configure(), CLElementWiseUnaryLayerKernel::configure(), CLErodeKernel::configure(), CLMedian3x3Kernel::configure(), CLGaussian3x3Kernel::configure(), CLTransposeKernel::configure(), CLWarpAffineKernel::configure(), CLBitwiseNotKernel::configure(), CLNonMaximaSuppression3x3Kernel::configure(), CLScaleKernel::configure(), CLAccumulateKernel::configure(), CLDepthwiseConvolutionLayer3x3NCHWKernel::configure(), CLThresholdKernel::configure(), CLGEMMMatrixAccumulateBiasesKernel::configure(), CLNonLinearFilterKernel::configure(), CLGEMMMatrixVectorMultiplyKernel::configure(), CLDepthwiseConvolutionLayer3x3NHWCKernel::configure(), CLDepthwiseConvolutionLayerReshapeWeightsKernel::configure(), CLDequantizationLayerKernel::configure(), CLMinMaxLayerKernel::configure(), CLFloorKernel::configure(), CLFlattenLayerKernel::configure(), CLCopyKernel::configure(), CLChannelShuffleLayerKernel::configure(), CLGaussianPyramidHorKernel::configure(), CLReshapeLayerKernel::configure(), CLReverseKernel::configure(), CLBatchToSpaceLayerKernel::configure(), CLSpaceToDepthLayerKernel::configure(), CLGradientKernel::configure(), CLDepthToSpaceLayerKernel::configure(), CLComputeAllAnchorsKernel::configure(), CLHistogramKernel::configure(), CLNormalizationLayerKernel::configure(), CLPermuteKernel::configure(), CLDeconvolutionLayerUpsampleKernel::configure(), CLSpaceToBatchLayerKernel::configure(), CLTileKernel::configure(), CLDepthConvertLayerKernel::configure(), CLUpsampleLayerKernel::configure(), CLBitwiseAndKernel::configure(), CLLocallyConnectedMatrixMultiplyKernel::configure(), CLMinMaxKernel::configure(), CLBitwiseOrKernel::configure(), CLGatherKernel::configure(), CLQuantizationLayerKernel::configure(), CLRemapKernel::configure(), CLReorgLayerKernel::configure(), CLBitwiseXorKernel::configure(), CLSobel3x3Kernel::configure(), CLDepthwiseConvolutionLayerReshapeWeightsGenericKernel::configure(), CLActivationLayerKernel::configure(), CLDerivativeKernel::configure(), CLComparisonKernel::configure(), CLMeanStdDevNormalizationKernel::configure(), CLMemsetKernel::configure(), CLFFTScaleKernel::configure(), CLSobel5x5HorKernel::configure(), CLSobel7x7HorKernel::configure(), CLWidthConcatenate2TensorsKernel::configure(), CLGEMMMatrixAdditionKernel::configure(), CLHOGOrientationBinningKernel::configure(), CLFFTDigitReverseKernel::configure(), CLFillBorderKernel::configure(), CLNormalizePlanarYUVLayerKernel::configure(), CLPixelWiseMultiplicationKernel::configure(), CLPoolingLayerKernel::configure(), CLAbsoluteDifferenceKernel::configure(), CLGEMMLowpMatrixMultiplyNativeKernel::configure(), CLCropKernel::configure(), CLBoundingBoxTransformKernel::configure(), CLWidthConcatenateLayerKernel::configure(), CLHeightConcatenateLayerKernel::configure(), CLDepthwiseVectorToTensorKernel::configure(), CLChannelExtractKernel::configure(), CLPriorBoxLayerKernel::configure(), CLReductionOperationKernel::configure(), CLWidthConcatenate4TensorsKernel::configure(), CLGEMMReshapeRHSMatrixKernel::configure(), CLL2NormalizeLayerKernel::configure(), CLColorConvertKernel::configure(), CLMagnitudePhaseKernel::configure(), CLMeanStdDevKernel::configure(), CLRangeKernel::configure(), CLSelectKernel::configure(), CLChannelCombineKernel::configure(), CLFFTRadixStageKernel::configure(), CLStackLayerKernel::configure(), CLDepthConcatenateLayerKernel::configure(), CLBatchConcatenateLayerKernel::configure(), CLGEMMReshapeLHSMatrixKernel::configure(), CLDepthwiseIm2ColKernel::configure(), CLConvertFullyConnectedWeightsKernel::configure(), CLConvolutionKernel< matrix_size >::configure(), CLGEMMLowpMatrixMultiplyReshapedOnlyRHSKernel::configure(), CLWinogradInputTransformKernel::configure(), CLIntegralImageVertKernel::configure(), CLYOLOLayerKernel::configure(), CLROIPoolingLayerKernel::configure(), CLROIAlignLayerKernel::configure(), CLGEMMMatrixMultiplyNativeKernel::configure(), CLFuseBatchNormalizationKernel::configure(), CLGEMMLowpMatrixMultiplyKernel::configure(), CLWinogradFilterTransformKernel::configure(), CLBatchNormalizationLayerKernel::configure(), CLDirectConvolutionLayerOutputStageKernel::configure(), CLGEMMLowpOffsetContributionOutputStageKernel::configure(), CLHarrisScoreKernel::configure(), CLGEMMLowpMatrixMultiplyReshapedKernel::configure(), CLStridedSliceKernel::configure(), CLWinogradOutputTransformKernel::configure(), CLGEMMMatrixMultiplyReshapedOnlyRHSKernel::configure(), CLFastCornersKernel::configure(), CLGEMMMatrixMultiplyKernel::configure(), CLHOGDetectorKernel::configure(), CLAccumulateWeightedKernel::configure(), CLGEMMLowpQuantizeDownInt32ToInt16ScaleByFixedPointKernel::configure(), CLDirectConvolutionLayerKernel::configure(), CLDeconvolutionReshapeOutputKernel::configure(), CLGEMMLowpQuantizeDownInt32ToUint8ScaleByFloatKernel::configure(), CLGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPointKernel::configure(), CLScharr3x3Kernel::configure(), CLGEMMLowpOffsetContributionKernel::configure(), CLGEMMMatrixMultiplyReshapedKernel::configure(), CLGEMMLowpQuantizeDownInt32ToUint8ScaleKernel::configure(), CLGEMMLowpMatrixAReductionKernel::configure(), CLCol2ImKernel::configure(), CLDepthwiseConvolutionLayer3x3::configure(), CLWeightsReshapeKernel::configure(), CLIm2ColKernel::configure(), CLLKTrackerInitKernel::configure(), CLGaussianPyramidVertKernel::configure(), CLHistogramBorderKernel::configure(), CLAccumulateSquaredKernel::configure(), CLEdgeNonMaxSuppressionKernel::configure(), CLMinMaxLocationKernel::configure(), CLHOGBlockNormalizationKernel::configure(), CLSeparableConvolutionHorKernel< matrix_size >::configure(), CLGEMMLowpMatrixBReductionKernel::configure(), CLSobel5x5VertKernel::configure(), CLSobel7x7VertKernel::configure(), CLCopyToArrayKernel::configure(), CLLKTrackerFinalizeKernel::configure(), CLComplexPixelWiseMultiplicationKernel::configure(), CLSeparableConvolutionVertKernel< matrix_size >::configure(), CLEdgeTraceKernel::configure(), CLLKTrackerStage0Kernel::configure(), CLConvolutionRectangleKernel::configure(), CLLKTrackerStage1Kernel::configure(), CLLogits1DNormKernel::configure(), CLScheduler::context(), CLScheduler::default_init(), CLScheduler::default_init_with_context(), arm_compute::error_on_unsupported_int64_base_atomics(), ICLKernel::get_max_workgroup_size(), main(), arm_compute::utils::restore_program_cache_from_file(), Framework::run(), arm_compute::utils::save_program_cache_to_file(), CLScheduler::set_context(), arm_compute::test::validation::TEST_CASE(), OpenCLMemoryUsage::test_measurements(), CLTuner::tune_kernel_dynamic(), and CLDepthwiseConvolutionLayer3x3::validate().

◆ get_built_programs()

const std::map<std::string, cl::Program>& get_built_programs ( ) const
inline

Access the cache of built OpenCL programs.

Definition at line 331 of file CLKernelLibrary.h.

332  {
333  return _built_programs_map;
334  }

Referenced by arm_compute::utils::save_program_cache_to_file(), and OpenCLMemoryUsage::test_measurements().

◆ get_device()

cl::Device& get_device ( )
inline

Gets the CL device for which the programs are created.

Definition at line 281 of file CLKernelLibrary.h.

282  {
283  return _device;
284  }

Referenced by arm_compute::test::validation::TEST_CASE().

◆ get_device_version()

std::string get_device_version ( )

Return the device version.

Returns
The content of CL_DEVICE_VERSION

Definition at line 1241 of file CLKernelLibrary.cpp.

1242 {
1243  return _device.getInfo<CL_DEVICE_VERSION>();
1244 }

◆ get_kernel_path()

std::string get_kernel_path ( )
inline

Gets the path that the kernels reside in.

Definition at line 231 of file CLKernelLibrary.h.

232  {
233  return _kernel_path;
234  };

◆ get_num_compute_units()

cl_uint get_num_compute_units ( )

Return the maximum number of compute units in the device.

Returns
The content of CL_DEVICE_MAX_COMPUTE_UNITS

Definition at line 1246 of file CLKernelLibrary.cpp.

1247 {
1248  return _device.getInfo<CL_DEVICE_MAX_COMPUTE_UNITS>();
1249 }

Referenced by CLTuner::tune_kernel_dynamic().

◆ get_program_source()

std::string get_program_source ( const std::string &  program_name)

Gets the source of the selected program.

Parameters
[in]program_nameProgram name.
Returns
Source of the selected program.

Definition at line 1198 of file CLKernelLibrary.cpp.

1199 {
1200  const auto program_source_it = _program_source_map.find(program_name);
1201 
1202  if(program_source_it == _program_source_map.end())
1203  {
1204  ARM_COMPUTE_ERROR("Embedded program for %s does not exist.", program_name.c_str());
1205  }
1206 
1207  return program_source_it->second;
1208 }
#define ARM_COMPUTE_ERROR(...)
Print the given message then throw an std::runtime_error.
Definition: Error.h:261

References ARM_COMPUTE_ERROR.

◆ init()

void init ( std::string  kernel_path,
cl::Context  context,
cl::Device  device 
)
inline

Initialises the kernel library.

Parameters
[in]kernel_pathPath of the directory from which kernel sources are loaded.
[in]contextCL context used to create programs.
[in]deviceCL device for which the programs are created.

Definition at line 215 of file CLKernelLibrary.h.

216  {
217  _kernel_path = std::move(kernel_path);
218  _context = std::move(context);
219  _device = std::move(device);
220  }
cl::Context & context()
Accessor for the associated CL context.

References CLKernelLibrary::context().

Referenced by CLScheduler::default_init(), and CLScheduler::default_init_with_context().

◆ int64_base_atomics_supported()

bool int64_base_atomics_supported ( ) const

Returns true if int64_base_atomics extension is supported by the CL device.

Returns
true if the CL device supports int64_base_atomics extension

Definition at line 1130 of file CLKernelLibrary.cpp.

1131 {
1132  return device_supports_extension(_device, "cl_khr_int64_base_atomics");
1133 }
bool device_supports_extension(const cl::Device &device, const char *extension_name)
Helper function to check whether a given extension is supported.
Definition: CLHelpers.cpp:187

References arm_compute::device_supports_extension().

◆ max_local_workgroup_size()

size_t max_local_workgroup_size ( const cl::Kernel &  kernel) const

Find the maximum number of local work items in a workgroup can be supported for the kernel.

Definition at line 1210 of file CLKernelLibrary.cpp.

1211 {
1212  size_t result;
1213 
1214  size_t err = kernel.getWorkGroupInfo(_device, CL_KERNEL_WORK_GROUP_SIZE, &result);
1215  ARM_COMPUTE_ERROR_ON_MSG(err != 0, "clGetKernelWorkGroupInfo failed to return the maximum workgroup size for the kernel");
1216  ARM_COMPUTE_UNUSED(err);
1217 
1218  return result;
1219 }
#define ARM_COMPUTE_UNUSED(...)
To avoid unused variables warnings.
Definition: Error.h:160
#define ARM_COMPUTE_ERROR_ON_MSG(cond,...)
Definition: Error.h:328

References ARM_COMPUTE_ERROR_ON_MSG, and ARM_COMPUTE_UNUSED.

Referenced by ICLKernel::get_max_workgroup_size().

◆ operator=()

const CLKernelLibrary& operator= ( const CLKernelLibrary )
delete

Prevent instances of this class from being copied.

◆ set_context()

void set_context ( cl::Context  context)
inline

Sets the CL context used to create programs.

Note
Setting the context also resets the device to the first one available in the new context.
Parameters
[in]contextA CL context.

Definition at line 249 of file CLKernelLibrary.h.

250  {
251  _context = std::move(context);
252  if(_context.get() == nullptr)
253  {
254  _device = cl::Device();
255  }
256  else
257  {
258  const auto cl_devices = _context.getInfo<CL_CONTEXT_DEVICES>();
259 
260  if(cl_devices.empty())
261  {
262  _device = cl::Device();
263  }
264  else
265  {
266  _device = cl_devices[0];
267  }
268  }
269  }
cl::Context & context()
Accessor for the associated CL context.

References CLKernelLibrary::context().

Referenced by CLScheduler::set_context().

◆ set_device()

void set_device ( cl::Device  device)
inline

Sets the CL device for which the programs are created.

Parameters
[in]deviceA CL device.

Definition at line 290 of file CLKernelLibrary.h.

291  {
292  _device = std::move(device);
293  }

◆ set_kernel_path()

void set_kernel_path ( const std::string &  kernel_path)
inline

Sets the path that the kernels reside in.

Parameters
[in]kernel_pathPath of the kernel.

Definition at line 225 of file CLKernelLibrary.h.

226  {
227  _kernel_path = kernel_path;
228  };

The documentation for this class was generated from the following files: