Compute Library
 19.08
CLTuner Class Reference

Basic implementation of the OpenCL tuner interface.

#include <CLTuner.h>


Public Member Functions

 CLTuner (bool tune_new_kernels=true)
 Constructor.

 ~CLTuner ()=default
 Destructor.

void set_tune_new_kernels (bool tune_new_kernels)
 Setter for the tune_new_kernels option.

bool tune_new_kernels () const
 Check whether kernels that are not in the LWS table are tuned.

void set_tuner_mode (CLTunerMode mode)
 Set the OpenCL tuner mode.

CLTunerMode get_tuner_mode () const
 Get the current OpenCL tuner mode.

void add_lws_to_table (const std::string &kernel_id, cl::NDRange optimal_lws)
 Manually add an LWS for a kernel.

void import_lws_table (const std::unordered_map< std::string, cl::NDRange > &lws_table)
 Import an LWS table.

const std::unordered_map< std::string, cl::NDRange > & lws_table () const
 Give read access to the LWS table.

void set_cl_kernel_event (cl_event kernel_event)
 Set the OpenCL kernel event.

void load_from_file (const std::string &filename)
 Load the LWS table from a file.

void save_to_file (const std::string &filename) const
 Save the content of the LWS table to a file.

void tune_kernel_static (ICLKernel &kernel) override
 Tune an OpenCL kernel statically.

void tune_kernel_dynamic (ICLKernel &kernel) override
 Tune an OpenCL kernel dynamically.

bool kernel_event_is_set () const
 Check whether the kernel event is set.

Public Member Functions inherited from ICLTuner
virtual ~ICLTuner ()=default
 Virtual destructor.


Data Fields

std::function< decltype(clEnqueueNDRangeKernel)> real_clEnqueueNDRangeKernel
 Handle to the clEnqueueNDRangeKernel symbol.
 

Detailed Description

Basic implementation of the OpenCL tuner interface.

Definition at line 38 of file CLTuner.h.

Constructor & Destructor Documentation

◆ CLTuner()

CLTuner ( bool  tune_new_kernels = true)

Constructor.

Parameters
[in]  tune_new_kernels  Whether to find the optimal local workgroup size for kernels that are not present in the table.

Definition at line 40 of file CLTuner.cpp.

    : real_clEnqueueNDRangeKernel(nullptr), _lws_table(), _kernel_event(), _tune_new_kernels(tune_new_kernels), _tuner_mode(CLTunerMode::NORMAL)
{
}

References arm_compute::NORMAL.

◆ ~CLTuner()

~CLTuner ( )
default

Destructor.

Member Function Documentation

◆ add_lws_to_table()

void add_lws_to_table ( const std::string &  kernel_id,
cl::NDRange  optimal_lws 
)

Manually add a LWS for a kernel.

Parameters
[in]  kernel_id    Unique identifier of the kernel
[in]  optimal_lws  Optimal local workgroup size to use for the given kernel

Definition at line 109 of file CLTuner.cpp.

{
    _lws_table.emplace(kernel_id, optimal_lws);
}

Referenced by CLTuner::tune_kernel_dynamic().
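
Because the listing fills the table with `emplace`, an entry that already exists for `kernel_id` is left unchanged: `std::unordered_map::emplace` only inserts when the key is absent. The sketch below illustrates that behaviour in isolation. It assumes a hypothetical `Lws` array as a stand-in for `cl::NDRange` so that it builds without OpenCL headers; `add_lws` and `add_lws_demo` are illustrative names, not part of the library.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for cl::NDRange: three local workgroup dimensions.
using Lws      = std::array<std::size_t, 3>;
using LwsTable = std::unordered_map<std::string, Lws>;

// Same emplace call as in the listing above. Returns true if a new entry was
// inserted and false if kernel_id was already present (the old value is kept).
bool add_lws(LwsTable &table, const std::string &kernel_id, Lws lws)
{
    return table.emplace(kernel_id, lws).second;
}

// Usage: the second insertion for the same key is ignored.
void add_lws_demo()
{
    LwsTable   table;
    const bool first  = add_lws(table, "kernel_a", Lws{ { 4, 4, 1 } });
    const bool second = add_lws(table, "kernel_a", Lws{ { 8, 8, 1 } });
    assert(first);
    assert(!second);
    assert(table.at("kernel_a")[0] == 4); // the first value is still stored
}
```

If an existing entry has to be replaced, one option under the same assumptions is to assign through `operator[]` instead of calling `emplace`.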

◆ get_tuner_mode()

CLTunerMode get_tuner_mode ( ) const

Get the current OpenCL tuner mode.

Returns
tuner_mode Indicates how exhaustive the search for the optimal LWS should be while tuning

Definition at line 67 of file CLTuner.cpp.

{
    return _tuner_mode;
}

◆ import_lws_table()

void import_lws_table ( const std::unordered_map< std::string, cl::NDRange > &  lws_table)

Import LWS table.

Parameters
[in]  lws_table  The unordered_map container to import

Definition at line 222 of file CLTuner.cpp.

{
    _lws_table.clear();
    _lws_table = lws_table;
}

References CLTuner::lws_table().

◆ kernel_event_is_set()

bool kernel_event_is_set ( ) const

Check whether the kernel event is set.

Returns
true if the kernel_event is set.

Definition at line 45 of file CLTuner.cpp.

{
    return _kernel_event() != nullptr;
}

◆ load_from_file()

void load_from_file ( const std::string &  filename)

Load the LWS table from file.

Parameters
[in]  filename  File to load the LWS table from. The file must exist.

Definition at line 233 of file CLTuner.cpp.

{
    std::ifstream fs;
    fs.exceptions(std::ifstream::badbit);
    fs.open(filename, std::ios::in);
    if(!fs.is_open())
    {
        ARM_COMPUTE_ERROR("Failed to open '%s' (%s [%d])", filename.c_str(), strerror(errno), errno);
    }
    std::string line;
    while(!std::getline(fs, line).fail())
    {
        std::istringstream ss(line);
        std::string token;
        if(std::getline(ss, token, ';').fail())
        {
            ARM_COMPUTE_ERROR("Malformed row '%s' in %s (Should be of the form 'kernel_id;lws[0];lws[1];lws[2]')", ss.str().c_str(), filename.c_str());
        }
        std::string kernel_id = token;
        cl::NDRange lws(1, 1, 1);
        for(int i = 0; i < 3; i++)
        {
            if(std::getline(ss, token, ';').fail())
            {
                ARM_COMPUTE_ERROR("Malformed row '%s' in %s (Should be of the form 'kernel_id;lws[0];lws[1];lws[2]')", ss.str().c_str(), filename.c_str());
            }
            lws.get()[i] = support::cpp11::stoi(token);
        }

        // If all dimensions are 0: reset to NullRange (i.e nullptr)
        if(lws[0] == 0 && lws[1] == 0 && lws[2] == 0)
        {
            lws = cl::NullRange;
        }
        add_lws_to_table(kernel_id, lws);
    }
    fs.close();
}

References ARM_COMPUTE_ERROR, and arm_compute::support::cpp11::stoi().

Referenced by main(), and CLDeviceBackend::setup_backend_context().
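
The row format accepted by the loader, 'kernel_id;lws[0];lws[1];lws[2]', can be exercised without OpenCL or the library. The following is a minimal sketch of the per-row parsing only, under stated assumptions: `Lws` is a hypothetical stand-in for `cl::NDRange`, `parse_lws_row` is an illustrative name, `std::stoi` replaces `support::cpp11::stoi`, and a `std::runtime_error` is thrown where the listing calls `ARM_COMPUTE_ERROR`. The reset of an all-zero row to `cl::NullRange` is not modelled.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for cl::NDRange: three local workgroup dimensions.
using Lws = std::array<std::size_t, 3>;

// Parses one row of the form 'kernel_id;lws[0];lws[1];lws[2]'.
// Throws std::runtime_error if the kernel id or any of the three sizes is missing.
std::pair<std::string, Lws> parse_lws_row(const std::string &line)
{
    std::istringstream ss(line);
    std::string        kernel_id;
    if(std::getline(ss, kernel_id, ';').fail())
    {
        throw std::runtime_error("Malformed row: " + line);
    }

    Lws lws{ { 1, 1, 1 } };
    for(std::size_t i = 0; i < lws.size(); ++i)
    {
        std::string token;
        if(std::getline(ss, token, ';').fail())
        {
            throw std::runtime_error("Malformed row: " + line);
        }
        lws[i] = static_cast<std::size_t>(std::stoi(token));
    }
    return { kernel_id, lws };
}

// Usage: a well-formed row yields the key and the three sizes.
void parse_lws_row_demo()
{
    const auto row = parse_lws_row("example_kernel;4;8;1");
    assert(row.first == "example_kernel");
    assert(row.second[0] == 4);
    assert(row.second[1] == 8);
    assert(row.second[2] == 1);
}
```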

◆ lws_table()

const std::unordered_map< std::string, cl::NDRange > & lws_table ( ) const

Give read access to the LWS table.

Returns
The lws table as unordered_map container

Definition at line 228 of file CLTuner.cpp.

{
    return _lws_table;
}

Referenced by CLTuner::import_lws_table(), and CLDeviceBackend::~CLDeviceBackend().

◆ save_to_file()

void save_to_file ( const std::string &  filename) const

Save the content of the LWS table to file.

Parameters
[in]  filename  File to save the LWS table to. Existing content is overwritten.

Definition at line 272 of file CLTuner.cpp.

{
    std::ofstream fs;
    fs.exceptions(std::ifstream::failbit | std::ifstream::badbit);
    fs.open(filename, std::ios::out);
    for(auto const &kernel_data : _lws_table)
    {
        fs << kernel_data.first << ";" << kernel_data.second[0] << ";" << kernel_data.second[1] << ";" << kernel_data.second[2] << std::endl;
    }
    fs.close();
}

Referenced by main(), and CLDeviceBackend::~CLDeviceBackend().
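
The output format can be sketched independently of the library: one 'kernel_id;lws[0];lws[1];lws[2]' row per table entry. The example below assumes a hypothetical `Lws` array in place of `cl::NDRange` and writes to any `std::ostream` rather than to a file; `write_lws_table` and `lws_table_to_string` are illustrative names. Since the table is an `std::unordered_map`, the order of the rows is unspecified and may differ between runs.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for cl::NDRange: three local workgroup dimensions.
using Lws      = std::array<std::size_t, 3>;
using LwsTable = std::unordered_map<std::string, Lws>;

// Writes one 'kernel_id;lws[0];lws[1];lws[2]' row per table entry.
void write_lws_table(std::ostream &os, const LwsTable &table)
{
    for(const auto &entry : table)
    {
        os << entry.first << ';' << entry.second[0] << ';' << entry.second[1] << ';' << entry.second[2] << '\n';
    }
}

// Convenience helper: serializes the table to a string.
std::string lws_table_to_string(const LwsTable &table)
{
    std::ostringstream os;
    write_lws_table(os, table);
    return os.str();
}

// Usage: a single entry produces a single row.
void write_lws_table_demo()
{
    LwsTable table;
    table.emplace("example_kernel", Lws{ { 2, 4, 1 } });
    const std::string text = lws_table_to_string(table);
    assert(text == "example_kernel;2;4;1\n");
}
```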

◆ set_cl_kernel_event()

void set_cl_kernel_event ( cl_event  kernel_event)

Set the OpenCL kernel event.

Note
The interceptor can use this function to store the event associated with the OpenCL kernel
Parameters
[in]  kernel_event  The OpenCL kernel event

Definition at line 49 of file CLTuner.cpp.

{
    _kernel_event = kernel_event;
}

◆ set_tune_new_kernels()

void set_tune_new_kernels ( bool  tune_new_kernels)

Setter for tune_new_kernels option.

Parameters
[in]  tune_new_kernels  Whether to find the optimal local workgroup size for kernels that are not present in the table.

Definition at line 54 of file CLTuner.cpp.

{
    _tune_new_kernels = tune_new_kernels;
}

References CLTuner::tune_new_kernels().

Referenced by main(), and CLDeviceBackend::set_kernel_tuning().

◆ set_tuner_mode()

void set_tuner_mode ( CLTunerMode  mode)

Set OpenCL tuner mode.

Parameters
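[in]  mode  Indicates how exhaustive the search for the optimal LWS should be while tuning. The description here names Exhaustive as the default, but the constructor listed in this reference initializes the mode to CLTunerMode::NORMAL.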

Definition at line 63 of file CLTuner.cpp.

{
    _tuner_mode = mode;
}

Referenced by main(), and CLDeviceBackend::set_kernel_tuning_mode().

◆ tune_kernel_dynamic()

void tune_kernel_dynamic ( ICLKernel &  kernel)
override virtual

Tune OpenCL kernel dynamically.

Note
Tuning requires memory to be available on all kernel tensors and objects in order to be performed
Parameters
[in]  kernel  Kernel to tune

Implements ICLTuner.

Definition at line 77 of file CLTuner.cpp.

{
    // Get the configuration ID from the kernel and append GPU target name and number of available compute units
    const std::string config_id = kernel.config_id() + "_" + string_from_target(kernel.get_target()) + "_MP" + support::cpp11::to_string(CLKernelLibrary::get().get_num_compute_units());

    // Check if we need to find the Optimal LWS. If the kernel's config_id is equal to default_config_id, the kernel does not require to be tuned
    if(kernel.config_id() != arm_compute::default_config_id)
    {
        auto p = _lws_table.find(config_id);

        if(p == _lws_table.end())
        {
            if(_tune_new_kernels)
            {
                // Find the optimal LWS for the kernel
                cl::NDRange opt_lws = find_optimal_lws(kernel);

                // Insert the optimal LWS in the table
                add_lws_to_table(config_id, opt_lws);

                // Set Local-Workgroup-Size
                kernel.set_lws_hint(opt_lws);
            }
        }
        else
        {
            // Set Local-Workgroup-Size
            kernel.set_lws_hint(p->second);
        }
    }
}

References CLTuner::add_lws_to_table(), ICLKernel::config_id(), CLKernelLibrary::get(), CLKernelLibrary::get_num_compute_units(), ICLKernel::get_target(), ICLKernel::set_lws_hint(), arm_compute::string_from_target(), and arm_compute::support::cpp11::to_string().
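
The control flow of the listing reduces to a cache lookup with an optional search on a miss. The sketch below models only that decision, as an illustration rather than the library's implementation. Assumptions: `Lws` is a hypothetical stand-in for `cl::NDRange`, the `search` callback stands in for `find_optimal_lws()`, `make_table_key` and `resolve_lws` are illustrative names, and the target string "G71" in the usage is a made-up example value. The early exit for kernels whose `config_id` equals `default_config_id` is omitted.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for cl::NDRange: three local workgroup dimensions.
using Lws      = std::array<std::size_t, 3>;
using LwsTable = std::unordered_map<std::string, Lws>;

// Builds the table key in the same shape as the listing:
// <config_id>_<target name>_MP<number of compute units>.
std::string make_table_key(const std::string &config_id, const std::string &target, unsigned int num_compute_units)
{
    return config_id + "_" + target + "_MP" + std::to_string(num_compute_units);
}

// Returns true and writes the LWS to 'hint' when one is available.
// Returns false when the key is unknown and tuning of new kernels is disabled.
bool resolve_lws(LwsTable &table, const std::string &key, bool tune_new_kernels,
                 const std::function<Lws()> &search, Lws &hint)
{
    const auto it = table.find(key);
    if(it != table.end())
    {
        hint = it->second; // cached entry: no search needed
        return true;
    }
    if(!tune_new_kernels)
    {
        return false; // unknown kernel and tuning disabled: no hint is set
    }
    hint = search();          // run the (potentially expensive) search once
    table.emplace(key, hint); // cache the result for later calls
    return true;
}

// Usage: the search callback runs only on the first miss with tuning enabled.
void resolve_lws_demo()
{
    LwsTable          table;
    Lws               hint{ { 0, 0, 0 } };
    int               calls  = 0;
    const auto        search = [&calls]() { ++calls; return Lws{ { 8, 4, 1 } }; };
    const std::string key    = make_table_key("example_kernel", "G71", 8);

    const bool disabled = resolve_lws(table, key, false, search, hint);
    const bool tuned    = resolve_lws(table, key, true, search, hint);
    const bool cached   = resolve_lws(table, key, true, search, hint);
    assert(!disabled);
    assert(tuned);
    assert(cached);
    assert(calls == 1);
    assert(hint[0] == 8);
}
```

Because the key includes the target name and the compute-unit count, a table tuned on one device configuration is not looked up under the same keys on another.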

◆ tune_kernel_static()

void tune_kernel_static ( ICLKernel &  kernel)
override virtual

Tune OpenCL kernel statically.

Note
Tuning is performed using only kernel and tensor metadata, thus can be performed when memory is not available
Parameters
[in]  kernel  Kernel to tune

Implements ICLTuner.

Definition at line 72 of file CLTuner.cpp.

{
    ARM_COMPUTE_UNUSED(kernel);
}

References ARM_COMPUTE_UNUSED.

◆ tune_new_kernels()

bool tune_new_kernels ( ) const

Check whether kernels that are not in the LWS table are tuned.

Returns
True if tuning of new kernels is enabled.

Definition at line 58 of file CLTuner.cpp.

{
    return _tune_new_kernels;
}

Referenced by CLTuner::set_tune_new_kernels(), and CLDeviceBackend::~CLDeviceBackend().

Field Documentation

◆ real_clEnqueueNDRangeKernel

std::function<decltype(clEnqueueNDRangeKernel)> real_clEnqueueNDRangeKernel

clEnqueueNDRangeKernel symbol

Definition at line 102 of file CLTuner.h.


The documentation for this class was generated from the following files: