OpenCL kernel to fuse the batch normalization node to a preceding convolution node.
|
| CLFuseBatchNormalizationKernel () |
| Default constructor.
|
|
| CLFuseBatchNormalizationKernel (const CLFuseBatchNormalizationKernel &)=delete |
| Prevent instances of this class from being copied (as this class contains pointers).
|
|
CLFuseBatchNormalizationKernel & | operator= (const CLFuseBatchNormalizationKernel &)=delete |
| Prevent instances of this class from being copied (as this class contains pointers).
|
|
| CLFuseBatchNormalizationKernel (CLFuseBatchNormalizationKernel &&)=default |
| Allow instances of this class to be moved.
|
|
CLFuseBatchNormalizationKernel & | operator= (CLFuseBatchNormalizationKernel &&)=default |
| Allow instances of this class to be moved.
|
|
| ~CLFuseBatchNormalizationKernel ()=default |
| Default destructor.
|
|
void | configure (const ICLTensor *input_weights, const ICLTensor *bn_mean, const ICLTensor *bn_var, ICLTensor *fused_weights, ICLTensor *fused_bias, const ICLTensor *input_bias=nullptr, const ICLTensor *bn_beta=nullptr, const ICLTensor *bn_gamma=nullptr, float epsilon=0.001f, FuseBatchNormalizationType fbn_type=FuseBatchNormalizationType::CONVOLUTION) |
| Set the source and destination tensors of the kernel.
|
|
void | configure (const CLCompileContext &compile_context, const ICLTensor *input_weights, const ICLTensor *bn_mean, const ICLTensor *bn_var, ICLTensor *fused_weights, ICLTensor *fused_bias, const ICLTensor *input_bias=nullptr, const ICLTensor *bn_beta=nullptr, const ICLTensor *bn_gamma=nullptr, float epsilon=0.001f, FuseBatchNormalizationType fbn_type=FuseBatchNormalizationType::CONVOLUTION) |
| Set the source and destination tensors of the kernel, using the given compile context.
|
|
void | run (const Window &window, cl::CommandQueue &queue) override |
| Enqueue the OpenCL kernel to process the given window on the passed OpenCL command queue.
|
|
| ICLKernel () |
| Constructor.
|
|
cl::Kernel & | kernel () |
| Returns a reference to the OpenCL kernel of this object.
|
|
CLKernelType | type () const |
| Returns the CL kernel type.
|
|
template<typename T > |
void | add_1D_array_argument (unsigned int &idx, const ICLArray< T > *array, const Strides &strides, unsigned int num_dimensions, const Window &window) |
| Add the passed 1D array's parameters to the object's kernel's arguments starting from the index idx.
|
|
void | add_1D_tensor_argument (unsigned int &idx, const ICLTensor *tensor, const Window &window) |
| Add the passed 1D tensor's parameters to the object's kernel's arguments starting from the index idx.
|
|
void | add_1D_tensor_argument_if (bool cond, unsigned int &idx, const ICLTensor *tensor, const Window &window) |
| Add the passed 1D tensor's parameters to the object's kernel's arguments starting from the index idx if the condition is true.
|
|
void | add_2D_tensor_argument (unsigned int &idx, const ICLTensor *tensor, const Window &window) |
| Add the passed 2D tensor's parameters to the object's kernel's arguments starting from the index idx.
|
|
void | add_2D_tensor_argument_if (bool cond, unsigned int &idx, const ICLTensor *tensor, const Window &window) |
| Add the passed 2D tensor's parameters to the object's kernel's arguments starting from the index idx if the condition is true.
|
|
void | add_3D_tensor_argument (unsigned int &idx, const ICLTensor *tensor, const Window &window) |
| Add the passed 3D tensor's parameters to the object's kernel's arguments starting from the index idx.
|
|
void | add_4D_tensor_argument (unsigned int &idx, const ICLTensor *tensor, const Window &window) |
| Add the passed 4D tensor's parameters to the object's kernel's arguments starting from the index idx.
|
|
void | add_5D_tensor_argument (unsigned int &idx, const ICLTensor *tensor, const Window &window) |
| Add the passed 5D tensor's parameters to the object's kernel's arguments starting from the index idx.
|
|
void | add_3d_tensor_nhw_argument (unsigned int &idx, const ICLTensor *tensor) |
| Add the passed NHW 3D tensor's parameters to the object's kernel's arguments by passing strides, dimensions and the offset to the first valid element in bytes.
|
|
void | add_4d_tensor_nhwc_argument (unsigned int &idx, const ICLTensor *tensor) |
| Add the passed NHWC 4D tensor's parameters to the object's kernel's arguments by passing strides, dimensions and the offset to the first valid element in bytes.
|
|
virtual void | run_op (ITensorPack &tensors, const Window &window, cl::CommandQueue &queue) |
| Enqueue the OpenCL kernel to process the given window on the passed OpenCL command queue.
|
|
template<typename T > |
void | add_argument (unsigned int &idx, T value) |
| Add the passed parameters to the object's kernel's arguments starting from the index idx.
|
|
void | set_lws_hint (const cl::NDRange &lws_hint) |
| Set the Local-Workgroup-Size hint.
|
|
cl::NDRange | lws_hint () const |
| Return the Local-Workgroup-Size hint.
|
|
void | set_wbsm_hint (const cl_int &wbsm_hint) |
| Set the workgroup batch size modifier hint.
|
|
cl_int | wbsm_hint () const |
| Return the workgroup batch size modifier hint.
|
|
const std::string & | config_id () const |
| Get the configuration ID.
|
|
void | set_target (GPUTarget target) |
| Set the targeted GPU architecture.
|
|
void | set_target (cl::Device &device) |
| Set the targeted GPU architecture according to the CL device.
|
|
GPUTarget | get_target () const |
| Get the targeted GPU architecture.
|
|
size_t | get_max_workgroup_size () |
| Get the maximum workgroup size for the device the CLKernelLibrary uses.
|
|
cl::NDRange | get_cached_gws () const |
| Get the cached gws used to enqueue this kernel.
|
|
void | cache_gws (const cl::NDRange &gws) |
| Cache the latest gws used to enqueue this kernel.
|
|
template<unsigned int dimension_size> |
void | add_tensor_argument (unsigned &idx, const ICLTensor *tensor, const Window &window) |
|
template<typename T , unsigned int dimension_size> |
void | add_array_argument (unsigned &idx, const ICLArray< T > *array, const Strides &strides, unsigned int num_dimensions, const Window &window) |
| Add the passed array's parameters to the object's kernel's arguments starting from the index idx.
|
|
| IKernel () |
| Constructor.
|
|
virtual | ~IKernel ()=default |
| Destructor.
|
|
virtual bool | is_parallelisable () const |
| Indicates whether or not the kernel is parallelisable.
|
|
virtual BorderSize | border_size () const |
| The size of the border for that kernel.
|
|
const Window & | window () const |
| The maximum window the kernel can be executed on.
|
|
bool | is_window_configured () const |
| Function to check if the embedded window of this kernel has been configured.
|
|
|
static Status | validate (const ITensorInfo *input_weights, const ITensorInfo *bn_mean, const ITensorInfo *bn_var, const ITensorInfo *fused_weights, const ITensorInfo *fused_bias, const ITensorInfo *input_bias=nullptr, const ITensorInfo *bn_beta=nullptr, const ITensorInfo *bn_gamma=nullptr, float epsilon=0.001f, FuseBatchNormalizationType fbn_type=FuseBatchNormalizationType::CONVOLUTION) |
| Static function to check if given info will lead to a valid configuration of CLFuseBatchNormalizationKernel.
|
|
constexpr static unsigned int | num_arguments_per_3d_tensor_nhw () |
| Returns the number of arguments enqueued per NHW 3D Tensor object.
|
|
constexpr static unsigned int | num_arguments_per_4d_tensor_nhwc () |
| Returns the number of arguments enqueued per NHWC 4D Tensor object.
|
|
constexpr static unsigned int | num_arguments_per_1D_array () |
| Returns the number of arguments enqueued per 1D array object.
|
|
constexpr static unsigned int | num_arguments_per_1D_tensor () |
| Returns the number of arguments enqueued per 1D tensor object.
|
|
constexpr static unsigned int | num_arguments_per_2D_tensor () |
| Returns the number of arguments enqueued per 2D tensor object.
|
|
constexpr static unsigned int | num_arguments_per_3D_tensor () |
| Returns the number of arguments enqueued per 3D tensor object.
|
|
constexpr static unsigned int | num_arguments_per_4D_tensor () |
| Returns the number of arguments enqueued per 4D tensor object.
|
|
static cl::NDRange | gws_from_window (const Window &window, bool use_dummy_work_items) |
| Get the global work size given an execution window.
|
|
Definition at line 35 of file CLFuseBatchNormalizationKernel.h.
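
The arithmetic behind fusing a batch normalization node into a preceding convolution can be illustrated with a plain CPU sketch. This is not the library's OpenCL kernel. The names fuse_batch_norm and FusedParams are hypothetical, and the sketch assumes the standard folding formula, weights stored contiguously per output channel, and that omitted optional inputs behave as gamma = 1, beta = 0 and bias = 0. It does not cover the depthwise convolution layout.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical container for the fused outputs (illustration only).
struct FusedParams
{
    std::vector<float> weights; // same layout as the input weights
    std::vector<float> bias;    // one value per output channel
};

// Hypothetical CPU reference of batch normalization folding.
// Assumes weights are laid out as [out_channels][elements_per_channel].
// Optional inputs mirror the kernel's nullable tensors; the defaults used
// here (bias = 0, beta = 0, gamma = 1) are assumptions of this sketch.
FusedParams fuse_batch_norm(const std::vector<float> &weights,
                            std::size_t               out_channels,
                            const std::vector<float> &mean,
                            const std::vector<float> &var,
                            const std::vector<float> *bias    = nullptr,
                            const std::vector<float> *beta    = nullptr,
                            const std::vector<float> *gamma   = nullptr,
                            float                     epsilon = 0.001f)
{
    assert(out_channels > 0 && weights.size() % out_channels == 0);
    assert(mean.size() == out_channels && var.size() == out_channels);

    const std::size_t per_channel = weights.size() / out_channels;
    FusedParams out{std::vector<float>(weights.size()), std::vector<float>(out_channels)};

    for (std::size_t c = 0; c < out_channels; ++c)
    {
        const float g     = gamma ? (*gamma)[c] : 1.0f;
        const float b     = beta ? (*beta)[c] : 0.0f;
        const float b_in  = bias ? (*bias)[c] : 0.0f;
        const float scale = g / std::sqrt(var[c] + epsilon);

        // w_fused = w * gamma / sqrt(var + epsilon)
        for (std::size_t i = 0; i < per_channel; ++i)
        {
            out.weights[c * per_channel + i] = weights[c * per_channel + i] * scale;
        }
        // b_fused = (b_in - mean) * gamma / sqrt(var + epsilon) + beta
        out.bias[c] = (b_in - mean[c]) * scale + b;
    }
    return out;
}
```

With the fused weights and bias, a single convolution produces the same result as the convolution followed by batch normalization, which is why the fusion can be applied once ahead of inference rather than on every run.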