Compute Library
 19.08
GCDirectConvolutionLayerKernel< kernel_size > Class Template Reference

Interface for the direct convolution kernel.

#include <GCDirectConvolutionLayerKernel.h>


Public Member Functions

 GCDirectConvolutionLayerKernel ()
 Default constructor.
 
 GCDirectConvolutionLayerKernel (const GCDirectConvolutionLayerKernel &)=delete
 Prevent instances of this class from being copied (As this class contains pointers)
 
GCDirectConvolutionLayerKernel & operator= (const GCDirectConvolutionLayerKernel &)=delete
 Prevent instances of this class from being copied (As this class contains pointers)
 
 GCDirectConvolutionLayerKernel (GCDirectConvolutionLayerKernel &&)=default
 Allow instances of this class to be moved.
 
GCDirectConvolutionLayerKernel & operator= (GCDirectConvolutionLayerKernel &&)=default
 Allow instances of this class to be moved.
 
 ~GCDirectConvolutionLayerKernel ()=default
 Default destructor.
 
void configure (const IGCTensor *input, const IGCTensor *weights, const IGCTensor *bias, IGCTensor *output, const PadStrideInfo &conv_info, const ActivationLayerInfo &act_info=ActivationLayerInfo())
 Set the input and output of the kernel.
 
BorderSize border_size () const override
 The size of the border for that kernel.
 
void run (const Window &window) override
 Enqueue the OpenGL ES shader to process the given window.
 
Public Member Functions inherited from IGCKernel
 IGCKernel ()
 Constructor.
 
GCKernel & kernel ()
 Returns a reference to the GLES kernel of this object.
 
void add_1D_tensor_argument (unsigned int &idx, const IGCTensor *tensor, const unsigned int binding_point, const Window &window)
 Add the passed 1D tensor's parameters to the object's kernel's arguments starting from the index idx.
 
void add_2D_tensor_argument (unsigned int &idx, const IGCTensor *tensor, const unsigned int binding_point, const Window &window)
 Add the passed 2D tensor's parameters to the object's kernel's arguments starting from the index idx.
 
void add_3D_tensor_argument (unsigned int &idx, const IGCTensor *tensor, const unsigned int binding_point, const Window &window)
 Add the passed 3D tensor's parameters to the object's kernel's arguments starting from the index idx.
 
unsigned int num_arguments_per_1D_tensor () const
 Returns the number of arguments enqueued per 1D tensor object.
 
unsigned int num_arguments_per_2D_tensor () const
 Returns the number of arguments enqueued per 2D tensor object.
 
unsigned int num_arguments_per_3D_tensor () const
 Returns the number of arguments enqueued per 3D tensor object.
 
void set_lws_hint (gles::NDRange &lws_hint)
 Set the Local-Workgroup-Size hint.
 
void set_target (GPUTarget target)
 Set the targeted GPU architecture.
 
GPUTarget get_target () const
 Get the targeted GPU architecture.
 
Public Member Functions inherited from IKernel
 IKernel ()
 Constructor.
 
virtual ~IKernel ()=default
 Destructor.
 
virtual bool is_parallelisable () const
 Indicates whether or not the kernel is parallelisable.
 
const Window & window () const
 The maximum window the kernel can be executed on.
 

Detailed Description

template<unsigned int kernel_size>
class arm_compute::GCDirectConvolutionLayerKernel< kernel_size >

Interface for the direct convolution kernel.

Definition at line 37 of file GCDirectConvolutionLayerKernel.h.

Constructor & Destructor Documentation

◆ GCDirectConvolutionLayerKernel() [1/3]

Default constructor.

Definition at line 41 of file GCDirectConvolutionLayerKernel.cpp.

42  : _input(nullptr), _bias(nullptr), _weights(nullptr), _output(nullptr), _border_size(0), _conv_stride_x(0), _conv_stride_y(0), _conv_pad_x(0), _conv_pad_y(0), _lws(gles::NDRange(1U, 1U, 1U))
43 {
44 }

◆ GCDirectConvolutionLayerKernel() [2/3]

Prevent instances of this class from being copied (As this class contains pointers)

◆ GCDirectConvolutionLayerKernel() [3/3]

Allow instances of this class to be moved.

◆ ~GCDirectConvolutionLayerKernel()

Default destructor.

Member Function Documentation

◆ border_size()

BorderSize border_size ( ) const
override virtual

The size of the border for that kernel.

Returns
The width in number of elements of the border.

Reimplemented from IKernel.

Definition at line 47 of file GCDirectConvolutionLayerKernel.cpp.

48 {
49  return _border_size;
50 }

◆ configure()

void configure ( const IGCTensor * input,
const IGCTensor * weights,
const IGCTensor * bias,
IGCTensor * output,
const PadStrideInfo & conv_info,
const ActivationLayerInfo & act_info = ActivationLayerInfo() 
)

Set the input and output of the kernel.

Parameters
[in]  input      The input tensor to convert. The 3 lower dimensions represent a single input [width, height, IFM], while every optional dimension from 4 and above represents a batch of inputs. Data types supported: F16/F32.
[in]  weights    Weights tensor. Weights are a 4D tensor with dimensions [kernel_x, kernel_y, IFM, OFM]. Data type supported: Same as input.
[in]  bias       Biases tensor. Shared bias supported. Biases are a 1D tensor with dimensions [OFM]. Data type supported: Same as input.
[out] output     The output tensor. The first 2 lower dimensions represent a transform of each 3D input, while every dimension above represents a batch. Data types supported: Same as input.
[in]  conv_info  Contains padding and stride information described in PadStrideInfo.
[in]  act_info   (Optional) Activation layer information in case of a fused activation.
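
As a rough illustration of how the output width and height follow from the input size, kernel size, stride and padding, here is a minimal sketch. It assumes floor rounding and symmetric padding; `ConvParams` and `conv_output_dims` are hypothetical names for this example only and are not part of the library, which obtains these values through `scaled_dimensions()` and `PadStrideInfo`.

```cpp
#include <cassert>
#include <utility>

// Hypothetical stand-in for the stride and padding fields of PadStrideInfo.
struct ConvParams
{
    unsigned int stride_x;
    unsigned int stride_y;
    unsigned int pad_x; // symmetric: applied on both the left and the right
    unsigned int pad_y; // symmetric: applied on both the top and the bottom
};

// Output width/height of a square-kernel convolution, assuming floor rounding.
std::pair<unsigned int, unsigned int> conv_output_dims(unsigned int width, unsigned int height,
                                                       unsigned int kernel_size, const ConvParams &p)
{
    assert(p.stride_x > 0 && p.stride_y > 0);
    assert(width + 2 * p.pad_x >= kernel_size);
    assert(height + 2 * p.pad_y >= kernel_size);

    // Integer division performs the floor rounding assumed by this sketch.
    const unsigned int out_w = (width + 2 * p.pad_x - kernel_size) / p.stride_x + 1;
    const unsigned int out_h = (height + 2 * p.pad_y - kernel_size) / p.stride_y + 1;
    return { out_w, out_h };
}
```

For instance, under these assumptions an 8x8 input with a 3x3 kernel, stride 1 and padding 1 keeps its 8x8 size, while a 7x7 input with a 3x3 kernel, stride 2 and no padding yields 3x3.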

Definition at line 53 of file GCDirectConvolutionLayerKernel.cpp.

55 {
57  ARM_COMPUTE_ERROR_ON(weights->info()->dimension(2) != input->info()->dimension(2));
60  ARM_COMPUTE_ERROR_ON_MSG((kernel_size == 3 && std::get<0>(conv_info.stride()) > 2), "Strides larger than 2 not supported in 3x3 direct convolution!");
61  ARM_COMPUTE_ERROR_ON(kernel_size != weights->info()->dimension(0));
63 
64  if(bias != nullptr)
65  {
67  // FIXME: Bug in framework, workaround it in tests currently.
68  //ARM_COMPUTE_ERROR_ON(bias->info()->dimension(0) != weights->info()->dimension(3));
70  }
71 
72  // Get convolved dimensions
73  unsigned int owidth = 0;
74  unsigned int oheight = 0;
75  std::tie(owidth, oheight) = scaled_dimensions(input->info()->dimension(0), input->info()->dimension(1), kernel_size, kernel_size, conv_info);
76 
78  output_shape.set(0, owidth);
79  output_shape.set(1, oheight);
80  output_shape.set(2, weights->info()->dimension(3));
81 
82  // Output auto initialization if not yet initialized
83  auto_init_if_empty(*output->info(), output_shape, 1, input->info()->data_type());
84 
88  ARM_COMPUTE_ERROR_ON(!conv_info.padding_is_symmetric());
89 
90  _conv_stride_x = std::get<0>(conv_info.stride());
91  _conv_stride_y = std::get<1>(conv_info.stride());
92  _conv_pad_x = std::get<0>(conv_info.pad());
93  _conv_pad_y = std::get<1>(conv_info.pad());
94 
95  _input = input;
96  _weights = weights;
97  _output = output;
98  _bias = bias;
99  _border_size = BorderSize(_conv_pad_y, _conv_pad_x);
100 
101  std::set<std::string> options;
102 
103  options.emplace("#define LOCAL_SIZE_X " + support::cpp11::to_string(_lws[0]));
104  options.emplace("#define LOCAL_SIZE_Y " + support::cpp11::to_string(_lws[1]));
105  options.emplace("#define LOCAL_SIZE_Z " + support::cpp11::to_string(_lws[2]));
106  options.emplace("#define STRIDE_X " + support::cpp11::to_string(_conv_stride_x));
107  options.emplace("#define STRIDE_Y " + support::cpp11::to_string(_conv_stride_y));
108 
109  std::string dt_name = (input->info()->data_type() == DataType::F32) ? "DATA_TYPE_FP32" : "DATA_TYPE_FP16";
110  options.emplace(("#define " + dt_name));
111 
112  // Activation information in case of a fused activation
113  if(act_info.enabled())
114  {
115  options.emplace("#define FUSED_ACTIVATION");
116  options.emplace(("#define " + string_from_activation_func(act_info.activation())));
117  options.emplace(("#define ACT_OP " + lower_string(string_from_activation_func(act_info.activation())) + "_op"));
118  options.emplace(("#define A_VAL " + float_to_string_with_full_precision(act_info.a())));
119  options.emplace(("#define B_VAL " + float_to_string_with_full_precision(act_info.b())));
120  }
121 
122  unsigned int num_elems_read_per_iteration_x = kernel_size * _conv_stride_x;
123  unsigned int num_elems_read_per_iteration_y = 1;
124  unsigned int num_elems_written_per_iteration_x = 1;
125  unsigned int num_elems_written_per_iteration_y = 1;
126  unsigned int num_elems_written_per_iteration_z = 1;
127 
128  if(kernel_size == 3)
129  {
130  if((_conv_stride_x == 1) && (_conv_stride_y == 1))
131  {
132  switch(input->info()->data_type())
133  {
134  case DataType::F16:
135  // TODO(APPBROWSER-299): Choose the most optimal path and remove others.
136 #define PROCESS_4X_3Y_1Z
137 
138 #if defined(PROCESS_8X_3Y_1Z)
139  options.emplace("#define PROCESS_8X_3Y_1Z");
140  num_elems_read_per_iteration_x = 16;
141  num_elems_read_per_iteration_y = 5;
142  num_elems_written_per_iteration_x = 8;
143  num_elems_written_per_iteration_y = 3;
144 #elif defined(PROCESS_4X_3Y_1Z)
145  options.emplace("#define PROCESS_4X_3Y_1Z");
146  num_elems_read_per_iteration_x = 8;
147  num_elems_read_per_iteration_y = 5;
148  num_elems_written_per_iteration_x = 4;
149  num_elems_written_per_iteration_y = 3;
150 #elif defined(PROCESS_4X_4Y_1Z)
151  options.emplace("#define PROCESS_4X_4Y_1Z");
152  num_elems_read_per_iteration_x = 8;
153  num_elems_read_per_iteration_y = 6;
154  num_elems_written_per_iteration_x = 4;
155  num_elems_written_per_iteration_y = 4;
156 #elif defined(PROCESS_4X_3Y_2Z)
157  options.emplace("#define PROCESS_4X_3Y_2Z");
158  num_elems_read_per_iteration_x = 8;
159  num_elems_read_per_iteration_y = 5;
160  num_elems_written_per_iteration_x = 4;
161  num_elems_written_per_iteration_y = 3;
162  num_elems_written_per_iteration_z = 2;
163 #endif /* PROCESS_nX_nY_nZ */
164 #undef PROCESS_8X_3Y_1Z
165 #undef PROCESS_4X_3Y_1Z
166 #undef PROCESS_4X_4Y_1Z
167 #undef PROCESS_4X_3Y_2Z
168  break;
169 
170  case DataType::F32:
171  options.emplace("#define PROCESS_4X_3Y_1Z");
172  num_elems_read_per_iteration_x = 8;
173  num_elems_read_per_iteration_y = 5;
174  num_elems_written_per_iteration_x = 4;
175  num_elems_written_per_iteration_y = 3;
176  break;
177 
178  default:
179  ARM_COMPUTE_ERROR("Current data type is not supported");
180  break;
181  }
182  }
183  // FIXME: Just keep one in release
184  else
185  {
186  switch(input->info()->data_type())
187  {
188  case DataType::F16:
189  options.emplace("#define PROCESS_4X_1Y_1Z");
190  num_elems_read_per_iteration_x = 8;
191  num_elems_written_per_iteration_x = 4;
192  break;
193 
194  case DataType::F32:
195  // TODO(APPBROWSER-299): Choose the most optimal path and remove others.
196 #define PROCESS_4X_1Y_1Z
197 
198 #if defined(PROCESS_1X_1Y_1Z)
199  options.emplace("#define PROCESS_1X_1Y_1Z");
200  num_elems_read_per_iteration_x = 3;
201  num_elems_written_per_iteration_x = 1;
202 #elif defined(PROCESS_4X_1Y_1Z)
203  options.emplace("#define PROCESS_4X_1Y_1Z");
204  num_elems_read_per_iteration_x = 8;
205  num_elems_written_per_iteration_x = 4;
206 #elif defined(PROCESS_8X_1Y_1Z)
207  options.emplace("#define PROCESS_8X_1Y_1Z");
208  num_elems_read_per_iteration_x = 12;
209  num_elems_written_per_iteration_x = 8;
210 #else /* PROCESS_nX_nY_nZ */
211 #error Have to declare how many elements to process in one thread.
212 #endif /* PROCESS_nX_nY_nZ */
213 #undef PROCESS_1X_1Y_1Z
214 #undef PROCESS_4X_1Y_1Z
215 #undef PROCESS_8X_1Y_1Z
216  break;
217 
218  default:
219  ARM_COMPUTE_ERROR("Current data type is not supported");
220  break;
221  }
222  }
223  }
224  else if(kernel_size == 1)
225  {
226  if(weights->info()->dimension(2) % 2 == 0)
227  {
228  options.emplace("#define WEIGHTS_OPTIMIZATION");
229  }
230  switch(input->info()->data_type())
231  {
232  case DataType::F16:
233 #define PROCESS_8X_2Y_1Z
234 
235 #if defined(PROCESS_4X_1Y_1Z)
236  options.emplace("#define PROCESS_4X_1Y_1Z");
237  num_elems_read_per_iteration_x = 4;
238  num_elems_written_per_iteration_x = 4;
239 #elif defined(PROCESS_4X_2Y_1Z)
240  options.emplace("#define PROCESS_4X_2Y_1Z");
241  num_elems_read_per_iteration_x = 4;
242  num_elems_read_per_iteration_y = 2;
243  num_elems_written_per_iteration_x = 4;
244  num_elems_written_per_iteration_y = 2;
245 #elif defined(PROCESS_4X_3Y_1Z)
246  options.emplace("#define PROCESS_4X_3Y_1Z");
247  num_elems_read_per_iteration_x = 4;
248  num_elems_read_per_iteration_y = 3;
249  num_elems_written_per_iteration_x = 4;
250  num_elems_written_per_iteration_y = 3;
251 #elif defined(PROCESS_4X_4Y_1Z)
252  options.emplace("#define PROCESS_4X_4Y_1Z");
253  num_elems_read_per_iteration_x = 4;
254  num_elems_read_per_iteration_y = 4;
255  num_elems_written_per_iteration_x = 4;
256  num_elems_written_per_iteration_y = 4;
257 #elif defined(PROCESS_4X_2Y_2Z)
258  ARM_COMPUTE_ERROR_ON_MSG((weights->info()->dimension(4) % 2) == 1, "Current 'weights->info()->dimension(4) % 2) == 1' is not supported");
259  options.emplace("#define PROCESS_4X_2Y_2Z");
260  num_elems_read_per_iteration_x = 4;
261  num_elems_read_per_iteration_y = 2;
262  num_elems_written_per_iteration_x = 4;
263  num_elems_written_per_iteration_y = 2;
264  num_elems_written_per_iteration_z = 2;
265 #elif defined(PROCESS_8X_1Y_1Z)
266  options.emplace("#define PROCESS_8X_1Y_1Z");
267  num_elems_read_per_iteration_x = 8;
268  num_elems_written_per_iteration_x = 8;
269 #elif defined(PROCESS_8X_2Y_1Z)
270  options.emplace("#define PROCESS_8X_2Y_1Z");
271  num_elems_read_per_iteration_x = 8;
272  num_elems_read_per_iteration_y = 2;
273  num_elems_written_per_iteration_x = 8;
274  num_elems_written_per_iteration_y = 2;
275 #else /* PROCESS_4X_1Y_1Z */
276 #error Have to declare how many elements to process in one thread.
277 #endif /* PROCESS_4X_1Y_1Z */
278 #undef PROCESS_4X_1Y_1Z
279 #undef PROCESS_4X_2Y_1Z
280 #undef PROCESS_4X_3Y_1Z
281 #undef PROCESS_4X_4Y_1Z
282 #undef PROCESS_4X_2Y_2Z
283 #undef PROCESS_8X_1Y_1Z
284 #undef PROCESS_8X_2Y_1Z
285  break;
286 
287  case DataType::F32:
288  num_elems_read_per_iteration_x = 1;
289  num_elems_written_per_iteration_x = 1;
290  break;
291 
292  default:
293  break;
294  }
295  }
296  else if(kernel_size == 5)
297  {
298  switch(input->info()->data_type())
299  {
300  case DataType::F16:
301  options.emplace("#define PROCESS_4X_1Y_1Z");
302  num_elems_read_per_iteration_x = 8;
303  num_elems_written_per_iteration_x = 4;
304 
305  default:
306  break;
307  }
308  }
309  else
310  {
311  }
312 
313  if(_bias != nullptr)
314  {
315  options.emplace("#define BIAS");
316  }
317 
318  std::stringstream kernel_name;
319  kernel_name << "direct_convolution" << kernel_size << "x" << kernel_size;
320 
321  _kernel = static_cast<GCKernel>(GCKernelLibrary::get().create_kernel(kernel_name.str(), options));
322 
323  unsigned int idx = (_bias == nullptr) ? 3 * num_arguments_per_3D_tensor() : (num_arguments_per_1D_tensor() + 3 * num_arguments_per_3D_tensor());
324 
325  // Calculate output right and bottom border
326  const int output_width = output->info()->dimension(0);
327  const int output_height = output->info()->dimension(1);
328  const int output_padding_right = ceil_to_multiple(output_width, num_elems_written_per_iteration_x * _lws[0]) - output_width;
329  const int output_padding_bottom = ceil_to_multiple(output_height, num_elems_written_per_iteration_y * _lws[1]) - output_height;
330 
331  // Calculate input right and bottom border
332  const int input_width = input->info()->dimension(0);
333  const int input_height = input->info()->dimension(1);
334  const int input_total_width = std::max(int(input->info()->padding().left), int(_conv_pad_x)) + input_width + std::max(int(input->info()->padding().right), int(_conv_pad_x));
335  const int input_total_height = std::max(int(input->info()->padding().top), int(_conv_pad_y)) + input_height + std::max(int(input->info()->padding().bottom), int(_conv_pad_y));
336  const int padding_right1 = ceil_to_multiple(input_total_width, num_elems_read_per_iteration_x * _lws[0]) - input_width - _conv_pad_x;
337  const int padding_bottom1 = ceil_to_multiple(input_total_height, num_elems_read_per_iteration_y * _lws[1]) - input_height - _conv_pad_y;
338 
339  const int upper_bound_w = ceil_to_multiple(((output_width + output_padding_right) * _conv_stride_x + (kernel_size - 1)), num_elems_read_per_iteration_x * _lws[0]) - _conv_pad_x - input_width;
340  const int upper_bound_h = ceil_to_multiple(((output_height + output_padding_bottom) * _conv_stride_y + (kernel_size - 1)), num_elems_read_per_iteration_y * _lws[1]) - _conv_pad_y - input_height;
341  const int padding_right2 = std::max(upper_bound_w, _conv_pad_x);
342  const int padding_bottom2 = std::max(upper_bound_h, _conv_pad_y);
343 
344  const int padding_right = std::max(padding_right1, padding_right2);
345  const int padding_bottom = std::max(padding_bottom1, padding_bottom2);
346 
347  BorderSize border = BorderSize(0, output_padding_right, output_padding_bottom, 0);
348 
349  Window win = calculate_max_enlarged_window(*output->info(), Steps(num_elems_written_per_iteration_x, num_elems_written_per_iteration_y, num_elems_written_per_iteration_z), border);
350 
351  AccessWindowStatic input_access(input->info(), -_conv_pad_x, -_conv_pad_y, input_width + padding_right, input_height + padding_bottom);
352  AccessWindowStatic weights_access = AccessWindowStatic(nullptr, 0, 0, 0, 0);
353  AccessWindowStatic bias_access = AccessWindowStatic(nullptr, 0, 0, 0, 1);
354 
355  switch(weights->info()->data_type())
356  {
357  case DataType::F16:
358  if((weights->info()->dimension(2) % 2 != 0) || (kernel_size != 1))
359  {
360  weights_access = AccessWindowStatic(weights->info(), 0, 0, kernel_size + 1, kernel_size);
361  }
362  if(_bias != nullptr)
363  {
364  bias_access = AccessWindowStatic(_bias->info(), 0, 0, _bias->info()->dimension(0) + 1, 1);
365  }
366  break;
367 
368  case DataType::F32:
369  weights_access = AccessWindowStatic(weights->info(), 0, 0, kernel_size, kernel_size);
370  if(_bias != nullptr)
371  {
372  bias_access = AccessWindowStatic(_bias->info(), 0, 0, _bias->info()->dimension(0), 1);
373  }
374  break;
375 
376  default:
377  ARM_COMPUTE_ERROR("Current data type is not supported");
378  break;
379  }
380 
381  AccessWindowStatic output_access(output->info(), 0, 0, output_width + output_padding_right, output_height + output_padding_bottom);
382 
383  if(_bias != nullptr)
384  {
385  update_window_and_padding(win, input_access, weights_access, bias_access, output_access);
386  }
387  else
388  {
389  update_window_and_padding(win, input_access, weights_access, output_access);
390  }
391 
392  output_access.set_valid_region(win, ValidRegion(Coordinates(), output->info()->tensor_shape()));
393 
394  _kernel.set_argument(idx++, _weights->info()->strides_in_bytes()[3]); // weights_stride_w
395  _kernel.set_argument(idx++, _weights->info()->dimension(2)); // weights_depth
396 
397  IGCKernel::configure(win);
398 }
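
The listing above assembles the shader build options as a set of `#define` strings. The sketch below isolates that pattern in a self-contained form; `build_shader_defines` is a hypothetical helper written for this example, and `std::to_string` stands in for the library's `support::cpp11::to_string`.

```cpp
#include <set>
#include <string>

// Hypothetical helper: collects the "#define ..." lines that would be passed
// to the shader compiler for a given local size, stride, data type and bias.
std::set<std::string> build_shader_defines(unsigned int lws_x, unsigned int lws_y, unsigned int lws_z,
                                           unsigned int stride_x, unsigned int stride_y,
                                           bool is_fp32, bool has_bias)
{
    std::set<std::string> options;

    options.emplace("#define LOCAL_SIZE_X " + std::to_string(lws_x));
    options.emplace("#define LOCAL_SIZE_Y " + std::to_string(lws_y));
    options.emplace("#define LOCAL_SIZE_Z " + std::to_string(lws_z));
    options.emplace("#define STRIDE_X " + std::to_string(stride_x));
    options.emplace("#define STRIDE_Y " + std::to_string(stride_y));

    // Exactly one data-type macro is defined.
    options.emplace(std::string("#define ") + (is_fp32 ? "DATA_TYPE_FP32" : "DATA_TYPE_FP16"));

    // The BIAS macro is only present when a bias tensor is supplied.
    if(has_bias)
    {
        options.emplace("#define BIAS");
    }
    return options;
}
```

Because the options live in a `std::set`, duplicates collapse and the order in which the defines are emplaced does not matter.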

References arm_compute::test::validation::act_info, ARM_COMPUTE_ERROR, ARM_COMPUTE_ERROR_ON, ARM_COMPUTE_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN, ARM_COMPUTE_ERROR_ON_MISMATCHING_DATA_TYPES, ARM_COMPUTE_ERROR_ON_MISMATCHING_DIMENSIONS, ARM_COMPUTE_ERROR_ON_MSG, arm_compute::auto_init_if_empty(), arm_compute::test::validation::bias, BorderSize::bottom, arm_compute::calculate_max_enlarged_window(), arm_compute::ceil_to_multiple(), arm_compute::test::validation::conv_info, arm_compute::create_kernel(), ITensorInfo::data_type(), TensorInfo::data_type(), ITensorInfo::dimension(), TensorInfo::dimension(), arm_compute::F16, arm_compute::F32, arm_compute::float_to_string_with_full_precision(), GCKernelLibrary::get(), ITensor::info(), CLTensor::info(), BorderSize::left, ActivationLayerInfo::LOGISTIC, arm_compute::lower_string(), TensorInfo::num_dimensions(), arm_compute::test::validation::output_shape, ITensorInfo::padding(), ActivationLayerInfo::RELU, BorderSize::right, arm_compute::scaled_dimensions(), arm_compute::string_from_activation_func(), ITensorInfo::tensor_shape(), arm_compute::support::cpp11::to_string(), BorderSize::top, arm_compute::update_window_and_padding(), and arm_compute::test::validation::weights.
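
The right and bottom output padding in configure() comes from rounding the output extent up to a multiple of the elements written per iteration times the local work-group size. The following sketch shows that arithmetic on plain integers; `round_up_to_multiple` and `output_padding` are hypothetical names, and the round-up formula is assumed to behave like the library's `ceil_to_multiple`.

```cpp
#include <cassert>

// Smallest multiple of divisor that is greater than or equal to value.
// Assumed to match the behaviour of the library's ceil_to_multiple helper.
int round_up_to_multiple(int value, int divisor)
{
    assert(divisor > 0);
    assert(value >= 0);
    return ((value + divisor - 1) / divisor) * divisor;
}

// Extra elements required so that `extent` becomes a whole number of
// work-group tiles, where one tile covers elems_per_iteration * local_size.
int output_padding(int extent, int elems_per_iteration, int local_size)
{
    return round_up_to_multiple(extent, elems_per_iteration * local_size) - extent;
}
```

With an output width of 30, 4 elements written per iteration and a local size of 1, the sketch gives a right padding of 2, since 32 is the next multiple of 4.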

◆ operator=() [1/2]

GCDirectConvolutionLayerKernel& operator= ( const GCDirectConvolutionLayerKernel< kernel_size > &  )
delete

Prevent instances of this class from being copied (As this class contains pointers)

◆ operator=() [2/2]

GCDirectConvolutionLayerKernel& operator= ( GCDirectConvolutionLayerKernel< kernel_size > &&  )
default

Allow instances of this class to be moved.

◆ run()

void run ( const Window & window )
override virtual

Enqueue the OpenGL ES shader to process the given window.

Parameters
[in]  window  Region on which to execute the kernel. (Must be a valid region of the window returned by window()).

Implements IGCKernel.

Definition at line 401 of file GCDirectConvolutionLayerKernel.cpp.

402 {
405 
406  _kernel.use();
407 
408  _output->set_needs_shifting(true);
409 
410  // Get initial windows
412  Window win_in = window;
413 
414  win_in.adjust(Window::DimX, -_conv_pad_x, true);
415  win_in.adjust(Window::DimY, -_conv_pad_y, true);
416  win_in.set_dimension_step(Window::DimX, window.x().step() * _conv_stride_x);
417  win_in.set_dimension_step(Window::DimY, window.y().step() * _conv_stride_y);
418 
419  Window slice_in = win_in.first_slice_window_3D();
420 
421  unsigned int idx1 = 2 * num_arguments_per_3D_tensor();
422  add_3D_tensor_argument(idx1, _weights, 3, slice);
423 
424  if(_bias != nullptr)
425  {
426  Window slice_bias;
427  slice_bias.use_tensor_dimensions(_bias->info()->tensor_shape());
428  add_1D_tensor_argument(idx1, _bias, 4, slice_bias);
429  }
430 
431  slice.shift(Window::DimX, -(_output->info()->padding()).left);
432 
433  do
434  {
435  unsigned int idx = 0;
436 
437  add_3D_tensor_argument(idx, _input, 1, slice_in);
438  add_3D_tensor_argument(idx, _output, 2, slice);
439 
440  _kernel.update_shader_params();
441  enqueue(*this, slice, _lws);
442  }
443  while(window.slide_window_slice_3D(slice) && win_in.slide_window_slice_3D(slice_in));
444 }

References Window::adjust(), ARM_COMPUTE_ERROR_ON_INVALID_SUBWINDOW, ARM_COMPUTE_ERROR_ON_UNCONFIGURED_KERNEL, Window::DimX, Window::DimY, arm_compute::enqueue(), Window::first_slice_window_3D(), Window::set_dimension_step(), arm_compute::test::validation::reference::slice(), Window::slide_window_slice_3D(), Window::Dimension::step(), Window::use_tensor_dimensions(), IKernel::window(), Window::x(), and Window::y().
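
In run(), the input window is derived from the execution window by moving its start back by the padding and multiplying its step by the stride. The sketch below models one dimension of that transformation; `Range` and `input_range_from_output` are hypothetical stand-ins, and it assumes that `Window::adjust()` with `is_at_start` set to true adds the given value to the start of the dimension.

```cpp
#include <cassert>

// Hypothetical 1D range standing in for one dimension of a Window.
struct Range
{
    int start;
    int end;
    int step;
};

// Derive the input range from the output range:
// the start is shifted back by the padding and the step is scaled by the stride.
Range input_range_from_output(const Range &out, int pad, int stride)
{
    assert(stride > 0);
    assert(pad >= 0);

    Range in = out;
    in.start = out.start - pad;   // mirrors adjust(dim, -pad, true) under the stated assumption
    in.step  = out.step * stride; // mirrors set_dimension_step(dim, step * stride)
    return in;
}
```

For an output range starting at 0 with step 4, a padding of 1 and a stride of 2, the input range starts at -1 and advances in steps of 8, so each output step consumes a stride-scaled span of the input.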


The documentation for this class was generated from the following files: