Compute Library 21.02
GCDirectConvolutionLayerKernel< kernel_size > Class Template Reference

Interface for the direct convolution kernel.

#include <GCDirectConvolutionLayerKernel.h>

Public Member Functions

 GCDirectConvolutionLayerKernel ()
 Default constructor.

 GCDirectConvolutionLayerKernel (const GCDirectConvolutionLayerKernel &)=delete
 Prevent instances of this class from being copied (as this class contains pointers).

GCDirectConvolutionLayerKernel & operator= (const GCDirectConvolutionLayerKernel &)=delete
 Prevent instances of this class from being copied (as this class contains pointers).

 GCDirectConvolutionLayerKernel (GCDirectConvolutionLayerKernel &&)=default
 Allow instances of this class to be moved.

GCDirectConvolutionLayerKernel & operator= (GCDirectConvolutionLayerKernel &&)=default
 Allow instances of this class to be moved.

 ~GCDirectConvolutionLayerKernel ()=default
 Default destructor.

void configure (const IGCTensor *input, const IGCTensor *weights, const IGCTensor *bias, IGCTensor *output, const PadStrideInfo &conv_info, const ActivationLayerInfo &act_info=ActivationLayerInfo())
 Set the input and output of the kernel.

BorderSize border_size () const override
 The size of the border for that kernel.

void run (const Window &window) override
 Enqueue the OpenGL ES shader to process the given window.
 
Public Member Functions inherited from IGCKernel
 IGCKernel ()
 Constructor.

GCKernel & kernel ()
 Returns a reference to the GLES kernel of this object.

void add_1D_tensor_argument (unsigned int &idx, const IGCTensor *tensor, const unsigned int binding_point, const Window &window)
 Add the passed 1D tensor's parameters to the object's kernel's arguments starting from the index idx.

void add_2D_tensor_argument (unsigned int &idx, const IGCTensor *tensor, const unsigned int binding_point, const Window &window)
 Add the passed 2D tensor's parameters to the object's kernel's arguments starting from the index idx.

void add_3D_tensor_argument (unsigned int &idx, const IGCTensor *tensor, const unsigned int binding_point, const Window &window)
 Add the passed 3D tensor's parameters to the object's kernel's arguments starting from the index idx.

unsigned int num_arguments_per_1D_tensor () const
 Returns the number of arguments enqueued per 1D tensor object.

unsigned int num_arguments_per_2D_tensor () const
 Returns the number of arguments enqueued per 2D tensor object.

unsigned int num_arguments_per_3D_tensor () const
 Returns the number of arguments enqueued per 3D tensor object.

void set_lws_hint (gles::NDRange &lws_hint)
 Set the Local-Workgroup-Size hint.

void set_target (GPUTarget target)
 Set the targeted GPU architecture.

GPUTarget get_target () const
 Get the targeted GPU architecture.
 
Public Member Functions inherited from IKernel
 IKernel ()
 Constructor.

virtual ~IKernel ()=default
 Destructor.

virtual bool is_parallelisable () const
 Indicates whether or not the kernel is parallelisable.

const Window & window () const
 The maximum window the kernel can be executed on.
 

Detailed Description

template<unsigned int kernel_size>
class arm_compute::GCDirectConvolutionLayerKernel< kernel_size >

Interface for the direct convolution kernel.

Definition at line 37 of file GCDirectConvolutionLayerKernel.h.
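
Because kernel_size is a non-type template parameter, each instantiation is tied to a single filter size, and configure() derives the shader name from it. The following is a minimal standalone sketch of that naming step only. The helper shader_name_for is hypothetical and is not part of Compute Library.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper (not a Compute Library API): builds the shader name
// from the compile-time kernel size, following the same pattern that
// configure() uses with its std::stringstream.
template <unsigned int kernel_size>
std::string shader_name_for()
{
    std::stringstream name;
    name << "direct_convolution" << kernel_size << "x" << kernel_size;
    return name.str();
}
```

Under these assumptions, shader_name_for<3>() yields the same string that a 3x3 instantiation would pass to the kernel library.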

Constructor & Destructor Documentation

◆ GCDirectConvolutionLayerKernel() [1/3]

Default constructor.

Definition at line 43 of file GCDirectConvolutionLayerKernel.cpp.

44  : _input(nullptr), _bias(nullptr), _weights(nullptr), _output(nullptr), _border_size(0), _conv_stride_x(0), _conv_stride_y(0), _conv_pad_x(0), _conv_pad_y(0), _lws(gles::NDRange(1U, 1U, 1U))
45 {
46 }

◆ GCDirectConvolutionLayerKernel() [2/3]

Prevent instances of this class from being copied (As this class contains pointers)

◆ GCDirectConvolutionLayerKernel() [3/3]

Allow instances of this class to be moved.

◆ ~GCDirectConvolutionLayerKernel()

Default destructor.
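
The copy and move rules above follow a common pattern for classes that hold non-owning pointers. The sketch below illustrates that pattern with a hypothetical stand-in class, PointerHoldingKernel, which is not part of Compute Library; it only shows what deleting the copy operations and defaulting the move operations implies.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for a kernel that stores a non-owning pointer.
class PointerHoldingKernel
{
public:
    PointerHoldingKernel() = default;
    // Copying is disabled, so two kernels never alias the same configuration by accident.
    PointerHoldingKernel(const PointerHoldingKernel &) = delete;
    PointerHoldingKernel &operator=(const PointerHoldingKernel &) = delete;
    // Moving is allowed; the defaulted move simply copies the raw pointer value.
    PointerHoldingKernel(PointerHoldingKernel &&) = default;
    PointerHoldingKernel &operator=(PointerHoldingKernel &&) = default;
    ~PointerHoldingKernel() = default;

    void set_input(const float *input) { _input = input; }
    const float *input() const { return _input; }

private:
    const float *_input{ nullptr };
};

static_assert(!std::is_copy_constructible<PointerHoldingKernel>::value, "copying is disabled");
static_assert(std::is_move_constructible<PointerHoldingKernel>::value, "moving is allowed");
```

Note that a defaulted move of a raw pointer does not reset the moved-from object, so the source should not be used afterwards.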

Member Function Documentation

◆ border_size()

BorderSize border_size ( ) const
override, virtual

The size of the border for that kernel.

Returns
The width in number of elements of the border.

Reimplemented from IKernel.

Definition at line 49 of file GCDirectConvolutionLayerKernel.cpp.

50 {
51  return _border_size;
52 }

◆ configure()

void configure ( const IGCTensor * input,
const IGCTensor * weights,
const IGCTensor * bias,
IGCTensor * output,
const PadStrideInfo & conv_info,
const ActivationLayerInfo & act_info = ActivationLayerInfo()
)

Set the input and output of the kernel.

Parameters
[in] input: The input tensor to convert. The 3 lower dimensions represent a single input [width, height, IFM], while every optional dimension from 4 and above represents a batch of inputs. Data types supported: F16/F32.
[in] weights: Weights tensor. Weights are a 4D tensor with dimensions [kernel_x, kernel_y, IFM, OFM]. Data type supported: Same as input.
[in] bias: Biases tensor. Shared bias supported. Biases are a 1D tensor with dimensions [OFM]. Data type supported: Same as input.
[out] output: The output tensor. The first 2 lower dimensions represent a transform of each 3D input, while every dimension above represents a batch. Data types supported: Same as input.
[in] conv_info: Contains padding and stride information described in PadStrideInfo.
[in] act_info: (Optional) Activation layer information in case of a fused activation.
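
The width and height of the output follow from the input size, the kernel size, and conv_info. As a rough illustration, the sketch below computes them under stated assumptions: symmetric padding, no dilation, and floor rounding. The helpers conv_out_dim and conv_out_dims are hypothetical and are not the library's scaled_dimensions(), which also handles other rounding modes.

```cpp
#include <cassert>
#include <utility>

// Hypothetical helper: output extent along one axis, assuming the same
// padding on both sides, no dilation and floor rounding.
unsigned int conv_out_dim(unsigned int in, unsigned int kernel, unsigned int stride, unsigned int pad)
{
    assert(stride > 0 && in + 2 * pad >= kernel);
    return (in + 2 * pad - kernel) / stride + 1;
}

// Width/height pair for a square kernel, analogous to the (owidth, oheight)
// pair that configure() obtains before setting the output shape.
std::pair<unsigned int, unsigned int> conv_out_dims(unsigned int width, unsigned int height, unsigned int kernel,
                                                    unsigned int stride_x, unsigned int stride_y,
                                                    unsigned int pad_x, unsigned int pad_y)
{
    return { conv_out_dim(width, kernel, stride_x, pad_x), conv_out_dim(height, kernel, stride_y, pad_y) };
}
```

For example, a 32-element axis with a 3x3 kernel, stride 1 and padding 1 keeps its size, while stride 2 halves it.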

Definition at line 55 of file GCDirectConvolutionLayerKernel.cpp.

References ActivationLayerInfo::a(), ActivationLayerInfo::activation(), ARM_COMPUTE_ERROR, ARM_COMPUTE_ERROR_ON, ARM_COMPUTE_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN, ARM_COMPUTE_ERROR_ON_MISMATCHING_DATA_TYPES, ARM_COMPUTE_ERROR_ON_MISMATCHING_DIMENSIONS, ARM_COMPUTE_ERROR_ON_MSG, arm_compute::auto_init_if_empty(), ActivationLayerInfo::b(), BorderSize::bottom, arm_compute::calculate_max_enlarged_window(), arm_compute::ceil_to_multiple(), arm_compute::test::validation::conv_info, GCKernelLibrary::create_kernel(), ITensorInfo::data_type(), ITensorInfo::dimension(), ActivationLayerInfo::enabled(), arm_compute::F16, arm_compute::F32, arm_compute::float_to_string_with_full_precision(), GCKernelLibrary::get(), ITensor::info(), arm_compute::test::validation::input, input_height, input_width, kernel_name, BorderSize::left, ActivationLayerInfo::LOGISTIC, arm_compute::lower_string(), IGCKernel::num_arguments_per_1D_tensor(), IGCKernel::num_arguments_per_3D_tensor(), ITensorInfo::num_dimensions(), arm_compute::test::validation::output_shape, PadStrideInfo::pad(), ITensorInfo::padding(), PadStrideInfo::padding_is_symmetric(), ActivationLayerInfo::RELU, BorderSize::right, arm_compute::scaled_dimensions(), TensorShape::set(), PadStrideInfo::stride(), ITensorInfo::strides_in_bytes(), arm_compute::string_from_activation_func(), ITensorInfo::tensor_shape(), arm_compute::support::cpp11::to_string(), BorderSize::top, and arm_compute::update_window_and_padding().
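
configure() passes its choices to the GLSL shader as a set of "#define" strings. The sketch below reproduces a small part of that assembly in isolation. The function build_options is hypothetical, and std::to_string stands in for support::cpp11::to_string; the local work-group size and activation defines are left out for brevity.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical standalone sketch of how the shader build options are collected.
// A std::set keeps each define unique and in a deterministic order.
std::set<std::string> build_options(unsigned int stride_x, unsigned int stride_y, bool is_fp32, bool has_bias)
{
    std::set<std::string> options;
    options.emplace("#define STRIDE_X " + std::to_string(stride_x));
    options.emplace("#define STRIDE_Y " + std::to_string(stride_y));
    // Exactly one data type define is emitted, matching the F32/F16 choice.
    options.emplace(std::string("#define ") + (is_fp32 ? "DATA_TYPE_FP32" : "DATA_TYPE_FP16"));
    if(has_bias)
    {
        options.emplace("#define BIAS");
    }
    return options;
}
```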

57 {
59  ARM_COMPUTE_ERROR_ON(weights->info()->dimension(2) != input->info()->dimension(2));
60  ARM_COMPUTE_ERROR_ON(weights->info()->dimension(0) != weights->info()->dimension(1));
61  ARM_COMPUTE_ERROR_ON(weights->info()->num_dimensions() > 4);
62  ARM_COMPUTE_ERROR_ON_MSG((kernel_size == 3 && std::get<0>(conv_info.stride()) > 2), "Strides larger than 2 not supported in 3x3 direct convolution!");
63  ARM_COMPUTE_ERROR_ON(kernel_size != weights->info()->dimension(0));
65 
66  if(bias != nullptr)
67  {
69  // FIXME: Bug in framework, workaround it in tests currently.
70  //ARM_COMPUTE_ERROR_ON(bias->info()->dimension(0) != weights->info()->dimension(3));
72  }
73 
74  // Get convolved dimensions
75  unsigned int owidth = 0;
76  unsigned int oheight = 0;
77  std::tie(owidth, oheight) = scaled_dimensions(input->info()->dimension(0), input->info()->dimension(1), kernel_size, kernel_size, conv_info);
78 
80  output_shape.set(0, owidth);
81  output_shape.set(1, oheight);
82  output_shape.set(2, weights->info()->dimension(3));
83 
84  // Output auto initialization if not yet initialized
85  auto_init_if_empty(*output->info(), output_shape, 1, input->info()->data_type());
86 
87  ARM_COMPUTE_ERROR_ON_MISMATCHING_DATA_TYPES(input, weights, output);
91 
92  _conv_stride_x = std::get<0>(conv_info.stride());
93  _conv_stride_y = std::get<1>(conv_info.stride());
94  _conv_pad_x = std::get<0>(conv_info.pad());
95  _conv_pad_y = std::get<1>(conv_info.pad());
96 
97  _input = input;
98  _weights = weights;
99  _output = output;
100  _bias = bias;
101  _border_size = BorderSize(_conv_pad_y, _conv_pad_x);
102 
103  std::set<std::string> options;
104 
105  options.emplace("#define LOCAL_SIZE_X " + support::cpp11::to_string(_lws[0]));
106  options.emplace("#define LOCAL_SIZE_Y " + support::cpp11::to_string(_lws[1]));
107  options.emplace("#define LOCAL_SIZE_Z " + support::cpp11::to_string(_lws[2]));
108  options.emplace("#define STRIDE_X " + support::cpp11::to_string(_conv_stride_x));
109  options.emplace("#define STRIDE_Y " + support::cpp11::to_string(_conv_stride_y));
110 
111  std::string dt_name = (input->info()->data_type() == DataType::F32) ? "DATA_TYPE_FP32" : "DATA_TYPE_FP16";
112  options.emplace(("#define " + dt_name));
113 
114  // Activation information in case of a fused activation
115  if(act_info.enabled())
116  {
117  options.emplace("#define FUSED_ACTIVATION");
118  options.emplace(("#define " + string_from_activation_func(act_info.activation())));
119  options.emplace(("#define ACT_OP " + lower_string(string_from_activation_func(act_info.activation())) + "_op"));
120  options.emplace(("#define A_VAL " + float_to_string_with_full_precision(act_info.a())));
121  options.emplace(("#define B_VAL " + float_to_string_with_full_precision(act_info.b())));
122  }
123 
124  unsigned int num_elems_read_per_iteration_x = kernel_size * _conv_stride_x;
125  unsigned int num_elems_read_per_iteration_y = 1;
126  unsigned int num_elems_written_per_iteration_x = 1;
127  unsigned int num_elems_written_per_iteration_y = 1;
128  unsigned int num_elems_written_per_iteration_z = 1;
129 
130  if(kernel_size == 3)
131  {
132  if((_conv_stride_x == 1) && (_conv_stride_y == 1))
133  {
134  switch(input->info()->data_type())
135  {
136  case DataType::F16:
137  // TODO(APPBROWSER-299): Choose the most optimal path and remove others.
138 #define PROCESS_4X_3Y_1Z
139 
140 #if defined(PROCESS_8X_3Y_1Z)
141  options.emplace("#define PROCESS_8X_3Y_1Z");
142  num_elems_read_per_iteration_x = 16;
143  num_elems_read_per_iteration_y = 5;
144  num_elems_written_per_iteration_x = 8;
145  num_elems_written_per_iteration_y = 3;
146 #elif defined(PROCESS_4X_3Y_1Z)
147  options.emplace("#define PROCESS_4X_3Y_1Z");
148  num_elems_read_per_iteration_x = 8;
149  num_elems_read_per_iteration_y = 5;
150  num_elems_written_per_iteration_x = 4;
151  num_elems_written_per_iteration_y = 3;
152 #elif defined(PROCESS_4X_4Y_1Z)
153  options.emplace("#define PROCESS_4X_4Y_1Z");
154  num_elems_read_per_iteration_x = 8;
155  num_elems_read_per_iteration_y = 6;
156  num_elems_written_per_iteration_x = 4;
157  num_elems_written_per_iteration_y = 4;
158 #elif defined(PROCESS_4X_3Y_2Z)
159  options.emplace("#define PROCESS_4X_3Y_2Z");
160  num_elems_read_per_iteration_x = 8;
161  num_elems_read_per_iteration_y = 5;
162  num_elems_written_per_iteration_x = 4;
163  num_elems_written_per_iteration_y = 3;
164  num_elems_written_per_iteration_z = 2;
165 #endif /* PROCESS_nX_nY_nZ */
166 #undef PROCESS_8X_3Y_1Z
167 #undef PROCESS_4X_3Y_1Z
168 #undef PROCESS_4X_4Y_1Z
169 #undef PROCESS_4X_3Y_2Z
170  break;
171 
172  case DataType::F32:
173  options.emplace("#define PROCESS_4X_3Y_1Z");
174  num_elems_read_per_iteration_x = 8;
175  num_elems_read_per_iteration_y = 5;
176  num_elems_written_per_iteration_x = 4;
177  num_elems_written_per_iteration_y = 3;
178  break;
179 
180  default:
181  ARM_COMPUTE_ERROR("Current data type is not supported");
182  break;
183  }
184  }
185  // FIXME: Just keep one in release
186  else
187  {
188  switch(input->info()->data_type())
189  {
190  case DataType::F16:
191  options.emplace("#define PROCESS_4X_1Y_1Z");
192  num_elems_read_per_iteration_x = 8;
193  num_elems_written_per_iteration_x = 4;
194  break;
195 
196  case DataType::F32:
197  // TODO(APPBROWSER-299): Choose the most optimal path and remove others.
198 #define PROCESS_4X_1Y_1Z
199 
200 #if defined(PROCESS_1X_1Y_1Z)
201  options.emplace("#define PROCESS_1X_1Y_1Z");
202  num_elems_read_per_iteration_x = 3;
203  num_elems_written_per_iteration_x = 1;
204 #elif defined(PROCESS_4X_1Y_1Z)
205  options.emplace("#define PROCESS_4X_1Y_1Z");
206  num_elems_read_per_iteration_x = 8;
207  num_elems_written_per_iteration_x = 4;
208 #elif defined(PROCESS_8X_1Y_1Z)
209  options.emplace("#define PROCESS_8X_1Y_1Z");
210  num_elems_read_per_iteration_x = 12;
211  num_elems_written_per_iteration_x = 8;
212 #else /* PROCESS_nX_nY_nZ */
213 #error Have to declare how many elements to process in one thread.
214 #endif /* PROCESS_nX_nY_nZ */
215 #undef PROCESS_1X_1Y_1Z
216 #undef PROCESS_4X_1Y_1Z
217 #undef PROCESS_8X_1Y_1Z
218  break;
219 
220  default:
221  ARM_COMPUTE_ERROR("Current data type is not supported");
222  break;
223  }
224  }
225  }
226  else if(kernel_size == 1)
227  {
228  if(weights->info()->dimension(2) % 2 == 0)
229  {
230  options.emplace("#define WEIGHTS_OPTIMIZATION");
231  }
232  switch(input->info()->data_type())
233  {
234  case DataType::F16:
235 #define PROCESS_8X_2Y_1Z
236 
237 #if defined(PROCESS_4X_1Y_1Z)
238  options.emplace("#define PROCESS_4X_1Y_1Z");
239  num_elems_read_per_iteration_x = 4;
240  num_elems_written_per_iteration_x = 4;
241 #elif defined(PROCESS_4X_2Y_1Z)
242  options.emplace("#define PROCESS_4X_2Y_1Z");
243  num_elems_read_per_iteration_x = 4;
244  num_elems_read_per_iteration_y = 2;
245  num_elems_written_per_iteration_x = 4;
246  num_elems_written_per_iteration_y = 2;
247 #elif defined(PROCESS_4X_3Y_1Z)
248  options.emplace("#define PROCESS_4X_3Y_1Z");
249  num_elems_read_per_iteration_x = 4;
250  num_elems_read_per_iteration_y = 3;
251  num_elems_written_per_iteration_x = 4;
252  num_elems_written_per_iteration_y = 3;
253 #elif defined(PROCESS_4X_4Y_1Z)
254  options.emplace("#define PROCESS_4X_4Y_1Z");
255  num_elems_read_per_iteration_x = 4;
256  num_elems_read_per_iteration_y = 4;
257  num_elems_written_per_iteration_x = 4;
258  num_elems_written_per_iteration_y = 4;
259 #elif defined(PROCESS_4X_2Y_2Z)
260  ARM_COMPUTE_ERROR_ON_MSG((weights->info()->dimension(4) % 2) == 1, "Current 'weights->info()->dimension(4) % 2) == 1' is not supported");
261  options.emplace("#define PROCESS_4X_2Y_2Z");
262  num_elems_read_per_iteration_x = 4;
263  num_elems_read_per_iteration_y = 2;
264  num_elems_written_per_iteration_x = 4;
265  num_elems_written_per_iteration_y = 2;
266  num_elems_written_per_iteration_z = 2;
267 #elif defined(PROCESS_8X_1Y_1Z)
268  options.emplace("#define PROCESS_8X_1Y_1Z");
269  num_elems_read_per_iteration_x = 8;
270  num_elems_written_per_iteration_x = 8;
271 #elif defined(PROCESS_8X_2Y_1Z)
272  options.emplace("#define PROCESS_8X_2Y_1Z");
273  num_elems_read_per_iteration_x = 8;
274  num_elems_read_per_iteration_y = 2;
275  num_elems_written_per_iteration_x = 8;
276  num_elems_written_per_iteration_y = 2;
277 #else /* PROCESS_4X_1Y_1Z */
278 #error Have to declare how many elements to process in one thread.
279 #endif /* PROCESS_4X_1Y_1Z */
280 #undef PROCESS_4X_1Y_1Z
281 #undef PROCESS_4X_2Y_1Z
282 #undef PROCESS_4X_3Y_1Z
283 #undef PROCESS_4X_4Y_1Z
284 #undef PROCESS_4X_2Y_2Z
285 #undef PROCESS_8X_1Y_1Z
286 #undef PROCESS_8X_2Y_1Z
287  break;
288 
289  case DataType::F32:
290  num_elems_read_per_iteration_x = 1;
291  num_elems_written_per_iteration_x = 1;
292  break;
293 
294  default:
295  break;
296  }
297  }
298  else if(kernel_size == 5)
299  {
300  switch(input->info()->data_type())
301  {
302  case DataType::F16:
303  options.emplace("#define PROCESS_4X_1Y_1Z");
304  num_elems_read_per_iteration_x = 8;
305  num_elems_written_per_iteration_x = 4;
306 
307  default:
308  break;
309  }
310  }
311  else
312  {
313  }
314 
315  if(_bias != nullptr)
316  {
317  options.emplace("#define BIAS");
318  }
319 
320  std::stringstream kernel_name;
321  kernel_name << "direct_convolution" << kernel_size << "x" << kernel_size;
322 
323  _kernel = static_cast<GCKernel>(GCKernelLibrary::get().create_kernel(kernel_name.str(), options));
324 
325  unsigned int idx = (_bias == nullptr) ? 3 * num_arguments_per_3D_tensor() : (num_arguments_per_1D_tensor() + 3 * num_arguments_per_3D_tensor());
326 
327  // Calculate output right and bottom border
328  const int output_width = output->info()->dimension(0);
329  const int output_height = output->info()->dimension(1);
330  const int output_padding_right = ceil_to_multiple(output_width, num_elems_written_per_iteration_x * _lws[0]) - output_width;
331  const int output_padding_bottom = ceil_to_multiple(output_height, num_elems_written_per_iteration_y * _lws[1]) - output_height;
332 
333  // Calculate input right and bottom border
334  const int input_width = input->info()->dimension(0);
335  const int input_height = input->info()->dimension(1);
336  const int input_total_width = std::max(int(input->info()->padding().left), int(_conv_pad_x)) + input_width + std::max(int(input->info()->padding().right), int(_conv_pad_x));
337  const int input_total_height = std::max(int(input->info()->padding().top), int(_conv_pad_y)) + input_height + std::max(int(input->info()->padding().bottom), int(_conv_pad_y));
338  const int padding_right1 = ceil_to_multiple(input_total_width, num_elems_read_per_iteration_x * _lws[0]) - input_width - _conv_pad_x;
339  const int padding_bottom1 = ceil_to_multiple(input_total_height, num_elems_read_per_iteration_y * _lws[1]) - input_height - _conv_pad_y;
340 
341  const int upper_bound_w = ceil_to_multiple(((output_width + output_padding_right) * _conv_stride_x + (kernel_size - 1)), num_elems_read_per_iteration_x * _lws[0]) - _conv_pad_x - input_width;
342  const int upper_bound_h = ceil_to_multiple(((output_height + output_padding_bottom) * _conv_stride_y + (kernel_size - 1)), num_elems_read_per_iteration_y * _lws[1]) - _conv_pad_y - input_height;
343  const int padding_right2 = std::max(upper_bound_w, _conv_pad_x);
344  const int padding_bottom2 = std::max(upper_bound_h, _conv_pad_y);
345 
346  const int padding_right = std::max(padding_right1, padding_right2);
347  const int padding_bottom = std::max(padding_bottom1, padding_bottom2);
348 
349  BorderSize border = BorderSize(0, output_padding_right, output_padding_bottom, 0);
350 
351  Window win = calculate_max_enlarged_window(*output->info(), Steps(num_elems_written_per_iteration_x, num_elems_written_per_iteration_y, num_elems_written_per_iteration_z), border);
352 
353  AccessWindowStatic input_access(input->info(), -_conv_pad_x, -_conv_pad_y, input_width + padding_right, input_height + padding_bottom);
354  AccessWindowStatic weights_access = AccessWindowStatic(nullptr, 0, 0, 0, 0);
355  AccessWindowStatic bias_access = AccessWindowStatic(nullptr, 0, 0, 0, 1);
356 
357  switch(weights->info()->data_type())
358  {
359  case DataType::F16:
360  if((weights->info()->dimension(2) % 2 != 0) || (kernel_size != 1))
361  {
362  weights_access = AccessWindowStatic(weights->info(), 0, 0, kernel_size + 1, kernel_size);
363  }
364  if(_bias != nullptr)
365  {
366  bias_access = AccessWindowStatic(_bias->info(), 0, 0, _bias->info()->dimension(0) + 1, 1);
367  }
368  break;
369 
370  case DataType::F32:
371  weights_access = AccessWindowStatic(weights->info(), 0, 0, kernel_size, kernel_size);
372  if(_bias != nullptr)
373  {
374  bias_access = AccessWindowStatic(_bias->info(), 0, 0, _bias->info()->dimension(0), 1);
375  }
376  break;
377 
378  default:
379  ARM_COMPUTE_ERROR("Current data type is not supported");
380  break;
381  }
382 
383  AccessWindowStatic output_access(output->info(), 0, 0, output_width + output_padding_right, output_height + output_padding_bottom);
384 
385  if(_bias != nullptr)
386  {
387  update_window_and_padding(win, input_access, weights_access, bias_access, output_access);
388  }
389  else
390  {
391  update_window_and_padding(win, input_access, weights_access, output_access);
392  }
393 
394  output_access.set_valid_region(win, ValidRegion(Coordinates(), output->info()->tensor_shape()));
395 
396  _kernel.set_argument(idx++, _weights->info()->strides_in_bytes()[3]); // weights_stride_w
397  _kernel.set_argument(idx++, _weights->info()->dimension(2)); // weights_depth
398 
399  IGCKernel::configure(win);
400 }
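
The right and bottom output padding in the listing above is the distance from the tensor extent to the next multiple of (elements written per iteration x local work-group size). The sketch below isolates that arithmetic. ceil_to_multiple here is a plain int re-implementation of the formula used by the library helper of the same name, and trailing_padding is a hypothetical helper introduced only for illustration.

```cpp
#include <cassert>

// Smallest multiple of divisor that is greater than or equal to value,
// using the formula ((value + divisor - 1) / divisor) * divisor.
int ceil_to_multiple(int value, int divisor)
{
    assert(divisor > 0);
    return ((value + divisor - 1) / divisor) * divisor;
}

// Hypothetical helper: extra elements needed so that the extent becomes a
// whole number of blocks of size (elems_per_iteration * lws).
int trailing_padding(int extent, int elems_per_iteration, int lws)
{
    return ceil_to_multiple(extent, elems_per_iteration * lws) - extent;
}
```

With 4 elements written per iteration and a local size of 1, an output width of 30 needs 2 elements of right padding, while a width of 32 needs none.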

◆ operator=() [1/2]

GCDirectConvolutionLayerKernel& operator= ( const GCDirectConvolutionLayerKernel< kernel_size > &  )
delete

Prevent instances of this class from being copied (As this class contains pointers)

◆ operator=() [2/2]

GCDirectConvolutionLayerKernel& operator= ( GCDirectConvolutionLayerKernel< kernel_size > &&  )
default

Allow instances of this class to be moved.

◆ run()

void run ( const Window & window )
override, virtual

Enqueue the OpenGL ES shader to process the given window.

Parameters
[in] window: Region on which to execute the kernel. Must be a valid region of the window returned by window().

Implements IGCKernel.

Definition at line 403 of file GCDirectConvolutionLayerKernel.cpp.

References IGCKernel::add_1D_tensor_argument(), IGCKernel::add_3D_tensor_argument(), Window::adjust(), ARM_COMPUTE_ERROR_ON_INVALID_SUBWINDOW, ARM_COMPUTE_ERROR_ON_UNCONFIGURED_KERNEL, Window::DimX, Window::DimY, arm_compute::enqueue(), Window::first_slice_window_3D(), ITensor::info(), IGCKernel::num_arguments_per_3D_tensor(), ITensorInfo::padding(), Window::set_dimension_step(), IGCTensor::set_needs_shifting(), Window::shift(), arm_compute::test::validation::reference::slice(), Window::slide_window_slice_3D(), Window::Dimension::step(), ITensorInfo::tensor_shape(), Window::use_tensor_dimensions(), IKernel::window(), Window::x(), and Window::y().

404 {
407 
408  _kernel.use();
409 
410  _output->set_needs_shifting(true);
411 
412  // Get initial windows
414  Window win_in = window;
415 
416  win_in.adjust(Window::DimX, -_conv_pad_x, true);
417  win_in.adjust(Window::DimY, -_conv_pad_y, true);
418  win_in.set_dimension_step(Window::DimX, window.x().step() * _conv_stride_x);
419  win_in.set_dimension_step(Window::DimY, window.y().step() * _conv_stride_y);
420 
421  Window slice_in = win_in.first_slice_window_3D();
422 
423  unsigned int idx1 = 2 * num_arguments_per_3D_tensor();
424  add_3D_tensor_argument(idx1, _weights, 3, slice);
425 
426  if(_bias != nullptr)
427  {
428  Window slice_bias;
429  slice_bias.use_tensor_dimensions(_bias->info()->tensor_shape());
430  add_1D_tensor_argument(idx1, _bias, 4, slice_bias);
431  }
432 
433  slice.shift(Window::DimX, -(_output->info()->padding()).left);
434 
435  do
436  {
437  unsigned int idx = 0;
438 
439  add_3D_tensor_argument(idx, _input, 1, slice_in);
440  add_3D_tensor_argument(idx, _output, 2, slice);
441 
442  _kernel.update_shader_params();
443  enqueue(*this, slice, _lws);
444  }
445  while(window.slide_window_slice_3D(slice) && win_in.slide_window_slice_3D(slice_in));
446 }
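
In run(), the input window is derived from the output window by moving its start back by the padding and multiplying its step by the stride. The sketch below shows that relationship for a single dimension. The struct Dim and the function input_dim_from_output are hypothetical simplifications and do not reproduce the Window class.

```cpp
#include <cassert>

// Hypothetical one-dimensional stand-in for a window dimension.
struct Dim
{
    int start;
    int end;
    int step;
};

// Derive the input-side dimension from the output-side one: shift the start
// by -pad (as adjust(..., -pad, true) does) and scale the step by the stride
// (as set_dimension_step() does in run()).
Dim input_dim_from_output(Dim out, int pad, int stride)
{
    assert(stride > 0);
    return Dim{ out.start - pad, out.end, out.step * stride };
}
```

For an output dimension of [0, 16) with step 4, padding 1 and stride 2, the input dimension starts at -1 and advances 8 elements per step.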

The documentation for this class was generated from the following files: