Compute Library
 22.08
ClDirectConvolutionKernelComponent Class Reference

#include <ClDirectConvolutionKernelComponent.h>

Collaboration diagram for ClDirectConvolutionKernelComponent:

Public Member Functions

 ClDirectConvolutionKernelComponent (ClKernelBlueprint *blueprint, const ClDirectConv2dKernelDescriptor &desc, const Link &src, const Link &weight, const Link &dst, const Link &bias=Link{})
 
ComponentType get_component_type () const override
 
std::set< std::string > get_headers_list () const override
 
std::string get_additional_macros () const override
 
std::string get_component_code () const override
 
Window get_window () const override
 
ClKernelArgList get_args ()
 
CLBuildOptions generate_build_options () const override
 
virtual std::vector< Link > get_links () const override
 
virtual TagLUT get_tag_lut (const SharedVarTable &vtable) const override
 Get the tag look-up table used to instantiate the component code.
 
virtual void allocate_shared_vars (SharedVarTable &vtable) const override
 Allocate all shared variables used by the component in the vtable.
 
virtual std::string name () const override
 
- Public Member Functions inherited from IClKernelComponent
 IClKernelComponent (ClKernelBlueprint *blueprint)
 
 ARM_COMPUTE_DISALLOW_COPY_ALLOW_MOVE (IClKernelComponent)
 
virtual ~IClKernelComponent ()=default
 
ComponentID id () const
 
void set_id (ComponentID id)
 
virtual std::string get_dst_addr_calculation () const
 
virtual std::string generate_config_id () const
 Generate config id of the component.
 

Additional Inherited Members

- Public Types inherited from IClKernelComponent
using Link = SharedVarLink
 
using Tag = std::string
 
using TagLUT = std::unordered_map< Tag, TagVal >
 
- Static Public Member Functions inherited from IClKernelComponent
static std::string replace_tags (const std::string &code_template, const TagLUT &tags)
 

Detailed Description

Definition at line 39 of file ClDirectConvolutionKernelComponent.h.

Constructor & Destructor Documentation

◆ ClDirectConvolutionKernelComponent()

Member Function Documentation

◆ allocate_shared_vars()

void allocate_shared_vars ( SharedVarTable & vtable ) const
override virtual

Allocate all shared variables used by the component in the vtable.

Parameters
vtable  Shared variable table in which the component's variables are allocated

Implements IClKernelComponent.

Definition at line 328 of file ClDirectConvolutionKernelComponent.cpp.

References SharedVarTable::add(), SharedVarLink::arg_id, TensorInfo::data_layout(), arm_compute::experimental::dynamic_fusion::export_to_cl_image_support(), CLScheduler::get(), SharedVarLink::is_empty(), arm_compute::test::validation::src_info, CLScheduler::target(), arm_compute::experimental::dynamic_fusion::Tensor_4D_t_Buffer, arm_compute::experimental::dynamic_fusion::Tensor_4D_t_Image, and arm_compute::experimental::dynamic_fusion::Vector.

Referenced by ClDirectConvolutionKernelComponent::get_links().

329 {
330  const auto src_info = _blueprint->impl().get_kernel_argument_info(_src.arg_id);
331  const auto weight_info = _blueprint->impl().get_kernel_argument_info(_weight.arg_id);
332 
333  vtable.add(_src, _blueprint->impl().group(_src.arg_id), ClKernelArgDescriptor(_src.arg_id, ClKernelTensorArgType::Tensor_4D_t_Buffer), "src");
334 
335  const GPUTarget gpu_target = CLScheduler::get().target();
336  const bool export_to_cl_image = export_to_cl_image_support(weight_info, gpu_target, src_info->data_layout());
337  const ClKernelTensorArgType weight_type = export_to_cl_image ? ClKernelTensorArgType::Tensor_4D_t_Image : ClKernelTensorArgType::Tensor_4D_t_Buffer;
338  vtable.add(_weight, _blueprint->impl().group(_weight.arg_id), ClKernelArgDescriptor(_weight.arg_id, weight_type), "weight");
339 
340  if(!_bias.is_empty()) // optional bias
341  {
342  vtable.add(_bias, _blueprint->impl().group(_bias.arg_id), ClKernelArgDescriptor(_bias.arg_id, ClKernelTensorArgType::Vector), "bias");
343  }
344  vtable.add(_dst, _blueprint->impl().group(_dst.arg_id), ClKernelArgDescriptor(_dst.arg_id, ClKernelTensorArgType::Tensor_4D_t_Buffer), "dst");
345 }

◆ generate_build_options()

CLBuildOptions generate_build_options ( ) const
override virtual

Reimplemented from IClKernelComponent.

Definition at line 294 of file ClDirectConvolutionKernelComponent.cpp.

References CLBuildOptions::add_option(), arm_compute::adjust_vec_size(), SharedVarLink::arg_id, arm_compute::CHANNEL, TensorInfo::data_layout(), arm_compute::test::validation::data_type, TensorInfo::data_type(), TensorInfo::dimension(), arm_compute::experimental::dynamic_fusion::export_to_cl_image_support(), CLScheduler::get(), arm_compute::get_data_layout_dimension_index(), arm_compute::is_data_type_quantized(), arm_compute::test::validation::src_info, CLScheduler::target(), arm_compute::support::cpp11::to_string(), and arm_compute::opencl::kernels::gemm::update_padding_for_cl_image().

Referenced by ClDirectConvolutionKernelComponent::ClDirectConvolutionKernelComponent().

295 {
296  const auto src_info = _blueprint->impl().get_kernel_argument_info(_src.arg_id);
297  auto weight_info = _blueprint->impl().get_kernel_argument_info(_weight.arg_id);
298  const auto dst_info = _blueprint->impl().get_kernel_argument_info(_blueprint->impl().get_dst_id());
299  // const auto tile_info = _blueprint->impl().get_tile_info();
300 
301  const auto channel_idx = get_data_layout_dimension_index(src_info->data_layout(), DataLayoutDimension::CHANNEL);
302  const auto data_type = src_info->data_type();
303  const GPUTarget gpu_target = CLScheduler::get().target();
304 
305  const unsigned int n0 = _blueprint->impl().get_execution_window().x().step();
306  const unsigned int m0 = _blueprint->impl().get_execution_window().y().step();
307  const unsigned int k0 = adjust_vec_size(is_data_type_quantized(data_type) ? 16u : 8u, src_info->dimension(channel_idx));
308  const unsigned int partial_store_n0 = dst_info->dimension(0) % n0;
309  const bool export_to_cl_image = export_to_cl_image_support(weight_info, gpu_target, src_info->data_layout());
310 
311  // Update the padding for the weights tensor if we can export to cl_image
312  if(export_to_cl_image)
313  {
314  update_padding_for_cl_image(weight_info);
315  }
316 
317  CLBuildOptions build_opts{};
318  build_opts.add_option("-cl-fast-relaxed-math");
319  build_opts.add_option("-DIS_TILED");
320  build_opts.add_option("-DN0=" + support::cpp11::to_string(n0));
321  build_opts.add_option("-DM0=" + support::cpp11::to_string(m0));
322  build_opts.add_option("-DK0=" + support::cpp11::to_string(k0));
323  build_opts.add_option("-DPARTIAL_N0=" + support::cpp11::to_string(partial_store_n0));
324 
325  return build_opts;
326 }
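
For a sense of the values these options take, the following stand-alone sketch mirrors the arithmetic above for an assumed configuration (non-quantized F32 source with 17 channels, destination width 19, window steps n0 = 4 and m0 = 2). The helper is a simplified stand-in written for this example, not the library's arm_compute::adjust_vec_size().

#include <iostream>

// Simplified stand-in for adjust_vec_size(): shrink the requested vector
// width until it no longer exceeds the channel dimension.
static unsigned int adjust_vec_size_sketch(unsigned int vec_size, unsigned int dim0)
{
    while(vec_size > dim0)
    {
        vec_size /= 2;
    }
    return vec_size;
}

int main()
{
    // Assumed configuration (not taken from the library)
    const bool         is_quantized = false; // F32 source
    const unsigned int channels     = 17;    // src channel dimension
    const unsigned int dst_width    = 19;    // dst dimension 0
    const unsigned int n0           = 4;     // execution window step in x
    const unsigned int m0           = 2;     // execution window step in y

    const unsigned int k0               = adjust_vec_size_sketch(is_quantized ? 16u : 8u, channels);
    const unsigned int partial_store_n0 = dst_width % n0;

    // Prints: -cl-fast-relaxed-math -DIS_TILED -DN0=4 -DM0=2 -DK0=8 -DPARTIAL_N0=3
    std::cout << "-cl-fast-relaxed-math -DIS_TILED -DN0=" << n0 << " -DM0=" << m0
              << " -DK0=" << k0 << " -DPARTIAL_N0=" << partial_store_n0 << "\n";
    return 0;
}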

◆ get_additional_macros()

std::string get_additional_macros ( ) const
override virtual

Reimplemented from IClKernelComponent.

Definition at line 92 of file ClDirectConvolutionKernelComponent.cpp.

Referenced by ClDirectConvolutionKernelComponent::ClDirectConvolutionKernelComponent().

93 {
94  return R"_()_"; // no macros
95 }

◆ get_args()

◆ get_component_code()

std::string get_component_code ( ) const
override virtual

Reimplemented from IClKernelComponent.

Definition at line 97 of file ClDirectConvolutionKernelComponent.cpp.

References arm_compute::adjust_vec_size(), SharedVarLink::arg_id, ARM_COMPUTE_ERROR_ON_MSG, arm_compute::CHANNEL, TensorInfo::data_layout(), TensorInfo::data_type(), TensorInfo::dimension(), arm_compute::get_data_layout_dimension_index(), arm_compute::is_data_type_quantized(), arm_compute::NHWC, and arm_compute::test::validation::src_info.

Referenced by ClDirectConvolutionKernelComponent::ClDirectConvolutionKernelComponent().

98 {
99  const auto src_info = _blueprint->impl().get_kernel_argument_info(_src.arg_id);
100  const auto bias_info = _blueprint->impl().get_kernel_argument_info(_bias.arg_id);
101 
102  ARM_COMPUTE_ERROR_ON_MSG(src_info->data_layout() != DataLayout::NHWC, "Only NHWC data layout is supported by this component.");
103 
103 
104  const auto channel_idx = get_data_layout_dimension_index(src_info->data_layout(), DataLayoutDimension::CHANNEL);
105  const auto k0 = adjust_vec_size(is_data_type_quantized(src_info->data_type()) ? 16u : 8u, src_info->dimension(channel_idx));
106  const bool leftover_loop = (src_info->dimension(channel_idx) % k0) != 0;
107 
108  std::string code = R"_(
109  //------------------ START KERNEL {{meta_kernel_id}} ---------------------
110  // IN_0(src) {{src}}
111  // IN_1(wei) {{weight}}
112  )_";
113  if(bias_info != nullptr)
114  {
115  code += R"_(
116  // IN_1(bia) {{bias}}
117  )_";
118  }
119  code += R"_(
120  // OUT(dst, accum) {{dst}}
121 
122  // Initialize the accumulators
123  TILE({{ACC_DATA_TYPE}}, M0, N0, {{dst}});
124  {
125  // All the tensor dimensions are passed at compile time.
126  // In case of dynamic tensor support, the following dimensions should be passed as function argument.
127  #define _IWEI_WIDTH {{WEI_WIDTH}}
128  #define _IWEI_HEIGHT {{WEI_HEIGHT}}
129  #define _ISRC_WIDTH {{src}}_w
130  #define _ISRC_HEIGHT {{src}}_h
131  #define _ISRC_CHANNELS {{src}}_c
132  #define _IDST_WIDTH {{arg_dst}}_w
133  #define _IDST_HEIGHT {{arg_dst}}_h
134  #define _IDST_CHANNELS {{arg_dst}}_c
135  #define _IY_MULTIPLIER (_IWEI_WIDTH * _IWEI_HEIGHT)
136 
137  // .v = access the whole vector (OpenCL vector)
138  // .s[x] = access the vector element at position x (scalar access)
139  TILE(int, M0, 1, xi);
140  TILE(int, M0, 1, yi);
141 
142  // Convert the linear index to coordinate
143  LOOP_UNROLLING(int, i, 0, 1, M0,
144  {
145  xi[i].v = ((mout + i) % _IDST_WIDTH) * {{STRIDE_X}};
146  yi[i].v = ((mout + i) / _IDST_WIDTH) * {{STRIDE_Y}};
147  xi[i].v -= {{PAD_LEFT}};
148  yi[i].v -= {{PAD_TOP}};
149  })
150 
151  LOOP_UNROLLING(int, i, 0, 1, M0,
152  {
153  {{dst}}[i].v = 0;
154  })
155 
156  for(int i = 0; i < (_IWEI_WIDTH * _IWEI_HEIGHT); ++i)
157  {
158  int ck = 0;
159  int xk = i % _IWEI_WIDTH;
160  int yk = i / _IWEI_HEIGHT;
161 
162  int k = 0;
163  for(; k <= (_ISRC_CHANNELS - K0); k += K0)
164  {
165  TILE({{SRC_DATA_TYPE}}, M0, K0, a);
166  TILE({{WEI_DATA_TYPE}}, N0, K0, b);
167 
168  LOOP_UNROLLING(int, i, 0, 1, M0,
169  {
170  a[i].v = {{ZERO_VALUE}};
171  })
172 
173  // Load tile from the src tensor
174  T_LOAD_NHWC_INDIRECT({{SRC_DATA_TYPE}}, M0, K0, {{SRC_TENSOR_TYPE}}, {{src}}, bout, yk, xk, ck, _ISRC_WIDTH, _ISRC_HEIGHT, {{src}}_stride_y, xi, yi, a);
175 
176  // Load tile from the weights tensor
177  T_LOAD({{WEI_DATA_TYPE}}, N0, K0, {{WEI_TENSOR_TYPE}}, {{weight}}, ck, cout * _IY_MULTIPLIER + i, _IY_MULTIPLIER, {{weight}}_stride_y, b);
178 
179  // Compute the matrix multiplication between two tiles
180  T_MMUL({{SRC_DATA_TYPE}}, {{WEI_DATA_TYPE}}, {{ACC_DATA_TYPE}}, M0, N0, K0, NT, T, a, b, {{dst}});
181 
182  ck += K0;
183  }
184 
185  // We voluntarily use SRC_CHANNELS rather than _DSRC_CHANNELS
186  // This #if directive should be removed in case of dynamic tensor support
187  )_";
188 
189  if(leftover_loop)
190  {
191  code += R"_(
192  // Left-over accumulations
193  for(; k < _ISRC_CHANNELS; ++k)
194  {
195  TILE({{SRC_DATA_TYPE}}, M0, 1, a);
196  TILE({{WEI_DATA_TYPE}}, N0, 1, b);
197 
198  LOOP_UNROLLING(int, i, 0, 1, M0,
199  {
200  a[i].v = {{ZERO_VALUE}};
201  })
202 
203  // Load tile from the src tensor
204  T_LOAD_NHWC_INDIRECT({{SRC_DATA_TYPE}}, M0, 1, {{SRC_TENSOR_TYPE}}, {{src}}, bout, yk, xk, ck, _ISRC_WIDTH, _ISRC_HEIGHT, {{src}}_stride_y, xi, yi, a);
205 
206  // Load tile from the weights tensor
207  // The T_LOAD for the left-over elements can only use BUFFER because we load one element per iteration
208  T_LOAD({{WEI_DATA_TYPE}}, N0, 1, BUFFER, {{weight}}, ck, cout * _IY_MULTIPLIER + i, _IY_MULTIPLIER, {{weight}}_stride_y, b);
209 
210  // Compute the matrix multiplication between two tiles
211  T_MMUL({{SRC_DATA_TYPE}}, {{WEI_DATA_TYPE}}, {{ACC_DATA_TYPE}}, M0, N0, 1, NT, T, a, b, {{dst}});
212 
213  ++ck;
214  }
215  )_";
216  }
217 
218  code += R"_(
219  #undef _I_WEI_WIDTH
220  #undef _I_WEI_HEIGHT
221  #undef _ISRC_WIDTH
222  #undef _ISRC_HEIGHT
223  #undef _ISRC_CHANNELS
224  #undef _IDST_WIDTH
225  #undef _IDST_HEIGHT
226  #undef _IDST_CHANNELS
227  #undef _IY_MULTIPLIER
228 
229  }
230  )_";
231 
232  if(bias_info != nullptr)
233  {
234  code += R"_(
235  TILE({{BIA_DATA_TYPE}}, 1, N0, bias0);
236 
237  T_LOAD({{BIA_DATA_TYPE}}, 1, N0, BUFFER, {{bias}}, cout, 0, 1, 0, bias0);
238 
239  // c = c + bias[broadcasted]
240  T_ELTWISE_BROADCAST_ADD_X({{ACC_DATA_TYPE}}, M0, N0, {{dst}}, bias0, {{dst}});
241  )_";
242  }
243 
244  code += R"_(
245  }
246 //------------------ END KERNEL {{meta_kernel_id}} ---------------------
247  )_";
248  return code.c_str();
249 }
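
The left-over branch above is only emitted when the channel count is not a multiple of K0. As a rough, assumed example for a non-quantized source with 12 channels: K0 = adjust_vec_size(8, 12) = 8, so the main loop consumes 8 channels per iteration and the left-over loop handles the remaining 4 one at a time. The check reduces to a single modulo, as the minimal sketch below (which does not call the library) shows.

#include <iostream>

int main()
{
    // Assumed example: non-quantized source, 12 input channels.
    const unsigned int k0       = 8u;  // adjust_vec_size(8u, 12) keeps 8 since 8 <= 12
    const unsigned int channels = 12u;

    const bool leftover_loop = (channels % k0) != 0;

    // Prints: leftover_loop=1 (4 channels handled one at a time)
    std::cout << "leftover_loop=" << leftover_loop
              << " (" << channels % k0 << " channels handled one at a time)\n";
    return 0;
}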

◆ get_component_type()

◆ get_headers_list()

std::set< std::string > get_headers_list ( ) const
override virtual

Reimplemented from IClKernelComponent.

Definition at line 46 of file ClDirectConvolutionKernelComponent.cpp.

Referenced by ClDirectConvolutionKernelComponent::ClDirectConvolutionKernelComponent().

47 {
48  return std::set<std::string> { "helpers.h", "tile_helpers.h" };
49 }

◆ get_links()

virtual std::vector<Link> get_links ( ) const
inline override virtual

Implements IClKernelComponent.

Definition at line 56 of file ClDirectConvolutionKernelComponent.h.

References ClDirectConvolutionKernelComponent::allocate_shared_vars(), and ClDirectConvolutionKernelComponent::get_tag_lut().

57  {
58  return { _src, _weight, _bias, _dst };
59  }

◆ get_tag_lut()

ClDirectConvolutionKernelComponent::TagLUT get_tag_lut ( const SharedVarTable & vtable ) const
override virtual

Get the tag look-up table used to instantiate the component code.

Parameters
vtable  Table of shared variables used to resolve the component's tensor arguments
Returns
TagLUT mapping each {{tag}} in the component code to its value

Implements IClKernelComponent.

Definition at line 347 of file ClDirectConvolutionKernelComponent.cpp.

References SharedVarLink::arg_id, ClDirectConv2dKernelDescriptor::conv2d, TensorInfo::data_layout(), TensorInfo::data_type(), SharedVarTable::SharedVar::desc, SharedVarTable::get(), arm_compute::get_cl_type_from_data_type(), arm_compute::get_data_layout_dimension_index(), arm_compute::HEIGHT, IClKernelComponent::id(), arm_compute::experimental::dynamic_fusion::Image_3D_Export_To_ClImage2D, arm_compute::experimental::dynamic_fusion::Image_Export_To_ClImage2D, SharedVarLink::is_empty(), Padding2D::left, Conv2dDescriptor::pad, arm_compute::test::validation::src_info, Conv2dDescriptor::stride, arm_compute::experimental::dynamic_fusion::Tensor_4D_t_Image, ClKernelArgDescriptor::tensor_arg_type, Padding2D::top, SharedVarTable::SharedVar::uniq_name, arm_compute::WIDTH, Size2D::x(), and Size2D::y().

Referenced by ClDirectConvolutionKernelComponent::get_links().

348 {
349  TagLUT lut{};
350 
351  const auto src_info = _blueprint->impl().get_kernel_argument_info(_src.arg_id);
352  const auto weight_info = _blueprint->impl().get_kernel_argument_info(_weight.arg_id);
353  const auto bias_info = _blueprint->impl().get_kernel_argument_info(_bias.arg_id);
354 
355  // Arguments and global shared variables
356  lut["src"] = vtable.get(_src);
357  lut["weight"] = vtable.get(_weight);
358 
359  if(!_bias.is_empty()) // optional bias
360  {
361  lut["bias"] = vtable.get(_bias);
362  lut["BIA_DATA_TYPE"] = get_cl_type_from_data_type(bias_info->data_type());
363  }
364  lut["dst"] = vtable.get(_dst);
365 
366  const auto dst_argument = _blueprint->impl().get_argument_shared_vars().get_dst_var();
367  lut["arg_dst"] = dst_argument.uniq_name;
368 
369  // Local build options
370  lut["meta_kernel_id"] = id();
371  lut["ACC_DATA_TYPE"] = src_info->data_type();
372  lut["SRC_DATA_TYPE"] = src_info->data_type();
373  lut["WEI_DATA_TYPE"] = weight_info->data_type();
374 
375  lut["SRC_TENSOR_TYPE"] = "BUFFER";
376  switch(vtable.get(_weight).desc.tensor_arg_type)
377  {
378  case ClKernelTensorArgType::Image_Export_To_ClImage2D:
379  case ClKernelTensorArgType::Image_3D_Export_To_ClImage2D:
380  case ClKernelTensorArgType::Tensor_4D_t_Image:
381  {
382  lut["WEI_TENSOR_TYPE"] = "IMAGE";
383  break;
384  }
385  default:
386  {
387  lut["WEI_TENSOR_TYPE"] = "BUFFER";
388  break;
389  }
390  }
391  const auto width_idx = get_data_layout_dimension_index(src_info->data_layout(), DataLayoutDimension::WIDTH);
392  const auto height_idx = get_data_layout_dimension_index(src_info->data_layout(), DataLayoutDimension::HEIGHT);
393  lut["WEI_WIDTH"] = weight_info->dimension(width_idx);
394  lut["WEI_HEIGHT"] = weight_info->dimension(height_idx);
395 
396  lut["STRIDE_X"] = _desc.conv2d.stride.x();
397  lut["STRIDE_Y"] = _desc.conv2d.stride.y();
398 
399  lut["PAD_LEFT"] = _desc.conv2d.pad.left;
400  lut["PAD_TOP"] = _desc.conv2d.pad.top;
401 
402  lut["ZERO_VALUE"] = 0;
403 
404  return lut;
405 }
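
To illustrate how these tags instantiate the template returned by get_component_code(), the sketch below substitutes a few made-up values into a small fragment. The substitution helper is a simplified, string-only stand-in written for this example, not the inherited IClKernelComponent::replace_tags(), and the tag values are hypothetical.

#include <iostream>
#include <string>
#include <unordered_map>

// Simplified stand-in for replace_tags(): replace every {{tag}} occurrence
// with the matching table entry (string values only for this sketch).
static std::string replace_tags_sketch(std::string code, const std::unordered_map<std::string, std::string> &tags)
{
    for(const auto &tag : tags)
    {
        const std::string pattern = "{{" + tag.first + "}}";
        for(std::size_t pos = code.find(pattern); pos != std::string::npos; pos = code.find(pattern, pos))
        {
            code.replace(pos, pattern.size(), tag.second);
            pos += tag.second.size();
        }
    }
    return code;
}

int main()
{
    // Hypothetical tag values; in the library they come from get_tag_lut().
    const std::unordered_map<std::string, std::string> lut{
        { "meta_kernel_id", "0" },
        { "src", "g_src" },
        { "weight", "g_wei" },
    };

    const std::string fragment =
        "//------------------ START KERNEL {{meta_kernel_id}} ---------------------\n"
        "// IN_0(src) {{src}}\n"
        "// IN_1(wei) {{weight}}\n";

    std::cout << replace_tags_sketch(fragment, lut);
    return 0;
}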

◆ get_window()

Window get_window ( ) const
override virtual

Reimplemented from IClKernelComponent.

Definition at line 51 of file ClDirectConvolutionKernelComponent.cpp.

References SharedVarLink::arg_id, arm_compute::auto_init_if_empty(), Padding2D::bottom, arm_compute::calculate_max_window(), arm_compute::ceil_to_multiple(), arm_compute::misc::shape_calculator::compute_deep_convolution_shape(), ClDirectConv2dKernelDescriptor::conv2d, TensorInfo::data_type(), Window::DimY, Window::DimZ, arm_compute::F32, arm_compute::FLOOR, Padding2D::left, arm_compute::test::validation::output_shape, Conv2dDescriptor::pad, TensorInfo::quantization_info(), Padding2D::right, Window::set(), arm_compute::test::validation::src_info, Conv2dDescriptor::stride, Padding2D::top, arm_compute::utils::cast::U, Size2D::x(), and Size2D::y().

Referenced by ClDirectConvolutionKernelComponent::ClDirectConvolutionKernelComponent().

52 {
53  const auto src_info = _blueprint->impl().get_kernel_argument_info(_src.arg_id);
54  const auto weight_info = _blueprint->impl().get_kernel_argument_info(_weight.arg_id);
55  auto dst_info = _blueprint->impl().get_kernel_argument_info(_blueprint->impl().get_dst_id());
56 
57  // Get dst shape
58  PadStrideInfo pad_stride_info
59  {
60  static_cast<unsigned int>(_desc.conv2d.stride.x()),
61  static_cast<unsigned int>(_desc.conv2d.stride.y()),
62  static_cast<unsigned int>(_desc.conv2d.pad.left),
63  static_cast<unsigned int>(_desc.conv2d.pad.right),
64  static_cast<unsigned int>(_desc.conv2d.pad.top),
65  static_cast<unsigned int>(_desc.conv2d.pad.bottom),
66  DimensionRoundingType::FLOOR /*default rounding type*/
67  };
68  TensorShape output_shape = misc::shape_calculator::compute_deep_convolution_shape(*src_info, *weight_info, pad_stride_info);
69 
70  // Output auto initialization if not yet initialized
71  auto_init_if_empty(*dst_info, output_shape,
72  1,
73  src_info->data_type(),
74  src_info->quantization_info());
75 
76  const unsigned int vec_size = std::min(static_cast<unsigned int>(dst_info->tensor_shape()[0]), 4u);
77  const unsigned int num_rows = (dst_info->tensor_shape()[0] > 16) ? ((src_info->data_type() == DataType::F32) ? 2U : 4U) : 1U;
78  // const unsigned int num_rows = 1;
79  // const unsigned int vec_size = tile_info.tile_dims.x();
80  // const unsigned int num_rows = tile_info.tile_dims.y();
81 
82  // Create and configure kernel window
83  Window win = calculate_max_window(output_shape, Steps(vec_size, num_rows));
84 
85  const size_t dim_y_collapsed = ceil_to_multiple(output_shape[1] * output_shape[2], num_rows);
86  win.set(Window::DimY, Window::Dimension(0, dim_y_collapsed, num_rows));
87  win.set(Window::DimZ, Window::Dimension(0, output_shape.total_size_upper(3), 1));
88 
89  return win;
90 }
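
As a worked example of the window set-up above, take an assumed F32 NHWC destination of 32 channels by 7 x 7 spatial positions with one batch: vec_size = min(32, 4) = 4, num_rows = 2 because the channel dimension exceeds 16 and the data type is F32, and the collapsed Y dimension becomes ceil_to_multiple(7 * 7, 2) = 50. The snippet re-implements only this arithmetic and does not call the library.

#include <algorithm>
#include <iostream>

int main()
{
    // Assumed F32 NHWC destination: 32 channels, 7 x 7 spatial, 1 batch.
    const unsigned int dst_c  = 32, dst_w = 7, dst_h = 7;
    const bool         is_f32 = true;

    const unsigned int vec_size = std::min(dst_c, 4u);
    const unsigned int num_rows = (dst_c > 16) ? (is_f32 ? 2u : 4u) : 1u;

    // ceil_to_multiple(dst_w * dst_h, num_rows)
    const unsigned int dim_y_collapsed = ((dst_w * dst_h + num_rows - 1) / num_rows) * num_rows;

    // Prints: vec_size=4 num_rows=2 dim_y_collapsed=50
    std::cout << "vec_size=" << vec_size << " num_rows=" << num_rows
              << " dim_y_collapsed=" << dim_y_collapsed << "\n";
    return 0;
}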

◆ name()

virtual std::string name ( ) const
inline override virtual

Implements IClKernelComponent.

Definition at line 64 of file ClDirectConvolutionKernelComponent.h.

References SharedVarLink::arg_id, and arm_compute::experimental::dynamic_fusion::to_string().

65  {
66  return "direct_convolution_" + to_string(_blueprint->impl().get_kernel_argument_info(_src.arg_id)->data_layout()) + "_" + std::to_string(id());
67  }

The documentation for this class was generated from the following files:
ClDirectConvolutionKernelComponent.h
ClDirectConvolutionKernelComponent.cpp