Compute Library
 23.11
arm_compute::experimental::dynamic_fusion Namespace Reference

Data Structures

class  ArgumentPack
 This is a generic class that packs the arguments of an operator. More...
 
struct  AuxMemoryInfo
 Memory information for tensors with MemoryType::Auxiliary. More...
 
class  CastAttributes
 Attributes are backend-agnostic parameters (in addition to the input/output tensors) of an operator. More...
 
class  ClampAttributes
 Attributes are backend-agnostic parameters (in addition to the input/output tensors) of an operator. More...
 
class  ClComponentActivation
 
class  ClComponentCast
 
class  ClComponentCastSettings
 Component specific settings. More...
 
class  ClComponentDepthwiseConv2d
 
class  ClComponentDepthwiseConv2dSettings
 Component specific settings. More...
 
class  ClComponentDirectConv2d
 
class  ClComponentDirectConv2dSettings
 Component specific settings. More...
 
class  ClComponentElementwiseBinary
 
class  ClComponentLogits1DMaxShiftExpSum
 Component to calculate max-shifted exponentials and their sum. More...
 
class  ClComponentLogits1DNorm
 Component to calculate the final step of the Softmax Layer where each logit value is multiplied by the inverse of the sum of the logits. More...
 
class  ClComponentMatMul
 
class  ClComponentPool2d
 
class  ClComponentReshape
 
class  ClComponentResize
 
class  ClComponentStore
 
class  ClKernelRuntime
 OpenCL runtime to run a single kernel. More...
 
class  ClTemplateActivation
 
class  ClTemplateCast
 
class  ClTemplateDepthwiseConv2d
 
class  ClTemplateDirectConv2d
 
class  ClTemplateElementwiseBinary
 
class  ClTemplateLogits1DMaxShiftExpSum
 
class  ClTemplateLogits1DNorm
 
class  ClTemplatePool2d
 
class  ClTemplateReshape
 
class  ClTemplateResize
 
class  ClTemplateStore
 
class  ClTemplateWriter
 Use a templated-string-based method to write kernel code. It stitches the component code templates together based on the valid fusion configuration. More...
 
class  ClWorkloadRuntime
 OpenCL runtime to run a workload. More...
 
class  Conv2dAttributes
 Attributes are backend-agnostic parameters (in addition to the input/output tensors) of an operator. More...
 
class  DependencyGraph
 A multi-input (tensors), multi-output (tensors) acyclic directed graph, represented as a doubly-linked adjacency list with the differentiation between source and destination. More...
 
class  DepthwiseConv2dAttributes
 Attributes are backend-agnostic parameters (in addition to the input/output tensors) of an operator. More...
 
class  ElementwiseBinaryCommonAttributes
 
class  GpuAdd
 Operator interface. More...
 
class  GpuCast
 Operator interface. More...
 
class  GpuCkwActivation
 
class  GpuCkwCast
 
class  GpuCkwComponentArgument
 The argument of a dynamic fusion component which can be either user tensor or virtual tensor. More...
 
class  GpuCkwDepthwiseConv2d
 
class  GpuCkwDirectConv2d
 
class  GpuCkwDriver
 Use Kernel Writer to write kernel code. Used by the dynamic_fusion module. More...
 
class  GpuCkwElementwiseBinary
 
class  GpuCkwKernelWriter
 Extended implementation of kernel writer for dynamic fusion. More...
 
class  GpuCkwMatMul
 
class  GpuCkwPool2d
 
class  GpuCkwResize
 
class  GpuCkwScopedKernelWriter
 Helper to automatically manage kernel writer ID space. More...
 
class  GpuCkwStore
 An interface used by ClTemplateWriter to write source code for a kernel component. More...
 
class  GpuCkwVariableTable
 A table of all the variables used in the kernel. More...
 
class  GpuClamp
 Operator interface. More...
 
class  GpuComponentServices
 Services that are used throughout the creation phase of workload code. More...
 
class  GpuConv2d
 Operator interface. More...
 
class  GpuDepthwiseConv2d
 Operator interface. More...
 
class  GpuElementwiseBinaryCommon
 Operator interface. More...
 
class  GpuKernelArgument
 Kernel argument information linked with its corresponding ITensorInfo. More...
 
struct  GpuKernelArgumentInfo
 Contain information required to set up a kernel argument at run time. More...
 
class  GpuKernelComponentFactory
 Factory class that creates new instances of IGpuKernelComponent by assigning new component ids. More...
 
class  GpuKernelComponentGraph
 A multi-input (tensors), multi-output (tensors) acyclic directed graph of gpu kernel components. Its main purposes are: More...
 
class  GpuKernelComponentGroup
 A group of gpu kernel components to be fused together. PRECONDITIONS: More...
 
class  GpuKernelComponentStream
 A linear sequence of component groups serialized from the GpuKernelComponentGraph. Each component group in the stream denotes a complete kernel that may consist of multiple components. More...
 
class  GpuKernelSourceCode
 Container of kernel code to be compiled and run in a GpuUnitWorkload. More...
 
class  GpuKernelVariableTable
 A table of all the variables used in the kernel. More...
 
class  GpuLogicalKernel
 A wrapper-processor of a GpuKernelComponentGroup. It adds the load (if any) and store components to the component group. The GpuLogicalKernel represents a complete kernel, and can proceed to invoke any kernel writer to generate the full kernel code. More...
 
class  GpuMatMul
 Operator interface. More...
 
class  GpuMatMulSettings
 Operator backend specific settings. More...
 
class  GpuMul
 Operator interface. More...
 
class  GpuOperatorGroup
 A linear sequence of operators to be fused in a workload. For the time being, this class is only used for validating operator fusion. INVARIANTS: More...
 
class  GpuOutput
 Operator interface. More...
 
class  GpuPool2d
 Operator interface. More...
 
class  GpuPool2dSettings
 Operator backend specific settings. More...
 
class  GpuReshape
 Operator interface. More...
 
class  GpuResize
 Operator interface. More...
 
class  GpuSigmoid
 Operator interface. More...
 
class  GpuSoftmax
 Operator interface. More...
 
class  GpuSub
 Operator interface. More...
 
class  GpuTanh
 Operator interface. More...
 
class  GpuUnitWorkload
 The atomic unit in a Gpu workload. More...
 
class  GpuWorkloadArgument
 Describes all the info related to a workload argument (tensor) in order to: More...
 
class  GpuWorkloadContext
 Provide context necessary for the creation and configuration of a workload. More...
 
class  GpuWorkloadSketch
 A descriptor of a workload of operators. More...
 
class  GpuWorkloadSourceCode
 Hold the generated kernel source code and other information required to compile and run the workload. More...
 
class  IGpuCkwComponentDriver
 An interface used by GpuCkwDriver to write source code for a kernel component. More...
 
class  IGpuKernelComponent
 An abstract interface of a component. More...
 
class  IGpuKernelWriter
 An interface that can write a gpu kernel. More...
 
class  IGpuTemplateComponentWriter
 An interface used by ClTemplateWriter to write source code for a kernel component. More...
 
class  KernelProperties
 Properties common to all kernel component types. More...
 
class  MatMulAttributes
 Attributes are backend-agnostic parameters (in addition to the input/output tensors) of an operator. More...
 
struct  MemoryDescriptor
 Descriptor of a workload tensor memory. More...
 
class  Operator
 An operator for the sole purpose of validating fusion. More...
 
class  Pool2dAttributes
 Attributes are backend-agnostic parameters (in addition to the input/output tensors) of an operator. More...
 
class  ReshapeAttributes
 Attributes are backend-agnostic parameters (in addition to the input/output tensors) of an operator. More...
 
class  ResizeAttributes
 Attributes are backend-agnostic parameters (in addition to the input/output tensors) of an operator. More...
 
class  SoftmaxAttributes
 Attributes are backend-agnostic parameters (in addition to the input/output tensors) of an operator. More...
 
struct  TagVal
 A tag value will substitute a tag in a string template during its instantiation. More...
 
struct  UnitWorkloadStage
 Describes when a unit workload is run. More...
 

Typedefs

using GpuTarget = ::arm_compute::GPUTarget
 Gpu Information such as the Gpu target (for example, G76) More...
 
using MemoryDescriptorMap = std::map< ITensorInfo::Id, MemoryDescriptor >
 A map from ITensorInfo to their corresponding MemoryDescriptor. More...
 
using TileContainer = std::vector< std::vector< std::string > >
 
using SamplerCreator = std::function< TensorTileSampler(GpuCkwScopedKernelWriter &, int32_t, int32_t)>
 
using Settings = ClComponentDepthwiseConv2dSettings
 
using ComponentId = int32_t
 Uniquely identifies a kernel component within a workload. More...
 
using GpuKernelArgumentList = std::map< ITensorInfo::Id, GpuKernelArgument >
 The argument list of a GpuKernelSourceCode. More...
 
using OperatorId = DependencyGraph::OperatorId
 
using UnitWorkloadId = int32_t
 Uniquely identifies a GpuUnitWorkload within a GpuWorkloadSourceCode. More...
 
using Tag = std::string
 A tag used in a string template is a placeholder string to be substituted by real values during template instantiation. More...
 
using TagLUT = std::unordered_map< Tag, TagVal >
 Tag lookup table. More...
 

Enumerations

enum  GpuLanguage { OpenCL, Unknown }
 Gpu Language. More...
 
enum  MemoryType { User = 0, Auxiliary, Virtual }
 Type of memory used by a workload tensor. More...
 
enum  GpuComponentType { Complex, Simple, Unfusable, Output }
 Component type in the context of fusion. Its main purpose is to inform the optimizer how to perform fusion. More...
 
enum  GpuOperatorType { Simple, Complex, Unfusable }
 Contain properties common to all operator types. More...
 

Functions

void cl_add_tensor_component_argument (cl::Kernel &kernel, unsigned int &idx, const ICLTensor *tensor, TensorComponentType component)
 Select a Compute Kernel Writer tensor component from a tensor and add to the kernel's arguments at the specified index idx. More...
 
void cl_add_buffer_argument (cl::Kernel &kernel, unsigned int &idx, const cl::Buffer &buffer)
 Add an OpenCL buffer object to the kernel's arguments at the specified index idx. More...
 
void cl_add_texture_argument (cl::Kernel &kernel, unsigned int &idx, const cl::Image &image)
 Add an OpenCL image object to the kernel's arguments at the specified index idx. More...
 
ckw::DataType to_ckw (DataType dt)
 
ckw::TensorShape to_ckw (const TensorShape &shape)
 
ckw::TensorDataLayout to_ckw (DataLayout dl)
 
ckw::TensorInfo to_ckw (const ITensorInfo &tensor_info)
 
TensorComponentType from_ckw (const ckw::TensorComponentType &component)
 
ckw::TensorStorageType to_ckw (const TensorStorageType &storage)
 
TensorStorageType from_ckw (const ckw::TensorStorageType &storage)
 
ckw::BinaryOp to_ckw (const ElementwiseBinaryCommonAttributes &attributes)
 
void load_src_dst_tiles_and_prepare_sampler (GpuCkwScopedKernelWriter &writer, GpuCkwComponentArgument *src, GpuCkwComponentArgument *dst, int32_t m0, int32_t n0, SamplerCreator create_sampler)
 Load src and dst tiles of dimension [m0, n0] only when not loaded and prepare the sampler. More...
 
void get_coord (GpuCkwScopedKernelWriter writer, TileOperand &coord, const TileOperand &gid, int32_t step_v, int32_t leftover_step_v, const std::string &prefix, const TileOperand &const_0)
 Get boundary aware coordinate along one axis. More...
 
TensorTileSampler create_boundary_aware_2d_sampler (GpuCkwScopedKernelWriter writer, TileOperand &gid_0, TileOperand &gid_1, int32_t dim0_v, int32_t dim1_v, int32_t n0_v, int32_t m0_v, const std::string prefix, TileOperand &const_0)
 Declare coordinate tiles "{prefix}_dim0_coord" and "{prefix}_dim1_coord", and create a boundary-aware sampler from a tile of size [n0, m0] against the overall dimensions [dim0, dim1]. The load and store of tile [n0, m0] will never be out of bounds of [dim0, dim1]. More...
 
bool operator== (const KernelProperties &config0, const KernelProperties &config1)
 
bool operator== (const GpuKernelArgumentInfo &info0, const GpuKernelArgumentInfo &info1)
 
bool operator== (const UnitWorkloadStage &stage0, const UnitWorkloadStage &stage1)
 
bool is_alloc_tensor (const ITensorInfo *tensor_info)
 Tensor should have backing memory. More...
 
bool is_noalloc_tensor (const ITensorInfo *tensor_info)
 Tensor should not have backing memory. More...
 
bool is_valid_tensor (const ITensorInfo *tensor_info)
 ITensorInfo has valid id More...
 
bool is_invalid_tensor (const ITensorInfo *tensor_info)
 ITensorInfo has invalid id More...
 
PoolingLayerInfo convert_pool_attr_to_pool_info (const Pool2dAttributes &pool_attr, bool mixed_precision=false, DataLayout data_layout=DataLayout::NHWC)
 Inline function to convert Pool2dAttributes to PoolingLayerInfo. More...
 

Variables

constexpr unsigned int vector_size_byte_opencl = 16
 

Typedef Documentation

◆ ComponentId

using ComponentId = int32_t

Uniquely identifies a kernel component within a workload.

Definition at line 37 of file Types.h.

◆ GpuKernelArgumentList

The argument list of a GpuKernelSourceCode.

Definition at line 47 of file GpuKernelSourceCode.h.

◆ GpuTarget

Gpu Information such as the Gpu target (for example, G76)

Definition at line 41 of file GpuWorkloadContext.h.

◆ MemoryDescriptorMap

A map from ITensorInfo to their corresponding MemoryDescriptor.

Definition at line 91 of file MemoryDescriptor.h.

◆ OperatorId

Definition at line 41 of file GpuOperatorGroup.h.

◆ SamplerCreator

using SamplerCreator = std::function<TensorTileSampler(GpuCkwScopedKernelWriter &, int32_t , int32_t )>

Definition at line 44 of file WriterHelper.h.

◆ Settings

◆ Tag

using Tag = std::string

A tag used in a string template is a placeholder string to be substituted by real values during template instantiation.

Definition at line 127 of file GpuKernelVariableTable.h.

◆ TagLUT

using TagLUT = std::unordered_map<Tag, TagVal>

Tag lookup table.

It is used to instantiate a string template

Definition at line 130 of file GpuKernelVariableTable.h.
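
To illustrate how a tag lookup table can drive template instantiation, here is a minimal, self-contained sketch. It does not use the library's TagVal struct or its actual placeholder syntax: the `{{tag}}` delimiters, the plain std::string values and the function name instantiate_template are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

// Simplified stand-ins: the library's TagVal is a struct; a plain string is used here.
using Tag    = std::string;
using TagLUT = std::unordered_map<Tag, std::string>;

// Replace every "{{tag}}" occurrence in code_template with its value from lut.
// Placeholders whose tag is missing from lut are left untouched.
std::string instantiate_template(const std::string &code_template, const TagLUT &lut)
{
    std::string out;
    std::size_t pos = 0;
    while (pos < code_template.size())
    {
        const std::size_t open = code_template.find("{{", pos);
        if (open == std::string::npos)
        {
            out.append(code_template, pos, std::string::npos);
            break;
        }
        const std::size_t close = code_template.find("}}", open + 2);
        if (close == std::string::npos)
        {
            // Unterminated placeholder: copy the rest verbatim.
            out.append(code_template, pos, std::string::npos);
            break;
        }
        out.append(code_template, pos, open - pos);
        const Tag  tag = code_template.substr(open + 2, close - open - 2);
        const auto it  = lut.find(tag);
        if (it != lut.end())
        {
            out += it->second;
        }
        else
        {
            out.append(code_template, open, close + 2 - open);
        }
        pos = close + 2;
    }
    return out;
}
```

With a table mapping "dst" to "out_tile" and "src" to "in_tile", the template `{{dst}} = {{src}} + 1;` would be instantiated as `out_tile = in_tile + 1;`.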

◆ TileContainer

using TileContainer = std::vector<std::vector<std::string> >

Definition at line 50 of file GpuCkwDirectConv2d.cpp.

◆ UnitWorkloadId

using UnitWorkloadId = int32_t

Uniquely identifies a GpuUnitWorkload within a GpuWorkloadSourceCode.

Definition at line 75 of file GpuWorkloadSourceCode.h.

Enumeration Type Documentation

◆ GpuComponentType

enum class GpuComponentType

Component type in the context of fusion. Its main purpose is to inform the optimizer how to perform fusion.

Enumerator
Complex 
Simple 
Unfusable 
Output 

Definition at line 42 of file Types.h.

{
    Complex,
    Simple,
    Unfusable,
    Output
};

◆ GpuLanguage

enum class GpuLanguage

Gpu Language.

Enumerator
OpenCL 
Unknown 

Definition at line 44 of file GpuWorkloadContext.h.

{
    OpenCL,
    Unknown
};

◆ GpuOperatorType

enum class GpuOperatorType

Contain properties common to all operator types.

Operator type in the context of fusion

Enumerator
Simple 

Simple operators are operators that:

  1. Have a 1-to-1 mapping between the input elements and output elements, like elementwise
  2. Have exactly 1 output
Complex 

Complex operators are operators that are not simple but are still fusable with simple ones.

Unfusable 

Unfusable operators are operators that cannot be fused with any other types of operators.

Definition at line 37 of file GpuOperatorProperties.h.

{
    /** Simple operators are operators that:
     *      1. Have a 1-to-1 mapping between the input elements and output elements, like elementwise
     *      2. Have exactly 1 output
     */
    Simple,
    /** Complex operators are operators that are not simple but are still fusable with simple ones
     */
    Complex,
    /** Unfusable operators are operators that cannot be fused with any other types of operators
     */
    Unfusable
};

◆ MemoryType

enum class MemoryType

Type of memory used by a workload tensor.

We can classify tensors in 2 dimensions: Topology (where they are in a workload) and Memory allocation.

Topology:
  - Argument tensors: "Outer" tensors exposed to the users as inputs and outputs (arguments)
  - Intermediate tensors: "Inner" tensors hidden from the users as links between operators
Memory allocation:
  - Alloc: Tensors that need to be allocated real backing memory
  - No-Alloc: Tensors that don't need to be allocated real backing memory

We end up with 3 MemoryType values based on the product of these two classifications:

         |  Argument  |  Intermediate
---------+------------+---------------
Alloc    |  User      |  Auxiliary
No-Alloc |  N/A       |  Virtual

Enumerator
User 

Memory coming directly from users, e.g. for argument tensors. Both User and Auxiliary are of Alloc type, since they require memory allocation.

Auxiliary 

Additional memory required by the workload tensor, e.g. for tensors holding temporary results between kernels.

Virtual 

Temporary tile which is not allocated as a whole tensor in memory. Virtual is of No-Alloc type, since it doesn't require memory allocation. It is mainly used at sketch time to link operators; there should be no Virtual tensors at runtime.

Definition at line 53 of file MemoryDescriptor.h.

{
    /** Both User and Auxiliary types are of Alloc type. Since they require memory allocation */
    User = 0, /**< Memory coming directly from users, e.g. for argument tensors */
    Auxiliary =
        1, /**< Additional memory required by the workload tensor, e.g. for tensors holding temporary results between kernels */
    /** Virtual type is of No-Alloc type. Since it doesn't require memory allocation */
    Virtual =
        2, /**< Temporary tile which is not allocated as a whole tensor in the memory. It is mainly used at sketch time to link operators; there should be no Virtual tensors at runtime */
};
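
The two-axis classification described for MemoryType can be written down as a small lookup. The following is a sketch only: the Topology and Allocation enums and the classify function are hypothetical helpers introduced here and are not part of the library.

```cpp
#include <cassert>
#include <optional>

// Mirrors the enumerators documented above.
enum class MemoryType
{
    User      = 0,
    Auxiliary = 1,
    Virtual   = 2
};

// Hypothetical helper enums for the two classification axes.
enum class Topology
{
    Argument,
    Intermediate
};
enum class Allocation
{
    Alloc,
    NoAlloc
};

// Returns the MemoryType for a combination, or std::nullopt for the
// (Argument, No-Alloc) combination, which the documentation marks as N/A.
std::optional<MemoryType> classify(Topology topology, Allocation allocation)
{
    if (allocation == Allocation::Alloc)
    {
        return topology == Topology::Argument ? MemoryType::User : MemoryType::Auxiliary;
    }
    if (topology == Topology::Intermediate)
    {
        return MemoryType::Virtual;
    }
    return std::nullopt;
}
```

Returning std::optional (C++17) keeps the N/A combination explicit instead of mapping it to an arbitrary enumerator.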

Function Documentation

◆ cl_add_buffer_argument()

void cl_add_buffer_argument ( cl::Kernel &  kernel,
unsigned int &  idx,
const cl::Buffer &  buffer 
)

Add an OpenCL buffer object to the kernel's arguments at the specified index idx.

Parameters
[in,out]kernelOpenCL kernel to configure with the provided argument.
[in,out]idxIndex at which to add the argument.
[in]bufferOpenCL buffer containing the tensor's data.

Definition at line 93 of file GpuCkwKernelArgumentsHelpers.cpp.

{
    kernel.setArg(idx++, buffer);
}

◆ cl_add_tensor_component_argument()

void cl_add_tensor_component_argument ( cl::Kernel &  kernel,
unsigned int &  idx,
const ICLTensor *  tensor,
TensorComponentType  component 
)

Select a Compute Kernel Writer tensor component from a tensor and add to the kernel's arguments at the specified index idx.

Parameters
[in,out]kernelOpenCL kernel to configure with the provided argument.
[in,out]idxIndex at which to add the argument.
[in]tensorTensor from which to access the tensor component.
[in]componentTensor component to select such as tensor dimensions, strides, etc.

Definition at line 33 of file GpuCkwKernelArgumentsHelpers.cpp.

{
    ARM_COMPUTE_ERROR_ON(tensor == nullptr);

    const auto *info    = tensor->info();
    const auto &strides = info->strides_in_bytes();

    switch (component)
    {
        case TensorComponentType::OffsetFirstElement:
            kernel.setArg<cl_uint>(idx++, info->offset_first_element_in_bytes());
            break;
        case TensorComponentType::Stride0:
            kernel.setArg<cl_uint>(idx++, strides[0]);
            break;
        case TensorComponentType::Stride1:
            kernel.setArg<cl_uint>(idx++, strides[1]);
            break;
        case TensorComponentType::Stride2:
            kernel.setArg<cl_uint>(idx++, strides[2]);
            break;
        case TensorComponentType::Stride3:
            kernel.setArg<cl_uint>(idx++, strides[3]);
            break;
        case TensorComponentType::Stride4:
            kernel.setArg<cl_uint>(idx++, strides[4]);
            break;
        case TensorComponentType::Dim0:
            kernel.setArg<cl_uint>(idx++, info->dimension(0));
            break;
        case TensorComponentType::Dim1:
            kernel.setArg<cl_uint>(idx++, info->dimension(1));
            break;
        case TensorComponentType::Dim2:
            kernel.setArg<cl_uint>(idx++, info->dimension(2));
            break;
        case TensorComponentType::Dim3:
            kernel.setArg<cl_uint>(idx++, info->dimension(3));
            break;
        case TensorComponentType::Dim4:
            kernel.setArg<cl_uint>(idx++, info->dimension(4));
            break;
        case TensorComponentType::Dim1xDim2:
            kernel.setArg<cl_uint>(idx++, info->dimension(1) * info->dimension(2));
            break;
        case TensorComponentType::Dim2xDim3:
            kernel.setArg<cl_uint>(idx++, info->dimension(2) * info->dimension(3));
            break;
        case TensorComponentType::Dim1xDim2xDim3:
            kernel.setArg<cl_uint>(idx++, info->dimension(1) * info->dimension(2) * info->dimension(3));
            break;
        case TensorComponentType::Unknown:
        default:
            ARM_COMPUTE_ERROR("Unknown tensor component");
    }
}

References ARM_COMPUTE_ERROR, ARM_COMPUTE_ERROR_ON, arm_compute::test::validation::info, and tensor.
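
The role of the in/out idx parameter is easiest to see with successive calls: each call writes one argument and advances the index for the next. The sketch below reproduces that pattern without OpenCL. FakeKernel, FakeTensorInfo, the reduced Component enum and add_component_argument are hypothetical stand-ins, not library types.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for cl::Kernel: records the value set at each argument index.
struct FakeKernel
{
    std::vector<std::uint32_t> args;

    void set_arg(unsigned int idx, std::uint32_t value)
    {
        if (idx >= args.size())
        {
            args.resize(idx + 1);
        }
        args[idx] = value;
    }
};

// Hypothetical stand-in for the tensor metadata read by the real function.
struct FakeTensorInfo
{
    std::array<std::uint32_t, 4> dims;
    std::array<std::uint32_t, 4> strides_in_bytes;
};

// Reduced subset of the components handled by the real switch.
enum class Component
{
    Stride1,
    Dim0,
    Dim1xDim2
};

// Select one component, set it at position idx, then advance idx (post-increment).
void add_component_argument(FakeKernel &kernel, unsigned int &idx, const FakeTensorInfo &info, Component component)
{
    switch (component)
    {
        case Component::Stride1:
            kernel.set_arg(idx++, info.strides_in_bytes[1]);
            break;
        case Component::Dim0:
            kernel.set_arg(idx++, info.dims[0]);
            break;
        case Component::Dim1xDim2:
            kernel.set_arg(idx++, info.dims[1] * info.dims[2]);
            break;
        default:
            throw std::invalid_argument("Unknown tensor component");
    }
}
```

Starting from idx = 0, three consecutive calls fill argument slots 0, 1 and 2 and leave idx at 3, ready for whichever argument is added next.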

◆ cl_add_texture_argument()

void cl_add_texture_argument ( cl::Kernel &  kernel,
unsigned int &  idx,
const cl::Image &  image 
)

Add an OpenCL image object to the kernel's arguments at the specified index idx.

Parameters
[in,out]kernelOpenCL kernel to configure with the provided argument.
[in,out]idxIndex at which to add the argument.
[in]imageOpenCL image containing the image's data.

Definition at line 98 of file GpuCkwKernelArgumentsHelpers.cpp.

{
    kernel.setArg(idx++, image);
}

◆ convert_pool_attr_to_pool_info()

PoolingLayerInfo arm_compute::experimental::dynamic_fusion::convert_pool_attr_to_pool_info ( const Pool2dAttributes &  pool_attr,
bool  mixed_precision = false,
DataLayout  data_layout = DataLayout::NHWC 
)
inline

Inline function to convert Pool2dAttributes to PoolingLayerInfo.

Definition at line 66 of file Utils.h.

{
    // Create PadStrideInfo
    const Size2D        stride  = pool_attr.stride();
    const Padding2D     padding = pool_attr.pad();
    const PadStrideInfo pad_stride(stride.x(), stride.y(), padding.left, padding.top,
                                   arm_compute::DimensionRoundingType::FLOOR);

    return PoolingLayerInfo(pool_attr.pool_type(), pool_attr.pool_size(), data_layout, pad_stride,
                            pool_attr.exclude_padding(), mixed_precision);
}

References arm_compute::cpu::data_layout, Pool2dAttributes::exclude_padding(), arm_compute::FLOOR, Padding2D::left, Pool2dAttributes::pad(), Pool2dAttributes::pool_size(), Pool2dAttributes::pool_type(), Pool2dAttributes::stride(), Padding2D::top, Size2D::x(), and Size2D::y().

Referenced by ClComponentPool2d::validate().

◆ create_boundary_aware_2d_sampler()

TensorTileSampler arm_compute::experimental::dynamic_fusion::create_boundary_aware_2d_sampler ( GpuCkwScopedKernelWriter  writer,
TileOperand &  gid_0,
TileOperand &  gid_1,
int32_t  dim0_v,
int32_t  dim1_v,
int32_t  n0_v,
int32_t  m0_v,
const std::string  prefix,
TileOperand &  const_0 
)
inline

Declare coordinate tiles "{prefix}_dim0_coord" and "{prefix}_dim1_coord", and create a boundary-aware sampler from a tile of size [n0, m0] against the overall dimensions [dim0, dim1]. The load and store of tile [n0, m0] will never be out of bounds of [dim0, dim1].

Parameters
[in,out]writerWriter
[in]gid_0Global work item id 0
[in]gid_1Global work item id 1
[in]dim0_vDimension 0
[in]dim1_vDimension 1
[in]n0_vTile size dimension 0
[in]m0_vTile size dimension 1
[in]prefixPrefix to all the tiles declared within this function
[in]const_0Constant tile of value 0
Returns
TensorTileSampler

Definition at line 137 of file WriterHelper.h.

{
    // Clamp tile size [n0, m0] against dimension [dim0, dim1]
    // This is needed to:
    // * Guard against tile sizes that are bigger than the tensor dimensions
    // * Handle broadcasting tiles (e.g. src tensor is of size 1 in one of the dimensions)
    n0_v                       = utility::clamp(n0_v, 1, dim0_v);
    m0_v                       = utility::clamp(m0_v, 1, dim1_v);
    const int32_t partial_n0_v = dim0_v % n0_v;
    const int32_t partial_m0_v = dim1_v % m0_v;

    // Declare #prefix_dim0_coord and #prefix_dim1_coord
    auto &dim0_coord = writer->declare_tile(prefix + "dim0_coord", ckw::DataType::Int32);
    get_coord(writer, dim0_coord, gid_0, n0_v, partial_n0_v, prefix + "dim0_", const_0);
    auto &dim1_coord = writer->declare_tile(prefix + "dim1_coord", ckw::DataType::Int32);
    get_coord(writer, dim1_coord, gid_1, m0_v, partial_m0_v, prefix + "dim1_", const_0);

    // Set sampler
    // Only set fields related to boundary-aware loading/storing. Other info (e.g. format) is not the responsibility of this function
    TensorTileSampler sampler;

    sampler.x(dim0_coord);
    sampler.y(dim1_coord);

    sampler.width(n0_v);
    sampler.height(m0_v);

    sampler.address_mode_x(TensorSamplerAddressModeX::None);
    sampler.address_mode_y(TensorSamplerAddressModeY::None);
    sampler.address_mode_z(TensorSamplerAddressModeZ::None);

    return sampler;
}

References arm_compute::utility::clamp() and get_coord().

Referenced by GpuCkwElementwiseBinary::write_component_code().
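
The clamping step at the top of this function can be checked on plain integers. The sketch below evaluates it on the host for one axis. TileExtent and clamp_tile are hypothetical names, and std::clamp is used in place of the library's utility::clamp.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical plain-data result; the real function configures a TensorTileSampler.
struct TileExtent
{
    std::int32_t size;    // clamped tile size along the axis
    std::int32_t partial; // leftover elements at the boundary (0 if the axis divides evenly)
};

// Clamp the requested tile size to [1, dim] and compute the leftover,
// following the opening statements of the function body above.
TileExtent clamp_tile(std::int32_t tile, std::int32_t dim)
{
    assert(dim >= 1);
    const std::int32_t size = std::clamp(tile, std::int32_t{1}, dim);
    return TileExtent{size, dim % size};
}
```

For example, a requested tile of 4 on an axis of length 10 stays 4 with a leftover of 2, a tile of 4 on an axis of length 3 shrinks to 3 with no leftover, and an axis of length 1 (the broadcasting case) always yields a tile of 1.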

◆ from_ckw() [1/2]

TensorComponentType arm_compute::experimental::dynamic_fusion::from_ckw ( const ckw::TensorComponentType &  component)
inline

Definition at line 94 of file Common.h.

{
    switch (component)
    {
        case ckw::TensorComponentType::OffsetFirstElement:
            return TensorComponentType::OffsetFirstElement;
        case ckw::TensorComponentType::Stride0:
            return TensorComponentType::Stride0;
        case ckw::TensorComponentType::Stride1:
            return TensorComponentType::Stride1;
        case ckw::TensorComponentType::Stride2:
            return TensorComponentType::Stride2;
        case ckw::TensorComponentType::Stride3:
            return TensorComponentType::Stride3;
        case ckw::TensorComponentType::Stride4:
            return TensorComponentType::Stride4;
        case ckw::TensorComponentType::Dim0:
            return TensorComponentType::Dim0;
        case ckw::TensorComponentType::Dim1:
            return TensorComponentType::Dim1;
        case ckw::TensorComponentType::Dim2:
            return TensorComponentType::Dim2;
        case ckw::TensorComponentType::Dim3:
            return TensorComponentType::Dim3;
        case ckw::TensorComponentType::Dim4:
            return TensorComponentType::Dim4;
        case ckw::TensorComponentType::Dim1xDim2:
            return TensorComponentType::Dim1xDim2;
        case ckw::TensorComponentType::Dim2xDim3:
            return TensorComponentType::Dim2xDim3;
        case ckw::TensorComponentType::Dim1xDim2xDim3:
            return TensorComponentType::Dim1xDim2xDim3;
        case ckw::TensorComponentType::Unknown:
            return TensorComponentType::Unknown;
        default:
            ARM_COMPUTE_ERROR("Unknown CKW tensor component");
            return TensorComponentType::Unknown;
    }
}

References ARM_COMPUTE_ERROR.

Referenced by GpuCkwDriver::get_kernel_arguments().

◆ from_ckw() [2/2]

TensorStorageType arm_compute::experimental::dynamic_fusion::from_ckw ( const ckw::TensorStorageType &  storage)
inline

Definition at line 151 of file Common.h.

{
    switch (storage)
    {
        case ckw::TensorStorageType::BufferUint8Ptr:
            return TensorStorageType::ClBufferUint8Ptr;
        case ckw::TensorStorageType::Texture2dReadOnly:
            return TensorStorageType::ClImage2dReadOnly;
        case ckw::TensorStorageType::Texture2dWriteOnly:
            return TensorStorageType::ClImage2dWriteOnly;
        case ckw::TensorStorageType::Unknown:
            return TensorStorageType::Unknown;
        default:
            ARM_COMPUTE_ERROR("Unknown CKW tensor storage type");
            return TensorStorageType::Unknown;
    }
}

References ARM_COMPUTE_ERROR.

◆ get_coord()

void arm_compute::experimental::dynamic_fusion::get_coord ( GpuCkwScopedKernelWriter  writer,
TileOperand &  coord,
const TileOperand &  gid,
int32_t  step_v,
int32_t  leftover_step_v,
const std::string &  prefix,
const TileOperand &  const_0 
)
inline

Get boundary aware coordinate along one axis.

Load and store of size step_v at the coordinate will not be out of bounds.

Parameters
[in,out]writerWriter
[out]coordResultant coordinate
[in]gidGlobal work item id
[in]step_vStep size / vector size
[in]leftover_step_vLeftover step size at the boundary
[in]prefixPrefix to all the tiles declared within this function
[in]const_0Constant tile of value 0

Definition at line 87 of file WriterHelper.h.

{
    auto &step          = writer->declare_tile(prefix + "step", step_v);
    auto &leftover_step = writer->declare_tile(prefix + "leftover_step", leftover_step_v);

    // step - leftover_step
    auto &step_minus_leftover = writer->declare_tile(prefix + "step_minus_leftover", ckw::DataType::Int32);
    writer->op_binary_expression(step_minus_leftover, step, ckw::BinaryOp::Sub, leftover_step);

    // (step - leftover_step) % step
    auto &coord_correction = writer->declare_tile(prefix + "coord_correction", ckw::DataType::Int32);
    writer->op_binary_expression(coord_correction, step_minus_leftover, ckw::BinaryOp::Mod, step);

    // (gid * step)
    auto &raw_coord = writer->declare_tile(prefix + "raw_coord", ckw::DataType::Int32);
    writer->op_binary_expression(raw_coord, gid, ckw::BinaryOp::Mul, step);

    // (gid * step) - (step - leftover_step) % step
    auto &corrected_coord = writer->declare_tile(prefix + "corrected_coord", ckw::DataType::Int32);
    writer->op_binary_expression(corrected_coord, raw_coord, ckw::BinaryOp::Sub, coord_correction);

    // max((gid * step) - (step - leftover_step) % step, 0)
    writer->op_binary_elementwise_function(coord, ckw::BinaryFunction::Max, corrected_coord, const_0);
}

References check_header_guards::prefix, and arm_compute::cpu::step.

Referenced by create_boundary_aware_2d_sampler(), GpuCkwPool2d::write_component_code(), GpuCkwDepthwiseConv2d::write_component_code(), GpuCkwMatMul::write_component_code(), and GpuCkwDirectConv2d::write_component_code().
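
The expression that get_coord() emits into the kernel can also be evaluated on the host to see why accesses stay in bounds. This is a sketch on plain integers; boundary_aware_coord is a hypothetical name and the function is not part of the library.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Host-side evaluation of the expression built up in the function body above:
//   max(gid * step - (step - leftover_step) % step, 0)
std::int32_t boundary_aware_coord(std::int32_t gid, std::int32_t step, std::int32_t leftover_step)
{
    assert(step > 0 && leftover_step >= 0 && leftover_step < step);
    // Zero when the axis divides evenly, otherwise the amount by which every
    // work item after the first is shifted back towards the origin.
    const std::int32_t correction = (step - leftover_step) % step;
    return std::max(gid * step - correction, std::int32_t{0});
}
```

For an axis of length 10 with step 4, the leftover is 10 % 4 = 2 and the correction is (4 - 2) % 4 = 2. Work items 0, 1 and 2 then map to coordinates 0, 2 and 6, so the accessed ranges are [0, 4), [2, 6) and [6, 10): the first two overlap, and none extends past the end of the axis. When the leftover is 0 the correction vanishes and the coordinates are simply gid * step.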

◆ is_alloc_tensor()

bool arm_compute::experimental::dynamic_fusion::is_alloc_tensor ( const ITensorInfo *  tensor_info)
inline

Tensor should have backing memory.

MemoryType

Definition at line 38 of file Utils.h.

{
    return tensor_info->id() > ITensorInfo::invalid_tensor_id;
}

References ITensorInfo::invalid_tensor_id, and tensor_info.

Referenced by GpuOutput::validate_op().

◆ is_invalid_tensor()

bool arm_compute::experimental::dynamic_fusion::is_invalid_tensor ( const ITensorInfo *  tensor_info)
inline

ITensorInfo has invalid id

Definition at line 59 of file Utils.h.

{
    return !is_valid_tensor(tensor_info);
}

References is_valid_tensor(), and tensor_info.

◆ is_noalloc_tensor()

bool arm_compute::experimental::dynamic_fusion::is_noalloc_tensor ( const ITensorInfo *  tensor_info)
inline

Tensor should not have backing memory.

MemoryType

Definition at line 45 of file Utils.h.

{
    return tensor_info->id() < ITensorInfo::invalid_tensor_id;
}

References ITensorInfo::invalid_tensor_id, and tensor_info.

◆ is_valid_tensor()

bool arm_compute::experimental::dynamic_fusion::is_valid_tensor ( const ITensorInfo *  tensor_info)
inline

ITensorInfo has valid id

Definition at line 52 of file Utils.h.

{
    return tensor_info->has_valid_id();
}

References tensor_info.

Referenced by is_invalid_tensor().
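
Taken together, the four predicates is_alloc_tensor, is_noalloc_tensor, is_valid_tensor and is_invalid_tensor partition tensor ids around the invalid sentinel. The sketch below shows that partition on plain integers. It rests on two assumptions that this page does not state: that the sentinel value is 0, and that has_valid_id() means "id differs from the sentinel". TensorId and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for ITensorInfo::Id and ITensorInfo::invalid_tensor_id.
// The value 0 for the invalid id is an assumption of this sketch.
using TensorId = std::int32_t;
constexpr TensorId invalid_tensor_id = 0;

// Ids above the sentinel denote tensors that should have backing memory.
constexpr bool is_alloc(TensorId id)
{
    return id > invalid_tensor_id;
}

// Ids below the sentinel denote tensors that should not have backing memory.
constexpr bool is_noalloc(TensorId id)
{
    return id < invalid_tensor_id;
}

// Assumed to match has_valid_id(): any id other than the sentinel.
constexpr bool is_valid(TensorId id)
{
    return id != invalid_tensor_id;
}

constexpr bool is_invalid(TensorId id)
{
    return !is_valid(id);
}
```

Under these assumptions every valid id is exactly one of alloc or no-alloc, and the sentinel itself is neither.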

◆ load_src_dst_tiles_and_prepare_sampler()

void arm_compute::experimental::dynamic_fusion::load_src_dst_tiles_and_prepare_sampler ( GpuCkwScopedKernelWriter &  writer,
GpuCkwComponentArgument *  src,
GpuCkwComponentArgument *  dst,
int32_t  m0,
int32_t  n0,
SamplerCreator  create_sampler 
)
inline

Load src and dst tiles of dimension [m0, n0] only when not loaded and prepare the sampler.

Definition at line 48 of file WriterHelper.h.

{
    if (!src->has_tile())
    {
        const auto sampler = create_sampler(writer, m0, n0);
        writer->op_load_once(src, sampler);
    }
    else
    {
        const auto &sampler = src->tile_sampler();
        writer->op_load_once(src, sampler);
    }

    auto       &src_tile = src->tile();
    const auto &sampler  = src->tile_sampler();

    // Prepare the output tile.
    if (!dst->has_tile())
    {
        auto &tile = writer->declare_tile("dst_tile", src_tile.tile_info());
        dst->init_virtual_tensor(tile, sampler);
    }
}

References arm_compute::test::validation::dst, GpuCkwKernelWriter::op_load_once(), arm_compute::test::validation::src, and arm_compute::test::validation::reference::tile().

Referenced by GpuCkwActivation::write_component_code().
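The control flow of this helper can be illustrated without the CKW types. The sketch below is a simplified model, not the library's code: MockArgument, MockSampler, MockSamplerCreator and load_src_and_prepare_dst are hypothetical names, and "loading" is reduced to setting a flag. It shows the two ideas in the listing: the sampler is created only when the source has no tile yet, and the destination reuses the source's sampler when it has no tile of its own.

```cpp
#include <cassert>
#include <functional>

// Hypothetical, simplified stand-ins for the CKW sampler and component argument.
struct MockSampler
{
    int m0;
    int n0;
};

struct MockArgument
{
    bool        loaded{false};
    MockSampler sampler{0, 0};

    bool has_tile() const { return loaded; }
};

using MockSamplerCreator = std::function<MockSampler(int, int)>;

// Load src only if it has no tile yet, then let dst share src's sampler.
inline void load_src_and_prepare_dst(MockArgument &src, MockArgument &dst, int m0, int n0,
                                     const MockSamplerCreator &create_sampler)
{
    if (!src.has_tile())
    {
        // The sampler is created lazily, on the first load only.
        src.sampler = create_sampler(m0, n0);
        src.loaded  = true;
    }

    if (!dst.has_tile())
    {
        // dst inherits the sampler (and, in this model, the tile shape) from src.
        dst.sampler = src.sampler;
        dst.loaded  = true;
    }
}

// Small usage example: the creator runs once, however often the helper is called.
inline void example_load_once()
{
    int calls = 0;
    const MockSamplerCreator creator = [&calls](int m0, int n0) { ++calls; return MockSampler{m0, n0}; };

    MockArgument src, dst;
    load_src_and_prepare_dst(src, dst, 4, 2, creator);
    load_src_and_prepare_dst(src, dst, 8, 8, creator);
    assert(calls == 1);
    assert(dst.sampler.m0 == 4 && dst.sampler.n0 == 2);
}
```

Passing the sampler factory as a callable lets each component decide how its sampler is built while the load-once logic stays shared.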

◆ operator==() [1/3]

bool operator== ( const GpuKernelArgumentInfo &  info0,
const GpuKernelArgumentInfo &  info1 
)

Definition at line 31 of file GpuKernelArgument.cpp.

32 {
33  return info0.type == info1.type;
34 }

References GpuKernelArgumentInfo::type.

◆ operator==() [2/3]

bool arm_compute::experimental::dynamic_fusion::operator== ( const KernelProperties &  config0,
const KernelProperties &  config1 
)
inline

Definition at line 56 of file IGpuKernelComponent.h.

57 {
58  return config0.stage() == config1.stage();
59 }

References KernelProperties::stage().

◆ operator==() [3/3]

bool arm_compute::experimental::dynamic_fusion::operator== ( const UnitWorkloadStage &  stage0,
const UnitWorkloadStage &  stage1 
)
inline

Definition at line 193 of file GpuWorkloadSourceCode.h.

194 {
195  return stage0.stage == stage1.stage;
196 }

References UnitWorkloadStage::stage.

◆ to_ckw() [1/6]

ckw::BinaryOp arm_compute::experimental::dynamic_fusion::to_ckw ( const ElementwiseBinaryCommonAttributes &  attributes)
inline

Definition at line 37 of file ElementwiseBinary.h.

38 {
39  switch (attributes.operation())
40  {
41  case ElementwiseBinaryCommonAttributes::ElementwiseOp::Add:
42  return ckw::BinaryOp::Add;
43  case ElementwiseBinaryCommonAttributes::ElementwiseOp::Sub:
44  return ckw::BinaryOp::Sub;
45  case ElementwiseBinaryCommonAttributes::ElementwiseOp::Div:
46  return ckw::BinaryOp::Div;
47  case ElementwiseBinaryCommonAttributes::ElementwiseOp::Mul:
48  return ckw::BinaryOp::Mul;
49  case ElementwiseBinaryCommonAttributes::ElementwiseOp::Min:
50  case ElementwiseBinaryCommonAttributes::ElementwiseOp::Max:
51  case ElementwiseBinaryCommonAttributes::ElementwiseOp::Power:
52  case ElementwiseBinaryCommonAttributes::ElementwiseOp::Prelu:
53  case ElementwiseBinaryCommonAttributes::ElementwiseOp::SquaredDiff:
54  default:
55  ARM_COMPUTE_ERROR("Cannot convert ElementwiseBinaryCommonAttributes to corresponding ckw::BinaryOp");
56  }
57 }

References ElementwiseBinaryCommonAttributes::Add, ARM_COMPUTE_ERROR, ElementwiseBinaryCommonAttributes::Div, ElementwiseBinaryCommonAttributes::Max, ElementwiseBinaryCommonAttributes::Min, ElementwiseBinaryCommonAttributes::Mul, ElementwiseBinaryCommonAttributes::operation(), ElementwiseBinaryCommonAttributes::Power, ElementwiseBinaryCommonAttributes::Prelu, ElementwiseBinaryCommonAttributes::SquaredDiff, and ElementwiseBinaryCommonAttributes::Sub.
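The listing maps only Add, Sub, Div and Mul; every other operation falls through to the error path. The sketch below models that partial mapping with hypothetical mirror enums (ElementwiseOp, BinaryOp) and a hypothetical function to_binary_op. It throws std::invalid_argument where the library calls ARM_COMPUTE_ERROR, which is an assumption made so that the example is self-contained.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical mirrors of the two enums; the real ones live in Compute Library and CKW.
enum class ElementwiseOp { Add, Sub, Div, Mul, Min, Max, Power, Prelu, SquaredDiff };
enum class BinaryOp { Add, Sub, Div, Mul };

// Partial mapping: only four operations have a counterpart, the rest are rejected.
inline BinaryOp to_binary_op(ElementwiseOp op)
{
    switch (op)
    {
        case ElementwiseOp::Add:
            return BinaryOp::Add;
        case ElementwiseOp::Sub:
            return BinaryOp::Sub;
        case ElementwiseOp::Div:
            return BinaryOp::Div;
        case ElementwiseOp::Mul:
            return BinaryOp::Mul;
        default:
            // Stand-in for ARM_COMPUTE_ERROR in the original listing.
            throw std::invalid_argument("Cannot convert elementwise operation to a binary op");
    }
}

// Convenience check a caller could run before attempting the conversion.
inline bool is_convertible(ElementwiseOp op)
{
    try
    {
        (void)to_binary_op(op);
        return true;
    }
    catch (const std::invalid_argument &)
    {
        return false;
    }
}

// Small usage example.
inline void example_partial_mapping()
{
    assert(to_binary_op(ElementwiseOp::Mul) == BinaryOp::Mul);
    assert(!is_convertible(ElementwiseOp::Prelu));
}
```

A caller that may receive Min, Max, Power, Prelu or SquaredDiff therefore needs to handle the failure rather than assume a result.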

◆ to_ckw() [2/6]

ckw::TensorInfo arm_compute::experimental::dynamic_fusion::to_ckw ( const ITensorInfo &  tensor_info)
inline

Definition at line 88 of file Common.h.

89 {
90  return ckw::TensorInfo{to_ckw(tensor_info.data_type()), to_ckw(tensor_info.tensor_shape()),
91  to_ckw(tensor_info.data_layout()), tensor_info.id()};
92 }

References tensor_info, and to_ckw().

◆ to_ckw() [3/6]

ckw::TensorShape arm_compute::experimental::dynamic_fusion::to_ckw ( const TensorShape &  shape)
inline

NOTE: Overflow danger. Use size_t?

Definition at line 67 of file Common.h.

68 {
69  ARM_COMPUTE_ERROR_ON(shape.num_max_dimensions < std::tuple_size<ckw::TensorShape>{});
70  ARM_COMPUTE_ERROR_ON(std::tuple_size<ckw::TensorShape>{} != 5);
71  /// NOTE: Overflow danger. Use size_t?
72  return ckw::TensorShape{static_cast<int32_t>(shape[0]), static_cast<int32_t>(shape[1]),
73  static_cast<int32_t>(shape[2]), static_cast<int32_t>(shape[3]),
74  static_cast<int32_t>(shape[4])};
75 }

References ARM_COMPUTE_ERROR_ON, and arm_compute::test::validation::shape.
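The overflow note refers to the unchecked static_cast from each dimension to int32_t. One way to make the narrowing explicit is a range check before the cast, sketched below. checked_narrow and to_shape5 are hypothetical helpers, not part of the library, and std::array stands in for both shape types.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>

// Hypothetical helper: narrow a size_t to int32_t, refusing values that do not fit.
inline std::int32_t checked_narrow(std::size_t value)
{
    const auto limit = static_cast<std::size_t>(std::numeric_limits<std::int32_t>::max());
    if (value > limit)
    {
        throw std::overflow_error("Dimension does not fit in int32_t");
    }
    return static_cast<std::int32_t>(value);
}

// Hypothetical 5D conversion using the checked cast for every dimension.
inline std::array<std::int32_t, 5> to_shape5(const std::array<std::size_t, 5> &dims)
{
    std::array<std::int32_t, 5> out{};
    for (std::size_t i = 0; i < dims.size(); ++i)
    {
        out[i] = checked_narrow(dims[i]);
    }
    return out;
}

// Small usage example with dimensions that fit comfortably.
inline void example_checked_shape()
{
    const std::array<std::size_t, 5> dims{{3, 224, 224, 1, 1}};
    const auto shape = to_shape5(dims);
    assert(shape[0] == 3 && shape[1] == 224 && shape[4] == 1);
}
```

Whether a check of this kind is worth its cost in the real conversion is a design decision for the library; the sketch only shows what the note is warning about.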

◆ to_ckw() [4/6]

ckw::TensorStorageType arm_compute::experimental::dynamic_fusion::to_ckw ( const TensorStorageType &  storage)
inline

Definition at line 134 of file Common.h.

135 {
136  switch (storage)
137  {
138  case TensorStorageType::ClBufferUint8Ptr:
139  return ckw::TensorStorageType::BufferUint8Ptr;
140  case TensorStorageType::ClImage2dReadOnly:
141  return ckw::TensorStorageType::Texture2dReadOnly;
142  case TensorStorageType::ClImage2dWriteOnly:
143  return ckw::TensorStorageType::Texture2dWriteOnly;
144  case TensorStorageType::Unknown:
145  return ckw::TensorStorageType::Unknown;
146  default:
147  ARM_COMPUTE_ERROR("Unknown tensor storage type");
148  return ckw::TensorStorageType::Unknown;
149  }
150 }

References ARM_COMPUTE_ERROR.

◆ to_ckw() [5/6]

ckw::TensorDataLayout arm_compute::experimental::dynamic_fusion::to_ckw ( DataLayout  dl)
inline

Definition at line 76 of file Common.h.

77 {
78  switch (dl)
79  {
80  case DataLayout::NHWC:
81  return ckw::TensorDataLayout::Nhwc;
82  case DataLayout::NDHWC:
83  return ckw::TensorDataLayout::Ndhwc;
84  default:
85  return ckw::TensorDataLayout::Unknown;
86  }
87 }

References dl, arm_compute::NDHWC, and arm_compute::NHWC.

◆ to_ckw() [6/6]

ckw::DataType arm_compute::experimental::dynamic_fusion::to_ckw ( DataType  dt)
inline

Definition at line 40 of file Common.h.

41 {
42  switch (dt)
43  {
44  case DataType::F32:
45  return ckw::DataType::Fp32;
46  case DataType::F16:
47  return ckw::DataType::Fp16;
48  case DataType::S32:
49  return ckw::DataType::Int32;
50  case DataType::S16:
51  return ckw::DataType::Int16;
52  case DataType::S8:
53  case DataType::QASYMM8_SIGNED:
54  return ckw::DataType::Int8;
55  case DataType::U32:
56  return ckw::DataType::Uint32;
57  case DataType::U16:
58  return ckw::DataType::Uint16;
59  case DataType::U8:
60  case DataType::QASYMM8:
61  return ckw::DataType::Uint8;
62  default:
63  return ckw::DataType::Unknown;
64  }
65 }

References dt, arm_compute::F16, arm_compute::F32, arm_compute::QASYMM8, arm_compute::QASYMM8_SIGNED, arm_compute::S16, arm_compute::S32, arm_compute::S8, arm_compute::U16, arm_compute::U32, and arm_compute::U8.

Referenced by GpuCkwVariableTable::declare_variable(), to_ckw(), GpuCkwElementwiseBinary::write_component_code(), GpuCkwCast::write_component_code(), GpuCkwPool2d::write_component_code(), GpuCkwDepthwiseConv2d::write_component_code(), GpuCkwDirectConv2d::write_component_code(), and GpuCkwMatMul::write_component_code().
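Unlike the BinaryOp overload, the data type overload is many-to-one and falls back to Unknown instead of raising an error: the quantized types collapse onto their underlying integer types, and anything unlisted becomes Unknown. A reduced sketch of that behaviour follows, using hypothetical mirror enums (SrcType, DstType) that cover only a few of the cases in the listing.

```cpp
#include <cassert>

// Hypothetical, reduced mirrors of DataType and ckw::DataType.
enum class SrcType { F32, S8, QASYMM8_SIGNED, U8, QASYMM8, BFLOAT16 };
enum class DstType { Unknown, Fp32, Int8, Uint8 };

// Many-to-one mapping with a silent Unknown fallback.
inline DstType to_dst_type(SrcType dt)
{
    switch (dt)
    {
        case SrcType::F32:
            return DstType::Fp32;
        case SrcType::S8:
        case SrcType::QASYMM8_SIGNED:
            return DstType::Int8; // quantization information is not carried over
        case SrcType::U8:
        case SrcType::QASYMM8:
            return DstType::Uint8;
        default:
            return DstType::Unknown; // the caller has to detect this case
    }
}

// Small usage example.
inline void example_type_mapping()
{
    assert(to_dst_type(SrcType::QASYMM8) == to_dst_type(SrcType::U8));
    assert(to_dst_type(SrcType::BFLOAT16) == DstType::Unknown);
}
```

Because the fallback is silent in this model, a caller has to compare the result against Unknown itself if unsupported types must be rejected.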

Variable Documentation

◆ vector_size_byte_opencl
