ArmNN
 25.02
ClBackend Class Reference

#include <ClBackend.hpp>

Inheritance diagram for ClBackend:
Collaboration diagram for ClBackend:

Classes

class  ClBackendCustomAllocatorMemoryRegion
 
class  ClBackendCustomAllocatorWrapper
 

Public Member Functions

 ClBackend ()
 
 ClBackend (std::shared_ptr< ICustomAllocator > allocator)
 
 ~ClBackend ()=default
 
const BackendId & GetId () const override
 
IBackendInternal::IMemoryManagerUniquePtr CreateMemoryManager () const override
 
IBackendInternal::IWorkloadFactoryPtr CreateWorkloadFactory (const IBackendInternal::IMemoryManagerSharedPtr &memoryManager=nullptr) const override
 
IBackendInternal::IWorkloadFactoryPtr CreateWorkloadFactory (TensorHandleFactoryRegistry &registry) const override
 
IWorkloadFactoryPtr CreateWorkloadFactory (const IMemoryManagerSharedPtr &memoryManager, const ModelOptions &modelOptions) const override
 
IWorkloadFactoryPtr CreateWorkloadFactory (class TensorHandleFactoryRegistry &tensorHandleFactoryRegistry, const ModelOptions &modelOptions) const override
 
IWorkloadFactoryPtr CreateWorkloadFactory (class TensorHandleFactoryRegistry &tensorHandleFactoryRegistry, const ModelOptions &modelOptions, MemorySourceFlags inputFlags, MemorySourceFlags outputFlags) const override
 
std::vector< ITensorHandleFactory::FactoryId > GetHandleFactoryPreferences () const override
 (Optional) Returns a vector of supported TensorHandleFactory ids in preference order. More...
 
void RegisterTensorHandleFactories (TensorHandleFactoryRegistry &registry) override
 (Optional) Register TensorHandleFactories. Either this method or CreateMemoryManager() together with IWorkloadFactory::CreateTensor() and IWorkloadFactory::CreateSubtensor() must be implemented. More...
 
void RegisterTensorHandleFactories (TensorHandleFactoryRegistry &registry, MemorySourceFlags inputFlags, MemorySourceFlags outputFlags) override
 (Optional) Register TensorHandleFactories. Either this method or CreateMemoryManager() together with IWorkloadFactory::CreateTensor() and IWorkloadFactory::CreateSubtensor() must be implemented. More...
 
IBackendInternal::IBackendContextPtr CreateBackendContext (const IRuntime::CreationOptions &) const override
 Create the runtime context of the backend. More...
 
IBackendInternal::IBackendProfilingContextPtr CreateBackendProfilingContext (const IRuntime::CreationOptions &, IBackendProfilingPtr &backendProfiling) override
 Create context specifically used for profiling interaction from backends. More...
 
IBackendInternal::ILayerSupportSharedPtr GetLayerSupport () const override
 
IBackendInternal::ILayerSupportSharedPtr GetLayerSupport (const ModelOptions &modelOptions) const override
 
OptimizationViews OptimizeSubgraphView (const SubgraphView &subgraph, const ModelOptions &modelOptions) const override
 
IBackendInternal::IBackendSpecificModelContextPtr CreateBackendSpecificModelContext (const ModelOptions &modelOptions) const override
 
std::unique_ptr< ICustomAllocator > GetDefaultAllocator () const override
 Returns the default memory allocator for the backend. More...
 
BackendCapabilities GetCapabilities () const override
 Returns a BackendCapability if the backend lists the capability. The BackendCapability must then be inspected to check whether or not that BackendCapability is supported. Otherwise returns an EmptyOptional if the BackendCapability is unlisted. More...
 
virtual bool UseCustomMemoryAllocator (std::shared_ptr< ICustomAllocator > allocator, armnn::Optional< std::string & > errMsg) override
 Signals the backend to use a custom memory allocator provided by the user. More...
 
virtual unsigned int GetNumberOfCacheFiles () const override
 Returns the number of files cached if backend supports caching. More...
 
- Public Member Functions inherited from IBackendInternal
 ~IBackendInternal () override=default
 Allow backends created by the factory function to be destroyed through IBackendInternal. More...
 
virtual OptimizationViews OptimizeSubgraphView (const SubgraphView &subgraph) const
 
bool SupportsTensorAllocatorAPI () const
 
ITensorHandleFactory::FactoryId GetBackwardCompatibleFavoriteHandleFactory ()
 

Static Public Member Functions

static const BackendId & GetIdStatic ()
 
- Static Public Member Functions inherited from IBackendInternal
static constexpr BackendVersion GetApiVersion ()
 Returns the version of the Backend API. More...
 

Public Attributes

std::shared_ptr< ClBackendCustomAllocatorWrapper > m_CustomAllocator
 
bool m_UsingCustomAllocator = false
 

Additional Inherited Members

- Public Types inherited from IBackendInternal
using IWorkloadFactoryPtr = std::unique_ptr< IWorkloadFactory >
 
using IBackendContextPtr = std::unique_ptr< IBackendContext >
 
using IBackendProfilingContextPtr = std::shared_ptr< arm::pipe::IBackendProfilingContext >
 This is the bridge between backend and backend profiling; we'll keep it in the backend namespace. More...
 
using IBackendProfilingPtr = std::unique_ptr< arm::pipe::IBackendProfiling >
 
using ILayerSupportSharedPtr = std::shared_ptr< ILayerSupport >
 
using IBackendSpecificModelContextPtr = std::shared_ptr< IBackendModelContext >
 
using IMemoryManagerUniquePtr = std::unique_ptr< IMemoryManager >
 
using IMemoryManagerSharedPtr = std::shared_ptr< IMemoryManager >
 
- Protected Member Functions inherited from IBackendInternal
 IBackendInternal ()=default
 Creation must be done through a specific backend interface. More...
 
- Protected Member Functions inherited from IBackend
 IBackend ()
 
virtual ~IBackend ()
 

Detailed Description

Definition at line 24 of file ClBackend.hpp.

Constructor & Destructor Documentation

◆ ClBackend() [1/2]

ClBackend ( )
inline

Definition at line 27 of file ClBackend.hpp.

27 : m_CustomAllocator(nullptr) {};

◆ ClBackend() [2/2]

ClBackend ( std::shared_ptr< ICustomAllocator > allocator)
inline

Definition at line 28 of file ClBackend.hpp.

29  {
30  std::string err;
31  UseCustomMemoryAllocator(allocator, err);
32  }

References ClBackend::UseCustomMemoryAllocator().

◆ ~ClBackend()

~ClBackend ( )
default

Member Function Documentation

◆ CreateBackendContext()

IBackendInternal::IBackendContextPtr CreateBackendContext ( const IRuntime::CreationOptions & ) const
overridevirtual

Create the runtime context of the backend.

Implementations may return a default-constructed IBackendContextPtr if no context is needed at runtime. Implementations must throw BackendUnavailableException if the backend cannot be used (for example, necessary accelerator hardware is not present). The default implementation always returns a default-constructed pointer.

Reimplemented from IBackendInternal.

Definition at line 235 of file ClBackend.cpp.

236 {
237  return IBackendContextPtr{new ClBackendContext{options}};
238 }

◆ CreateBackendProfilingContext()

IBackendInternal::IBackendProfilingContextPtr CreateBackendProfilingContext ( const IRuntime::CreationOptions & creationOptions,
IBackendProfilingPtr & backendProfiling 
)
overridevirtual

Create context specifically used for profiling interaction from backends.

Reimplemented from IBackendInternal.

Definition at line 240 of file ClBackend.cpp.

242 {
243  return IBackendProfilingContextPtr{};
244 }

◆ CreateBackendSpecificModelContext()

IBackendInternal::IBackendSpecificModelContextPtr CreateBackendSpecificModelContext ( const ModelOptions & modelOptions) const
overridevirtual

Reimplemented from IBackendInternal.

Definition at line 246 of file ClBackend.cpp.

248 {
249  return IBackendSpecificModelContextPtr{new ClBackendModelContext{modelOptions}};
250 }

Referenced by ClBackend::CreateWorkloadFactory(), ClBackend::GetLayerSupport(), and ClBackend::OptimizeSubgraphView().

◆ CreateMemoryManager()

IBackendInternal::IMemoryManagerUniquePtr CreateMemoryManager ( ) const
overridevirtual

Reimplemented from IBackendInternal.

Definition at line 50 of file ClBackend.cpp.

51 {
52  if (m_UsingCustomAllocator)
53  {
54  return std::make_unique<ClMemoryManager>(m_CustomAllocator);
55  }
56  return std::make_unique<ClMemoryManager>(std::make_unique<arm_compute::CLBufferAllocator>());
57 }

References ClBackend::m_CustomAllocator, and ClBackend::m_UsingCustomAllocator.

◆ CreateWorkloadFactory() [1/5]

IBackendInternal::IWorkloadFactoryPtr CreateWorkloadFactory ( class TensorHandleFactoryRegistry & tensorHandleFactoryRegistry,
const ModelOptions & modelOptions 
) const
overridevirtual

Reimplemented from IBackendInternal.

Definition at line 101 of file ClBackend.cpp.

103 {
104  std::shared_ptr<ClMemoryManager> memoryManager;
105  if (m_UsingCustomAllocator)
106  {
107  memoryManager = std::make_shared<ClMemoryManager>(m_CustomAllocator);
108  }
109  else
110  {
111  memoryManager = std::make_shared<ClMemoryManager>(std::make_unique<arm_compute::CLBufferAllocator>());
112  }
113 
114  std::unique_ptr<ITensorHandleFactory> factory = std::make_unique<ClTensorHandleFactory>(memoryManager);
115  std::unique_ptr<ITensorHandleFactory> importFactory = std::make_unique<ClImportTensorHandleFactory>(
116  static_cast<MemorySourceFlags>(MemorySource::Malloc), static_cast<MemorySourceFlags>(MemorySource::Malloc));
117 
118  registry.RegisterCopyAndImportFactoryPair(factory->GetId(), importFactory->GetId());
119  registry.RegisterCopyAndImportFactoryPair(importFactory->GetId(), factory->GetId());
120 
121  registry.RegisterMemoryManager(memoryManager);
122  registry.RegisterFactory(std::move(factory));
123  registry.RegisterFactory(std::move(importFactory));
124 
125  return std::make_unique<ClWorkloadFactory>(
126  PolymorphicPointerDowncast<ClMemoryManager>(memoryManager), CreateBackendSpecificModelContext(modelOptions));
127 }

References ClBackend::CreateBackendSpecificModelContext(), ClBackend::m_CustomAllocator, ClBackend::m_UsingCustomAllocator, armnn::Malloc, TensorHandleFactoryRegistry::RegisterCopyAndImportFactoryPair(), TensorHandleFactoryRegistry::RegisterFactory(), and TensorHandleFactoryRegistry::RegisterMemoryManager().

◆ CreateWorkloadFactory() [2/5]

IBackendInternal::IWorkloadFactoryPtr CreateWorkloadFactory ( class TensorHandleFactoryRegistry & tensorHandleFactoryRegistry,
const ModelOptions & modelOptions,
MemorySourceFlags  inputFlags,
MemorySourceFlags  outputFlags 
) const
overridevirtual

Reimplemented from IBackendInternal.

Definition at line 129 of file ClBackend.cpp.

134 {
135  // To allow force import if inputFlags/outputFlags are Undefined, set it as Malloc
136  if (inputFlags == static_cast<MemorySourceFlags>(MemorySource::Undefined))
137  {
138  inputFlags = static_cast<MemorySourceFlags>(MemorySource::Malloc);
139  }
140  if (outputFlags == static_cast<MemorySourceFlags>(MemorySource::Undefined))
141  {
142  outputFlags = static_cast<MemorySourceFlags>(MemorySource::Malloc);
143  }
144  std::shared_ptr<ClMemoryManager> memoryManager;
145  if (m_UsingCustomAllocator)
146  {
147  memoryManager = std::make_shared<ClMemoryManager>(m_CustomAllocator);
148  }
149  else
150  {
151  memoryManager = std::make_shared<ClMemoryManager>(std::make_unique<arm_compute::CLBufferAllocator>());
152  }
153 
154  std::unique_ptr<ITensorHandleFactory> factory = std::make_unique<ClTensorHandleFactory>(memoryManager);
155  std::unique_ptr<ITensorHandleFactory> importFactory = std::make_unique<ClImportTensorHandleFactory>(
156  inputFlags, outputFlags);
157 
158  registry.RegisterCopyAndImportFactoryPair(factory->GetId(), importFactory->GetId());
159  registry.RegisterCopyAndImportFactoryPair(importFactory->GetId(), factory->GetId());
160 
161  registry.RegisterMemoryManager(memoryManager);
162  registry.RegisterFactory(std::move(factory));
163  registry.RegisterFactory(std::move(importFactory));
164 
165  return std::make_unique<ClWorkloadFactory>(
166  PolymorphicPointerDowncast<ClMemoryManager>(memoryManager), CreateBackendSpecificModelContext(modelOptions));
167 }

References ClBackend::CreateBackendSpecificModelContext(), ClBackend::m_CustomAllocator, ClBackend::m_UsingCustomAllocator, armnn::Malloc, TensorHandleFactoryRegistry::RegisterCopyAndImportFactoryPair(), TensorHandleFactoryRegistry::RegisterFactory(), TensorHandleFactoryRegistry::RegisterMemoryManager(), and armnn::Undefined.

◆ CreateWorkloadFactory() [3/5]

IBackendInternal::IWorkloadFactoryPtr CreateWorkloadFactory ( const IBackendInternal::IMemoryManagerSharedPtr & memoryManager = nullptr) const
overridevirtual

Implements IBackendInternal.

Definition at line 59 of file ClBackend.cpp.

61 {
62  return std::make_unique<ClWorkloadFactory>(
63  PolymorphicPointerDowncast<ClMemoryManager>(memoryManager));
64 }

◆ CreateWorkloadFactory() [4/5]

IBackendInternal::IWorkloadFactoryPtr CreateWorkloadFactory ( const IMemoryManagerSharedPtr & memoryManager,
const ModelOptions & modelOptions 
) const
overridevirtual

Reimplemented from IBackendInternal.

Definition at line 66 of file ClBackend.cpp.

68 {
69  return std::make_unique<ClWorkloadFactory>(
70  PolymorphicPointerDowncast<ClMemoryManager>(memoryManager), CreateBackendSpecificModelContext(modelOptions));
71 }

References ClBackend::CreateBackendSpecificModelContext().

◆ CreateWorkloadFactory() [5/5]

IBackendInternal::IWorkloadFactoryPtr CreateWorkloadFactory ( TensorHandleFactoryRegistry & registry) const
overridevirtual

Reimplemented from IBackendInternal.

Definition at line 73 of file ClBackend.cpp.

75 {
76  std::shared_ptr<ClMemoryManager> memoryManager;
77  if (m_UsingCustomAllocator)
78  {
79  memoryManager = std::make_shared<ClMemoryManager>(m_CustomAllocator);
80  }
81  else
82  {
83  memoryManager = std::make_shared<ClMemoryManager>(std::make_unique<arm_compute::CLBufferAllocator>());
84  }
85 
86  std::unique_ptr<ITensorHandleFactory> factory = std::make_unique<ClTensorHandleFactory>(memoryManager);
87  std::unique_ptr<ITensorHandleFactory> importFactory = std::make_unique<ClImportTensorHandleFactory>(
88  static_cast<MemorySourceFlags>(MemorySource::Malloc), static_cast<MemorySourceFlags>(MemorySource::Malloc));
89 
90  registry.RegisterCopyAndImportFactoryPair(factory->GetId(), importFactory->GetId());
91  registry.RegisterCopyAndImportFactoryPair(importFactory->GetId(), factory->GetId());
92 
93  registry.RegisterMemoryManager(memoryManager);
94  registry.RegisterFactory(std::move(factory));
95  registry.RegisterFactory(std::move(importFactory));
96 
97  return std::make_unique<ClWorkloadFactory>(
98  PolymorphicPointerDowncast<ClMemoryManager>(memoryManager));
99 }

References ClBackend::m_CustomAllocator, ClBackend::m_UsingCustomAllocator, armnn::Malloc, TensorHandleFactoryRegistry::RegisterCopyAndImportFactoryPair(), TensorHandleFactoryRegistry::RegisterFactory(), and TensorHandleFactoryRegistry::RegisterMemoryManager().

◆ GetCapabilities()

BackendCapabilities GetCapabilities ( ) const
overridevirtual

Returns a BackendCapability if the backend lists the capability. The BackendCapability must then be inspected to check whether or not that BackendCapability is supported. Otherwise returns an EmptyOptional if the BackendCapability is unlisted.

Reimplemented from IBackendInternal.

Definition at line 275 of file ClBackend.cpp.

276 {
277  // add new capabilities here..
278  return BackendCapabilities ("GpuAcc",
279  {
280  {"NonConstWeights", true},
281  {"ProtectedContentAllocation", true},
282  {"ConstantTensorsAsInputs", true},
283  {"PreImportIOTensors", false},
284  {"ExternallyManagedMemory", true},
285  {"MultiAxisPacking", false},
286  {"SingleAxisPacking", true},
287  {"AllOrNothing", false},
288  {"HasFp16", arm_compute::CLKernelLibrary::get().fp16_supported()}
289  });
290 }
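As a usage sketch, BackendCapabilities is an alias for BackendOptions, so the returned list can be walked with the public BackendOptions accessors. The helper below is illustrative, not part of ArmNN:

#include <armnn/BackendOptions.hpp>
#include <cstddef>
#include <string>

// Sketch: look up one boolean capability in the list returned by
// GetCapabilities(). Returns false for unlisted capability names.
bool IsCapabilitySet(const armnn::BackendOptions& capabilities,
                     const std::string& name)
{
    for (std::size_t i = 0; i < capabilities.GetOptionCount(); ++i)
    {
        const auto& option = capabilities.GetOption(i);
        if (option.GetName() == name && option.GetValue().IsBool())
        {
            return option.GetValue().AsBool();
        }
    }
    return false; // unlisted capability
}

For example, IsCapabilitySet(clBackend.GetCapabilities(), "NonConstWeights") would return true for this backend.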

◆ GetDefaultAllocator()

std::unique_ptr< ICustomAllocator > GetDefaultAllocator ( ) const
overridevirtual

Returns the default memory allocator for the backend.

Returns
- Returns a unique pointer to the default allocator of the backend

Reimplemented from IBackendInternal.

Definition at line 270 of file ClBackend.cpp.

271 {
272  return std::make_unique<ClBackendDefaultAllocator>();
273 }

◆ GetHandleFactoryPreferences()

std::vector< ITensorHandleFactory::FactoryId > GetHandleFactoryPreferences ( ) const
overridevirtual

(Optional) Returns a vector of supported TensorHandleFactory ids in preference order.

Reimplemented from IBackendInternal.

Definition at line 169 of file ClBackend.cpp.

170 {
171  return std::vector<ITensorHandleFactory::FactoryId> {ClTensorHandleFactory::GetIdStatic(),
172  ClImportTensorHandleFactory::GetIdStatic()};
173 }

References ClImportTensorHandleFactory::GetIdStatic(), and ClTensorHandleFactory::GetIdStatic().

◆ GetId()

const BackendId& GetId ( ) const
inlineoverridevirtual

Implements IBackend.

Definition at line 36 of file ClBackend.hpp.

36 { return GetIdStatic(); }

References ClBackend::GetIdStatic().

◆ GetIdStatic()

const BackendId & GetIdStatic ( )
static

Definition at line 44 of file ClBackend.cpp.

45 {
46  static const BackendId s_Id{ClBackendId()};
47  return s_Id;
48 }

References armnn::ClBackendId().

Referenced by ClBackend::GetId().
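
In application code the static id ("GpuAcc") is what selects this backend at optimization time. A minimal sketch, assuming an already-built network and runtime; the helper name OptimizeForGpuAcc is illustrative, not part of ArmNN:

#include <armnn/ArmNN.hpp>
#include <vector>

// Sketch: place layers on the CL backend using the id that
// ClBackend::GetIdStatic() returns.
armnn::IOptimizedNetworkPtr OptimizeForGpuAcc(armnn::INetwork& network,
                                              armnn::IRuntime& runtime)
{
    // armnn::Compute::GpuAcc converts to the BackendId "GpuAcc".
    std::vector<armnn::BackendId> backends = { armnn::Compute::GpuAcc };
    return armnn::Optimize(network, backends, runtime.GetDeviceSpec());
}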

◆ GetLayerSupport() [1/2]

IBackendInternal::ILayerSupportSharedPtr GetLayerSupport ( ) const
overridevirtual

Implements IBackendInternal.

Definition at line 252 of file ClBackend.cpp.

253 {
254  static ILayerSupportSharedPtr layerSupport
255  {
256  new ClLayerSupport(IBackendInternal::IBackendSpecificModelContextPtr{})
257  };
258  return layerSupport;
259 }

◆ GetLayerSupport() [2/2]

IBackendInternal::ILayerSupportSharedPtr GetLayerSupport ( const ModelOptions & modelOptions) const
overridevirtual

Reimplemented from IBackendInternal.

Definition at line 261 of file ClBackend.cpp.

262 {
263  static ILayerSupportSharedPtr layerSupport
264  {
265  new ClLayerSupport(CreateBackendSpecificModelContext(modelOptions))
266  };
267  return layerSupport;
268 }

References ClBackend::CreateBackendSpecificModelContext().
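
Callers normally reach GetLayerSupport() through the public BackendHelper API rather than by instantiating the backend. A hedged sketch, assuming the armnn/BackendHelper.hpp interface of recent ArmNN releases:

#include <armnn/BackendHelper.hpp>
#include <armnn/Descriptors.hpp>
#include <armnn/Tensor.hpp>
#include <string>

// Sketch: ask the registered GpuAcc backend whether a ReLu activation
// with the given tensor infos is supported.
bool IsReluSupportedOnGpuAcc(const armnn::TensorInfo& input,
                             const armnn::TensorInfo& output)
{
    armnn::LayerSupportHandle handle =
        armnn::GetILayerSupportByBackendId(armnn::Compute::GpuAcc);
    if (!handle.IsBackendRegistered())
    {
        return false; // this ArmNN build has no GpuAcc backend
    }

    armnn::ActivationDescriptor descriptor;
    descriptor.m_Function = armnn::ActivationFunction::ReLu;

    std::string reason;
    return handle.IsActivationSupported(input, output, descriptor,
                                        armnn::Optional<std::string&>(reason));
}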

◆ GetNumberOfCacheFiles()

virtual unsigned int GetNumberOfCacheFiles ( ) const
inlineoverridevirtual

Returns the number of files cached if backend supports caching.

Returns
- Returns 0 if the backend does not support caching, otherwise the number of files cached

Reimplemented from IBackendInternal.

Definition at line 94 of file ClBackend.hpp.

94 { return 1; }
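
The single cache file reported here backs the CL cached-network feature, which callers opt into through GpuAcc model options at optimization time. A sketch; the option names come from ClBackendModelContext and should be verified against your ArmNN version:

#include <armnn/ArmNN.hpp>
#include <string>

// Sketch: request that the compiled CL network be cached to one file,
// so later runs can skip kernel compilation.
armnn::OptimizerOptionsOpaque MakeGpuAccCachingOptions(const std::string& cachePath)
{
    armnn::OptimizerOptionsOpaque options;
    options.AddModelOption(armnn::BackendOptions("GpuAcc",
    {
        { "SaveCachedNetwork",     true      },
        { "CachedNetworkFilePath", cachePath }
    }));
    return options;
}

The returned options would then be passed as the fourth argument to armnn::Optimize().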

◆ OptimizeSubgraphView()

OptimizationViews OptimizeSubgraphView ( const SubgraphView & subgraph,
const ModelOptions & modelOptions 
) const
overridevirtual

Reimplemented from IBackendInternal.

Definition at line 292 of file ClBackend.cpp.

294 {
295  OptimizationViews optimizationViews(modelOptions);
296 
297  auto it = subgraph.end();
298  bool isFastMathEnabled = false;
299  std::map<LayerGuid, Layer*> untouched;
300 
301  while (it != subgraph.begin())
302  {
303  --it;
304  Layer& base = *(PolymorphicDowncast<Layer*>(*it));
305  untouched.insert({base.GetGuid(), &base});
306  }
307 
308  it = subgraph.end();
309 #if defined(ARMCOMPUTECL_ENABLED)
310  IBackendSpecificModelContextPtr modelContextPtr = CreateBackendSpecificModelContext(modelOptions);
311 
312  if (modelContextPtr)
313  {
314  auto clModelOptions = dynamic_cast<ClBackendModelContext*>(modelContextPtr.get());
315  if (clModelOptions)
316  {
317  isFastMathEnabled = clModelOptions->IsFastMathEnabled();
318  }
319  }
320 #endif
321  while (it != subgraph.begin())
322  {
323  --it;
324  Layer& base = *(PolymorphicDowncast<Layer*>(*it));
325 
326  // Fuse activation into previous layer if supported by backend
327  if ((base.GetType() == LayerType::DepthwiseConvolution2d || base.GetType() == LayerType::Convolution2d
328  || base.GetType() == LayerType::BatchNormalization || base.GetType() == LayerType::FullyConnected
329  || base.GetType() == LayerType::Addition || base.GetType() == LayerType::Multiplication
330  || base.GetType() == LayerType::Subtraction || base.GetType() == LayerType::Division
331  || base.GetType() == LayerType::ElementwiseBinary)
332  && (base.GetAdditionalInformation<ActivationDescriptor>() == nullptr))
333  {
334  for (auto output = base.BeginOutputSlots(); output != base.EndOutputSlots(); ++output)
335  {
336  if (output->GetNumConnections() == 1)
337  {
338  for (auto&& childInput : output->GetConnections())
339  {
340  if ((childInput->GetOwningLayer().GetType() == LayerType::Activation) &&
341  (checkDataTypeInputandOutput(childInput->GetOwningLayer())))
342  {
343  Layer& child = childInput->GetOwningLayer();
344 
345  auto* activationLayer = PolymorphicDowncast<ActivationLayer*>(&child);
346  // Before we proceed make sure that this activation layer is in the subgraph. It could be
347  // the first layer in the next subgraph.
348  if (untouched.find(activationLayer->GetGuid()) == untouched.end())
349  {
350  // We can't fuse a layer that's outside the subgraph.
351  break;
352  }
353 
354  const std::string name = std::string("fused-") + child.GetName() + std::string("-into-") +
355  base.GetName();
356 
357  // Get params from activation layer
358  ActivationDescriptor activationDesc = activationLayer->GetParameters();
359 
360  if (base.GetType() == LayerType::Convolution2d)
361  {
362  Convolution2dLayer* baseLayer = PolymorphicDowncast<Convolution2dLayer*>(&base);
363 
364  Optional<TensorInfo> biases;
365 
366  if (baseLayer->GetParameters().m_BiasEnabled)
367  {
368  biases = baseLayer->GetInputSlot(2).GetConnectedOutputSlot()->GetTensorInfo();
369  }
370 
371  arm_compute::Status status = ClConvolution2dWorkloadValidate(
372  baseLayer->GetInputSlot(0).GetConnectedOutputSlot()->GetTensorInfo(),
373  activationLayer->GetInputSlot(0).GetConnectedOutputSlot()->GetTensorInfo(),
374  baseLayer->GetParameters(),
375  baseLayer->GetInputSlot(1).GetConnectedOutputSlot()->GetTensorInfo(),
376  biases,
377  isFastMathEnabled,
378  &activationDesc);
379 
380  if (status)
381  {
382  FuseConvolution2dLayer<Convolution2dLayer>(optimizationViews,
383  baseLayer,
384  activationLayer,
385  activationDesc,
386  name);
387  untouched.erase(baseLayer->GetGuid());
388  untouched.erase(activationLayer->GetGuid());
389  }
390  }
391  else if (base.GetType() == LayerType::DepthwiseConvolution2d)
392  {
393  DepthwiseConvolution2dLayer* baseLayer =
394  PolymorphicDowncast<DepthwiseConvolution2dLayer*>(&base);
395 
396  Optional<TensorInfo> biases;
397 
398  if (baseLayer->GetParameters().m_BiasEnabled)
399  {
400  biases = baseLayer->GetInputSlot(2).GetConnectedOutputSlot()->GetTensorInfo();
401  }
402 
403  arm_compute::Status status = ClDepthwiseConvolutionWorkloadValidate(
404  baseLayer->GetInputSlot(0).GetConnectedOutputSlot()->GetTensorInfo(),
405  activationLayer->GetInputSlot(0).GetConnectedOutputSlot()->GetTensorInfo(),
406  baseLayer->GetParameters(),
407  baseLayer->GetInputSlot(1).GetConnectedOutputSlot()->GetTensorInfo(),
408  biases,
409  &activationDesc);
410 
411  if (status)
412  {
413  FuseDepthwiseConvolution2dLayer<DepthwiseConvolution2dLayer>(optimizationViews,
414  baseLayer,
415  activationLayer,
416  activationDesc,
417  name);
418  untouched.erase(baseLayer->GetGuid());
419  untouched.erase(activationLayer->GetGuid());
420  }
421  }
422  else if (base.GetType() == LayerType::FullyConnected)
423  {
424  FullyConnectedLayer* baseLayer = PolymorphicDowncast<FullyConnectedLayer*>(&base);
425  FullyConnectedDescriptor descriptor = baseLayer->GetParameters();
426 
427  // As bias is optional only try to get TensorInfo from input if bias is enabled.
428  Optional<TensorInfo> biases;
429  if (descriptor.m_BiasEnabled)
430  {
431  biases = baseLayer->GetInputSlot(2).GetConnectedOutputSlot()->GetTensorInfo();
432  }
433 
434  arm_compute::Status status = ClFullyConnectedWorkloadValidate(
435  baseLayer->GetInputSlot(0).GetConnectedOutputSlot()->GetTensorInfo(),
436  activationLayer->GetInputSlot(0).GetConnectedOutputSlot()->GetTensorInfo(),
437  baseLayer->GetInputSlot(1).GetConnectedOutputSlot()->GetTensorInfo(),
438  biases,
439  baseLayer->GetParameters(),
440  &activationDesc);
441 
442  if (status)
443  {
444  FuseFullyConnectedLayer<FullyConnectedLayer>(optimizationViews,
445  baseLayer,
446  activationLayer,
447  activationDesc,
448  name);
449  untouched.erase(baseLayer->GetGuid());
450  untouched.erase(activationLayer->GetGuid());
451  }
452  }
453  else if (base.GetType() == LayerType::BatchNormalization)
454  {
455  BatchNormalizationLayer* baseLayer =
456  PolymorphicDowncast<BatchNormalizationLayer*>(&base);
457 
458  arm_compute::Status status = ClBatchNormalizationValidate(
459  baseLayer->GetInputSlot(0).GetConnectedOutputSlot()->GetTensorInfo(),
460  activationLayer->GetInputSlot(0).GetConnectedOutputSlot()->GetTensorInfo(),
461  baseLayer->m_Mean->GetTensorInfo(),
462  baseLayer->m_Variance->GetTensorInfo(),
463  baseLayer->m_Beta->GetTensorInfo(),
464  baseLayer->m_Gamma->GetTensorInfo(),
465  baseLayer->GetParameters(),
466  &activationDesc);
467 
468  if (status)
469  {
470  BatchNormalizationLayer* replacementLayer =
471  FuseBatchNormalizationLayer<BatchNormalizationLayer>(optimizationViews,
472  baseLayer,
473  activationLayer,
474  activationDesc,
475  name);
476 
477  replacementLayer->m_Beta = std::move(baseLayer->m_Beta);
478  replacementLayer->m_Gamma = std::move(baseLayer->m_Gamma);
479  replacementLayer->m_Mean = std::move(baseLayer->m_Mean);
480  replacementLayer->m_Variance = std::move(baseLayer->m_Variance);
481 
482  untouched.erase(baseLayer->GetGuid());
483  untouched.erase(activationLayer->GetGuid());
484  }
485  }
486  else if (base.GetType() == LayerType::Addition)
487  {
488  AdditionLayer* baseLayer = PolymorphicDowncast<AdditionLayer*>(&base);
489 
490  arm_compute::Status status = ClAdditionValidate(
491  baseLayer->GetInputSlot(0).GetConnectedOutputSlot()->GetTensorInfo(),
492  baseLayer->GetInputSlot(1).GetConnectedOutputSlot()->GetTensorInfo(),
493  activationLayer->GetInputSlot(0).GetConnectedOutputSlot()->GetTensorInfo(),
494  &activationDesc);
495 
496  if (status)
497  {
498  FuseAdditionLayer<AdditionLayer>(optimizationViews,
499  baseLayer,
500  activationLayer,
501  activationDesc,
502  name);
503 
504  untouched.erase(baseLayer->GetGuid());
505  untouched.erase(activationLayer->GetGuid());
506  }
507  }
508  else if (base.GetType() == LayerType::Division)
509  {
510  DivisionLayer* baseLayer = PolymorphicDowncast<DivisionLayer*>(&base);
511 
512  arm_compute::Status status = ClDivisionWorkloadValidate(
513  baseLayer->GetInputSlot(0).GetConnectedOutputSlot()->GetTensorInfo(),
514  baseLayer->GetInputSlot(1).GetConnectedOutputSlot()->GetTensorInfo(),
515  activationLayer->GetInputSlot(0).GetConnectedOutputSlot()->GetTensorInfo(),
516  &activationDesc);
517 
518  if (status)
519  {
520  FuseDivisionLayer<DivisionLayer>(optimizationViews,
521  baseLayer,
522  activationLayer,
523  activationDesc,
524  name);
525  untouched.erase(baseLayer->GetGuid());
526  untouched.erase(activationLayer->GetGuid());
527  }
528  }
529  else if (base.GetType() == LayerType::Multiplication)
530  {
531  MultiplicationLayer* baseLayer = PolymorphicDowncast<MultiplicationLayer*>(&base);
532 
533  arm_compute::Status status = ClMultiplicationWorkloadValidate(
534  baseLayer->GetInputSlot(0).GetConnectedOutputSlot()->GetTensorInfo(),
535  baseLayer->GetInputSlot(1).GetConnectedOutputSlot()->GetTensorInfo(),
536  activationLayer->GetInputSlot(0).GetConnectedOutputSlot()->GetTensorInfo(),
537  &activationDesc);
538 
539  if (status)
540  {
541  FuseMultiplicationLayer<MultiplicationLayer>(optimizationViews,
542  baseLayer,
543  activationLayer,
544  activationDesc,
545  name);
546  untouched.erase(baseLayer->GetGuid());
547  untouched.erase(activationLayer->GetGuid());
548  }
549  }
550  else if (base.GetType() == LayerType::Subtraction)
551  {
552  SubtractionLayer* baseLayer = PolymorphicDowncast<SubtractionLayer*>(&base);
553 
554  arm_compute::Status status = ClSubtractionValidate(
555  baseLayer->GetInputSlot(0).GetConnectedOutputSlot()->GetTensorInfo(),
556  baseLayer->GetInputSlot(1).GetConnectedOutputSlot()->GetTensorInfo(),
557  activationLayer->GetInputSlot(0).GetConnectedOutputSlot()->GetTensorInfo(),
558  &activationDesc);
559 
560  if (status)
561  {
562  FuseSubtractionLayer<SubtractionLayer>(optimizationViews,
563  baseLayer,
564  activationLayer,
565  activationDesc,
566  name);
567  untouched.erase(baseLayer->GetGuid());
568  untouched.erase(activationLayer->GetGuid());
569  }
570  }
571  else if (base.GetType() == LayerType::ElementwiseBinary)
572  {
573  ElementwiseBinaryLayer* baseLayer = PolymorphicDowncast<ElementwiseBinaryLayer*>(&base);
574 
575  if (baseLayer->GetParameters().m_Operation == BinaryOperation::Add)
576  {
577  arm_compute::Status status = ClAdditionValidate(
578  baseLayer->GetInputSlot(0).GetConnectedOutputSlot()->GetTensorInfo(),
579  baseLayer->GetInputSlot(1).GetConnectedOutputSlot()->GetTensorInfo(),
580  activationLayer->GetInputSlot(0).GetConnectedOutputSlot()->GetTensorInfo(),
581  &activationDesc);
582 
583  if (status)
584  {
585  FuseElementwiseBinaryLayer<ElementwiseBinaryLayer>(optimizationViews,
586  baseLayer,
587  activationLayer,
588  activationDesc,
589  BinaryOperation::Add,
590  name);
591  untouched.erase(baseLayer->GetGuid());
592  untouched.erase(activationLayer->GetGuid());
593  }
594  }
595  else if (baseLayer->GetParameters().m_Operation == BinaryOperation::Div)
596  {
597  arm_compute::Status status = ClDivisionWorkloadValidate(
598  baseLayer->GetInputSlot(0).GetConnectedOutputSlot()->GetTensorInfo(),
599  baseLayer->GetInputSlot(1).GetConnectedOutputSlot()->GetTensorInfo(),
600  activationLayer->GetInputSlot(0).GetConnectedOutputSlot()->GetTensorInfo(),
601  &activationDesc);
602 
603  if (status)
604  {
605  FuseElementwiseBinaryLayer<ElementwiseBinaryLayer>(optimizationViews,
606  baseLayer,
607  activationLayer,
608  activationDesc,
609  BinaryOperation::Div,
610  name);
611  untouched.erase(baseLayer->GetGuid());
612  untouched.erase(activationLayer->GetGuid());
613  }
614  }
615  else if (baseLayer->GetParameters().m_Operation == BinaryOperation::Mul)
616  {
617  arm_compute::Status status = ClMultiplicationWorkloadValidate(
618  baseLayer->GetInputSlot(0).GetConnectedOutputSlot()->GetTensorInfo(),
619  baseLayer->GetInputSlot(1).GetConnectedOutputSlot()->GetTensorInfo(),
620  activationLayer->GetInputSlot(0).GetConnectedOutputSlot()->GetTensorInfo(),
621  &activationDesc);
622 
623  if (status)
624  {
625  FuseElementwiseBinaryLayer<ElementwiseBinaryLayer>(optimizationViews,
626  baseLayer,
627  activationLayer,
628  activationDesc,
629  BinaryOperation::Mul,
630  name);
631  untouched.erase(baseLayer->GetGuid());
632  untouched.erase(activationLayer->GetGuid());
633  }
634  }
635  else if (baseLayer->GetParameters().m_Operation == BinaryOperation::Sub)
636  {
637  arm_compute::Status status = ClSubtractionValidate(
638  baseLayer->GetInputSlot(0).GetConnectedOutputSlot()->GetTensorInfo(),
639  baseLayer->GetInputSlot(1).GetConnectedOutputSlot()->GetTensorInfo(),
640  activationLayer->GetInputSlot(0).GetConnectedOutputSlot()->GetTensorInfo(),
641  &activationDesc);
642 
643  if (status)
644  {
645  FuseElementwiseBinaryLayer<ElementwiseBinaryLayer>(optimizationViews,
646  baseLayer,
647  activationLayer,
648  activationDesc,
649  BinaryOperation::Sub,
650  name);
651  untouched.erase(baseLayer->GetGuid());
652  untouched.erase(activationLayer->GetGuid());
653  }
654  }
655  // No fusion available for other BinaryOperations
656  }
657  }
658  }
659  }
660  }
661  }
662 
663  // Separate reduce layer with multiple axes into multiple reduce layers with 1 axis.
664  if (base.GetType() == LayerType::Reduce)
665  {
666  ReduceLayer* baseLayer = PolymorphicDowncast<ReduceLayer*>(&base);
667  ReduceDescriptor reduceDescriptor = baseLayer->GetParameters();
668 
669  if (!reduceDescriptor.m_vAxis.empty() && reduceDescriptor.m_vAxis.size() > 1)
670  {
671  // Add new layers to the graph and connect them.
672  std::vector<IConnectableLayer*> layers = ChainReduceLayers<ReduceLayer>(optimizationViews,
673  baseLayer,
674  reduceDescriptor);
675 
676  // Replace existing baselayer with new subgraph.
677  ReplaceLayers<ReduceLayer>(optimizationViews, baseLayer, layers);
678  untouched.erase(baseLayer->GetGuid());
679  }
680  }
681 
682  // Remove Reshape where possible
683  if (base.GetType() == LayerType::Reshape)
684  {
685  ReshapeLayer* baseLayer = PolymorphicDowncast<ReshapeLayer*>(&base);
686 
687  // Cannot remove a Reshape if it's connected to any layer that has an NCHW layout
688  if (ConnectedToLayerWithNCHW(baseLayer))
689  {
690  continue;
691  }
692  RemoveReshapeLayer(baseLayer, untouched, optimizationViews);
693  }
694 
695  // Special case to fuse padding into average pooling 2d for quantized datatype.
696  // Required to be done as a backend specific optimization as Neon does not support this special case.
697  if (base.GetType() == LayerType::Pooling2d)
698  {
699  Pooling2dLayer* baseLayer = PolymorphicDowncast<Pooling2dLayer*>(&base);
700  Pooling2dDescriptor poolingDescriptor = baseLayer->GetParameters();
701 
702  if (baseLayer->GetInputSlot(0).GetConnectedOutputSlot()->GetOwningLayer().GetType() == LayerType::Pad)
703  {
704  PadLayer* padLayer = PolymorphicDowncast<PadLayer*>(
705  &baseLayer->GetInputSlot(0).GetConnectedOutputSlot()->GetOwningLayer());
706  if (padLayer->GetOutputSlot(0).GetNumConnections() == 1 &&
707  optimizations::pad_fold::TryFoldPadIntoLayer2d(padLayer->GetParameters(),
708  poolingDescriptor,
709  padLayer->GetOutputSlot().GetTensorInfo(),
710  true))
711  {
712  FoldPadIntoAveragePool2d<Pooling2dLayer>(optimizationViews, baseLayer,
713  poolingDescriptor, padLayer);
714  untouched.erase(baseLayer->GetGuid());
715  untouched.erase(padLayer->GetGuid());
716  }
717  }
718  }
719  }
720 
721  if (optimizationViews.GetSubstitutions().empty() && optimizationViews.GetDeletedSubgraphs().empty())
722  {
723  optimizationViews.AddUntouchedSubgraph(SubgraphView(subgraph));
724  }
725  else
726  {
727  ReportUntouchedLayers(optimizationViews, untouched);
728  }
729 
730  return optimizationViews;
731 }

References armnn::Activation, armnn::Add, armnn::Addition, OptimizationViews::AddUntouchedSubgraph(), armnn::BatchNormalization, SubgraphView::begin(), Layer::BeginOutputSlots(), armnn::ClAdditionValidate(), armnn::ClBatchNormalizationValidate(), armnn::ClConvolution2dWorkloadValidate(), armnn::ClDepthwiseConvolutionWorkloadValidate(), armnn::ClDivisionWorkloadValidate(), armnn::ClFullyConnectedWorkloadValidate(), armnn::ClMultiplicationWorkloadValidate(), armnn::ClSubtractionValidate(), armnn::ConnectedToLayerWithNCHW(), armnn::Convolution2d, ClBackend::CreateBackendSpecificModelContext(), armnn::DepthwiseConvolution2d, armnn::Div, armnn::Division, armnn::ElementwiseBinary, SubgraphView::end(), Layer::EndOutputSlots(), armnn::FullyConnected, Layer::GetAdditionalInformation(), InputSlot::GetConnectedOutputSlot(), OptimizationViews::GetDeletedSubgraphs(), Layer::GetGuid(), Layer::GetInputSlot(), Layer::GetName(), OutputSlot::GetNumConnections(), Layer::GetOutputSlot(), OutputSlot::GetOwningLayer(), LayerWithParameters< Parameters >::GetParameters(), OptimizationViews::GetSubstitutions(), OutputSlot::GetTensorInfo(), Layer::GetType(), ClBackendModelContext::IsFastMathEnabled(), BatchNormalizationLayer::m_Beta, FullyConnectedDescriptor::m_BiasEnabled, Convolution2dDescriptor::m_BiasEnabled, DepthwiseConvolution2dDescriptor::m_BiasEnabled, BatchNormalizationLayer::m_Gamma, BatchNormalizationLayer::m_Mean, ElementwiseBinaryDescriptor::m_Operation, BatchNormalizationLayer::m_Variance, ReduceDescriptor::m_vAxis, armnn::Mul, armnn::Multiplication, armnn::Pad, armnn::Pooling2d, armnn::Reduce, armnn::RemoveReshapeLayer(), armnn::ReportUntouchedLayers(), armnn::Reshape, armnn::Sub, armnn::Subtraction, and armnn::optimizations::pad_fold::TryFoldPadIntoLayer2d().
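
The isFastMathEnabled flag consulted above comes from the GpuAcc model options. A sketch of turning it on, again via OptimizerOptionsOpaque; the enclosing helper is illustrative:

#include <armnn/ArmNN.hpp>

// Sketch: enable the fast-math path that OptimizeSubgraphView() reads via
// ClBackendModelContext::IsFastMathEnabled(). Fast math trades some
// precision for speed (e.g. Winograd convolution).
armnn::OptimizerOptionsOpaque MakeGpuAccFastMathOptions()
{
    armnn::OptimizerOptionsOpaque options;
    options.AddModelOption(armnn::BackendOptions("GpuAcc",
    {
        { "FastMathEnabled", true }
    }));
    return options;
}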

◆ RegisterTensorHandleFactories() [1/2]

void RegisterTensorHandleFactories ( TensorHandleFactoryRegistry & registry)
overridevirtual

(Optional) Register TensorHandleFactories. Either this method or CreateMemoryManager() together with IWorkloadFactory::CreateTensor() and IWorkloadFactory::CreateSubtensor() must be implemented.

Reimplemented from IBackendInternal.

Definition at line 175 of file ClBackend.cpp.

176 {
177  std::shared_ptr<ClMemoryManager> memoryManager;
178  if (m_UsingCustomAllocator)
179  {
180  memoryManager = std::make_shared<ClMemoryManager>(m_CustomAllocator);
181  }
182  else
183  {
184  memoryManager = std::make_shared<ClMemoryManager>(std::make_unique<arm_compute::CLBufferAllocator>());
185  }
186 
187  std::unique_ptr<ITensorHandleFactory> factory = std::make_unique<ClTensorHandleFactory>(memoryManager);
188  std::unique_ptr<ITensorHandleFactory> importFactory = std::make_unique<ClImportTensorHandleFactory>(
189  static_cast<MemorySourceFlags>(MemorySource::Malloc), static_cast<MemorySourceFlags>(MemorySource::Malloc));
190 
191  registry.RegisterCopyAndImportFactoryPair(factory->GetId(), importFactory->GetId());
192  registry.RegisterCopyAndImportFactoryPair(importFactory->GetId(), factory->GetId());
193 
194  registry.RegisterMemoryManager(memoryManager);
195  registry.RegisterFactory(std::move(factory));
196  registry.RegisterFactory(std::move(importFactory));
197 
198 }

References ClBackend::m_CustomAllocator, ClBackend::m_UsingCustomAllocator, armnn::Malloc, TensorHandleFactoryRegistry::RegisterCopyAndImportFactoryPair(), TensorHandleFactoryRegistry::RegisterFactory(), and TensorHandleFactoryRegistry::RegisterMemoryManager().

◆ RegisterTensorHandleFactories() [2/2]

void RegisterTensorHandleFactories ( TensorHandleFactoryRegistry & registry,
MemorySourceFlags  inputFlags,
MemorySourceFlags  outputFlags 
)
overridevirtual

(Optional) Register TensorHandleFactories. Either this method or CreateMemoryManager() together with IWorkloadFactory::CreateTensor() and IWorkloadFactory::CreateSubtensor() must be implemented.

Reimplemented from IBackendInternal.

Definition at line 200 of file ClBackend.cpp.

203 {
204  // To allow force import if inputFlags/outputFlags are Undefined, set it as Malloc
205  if (inputFlags == static_cast<MemorySourceFlags>(MemorySource::Undefined))
206  {
207  inputFlags = static_cast<MemorySourceFlags>(MemorySource::Malloc);
208  }
209  if (outputFlags == static_cast<MemorySourceFlags>(MemorySource::Undefined))
210  {
211  outputFlags = static_cast<MemorySourceFlags>(MemorySource::Malloc);
212  }
213  std::shared_ptr<ClMemoryManager> memoryManager;
214  if (m_UsingCustomAllocator)
215  {
216  memoryManager = std::make_shared<ClMemoryManager>(m_CustomAllocator);
217  }
218  else
219  {
220  memoryManager = std::make_shared<ClMemoryManager>(std::make_unique<arm_compute::CLBufferAllocator>());
221  }
222 
223  std::unique_ptr<ITensorHandleFactory> factory = std::make_unique<ClTensorHandleFactory>(memoryManager);
224  std::unique_ptr<ITensorHandleFactory> importFactory = std::make_unique<ClImportTensorHandleFactory>(
225  inputFlags, outputFlags);
226 
227  registry.RegisterCopyAndImportFactoryPair(factory->GetId(), importFactory->GetId());
228  registry.RegisterCopyAndImportFactoryPair(importFactory->GetId(), factory->GetId());
229 
230  registry.RegisterMemoryManager(memoryManager);
231  registry.RegisterFactory(std::move(factory));
232  registry.RegisterFactory(std::move(importFactory));
233 }

References ClBackend::m_CustomAllocator, ClBackend::m_UsingCustomAllocator, armnn::Malloc, TensorHandleFactoryRegistry::RegisterCopyAndImportFactoryPair(), TensorHandleFactoryRegistry::RegisterFactory(), TensorHandleFactoryRegistry::RegisterMemoryManager(), and armnn::Undefined.

◆ UseCustomMemoryAllocator()

virtual bool UseCustomMemoryAllocator ( std::shared_ptr< ICustomAllocator > allocator,
armnn::Optional< std::string & >  errMsg 
)
inlineoverridevirtual

Signals the backend to use a custom memory allocator provided by the user.

Parameters
allocator - a pointer to the provided ICustomAllocator to use with this backend
errMsg - Optional string variable to return error messages
Returns
- Returns true if switching to custom allocator was successful

Reimplemented from IBackendInternal.

Definition at line 82 of file ClBackend.hpp.

84  {
85  IgnoreUnused(errMsg);
86  ARMNN_LOG(info) << "Using Custom Allocator for ClBackend";
87 
88  // Set flag to signal the backend to use a custom memory allocator
89  m_CustomAllocator = std::make_shared<ClBackendCustomAllocatorWrapper>(std::move(allocator));
90  m_UsingCustomAllocator = true;
91  return m_UsingCustomAllocator;
92  }

References ARMNN_LOG, armnn::IgnoreUnused(), armnn::info, ClBackend::m_CustomAllocator, and ClBackend::m_UsingCustomAllocator.

Referenced by ClBackend::ClBackend().
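
End to end, a custom allocator usually reaches this method through the runtime rather than by a direct call. A minimal sketch, using a malloc-backed allocator purely for illustration; real deployments would typically supply dma-buf or protected memory:

#include <armnn/IRuntime.hpp>
#include <armnn/backends/ICustomAllocator.hpp>
#include <cstdlib>
#include <memory>

// Sketch: trivial host-memory implementation of ICustomAllocator.
class SampleClAllocator : public armnn::ICustomAllocator
{
public:
    void* allocate(size_t size, size_t alignment) override
    {
        (void)alignment; // alignment handling omitted for brevity
        return std::malloc(size);
    }
    void free(void* ptr) override { std::free(ptr); }
    armnn::MemorySource GetMemorySourceType() override
    {
        return armnn::MemorySource::Malloc;
    }
};

// Registering the allocator for "GpuAcc" makes the runtime invoke
// ClBackend::UseCustomMemoryAllocator() during backend creation.
armnn::IRuntimePtr MakeRuntimeWithCustomAllocator()
{
    armnn::IRuntime::CreationOptions options;
    options.m_CustomAllocatorMap["GpuAcc"] = std::make_shared<SampleClAllocator>();
    return armnn::IRuntime::Create(options);
}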

Member Data Documentation

◆ m_CustomAllocator

std::shared_ptr< ClBackendCustomAllocatorWrapper > ClBackend::m_CustomAllocator

Definition at line 283 of file ClBackend.hpp.

Referenced by ClBackend::CreateMemoryManager(), ClBackend::CreateWorkloadFactory(), ClBackend::RegisterTensorHandleFactories(), and ClBackend::UseCustomMemoryAllocator().

◆ m_UsingCustomAllocator

bool ClBackend::m_UsingCustomAllocator = false

Definition at line 284 of file ClBackend.hpp.

Referenced by ClBackend::CreateMemoryManager(), ClBackend::CreateWorkloadFactory(), ClBackend::RegisterTensorHandleFactories(), and ClBackend::UseCustomMemoryAllocator().

The documentation for this class was generated from the following files:

ClBackend.hpp
ClBackend.cpp