ArmNN 25.11
ClBackend Class Reference

#include <ClBackend.hpp>

Inheritance diagram for ClBackend:
Collaboration diagram for ClBackend:

Classes

class  ClBackendCustomAllocatorMemoryRegion
class  ClBackendCustomAllocatorWrapper

Public Member Functions

 ClBackend ()
 ClBackend (std::shared_ptr< ICustomAllocator > allocator)
 ~ClBackend ()=default
const BackendId & GetId () const override
IBackendInternal::IMemoryManagerUniquePtr CreateMemoryManager () const override
IBackendInternal::IWorkloadFactoryPtr CreateWorkloadFactory (const IBackendInternal::IMemoryManagerSharedPtr &memoryManager=nullptr) const override
IBackendInternal::IWorkloadFactoryPtr CreateWorkloadFactory (TensorHandleFactoryRegistry &registry) const override
IWorkloadFactoryPtr CreateWorkloadFactory (const IMemoryManagerSharedPtr &memoryManager, const ModelOptions &modelOptions) const override
IWorkloadFactoryPtr CreateWorkloadFactory (class TensorHandleFactoryRegistry &tensorHandleFactoryRegistry, const ModelOptions &modelOptions) const override
IWorkloadFactoryPtr CreateWorkloadFactory (class TensorHandleFactoryRegistry &tensorHandleFactoryRegistry, const ModelOptions &modelOptions, MemorySourceFlags inputFlags, MemorySourceFlags outputFlags) const override
std::vector< ITensorHandleFactory::FactoryId > GetHandleFactoryPreferences () const override
 (Optional) Returns a vector of supported TensorHandleFactory ids in preference order.
void RegisterTensorHandleFactories (TensorHandleFactoryRegistry &registry) override
 (Optional) Register TensorHandleFactories. Either this method, or CreateMemoryManager() together with IWorkloadFactory::CreateTensor() and IWorkloadFactory::CreateSubtensor(), must be implemented.
void RegisterTensorHandleFactories (TensorHandleFactoryRegistry &registry, MemorySourceFlags inputFlags, MemorySourceFlags outputFlags) override
 (Optional) Register TensorHandleFactories. Either this method, or CreateMemoryManager() together with IWorkloadFactory::CreateTensor() and IWorkloadFactory::CreateSubtensor(), must be implemented.
IBackendInternal::IBackendContextPtr CreateBackendContext (const IRuntime::CreationOptions &) const override
 Create the runtime context of the backend.
IBackendInternal::IBackendProfilingContextPtr CreateBackendProfilingContext (const IRuntime::CreationOptions &, IBackendProfilingPtr &backendProfiling) override
 Create context specifically used for profiling interaction from backends.
IBackendInternal::ILayerSupportSharedPtr GetLayerSupport () const override
IBackendInternal::ILayerSupportSharedPtr GetLayerSupport (const ModelOptions &modelOptions) const override
OptimizationViews OptimizeSubgraphView (const SubgraphView &subgraph, const ModelOptions &modelOptions) const override
IBackendInternal::IBackendSpecificModelContextPtr CreateBackendSpecificModelContext (const ModelOptions &modelOptions) const override
std::unique_ptr< ICustomAllocator > GetDefaultAllocator () const override
 Returns the default memory allocator for the backend.
BackendCapabilities GetCapabilities () const override
 Returns the capabilities listed by this backend; each listed BackendCapability must be inspected to check whether it is supported.
virtual bool UseCustomMemoryAllocator (std::shared_ptr< ICustomAllocator > allocator, armnn::Optional< std::string & > errMsg) override
 Signals the backend to use a custom memory allocator provided by the user.
virtual unsigned int GetNumberOfCacheFiles () const override
 Returns the number of files cached if the backend supports caching.
Public Member Functions inherited from IBackendInternal
 ~IBackendInternal () override=default
 Allow backends created by the factory function to be destroyed through IBackendInternal.
virtual OptimizationViews OptimizeSubgraphView (const SubgraphView &subgraph) const
bool SupportsTensorAllocatorAPI () const
ITensorHandleFactory::FactoryId GetBackwardCompatibleFavoriteHandleFactory ()

Static Public Member Functions

static const BackendId & GetIdStatic ()
Static Public Member Functions inherited from IBackendInternal
static constexpr BackendVersion GetApiVersion ()
 Returns the version of the Backend API.

Public Attributes

std::shared_ptr< ClBackendCustomAllocatorWrapper > m_CustomAllocator
bool m_UsingCustomAllocator = false

Additional Inherited Members

Public Types inherited from IBackendInternal
using IWorkloadFactoryPtr = std::unique_ptr<IWorkloadFactory>
using IBackendContextPtr = std::unique_ptr<IBackendContext>
using IBackendProfilingContextPtr = std::shared_ptr<arm::pipe::IBackendProfilingContext>
 This is the bridge between the backend and backend profiling; we keep it in the backend namespace.
using IBackendProfilingPtr = std::unique_ptr<arm::pipe::IBackendProfiling>
using ILayerSupportSharedPtr = std::shared_ptr<ILayerSupport>
using IBackendSpecificModelContextPtr = std::shared_ptr<IBackendModelContext>
using IMemoryManagerUniquePtr = std::unique_ptr<IMemoryManager>
using IMemoryManagerSharedPtr = std::shared_ptr<IMemoryManager>
Protected Member Functions inherited from IBackendInternal
 IBackendInternal ()=default
 Creation must be done through a specific backend interface.
Protected Member Functions inherited from IBackend
 IBackend ()
virtual ~IBackend ()

Detailed Description

Definition at line 24 of file ClBackend.hpp.

Constructor & Destructor Documentation

◆ ClBackend() [1/2]

ClBackend ( )
inline

Definition at line 27 of file ClBackend.hpp.

27 ClBackend() : m_CustomAllocator(nullptr) {};

References m_CustomAllocator.

◆ ClBackend() [2/2]

ClBackend ( std::shared_ptr< ICustomAllocator > allocator)
inline

Definition at line 28 of file ClBackend.hpp.

28 ClBackend(std::shared_ptr<ICustomAllocator> allocator)
29 {
30 std::string err;
31 UseCustomMemoryAllocator(allocator, err);
32 }

References UseCustomMemoryAllocator().
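A minimal construction sketch (not taken from the ArmNN sources); "MyClAllocator" is a hypothetical user-defined armnn::ICustomAllocator implementation:

// Sketch only. "MyClAllocator" is a hypothetical ICustomAllocator implementation.
std::shared_ptr<armnn::ICustomAllocator> allocator = std::make_shared<MyClAllocator>();

armnn::ClBackend defaultBackend;            // uses the default CL buffer allocator
armnn::ClBackend customBackend(allocator);  // forwards the allocator to UseCustomMemoryAllocator()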

◆ ~ClBackend()

~ClBackend ( )
default

Member Function Documentation

◆ CreateBackendContext()

IBackendInternal::IBackendContextPtr CreateBackendContext ( const IRuntime::CreationOptions & ) const
overridevirtual

Create the runtime context of the backend.

Implementations may return a default-constructed IBackendContextPtr if no context is needed at runtime. Implementations must throw BackendUnavailableException if the backend cannot be used (for example, necessary accelerator hardware is not present). The default implementation always returns a default-constructed pointer.

Reimplemented from IBackendInternal.

Definition at line 235 of file ClBackend.cpp.

236{
237 return IBackendContextPtr{new ClBackendContext{options}};
238}

◆ CreateBackendProfilingContext()

IBackendInternal::IBackendProfilingContextPtr CreateBackendProfilingContext ( const IRuntime::CreationOptions & creationOptions,
IBackendProfilingPtr & backendProfiling )
overridevirtual

Create context specifically used for profiling interaction from backends.

Reimplemented from IBackendInternal.

Definition at line 240 of file ClBackend.cpp.

242{
243 return IBackendProfilingContextPtr{};
244}

◆ CreateBackendSpecificModelContext()

IBackendInternal::IBackendSpecificModelContextPtr CreateBackendSpecificModelContext ( const ModelOptions & modelOptions) const
overridevirtual

Reimplemented from IBackendInternal.

Definition at line 246 of file ClBackend.cpp.

248{
249 return IBackendSpecificModelContextPtr{new ClBackendModelContext{modelOptions}};
250}

Referenced by CreateWorkloadFactory(), CreateWorkloadFactory(), CreateWorkloadFactory(), GetLayerSupport(), and OptimizeSubgraphView().

◆ CreateMemoryManager()

IBackendInternal::IMemoryManagerUniquePtr CreateMemoryManager ( ) const
overridevirtual

Reimplemented from IBackendInternal.

Definition at line 50 of file ClBackend.cpp.

51{
52 if (m_UsingCustomAllocator)
53 {
54 return std::make_unique<ClMemoryManager>(m_CustomAllocator);
55 }
56 return std::make_unique<ClMemoryManager>(std::make_unique<arm_compute::CLBufferAllocator>());
57}

References m_CustomAllocator, and m_UsingCustomAllocator.

◆ CreateWorkloadFactory() [1/5]

IBackendInternal::IWorkloadFactoryPtr CreateWorkloadFactory ( class TensorHandleFactoryRegistry & tensorHandleFactoryRegistry,
const ModelOptions & modelOptions ) const
overridevirtual

Reimplemented from IBackendInternal.

Definition at line 101 of file ClBackend.cpp.

103{
104 std::shared_ptr<ClMemoryManager> memoryManager;
105 if (m_UsingCustomAllocator)
106 {
107 memoryManager = std::make_shared<ClMemoryManager>(m_CustomAllocator);
108 }
109 else
110 {
111 memoryManager = std::make_shared<ClMemoryManager>(std::make_unique<arm_compute::CLBufferAllocator>());
112 }
113
114 std::unique_ptr<ITensorHandleFactory> factory = std::make_unique<ClTensorHandleFactory>(memoryManager);
115 std::unique_ptr<ITensorHandleFactory> importFactory = std::make_unique<ClImportTensorHandleFactory>(
116 static_cast<MemorySourceFlags>(MemorySource::Malloc), static_cast<MemorySourceFlags>(MemorySource::Malloc));
117
118 registry.RegisterCopyAndImportFactoryPair(factory->GetId(), importFactory->GetId());
119 registry.RegisterCopyAndImportFactoryPair(importFactory->GetId(), factory->GetId());
120
121 registry.RegisterMemoryManager(memoryManager);
122 registry.RegisterFactory(std::move(factory));
123 registry.RegisterFactory(std::move(importFactory));
124
125 return std::make_unique<ClWorkloadFactory>(
126 PolymorphicPointerDowncast<ClMemoryManager>(memoryManager), CreateBackendSpecificModelContext(modelOptions));
127}

References CreateBackendSpecificModelContext(), m_CustomAllocator, m_UsingCustomAllocator, armnn::Malloc, armnn::PolymorphicPointerDowncast(), TensorHandleFactoryRegistry::RegisterCopyAndImportFactoryPair(), TensorHandleFactoryRegistry::RegisterFactory(), and TensorHandleFactoryRegistry::RegisterMemoryManager().

◆ CreateWorkloadFactory() [2/5]

IBackendInternal::IWorkloadFactoryPtr CreateWorkloadFactory ( class TensorHandleFactoryRegistry & tensorHandleFactoryRegistry,
const ModelOptions & modelOptions,
MemorySourceFlags inputFlags,
MemorySourceFlags outputFlags ) const
overridevirtual

Reimplemented from IBackendInternal.

Definition at line 129 of file ClBackend.cpp.

134{
135 // To allow force import if inputFlags/outputFlags are Undefined, set it as Malloc
136 if (inputFlags == static_cast<MemorySourceFlags>(MemorySource::Undefined))
137 {
138 inputFlags = static_cast<MemorySourceFlags>(MemorySource::Malloc);
139 }
140 if (outputFlags == static_cast<MemorySourceFlags>(MemorySource::Undefined))
141 {
142 outputFlags = static_cast<MemorySourceFlags>(MemorySource::Malloc);
143 }
144 std::shared_ptr<ClMemoryManager> memoryManager;
145 if (m_UsingCustomAllocator)
146 {
147 memoryManager = std::make_shared<ClMemoryManager>(m_CustomAllocator);
148 }
149 else
150 {
151 memoryManager = std::make_shared<ClMemoryManager>(std::make_unique<arm_compute::CLBufferAllocator>());
152 }
153
154 std::unique_ptr<ITensorHandleFactory> factory = std::make_unique<ClTensorHandleFactory>(memoryManager);
155 std::unique_ptr<ITensorHandleFactory> importFactory = std::make_unique<ClImportTensorHandleFactory>(
156 inputFlags, outputFlags);
157
158 registry.RegisterCopyAndImportFactoryPair(factory->GetId(), importFactory->GetId());
159 registry.RegisterCopyAndImportFactoryPair(importFactory->GetId(), factory->GetId());
160
161 registry.RegisterMemoryManager(memoryManager);
162 registry.RegisterFactory(std::move(factory));
163 registry.RegisterFactory(std::move(importFactory));
164
165 return std::make_unique<ClWorkloadFactory>(
166 PolymorphicPointerDowncast<ClMemoryManager>(memoryManager), CreateBackendSpecificModelContext(modelOptions));
167}

References CreateBackendSpecificModelContext(), m_CustomAllocator, m_UsingCustomAllocator, armnn::Malloc, armnn::PolymorphicPointerDowncast(), TensorHandleFactoryRegistry::RegisterCopyAndImportFactoryPair(), TensorHandleFactoryRegistry::RegisterFactory(), TensorHandleFactoryRegistry::RegisterMemoryManager(), and armnn::Undefined.

◆ CreateWorkloadFactory() [3/5]

IBackendInternal::IWorkloadFactoryPtr CreateWorkloadFactory ( const IBackendInternal::IMemoryManagerSharedPtr & memoryManager = nullptr) const
overridevirtual

Implements IBackendInternal.

Definition at line 59 of file ClBackend.cpp.

61{
62 return std::make_unique<ClWorkloadFactory>(
63 PolymorphicPointerDowncast<ClMemoryManager>(memoryManager));
64}

References armnn::PolymorphicPointerDowncast().
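A short usage sketch pairing CreateMemoryManager() with this overload; the IMemoryManagerUniquePtr returned by CreateMemoryManager() converts to the IMemoryManagerSharedPtr expected here:

armnn::ClBackend backend;

// The unique_ptr returned by CreateMemoryManager() converts implicitly to the shared_ptr parameter.
armnn::IBackendInternal::IMemoryManagerSharedPtr memoryManager = backend.CreateMemoryManager();
armnn::IBackendInternal::IWorkloadFactoryPtr workloadFactory = backend.CreateWorkloadFactory(memoryManager);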

◆ CreateWorkloadFactory() [4/5]

IBackendInternal::IWorkloadFactoryPtr CreateWorkloadFactory ( const IMemoryManagerSharedPtr & memoryManager,
const ModelOptions & modelOptions ) const
overridevirtual

Reimplemented from IBackendInternal.

Definition at line 66 of file ClBackend.cpp.

68{
69 return std::make_unique<ClWorkloadFactory>(
70 PolymorphicPointerDowncast<ClMemoryManager>(memoryManager), CreateBackendSpecificModelContext(modelOptions));
71}

References CreateBackendSpecificModelContext(), and armnn::PolymorphicPointerDowncast().

◆ CreateWorkloadFactory() [5/5]

IBackendInternal::IWorkloadFactoryPtr CreateWorkloadFactory ( TensorHandleFactoryRegistry & registry) const
overridevirtual

Reimplemented from IBackendInternal.

Definition at line 73 of file ClBackend.cpp.

75{
76 std::shared_ptr<ClMemoryManager> memoryManager;
77 if (m_UsingCustomAllocator)
78 {
79 memoryManager = std::make_shared<ClMemoryManager>(m_CustomAllocator);
80 }
81 else
82 {
83 memoryManager = std::make_shared<ClMemoryManager>(std::make_unique<arm_compute::CLBufferAllocator>());
84 }
85
86 std::unique_ptr<ITensorHandleFactory> factory = std::make_unique<ClTensorHandleFactory>(memoryManager);
87 std::unique_ptr<ITensorHandleFactory> importFactory = std::make_unique<ClImportTensorHandleFactory>(
88 static_cast<MemorySourceFlags>(MemorySource::Malloc), static_cast<MemorySourceFlags>(MemorySource::Malloc));
89
90 registry.RegisterCopyAndImportFactoryPair(factory->GetId(), importFactory->GetId());
91 registry.RegisterCopyAndImportFactoryPair(importFactory->GetId(), factory->GetId());
92
93 registry.RegisterMemoryManager(memoryManager);
94 registry.RegisterFactory(std::move(factory));
95 registry.RegisterFactory(std::move(importFactory));
96
97 return std::make_unique<ClWorkloadFactory>(
98 PolymorphicPointerDowncast<ClMemoryManager>(memoryManager));
99}

References m_CustomAllocator, m_UsingCustomAllocator, armnn::Malloc, armnn::PolymorphicPointerDowncast(), TensorHandleFactoryRegistry::RegisterCopyAndImportFactoryPair(), TensorHandleFactoryRegistry::RegisterFactory(), and TensorHandleFactoryRegistry::RegisterMemoryManager().
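A sketch of the registry-based path; as the definition above shows, the call also registers the CL copy and import tensor handle factories and the memory manager with the supplied registry:

armnn::ClBackend backend;
armnn::TensorHandleFactoryRegistry registry;

// Returns a ClWorkloadFactory bound to the memory manager registered above.
armnn::IBackendInternal::IWorkloadFactoryPtr factory = backend.CreateWorkloadFactory(registry);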

◆ GetCapabilities()

BackendCapabilities GetCapabilities ( ) const
overridevirtual

Returns the capabilities listed by this backend; each listed BackendCapability must be inspected to check whether it is supported.

Reimplemented from IBackendInternal.

Definition at line 275 of file ClBackend.cpp.

276{
277 // add new capabilities here..
278 return BackendCapabilities ("GpuAcc",
279 {
280 {"NonConstWeights", true},
281 {"ProtectedContentAllocation", true},
282 {"ConstantTensorsAsInputs", true},
283 {"PreImportIOTensors", false},
284 {"ExternallyManagedMemory", true},
285 {"MultiAxisPacking", false},
286 {"SingleAxisPacking", true},
287 {"AllOrNothing", false},
288 {"HasFp16", arm_compute::CLKernelLibrary::get().fp16_supported()}
289 });
290}
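BackendCapabilities is an alias for BackendOptions, so the returned object can be walked like any other option set. A hedged sketch (the exact BackendOptions accessor names should be checked against armnn/BackendOptions.hpp for the version in use):

armnn::ClBackend backend;
armnn::BackendCapabilities caps = backend.GetCapabilities();

// Walk the listed capabilities; boolean values indicate whether each one is supported.
for (size_t i = 0; i < caps.GetOptionCount(); ++i)
{
    const auto& option = caps.GetOption(i);
    if (option.GetName() == "NonConstWeights" && option.GetValue().IsBool())
    {
        bool nonConstWeightsSupported = option.GetValue().AsBool();
        // ...
    }
}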

◆ GetDefaultAllocator()

std::unique_ptr< ICustomAllocator > GetDefaultAllocator ( ) const
overridevirtual

Returns the default memory allocator for the backend.

Returns
- Returns a unique pointer to the default allocator of the backend.

Reimplemented from IBackendInternal.

Definition at line 270 of file ClBackend.cpp.

271{
272 return std::make_unique<ClBackendDefaultAllocator>();
273}

◆ GetHandleFactoryPreferences()

std::vector< ITensorHandleFactory::FactoryId > GetHandleFactoryPreferences ( ) const
overridevirtual

(Optional) Returns a vector of supported TensorHandleFactory ids in preference order.

Reimplemented from IBackendInternal.

Definition at line 169 of file ClBackend.cpp.

170{
171 return std::vector<ITensorHandleFactory::FactoryId> {ClTensorHandleFactory::GetIdStatic(),
172 ClImportTensorHandleFactory::GetIdStatic()};
173}

References ClImportTensorHandleFactory::GetIdStatic(), and ClTensorHandleFactory::GetIdStatic().

◆ GetId()

const BackendId & GetId ( ) const
inlineoverridevirtual

Implements IBackend.

Definition at line 36 of file ClBackend.hpp.

36{ return GetIdStatic(); }

References GetIdStatic().

◆ GetIdStatic()

const BackendId & GetIdStatic ( )
static

Definition at line 44 of file ClBackend.cpp.

45{
46 static const BackendId s_Id{ClBackendId()};
47 return s_Id;
48}

References armnn::ClBackendId().

Referenced by GetId().
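The static id is the same BackendId that GetId() returns on any instance; for this backend it is the "GpuAcc" id produced by armnn::ClBackendId(). A small sketch:

const armnn::BackendId& clId = armnn::ClBackend::GetIdStatic();
bool isGpuAcc = (clId == armnn::BackendId("GpuAcc"));   // expected to be true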

◆ GetLayerSupport() [1/2]

IBackendInternal::ILayerSupportSharedPtr GetLayerSupport ( ) const
overridevirtual

Implements IBackendInternal.

Definition at line 252 of file ClBackend.cpp.

253{
254 static ILayerSupportSharedPtr layerSupport
255 {
256 new ClLayerSupport(IBackendInternal::IBackendSpecificModelContextPtr{})
257 };
258 return layerSupport;
259}

◆ GetLayerSupport() [2/2]

IBackendInternal::ILayerSupportSharedPtr GetLayerSupport ( const ModelOptions & modelOptions) const
overridevirtual

Reimplemented from IBackendInternal.

Definition at line 261 of file ClBackend.cpp.

262{
263 static ILayerSupportSharedPtr layerSupport
264 {
265 new ClLayerSupport(CreateBackendSpecificModelContext(modelOptions))
266 };
267 return layerSupport;
268}

References CreateBackendSpecificModelContext().

◆ GetNumberOfCacheFiles()

virtual unsigned int GetNumberOfCacheFiles ( ) const
inlineoverridevirtual

Returns the number of files cached if the backend supports caching.

Returns
- Returns 0 if the backend does not support caching, otherwise the number of files cached.

Reimplemented from IBackendInternal.

Definition at line 94 of file ClBackend.hpp.

94{ return 1; }

◆ OptimizeSubgraphView()

OptimizationViews OptimizeSubgraphView ( const SubgraphView & subgraph,
const ModelOptions & modelOptions ) const
overridevirtual

Reimplemented from IBackendInternal.

Definition at line 292 of file ClBackend.cpp.

294{
295 OptimizationViews optimizationViews(modelOptions);
296
297 auto it = subgraph.end();
298 bool isFastMathEnabled = false;
299 std::map<LayerGuid, Layer*> untouched;
300
301 while (it != subgraph.begin())
302 {
303 --it;
304 Layer& base = *(PolymorphicDowncast<Layer*>(*it));
305 untouched.insert({base.GetGuid(), &base});
306 }
307
308 it = subgraph.end();
309#if defined(ARMCOMPUTECL_ENABLED)
310 IBackendInternal::IBackendSpecificModelContextPtr modelContextPtr = CreateBackendSpecificModelContext(modelOptions);
311
312 if (modelContextPtr)
313 {
314 auto clModelOptions = dynamic_cast<ClBackendModelContext*>(modelContextPtr.get());
315 if (clModelOptions)
316 {
317 isFastMathEnabled = clModelOptions->IsFastMathEnabled();
318 }
319 }
320#endif
321 while (it != subgraph.begin())
322 {
323 --it;
324 Layer& base = *(PolymorphicDowncast<Layer*>(*it));
325
326 // Fuse activation into previous layer if supported by backend
327 if ((base.GetType() == LayerType::DepthwiseConvolution2d || base.GetType() == LayerType::Convolution2d
328 || base.GetType() == LayerType::BatchNormalization || base.GetType() == LayerType::FullyConnected
329 || base.GetType() == LayerType::Addition || base.GetType() == LayerType::Multiplication
330 || base.GetType() == LayerType::Subtraction || base.GetType() == LayerType::Division
331 || base.GetType() == LayerType::ElementwiseBinary)
332 && (base.GetAdditionalInformation<ActivationDescriptor>() == nullptr))
333 {
334 for (auto output = base.BeginOutputSlots(); output != base.EndOutputSlots(); ++output)
335 {
336 if (output->GetNumConnections() == 1)
337 {
338 for (auto&& childInput : output->GetConnections())
339 {
340 if ((childInput->GetOwningLayer().GetType() == LayerType::Activation) &&
341 (checkDataTypeInputandOutput(childInput->GetOwningLayer())))
342 {
343 Layer& child = childInput->GetOwningLayer();
344
345 auto* activationLayer = PolymorphicDowncast<ActivationLayer*>(&child);
346 // Before we proceed make sure that this activation layer is in the subgraph. It could be
347 // the first layer in the next subgraph.
348 if (untouched.find(activationLayer->GetGuid()) == untouched.end())
349 {
350 // We can't fuse a layer that's outside the subgraph.
351 break;
352 }
353
354 const std::string name = std::string("fused-") + child.GetName() + std::string("-into-") +
355 base.GetName();
356
357 // Get params from activation layer
358 ActivationDescriptor activationDesc = activationLayer->GetParameters();
359
360 if (base.GetType() == LayerType::Convolution2d)
361 {
362 Convolution2dLayer* baseLayer = PolymorphicDowncast<Convolution2dLayer*>(&base);
363
364 Optional<TensorInfo> biases;
365
366 if (baseLayer->GetParameters().m_BiasEnabled)
367 {
368 biases = baseLayer->GetInputSlot(2).GetConnectedOutputSlot()->GetTensorInfo();
369 }
370
371 arm_compute::Status status = ClConvolution2dWorkloadValidate(
372 baseLayer->GetInputSlot(0).GetConnectedOutputSlot()->GetTensorInfo(),
373 activationLayer->GetInputSlot(0).GetConnectedOutputSlot()->GetTensorInfo(),
374 baseLayer->GetParameters(),
375 baseLayer->GetInputSlot(1).GetConnectedOutputSlot()->GetTensorInfo(),
376 biases,
377 isFastMathEnabled,
378 &activationDesc);
379
380 if (status)
381 {
382 FuseConvolution2dLayer<Convolution2dLayer>(optimizationViews,
383 baseLayer,
384 activationLayer,
385 activationDesc,
386 name);
387 untouched.erase(baseLayer->GetGuid());
388 untouched.erase(activationLayer->GetGuid());
389 }
390 }
391 else if (base.GetType() == LayerType::DepthwiseConvolution2d)
392 {
393 DepthwiseConvolution2dLayer* baseLayer =
394 PolymorphicDowncast<DepthwiseConvolution2dLayer*>(&base);
395
396 Optional<TensorInfo> biases;
397
398 if (baseLayer->GetParameters().m_BiasEnabled)
399 {
400 biases = baseLayer->GetInputSlot(2).GetConnectedOutputSlot()->GetTensorInfo();
401 }
402
403 arm_compute::Status status = ClDepthwiseConvolutionWorkloadValidate(
404 baseLayer->GetInputSlot(0).GetConnectedOutputSlot()->GetTensorInfo(),
405 activationLayer->GetInputSlot(0).GetConnectedOutputSlot()->GetTensorInfo(),
406 baseLayer->GetParameters(),
407 baseLayer->GetInputSlot(1).GetConnectedOutputSlot()->GetTensorInfo(),
408 biases,
409 &activationDesc);
410
411 if (status)
412 {
413 FuseDepthwiseConvolution2dLayer<DepthwiseConvolution2dLayer>(optimizationViews,
414 baseLayer,
415 activationLayer,
416 activationDesc,
417 name);
418 untouched.erase(baseLayer->GetGuid());
419 untouched.erase(activationLayer->GetGuid());
420 }
421 }
422 else if (base.GetType() == LayerType::FullyConnected)
423 {
424 FullyConnectedLayer* baseLayer = PolymorphicDowncast<FullyConnectedLayer*>(&base);
425 FullyConnectedDescriptor descriptor = baseLayer->GetParameters();
426
427 // As bias is optional only try to get TensorInfo from input if bias is enabled.
428 Optional<TensorInfo> biases;
429 if (descriptor.m_BiasEnabled)
430 {
431 biases = baseLayer->GetInputSlot(2).GetConnectedOutputSlot()->GetTensorInfo();
432 }
433
434 arm_compute::Status status = ClFullyConnectedWorkloadValidate(
435 baseLayer->GetInputSlot(0).GetConnectedOutputSlot()->GetTensorInfo(),
436 activationLayer->GetInputSlot(0).GetConnectedOutputSlot()->GetTensorInfo(),
437 baseLayer->GetInputSlot(1).GetConnectedOutputSlot()->GetTensorInfo(),
438 biases,
439 baseLayer->GetParameters(),
440 &activationDesc);
441
442 if (status)
443 {
444 FuseFullyConnectedLayer<FullyConnectedLayer>(optimizationViews,
445 baseLayer,
446 activationLayer,
447 activationDesc,
448 name);
449 untouched.erase(baseLayer->GetGuid());
450 untouched.erase(activationLayer->GetGuid());
451 }
452 }
453 else if (base.GetType() == LayerType::BatchNormalization)
454 {
455 BatchNormalizationLayer* baseLayer =
456 PolymorphicDowncast<BatchNormalizationLayer*>(&base);
457
458 arm_compute::Status status = ClBatchNormalizationValidate(
459 baseLayer->GetInputSlot(0).GetConnectedOutputSlot()->GetTensorInfo(),
460 activationLayer->GetInputSlot(0).GetConnectedOutputSlot()->GetTensorInfo(),
461 baseLayer->m_Mean->GetTensorInfo(),
462 baseLayer->m_Variance->GetTensorInfo(),
463 baseLayer->m_Beta->GetTensorInfo(),
464 baseLayer->m_Gamma->GetTensorInfo(),
465 baseLayer->GetParameters(),
466 &activationDesc);
467
468 if (status)
469 {
470 BatchNormalizationLayer* replacementLayer =
471 FuseBatchNormalizationLayer<BatchNormalizationLayer>(optimizationViews,
472 baseLayer,
473 activationLayer,
474 activationDesc,
475 name);
476
477 replacementLayer->m_Beta = std::move(baseLayer->m_Beta);
478 replacementLayer->m_Gamma = std::move(baseLayer->m_Gamma);
479 replacementLayer->m_Mean = std::move(baseLayer->m_Mean);
480 replacementLayer->m_Variance = std::move(baseLayer->m_Variance);
481
482 untouched.erase(baseLayer->GetGuid());
483 untouched.erase(activationLayer->GetGuid());
484 }
485 }
486 else if (base.GetType() == LayerType::Addition)
487 {
488 AdditionLayer* baseLayer = PolymorphicDowncast<AdditionLayer*>(&base);
489
490 arm_compute::Status status = ClAdditionValidate(
491 baseLayer->GetInputSlot(0).GetConnectedOutputSlot()->GetTensorInfo(),
492 baseLayer->GetInputSlot(1).GetConnectedOutputSlot()->GetTensorInfo(),
493 activationLayer->GetInputSlot(0).GetConnectedOutputSlot()->GetTensorInfo(),
494 &activationDesc);
495
496 if (status)
497 {
498 FuseAdditionLayer<AdditionLayer>(optimizationViews,
499 baseLayer,
500 activationLayer,
501 activationDesc,
502 name);
503
504 untouched.erase(baseLayer->GetGuid());
505 untouched.erase(activationLayer->GetGuid());
506 }
507 }
508 else if (base.GetType() == LayerType::Division)
509 {
510 DivisionLayer* baseLayer = PolymorphicDowncast<DivisionLayer*>(&base);
511
512 arm_compute::Status status = ClDivisionWorkloadValidate(
513 baseLayer->GetInputSlot(0).GetConnectedOutputSlot()->GetTensorInfo(),
514 baseLayer->GetInputSlot(1).GetConnectedOutputSlot()->GetTensorInfo(),
515 activationLayer->GetInputSlot(0).GetConnectedOutputSlot()->GetTensorInfo(),
516 &activationDesc);
517
518 if (status)
519 {
520 FuseDivisionLayer<DivisionLayer>(optimizationViews,
521 baseLayer,
522 activationLayer,
523 activationDesc,
524 name);
525 untouched.erase(baseLayer->GetGuid());
526 untouched.erase(activationLayer->GetGuid());
527 }
528 }
529 else if (base.GetType() == LayerType::Multiplication)
530 {
531 MultiplicationLayer* baseLayer = PolymorphicDowncast<MultiplicationLayer*>(&base);
532
533 arm_compute::Status status = ClMultiplicationWorkloadValidate(
534 baseLayer->GetInputSlot(0).GetConnectedOutputSlot()->GetTensorInfo(),
535 baseLayer->GetInputSlot(1).GetConnectedOutputSlot()->GetTensorInfo(),
536 activationLayer->GetInputSlot(0).GetConnectedOutputSlot()->GetTensorInfo(),
537 &activationDesc);
538
539 if (status)
540 {
541 FuseMultiplicationLayer<MultiplicationLayer>(optimizationViews,
542 baseLayer,
543 activationLayer,
544 activationDesc,
545 name);
546 untouched.erase(baseLayer->GetGuid());
547 untouched.erase(activationLayer->GetGuid());
548 }
549 }
550 else if (base.GetType() == LayerType::Subtraction)
551 {
552 SubtractionLayer* baseLayer = PolymorphicDowncast<SubtractionLayer*>(&base);
553
554 arm_compute::Status status = ClSubtractionValidate(
555 baseLayer->GetInputSlot(0).GetConnectedOutputSlot()->GetTensorInfo(),
556 baseLayer->GetInputSlot(1).GetConnectedOutputSlot()->GetTensorInfo(),
557 activationLayer->GetInputSlot(0).GetConnectedOutputSlot()->GetTensorInfo(),
558 &activationDesc);
559
560 if (status)
561 {
562 FuseSubtractionLayer<SubtractionLayer>(optimizationViews,
563 baseLayer,
564 activationLayer,
565 activationDesc,
566 name);
567 untouched.erase(baseLayer->GetGuid());
568 untouched.erase(activationLayer->GetGuid());
569 }
570 }
571 else if (base.GetType() == LayerType::ElementwiseBinary)
572 {
573 ElementwiseBinaryLayer* baseLayer = PolymorphicDowncast<ElementwiseBinaryLayer*>(&base);
574
575 if (baseLayer->GetParameters().m_Operation == BinaryOperation::Add)
576 {
577 arm_compute::Status status = ClAdditionValidate(
578 baseLayer->GetInputSlot(0).GetConnectedOutputSlot()->GetTensorInfo(),
579 baseLayer->GetInputSlot(1).GetConnectedOutputSlot()->GetTensorInfo(),
580 activationLayer->GetInputSlot(0).GetConnectedOutputSlot()->GetTensorInfo(),
581 &activationDesc);
582
583 if (status)
584 {
585 FuseElementwiseBinaryLayer<ElementwiseBinaryLayer>(optimizationViews,
586 baseLayer,
587 activationLayer,
588 activationDesc,
589 BinaryOperation::Add,
590 name);
591 untouched.erase(baseLayer->GetGuid());
592 untouched.erase(activationLayer->GetGuid());
593 }
594 }
595 else if (baseLayer->GetParameters().m_Operation == BinaryOperation::Div)
596 {
597 arm_compute::Status status = ClDivisionWorkloadValidate(
598 baseLayer->GetInputSlot(0).GetConnectedOutputSlot()->GetTensorInfo(),
599 baseLayer->GetInputSlot(1).GetConnectedOutputSlot()->GetTensorInfo(),
600 activationLayer->GetInputSlot(0).GetConnectedOutputSlot()->GetTensorInfo(),
601 &activationDesc);
602
603 if (status)
604 {
605 FuseElementwiseBinaryLayer<ElementwiseBinaryLayer>(optimizationViews,
606 baseLayer,
607 activationLayer,
608 activationDesc,
609 BinaryOperation::Div,
610 name);
611 untouched.erase(baseLayer->GetGuid());
612 untouched.erase(activationLayer->GetGuid());
613 }
614 }
615 else if (baseLayer->GetParameters().m_Operation == BinaryOperation::Mul)
616 {
617 arm_compute::Status status = ClMultiplicationWorkloadValidate(
618 baseLayer->GetInputSlot(0).GetConnectedOutputSlot()->GetTensorInfo(),
619 baseLayer->GetInputSlot(1).GetConnectedOutputSlot()->GetTensorInfo(),
620 activationLayer->GetInputSlot(0).GetConnectedOutputSlot()->GetTensorInfo(),
621 &activationDesc);
622
623 if (status)
624 {
625 FuseElementwiseBinaryLayer<ElementwiseBinaryLayer>(optimizationViews,
626 baseLayer,
627 activationLayer,
628 activationDesc,
629 BinaryOperation::Mul,
630 name);
631 untouched.erase(baseLayer->GetGuid());
632 untouched.erase(activationLayer->GetGuid());
633 }
634 }
635 else if (baseLayer->GetParameters().m_Operation == BinaryOperation::Sub)
636 {
637 arm_compute::Status status = ClSubtractionValidate(
638 baseLayer->GetInputSlot(0).GetConnectedOutputSlot()->GetTensorInfo(),
639 baseLayer->GetInputSlot(1).GetConnectedOutputSlot()->GetTensorInfo(),
640 activationLayer->GetInputSlot(0).GetConnectedOutputSlot()->GetTensorInfo(),
641 &activationDesc);
642
643 if (status)
644 {
645 FuseElementwiseBinaryLayer<ElementwiseBinaryLayer>(optimizationViews,
646 baseLayer,
647 activationLayer,
648 activationDesc,
649 BinaryOperation::Sub,
650 name);
651 untouched.erase(baseLayer->GetGuid());
652 untouched.erase(activationLayer->GetGuid());
653 }
654 }
655 // No fusion available for other BinaryOperations
656 }
657 }
658 }
659 }
660 }
661 }
662
663 // Separate reduce layer with multiple axes into multiple reduce layers with 1 axis.
664 if (base.GetType() == LayerType::Reduce)
665 {
666 ReduceLayer* baseLayer = PolymorphicDowncast<ReduceLayer*>(&base);
667 ReduceDescriptor reduceDescriptor = baseLayer->GetParameters();
668
669 if (!reduceDescriptor.m_vAxis.empty() && reduceDescriptor.m_vAxis.size() > 1)
670 {
671 // Add new layers to the graph and connect them.
672 std::vector<IConnectableLayer*> layers = ChainReduceLayers<ReduceLayer>(optimizationViews,
673 baseLayer,
674 reduceDescriptor);
675
676 // Replace existing baselayer with new subgraph.
677 ReplaceLayers<ReduceLayer>(optimizationViews, baseLayer, layers);
678 untouched.erase(baseLayer->GetGuid());
679 }
680 }
681
682 // Remove Reshape where possible
683 if (base.GetType() == LayerType::Reshape)
684 {
685 ReshapeLayer* baseLayer = PolymorphicDowncast<ReshapeLayer*>(&base);
686
687 // Cannot remove a Reshape if it's connected to any layer that has an NCHW layout
688 if (ConnectedToLayerWithNCHW(baseLayer))
689 {
690 continue;
691 }
692 RemoveReshapeLayer(baseLayer, untouched, optimizationViews);
693 }
694 // Special case to fuse padding into average pooling 2d for quantized datatype.
695 // Required to be done as a backend specific optimization as Neon does not support this special case.
696 if (base.GetType() == LayerType::Pooling2d)
697 {
698 Pooling2dLayer* baseLayer = PolymorphicDowncast<Pooling2dLayer*>(&base);
699 Pooling2dDescriptor poolingDescriptor = baseLayer->GetParameters();
700 if (baseLayer->GetInputSlot(0).GetConnectedOutputSlot()->GetOwningLayer().GetType() == LayerType::Pad)
701 {
702 PadLayer* padLayer = PolymorphicDowncast<PadLayer*>(
703 &baseLayer->GetInputSlot(0).GetConnectedOutputSlot()->GetOwningLayer());
704 if (padLayer->GetOutputSlot(0).GetNumConnections() == 1 &&
705 optimizations::pad_fold::TryFoldPadIntoLayer2d(padLayer->GetParameters(),
706 poolingDescriptor,
707 padLayer->GetOutputSlot().GetTensorInfo(),
708 true))
709 {
710 FoldPadLayer2d<Pooling2dLayer, Pooling2dDescriptor>(optimizationViews, baseLayer,
711 poolingDescriptor, padLayer);
712 untouched.erase(baseLayer->GetGuid());
713 untouched.erase(padLayer->GetGuid());
714 }
715 }
716 }
717 }
718
719 if (optimizationViews.GetSubstitutions().empty() && optimizationViews.GetDeletedSubgraphs().empty())
720 {
721 optimizationViews.AddUntouchedSubgraph(SubgraphView(subgraph));
722 }
723 else
724 {
725 ReportUntouchedLayers(optimizationViews, untouched);
726 }
727
728 return optimizationViews;
729}

References armnn::Activation, armnn::Add, armnn::Addition, OptimizationViews::AddUntouchedSubgraph(), armnn::BatchNormalization, SubgraphView::begin(), Layer::BeginOutputSlots(), armnn::ChainReduceLayers(), armnn::ClAdditionValidate(), armnn::ClBatchNormalizationValidate(), armnn::ClConvolution2dWorkloadValidate(), armnn::ClDepthwiseConvolutionWorkloadValidate(), armnn::ClDivisionWorkloadValidate(), armnn::ClFullyConnectedWorkloadValidate(), armnn::ClMultiplicationWorkloadValidate(), armnn::ClSubtractionValidate(), armnn::ConnectedToLayerWithNCHW(), armnn::Convolution2d, CreateBackendSpecificModelContext(), armnn::DepthwiseConvolution2d, armnn::Div, armnn::Division, armnn::ElementwiseBinary, SubgraphView::end(), Layer::EndOutputSlots(), armnn::FoldPadLayer2d(), armnn::FullyConnected, armnn::FuseAdditionLayer(), armnn::FuseBatchNormalizationLayer(), armnn::FuseConvolution2dLayer(), armnn::FuseDepthwiseConvolution2dLayer(), armnn::FuseDivisionLayer(), armnn::FuseElementwiseBinaryLayer(), armnn::FuseFullyConnectedLayer(), armnn::FuseMultiplicationLayer(), armnn::FuseSubtractionLayer(), Layer::GetAdditionalInformation(), InputSlot::GetConnectedOutputSlot(), OptimizationViews::GetDeletedSubgraphs(), Layer::GetGuid(), Layer::GetInputSlot(), Layer::GetName(), OutputSlot::GetNumConnections(), Layer::GetOutputSlot(), OutputSlot::GetOwningLayer(), LayerWithParameters< Parameters >::GetParameters(), OptimizationViews::GetSubstitutions(), OutputSlot::GetTensorInfo(), Layer::GetType(), BatchNormalizationLayer::m_Beta, Convolution2dDescriptor::m_BiasEnabled, DepthwiseConvolution2dDescriptor::m_BiasEnabled, FullyConnectedDescriptor::m_BiasEnabled, BatchNormalizationLayer::m_Gamma, BatchNormalizationLayer::m_Mean, ElementwiseBinaryDescriptor::m_Operation, BatchNormalizationLayer::m_Variance, ReduceDescriptor::m_vAxis, armnn::Mul, armnn::Multiplication, armnn::Pad, armnn::PolymorphicDowncast(), armnn::Pooling2d, armnn::Reduce, armnn::RemoveReshapeLayer(), armnn::ReplaceLayers(), armnn::ReportUntouchedLayers(), armnn::Reshape, armnn::Sub, armnn::Subtraction, and armnn::optimizations::pad_fold::TryFoldPadIntoLayer2d().
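This function is normally invoked by the ArmNN optimizer rather than by user code. The sketch below only illustrates how the FastMathEnabled model option, which gates the Convolution2d fusion path above, would be passed in; "subgraph" is assumed to be an existing SubgraphView handed to the backend:

// Sketch only: "subgraph" is assumed to already exist (it is normally built by the optimizer).
armnn::BackendOptions clOptions("GpuAcc", {{"FastMathEnabled", true}});
armnn::ModelOptions modelOptions{clOptions};

armnn::ClBackend backend;
armnn::OptimizationViews views = backend.OptimizeSubgraphView(subgraph, modelOptions);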

◆ RegisterTensorHandleFactories() [1/2]

void RegisterTensorHandleFactories ( TensorHandleFactoryRegistry & )
overridevirtual

(Optional) Register TensorHandleFactories. Either this method, or CreateMemoryManager() together with IWorkloadFactory::CreateTensor() and IWorkloadFactory::CreateSubtensor(), must be implemented.

Reimplemented from IBackendInternal.

Definition at line 175 of file ClBackend.cpp.

176{
177 std::shared_ptr<ClMemoryManager> memoryManager;
178 if (m_UsingCustomAllocator)
179 {
180 memoryManager = std::make_shared<ClMemoryManager>(m_CustomAllocator);
181 }
182 else
183 {
184 memoryManager = std::make_shared<ClMemoryManager>(std::make_unique<arm_compute::CLBufferAllocator>());
185 }
186
187 std::unique_ptr<ITensorHandleFactory> factory = std::make_unique<ClTensorHandleFactory>(memoryManager);
188 std::unique_ptr<ITensorHandleFactory> importFactory = std::make_unique<ClImportTensorHandleFactory>(
189 static_cast<MemorySourceFlags>(MemorySource::Malloc), static_cast<MemorySourceFlags>(MemorySource::Malloc));
190
191 registry.RegisterCopyAndImportFactoryPair(factory->GetId(), importFactory->GetId());
192 registry.RegisterCopyAndImportFactoryPair(importFactory->GetId(), factory->GetId());
193
194 registry.RegisterMemoryManager(memoryManager);
195 registry.RegisterFactory(std::move(factory));
196 registry.RegisterFactory(std::move(importFactory));
197
198}

References m_CustomAllocator, m_UsingCustomAllocator, armnn::Malloc, TensorHandleFactoryRegistry::RegisterCopyAndImportFactoryPair(), TensorHandleFactoryRegistry::RegisterFactory(), and TensorHandleFactoryRegistry::RegisterMemoryManager().

◆ RegisterTensorHandleFactories() [2/2]

void RegisterTensorHandleFactories ( TensorHandleFactoryRegistry & registry,
MemorySourceFlags inputFlags,
MemorySourceFlags outputFlags )
overridevirtual

(Optional) Register TensorHandleFactories. Either this method, or CreateMemoryManager() together with IWorkloadFactory::CreateTensor() and IWorkloadFactory::CreateSubtensor(), must be implemented.

Reimplemented from IBackendInternal.

Definition at line 200 of file ClBackend.cpp.

203{
204 // To allow force import if inputFlags/outputFlags are Undefined, set it as Malloc
205 if (inputFlags == static_cast<MemorySourceFlags>(MemorySource::Undefined))
206 {
207 inputFlags = static_cast<MemorySourceFlags>(MemorySource::Malloc);
208 }
209 if (outputFlags == static_cast<MemorySourceFlags>(MemorySource::Undefined))
210 {
211 outputFlags = static_cast<MemorySourceFlags>(MemorySource::Malloc);
212 }
213 std::shared_ptr<ClMemoryManager> memoryManager;
214 if (m_UsingCustomAllocator)
215 {
216 memoryManager = std::make_shared<ClMemoryManager>(m_CustomAllocator);
217 }
218 else
219 {
220 memoryManager = std::make_shared<ClMemoryManager>(std::make_unique<arm_compute::CLBufferAllocator>());
221 }
222
223 std::unique_ptr<ITensorHandleFactory> factory = std::make_unique<ClTensorHandleFactory>(memoryManager);
224 std::unique_ptr<ITensorHandleFactory> importFactory = std::make_unique<ClImportTensorHandleFactory>(
225 inputFlags, outputFlags);
226
227 registry.RegisterCopyAndImportFactoryPair(factory->GetId(), importFactory->GetId());
228 registry.RegisterCopyAndImportFactoryPair(importFactory->GetId(), factory->GetId());
229
230 registry.RegisterMemoryManager(memoryManager);
231 registry.RegisterFactory(std::move(factory));
232 registry.RegisterFactory(std::move(importFactory));
233}

References m_CustomAllocator, m_UsingCustomAllocator, armnn::Malloc, TensorHandleFactoryRegistry::RegisterCopyAndImportFactoryPair(), TensorHandleFactoryRegistry::RegisterFactory(), TensorHandleFactoryRegistry::RegisterMemoryManager(), and armnn::Undefined.
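A sketch of registering the factories with explicit import/export flags; Undefined flags would be promoted to Malloc anyway, as the definition above shows:

armnn::ClBackend backend;
armnn::TensorHandleFactoryRegistry registry;

auto mallocFlags = static_cast<armnn::MemorySourceFlags>(armnn::MemorySource::Malloc);
backend.RegisterTensorHandleFactories(registry, mallocFlags, mallocFlags);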

◆ UseCustomMemoryAllocator()

virtual bool UseCustomMemoryAllocator ( std::shared_ptr< ICustomAllocator > allocator,
armnn::Optional< std::string & > errMsg )
inlineoverridevirtual

Signals the backend to use a custom memory allocator provided by the user.

Parameters
allocator - a pointer to the provided ICustomAllocator to use with this backend
errMsg - optional string variable used to return error messages
Returns
- Returns true if switching to the custom allocator was successful

Reimplemented from IBackendInternal.

Definition at line 82 of file ClBackend.hpp.

84 {
85 IgnoreUnused(errMsg);
86 ARMNN_LOG(info) << "Using Custom Allocator for ClBackend";
87
88 // Set flag to signal the backend to use a custom memory allocator
89 m_CustomAllocator = std::make_shared<ClBackendCustomAllocatorWrapper>(std::move(allocator));
90 m_UsingCustomAllocator = true;
91 return m_UsingCustomAllocator;
92 }

References ARMNN_LOG, armnn::IgnoreUnused(), armnn::info, m_CustomAllocator, and m_UsingCustomAllocator.

Referenced by ClBackend().
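A hedged sketch of a host-memory allocator passed to this method; the exact ICustomAllocator interface (allocate/free/GetMemorySourceType) should be verified against armnn/backends/ICustomAllocator.hpp for the ArmNN version in use:

// Sketch only: a trivial malloc-backed allocator (needs <cstdlib>); alignment handling is omitted.
class HostAllocator : public armnn::ICustomAllocator
{
public:
    void* allocate(size_t size, size_t alignment) override
    {
        (void)alignment;                 // sketch: ignore the requested alignment
        return std::malloc(size);
    }
    void free(void* ptr) override
    {
        std::free(ptr);
    }
    armnn::MemorySource GetMemorySourceType() override
    {
        return armnn::MemorySource::Malloc;
    }
};

armnn::ClBackend backend;
std::string errMsg;
bool ok = backend.UseCustomMemoryAllocator(std::make_shared<HostAllocator>(), errMsg);
// ok is true; errMsg is left untouched by this backend, as the definition above shows.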

Member Data Documentation

◆ m_CustomAllocator

std::shared_ptr< ClBackendCustomAllocatorWrapper > m_CustomAllocator

◆ m_UsingCustomAllocator

bool m_UsingCustomAllocator = false


The documentation for this class was generated from the following files:

ClBackend.hpp
ClBackend.cpp