ArmNN
 25.02
NeonBackend Class Reference

#include <NeonBackend.hpp>

Inheritance diagram for NeonBackend: (diagram)
Collaboration diagram for NeonBackend: (diagram)

Public Member Functions

 NeonBackend ()=default
 
 ~NeonBackend ()=default
 
const BackendId & GetId () const override
 
IBackendInternal::IMemoryManagerUniquePtr CreateMemoryManager () const override
 
IWorkloadFactoryPtr CreateWorkloadFactory (const IBackendInternal::IMemoryManagerSharedPtr &memoryManager=nullptr) const override
 
IWorkloadFactoryPtr CreateWorkloadFactory (class TensorHandleFactoryRegistry &tensorHandleFactoryRegistry) const override
 
IWorkloadFactoryPtr CreateWorkloadFactory (const IMemoryManagerSharedPtr &memoryManager, const ModelOptions &modelOptions) const override
 
IWorkloadFactoryPtr CreateWorkloadFactory (class TensorHandleFactoryRegistry &tensorHandleFactoryRegistry, const ModelOptions &modelOptions) const override
 
IBackendInternal::IBackendContextPtr CreateBackendContext (const IRuntime::CreationOptions &) const override
 Create the runtime context of the backend. More...
 
IBackendInternal::IBackendProfilingContextPtr CreateBackendProfilingContext (const IRuntime::CreationOptions &, IBackendProfilingPtr &backendProfiling) override
 Create context specifically used for profiling interaction from backends. More...
 
IBackendInternal::ILayerSupportSharedPtr GetLayerSupport () const override
 
IBackendInternal::ILayerSupportSharedPtr GetLayerSupport (const ModelOptions &modelOptions) const override
 
OptimizationViews OptimizeSubgraphView (const SubgraphView &subgraph, const ModelOptions &modelOptions) const override
 
std::vector< ITensorHandleFactory::FactoryId > GetHandleFactoryPreferences () const override
 (Optional) Returns a vector of supported TensorHandleFactory ids in preference order. More...
 
void RegisterTensorHandleFactories (class TensorHandleFactoryRegistry &registry) override
 (Optional) Register TensorHandleFactories. Either this method, or both CreateMemoryManager() and the IWorkloadFactory::CreateTensor()/IWorkloadFactory::CreateSubtensor() methods, must be implemented. More...
 
IBackendInternal::IBackendSpecificModelContextPtr CreateBackendSpecificModelContext (const ModelOptions &modelOptions) const override
 
BackendCapabilities GetCapabilities () const override
 Returns the set of capabilities this backend lists. Each listed BackendCapability must then be inspected to check whether it is actually supported; querying an unlisted BackendCapability returns an EmptyOptional. More...
 
std::unique_ptr< ICustomAllocator > GetDefaultAllocator () const override
 Returns the default memory allocator for the backend. More...
 
- Public Member Functions inherited from IBackendInternal
 ~IBackendInternal () override=default
 Allow backends created by the factory function to be destroyed through IBackendInternal. More...
 
virtual IWorkloadFactoryPtr CreateWorkloadFactory (class TensorHandleFactoryRegistry &tensorHandleFactoryRegistry, const ModelOptions &modelOptions, MemorySourceFlags inputFlags, MemorySourceFlags outputFlags) const
 
virtual OptimizationViews OptimizeSubgraphView (const SubgraphView &subgraph) const
 
bool SupportsTensorAllocatorAPI () const
 
ITensorHandleFactory::FactoryId GetBackwardCompatibleFavoriteHandleFactory ()
 
virtual void RegisterTensorHandleFactories (class TensorHandleFactoryRegistry &registry, MemorySourceFlags inputFlags, MemorySourceFlags outputFlags)
 (Optional) Register TensorHandleFactories. Either this method, or both CreateMemoryManager() and the IWorkloadFactory::CreateTensor()/IWorkloadFactory::CreateSubtensor() methods, must be implemented. More...
 
virtual bool UseCustomMemoryAllocator (std::shared_ptr< ICustomAllocator > allocator, armnn::Optional< std::string & > errMsg)
 Signals the backend to use a custom memory allocator provided by the user. More...
 
virtual unsigned int GetNumberOfCacheFiles () const
 Returns the number of files cached if backend supports caching. More...
 

Static Public Member Functions

static const BackendId & GetIdStatic ()
 
- Static Public Member Functions inherited from IBackendInternal
static constexpr BackendVersion GetApiVersion ()
 Returns the version of the Backend API. More...
 

Additional Inherited Members

- Public Types inherited from IBackendInternal
using IWorkloadFactoryPtr = std::unique_ptr< IWorkloadFactory >
 
using IBackendContextPtr = std::unique_ptr< IBackendContext >
 
using IBackendProfilingContextPtr = std::shared_ptr< arm::pipe::IBackendProfilingContext >
 This is the bridge between backend and backend profiling; we'll keep it in the backend namespace. More...
 
using IBackendProfilingPtr = std::unique_ptr< arm::pipe::IBackendProfiling >
 
using ILayerSupportSharedPtr = std::shared_ptr< ILayerSupport >
 
using IBackendSpecificModelContextPtr = std::shared_ptr< IBackendModelContext >
 
using IMemoryManagerUniquePtr = std::unique_ptr< IMemoryManager >
 
using IMemoryManagerSharedPtr = std::shared_ptr< IMemoryManager >
 
- Protected Member Functions inherited from IBackendInternal
 IBackendInternal ()=default
 Creation must be done through a specific backend interface. More...
 
- Protected Member Functions inherited from IBackend
 IBackend ()
 
virtual ~IBackend ()
 

Detailed Description

Definition at line 29 of file NeonBackend.hpp.

Constructor & Destructor Documentation

◆ NeonBackend()

NeonBackend ( )
default

◆ ~NeonBackend()

~NeonBackend ( )
default

Member Function Documentation

◆ CreateBackendContext()

IBackendInternal::IBackendContextPtr CreateBackendContext ( const IRuntime::CreationOptions & ) const
overridevirtual

Create the runtime context of the backend.

Implementations may return a default-constructed IBackendContextPtr if no context is needed at runtime. Implementations must throw BackendUnavailableException if the backend cannot be used (for example, necessary accelerator hardware is not present). The default implementation always returns a default-constructed pointer.

Reimplemented from IBackendInternal.

Definition at line 109 of file NeonBackend.cpp.

110 {
111  return IBackendContextPtr{};
112 }

◆ CreateBackendProfilingContext()

IBackendInternal::IBackendProfilingContextPtr CreateBackendProfilingContext ( const IRuntime::CreationOptions & creationOptions,
IBackendProfilingPtr & backendProfiling 
)
overridevirtual

Create context specifically used for profiling interaction from backends.

Reimplemented from IBackendInternal.

Definition at line 114 of file NeonBackend.cpp.

116 {
117  return IBackendProfilingContextPtr{};
118 }

◆ CreateBackendSpecificModelContext()

IBackendInternal::IBackendSpecificModelContextPtr CreateBackendSpecificModelContext ( const ModelOptions & modelOptions) const
overridevirtual

Reimplemented from IBackendInternal.

Definition at line 120 of file NeonBackend.cpp.

122 {
123  return IBackendSpecificModelContextPtr{new NeonBackendModelContext{modelOptions}};
124 }

Referenced by NeonBackend::CreateWorkloadFactory(), and NeonBackend::GetLayerSupport().

◆ CreateMemoryManager()

IBackendInternal::IMemoryManagerUniquePtr CreateMemoryManager ( ) const
overridevirtual

Reimplemented from IBackendInternal.

Definition at line 52 of file NeonBackend.cpp.

53 {
54  return std::make_unique<NeonMemoryManager>(std::make_unique<arm_compute::Allocator>(),
55  BaseMemoryManager::MemoryAffinity::Offset);
56 }

References BaseMemoryManager::Offset.

◆ CreateWorkloadFactory() [1/4]

IBackendInternal::IWorkloadFactoryPtr CreateWorkloadFactory ( class TensorHandleFactoryRegistry & tensorHandleFactoryRegistry) const
overridevirtual

Reimplemented from IBackendInternal.

Definition at line 72 of file NeonBackend.cpp.

74 {
75  auto memoryManager = std::make_shared<NeonMemoryManager>(std::make_unique<arm_compute::Allocator>(),
76  BaseMemoryManager::MemoryAffinity::Offset);
77 
78  tensorHandleFactoryRegistry.RegisterMemoryManager(memoryManager);
79 
80  auto factory = std::make_unique<NeonTensorHandleFactory>(memoryManager);
81  // Register copy and import factory pair
82  tensorHandleFactoryRegistry.RegisterCopyAndImportFactoryPair(factory->GetId(), factory->GetId());
83  // Register the factory
84  tensorHandleFactoryRegistry.RegisterFactory(std::move(factory));
85 
86 
87  return std::make_unique<NeonWorkloadFactory>(
88  PolymorphicPointerDowncast<NeonMemoryManager>(memoryManager));
89 }

References BaseMemoryManager::Offset, TensorHandleFactoryRegistry::RegisterCopyAndImportFactoryPair(), TensorHandleFactoryRegistry::RegisterFactory(), and TensorHandleFactoryRegistry::RegisterMemoryManager().

◆ CreateWorkloadFactory() [2/4]

IBackendInternal::IWorkloadFactoryPtr CreateWorkloadFactory ( class TensorHandleFactoryRegistry & tensorHandleFactoryRegistry,
const ModelOptions & modelOptions 
) const
overridevirtual

Reimplemented from IBackendInternal.

Definition at line 91 of file NeonBackend.cpp.

93 {
94  auto memoryManager = std::make_shared<NeonMemoryManager>(std::make_unique<arm_compute::Allocator>(),
95  BaseMemoryManager::MemoryAffinity::Offset);
96 
97  tensorHandleFactoryRegistry.RegisterMemoryManager(memoryManager);
98 
99  auto factory = std::make_unique<NeonTensorHandleFactory>(memoryManager);
100  // Register copy and import factory pair
101  tensorHandleFactoryRegistry.RegisterCopyAndImportFactoryPair(factory->GetId(), factory->GetId());
102  // Register the factory
103  tensorHandleFactoryRegistry.RegisterFactory(std::move(factory));
104 
105  return std::make_unique<NeonWorkloadFactory>(
106  PolymorphicPointerDowncast<NeonMemoryManager>(memoryManager), CreateBackendSpecificModelContext(modelOptions));
107 }

References NeonBackend::CreateBackendSpecificModelContext(), BaseMemoryManager::Offset, TensorHandleFactoryRegistry::RegisterCopyAndImportFactoryPair(), TensorHandleFactoryRegistry::RegisterFactory(), and TensorHandleFactoryRegistry::RegisterMemoryManager().

◆ CreateWorkloadFactory() [3/4]

IBackendInternal::IWorkloadFactoryPtr CreateWorkloadFactory ( const IBackendInternal::IMemoryManagerSharedPtr & memoryManager = nullptr) const
overridevirtual

Implements IBackendInternal.

Definition at line 58 of file NeonBackend.cpp.

60 {
61  return std::make_unique<NeonWorkloadFactory>(
62  PolymorphicPointerDowncast<NeonMemoryManager>(memoryManager));
63 }

◆ CreateWorkloadFactory() [4/4]

IBackendInternal::IWorkloadFactoryPtr CreateWorkloadFactory ( const IMemoryManagerSharedPtr & memoryManager,
const ModelOptions & modelOptions 
) const
overridevirtual

Reimplemented from IBackendInternal.

Definition at line 65 of file NeonBackend.cpp.

67 {
68  return std::make_unique<NeonWorkloadFactory>(
69  PolymorphicPointerDowncast<NeonMemoryManager>(memoryManager), CreateBackendSpecificModelContext(modelOptions));
70 }

References NeonBackend::CreateBackendSpecificModelContext().

◆ GetCapabilities()

BackendCapabilities GetCapabilities ( ) const
inlineoverridevirtual

Returns the set of capabilities this backend lists. Each listed BackendCapability must then be inspected to check whether it is actually supported; querying an unlisted BackendCapability returns an EmptyOptional.

Reimplemented from IBackendInternal.

Definition at line 68 of file NeonBackend.hpp.

69  {
70  return cpuAccCapabilities;
71  };
const BackendCapabilities cpuAccCapabilities("CpuAcc",
    {
        {"NonConstWeights", true},
        {"ProtectedContentAllocation", false},
        {"ConstantTensorsAsInputs", true},
        {"PreImportIOTensors", false},
        {"ExternallyManagedMemory", true},
        {"MultiAxisPacking", false},
        {"SingleAxisPacking", true},
        {"HasFp16", arm_compute::CPUInfo::get().has_fp16()},
        {"AllOrNothing", false}
    })

References armnn::cpuAccCapabilities.

◆ GetDefaultAllocator()

std::unique_ptr< ICustomAllocator > GetDefaultAllocator ( ) const
overridevirtual

Returns the default memory allocator for the backend.

Returns
- Returns unique pointer to the Default Allocator of the Backend

Reimplemented from IBackendInternal.

Definition at line 643 of file NeonBackend.cpp.

644 {
645  return std::make_unique<DefaultAllocator>();
646 }

◆ GetHandleFactoryPreferences()

std::vector< ITensorHandleFactory::FactoryId > GetHandleFactoryPreferences ( ) const
overridevirtual

(Optional) Returns a vector of supported TensorHandleFactory ids in preference order.

Reimplemented from IBackendInternal.

Definition at line 624 of file NeonBackend.cpp.

625 {
626  return std::vector<ITensorHandleFactory::FactoryId>() = { NeonTensorHandleFactory::GetIdStatic() };
627 }

References NeonTensorHandleFactory::GetIdStatic().

◆ GetId()

const BackendId& GetId ( ) const
inlineoverridevirtual

Implements IBackend.

Definition at line 36 of file NeonBackend.hpp.

36 { return GetIdStatic(); }

References NeonBackend::GetIdStatic().

◆ GetIdStatic()

const BackendId & GetIdStatic ( )
static

Definition at line 46 of file NeonBackend.cpp.

47 {
48  static const BackendId s_Id{NeonBackendId()};
49  return s_Id;
50 }

References armnn::NeonBackendId().

Referenced by NeonBackend::GetId().

◆ GetLayerSupport() [1/2]

IBackendInternal::ILayerSupportSharedPtr GetLayerSupport ( ) const
overridevirtual

Implements IBackendInternal.

Definition at line 126 of file NeonBackend.cpp.

127 {
128  static ILayerSupportSharedPtr layerSupport
129  {
130  new NeonLayerSupport{}
131  };
132  return layerSupport;
133 }

◆ GetLayerSupport() [2/2]

IBackendInternal::ILayerSupportSharedPtr GetLayerSupport ( const ModelOptions & modelOptions) const
overridevirtual

Reimplemented from IBackendInternal.

Definition at line 135 of file NeonBackend.cpp.

136 {
137  static ILayerSupportSharedPtr layerSupport
138  {
139  new NeonLayerSupport(CreateBackendSpecificModelContext(modelOptions))
140  };
141  return layerSupport;
142 }

References NeonBackend::CreateBackendSpecificModelContext().

◆ OptimizeSubgraphView()

OptimizationViews OptimizeSubgraphView ( const SubgraphView & subgraph,
const ModelOptions & modelOptions 
) const
overridevirtual

Reimplemented from IBackendInternal.

Definition at line 144 of file NeonBackend.cpp.

146 {
147  OptimizationViews optimizationViews(modelOptions);
148 
149  auto it = subgraph.end();
150  std::map<LayerGuid, Layer*> untouched;
151 
152  while (it != subgraph.begin())
153  {
154  --it;
155  Layer& base = *(PolymorphicDowncast<Layer*>(*it));
156  untouched.insert({base.GetGuid(), &base});
157  }
158 
159  it = subgraph.end();
160  while (it != subgraph.begin())
161  {
162  --it;
163  Layer& base = *(PolymorphicDowncast<Layer*>(*it));
164 
165  // Fuse activation into previous layer if supported by backend
166  if ((base.GetType() == LayerType::DepthwiseConvolution2d || base.GetType() == LayerType::Convolution2d
167  || base.GetType() == LayerType::BatchNormalization || base.GetType() == LayerType::FullyConnected
168  || base.GetType() == LayerType::Addition || base.GetType() == LayerType::Multiplication
169  || base.GetType() == LayerType::Subtraction || base.GetType() == LayerType::Division
170  || base.GetType() == LayerType::ElementwiseBinary)
171  && (base.GetAdditionalInformation<ActivationDescriptor>() == nullptr))
172  {
173  for (auto output = base.BeginOutputSlots(); output != base.EndOutputSlots(); ++output)
174  {
175  if (output->GetNumConnections() == 1)
176  {
177  for (auto&& childInput : output->GetConnections())
178  {
179  if ((childInput->GetOwningLayer().GetType() == LayerType::Activation) &&
180  (checkDataTypeInputandOutput(childInput->GetOwningLayer())))
181  {
182  Layer& child = childInput->GetOwningLayer();
183 
184  auto* activationLayer = PolymorphicDowncast<ActivationLayer*>(&child);
185  // Before we proceed make sure that this activation layer is in the subgraph. It could be
186  // the first layer in the next subgraph.
187  if (untouched.find(activationLayer->GetGuid()) == untouched.end())
188  {
189  // We can't fuse a layer that's outside the subgraph.
190  break;
191  }
192  const std::string name = std::string("fused-") + child.GetName() + std::string("-into-") +
193  base.GetName();
194 
195  // Get params from activation layer
196  ActivationDescriptor activationDesc = activationLayer->GetParameters();
197 
198  if (base.GetType() == LayerType::Convolution2d)
199  {
200  Convolution2dLayer* baseLayer = PolymorphicDowncast<Convolution2dLayer*>(&base);
201 
202  Optional<TensorInfo> biases;
203 
204  if (baseLayer->GetParameters().m_BiasEnabled)
205  {
206  biases = baseLayer->GetInputSlot(2).GetConnectedOutputSlot()->GetTensorInfo();
207  }
208 
209  arm_compute::Status status = NeonConvolution2dWorkloadValidate(
210  baseLayer->GetInputSlot(0).GetConnectedOutputSlot()->GetTensorInfo(),
211  activationLayer->GetInputSlot(0).GetConnectedOutputSlot()->GetTensorInfo(),
212  baseLayer->GetParameters(),
213  baseLayer->GetInputSlot(1).GetConnectedOutputSlot()->GetTensorInfo(),
214  biases,
215  false,
216  &activationDesc);
217 
218  if (status)
219  {
220  FuseConvolution2dLayer<Convolution2dLayer>(optimizationViews,
221  baseLayer,
222  activationLayer,
223  activationDesc,
224  name);
225  untouched.erase(baseLayer->GetGuid());
226  untouched.erase(activationLayer->GetGuid());
227  }
228  }
229  else if (base.GetType() == LayerType::DepthwiseConvolution2d)
230  {
231  DepthwiseConvolution2dLayer* baseLayer =
232  PolymorphicDowncast<DepthwiseConvolution2dLayer*>(&base);
233 
234  Optional<TensorInfo> biases;
235 
236  if (baseLayer->GetParameters().m_BiasEnabled)
237  {
238  biases = baseLayer->GetInputSlot(2).GetConnectedOutputSlot()->GetTensorInfo();
239  }
240 
241  arm_compute::Status status = NeonDepthwiseConvolutionWorkloadValidate(
242  baseLayer->GetInputSlot(0).GetConnectedOutputSlot()->GetTensorInfo(),
243  activationLayer->GetInputSlot(0).GetConnectedOutputSlot()->GetTensorInfo(),
244  baseLayer->GetParameters(),
245  baseLayer->GetInputSlot(1).GetConnectedOutputSlot()->GetTensorInfo(),
246  biases,
247  &activationDesc);
248 
249  if (status)
250  {
251  FuseDepthwiseConvolution2dLayer<DepthwiseConvolution2dLayer>(optimizationViews,
252  baseLayer,
253  activationLayer,
254  activationDesc,
255  name);
256  untouched.erase(baseLayer->GetGuid());
257  untouched.erase(activationLayer->GetGuid());
258  }
259  }
260  else if (base.GetType() == LayerType::FullyConnected)
261  {
262  FullyConnectedLayer* baseLayer = PolymorphicDowncast<FullyConnectedLayer*>(&base);
263  FullyConnectedDescriptor descriptor = baseLayer->GetParameters();
264 
265  // As bias is optional only try to get TensorInfo from input if bias is enabled.
266  Optional<TensorInfo> biases;
267  if (descriptor.m_BiasEnabled)
268  {
269  biases = baseLayer->GetInputSlot(2).GetConnectedOutputSlot()->GetTensorInfo();
270  }
271 
272  arm_compute::Status status = NeonFullyConnectedWorkloadValidate(
273  baseLayer->GetInputSlot(0).GetConnectedOutputSlot()->GetTensorInfo(),
274  activationLayer->GetInputSlot(0).GetConnectedOutputSlot()->GetTensorInfo(),
275  baseLayer->GetInputSlot(1).GetConnectedOutputSlot()->GetTensorInfo(),
276  biases,
277  baseLayer->GetParameters(),
278  &activationDesc);
279 
280  if (status)
281  {
282  FuseFullyConnectedLayer<FullyConnectedLayer>(optimizationViews,
283  baseLayer,
284  activationLayer,
285  activationDesc,
286  name);
287  untouched.erase(baseLayer->GetGuid());
288  untouched.erase(activationLayer->GetGuid());
289  }
290  }
291  else if (base.GetType() == LayerType::BatchNormalization)
292  {
293  BatchNormalizationLayer* baseLayer =
294  PolymorphicDowncast<BatchNormalizationLayer*>(&base);
295 
296  arm_compute::Status status = NeonBatchNormalizationValidate(
297  baseLayer->GetInputSlot(0).GetConnectedOutputSlot()->GetTensorInfo(),
298  activationLayer->GetInputSlot(0).GetConnectedOutputSlot()->GetTensorInfo(),
299  baseLayer->m_Mean->GetTensorInfo(),
300  baseLayer->m_Variance->GetTensorInfo(),
301  baseLayer->m_Beta->GetTensorInfo(),
302  baseLayer->m_Gamma->GetTensorInfo(),
303  baseLayer->GetParameters(),
304  &activationDesc);
305 
306  if (status)
307  {
308  BatchNormalizationLayer* replacementLayer =
309  FuseBatchNormalizationLayer<BatchNormalizationLayer>(optimizationViews,
310  baseLayer,
311  activationLayer,
312  activationDesc,
313  name);
314 
315  replacementLayer->m_Beta = std::move(baseLayer->m_Beta);
316  replacementLayer->m_Gamma = std::move(baseLayer->m_Gamma);
317  replacementLayer->m_Mean = std::move(baseLayer->m_Mean);
318  replacementLayer->m_Variance = std::move(baseLayer->m_Variance);
319  untouched.erase(baseLayer->GetGuid());
320  untouched.erase(activationLayer->GetGuid());
321  }
322  }
323  else if (base.GetType() == LayerType::Addition)
324  {
325  AdditionLayer* baseLayer = PolymorphicDowncast<AdditionLayer*>(&base);
326 
327  arm_compute::Status status = NeonAdditionWorkloadValidate(
328  baseLayer->GetInputSlot(0).GetConnectedOutputSlot()->GetTensorInfo(),
329  baseLayer->GetInputSlot(1).GetConnectedOutputSlot()->GetTensorInfo(),
330  activationLayer->GetInputSlot(0).GetConnectedOutputSlot()->GetTensorInfo(),
331  &activationDesc);
332 
333  if (status)
334  {
335  FuseAdditionLayer<AdditionLayer>(optimizationViews,
336  baseLayer,
337  activationLayer,
338  activationDesc,
339  name);
340  untouched.erase(baseLayer->GetGuid());
341  untouched.erase(activationLayer->GetGuid());
342  }
343  }
344  else if (base.GetType() == LayerType::Division)
345  {
346  DivisionLayer* baseLayer = PolymorphicDowncast<DivisionLayer*>(&base);
347 
348  arm_compute::Status status = NeonDivisionWorkloadValidate(
349  baseLayer->GetInputSlot(0).GetConnectedOutputSlot()->GetTensorInfo(),
350  baseLayer->GetInputSlot(1).GetConnectedOutputSlot()->GetTensorInfo(),
351  activationLayer->GetInputSlot(0).GetConnectedOutputSlot()->GetTensorInfo(),
352  &activationDesc);
353 
354  if (status)
355  {
356  FuseDivisionLayer<DivisionLayer>(optimizationViews,
357  baseLayer,
358  activationLayer,
359  activationDesc,
360  name);
361  untouched.erase(baseLayer->GetGuid());
362  untouched.erase(activationLayer->GetGuid());
363  }
364  }
365  else if (base.GetType() == LayerType::Multiplication)
366  {
367  MultiplicationLayer* baseLayer = PolymorphicDowncast<MultiplicationLayer*>(&base);
368 
369  arm_compute::Status status = NeonMultiplicationWorkloadValidate(
370  baseLayer->GetInputSlot(0).GetConnectedOutputSlot()->GetTensorInfo(),
371  baseLayer->GetInputSlot(1).GetConnectedOutputSlot()->GetTensorInfo(),
372  activationLayer->GetInputSlot(0).GetConnectedOutputSlot()->GetTensorInfo(),
373  &activationDesc);
374 
375  if (status)
376  {
377  FuseMultiplicationLayer<MultiplicationLayer>(optimizationViews,
378  baseLayer,
379  activationLayer,
380  activationDesc,
381  name);
382  untouched.erase(baseLayer->GetGuid());
383  untouched.erase(activationLayer->GetGuid());
384  }
385  }
386  else if (base.GetType() == LayerType::Subtraction)
387  {
388  SubtractionLayer* baseLayer = PolymorphicDowncast<SubtractionLayer*>(&base);
389 
390  arm_compute::Status status = NeonSubtractionWorkloadValidate(
391  baseLayer->GetInputSlot(0).GetConnectedOutputSlot()->GetTensorInfo(),
392  baseLayer->GetInputSlot(1).GetConnectedOutputSlot()->GetTensorInfo(),
393  activationLayer->GetInputSlot(0).GetConnectedOutputSlot()->GetTensorInfo(),
394  &activationDesc);
395 
396  if (status)
397  {
398  FuseSubtractionLayer<SubtractionLayer>(optimizationViews,
399  baseLayer,
400  activationLayer,
401  activationDesc,
402  name);
403  untouched.erase(baseLayer->GetGuid());
404  untouched.erase(activationLayer->GetGuid());
405  }
406  }
407  else if (base.GetType() == LayerType::ElementwiseBinary)
408  {
409  ElementwiseBinaryLayer* baseLayer = PolymorphicDowncast<ElementwiseBinaryLayer*>(&base);
410 
411  if (baseLayer->GetParameters().m_Operation == BinaryOperation::Add)
412  {
413  arm_compute::Status status = NeonAdditionWorkloadValidate(
414  baseLayer->GetInputSlot(0).GetConnectedOutputSlot()->GetTensorInfo(),
415  baseLayer->GetInputSlot(1).GetConnectedOutputSlot()->GetTensorInfo(),
416  activationLayer->GetInputSlot(0).GetConnectedOutputSlot()->GetTensorInfo(),
417  &activationDesc);
418 
419  if (status)
420  {
421  FuseElementwiseBinaryLayer<ElementwiseBinaryLayer>(optimizationViews,
422  baseLayer,
423  activationLayer,
424  activationDesc,
425  BinaryOperation::Add,
426  name);
427  untouched.erase(baseLayer->GetGuid());
428  untouched.erase(activationLayer->GetGuid());
429  }
430  }
431  else if (baseLayer->GetParameters().m_Operation == BinaryOperation::Div)
432  {
433  arm_compute::Status status = NeonDivisionWorkloadValidate(
434  baseLayer->GetInputSlot(0).GetConnectedOutputSlot()->GetTensorInfo(),
435  baseLayer->GetInputSlot(1).GetConnectedOutputSlot()->GetTensorInfo(),
436  activationLayer->GetInputSlot(0).GetConnectedOutputSlot()->GetTensorInfo(),
437  &activationDesc);
438 
439  if (status)
440  {
441  FuseElementwiseBinaryLayer<ElementwiseBinaryLayer>(optimizationViews,
442  baseLayer,
443  activationLayer,
444  activationDesc,
445  BinaryOperation::Div,
446  name);
447  untouched.erase(baseLayer->GetGuid());
448  untouched.erase(activationLayer->GetGuid());
449  }
450  }
451  else if (baseLayer->GetParameters().m_Operation == BinaryOperation::Mul)
452  {
453  arm_compute::Status status = NeonMultiplicationWorkloadValidate(
454  baseLayer->GetInputSlot(0).GetConnectedOutputSlot()->GetTensorInfo(),
455  baseLayer->GetInputSlot(1).GetConnectedOutputSlot()->GetTensorInfo(),
456  activationLayer->GetInputSlot(0).GetConnectedOutputSlot()->GetTensorInfo(),
457  &activationDesc);
458 
459  if (status)
460  {
461  FuseElementwiseBinaryLayer<ElementwiseBinaryLayer>(optimizationViews,
462  baseLayer,
463  activationLayer,
464  activationDesc,
465  BinaryOperation::Mul,
466  name);
467  untouched.erase(baseLayer->GetGuid());
468  untouched.erase(activationLayer->GetGuid());
469  }
470  }
471  else if (baseLayer->GetParameters().m_Operation == BinaryOperation::Sub)
472  {
473  arm_compute::Status status = NeonSubtractionWorkloadValidate(
474  baseLayer->GetInputSlot(0).GetConnectedOutputSlot()->GetTensorInfo(),
475  baseLayer->GetInputSlot(1).GetConnectedOutputSlot()->GetTensorInfo(),
476  activationLayer->GetInputSlot(0).GetConnectedOutputSlot()->GetTensorInfo(),
477  &activationDesc);
478 
479  if (status)
480  {
481  FuseElementwiseBinaryLayer<ElementwiseBinaryLayer>(optimizationViews,
482  baseLayer,
483  activationLayer,
484  activationDesc,
485  BinaryOperation::Sub,
486  name);
487  untouched.erase(baseLayer->GetGuid());
488  untouched.erase(activationLayer->GetGuid());
489  }
490  }
491  // No fusion available for other BinaryOperations
492  }
493  }
494  }
495  }
496  }
497  }
498 
499  // Separate reduce layer with multiple axes into multiple reduce layers with 1 axis.
500  if (base.GetType() == LayerType::Reduce)
501  {
502  ReduceLayer* baseLayer = PolymorphicDowncast<ReduceLayer*>(&base);
503  ReduceDescriptor reduceDescriptor = baseLayer->GetParameters();
504 
505  if (!reduceDescriptor.m_vAxis.empty() && reduceDescriptor.m_vAxis.size() > 1)
506  {
507  // Add new layers to the graph and connect them.
508  std::vector<IConnectableLayer*> layers = ChainReduceLayers<ReduceLayer>(optimizationViews,
509  baseLayer,
510  reduceDescriptor);
511 
512  // Replace existing baselayer with new subgraph.
513  ReplaceLayers<ReduceLayer>(optimizationViews, baseLayer, layers);
514  untouched.erase(baseLayer->GetGuid());
515  }
516  }
517 
518  // Remove Reshape where possible
519  if (base.GetType() == LayerType::Reshape)
520  {
521  ReshapeLayer* baseLayer = PolymorphicDowncast<ReshapeLayer*>(&base);
522 
523  // Cannot remove a Reshape if it's connected to any layer that has an NCHW layout
524  if (ConnectedToLayerWithNCHW(baseLayer))
525  {
526  continue;
527  }
528  RemoveReshapeLayer(baseLayer, untouched, optimizationViews);
529  }
530 
531  // Replace Add/Mul/Add where possible
532  Layer* layerList[4] = {nullptr, nullptr, nullptr, nullptr};
533  const std::vector<ActivationFunction> validActivates = { ActivationFunction::ReLu,
534  ActivationFunction::BoundedReLu };
535  if (IsLayerSequence<BinaryOperation>(base,
536  {BinaryOperation::Add, BinaryOperation::Mul, BinaryOperation::Add},
537  layerList,
538  true, // handleValidActivates
539  validActivates))
540  {
541  bool fuseReLu = false;
542  unsigned int numInputs = 0;
543  unsigned int numOutputs = 0;
544  std::vector<TensorInfo> inputInfos;
545  std::vector<TensorInfo> outputInfos;
546  const ActivationDescriptor* activationDescriptor = nullptr;
547 
548  if (BuildAddMulAddTensorInfoLists<Layer>(layerList,
549  numInputs,
550  numOutputs,
551  inputInfos,
552  outputInfos,
553  activationDescriptor,
554  fuseReLu))
555  {
556  // Create the new Add/Mul/Add layer and set the Relu activation function
557  FusedDescriptor fusedDescriptor(numInputs, numOutputs, FusedKernelType::AddMulAdd);
558  arm_compute::Status status = NeonFusedWorkloadValidate({inputInfos.begin(), inputInfos.end()},
559  {outputInfos.begin(), outputInfos.end()},
560  fusedDescriptor,
561  activationDescriptor);
562  if (status)
563  {
564  std::string fusedName;
565  GetFusedName(layerList, fusedName);
566 
567  IConnectableLayer* addMulAddLayer =
568  optimizationViews.GetINetwork()->AddFusedLayer(fusedDescriptor, fusedName.c_str());
569 
570  if (fuseReLu)
571  {
572  FusedLayer* addMulAddFusedLayer = PolymorphicDowncast<FusedLayer*>(addMulAddLayer);
573  addMulAddFusedLayer->SetAdditionalInfoForObject(
574  std::make_shared<ActivationDescriptor>(*activationDescriptor));
575  }
576 
577  // Update the graph
578  std::vector<IConnectableLayer*> originalLayers;
579  for (unsigned int layerIdx = 0; layerIdx < 4; ++layerIdx)
580  {
581  if (layerList[layerIdx])
582  {
583  originalLayers.push_back(layerList[layerIdx]);
584  }
585  }
586 
587  std::vector<SlotList> inputLayersSlotLists, outputLayersSlotLists;
588  BuildAddMulAddSlotLists<SlotList>(fuseReLu,
589  outputInfos.size() > 1,
590  inputLayersSlotLists,
591  outputLayersSlotLists);
592 
593  ReplaceMultipleLayers<FusedLayer>(optimizationViews,
594  originalLayers,
595  PolymorphicDowncast<FusedLayer*>(addMulAddLayer),
596  inputLayersSlotLists,
597  outputLayersSlotLists);
598 
599  // Remove unused layers
600  for (unsigned int layerIdx = 0; layerIdx < 4; ++layerIdx)
601  {
602  if (layerList[layerIdx])
603  {
604  untouched.erase(layerList[layerIdx]->GetGuid());
605  }
606  }
607  }
608  }
609  }
610  }
611 
612  if (optimizationViews.GetSubstitutions().empty() && optimizationViews.GetDeletedSubgraphs().empty())
613  {
614  optimizationViews.AddUntouchedSubgraph(SubgraphView(subgraph));
615  }
616  else
617  {
618  ReportUntouchedLayers(optimizationViews, untouched);
619  }
620 
621  return optimizationViews;
622 }

References armnn::Activation, armnn::Add, INetwork::AddFusedLayer(), armnn::Addition, armnn::AddMulAdd, OptimizationViews::AddUntouchedSubgraph(), armnn::BatchNormalization, SubgraphView::begin(), Layer::BeginOutputSlots(), armnn::BoundedReLu, armnn::ConnectedToLayerWithNCHW(), armnn::Convolution2d, armnn::DepthwiseConvolution2d, armnn::Div, armnn::Division, armnn::ElementwiseBinary, SubgraphView::end(), Layer::EndOutputSlots(), armnn::FullyConnected, Layer::GetAdditionalInformation(), InputSlot::GetConnectedOutputSlot(), OptimizationViews::GetDeletedSubgraphs(), armnn::GetFusedName(), Layer::GetGuid(), OptimizationViews::GetINetwork(), Layer::GetInputSlot(), Layer::GetName(), LayerWithParameters< Parameters >::GetParameters(), OptimizationViews::GetSubstitutions(), OutputSlot::GetTensorInfo(), Layer::GetType(), BatchNormalizationLayer::m_Beta, FullyConnectedDescriptor::m_BiasEnabled, Convolution2dDescriptor::m_BiasEnabled, DepthwiseConvolution2dDescriptor::m_BiasEnabled, BatchNormalizationLayer::m_Gamma, BatchNormalizationLayer::m_Mean, ElementwiseBinaryDescriptor::m_Operation, BatchNormalizationLayer::m_Variance, ReduceDescriptor::m_vAxis, armnn::Mul, armnn::Multiplication, armnn::NeonAdditionWorkloadValidate(), armnn::NeonBatchNormalizationValidate(), armnn::NeonConvolution2dWorkloadValidate(), armnn::NeonDepthwiseConvolutionWorkloadValidate(), armnn::NeonDivisionWorkloadValidate(), armnn::NeonFullyConnectedWorkloadValidate(), armnn::NeonFusedWorkloadValidate(), armnn::NeonMultiplicationWorkloadValidate(), armnn::NeonSubtractionWorkloadValidate(), armnn::Reduce, armnn::ReLu, armnn::RemoveReshapeLayer(), armnn::ReportUntouchedLayers(), armnn::Reshape, Layer::SetAdditionalInfoForObject(), armnn::Sub, and armnn::Subtraction.

◆ RegisterTensorHandleFactories()

void RegisterTensorHandleFactories ( class TensorHandleFactoryRegistry & registry)
overridevirtual

(Optional) Register TensorHandleFactories. Either this method, or both CreateMemoryManager() and the IWorkloadFactory::CreateTensor()/IWorkloadFactory::CreateSubtensor() methods, must be implemented.

Reimplemented from IBackendInternal.

Definition at line 629 of file NeonBackend.cpp.

630 {
631  auto memoryManager = std::make_shared<NeonMemoryManager>(std::make_unique<arm_compute::Allocator>(),
632  BaseMemoryManager::MemoryAffinity::Offset);
633 
634  registry.RegisterMemoryManager(memoryManager);
635 
636  auto factory = std::make_unique<NeonTensorHandleFactory>(memoryManager);
637  // Register copy and import factory pair
638  registry.RegisterCopyAndImportFactoryPair(factory->GetId(), factory->GetId());
639  // Register the factory
640  registry.RegisterFactory(std::move(factory));
641 }

References BaseMemoryManager::Offset, TensorHandleFactoryRegistry::RegisterCopyAndImportFactoryPair(), TensorHandleFactoryRegistry::RegisterFactory(), and TensorHandleFactoryRegistry::RegisterMemoryManager().


The documentation for this class was generated from the following files: NeonBackend.hpp and NeonBackend.cpp.