ArmNN
 25.11
NeonBackend Class Reference

#include <NeonBackend.hpp>

Inheritance diagram for NeonBackend: (diagram not reproduced)
Collaboration diagram for NeonBackend: (diagram not reproduced)

Public Member Functions

 NeonBackend ()=default
 ~NeonBackend ()=default
const BackendId & GetId () const override
IBackendInternal::IMemoryManagerUniquePtr CreateMemoryManager () const override
IWorkloadFactoryPtr CreateWorkloadFactory (const IBackendInternal::IMemoryManagerSharedPtr &memoryManager=nullptr) const override
IWorkloadFactoryPtr CreateWorkloadFactory (class TensorHandleFactoryRegistry &tensorHandleFactoryRegistry) const override
IWorkloadFactoryPtr CreateWorkloadFactory (const IMemoryManagerSharedPtr &memoryManager, const ModelOptions &modelOptions) const override
IWorkloadFactoryPtr CreateWorkloadFactory (class TensorHandleFactoryRegistry &tensorHandleFactoryRegistry, const ModelOptions &modelOptions) const override
IBackendInternal::IBackendContextPtr CreateBackendContext (const IRuntime::CreationOptions &) const override
 Create the runtime context of the backend.
IBackendInternal::IBackendProfilingContextPtr CreateBackendProfilingContext (const IRuntime::CreationOptions &, IBackendProfilingPtr &backendProfiling) override
 Create context specifically used for profiling interaction from backends.
IBackendInternal::ILayerSupportSharedPtr GetLayerSupport () const override
IBackendInternal::ILayerSupportSharedPtr GetLayerSupport (const ModelOptions &modelOptions) const override
OptimizationViews OptimizeSubgraphView (const SubgraphView &subgraph, const ModelOptions &modelOptions) const override
std::vector< ITensorHandleFactory::FactoryId > GetHandleFactoryPreferences () const override
 (Optional) Returns a vector of supported TensorHandleFactory ids in preference order.
void RegisterTensorHandleFactories (class TensorHandleFactoryRegistry &registry) override
 (Optional) Register TensorHandleFactories. Either this method or the combination of CreateMemoryManager(), IWorkloadFactory::CreateTensor() and IWorkloadFactory::CreateSubtensor() must be implemented.
IBackendInternal::IBackendSpecificModelContextPtr CreateBackendSpecificModelContext (const ModelOptions &modelOptions) const override
BackendCapabilities GetCapabilities () const override
 Returns a BackendCapability if the backend lists the capability. The BackendCapability must then be inspected to check whether that BackendCapability is supported; otherwise, an EmptyOptional is returned if the BackendCapability is unlisted.
std::unique_ptr< ICustomAllocator > GetDefaultAllocator () const override
 Returns the default memory allocator for the backend.
Public Member Functions inherited from IBackendInternal
 ~IBackendInternal () override=default
 Allow backends created by the factory function to be destroyed through IBackendInternal.
virtual IWorkloadFactoryPtr CreateWorkloadFactory (class TensorHandleFactoryRegistry &tensorHandleFactoryRegistry, const ModelOptions &modelOptions, MemorySourceFlags inputFlags, MemorySourceFlags outputFlags) const
virtual OptimizationViews OptimizeSubgraphView (const SubgraphView &subgraph) const
bool SupportsTensorAllocatorAPI () const
ITensorHandleFactory::FactoryId GetBackwardCompatibleFavoriteHandleFactory ()
virtual void RegisterTensorHandleFactories (class TensorHandleFactoryRegistry &registry, MemorySourceFlags inputFlags, MemorySourceFlags outputFlags)
 (Optional) Register TensorHandleFactories. Either this method or the combination of CreateMemoryManager(), IWorkloadFactory::CreateTensor() and IWorkloadFactory::CreateSubtensor() must be implemented.
virtual bool UseCustomMemoryAllocator (std::shared_ptr< ICustomAllocator > allocator, armnn::Optional< std::string & > errMsg)
 Signals the backend to use a custom memory allocator provided by the user.
virtual unsigned int GetNumberOfCacheFiles () const
 Returns the number of files cached if backend supports caching.

Static Public Member Functions

static const BackendId & GetIdStatic ()
Static Public Member Functions inherited from IBackendInternal
static constexpr BackendVersion GetApiVersion ()
 Returns the version of the Backend API.

Additional Inherited Members

Public Types inherited from IBackendInternal
using IWorkloadFactoryPtr = std::unique_ptr<IWorkloadFactory>
using IBackendContextPtr = std::unique_ptr<IBackendContext>
using IBackendProfilingContextPtr = std::shared_ptr<arm::pipe::IBackendProfilingContext>
 This is the bridge between backend and backend profiling; we'll keep it in the backend namespace.
using IBackendProfilingPtr = std::unique_ptr<arm::pipe::IBackendProfiling>
using ILayerSupportSharedPtr = std::shared_ptr<ILayerSupport>
using IBackendSpecificModelContextPtr = std::shared_ptr<IBackendModelContext>
using IMemoryManagerUniquePtr = std::unique_ptr<IMemoryManager>
using IMemoryManagerSharedPtr = std::shared_ptr<IMemoryManager>
Protected Member Functions inherited from IBackendInternal
 IBackendInternal ()=default
 Creation must be done through a specific backend interface.
Protected Member Functions inherited from IBackend
 IBackend ()
virtual ~IBackend ()

Detailed Description

Definition at line 29 of file NeonBackend.hpp.

Constructor & Destructor Documentation

◆ NeonBackend()

NeonBackend ( )
default

◆ ~NeonBackend()

~NeonBackend ( )
default

Member Function Documentation

◆ CreateBackendContext()

IBackendInternal::IBackendContextPtr CreateBackendContext ( const IRuntime::CreationOptions & ) const
override virtual

Create the runtime context of the backend.

Implementations may return a default-constructed IBackendContextPtr if no context is needed at runtime. Implementations must throw BackendUnavailableException if the backend cannot be used (for example, necessary accelerator hardware is not present). The default implementation always returns a default-constructed pointer.

Reimplemented from IBackendInternal.

Definition at line 109 of file NeonBackend.cpp.

110{
111 return IBackendContextPtr{};
112}
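
The contract above is visible from client code. A minimal hedged sketch using the public headers armnn/IRuntime.hpp and armnn/Exceptions.hpp; note that, depending on the ArmNN version, the runtime may simply skip an unavailable backend rather than let the exception escape:

#include <armnn/IRuntime.hpp>
#include <armnn/Exceptions.hpp>
#include <iostream>

int main()
{
    try
    {
        armnn::IRuntime::CreationOptions options;
        armnn::IRuntimePtr runtime = armnn::IRuntime::Create(options);
        // For NeonBackend this step is cheap: the context returned above is a
        // default-constructed IBackendContextPtr.
        std::cout << "Runtime created\n";
    }
    catch (const armnn::BackendUnavailableException& e)
    {
        std::cerr << "Backend unavailable: " << e.what() << '\n';
    }
    return 0;
}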

◆ CreateBackendProfilingContext()

IBackendInternal::IBackendProfilingContextPtr CreateBackendProfilingContext ( const IRuntime::CreationOptions & creationOptions,
IBackendProfilingPtr & backendProfiling )
override virtual

Create context specifically used for profiling interaction from backends.

Reimplemented from IBackendInternal.

Definition at line 114 of file NeonBackend.cpp.

116{
117 return IBackendProfilingContextPtr{};
118}

◆ CreateBackendSpecificModelContext()

IBackendInternal::IBackendSpecificModelContextPtr CreateBackendSpecificModelContext ( const ModelOptions & modelOptions) const
override virtual

Reimplemented from IBackendInternal.

Definition at line 120 of file NeonBackend.cpp.

122{
123 return IBackendSpecificModelContextPtr{new NeonBackendModelContext{modelOptions}};
124}

Referenced by CreateWorkloadFactory(), CreateWorkloadFactory(), and GetLayerSupport().

◆ CreateMemoryManager()

IBackendInternal::IMemoryManagerUniquePtr CreateMemoryManager ( ) const
override virtual

Reimplemented from IBackendInternal.

Definition at line 52 of file NeonBackend.cpp.

53{
54 return std::make_unique<NeonMemoryManager>(std::make_unique<arm_compute::Allocator>(),
55 BaseMemoryManager::MemoryAffinity::Offset);
56}

References BaseMemoryManager::Offset.

◆ CreateWorkloadFactory() [1/4]

IBackendInternal::IWorkloadFactoryPtr CreateWorkloadFactory ( class TensorHandleFactoryRegistry & tensorHandleFactoryRegistry) const
override virtual

Reimplemented from IBackendInternal.

Definition at line 72 of file NeonBackend.cpp.

74{
75 auto memoryManager = std::make_shared<NeonMemoryManager>(std::make_unique<arm_compute::Allocator>(),
76 BaseMemoryManager::MemoryAffinity::Offset);
77
78 tensorHandleFactoryRegistry.RegisterMemoryManager(memoryManager);
79
80 auto factory = std::make_unique<NeonTensorHandleFactory>(memoryManager);
81 // Register copy and import factory pair
82 tensorHandleFactoryRegistry.RegisterCopyAndImportFactoryPair(factory->GetId(), factory->GetId());
83 // Register the factory
84 tensorHandleFactoryRegistry.RegisterFactory(std::move(factory));
85
86
87 return std::make_unique<NeonWorkloadFactory>(
88 PolymorphicPointerDowncast<NeonMemoryManager>(memoryManager));
89}

References BaseMemoryManager::Offset, armnn::PolymorphicPointerDowncast(), TensorHandleFactoryRegistry::RegisterCopyAndImportFactoryPair(), TensorHandleFactoryRegistry::RegisterFactory(), and TensorHandleFactoryRegistry::RegisterMemoryManager().
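
For orientation, a hedged sketch of how a caller (normally the ArmNN runtime rather than user code) drives this overload; the include paths are assumptions based on the ArmNN source layout:

#include <armnn/backends/TensorHandleFactoryRegistry.hpp>
#include <neon/NeonBackend.hpp> // assumed path: src/backends/neon

void CreateNeonWorkloadFactory()
{
    armnn::NeonBackend backend;
    armnn::TensorHandleFactoryRegistry registry;

    // As shown above, this registers a NeonMemoryManager and a
    // NeonTensorHandleFactory (paired with itself for copy/import) on the
    // registry, then returns a workload factory sharing that memory manager.
    armnn::IBackendInternal::IWorkloadFactoryPtr factory =
        backend.CreateWorkloadFactory(registry);
}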

◆ CreateWorkloadFactory() [2/4]

IBackendInternal::IWorkloadFactoryPtr CreateWorkloadFactory ( class TensorHandleFactoryRegistry & tensorHandleFactoryRegistry,
const ModelOptions & modelOptions ) const
override virtual

Reimplemented from IBackendInternal.

Definition at line 91 of file NeonBackend.cpp.

93{
94 auto memoryManager = std::make_shared<NeonMemoryManager>(std::make_unique<arm_compute::Allocator>(),
95 BaseMemoryManager::MemoryAffinity::Offset);
96
97 tensorHandleFactoryRegistry.RegisterMemoryManager(memoryManager);
98
99 auto factory = std::make_unique<NeonTensorHandleFactory>(memoryManager);
100 // Register copy and import factory pair
101 tensorHandleFactoryRegistry.RegisterCopyAndImportFactoryPair(factory->GetId(), factory->GetId());
102 // Register the factory
103 tensorHandleFactoryRegistry.RegisterFactory(std::move(factory));
104
105 return std::make_unique<NeonWorkloadFactory>(
106 PolymorphicPointerDowncast<NeonMemoryManager>(memoryManager), CreateBackendSpecificModelContext(modelOptions));
107}

References CreateBackendSpecificModelContext(), BaseMemoryManager::Offset, armnn::PolymorphicPointerDowncast(), TensorHandleFactoryRegistry::RegisterCopyAndImportFactoryPair(), TensorHandleFactoryRegistry::RegisterFactory(), and TensorHandleFactoryRegistry::RegisterMemoryManager().

◆ CreateWorkloadFactory() [3/4]

IBackendInternal::IWorkloadFactoryPtr CreateWorkloadFactory ( const IBackendInternal::IMemoryManagerSharedPtr & memoryManager = nullptr) const
override virtual

Implements IBackendInternal.

Definition at line 58 of file NeonBackend.cpp.

60{
61 return std::make_unique<NeonWorkloadFactory>(
62 PolymorphicPointerDowncast<NeonMemoryManager>(memoryManager));
63}

References armnn::PolymorphicPointerDowncast().

◆ CreateWorkloadFactory() [4/4]

IBackendInternal::IWorkloadFactoryPtr CreateWorkloadFactory ( const IMemoryManagerSharedPtr & memoryManager,
const ModelOptions & modelOptions ) const
override virtual

Reimplemented from IBackendInternal.

Definition at line 65 of file NeonBackend.cpp.

67{
68 return std::make_unique<NeonWorkloadFactory>(
69 PolymorphicPointerDowncast<NeonMemoryManager>(memoryManager), CreateBackendSpecificModelContext(modelOptions));
70}

References CreateBackendSpecificModelContext(), and armnn::PolymorphicPointerDowncast().

◆ GetCapabilities()

BackendCapabilities GetCapabilities ( ) const
inline override virtual

Returns a BackendCapability if the backend lists the capability. The BackendCapability must then be inspected to check whether that BackendCapability is supported; otherwise, an EmptyOptional is returned if the BackendCapability is unlisted.

Reimplemented from IBackendInternal.

Definition at line 68 of file NeonBackend.hpp.

69 {
70 return cpuAccCapabilities;
71 };
const BackendCapabilities cpuAccCapabilities("CpuAcc",
{
    {"NonConstWeights", true},
    {"ProtectedContentAllocation", false},
    {"ConstantTensorsAsInputs", true},
    {"PreImportIOTensors", false},
    {"ExternallyManagedMemory", true},
    {"MultiAxisPacking", false},
    {"SingleAxisPacking", true},
    {"HasFp16", arm_compute::CPUInfo::get().has_fp16()},
    {"AllOrNothing", false}
})

References armnn::cpuAccCapabilities.
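
A hedged sketch of inspecting the capability list above from client code, via the HasCapability()/GetCapability() helpers declared in armnn/BackendHelper.hpp:

#include <armnn/BackendHelper.hpp>
#include <iostream>

void QueryCpuAccCapability()
{
    const armnn::BackendId cpuAcc("CpuAcc");

    // HasCapability() reports whether the capability is listed at all;
    // GetCapability() yields the listed value, which must then be inspected.
    if (armnn::HasCapability("NonConstWeights", cpuAcc))
    {
        auto capability = armnn::GetCapability("NonConstWeights", cpuAcc);
        if (capability.has_value())
        {
            std::cout << "NonConstWeights supported: "
                      << capability.value().GetValue().AsBool() << '\n';
        }
    }
}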

◆ GetDefaultAllocator()

std::unique_ptr< ICustomAllocator > GetDefaultAllocator ( ) const
override virtual

Returns the default memory allocator for the backend.

Returns
- A unique pointer to the default allocator of the backend.

Reimplemented from IBackendInternal.

Definition at line 643 of file NeonBackend.cpp.

644{
645 return std::make_unique<DefaultAllocator>();
646}
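
The default allocator is what a user-supplied allocator replaces. A hedged sketch of providing one through IRuntime::CreationOptions::m_CustomAllocatorMap, which the runtime routes to UseCustomMemoryAllocator():

#include <armnn/IRuntime.hpp>
#include <armnn/backends/ICustomAllocator.hpp>
#include <memory>
#include <utility>

armnn::IRuntimePtr CreateRuntimeWithAllocator(std::shared_ptr<armnn::ICustomAllocator> allocator)
{
    armnn::IRuntime::CreationOptions options;
    // Map the backend id to the allocator it should use instead of the default.
    options.m_CustomAllocatorMap["CpuAcc"] = std::move(allocator);
    return armnn::IRuntime::Create(options);
}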

◆ GetHandleFactoryPreferences()

std::vector< ITensorHandleFactory::FactoryId > GetHandleFactoryPreferences ( ) const
override virtual

(Optional) Returns a vector of supported TensorHandleFactory ids in preference order.

Reimplemented from IBackendInternal.

Definition at line 624 of file NeonBackend.cpp.

625{
626 return std::vector<ITensorHandleFactory::FactoryId>() = { NeonTensorHandleFactory::GetIdStatic() };
627}

References NeonTensorHandleFactory::GetIdStatic().

◆ GetId()

const BackendId & GetId ( ) const
inline override virtual

Implements IBackend.

Definition at line 36 of file NeonBackend.hpp.

36{ return GetIdStatic(); }

References GetIdStatic().

◆ GetIdStatic()

const BackendId & GetIdStatic ( )
static

Definition at line 46 of file NeonBackend.cpp.

47{
48 static const BackendId s_Id{NeonBackendId()};
49 return s_Id;
50}

References armnn::NeonBackendId().

Referenced by GetId().
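
A small sketch of where this id is consumed: armnn::Compute::CpuAcc converts to the same BackendId, so requesting this backend during optimization looks like:

#include <armnn/ArmNN.hpp>
#include <vector>

armnn::IOptimizedNetworkPtr OptimizeForNeon(const armnn::INetwork& network,
                                            const armnn::IRuntime& runtime)
{
    // "CpuAcc" is the id returned by NeonBackend::GetIdStatic().
    std::vector<armnn::BackendId> preferredBackends = { armnn::Compute::CpuAcc };
    return armnn::Optimize(network, preferredBackends, runtime.GetDeviceSpec());
}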

◆ GetLayerSupport() [1/2]

IBackendInternal::ILayerSupportSharedPtr GetLayerSupport ( ) const
override virtual

Implements IBackendInternal.

Definition at line 126 of file NeonBackend.cpp.

127{
128 static ILayerSupportSharedPtr layerSupport
129 {
130 new NeonLayerSupport(IBackendInternal::IBackendSpecificModelContextPtr{})
131 };
132 return layerSupport;
133}

◆ GetLayerSupport() [2/2]

IBackendInternal::ILayerSupportSharedPtr GetLayerSupport ( const ModelOptions & modelOptions) const
override virtual

Reimplemented from IBackendInternal.

Definition at line 135 of file NeonBackend.cpp.

136{
137 static ILayerSupportSharedPtr layerSupport
138 {
139 new NeonLayerSupport(CreateBackendSpecificModelContext(modelOptions))
140 };
141 return layerSupport;
142}

References CreateBackendSpecificModelContext().
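
A hedged sketch of querying this backend's layer support from client code, via armnn::GetILayerSupportByBackendId() declared in armnn/BackendHelper.hpp:

#include <armnn/BackendHelper.hpp>
#include <armnn/Descriptors.hpp>
#include <iostream>
#include <string>

void CheckReluSupport(const armnn::TensorInfo& input, const armnn::TensorInfo& output)
{
    armnn::LayerSupportHandle cpuAcc = armnn::GetILayerSupportByBackendId("CpuAcc");

    armnn::ActivationDescriptor desc;
    desc.m_Function = armnn::ActivationFunction::ReLu;

    std::string reason;
    if (!cpuAcc.IsActivationSupported(input, output, desc,
                                      armnn::Optional<std::string&>(reason)))
    {
        std::cout << "ReLu not supported on CpuAcc: " << reason << '\n';
    }
}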

◆ OptimizeSubgraphView()

OptimizationViews OptimizeSubgraphView ( const SubgraphView & subgraph,
const ModelOptions & modelOptions ) const
override virtual

Reimplemented from IBackendInternal.

Definition at line 144 of file NeonBackend.cpp.

146{
147 OptimizationViews optimizationViews(modelOptions);
148
149 auto it = subgraph.end();
150 std::map<LayerGuid, Layer*> untouched;
151
152 while (it != subgraph.begin())
153 {
154 --it;
155 Layer& base = *(PolymorphicDowncast<Layer*>(*it));
156 untouched.insert({base.GetGuid(), &base});
157 }
158
159 it = subgraph.end();
160 while (it != subgraph.begin())
161 {
162 --it;
163 Layer& base = *(PolymorphicDowncast<Layer*>(*it));
164
165 // Fuse activation into previous layer if supported by backend
166 if ((base.GetType() == LayerType::DepthwiseConvolution2d || base.GetType() == LayerType::Convolution2d
167 || base.GetType() == LayerType::BatchNormalization || base.GetType() == LayerType::FullyConnected
168 || base.GetType() == LayerType::Addition || base.GetType() == LayerType::Multiplication
169 || base.GetType() == LayerType::Subtraction || base.GetType() == LayerType::Division
170 || base.GetType() == LayerType::ElementwiseBinary)
171 && (base.GetAdditionalInformation<ActivationDescriptor>() == nullptr))
172 {
173 for (auto output = base.BeginOutputSlots(); output != base.EndOutputSlots(); ++output)
174 {
175 if (output->GetNumConnections() == 1)
176 {
177 for (auto&& childInput : output->GetConnections())
178 {
179 if ((childInput->GetOwningLayer().GetType() == LayerType::Activation) &&
180 (checkDataTypeInputandOutput(childInput->GetOwningLayer())))
181 {
182 Layer& child = childInput->GetOwningLayer();
183
184 auto* activationLayer = PolymorphicDowncast<ActivationLayer*>(&child);
185 // Before we proceed make sure that this activation layer is in the subgraph. It could be
186 // the first layer in the next subgraph.
187 if (untouched.find(activationLayer->GetGuid()) == untouched.end())
188 {
189 // We can't fuse a layer that's outside the subgraph.
190 break;
191 }
192 const std::string name = std::string("fused-") + child.GetName() + std::string("-into-") +
193 base.GetName();
194
195 // Get params from activation layer
196 ActivationDescriptor activationDesc = activationLayer->GetParameters();
197
198 if (base.GetType() == LayerType::Convolution2d)
199 {
200 Convolution2dLayer* baseLayer = PolymorphicDowncast<Convolution2dLayer*>(&base);
201
202 Optional<TensorInfo> biases;
203
204 if (baseLayer->GetParameters().m_BiasEnabled)
205 {
206 biases = baseLayer->GetInputSlot(2).GetConnectedOutputSlot()->GetTensorInfo();
207 }
208
209 arm_compute::Status status = NeonConvolution2dWorkloadValidate(
210 baseLayer->GetInputSlot(0).GetConnectedOutputSlot()->GetTensorInfo(),
211 activationLayer->GetInputSlot(0).GetConnectedOutputSlot()->GetTensorInfo(),
212 baseLayer->GetParameters(),
213 baseLayer->GetInputSlot(1).GetConnectedOutputSlot()->GetTensorInfo(),
214 biases,
215 false,
216 &activationDesc);
217
218 if (status)
219 {
220 FuseConvolution2dLayer<Convolution2dLayer>(optimizationViews,
221 baseLayer,
222 activationLayer,
223 activationDesc,
224 name);
225 untouched.erase(baseLayer->GetGuid());
226 untouched.erase(activationLayer->GetGuid());
227 }
228 }
229 else if (base.GetType() == LayerType::DepthwiseConvolution2d)
230 {
231 DepthwiseConvolution2dLayer* baseLayer =
232 PolymorphicDowncast<DepthwiseConvolution2dLayer*>(&base);
233
234 Optional<TensorInfo> biases;
235
236 if (baseLayer->GetParameters().m_BiasEnabled)
237 {
238 biases = baseLayer->GetInputSlot(2).GetConnectedOutputSlot()->GetTensorInfo();
239 }
240
241 arm_compute::Status status = NeonDepthwiseConvolutionWorkloadValidate(
242 baseLayer->GetInputSlot(0).GetConnectedOutputSlot()->GetTensorInfo(),
243 activationLayer->GetInputSlot(0).GetConnectedOutputSlot()->GetTensorInfo(),
244 baseLayer->GetParameters(),
245 baseLayer->GetInputSlot(1).GetConnectedOutputSlot()->GetTensorInfo(),
246 biases,
247 &activationDesc);
248
249 if (status)
250 {
251 FuseDepthwiseConvolution2dLayer<DepthwiseConvolution2dLayer>(optimizationViews,
252 baseLayer,
253 activationLayer,
254 activationDesc,
255 name);
256 untouched.erase(baseLayer->GetGuid());
257 untouched.erase(activationLayer->GetGuid());
258 }
259 }
260 else if (base.GetType() == LayerType::FullyConnected)
261 {
262 FullyConnectedLayer* baseLayer = PolymorphicDowncast<FullyConnectedLayer*>(&base);
263 FullyConnectedDescriptor descriptor = baseLayer->GetParameters();
264
265 // As bias is optional only try to get TensorInfo from input if bias is enabled.
266 Optional<TensorInfo> biases;
267 if (descriptor.m_BiasEnabled)
268 {
269 biases = baseLayer->GetInputSlot(2).GetConnectedOutputSlot()->GetTensorInfo();
270 }
271
272 arm_compute::Status status = NeonFullyConnectedWorkloadValidate(
273 baseLayer->GetInputSlot(0).GetConnectedOutputSlot()->GetTensorInfo(),
274 activationLayer->GetInputSlot(0).GetConnectedOutputSlot()->GetTensorInfo(),
275 baseLayer->GetInputSlot(1).GetConnectedOutputSlot()->GetTensorInfo(),
276 biases,
277 baseLayer->GetParameters(),
278 &activationDesc);
279
280 if (status)
281 {
282 FuseFullyConnectedLayer<FullyConnectedLayer>(optimizationViews,
283 baseLayer,
284 activationLayer,
285 activationDesc,
286 name);
287 untouched.erase(baseLayer->GetGuid());
288 untouched.erase(activationLayer->GetGuid());
289 }
290 }
291 else if (base.GetType() == LayerType::BatchNormalization)
292 {
293 BatchNormalizationLayer* baseLayer =
294 PolymorphicDowncast<BatchNormalizationLayer*>(&base);
295
296 arm_compute::Status status = NeonBatchNormalizationValidate(
297 baseLayer->GetInputSlot(0).GetConnectedOutputSlot()->GetTensorInfo(),
298 activationLayer->GetInputSlot(0).GetConnectedOutputSlot()->GetTensorInfo(),
299 baseLayer->m_Mean->GetTensorInfo(),
300 baseLayer->m_Variance->GetTensorInfo(),
301 baseLayer->m_Beta->GetTensorInfo(),
302 baseLayer->m_Gamma->GetTensorInfo(),
303 baseLayer->GetParameters(),
304 &activationDesc);
305
306 if (status)
307 {
308 BatchNormalizationLayer* replacementLayer =
309 FuseBatchNormalizationLayer<BatchNormalizationLayer>(optimizationViews,
310 baseLayer,
311 activationLayer,
312 activationDesc,
313 name);
314
315 replacementLayer->m_Beta = std::move(baseLayer->m_Beta);
316 replacementLayer->m_Gamma = std::move(baseLayer->m_Gamma);
317 replacementLayer->m_Mean = std::move(baseLayer->m_Mean);
318 replacementLayer->m_Variance = std::move(baseLayer->m_Variance);
319 untouched.erase(baseLayer->GetGuid());
320 untouched.erase(activationLayer->GetGuid());
321 }
322 }
323 else if (base.GetType() == LayerType::Addition)
324 {
325 AdditionLayer* baseLayer = PolymorphicDowncast<AdditionLayer*>(&base);
326
327 arm_compute::Status status = NeonAdditionWorkloadValidate(
328 baseLayer->GetInputSlot(0).GetConnectedOutputSlot()->GetTensorInfo(),
329 baseLayer->GetInputSlot(1).GetConnectedOutputSlot()->GetTensorInfo(),
330 activationLayer->GetInputSlot(0).GetConnectedOutputSlot()->GetTensorInfo(),
331 &activationDesc);
332
333 if (status)
334 {
335 FuseAdditionLayer<AdditionLayer>(optimizationViews,
336 baseLayer,
337 activationLayer,
338 activationDesc,
339 name);
340 untouched.erase(baseLayer->GetGuid());
341 untouched.erase(activationLayer->GetGuid());
342 }
343 }
344 else if (base.GetType() == LayerType::Division)
345 {
346 DivisionLayer* baseLayer = PolymorphicDowncast<DivisionLayer*>(&base);
347
348 arm_compute::Status status = NeonDivisionWorkloadValidate(
349 baseLayer->GetInputSlot(0).GetConnectedOutputSlot()->GetTensorInfo(),
350 baseLayer->GetInputSlot(1).GetConnectedOutputSlot()->GetTensorInfo(),
351 activationLayer->GetInputSlot(0).GetConnectedOutputSlot()->GetTensorInfo(),
352 &activationDesc);
353
354 if (status)
355 {
356 FuseDivisionLayer<DivisionLayer>(optimizationViews,
357 baseLayer,
358 activationLayer,
359 activationDesc,
360 name);
361 untouched.erase(baseLayer->GetGuid());
362 untouched.erase(activationLayer->GetGuid());
363 }
364 }
365 else if (base.GetType() == LayerType::Multiplication)
366 {
367 MultiplicationLayer* baseLayer = PolymorphicDowncast<MultiplicationLayer*>(&base);
368
369 arm_compute::Status status = NeonMultiplicationWorkloadValidate(
370 baseLayer->GetInputSlot(0).GetConnectedOutputSlot()->GetTensorInfo(),
371 baseLayer->GetInputSlot(1).GetConnectedOutputSlot()->GetTensorInfo(),
372 activationLayer->GetInputSlot(0).GetConnectedOutputSlot()->GetTensorInfo(),
373 &activationDesc);
374
375 if (status)
376 {
377 FuseMultiplicationLayer<MultiplicationLayer>(optimizationViews,
378 baseLayer,
379 activationLayer,
380 activationDesc,
381 name);
382 untouched.erase(baseLayer->GetGuid());
383 untouched.erase(activationLayer->GetGuid());
384 }
385 }
386 else if (base.GetType() == LayerType::Subtraction)
387 {
388 SubtractionLayer* baseLayer = PolymorphicDowncast<SubtractionLayer*>(&base);
389
390 arm_compute::Status status = NeonSubtractionWorkloadValidate(
391 baseLayer->GetInputSlot(0).GetConnectedOutputSlot()->GetTensorInfo(),
392 baseLayer->GetInputSlot(1).GetConnectedOutputSlot()->GetTensorInfo(),
393 activationLayer->GetInputSlot(0).GetConnectedOutputSlot()->GetTensorInfo(),
394 &activationDesc);
395
396 if (status)
397 {
398 FuseSubtractionLayer<SubtractionLayer>(optimizationViews,
399 baseLayer,
400 activationLayer,
401 activationDesc,
402 name);
403 untouched.erase(baseLayer->GetGuid());
404 untouched.erase(activationLayer->GetGuid());
405 }
406 }
407 else if (base.GetType() == LayerType::ElementwiseBinary)
408 {
409 ElementwiseBinaryLayer* baseLayer = PolymorphicDowncast<ElementwiseBinaryLayer*>(&base);
410
411 if (baseLayer->GetParameters().m_Operation == BinaryOperation::Add)
412 {
413 arm_compute::Status status = NeonAdditionWorkloadValidate(
414 baseLayer->GetInputSlot(0).GetConnectedOutputSlot()->GetTensorInfo(),
415 baseLayer->GetInputSlot(1).GetConnectedOutputSlot()->GetTensorInfo(),
416 activationLayer->GetInputSlot(0).GetConnectedOutputSlot()->GetTensorInfo(),
417 &activationDesc);
418
419 if (status)
420 {
421 FuseElementwiseBinaryLayer<ElementwiseBinaryLayer>(optimizationViews,
422 baseLayer,
423 activationLayer,
424 activationDesc,
425 BinaryOperation::Add,
426 name);
427 untouched.erase(baseLayer->GetGuid());
428 untouched.erase(activationLayer->GetGuid());
429 }
430 }
431 else if (baseLayer->GetParameters().m_Operation == BinaryOperation::Div)
432 {
433 arm_compute::Status status = NeonDivisionWorkloadValidate(
434 baseLayer->GetInputSlot(0).GetConnectedOutputSlot()->GetTensorInfo(),
435 baseLayer->GetInputSlot(1).GetConnectedOutputSlot()->GetTensorInfo(),
436 activationLayer->GetInputSlot(0).GetConnectedOutputSlot()->GetTensorInfo(),
437 &activationDesc);
438
439 if (status)
440 {
441 FuseElementwiseBinaryLayer<ElementwiseBinaryLayer>(optimizationViews,
442 baseLayer,
443 activationLayer,
444 activationDesc,
445 BinaryOperation::Div,
446 name);
447 untouched.erase(baseLayer->GetGuid());
448 untouched.erase(activationLayer->GetGuid());
449 }
450 }
451 else if (baseLayer->GetParameters().m_Operation == BinaryOperation::Mul)
452 {
453 arm_compute::Status status = NeonMultiplicationWorkloadValidate(
454 baseLayer->GetInputSlot(0).GetConnectedOutputSlot()->GetTensorInfo(),
455 baseLayer->GetInputSlot(1).GetConnectedOutputSlot()->GetTensorInfo(),
456 activationLayer->GetInputSlot(0).GetConnectedOutputSlot()->GetTensorInfo(),
457 &activationDesc);
458
459 if (status)
460 {
461 FuseElementwiseBinaryLayer<ElementwiseBinaryLayer>(optimizationViews,
462 baseLayer,
463 activationLayer,
464 activationDesc,
465 BinaryOperation::Mul,
466 name);
467 untouched.erase(baseLayer->GetGuid());
468 untouched.erase(activationLayer->GetGuid());
469 }
470 }
471 else if (baseLayer->GetParameters().m_Operation == BinaryOperation::Sub)
472 {
473 arm_compute::Status status = NeonSubtractionWorkloadValidate(
474 baseLayer->GetInputSlot(0).GetConnectedOutputSlot()->GetTensorInfo(),
475 baseLayer->GetInputSlot(1).GetConnectedOutputSlot()->GetTensorInfo(),
476 activationLayer->GetInputSlot(0).GetConnectedOutputSlot()->GetTensorInfo(),
477 &activationDesc);
478
479 if (status)
480 {
481 FuseElementwiseBinaryLayer<ElementwiseBinaryLayer>(optimizationViews,
482 baseLayer,
483 activationLayer,
484 activationDesc,
485 BinaryOperation::Sub,
486 name);
487 untouched.erase(baseLayer->GetGuid());
488 untouched.erase(activationLayer->GetGuid());
489 }
490 }
491 // No fusion available for other BinaryOperations
492 }
493 }
494 }
495 }
496 }
497 }
498
499 // Separate reduce layer with multiple axes into multiple reduce layers with 1 axis.
500 if (base.GetType() == LayerType::Reduce)
501 {
502 ReduceLayer* baseLayer = PolymorphicDowncast<ReduceLayer*>(&base);
503 ReduceDescriptor reduceDescriptor = baseLayer->GetParameters();
504
505 if (!reduceDescriptor.m_vAxis.empty() && reduceDescriptor.m_vAxis.size() > 1)
506 {
507 // Add new layers to the graph and connect them.
508 std::vector<IConnectableLayer*> layers = ChainReduceLayers<ReduceLayer>(optimizationViews,
509 baseLayer,
510 reduceDescriptor);
511
512 // Replace existing baselayer with new subgraph.
513 ReplaceLayers<ReduceLayer>(optimizationViews, baseLayer, layers);
514 untouched.erase(baseLayer->GetGuid());
515 }
516 }
517
518 // Remove Reshape where possible
519 if (base.GetType() == LayerType::Reshape)
520 {
521 ReshapeLayer* baseLayer = PolymorphicDowncast<ReshapeLayer*>(&base);
522
523 // Cannot remove a Reshape if it's connected to any layer that has an NCHW layout
524 if (ConnectedToLayerWithNCHW(baseLayer))
525 {
526 continue;
527 }
528 RemoveReshapeLayer(baseLayer, untouched, optimizationViews);
529 }
530
531 // Replace Add/Mul/Add where possible
532 Layer* layerList[4] = {nullptr, nullptr, nullptr, nullptr};
533 const std::vector<ActivationFunction> validActivates = { ActivationFunction::ReLu,
534 ActivationFunction::BoundedReLu };
535 if (IsLayerSequence<BinaryOperation>(base,
536 BinaryOperation::Add, BinaryOperation::Mul, BinaryOperation::Add,
537 layerList,
538 true, // handleValidActivates
539 validActivates))
540 {
541 bool fuseReLu = false;
542 unsigned int numInputs = 0;
543 unsigned int numOutputs = 0;
544 std::vector<TensorInfo> inputInfos;
545 std::vector<TensorInfo> outputInfos;
546 const ActivationDescriptor* activationDescriptor = nullptr;
547
548 if (BuildAddMulAddTensorInfoLists<Layer>(layerList,
549 numInputs,
550 numOutputs,
551 inputInfos,
552 outputInfos,
553 activationDescriptor,
554 fuseReLu))
555 {
556 // Create the new Add/Mul/Add layer and set the Relu activation function
557 FusedDescriptor fusedDescriptor(numInputs, numOutputs, FusedKernelType::AddMulAdd);
558 arm_compute::Status status = NeonFusedWorkloadValidate({inputInfos.begin(), inputInfos.end()},
559 {outputInfos.begin(), outputInfos.end()},
560 fusedDescriptor,
561 activationDescriptor);
562 if (status)
563 {
564 std::string fusedName;
565 GetFusedName(layerList, fusedName);
566
567 IConnectableLayer* addMulAddLayer =
568 optimizationViews.GetINetwork()->AddFusedLayer(fusedDescriptor, fusedName.c_str());
569
570 if (fuseReLu)
571 {
572 FusedLayer* addMulAddFusedLayer = PolymorphicDowncast<FusedLayer*>(addMulAddLayer);
573 addMulAddFusedLayer->SetAdditionalInfoForObject(
574 std::make_shared<ActivationDescriptor>(*activationDescriptor));
575 }
576
577 // Update the graph
578 std::vector<IConnectableLayer*> originalLayers;
579 for (unsigned int layerIdx = 0; layerIdx < 4; ++layerIdx)
580 {
581 if (layerList[layerIdx])
582 {
583 originalLayers.push_back(layerList[layerIdx]);
584 }
585 }
586
587 std::vector<SlotList> inputLayersSlotLists, outputLayersSlotLists;
588 BuildAddMulAddSlotLists<SlotList>(fuseReLu,
589 outputInfos.size() > 1,
590 inputLayersSlotLists,
591 outputLayersSlotLists);
592
593 ReplaceMultipleLayers<FusedLayer>(optimizationViews,
594 originalLayers,
595 PolymorphicDowncast<FusedLayer*>(addMulAddLayer),
596 inputLayersSlotLists,
597 outputLayersSlotLists);
598
599 // Remove unused layers
600 for (unsigned int layerIdx = 0; layerIdx < 4; ++layerIdx)
601 {
602 if (layerList[layerIdx])
603 {
604 untouched.erase(layerList[layerIdx]->GetGuid());
605 }
606 }
607 }
608 }
609 }
610 }
611
612 if (optimizationViews.GetSubstitutions().empty() && optimizationViews.GetDeletedSubgraphs().empty())
613 {
614 optimizationViews.AddUntouchedSubgraph(SubgraphView(subgraph));
615 }
616 else
617 {
618 ReportUntouchedLayers(optimizationViews, untouched);
619 }
620
621 return optimizationViews;
622}

References armnn::Activation, armnn::Add, INetwork::AddFusedLayer(), armnn::Addition, armnn::AddMulAdd, OptimizationViews::AddUntouchedSubgraph(), armnn::BatchNormalization, SubgraphView::begin(), Layer::BeginOutputSlots(), armnn::BoundedReLu, armnn::BuildAddMulAddSlotLists(), armnn::BuildAddMulAddTensorInfoLists(), armnn::ChainReduceLayers(), armnn::ConnectedToLayerWithNCHW(), armnn::Convolution2d, armnn::DepthwiseConvolution2d, armnn::Div, armnn::Division, armnn::ElementwiseBinary, SubgraphView::end(), Layer::EndOutputSlots(), armnn::FullyConnected, armnn::FuseAdditionLayer(), armnn::FuseBatchNormalizationLayer(), armnn::FuseConvolution2dLayer(), armnn::FuseDepthwiseConvolution2dLayer(), armnn::FuseDivisionLayer(), armnn::FuseElementwiseBinaryLayer(), armnn::FuseFullyConnectedLayer(), armnn::FuseMultiplicationLayer(), armnn::FuseSubtractionLayer(), Layer::GetAdditionalInformation(), InputSlot::GetConnectedOutputSlot(), OptimizationViews::GetDeletedSubgraphs(), armnn::GetFusedName(), Layer::GetGuid(), OptimizationViews::GetINetwork(), Layer::GetInputSlot(), Layer::GetName(), LayerWithParameters< Parameters >::GetParameters(), OptimizationViews::GetSubstitutions(), OutputSlot::GetTensorInfo(), Layer::GetType(), armnn::IsLayerSequence(), BatchNormalizationLayer::m_Beta, Convolution2dDescriptor::m_BiasEnabled, DepthwiseConvolution2dDescriptor::m_BiasEnabled, FullyConnectedDescriptor::m_BiasEnabled, BatchNormalizationLayer::m_Gamma, BatchNormalizationLayer::m_Mean, ElementwiseBinaryDescriptor::m_Operation, BatchNormalizationLayer::m_Variance, ReduceDescriptor::m_vAxis, armnn::Mul, armnn::Multiplication, armnn::NeonAdditionWorkloadValidate(), armnn::NeonBatchNormalizationValidate(), armnn::NeonConvolution2dWorkloadValidate(), armnn::NeonDepthwiseConvolutionWorkloadValidate(), armnn::NeonDivisionWorkloadValidate(), armnn::NeonFullyConnectedWorkloadValidate(), armnn::NeonFusedWorkloadValidate(), armnn::NeonMultiplicationWorkloadValidate(), armnn::NeonSubtractionWorkloadValidate(), armnn::PolymorphicDowncast(), armnn::Reduce, armnn::ReLu, armnn::RemoveReshapeLayer(), armnn::ReplaceLayers(), armnn::ReplaceMultipleLayers(), armnn::ReportUntouchedLayers(), armnn::Reshape, Layer::SetAdditionalInfoForObject(), armnn::Sub, and armnn::Subtraction.
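
As a hedged illustration of the activation-fusion path above: a network in which a Convolution2d feeds a single ReLu activation is a candidate for fusion when this backend optimizes the subgraph (weight/bias constant layers omitted for brevity):

#include <armnn/ArmNN.hpp>

void AddFusableConvRelu(armnn::INetwork& network,
                        const armnn::Convolution2dDescriptor& convDesc)
{
    armnn::IConnectableLayer* input = network.AddInputLayer(0);
    // Weights (and bias, if enabled) would be ConstantLayers connected to
    // input slots 1 and 2 of the convolution.
    armnn::IConnectableLayer* conv = network.AddConvolution2dLayer(convDesc, "conv");

    armnn::ActivationDescriptor reluDesc;
    reluDesc.m_Function = armnn::ActivationFunction::ReLu;
    armnn::IConnectableLayer* relu   = network.AddActivationLayer(reluDesc, "relu");
    armnn::IConnectableLayer* output = network.AddOutputLayer(0);

    input->GetOutputSlot(0).Connect(conv->GetInputSlot(0));
    // The convolution's output has exactly one connection (the activation),
    // matching the GetNumConnections() == 1 check in OptimizeSubgraphView.
    conv->GetOutputSlot(0).Connect(relu->GetInputSlot(0));
    relu->GetOutputSlot(0).Connect(output->GetInputSlot(0));
}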

◆ RegisterTensorHandleFactories()

void RegisterTensorHandleFactories ( class TensorHandleFactoryRegistry & registry )
override virtual

(Optional) Register TensorHandleFactories. Either this method or the combination of CreateMemoryManager(), IWorkloadFactory::CreateTensor() and IWorkloadFactory::CreateSubtensor() must be implemented.

Reimplemented from IBackendInternal.

Definition at line 629 of file NeonBackend.cpp.

630{
631 auto memoryManager = std::make_shared<NeonMemoryManager>(std::make_unique<arm_compute::Allocator>(),
632 BaseMemoryManager::MemoryAffinity::Offset);
633
634 registry.RegisterMemoryManager(memoryManager);
635
636 auto factory = std::make_unique<NeonTensorHandleFactory>(memoryManager);
637 // Register copy and import factory pair
638 registry.RegisterCopyAndImportFactoryPair(factory->GetId(), factory->GetId());
639 // Register the factory
640 registry.RegisterFactory(std::move(factory));
641}

References BaseMemoryManager::Offset, TensorHandleFactoryRegistry::RegisterCopyAndImportFactoryPair(), TensorHandleFactoryRegistry::RegisterFactory(), and TensorHandleFactoryRegistry::RegisterMemoryManager().


The documentation for this class was generated from the following files:

NeonBackend.hpp
NeonBackend.cpp