Arm-2D provides several standard ways to add support for the hardware acceleration of your target platform. Those methods address the typical 2D accelerations that are available in the ecosystem. Arm-2D categorizes them into several topics:
Depending on the capability of the target platform, several acceleration methods might be available at the same time, for example, a dual-core Cortex-M55 system with Helium and ACI extensions that also uses DMAC-350 to accelerate some 2D operations. In other words, the aforementioned acceleration methods are not mutually exclusive: you can enable every method that your platform supports.
NOTE:
- When Helium is available, and one enables the Helium support during the compilation, Arm-2D detects the Helium and turns on the Helium acceleration automatically.
- This document is for system/application engineers who design drivers to add various accelerations to Arm-2D for a target hardware platform.
Arm-2D uses some Arm-2D-specific intrinsics in the default low-level C implementations. These intrinsics are defined as macros in the private header file __arm_2d_impl.h.
NOTE: Arm-2D calls __ARM_2D_PIXEL_BLENDING_INIT every time before running the low-level implementation.
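The override mechanism for such macro intrinsics can be sketched as follows. This is a minimal illustration, assuming the library guards its defaults with `#ifndef`; the macro `DEMO_PIXEL_BLENDING_GRAY8`, the helper `my_acc_blend_gray8()` and the meaning of the `trans` argument (a transparency ratio in the range 0..256) are hypothetical stand-ins, not names or semantics taken from `__arm_2d_impl.h`.

```c
#include <stdint.h>

/* Hypothetical accelerator entry point. A plain C stand-in is used here;
 * a real port would issue a custom instruction or touch a register. */
static uint32_t s_my_acc_calls = 0;

static inline uint8_t my_acc_blend_gray8(uint8_t src, uint8_t des, uint16_t trans)
{
    s_my_acc_calls++;
    return (uint8_t)((src * (256u - trans) + des * trans) >> 8);
}

/* User side: this definition must be visible before the library default,
 * e.g. through a user header, -D or -include. */
#define DEMO_PIXEL_BLENDING_GRAY8(p_src, p_des, trans)                        \
    do {                                                                      \
        *(p_des) = my_acc_blend_gray8(*(p_src), *(p_des), (trans));           \
    } while (0)

/* Library side (assumed shape): the default applies only when the user
 * has not provided a definition. */
#ifndef DEMO_PIXEL_BLENDING_GRAY8
#   define DEMO_PIXEL_BLENDING_GRAY8(p_src, p_des, trans)                     \
    do {                                                                      \
        *(p_des) = (uint8_t)((*(p_src) * (256u - (trans))                     \
                            + *(p_des) * (trans)) >> 8);                      \
    } while (0)
#endif

/* A low-level loop that uses whichever definition is active. */
static void demo_blend_line(const uint8_t *src, uint8_t *des, int n, uint16_t trans)
{
    for (int i = 0; i < n; i++) {
        DEMO_PIXEL_BLENDING_GRAY8(&src[i], &des[i], trans);
    }
}
```

Because the user definition comes first, the `#ifndef` block is skipped and every pixel goes through `my_acc_blend_gray8()`.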
You can override the default definitions of these intrinsics and implement them with your own acceleration. Depending on the toolchain and the way you compile, different ways of overriding the Arm-2D intrinsics are available:
- Use arm_2d_user_sync_acc.h: add your definitions to this header file and define the macro __ARM_2D_HAS_TIGHTLY_COUPLED_ACC__ to 1 (in arm_2d_cfg.h or with the -D option). This is the recommended method.
- Define the intrinsics with -D on the command line.
- Pre-include a header file that contains your definitions (the -include option in GCC, LLVM and Arm Compiler 6).
Arm-2D provides default C implementations for a set of 2D operations. Although these functions are spread across different C source files, their prototypes are listed in a private header file called __arm_2d_direct.h.
You can override a default C implementation by providing your own function marked with the keyword __OVERRIDE_WEAK. For example, you can override the low-level implementation of tile-copy-with-src-mask-only for RGB565 in this way.
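A rough sketch of what such an override might look like is given below. The function name is the one this document uses later for the RGB565 source-mask copy; the parameter list, the `demo_size_t` type and the empty fallback definition of `__OVERRIDE_WEAK` are simplified assumptions made so that the sketch compiles on its own. The authoritative prototype is the one listed in `__arm_2d_direct.h` and may differ. The blending formula is only an illustration, not the library's algorithm.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins so the sketch compiles on its own; a real project takes
 * these from the Arm-2D headers instead. */
#ifndef __OVERRIDE_WEAK
#   define __OVERRIDE_WEAK
#endif
typedef struct { int16_t iWidth; int16_t iHeight; } demo_size_t;

/* Blend one RGB565 pixel: alpha = 255 keeps the source, 0 keeps the target. */
static uint16_t demo_blend_rgb565(uint16_t src, uint16_t dst, uint8_t alpha)
{
    uint32_t a = alpha, ia = 255u - alpha;
    uint32_t r = (((src >> 11) & 0x1Fu) * a + ((dst >> 11) & 0x1Fu) * ia) / 255u;
    uint32_t g = (((src >> 5)  & 0x3Fu) * a + ((dst >> 5)  & 0x3Fu) * ia) / 255u;
    uint32_t b = (( src        & 0x1Fu) * a + ( dst        & 0x1Fu) * ia) / 255u;
    return (uint16_t)((r << 11) | (g << 5) | b);
}

/* Hypothetical, simplified prototype (see the note above). Strides are
 * counted in pixels. */
__OVERRIDE_WEAK
void __arm_2d_impl_rgb565_src_msk_copy(const uint16_t *phwSource, int16_t iSourceStride,
                                       const uint8_t *pchMask, int16_t iMaskStride,
                                       uint16_t *phwTarget, int16_t iTargetStride,
                                       const demo_size_t *ptCopySize)
{
    assert(phwSource != NULL && pchMask != NULL);
    assert(phwTarget != NULL && ptCopySize != NULL);

    for (int16_t y = 0; y < ptCopySize->iHeight; y++) {
        for (int16_t x = 0; x < ptCopySize->iWidth; x++) {
            /* A real override would hand this work to the accelerator. */
            phwTarget[x] = demo_blend_rgb565(phwSource[x], phwTarget[x], pchMask[x]);
        }
        phwSource += iSourceStride;
        pchMask   += iMaskStride;
        phwTarget += iTargetStride;
    }
}
```

At link time, a non-weak definition with the library's exact name and prototype takes precedence over the weak default, so no library source file has to be modified.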
If you define the macro __ARM_2D_HAS_TIGHTLY_COUPLED_ACC__ to 1, a user-defined header file, arm_2d_user_sync_acc.h, is included in the compilation.
You can use this header file to
NOTE:
- The macro __ARM_2D_HAS_TIGHTLY_COUPLED_ACC__ does NOT affect the overriding of Arm-2D intrinsics.
- Using arm_2d_user_sync_acc.h to override the Arm-2D intrinsics is recommended but NOT required; any other viable solution works as well (for example, the -D command line option in GCC, LLVM and Arm Compiler 6).
- Using the macro __ARM_2D_HAS_TIGHTLY_COUPLED_ACC__ to include the header file arm_2d_user_sync_acc.h is recommended but NOT required; any other viable solution works as well (for example, the -include command line option in GCC, LLVM and Arm Compiler 6).
After you set the macro __ARM_2D_HAS_TIGHTLY_COUPLED_ACC__ to 1, arm_2d.c calls __arm_2d_sync_acc_init(), which you MUST implement in your own C source file. You can initialize the acceleration hardware logic there. If there is nothing to initialize, provide a function with an empty body.
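A minimal sketch of this hook follows. The `void (void)` signature is an assumption, since this document does not show the prototype, and the `s_my_acc_ready` flag is a hypothetical piece of driver state.

```c
#include <stdbool.h>

/* Hypothetical driver state; not part of Arm-2D. */
static bool s_my_acc_ready = false;

/* Assumed signature: no arguments, no return value. */
void __arm_2d_sync_acc_init(void)
{
    /* Typical work: enable clocks, reset the accelerator, load tables.
     * If nothing needs initializing, leave the body empty. */
    s_my_acc_ready = true;
}
```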
Architecturally, Arm-2D is designed as a Pixel-Pipeline plus a set of OPCODEs, as shown in Figure 2-1.
Figure 2-1 Arm-2D Pixel-Pipeline
Here, an OPCODE is the descriptor of a 2D operation. It contains both the arguments from the caller and the references to the actual algorithms for the specific 2D operation. The User Interface part provides APIs that generate and initialize OPCODEs. The Frontend performs some common pre-processing for each OPCODE and generates TASKs for the Backend.
A TASK is the descriptor of a low-level task that can be handled by software algorithms or hardware accelerators. The key feature of the Backend is a Fall-back scheme: the Dispatcher in the Backend always issues tasks to the HW adaptor (a driver for a corresponding accelerator) first and falls back to the software algorithm for any task that the HW adaptor refuses.
Arm-2D does not propose a standard for what functionality a hardware accelerator should provide, nor does it set requirements for the characteristics of a hardware accelerator. This is intentional. To get the best compatibility and interface flexibility, Arm-2D splits 2D processing into simple small tasks, validates the parameters passed to the hardware (e.g. ensuring that coordinate values are always non-negative and memory addresses are valid), passes all the task information to the HW adaptor, and lets the HW adaptor decide whether the corresponding task can be processed or not. This is what Feature Agnostic means.
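The Fall-back scheme described above can be sketched as follows. All types, names and enum values in this block (`demo_result_t`, `demo_task_t`, `demo_low_level_io_t`, `demo_dispatch()`) are hypothetical stand-ins for illustration, not the library's actual Dispatcher, and the "width must be a multiple of 4" rule is an invented hardware limitation.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in result codes mirroring the three outcomes an adaptor can report. */
typedef enum {
    DEMO_ERR_NOT_SUPPORT = -1,  /* hardware refuses the task        */
    DEMO_RT_CPL          = 0,   /* task finished before returning   */
    DEMO_RT_ASYNC        = 1    /* task accepted, completes later   */
} demo_result_t;

typedef struct { int width; int height; } demo_task_t;
typedef demo_result_t demo_io_fn(const demo_task_t *task);

/* A low-level IO pairs a software routine with an optional HW adaptor. */
typedef struct {
    demo_io_fn *SW;
    demo_io_fn *HW;
} demo_low_level_io_t;

/* Offer the task to the hardware first; run software only on refusal. */
static demo_result_t demo_dispatch(const demo_low_level_io_t *io,
                                   const demo_task_t *task)
{
    assert(io != NULL && io->SW != NULL && task != NULL);
    if (io->HW != NULL) {
        demo_result_t result = io->HW(task);
        if (result != DEMO_ERR_NOT_SUPPORT) {
            return result;
        }
    }
    return io->SW(task);
}

static int s_sw_runs = 0;

static demo_result_t demo_sw_copy(const demo_task_t *task)
{
    (void)task;
    s_sw_runs++;            /* the software path always completes in place */
    return DEMO_RT_CPL;
}

static demo_result_t demo_hw_copy(const demo_task_t *task)
{
    /* Invented limitation: only widths that are multiples of 4. */
    if (task->width % 4 != 0) {
        return DEMO_ERR_NOT_SUPPORT;
    }
    return DEMO_RT_ASYNC;
}
```

The design keeps capability knowledge inside the adaptor: the dispatcher never needs to know what the hardware can do, only whether it accepted the task.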
Each OPCODE refers to a dedicated Low-Level-IO that has two function pointers:
- SW points to a software implementation that won't return until either the 2D operation is complete or some error happens.
- HW points to the hardware adaptor (a.k.a. driver) of a hardware accelerator that can work asynchronously with the caller of the Arm-2D APIs. Based on the arguments passed to the HW, the capability and the status of the 2D accelerator, the hardware adaptor might return:
  - ARM_2D_ERR_NOT_SUPPORT if the hardware isn't capable of doing what is requested.
  - arm_fsm_rt_cpl if the task is done immediately (and there is no need to wait).
  - arm_fsm_rt_async if the task is done asynchronously and the driver will call the function __arm_2d_notify_sub_task_cpl() to report the result.
Arm-2D provides the default C implementation (and the Helium version when it is available) for each OPCODE.
The functions named __arm_2d_<colour>_sw_<operation name> are the default software implementations for the corresponding Low-Level-IOs. The keyword __WEAK indicates that the target IOs can be overridden with user-defined ones. For example, if you want to accelerate copy-with-opacity for RGB565 using your own hardware accelerator, do the following steps:
- In one of your C source files, override the target Low-Level-IO __ARM_2D_IO_COPY_WITH_OPACITY_RGB565.
- Copy __arm_2d_rgb565_sw_tile_copy_with_opacity() to your source file, rename it as __arm_2d_rgb565_my_hw_tile_copy_with_opacity() and use it as a template of the hardware adaptor.
- Update __arm_2d_rgb565_my_hw_tile_copy_with_opacity for the hardware accelerator. Depending on the outcome, return:
  - ARM_2D_ERR_NOT_SUPPORT if the hardware isn't capable of doing what is requested.
  - arm_fsm_rt_cpl if the task is done immediately and there is no need to wait.
  - arm_fsm_rt_async if the task is done asynchronously and the result is reported to Arm-2D later by calling the function __arm_2d_notify_sub_task_cpl().
NOTE: As the Arm-2D pipeline keeps issuing tasks to your hardware adaptor, check quickly whether the hardware is capable of doing the task:
- If it is, add the task (a __arm_2d_sub_task_t object) to a waiting list in First-In-First-Out manner and handle it later.
- If it is not, return ARM_2D_ERR_NOT_SUPPORT so that the task falls back to the SW implementation.
If you define the macro __ARM_2D_HAS_HW_ACC__ to 1, a user-defined header file, arm_2d_user_async_acc.h, is included in the compilation.
You can use this header file to
NOTE:
- The macro __ARM_2D_HAS_HW_ACC__ does NOT affect the overriding of Arm-2D intrinsics or the overriding of the default OPCODEs.
- Using arm_2d_user_async_acc.h to override the Arm-2D intrinsics and the default OPCODEs is recommended but NOT required; any other viable solution works as well (for example, the -D command line option in GCC, LLVM and Arm Compiler 6).
- Using the macro __ARM_2D_HAS_HW_ACC__ to include the header file arm_2d_user_async_acc.h is recommended but NOT required; any other viable solution works as well (for example, the -include command line option in GCC, LLVM and Arm Compiler 6).
After you set the macro __ARM_2D_HAS_HW_ACC__ to 1, arm_2d.c calls __arm_2d_async_acc_init(), which you MUST implement in your own C source file. You can initialize the acceleration hardware logic there. If there is nothing to initialize, provide a function with an empty body.
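The initialization hook and the First-In-First-Out waiting list mentioned in the note above could fit together as sketched below. The `void (void)` signature of `__arm_2d_async_acc_init()` is an assumption, and `demo_sub_task_t`, `demo_result_t`, `demo_hw_adaptor()` and the queue helpers are hypothetical stand-ins (the first one for `__arm_2d_sub_task_t`), not library types or functions.

```c
#include <stdbool.h>
#include <stddef.h>

#define DEMO_QUEUE_DEPTH 4u

typedef enum {
    DEMO_ERR_NOT_SUPPORT = -1,
    DEMO_RT_ASYNC        = 1
} demo_result_t;

/* Stand-in for a sub-task descriptor. */
typedef struct { int id; bool hw_capable; } demo_sub_task_t;

/* Fixed-size ring buffer used as the FIFO waiting list. */
static struct {
    const demo_sub_task_t *slots[DEMO_QUEUE_DEPTH];
    size_t head;   /* next slot to pop  */
    size_t tail;   /* next slot to fill */
    size_t count;
} s_waiting;

/* Assumed signature; called once by the library when HW support is on. */
void __arm_2d_async_acc_init(void)
{
    s_waiting.head = 0;
    s_waiting.tail = 0;
    s_waiting.count = 0;
}

static bool demo_waiting_push(const demo_sub_task_t *task)
{
    if (s_waiting.count == DEMO_QUEUE_DEPTH) {
        return false;                       /* list is full */
    }
    s_waiting.slots[s_waiting.tail] = task;
    s_waiting.tail = (s_waiting.tail + 1u) % DEMO_QUEUE_DEPTH;
    s_waiting.count++;
    return true;
}

/* Called by the driver (e.g. from its interrupt handler) to fetch the
 * oldest pending task; returns NULL when nothing is waiting. */
static const demo_sub_task_t *demo_waiting_pop(void)
{
    if (s_waiting.count == 0) {
        return NULL;
    }
    const demo_sub_task_t *task = s_waiting.slots[s_waiting.head];
    s_waiting.head = (s_waiting.head + 1u) % DEMO_QUEUE_DEPTH;
    s_waiting.count--;
    return task;
}

/* Quick decision only: queue the task or refuse it, never block. */
static demo_result_t demo_hw_adaptor(const demo_sub_task_t *task)
{
    if (!task->hw_capable || !demo_waiting_push(task)) {
        return DEMO_ERR_NOT_SUPPORT;        /* falls back to software */
    }
    return DEMO_RT_ASYNC;
}
```

Refusing a task when the list is full is a design choice of this sketch: it keeps the adaptor non-blocking and lets the software path absorb the overflow. In a real driver the push and pop operations would also need protection against concurrent access from the interrupt handler.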
Arm-2D APIs can be used in both Synchronous mode and Asynchronous mode. In fact, the Arm-2D library is designed to work asynchronously, and wrappers are added to support Synchronous mode.
The Synchronous mode is also known as the classic mode, in which a function call won't return until the task is finished or an error occurs.
The Asynchronous mode is good for the event-driven design paradigm. It suits most RTOS-based applications as well as bare-metal applications written in Protothread and/or FSM style.
Enable Asynchronous mode if and only if:
You can enable Asynchronous mode by setting the macro __ARM_2D_HAS_ASYNC__ to 1; the default value is 0. You can modify the macro value in arm_2d_cfg.h or define the macro __ARM_2D_HAS_ASYNC__ directly in your project, which overrides the value defined in arm_2d_cfg.h.
Many Arm Cortex-M processors support Arm Custom Instructions (ACI). When the Helium extension is available, chip designers can implement so-called Helium-based ACI, which can use 128-bit-wide vectors and Helium registers.
In the Library/Include/template folder, there is a template file for arm_2d_user_aci.h. Follow the guidance in its DISABLE DEFAULT HELIUM IMPLEMENTATION section to disable the default Helium implementation of the low-level functions.
For example, suppose you want to accelerate __arm_2d_impl_rgb565_src_msk_copy and replace the Helium version with your own ACI-accelerated one. In that case, do the following steps:
- Copy arm_2d_user_aci.h from Library/Include/template to your own directory and add the content required by its DISABLE DEFAULT HELIUM IMPLEMENTATION section for the target function.
- Enable ACI in the compiler (for example, by adding -mcpu=cortex-m55+cdecp0 to the command line). After that, the Arm-2D library sets the macro __ARM_2D_HAS_ACI__ to 1 automatically.
If you define the macro __ARM_2D_HAS_ACI__ to 1, a user-defined header file, arm_2d_user_aci.h, is included in the compilation.
You can use this header file to
After you set the macro __ARM_2D_HAS_ACI__ to 1, arm_2d.c calls __arm_2d_aci_init(), which you MUST implement in your own C source file. You can initialize the ACI logic there if required. If there is nothing to initialize, provide a function with an empty body.