This user manual describes the CMSIS DSP software library, a suite of common compute processing functions for use on Cortex-M and Cortex-A processor based devices.
The library is divided into a number of functions, each covering a specific category.
The library generally has separate functions for operating on 8-bit integers, 16-bit integers, 32-bit integers, 32-bit floating-point values and 64-bit floating-point values.
The library provides vectorized versions of most algorithms for Helium and of most f32 algorithms for Neon.
When using a vectorized version, leave a little padding (3 words) after the end of each buffer, because the vectorized code may read slightly beyond the end of a buffer. You don't have to modify your buffers; just ensure that the end of the buffer plus the padding does not fall outside a valid memory region.
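One simple way to guarantee the padding is to declare each buffer slightly larger than the length actually passed to the vectorized function. The sketch below is a minimal illustration, assuming a plain C build without arm_math.h: BLOCK_SIZE, PAD_WORDS, PAD_BYTES, the PADDED_LEN_* macros and the buffer names are all hypothetical, and it uses float, int16_t and int8_t directly because those are the underlying types of CMSIS-DSP's float32_t, q15_t and q7_t.

```c
#include <stdint.h>

/* Number of samples actually passed to the vectorized function (hypothetical). */
#define BLOCK_SIZE 64u

/* 3 words of 32 bits = 12 bytes of slack after the last useful sample. */
#define PAD_WORDS 3u
#define PAD_BYTES (PAD_WORDS * sizeof(uint32_t))

/* Padded element counts: the same 12 bytes expressed in elements of each type. */
#define PADDED_LEN_F32 (BLOCK_SIZE + PAD_BYTES / sizeof(float))   /* 64 + 3  */
#define PADDED_LEN_Q15 (BLOCK_SIZE + PAD_BYTES / sizeof(int16_t)) /* 64 + 6  */
#define PADDED_LEN_Q7  (BLOCK_SIZE + PAD_BYTES / sizeof(int8_t))  /* 64 + 12 */

/* float32_t is a float, q15_t an int16_t and q7_t an int8_t in CMSIS-DSP. */
static float   src_f32[PADDED_LEN_F32];
static int16_t src_q15[PADDED_LEN_Q15];
static int8_t  src_q7[PADDED_LEN_Q7];
```

Only BLOCK_SIZE samples would be passed to the library call; the trailing elements never hold meaningful data and only ensure that a small over-read stays inside the array. If the buffer already sits well inside a larger memory region, no extra declaration is needed.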
A Python wrapper is also available, with a Python API as close as possible to the C one. It can be used to start developing and testing an algorithm with NumPy and SciPy before writing the C version. It is available on PyPI.org and can be installed with: pip install cmsisdsp
The DSP++ extension is a set of C++ headers. They only need to be included to start using its features.
Those headers are not yet part of the pack, so you need to get them from the GitHub repository.
More documentation is available about the DSP++ extension.
The library is released in source form. It is strongly advised to compile the library with -Ofast optimization to get the best performance.
The following options should be avoided:
-fno-builtin
-ffreestanding (because it implies -fno-builtin)
The library does some type punning to process a 32-bit word from memory as a pair of q15 values or a quadruple of q7 values. Those type manipulations are done through memcpy calls. Most compilers should be able to optimize out those function calls when the length to copy is small (4 bytes).
This optimization will not occur when -fno-builtin is used, and that has a very bad impact on performance.
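The memcpy-based type punning described above can be sketched as follows. This is an illustration under stated assumptions, not the library's own code: load_q15x2, load_q7x4 and store_q15x2 are hypothetical helper names, and the q15_t/q7_t typedefs are redeclared locally so the block compiles on its own.

```c
#include <stdint.h>
#include <string.h>

/* Same underlying types as the CMSIS-DSP q15_t and q7_t typedefs. */
typedef int16_t q15_t;
typedef int8_t  q7_t;

/* Read two consecutive q15 samples as one 32-bit word.
   memcpy reinterprets the bytes without breaking strict-aliasing rules.
   When builtins are enabled, compilers typically turn this 4-byte copy
   into a single load; with -fno-builtin it stays a real function call. */
static inline int32_t load_q15x2(const q15_t *p)
{
    int32_t word;
    memcpy(&word, p, sizeof(word));
    return word;
}

/* Read four consecutive q7 samples as one 32-bit word. */
static inline int32_t load_q7x4(const q7_t *p)
{
    int32_t word;
    memcpy(&word, p, sizeof(word));
    return word;
}

/* Write one 32-bit word back as two consecutive q15 samples. */
static inline void store_q15x2(q15_t *p, int32_t word)
{
    memcpy(p, &word, sizeof(word));
}
```

Compiling such helpers with and without -fno-builtin and comparing the generated assembly is a quick way to see the cost the paragraph warns about.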
The library functions are declared in the public file Include/arm_math.h. Simply include this file to use the CMSIS-DSP library. If you don't want to include everything, you can also rely on individual header files from the Include/dsp/ folder and include only those that are needed in the project.
The library ships with a number of examples which demonstrate how to use the library functions. Please refer to Examples.
The library is now tested on Fast Models, built with CMake. Cores M0, M4, M7, M33 and M55 are tested.
CMSIS-DSP is actively maintained in the CMSIS-DSP GitHub repository and is released as a standalone CMSIS-DSP pack in the CMSIS-Pack format.
The table below explains the content of the ARM::CMSIS-DSP pack.
Directory | Description |
---|---|
📂 ComputeLibrary | Small Neon kernels when building on Cortex-A |
📂 Documentation | Folder with this CMSIS-DSP documentation |
📂 Example | Example projects demonstrating the usage of the library functions |
📂 Include | Include files for using and building the lib |
📂 PrivateInclude | Private include files for building the lib |
📂 Source | Source files |
📄 ARM.CMSIS-DSP.pdsc | CMSIS-Pack description file |
📄 LICENSE | License Agreement (Apache 2.0) |
See CMSIS Documentation for an overview of CMSIS software components, tools and specifications.
Each library project has different preprocessor macros:
ARM_MATH_BIG_ENDIAN
ARM_MATH_MATRIX_CHECK
ARM_MATH_ROUNDING
ARM_MATH_LOOPUNROLL
ARM_MATH_NEON
ARM_MATH_NEON_EXPERIMENTAL
ARM_MATH_HELIUM
ARM_MATH_HELIUM_EXPERIMENTAL
ARM_MATH_MVEF
ARM_MATH_MVEI
ARM_MATH_MVE_FLOAT16
DISABLEFLOAT16
ARM_MATH_AUTOVECTORIZE
ARM_DSP_ATTRIBUTE: Can be set to define CMSIS-DSP functions as weak functions. This can be set either on the command line when building or in a new arm_dsp_config.h header (see ARM_DSP_CUSTOM_CONFIG below).
ARM_DSP_TABLE_ATTRIBUTE: Can be set to define the section in which constant tables must be mapped. This can be set either on the command line when building or in a new arm_dsp_config.h header (see ARM_DSP_CUSTOM_CONFIG below). Another way to place those tables is to modify the linker scripts, since the constant tables are defined only in a restricted set of source files.
ARM_DSP_CUSTOM_CONFIG: When set, the file arm_dsp_config.h is included by the arm_math_types.h header. You can use this file to define any of the above compilation symbols.
Previous versions used many compilation flags to control code size, enabled with ARM_DSP_CONFIG_TABLES. This was getting too complex and has been removed. Code size optimizations now rely on the linker.
You no longer need to use compilation flags like ARM_TABLE_TWIDDLECOEF_F32_2048, ARM_FFT_ALLOW_TABLES, etc. They have been removed.
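The ARM_DSP_CUSTOM_CONFIG mechanism from the macro list can be sketched with a small arm_dsp_config.h. This is a minimal sketch under several assumptions: the library is built with ARM_DSP_CUSTOM_CONFIG defined (for example -DARM_DSP_CUSTOM_CONFIG on the compiler command line), the compiler accepts GCC/Clang __attribute__ syntax, and the section name ".dsp_tables" and the include-guard name are hypothetical. The exact form each macro should take is also an assumption; check how the library headers expand them.

```c
/* arm_dsp_config.h: included by arm_math_types.h only when
   ARM_DSP_CUSTOM_CONFIG is defined for the whole library build. */
#ifndef ARM_DSP_CONFIG_H  /* hypothetical guard name */
#define ARM_DSP_CONFIG_H

/* Make CMSIS-DSP functions weak symbols so the application can
   provide its own strong definition (GCC/Clang attribute syntax). */
#define ARM_DSP_ATTRIBUTE __attribute__((weak))

/* Place the constant tables in a dedicated section.
   ".dsp_tables" is a hypothetical name; the linker script must map it. */
#define ARM_DSP_TABLE_ATTRIBUTE __attribute__((section(".dsp_tables")))

/* Any other compilation symbol from the list can also be set here,
   for example manual loop unrolling. */
#define ARM_MATH_LOOPUNROLL

#endif /* ARM_DSP_CONFIG_H */
```

Keeping these settings in one header rather than on each build command line makes it easier to apply the same configuration to every library source file.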
Constant tables can use a lot of read-only memory, but the linker can remove unused functions and constant tables if it can deduce that those tables or functions are not used.
For this you need to use the right initialization functions in the library and the right options for the linker (they are compiler dependent).
For all transform functions (CFFT, RFFT ...), instead of using a generic initialization function that works for all lengths (like arm_cfft_init_f32), use a dedicated initialization function for a specific size (like arm_cfft_init_1024_f32).
By using the right initialization function, you're telling the linker what is really used.
If you use a generic function, the linker cannot deduce the used lengths and thus will keep all the constant tables required for each length.
Then you need to use the right compiler and linker options so that unused tables and functions are removed. This is compiler dependent, but the options are generally named like -ffunction-sections, -fdata-sections, --gc-sections ...
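Why a dedicated initialization function helps the linker can be shown with a self-contained mock. Everything below is hypothetical and only mirrors the pattern described above: fft_cfg, fft_init_generic, fft_init_1024 and the twiddle_* tables are illustrative names, not CMSIS-DSP symbols, and the tables are zero-filled placeholders.

```c
#include <stddef.h>
#include <stdint.h>

/* Placeholder twiddle tables standing in for the library's constant tables.
   With -fdata-sections each table gets its own section, so the linker
   (--gc-sections) can drop any table that nothing references. */
static const float twiddle_256[2 * 256] = {0};
static const float twiddle_1024[2 * 1024] = {0};

/* Hypothetical stand-in for an FFT instance structure. */
typedef struct {
    uint16_t     len;
    const float *twiddle;
} fft_cfg;

/* Generic init: the length is only known at run time, so this function
   references every table and the linker must keep all of them. */
int fft_init_generic(fft_cfg *cfg, uint16_t len)
{
    switch (len) {
    case 256:  cfg->twiddle = twiddle_256;  break;
    case 1024: cfg->twiddle = twiddle_1024; break;
    default:   return -1; /* unsupported length */
    }
    cfg->len = len;
    return 0;
}

/* Dedicated init: only the 1024-point table is referenced. If the
   application never calls fft_init_generic, that function and
   twiddle_256 become unreferenced sections the linker can discard. */
int fft_init_1024(fft_cfg *cfg)
{
    cfg->len = 1024;
    cfg->twiddle = twiddle_1024;
    return 0;
}
```

With a GNU toolchain this corresponds to building with -ffunction-sections -fdata-sections and linking with -Wl,--gc-sections; other toolchains use their own equivalents. In CMSIS-DSP terms, the same choice is calling arm_cfft_init_1024_f32 instead of arm_cfft_init_f32.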
Some algorithms may give slightly different results on different architectures (like M0, M4/M7 or M55). This is a tradeoff made for speed and to make the best use of the different instruction sets.
All algorithms are compared with a double-precision reference, and the different versions (for different architectures) have the same characteristics when compared to that reference (SNR bound, max bound for sample error ...).
As a consequence, the small differences that may exist between the implementations for different architectures should be too small to have any practical consequence.
The CMSIS-DSP is provided free of charge under the Apache 2.0 License.