Release versions
All releases are numbered vYY.MM, where YY is the last two digits of the year and MM is the month number. If there is more than one release in a month, an extra sequential number is appended at the end:
v17.03 (First release of March 2017)
v17.03.1 (Second release of March 2017)
v17.04 (First release of April 2017)
- Note
- We aim to publish one major public release with new features per quarter. All releases in between contain only bug fixes.
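The numbering scheme described above can be sketched as a small parser. This is only an illustration: `ReleaseVersion` and `parse_release_version` are hypothetical names that are not part of the library, and the sketch assumes that the two-digit year always refers to 20YY.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper type for illustration; not part of the library.
struct ReleaseVersion
{
    int year;     // Full year, assuming 20YY (e.g. 2017 for "v17.03")
    int month;    // Month number, 1..12
    int sequence; // 0 for the first release of the month, 1 for "vYY.MM.1", ...
};

// Parses tags of the form "vYY.MM" or "vYY.MM.N".
// Returns false and leaves 'out' untouched if the tag does not match.
bool parse_release_version(const std::string &tag, ReleaseVersion &out)
{
    auto is_digit = [](char c) { return c >= '0' && c <= '9'; };

    // Fixed prefix layout: 'v' Y Y '.' M M
    if(tag.size() < 6 || tag[0] != 'v' || tag[3] != '.')
    {
        return false;
    }
    if(!is_digit(tag[1]) || !is_digit(tag[2]) || !is_digit(tag[4]) || !is_digit(tag[5]))
    {
        return false;
    }

    ReleaseVersion v{};
    v.year     = 2000 + (tag[1] - '0') * 10 + (tag[2] - '0');
    v.month    = (tag[4] - '0') * 10 + (tag[5] - '0');
    v.sequence = 0;
    if(v.month < 1 || v.month > 12)
    {
        return false;
    }

    // Optional ".N" suffix for additional releases in the same month
    if(tag.size() > 6)
    {
        if(tag[6] != '.' || tag.size() == 7)
        {
            return false;
        }
        for(std::size_t i = 7; i < tag.size(); ++i)
        {
            if(!is_digit(tag[i]))
            {
                return false;
            }
            v.sequence = v.sequence * 10 + (tag[i] - '0');
        }
    }

    out = v;
    return true;
}
```

With this sketch, "v17.03" yields year 2017, month 3 and sequence 0, while "v17.03.1" yields the same year and month with sequence 1.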
Changelog
v21.05 Public major release
- Various bug fixes.
- Various optimisations.
- Various documentation updates:
- Add supported operators and corresponding Android NNAPI operators.
- Documentation reorg into user guide and contributor guide.
- Add support for a global allocator for OpenCL tensors
- Add experimental support for CLVK.
- Add data type S32 support for:
- Add data type QASYMM8 support for:
- Add per-channel quantization support for:
- Remove padding from OpenCL kernels:
- Remove computer vision support from Arm® Neon™ backend
- Remove the following functions:
- NEAbsoluteDifference
- NEAccumulate
- NEBox3x3
- NECannyEdge
- NEChannelCombine
- NEChannelExtract
- NEColorConvert
- NEConvolution
- NEDerivative
- NEDilate
- NEEqualizeHistogram
- NEErode
- NEFastCorners
- NEGaussian3x3
- NEGaussian5x5
- NEGaussianPyramid
- NEHOGDescriptor
- NEHOGDetector
- NEHOGGradient
- NEHOGMultiDetection
- NEHarrisCorners
- NEHistogram
- NEIntegralImage
- NELaplacianPyramid
- NELaplacianReconstruct
- NEMagnitude
- NEMeanStdDev
- NEMedian3x3
- NEMinMaxLocation
- NENonLinearFilter
- NEOpticalFlow
- NEPhase
- NEScharr3x3
- NESobel3x3
- NESobel5x5
- NESobel7x7
- NETableLookup
- NEThreshold
- NEWarpAffine
- NEWarpPerspectiveKernel
- Remove all GLES kernels / functions / tests / examples
- Remove computer vision support from CL backend
- Remove the following functions:
- CLAbsoluteDifference
- CLAccumulate
- CLBox3x3
- CLCannyEdge
- CLChannelCombine
- CLChannelExtract
- CLColorConvert
- CLConvolution
- CLDerivative
- CLDilate
- CLEqualizeHistogram
- CLErode
- CLFastCorners
- CLGaussian3x3
- CLGaussian5x5
- CLGaussianPyramid
- CLHOGDescriptor
- CLHOGDetector
- CLHOGGradient
- CLHOGMultiDetection
- CLHarrisCorners
- CLHistogram
- CLIntegralImage
- CLLaplacianPyramid
- CLLaplacianReconstruct
- CLMagnitude
- CLMeanStdDev
- CLMedian3x3
- CLMinMaxLocation
- CLNonLinearFilter
- CLOpticalFlow
- CLPhase
- CLScharr3x3
- CLSobel3x3
- CLSobel5x5
- CLSobel7x7
- CLTableLookup
- CLThreshold
- CLWarpAffine
- CLWarpPerspective
v21.02 Public major release
- Various bug fixes.
- Various optimisations.
- Upgrade C++ standard to C++14
- Add macOS support
- Add Armv8-R AArch64 architecture support
- Add SVE/SVE2 support for:
- Remove padding from OpenCL kernels:
- Deprecate functions in CLTuner:
- add_lws_to_table
- import_lws_table
- lws_table
- Remove functions:
- NELocallyConnectedLayer / CLLocallyConnectedLayer
- NEIm2Col
- NECol2Im
- NEGEMMInterleave4x4
- NEGEMMTranspose1xW
- NEComputeAllAnchors / CLComputeAllAnchors
- NEGEMMAssemblyDispatch
- NEUpsampleLayer / CLUpsampleLayer
- Remove kernels:
- NEGEMMMatrixVectorMultiplyKernel
- NELocallyConnectedMatrixMultiplyKernel / CLLocallyConnectedMatrixMultiplyKernel
- NEUpsampleLayerKernel / CLUpsampleLayerKernel
- Extend OpenCL tuner with workgroup batch size support
- Experimental extension for the OpenCL tuner to tune the batches of work groups distributed to compute units
- Add functionality to load the OpenCL GEMM heuristics at runtime
- The GEMM heuristic file (MLGO) can be used to update the default GEMM heuristics available for OpenCL
- Note: there might be performance regressions against v20.08 in Inception v3 using int8 data types on Arm Mali-G77 GPUs. Currently under investigation
- Note: data-type decoupling is in progress and experimental. Warnings about unused symbols might be raised
v20.11 Public major release
- Various bug fixes.
- Various optimisations.
- Performance regressions may be observed when executing Depthwise Convolution on Arm® Neon™ with a depth multiplier > 1 for quantized data types. This is planned to be resolved in the v21.02 release.
- Added new data type QASYMM8_SIGNED support for NEROIAlignLayer.
- Added new data type S32 support for:
- Interface change
- Properly support softmax axis to have the same meaning as other major frameworks. That is, axis now defines the dimension on which Softmax/Logsoftmax is performed. E.g. for input of shape 4x5x6 and axis=1, softmax will be applied to 4x6=24 vectors of size 5. The supported value range of axis is [-rank, rank). This change applies to the following functions:
- New OpenCL kernels / functions:
- New Arm® Neon™ kernels / functions:
- Removed padding from Arm® Neon™ kernels:
- Removed padding from OpenCL kernels:
- Removed OpenCL kernels / functions:
- CLGEMMLowpQuantizeDownInt32ToInt16ScaleByFixedPointKernel
- CLGEMMLowpQuantizeDownInt32ToInt8ScaleByFixedPointKernel
- CLGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPointKernel
- Deprecated OpenCL kernels / functions (If a kernel is used only by the function that is being deprecated, the kernel is deprecated together):
- CLLocallyConnectedLayer
- CLLocallyConnectedMatrixMultiplyKernel
- CLAbsoluteDifference
- CLAbsoluteDifferenceKernel
- CLAccumulate
- CLAccumulateKernel
- CLAccumulateSquared
- CLAccumulateSquaredKernel
- CLAccumulateWeighted
- CLAccumulateWeightedKernel
- CLAccumulateWeightedFP16Kernel
- CLBox3x3
- CLBox3x3Kernel
- CLBox3x3FP16Kernel
- CLCannyEdge
- CLChannelCombine
- CLChannelCombineKernel
- CLChannelExtract
- CLChannelExtractKernel
- CLColorConvert
- CLColorConvertKernel
- CLConvolution3x3
- CLConvolutionRectangle
- CLConvolutionRectangleKernel
- CLConvolutionSquare
- CLConvolutionKernel
- CLDerivative
- CLDerivativeKernel
- CLDilate
- CLDilateKernel
- CLEqualizeHistogram
- CLErode
- CLErodeKernel
- CLFastCorners
- CLFastCornersKernel
- CLGaussian3x3
- CLGaussian3x3Kernel
- CLGaussian5x5
- CLGaussian5x5HorKernel
- CLGaussian5x5VertKernel
- CLGaussianPyramid
- CLGaussianPyramidHalf
- CLGaussianPyramidOrb
- CLHarrisCorners
- CLHarrisScoreKernel
- CLHarrisScoreFP16Kernel
- CLHistogram
- CLHistogramKernel
- CLHOGOrientationBinningKernel
- CLHOGBlockNormalizationKernel
- CLHOGDetectorKernel
- CLHOGNonMaximaSuppressionKernel
- CLHOGDescriptor
- CLHOGDetector
- CLHOGGradient
- CLHOGMultiDetection
- CLIntegralImage
- CLIntegralImageKernel
- CLLaplacianReconstruct
- CLLaplacianPyramid
- CLMagnitude
- CLMagnitudePhaseKernel
- CLMedian3x3
- CLMedian3x3Kernel
- CLMinMaxLocation
- CLMinMaxLocationKernel
- CLNonLinearFilter
- CLNonLinearFilterKernel
- CLNonMaximaSuppression3x3
- CLNonMaximaSuppression3x3FP16Kernel
- CLNonMaximaSuppression3x3Kernel
- CLOpticalFlow
- CLPhase
- CLRemap
- CLRemapKernel
- CLScharr3x3
- CLScharr3x3Kernel
- CLSobel3x3
- CLSobel3x3Kernel
- CLSobel5x5
- CLSobel5x5HorKernel
- CLSobel5x5VertKernel
- CLSobel7x7
- CLSobel7x7HorKernel
- CLSobel7x7VertKernel
- CLThreshold
- CLThresholdKernel
- CLWarpAffine
- CLWarpAffineKernel
- CLWarpPerspective
- CLWarpPerspectiveKernel
- Deprecated Arm® Neon™ kernels / functions (If a kernel is used only by the function that is being deprecated, the kernel is deprecated together):
- NELocallyConnectedLayer
- NELocallyConnectedMatrixMultiplyKernel
- NEAbsoluteDifference
- NEAbsoluteDifferenceKernel
- NEAccumulate
- NEAccumulateKernel
- NEAccumulateSquared
- NEAccumulateSquaredKernel
- NEAccumulateWeighted
- NEAccumulateWeightedKernel
- NEAccumulateWeightedFP16Kernel
- NEBox3x3
- NEBox3x3Kernel
- NEBox3x3FP16Kernel
- NECannyEdge
- NEChannelCombine
- NEChannelCombineKernel
- NEChannelExtract
- NEChannelExtractKernel
- NEColorConvert
- NEColorConvertKernel
- NEConvolution3x3
- NEConvolutionRectangle
- NEConvolutionRectangleKernel
- NEConvolutionSquare
- NEConvolutionKernel
- NEDerivative
- NEDerivativeKernel
- NEDilate
- NEDilateKernel
- NEEqualizeHistogram
- NEErode
- NEErodeKernel
- NEFastCorners
- NEFastCornersKernel
- NEGaussian3x3
- NEGaussian3x3Kernel
- NEGaussian5x5
- NEGaussian5x5HorKernel
- NEGaussian5x5VertKernel
- NEGaussianPyramid
- NEGaussianPyramidHalf
- NEGaussianPyramidOrb
- NEHarrisCorners
- NEHarrisScoreKernel
- NEHarrisScoreFP16Kernel
- NEHistogram
- NEHistogramKernel
- NEHOGOrientationBinningKernel
- NEHOGBlockNormalizationKernel
- NEHOGDetectorKernel
- NEHOGNonMaximaSuppressionKernel
- NEHOGDescriptor
- NEHOGDetector
- NEHOGGradient
- NEHOGMultiDetection
- NEIntegralImage
- NEIntegralImageKernel
- NELaplacianReconstruct
- NELaplacianPyramid
- NEMagnitude
- NEMagnitudePhaseKernel
- NEMedian3x3
- NEMedian3x3Kernel
- NEMinMaxLocation
- NEMinMaxLocationKernel
- NENonLinearFilter
- NENonLinearFilterKernel
- NENonMaximaSuppression3x3
- NENonMaximaSuppression3x3FP16Kernel
- NENonMaximaSuppression3x3Kernel
- NEOpticalFlow
- NEPhase
- NERemap
- NERemapKernel
- NEScharr3x3
- NEScharr3x3Kernel
- NESobel3x3
- NESobel3x3Kernel
- NESobel5x5
- NESobel5x5HorKernel
- NESobel5x5VertKernel
- NESobel7x7
- NESobel7x7HorKernel
- NESobel7x7VertKernel
- NEThreshold
- NEThresholdKernel
- NEWarpAffine
- NEWarpAffineKernel
- NEWarpPerspective
- NEWarpPerspectiveKernel
- Deprecated GLES kernels / functions (If a kernel is used only by the function that is being deprecated, the kernel is deprecated together):
- GCAbsoluteDifference
- GCActivationLayer
- GCArithmeticAddition
- GCBatchNormalizationLayer
- GCConcatenateLayer
- GCConvolutionLayer
- GCDepthwiseConvolutionLayer
- GCDirectConvolutionLayer
- GCDropoutLayer
- GCFillBorder
- GCFullyConnectedLayer
- GCGEMM
- GCGEMMInterleave4x4
- GCGEMMTranspose1xW
- GCNormalizationLayer
- GCNormalizePlanarYUVLayer
- GCPixelWiseMultiplication
- GCPoolingLayer
- GCScale
- GCSoftmaxLayer
- GCTensorShift
- GCTranspose
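The softmax axis semantics described in the v20.11 interface change above can be illustrated with a small sketch. It treats the shape as a plain list of dimensions and makes no claim about the library's internal dimension ordering. `SoftmaxLayout` and `softmax_layout` are hypothetical names, and wrapping a negative axis by adding the rank is an assumption based on the common framework convention, since the note only states that the supported range is [-rank, rank).

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper type for illustration; not part of the library.
struct SoftmaxLayout
{
    std::size_t num_vectors;   // Number of independent softmax computations
    std::size_t vector_length; // Number of elements normalised together
};

// Computes how many vectors softmax is applied to, and their length,
// for a given shape and axis. Returns false if axis is outside [-rank, rank).
bool softmax_layout(const std::vector<std::size_t> &shape, int axis, SoftmaxLayout &out)
{
    const int rank = static_cast<int>(shape.size());
    if(rank == 0 || axis < -rank || axis >= rank)
    {
        return false;
    }

    // Assumption: a negative axis counts from the last dimension
    const int wrapped = (axis < 0) ? axis + rank : axis;

    // Every combination of the remaining dimensions selects one vector
    std::size_t others = 1;
    for(int i = 0; i < rank; ++i)
    {
        if(i != wrapped)
        {
            others *= shape[static_cast<std::size_t>(i)];
        }
    }

    out.num_vectors   = others;
    out.vector_length = shape[static_cast<std::size_t>(wrapped)];
    return true;
}
```

For the example from the note, a shape of 4x5x6 with axis=1 gives 24 vectors of size 5. Under the wrapping assumption, axis=-2 selects the same dimension.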
v20.08 Public major release
- Various bug fixes.
- Various optimisations.
- Added new data type QASYMM8_SIGNED support for:
- Added new data type U8 support for:
- Added align_corner support for nearest neighbor interpolation in:
- NEScaleKernel
- CLScaleKernel
- New OpenCL kernels / functions:
- New Arm® Neon™ kernels / functions:
- New graph example:
- graph_yolov3_output_detector
- GEMMTuner improvements:
- Added fp16 support
- Output json files for easier integration
- Enabled tuning for export_to_cl_image_rhs option for RHS tensors
- More robust script for running benchmarks
- Removed padding from:
- Removed OpenCL kernels / functions:
- CLGEMMLowpQuantizeDownInt32ToUint8Scale
- CLGEMMLowpQuantizeDownInt32ToUint8ScaleByFloat
- Removed Arm® Neon™ kernels / functions:
- NEGEMMLowpQuantizeDownInt32ToUint8Scale
- NEGEMMMatrixAccumulateBiasesKernel
- Deprecated functions / interfaces:
- The support for quantized data types has been removed from CLLogSoftmaxLayer due to implementation complexity.
- Removed padding requirement for the input (e.g. LHS of GEMM) and output in CLGEMMMatrixMultiplyNativeKernel, CLGEMMMatrixMultiplyReshapedKernel, CLGEMMMatrixMultiplyReshapedOnlyRHSKernel and CLIm2ColKernel (NHWC only)
- Added support for exporting the OpenCL buffer object to the OpenCL image object in CLGEMMMatrixMultiplyReshapedKernel and CLGEMMMatrixMultiplyReshapedOnlyRHSKernel.
- This allows the OpenCL buffer used for the reshaped RHS matrix to be exported to the OpenCL image object.
- The padding requirement for the OpenCL image object is taken into account in CLGEMMReshapeRHSMatrixKernel.
- The reshaped RHS matrix stores the weights when GEMM is used to accelerate CLGEMMConvolutionLayer.
v20.05 Public major release
- Various bug fixes.
- Various optimisations.
- Updated recommended NDK version to r18b.
- Updated recommended gcc version to Linaro 6.3.1.
- Added Bfloat16 type support
- Added Bfloat16 support in:
- Added new data type QASYMM8_SIGNED support for:
- New OpenCL kernels / functions:
- New Arm® Neon™ kernels / functions:
- Added HARD_SWISH support in:
- CLActivationLayerKernel
- NEActivationLayerKernel
- Deprecated OpenCL kernels / functions:
- CLGEMMLowpQuantizeDownInt32ToUint8Scale
- CLGEMMLowpQuantizeDownInt32ToUint8ScaleByFloat
- Deprecated Arm® Neon™ kernels / functions:
- NEGEMMLowpQuantizeDownInt32ToUint8Scale
- Removed CPP kernels / functions:
- Removed PoolingLayerInfo constructors without Data Layout.
- Removed CLDepthwiseConvolutionLayer3x3
- Removed NEDepthwiseConvolutionLayerOptimized
- Added support for Winograd 3x3,4x4 on Arm® Neon™ FP16:
- Added CLCompileContext
- Added Arm® Neon™ GEMM kernel with 2D window support
v20.02.1 Maintenance release
- Added Android-NN build script.
v20.02 Public major release
- Various bug fixes.
- Various optimisations.
- Added new data type QASYMM8_SIGNED support for:
- Added support for QSYMM8_PER_CHANNEL in:
- NEDepthwiseConvolutionLayer3x3Kernel
- Added support for split sizes in:
- New OpenCL kernels / functions:
- New Arm® Neon™ kernels / functions:
- Deprecated Arm® Neon™ functions / interfaces:
- CLDepthwiseConvolutionLayer3x3
- NEDepthwiseConvolutionLayerOptimized
- PoolingLayerInfo constructors without Data Layout.
- Added support for quantization with multiplier greater than 1 on Arm® Neon™ and CL.
- Added support for quantized inputs of type QASYMM8_SIGNED and QASYMM8 to CLQuantizationLayer.
- Added the ability to build bootcode for bare metal.
- Added support for generating synthetic QASYMM8 graphs.
- Added support for F16 datatype in VGG16.
- Removed pre-built binaries for GLES.
v19.11.1 Public maintenance release
- Fix offset calculation in NEReductionOperationKernel.
- Fix data layout in NEScaleKernel for nhwc.
- Retain configuration step data layout to avoid side-effects.
- Perform sqrt in double domain for L2 pooling.
- Fix output shape calculation for Reduce Mean
- Restrict cases where optimized NEPadLayer runs.
v19.11 Public major release
- Various bug fixes.
- Various optimisations.
- Updated recommended NDK version to r17c.
- Deprecated OpenCL kernels / functions:
- CLDepthwiseConvolutionLayerReshapeWeightsGenericKernel
- CLDepthwiseIm2ColKernel
- CLDepthwiseSeparableConvolutionLayer
- CLDepthwiseVectorToTensorKernel
- CLDirectConvolutionLayerOutputStageKernel
- Deprecated Arm® Neon™ kernels / functions:
- NEDepthwiseWeightsReshapeKernel
- NEDepthwiseIm2ColKernel
- NEDepthwiseSeparableConvolutionLayer
- NEDepthwiseVectorToTensorKernel
- NEDepthwiseConvolutionLayer3x3
- New OpenCL kernels / functions:
- New Arm® Neon™ kernels / functions:
- Added QASYMM8 support for:
- Added QASYMM16 support for:
- Added FP16 support for:
- Added new data type QASYMM8_PER_CHANNEL support for:
- Added new data type QSYMM8_PER_CHANNEL support for:
- Added FP16 mixed-precision support for:
- Added FP32 and FP16 ELU activation for:
- Added asymmetric padding support for:
- Added SYMMETRIC and REFLECT modes for CLPadLayerKernel / CLPadLayer.
- Replaced the calls to NECopyKernel and NEMemsetKernel with NEPadLayer in NEGenerateProposalsLayer.
- Replaced the calls to CLCopyKernel and CLMemsetKernel with CLPadLayer in CLGenerateProposalsLayer.
- Improved performance for CL Inception V3 - FP16.
- Improved accuracy for CL Inception V3 - FP16 by enabling FP32 accumulator (mixed-precision).
- Improved Arm® Neon™ performance by enabling fusing batch normalization with convolution and depth-wise convolution layer.
- Improved Arm® Neon™ performance for MobileNet-SSD by improving the output detection performance.
- Optimized CLPadLayer.
- Optimized CL generic depthwise convolution layer by introducing CLDepthwiseConvolutionLayerNativeKernel.
- Reduced memory consumption by implementing weights sharing.
v19.08.1 Public maintenance release
- Fix offset calculation in NEReductionOperationKernel.
- Fix data layout in NEScaleKernel for nhwc.
- Retain configuration step data layout to avoid side-effects.
- Perform sqrt in double domain for L2 pooling.
- Fix output shape calculation for Reduce Mean
- Fix broadcast CLPixelwiseMultiplication with 5D tensors
v19.08 Public major release
- Various bug fixes.
- Various optimisations.
- Deprecated Arm® Neon™ functions
- NEDepthConcatenateLayer
- NEWidthConcatenateLayer
- Deprecated OpenCL kernels / functions
- CLDepthConcatenateLayer
- CLGEMMInterleave4x4Kernel / CLGEMMInterleave4x4
- CLGEMMTranspose1xWKernel / CLGEMMTranspose1xW
- CLWidthConcatenateLayer
- New Arm® Neon™ kernels / functions:
- New OpenCL kernels / functions:
- New examples:
- neon_opticalflow
- cl_cache
- neon_permute
- Added support for FP16 in NEDeconvolutionLayer
- Added support for FP16 in CLDeconvolutionLayer
- Added support for REDUCE_MIN and REDUCE_MAX in ReductionOperation
- Enable the fusion of batch normalization with convolution and depthwise convolution layer for FP32 in the graph API (OpenCL only)
- Added support for fusing activation function and broadcast addition with the matrix multiplication for FP32 (OpenCL only)
- Re-factored the depthwise convolution layer kernel on Arm® Neon™ for generic cases
- Added an optimized depthwise convolution layer kernel for 5x5 filters (Neon only)
- Added support to enable OpenCL kernel cache. Added example showing how to load the prebuilt OpenCL kernels from a binary cache file
- Altered QuantizationInfo interface to support per-channel quantization.
- The CLDepthwiseConvolutionLayer3x3 will be included by CLDepthwiseConvolutionLayer to accommodate for future optimizations.
- The NEDepthwiseConvolutionLayerOptimized will be included by NEDepthwiseConvolutionLayer to accommodate for future optimizations.
- Removed inner_border_right and inner_border_top parameters from CLDeconvolutionLayer interface
- Removed inner_border_right and inner_border_top parameters from NEDeconvolutionLayer interface
- Optimized the Arm® Neon™ assembly kernel for GEMMLowp. The new implementation fuses the output stage and quantization with the matrix multiplication kernel
v19.05 Public major release
- Various bug fixes.
- Various optimisations.
- New Arm® Neon™ kernels / functions:
- New OpenCL kernels / functions:
- New OpenGLES kernels / functions:
- Deprecated functions/interfaces
- GCDepthConcatenateLayer
- NEWidthConcatenateLayer
- NEDepthConcatenateLayer
- CLWidthConcatenateLayer
- CLDepthConcatenateLayer
- CLGEMMInterleave4x4
- CLGEMMTranspose1xW
- Support different quantization info in CLConcatLayer.
- Add checks for cases where different input/output quantization info is not supported.
- Tensors have different quantization information.
- Add FP16 support checks.
- Fix output quantization in CLDepthwiseConv3x3 when activation is fused.
- New graph examples:
- graph_convolution
- graph_fully_connected
- graph_depthwise_convolution
- Deepspeech v0.4.1
- Add support for QASYMM8 in NEArithmeticSubtractionKernel.
- Add support for QASYMM8 in NEPixelWiseMultiplicationKernel.
- Add support for QASYMM8 NEDeconvolution.
- Add support for DequantizationLayer for Neon/CL.
- Add support for dilation in CLDepthwiseConvolution.
- Fuse offset contribution with the output stage when we use NEGEMMLowpMatrixMultiplyCore.
- Optimize CLDeconvolution.
- Add StackLayer to the graph API.
- Add support for "reflect" padding mode in NEPad.
- Winograd 7x7 NHWC on OpenCL.
- Rework CL ML layers to run exclusively on CL.
- Support different quantization info in PoolingLayer.
- Implement and test import memory interfaces.
- Added new tests and removed old ones.
- Various clang-tidy fixes.
v19.02 Public major release
- Various bug fixes.
- Various optimisations.
- New Arm® Neon™ kernels / functions:
- New OpenCL kernels / functions:
- New CPP kernels / functions:
- Added new examples:
- Add 4D tensors support to
- Fused activation in CLWinogradConvolutionLayer
- Extended NEPermute to support more cases
- Added Neon/SVE GEMM Hybrid kernels
- Added u8 and s8 hybrid assembly kernels
- Introduced GEMM strategy name in NEGEMMAssemblyWrapper
- Improved CLTuner
- Fused the bias addition within CLGEMM
- Added support for QASYMM8 LOGISTIC activation in NEActivationLayer
- Added NHWC data layout support to:
- Added QASYMM8 support to the following kernels:
- Added new tests and improved validation and benchmarking suites.
- Deprecated functions/interfaces
v18.11 Public major release
- Various bug fixes.
- Various optimisations.
- New Arm® Neon™ kernels / functions:
- New OpenCL kernels / functions:
- New CPP kernels / functions:
- Added the validate method in:
- Added new examples:
- Added documentation for adding a new function or kernel.
- Improved doxygen documentation adding a list of the existing functions.
- Add 4D tensors support to
- Add dot product support for CLDepthwiseConvolutionLayer3x3NHWCKernel non-unit stride
- Add SVE support
- Fused batch normalization into convolution layer weights in CLFuseBatchNormalization
- Fuses activation in CLDepthwiseConvolutionLayer3x3NCHWKernel, CLDepthwiseConvolutionLayer3x3NHWCKernel and NEGEMMConvolutionLayer
- Added NHWC data layout support to:
- Added QASYMM8 support to the following kernels:
- CLScaleKernel
- NEDepthwiseConvolutionLayer3x3Kernel
- CLPixelWiseMultiplicationKernel
- Added FP16 support to the following kernels:
- More tests added to both validation and benchmarking suites.
v18.08 Public major release
- Various bug fixes.
- Various optimisations.
- Updated recommended NDK version to r17b.
- Removed support for QS8/QS16 data types.
- Added support for grouped convolution in CLConvolutionLayer.
- Added NHWC data layout support to:
- New Arm® Neon™ kernels / functions:
- New OpenCL kernels / functions:
- Introduced prepare() stage support in the graph API for GLES.
- Added support for memory reuse when trying to allocate smaller CLTensors.
- Enabled NHWC execution on graph examples.
- Added JPEG accessor for validation purposes.
- Added validate methods to some kernels / functions.
v18.05 Public major release
- Various bug fixes.
- Various optimisations.
- Major redesign in the interface for the neon kernels implemented in assembly.
- Removed arm_compute::NEGEMMLowpAArch64A53Kernel / arm_compute::NEGEMMLowpAArch64Kernel / arm_compute::NEGEMMLowpAArch64V8P4Kernel / arm_compute::NEGEMMInterleavedBlockedKernel / arm_compute::NEGEMMLowpAssemblyMatrixMultiplyCore / arm_compute::NEHGEMMAArch64FP16Kernel
- Added NEGEMMAssemblyWrapper and AssemblyKernelGlue which are used to execute assembly kernels in neon functions.
- Minor changes to the CPUInfo type to make it compatible with the new assembly gemm interface.
- Moved neon assembly kernels to the folder src/core/Neon/kernels/arm_gemm.
- Improved doxygen documentation.
- Improved memory management for layer's transitions.
- Added support for NHWC data layout in tensors.
- Added NHWC data layout support to:
- Added support for dilated convolutions in NEConvolutionLayer and CLConvolutionLayer.
- New OpenCL kernels / functions:
- New Arm® Neon™ kernels / functions:
- Created the validate method in CLDepthwiseConvolutionLayer.
- Beta and gamma are no longer mandatory arguments in NEBatchNormalizationLayer and CLBatchNormalizationLayer.
- Added depth multiplier support in NEDepthwiseConvolutionLayer and CLDepthwiseConvolutionLayer.
- Added broadcast multiply support in NEPixelWiseMultiplication / NEPixelWiseMultiplicationKernel.
- Port mobilenet example to NHWC data layout.
- Enabled Winograd method in CLConvolutionLayer.
- Renamed NEWinogradLayer to NEWinogradConvolutionLayer.
- Updated NEWinogradConvolutionLayer to use highly optimised assembly kernels in src/core/Neon/kernels/arm_gemm.
- Added memory manager support in GLES functions.
- Major refactoring of the graph API.
- Added GLES backend in the graph API.
- Added support for the memory manager in the graph API.
- Enabled Winograd Convolution method in the graph API.
- Added support for grouped convolutions in the graph API.
- Replaced NEDeconvolutionLayerUpsampleKernel with NEScaleKernel in NEDeconvolutionLayer.
- Added fast maths flag in CLConvolutionLayer.
- Added new tests and benchmarks in validation and benchmark frameworks
- Merge Activation layer with Convolution Layer (Neon, CL, GLES)
- Added support for OpenCL 2.0 SVM
- Added support for importing memory into OpenCL tensors.
- Added the prepare() method to perform any one-off pre-processing before running the function.
- Added new examples:
- Added memory measurement instrument for CL.
v18.03 Public maintenance release
- Various bug fixes.
- Fixed bug in NEActivationLayer
- Fix in CLTuner when using batches.
- Updated recommended NDK version to r16b (And fixed warnings).
- Fixed bug in validation code.
- Added Inception v4 graph example.
- Renamed NEWinogradLayer.cpp to NEWinogradConvolutionLayer
v18.02 Public major release
v18.01 Public maintenance release
- Various bug fixes
- Added some of the missing validate() methods
- Added CLDeconvolutionLayerUpsampleKernel / CLDeconvolutionLayer CLDeconvolutionLayerUpsample
- Added CLPermuteKernel / CLPermute
- Added method to clean the programs cache in the CL Kernel library.
- Added GCArithmeticAdditionKernel / GCArithmeticAddition
- Added GCDepthwiseConvolutionLayer3x3Kernel / GCDepthwiseConvolutionLayer3x3
- Added GCNormalizePlanarYUVLayerKernel / GCNormalizePlanarYUVLayer
- Added GCScaleKernel / GCScale
- Added GCWeightsReshapeKernel / GCConvolutionLayer
- Added FP16 support to the following GLES compute kernels:
- GCCol2ImKernel
- GCGEMMInterleave4x4Kernel
- GCGEMMTranspose1xWKernel
- GCIm2ColKernel
- Refactored Arm® Neon™ Winograd (NEWinogradLayerKernel)
- Added NEDirectConvolutionLayerOutputStageKernel
- Added QASYMM8 support to the following Arm® Neon™ kernels:
- Added new examples:
- More tests added to both validation and benchmarking suites.
v17.12 Public major release
- Most machine learning functions on OpenCL support the new data type QASYMM8
- Introduced logging interface
- Introduced opencl timer
- Reworked GEMMLowp interface
- Added new Arm® Neon™ assembly kernels for GEMMLowp, SGEMM and HGEMM
- Added validation method for most Machine Learning kernels / functions
- Added new graph examples such as googlenet, mobilenet, squeezenet, vgg16 and vgg19
- Added sgemm example for OpenCL
- Added absolute difference example for GLES compute
- Added new tests and benchmarks in validation and benchmark frameworks
- Added new kernels / functions for GLES compute
- New OpenGL ES kernels / functions
- GCAbsoluteDifferenceKernel / GCAbsoluteDifference
- GCActivationLayerKernel / GCActivationLayer
- GCBatchNormalizationLayerKernel / GCBatchNormalizationLayer
- GCCol2ImKernel
- GCDepthConcatenateLayerKernel / GCDepthConcatenateLayer
- GCDirectConvolutionLayerKernel / GCDirectConvolutionLayer
- GCDropoutLayerKernel / GCDropoutLayer
- GCFillBorderKernel / GCFillBorder
- GCGEMMInterleave4x4Kernel / GCGEMMInterleave4x4
- GCGEMMMatrixAccumulateBiasesKernel / GCGEMMMatrixAdditionKernel / GCGEMMMatrixMultiplyKernel / GCGEMM
- GCGEMMTranspose1xWKernel / GCGEMMTranspose1xW
- GCIm2ColKernel
- GCNormalizationLayerKernel / GCNormalizationLayer
- GCPixelWiseMultiplicationKernel / GCPixelWiseMultiplication
- GCPoolingLayerKernel / GCPoolingLayer
- GCLogits1DMaxKernel / GCLogits1DShiftExpSumKernel / GCLogits1DNormKernel / GCSoftmaxLayer
- GCTransposeKernel / GCTranspose
- New Arm® Neon™ kernels / functions
- New OpenCL kernels / functions
- New graph nodes for Arm® Neon™ and OpenCL
- graph::BranchLayer
- graph::DepthConvertLayer
- graph::DepthwiseConvolutionLayer
- graph::DequantizationLayer
- graph::FlattenLayer
- graph::QuantizationLayer
- graph::ReshapeLayer
v17.10 Public maintenance release
- Bug fixes:
- Check the maximum local workgroup size supported by OpenCL devices
- Minor documentation updates (Fixed instructions to build the examples)
- Introduced a graph::GraphContext
- Added a few new Graph nodes, support for branches and grouping.
- Automatically enable cl_printf in debug builds
- Fixed bare metal builds for armv7a
- Added AlexNet and cartoon effect examples
- Fixed library builds: libraries are no longer built as supersets of each other. (This means applications using the Runtime part of the library now need to link against both libarm_compute_core and libarm_compute.)
v17.09 Public major release
v17.06 Public major release
- Various bug fixes
- Added support for fixed point 8 bit (QS8) to the various Arm® Neon™ machine learning kernels.
- Added unit tests and benchmarks (AlexNet, LeNet)
- Added support for sub tensors.
- Added infrastructure to provide GPU specific optimisation for some OpenCL kernels.
- Added OMPScheduler (OpenMP) scheduler for Neon
- Added SingleThreadScheduler scheduler for Arm® Neon™ (For bare metal)
- Users can specify their own scheduler by implementing the IScheduler interface.
- New OpenCL kernels / functions:
- New C++ kernels:
- CPPDetectionWindowNonMaximaSuppressionKernel
- New Arm® Neon™ kernels / functions:
v17.05 Public bug fixes release
- Various bug fixes
- Remaining functions ported to use accurate padding.
- Library does not link against OpenCL anymore (It uses dlopen / dlsym at runtime instead to determine whether or not OpenCL is available).
- Added "free" method to allocator.
- Minimum version of g++ required for armv7 Linux changed from 4.8 to 4.9
v17.04 Public bug fixes release
The following functions have been ported to use the new accurate padding:
- CLColorConvertKernel
- CLEdgeNonMaxSuppressionKernel
- CLEdgeTraceKernel
- CLGaussianPyramidHorKernel
- CLGaussianPyramidVertKernel
- CLGradientKernel
- NEChannelCombineKernel
- NEFillArrayKernel
- NEGaussianPyramidHorKernel
- NEGaussianPyramidVertKernel
- NEHarrisScoreFP16Kernel
- NEHarrisScoreKernel
- NEHOGDetectorKernel
- NELogits1DMaxKernel
- NELogits1DShiftExpSumKernel
- NELogits1DNormKernel
- NENonMaximaSuppression3x3FP16Kernel
- NENonMaximaSuppression3x3Kernel
v17.03.1 First Major public release of the sources
- Renamed the library to arm_compute
- New CPP target introduced for C++ kernels shared between Arm® Neon™ and CL functions.
- New padding calculation interface introduced and ported most kernels / functions to use it.
- New OpenCL kernels / functions:
- CLGEMMLowpMatrixMultiplyKernel / CLGEMMLowp
- New Arm® Neon™ kernels / functions:
v17.03 Sources preview
- New OpenCL kernels / functions:
- New Arm® Neon™ kernels / functions:
v17.02.1 Sources preview
- New OpenCL kernels / functions:
- New Arm® Neon™ FP16 kernels (Requires armv8.2 CPU)
- NEAccumulateWeightedFP16Kernel
- NEBox3x3FP16Kernel
- NENonMaximaSuppression3x3FP16Kernel
v17.02 Sources preview
- New OpenCL kernels / functions:
- CLActivationLayerKernel / CLActivationLayer
- CLChannelCombineKernel / CLChannelCombine
- CLDerivativeKernel / CLChannelExtract
- CLFastCornersKernel / CLFastCorners
- CLMeanStdDevKernel / CLMeanStdDev
- New Arm® Neon™ kernels / functions:
- HOG / SVM: NEHOGOrientationBinningKernel, NEHOGBlockNormalizationKernel, NEHOGDetectorKernel, NEHOGNonMaximaSuppressionKernel / NEHOGDescriptor, NEHOGDetector, NEHOGGradient, NEHOGMultiDetection
- NENonLinearFilterKernel / NENonLinearFilter
- Introduced a CLScheduler to manage the default context and command queue used by the runtime library and create synchronisation events.
- Switched all the kernels / functions to use tensors instead of images.
- Updated documentation to include instructions to build the library from sources.
v16.12 Binary preview release