Compute Library
 22.11
Release Versions and Changelog

Release versions

All releases are numbered vYY.MM, where YY is the last two digits of the year and MM is the month number. If there is more than one release in a month, an extra sequential number is appended at the end:

v17.03 (First release of March 2017)
v17.03.1 (Second release of March 2017)
v17.04 (First release of April 2017)
Note
We aim to publish one major public release with new features per quarter. All releases in between will only contain bug fixes.
Starting from release 22.05, the 'master' branch is no longer used; it has been replaced by 'main'. Please update your clone jobs accordingly.
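
The vYY.MM numbering scheme described above can be sketched as a small parser. This is an illustrative sketch only, not part of the library: the names `Release` and `parse_release` are hypothetical, and mapping YY to the year 20YY is an assumption based on the releases listed in this changelog.

```python
import re
from typing import NamedTuple


class Release(NamedTuple):
    """Hypothetical container for a parsed release tag."""
    year: int      # full year; assumes every release falls in 20YY
    month: int     # 1..12
    sequence: int  # 0 = first release of the month, 1 = second, ...


# vYY.MM with an optional trailing sequential number (vYY.MM.N)
_TAG_RE = re.compile(r"^v(\d{2})\.(\d{2})(?:\.(\d+))?$")


def parse_release(tag: str) -> Release:
    """Parse a vYY.MM[.N] tag; raise ValueError if it is malformed."""
    match = _TAG_RE.match(tag)
    if match is None:
        raise ValueError(f"not a vYY.MM[.N] tag: {tag!r}")
    yy, mm, seq = match.groups()
    month = int(mm)
    if not 1 <= month <= 12:
        raise ValueError(f"month out of range in {tag!r}")
    return Release(2000 + int(yy), month, int(seq) if seq else 0)


# Tuples compare field by field, so sorting yields chronological order.
tags = ["v17.04", "v17.03.1", "v17.03"]
ordered = sorted(tags, key=parse_release)
```

Because the parsed fields are ordered year, month, sequence, sorting by `parse_release` places v17.03 before v17.03.1 and v17.03.1 before v17.04.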

Changelog

v22.11 Public major release

  • New features:
    • Add new experimental dynamic fusion API.
    • Add CPU batch matrix multiplication with adj_x = false and adj_y = false for FP32.
    • Add CPU MeanStdDevNorm for QASYMM8.
    • Add CPU and GPU GELU activation function for FP32 and FP16.
    • Add CPU swish activation function for FP32 and FP16.
  • Performance optimizations:
    • Optimize CPU bilinear scale for FP32, FP16, QASYMM8, QASYMM8_SIGNED, U8 and S8.
    • Optimize CPU activation functions using LUT-based implementation:
      • Sigmoid function for QASYMM8 and QASYMM8_SIGNED.
      • Hard swish function for QASYMM8_SIGNED.
    • Optimize CPU addition for QASYMM8 and QASYMM8_SIGNED using fixed-point arithmetic.
    • Optimize CPU multiplication, subtraction and activation layers by considering tensors as 1D.
    • Optimize GPU depthwise convolution kernel and heuristic.
    • Optimize GPU Conv2d heuristic.
    • Optimize CPU MeanStdDevNorm for FP16.
    • Optimize CPU tanh activation function for FP16 using rational approximation.
  • Improve GPU GeMMLowp start-up time.
  • Various optimizations and bug fixes.
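
The LUT-based activation entries above rely on a general property of 8-bit quantized types: there are only 256 possible input codes, so the activation can be precomputed once per quantization configuration and applied as a table lookup. The following is a minimal sketch of that idea for a sigmoid on QASYMM8-style data, not the library's implementation. The function names, the affine mapping real = scale * (q - offset), and the example quantization parameters are assumptions made for illustration.

```python
import math


def build_sigmoid_lut(in_scale: float, in_offset: int,
                      out_scale: float, out_offset: int) -> bytes:
    """Precompute sigmoid for all 256 unsigned 8-bit input codes.

    Assumes affine quantization: real = scale * (q - offset).
    """
    table = []
    for q in range(256):
        x = in_scale * (q - in_offset)           # dequantize
        y = 1.0 / (1.0 + math.exp(-x))           # sigmoid in floating point
        q_out = round(y / out_scale) + out_offset  # requantize (Python rounding)
        table.append(max(0, min(255, q_out)))    # clamp to the 8-bit range
    return bytes(table)


def apply_lut(lut: bytes, data: bytes) -> bytes:
    """Apply the activation with one table lookup per element."""
    return data.translate(lut)


# Example parameters (assumed): input scale 0.1 with zero point 128,
# output scale 1/256 with zero point 0.
lut = build_sigmoid_lut(0.1, 128, 1.0 / 256.0, 0)
activated = apply_lut(lut, bytes([0, 128, 255]))
```

The per-element cost becomes a single indexed load regardless of how expensive the activation is, which is why the approach suits functions such as sigmoid and hard swish on 8-bit data.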

v22.08 Public major release

  • Various bug fixes.
  • Disable unsafe FP optimizations causing accuracy issues in:
  • Add Dynamic Fusion of Elementwise Operators: Div, Floor, Add.
  • Optimize the gemm_reshaped_rhs_nly_nt OpenCL kernel using the arm_matrix_multiply extension available for Arm® Mali™-G715 and Arm® Mali™-G615.
  • Add support for the arm_matrix_multiply extension in the gemmlowp_mm_reshaped_only_rhs_t OpenCL kernel.
  • Expand GPUTarget list with missing Mali™ GPUs product names: G57, G68, G78AE, G610, G510, G310.
  • Extend the direct convolution 2d interface to configure the block size.
  • Update ClConv2D heuristic to use direct convolution.
  • Use official Khronos® OpenCL extensions:
    • Add cl_khr_integer_dot_product extension support.
    • Add support of OpenCL 3.0 non-uniform workgroup.
  • CPU performance optimizations:
    • Add LUT-based implementation of Hard Swish and Leaky ReLU activation function for aarch64 build.
    • Optimize Add layer by considering the input tensors as 1D array.
  • Add fixed-format BF16, FP16 and FP32 Neon™ GEMM kernels to support variable weights.
  • Add new winograd convolution kernels implementation and update the ACL CpuWinogradConv2d operator.
  • Add experimental support for native builds for Windows on Arm®.
  • Build flag interpretation change: arch=armv8.6-a now translates to the -march=armv8.6-a CXX flag instead of -march=armv8.2-a plus explicit selection of feature extensions.
  • Build flag change: toolchain_prefix, compiler_prefix:
    • Use empty string "" to suppress any prefixes.
    • Use "auto" to use default (auto) prefixes chosen by the build script. This is the default behavior when unspecified.
    • Any other string will be used as custom prefixes to the compiler and the rest of toolchain tools.
    • The default behaviour when prefix is unspecified does not change, but its signifier has been changed from empty string "" to "auto".
  • armv7a with Android build will no longer be tested or maintained.
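
The toolchain_prefix / compiler_prefix rules listed above can be summarised in a few lines. This is a sketch of the documented behaviour only, not the build script's actual code: `resolve_prefix` and its `auto_prefix` parameter (standing in for whatever default the build script would pick) are hypothetical names.

```python
from typing import Optional


def resolve_prefix(value: Optional[str], auto_prefix: str) -> str:
    """Return the prefix to prepend to compiler / toolchain tool names.

    value       -- the option as given on the command line, or None if unspecified
    auto_prefix -- the default the build script would choose (hypothetical input)
    """
    if value is None or value == "auto":
        # Unspecified and "auto" both select the script's default prefix.
        return auto_prefix
    # "" suppresses any prefix; any other string is used as a custom prefix.
    return value


# Example with an assumed default prefix for a cross build.
default = "aarch64-linux-gnu-"
compiler_auto = resolve_prefix(None, default) + "g++"
compiler_plain = resolve_prefix("", default) + "g++"
compiler_custom = resolve_prefix("my-toolchain-", default) + "g++"
```

The key consequence of the change is that an empty string no longer means "use the default": it now means "no prefix at all", and the default has to be requested with "auto" or by leaving the option unset.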

v22.05 Public major release

v22.02 Public major release

v21.11 Public major release

  • Various bug fixes.
  • Various optimizations:
    • Improve performance of bilinear and nearest neighbor Scale on both CPU and GPU for FP32, FP16, Int8, Uint8 data types
    • Improve performance of Softmax on GPU for Uint8/Int8
  • New OpenCL kernels / functions:
  • New Arm® Neon™ kernels / functions:
  • Support configurable builds restricted to a selected subset of the operator list
  • Support MobileBert on Neon™ backend
  • Improve operator/function logging
  • Remove padding from OpenCL kernels:
    • ClPool2dKernel
    • ClScaleKernel
    • ClGemmMatrixMultiplyReshapedKernel
  • Remove padding from Cpu kernels:
    • CpuPool2dKernel
  • Remove Y padding from OpenCL kernels:
    • ClGemmMatrixMultiplyKernel
    • ClGemmReshapedRHSMatrixKernel
  • Remove legacy GeMM kernels in gemm_v1.cl

v21.08 Public major release

v21.05 Public major release

  • Various bug fixes.
  • Various optimisations.
  • Various documentation updates:
    • Add supported operators and corresponding Android NNAPI operators.
    • Reorganize documentation into a user guide and a contributor guide.
  • Add support for a global allocator for OpenCL tensors
  • Add experimental support for CLVK.
  • Add data type S32 support for:
  • Add data type QASYMM8 support for:
  • Add per-channel quantization support for:
  • Remove padding from OpenCL kernels:
  • Remove computer vision support from Arm® Neon™ backend
  • Remove the following functions:
    • NEAbsoluteDifference
    • NEAccumulate
    • NEBox3x3
    • NECannyEdge
    • NEChannelCombine
    • NEChannelExtract
    • NEColorConvert
    • NEConvolution
    • NEDerivative
    • NEDilate
    • NEEqualizeHistogram
    • NEErode
    • NEFastCorners
    • NEGaussian3x3
    • NEGaussian5x5
    • NEGaussianPyramid
    • NEHOGDescriptor
    • NEHOGDetector
    • NEHOGGradient
    • NEHOGMultiDetection
    • NEHarrisCorners
    • NEHistogram
    • NEIntegralImage
    • NELaplacianPyramid
    • NELaplacianReconstruct
    • NEMagnitude
    • NEMeanStdDev
    • NEMedian3x3
    • NEMinMaxLocation
    • NENonLinearFilter
    • NEOpticalFlow
    • NEPhase
    • NEScharr3x3
    • NESobel3x3
    • NESobel5x5
    • NESobel7x7
    • NETableLookup
    • NEThreshold
    • NEWarpAffine
    • NEWarpPerspectiveKernel
  • Remove all GLES kernels / functions / tests / examples
  • Remove computer vision support from CL backend
  • Remove the following functions:
    • CLAbsoluteDifference
    • CLAccumulate
    • CLBox3x3
    • CLCannyEdge
    • CLChannelCombine
    • CLChannelExtract
    • CLColorConvert
    • CLConvolution
    • CLDerivative
    • CLDilate
    • CLEqualizeHistogram
    • CLErode
    • CLFastCorners
    • CLGaussian3x3
    • CLGaussian5x5
    • CLGaussianPyramid
    • CLHOGDescriptor
    • CLHOGDetector
    • CLHOGGradient
    • CLHOGMultiDetection
    • CLHarrisCorners
    • CLHistogram
    • CLIntegralImage
    • CLLaplacianPyramid
    • CLLaplacianReconstruct
    • CLMagnitude
    • CLMeanStdDev
    • CLMedian3x3
    • CLMinMaxLocation
    • CLNonLinearFilter
    • CLOpticalFlow
    • CLPhase
    • CLScharr3x3
    • CLSobel3x3
    • CLSobel5x5
    • CLSobel7x7
    • CLTableLookup
    • CLThreshold
    • CLWarpAffine
    • CLWarpPerspective

v21.02 Public major release

  • Various bug fixes.
  • Various optimisations.
  • Upgrade C++ standard to C++14
  • Add macOS support
  • Add Armv8-R AArch64 architecture support
  • Add SVE/SVE2 support for:
  • Remove padding from OpenCL kernels:
  • Deprecate functions in CLTuner:
    • add_lws_to_table
    • import_lws_table
    • lws_table
  • Remove functions:
    • NELocallyConnectedLayer / CLLocallyConnectedLayer
    • NEIm2Col
    • NECol2Im
    • NEGEMMInterleave4x4
    • NEGEMMTranspose1xW
    • NEComputeAllAnchors / CLComputeAllAnchors
    • NEGEMMAssemblyDispatch
    • NEUpsampleLayer / CLUpsampleLayer
  • Remove kernels:
    • NEGEMMMatrixVectorMultiplyKernel
    • NELocallyConnectedMatrixMultiplyKernel / CLLocallyConnectedMatrixMultiplyKernel
    • NEUpsampleLayerKernel / CLUpsampleLayerKernel
  • Extend OpenCL tuner with workgroup batch size support
    • Experimental extension for the OpenCL tuner to tune the batches of work groups distributed to compute units
  • Add functionality to load the OpenCL GEMM heuristics at runtime
    • The GEMM heuristic file (MLGO) can be used to update the default GEMM heuristics available for OpenCL
  • Note: there might be performance regressions against v20.08 in Inception v3 using int8 data types on Arm Mali-G77 GPUs. Currently under investigation
  • Note: data-type decoupling is in progress and experimental. Warnings about unused symbols might be raised

v20.11 Public major release

  • Various bug fixes.
  • Various optimisations.
  • Performance regressions may be observed when executing Depthwise Convolution on Arm® Neon™ with a depth multiplier > 1 for quantized data types. This is planned to be resolved in the 21.02 release.
  • Added new data type QASYMM8_SIGNED support for NEROIAlignLayer.
  • Added new data type S32 support for:
  • Interface change
    • Properly support softmax axis to have the same meaning as other major frameworks. That is, axis now defines the dimension on which Softmax/Logsoftmax is performed. E.g. for input of shape 4x5x6 and axis=1, softmax will be applied to 4x6=24 vectors of size 5. The supported value range of axis is [-rank, rank). This change applies to the following functions:
  • New OpenCL kernels / functions:
  • New Arm® Neon™ kernels / functions:
  • Removed padding from Arm® Neon™ kernels:
    • NEComplexPixelWiseMultiplicationKernel
    • NENonMaximaSuppression3x3Kernel
    • NERemapKernel
    • NEGEMMInterleave4x4Kernel
    • NEDirectConvolutionLayerKernel
    • NEScaleKernel
    • NELocallyConnectedMatrixMultiplyKernel
    • NEGEMMLowpOffsetContributionKernel
    • NEGEMMTranspose1xWKernel
    • NEPoolingLayerKernel
    • NEConvolutionKernel
    • NEDepthwiseConvolutionLayerNativeKernel
    • NEGEMMLowpMatrixMultiplyKernel
    • NEGEMMMatrixMultiplyKernel
    • NEDirectConvolutionLayerOutputStageKernel
    • NEReductionOperationKernel
    • NEGEMMLowpMatrixAReductionKernel
    • NEGEMMLowpMatrixBReductionKernel
  • Removed padding from OpenCL kernels:
    • CLBatchConcatenateLayerKernel
    • CLElementwiseOperationKernel
    • CLBatchNormalizationLayerKernel
    • CLPoolingLayerKernel
    • CLWinogradInputTransformKernel
    • CLGEMMLowpMatrixMultiplyNativeKernel
    • CLGEMMLowpMatrixAReductionKernel
    • CLGEMMLowpMatrixBReductionKernel
    • CLGEMMLowpOffsetContributionOutputStageKernel
    • CLGEMMLowpOffsetContributionKernel
    • CLWinogradOutputTransformKernel
    • CLGEMMLowpMatrixMultiplyReshapedKernel
    • CLFuseBatchNormalizationKernel
    • CLDepthwiseConvolutionLayerNativeKernel
    • CLDepthConvertLayerKernel
    • CLCopyKernel
    • CLDepthwiseConvolutionLayer3x3NHWCKernel
    • CLActivationLayerKernel
    • CLWinogradFilterTransformKernel
    • CLWidthConcatenateLayerKernel
    • CLWidthConcatenate4TensorsKernel
    • CLWidthConcatenate2TensorsKernel
    • CLLogits1DMaxShiftExpSumKernel
    • CLLogits1DNormKernel
    • CLHeightConcatenateLayerKernel
    • CLGEMMMatrixMultiplyKernel
    • CLGEMMLowpQuantizeDownInt32ScaleKernel
    • CLGEMMLowpQuantizeDownInt32ScaleByFloatKernel
    • CLGEMMLowpMatrixMultiplyReshapedOnlyRHSKernel
    • CLDepthConcatenateLayerKernel
    • CLGEMMLowpQuantizeDownInt32ScaleByFixedPointKernel
  • Removed OpenCL kernels / functions:
    • CLGEMMLowpQuantizeDownInt32ToInt16ScaleByFixedPointKernel
    • CLGEMMLowpQuantizeDownInt32ToInt8ScaleByFixedPointKernel
    • CLGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPointKernel
  • Deprecated OpenCL kernels / functions (If a kernel is used only by the function that is being deprecated, the kernel is deprecated together):
    • CLLocallyConnectedLayer
    • CLLocallyConnectedMatrixMultiplyKernel
    • CLAbsoluteDifference
    • CLAbsoluteDifferenceKernel
    • CLAccumulate
    • CLAccumulateKernel
    • CLAccumulateSquared
    • CLAccumulateSquaredKernel
    • CLAccumulateWeighted
    • CLAccumulateWeightedKernel
    • CLAccumulateWeightedFP16Kernel
    • CLBox3x3
    • CLBox3x3Kernel
    • CLBox3x3FP16Kernel
    • CLCannyEdge
    • CLChannelCombine
    • CLChannelCombineKernel
    • CLChannelExtract
    • CLChannelExtractKernel
    • CLColorConvert
    • CLColorConvertKernel
    • CLConvolution3x3
    • CLConvolutionRectangle
    • CLConvolutionRectangleKernel
    • CLConvolutionSquare
    • CLConvolutionKernel
    • CLDerivative
    • CLDerivativeKernel
    • CLDilate
    • CLDilateKernel
    • CLEqualizeHistogram
    • CLErode
    • CLErodeKernel
    • CLFastCorners
    • CLFastCornersKernel
    • CLGaussian3x3
    • CLGaussian3x3Kernel
    • CLGaussian5x5
    • CLGaussian5x5HorKernel
    • CLGaussian5x5VertKernel
    • CLGaussianPyramid
    • CLGaussianPyramidHalf
    • CLGaussianPyramidOrb
    • CLHarrisCorners
    • CLHarrisScoreKernel
    • CLHarrisScoreFP16Kernel
    • CLHistogram
    • CLHistogramKernel
    • CLHOGOrientationBinningKernel
    • CLHOGBlockNormalizationKernel
    • CLHOGDetectorKernel
    • CLHOGNonMaximaSuppressionKernel
    • CLHOGDescriptor
    • CLHOGDetector
    • CLHOGGradient
    • CLHOGMultiDetection
    • CLIntegralImage
    • CLIntegralImageKernel
    • CLLaplacianReconstruct
    • CLLaplacianPyramid
    • CLMagnitude
    • CLMagnitudePhaseKernel
    • CLMedian3x3
    • CLMedian3x3Kernel
    • CLMinMaxLocation
    • CLMinMaxLocationKernel
    • CLNonLinearFilter
    • CLNonLinearFilterKernel
    • CLNonMaximaSuppression3x3
    • CLNonMaximaSuppression3x3FP16Kernel
    • CLNonMaximaSuppression3x3Kernel
    • CLOpticalFlow
    • CLPhase
    • CLRemap
    • CLRemapKernel
    • CLScharr3x3
    • CLScharr3x3Kernel
    • CLSobel3x3
    • CLSobel3x3Kernel
    • CLSobel5x5
    • CLSobel5x5HorKernel
    • CLSobel5x5VertKernel
    • CLSobel7x7
    • CLSobel7x7HorKernel
    • CLSobel7x7VertKernel
    • CLThreshold
    • CLThresholdKernel
    • CLWarpAffine
    • CLWarpAffineKernel
    • CLWarpPerspective
    • CLWarpPerspectiveKernel
  • Deprecated Arm® Neon™ kernels / functions (If a kernel is used only by the function that is being deprecated, the kernel is deprecated together):
    • NELocallyConnectedLayer
    • NELocallyConnectedMatrixMultiplyKernel
    • NEAbsoluteDifference
    • NEAbsoluteDifferenceKernel
    • NEAccumulate
    • NEAccumulateKernel
    • NEAccumulateSquared
    • NEAccumulateSquaredKernel
    • NEAccumulateWeighted
    • NEAccumulateWeightedKernel
    • NEAccumulateWeightedFP16Kernel
    • NEBox3x3
    • NEBox3x3Kernel
    • NEBox3x3FP16Kernel
    • NECannyEdge
    • NEChannelCombine
    • NEChannelCombineKernel
    • NEChannelExtract
    • NEChannelExtractKernel
    • NEColorConvert
    • NEColorConvertKernel
    • NEConvolution3x3
    • NEConvolutionRectangle
    • NEConvolutionRectangleKernel
    • NEConvolutionSquare
    • NEConvolutionKernel
    • NEDerivative
    • NEDerivativeKernel
    • NEDilate
    • NEDilateKernel
    • NEEqualizeHistogram
    • NEErode
    • NEErodeKernel
    • NEFastCorners
    • NEFastCornersKernel
    • NEGaussian3x3
    • NEGaussian3x3Kernel
    • NEGaussian5x5
    • NEGaussian5x5HorKernel
    • NEGaussian5x5VertKernel
    • NEGaussianPyramid
    • NEGaussianPyramidHalf
    • NEGaussianPyramidOrb
    • NEHarrisCorners
    • NEHarrisScoreKernel
    • NEHarrisScoreFP16Kernel
    • NEHistogram
    • NEHistogramKernel
    • NEHOGOrientationBinningKernel
    • NEHOGBlockNormalizationKernel
    • NEHOGDetectorKernel
    • NEHOGNonMaximaSuppressionKernel
    • NEHOGDescriptor
    • NEHOGDetector
    • NEHOGGradient
    • NEHOGMultiDetection
    • NEIntegralImage
    • NEIntegralImageKernel
    • NELaplacianReconstruct
    • NELaplacianPyramid
    • NEMagnitude
    • NEMagnitudePhaseKernel
    • NEMedian3x3
    • NEMedian3x3Kernel
    • NEMinMaxLocation
    • NEMinMaxLocationKernel
    • NENonLinearFilter
    • NENonLinearFilterKernel
    • NENonMaximaSuppression3x3
    • NENonMaximaSuppression3x3FP16Kernel
    • NENonMaximaSuppression3x3Kernel
    • NEOpticalFlow
    • NEPhase
    • NERemap
    • NERemapKernel
    • NEScharr3x3
    • NEScharr3x3Kernel
    • NESobel3x3
    • NESobel3x3Kernel
    • NESobel5x5
    • NESobel5x5HorKernel
    • NESobel5x5VertKernel
    • NESobel7x7
    • NESobel7x7HorKernel
    • NESobel7x7VertKernel
    • NEThreshold
    • NEThresholdKernel
    • NEWarpAffine
    • NEWarpAffineKernel
    • NEWarpPerspective
    • NEWarpPerspectiveKernel
  • Deprecated GLES kernels / functions (If a kernel is used only by the function that is being deprecated, the kernel is deprecated together):
    • GCAbsoluteDifference
    • GCActivationLayer
    • GCArithmeticAddition
    • GCBatchNormalizationLayer
    • GCConcatenateLayer
    • GCConvolutionLayer
    • GCDepthwiseConvolutionLayer
    • GCDirectConvolutionLayer
    • GCDropoutLayer
    • GCFillBorder
    • GCFullyConnectedLayer
    • GCGEMM
    • GCGEMMInterleave4x4
    • GCGEMMTranspose1xW
    • GCNormalizationLayer
    • GCNormalizePlanarYUVLayer
    • GCPixelWiseMultiplication
    • GCPoolingLayer
    • GCScale
    • GCSoftmaxLayer
    • GCTensorShift
    • GCTranspose
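
The softmax axis semantics described under "Interface change" for v20.11 can be illustrated with a small reference routine. This is a sketch for illustration, not the library's kernel: it assumes a flat row-major buffer with the shape written outermost dimension first (as in other major frameworks), and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def normalize_axis(axis: int, rank: int) -> int:
    """Map an axis in [-rank, rank) to its non-negative equivalent."""
    if not -rank <= axis < rank:
        raise ValueError(f"axis {axis} is outside [-{rank}, {rank})")
    return axis + rank if axis < 0 else axis


def softmax(data: Sequence[float], shape: Sequence[int], axis: int) -> List[float]:
    """Softmax along `axis` of a row-major flat buffer with the given shape."""
    if len(data) != math.prod(shape):
        raise ValueError("data length does not match shape")
    axis = normalize_axis(axis, len(shape))
    size = shape[axis]                    # length of each softmax vector
    inner = math.prod(shape[axis + 1:])   # stride between elements along axis
    outer = math.prod(shape[:axis])       # number of blocks ahead of axis
    out = [0.0] * len(data)
    for o in range(outer):
        for i in range(inner):
            idx = [o * size * inner + k * inner + i for k in range(size)]
            peak = max(data[j] for j in idx)   # subtract max for stability
            exps = [math.exp(data[j] - peak) for j in idx]
            total = sum(exps)
            for j, e in zip(idx, exps):
                out[j] = e / total
    return out


# Shape 4x5x6 with axis=1: 4 * 6 = 24 independent vectors of size 5.
shape = (4, 5, 6)
values = [float(i % 7) for i in range(4 * 5 * 6)]
result = softmax(values, shape, axis=1)
```

With this shape, `outer * inner` is 4 * 6 = 24 and `size` is 5, matching the example in the changelog entry. A negative axis counts from the last dimension, so axis=-2 selects the same dimension as axis=1 here.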

v20.08 Public major release

  • Various bug fixes.
  • Various optimisations.
  • Added new data type QASYMM8_SIGNED support for:
  • Added new data type U8 support for:
  • Added align_corner support for nearest neighbor interpolation in:
    • NEScaleKernel
    • CLScaleKernel
  • New OpenCL kernels / functions:
  • New Arm® Neon™ kernels / functions:
    • NEMaxUnpoolingLayerKernel
  • New graph example:
    • graph_yolov3_output_detector
  • GEMMTuner improvements:
    • Added fp16 support
    • Output json files for easier integration
    • Enabled tuning for export_to_cl_image_rhs option for RHS tensors
    • More robust script for running benchmarks
  • Removed padding from:
  • Removed OpenCL kernels / functions:
    • CLGEMMLowpQuantizeDownInt32ToUint8Scale
    • CLGEMMLowpQuantizeDownInt32ToUint8ScaleByFloat
  • Removed Arm® Neon™ kernels / functions:
    • NEGEMMLowpQuantizeDownInt32ToUint8Scale
    • NEGEMMMatrixAccumulateBiasesKernel
  • Deprecated functions / interfaces:
  • The support for quantized data types has been removed from CLLogSoftmaxLayer due to implementation complexity.
  • Removed padding requirement for the input (e.g. LHS of GEMM) and output in CLGEMMMatrixMultiplyNativeKernel, CLGEMMMatrixMultiplyReshapedKernel, CLGEMMMatrixMultiplyReshapedOnlyRHSKernel and CLIm2ColKernel (NHWC only)
    • This change allows CLGEMMConvolutionLayer to be used without extra padding for the input and output.
    • Only the weights/bias of CLGEMMConvolutionLayer could require padding for the computation.
    • Only on Arm® Mali™ Midgard GPUs, CLGEMMConvolutionLayer could require padding since CLGEMMMatrixMultiplyKernel is called and currently requires padding.
  • Added support for exporting the OpenCL buffer object to the OpenCL image object in CLGEMMMatrixMultiplyReshapedKernel and CLGEMMMatrixMultiplyReshapedOnlyRHSKernel.
    • This support allows the OpenCL buffer used for the reshaped RHS matrix to be exported to an OpenCL image object.
    • The padding requirement for the OpenCL image object is taken into account in CLGEMMReshapeRHSMatrixKernel.
    • The reshaped RHS matrix stores the weights when GEMM is used to accelerate CLGEMMConvolutionLayer.

v20.05 Public major release

v20.02.1 Maintenance release

  • Added Android-NN build script.

v20.02 Public major release

v19.11.1 Public maintenance release

  • Fix offset calculation in NEReductionOperationKernel.
  • Fix data layout in NEScaleKernel for nhwc.
  • Retain configuration step data layout to avoid side-effects.
  • Perform sqrt in double domain for L2 pooling.
  • Fix output shape calculation for Reduce Mean
  • Restrict cases where optimized NEPadLayer runs.

v19.11 Public major release

v19.08.1 Public maintenance release

  • Fix offset calculation in NEReductionOperationKernel.
  • Fix data layout in NEScaleKernel for nhwc.
  • Retain configuration step data layout to avoid side-effects.
  • Perform sqrt in double domain for L2 pooling.
  • Fix output shape calculation for Reduce Mean
  • Fix broadcast CLPixelwiseMultiplication with 5D tensors

v19.08 Public major release

v19.05 Public major release

v19.02 Public major release

v18.11 Public major release

v18.08 Public major release

v18.05 Public major release

v18.03 Public maintenance release

  • Various bug fixes.
  • Fixed bug in NEActivationLayer
  • Fix in CLTuner when using batches.
  • Updated recommended NDK version to r16b (And fixed warnings).
  • Fixed bug in validation code.
  • Added Inception v4 graph example.
  • Renamed NEWinogradLayer.cpp to NEWinogradConvolutionLayer

v18.02 Public major release

v18.01 Public maintenance release

  • Various bug fixes
  • Added some of the missing validate() methods
  • Added CLDeconvolutionLayerUpsampleKernel / CLDeconvolutionLayer, CLDeconvolutionLayerUpsample
  • Added CLPermuteKernel / CLPermute
  • Added method to clean the programs cache in the CL Kernel library.
  • Added GCArithmeticAdditionKernel / GCArithmeticAddition
  • Added GCDepthwiseConvolutionLayer3x3Kernel / GCDepthwiseConvolutionLayer3x3
  • Added GCNormalizePlanarYUVLayerKernel / GCNormalizePlanarYUVLayer
  • Added GCScaleKernel / GCScale
  • Added GCWeightsReshapeKernel / GCConvolutionLayer
  • Added FP16 support to the following GLES compute kernels:
    • GCCol2ImKernel
    • GCGEMMInterleave4x4Kernel
    • GCGEMMTranspose1xWKernel
    • GCIm2ColKernel
  • Refactored Arm® Neon™ Winograd (NEWinogradLayerKernel)
  • Added NEDirectConvolutionLayerOutputStageKernel
  • Added QASYMM8 support to the following Arm® Neon™ kernels:
  • Added new examples:
  • More tests added to both validation and benchmarking suites.

v17.12 Public major release

  • Most machine learning functions on OpenCL support the new data type QASYMM8
  • Introduced logging interface
  • Introduced OpenCL timer
  • Reworked GEMMLowp interface
  • Added new Arm® Neon™ assembly kernels for GEMMLowp, SGEMM and HGEMM
  • Added validation method for most Machine Learning kernels / functions
  • Added new graph examples such as googlenet, mobilenet, squeezenet, vgg16 and vgg19
  • Added sgemm example for OpenCL
  • Added absolute difference example for GLES compute
  • Added new tests and benchmarks in validation and benchmark frameworks
  • Added new kernels / functions for GLES compute
  • New OpenGL ES kernels / functions
    • GCAbsoluteDifferenceKernel / GCAbsoluteDifference
    • GCActivationLayerKernel / GCActivationLayer
    • GCBatchNormalizationLayerKernel / GCBatchNormalizationLayer
    • GCCol2ImKernel
    • GCDepthConcatenateLayerKernel / GCDepthConcatenateLayer
    • GCDirectConvolutionLayerKernel / GCDirectConvolutionLayer
    • GCDropoutLayerKernel / GCDropoutLayer
    • GCFillBorderKernel / GCFillBorder
    • GCGEMMInterleave4x4Kernel / GCGEMMInterleave4x4
    • GCGEMMMatrixAccumulateBiasesKernel / GCGEMMMatrixAdditionKernel / GCGEMMMatrixMultiplyKernel / GCGEMM
    • GCGEMMTranspose1xWKernel / GCGEMMTranspose1xW
    • GCIm2ColKernel
    • GCNormalizationLayerKernel / GCNormalizationLayer
    • GCPixelWiseMultiplicationKernel / GCPixelWiseMultiplication
    • GCPoolingLayerKernel / GCPoolingLayer
    • GCLogits1DMaxKernel / GCLogits1DShiftExpSumKernel / GCLogits1DNormKernel / GCSoftmaxLayer
    • GCTransposeKernel / GCTranspose
  • New Arm® Neon™ kernels / functions
    • arm_compute::NEGEMMLowpAArch64A53Kernel / arm_compute::NEGEMMLowpAArch64Kernel / arm_compute::NEGEMMLowpAArch64V8P4Kernel / arm_compute::NEGEMMInterleavedBlockedKernel / arm_compute::NEGEMMLowpAssemblyMatrixMultiplyCore
    • arm_compute::NEHGEMMAArch64FP16Kernel
    • NEDepthwiseConvolutionLayer3x3Kernel / NEDepthwiseIm2ColKernel / NEGEMMMatrixVectorMultiplyKernel / NEDepthwiseVectorToTensorKernel / NEDepthwiseConvolutionLayer
    • NEGEMMLowpOffsetContributionKernel / NEGEMMLowpMatrixAReductionKernel / NEGEMMLowpMatrixBReductionKernel / NEGEMMLowpMatrixMultiplyCore
    • NEGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPointKernel / NEGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPoint
    • NEWinogradLayer / NEWinogradLayerKernel
  • New OpenCL kernels / functions
    • CLGEMMLowpOffsetContributionKernel / CLGEMMLowpMatrixAReductionKernel / CLGEMMLowpMatrixBReductionKernel / CLGEMMLowpMatrixMultiplyCore
    • CLGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPointKernel / CLGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPoint
  • New graph nodes for Arm® Neon™ and OpenCL
    • graph::BranchLayer
    • graph::DepthConvertLayer
    • graph::DepthwiseConvolutionLayer
    • graph::DequantizationLayer
    • graph::FlattenLayer
    • graph::QuantizationLayer
    • graph::ReshapeLayer

v17.10 Public maintenance release

  • Bug fixes:
    • Check the maximum local workgroup size supported by OpenCL devices
    • Minor documentation updates (Fixed instructions to build the examples)
    • Introduced a graph::GraphContext
    • Added a few new Graph nodes, support for branches and grouping.
    • Automatically enable cl_printf in debug builds
    • Fixed bare metal builds for armv7a
    • Added AlexNet and cartoon effect examples
    • Fixed library builds: libraries are no longer built as supersets of each other. (This means applications using the Runtime part of the library now need to link against both libarm_compute_core and libarm_compute.)

v17.09 Public major release

v17.06 Public major release

  • Various bug fixes
  • Added support for fixed point 8 bit (QS8) to the various Arm® Neon™ machine learning kernels.
  • Added unit tests and benchmarks (AlexNet, LeNet)
  • Added support for sub tensors.
  • Added infrastructure to provide GPU specific optimisation for some OpenCL kernels.
  • Added OMPScheduler (OpenMP) scheduler for Arm® Neon™
  • Added SingleThreadScheduler scheduler for Arm® Neon™ (For bare metal)
  • User can specify their own scheduler by implementing the IScheduler interface.
  • New OpenCL kernels / functions:
    • CLBatchNormalizationLayerKernel / CLBatchNormalizationLayer
    • CLDepthConcatenateLayerKernel / CLDepthConcatenateLayer
    • CLHOGOrientationBinningKernel, CLHOGBlockNormalizationKernel, CLHOGDetectorKernel / CLHOGDescriptor, CLHOGDetector, CLHOGGradient, CLHOGMultiDetection
    • CLLocallyConnectedMatrixMultiplyKernel / CLLocallyConnectedLayer
    • CLWeightsReshapeKernel / CLConvolutionLayerReshapeWeights
  • New C++ kernels:
    • CPPDetectionWindowNonMaximaSuppressionKernel
  • New Arm® Neon™ kernels / functions:

v17.05 Public bug fixes release

  • Various bug fixes
  • Ported the remaining functions to use accurate padding.
  • Library does not link against OpenCL anymore (It uses dlopen / dlsym at runtime instead to determine whether or not OpenCL is available).
  • Added "free" method to allocator.
  • Minimum version of g++ required for armv7 Linux changed from 4.8 to 4.9
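
The runtime detection mentioned above (dlopen / dlsym instead of link-time dependency) follows a common pattern: try to load the shared library, then look up a symbol. The sketch below shows that pattern in Python using `ctypes`, which relies on dlopen and dlsym on POSIX systems. It is not the library's own loader; the default library name and the probed symbol `clGetPlatformIDs` are assumptions chosen for illustration.

```python
import ctypes


def opencl_available(library_name: str = "libOpenCL.so") -> bool:
    """Return True if an OpenCL runtime can be loaded and exposes a known symbol."""
    try:
        # Loads the shared object at runtime (dlopen on POSIX systems).
        lib = ctypes.CDLL(library_name)
    except OSError:
        # The library is not installed or cannot be loaded.
        return False
    # Attribute access resolves the symbol (dlsym on POSIX systems);
    # hasattr returns False if the lookup fails.
    return hasattr(lib, "clGetPlatformIDs")


has_opencl = opencl_available()
```

An application built this way starts on systems without an OpenCL driver and can simply skip the OpenCL backend when the probe returns False.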

v17.04 Public bug fixes release

The following functions have been ported to use the new accurate padding:

  • CLColorConvertKernel
  • CLEdgeNonMaxSuppressionKernel
  • CLEdgeTraceKernel
  • CLGaussianPyramidHorKernel
  • CLGaussianPyramidVertKernel
  • CLGradientKernel
  • NEChannelCombineKernel
  • NEFillArrayKernel
  • NEGaussianPyramidHorKernel
  • NEGaussianPyramidVertKernel
  • NEHarrisScoreFP16Kernel
  • NEHarrisScoreKernel
  • NEHOGDetectorKernel
  • NELogits1DMaxKernel
  • NELogits1DShiftExpSumKernel
  • NELogits1DNormKernel
  • NENonMaximaSuppression3x3FP16Kernel
  • NENonMaximaSuppression3x3Kernel

v17.03.1 First Major public release of the sources

  • Renamed the library to arm_compute
  • New CPP target introduced for C++ kernels shared between Arm® Neon™ and CL functions.
  • New padding calculation interface introduced and ported most kernels / functions to use it.
  • New OpenCL kernels / functions:
    • CLGEMMLowpMatrixMultiplyKernel / CLGEMMLowp
  • New Arm® Neon™ kernels / functions:

v17.03 Sources preview

  • New OpenCL kernels / functions:
    • CLGradientKernel, CLEdgeNonMaxSuppressionKernel, CLEdgeTraceKernel / CLCannyEdge
    • GEMM refactoring + FP16 support: CLGEMMInterleave4x4Kernel, CLGEMMTranspose1xWKernel, CLGEMMMatrixMultiplyKernel, CLGEMMMatrixAdditionKernel / CLGEMM
    • CLGEMMMatrixAccumulateBiasesKernel / CLFullyConnectedLayer
    • CLTransposeKernel / CLTranspose
    • CLLKTrackerInitKernel, CLLKTrackerStage0Kernel, CLLKTrackerStage1Kernel, CLLKTrackerFinalizeKernel / CLOpticalFlow
    • CLNormalizationLayerKernel / CLNormalizationLayer
    • CLLaplacianPyramid, CLLaplacianReconstruct
  • New Arm® Neon™ kernels / functions:
    • NEActivationLayerKernel / NEActivationLayer
    • GEMM refactoring + FP16 support (Requires armv8.2 CPU): NEGEMMInterleave4x4Kernel, NEGEMMTranspose1xWKernel, NEGEMMMatrixMultiplyKernel, NEGEMMMatrixAdditionKernel / NEGEMM
    • NEPoolingLayerKernel / NEPoolingLayer

v17.02.1 Sources preview

  • New OpenCL kernels / functions:
    • CLLogits1DMaxKernel, CLLogits1DShiftExpSumKernel, CLLogits1DNormKernel / CLSoftmaxLayer
    • CLPoolingLayerKernel / CLPoolingLayer
    • CLIm2ColKernel, CLCol2ImKernel, CLConvolutionLayerWeightsReshapeKernel / CLConvolutionLayer
    • CLRemapKernel / CLRemap
    • CLGaussianPyramidHorKernel, CLGaussianPyramidVertKernel / CLGaussianPyramid, CLGaussianPyramidHalf, CLGaussianPyramidOrb
    • CLMinMaxKernel, CLMinMaxLocationKernel / CLMinMaxLocation
    • CLNonLinearFilterKernel / CLNonLinearFilter
  • New Arm® Neon™ FP16 kernels (Requires armv8.2 CPU)
    • NEAccumulateWeightedFP16Kernel
    • NEBox3x3FP16Kernel
    • NENonMaximaSuppression3x3FP16Kernel

v17.02 Sources preview

  • New OpenCL kernels / functions:
    • CLActivationLayerKernel / CLActivationLayer
    • CLChannelCombineKernel / CLChannelCombine
    • CLDerivativeKernel / CLChannelExtract
    • CLFastCornersKernel / CLFastCorners
    • CLMeanStdDevKernel / CLMeanStdDev
  • New Arm® Neon™ kernels / functions:
    • HOG / SVM: NEHOGOrientationBinningKernel, NEHOGBlockNormalizationKernel, NEHOGDetectorKernel, NEHOGNonMaximaSuppressionKernel / NEHOGDescriptor, NEHOGDetector, NEHOGGradient, NEHOGMultiDetection
    • NENonLinearFilterKernel / NENonLinearFilter
  • Introduced a CLScheduler to manage the default context and command queue used by the runtime library and create synchronisation events.
  • Switched all the kernels / functions to use tensors instead of images.
  • Updated documentation to include instructions to build the library from sources.

v16.12 Binary preview release

  • Original release