Compute Library 21.05
Release Versions and Changelog

Release versions

All releases are numbered vYY.MM, where YY is the last two digits of the year and MM is the month number. If there is more than one release in a month, an extra sequential number is appended at the end:

v17.03 (First release of March 2017)
v17.03.1 (Second release of March 2017)
v17.04 (First release of April 2017)
Note
We aim to publish one major public release with new features per quarter. All releases in between will contain only bug fixes.
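
The numbering scheme above can be checked mechanically. The following is a minimal sketch, not part of the library: the names `ReleaseVersion`, `parse_release_version` and `release_less` are hypothetical, and a tag without a suffix is assumed to have sequence number 0.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <tuple>

// Hypothetical holder for a parsed release tag such as "v17.03" or "v17.03.1".
struct ReleaseVersion
{
    int yy;  // last two digits of the year
    int mm;  // month number, 1..12
    int seq; // extra sequential number, 0 when the tag has no suffix (assumption)
};

// Returns true and fills `out` when `tag` has the form vYY.MM or vYY.MM.N.
inline bool parse_release_version(const std::string &tag, ReleaseVersion &out)
{
    auto is_digit = [](char c) { return c >= '0' && c <= '9'; };

    // Fixed part: 'v', two digits, '.', two digits. Cap the length so that
    // the optional suffix has at most nine digits and cannot overflow an int.
    if (tag.size() < 6 || tag.size() > 16 || tag[0] != 'v' || !is_digit(tag[1]) ||
        !is_digit(tag[2]) || tag[3] != '.' || !is_digit(tag[4]) || !is_digit(tag[5]))
    {
        return false;
    }
    const int yy = (tag[1] - '0') * 10 + (tag[2] - '0');
    const int mm = (tag[4] - '0') * 10 + (tag[5] - '0');
    if (mm < 1 || mm > 12)
    {
        return false;
    }

    // Optional part: '.' followed by one or more digits.
    int seq = 0;
    if (tag.size() > 6)
    {
        if (tag[6] != '.' || tag.size() == 7)
        {
            return false;
        }
        for (std::size_t i = 7; i < tag.size(); ++i)
        {
            if (!is_digit(tag[i]))
            {
                return false;
            }
            seq = seq * 10 + (tag[i] - '0');
        }
    }

    out = ReleaseVersion{yy, mm, seq};
    return true;
}

// Orders releases chronologically: by year, then month, then sequence number.
inline bool release_less(const ReleaseVersion &a, const ReleaseVersion &b)
{
    return std::tie(a.yy, a.mm, a.seq) < std::tie(b.yy, b.mm, b.seq);
}
```

With this sketch, "v17.03" parses to year 17, month 3, sequence 0, and it orders before "v17.03.1", which in turn orders before "v17.04".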

Changelog

v21.05 Public major release

  • Various bug fixes.
  • Various optimisations.
  • Various documentation updates:
    • Add supported operators and corresponding Android NNAPI operators.
    • Documentation reorg into user guide and contributor guide.
  • Add support for a global allocator for OpenCL tensors
  • Add experimental support for CLVK.
  • Add data type S32 support for:
  • Add data type QASYMM8 support for:
  • Add per-channel quantization support for:
  • Remove padding from OpenCL kernels:
  • Remove computer vision support from Arm® Neon™ backend
  • Remove the following functions:
    • NEAbsoluteDifference
    • NEAccumulate
    • NEBox3x3
    • NECannyEdge
    • NEChannelCombine
    • NEChannelExtract
    • NEColorConvert
    • NEConvolution
    • NEDerivative
    • NEDilate
    • NEEqualizeHistogram
    • NEErode
    • NEFastCorners
    • NEGaussian3x3
    • NEGaussian5x5
    • NEGaussianPyramid
    • NEHOGDescriptor
    • NEHOGDetector
    • NEHOGGradient
    • NEHOGMultiDetection
    • NEHarrisCorners
    • NEHistogram
    • NEIntegralImage
    • NELaplacianPyramid
    • NELaplacianReconstruct
    • NEMagnitude
    • NEMeanStdDev
    • NEMedian3x3
    • NEMinMaxLocation
    • NENonLinearFilter
    • NEOpticalFlow
    • NEPhase
    • NEScharr3x3
    • NESobel3x3
    • NESobel5x5
    • NESobel7x7
    • NETableLookup
    • NEThreshold
    • NEWarpAffine
    • NEWarpPerspectiveKernel
  • Remove all GLES kernels / functions / tests / examples
  • Remove computer vision support from CL backend
  • Remove the following functions:
    • CLAbsoluteDifference
    • CLAccumulate
    • CLBox3x3
    • CLCannyEdge
    • CLChannelCombine
    • CLChannelExtract
    • CLColorConvert
    • CLConvolution
    • CLDerivative
    • CLDilate
    • CLEqualizeHistogram
    • CLErode
    • CLFastCorners
    • CLGaussian3x3
    • CLGaussian5x5
    • CLGaussianPyramid
    • CLHOGDescriptor
    • CLHOGDetector
    • CLHOGGradient
    • CLHOGMultiDetection
    • CLHarrisCorners
    • CLHistogram
    • CLIntegralImage
    • CLLaplacianPyramid
    • CLLaplacianReconstruct
    • CLMagnitude
    • CLMeanStdDev
    • CLMedian3x3
    • CLMinMaxLocation
    • CLNonLinearFilter
    • CLOpticalFlow
    • CLPhase
    • CLScharr3x3
    • CLSobel3x3
    • CLSobel5x5
    • CLSobel7x7
    • CLTableLookup
    • CLThreshold
    • CLWarpAffine
    • CLWarpPerspective

v21.02 Public major release

v20.11 Public major release

  • Various bug fixes.
  • Various optimisations.
  • Performance regressions may be observed when executing Depthwise Convolution on Arm® Neon™ with a depth multiplier > 1 for quantized data types. This is planned to be resolved in the 21.02 release.
  • Added new data type QASYMM8_SIGNED support for NEROIAlignLayer.
  • Added new data type S32 support for:
  • Interface change
    • The softmax axis now has the same meaning as in other major frameworks: axis defines the dimension on which Softmax/Logsoftmax is performed. E.g. for an input of shape 4x5x6 and axis=1, softmax is applied to 4x6=24 vectors of size 5. The supported value range of axis is [-rank, rank). This change applies to the following functions:
  • New OpenCL kernels / functions:
  • New Arm® Neon™ kernels / functions:
  • Removed padding from Arm® Neon™ kernels:
  • Removed padding from OpenCL kernels:
  • Removed OpenCL kernels / functions:
    • CLGEMMLowpQuantizeDownInt32ToInt16ScaleByFixedPointKernel
    • CLGEMMLowpQuantizeDownInt32ToInt8ScaleByFixedPointKernel
    • CLGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPointKernel
  • Deprecated OpenCL kernels / functions (If a kernel is used only by the function that is being deprecated, the kernel is deprecated together):
    • CLLocallyConnectedLayer
    • CLLocallyConnectedMatrixMultiplyKernel
    • CLAbsoluteDifference
    • CLAbsoluteDifferenceKernel
    • CLAccumulate
    • CLAccumulateKernel
    • CLAccumulateSquared
    • CLAccumulateSquaredKernel
    • CLAccumulateWeighted
    • CLAccumulateWeightedKernel
    • CLAccumulateWeightedFP16Kernel
    • CLBox3x3
    • CLBox3x3Kernel
    • CLBox3x3FP16Kernel
    • CLCannyEdge
    • CLChannelCombine
    • CLChannelCombineKernel
    • CLChannelExtract
    • CLChannelExtractKernel
    • CLColorConvert
    • CLColorConvertKernel
    • CLConvolution3x3
    • CLConvolutionRectangle
    • CLConvolutionRectangleKernel
    • CLConvolutionSquare
    • CLConvolutionKernel
    • CLDerivative
    • CLDerivativeKernel
    • CLDilate
    • CLDilateKernel
    • CLEqualizeHistogram
    • CLErode
    • CLErodeKernel
    • CLFastCorners
    • CLFastCornersKernel
    • CLGaussian3x3
    • CLGaussian3x3Kernel
    • CLGaussian5x5
    • CLGaussian5x5HorKernel
    • CLGaussian5x5VertKernel
    • CLGaussianPyramid
    • CLGaussianPyramidHalf
    • CLGaussianPyramidOrb
    • CLHarrisCorners
    • CLHarrisScoreKernel
    • CLHarrisScoreFP16Kernel
    • CLHistogram
    • CLHistogramKernel
    • CLHOGOrientationBinningKernel
    • CLHOGBlockNormalizationKernel
    • CLHOGDetectorKernel
    • CLHOGNonMaximaSuppressionKernel
    • CLHOGDescriptor
    • CLHOGDetector
    • CLHOGGradient
    • CLHOGMultiDetection
    • CLIntegralImage
    • CLIntegralImageKernel
    • CLLaplacianReconstruct
    • CLLaplacianPyramid
    • CLMagnitude
    • CLMagnitudePhaseKernel
    • CLMedian3x3
    • CLMedian3x3Kernel
    • CLMinMaxLocation
    • CLMinMaxLocationKernel
    • CLNonLinearFilter
    • CLNonLinearFilterKernel
    • CLNonMaximaSuppression3x3
    • CLNonMaximaSuppression3x3FP16Kernel
    • CLNonMaximaSuppression3x3Kernel
    • CLOpticalFlow
    • CLPhase
    • CLRemap
    • CLRemapKernel
    • CLScharr3x3
    • CLScharr3x3Kernel
    • CLSobel3x3
    • CLSobel3x3Kernel
    • CLSobel5x5
    • CLSobel5x5HorKernel
    • CLSobel5x5VertKernel
    • CLSobel7x7
    • CLSobel7x7HorKernel
    • CLSobel7x7VertKernel
    • CLThreshold
    • CLThresholdKernel
    • CLWarpAffine
    • CLWarpAffineKernel
    • CLWarpPerspective
    • CLWarpPerspectiveKernel
  • Deprecated Arm® Neon™ kernels / functions (If a kernel is used only by the function that is being deprecated, the kernel is deprecated together):
    • NELocallyConnectedLayer
    • NELocallyConnectedMatrixMultiplyKernel
    • NEAbsoluteDifference
    • NEAbsoluteDifferenceKernel
    • NEAccumulate
    • NEAccumulateKernel
    • NEAccumulateSquared
    • NEAccumulateSquaredKernel
    • NEAccumulateWeighted
    • NEAccumulateWeightedKernel
    • NEAccumulateWeightedFP16Kernel
    • NEBox3x3
    • NEBox3x3Kernel
    • NEBox3x3FP16Kernel
    • NECannyEdge
    • NEChannelCombine
    • NEChannelCombineKernel
    • NEChannelExtract
    • NEChannelExtractKernel
    • NEColorConvert
    • NEColorConvertKernel
    • NEConvolution3x3
    • NEConvolutionRectangle
    • NEConvolutionRectangleKernel
    • NEConvolutionSquare
    • NEConvolutionKernel
    • NEDerivative
    • NEDerivativeKernel
    • NEDilate
    • NEDilateKernel
    • NEEqualizeHistogram
    • NEErode
    • NEErodeKernel
    • NEFastCorners
    • NEFastCornersKernel
    • NEGaussian3x3
    • NEGaussian3x3Kernel
    • NEGaussian5x5
    • NEGaussian5x5HorKernel
    • NEGaussian5x5VertKernel
    • NEGaussianPyramid
    • NEGaussianPyramidHalf
    • NEGaussianPyramidOrb
    • NEHarrisCorners
    • NEHarrisScoreKernel
    • NEHarrisScoreFP16Kernel
    • NEHistogram
    • NEHistogramKernel
    • NEHOGOrientationBinningKernel
    • NEHOGBlockNormalizationKernel
    • NEHOGDetectorKernel
    • NEHOGNonMaximaSuppressionKernel
    • NEHOGDescriptor
    • NEHOGDetector
    • NEHOGGradient
    • NEHOGMultiDetection
    • NEIntegralImage
    • NEIntegralImageKernel
    • NELaplacianReconstruct
    • NELaplacianPyramid
    • NEMagnitude
    • NEMagnitudePhaseKernel
    • NEMedian3x3
    • NEMedian3x3Kernel
    • NEMinMaxLocation
    • NEMinMaxLocationKernel
    • NENonLinearFilter
    • NENonLinearFilterKernel
    • NENonMaximaSuppression3x3
    • NENonMaximaSuppression3x3FP16Kernel
    • NENonMaximaSuppression3x3Kernel
    • NEOpticalFlow
    • NEPhase
    • NERemap
    • NERemapKernel
    • NEScharr3x3
    • NEScharr3x3Kernel
    • NESobel3x3
    • NESobel3x3Kernel
    • NESobel5x5
    • NESobel5x5HorKernel
    • NESobel5x5VertKernel
    • NESobel7x7
    • NESobel7x7HorKernel
    • NESobel7x7VertKernel
    • NEThreshold
    • NEThresholdKernel
    • NEWarpAffine
    • NEWarpAffineKernel
    • NEWarpPerspective
    • NEWarpPerspectiveKernel
  • Deprecated GLES kernels / functions (If a kernel is used only by the function that is being deprecated, the kernel is deprecated together):
    • GCAbsoluteDifference
    • GCActivationLayer
    • GCArithmeticAddition
    • GCBatchNormalizationLayer
    • GCConcatenateLayer
    • GCConvolutionLayer
    • GCDepthwiseConvolutionLayer
    • GCDirectConvolutionLayer
    • GCDropoutLayer
    • GCFillBorder
    • GCFullyConnectedLayer
    • GCGEMM
    • GCGEMMInterleave4x4
    • GCGEMMTranspose1xW
    • GCNormalizationLayer
    • GCNormalizePlanarYUVLayer
    • GCPixelWiseMultiplication
    • GCPoolingLayer
    • GCScale
    • GCSoftmaxLayer
    • GCTensorShift
    • GCTranspose
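
The softmax axis semantics from the v20.11 interface change above can be illustrated with a small reference sketch. It is not the library's implementation. The helper names (`wrap_axis`, `num_softmax_vectors`, `softmax_along_axis`) are hypothetical, and the tensor is assumed to be a dense buffer whose first listed dimension varies slowest, which may differ from the library's own dimension ordering.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Maps an axis in [-rank, rank) to [0, rank); negative values count from the end.
inline std::size_t wrap_axis(int axis, std::size_t rank)
{
    const int r = static_cast<int>(rank);
    if (axis < -r || axis >= r)
    {
        throw std::out_of_range("axis must be in [-rank, rank)");
    }
    return static_cast<std::size_t>(axis < 0 ? axis + r : axis);
}

// Number of independent vectors softmax runs on: product of all dims except the axis.
inline std::size_t num_softmax_vectors(const std::vector<std::size_t> &shape, int axis)
{
    const std::size_t a     = wrap_axis(axis, shape.size());
    std::size_t       count = 1;
    for (std::size_t i = 0; i < shape.size(); ++i)
    {
        if (i != a)
        {
            count *= shape[i];
        }
    }
    return count;
}

// Reference softmax along `axis` (layout assumption: first dimension varies slowest).
inline std::vector<float> softmax_along_axis(const std::vector<float>       &in,
                                             const std::vector<std::size_t> &shape,
                                             int                             axis)
{
    const std::size_t a     = wrap_axis(axis, shape.size());
    std::size_t       outer = 1;
    std::size_t       inner = 1;
    for (std::size_t i = 0; i < a; ++i)
    {
        outer *= shape[i];
    }
    for (std::size_t i = a + 1; i < shape.size(); ++i)
    {
        inner *= shape[i];
    }
    const std::size_t len = shape[a];
    if (in.size() != outer * len * inner)
    {
        throw std::invalid_argument("data size does not match shape");
    }

    std::vector<float> out(in.size());
    if (in.empty())
    {
        return out;
    }
    for (std::size_t o = 0; o < outer; ++o)
    {
        for (std::size_t i = 0; i < inner; ++i)
        {
            const std::size_t base = o * len * inner + i;
            // Subtract the maximum before exponentiating for numerical stability.
            float max_v = in[base];
            for (std::size_t k = 1; k < len; ++k)
            {
                max_v = std::max(max_v, in[base + k * inner]);
            }
            float sum = 0.0f;
            for (std::size_t k = 0; k < len; ++k)
            {
                const std::size_t idx = base + k * inner;
                out[idx]              = std::exp(in[idx] - max_v);
                sum += out[idx];
            }
            for (std::size_t k = 0; k < len; ++k)
            {
                out[base + k * inner] /= sum;
            }
        }
    }
    return out;
}
```

For the shape 4x5x6 with axis=1, `num_softmax_vectors` gives 4x6=24, and each of those vectors has 5 elements that sum to 1 after the call. Passing axis=-2 selects the same dimension.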

v20.08 Public major release

v20.05 Public major release

v20.02.1 Maintenance release

  • Added Android-NN build script.

v20.02 Public major release

v19.11.1 Public maintenance release

  • Fix offset calculation in NEReductionOperationKernel.
  • Fix data layout in NEScaleKernel for nhwc.
  • Retain configuration step data layout to avoid side-effects.
  • Perform sqrt in double domain for L2 pooling.
  • Fix output shape calculation for Reduce Mean
  • Restrict cases where optimized NEPadLayer runs.

v19.11 Public major release

v19.08.1 Public maintenance release

  • Fix offset calculation in NEReductionOperationKernel.
  • Fix data layout in NEScaleKernel for nhwc.
  • Retain configuration step data layout to avoid side-effects.
  • Perform sqrt in double domain for L2 pooling.
  • Fix output shape calculation for Reduce Mean
  • Fix broadcast CLPixelwiseMultiplication with 5D tensors

v19.08 Public major release

v19.05 Public major release

v19.02 Public major release

v18.11 Public major release

v18.08 Public major release

v18.05 Public major release

v18.03 Public maintenance release

  • Various bug fixes.
  • Fixed bug in NEActivationLayer
  • Fix in CLTuner when using batches.
  • Updated recommended NDK version to r16b (and fixed warnings).
  • Fixed bug in validation code.
  • Added Inception v4 graph example.
  • Renamed NEWinogradLayer.cpp to NEWinogradConvolutionLayer

v18.02 Public major release

v18.01 Public maintenance release

  • Various bug fixes
  • Added some of the missing validate() methods
  • Added CLDeconvolutionLayerUpsampleKernel / CLDeconvolutionLayer / CLDeconvolutionLayerUpsample
  • Added CLPermuteKernel / CLPermute
  • Added method to clean the programs cache in the CL Kernel library.
  • Added GCArithmeticAdditionKernel / GCArithmeticAddition
  • Added GCDepthwiseConvolutionLayer3x3Kernel / GCDepthwiseConvolutionLayer3x3
  • Added GCNormalizePlanarYUVLayerKernel / GCNormalizePlanarYUVLayer
  • Added GCScaleKernel / GCScale
  • Added GCWeightsReshapeKernel / GCConvolutionLayer
  • Added FP16 support to the following GLES compute kernels:
    • GCCol2ImKernel
    • GCGEMMInterleave4x4Kernel
    • GCGEMMTranspose1xWKernel
    • GCIm2ColKernel
  • Refactored Arm® Neon™ Winograd (NEWinogradLayerKernel)
  • Added NEDirectConvolutionLayerOutputStageKernel
  • Added QASYMM8 support to the following Arm® Neon™ kernels:
  • Added new examples:
  • More tests added to both validation and benchmarking suites.

v17.12 Public major release

  • Most machine learning functions on OpenCL support the new data type QASYMM8
  • Introduced logging interface
  • Introduced OpenCL timer
  • Reworked GEMMLowp interface
  • Added new Arm® Neon™ assembly kernels for GEMMLowp, SGEMM and HGEMM
  • Added validation method for most Machine Learning kernels / functions
  • Added new graph examples such as googlenet, mobilenet, squeezenet, vgg16 and vgg19
  • Added sgemm example for OpenCL
  • Added absolute difference example for GLES compute
  • Added new tests and benchmarks in validation and benchmark frameworks
  • Added new kernels / functions for GLES compute
  • New OpenGL ES kernels / functions
    • GCAbsoluteDifferenceKernel / GCAbsoluteDifference
    • GCActivationLayerKernel / GCActivationLayer
    • GCBatchNormalizationLayerKernel / GCBatchNormalizationLayer
    • GCCol2ImKernel
    • GCDepthConcatenateLayerKernel / GCDepthConcatenateLayer
    • GCDirectConvolutionLayerKernel / GCDirectConvolutionLayer
    • GCDropoutLayerKernel / GCDropoutLayer
    • GCFillBorderKernel / GCFillBorder
    • GCGEMMInterleave4x4Kernel / GCGEMMInterleave4x4
    • GCGEMMMatrixAccumulateBiasesKernel / GCGEMMMatrixAdditionKernel / GCGEMMMatrixMultiplyKernel / GCGEMM
    • GCGEMMTranspose1xWKernel / GCGEMMTranspose1xW
    • GCIm2ColKernel
    • GCNormalizationLayerKernel / GCNormalizationLayer
    • GCPixelWiseMultiplicationKernel / GCPixelWiseMultiplication
    • GCPoolingLayerKernel / GCPoolingLayer
    • GCLogits1DMaxKernel / GCLogits1DShiftExpSumKernel / GCLogits1DNormKernel / GCSoftmaxLayer
    • GCTransposeKernel / GCTranspose
  • New Arm® Neon™ kernels / functions
  • New OpenCL kernels / functions
  • New graph nodes for Arm® Neon™ and OpenCL
    • graph::BranchLayer
    • graph::DepthConvertLayer
    • graph::DepthwiseConvolutionLayer
    • graph::DequantizationLayer
    • graph::FlattenLayer
    • graph::QuantizationLayer
    • graph::ReshapeLayer

v17.10 Public maintenance release

  • Bug fixes:
    • Check the maximum local workgroup size supported by OpenCL devices
    • Minor documentation updates (Fixed instructions to build the examples)
    • Introduced a graph::GraphContext
    • Added a few new Graph nodes, support for branches and grouping.
    • Automatically enable cl_printf in debug builds
    • Fixed bare metal builds for armv7a
    • Added AlexNet and cartoon effect examples
    • Fixed library builds: libraries are no longer built as supersets of each other. (This means applications using the Runtime part of the library now need to link against both libarm_compute_core and libarm_compute.)

v17.09 Public major release

v17.06 Public major release

v17.05 Public bug fixes release

  • Various bug fixes
  • Remaining functions ported to use accurate padding.
  • Library does not link against OpenCL anymore (it uses dlopen / dlsym at runtime instead to determine whether or not OpenCL is available).
  • Added "free" method to allocator.
  • Minimum version of g++ required for armv7 Linux changed from 4.8 to 4.9
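
The runtime OpenCL detection mentioned in the v17.05 notes above can be sketched as follows. This only illustrates the general dlopen / dlsym technique on a POSIX system and is not the library's actual loader. The helper names are hypothetical, and the library file name "libOpenCL.so" and the probed symbol `clGetPlatformIDs` are assumptions. Older toolchains may also need `-ldl` at link time.

```cpp
#include <cassert>
#include <dlfcn.h>

// Returns true if `library_name` can be loaded at runtime and exports `symbol_name`.
// Hypothetical helper, not part of the library.
inline bool has_runtime_symbol(const char *library_name, const char *symbol_name)
{
    // RTLD_LOCAL keeps the probed library's symbols out of the global namespace.
    void *handle = dlopen(library_name, RTLD_LAZY | RTLD_LOCAL);
    if (handle == nullptr)
    {
        return false; // library not installed or not loadable
    }
    const bool found = dlsym(handle, symbol_name) != nullptr;
    dlclose(handle);
    return found;
}

// Probes for an OpenCL runtime without linking against it.
// Assumption: the runtime is installed as "libOpenCL.so" on the library search path.
inline bool opencl_available()
{
    return has_runtime_symbol("libOpenCL.so", "clGetPlatformIDs");
}
```

An application can call `opencl_available()` once at start-up and fall back to a CPU-only path when it returns false, so the binary still starts on systems without an OpenCL driver.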

v17.04 Public bug fixes release

The following functions have been ported to use the new accurate padding:

  • CLColorConvertKernel
  • CLEdgeNonMaxSuppressionKernel
  • CLEdgeTraceKernel
  • CLGaussianPyramidHorKernel
  • CLGaussianPyramidVertKernel
  • CLGradientKernel
  • NEChannelCombineKernel
  • NEFillArrayKernel
  • NEGaussianPyramidHorKernel
  • NEGaussianPyramidVertKernel
  • NEHarrisScoreFP16Kernel
  • NEHarrisScoreKernel
  • NEHOGDetectorKernel
  • NELogits1DMaxKernel
  • NELogits1DShiftExpSumKernel
  • NELogits1DNormKernel
  • NENonMaximaSuppression3x3FP16Kernel
  • NENonMaximaSuppression3x3Kernel

v17.03.1 First Major public release of the sources

v17.03 Sources preview

v17.02.1 Sources preview

  • New OpenCL kernels / functions:
  • New Arm® Neon™ FP16 kernels (Requires armv8.2 CPU)
    • NEAccumulateWeightedFP16Kernel
    • NEBox3x3FP16Kernel
    • NENonMaximaSuppression3x3FP16Kernel

v17.02 Sources preview

  • New OpenCL kernels / functions:
    • CLActivationLayerKernel / CLActivationLayer
    • CLChannelCombineKernel / CLChannelCombine
    • CLDerivativeKernel / CLChannelExtract
    • CLFastCornersKernel / CLFastCorners
    • CLMeanStdDevKernel / CLMeanStdDev
  • New Arm® Neon™ kernels / functions:
    • HOG / SVM: NEHOGOrientationBinningKernel, NEHOGBlockNormalizationKernel, NEHOGDetectorKernel, NEHOGNonMaximaSuppressionKernel / NEHOGDescriptor, NEHOGDetector, NEHOGGradient, NEHOGMultiDetection
    • NENonLinearFilterKernel / NENonLinearFilter
  • Introduced a CLScheduler to manage the default context and command queue used by the runtime library and create synchronisation events.
  • Switched all the kernels / functions to use tensors instead of images.
  • Updated documentation to include instructions to build the library from sources.

v16.12 Binary preview release

  • Original release