CMSIS-NN  
CMSIS NN Software Library
Convolution

Functions

arm_cmsis_nn_status arm_nn_depthwise_conv_nt_t_padded_s8 (const int8_t *lhs, const int8_t *rhs, const int32_t lhs_offset, const int32_t active_ch, const int32_t total_ch, const int32_t *out_shift, const int32_t *out_mult, const int32_t out_offset, const int32_t activation_min, const int32_t activation_max, const uint16_t row_x_col, const int32_t *const output_bias, int8_t *out)
 Depthwise convolution of transposed rhs matrix with 4 lhs matrices. To be used in padded cases where the padding is -lhs_offset (Range: int8). Dimensions are the same for lhs and rhs.
 
int16_t * arm_nn_depthwise_conv_nt_t_s16 (const int16_t *lhs, const int8_t *rhs, const uint16_t num_ch, const int32_t *out_shift, const int32_t *out_mult, const int32_t activation_min, const int32_t activation_max, const uint16_t row_x_col, const int64_t *const output_bias, int16_t *out)
 Depthwise convolution of transposed rhs matrix with 4 lhs matrices. To be used in non-padded cases. Dimensions are the same for lhs and rhs.
 
arm_cmsis_nn_status arm_nn_depthwise_conv_nt_t_s8 (const int8_t *lhs, const int8_t *rhs, const int32_t lhs_offset, const int32_t active_ch, const int32_t total_ch, const int32_t *out_shift, const int32_t *out_mult, const int32_t out_offset, const int32_t activation_min, const int32_t activation_max, const uint16_t row_x_col, const int32_t *const output_bias, int8_t *out)
 Depthwise convolution of transposed rhs matrix with 4 lhs matrices. To be used in non-padded cases. Dimensions are the same for lhs and rhs.
 
arm_cmsis_nn_status arm_nn_mat_mul_core_1x_s8 (int32_t row_elements, const int32_t skipped_row_elements, const int8_t *row_base_ref, const int8_t *col_base_ref, const int32_t out_ch, const cmsis_nn_conv_params *conv_params, const cmsis_nn_per_channel_quant_params *quant_params, const int32_t *bias, int8_t *output)
 General Vector by Matrix multiplication with requantization and storage of result.
 
int8_t * arm_nn_mat_mul_core_4x_s8 (const int32_t row_elements, const int32_t offset, const int8_t *row_base, const int8_t *col_base, const int32_t out_ch, const cmsis_nn_conv_params *conv_params, const cmsis_nn_per_channel_quant_params *quant_params, const int32_t *bias, int8_t *output)
 Matrix-multiplication with requantization & activation function for four rows and one column.
 
int16_t * arm_nn_mat_mult_kernel_s16 (const int8_t *input_a, const int16_t *input_b, const int32_t output_ch, const int32_t *out_shift, const int32_t *out_mult, const int16_t activation_min, const int16_t activation_max, const int32_t num_col_a, const int64_t *const output_bias, int16_t *out_0)
 Matrix-multiplication function for 16-bit convolution with per-channel requantization.
 
arm_cmsis_nn_status arm_nn_mat_mult_nt_t_s8 (const int8_t *lhs, const int8_t *rhs, const int32_t *bias, int8_t *dst, const int32_t *dst_multipliers, const int32_t *dst_shifts, const int32_t lhs_rows, const int32_t rhs_rows, const int32_t rhs_cols, const int32_t lhs_offset, const int32_t dst_offset, const int32_t activation_min, const int32_t activation_max, const int32_t rhs_cols_offset)
 General Matrix-multiplication function with per-channel requantization. This function assumes that the LHS input matrix is not transposed (nt) and the RHS input matrix is transposed (t).
 

Description

Support functions for Convolution and DW Convolution

Function Documentation

◆ arm_nn_depthwise_conv_nt_t_padded_s8()

arm_cmsis_nn_status arm_nn_depthwise_conv_nt_t_padded_s8 ( const int8_t *  lhs,
const int8_t *  rhs,
const int32_t  lhs_offset,
const int32_t  active_ch,
const int32_t  total_ch,
const int32_t *  out_shift,
const int32_t *  out_mult,
const int32_t  out_offset,
const int32_t  activation_min,
const int32_t  activation_max,
const uint16_t  row_x_col,
const int32_t *const  output_bias,
int8_t *  out 
)
Parameters
[in]lhsInput left-hand side matrix
[in]rhsInput right-hand side matrix (transposed)
[in]lhs_offsetLHS matrix offset (input offset). Range: -127 to 128
[in]active_chSubset of total_ch processed
[in]total_chNumber of channels in LHS/RHS
[in]out_shiftPer channel output shift. Length of vector is equal to number of channels
[in]out_multPer channel output multiplier. Length of vector is equal to number of channels
[in]out_offsetOffset to be added to the output values. Range: -127 to 128
[in]activation_minMinimum value to clamp the output to. Range: int8
[in]activation_maxMaximum value to clamp the output to. Range: int8
[in]row_x_col(row_dimension * col_dimension) of LHS/RHS matrix
[in]output_biasPer channel output bias. Length of vector is equal to number of channels
[in]outOutput pointer
Returns
The function returns a status of type arm_cmsis_nn_status, one of the two
  • ARM_CMSIS_NN_SUCCESS if an implementation is available
  • ARM_CMSIS_NN_NO_IMPL_ERROR if no implementation is available.
Note
If the number of channels is not a multiple of 4, up to 3 elements outside the boundary will be read for each of the following:
  • Output shift
  • Output multiplier
  • Output bias
  • rhs
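
The padded variant depends on the padding value being -lhs_offset: once the offset is added to a padded element, the result is zero and that tap contributes nothing to the sum. The following is a minimal sketch of that arithmetic in plain C, not the CMSIS-NN kernel. The helper name `depthwise_tap_sum` is hypothetical, and the tap-major layout (total_ch channels stored contiguously per tap, for both lhs and rhs) is an assumption made for illustration.

```c
#include <stdint.h>

/* Hypothetical reference helper (not part of CMSIS-NN): accumulates one
 * channel of a depthwise convolution over row_x_col kernel taps.
 * Assumed layout: lhs and rhs are tap-major, total_ch channels per tap. */
static int32_t depthwise_tap_sum(const int8_t *lhs, const int8_t *rhs,
                                 int32_t lhs_offset, int32_t total_ch,
                                 int32_t ch, uint16_t row_x_col)
{
    int32_t acc = 0;
    for (int32_t i = 0; i < (int32_t)row_x_col; i++) {
        /* A padded element equals -lhs_offset, so this term becomes 0. */
        const int32_t in = (int32_t)lhs[i * total_ch + ch] + lhs_offset;
        acc += in * (int32_t)rhs[i * total_ch + ch];
    }
    return acc;
}
```

With lhs_offset = 5, a padded tap stored as -5 adds nothing to the accumulator, whatever the matching weight is. Because lhs_offset lies in -127 to 128, the padding value -lhs_offset always fits in int8.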

◆ arm_nn_depthwise_conv_nt_t_s16()

int16_t * arm_nn_depthwise_conv_nt_t_s16 ( const int16_t *  lhs,
const int8_t *  rhs,
const uint16_t  num_ch,
const int32_t *  out_shift,
const int32_t *  out_mult,
const int32_t  activation_min,
const int32_t  activation_max,
const uint16_t  row_x_col,
const int64_t *const  output_bias,
int16_t *  out 
)
Parameters
[in]lhsInput left-hand side matrix
[in]rhsInput right-hand side matrix (transposed)
[in]num_chNumber of channels in LHS/RHS
[in]out_shiftPer channel output shift. Length of vector is equal to number of channels.
[in]out_multPer channel output multiplier. Length of vector is equal to number of channels.
[in]activation_minMinimum value to clamp the output to. Range: int16
[in]activation_maxMaximum value to clamp the output to. Range: int16
[in]row_x_col(row_dimension * col_dimension) of LHS/RHS matrix
[in]output_biasPer channel output bias. Length of vector is equal to number of channels.
[in]outOutput pointer
Returns
The function returns one of the two
  • Updated output pointer if an implementation is available
  • NULL if no implementation is available.
Note
If the number of channels is not a multiple of 4, up to 3 elements outside the boundary will be read for each of the following:
  • Output shift
  • Output multiplier
  • Output bias
  • rhs
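
One way to keep those out-of-boundary reads inside memory the caller owns is to round the per-channel array length up to the next multiple of 4. The sketch below shows this for the int32 shift and multiplier vectors. Both helper names are hypothetical and not part of CMSIS-NN; the same rounding idea would apply to the bias and rhs buffers with their own element types.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: smallest multiple of 4 that is >= n. */
static uint32_t round_up_4(uint32_t n)
{
    return (n + 3u) & ~3u;
}

/* Hypothetical helper: allocates a zero-filled per-channel int32 array
 * with enough tail padding that reading up to 3 elements past num_ch
 * stays in bounds. Returns NULL on allocation failure; the caller
 * releases the buffer with free(). */
static int32_t *alloc_padded_per_channel(uint32_t num_ch)
{
    return calloc(round_up_4(num_ch), sizeof(int32_t));
}
```

For example, a layer with 5 channels gets an 8-element buffer, so indices 5 to 7 are readable and hold zeros.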

◆ arm_nn_depthwise_conv_nt_t_s8()

arm_cmsis_nn_status arm_nn_depthwise_conv_nt_t_s8 ( const int8_t *  lhs,
const int8_t *  rhs,
const int32_t  lhs_offset,
const int32_t  active_ch,
const int32_t  total_ch,
const int32_t *  out_shift,
const int32_t *  out_mult,
const int32_t  out_offset,
const int32_t  activation_min,
const int32_t  activation_max,
const uint16_t  row_x_col,
const int32_t *const  output_bias,
int8_t *  out 
)
Parameters
[in]lhsInput left-hand side matrix
[in]rhsInput right-hand side matrix (transposed)
[in]lhs_offsetLHS matrix offset (input offset). Range: -127 to 128
[in]active_chSubset of total_ch processed
[in]total_chNumber of channels in LHS/RHS
[in]out_shiftPer channel output shift. Length of vector is equal to number of channels.
[in]out_multPer channel output multiplier. Length of vector is equal to number of channels.
[in]out_offsetOffset to be added to the output values. Range: -127 to 128
[in]activation_minMinimum value to clamp the output to. Range: int8
[in]activation_maxMaximum value to clamp the output to. Range: int8
[in]row_x_col(row_dimension * col_dimension) of LHS/RHS matrix
[in]output_biasPer channel output bias. Length of vector is equal to number of channels.
[in]outOutput pointer
Returns
The function returns a status of type arm_cmsis_nn_status, one of the two
  • ARM_CMSIS_NN_SUCCESS if an implementation is available
  • ARM_CMSIS_NN_NO_IMPL_ERROR if no implementation is available.
Note
If the number of channels is not a multiple of 4, up to 3 elements outside the boundary will be read for each of the following:
  • Output shift
  • Output multiplier
  • Output bias
  • rhs

◆ arm_nn_mat_mul_core_1x_s8()

arm_cmsis_nn_status arm_nn_mat_mul_core_1x_s8 ( int32_t  row_elements,
const int32_t  skipped_row_elements,
const int8_t *  row_base_ref,
const int8_t *  col_base_ref,
const int32_t  out_ch,
const cmsis_nn_conv_params *  conv_params,
const cmsis_nn_per_channel_quant_params *  quant_params,
const int32_t *  bias,
int8_t *  output 
)
Parameters
[in]row_elementsnumber of row elements
[in]skipped_row_elementsnumber of row elements skipped due to padding. row_elements + skipped_row_elements = (kernel_x * kernel_y) * input_ch
[in]row_base_refpointer to row operand
[in]col_base_refpointer to col operand
[in]out_chNumber of output channels
[in]conv_paramsPointer to convolution parameters like offsets and activation values
[in]quant_paramsPointer to per-channel quantization parameters
[in]biasPointer to optional per-channel bias
[out]outputPointer to output where int8 results are stored.
Returns
The function multiplies the matrix (row_base_ref) by the vector (col_base_ref) and stores the scaled result in memory. The return value is a status of type arm_cmsis_nn_status.

Pseudo-code

    *output = 0
    sum_col = 0
    for (j = 0; j < out_ch; j++)
        for (i = 0; i < row_elements; i++)
            *output += row_base_ref[i] * col_base_ref[i]
            sum_col += col_base_ref[i]
        scale sum_col using quant_params and bias
        store result in 'output'
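
The pseudo-code tracks a running sum of the column operand next to the dot product. One plausible reason, stated here as an assumption rather than a description of the CMSIS-NN source, is that an input offset can then be applied once per output through the identity sum((row[i] + offset) * col[i]) = sum(row[i] * col[i]) + offset * sum(col[i]). The sketch below checks that identity in plain C. Both function names are hypothetical, and requantization is left out.

```c
#include <stdint.h>

/* Hypothetical reference (not the CMSIS-NN kernel): dot product of one
 * input row with one weight column, with the input offset folded in
 * through the column sum. */
static int32_t dot_with_offset(const int8_t *row, const int8_t *col,
                               int32_t row_elements, int32_t input_offset)
{
    int32_t acc = 0;
    int32_t sum_col = 0;
    for (int32_t i = 0; i < row_elements; i++) {
        acc += (int32_t)row[i] * (int32_t)col[i];
        sum_col += (int32_t)col[i];
    }
    return acc + input_offset * sum_col;
}

/* Direct form for comparison: the offset is added to every input
 * element before the multiplication. */
static int32_t dot_direct(const int8_t *row, const int8_t *col,
                          int32_t row_elements, int32_t input_offset)
{
    int32_t acc = 0;
    for (int32_t i = 0; i < row_elements; i++) {
        acc += ((int32_t)row[i] + input_offset) * (int32_t)col[i];
    }
    return acc;
}
```

Both forms give the same accumulator for any input; the first needs one extra addition per element instead of adding the offset to each input value.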

◆ arm_nn_mat_mul_core_4x_s8()

int8_t * arm_nn_mat_mul_core_4x_s8 ( const int32_t  row_elements,
const int32_t  offset,
const int8_t *  row_base,
const int8_t *  col_base,
const int32_t  out_ch,
const cmsis_nn_conv_params *  conv_params,
const cmsis_nn_per_channel_quant_params *  quant_params,
const int32_t *  bias,
int8_t *  output 
)
Parameters
[in]row_elementsnumber of row elements
[in]offsetoffset between rows. Can be the same as row_elements, e.g. in a 1x1 convolution with a stride of 1.
[in]row_basepointer to row operand
[in]col_basepointer to col operand
[in]out_chNumber of output channels
[in]conv_paramsPointer to convolution parameters like offsets and activation values
[in]quant_paramsPointer to per-channel quantization parameters
[in]biasPointer to per-channel bias
[out]outputPointer to output where int8 results are stored.
Returns
The function returns the updated output pointer or NULL if implementation is not available.

Compliant with the TFLM int8 specification. MVE implementation only.
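
The role of the offset parameter can be illustrated with a small reference loop: row r is assumed to start at row_base + r * offset, so offset equals row_elements when the four rows are stored back to back. This is a sketch under that assumption, not the MVE kernel. The function name `dot_4_rows` is hypothetical, and requantization, bias and activation are omitted.

```c
#include <stdint.h>

/* Hypothetical reference (not the CMSIS-NN kernel): accumulates four
 * input rows against one weight column. Row r is assumed to start at
 * row_base + r * offset. */
static void dot_4_rows(const int8_t *row_base, int32_t offset,
                       const int8_t *col_base, int32_t row_elements,
                       int32_t acc[4])
{
    for (int32_t r = 0; r < 4; r++) {
        const int8_t *row = row_base + r * offset;
        acc[r] = 0;
        for (int32_t i = 0; i < row_elements; i++) {
            acc[r] += (int32_t)row[i] * (int32_t)col_base[i];
        }
    }
}
```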

◆ arm_nn_mat_mult_kernel_s16()

int16_t * arm_nn_mat_mult_kernel_s16 ( const int8_t *  input_a,
const int16_t *  input_b,
const int32_t  output_ch,
const int32_t *  out_shift,
const int32_t *  out_mult,
const int16_t  activation_min,
const int16_t  activation_max,
const int32_t  num_col_a,
const int64_t *const  output_bias,
int16_t *  out_0 
)
Parameters
[in]input_apointer to operand A
[in]input_bpointer to operand B, always consists of 2 vectors.
[in]output_chnumber of rows of A
[in]out_shiftpointer to per output channel requantization shift parameter.
[in]out_multpointer to per output channel requantization multiplier parameter.
[in]activation_minminimum value to clamp the output to. Range : int16
[in]activation_maxmaximum value to clamp the output to. Range : int16
[in]num_col_anumber of columns of A
[in]output_biasper output channel bias. Range : int64
[in,out]out_0pointer to output
Returns
The function returns one of the two
  1. The incremented output pointer for a successful operation or
  2. NULL if implementation is not available.

This function multiplies the weight matrix for all output channels with 2 columns from im2col and produces two elements per output channel. The outputs are clamped to the range given by activation_min and activation_max. Supported framework: TensorFlow Lite Micro.
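
The two-column multiplication described above can be sketched in plain C. Everything below is illustrative rather than the CMSIS-NN implementation: `mat_mult_2col_ref` and `clamp_s16` are hypothetical names, the two input_b vectors are assumed to be stored one after the other, the second column's results are assumed to follow the first column's in the output, and per-channel requantization with out_mult and out_shift is left out (an identity scale is assumed).

```c
#include <stddef.h>
#include <stdint.h>

/* Clamps a 64-bit accumulator to the int16 activation range. */
static int16_t clamp_s16(int64_t v, int16_t lo, int16_t hi)
{
    if (v < lo) return lo;
    if (v > hi) return hi;
    return (int16_t)v;
}

/* Hypothetical reference: a is output_ch x num_col_a (row-major),
 * b holds two vectors of num_col_a elements each (assumed layout).
 * Writes output_ch results per column and returns the pointer just
 * past the last written element. */
static int16_t *mat_mult_2col_ref(const int8_t *a, const int16_t *b,
                                  int32_t output_ch, int32_t num_col_a,
                                  const int64_t *bias,
                                  int16_t act_min, int16_t act_max,
                                  int16_t *out)
{
    const int16_t *b0 = b;
    const int16_t *b1 = b + num_col_a;
    for (int32_t ch = 0; ch < output_ch; ch++) {
        const int8_t *w = a + (size_t)ch * (size_t)num_col_a;
        int64_t acc0 = bias ? bias[ch] : 0;
        int64_t acc1 = acc0;
        for (int32_t k = 0; k < num_col_a; k++) {
            acc0 += (int64_t)w[k] * b0[k];
            acc1 += (int64_t)w[k] * b1[k];
        }
        /* Requantization omitted: identity scale assumed. */
        out[ch] = clamp_s16(acc0, act_min, act_max);
        out[output_ch + ch] = clamp_s16(acc1, act_min, act_max);
    }
    return out + 2 * output_ch;
}
```

Processing two columns per pass lets each weight be loaded once and used twice, which is the usual motivation for this kind of kernel shape.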

◆ arm_nn_mat_mult_nt_t_s8()

arm_cmsis_nn_status arm_nn_mat_mult_nt_t_s8 ( const int8_t *  lhs,
const int8_t *  rhs,
const int32_t *  bias,
int8_t *  dst,
const int32_t *  dst_multipliers,
const int32_t *  dst_shifts,
const int32_t  lhs_rows,
const int32_t  rhs_rows,
const int32_t  rhs_cols,
const int32_t  lhs_offset,
const int32_t  dst_offset,
const int32_t  activation_min,
const int32_t  activation_max,
const int32_t  rhs_cols_offset 
)
General Matrix-multiplication function with per-channel requantization. This function assumes:
  • LHS input matrix NOT transposed (nt)
  • RHS input matrix transposed (t)
Note
This operation also performs the broadcast bias addition before the requantization.
Parameters
[in]lhsPointer to the LHS input matrix
[in]rhsPointer to the RHS input matrix
[in]biasPointer to the bias vector. The length of this vector is equal to the number of output columns (or RHS input rows)
[out]dstPointer to the output matrix with "m" rows and "n" columns, where m is the number of LHS input rows and n is the number of RHS input rows
[in]dst_multipliersPointer to the multipliers vector needed for the per-channel requantization. The length of this vector is equal to the number of output columns (or RHS input rows)
[in]dst_shiftsPointer to the shifts vector needed for the per-channel requantization. The length of this vector is equal to the number of output columns (or RHS input rows)
[in]lhs_rowsNumber of LHS input rows
[in]rhs_rowsNumber of RHS input rows
[in]rhs_colsNumber of LHS/RHS input columns
[in]lhs_offsetOffset to be applied to the LHS input value
[in]dst_offsetOffset to be applied to the output result
[in]activation_minMinimum value to clamp down the output. Range : int8
[in]activation_maxMaximum value to clamp up the output. Range : int8
[in]rhs_cols_offsetOffset between input columns. Used to handle non-unity strides. Expected value: x * rhs_cols, where x >= 1
Returns
The function returns ARM_CMSIS_NN_SUCCESS
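
To make the nt/t convention and the parameter roles concrete, here is a plain C reference sketch. It is not the CMSIS-NN implementation: `mat_mult_nt_t_ref` is a hypothetical name, rhs_cols_offset is assumed to be the distance in elements between consecutive LHS rows, and the per-channel requantization with dst_multipliers and dst_shifts is omitted (an identity scale is assumed), so only the offset, bias and clamp steps are shown.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference:
 * dst[m][n] = clamp(dst_offset + bias[n]
 *                   + sum_k (lhs[m][k] + lhs_offset) * rhs[n][k])
 * LHS is not transposed; RHS is stored transposed, one row per output
 * column. */
static void mat_mult_nt_t_ref(const int8_t *lhs, const int8_t *rhs,
                              const int32_t *bias, int8_t *dst,
                              int32_t lhs_rows, int32_t rhs_rows,
                              int32_t rhs_cols, int32_t lhs_offset,
                              int32_t dst_offset, int32_t activation_min,
                              int32_t activation_max,
                              int32_t rhs_cols_offset)
{
    for (int32_t m = 0; m < lhs_rows; m++) {
        /* Assumption: LHS rows are rhs_cols_offset elements apart. */
        const int8_t *lhs_row = lhs + (size_t)m * (size_t)rhs_cols_offset;
        for (int32_t n = 0; n < rhs_rows; n++) {
            const int8_t *rhs_row = rhs + (size_t)n * (size_t)rhs_cols;
            int32_t acc = bias ? bias[n] : 0;
            for (int32_t k = 0; k < rhs_cols; k++) {
                acc += ((int32_t)lhs_row[k] + lhs_offset) * (int32_t)rhs_row[k];
            }
            /* Requantization omitted: identity scale assumed. */
            acc += dst_offset;
            if (acc < activation_min) acc = activation_min;
            if (acc > activation_max) acc = activation_max;
            dst[(size_t)m * (size_t)rhs_rows + (size_t)n] = (int8_t)acc;
        }
    }
}
```

Setting rhs_cols_offset to 2 * rhs_cols makes the loop skip every other group of input elements, which is how a stride of 2 would be expressed under the assumption above.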