Functions
arm_status	arm_fully_connected_mat_q7_vec_q15 (const q15_t *pV, const q7_t *pM, const uint16_t dim_vec, const uint16_t num_of_rows, const uint16_t bias_shift, const uint16_t out_shift, const q7_t *bias, q15_t *pOut, q15_t *vec_buffer)
	Mixed Q15-Q7 fully-connected layer function. More...

arm_status	arm_fully_connected_mat_q7_vec_q15_opt (const q15_t *pV, const q7_t *pM, const uint16_t dim_vec, const uint16_t num_of_rows, const uint16_t bias_shift, const uint16_t out_shift, const q7_t *bias, q15_t *pOut, q15_t *vec_buffer)
	Mixed Q15-Q7 opt fully-connected layer function. More...

arm_status	arm_fully_connected_q15 (const q15_t *pV, const q15_t *pM, const uint16_t dim_vec, const uint16_t num_of_rows, const uint16_t bias_shift, const uint16_t out_shift, const q15_t *bias, q15_t *pOut, q15_t *vec_buffer)
	Q15 basic fully-connected layer function. More...

arm_status	arm_fully_connected_q15_opt (const q15_t *pV, const q15_t *pM, const uint16_t dim_vec, const uint16_t num_of_rows, const uint16_t bias_shift, const uint16_t out_shift, const q15_t *bias, q15_t *pOut, q15_t *vec_buffer)
	Q15 opt fully-connected layer function. More...

arm_status	arm_fully_connected_q7 (const q7_t *pV, const q7_t *pM, const uint16_t dim_vec, const uint16_t num_of_rows, const uint16_t bias_shift, const uint16_t out_shift, const q7_t *bias, q7_t *pOut, q15_t *vec_buffer)
	Q7 basic fully-connected layer function. More...

arm_status	arm_fully_connected_q7_opt (const q7_t *pV, const q7_t *pM, const uint16_t dim_vec, const uint16_t num_of_rows, const uint16_t bias_shift, const uint16_t out_shift, const q7_t *bias, q7_t *pOut, q15_t *vec_buffer)
	Q7 opt fully-connected layer function. More...

arm_status	arm_fully_connected_s8 (const cmsis_nn_context *ctx, const cmsis_nn_fc_params *fc_params, const cmsis_nn_per_tensor_quant_params *quant_params, const cmsis_nn_dims *input_dims, const q7_t *input, const cmsis_nn_dims *filter_dims, const q7_t *kernel, const cmsis_nn_dims *bias_dims, const int32_t *bias, const cmsis_nn_dims *output_dims, q7_t *output)
	Basic s8 Fully Connected function. More...

int32_t	arm_fully_connected_s8_get_buffer_size (const cmsis_nn_dims *filter_dims)
	Get the required buffer size for S8 basic fully-connected and matrix multiplication layer function for TF Lite. More...

Description

Collection of fully-connected and matrix multiplication functions.

A fully-connected layer is essentially a matrix-vector multiplication plus a bias. The matrix holds the weights, and the input/output vectors hold the activation values. Supported {weight, activation} precisions include {8-bit, 8-bit}, {16-bit, 16-bit}, and {8-bit, 16-bit}.

There are two types of kernel functions. The basic functions use a regular GEMV approach. The opt functions operate on weights stored in interleaved formats.
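The basic GEMV computation for the 8-bit case can be sketched as a plain C loop. This is a minimal illustration, not the CMSIS-NN implementation: the names fc_q7_reference and saturate_to_q7 are hypothetical, the weight matrix is assumed to be row-major without interleaving, and the rounding term is assumed to be half of (1 << out_shift), which is the role NN_ROUND plays in the functions documented below.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: clamp a 32-bit accumulator to the q7 range. */
static int8_t saturate_to_q7(int32_t x)
{
    if (x > 127) return 127;
    if (x < -128) return -128;
    return (int8_t)x;
}

/* Hypothetical reference GEMV for an 8-bit fully-connected layer.
 * mat is assumed row-major: mat[r * dim_vec + c]. */
void fc_q7_reference(const int8_t *vec, const int8_t *mat,
                     uint16_t dim_vec, uint16_t num_of_rows,
                     uint16_t bias_shift, uint16_t out_shift,
                     const int8_t *bias, int8_t *out)
{
    for (uint16_t r = 0; r < num_of_rows; r++) {
        /* Scale the bias up; multiplication avoids left-shifting a negative value. */
        int32_t sum = (int32_t)bias[r] * ((int32_t)1 << bias_shift);

        /* Rounding term: half of the final divisor (zero when out_shift is 0). */
        if (out_shift > 0) {
            sum += (int32_t)1 << (out_shift - 1);
        }

        /* Dot product of row r with the input vector. */
        for (uint16_t c = 0; c < dim_vec; c++) {
            sum += (int32_t)mat[(size_t)r * dim_vec + c] * (int32_t)vec[c];
        }

        /* Scale down and saturate. The right shift of a negative value is
         * implementation-defined in C; an arithmetic shift is assumed here. */
        out[r] = saturate_to_q7(sum >> out_shift);
    }
}
```

A sketch like this is mainly useful as a host-side check when validating the output of an optimized kernel against a known-simple computation.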

Function Documentation

arm_status arm_fully_connected_mat_q7_vec_q15	(	const q15_t *	pV,
		const q7_t *	pM,
		const uint16_t	dim_vec,
		const uint16_t	num_of_rows,
		const uint16_t	bias_shift,
		const uint16_t	out_shift,
		const q7_t *	bias,
		q15_t *	pOut,
		q15_t *	vec_buffer
	)

Parameters

[in]	pV	pointer to input vector
[in]	pM	pointer to matrix weights
[in]	dim_vec	length of the vector
[in]	num_of_rows	number of rows in weight matrix
[in]	bias_shift	amount of left-shift for bias
[in]	out_shift	amount of right-shift for output
[in]	bias	pointer to bias
[in,out]	pOut	pointer to output vector
[in,out]	vec_buffer	pointer to buffer space for input

Returns: The function returns ARM_MATH_SUCCESS

Buffer size:

vec_buffer size: 0

Q7_Q15 version of the fully connected layer

Weights are in q7_t and Activations are in q15_t

References arm_nn_read_q15x2_ia(), and NN_ROUND.

arm_status arm_fully_connected_mat_q7_vec_q15_opt	(	const q15_t *	pV,
		const q7_t *	pM,
		const uint16_t	dim_vec,
		const uint16_t	num_of_rows,
		const uint16_t	bias_shift,
		const uint16_t	out_shift,
		const q7_t *	bias,
		q15_t *	pOut,
		q15_t *	vec_buffer
	)

Parameters

[in]	pV	pointer to input vector
[in]	pM	pointer to matrix weights
[in]	dim_vec	length of the vector
[in]	num_of_rows	number of rows in weight matrix
[in]	bias_shift	amount of left-shift for bias
[in]	out_shift	amount of right-shift for output
[in]	bias	pointer to bias
[in,out]	pOut	pointer to output vector
[in,out]	vec_buffer	pointer to buffer space for input

Returns: The function returns ARM_MATH_SUCCESS

Buffer size:

vec_buffer size: 0

Q7_Q15 version of the fully connected layer

Weights are in q7_t and Activations are in q15_t

Limitation: x4 version requires weight reordering to work

Here we use only one pointer to read 4 rows in the weight matrix. So if the original q7_t matrix looks like this:

| a11 | a12 | a13 | a14 | a15 | a16 | a17 |

| a21 | a22 | a23 | a24 | a25 | a26 | a27 |

| a31 | a32 | a33 | a34 | a35 | a36 | a37 |

| a41 | a42 | a43 | a44 | a45 | a46 | a47 |

| a51 | a52 | a53 | a54 | a55 | a56 | a57 |

| a61 | a62 | a63 | a64 | a65 | a66 | a67 |

We operate on multiples of 4 rows, so the first four rows become

| a11 | a21 | a12 | a22 | a31 | a41 | a32 | a42 |

| a13 | a23 | a14 | a24 | a33 | a43 | a34 | a44 |

| a15 | a25 | a16 | a26 | a35 | a45 | a36 | a46 |

The left-over column is stored in order, which is: | a17 | a27 | a37 | a47 |

For the left-over rows, we do 1x1 computation, so the data remains in its original order.

So the stored weight matrix looks like this:

| a11 | a21 | a12 | a22 | a31 | a41 |

| a32 | a42 | a13 | a23 | a14 | a24 |

| a33 | a43 | a34 | a44 | a15 | a25 |

| a16 | a26 | a35 | a45 | a36 | a46 |

| a17 | a27 | a37 | a47 | a51 | a52 |

| a53 | a54 | a55 | a56 | a57 | a61 |

| a62 | a63 | a64 | a65 | a66 | a67 |
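The layout above can be produced offline from a row-major matrix. The following is a minimal sketch derived only from the example shown here, under the assumption that columns are interleaved in pairs within each group of 4 rows; the function name reorder_q7_weights_for_q15_vec is hypothetical and is not part of CMSIS-NN.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: reorder a row-major q7 weight matrix (src) into the
 * interleaved layout illustrated above. dst must hold num_of_rows * dim_vec
 * elements and must not alias src. */
void reorder_q7_weights_for_q15_vec(const int8_t *src, int8_t *dst,
                                    uint16_t num_of_rows, uint16_t dim_vec)
{
    size_t k = 0;
    uint16_t r = 0;

    /* Full groups of 4 rows. */
    for (; r + 4 <= num_of_rows; r += 4) {
        const int8_t *r0 = src + (size_t)r * dim_vec;
        const int8_t *r1 = r0 + dim_vec;
        const int8_t *r2 = r1 + dim_vec;
        const int8_t *r3 = r2 + dim_vec;
        uint16_t c = 0;

        /* Column pairs: rows 0/1 first, then rows 2/3. */
        for (; c + 2 <= dim_vec; c += 2) {
            dst[k++] = r0[c];     dst[k++] = r1[c];
            dst[k++] = r0[c + 1]; dst[k++] = r1[c + 1];
            dst[k++] = r2[c];     dst[k++] = r3[c];
            dst[k++] = r2[c + 1]; dst[k++] = r3[c + 1];
        }

        /* Left-over column (odd dim_vec): one element per row, in order. */
        for (; c < dim_vec; c++) {
            dst[k++] = r0[c]; dst[k++] = r1[c];
            dst[k++] = r2[c]; dst[k++] = r3[c];
        }
    }

    /* Left-over rows keep their original order. */
    for (size_t i = (size_t)r * dim_vec; i < (size_t)num_of_rows * dim_vec; i++) {
        dst[k++] = src[i];
    }
}
```

Running this on the 6x7 example reproduces the stored sequence a11, a21, a12, a22, a31, a41, a32, a42, and so on, ending with rows 5 and 6 unchanged.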

References arm_nn_read_q15x2_ia(), arm_nn_read_q7x4_ia(), and NN_ROUND.

arm_status arm_fully_connected_q15	(	const q15_t *	pV,
		const q15_t *	pM,
		const uint16_t	dim_vec,
		const uint16_t	num_of_rows,
		const uint16_t	bias_shift,
		const uint16_t	out_shift,
		const q15_t *	bias,
		q15_t *	pOut,
		q15_t *	vec_buffer
	)

Q15 basic fully-connected layer function.

Parameters

[in]	pV	pointer to input vector
[in]	pM	pointer to matrix weights
[in]	dim_vec	length of the vector
[in]	num_of_rows	number of rows in weight matrix
[in]	bias_shift	amount of left-shift for bias
[in]	out_shift	amount of right-shift for output
[in]	bias	pointer to bias
[in,out]	pOut	pointer to output vector
[in,out]	vec_buffer	pointer to buffer space for input

Returns: The function returns ARM_MATH_SUCCESS

Buffer size:

vec_buffer size: 0

References arm_nn_read_q15x2_ia(), and NN_ROUND.

arm_status arm_fully_connected_q15_opt	(	const q15_t *	pV,
		const q15_t *	pM,
		const uint16_t	dim_vec,
		const uint16_t	num_of_rows,
		const uint16_t	bias_shift,
		const uint16_t	out_shift,
		const q15_t *	bias,
		q15_t *	pOut,
		q15_t *	vec_buffer
	)

Parameters

[in]	pV	pointer to input vector
[in]	pM	pointer to matrix weights
[in]	dim_vec	length of the vector
[in]	num_of_rows	number of rows in weight matrix
[in]	bias_shift	amount of left-shift for bias
[in]	out_shift	amount of right-shift for output
[in]	bias	pointer to bias
[in,out]	pOut	pointer to output vector
[in,out]	vec_buffer	pointer to buffer space for input

Returns: The function returns ARM_MATH_SUCCESS

Buffer size:

vec_buffer size: 0

Here we use only one pointer to read 4 rows in the weight matrix. So if the original matrix looks like this:

| a11 | a12 | a13 |

| a21 | a22 | a23 |

| a31 | a32 | a33 |

| a41 | a42 | a43 |

| a51 | a52 | a53 |

| a61 | a62 | a63 |

We operate on multiples of 4 rows, so the first four rows become

| a11 | a12 | a21 | a22 | a31 | a32 | a41 | a42 |

| a13 | a23 | a33 | a43 |

The remaining rows are kept in their original order.

So the stored weight matrix looks like this:

| a11 | a12 | a21 | a22 | a31 | a32 | a41 | a42 |

| a13 | a23 | a33 | a43 | a51 | a52 | a53 | a61 |

| a62 | a63 |
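As with the mixed-precision variant, this layout can be generated offline. The sketch below is inferred only from the 6x3 example above, assuming each row contributes two consecutive columns at a time within a group of 4 rows; reorder_q15_weights is a hypothetical name, not a CMSIS-NN function.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: reorder a row-major q15 weight matrix (src) into the
 * layout illustrated above. dst must hold num_of_rows * dim_vec elements
 * and must not alias src. */
void reorder_q15_weights(const int16_t *src, int16_t *dst,
                         uint16_t num_of_rows, uint16_t dim_vec)
{
    size_t k = 0;
    uint16_t r = 0;

    /* Full groups of 4 rows. */
    for (; r + 4 <= num_of_rows; r += 4) {
        uint16_t c = 0;

        /* Two consecutive columns from each of the 4 rows in turn. */
        for (; c + 2 <= dim_vec; c += 2) {
            for (uint16_t i = 0; i < 4; i++) {
                const int16_t *row = src + (size_t)(r + i) * dim_vec;
                dst[k++] = row[c];
                dst[k++] = row[c + 1];
            }
        }

        /* Left-over column (odd dim_vec): one element per row. */
        for (; c < dim_vec; c++) {
            for (uint16_t i = 0; i < 4; i++) {
                dst[k++] = src[(size_t)(r + i) * dim_vec + c];
            }
        }
    }

    /* Remaining rows keep their original order. */
    for (size_t i = (size_t)r * dim_vec; i < (size_t)num_of_rows * dim_vec; i++) {
        dst[k++] = src[i];
    }
}
```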

References arm_nn_read_q15x2_ia(), and NN_ROUND.

arm_status arm_fully_connected_q7	(	const q7_t *	pV,
		const q7_t *	pM,
		const uint16_t	dim_vec,
		const uint16_t	num_of_rows,
		const uint16_t	bias_shift,
		const uint16_t	out_shift,
		const q7_t *	bias,
		q7_t *	pOut,
		q15_t *	vec_buffer
	)

Parameters

[in]	pV	pointer to input vector
[in]	pM	pointer to matrix weights
[in]	dim_vec	length of the vector
[in]	num_of_rows	number of rows in weight matrix
[in]	bias_shift	amount of left-shift for bias
[in]	out_shift	amount of right-shift for output
[in]	bias	pointer to bias
[in,out]	pOut	pointer to output vector
[in,out]	vec_buffer	pointer to buffer space for input

Returns: The function returns ARM_MATH_SUCCESS

Buffer size:

vec_buffer size: dim_vec

This basic function is designed to work with regular weight matrix without interleaving.

References arm_nn_read_q15x2_ia(), arm_q7_to_q15_reordered_no_shift(), and NN_ROUND.

arm_status arm_fully_connected_q7_opt	(	const q7_t *	pV,
		const q7_t *	pM,
		const uint16_t	dim_vec,
		const uint16_t	num_of_rows,
		const uint16_t	bias_shift,
		const uint16_t	out_shift,
		const q7_t *	bias,
		q7_t *	pOut,
		q15_t *	vec_buffer
	)

Parameters

[in]	pV	pointer to input vector
[in]	pM	pointer to matrix weights
[in]	dim_vec	length of the vector
[in]	num_of_rows	number of rows in weight matrix
[in]	bias_shift	amount of left-shift for bias
[in]	out_shift	amount of right-shift for output
[in]	bias	pointer to bias
[in,out]	pOut	pointer to output vector
[in,out]	vec_buffer	pointer to buffer space for input

Returns: The function returns ARM_MATH_SUCCESS

Buffer size:

vec_buffer size: dim_vec

This opt function is designed to work with an interleaved weight matrix. The vector input is assumed to be in q7_t format; the arm_q7_to_q15_no_shift_shuffle function is called to expand it into q15_t format with certain weight re-ordering (refer to the function comments for more details). Here we use only one pointer to read 4 rows in the weight matrix. So if the original q7_t matrix looks like this:

| a11 | a12 | a13 | a14 | a15 | a16 | a17 |

| a21 | a22 | a23 | a24 | a25 | a26 | a27 |

| a31 | a32 | a33 | a34 | a35 | a36 | a37 |

| a41 | a42 | a43 | a44 | a45 | a46 | a47 |

| a51 | a52 | a53 | a54 | a55 | a56 | a57 |

| a61 | a62 | a63 | a64 | a65 | a66 | a67 |

We operate on multiples of 4 rows, so the first four rows become

| a11 | a21 | a13 | a23 | a31 | a41 | a33 | a43 |

| a12 | a22 | a14 | a24 | a32 | a42 | a34 | a44 |

| a15 | a25 | a35 | a45 | a16 | a26 | a36 | a46 |

So within the kernel, we first read the re-ordered vector in as:

| b1 | b3 | and | b2 | b4 |

the four q31_t weights will look like

| a11 | a13 |, | a21 | a23 |, | a31 | a33 |, | a41 | a43 |

| a12 | a14 |, | a22 | a24 |, | a32 | a34 |, | a42 | a44 |

The left-over column is stored in order, which is:

| a17 | a27 | a37 | a47 |

For the left-over rows, we do 1x1 computation, so the data remains in its original order.

So the stored weight matrix looks like this:

| a11 | a21 | a13 | a23 | a31 | a41 |

| a33 | a43 | a12 | a22 | a14 | a24 |

| a32 | a42 | a34 | a44 | a15 | a25 |

| a35 | a45 | a16 | a26 | a36 | a46 |

| a17 | a27 | a37 | a47 | a51 | a52 |

| a53 | a54 | a55 | a56 | a57 | a61 |

| a62 | a63 | a64 | a65 | a66 | a67 |
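The q7 opt layout differs from the mixed-precision one in that columns are grouped four at a time and paired as (c, c+2) and (c+1, c+3). The following sketch is inferred only from the 6x7 example above; reorder_q7_weights_opt is a hypothetical name, not part of CMSIS-NN.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: reorder a row-major q7 weight matrix (src) into the
 * layout illustrated above. dst must hold num_of_rows * dim_vec elements
 * and must not alias src. */
void reorder_q7_weights_opt(const int8_t *src, int8_t *dst,
                            uint16_t num_of_rows, uint16_t dim_vec)
{
    size_t k = 0;
    uint16_t r = 0;

    /* Full groups of 4 rows. */
    for (; r + 4 <= num_of_rows; r += 4) {
        const int8_t *r0 = src + (size_t)r * dim_vec;
        const int8_t *r1 = r0 + dim_vec;
        const int8_t *r2 = r1 + dim_vec;
        const int8_t *r3 = r2 + dim_vec;
        uint16_t c = 0;

        /* Groups of 4 columns: offsets (0, 2) first, then offsets (1, 3). */
        for (; c + 4 <= dim_vec; c += 4) {
            for (uint16_t off = 0; off < 2; off++) {
                uint16_t ca = (uint16_t)(c + off);
                uint16_t cb = (uint16_t)(c + off + 2);
                dst[k++] = r0[ca]; dst[k++] = r1[ca];
                dst[k++] = r0[cb]; dst[k++] = r1[cb];
                dst[k++] = r2[ca]; dst[k++] = r3[ca];
                dst[k++] = r2[cb]; dst[k++] = r3[cb];
            }
        }

        /* Left-over columns: one element per row, column by column. */
        for (; c < dim_vec; c++) {
            dst[k++] = r0[c]; dst[k++] = r1[c];
            dst[k++] = r2[c]; dst[k++] = r3[c];
        }
    }

    /* Left-over rows keep their original order. */
    for (size_t i = (size_t)r * dim_vec; i < (size_t)num_of_rows * dim_vec; i++) {
        dst[k++] = src[i];
    }
}
```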

References arm_nn_read_q15x2_ia(), arm_nn_read_q7x4_ia(), arm_q7_to_q15_reordered_no_shift(), and NN_ROUND.

arm_status arm_fully_connected_s8	(	const cmsis_nn_context *	ctx,
		const cmsis_nn_fc_params *	fc_params,
		const cmsis_nn_per_tensor_quant_params *	quant_params,
		const cmsis_nn_dims *	input_dims,
		const q7_t *	input_data,
		const cmsis_nn_dims *	filter_dims,
		const q7_t *	filter_data,
		const cmsis_nn_dims *	bias_dims,
		const int32_t *	bias_data,
		const cmsis_nn_dims *	output_dims,
		q7_t *	output_data
	)

Parameters

[in,out]	ctx	Function context (e.g. temporary buffer). Check the function definition file to see if an additional buffer is required. Optional function {API}_get_buffer_size() provides the buffer size if an additional buffer is required.
[in]	fc_params	Fully Connected layer parameters (e.g. strides, dilations, pads, ...). Range of fc_params->input_offset: [-127, 128]; fc_params->filter_offset: 0; range of fc_params->output_offset: [-128, 127].
[in]	quant_params	Per-tensor quantization info. It contains the multiplier and shift values to be applied to the output tensor.
[in]	input_dims	Input (activation) tensor dimensions. Format: [N, H, W, C_IN] Input dimension is taken as Nx(H * W * C_IN)
[in]	input_data	Input (activation) data pointer. Data type: int8
[in]	filter_dims	Two dimensional filter dimensions. Format: [N, C]. N: accumulation depth, equal to (H * W * C_IN) from input_dims; C: output depth, equal to C_OUT in output_dims; H & W: not used.
[in]	filter_data	Filter data pointer. Data type: int8
[in]	bias_dims	Bias tensor dimensions. Format: [C_OUT]. N, H, W: not used.
[in]	bias_data	Bias data pointer. Data type: int32
[in]	output_dims	Output tensor dimensions. Format: [N, C_OUT]. N: batches; C_OUT: output depth; H & W: not used.
[in,out]	output_data	Output data pointer. Data type: int8

Returns: The function returns ARM_MATH_SUCCESS

Supported framework: TensorFlow Lite
q7 is used as the data type even though the data is s8. This is done to be consistent with existing APIs.
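To show how the dimensions and input_offset relate, here is a minimal sketch of the accumulation step only. It is not the CMSIS-NN implementation: fc_s8_accumulate is a hypothetical name, the weights of each output channel are assumed to be stored contiguously (accum_depth values per channel), filter_offset is taken as 0 as stated above, and the later requantization (multiplier and shift), output_offset addition, and activation clamping are deliberately left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch of the int32 accumulation in an s8 fully-connected
 * layer. Assumed layouts:
 *   input : batches x accum_depth, where accum_depth = H * W * C_IN
 *   filter: output_depth x accum_depth (one contiguous run per output channel)
 *   acc   : batches x output_depth raw accumulators (not yet requantized) */
void fc_s8_accumulate(const int8_t *input, const int8_t *filter,
                      const int32_t *bias,
                      int32_t batches, int32_t accum_depth, int32_t output_depth,
                      int32_t input_offset, int32_t *acc)
{
    for (int32_t b = 0; b < batches; b++) {
        const int8_t *in_row = input + (size_t)b * (size_t)accum_depth;

        for (int32_t o = 0; o < output_depth; o++) {
            const int8_t *w_row = filter + (size_t)o * (size_t)accum_depth;

            /* Start from the bias when one is supplied. */
            int32_t sum = (bias != NULL) ? bias[o] : 0;

            /* The input offset is added to every activation before the
             * multiply; the filter offset is 0, so weights are used as-is. */
            for (int32_t i = 0; i < accum_depth; i++) {
                sum += ((int32_t)in_row[i] + input_offset) * (int32_t)w_row[i];
            }

            acc[(size_t)b * (size_t)output_depth + (size_t)o] = sum;
        }
    }
}
```

In the real function, each accumulator would then be scaled using quant_params, shifted by output_offset, and clamped to the activation range before being stored as int8.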

References cmsis_nn_fc_params::activation, arm_nn_vec_mat_mult_t_s8(), cmsis_nn_dims::c, cmsis_nn_fc_params::filter_offset, cmsis_nn_fc_params::input_offset, cmsis_nn_activation::max, cmsis_nn_activation::min, cmsis_nn_per_tensor_quant_params::multiplier, cmsis_nn_dims::n, cmsis_nn_fc_params::output_offset, and cmsis_nn_per_tensor_quant_params::shift.

int32_t arm_fully_connected_s8_get_buffer_size ( const cmsis_nn_dims * filter_dims )

Parameters

[in] filter_dims dimension of filter

Returns: The function returns required buffer size in bytes
