23.11
Assembly kernel glue.
#include <CpuGemmAssemblyDispatch.h>
Data Structures

  class IFallback

Public Member Functions

  CpuGemmAssemblyDispatch()
      Constructor.

  ~CpuGemmAssemblyDispatch() = default
      Default destructor.

  ARM_COMPUTE_DISALLOW_COPY_ALLOW_MOVE(CpuGemmAssemblyDispatch)

  void configure(const ITensorInfo *a, const ITensorInfo *b, const ITensorInfo *c, ITensorInfo *d, const AsmGemmInfo &info)
      If supported, create a Compute Library function, otherwise fall back to the arm_gemm function.

  bool is_configured() const
      Was the function successfully configured?

  bool isVarWeightsKernel() const
      Indicates if the convolution executes in variable weights mode.

  void prepare(ITensorPack &tensors) override
      Prepare the function for executing.

  void run(ITensorPack &tensors) override
      Run the kernels contained in the function.

  experimental::MemoryRequirements workspace() const override
      Return the memory requirements required by the workspace.
Public Member Functions inherited from INEOperator

  INEOperator(IRuntimeContext *ctx = nullptr)
      Constructor.

  INEOperator(const INEOperator &) = delete
      Prevent instances of this class from being copied (as this class contains pointers).

  INEOperator(INEOperator &&) = default
      Default move constructor.

  INEOperator &operator=(const INEOperator &) = delete
      Prevent instances of this class from being copied (as this class contains pointers).

  INEOperator &operator=(INEOperator &&) = default
      Default move assignment operator.

  ~INEOperator()
      Default destructor.
Public Member Functions inherited from IOperator

  virtual ~IOperator() = default
      Destructor.
Static Public Member Functions

  static Status validate(const ITensorInfo *a, const ITensorInfo *b, const ITensorInfo *c, const ITensorInfo *d, const AsmGemmInfo &info)
      Indicates whether or not this function can be used to process the given parameters.

  static Status has_opt_impl(arm_compute::WeightFormat &weight_format, const ITensorInfo *a, const ITensorInfo *b, const ITensorInfo *c, const ITensorInfo *d, const AsmGemmInfo &info)
      Indicates whether or not there is an optimal assembly implementation that can be used to process the given parameters.

  static bool is_activation_supported(const ActivationLayerInfo &activation)
      Checks if activation is supported by the gemm assembly dispatcher.
Assembly kernel glue.
Definition at line 70 of file CpuGemmAssemblyDispatch.h.
CpuGemmAssemblyDispatch()

Constructor.

Definition at line 822 of file CpuGemmAssemblyDispatch.cpp.

~CpuGemmAssemblyDispatch() = default

Default destructor.

ARM_COMPUTE_DISALLOW_COPY_ALLOW_MOVE(CpuGemmAssemblyDispatch)
void configure(const ITensorInfo *a, const ITensorInfo *b, const ITensorInfo *c, ITensorInfo *d, const AsmGemmInfo &info)
If supported, create a Compute Library function, otherwise fall back to the arm_gemm function.

The tensors a, b and d are arranged as follows (lowest dimension <-> highest dimension):

  a: [K, M, Batch, Multi]
  b: [N, K, Multi]
  d: [N, M, Batch, Multi]

"Batch" refers to the "Batch" number of MxK slices of tensor a that multiply with a single KxN slice of b. "Multi" refers to the "Multi" number of individual multiplications of a with b.
E.g. the following are some example input shape configurations:

(1) Normal 2D gemm:
    a: [K=3, M=4]  b: [N=5, K=3]  d: [N=5, M=4]

(2) Batches of a sharing b (e.g. gemm-based batched convolution where b is the shared weights):
    a: [K=3, M=4, Batch=9]  b: [N=5, K=3]  d: [N=5, M=4, Batch=9]

(3) "Batches" of independent gemm (e.g. batched matmul):
    a: [K=3, M=4, Batch=1, Multi=7]  b: [N=5, K=3, Multi=7]  d: [N=5, M=4, Batch=1, Multi=7]

(4) "Batches" of independent gemm where b is also shared:
    a: [K=3, M=4, Batch=4, Multi=7]  b: [N=5, K=3, Multi=7]  d: [N=5, M=4, Batch=4, Multi=7]
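The arrangement above can be checked with a small standalone sketch that derives d's shape from the shapes of a and b. The output_shape helper below is hypothetical, for illustration only; it is not part of the library:

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Shapes are written lowest dimension first, matching the arrangement above:
// a: [K, M, Batch, Multi], b: [N, K, Multi], d: [N, M, Batch, Multi].
using Shape4D = std::array<std::size_t, 4>;
using Shape3D = std::array<std::size_t, 3>;

// Hypothetical helper: derive d's shape, checking that a and b agree on K and Multi.
Shape4D output_shape(const Shape4D &a, const Shape3D &b)
{
    assert(a[0] == b[1]);            // K of a must equal K of b
    assert(a[3] == b[2]);            // Multi of a must equal Multi of b
    return {b[0], a[1], a[2], a[3]}; // d: [N, M, Batch, Multi]
}
```

For example, configuration (3) above gives output_shape({3, 4, 1, 7}, {5, 3, 7}) == Shape4D{5, 4, 1, 7}; for configurations where b has no Multi dimension, the caller passes Multi = 1.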
Parameters

  [in]   a     Input tensor (Matrix A)
  [in]   b     Input tensor (Matrix B)
  [in]   c     Input tensor (Matrix C) used to pass the bias for quantized calculations
  [out]  d     Output tensor to store the result of matrix multiplication. Data type supported: same as input0.
  [in]   info  GEMM meta-data
Definition at line 974 of file CpuGemmAssemblyDispatch.cpp.
References ARM_COMPUTE_ERROR_ON_NULLPTR, arm_compute::test::validation::b, arm_compute::BFLOAT16, ITensorInfo::data_type(), arm_compute::F16, arm_compute::F32, arm_compute::test::validation::info, arm_compute::assembly_utils::map_to_arm_gemm_activation(), arm_compute::QASYMM8, arm_compute::QASYMM8_SIGNED, arm_compute::S32, arm_compute::S8, arm_compute::U8, and CpuGemmAssemblyDispatch::validate().
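As a sketch of how a caller might drive this operator for the normal 2D gemm case (example (1) above). CpuGemmAssemblyDispatch is an internal operator normally driven by CpuGemm, so the setup below, including the default-constructed AsmGemmInfo, is an assumption for illustration:

```cpp
#include "arm_compute/core/TensorInfo.h"
#include "src/cpu/operators/internal/CpuGemmAssemblyDispatch.h"

using namespace arm_compute;

void configure_example()
{
    // Shapes follow example (1): a: [K=3, M=4], b: [N=5, K=3], d: [N=5, M=4].
    TensorInfo a(TensorShape(3U, 4U), 1, DataType::F32);
    TensorInfo b(TensorShape(5U, 3U), 1, DataType::F32);
    TensorInfo d(TensorShape(5U, 4U), 1, DataType::F32);

    cpu::AsmGemmInfo info{}; // meta-data left at defaults (assumption)

    cpu::CpuGemmAssemblyDispatch gemm;
    // Check first, then configure; c is only needed to pass a bias for quantized gemm.
    if (bool(cpu::CpuGemmAssemblyDispatch::validate(&a, &b, nullptr, &d, info)))
    {
        gemm.configure(&a, &b, nullptr, &d, info);
    }
}
```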
static Status has_opt_impl(arm_compute::WeightFormat &weight_format, const ITensorInfo *a, const ITensorInfo *b, const ITensorInfo *c, const ITensorInfo *d, const AsmGemmInfo &info)
Indicates whether or not there is an optimal assembly implementation that can be used to process the given parameters.
This method serves the same purpose as NEGEMMConvolutionLayer::has_opt_impl, with the only caveat that the value of arm_compute::WeightFormat needs to be passed via the parameter info.
Definition at line 826 of file CpuGemmAssemblyDispatch.cpp.
References GemmTuner::args, ARM_COMPUTE_ERROR_ON_NULLPTR, ARM_COMPUTE_RETURN_ERROR_ON_MSG, ARM_COMPUTE_UNUSED, arm_compute::test::validation::b, arm_compute::BFLOAT16, ci, IScheduler::cpu_info(), ITensorInfo::data_type(), arm_compute::F16, arm_compute::F32, Scheduler::get(), arm_compute::test::validation::info, arm_compute::assembly_utils::map_to_arm_compute_weight_format(), arm_compute::assembly_utils::map_to_arm_gemm_activation(), arm_compute::assembly_utils::map_to_arm_gemm_weight_format(), IScheduler::num_threads(), arm_compute::QASYMM8, arm_compute::QASYMM8_SIGNED, arm_compute::S32, arm_compute::S8, arm_compute::U8, and GemmConfig::weight_format.
Referenced by CpuGemmAssemblyDispatch::validate().
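A sketch of querying the preferred weight format before deciding whether to pre-arrange weights. Passing WeightFormat::ANY through info.weight_format to let the dispatcher choose is an assumption modelled on the fixed-format query flow:

```cpp
using namespace arm_compute;

void query_weight_format(const ITensorInfo &a, const ITensorInfo &b, const ITensorInfo &d)
{
    cpu::AsmGemmInfo info{};
    info.weight_format = WeightFormat::ANY; // ask the dispatcher to pick a format (assumed convention)

    WeightFormat wf = WeightFormat::UNSPECIFIED;
    const Status s  = cpu::CpuGemmAssemblyDispatch::has_opt_impl(wf, &a, &b, nullptr, &d, info);
    if (bool(s))
    {
        // wf now names the layout the optimal assembly kernel expects b in.
    }
}
```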
static bool is_activation_supported(const ActivationLayerInfo &activation)
Checks if activation is supported by the gemm assembly dispatcher.
Parameters

  [in]  activation  Activation to check
Definition at line 968 of file CpuGemmAssemblyDispatch.cpp.
References arm_compute::assembly_utils::map_to_arm_gemm_activation(), Activation::None, and Activation::type.
Referenced by CpuGemmLowpMatrixMultiplyCore::configure().
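A quick pre-check before requesting a fused activation. The RELU choice and the AsmGemmInfo::activation_info field mentioned in the comment are assumptions for illustration:

```cpp
using namespace arm_compute;

bool can_fuse_relu()
{
    const ActivationLayerInfo act(ActivationLayerInfo::ActivationFunction::RELU);
    // If supported, the activation can be fused into the assembly gemm
    // (e.g. via AsmGemmInfo::activation_info, an assumed field).
    return cpu::CpuGemmAssemblyDispatch::is_activation_supported(act);
}
```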
bool is_configured() const

Was the function successfully configured?
Definition at line 1036 of file CpuGemmAssemblyDispatch.cpp.
bool isVarWeightsKernel() const
Indicates if the convolution executes in variable weights mode.
Similar to CpuGemm::isVarWeightsKernel.
Definition at line 182 of file CpuGemmAssemblyDispatch.h.
void prepare(ITensorPack &tensors) override
Prepare the function for executing.
Any one-off pre-processing step required by the function is handled here.

Parameters

  [in]  tensors  Vector that contains the constant tensors.
Reimplemented from INEOperator.
Definition at line 1030 of file CpuGemmAssemblyDispatch.cpp.
References ARM_COMPUTE_ERROR_ON.
void run(ITensorPack &tensors) override
Run the kernels contained in the function.
Parameters

  [in]  tensors  Vector that contains the tensors to operate on.
Reimplemented from INEOperator.
Definition at line 1041 of file CpuGemmAssemblyDispatch.cpp.
References ARM_COMPUTE_ERROR_ON.
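The typical execution sequence once the operator is configured: bind the tensors in an ITensorPack, run the one-off prepare step, then run as often as needed. The ACL_SRC_* / ACL_DST slot constants follow the convention used by the cpu operators; the pre-allocated tensors a_t, b_t and d_t are assumed:

```cpp
using namespace arm_compute;

void execute(cpu::CpuGemmAssemblyDispatch &gemm, ITensor &a_t, ITensor &b_t, ITensor &d_t)
{
    ITensorPack pack;
    pack.add_const_tensor(TensorType::ACL_SRC_0, &a_t); // Matrix A
    pack.add_const_tensor(TensorType::ACL_SRC_1, &b_t); // Matrix B
    pack.add_tensor(TensorType::ACL_DST, &d_t);         // result

    gemm.prepare(pack); // one-off pre-processing, e.g. weight pre-transposition
    gemm.run(pack);     // may be called repeatedly with new data
}
```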
static Status validate(const ITensorInfo *a, const ITensorInfo *b, const ITensorInfo *c, const ITensorInfo *d, const AsmGemmInfo &info)
Indicates whether or not this function can be used to process the given parameters.
Parameters

  [in]  a     Input tensor info (Matrix A)
  [in]  b     Input tensor info (Matrix B)
  [in]  c     Input tensor info (Matrix C) used to pass the bias for quantized calculations
  [in]  d     Output tensor info to store the result of matrix multiplication. Data type supported: same as input0.
  [in]  info  GEMM meta-data
Definition at line 909 of file CpuGemmAssemblyDispatch.cpp.
References arm_compute::ANY, ARM_COMPUTE_RETURN_ERROR_ON_CPU_BF16_UNSUPPORTED, ARM_COMPUTE_RETURN_ERROR_ON_CPU_F16_UNSUPPORTED, ARM_COMPUTE_RETURN_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN, ARM_COMPUTE_RETURN_ERROR_ON_DATA_TYPE_NOT_IN, ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES, ARM_COMPUTE_RETURN_ERROR_ON_MSG, ARM_COMPUTE_RETURN_ERROR_ON_NULLPTR, ARM_COMPUTE_UNUSED, arm_compute::test::validation::b, arm_compute::BFLOAT16, ITensorInfo::data_type(), ITensorInfo::element_size(), arm_compute::F16, arm_compute::F32, CpuGemmAssemblyDispatch::has_opt_impl(), arm_compute::test::validation::info, arm_compute::is_data_type_quantized_per_channel(), arm_compute::is_fixed_format_fast_math(), arm_compute::QASYMM8, arm_compute::QASYMM8_SIGNED, arm_compute::QSYMM8_PER_CHANNEL, arm_compute::S32, arm_compute::S8, arm_compute::U32, arm_compute::U8, and arm_compute::UNSPECIFIED.
Referenced by CpuGemm::configure(), CpuGemmAssemblyDispatch::configure(), CpuMatMul::validate(), CpuGemmDirectConv2d::validate(), CpuGemm::validate(), and CpuGemmLowpMatrixMultiplyCore::validate().
experimental::MemoryRequirements workspace() const override
Return the memory requirements required by the workspace.
Reimplemented from INEOperator.
Definition at line 1047 of file CpuGemmAssemblyDispatch.cpp.
References ARM_COMPUTE_ERROR_ON.
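A sketch of honouring the reported requirements by hand. Inside the library this bookkeeping is done by the operator's memory-management helpers, so the manual loop below is an assumption for illustration; each experimental::MemoryInfo entry carries a slot and a size:

```cpp
using namespace arm_compute;

void bind_workspace(cpu::CpuGemmAssemblyDispatch &gemm, ITensorPack &pack, std::vector<Tensor> &aux)
{
    const experimental::MemoryRequirements reqs = gemm.workspace();
    aux.resize(reqs.size());
    for (std::size_t i = 0; i < reqs.size(); ++i)
    {
        // Back each requirement with a plain byte buffer of the requested size.
        aux[i].allocator()->init(TensorInfo(TensorShape(reqs[i].size), 1, DataType::U8));
        aux[i].allocator()->allocate();
        pack.add_tensor(reqs[i].slot, &aux[i]);
    }
}
```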