Compute Library 21.02
CLGEMM Class Reference

Basic function to execute GEMM on OpenCL. More...

#include <CLGEMM.h>


Public Member Functions

 CLGEMM (std::shared_ptr< IMemoryManager > memory_manager=nullptr, IWeightsManager *weights_manager=nullptr)
 Default constructor. More...
 
 CLGEMM (const CLGEMM &)=delete
 Prevent instances of this class from being copied (As this class contains pointers) More...
 
 CLGEMM (CLGEMM &&)=default
 Default move constructor. More...
 
CLGEMM & operator= (const CLGEMM &)=delete
 Prevent instances of this class from being copied (As this class contains pointers) More...
 
CLGEMM & operator= (CLGEMM &&)=default
 Default move assignment operator. More...
 
 ~CLGEMM ()
 Default destructor. More...
 
void configure (const ICLTensor *a, const ICLTensor *b, const ICLTensor *c, ICLTensor *output, float alpha, float beta, const GEMMInfo &gemm_info=GEMMInfo())
 Initialise the kernel's inputs and output. More...
 
void configure (const CLCompileContext &compile_context, const ICLTensor *a, const ICLTensor *b, const ICLTensor *c, ICLTensor *output, float alpha, float beta, const GEMMInfo &gemm_info=GEMMInfo())
 Initialise the kernel's inputs and output. More...
 
void run () override
 Run the kernels contained in the function. More...
 
void prepare () override
 Prepare the function for executing. More...
 
- Public Member Functions inherited from IFunction
virtual ~IFunction ()=default
 Destructor. More...
 

Static Public Member Functions

static Status validate (const ITensorInfo *a, const ITensorInfo *b, const ITensorInfo *c, const ITensorInfo *output, float alpha, float beta, const GEMMInfo &gemm_info=GEMMInfo())
 Static function to check if given info will lead to a valid configuration of CLGEMM. More...
 

Detailed Description

Basic function to execute GEMM on OpenCL.

This function calls the following OpenCL kernels:

  1. CLGEMMReshapeLHSMatrixKernel (only if RESHAPED_V1 or RESHAPED is selected by the heuristic model)
  2. CLGEMMReshapeRHSMatrixKernel (only if RESHAPED_V1, RESHAPED or RESHAPED_ONLY_RHS is selected by the heuristic model)
  3. CLGEMMMatrixMultiplyKernel (only if NATIVE_V1 or RESHAPED_V1 is selected by the heuristic model)
  4. CLGEMMMatrixMultiplyReshapedKernel (only if RESHAPED is selected by the heuristic model)
  5. CLGEMMMatrixMultiplyReshapedOnlyRHSKernel (only if RESHAPED_ONLY_RHS is selected by the heuristic model)

Definition at line 108 of file CLGEMM.h.

Constructor & Destructor Documentation

◆ CLGEMM() [1/3]

CLGEMM ( std::shared_ptr< IMemoryManager > memory_manager = nullptr,
IWeightsManager * weights_manager = nullptr 
)

Default constructor.

Parameters
[in]  memory_manager   (Optional) Memory manager.
[in]  weights_manager  (Optional) Weights manager.

Definition at line 233 of file CLGEMM.cpp.

234  : _memory_group(std::move(memory_manager)),
235  _weights_manager(weights_manager),
236  _mm_kernel(std::make_unique<CLGEMMMatrixMultiplyKernel>()),
237  _reshape_lhs_kernel(std::make_unique<CLGEMMReshapeLHSMatrixKernel>()),
238  _reshape_rhs_kernel(std::make_unique<CLGEMMReshapeRHSMatrixKernel>()),
239  _reshape_rhs_kernel_managed(std::make_unique<weights_transformations::CLGEMMReshapeRHSMatrixKernelManaged>()),
240  _mm_reshaped_kernel(std::make_unique<CLGEMMMatrixMultiplyReshapedKernel>()),
241  _mm_reshaped_only_rhs_kernel(std::make_unique<CLGEMMMatrixMultiplyReshapedOnlyRHSKernel>()),
242  _mm_reshaped_only_rhs_fallback_kernel(std::make_unique<CLGEMMMatrixMultiplyReshapedOnlyRHSKernel>()),
243  _tmp_a(),
244  _tmp_b(),
245  _original_b(nullptr),
246  _lhs(nullptr),
247  _dst(nullptr),
248  _reshape_b_only_on_first_run(false),
249  _is_prepared(false),
250  _gemm_kernel_type(CLGEMMKernelType::NATIVE_V1)
251 {
252 }

◆ CLGEMM() [2/3]

CLGEMM (const CLGEMM &)
delete

Prevent instances of this class from being copied (As this class contains pointers)

◆ CLGEMM() [3/3]

CLGEMM ( CLGEMM &&  )
default

Default move constructor.

◆ ~CLGEMM()

~CLGEMM ( )
default

Default destructor.

Member Function Documentation

◆ configure() [1/2]

void configure ( const ICLTensor * a,
const ICLTensor * b,
const ICLTensor * c,
ICLTensor * output,
float  alpha,
float  beta,
const GEMMInfo & gemm_info = GEMMInfo() 
)

Initialise the kernel's inputs and output.

Note
GEMM: General Matrix Multiply - [alpha * A * B + beta * C].
All tensors must have the same data type.
Whilst the first input tensor can be a vector, the second input tensor must be at least a matrix.
Parameters
[in]  a          First input tensor (Matrix or Vector A). Data types supported: F16/F32
[in]  b          Second input tensor (Matrix B). Data type supported: same as a.
[in]  c          Third input tensor (Matrix C). It can be a nullptr if just the multiplication between a and b is needed. Data type supported: same as a.
[out] output     Output tensor. Data type supported: same as a
[in]  alpha      Weight of the matrix product
[in]  beta       Weight of matrix C
[in]  gemm_info  (Optional) Specifies if the matrix A and/or matrix B have been reshaped and if the reshape of matrix B should happen only for the first run. GEMMInfo also contains information about the reshaping in case matrix A and matrix B have been already transformed.

Definition at line 666 of file CLGEMM.cpp.

References CLKernelLibrary::get().

Referenced by CLRNNLayer::configure(), CLWinogradConvolutionLayer::configure(), CLGEMMDeconvolutionLayer::configure(), and CLLSTMLayer::configure().

667 {
668  configure(CLKernelLibrary::get().get_compile_context(), a, b, c, output, alpha, beta, gemm_info);
669 }

◆ configure() [2/2]

void configure ( const CLCompileContext & compile_context,
const ICLTensor * a,
const ICLTensor * b,
const ICLTensor * c,
ICLTensor * output,
float  alpha,
float  beta,
const GEMMInfo & gemm_info = GEMMInfo() 
)

Initialise the kernel's inputs and output.

Note
GEMM: General Matrix Multiply - [alpha * A * B + beta * C].
All tensors must have the same data type.
Whilst the first input tensor can be a vector, the second input tensor must be at least a matrix.
Parameters
[in]  compile_context  The compile context to be used.
[in]  a                First input tensor (Matrix or Vector A). Data types supported: F16/F32
[in]  b                Second input tensor (Matrix B). Data type supported: same as a.
[in]  c                Third input tensor (Matrix C). It can be a nullptr if just the multiplication between a and b is needed. Data type supported: same as a.
[out] output           Output tensor. Data type supported: same as a
[in]  alpha            Weight of the matrix product
[in]  beta             Weight of matrix C
[in]  gemm_info        (Optional) Specifies if the matrix A and/or matrix B have been reshaped and if the reshape of matrix B should happen only for the first run. GEMMInfo also contains information about the reshaping in case matrix A and matrix B have been already transformed.

Definition at line 671 of file CLGEMM.cpp.

References ARM_COMPUTE_ERROR, ARM_COMPUTE_ERROR_ON_NULLPTR, ARM_COMPUTE_ERROR_THROW_ON, arm_compute::test::validation::b, ITensorInfo::data_type(), ITensorInfo::dimension(), CLScheduler::get(), ITensor::info(), arm_compute::helpers::float_ops::is_zero(), arm_compute::NATIVE_V1, GEMMInfo::reinterpret_input_as_3d(), GEMMInfo::reshape_b_only_on_first_run(), arm_compute::RESHAPED, arm_compute::RESHAPED_ONLY_RHS, arm_compute::RESHAPED_V1, GEMMInfo::retain_internal_weights(), CLScheduler::target(), and CLGEMM::validate().

672 {
673  ARM_COMPUTE_ERROR_ON_NULLPTR(a, b, output);
674 
675  // Perform validation step
676  ARM_COMPUTE_ERROR_THROW_ON(validate(a->info(), b->info(), c != nullptr ? c->info() : nullptr, output->info(), alpha, beta, gemm_info));
677 
678  // Check if we need to reshape the matrix B only on the first run
679  _reshape_b_only_on_first_run = gemm_info.reshape_b_only_on_first_run();
680  _is_prepared = gemm_info.retain_internal_weights();
681  _original_b = b;
682  _lhs = a;
683  _dst = output;
684 
685  bool reinterpret_input_as_3d = gemm_info.reinterpret_input_as_3d();
686  const unsigned int m = reinterpret_input_as_3d ? (a->info()->dimension(1) * a->info()->dimension(2)) : a->info()->dimension(1);
687  const unsigned int n = b->info()->dimension(0);
688  const unsigned int k = a->info()->dimension(0);
689  const unsigned int batch_size = reinterpret_input_as_3d ? a->info()->dimension(3) : a->info()->dimension(2);
690 
691  // Select GEMMType
692  _gemm_kernel_type = auto_select_gemm_kernel(auto_heuristics::CommonQuery{ CLScheduler::get().target(), a->info()->data_type(), m, n, k, batch_size }, _reshape_b_only_on_first_run);
693 
694  const bool fuse_add_c = (!(helpers::float_ops::is_zero(beta)) && c != nullptr);
695 
696  const ICLTensor *c_to_use = fuse_add_c ? c : nullptr;
697 
698  switch(_gemm_kernel_type)
699  {
700  case CLGEMMKernelType::NATIVE_V1:
701  {
702  configure_native_v1(compile_context, a, b, c_to_use, output, alpha, beta, gemm_info);
703  break;
704  }
705  case CLGEMMKernelType::RESHAPED_V1:
706  {
707  configure_reshaped_v1(compile_context, a, b, c_to_use, output, alpha, beta, gemm_info);
708  break;
709  }
710  case CLGEMMKernelType::RESHAPED:
711  {
712  configure_reshaped_v2(compile_context, a, b, c_to_use, output, alpha, beta, gemm_info);
713  break;
714  }
715  case CLGEMMKernelType::RESHAPED_ONLY_RHS:
716  {
717  configure_reshaped_only_rhs(compile_context, a, b, c_to_use, output, alpha, beta, gemm_info);
718  break;
719  }
720  default:
721  {
722  ARM_COMPUTE_ERROR("GEMMType not supported");
723  }
724  }
725 }

◆ operator=() [1/2]

CLGEMM & operator= (const CLGEMM &)
delete

Prevent instances of this class from being copied (As this class contains pointers)

◆ operator=() [2/2]

CLGEMM& operator= ( CLGEMM &&  )
default

Default move assignment operator.

◆ prepare()

void prepare ( )
overridevirtual

Prepare the function for executing.

Any one-off pre-processing step required by the function is handled here.

Note
Prepare stage might not need all the function's buffers' backing memory to be available in order to execute

Reimplemented from IFunction.

Definition at line 870 of file CLGEMM.cpp.

References CLTensorAllocator::allocate(), CLTensor::allocator(), IWeightsManager::are_weights_managed(), CLScheduler::enqueue(), CLScheduler::get(), ITensor::mark_as_unused(), arm_compute::NATIVE_V1, CLScheduler::queue(), and IWeightsManager::run().

Referenced by CLRNNLayer::prepare(), CLWinogradConvolutionLayer::prepare(), CLGEMMDeconvolutionLayer::prepare(), CLFullyConnectedLayer::prepare(), CLGEMMConvolutionLayer::prepare(), and CLGEMM::run().

871 {
872  if(!_is_prepared)
873  {
874  if(_gemm_kernel_type != CLGEMMKernelType::NATIVE_V1 && _reshape_b_only_on_first_run)
875  {
876  if(_weights_manager && _weights_manager->are_weights_managed(_original_b))
877  {
878  _weights_manager->run(_original_b, _reshape_rhs_kernel_managed.get());
879  }
880  else
881  {
882  // Run transpose kernel and mark original weights tensor as unused
883  _tmp_b.allocator()->allocate();
884  CLScheduler::get().enqueue(*_reshape_rhs_kernel, false);
885  _original_b->mark_as_unused();
886  }
887  }
888  CLScheduler::get().queue().finish();
889  _is_prepared = true;
890  }
891 }

◆ run()

void run ( )
overridevirtual

Run the kernels contained in the function.

For Neon kernels:

  • Multi-threading is used for the kernels which are parallelisable.
  • By default std::thread::hardware_concurrency() threads are used.
Note
CPPScheduler::set_num_threads() can be used to manually set the number of threads

For OpenCL kernels:

  • All the kernels are enqueued on the queue associated with CLScheduler.
  • The queue is then flushed.
Note
The function will not block until the kernels are executed. It is the user's responsibility to wait.
prepare() will be called on the first run if it has not been done already.

Implements IFunction.

Definition at line 778 of file CLGEMM.cpp.

References IWeightsManager::are_weights_managed(), ARM_COMPUTE_ERROR, BorderSize::bottom, CLScheduler::enqueue(), CLScheduler::get(), ITensor::info(), arm_compute::NATIVE_V1, ITensorInfo::padding(), CLGEMM::prepare(), arm_compute::RESHAPED, arm_compute::RESHAPED_ONLY_RHS, arm_compute::RESHAPED_V1, IWeightsManager::run(), and BorderSize::top.

Referenced by CLRNNLayer::run(), CLWinogradConvolutionLayer::run(), CLGEMMDeconvolutionLayer::run(), CLFullyConnectedLayer::run(), CLLSTMLayer::run(), and CLGEMMConvolutionLayer::run().

779 {
780  prepare();
781  MemoryGroupResourceScope scope_mg(_memory_group);
782 
783  // Run matrix multiply kernel
784  switch(_gemm_kernel_type)
785  {
786  case CLGEMMKernelType::NATIVE_V1:
787  {
788  CLScheduler::get().enqueue(*_mm_kernel, true);
789  break;
790  }
791  case CLGEMMKernelType::RESHAPED_V1:
792  {
793  // Run interleave kernel
794  CLScheduler::get().enqueue(*_reshape_lhs_kernel, false);
795 
796  if(!_reshape_b_only_on_first_run)
797  {
798  // Run transpose kernel
799  if(_weights_manager && _weights_manager->are_weights_managed(_original_b))
800  {
801  _weights_manager->run(_original_b, _reshape_rhs_kernel_managed.get());
802  }
803  else
804  {
805  CLScheduler::get().enqueue(*_reshape_rhs_kernel, false);
806  }
807  }
808 
809  CLScheduler::get().enqueue(*_mm_kernel, true);
810  break;
811  }
812  case CLGEMMKernelType::RESHAPED:
813  {
814  // Run interleave kernel
815  CLScheduler::get().enqueue(*_reshape_lhs_kernel, false);
816 
817  if(!_reshape_b_only_on_first_run)
818  {
819  // Run transpose kernel
820  if(_weights_manager && _weights_manager->are_weights_managed(_original_b))
821  {
822  _weights_manager->run(_original_b, _reshape_rhs_kernel_managed.get());
823  }
824  else
825  {
826  CLScheduler::get().enqueue(*_reshape_rhs_kernel, false);
827  }
828  }
829 
830  CLScheduler::get().enqueue(*_mm_reshaped_kernel, true);
831  break;
832  }
833  case CLGEMMKernelType::RESHAPED_ONLY_RHS:
834  {
835  if(!_reshape_b_only_on_first_run)
836  {
837  // Run transpose kernel
838  if(_weights_manager && _weights_manager->are_weights_managed(_original_b))
839  {
840  _weights_manager->run(_original_b, _reshape_rhs_kernel_managed.get());
841  }
842  else
843  {
844  CLScheduler::get().enqueue(*_reshape_rhs_kernel, false);
845  }
846  }
847  // In case of RESHAPED_ONLY_RHS, we need to check the padding requirement
848  // Check if the lhs or dst tensors have padding
849  const unsigned int cross_plane_pad_lhs = _lhs->info()->padding().top + _lhs->info()->padding().bottom;
850  const unsigned int cross_plane_pad_dst = _dst->info()->padding().top + _dst->info()->padding().bottom;
851 
852  bool has_pad_y = (cross_plane_pad_lhs != 0) || (cross_plane_pad_dst != 0);
853  if(has_pad_y)
854  {
855  CLScheduler::get().enqueue(*_mm_reshaped_only_rhs_fallback_kernel, true);
856  }
857  else
858  {
859  CLScheduler::get().enqueue(*_mm_reshaped_only_rhs_kernel, true);
860  }
861  break;
862  }
863  default:
864  {
865  ARM_COMPUTE_ERROR("GEMMType not supported");
866  }
867  }
868 }

◆ validate()

Status validate ( const ITensorInfo * a,
const ITensorInfo * b,
const ITensorInfo * c,
const ITensorInfo * output,
float  alpha,
float  beta,
const GEMMInfo & gemm_info = GEMMInfo() 
)
static

Static function to check if given info will lead to a valid configuration of CLGEMM.

Parameters
[in]  a          First input tensor info (Matrix or Vector A). Data types supported: F16/F32
[in]  b          Second input tensor info (Matrix B). Data type supported: same as a.
[in]  c          Third input tensor info (Matrix C). It can be a nullptr if just the multiplication between a and b is needed. Data type supported: same as a.
[in]  output     Output tensor info. Data type supported: same as a
[in]  alpha      Weight of the matrix product
[in]  beta       Weight of matrix C
[in]  gemm_info  (Optional) Specifies if the matrix A and/or matrix B have been reshaped and if the reshape of matrix B should happen only for the first run
Returns
a status

Definition at line 727 of file CLGEMM.cpp.

References ARM_COMPUTE_RETURN_ERROR_MSG, ARM_COMPUTE_RETURN_ON_ERROR, ITensorInfo::data_type(), ITensorInfo::dimension(), CLScheduler::get(), arm_compute::helpers::float_ops::is_zero(), arm_compute::NATIVE_V1, GEMMInfo::reinterpret_input_as_3d(), GEMMInfo::reshape_b_only_on_first_run(), arm_compute::RESHAPED, arm_compute::RESHAPED_ONLY_RHS, arm_compute::RESHAPED_V1, and CLScheduler::target().

Referenced by CLGEMM::configure(), CLRNNLayer::validate(), CLWinogradConvolutionLayer::validate(), CLGEMMDeconvolutionLayer::validate(), and CLLSTMLayer::validate().

728 {
729  // Get the GPU target
730  bool reinterpret_input_as_3d = gemm_info.reinterpret_input_as_3d();
731  const unsigned int m = reinterpret_input_as_3d ? (a->dimension(1) * a->dimension(2)) : a->dimension(1);
732  const unsigned int n = b->dimension(0);
733  const unsigned int k = a->dimension(0);
734  const unsigned int batch_size = reinterpret_input_as_3d ? a->dimension(3) : a->dimension(2);
735 
736  // Select GEMMType
737  CLGEMMKernelType gemm_kernel_type = auto_select_gemm_kernel(auto_heuristics::CommonQuery
738  {
739  CLScheduler::get().target(), a->data_type(), m, n, k, batch_size,
740  },
741  gemm_info.reshape_b_only_on_first_run());
742 
743  const bool fuse_add_c = (!(helpers::float_ops::is_zero(beta)) && c != nullptr);
744 
745  const ITensorInfo *c_to_use = fuse_add_c ? c : nullptr;
746 
747  switch(gemm_kernel_type)
748  {
749  case CLGEMMKernelType::NATIVE_V1:
750  {
751  ARM_COMPUTE_RETURN_ON_ERROR(validate_native_v1(a, b, c_to_use, output, alpha, beta, gemm_info));
752  break;
753  }
754  case CLGEMMKernelType::RESHAPED_V1:
755  {
756  ARM_COMPUTE_RETURN_ON_ERROR(validate_reshaped_v1(a, b, c_to_use, output, alpha, beta, gemm_info));
757  break;
758  }
759  case CLGEMMKernelType::RESHAPED:
760  {
761  ARM_COMPUTE_RETURN_ON_ERROR(validate_reshaped(a, b, c_to_use, output, alpha, beta, gemm_info));
762  break;
763  }
764  case CLGEMMKernelType::RESHAPED_ONLY_RHS:
765  {
766  ARM_COMPUTE_RETURN_ON_ERROR(validate_reshaped_only_rhs(a, b, c_to_use, output, alpha, beta, gemm_info));
767  break;
768  }
769  default:
770  {
771  ARM_COMPUTE_RETURN_ERROR_MSG("GEMMType not supported");
772  }
773  }
774 
775  return Status{};
776 }

The documentation for this class was generated from the following files:

CLGEMM.h
CLGEMM.cpp