Compute Library 19.08
GCGEMM Class Reference

Basic function to execute GEMM on OpenGLES Compute.

#include <GCGEMM.h>

[Collaboration diagram for GCGEMM omitted]

Public Member Functions

 GCGEMM (std::shared_ptr< IMemoryManager > memory_manager=nullptr)
 	Default constructor.

 GCGEMM (const GCGEMM &)=delete
 	Prevent instances of this class from being copied (as this class contains pointers).

 GCGEMM (GCGEMM &&)=default
 	Default move constructor.

GCGEMM & 	operator= (const GCGEMM &)=delete
 	Prevent instances of this class from being copied (as this class contains pointers).

GCGEMM & 	operator= (GCGEMM &&)=default
 	Default move assignment operator.

void 	configure (const IGCTensor *a, const IGCTensor *b, const IGCTensor *c, IGCTensor *output, float alpha, float beta, const GEMMInfo &gemm_info=GEMMInfo())
 	Initialise the kernel's inputs and output.

void 	run () override
 	Run the kernels contained in the function.

void 	prepare () override
 	Prepare the function for executing.

- Public Member Functions inherited from IFunction
virtual 	~IFunction ()=default
 	Destructor.

Static Public Member Functions

static Status 	validate (const ITensorInfo *a, const ITensorInfo *b, const IGCTensor *c, const ITensorInfo *output, const float alpha, const float beta, const GEMMInfo &gemm_info=GEMMInfo())
 	Static function to check if given info will lead to a valid configuration of GCGEMM.
 

Detailed Description

Basic function to execute GEMM on OpenGLES Compute.

This function calls the following kernels:

  1. GCGEMMInterleave4x4Kernel (if the output tensor is a matrix)
  2. GCGEMMTranspose1xWKernel (if the output tensor is a matrix)
  3. GCGEMMMatrixMultiplyKernel
  4. GCGEMMMatrixAdditionKernel (if c != nullptr and beta != 0.0)

Definition at line 48 of file GCGEMM.h.
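
To make the call sequence concrete, a minimal usage sketch follows. It is not taken from the library's documentation: the shapes are hypothetical, and the default_init() context setup and the map()/unmap() hints are assumptions based on the GLES Compute examples shipped with the library.

    #include "arm_compute/core/TensorInfo.h"
    #include "arm_compute/core/Types.h"
    #include "arm_compute/runtime/GLES_COMPUTE/GCScheduler.h"
    #include "arm_compute/runtime/GLES_COMPUTE/GCTensor.h"
    #include "arm_compute/runtime/GLES_COMPUTE/functions/GCGEMM.h"

    using namespace arm_compute;

    int main()
    {
        // Create the GLES compute context (assumption: as in the library's GLES examples)
        GCScheduler::get().default_init();

        // Hypothetical shapes: A is M x K, B is K x N, output is M x N.
        // TensorShape lists the innermost (width) dimension first, hence (K, M) for A.
        constexpr unsigned int M = 64, N = 32, K = 128;
        GCTensor a{}, b{}, dst{};
        a.allocator()->init(TensorInfo(TensorShape(K, M), 1, DataType::F32));
        b.allocator()->init(TensorInfo(TensorShape(N, K), 1, DataType::F32));
        dst.allocator()->init(TensorInfo(TensorShape(N, M), 1, DataType::F32));

        // dst = 1.0 * A * B; c is nullptr, so no matrix addition kernel is configured
        GCGEMM gemm;
        gemm.configure(&a, &b, nullptr, &dst, 1.0f, 0.0f);

        // Backing memory is allocated after configure(), as usual in Compute Library
        a.allocator()->allocate();
        b.allocator()->allocate();
        dst.allocator()->allocate();

        // ... fill a and b here, e.g. via a.map()/a.unmap() ...

        gemm.run();

        // ... read results back, e.g. via dst.map()/dst.unmap() ...
        return 0;
    }

With these shapes (M = 64 > 16), GCGEMM reshapes A and B through the interleave and transpose kernels before the matrix multiply; see the threshold in configure() below.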

Constructor & Destructor Documentation

◆ GCGEMM() [1/3]

GCGEMM ( std::shared_ptr< IMemoryManager > memory_manager = nullptr )

Default constructor.

Definition at line 75 of file GCGEMM.cpp.

76  : _memory_group(std::move(memory_manager)), _interleave_kernel(), _transpose_kernel(), _mm_kernel(), _ma_kernel(), _tmp_a(), _tmp_b(), _original_b(nullptr), _is_interleaved_transposed(false),
77  _run_addition(false), _reshape_b_only_on_first_run(false), _is_prepared(false)
78 {
79 }

◆ GCGEMM() [2/3]

GCGEMM ( const GCGEMM & )
delete

Prevent instances of this class from being copied (as this class contains pointers).

◆ GCGEMM() [3/3]

GCGEMM ( GCGEMM && )
default

Default move constructor.

Member Function Documentation

◆ configure()

void configure ( const IGCTensor *  a,
                 const IGCTensor *  b,
                 const IGCTensor *  c,
                 IGCTensor *  output,
                 float  alpha,
                 float  beta,
                 const GEMMInfo &  gemm_info = GEMMInfo()
               )

Initialise the kernel's inputs and output.

Note
    GEMM: General Matrix Multiply - [alpha * A * B + beta * C].
    All tensors must have the same data type.
    Whilst the first input tensor can be a vector, the second input tensor must be at least a matrix.

Parameters
    [in]  a          First input tensor (Matrix or Vector A). Data types supported: F32
    [in]  b          Second input tensor (Matrix B). Data type supported: same as a.
    [in]  c          Third input tensor (Matrix C). It can be a nullptr if just the multiplication between a and b is needed. Data type supported: same as a.
    [out] output     Output tensor. Data type supported: same as a.
    [in]  alpha      Weight of the matrix product.
    [in]  beta       Weight of matrix C.
    [in]  gemm_info  (Optional) Specifies if the matrix A and/or matrix B have been reshaped and if the reshape of matrix B should happen only for the first run.
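
For illustration, a hedged sketch of a call that uses the optional C term; the tensors a, b, c and dst are assumed to be already initialised GCTensor objects with compatible F32 shapes:

    // dst = 0.5 * (A * B) + 2.0 * C
    // Passing c = nullptr instead would skip the matrix addition kernel entirely
    GCGEMM gemm;
    gemm.configure(&a, &b, &c, &dst, 0.5f, 2.0f);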

Definition at line 81 of file GCGEMM.cpp.

82 {
83  ARM_COMPUTE_ERROR_ON_NULLPTR(a, b, output);
84 
85  // Perform validation step
86  ARM_COMPUTE_ERROR_THROW_ON(validate_arguments(a->info(), b->info(), c, output->info(), alpha, beta, gemm_info));
87 
88  // Check if we need to reshape the matrix B only on the first run
89  _reshape_b_only_on_first_run = gemm_info.reshape_b_only_on_first_run();
90  _is_prepared = false;
91  _original_b = b;
92 
93  const IGCTensor *matrix_a = a;
94  const IGCTensor *matrix_b = b;
95 
96  // Get the GPU target
97  const GPUTarget gpu_target = GCScheduler::get().get_target();
98 
99  // Set the target for the kernels
100  _interleave_kernel.set_target(gpu_target);
101  _mm_kernel.set_target(gpu_target);
102 
103  // Arguments used by GEMMReshapeInfo
104  // If we pass the matrix A and matrix B reshaped to GCGEMMMatrixMultiplyKernel, we need to pass m, n, k, mult_transpose1xW_width and mult_interleave4x4_height to GCGEMMReshapeInfo
105  // in order to know how the matrices have been reshaped
106  const int m = a->info()->dimension(1);
107  const int n = b->info()->dimension(0);
108  const int k = a->info()->dimension(0);
109  int mult_transpose1xW_width = 1;
110  int mult_interleave4x4_height = 1;
111 
112  // If the input tensor has less than 16 rows, we run a special version of GEMM without reshaping the input tensors
113  _is_interleaved_transposed = a->info()->dimension(1) > 16;
114 
115  if(_is_interleaved_transposed)
116  {
117  matrix_a = &_tmp_a;
118  matrix_b = &_tmp_b;
119 
120  // Manage intermediate buffers
121  _memory_group.manage(&_tmp_a);
122  if(!_reshape_b_only_on_first_run)
123  {
124  _memory_group.manage(&_tmp_b);
125  }
126  // _tmp_a and _tmp_b will be auto configured in _interleave_kernel and in _transpose_kernel
127 
128  // Configure interleave kernel
129  _interleave_kernel.configure(a, &_tmp_a);
130 
131  // Configure transpose kernel
132  _transpose_kernel.configure(b, &_tmp_b);
133  }
134 
135  _mm_kernel.configure(matrix_a, matrix_b, output, alpha, _is_interleaved_transposed, GEMMReshapeInfo(m, n, k, mult_transpose1xW_width, mult_interleave4x4_height));
136 
137  if(_is_interleaved_transposed)
138  {
139  // Allocate intermediate tensors
140  _tmp_a.allocator()->allocate();
141  if(!_reshape_b_only_on_first_run)
142  {
143  _tmp_b.allocator()->allocate();
144  }
145  }
146 
147  // Configure matrix addition kernel
148  if(beta != 0 && c != nullptr)
149  {
150  _ma_kernel.configure(c, output, beta);
151  _run_addition = true;
152  }
153 }

References ITensorAllocator::allocate(), GCTensor::allocator(), arm_compute::test::validation::alpha, ARM_COMPUTE_ERROR_ON_NULLPTR, ARM_COMPUTE_ERROR_THROW_ON, arm_compute::test::validation::b, GCGEMMMatrixAdditionKernel::configure(), GCGEMMTranspose1xWKernel::configure(), GCGEMMMatrixMultiplyKernel::configure(), GCGEMMInterleave4x4Kernel::configure(), ITensorInfo::dimension(), GCScheduler::get(), GCScheduler::get_target(), ITensor::info(), MemoryGroupBase< TensorType >::manage(), GEMMInfo::reshape_b_only_on_first_run(), and IGCKernel::set_target().

◆ operator=() [1/2]

GCGEMM& operator= ( const GCGEMM & )
delete

Prevent instances of this class from being copied (as this class contains pointers).

◆ operator=() [2/2]

GCGEMM& operator= ( GCGEMM && )
default

Default move assignment operator.

◆ prepare()

void prepare ( )
overridevirtual

Prepare the function for executing.

Any one-off pre-processing step required by the function is handled here.

Note
Prepare stage might not need all the function's buffers' backing memory to be available in order to execute

Reimplemented from IFunction.

Definition at line 192 of file GCGEMM.cpp.

193 {
194  if(!_is_prepared)
195  {
196  if(_is_interleaved_transposed && _reshape_b_only_on_first_run)
197  {
198  ARM_COMPUTE_ERROR_ON(!_original_b->is_used());
199 
200  // Run transpose kernel
201  _tmp_b.allocator()->allocate();
202  GCScheduler::get().dispatch(_transpose_kernel, false);
203  GCScheduler::get().memory_barrier();
204 
205  // Mark original weights tensor as unused
206  _original_b->mark_as_unused();
207  }
208 
209  _is_prepared = true;
210  }
211 }

References ITensorAllocator::allocate(), GCTensor::allocator(), ARM_COMPUTE_ERROR_ON, GCScheduler::dispatch(), GCScheduler::get(), ITensor::is_used(), ITensor::mark_as_unused(), and GCScheduler::memory_barrier().

Referenced by GCGEMM::run().
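
Since run() invokes prepare() itself on first use, calling it explicitly is optional; doing so moves the one-off transpose of B (when reshape_b_only_on_first_run is set in GEMMInfo) out of the timed path. A hypothetical sketch:

    // Front-load the one-off reshape of B before the measured loop
    gemm.prepare();
    for(unsigned int i = 0; i < num_iterations; ++i) // num_iterations is a placeholder
    {
        gemm.run(); // reuses the already-transposed B on every iteration
    }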

◆ run()

void run ( )
overridevirtual

Run the kernels contained in the function.

For NEON kernels:

  • Multi-threading is used for the kernels which are parallelisable.
  • By default std::thread::hardware_concurrency() threads are used.
Note
CPPScheduler::set_num_threads() can be used to manually set the number of threads

For OpenCL kernels:

  • All the kernels are enqueued on the queue associated with CLScheduler.
  • The queue is then flushed.
Note
The function will not block until the kernels are executed. It is the user's responsibility to wait.
Will call prepare() on first run if it hasn't been done

Implements IFunction.

Definition at line 161 of file GCGEMM.cpp.

162 {
163  prepare();
164 
165  MemoryGroupResourceScope scope_mg(_memory_group);
166 
167  if(_is_interleaved_transposed)
168  {
169  // Run interleave kernel
170  GCScheduler::get().dispatch(_interleave_kernel, false);
171 
172  if(!_reshape_b_only_on_first_run)
173  {
174  // Run transpose kernel
175  GCScheduler::get().dispatch(_transpose_kernel, false);
176  }
177 
178  GCScheduler::get().memory_barrier();
179  }
180 
181  // Run matrix multiply kernel
182  GCScheduler::get().dispatch(_mm_kernel, !_run_addition);
183 
184  // Run matrix addition kernel
185  if(_run_addition)
186  {
187  GCScheduler::get().memory_barrier();
188  GCScheduler::get().dispatch(_ma_kernel);
189  }
190 }

References GCScheduler::dispatch(), GCScheduler::get(), GCScheduler::memory_barrier(), and GCGEMM::prepare().

Referenced by GCConvolutionLayer::run().
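
The listing above also shows the dispatch/flush discipline of the GLES backend: intermediate kernels are dispatched with flush = false, a memory barrier orders their writes, and the last kernel flushes the queue. A hedged sketch of the same pattern for a hypothetical two-kernel function, using only the GCScheduler calls referenced above:

    // Hypothetical helper chaining two dependent kernels
    void run_two_kernels(IGCKernel &producer, IGCKernel &consumer)
    {
        // Queue the producer without flushing the GLES command stream yet
        GCScheduler::get().dispatch(producer, false);

        // Order memory transactions so the consumer sees the producer's writes
        GCScheduler::get().memory_barrier();

        // Dispatch the final kernel with flush = true (the default), submitting the batch
        GCScheduler::get().dispatch(consumer);
    }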

◆ validate()

Status validate ( const ITensorInfo *  a,
                  const ITensorInfo *  b,
                  const IGCTensor *  c,
                  const ITensorInfo *  output,
                  const float  alpha,
                  const float  beta,
                  const GEMMInfo &  gemm_info = GEMMInfo()
                )
static

Static function to check if given info will lead to a valid configuration of GCGEMM.

Parameters
    [in]  a          First input tensor (Matrix or Vector A). Data types supported: F16/F32
    [in]  b          Second input tensor (Matrix B). Data type supported: same as a.
    [in]  c          Third input tensor (Matrix C). It can be a nullptr if just the multiplication between a and b is needed. Data type supported: same as a.
    [out] output     Output tensor. Data type supported: same as a.
    [in]  alpha      Weight of the matrix product.
    [in]  beta       Weight of matrix C.
    [in]  gemm_info  (Optional) Specifies if the matrix A and/or matrix B have been reshaped and if the reshape of matrix B should happen only for the first run.

Returns
    a status

Definition at line 155 of file GCGEMM.cpp.

156 {
157  ARM_COMPUTE_RETURN_ON_ERROR(validate_arguments(a, b, c, output, alpha, beta, gemm_info));
158  return Status{};
159 }

References arm_compute::test::validation::alpha, ARM_COMPUTE_RETURN_ON_ERROR, and arm_compute::test::validation::b.
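
A typical use is to validate before configuring, so that an unsupported setup is reported as a Status rather than raising an error at configure time. A sketch (tensor infos assumed already set up; note that c is passed as an IGCTensor pointer, unlike the other arguments):

    // Check before configuring; error_description() explains any rejection
    const Status status = GCGEMM::validate(a.info(), b.info(), nullptr, dst.info(), 1.0f, 0.0f);
    ARM_COMPUTE_ERROR_THROW_ON(status); // or inspect status.error_description() manually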


The documentation for this class was generated from the following files:

  • GCGEMM.h
  • GCGEMM.cpp