Compute Library
 19.08
CLWinogradConvolutionLayer Class Reference

Basic function to execute Winograd-based convolution on OpenCL. More...

#include <CLWinogradConvolutionLayer.h>

Collaboration diagram for CLWinogradConvolutionLayer:

Public Member Functions

 CLWinogradConvolutionLayer (std::shared_ptr< IMemoryManager > memory_manager=nullptr)
 Default constructor. More...
 
 CLWinogradConvolutionLayer (const CLWinogradConvolutionLayer &)=delete
 Prevent instances of this class from being copied (As this class contains pointers) More...
 
 CLWinogradConvolutionLayer (CLWinogradConvolutionLayer &&)=default
 Default move constructor. More...
 
CLWinogradConvolutionLayer & operator= (const CLWinogradConvolutionLayer &)=delete
 Prevent instances of this class from being copied (As this class contains pointers) More...
 
CLWinogradConvolutionLayer & operator= (CLWinogradConvolutionLayer &&)=default
 Default move assignment operator. More...
 
void configure (ICLTensor *input, const ICLTensor *weights, const ICLTensor *biases, ICLTensor *output, const PadStrideInfo &conv_info, const ActivationLayerInfo &act_info=ActivationLayerInfo(), bool enable_fast_math=false)
 Set the input and output tensors. More...
 
void run () override
 Run the kernels contained in the function. More...
 
void prepare () override
 Prepare the function for executing. More...
 
- Public Member Functions inherited from IFunction
virtual ~IFunction ()=default
 Destructor. More...
 

Static Public Member Functions

static Status validate (const ITensorInfo *input, const ITensorInfo *weights, const ITensorInfo *biases, const ITensorInfo *output, const PadStrideInfo &conv_info, const ActivationLayerInfo &act_info=ActivationLayerInfo(), bool enable_fast_math=false)
 Static function to check if given info will lead to a valid configuration of CLWinogradConvolutionLayer. More...
 

Detailed Description

Basic function to execute Winograd-based convolution on OpenCL.

This function calls the following OpenCL functions/kernels:

  1. CLWinogradInputTransform
  2. CLWinogradFilterTransformKernel (only once)
  3. CLGEMM
  4. CLWinogradOutputTransformKernel

Definition at line 46 of file CLWinogradConvolutionLayer.h.
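A minimal usage sketch of the pipeline above (tensor shapes, padding values and error handling are illustrative assumptions, not taken from this page; building it requires the Compute Library headers and an OpenCL-capable device):

```cpp
#include "arm_compute/core/Error.h"
#include "arm_compute/core/Types.h"
#include "arm_compute/runtime/CL/CLScheduler.h"
#include "arm_compute/runtime/CL/CLTensor.h"
#include "arm_compute/runtime/CL/functions/CLWinogradConvolutionLayer.h"

using namespace arm_compute;

int main()
{
    CLScheduler::get().default_init();

    // Hypothetical NCHW shapes: 64x64 F32 input with 32 IFM, 3x3 kernel, 64 OFM.
    CLTensor input, weights, biases, output;
    input.allocator()->init(TensorInfo(TensorShape(64U, 64U, 32U), 1, DataType::F32));
    weights.allocator()->init(TensorInfo(TensorShape(3U, 3U, 32U, 64U), 1, DataType::F32));
    biases.allocator()->init(TensorInfo(TensorShape(64U), 1, DataType::F32));
    output.allocator()->init(TensorInfo(TensorShape(64U, 64U, 64U), 1, DataType::F32));

    // Unit strides and 1-pixel padding keep the 3x3 convolution output "same" sized.
    const PadStrideInfo conv_info(1, 1, 1, 1);

    // Check the configuration before committing any resources.
    ARM_COMPUTE_ERROR_THROW_ON(CLWinogradConvolutionLayer::validate(
        input.info(), weights.info(), biases.info(), output.info(), conv_info));

    CLWinogradConvolutionLayer conv;
    conv.configure(&input, &weights, &biases, &output, conv_info);

    // Allocate backing memory; filling the tensors with data is omitted here.
    input.allocator()->allocate();
    weights.allocator()->allocate();
    biases.allocator()->allocate();
    output.allocator()->allocate();

    conv.run();                // the first call also triggers prepare()
    CLScheduler::get().sync(); // run() does not block; wait explicitly
    return 0;
}
```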

Constructor & Destructor Documentation

◆ CLWinogradConvolutionLayer() [1/3]

CLWinogradConvolutionLayer ( std::shared_ptr< IMemoryManager > memory_manager = nullptr)

Default constructor.

Definition at line 92 of file CLWinogradConvolutionLayer.cpp.

93  : _memory_group(memory_manager), _batched_mm(memory_manager), _input_transform(), _filter_transform(), _output_transform(), _input0(), _input1(), _batched_mm_output(), _original_weights(nullptr),
94  _is_prepared(false)
95 {
96 }

◆ CLWinogradConvolutionLayer() [2/3]

Prevent instances of this class from being copied (As this class contains pointers)

◆ CLWinogradConvolutionLayer() [3/3]

Default move constructor.

Member Function Documentation

◆ configure()

void configure ( ICLTensor * input,
const ICLTensor * weights,
const ICLTensor * biases,
ICLTensor * output,
const PadStrideInfo & conv_info,
const ActivationLayerInfo & act_info = ActivationLayerInfo(),
bool  enable_fast_math = false 
)

Set the input and output tensors.

Note
This function only works with 3x3, 3x1, 1x3, 5x5, 5x1, 1x5, 7x1 and 1x7 kernels, along with unit strides, for both NCHW and NHWC data layouts.
Some Winograd configurations (e.g. F(4x4, 5x5)) are supported only with enable_fast_math = true.
Parameters
[in]  input             Source tensor. 3 lower dimensions represent a single input [width, height, IFM], while every optional dimension from 4 and above represents a batch of inputs. Data types supported: F16/F32.
[in]  weights           Weights tensor. Weights are a 4D tensor with dimensions [kernel_x, kernel_y, IFM, OFM]. Data type supported: Same as input.
[in]  biases            Biases tensor. Shared biases are supported. Biases are a 1D tensor with dimensions [OFM]. Data type supported: Same as input.
[out] output            Destination tensor. 3 lower dimensions represent a single output [width, height, OFM], while the rest represent a batch of outputs. Data types supported: Same as input.
[in]  conv_info         Contains padding and stride information described in PadStrideInfo.
[in]  act_info          (Optional) Activation layer information in case of a fused activation.
[in]  enable_fast_math  (Optional) Enable fast math computation. If this flag is set, the function may dispatch the fastest implementation available, which can also introduce a drop in accuracy. Default is false.
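The fast-math gate described in the note can be sketched standalone (a mimic, not library code: the full table of fast-math-only configurations lives in the internal check_support_fast_math helper, and only F(4x4, 5x5) and the FP16 restriction are confirmed on this page):

```cpp
struct Size2D { unsigned x; unsigned y; };

// True when the configuration may only be dispatched with enable_fast_math = true.
// Partial table: only the cases confirmed by the documentation are listed.
bool requires_fast_math(bool is_fp16, Size2D output_tile, Size2D kernel_size)
{
    if(is_fp16)
    {
        return true; // FP16 Winograd is disabled unless fast math is enabled
    }
    // F(4x4, 5x5): 4x4 output tile with a 5x5 kernel
    return output_tile.x == 4 && output_tile.y == 4
        && kernel_size.x == 5 && kernel_size.y == 5;
}
```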

Definition at line 98 of file CLWinogradConvolutionLayer.cpp.

100 {
101  // Get indices for the width and height
102  const size_t idx_width = get_data_layout_dimension_index(input->info()->data_layout(), DataLayoutDimension::WIDTH);
103  const size_t idx_height = get_data_layout_dimension_index(input->info()->data_layout(), DataLayoutDimension::HEIGHT);
104 
105  // Input shape, kernel size and output tile
106  const Size2D input_dims = Size2D(input->info()->tensor_shape()[idx_width], input->info()->tensor_shape()[idx_height]);
107  const Size2D kernel_size = Size2D(weights->info()->tensor_shape()[idx_width], weights->info()->tensor_shape()[idx_height]);
108  const Size2D output_tile = winograd_output_tile(input_dims, kernel_size, input->info()->data_layout());
109 
110  // Check if the Winograd configuration requires fast math
111  if(!enable_fast_math)
112  {
113  ARM_COMPUTE_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN(input, 1, DataType::F32); //disable winograd for fp16 if fast math is false.
114  ARM_COMPUTE_ERROR_ON_MSG(check_support_fast_math(output_tile, kernel_size), "This Winograd configuration requires enable_fast_math=true");
115  }
116  const WinogradInfo winograd_info = WinogradInfo(output_tile,
117  kernel_size,
118  input_dims,
119  conv_info,
120  input->info()->data_layout());
121 
122  _is_prepared = false;
123  _original_weights = weights;
124 
125  // Manage intermediate tensors
126  _memory_group.manage(&_input0);
127  _memory_group.manage(&_batched_mm_output);
128 
129  // Do not manage _input1 as it contains the weights
130 
131  // Configure input transform
132  _input_transform.configure(input, &_input0, winograd_info);
133 
134  // Configure filter transform
135  _filter_transform.configure(weights, &_input1, winograd_info);
136 
137  // Configure batched matrix multiply
138  _batched_mm.configure(&_input0, &_input1, nullptr, &_batched_mm_output, 1.0f, 0.0f, GEMMInfo(false, false, true /* Reshape weights only for the first run*/, 0, false, false, GEMMLowpOutputStageInfo(),
139  (input->info()->data_type() == DataType::F16)));
140 
141  // Configure output transform
142  _output_transform.configure(&_batched_mm_output, biases, output, winograd_info, act_info);
143 
144  // Allocate temporary tensors
145  _input0.allocator()->allocate();
146  _batched_mm_output.allocator()->allocate();
147 }

References arm_compute::test::validation::act_info, CLTensorAllocator::allocate(), CLTensor::allocator(), ARM_COMPUTE_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN, ARM_COMPUTE_ERROR_ON_MSG, CLWinogradInputTransform::configure(), CLWinogradFilterTransformKernel::configure(), CLWinogradOutputTransformKernel::configure(), CLGEMM::configure(), arm_compute::test::validation::conv_info, ITensorInfo::data_layout(), ITensorInfo::data_type(), arm_compute::F16, arm_compute::F32, arm_compute::get_data_layout_dimension_index(), arm_compute::HEIGHT, ITensor::info(), CLTensor::info(), MemoryGroupBase< TensorType >::manage(), ITensorInfo::tensor_shape(), TensorInfo::tensor_shape(), arm_compute::test::validation::weights, arm_compute::WIDTH, and arm_compute::test::validation::winograd_info.
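The first two lines of configure() look up where width and height live in the dimension-ordered tensor shape. A standalone mimic of that lookup (not the library's implementation; it reflects the convention that dimension 0 is the innermost axis, so NCHW stores width there while NHWC stores channels there):

```cpp
#include <cstddef>

enum class DataLayout { NCHW, NHWC };
enum class DataLayoutDimension { WIDTH, HEIGHT, CHANNEL };

// Map a semantic dimension to its index in the dimension-ordered shape.
std::size_t dimension_index(DataLayout layout, DataLayoutDimension dim)
{
    if(layout == DataLayout::NCHW)
    {
        switch(dim)
        {
            case DataLayoutDimension::WIDTH:  return 0;
            case DataLayoutDimension::HEIGHT: return 1;
            default:                          return 2; // CHANNEL
        }
    }
    switch(dim) // NHWC: channels are innermost
    {
        case DataLayoutDimension::CHANNEL: return 0;
        case DataLayoutDimension::WIDTH:   return 1;
        default:                           return 2; // HEIGHT
    }
}
```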

◆ operator=() [1/2]

CLWinogradConvolutionLayer& operator= ( const CLWinogradConvolutionLayer & )
delete

Prevent instances of this class from being copied (As this class contains pointers)

◆ operator=() [2/2]

Default move assignment operator.

◆ prepare()

void prepare ( )
override virtual

Prepare the function for executing.

Any one-off pre-processing step required by the function is handled here.

Note
Prepare stage might not need all the function's buffers' backing memory to be available in order to execute

Reimplemented from IFunction.

Definition at line 216 of file CLWinogradConvolutionLayer.cpp.

217 {
218  if(!_is_prepared)
219  {
220  // Run filter transform and mark original weights as unused
221  _input1.allocator()->allocate();
222  CLScheduler::get().enqueue(_filter_transform, false);
223  _original_weights->mark_as_unused();
224 
225  // Prepare GEMM and release reshaped weights if marked unused by CLGEMM
226  _batched_mm.prepare();
227  if(!_input1.is_used())
228  {
229  _input1.allocator()->free();
230  }
231 
232  CLScheduler::get().queue().finish();
233  _is_prepared = true;
234  }
235 }

References CLTensorAllocator::allocate(), CLTensor::allocator(), CLScheduler::enqueue(), CLTensorAllocator::free(), CLScheduler::get(), ITensor::is_used(), ITensor::mark_as_unused(), CLGEMM::prepare(), and CLScheduler::queue().

Referenced by CLWinogradConvolutionLayer::run().
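The guard on _is_prepared makes the expensive filter transform a one-off cost paid on the first execution. Reduced to a standalone sketch (none of these names are library types):

```cpp
// Hypothetical stand-in for the one-off prepare() pattern shown above.
struct OneOffPrepare
{
    bool prepared       = false;
    int  transforms_run = 0;

    void prepare()
    {
        if(!prepared)
        {
            ++transforms_run; // stands in for enqueueing the filter transform
            prepared = true;  // the original weights could now be marked unused
        }
    }

    void run()
    {
        prepare(); // only the first run pays the transform cost
        // ... input transform, batched GEMM, output transform ...
    }
};
```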

◆ run()

void run ( )
override virtual

Run the kernels contained in the function.

For NEON kernels:

  • Multi-threading is used for the kernels which are parallelisable.
  • By default std::thread::hardware_concurrency() threads are used.
Note
CPPScheduler::set_num_threads() can be used to manually set the number of threads

For OpenCL kernels:

  • All the kernels are enqueued on the queue associated with CLScheduler.
  • The queue is then flushed.
Note
The function will not block until the kernels are executed. It is the user's responsibility to wait.
prepare() will be called on the first run if it has not already been done.

Implements IFunction.

Definition at line 200 of file CLWinogradConvolutionLayer.cpp.

201 {
202  prepare();
203 
204  MemoryGroupResourceScope scope_mg(_memory_group);
205 
206  // Run input transform
207  _input_transform.run();
208 
209  // Run batched matrix multiplication
210  _batched_mm.run();
211 
212  // Run output transform
213  CLScheduler::get().enqueue(_output_transform);
214 }

References CLScheduler::enqueue(), CLScheduler::get(), CLWinogradConvolutionLayer::prepare(), ICLSimpleFunction::run(), and CLGEMM::run().
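run() wraps its body in a MemoryGroupResourceScope so that group-managed scratch memory (here _input0 and _batched_mm_output) is acquired for the duration of the call and released on exit, even on early return. A standalone sketch of that RAII shape (hypothetical types, not the library's):

```cpp
// Hypothetical stand-ins for a memory group and its RAII scope guard.
struct ScratchGroup
{
    bool acquired = false;
    void acquire() { acquired = true; }  // map backing memory for managed tensors
    void release() { acquired = false; } // return it to the pool
};

struct ScratchScope
{
    ScratchGroup &group;
    explicit ScratchScope(ScratchGroup &g) : group(g) { group.acquire(); }
    ~ScratchScope() { group.release(); } // released automatically on scope exit
};
```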

◆ validate()

Status validate ( const ITensorInfo * input,
const ITensorInfo * weights,
const ITensorInfo * biases,
const ITensorInfo * output,
const PadStrideInfo & conv_info,
const ActivationLayerInfo & act_info = ActivationLayerInfo(),
bool  enable_fast_math = false 
)
static

Static function to check if given info will lead to a valid configuration of CLWinogradConvolutionLayer.

Note
This function only works with 3x3, 3x1, 1x3, 5x5, 5x1 and 1x5 kernels, along with unit strides, for both NCHW and NHWC data layouts.
Some Winograd configurations (e.g. F(4x4, 5x5)) are supported only with enable_fast_math = true.
Parameters
[in]  input             Source tensor. 3 lower dimensions represent a single input [width, height, IFM], while every optional dimension from 4 and above represents a batch of inputs. Data types supported: F16/F32.
[in]  weights           Weights tensor. Weights are a 4D tensor with dimensions [kernel_x, kernel_y, IFM, OFM]. Data type supported: Same as input.
[in]  biases            Biases tensor. Shared biases are supported. Biases are a 1D tensor with dimensions [OFM]. Data type supported: Same as input.
[out] output            Destination tensor. 3 lower dimensions represent a single output [width, height, OFM], while the rest represent a batch of outputs. Data types supported: Same as input.
[in]  conv_info         Contains padding and stride information described in PadStrideInfo.
[in]  act_info          (Optional) Activation layer information in case of a fused activation.
[in]  enable_fast_math  (Optional) Enable fast math computation. If this flag is set, the function may dispatch the fastest implementation available, which can also introduce a drop in accuracy. Default is false.
Returns
a status

Definition at line 149 of file CLWinogradConvolutionLayer.cpp.

151 {
152  // Get indices for the width and height
153  const size_t idx_width = get_data_layout_dimension_index(input->data_layout(), DataLayoutDimension::WIDTH);
154  const size_t idx_height = get_data_layout_dimension_index(input->data_layout(), DataLayoutDimension::HEIGHT);
155 
156  // Input shape, kernel size and output tile
157  const Size2D input_dims = Size2D(input->tensor_shape()[idx_width], input->tensor_shape()[idx_height]);
158  const Size2D kernel_size = Size2D(weights->tensor_shape()[idx_width], weights->tensor_shape()[idx_height]);
159  const Size2D output_tile = winograd_output_tile(input_dims, kernel_size, input->data_layout());
160 
161  ARM_COMPUTE_RETURN_ERROR_ON_MSG(((conv_info.pad_left() > (kernel_size.x() / 2u)) || (conv_info.pad_right() > (kernel_size.x() / 2u))), "Winograd only supports padding up to half kernel size");
162  ARM_COMPUTE_RETURN_ERROR_ON_MSG(((conv_info.pad_top() > (kernel_size.y() / 2u)) || (conv_info.pad_bottom() > (kernel_size.y() / 2u))), "Winograd only supports padding up to half kernel size");
163 
164  // Check if the Winograd configuration requires fast math
165  if(!enable_fast_math)
166  {
167  ARM_COMPUTE_RETURN_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN(input, 1, DataType::F32); //disable winograd for fp16 if fast math is false.
168  ARM_COMPUTE_RETURN_ERROR_ON_MSG(check_support_fast_math(output_tile, kernel_size), "This Winograd configuration requires enable_fast_math=true");
169  }
170 
171  const WinogradInfo winograd_info = WinogradInfo(output_tile,
172  kernel_size,
173  input_dims,
174  conv_info,
175  input->data_layout());
176 
177  // Validate input transform
178  const TensorShape input0_shape = compute_winograd_input_transform_shape(*input, winograd_info);
179  const TensorInfo input0 = input->clone()->set_tensor_shape(input0_shape);
180  ARM_COMPUTE_RETURN_ON_ERROR(CLWinogradInputTransform::validate(input, &input0, winograd_info));
181 
182  // Validate filter transform
183  const TensorShape input1_shape = compute_winograd_filter_transform_shape(*weights, winograd_info);
184  const TensorInfo input1 = weights->clone()->set_tensor_shape(input1_shape);
185  ARM_COMPUTE_RETURN_ON_ERROR(CLWinogradFilterTransformKernel::validate(weights, &input1, winograd_info));
186 
187  // Validate batched matrix multiply
188  TensorShape batched_mm_output_shape = input0.tensor_shape();
189  batched_mm_output_shape[0] = input1.tensor_shape()[0];
190  const TensorInfo batched_mm_output = input0.clone()->set_tensor_shape(batched_mm_output_shape);
191  ARM_COMPUTE_RETURN_ON_ERROR(CLGEMM::validate(&input0, &input1, nullptr, &batched_mm_output, 1.0f, 0.0f, GEMMInfo(false, false, true /* Reshape weights only for the first run*/, 0, false, false,
192  GEMMLowpOutputStageInfo(), (input->data_type() == DataType::F16))));
193 
194  // Validate output transform
195  ARM_COMPUTE_RETURN_ON_ERROR(CLWinogradOutputTransformKernel::validate(&batched_mm_output, biases, output, winograd_info, act_info));
196 
197  return Status{};
198 }

References arm_compute::test::validation::act_info, ARM_COMPUTE_RETURN_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN, ARM_COMPUTE_RETURN_ERROR_ON_MSG, ARM_COMPUTE_RETURN_ON_ERROR, ICloneable< T >::clone(), TensorInfo::clone(), arm_compute::misc::shape_calculator::compute_winograd_filter_transform_shape(), arm_compute::misc::shape_calculator::compute_winograd_input_transform_shape(), arm_compute::test::validation::conv_info, ITensorInfo::data_layout(), ITensorInfo::data_type(), arm_compute::F16, arm_compute::F32, arm_compute::get_data_layout_dimension_index(), arm_compute::HEIGHT, ITensorInfo::tensor_shape(), TensorInfo::tensor_shape(), CLWinogradInputTransform::validate(), CLWinogradFilterTransformKernel::validate(), CLWinogradOutputTransformKernel::validate(), CLGEMM::validate(), arm_compute::test::validation::weights, arm_compute::WIDTH, and arm_compute::test::validation::winograd_info.

Referenced by CLConvolutionLayer::get_convolution_method(), and CLConvolutionLayer::validate().
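The padding checks in validate() above enforce that each side's padding never exceeds half the kernel size in that direction. That constraint can be checked standalone (a mimic, not library code):

```cpp
#include <cstddef>

// Winograd here supports padding only up to half the kernel size per side.
bool winograd_padding_supported(std::size_t kernel_w, std::size_t kernel_h,
                                std::size_t pad_left, std::size_t pad_right,
                                std::size_t pad_top, std::size_t pad_bottom)
{
    return pad_left <= kernel_w / 2 && pad_right <= kernel_w / 2
        && pad_top <= kernel_h / 2 && pad_bottom <= kernel_h / 2;
}
```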


The documentation for this class was generated from the following files:

  • CLWinogradConvolutionLayer.h
  • CLWinogradConvolutionLayer.cpp