Compute Library 21.02
CLFFTConvolutionLayer Class Reference

Basic function to execute FFT-based convolution on OpenCL. More...

#include <CLFFTConvolutionLayer.h>

Collaboration diagram for CLFFTConvolutionLayer:

Public Member Functions

 CLFFTConvolutionLayer (std::shared_ptr< IMemoryManager > memory_manager=nullptr)
 Default constructor. More...
 
 CLFFTConvolutionLayer (const CLFFTConvolutionLayer &)=delete
 Prevent instances of this class from being copied (As this class contains pointers) More...
 
 CLFFTConvolutionLayer (CLFFTConvolutionLayer &&)=default
 Default move constructor. More...
 
CLFFTConvolutionLayer & operator= (const CLFFTConvolutionLayer &)=delete
 Prevent instances of this class from being copied (As this class contains pointers) More...
 
CLFFTConvolutionLayer & operator= (CLFFTConvolutionLayer &&)=default
 Default move assignment operator. More...
 
void configure (ICLTensor *input, const ICLTensor *weights, const ICLTensor *biases, ICLTensor *output, const PadStrideInfo &conv_info, const ActivationLayerInfo &act_info=ActivationLayerInfo(), bool enable_fast_math=false)
 Set the input and output tensors. More...
 
void configure (const CLCompileContext &compile_context, ICLTensor *input, const ICLTensor *weights, const ICLTensor *biases, ICLTensor *output, const PadStrideInfo &conv_info, const ActivationLayerInfo &act_info=ActivationLayerInfo(), bool enable_fast_math=false)
 Set the input and output tensors. More...
 
void run () override
 Run the kernels contained in the function. More...
 
void prepare () override
 Prepare the function for executing. More...
 
- Public Member Functions inherited from IFunction
virtual ~IFunction ()=default
 Destructor. More...
 

Static Public Member Functions

static Status validate (const ITensorInfo *input, const ITensorInfo *weights, const ITensorInfo *biases, const ITensorInfo *output, const PadStrideInfo &conv_info, const ActivationLayerInfo &act_info=ActivationLayerInfo(), bool enable_fast_math=false)
 Static function to check if given info will lead to a valid configuration of CLFFTConvolutionLayer. More...
 

Detailed Description

Basic function to execute FFT-based convolution on OpenCL.

This function exploits the convolution theorem: convolution in the spatial domain becomes a pointwise complex product in the frequency domain. It calls the following OpenCL functions/kernels (a usage sketch follows the list):

  1. CLPermute Permute the input if it is in NHWC layout (the FFT path operates on NCHW).
  2. CLPadLayer Pad the input.
  3. CLFFT2D Forward transform to the frequency domain.
  4. CLComplexPixelWiseMultiplication Complex element-wise product of the input and the weights.
  5. CLReductionOperation Reduction across channels.
  6. CLFFT2D Inverse transform back to the spatial domain.
  7. CLStridedSlice Extract the valid output.
  8. CLArithmeticAddition Add the bias.
  9. CLActivationLayer Perform the activation.
  10. CLPermute Permute the output back to NHWC if needed.
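
A minimal caller-side sketch (illustrative, not part of the generated documentation): the tensor shapes, the 9x9 kernel and the padding of 4 are assumptions chosen to satisfy the constraints checked by validate() (square kernel, unit strides, padding of kernel_size / 2); F32 is used because F16 additionally requires enable_fast_math.

    #include "arm_compute/core/Types.h"
    #include "arm_compute/runtime/CL/CLScheduler.h"
    #include "arm_compute/runtime/CL/CLTensor.h"
    #include "arm_compute/runtime/CL/functions/CLFFTConvolutionLayer.h"

    using namespace arm_compute;

    void fft_conv_example()
    {
        CLScheduler::get().default_init(); // create the OpenCL context and queue

        // Assumed shapes: 64x64 F32 input with 3 IFM, 9x9 kernel, 16 OFM (NCHW)
        CLTensor src, weights, biases, dst;
        src.allocator()->init(TensorInfo(TensorShape(64U, 64U, 3U), 1, DataType::F32));
        weights.allocator()->init(TensorInfo(TensorShape(9U, 9U, 3U, 16U), 1, DataType::F32));
        biases.allocator()->init(TensorInfo(TensorShape(16U), 1, DataType::F32));
        dst.allocator()->init(TensorInfo(TensorShape(64U, 64U, 16U), 1, DataType::F32));

        // Unit strides; padding of kernel_size / 2 on each side, as validate() requires
        CLFFTConvolutionLayer conv;
        conv.configure(&src, &weights, &biases, &dst, PadStrideInfo(1, 1, 4, 4));

        src.allocator()->allocate();
        weights.allocator()->allocate();
        biases.allocator()->allocate();
        dst.allocator()->allocate();

        // ... fill src/weights/biases, e.g. via map()/unmap() ...

        conv.run();                // enqueues the kernels; does not block
        CLScheduler::get().sync(); // wait for the results before reading dst
    }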

Definition at line 59 of file CLFFTConvolutionLayer.h.

Constructor & Destructor Documentation

◆ CLFFTConvolutionLayer() [1/3]

CLFFTConvolutionLayer ( std::shared_ptr< IMemoryManager > memory_manager = nullptr )

Default constructor.

Definition at line 63 of file CLFFTConvolutionLayer.cpp.

64  : _memory_group(memory_manager),
65  _flip_weights_func(),
66  _permute_input_func(),
67  _permute_output_func(),
68  _permute_weights_func(),
69  _permute_bias_func(),
70  _pad_input_func(),
71  _pad_weights_func(),
72  _transform_input_func(memory_manager),
73  _transform_weights_func(),
74  _itransform_output_func(memory_manager),
75  _prod_func(),
76  _reduce_func(),
77  _extract_output_func(),
78  _bias_add_func(),
79  _activation_layer_func(),
80  _permuted_input(),
81  _permuted_weights(),
82  _permuted_bias(),
83  _permuted_output(),
84  _padded_input(),
85  _padded_weights(),
86  _flip_axis(),
87  _flipped_weights(),
88  _transformed_input(),
89  _transformed_weights(),
90  _input_weights_product(),
91  _output_product(),
92  _output_reduced(),
93  _itransformed_output(),
94  _reshaped_output(),
95  _bias_output(),
96  _original_weights(nullptr),
97  _original_bias(nullptr),
98  _is_activationlayer_enabled(false),
99  _needs_permute(false),
100  _has_bias(false),
101  _is_prepared(false)
102 {
103 }

◆ CLFFTConvolutionLayer() [2/3]

Prevent instances of this class from being copied (As this class contains pointers)

◆ CLFFTConvolutionLayer() [3/3]

Default move constructor.

Member Function Documentation

◆ configure() [1/2]

void configure ( ICLTensor * input,
const ICLTensor * weights,
const ICLTensor * biases,
ICLTensor * output,
const PadStrideInfo & conv_info,
const ActivationLayerInfo & act_info = ActivationLayerInfo(),
bool  enable_fast_math = false 
)

Set the input and output tensors.

Note
This function only works with square kernel sizes and unit strides, for both NCHW and NHWC data layouts.
Parameters
[in]  input             Source tensor. 3 lower dimensions represent a single input [width, height, IFM], while every optional dimension from 4 and above represents a batch of inputs. Data types supported: F16/F32.
[in]  weights           Weights tensor. Weights are a 4D tensor with dimensions [kernel_x, kernel_y, IFM, OFM]. Data type supported: Same as input.
[in]  biases            Biases tensor. Shared biases are supported. Biases are a 1D tensor with dimensions [OFM]. Data type supported: Same as input.
[out] output            Destination tensor. 3 lower dimensions represent a single output [width, height, OFM], while the rest represent a batch of outputs. Data types supported: Same as input.
[in]  conv_info         Contains padding and stride information, as described in PadStrideInfo.
[in]  act_info          (Optional) Activation layer information in case of a fused activation.
[in]  enable_fast_math  (Optional) Enable fast math computation. If this flag is set, the function may dispatch the fastest implementation available, which can introduce a drop in accuracy. Default is false.

Definition at line 105 of file CLFFTConvolutionLayer.cpp.

References CLKernelLibrary::get().

107 {
108  configure(CLKernelLibrary::get().get_compile_context(), input, weights, biases, output, conv_info, act_info, enable_fast_math);
109 }

◆ configure() [2/2]

void configure ( const CLCompileContext & compile_context,
ICLTensor * input,
const ICLTensor * weights,
const ICLTensor * biases,
ICLTensor * output,
const PadStrideInfo & conv_info,
const ActivationLayerInfo & act_info = ActivationLayerInfo(),
bool  enable_fast_math = false 
)

Set the input and output tensors.

Note
This function only works with square kernel sizes and unit strides, for both NCHW and NHWC data layouts.
Parameters
[in]  compile_context   The compile context to be used.
[in]  input             Source tensor. 3 lower dimensions represent a single input [width, height, IFM], while every optional dimension from 4 and above represents a batch of inputs. Data types supported: F16/F32.
[in]  weights           Weights tensor. Weights are a 4D tensor with dimensions [kernel_x, kernel_y, IFM, OFM]. Data type supported: Same as input.
[in]  biases            Biases tensor. Shared biases are supported. Biases are a 1D tensor with dimensions [OFM]. Data type supported: Same as input.
[out] output            Destination tensor. 3 lower dimensions represent a single output [width, height, OFM], while the rest represent a batch of outputs. Data types supported: Same as input.
[in]  conv_info         Contains padding and stride information, as described in PadStrideInfo.
[in]  act_info          (Optional) Activation layer information in case of a fused activation.
[in]  enable_fast_math  (Optional) Enable fast math computation. If this flag is set, the function may dispatch the fastest implementation available, which can introduce a drop in accuracy. Default is false.

Definition at line 111 of file CLFFTConvolutionLayer.cpp.

References CLTensorAllocator::allocate(), CLTensor::allocator(), ARM_COMPUTE_ERROR_THROW_ON, ARM_COMPUTE_UNUSED, arm_compute::auto_init_if_empty(), ICLTensor::buffer(), ICloneable< T >::clone(), TensorInfo::clone(), CLReverse::configure(), CLPermute::configure(), CLFFT2D::configure(), CLActivationLayer::configure(), CLPadLayer::configure(), CLReductionOperation::configure(), CLArithmeticAddition::configure(), CLSlice::configure(), CLComplexPixelWiseMultiplication::configure(), arm_compute::test::validation::conv_info, ITensorInfo::data_layout(), FFT2DInfo::direction, arm_compute::get_data_layout_dimension_index(), arm_compute::HEIGHT, arm_compute::test::validation::idx_height, arm_compute::test::validation::idx_width, ITensor::info(), CLTensor::info(), ITensorAllocator::init(), arm_compute::test::validation::input, arm_compute::Inverse, MemoryGroup::manage(), CLTensor::map(), arm_compute::NCHW, arm_compute::NHWC, PadStrideInfo::pad_bottom(), PadStrideInfo::pad_left(), PadStrideInfo::pad_right(), PadStrideInfo::pad_top(), TensorShape::remove_dimension(), TensorInfo::set_data_layout(), arm_compute::SUM, ITensorInfo::tensor_shape(), TensorInfo::tensor_shape(), arm_compute::U, arm_compute::U32, CLTensor::unmap(), CLFFTConvolutionLayer::validate(), arm_compute::WIDTH, arm_compute::WRAP, Dimensions< T >::x(), and Dimensions< T >::y().

113 {
114  ARM_COMPUTE_UNUSED(enable_fast_math);
115  ARM_COMPUTE_ERROR_THROW_ON(CLFFTConvolutionLayer::validate(input->info(), weights->info(), biases != nullptr ? biases->info() : nullptr, output->info(), conv_info, act_info, enable_fast_math));
116 
117  _original_weights = weights;
118  _original_bias = biases;
119 
120  // Flag if bias addition is required
121  _has_bias = biases != nullptr;
122 
123  // Get indices for the width and height
124  const size_t idx_width = get_data_layout_dimension_index(input->info()->data_layout(), DataLayoutDimension::WIDTH);
125  const size_t idx_height = get_data_layout_dimension_index(input->info()->data_layout(), DataLayoutDimension::HEIGHT);
126 
127  // Input shape, kernel size and output tile
128  const Size2D input_dims = Size2D(input->info()->tensor_shape()[idx_width], input->info()->tensor_shape()[idx_height]);
129  const Size2D kernel_size = Size2D(weights->info()->tensor_shape()[idx_width], weights->info()->tensor_shape()[idx_height]);
130  const Size2D pad_valid = Size2D(pad_decomposable(input_dims.x() + kernel_size.x() - 1),
131  pad_decomposable(input_dims.y() + kernel_size.y() - 1));
132  // Tensors to use
133  ICLTensor *input_to_use = input;
134  const ICLTensor *weights_to_use = weights;
135  ICLTensor *output_to_use = _has_bias ? &_bias_output : output;
136 
137  // Permute bias
138  if(biases != nullptr)
139  {
140  _permute_bias_func.configure(compile_context, biases, &_permuted_bias, PermutationVector(1U, 2U, 0U));
141  _permuted_bias.info()->set_data_layout(DataLayout::NCHW);
142  }
143 
144  // Permute input if needed
145  _needs_permute = input->info()->data_layout() == DataLayout::NHWC;
146  if(_needs_permute)
147  {
148  _memory_group.manage(&_permuted_input);
149  // Configure the function to transform the input tensor from NHWC -> NCHW
150  _permute_input_func.configure(compile_context, input, &_permuted_input, PermutationVector(1U, 2U, 0U));
151  _permuted_input.info()->set_data_layout(DataLayout::NCHW);
152 
153  // Configure the function to transform the weights tensor from HWI -> IHW
154  _permute_weights_func.configure(compile_context, weights, &_permuted_weights, PermutationVector(1U, 2U, 0U));
155  _permuted_weights.info()->set_data_layout(DataLayout::NCHW);
156 
157  input_to_use = &_permuted_input;
158  weights_to_use = &_permuted_weights;
159  }
160 
161  // Flip weights
162  _flipped_weights.allocator()->init(weights_to_use->info()->clone()->set_is_resizable(true).reset_padding());
163  _flip_axis.allocator()->init(TensorInfo(TensorShape(2U), 1, DataType::U32));
164  _flip_weights_func.configure(compile_context, weights_to_use, &_flipped_weights, &_flip_axis);
165 
166  // Pad weights
167  const PaddingList padding_w = { { 0, input_dims.x() + pad_valid.x() - 1 }, { 0, input_dims.y() + pad_valid.y() - 1 } };
168  _pad_weights_func.configure(compile_context, &_flipped_weights, &_padded_weights, padding_w);
169 
170  // Transform weights
171  _transform_weights_func = std::make_unique<CLFFT2D>();
172  _transform_weights_func->configure(compile_context, &_padded_weights, &_transformed_weights, FFT2DInfo());
173 
174  // Pad input
175  const PaddingList padding_in = { { 0, kernel_size.x() + pad_valid.x() - 1 }, { 0, kernel_size.y() + pad_valid.y() - 1 } };
176  _memory_group.manage(&_padded_input);
177  _pad_input_func.configure(compile_context, input_to_use, &_padded_input, padding_in);
178  if(_needs_permute)
179  {
180  _permuted_input.allocator()->allocate();
181  }
182 
183  // Transform input
184  _memory_group.manage(&_transformed_input);
185  _transform_input_func.configure(compile_context, &_padded_input, &_transformed_input, FFT2DInfo());
186  _padded_input.allocator()->allocate();
187 
188  // Perform product
189  _memory_group.manage(&_output_product);
190  _prod_func.configure(compile_context, &_transformed_input, &_transformed_weights, &_output_product);
191  _transformed_input.allocator()->allocate();
192 
193  // Perform reduction
194  _memory_group.manage(&_output_reduced);
195  _reduce_func.configure(compile_context, &_output_product, &_output_reduced, 2, ReductionOperation::SUM);
196  _output_product.allocator()->allocate();
197 
198  // Transform output
199  _memory_group.manage(&_itransformed_output);
200  FFT2DInfo itransform_info;
201  itransform_info.direction = FFTDirection::Inverse;
202  _itransformed_output.allocator()->init(_output_reduced.info()->clone()->set_is_resizable(true).set_num_channels(1).reset_padding());
203  _itransform_output_func.configure(compile_context, &_output_reduced, &_itransformed_output, itransform_info);
204  _output_reduced.allocator()->allocate();
205 
206  // Reshape output
207  TensorShape reshaped_shape = _itransformed_output.info()->tensor_shape();
208  reshaped_shape.remove_dimension(2);
209  _reshaped_output.allocator()->init(_itransformed_output.info()->clone()->set_tensor_shape(reshaped_shape));
210 
211  // Extract correct region
212  const int start_left = kernel_size.x() - conv_info.pad_left() - 1;
213  const int start_top = kernel_size.y() - conv_info.pad_top() - 1;
214  const int end_right = _reshaped_output.info()->tensor_shape().x() - (kernel_size.x() - conv_info.pad_right() - 1) - pad_valid.x();
215  const int end_bottom = _reshaped_output.info()->tensor_shape().y() - (kernel_size.y() - conv_info.pad_bottom() - 1) - pad_valid.y();
216  if(_has_bias)
217  {
218  _memory_group.manage(&_bias_output);
219  }
220  else if(_needs_permute)
221  {
222  output_to_use = &_permuted_output;
223  _memory_group.manage(&_permuted_output);
224  }
225  _extract_output_func.configure(compile_context, &_reshaped_output, output_to_use, Coordinates(start_left, start_top), Coordinates(end_right, end_bottom));
226  _itransformed_output.allocator()->allocate();
227 
228  // Add bias
229  if(biases != nullptr)
230  {
231  output_to_use = output;
232  if(_needs_permute)
233  {
234  output_to_use = &_permuted_output;
235  _memory_group.manage(&_permuted_output);
236  }
237  auto_init_if_empty(*output_to_use->info(), *_bias_output.info());
238  _bias_add_func.configure(compile_context, &_bias_output, &_permuted_bias, output_to_use, ConvertPolicy::WRAP);
239  _bias_output.allocator()->allocate();
240  }
241 
242  // Permute output
243  if(_needs_permute)
244  {
245  // Configure the function to transform the convolved output back to the original NHWC layout
246  _permuted_output.info()->set_data_layout(DataLayout::NCHW);
247  _permute_output_func.configure(compile_context, &_permuted_output, output, PermutationVector(2U, 0U, 1U));
248 
249  // Allocate tensors
250  _permuted_output.allocator()->allocate();
251  }
252 
253  // Configure Activation Layer
254  _is_activationlayer_enabled = act_info.enabled();
255  if(_is_activationlayer_enabled)
256  {
257  _activation_layer_func.configure(compile_context, output, nullptr, act_info);
258  }
259 
260  // Setup flip axis data
261  _flip_axis.allocator()->allocate();
262  _flip_axis.map(true);
263  auto axis_data = reinterpret_cast<uint32_t *>(_flip_axis.buffer());
264  axis_data[0] = 0;
265  axis_data[1] = 1;
266  _flip_axis.unmap();
267 }
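
Caller-side, this overload behaves like the parameter-less one, which (as shown in its definition above) simply fetches the default compile context. A sketch, reusing the tensor names from the example in the Detailed Description:

    conv.configure(CLKernelLibrary::get().get_compile_context(),
                   &src, &weights, &biases, &dst,
                   PadStrideInfo(1, 1, 4, 4));

Passing the context explicitly is useful when several functions should share one compile context, e.g. to reuse already-built kernels.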

◆ operator=() [1/2]

CLFFTConvolutionLayer& operator= ( const CLFFTConvolutionLayer & )
delete

Prevent instances of this class from being copied (As this class contains pointers)

◆ operator=() [2/2]

CLFFTConvolutionLayer& operator= ( CLFFTConvolutionLayer &&  )
default

Default move assignment operator.

◆ prepare()

void prepare ( )
overridevirtual

Prepare the function for executing.

Any one-off pre-processing step required by the function is handled here.

Note
Prepare stage might not need all the function's buffers' backing memory to be available in order to execute

Reimplemented from IFunction.

Definition at line 352 of file CLFFTConvolutionLayer.cpp.

References CLTensorAllocator::allocate(), CLTensor::allocator(), ARM_COMPUTE_ERROR_ON, CLTensorAllocator::free(), CLScheduler::get(), ITensor::is_used(), ITensor::mark_as_unused(), CLScheduler::queue(), ICLSimpleFunction::run(), CLPermute::run(), and CLPadLayer::run().

Referenced by CLFFTConvolutionLayer::run().

353 {
354  if(!_is_prepared)
355  {
356  // Permute bias to NCHW
357  if(_original_bias != nullptr)
358  {
359  _permuted_bias.allocator()->allocate();
360  _permute_bias_func.run();
361  _original_bias->mark_as_unused();
362  }
363 
364  const ICLTensor *cur_weights = _original_weights;
365  // Permute weights
366  if(_needs_permute)
367  {
368  ARM_COMPUTE_ERROR_ON(!cur_weights->is_used());
369 
370  _permuted_weights.allocator()->allocate();
371  _permute_weights_func.run();
372  cur_weights->mark_as_unused();
373  cur_weights = &_permuted_weights;
374  }
375 
376  // Flip weights
377  _flipped_weights.allocator()->allocate();
378  _flip_weights_func.run();
379  cur_weights->mark_as_unused();
380 
381  // Pad weights
382  _padded_weights.allocator()->allocate();
383  _pad_weights_func.run();
384  _flipped_weights.mark_as_unused();
385  CLScheduler::get().queue().finish();
386  _flipped_weights.allocator()->free();
387 
388  // Transform weights to frequency domain
389  _transformed_weights.allocator()->allocate();
390  _transform_weights_func->run();
391  _padded_weights.mark_as_unused();
392  CLScheduler::get().queue().finish();
393  // Delete object and release internal memory
394  _transform_weights_func.reset();
395  _padded_weights.allocator()->free();
396 
397  _is_prepared = true;
398  }
399 }
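
Because the weight processing (permute, flip, pad, forward FFT) happens only once, a caller that runs the function repeatedly can invoke prepare() explicitly to move that cost out of the first run(). A sketch, assuming conv was configured as in the earlier example and num_frames is caller-defined:

    conv.prepare(); // one-off: flip, pad and FFT-transform the weights
    for(int frame = 0; frame < num_frames; ++frame)
    {
        // ... update the contents of src ...
        conv.run(); // prepare() is a guarded no-op from here on
    }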

◆ run()

void run ( )
overridevirtual

Run the kernels contained in the function.

For Neon kernels:

  • Multi-threading is used for the kernels which are parallelisable.
  • By default std::thread::hardware_concurrency() threads are used.
Note
CPPScheduler::set_num_threads() can be used to manually set the number of threads

For OpenCL kernels:

  • All the kernels are enqueued on the queue associated with CLScheduler.
  • The queue is then flushed.
Note
The function will not block until the kernels are executed. It is the user's responsibility to wait.
prepare() will be called on the first run if it has not already been done.

Implements IFunction.

Definition at line 313 of file CLFFTConvolutionLayer.cpp.

References CLTensor::allocator(), CLTensor::cl_buffer(), CLTensorAllocator::import_memory(), CLFFTConvolutionLayer::prepare(), CLFFT2D::run(), CLPermute::run(), CLActivationLayer::run(), CLReductionOperation::run(), CLPadLayer::run(), CLArithmeticAddition::run(), CLSlice::run(), and CLComplexPixelWiseMultiplication::run().

314 {
315  prepare();
316 
317  MemoryGroupResourceScope scope_mg(_memory_group);
318 
319  // Transform input
320  if(_needs_permute)
321  {
322  _permute_input_func.run();
323  }
324  _pad_input_func.run();
325  _transform_input_func.run();
326 
327  // Perform operations to frequency domain
328  _prod_func.run();
329  _reduce_func.run();
330 
331  // Transform output
332  _itransform_output_func.run();
333  _reshaped_output.allocator()->import_memory(_itransformed_output.cl_buffer());
334  _extract_output_func.run();
335  // Add bias
336  if(_has_bias)
337  {
338  _bias_add_func.run();
339  }
340  if(_needs_permute)
341  {
342  _permute_output_func.run();
343  }
344 
345  // Run activation layer
346  if(_is_activationlayer_enabled)
347  {
348  _activation_layer_func.run();
349  }
350 }
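
Since run() only enqueues and flushes the kernels, a caller that needs the results on the host must synchronize explicitly. A sketch, continuing the earlier example:

    conv.run();                // enqueue and flush; returns immediately
    CLScheduler::get().sync(); // block until the enqueued kernels have finished

    dst.map(true);             // a blocking map also guarantees completion
    // ... read the output through dst.buffer() ...
    dst.unmap();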

◆ validate()

Status validate ( const ITensorInfo * input,
const ITensorInfo * weights,
const ITensorInfo * biases,
const ITensorInfo * output,
const PadStrideInfo & conv_info,
const ActivationLayerInfo & act_info = ActivationLayerInfo(),
bool  enable_fast_math = false 
)
static

Static function to check if given info will lead to a valid configuration of CLFFTConvolutionLayer.

Note
This function only works with square kernel sizes and unit strides, for both NCHW and NHWC data layouts.
Parameters
[in]  input             Source tensor info. 3 lower dimensions represent a single input [width, height, IFM], while every optional dimension from 4 and above represents a batch of inputs. Data types supported: F16/F32.
[in]  weights           Weights tensor info. Weights are a 4D tensor with dimensions [kernel_x, kernel_y, IFM, OFM]. Data type supported: Same as input.
[in]  biases            Biases tensor info. Shared biases are supported. Biases are a 1D tensor with dimensions [OFM]. Data type supported: Same as input.
[in]  output            Destination tensor info. 3 lower dimensions represent a single output [width, height, OFM], while the rest represent a batch of outputs. Data types supported: Same as input.
[in]  conv_info         Contains padding and stride information, as described in PadStrideInfo.
[in]  act_info          (Optional) Activation layer information in case of a fused activation.
[in]  enable_fast_math  (Optional) Enable fast math computation. If this flag is set, the function may dispatch the fastest implementation available, which can introduce a drop in accuracy. Default is false.
Returns
a status

Definition at line 269 of file CLFFTConvolutionLayer.cpp.

References ARM_COMPUTE_RETURN_ERROR_ON, ARM_COMPUTE_RETURN_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN, ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES, ARM_COMPUTE_RETURN_ON_ERROR, ITensorInfo::data_layout(), ITensorInfo::data_type(), ActivationLayerInfo::enabled(), arm_compute::F16, arm_compute::F32, arm_compute::get_data_layout_dimension_index(), arm_compute::HEIGHT, arm_compute::test::validation::idx_height, arm_compute::test::validation::idx_width, PadStrideInfo::pad_bottom(), PadStrideInfo::pad_left(), PadStrideInfo::pad_right(), PadStrideInfo::pad_top(), PadStrideInfo::stride(), ITensorInfo::tensor_shape(), ITensorInfo::total_size(), CLActivationLayer::validate(), arm_compute::WIDTH, and Dimensions< T >::x().

Referenced by CLFFTConvolutionLayer::configure(), CLConvolutionLayer::get_convolution_method(), and CLConvolutionLayer::validate().

271 {
272  ARM_COMPUTE_RETURN_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN(input, 1, DataType::F16, DataType::F32);
273  ARM_COMPUTE_RETURN_ERROR_ON((input->data_type() == DataType::F16) && !enable_fast_math);
274  ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES(input, weights);
275 
276  // Get indices for the width and height
277  const size_t idx_width = get_data_layout_dimension_index(input->data_layout(), DataLayoutDimension::WIDTH);
278  const size_t idx_height = get_data_layout_dimension_index(input->data_layout(), DataLayoutDimension::HEIGHT);
279 
280  // Input shape, kernel size and output tile
281  const Size2D kernel_size = Size2D(weights->tensor_shape()[idx_width], weights->tensor_shape()[idx_height]);
282 
283  // Strides
284  const auto strides = conv_info.stride();
285  ARM_COMPUTE_RETURN_ERROR_ON(strides.first != strides.second && strides.first != 1);
286  ARM_COMPUTE_RETURN_ERROR_ON(kernel_size.x() != kernel_size.y());
287  ARM_COMPUTE_RETURN_ERROR_ON(conv_info.pad_left() != (kernel_size.x() / 2) || conv_info.pad_right() != (kernel_size.x() / 2));
288  ARM_COMPUTE_RETURN_ERROR_ON(conv_info.pad_top() != (kernel_size.y() / 2) || conv_info.pad_bottom() != (kernel_size.y() / 2));
289 
290  // Validate biases
291  if(biases != nullptr)
292  {
293  ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES(input, biases);
294  ARM_COMPUTE_RETURN_ERROR_ON(weights->tensor_shape()[3] != biases->tensor_shape().x());
295  }
296 
297  // Checks performed when output is configured
298  if((output != nullptr) && (output->total_size() != 0))
299  {
300  ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES(input, output);
301  ARM_COMPUTE_RETURN_ERROR_ON((input->tensor_shape()[idx_height] != output->tensor_shape()[idx_height]) || (input->tensor_shape()[idx_width] != output->tensor_shape()[idx_width]));
302 
303  // Validate Activation Layer
304  if(act_info.enabled())
305  {
306  ARM_COMPUTE_RETURN_ON_ERROR(CLActivationLayer::validate(output, nullptr, act_info));
307  }
308  }
309 
310  return Status{};
311 }
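
A caller-side sketch of the validate-before-configure pattern, using the same illustrative shapes as the earlier example (plus <iostream> for the diagnostic). validate() works on ITensorInfo only, so no OpenCL buffers are needed:

    const TensorInfo src(TensorShape(64U, 64U, 3U), 1, DataType::F32);
    const TensorInfo weights(TensorShape(9U, 9U, 3U, 16U), 1, DataType::F32);
    const TensorInfo biases(TensorShape(16U), 1, DataType::F32);
    const TensorInfo dst(TensorShape(64U, 64U, 16U), 1, DataType::F32);

    const Status status = CLFFTConvolutionLayer::validate(&src, &weights, &biases, &dst,
                                                          PadStrideInfo(1, 1, 4, 4));
    if(!bool(status))
    {
        std::cerr << "FFT convolution not applicable: " << status.error_description() << "\n";
    }

This is the same check that configure() runs internally via ARM_COMPUTE_ERROR_THROW_ON, and the one CLConvolutionLayer::get_convolution_method() uses to decide whether the FFT path is applicable.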

The documentation for this class was generated from the following files:

  • CLFFTConvolutionLayer.h
  • CLFFTConvolutionLayer.cpp