Compute Library 21.11
CLFFTConvolutionLayer Class Reference

Basic function to execute FFT-based convolution on OpenCL. More...

#include <CLFFTConvolutionLayer.h>

Collaboration diagram for CLFFTConvolutionLayer

Public Member Functions

 CLFFTConvolutionLayer (std::shared_ptr< IMemoryManager > memory_manager=nullptr)
 Default constructor. More...
 
 CLFFTConvolutionLayer (const CLFFTConvolutionLayer &)=delete
 Prevent instances of this class from being copied (As this class contains pointers) More...
 
 CLFFTConvolutionLayer (CLFFTConvolutionLayer &&)=default
 Default move constructor. More...
 
CLFFTConvolutionLayer & operator= (const CLFFTConvolutionLayer &)=delete
 Prevent instances of this class from being copied (As this class contains pointers) More...
 
CLFFTConvolutionLayer & operator= (CLFFTConvolutionLayer &&)=default
 Default move assignment operator. More...
 
void configure (ICLTensor *input, const ICLTensor *weights, const ICLTensor *biases, ICLTensor *output, const PadStrideInfo &conv_info, const ActivationLayerInfo &act_info=ActivationLayerInfo(), bool enable_fast_math=false)
 Set the input and output tensors. More...
 
void configure (const CLCompileContext &compile_context, ICLTensor *input, const ICLTensor *weights, const ICLTensor *biases, ICLTensor *output, const PadStrideInfo &conv_info, const ActivationLayerInfo &act_info=ActivationLayerInfo(), bool enable_fast_math=false)
 Set the input and output tensors. More...
 
void run () override
 Run the kernels contained in the function. More...
 
void prepare () override
 Prepare the function for executing. More...
 
- Public Member Functions inherited from IFunction
virtual ~IFunction ()=default
 Destructor. More...
 

Static Public Member Functions

static Status validate (const ITensorInfo *input, const ITensorInfo *weights, const ITensorInfo *biases, const ITensorInfo *output, const PadStrideInfo &conv_info, const ActivationLayerInfo &act_info=ActivationLayerInfo(), bool enable_fast_math=false)
 Static function to check if given info will lead to a valid configuration of CLFFTConvolutionLayer. More...
 

Detailed Description

Basic function to execute FFT-based convolution on OpenCL.

This function calls the following OpenCL functions/kernels:

  1. CLPermute Permute the input if it is in NHWC layout (only NCHW is supported internally).
  2. CLPadLayer Pad the input.
  3. CLFFT2D Forward transform to the frequency domain.
  4. CLComplexPixelWiseMultiplication Complex element-wise product of the input and the weights.
  5. CLReductionOperation Reduction across channels.
  6. CLFFT2D Inverse transform back to the time domain.
  7. CLStridedSlice Extract the valid output.
  8. CLArithmeticAddition Add the bias.
  9. CLActivationLayer Perform the activation.
  10. CLPermute Permute the output back to NHWC if needed.

Definition at line 59 of file CLFFTConvolutionLayer.h.
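
Conceptually, the function relies on the convolution theorem: a convolution in the spatial domain becomes an element-wise product in the frequency domain, which pays off for large kernels. Below is a minimal usage sketch; it is not part of the generated documentation, the shapes and the 9x9 kernel are illustrative only, and it assumes an OpenCL device initialised through CLScheduler::get().default_init().

#include "arm_compute/core/TensorInfo.h"
#include "arm_compute/runtime/CL/CLScheduler.h"
#include "arm_compute/runtime/CL/CLTensor.h"
#include "arm_compute/runtime/CL/functions/CLFFTConvolutionLayer.h"

using namespace arm_compute;

int main()
{
    CLScheduler::get().default_init(); // Create the OpenCL context, kernel library and queue

    // Illustrative shapes: 64x64 RGB input, eight 9x9 kernels, unit strides.
    // validate() requires pad == kernel_size / 2, hence pad 4 for a 9x9 kernel.
    CLTensor src, weights, biases, dst;
    src.allocator()->init(TensorInfo(TensorShape(64U, 64U, 3U), 1, DataType::F32));
    weights.allocator()->init(TensorInfo(TensorShape(9U, 9U, 3U, 8U), 1, DataType::F32));
    biases.allocator()->init(TensorInfo(TensorShape(8U), 1, DataType::F32));
    dst.allocator()->init(TensorInfo(TensorShape(64U, 64U, 8U), 1, DataType::F32));

    CLFFTConvolutionLayer conv;
    conv.configure(&src, &weights, &biases, &dst, PadStrideInfo(1, 1, 4, 4));

    src.allocator()->allocate();
    weights.allocator()->allocate();
    biases.allocator()->allocate();
    dst.allocator()->allocate();

    // ... fill src, weights and biases here ...

    conv.run();                // Enqueues the kernels; does not block
    CLScheduler::get().sync(); // Wait for the queue before reading dst
    return 0;
}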

Constructor & Destructor Documentation

◆ CLFFTConvolutionLayer() [1/3]

CLFFTConvolutionLayer ( std::shared_ptr< IMemoryManager > memory_manager = nullptr)

Default constructor.

Definition at line 65 of file CLFFTConvolutionLayer.cpp.

    : _memory_group(memory_manager),
      _flip_weights_func(),
      _permute_input_func(),
      _permute_output_func(),
      _permute_weights_func(),
      _permute_bias_func(),
      _pad_input_func(),
      _pad_weights_func(),
      _transform_input_func(memory_manager),
      _transform_weights_func(),
      _itransform_output_func(memory_manager),
      _prod_func(),
      _reduce_func(),
      _extract_output_func(),
      _bias_add_func(),
      _activation_layer_func(),
      _permuted_input(),
      _permuted_weights(),
      _permuted_bias(),
      _permuted_output(),
      _padded_input(),
      _padded_weights(),
      _flip_axis(),
      _flipped_weights(),
      _transformed_input(),
      _transformed_weights(),
      _input_weights_product(),
      _output_product(),
      _output_reduced(),
      _itransformed_output(),
      _reshaped_output(),
      _bias_output(),
      _original_weights(nullptr),
      _original_bias(nullptr),
      _is_activationlayer_enabled(false),
      _needs_permute(false),
      _has_bias(false),
      _is_prepared(false)
{
}

◆ CLFFTConvolutionLayer() [2/3]

CLFFTConvolutionLayer ( const CLFFTConvolutionLayer & )
delete

Prevent instances of this class from being copied (As this class contains pointers)

◆ CLFFTConvolutionLayer() [3/3]

CLFFTConvolutionLayer ( CLFFTConvolutionLayer && )
default

Default move constructor.

Member Function Documentation

◆ configure() [1/2]

void configure ( ICLTensor * input,
const ICLTensor * weights,
const ICLTensor * biases,
ICLTensor * output,
const PadStrideInfo & conv_info,
const ActivationLayerInfo & act_info = ActivationLayerInfo(),
bool  enable_fast_math = false 
)

Set the input and output tensors.

Valid data layouts:

  • All

Valid data type configurations:

src dst
F32 F32
F16 F16
Note
This function only supports square kernels and unit strides, for both the NCHW and NHWC data layouts.
Parameters
[in]   input             Source tensor. The 3 lower dimensions represent a single input [width, height, IFM], while every optional dimension from 4 and above represents a batch of inputs. Data types supported: F16/F32.
[in]   weights           Weights tensor. Weights are a 4D tensor with dimensions [kernel_x, kernel_y, IFM, OFM]. Data type supported: Same as input.
[in]   biases            Biases tensor. Shared biases are supported. Biases are a 1D tensor with dimensions [OFM]. Data type supported: Same as input.
[out]  output            Destination tensor. The 3 lower dimensions represent a single output [width, height, OFM], while the rest represent a batch of outputs. Data types supported: Same as input.
[in]   conv_info         Contains padding and stride information, described in PadStrideInfo.
[in]   act_info          (Optional) Activation layer information in case of a fused activation.
[in]   enable_fast_math  (Optional) Enable fast math computation. When enabled, the function may dispatch the fastest implementation available, which can reduce accuracy. Defaults to false.
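
As a hedged illustration of these shape conventions (the sizes below are invented for the example, not mandated by the library):

const TensorShape src_shape(64U, 64U, 3U, 2U);  // [width, height, IFM, batches]: two 64x64x3 inputs
const TensorShape wei_shape(9U, 9U, 3U, 8U);    // [kernel_x, kernel_y, IFM, OFM]: eight 9x9x3 filters
const TensorShape bias_shape(8U);               // [OFM]: one shared bias per output feature map
const TensorShape dst_shape(64U, 64U, 8U, 2U);  // [width, height, OFM, batches]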

Definition at line 107 of file CLFFTConvolutionLayer.cpp.

References CLKernelLibrary::get().

{
    configure(CLKernelLibrary::get().get_compile_context(), input, weights, biases, output, conv_info, act_info, enable_fast_math);
}

◆ configure() [2/2]

void configure ( const CLCompileContext & compile_context,
ICLTensor * input,
const ICLTensor * weights,
const ICLTensor * biases,
ICLTensor * output,
const PadStrideInfo & conv_info,
const ActivationLayerInfo & act_info = ActivationLayerInfo(),
bool  enable_fast_math = false 
)

Set the input and output tensors.

Note
This function only supports square kernels and unit strides, for both the NCHW and NHWC data layouts.
Parameters
[in]   compile_context   The compile context to be used.
[in]   input             Source tensor. The 3 lower dimensions represent a single input [width, height, IFM], while every optional dimension from 4 and above represents a batch of inputs. Data types supported: F16/F32.
[in]   weights           Weights tensor. Weights are a 4D tensor with dimensions [kernel_x, kernel_y, IFM, OFM]. Data type supported: Same as input.
[in]   biases            Biases tensor. Shared biases are supported. Biases are a 1D tensor with dimensions [OFM]. Data type supported: Same as input.
[out]  output            Destination tensor. The 3 lower dimensions represent a single output [width, height, OFM], while the rest represent a batch of outputs. Data types supported: Same as input.
[in]   conv_info         Contains padding and stride information, described in PadStrideInfo.
[in]   act_info          (Optional) Activation layer information in case of a fused activation.
[in]   enable_fast_math  (Optional) Enable fast math computation. When enabled, the function may dispatch the fastest implementation available, which can reduce accuracy. Defaults to false.
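
This overload differs from the previous one only by the explicit compile context; as the definition below shows, the context-less variant simply forwards to this one. A hedged sketch of an equivalent explicit call, reusing the names from the earlier example:

// Equivalent to conv.configure(&src, &weights, &biases, &dst, conv_info):
conv.configure(CLKernelLibrary::get().get_compile_context(),
               &src, &weights, &biases, &dst, PadStrideInfo(1, 1, 4, 4));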

Definition at line 113 of file CLFFTConvolutionLayer.cpp.

References CLTensorAllocator::allocate(), CLTensor::allocator(), ARM_COMPUTE_ERROR_THROW_ON, ARM_COMPUTE_LOG_PARAMS, ARM_COMPUTE_UNUSED, arm_compute::auto_init_if_empty(), ICLTensor::buffer(), ICloneable< T >::clone(), TensorInfo::clone(), CLReverse::configure(), CLPermute::configure(), CLSlice::configure(), CLFFT2D::configure(), CLPadLayer::configure(), CLActivationLayer::configure(), CLReductionOperation::configure(), CLArithmeticAddition::configure(), CLComplexPixelWiseMultiplication::configure(), arm_compute::test::validation::conv_info, ITensorInfo::data_layout(), FFT2DInfo::direction, arm_compute::get_data_layout_dimension_index(), arm_compute::HEIGHT, arm_compute::test::validation::idx_height, arm_compute::test::validation::idx_width, ITensor::info(), CLTensor::info(), ITensorAllocator::init(), arm_compute::test::validation::input, arm_compute::Inverse, MemoryGroup::manage(), CLTensor::map(), arm_compute::NCHW, arm_compute::NHWC, PadStrideInfo::pad_bottom(), PadStrideInfo::pad_left(), PadStrideInfo::pad_right(), PadStrideInfo::pad_top(), TensorShape::remove_dimension(), TensorInfo::set_data_layout(), arm_compute::SUM, ITensorInfo::tensor_shape(), TensorInfo::tensor_shape(), arm_compute::U, arm_compute::U32, CLTensor::unmap(), CLFFTConvolutionLayer::validate(), arm_compute::WIDTH, arm_compute::WRAP, Dimensions< T >::x(), and Dimensions< T >::y().

{
    ARM_COMPUTE_UNUSED(enable_fast_math);
    ARM_COMPUTE_ERROR_THROW_ON(CLFFTConvolutionLayer::validate(input->info(), weights->info(), biases != nullptr ? biases->info() : nullptr, output->info(), conv_info, act_info, enable_fast_math));
    ARM_COMPUTE_LOG_PARAMS(input, weights, biases, output, conv_info, act_info, enable_fast_math);

    _original_weights = weights;
    _original_bias    = biases;

    // Flag if bias addition is required
    _has_bias = biases != nullptr;

    // Get indices for the width and height
    const size_t idx_width  = get_data_layout_dimension_index(input->info()->data_layout(), DataLayoutDimension::WIDTH);
    const size_t idx_height = get_data_layout_dimension_index(input->info()->data_layout(), DataLayoutDimension::HEIGHT);

    // Input shape, kernel size and output tile
    const Size2D input_dims  = Size2D(input->info()->tensor_shape()[idx_width], input->info()->tensor_shape()[idx_height]);
    const Size2D kernel_size = Size2D(weights->info()->tensor_shape()[idx_width], weights->info()->tensor_shape()[idx_height]);
    const Size2D pad_valid   = Size2D(pad_decomposable(input_dims.x() + kernel_size.x() - 1),
                                      pad_decomposable(input_dims.y() + kernel_size.y() - 1));
    // Tensors to use
    ICLTensor       *input_to_use   = input;
    const ICLTensor *weights_to_use = weights;
    ICLTensor       *output_to_use  = _has_bias ? &_bias_output : output;

    // Permute bias
    if(biases != nullptr)
    {
        _permute_bias_func.configure(compile_context, biases, &_permuted_bias, PermutationVector(1U, 2U, 0U));
        _permuted_bias.info()->set_data_layout(DataLayout::NCHW);
    }

    // Permute input if needed
    _needs_permute = input->info()->data_layout() == DataLayout::NHWC;
    if(_needs_permute)
    {
        _memory_group.manage(&_permuted_input);
        // Configure the function to transform the input tensor from NHWC -> NCHW
        _permute_input_func.configure(compile_context, input, &_permuted_input, PermutationVector(1U, 2U, 0U));
        _permuted_input.info()->set_data_layout(DataLayout::NCHW);

        // Configure the function to transform the weights tensor from HWI -> IHW
        _permute_weights_func.configure(compile_context, weights, &_permuted_weights, PermutationVector(1U, 2U, 0U));
        _permuted_weights.info()->set_data_layout(DataLayout::NCHW);

        input_to_use   = &_permuted_input;
        weights_to_use = &_permuted_weights;
    }

    // Flip weights
    _flipped_weights.allocator()->init(weights_to_use->info()->clone()->set_is_resizable(true).reset_padding());
    _flip_axis.allocator()->init(TensorInfo(TensorShape(2U), 1, DataType::U32));
    _flip_weights_func.configure(compile_context, weights_to_use, &_flipped_weights, &_flip_axis);

    // Pad weights
    const PaddingList padding_w = { { 0, input_dims.x() + pad_valid.x() - 1 }, { 0, input_dims.y() + pad_valid.y() - 1 } };
    _pad_weights_func.configure(compile_context, &_flipped_weights, &_padded_weights, padding_w);

    // Transform weights
    _transform_weights_func = std::make_unique<CLFFT2D>();
    _transform_weights_func->configure(compile_context, &_padded_weights, &_transformed_weights, FFT2DInfo());

    // Pad input
    const PaddingList padding_in = { { 0, kernel_size.x() + pad_valid.x() - 1 }, { 0, kernel_size.y() + pad_valid.y() - 1 } };
    _memory_group.manage(&_padded_input);
    _pad_input_func.configure(compile_context, input_to_use, &_padded_input, padding_in);
    if(_needs_permute)
    {
        _permuted_input.allocator()->allocate();
    }

    // Transform input
    _memory_group.manage(&_transformed_input);
    _transform_input_func.configure(compile_context, &_padded_input, &_transformed_input, FFT2DInfo());
    _padded_input.allocator()->allocate();

    // Perform product
    _memory_group.manage(&_output_product);
    _prod_func.configure(compile_context, &_transformed_input, &_transformed_weights, &_output_product);
    _transformed_input.allocator()->allocate();

    // Perform reduction
    _memory_group.manage(&_output_reduced);
    _reduce_func.configure(compile_context, &_output_product, &_output_reduced, 2, ReductionOperation::SUM);
    _output_product.allocator()->allocate();

    // Transform output
    _memory_group.manage(&_itransformed_output);
    FFT2DInfo itransform_info;
    itransform_info.direction = FFTDirection::Inverse;
    _itransformed_output.allocator()->init(_output_reduced.info()->clone()->set_is_resizable(true).set_num_channels(1).reset_padding());
    _itransform_output_func.configure(compile_context, &_output_reduced, &_itransformed_output, itransform_info);
    _output_reduced.allocator()->allocate();

    // Reshape output
    TensorShape reshaped_shape = _itransformed_output.info()->tensor_shape();
    reshaped_shape.remove_dimension(2);
    _reshaped_output.allocator()->init(_itransformed_output.info()->clone()->set_tensor_shape(reshaped_shape));

    // Extract correct region
    const int start_left = kernel_size.x() - conv_info.pad_left() - 1;
    const int start_top  = kernel_size.y() - conv_info.pad_top() - 1;
    const int end_right  = _reshaped_output.info()->tensor_shape().x() - (kernel_size.x() - conv_info.pad_right() - 1) - pad_valid.x();
    const int end_bottom = _reshaped_output.info()->tensor_shape().y() - (kernel_size.y() - conv_info.pad_bottom() - 1) - pad_valid.y();
    if(_has_bias)
    {
        _memory_group.manage(&_bias_output);
    }
    else if(_needs_permute)
    {
        output_to_use = &_permuted_output;
        _memory_group.manage(&_permuted_output);
    }
    _extract_output_func.configure(compile_context, &_reshaped_output, output_to_use, Coordinates(start_left, start_top), Coordinates(end_right, end_bottom));
    _itransformed_output.allocator()->allocate();

    // Add bias
    if(biases != nullptr)
    {
        output_to_use = output;
        if(_needs_permute)
        {
            output_to_use = &_permuted_output;
            _memory_group.manage(&_permuted_output);
        }
        auto_init_if_empty(*output_to_use->info(), *_bias_output.info());
        _bias_add_func.configure(compile_context, &_bias_output, &_permuted_bias, output_to_use, ConvertPolicy::WRAP);
        _bias_output.allocator()->allocate();
    }

    // Permute output
    if(_needs_permute)
    {
        // Configure the function to transform the convolved output from NCHW back to NHWC
        _permuted_output.info()->set_data_layout(DataLayout::NCHW);
        _permute_output_func.configure(compile_context, &_permuted_output, output, PermutationVector(2U, 0U, 1U));

        // Allocate tensors
        _permuted_output.allocator()->allocate();
    }

    // Configure Activation Layer
    _is_activationlayer_enabled = act_info.enabled();
    if(_is_activationlayer_enabled)
    {
        _activation_layer_func.configure(compile_context, output, nullptr, act_info);
    }

    // Setup flip axis data
    _flip_axis.allocator()->allocate();
    _flip_axis.map(true);
    auto axis_data = reinterpret_cast<uint32_t *>(_flip_axis.buffer());
    axis_data[0]   = 0;
    axis_data[1]   = 1;
    _flip_axis.unmap();
}

◆ operator=() [1/2]

CLFFTConvolutionLayer & operator= ( const CLFFTConvolutionLayer & )
delete

Prevent instances of this class from being copied (As this class contains pointers)

◆ operator=() [2/2]

CLFFTConvolutionLayer & operator= ( CLFFTConvolutionLayer && )
default

Default move assignment operator.

◆ prepare()

void prepare ( )
override virtual

Prepare the function for executing.

Any one-off pre-processing step required by the function is handled here.

Note
Prepare stage might not need all the function's buffers' backing memory to be available in order to execute
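
A hedged sketch of calling prepare() explicitly, e.g. at model-load time (conv and the tensors are reused from the earlier example):

CLFFTConvolutionLayer conv;
conv.configure(&src, &weights, &biases, &dst, PadStrideInfo(1, 1, 4, 4));
// One-off weight processing (permute/flip/pad + forward FFT) happens here,
// so the first run() does not pay for it. Afterwards the original weights
// are marked unused and their backing memory may be reclaimed.
conv.prepare();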

Reimplemented from IFunction.

Definition at line 355 of file CLFFTConvolutionLayer.cpp.

References CLTensorAllocator::allocate(), CLTensor::allocator(), ARM_COMPUTE_ERROR_ON, CLTensorAllocator::free(), CLScheduler::get(), ITensor::is_used(), ITensor::mark_as_unused(), CLScheduler::queue(), ICLSimpleFunction::run(), CLPermute::run(), and CLPadLayer::run().

Referenced by CLFFTConvolutionLayer::run().

{
    if(!_is_prepared)
    {
        // Permute bias to NCHW
        if(_original_bias != nullptr)
        {
            _permuted_bias.allocator()->allocate();
            _permute_bias_func.run();
            _original_bias->mark_as_unused();
        }

        const ICLTensor *cur_weights = _original_weights;
        // Permute weights
        if(_needs_permute)
        {
            ARM_COMPUTE_ERROR_ON(!cur_weights->is_used());

            _permuted_weights.allocator()->allocate();
            _permute_weights_func.run();
            cur_weights->mark_as_unused();
            cur_weights = &_permuted_weights;
        }

        // Flip weights
        _flipped_weights.allocator()->allocate();
        _flip_weights_func.run();
        cur_weights->mark_as_unused();

        // Pad weights
        _padded_weights.allocator()->allocate();
        _pad_weights_func.run();
        _flipped_weights.mark_as_unused();
        CLScheduler::get().queue().finish();
        _flipped_weights.allocator()->free();

        // Transform weights to frequency domain
        _transformed_weights.allocator()->allocate();
        _transform_weights_func->run();
        _padded_weights.mark_as_unused();
        CLScheduler::get().queue().finish();
        // Delete object and release internal memory
        _transform_weights_func.reset();
        _padded_weights.allocator()->free();

        _is_prepared = true;
    }
}

◆ run()

void run ( )
override virtual

Run the kernels contained in the function.

For CPU kernels:

  • Multi-threading is used for the kernels which are parallelisable.
  • By default std::thread::hardware_concurrency() threads are used.
Note
CPPScheduler::set_num_threads() can be used to manually set the number of threads

For OpenCL kernels:

  • All the kernels are enqueued on the queue associated with CLScheduler.
  • The queue is then flushed.
Note
The function does not block until the kernels have executed; it is the user's responsibility to synchronise.
Will call prepare() on the first run if it hasn't been done already.
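
Since run() only enqueues and flushes the queue, here is a short hedged sketch of synchronising before reading the output (conv and dst as in the earlier example):

conv.run();                 // Enqueues the kernels; returns without blocking
CLScheduler::get().sync();  // Block until the CL command queue has drained
dst.map(true);              // Map the output buffer for host access
// ... read results from dst.buffer() ...
dst.unmap();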

Implements IFunction.

Definition at line 316 of file CLFFTConvolutionLayer.cpp.

References CLTensor::allocator(), CLTensor::cl_buffer(), CLTensorAllocator::import_memory(), CLFFTConvolutionLayer::prepare(), CLFFT2D::run(), CLPermute::run(), CLSlice::run(), CLActivationLayer::run(), CLPadLayer::run(), CLReductionOperation::run(), CLArithmeticAddition::run(), and CLComplexPixelWiseMultiplication::run().

{
    prepare();

    MemoryGroupResourceScope scope_mg(_memory_group);

    // Transform input
    if(_needs_permute)
    {
        _permute_input_func.run();
    }
    _pad_input_func.run();
    _transform_input_func.run();

    // Perform operations in the frequency domain
    _prod_func.run();
    _reduce_func.run();

    // Transform output
    _itransform_output_func.run();
    _reshaped_output.allocator()->import_memory(_itransformed_output.cl_buffer());
    _extract_output_func.run();

    // Add bias
    if(_has_bias)
    {
        _bias_add_func.run();
    }
    if(_needs_permute)
    {
        _permute_output_func.run();
    }

    // Run activation layer
    if(_is_activationlayer_enabled)
    {
        _activation_layer_func.run();
    }
}

◆ validate()

Status validate ( const ITensorInfo * input,
const ITensorInfo * weights,
const ITensorInfo * biases,
const ITensorInfo * output,
const PadStrideInfo & conv_info,
const ActivationLayerInfo & act_info = ActivationLayerInfo(),
bool  enable_fast_math = false 
)
static

Static function to check if given info will lead to a valid configuration of CLFFTConvolutionLayer.

Note
This function only supports square kernels and unit strides, for both the NCHW and NHWC data layouts.
Parameters
[in]   input             Source tensor. The 3 lower dimensions represent a single input [width, height, IFM], while every optional dimension from 4 and above represents a batch of inputs. Data types supported: F16/F32.
[in]   weights           Weights tensor. Weights are a 4D tensor with dimensions [kernel_x, kernel_y, IFM, OFM]. Data type supported: Same as input.
[in]   biases            Biases tensor. Shared biases are supported. Biases are a 1D tensor with dimensions [OFM]. Data type supported: Same as input.
[out]  output            Destination tensor. The 3 lower dimensions represent a single output [width, height, OFM], while the rest represent a batch of outputs. Data types supported: Same as input.
[in]   conv_info         Contains padding and stride information, described in PadStrideInfo.
[in]   act_info          (Optional) Activation layer information in case of a fused activation.
[in]   enable_fast_math  (Optional) Enable fast math computation. When enabled, the function may dispatch the fastest implementation available, which can reduce accuracy. Defaults to false.
Returns
a status
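
A hedged sketch of querying support before configuring; the shapes are illustrative, and note that F16 input is only accepted when enable_fast_math is true:

const TensorInfo src_info(TensorShape(64U, 64U, 3U), 1, DataType::F32);
const TensorInfo wei_info(TensorShape(9U, 9U, 3U, 8U), 1, DataType::F32);
const TensorInfo dst_info(TensorShape(64U, 64U, 8U), 1, DataType::F32);

const Status st = CLFFTConvolutionLayer::validate(&src_info, &wei_info, nullptr, &dst_info,
                                                  PadStrideInfo(1, 1, 4, 4));
if(st.error_code() != ErrorCode::OK)
{
    // The FFT path cannot handle this configuration; fall back to another
    // convolution method such as CLConvolutionLayer.
    std::cout << st.error_description() << std::endl;
}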

Definition at line 272 of file CLFFTConvolutionLayer.cpp.

References ARM_COMPUTE_RETURN_ERROR_ON, ARM_COMPUTE_RETURN_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN, ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES, ARM_COMPUTE_RETURN_ON_ERROR, ITensorInfo::data_layout(), ITensorInfo::data_type(), ActivationLayerInfo::enabled(), arm_compute::F16, arm_compute::F32, arm_compute::get_data_layout_dimension_index(), arm_compute::HEIGHT, arm_compute::test::validation::idx_height, arm_compute::test::validation::idx_width, PadStrideInfo::pad_bottom(), PadStrideInfo::pad_left(), PadStrideInfo::pad_right(), PadStrideInfo::pad_top(), PadStrideInfo::stride(), ITensorInfo::tensor_shape(), ITensorInfo::total_size(), CLActivationLayer::validate(), arm_compute::WIDTH, and Dimensions< T >::x().

Referenced by CLFFTConvolutionLayer::configure(), ClConv2d::get_convolution_method(), and CLConvolutionLayer::validate().

{
    ARM_COMPUTE_RETURN_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN(input, 1, DataType::F32, DataType::F16);
    ARM_COMPUTE_RETURN_ERROR_ON((input->data_type() == DataType::F16) && !enable_fast_math);
    ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES(input, weights);

    // Get indices for the width and height
    const size_t idx_width  = get_data_layout_dimension_index(input->data_layout(), DataLayoutDimension::WIDTH);
    const size_t idx_height = get_data_layout_dimension_index(input->data_layout(), DataLayoutDimension::HEIGHT);

    // Input shape, kernel size and output tile
    const Size2D kernel_size = Size2D(weights->tensor_shape()[idx_width], weights->tensor_shape()[idx_height]);

    // Strides
    const auto strides = conv_info.stride();
    ARM_COMPUTE_RETURN_ERROR_ON(strides.first != strides.second && strides.first != 1);
    ARM_COMPUTE_RETURN_ERROR_ON(kernel_size.x() != kernel_size.y());
    ARM_COMPUTE_RETURN_ERROR_ON(conv_info.pad_left() != (kernel_size.x() / 2) || conv_info.pad_right() != (kernel_size.x() / 2));
    ARM_COMPUTE_RETURN_ERROR_ON(conv_info.pad_top() != (kernel_size.y() / 2) || conv_info.pad_bottom() != (kernel_size.y() / 2));

    // Validate biases
    if(biases != nullptr)
    {
        ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES(input, biases);
        ARM_COMPUTE_RETURN_ERROR_ON(weights->tensor_shape()[3] != biases->tensor_shape().x());
    }

    // Checks performed when output is configured
    if((output != nullptr) && (output->total_size() != 0))
    {
        ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES(input, output);
        ARM_COMPUTE_RETURN_ERROR_ON((input->tensor_shape()[idx_height] != output->tensor_shape()[idx_height]) || (input->tensor_shape()[idx_width] != output->tensor_shape()[idx_width]));

        // Validate Activation Layer
        if(act_info.enabled())
        {
            ARM_COMPUTE_RETURN_ON_ERROR(CLActivationLayer::validate(output, nullptr, act_info));
        }
    }

    return Status{};
}

The documentation for this class was generated from the following files:

  • arm_compute/runtime/CL/functions/CLFFTConvolutionLayer.h
  • src/runtime/CL/functions/CLFFTConvolutionLayer.cpp