Compute Library 19.08
NEWinogradConvolutionLayer Class Reference

Basic function to simulate a convolution layer. More...

#include <NEWinogradConvolutionLayer.h>

Collaboration diagram for NEWinogradConvolutionLayer:

Public Member Functions

 NEWinogradConvolutionLayer (const std::shared_ptr< IMemoryManager > &memory_manager=nullptr)
 Constructor. More...
 
void configure (const ITensor *input, const ITensor *weights, const ITensor *biases, ITensor *output, const PadStrideInfo &conv_info, const ActivationLayerInfo &act_info=ActivationLayerInfo(), bool enable_fast_math=false)
 Set the input and output tensors. More...
 
void run () override
 Run the kernels contained in the function. More...
 
void prepare () override
 Prepare the function for executing. More...
 
 NEWinogradConvolutionLayer (const NEWinogradConvolutionLayer &)=delete
 Prevent instances of this class from being copied (As this class contains pointers) More...
 
NEWinogradConvolutionLayer & operator= (const NEWinogradConvolutionLayer &)=delete
 Prevent instances of this class from being copied (As this class contains pointers) More...
 
- Public Member Functions inherited from IFunction
virtual ~IFunction ()=default
 Destructor. More...
 

Static Public Member Functions

static Status validate (const ITensorInfo *input, const ITensorInfo *weights, const ITensorInfo *biases, const ITensorInfo *output, const PadStrideInfo &conv_info, const ActivationLayerInfo &act_info=ActivationLayerInfo(), bool enable_fast_math=false)
 Static function to check if given info will lead to a valid configuration of NEWinogradConvolutionLayer. More...
 

Detailed Description

Basic function to simulate a convolution layer.

This function calls the following NEON kernels:

  1. NEWinogradLayerTransformWeightsKernel (executed only once in the first call to the run() method)
  2. NEWinogradLayerTransformInputKernel
  3. NEWinogradLayerTransformOutputKernel
  4. NEGEMMAssemblyDispatch
  5. CPPPermute (three times: weights, input and output)
Note
Some Winograd configurations (i.e. F(2x2, 5x5), F(4x4, 5x5)) are supported only with enable_fast_math = true

Definition at line 52 of file NEWinogradConvolutionLayer.h.
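
For orientation, a minimal usage sketch. This is hedged: the tensor shapes and variable names are illustrative assumptions, not taken from this page; the calls (configure(), allocate(), run()) are the API documented below.

    #include "arm_compute/runtime/NEON/functions/NEWinogradConvolutionLayer.h"
    #include "arm_compute/runtime/Tensor.h"

    using namespace arm_compute;

    // Illustrative shapes: 56x56 input, 64 IFM -> 64 OFM, 3x3 kernel, unit stride, SAME padding.
    Tensor src{}, weights{}, biases{}, dst{};
    src.allocator()->init(TensorInfo(TensorShape(56U, 56U, 64U), 1, DataType::F32));
    weights.allocator()->init(TensorInfo(TensorShape(3U, 3U, 64U, 64U), 1, DataType::F32));
    biases.allocator()->init(TensorInfo(TensorShape(64U), 1, DataType::F32));
    dst.allocator()->init(TensorInfo(TensorShape(56U, 56U, 64U), 1, DataType::F32));

    NEWinogradConvolutionLayer conv{};
    conv.configure(&src, &weights, &biases, &dst, PadStrideInfo(1, 1, 1, 1));

    src.allocator()->allocate();
    weights.allocator()->allocate();
    biases.allocator()->allocate();
    dst.allocator()->allocate();
    // ... fill src, weights and biases with data ...

    conv.run(); // the first call also performs the one-off prepare() step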

Constructor & Destructor Documentation

◆ NEWinogradConvolutionLayer() [1/2]

NEWinogradConvolutionLayer ( const std::shared_ptr< IMemoryManager > &  memory_manager = nullptr)

Constructor.

Definition at line 237 of file NEWinogradConvolutionLayer.cpp.

238  : _memory_group(memory_manager), _gemm_function(memory_manager), _transform_input_kernel(nullptr), _transform_output_kernel(nullptr), _transform_weights_kernel(nullptr), _activationlayer_function(),
239  _permute_input(), _permute_weights(), _permute_output(), _input_transformed(), _output_transformed(), _input_workspace(), _output_workspace(), _kernel_storage(), _input_nhwc(), _output_nhwc(),
240  _weights_hwio(), _input(), _weights(), _output(), _is_prepared(false), _is_activationlayer_enabled(false)
241 {
242 }

◆ NEWinogradConvolutionLayer() [2/2]

NEWinogradConvolutionLayer ( const NEWinogradConvolutionLayer & )
delete

Prevent instances of this class from being copied (As this class contains pointers)

Member Function Documentation

◆ configure()

void configure ( const ITensor *  input,
const ITensor *  weights,
const ITensor *  biases,
ITensor *  output,
const PadStrideInfo &  conv_info,
const ActivationLayerInfo &  act_info = ActivationLayerInfo(),
bool  enable_fast_math = false 
)

Set the input and output tensors.

Parameters
[in]   input             Source tensor. 3 lower dimensions represent a single input [width, height, IFM], while every optional dimension from 4 and above represents a batch of inputs. Data types supported: F32.
[in]   weights           Weights tensor. Weights are a 4D tensor with dimensions [kernel_x, kernel_y, IFM, OFM]. Data type supported: Same as input. Currently only 3x3 and 5x5 kernels are supported.
[in]   biases            Biases tensor. Shared biases are supported. Biases are a 1D tensor with dimensions [OFM]. Data type supported: Same as weights.
[out]  output            Destination tensor. 3 lower dimensions represent a single output [width, height, OFM], while the rest represent a batch of outputs. Data types supported: Same as input.
[in]   conv_info         Contains padding and stride information described in PadStrideInfo. Currently only unit strides are supported.
[in]   act_info          (Optional) Activation layer information in case of a fused activation.
[in]   enable_fast_math  (Optional) Enable fast math computation. If this flag is set, the function may dispatch the fastest implementation available, which can introduce a drop in accuracy. Default is false.
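
Because some 5x5 configurations are only available with fast math (see the note in the Detailed Description), here is a hedged sketch of opting in; weights5x5 is a hypothetical [5, 5, IFM, OFM] weights tensor and the fused activation is illustrative:

    // Assumed: weights5x5 is a 5x5 F32 weights tensor set up like the earlier sketch.
    conv.configure(&src, &weights5x5, &biases, &dst,
                   PadStrideInfo(1, 1, 2, 2), // unit strides, SAME padding for a 5x5 kernel
                   ActivationLayerInfo(ActivationLayerInfo::ActivationFunction::RELU),
                   /* enable_fast_math = */ true);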

Definition at line 244 of file NEWinogradConvolutionLayer.cpp.

246 {
247  ARM_COMPUTE_ERROR_ON_NULLPTR(input, weights, output);
248  ARM_COMPUTE_ERROR_THROW_ON(validate_arguments(input->info(), weights->info(), (biases != nullptr) ? biases->info() : nullptr, output->info(), conv_info));
249 
250  // Get indices for the width and height
251  const DataLayout data_layout = input->info()->data_layout();
252  const unsigned int width_idx   = get_data_layout_dimension_index(data_layout, DataLayoutDimension::WIDTH);
253  const unsigned int height_idx  = get_data_layout_dimension_index(data_layout, DataLayoutDimension::HEIGHT);
254  const unsigned int channel_idx = get_data_layout_dimension_index(data_layout, DataLayoutDimension::CHANNEL);
255 
256  const Size2D input_dims = Size2D(input->info()->dimension(width_idx), input->info()->dimension(height_idx));
257  const Size2D kernel_size = Size2D(weights->info()->dimension(width_idx), weights->info()->dimension(height_idx));
258  const Size2D output_tile = winograd_output_tile(input_dims, kernel_size);
259 
260  // Check if the Winograd configuration requires fast math
261  if(!enable_fast_math)
262  {
263  ARM_COMPUTE_ERROR_ON_MSG(check_support_fast_math(output_tile, kernel_size), "This Winograd configuration requires enable_fast_math=true");
264  }
265 
266  _weights = weights;
267  _input = input;
268  _output = output;
269  _is_prepared = false;
270 
271  std::unique_ptr<INEWinogradLayerTransformInputKernel<float>> transform_input_kernel;
272  std::unique_ptr<INEWinogradLayerTransformWeightsKernel<float>> transform_weights_kernel;
273  std::unique_ptr<INEWinogradLayerTransformOutputKernel<float>> transform_output_kernel;
274 
275  int n_gemms = 0;
276  int N_BLOCK = 0; // Size of block used by GEMM.
277 
278  if(kernel_size == Size2D(3, 3))
279  {
280  if(input->info()->dimension(width_idx) > 4 && input->info()->dimension(height_idx) > 4)
281  {
282  using config = NEWinogradLayerConfiguration<float, float, 4, 4, 3, 3>;
283  transform_input_kernel = support::cpp14::make_unique<config::TransformInputKernel>();
284  transform_weights_kernel = support::cpp14::make_unique<config::TransformWeightsKernel>();
285  transform_output_kernel = support::cpp14::make_unique<config::TransformOutputKernel>();
286  n_gemms = config::WinogradBase::N_GEMMS;
287  N_BLOCK = config::WinogradConv::N_BLOCK;
288  }
289  else
290  {
291  using config = NEWinogradLayerConfiguration<float, float, 2, 2, 3, 3>;
292  transform_input_kernel = support::cpp14::make_unique<config::TransformInputKernel>();
293  transform_weights_kernel = support::cpp14::make_unique<config::TransformWeightsKernel>();
294  transform_output_kernel = support::cpp14::make_unique<config::TransformOutputKernel>();
295  n_gemms = config::WinogradBase::N_GEMMS;
296  N_BLOCK = config::WinogradConv::N_BLOCK;
297  }
298  }
299  else if(kernel_size == Size2D(5, 5))
300  {
301  using config = NEWinogradLayerConfiguration<float, float, 2, 2, 5, 5>;
302  transform_input_kernel = support::cpp14::make_unique<config::TransformInputKernel>();
303  transform_weights_kernel = support::cpp14::make_unique<config::TransformWeightsKernel>();
304  transform_output_kernel = support::cpp14::make_unique<config::TransformOutputKernel>();
305  n_gemms = config::WinogradBase::N_GEMMS;
306  N_BLOCK = config::WinogradConv::N_BLOCK;
307  }
308  else if(kernel_size == Size2D(1, 3))
309  {
310  using config = NEWinogradLayerConfiguration<float, float, 6, 1, 3, 1>;
311  transform_input_kernel = support::cpp14::make_unique<config::TransformInputKernel>();
312  transform_weights_kernel = support::cpp14::make_unique<config::TransformWeightsKernel>();
313  transform_output_kernel = support::cpp14::make_unique<config::TransformOutputKernel>();
314  n_gemms = config::WinogradBase::N_GEMMS;
315  N_BLOCK = config::WinogradConv::N_BLOCK;
316  }
317  else if(kernel_size == Size2D(3, 1))
318  {
319  using config = NEWinogradLayerConfiguration<float, float, 1, 6, 1, 3>;
320  transform_input_kernel = support::cpp14::make_unique<config::TransformInputKernel>();
321  transform_weights_kernel = support::cpp14::make_unique<config::TransformWeightsKernel>();
322  transform_output_kernel = support::cpp14::make_unique<config::TransformOutputKernel>();
323  n_gemms = config::WinogradBase::N_GEMMS;
324  N_BLOCK = config::WinogradConv::N_BLOCK;
325  }
326  else if(kernel_size == Size2D(1, 5))
327  {
328  using config = NEWinogradLayerConfiguration<float, float, 4, 1, 5, 1>;
329  transform_input_kernel = support::cpp14::make_unique<config::TransformInputKernel>();
330  transform_weights_kernel = support::cpp14::make_unique<config::TransformWeightsKernel>();
331  transform_output_kernel = support::cpp14::make_unique<config::TransformOutputKernel>();
332  n_gemms = config::WinogradBase::N_GEMMS;
333  N_BLOCK = config::WinogradConv::N_BLOCK;
334  }
335  else if(kernel_size == Size2D(5, 1))
336  {
337  using config = NEWinogradLayerConfiguration<float, float, 1, 4, 1, 5>;
338  transform_input_kernel = support::cpp14::make_unique<config::TransformInputKernel>();
339  transform_weights_kernel = support::cpp14::make_unique<config::TransformWeightsKernel>();
340  transform_output_kernel = support::cpp14::make_unique<config::TransformOutputKernel>();
341  n_gemms = config::WinogradBase::N_GEMMS;
342  N_BLOCK = config::WinogradConv::N_BLOCK;
343  }
344  else if(kernel_size == Size2D(1, 7))
345  {
346  using config = NEWinogradLayerConfiguration<float, float, 2, 1, 7, 1>;
347  transform_input_kernel = support::cpp14::make_unique<config::TransformInputKernel>();
348  transform_weights_kernel = support::cpp14::make_unique<config::TransformWeightsKernel>();
349  transform_output_kernel = support::cpp14::make_unique<config::TransformOutputKernel>();
350  n_gemms = config::WinogradBase::N_GEMMS;
351  N_BLOCK = config::WinogradConv::N_BLOCK;
352  }
353  else if(kernel_size == Size2D(7, 1))
354  {
355  using config = NEWinogradLayerConfiguration<float, float, 1, 2, 1, 7>;
356  transform_input_kernel = support::cpp14::make_unique<config::TransformInputKernel>();
357  transform_weights_kernel = support::cpp14::make_unique<config::TransformWeightsKernel>();
358  transform_output_kernel = support::cpp14::make_unique<config::TransformOutputKernel>();
359  n_gemms = config::WinogradBase::N_GEMMS;
360  N_BLOCK = config::WinogradConv::N_BLOCK;
361  }
362  else
363  {
364  ARM_COMPUTE_ERROR("Not supported.");
365  }
366 
367  const PaddingType use_padding_type = (conv_info.pad_top() != 0u || conv_info.pad_left() != 0) ? PADDING_SAME : PADDING_VALID;
368  const bool use_same_padding = use_padding_type == PADDING_SAME;
369 
370  // Get convolved dimensions
371  const int in_channels = input->info()->dimension(channel_idx);
372  const int out_channels = output->info()->dimension(channel_idx);
373 
374  const Tensor4DShape in_shape(internal_get_input_shape(input));
375  const DataType data_type = input->info()->data_type();
376  const size_t data_type_size = input->info()->element_size();
377  // Get the memory required to instantiate a new Winograd operator.
378  constexpr size_t storage_alignment = 64;
379 
380  // Kernel Storage
381  const size_t kernel_storage_size = transform_weights_kernel->get_weight_storage_size(out_channels,
382  in_channels)
383  * data_type_size;
384 
385  // Input storage
386  const size_t input_storage_size = transform_input_kernel->get_input_storage_size(in_shape.n_batches, in_shape.n_channels, in_shape.n_rows, in_shape.n_cols,
387  use_same_padding)
388  * data_type_size;
389 
390  // Output storage
391  const size_t output_storage_size = transform_output_kernel->get_output_storage_size(in_shape.n_batches, in_shape.n_rows, in_shape.n_cols, out_channels,
392  use_same_padding)
393  * data_type_size;
394  ;
395  const KernelShape kernel_shape({ out_channels, static_cast<int>(kernel_size.height), static_cast<int>(kernel_size.width), in_channels });
396  const int kernel_matrix_stride = transform_weights_kernel->get_matrix_stride(kernel_shape);
397 
398  const int output_matrix_stride = transform_output_kernel->get_matrix_stride(kernel_shape, in_shape, use_padding_type);
399  const auto output_shape(transform_output_kernel->get_output_shape(kernel_shape, in_shape, use_padding_type));
400 
401  const int input_matrix_stride = transform_input_kernel->get_matrix_stride(kernel_shape, in_shape, use_padding_type);
402 
403  // Configure GEMM
404  const int tile_rows = iceildiv(output_shape.n_rows, output_tile.height);
405  const int tile_cols = iceildiv(output_shape.n_cols, output_tile.width);
406  const int m = in_shape.n_batches * tile_rows * tile_cols;
407  const int k = in_shape.n_channels;
408  const int n = out_channels;
409  const int kernel_matrix_row_stride = roundup(out_channels, N_BLOCK);
410  const int output_matrix_row_stride = kernel_matrix_row_stride;
411 
412  TensorShape a_shape(k, m, 1, n_gemms);
413  Strides a_strides(data_type_size);
414  a_strides.set(1, a_strides[0] * k);
415  //a_strides.set(2, data_type_size * input_matrix_stride / n_gemms); FIXME: This is the real batch size, but RSH's code crashes if it's not 0.
416  a_strides.set(2, 0);
417  a_strides.set(3, data_type_size * input_matrix_stride);
418 
419  TensorShape b_shape(n, k, n_gemms);
420  Strides b_strides(data_type_size);
421  b_strides.set(1, data_type_size * kernel_matrix_row_stride);
422  b_strides.set(2, data_type_size * kernel_matrix_stride);
423 
424  TensorShape d_shape(n, m, 1, n_gemms);
425  Strides d_strides(data_type_size);
426  d_strides.set(1, data_type_size * output_matrix_row_stride);
427  //d_strides.set(2, data_type_size * output_matrix_stride / n_gemms); FIXME: This is the real batch size, but RSH's code crashes if it's not 0.
428  d_strides.set(2, 0);
429  d_strides.set(3, data_type_size * output_matrix_stride);
430 
431  TensorInfo a_info{};
432  TensorInfo b_info{};
433  TensorInfo d_info{};
434  a_info.init(a_shape, 1, data_type, a_strides, 0, input_storage_size);
435  b_info.init(b_shape, 1, data_type, b_strides, 0, kernel_storage_size);
436  d_info.init(d_shape, 1, data_type, d_strides, 0, output_storage_size);
437 
438  _input_transformed.allocator()->init(a_info, storage_alignment);
439  _kernel_storage.allocator()->init(b_info, storage_alignment);
440  _output_transformed.allocator()->init(d_info, storage_alignment);
441 
442  // configure and allocate dst tensor to be used to convert from winograd domain to spatial domain when calling to reshape_output()
443  TensorInfo info(TensorShape(_output->info()->dimension(2), _output->info()->dimension(0),
444  _output->info()->dimension(1), _output->info()->dimension(3)),
445  1, _output->info()->data_type());
446  _output_nhwc.allocator()->init(info);
447 
448  const ITensor *input_to_use = _input;
449  ITensor *output_to_use = _output;
450  PermutationVector weights_permutation_vector(3U, 0U, 1U, 2U);
451  const unsigned int max_num_threads = NEScheduler::get().num_threads();
452 
453  // Configure the kernel to transform the input tensor from NCHW -> NHWC
454  if(data_layout == DataLayout::NCHW)
455  {
456  _memory_group.manage(&_input_nhwc);
457  _permute_input.configure(input, &_input_nhwc, PermutationVector(2U, 0U, 1U));
458  input_to_use = &_input_nhwc;
459  weights_permutation_vector = PermutationVector(3U, 2U, 0U, 1U);
460  }
461 
462  // Configure input transform kernel
463  _memory_group.manage(&_input_transformed);
464  _memory_group.manage(&_input_workspace);
465  transform_input_kernel->configure(input_to_use, in_shape.n_batches, in_shape.n_rows, in_shape.n_cols, in_shape.n_channels, use_padding_type,
466  &_input_transformed, input_matrix_stride, &_input_workspace);
467  const size_t input_workspace_size = transform_input_kernel->get_working_space_size(max_num_threads);
468  TensorInfo input_workspace_info(TensorShape(input_workspace_size), 1, _input->info()->data_type());
469  _input_workspace.allocator()->init(input_workspace_info);
470  _input_workspace.allocator()->allocate();
471  if(data_layout == DataLayout::NCHW)
472  {
473  _input_nhwc.allocator()->allocate();
474  }
475 
476  // Re-order a weight tensor from [Output feature map x Input feature map x Height x Width] to [Height x Width x Input feature map x Output feature map]
477  _permute_weights.configure(weights, &_weights_hwio, weights_permutation_vector);
478  transform_weights_kernel->configure(&_weights_hwio, &_kernel_storage, kernel_matrix_stride, out_channels, in_channels);
479 
480  // Configure GEMM function
481  _memory_group.manage(&_output_transformed);
482  _gemm_function.configure(&_input_transformed, &_kernel_storage, nullptr, &_output_transformed, 1.0f, 0.f);
483  _input_transformed.allocator()->allocate();
484 
485  // Configure output transform function
486  // The biases tensor has not been allocated at this point in time, the output transform will add the biases to the final result in the run() method
487  if(data_layout == DataLayout::NCHW)
488  {
489  _memory_group.manage(&_output_nhwc);
490  output_to_use = &_output_nhwc;
491  }
492  transform_output_kernel->configure(biases, &_output_transformed,
493  output_matrix_stride, output_to_use,
494  in_shape.n_batches, output_shape.n_rows, output_shape.n_cols, out_channels, &_output_workspace);
495  const size_t output_workspace_size = transform_output_kernel->get_working_space_size(max_num_threads);
496  TensorInfo output_workspace_info(TensorShape(output_workspace_size), 1, _output->info()->data_type());
497  _output_workspace.allocator()->init(output_workspace_info);
498  _output_workspace.allocator()->allocate();
499  _output_transformed.allocator()->allocate();
500 
501  // Reorder the convolved output to ACL's ordering NCHW
502  if(data_layout == DataLayout::NCHW)
503  {
504  _permute_output.configure(&_output_nhwc, _output, PermutationVector(1U, 2U, 0U));
505  _output_nhwc.allocator()->allocate();
506  }
507 
508  _transform_input_kernel = std::move(transform_input_kernel);
509  _transform_weights_kernel = std::move(transform_weights_kernel);
510  _transform_output_kernel = std::move(transform_output_kernel);
511 
512  //Configure Activation Layer
513  _is_activationlayer_enabled = act_info.enabled();
514  if(_is_activationlayer_enabled)
515  {
516  _activationlayer_function.configure(_output, nullptr, act_info);
517  }
518 }
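
To make the GEMM sizing in the listing concrete, a worked example under assumed shapes: a single-batch 56x56 input with 64 channels, 64 output channels and a 3x3 kernel, so the F(4x4, 3x3) branch above is taken:

    // Assumed: in_shape = {1, 56, 56, 64}, out_channels = 64, output_tile = 4x4.
    const int tile_rows = iceildiv(56, 4);   // 14 output tiles vertically
    const int tile_cols = iceildiv(56, 4);   // 14 output tiles horizontally
    const int m = 1 * tile_rows * tile_cols; // 196 GEMM rows, one per tile
    const int k = 64;                        // input channels
    const int n = 64;                        // output channels
    // n_gemms = (4 + 3 - 1)^2 = 36: one batched multiply per Winograd-domain matrix element.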

References arm_compute::test::validation::act_info, TensorAllocator::allocate(), Tensor::allocator(), ARM_COMPUTE_ERROR, ARM_COMPUTE_ERROR_ON_MSG, ARM_COMPUTE_ERROR_ON_NULLPTR, ARM_COMPUTE_ERROR_THROW_ON, arm_compute::CHANNEL, CPPPermute::configure(), NEActivationLayer::configure(), NEGEMM::configure(), arm_compute::test::validation::conv_info, arm_compute::test::validation::data_layout, ITensorInfo::data_layout(), arm_compute::test::validation::data_type, ITensorInfo::data_type(), ITensorInfo::dimension(), TensorInfo::dimension(), ITensorInfo::element_size(), Scheduler::get(), arm_compute::get_data_layout_dimension_index(), arm_compute::HEIGHT, iceildiv(), ITensor::info(), CLTensor::info(), arm_compute::test::validation::info, TensorAllocator::init(), TensorInfo::init(), MemoryGroupBase< TensorType >::manage(), arm_compute::NCHW, IScheduler::num_threads(), arm_compute::test::validation::output_shape, roundup(), Dimensions< T >::set(), arm_compute::U, arm_compute::test::validation::weights, and arm_compute::WIDTH.

◆ operator=()

NEWinogradConvolutionLayer& operator= ( const NEWinogradConvolutionLayer & )
delete

Prevent instances of this class from being copied (As this class contains pointers)

◆ prepare()

void prepare ( )
override virtual

Prepare the function for executing.

Any one-off pre-processing step required by the function is handled here.

Note
Prepare stage might not need all the function's buffers' backing memory to be available in order to execute

Reimplemented from IFunction.

Definition at line 663 of file NEWinogradConvolutionLayer.cpp.

664 {
665  if(!_is_prepared)
666  {
667  // Permute weights
668  _weights_hwio.allocator()->allocate();
669  _permute_weights.run();
670  _weights->mark_as_unused();
671 
672  // Transform weights
673  _kernel_storage.allocator()->allocate();
674  NEScheduler::get().schedule(_transform_weights_kernel.get(), Window::DimX);
675 
676  _weights_hwio.allocator()->free();
677  _is_prepared = true;
678  }
679 }
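
A hedged sketch of triggering this one-off work explicitly instead of paying for it in the first run(); the names come from the earlier illustrative example, and calling prepare() yourself is optional:

    conv.configure(&src, &weights, &biases, &dst, PadStrideInfo(1, 1, 1, 1));
    weights.allocator()->allocate();
    // ... fill the weight values ...
    conv.prepare(); // permutes and transforms the weights once, then frees the HWIO scratch copy
    // Subsequent run() calls skip the weight transform.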

References TensorAllocator::allocate(), Tensor::allocator(), Window::DimX, TensorAllocator::free(), Scheduler::get(), ITensor::mark_as_unused(), ICPPSimpleFunction::run(), and IScheduler::schedule().

Referenced by NEWinogradConvolutionLayer::run().

◆ run()

void run ( )
override virtual

Run the kernels contained in the function.

For NEON kernels:

  • Multi-threading is used for the kernels which are parallelisable.
  • By default std::thread::hardware_concurrency() threads are used.
Note
CPPScheduler::set_num_threads() can be used to manually set the number of threads

For OpenCL kernels:

  • All the kernels are enqueued on the queue associated with CLScheduler.
  • The queue is then flushed.
Note
The function will not block until the kernels are executed. It is the user's responsibility to wait.
Will call prepare() on first run if it hasn't been done
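
A hedged sketch of capping the worker pool before running; the value 4 is an illustrative assumption:

    // IScheduler::set_num_threads() adjusts the pool used for the parallelisable kernels.
    NEScheduler::get().set_num_threads(4);
    conv.run();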

Implements IFunction.

Definition at line 520 of file NEWinogradConvolutionLayer.cpp.

521 {
522  const DataLayout data_layout = _input->info()->data_layout();
523 
524  prepare();
525 
526  MemoryGroupResourceScope scope_mg(_memory_group);
527 
528  if(data_layout == DataLayout::NCHW)
529  {
530  //Bring channels to the front as Winograd code expects the tensor to be in the format NHWC
531  _permute_input.run();
532  }
533 
534  // Transform input tensor to the winograd domain
535  NEScheduler::get().schedule(_transform_input_kernel.get(), Window::DimX);
536 
537  //Run 16 GEMMs in multiple threads, each kernel runs one or more GEMMs
538  _gemm_function.run();
539 
540  // Transform output tensor to the spatial domain
541  NEScheduler::get().schedule(_transform_output_kernel.get(), Window::DimX);
542 
543  if(data_layout == DataLayout::NCHW)
544  {
545  // Reorder the convolved output to ACL's ordering NCHW
546  _permute_output.run();
547  }
548 
549  if(_is_activationlayer_enabled)
550  {
551  _activationlayer_function.run();
552  }
553 }

References arm_compute::test::validation::data_layout, ITensorInfo::data_layout(), Window::DimX, Scheduler::get(), ITensor::info(), arm_compute::NCHW, NEWinogradConvolutionLayer::prepare(), INESimpleFunctionNoBorder::run(), ICPPSimpleFunction::run(), NEGEMM::run(), and IScheduler::schedule().

◆ validate()

Status validate ( const ITensorInfo *  input,
const ITensorInfo *  weights,
const ITensorInfo *  biases,
const ITensorInfo *  output,
const PadStrideInfo &  conv_info,
const ActivationLayerInfo &  act_info = ActivationLayerInfo(),
bool  enable_fast_math = false 
)
static

Static function to check if given info will lead to a valid configuration of NEWinogradConvolutionLayer.

Parameters
[in]  input             Source tensor. 3 lower dimensions represent a single input [width, height, IFM], while every optional dimension from 4 and above represents a batch of inputs. Data types supported: F32.
[in]  weights           Weights tensor. Weights are a 4D tensor with dimensions [kernel_x, kernel_y, IFM, OFM]. Data type supported: Same as input. Currently only 3x3 and 5x5 kernels are supported.
[in]  biases            Biases tensor. Shared biases are supported. Biases are a 1D tensor with dimensions [OFM]. Data type supported: Same as weights.
[in]  output            Destination tensor. 3 lower dimensions represent a single output [width, height, OFM], while the rest represent a batch of outputs. Data types supported: Same as input.
[in]  conv_info         Contains padding and stride information described in PadStrideInfo. Currently only unit strides are supported.
[in]  act_info          (Optional) Activation layer information in case of a fused activation.
[in]  enable_fast_math  (Optional) Enable fast math computation. If this flag is set, the function may dispatch the fastest implementation available, which can introduce a drop in accuracy. Default is false.
Returns
a status
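
A hedged sketch of the usual validate-then-configure pattern; the tensors are the illustrative ones from the configure() example above:

    const PadStrideInfo conv_info(1, 1, 1, 1);
    const Status status = NEWinogradConvolutionLayer::validate(
        src.info(), weights.info(), biases.info(), dst.info(), conv_info);
    if(bool(status))
    {
        conv.configure(&src, &weights, &biases, &dst, conv_info);
    }
    else
    {
        // e.g. fall back to NEGEMMConvolutionLayer; status.error_description() explains the failure
    }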

Definition at line 555 of file NEWinogradConvolutionLayer.cpp.

557 {
558  ARM_COMPUTE_RETURN_ERROR_ON_NULLPTR(input, weights, output);
559  ARM_COMPUTE_RETURN_ON_ERROR(validate_arguments(input, weights, biases, output, conv_info));
560 
561  // Get indices for the width and height
562  const size_t idx_width = get_data_layout_dimension_index(input->data_layout(), DataLayoutDimension::WIDTH);
563  const size_t idx_height = get_data_layout_dimension_index(input->data_layout(), DataLayoutDimension::HEIGHT);
564 
565  // Input shape, kernel size and output tile
566  const Size2D input_dims = Size2D(input->dimension(idx_width), input->dimension(idx_height));
567  const Size2D kernel_size = Size2D(weights->dimension(idx_width), weights->dimension(idx_height));
568  const Size2D output_tile = winograd_output_tile(input_dims, kernel_size);
569 
570  // Check if the Winograd configuration requires fast math
571  if(!enable_fast_math)
572  {
573  ARM_COMPUTE_RETURN_ERROR_ON_MSG(check_support_fast_math(output_tile, kernel_size), "This Winograd configuration requires enable_fast_math=true");
574  }
575 
576  const WinogradInfo winograd_info = WinogradInfo(output_tile,
577  kernel_size,
578  input_dims,
579  conv_info,
580  input->data_layout());
581 
582  // Validate input transform
583  const TensorShape input0_shape = misc::shape_calculator::compute_winograd_input_transform_shape(*input, winograd_info);
584  const TensorInfo input0 = input->clone()->set_tensor_shape(input0_shape);
585  // Validate filter transform
586  const TensorShape input1_shape = misc::shape_calculator::compute_winograd_filter_transform_shape(*weights, winograd_info);
587  const TensorInfo input1 = weights->clone()->set_tensor_shape(input1_shape);
588  // Validate batched matrix multiply
589  TensorShape batched_mm_output_shape = input0.tensor_shape();
590  batched_mm_output_shape[0] = input1.tensor_shape()[0];
591  const TensorInfo batched_mm_output = input0.clone()->set_tensor_shape(batched_mm_output_shape);
592 
593  if(kernel_size == Size2D(3, 3))
594  {
595  ARM_COMPUTE_RETURN_ERROR_ON_MSG(conv_info.pad_top() != 0u && conv_info.pad_top() != 1, "Only SAME or VALID padding supported");
596  ARM_COMPUTE_RETURN_ERROR_ON_MSG(conv_info.pad_bottom() != 0u && conv_info.pad_bottom() != 1, "Only SAME or VALID padding supported");
597  ARM_COMPUTE_RETURN_ERROR_ON_MSG(conv_info.pad_left() != 0u && conv_info.pad_left() != 1, "Only SAME or VALID padding supported");
598  ARM_COMPUTE_RETURN_ERROR_ON_MSG(conv_info.pad_right() != 0u && conv_info.pad_right() != 1, "Only SAME or VALID padding supported");
599  ARM_COMPUTE_RETURN_ERROR_ON_MSG(conv_info.pad_right() != conv_info.pad_left(), "Only SAME or VALID padding supported");
600  ARM_COMPUTE_RETURN_ERROR_ON_MSG(conv_info.pad_top() != conv_info.pad_bottom(), "Only SAME or VALID padding supported");
601  ARM_COMPUTE_RETURN_ERROR_ON_MSG(conv_info.pad_top() != conv_info.pad_left(), "Only SAME or VALID padding supported");
602  return validate_kernel_3x3(input_dims, input, &input0, &input1, &batched_mm_output, weights, biases, output, winograd_info, act_info);
603  }
604  else if(kernel_size == Size2D(5, 5))
605  {
606  ARM_COMPUTE_RETURN_ERROR_ON_MSG(conv_info.pad_top() != 0u && conv_info.pad_top() != 2, "Only SAME or VALID padding supported");
607  ARM_COMPUTE_RETURN_ERROR_ON_MSG(conv_info.pad_left() != 0u && conv_info.pad_left() != 2, "Only SAME or VALID padding supported");
608  ARM_COMPUTE_RETURN_ERROR_ON_MSG(conv_info.pad_bottom() != 0u && conv_info.pad_bottom() != 2, "Only SAME or VALID padding supported");
609  ARM_COMPUTE_RETURN_ERROR_ON_MSG(conv_info.pad_right() != 0u && conv_info.pad_right() != 2, "Only SAME or VALID padding supported");
610  ARM_COMPUTE_RETURN_ERROR_ON_MSG(conv_info.pad_right() != conv_info.pad_left(), "Only SAME or VALID padding supported");
611  ARM_COMPUTE_RETURN_ERROR_ON_MSG(conv_info.pad_top() != conv_info.pad_bottom(), "Only SAME or VALID padding supported");
612  ARM_COMPUTE_RETURN_ERROR_ON_MSG(conv_info.pad_top() != conv_info.pad_left(), "Only SAME or VALID padding supported");
613  return validate_kernel_5x5(input, &input0, &input1, &batched_mm_output, weights, biases, output, winograd_info, act_info);
614  }
615  if(kernel_size == Size2D(3, 1))
616  {
617  ARM_COMPUTE_RETURN_ERROR_ON_MSG(conv_info.pad_left() != 0u && conv_info.pad_left() != 1, "Only SAME or VALID padding supported");
618  ARM_COMPUTE_RETURN_ERROR_ON_MSG(conv_info.pad_right() != 0u && conv_info.pad_right() != 1, "Only SAME or VALID padding supported");
619  ARM_COMPUTE_RETURN_ERROR_ON_MSG(conv_info.pad_top() != 0u && conv_info.pad_bottom() != 0, "Only SAME or VALID padding supported");
620  return validate_kernel_3x1(input, &input0, &input1, &batched_mm_output, weights, biases, output, winograd_info, act_info);
621  }
622  else if(kernel_size == Size2D(1, 3))
623  {
624  ARM_COMPUTE_RETURN_ERROR_ON_MSG(conv_info.pad_top() != 0u && conv_info.pad_top() != 1, "Only SAME or VALID padding supported");
625  ARM_COMPUTE_RETURN_ERROR_ON_MSG(conv_info.pad_bottom() != 0u && conv_info.pad_bottom() != 1, "Only SAME or VALID padding supported");
626  ARM_COMPUTE_RETURN_ERROR_ON_MSG(conv_info.pad_left() != 0u && conv_info.pad_right() != 0, "Only SAME or VALID padding supported");
627  return validate_kernel_1x3(input, &input0, &input1, &batched_mm_output, weights, biases, output, winograd_info, act_info);
628  }
629  else if(kernel_size == Size2D(5, 1))
630  {
631  ARM_COMPUTE_RETURN_ERROR_ON_MSG(conv_info.pad_left() != 0u && conv_info.pad_left() != 2, "Only SAME or VALID padding supported");
632  ARM_COMPUTE_RETURN_ERROR_ON_MSG(conv_info.pad_right() != 0u && conv_info.pad_right() != 2, "Only SAME or VALID padding supported");
633  ARM_COMPUTE_RETURN_ERROR_ON_MSG(conv_info.pad_top() != 0u && conv_info.pad_bottom() != 0, "Only SAME or VALID padding supported");
634  return validate_kernel_5x1(input, &input0, &input1, &batched_mm_output, weights, biases, output, winograd_info, act_info);
635  }
636  else if(kernel_size == Size2D(1, 5))
637  {
638  ARM_COMPUTE_RETURN_ERROR_ON_MSG(conv_info.pad_top() != 0u && conv_info.pad_top() != 2, "Only SAME or VALID padding supported");
639  ARM_COMPUTE_RETURN_ERROR_ON_MSG(conv_info.pad_bottom() != 0u && conv_info.pad_bottom() != 2, "Only SAME or VALID padding supported");
640  ARM_COMPUTE_RETURN_ERROR_ON_MSG(conv_info.pad_left() != 0u && conv_info.pad_right() != 0, "Only SAME or VALID padding supported");
641  return validate_kernel_1x5(input, &input0, &input1, &batched_mm_output, weights, biases, output, winograd_info, act_info);
642  }
643  else if(kernel_size == Size2D(7, 1))
644  {
645  ARM_COMPUTE_RETURN_ERROR_ON_MSG(conv_info.pad_left() != 0u && conv_info.pad_left() != 3, "Only SAME or VALID padding supported");
646  ARM_COMPUTE_RETURN_ERROR_ON_MSG(conv_info.pad_right() != 0u && conv_info.pad_right() != 3, "Only SAME or VALID padding supported");
647  ARM_COMPUTE_RETURN_ERROR_ON_MSG(conv_info.pad_top() != 0u && conv_info.pad_bottom() != 0, "Only SAME or VALID padding supported");
648  return validate_kernel_7x1(input, &input0, &input1, &batched_mm_output, weights, biases, output, winograd_info, act_info);
649  }
650  else if(kernel_size == Size2D(1, 7))
651  {
652  ARM_COMPUTE_RETURN_ERROR_ON_MSG(conv_info.pad_top() != 0u && conv_info.pad_top() != 3, "Only SAME or VALID padding supported");
653  ARM_COMPUTE_RETURN_ERROR_ON_MSG(conv_info.pad_bottom() != 0u && conv_info.pad_bottom() != 3, "Only SAME or VALID padding supported");
654  ARM_COMPUTE_RETURN_ERROR_ON_MSG(conv_info.pad_left() != 0u && conv_info.pad_right() != 0, "Only SAME or VALID padding supported");
655  return validate_kernel_1x7(input, &input0, &input1, &batched_mm_output, weights, biases, output, winograd_info, act_info);
656  }
657  else
658  {
659  ARM_COMPUTE_RETURN_ERROR_MSG("Kernel shape not supported");
660  }
661 }
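
The padding checks above encode the SAME-or-VALID restriction. A hedged worked example for a 3x3 kernel with unit strides (values illustrative):

    // VALID: no padding -> each spatial dimension of the output shrinks by kernel - 1 = 2.
    const PadStrideInfo valid_info(1, 1, 0, 0);
    // SAME: pad by (kernel - 1) / 2 = 1 on every side -> the output keeps the input size.
    const PadStrideInfo same_info(1, 1, 1, 1);
    // Any other (or asymmetric) padding fails the checks above.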

References arm_compute::test::validation::act_info, ARM_COMPUTE_RETURN_ERROR_MSG, ARM_COMPUTE_RETURN_ERROR_ON_MSG, ARM_COMPUTE_RETURN_ERROR_ON_NULLPTR, ARM_COMPUTE_RETURN_ON_ERROR, ICloneable< T >::clone(), TensorInfo::clone(), arm_compute::misc::shape_calculator::compute_winograd_filter_transform_shape(), arm_compute::misc::shape_calculator::compute_winograd_input_transform_shape(), arm_compute::test::validation::conv_info, ITensorInfo::data_layout(), ITensorInfo::dimension(), arm_compute::get_data_layout_dimension_index(), arm_compute::HEIGHT, TensorInfo::tensor_shape(), arm_compute::test::validation::weights, arm_compute::WIDTH, and arm_compute::test::validation::winograd_info.

Referenced by NEConvolutionLayer::get_convolution_method(), and NEConvolutionLayer::validate().


The documentation for this class was generated from the following files:

NEWinogradConvolutionLayer.h
NEWinogradConvolutionLayer.cpp