Compute Library 19.11
NEWinogradConvolutionLayer Class Reference

Basic function to simulate a convolution layer. More...

#include <NEWinogradConvolutionLayer.h>

Collaboration diagram for NEWinogradConvolutionLayer: [diagram not shown]

Public Member Functions

 NEWinogradConvolutionLayer (const std::shared_ptr< IMemoryManager > &memory_manager=nullptr)
 Constructor. More...
 
void configure (const ITensor *input, const ITensor *weights, const ITensor *biases, ITensor *output, const PadStrideInfo &conv_info, const ActivationLayerInfo &act_info=ActivationLayerInfo(), bool enable_fast_math=false)
 Set the input and output tensors. More...
 
void run () override
 Run the kernels contained in the function. More...
 
void prepare () override
 Prepare the function for executing. More...
 
 NEWinogradConvolutionLayer (const NEWinogradConvolutionLayer &)=delete
 Prevent instances of this class from being copied (As this class contains pointers) More...
 
NEWinogradConvolutionLayer & operator= (const NEWinogradConvolutionLayer &)=delete
 Prevent instances of this class from being copied (As this class contains pointers) More...
 
- Public Member Functions inherited from IFunction
virtual ~IFunction ()=default
 Destructor. More...
 

Static Public Member Functions

static Status validate (const ITensorInfo *input, const ITensorInfo *weights, const ITensorInfo *biases, const ITensorInfo *output, const PadStrideInfo &conv_info, const ActivationLayerInfo &act_info=ActivationLayerInfo(), bool enable_fast_math=false)
 Static function to check if given info will lead to a valid configuration of NEWinogradConvolutionLayer. More...
 

Detailed Description

Basic function to simulate a convolution layer.

This function calls the following NEON kernels:

  1. NEWinogradLayerTransformWeightsKernel (executed only once, in the first call to the run() method)
  2. NEWinogradLayerTransformInputKernel
  3. NEWinogradLayerTransformOutputKernel
  4. NEGEMMAssemblyDispatch
  5. CPPPermute (three times: weights, input and output)
Note
Some Winograd configurations (i.e. F(2x2, 5x5), F(4x4, 5x5)) are supported only with enable_fast_math = true

Definition at line 52 of file NEWinogradConvolutionLayer.h.
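
The snippet below is a minimal usage sketch, not taken from the library documentation: it assumes NCHW F32 tensors with illustrative shapes (56x56 spatial, 64 channels, 3x3 kernel), SAME padding and unit strides, and shows the usual configure / allocate / run sequence.

#include "arm_compute/runtime/NEON/functions/NEWinogradConvolutionLayer.h"
#include "arm_compute/runtime/Tensor.h"

using namespace arm_compute;

void winograd_convolution_example()
{
    Tensor src{}, weights{}, biases{}, dst{};

    // NCHW layout, F32: TensorShape is [width, height, channels, batches] (illustrative values)
    src.allocator()->init(TensorInfo(TensorShape(56U, 56U, 64U, 1U), 1, DataType::F32));
    weights.allocator()->init(TensorInfo(TensorShape(3U, 3U, 64U, 64U), 1, DataType::F32));
    biases.allocator()->init(TensorInfo(TensorShape(64U), 1, DataType::F32));
    dst.allocator()->init(TensorInfo(TensorShape(56U, 56U, 64U, 1U), 1, DataType::F32));

    NEWinogradConvolutionLayer conv{};
    // Unit strides; pad 1 on each side gives SAME padding for a 3x3 kernel
    conv.configure(&src, &weights, &biases, &dst, PadStrideInfo(1, 1, 1, 1));

    // Backing memory is allocated after configuration
    src.allocator()->allocate();
    weights.allocator()->allocate();
    biases.allocator()->allocate();
    dst.allocator()->allocate();

    // ... fill src, weights and biases ...

    conv.run(); // the first call also runs prepare(), which transforms the weights once
}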

Constructor & Destructor Documentation

◆ NEWinogradConvolutionLayer() [1/2]

NEWinogradConvolutionLayer ( const std::shared_ptr< IMemoryManager > &  memory_manager = nullptr)

Constructor.

Definition at line 263 of file NEWinogradConvolutionLayer.cpp.

264  : _memory_group(memory_manager), _gemm_function(memory_manager), _transform_input_kernel(nullptr), _transform_output_kernel(nullptr), _transform_weights_kernel(nullptr), _activationlayer_function(),
265  _permute_input(), _permute_weights(), _permute_output(), _input_transformed(), _output_transformed(), _input_workspace(), _output_workspace(), _kernel_storage(), _input_nhwc(), _output_nhwc(),
266  _weights_hwio(), _input(), _weights(), _output(), _is_prepared(false), _is_activationlayer_enabled(false)
267 {
268 }

◆ NEWinogradConvolutionLayer() [2/2]

NEWinogradConvolutionLayer (const NEWinogradConvolutionLayer &)  [delete]

Prevent instances of this class from being copied (As this class contains pointers)

Member Function Documentation

◆ configure()

void configure (const ITensor *input,
                const ITensor *weights,
                const ITensor *biases,
                ITensor *output,
                const PadStrideInfo &conv_info,
                const ActivationLayerInfo &act_info = ActivationLayerInfo(),
                bool enable_fast_math = false)

Set the input and output tensors.

Parameters
[in]  input             Source tensor. The 3 lower dimensions represent a single input [width, height, IFM], while every optional dimension from 4 and above represents a batch of inputs. Data types supported: F32.
[in]  weights           Weights tensor. Weights are a 4D tensor with dimensions [kernel_x, kernel_y, IFM, OFM]. Data type supported: Same as input. Currently only 3x3 and 5x5 kernels are supported.
[in]  biases            Biases tensor. Shared biases supported. Biases are a 1D tensor with dimensions [OFM]. Data type supported: Same as weights.
[out] output            Destination tensor. The 3 lower dimensions represent a single output [width, height, OFM], while the rest represent a batch of outputs. Data types supported: Same as input.
[in]  conv_info         Contains padding and stride information described in PadStrideInfo. Currently only unit strides are supported.
[in]  act_info          (Optional) Activation layer information in case of a fused activation.
[in]  enable_fast_math  (Optional) Enable fast math computation. If this flag is set, the function may dispatch the fastest implementation available, which can also introduce a drop in accuracy. Default is false.
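
For a 5x5 kernel the note in the detailed description applies: below is a hedged sketch (tensors assumed to be initialised as in the earlier example, names illustrative) showing the optional arguments, fusing a ReLU activation and enabling fast math.

// Sketch only: src, weights, biases and dst are assumed to be initialised elsewhere.
ActivationLayerInfo relu(ActivationLayerInfo::ActivationFunction::RELU);

NEWinogradConvolutionLayer conv{};
conv.configure(&src, &weights, &biases, &dst,
               PadStrideInfo(1, 1, 2, 2),      // unit strides, pad 2 per side (SAME) for a 5x5 kernel
               relu,                           // fused activation
               /* enable_fast_math = */ true); // required for the 5x5 Winograd configurations listed in the note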

Definition at line 270 of file NEWinogradConvolutionLayer.cpp.

272 {
274  ARM_COMPUTE_ERROR_THROW_ON(validate_arguments(input->info(), weights->info(), (biases != nullptr) ? biases->info() : nullptr, output->info(), conv_info));
275 
276  // Get indices for the width and height
277  const DataLayout data_layout = input->info()->data_layout();
278  const unsigned int width_idx   = get_data_layout_dimension_index(data_layout, DataLayoutDimension::WIDTH);
279  const unsigned int height_idx  = get_data_layout_dimension_index(data_layout, DataLayoutDimension::HEIGHT);
280  const unsigned int channel_idx = get_data_layout_dimension_index(data_layout, DataLayoutDimension::CHANNEL);
281 
282  const Size2D input_dims = Size2D(input->info()->dimension(width_idx), input->info()->dimension(height_idx));
283  const Size2D kernel_size = Size2D(weights->info()->dimension(width_idx), weights->info()->dimension(height_idx));
284  const Size2D output_tile = winograd_output_tile(input_dims, kernel_size);
285 
286 
287 
288  // Check if the Winograd configuration requires fast math
289  if(!enable_fast_math)
290  {
291  ARM_COMPUTE_ERROR_ON_MSG(check_support_fast_math(output_tile, kernel_size), "This Winograd configuration requires enable_fast_math=true");
292  }
293 
294  _weights = weights;
295  _input = input;
296  _output = output;
297  _is_prepared = false;
298 
299  std::unique_ptr<INEWinogradLayerTransformInputKernel<float>> transform_input_kernel;
300  std::unique_ptr<INEWinogradLayerTransformWeightsKernel<float>> transform_weights_kernel;
301  std::unique_ptr<INEWinogradLayerTransformOutputKernel<float>> transform_output_kernel;
302 
303  int n_gemms = 0;
304  int N_BLOCK = 0; // Size of block used by GEMM.
305 
306  if(kernel_size == Size2D(3, 3))
307  {
308  if(input->info()->dimension(width_idx) > 4 && input->info()->dimension(height_idx) > 4)
309  {
310  using config = NEWinogradLayerConfiguration<float, float, 4, 4, 3, 3>;
311  transform_input_kernel = support::cpp14::make_unique<config::TransformInputKernel>();
312  transform_weights_kernel = support::cpp14::make_unique<config::TransformWeightsKernel>();
313  transform_output_kernel = support::cpp14::make_unique<config::TransformOutputKernel>();
314  n_gemms = config::WinogradBase::N_GEMMS;
315  N_BLOCK = config::WinogradConv::N_BLOCK;
316  }
317  else
318  {
319  using config = NEWinogradLayerConfiguration<float, float, 2, 2, 3, 3>;
320  transform_input_kernel = support::cpp14::make_unique<config::TransformInputKernel>();
321  transform_weights_kernel = support::cpp14::make_unique<config::TransformWeightsKernel>();
322  transform_output_kernel = support::cpp14::make_unique<config::TransformOutputKernel>();
323  n_gemms = config::WinogradBase::N_GEMMS;
324  N_BLOCK = config::WinogradConv::N_BLOCK;
325  }
326  }
327  else if(kernel_size == Size2D(5, 5))
328  {
329  using config = NEWinogradLayerConfiguration<float, float, 2, 2, 5, 5>;
330  transform_input_kernel = support::cpp14::make_unique<config::TransformInputKernel>();
331  transform_weights_kernel = support::cpp14::make_unique<config::TransformWeightsKernel>();
332  transform_output_kernel = support::cpp14::make_unique<config::TransformOutputKernel>();
333  n_gemms = config::WinogradBase::N_GEMMS;
334  N_BLOCK = config::WinogradConv::N_BLOCK;
335  }
336  else if(kernel_size == Size2D(1, 3))
337  {
338  using config = NEWinogradLayerConfiguration<float, float, 6, 1, 3, 1>;
339  transform_input_kernel = support::cpp14::make_unique<config::TransformInputKernel>();
340  transform_weights_kernel = support::cpp14::make_unique<config::TransformWeightsKernel>();
341  transform_output_kernel = support::cpp14::make_unique<config::TransformOutputKernel>();
342  n_gemms = config::WinogradBase::N_GEMMS;
343  N_BLOCK = config::WinogradConv::N_BLOCK;
344  }
345  else if(kernel_size == Size2D(3, 1))
346  {
347  using config = NEWinogradLayerConfiguration<float, float, 1, 6, 1, 3>;
348  transform_input_kernel = support::cpp14::make_unique<config::TransformInputKernel>();
349  transform_weights_kernel = support::cpp14::make_unique<config::TransformWeightsKernel>();
350  transform_output_kernel = support::cpp14::make_unique<config::TransformOutputKernel>();
351  n_gemms = config::WinogradBase::N_GEMMS;
352  N_BLOCK = config::WinogradConv::N_BLOCK;
353  }
354  else if(kernel_size == Size2D(1, 5))
355  {
356  using config = NEWinogradLayerConfiguration<float, float, 4, 1, 5, 1>;
357  transform_input_kernel = support::cpp14::make_unique<config::TransformInputKernel>();
358  transform_weights_kernel = support::cpp14::make_unique<config::TransformWeightsKernel>();
359  transform_output_kernel = support::cpp14::make_unique<config::TransformOutputKernel>();
360  n_gemms = config::WinogradBase::N_GEMMS;
361  N_BLOCK = config::WinogradConv::N_BLOCK;
362  }
363  else if(kernel_size == Size2D(5, 1))
364  {
365  using config = NEWinogradLayerConfiguration<float, float, 1, 4, 1, 5>;
366  transform_input_kernel = support::cpp14::make_unique<config::TransformInputKernel>();
367  transform_weights_kernel = support::cpp14::make_unique<config::TransformWeightsKernel>();
368  transform_output_kernel = support::cpp14::make_unique<config::TransformOutputKernel>();
369  n_gemms = config::WinogradBase::N_GEMMS;
370  N_BLOCK = config::WinogradConv::N_BLOCK;
371  }
372  else if(kernel_size == Size2D(1, 7))
373  {
374  using config = NEWinogradLayerConfiguration<float, float, 2, 1, 7, 1>;
375  transform_input_kernel = support::cpp14::make_unique<config::TransformInputKernel>();
376  transform_weights_kernel = support::cpp14::make_unique<config::TransformWeightsKernel>();
377  transform_output_kernel = support::cpp14::make_unique<config::TransformOutputKernel>();
378  n_gemms = config::WinogradBase::N_GEMMS;
379  N_BLOCK = config::WinogradConv::N_BLOCK;
380  }
381  else if(kernel_size == Size2D(7, 1))
382  {
383  using config = NEWinogradLayerConfiguration<float, float, 1, 2, 1, 7>;
384  transform_input_kernel = support::cpp14::make_unique<config::TransformInputKernel>();
385  transform_weights_kernel = support::cpp14::make_unique<config::TransformWeightsKernel>();
386  transform_output_kernel = support::cpp14::make_unique<config::TransformOutputKernel>();
387  n_gemms = config::WinogradBase::N_GEMMS;
388  N_BLOCK = config::WinogradConv::N_BLOCK;
389  }
390  else
391  {
392  ARM_COMPUTE_ERROR("Not supported.");
393  }
394 
395  const PaddingType use_padding_type = (conv_info.pad_top() != 0u || conv_info.pad_left() != 0) ? PADDING_SAME : PADDING_VALID;
396  const bool use_same_padding = use_padding_type == PADDING_SAME;
397 
398  // Get convolved dimensions
399  const int in_channels = input->info()->dimension(channel_idx);
400  const int out_channels = output->info()->dimension(channel_idx);
401 
402  const Tensor4DShape in_shape(internal_get_input_shape(input));
403  const DataType data_type = input->info()->data_type();
404  const size_t data_type_size = input->info()->element_size();
405  // Get the memory required to instantiate a new Winograd operator.
406  constexpr size_t storage_alignment = 64;
407 
408  // Kernel Storage
409  const size_t kernel_storage_size = transform_weights_kernel->get_weight_storage_size(out_channels,
410  in_channels)
411  * data_type_size;
412 
413  // Input storage
414  const size_t input_storage_size = transform_input_kernel->get_input_storage_size(in_shape.n_batches, in_shape.n_channels, in_shape.n_rows, in_shape.n_cols,
415  use_same_padding)
416  * data_type_size;
417 
418  // Output storage
419  const size_t output_storage_size = transform_output_kernel->get_output_storage_size(in_shape.n_batches, in_shape.n_rows, in_shape.n_cols, out_channels) * data_type_size;
420  const int kernel_matrix_stride = transform_weights_kernel->get_matrix_stride(out_channels, in_channels);
421  const int output_matrix_stride = transform_output_kernel->get_matrix_stride(in_shape.n_batches, in_shape.n_rows, in_shape.n_cols, out_channels);
422  const auto output_shape = transform_output_kernel->get_output_shape(in_shape.n_rows, in_shape.n_cols, use_padding_type == PADDING_SAME);
423  const int input_matrix_stride = transform_input_kernel->get_matrix_stride(in_shape.n_batches, in_channels, in_shape.n_rows, in_shape.n_cols, use_padding_type == PADDING_SAME);
424 
425  // Configure GEMM
426  const int tile_rows = iceildiv(output_shape.first, output_tile.height);
427  const int tile_cols = iceildiv(output_shape.second, output_tile.width);
428  const int m = in_shape.n_batches * tile_rows * tile_cols;
429  const int k = in_shape.n_channels;
430  const int n = out_channels;
431  const int kernel_matrix_row_stride = roundup(out_channels, N_BLOCK);
432  const int output_matrix_row_stride = kernel_matrix_row_stride;
433 
434  TensorShape a_shape(k, m, 1, n_gemms);
435  Strides a_strides(data_type_size);
436  a_strides.set(1, a_strides[0] * k);
437  //a_strides.set(2, data_type_size * input_matrix_stride / n_gemms); FIXME: This is the real batch size, but RSH's code crashes if it's not 0.
438  a_strides.set(2, 0);
439  a_strides.set(3, data_type_size * input_matrix_stride);
440 
441  TensorShape b_shape(n, k, n_gemms);
442  Strides b_strides(data_type_size);
443  b_strides.set(1, data_type_size * kernel_matrix_row_stride);
444  b_strides.set(2, data_type_size * kernel_matrix_stride);
445 
446  TensorShape d_shape(n, m, 1, n_gemms);
447  Strides d_strides(data_type_size);
448  d_strides.set(1, data_type_size * output_matrix_row_stride);
449  //d_strides.set(2, data_type_size * output_matrix_stride / n_gemms); FIXME: This is the real batch size, but RSH's code crashes if it's not 0.
450  d_strides.set(2, 0);
451  d_strides.set(3, data_type_size * output_matrix_stride);
452 
453  TensorInfo a_info{};
454  TensorInfo b_info{};
455  TensorInfo d_info{};
456  a_info.init(a_shape, 1, data_type, a_strides, 0, input_storage_size);
457  b_info.init(b_shape, 1, data_type, b_strides, 0, kernel_storage_size);
458  d_info.init(d_shape, 1, data_type, d_strides, 0, output_storage_size);
459 
460  _input_transformed.allocator()->init(a_info, storage_alignment);
461  _kernel_storage.allocator()->init(b_info, storage_alignment);
462  _output_transformed.allocator()->init(d_info, storage_alignment);
463 
464  // configure and allocate dst tensor to be used to convert from winograd domain to spatial domain when calling to reshape_output()
465  TensorInfo info(TensorShape(_output->info()->dimension(2), _output->info()->dimension(0),
466  _output->info()->dimension(1), _output->info()->dimension(3)),
467  1, _output->info()->data_type());
468  _output_nhwc.allocator()->init(info);
469 
470  const ITensor *input_to_use = _input;
471  ITensor *output_to_use = _output;
472  PermutationVector weights_permutation_vector(3U, 0U, 1U, 2U);
473  const unsigned int max_num_threads = NEScheduler::get().num_threads();
474 
475  // Configure the kernel to transform the input tensor from NCHW -> NHWC
476  if(data_layout == DataLayout::NCHW)
477  {
478  _memory_group.manage(&_input_nhwc);
479  _permute_input.configure(input, &_input_nhwc, PermutationVector(2U, 0U, 1U));
480  input_to_use = &_input_nhwc;
481  weights_permutation_vector = PermutationVector(3U, 2U, 0U, 1U);
482  }
483 
484  // Configure input transform kernel
485  _memory_group.manage(&_input_transformed);
486  _memory_group.manage(&_input_workspace);
487  transform_input_kernel->configure(input_to_use, in_shape.n_batches, in_shape.n_rows, in_shape.n_cols, in_shape.n_channels, use_padding_type,
488  &_input_transformed, input_matrix_stride, &_input_workspace);
489  const size_t input_workspace_size = transform_input_kernel->get_working_space_size(max_num_threads);
490  TensorInfo input_workspace_info(TensorShape(input_workspace_size), 1, _input->info()->data_type());
491  _input_workspace.allocator()->init(input_workspace_info);
492  _input_workspace.allocator()->allocate();
493  if(data_layout == DataLayout::NCHW)
494  {
495  _input_nhwc.allocator()->allocate();
496  }
497 
498  // Re-order a weight tensor from [Output feature map x Input feature map x Height x Width] to [Height x Width x Input feature map x Output feature map]
499  _permute_weights.configure(weights, &_weights_hwio, weights_permutation_vector);
500  transform_weights_kernel->configure(&_weights_hwio, &_kernel_storage, kernel_matrix_stride, out_channels, in_channels);
501 
502  // Configure GEMM function
503  _memory_group.manage(&_output_transformed);
504  _gemm_function.configure(&_input_transformed, &_kernel_storage, nullptr, &_output_transformed, 1.0f, 0.f);
505  _input_transformed.allocator()->allocate();
506 
507  // Configure output transform function
508  // The biases tensor has not been allocated at this point in time, the output transform will add the biases to the final result in the run() method
509  if(data_layout == DataLayout::NCHW)
510  {
511  _memory_group.manage(&_output_nhwc);
512  output_to_use = &_output_nhwc;
513  }
514  const arm_gemm::Activation activation = arm_gemm_activation_from_acl_activation(act_info);
515 
516  transform_output_kernel->configure(biases,
517  &_output_transformed,
518  output_matrix_stride,
519  output_to_use,
520  in_shape.n_batches,
521  output_shape.first,
522  output_shape.second,
523  out_channels,
524  &_output_workspace,
525  activation);
526 
527  const size_t output_workspace_size = transform_output_kernel->get_working_space_size(max_num_threads);
528  TensorInfo output_workspace_info(TensorShape(output_workspace_size), 1, _output->info()->data_type());
529  _output_workspace.allocator()->init(output_workspace_info);
530  _output_workspace.allocator()->allocate();
531  _output_transformed.allocator()->allocate();
532 
533  // Reorder the convoluted output to ACL's ordering NCHW
534  if(data_layout == DataLayout::NCHW)
535  {
536  _permute_output.configure(&_output_nhwc, _output, PermutationVector(1U, 2U, 0U));
537  _output_nhwc.allocator()->allocate();
538  }
539 
540  _transform_input_kernel = std::move(transform_input_kernel);
541  _transform_weights_kernel = std::move(transform_weights_kernel);
542  _transform_output_kernel = std::move(transform_output_kernel);
543 
544  //Configure Activation Layer
545  _is_activationlayer_enabled = act_info.enabled() && ! fuse_function_supported(act_info);
546  if(_is_activationlayer_enabled)
547  {
548  _activationlayer_function.configure(_output, nullptr, act_info);
549  }
550 }

References arm_compute::test::validation::act_info, TensorAllocator::allocate(), Tensor::allocator(), ARM_COMPUTE_ERROR, ARM_COMPUTE_ERROR_ON_MSG, ARM_COMPUTE_ERROR_ON_NULLPTR, ARM_COMPUTE_ERROR_THROW_ON, arm_compute::CHANNEL, CPPPermute::configure(), NEActivationLayer::configure(), NEGEMM::configure(), arm_compute::test::validation::conv_info, arm_compute::test::validation::data_layout, arm_compute::test::validation::data_type, ITensorInfo::data_type(), ITensorInfo::dimension(), TensorInfo::dimension(), Scheduler::get(), arm_compute::get_data_layout_dimension_index(), arm_compute::HEIGHT, iceildiv(), ITensor::info(), CLTensor::info(), arm_compute::test::validation::info, TensorAllocator::init(), TensorInfo::init(), arm_compute::test::validation::input, MemoryGroup::manage(), arm_compute::NCHW, IScheduler::num_threads(), arm_compute::test::validation::output_shape, roundup(), Dimensions< T >::set(), arm_compute::U, arm_compute::test::validation::weights, and arm_compute::WIDTH.
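
As a worked illustration of the GEMM sizing in the listing above (all values assumed, not taken from the library): with the F(4x4, 3x3) configuration the batched GEMM dimensions follow directly from the output-tile grid.

// Assumed example: batch 1, 64 input channels, 64 output channels, 56x56 output plane.
const int n_batches    = 1;
const int in_channels  = 64;
const int out_channels = 64;
const int out_rows     = 56;
const int out_cols     = 56;

// F(4x4, 3x3): 4x4 output tiles, mirroring the tile_rows/tile_cols/m/k/n computation above
const int tile_rows = (out_rows + 3) / 4;        // iceildiv(56, 4) = 14
const int tile_cols = (out_cols + 3) / 4;        // 14
const int m = n_batches * tile_rows * tile_cols; // 196 rows per GEMM
const int k = in_channels;                       // 64
const int n = out_channels;                      // 64
// n_gemms is 36 (a 6x6 transformed tile) for this configuration, so 36 GEMMs of size 196x64x64 are batched.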

◆ operator=()

NEWinogradConvolutionLayer & operator= (const NEWinogradConvolutionLayer &)  [delete]

Prevent instances of this class from being copied (As this class contains pointers)

◆ prepare()

void prepare ()  [override, virtual]

Prepare the function for executing.

Any one off pre-processing step required by the function is handled here

Note
Prepare stage might not need all the function's buffers' backing memory to be available in order to execute

Reimplemented from IFunction.

Definition at line 695 of file NEWinogradConvolutionLayer.cpp.

696 {
697  if(!_is_prepared)
698  {
699  // Permute weights
700  _weights_hwio.allocator()->allocate();
701  _permute_weights.run();
702  _weights->mark_as_unused();
703 
704  // Transform weights
705  _kernel_storage.allocator()->allocate();
706  NEScheduler::get().schedule(_transform_weights_kernel.get(), Window::DimX);
707 
708  _weights_hwio.allocator()->free();
709  _is_prepared = true;
710  }
711 }

References TensorAllocator::allocate(), Tensor::allocator(), Window::DimX, TensorAllocator::free(), Scheduler::get(), ITensor::mark_as_unused(), ICPPSimpleFunction::run(), and IScheduler::schedule().

Referenced by NEWinogradConvolutionLayer::run().
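
A small sketch (the function object conv is assumed to come from a previous configure() call): prepare() can also be invoked explicitly so the one-off weight permutation and transform happen ahead of the first inference.

conv.prepare(); // permutes and transforms the weights once; the original weights are marked unused
// ... later, on the latency-critical path ...
conv.run();     // run() finds the function already prepared and skips the weight transform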

◆ run()

void run ()  [override, virtual]

Run the kernels contained in the function.

For NEON kernels:

  • Multi-threading is used for the kernels which are parallelisable.
  • By default std::thread::hardware_concurrency() threads are used.
Note
CPPScheduler::set_num_threads() can be used to manually set the number of threads

For OpenCL kernels:

  • All the kernels are enqueued on the queue associated with CLScheduler.
  • The queue is then flushed.
Note
The function will not block until the kernels are executed. It is the user's responsibility to wait.
Will call prepare() on first run if hasn't been done

Implements IFunction.

Definition at line 552 of file NEWinogradConvolutionLayer.cpp.

553 {
554  const DataLayout data_layout = _input->info()->data_layout();
555 
556  prepare();
557 
558  MemoryGroupResourceScope scope_mg(_memory_group);
559 
560  if(data_layout == DataLayout::NCHW)
561  {
562  //Bring channels to the front as Winograd code expects the tensor to be in the format NHWC
563  _permute_input.run();
564  }
565 
566  // Transform input tensor to the winograd domain
567  NEScheduler::get().schedule(_transform_input_kernel.get(), Window::DimX);
568 
569  //Run 16 GEMMs in multiple threads, each kernel runs one or more GEMMs
570  _gemm_function.run();
571 
572  // Transform output tensor to the spatial domain
573  NEScheduler::get().schedule(_transform_output_kernel.get(), Window::DimX);
574 
575  if(data_layout == DataLayout::NCHW)
576  {
577  // Reorder the convoluted output to ACL's ordering NCHW
578  _permute_output.run();
579  }
580 
581  if(_is_activationlayer_enabled )
582  {
583  _activationlayer_function.run();
584  }
585 }

References arm_compute::test::validation::data_layout, ITensorInfo::data_layout(), Window::DimX, Scheduler::get(), ITensor::info(), arm_compute::NCHW, NEWinogradConvolutionLayer::prepare(), ICPPSimpleFunction::run(), INESimpleFunctionNoBorder::run(), NEGEMM::run(), and IScheduler::schedule().
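
A hedged sketch of controlling the thread count before running (the count is illustrative; conv is assumed to have been configured earlier):

#include "arm_compute/runtime/NEON/NEScheduler.h"

arm_compute::NEScheduler::get().set_num_threads(4); // parallelisable kernels will use 4 threads
conv.run();                                         // prepare() is triggered on the first call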

◆ validate()

static Status validate (const ITensorInfo *input,
                        const ITensorInfo *weights,
                        const ITensorInfo *biases,
                        const ITensorInfo *output,
                        const PadStrideInfo &conv_info,
                        const ActivationLayerInfo &act_info = ActivationLayerInfo(),
                        bool enable_fast_math = false)

Static function to check if given info will lead to a valid configuration of NEWinogradConvolutionLayer.

Parameters
[in]  input             Source tensor. The 3 lower dimensions represent a single input [width, height, IFM], while every optional dimension from 4 and above represents a batch of inputs. Data types supported: F32.
[in]  weights           Weights tensor. Weights are a 4D tensor with dimensions [kernel_x, kernel_y, IFM, OFM]. Data type supported: Same as input. Currently only 3x3 and 5x5 kernels are supported.
[in]  biases            Biases tensor. Shared biases supported. Biases are a 1D tensor with dimensions [OFM]. Data type supported: Same as weights.
[in]  output            Destination tensor. The 3 lower dimensions represent a single output [width, height, OFM], while the rest represent a batch of outputs. Data types supported: Same as input.
[in]  conv_info         Contains padding and stride information described in PadStrideInfo. Currently only unit strides are supported.
[in]  act_info          (Optional) Activation layer information in case of a fused activation.
[in]  enable_fast_math  (Optional) Enable fast math computation. If this flag is set, the function may dispatch the fastest implementation available, which can also introduce a drop in accuracy. Default is false.
Returns
a status
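
A sketch of the usual pattern (tensor infos assumed to exist elsewhere; the fallback is only an example): call validate() before configure() and fall back to another convolution function if the configuration is rejected.

const Status status = NEWinogradConvolutionLayer::validate(src.info(), weights.info(), biases.info(),
                                                           dst.info(), PadStrideInfo(1, 1, 1, 1));
if(status.error_code() != ErrorCode::OK)
{
    // e.g. fall back to NEGEMMConvolutionLayer, or report status.error_description()
}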

Definition at line 587 of file NEWinogradConvolutionLayer.cpp.

589 {
591  ARM_COMPUTE_RETURN_ON_ERROR(validate_arguments(input, weights, biases, output, conv_info));
592 
593  // Get indices for the width and height
594  const size_t idx_width = get_data_layout_dimension_index(input->data_layout(), DataLayoutDimension::WIDTH);
595  const size_t idx_height = get_data_layout_dimension_index(input->data_layout(), DataLayoutDimension::HEIGHT);
596 
597  // Input shape, kernel size and output tile
598  const Size2D input_dims = Size2D(input->dimension(idx_width), input->dimension(idx_height));
599  const Size2D kernel_size = Size2D(weights->dimension(idx_width), weights->dimension(idx_height));
600  const Size2D output_tile = winograd_output_tile(input_dims, kernel_size);
601 
602  // Check if the Winograd configuration requires fast math
603  if(!enable_fast_math)
604  {
605  ARM_COMPUTE_RETURN_ERROR_ON_MSG(check_support_fast_math(output_tile, kernel_size), "This Winograd configuration requires enable_fast_math=true");
606  }
607 
608  const WinogradInfo winograd_info = WinogradInfo(output_tile,
609  kernel_size,
610  input_dims,
611  conv_info,
612  input->data_layout());
613 
614  // Validate input transform
615  const TensorShape input0_shape = misc::shape_calculator::compute_winograd_input_transform_shape(*input, winograd_info);
616  const TensorInfo input0 = input->clone()->set_tensor_shape(input0_shape);
617  // Validate filter transform
618  const TensorShape input1_shape = misc::shape_calculator::compute_winograd_filter_transform_shape(*weights, winograd_info);
619  const TensorInfo input1 = weights->clone()->set_tensor_shape(input1_shape);
620  // Validate batched matrix multiply
621  TensorShape batched_mm_output_shape = input0.tensor_shape();
622  batched_mm_output_shape[0] = input1.tensor_shape()[0];
623  const TensorInfo batched_mm_output = input0.clone()->set_tensor_shape(batched_mm_output_shape);
624 
625  if(kernel_size == Size2D(3, 3))
626  {
627  ARM_COMPUTE_RETURN_ERROR_ON_MSG(conv_info.pad_top() != 0u && conv_info.pad_top() != 1, "Only SAME or VALID padding supported");
628  ARM_COMPUTE_RETURN_ERROR_ON_MSG(conv_info.pad_bottom() != 0u && conv_info.pad_bottom() != 1, "Only SAME or VALID padding supported");
629  ARM_COMPUTE_RETURN_ERROR_ON_MSG(conv_info.pad_left() != 0u && conv_info.pad_left() != 1, "Only SAME or VALID padding supported");
630  ARM_COMPUTE_RETURN_ERROR_ON_MSG(conv_info.pad_right() != 0u && conv_info.pad_right() != 1, "Only SAME or VALID padding supported");
631  ARM_COMPUTE_RETURN_ERROR_ON_MSG(conv_info.pad_right() != conv_info.pad_left(), "Only SAME or VALID padding supported");
632  ARM_COMPUTE_RETURN_ERROR_ON_MSG(conv_info.pad_top() != conv_info.pad_bottom(), "Only SAME or VALID padding supported");
633  ARM_COMPUTE_RETURN_ERROR_ON_MSG(conv_info.pad_top() != conv_info.pad_left(), "Only SAME or VALID padding supported");
634  return validate_kernel_3x3(input_dims, input, &input0, &input1, &batched_mm_output, weights, biases, output, winograd_info, act_info);
635  }
636  else if(kernel_size == Size2D(5, 5))
637  {
638  ARM_COMPUTE_RETURN_ERROR_ON_MSG(conv_info.pad_top() != 0u && conv_info.pad_top() != 2, "Only SAME or VALID padding supported");
639  ARM_COMPUTE_RETURN_ERROR_ON_MSG(conv_info.pad_left() != 0u && conv_info.pad_left() != 2, "Only SAME or VALID padding supported");
640  ARM_COMPUTE_RETURN_ERROR_ON_MSG(conv_info.pad_bottom() != 0u && conv_info.pad_bottom() != 2, "Only SAME or VALID padding supported");
641  ARM_COMPUTE_RETURN_ERROR_ON_MSG(conv_info.pad_right() != 0u && conv_info.pad_right() != 2, "Only SAME or VALID padding supported");
642  ARM_COMPUTE_RETURN_ERROR_ON_MSG(conv_info.pad_right() != conv_info.pad_left(), "Only SAME or VALID padding supported");
643  ARM_COMPUTE_RETURN_ERROR_ON_MSG(conv_info.pad_top() != conv_info.pad_bottom(), "Only SAME or VALID padding supported");
644  ARM_COMPUTE_RETURN_ERROR_ON_MSG(conv_info.pad_top() != conv_info.pad_left(), "Only SAME or VALID padding supported");
645  return validate_kernel_5x5(input, &input0, &input1, &batched_mm_output, weights, biases, output, winograd_info, act_info);
646  }
647  if(kernel_size == Size2D(3, 1))
648  {
649  ARM_COMPUTE_RETURN_ERROR_ON_MSG(conv_info.pad_left() != 0u && conv_info.pad_left() != 1, "Only SAME or VALID padding supported");
650  ARM_COMPUTE_RETURN_ERROR_ON_MSG(conv_info.pad_right() != 0u && conv_info.pad_right() != 1, "Only SAME or VALID padding supported");
651  ARM_COMPUTE_RETURN_ERROR_ON_MSG(conv_info.pad_top() != 0u && conv_info.pad_bottom() != 0, "Only SAME or VALID padding supported");
652  return validate_kernel_3x1(input, &input0, &input1, &batched_mm_output, weights, biases, output, winograd_info, act_info);
653  }
654  else if(kernel_size == Size2D(1, 3))
655  {
656  ARM_COMPUTE_RETURN_ERROR_ON_MSG(conv_info.pad_top() != 0u && conv_info.pad_top() != 1, "Only SAME or VALID padding supported");
657  ARM_COMPUTE_RETURN_ERROR_ON_MSG(conv_info.pad_bottom() != 0u && conv_info.pad_bottom() != 1, "Only SAME or VALID padding supported");
658  ARM_COMPUTE_RETURN_ERROR_ON_MSG(conv_info.pad_left() != 0u && conv_info.pad_right() != 0, "Only SAME or VALID padding supported");
659  return validate_kernel_1x3(input, &input0, &input1, &batched_mm_output, weights, biases, output, winograd_info, act_info);
660  }
661  else if(kernel_size == Size2D(5, 1))
662  {
663  ARM_COMPUTE_RETURN_ERROR_ON_MSG(conv_info.pad_left() != 0u && conv_info.pad_left() != 2, "Only SAME or VALID padding supported");
664  ARM_COMPUTE_RETURN_ERROR_ON_MSG(conv_info.pad_right() != 0u && conv_info.pad_right() != 2, "Only SAME or VALID padding supported");
665  ARM_COMPUTE_RETURN_ERROR_ON_MSG(conv_info.pad_top() != 0u && conv_info.pad_bottom() != 0, "Only SAME or VALID padding supported");
666  return validate_kernel_5x1(input, &input0, &input1, &batched_mm_output, weights, biases, output, winograd_info, act_info);
667  }
668  else if(kernel_size == Size2D(1, 5))
669  {
670  ARM_COMPUTE_RETURN_ERROR_ON_MSG(conv_info.pad_top() != 0u && conv_info.pad_top() != 2, "Only SAME or VALID padding supported");
671  ARM_COMPUTE_RETURN_ERROR_ON_MSG(conv_info.pad_bottom() != 0u && conv_info.pad_bottom() != 2, "Only SAME or VALID padding supported");
672  ARM_COMPUTE_RETURN_ERROR_ON_MSG(conv_info.pad_left() != 0u && conv_info.pad_right() != 0, "Only SAME or VALID padding supported");
673  return validate_kernel_1x5(input, &input0, &input1, &batched_mm_output, weights, biases, output, winograd_info, act_info);
674  }
675  else if(kernel_size == Size2D(7, 1))
676  {
677  ARM_COMPUTE_RETURN_ERROR_ON_MSG(conv_info.pad_left() != 0u && conv_info.pad_left() != 3, "Only SAME or VALID padding supported");
678  ARM_COMPUTE_RETURN_ERROR_ON_MSG(conv_info.pad_right() != 0u && conv_info.pad_right() != 3, "Only SAME or VALID padding supported");
679  ARM_COMPUTE_RETURN_ERROR_ON_MSG(conv_info.pad_top() != 0u && conv_info.pad_bottom() != 0, "Only SAME or VALID padding supported");
680  return validate_kernel_7x1(input, &input0, &input1, &batched_mm_output, weights, biases, output, winograd_info, act_info);
681  }
682  else if(kernel_size == Size2D(1, 7))
683  {
684  ARM_COMPUTE_RETURN_ERROR_ON_MSG(conv_info.pad_top() != 0u && conv_info.pad_top() != 3, "Only SAME or VALID padding supported");
685  ARM_COMPUTE_RETURN_ERROR_ON_MSG(conv_info.pad_bottom() != 0u && conv_info.pad_bottom() != 3, "Only SAME or VALID padding supported");
686  ARM_COMPUTE_RETURN_ERROR_ON_MSG(conv_info.pad_left() != 0u && conv_info.pad_right() != 0, "Only SAME or VALID padding supported");
687  return validate_kernel_1x7(input, &input0, &input1, &batched_mm_output, weights, biases, output, winograd_info, act_info);
688  }
689  else
690  {
691  ARM_COMPUTE_RETURN_ERROR_MSG("Kernel shape not supported");
692  }
693 }

References arm_compute::test::validation::act_info, ARM_COMPUTE_RETURN_ERROR_MSG, ARM_COMPUTE_RETURN_ERROR_ON_MSG, ARM_COMPUTE_RETURN_ERROR_ON_NULLPTR, ARM_COMPUTE_RETURN_ON_ERROR, TensorInfo::clone(), arm_compute::misc::shape_calculator::compute_winograd_filter_transform_shape(), arm_compute::misc::shape_calculator::compute_winograd_input_transform_shape(), arm_compute::test::validation::conv_info, arm_compute::get_data_layout_dimension_index(), arm_compute::HEIGHT, arm_compute::test::validation::input, TensorInfo::tensor_shape(), arm_compute::test::validation::weights, arm_compute::WIDTH, and arm_compute::test::validation::winograd_info.

Referenced by NEConvolutionLayer::get_convolution_method(), and NEConvolutionLayer::validate().


The documentation for this class was generated from the following files:

  • NEWinogradConvolutionLayer.h
  • NEWinogradConvolutionLayer.cpp