Compute Library
 21.02
NEGEMMTranspose1xWKernel Class Reference

Neon kernel which transposes the elements of a matrix in chunks of 1xW, where W is equal to (16 / element size of the tensor).

#include <NEGEMMTranspose1xWKernel.h>


Public Member Functions

const char * name () const override
 Name of the kernel.
 
 NEGEMMTranspose1xWKernel ()=default
 Constructor.
 
 NEGEMMTranspose1xWKernel (const NEGEMMTranspose1xWKernel &)=delete
 Prevent instances of this class from being copied (as this class contains pointers).
 
NEGEMMTranspose1xWKernel & operator= (const NEGEMMTranspose1xWKernel &)=delete
 Prevent instances of this class from being copied (as this class contains pointers).
 
 NEGEMMTranspose1xWKernel (NEGEMMTranspose1xWKernel &&)=default
 Allow instances of this class to be moved.
 
NEGEMMTranspose1xWKernel & operator= (NEGEMMTranspose1xWKernel &&)=default
 Allow instances of this class to be moved.
 
 ~NEGEMMTranspose1xWKernel ()=default
 Default destructor.
 
void configure (const ITensor *input, ITensor *output)
 Initialise the kernel's input and output.
 
void run (const Window &window, const ThreadInfo &info) override
 Execute the kernel on the passed window.
 
Public Member Functions inherited from ICPPSimpleKernel
 ICPPSimpleKernel ()
 Constructor.
 
 ICPPSimpleKernel (const ICPPSimpleKernel &)=delete
 Prevent instances of this class from being copied (as this class contains pointers).
 
ICPPSimpleKernel & operator= (const ICPPSimpleKernel &)=delete
 Prevent instances of this class from being copied (as this class contains pointers).
 
 ICPPSimpleKernel (ICPPSimpleKernel &&)=default
 Allow instances of this class to be moved.
 
ICPPSimpleKernel & operator= (ICPPSimpleKernel &&)=default
 Allow instances of this class to be moved.
 
 ~ICPPSimpleKernel ()=default
 Default destructor.
 
Public Member Functions inherited from ICPPKernel
virtual ~ICPPKernel ()=default
 Default destructor.
 
virtual void run_nd (const Window &window, const ThreadInfo &info, const Window &thread_locator)
 Legacy compatibility layer for implementations which do not support thread_locator. In these cases we simply narrow the interface down to the legacy version.
 
virtual void run_op (ITensorPack &tensors, const Window &window, const ThreadInfo &info)
 Execute the kernel on the passed window.
 
Public Member Functions inherited from IKernel
 IKernel ()
 Constructor.
 
virtual ~IKernel ()=default
 Destructor.
 
virtual bool is_parallelisable () const
 Indicates whether or not the kernel is parallelisable.
 
virtual BorderSize border_size () const
 The size of the border for that kernel.
 
const Window & window () const
 The maximum window the kernel can be executed on.
 

Static Public Member Functions

static Status validate (const ITensorInfo *input, const ITensorInfo *output)
 Static function to check if given info will lead to a valid configuration of NEGEMMTranspose1xWKernel.
 

Detailed Description

Neon kernel which transposes the elements of a matrix in chunks of 1xW, where W is equal to (16 / element size of the tensor).

The following example shows how the 1xW transposition works when the input data type is F32:

\[ \left( \begin{array}{cccc} a00 & a01 & a02 & a03 \\ a10 & a11 & a12 & a13 \\ a20 & a21 & a22 & a23 \\ a30 & a31 & a32 & a33 \\ \end{array} \right) \rightarrow \left( \begin{array}{cccccccccccccccc} a00 & a01 & a02 & a03 & a10 & a11 & a12 & a13 & a20 & a21 & a22 & a23 & a30 & a31 & a32 & a33 \\ \end{array} \right) \]

The following example shows how the 1xW transposition works when the input data type is F16:

\[ \left( \begin{array}{cccccccc} a00 & a01 & a02 & a03 & a04 & a05 & a06 & a07 \\ a10 & a11 & a12 & a13 & a14 & a15 & a16 & a17 \\ a20 & a21 & a22 & a23 & a24 & a25 & a26 & a27 \\ a30 & a31 & a32 & a33 & a34 & a35 & a36 & a37 \\ \end{array} \right) \rightarrow \left( \begin{array}{cccccccccccccccccccccccccccccccc} a00 & a01 & a02 & a03 & a04 & a05 & a06 & a07 & a10 & a11 & a12 & a13 & a14 & a15 & a16 & a17 & a20 & a21 & a22 & a23 & a24 & a25 & a26 & a27 & a30 & a31 & a32 & a33 & a34 & a35 & a36 & a37 \\ \end{array} \right) \]
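The two examples above can be reproduced with a scalar reference routine. The following is a minimal sketch only, not the library's implementation: transpose_1xw_reference is a hypothetical helper that assumes a dense, row-major input held in a std::vector and derives W from sizeof(T).

```cpp
#include <cstddef>
#include <vector>

// Hypothetical reference helper (not part of Compute Library).
// Rearranges a dense, row-major matrix of `width` x `height` elements into
// the 1xW layout, where W = 16 / sizeof(T).
template <typename T>
std::vector<T> transpose_1xw_reference(const std::vector<T> &in, std::size_t width, std::size_t height)
{
    const std::size_t W          = 16 / sizeof(T);
    const std::size_t out_width  = height * W;
    const std::size_t out_height = (width + W - 1) / W; // ceil(width / W)

    // Zero-initialised output, so chunks past the input width stay padded with 0.
    std::vector<T> out(out_width * out_height, T(0));

    for(std::size_t y = 0; y < height; ++y)
    {
        for(std::size_t x = 0; x < width; ++x)
        {
            const std::size_t out_row = x / W;           // which 1xW chunk of the input row
            const std::size_t out_col = y * W + (x % W); // chunks of successive rows sit side by side
            out[out_row * out_width + out_col] = in[y * width + x];
        }
    }
    return out;
}
```

For a 4x4 F32 input this yields a single row of 16 elements in the original row order, matching the first example. A wider input produces one output row per 1xW chunk of the input width.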

Note
The output matrix will have the following shape: [ height * W, ceil(width / W) ], where W = (16 / element size of the tensor)
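As an illustration of the shape rule in the note, the sketch below computes the output dimensions from the input dimensions and the element size in bytes. Shape2D and transposed_1xw_shape are hypothetical names introduced here for illustration; the library derives the shape internally.

```cpp
#include <cstddef>

// Hypothetical 2D shape holder (not a Compute Library type).
struct Shape2D
{
    std::size_t width;
    std::size_t height;
};

// Hypothetical helper: output shape [ height * W, ceil(width / W) ],
// with W = 16 / element_size (element_size in bytes).
inline Shape2D transposed_1xw_shape(std::size_t width, std::size_t height, std::size_t element_size)
{
    const std::size_t W = 16 / element_size;
    return Shape2D{ height * W, (width + W - 1) / W };
}
```

With F32 (4-byte elements) W is 4, and with F16 (2-byte elements) W is 8, so both examples above collapse to a single output row.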

Definition at line 69 of file NEGEMMTranspose1xWKernel.h.

Constructor & Destructor Documentation

◆ NEGEMMTranspose1xWKernel() [1/3]

Constructor.

Referenced by NEGEMMTranspose1xWKernel::name().

◆ NEGEMMTranspose1xWKernel() [2/3]

Prevent instances of this class from being copied (As this class contains pointers)

◆ NEGEMMTranspose1xWKernel() [3/3]

Allow instances of this class to be moved.

◆ ~NEGEMMTranspose1xWKernel()

Default destructor.

Referenced by NEGEMMTranspose1xWKernel::name().

Member Function Documentation

◆ configure()

void configure ( const ITensor * input,
ITensor * output
)

Initialise the kernel's input and output.

Parameters
[in] input: Input tensor. Data types supported: All.
[out] output: Output tensor. Data type supported: same as input.

Definition at line 67 of file NEGEMMTranspose1xWKernel.cpp.

References ARM_COMPUTE_ERROR_ON_NULLPTR, ARM_COMPUTE_ERROR_THROW_ON, arm_compute::auto_init_if_empty(), arm_compute::calculate_max_window(), ITensorInfo::data_type(), ITensorInfo::element_size(), ITensor::info(), arm_compute::test::validation::input, ITensorInfo::num_dimensions(), Dimensions< T >::set_num_dimensions(), ITensorInfo::set_valid_region(), ITensorInfo::tensor_shape(), and arm_compute::validate_arguments().

Referenced by NEGEMMTranspose1xWKernel::name().

void NEGEMMTranspose1xWKernel::configure(const ITensor *input, ITensor *output)
{
    ARM_COMPUTE_ERROR_ON_NULLPTR(input, output);

    // Auto-initialize the output tensor if it has not been initialized yet
    auto_init_if_empty(*output->info(), get_output_shape(input->info()), 1, input->info()->data_type());

    // Perform validate step
    ARM_COMPUTE_ERROR_THROW_ON(validate_arguments(input->info(), output->info()));

    _input  = input;
    _output = output;

    // W: number of elements in one 16-byte chunk
    const size_t vector_size = 16 / input->info()->element_size();

    // Configure kernel window
    Window win = calculate_max_window(*input->info(), Steps(vector_size));

    // The whole output tensor is valid
    Coordinates coord;
    coord.set_num_dimensions(output->info()->num_dimensions());
    output->info()->set_valid_region(ValidRegion(coord, output->info()->tensor_shape()));

    INEKernel::configure(win);
}

◆ name()

◆ operator=() [1/2]

NEGEMMTranspose1xWKernel& operator= ( const NEGEMMTranspose1xWKernel & )
delete

Prevent instances of this class from being copied (As this class contains pointers)

Referenced by NEGEMMTranspose1xWKernel::name().

◆ operator=() [2/2]

Allow instances of this class to be moved.

◆ run()

void run ( const Window & window,
const ThreadInfo & info
)
override virtual

Execute the kernel on the passed window.

Warning
If is_parallelisable() returns false then the passed window must be equal to window()
Note
The window has to be a region within the window returned by the window() method
The width of the window has to be a multiple of num_elems_processed_per_iteration().
Parameters
[in] window: Region on which to execute the kernel. (Must be a region of the window returned by window())
[in] info: Info about executing thread and CPU.

Reimplemented from ICPPKernel.

Definition at line 99 of file NEGEMMTranspose1xWKernel.cpp.

References ARM_COMPUTE_ERROR_ON_INVALID_SUBWINDOW, ARM_COMPUTE_ERROR_ON_UNCONFIGURED_KERNEL, ARM_COMPUTE_UNUSED, Window::DimX, Window::DimY, arm_compute::execute_window_loop(), Iterator::ptr(), Window::set(), and IKernel::window().

Referenced by NEGEMMTranspose1xWKernel::name().

void NEGEMMTranspose1xWKernel::run(const Window &window, const ThreadInfo &info)
{
    ARM_COMPUTE_UNUSED(info);
    ARM_COMPUTE_ERROR_ON_UNCONFIGURED_KERNEL(this);
    ARM_COMPUTE_ERROR_ON_INVALID_SUBWINDOW(INESimpleKernel::window(), window);

    /*
     * The following example shows how the 1xW transposition works when the input data type is F32
     *
     *         |a00 a01 a02 a03|
     *         |a10 a11 a12 a13|
     *         |a20 a21 a22 a23| = | a00 a01 a02 a03 || a10 a11 a12 a13 || a20 a21 a22 a23 || a30 a31 a32 a33 |
     *         |a30 a31 a32 a33|
     *
     * The output matrix will have the following shape: [ height * W, ceil(width / W) ], where W = (16 / element size of the tensor)
     */

    // Set window for output tensor. Set to 0 the X and Y dimensions in order to allow multi-threading implementation and future batched matrix multiplications
    Window win_out(window);
    win_out.set(Window::DimX, Window::Dimension(0, 0, 0));
    win_out.set(Window::DimY, Window::Dimension(0, 0, 0));

    Iterator in(_input, window);
    Iterator out(_output, win_out);

    const size_t in_width     = _input->info()->dimension(0);
    const size_t element_size = _input->info()->element_size();
    const size_t out_stride   = _output->info()->strides_in_bytes()[1];
    const size_t vector_size  = 16 / element_size;

    execute_window_loop(window, [&](const Coordinates & id)
    {
        const uint8_t *in_ptr  = in.ptr();
        uint8_t *const out_ptr = out.ptr() + (id.y() * vector_size) * element_size + (id.x() / vector_size) * out_stride;

        for(size_t k = 0; k < vector_size; ++k)
        {
            // If the input width is not a multiple of W, pad the output with zeros
            if((id.x() + k) >= in_width)
            {
                std::memset(out_ptr + k * element_size, 0, element_size);
            }
            else
            {
                std::memcpy(out_ptr + k * element_size, in_ptr + k * element_size, element_size);
            }
        }
    },
    in, out);
}
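The out_ptr arithmetic in the loop above can be checked in isolation. The sketch below is illustrative only: out_byte_offset is a hypothetical helper that mirrors the expression added to out.ptr(), taking the output row stride in bytes as a plain parameter instead of reading it from the tensor info.

```cpp
#include <cstddef>

// Hypothetical helper mirroring the out_ptr offset computed in run().
// x, y         : input coordinates of the first element of a 1xW chunk
// element_size : size of one element in bytes
// out_stride_y : byte stride between consecutive output rows (assumed known)
inline std::size_t out_byte_offset(std::size_t x, std::size_t y, std::size_t element_size, std::size_t out_stride_y)
{
    const std::size_t vector_size = 16 / element_size; // W
    // Input row y lands in output columns [y * W, y * W + W);
    // input chunk x / W selects the output row.
    return (y * vector_size) * element_size + (x / vector_size) * out_stride_y;
}
```

For example, with an 8x2 F32 input the output has 8 elements per row, so a dense output would have a 32-byte row stride. The chunk starting at input coordinate (x = 4, y = 1) is then written 48 bytes into the output, which is row 1, column 4.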

◆ validate()

Status validate ( const ITensorInfo * input,
const ITensorInfo * output
)
static

Static function to check if given info will lead to a valid configuration of NEGEMMTranspose1xWKernel.

Parameters
[in] input: Input tensor info. Data types supported: All.
[in] output: Output tensor info. Data type supported: same as input.
Returns
A status.

Definition at line 92 of file NEGEMMTranspose1xWKernel.cpp.

References ARM_COMPUTE_RETURN_ON_ERROR, and arm_compute::validate_arguments().

Referenced by NEGEMMTranspose1xWKernel::name(), NEGEMM::validate(), and NEGEMMLowpMatrixMultiplyCore::validate().

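Status NEGEMMTranspose1xWKernel::validate(const ITensorInfo *input, const ITensorInfo *output)
{
    ARM_COMPUTE_RETURN_ON_ERROR(validate_arguments(input, output));

    return Status{};
}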

The documentation for this class was generated from the following files: