Kernel to interleave the elements of a matrix.

#include <CpuGemmInterleave4x4Kernel.h>

Public Member Functions
	CpuGemmInterleave4x4Kernel ()=default

void	configure (const ITensorInfo *src, ITensorInfo *dst)
	Initialise the kernel's src and dst.

void	run_op (ITensorPack &tensors, const Window &window, const ThreadInfo &info) override
	Execute the kernel on the passed window.

const char *	name () const override
	Name of the kernel.

Public Member Functions inherited from ICPPKernel
virtual	~ICPPKernel ()=default
	Default destructor.

virtual void	run (const Window &window, const ThreadInfo &info)
	Execute the kernel on the passed window.

virtual void	run_nd (const Window &window, const ThreadInfo &info, const Window &thread_locator)
	Legacy compatibility layer for implementations which do not support thread_locator. In these cases we simply narrow the interface down to the legacy version.

virtual size_t	get_mws (const CPUInfo &platform, size_t thread_count) const
	Return minimum workload size of the relevant kernel.

Public Member Functions inherited from IKernel
	IKernel ()
	Constructor.

virtual	~IKernel ()=default
	Destructor.

virtual bool	is_parallelisable () const
	Indicates whether or not the kernel is parallelisable.

virtual BorderSize	border_size () const
	The size of the border for that kernel.

const Window &	window () const
	The maximum window the kernel can be executed on.

bool	is_window_configured () const
	Function to check if the embedded window of this kernel has been configured.

Static Public Member Functions
static Status	validate (const ITensorInfo *src, const ITensorInfo *dst)
	Static function to check if given info will lead to a valid configuration of CpuGemmInterleave4x4Kernel.

Static Public Member Functions inherited from ICpuKernel< CpuGemmInterleave4x4Kernel >
static const auto *	get_implementation (const SelectorType &selector, KernelSelectionType selection_type=KernelSelectionType::Supported)
	Micro-kernel selector.

Additional Inherited Members
Static Public Attributes inherited from ICPPKernel
static constexpr size_t	default_mws = 1

Detailed Description

Kernel to interleave the elements of a matrix.

This kernel places the values of each 4x4 block of Matrix A on a single row (interleaved values):

\[ \left( \begin{array}{cccc} a00 & a01 & a02 & a03 \\ a10 & a11 & a12 & a13 \\ a20 & a21 & a22 & a23 \\ a30 & a31 & a32 & a33 \\ \end{array} \right) \rightarrow \left( \begin{array}{cccccccccccccccc} a00 & a10 & a20 & a30 & a01 & a11 & a21 & a31 & a02 & a12 & a22 & a32 & a03 & a13 & a23 & a33 \\ \end{array} \right) \]

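After this operation, the dst matrix will have the following shape: [ width * 4, ceil(height / 4.0f) ], where width and height are dimensions 0 and 1 of src. Rows missing from the last, partial band of four are filled with zeros.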
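To make the layout concrete, here is a minimal standalone sketch of the same interleaving applied to a row-major matrix held in a `std::vector`. It is not the library's implementation: the function name `interleave4x4` and the container type are assumptions made for illustration, and the output shape is the one implied by how `run_op()` indexes the destination.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of the library: interleaves a row-major
// matrix of `width` x `height` elements in bands of 4 rows.
// The result has width * 4 columns and ceil(height / 4) rows.
template <typename T>
std::vector<T> interleave4x4(const std::vector<T> &src, std::size_t width, std::size_t height)
{
    assert(src.size() == width * height);

    const std::size_t out_rows = (height + 3) / 4; // ceil(height / 4)
    const std::size_t out_cols = width * 4;

    // Value-initialised, so slots belonging to missing rows are already zero.
    std::vector<T> dst(out_rows * out_cols, T{});

    for (std::size_t y = 0; y < height; ++y)
    {
        for (std::size_t x = 0; x < width; ++x)
        {
            // Source row y lands in output row y / 4, at slot y % 4 of column group x.
            dst[(y / 4) * out_cols + x * 4 + (y % 4)] = src[y * width + x];
        }
    }
    return dst;
}
```

Calling `interleave4x4(a, 4, 4)` on the 4x4 matrix shown above yields a single row in the order a00, a10, a20, a30, a01, and so on.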

Definition at line 55 of file CpuGemmInterleave4x4Kernel.h.

Constructor & Destructor Documentation

◆ CpuGemmInterleave4x4Kernel()

CpuGemmInterleave4x4Kernel ( )

default

Member Function Documentation

◆ configure()

void configure	(	const ITensorInfo *	src,
		ITensorInfo *	dst
	)

Initialise the kernel's src and dst.

Parameters

[in]	src	Input tensor info. Data types supported: All
[out]	dst	Output tensor info which stores the interleaved matrix. Data type supported: same as `src`.

Definition at line 44 of file CpuGemmInterleave4x4Kernel.cpp.

 {
     ARM_COMPUTE_ERROR_ON_NULLPTR(src, dst);
  
     // dst auto initialization if not yet initialized
     auto_init_if_empty(*dst, src->clone()->set_tensor_shape(compute_interleaved_shape(*src)));
  
     // Perform validate step
     ARM_COMPUTE_ERROR_THROW_ON(CpuGemmInterleave4x4Kernel::validate(src, dst));
  
     Window win = calculate_max_window(*src, Steps(1, 4));
     ICPPKernel::configure(win);
 }

References ARM_COMPUTE_ERROR_ON_NULLPTR, ARM_COMPUTE_ERROR_THROW_ON, arm_compute::auto_init_if_empty(), arm_compute::calculate_max_window(), arm_compute::misc::shape_calculator::compute_interleaved_shape(), arm_compute::test::validation::dst, arm_compute::test::validation::src, and CpuGemmInterleave4x4Kernel::validate().

◆ name()

const char * name ( ) const

override, virtual

Name of the kernel.

Returns: Kernel name

Implements ICPPKernel.

Definition at line 153 of file CpuGemmInterleave4x4Kernel.cpp.

 {
     return "CpuGemmInterleave4x4Kernel";
 }

◆ run_op()

void run_op	(	ITensorPack &	tensors,
		const Window &	window,
		const ThreadInfo &	info
	)

override, virtual

Execute the kernel on the passed window.

Warning: If is_parallelisable() returns false, the passed window must be equal to window().

Note: The window has to be a region within the window returned by the window() method. The width of the window has to be a multiple of num_elems_processed_per_iteration().

Parameters

[in]	tensors	A vector containing the tensors to operate on.
[in]	window	Region on which to execute the kernel. (Must be a region of the window returned by window())
[in]	info	Info about executing thread and CPU.

Reimplemented from ICPPKernel.

Definition at line 75 of file CpuGemmInterleave4x4Kernel.cpp.

 {
     ARM_COMPUTE_UNUSED(info);
     ARM_COMPUTE_ERROR_ON_UNCONFIGURED_KERNEL(this);
     ARM_COMPUTE_ERROR_ON_INVALID_SUBWINDOW(IKernel::window(), window);
     ARM_COMPUTE_ERROR_ON(tensors.empty());
     /*
     *  This kernel puts the values in a 4x4 block of Matrix A on the same row (Interleaved values)
     *         |a00 a01 a02 a03|
     *         |a10 a11 a12 a13|
     *         |a20 a21 a22 a23| = | a00 a10 a20 a30 || a01 a11 a21 a31 || a02 a12 a22 a32 || a03 a13 a23 a33 |
     *         |a30 a31 a32 a33|
     *
     *         After this operation, the dst matrix will have the following shape: [ width * 4, ceil(height / 4.0f) ]
     */
     const ITensor *src = tensors.get_const_tensor(TensorType::ACL_SRC);
     ITensor       *dst = tensors.get_tensor(TensorType::ACL_DST);
  
     const size_t window_start_x = window.x().start();
     const size_t window_end_x   = window.x().end();
  
     const size_t in_height = src->info()->dimension(1);
     const size_t in_stride = src->info()->strides_in_bytes()[1];
  
     const size_t partial_y = in_height % 4;
  
     const size_t element_size = src->info()->element_size();
  
     // Set window for the src tensor
     Window win = window;
     win.set(Window::DimX, Window::Dimension(0, 1, 1));
  
     // Set window for the dst tensor
     Window win_out(window);
     win_out.set(Window::DimX, Window::Dimension(0, 1, 1));
     win_out.scale(Window::DimY, 0.25f);
  
     Iterator in(src, win);
     Iterator out(dst, win_out);
  
     execute_window_loop(
         win,
         [&](const Coordinates &id)
         {
             if (id.y() + 4 <= static_cast<int>(in_height))
             {
                 for (size_t x = window_start_x; x < window_end_x; ++x)
                 {
                     std::memcpy(out.ptr() + (x * 4 + 0) * element_size, (in.ptr() + 0 * in_stride) + x * element_size,
                                 element_size);
                     std::memcpy(out.ptr() + (x * 4 + 1) * element_size, (in.ptr() + 1 * in_stride) + x * element_size,
                                 element_size);
                     std::memcpy(out.ptr() + (x * 4 + 2) * element_size, (in.ptr() + 2 * in_stride) + x * element_size,
                                 element_size);
                     std::memcpy(out.ptr() + (x * 4 + 3) * element_size, (in.ptr() + 3 * in_stride) + x * element_size,
                                 element_size);
                 }
             }
             else
             {
                 for (size_t x = window_start_x; x < window_end_x; ++x)
                 {
                     size_t y = 0;
                     for (; y < partial_y; ++y)
                     {
                         std::memcpy(out.ptr() + (x * 4 + y) * element_size,
                                     (in.ptr() + y * in_stride) + x * element_size, element_size);
                     }
                     for (; y < 4; ++y)
                     {
                         std::memset(out.ptr() + (x * 4 + y) * element_size, 0, element_size);
                     }
                 }
             }
         },
         in, out);
 }

References arm_compute::ACL_DST, arm_compute::ACL_SRC, ARM_COMPUTE_ERROR_ON, ARM_COMPUTE_ERROR_ON_INVALID_SUBWINDOW, ARM_COMPUTE_ERROR_ON_UNCONFIGURED_KERNEL, ARM_COMPUTE_UNUSED, Window::DimX, Window::DimY, arm_compute::test::validation::dst, ITensorPack::empty(), Window::Dimension::end(), arm_compute::execute_window_loop(), ITensorPack::get_const_tensor(), ITensorPack::get_tensor(), arm_compute::test::validation::info, Iterator::ptr(), Window::scale(), Window::set(), arm_compute::test::validation::src, Window::Dimension::start(), IKernel::window(), and Window::x().
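The listing above copies raw bytes with `std::memcpy` and `std::memset`, which is why the kernel does not depend on the element type. The following sketch isolates that idea for a single band of up to four rows. The function `interleave_band` and its parameters are hypothetical and only mirror the loop body; the full-block and partial-block branches are folded into one by passing the number of valid rows.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper, not the library's code: interleaves one band of up to
// four source rows into a single output row, copying raw bytes so that it
// works for any element size.
//   in           : pointer to the first row of the band
//   in_stride    : distance in bytes between consecutive source rows
//   valid_rows   : number of rows actually present in the band (1..4)
//   width        : number of elements per row
//   element_size : size of one element in bytes
//   out          : destination, must hold width * 4 * element_size bytes
inline void interleave_band(const std::uint8_t *in, std::size_t in_stride, std::size_t valid_rows,
                            std::size_t width, std::size_t element_size, std::uint8_t *out)
{
    assert(valid_rows >= 1 && valid_rows <= 4);

    for (std::size_t x = 0; x < width; ++x)
    {
        std::size_t y = 0;
        // Copy the rows that exist in the source.
        for (; y < valid_rows; ++y)
        {
            std::memcpy(out + (x * 4 + y) * element_size, in + y * in_stride + x * element_size, element_size);
        }
        // Zero-fill the slots of the missing rows.
        for (; y < 4; ++y)
        {
            std::memset(out + (x * 4 + y) * element_size, 0, element_size);
        }
    }
}
```

With `valid_rows` equal to 4 the second inner loop never runs, which corresponds to the first branch of the kernel; smaller values correspond to the `partial_y` branch.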

◆ validate()

Status validate	(	const ITensorInfo *	src,
		const ITensorInfo *	dst
	)

static

Static function to check if given info will lead to a valid configuration of CpuGemmInterleave4x4Kernel.

Returns: A Status; a default-constructed Status is returned when all checks pass.

Definition at line 58 of file CpuGemmInterleave4x4Kernel.cpp.

 {
     ARM_COMPUTE_RETURN_ERROR_ON_NULLPTR(src, dst);
     //Note: ARM_COMPUTE_RETURN_ERROR_ON_CPU_F16_UNSUPPORTED(src) is not needed here as this kernel doesn't use CPU FP16 instructions.
     ARM_COMPUTE_RETURN_ERROR_ON(src->data_type() == DataType::UNKNOWN);
  
     if (dst->total_size() != 0)
     {
         const TensorShape dst_shape = compute_interleaved_shape(*src);
         ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DIMENSIONS(dst->tensor_shape(), dst_shape);
         ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES(src, dst);
         ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_QUANTIZATION_INFO(src, dst);
     }
  
     return Status{};
 }

References ARM_COMPUTE_RETURN_ERROR_ON, ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES, ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DIMENSIONS, ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_QUANTIZATION_INFO, ARM_COMPUTE_RETURN_ERROR_ON_NULLPTR, arm_compute::misc::shape_calculator::compute_interleaved_shape(), arm_compute::test::validation::dst, arm_compute::test::validation::dst_shape, arm_compute::test::validation::src, and arm_compute::UNKNOWN.

Referenced by CpuGemmInterleave4x4Kernel::configure(), CpuGemm::validate(), and CpuGemmLowpMatrixMultiplyCore::validate().
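The shape part of this check can be sketched without the library. The example below is a simplified illustration under stated assumptions: `Info2D`, `expected_interleaved` and `is_valid_interleave_config` are hypothetical names, `total_size()` counts elements here rather than bytes, and the data type and quantization checks performed by `validate()` are left out.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a tensor descriptor; a total_size of 0 means
// "not yet initialised", mirroring the dst->total_size() check above.
struct Info2D
{
    std::size_t width;
    std::size_t height;
    std::size_t total_size() const { return width * height; }
};

// Expected output shape: four times wider, a quarter as tall (rounded up).
inline Info2D expected_interleaved(const Info2D &src)
{
    return Info2D{src.width * 4, (src.height + 3) / 4};
}

// Returns true when dst is either uninitialised or already has the expected shape.
inline bool is_valid_interleave_config(const Info2D &src, const Info2D &dst)
{
    assert(src.total_size() != 0); // the source must describe a real matrix

    if (dst.total_size() == 0)
    {
        return true; // dst would be auto-initialised during configuration
    }
    const Info2D want = expected_interleaved(src);
    return dst.width == want.width && dst.height == want.height;
}
```

For an 8x6 source, an empty destination and a 32x2 destination are both accepted, while a 24x2 destination is rejected.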

The documentation for this class was generated from the following files:

src/cpu/kernels/CpuGemmInterleave4x4Kernel.h
src/cpu/kernels/CpuGemmInterleave4x4Kernel.cpp
