Compute Library
 23.11
CpuCastKernel Class Reference

Casts a given tensor to a new type. More...

#include <CpuCastKernel.h>

Collaboration diagram for CpuCastKernel (diagram not reproduced here).

Data Structures

struct  CastKernel
 

Public Member Functions

 CpuCastKernel ()=default
 
 ARM_COMPUTE_DISALLOW_COPY_ALLOW_MOVE (CpuCastKernel)
 
void configure (const ITensorInfo *src, ITensorInfo *dst, ConvertPolicy policy)
 Set the src and dst of the kernel. More...
 
void run_op (ITensorPack &tensors, const Window &window, const ThreadInfo &info) override
 Execute the kernel on the passed window. More...
 
const char * name () const override
 Name of the kernel. More...
 
- Public Member Functions inherited from ICPPKernel
virtual ~ICPPKernel ()=default
 Default destructor. More...
 
virtual void run (const Window &window, const ThreadInfo &info)
 Execute the kernel on the passed window. More...
 
virtual void run_nd (const Window &window, const ThreadInfo &info, const Window &thread_locator)
 Legacy compatibility layer for implementations that do not support thread_locator; in these cases the interface is simply narrowed down to the legacy version. More...
 
virtual size_t get_mws (const CPUInfo &platform, size_t thread_count) const
 Return minimum workload size of the relevant kernel. More...
 
- Public Member Functions inherited from IKernel
 IKernel ()
 Constructor. More...
 
virtual ~IKernel ()=default
 Destructor. More...
 
virtual bool is_parallelisable () const
 Indicates whether or not the kernel is parallelisable. More...
 
virtual BorderSize border_size () const
 The size of the border for that kernel. More...
 
const Window & window () const
 The maximum window the kernel can be executed on. More...
 
bool is_window_configured () const
 Function to check if the embedded window of this kernel has been configured. More...
 

Static Public Member Functions

static Status validate (const ITensorInfo *src, const ITensorInfo *dst, ConvertPolicy policy)
 Static function to check if given info will lead to a valid configuration. More...
 
static const std::vector< CastKernel > & get_available_kernels ()
 
- Static Public Member Functions inherited from ICpuKernel< CpuCastKernel >
static const auto * get_implementation (const SelectorType &selector, KernelSelectionType selection_type=KernelSelectionType::Supported)
 Micro-kernel selector. More...
 

Additional Inherited Members

- Static Public Attributes inherited from ICPPKernel
static constexpr size_t default_mws = 1
 

Detailed Description

Casts a given tensor to a new type.

Note
When casting between quantized types the scale and zeroPoint are ignored

Definition at line 40 of file CpuCastKernel.h.
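For illustration only (not part of the library), the following standalone C++ sketch models the per-element behaviour described above: widening casts copy the stored integer value without applying scale or zeroPoint, while narrowing casts either saturate or wrap depending on the ConvertPolicy. The helper names are hypothetical.

    #include <algorithm>
    #include <cstdint>
    #include <iostream>

    // Hypothetical scalar model of the cast: quantization parameters are ignored,
    // only the stored integer value is converted.
    int16_t cast_q8_to_s16(int8_t v)
    {
        return static_cast<int16_t>(v); // widening: value is copied as-is
    }

    int8_t cast_s16_to_q8(int16_t v, bool saturate)
    {
        if (saturate)
        {
            // Mirrors ConvertPolicy::SATURATE (clamp to the destination range).
            return static_cast<int8_t>(std::clamp<int16_t>(v, INT8_MIN, INT8_MAX));
        }
        return static_cast<int8_t>(v); // non-saturating path keeps the low 8 bits
    }

    int main()
    {
        std::cout << cast_q8_to_s16(-100) << "\n";            // -100
        std::cout << int(cast_s16_to_q8(300, true)) << "\n";  // 127 (saturated)
        std::cout << int(cast_s16_to_q8(300, false)) << "\n"; // 44 (wrapped)
        return 0;
    }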

Constructor & Destructor Documentation

◆ CpuCastKernel()

CpuCastKernel ( )
default

Member Function Documentation

◆ ARM_COMPUTE_DISALLOW_COPY_ALLOW_MOVE()

ARM_COMPUTE_DISALLOW_COPY_ALLOW_MOVE ( CpuCastKernel  )

◆ configure()

void configure ( const ITensorInfo *  src,
ITensorInfo *  dst,
ConvertPolicy  policy 
)

Set the src and dst of the kernel.

Valid conversions src -> dst :

  • QASYMM8_SIGNED -> S16, S32, F32, F16
  • QASYMM8 -> U16, S16, S32, F32, F16
  • U8 -> U16, S16, S32, F32, F16
  • U16 -> U8, U32
  • S16 -> QASYMM8_SIGNED, U8, S32
  • F16 -> QASYMM8_SIGNED, QASYMM8, F32, S32, U8
  • S32 -> QASYMM8_SIGNED, QASYMM8, F16, F32, U8
  • S64 -> F32
  • F32 -> QASYMM8_SIGNED, QASYMM8, F16, S32, U8
Parameters
[in]   src     The src tensor to convert. Data types supported: QASYMM8_SIGNED/QASYMM8/U8/U16/S16/S32/S64/F16/F32.
[out]  dst     The dst tensor. Data types supported: QASYMM8_SIGNED/QASYMM8/U8/U16/S16/U32/S32/S64/F16/F32.
[in]   policy  Conversion policy.
Note
S64 is only supported on aarch64

Definition at line 163 of file CpuCastKernel.cpp.

164 {
165  ARM_COMPUTE_ERROR_ON_NULLPTR(src, dst);
166 
167  // Auto initialize dst shape if not initialized (We can only auto-configure the shape, datatype must be given)
168  set_shape_if_empty(*dst, src->tensor_shape());
169 
170  _policy = policy;
171 
172  ARM_COMPUTE_ERROR_THROW_ON(validate_arguments(src, dst, policy));
173 
174  // Configure kernel window
175  Window win = calculate_max_window(*src, Steps());
176 
177  ICPPKernel::configure(win);
178 }

References ARM_COMPUTE_ERROR_ON_NULLPTR, ARM_COMPUTE_ERROR_THROW_ON, arm_compute::calculate_max_window(), arm_compute::test::validation::dst, arm_compute::set_shape_if_empty(), arm_compute::test::validation::src, and arm_compute::cpu::kernels::validate_arguments().
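A minimal usage sketch follows. It assumes access to the library's internal headers (the include path shown is an assumption) and an already default-constructed kernel; applications would normally go through the public NECast / CpuCast interfaces rather than driving the kernel directly.

    #include "arm_compute/core/Error.h"
    #include "arm_compute/core/TensorInfo.h"
    #include "arm_compute/core/Types.h"
    #include "src/cpu/kernels/CpuCastKernel.h" // internal header, path is an assumption

    using namespace arm_compute;

    void configure_u8_to_f32_cast(cpu::kernels::CpuCastKernel &kernel)
    {
        // Describe a 32x16 U8 source and a matching F32 destination.
        const TensorInfo src_info(TensorShape(32U, 16U), 1, DataType::U8);
        TensorInfo       dst_info(TensorShape(32U, 16U), 1, DataType::F32);

        // Reject unsupported conversions before touching the kernel.
        ARM_COMPUTE_ERROR_THROW_ON(
            cpu::kernels::CpuCastKernel::validate(&src_info, &dst_info, ConvertPolicy::SATURATE));

        // Stores the policy and computes the maximum execution window.
        kernel.configure(&src_info, &dst_info, ConvertPolicy::SATURATE);
    }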

◆ get_available_kernels()

const std::vector< CpuCastKernel::CastKernel > & get_available_kernels ( )
static

Definition at line 1172 of file CpuCastKernel.cpp.

1173 {
1174  return available_kernels;
1175 }
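As a hypothetical sketch, the returned list could be used to inspect which cast micro-kernels were compiled in; this assumes the CastKernel entries expose a name field, as the per-kernel tables elsewhere in the library do.

    #include <iostream>
    #include "src/cpu/kernels/CpuCastKernel.h" // internal header, path is an assumption

    void print_cast_ukernels()
    {
        for (const auto &k : arm_compute::cpu::kernels::CpuCastKernel::get_available_kernels())
        {
            std::cout << k.name << "\n"; // assumed field name
        }
    }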

◆ name()

const char * name ( ) const
override virtual

Name of the kernel.

Returns
Kernel name

Implements ICPPKernel.

Definition at line 1167 of file CpuCastKernel.cpp.

1168 {
1169  return "CpuCastKernel.cpp";
1170 }

◆ run_op()

void run_op ( ITensorPack &  tensors,
const Window &  window,
const ThreadInfo &  info 
)
override virtual

Execute the kernel on the passed window.

Warning
If is_parallelisable() returns false then the passed window must be equal to window()
Note
The window has to be a region within the window returned by the window() method
The width of the window has to be a multiple of num_elems_processed_per_iteration().
Parameters
[in]   tensors  A vector containing the tensors to operate on.
[in]   window   Region on which to execute the kernel. (Must be a region of the window returned by window())
[in]   info     Info about executing thread and CPU.

Reimplemented from ICPPKernel.

Definition at line 271 of file CpuCastKernel.cpp.

272 {
273  ARM_COMPUTE_UNUSED(info);
274  ARM_COMPUTE_ERROR_ON_UNCONFIGURED_KERNEL(this);
275  ARM_COMPUTE_ERROR_ON_INVALID_SUBWINDOW(IKernel::window(), window);
276 
277  const auto window_start_x = static_cast<int>(window.x().start());
278  const auto window_end_x = static_cast<int>(window.x().end());
279  const int window_step_x = 16;
280 
281  const ITensor *_src = tensors.get_const_tensor(TensorType::ACL_SRC);
282  ITensor *_dst = tensors.get_tensor(TensorType::ACL_DST);
283  ARM_COMPUTE_ERROR_ON_NULLPTR(_src, _dst);
284  ARM_COMPUTE_ERROR_ON(_src == _dst);
285 
286  ARM_COMPUTE_ERROR_ON_NULLPTR(_src, _dst);
287 
288  Window win{window};
289  win.set(Window::DimX, Window::Dimension(0, 1, 1));
290 
291  Iterator src(_src, win);
292  Iterator dst(_dst, win);
293 
294  /*ukernel runs only when using fp16, so we validate it isn't a nullptr only before using it */
295  const auto *uk = CpuCastKernel::get_implementation(
296  CastDataTypeISASelectorData{_src->info()->data_type(), _dst->info()->data_type(), CPUInfo::get().get_isa()});
297 
298  switch (_src->info()->data_type())
299  {
300 #ifdef __aarch64__
301  case DataType::U64:
302  {
303  switch (_dst->info()->data_type())
304  {
305  case DataType::F32:
306  {
307  convert64<uint64_t, float>(src, dst, win, window_start_x, window_end_x, window_step_x);
308  break;
309  }
310  default:
311  ARM_COMPUTE_ERROR("dst data type not supported");
312  }
313  break;
314  }
315  case DataType::S64:
316  {
317  switch (_dst->info()->data_type())
318  {
319  case DataType::F32:
320  {
321  convert64<int64_t, float>(src, dst, win, window_start_x, window_end_x, window_step_x);
322  break;
323  }
324  default:
325  ARM_COMPUTE_ERROR("dst data type not supported");
326  }
327  break;
328  }
329 #endif // __aarch64__
330 
331  case DataType::QASYMM8_SIGNED:
332  {
333  switch (_dst->info()->data_type())
334  {
335  case DataType::S16:
336  {
337  /* Up-conversion QASYMM8_SIGNED -> S16 */
338  execute_window_loop(
339  win,
340  [&](const Coordinates &)
341  {
342  const auto src_ptr = reinterpret_cast<const int8_t *>(src.ptr());
343  const auto dst_ptr = reinterpret_cast<int16_t *>(dst.ptr());
344  int x = window_start_x;
345 
346  for (; x <= (window_end_x - window_step_x); x += window_step_x)
347  {
348  const int8x16_t texels_s8 = vld1q_s8(src_ptr + x);
349 
350  const int16x8x2_t texels = {
351  {vmovl_s8(vget_low_s8(texels_s8)), vmovl_s8(vget_high_s8(texels_s8))}};
352 
353  vst1q_s16(dst_ptr + x, texels.val[0]);
354  vst1q_s16(dst_ptr + x + 8, texels.val[1]);
355  }
356 
357  // Compute left-over elements
358  for (; x < window_end_x; ++x)
359  {
360  *(dst_ptr + x) = static_cast<int16_t>(*(src_ptr + x));
361  }
362  },
363  src, dst);
364  break;
365  }
366  case DataType::S32:
367  {
368  /* Up-conversion QASYMM8_SIGNED -> S32 */
369  execute_window_loop(
370  win,
371  [&](const Coordinates &)
372  {
373  const auto src_ptr = reinterpret_cast<const int8_t *>(src.ptr());
374  const auto dst_ptr = reinterpret_cast<int32_t *>(dst.ptr());
375  int x = window_start_x;
376 
377  for (; x <= (window_end_x - window_step_x); x += window_step_x)
378  {
379  const int8x16_t texels_s8 = vld1q_s8(src_ptr + x);
380 
381  const int16x8x2_t texels = {
382  {vmovl_s8(vget_low_s8(texels_s8)), vmovl_s8(vget_high_s8(texels_s8))}};
383 
384  vst1q_s32(dst_ptr + x, vmovl_s16(vget_low_s16(texels.val[0])));
385  vst1q_s32(dst_ptr + x + 4, vmovl_s16(vget_high_s16(texels.val[0])));
386  vst1q_s32(dst_ptr + x + 8, vmovl_s16(vget_low_s16(texels.val[1])));
387  vst1q_s32(dst_ptr + x + 12, vmovl_s16(vget_high_s16(texels.val[1])));
388  }
389 
390  // Compute left-over elements
391  for (; x < window_end_x; ++x)
392  {
393  *(dst_ptr + x) = static_cast<int32_t>(*(src_ptr + x));
394  }
395  },
396  src, dst);
397  break;
398  }
399  case DataType::F32:
400  {
401  /* Up-conversion QASYMM8_SIGNED -> F32 */
402  execute_window_loop(
403  win,
404  [&](const Coordinates &)
405  {
406  const auto src_ptr = reinterpret_cast<const int8_t *>(src.ptr());
407  const auto dst_ptr = reinterpret_cast<float *>(dst.ptr());
408 
409  int x = window_start_x;
410  for (; x <= (window_end_x - window_step_x); x += window_step_x)
411  {
412  const int8x16_t texels_s8 = vld1q_s8(src_ptr + x);
413 
414  const int16x8x2_t texels = {
415  {vmovl_s8(vget_low_s8(texels_s8)), vmovl_s8(vget_high_s8(texels_s8))}};
416  vst1q_f32(dst_ptr + x, vcvtq_f32_s32(vmovl_s16(vget_low_s16(texels.val[0]))));
417  vst1q_f32(dst_ptr + x + 4, vcvtq_f32_s32(vmovl_s16(vget_high_s16(texels.val[0]))));
418  vst1q_f32(dst_ptr + x + 8, vcvtq_f32_s32(vmovl_s16(vget_low_s16(texels.val[1]))));
419  vst1q_f32(dst_ptr + x + 12, vcvtq_f32_s32(vmovl_s16(vget_high_s16(texels.val[1]))));
420  }
421 
422  // Compute left-over elements
423  for (; x < window_end_x; ++x)
424  {
425  *(dst_ptr + x) = static_cast<float>(*(src_ptr + x));
426  }
427  },
428  src, dst);
429  break;
430  }
431  case DataType::F16:
432  {
433  /* Up-conversion QASYMM8_SIGNED -> F16 */
434  ARM_COMPUTE_ERROR_ON(uk->ukernel == nullptr);
435  uk->ukernel(_src, _dst, info, _policy, window);
436  break;
437  }
438  default:
439  ARM_COMPUTE_ERROR("dst data type not supported");
440  }
441  break;
442  }
443 
444  case DataType::QASYMM8:
445  case DataType::U8:
446  {
447  switch (_dst->info()->data_type())
448  {
449  case DataType::S16:
450  {
451  /* Up-conversion U8 -> S16 */
452  execute_window_loop(
453  win,
454  [&](const Coordinates &)
455  {
456  const auto src_ptr = reinterpret_cast<const uint8_t *>(src.ptr());
457  const auto dst_ptr = reinterpret_cast<int16_t *>(dst.ptr());
458 
459  int x = window_start_x;
460  for (; x <= (window_end_x - window_step_x); x += window_step_x)
461  {
462  const uint8x16_t texels_u8 = vld1q_u8(src_ptr + x);
463 
464  const int16x8x2_t texels = {{vreinterpretq_s16_u16(vmovl_u8(vget_low_u8(texels_u8))),
465  vreinterpretq_s16_u16(vmovl_u8(vget_high_u8(texels_u8)))}};
466 
467  vst1q_s16(dst_ptr + x, texels.val[0]);
468  vst1q_s16(dst_ptr + x + 8, texels.val[1]);
469  }
470 
471  // Compute left-over elements
472  for (; x < window_end_x; ++x)
473  {
474  *(dst_ptr + x) = static_cast<int32_t>(*(src_ptr + x));
475  }
476  },
477  src, dst);
478  break;
479  }
480  case DataType::S32:
481  {
482  /* Up-conversion U8 -> S32 */
483  execute_window_loop(
484  win,
485  [&](const Coordinates &)
486  {
487  const auto src_ptr = reinterpret_cast<const uint8_t *>(src.ptr());
488  const auto dst_ptr = reinterpret_cast<int32_t *>(dst.ptr());
489 
490  int x = window_start_x;
491  for (; x <= (window_end_x - window_step_x); x += window_step_x)
492  {
493  const uint8x16_t texels_u8 = vld1q_u8(src_ptr + x);
494 
495  const int16x8x2_t texels = {{vreinterpretq_s16_u16(vmovl_u8(vget_low_u8(texels_u8))),
496  vreinterpretq_s16_u16(vmovl_u8(vget_high_u8(texels_u8)))}};
497 
498  vst1q_s32(dst_ptr + x, vmovl_s16(vget_low_s16(texels.val[0])));
499  vst1q_s32(dst_ptr + x + 4, vmovl_s16(vget_high_s16(texels.val[0])));
500  vst1q_s32(dst_ptr + x + 8, vmovl_s16(vget_low_s16(texels.val[1])));
501  vst1q_s32(dst_ptr + x + 12, vmovl_s16(vget_high_s16(texels.val[1])));
502  }
503 
504  // Compute left-over elements
505  for (; x < window_end_x; ++x)
506  {
507  *(dst_ptr + x) = static_cast<uint32_t>(*(src_ptr + x));
508  }
509  },
510  src, dst);
511  break;
512  }
513  case DataType::F32:
514  {
515  /* Up-conversion U8 -> F32 */
516  execute_window_loop(
517  win,
518  [&](const Coordinates &)
519  {
520  const auto src_ptr = reinterpret_cast<const uint8_t *>(src.ptr());
521  const auto dst_ptr = reinterpret_cast<float *>(dst.ptr());
522 
523  int x = window_start_x;
524  for (; x <= (window_end_x - window_step_x); x += window_step_x)
525  {
526  const uint8x16_t texels_u8 = vld1q_u8(src_ptr + x);
527 
528  const int16x8x2_t texels = {{vreinterpretq_s16_u16(vmovl_u8(vget_low_u8(texels_u8))),
529  vreinterpretq_s16_u16(vmovl_u8(vget_high_u8(texels_u8)))}};
530  vst1q_f32(dst_ptr + x, vcvtq_f32_s32(vmovl_s16(vget_low_s16(texels.val[0]))));
531  vst1q_f32(dst_ptr + x + 4, vcvtq_f32_s32(vmovl_s16(vget_high_s16(texels.val[0]))));
532  vst1q_f32(dst_ptr + x + 8, vcvtq_f32_s32(vmovl_s16(vget_low_s16(texels.val[1]))));
533  vst1q_f32(dst_ptr + x + 12, vcvtq_f32_s32(vmovl_s16(vget_high_s16(texels.val[1]))));
534  }
535 
536  // Compute left-over elements
537  for (; x < window_end_x; ++x)
538  {
539  *(dst_ptr + x) = static_cast<uint32_t>(*(src_ptr + x));
540  }
541  },
542  src, dst);
543  break;
544  }
545  case DataType::F16:
546  {
547  /* Up-conversion U8 -> FP16 */
548  ARM_COMPUTE_ERROR_ON(uk->ukernel == nullptr);
549  uk->ukernel(_src, _dst, info, _policy, window);
550  break;
551  }
552  case DataType::U16:
553  {
554  /* Up-conversion U8 -> U16 */
555  execute_window_loop(
556  win,
557  [&](const Coordinates &)
558  {
559  const auto src_ptr = reinterpret_cast<const uint8_t *>(src.ptr());
560  const auto dst_ptr = reinterpret_cast<uint16_t *>(dst.ptr());
561 
562  int x = window_start_x;
563  for (; x <= (window_end_x - window_step_x); x += window_step_x)
564  {
565  const uint8x16_t texels_u8 = vld1q_u8(src_ptr + x);
566 
567  const uint16x8x2_t texels = {
568  {vmovl_u8(vget_low_u8(texels_u8)), vmovl_u8(vget_high_u8(texels_u8))}};
569 
570  vst1q_u16(dst_ptr + x, texels.val[0]);
571  vst1q_u16(dst_ptr + x + 8, texels.val[1]);
572  }
573 
574  // Compute left-over elements
575  for (; x < window_end_x; ++x)
576  {
577  *(dst_ptr + x) = static_cast<uint16_t>(*(src_ptr + x));
578  }
579  },
580  src, dst);
581  break;
582  }
583  default:
584  ARM_COMPUTE_ERROR("dst data type not supported");
585  }
586  break;
587  }
588  case DataType::S16:
589  {
590  switch (_dst->info()->data_type())
591  {
592  case DataType::QASYMM8_SIGNED:
593  {
594  /* Down-conversion S16 -> QASYMM8_SIGNED */
595  if (ConvertPolicy::SATURATE == _policy)
596  {
597  execute_window_loop(
598  win,
599  [&](const Coordinates &)
600  {
601  const auto src_ptr = reinterpret_cast<const int16_t *>(src.ptr());
602  const auto dst_ptr = reinterpret_cast<int8_t *>(dst.ptr());
603 
604  int x = window_start_x;
605  for (; x <= (window_end_x - window_step_x); x += window_step_x)
606  {
607  const int16x8x2_t texels = {{vld1q_s16(src_ptr + x), vld1q_s16(src_ptr + x + 8)}};
608 
609  vst1q_s8(dst_ptr + x,
610  vcombine_s8(vqmovn_s16(texels.val[0]), vqmovn_s16(texels.val[1])));
611  }
612 
613  // Compute left-over elements
614  for (; x < window_end_x; ++x)
615  {
616  *(dst_ptr + x) = utils::cast::saturate_cast<int8_t>(*(src_ptr + x));
617  }
618  },
619  src, dst);
620  }
621  else
622  {
623  execute_window_loop(
624  win,
625  [&](const Coordinates &)
626  {
627  const auto src_ptr = reinterpret_cast<const int16_t *>(src.ptr());
628  const auto dst_ptr = reinterpret_cast<int8_t *>(dst.ptr());
629 
630  int x = window_start_x;
631  for (; x <= (window_end_x - window_step_x); x += window_step_x)
632  {
633  const int16x8x2_t texels = {{vld1q_s16(src_ptr + x), vld1q_s16(src_ptr + x + 8)}};
634 
635  vst1q_s8(dst_ptr + x,
636  vcombine_s8(vmovn_s16(texels.val[0]), vmovn_s16(texels.val[1])));
637  }
638 
639  // Compute left-over elements
640  for (; x < window_end_x; ++x)
641  {
642  *(dst_ptr + x) = static_cast<int8_t>(*(src_ptr + x));
643  }
644  },
645  src, dst);
646  }
647  break;
648  }
649  case DataType::U8:
650  {
651  /* Down-conversion S16 -> U8 */
652  if (ConvertPolicy::SATURATE == _policy)
653  {
654  execute_window_loop(
655  win,
656  [&](const Coordinates &)
657  {
658  const auto src_ptr = reinterpret_cast<const int16_t *>(src.ptr());
659  const auto dst_ptr = reinterpret_cast<uint8_t *>(dst.ptr());
660 
661  int x = window_start_x;
662  for (; x <= (window_end_x - window_step_x); x += window_step_x)
663  {
664  const int16x8x2_t texels = {{vld1q_s16(src_ptr + x), vld1q_s16(src_ptr + x + 8)}};
665 
666  vst1q_u8(dst_ptr + x,
667  vcombine_u8(vqmovun_s16(texels.val[0]), vqmovun_s16(texels.val[1])));
668  }
669 
670  // Compute left-over elements
671  for (; x < window_end_x; ++x)
672  {
673  *(dst_ptr + x) = utils::cast::saturate_cast<uint8_t>(*(src_ptr + x));
674  }
675  },
676  src, dst);
677  }
678  else
679  {
680  execute_window_loop(
681  win,
682  [&](const Coordinates &)
683  {
684  const auto src_ptr = reinterpret_cast<const int16_t *>(src.ptr());
685  const auto dst_ptr = reinterpret_cast<uint8_t *>(dst.ptr());
686 
687  int x = window_start_x;
688  for (; x <= (window_end_x - window_step_x); x += window_step_x)
689  {
690  const int16x8x2_t texels = {{vld1q_s16(src_ptr + x), vld1q_s16(src_ptr + x + 8)}};
691 
692  vst1q_u8(dst_ptr + x, vcombine_u8(vmovn_u16(vreinterpretq_u16_s16(texels.val[0])),
693  vmovn_u16(vreinterpretq_u16_s16(texels.val[1]))));
694  }
695 
696  // Compute left-over elements
697  for (; x < window_end_x; ++x)
698  {
699  *(dst_ptr + x) = static_cast<uint8_t>(*(src_ptr + x));
700  }
701  },
702  src, dst);
703  }
704  break;
705  }
706  case DataType::S32:
707  {
708  /* Up-conversion S16 -> S32 */
709  execute_window_loop(
710  win,
711  [&](const Coordinates &)
712  {
713  const auto src_ptr = reinterpret_cast<const int16_t *>(src.ptr());
714  const auto dst_ptr = reinterpret_cast<int32_t *>(dst.ptr());
715 
716  int x = window_start_x;
717  for (; x <= (window_end_x - window_step_x); x += window_step_x)
718  {
719  const int16x8x2_t texels = {{vld1q_s16(src_ptr + x), vld1q_s16(src_ptr + x + 8)}};
720 
721  const int32x4x4_t texels_s32 = {
722  {vmovl_s16(vget_low_s16(texels.val[0])), vmovl_s16(vget_high_s16(texels.val[0])),
723  vmovl_s16(vget_low_s16(texels.val[1])), vmovl_s16(vget_high_s16(texels.val[1]))}};
724 
725  vst1q_s32(dst_ptr + x, texels_s32.val[0]);
726  vst1q_s32(dst_ptr + x + 4, texels_s32.val[1]);
727  vst1q_s32(dst_ptr + x + 8, texels_s32.val[2]);
728  vst1q_s32(dst_ptr + x + 12, texels_s32.val[3]);
729  }
730 
731  // Compute left-over elements
732  for (; x < window_end_x; ++x)
733  {
734  *(dst_ptr + x) = static_cast<int32_t>(*(src_ptr + x));
735  }
736  },
737  src, dst);
738  break;
739  }
740  default:
741  ARM_COMPUTE_ERROR("dst data type not supported");
742  }
743  break;
744  }
745 
746  case DataType::U16:
747  {
748  switch (_dst->info()->data_type())
749  {
750  case DataType::U8:
751  {
752  /* Down-conversion U16 -> U8 */
753  if (ConvertPolicy::SATURATE == _policy)
754  {
755  execute_window_loop(
756  win,
757  [&](const Coordinates &)
758  {
759  const auto src_ptr = reinterpret_cast<const uint16_t *>(src.ptr());
760  const auto dst_ptr = reinterpret_cast<uint8_t *>(dst.ptr());
761 
762  int x = window_start_x;
763  for (; x <= (window_end_x - window_step_x); x += window_step_x)
764  {
765  const uint16x8x2_t texels = {{vld1q_u16(src_ptr + x), vld1q_u16(src_ptr + x + 8)}};
766 
767  vst1q_u8(dst_ptr + x,
768  vcombine_u8(vqmovn_u16(texels.val[0]), vqmovn_u16(texels.val[1])));
769  }
770 
771  // Compute left-over elements
772  for (; x < window_end_x; ++x)
773  {
774  *(dst_ptr + x) = utils::cast::saturate_cast<uint8_t>(*(src_ptr + x));
775  }
776  },
777  src, dst);
778  }
779  else
780  {
781  execute_window_loop(
782  win,
783  [&](const Coordinates &)
784  {
785  const auto src_ptr = reinterpret_cast<const uint16_t *>(src.ptr());
786  const auto dst_ptr = reinterpret_cast<uint8_t *>(dst.ptr());
787 
788  int x = window_start_x;
789  for (; x <= (window_end_x - window_step_x); x += window_step_x)
790  {
791  const uint16x8x2_t texels = {{vld1q_u16(src_ptr + x), vld1q_u16(src_ptr + x + 8)}};
792 
793  vst1q_u8(dst_ptr + x,
794  vcombine_u8(vmovn_u16(texels.val[0]), vmovn_u16(texels.val[1])));
795  }
796 
797  // Compute left-over elements
798  for (; x < window_end_x; ++x)
799  {
800  *(dst_ptr + x) = static_cast<uint8_t>(*(src_ptr + x));
801  }
802  },
803  src, dst);
804  }
805  break;
806  }
807  case DataType::U32:
808  {
809  /* Up-conversion U16 -> U32 */
810  execute_window_loop(
811  win,
812  [&](const Coordinates &)
813  {
814  const auto src_ptr = reinterpret_cast<const uint16_t *>(src.ptr());
815  const auto dst_ptr = reinterpret_cast<uint32_t *>(dst.ptr());
816 
817  int x = window_start_x;
818  for (; x <= (window_end_x - window_step_x); x += window_step_x)
819  {
820  const uint16x8x2_t texels = {{vld1q_u16(src_ptr + x), vld1q_u16(src_ptr + x + 8)}};
821 
822  vst1q_u32(dst_ptr + x, vmovl_u16(vget_low_u16(texels.val[0])));
823  vst1q_u32(dst_ptr + x + 4, vmovl_u16(vget_high_u16(texels.val[0])));
824  vst1q_u32(dst_ptr + x + 8, vmovl_u16(vget_low_u16(texels.val[1])));
825  vst1q_u32(dst_ptr + x + 12, vmovl_u16(vget_high_u16(texels.val[1])));
826  }
827  // Compute left-over elements
828  for (; x < window_end_x; ++x)
829  {
830  *(dst_ptr + x) = static_cast<uint32_t>(*(src_ptr + x));
831  }
832  },
833  src, dst);
834  break;
835  }
836  default:
837  ARM_COMPUTE_ERROR("dst data type not supported");
838  }
839  break;
840  }
841  case DataType::F16:
842  {
843  /* conversion F16 -> any data type */
844  ARM_COMPUTE_ERROR_ON(uk->ukernel == nullptr);
845  uk->ukernel(_src, _dst, info, _policy, window);
846  break;
847  }
848  case DataType::F32:
849  switch (_dst->info()->data_type())
850  {
851  case DataType::F16:
852  {
853  /* Down-conversion F32 -> F16 */
854  ARM_COMPUTE_ERROR_ON(uk->ukernel == nullptr);
855  uk->ukernel(_src, _dst, info, _policy, window);
856  break;
857  }
858  case DataType::S32:
859  {
860  /* Conversion F32 -> S32 */
861  execute_window_loop(
862  win,
863  [&](const Coordinates &)
864  {
865  const auto src_ptr = reinterpret_cast<const float *>(src.ptr());
866  const auto dst_ptr = reinterpret_cast<int32_t *>(dst.ptr());
867 
868  int x = window_start_x;
869  for (; x <= (window_end_x - window_step_x); x += window_step_x)
870  {
871  const float32x4x4_t texels = {{
872  vld1q_f32(src_ptr + x),
873  vld1q_f32(src_ptr + x + 4),
874  vld1q_f32(src_ptr + x + 8),
875  vld1q_f32(src_ptr + x + 12),
876  }};
877 
878  vst1q_s32(dst_ptr + x, vcvtq_s32_f32(texels.val[0]));
879  vst1q_s32(dst_ptr + x + 4, vcvtq_s32_f32(texels.val[1]));
880  vst1q_s32(dst_ptr + x + 8, vcvtq_s32_f32(texels.val[2]));
881  vst1q_s32(dst_ptr + x + 12, vcvtq_s32_f32(texels.val[3]));
882  }
883 
884  // Compute left-over elements
885  for (; x < window_end_x; ++x)
886  {
887  *(dst_ptr + x) = static_cast<int32_t>(*(src_ptr + x));
888  }
889  },
890  src, dst);
891  break;
892  }
893  case DataType::QASYMM8:
894  case DataType::U8:
895  {
896  /* Down-conversion F32 -> U8 */
897  execute_window_loop(
898  win,
899  [&](const Coordinates &)
900  {
901  const auto src_ptr = reinterpret_cast<const float *>(src.ptr());
902  const auto dst_ptr = reinterpret_cast<uint8_t *>(dst.ptr());
903 
904  int x = window_start_x;
905  for (; x <= (window_end_x - window_step_x); x += window_step_x)
906  {
907  const float32x4x4_t texels = {{
908  vld1q_f32(src_ptr + x),
909  vld1q_f32(src_ptr + x + 4),
910  vld1q_f32(src_ptr + x + 8),
911  vld1q_f32(src_ptr + x + 12),
912  }};
913 
914  vst1_u8(dst_ptr + x,
915  vqmovn_u16(vcombine_u16(vqmovun_s32(vcvtq_s32_f32(texels.val[0])),
916  vqmovun_s32(vcvtq_s32_f32(texels.val[1])))));
917  vst1_u8(dst_ptr + x + 8,
918  vqmovn_u16(vcombine_u16(vqmovun_s32(vcvtq_s32_f32(texels.val[2])),
919  vqmovun_s32(vcvtq_s32_f32(texels.val[3])))));
920  }
921 
922  // Compute left-over elements
923  for (; x < window_end_x; ++x)
924  {
925  *(dst_ptr + x) = utils::cast::saturate_cast<uint8_t>(*(src_ptr + x));
926  }
927  },
928  src, dst);
929  break;
930  }
931  case DataType::QASYMM8_SIGNED:
932  {
933  /* Down-conversion F32 -> QASYMM8_SIGNED */
934  execute_window_loop(
935  win,
936  [&](const Coordinates &)
937  {
938  const auto src_ptr = reinterpret_cast<const float *>(src.ptr());
939  const auto dst_ptr = reinterpret_cast<int8_t *>(dst.ptr());
940 
941  int x = window_start_x;
942  for (; x <= (window_end_x - window_step_x); x += window_step_x)
943  {
944  const float32x4x4_t texels = {{
945  vld1q_f32(src_ptr + x),
946  vld1q_f32(src_ptr + x + 4),
947  vld1q_f32(src_ptr + x + 8),
948  vld1q_f32(src_ptr + x + 12),
949  }};
950 
951  vst1_s8(dst_ptr + x,
952  vqmovn_s16(vcombine_s16(vqmovn_s32(vcvtq_s32_f32(texels.val[0])),
953  vqmovn_s32(vcvtq_s32_f32(texels.val[1])))));
954  vst1_s8(dst_ptr + x + 8,
955  vqmovn_s16(vcombine_s16(vqmovn_s32(vcvtq_s32_f32(texels.val[2])),
956  vqmovn_s32(vcvtq_s32_f32(texels.val[3])))));
957  }
958  // Compute left-over elements
959  for (; x < window_end_x; ++x)
960  {
961  *(dst_ptr + x) = utils::cast::saturate_cast<int8_t>(*(src_ptr + x));
962  }
963  },
964  src, dst);
965  break;
966  }
967 
968  default:
969  ARM_COMPUTE_ERROR("dst data type not supported");
970  }
971  break;
972  case DataType::S32:
973  switch (_dst->info()->data_type())
974  {
975 #if __aarch64__
976  case DataType::S64:
977  {
978  convert64<int32_t, int64_t>(src, dst, win, window_start_x, window_end_x, window_step_x);
979  break;
980  }
981 #endif // __aarch64__
982  case DataType::F16:
983  {
984  /* Down-conversion S32 -> F16 */
985  ARM_COMPUTE_ERROR_ON(uk->ukernel == nullptr);
986  uk->ukernel(_src, _dst, info, _policy, window);
987  break;
988  }
989  case DataType::F32:
990  {
991  /* Conversion S32 -> F32 */
992  execute_window_loop(
993  win,
994  [&](const Coordinates &)
995  {
996  const auto src_ptr = reinterpret_cast<const int32_t *>(src.ptr());
997  const auto dst_ptr = reinterpret_cast<float *>(dst.ptr());
998 
999  int x = window_start_x;
1000  for (; x <= (window_end_x - window_step_x); x += window_step_x)
1001  {
1002  const int32x4x4_t texels = {{
1003  vld1q_s32(src_ptr + x),
1004  vld1q_s32(src_ptr + x + 4),
1005  vld1q_s32(src_ptr + x + 8),
1006  vld1q_s32(src_ptr + x + 12),
1007  }};
1008 
1009  vst1q_f32(dst_ptr + x, vcvtq_f32_s32(texels.val[0]));
1010  vst1q_f32(dst_ptr + x + 4, vcvtq_f32_s32(texels.val[1]));
1011  vst1q_f32(dst_ptr + x + 8, vcvtq_f32_s32(texels.val[2]));
1012  vst1q_f32(dst_ptr + x + 12, vcvtq_f32_s32(texels.val[3]));
1013  }
1014 
1015  // Compute left-over elements
1016  for (; x < window_end_x; ++x)
1017  {
1018  *(dst_ptr + x) = static_cast<float>(*(src_ptr + x));
1019  }
1020  },
1021  src, dst);
1022  break;
1023  }
1024  case DataType::QASYMM8_SIGNED:
1025  {
1026  /* Down-conversion S32 -> QASYMM8_SIGNED */
1027  if (ConvertPolicy::SATURATE == _policy)
1028  {
1029  execute_window_loop(
1030  win,
1031  [&](const Coordinates &)
1032  {
1033  const auto src_ptr = reinterpret_cast<const int32_t *>(src.ptr());
1034  const auto dst_ptr = reinterpret_cast<int8_t *>(dst.ptr());
1035 
1036  int x = window_start_x;
1037  for (; x <= (window_end_x - window_step_x); x += window_step_x)
1038  {
1039  const int32x4x4_t texels = {{
1040  vld1q_s32(src_ptr + x),
1041  vld1q_s32(src_ptr + x + 4),
1042  vld1q_s32(src_ptr + x + 8),
1043  vld1q_s32(src_ptr + x + 12),
1044  }};
1045  vst1_s8(dst_ptr + x, vqmovn_s16(vcombine_s16(vqmovn_s32(texels.val[0]),
1046  vqmovn_s32(texels.val[1]))));
1047  vst1_s8(dst_ptr + x + 8, vqmovn_s16(vcombine_s16(vqmovn_s32(texels.val[2]),
1048  vqmovn_s32(texels.val[3]))));
1049  }
1050 
1051  // Compute left-over elements
1052  for (; x < window_end_x; ++x)
1053  {
1054  *(dst_ptr + x) = utils::cast::saturate_cast<int8_t>(*(src_ptr + x));
1055  }
1056  },
1057  src, dst);
1058  }
1059  else
1060  {
1061  execute_window_loop(
1062  win,
1063  [&](const Coordinates &)
1064  {
1065  const auto src_ptr = reinterpret_cast<const int32_t *>(src.ptr());
1066  const auto dst_ptr = reinterpret_cast<int8_t *>(dst.ptr());
1067 
1068  int x = window_start_x;
1069  for (; x <= (window_end_x - window_step_x); x += window_step_x)
1070  {
1071  const int32x4x4_t texels = {{vld1q_s32(src_ptr + x), vld1q_s32(src_ptr + x + 4),
1072  vld1q_s32(src_ptr + x + 8),
1073  vld1q_s32(src_ptr + x + 12)}};
1074 
1075  vst1_s8(dst_ptr + x, vmovn_s16(vcombine_s16(vmovn_s32(texels.val[0]),
1076  vmovn_s32(texels.val[1]))));
1077  vst1_s8(dst_ptr + x + 8, vmovn_s16(vcombine_s16(vmovn_s32(texels.val[2]),
1078  vmovn_s32(texels.val[3]))));
1079  }
1080 
1081  // Compute left-over elements
1082  for (; x < window_end_x; ++x)
1083  {
1084  *(dst_ptr + x) = static_cast<int8_t>(*(src_ptr + x));
1085  }
1086  },
1087  src, dst);
1088  }
1089  break;
1090  }
1091  case DataType::QASYMM8:
1092  case DataType::U8:
1093  {
1094  /* Down-conversion S32 -> U8 */
1095  if (ConvertPolicy::SATURATE == _policy)
1096  {
1097  execute_window_loop(
1098  win,
1099  [&](const Coordinates &)
1100  {
1101  const auto src_ptr = reinterpret_cast<const int32_t *>(src.ptr());
1102  const auto dst_ptr = reinterpret_cast<uint8_t *>(dst.ptr());
1103 
1104  int x = window_start_x;
1105  for (; x <= (window_end_x - window_step_x); x += window_step_x)
1106  {
1107  const int32x4x4_t texels = {{vld1q_s32(src_ptr + x), vld1q_s32(src_ptr + x + 4),
1108  vld1q_s32(src_ptr + x + 8),
1109  vld1q_s32(src_ptr + x + 12)}};
1110  vst1_u8(dst_ptr + x, vqmovn_u16(vcombine_u16(vqmovun_s32(texels.val[0]),
1111  vqmovun_s32(texels.val[1]))));
1112  vst1_u8(dst_ptr + x + 8, vqmovn_u16(vcombine_u16(vqmovun_s32(texels.val[2]),
1113  vqmovun_s32(texels.val[3]))));
1114  }
1115 
1116  // Compute left-over elements
1117  for (; x < window_end_x; ++x)
1118  {
1119  *(dst_ptr + x) = utils::cast::saturate_cast<uint8_t>(*(src_ptr + x));
1120  }
1121  },
1122  src, dst);
1123  }
1124  else
1125  {
1126  execute_window_loop(
1127  win,
1128  [&](const Coordinates &)
1129  {
1130  const auto src_ptr = reinterpret_cast<const int32_t *>(src.ptr());
1131  const auto dst_ptr = reinterpret_cast<uint8_t *>(dst.ptr());
1132 
1133  int x = window_start_x;
1134  for (; x <= (window_end_x - window_step_x); x += window_step_x)
1135  {
1136  const int32x4x4_t texels = {{vld1q_s32(src_ptr + x), vld1q_s32(src_ptr + x + 4),
1137  vld1q_s32(src_ptr + x + 8),
1138  vld1q_s32(src_ptr + x + 12)}};
1139 
1140  vst1_u8(dst_ptr + x,
1141  vmovn_u16(vcombine_u16(vmovn_u32(vreinterpretq_u32_s32(texels.val[0])),
1142  vmovn_u32(vreinterpretq_u32_s32(texels.val[1])))));
1143  vst1_u8(dst_ptr + x + 8,
1144  vmovn_u16(vcombine_u16(vmovn_u32(vreinterpretq_u32_s32(texels.val[2])),
1145  vmovn_u32(vreinterpretq_u32_s32(texels.val[3])))));
1146  }
1147 
1148  // Compute left-over elements
1149  for (; x < window_end_x; ++x)
1150  {
1151  *(dst_ptr + x) = static_cast<uint8_t>(*(src_ptr + x));
1152  }
1153  },
1154  src, dst);
1155  }
1156  break;
1157  }
1158  default:
1159  ARM_COMPUTE_ERROR("dst data type not supported");
1160  }
1161  break;
1162  default:
1163  ARM_COMPUTE_ERROR("Not supported");
1164  }
1165 }

References arm_compute::ACL_DST, arm_compute::ACL_SRC, ARM_COMPUTE_ERROR, ARM_COMPUTE_ERROR_ON, ARM_COMPUTE_ERROR_ON_INVALID_SUBWINDOW, ARM_COMPUTE_ERROR_ON_NULLPTR, ARM_COMPUTE_ERROR_ON_UNCONFIGURED_KERNEL, ARM_COMPUTE_UNUSED, arm_compute::test::validation::data_type, ITensorInfo::data_type(), Window::DimX, arm_compute::test::validation::dst, Window::Dimension::end(), arm_compute::execute_window_loop(), arm_compute::F16, arm_compute::F32, CPUInfo::get(), ITensorPack::get_const_tensor(), ICpuKernel< CpuCastKernel >::get_implementation(), CPUInfo::get_isa(), ITensorPack::get_tensor(), ITensor::info(), arm_compute::test::validation::info, arm_compute::QASYMM8, arm_compute::QASYMM8_SIGNED, arm_compute::S16, arm_compute::S32, arm_compute::S64, arm_compute::SATURATE, Window::set(), arm_compute::test::validation::src, Window::Dimension::start(), arm_compute::U16, arm_compute::U32, arm_compute::U64, arm_compute::U8, IKernel::window(), and Window::x().
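A single-threaded dispatch sketch follows. The tensor-pack slots ACL_SRC/ACL_DST match the ones read back inside run_op above; the include paths and the ThreadInfo set-up are assumptions, since in the library the scheduler normally fills in the thread descriptor.

    #include "arm_compute/core/CPP/CPPTypes.h"
    #include "arm_compute/core/ITensorPack.h"
    #include "arm_compute/core/experimental/Types.h"
    #include "arm_compute/runtime/Tensor.h"
    #include "src/cpu/kernels/CpuCastKernel.h" // internal header, path is an assumption

    using namespace arm_compute;

    void run_cast_once(cpu::kernels::CpuCastKernel &kernel, Tensor &src, Tensor &dst)
    {
        // Bind the already-allocated tensors under the slots run_op() expects.
        ITensorPack pack;
        pack.add_tensor(TensorType::ACL_SRC, &src);
        pack.add_tensor(TensorType::ACL_DST, &dst);

        // Minimal thread descriptor; some micro-kernel paths consult the CPU info.
        ThreadInfo info{};
        info.cpu_info = &CPUInfo::get();

        // Execute over the full window computed during configure().
        kernel.run_op(pack, kernel.window(), info);
    }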

◆ validate()

Status validate ( const ITensorInfo *  src,
const ITensorInfo *  dst,
ConvertPolicy  policy 
)
static

Static function to check if given info will lead to a valid configuration.

Similar to CpuCastKernel::configure()

Returns
a status

Definition at line 180 of file CpuCastKernel.cpp.

181 {
182  ARM_COMPUTE_RETURN_ON_ERROR(validate_arguments(src, dst, policy));
183  return Status{};
184 }

References ARM_COMPUTE_RETURN_ON_ERROR, arm_compute::test::validation::dst, arm_compute::test::validation::src, and arm_compute::cpu::kernels::validate_arguments().

Referenced by CpuCast::validate().
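For example, validate() can serve as a capability probe before anything is configured. The sketch below is illustrative; the include path is an assumption, while error_code() and error_description() are the standard Status accessors.

    #include <iostream>
    #include "arm_compute/core/Error.h"
    #include "arm_compute/core/TensorInfo.h"
    #include "arm_compute/core/Types.h"
    #include "src/cpu/kernels/CpuCastKernel.h" // internal header, path is an assumption

    using namespace arm_compute;

    bool cast_is_supported(const TensorInfo &src, const TensorInfo &dst, ConvertPolicy policy)
    {
        const Status status = cpu::kernels::CpuCastKernel::validate(&src, &dst, policy);
        if (status.error_code() != ErrorCode::OK)
        {
            std::cout << "Cast not supported: " << status.error_description() << "\n";
            return false;
        }
        return true;
    }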


The documentation for this class was generated from the following files:

  • CpuCastKernel.h
  • CpuCastKernel.cpp