Compute Library 22.08
CpuWinogradConv2d Class Reference

#include <CpuWinogradConv2d.h>

Collaboration diagram for CpuWinogradConv2d:

Public Member Functions

 CpuWinogradConv2d ()
 Constructor. More...
 
 ARM_COMPUTE_DISALLOW_COPY_ALLOW_MOVE (CpuWinogradConv2d)
 
 ~CpuWinogradConv2d ()
 Destructor. More...
 
void configure (const ITensorInfo *src, const ITensorInfo *weights, const ITensorInfo *biases, ITensorInfo *dst, const PadStrideInfo &conv_info, const ActivationLayerInfo &act_info=ActivationLayerInfo(), bool enable_fast_math=false)
 Set the input and output tensors. More...
 
void run (ITensorPack &tensors) override
 Run the kernels contained in the function. More...
 
void prepare (ITensorPack &constants) override
 Prepare the function for executing. More...
 
experimental::MemoryRequirements workspace () const override
 Return the memory requirements required by the workspace. More...
 
- Public Member Functions inherited from INEOperator
 INEOperator (IRuntimeContext *ctx=nullptr)
 Constructor. More...
 
 INEOperator (const INEOperator &)=delete
 Prevent instances of this class from being copied (As this class contains pointers) More...
 
 INEOperator (INEOperator &&)=default
 Default move constructor. More...
 
INEOperator & operator= (const INEOperator &)=delete
 Prevent instances of this class from being copied (As this class contains pointers) More...
 
INEOperator & operator= (INEOperator &&)=default
 Default move assignment operator. More...
 
 ~INEOperator ()
 Default destructor. More...
 
- Public Member Functions inherited from IOperator
virtual ~IOperator ()=default
 Destructor. More...
 

Static Public Member Functions

static Status validate (const ITensorInfo *src, const ITensorInfo *weights, const ITensorInfo *biases, const ITensorInfo *dst, const PadStrideInfo &conv_info, const ActivationLayerInfo &act_info=ActivationLayerInfo(), bool enable_fast_math=false)
 Static function to check if given info will lead to a valid configuration of CpuWinogradConv2d. More...
 

Detailed Description

Definition at line 42 of file CpuWinogradConv2d.h.

Constructor & Destructor Documentation

◆ CpuWinogradConv2d()

Constructor.

Definition at line 134 of file CpuWinogradConv2d.cpp.

136  : _gemm_function(std::make_unique<CpuGemm>()),
137  _activation_func(std::make_unique<CpuActivation>()),
138  _transform_input_kernel(nullptr),
139  _transform_output_kernel(nullptr),
140  _permute_input(std::make_unique<CpuPermute>()),
141  _permute_output(std::make_unique<CpuPermute>()),
142  _permute_weights(std::make_unique<CpuPermute>()),
143  _aux_mem(AuxTensorIdx::Count),
144  _conv_args{ nullptr },
145  _winograd_impl{},
146  _data_layout(),
147  _winograd_transformed_input{},
148  _winograd_transformed_output{},
149  _winograd_transformed_weights{},
150  _input_workspace(),
151  _output_workspace(),
152  _weights_hwio(),
153  _input_nhwc(),
154  _output_nhwc(),
155  _is_prepared{ false },
156  _run_activation{ false }
157 {
158 }

◆ ~CpuWinogradConv2d()

~CpuWinogradConv2d ( )
default

Destructor.

Member Function Documentation

◆ ARM_COMPUTE_DISALLOW_COPY_ALLOW_MOVE()

ARM_COMPUTE_DISALLOW_COPY_ALLOW_MOVE ( CpuWinogradConv2d  )

◆ configure()

void configure ( const ITensorInfo *  src,
const ITensorInfo *  weights,
const ITensorInfo *  biases,
ITensorInfo *  dst,
const PadStrideInfo &  conv_info,
const ActivationLayerInfo &  act_info = ActivationLayerInfo(),
bool  enable_fast_math = false 
)

Set the input and output tensors.

Valid data layouts:

  • NHWC
  • NCHW

Valid data type configurations:

src0 src1 src2 dst
F16 F16 F16 F16
F32 F32 F32 F32
Parameters
[in]  src              Source tensor info. 3 lower dimensions represent a single input [width, height, IFM], while every optional dimension from 4 and above represents a batch of inputs. Data types supported: F16/F32.
[in]  weights          Weights tensor info. Weights are a 4D tensor with dimensions [kernel_x, kernel_y, IFM, OFM]. Data type supported: Same as input. Currently only 3x3 and 5x5 kernels are supported.
[in]  biases           Biases tensor info. Shared biases are supported. Biases are a 1D tensor with dimensions [OFM]. Data type supported: Same as weights.
[out] dst              Destination tensor info. 3 lower dimensions represent a single output [width, height, OFM], while the rest represent a batch of outputs. Data types supported: Same as input.
[in]  conv_info        Contains padding and stride information described in PadStrideInfo. Currently only unit strides are supported.
[in]  act_info         (Optional) Activation layer information in case of a fused activation.
[in]  enable_fast_math (Optional) Enable fast math computation. If this flag is set, the function may dispatch the fastest implementation available, which can introduce a drop in accuracy. Default is false.
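The valid data type configurations above can be captured as a simple predicate. The sketch below is illustrative only; `DataType` and the function name are stand-ins, not Compute Library symbols:

```cpp
// Stand-in for the library's DataType enumeration (illustrative only).
enum class DataType { F16, F32, U8 };

// Mirrors the valid-configuration table: src0 (input), src1 (weights),
// src2 (biases) and dst must all share the same floating-point type.
bool is_valid_winograd_type_config(DataType src0, DataType src1, DataType src2, DataType dst)
{
    const bool all_same = (src0 == src1) && (src1 == src2) && (src2 == dst);
    const bool is_float = (src0 == DataType::F16) || (src0 == DataType::F32);
    return all_same && is_float;
}
```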

Definition at line 162 of file CpuWinogradConv2d.cpp.

References ARM_COMPUTE_ERROR_ON_NULLPTR, ARM_COMPUTE_ERROR_THROW_ON, ARM_COMPUTE_EXIT_ON_MSG_VAR, ARM_COMPUTE_LOG_MSG_WITH_FORMAT_ACL, ARM_COMPUTE_LOG_PARAMS, ARM_COMPUTE_UNUSED, ITensorInfo::data_layout(), ITensorInfo::data_type(), ITensorInfo::dimension(), ITensorInfo::element_size(), ActivationLayerInfo::enabled(), Scheduler::get(), arm_compute::logging::INFO, arm_compute::test::validation::info, TensorInfo::init(), arm_compute::test::validation::k, arm_compute::test::validation::m, MemoryInfo::merge(), arm_compute::test::validation::n, arm_compute::NCHW, IScheduler::num_threads(), arm_compute::offset_int_vec(), arm_compute::experimental::Prepare, Dimensions< T >::set(), ITensorInfo::total_size(), TensorInfo::total_size(), arm_compute::utils::cast::U, arm_compute::U8, and arm_compute::cpu::kernels::validate_arguments().

164 {
167  ARM_COMPUTE_LOG_PARAMS(src, weights, biases, dst, conv_info, act_info, enable_fast_math);
168  ARM_COMPUTE_UNUSED(biases);
169  const DataType data_type = src->data_type();
170  uint32_t nthreads = NEScheduler::get().num_threads();
171  _data_layout = src->data_layout();
172  const Tensor4DShape kernel_shape{ internal_get_shape(weights) };
173 
174  bool success = get_winograd_kernel_implementation(src, weights, dst, conv_info, act_info, enable_fast_math, &_winograd_impl, _conv_args);
175 
176  ARM_COMPUTE_EXIT_ON_MSG_VAR(!success, "Unsupported kernel size: %d x %d.\n", kernel_shape.n_rows, kernel_shape.n_cols);
177  ARM_COMPUTE_LOG_MSG_WITH_FORMAT_ACL(arm_compute::logging::LogLevel::INFO, "Using input transform: %s\n", _winograd_impl.input_transform->get_name().c_str());
178  ARM_COMPUTE_LOG_MSG_WITH_FORMAT_ACL(arm_compute::logging::LogLevel::INFO, "Using weight transform: %s\n", _winograd_impl.weight_transform->get_name().c_str());
179  ARM_COMPUTE_LOG_MSG_WITH_FORMAT_ACL(arm_compute::logging::LogLevel::INFO, "Using output transform: %s\n", _winograd_impl.output_transform->get_name().c_str());
180 
181  const bool has_impl = ((_winograd_impl.input_transform != nullptr) && (_winograd_impl.output_transform != nullptr) && (_winograd_impl.gemm_args != nullptr));
182  if(has_impl)
183  {
184  // Determine how much working space is required, allocate it.
185  const size_t input_workspace_size = _winograd_impl.input_transform->get_working_space_size(*_conv_args, nthreads);
186  const size_t output_workspace_size = _winograd_impl.output_transform->get_working_space_size(*_conv_args, nthreads);
187 
188  TensorInfo input_workspace_info(TensorShape(input_workspace_size), 1, DataType::U8);
189  TensorInfo output_workspace_info(TensorShape(output_workspace_size), 1, DataType::U8);
190  _input_workspace = input_workspace_info;
191  _output_workspace = output_workspace_info;
192 
193  const auto &wds = _winograd_impl.winograd_spec;
194 
195  // Preparing winograd transformed input tensor
196  const size_t data_type_size = src->element_size();
197  const uint32_t m = _winograd_impl.gemm_args->_Msize; // Total number of tiles
198  const uint32_t k = _winograd_impl.gemm_args->_Ksize; // Input channels
199  const uint32_t n = _winograd_impl.gemm_args->_Nsize; // Output channels
200  const uint32_t n_gemms = _winograd_impl.gemm_args->_nmulti;
201  const uint32_t n_batches = _winograd_impl.gemm_args->_nbatches;
202  constexpr size_t storage_alignment = 64;
203 
204  const TensorShape a_shape(k, m, n_batches, n_gemms);
205  Strides a_strides(data_type_size);
206  a_strides.set(1, data_type_size * _winograd_impl.winograd_spec.input_ld_row);
207  a_strides.set(2, data_type_size * _winograd_impl.winograd_spec.input_ld_batch);
208  a_strides.set(3, data_type_size * _winograd_impl.winograd_spec.input_ld_matrix);
209 
210  const TensorShape b_shape(n, k, n_gemms);
211  Strides b_strides(data_type_size);
212  b_strides.set(1, data_type_size * _winograd_impl.winograd_spec.weight_ld_row);
213  b_strides.set(2, data_type_size * _winograd_impl.winograd_spec.weight_ld_matrix);
214 
215  const TensorShape d_shape(n, m, n_batches, n_gemms);
216  Strides d_strides(data_type_size);
217  d_strides.set(1, data_type_size * _winograd_impl.winograd_spec.output_ld_row);
218  d_strides.set(2, data_type_size * _winograd_impl.winograd_spec.output_ld_batch);
219  d_strides.set(3, data_type_size * _winograd_impl.winograd_spec.output_ld_matrix);
220 
221  TensorInfo a_info{};
222  TensorInfo b_info{};
223  TensorInfo d_info{};
224  a_info.init(a_shape, 1, data_type, a_strides, 0, wds.input_matrix_size_bytes);
225  b_info.init(b_shape, 1, data_type, b_strides, 0, wds.weight_matrix_size_bytes);
226  d_info.init(d_shape, 1, data_type, d_strides, 0, wds.output_matrix_size_bytes);
227 
228  _winograd_transformed_input = a_info;
229  _winograd_transformed_weights = b_info;
230  _winograd_transformed_output = d_info;
231 
232  PermutationVector weights_permutation_vector(3U, 0U, 1U, 2U);
233 
234  // Configure the kernel to transform the input tensor from NCHW -> NHWC
235  if(_data_layout == DataLayout::NCHW)
236  {
237  _permute_input->configure(src, &_input_nhwc, PermutationVector(2U, 0U, 1U));
238  weights_permutation_vector = PermutationVector(3U, 2U, 0U, 1U);
239  }
240 
241  // Re-order a weight tensor from [Output feature map x Input feature map x Height x Width] to [Height x Width x Input feature map x Output feature map]
242  _permute_weights->configure(weights, &_weights_hwio, weights_permutation_vector);
243 
244  // Reorder the convoluted output to ACL's ordering NCHW
245  if(_data_layout == DataLayout::NCHW)
246  {
247  // configure and allocate dst tensor to be used to convert from winograd domain to spatial domain when calling to reshape_output()
248  TensorInfo info(TensorShape(dst->dimension(2), dst->dimension(0),
249  dst->dimension(1), dst->dimension(3)),
250  1, dst->data_type());
251  _output_nhwc = info;
252  _permute_output->configure(&_output_nhwc, dst, PermutationVector(1U, 2U, 0U));
253  }
254 
255  // Configure input transform kernel
256  _transform_input_kernel = std::make_unique<CpuWinogradConv2dTransformInputKernel>(_winograd_impl, *_conv_args, nthreads);
257 
258  // Configure GEMM function
259  _gemm_function->configure(&_winograd_transformed_input, &_winograd_transformed_weights, nullptr, &_winograd_transformed_output, 1.0f, 0.f);
260 
261  // Configure output transform kernel
262  _transform_output_kernel = std::make_unique<CpuWinogradConv2dTransformOutputKernel>(_winograd_impl, *_conv_args, nthreads);
263 
264  //Configure Activation Layer
265  _run_activation = act_info.enabled() && !fuse_function_supported(act_info);
266  if(_run_activation)
267  {
268  _activation_func->configure(dst, nullptr, act_info);
269  }
270 
271  auto asm_mem_req = _gemm_function->workspace();
272  _aux_mem[GemmWorkspace] = asm_mem_req[GemmWorkspace];
273  _aux_mem[Pretranspose] = asm_mem_req[Pretranspose];
274  _aux_mem[InterleavedLHS] = asm_mem_req[InterleavedLHS];
275  _aux_mem[TransposedRHS] = asm_mem_req[TransposedRHS];
276  _aux_mem[TempResult] = asm_mem_req[TempResult];
277 
278  // Request temporary memory. Overlap memory needed for Input/Output transformations as they run on different non-overlapping time-steps.
279  _aux_mem[TransformedInput] = MemoryInfo(offset_int_vec(TransformedInput), MemoryLifetime::Temporary, wds.input_matrix_size_bytes, storage_alignment);
280  _aux_mem[TransformedOutput] = MemoryInfo(offset_int_vec(TransformedOutput), MemoryLifetime::Temporary, wds.output_matrix_size_bytes, storage_alignment);
281  _aux_mem[WorkspaceIO] = MemoryInfo(offset_int_vec(WorkspaceIO), MemoryLifetime::Temporary, std::max(input_workspace_size, output_workspace_size));
282  _aux_mem[PermutedWeights] = MemoryInfo(offset_int_vec(PermutedWeights), MemoryLifetime::Prepare, _weights_hwio.total_size());
283  _aux_mem[TransformedWeights] = MemoryInfo(offset_int_vec(TransformedWeights), MemoryLifetime::Persistent, wds.weight_matrix_size_bytes, storage_alignment);
284  if(_data_layout == DataLayout::NCHW)
285  {
286  _aux_mem[PermutedInput].merge(offset_int_vec(PermutedInput), src->total_size());
287  _aux_mem[PermutedOutput].merge(offset_int_vec(PermutedOutput), dst->total_size());
288  }
289  }
290 }
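The A/B/D GEMM shapes built above follow from the Winograd decomposition: `m` counts the output tiles, `k` and `n` are the input and output channels, and one GEMM runs per element of the transformed tile. The helper below is a hedged sketch of that arithmetic, inferred from the shapes in `configure()`; it is not library code:

```cpp
#include <cstdint>

// Dimensions of the batched GEMM for F(out_tile x out_tile, kernel x kernel)
// Winograd (illustrative stand-in, not Compute Library API).
struct WinogradGemmDims
{
    uint32_t m;       // total number of output tiles (GEMM rows)
    uint32_t k;       // input channels (GEMM inner dimension)
    uint32_t n;       // output channels (GEMM columns)
    uint32_t n_gemms; // one GEMM per element of the transformed tile
};

WinogradGemmDims winograd_gemm_dims(uint32_t out_h, uint32_t out_w,
                                    uint32_t ifm, uint32_t ofm,
                                    uint32_t out_tile, uint32_t kernel)
{
    const uint32_t tiles_h   = (out_h + out_tile - 1) / out_tile; // ceil division
    const uint32_t tiles_w   = (out_w + out_tile - 1) / out_tile;
    const uint32_t tile_side = out_tile + kernel - 1;             // transformed tile side
    return { tiles_h * tiles_w, ifm, ofm, tile_side * tile_side };
}
```

For example, a 3x3 kernel with 2x2 output tiles on an 8x8 output yields 16 tiles and 16 GEMMs (4x4 transformed tiles), consistent with the "Run 16 GEMMs" comment in run().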

◆ prepare()

void prepare ( ITensorPack &  constants)
override virtual

Prepare the function for executing.

Any one-off pre-processing step required by the function is handled here

Parameters
[in]constantsVector that contains the constants tensors.
Note
Prepare stage might not need all the function's buffers' backing memory to be available in order to execute

Reimplemented from INEOperator.

Definition at line 368 of file CpuWinogradConv2d.cpp.

References arm_compute::ACL_DST, arm_compute::ACL_SRC, arm_compute::ACL_SRC_1, ITensorPack::add_const_tensor(), ARM_COMPUTE_ERROR_ON_NULLPTR, ITensor::buffer(), CpuAuxTensorHandler::get(), ITensorPack::get_const_tensor(), ITensorPack::get_tensor(), ITensor::info(), ITensorInfo::offset_first_element_in_bytes(), arm_compute::offset_int_vec(), and arm_compute::utils::cast::polymorphic_cast().

Referenced by CpuWinogradConv2d::run().

369 {
370  if(!_is_prepared)
371  {
372  const ITensor *weights = tensors.get_const_tensor(ACL_SRC_1);
373  ITensor *weights_aux = utils::cast::polymorphic_cast<ITensor *>(tensors.get_tensor(offset_int_vec(PermutedWeights)));
374 
375  CpuAuxTensorHandler permuted_weights(_weights_hwio, *weights_aux);
376  ITensorPack permute_tensors{ { ACL_SRC, weights }, { ACL_DST, permuted_weights.get() } };
377  _permute_weights->run(permute_tensors);
378  const int element_size_in_bytes = permuted_weights.get()->info()->element_size();
379  // Weights were in OHWI format; after the permutation, "permuted_weights" is in HWIO format.
380  const unsigned int height_idx = 3; // H in HWIO
381  const unsigned int width_idx = 2; // W in HWIO
382  const unsigned int channel_idx = 1; // I in HWIO
383 
384  const int permuted_weight_row_stride = permuted_weights.get()->info()->strides_in_bytes()[height_idx] / element_size_in_bytes;
385  const int permuted_weight_col_stride = permuted_weights.get()->info()->strides_in_bytes()[width_idx] / element_size_in_bytes;
386  const int permuted_weight_channel_stride = permuted_weights.get()->info()->strides_in_bytes()[channel_idx] / element_size_in_bytes;
387 
388  // Wrap the winograd-domain transformed weight TensorInfo in Auxiliary tensor and allocate the required memory.
389  ITensor *weights_transf = utils::cast::polymorphic_cast<ITensor *>(tensors.get_tensor(offset_int_vec(TransformedWeights)));
390  ARM_COMPUTE_ERROR_ON_NULLPTR(weights_transf);
391  CpuAuxTensorHandler winograd_transformed_weights(_winograd_transformed_weights, *weights_transf);
392 
393  const void *permuted_weights_ptr;
394  void *win_wght_transf_ptr;
395 
396  permuted_weights_ptr = reinterpret_cast<const void *>(permuted_weights.get()->buffer() + permuted_weights.get()->info()->offset_first_element_in_bytes());
397  win_wght_transf_ptr = reinterpret_cast<void *>(winograd_transformed_weights.get()->buffer() + winograd_transformed_weights.get()->info()->offset_first_element_in_bytes());
398 
399  // Prepare Weights
400  _winograd_impl.weight_transform->execute(
401  *_conv_args,
402  permuted_weights_ptr,
403  permuted_weight_row_stride,
404  permuted_weight_col_stride,
405  permuted_weight_channel_stride,
406  win_wght_transf_ptr,
407  _winograd_impl.winograd_spec,
408  0, 1 // Thread 1 of 1
409  );
410  ITensorPack gemm_pack = tensors;
411  gemm_pack.add_const_tensor(ACL_SRC_1, winograd_transformed_weights.get());
412  _gemm_function->prepare(gemm_pack);
413  _is_prepared = true;
414  }
415 }
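prepare() divides the permuted weights' byte strides by the element size to obtain the element strides handed to the weight transform. For a dense HWIO tensor (dimension index 3 = H, 2 = W, 1 = I, 0 = O, with O innermost) those strides can be derived as below. This is an illustrative sketch, not Compute Library API:

```cpp
#include <cstddef>

// Element strides of a dense HWIO weight tensor (stand-in types/names).
struct HwioStrides
{
    int row;     // stride between kernel rows (H)
    int col;     // stride between kernel columns (W)
    int channel; // stride between input channels (I)
};

HwioStrides hwio_element_strides(size_t /*h*/, size_t w, size_t i, size_t o,
                                 size_t element_size)
{
    // Dense layout: O is innermost, then I, then W, then H.
    const size_t channel_bytes = o * element_size; // step one input channel
    const size_t col_bytes     = i * channel_bytes; // step one kernel column
    const size_t row_bytes     = w * col_bytes;     // step one kernel row
    return { static_cast<int>(row_bytes / element_size),
             static_cast<int>(col_bytes / element_size),
             static_cast<int>(channel_bytes / element_size) };
}
```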

◆ run()

void run ( ITensorPack &  tensors)
override virtual

Run the kernels contained in the function.

Parameters
[in]tensorsVector that contains the tensors to operate on.

Reimplemented from INEOperator.

Definition at line 310 of file CpuWinogradConv2d.cpp.

References arm_compute::ACL_BIAS, arm_compute::ACL_DST, arm_compute::ACL_INT, arm_compute::ACL_SRC, arm_compute::ACL_SRC_0, arm_compute::ACL_SRC_1, arm_compute::ACL_SRC_2, ITensorPack::add_const_tensor(), ITensorPack::add_tensor(), Window::DimX, Scheduler::get(), CpuAuxTensorHandler::get(), ITensorPack::get_const_tensor(), ITensorPack::get_tensor(), arm_compute::NCHW, IScheduler::num_threads(), arm_compute::offset_int_vec(), arm_compute::test::validation::pack, CpuWinogradConv2d::prepare(), IScheduler::schedule_op(), and Window::set().

311 {
312  prepare(tensors);
313  auto src = tensors.get_const_tensor(ACL_SRC_0);
314  auto biases = tensors.get_const_tensor(ACL_SRC_2);
315  auto output = tensors.get_tensor(ACL_DST);
316  Window win;
317 
318  const uint32_t nthreads = NEScheduler::get().num_threads();
319 
320  // The Winograd transform implementation does fine-grain threading inside the transforms. Just pass thread_id and nthreads.
321  win.set(Window::DimX, Window::Dimension(0, nthreads, 1));
322 
323  // Wrap the winograd-domain tensorInfos created in configuration in tensors and allocate the required memory.
324  CpuAuxTensorHandler input_nhwc(offset_int_vec(PermutedInput), _input_nhwc, tensors, true);
325  CpuAuxTensorHandler winograd_input_transformed(offset_int_vec(TransformedInput), _winograd_transformed_input, tensors, true);
326  CpuAuxTensorHandler input_workspace(offset_int_vec(WorkspaceIO), _input_workspace, tensors, true);
327  const bool is_nchw = _data_layout == DataLayout::NCHW;
328  if(is_nchw)
329  {
330  //Bring channels to the front as Winograd code expects the tensor to be in the format NHWC
331  ITensorPack pack{ { ACL_SRC, src }, { ACL_DST, input_nhwc.get() } };
332  _permute_input->run(pack);
333  }
334 
335  CpuAuxTensorHandler winograd_output_transformed(offset_int_vec(TransformedOutput), _winograd_transformed_output, tensors, true);
336  CpuAuxTensorHandler output_workspace(offset_int_vec(WorkspaceIO), _output_workspace, tensors, true);
337  CpuAuxTensorHandler output_nhwc(offset_int_vec(PermutedOutput), _output_nhwc, tensors, true);
338 
339  ITensorPack transform_input_pack{ { ACL_SRC, is_nchw ? input_nhwc.get() : src }, { ACL_DST, winograd_input_transformed.get() }, { ACL_INT, input_workspace.get() } };
340  NEScheduler::get().schedule_op(_transform_input_kernel.get(), Window::DimX, win, transform_input_pack);
341 
342  CpuAuxTensorHandler winograd_weights_transformed(offset_int_vec(TransformedWeights), _winograd_transformed_weights, tensors, true);
343 
344  // Run 16 GEMMs in multiple threads, each kernel runs one or more GEMMs
345  ITensorPack gemm_pack = tensors;
346  gemm_pack.add_const_tensor(ACL_SRC, winograd_input_transformed.get());
347  gemm_pack.add_const_tensor(ACL_SRC_1, winograd_weights_transformed.get());
348  gemm_pack.add_const_tensor(ACL_BIAS, nullptr);
349  gemm_pack.add_tensor(ACL_DST, winograd_output_transformed.get());
350  _gemm_function->run(gemm_pack);
351 
352  // Output transform
353  ITensorPack transform_output_pack{ { ACL_SRC_0, winograd_output_transformed.get() }, { ACL_DST, is_nchw ? output_nhwc.get() : output }, { ACL_SRC_1, biases }, { ACL_INT, output_workspace.get() } };
354  NEScheduler::get().schedule_op(_transform_output_kernel.get(), Window::DimX, win, transform_output_pack);
355  if(is_nchw)
356  {
357  // Reorder the convoluted output to ACL's ordering NCHW
358  ITensorPack pack{ { ACL_SRC, output_nhwc.get() }, { ACL_DST, output } };
359  _permute_output->run(pack);
360  }
361  if(_run_activation)
362  {
363  ITensorPack pack{ { ACL_SRC, output }, { ACL_DST, output } };
364  _activation_func->run(pack);
365  }
366 }
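run() builds a one-dimensional window of size `nthreads` so the scheduler hands each transform invocation a thread id; the transform then derives its own share of the tiles internally. The partition below is an assumed sketch of such fine-grain splitting, not the library's actual scheme:

```cpp
#include <algorithm>
#include <cstdint>
#include <utility>

// Contiguous tile range [begin, end) for a given thread id, assuming a
// simple ceil-divided chunking (illustrative stand-in only).
std::pair<uint32_t, uint32_t> tile_range_for_thread(uint32_t n_tiles,
                                                    uint32_t thread_id,
                                                    uint32_t nthreads)
{
    const uint32_t chunk = (n_tiles + nthreads - 1) / nthreads; // ceil division
    const uint32_t begin = std::min(thread_id * chunk, n_tiles);
    const uint32_t end   = std::min(begin + chunk, n_tiles);
    return { begin, end };
}
```

Threads past the last occupied chunk simply receive an empty range, which matches the pattern of passing only `thread_id` and `nthreads` into the transforms.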

◆ validate()

Status validate ( const ITensorInfo *  src,
const ITensorInfo *  weights,
const ITensorInfo *  biases,
const ITensorInfo *  dst,
const PadStrideInfo &  conv_info,
const ActivationLayerInfo &  act_info = ActivationLayerInfo(),
bool  enable_fast_math = false 
)
)
static

Static function to check if given info will lead to a valid configuration of CpuWinogradConv2d.

Similar to CpuWinogradConv2d::configure()

Returns
a status
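As with other operators, callers are expected to check validate() before calling configure(). The `Status` class below is a minimal stand-in for `arm_compute::Status`, shown only to illustrate the check-then-configure idiom; it is not the real class:

```cpp
#include <string>
#include <utility>

// Minimal stand-in for a status type: default construction means OK,
// a non-empty error description means failure (illustrative only).
class Status
{
public:
    Status() = default; // OK status
    explicit Status(std::string err) : _error(std::move(err)) {}
    explicit operator bool() const { return _error.empty(); }
    const std::string &error_description() const { return _error; }
private:
    std::string _error{};
};

// With the real API the idiom looks roughly like:
//   Status s = CpuWinogradConv2d::validate(src, weights, biases, dst, conv_info);
//   if(bool(s)) { op.configure(src, weights, biases, dst, conv_info); }
```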

Definition at line 291 of file CpuWinogradConv2d.cpp.

References ARM_COMPUTE_LOG_MSG_WITH_FORMAT_ACL, ARM_COMPUTE_RETURN_ERROR_ON_MSG_VAR, ARM_COMPUTE_RETURN_ERROR_ON_NULLPTR, ARM_COMPUTE_RETURN_ON_ERROR, arm_compute::logging::INFO, and arm_compute::cpu::kernels::validate_arguments().

Referenced by CpuConv2d::get_convolution_method(), NEWinogradConvolutionLayer::validate(), and CpuConv2d::validate().

293 {
296 
297  const Tensor4DShape kernel_shape{ internal_get_shape(weights) };
298  arm_conv::winograd::WinogradImpl winograd_impl{};
299 
300  std::unique_ptr<arm_conv::ConvolutionArgs> conv_args;
301  const bool success = get_winograd_kernel_implementation(src, weights, dst, conv_info, act_info, enable_fast_math, &winograd_impl, conv_args);
302 
303  ARM_COMPUTE_RETURN_ERROR_ON_MSG_VAR(success == false, "Unsupported kernel size: %d x %d.\n", kernel_shape.n_rows, kernel_shape.n_cols);
304  ARM_COMPUTE_LOG_MSG_WITH_FORMAT_ACL(arm_compute::logging::LogLevel::INFO, "Using input transform: %s\n", winograd_impl.input_transform->get_name().c_str());
305  ARM_COMPUTE_LOG_MSG_WITH_FORMAT_ACL(arm_compute::logging::LogLevel::INFO, "Using weight transform: %s\n", winograd_impl.weight_transform->get_name().c_str());
306  ARM_COMPUTE_LOG_MSG_WITH_FORMAT_ACL(arm_compute::logging::LogLevel::INFO, "Using output transform: %s\n", winograd_impl.output_transform->get_name().c_str());
307  return Status{};
308 }

◆ workspace()

experimental::MemoryRequirements workspace ( ) const
override virtual

Return the memory requirements required by the workspace.

Reimplemented from INEOperator.

Definition at line 416 of file CpuWinogradConv2d.cpp.

417 {
418  return _aux_mem;
419 }
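A runtime that consumes these requirements typically sizes one backing buffer by laying the auxiliary tensors out back to back, padding each offset up to the requested alignment (configure() requests 64-byte alignment for the transformed tensors). A sketch under that assumption, where `MemInfo` is a stand-in for `experimental::MemoryInfo` rather than the real type:

```cpp
#include <cstddef>
#include <vector>

// Stand-in for a single workspace requirement (illustrative only).
struct MemInfo
{
    size_t size;
    size_t alignment; // 0 means no particular alignment
};

// Total bytes needed when requirements are packed sequentially,
// aligning each tensor's offset before placing it.
size_t total_workspace_bytes(const std::vector<MemInfo> &reqs)
{
    size_t total = 0;
    for(const auto &r : reqs)
    {
        const size_t align = (r.alignment == 0) ? 1 : r.alignment;
        total = (total + align - 1) / align * align; // round offset up
        total += r.size;
    }
    return total;
}
```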

The documentation for this class was generated from the following files: