Compute Library
 23.08
ClWinogradConv2d Class Reference

Basic function to execute Winograd-based convolution on OpenCL.

#include <ClWinogradConv2d.h>


Public Member Functions

 ClWinogradConv2d ()
 Default constructor.

 ~ClWinogradConv2d ()
 Default destructor.

 ClWinogradConv2d (const ClWinogradConv2d &)=delete
 Prevent instances of this class from being copied (as this class contains pointers).

 ClWinogradConv2d (ClWinogradConv2d &&)=default
 Default move constructor.

ClWinogradConv2d & operator= (const ClWinogradConv2d &)=delete
 Prevent instances of this class from being copied (as this class contains pointers).

ClWinogradConv2d & operator= (ClWinogradConv2d &&)=default
 Default move assignment operator.

void configure (const ClCompileContext &compile_context, ITensorInfo *src, ITensorInfo *weights, ITensorInfo *biases, ITensorInfo *dst, const PadStrideInfo &conv_info, const ActivationLayerInfo &act_info=ActivationLayerInfo(), bool enable_fast_math=false)
 Set the input and output tensors.

void run (ITensorPack &tensors) override
 Run the kernels contained in the function.

void prepare (ITensorPack &tensors) override
 Prepare the function for executing.

experimental::MemoryRequirements workspace () const override
 Return the memory requirements required by the workspace.
 
Public Member Functions inherited from ICLOperator
 ICLOperator (IRuntimeContext *ctx=nullptr)
 Constructor.

 ICLOperator (const ICLOperator &)=delete
 Prevent instances of this class from being copied (as this class contains pointers).

 ICLOperator (ICLOperator &&)=default
 Default move constructor.

ICLOperator & operator= (const ICLOperator &)=delete
 Prevent instances of this class from being copied (as this class contains pointers).

ICLOperator & operator= (ICLOperator &&)=default
 Default move assignment operator.

Public Member Functions inherited from IOperator
virtual ~IOperator ()=default
 Destructor.
 

Static Public Member Functions

static Status validate (const ITensorInfo *src, const ITensorInfo *weights, const ITensorInfo *biases, const ITensorInfo *dst, const PadStrideInfo &conv_info, const ActivationLayerInfo &act_info=ActivationLayerInfo(), bool enable_fast_math=false)
 Static function to check if given info will lead to a valid configuration.
 

Detailed Description

Basic function to execute Winograd-based convolution on OpenCL.

This function calls the following OpenCL functions/kernels:

  1. kernels::ClWinogradInputTransformKernel
  2. kernels::ClWinogradFilterTransformKernel (only once)
  3. ClGemm
  4. kernels::ClWinogradOutputTransformKernel

Definition at line 53 of file ClWinogradConv2d.h.

Constructor & Destructor Documentation

◆ ClWinogradConv2d() [1/3]

Default constructor.

Definition at line 158 of file ClWinogradConv2d.cpp.

    : _batched_mm(),
      _input_transform(std::make_unique<kernels::ClWinogradInputTransformKernel>()),
      _filter_transform(std::make_unique<kernels::ClWinogradFilterTransformKernel>()),
      _output_transform(std::make_unique<kernels::ClWinogradOutputTransformKernel>()),
      _border_handler(),
      _input0(),
      _input1(),
      _batched_mm_output(),
      _is_prepared(false),
      _aux_mem()
{
}

◆ ~ClWinogradConv2d()

~ClWinogradConv2d ( )
default

Default destructor.

◆ ClWinogradConv2d() [2/3]

ClWinogradConv2d ( const ClWinogradConv2d & )
delete

Prevent instances of this class from being copied (As this class contains pointers)

◆ ClWinogradConv2d() [3/3]

ClWinogradConv2d ( ClWinogradConv2d && )
default

Default move constructor.

Member Function Documentation

◆ configure()

void configure ( const ClCompileContext & compile_context,
ITensorInfo * src,
ITensorInfo * weights,
ITensorInfo * biases,
ITensorInfo * dst,
const PadStrideInfo & conv_info,
const ActivationLayerInfo & act_info = ActivationLayerInfo(),
bool  enable_fast_math = false
)

Set the input and output tensors.

Valid data layouts:

  • NHWC
  • NCHW

Valid data type configurations:

src0    src1    src2    dst
F16     F16     F16     F16
F32     F32     F32     F32
Note
This function only works with 3x3, 3x1, 1x3, 5x5, 5x1, 1x5, 7x1 and 1x7 kernels and unit strides, for both NCHW and NHWC data layouts.
Some Winograd configurations (e.g. F(4x4, 5x5)) are supported only with enable_fast_math = true.
Parameters
[in]  compile_context  The compile context to be used.
[in]  src  Source tensor info. The 3 lower dimensions represent a single input [width, height, IFM]; every optional dimension from the 4th onwards represents a batch of inputs. Data types supported: F16/F32.
[in]  weights  Weights tensor info. Weights are a 4D tensor with dimensions [kernel_x, kernel_y, IFM, OFM]. Data type supported: Same as src.
[in]  biases  Biases tensor info. Shared biases supported. Biases are a 1D tensor with dimensions [OFM]. Data type supported: Same as src.
[out]  dst  Destination tensor info. The 3 lower dimensions represent a single output [width, height, OFM], while the rest represent a batch of outputs. Data types supported: Same as src.
[in]  conv_info  Contains padding and stride information described in PadStrideInfo.
[in]  act_info  (Optional) Activation layer information in case of a fused activation.
[in]  enable_fast_math  (Optional) Enable fast math computation. If this flag is set, the function may dispatch the fastest implementation available, which may also reduce accuracy. Default is false.
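
The note on supported kernel sizes and unit strides can be turned into a quick pre-check before building tensor infos. The sketch below only mirrors the sizes listed in that note; the function name is_documented_winograd_config and its parameters are hypothetical, and it is not a substitute for ClWinogradConv2d::validate(), which also checks data types, layouts and the fast-math requirement.

```cpp
#include <algorithm>
#include <array>
#include <utility>

// Hypothetical helper: returns true if the kernel size appears in the documented
// list (3x3, 3x1, 1x3, 5x5, 5x1, 1x5, 7x1, 1x7) and both strides are 1.
// The list is symmetric, so the order of the two kernel dimensions does not matter.
bool is_documented_winograd_config(unsigned kernel_a, unsigned kernel_b,
                                   unsigned stride_x, unsigned stride_y)
{
    static const std::array<std::pair<unsigned, unsigned>, 8> kernels{ {
        { 3u, 3u }, { 3u, 1u }, { 1u, 3u }, { 5u, 5u },
        { 5u, 1u }, { 1u, 5u }, { 7u, 1u }, { 1u, 7u }
    } };

    // Winograd here is documented for unit strides only
    if(stride_x != 1u || stride_y != 1u)
    {
        return false;
    }

    return std::find(kernels.begin(), kernels.end(),
                     std::make_pair(kernel_a, kernel_b)) != kernels.end();
}
```

A caller could use such a check to fall back to another convolution method early, and still call validate() on the final tensor infos before configure().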

Definition at line 174 of file ClWinogradConv2d.cpp.

{
    ARM_COMPUTE_ERROR_THROW_ON(validate_arguments(src, weights, biases, dst, conv_info, act_info, enable_fast_math));
    ARM_COMPUTE_LOG_PARAMS(src, weights, biases, dst, conv_info, act_info, enable_fast_math);

    // Get indices for the width and height
    const size_t idx_width  = get_data_layout_dimension_index(src->data_layout(), DataLayoutDimension::WIDTH);
    const size_t idx_height = get_data_layout_dimension_index(src->data_layout(), DataLayoutDimension::HEIGHT);

    // Input shape, kernel size and output tile
    const Size2D input_dims  = Size2D(src->tensor_shape()[idx_width], src->tensor_shape()[idx_height]);
    const Size2D kernel_size = Size2D(weights->tensor_shape()[idx_width], weights->tensor_shape()[idx_height]);
    const Size2D output_tile = winograd_output_tile(input_dims, kernel_size, src->data_layout());

    // Check if the Winograd configuration requires fast math
    if(!enable_fast_math)
    {
        ARM_COMPUTE_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN(src, 1, DataType::F32); // Disable Winograd for FP16 if fast math is false
        ARM_COMPUTE_ERROR_ON_MSG(check_support_fast_math(output_tile, kernel_size), "This Winograd configuration requires enable_fast_math=true");
    }
    const WinogradInfo winograd_info = WinogradInfo(output_tile,
                                                    kernel_size,
                                                    input_dims,
                                                    conv_info,
                                                    src->data_layout());

    _is_prepared = false;

    // Configure input transform
    _input_transform->configure(compile_context, src, &_input0, winograd_info);
    _border_handler.configure(compile_context, src, _input_transform->border_size(), BorderMode::CONSTANT, PixelValue());

    // Configure filter transform
    _filter_transform->configure(compile_context, weights, &_input1, winograd_info);

    // Configure batched matrix multiply
    _batched_mm.configure(compile_context, &_input0, &_input1, nullptr, &_batched_mm_output, 1.0f, 0.0f, GEMMInfo(false, false, true /* Reshape weights only for the first run */, 0,
                                                                                                                  false, false,
                                                                                                                  GEMMLowpOutputStageInfo(),
                                                                                                                  (src->data_type() == DataType::F16)));

    // Configure output transform
    _output_transform->set_target(CLScheduler::get().target());
    _output_transform->configure(compile_context, &_batched_mm_output, biases, dst, winograd_info, act_info);

    _aux_mem = _batched_mm.workspace();
    const MemoryLifetime wino_wei_lifetm = std::any_of(std::begin(_aux_mem), std::end(_aux_mem), [](const auto & r)
    {
        return (r.lifetime == MemoryLifetime::Persistent) && (r.size > 0);
    }) ?
    MemoryLifetime::Prepare :
    MemoryLifetime::Persistent;
    _aux_mem.push_back(MemoryInfo(offset_int_vec(2), MemoryLifetime::Temporary, _input0.total_size()));
    _aux_mem.push_back(MemoryInfo(offset_int_vec(3), wino_wei_lifetm, _input1.total_size()));
    _aux_mem.push_back(MemoryInfo(offset_int_vec(4), MemoryLifetime::Temporary, _batched_mm_output.total_size()));
}
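
The last step of configure() decides how long the Winograd-transformed weights must stay allocated. One reading of that logic, sketched below with standalone stand-in types, is: if the GEMM already reports a non-empty persistent buffer (its own reshaped copy of the weights), the transformed weights are only needed during the prepare stage; otherwise they have to persist across runs. The enum Lifetime, the struct MemReq and the function transformed_weights_lifetime are hypothetical mirrors for illustration and are not the library's types.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for experimental::MemoryLifetime and experimental::MemoryInfo
enum class Lifetime
{
    Temporary,
    Persistent,
    Prepare
};

struct MemReq
{
    int         slot;
    Lifetime    lifetime;
    std::size_t size;
};

// Pick the lifetime of the transformed weights from the GEMM's own requirements
Lifetime transformed_weights_lifetime(const std::vector<MemReq> &gemm_reqs)
{
    // True if the GEMM holds a non-empty buffer that survives across runs
    const bool gemm_keeps_persistent_copy =
        std::any_of(gemm_reqs.begin(), gemm_reqs.end(), [](const MemReq &r)
    {
        return (r.lifetime == Lifetime::Persistent) && (r.size > 0);
    });

    return gemm_keeps_persistent_copy ? Lifetime::Prepare : Lifetime::Persistent;
}
```

Under this assumption, a zero-sized persistent entry does not count as a kept copy, so the transformed weights stay persistent in that case.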

References arm_compute::test::validation::act_info, ARM_COMPUTE_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN, ARM_COMPUTE_ERROR_ON_MSG, ARM_COMPUTE_ERROR_THROW_ON, ARM_COMPUTE_LOG_PARAMS, CLFillBorderKernel::configure(), ClGemm::configure(), arm_compute::CONSTANT, arm_compute::test::validation::conv_info, arm_compute::test::validation::dst, arm_compute::mlgo::parser::end(), arm_compute::F16, arm_compute::F32, CLScheduler::get(), arm_compute::get_data_layout_dimension_index(), arm_compute::HEIGHT, arm_compute::test::validation::idx_height, arm_compute::test::validation::idx_width, arm_compute::offset_int_vec(), arm_compute::experimental::Prepare, arm_compute::test::validation::src, ITensorInfo::tensor_shape(), TensorInfo::total_size(), arm_compute::cpu::kernels::validate_arguments(), arm_compute::WIDTH, and ClGemm::workspace().

◆ operator=() [1/2]

ClWinogradConv2d& operator= ( ClWinogradConv2d &&  )
default

Default move assignment operator.

◆ operator=() [2/2]

ClWinogradConv2d& operator= ( const ClWinogradConv2d & )
delete

Prevent instances of this class from being copied (As this class contains pointers)

◆ prepare()

void prepare ( ITensorPack & constants )
override virtual

Prepare the function for executing.

Any one-off pre-processing step required by the function is handled here.

Parameters
[in]  constants  Vector that contains the constant tensors.
Note
The prepare stage might not need the backing memory of all the function's buffers to be available in order to execute.

Reimplemented from ICLOperator.

Definition at line 279 of file ClWinogradConv2d.cpp.

{
    if(!_is_prepared)
    {
        auto       weights = utils::cast::polymorphic_downcast<const ICLTensor *>(tensors.get_const_tensor(TensorType::ACL_SRC_1));
        ICLTensor *in1_aux = utils::cast::polymorphic_downcast<ICLTensor *>(tensors.get_tensor(offset_int_vec(3)));

        CLAuxTensorHandler input1(_input1, *in1_aux);
        ITensorPack        pack_ft
        {
            { TensorType::ACL_SRC, weights },
            { TensorType::ACL_DST, input1.get() },
        };
        // Run filter transform and mark original weights as unused
        CLScheduler::get().enqueue_op(*_filter_transform, pack_ft, false);
        weights->mark_as_unused();

        // Prepare GEMM and release reshaped weights if marked unused by ClGemm
        ITensorPack mm_prepare_pack = tensors;
        mm_prepare_pack.add_tensor(ACL_SRC_1, input1.get());
        _batched_mm.prepare(mm_prepare_pack);

        CLScheduler::get().queue().finish();
        _is_prepared = true;
    }
}

References arm_compute::ACL_DST, arm_compute::ACL_SRC, arm_compute::ACL_SRC_1, ITensorPack::add_tensor(), CLScheduler::enqueue_op(), CLScheduler::get(), CLAuxTensorHandler::get(), ITensorPack::get_const_tensor(), ITensorPack::get_tensor(), arm_compute::offset_int_vec(), ClGemm::prepare(), and CLScheduler::queue().

Referenced by ClWinogradConv2d::run().
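
Because run() calls prepare() and prepare() is guarded by _is_prepared, the filter transform executes only on the first run. The standalone sketch below illustrates that one-shot pattern with a toy class; OneShotWeights, its members and the scaling by 0.5 used as a stand-in "transform" are all hypothetical and unrelated to the actual OpenCL kernels.

```cpp
#include <utility>
#include <vector>

// Toy operator: transforms its weights once, then reuses them on every run
class OneShotWeights
{
public:
    explicit OneShotWeights(std::vector<float> weights)
        : _weights(std::move(weights))
    {
    }

    // One-off pre-processing, safe to call repeatedly
    void prepare()
    {
        if(!_is_prepared)
        {
            _transformed.reserve(_weights.size());
            for(float w : _weights)
            {
                _transformed.push_back(0.5f * w); // stand-in for the filter transform
            }
            ++_transform_runs;
            _is_prepared = true;
        }
    }

    // Every run first makes sure the operator is prepared
    float run(float x)
    {
        prepare();
        float acc = 0.0f;
        for(float t : _transformed)
        {
            acc += t * x;
        }
        return acc;
    }

    // Number of times the weights were actually transformed
    int transform_runs() const
    {
        return _transform_runs;
    }

private:
    std::vector<float> _weights;
    std::vector<float> _transformed{};
    int                _transform_runs{ 0 };
    bool               _is_prepared{ false };
};
```

Calling run() several times on the same object leaves transform_runs() at 1, which is the behaviour the _is_prepared flag provides in the real operator.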

◆ run()

void run ( ITensorPack & tensors )
override virtual

Run the kernels contained in the function.

Parameters
[in]  tensors  Vector that contains the tensors to operate on.

Reimplemented from ICLOperator.

Definition at line 239 of file ClWinogradConv2d.cpp.

{
    const bool is_gemm_reshaped = _aux_mem[3].lifetime == MemoryLifetime::Prepare;

    auto src    = utils::cast::polymorphic_downcast<const ICLTensor *>(tensors.get_const_tensor(TensorType::ACL_SRC_0));
    auto biases = utils::cast::polymorphic_downcast<const ICLTensor *>(tensors.get_const_tensor(TensorType::ACL_SRC_2));
    auto dst    = utils::cast::polymorphic_downcast<ICLTensor *>(tensors.get_tensor(TensorType::ACL_DST));

    CLAuxTensorHandler input0(offset_int_vec(2), _input0, tensors, true);
    CLAuxTensorHandler input1(offset_int_vec(3), _input1, tensors, true, is_gemm_reshaped);
    CLAuxTensorHandler batched_mm_output(offset_int_vec(4), _batched_mm_output, tensors, true);

    prepare(tensors);

    // Run input transform
    ITensorPack pack_it
    {
        { TensorType::ACL_SRC, src },
        { TensorType::ACL_DST, input0.get() },
    };
    CLScheduler::get().enqueue_op(_border_handler, pack_it, false);
    CLScheduler::get().enqueue_op(*_input_transform, pack_it, false);

    // Run batched matrix multiplication
    ITensorPack pack_mm = tensors;
    pack_mm.add_const_tensor(TensorType::ACL_SRC_0, input0.get());
    pack_mm.add_tensor(TensorType::ACL_DST, batched_mm_output.get());
    is_gemm_reshaped ? pack_mm.remove_tensor(TensorType::ACL_SRC_1) : pack_mm.add_const_tensor(TensorType::ACL_SRC_1, input1.get());
    _batched_mm.run(pack_mm);

    // Run output transform
    ITensorPack pack_ot
    {
        { TensorType::ACL_SRC_0, batched_mm_output.get() },
        { TensorType::ACL_SRC_1, biases },
        { TensorType::ACL_DST, dst },
    };
    CLScheduler::get().enqueue_op(*_output_transform, pack_ot);
}

References arm_compute::ACL_DST, arm_compute::ACL_SRC, arm_compute::ACL_SRC_0, arm_compute::ACL_SRC_1, arm_compute::ACL_SRC_2, ITensorPack::add_const_tensor(), ITensorPack::add_tensor(), arm_compute::test::validation::dst, CLScheduler::enqueue_op(), CLScheduler::get(), CLAuxTensorHandler::get(), ITensorPack::get_const_tensor(), ITensorPack::get_tensor(), arm_compute::offset_int_vec(), arm_compute::experimental::Prepare, ClWinogradConv2d::prepare(), ITensorPack::remove_tensor(), ClGemm::run(), and arm_compute::test::validation::src.

◆ validate()

Status validate ( const ITensorInfo * src,
const ITensorInfo * weights,
const ITensorInfo * biases,
const ITensorInfo * dst,
const PadStrideInfo & conv_info,
const ActivationLayerInfo & act_info = ActivationLayerInfo(),
bool  enable_fast_math = false
)
static

Static function to check if given info will lead to a valid configuration.

◆ workspace()

experimental::MemoryRequirements workspace ( ) const
override virtual

Return the memory requirements required by the workspace.

Reimplemented from ICLOperator.

Definition at line 306 of file ClWinogradConv2d.cpp.

{
    return _aux_mem;
}
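
A caller is expected to provide backing memory for the requirements that workspace() reports before calling run(). The sketch below shows one way such a list could be turned into per-slot buffers, assuming plain host memory and standalone stand-in types; MemReq and allocate_workspace are hypothetical names, and a real caller would allocate OpenCL tensors and add them to the ITensorPack under the reported slot ids instead.

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical stand-in for one entry of experimental::MemoryRequirements
struct MemReq
{
    int         slot; // id under which the buffer is looked up at run time
    std::size_t size; // required size in bytes
};

// Allocate one zero-initialised byte buffer per non-empty requirement, keyed by slot
std::map<int, std::vector<unsigned char>> allocate_workspace(const std::vector<MemReq> &reqs)
{
    std::map<int, std::vector<unsigned char>> buffers;
    for(const MemReq &r : reqs)
    {
        if(r.size > 0)
        {
            buffers.emplace(r.slot, std::vector<unsigned char>(r.size));
        }
    }
    return buffers;
}
```

Entries with a size of zero are skipped, so only slots that actually need memory receive a buffer.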

The documentation for this class was generated from the following files:

  • ClWinogradConv2d.h
  • ClWinogradConv2d.cpp