Compute Library 22.08
ClWinogradConv2d Class Reference

Basic function to execute Winograd-based convolution on OpenCL. More...

#include <ClWinogradConv2d.h>

Collaboration diagram for ClWinogradConv2d

Public Member Functions

 ClWinogradConv2d ()
 Default constructor. More...
 
 ~ClWinogradConv2d ()
 Default destructor. More...
 
 ClWinogradConv2d (const ClWinogradConv2d &)=delete
 Prevent instances of this class from being copied (As this class contains pointers) More...
 
 ClWinogradConv2d (ClWinogradConv2d &&)=default
 Default move constructor. More...
 
ClWinogradConv2d & operator= (const ClWinogradConv2d &)=delete
 Prevent instances of this class from being copied (As this class contains pointers) More...
 
ClWinogradConv2d & operator= (ClWinogradConv2d &&)=default
 Default move assignment operator. More...
 
void configure (const ClCompileContext &compile_context, ITensorInfo *src, ITensorInfo *weights, ITensorInfo *biases, ITensorInfo *dst, const PadStrideInfo &conv_info, const ActivationLayerInfo &act_info=ActivationLayerInfo(), bool enable_fast_math=false)
 Set the input and output tensors. More...
 
void run (ITensorPack &tensors) override
 Run the kernels contained in the function. More...
 
void prepare (ITensorPack &tensors) override
 Prepare the function for executing. More...
 
experimental::MemoryRequirements workspace () const override
 Return the memory requirements required by the workspace. More...
 
- Public Member Functions inherited from ICLOperator
 ICLOperator (IRuntimeContext *ctx=nullptr)
 Constructor. More...
 
 ICLOperator (const ICLOperator &)=delete
 Prevent instances of this class from being copied (As this class contains pointers) More...
 
 ICLOperator (ICLOperator &&)=default
 Default move constructor. More...
 
ICLOperator & operator= (const ICLOperator &)=delete
 Prevent instances of this class from being copied (As this class contains pointers) More...
 
ICLOperator & operator= (ICLOperator &&)=default
 Default move assignment operator. More...
 
- Public Member Functions inherited from IOperator
virtual ~IOperator ()=default
 Destructor. More...
 

Static Public Member Functions

static Status validate (const ITensorInfo *src, const ITensorInfo *weights, const ITensorInfo *biases, const ITensorInfo *dst, const PadStrideInfo &conv_info, const ActivationLayerInfo &act_info=ActivationLayerInfo(), bool enable_fast_math=false)
 Static function to check if given info will lead to a valid configuration. More...
 

Detailed Description

Basic function to execute Winograd-based convolution on OpenCL.

This function calls the following OpenCL functions/kernels:

  1. kernels::ClWinogradInputTransformKernel
  2. kernels::ClWinogradFilterTransformKernel (only once)
  3. ClGemm
  4. kernels::ClWinogradOutputTransformKernel

Definition at line 53 of file ClWinogradConv2d.h.
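
The snippet below is a minimal usage sketch of this pipeline, not code taken from the library or its tests: it validates a configuration, configures the operator once on tensor metadata, and then executes it on concrete CL tensors packed into an ITensorPack. The include path, the arm_compute::opencl namespace, the use of CLKernelLibrary::get().get_compile_context() to obtain the compile context, and all shapes and variable names are assumptions made for illustration.

#include "arm_compute/core/CL/CLKernelLibrary.h"
#include "arm_compute/core/TensorInfo.h"
#include "arm_compute/core/Types.h"
#include "arm_compute/runtime/CL/CLScheduler.h"
#include "arm_compute/runtime/CL/CLTensor.h"
#include "src/gpu/cl/operators/ClWinogradConv2d.h" // internal header, path assumed

using namespace arm_compute;

void run_winograd_conv2d_sketch()
{
    CLScheduler::get().default_init(); // create a default CL context/queue

    // 3x3 kernel, unit strides, NCHW: src/dst are [width, height, channels, batches],
    // weights are [kernel_x, kernel_y, IFM, OFM].
    TensorInfo src_info(TensorShape(56U, 56U, 64U, 1U), 1, DataType::F32);
    TensorInfo wei_info(TensorShape(3U, 3U, 64U, 64U), 1, DataType::F32);
    TensorInfo bia_info(TensorShape(64U), 1, DataType::F32);
    TensorInfo dst_info(TensorShape(56U, 56U, 64U, 1U), 1, DataType::F32);
    const PadStrideInfo conv_info(1, 1, 1, 1); // stride 1, pad 1 -> same spatial size

    // Reject unsupported Winograd configurations before doing any work.
    const Status st = opencl::ClWinogradConv2d::validate(&src_info, &wei_info, &bia_info, &dst_info, conv_info);
    if(st.error_code() != ErrorCode::OK)
    {
        return;
    }

    opencl::ClWinogradConv2d conv;
    conv.configure(CLKernelLibrary::get().get_compile_context(),
                   &src_info, &wei_info, &bia_info, &dst_info, conv_info);

    // Back the infos with real CL tensors and hand them to run() via an ITensorPack.
    CLTensor src, weights, biases, dst;
    src.allocator()->init(src_info);
    weights.allocator()->init(wei_info);
    biases.allocator()->init(bia_info);
    dst.allocator()->init(dst_info);
    for(auto *t : { &src, &weights, &biases, &dst })
    {
        t->allocator()->allocate();
    }

    ITensorPack pack{ { TensorType::ACL_SRC_0, &src },
                      { TensorType::ACL_SRC_1, &weights },
                      { TensorType::ACL_SRC_2, &biases },
                      { TensorType::ACL_DST, &dst } };
    conv.run(pack); // the first run also triggers prepare() (filter transform, GEMM preparation)
    CLScheduler::get().sync();
}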

Constructor & Destructor Documentation

◆ ClWinogradConv2d() [1/3]

Default constructor.

Definition at line 158 of file ClWinogradConv2d.cpp.

159  : _batched_mm(),
160  _input_transform(std::make_unique<kernels::ClWinogradInputTransformKernel>()),
161  _filter_transform(std::make_unique<kernels::ClWinogradFilterTransformKernel>()),
162  _output_transform(std::make_unique<kernels::ClWinogradOutputTransformKernel>()),
163  _border_handler(),
164  _input0(),
165  _input1(),
166  _batched_mm_output(),
167  _is_prepared(false),
168  _aux_mem()
169 {
170 }

◆ ~ClWinogradConv2d()

~ClWinogradConv2d ( )
default

Default destructor.

◆ ClWinogradConv2d() [2/3]

ClWinogradConv2d ( const ClWinogradConv2d & )
delete

Prevent instances of this class from being copied (As this class contains pointers)

◆ ClWinogradConv2d() [3/3]

Default move constructor.

Member Function Documentation

◆ configure()

void configure ( const ClCompileContext & compile_context,
ITensorInfo *  src,
ITensorInfo *  weights,
ITensorInfo *  biases,
ITensorInfo *  dst,
const PadStrideInfo & conv_info,
const ActivationLayerInfo & act_info = ActivationLayerInfo(),
bool  enable_fast_math = false 
)

Set the input and output tensors.

Valid data layouts:

  • NHWC
  • NCHW

Valid data type configurations:

src0 src1 src2 dst
F16 F16 F16 F16
F32 F32 F32 F32
Note
This function only works with 3x3, 3x1, 1x3, 5x5, 5x1, 1x5, 7x1 and 1x7 kernels, along with unit strides, for both the NCHW and NHWC data layouts.
Some Winograd configurations (e.g. F(4x4, 5x5)) are supported only with enable_fast_math = true.
Parameters
  [in]  compile_context   The compile context to be used.
  [in]  src               Source tensor info. The 3 lower dimensions represent a single input [width, height, IFM], while every optional dimension from 4 and above represents a batch of inputs. Data types supported: F16/F32.
  [in]  weights           Weights tensor info. Weights are a 4D tensor with dimensions [kernel_x, kernel_y, IFM, OFM]. Data type supported: Same as src.
  [in]  biases            Biases tensor info. Shared biases are supported. Biases are a 1D tensor with dimensions [OFM]. Data type supported: Same as src.
  [out] dst               Destination tensor info. The 3 lower dimensions represent a single output [width, height, OFM], while the rest represent a batch of outputs. Data types supported: Same as src.
  [in]  conv_info         Contains padding and stride information described in PadStrideInfo.
  [in]  act_info          (Optional) Activation layer information in case of a fused activation.
  [in]  enable_fast_math  (Optional) Enable fast math computation. If this flag is set, the function may dispatch the fastest implementation available, which can introduce a drop in accuracy. Default is false.

Definition at line 174 of file ClWinogradConv2d.cpp.

References ARM_COMPUTE_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN, ARM_COMPUTE_ERROR_ON_MSG, ARM_COMPUTE_ERROR_THROW_ON, ARM_COMPUTE_LOG_PARAMS, CLFillBorderKernel::configure(), ClGemm::configure(), arm_compute::CONSTANT, ITensorInfo::data_layout(), ITensorInfo::data_type(), arm_compute::mlgo::parser::end(), arm_compute::F16, arm_compute::F32, CLScheduler::get(), arm_compute::get_data_layout_dimension_index(), arm_compute::HEIGHT, arm_compute::test::validation::idx_height, arm_compute::test::validation::idx_width, arm_compute::offset_int_vec(), arm_compute::experimental::Prepare, ITensorInfo::tensor_shape(), TensorInfo::total_size(), arm_compute::cpu::kernels::validate_arguments(), arm_compute::WIDTH, and ClGemm::workspace().

176 {
177  ARM_COMPUTE_ERROR_THROW_ON(validate_arguments(src, weights, biases, dst, conv_info, act_info, enable_fast_math));
178  ARM_COMPUTE_LOG_PARAMS(src, weights, biases, dst, conv_info, act_info, enable_fast_math);
179 
180  // Get indices for the width and height
181  const size_t idx_width  = get_data_layout_dimension_index(src->data_layout(), DataLayoutDimension::WIDTH);
182  const size_t idx_height = get_data_layout_dimension_index(src->data_layout(), DataLayoutDimension::HEIGHT);
183 
184  // Input shape, kernel size and output tile
185  const Size2D input_dims = Size2D(src->tensor_shape()[idx_width], src->tensor_shape()[idx_height]);
186  const Size2D kernel_size = Size2D(weights->tensor_shape()[idx_width], weights->tensor_shape()[idx_height]);
187  const Size2D output_tile = winograd_output_tile(input_dims, kernel_size, src->data_layout());
188 
189  // Check if the Winograd configuration requires fast math
190  if(!enable_fast_math)
191  {
192  ARM_COMPUTE_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN(src, 1, DataType::F32); //disable winograd for fp16 if fast math is false.
193  ARM_COMPUTE_ERROR_ON_MSG(check_support_fast_math(output_tile, kernel_size), "This Winograd configuration requires enable_fast_math=true");
194  }
195  const WinogradInfo winograd_info = WinogradInfo(output_tile,
196  kernel_size,
197  input_dims,
198  conv_info,
199  src->data_layout());
200 
201  _is_prepared = false;
202 
203  // Configure input transform
204  _input_transform->configure(compile_context, src, &_input0, winograd_info);
205  _border_handler.configure(compile_context, src, _input_transform->border_size(), BorderMode::CONSTANT, PixelValue());
206 
207  // Configure filter transform
208  _filter_transform->configure(compile_context, weights, &_input1, winograd_info);
209 
210  // Configure batched matrix multiply
211  _batched_mm.configure(compile_context, &_input0, &_input1, nullptr, &_batched_mm_output, 1.0f, 0.0f, GEMMInfo(false, false, true /* Reshape weights only for the first run*/, 0,
212  false, false,
213  GEMMLowpOutputStageInfo(),
214  (src->data_type() == DataType::F16)));
215 
216  // Configure output transform
217  _output_transform->set_target(CLScheduler::get().target());
218  _output_transform->configure(compile_context, &_batched_mm_output, biases, dst, winograd_info, act_info);
219 
220  _aux_mem = _batched_mm.workspace();
221  const MemoryLifetime wino_wei_lifetm = std::any_of(std::begin(_aux_mem), std::end(_aux_mem), [](const auto & r)
222  {
223  return (r.lifetime == MemoryLifetime::Persistent) && (r.size > 0);
224  }) ?
225  MemoryLifetime::Prepare :
226  MemoryLifetime::Persistent;
227  _aux_mem.push_back(MemoryInfo(offset_int_vec(2), MemoryLifetime::Temporary, _input0.total_size()));
228  _aux_mem.push_back(MemoryInfo(offset_int_vec(3), wino_wei_lifetm, _input1.total_size()));
229  _aux_mem.push_back(MemoryInfo(offset_int_vec(4), MemoryLifetime::Temporary, _batched_mm_output.total_size()));
230 }
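In line with the checks in the listing above (F16 is rejected unless fast math is enabled), the fragment below sketches a configuration that needs enable_fast_math. It continues the sketch from the detailed description; the shapes, the fused ReLU and the use of CLKernelLibrary::get().get_compile_context() are illustrative assumptions, not code from the library.

// Same setup style as the earlier sketch, but with F16 data and a fused ReLU activation.
TensorInfo src_info(TensorShape(56U, 56U, 64U, 1U), 1, DataType::F16);
TensorInfo wei_info(TensorShape(3U, 3U, 64U, 64U), 1, DataType::F16);
TensorInfo bia_info(TensorShape(64U), 1, DataType::F16);
TensorInfo dst_info(TensorShape(56U, 56U, 64U, 1U), 1, DataType::F16);
const PadStrideInfo conv_info(1, 1, 1, 1); // only unit strides are supported

opencl::ClWinogradConv2d conv;
conv.configure(CLKernelLibrary::get().get_compile_context(),
               &src_info, &wei_info, &bia_info, &dst_info, conv_info,
               ActivationLayerInfo(ActivationLayerInfo::ActivationFunction::RELU),
               /* enable_fast_math */ true); // required for F16, see the data-type check above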

◆ operator=() [1/2]

ClWinogradConv2d & operator= ( const ClWinogradConv2d & )
delete

Prevent instances of this class from being copied (As this class contains pointers)

◆ operator=() [2/2]

ClWinogradConv2d& operator= ( ClWinogradConv2d &&  )
default

Default move assignment operator.

◆ prepare()

void prepare ( ITensorPack & constants )
override virtual

Prepare the function for executing.

Any one-off pre-processing step required by the function is handled here.

Parameters
  [in]  constants  Vector that contains the constant tensors.
Note
The prepare stage might not need all of the function's buffers' backing memory to be available in order to execute.

Reimplemented from ICLOperator.

Definition at line 279 of file ClWinogradConv2d.cpp.

References arm_compute::ACL_DST, arm_compute::ACL_SRC, arm_compute::ACL_SRC_1, ITensorPack::add_tensor(), CLScheduler::enqueue_op(), CLScheduler::get(), ITensorPack::get_const_tensor(), ITensorPack::get_tensor(), arm_compute::offset_int_vec(), ClGemm::prepare(), and CLScheduler::queue().

Referenced by ClWinogradConv2d::run().

280 {
281  if(!_is_prepared)
282  {
283  auto weights = utils::cast::polymorphic_downcast<const ICLTensor *>(tensors.get_const_tensor(TensorType::ACL_SRC_1));
284  ICLTensor *in1_aux = utils::cast::polymorphic_downcast<ICLTensor *>(tensors.get_tensor(offset_int_vec(3)));
285 
286  CLAuxTensorHandler input1(_input1, *in1_aux);
287  ITensorPack pack_ft
288  {
289  { TensorType::ACL_SRC, weights },
290  { TensorType::ACL_DST, input1.get() },
291  };
292  // Run filter transform and mark original weights as unused
293  CLScheduler::get().enqueue_op(*_filter_transform, pack_ft, false);
294  weights->mark_as_unused();
295 
296  // Prepare GEMM and release reshaped weights if marked unused by ClGemm
297  ITensorPack mm_prepare_pack = tensors;
298  mm_prepare_pack.add_tensor(ACL_SRC_1, input1.get());
299  _batched_mm.prepare(mm_prepare_pack);
300 
301  CLScheduler::get().queue().finish();
302  _is_prepared = true;
303  }
304 }

◆ run()

void run ( ITensorPack & tensors )
override virtual

Run the kernels contained in the function.

Parameters
  [in]  tensors  Vector that contains the tensors to operate on.

Reimplemented from ICLOperator.

Definition at line 239 of file ClWinogradConv2d.cpp.

References arm_compute::ACL_DST, arm_compute::ACL_SRC, arm_compute::ACL_SRC_0, arm_compute::ACL_SRC_1, arm_compute::ACL_SRC_2, ITensorPack::add_const_tensor(), ITensorPack::add_tensor(), CLScheduler::enqueue_op(), CLScheduler::get(), CLAuxTensorHandler::get(), ITensorPack::get_const_tensor(), ITensorPack::get_tensor(), arm_compute::offset_int_vec(), arm_compute::experimental::Prepare, ClWinogradConv2d::prepare(), ITensorPack::remove_tensor(), and ClGemm::run().

240 {
241  const bool is_gemm_reshaped = _aux_mem[3].lifetime == MemoryLifetime::Prepare;
242 
243  auto src = utils::cast::polymorphic_downcast<const ICLTensor *>(tensors.get_const_tensor(TensorType::ACL_SRC_0));
244  auto biases = utils::cast::polymorphic_downcast<const ICLTensor *>(tensors.get_const_tensor(TensorType::ACL_SRC_2));
245  auto dst = utils::cast::polymorphic_downcast<ICLTensor *>(tensors.get_tensor(TensorType::ACL_DST));
246 
247  CLAuxTensorHandler input0(offset_int_vec(2), _input0, tensors, true);
248  CLAuxTensorHandler input1(offset_int_vec(3), _input1, tensors, true, is_gemm_reshaped);
249  CLAuxTensorHandler batched_mm_output(offset_int_vec(4), _batched_mm_output, tensors, true);
250 
251  prepare(tensors);
252 
253  // Run input transform
254  ITensorPack pack_it
255  {
256  { TensorType::ACL_SRC, src },
257  { TensorType::ACL_DST, input0.get() },
258  };
259  CLScheduler::get().enqueue_op(_border_handler, pack_it, false);
260  CLScheduler::get().enqueue_op(*_input_transform, pack_it, false);
261 
262  // Run batched matrix multiplication
263  ITensorPack pack_mm = tensors;
264  pack_mm.add_const_tensor(TensorType::ACL_SRC_0, input0.get());
265  pack_mm.add_tensor(TensorType::ACL_DST, batched_mm_output.get());
266  is_gemm_reshaped ? pack_mm.remove_tensor(TensorType::ACL_SRC_1) : pack_mm.add_const_tensor(TensorType::ACL_SRC_1, input1.get());
267  _batched_mm.run(pack_mm);
268 
269  // Run output transform
270  ITensorPack pack_ot
271  {
272  { TensorType::ACL_SRC_0, batched_mm_output.get() },
273  { TensorType::ACL_SRC_1, biases },
274  { TensorType::ACL_DST, dst },
275  };
276  CLScheduler::get().enqueue_op(*_output_transform, pack_ot);
277 }
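Relating the listing above back to the caller's side: run() fetches the source at ACL_SRC_0, the original (untransformed) weights at ACL_SRC_1, the biases at ACL_SRC_2 and the destination at ACL_DST. A minimal pack therefore looks like the hedged fragment below; tensor names continue the earlier sketch, and the auxiliary slots reported by workspace() may optionally be added to the same pack.

// Minimal run-time pack matching what the listing above reads out of `tensors`.
ITensorPack run_pack{ { TensorType::ACL_SRC_0, &src },     // input feature map
                      { TensorType::ACL_SRC_1, &weights }, // original (untransformed) weights
                      { TensorType::ACL_SRC_2, &biases },
                      { TensorType::ACL_DST, &dst } };
conv.run(run_pack); // the first call runs prepare() internally (filter transform, GEMM preparation)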

◆ validate()

Status validate ( const ITensorInfo * src,
const ITensorInfo * weights,
const ITensorInfo * biases,
const ITensorInfo * dst,
const PadStrideInfo & conv_info,
const ActivationLayerInfo & act_info = ActivationLayerInfo(),
bool  enable_fast_math = false 
)
static

Static function to check if given info will lead to a valid configuration.

Similar to ClWinogradConv2d::configure()

Returns
a status

Definition at line 232 of file ClWinogradConv2d.cpp.

References ARM_COMPUTE_RETURN_ON_ERROR, and arm_compute::cpu::kernels::validate_arguments().

Referenced by ClConv2d::get_convolution_method(), ClConv2d::validate(), and CLWinogradConvolutionLayer::validate().

234 {
235  ARM_COMPUTE_RETURN_ON_ERROR(validate_arguments(src, weights, biases, dst, conv_info, act_info, enable_fast_math));
236  return Status{};
237 }
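A hedged example of the intended call pattern, in line with the callers listed under "Referenced by": the status is checked before any operator is constructed or OpenCL kernel compiled. The tensor infos are assumed to be set up as in the earlier sketches.

const Status status = opencl::ClWinogradConv2d::validate(&src_info, &wei_info, &bia_info, &dst_info,
                                                         conv_info, ActivationLayerInfo(),
                                                         /* enable_fast_math */ false);
if(status.error_code() != ErrorCode::OK)
{
    // Winograd is not applicable here; a caller such as ClConv2d would pick another
    // convolution method instead. status.error_description() explains the rejection.
}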

◆ workspace()

experimental::MemoryRequirements workspace ( ) const
override virtual

Return the memory requirements required by the workspace.

Reimplemented from ICLOperator.

Definition at line 306 of file ClWinogradConv2d.cpp.

307 {
308  return _aux_mem;
309 }
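
Each entry of the returned MemoryRequirements describes one auxiliary buffer: a slot id (the offset_int_vec(2..4) values set up in configure() above), a lifetime and a size in bytes. The fragment below is a deliberately naive sketch of a caller allocating those buffers and injecting them into the run pack by slot; it reuses the names from the earlier fragments (includes and setup assumed), whereas the runtime layers normally delegate this to a memory/workspace manager.

// Naive workspace handling: one plain CLTensor per requirement, keyed by slot id.
std::vector<std::unique_ptr<CLTensor>> aux;
for(const auto &req : conv.workspace())
{
    if(req.size == 0)
    {
        continue;
    }
    auto t = std::make_unique<CLTensor>();
    t->allocator()->init(TensorInfo(TensorShape(req.size), 1, DataType::U8)); // raw byte buffer
    t->allocator()->allocate();
    run_pack.add_tensor(req.slot, t.get());
    aux.emplace_back(std::move(t));
}
conv.run(run_pack);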

The documentation for this class was generated from the following files:

  • ClWinogradConv2d.h
  • ClWinogradConv2d.cpp