Compute Library 22.02
ClWinogradConv2d Class Reference

Basic function to execute Winograd-based convolution on OpenCL. More...

#include <ClWinogradConv2d.h>


Public Member Functions

 ClWinogradConv2d ()
 Default constructor. More...
 
 ~ClWinogradConv2d ()
 Default destructor. More...
 
 ClWinogradConv2d (const ClWinogradConv2d &)=delete
 Prevent instances of this class from being copied (As this class contains pointers) More...
 
 ClWinogradConv2d (ClWinogradConv2d &&)=default
 Default move constructor. More...
 
ClWinogradConv2d & operator= (const ClWinogradConv2d &)=delete
 Prevent instances of this class from being copied (As this class contains pointers) More...
 
ClWinogradConv2d & operator= (ClWinogradConv2d &&)=default
 Default move assignment operator. More...
 
void configure (const ClCompileContext &compile_context, ITensorInfo *src, ITensorInfo *weights, ITensorInfo *biases, ITensorInfo *dst, const PadStrideInfo &conv_info, const ActivationLayerInfo &act_info=ActivationLayerInfo(), bool enable_fast_math=false)
 Set the input and output tensors. More...
 
void run (ITensorPack &tensors) override
 Run the kernels contained in the function. More...
 
void prepare (ITensorPack &tensors) override
 Prepare the function for executing. More...
 
experimental::MemoryRequirements workspace () const override
 Return the memory requirements required by the workspace. More...
 
- Public Member Functions inherited from ICLOperator
 ICLOperator (IRuntimeContext *ctx=nullptr)
 Constructor. More...
 
 ICLOperator (const ICLOperator &)=delete
 Prevent instances of this class from being copied (As this class contains pointers) More...
 
 ICLOperator (ICLOperator &&)=default
 Default move constructor. More...
 
ICLOperator & operator= (const ICLOperator &)=delete
 Prevent instances of this class from being copied (As this class contains pointers) More...
 
ICLOperator & operator= (ICLOperator &&)=default
 Default move assignment operator. More...
 
- Public Member Functions inherited from IOperator
virtual ~IOperator ()=default
 Destructor. More...
 

Static Public Member Functions

static Status validate (const ITensorInfo *src, const ITensorInfo *weights, const ITensorInfo *biases, const ITensorInfo *dst, const PadStrideInfo &conv_info, const ActivationLayerInfo &act_info=ActivationLayerInfo(), bool enable_fast_math=false)
 Static function to check if given info will lead to a valid configuration. More...
 

Detailed Description

Basic function to execute Winograd-based convolution on OpenCL.

This function calls the following OpenCL functions/kernels:

  1. kernels::ClWinogradInputTransformKernel
  2. kernels::ClWinogradFilterTransformKernel (only once)
  3. ClGemm
  4. kernels::ClWinogradOutputTransformKernel

Definition at line 53 of file ClWinogradConv2d.h.
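
A minimal usage sketch is shown below. It is not taken from the library sources: the shapes, the example function name and the initialisation calls are illustrative assumptions, NCHW FP32 tensors are used (TensorInfo defaults to NCHW layout), and tensor allocation, error handling and the contents of the run-time pack are elided (see run() and workspace() below).

#include "ClWinogradConv2d.h"                     // header as listed above
#include "arm_compute/core/TensorInfo.h"          // assumed include paths
#include "arm_compute/core/ITensorPack.h"
#include "arm_compute/core/CL/CLKernelLibrary.h"
#include "arm_compute/runtime/CL/CLScheduler.h"

using namespace arm_compute;

void winograd_conv2d_sketch()
{
    CLScheduler::get().default_init(); // one-time OpenCL runtime initialisation

    // Illustrative 64x64x32 NCHW input, 3x3 kernel, 16 output feature maps
    TensorInfo src_info(TensorShape(64U, 64U, 32U), 1, DataType::F32);
    TensorInfo wei_info(TensorShape(3U, 3U, 32U, 16U), 1, DataType::F32);
    TensorInfo bia_info(TensorShape(16U), 1, DataType::F32);
    TensorInfo dst_info(TensorShape(64U, 64U, 16U), 1, DataType::F32);
    const PadStrideInfo conv_info(1, 1, 1, 1); // Winograd requires unit strides

    // Reject unsupported configurations before configuring anything
    const Status status = opencl::ClWinogradConv2d::validate(&src_info, &wei_info, &bia_info, &dst_info, conv_info);
    if(status.error_code() != ErrorCode::OK)
    {
        return;
    }

    opencl::ClWinogradConv2d conv;
    conv.configure(CLKernelLibrary::get().get_compile_context(), &src_info, &wei_info, &bia_info, &dst_info, conv_info);

    // At run time the caller supplies the actual tensors (plus the auxiliary
    // workspace tensors reported by workspace()) through an ITensorPack and
    // calls conv.run(pack); see the run() and workspace() documentation below.
}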

Constructor & Destructor Documentation

◆ ClWinogradConv2d() [1/3]

Default constructor.

Definition at line 158 of file ClWinogradConv2d.cpp.

    : _batched_mm(),
      _input_transform(std::make_unique<kernels::ClWinogradInputTransformKernel>()),
      _filter_transform(std::make_unique<kernels::ClWinogradFilterTransformKernel>()),
      _output_transform(std::make_unique<kernels::ClWinogradOutputTransformKernel>()),
      _border_handler(),
      _input0(),
      _input1(),
      _batched_mm_output(),
      _is_prepared(false),
      _aux_mem()
{
}

◆ ~ClWinogradConv2d()

~ClWinogradConv2d ( )
default

Default destructor.

◆ ClWinogradConv2d() [2/3]

ClWinogradConv2d ( const ClWinogradConv2d &  )
delete

Prevent instances of this class from being copied (As this class contains pointers)

◆ ClWinogradConv2d() [3/3]

ClWinogradConv2d ( ClWinogradConv2d &&  )
default

Default move constructor.

Member Function Documentation

◆ configure()

void configure ( const ClCompileContext & compile_context,
ITensorInfo * src,
ITensorInfo * weights,
ITensorInfo * biases,
ITensorInfo * dst,
const PadStrideInfo & conv_info,
const ActivationLayerInfo & act_info = ActivationLayerInfo(),
bool  enable_fast_math = false 
)

Set the input and output tensors.

Valid data layouts:

  • NHWC
  • NCHW

Valid data type configurations:

src0  src1  src2  dst
F16   F16   F16   F16
F32   F32   F32   F32
Note
This function only works with 3x3, 3x1, 1x3, 5x5, 5x1, 1x5, 7x1 and 1x7 kernels, along with unit strides, for both NCHW and NHWC data layouts.
Some Winograd configurations (e.g. F(4x4, 5x5)) are supported only with enable_fast_math = true.
Parameters
[in]   compile_context   The compile context to be used.
[in]   src               Source tensor info. The 3 lower dimensions represent a single input [width, height, IFM], while every optional dimension from 4 and above represents a batch of inputs. Data types supported: F16/F32.
[in]   weights           Weights tensor info. Weights are a 4D tensor with dimensions [kernel_x, kernel_y, IFM, OFM]. Data type supported: Same as src.
[in]   biases            Biases tensor info. Shared biases are supported. Biases are a 1D tensor with dimensions [OFM]. Data type supported: Same as src.
[out]  dst               Destination tensor info. The 3 lower dimensions represent a single output [width, height, OFM], while the rest represent batches of outputs. Data types supported: Same as src.
[in]   conv_info         Contains padding and stride information described in PadStrideInfo.
[in]   act_info          (Optional) Activation layer information in case of a fused activation.
[in]   enable_fast_math  (Optional) Enable fast math computation. If this flag is set, the function may dispatch the fastest implementation available, which can introduce a drop in accuracy. Default is false.
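
As a hedged illustration of the fast-math requirement noted above (shapes and variable names are hypothetical, not from the library sources), validate() can be used to probe whether a configuration needs enable_fast_math; for example, an FP16 case is rejected when the flag is left at its default:

// FP16 Winograd is only accepted with enable_fast_math = true; without it the
// data type is restricted to F32 (see the definition below).
TensorInfo src_info(TensorShape(56U, 56U, 64U), 1, DataType::F16);
TensorInfo wei_info(TensorShape(3U, 3U, 64U, 64U), 1, DataType::F16);
TensorInfo bia_info(TensorShape(64U), 1, DataType::F16);
TensorInfo dst_info(TensorShape(56U, 56U, 64U), 1, DataType::F16);
const PadStrideInfo conv_info(1, 1, 1, 1);

const Status slow = opencl::ClWinogradConv2d::validate(&src_info, &wei_info, &bia_info, &dst_info, conv_info,
                                                       ActivationLayerInfo(), /* enable_fast_math */ false);
const Status fast = opencl::ClWinogradConv2d::validate(&src_info, &wei_info, &bia_info, &dst_info, conv_info,
                                                       ActivationLayerInfo(), /* enable_fast_math */ true);
// slow.error_code() != ErrorCode::OK, while fast is expected to pass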

Definition at line 174 of file ClWinogradConv2d.cpp.

References ARM_COMPUTE_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN, ARM_COMPUTE_ERROR_ON_MSG, ARM_COMPUTE_ERROR_THROW_ON, ARM_COMPUTE_LOG_PARAMS, CLFillBorderKernel::configure(), ClGemm::configure(), arm_compute::CONSTANT, ITensorInfo::data_layout(), ITensorInfo::data_type(), arm_compute::mlgo::parser::end(), arm_compute::F16, arm_compute::F32, arm_compute::get_data_layout_dimension_index(), arm_compute::HEIGHT, arm_compute::test::validation::idx_height, arm_compute::test::validation::idx_width, arm_compute::offset_int_vec(), ITensorInfo::tensor_shape(), TensorInfo::total_size(), arm_compute::WIDTH, and ClGemm::workspace().

{
    ARM_COMPUTE_ERROR_THROW_ON(validate_arguments(src, weights, biases, dst, conv_info, act_info, enable_fast_math));
    ARM_COMPUTE_LOG_PARAMS(src, weights, biases, dst, conv_info, act_info, enable_fast_math);

    // Get indices for the width and height
    const size_t idx_width  = get_data_layout_dimension_index(src->data_layout(), DataLayoutDimension::WIDTH);
    const size_t idx_height = get_data_layout_dimension_index(src->data_layout(), DataLayoutDimension::HEIGHT);

    // Input shape, kernel size and output tile
    const Size2D input_dims  = Size2D(src->tensor_shape()[idx_width], src->tensor_shape()[idx_height]);
    const Size2D kernel_size = Size2D(weights->tensor_shape()[idx_width], weights->tensor_shape()[idx_height]);
    const Size2D output_tile = winograd_output_tile(input_dims, kernel_size, src->data_layout());

    // Check if the Winograd configuration requires fast math
    if(!enable_fast_math)
    {
        ARM_COMPUTE_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN(src, 1, DataType::F32); // disable winograd for fp16 if fast math is false.
        ARM_COMPUTE_ERROR_ON_MSG(check_support_fast_math(output_tile, kernel_size), "This Winograd configuration requires enable_fast_math=true");
    }
    const WinogradInfo winograd_info = WinogradInfo(output_tile,
                                                    kernel_size,
                                                    input_dims,
                                                    conv_info,
                                                    src->data_layout());

    _is_prepared = false;

    // Configure input transform
    _input_transform->configure(compile_context, src, &_input0, winograd_info);
    _border_handler.configure(compile_context, src, _input_transform->border_size(), BorderMode::CONSTANT, PixelValue());

    // Configure filter transform
    _filter_transform->configure(compile_context, weights, &_input1, winograd_info);

    // Configure batched matrix multiply
    _batched_mm.configure(compile_context, &_input0, &_input1, nullptr, &_batched_mm_output, 1.0f, 0.0f,
                          GEMMInfo(false, false, true /* Reshape weights only for the first run*/, 0,
                                   false, false,
                                   GEMMLowpOutputStageInfo(),
                                   (src->data_type() == DataType::F16)));

    // Configure output transform
    _output_transform->configure(compile_context, &_batched_mm_output, biases, dst, winograd_info, act_info);

    _aux_mem = _batched_mm.workspace();
    const MemoryLifetime wino_wei_lifetm = std::any_of(std::begin(_aux_mem), std::end(_aux_mem), [](const auto &r)
    {
        return (r.lifetime == MemoryLifetime::Persistent) && (r.size > 0);
    }) ?
    MemoryLifetime::Prepare :
    MemoryLifetime::Persistent;
    _aux_mem.push_back(MemoryInfo(offset_int_vec(2), MemoryLifetime::Temporary, _input0.total_size()));
    _aux_mem.push_back(MemoryInfo(offset_int_vec(3), wino_wei_lifetm, _input1.total_size()));
    _aux_mem.push_back(MemoryInfo(offset_int_vec(4), MemoryLifetime::Temporary, _batched_mm_output.total_size()));
}

◆ operator=() [1/2]

ClWinogradConv2d & operator= ( const ClWinogradConv2d &  )
delete

Prevent instances of this class from being copied (As this class contains pointers)

◆ operator=() [2/2]

ClWinogradConv2d& operator= ( ClWinogradConv2d &&  )
default

Default move assignment operator.

◆ prepare()

void prepare ( ITensorPack & constants )
override virtual

Prepare the function for executing.

Any one-off pre-processing step required by the function is handled here.

Parameters
[in]  constants  Vector that contains the constant tensors.
Note
The prepare stage might not need all of the function's buffers' backing memory to be available in order to execute.

Reimplemented from ICLOperator.

Definition at line 278 of file ClWinogradConv2d.cpp.

References arm_compute::ACL_DST, arm_compute::ACL_SRC, arm_compute::ACL_SRC_1, ITensorPack::add_tensor(), CLScheduler::enqueue_op(), CLScheduler::get(), ITensorPack::get_const_tensor(), ITensorPack::get_tensor(), arm_compute::offset_int_vec(), ClGemm::prepare(), and CLScheduler::queue().

Referenced by ClWinogradConv2d::run().

{
    if(!_is_prepared)
    {
        auto       weights = utils::cast::polymorphic_downcast<const ICLTensor *>(tensors.get_const_tensor(TensorType::ACL_SRC_1));
        ICLTensor *in1_aux = utils::cast::polymorphic_downcast<ICLTensor *>(tensors.get_tensor(offset_int_vec(3)));

        CLAuxTensorHandler input1(_input1, *in1_aux);
        ITensorPack        pack_ft
        {
            { TensorType::ACL_SRC, weights },
            { TensorType::ACL_DST, input1.get() },
        };
        // Run filter transform and mark original weights as unused
        CLScheduler::get().enqueue_op(*_filter_transform, pack_ft, false);
        weights->mark_as_unused();

        // Prepare GEMM and release reshaped weights if marked unused by ClGemm
        ITensorPack mm_prepare_pack = tensors;
        mm_prepare_pack.add_tensor(ACL_SRC_1, input1.get());
        _batched_mm.prepare(mm_prepare_pack);

        CLScheduler::get().queue().finish();
        _is_prepared = true;
    }
}
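
As a sketch reusing the hypothetical conv and pack objects from the earlier examples, prepare() can also be called explicitly (for instance at model-load time) so that the one-off filter transform and ClGemm weight reshaping are not paid on the first inference; otherwise run() triggers it automatically on first use:

// The pack must already contain the weights (ACL_SRC_1) and the auxiliary
// workspace tensors, since the transformed weights are written into one of them.
conv.prepare(pack); // one-off: filter transform + ClGemm::prepare()
conv.run(pack);     // the prepare stage is skipped on this and subsequent runs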

◆ run()

void run ( ITensorPack & tensors )
override virtual

Run the kernels contained in the function.

Parameters
[in]  tensors  Vector that contains the tensors to operate on.

Reimplemented from ICLOperator.

Definition at line 238 of file ClWinogradConv2d.cpp.

References arm_compute::ACL_DST, arm_compute::ACL_SRC, arm_compute::ACL_SRC_0, arm_compute::ACL_SRC_1, arm_compute::ACL_SRC_2, ITensorPack::add_const_tensor(), ITensorPack::add_tensor(), CLScheduler::enqueue_op(), CLScheduler::get(), CLAuxTensorHandler::get(), ITensorPack::get_const_tensor(), ITensorPack::get_tensor(), arm_compute::offset_int_vec(), ClWinogradConv2d::prepare(), ITensorPack::remove_tensor(), and ClGemm::run().

{
    const bool is_gemm_reshaped = _aux_mem[3].lifetime == MemoryLifetime::Prepare;

    auto src    = utils::cast::polymorphic_downcast<const ICLTensor *>(tensors.get_const_tensor(TensorType::ACL_SRC_0));
    auto biases = utils::cast::polymorphic_downcast<const ICLTensor *>(tensors.get_const_tensor(TensorType::ACL_SRC_2));
    auto dst    = utils::cast::polymorphic_downcast<ICLTensor *>(tensors.get_tensor(TensorType::ACL_DST));

    CLAuxTensorHandler input0(offset_int_vec(2), _input0, tensors, true);
    CLAuxTensorHandler input1(offset_int_vec(3), _input1, tensors, true, is_gemm_reshaped);
    CLAuxTensorHandler batched_mm_output(offset_int_vec(4), _batched_mm_output, tensors, true);

    prepare(tensors);

    // Run input transform
    ITensorPack pack_it
    {
        { TensorType::ACL_SRC, src },
        { TensorType::ACL_DST, input0.get() },
    };
    CLScheduler::get().enqueue_op(_border_handler, pack_it, false);
    CLScheduler::get().enqueue_op(*_input_transform, pack_it, false);

    // Run batched matrix multiplication
    ITensorPack pack_mm = tensors;
    pack_mm.add_const_tensor(TensorType::ACL_SRC_0, input0.get());
    pack_mm.add_tensor(TensorType::ACL_DST, batched_mm_output.get());
    is_gemm_reshaped ? pack_mm.remove_tensor(TensorType::ACL_SRC_1) : pack_mm.add_const_tensor(TensorType::ACL_SRC_1, input1.get());
    _batched_mm.run(pack_mm);

    // Run output transform
    ITensorPack pack_ot
    {
        { TensorType::ACL_SRC_0, batched_mm_output.get() },
        { TensorType::ACL_SRC_1, biases },
        { TensorType::ACL_DST, dst },
    };
    CLScheduler::get().enqueue_op(*_output_transform, pack_ot);
}
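
For illustration, assuming CLTensor objects src, weights, biases and dst that were allocated to match the ITensorInfo objects passed to configure() (hypothetical names, not from the library sources), the pack consumed by run() could be assembled as follows; the auxiliary slots are covered under workspace() below:

ITensorPack pack;
pack.add_const_tensor(TensorType::ACL_SRC_0, &src);     // source activations
pack.add_const_tensor(TensorType::ACL_SRC_1, &weights); // original weights, consumed by prepare()
pack.add_const_tensor(TensorType::ACL_SRC_2, &biases);  // biases
pack.add_tensor(TensorType::ACL_DST, &dst);             // destination
// ...plus one entry per auxiliary tensor reported by workspace(), keyed by its slot id
conv.run(pack);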

◆ validate()

Status validate ( const ITensorInfo * src,
const ITensorInfo * weights,
const ITensorInfo * biases,
const ITensorInfo * dst,
const PadStrideInfo & conv_info,
const ActivationLayerInfo & act_info = ActivationLayerInfo(),
bool  enable_fast_math = false 
)
static

Static function to check if given info will lead to a valid configuration.

Similar to ClWinogradConv2d::configure()

Returns
a status

Definition at line 231 of file ClWinogradConv2d.cpp.

References ARM_COMPUTE_RETURN_ON_ERROR.

Referenced by ClConv2d::get_convolution_method(), ClConv2d::validate(), and CLWinogradConvolutionLayer::validate().

{
    ARM_COMPUTE_RETURN_ON_ERROR(validate_arguments(src, weights, biases, dst, conv_info, act_info, enable_fast_math));
    return Status{};
}

◆ workspace()

experimental::MemoryRequirements workspace ( ) const
override virtual

Return the memory requirements required by the workspace.

Reimplemented from ICLOperator.

Definition at line 305 of file ClWinogradConv2d.cpp.

{
    return _aux_mem;
}
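
A hedged sketch of how a caller might service these requirements follows, reusing the hypothetical conv and pack objects from the earlier examples and the slot, lifetime and size members of experimental::MemoryInfo:

std::vector<std::unique_ptr<CLTensor>> aux_tensors; // kept alive for the lifetime of the operator

for(const experimental::MemoryInfo &req : conv.workspace())
{
    if(req.size == 0)
    {
        continue;
    }
    auto aux = std::make_unique<CLTensor>();
    aux->allocator()->init(TensorInfo(TensorShape(req.size), 1, DataType::U8)); // raw byte buffer
    aux->allocator()->allocate();
    pack.add_tensor(req.slot, aux.get()); // slot ids match the offset_int_vec(...) ids used internally
    aux_tensors.emplace_back(std::move(aux));
}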

The documentation for this class was generated from the following files:

  • ClWinogradConv2d.h
  • ClWinogradConv2d.cpp