Compute Library
 21.11
CLGEMMDeconvolutionLayer Class Reference

Function to run the deconvolution layer through a call to GEMM. More...

#include <CLGEMMDeconvolutionLayer.h>

Collaboration diagram for CLGEMMDeconvolutionLayer:

Public Member Functions

 CLGEMMDeconvolutionLayer (std::shared_ptr< IMemoryManager > memory_manager=nullptr)
 Constructor. More...
 
 CLGEMMDeconvolutionLayer (const CLGEMMDeconvolutionLayer &)=delete
 Prevent instances of this class from being copied (As this class contains pointers) More...
 
 CLGEMMDeconvolutionLayer (CLGEMMDeconvolutionLayer &&)=default
 Default move constructor. More...
 
CLGEMMDeconvolutionLayer & operator= (const CLGEMMDeconvolutionLayer &)=delete
 Prevent instances of this class from being copied (As this class contains pointers) More...
 
CLGEMMDeconvolutionLayer & operator= (CLGEMMDeconvolutionLayer &&)=default
 Default move assignment operator. More...
 
 ~CLGEMMDeconvolutionLayer ()
 Default destructor. More...
 
void configure (const ICLTensor *input, const ICLTensor *weights, const ICLTensor *bias, ICLTensor *output, const PadStrideInfo &deconv_info)
 Set the input, weights, biases and output tensors. More...
 
void configure (const CLCompileContext &compile_context, const ICLTensor *input, const ICLTensor *weights, const ICLTensor *bias, ICLTensor *output, const PadStrideInfo &deconv_info)
 Set the input, weights, biases and output tensors. More...
 
void run () override
 Run the kernels contained in the function. More...
 
void prepare () override
 Prepare the function for executing. More...
 
- Public Member Functions inherited from IFunction
virtual ~IFunction ()=default
 Destructor. More...
 

Static Public Member Functions

static Status validate (const ITensorInfo *input, const ITensorInfo *weights, const ITensorInfo *bias, const ITensorInfo *output, const PadStrideInfo &deconv_info)
 Static function to check if given info will lead to a valid configuration of CLGEMMDeconvolutionLayer. More...
 

Detailed Description

Function to run the deconvolution layer through a call to GEMM.

Deconvolution Layer is the backward pass of Convolution Layer. First we transform the input depending on the stride and pad info and then perform a 1x1 convolution pass. The input stride defines how many zeroes to insert between each element of the input, pad is the amount of padding, and a is a user-specified value with a < stride - 1 that increases the padding at the top and right of the input image.

The relation between input to output is as follows:

\[ width\_output = (width\_input - 1) \cdot stride\_x - 2 \cdot padding\_x + kernel\_x \]

\[ height\_output = (height\_input - 1) \cdot stride\_y - 2 \cdot padding\_y + kernel\_y \]

where: width_input is the size of the first input dimension, height_input is the size of the second input dimension, width_output is the size of the first output dimension, height_output is the size of the second output dimension, kernel_x and kernel_y are the convolution kernel sizes in x and y, and stride_x and stride_y are the input strides of the first and second dimensions.
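
For example, an input of width 4 with stride_x = 2, padding_x = 0 and kernel_x = 2 (the stride-equals-kernel-size case this function supports) gives:

\[ width\_output = (4 - 1) \cdot 2 - 2 \cdot 0 + 2 = 8 \]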

The weights used by Deconvolution are supposed to be the same as the ones used for Convolution.

This function calls the following OpenCL kernels/functions:

  1. CLGEMMLowpMatrixMultiplyCore
  2. CLGEMMLowpOutputStage
  3. CLPermute
  4. CLPermute
  5. CLReshapeLayer
  6. CLTranspose
  7. CLDeconvolutionReshapeOutputKernel
  8. CLSlice

Definition at line 81 of file CLGEMMDeconvolutionLayer.h.
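
As an illustration of the API, the following is a minimal sketch of configuring and running this function on FP32 NHWC tensors. The shapes and PadStrideInfo values are example assumptions; note the constraint documented on configure() that stride_x equals weights.width, stride_y equals weights.height, and no padding is used.

    #include "arm_compute/runtime/CL/CLScheduler.h"
    #include "arm_compute/runtime/CL/CLTensor.h"
    #include "arm_compute/runtime/CL/functions/CLGEMMDeconvolutionLayer.h"

    using namespace arm_compute;

    void example()
    {
        CLScheduler::get().default_init(); // create the default OpenCL context and queue

        // Example shapes (assumptions): NHWC input 8x8 with 16 channels, batch 1;
        // 2x2 kernel with 16 IFM and 32 OFM; output 16x16 with 32 channels.
        // NHWC TensorShape ordering is [channels, width, height, batches].
        TensorInfo in_info(TensorShape(16U, 8U, 8U, 1U), 1, DataType::F32);
        TensorInfo w_info(TensorShape(16U, 2U, 2U, 32U), 1, DataType::F32);
        TensorInfo b_info(TensorShape(32U), 1, DataType::F32);
        TensorInfo out_info(TensorShape(32U, 16U, 16U, 1U), 1, DataType::F32);
        in_info.set_data_layout(DataLayout::NHWC);
        w_info.set_data_layout(DataLayout::NHWC);
        b_info.set_data_layout(DataLayout::NHWC);
        out_info.set_data_layout(DataLayout::NHWC);

        CLTensor input, weights, bias, output;
        input.allocator()->init(in_info);
        weights.allocator()->init(w_info);
        bias.allocator()->init(b_info);
        output.allocator()->init(out_info);

        // Stride (2, 2) matches the 2x2 kernel; no padding, as required.
        CLGEMMDeconvolutionLayer deconv;
        deconv.configure(&input, &weights, &bias, &output, PadStrideInfo(2, 2, 0, 0));

        input.allocator()->allocate();
        weights.allocator()->allocate();
        bias.allocator()->allocate();
        output.allocator()->allocate();

        // ... fill input, weights and bias with data ...

        deconv.run();              // enqueues the kernels; does not block
        CLScheduler::get().sync(); // wait for the results
    }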

Constructor & Destructor Documentation

◆ CLGEMMDeconvolutionLayer() [1/3]

CLGEMMDeconvolutionLayer ( std::shared_ptr< IMemoryManager > memory_manager = nullptr )

Constructor.

Definition at line 96 of file CLGEMMDeconvolutionLayer.cpp.

    : _memory_group(std::move(memory_manager)),
      _mm_gemm(),
      _mm_gemmlowp(),
      _gemmlowp_output_stage(),
      _permute_input_to_nhwc(),
      _permute_weights_to_nhwc(),
      _reshape_weights(),
      _transpose_weights(),
      _deconv_reshape(std::make_unique<CLDeconvolutionReshapeOutputKernel>()),
      _slice_gemm(),
      _gemmlowp_final(),
      _reshaped_weights(),
      _reshaped_weights_t(),
      _permuted_input(),
      _permuted_weights(),
      _gemm_output(),
      _slice_gemm_input(),
      _original_weights(),
      _is_prepared(false),
      _padded_input(false),
      _is_nchw(false),
      _is_quantized(false)
{
}
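
Sharing a memory manager across functions lets the runtime reuse backing memory for intermediate tensors. A minimal sketch, assuming the stock runtime components (BlobLifetimeManager, PoolManager, MemoryManagerOnDemand):

    #include "arm_compute/runtime/BlobLifetimeManager.h"
    #include "arm_compute/runtime/MemoryManagerOnDemand.h"
    #include "arm_compute/runtime/PoolManager.h"

    using namespace arm_compute;

    auto lifetime_mgr = std::make_shared<BlobLifetimeManager>();
    auto pool_mgr     = std::make_shared<PoolManager>();
    auto memory_mgr   = std::make_shared<MemoryManagerOnDemand>(lifetime_mgr, pool_mgr);

    // Both functions draw their intermediate buffers from the same manager.
    CLGEMMDeconvolutionLayer deconv1(memory_mgr);
    CLGEMMDeconvolutionLayer deconv2(memory_mgr);
    // After configuring all functions, the manager's pools are populated
    // (e.g. memory_mgr->populate(allocator, 1)) before the first run.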

◆ CLGEMMDeconvolutionLayer() [2/3]

Prevent instances of this class from being copied (As this class contains pointers)

◆ CLGEMMDeconvolutionLayer() [3/3]

Default move constructor.

◆ ~CLGEMMDeconvolutionLayer()

Default destructor.

Member Function Documentation

◆ configure() [1/2]

void configure ( const ICLTensor *input,
const ICLTensor *weights,
const ICLTensor *bias,
ICLTensor *output,
const PadStrideInfo &deconv_info
)

Set the input, weights, biases and output tensors.

Valid data layouts:

  • NHWC

Valid data type configurations:

src0            src1            src2  dst
F16             F16             F16   F16
F32             F32             F32   F32
QASYMM8         QASYMM8         S32   QASYMM8
QASYMM8_SIGNED  QASYMM8_SIGNED  S32   QASYMM8_SIGNED
Parameters
[in,out] input       Input tensor. 3 lower dimensions represent a single input, and an optional 4th dimension for batch of inputs. Data types supported: QASYMM8/QASYMM8_SIGNED/F16/F32. Data layout supported: NHWC
[in]     weights     The 4d weights with dimensions [width, height, IFM, OFM]. Data type supported: Same as input. Data layout supported: same as input.
[in]     bias        (Optional) The biases have one dimension. Data type supported: Same as input. Data layout supported: same as input.
[out]    output      Output tensor. The output has the same number of dimensions as the input. Data layout supported: same as input.
[in]     deconv_info Contains padding and policies to be used in the deconvolution; this is described in PadStrideInfo. This function supports only stride_x = weights.width && stride_y = weights.height. Moreover, padding is not supported.

Definition at line 219 of file CLGEMMDeconvolutionLayer.cpp.

References CLKernelLibrary::get().

{
    configure(CLKernelLibrary::get().get_compile_context(), input, weights, bias, output, deconv_info);
}

◆ configure() [2/2]

void configure ( const CLCompileContext &compile_context,
const ICLTensor *input,
const ICLTensor *weights,
const ICLTensor *bias,
ICLTensor *output,
const PadStrideInfo &deconv_info
)

Set the input, weights, biases and output tensors.

Parameters
[in]     compile_context The compile context to be used.
[in,out] input           Input tensor. 3 lower dimensions represent a single input, and an optional 4th dimension for batch of inputs. Data types supported: QASYMM8/QASYMM8_SIGNED/F16/F32. Data layout supported: NHWC
[in]     weights         The 4d weights with dimensions [width, height, IFM, OFM]. Data type supported: Same as input. Data layout supported: same as input.
[in]     bias            (Optional) The biases have one dimension. Data type supported: Same as input. Data layout supported: same as input.
[out]    output          Output tensor. The output has the same number of dimensions as the input. Data layout supported: same as input.
[in]     deconv_info     Contains padding and policies to be used in the deconvolution; this is described in PadStrideInfo. This function supports only stride_x = weights.width && stride_y = weights.height. Moreover, padding is not supported.

Definition at line 224 of file CLGEMMDeconvolutionLayer.cpp.

References CLTensorAllocator::allocate(), CLTensor::allocator(), ARM_COMPUTE_ERROR_ON_NULLPTR, ARM_COMPUTE_ERROR_THROW_ON, ARM_COMPUTE_LOG_PARAMS, CLReshapeLayer::configure(), CLTranspose::configure(), CLPermute::configure(), CLSlice::configure(), CLGEMMLowpOutputStage::configure(), CLGEMM::configure(), CLGEMMLowpMatrixMultiplyCore::configure(), ITensorInfo::data_layout(), ITensorInfo::data_type(), ITensorInfo::dimension(), arm_compute::test::validation::gemm_info, arm_compute::get_data_layout_dimension_index(), arm_compute::HEIGHT, ITensor::info(), CLTensor::info(), ITensorAllocator::init(), arm_compute::test::validation::input, arm_compute::is_data_type_quantized_asymmetric(), MemoryGroup::manage(), arm_compute::NCHW, UniformQuantizationInfo::offset, PadStrideInfo::pad_bottom(), PadStrideInfo::pad_left(), PadStrideInfo::pad_right(), PadStrideInfo::pad_top(), ITensorInfo::quantization_info(), UniformQuantizationInfo::scale, TensorInfo::set_quantization_info(), arm_compute::U, QuantizationInfo::uniform(), and CLGEMMDeconvolutionLayer::validate().

{
    ARM_COMPUTE_ERROR_ON_NULLPTR(input, weights, output);
    ARM_COMPUTE_ERROR_THROW_ON(CLGEMMDeconvolutionLayer::validate(input->info(),
                                                                  weights->info(),
                                                                  bias != nullptr ? bias->info() : nullptr,
                                                                  output->info(),
                                                                  deconv_info));
    ARM_COMPUTE_LOG_PARAMS(input, weights, bias, output, deconv_info);

    _original_weights = weights;
    _padded_input     = deconv_info.pad_bottom() > 0 || deconv_info.pad_left() > 0 || deconv_info.pad_right() > 0 || deconv_info.pad_top() > 0;
    _is_nchw          = input->info()->data_layout() == DataLayout::NCHW;
    _is_quantized     = is_data_type_quantized_asymmetric(input->info()->data_type());

    const ICLTensor *input_to_use   = input;
    const ICLTensor *weights_to_use = weights;

    // If the data layout is NCHW, transform everything in NHWC. Another alternative could be to
    // do an outer product in NCHW and then an accumulation through a reduction. This would have two
    // drawbacks: first, the outer product is less efficient than a full GEMM. Second, the reduction
    // might be slower than GEMM.
    if(_is_nchw)
    {
        _memory_group.manage(&_permuted_input);
        _permute_input_to_nhwc.configure(compile_context, input, &_permuted_input, PermutationVector(2U, 0U, 1U));

        _permute_weights_to_nhwc.configure(compile_context, weights, &_permuted_weights, PermutationVector(2U, 0U, 1U));

        input_to_use   = &_permuted_input;
        weights_to_use = &_permuted_weights;
    }

    // Reshape the input weights. The weights will be reshaped only once during the call to prepare()
    _reshaped_weights.allocator()->init(TensorInfo(TensorShape(weights_to_use->info()->dimension(0),
                                                               weights_to_use->info()->dimension(1) * weights_to_use->info()->dimension(2) * weights_to_use->info()->dimension(3)),
                                                   1,
                                                   input->info()->data_type(), weights->info()->quantization_info()));

    _reshape_weights.configure(compile_context, weights_to_use, &_reshaped_weights);
    _transpose_weights.configure(compile_context, &_reshaped_weights, &_reshaped_weights_t);

    const size_t idx_h = get_data_layout_dimension_index(input->info()->data_layout(), DataLayoutDimension::HEIGHT);
    GEMMInfo     gemm_info(false, false, true, input->info()->dimension(idx_h), true);

    // Configure output stage for asymmetric quantized types
    if(_is_quantized)
    {
        // gemmlowp adds the offsets (instead of subtracting them). Thus, we need to negate the original
        // and restore them back to make it work properly.
        QuantizationInfo iq_info = input->info()->quantization_info();
        QuantizationInfo wq_info = weights->info()->quantization_info();

        input_to_use->info()->set_quantization_info(QuantizationInfo(iq_info.uniform().scale, -iq_info.uniform().offset));
        _reshaped_weights_t.info()->set_quantization_info(QuantizationInfo(wq_info.uniform().scale, -wq_info.uniform().offset));

        _mm_gemmlowp.configure(compile_context, input_to_use, &_reshaped_weights_t, nullptr, &_gemm_output, gemm_info);

        input_to_use->info()->set_quantization_info(iq_info);
        _reshaped_weights_t.info()->set_quantization_info(wq_info);
    }
    else
    {
        _mm_gemm.configure(compile_context, input_to_use, &_reshaped_weights_t, nullptr, &_gemm_output, 1.f, 0.0f, gemm_info);
    }

    if(_is_nchw)
    {
        _permuted_input.allocator()->allocate();
    }

    ICLTensor *deconv_reshape_output = nullptr;
    ICLTensor *slice_output          = nullptr;
    ICLTensor *output_stage_output   = nullptr;

    if(_padded_input && _is_quantized)
    {
        _memory_group.manage(&_slice_gemm_input);
        _memory_group.manage(&_gemmlowp_final);
        deconv_reshape_output = &_gemmlowp_final;
        output_stage_output   = &_slice_gemm_input;
        slice_output          = output;
    }
    else if(_padded_input)
    {
        _memory_group.manage(&_slice_gemm_input);
        deconv_reshape_output = &_slice_gemm_input;
        slice_output          = output;
    }
    else if(_is_quantized)
    {
        _memory_group.manage(&_gemmlowp_final);
        deconv_reshape_output = &_gemmlowp_final;
        output_stage_output   = output;
    }
    else
    {
        deconv_reshape_output = output;
    }

    // Configure a Col2Im call to reshape the output of GEMM
    _deconv_reshape->configure(compile_context, &_gemm_output, bias, deconv_reshape_output, input->info(), weights->info(), deconv_info);
    _gemm_output.allocator()->allocate();

    if(_is_quantized)
    {
        GEMMLowpOutputStageInfo output_stage_info;
        construct_gemmlowp_output_stage(input->info(), weights->info(), output->info(), output_stage_info);
        _gemmlowp_output_stage.configure(compile_context, &_gemmlowp_final, nullptr, output_stage_output, output_stage_info);
        _gemmlowp_final.allocator()->allocate();
    }

    // If the input was padded, the output needs to be sliced.
    if(_padded_input)
    {
        const auto start_end = compute_start_end_slice_coordinates(*deconv_reshape_output->info(), deconv_info, _is_nchw);
        _slice_gemm.configure(compile_context, &_slice_gemm_input, slice_output, start_end.first, start_end.second);
        _slice_gemm_input.allocator()->allocate();
    }
}
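
The compile_context overload allows kernel compilation against a caller-chosen OpenCL context. A minimal sketch that simply reuses the library's default compile context, which is equivalent to the first overload (tensors as configured in the earlier example):

    CLGEMMDeconvolutionLayer deconv;
    deconv.configure(CLKernelLibrary::get().get_compile_context(),
                     &input, &weights, &bias, &output,
                     PadStrideInfo(2, 2, 0, 0));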

◆ operator=() [1/2]

CLGEMMDeconvolutionLayer & operator= ( const CLGEMMDeconvolutionLayer & )
delete

Prevent instances of this class from being copied (As this class contains pointers)

◆ operator=() [2/2]

Default move assignment operator.

◆ prepare()

void prepare ( )
override virtual

Prepare the function for executing.

Any one-off pre-processing step required by the function is handled here.

Note
Prepare stage might not need all the function's buffers' backing memory to be available in order to execute

Reimplemented from IFunction.

Definition at line 379 of file CLGEMMDeconvolutionLayer.cpp.

References CLTensorAllocator::allocate(), CLTensor::allocator(), ARM_COMPUTE_ERROR_ON, CLTensorAllocator::free(), ITensor::is_used(), ITensor::mark_as_unused(), CLGEMM::prepare(), CLGEMMLowpMatrixMultiplyCore::prepare(), CLReshapeLayer::run(), CLTranspose::run(), and CLPermute::run().

Referenced by CLGEMMDeconvolutionLayer::run().

{
    if(!_is_prepared)
    {
        ARM_COMPUTE_ERROR_ON(!_original_weights->is_used());

        if(_is_nchw)
        {
            _permuted_weights.allocator()->allocate();
            _permute_weights_to_nhwc.run();
        }

        _reshaped_weights.allocator()->allocate();
        _reshape_weights.run();

        if(_is_nchw)
        {
            _permuted_weights.allocator()->free();
        }

        _reshaped_weights_t.allocator()->allocate();
        _transpose_weights.run();

        // Prepare gemm
        if(!_is_quantized)
        {
            _mm_gemm.prepare();
        }
        else
        {
            _mm_gemmlowp.prepare();
        }

        // Free resources
        if(!_reshaped_weights_t.is_used())
        {
            _reshaped_weights_t.allocator()->free();
        }

        _original_weights->mark_as_unused();
        _is_prepared = true;
    }
}
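
run() invokes prepare() on first use, so calling it explicitly is optional; doing so moves the one-off weight permute/reshape/transpose work out of the first inference. A minimal sketch, continuing the earlier example:

    deconv.configure(&input, &weights, &bias, &output, deconv_info);
    weights.allocator()->allocate();
    // ... upload the weight (and bias) values ...
    deconv.prepare(); // one-off weight transformations run here
    deconv.run();     // first run no longer pays the preparation cost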

◆ run()

void run ( )
override virtual

Run the kernels contained in the function.

For CPU kernels:

  • Multi-threading is used for the kernels which are parallelisable.
  • By default std::thread::hardware_concurrency() threads are used.
Note
CPPScheduler::set_num_threads() can be used to manually set the number of threads

For OpenCL kernels:

  • All the kernels are enqueued on the queue associated with CLScheduler.
  • The queue is then flushed.
Note
The function returns without waiting for the kernels to finish executing; it is the user's responsibility to synchronise.
Will call prepare() on first run if it has not been done.

Implements IFunction.

Definition at line 346 of file CLGEMMDeconvolutionLayer.cpp.

References CLScheduler::enqueue(), CLScheduler::get(), CLGEMMDeconvolutionLayer::prepare(), CLPermute::run(), CLSlice::run(), CLGEMM::run(), CLGEMMLowpOutputStage::run(), and CLGEMMLowpMatrixMultiplyCore::run().

{
    prepare();

    MemoryGroupResourceScope scope_mg(_memory_group);

    if(_is_nchw)
    {
        _permute_input_to_nhwc.run();
    }

    if(_is_quantized)
    {
        _mm_gemmlowp.run();
    }
    else
    {
        _mm_gemm.run();
    }

    CLScheduler::get().enqueue(*_deconv_reshape, false);

    if(_is_quantized)
    {
        _gemmlowp_output_stage.run();
    }

    if(_padded_input)
    {
        _slice_gemm.run();
    }
}
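
Since run() only enqueues and flushes the kernels, the host must synchronise before reading results; a minimal sketch:

    deconv.run();               // returns without waiting for the GPU
    CLScheduler::get().sync();  // block until all enqueued kernels complete
    output.map(true);           // map the buffer for host access
    // ... read the output ...
    output.unmap();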

◆ validate()

Status validate ( const ITensorInfo *input,
const ITensorInfo *weights,
const ITensorInfo *bias,
const ITensorInfo *output,
const PadStrideInfo &deconv_info
)
static

Static function to check if given info will lead to a valid configuration of CLGEMMDeconvolutionLayer.

Parameters
[in] input       Input tensor info. 3 lower dimensions represent a single input, and an optional 4th dimension for batch of inputs. Data types supported: QASYMM8/QASYMM8_SIGNED/F16/F32. Data layout supported: NHWC
[in] weights     The 4d weights info with dimensions [width, height, IFM, OFM]. Data type supported: Same as input. Data layout supported: same as input.
[in] bias        (Optional) The biases have one dimension. Data type supported: Same as input. Data layout supported: same as input.
[in] output      Output tensor info. The output has the same number of dimensions as the input. Data layout supported: same as input.
[in] deconv_info Contains padding and policies to be used in the deconvolution; this is described in PadStrideInfo.
Returns
a status

Definition at line 124 of file CLGEMMDeconvolutionLayer.cpp.

References ARM_COMPUTE_RETURN_ERROR_ON, ARM_COMPUTE_RETURN_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN, ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_LAYOUT, ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES, ARM_COMPUTE_RETURN_ERROR_ON_NULLPTR, ARM_COMPUTE_RETURN_ON_ERROR, arm_compute::BATCHES, ICloneable< T >::clone(), TensorInfo::clone(), arm_compute::misc::shape_calculator::compute_deconvolution_output_shape(), ITensorInfo::data_layout(), arm_compute::test::validation::data_layout, ITensorInfo::data_type(), arm_compute::deconvolution_output_dimensions(), ITensorInfo::dimension(), arm_compute::F16, arm_compute::F32, arm_compute::test::validation::gemm_info, arm_compute::get_data_layout_dimension_index(), arm_compute::HEIGHT, arm_compute::is_data_type_quantized_asymmetric(), arm_compute::NCHW, PadStrideInfo::pad_bottom(), PadStrideInfo::pad_left(), PadStrideInfo::pad_right(), PadStrideInfo::pad_top(), arm_compute::permute(), arm_compute::QASYMM8, arm_compute::QASYMM8_SIGNED, arm_compute::S32, TensorInfo::set_data_type(), PadStrideInfo::stride(), ITensorInfo::tensor_shape(), CLTranspose::validate(), CLReshapeLayer::validate(), CLPermute::validate(), CLDeconvolutionReshapeOutputKernel::validate(), CLSlice::validate(), CLGEMM::validate(), CLGEMMLowpOutputStage::validate(), CLGEMMLowpMatrixMultiplyCore::validate(), and arm_compute::WIDTH.

Referenced by CLGEMMDeconvolutionLayer::configure(), and CLDeconvolutionLayer::validate().

{
    ARM_COMPUTE_RETURN_ERROR_ON_NULLPTR(input, weights, output);
    ARM_COMPUTE_RETURN_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN(input, 1, DataType::QASYMM8, DataType::QASYMM8_SIGNED, DataType::F16, DataType::F32);
    ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_LAYOUT(input, weights);
    ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES(input, weights);

    DataLayout data_layout  = input->data_layout();
    const bool padded_input = deconv_info.pad_bottom() > 0 || deconv_info.pad_left() > 0 || deconv_info.pad_right() > 0 || deconv_info.pad_top() > 0;
    const bool is_nchw      = input->data_layout() == DataLayout::NCHW;
    const bool is_quantized = is_data_type_quantized_asymmetric(input->data_type());

    const size_t idx_w = get_data_layout_dimension_index(data_layout, DataLayoutDimension::WIDTH);
    const size_t idx_h = get_data_layout_dimension_index(data_layout, DataLayoutDimension::HEIGHT);
    const size_t idx_b = get_data_layout_dimension_index(data_layout, DataLayoutDimension::BATCHES);

    ARM_COMPUTE_RETURN_ERROR_ON(weights->dimension(idx_w) != deconv_info.stride().first);
    ARM_COMPUTE_RETURN_ERROR_ON(weights->dimension(idx_h) != deconv_info.stride().second);

    TensorShape nhwc_weights_shape = weights->tensor_shape();
    TensorShape nhwc_input_shape   = input->tensor_shape();

    if(is_nchw)
    {
        permute(nhwc_weights_shape, PermutationVector(2, 0, 1));
        permute(nhwc_input_shape, PermutationVector(2, 0, 1));

        TensorInfo nhwc_input_info = input->clone()->set_is_resizable(true).reset_padding().set_tensor_shape(nhwc_input_shape).set_data_layout(DataLayout::NCHW);

        TensorInfo nhwc_weights_info = weights->clone()->set_is_resizable(true).reset_padding().set_tensor_shape(nhwc_weights_shape).set_data_layout(DataLayout::NCHW);

        CLPermute::validate(weights, &nhwc_weights_info, PermutationVector(2, 0, 1));
        CLPermute::validate(input, &nhwc_input_info, PermutationVector(2, 0, 1));
    }

    const TensorShape reshaped_shape = TensorShape(nhwc_weights_shape[0], nhwc_weights_shape[1] * nhwc_weights_shape[2] * nhwc_weights_shape[3]);
    const TensorInfo  reshaped_info  = weights->clone()->set_tensor_shape(reshaped_shape).set_data_layout(DataLayout::NCHW).set_is_resizable(true);
    ARM_COMPUTE_RETURN_ON_ERROR(CLReshapeLayer::validate(weights, &reshaped_info));

    TensorShape      transposed_shape(reshaped_shape[1], reshaped_shape[0]);
    const TensorInfo reshaped_t_info = reshaped_info.clone()->set_is_resizable(true).set_tensor_shape(transposed_shape);
    ARM_COMPUTE_RETURN_ON_ERROR(CLTranspose::validate(&reshaped_info, &reshaped_t_info));

    TensorShape gemm_output_shape(weights->dimension(idx_w) * weights->dimension(idx_h) * weights->dimension(idx_b),
                                  input->dimension(idx_w),
                                  input->dimension(idx_h),
                                  input->dimension(idx_b));

    TensorInfo gemm_output_info = reshaped_t_info.clone()->set_tensor_shape(gemm_output_shape).set_is_resizable(true);
    GEMMInfo   gemm_info(false, false, true, input->dimension(idx_h), true);

    GEMMLowpOutputStageInfo output_stage_info;

    if(is_quantized)
    {
        ARM_COMPUTE_RETURN_ON_ERROR(CLGEMMLowpMatrixMultiplyCore::validate(&input->clone()->set_tensor_shape(nhwc_input_shape), &reshaped_t_info, nullptr, &gemm_output_info.set_data_type(DataType::S32),
                                                                           gemm_info));
        ARM_COMPUTE_RETURN_ON_ERROR(construct_gemmlowp_output_stage(input, weights, output, output_stage_info));
    }
    else
    {
        ARM_COMPUTE_RETURN_ON_ERROR(CLGEMM::validate(&input->clone()->set_tensor_shape(nhwc_input_shape).set_is_resizable(true), &reshaped_t_info, nullptr, &gemm_output_info, 1.0f, 0.0f, gemm_info));
    }

    const PadStrideInfo stride_info(deconv_info.stride().first, deconv_info.stride().second);
    auto                out_dims           = deconvolution_output_dimensions(input->dimension(idx_w), input->dimension(idx_h), weights->dimension(idx_w), weights->dimension(idx_h), stride_info);
    const TensorShape   deconv_shape       = misc::shape_calculator::compute_deconvolution_output_shape(out_dims, *input, *weights);
    TensorInfo          col2im_output_info = gemm_output_info.clone()->set_tensor_shape(deconv_shape).set_is_resizable(true);

    if(padded_input && is_quantized)
    {
        const auto start_end = compute_start_end_slice_coordinates(col2im_output_info, deconv_info, is_nchw);
        ARM_COMPUTE_RETURN_ON_ERROR(CLDeconvolutionReshapeOutputKernel::validate(&gemm_output_info, bias, &col2im_output_info, input, weights, deconv_info));
        ARM_COMPUTE_RETURN_ON_ERROR(CLGEMMLowpOutputStage::validate(&col2im_output_info, nullptr, &col2im_output_info.clone()->set_is_resizable(true).set_data_type(input->data_type()), output_stage_info));
        ARM_COMPUTE_RETURN_ON_ERROR(CLSlice::validate(&col2im_output_info.clone()->set_is_resizable(true).set_data_type(input->data_type()), output, start_end.first, start_end.second));
    }
    else if(padded_input)
    {
        const auto start_end = compute_start_end_slice_coordinates(col2im_output_info, deconv_info, is_nchw);
        ARM_COMPUTE_RETURN_ON_ERROR(CLDeconvolutionReshapeOutputKernel::validate(&gemm_output_info, bias, &col2im_output_info, input, weights, deconv_info));
        ARM_COMPUTE_RETURN_ON_ERROR(CLSlice::validate(&col2im_output_info, output, start_end.first, start_end.second));
    }
    else if(is_quantized)
    {
        ARM_COMPUTE_RETURN_ON_ERROR(CLDeconvolutionReshapeOutputKernel::validate(&gemm_output_info, bias, &col2im_output_info, input, weights, deconv_info));
        ARM_COMPUTE_RETURN_ON_ERROR(CLGEMMLowpOutputStage::validate(&col2im_output_info, nullptr, output, output_stage_info));
    }
    else
    {
        ARM_COMPUTE_RETURN_ON_ERROR(CLDeconvolutionReshapeOutputKernel::validate(&gemm_output_info, bias, output, input, weights, deconv_info));
    }

    return Status{};
}
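
A typical pre-flight check before configuring the function, as a minimal sketch (error_description() is part of Status; tensors as in the earlier example):

    const Status status = CLGEMMDeconvolutionLayer::validate(input.info(), weights.info(),
                                                             bias.info(), output.info(), deconv_info);
    if(!bool(status))
    {
        std::cerr << "Unsupported configuration: " << status.error_description() << "\n";
        // fall back to e.g. CLDeconvolutionLayer, which selects a suitable method
    }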

The documentation for this class was generated from the following files: