Compute Library 21.11
CLDepthwiseConvolutionLayer Class Reference

Function to execute a depthwise convolution. More...

#include <CLDepthwiseConvolutionLayer.h>


Public Member Functions

 CLDepthwiseConvolutionLayer (std::shared_ptr< IMemoryManager > memory_manager=nullptr)
 Default constructor. More...
 
 CLDepthwiseConvolutionLayer (const CLDepthwiseConvolutionLayer &)=delete
 Prevent instances of this class from being copied (As this class contains pointers) More...
 
 CLDepthwiseConvolutionLayer (CLDepthwiseConvolutionLayer &&)=default
 Default move constructor. More...
 
CLDepthwiseConvolutionLayer & operator= (const CLDepthwiseConvolutionLayer &)=delete
 Prevent instances of this class from being copied (As this class contains pointers) More...
 
CLDepthwiseConvolutionLayer & operator= (CLDepthwiseConvolutionLayer &&)=default
 Default move assignment operator. More...
 
 ~CLDepthwiseConvolutionLayer ()
 Default destructor. More...
 
void configure (const CLCompileContext &compile_context, ICLTensor *input, const ICLTensor *weights, const ICLTensor *biases, ICLTensor *output, const PadStrideInfo &conv_info, unsigned int depth_multiplier=1, ActivationLayerInfo act_info=ActivationLayerInfo(), const Size2D &dilation=Size2D(1U, 1U))
 Initialize the function's source, destination, weights and convolution information. More...
 
void configure (ICLTensor *input, const ICLTensor *weights, const ICLTensor *biases, ICLTensor *output, const PadStrideInfo &conv_info, unsigned int depth_multiplier=1, ActivationLayerInfo act_info=ActivationLayerInfo(), const Size2D &dilation=Size2D(1U, 1U))
 Initialize the function's source, destination, weights and convolution information. More...
 
void run () override
 Run the kernels contained in the function. More...
 
void prepare () override
 Prepare the function for executing. More...
 
void set_memory_group (std::shared_ptr< IMemoryManager > memory_manager)
 
- Public Member Functions inherited from IFunction
virtual ~IFunction ()=default
 Destructor. More...
 

Static Public Member Functions

static Status validate (const ITensorInfo *input, const ITensorInfo *weights, const ITensorInfo *biases, const ITensorInfo *output, const PadStrideInfo &conv_info, unsigned int depth_multiplier=1, ActivationLayerInfo act_info=ActivationLayerInfo(), const Size2D &dilation=Size2D(1U, 1U))
 Static function to check if given info will lead to a valid configuration of CLDepthwiseConvolutionLayer. More...
 

Detailed Description

Function to execute a depthwise convolution.

This function runs the following kernels/functions:

  1. CLDepthwiseConvolutionLayerNativeKernel
  2. CLPermute (if the data layout is NCHW)

Definition at line 45 of file CLDepthwiseConvolutionLayer.h.
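A minimal usage sketch (illustrative, not taken from the library documentation; shapes and parameter values are assumptions): initialise the tensors, configure the function, allocate, then run. NCHW TensorShape order is (width, height, channels).

    #include "arm_compute/runtime/CL/CLScheduler.h"
    #include "arm_compute/runtime/CL/CLTensor.h"
    #include "arm_compute/runtime/CL/functions/CLDepthwiseConvolutionLayer.h"

    using namespace arm_compute;

    int main()
    {
        CLScheduler::get().default_init(); // create the OpenCL context and queue

        // 32x32 input with 16 channels, one 3x3 filter per channel (NCHW, the default layout).
        CLTensor input, weights, biases, output;
        input.allocator()->init(TensorInfo(TensorShape(32U, 32U, 16U), 1, DataType::F32));
        weights.allocator()->init(TensorInfo(TensorShape(3U, 3U, 16U), 1, DataType::F32));
        biases.allocator()->init(TensorInfo(TensorShape(16U), 1, DataType::F32));
        output.allocator()->init(TensorInfo(TensorShape(32U, 32U, 16U), 1, DataType::F32));

        CLDepthwiseConvolutionLayer dwc;
        dwc.configure(&input, &weights, &biases, &output,
                      PadStrideInfo(1 /* stride_x */, 1 /* stride_y */, 1 /* pad_x */, 1 /* pad_y */));

        input.allocator()->allocate();
        weights.allocator()->allocate();
        biases.allocator()->allocate();
        output.allocator()->allocate();

        // ... map the tensors and fill input/weights/biases here ...

        dwc.run();                 // enqueues the kernels
        CLScheduler::get().sync(); // run() does not block, so wait explicitly
        return 0;
    }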

Constructor & Destructor Documentation

◆ CLDepthwiseConvolutionLayer() [1/3]

CLDepthwiseConvolutionLayer ( std::shared_ptr< IMemoryManager > memory_manager = nullptr)

Default constructor.

Definition at line 135 of file CLDepthwiseConvolutionLayer.cpp.

136  : _memory_group(std::move(memory_manager)),
137  _dwc_native_kernel(std::make_unique<CLDepthwiseConvolutionLayerNativeKernel>()),
138  _permute_input_to_nhwc(),
139  _permute_weights_to_nhwc(),
140  _permute_output_to_nchw(),
141  _permuted_input(),
142  _permuted_weights(),
143  _permuted_output(),
144  _output_multipliers(),
145  _output_shifts(),
146  _original_weights(),
147  _input(),
148  _output(),
149  _needs_permute(false),
150  _is_prepared(false),
151  _is_quantized(false)
152 {
153 }

◆ CLDepthwiseConvolutionLayer() [2/3]

Prevent instances of this class from being copied (As this class contains pointers)

◆ CLDepthwiseConvolutionLayer() [3/3]

Default move constructor.

◆ ~CLDepthwiseConvolutionLayer()

Default destructor.

Member Function Documentation

◆ configure() [1/2]

void configure ( const CLCompileContext & compile_context,
                 ICLTensor *              input,
                 const ICLTensor *        weights,
                 const ICLTensor *        biases,
                 ICLTensor *              output,
                 const PadStrideInfo &    conv_info,
                 unsigned int             depth_multiplier = 1,
                 ActivationLayerInfo      act_info = ActivationLayerInfo(),
                 const Size2D &           dilation = Size2D(1U, 1U)
               )

Initialize the function's source, destination, weights and convolution information.

Valid data layouts:

  • NHWC
  • NCHW

Valid data type configurations:

src0            src1                src2  dst
F16             F16                 F16   F16
F32             F32                 F32   F32
QASYMM8         QASYMM8             S32   QASYMM8
QASYMM8         QSYMM8_PER_CHANNEL  S32   QASYMM8
QASYMM8_SIGNED  QASYMM8_SIGNED      S32   QASYMM8_SIGNED
QASYMM8_SIGNED  QSYMM8_PER_CHANNEL  S32   QASYMM8_SIGNED
Parameters
    [in]      compile_context   The compile context to be used.
    [in,out]  input             Source tensor. Data type supported: QASYMM8/QASYMM8_SIGNED/FP16/FP32. Data layout supported: NHWC, NCHW
    [in]      weights           Weights tensor. These are 3D tensors with shape [kernel_x, kernel_y, IFM]. Data type supported: Same as input, or QASYMM8/QASYMM8_SIGNED/QSYMM8_PER_CHANNEL when input is QASYMM8.
    [in]      biases            Biases tensor. A 1D tensor with shape [IFM]. Must be nullptr if not needed. Data type supported: Same as input, or S32 when input is QASYMM8/QASYMM8_SIGNED.
    [out]     output            Destination tensor. Pass in nullptr or input for in-place operation. Data type supported: same as input.
    [in]      conv_info         Padding and stride information to use for the convolution.
    [in]      depth_multiplier  (Optional) Multiplier to apply to the input's depth in order to retrieve the output's depth. Defaults to 1.
    [in]      act_info          (Optional) Activation layer information in case of a fused activation.
    [in]      dilation          (Optional) Dilation, in elements, across x and y. Defaults to (1, 1).

Note: For in-place support, please check CLDepthwiseConvolutionLayerNativeKernel.

Definition at line 163 of file CLDepthwiseConvolutionLayer.cpp.

References CLTensorAllocator::allocate(), CLTensor::allocator(), ARM_COMPUTE_ERROR_ON_NULLPTR, ARM_COMPUTE_ERROR_THROW_ON, ARM_COMPUTE_LOG_PARAMS, arm_compute::CHANNEL, CLPermute::configure(), arm_compute::test::validation::conv_info, ITensorInfo::data_layout(), ITensorInfo::data_type(), ITensorInfo::dimension(), CLScheduler::get(), arm_compute::get_data_layout_dimension_index(), ITensor::info(), CLTensor::info(), ITensorAllocator::init(), arm_compute::test::validation::input, arm_compute::is_data_type_quantized(), arm_compute::is_data_type_quantized_per_channel(), MemoryGroup::manage(), arm_compute::NCHW, arm_compute::NHWC, ITensorInfo::quantization_info(), arm_compute::S32, TensorInfo::set_data_layout(), TensorInfo::set_quantization_info(), CLScheduler::target(), arm_compute::U, and CLDepthwiseConvolutionLayer::validate().

Referenced by CLDepthwiseConvolutionLayer::configure().

166 {
167  ARM_COMPUTE_ERROR_ON_NULLPTR(input, weights);
168  ARM_COMPUTE_ERROR_THROW_ON(CLDepthwiseConvolutionLayer::validate(input->info(),
169  weights->info(),
170  biases != nullptr ? biases->info() : nullptr,
171  output != nullptr ? output->info() : input->info(),
172  conv_info,
173  depth_multiplier,
174  act_info,
175  dilation));
176  ARM_COMPUTE_LOG_PARAMS(input, weights, biases, output, conv_info, depth_multiplier, act_info, dilation);
177 
178  _is_quantized = is_data_type_quantized(input->info()->data_type());
179  _is_prepared = false;
180  _original_weights = weights;
181  _input = input;
182  _output = output;
183  _needs_permute = input->info()->data_layout() == DataLayout::NCHW;
184 
185  const GPUTarget gpu_target = CLScheduler::get().target();
186 
187  ICLTensor *input_to_use = input;
188  const ICLTensor *weights_to_use = weights;
189  ICLTensor *output_to_use = output;
190  if(_needs_permute)
191  {
192  _memory_group.manage(&_permuted_input);
193  _memory_group.manage(&_permuted_output);
194 
195  // Configure the function to transform the input tensor from NCHW -> NHWC
196  _permute_input_to_nhwc.configure(compile_context, input, &_permuted_input, PermutationVector(2U, 0U, 1U));
197  _permuted_input.info()->set_data_layout(DataLayout::NHWC);
198 
199  // Configure the function to transform the weights tensor from IHW -> HWI
200  _permute_weights_to_nhwc.configure(compile_context, weights, &_permuted_weights, PermutationVector(2U, 0U, 1U));
201  _permuted_weights.info()->set_data_layout(DataLayout::NHWC);
202 
203  // Set output quantization info before dwc kernel configure
204  _permuted_output.info()->set_quantization_info(output->info()->quantization_info());
205 
206  input_to_use = &_permuted_input;
207  weights_to_use = &_permuted_weights;
208  output_to_use = &_permuted_output;
209  }
210 
211  CLTensor *output_multipliers_to_use = nullptr;
212  CLTensor *output_shifts_to_use = nullptr;
213  if(_is_quantized)
214  {
215  const size_t idx_c = get_data_layout_dimension_index(weights->info()->data_layout(), DataLayoutDimension::CHANNEL);
216  const size_t num_filters = (is_data_type_quantized_per_channel(weights->info()->data_type())) ? weights->info()->dimension(idx_c) : 1;
217 
218  _output_multipliers.allocator()->init(TensorInfo(TensorShape(num_filters), 1, DataType::S32));
219  _output_shifts.allocator()->init(TensorInfo(TensorShape(num_filters), 1, DataType::S32));
220 
221  output_multipliers_to_use = &_output_multipliers;
222  output_shifts_to_use = &_output_shifts;
223  }
224 
225  DWCComputeKernelInfo dwc_native_compute_info;
226  initialize_dwc_native_compute_info(dwc_native_compute_info, weights_to_use->info(), conv_info, dilation, depth_multiplier, gpu_target);
227 
228  const ConvolutionInfo conv_kernel_info{ conv_info, depth_multiplier, act_info, dilation };
229 
230  _dwc_native_kernel->configure(compile_context, input_to_use, weights_to_use, biases, output_to_use,
231  dwc_native_compute_info, conv_kernel_info, output_multipliers_to_use, output_shifts_to_use);
232 
233  if(_needs_permute)
234  {
235  _permuted_input.allocator()->allocate();
236 
237  // Configure the function to transform the convoluted output to NCHW format
238  _permuted_output.info()->set_data_layout(DataLayout::NCHW);
239  _permute_output_to_nchw.configure(compile_context, &_permuted_output, output, PermutationVector(1U, 2U, 0U));
240  _permuted_output.allocator()->allocate();
241  }
242 
243  if(_is_quantized)
244  {
245  _output_multipliers.allocator()->allocate();
246  _output_shifts.allocator()->allocate();
247  }
248 }
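For reference, a hedged sketch of calling this overload directly with the library's default compile context (obtained the same way the second configure() overload below does), with a fused ReLU and a depth multiplier; all values are illustrative and the tensors are assumed to be initialised as in the earlier sketch:

    const CLCompileContext &ctx = CLKernelLibrary::get().get_compile_context();

    CLDepthwiseConvolutionLayer dwc;
    dwc.configure(ctx, &input, &weights, &biases, &output,
                  PadStrideInfo(2, 2, 0, 0), // stride 2, no padding
                  2U,                        // depth_multiplier: output depth = 2 * input depth,
                                             // so weights and output must be shaped accordingly
                  ActivationLayerInfo(ActivationLayerInfo::ActivationFunction::RELU),
                  Size2D(1U, 1U));           // no dilation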

◆ configure() [2/2]

void configure ( ICLTensor *           input,
                 const ICLTensor *     weights,
                 const ICLTensor *     biases,
                 ICLTensor *           output,
                 const PadStrideInfo & conv_info,
                 unsigned int          depth_multiplier = 1,
                 ActivationLayerInfo   act_info = ActivationLayerInfo(),
                 const Size2D &        dilation = Size2D(1U, 1U)
               )

Initialize the function's source, destination, weights and convolution information.

Similar to CLDepthwiseConvolutionLayer::configure()

Definition at line 157 of file CLDepthwiseConvolutionLayer.cpp.

References CLDepthwiseConvolutionLayer::configure(), and CLKernelLibrary::get().

159 {
160  configure(CLKernelLibrary::get().get_compile_context(), input, weights, biases, output, conv_info, depth_multiplier, act_info, dilation);
161 }

◆ operator=() [1/2]

Prevent instances of this class from being copied (As this class contains pointers)

◆ operator=() [2/2]

Default move assignment operator.

◆ prepare()

void prepare ( ) [override, virtual]

Prepare the function for executing.

Any one-off pre-processing step required by the function is handled here.

Note
The prepare stage might not need all of the function's buffers' backing memory to be available in order to execute.

Reimplemented from IFunction.

Definition at line 343 of file CLDepthwiseConvolutionLayer.cpp.

References CLTensorAllocator::allocate(), CLTensor::allocator(), ARM_COMPUTE_ERROR_ON, arm_compute::quantization::compute_quantized_multipliers_and_shifts(), ITensor::info(), ITensor::is_used(), CLTensor::map(), ITensor::mark_as_unused(), ITensor::ptr_to_element(), CLPermute::run(), and CLTensor::unmap().

Referenced by CLDepthwiseConvolutionLayer::run().

344 {
345  if(!_is_prepared)
346  {
347  if(_is_quantized)
348  {
349  _output_multipliers.map();
350  _output_shifts.map();
351  quantization::compute_quantized_multipliers_and_shifts(_input->info(),
352  _original_weights->info(),
353  _output != nullptr ? _output->info() : _input->info(),
354  reinterpret_cast<int32_t *>(_output_multipliers.ptr_to_element(Coordinates(0))),
355  reinterpret_cast<int32_t *>(_output_shifts.ptr_to_element(Coordinates(0))));
356  _output_multipliers.unmap();
357  _output_shifts.unmap();
358  }
359 
360  if(_needs_permute)
361  {
362  ARM_COMPUTE_ERROR_ON(!_original_weights->is_used());
363 
364  _permuted_weights.allocator()->allocate();
365  _permute_weights_to_nhwc.run();
366  _original_weights->mark_as_unused();
367  }
368  _is_prepared = true;
369  }
370 }
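As the listing shows, prepare() performs the one-off work: filling the quantized multiplier/shift tensors and permuting the weights for NCHW. A sketch of calling it explicitly so this cost is not paid inside the first run(); num_batches is a hypothetical loop bound and the tensors are assumed to be set up as in the earlier sketches:

    dwc.configure(&input, &weights, &biases, &output, PadStrideInfo(1, 1, 1, 1));
    // ... allocate the tensors and upload the weights ...

    dwc.prepare(); // one-off: the weights must already hold valid data here

    const int num_batches = 10; // hypothetical batch count
    for(int i = 0; i < num_batches; ++i)
    {
        // ... upload the next input ...
        dwc.run();                 // prepare() is a no-op from now on
        CLScheduler::get().sync();
    }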

◆ run()

void run ( ) [override, virtual]

Run the kernels contained in the function.

For CPU kernels:

  • Multi-threading is used for the kernels which are parallelisable.
  • By default std::thread::hardware_concurrency() threads are used.
Note
CPPScheduler::set_num_threads() can be used to manually set the number of threads

For OpenCL kernels:

  • All the kernels are enqueued on the queue associated with CLScheduler.
  • The queue is then flushed.
Note
The function will not block until the kernels are executed. It is the user's responsibility to wait.
Will call prepare() on the first run if it hasn't been done already.

Implements IFunction.

Definition at line 326 of file CLDepthwiseConvolutionLayer.cpp.

References CLScheduler::enqueue(), CLScheduler::get(), CLDepthwiseConvolutionLayer::prepare(), and CLPermute::run().

327 {
328  prepare();
329 
330  MemoryGroupResourceScope scope_mg(_memory_group);
331 
332  if(_needs_permute)
333  {
334  _permute_input_to_nhwc.run();
335  }
336  CLScheduler::get().enqueue(*_dwc_native_kernel);
337  if(_needs_permute)
338  {
339  _permute_output_to_nchw.run();
340  }
341 }
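Because run() only enqueues and flushes, reading results back needs an explicit synchronisation point. Two hedged options using the CLScheduler and CLTensor APIs referenced above (buffer handling is illustrative):

    dwc.run(); // enqueue the kernels and flush the queue

    // Option 1: block until everything enqueued so far has executed.
    CLScheduler::get().sync();

    // Option 2: a blocking map also synchronises before giving host access.
    output.map(true /* blocking */);
    const float *results = reinterpret_cast<const float *>(output.buffer());
    // ... consume results ...
    output.unmap();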

◆ set_memory_group()

void set_memory_group ( std::shared_ptr< IMemoryManager > memory_manager) [inline]

Definition at line 113 of file CLDepthwiseConvolutionLayer.h.

114  {
115  _memory_group = MemoryGroup(std::move(memory_manager));
116  };
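set_memory_group() swaps in a memory group backed by an externally owned memory manager, so several functions can share scratch memory. A hedged sketch, assuming the stock MemoryManagerOnDemand / BlobLifetimeManager / PoolManager combination from the runtime:

    #include "arm_compute/runtime/BlobLifetimeManager.h"
    #include "arm_compute/runtime/MemoryManagerOnDemand.h"
    #include "arm_compute/runtime/PoolManager.h"

    auto lifetime_mgr = std::make_shared<BlobLifetimeManager>();
    auto pool_mgr     = std::make_shared<PoolManager>();
    auto memory_mgr   = std::make_shared<MemoryManagerOnDemand>(lifetime_mgr, pool_mgr);

    CLDepthwiseConvolutionLayer dwc; // the same manager could also be passed to the constructor
    dwc.set_memory_group(memory_mgr);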

◆ validate()

static Status validate ( const ITensorInfo *   input,
                         const ITensorInfo *   weights,
                         const ITensorInfo *   biases,
                         const ITensorInfo *   output,
                         const PadStrideInfo & conv_info,
                         unsigned int          depth_multiplier = 1,
                         ActivationLayerInfo   act_info = ActivationLayerInfo(),
                         const Size2D &        dilation = Size2D(1U, 1U)
                       ) [static]

Static function to check if given info will lead to a valid configuration of CLDepthwiseConvolutionLayer.

Similar to CLDepthwiseConvolutionLayer::configure()

Returns
a status

Definition at line 250 of file CLDepthwiseConvolutionLayer.cpp.

References ARM_COMPUTE_RETURN_ERROR_ON, ARM_COMPUTE_RETURN_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN, ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_LAYOUT, ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES, ARM_COMPUTE_RETURN_ERROR_ON_MSG, ARM_COMPUTE_RETURN_ON_ERROR, arm_compute::CHANNEL, ICloneable< T >::clone(), arm_compute::misc::shape_calculator::compute_depthwise_convolution_shape(), arm_compute::test::validation::conv_info, ITensorInfo::data_layout(), ITensorInfo::data_type(), ITensorInfo::dimension(), CLScheduler::get(), arm_compute::get_data_layout_dimension_index(), arm_compute::HEIGHT, arm_compute::test::validation::info, arm_compute::test::validation::input, arm_compute::is_data_type_quantized(), arm_compute::is_data_type_quantized_per_channel(), arm_compute::NCHW, arm_compute::NHWC, PadStrideInfo::pad_bottom(), PadStrideInfo::pad_left(), PadStrideInfo::pad_right(), PadStrideInfo::pad_top(), arm_compute::permute(), arm_compute::QSYMM8_PER_CHANNEL, arm_compute::S32, CLScheduler::target(), ITensorInfo::tensor_shape(), arm_compute::U, CLDepthwiseConvolutionLayerNativeKernel::validate(), CLPermute::validate(), arm_compute::WIDTH, Size2D::x(), and Size2D::y().

Referenced by CLDepthwiseConvolutionLayer::configure(), and arm_compute::test::validation::DATA_TEST_CASE().

253 {
254  const bool in_place = input == output || output == nullptr;
255  if(in_place)
256  {
257  output = input;
258  }
259  ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_LAYOUT(input, weights);
260  const size_t idx_w = get_data_layout_dimension_index(input->data_layout(), DataLayoutDimension::WIDTH);
261  const size_t idx_h = get_data_layout_dimension_index(input->data_layout(), DataLayoutDimension::HEIGHT);
262 
263  ARM_COMPUTE_RETURN_ERROR_ON(weights->dimension(idx_w) + (weights->dimension(idx_w) - 1) * (dilation.x() - 1) > input->dimension(idx_w) + conv_info.pad_left() + conv_info.pad_right());
264  ARM_COMPUTE_RETURN_ERROR_ON(weights->dimension(idx_h) + (weights->dimension(idx_h) - 1) * (dilation.y() - 1) > input->dimension(idx_h) + conv_info.pad_top() + conv_info.pad_bottom());
265 
266  const GPUTarget gpu_target = CLScheduler::get().target();
267 
268  const ConvolutionInfo conv_kernel_info{ conv_info, depth_multiplier, act_info, dilation };
269 
270  const bool needs_permute = input->data_layout() == DataLayout::NCHW;
271 
272  const bool is_quantized = is_data_type_quantized(input->data_type());
273 
274  TensorInfo output_multipliers_shifts_info(TensorInfo(TensorShape(1U), 1, DataType::S32));
275  if(is_quantized)
276  {
277  if(is_data_type_quantized_per_channel(weights->data_type()))
278  {
279  ARM_COMPUTE_RETURN_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN(weights, 1, DataType::QSYMM8_PER_CHANNEL);
280 
281  const size_t idx_c = get_data_layout_dimension_index(weights->data_layout(), DataLayoutDimension::CHANNEL);
282  output_multipliers_shifts_info.set_tensor_shape(TensorShape(weights->dimension(idx_c)));
283  }
284  else
285  {
286  ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES(input, weights);
287  }
288  }
289 
290  if(needs_permute)
291  {
292  ARM_COMPUTE_RETURN_ERROR_ON_MSG(in_place, "In-place is supported only with NHWC data layout");
293  TensorShape permuted_input_shape = input->tensor_shape();
294  TensorShape permuted_weights_shape = weights->tensor_shape();
295  const ConvolutionInfo info{ conv_info, depth_multiplier, ActivationLayerInfo(), dilation };
296  TensorShape permuted_output_shape = shape_calculator::compute_depthwise_convolution_shape(*input, *weights, info);
297 
298  permute(permuted_input_shape, PermutationVector(2U, 0U, 1U));
299  permute(permuted_weights_shape, PermutationVector(2U, 0U, 1U));
300  permute(permuted_output_shape, PermutationVector(2U, 0U, 1U));
301 
302  const TensorInfo permuted_input = input->clone()->set_is_resizable(true).reset_padding().set_tensor_shape(permuted_input_shape).set_data_layout(DataLayout::NHWC);
303  const TensorInfo permuted_weights = weights->clone()->set_is_resizable(true).reset_padding().set_tensor_shape(permuted_weights_shape).set_data_layout(DataLayout::NHWC);
304  const TensorInfo permuted_output = output->clone()->set_is_resizable(true).reset_padding().set_tensor_shape(permuted_output_shape).set_data_layout(DataLayout::NHWC);
305 
306  ARM_COMPUTE_RETURN_ON_ERROR(CLPermute::validate(input, &permuted_input, PermutationVector(2U, 0U, 1U)));
307  ARM_COMPUTE_RETURN_ON_ERROR(CLPermute::validate(weights, &permuted_weights, PermutationVector(2U, 0U, 1U)));
308 
309  DWCComputeKernelInfo dwc_native_compute_info;
310  initialize_dwc_native_compute_info(dwc_native_compute_info, &permuted_weights, conv_info, dilation, depth_multiplier, gpu_target);
311 
312  ARM_COMPUTE_RETURN_ON_ERROR(CLDepthwiseConvolutionLayerNativeKernel::validate(&permuted_input, &permuted_weights, biases, &permuted_output,
313  dwc_native_compute_info, conv_kernel_info, &output_multipliers_shifts_info, &output_multipliers_shifts_info));
314  ARM_COMPUTE_RETURN_ON_ERROR(CLPermute::validate(&permuted_output, output, PermutationVector(1U, 2U, 0U)));
315  }
316  else
317  {
318  DWCComputeKernelInfo dwc_native_compute_info;
319  initialize_dwc_native_compute_info(dwc_native_compute_info, weights, conv_info, dilation, depth_multiplier, gpu_target);
320  ARM_COMPUTE_RETURN_ON_ERROR(CLDepthwiseConvolutionLayerNativeKernel::validate(input, weights, biases, output, dwc_native_compute_info, conv_kernel_info, &output_multipliers_shifts_info,
321  &output_multipliers_shifts_info));
322  }
323  return Status{};
324 }
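Because validate() only needs ITensorInfo, a caller can vet a configuration before allocating any OpenCL memory. A sketch with illustrative F16 NHWC shapes (NHWC TensorShape order is (channels, width, height)):

    #include <iostream>

    TensorInfo input_info(TensorShape(16U, 56U, 56U), 1, DataType::F16);
    input_info.set_data_layout(DataLayout::NHWC);
    TensorInfo weights_info(TensorShape(16U, 3U, 3U), 1, DataType::F16);
    weights_info.set_data_layout(DataLayout::NHWC);
    TensorInfo biases_info(TensorShape(16U), 1, DataType::F16);
    TensorInfo output_info(TensorShape(16U, 56U, 56U), 1, DataType::F16);
    output_info.set_data_layout(DataLayout::NHWC);

    const Status status = CLDepthwiseConvolutionLayer::validate(&input_info, &weights_info, &biases_info,
                                                                &output_info, PadStrideInfo(1, 1, 1, 1));
    if(!bool(status)) // Status converts to true when the configuration is valid
    {
        std::cerr << status.error_description() << std::endl;
    }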

The documentation for this class was generated from the following files:

  • arm_compute/runtime/CL/functions/CLDepthwiseConvolutionLayer.h
  • src/runtime/CL/functions/CLDepthwiseConvolutionLayer.cpp