Compute Library
 19.08
NEDepthwiseConvolutionAssemblyDispatch Class Reference

Depthwise convolution assembly kernel glue. More...

#include <NEDepthwiseConvolutionAssemblyDispatch.h>

Collaboration diagram for NEDepthwiseConvolutionAssemblyDispatch:

Public Member Functions

 NEDepthwiseConvolutionAssemblyDispatch (std::shared_ptr< IMemoryManager > memory_manager=nullptr)
 Default constructor. More...
 
 NEDepthwiseConvolutionAssemblyDispatch (const NEDepthwiseConvolutionAssemblyDispatch &)=delete
 Prevent instances of this class from being copied (as this class contains pointers) More...
 
 NEDepthwiseConvolutionAssemblyDispatch (NEDepthwiseConvolutionAssemblyDispatch &&)=default
 Default move constructor. More...
 
NEDepthwiseConvolutionAssemblyDispatch & operator= (const NEDepthwiseConvolutionAssemblyDispatch &)=delete
 Prevent instances of this class from being copied (as this class contains pointers) More...
 
NEDepthwiseConvolutionAssemblyDispatch & operator= (NEDepthwiseConvolutionAssemblyDispatch &&)=default
 Default move assignment operator. More...
 
 ~NEDepthwiseConvolutionAssemblyDispatch ()
 Default destructor. More...
 
void configure (const ITensor *input, const ITensor *weights, const ITensor *bias, ITensor *output, const PadStrideInfo &conv_info, unsigned int depth_multiplier=1, const ActivationLayerInfo &act_info=ActivationLayerInfo(), const Size2D &dilation=Size2D(1, 1))
 Initialize the function's source, destination, kernels and border_size. More...
 
void run () override
 Run the kernels contained in the function. More...
 
void prepare () override
 Prepare the function for executing. More...
 
- Public Member Functions inherited from IFunction
virtual ~IFunction ()=default
 Destructor. More...
 

Static Public Member Functions

static Status validate (const ITensorInfo *input, const ITensorInfo *weights, const ITensorInfo *bias, const ITensorInfo *output, const PadStrideInfo &conv_info, unsigned int depth_multiplier=1, const ActivationLayerInfo &act_info=ActivationLayerInfo(), const Size2D &dilation=Size2D(1, 1))
 Static function to check if given info will lead to a valid configuration of NEDepthwiseConvolutionAssemblyDispatch. More...
 
static bool is_optimized_supported (const ITensorInfo *input, const ITensorInfo *weights, PadStrideInfo conv_info, unsigned int depth_multiplier=1, const Size2D &dilation=Size2D(1, 1))
 Check if the optimized kernel can be used for the given kernel sizes and strides. More...
 

Detailed Description

Depthwise convolution assembly kernel glue.

Definition at line 36 of file NEDepthwiseConvolutionAssemblyDispatch.h.

Constructor & Destructor Documentation

◆ NEDepthwiseConvolutionAssemblyDispatch() [1/3]

NEDepthwiseConvolutionAssemblyDispatch ( std::shared_ptr< IMemoryManager > memory_manager = nullptr )

Default constructor.

Parameters
[in,out] memory_manager  Memory manager to use

◆ NEDepthwiseConvolutionAssemblyDispatch() [2/3]

Prevent instances of this class from being copied (as this class contains pointers)

◆ NEDepthwiseConvolutionAssemblyDispatch() [3/3]

◆ ~NEDepthwiseConvolutionAssemblyDispatch()

Default destructor.

Member Function Documentation

◆ configure()

void configure ( const ITensor * input,
const ITensor * weights,
const ITensor * bias,
ITensor * output,
const PadStrideInfo & conv_info,
unsigned int  depth_multiplier = 1,
const ActivationLayerInfo & act_info = ActivationLayerInfo(),
const Size2D & dilation = Size2D(1, 1) 
)

Initialize the function's source, destination, kernels and border_size.

Note
Supports only NHWC format
Parameters
[in]  input             Source tensor. Data type supported: QASYMM8/F16/F32. (Written to only for border filling).
[in]  weights           Weights tensor. These are 3D tensors with shape [W, H, IFM]. Data type supported: Same as input.
[in]  bias              (Optional) Biases tensor. A 1D tensor with shape [IFM]. Must be nullptr if not needed. Data type supported: Same as input.
[out] output            Destination tensor. Data type supported: same as input.
[in]  conv_info         Padding and stride information to use for the convolution.
[in]  depth_multiplier  (Optional) Multiplier to apply to the input's depth in order to retrieve the output's depth. Defaults to 1.
[in]  act_info          (Optional) Activation layer information in case of a fused activation.
[in]  dilation          (Optional) Dilation, in elements, across x and y. Defaults to (1, 1).

Definition at line 267 of file NEDepthwiseConvolutionAssemblyDispatch.cpp.

{
    ARM_COMPUTE_ERROR_ON_NULLPTR(input, weights, output);
    ARM_COMPUTE_UNUSED(depth_multiplier);
    ARM_COMPUTE_ERROR_THROW_ON(NEDepthwiseConvolutionAssemblyDispatch::validate(input->info(),
                                                                                weights->info(),
                                                                                bias != nullptr ? bias->info() : nullptr,
                                                                                output->info(),
                                                                                conv_info,
                                                                                depth_multiplier,
                                                                                act_info,
                                                                                dilation));

    // Output auto initialization if not yet initialized
    const TensorShape output_shape = misc::shape_calculator::compute_depthwise_convolution_shape(*input->info(), *weights->info(), conv_info, depth_multiplier, dilation);
    auto_init_if_empty(*output->info(), input->info()->clone()->set_is_resizable(true).reset_padding().set_tensor_shape(output_shape).set_quantization_info(output->info()->quantization_info()));

    _input       = input;
    _weights     = weights;
    _bias        = bias;
    _output      = output;
    _is_prepared = false;

    // Create convolver
    _pImpl->_dwc_assembly_kernel = create_convolver(input, weights, output, conv_info, act_info, dilation);
    ARM_COMPUTE_ERROR_ON(_pImpl->_dwc_assembly_kernel == nullptr);

    // Create assembly kernel wrapper
    _pImpl->_dwc_acl_kernel.configure(_pImpl->_dwc_assembly_kernel.get());

    constexpr size_t alignment = 128;

    // Create workspace
    const unsigned int num_threads    = NEScheduler::get().num_threads();
    const size_t       workspace_size = _pImpl->_dwc_assembly_kernel->get_working_space_size(num_threads);
    ARM_COMPUTE_ERROR_ON_MSG(workspace_size == 0, "Workspace size cannot be 0 !");
    _workspace.allocator()->init(TensorInfo(TensorShape{ workspace_size }, 1, DataType::S8), alignment);
    _memory_group.manage(&_workspace);
    _workspace.allocator()->allocate();

    // Create packing tensor
    const size_t pack_tensor_size = _pImpl->_dwc_assembly_kernel->get_packed_params_size();
    ARM_COMPUTE_ERROR_ON_MSG(pack_tensor_size == 0, "Pack tensor size cannot be 0 !");
    _packed_weights.allocator()->init(TensorInfo(TensorShape{ pack_tensor_size }, 1, DataType::S8), alignment);
}

References arm_compute::test::validation::act_info, TensorAllocator::allocate(), Tensor::allocator(), ARM_COMPUTE_ERROR_ON, ARM_COMPUTE_ERROR_ON_MSG, ARM_COMPUTE_ERROR_ON_NULLPTR, ARM_COMPUTE_ERROR_THROW_ON, ARM_COMPUTE_UNUSED, arm_compute::auto_init_if_empty(), arm_compute::test::validation::bias, ICloneable< T >::clone(), arm_compute::misc::shape_calculator::compute_depthwise_convolution_shape(), arm_compute::test::validation::conv_info, arm_compute::test::validation::dilation, Scheduler::get(), ITensor::info(), CLTensor::info(), TensorAllocator::init(), MemoryGroupBase< TensorType >::manage(), IScheduler::num_threads(), arm_compute::test::validation::output_shape, ITensorInfo::quantization_info(), arm_compute::S8, NEDepthwiseConvolutionAssemblyDispatch::validate(), and arm_compute::test::validation::weights.
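The destination shape produced by compute_depthwise_convolution_shape() follows standard convolution arithmetic. A minimal, self-contained sketch of the per-spatial-dimension calculation (a simplified stand-in with a hypothetical name, not the library's implementation; FLOOR dimension rounding is assumed, and depth-multiplier and quantization handling are omitted):

```cpp
#include <cassert>

// Simplified stand-in for the spatial part of
// misc::shape_calculator::compute_depthwise_convolution_shape().
// FLOOR dimension rounding is assumed.
inline unsigned int conv_out_dim(unsigned int in_dim, unsigned int kernel,
                                 unsigned int stride, unsigned int pad_begin,
                                 unsigned int pad_end, unsigned int dilation)
{
    // Effective kernel extent once dilation is applied
    const unsigned int dilated_kernel = (kernel - 1) * dilation + 1;
    return (in_dim + pad_begin + pad_end - dilated_kernel) / stride + 1;
}
```

For example, an 8-wide input with a 3x3 kernel, unit stride, no padding and unit dilation yields a 6-wide output, while dilation 2 shrinks it further because the effective kernel grows to 5.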

◆ is_optimized_supported()

bool is_optimized_supported ( const ITensorInfo * input,
const ITensorInfo * weights,
PadStrideInfo  conv_info,
unsigned int  depth_multiplier = 1,
const Size2D & dilation = Size2D(1, 1) 
)
static

Check if the optimized kernel can be used for the given kernel sizes and strides.

Warning
Even if this returns true, the inputs and outputs might still need to be permuted, as the only supported layout is NHWC
Parameters
[in] input             Input tensor info.
[in] weights           Weights tensor info.
[in] conv_info         Convolution layer metadata.
[in] depth_multiplier  (Optional) Depth multiplier to be used.
[in] dilation          (Optional) Dilation, in elements, across x and y. Defaults to (1, 1).
Returns
True if the assembly kernel can be used, otherwise false. Note that input/output transformations might still be needed.

Definition at line 361 of file NEDepthwiseConvolutionAssemblyDispatch.cpp.

{
    ARM_COMPUTE_ERROR_ON_NULLPTR(input, weights);

    // Reshape input shape if in NHWC format
    const DataLayout data_layout = input->data_layout();
    TensorShape in_shape{ input->tensor_shape() };
    if(data_layout == DataLayout::NHWC)
    {
        in_shape.set(Window::DimX, input->tensor_shape().y());
        in_shape.set(Window::DimY, input->tensor_shape().z());
        in_shape.set(Window::DimZ, input->tensor_shape().x());
    }

    // Check data type
    const DataType data_type = weights->data_type();
    bool is_data_type_valid  = is_data_type_float(data_type) || is_data_type_quantized_asymmetric(data_type);

    // Check weights size
    std::set<unsigned int> supported_kernel_sizes = { 3, 5 };
    const unsigned int width_idx  = get_data_layout_dimension_index(data_layout, DataLayoutDimension::WIDTH);
    const unsigned int height_idx = get_data_layout_dimension_index(data_layout, DataLayoutDimension::HEIGHT);
    const unsigned int kernel_w   = weights->dimension(width_idx);
    const unsigned int kernel_h   = weights->dimension(height_idx);
    bool weights_supported        = (kernel_w == kernel_h) && (supported_kernel_sizes.count(kernel_w) != 0);

    // Check for supported strides
    const auto &strides    = conv_info.stride();
    bool supported_strides = (strides.first == strides.second) && ((strides.first == 1) || (strides.first == 2));

    // Check for supported padding
    const auto pad_top    = conv_info.pad_top();
    const auto pad_right  = conv_info.pad_right();
    const auto pad_bottom = conv_info.pad_bottom();
    const auto pad_left   = conv_info.pad_left();
    PadStrideInfo same_pad = calculate_same_pad(in_shape, TensorShape(kernel_w, kernel_h), conv_info, DataLayout::NCHW, dilation);
    bool is_same_padding   = (pad_top == same_pad.pad_top()) && (pad_right == same_pad.pad_right()) && (pad_bottom == same_pad.pad_bottom()) && (pad_left == same_pad.pad_left());
    bool is_valid_padding  = (pad_top == 0) && (pad_right == 0) && (pad_bottom == 0) && (pad_left == 0);
    bool supported_padding = is_same_padding || is_valid_padding;
    // TODO(COMPMID-2464): Enable once dilated conv with stride 2 is supported
    bool is_dilation_supported = (dilation == Size2D(1U, 1U)) || ((dilation.x() == dilation.y()) && strides.first == 1);

    return is_data_type_valid && weights_supported && supported_strides && supported_padding && (depth_multiplier == 1) && is_dilation_supported;
}

References ARM_COMPUTE_ERROR_ON_NULLPTR, arm_compute::calculate_same_pad(), arm_compute::test::validation::conv_info, arm_compute::test::validation::data_layout, ITensorInfo::data_layout(), arm_compute::test::validation::data_type, arm_compute::test::validation::dilation, Window::DimX, Window::DimY, Window::DimZ, arm_compute::get_data_layout_dimension_index(), arm_compute::HEIGHT, arm_compute::is_data_type_float(), arm_compute::is_data_type_quantized_asymmetric(), arm_compute::NCHW, arm_compute::NHWC, PadStrideInfo::pad_bottom(), PadStrideInfo::pad_left(), PadStrideInfo::pad_right(), PadStrideInfo::pad_top(), TensorShape::set(), ITensorInfo::tensor_shape(), arm_compute::U, arm_compute::test::validation::weights, arm_compute::WIDTH, Dimensions< T >::x(), Dimensions< T >::y(), and Dimensions< T >::z().

Referenced by NEDepthwiseConvolutionLayer3x3::configure(), NEDepthwiseConvolutionLayerOptimized::configure(), NEDepthwiseConvolutionAssemblyDispatch::validate(), NEDepthwiseConvolutionLayer3x3::validate(), and NEDepthwiseConvolutionLayerOptimized::validate().
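The eligibility rules above reduce to a handful of predicates. A hedged, self-contained mirror with a hypothetical name (the SAME-padding branch and the data-type check from the real implementation are omitted, so this corresponds to the zero-padding case only):

```cpp
#include <set>
#include <utility>

// Simplified mirror of is_optimized_supported() for the zero-padding case.
// SAME-padding and data-type checks from the real implementation are omitted.
inline bool optimized_dwc_supported(unsigned int kernel_w, unsigned int kernel_h,
                                    std::pair<unsigned int, unsigned int> strides,
                                    unsigned int depth_multiplier,
                                    std::pair<unsigned int, unsigned int> dilation)
{
    static const std::set<unsigned int> supported_kernel_sizes = { 3, 5 };
    // Square 3x3 or 5x5 kernels only
    const bool weights_supported = (kernel_w == kernel_h) && (supported_kernel_sizes.count(kernel_w) != 0);
    // Equal strides of 1 or 2 only
    const bool supported_strides = (strides.first == strides.second) && (strides.first == 1 || strides.first == 2);
    // Dilation > 1 is only accepted with unit stride (COMPMID-2464)
    const bool dilation_supported = (dilation == std::make_pair(1u, 1u)) ||
                                    ((dilation.first == dilation.second) && strides.first == 1);
    return weights_supported && supported_strides && dilation_supported && (depth_multiplier == 1);
}
```

For instance, a 3x3 kernel with stride (1, 1), depth multiplier 1 and unit dilation passes, while a 4x4 kernel or a dilated convolution with stride 2 does not.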

◆ operator=() [1/2]

Prevent instances of this class from being copied (as this class contains pointers)

◆ operator=() [2/2]

Default move assignment operator.

◆ prepare()

void prepare ( )
overridevirtual

Prepare the function for executing.

Any one-off pre-processing step required by the function is handled here

Note
Prepare stage might not need all the function's buffers' backing memory to be available in order to execute

Reimplemented from IFunction.

Definition at line 441 of file NEDepthwiseConvolutionAssemblyDispatch.cpp.

{
    if(!_is_prepared)
    {
        _packed_weights.allocator()->allocate();
        ARM_COMPUTE_ERROR_ON(_packed_weights.buffer() == nullptr);

        // Pack weights and bias
        const int weights_element_size = _weights->info()->element_size();
        const int weights_row_stride   = _weights->info()->strides_in_bytes().z() / weights_element_size;
        const int weights_col_stride   = _weights->info()->strides_in_bytes().y() / weights_element_size;
        _pImpl->_dwc_assembly_kernel->pack_params(_packed_weights.buffer(),
                                                  _weights->buffer() + _weights->info()->offset_first_element_in_bytes(),
                                                  weights_row_stride,
                                                  weights_col_stride,
                                                  (_bias != nullptr) ? _bias->buffer() : nullptr);
        _pImpl->_dwc_assembly_kernel->set_packed_params_buffer(_packed_weights.buffer());

        _weights->mark_as_unused();
        if(_bias != nullptr)
        {
            _bias->mark_as_unused();
        }
        _is_prepared = true;
    }
}

References TensorAllocator::allocate(), Tensor::allocator(), ARM_COMPUTE_ERROR_ON, ITensor::buffer(), Tensor::buffer(), ITensorInfo::element_size(), ITensor::info(), ITensor::mark_as_unused(), ITensorInfo::offset_first_element_in_bytes(), ITensorInfo::strides_in_bytes(), Dimensions< T >::y(), and Dimensions< T >::z().

Referenced by NEDepthwiseConvolutionLayer3x3::prepare(), NEDepthwiseConvolutionLayerOptimized::prepare(), and NEDepthwiseConvolutionAssemblyDispatch::run().
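The guard on _is_prepared implements a prepare-once idiom: the expensive weight packing runs exactly once, and every subsequent call is a no-op. A hedged sketch with hypothetical names, stripped of the tensor machinery:

```cpp
// Hedged sketch (hypothetical names) of the one-off prepare() idiom this
// function uses: expensive weight packing happens exactly once, and run()
// always calls prepare() first, so the first run pays the packing cost.
struct DispatchSketch
{
    bool is_prepared = false;
    int  pack_count  = 0;

    void prepare()
    {
        if(!is_prepared)
        {
            ++pack_count; // stands in for pack_params() on the packed weights
            is_prepared = true;
        }
    }

    void run()
    {
        prepare(); // mirrors the real run(): prepare on first use, no-op after
    }
};
```

Calling run() repeatedly on one instance packs the weights only on the first call, which is why the real prepare() can safely mark the weights tensor as unused afterwards.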

◆ run()

void run ( )
overridevirtual

Run the kernels contained in the function.

For NEON kernels:

  • Multi-threading is used for the kernels which are parallelisable.
  • By default std::thread::hardware_concurrency() threads are used.
Note
CPPScheduler::set_num_threads() can be used to manually set the number of threads

For OpenCL kernels:

  • All the kernels are enqueued on the queue associated with CLScheduler.
  • The queue is then flushed.
Note
The function will not block until the kernels are executed. It is the user's responsibility to wait.
Will call prepare() on the first run if it has not already been done

Implements IFunction.

Definition at line 410 of file NEDepthwiseConvolutionAssemblyDispatch.cpp.

{
    // Prepare assembly kernel
    prepare();

    MemoryGroupResourceScope scope_mg(_memory_group);

    // Setup inputs/outputs
    ARM_COMPUTE_ERROR_ON(_workspace.buffer() == nullptr);
    _pImpl->_dwc_assembly_kernel->set_working_space(static_cast<void *>(_workspace.buffer()));

    ARM_COMPUTE_ERROR_ON(_input->buffer() == nullptr);
    const int input_element_size = _input->info()->element_size();
    const int input_batch_stride = _input->info()->strides_in_bytes()[3] / input_element_size;
    const int input_row_stride   = _input->info()->strides_in_bytes().z() / input_element_size;
    const int input_col_stride   = _input->info()->strides_in_bytes().y() / input_element_size;
    const void *input_ptr        = _input->buffer() + _input->info()->offset_first_element_in_bytes();
    _pImpl->_dwc_assembly_kernel->set_input(input_ptr, input_batch_stride, input_row_stride, input_col_stride);

    ARM_COMPUTE_ERROR_ON(_output->buffer() == nullptr);
    const int output_element_size = _output->info()->element_size();
    const int output_batch_stride = _output->info()->strides_in_bytes()[3] / output_element_size;
    const int output_row_stride   = _output->info()->strides_in_bytes().z() / output_element_size;
    const int output_col_stride   = _output->info()->strides_in_bytes().y() / output_element_size;
    void *output_ptr              = _output->buffer() + _output->info()->offset_first_element_in_bytes();
    _pImpl->_dwc_assembly_kernel->set_output(output_ptr, output_batch_stride, output_row_stride, output_col_stride);

    // Schedule assembly kernel
    NEScheduler::get().schedule(&_pImpl->_dwc_acl_kernel, Window::DimX);
}

References ARM_COMPUTE_ERROR_ON, ITensor::buffer(), Tensor::buffer(), Window::DimX, ITensorInfo::element_size(), Scheduler::get(), ITensor::info(), ITensorInfo::offset_first_element_in_bytes(), NEDepthwiseConvolutionAssemblyDispatch::prepare(), IScheduler::schedule(), ITensorInfo::strides_in_bytes(), Dimensions< T >::y(), and Dimensions< T >::z().
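run() hands the assembly kernel raw pointers plus strides in elements, while ITensorInfo reports strides in bytes, so each stride is divided by the element size first. A small self-contained sketch of that conversion (the struct and function names here are hypothetical, not part of the library):

```cpp
#include <cstddef>

// Sketch of the stride conversion performed in run(): the assembly kernel
// expects strides in elements, while ITensorInfo::strides_in_bytes() reports
// them in bytes. Field names mirror the locals in run().
struct ElementStrides
{
    int col;   // strides_in_bytes().y() / element_size
    int row;   // strides_in_bytes().z() / element_size
    int batch; // strides_in_bytes()[3]  / element_size
};

inline ElementStrides to_element_strides(std::size_t stride_y_bytes,
                                         std::size_t stride_z_bytes,
                                         std::size_t stride_batch_bytes,
                                         std::size_t element_size)
{
    return ElementStrides{ static_cast<int>(stride_y_bytes / element_size),
                           static_cast<int>(stride_z_bytes / element_size),
                           static_cast<int>(stride_batch_bytes / element_size) };
}
```

For an NHWC F32 tensor with 16 channels, width 10 and height 8, the byte strides (64, 640, 5120) with element size 4 become element strides of 16, 160 and 1280.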

◆ validate()

Status validate ( const ITensorInfo * input,
const ITensorInfo * weights,
const ITensorInfo * bias,
const ITensorInfo * output,
const PadStrideInfo & conv_info,
unsigned int  depth_multiplier = 1,
const ActivationLayerInfo & act_info = ActivationLayerInfo(),
const Size2D & dilation = Size2D(1, 1) 
)
static

Static function to check if given info will lead to a valid configuration of NEDepthwiseConvolutionAssemblyDispatch.

Note
Supports only NHWC format
Parameters
[in]  input             Source tensor. Data type supported: QASYMM8/F16/F32. (Written to only for border filling).
[in]  weights           Weights tensor. These are 3D tensors with shape [W, H, IFM]. Data type supported: Same as input.
[in]  bias              (Optional) Biases tensor. A 1D tensor with shape [IFM]. Must be nullptr if not needed. Data type supported: Same as input.
[out] output            Destination tensor. Data type supported: same as input.
[in]  conv_info         Padding and stride information to use for the convolution.
[in]  depth_multiplier  (Optional) Multiplier to apply to the input's depth in order to retrieve the output's depth. Defaults to 1.
[in]  act_info          (Optional) Activation layer information in case of a fused activation.
[in]  dilation          (Optional) Dilation, in elements, across x and y. Defaults to (1, 1).
Returns
An error status

Definition at line 320 of file NEDepthwiseConvolutionAssemblyDispatch.cpp.

{
    ARM_COMPUTE_RETURN_ERROR_ON_CPU_F16_UNSUPPORTED(input);
    ARM_COMPUTE_RETURN_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN(input, 1, DataType::QASYMM8, DataType::F16, DataType::F32);
    ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES(input, weights);
    ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_LAYOUT(input, weights);

    // Validate convolver
    ARM_COMPUTE_RETURN_ERROR_ON(!is_optimized_supported(input, weights, conv_info, depth_multiplier, dilation));

    // Validate activation
    const bool is_relu  = arm_compute::utils::info_helpers::is_relu(act_info);
    const bool is_relu6 = arm_compute::utils::info_helpers::is_relu6(act_info);
    ARM_COMPUTE_RETURN_ERROR_ON(act_info.enabled() && !(is_relu || is_relu6));

    // Check bias
    if(bias != nullptr)
    {
        unsigned int channel_idx = get_data_layout_dimension_index(input->data_layout(), DataLayoutDimension::CHANNEL);
        ARM_COMPUTE_RETURN_ERROR_ON(bias->num_dimensions() > 1);
        ARM_COMPUTE_RETURN_ERROR_ON(bias->dimension(0) != weights->dimension(channel_idx));
    }

    // Check output
    if(output->total_size() != 0)
    {
        const TensorShape output_shape = misc::shape_calculator::compute_depthwise_convolution_shape(*input, *weights, conv_info, depth_multiplier, dilation);
        ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DIMENSIONS(output->tensor_shape(), output_shape);
        ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES(input, output);
    }

    return Status{};
}

References arm_compute::test::validation::act_info, ARM_COMPUTE_RETURN_ERROR_ON, ARM_COMPUTE_RETURN_ERROR_ON_CPU_F16_UNSUPPORTED, ARM_COMPUTE_RETURN_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN, ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_LAYOUT, ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES, ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DIMENSIONS, arm_compute::test::validation::bias, arm_compute::CHANNEL, arm_compute::misc::shape_calculator::compute_depthwise_convolution_shape(), arm_compute::test::validation::conv_info, ITensorInfo::data_layout(), arm_compute::test::validation::dilation, arm_compute::F16, arm_compute::F32, arm_compute::get_data_layout_dimension_index(), NEDepthwiseConvolutionAssemblyDispatch::is_optimized_supported(), arm_compute::utils::info_helpers::is_relu(), arm_compute::utils::info_helpers::is_relu6(), arm_compute::test::validation::output_shape, arm_compute::QASYMM8, ITensorInfo::tensor_shape(), ITensorInfo::total_size(), and arm_compute::test::validation::weights.

Referenced by NEDepthwiseConvolutionAssemblyDispatch::configure(), NEDepthwiseConvolutionLayer3x3::validate(), and NEDepthwiseConvolutionLayerOptimized::validate().


The documentation for this class was generated from the following files:
NEDepthwiseConvolutionAssemblyDispatch.h
NEDepthwiseConvolutionAssemblyDispatch.cpp