Compute Library
 21.02
CLReductionOperation Class Reference

Perform a reduction operation along a single axis of a tensor.

#include <CLReductionOperation.h>


Public Member Functions

 CLReductionOperation (std::shared_ptr< IMemoryManager > memory_manager=nullptr)
 Default Constructor.

 ~CLReductionOperation ()
 Default Destructor.

 CLReductionOperation (const CLReductionOperation &)=delete
 Prevent instances of this class from being copied (as this class contains pointers).

 CLReductionOperation (CLReductionOperation &&)=default
 Default move constructor.

CLReductionOperation & operator= (const CLReductionOperation &)=delete
 Prevent instances of this class from being copied (as this class contains pointers).

CLReductionOperation & operator= (CLReductionOperation &&)=default
 Default move assignment operator.

void configure (ICLTensor *input, ICLTensor *output, unsigned int axis, ReductionOperation op, bool keep_dims=true)
 Set the input and output tensors.

void configure (const CLCompileContext &compile_context, ICLTensor *input, ICLTensor *output, unsigned int axis, ReductionOperation op, bool keep_dims=true)
 Set the input and output tensors.

void run () override
 Run the kernels contained in the function.

- Public Member Functions inherited from IFunction
virtual ~IFunction ()=default
 Destructor.

virtual void prepare ()
 Prepare the function for executing.

Static Public Member Functions

static Status validate (const ITensorInfo *input, const ITensorInfo *output, unsigned int axis, ReductionOperation op, bool keep_dims=true)
 Static function to check if given info will lead to a valid configuration of CLReductionOperation.
 

Detailed Description

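Perform a reduction operation (MEAN_SUM, PROD, SUM_SQUARE, SUM, MIN or MAX) on a tensor along a single axis. Depending on the operation, the data type and the axis, the reduction runs either as a single serial kernel or as a chain of kernel stages; when keep_dims is false, a final reshape removes the reduced dimension.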

Definition at line 45 of file CLReductionOperation.h.

Constructor & Destructor Documentation

◆ CLReductionOperation() [1/3]

CLReductionOperation ( std::shared_ptr< IMemoryManager > memory_manager = nullptr )

Default Constructor.

Parameters
[in]  memory_manager  (Optional) Memory manager.

Definition at line 40 of file CLReductionOperation.cpp.

References CLReductionOperation::~CLReductionOperation().

    : _memory_group(std::move(memory_manager)), _results_vector(), _reduction_kernels_vector(), _border_handlers_vector(), _reshape(), _num_of_stages(), _reduction_axis(), _is_serial(),
      _is_reshape_required(false)
{
}

◆ ~CLReductionOperation()

~CLReductionOperation ( )  [default]

Default Destructor.

Referenced by CLReductionOperation::CLReductionOperation().

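◆ CLReductionOperation() [2/3]

CLReductionOperation ( const CLReductionOperation & )  [delete]

Prevent instances of this class from being copied (as this class contains pointers).

◆ CLReductionOperation() [3/3]

CLReductionOperation ( CLReductionOperation && )  [default]

Default move constructor.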

Member Function Documentation

◆ configure() [1/2]

void configure ( ICLTensor *input,
                 ICLTensor *output,
                 unsigned int axis,
                 ReductionOperation op,
                 bool keep_dims = true
               )

Set the input and output tensors.

Parameters
[in]   input      Source tensor. Data types supported: QASYMM8/QASYMM8_SIGNED/F16/F32/S32.
[out]  output     Destination tensor. Data types and data layouts supported: Same as input.
[in]   axis       Axis along which to reduce. Supported reduction axes: 0, 1, 2, 3.
[in]   op         Reduction operation to perform. Operations supported: MEAN_SUM, PROD, SUM_SQUARE, SUM, MIN, MAX.
[in]   keep_dims  (Optional) Whether to keep the reduced dimension after the operation. Defaults to true.

Definition at line 193 of file CLReductionOperation.cpp.

References CLKernelLibrary::get().

Referenced by CLMeanStdDev::configure(), CLL2NormalizeLayer::configure(), and CLFFTConvolutionLayer::configure().

{
    configure(CLKernelLibrary::get().get_compile_context(), input, output, axis, op, keep_dims);
}
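The effect of axis and keep_dims on the destination shape can be sketched without the library. The helper below, reduced_shape(), is hypothetical and not part of the Compute Library API; it assumes the TensorShape convention that index 0 is the innermost (x) dimension, and assumes that compute_reduced_shape() drops the reduced dimension when keep_dims is false. With keep_dims=true the reduced dimension stays with size 1; with keep_dims=false the function additionally runs a CLReshapeLayer to remove it.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not part of the Compute Library API).
// shape[0] is the innermost (x) dimension, as in arm_compute::TensorShape.
std::vector<std::size_t> reduced_shape(std::vector<std::size_t> shape, unsigned int axis, bool keep_dims)
{
    if(axis >= shape.size())
    {
        throw std::out_of_range("axis out of range");
    }
    if(keep_dims)
    {
        shape[axis] = 1; // reduced dimension kept with size 1
    }
    else
    {
        shape.erase(shape.begin() + axis); // reduced dimension dropped (reshape step)
    }
    return shape;
}
```

For example, reducing a (256, 4, 3) tensor along axis 0 yields (1, 4, 3) with keep_dims=true, while reducing it along axis 1 with keep_dims=false yields (256, 3).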

◆ configure() [2/2]

void configure ( const CLCompileContext &compile_context,
                 ICLTensor *input,
                 ICLTensor *output,
                 unsigned int axis,
                 ReductionOperation op,
                 bool keep_dims = true
               )

Set the input and output tensors.

Parameters
[in]   compile_context  The compile context to be used.
[in]   input            Source tensor. Data types supported: QASYMM8/QASYMM8_SIGNED/F16/F32/S32.
[out]  output           Destination tensor. Data types and data layouts supported: Same as input.
[in]   axis             Axis along which to reduce. Supported reduction axes: 0, 1, 2, 3.
[in]   op               Reduction operation to perform. Operations supported: MEAN_SUM, PROD, SUM_SQUARE, SUM, MIN, MAX.
[in]   keep_dims        (Optional) Whether to keep the reduced dimension after the operation. Defaults to true.

Definition at line 198 of file CLReductionOperation.cpp.

References ARM_COMPUTE_ERROR, ARM_COMPUTE_ERROR_ON_NULLPTR, arm_compute::auto_init_if_empty(), arm_compute::utils::calculate_number_of_stages_only_x_axis(), ICloneable< T >::clone(), arm_compute::misc::shape_calculator::compute_reduced_shape(), CLReshapeLayer::configure(), arm_compute::CONSTANT, ITensorInfo::data_type(), ITensorInfo::dimension(), arm_compute::get_min_max(), ITensor::info(), input_width, MemoryGroup::manage(), arm_compute::MAX, arm_compute::MEAN_SUM, arm_compute::MIN, arm_compute::needs_serialized_reduction(), arm_compute::test::validation::output_shape, arm_compute::PROD, arm_compute::SUM, arm_compute::SUM_SQUARE, and ITensorInfo::tensor_shape().

{
    ARM_COMPUTE_ERROR_ON_NULLPTR(input, output);
    _num_of_stages       = utils::calculate_number_of_stages_only_x_axis(input->info()->dimension(0), axis);
    _reduction_axis      = axis;
    _is_serial           = needs_serialized_reduction(op, input->info()->data_type(), axis);
    _is_reshape_required = !keep_dims;

    auto *output_internal = configure_intermediate_result_vector(input, output);

    if(_is_reshape_required)
    {
        const TensorShape output_shape     = arm_compute::misc::shape_calculator::compute_reduced_shape(input->info()->tensor_shape(), axis, false);
        const auto        output_data_type = input->info()->data_type();
        auto_init_if_empty(*output->info(), input->info()->clone()->set_tensor_shape(output_shape).set_data_type(output_data_type).reset_padding().set_is_resizable(true));
    }

    // Configure reduction operation kernels
    _reduction_kernels_vector.reserve(_num_of_stages);

    // Create temporary tensors
    if(_is_serial)
    {
        if(_is_reshape_required)
        {
            _memory_group.manage(&_results_vector.back());
        }

        _reduction_kernels_vector.emplace_back(std::make_unique<CLReductionOperationKernel>());
        _reduction_kernels_vector[0]->configure(compile_context, input, output_internal, axis, op, 0);
    }
    else
    {
        _border_handlers_vector.reserve(_num_of_stages);
        _memory_group.manage(&_results_vector[0]);

        ReductionOperation first_kernel_op;
        ReductionOperation intermediate_kernel_op;
        ReductionOperation last_kernel_op;
        PixelValue         pixelValue;
        switch(op)
        {
            case ReductionOperation::SUM:
            case ReductionOperation::MEAN_SUM:
                first_kernel_op        = ReductionOperation::SUM;
                intermediate_kernel_op = ReductionOperation::SUM;
                last_kernel_op         = op;
                pixelValue             = PixelValue();
                break;
            case ReductionOperation::SUM_SQUARE:
                first_kernel_op        = ReductionOperation::SUM_SQUARE;
                intermediate_kernel_op = ReductionOperation::SUM;
                last_kernel_op         = ReductionOperation::SUM;
                pixelValue             = PixelValue();
                break;
            case ReductionOperation::PROD:
                first_kernel_op        = ReductionOperation::PROD;
                intermediate_kernel_op = ReductionOperation::PROD;
                last_kernel_op         = ReductionOperation::PROD;
                pixelValue             = PixelValue(1, input->info()->data_type());
                break;
            case ReductionOperation::MIN:
                first_kernel_op        = ReductionOperation::MIN;
                intermediate_kernel_op = ReductionOperation::MIN;
                last_kernel_op         = ReductionOperation::MIN;
                pixelValue             = std::get<1>(get_min_max(input->info()->data_type()));
                break;
            case ReductionOperation::MAX:
                first_kernel_op        = ReductionOperation::MAX;
                intermediate_kernel_op = ReductionOperation::MAX;
                last_kernel_op         = ReductionOperation::MAX;
                pixelValue             = std::get<0>(get_min_max(input->info()->data_type()));
                break;
            default:
                ARM_COMPUTE_ERROR("Not supported");
        }

        _reduction_kernels_vector.emplace_back(std::make_unique<CLReductionOperationKernel>());
        _reduction_kernels_vector[0]->configure(compile_context, input, &_results_vector[0], axis, first_kernel_op);

        _border_handlers_vector.emplace_back(std::make_unique<CLFillBorderKernel>());
        _border_handlers_vector[0]->configure(compile_context, input, _reduction_kernels_vector[0]->border_size(), BorderMode::CONSTANT, pixelValue);

        // Apply ReductionOperation on intermediate stages
        for(unsigned int i = 1; i < _num_of_stages - 1; ++i)
        {
            _memory_group.manage(&_results_vector[i]);

            _reduction_kernels_vector.emplace_back(std::make_unique<CLReductionOperationKernel>());
            _reduction_kernels_vector[i]->configure(compile_context, &_results_vector[i - 1], &_results_vector[i], axis, intermediate_kernel_op);

            _border_handlers_vector.emplace_back(std::make_unique<CLFillBorderKernel>());
            _border_handlers_vector[i]->configure(compile_context, &_results_vector[i - 1], _reduction_kernels_vector[i]->border_size(), BorderMode::CONSTANT, pixelValue);

            _results_vector[i - 1].allocator()->allocate();
        }

        // Apply ReductionOperation on the last stage
        const unsigned int last_stage  = _num_of_stages - 1;
        const unsigned int input_width = input->info()->dimension(0);

        if(_is_reshape_required)
        {
            _memory_group.manage(&_results_vector.back());
        }

        _reduction_kernels_vector.emplace_back(std::make_unique<CLReductionOperationKernel>());
        _reduction_kernels_vector[last_stage]->configure(compile_context, &_results_vector[last_stage - 1], output_internal, axis, last_kernel_op, input_width);

        _border_handlers_vector.emplace_back(std::make_unique<CLFillBorderKernel>());
        _border_handlers_vector[last_stage]->configure(compile_context, &_results_vector[last_stage - 1], _reduction_kernels_vector[last_stage]->border_size(), BorderMode::CONSTANT, pixelValue);

        _results_vector[last_stage - 1].allocator()->allocate();
    }

    if(_is_reshape_required)
    {
        _reshape.configure(compile_context, &_results_vector.back(), output);
        _results_vector.back().allocator()->allocate();
    }
}
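The switch in the parallel branch splits each operation into a first, an intermediate and a last kernel operation, and picks the constant that CLFillBorderKernel writes into the borders. That table can be restated as a small standalone model. The enum ReduceOp, the struct StagePlan and the functions plan_for() and border_value() are hypothetical stand-ins for ReductionOperation and PixelValue, and the model covers only what the listing above shows. The border constant is the identity element of the combining operation (zero for sums, one for products, the type's maximum for MIN and its minimum for MAX), so padded elements cannot change the result.

```cpp
#include <cassert>
#include <limits>
#include <stdexcept>

// Hypothetical mirror of the arm_compute::ReductionOperation values used here.
enum class ReduceOp { SUM, MEAN_SUM, SUM_SQUARE, PROD, MIN, MAX };

// Operation run by the first, intermediate and last kernels of the parallel path.
struct StagePlan
{
    ReduceOp first;
    ReduceOp intermediate;
    ReduceOp last;
};

StagePlan plan_for(ReduceOp op)
{
    switch(op)
    {
        case ReduceOp::SUM:
        case ReduceOp::MEAN_SUM:
            // Partial sums everywhere; only the last stage sees the original op.
            return { ReduceOp::SUM, ReduceOp::SUM, op };
        case ReduceOp::SUM_SQUARE:
            // Square only in the first stage; later stages add partial results.
            return { ReduceOp::SUM_SQUARE, ReduceOp::SUM, ReduceOp::SUM };
        case ReduceOp::PROD:
        case ReduceOp::MIN:
        case ReduceOp::MAX:
            return { op, op, op };
    }
    throw std::invalid_argument("Not supported");
}

// Border fill value: the identity element of the combining operation.
template <typename T>
T border_value(ReduceOp op)
{
    switch(op)
    {
        case ReduceOp::PROD:
            return T(1);
        case ReduceOp::MIN:
            return std::numeric_limits<T>::max();
        case ReduceOp::MAX:
            return std::numeric_limits<T>::lowest();
        default:
            return T(0); // SUM, MEAN_SUM, SUM_SQUARE
    }
}
```

Keeping the squaring out of the intermediate and last stages matters: squaring partial sums again would compute a different quantity, which is why SUM_SQUARE degrades to plain SUM after the first kernel.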

◆ operator=() [1/2]

CLReductionOperation& operator= ( const CLReductionOperation & )  [delete]

Prevent instances of this class from being copied (As this class contains pointers)

◆ operator=() [2/2]

CLReductionOperation& operator= ( CLReductionOperation && )  [default]

Default move assignment operator.

◆ run()

void run ( )  [override], [virtual]

Run the kernels contained in the function.

For Neon kernels:

  • Multi-threading is used for the kernels which are parallelisable.
  • By default, std::thread::hardware_concurrency() threads are used.
Note
CPPScheduler::set_num_threads() can be used to manually set the number of threads.

For OpenCL kernels:

  • All the kernels are enqueued on the queue associated with CLScheduler.
  • The queue is then flushed.
Note
The function will not block until the kernels are executed. It is the user's responsibility to wait.
Will call prepare() on the first run if it has not already been done.

Implements IFunction.

Definition at line 320 of file CLReductionOperation.cpp.

References CLScheduler::enqueue(), CLScheduler::get(), and CLReshapeLayer::run().

Referenced by CLMeanStdDev::configure(), CLL2NormalizeLayer::run(), and CLFFTConvolutionLayer::run().

{
    MemoryGroupResourceScope scope_mg(_memory_group);

    if(_is_serial)
    {
        CLScheduler::get().enqueue(*_reduction_kernels_vector[0], false);
    }
    else
    {
        for(unsigned int i = 0; i < _num_of_stages; ++i)
        {
            CLScheduler::get().enqueue(*_border_handlers_vector[i], false);
            CLScheduler::get().enqueue(*_reduction_kernels_vector[i], false);
        }
    }

    if(_is_reshape_required)
    {
        _reshape.run();
    }
}
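The control flow of run() amounts to a fixed enqueue order. The sketch below is a hypothetical model, enqueue_order(), that records kernel names instead of enqueuing OpenCL kernels: a single reduction kernel in the serial case, otherwise a border-fill kernel followed by a reduction kernel for every stage, and finally the reshape when keep_dims is false. Since the reduction kernels are enqueued with flush=false and run() does not wait for the device, a caller would need to synchronise (for example with CLScheduler::get().sync()) before reading the output.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of CLReductionOperation::run(): returns the order in which
// kernels would be enqueued instead of actually enqueuing them.
std::vector<std::string> enqueue_order(bool is_serial, unsigned int num_of_stages, bool is_reshape_required)
{
    std::vector<std::string> order;
    if(is_serial)
    {
        order.push_back("reduction[0]");
    }
    else
    {
        for(unsigned int i = 0; i < num_of_stages; ++i)
        {
            // Each stage first pads the borders of its input, then reduces it.
            order.push_back("fill_border[" + std::to_string(i) + "]");
            order.push_back("reduction[" + std::to_string(i) + "]");
        }
    }
    if(is_reshape_required)
    {
        order.push_back("reshape"); // only when keep_dims == false
    }
    return order;
}
```

Filling the borders immediately before each reduction stage means every stage reads identity values in its padding, whichever intermediate tensor it consumes.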

◆ validate()

Status validate ( const ITensorInfo *input,
                  const ITensorInfo *output,
                  unsigned int axis,
                  ReductionOperation op,
                  bool keep_dims = true
                )  [static]

Static function to check if given info will lead to a valid configuration of CLReductionOperation.

Parameters
[in]  input      Source tensor info. Data types supported: QASYMM8/QASYMM8_SIGNED/F16/F32/S32.
[in]  output     Destination tensor info. Data types and data layouts supported: Same as input.
[in]  axis       Axis along which to reduce. Supported reduction axes: 0, 1, 2, 3.
[in]  op         Reduction operation to perform. Operations supported: MEAN_SUM, PROD, SUM_SQUARE, SUM, MIN, MAX.
[in]  keep_dims  (Optional) Whether to keep the reduced dimension after the operation. Defaults to true.
Returns
a status
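Most checks in validate() report failure through the returned Status rather than by throwing (an unsupported op in the parallel branch still reaches ARM_COMPUTE_ERROR, which throws), so the usual pattern is to call it with the tensor infos before configure(). The sketch below models only the two axis checks at the top of the listing. CheckResult is a hypothetical stand-in for arm_compute::Status, and kMaxDims = 6 is an assumed value for TensorShape::num_max_dimensions; since that value exceeds 3, the second check is the one that actually limits the axis to 0..3.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for arm_compute::Status: an empty message means "OK".
struct CheckResult
{
    std::string error;
    explicit operator bool() const { return error.empty(); }
};

// Assumed value of TensorShape::num_max_dimensions for this sketch.
constexpr std::size_t kMaxDims = 6;

// Models the two axis checks at the top of CLReductionOperation::validate().
CheckResult check_axis(unsigned int axis)
{
    if(axis >= kMaxDims)
    {
        return { "Reduction axis greater than max number of dimensions" };
    }
    if(axis > 3)
    {
        return { "Unsupported reduction axis" };
    }
    return {};
}
```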

Definition at line 48 of file CLReductionOperation.cpp.

References ARM_COMPUTE_ERROR, ARM_COMPUTE_ERROR_ON_NULLPTR, ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_SHAPES, ARM_COMPUTE_RETURN_ERROR_ON_MSG, ARM_COMPUTE_RETURN_ON_ERROR, arm_compute::utils::calculate_number_of_stages_only_x_axis(), ICloneable< T >::clone(), arm_compute::misc::shape_calculator::compute_reduced_shape(), arm_compute::test::validation::data_type, ITensorInfo::data_type(), ITensorInfo::dimension(), ITensor::info(), arm_compute::test::validation::input, arm_compute::test::validation::input_shape, arm_compute::MAX, arm_compute::MEAN_SUM, arm_compute::MIN, arm_compute::needs_serialized_reduction(), ITensorInfo::num_channels(), Dimensions< size_t >::num_max_dimensions, arm_compute::PROD, arm_compute::test::validation::qinfo, ITensorInfo::quantization_info(), TensorShape::set(), TensorInfo::set_data_type(), ITensorInfo::set_num_channels(), ITensorInfo::set_quantization_info(), ITensorInfo::set_tensor_shape(), arm_compute::test::validation::shape, arm_compute::SUM, arm_compute::SUM_SQUARE, ITensorInfo::tensor_shape(), ITensorInfo::total_size(), CLReshapeLayer::validate(), and CLReductionOperationKernel::validate().

Referenced by arm_compute::test::validation::DATA_TEST_CASE(), CLMeanStdDev::validate(), and CLL2NormalizeLayer::validate().

{
    ARM_COMPUTE_ERROR_ON_NULLPTR(input, output);
    ARM_COMPUTE_RETURN_ERROR_ON_MSG(axis >= TensorShape::num_max_dimensions, "Reduction axis greater than max number of dimensions");
    ARM_COMPUTE_RETURN_ERROR_ON_MSG(axis > 3, "Unsupported reduction axis");

    const unsigned int num_of_stages       = utils::calculate_number_of_stages_only_x_axis(input->dimension(0), axis);
    const bool         is_serial           = needs_serialized_reduction(op, input->data_type(), axis);
    const bool         is_reshape_required = !keep_dims;

    if(is_reshape_required && output->total_size() != 0)
    {
        const TensorInfo expected_output_shape = output->clone()->set_tensor_shape(arm_compute::misc::shape_calculator::compute_reduced_shape(input->tensor_shape(), axis, keep_dims));
        ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_SHAPES(&expected_output_shape, output);
    }

    auto *output_internal = output;

    TensorInfo output_before_reshape;
    const auto input_shape        = input->tensor_shape();
    const auto input_data_type    = input->data_type();
    const auto input_num_channles = input->num_channels();
    const auto input_qinfo        = input->quantization_info();
    const auto output_data_type   = output->data_type();

    auto initialize_tensorinfo = [](TensorInfo & ti, TensorShape shape, DataType data_type, int num_channels, QuantizationInfo qinfo)
    {
        ti.set_data_type(data_type).set_tensor_shape(shape).set_num_channels(num_channels).set_quantization_info(qinfo);
    };

    if(is_reshape_required)
    {
        auto shape_before_reshape = input_shape;
        shape_before_reshape.set(axis, 1);
        initialize_tensorinfo(output_before_reshape, shape_before_reshape, output_data_type, input_num_channles, input_qinfo);
        output_internal = &output_before_reshape;
    }

    if(is_serial)
    {
        ARM_COMPUTE_RETURN_ON_ERROR(CLReductionOperationKernel::validate(input, output_internal, axis, op));
    }
    else
    {
        // Create temporary tensor infos
        std::vector<TensorInfo> sums_vector(num_of_stages - 1);

        // Create intermediate tensor info
        TensorShape shape{ input_shape };

        shape.set(0, ceil(shape.x() / 128.f));

        for(unsigned int i = 0; i < num_of_stages - 1; i++)
        {
            initialize_tensorinfo(sums_vector[i], shape, input_data_type, input_num_channles, input_qinfo);
        }

        ReductionOperation first_kernel_op;
        ReductionOperation intermediate_kernel_op;
        ReductionOperation last_kernel_op;
        switch(op)
        {
            case ReductionOperation::SUM:
            case ReductionOperation::MEAN_SUM:
                first_kernel_op        = ReductionOperation::SUM;
                intermediate_kernel_op = ReductionOperation::SUM;
                last_kernel_op         = op;
                break;
            case ReductionOperation::SUM_SQUARE:
                first_kernel_op        = ReductionOperation::SUM_SQUARE;
                intermediate_kernel_op = ReductionOperation::SUM;
                last_kernel_op         = ReductionOperation::SUM;
                break;
            case ReductionOperation::PROD:
                first_kernel_op        = ReductionOperation::PROD;
                intermediate_kernel_op = ReductionOperation::PROD;
                last_kernel_op         = ReductionOperation::PROD;
                break;
            case ReductionOperation::MIN:
                first_kernel_op        = ReductionOperation::MIN;
                intermediate_kernel_op = ReductionOperation::MIN;
                last_kernel_op         = ReductionOperation::MIN;
                break;
            case ReductionOperation::MAX:
                first_kernel_op        = ReductionOperation::MAX;
                intermediate_kernel_op = ReductionOperation::MAX;
                last_kernel_op         = ReductionOperation::MAX;
                break;
            default:
                ARM_COMPUTE_ERROR("Not supported");
        }

        // Validate ReductionOperation only on first kernel
        ARM_COMPUTE_RETURN_ON_ERROR(CLReductionOperationKernel::validate(input, &sums_vector[0], axis, first_kernel_op));

        // Validate ReductionOperation on intermediate stages
        for(unsigned int i = 1; i < num_of_stages - 1; ++i)
        {
            ARM_COMPUTE_RETURN_ON_ERROR(CLReductionOperationKernel::validate(&sums_vector[i - 1], &sums_vector[i], axis, intermediate_kernel_op));
        }

        // Validate ReductionOperation on the last stage
        const unsigned int last_stage = num_of_stages - 1;
        ARM_COMPUTE_RETURN_ON_ERROR(CLReductionOperationKernel::validate(&sums_vector[last_stage - 1], output_internal, axis, last_kernel_op, input->dimension(0)));
    }

    if(is_reshape_required)
    {
        ARM_COMPUTE_RETURN_ON_ERROR(CLReshapeLayer::validate(output_internal, output));
    }

    return Status{};
}

The documentation for this class was generated from the following files: