This patch simplifies the representation of OpenMP loop wrapper
operations by introducing the `NoTerminator` trait and updating
accordingly the verifier for the `LoopWrapperInterface`.
Since loop wrappers are already limited to having exactly one region
containing exactly one block, and this block can only hold a single
`omp.loop_nest` or loop wrapper and an `omp.terminator` that does not
return any values, it makes sense to simplify the representation of loop
wrappers by removing the terminator.
There is an extensive list of Lit tests that needed updating to remove
the `omp.terminator`s adding some noise to this patch, but actual
changes are limited to the definition of the `omp.wsloop`, `omp.simd`,
`omp.distribute` and `omp.taskloop` loop wrapper ops, Flang lowering for
those, `LoopWrapperInterface::verifyImpl()`, SCF to OpenMP conversion
and OpenMP dialect documentation.
This patch moves the creation of `DataSharingProcessor` instances for
loop constructs out of `genOMPDispatch()` and into their corresponding
codegen functions. This is a necessary first step to enable a proper
handling of privatization on composite constructs.
Some tests are updated due to a change of order between clause
processing and privatization.
This patch splits the lowering for `omp.loop_nest` into its own function
and updates lowering for all supported loop wrappers to stop creating
this operation themselves.
Lowering functions for loop constructs are split into "wrapper" and
"standalone" variants, where the "wrapper" version only creates the
specific operation with nothing inside of it and the "standalone"
version calls the former and also handles clause processing and creates
the nested `omp.loop_nest`.
"Wrapper" lowering functions can be used by "composite" lowering
functions in follow-up patches, minimizing code duplication.
Tests broken as a result of reordering between the processing of the
loop wrapper's and the nested `omp.loop_nest`'s clauses are also
updated.
This PR contains 2 commits:
1. A commit to reapply changes introduced #91116 (was reverted earlier
due to test suite failures)
2. A commit containing a possible solution for the issue causing the
test suite failures. In particular, it introduces a simple symbol
visitor class to keep track of the current active OMP construct and
marking this active construct as the scope defining the symbol being
visisted.
Fixes#88935
Toggling reduction by-ref broke when multiple reduction clauses were
used. Decisions made for the by-ref status for later clauses could then
invalidate decisions for earlier clauses. For example,
```
reduction(+:scalar,scalar2) reduction(+:array)
```
The first clause would choose by value reduction and generate by-value
reduction regions, but then after this the second clause would force
by-ref to support the array argument. But by the time the second clause
is processed, the first clause has already had the wrong kind of
reduction regions generated.
This is solved by toggling whether a variable should be reduced by
reference per variable. In the above example, this allows only `array`
to be reduced by ref.
This patch updates lowering from PFT to MLIR of workshare loops to
follow the loop wrapper approach. Unit tests impacted by this change are
also updated.
As the last patch of the stack, this should compile and pass unit tests.
It turned out that `hlfir::genVariableBox` didn't add lower bounds to
the boxes it created. Using a shapeshift instead of only a shape adds
the lower bounds information to the thread-local copy of the box.
Fixes#89259
Both arrays and trivial scalars are supported. Both cases must use
by-ref reductions because both are boxed.
My understanding of the standards are that OpenMP says that this should
follow the rules of the intrinsic reduction operators in fortran, and
fortran says that unallocated allocatable variables can only be
referenced to allocate them or test if they are already allocated.
Therefore we do not need a null pointer check in the combiner region.
Following up on a review comment:
https://github.com/llvm/llvm-project/pull/84958#discussion_r1527627848
Reductions might be inlined inside of a loop so stack allocations are
not safe.
Normally flang allocates arrays on the stack. Allocatable arrays have a
different type: fir.box<fir.heap<fir.array<...>>> instead of
fir.box<fir.array<...>>. This patch will allocate all arrays on the
heap.
Reductions on allocatable arrays still aren't supported (but I will get
to this soon).
Re-use fir::getTypeAsString instead of creating something new here. This
spells integer names like i32 instead of i_32 so there is a lot of test
churn.
This has been tested with arrays with compile-time constant bounds.
Allocatable arrays and arrays with non-constant bounds are not yet
supported. User-defined reduction functions are also not yet supported.
The design is intended to work for arrays with non-constant bounds too
without a lot of extra work (mostly there are bugs in OpenMPIRBuilder I
haven't fixed yet).
We need some way to get these runtime bounds into the reduction init and
combiner regions. To keep things simple for now I opted to always box
the array arguments so the box can be passed as one argument and the
lower bounds and extents read from the box. This has the disadvantage of
resulting in fir.box_dim operations inside of the critical section. If
these prove to be a performance issue, we could follow OpenACC reading
box lower bounds and extents before the reduction and passing them as
block arguments to the reduction init and combiner regions. I would
prefer to keep things simple for now.
Note: this implementation only works when the HLFIR lowering is used. I
don't think it is worth supporting FIR-only lowering because the plan is
for that to be removed soon.
OpenMP array reductions 6/6
Previous PR: https://github.com/llvm/llvm-project/pull/84957