Using LoopNest's indices with ShapeShifts that have non-default
lower bounds results in accesses to incorrect array elements.
To avoid having to adjust each index, a ShapeShift with default
lower bounds can be used instead.
Fixes#131751
These previously crashed the compiler because !fir.ptr (not wrapped
inside of a box) was not supported.
Real POINTER variables are supported as !fir.box<!fir.ptr<>>. The
version for EQUIVALENCE doesn't need to do anything different to
!fir.ref<>.
The intention of this work is to give MLIR->LLVMIR conversion freedom to
control how the private variable is allocated so that it can be
allocated on the stack in ordinary cases or as part of a structure used
to give closure context for tasks which might outlive the current stack
frame. See RFC:
https://discourse.llvm.org/t/rfc-openmp-supporting-delayed-task-execution-with-firstprivate-variables/83084
For example, a privatizer for an integer used to look like
```mlir
omp.private {type = private} @x.privatizer : !fir.ref<i32> alloc {
^bb0(%arg0: !fir.ref<i32>):
%0 = ... allocate proper memory for the private clone ...
omp.yield(%0 : !fir.ref<i32>)
}
```
After this change, allocation become implicit in the operation:
```mlir
omp.private {type = private} @x.privatizer : i32
```
For more complex types that require initialization after allocation, an
init region can be used:
``` mlir
omp.private {type = private} @x.privatizer : !some.type init {
^bb0(%arg0: !some.pointer<!some.type>, %arg1: !some.pointer<!some.type>):
// initialize %arg1, using %arg0 as a mold for allocations
omp.yield(%arg1 : !some.pointer<!some.type>)
} dealloc {
^bb0(%arg0: !some.pointer<!some.type>):
... deallocate memory allocated by the init region ...
omp.yield
}
```
This patch lays the groundwork for delayed task execution but is not
enough on its own.
After this patch all gfortran tests which previously passed still pass.
There
are the following changes to the Fujitsu test suite:
- 0380_0009 and 0435_0009 are fixed
- 0688_0041 now fails at runtime. This patch is testing firstprivate
variables with tasks. Previously we got lucky with the undefined
behavior and won the race. After these changes we no longer get lucky.
This patch lays the groundwork for a proper fix for this issue.
In flang the lowering re-uses the existing lowering used for reduction
init and dealloc regions.
In flang, before this patch we hit a TODO with the same wording when
generating the copy region for firstprivate polymorphic variables. After
this patch the box-like fir.class is passed by reference into the copy
region, leading to a different path that didn't hit that old TODO but
the generated code still didn't work so I added a new TODO in
DataSharingProcessor.
This will allow code sharing between reduction and privatization after
my (still WIP) changes to `omp.private` to use an `alloc` region similar
to the one used for reduction declarations.
This alternative loop nest generation is used to generate an OpenMP loop nest instead of fir loops to facilitate parallelizing statements in an OpenMP `workshare` construct.
The main purpose of this patch is to centralize the logic for creating
MLIR operation entry blocks and for binding them to the corresponding
symbols. This minimizes the chances of mixing arguments up for
operations having multiple entry block argument-generating clauses and
prevents divergence while binding arguments.
Some changes implemented to this end are:
- Split into two functions the creation of the entry block, and the
binding of its arguments and the corresponding Fortran symbol. This
enabled a significant simplification of the lowering of composite
constructs, where it's no longer necessary to manually ensure the lists
of arguments and symbols refer to the same variables in the same order
and also match the expected order by the `BlockArgOpenMPOpInterface`.
- Removed redundant and error-prone passing of types and locations from
`ClauseProcessor` methods. Instead, these are obtained from the values
in the appropriate clause operands structure. This also simplifies
argument lists of several lowering functions.
- Access block arguments of already created MLIR operations through the
`BlockArgOpenMPOpInterface` instead of directly indexing the argument
list of the operation, which is not scalable as more entry block
argument-generating clauses are added to an operation.
- Simplified the implementation of `genParallelOp` to no longer need to
define different callbacks depending on whether delayed privatization is
enabled.
This re-uses reduction declarations from intrinsic operators to add
support for reductions of allocatables, pointers, and arrays with
procedure designators (e.g. min/max).
I have split this into two commits to make it easier to review. The
first one makes the functional change. The second cleans things up now
that we can share much more code between intrinsic operators and
procedure designators.
Just treat them the same as ALLOCATABLE. gfortran doesn't allow POINTER
objects in a REDUCTION clause, but so far as I can tell the standard
explicitly allows it (openmp5.2 section 5.5.5).
The object identity requires more than just `Symbol`. Don't use `id()`
to get the Symbol associated with the object, becase the return value
will need to change. Instead use `sym()` which is added for that reason.
Fixes#88935
Toggling reduction by-ref broke when multiple reduction clauses were
used. Decisions made for the by-ref status for later clauses could then
invalidate decisions for earlier clauses. For example,
```
reduction(+:scalar,scalar2) reduction(+:array)
```
The first clause would choose by value reduction and generate by-value
reduction regions, but then after this the second clause would force
by-ref to support the array argument. But by the time the second clause
is processed, the first clause has already had the wrong kind of
reduction regions generated.
This is solved by toggling whether a variable should be reduced by
reference per variable. In the above example, this allows only `array`
to be reduced by ref.
Before this patch we crashed lowering intrinsic array reductions.
I think this lost during a rebase. I've added a test to make sure it
doesn't break again.
Also fixed the TODO message to be more accurate.
…ted. (#89998)" (#90250)
This partially reverts commit 7aedd7dc754c74a49fe84ed2640e269c25414087.
This change removes calls to the deprecated member functions. It does
not mark the functions deprecated yet and does not disable the
deprecation warning in TypeSwitch. This seems to cause problems with
MSVC.
It turned out that `hlfir::genVariableBox` didn't add lower bounds to
the boxes it created. Using a shapeshift instead of only a shape adds
the lower bounds information to the thread-local copy of the box.
Fixes#89259
Both arrays and trivial scalars are supported. Both cases must use
by-ref reductions because both are boxed.
My understanding of the standards are that OpenMP says that this should
follow the rules of the intrinsic reduction operators in fortran, and
fortran says that unallocated allocatable variables can only be
referenced to allocate them or test if they are already allocated.
Therefore we do not need a null pointer check in the combiner region.
This patch changes the lowering of max and min to be lowered to
arith::MaxNumFop and arith::MinNumFOp instead of using arith::MaximumFOp
and arith::MinimumFOp. The arith::MaximumFOp and arith::MinimumFOp map
to the corresponding intrinsics llvm.maximum.* and llvm.minimum.*
intrinsics which conform to the semantics specified in the draft of IEEE
754-2019, which is not supported by all hardware. Instead using
arith::MaximumFOp and arith::MinimumFOp will allow code generation for
more targets and match the code generated by clang OpenMP.
fixes#87955
This patch replaces some `saveInsertionPoint`, `restoreInsertionPoint`
call pairs for an `InsertionGuard` instance where it makes sense within
Flang OpenMP lowering to make further modifications less error-prone.
Accept the reduction modifier in the Flang parser. Issue a TODO message
during lowering.
OpenMP 5.0 introduced the reduction modifier. Details can be seen in
2.19.5.4 reductionClause.
OpenMP 5.2 relevant section is 5.5.8reductionClause.
This will help remove some of the parser errors highlighted in the
following post and also bring it to a well defined behaviour (produce
TODO errors for unsupported features, do not crash).
https://discourse.llvm.org/t/proposal-rename-flang-new-to-flang/69462/60
Following up on a review comment:
https://github.com/llvm/llvm-project/pull/84958#discussion_r1527627848
Reductions might be inlined inside of a loop so stack allocations are
not safe.
Normally flang allocates arrays on the stack. Allocatable arrays have a
different type: fir.box<fir.heap<fir.array<...>>> instead of
fir.box<fir.array<...>>. This patch will allocate all arrays on the
heap.
Reductions on allocatable arrays still aren't supported (but I will get
to this soon).
This adds support for complex type to the OpenMP reductions.
Note that some more work would be needed to give decent error messages when complex
is used in ways that need client supplied functions (e.g. MAX or MIN). It does fail these with
a not so user friendly message at present.
The clause templates defined in ClauseT.h were originally based on
flang's parse tree nodes. Since those representations are going to be
reused for clang (together with the clause splitting code), it makes
sense to separate them from flang, and instead have them based on the
actual OpenMP spec (v5.2).
The member names in the templates follow the naming presented in the
spec, and the representation (e.g. members) is derived from the clause
definitions as described in the spec.
Since the representations of some clauses has changed (while preserving
the information), the current code using the clauses (especially the
code converting parser::OmpClause to omp::Clause) needs to be adjusted.
This patch does not make any functional changes.
Re-use fir::getTypeAsString instead of creating something new here. This
spells integer names like i32 instead of i_32 so there is a lot of test
churn.
This has been tested with arrays with compile-time constant bounds.
Allocatable arrays and arrays with non-constant bounds are not yet
supported. User-defined reduction functions are also not yet supported.
The design is intended to work for arrays with non-constant bounds too
without a lot of extra work (mostly there are bugs in OpenMPIRBuilder I
haven't fixed yet).
We need some way to get these runtime bounds into the reduction init and
combiner regions. To keep things simple for now I opted to always box
the array arguments so the box can be passed as one argument and the
lower bounds and extents read from the box. This has the disadvantage of
resulting in fir.box_dim operations inside of the critical section. If
these prove to be a performance issue, we could follow OpenACC reading
box lower bounds and extents before the reduction and passing them as
block arguments to the reduction init and combiner regions. I would
prefer to keep things simple for now.
Note: this implementation only works when the HLFIR lowering is used. I
don't think it is worth supporting FIR-only lowering because the plan is
for that to be removed soon.
OpenMP array reductions 6/6
Previous PR: https://github.com/llvm/llvm-project/pull/84957
…essor
Rename `findRepeatableClause` to `findRepeatableClause2`, and make the
new `findRepeatableClause` operate on new `omp::Clause` objects.
Leave `Map` unchanged, because it will require more changes for it to
work.
[Clause representation 3/6]
Previously reduction variables were always passed by value into and out
of the initialization and combiner regions of the OpenMP reduction
declare operation.
This worked well for reductions of primitive types (and might perform
better than passing by reference). But passing by reference will be
useful for array and derived type reductions (e.g. to move allocation
inside of the init region).
Passing reductions by reference requires different LLVM-IR generation
when lowering from MLIR because some of the loads/stores/allocations
will now be moved inside of the init and combiner regions. This
alternate code generation is requested using a new attribute to
omp.wsloop and omp.parallel.
Existing lowerings from mlir are unaffected (these will continue to use
the by-value argument passing.
Flang will continue to pass by-value argument passing for trivial types
unless a (hidden) command line argument is supplied. Non-trivial types
will always use the by-ref lowering.
Array reductions are not ready yet (but are coming very soon). In the
meantime, this is tested by forcing existing reductions to use by-ref.
Commit series for by-ref OpenMP reductions 3/3
---------
Co-authored-by: Mats Petersson <mats.petersson@arm.com>
This started as an experiment to reduce the compilation time of
iterating over `Lower/OpenMP.cpp` a bit since it is too slow at the
moment. Trying to do that, I split the `DataSharingProcessor`,
`ReductionProcessor`, and `ClauseProcessor` into their own files and
extracted some shared code into a util file. All of these new `.h/.cpp`
files as well as `OpenMP.cpp` are now under a `Lower/OpenMP/` directory.
This resulted is a slightly better organization of the OpenMP lowering
code and hence opening this NFC.
As for the compilation time, this unfortunately does not affect it much
(it shaves off a few seconds of `OpenMP.cpp` compilation) since from
what I learned the bottleneck is in `DirectivesCommon.h` and
`PFTBuilder.h` which both consume a lot of time in template
instantiation it seems.