A compound construct with a list of clauses is broken up into individual
leaf/composite constructs. Each such construct has the list of clauses
that apply to it based on the OpenMP spec.
Each lowering function (i.e. a function that generates MLIR ops) is now
responsible for generating its body as described below.
Functions that receive AST nodes extract the construct, and the clauses
from the node. They then create a work queue consisting of individual
constructs, and invoke a common dispatch function to process (lower) the
queue.
The dispatch function examines the current position in the queue, and
invokes the appropriate lowering function. Each lowering function
receives the queue as well, and once it needs to generate its body, it
either invokes the dispatch function on the rest of the queue (if any),
or processes nested evaluations if the work queue is at the end.
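The queue-and-dispatch flow described above can be sketched roughly as follows. This is a toy model, not the actual Flang code: `QueueItem`, `ConstructQueue`, `Trace`, `genConstruct`, `genBody`, `dispatch` and `lowerConstruct` are hypothetical names, and instead of creating MLIR ops the functions only record what they would have lowered.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for one leaf/composite construct plus its clauses.
struct QueueItem {
  std::string construct;            // e.g. "target", "teams"
  std::vector<std::string> clauses; // clauses that apply to this construct
};

using ConstructQueue = std::vector<QueueItem>;

// Records what would be lowered, in order, instead of creating MLIR ops.
using Trace = std::vector<std::string>;

void dispatch(const ConstructQueue &queue, std::size_t pos, Trace &trace);

// Body generation: continue with the rest of the queue if there is any,
// otherwise process the nested evaluations (the code inside the construct).
void genBody(const ConstructQueue &queue, std::size_t pos, Trace &trace) {
  if (pos + 1 < queue.size())
    dispatch(queue, pos + 1, trace);
  else
    trace.push_back("nested evaluations");
}

// A lowering function: "emit" the op for the current item, then its body.
void genConstruct(const ConstructQueue &queue, std::size_t pos, Trace &trace) {
  const QueueItem &item = queue[pos];
  trace.push_back("op:" + item.construct + " (" +
                  std::to_string(item.clauses.size()) + " clauses)");
  genBody(queue, pos, trace);
}

// Dispatch: examine the current queue position and call the matching
// lowering function. A real dispatcher would switch on the directive kind;
// this toy version routes everything to one generic function.
void dispatch(const ConstructQueue &queue, std::size_t pos, Trace &trace) {
  assert(pos < queue.size() && "dispatch called past the end of the queue");
  genConstruct(queue, pos, trace);
}

// Entry point: what a function receiving an AST node would do after
// building the queue.
Trace lowerConstruct(const ConstructQueue &queue) {
  Trace trace;
  if (!queue.empty())
    dispatch(queue, 0, trace);
  return trace;
}
```

For a queue holding `target`, `teams` and `distribute`, the trace contains the three ops from outermost to innermost, followed by a single "nested evaluations" entry produced by the last item.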
Re-application of ca1bd5995f6ed934f9187305190a5abfac049173 with fixes for
compilation errors.
This patch is one in a series of four patches that seeks to refactor
slightly and extend the current record type map support that was
put in place for Fortran's descriptor types to handle explicit
member mapping for record types at a single level of depth.
For example, the below case where two members of a Fortran
derived type are mapped explicitly:
```
type :: scalar_and_array
  real(4) :: real
  integer(4) :: array(10)
  integer(4) :: int
end type scalar_and_array
type(scalar_and_array) :: scalar_arr
!$omp target map(tofrom: scalar_arr%int, scalar_arr%real)
```
Cases of derived type mapping left for future work are:
- explicit member mapping of nested members (e.g. two layers of
record types where we explicitly map a member from the internal
record type)
- Fortran's automatic mapping of all elements and nested elements
of a derived type
- explicit member mapping of a derived type and then its constituent members
(redundant in Fortran due to the former case, but still legal as far as I am aware)
- explicit member mapping of a record type (may be handled reasonably, just
not fully tested in this iteration)
- explicit member mapping for Fortran allocatable types (a variation of nested
record types)
This patch supports this by extending the Flang-new OpenMP lowering to
generate the newly required information: creating the necessary
parent <-> member map_info links, calculating the member indices, and
setting whether the map is a partial map.
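As a rough illustration of the member-index and partial-map bookkeeping, here is a toy sketch. It is not the Flang implementation: `MemberMapInfo` and `computeMemberMap` are hypothetical names, members are identified by plain strings, and "partial" is assumed here to mean that at least one member of the type is left unmapped.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical result of analysing one parent map.
struct MemberMapInfo {
  std::vector<std::size_t> memberIndices; // position of each mapped member
  bool partialMap = false;                // assumption: some member unmapped
};

// typeMembers:   member names of the derived type, in declaration order.
// mappedMembers: members named explicitly in map clauses (no duplicates).
MemberMapInfo computeMemberMap(const std::vector<std::string> &typeMembers,
                               const std::vector<std::string> &mappedMembers) {
  MemberMapInfo info;
  for (const std::string &name : mappedMembers) {
    auto it = std::find(typeMembers.begin(), typeMembers.end(), name);
    assert(it != typeMembers.end() && "mapped member is not part of the type");
    info.memberIndices.push_back(
        static_cast<std::size_t>(it - typeMembers.begin()));
  }
  // Assumption for this sketch: the map is partial when not every member
  // of the parent type was mapped explicitly.
  info.partialMap = info.memberIndices.size() < typeMembers.size();
  return info;
}
```

For a type with members `real`, `array`, `int` (in that order) and a map of `int` and `real`, this yields indices 2 and 0 and flags the map as partial because `array` is not mapped.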
The OMPDescriptorMapInfoGen pass has also been generalized into a map
finalization phase, now named OMPMapInfoFinalization. This pass was extended
to support the insertion of member maps into the BlockArg and MapOperands of
relevant map-carrying operations, similar to the way descriptor types
are expanded and their constituent members inserted.
Pull Request: https://github.com/llvm/llvm-project/pull/82853
Besides duplicating code, privatizing variables in every section
causes problems when synchronization barriers are used. This
happens because each section is executed by a given thread, which
will cause the program to hang if not all running threads execute
the barrier operation.
Fixes https://github.com/llvm/llvm-project/issues/72824
…ted. (#89998)" (#90250)
This partially reverts commit 7aedd7dc754c74a49fe84ed2640e269c25414087.
This change removes calls to the deprecated member functions. It does
not mark the functions deprecated yet and does not disable the
deprecation warning in TypeSwitch. This seems to cause problems with
MSVC.
This patch updates lowering from PFT to MLIR of workshare loops to
follow the loop wrapper approach. Unit tests impacted by this change are
also updated.
As the last patch of the stack, this should compile and pass unit tests.
This patch replaces some `saveInsertionPoint`, `restoreInsertionPoint`
call pairs with an `InsertionGuard` instance where it makes sense within
Flang OpenMP lowering, to make further modifications less error-prone.
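To show why a guard is less error-prone than a manual save/restore pair, here is a minimal sketch using a toy builder. `ToyBuilder` and `ToyInsertionGuard` are hypothetical stand-ins written for this example (the insertion point is just an integer); they only mimic the save/restore idea and are not the MLIR classes.

```cpp
#include <cassert>

// Toy builder: the "insertion point" is just an integer position.
class ToyBuilder {
public:
  int saveInsertionPoint() const { return point; }
  void restoreInsertionPoint(int saved) { point = saved; }
  void setInsertionPoint(int p) {
    assert(p >= 0 && "positions are non-negative in this toy model");
    point = p;
  }
  int getInsertionPoint() const { return point; }

private:
  int point = 0;
};

// RAII guard: saves the insertion point on construction and restores it
// on destruction, whichever way the enclosing scope is left.
class ToyInsertionGuard {
public:
  explicit ToyInsertionGuard(ToyBuilder &b)
      : builder(b), saved(b.saveInsertionPoint()) {}
  ~ToyInsertionGuard() { builder.restoreInsertionPoint(saved); }
  ToyInsertionGuard(const ToyInsertionGuard &) = delete;
  ToyInsertionGuard &operator=(const ToyInsertionGuard &) = delete;

private:
  ToyBuilder &builder;
  int saved;
};

// Manual pairing: the early return skips the restore call.
bool emitManually(ToyBuilder &builder, bool bailOut) {
  int saved = builder.saveInsertionPoint();
  builder.setInsertionPoint(42);
  if (bailOut)
    return false; // bug: the insertion point is left at 42
  builder.restoreInsertionPoint(saved);
  return true;
}

// Guarded version: every exit path restores the insertion point.
bool emitGuarded(ToyBuilder &builder, bool bailOut) {
  ToyInsertionGuard guard(builder);
  builder.setInsertionPoint(42);
  if (bailOut)
    return false;
  return true;
}
```

With the manual version, adding an early return later silently leaks the temporary insertion point; with the guard, the restore is tied to scope exit.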
This patch simplifies the lowering from PFT to MLIR of OpenMP compound
constructs (i.e. combined and composite).
The new approach consists of iteratively processing the outermost leaf
construct of the given combined construct until it cannot be split
further. Both leaf constructs and composite ones have `gen...()`
functions that are called when appropriate.
This approach enables treating a leaf construct the same way regardless
of whether it appeared as part of a combined construct, and it also enables
the lowering of composite constructs as a single unit.
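The iterative splitting can be pictured with the following toy sketch. It is an illustration under assumptions, not the patch's code: `Leaves`, `isComposite`, `join` and `lowerCombined` are hypothetical names, the composite table is illustrative rather than exhaustive, and the returned strings merely stand for the `gen...()` calls that would be made.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

using Leaves = std::vector<std::string>;

// Illustrative (not exhaustive) table of composite constructs, which are
// lowered as a single unit instead of being split further.
bool isComposite(const Leaves &leaves) {
  static const std::set<Leaves> composites = {
      Leaves{"do", "simd"},
      Leaves{"distribute", "simd"},
      Leaves{"distribute", "parallel", "do"},
      Leaves{"distribute", "parallel", "do", "simd"}};
  return composites.count(leaves) != 0;
}

// Joins leaf names with spaces, e.g. {"do", "simd"} -> "do simd".
std::string join(const Leaves &leaves) {
  std::string result;
  for (std::size_t i = 0; i < leaves.size(); ++i) {
    if (i != 0)
      result += " ";
    result += leaves[i];
  }
  return result;
}

// Returns the sequence of gen...() calls for a combined construct.
std::vector<std::string> lowerCombined(Leaves remaining) {
  assert(!remaining.empty() && "expected at least one leaf construct");
  std::vector<std::string> calls;
  while (!remaining.empty()) {
    if (remaining.size() == 1 || isComposite(remaining)) {
      // Cannot be split further: lower what is left as one unit.
      calls.push_back("gen(" + join(remaining) + ")");
      break;
    }
    // Peel off and lower the outermost leaf construct.
    calls.push_back("gen(" + remaining.front() + ")");
    remaining.erase(remaining.begin());
  }
  return calls;
}
```

For `target teams distribute parallel do`, this produces `gen(target)`, `gen(teams)` and then `gen(distribute parallel do)` for the composite remainder.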
Previous corner cases are now handled in a more straightforward way and
comments pointing to the relevant spec section are added. Directive sets
are also completed with missing LOOP related constructs.
This patch updates the definition of `omp.simdloop` to enforce the
restrictions of a wrapper operation. It has been renamed to `omp.simd`,
to better reflect the naming used in the spec. All uses of "simdloop" in
function names have been updated accordingly.
Some changes to Flang lowering and OpenMP to LLVM IR translation are
introduced to prevent the introduction of compilation/test failures. The
eventual long term solution might be different.
This patch performs several cleanups with the main purpose of
normalizing the code patterns used to trigger codegen for MLIR OpenMP
operations and making the processing of clauses and constructs
independent. The following changes are made:
- Clean up unused `directive` argument to
`ClauseProcessor::processMap()`.
- Move general helper functions in OpenMP.cpp to the appropriate section
of the file.
- Create `gen<OpName>Clauses()` functions containing the clause
processing code specific for the associated OpenMP construct.
- Update `gen<OpName>Op()` functions to call the corresponding
`gen<OpName>Clauses()` function.
- Sort calls to `ClauseProcessor::process<ClauseName>()` alphabetically,
to avoid inadvertently relying on some arbitrary order. Update some
tests that broke due to the order change.
- Normalize `genOMP()` functions so they all delegate the generation of
MLIR to `gen<OpName>Op()` functions following the same pattern.
- Only process `nowait` clause on `TARGET` constructs if not compiling
for the target device.
A later patch can move the calls to `gen<OpName>Clauses()` out of
`gen<OpName>Op()` functions and pass completed clause structures
instead, in preparation for supporting composite constructs. That will
make it possible to reuse clause processing for a given leaf construct
when appearing alone or in a combined or composite construct, while
controlling where the associated code is produced.
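The `gen<OpName>Clauses()` / `gen<OpName>Op()` split can be sketched as below for a `parallel` construct. This is a simplified model, not the real code: `ClauseList`, `ParallelClauseOps`, `ToyClauseProcessor` and `ParallelOp` are hypothetical types, and clause arguments are reduced to plain integers.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical clause list: clause name -> already evaluated argument.
using ClauseList = std::map<std::string, int>;

// Hypothetical operand structure for a parallel construct.
struct ParallelClauseOps {
  std::optional<int> ifExpr;
  std::optional<int> numThreads;
};

// Toy clause processor with one process<ClauseName>() member per clause.
class ToyClauseProcessor {
public:
  explicit ToyClauseProcessor(const ClauseList &list) : clauses(list) {}

  bool processIf(ParallelClauseOps &ops) const {
    return find("if", ops.ifExpr);
  }
  bool processNumThreads(ParallelClauseOps &ops) const {
    return find("num_threads", ops.numThreads);
  }

private:
  bool find(const std::string &name, std::optional<int> &out) const {
    auto it = clauses.find(name);
    if (it == clauses.end())
      return false;
    out = it->second;
    return true;
  }
  const ClauseList &clauses;
};

// All clause processing for the construct lives here; the calls are kept
// in alphabetical order so nothing depends on an arbitrary ordering.
ParallelClauseOps genParallelClauses(const ClauseList &clauses) {
  ToyClauseProcessor cp(clauses);
  ParallelClauseOps ops;
  cp.processIf(ops);
  cp.processNumThreads(ops);
  return ops;
}

// Stand-in for the created operation.
struct ParallelOp {
  ParallelClauseOps operands;
};

// Op generation delegates clause handling to the function above.
ParallelOp genParallelOp(const ClauseList &clauses) {
  return ParallelOp{genParallelClauses(clauses)};
}
```

Moving the `genParallelClauses()` call out of `genParallelOp()` and passing the finished `ParallelClauseOps` in would correspond to the follow-up step mentioned above.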
This patch updates Flang lowering to use the new set of OpenMP clause
operand structures and their groupings into directive-specific sets of
clause operands.
It simplifies the passing of information from the clause processor and
the creation of operations.
The `DataSharingProcessor` is slightly modified to not hold delayed
privatization state. Instead, optional arguments are added to
`processStep1` which are only passed when delayed privatization is used.
This enables using the clause operand structure for `private` and
removes the need for the ad-hoc `DelayedPrivatizationInfo` structure.
The processing of the `schedule` clause is updated to process the
`chunk` modifier rather than requiring two separate calls to the
`ClauseProcessor`.
Lowering of a block-associated `ordered` construct is updated to emit a
TODO error if the `simd` clause is specified, since it is not currently
supported by the `ClauseProcessor` or later compilation stages.
Removed processing of `schedule` from `omp.simdloop`, as it doesn't
apply to `simd` constructs.
Added lowering support for the IS_DEVICE_PTR and HAS_DEVICE_ADDR clauses of
the OMP TARGET directive, and added related tests for these changes.
The IS_DEVICE_PTR and HAS_DEVICE_ADDR clauses apply to the OMP TARGET
directive. The OpenMP spec states:
"The **is_device_ptr** clause indicates that its list items are device
pointers."
"The **has_device_addr** clause indicates that its list items already
have device addresses and therefore they may be directly accessed from a
target device."
The USE_DEVICE_PTR and USE_DEVICE_ADDR clauses, in contrast, apply to the
OMP TARGET DATA directive. The OpenMP spec states for them:
"Each list item in the **use_device_ptr** clause results in a new list
item that is a device pointer that refers to a device address"
"Each list item in a **use_device_addr** clause that is present in the
device data environment is treated as if it is implicitly mapped by a
map clause on the construct with a map-type of alloc"
Fixed build error caused by Squash merge which needs rebase
There were several functions, mostly reduction-related, that were only
called from OpenMP.cpp. Remove them from OpenMP.h, and make them local
in OpenMP.cpp:
- genOpenMPReduction
- findReductionChain
- getConvertFromReductionOp
- updateReduction
- removeStoreOp
Also, move the function bodies out of the "public" section.
The clause templates defined in ClauseT.h were originally based on
flang's parse tree nodes. Since those representations are going to be
reused for clang (together with the clause splitting code), it makes
sense to separate them from flang, and instead have them based on the
actual OpenMP spec (v5.2).
The member names in the templates follow the naming presented in the
spec, and the representation (e.g. members) is derived from the clause
definitions as described in the spec.
Since the representations of some clauses have changed (while preserving
the information), the current code using the clauses (especially the
code converting parser::OmpClause to omp::Clause) needs to be adjusted.
This patch does not make any functional changes.
Put all of the genOMP functions together, organize them in two groups:
for declarative constructs and for other (executable) constructs.
Replace visit functions for OpenMPDeclarativeConstruct and
OpenMPConstruct from listing individual visitors for each variant
alternative to using a single generic visitor. Essentially, going from
```
std::visit(
  [](foo x) { genOMP(x); },
  [](bar x) { TODO },
  [](baz x) { genOMP(x); }
)
```
to
```
void genOMP(bar x) { // Separate visitor for an unhandled case
TODO
}
[...]
std::visit([&](auto &&s) { genOMP(s); }) // generic
```
This doesn't change any functionality, just reorganizes the functions a
bit. The intent here is to improve the readability of this file.
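A compilable version of the generic-visitor pattern might look like the sketch below. `Foo`, `Bar`, `Baz`, `Construct` and `lower` are placeholder names, and the `genOMP` overloads return strings here only so the result can be inspected.

```cpp
#include <cassert>
#include <string>
#include <variant>

// Placeholder alternatives standing in for the parse-tree variants.
struct Foo {};
struct Bar {};
struct Baz {};
using Construct = std::variant<Foo, Bar, Baz>;

// One overload per alternative. The unhandled case gets its own overload
// instead of a dedicated lambda inside the visit call.
std::string genOMP(const Foo &) { return "lowered foo"; }
std::string genOMP(const Bar &) { return "TODO: bar"; }
std::string genOMP(const Baz &) { return "lowered baz"; }

// A single generic visitor; overload resolution selects the right genOMP.
std::string lower(const Construct &construct) {
  return std::visit([](const auto &s) { return genOMP(s); }, construct);
}
```

Adding a new alternative then only requires a new `genOMP` overload; the visit call itself stays unchanged.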
This patch contains slight modifications to the reverted PR #85258 to
avoid issues with constructs containing multiple reduction clauses,
uncovered by a test on the gfortran testsuite.
This reverts commit 9f80444c2e669237a5c92013f1a42b91b5609012.
The related functions are `gatherDataOperandAddrAndBounds` and
`genBoundsOps`. The former is used in OpenACC as well, and it was
updated to pass evaluate::Expr instead of parser objects.
The difference in the test case comes from unfolded conversions of index
expressions, which are explicitly of type integer(kind=8).
Delete now unused `findRepeatableClause2` and `findClause2`.
Add `AsGenericExpr` that takes std::optional. It already returns
optional Expr. Making it accept an optional Expr as input would reduce
the number of necessary checks when handling frequent optional values in
evaluator.
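The idea of an overload that accepts an optional input can be sketched as follows. The types here (`GenericExpr`, `IntExpr`) and the function name `toyAsGenericExpr` are toy stand-ins chosen for this example and do not reflect the evaluator's real classes.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <utility>

// Toy stand-ins for evaluator expression types.
struct GenericExpr {
  std::string text;
};
struct IntExpr {
  int value;
};

// Existing-style conversion: takes a value, returns an optional result.
std::optional<GenericExpr> toyAsGenericExpr(IntExpr expr) {
  return GenericExpr{std::to_string(expr.value)};
}

// Added-style overload: accepts an optional input and propagates "empty",
// so callers do not need to test the optional before converting.
template <typename T>
std::optional<GenericExpr> toyAsGenericExpr(std::optional<T> expr) {
  if (!expr)
    return std::nullopt;
  return toyAsGenericExpr(std::move(*expr));
}
```

A caller holding a `std::optional<IntExpr>` can then pass it straight through and check only the final result.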
[Clause representation 4/6]
This patch moves some code in PFT to MLIR OpenMP lowering to the
`ClauseProcessor` class. This keeps behavior related to certain clauses
within the `ClauseProcessor`, so that callers are no longer responsible
for repeating it whenever the clause is present.
In this patch some uses of `llvm::SmallVector` in Flang's lowering to
MLIR are replaced by other types (i.e. `llvm::ArrayRef` and
`llvm::SmallVectorImpl`) which are intended for these uses. This
generally avoids relying on callers always passing small vectors with a
particular number of elements on the stack.
…essor
Rename `findRepeatableClause` to `findRepeatableClause2`, and make the
new `findRepeatableClause` operate on new `omp::Clause` objects.
Leave `Map` unchanged, because it will require more changes for it to
work.
[Clause representation 3/6]
This effectively implements some now deprecated OpenMP functionality
that some applications (most notably at the moment GenASiS)
unfortunately depend on (deprecated in specification version 5.2):
"If a list item in a use_device_ptr clause is not of type C_PTR, the
behavior is as if the list item appeared in a use_device_addr clause.
Support for such list items in a use_device_ptr clause is deprecated."
This PR downgrades the hard error to a deprecation warning and "promotes"
the above cases by moving the offending operands from the
use_device_ptr value list to the back of the use_device_addr list
(moving the related symbols, locs and types that form the BlockArgs
correspondingly). The generation of the target data construct then
proceeds as normal.
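The promotion step can be illustrated with the sketch below. It is a simplification under assumptions: `ListItem` bundles the symbol, type and location that the real code keeps in separate lists, types are plain strings, and `promoteNonCPtrItems` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical bundle of the data tracked per list item.
struct ListItem {
  std::string symbol;
  std::string type; // "C_PTR" or anything else
  int loc;          // stand-in for a source location
};

// Moves every item that is not of type C_PTR from the use_device_ptr list
// to the back of the use_device_addr list, preserving relative order.
// Returns the number of promoted items (one warning each, in a compiler).
std::size_t promoteNonCPtrItems(std::vector<ListItem> &useDevicePtr,
                                std::vector<ListItem> &useDeviceAddr) {
  auto firstPromoted = std::stable_partition(
      useDevicePtr.begin(), useDevicePtr.end(),
      [](const ListItem &item) { return item.type == "C_PTR"; });
  std::size_t count = static_cast<std::size_t>(
      std::distance(firstPromoted, useDevicePtr.end()));
  useDeviceAddr.insert(useDeviceAddr.end(),
                       std::make_move_iterator(firstPromoted),
                       std::make_move_iterator(useDevicePtr.end()));
  useDevicePtr.erase(firstPromoted, useDevicePtr.end());
  return count;
}
```

Keeping symbol, type and location in one struct avoids having to move several parallel lists in lockstep, which is what the description above has to do for the BlockArgs.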
Previously reduction variables were always passed by value into and out
of the initialization and combiner regions of the OpenMP reduction
declare operation.
This worked well for reductions of primitive types (and might perform
better than passing by reference). But passing by reference will be
useful for array and derived type reductions (e.g. to move allocation
inside of the init region).
Passing reductions by reference requires different LLVM-IR generation
when lowering from MLIR because some of the loads/stores/allocations
will now be moved inside of the init and combiner regions. This
alternate code generation is requested using a new attribute to
omp.wsloop and omp.parallel.
Existing lowerings from MLIR are unaffected (these will continue to use
by-value argument passing).
Flang will continue to use by-value argument passing for trivial types
unless a (hidden) command line argument is supplied. Non-trivial types
will always use the by-ref lowering.
Array reductions are not ready yet (but are coming very soon). In the
meantime, this is tested by forcing existing reductions to use by-ref.
Commit series for by-ref OpenMP reductions 3/3
---------
Co-authored-by: Mats Petersson <mats.petersson@arm.com>
As per the OpenMP standard, "If a variable appears in a link clause on a
declare target directive that does not have a device_type clause with
the nohost device-type-description then it is treated as if it had
appeared in a map clause with a map-type of tofrom" is an implicit
mapping rule. Before this change, such variables were mapped as to by
default.
This patch seeks to create a process that happens on module finalization
for OpenMP, in which a list of operations that had declare target
directives applied to them and were not generated at the time of
processing the original declare target directive are re-checked to apply
the appropriate declare target semantics.
This works by maintaining a vector of declare target related data inside
of the FIR converter, in this case the symbol and the two relevant
unsigned integers representing the enumerators. This vector is added to
via a new function called from Bridge.cpp, insertDeferredDeclareTargets,
which runs prior to the processing of the directive (similarly to
getDeclareTargetFunctionDevice currently for requires). It checks whether
the Operation the declare target directive is applied to currently
exists, and if it doesn't, appends to the vector. This is a
separate function from the processing of the declare target via the
overloaded genOMP, as we unfortunately do not have access to the list
without passing it through every call: the AbstractConverter we pass
will not allow access to it (I've seen no other cases of casting it to a
FirConverter, so I opted to not do that).
The list is then processed at the end of the module in the
finalizeOpenMPLowering function in Bridge by calling a new function,
markDelayedDeclareTargetFunctions, which marks the latently generated
operations. In certain cases, some still will not be generated, e.g. if
an interface is defined and marked as declare target but has no definition
or usage in the module, then it will not be emitted to the module. Due
to these cases we must silently ignore an operation that has not been
found via its symbol.
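The deferral mechanism can be sketched with a toy module as follows. The names `ToyOp`, `ToyModule`, `DeferredDeclareTarget`, `recordIfMissing` and `markDeferred` are hypothetical and only model the record-then-finalize flow; they are not the functions added by the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Toy operation carrying declare target information.
struct ToyOp {
  bool declareTarget = false;
  unsigned captureClause = 0;
  unsigned deviceType = 0;
};

// Toy module: symbol name -> operation.
using ToyModule = std::map<std::string, ToyOp>;

// Data remembered for a directive whose operation does not exist yet:
// the symbol plus two unsigned values representing the enumerators.
struct DeferredDeclareTarget {
  std::string symbol;
  unsigned captureClause;
  unsigned deviceType;
};

// Called when the directive is seen: defer only if the op is missing.
void recordIfMissing(const ToyModule &mod,
                     std::vector<DeferredDeclareTarget> &deferred,
                     const std::string &symbol, unsigned captureClause,
                     unsigned deviceType) {
  if (mod.count(symbol) == 0)
    deferred.push_back({symbol, captureClause, deviceType});
}

// Called at module finalization: mark ops that were generated later and
// silently skip symbols that never produced an operation.
std::size_t markDeferred(ToyModule &mod,
                         const std::vector<DeferredDeclareTarget> &deferred) {
  std::size_t marked = 0;
  for (const DeferredDeclareTarget &entry : deferred) {
    auto it = mod.find(entry.symbol);
    if (it == mod.end())
      continue; // e.g. an interface with no definition or use
    it->second.declareTarget = true;
    it->second.captureClause = entry.captureClause;
    it->second.deviceType = entry.deviceType;
    ++marked;
  }
  return marked;
}
```

The finalization step deliberately does not report missing symbols, matching the requirement that operations which were never emitted are ignored.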
The main use case for this (although I imagine there are others) is
processing interfaces that have been declared in a module with a declare
target directive but do not have their implementation defined in the
same module, for example, inside of a separate C++ module that will be
linked in. In cases where the interface is called inside of a target
region, it'll be marked as used on device appropriately (although,
realistically, a user should explicitly mark it to match the
corresponding definition). However, in cases where it's used in a
less obvious manner, through something like a function pointer passed to an
external call, we require this explicit marking, which this patch adds
support for (without it, this case currently causes the compiler to crash).
This patch also adds documentation on the declare target process and
mechanisms within the compiler currently.
Add the [[maybe_unused]] attribute to a variable in
lib/Lower/OpenMP/OpenMP.cpp to avoid a (possibly bogus) unused variable
warning when building with GCC 9.3.0.
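For reference, a typical use of the attribute looks like the sketch below. The scenario shown (a variable read only inside `assert`) is an assumed, common example and not necessarily the situation in OpenMP.cpp; `checkedHalf` is a hypothetical function.

```cpp
#include <cassert>

int checkedHalf(int value) {
  // Only read inside assert(). When NDEBUG is defined the assert expands
  // to nothing, and some compilers then warn that the variable is unused.
  // The attribute suppresses that warning without changing behavior.
  [[maybe_unused]] bool isEven = (value % 2 == 0);
  assert(isEven && "expected an even value");
  return value / 2;
}
```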
Adds basic support for emitting delayed privatizers from flang. So far,
only types of symbols are supported (i.e. scalars); support for more
complicated types will be added later. This also makes sure that
reduction and delayed privatization work properly together by merging
the body-gen callbacks for both in case both clauses are present on the
parallel construct.
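Merging two body-gen callbacks can be sketched as plain function composition. The callback signature, the name `mergeBodyGenCallbacks` and the order in which the two callbacks run are assumptions made for this example.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical callback type: appends the work it performs to a log
// instead of populating an MLIR region.
using BodyGenCallback = std::function<void(std::vector<std::string> &)>;

// Combines two callbacks into one. If only one is set, it is returned
// unchanged; if both are set, they run one after the other.
BodyGenCallback mergeBodyGenCallbacks(BodyGenCallback first,
                                      BodyGenCallback second) {
  if (!first)
    return second;
  if (!second)
    return first;
  return [first = std::move(first),
          second = std::move(second)](std::vector<std::string> &log) {
    first(log);
    second(log);
  };
}
```

The operation creation code then only ever deals with a single callback, whether one or both clauses are present.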
Add initial handling of OpenMP copyprivate clause in Flang.
When lowering copyprivate, Flang generates the copy function
needed by each variable and builds the appropriate
omp.single's CopyPrivateVarList.
This is patch 3 of 4, to add support for COPYPRIVATE in Flang.
Original PR: https://github.com/llvm/llvm-project/pull/73128
This patch adds support in Flang for the depend clause in target and
target enter/update/exit constructs. Previously, the following line in a
Fortran program would have resulted in the error shown below it.
!$omp target map(to:a) depend(in:a)
"not yet implemented: Unhandled clause DEPEND in TARGET construct"
This started as an experiment to reduce the compilation time of
iterating over `Lower/OpenMP.cpp` a bit since it is too slow at the
moment. Trying to do that, I split the `DataSharingProcessor`,
`ReductionProcessor`, and `ClauseProcessor` into their own files and
extracted some shared code into a util file. All of these new `.h/.cpp`
files as well as `OpenMP.cpp` are now under a `Lower/OpenMP/` directory.
This resulted in a slightly better organization of the OpenMP lowering
code, hence this NFC patch.
As for the compilation time, this unfortunately does not affect it much
(it shaves off a few seconds of `OpenMP.cpp` compilation) since from
what I learned the bottleneck is in `DirectivesCommon.h` and
`PFTBuilder.h` which both consume a lot of time in template
instantiation it seems.