Add Automap modifier to the MLIR op definition for the DeclareTarget
directive's Enter clause. Also add lowering support in Flang.
Automap Ref: OpenMP 6.0 section 7.9.7.
See #150178
This may regress some test cases which only ever passed by accident.
I've tested SPEC2017 and a sample of applications to check that this
doesn't break anything too obvious. Presumably this was not a widely
used feature or we would have noticed the bug sooner.
I'm unsure whether this should be backported to LLVM 21 or not: I think
it is much better to refuse to compile than to silently produce the
wrong result, but there is a chance this could regress something which
previously worked by accident. Opinions welcome.
Fixes#149089 and #149700.
Before #145837, when processing a reduction symbol not yet supported by
OpenMP lowering, the reduction processor would simply skip filling in
the reduction symbols and variables. With #145837, this behvaior was
slightly changed because the reduction symbols are populated before
invoking the reduction processor (this is more convenient to shared the
code with `do concurrent`).
This PR restores the previous behavior.
OpenMP 6.0 has changed the modifiers on the MAP clause. Previous patch
has introduced parsing support for them. This patch introduces
processing of the new forms in semantic checks and in lowering. This
only applies to existing modifiers, which were updated in the 6.0 spec.
Any of the newly introduced modifiers (SELF and REF) are ignored.
This PR proposes re-modelling `reduce` specifiers to match OpenMP and
OpenACC. In particular, this PR includes the following:
* A new `fir` op: `fir.delcare_reduction` which is identical to OpenMP's
`omp.declare_reduction` op.
* Updating the `reduce` clause on `fir.do_concurrent.loop` to use the
new op.
* Re-uses the `ReductionProcessor` component to emit reductions for `do
concurrent` just like we do for OpenMP. To do this, the
`ReductionProcessor` had to be refactored to be more generalized.
* Upates mapping `do concurrent` to `fir.loop ... unordered` nests using
the new reduction model.
Unfortunately, this is a big PR that would be difficult to divide up in
smaller parts because the bottom of the changes are the `fir` table-gen
changes to `do concurrent`. However, doing these MLIR changes cascades
to the other parts that have to be modified to not break things.
This PR goes in the same direction we went for `private/local`
speicifiers. Now the `do concurrent` and OpenMP (and OpenACC) dialects
are modelled in essentially the same way which makes mapping between
them more trivial, hopefully.
PR stack:
- https://github.com/llvm/llvm-project/pull/145837 (this one)
- https://github.com/llvm/llvm-project/pull/146025
- https://github.com/llvm/llvm-project/pull/146028
- https://github.com/llvm/llvm-project/pull/146033
In OpenMP 5.2, the `target enter data` and `target exit data` constructs
now have default map types if the user does not define them in the Map
clause. For `target enter data`, this is `to` and `target exit data`
this is `from`. This behaviour is now enabled when OpenMP 5.2 or greater
is used when compiling. To enable this, the default value is now set in
the `processMap` clause, with any previous behaviour being maintained
for either older versions of OpenMP or other directives.
See also #110008
Implement the lowering for passing a fir.boxchar argument to the
copyprivate clause.
Resolves https://github.com/llvm/llvm-project/issues/142123.
---------
Signed-off-by: Kajetan Puchalski <kajetan.puchalski@arm.com>
Fixes#142404
The parser can't tell the difference between array indexing and a
substring: that has to be done in semantics once we have types.
Substrings can only be in the form string([lower]:[higher]) not
string(index) or string(lower:higher:step). I added semantic checks to
catch this for the DEPEND clause.
This patch also adds lowering for correct substrings and for complex
part references.
Even though the spec (version 5.2) prohibits strcuture components from
being specified in `depend` clauses, this restriction is not sensible.
This PR rectifies the issue by lifting that restriction and allowing
structure components in `depend` clauses (which is allowed by OpenMP
6.0).
The current semantic check in place is incorrect, this patch fixes this.
Up to 1 **'default'** named mapper should be allowed for each derived
type.
The current semantic check only allows up to 1 **'default'** named
mapper across all derived types.
This also makes sure that declare mappers follow proper scoping rules
for both default and named mappers.
Co-authored-by: Raghu Maddhipatla <Raghu.Maddhipatla@amd.com>
This patch updates MLIR op verifiers for operations taking arguments
that must always be defined by an `omp.map.info` operation to check this
requirement.
It also modifies Flang lowering for `use_device_{addr, ptr}`, as well as
the custom MLIR printer and parser for these clauses, to support
initializing it to `OMP_MAP_RETURN_PARAM` and represent this in the MLIR
representation as `return_param`. This internal mapping flag is what
eventually is used for variables passed via these clauses into the
target region when translating to LLVM IR, so making it explicit in
Flang and MLIR removes an inconsistency in the current representation.
This aims to implement most of the initial arguments for defaultmap
aside from firstprivate and none, and some of the more recent OpenMP 6
additions which will come in subsequent updates (with the OpenMP 6
variants needing parsing/semantic support first).
This patch,
- Added support for lowering of task_reduction to MLIR
- Added support for lowering of in_reduction to MLIR
- Fixed incorrect DSA handling for variables in the presence of 'in_reduction' clause.
The OpenMP runtime needs the base address of the array section to
identify the dependency.
If we just put the vector subscript through the usual HLFIR expression
lowering, that would generate a new contiguous array representing the
values of the elements in the array which was sectioned. We cannot use
addresses from this array because these addresses would not match
dependencies on the original array. For example
```
integer :: array(1024)
integer :: indices(2)
indices(1) = 1
indices(2) = 100
!$omp task depend(out: array(1:512))
!$omp end task
!$omp task depend(in: array(indices))
!$omp end task
```
This requires taking the lowering path previously only used for ordered
assignments to get the address of the elements in the original array
which were indexed. This is done using `hlfir.elemental_addr`. e.g.
```
array(indices) = 2
```
`hlfir.elemental_addr` is awkward to use because it (by design) doesn't
return something like `!hlfir.expr<>` (like `hlfir.elemental`) and so it
can't have a generic lowering: each place it is used has to carefully
inline the contents of the operation and extract the needed address.
For this reason, `hlfir.elemental_addr` is not allowed outside of these
ordered assignments. In this commit I ignore this restriction so that I
can use `hlfir.elemental_addr` to lower the OpenMP DEPEND clause (this
works because the operation is inlined and removed before the verifier
runs).
One alternative solution would have been to provide my own more limited
re-implementation of `HlfirDesignatorBuilder` which skipped
`hlfir::elemental_addr`, instead inlining its body directly at the
current insertion point applying indices only for the first element.
This would have been difficult to maintain because designation in
Fortran is complex.
The OpenMP standard says that all dependencies in the same set of
inter-dependent tasks must be non-overlapping. This simplification means
that the OpenMP only needs to keep track of the base addresses of
dependency variables. This can be seen in kmp_taskdeps.cpp, which stores
task dependency information in a hash table, using the base address as a
key.
This patch generates a rebox operation to slice boxed arrays, but only
the box data address is used for the task dependency. The extra box is
optimized away by LLVM at O3.
Vector subscripts are TODO (I will address in my next patch).
This also fixes a bug for ordinary subscripts when the symbol was mapped
to a box:
Fixes#132647
This patch simplifies the definition of
`ClauseProcessor::processMapObjects` by hoisting the creation of the
MLIR symbol associated to an existing `omp.declare_mapper` operation
outside of the loop processing all mapped objects.
That change removes some inter-iteration dependencies that made the
implementation more difficult to follow.
The HAS_DEVICE_ADDR indicates that the object(s) listed exists at an
address that is a valid device address. Specifically,
`has_device_addr(x)` means that (in C/C++ terms) `&x` is a device
address.
When entering a target region, `x` does not need to be allocated on the
device, or have its contents copied over (in the absence of additional
mapping clauses). Passing its address verbatim to the region for use is
sufficient, and is the intended goal of the clause.
Some Fortran objects use descriptors in their in-memory representation.
If `x` had a descriptor, both the descriptor and the contents of `x`
would be located in the device memory. However, the descriptors are
managed by the compiler, and can be regenerated at various points as
needed. The address of the effective descriptor may change, hence it's
not safe to pass the address of the descriptor to the target region.
Instead, the descriptor itself is always copied, but for objects like
`x`, no further mapping takes place (as this keeps the storage pointer
in the descriptor unchanged).
---------
Co-authored-by: Sergio Afonso <safonsof@amd.com>
This PR adds an initial implementation for the map modifiers close,
present and ompx_hold, primarily just required adding the appropriate
map type flags to the map type bits. In the case of ompx_hold it
required adding the map type to the OpenMP dialect. Close has a bit of a
problem when utilised with the ALWAYS map type on descriptors, so it is
likely we'll have to make sure close and always are not applied to the
descriptor simultaneously in the future when we apply always to the
descriptors to facilitate movement of descriptor information to device
for consistency, however, we may find an alternative to this with
further investigation. For the moment, it is a TODO/Note to keep track
of it.
This PR tries to fix `lastprivate` update issues in composite
constructs. In particular, pre-determined `lastprivate` symbols are
attached to the wrong leaf of the composite construct (the outermost
one). When using delayed privatization (should be the default mode in
the future), this results in trying to update the `lastprivate` symbol
in the wrong construct (outside the `omp.loop_nest` op).
For example, given the following input:
```fortran
!$omp target teams distribute parallel do simd collapse(2) private(y_max)
do i=x_min,x_max
do j=y_min,y_max
enddo
enddo
```
Without the fixes introduced in this PR, the `DataSharingProcessor`
tries to generate the `lastprivate` update ops in the `parallel` op
since this is the op for which the DSP instance is created.
The fix consists of 2 main parts:
1. Instead of creating a single DSP instance, one instance is created
for the leaf constructs that might need privatization (whether for
explicit, implicit, or pre-determined symbols).
2. When generating the `lastprivate` comparison ops, we don't directly
use the SSA values of the UBs and steps. Instead, we regenerated these
SSA values from the original loop bounds' expressions. We have to do
this to avoid using `host_eval` values in the `lastprivate` comparison
logic which is illegal.
Add Lowering support for OpenMP mapper field in mapInfoOp.
NOTE: This patch only supports explicit mapper lowering. I'll add a
separate PR soon which handles implicit default mapper recognition.
Depends on #120994.
Move OpenACC/OpenMP helpers from Lower/DirectivesCommon.h that are also
used in OpenACC and OpenMP mlir passes into a new
Optimizer/Builder/DirectivesCommon.h so that parser and evaluate headers
are not included in Optimizer libraries (this both introduce
compile-time and link-time pointless overheads).
This should fix https://github.com/llvm/llvm-project/issues/123377
This patch,
- Added a translation support for aligned clause in SIMD directive by passing the alignment details to "llvm.assume" intrinsic.
- Updated the insertion point for llvm.assume intrinsic call in "OMPIRBuilder.cpp".
- Added a check in aligned clause MLIR lowering, to ensure that the alignment value must be a power of 2.
Implementatoin details:
Both Mutexinoutset and Inoutset is recognized as flag=0x4
and 0x8 respectively, the flags is set to `kmp_depend_info` and
passed as argument to `__kmpc_omp_task_with_deps` runtime call
Again, this simplifies the semantic checks and lowering quite a bit.
Update the check for positive alignment to use a more informative
message, and to highlight the modifier itsef, not the whole clause.
Remove the checks for the allocator expression itself being positive:
there is nothing in the spec that says that it should be positive.
Remove the "simple" modifier from the AllocateT template, since both
simple and complex modifiers are the same thing, only differing in
syntax.
When compiling WORKSHARE construct in different compilation units, a
linker error happened, when two equal WORKSHARE constructs with a copy
operation have been compiled:
```
/usr/bin/ld: module2.o: in function `_workshare_copy_f64':
FIRModule:(.text+0x0): multiple definition of `_workshare_copy_f64'; module1.o:FIRModule:(.text+0x0): first defined here
```
Reason is that the generate copy function has the wrong linkage:
```
0000000000000000 T _workshare_copy_f64
```
while it should be
```
0000000000000000 t _workshare_copy_f64
```
This removes the specialized parsers and helper classes for these
clauses, namely ConcatSeparated, MapModifiers, and MotionModifiers. Map
and the motion clauses are now handled in the same way as all other
clauses with modifiers, with one exception: the commas separating their
modifiers are optional. This syntax is deprecated in OpenMP 5.2.
Implement version checks for modifiers: for a given modifier on a given
clause, check if that modifier is allowed on this clause in the
specified OpenMP version. This replaced several individual checks.
Add a testcase for handling map modifiers in a different order, and for
diagnosing an ultimate modifier out of position.
It also modifies the error message to specify it is the dependence-type
that is not supported.
Resolves the crash in
https://github.com/llvm/llvm-project/issues/115647. A fix can come in
later as part of future OpenMP version support.
Extends MLIR lowering support for the `loop` directive by adding
lowering support for the `bind` clause.
Parent PR: https://github.com/llvm/llvm-project/pull/114199, only the
latest commit is relevant to this PR.
This PR is one of 3 in a PR stack, this is the primary change set which
seeks to extend the current derived type explicit member mapping support
to handle descriptor member mapping at arbitrary levels of nesting. The
PR stack seems to do this reasonably (from testing so far) but as you
can create quite complex mappings with derived types (in particular when
adding allocatable derived types or arrays of allocatable derived types)
I imagine there will be hiccups, which I am more than happy to address.
There will also be further extensions to this work to handle the
implicit auto-magical mapping of descriptor members in derived types and
a few other changes planned for the future (with some ideas on
optimizing things).
The changes in this PR primarily occur in the OpenMP lowering and the
OMPMapInfoFinalization pass.
In the OpenMP lowering several utility functions were added or extended
to support the generation of appropriate intermediate member mappings
which are currently required when the parent (or multiple parents) of a
mapped member are descriptor types. We need to map the entirety of these
types or do a "deep copy" for lack of a better term, where we map both
the base address and the descriptor as without the copying of both of
these we lack the information in the case of the descriptor to access
the member or attach the pointers data to the pointer and in the latter
case we require the base address to map the chunk of data. Currently we
do not segment descriptor based derived types as we do with regular
non-descriptor derived types, we effectively map their entirety in all
cases at the moment, I hope to address this at some point in the future
as it adds a fair bit of a performance penalty to having nestings of
allocatable derived types as an example. The process of mapping all
intermediate descriptor members in a members path only occurs if a
member has an allocatable or object parent in its symbol path or the
member itself is a member or allocatable. This occurs in the
createParentSymAndGenIntermediateMaps function, which will also generate
the appropriate address for the allocatable member within the derived
type to use as a the varPtr field of the map (for intermediate
allocatable maps and final allocatable mappings). In this case it's
necessary as we can't utilise the usual Fortran::lower functionality
such as gatherDataOperandAddrAndBounds without causing issues later in
the lowering due to extra allocas being spawned which seem to affect the
pointer attachment (at least this is my current assumption, it results
in memory access errors on the device due to incorrect map information
generation). This is similar to why we do not use the MLIR value
generated for this and utilise the original symbol provided when mapping
descriptor types external to derived types. Hopefully this can be
rectified in the future so this function can be simplified and more
closely aligned to the other type mappings. We also make use of
fir::CoordinateOp as opposed to the HLFIR version as the HLFIR version
doesn't support the appropriate lowering to FIR necessary at the moment,
we also cannot use a single CoordinateOp (similarly to a single GEP) as
when we index through a descriptor operation (BoxType) we encounter
issues later in the lowering, however in either case we need access to
intermediate descriptors so individual CoordinateOp's aid this
(although, being able to compress them into a smaller amount of
CoordinateOp's may simplify the IR and perhaps result in a better end
product, something to consider for the future).
The other large change area was in the OMPMapInfoFinalization pass,
where the pass had to be extended to support the expansion of box types
(or multiple nestings of box types) within derived types, or box type
derived types. This requires expanding each BoxType mapping from one
into two maps and then modifying all of the existing member indices of
the overarching parent mapping to account for the addition of these new
members alongside adjusting the existing member indices to support the
addition of these new maps which extend the original member indices (as
a base address of a box type is currently considered a member of the box
type at a position of 0 as when lowered to LLVM-IR it's a pointer
contained at this position in the descriptor type, however, this means
extending mapped children of this expanded descriptor type to
additionally incorporate the new member index in the correct location in
its own index list). I believe there is a reasonable amount of comments
that should aid in understanding this better, alongside the test
alterations for the pass.
A subset of the changes were also aimed at making some of the utilities
for packing and unpacking the DenseIntElementsAttr containing the member
indices shareable across the lowering and OMPMapInfoFinalization, this
required moving some functions to the Lower/Support/Utils.h header, and
transforming the lowering structure containing the member index data
into something more similar to the version used in
OMPMapInfoFinalization. There we also some other attempts at tidying
things up in relation to the member index data generation in the
lowering, some of which required creating a logical operator for the
OpenMP ID class so it can be utilised as a map key (it simply utilises
the symbol address for the moment as ordering isn't particularly
important).
Otherwise I have added a set of new tests encompassing some of the
mappings currently supported by this PR (unfortunately as you can have
arbitrary nestings of all shapes and types it's not very feasible to
cover them all).
Extract the SINK/SOURCE parse tree elements into a separate class
`OmpDoacross`, share them between DEPEND and DOACROSS clauses. Most of
the changes in Semantics are to accommodate the new contents of
OmpDependClause, and a mere introduction of OmpDoacrossClause.
There are no semantic checks specifically for DOACROSS.
Parse PRESENT modifier as well while we're at it (no MAPPER though). Add
semantic checks for these clauses in the TARGET UPDATE construct, TODO
messages in lowering.
Parse the locator list in OmpDependClause as an OmpObjectList (instead
of a list of Designators). When a common block appears in the locator
list, show an informative message.
Implement resolving symbols in DependSinkVec in a dedicated visitor
instead of having a visitor for OmpDependClause.
Resolve unresolved names common blocks in OmpObjectList.
Minor changes to the code organization:
- rename OmpDependenceType to OmpTaskDependenceType (to follow 5.2
terminology),
- rename Depend::WithLocators to Depend::DepType,
- add comments with more detailed spec references to parse-tree.h.
---------
Co-authored-by: Kiran Chandramohan <kiran.chandramohan@arm.com>