This patch is a follow-up from #161213 and adds the omp.fuse loop
transformation for the OpenMP dialect. Used for lowering a `!$omp fuse`
in Flang.
Added Lowering and end2end tests.
Make sure to add the source range whenever we create a scope for an
OpenMP construct or a clause. This allows that scope to be located via
context.FindScope(source).
In some cases, a clause on a composite simd construct applied to simd
can be using a symbol that is also used by another privatizer, not
applied to simd. Correctly handle this scenario by checking which
directive the privatizer is being generated for while determining
whether to emit the copy region.
Fixes#155195.
Signed-off-by: Kajetan Puchalski <kajetan.puchalski@arm.com>
Fixes a regression uncovered by Fujitsu test 0686_0024.f90. In
particular, verifies that a pre-determined symbol is only privatized by
its defining evaluation (e.g. the loop for which the symbol was marked
as pre-determined).
Fixes a bug when a block variable is marked as implicit private. In such
case, we can simply ignore privatizing that symbol within the context of
the currrent OpenMP construct since the "private" allocation for the
symbol will be emitted within the nested block anyway.
Fixes a bug when a block variable is marked as pre-determined private.
In such case, we can simply ignore privatizing that symbol within the
context of the currrent OpenMP construct since the "private" allocation
for the symbol will be emitted within the nested block anyway.
Currently, we indicate to the runtime that implicit scalar captures are
firstprivate (via map and
capture types), enough for the runtime trace to treat it as such, but we
do not CodeGen the IR
in such a way that we can take full advantage of this aspect of the
OpenMP specification.
This patch seeks to change that by applying the correct symbol flags
(firstprivate/implicit) to the
implicitly captured scalars within target regions, which then triggers
the delayed privitization code
generation for these symbols, bringing the code generation in-line with
the explicit firstpriviate
clause. Currently, similarly to the delayed privitization I have
sheltered this segment of code
behind the EnabledDelayedPrivitization flag, as without it, we'll
trigger an compiler error for
firstprivate not being supported any time we implicitly capture a scalar
and try to firstprivitize
it, in future when this flag is removed it can also be removed here. So,
for now, you need to
enable this via providing the compiler the flag on compilation of any
programs.
Fortran::parser::omp::GetOmpDirectiveName(t) will get the
OmpDirectiveName object that corresponds to construct t. That object (an
AST node) contains the enum id and the source information of the
directive.
Replace uses of extractOmpDirective and getOpenMPDirectiveEnum with the
new function.
Current implementation of linear clause skips privatisation of all
linear variables during the FIR generation phase, since linear variables
are handled in their entirety by the OpenMP IRBuilder. However,
"implicit" linear variables (like OmpPreDetermined) cannot be skipped,
since FIR generation requires privatized symbols. This patch adds checks
to skip the same.
Fixes https://github.com/llvm/llvm-project/issues/142935
The parser will accept a wide variety of illegal attempts at forming an
ATOMIC construct, leaving it to the semantic analysis to diagnose any
issues. This consolidates the analysis into one place and allows us to
produce more informative diagnostics.
The parser's outcome will be parser::OpenMPAtomicConstruct object
holding the directive, parser::Body, and an optional end-directive. The
prior variety of OmpAtomicXyz classes, as well as OmpAtomicClause have
been removed. READ, WRITE, etc. are now proper clauses.
The semantic analysis consistently operates on "evaluation"
representations, mainly evaluate::Expr (as SomeExpr) and
evaluate::Assignment. The results of the semantic analysis are stored in
a mutable member of the OpenMPAtomicConstruct node. This follows a
precedent of having `typedExpr` member in parser::Expr, for example.
This allows the lowering code to avoid duplicated handling of AST nodes.
Using a BLOCK construct containing multiple statements for an ATOMIC
construct that requires multiple statements is now allowed. In fact, any
nesting of such BLOCK constructs is allowed.
This implementation will parse, and perform semantic checks for both
conditional-update and conditional-update-capture, although no MLIR will
be generated for those. Instead, a TODO error will be issues prior to
lowering.
The allowed forms of the ATOMIC construct were based on the OpenMP 6.0
spec.
Extends support for locality specifiers in `do concurrent` by supporting
data types that need `init` regions.
This further unifies the paths taken by the compiler for OpenMP
privatization clauses and `do concurrent` locality specifiers.
Refactors the utils needed to create privtization/locatization ops for
both the fir and OpenMP dialects into a shared location isolating OpenMP
stuff out of it as much as possible.
Fixes#136357
The barrier needs to go between the copying into firstprivate variables
and the initialization call for the OpenMP construct (e.g. wsloop).
There is no way of expressing this in MLIR because for delayed
privatization that is all implicit (added in MLIR->LLVMIR conversion).
The previous approach put the barrier immediately before the wsloop (or
similar). For delayed privatization, the firstprivate copy code would
then be inserted after that, opening the possibility for the race
observed in the bug report.
This patch solves the issue by instead setting an attribute on the mlir
operation, which will instruct openmp dialect to llvm ir conversion to
insert a barrier in the correct place.
This now produces code equivalent to if there was an explicit private
clause on the SECTIONS construct.
The problem was that each SECTION construct got its own DSP, which tried
to privatize the same symbol for that SECTION. Privatization for
SECTION(S) happens on the outer SECTION construct and so the outer
construct's DSP should be shared.
Fixes#135108
This PR tries to fix `lastprivate` update issues in composite
constructs. In particular, pre-determined `lastprivate` symbols are
attached to the wrong leaf of the composite construct (the outermost
one). When using delayed privatization (should be the default mode in
the future), this results in trying to update the `lastprivate` symbol
in the wrong construct (outside the `omp.loop_nest` op).
For example, given the following input:
```fortran
!$omp target teams distribute parallel do simd collapse(2) private(y_max)
do i=x_min,x_max
do j=y_min,y_max
enddo
enddo
```
Without the fixes introduced in this PR, the `DataSharingProcessor`
tries to generate the `lastprivate` update ops in the `parallel` op
since this is the op for which the DSP instance is created.
The fix consists of 2 main parts:
1. Instead of creating a single DSP instance, one instance is created
for the leaf constructs that might need privatization (whether for
explicit, implicit, or pre-determined symbols).
2. When generating the `lastprivate` comparison ops, we don't directly
use the SSA values of the UBs and steps. Instead, we regenerated these
SSA values from the original loop bounds' expressions. We have to do
this to avoid using `host_eval` values in the `lastprivate` comparison
logic which is illegal.
Use hlfir dereferencing for pointers and allocatables and use hlfir
assign. Also, change the code updating IV in lastprivate.
Note: This is a small change. Modifications in existing tests are
changes from fir.store to hlfir.assign.
Fixes#121290
See RFC:
https://discourse.llvm.org/t/rfc-openmp-supporting-delayed-task-execution-with-firstprivate-variables/83084
The aim here is to ensure that tasks which are not executed for a while
after they are created do not try to reference any data which are now
out of scope. This is done by packing the data referred to by the task
into a heap allocated structure (freed at the end of the task).
I decided to create the task context structure in
OpenMPToLLVMIRTranslation instead of adapting how it is done
CodeExtractor (via OpenMPIRBuilder] because CodeExtractor is (at least
in theory) generic code which could have other unrelated uses.
Whenever there is a `lastprivate` variable and another unrelated
variable sets the `mightHaveReadHostSym` flag during Flang lowering
privatization, this will result in the insertion of a barrier.
This patch modifies this behavior such that this barrier will not be
inserted unless the same symbol both sets the flag and has
`lastprivate`.
Fixes a bug in updating `lastprivate` variables. Previously, we only
iterated over the symbols collected from `lastprivate` clauses. This
meants that for pre-determined symbols, we did not implement the update
correctly (e.g. for loop iteration variables of `simd` constructs).
The intention of this work is to give MLIR->LLVMIR conversion freedom to
control how the private variable is allocated so that it can be
allocated on the stack in ordinary cases or as part of a structure used
to give closure context for tasks which might outlive the current stack
frame. See RFC:
https://discourse.llvm.org/t/rfc-openmp-supporting-delayed-task-execution-with-firstprivate-variables/83084
For example, a privatizer for an integer used to look like
```mlir
omp.private {type = private} @x.privatizer : !fir.ref<i32> alloc {
^bb0(%arg0: !fir.ref<i32>):
%0 = ... allocate proper memory for the private clone ...
omp.yield(%0 : !fir.ref<i32>)
}
```
After this change, allocation become implicit in the operation:
```mlir
omp.private {type = private} @x.privatizer : i32
```
For more complex types that require initialization after allocation, an
init region can be used:
``` mlir
omp.private {type = private} @x.privatizer : !some.type init {
^bb0(%arg0: !some.pointer<!some.type>, %arg1: !some.pointer<!some.type>):
// initialize %arg1, using %arg0 as a mold for allocations
omp.yield(%arg1 : !some.pointer<!some.type>)
} dealloc {
^bb0(%arg0: !some.pointer<!some.type>):
... deallocate memory allocated by the init region ...
omp.yield
}
```
This patch lays the groundwork for delayed task execution but is not
enough on its own.
After this patch all gfortran tests which previously passed still pass.
There
are the following changes to the Fujitsu test suite:
- 0380_0009 and 0435_0009 are fixed
- 0688_0041 now fails at runtime. This patch is testing firstprivate
variables with tasks. Previously we got lucky with the undefined
behavior and won the race. After these changes we no longer get lucky.
This patch lays the groundwork for a proper fix for this issue.
In flang the lowering re-uses the existing lowering used for reduction
init and dealloc regions.
In flang, before this patch we hit a TODO with the same wording when
generating the copy region for firstprivate polymorphic variables. After
this patch the box-like fir.class is passed by reference into the copy
region, leading to a different path that didn't hit that old TODO but
the generated code still didn't work so I added a new TODO in
DataSharingProcessor.
This fixes a bug when the same variable is used in `firstprivate` and
`lastprivate` clauses on the same construct. The issue boils down to the
fact that `copyHostAssociateVar` was deciding the direction of the copy
assignment (i.e. the `lhs` and `rhs`) based on whether the
`copyAssignIP`
parameter is set. This is not the best way to do it since it is not
related to whether we doing a copy from host to localized copy or the
other way around. When we set the insertion for `firstprivate` in
delayed privatization, this resulted in switching the direction of the
copy assignment. Instead, this PR adds a new paramter to explicitely
tell
the function the direction of the assignment.
This is a follow up PR for
https://github.com/llvm/llvm-project/pull/122471, only the latest commit
is relevant.
InitializeClone(), implemented in #120295, was not handling top
level pointers and allocatables correctly.
Pointers and unallocated variables must be skipped.
This caused some regressions in the Fujitsu testsuite:
https://linaro.atlassian.net/browse/LLVM-1488
Allocatable members of privatized derived types must be allocated,
with the same bounds as the original object, whenever that member
is also allocated in it, but Flang was not performing such
initialization.
The `Initialize` runtime function can't perform this task unless
its signature is changed to receive an additional parameter, the
original object, that is needed to find out which allocatable
members, with their bounds, must also be allocated in the clone.
As `Initialize` is used not only for privatization, sometimes this
other object won't even exist, so this new parameter would need
to be optional.
Because of this, it seemed better to add a new runtime function:
`InitializeClone`.
To avoid unnecessary calls, lowering inserts a call to it only for
privatized items that are derived types with allocatable members.
Fixes https://github.com/llvm/llvm-project/issues/114888
Fixes https://github.com/llvm/llvm-project/issues/114889
Convert `DataSharingProcessor::symTable` from pointer to reference.
This avoids accidental null pointer dereferences and makes it
possible to use `symTable` when delayed privatization is disabled.
Both OpenMP privatization and DO CONCURRENT LOCAL lowering was incorrect
for pointers and derived type with default initialization.
For pointers, the descriptor was not established with the rank/type
code/element size, leading to undefined behavior if any inquiry was made
to it prior to a pointer assignment (and if/when using the runtime for
pointer assignments, the descriptor must have been established).
For derived type with default initialization, the copies were not
default initialized.
This patch simplifies the representation of OpenMP loop wrapper
operations by introducing the `NoTerminator` trait and updating
accordingly the verifier for the `LoopWrapperInterface`.
Since loop wrappers are already limited to having exactly one region
containing exactly one block, and this block can only hold a single
`omp.loop_nest` or loop wrapper and an `omp.terminator` that does not
return any values, it makes sense to simplify the representation of loop
wrappers by removing the terminator.
There is an extensive list of Lit tests that needed updating to remove
the `omp.terminator`s adding some noise to this patch, but actual
changes are limited to the definition of the `omp.wsloop`, `omp.simd`,
`omp.distribute` and `omp.taskloop` loop wrapper ops, Flang lowering for
those, `LoopWrapperInterface::verifyImpl()`, SCF to OpenMP conversion
and OpenMP dialect documentation.
OpenMP prohibits privatisation of variables that appear in expressions
for statement functions.
This is a re-working of an old patch https://reviews.llvm.org/D93213 by
@praveen-g-ctt.
The old patch couldn't be landed because of ordering concerns. Statement
functions are rewritten during parse tree rewriting, but this was done
after resolve-directives and so some array expressions were incorrectly
identified as statement functions. For this reason **I have opted to
re-order the semantics driver so that resolve-directives is run after
parse tree rewriting**.
Closes#54677
---------
Co-authored-by: Praveen <praveen@compilertree.com>
This patch creates a simple RAII wrapper class for `SymMap` to make it
easier to use and prevent a missing matching `popScope()` for a
`pushScope()` call on simple use cases.
Some push-pop pairs are replaced with instances of the new class by this
patch.
This patch removes unused and undefined method declarations from
`DataSharingProcessor`, as well as the unused `hasLastPrivateOp` class
member. The `insPt` class member is replaced by a local `InsertionGuard`
in the only place it is set and used.
This patch updates the `omp.parallel` operation according to the results
of the discussion in [this
RFC](https://discourse.llvm.org/t/rfc-disambiguation-between-loop-and-block-associated-omp-parallelop/79972).
It is removed from the set of loop wrapper operations, changing the
expected MLIR representation for composite `distribute parallel do/for`
into the following:
```mlir
omp.parallel {
...
omp.distribute {
omp.wsloop {
omp.loop_nest ... { ... }
omp.terminator
}
omp.terminator
}
...
omp.terminator
}
```
MLIR verifiers for operations impacted by this representation change are
updated, as well as related tests. The `LoopWrapperInterface` is also
updated, since it's no longer representing an optional "role" of an
operation but a mandatory set of restrictions instead.
Variables referenced in the body of statement functions need to be
handled as if they are explicitly referenced. Otherwise, they are
skipped during implicit privatization, because statement functions
are represented as procedures in the parse tree.
To avoid missing symbols referenced only in statement functions
during implicit privatization, new symbols, associated with them,
are created and inserted into the context of the directive that
privatizes them. They are later collected and processed in
lowering. To avoid confusing these new symbols with regular ones,
they are tagged with the new OmpFromStmtFunction flag.
Fixes https://github.com/llvm/llvm-project/issues/74273
Handles variables that are storage associated via `equivalence`. The
problem is that these variables are declared as `fir.ptr`s while their
privatized storage is declared as `fir.ref` which was triggering a
validation error in the OpenMP dialect.
This patch introduces a new OpenMP clause definition not defined by the spec.
Its main purpose is to define the `loop_inclusive` (previously "inclusive",
renamed according to the parent of this PR in the stack) argument of
`omp.loop_nest` in such a way that a followup implementation of a tablegen
backend to automatically generate clause and operation operand structures
directly from `OpenMP_Op` and `OpenMP_Clause` definitions can properly generate
the `LoopNestOperands` structure.
`collapse` clause arguments are also moved into this new definition, as they
represent information on the loop nests being collapsed rather than the
`collapse` clause itself.