This patch updates Flang lowering and kernel flags identification in
MLIR so that loop bounds on `target teams loop` constructs are evaluated
on the host, making the trip count available to the corresponding
`__tgt_target_kernel` call emitted for the target region.
This is necessary in order to properly execute these constructs as
`target teams distribute parallel do`.
Co-authored-by: Kareem Ergawy <kareem.ergawy@amd.com>
The OpenMP standard says that all dependencies in the same set of
inter-dependent tasks must be non-overlapping. This simplification means
that the OpenMP only needs to keep track of the base addresses of
dependency variables. This can be seen in kmp_taskdeps.cpp, which stores
task dependency information in a hash table, using the base address as a
key.
This patch generates a rebox operation to slice boxed arrays, but only
the box data address is used for the task dependency. The extra box is
optimized away by LLVM at O3.
Vector subscripts are TODO (I will address in my next patch).
This also fixes a bug for ordinary subscripts when the symbol was mapped
to a box:
Fixes#132647
This patch adds the `OpWithBodyGenInfo::blockArgs` field and updates
`createBodyOfOp()` to prevent the need for `BlockArgOpenMPOpInterface`
operations to pass the same callback, minimizing chances of introducing
inconsistent behavior.
The `OmpDirectiveSpecification` contains directive name, the list of
arguments, and the list of clauses. It was introduced to store the
directive specification in METADIRECTIVE, and could be reused everywhere
a directive representation is needed.
In the long term this would unify the handling of common directive
properties, as well as creating actual constructs from METADIRECTIVE by
linking the contained directive specification with any associated user
code.
Adds Parser and Semantic Support for the below construct and clauses:
- Interop Construct
- Init Clause
- Use Clause
Note:
The other clauses supported by Interop Construct such as Destroy, Use,
Depend and Device are added already.
Fixes#112538
The problem was that the host associated symbol for the threadprivate
variable doesn't have all of the symbol attributes (e.g. POINTER). This
caused the lowering code to generate the wrong type, eventually hitting
an assertion.
Improve the check for whether a type can be passed by copy. Currently,
passing by copy is done via the OMP_MAP_LITERAL mapping, which can only
transfer as much data as can be contained in a pointer representation.
The HAS_DEVICE_ADDR indicates that the object(s) listed exists at an
address that is a valid device address. Specifically,
`has_device_addr(x)` means that (in C/C++ terms) `&x` is a device
address.
When entering a target region, `x` does not need to be allocated on the
device, or have its contents copied over (in the absence of additional
mapping clauses). Passing its address verbatim to the region for use is
sufficient, and is the intended goal of the clause.
Some Fortran objects use descriptors in their in-memory representation.
If `x` had a descriptor, both the descriptor and the contents of `x`
would be located in the device memory. However, the descriptors are
managed by the compiler, and can be regenerated at various points as
needed. The address of the effective descriptor may change, hence it's
not safe to pass the address of the descriptor to the target region.
Instead, the descriptor itself is always copied, but for objects like
`x`, no further mapping takes place (as this keeps the storage pointer
in the descriptor unchanged).
---------
Co-authored-by: Sergio Afonso <safonsof@amd.com>
This PR tries to fix `lastprivate` update issues in composite
constructs. In particular, pre-determined `lastprivate` symbols are
attached to the wrong leaf of the composite construct (the outermost
one). When using delayed privatization (should be the default mode in
the future), this results in trying to update the `lastprivate` symbol
in the wrong construct (outside the `omp.loop_nest` op).
For example, given the following input:
```fortran
!$omp target teams distribute parallel do simd collapse(2) private(y_max)
do i=x_min,x_max
do j=y_min,y_max
enddo
enddo
```
Without the fixes introduced in this PR, the `DataSharingProcessor`
tries to generate the `lastprivate` update ops in the `parallel` op
since this is the op for which the DSP instance is created.
The fix consists of 2 main parts:
1. Instead of creating a single DSP instance, one instance is created
for the leaf constructs that might need privatization (whether for
explicit, implicit, or pre-determined symbols).
2. When generating the `lastprivate` comparison ops, we don't directly
use the SSA values of the UBs and steps. Instead, we regenerated these
SSA values from the original loop bounds' expressions. We have to do
this to avoid using `host_eval` values in the `lastprivate` comparison
logic which is illegal.
Use hlfir dereferencing for pointers and allocatables and use hlfir
assign. Also, change the code updating IV in lastprivate.
Note: This is a small change. Modifications in existing tests are
changes from fir.store to hlfir.assign.
Fixes#121290
The syntax with the object list following the memory-order clause has
been removed in OpenMP 5.2. Still, accept that syntax with versions >=
5.2, but treat it as deprecated (and emit a warning).
Enough suport to parse correctly formed directives of !$OMP ASSUME and
!$OMP ASSUMES with teh related clauses that go with them: ABSENT,
CONTAINS, NO_OPENPP, NO_OPENMP_ROUTINES, NO_PARALLELISM and HOLDS.
Tests added for unparsing and dump parse-tree.
Semantics support is very minimal and no specific tests added.
The lowering will hit a TODO, and there are tests in Lower/OpenMP/Todo
to make it clear that this is currently expected behaviour.
---------
Co-authored-by: Kiran Chandramohan <kiran.chandramohan@arm.com>
Co-authored-by: Krzysztof Parzyszek <Krzysztof.Parzyszek@amd.com>
This patch adds `target teams distribute [simd]` and equivalent
construct nests to the list of cases where loop bounds can be evaluated
in the host, as they represent kernels for which the trip count must
also be evaluated in advance to the kernel call.
Move non-common files from FortranCommon to FortranSupport (analogous to
LLVMSupport) such that
* declarations and definitions that are only used by the Flang compiler,
but not by the runtime, are moved to FortranSupport
* declarations and definitions that are used by both ("common"), the
compiler and the runtime, remain in FortranCommon
* generic STL-like/ADT/utility classes and algorithms remain in
FortranCommon
This allows a for cleaner separation between compiler and runtime
components, which are compiled differently. For instance, runtime
sources must not use STL's `<optional>` which causes problems with CUDA
support. Instead, the surrogate header `flang/Common/optional.h` must be
used. This PR fixes this for `fast-int-sel.h`.
Declarations in include/Runtime are also used by both, but are
header-only. `ISO_Fortran_binding_wrapper.h`, a header used by compiler
and runtime, is also moved into FortranCommon.
Parse METADIRECTIVE as a standalone executable directive at the moment.
This will allow testing the parser code.
There is no lowering, not even clause conversion yet. There is also no
verification of the allowed values for trait sets, trait properties.
This allows the Flang parser to accept the !$OMP DISPATCH and related
clauses.
Lowering is currently not implemented. Tests for unparse and parse-tree
dump is provided, and one for checking that the lowering ends in a "not
yet implemented"
---------
Co-authored-by: Kiran Chandramohan <kiran.chandramohan@arm.com>
Move OpenACC/OpenMP helpers from Lower/DirectivesCommon.h that are also
used in OpenACC and OpenMP mlir passes into a new
Optimizer/Builder/DirectivesCommon.h so that parser and evaluate headers
are not included in Optimizer libraries (this both introduce
compile-time and link-time pointless overheads).
This should fix https://github.com/llvm/llvm-project/issues/123377
Implementation details:
The UNTIED clause is recognized by setting the flag=0 for the default
case or performing logical OR to flag if other clauses are specified,
and this flag is passed as an argument to the `__kmpc_omp_task_alloc`
runtime call.
Resubmitting the PR with fix for the failure, as it was reverted here:
927a70daf31b1610627f346b0dc140eda72144b9
and previously merged here: https://github.com/llvm/llvm-project/pull/115283
This fixes a bug when the same variable is used in `firstprivate` and
`lastprivate` clauses on the same construct. The issue boils down to the
fact that `copyHostAssociateVar` was deciding the direction of the copy
assignment (i.e. the `lhs` and `rhs`) based on whether the
`copyAssignIP`
parameter is set. This is not the best way to do it since it is not
related to whether we doing a copy from host to localized copy or the
other way around. When we set the insertion for `firstprivate` in
delayed privatization, this resulted in switching the direction of the
copy assignment. Instead, this PR adds a new paramter to explicitely
tell
the function the direction of the assignment.
This is a follow up PR for
https://github.com/llvm/llvm-project/pull/122471, only the latest commit
is relevant.
This enable delayed privatization by default for `omp.wsloop` ops, with
one caveat! I had to workaround the "impure" alloc region issue that
being resolved at the moment. The workaround detects whether the alloc
region's argument is used in the region and at the same time defined in
block that does not dominate the chosen alloca insertion point. If so,
we move the alloca insertion point below the defining instruction of the
alloc region argument. This basically reverts to the
non-delayed-privatizaiton behavior.
This patch adds support for lowering OpenMP clauses and expressions
attached to constructs nested inside of a target region that need to be
evaluated in the host device. This is done through the use of the
`OpenMP_HostEvalClause` `omp.target` set of operands and entry block
arguments.
When lowering clauses for a target construct, a more involved
`processHostEvalClauses()` function is called, which looks at the
current and potentially other nested constructs in order to find and
lower clauses that need to be processed outside of the `omp.target`
operation under construction. This populates an instance of a global
structure with the resulting MLIR values.
The resulting list of host-evaluated values is used to initialize the
`host_eval` operands when constructing the `omp.target` operation, and
then replaced with the corresponding block arguments after creating that
operation's region.
Afterwards, while lowering nested operations, those that might
potentially be evaluated on the host (i.e. `num_teams`, `thread_limit`,
`num_threads` and `collapse`) check first whether there is an active
global host-evaluated information structure and whether it holds values
referring to these clauses. If that is the case, the stored values
(`omp.target` entry block arguments at that stage) are used instead of
lowering these clauses again.
This patch adds PFT to MLIR lowering of teams reductions. Since there is
still no MLIR to LLVM IR translation implemented, compilation of
programs including these constructs will still trigger
not-yet-implemented errors.
Attempt to address the following example from causing an assert or ICE:
```
subroutine test(a)
implicit none
integer :: i
real(kind=real64), dimension(:) :: a
real(kind=real64), dimension(size(a, 1)) :: b
!$omp target map(tofrom: b)
do i = 1, 10
b(i) = i
end do
!$omp end target
end subroutine
```
Where we utilise a Fortran intrinsic (size) to calculate the size of
allocatable arrays and then map it to device.
Allow utility constructs (error and nothing) to appear in the
specification part as well as the execution part. The exception is
"ERROR AT(EXECUTION)" which should only be in the execution part.
In case of ambiguity (the boundary between the specification and the
execution part), utility constructs will be parsed as belonging to the
specification part. In such cases move them to the execution part in the
OpenMP canonicalization code.
Implementation details:
The UNTIED clause is recognized by setting the flag=0 for the default
case or performing logical OR to flag if other clauses are specified,
and this flag is passed as an argument to the `__kmpc_omp_task_alloc`
runtime call.
This re-applies #117867 with a small fix that hopefully prevents build
bot failures. The fix is avoiding `dyn_cast` for the result of
`getOperation()`. Instead we can assign the result to `mlir::ModuleOp`
directly since the type of the operation is known statically (`OpT` in
`OperationPass`).
This is a starting PR to implicitly map allocatable record fields.
This PR contains the following changes:
1. Re-purposes some of the utils used in `Lower/OpenMP.cpp` so that
these utils work on the `mlir::Value` level rather than the
`semantics::Symbol` level. This takes one step towards to enabling
MLIR passes to more easily do some lowering themselves (e.g. creating
`omp.map.bounds` ops for implicitely caputured data like this PR
does).
2. Adds support for implicitely capturing and mapping allocatable fields
in record types.
There is quite some distant to still cover to have full support for
this. I added a number of todos to guide further development.
Co-authored-by: Andrew Gozillon <andrew.gozillon@amd.com>
Co-authored-by: Andrew Gozillon <andrew.gozillon@amd.com>
Convert `DataSharingProcessor::symTable` from pointer to reference.
This avoids accidental null pointer dereferences and makes it
possible to use `symTable` when delayed privatization is disabled.
This adds a minimalistic implementation of parsing and semantics for the
ATOMIC COMPARE feature from OpenMP 5.1.
There is no lowering, just a TODO for that part. Some of the Semantics
is also just a comment explaining that more is needed.
Introduces a new conversion pass that rewrites `omp.loop` ops to their
semantically equivalent op nests bases on the surrounding/binding
context of the `loop` op. Not all forms of `omp.loop` are supported yet.
See `isLoopConversionSupported` for more info on which forms are
supported.