Changes:
- Adds a new `omp.declare_simd` operation to the OpenMP MLIR dialect
- Lowers Fortran `!$omp declare simd` into `omp.declare_simd` inside the
enclosing function body
mlir to LLVMIR translation and uniform clause will be added in follow-up
PRs.
Implementation follows exactly what is done for omp.wsloop and omp.task.
See #137841.
The change to the operation verifier is to allow a taskgroup
cancellation point inside of a taskloop. This was already allowed for
omp.cancel.
Add support for OpenMP is_device_ptr clause for target directives.
[MLIR][OpenMP] Add OpenMPToLLVMIRTranslation support for is_device_ptr
#169367 This PR adds support for the OpenMP is_device_ptr clause in the
MLIR to LLVM IR translation for target regions. The is_device_ptr clause
allows device pointers (allocated via OpenMP runtime APIs) to be used
directly in target regions without implicit mapping.
Add support for OpenMP is_device_ptr clause for target directives.
[MLIR][OpenMP] Add OpenMPToLLVMIRTranslation support for is_device_ptr #169367
This PR adds support for the OpenMP is_device_ptr clause in the MLIR to LLVM IR translation for target regions. The is_device_ptr clause allows device pointers (allocated via OpenMP runtime APIs) to be used directly in target regions without implicit mapping.
`dist_schedule` was previously supported in Flang/Clang but was not
implemented in MLIR, instead a user would get a "not yet implemented"
error. This patch adds support for the `dist_schedule` clause to be
lowered to LLVM IR when used in an `omp.distribute` or `omp.wsloop`
section.
There has needed to be some rework required to ensure that MLIR/LLVM
emits the correct Schedule Type for the clause, as it uses a different
schedule type to other OpenMP directives/clauses in the runtime library.
This patch also ensures that when using dist_schedule or a chunked
schedule clause, the correct llvm loop parallel accesses details are
added.
This PR shifts from using the LLVM OpenMP enumerator bit flags to an
OpenMP dialect specific enumerator. This allows us to better represent
map types that wouldn't be of interest to the LLVM backend and runtime
in the dialect.
Primarily things like
ref_ptr/ref_ptee/ref_ptr_ptee/atach_none/attach_always/attach_auto which
are of interest to the compiler for certrain transformations (primarily
in the FIR transformation passes dealing with mapping), but the runtime
has no need to know about them. It also means if another OpenMP
implementation comes along they won't need to stick to the same bit flag
system LLVM chose/do leg work to address it.
Improve the automatic naming of variables defined by the
`omp.canonical_loop` operation:
1. The iteration variable gets a name consistent with the cli variable
2. Instead of appending `_s0` for each nesting level, shorten it to
`_d<num>` for a perfectly nested loop at depth `<num>`
3. Do not add any suffix to the top-level loop if it is the only
top-level loop
Enable the generation of no-loop kernels for Fortran OpenMP code. target
teams distribute parallel do pragmas can be promoted to no-loop kernels
if the user adds the -fopenmp-assume-teams-oversubscription and
-fopenmp-assume-threads-oversubscription flags.
If the OpenMP kernel contains reduction or num_teams clauses, it is not
promoted to no-loop mode.
The global OpenMP device RTL oversubscription flags no longer force
no-loop code generation for Fortran.
This patch enables tiling in flang. In MLIR tiling is handled by
changing the the omp.loop_nest op to be able to represent both collapse
and tiling, so the flang front-end will combine the nested constructs into
a single MLIR op. The MLIR->LLVM-IR lowering of the LoopNestOp is
enhanced to first do the tiling if present, then collapse.
This PR adds workdistribute mlir op in omp dialect and also in llvm
frontend.
The work in this PR is c-p and updated from @ivanradanov commits from coexecute implementation:
flang_workdistribute_iwomp_2024
This PR introduces two new ops in omp dialect, omp.target_allocmem and
omp.target_freemem.
omp.target_allocmem: Allocates heap memory on device. Will be lowered to
omp_target_alloc call in llvm.
omp.target_freemem: Deallocates heap memory on device. Will be lowered
to omp+target_free call in llvm.
Example:
%1 = omp.target_allocmem %device : i32, i64
omp.target_freemem %device, %1 : i32, i64
The work in this PR is C-P/inspired from @ivanradanov commit from
coexecute implementation:
[Add fir omp target alloc and free
ops](be860ac8ba)
[Lower omp_target_{alloc,free} to
llvm](6e2d584dc9)
This PR uses `val.getDefiningOp<OpTy>()` to replace `dyn_cast<OpTy>(val.getDefiningOp())` , `dyn_cast_or_null<OpTy>(val.getDefiningOp())` and `dyn_cast_if_present<OpTy>(val.getDefiningOp())`.
These are identified by misc-include-cleaner. I've filtered out those
that break builds. Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.
This patch includes adding support for OMP ALLOCATE directive along with
ALIGN clause and ALLOCATOR clause which are used within OMP ALLOCATE
directive
Add the supporting OpenMP Dialect operations, types, and interfaces for
modelling
MLIR Operations:
* omp.newcli
* omp.canonical_loop
MLIR Types:
* !omp.cli
MLIR Interfaces:
* LoopTransformationInterface
As a first loop transformations to be able to use these new operation in
follow-up PRs (#144785)
* omp.unroll_heuristic
A barrier is needed at the end of initialization/copying of private
variables if any of those variables is lastprivate. This ensures that
all firstprivate variables receive the original value of the variable
before the lastprivate clause overwrites it.
Previously this barrier was added by the flang fontend, but there is not
a reliable way to put the barrier in the correct place for delayed
privatization, and the OpenMP dialect could some day have other users.
It is important that there are safe ways to use the constructs available
in the dialect.
lastprivate is currently not modelled in the OpenMP dialect, and so
there is no way to reliably determine whether there were lastprivate
variables. Therefore the frontend will have to provide this information
through this new attribute.
Part of a series of patches to fix
https://github.com/llvm/llvm-project/issues/136357
This patch updates MLIR op verifiers for operations taking arguments
that must always be defined by an `omp.map.info` operation to check this
requirement.
It also modifies Flang lowering for `use_device_{addr, ptr}`, as well as
the custom MLIR printer and parser for these clauses, to support
initializing it to `OMP_MAP_RETURN_PARAM` and represent this in the MLIR
representation as `return_param`. This internal mapping flag is what
eventually is used for variables passed via these clauses into the
target region when translating to LLVM IR, so making it explicit in
Flang and MLIR removes an inconsistency in the current representation.
omp.cancel and omp.cancellationpoint contain an attribute describing the
type of parent construct which should be cancelled. e.g.
```
!$omp cancel do
```
Must be inside of a wsloop. Previously the verifer required the
immediate parent to be this operation. This is not quite right because
something like the following is valid:
```
!$omp parallel do
do i = 1, N
if (cond) then
!$omp cancel do
endif
enddo
```
This patch relaxes the verifier to only require that some parent
operation matches (not necessarily the immediate parent).
This patch revisits op verifiers for `LoopWrapperInterface` operations
to improve consistency across operations and to properly cover some
previously misreported cases.
Checks that should be done for these kinds of operations are documented
in the interface description.
This patch updates the handling of target regions to set trip counts and
kernel execution modes properly, based on clang's behavior. This fixes a
race condition on `target teams distribute` constructs with no `parallel
do` loop inside.
This is how kernels are classified, after changes introduced in this
patch:
```f90
! Exec mode: SPMD.
! Trip count: Set.
!$omp target teams distribute parallel do
do i=...
end do
! Exec mode: Generic-SPMD.
! Trip count: Set (outer loop).
!$omp target teams distribute
do i=...
!$omp parallel do private(idx, y)
do j=...
end do
end do
! Exec mode: Generic-SPMD.
! Trip count: Set (outer loop).
!$omp target teams distribute
do i=...
!$omp parallel
...
!$omp end parallel
end do
! Exec mode: Generic.
! Trip count: Set.
!$omp target teams distribute
do i=...
end do
! Exec mode: SPMD.
! Trip count: Not set.
!$omp target parallel do
do i=...
end do
! Exec mode: Generic.
! Trip count: Not set.
!$omp target
...
!$omp end target
```
For the split `target teams distribute + parallel do` case, clang
produces a Generic kernel which gets promoted to Generic-SPMD by the
openmp-opt pass. We can't currently replicate that behavior in flang
because our codegen for these constructs results in the introduction of
calls to the `kmpc_distribute_static_loop` family of functions, instead
of `kmpc_distribute_static_init`, which currently prevent promotion of
the kernel to Generic-SPMD.
For the time being, instead of relying on the openmp-opt pass, we look
at the MLIR representation to find the Generic-SPMD pattern and directly
tag the kernel as such during codegen. This is what we were already
doing, but incorrectly matching other kinds of kernels as such in the
process.
This patch updates Flang lowering and kernel flags identification in
MLIR so that loop bounds on `target teams loop` constructs are evaluated
on the host, making the trip count available to the corresponding
`__tgt_target_kernel` call emitted for the target region.
This is necessary in order to properly execute these constructs as
`target teams distribute parallel do`.
Co-authored-by: Kareem Ergawy <kareem.ergawy@amd.com>
This patch makes the `map_type` and `map_capture_type` arguments of the
`omp.map.info` operation required, which was already an invariant being
verified by its users via `verifyMapClause()`. This makes it clearer, as
getters no longer return misleading `std::optional` values.
Checks for the `mapper_id` argument are moved to a verifier for the
operation, rather than being checked by users.
Functionally NFC, but not marked as such due to a reordering of
arguments in the assembly format of `omp.map.info`.
This patch avoids calling `TargetOp::getInnermostCapturedOmpOp` multiple
times during initialization of default and runtime target attributes in
MLIR to LLVM IR translation of `omp.target` operations. This is a
potentially expensive operation, so this change should help keep compile
times lower.
This patch removes the `ReductionClauseInterface` and all definitions of
its associated `getAllReductionVars` method.
The method mandated by this interface is not used anywhere and the
conflicts its definition produces when multiple reduction clauses are
present in an operation result in a more convoluted operation
definition, so it seems better to remove it and only add something like
this if there's a clear advantage to it.
This patch makes additions to the `BlockArgOpenMPOpInterface` to
simplify its use by letting it handle the matching between operands and
their associated entry block arguments. Most significantly, the
following is now possible:
```c++
SmallVector<std::pair<Value, BlockArgument>> pairs;
cast<BlockArgOpenMPOpInterface>(op).getBlockArgsPairs(pairs);
for (auto [var, arg] : pairs) {
// var points to the operand (outside value) and arg points to the entry
// block argument associated to that value.
}
```
This is achieved by making the interface define and use `getXyzVars()`
methods, which by default return empty `OperandRange`s and are overriden
by getters automatically produced for the `Variadic<...> $xyz_vars`
tablegen argument of the corresponding clause. These definitions can
then be simplified, since they no longer need to manually define
`numXyzBlockArgs` functions as a result.
A side-effect of this is that all ops implementing this interface will
now publicly define `getXyzVars()` functions for all entry block
argument-generating clauses, even if they don't actually accept all
clauses. However, these would just return empty ranges, so it shouldn't
cause issues.
This change uncovered some incorrect definitions of class declarations
related to the `ReductionClauseInterface`, and the `OpenMP_DetachClause`
incorrectly implementing the `BlockArgOpenMPOpInterface`, so these
issues are also addressed.
The HAS_DEVICE_ADDR indicates that the object(s) listed exists at an
address that is a valid device address. Specifically,
`has_device_addr(x)` means that (in C/C++ terms) `&x` is a device
address.
When entering a target region, `x` does not need to be allocated on the
device, or have its contents copied over (in the absence of additional
mapping clauses). Passing its address verbatim to the region for use is
sufficient, and is the intended goal of the clause.
Some Fortran objects use descriptors in their in-memory representation.
If `x` had a descriptor, both the descriptor and the contents of `x`
would be located in the device memory. However, the descriptors are
managed by the compiler, and can be regenerated at various points as
needed. The address of the effective descriptor may change, hence it's
not safe to pass the address of the descriptor to the target region.
Instead, the descriptor itself is always copied, but for objects like
`x`, no further mapping takes place (as this keeps the storage pointer
in the descriptor unchanged).
---------
Co-authored-by: Sergio Afonso <safonsof@amd.com>
This PR adds an initial implementation for the map modifiers close,
present and ompx_hold, primarily just required adding the appropriate
map type flags to the map type bits. In the case of ompx_hold it
required adding the map type to the OpenMP dialect. Close has a bit of a
problem when utilised with the ALWAYS map type on descriptors, so it is
likely we'll have to make sure close and always are not applied to the
descriptor simultaneously in the future when we apply always to the
descriptors to facilitate movement of descriptor information to device
for consistency, however, we may find an alternative to this with
further investigation. For the moment, it is a TODO/Note to keep track
of it.