I'll merge this at the same time as some llvm-zorg changes that start
building the DeviceRTL.
We only see one new test passing because everything still fails because
of the issue described in
https://github.com/llvm/llvm-project/pull/178980
Once a fix for that issue is merged we will see many new passes.
Right now if we run `check-offload` for SPIR-V the DeviceRTL isn't used
because we pass `-nogpulib`.
Don't pass that, but also don't pass `--libomptarget-spirv-bc-path` yet
because the DeviceRTL is brand new so we don't want to error if it's not
present.
Signed-off-by: Nick Sarnie <nick.sarnie@intel.com>
Summary:
We provide an RPC server to manage calls initiated by the device to run
on the host. This is very useful for the built-in handling we have,
however there are cases where we would want to extend this
functionality.
Cases like Fortran or MPI would be useful, but we cannot put references
to these in the core offloading runtime. This way, we can provide this
as a library interface that registers custom handlers for whatever code
people want.
Recursive types can cause re-entrant mapper emission. The mapper
function is created by OpenMPIRBuilder before the callbacks run, so it
may already exist in the LLVM module even though it is not yet
registered in the ModuleTranslation mapping table. Reuse and register it
to break the recursion. Added offloading test.
Fix OpenMP mapper lowering by attaching user-defined/default mappers
only to the base parent entry, not combined/segment entries. This
prevents mapper calls with partial sizes. Added relevant tests.
This reverts commit 5a457837dd988aa01c65820848381a5b99a74c0a.
Includes the test fix from
https://github.com/llvm/llvm-project/pull/177659.
The test had to be updated to exclude a scenario that was failing
with/without the change (involving mapping a struct with a byref member
with a mapper).
-----
**Original PR's description:**
This is a fix for https://github.com/llvm/llvm-project/issues/61636.
Ravi had this implemented downstream before he retired. This PR is a
chery-pick of that.
The test is taken from @jdoerfert's WIP change in
527bf4b129.
The change partially undoes the changes done in
0caf736d7e1d16d1059553fc28dbac31f0b9f788, so @alexey-bataev might need
to take a look.
Make implicit default mapper generation respect defaultmap categories so
unrelated defaultmap clauses no longer suppress mappers for derived
types.
Added related tests.
This is a fix for https://github.com/llvm/llvm-project/issues/61636.
Ravi had this implemented downstream before he retired. This PR is a
chery-pick of that.
The test is taken from @jdoerfert's WIP change in
527bf4b129.
The change partially undoes the changes done in
0caf736d7e1d16d1059553fc28dbac31f0b9f788, so @alexey-bataev might need
to take a look.
---------
Co-authored-by: Ravi Narayanaswamy <ravi.narayanaswamy@intel.com>
Co-authored-by: Johannes Doerfert <johannes@jdoerfert.de>
The compiler skips mapping of named constants (parameters) to OpenMP
target regions under the assumption that constants don't need to be
mapped. This assumption is not valid when array is accessed inside with
dynamic index. The problem can be seen with the following code:
```
module fir_lowering_check
implicit none
integer, parameter :: dp = selected_real_kind(15, 307)
real(dp), parameter :: arrays(2) = (/ 0.0, 0.0 /)
contains
subroutine test(hold)
integer, intent(in) :: hold
integer :: z
real(dp) :: temp
!$omp target teams distribute parallel do
do z = 1, 2
temp = arrays(hold)
end do
!$omp end target teams distribute parallel do
end subroutine test
end module fir_lowering_check
program main
use fir_lowering_check
implicit none
integer :: hold
hold = 1
call test(hold)
print *, "Finished"
end program main
```
It fails with the following error
`'hlfir.designate' op using value defined outside the region`
The fix is to allow mapping of constant arrays and map them as `to`.
Depends on #170578.
The fallback modifiers are currently part of OpenMP 6.1. 4/8 of the
tests check for the current bad output, with FIXME comments.
3 of these "bad" tests will be fixed with the 4th PR in this stack with
the `fb_nullify` codegen changes.
4th bad test will need a follow-up fix to privatization of byref
`use_device_ptr` operands.
Dependent PR: #173931.
PR #144635 enabled non-contiguous updates for both `update from` and
`update to` clauses, but tests for `update to` were missing. This PR
adds those missing tests to ensure coverage.
As per OpenMP 5.1, we need to assume that when the lookup for
`use_device_ptr/addr` fails, the incoming pointer was already device
accessible.
Prior to 5.1, a lookup-failure meant a user-error (for
`use_device_ptr`),
so we could do anything in that scenario. For `use_device_addr`,
it was always incorrect to set the address to null.
OpenMP 6.1 adds a way to retain the previous behavior of nullifying a
pointer
when the lookup fails. That will be tackled by the PR stack
starting with https://github.com/llvm/llvm-project/pull/169603.
Summary:
This is only really meaningful for the NVPTX target. Not all build
environments support host LTO and these are redundant tests, just clean
this up and make it run faster.
The following test was triggering a runtime crash **on the host before
launching the kernel**:
```fortran
program test_omp_target_map_bug_v5
implicit none
type nested_type
real, allocatable :: alloc_field(:)
end type nested_type
type nesting_type
integer :: int_field
type(nested_type) :: derived_field
end type nesting_type
type(nesting_type) :: config
allocate(config%derived_field%alloc_field(1))
!$OMP TARGET ENTER DATA MAP(TO:config, config%derived_field%alloc_field)
!$OMP TARGET
config%derived_field%alloc_field(1) = 1.0
!$OMP END TARGET
deallocate(config%derived_field%alloc_field)
end program test_omp_target_map_bug_v5
```
In particular, the runtime was producing a segmentation fault when the
test is compiled with any optimization level > 0; if you compile with
-O0 the sample ran fine.
After debugging the runtime, it turned out the crash was happening at
the point where the runtime calls the default mapper emitted by the
compiler for `nesting_type; in particular at this point in the runtime:
c62cd2877c/offload/libomptarget/omptarget.cpp (L307).
Bisecting the optimization pipeline using `-mllvm -opt-bisect-limit=N`,
the first pass that triggered the issue on `O1` was the `instcombine`
pass. Debugging this further, the issue narrows down to canonicalizing
`getelementptr` instructions from using struct types (in this case the
`nesting_type` in the sample above) to using addressing bytes (`i8`). In
particular, in `O0`, you would see something like this:
```llvm
define internal void @.omp_mapper._QQFnesting_type_omp_default_mapper(ptr noundef %0, ptr noundef %1, ptr noundef %2, i64 noundef %3, i64 noundef %4, ptr noundef %5) #6 {
entry:
%6 = udiv exact i64 %3, 56
%7 = getelementptr %_QFTnesting_type, ptr %2, i64 %6
....
}
```
```llvm
define internal void @.omp_mapper._QQFnesting_type_omp_default_mapper(ptr noundef %0, ptr noundef %1, ptr noundef %2, i64 noundef %3, i64 noundef %4, ptr noundef %5) #6 {
entry:
%6 = getelementptr i8, ptr %2, i64 %3
....
}
```
The `udiv exact` instruction emitted by the OMP IR Builder (see:
c62cd2877c/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp (L9154))
allows `instcombine` to assume that `%3` is divisible by the struct size
(here `56`) and, therefore, replaces the result of the division with
direct GEP on `i8` rather than the struct type.
However, the runtime was calling
`@.omp_mapper._QQFnesting_type_omp_default_mapper` not with `56` (the
proper struct size) but with `48`!
Debugging this further, I found that the size of `omp.map.info`
operation to which the default mapper is attached computes the value of
`48` because we set the map to partial (see:
c62cd2877c/flang/lib/Optimizer/OpenMP/MapInfoFinalization.cpp (L1146)
and
c62cd2877c/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp (L4501-L4512)).
However, I think this is incorrect since the emitted mapper (and
user-defined mappers in general) are defined on the whole struct type
and should never be marked as partial. Hence, the fix in this PR.
We finally got our buildbot added (to staging, at least) so we want to
start running L0 tests in CI.
We need `check-offload` to pass though, so XFAIL everything failing.
There's a couple `UNSUPPORTED` as well, those are for sporadic fails.
Also make set the `gpu` and `intelgpu` LIT variables when testing the
`spirv64-intel` triple.
We have no DeviceRTL yet so basically everything fails, but we manage to
get
```
Total Discovered Tests: 432
Unsupported : 169 (39.12%)
Passed : 67 (15.51%)
Expectedly Failed: 196 (45.37%)
```
We still don't build the level zero plugin by default and these tests
don't run unless the plugin was built, so this has no effect on most
builds.
---------
Signed-off-by: Nick Sarnie <nick.sarnie@intel.com>
While the main problem with the test is that it requires LLD, given that
it is unlikely to be testing anything meaningful for a CPU-only build,
just mark it as requiring GPU.
Fixes#100780
Signed-off-by: Michał Górny <mgorny@gentoo.org>
### 1. ElementType deduction for pointer-based array sections
Problem: Pointer-based array sections were previously ignored during
`ElementType` deduction, leading to incorrect assumptions about array
item types.
This often resulted in out-of-bounds access, as seen in the assertion
failure:
```
Assertion `idx < size()' failed.
llvm-project/llvm/include/llvm/ADT/SmallVector.h:292:
reference llvm::SmallVectorTemplateCommon<llvm::Value *>::operatorsize_type
[T = llvm::Value *]
```
Fix: Added a check in clang/lib/CodeGen/CGOpenMPRuntime.cpp to ensure
`ElementType` is correctly detected for cases involving non-contiguous
updates with a base pointer.
Impact: Resolves failures in OpenMP_VV (formerly sollve_vv) and other
offload/clang-OpenMP tests:
All tests under:
https://github.com/OpenMP-Validation-and-Verification/OpenMP_VV/tree/master/tests/5.0/target_update
test_target_update_mapper_from_discontiguous.c
test_target_update_mapper_to_discontiguous.c
test_target_update_to_discontiguous.c
test_target_update_from_discontiguous.c
### 2. Zero-dimension propagation in struct member mappings
Problem: A zero-dimension entry for struct members introduced
inconsistencies in complex mapping logic within OMPIRBuilder.cpp.
Placeholder zeros propagated to emitNonContiguousDescriptor(), breaking
reverse indexing logic and corrupting IR:
Loops assume `Dims[I] >= 1`. When `Dims[I] == 0`:
Reverse indexing still stores pointers to uninitialized allocas or
mismatched slots. Runtime interprets `ArgSizes[I]` (derived from
`Dims[I])` as dimensionality, causing size/offset calculations to
collapse to zero → results in `size=0` async copy and plugin interface
errors.
Fix: Prepend a synthetic dimension of size 1 instead of appending a
zero, preserving correctness in `targetDataUpdate()` for non-contiguous
updates.
Impact: Added dedicated test cases that previously failed on main.
These commits fix issues regarding storage of tool data within
libomptarget. Both libomp and libomptarget have been modified to
accommodate this. We differentiate between two cases depending on the
type of the target region:
- merged target regions (default, without `nowait` clause): behavior
remains unchanged, tool data is stored in the thread local
RegionInterface class within libomptarget.
- deferred target regions (using `nowait` clause): tool data is moved to
`ompt_task_info_t` struct within libomp, as `RegionInterface` is thread
local and its data is lost whenever another task is scheduled on the
thread, which happens with deferred target regions.
In the new implementation, `RegionInterface` receives pointers to
`ompt_task_info_t` within libomp which are handled transparently within
libomptarget. Thus, the problem of tool data getting lost when a thread
receives a new task is resolved: `target_data` and `target_task_data`
remain set.
Another issue was the value of `task_data` which is supposed to belong
to the generating task of the region according to the OpenMP standard,
but instead had been set to the `task_data` of the target task itself
until now.
Test cases have been added which check both of these fixes.
---------
Co-authored-by: Joachim <jenke@itc.rwth-aachen.de>
This is needed as a way to support older code that was expecting
unconditional attachment to happen for cases like:
```c
int *p;
int x;
#pragma omp targret enter data map(p) // (A)
#pragma omp target enter data map(x) // (B)
p = &x;
// By default, this does NOT attach p and x
#pragma omp target enter data map(p[0:0]) // (C)
```
When the environment variable is set, such maps, where both the pointer
and the pointee already have corresponding copies on the device, but are
not attached to one another, will be attached as-if OpenMP 6.1 TR14's
`attach(always)` map-type-modifier was specified on `(C)`.
This adds support for using `ATTACH` map-type for proper
pointer-attachment when mapping list-items that have base-pointers.
For example, for the following:
```c
int *p;
#pragma omp target enter data map(p[1:10])
```
The following maps are now emitted by clang:
```
(A)
&p[0], &p[1], 10 * sizeof(p[1]), TO | FROM
&p, &p[1], sizeof(p), ATTACH
```
Previously, the two possible maps emitted by clang were:
```
(B)
&p[0], &p[1], 10 * sizeof(p[1]), TO | FROM
(C)
&p, &p[1], 10 * sizeof(p[1]), TO | FROM | PTR_AND_OBJ
````
(B) does not perform any pointer attachment, while (C) also maps the
pointer p, both of which are incorrect.
-----
With this change, we are using ATTACH-style maps, like `(A)`, for cases
where the expression has a base-pointer. For example:
```cpp
int *p, **pp;
S *ps, **pps;
... map(p[0])
... map(p[10:20])
... map(*p)
... map(([20])p)
... map(ps->a)
... map(pps->p->a)
... map(pp[0][0])
... map(*(pp + 10)[0])
```
#### Grouping of maps based on attach base-pointers
We also group mapping of clauses with the same base decl in the order of
the increasing complexity of their base-pointers, e.g. for something
like:
```
S **spp;
map(spp[0][0], spp[0][0].a), // attach-ptr: spp[0]
map(spp[0]), // attach-ptr: spp
map(spp), // attach-ptr: N/A
```
We first map `spp`, then `spp[0]` then `spp[0][0]` and `spp[0][0].a`.
This allows us to also group "struct" allocation based on their attach
pointers. This resolves the issues of us always mapping everything from
the beginning of the symbol `spp`. Each group is mapped independently,
and at the same level, like `spp[0][0]` and its member `spp[0][0].a`, we
still get map them together as part of the same contiguous struct
`spp[0][0]`. This resolves issue #141042.
#### use_device_ptr/addr fixes
The handling of `use_device_ptr/addr` was updated to use the attach-ptr
information, and works for many cases that were failing before. It has
to be done as part of this series because otherwise, the switch from
ptr_to_obj to attach-style mapping would have caused regressions in
existing use_device_ptr/addr tests.
#### Handling of attach-pointers that are members of implicitly mapped
structs:
* When a struct member-pointer, like `p` below, is a base-pointer in a
`map` clause on a target construct (like `map(p[0:1])`, and the base of
that struct is either the `this` pointer (implicitly or explicitly), or
a struct that is implicitly mapped on that construct, we add an implicit
`map(p)` so that we don't implicitly map the full struct.
```c
struct S { int *p;
void f1() {
#pragma omp target map(p[0:1]) // Implicitly map this->p, to ensure
// that the implicit map of `this[:]` does
// not map the full struct
printf("%p %p\n", &p, p);
}
```
#### Scope for improvement:
* We may be able to compute attach-ptr expr while collecting
component-lists in Sema.
* But we cache the computation results already, and `findAttachPtrExpr`
is fairly simple, and fast.
* There may be a better way to implement semantic expr comparison.
#### Needs future work:
* Attach-style maps not yet emitted for declare mappers.
* Mapping of class member references: We are still using PTR_AND_OBJ
maps for them. We will likely need to change that to handle
`ref_ptr/ref_ptee`, and `attach` map-type-modifier on them.
* Implicit capturing of "this" needs to map the full `this[0:1]` unless
there is an explicit map on one of the members, or a map with a member
as its base-pointer.
* Implicit map added for capturing a class member pointer needs to also
add a zero-length-array-section map.
* `use_device_addr` on array-sections-on-pointers need further
improvements (documented using FIXMEs)
#### Why a large PR
While it's unfortunate that this PR has gotten large and difficult to
review, the issue is that all the functional changes have to be made
together, to prevent regressions from partially implemented changes.
For example, the changes to capturing were previously done separately
(#145454), but they would still cause stability issues in absence of
full attach-mapping. And attach-mapping needs those changes to be able
to launch kernels.
We extracted the utilities and functions, like those for finding
attach-ptrs, or comparing exprs, out as a separate NFC PR that doesn't
call those functions, just adds them (#155625). Maybe the change that
adds a new error message for use_device_addr on array-sections with
non-var base-pointers could have been extracted out too (but that would
have had to be a follow-up change in that case, and we would get
comp-fails with this PR when the erroneous case was not
caught/diagnosed).
---------
Co-authored-by: Alex Duran <alejandro.duran@intel.com>
Add support for OpenMP is_device_ptr clause for target directives.
[MLIR][OpenMP] Add OpenMPToLLVMIRTranslation support for is_device_ptr
#169367 This PR adds support for the OpenMP is_device_ptr clause in the
MLIR to LLVM IR translation for target regions. The is_device_ptr clause
allows device pointers (allocated via OpenMP runtime APIs) to be used
directly in target regions without implicit mapping.
Add support for OpenMP is_device_ptr clause for target directives.
[MLIR][OpenMP] Add OpenMPToLLVMIRTranslation support for is_device_ptr #169367
This PR adds support for the OpenMP is_device_ptr clause in the MLIR to LLVM IR translation for target regions. The is_device_ptr clause allows device pointers (allocated via OpenMP runtime APIs) to be used directly in target regions without implicit mapping.
This is a branch off of
https://github.com/llvm/llvm-project/pull/159856, in which consists of
the runtime portion of the changes required to support indirect function
and virtual function calls on an `omp target device` when the virtual
class / indirect function is mapped to the device from the host.
Key Changes
- Introduced a new flag OMP_DECLARE_TARGET_INDIRECT_VTABLE to mark
VTable registrations
- Modified setupIndirectCallTable to support both VTable entries and
indirect function pointers
Details:
The setupIndirectCallTable implementation was modified to support this
registration type by retrieving the first address of the VTable and
inferring the remaining data needed to build the indirect call table.
Since the Vtables / Classes registered as indirect can be larger than 8
bytes, and the vtables may not be at the first address we either need to
pass the size to __llvm_omp_indirect_call_lookup and have a check at
each step of the binary search, or add multiple entries to the indirect
table for each address registered. The latter was chosen.
Commit: a00def3f20e166d4fb9328e6f0bc0742cd0afa31 is not a part of this
PR and is handled / reviewed in:
https://github.com/llvm/llvm-project/pull/159856,
This is PR (2/3)
Register Vtable PR (1/3):
https://github.com/llvm/llvm-project/pull/159856,
Codegen / _llvm_omp_indirect_call_lookup PR (3/3):
https://github.com/llvm/llvm-project/pull/159857
These are C tests, not C++, so no function parameters means unspecified
number of parameters, not `void`.
These compile fine on the current tested offload targets because an
error is only
[thrown](https://github.com/llvm/llvm-project/blob/main/clang/lib/Sema/SemaDecl.cpp#L10695)
if the calling convention doesn't support variadic arguments, which they
happen to.
When compiling this test for other targets that do not support variadic
arguments, we get an error, which does not seem intentional.
Just add `void` to the parameter list.
---------
Signed-off-by: Nick Sarnie <nick.sarnie@intel.com>
This patch add support for lowering of custom reductions to MLIR. It
also enhances the capability of the pass to automatically mark functions
as "declare target" by traversing custom reduction initializers and
combiners.
While the infrastructure for declare target to/enter and link for
variables exists in the MLIR dialect and at the Flang level, the current
lowering from MLIR -> LLVM IR isn't in place, it's only in place for
variables that have the link clause applied.
This PR aims to extend that lowering to an initial implementation that
incorporates declare target to as well, which primarily requires changes
in the OpenMPToLLVMIRTranslation phase. However, a minor addition to the
OpenMP dialect was required to extend the declare target enumerator to
include a default None field as well.
This also requires a minor change to the Flang lowering's
MapInfoFinlization.cpp pass to alter the map type for descriptors to
deal with cases where a variable is marked declare to. Currently, when a
descriptor variable is mapped declare target to the descriptor component
can become attatched, and cannot be updated, this results in issues when
an unusual allocation range is specified (effectively an off-by X
error). The current solution is to map the descriptor always, as we
always require an up-to-date version of this data. However, this also
requires an interlinked PR that adds a more intricate type of mapping of
structures/record types that clang currently implements, to circumvent
the overwriting of the pointer in the descriptor.
3/3 required PRs to enable declare target to mapping, this PR should
pass all tests and provide an all green CI.
Co-authored-by: Raghu Maddhipatla raghu.maddhipatla@amd.com
This PR introduces a new additional type of map lowering for record
types that Clang currently supports, in which a user can map a top-level
record type and then individual members with different mapping,
effectively creating a sort of "overlapping" mapping that we attempt to
cut around.
This is currently most predominantly used in Fortran, when mapping
descriptors and there data, we map the descriptor and its data with
separate map modifiers and "cut around" the pointer data, so that wedo
not overwrite it unless the runtime deems it a neccesary action based on
its reference counting mechanism. However, it is a mechanism that will
come in handy/trigger when a user explitily maps a record type (derived
type or structure) and then explicitly maps a member with a different
map type.
These additions were predominantly in the OpenMPToLLVMIRTranslation.cpp
file and phase, however, one Flang test that checks end-to-end IR
compilation (as far as we care for now at least) was altered.
2/3 required PRs to enable declare target to mapping, should look at PR
3/3 to check for full green passes (this one will fail a number due to
some dependencies).
Co-authored-by: Raghu Maddhipatla raghu.maddhipatla@amd.com
Post-commit fix of #164794 reported at
https://github.com/llvm/llvm-project/pull/164794#issuecomment-3536253493
`LLVM_LIBRARY_OUTPUT_INTDIR` and `LLVM_RUNTIME_OUTPUT_INTDIR` is used by
`AddLLVM.cmake` as output directories. Unless we are in a
bootstrapping-build, It must not point to directories found by
`find_package(LLVM)` which may be read-only directories. MLIR for
instance sets thesese variables to its own build output
directory, so should the runtimes.