Converting the current bot config to use the CMake cache file that we
use in other bots (offload/cmake/caches/AMDGPUBot.cmake). This PR
removes all CMake settings that the cache file already sets and only
leaves those that were either not set explicitly or which differ.
Thus, first load the cache file and then adjust the settings to override
existing values.
### Issue: Dimension override missing
When variable count expressions were used with stride, the constant
subsection path computed size first. This marked `ArgSizes` with byte
size semantics. Variable expression logic later triggered, but reused
`ArgSizes` assuming "bytes" semantics
`OMPIRBuilder.cpp` didn't handle dimension count for
`OMP_MAP_NON_CONTIG` flag
**Result**: `ArgSizes` wasn't overwritten with dimension count, breaking
non-contiguous mapping.
**Fixes**:
`llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp` - Expression semantics for
non-contiguous.
stride/count.
Generate 3D descriptor structures with runtime dimensions.
Fix dimension override to use dimension count instead of byte size.
Added testcases to cover stack arrays, heap pointers, struct members,
etc.
- Accept OffloadBinaries as valid images by plugins that support them in
the PluginInterface.
- Add support in L0 plugin to extract SPIRV images and their associated
metadata from an OffloadBinary image.
Depends on:
- #185663
Follow-up PRs:
- #185413 (Changes SPIRV wrapper generation to use OffloadBinary)
- #185425 (Adjusts llvm-objdump)
- #184774 (Adjusts llvm-offload-binary)
Summary:
As suggested in https://github.com/llvm/llvm-project/pull/177665, we
should build a GPU version of the compiler-rt profile library instead of
writing it in-line in the lowering. This PR does not define anything GPU
specific, it simply re-uses the baremetal handling. Later PRs will
prevent the GPU specific handling we would want to do to optimize
counter handling on the GPU.
Note that this will require using the cache file, or setting these
options
manually for existing users. Hopefully if people are using the cache
file
as they should it won't break anything.
As discussed in #185404 we might want to provide a way for plugins to
validate images not recognized by the common layer.
This PR adds such extension and uses it to validate pure SPIRV images by
the Level Zero plugin.
Summary:
This was a regression from the original LLVM-gpu-loader. We used to
handle `-mwavefrontsize64` correctly in the loader by over-allocating
memory and just leaving the upper 32-bits masked off. In order to handle
this in offload we need to scan loaded kernels to see how much memory we
need to allocate. This should be safe, the protocol is designed to
handle an arbitrary size and worst-case this just wastes space.
This fixes a bug in USM mode where the `close` map type modifer was
attached to some `map.info.op`'s corresponding to user-defined type
members while the parent type instance itself is not marked as `close`.
This fix ensures that if a parent record type map does not have the
'close' flag, it is cleared from its members as well, maintaining
consistency.
Gemini was used to create tests. AI generated test code was reviewed
line-by-line by me. Which were derived from a reproducer I was working
with to debug the issue.
Assisted-by: Gemini <gemini@google.com>
This patch implements the CodeGen logic for calling __llvm_omp_indirect_call_lookup
on the device when an indirect function call or a virtual function call is made
within an OpenMP target region.
---------
Co-authored-by: Youngsuk Kim
This patch implements the CodeGen logic for calling __llvm_omp_indirect_call_lookup
on the device when an indirect function call or a virtual function call is made
within an OpenMP target region.
---------
Co-authored-by: Youngsuk Kim
Some tests that were checking for prints inside/outside `target` regions
needed to be updated to work on systems where the ordering wasn't
deterministic.
Reverts llvm/llvm-project#184240
Original description from #165494:
-----
OpenMP allows cases like the following:
```c
int *p1, *p2, x;
p1 = p2 = &x;
...
#pragma omp target_exit_data map(delete: p1[:]) from(p2[0])
```
Which means, when the runtime encounters the `from` entry, the ref-count
may
not be zero, but it will go down to zero at the end of the current
construct,
which should cause the "from" transfer to happen.
Similarly, a user may have:
```c
struct S {
int *p;
};
#pragma omp declare_mapper (id1: S s) map(s.p) map(present, alloc: s.p[0:10])
#pragma omp declare_mapper (id2: S s) map(s.p, s.p[0:10])
S s1;
// present-check should fail here
#pragma omp target_enter_data map(alloc: s.p[0:10]) map(mapper(id1), to: s)
// "to" should be honored here
#pragma omp target_enter_data map(alloc: s.p[0:10]) map(mapper(id2), to: s)
```
Where the allocation happens before the "to" entry is encountered by the
runtime. Or, an allocation happens before a "present" entry is
encountered.
To handle cases like this, we need to use the state information of
previously
seen new allocations, deletions, "from" entries, when honoring
`to`/`from`/`present` map entries.
-----
OpenMP allows cases like the following:
```c
int *p1, *p2, x;
p1 = p2 = &x;
...
#pragma omp target_exit_data map(delete: p1[:]) from(p2[0])
```
Which means, when the runtime encounters the `from` entry, the ref-count
may not be zero, but it will go down to zero at the end of the current
construct, which should cause the "from" transfer to happen.
Similarly, a user may have:
```c
struct S {
int *p;
};
#pragma omp declare_mapper (id1: S s) map(s.p) map(present, alloc: s.p[0:10])
#pragma omp declare_mapper (id2: S s) map(s.p, s.p[0:10])
S s1;
// present-check should fail here
#pragma omp target_enter_data map(alloc: s.p[0:10]) map(mapper(id1), to: s)
// "to" should be honored here
#pragma omp target_enter_data map(alloc: s.p[0:10]) map(mapper(id2), to: s)
```
Where the allocation happens before the "to" entry is encountered by the
runtime. Or, an allocation happens before a "present" entry is
encountered.
To handle cases like this, we need to use the state information of
previously seen new allocations, deletions, "from" entries, when
honoring `to`/`from`/`present` map entries.
Implement support for the OpenMP `mapper` modifier on `target update` `to` and
`from` clauses in Flang.
Semantic name resolution is extended to bind the mapper symbol for
`OmpClause::To` and `OmpClause::From` via a shared `ResolveMapperModifier`
helper. Lowering is extended in `ClauseProcessor` with a `getMapperIdentifier`
template helper to extract the mapper name for both `map` and `target update`
clauses and forward it to `omp.map_info`.
Fixes#168701.
Reviewed By: TIFitis, kparzysz
Assited By: Copilot( For review and articulations of messages)
Summary:
This enables primarily `stop.cpp` and `descriptor.cpp`. Requires a
little bit of wrangling to get it to compile. Unlike the CUDA build,
this build uses an in-tree libc++ configured for the GPU. This is
configured without thread support, environment, or filesystem, and it is
not POSIX at all. So, no mutexes, pthreads, or get/setenv.
I tested stop, but i don't know if it's actually legal to exit from
OpenMP offloading.
`spirv-link` seems to internalize all symbols, which ends up causing the
OpenMP Device Environment global generated by the OMP FE to get
optimized out which causes `liboffload` to run in the wrong
parallelization mode which breaks at least one liboffload lit test.
Pass `--create-library` to tell it not to do that.
```
--create-library
Link the binaries into a library, keeping all exported symbols.
```
This fixes the test.
Closes: https://github.com/llvm/llvm-project/issues/182901
Signed-off-by: Nick Sarnie <nick.sarnie@intel.com>
Summary:
Expands on the previous support to enable formatted output, characters,
and checking basic iostat. We intentionally do not handle cases where
the descriptor is non-null as this is a non-trivial class that cannot
easily be shepherded across the wire.
Summary:
This PR provides the minimal support for Fortran I/O coming from a GPU
in OpenMP offloading. We use the same support the `libc` uses for its
printing through the RPC server. The helper functions `rpc::dispatch`
and `rpc::invoke` help make this mostly automatic.
Becaus Fortran I/O is not reentrant, the vast majority of complexity
comes from needing to stitch together calls from the GPU until they can
be executed all at once. This is needed not only because of the
limitations of recursive I/O, but without this the output would all be
interleaved because of the GPU's lock-step execution.
As such, the return values from the intermediate functions are
meaningless, all returning true. The final value is correct however. For
cookies we create a context pointer on the server to chain these
together.
Works on both my AMD and NVIDIA GPUs.
```fortran
program hello_gpu
implicit none
!$omp target teams num_teams(1)
!$omp parallel num_threads(2)
! Print strings
print *, "Hello from GPU"
!$omp end parallel
!$omp end target teams
end program hello_gpu
```
```console
> flang hello.f90 -O2 -fopenmp --offload-arch=gfx1030
> ./a.out
Hello from GPU
Hello from GPU
> flang hello.f90 -O2 -fopenmp --offload-arch=sm_89
> ./a.out
Hello from GPU
Hello from GPU
```
Since we can now build the DeviceRTL with SPIR-V, redo the
`XFAIL/UNSUPPORTED` specifications for the tests we see passing/failing
on the Level Zero backend with the DeviceRTL being used.
The tests marked `UNSUPPORTED` hang or sporadically fail and those are
tracked in https://github.com/llvm/llvm-project/issues/182119.
This change will allow us to enable CI testing with the DeviceRTL.
Here are the full test results with this change applied, running only
the `spirv64-intel` `check-offload` tests:
```
Total Discovered Tests: 453
Unsupported : 206 (45.47%)
Passed : 141 (31.13%)
Expectedly Failed: 106 (23.40%)
```
31% is not a bad start.
---------
Signed-off-by: Nick Sarnie <nick.sarnie@intel.com>
This PR adds extends liboffload olMemRegister API to handle a case when
a memory block may have been mapped before calling olMemRegister to
support some use cases in libomptarget
Summary:
This PR adds a pointer argument to the initialization routine to be used
for global options. Right now this is used to allow the user to
constrain which backends they wish to use.
If a null argument is passed, the same behavior as before is observed.
This is epxected to be extensible by forcing the user to encode the size
of the struct. So, old executables will encode which fields they have
access to.
We use a macro helper to get this struct rather than a runtime call so
that the current state of the size is baked into the executable rather
than something looked up by the runtime. Otherwise it would just return
the size that the (potentially newer) runtime would see
The change to `llvm-zorg` to start building the DeviceRTL for SPIR-V on
our builder finally got taken by the infra, so we can merge this now.
---------
Co-authored-by: Joseph Huber <huberjn@outlook.com>
In a bootstrapping build with LLVM_ENABLE_PROJECTS=lld, the lld
executable will be found in LLVM's bin/ directory. But `check-offload`
will currently ignore it because the JIT plugin will look for `lld` in
`$PATH`. Similarly, the sanitizer tests require llvm-symbolizer during
execution to FileCheck the expected stack trace.
Add the llvm tools directory to `$PATH` so tests can use lld and
llvm-symbolizer in it. It should be prefered over a system-installed
lld/llvm-symbolizer.
Fixes the JIT and sanitizer tests of
[openmp-offload-amdgpu-clang-flang](https://lab.llvm.org/staging/#/builders/105).
Summary:
Closing ports was previously done manually, This makes the protocol more
error prone as unclosed ports will leak and eventually the locks will
run out. I believe the original fear was that the RAII portion would
negatively impact code generation but I have not noticed anything
significant.
The purpose of this PR is to add support of host as an offloading device
to liboffload. Both OpenMP and sycl support offloading to a host as
their normal workflow and therefore would require such capability from
liboffload library.
Summary:
This change allows lambdas to be used in the RPC dispatching functions.
Just requires an extra function trait to convert a lambda with no
captures into a function pointer. Also rearranged where the `Port` lives
because it looks better no that we may use a lambda and it's more
consistent with the dispatch usage (putting the client at the start).
The default Level Zero loader `libze_loader.so` may not be available on
systems that don't have Level Zero development package. Level Zero
loaders with major version suffix are searched in that case.
Summary:
The RPC interface is useful for forwarding functions. This PR adds
helper functions for doing a completely bare forwarding of a function
from the client to the server. This is intended to facilitate
heterogenous libraries that implement host functions on the GPU (like
MPI or Fortran).
Changes to make host plugin compile on Windows:
* Change IO code to be portable
* Adjust Makefiles
Allow plugin to work partially when libffi support is not found
dynamically (compilation works fine even on Windows because of the
wrapper support).
Allow a to define a set of Types that are not shown by default when
doing default debug loggin (e.g., LIBOMPTARGET_DEBUG=All).
Users can enable output of those types of messages by explicitly adding
them to LIBOMPTARGET_DEBUG.
Used to implement: #180545
---------
Co-authored-by: Michael Klemm <michael.klemm@amd.com>
I'll merge this at the same time as some llvm-zorg changes that start
building the DeviceRTL.
We only see one new test passing because everything still fails because
of the issue described in
https://github.com/llvm/llvm-project/pull/178980
Once a fix for that issue is merged we will see many new passes.