Fix some tests causing hangs, one fail, and a few XPASSing. We are
seeing new passes/fails because of the named barrier changes being
merged.
Signed-off-by: Nick Sarnie <nick.sarnie@intel.com>
This change adds implementation for named barriers for SPIRV backend.
Since there is no built in API/intrinsics for named barrier in SPIRV,
the implementation loosely follows implementation for AMD
This change improves handling of errors during synchronization in Level
Zero plugin by ensuring cleanup of queues and events in case of an
synchronization error. As a result multiple tests stopped hanging.
---------
Co-authored-by: Duran, Alex <alejandro.duran@intel.com>
This commit removes the `LIBOMPTARGET_SHARED_MEMORY_SIZE` envar and
outputs a runtime warning if it is defined. Access to dynamic shared memory
should be obtained through the `dyn_groupprivate` clause (OpenMP 6.1) or
the launch arguments in liboffload kernel launch.
This patch implements the CodeGen logic for calling __llvm_omp_indirect_call_lookup
on the device when an indirect function call or a virtual function call is made
within an OpenMP target region.
---------
Co-authored-by: Youngsuk Kim
This patch implements the CodeGen logic for calling __llvm_omp_indirect_call_lookup
on the device when an indirect function call or a virtual function call is made
within an OpenMP target region.
---------
Co-authored-by: Youngsuk Kim
Since we can now build the DeviceRTL with SPIR-V, redo the
`XFAIL/UNSUPPORTED` specifications for the tests we see passing/failing
on the Level Zero backend with the DeviceRTL being used.
The tests marked `UNSUPPORTED` hang or sporadically fail and those are
tracked in https://github.com/llvm/llvm-project/issues/182119.
This change will allow us to enable CI testing with the DeviceRTL.
Here are the full test results with this change applied, running only
the `spirv64-intel` `check-offload` tests:
```
Total Discovered Tests: 453
Unsupported : 206 (45.47%)
Passed : 141 (31.13%)
Expectedly Failed: 106 (23.40%)
```
31% is not a bad start.
---------
Signed-off-by: Nick Sarnie <nick.sarnie@intel.com>
We finally got our buildbot added (to staging, at least) so we want to
start running L0 tests in CI.
We need `check-offload` to pass though, so XFAIL everything failing.
There's a couple `UNSUPPORTED` as well, those are for sporadic fails.
Also make set the `gpu` and `intelgpu` LIT variables when testing the
`spirv64-intel` triple.
We have no DeviceRTL yet so basically everything fails, but we manage to
get
```
Total Discovered Tests: 432
Unsupported : 169 (39.12%)
Passed : 67 (15.51%)
Expectedly Failed: 196 (45.37%)
```
We still don't build the level zero plugin by default and these tests
don't run unless the plugin was built, so this has no effect on most
builds.
---------
Signed-off-by: Nick Sarnie <nick.sarnie@intel.com>
This is a branch off of
https://github.com/llvm/llvm-project/pull/159856, in which consists of
the runtime portion of the changes required to support indirect function
and virtual function calls on an `omp target device` when the virtual
class / indirect function is mapped to the device from the host.
Key Changes
- Introduced a new flag OMP_DECLARE_TARGET_INDIRECT_VTABLE to mark
VTable registrations
- Modified setupIndirectCallTable to support both VTable entries and
indirect function pointers
Details:
The setupIndirectCallTable implementation was modified to support this
registration type by retrieving the first address of the VTable and
inferring the remaining data needed to build the indirect call table.
Since the Vtables / Classes registered as indirect can be larger than 8
bytes, and the vtables may not be at the first address we either need to
pass the size to __llvm_omp_indirect_call_lookup and have a check at
each step of the binary search, or add multiple entries to the indirect
table for each address registered. The latter was chosen.
Commit: a00def3f20e166d4fb9328e6f0bc0742cd0afa31 is not a part of this
PR and is handled / reviewed in:
https://github.com/llvm/llvm-project/pull/159856,
This is PR (2/3)
Register Vtable PR (1/3):
https://github.com/llvm/llvm-project/pull/159856,
Codegen / _llvm_omp_indirect_call_lookup PR (3/3):
https://github.com/llvm/llvm-project/pull/159857
Summary:
`malloc(0)` and `free(nullptr)` are both defined by the standard but we
current trigger erros and assertions on them. Fix that so this works
with empty arguments.
Summary:
Previously, we removed the special handling for the code object version
global. I erroneously thought that this meant we cold get rid of this
weird `-Xclang` option. However, this also emits an LLVM IR module flag,
which will then cause linking issues.
Summary:
This patch adds an RPC interface that lives directly in the OpenMP
device runtime. This allows OpenMP to implement custom opcodes.
Currently this is only providing the host call interface, which is the
raw version of reverse offloading. Previously this lived in `libc/` as
an extension which is not the correct place.
The interface here uses a weak symbol for the RPC client by the same
name that the `libc` interface uses. This means that it will defer to
the libc one if both are present so we don't need to set up multiple
instances.
The presense of this symbol is what controls whether or not we set up
the RPC server. Because this is an external symbol it normally won't be
optimized out, so there's a special pass in OpenMPOpt that deletes this
symbol if it is unused during linking. That means at `O0` the RPC server
will always be present now, but will be removed trivially if it's not
used at O1 and higher.
It appears that the RUNTIMES build prefers the x86-64-unknown-linux-gnu
triple notation for the host. This fixes runtime / test breakages when
compiler-rt is used as the CLANG_DEFAULT_RTLIB.
Summary:
Currently, we assign this to private memory. This causes failures on
some SOLLVE tests. The standard isn't clear on the semantics of this
allocation type, but there seems to be a consensus that it's supposed to
be shared memory.
Summary:
The 'omp_alloc' function should be callable from a target region. This
patch implemets it by simply calling `malloc` for every non-default
trait value allocator. All the special access modifiers are
unimplemented and return null. The null allocator returns null as the
spec states it should not be usable from the target.
Summary;
Now that we use the linker to do LTO / device linking, we need to inform
the `clang` invocation to use `-flto` so it forwards arguments like
`-On` correctly.
Many tests in the `offload` project have requirements defined by which
targets are not supported rather than which platforms are supported.
This patch aims to streamline the requirement definitions by adding four
new feature tags: `host`, `gpu`, `amdgpu`, and `nvidiagpu`.
In a nutshell, this moves our libomptarget code to populate the offload
subproject.
With this commit, users need to enable the new LLVM/Offload subproject
as a runtime in their cmake configuration.
No further changes are expected for downstream code.
Tests and other components still depend on OpenMP and have also not been
renamed. The results below are for a build in which OpenMP and Offload
are enabled runtimes. In addition to the pure `git mv`, we needed to
adjust some CMake files. Nothing is intended to change semantics.
```
ninja check-offload
```
Works with the X86 and AMDGPU offload tests
```
ninja check-openmp
```
Still works but doesn't build offload tests anymore.
```
ls install/lib
```
Shows all expected libraries, incl.
- `libomptarget.devicertl.a`
- `libomptarget-nvptx-sm_90.bc`
- `libomptarget.rtl.amdgpu.so` -> `libomptarget.rtl.amdgpu.so.18git`
- `libomptarget.so` -> `libomptarget.so.18git`
Fixes: https://github.com/llvm/llvm-project/issues/75124
---------
Co-authored-by: Saiyedul Islam <Saiyedul.Islam@amd.com>