454 Commits

Author SHA1 Message Date
Joseph Huber
580860e8b7
[OpenMP][NFC] Clean up a bunch of warnings and clang-tidy messages (#159831)
Summary:
I made the GPU flags accept more of the default LLVM warnings, which
triggered some new cases. Clean those up and fix some other ones while
I'm at it.
2025-09-19 14:09:33 -05:00
Akash Banerjee
8afea0d0ea
[OpenMP][MLIR] Preserve to/from flags in mapper base entry for mappers (#159799)
With declare mapper, the parent base entry was emitted as `TARGET_PARAM`
only. The mapper received a map-type without `to/from`, causing
components to degrade to `alloc`-only (no copies), breaking allocatable
payload mapping. This PR preserves the map-type bits from the parent.

This fixes #156466.
2025-09-19 19:34:09 +01:00
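A minimal sketch of the flag handling described in #159799 above. It assumes the libomptarget map-type bit values (`TO = 0x001`, `FROM = 0x002`, `TARGET_PARAM = 0x020`); the constant names and the helper `mapperBaseEntryFlags` are hypothetical and are not the code from the PR.

```cpp
#include <cstdint>

// Map-type bits; the values are assumed to mirror libomptarget's and are
// used here for illustration only.
constexpr uint64_t MAP_TO = 0x001;
constexpr uint64_t MAP_FROM = 0x002;
constexpr uint64_t MAP_TARGET_PARAM = 0x020;

// Hypothetical helper: compute the flags of the parent base entry handed to
// a declare mapper. Emitting TARGET_PARAM alone would drop the copy
// direction, so the to/from bits of the parent map-type are carried over.
constexpr uint64_t mapperBaseEntryFlags(uint64_t ParentMapType) {
  return MAP_TARGET_PARAM | (ParentMapType & (MAP_TO | MAP_FROM));
}
```

With a `tofrom` parent this yields `0x023` rather than `0x020`, so the mapper's components are no longer reduced to `alloc`-only.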
Joseph Huber
51e3c3d51b
[Offload] Implement 'olIsValidBinary' in offload and clean up (#159658)
Summary:
This exposes the 'isDeviceCompatible' routine for checking if a binary
*can* be loaded. This is useful if people don't want to consume errors
everywhere when figuring out which image to put to what device.

I don't know if this is a good name, I was thinking like `olIsCompatible`
or whatever. Let me know what you think.

Long term I'd like to be able to do something similar to what OpenMP
does where we can conditionally only initialize devices if we need them.
That support is going to be needed if we want this to be more
generic.
2025-09-19 12:15:57 -05:00
Ross Brunton
f334ac6665
[Offload] Include product name in llvm-offload-device-info (#159384) 2025-09-18 12:22:13 +01:00
Joseph Huber
dffd7f3d9a
[LLVM] Fix offload and update CUDA ABI for all SM values (#159354)
Summary:
Turns out the new CUDA ABI now applies retroactively to all the other
SMs if you upgrade to CUDA 13.0. This patch changes the scheme, keeping
all the SM flags consistent but using an offset.

Fixes: https://github.com/llvm/llvm-project/issues/159088
2025-09-17 14:39:39 -05:00
Kareem Ergawy
c286a427b9
[NFC][flang][do concurrent] Add saxpy offload tests for OpenMP mapping (#155993)
Adds end-to-end tests for `do concurrent` offloading to the device.


PR stack:
- https://github.com/llvm/llvm-project/pull/155754
- https://github.com/llvm/llvm-project/pull/155987
- https://github.com/llvm/llvm-project/pull/155992
- https://github.com/llvm/llvm-project/pull/155993 ◀️
- https://github.com/llvm/llvm-project/pull/157638
- https://github.com/llvm/llvm-project/pull/156610
- https://github.com/llvm/llvm-project/pull/156837
2025-09-17 07:04:13 +02:00
Nick Sarnie
f74583fbe8
[offload] Fix build with debug libomptarget (#159144)
Currently we get this error:
```
offload/plugins-nextgen/common/src/PluginInterface.cpp:859:63: error: member reference type 'StringRef' is not a pointer; did you mean to use '.'?
```

We pass the full image binary now so we can't really print anything
useful here.

Seems introduced in https://github.com/llvm/llvm-project/pull/158748.

---------

Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
Co-authored-by: Joseph Huber <huberjn@outlook.com>
2025-09-16 18:40:02 +00:00
Joseph Huber
e7101dac9c
[Offload] Copy loaded images into managed storage (#158748)
Summary:
Currently we have this `__tgt_device_image` indirection which just takes
a reference to some pointers. This was all fine and good when the only
usage of this was from a section of GPU code that came from an ELF
constant section. However, we have expanded beyond that and now need to
worry about managing lifetimes. We have code that references the image
even after it was loaded internally. This patch changes the
implementation to instead copy the memory buffer and manage it locally.

This PR reworks the JIT and other image handling to directly manage its
own memory. We now don't need to duplicate this behavior externally at
the Offload API level. Also we actually free these if the user unloads
them.

Upside, less likely to crash and burn. Downside, more latency when
loading an image.
2025-09-16 08:57:28 -05:00
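The ownership change described in #158748 above can be sketched as follows. This is an illustration under stated assumptions, not the plugin code: `ImageRef`, `ManagedImage`, and `loadImage` are hypothetical names standing in for a caller-owned pointer pair and a runtime-owned copy.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for a caller-owned image: two pointers into memory
// whose lifetime the runtime does not control.
struct ImageRef {
  const char *Start;
  const char *End;
};

// Hypothetical runtime-side image that owns a private copy of the bytes, so
// it stays valid even if the caller frees or reuses the original buffer.
class ManagedImage {
  std::vector<char> Storage;

public:
  explicit ManagedImage(const ImageRef &Ref) : Storage(Ref.Start, Ref.End) {}
  const char *data() const { return Storage.data(); }
  std::size_t size() const { return Storage.size(); }
};

// Loading pays for one copy (the extra latency mentioned above). Unloading
// is simply destroying the returned object, which frees the copy.
std::unique_ptr<ManagedImage> loadImage(const ImageRef &Ref) {
  assert(Ref.Start && Ref.Start <= Ref.End && "malformed image range");
  return std::make_unique<ManagedImage>(Ref);
}
```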
Jan Patrick Lehr
311d78f2a1
[OpenMP] Fix force-usm test after #157182 (#159095)
The refactoring led to an additional data transfer. This changes the
assumed transfers in the check-strings to work with that changed
behavior.
2025-09-16 15:42:02 +02:00
Ross Brunton
c5474cdc27
[Offload] Make ASSERT_ERROR output more readable (#157653) 2025-09-16 12:04:53 +01:00
Abhinav Gaba
5af3fa81cc
[Offload][OpenMP] Support shadow-pointer tracking for Fortran descriptors. (#158370)
This change adds support for saving full contents of attached Fortran
descriptors, and not just their pointee address, in the shadow-pointer
table.

With this, we now support:
* comparing full contents of descriptors to check whether a previous
shadow-pointer entry is stale;
* restoring the full contents of descriptors

And with that, we can now use ATTACH map-types (added in #149036) for
mapping Fortran pointer/allocatable arrays, and array-sections on them.
e.g.:

```f90
  integer, allocatable :: x(:)
  !$omp target enter data map(to: x(:))
```

as:

```
  void* addr_of_pointee = allocated(x) ? &x(1) : nullptr;
  int64_t sizeof_pointee = allocated(x) ? sizeof(x(:)) : 0;

  addr_of_pointee,    addr_of_pointee, sizeof_pointee,     TO
  addr_of_descriptor, addr_of_pointee, size_of_descriptor, ATTACH
```
2025-09-15 10:37:38 -07:00
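A rough sketch of what "saving full contents" of a descriptor, as in #158370 above, buys over saving only the pointee address. The `ShadowEntry` layout and the three helpers are hypothetical; the real shadow-pointer table in libomptarget is organized differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical shadow entry: a byte-for-byte snapshot of the host descriptor
// taken when the pointer was attached.
struct ShadowEntry {
  void *HostDescriptor;
  std::vector<unsigned char> Snapshot;
};

// Record the whole descriptor, not just its base address.
ShadowEntry recordShadow(void *Descriptor, std::size_t Size) {
  assert(Descriptor && Size > 0 && "descriptor must be non-empty");
  auto *Bytes = static_cast<unsigned char *>(Descriptor);
  return {Descriptor, std::vector<unsigned char>(Bytes, Bytes + Size)};
}

// An entry is stale if any field (base address, extents, strides, ...) has
// changed since the snapshot, not only the pointee address.
bool isStale(const ShadowEntry &E) {
  return std::memcmp(E.HostDescriptor, E.Snapshot.data(),
                     E.Snapshot.size()) != 0;
}

// Restore every field of the host descriptor from the snapshot.
void restore(const ShadowEntry &E) {
  std::memcpy(E.HostDescriptor, E.Snapshot.data(), E.Snapshot.size());
}
```

Comparing raw bytes is a simplification: it treats any padding inside the descriptor as significant.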
Michał Górny
312b5615df
[offload] Fix finding libomptarget in runtimes build (#157856)
Per the logic in top-level CMakeLists, `libomptarget` is placed into
`LLVM_LIBRARY_OUTPUT_INTDIR` when this variable is set. Adjust the test
logic to include this directory in `-L` and `-Wl,-rpath` arguments as
well, in order to fix finding tests when building via the `runtimes`
top-level directory.

Signed-off-by: Michał Górny <mgorny@gentoo.org>
2025-09-10 16:31:22 +02:00
agozillon
8f16af3c20
[Flang][OpenMP] Fix mapping of character type with LEN > 1 specified (#154172)
Currently, there are a number of issues with mapping characters that have
a LEN specified (effectively strings). They're represented as a char type in
FIR with a len parameter, and then later on they're expanded into an
array of characters when we're translating to the LLVM dialect. However,
we don't generate bounds for these at lowering. The fix in this PR is to
generate bounds from the LEN parameter and attach them to
the map on lowering from FIR to the LLVM dialect when we encounter this
type.
2025-09-09 16:36:04 +02:00
Joseph Huber
4294907022
[Offload] Build libcxx on the GPU libc bot (#157673) 2025-09-09 09:35:53 -05:00
Ross Brunton
7731ecf259
[Offload] Skip most liboffload tests if no devices (#157417)
If there are no devices available for testing on liboffload, the test
will no longer throw an error when it fails to instantiate.

The tests will be silently skipped, but with a warning printed to
stderr.
2025-09-09 10:11:05 +01:00
Joseph Huber
6d032c4df2
[OpenMP] Fix incorrect CUDA bc path after library change (#157547) 2025-09-08 17:27:59 -05:00
Joseph Huber
5d550bf41c
[OpenMP] Move `__omp_rtl_data_environment` handling to OpenMP (#157182)
Summary:
This operation is done every time we load a binary; this behavior should
be moved into OpenMP since it concerns an OpenMP specific data struct.
This is a little messy, because ideally we should only be using public
APIs, but more can be extracted later.
2025-09-08 09:58:38 -05:00
Joseph Huber
3f3f7d1fd9
[Offload] Build the OpenMP device library with the AMDGPU libc bot
Summary:
This is missing because I forgot to add it.
2025-09-08 08:36:18 -05:00
Michał Górny
6343c9bbdf
[offload] Permit redefining OPENMP_STANDALONE_BUILD (#157253)
Permit redefining `OPENMP_STANDALONE_BUILD` to make it possible to build
offload correctly via runtimes build (i.e. build where the top-level
project is `runtimes`). This follows the same logic in `openmp`
component.

Signed-off-by: Michał Górny <mgorny@gentoo.org>
2025-09-08 15:16:02 +02:00
Joseph Huber
be6f110bc0
[OpenMP] Change build of OpenMP device runtime to be a separate runtime (#136729)
Summary:
Currently we build the OpenMP device runtime as part of the `offload/`
project. This is problematic because it has several restrictions when
compared to the normal offloading runtime. It can only be built with an
up-to-date clang and we need to set the target appropriately. Currently
we hack around this by creating the compiler invocation manually, but
this patch moves it into a separate runtimes build.

This follows the same build we use for libc, libc++, compiler-rt, and
flang-rt. This also moves it from `offload/` into `openmp/` because it
is still the `openmp/` runtime and I feel it is more appropriate. We do
want a generic `offload/` library at some point, but it would be trivial
to then add that as a separate library now that we have the
infrastructure that makes adding these new libraries trivial.

This most importantly will require that users update their build
configs, mostly adding the following lines at a minimum. I was debating
whether or not I should 'auto-upgrade' this, but I just went with a
warning.

```
    -DLLVM_RUNTIME_TARGETS='default;amdgcn-amd-amdhsa;nvptx64-nvidia-cuda'     \
    -DRUNTIMES_nvptx64-nvidia-cuda_LLVM_ENABLE_RUNTIMES=openmp \
    -DRUNTIMES_amdgcn-amd-amdhsa_LLVM_ENABLE_RUNTIMES=openmp \
```

This also changed where the `.bc` version of the library lives, but it's
still created.
2025-09-08 07:51:52 -05:00
Julian Brown
c71da7d5e0
[OpenMP] Add tests for mapping of chained 'containing' structs (#156703)
This PR adds several new tests for mapping of chained structures, i.e.
those resembling:

  #pragma omp target map(tofrom: a->b->c)

These are currently XFAILed, although the first two tests actually work
with unified memory -- I'm not sure if it's possible to easily improve
the condition on the XFAILs in question to make them more accurate.

These cases are all fixed by the WIP PR
https://github.com/llvm/llvm-project/pull/153683.
2025-09-08 10:30:04 +01:00
Michał Górny
7a88ddd3b1
Revert "[Offload] Run unit tests as a part of check-offload" (#157346)
Reverts llvm/llvm-project#156675 due to regressions in standalone build
and test errors without all plugins enabled (#157345).
2025-09-07 15:12:15 +00:00
Jan Patrick Lehr
05aff0eb65
[Offload] Run tests 16-way parallel on AMDGPU (#156627)
Reduce the number of parallel tests run to align with the typical number
of VMIDs provided by the kernel driver.
2025-09-05 22:23:20 +02:00
Robert Imschweiler
b2ff3e780a
[OpenMP][Offload] Restore __kmpc_* function signatures (#156104)
Avoid altering existing function signatures of the kmpc interface to fix
regressions in the runtime optimization (OpenMPOpt).
2025-09-04 10:56:42 -05:00
Jan Patrick Lehr
209d91d9e4
[Offload] Fix CHECK string in llvm-omp-device-info test (#156872) 2025-09-04 14:30:37 +02:00
Ross Brunton
4e8b4d6190
[Offload] Port llvm-offload-device-info to new offload API (#155626)
This is a tool similar to urinfo that simply prints properties of all
devices. The old OpenMP version has been ported to liboffload.
2025-09-04 12:23:30 +01:00
Joseph Huber
99f61f3436
[Offload] Run unit tests as a part of check-offload (#156675)
Summary:
Add a dependency on the unit tests to the main check-offload test suite.
This matches what the other projects do: pass the directory to `llvm-lit`
to run only the lit tests, or use `check-offload-unit` to run only the
unit tests.
2025-09-03 10:26:44 -05:00
Jan Patrick Lehr
27e541645c
[Offload][OpenMP] Enable more tests on AMDGPU (#156626)
(Re)enables a couple of tests that were disabled on AMDGPU for some
reason. Pass for me locally.
2025-09-03 14:04:39 +02:00
Ross Brunton
32beea0605
[OpenMP][Offload] Mark SPMD_NO_LOOP as a valid exec mode (#155990)
This was added in #154105, but was not added to the plugin interface's
list of valid modes.
2025-09-01 11:27:24 +01:00
Ross Brunton
70ddd838f0
[Offload] Update tablegen tests (#156041)
These were not updated after #154736.
2025-08-29 16:20:49 +01:00
Ross Brunton
ffb756dff2
[Offload] Add OL_DEVICE_INFO_MAX_WORK_SIZE[_PER_DIMENSION] (#155823)
This is the total number of work items that the device supports (the
equivalent work group properties are for only a single work group).
2025-08-29 09:39:18 +01:00
Ross Brunton
9e5d8bd3d1
[Offload] Improve olDestroyQueue logic (#153041)
Previously, `olDestroyQueue` would not actually destroy the queue,
instead leaving it for the device to clean up when it was destroyed.
Now, the queue is either released immediately if it is complete or put
into a list of "pending" queues if it is not. Whenever we create a new
queue, we check this list to see if any are now completed. If there are
any we release their resources and use them instead of pulling from
the pool.

This prevents long running programs that create and drop many queues
without syncing them from leaking memory all over the place.
2025-08-29 09:39:00 +01:00
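The destroy-and-recycle scheme described in #153041 above can be sketched like this. Everything here is hypothetical: `Queue::Complete` stands in for asking the backend whether all submitted work has finished, and `std::make_unique` stands in for pulling a queue from the pool.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical queue object.
struct Queue {
  bool Complete = true; // stand-in for a backend completion query
  int Generation = 0;   // bumped each time the resources are reused
};

class QueueManager {
  std::vector<std::unique_ptr<Queue>> Pending;

public:
  // Destroy: release immediately if all work is done, otherwise park the
  // queue on the pending list instead of leaving it until device teardown.
  void destroy(std::unique_ptr<Queue> Q) {
    assert(Q && "cannot destroy a null queue");
    if (Q->Complete)
      return; // Q goes out of scope and is freed here
    Pending.push_back(std::move(Q));
  }

  // Create: reuse a pending queue that has completed in the meantime,
  // falling back to a fresh one only if none is ready.
  std::unique_ptr<Queue> create() {
    for (auto It = Pending.begin(); It != Pending.end(); ++It) {
      if ((*It)->Complete) {
        std::unique_ptr<Queue> Q = std::move(*It);
        Pending.erase(It);
        ++Q->Generation;
        return Q;
      }
    }
    return std::make_unique<Queue>();
  }

  std::size_t pendingCount() const { return Pending.size(); }
};
```

Checking the pending list only on creation keeps destruction cheap, at the cost of holding finished queues until the next `create()` call.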
Ross Brunton
41fed2d048
[Offload] Add PRODUCT_NAME device info (#155632)
On my system, this will be "Radeon RX 7900 GRE" rather than "gfx1100". For Nvidia, the product name and device name are identical.
2025-08-28 15:16:17 +01:00
Abhinav Gaba
890bc4652f
[Offload] Update LIBOMPTARGET_INFO text for attach map-type. (#155509)
Also adds two debug dumps regarding pointer-attachment.
2025-08-28 06:48:42 -07:00
Jan Patrick Lehr
bcb9634be8
[Offload][OpenMP] Tests require libc on GPU for printf (#155785)
These tests currently fail when libc is not configured to be built as
they require printf to be available in target regions.
2025-08-28 14:30:18 +02:00
Robert Imschweiler
732c07a8d9
[OpenMP][clang] 6.0: num_threads strict (part 2: device runtime) (#146404)
OpenMP 6.0 12.1.2 specifies the behavior of the strict modifier for the
num_threads clause on parallel directives, along with the message and
severity clauses. This commit implements necessary device runtime
changes.
2025-08-28 09:31:52 +02:00
Dominik Adamski
87db8e9130
[OpenMP][Offload] Add SPMD-No-Loop mode to OpenMP offload runtime (#154105)
Kernels which are marked as SPMD-No-Loop should be launched with a
sufficient number of teams and threads to cover the loop iteration space.

No-Loop mode is described in RFC:

https://discourse.llvm.org/t/rfc-no-loop-mode-for-openmp-gpu-kernels/87517/
2025-08-28 09:19:14 +02:00
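One way to read "sufficient number of teams and threads" in #154105 above is a ceiling division of the trip count by the team size, so that every iteration gets its own thread. The sketch below assumes exactly that; `LaunchDims` and `coverIterationSpace` are hypothetical names, and the runtime's actual heuristics may differ.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical launch configuration.
struct LaunchDims {
  uint64_t Teams;
  uint64_t ThreadsPerTeam;
};

// Choose the smallest team count such that Teams * ThreadsPerTeam is at
// least TripCount. Written without adding to TripCount to avoid overflow.
LaunchDims coverIterationSpace(uint64_t TripCount, uint64_t ThreadsPerTeam) {
  assert(ThreadsPerTeam > 0 && "need at least one thread per team");
  uint64_t Teams =
      TripCount / ThreadsPerTeam + (TripCount % ThreadsPerTeam != 0 ? 1 : 0);
  if (Teams == 0)
    Teams = 1; // still launch one team for an empty iteration space
  return {Teams, ThreadsPerTeam};
}
```

For example, 1000 iterations with 256 threads per team gives 4 teams, with the last 24 threads idle.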
Kevin Sala Penades
0ad35d7586
[NFC][offload] Fix error message for cuFuncSetAttribute (#155655) 2025-08-27 11:35:37 -07:00
Leandro Lacerda
9c410dd33d
[Offload][Conformance] Add README file (#155190)
This patch introduces a `README.md` file for the GPU math conformance
test suite located in `offload/unittests/Conformance`.

The goal of this document is to provide clear and thorough instructions
for new users and future contributors. It covers the project's purpose,
system requirements, build and execution steps, testing methodology, and
overall architecture.
2025-08-26 08:34:15 -05:00
Ross Brunton
1b6875ea1f
[Offload] Full AMD support for olMemFill (#154958) 2025-08-26 11:49:12 +01:00
Abhinav Gaba
bb1cb6a198
[NFC][OpenMP] Add several use_device_ptr/addr tests. (#154939)
Most tests currently fail, either at compile time or at run time.

They should start passing once we start using ATTACH map-type based
codegen. (#153683)

Even after they start passing, there are a few places where the EXPECTED
and actual CHECKs are different, due to two main issues:
* use_device_ptr translation on `&p[0]` is not succeeding in looking-up
a previously mapped `&p[1]`
* privatization of byref use_device_addr operands is not happening
correctly.

The above should be fixed as separate standalone changes.
2025-08-25 14:23:26 -07:00
Leandro Lacerda
5ef4120b64
[Offload][Conformance] Add exhaustive tests for half-precision math functions (#155112)
This patch adds a set of exhaustive tests for half-precision math.

The functions included in this set were selected based on the following
criteria:
- An implementation exists in `libc/src/math/generic` (i.e., it is not
just a wrapper around a compiler built-in).
- The corresponding LLVM CPU libm implementation is correctly rounded.
- The function is listed in Table 69 of the OpenCL C Specification
v3.0.19.

This patch also fixes the testing range of the following functions:
`acos`, `acosf`, `asin`, `asinf`, and `log1p`.
2025-08-24 09:42:26 -05:00
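Exhaustive testing is feasible for half precision, as in #155112 above, because there are only 65,536 input bit patterns. The sketch below shows the shape of such a loop on the host. It is not the conformance suite's code: `halfBitsToFloat` and `countMismatches` are hypothetical helpers, and results are compared for exact equality rather than with the suite's accuracy criteria.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Decode an IEEE 754 binary16 bit pattern into a float (exactly representable).
float halfBitsToFloat(uint16_t Bits) {
  const int Sign = (Bits >> 15) & 0x1;
  const int Exp = (Bits >> 10) & 0x1F;
  const int Mant = Bits & 0x3FF;
  float Value;
  if (Exp == 0) // zero and subnormals: Mant * 2^-24
    Value = std::ldexp(static_cast<float>(Mant), -24);
  else if (Exp == 31) // infinity and NaN
    Value = Mant ? std::nanf("") : INFINITY;
  else // normal numbers: (1024 + Mant) * 2^(Exp - 25)
    Value = std::ldexp(static_cast<float>(Mant + 1024), Exp - 25);
  return Sign ? -Value : Value;
}

using UnaryFn = float (*)(float);

// Run both functions on every half-precision input and count disagreements.
// Two NaN results are treated as matching.
uint32_t countMismatches(UnaryFn Impl, UnaryFn Reference) {
  assert(Impl && Reference && "both functions must be provided");
  uint32_t Mismatches = 0;
  for (uint32_t Bits = 0; Bits <= 0xFFFF; ++Bits) {
    const float X = halfBitsToFloat(static_cast<uint16_t>(Bits));
    const float Got = Impl(X);
    const float Want = Reference(X);
    const bool BothNaN = std::isnan(Got) && std::isnan(Want);
    if (!BothNaN && Got != Want)
      ++Mismatches;
  }
  return Mismatches;
}
```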
Leandro Lacerda
9919301486
[Offload][Conformance] Add randomized tests for double-precision math functions (#155003)
This patch adds a set of randomized conformance tests for
double-precision math functions.

The functions included in this set were selected based on the following
criteria:
- An implementation exists in `libc/src/math/generic` (i.e., it is not
just a wrapper around a compiler built-in).
- The corresponding LLVM CPU libm implementation is correctly rounded.
- The function is listed in Table 68 of the OpenCL C Specification
v3.0.19.
2025-08-22 14:07:29 -05:00
Callum Fare
77c5a6506f
[Offload] Fix definition of olMemFill (#154947)
Fix regression introduced by #154102 - the way offload-tblgen handles
names has changed
2025-08-22 14:48:00 +01:00
Callum Fare
0b18d2da70
[Offload] Implement olMemFill (#154102)
Implement olMemFill to support filling device memory with arbitrary
length patterns. AMDGPU support will be added in a follow-up PR.
2025-08-22 14:31:16 +01:00
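For reference, the semantics of filling with an arbitrary-length pattern, as in #154102 above, can be sketched on the host. This is not the `olMemFill` implementation or its signature: `fillPattern` is a hypothetical helper, and it assumes the fill size is a whole multiple of the pattern size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical host-side equivalent of a pattern fill: repeat PatternSize
// bytes from Pattern across Size bytes at Dst.
void fillPattern(void *Dst, std::size_t Size, const void *Pattern,
                 std::size_t PatternSize) {
  // Assumption of this sketch: the destination holds a whole number of
  // pattern repetitions.
  assert(Dst && Pattern && PatternSize > 0 && Size % PatternSize == 0);
  auto *Out = static_cast<unsigned char *>(Dst);
  for (std::size_t Off = 0; Off < Size; Off += PatternSize)
    std::memcpy(Out + Off, Pattern, PatternSize);
}
```

A three-byte pattern `{1, 2, 3}` written into a six-byte buffer produces `1 2 3 1 2 3`, which a plain `memset` cannot express.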
Ross Brunton
4c0c295775
[Offload] OL_EVENT_INFO_IS_COMPLETE (#153194)
A simple info query for events that returns whether the event is
complete or not.
2025-08-22 13:40:31 +01:00
Ross Brunton
17dbb92612
[Offload][NFC] Use tablegen names rather than name parameter for API (#154736) 2025-08-22 11:13:57 +01:00
Leandro Lacerda
eed5f06ae8
[Offload][Conformance] Add randomized tests for single-precision bivariate math functions (#154663)
This patch adds a new set of randomized conformance tests for
single-precision bivariate math functions.

The functions included in this set were selected based on the following
criteria:
- An implementation exists in `libc/src/math/generic` (i.e., it is not
just a wrapper around a compiler built-in).
- The corresponding LLVM CPU libm implementation is correctly rounded.
- The function is listed in Table 65 of the OpenCL C Specification
v3.0.19.
2025-08-21 11:27:25 -05:00
Ross Brunton
2e74cc6c04
[Offload][NFC] Use a sensible order for APIGen (#154518)
The order in which entries in the tablegen API files are iterated is not
the order in which they appear in the file. To avoid any issues with the
order changing in future, we now generate all definitions of a certain
class before any class that can use them.

This is NFC; the definitions don't actually change, just the order in
which they appear in the OffloadAPI.h header.
2025-08-21 09:38:21 +01:00
Ross Brunton
273ca1f77b
[Offload] Fix OL_DEVICE_INFO_MAX_MEM_ALLOC_SIZE on AMD (#154521)
This wasn't handled with the normal info API, so needs special handling.
2025-08-21 09:37:58 +01:00