75 Commits

Author SHA1 Message Date
fineg74
848d736e64
[OFFLOAD] Add asynchronous queue query API for libomptarget migration (#172231)
Add liboffload asynchronous queue query API for libomptarget migration

This PR adds liboffload asynchronous queue query API that needed to make
libomptarget to use liboffload
2026-01-20 10:53:32 -08:00
Alex Duran
efad3563ea
[OFFLOAD] Update CUDA and AMD plugins to new debug format (#175787) 2026-01-13 17:53:59 +01:00
Alex Duran
86e114a9b2
Revert "[OFFLOAD] Update CUDA and AMD plugins to new debug format" (#175786)
Reverts llvm/llvm-project#175757
2026-01-13 17:13:46 +01:00
Alex Duran
7c2f49373b
[OFFLOAD] Update CUDA and AMD plugins to new debug format (#175757)
This should be the last step before completely removing the DP macro.
2026-01-13 17:06:35 +01:00
Alex Duran
dbd52bd558
[OFFLOAD][OpenMP] Remove old style REPORT support (#175607)
Fix the few remaining usages and remove the support for the old REPORT
macro.
2026-01-12 19:48:40 +01:00
Kevin Sala Penades
1a86f0aae7
[Offload] Add device info for shared memory (#167817) 2025-11-13 11:00:12 -08:00
Joseph Huber
670c453aeb
[Offload] Remove handling for device memory pool (#163629)
Summary:
This was a lot of code that was only used for upstream LLVM builds of
AMDGPU offloading. We have a generic and fast `malloc` in `libc` now so
just use that. Simplifies code, can be added back if we start providing
alternate forms but I don't think there's a single use-case that would
justify it yet.
2025-11-06 10:15:18 -06:00
Robert Imschweiler
dc94f2cbad
[Offload] Add device UID (#164391)
Introduced in OpenMP 6.0, the device UID shall be a unique identifier of
a device on a given system. (Not necessarily a UUID.) Since it is not
guaranteed that the (U)UIDs defined by the device vendor libraries, such
as HSA, do not overlap with those of other vendors, the device UIDs in
offload are always combined with the offload plugin name. In case the
vendor library does not specify any device UID for a given device, we
fall back to the offload-internal device ID.
The device UID can be retrieved using the `llvm-offload-device-info`
tool.
2025-11-04 20:15:47 +01:00
Nicole Aschenbrenner
16641ad8a2
[OpenMP] Adds omp_target_is_accessible routine (#138294)
Adds omp_target_is_accessible routine.
Refactors common code from omp_target_is_present to work for both
routines.

---------

Co-authored-by: Shilei Tian <i@tianshilei.me>
2025-10-22 17:35:16 +02:00
Ross Brunton
186182bb64
[Offload] Use amd_signal_async_handler for host function calls (#154131) 2025-10-21 13:08:30 +01:00
Alex Duran
45757b9284
[OFFLOAD] Remove unused init_device_info plugin interface (#162650)
This was used for the old interop code. It's dead code after #143491
2025-10-09 08:38:24 -05:00
Joseph Huber
8763812b4c
[Offload] Remove check on kernel argument sizes (#162121)
Summary:
This check is unnecessarily restrictive and currently incorrectly fires
for any size less than eight bytes. Just remove it, we do sanity checks
elsewhere and at some point need to trust the ABI.
2025-10-06 12:49:44 -05:00
Alex Duran
902fe02e87
[OFFLOAD] Restore interop functionality (#161429)
This implements two pieces to restore the interop functionality (that I
broke) when the 6.0 interfaces were added:

* A set of wrappers that support the old interfaces on top of the new
ones
* The same level of interop support for the CUDA amd AMD plugins
2025-10-02 21:48:31 +02:00
Kevin Sala Penades
01d761a776
[Offload] Use Error for allocating/deallocating in plugins (#160811)
Co-authored-by: Joseph Huber <huberjn@outlook.com>
2025-09-26 13:50:00 -05:00
Joseph Huber
23efc67e19
[Offload] Remove non-blocking allocation type (#159851)
Summary:
This was originally added in as a hack to work around CUDA's limitation
on allocation. The `libc` implementation now isn't even used for CUDA so
this code is never hit. Even if this case, this code never truly worked.

A true solution would be to use CUDA's virtual memory API instead to
allocate 2MiB slabs independenctly from the normal memory management
done in the stream.
2025-09-20 09:07:14 -05:00
Joseph Huber
580860e8b7
[OpenMP][NFC] Clean up a bunch of warnings and clang-tidy messages (#159831)
Summary:
I made the GPU flags accept more of the default LLVM warnings, which
triggered some new cases. Clean those up and fix some other ones while
I'm at it.
2025-09-19 14:09:33 -05:00
Joseph Huber
e7101dac9c
[Offload] Copy loaded images into managed storage (#158748)
Summary:
Currently we have this `__tgt_device_image` indirection which just takes
a reference to some pointers. This was all find and good when the only
usage of this was from a section of GPU code that came from an ELF
constant section. However, we have expanded beyond that and now need to
worry about managing lifetimes. We have code that references the image
even after it was loaded internally. This patch changes the
implementation to instaed copy the memory buffer and manage it locally.

This PR reworks the JIT and other image handling to directly manage its
own memory. We now don't need to duplicate this behavior externally at
the Offload API level. Also we actually free these if the user unloads
them.

Upside, less likely to crash and burn. Downside, more latency when
loading an image.
2025-09-16 08:57:28 -05:00
Ross Brunton
ffb756dff2
[Offload] Add OL_DEVICE_INFO_MAX_WORK_SIZE[_PER_DIMENSION] (#155823)
This is the total number of work items that the device supports (the
equivalent work group properties are for only a single work group).
2025-08-29 09:39:18 +01:00
Ross Brunton
41fed2d048
[Offload] Add PRODUCT_NAME device info (#155632)
On my system, this will be "Radeon RX 7900 GRE" rather than "gfx1100". For Nvidia, the product name and device name are identical.
2025-08-28 15:16:17 +01:00
Ross Brunton
1b6875ea1f
[Offload] Full AMD support for olMemFill (#154958) 2025-08-26 11:49:12 +01:00
Callum Fare
0b18d2da70
[Offload] Implement olMemFill (#154102)
Implement olMemFill to support filling device memory with arbitrary
length patterns. AMDGPU support will be added in a follow-up PR.
2025-08-22 14:31:16 +01:00
Ross Brunton
4c0c295775
[Offload] OL_EVENT_INFO_IS_COMPLETE (#153194)
A simple info query for events that returns whether the event is
complete or not.
2025-08-22 13:40:31 +01:00
Ross Brunton
2c11a83691
[Offload] Add olCalculateOptimalOccupancy (#142950)
This is equivalent to `cuOccupancyMaxPotentialBlockSize`. It is
currently
only implemented on Cuda; AMDGPU and Host return unsupported.

---------

Co-authored-by: Callum Fare <callum@codeplay.com>
2025-08-19 15:16:47 +01:00
Rafal Bielski
9c9d9e4cb6
[Offload] Define additional device info properties (#152533)
Add the following properties in Offload device info:
* VENDOR_ID
* NUM_COMPUTE_UNITS
* [SINGLE|DOUBLE|HALF]_FP_CONFIG
* NATIVE_VECTOR_WIDTH_[CHAR|SHORT|INT|LONG|FLOAT|DOUBLE|HALF]
* MAX_CLOCK_FREQUENCY
* MEMORY_CLOCK_RATE
* ADDRESS_BITS
* MAX_MEM_ALLOC_SIZE
* GLOBAL_MEM_SIZE

Add a bitfield option to enumerators, allowing the values to be
bit-shifted instead of incremented. Generate the per-type enums using
`foreach` to reduce code duplication.

Use macros in unit test definitions to reduce code duplication.
2025-08-19 13:02:01 +01:00
Abhinav Gaba
79cf877627
[Offload] Introduce dataFence plugin interface. (#153793)
The purpose of this fence is to ensure that any `dataSubmit`s inserted
into a queue before a `dataFence` finish before finish before any
`dataSubmit`s
inserted after it begin.

This is a no-op for most queues, since they are in-order, and by design
any operations inserted into them occur in order.

But the interface is supposed to be functional for out-of-order queues.

The addition of the interface means that any operations that rely on
such ordering (like ATTACH map-type support in #149036) can invoke it,
without worrying about whether the underlying queue is in-order or
out-of-order.

Once a plugin supports out-of-order queues, the plugin can implement
this function, without requiring any change at the libomptarget level.

---------

Co-authored-by: Alex Duran <alejandro.duran@intel.com>
2025-08-15 11:49:35 -07:00
Ross Brunton
30c7951136
[Offload] olLaunchHostFunction (#152482)
Add an `olLaunchHostFunction` method that allows enqueueing host work
to the stream.
2025-08-15 09:39:48 +01:00
Ross Brunton
910d7e90bf
[Offload] Make olLaunchKernel test thread safe (#149497)
This sprinkles a few mutexes around the plugin interface so that the
olLaunchKernel CTS test now passes when ran on multiple threads.

Part of this also involved changing the interface for device synchronise
so that it can optionally not free the underlying queue (which
introduced a race condition in liboffload).
2025-08-08 10:57:04 +01:00
Ross Brunton
a44532544b
[Offload] Don't create events for empty queues (#152304)
Add a device function to check if a device queue is empty. If liboffload
tries to create an event for an empty queue, we create an "empty" event
that is already complete.

This allows `olCreateEvent`, `olSyncEvent` and `olWaitEvent` to run
quickly for empty queues.
2025-08-07 10:16:33 +01:00
hidekisaito
83e5a99ff6
[AMDGPU][Offload] Enable memory manager use for up to ~3GB allocation size in omp_target_alloc (#151882)
Enables AMD data center class GPUs to use memory manager memory pooling
up to 3GB allocation by default, up from the "1 << 13" threshold that
all plugin-nextgen devices use.
2025-08-06 14:41:20 -07:00
Ross Brunton
d03692a00e
[Offload] Rework MAX_WORK_GROUP_SIZE (#151926)
`MAX_WORK_GROUP_SIZE` now represents the maximum total number of work
groups the device can allocate, rather than the maximum per dimension.
`MAX_WORK_GROUP_SIZE_PER_DIMENSION` has been added, which has the old
behaviour.
2025-08-04 15:21:24 +01:00
Ross Brunton
e87d3904f6
[Offload] Verify SyncCycle for events in AMDGPU (#149524)
This check ensures that events after a synchronise (and thus after the
queue is reset) are always considered complete. A test has been added
as well.
2025-07-21 09:37:29 +01:00
Ross Brunton
311847be4c
[Offload] Allow "tagging" device info entries with offload keys (#147317)
When generating the device info tree, nodes can be marked with an
offload Device Info value. The nodes can also look up children based
on this value.
2025-07-18 14:27:34 +01:00
Ross Brunton
df9a864b04
[Offload] Implement event sync in amdgpu (#149300) 2025-07-18 09:56:17 +01:00
Ross Brunton
abb878438a
[Offload] Allow querying the size of globals (#147698)
The `GlobalTy` helper has been extended to make both the Size and Ptr be
optional. Now `getGlobalMetadataFromDevice`/`Image` is able to write the
size of the global to the struct, instead of just verifying it.
2025-07-10 12:05:31 +01:00
Ross Brunton
6b19cdcefa
[Offload][amdgpu] Map INVALID_CODE_OBJECT to INVALID_BINARY (#147070) 2025-07-04 16:17:51 +01:00
Joseph Huber
df5097dd94
[Offload] Add default for HSA agent type to silence warning (#145943)
Summary:
There's a new one called the AIE (AI Engine). We could handle this, but
since we don't use it currently I'm just making it future-proof. Adding
the AIE check would require checking the HSA version which isn't
worthwhile just yet.
2025-06-26 14:46:08 -05:00
Ross Brunton
0870c8838b
[Offload] Add an unloadBinary interface to PluginInterface (#143873)
This allows removal of a specific Image from a Device, rather than
requiring all image data to outlive the device they were created for.

This is required for `ol_program_handle_t`s, which now specify the
lifetime of the buffer used to create the program.
2025-06-25 14:53:18 +01:00
Ross Brunton
e6a3579653
[Offload] Replace device info queue with a tree (#144050)
Previously, device info was returned as a queue with each element having
a "Level" field indicating its nesting level. This replaces this queue
with a more traditional tree-like structure.

This should not result in a change to the output of
`llvm-offload-device-info`.
2025-06-13 09:22:47 -05:00
Joseph Huber
5b8031a7f7
[Offload][AMDGPU] Correctly handle variable implicit argument sizes (#142199)
Summary:
The size of the implicit argument struct can vary depending on
optimizations, it is not always the size as listed by the full struct.
Additionally, the implicit arguments are always aligned on a pointer
boundary. This patch updates the handling to use the correctly aligned
offset and only initialize the members if they are contained in the
reported size.

Additionally, we modify the `alloc` and `free` routines to allow
`alloc(0)` and `free(nullptr)` as these are mandated by the C standard
and allow us to easily handle cases where the user calls a kernel with
no arguments.
2025-06-02 09:35:16 -05:00
Joseph Huber
b26baf1779
[Offload] Make AMDGPU plugin handle empty allocation properly (#142383)
Summary:
`malloc(0)` and `free(nullptr)` are both defined by the standard but we
current trigger erros and assertions on them. Fix that so this works
with empty arguments.
2025-06-02 08:12:20 -05:00
Ross Brunton
050892d2f8
[Offload] Use new error code handling mechanism and lower-case messages (#139275)
[Offload] Use new error code handling mechanism

This removes the old ErrorCode-less error method and requires
every user to provide a concrete error code. All calls have been
updated.

In addition, for consistency with error messages elsewhere in LLVM, all
messages have been made to start lower case.
2025-05-20 08:50:20 -05:00
Joseph Huber
75f810e025
[Offload] Guard HSA implicit arguments if they aren't created (#133073)
Summary:
We conditionally allocate the implicit arguments, so they possibly are
null. The flang compiler seems to hit this case, even though it
shouldn't when it's supposed to conform to the HSA code object. For now
guard this to fix the regression and cover a case in the future where
someone rolls a fully custom implementatation.

Fixes: https://github.com/llvm/llvm-project/issues/132982
2025-03-26 08:54:33 -05:00
Joseph Huber
25bf4e262c
[Offload] Remove handling for COV4 binaries from offload/ (#131033)
Summary:
We moved from cov4 to cov5 a long time ago, and it guards simplifying
some front end code, so we should be able to move up with this.
2025-03-24 18:58:20 -05:00
Fabian Ritter
a2f9ae1421
[AMDGPU] Replace gfx940 and gfx941 with gfx942 in offload and libclc (#125826)
gfx940 and gfx941 are no longer supported. This is one of a series of
PRs to remove them from the code base.

For SWDEV-512631 and SWDEV-512633
2025-02-19 09:56:04 +01:00
Christian Clauss
1f56bb3137
[Offload][NFC] Fix typos discovered by codespell (#125119)
https://github.com/codespell-project/codespell

% `codespell
--ignore-words-list=archtype,hsa,identty,inout,iself,nd,te,ths,vertexes
--write-changes`
2025-01-31 09:35:29 -06:00
Joseph Huber
134401deea
[Offload] Move RPC server handling to a dedicated thread (#112988)
Summary:
Handling the RPC server requires running through list of jobs that the
device has requested to be done. Currently this is handled by the thread
that does the waiting for the kernel to finish. However, this is not
sound on NVIDIA architectures and only works for async launches in the
OpenMP model that uses helper threads.

However, we also don't want to have this thread doing work
unnnecessarily. For this reason we track the execution of kernels and
cause the thread to sleep via a condition variable (usually backed by
some kind of futex or other intelligent sleeping mechanism) so that the
thread will be idle while no kernels are running.
2025-01-24 11:36:45 -06:00
hidekisaito
f2bceb2311
[Offload][AMDGPU] accept generic target (#118919)
Enables generic ISA, e.g., "--offload-arch=gfx11-generic" device code to
run on gfx11-generic ISA capable device.

Executable may contain one ELF that has specific target ISA and another
ELF that has compatible generic ISA.
Under that circumstance, this code should say both ELFs are compatible,
leaving the rest to PluginManager to handle.
Suggestions on how best to address that is welcome.
2024-12-09 19:11:38 -05:00
Shilei Tian
92376c3ff5
[Offload][OMPX] Add the runtime support for multi-dim grid and block (#118042) 2024-12-06 09:07:50 -05:00
hidekisaito
8c36a823c2
Fix to account for multiple ISA enumeration (#118676) 2024-12-04 12:11:16 -06:00
Michał Górny
04b26f0eb7
[offload] Standalone build fixes (#118173)
A fair number of fixes to get standalone builds of offload working —
mostly copying missing bits from openmp. It's almost ready — I still
need to figure out why some of the tsts aren't linking to the right
libraries.
2024-12-04 11:27:37 +00:00