69 Commits

Author SHA1 Message Date
Joseph Huber
1a86c146ae
[Offload] Add a function to register an RPC Server callback (#178774)
Summary:
We provide an RPC server to manage calls initiated by the device to run
on the host. This is very useful for the built-in handling we have,
however there are cases where we would want to extend this
functionality.

Cases like Fortran or MPI would be useful, but we cannot put references
to these in the core offloading runtime. This way, we can provide this
as a library interface that registers custom handlers for whatever code
people want.
2026-01-30 08:03:13 -06:00
fineg74
848d736e64
[OFFLOAD] Add asynchronous queue query API for libomptarget migration (#172231)
Add liboffload asynchronous queue query API for libomptarget migration

This PR adds the liboffload asynchronous queue query API needed for
libomptarget to use liboffload.
2026-01-20 10:53:32 -08:00
fineg74
1232599032
[OFFLOAD] Add memory data locking API for libomptarget migration (#173138)
Add liboffload memory data locking API for libomptarget migration

This PR adds the liboffload memory data locking API needed for
libomptarget to use liboffload.
2026-01-12 13:07:57 -06:00
Kevin Sala Penades
93f339b593
[offload] Fix unittests when multiple devices are available (#173209)
This commit appends a device number after the device name (used as
unittest param name). The number is between 0 and the number of
available non-host devices. In this way, it allows multiple devices of
the same vendor to be tested.
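
A minimal sketch of the naming scheme described above, assuming the index is simply appended to the device name. The helper `makeTestParamNames` and the `_` separator are hypothetical and not taken from the test suite:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: build unique test-parameter names by appending each
// device's index to its name, so two devices with the same name do not clash.
std::vector<std::string>
makeTestParamNames(const std::vector<std::string> &DeviceNames) {
  std::vector<std::string> Names;
  Names.reserve(DeviceNames.size());
  for (std::size_t I = 0; I < DeviceNames.size(); ++I)
    Names.push_back(DeviceNames[I] + "_" + std::to_string(I));
  return Names;
}
```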
2025-12-21 19:50:43 -08:00
Kevin Sala Penades
885bc9a457
[offload] Fix kernel launch unittest (#173203)
This commit fixes the error introduced in #172249.
2025-12-21 18:54:09 -08:00
Kevin Sala Penades
35315a84b4
[offload] Fix CUDA args size by subtracting tail padding (#172249)
This commit makes the cuLaunchKernel call pass the total argument size without tail padding.
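
To illustrate what "tail padding" means here, the following sketch compares `sizeof` of an argument struct with the size up to the end of its last member. `KernelArgs` and `argsSizeWithoutTailPadding` are hypothetical names, and this is not the plugin's actual code:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical kernel-argument layout: a pointer followed by a 4-byte integer.
// On typical 64-bit targets the compiler pads the struct up to a multiple of
// the pointer alignment, so sizeof(KernelArgs) includes trailing padding.
struct KernelArgs {
  void *Buffer;
  std::uint32_t Count;
};

// Size of the arguments up to the end of the last member, i.e. the struct
// size with any tail padding removed.
constexpr std::size_t argsSizeWithoutTailPadding() {
  return offsetof(KernelArgs, Count) + sizeof(std::uint32_t);
}
```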
2025-12-14 21:57:25 -08:00
Kevin Sala Penades
1a86f0aae7
[Offload] Add device info for shared memory (#167817) 2025-11-13 11:00:12 -08:00
Robert Imschweiler
dc94f2cbad
[Offload] Add device UID (#164391)
Introduced in OpenMP 6.0, the device UID shall be a unique identifier of
a device on a given system. (Not necessarily a UUID.) Since it is not
guaranteed that the (U)UIDs defined by the device vendor libraries, such
as HSA, do not overlap with those of other vendors, the device UIDs in
offload are always combined with the offload plugin name. In case the
vendor library does not specify any device UID for a given device, we
fall back to the offload-internal device ID.
The device UID can be retrieved using the `llvm-offload-device-info`
tool.
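
The combination and fallback rule could look roughly like the sketch below. The function name `makeDeviceUid` and the `-` separator are assumptions made for illustration, not the runtime's real format:

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical helper: prefix the vendor-provided UID with the plugin name so
// that UIDs from different vendors cannot collide. If the vendor library gives
// no UID, fall back to the offload-internal device ID.
std::string makeDeviceUid(const std::string &PluginName,
                          const std::optional<std::string> &VendorUid,
                          int InternalDeviceId) {
  const std::string Suffix =
      VendorUid ? *VendorUid : std::to_string(InternalDeviceId);
  return PluginName + "-" + Suffix;
}
```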
2025-11-04 20:15:47 +01:00
Joseph Huber
8763812b4c
[Offload] Remove check on kernel argument sizes (#162121)
Summary:
This check is unnecessarily restrictive and currently incorrectly fires
for any size less than eight bytes. Just remove it, we do sanity checks
elsewhere and at some point need to trust the ABI.
2025-10-06 12:49:44 -05:00
Ross Brunton
ea0e5185e2
[Offload] Add olGetMemInfo with platform-less API (#159581) 2025-09-24 12:17:57 +01:00
Ross Brunton
fcebe6bdbb
[Offload] Re-allocate overlapping memory (#159567)
If olMemAlloc happens to allocate memory that was already allocated
elsewhere (possibly by another device on another platform), it is now
thrown away and a new allocation generated.

A new `AllocBases` vector is now available, which is an ordered list
of allocation start addresses.
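
As a rough sketch of how an address-ordered record makes the overlap check cheap: the class below uses a `std::map` keyed by start address instead of the `AllocBases` vector, and `AllocTracker`, `overlaps`, and `tryRecord` are hypothetical names:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <map>

// Hypothetical bookkeeping: start address -> size, ordered by start address.
class AllocTracker {
  std::map<std::uintptr_t, std::size_t> Allocs;

public:
  // True if [Base, Base + Size) intersects an already recorded allocation.
  bool overlaps(std::uintptr_t Base, std::size_t Size) const {
    // First recorded allocation that starts at or after Base.
    auto Next = Allocs.lower_bound(Base);
    if (Next != Allocs.end() && Next->first < Base + Size)
      return true;
    // The allocation just before Base may still extend past Base.
    if (Next != Allocs.begin()) {
      auto Prev = std::prev(Next);
      if (Prev->first + Prev->second > Base)
        return true;
    }
    return false;
  }

  // Records the range, or returns false so the caller can discard the
  // allocation and request a new one.
  bool tryRecord(std::uintptr_t Base, std::size_t Size) {
    if (overlaps(Base, Size))
      return false;
    Allocs.emplace(Base, Size);
    return true;
  }
};
```

Because recorded ranges never overlap one another, only the two neighbours of the new start address need to be inspected.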
2025-09-23 13:59:52 +01:00
Joseph Huber
51e3c3d51b
[Offload] Implement 'olIsValidBinary' in offload and clean up (#159658)
Summary:
This exposes the 'isDeviceCompatible' routine for checking if a binary
*can* be loaded. This is useful if people don't want to consume errors
everywhere when figuring out which image to put to what device.

I don't know if this is a good name, I was thinking like `olIsCompatible`
or whatever. Let me know what you think.

Long term I'd like to be able to do something similar to what OpenMP
does where we can conditionally only initialize devices if we need them.
That support is going to be needed if we want this to be more
generic.
2025-09-19 12:15:57 -05:00
Ross Brunton
c5474cdc27
[Offload] Make ASSERT_ERROR output more readable (#157653) 2025-09-16 12:04:53 +01:00
Ross Brunton
7731ecf259
[Offload] Skip most liboffload tests if no devices (#157417)
If there are no devices available for testing on liboffload, the test
will no longer throw an error when it fails to instantiate.

The tests will be silently skipped, but with a warning printed to
stderr.
2025-09-09 10:11:05 +01:00
Ross Brunton
ffb756dff2
[Offload] Add OL_DEVICE_INFO_MAX_WORK_SIZE[_PER_DIMENSION] (#155823)
This is the total number of work items that the device supports (the
equivalent work group properties are for only a single work group).
2025-08-29 09:39:18 +01:00
Ross Brunton
9e5d8bd3d1
[Offload] Improve olDestroyQueue logic (#153041)
Previously, `olDestroyQueue` would not actually destroy the queue,
instead leaving it for the device to clean up when it was destroyed.
Now, the queue is either released immediately if it is complete or put
into a list of "pending" queues if it is not. Whenever we create a new
queue, we check this list to see if any are now completed. If there are
any we release their resources and use them instead of pulling from
the pool.

This prevents long running programs that create and drop many queues
without syncing them from leaking memory all over the place.
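
The release-or-defer logic can be sketched as follows. `PooledQueue` and `QueueManager` are hypothetical stand-ins, and "pulling from the pool" is modelled as a plain allocation:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Stand-in for a device queue; Complete is false while work is in flight.
struct PooledQueue {
  bool Complete = true;
};

class QueueManager {
  std::vector<std::unique_ptr<PooledQueue>> Pending;
  std::size_t PoolAllocations = 0;

public:
  // Reuse a destroyed queue that has finished since; otherwise take a new one.
  std::unique_ptr<PooledQueue> create() {
    for (auto It = Pending.begin(); It != Pending.end(); ++It) {
      if ((*It)->Complete) {
        std::unique_ptr<PooledQueue> Reused = std::move(*It);
        Pending.erase(It);
        return Reused;
      }
    }
    ++PoolAllocations;
    return std::make_unique<PooledQueue>();
  }

  // Release a finished queue immediately; park an unfinished one as pending.
  void destroy(std::unique_ptr<PooledQueue> Q) {
    if (Q->Complete)
      return; // Q is freed when it goes out of scope here.
    Pending.push_back(std::move(Q));
  }

  std::size_t pendingCount() const { return Pending.size(); }
  std::size_t poolAllocations() const { return PoolAllocations; }
};
```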
2025-08-29 09:39:00 +01:00
Ross Brunton
41fed2d048
[Offload] Add PRODUCT_NAME device info (#155632)
On my system, this will be "Radeon RX 7900 GRE" rather than "gfx1100". For Nvidia, the product name and device name are identical.
2025-08-28 15:16:17 +01:00
Ross Brunton
1b6875ea1f
[Offload] Full AMD support for olMemFill (#154958) 2025-08-26 11:49:12 +01:00
Callum Fare
0b18d2da70
[Offload] Implement olMemFill (#154102)
Implement olMemFill to support filling device memory with arbitrary
length patterns. AMDGPU support will be added in a follow-up PR.
2025-08-22 14:31:16 +01:00
Ross Brunton
4c0c295775
[Offload] OL_EVENT_INFO_IS_COMPLETE (#153194)
A simple info query for events that returns whether the event is
complete or not.
2025-08-22 13:40:31 +01:00
Ross Brunton
2c11a83691
[Offload] Add olCalculateOptimalOccupancy (#142950)
This is equivalent to `cuOccupancyMaxPotentialBlockSize`. It is currently
only implemented on CUDA; AMDGPU and Host return unsupported.

---------

Co-authored-by: Callum Fare <callum@codeplay.com>
2025-08-19 15:16:47 +01:00
Rafal Bielski
9c9d9e4cb6
[Offload] Define additional device info properties (#152533)
Add the following properties in Offload device info:
* VENDOR_ID
* NUM_COMPUTE_UNITS
* [SINGLE|DOUBLE|HALF]_FP_CONFIG
* NATIVE_VECTOR_WIDTH_[CHAR|SHORT|INT|LONG|FLOAT|DOUBLE|HALF]
* MAX_CLOCK_FREQUENCY
* MEMORY_CLOCK_RATE
* ADDRESS_BITS
* MAX_MEM_ALLOC_SIZE
* GLOBAL_MEM_SIZE

Add a bitfield option to enumerators, allowing the values to be
bit-shifted instead of incremented. Generate the per-type enums using
`foreach` to reduce code duplication.
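
For readers unfamiliar with bitfield enumerators, a sketch of what bit-shifted values look like in the generated C/C++ is shown below. The enumerator names are hypothetical and do not claim to match the generated header:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical capability flags: each value occupies its own bit, so several
// can be combined into a single config word with bitwise OR.
enum FpCapability : std::uint32_t {
  FP_CAP_DENORM = 1u << 0,
  FP_CAP_INF_NAN = 1u << 1,
  FP_CAP_ROUND_TO_NEAREST = 1u << 2,
};

// Test whether a single capability bit is set in a combined config word.
constexpr bool hasCapability(std::uint32_t Config, FpCapability Cap) {
  return (Config & Cap) != 0;
}
```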

Use macros in unit test definitions to reduce code duplication.
2025-08-19 13:02:01 +01:00
Ross Brunton
30c7951136
[Offload] olLaunchHostFunction (#152482)
Add an `olLaunchHostFunction` method that allows enqueueing host work
to the stream.
2025-08-15 09:39:48 +01:00
Ross Brunton
910d7e90bf
[Offload] Make olLaunchKernel test thread safe (#149497)
This sprinkles a few mutexes around the plugin interface so that the
olLaunchKernel CTS test now passes when run on multiple threads.

Part of this also involved changing the interface for device synchronise
so that it can optionally not free the underlying queue (which
introduced a race condition in liboffload).
2025-08-08 10:57:04 +01:00
Ross Brunton
197d1c1570
[Offload] OL_QUEUE_INFO_EMPTY (#152473)
Add a queue query that (if possible) reports whether the queue is empty.
2025-08-08 10:20:45 +01:00
Ross Brunton
d03692a00e
[Offload] Rework MAX_WORK_GROUP_SIZE (#151926)
`MAX_WORK_GROUP_SIZE` now represents the maximum total number of work
groups the device can allocate, rather than the maximum per dimension.
`MAX_WORK_GROUP_SIZE_PER_DIMENSION` has been added, which has the old
behaviour.
2025-08-04 15:21:24 +01:00
Leandro Lacerda
f1eb869bae
[Offload][UnitTests] Build device code as C++ (#151714)
This commit refactors the `add_offload_test_device_code` CMake function
to compile device code using the C++ compiler (`CMAKE_CXX_COMPILER`)
instead of the C compiler.

This change enables the use of C++ features, such as templates, within
device-side test kernels. This will allow for more advanced and reusable
kernel wrappers, reducing boilerplate code in the conformance test
suite.

As part of this change:
- All `.c` files for device code in `unittests/` have been renamed to
`.cpp`.
- Kernel definitions are now wrapped in `extern "C"` to ensure C linkage
and prevent name mangling.
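
The pattern this enables can be sketched on the host side as a templated implementation behind an `extern "C"` entry point. `scaleImpl` and `scale_float` are hypothetical names, and real device kernels would carry additional target-specific attributes:

```cpp
#include <cassert>

// Generic implementation; template instantiations get mangled C++ names.
template <typename T> void scaleImpl(T *Data, int N, T Factor) {
  for (int I = 0; I < N; ++I)
    Data[I] *= Factor;
}

// Entry point with C linkage, so it can be looked up by its plain name
// ("scale_float") while still reusing the templated code above.
extern "C" void scale_float(float *Data, int N, float Factor) {
  scaleImpl<float>(Data, N, Factor);
}
```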

This change affects the `OffloadAPI` and `Conformance` test suites.

cc @callumfare @RossBrunton @jhuber6
2025-08-04 07:00:51 -05:00
Callum Fare
78faf99c4f
[Offload] Fix olWaitEvents tests after change to events API (#150465)
Fix the olWaitEvents tests after #150217 broke them
2025-07-24 18:35:47 +01:00
Ross Brunton
690c3ee5be
[Offload] Replace "EventOut" parameters with olCreateEvent (#150217)
Rather than having every "enqueue"-type function have an output pointer
specifically for an output event, just provide an `olCreateEvent`
entrypoint which pushes an event to the queue.

For example, replace:
```cpp
olMemcpy(Queue, ..., EventOut);
```
with
```cpp
olMemcpy(Queue, ...);
olCreateEvent(Queue, EventOut);
```
2025-07-24 14:31:06 +01:00
Ross Brunton
081b74caf5
[Offload] Add olWaitEvents (#150036)
This function causes a queue to wait until all the provided events have
completed before running any future scheduled work.
2025-07-23 14:12:16 +01:00
Ross Brunton
2726b7fb1c
[Offload] Rename olWaitEvent/Queue to olSyncEvent/Queue (#150023)
This more closely matches the nomenclature used by CUDA, AMDGPU and
the plugin interface.
2025-07-23 10:52:13 +01:00
Ross Brunton
e87d3904f6
[Offload] Verify SyncCycle for events in AMDGPU (#149524)
This check ensures that events after a synchronise (and thus after the
queue is reset) are always considered complete. A test has been added
as well.
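
One way to picture such a check is a per-queue cycle counter that events snapshot when they are recorded. The sketch below is a simplified model with hypothetical names (`CycleQueue`, `CycleEvent`, `recordEvent`); a real implementation would also query the device for events in the current cycle:

```cpp
#include <cassert>
#include <cstdint>

// Simplified queue: the counter advances each time the queue is synchronised
// (and therefore reset).
struct CycleQueue {
  std::uint64_t SyncCycle = 0;
  void synchronize() { ++SyncCycle; }
};

// An event remembers the cycle in which it was recorded.
struct CycleEvent {
  const CycleQueue *Owner;
  std::uint64_t RecordedCycle;

  // If the queue has been synchronised since the event was recorded, all of
  // the event's work has finished. In this model, events from the current
  // cycle are reported as not yet complete.
  bool isComplete() const { return Owner->SyncCycle != RecordedCycle; }
};

CycleEvent recordEvent(const CycleQueue &Q) {
  return CycleEvent{&Q, Q.SyncCycle};
}
```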
2025-07-21 09:37:29 +01:00
Ross Brunton
df9a864b04
[Offload] Implement event sync in amdgpu (#149300) 2025-07-18 09:56:17 +01:00
Ross Brunton
55b417a75f
[Offload] Cache symbols in program (#148209)
When creating a new symbol, check whether it already exists. If it does,
return that pointer rather than building a new symbol structure.
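
A minimal sketch of this lookup-before-create pattern, assuming symbols are keyed by name. `CachedSymbol` and `ProgramSketch` are hypothetical types, not the liboffload ones:

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

struct CachedSymbol {
  std::string Name;
};

class ProgramSketch {
  // Owns every symbol created for this program, keyed by name.
  std::map<std::string, std::unique_ptr<CachedSymbol>> Symbols;

public:
  // Return the cached symbol if one exists, otherwise build and cache it.
  CachedSymbol *getSymbol(const std::string &Name) {
    auto It = Symbols.find(Name);
    if (It != Symbols.end())
      return It->second.get();
    auto Inserted =
        Symbols.emplace(Name, std::make_unique<CachedSymbol>(CachedSymbol{Name}));
    return Inserted.first->second.get();
  }
};
```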
2025-07-16 18:32:47 +01:00
Kenneth Benzie (Benie)
508f9a0274
[Offload] Skip event tests on AMDGPU (#148632)
Add `OffloadDeviceTest::getPlatformBackend()` and use it to skip event
tests which currently fail on AMDGPU due to:

```
OL_ERRC_UNIMPLEMENTED: synchronize event not implemented
```
2025-07-14 09:19:53 -05:00
Ross Brunton
2fdeeefacf
[Offload] Add global variable address/size queries (#147972)
Add two new symbol info types for getting the bounds of a global
variable, as well as a number of tests for reading from and writing to it.
2025-07-11 16:12:48 +01:00
Ross Brunton
84e15d08c2
[Offload] Add olGetSymbolInfo[Size] (#147962)
This mirrors the similar functions for other handles. The only
implemented info at the moment is the symbol's kind.
2025-07-11 15:29:53 +01:00
Ross Brunton
eee723f928
[Offload] Replace GetKernel with GetSymbol with global support (#148221)
`olGetKernel` has been replaced by `olGetSymbol` which accepts a
`Kind` parameter. As well as loading information about kernels, it
can now also load information about global variables.
2025-07-11 14:48:10 +01:00
Ross Brunton
466357ab51
[Offload] Change ol_kernel_handle_t -> ol_symbol_handle_t (#147943)
In the future, we want `ol_symbol_handle_t` to represent both kernels
and global variables. The first step in this process is a rename and
promotion to a "typed handle".
2025-07-10 14:54:10 +01:00
Callum Fare
7c6edf4a05
[Offload] Implement olGetQueueInfo, olGetEventInfo (#142947)
Add info queries for queues and events.

`olGetQueueInfo` only supports getting the associated device. We were
already tracking this so we can implement this for free. We will likely
add other queries to it in the future (whether the queue is empty, what
flags it was created with, etc)

`olGetEventInfo` only supports getting the associated queue. This is
another thing we were already storing in the handle. We'll be able to
add other queries in future (the event type, status, etc)
2025-07-09 17:09:31 +01:00
Ross Brunton
bed9fe77dc
[Offload] Tests for global memory and constructors (#147537)
Adds two "launch kernel" tests for liboffload, one testing that
global memory works and persists between different kernels, and one
verifying that `[[gnu::constructor]]` works correctly.

Since we now have tests that contain multiple kernels in the same
binary, the test framework has been updated a bit.
2025-07-09 14:26:50 +01:00
Ross Brunton
8ae8d31832
[Offload] Add liboffload unit tests for shared/local memory (#147040) 2025-07-07 16:20:02 +01:00
Ross Brunton
7d52b0983e
[Offload] Add MAX_WORK_GROUP_SIZE device info query (#143718)
This adds a new device info query for the maximum workgroup/block size
for each dimension.
2025-07-02 16:33:54 +01:00
Ross Brunton
003145d0c8
[Offload] Implement olShutDown (#144055)
`olShutDown` was not properly calling deinit on the platforms, resulting
in random segfaults on AMD devices.

As part of this, `olInit` and `olShutDown` now alloc and free the
offload context rather than it being static. This
allows `olShutDown` to be called within a destructor of a static object
(like the tests do) without having to worry about destructor ordering.
2025-06-30 12:14:00 +01:00
Ross Brunton
613c38a992
[Offload] Fix type mismatch warning in test (#143700) 2025-06-23 10:14:12 +01:00
Joseph Huber
3f1de197b1
[Offload] Rework compiling device code for unit test suites (#144776)
Summary:
I'll probably want to use this as a more generic utility in the future.
This patch reworks it to make it a top level function. I also tried to
decouple this from the OpenMP utilities to make that easier in the
future. Instead, I just use `-march=native` functionality which is the
same thing. Needed a small hack to skip the linker stage for checking if
that works.

This should still create the same output as far as I'm aware.
2025-06-20 10:31:54 -05:00
Ross Brunton
e0633d59b9
[Offload] Check for initialization (#144370)
All entry points (except olInit) now check that offload has been
initialized. If not, a new `OL_ERRC_UNINITIALIZED` error is returned.
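
The guard can be pictured as in the sketch below. Everything in the `sketch` namespace is hypothetical; only the idea of returning an "uninitialized" error before doing any work comes from the commit:

```cpp
#include <cassert>

namespace sketch {

// Hypothetical error codes standing in for the real result type.
enum class ErrorCode { Success, Uninitialized };

// Tracks whether init() has been called without a matching shutDown().
bool Initialized = false;

ErrorCode init() {
  Initialized = true;
  return ErrorCode::Success;
}

ErrorCode shutDown() {
  if (!Initialized)
    return ErrorCode::Uninitialized;
  Initialized = false;
  return ErrorCode::Success;
}

// Example entry point: refuse to do any work before init() has run.
ErrorCode getDeviceCount(int &Count) {
  if (!Initialized)
    return ErrorCode::Uninitialized;
  Count = 0; // A real runtime would query its plugins here.
  return ErrorCode::Success;
}

} // namespace sketch
```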
2025-06-20 09:04:50 -05:00
Ross Brunton
4f60321ca1
[Offload] Add ol_dimensions_t and convert ranges from size_t -> uint32_t (#143901)
This is a three-element x, y, z uint32_t vector that can be used any place
where a 3D vector is required. This ensures that all vectors across
liboffload are the same and don't require any resizing/reordering
dances.
2025-06-12 09:59:59 -05:00
Ross Brunton
269c29ae67
[Offload] Allow setting null arguments in olLaunchKernel (#141958) 2025-06-06 07:05:11 -05:00
Ross Brunton
e83c80340f
[Offload] Split offload unittests into multiple files (#142418)
Rather than a single `offload.unittests` file, this will produce
`device.unittests`, `event.unittests`, etc. This should reduce time
spent building tests, and make it easier to manually run a subset of
the tests.

Note that `check-offload-unit` will still run all the tests.
2025-06-02 11:48:12 -05:00