llvm-project

Author	SHA1	Message	Date
Leandro Lacerda	34028294e4	[Offload] Add support for measuring elapsed time between events (#186856 ) This patch adds `olGetEventElapsedTime` to the new LLVM Offload API, as requested in [#185728](https://github.com/llvm/llvm-project/issues/185728), and adds the corresponding support in `plugins-nextgen`. A main motivation for this change is to make it possible to measure the elapsed time of work submitted to a queue, especially kernel launches. This is relevant to the intended use of the new Offload API for microbenchmarking GPU libc math functions. ### Summary The new API returns the elapsed time, in milliseconds, between two events on the same device. To support the common pattern `create start event → enqueue kernel → create end event → sync end event → get elapsed time`, `olCreateEvent` now always creates and records a backend event through the device interface. For backends that materialize real event state, this gives the event concrete backend state that can be used for elapsed-time measurement. For backends that do not materialize backend event state, `EventInfo` may still remain null and existing event operations continue to treat such events as trivially complete. Previously, an event created on an empty queue could be represented only as a logical event. That representation was sufficient for sync and completion queries, but it was not suitable for elapsed-time measurement because there was no backend event state to timestamp. The new behavior preserves the meaning of completion of prior work while also allowing backends with timing support to attach real event state. ### Changes in `plugins-nextgen` #### Common interface Add elapsed-time support to the common device and plugin interfaces: * `GenericPluginTy::get_event_elapsed_time` * `GenericDeviceTy::getEventElapsedTime` * `GenericDeviceTy::getEventElapsedTimeImpl` #### AMDGPU * Add the required ROCr declarations and wrappers. * Enable queue profiling at queue creation time. 
* Record events by enqueuing a real barrier marker packet on the stream. * Retain the timing signal needed to query the recorded marker later. * Implement `getEventElapsedTimeImpl` using `hsa_amd_profiling_get_dispatch_time`, converting the result to milliseconds with `HSA_SYSTEM_INFO_TIMESTAMP_FREQUENCY`. This follows the ROCm/HIP approach of enabling queue profiling at HSA queue creation time, while keeping the AMDGPU queue path simpler than the lazy-enable alternative discussed during review. #### CUDA * Add the required CUDA driver declarations and wrappers. * Implement `getEventElapsedTimeImpl` with `cuEventElapsedTime`. #### Host * Add `getEventElapsedTimeImpl` that stores `0.0f` in the output pointer, when present, and returns success. Reason: the host plugin does not materialize backend event state and already treats event operations as trivially successful. Returning `0.0f` preserves that model without introducing a new failure mode. #### Level Zero * Add `getEventElapsedTimeImpl`, but leave it unimplemented. Reason: the Level Zero plugin currently does not provide standalone backend event support for this event model. For example, `waitEventImpl` / `syncEventImpl` are still unimplemented there. --------- Signed-off-by: Leandro Augusto Lacerda Campos <leandrolcampos@yahoo.com.br> Signed-off-by: Leandro A. Lacerda Campos <leandrolcampos@yahoo.com.br>	2026-04-01 14:13:44 -05:00
Joseph Huber	4961700c10	[libc] Support AMDGPU device interrupts for the RPC interface (#188067 ) Summary: One of the main disadvantages to using the RPC interface is that it requires a server thread to spin on the mailboxes checking for work. The vast majority of the time, there will be no work and work will come in large bursts. The HSA / KFD interface supports device-side interrupts and already has handling for binding these events to an HSA signal. This means that we can send interrupts from the GPU to wake a sleeping thread on the CPU. The sleeping thread will be descheduled with a blocking HSA wait call and woken up when its event ID is raised through the kernel driver's interrupt. This is very target-specific handling, but I believe it is valuable enough to warrant it being in the protocol. It is completely optional, as it is ignored if uninitialized. This should bring this support to parity with the interface HIP expects.	2026-03-24 08:48:52 -05:00
Joseph Huber	154a128c65	Reapply "[OpenMP] Move OpenMP implicit argument to the end and reformat" (#186309 ) Should be working downstream now This reverts commit 9b61ff210fdff752d5db55b128474e9990258488.	2026-03-13 15:48:37 -05:00
theRonShark	9b61ff210f	Revert "[OpenMP] Move OpenMP implicit argument to the end and reformat" (#186309 ) Reverts llvm/llvm-project#185989	2026-03-13 05:20:40 +00:00
Kevin Sala Penades	ac71b185c2	[offload] Remove LIBOMPTARGET_SHARED_MEMORY_SIZE envar (#186231 ) This commit removes the `LIBOMPTARGET_SHARED_MEMORY_SIZE` envar and outputs a runtime warning if it is defined. Access to dynamic shared memory should be obtained through the `dyn_groupprivate` clause (OpenMP 6.1) or the launch arguments in liboffload kernel launch.	2026-03-12 21:21:29 -07:00
Joseph Huber	4376fbd793	[OpenMP] Move OpenMP implicit argument to the end and reformat (#185989 ) Summary: We use this `dyn_ptr` argument in Clang/OpenMP to handle the `KernelLaunchEnvironment`. This is a per-kernel argument used to share some information. Currently, it's prepended to the argument list and we generate storage for it in the runtime. This is bad for a few reasons: 1. It changes the ABI by shifting user arguments 2. It cannot trivially be left uninitialized if unused 3. The runtime must allocate its own memory for it This PR changes it to be appended instead. Additionally, space for this is always emitted. This means the OMPIRBuilder itself will provide the storage, we simply need to populate it in the runtime if it is used. This means that if it's unused we don't always pay the cost and it's easier for non-OpenMP users to ignore it. Backward compatibility is maintained by auto-upgrading the kernel arguments. In `libomptarget` we completely allocate a new buffer to store this in the new format. The plugins still need to respect the old ABI of the called device object, so we simply rotate it if it's the old version.	2026-03-12 18:08:22 -05:00
Kevin Sala Penades	1f583c6dee	[OpenMP][Offload] Add offload runtime support for dyn_groupprivate clause (#152831 ) Part 3 adding offload runtime support. See https://github.com/llvm/llvm-project/pull/152651. --------- Co-authored-by: Krzysztof Parzyszek <Krzysztof.Parzyszek@amd.com>	2026-03-12 01:13:06 -07:00
Alex Duran	be021b8433	[OFFLOAD] Add interface to extend image validation (#185663 ) As discussed in #185404 we might want to provide a way for plugins to validate images not recognized by the common layer. This PR adds such extension and uses it to validate pure SPIRV images by the Level Zero plugin.	2026-03-10 18:41:23 +01:00
Joseph Huber	a9e457a82f	[Offload][AMDGPU] Fix RPC server on mixed w32 w64 workloads (#185496 ) Summary: This was a regression from the original LLVM-gpu-loader. We used to handle `-mwavefrontsize64` correctly in the loader by over-allocating memory and just leaving the upper 32-bits masked off. In order to handle this in offload we need to scan loaded kernels to see how much memory we need to allocate. This should be safe, the protocol is designed to handle an arbitrary size and worst-case this just wastes space.	2026-03-09 17:13:59 -05:00
Jan Patrick Lehr	e1e0e86e60	[Offload] Always check/consume Error (#182008 ) This fixes an issue introduced in https://github.com/llvm/llvm-project/pull/172226 where an llvm::Error is not checked in the "good" code path.	2026-02-18 13:46:21 +01:00
fineg74	1c6d774baa	[OFFLOAD] Extend olMemRegister API to handle cases when a memory block may have been mapped outside of liboffload. (#172226 ) This PR extends the liboffload olMemRegister API to handle the case where a memory block may have been mapped before calling olMemRegister, to support some use cases in libomptarget	2026-02-17 20:53:00 +00:00
fineg74	848d736e64	[OFFLOAD] Add asynchronous queue query API for libomptarget migration (#172231 ) This PR adds a liboffload asynchronous queue query API that is needed for libomptarget to use liboffload	2026-01-20 10:53:32 -08:00
Hansang Bae	dae3b49cba	[Offload] Update debug message printing in the plugins (#175205 ) * Prepare a set of debug types in llvm::offload::debug to be used in plugin code * Update debug messages in the plugins	2026-01-12 14:26:43 -06:00
Alex Duran	f125c8db5c	[OFFLOAD] Add plugin with support for Intel oneAPI Level Zero (#158900 ) Add a new nextgen plugin that supports GPU devices through the Intel oneAPI Level Zero library. The plugin is not enabled by default and needs to be added to LIBOMPTARGET_PLUGINS_TO_BUILD explicitly. --------- Co-authored-by: Alexey Sachkov <alexey.sachkov@intel.com> Co-authored-by: Nick Sarnie <nick.sarnie@intel.com> Co-authored-by: Joseph Huber <huberjn@outlook.com>	2025-12-18 08:53:03 +01:00
Hansang Bae	ecb94bcfe2	[Offload] Debug message update part 3 (#171684 ) Update debug messages based on the new method from #170425. Updated the following files. - plugins-nextgen/common/include/MemoryManager.h - plugins-nextgen/common/include/PluginInterface.h - plugins-nextgen/common/src/GlobalHandler.cpp - plugins-nextgen/common/src/PluginInterface.cpp - plugins-nextgen/host/dynamic_ffi/ffi.cpp	2025-12-17 09:05:16 -06:00
Kevin Sala Penades	1a86f0aae7	[Offload] Add device info for shared memory (#167817 )	2025-11-13 11:00:12 -08:00
Joseph Huber	aaddd8d38a	[OpenMP] Fix tests relying on the heap size variable Summary: I made that an unimplemented error, but forgot that it was used for this environment variable.	2025-11-06 13:00:26 -06:00
Joseph Huber	670c453aeb	[Offload] Remove handling for device memory pool (#163629 ) Summary: This was a lot of code that was only used for upstream LLVM builds of AMDGPU offloading. We have a generic and fast `malloc` in `libc` now so just use that. Simplifies code, can be added back if we start providing alternate forms but I don't think there's a single use-case that would justify it yet.	2025-11-06 10:15:18 -06:00
Robert Imschweiler	dc94f2cbad	[Offload] Add device UID (#164391 ) Introduced in OpenMP 6.0, the device UID shall be a unique identifier of a device on a given system. (Not necessarily a UUID.) Since it is not guaranteed that the (U)UIDs defined by the device vendor libraries, such as HSA, do not overlap with those of other vendors, the device UIDs in offload are always combined with the offload plugin name. In case the vendor library does not specify any device UID for a given device, we fall back to the offload-internal device ID. The device UID can be retrieved using the `llvm-offload-device-info` tool.	2025-11-04 20:15:47 +01:00
Nicole Aschenbrenner	16641ad8a2	[OpenMP] Adds omp_target_is_accessible routine (#138294 ) Adds omp_target_is_accessible routine. Refactors common code from omp_target_is_present to work for both routines. --------- Co-authored-by: Shilei Tian <i@tianshilei.me>	2025-10-22 17:35:16 +02:00
Alex Duran	45757b9284	[OFFLOAD] Remove unused init_device_info plugin interface (#162650 ) This was used for the old interop code. It's dead code after #143491	2025-10-09 08:38:24 -05:00
Alexey Sachkov	bb584644e9	[Offload][NFC] Avoid temporary string copies in InfoTreeNode (#159372 )	2025-09-23 12:21:57 -05:00
Joseph Huber	51e3c3d51b	[Offload] Implement 'olIsValidBinary' in offload and clean up (#159658 ) Summary: This exposes the 'isDeviceCompatible' routine for checking if a binary can be loaded. This is useful if people don't want to consume errors everywhere when figuring out which image to put to what device. I don't know if this is a good name, I was thinking like `olIsCompatible` or whatever. Let me know what you think. Long term I'd like to be able to do something similar to what OpenMP does where we can conditionally only initialize devices if we need them. That's going to be support needed if we want this to be more generic.	2025-09-19 12:15:57 -05:00
Joseph Huber	e7101dac9c	[Offload] Copy loaded images into managed storage (#158748 ) Summary: Currently we have this `__tgt_device_image` indirection which just takes a reference to some pointers. This was all fine and good when the only usage of this was from a section of GPU code that came from an ELF constant section. However, we have expanded beyond that and now need to worry about managing lifetimes. We have code that references the image even after it was loaded internally. This patch changes the implementation to instead copy the memory buffer and manage it locally. This PR reworks the JIT and other image handling to directly manage its own memory. We now don't need to duplicate this behavior externally at the Offload API level. Also we actually free these if the user unloads them. Upside, less likely to crash and burn. Downside, more latency when loading an image.	2025-09-16 08:57:28 -05:00
Joseph Huber	5d550bf41c	[OpenMP] Move `__omp_rtl_data_environment' handling to OpenMP (#157182 ) Summary: This operation is done every time we load a binary, this behavior should be moved into OpenMP since it concerns an OpenMP specific data struct. This is a little messy, because ideally we should only be using public APIs, but more can be extracted later.	2025-09-08 09:58:38 -05:00
Ross Brunton	32beea0605	[OpenMP][Offload] Mark `SPMD_NO_LOOP` as a valid exec mode (#155990 ) This was added in #154105 , but was not added to the plugin interface's list of valid modes.	2025-09-01 11:27:24 +01:00
Dominik Adamski	87db8e9130	[OpenMP][Offload] Add SPMD-No-Loop mode to OpenMP offload runtime (#154105 ) Kernels which are marked as SPMD-No-Loop should be launched with sufficient number of teams and threads to cover loop iteration space. No-Loop mode is described in RFC: https://discourse.llvm.org/t/rfc-no-loop-mode-for-openmp-gpu-kernels/87517/	2025-08-28 09:19:14 +02:00
Callum Fare	0b18d2da70	[Offload] Implement olMemFill (#154102 ) Implement olMemFill to support filling device memory with arbitrary length patterns. AMDGPU support will be added in a follow-up PR.	2025-08-22 14:31:16 +01:00
Ross Brunton	4c0c295775	[Offload] `OL_EVENT_INFO_IS_COMPLETE` (#153194 ) A simple info query for events that returns whether the event is complete or not.	2025-08-22 13:40:31 +01:00
Ross Brunton	2c11a83691	[Offload] Add olCalculateOptimalOccupancy (#142950 ) This is equivalent to `cuOccupancyMaxPotentialBlockSize`. It is currently only implemented on Cuda; AMDGPU and Host return unsupported. --------- Co-authored-by: Callum Fare <callum@codeplay.com>	2025-08-19 15:16:47 +01:00
Abhinav Gaba	79cf877627	[Offload] Introduce dataFence plugin interface. (#153793 ) The purpose of this fence is to ensure that any `dataSubmit`s inserted into a queue before a `dataFence` finish before any `dataSubmit`s inserted after it begin. This is a no-op for most queues, since they are in-order, and by design any operations inserted into them occur in order. But the interface is supposed to be functional for out-of-order queues. The addition of the interface means that any operations that rely on such ordering (like ATTACH map-type support in #149036) can invoke it, without worrying about whether the underlying queue is in-order or out-of-order. Once a plugin supports out-of-order queues, the plugin can implement this function, without requiring any change at the libomptarget level. --------- Co-authored-by: Alex Duran <alejandro.duran@intel.com>	2025-08-15 11:49:35 -07:00
Ross Brunton	30c7951136	[Offload] `olLaunchHostFunction` (#152482 ) Add an `olLaunchHostFunction` method that allows enqueueing host work to the stream.	2025-08-15 09:39:48 +01:00
Ross Brunton	910d7e90bf	[Offload] Make olLaunchKernel test thread safe (#149497 ) This sprinkles a few mutexes around the plugin interface so that the olLaunchKernel CTS test now passes when run on multiple threads. Part of this also involved changing the interface for device synchronise so that it can optionally not free the underlying queue (which introduced a race condition in liboffload).	2025-08-08 10:57:04 +01:00
Ross Brunton	a44532544b	[Offload] Don't create events for empty queues (#152304 ) Add a device function to check if a device queue is empty. If liboffload tries to create an event for an empty queue, we create an "empty" event that is already complete. This allows `olCreateEvent`, `olSyncEvent` and `olWaitEvent` to run quickly for empty queues.	2025-08-07 10:16:33 +01:00
hidekisaito	83e5a99ff6	[AMDGPU][Offload] Enable memory manager use for up to ~3GB allocation size in omp_target_alloc (#151882 ) Enables AMD data center class GPUs to use memory manager memory pooling up to 3GB allocation by default, up from the "1 << 13" threshold that all plugin-nextgen devices use.	2025-08-06 14:41:20 -07:00
Alex Duran	66d1c37eb6	[OFFLOAD][OPENMP] 6.0 compatible interop interface (#143491 ) The following patch introduces a new interop interface implementation with the following characteristics: * It supports the new 6.0 prefer_type specification * It supports both explicit objects (from interop constructs) and implicit objects (from variant calls). * Implements a per-thread reuse mechanism for implicit objects to reduce overheads. * It provides a plugin interface that allows selecting the supported interop types, and managing all the backend related interop operations (init, sync, ...). * It enables cooperation with the OpenMP runtime to allow progress on OpenMP synchronizations. * It cleans up some vendor/fr_id mismatches from the current query routines. * It supports extension to define interop callbacks for library cleanup.	2025-08-06 16:34:39 +02:00
Ross Brunton	311847be4c	[Offload] Allow "tagging" device info entries with offload keys (#147317 ) When generating the device info tree, nodes can be marked with an offload Device Info value. The nodes can also look up children based on this value.	2025-07-18 14:27:34 +01:00
Ross Brunton	8e104d69fc	[Offload] Provide proper memory management for Images on host device (#146066 ) The `unloadBinaryImpl` method on the host plugin is now implemented properly (rather than just being a stub). When an image is unloaded, it is deallocated and the library associated with it is closed.	2025-07-08 12:42:06 +01:00
Ross Brunton	4f02965ae2	[Offload] Store kernel name in GenericKernelTy (#142799 ) GenericKernelTy has a pointer to the name that was used to create it. However, the name passed in as an argument may not outlive the kernel. Instead, GenericKernelTy now contains a std::string, and copies the name into there.	2025-07-02 14:11:05 +01:00
Ross Brunton	0870c8838b	[Offload] Add an `unloadBinary` interface to PluginInterface (#143873 ) This allows removal of a specific Image from a Device, rather than requiring all image data to outlive the device they were created for. This is required for `ol_program_handle_t`s, which now specify the lifetime of the buffer used to create the program.	2025-06-25 14:53:18 +01:00
Ross Brunton	f242360e15	[Offload] Add type information to device info nodes (#144535 ) Rather than being "stringly typed", store values as a std::variant that can hold various types. This means that liboffload doesn't have to do any string parsing for integer/bool device info keys.	2025-06-20 09:05:05 -05:00
Ross Brunton	e6a3579653	[Offload] Replace device info queue with a tree (#144050 ) Previously, device info was returned as a queue with each element having a "Level" field indicating its nesting level. This replaces this queue with a more traditional tree-like structure. This should not result in a change to the output of `llvm-offload-device-info`.	2025-06-13 09:22:47 -05:00
Callum Fare	b78bc35d16	[Offload] Don't check in generated files (#141982 ) Previously we decided to check in files that we generate with tablegen. The justification at the time was that it helped reviewers unfamiliar with `offload-tblgen` see the actual changes to the headers in PRs. After trying it for a while, it's ended up causing some headaches and is also not how tablegen is used elsewhere in LLVM. This changes our use of tablegen to be more conventional. Where possible, files are still clang-formatted, but this is no longer a hard requirement. Because `OffloadErrcodes.inc` is shared with libomptarget it now gets generated in a more appropriate place.	2025-06-03 10:39:04 -05:00
Ross Brunton	050892d2f8	[Offload] Use new error code handling mechanism and lower-case messages (#139275 ) [Offload] Use new error code handling mechanism This removes the old ErrorCode-less error method and requires every user to provide a concrete error code. All calls have been updated. In addition, for consistency with error messages elsewhere in LLVM, all messages have been made to start lower case.	2025-05-20 08:50:20 -05:00
Ross Brunton	1532ee6916	[Offload] Add Error Codes to PluginInterface (#138258 ) A new ErrorCode enumeration is present in PluginInterface which can be used when returning an llvm::Error from offload and PluginInterface functions. This enum must be kept up to sync with liboffload's ol_errc_t enum, so both are automatically generated from liboffload's enum definition. Some error codes have also been shuffled around to allow for future work. Note that this patch only adds the machinery; actual error codes will be added in a future patch. ~~Depends on #137339 , please ignore first commit of this MR.~~ This has been merged.	2025-05-19 09:38:34 -05:00
Dhruva Chakrabarti	f965996cfb	[Offload] Remove unused field IsBareKernel. (#139815 )	2025-05-13 17:35:55 -07:00
Joseph Huber	92bba68634	[Offload] Fix handling of 'bare' mode when environment missing (#136794 ) Summary: We treated the missing kernel environment as a unique mode, but it was kind of this random bool that was doing the same thing and it explicitly expects the kernel environment to be zero. It broke after the previous change since it used to default to SPMD and didn't handle zero in any of the other cases despite being used. This fixes that and queries for it without needing to consume an error.	2025-04-23 08:16:39 -05:00
Christian Clauss	1f56bb3137	[Offload][NFC] Fix typos discovered by codespell (#125119 ) https://github.com/codespell-project/codespell % `codespell --ignore-words-list=archtype,hsa,identty,inout,iself,nd,te,ths,vertexes --write-changes`	2025-01-31 09:35:29 -06:00
Shilei Tian	92376c3ff5	[Offload][OMPX] Add the runtime support for multi-dim grid and block (#118042 )	2024-12-06 09:07:50 -05:00
Callum Fare	fd3907ccb5	Reland #118503 : [Offload] Introduce offload-tblgen and initial new API implementation (#118614 ) Reland #118503. Added a fix for builds with `-DBUILD_SHARED_LIBS=ON` (see last commit). Otherwise the changes are identical. --- ### New API Previous discussions at the LLVM/Offload meeting have brought up the need for a new API for exposing the functionality of the plugins. This change introduces a very small subset of a new API, which is primarily for testing the offload tooling and demonstrating how a new API can fit into the existing code base without being too disruptive. Exact designs for these entry points and future additions can be worked out over time. The new API does however introduce the bare minimum functionality to implement device discovery for Unified Runtime and SYCL. This means that the `urinfo` and `sycl-ls` tools can be used on top of Offload. A (rough) implementation of a Unified Runtime adapter (aka plugin) for Offload is available [here](https://github.com/callumfare/unified-runtime/tree/offload_adapter). Our intention is to maintain this and use it to implement and test Offload API changes with SYCL. ### Demoing the new API ```sh # From the runtime build directory $ ninja LibomptUnitTests $ OFFLOAD_TRACE=1 ./offload/unittests/OffloadAPI/offload.unittests ``` ### Open questions and future work * Only some of the available device info is exposed, and not all the possible device queries needed for SYCL are implemented by the plugins. A sensible next step would be to refactor and extend the existing device info queries in the plugins. The existing info queries are all strings, but the new API introduces the ability to return any arbitrary type. * It may be sensible at some point for the plugins to implement the new API directly, and the higher level code on top of it could be made generic, but this is more of a long-term possibility.	2024-12-05 09:34:04 +01:00


74 Commits