llvm-project

Author	SHA1	Message	Date
Leandro Lacerda	34028294e4	[Offload] Add support for measuring elapsed time between events (#186856 ) This patch adds `olGetEventElapsedTime` to the new LLVM Offload API, as requested in [#185728](https://github.com/llvm/llvm-project/issues/185728), and adds the corresponding support in `plugins-nextgen`. A main motivation for this change is to make it possible to measure the elapsed time of work submitted to a queue, especially kernel launches. This is relevant to the intended use of the new Offload API for microbenchmarking GPU libc math functions. ### Summary The new API returns the elapsed time, in milliseconds, between two events on the same device. To support the common pattern `create start event → enqueue kernel → create end event → sync end event → get elapsed time`, `olCreateEvent` now always creates and records a backend event through the device interface. For backends that materialize real event state, this gives the event concrete backend state that can be used for elapsed-time measurement. For backends that do not materialize backend event state, `EventInfo` may still remain null and existing event operations continue to treat such events as trivially complete. Previously, an event created on an empty queue could be represented only as a logical event. That representation was sufficient for sync and completion queries, but it was not suitable for elapsed-time measurement because there was no backend event state to timestamp. The new behavior preserves the meaning of completion of prior work while also allowing backends with timing support to attach real event state. ### Changes in `plugins-nextgen` #### Common interface Add elapsed-time support to the common device and plugin interfaces: * `GenericPluginTy::get_event_elapsed_time` * `GenericDeviceTy::getEventElapsedTime` * `GenericDeviceTy::getEventElapsedTimeImpl` #### AMDGPU * Add the required ROCr declarations and wrappers. * Enable queue profiling at queue creation time. * Record events by enqueuing a real barrier marker packet on the stream. * Retain the timing signal needed to query the recorded marker later. * Implement `getEventElapsedTimeImpl` using `hsa_amd_profiling_get_dispatch_time`, converting the result to milliseconds with `HSA_SYSTEM_INFO_TIMESTAMP_FREQUENCY`. This follows the ROCm/HIP approach of enabling queue profiling at HSA queue creation time, while keeping the AMDGPU queue path simpler than the lazy-enable alternative discussed during review. #### CUDA * Add the required CUDA driver declarations and wrappers. * Implement `getEventElapsedTimeImpl` with `cuEventElapsedTime`. #### Host * Add `getEventElapsedTimeImpl` that stores `0.0f` in the output pointer, when present, and returns success. Reason: the host plugin does not materialize backend event state and already treats event operations as trivially successful. Returning `0.0f` preserves that model without introducing a new failure mode. #### Level Zero * Add `getEventElapsedTimeImpl`, but leave it unimplemented. Reason: the Level Zero plugin currently does not provide standalone backend event support for this event model. For example, `waitEventImpl` / `syncEventImpl` are still unimplemented there. --------- Signed-off-by: Leandro Augusto Lacerda Campos <leandrolcampos@yahoo.com.br> Signed-off-by: Leandro A. Lacerda Campos <leandrolcampos@yahoo.com.br>	2026-04-01 14:13:44 -05:00
Joseph Huber	15bfc06b6b	[Offload][NFC] Various minor changes to Offload CMake (#189029 ) Summary: Most of these just remove some redundancy or rename `openmp` -> `offload` where the variable is purely internal.	2026-03-27 12:06:37 -05:00
Alex Duran	64e7c77e04	[OFFLOAD][L0] More error handling (#188496 ) This PR improves cleanup/handling of errors in some memory operations, allocating event pools, ...	2026-03-26 05:50:26 +01:00
fineg74	1dbf7c7e1b	[OFFLOAD] Improve resource management of the plugin (#187597 ) This PR improves event management of the plugin by fixing potential resource leaks and preventing a potential deadlock	2026-03-25 09:50:38 +01:00
Alex Duran	e40062c0bd	[OFFLOAD][L0] Add support to run ctor/dtor code (#187510 ) This PR adds support in the Level Zero plugin to execute constructors/destructors on the device code. As spirv-link has some limitations, it mimics the CUDA plugin behavior where the RTL constructs the device side tables before invoking the kernel that will execute them. The kernel and other necessary symbols to create the device tables are created by the SPIRVCtorDtorLowering pass to be added in #187509	2026-03-25 08:43:44 +01:00
Alex Duran	227bab0a62	[OFFLOAD][L0] Improve cleanup on errors (#188251 ) Additional cleanup improvements on error conditions (in addition to those in #187597): * Fixed incomplete cleanup in L0Context::init() * Fixed build log leak in addModule() * Fixed context inconsistent state in findDevices() Disclaimer: The base of this PR was generated by Claude and adjusted by me afterwards.	2026-03-24 15:36:01 +01:00
fineg74	2890f9883c	[OFFLOAD] Improve handling of synchronization errors in L0 plugin and reenable tests (#186927 ) This change improves handling of errors during synchronization in the Level Zero plugin by ensuring cleanup of queues and events in case of a synchronization error. As a result, multiple tests stopped hanging. --------- Co-authored-by: Duran, Alex <alejandro.duran@intel.com>	2026-03-18 05:50:06 +01:00
Piotr Balcer	1b9a4a0f72	[Offload][L0] clear completed events from a wait list (#186379 ) The queue's WaitEvent collection wasn't being cleared after synchronization and resetting of the events. This led to hangs on subsequent host synchronizations if not preceded by any other operation.	2026-03-13 13:56:27 +00:00
Kevin Sala Penades	1f583c6dee	[OpenMP][Offload] Add offload runtime support for dyn_groupprivate clause (#152831 ) Part 3 adding offload runtime support. See https://github.com/llvm/llvm-project/pull/152651. --------- Co-authored-by: Krzysztof Parzyszek <Krzysztof.Parzyszek@amd.com>	2026-03-12 01:13:06 -07:00
Alex Duran	789fea83bb	[offload][l0][nfc] remove duplicated entry (#185855 ) Remove a function mistakenly left over from #185404	2026-03-11 11:55:30 +01:00
Alex Duran	3ff332ad0f	[Offload][L0] Add support for OffloadBinary format in L0 plugin (#185404 ) - Accept OffloadBinaries as valid images by plugins that support them in the PluginInterface. - Add support in L0 plugin to extract SPIRV images and their associated metadata from an OffloadBinary image. Depends on: - #185663 Follow-up PRs: - #185413 (Changes SPIRV wrapper generation to use OffloadBinary) - #185425 (Adjusts llvm-objdump) - #184774 (Adjusts llvm-offload-binary)	2026-03-11 11:42:36 +01:00
Alex Duran	be021b8433	[OFFLOAD] Add interface to extend image validation (#185663 ) As discussed in #185404 we might want to provide a way for plugins to validate images not recognized by the common layer. This PR adds such extension and uses it to validate pure SPIRV images by the Level Zero plugin.	2026-03-10 18:41:23 +01:00
Hansang Bae	8f268e63e4	[Offload] Remove unused data type (#183840 )	2026-02-27 15:46:59 -06:00
Hansang Bae	a347e1298c	[Offload] Enable memory usage printing with `alloc` debug type (#182938 )	2026-02-23 17:19:41 -06:00
Alex Duran	7ed0aa2652	[OFFLOAD][L0] Remove leftover global constructor (#182611 ) (#182665 ) fixes #182611	2026-02-21 18:09:46 +01:00
Hansang Bae	0deb1b6e05	[Offload] Try to load Level Zero loader with version suffix (#180042 ) The default Level Zero loader `libze_loader.so` may not be available on systems that don't have the Level Zero development package. Level Zero loaders with a major version suffix are searched for in that case.	2026-02-11 15:13:26 -06:00
fineg74	848d736e64	[OFFLOAD] Add asynchronous queue query API for libomptarget migration (#172231 ) Add liboffload asynchronous queue query API for libomptarget migration This PR adds the liboffload asynchronous queue query API that is needed to make libomptarget use liboffload	2026-01-20 10:53:32 -08:00
Hansang Bae	90b6d33755	[Offload] Small debug message fix in Level Zero plugin (#175958 ) Do not include trailing zeros in the device name.	2026-01-14 09:42:19 -06:00
Hansang Bae	13cd7003ad	[NFC][Offload] Rename a function (#175673 ) Renamed a function as suggested in #175664.	2026-01-12 19:40:17 -06:00
Hansang Bae	496729fe7e	[Offload] Fix level_zero plugin build (#175664 ) Build has been broken when OMPTARGET_DEBUG is undefined.	2026-01-12 16:53:23 -06:00
Hansang Bae	dae3b49cba	[Offload] Update debug message printing in the plugins (#175205 ) * Prepare a set of debug types in llvm::offload::debug to be used in plugin code * Update debug messages in the plugins	2026-01-12 14:26:43 -06:00
fineg74	1232599032	[OFFLOAD] Add memory data locking API for libomptarget migration (#173138 ) Add liboffload memory data locking API for libomptarget migration This PR adds the liboffload memory data locking API that is needed to make libomptarget use liboffload	2026-01-12 13:07:57 -06:00
fineg74	583ce49a40	[OFFLOAD] Make L0 provide more information about device to be consistent with other plugins (#172946 ) Update information about devices provided by level zero plugin in order to be more consistent with other plugins.	2026-01-08 22:10:44 +00:00
Alex Duran	280e609d4e	[OFFLOAD][L0] Expose native ELF to upper layers (#172819 ) This PR refactors how the device image is built so we can expose the native ELF of the device to DeviceImageTy, which solves several issues regarding symbol lookup (as DeviceImageTy expects an ELF). It also simplifies the module linking code, taking into account the latest changes in the driver (which adds "-library-compilation" when necessary). --------- Co-authored-by: Alexey Sachkov <alexey.sachkov@intel.com> Co-authored-by: Nick Sarnie <nick.sarnie@intel.com> Co-authored-by: Joseph Huber <huberjn@outlook.com>	2025-12-18 18:03:12 +00:00
Alex Duran	5559918321	[OFFLOAD][L0] Improve symbol device lookup (#172820 ) When looking for the device address of a symbol, we also need to check whether it is a function symbol if it is not found as a global symbol in the device. --------- Co-authored-by: Alexey Sachkov <alexey.sachkov@intel.com> Co-authored-by: Nick Sarnie <nick.sarnie@intel.com> Co-authored-by: Joseph Huber <huberjn@outlook.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2025-12-18 15:31:20 +00:00
Alex Duran	3ac0ff2f36	[OFFLOAD][L0] Fix usages of getDebugLevel in L0 plugin (#172815 ) Support for getDebugLevel was removed as part of the new debug macros (#165416). This PR updates such usages to use the new ODBG_* macros. --------- Co-authored-by: Alexey Sachkov <alexey.sachkov@intel.com> Co-authored-by: Nick Sarnie <nick.sarnie@intel.com> Co-authored-by: Joseph Huber <huberjn@outlook.com>	2025-12-18 15:30:59 +00:00
Alex Duran	f125c8db5c	[OFFLOAD] Add plugin with support for Intel oneAPI Level Zero (#158900 ) Add a new nextgen plugin that supports GPU devices through the Intel oneAPI Level Zero library. The plugin is not enabled by default and needs to be added to LIBOMPTARGET_PLUGINS_TO_BUILD explicitly. --------- Co-authored-by: Alexey Sachkov <alexey.sachkov@intel.com> Co-authored-by: Nick Sarnie <nick.sarnie@intel.com> Co-authored-by: Joseph Huber <huberjn@outlook.com>	2025-12-18 08:53:03 +01:00

27 Commits
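
Note on 34028294e4: the AMDGPU path is described as converting profiling timestamps to milliseconds using `HSA_SYSTEM_INFO_TIMESTAMP_FREQUENCY`. The arithmetic behind such a conversion can be sketched as below. This is only an illustration under stated assumptions: the function name, its parameters, and the choice to return `0.0` for invalid input are hypothetical and are not taken from the patch; it assumes the two timestamps are tick counts from the same clock and that the frequency is given in ticks per second.

```cpp
#include <cstdint>

// Hypothetical helper (not from the patch): convert the distance between two
// tick counts into milliseconds, given the clock frequency in ticks per second.
double ticksToMilliseconds(std::uint64_t StartTicks, std::uint64_t EndTicks,
                           std::uint64_t FrequencyHz) {
  // Assumption for this sketch: invalid input yields 0.0 instead of an error.
  if (FrequencyHz == 0 || EndTicks < StartTicks)
    return 0.0;
  // ticks / (ticks per second) gives seconds; multiply by 1000 for milliseconds.
  return static_cast<double>(EndTicks - StartTicks) * 1000.0 /
         static_cast<double>(FrequencyHz);
}
```

With a 1 MHz clock, for example, a difference of 500 ticks corresponds to 0.5 ms.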