llvm-project

Author	SHA1	Message	Date
Kevin Sala Penades	00d5f660f4	[offload][CUDA] Fix DLWRAP for memory routines (#190500 )	2026-04-04 19:29:25 -07:00
Matt Arsenault	6f68e58519	offload: Parse triple to identify amdgcn-amd-amdhsa (#190319 ) Avoid hardcoding the exact triple.	2026-04-03 23:22:48 +02:00
Joseph Huber	b2d3a6574c	[libc] Rename rpc::Status to rpc::RPCStatus to reduce conflicts (#190239 ) Summary: `Status` is unfortunately heavily overloaded in practice. Things like X11 define it as a macro. Best to just remove that possibility entirely.	2026-04-02 14:55:57 -05:00
Leandro Lacerda	34028294e4	[Offload] Add support for measuring elapsed time between events (#186856 ) This patch adds `olGetEventElapsedTime` to the new LLVM Offload API, as requested in [#185728](https://github.com/llvm/llvm-project/issues/185728), and adds the corresponding support in `plugins-nextgen`. A main motivation for this change is to make it possible to measure the elapsed time of work submitted to a queue, especially kernel launches. This is relevant to the intended use of the new Offload API for microbenchmarking GPU libc math functions. ### Summary The new API returns the elapsed time, in milliseconds, between two events on the same device. To support the common pattern `create start event → enqueue kernel → create end event → sync end event → get elapsed time`, `olCreateEvent` now always creates and records a backend event through the device interface. For backends that materialize real event state, this gives the event concrete backend state that can be used for elapsed-time measurement. For backends that do not materialize backend event state, `EventInfo` may still remain null and existing event operations continue to treat such events as trivially complete. Previously, an event created on an empty queue could be represented only as a logical event. That representation was sufficient for sync and completion queries, but it was not suitable for elapsed-time measurement because there was no backend event state to timestamp. The new behavior preserves the meaning of completion of prior work while also allowing backends with timing support to attach real event state. ### Changes in `plugins-nextgen` #### Common interface Add elapsed-time support to the common device and plugin interfaces: * `GenericPluginTy::get_event_elapsed_time` * `GenericDeviceTy::getEventElapsedTime` * `GenericDeviceTy::getEventElapsedTimeImpl` #### AMDGPU * Add the required ROCr declarations and wrappers. * Enable queue profiling at queue creation time. * Record events by enqueuing a real barrier marker packet on the stream. * Retain the timing signal needed to query the recorded marker later. * Implement `getEventElapsedTimeImpl` using `hsa_amd_profiling_get_dispatch_time`, converting the result to milliseconds with `HSA_SYSTEM_INFO_TIMESTAMP_FREQUENCY`. This follows the ROCm/HIP approach of enabling queue profiling at HSA queue creation time, while keeping the AMDGPU queue path simpler than the lazy-enable alternative discussed during review. #### CUDA * Add the required CUDA driver declarations and wrappers. * Implement `getEventElapsedTimeImpl` with `cuEventElapsedTime`. #### Host * Add `getEventElapsedTimeImpl` that stores `0.0f` in the output pointer, when present, and returns success. Reason: the host plugin does not materialize backend event state and already treats event operations as trivially successful. Returning `0.0f` preserves that model without introducing a new failure mode. #### Level Zero * Add `getEventElapsedTimeImpl`, but leave it unimplemented. Reason: the Level Zero plugin currently does not provide standalone backend event support for this event model. For example, `waitEventImpl` / `syncEventImpl` are still unimplemented there. --------- Signed-off-by: Leandro Augusto Lacerda Campos <leandrolcampos@yahoo.com.br> Signed-off-by: Leandro A. Lacerda Campos <leandrolcampos@yahoo.com.br>	2026-04-01 14:13:44 -05:00
Joseph Huber	15bfc06b6b	[Offload][NFC] Various minor changes to Offload CMake (#189029 ) Summary: Most of these just remove some redundancy or rename `openmp` -> `offload` where the variable is purely internal.	2026-03-27 12:06:37 -05:00
Joseph Huber	ffd6a13b5f	[compiler-rt] Rework profile data handling for GPU targets (#187136 ) Summary: Currently, the GPU iterates through all of the present symbols and copies them by prefix. This is inefficient as it requires a lot of small high-latency data transfers rather than a few large ones. Additionally, we force every single profiling symbol to have protected visibility. This means potentially hundreds of unnecessary symbols in the symbol table. This PR changes the interface to move towards the start / stop section handling. AMDGPU supports this natively as an ELF target, so we need little changes. Instead of overriding visibility, we use a single table to define the bounds that we can obtain with one contiguous load. Using a table interface should also work for the in-progress HIP implementation for this, as it wraps the start / stop sections into standard void pointers which will be inside of an already mapped region of memory, so they should be accessible from the HIP API. NVPTX is more difficult as it is an ELF platform without this support. I have hooked up the 'Other' handling to work around this, but even then it's a bit of a stretch. I could remove this support here, but I wanted to demonstrate that we can share the ABI. However, NVPTX will only work if we force LTO and change the backend to emit variables in the same TL;DR, we now do this: ```c struct { start1, stop1, start2, stop2, start3, stop3, version; } device; struct host = DtoH(lookup("device")); counters = DtoH(host.stop - host.start) version = DtoH(host.version); ```	2026-03-26 10:17:43 -05:00
Alex Duran	64e7c77e04	[OFFLOAD][L0] More error handling (#188496 ) This PR improves cleanup/handling of errors in some memory operations, allocating event pools, ...	2026-03-26 05:50:26 +01:00
fineg74	1dbf7c7e1b	[OFFLOAD] Improve resource management of the plugin (#187597 ) This PR improves event management of the plugin by fixing potential resource leaks and preventing a potential deadlock	2026-03-25 09:50:38 +01:00
Alex Duran	e40062c0bd	[OFFLOAD][L0] Add support to run ctor/dtor code (#187510 ) This PR adds support in the Level Zero plugin to execute constructors/destructors on the device code. As spirv-link has some limitations, it mimics the CUDA plugin behavior where the RTL constructs the device side tables before invoking the kernel that will execute them. The kernel and other necessary symbols to create the device tables are created by the SPIRVCtorDtorLowering pass to be added in #187509	2026-03-25 08:43:44 +01:00
Alex Duran	227bab0a62	[OFFLOAD][L0] Improve cleanup on errors (#188251 ) Additional cleanup improvements on error conditions (in addition to those in #187597): * Fixed incomplete cleanup in L0Context::init() * Fixed build log leak in addModule() * Fixed context inconsistent state in findDevices() Disclaimer: The base of this PR was generated by Claude and adjusted by me afterwards.	2026-03-24 15:36:01 +01:00
Joseph Huber	376874a345	[Offload] Fix destroying signal that was never initialized Summary: We create the RPC doorbell signal lazily and destroy it at the plugin level. This means that we can't rely on the normal 'per-device' handling so this needs to be called unconditionally. We only create the signal if a device is registered, but deinit is called unconditionally. Just check the handle.	2026-03-24 09:29:27 -05:00
Joseph Huber	4961700c10	[libc] Support AMDGPU device interrupts for the RPC interface (#188067 ) Summary: One of the main disadvantages to using the RPC interface is that it requires a server thread to spin on the mailboxes checking for work. The vast majority of the time, there will be no work and work will come in large bursts. The HSA / KFD interface supports device-side interrupts and already has handling for binding these events to an HSA signal. This means that we can send interrupts from the GPU to wake a sleeping thread on the CPU. The sleeping thread will be descheduled with a blocking HSA wait call and woken up when its event ID is raised through the kernel driver's interrupt. This is very target-specific handling, but I believe it is valuable enough to warrant it being in the protocol. It is completely optional, as it is ignored if uninitialized. This should bring this support at parity with the interface HIP expects.	2026-03-24 08:48:52 -05:00
Joseph Huber	07896d44a3	[OpenMP] Emit aggregate kernel prototypes and remove libffi dependency (#186261 ) Summary: This PR changes the handling of the emitted kernels when targeting a CPU to be a pointer struct. The old handling emitted a standard function prototype, this necessitated a target specific ABI to call it because the signature differed with the number of arguments. Instead, this PR emits a void pointer to a naturally aligned struct, this is what APIs like `pthreads` assert. This allows us to remove all the complexity around launching host kernels and just pass the argument list.	2026-03-20 13:08:23 -05:00
Bruce Changlong Xu	cbab7e65a7	[AMDGPU] Minor cleanups in offload plugin and AMDGPUEmitPrintf. NFC. (#187587 ) Use empty() in assert, brace-init instead of std::make_pair in the AMDGPU offload plugin, and fix a comment typo in AMDGPUEmitPrintf.	2026-03-19 18:16:47 -04:00
fineg74	2890f9883c	[OFFLOAD] Improve handling of synchronization errors in L0 plugin and reenable tests (#186927 ) This change improves handling of errors during synchronization in the Level Zero plugin by ensuring cleanup of queues and events in case of a synchronization error. As a result, multiple tests stopped hanging. --------- Co-authored-by: Duran, Alex <alejandro.duran@intel.com>	2026-03-18 05:50:06 +01:00
Joseph Huber	154a128c65	Reapply "[OpenMP] Move OpenMP implicit argument to the end and reformat" (#186309 ) Should be working downstream now This reverts commit 9b61ff210fdff752d5db55b128474e9990258488.	2026-03-13 15:48:37 -05:00
Piotr Balcer	1b9a4a0f72	[Offload][L0] clear completed events from a wait list (#186379 ) Queue's WaitEvent collection wasn't being cleared after synchronization and resetting of the events. This led to hangs on subsequent host synchronizations if not preceded by any other operation.	2026-03-13 13:56:27 +00:00
theRonShark	9b61ff210f	Revert "[OpenMP] Move OpenMP implicit argument to the end and reformat" (#186309 ) Reverts llvm/llvm-project#185989	2026-03-13 05:20:40 +00:00
Kevin Sala Penades	ac71b185c2	[offload] Remove LIBOMPTARGET_SHARED_MEMORY_SIZE envar (#186231 ) This commit removes the `LIBOMPTARGET_SHARED_MEMORY_SIZE` envar and outputs a runtime warning if it is defined. Access to dynamic shared memory should be obtained through the `dyn_groupprivate` clause (OpenMP 6.1) or the launch arguments in liboffload kernel launch.	2026-03-12 21:21:29 -07:00
Joseph Huber	4376fbd793	[OpenMP] Move OpenMP implicit argument to the end and reformat (#185989 ) Summary: We use this `dyn_ptr` argument in Clang/OpenMP to handle the `KernelLaunchEnvironment`. This is a per-kernel argument used to share some information. Currently, it's prepended to the argument list and we generate storage for it in the runtime. This is bad for a few reasons: 1. It changes the ABI by shifting user arguments 2. It cannot trivially be left uninitialized if unused 3. The runtime must allocate its own memory for it This PR changes it to be appended instead. Additionally, space for this is always emitted. This means the OMPIRBuilder itself will provide the storage, we simply need to populate it in the runtime if it is used. This means that if it's unused we don't always pay the cost and it's easier for non-OpenMP users to ignore it. Backward compatibility is maintained by auto-upgrading the kernel arguments. In `libomptarget` we completely allocate a new buffer to store this in the new format. The plugins still need to respect the old ABI of the called device object, so we simply rotate it if it's the old version.	2026-03-12 18:08:22 -05:00
Kevin Sala Penades	1f583c6dee	[OpenMP][Offload] Add offload runtime support for dyn_groupprivate clause (#152831 ) Part 3 adding offload runtime support. See https://github.com/llvm/llvm-project/pull/152651. --------- Co-authored-by: Krzysztof Parzyszek <Krzysztof.Parzyszek@amd.com>	2026-03-12 01:13:06 -07:00
Alex Duran	789fea83bb	[offload][l0][nfc] remove duplicated entry (#185855 ) Remove a function mistakenly left over from #185404	2026-03-11 11:55:30 +01:00
Alex Duran	3ff332ad0f	[Offload][L0] Add support for OffloadBinary format in L0 plugin (#185404 ) - Accept OffloadBinaries as valid images by plugins that support them in the PluginInterface. - Add support in L0 plugin to extract SPIRV images and their associated metadata from an OffloadBinary image. Depends on: - #185663 Follow-up PRs: - #185413 (Changes SPIRV wrapper generation to use OffloadBinary) - #185425 (Adjusts llvm-objdump) - #184774 (Adjusts llvm-offload-binary)	2026-03-11 11:42:36 +01:00
Alex Duran	be021b8433	[OFFLOAD] Add interface to extend image validation (#185663 ) As discussed in #185404 we might want to provide a way for plugins to validate images not recognized by the common layer. This PR adds such extension and uses it to validate pure SPIRV images by the Level Zero plugin.	2026-03-10 18:41:23 +01:00
Joseph Huber	a9e457a82f	[Offload][AMDGPU] Fix RPC server on mixed w32 w64 workloads (#185496 ) Summary: This was a regression from the original LLVM-gpu-loader. We used to handle `-mwavefrontsize64` correctly in the loader by over-allocating memory and just leaving the upper 32-bits masked off. In order to handle this in offload we need to scan loaded kernels to see how much memory we need to allocate. This should be safe, the protocol is designed to handle an arbitrary size and worst-case this just wastes space.	2026-03-09 17:13:59 -05:00
Łukasz Plewa	57614e8810	[OFFLOAD] Replace C-style casts with C++ style casts in obtainInfoImpl (#185023 ) Replace C-style bool casts (bool)TmpInt with C++ functional casts bool(TmpInt)	2026-03-06 10:28:38 -06:00
Hansang Bae	8f268e63e4	[Offload] Remove unused data type (#183840 )	2026-02-27 15:46:59 -06:00
Hansang Bae	a347e1298c	[Offload] Enable memory usage printing with `alloc` debug type (#182938 )	2026-02-23 17:19:41 -06:00
Jan Patrick Lehr	92447ed273	[Offload] Fix copy-elision warning (#182848 ) This fixes a warning about a prohibited copy-elision due to the move of a temporary object.	2026-02-23 13:58:07 +00:00
Alex Duran	7ed0aa2652	[OFFLOAD][L0] Remove leftover global constructor (#182611 ) (#182665 ) fixes #182611	2026-02-21 18:09:46 +01:00
Joseph Huber	21b3461440	[flang-rt] Implement basic support for I/O from OpenMP GPU Offloading (#181039 ) Summary: This PR provides the minimal support for Fortran I/O coming from a GPU in OpenMP offloading. We use the same support the `libc` uses for its printing through the RPC server. The helper functions `rpc::dispatch` and `rpc::invoke` help make this mostly automatic. Because Fortran I/O is not reentrant, the vast majority of complexity comes from needing to stitch together calls from the GPU until they can be executed all at once. This is needed not only because of the limitations of recursive I/O, but without this the output would all be interleaved because of the GPU's lock-step execution. As such, the return values from the intermediate functions are meaningless, all returning true. The final value is correct however. For cookies we create a context pointer on the server to chain these together. Works on both my AMD and NVIDIA GPUs. ```fortran program hello_gpu implicit none !$omp target teams num_teams(1) !$omp parallel num_threads(2) ! Print strings print *, "Hello from GPU" !$omp end parallel !$omp end target teams end program hello_gpu ``` ```console > flang hello.f90 -O2 -fopenmp --offload-arch=gfx1030 > ./a.out Hello from GPU Hello from GPU > flang hello.f90 -O2 -fopenmp --offload-arch=sm_89 > ./a.out Hello from GPU Hello from GPU ```	2026-02-20 07:56:59 -06:00
Jan Patrick Lehr	e1e0e86e60	[Offload] Always check/consume Error (#182008 ) This fixes an issue introduced in https://github.com/llvm/llvm-project/pull/172226 where an llvm::Error is not checked in the "good" code path.	2026-02-18 13:46:21 +01:00
fineg74	1c6d774baa	[OFFLOAD] Extend olMemRegister API to handle cases when a memory block may have been mapped outside of liboffload. (#172226 ) This PR extends the liboffload olMemRegister API to handle the case where a memory block may have been mapped before calling olMemRegister, to support some use cases in libomptarget	2026-02-17 20:53:00 +00:00
Joseph Huber	d85576d368	[libc] Replace RPC 'close()' mechanism with RAII handler (#181690 ) Summary: Closing ports was previously done manually. This made the protocol more error prone, as unclosed ports leak and eventually the locks run out. I believe the original fear was that the RAII portion would negatively impact code generation, but I have not noticed anything significant.	2026-02-16 15:14:30 -06:00
fineg74	b58a31d3ce	[OFFLOAD] Add support for host offloading device (#177307 ) The purpose of this PR is to add support of host as an offloading device to liboffload. Both OpenMP and sycl support offloading to a host as their normal workflow and therefore would require such capability from liboffload library.	2026-02-13 10:27:52 +01:00
Hansang Bae	0deb1b6e05	[Offload] Try to load Level Zero loader with version suffix (#180042 ) The default Level Zero loader `libze_loader.so` may not be available on systems that don't have the Level Zero development package. Level Zero loaders with a major version suffix are searched for in that case.	2026-02-11 15:13:26 -06:00
Alex Duran	8b9fd4803c	[OFFLOAD] Support host plugin on Windows (#180401 ) Changes to make host plugin compile on Windows: * Change IO code to be portable * Adjust Makefiles Allow plugin to work partially when libffi support is not found dynamically (compilation works fine even on Windows because of the wrapper support).	2026-02-11 08:54:47 +01:00
Joseph Huber	2f00977fea	[Offload] Make the RPC callbacks private to each running server (#178901 ) Summary: The static object mixes callbacks from different plugins because ever since we moved to the object library target these are actually shared. Just make it a member of the base class and make it a pointer set just to do some basic deduplication.	2026-02-06 08:28:57 -06:00
Alex Duran	4096cb6017	[OFFLOAD] Fix TARGET_NAME in plugins common code (#180151 ) Unlike other names, it is set between quotes, which prevents our debug macros from properly matching it.	2026-02-06 14:12:04 +01:00
Joseph Huber	1a86c146ae	[Offload] Add a function to register an RPC Server callback (#178774 ) Summary: We provide an RPC server to manage calls initiated by the device to run on the host. This is very useful for the built-in handling we have, however there are cases where we would want to extend this functionality. Cases like Fortran or MPI would be useful, but we cannot put references to these in the core offloading runtime. This way, we can provide this as a library interface that registers custom handlers for whatever code people want.	2026-01-30 08:03:13 -06:00
Hansang Bae	85d64d1201	[Offload] Cast to `void *` in the debug message (#177019 ) There are a few places where data types based on character array or string are printed in the debug message while they do not represent strings. Such expressions should be cast to `void *` unless they represent actual strings. Change also includes casting from integral type to pointer type when appropriate.	2026-01-20 15:44:08 -06:00
fineg74	848d736e64	[OFFLOAD] Add asynchronous queue query API for libomptarget migration (#172231 ) Add liboffload asynchronous queue query API for libomptarget migration This PR adds the liboffload asynchronous queue query API that is needed to make libomptarget use liboffload	2026-01-20 10:53:32 -08:00
Hansang Bae	edd857aad8	[Offload] Remove unnecessary `maybe_unused` attribute (#175855 ) The attribute is not necessary in the new debug messaging.	2026-01-15 14:31:58 -06:00
Hansang Bae	90b6d33755	[Offload] Small debug message fix in Level Zero plugin (#175958 ) Do not include trailing zeros in the device name.	2026-01-14 09:42:19 -06:00
Alex Duran	efad3563ea	[OFFLOAD] Update CUDA and AMD plugins to new debug format (#175787 )	2026-01-13 17:53:59 +01:00
Alex Duran	86e114a9b2	Revert "[OFFLOAD] Update CUDA and AMD plugins to new debug format" (#175786 ) Reverts llvm/llvm-project#175757	2026-01-13 17:13:46 +01:00
Alex Duran	7c2f49373b	[OFFLOAD] Update CUDA and AMD plugins to new debug format (#175757 ) This should be the last step before completely removing the DP macro.	2026-01-13 17:06:35 +01:00
Hansang Bae	13cd7003ad	[NFC][Offload] Rename a function (#175673 ) Renamed a function as suggested in #175664.	2026-01-12 19:40:17 -06:00
Hansang Bae	496729fe7e	[Offload] Fix level_zero plugin build (#175664 ) Build has been broken when OMPTARGET_DEBUG is undefined.	2026-01-12 16:53:23 -06:00
Hansang Bae	dae3b49cba	[Offload] Update debug message printing in the plugins (#175205 ) * Prepare a set of debug types in llvm::offload::debug to be used in plugin code * Update debug messages in the plugins	2026-01-12 14:26:43 -06:00


233 Commits