llvm-project

Author	SHA1	Message	Date
Kevin Sala Penades	00d5f660f4	[offload][CUDA] Fix DLWRAP for memory routines (#190500 )	2026-04-04 19:29:25 -07:00
Matt Arsenault	6f68e58519	offload: Parse triple to identify amdgcn-amd-amdhsa (#190319 ) Avoid hardcoding the exact triple.	2026-04-03 23:22:48 +02:00
Kewen Meng	f29d23844c	[Buildbot][AMDGPU] Adapt to recent CMake change (#190381 ) Make changes to adapt to https://github.com/llvm/llvm-project/pull/190349	2026-04-03 18:15:44 +00:00
Joseph Huber	d8ba56ce3f	[compiler-rt] Split the GPU.cmake cache file to AMDGPU and NVPTX (#190349 ) Summary: These will have different functionality going forward. They should be split so we can more easily support things only feasible in AMDGPU.	2026-04-03 10:44:04 -05:00
Robert Imschweiler	a2d3783b45	[offload][libc] Adapt test to changes in #190239 (#190330 )	2026-04-03 12:03:28 +02:00
Joseph Huber	b2d3a6574c	[libc] Rename rpc::Status to rpc::RPCStatus to reduce conflicts (#190239 ) Summary: `Status` is unfortunately heavily overloaded in practice. Things like X11 define it as a macro. Best to just remove that possibility entirely.	2026-04-02 14:55:57 -05:00
Michael Kruse	afb80bddf1	[Runtimes] Introduce variables containing resource dir paths (#177953 ) Introduce common infrastructure for runtimes that determines compiler resource path locations. The variables introduced are: * RUNTIMES_OUTPUT_RESOURCE_DIR * RUNTIMES_INSTALL_RESOURCE_PATH which contain the location for the compiler resource path (typically `lib/clang/<version>`) in the build tree and the install tree (the latter relative to CMAKE_INSTALL_PREFIX). Additionally, define * RUNTIMES_OUTPUT_RESOURCE_LIB_DIR * RUNTIMES_INSTALL_RESOURCE_LIB_PATH for the location of clang/flang version-locked libraries (typically `lib${LLVM_LIBDIR_SUFFIX}/<target-triple>`, but also depends on `APPLE` and `LLVM_ENABLE_PER_TARGET_RUNTIME_DIR`). This code is moved from flang-rt and initially becomes its only user. Refactored out of #171610 as requested [here](https://github.com/llvm/llvm-project/pull/171610#discussion_r2687382481). Extracted `get_runtimes_target_libdir_common` from compiler-rt as requested [here](https://github.com/llvm/llvm-project/pull/171610#discussion_r2689565634). Added TODO comments to all runtimes as requested [here](https://github.com/llvm/llvm-project/pull/171610#issuecomment-3789598635).	2026-04-02 10:32:14 +00:00
Leandro Lacerda	34028294e4	[Offload] Add support for measuring elapsed time between events (#186856 ) This patch adds `olGetEventElapsedTime` to the new LLVM Offload API, as requested in [#185728](https://github.com/llvm/llvm-project/issues/185728), and adds the corresponding support in `plugins-nextgen`. A main motivation for this change is to make it possible to measure the elapsed time of work submitted to a queue, especially kernel launches. This is relevant to the intended use of the new Offload API for microbenchmarking GPU libc math functions. ### Summary The new API returns the elapsed time, in milliseconds, between two events on the same device. To support the common pattern `create start event → enqueue kernel → create end event → sync end event → get elapsed time`, `olCreateEvent` now always creates and records a backend event through the device interface. For backends that materialize real event state, this gives the event concrete backend state that can be used for elapsed-time measurement. For backends that do not materialize backend event state, `EventInfo` may still remain null and existing event operations continue to treat such events as trivially complete. Previously, an event created on an empty queue could be represented only as a logical event. That representation was sufficient for sync and completion queries, but it was not suitable for elapsed-time measurement because there was no backend event state to timestamp. The new behavior preserves the meaning of completion of prior work while also allowing backends with timing support to attach real event state. ### Changes in `plugins-nextgen` #### Common interface Add elapsed-time support to the common device and plugin interfaces: * `GenericPluginTy::get_event_elapsed_time` * `GenericDeviceTy::getEventElapsedTime` * `GenericDeviceTy::getEventElapsedTimeImpl` #### AMDGPU * Add the required ROCr declarations and wrappers. * Enable queue profiling at queue creation time. 
* Record events by enqueuing a real barrier marker packet on the stream. * Retain the timing signal needed to query the recorded marker later. * Implement `getEventElapsedTimeImpl` using `hsa_amd_profiling_get_dispatch_time`, converting the result to milliseconds with `HSA_SYSTEM_INFO_TIMESTAMP_FREQUENCY`. This follows the ROCm/HIP approach of enabling queue profiling at HSA queue creation time, while keeping the AMDGPU queue path simpler than the lazy-enable alternative discussed during review. #### CUDA * Add the required CUDA driver declarations and wrappers. * Implement `getEventElapsedTimeImpl` with `cuEventElapsedTime`. #### Host * Add `getEventElapsedTimeImpl` that stores `0.0f` in the output pointer, when present, and returns success. Reason: the host plugin does not materialize backend event state and already treats event operations as trivially successful. Returning `0.0f` preserves that model without introducing a new failure mode. #### Level Zero * Add `getEventElapsedTimeImpl`, but leave it unimplemented. Reason: the Level Zero plugin currently does not provide standalone backend event support for this event model. For example, `waitEventImpl` / `syncEventImpl` are still unimplemented there. --------- Signed-off-by: Leandro Augusto Lacerda Campos <leandrolcampos@yahoo.com.br> Signed-off-by: Leandro A. Lacerda Campos <leandrolcampos@yahoo.com.br>	2026-04-01 14:13:44 -05:00
Joseph Huber	b528f6746d	[Offload] Run liboffload unit tests as a part of check-offload (#189731 ) Summary: These are currently only run with check-offload-unit. Make them a part of the other tests by putting a dependency on it. We did something like this previously but it was reverted because the tests failed if there were no GPUs (like in systems that only checked the CPU case) but I think that has been fixed.	2026-04-01 11:06:51 -05:00
Nick Sarnie	899a78cbc4	[offload][lit] Disable target_critical_region.cpp on Intel GPU (#189682 ) Already disabled on other GPU platforms and sporadically failing on our builder, so this test seems not to be doing too hot. Signed-off-by: Nick Sarnie <nick.sarnie@intel.com>	2026-03-31 18:54:17 +00:00
Nick Sarnie	38a46a12c4	[offload][lit] Disable tests failing on Intel GPU (#189422 ) Fix some tests causing hangs, one fail, and a few XPASSing. We are seeing new passes/fails because of the named barrier changes being merged. Signed-off-by: Nick Sarnie <nick.sarnie@intel.com>	2026-03-30 18:02:34 +00:00
fineg74	563d3f6865	[OFFLOAD] Disable tests that may cause hangs in CI (#189116 )	2026-03-27 21:32:25 +00:00
fineg74	1611a23a5b	[OFFLOAD] Add spirv implementation for named barrier (#180393 ) This change adds implementation for named barriers for SPIRV backend. Since there is no built in API/intrinsics for named barrier in SPIRV, the implementation loosely follows implementation for AMD	2026-03-27 20:14:09 +01:00
Joseph Huber	15bfc06b6b	[Offload][NFC] Various minor changes to Offload CMake (#189029 ) Summary: Most of these just remove some redundancy or rename `openmp` -> `offload` where the variable is purely internal.	2026-03-27 12:06:37 -05:00
Joseph Huber	45d7ef423d	[Offload][NFC] Remove unused testing functions in CMake (#189013 ) Summary: These are called by no one.	2026-03-27 10:04:49 -05:00
Nick Sarnie	1aefe3b111	[offload][L0] Remove XFAIL from XPASSING test strided_offset_multidim_update.c (#188836 ) Passing now I guess https://lab.llvm.org/buildbot/#/builders/225/builds/4729 ``` ******************** Unexpectedly Passed Tests (1): libomptarget :: spirv64-intel :: offloading/strided_offset_multidim_update.c ``` Signed-off-by: Nick Sarnie <nick.sarnie@intel.com>	2026-03-26 21:06:22 +00:00
Joseph Huber	ffd6a13b5f	[compiler-rt] Rework profile data handling for GPU targets (#187136 ) Summary: Currently, the GPU iterates through all of the present symbols and copies them by prefix. This is inefficient as it requires a lot of small high-latency data transfers rather than a few large ones. Additionally, we force every single profiling symbol to have protected visibility. This means potentially hundreds of unnecessary symbols in the symbol table. This PR changes the interface to move towards the start / stop section handling. AMDGPU supports this natively as an ELF target, so we need little changes. Instead of overriding visibility, we use a single table to define the bounds that we can obtain with one contiguous load. Using a table interface should also work for the in-progress HIP implementation for this, as it wraps the start / stop sections into standard void pointers which will be inside of an already mapped region of memory, so they should be accessible from the HIP API. NVPTX is more difficult as it is an ELF platform without this support. I have hooked up the 'Other' handling to work around this, but even then it's a bit of a stretch. I could remove this support here, but I wanted to demonstrate that we can share the ABI. However, NVPTX will only work if we force LTO and change the backend to emit variables in the same TL;DR, we now do this: ```c struct { start1, stop1, start2, stop2, start3, stop3, version; } device; struct host = DtoH(lookup("device")); counters = DtoH(host.stop - host.start) version = DtoH(host.version); ```	2026-03-26 10:17:43 -05:00
Ivan R. Ivanov	19420c0e77	[OpenMP] Fix non-contiguous array omp target update (#156889 ) The existing implementation has three issues which this patch addresses. 1. The last dimension which represents the bytes in the type, has the wrong stride and count. For example, for a 4 byte int, count=1 and stride=4. The correct representation here is count=4 and stride=1 because there are 4 bytes (count=4) that we need to copy and we do not skip any bytes (stride=1). 2. The size of the data copy was computed using the last dimension. However, this is incorrect in cases where some of the final dimensions get merged into one. In this case we need to take the combined size of the merged dimensions, which is (Count * Stride) of the first merged dimension. 3. The Offset into a dimension was computed as a multiple of its Stride. However, this Stride which is in bytes, already includes the stride multiplier given by the user. This means that when the user specified 1:3:2, i.e. elements 1, 3, 5, the runtime incorrectly copied elements 2, 4, 6. Fix this by precomputing at compile time the Offset to be in bytes by correctly multiplying the offset by the stride of the dimension without the user-specified multiplier.	2026-03-26 15:55:31 +01:00
Joseph Huber	82530154ef	[Offload] Enable multilib building for OpenMP/Offload (#188485 ) Summary: Right now the CMake does not follow the pattern other runtime projects use. All this does is use the standard subdir to place libraries in a unique location. This allows, for example, users to configure a debug version of openmp / offload within the same CMake invocation. --------- Co-authored-by: Michael Kruse <github@meinersbur.de>	2026-03-26 07:37:22 -05:00
Alex Duran	64e7c77e04	[OFFLOAD][L0] More error handling (#188496 ) This PR improves cleanup/handling of errors in some memory operations, allocating event pools, ...	2026-03-26 05:50:26 +01:00
fineg74	1dbf7c7e1b	[OFFLOAD] Improve resource management of the plugin (#187597 ) This PR improves event management of the plugin by fixing potential resource leaks and preventing a potential deadlock	2026-03-25 09:50:38 +01:00
Alex Duran	e40062c0bd	[OFFLOAD][L0] Add support to run ctor/dtor code (#187510 ) This PR adds support in the Level Zero plugin to execute constructors/destructors on the device code. As spirv-link has some limitations, it mimics the CUDA plugin behavior where the RTL constructs the device side tables before invoking the kernel that will execute them. The kernel and other necessary symbols to create the device tables are created by the SPIRVCtorDtorLowering pass to be added in #187509	2026-03-25 08:43:44 +01:00
Alex Duran	227bab0a62	[OFFLOAD][L0] Improve cleanup on errors (#188251 ) Additional cleanup improvements on error conditions (in addition to those in #187597): * Fixed incomplete cleanup in L0Context::init() * Fixed build log leak in addModule() * Fixed context inconsistent state in findDevices() Disclaimer: The base of this PR was generated by Claude and adjusted by me afterwards.	2026-03-24 15:36:01 +01:00
Joseph Huber	376874a345	[Offload] Fix destroying signal that was never initialized Summary: We create the RPC doorbell signal lazily and destroy it at the plugin level. This means that we can't rely on the normal 'per-device' handling so this needs to be called unconditionally. We only create the signal if a device is registered, but deinit is called unconditionally. Just check the handle.	2026-03-24 09:29:27 -05:00
Joseph Huber	4961700c10	[libc] Support AMDGPU device interrupts for the RPC interface (#188067 ) Summary: One of the main disadvantages to using the RPC interface is that it requires a server thread to spin on the mailboxes checking for work. The vast majority of the time, there will be no work and work will come in large bursts. The HSA / KFD interface supports device-side interrupts and already has handling for binding these events to an HSA signal. This means that we can send interrupts from the GPU to wake a sleeping thread on the CPU. The sleeping thread will be descheduled with a blocking HSA wait call and woken up when its event ID is raised through the kernel driver's interrupt. This is very target-specific handling, but I believe it is valuable enough to warrant it being in the protocol. It is completely optional, as it is ignored if uninitialized. This should bring this support at parity with the interface HIP expects.	2026-03-24 08:48:52 -05:00
Joseph Huber	07896d44a3	[OpenMP] Emit aggregate kernel prototypes and remove libffi dependency (#186261 ) Summary: This PR changes the handling of the emitted kernels when targeting a CPU to be a pointer struct. The old handling emitted a standard function prototype, this necessitated a target specific ABI to call it because the signature differed with the number of arguments. Instead, this PR emits a void pointer to a naturally aligned struct, this is what APIs like `pthreads` assert. This allows us to remove all the complexity around launching host kernels and just pass the argument list.	2026-03-20 13:08:23 -05:00
Robert Imschweiler	bc6a265e3b	[offload] Use flang-rt for test feature requirements (#187733 )	2026-03-20 18:59:06 +01:00
Robert Imschweiler	c3e7b4556e	[offload] Define flang-rt as an available test feature (#187732 ) Can now be used as `REQUIRES: flang-rt`, for example.	2026-03-20 17:47:51 +01:00
Bruce Changlong Xu	cbab7e65a7	[AMDGPU] Minor cleanups in offload plugin and AMDGPUEmitPrintf. NFC. (#187587 ) Use empty() in assert, brace-init instead of std::make_pair in the AMDGPU offload plugin, and fix a comment typo in AMDGPUEmitPrintf.	2026-03-19 18:16:47 -04:00
Joseph Huber	d18a784d41	[compiler-rt] Define GPU specific handling of profiling functions (#185763 ) Summary: The changes in https://www.github.com/llvm/llvm-project/pull/185552 allowed us to start building the standard `libclang_rt.profile.a` for GPU targets. This PR expands this by adding an optimized GPU routine for counter increment and removing the special-case handling of these functions in the OpenMP runtime. The vast majority of these functions are boilerplate, but we should be able to do more interesting things with this in the future, like value or memory profiling.	2026-03-19 10:51:48 -05:00
estewart08	0e7262407c	[offload] - Remove standalone build in favor of 'runtimes' (#170693 ) Summary: Follow up on removal of OPENMP_STANDALONE_BUILD in openmp (#149878). This build method is redundant and can be accomplished via runtimes. Removes support for: `cmake -S <llvm-project>/offload ...` Switches over to: `cmake -S <llvm-project>/runtimes -DLLVM_ENABLE_RUNTIMES=openmp;offload ...` Libomptarget has a dependency on libomp.so and requires the omp cmake target to exist at build time, which is why both runtimes are listed. Updates cmake compiler logic in offload/CMakeLists.txt to mirror openmp changes: [openmp] Allow testing OpenMP without a full clang build tree (#182470) Users will still need a separate invocation to build the openmp DeviceRTL via: `-DLLVM_ENABLE_RUNTIMES=openmp` `-DLLVM_DEFAULT_TARGET_TRIPLE=<amdgcn-amd-amdhsa|nvptx64-nvidia-cuda>`	2026-03-19 09:00:40 -05:00
Jan Patrick Lehr	7a2193cd19	[Offload] Add CMake alias for CI (#186099 ) In the pre-merge CI we need a top-level visible target that can be used to build offload, i.e., libomptarget and LLVMOffload. The related PR to include offload into pre-merge CI is here: https://github.com/llvm/llvm-project/pull/174955	2026-03-18 15:46:08 +01:00
fineg74	2890f9883c	[OFFLOAD] Improve handling of synchronization errors in L0 plugin and reenable tests (#186927 ) This change improves handling of errors during synchronization in the Level Zero plugin by ensuring cleanup of queues and events in case of a synchronization error. As a result multiple tests stopped hanging. --------- Co-authored-by: Duran, Alex <alejandro.duran@intel.com>	2026-03-18 05:50:06 +01:00
Jan Patrick Lehr	964091a2db	[OpenMP][AMDGPU] Enable omptest build (#161649 ) This enables building the omptest library across the AMD buildbots that rely on this CMake cache.	2026-03-16 15:25:12 +00:00
Joseph Huber	154a128c65	Reapply "[OpenMP] Move OpenMP implicit argument to the end and reformat" (#186309 ) Should be working downstream now This reverts commit 9b61ff210fdff752d5db55b128474e9990258488.	2026-03-13 15:48:37 -05:00
Piotr Balcer	1b9a4a0f72	[Offload][L0] clear completed events from a wait list (#186379 ) Queue's WaitEvent collection wasn't being cleared after synchronization and resetting of the events. This led to hangs on subsequent host synchronizations if not preceded by any other operation.	2026-03-13 13:56:27 +00:00
theRonShark	9b61ff210f	Revert "[OpenMP] Move OpenMP implicit argument to the end and reformat" (#186309 ) Reverts llvm/llvm-project#185989	2026-03-13 05:20:40 +00:00
Kevin Sala Penades	ac71b185c2	[offload] Remove LIBOMPTARGET_SHARED_MEMORY_SIZE envar (#186231 ) This commit removes the `LIBOMPTARGET_SHARED_MEMORY_SIZE` envar and outputs a runtime warning if it is defined. Access to dynamic shared memory should be obtained through the `dyn_groupprivate` clause (OpenMP 6.1) or the launch arguments in liboffload kernel launch.	2026-03-12 21:21:29 -07:00
Joseph Huber	4376fbd793	[OpenMP] Move OpenMP implicit argument to the end and reformat (#185989 ) Summary: We use this `dyn_ptr` argument in Clang/OpenMP to handle the `KernelLaunchEnvironment`. This is a per-kernel argument used to share some information. Currently, it's prepended to the argument list and we generate storage for it in the runtime. This is bad for a few reasons: 1. It changes the ABI by shifting user arguments 2. It cannot trivially be left uninitialized if unused 3. The runtime must allocate its own memory for it This PR changes it to be appended instead. Additionally, space for this is always emitted. This means the OMPIRBuilder itself will provide the storage, we simply need to populate it in the runtime if it is used. This means that if it's unused we don't always pay the cost and it's easier for non-OpenMP users to ignore it. Backward compatibility is maintained by auto-upgrading the kernel arguments. In `libomptarget` we completely allocate a new buffer to store this in the new format. The plugins still need to respect the old ABI of the called device object, so we simply rotate it if it's the old version.	2026-03-12 18:08:22 -05:00
Nick Sarnie	1beec14434	[offload][lit] XFAIL new tests failing on intelgpu (#185908 ) New tests from https://github.com/llvm/llvm-project/pull/176708 and https://github.com/llvm/llvm-project/pull/181987 fail on `intelgpu`, I updated the [GH issue](https://github.com/llvm/llvm-project/issues/182897). Example fails [here](https://lab.llvm.org/buildbot/#/builders/225/builds/3441). Signed-off-by: Nick Sarnie <nick.sarnie@intel.com>	2026-03-12 14:37:56 +00:00
Jan Patrick Lehr	ee797883e8	[Offload] Escape \; in command string (#186120 ) This adds a \ in front of the ; between the two cache files to stop the run function from interpreting it as a shell statement separator (or so).	2026-03-12 15:02:40 +01:00
Jan Patrick Lehr	bb72ec480f	[Offload] AMD Flang bot to use CMake cache file (#186070 ) Converting the current bot config to use the CMake cache file that we use in other bots (offload/cmake/caches/AMDGPUBot.cmake). This PR removes all CMake settings that the cache file already sets and only leaves those that were either not set explicitly or which differ. Thus, first load the cache file and then adjust the settings to override existing values.	2026-03-12 14:10:04 +01:00
Kevin Sala Penades	1f583c6dee	[OpenMP][Offload] Add offload runtime support for dyn_groupprivate clause (#152831 ) Part 3 adding offload runtime support. See https://github.com/llvm/llvm-project/pull/152651. --------- Co-authored-by: Krzysztof Parzyszek <Krzysztof.Parzyszek@amd.com>	2026-03-12 01:13:06 -07:00
Łukasz Plewa	0e122bea82	[OFFLOAD] Enable Level Zero unittests (#185492 )	2026-03-11 14:09:59 +00:00
Amit Tiwari	a15dcd4117	[Clang][OpenMP] Handled `NonContig` Descriptor `DimCount` (#181987 ) ### Issue: Dimension override missing When variable count expressions were used with stride, the constant subsection path computed size first. This marked `ArgSizes` with byte size semantics. Variable expression logic later triggered, but reused `ArgSizes` assuming "bytes" semantics. `OMPIRBuilder.cpp` didn't handle dimension count for the `OMP_MAP_NON_CONTIG` flag. Result: `ArgSizes` wasn't overwritten with dimension count, breaking non-contiguous mapping. Fixes: `llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp` - Expression semantics for non-contiguous stride/count. Generate 3D descriptor structures with runtime dimensions. Fix dimension override to use dimension count instead of byte size. Added test cases to cover stack arrays, heap pointers, struct members, etc.	2026-03-11 19:39:55 +05:30
Alex Duran	789fea83bb	[offload][l0][nfc] remove duplicated entry (#185855 ) Remove left over function by mistake from #185404	2026-03-11 11:55:30 +01:00
Alex Duran	3ff332ad0f	[Offload][L0] Add support for OffloadBinary format in L0 plugin (#185404 ) - Accept OffloadBinaries as valid images by plugins that support them in the PluginInterface. - Add support in L0 plugin to extract SPIRV images and their associated metadata from an OffloadBinary image. Depends on: - #185663 Follow-up PRs: - #185413 (Changes SPIRV wrapper generation to use OffloadBinary) - #185425 (Adjusts llvm-objdump) - #184774 (Adjusts llvm-offload-binary)	2026-03-11 11:42:36 +01:00
Joseph Huber	fd069a46bf	[compiler-rt] Initial support for building profile library on the GPU (#185552 ) Summary: As suggested in https://github.com/llvm/llvm-project/pull/177665, we should build a GPU version of the compiler-rt profile library instead of writing it in-line in the lowering. This PR does not define anything GPU specific, it simply re-uses the baremetal handling. Later PRs will prevent the GPU specific handling we would want to do to optimize counter handling on the GPU. Note that this will require using the cache file, or setting these options manually for existing users. Hopefully if people are using the cache file as they should it won't break anything.	2026-03-10 13:45:18 -05:00
Alex Duran	be021b8433	[OFFLOAD] Add interface to extend image validation (#185663 ) As discussed in #185404 we might want to provide a way for plugins to validate images not recognized by the common layer. This PR adds such extension and uses it to validate pure SPIRV images by the Level Zero plugin.	2026-03-10 18:41:23 +01:00
Amit Tiwari	14de1bb711	[Clang][OpenMP] Support expression semantics in target update fields with non-contiguous array sections (#176708 ) ### Issue: Variable stride not recognized as non-contiguous `CGOpenMPRuntime.cpp` failed to detect `DeclRefExpr`, `MemberExpr`, `ArraySubscriptExpr` as non-contiguous. Fixes: `clang/lib/CodeGen/CGOpenMPRuntime.cpp` - Variable stride detection + dimension count logic. Detect variable stride expressions (`DeclRefExpr/MemberExpr/ArraySubscriptExpr`) as non-contiguous. Added test cases to cover stack arrays, heap pointers, struct members, etc., for expression semantics in non-contiguous update.	2026-03-10 19:26:17 +05:30
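
Note on 19420c0e77 (#156889): the offset issue described in that commit (item 3) can be illustrated with a minimal Python sketch. This is not the runtime's code. The function name `copied_elements` and its parameters are hypothetical, and the section `1:3:2` is read here as start 1, three elements, step 2, on 4-byte elements. The byte stride of the dimension already includes the user-specified step, so multiplying the start index by it skips too far.

```python
def copied_elements(start, count, step, elem_size, buggy=False):
    """Return the element indices a strided copy would touch.

    Hypothetical model of one dimension of a non-contiguous descriptor.
    """
    # Stride in bytes, already including the user-specified step.
    stride_bytes = step * elem_size
    if buggy:
        # Old behaviour described in the commit: offset as a multiple of the stride.
        offset_bytes = start * stride_bytes
    else:
        # Fixed behaviour: offset uses the stride without the user multiplier.
        offset_bytes = start * elem_size
    return [(offset_bytes + i * stride_bytes) // elem_size for i in range(count)]


print(copied_elements(1, 3, 2, 4))              # [1, 3, 5]
print(copied_elements(1, 3, 2, 4, buggy=True))  # [2, 4, 6]
```

Under these assumptions the fixed computation selects elements 1, 3, 5, while the old one selects 2, 4, 6, matching the behaviour reported in the commit message.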


685 Commits