llvm-project

Author	SHA1	Message	Date
Joseph Huber	2f00977fea	[Offload] Make the RPC callbacks private to each running server (#178901 ) Summary: The static object mixes callbacks from different plugins because ever since we moved to the object library target these are actually shared. Just make it a member of the base class and make it a pointer set just to do some basic deduplication.	2026-02-06 08:28:57 -06:00
Alex Duran	4096cb6017	[OFFLOAD] Fix TARGET_NAME in plugins common code (#180151 ) Unlike other names, it is set between quotes, which prevents our debug macros from matching it properly.	2026-02-06 14:12:04 +01:00
Joseph Huber	1a86c146ae	[Offload] Add a function to register an RPC Server callback (#178774 ) Summary: We provide an RPC server to manage calls initiated by the device to run on the host. This is very useful for the built-in handling we have, however there are cases where we would want to extend this functionality. Cases like Fortran or MPI would be useful, but we cannot put references to these in the core offloading runtime. This way, we can provide this as a library interface that registers custom handlers for whatever code people want.	2026-01-30 08:03:13 -06:00
Hansang Bae	85d64d1201	[Offload] Cast to `void *` in the debug message (#177019 ) There are a few places where data types based on character array or string are printed in the debug message while they do not represent strings. Such expressions should be cast to `void *` unless they represent actual strings. Change also includes casting from integral type to pointer type when appropriate.	2026-01-20 15:44:08 -06:00
fineg74	848d736e64	[OFFLOAD] Add asynchronous queue query API for libomptarget migration (#172231 ) This PR adds the liboffload asynchronous queue query API that is needed to make libomptarget use liboffload.	2026-01-20 10:53:32 -08:00
Hansang Bae	edd857aad8	[Offload] Remove unnecessary `maybe_unused` attribute (#175855 ) The attribute is not necessary in the new debug messaging.	2026-01-15 14:31:58 -06:00
Hansang Bae	90b6d33755	[Offload] Small debug message fix in Level Zero plugin (#175958 ) Do not include trailing zeros in the device name.	2026-01-14 09:42:19 -06:00
Alex Duran	efad3563ea	[OFFLOAD] Update CUDA and AMD plugins to new debug format (#175787 )	2026-01-13 17:53:59 +01:00
Alex Duran	86e114a9b2	Revert "[OFFLOAD] Update CUDA and AMD plugins to new debug format" (#175786 ) Reverts llvm/llvm-project#175757	2026-01-13 17:13:46 +01:00
Alex Duran	7c2f49373b	[OFFLOAD] Update CUDA and AMD plugins to new debug format (#175757 ) This should be the last step before completely removing the DP macro.	2026-01-13 17:06:35 +01:00
Hansang Bae	13cd7003ad	[NFC][Offload] Rename a function (#175673 ) Renamed a function as suggested in #175664.	2026-01-12 19:40:17 -06:00
Hansang Bae	496729fe7e	[Offload] Fix level_zero plugin build (#175664 ) Build has been broken when OMPTARGET_DEBUG is undefined.	2026-01-12 16:53:23 -06:00
Hansang Bae	dae3b49cba	[Offload] Update debug message printing in the plugins (#175205 ) * Prepare a set of debug types in llvm::offload::debug to be used in plugin code * Update debug messages in the plugins	2026-01-12 14:26:43 -06:00
fineg74	1232599032	[OFFLOAD] Add memory data locking API for libomptarget migration (#173138 ) This PR adds the liboffload memory data locking API that is needed to make libomptarget use liboffload.	2026-01-12 13:07:57 -06:00
Alex Duran	dbd52bd558	[OFFLOAD][OpenMP] Remove old style REPORT support (#175607 ) Fix the few remaining usages and remove the support for the old REPORT macro.	2026-01-12 19:48:40 +01:00
Joseph Huber	c722ef4874	[OpenMP] Remove testing LTO variant on CPU targets (#175187 ) Summary: This is only really meaningful for the NVPTX target. Not all build environments support host LTO and these are redundant tests, just clean this up and make it run faster.	2026-01-09 10:13:44 -06:00
fineg74	583ce49a40	[OFFLOAD] Make L0 provide more information about device to be consistent with other plugins (#172946 ) Update information about devices provided by level zero plugin in order to be more consistent with other plugins.	2026-01-08 22:10:44 +00:00
Alex Duran	280e609d4e	[OFFLOAD][L0] Expose native ELF to upper layers (#172819 ) This PR refactors how the device image is built so we can expose the native ELF of the device to DeviceImageTy, which solves several issues regarding symbol lookup (as DeviceImageTy expects an ELF). It also simplifies the module linking code, taking into account the latest changes in the driver (which adds "-library-compilation" when necessary). --------- Co-authored-by: Alexey Sachkov <alexey.sachkov@intel.com> Co-authored-by: Nick Sarnie <nick.sarnie@intel.com> Co-authored-by: Joseph Huber <huberjn@outlook.com>	2025-12-18 18:03:12 +00:00
Alex Duran	5559918321	[OFFLOAD][L0] Improve symbol device lookup (#172820 ) When looking for the device address of a symbol, we need to also look if it's a function symbol if not found as global symbol in the device. --------- Co-authored-by: Alexey Sachkov <alexey.sachkov@intel.com> Co-authored-by: Nick Sarnie <nick.sarnie@intel.com> Co-authored-by: Joseph Huber <huberjn@outlook.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2025-12-18 15:31:20 +00:00
Alex Duran	3ac0ff2f36	[OFFLOAD][L0] Fix usages of getDebugLevel in L0 plugin (#172815 ) Support for getDebugLevel was removed as part of the new debug macros (#165416). This PR updates such usages to use the new ODBG_* macros. --------- Co-authored-by: Alexey Sachkov <alexey.sachkov@intel.com> Co-authored-by: Nick Sarnie <nick.sarnie@intel.com> Co-authored-by: Joseph Huber <huberjn@outlook.com>	2025-12-18 15:30:59 +00:00
Alex Duran	f125c8db5c	[OFFLOAD] Add plugin with support for Intel oneAPI Level Zero (#158900 ) Add a new nextgen plugin that supports GPU devices through the Intel oneAPI Level Zero library. The plugin is not enabled by default and needs to be added to LIBOMPTARGET_PLUGINS_TO_BUILD explicitly. --------- Co-authored-by: Alexey Sachkov <alexey.sachkov@intel.com> Co-authored-by: Nick Sarnie <nick.sarnie@intel.com> Co-authored-by: Joseph Huber <huberjn@outlook.com>	2025-12-18 08:53:03 +01:00
Hansang Bae	ecb94bcfe2	[Offload] Debug message update part 3 (#171684 ) Update debug messages based on the new method from #170425. Updated the following files. - plugins-nextgen/common/include/MemoryManager.h - plugins-nextgen/common/include/PluginInterface.h - plugins-nextgen/common/src/GlobalHandler.cpp - plugins-nextgen/common/src/PluginInterface.cpp - plugins-nextgen/host/dynamic_ffi/ffi.cpp	2025-12-17 09:05:16 -06:00
Kevin Sala Penades	35315a84b4	[offload] Fix CUDA args size by subtracting tail padding (#172249 ) This commit makes the cuLaunchKernel call to pass the total arguments size without tail padding.	2025-12-14 21:57:25 -08:00
Alex Duran	66ddc9b3e7	[OFFLOAD] Add support for more fine grained debug messages control (#165416 ) This PR introduces new debug macros that allow finer control over which debug messages to output and introduce C++ stream style for debug messages. Changing existing messages (except a few that I changed for testing) will come in subsequent PRs. I also think that we should make debug enabling OpenMP agnostic but, for now, I prioritized maintaining the current libomptarget behavior, and we might need more changes further down the line as we decouple libomptarget.	2025-11-20 18:39:56 +01:00
Joseph Huber	eea62159e8	[Offload] Make the RPC thread sleep briefly when idle (#168596 ) Summary: We start this thread if the RPC client symbol is detected in the loaded binary. We should make this sleep if there's no work to avoid the thread running at high priority when the (scarcely used) RPC call is actually required. So, right now after 25 microseconds we will assume the server is inactive and begin sleeping. This resets once we do find work. AMD supports a more intelligent way to do this. HSA signals can wake a sleeping thread from the kernel, and signals can be sent from the GPU side. This would be nice to have and I'm planning on working with it in the future to make this infrastructure more usable with existing AMD workloads.	2025-11-19 15:56:25 -06:00
Kevin Sala Penades	1a86f0aae7	[Offload] Add device info for shared memory (#167817 )	2025-11-13 11:00:12 -08:00
Joseph Huber	aaddd8d38a	[OpenMP] Fix tests relying on the heap size variable Summary: I made that an unimplemented error, but forgot that it was used for this environment variable.	2025-11-06 13:00:26 -06:00
Joseph Huber	670c453aeb	[Offload] Remove handling for device memory pool (#163629 ) Summary: This was a lot of code that was only used for upstream LLVM builds of AMDGPU offloading. We have a generic and fast `malloc` in `libc` now so just use that. Simplifies code, can be added back if we start providing alternate forms but I don't think there's a single use-case that would justify it yet.	2025-11-06 10:15:18 -06:00
Robert Imschweiler	dc94f2cbad	[Offload] Add device UID (#164391 ) Introduced in OpenMP 6.0, the device UID shall be a unique identifier of a device on a given system. (Not necessarily a UUID.) Since it is not guaranteed that the (U)UIDs defined by the device vendor libraries, such as HSA, do not overlap with those of other vendors, the device UIDs in offload are always combined with the offload plugin name. In case the vendor library does not specify any device UID for a given device, we fall back to the offload-internal device ID. The device UID can be retrieved using the `llvm-offload-device-info` tool.	2025-11-04 20:15:47 +01:00
Nicole Aschenbrenner	16641ad8a2	[OpenMP] Adds omp_target_is_accessible routine (#138294 ) Adds omp_target_is_accessible routine. Refactors common code from omp_target_is_present to work for both routines. --------- Co-authored-by: Shilei Tian <i@tianshilei.me>	2025-10-22 17:35:16 +02:00
Ross Brunton	186182bb64	[Offload] Use `amd_signal_async_handler` for host function calls (#154131 )	2025-10-21 13:08:30 +01:00
Alex Duran	45757b9284	[OFFLOAD] Remove unused init_device_info plugin interface (#162650 ) This was used for the old interop code. It's dead code after #143491	2025-10-09 08:38:24 -05:00
Joseph Huber	8763812b4c	[Offload] Remove check on kernel argument sizes (#162121 ) Summary: This check is unnecessarily restrictive and currently incorrectly fires for any size less than eight bytes. Just remove it, we do sanity checks elsewhere and at some point need to trust the ABI.	2025-10-06 12:49:44 -05:00
Alex Duran	902fe02e87	[OFFLOAD] Restore interop functionality (#161429 ) This implements two pieces to restore the interop functionality (that I broke) when the 6.0 interfaces were added: * A set of wrappers that support the old interfaces on top of the new ones * The same level of interop support for the CUDA and AMD plugins	2025-10-02 21:48:31 +02:00
Kevin Sala Penades	01d761a776	[Offload] Use Error for allocating/deallocating in plugins (#160811 ) Co-authored-by: Joseph Huber <huberjn@outlook.com>	2025-09-26 13:50:00 -05:00
Ross Brunton	e60a5733f0	[Offload] Print Image location rather than casting it (#160309 ) This squishes a warning where the runtime tries to bind a StringRef to a `%p`.	2025-09-24 10:57:55 +01:00
Alexey Sachkov	bb584644e9	[Offload][NFC] Avoid temporary string copies in InfoTreeNode (#159372 )	2025-09-23 12:21:57 -05:00
Tobias Stadler	dfbd76bda0	[Remarks] Restructure bitstream remarks to be fully standalone (#156715 ) Currently there are two serialization modes for bitstream Remarks: standalone and separate. The separate mode splits remark metadata (e.g. the string table) from actual remark data. The metadata is written into the object file by the AsmPrinter, while the remark data is stored in a separate remarks file. This means we can't use bitstream remarks with tools like opt that don't generate an object file. Also, it is confusing to post-process bitstream remarks files, because only the standalone files can be read by llvm-remarkutil. We always need to use dsymutil to convert the separate files to standalone files, which only works for MachO. It is not possible for clang/opt to directly emit bitstream remark files in standalone mode, because the string table can only be serialized after all remarks were emitted. Therefore, this change completely removes the separate serialization mode. Instead, the remark string table is now always written to the end of the remarks file. This requires us to tell the serializer when to finalize remark serialization. This automatically happens when the serializer goes out of scope. However, often the remark file goes out of scope before the serializer is destroyed. To diagnose this, I have added an assert to alert users that they need to explicitly call finalizeLLVMOptimizationRemarks. This change paves the way for further improvements to the remark infrastructure, including more tooling (e.g. #159784), size optimizations for bitstream remarks, and more. Pull Request: https://github.com/llvm/llvm-project/pull/156715	2025-09-22 16:41:39 +01:00
Joseph Huber	23efc67e19	[Offload] Remove non-blocking allocation type (#159851 ) Summary: This was originally added in as a hack to work around CUDA's limitation on allocation. The `libc` implementation now isn't even used for CUDA so this code is never hit. Even in this case, this code never truly worked. A true solution would be to use CUDA's virtual memory API instead to allocate 2MiB slabs independently from the normal memory management done in the stream.	2025-09-20 09:07:14 -05:00
Joseph Huber	580860e8b7	[OpenMP][NFC] Clean up a bunch of warnings and clang-tidy messages (#159831 ) Summary: I made the GPU flags accept more of the default LLVM warnings, which triggered some new cases. Clean those up and fix some other ones while I'm at it.	2025-09-19 14:09:33 -05:00
Joseph Huber	51e3c3d51b	[Offload] Implement 'olIsValidBinary' in offload and clean up (#159658 ) Summary: This exposes the 'isDeviceCompatible' routine for checking if a binary can be loaded. This is useful if people don't want to consume errors everywhere when figuring out which image to put to what device. I don't know if this is a good name, I was thinking like `olIsCompatible` or whatever. Let me know what you think. Long term I'd like to be able to do something similar to what OpenMP does where we can conditionally only initialize devices if we need them. That's going to be support needed if we want this to be more generic.	2025-09-19 12:15:57 -05:00
Joseph Huber	dffd7f3d9a	[LLVM] Fix offload and update CUDA ABI for all SM values (#159354 ) Summary: Turns out the new CUDA ABI now applies retroactively to all the other SMs if you upgrade to CUDA 13.0. This patch changes the scheme, keeping all the SM flags consistent but using an offset. Fixes: https://github.com/llvm/llvm-project/issues/159088	2025-09-17 14:39:39 -05:00
Nick Sarnie	f74583fbe8	[offload] Fix build with debug libomptarget (#159144 ) Currently get this error ``` offload/plugins-nextgen/common/src/PluginInterface.cpp:859:63: error: member reference type 'StringRef' is not a pointer; did you mean to use '.'? ``` We pass the full image binary now so we can't really print anything useful here. Seems introduced in https://github.com/llvm/llvm-project/pull/158748. --------- Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com> Co-authored-by: Joseph Huber <huberjn@outlook.com>	2025-09-16 18:40:02 +00:00
Joseph Huber	e7101dac9c	[Offload] Copy loaded images into managed storage (#158748 ) Summary: Currently we have this `__tgt_device_image` indirection which just takes a reference to some pointers. This was all fine and good when the only usage of this was from a section of GPU code that came from an ELF constant section. However, we have expanded beyond that and now need to worry about managing lifetimes. We have code that references the image even after it was loaded internally. This patch changes the implementation to instead copy the memory buffer and manage it locally. This PR reworks the JIT and other image handling to directly manage its own memory. We now don't need to duplicate this behavior externally at the Offload API level. Also we actually free these if the user unloads them. Upside, less likely to crash and burn. Downside, more latency when loading an image.	2025-09-16 08:57:28 -05:00
Joseph Huber	5d550bf41c	[OpenMP] Move `__omp_rtl_data_environment' handling to OpenMP (#157182 ) Summary: This operation is done every time we load a binary, this behavior should be moved into OpenMP since it concerns an OpenMP specific data struct. This is a little messy, because ideally we should only be using public APIs, but more can be extracted later.	2025-09-08 09:58:38 -05:00
Ross Brunton	32beea0605	[OpenMP][Offload] Mark `SPMD_NO_LOOP` as a valid exec mode (#155990 ) This was added in #154105 , but was not added to the plugin interface's list of valid modes.	2025-09-01 11:27:24 +01:00
Ross Brunton	ffb756dff2	[Offload] Add `OL_DEVICE_INFO_MAX_WORK_SIZE[_PER_DIMENSION]` (#155823 ) This is the total number of work items that the device supports (the equivalent work group properties are for only a single work group).	2025-08-29 09:39:18 +01:00
Ross Brunton	9e5d8bd3d1	[Offload] Improve `olDestroyQueue` logic (#153041 ) Previously, `olDestroyQueue` would not actually destroy the queue, instead leaving it for the device to clean up when it was destroyed. Now, the queue is either released immediately if it is complete or put into a list of "pending" queues if it is not. Whenever we create a new queue, we check this list to see if any are now completed. If there are any we release their resources and use them instead of pulling from the pool. This prevents long running programs that create and drop many queues without syncing them from leaking memory all over the place.	2025-08-29 09:39:00 +01:00
Ross Brunton	41fed2d048	[Offload] Add PRODUCT_NAME device info (#155632 ) On my system, this will be "Radeon RX 7900 GRE" rather than "gfx1100". For Nvidia, the product name and device name are identical.	2025-08-28 15:16:17 +01:00
Dominik Adamski	87db8e9130	[OpenMP][Offload] Add SPMD-No-Loop mode to OpenMP offload runtime (#154105 ) Kernels which are marked as SPMD-No-Loop should be launched with sufficient number of teams and threads to cover loop iteration space. No-Loop mode is described in RFC: https://discourse.llvm.org/t/rfc-no-loop-mode-for-openmp-gpu-kernels/87517/	2025-08-28 09:19:14 +02:00

196 Commits