llvm-project

Author	SHA1	Message	Date
Ross Brunton	ae44418f28	[Offload] Erase entries from JIT cache when program is destroyed (#148847 ) When `unloadBinary` is called, any entries in the JITEngine's cache for that binary will be cleared. This fixes a nasty issue with liboffload program handles. If two handles happen to have had the same address (after one was free'd, for example), the cache would be hit and return the wrong program.	2025-07-25 16:11:30 +01:00
Joseph Huber	b53be5f4b2	[LLVM] Update CUDA ELF flags for their new ABI (#149534 ) Summary: We rely on these flags to do things in the runtime and print the contents of binaries correctly. CUDA updated their ABI encoding recently and we didn't handle that. it's a new ABI entirely so we just select on it when it shows up. Fixes: https://github.com/llvm/llvm-project/issues/148703	2025-07-21 14:38:03 -05:00
Ross Brunton	4f02965ae2	[Offload] Store kernel name in GenericKernelTy (#142799 ) GenericKernelTy has a pointer to the name that was used to create it. However, the name passed in as an argument may not outlive the kernel. Instead, GenericKernelTy now contains a std::string, and copies the name into there.	2025-07-02 14:11:05 +01:00
Ross Brunton	0870c8838b	[Offload] Add an `unloadBinary` interface to PluginInterface (#143873 ) This allows removal of a specific Image from a Device, rather than requiring all image data to outlive the device they were created for. This is required for `ol_program_handle_t`s, which now specify the lifetime of the buffer used to create the program.	2025-06-25 14:53:18 +01:00
Ross Brunton	4359e55838	[Offload] Properly report errors when jit compiling (#145498 ) Previously, if a binary failed to load due to failures when jit compiling, the function would return success with nullptr. Now it returns a new plugin error, `COMPILE_FAILURE`.	2025-06-24 16:27:12 +01:00
Ross Brunton	e6a3579653	[Offload] Replace device info queue with a tree (#144050 ) Previously, device info was returned as a queue with each element having a "Level" field indicating its nesting level. This replaces this queue with a more traditional tree-like structure. This should not result in a change to the output of `llvm-offload-device-info`.	2025-06-13 09:22:47 -05:00
Ethan Luis McDonough	67ff66e677	[PGO][Offload] Fix offload coverage mapping (#143490 ) This pull request fixes coverage mapping on GPU targets. - It adds an address space cast to the coverage mapping generation pass. - It reads the profiled function names from the ELF directly. Reading it from public globals was causing issues in cases where multiple device-code object files are linked together.	2025-06-10 20:19:38 -05:00
Callum Fare	b78bc35d16	[Offload] Don't check in generated files (#141982 ) Previously we decided to check in files that we generate with tablegen. The justification at the time was that it helped reviewers unfamiliar with `offload-tblgen` see the actual changes to the headers in PRs. After trying it for a while, it's ended up causing some headaches and is also not how tablegen is used elsewhere in LLVM. This changes our use of tablegen to be more conventional. Where possible, files are still clang-formatted, but this is no longer a hard requirement. Because `OffloadErrcodes.inc` is shared with libomptarget it now gets generated in a more appropriate place.	2025-06-03 10:39:04 -05:00
Joseph Huber	eb9ed93fce	[Offload] Optimistically accept SM architectures (#142399 ) Summary: We try to clamp these to ones known to work, but we should probably just optimistically accept these. I'd prefer to update the flag check, but since NVIDIA refuses to publish their ELF format it's too much effort to reverse engineer. Fixes: https://github.com/llvm/llvm-project/issues/138532	2025-06-02 14:32:05 -05:00
Ross Brunton	050892d2f8	[Offload] Use new error code handling mechanism and lower-case messages (#139275 ) [Offload] Use new error code handling mechanism This removes the old ErrorCode-less error method and requires every user to provide a concrete error code. All calls have been updated. In addition, for consistency with error messages elsewhere in LLVM, all messages have been made to start lower case.	2025-05-20 08:50:20 -05:00
Ross Brunton	1532ee6916	[Offload] Add Error Codes to PluginInterface (#138258 ) A new ErrorCode enumeration is present in PluginInterface which can be used when returning an llvm::Error from offload and PluginInterface functions. This enum must be kept up to sync with liboffload's ol_errc_t enum, so both are automatically generated from liboffload's enum definition. Some error codes have also been shuffled around to allow for future work. Note that this patch only adds the machinery; actual error codes will be added in a future patch. ~~Depends on #137339 , please ignore first commit of this MR.~~ This has been merged.	2025-05-19 09:38:34 -05:00
Joseph Huber	92bba68634	[Offload] Fix handling of 'bare' mode when environment missing (#136794 ) Summary: We treated the missing kernel environment as a unique mode, but it was kind of this random bool that was doing the same thing and it explicitly expects the kernel environment to be zero. It broke after the previous change since it used to default to SPMD and didn't handle zero in any of the other cases despite being used. This fixes that and queries for it without needing to consume an error.	2025-04-23 08:16:39 -05:00
Joseph Huber	25bf4e262c	[Offload] Remove handling for COV4 binaries from offload/ (#131033 ) Summary: We moved from cov4 to cov5 a long time ago, and it guards simplifying some front end code, so we should be able to move up with this.	2025-03-24 18:58:20 -05:00
Ethan Luis McDonough	c50d39f073	[PGO][Offload] Allow PGO flags to be used on GPU targets (#94268 ) This pull request is the third part of an ongoing effort to extends PGO instrumentation to GPU device code and depends on https://github.com/llvm/llvm-project/pull/93365. This PR makes the following changes: - Allows PGO flags to be supplied to GPU targets - Pulls version global from device - Modifies `__llvm_write_custom_profile` and `lprofWriteDataImpl` to allow the PGO version to be overridden	2025-03-19 19:01:38 -05:00
Joseph Huber	8437b7f558	[libc] Make RPC server handling header only (#131205 ) Summary: This patch moves the RPC server handling to be a header only utility stored in the `shared/` directory. This is intended to be shared within LLVM for the loaders and `offload/` handling. Generally, this makes it easier to share code without weird cross-project binaries being plucked out of the build system. It also allows us to soon move the loader interface out of the `libc` project so that we don't need to bootstrap those and can build them in LLVM.	2025-03-13 19:23:21 -05:00
Nikita Popov	f137c3d592	[TargetRegistry] Accept Triple in createTargetMachine() (NFC) (#130940 ) This avoids doing a Triple -> std::string -> Triple round trip in lots of places, now that the Module stores a Triple.	2025-03-12 17:35:09 +01:00
Nikita Popov	4f469ae046	[offload] Fix build after Module::getTargetTriple() change Adjust for #129868.	2025-03-06 11:04:00 +01:00
Nikita Popov	979c275097	[IR] Store Triple in Module (NFC) (#129868 ) The module currently stores the target triple as a string. This means that any code that wants to actually use the triple first has to instantiate a Triple, which is somewhat expensive. The change in #121652 caused a moderate compile-time regression due to this. While it would be easy enough to work around, I think that architecturally, it makes more sense to store the parsed Triple in the module, so that it can always be directly queried. For this change, I've opted not to add any magic conversions between std::string and Triple for backwards-compatibilty purses, and instead write out needed Triple()s or str()s explicitly. This is because I think a decent number of them should be changed to work on Triple as well, to avoid unnecessary conversions back and forth. The only interesting part in this patch is that the default triple is Triple("") instead of Triple() to preserve existing behavior. The former defaults to using the ELF object format instead of unknown object format. We should fix that as well.	2025-03-06 10:27:47 +01:00
Ethan Luis McDonough	9e5c136d5a	[PGO][Offload] Profile profraw generation for GPU instrumentation #76587 (#93365 ) This pull request is the second part of an ongoing effort to extends PGO instrumentation to GPU device code and depends on #76587. This PR makes the following changes: - Introduces `__llvm_write_custom_profile` to PGO compiler-rt library. This is an external function that can be used to write profiles with custom data to target-specific files. - Adds `__llvm_write_custom_profile` as weak symbol to libomptarget so that it can write the collected data to a profraw file. - Adds `PGODump` debug flag and only displays dump when the aforementioned flag is set	2025-02-11 23:30:54 -06:00
Joseph Huber	baf7a3c1e5	[Offload] Properly guard modifications to the RPC device array (#126790 ) Summary: If the user deallocates an RPC device this can sometimes fail if the RPC server is still running. This will happen if the modification happens while the server is still checking it. This patch adds a mutex to guard modifications to it.	2025-02-11 14:57:31 -06:00
Joseph Huber	5812d0bf8e	[Offload] Make only a single thread handle the RPC server thread (#126067 ) Summary: This patch just changes the interface to make starting the thread multiple times permissable since it will only be done the first time. Note that this does not refcount it or anything, so it's onto the user to make sure that they don't shut down the thread before everyone is done using it. That is the case today because the shutDown portion is run by a single thread in the destructor phase. Another question is if we should make this thread truly global state, because currently it will be private to each plugin instance, so if you have an AMD and NVIDIA image there will be two, similarly if you have those inside of a shared library.	2025-02-06 11:38:14 -06:00
Joseph Huber	7a8779422d	[Offload] Stop the RPC server faiilng with more than one GPU (#125982 ) Summary: Pretty dumb mistake of me, forgot that this is run per-device and per-plugin, which fell through the cracks with my testing because I have two GPUs that use different plugins.	2025-02-05 20:51:28 -06:00
Joseph Huber	a284a6ed17	[OpenMP] Guard OpenMP specific entry handling	2025-02-03 16:16:18 -06:00
Christian Clauss	1f56bb3137	[Offload][NFC] Fix typos discovered by codespell (#125119 ) https://github.com/codespell-project/codespell % `codespell --ignore-words-list=archtype,hsa,identty,inout,iself,nd,te,ths,vertexes --write-changes`	2025-01-31 09:35:29 -06:00
Joseph Huber	38b3f45a81	[Offload] Fix offload-info interface Summary: The offload info tool doesn't initialize things properly, just check this first instead.	2025-01-27 10:36:09 -06:00
Joseph Huber	f07505849c	[Offload] Fix server thread from being shut down if unused	2025-01-27 08:29:41 -06:00
Joseph Huber	e7592d83e0	[Offload][NFC] Make sure the thread is not running already	2025-01-27 08:06:29 -06:00
Joseph Huber	134401deea	[Offload] Move RPC server handling to a dedicated thread (#112988 ) Summary: Handling the RPC server requires running through list of jobs that the device has requested to be done. Currently this is handled by the thread that does the waiting for the kernel to finish. However, this is not sound on NVIDIA architectures and only works for async launches in the OpenMP model that uses helper threads. However, we also don't want to have this thread doing work unnnecessarily. For this reason we track the execution of kernels and cause the thread to sleep via a condition variable (usually backed by some kind of futex or other intelligent sleeping mechanism) so that the thread will be idle while no kernels are running.	2025-01-24 11:36:45 -06:00
Joseph Huber	6518b121f0	[Offload][NFC] Factor out and rename the `__tgt_offload_entry` struct (#123785 ) Summary: This patch is an NFC renaming to make using the offloading entry type more portable between other targets. Right now this is just moving its definition to LLVM so others can use it. Future work will rework the struct layout.	2025-01-21 12:05:24 -06:00
Jinsong Ji	8d1d67ec4d	[Offload][PGO] Fix dump of array in ProfData (#122039 ) Exposed by -Warray-bounds: In file included from ../../../../../../../llvm/offload/plugins-nextgen/common/src/GlobalHandler.cpp:252: ../../../../../../../llvm/llvm/include/llvm/ProfileData/InstrProfData.inc:109:1: error: array index 4 is past the end of the array (that has type 'const std::remove_const<const uint16_t>::type[4]' (aka 'const unsigned short[4]')) [-Werror,-Warray-bounds] 109 \| INSTR_PROF_DATA(const uint16_t, Int16ArrayTy, NumValueSites[IPVK_Last+1], \ \| ^ ~~~~~~~~~~~ ../../../../../../../llvm/offload/plugins-nextgen/common/src/GlobalHandler.cpp:250:15: note: expanded from macro 'INSTR_PROF_DATA' 250 \| outs() << ProfData.Name << " "; \ \| ^ ~~~~ ../../../../../../../llvm/llvm/include/llvm/ProfileData/InstrProfData.inc:109:1: note: array 'NumValueSites' declared here 109 \| INSTR_PROF_DATA(const uint16_t, Int16ArrayTy, NumValueSites[IPVK_Last+1], \ \| ^ ../../../../../../../llvm/offload/plugins-nextgen/common/include/GlobalHandler.h:62:3: note: expanded from macro 'INSTR_PROF_DATA' 62 \| std::remove_const<Type>::type Name; Avoid accessing out-of-bound data, but skip printing array data for now. As there is no simple way to do this without hardcoding the NumValueSites field. --------- Co-authored-by: Ethan Luis McDonough <ethanluismcdonough@gmail.com>	2025-01-14 15:46:27 -05:00
wanglei	bdf727065b	[Offload] Add support for loongarch64 to host plugin This adds support for the loongarch64 architecture to the offload host plugin. Similar to #115773 To fix some test issues, I've had to add the LoongArch64 target to: - CompilerInvocation::ParseLangArgs - linkDevice in ClangLinuxWrapper.cpp - OMPContext::OMPContext (to set the device_kind_cpu trait) Reviewed By: jhuber6 Pull Request: https://github.com/llvm/llvm-project/pull/120173	2024-12-17 19:06:10 +08:00
Jinsong Ji	e85a9f5540	libc: Prefix RPC Status code to avoid conflict in windows build (#119991 ) Somehow conflict with define in wingdi.h. Fix build failures: [ 52%] Building CXX object projects/offload/plugins-nextgen/common/CMakeFiles/PluginCommon.dir/src/RPC.cpp.obj In file included from ...llvm\offload\plugins-nextgen\common\src\RPC.cpp:16: ...\llvm\libc\shared\rpc.h(48,3): error: expected identifier 48 \| ERROR = 0x1000, \| ^ c:\Program files (x86)\Windows Kits\10\include\10.0.22000.0\um\wingdi.h(118,29): note: expanded from macro 'ERROR' 118 \| #define ERROR 0 \| ^ ...\llvm\offload\plugins-nextgen\common\src\RPC.cpp(75,17): error: expected unqualified-id 75 \| return rpc::ERROR; \| ^ c:\Program files (x86)\Windows Kits\10\include\10.0.22000.0\um\wingdi.h(118,29): note: expanded from macro 'ERROR' 118 \| #define ERROR 0 \| ^ 2 errors generated.	2024-12-15 09:35:44 -05:00
hidekisaito	f2bceb2311	[Offload][AMDGPU] accept generic target (#118919 ) Enables generic ISA, e.g., "--offload-arch=gfx11-generic" device code to run on gfx11-generic ISA capable device. Executable may contain one ELF that has specific target ISA and another ELF that has compatible generic ISA. Under that circumstance, this code should say both ELFs are compatible, leaving the rest to PluginManager to handle. Suggestions on how best to address that is welcome.	2024-12-09 19:11:38 -05:00
Shilei Tian	92376c3ff5	[Offload][OMPX] Add the runtime support for multi-dim grid and block (#118042 )	2024-12-06 09:07:50 -05:00
Shilei Tian	eb49788bd9	[Offload][AMDGPU] Allow COV6 images (#118909 )	2024-12-05 20:05:39 -05:00
Shilei Tian	68bcba6d7a	Revert "[AMDGPU] Use COV6 by default (#118515 )" This reverts commit 410cbe3cf28913cca2fc61b3437306b841d08172 because some buildbots are not ready yet.	2024-12-03 20:17:06 -05:00
Shilei Tian	410cbe3cf2	[AMDGPU] Use COV6 by default (#118515 )	2024-12-03 19:38:35 -05:00
Joseph Huber	a6ef0debb1	[libc][NFC] Rename RPC opcodes to better reflect their usage Summary: RPC_ is a generic prefix here, use LIBC_ to indicate that these are opcodes used to implement the C library	2024-12-02 15:35:08 -06:00
Joseph Huber	91f5f974cb	[OpenMP] Unconditionally provide an RPC client interface for OpenMP (#117933 ) Summary: This patch adds an RPC interface that lives directly in the OpenMP device runtime. This allows OpenMP to implement custom opcodes. Currently this is only providing the host call interface, which is the raw version of reverse offloading. Previously this lived in `libc/` as an extension which is not the correct place. The interface here uses a weak symbol for the RPC client by the same name that the `libc` interface uses. This means that it will defer to the libc one if both are present so we don't need to set up multiple instances. The presense of this symbol is what controls whether or not we set up the RPC server. Because this is an external symbol it normally won't be optimized out, so there's a special pass in OpenMPOpt that deletes this symbol if it is unused during linking. That means at `O0` the RPC server will always be present now, but will be removed trivially if it's not used at O1 and higher.	2024-12-02 14:31:51 -06:00
Joseph Huber	1d810ece2b	[libc] Move libc server handlers to a shared header (#117908 ) Summary: We can simply include this header from the shared directory now and do not need to have this level of indirection. Simply stash it with the other libc opcode handlers. If we were able to move the printf handlers to the shared directory then this could just be a header as well, which would HEAVILY simplify the mess associated with building the RPC server first in the projects build, then copying it to the runtimes build.	2024-11-27 14:57:52 -06:00
Joseph Huber	89d8e70031	[libc] Export a pointer to the RPC client directly (#117913 ) Summary: We currently have an unnecessary level of indirection when initializing the RPC client. This is a holdover from when the RPC client was not trivially copyable and simply makes it more complicated. Here we use the `asm` syntax to give the C++ variable a valid name so that we can just copy to it directly. Another advantage to this, is that if users want to piggy-back on the same RPC interface they need only declare theirs as extern with the same symbol name, or make it weak to optionally use it if LIBC isn't avaialb.e	2024-11-27 14:57:38 -06:00
Joseph Huber	d7c20a6f0c	[libc][NFC] Move RPC opcodes to the 'shared/' directory as well	2024-11-25 12:04:10 -06:00
Joseph Huber	b4d49fb52e	[libc] Remove RPC server API and use the header directly (#117075 ) Summary: This patch removes much of the `llvmlibc_rpc_server` interface. This pretty much deletes all of this code and just replaces it with including `rpc.h` directly. We still maintain the file to let `libc` handle the opcodes, since those depend on the `printf` impelmentation. This will need to be cleaned up more, but I don't want to put too much into a single patch.	2024-11-25 07:13:28 -06:00
Ivan Radanov Ivanov	0a27e4eed4	[offload] Fix copy-paste defect in error message	2024-11-19 17:12:51 +09:00
Matin Raayai	bb3f5e1fed	Overhaul the TargetMachine and LLVMTargetMachine Classes (#111234 ) Following discussions in #110443, and the following earlier discussions in https://lists.llvm.org/pipermail/llvm-dev/2017-October/117907.html, https://reviews.llvm.org/D38482, https://reviews.llvm.org/D38489, this PR attempts to overhaul the `TargetMachine` and `LLVMTargetMachine` interface classes. More specifically: 1. Makes `TargetMachine` the only class implemented under `TargetMachine.h` in the `Target` library. 2. `TargetMachine` contains target-specific interface functions that relate to IR/CodeGen/MC constructs, whereas before (at least on paper) it was supposed to have only IR/MC constructs. Any Target that doesn't want to use the independent code generator simply does not implement them, and returns either `false` or `nullptr`. 3. Renames `LLVMTargetMachine` to `CodeGenCommonTMImpl`. This renaming aims to make the purpose of `LLVMTargetMachine` clearer. Its interface was moved under the CodeGen library, to further emphasis its usage in Targets that use CodeGen directly. 4. Makes `TargetMachine` the only interface used across LLVM and its projects. With these changes, `CodeGenCommonTMImpl` is simply a set of shared function implementations of `TargetMachine`, and CodeGen users don't need to static cast to `LLVMTargetMachine` every time they need a CodeGen-specific feature of the `TargetMachine`. 5. More importantly, does not change any requirements regarding library linking. cc @arsenm @aeubanks	2024-11-14 13:30:05 -08:00
aurel32	b6bd7477a9	[Offload] Add support for riscv64 to host plugin (#115773 ) This adds support for the riscv64 architecture to the offload host plugin. The check to define FFI_DEFAULT_ABI is intentionally not guarded by __riscv_xlen as the value is the same for riscv32 and riscv64 (support for OpenMP on riscv32 is still under review).	2024-11-13 08:15:49 -06:00
Johannes Doerfert	08533a3ee8	[Offload][NFC] Reorganize `utils::` and make Device/Host/Shared clearer (#100280 ) We had three `utils::` namespaces, all with different "meaning" (host, device, hsa_utils). We should, when we can, keep "include/Shared" accessible from host and device, thus RefCountTy has been moved to a separate header. `hsa_utils` was introduced to make `utils::` less overloaded. And common functionality was de-duplicated, e.g., `utils::advance` and `utils::advanceVoidPtr` -> `utils:advancePtr`. Type punning now checks for the size of the result to make sure it matches the source type. No functional change was intended.	2024-09-05 13:36:26 -07:00
Ethan Luis McDonough	fde2d23ee2	[PGO][OpenMP] Instrumentation for GPU devices (Revision of #76587 ) (#102691 ) This pull request is a revised version of #76587. This pull request fixes some build issues that were present in the previous version of this change. > This pull request is the first part of an ongoing effort to extends PGO instrumentation to GPU device code. This PR makes the following changes: > > - Adds blank registration functions to device RTL > - Gives PGO globals protected visibility when targeting a supported GPU > - Handles any addrspace casts for PGO calls > - Implements PGO global extraction in GPU plugins (currently only dumps info) > > These changes can be tested by supplying `-fprofile-instrument=clang` while targeting a GPU.	2024-08-22 01:10:54 -05:00
Johannes Doerfert	80525dfcde	[Offload][CUDA] Allow CUDA kernels to use LLVM/Offload (#94549 ) Through the new `-foffload-via-llvm` flag, CUDA kernels can now be lowered to the LLVM/Offload API. On the Clang side, this is simply done by using the OpenMP offload toolchain and emitting calls to `llvm` functions to orchestrate the kernel launch rather than `cuda` functions. These `llvm` functions are implemented on top of the existing LLVM/Offload API. As we are about to redefine the Offload API, this wil help us in the design process as a second offload language. We do not support any CUDA APIs yet, however, we could: https://www.osti.gov/servlets/purl/1892137 For proper host execution we need to resurrect/rebase https://tianshilei.me/wp-content/uploads/2021/12/llpp-2021.pdf (which was designed for debugging). ``` ❯❯❯ cat test.cu extern "C" { void llvm_omp_target_alloc_shared(size_t Size, int DeviceNum); void llvm_omp_target_free_shared(void DevicePtr, int DeviceNum); } __global__ void square(int A) { A = 42; } int main(int argc, char argv) { int DevNo = 0; int Ptr = reinterpret_cast<int >(llvm_omp_target_alloc_shared(4, DevNo)); Ptr = 7; printf("Ptr %p, Ptr %i\n", Ptr, Ptr); square<<<1, 1>>>(Ptr); printf("Ptr %p, Ptr %i\n", Ptr, Ptr); llvm_omp_target_free_shared(Ptr, DevNo); } ❯❯❯ clang++ test.cu -O3 -o test123 -foffload-via-llvm --offload-arch=native ❯❯❯ llvm-objdump --offloading test123 test123: file format elf64-x86-64 OFFLOADING IMAGE [0]: kind elf arch gfx90a triple amdgcn-amd-amdhsa producer openmp ❯❯❯ LIBOMPTARGET_INFO=16 ./test123 Ptr 0x155448ac8000, Ptr 7 Ptr 0x155448ac8000, Ptr 42 ```	2024-08-12 17:44:58 -07:00
Johannes Doerfert	9a1013220b	[Offload] Allow to record kernel launch stack traces (#100472 ) Similar to (de)allocation traces, we can record kernel launch stack traces and display them in case of an error. However, the AMD GPU plugin signal handler, which is invoked on memroy faults, cannot pinpoint the offending kernel. Insteade print `<NUM>`, set via `OFFLOAD_TRACK_NUM_KERNEL_LAUNCH_TRACES=<NUM>`, many traces. The recoding/record uses a ring buffer of fixed size (for now 8). For `trap` errors, we print the actual kernel name, and trace if recorded.	2024-07-31 11:49:50 -07:00

1 2

67 Commits