llvm-project

Author	SHA1	Message	Date
Austin Schuh	32fd97c61d	cuda clang: Move nvptx-surface.cu test to CodeGenCUDA (#134758) Signed-off-by: Austin Schuh <austin.linux@gmail.com>	2025-04-22 12:36:39 -07:00
Arvind Sudarsanam	6cfec29cb9	[Offload][SYCL] Refactor OffloadKind implementation (#135809) Following are the changes: 1. Make OffloadKind enum values powers of two so we can use them like a bitfield 2. Include OFK_SYCL enum value 3. Modify ActiveOffloadKinds support in clang-linker-wrapper to use bitfields instead of a vector. Thanks --------- Signed-off-by: Arvind Sudarsanam <arvind.sudarsanam@intel.com>	2025-04-16 14:13:30 +00:00
Joseph Huber	f1e917d07b	[Offload] Unify offloading entries into a single section (#125731) Summary: This patch unifies the existing offloading entries into a single section called `llvm_offload_entries`. This lets us use a more unified offloading infrastructure so that all targets share the same handling. The effect is that people in the runtimes now need to check if the kind is what they expect, but the expectation is that you can combine multiple potential providers into a compile job. Doesn't fully work yet because of other runtime issues, but some day. Mostly this helps the future of liboffload where we want to handle different languages than OpenMP.	2025-02-06 08:24:01 -06:00
Joseph Huber	13dcc95dcd	[Offload] Rework offloading entry type to be more generic (#124018) Summary: The previous offloading entry type did not fit the current use-cases very well. This widens it and adds a version to prevent further annoyances. It also includes the kind to better sort who's using it. The first 64 bits are reserved as zero so the OpenMP runtime can detect the old format for binary compatibility.	2025-01-28 07:26:13 -06:00
Joseph Huber	70a16b90ff	[HIP] Support managed variables using the new driver (#123437) Summary: Previously, managed variables didn't work in rdc mode using the new driver because we just didn't register them. This was previously ignored because we didn't have enough space in the current struct format. This patch amends that by just emitting a struct pair for the two variables and using the single pointer. In the future, a more extensible entry format would be nice, but that can be done later.	2025-01-22 09:13:14 -06:00
Joseph Huber	42eb54b774	[Clang] Put offloading globals in the `.llvm.rodata.offloading` section (#111890) Summary: For our offloading entries, we currently store all the string names of kernels that the runtime will need to load from the target executable. These are available via pointer in the `__tgt_offload_entry` struct, however this makes it difficult to obtain from the object itself. This patch simply puts the strings in a named section so they can be easily queried. The motivation behind this is that when the linker wrapper is doing linking, it wants to know which kernels the host executable is calling. We could get this already via the `.relaomp_offloading_entries` section and trawling through the string table, but that's quite annoying and not portable. The follow-up to this should be to make the linker wrapper get a list of all used symbols the device link job should count as "needed" so we can handle static linking more directly.	2024-10-28 07:17:50 -07:00
Joseph Huber	470aefb240	[Offload][NFC] Remove `omp_` prefix from offloading entries (#88071) Summary: These entries are generic for offloading with the new driver now. Having the `omp` prefix was a historical artifact and is confusing when used for CUDA. This patch just renames them for now; future patches will rework the binary format to make it more common.	2024-04-09 15:50:15 -05:00
Joseph Huber	97f3be2c5a	[CUDA][HIP] Improve variable registration with the new driver (#73177) Summary: This patch adds support for registering texture / surface variables from CUDA / HIP. Additionally, we now properly track the `extern` and `const` flags that are also used in these runtime functions. This does not implement the `managed` variables yet as those seem to require some extra handling I'm not familiar with. The issue is that the current offload entry isn't large enough to carry size and alignment information along with an extra global.	2023-12-07 15:44:23 -06:00
Joseph Huber	52204a29ab	[Offload] Initial support for registering offloading entries on COFF targets (#72697) Summary: This patch provides the initial support to allow handling the new driver's offloading entries. Normally, the ELF target can emit variables at C-identifier named sections and the linker will provide a pointer to the section. For COFF targets, the linker instead merges sections containing a `$` in alphabetical order. We thus can emit these variables at sections and then emit two variables that are guaranteed to be sorted before and after the others to traverse it. Previous patches consolidated the handling of offloading entries so that this patch can more easily handle mapping them to the appropriate section. Ideally, the only remaining step to allow the new driver to run on Windows targets is to accurately map the following `ld.lld` arguments to their `llvm-link` equivalents. These are used inside the linker-wrapper, so we should simply need to remap the arguments to the same functionality if possible. ``` -o, -output -l, --library -L, --library-path -v, --version -rpath -whole-archive, -no-whole-archive ``` I have not tested this at runtime as I do not have access to a Windows machine. This patch was adapted from some initial efforts in https://reviews.llvm.org/D137470.	2023-11-21 06:48:34 -06:00
Joseph Huber	b370be37cc	[CUDA] Allow the new driver to compile CUDA in non-RDC mode The new driver primarily allows us to support RDC-mode compilations with proper linking. This is not needed for non-RDC mode compilation, but we still would like the new driver to be able to handle this mode so we can transition away from the old driver in the future. This patch adds the necessary code to support creating a fatbinary for CUDA code generation as well as removing old assumptions and errors about RDC-mode with the new driver. Reviewed By: tra Differential Revision: https://reviews.llvm.org/D129655	2022-07-13 21:49:15 -04:00
Joseph Huber	e88d53d25f	[HIP] Generate offloading entries for HIP with the new driver. This patch adds the small change required to output offloading entries for HIP instead of CUDA. These should be placed in different sections because they need to be distinct to the offloading toolchain, otherwise we'd have HIP trying to register CUDA kernels or vice-versa. This patch will precede support for HIP in the linker wrapper. Reviewed By: yaxunl, tra Differential Revision: https://reviews.llvm.org/D128850	2022-07-11 15:49:21 -04:00
Joseph Huber	0035f7154c	[CUDA] Create offloading entries when using the new driver The changes made in D123460 generalized the code generation for OpenMP's offloading entries. We can use the same scheme to register globals for CUDA code. This patch adds the code generation to create these offloading entries when compiling using the new offloading driver mode. The offloading entries are simple structs that contain the information necessary to register the global. The struct used is as follows: ``` struct __tgt_offload_entry { void *addr; // Pointer to the offload entry info. // (function or global) char *name; // Name of the function or global. size_t size; // Size of the entry info (0 if it is a function). int32_t flags; int32_t reserved; }; ``` Currently CUDA handles RDC code generation by deferring the registration of globals in the current TU to a callback function containing the module's ID. Later all the module IDs will be used to register all of the globals at once. Rather than mimic this, offloading entries allow us to mimic the way OpenMP registers globals. That is, we create a simple global struct for each device global to be registered. These are placed at a special section `cuda_offloading_entries`. Because this section is a valid C-identifier, the linker will provide a `__start` and `__stop` pointer that we can use to iterate and register all globals at runtime. The registration requires a flag variable to indicate which registration function to use. I have assigned the flags somewhat arbitrarily, but these use the following values. Kernel: 0 Variable: 0 Managed: 1 Surface: 2 Texture: 3 Depends on D120272 Reviewed By: tra Differential Revision: https://reviews.llvm.org/D123471	2022-05-11 07:30:21 -04:00

12 Commits
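
The registration scheme described in 0035f7154c (one entry struct per device global, walked at runtime between a start and a stop pointer) can be sketched roughly as below. This is a minimal illustration and not the code from the patch. The names `offload_entry`, `entry_kind`, `classify_entry` and `count_entries` are hypothetical, a plain pointer range stands in for the linker-provided `__start` / `__stop` section bounds, and it assumes that kernels (flag 0) are told apart from variables (also flag 0) by a size of 0, as the struct comment suggests.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the entry layout quoted in the commit message. */
struct offload_entry {
  void *addr;       /* Pointer to the function or global. */
  const char *name; /* Name of the function or global. */
  size_t size;      /* Size of the global (0 if it is a function). */
  int32_t flags;    /* Selects the registration function. */
  int32_t reserved;
};

/* Kinds derived from the flag values listed in the commit message. */
enum entry_kind {
  ENTRY_KERNEL,
  ENTRY_VARIABLE,
  ENTRY_MANAGED,
  ENTRY_SURFACE,
  ENTRY_TEXTURE,
  ENTRY_UNKNOWN,
  ENTRY_KIND_COUNT
};

/* Assumption: size == 0 marks a kernel; otherwise the flag picks the kind
   (Variable: 0, Managed: 1, Surface: 2, Texture: 3). */
static enum entry_kind classify_entry(const struct offload_entry *e) {
  if (e->size == 0)
    return ENTRY_KERNEL;
  switch (e->flags) {
  case 0:
    return ENTRY_VARIABLE;
  case 1:
    return ENTRY_MANAGED;
  case 2:
    return ENTRY_SURFACE;
  case 3:
    return ENTRY_TEXTURE;
  default:
    return ENTRY_UNKNOWN;
  }
}

/* Walk the half-open range [begin, end) the way a runtime would walk the
   section between its start and stop symbols, tallying entries per kind
   where a real runtime would call the matching registration function. */
static void count_entries(const struct offload_entry *begin,
                          const struct offload_entry *end,
                          size_t counts[ENTRY_KIND_COUNT]) {
  for (size_t k = 0; k < ENTRY_KIND_COUNT; ++k)
    counts[k] = 0;
  for (const struct offload_entry *e = begin; e != end; ++e)
    counts[classify_entry(e)]++;
}
```

In an actual ELF build the range would presumably come from the section's start and stop symbols rather than from an ordinary array; the array form is used here only so the sketch compiles on any hosted C compiler.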