llvm-project

Author	SHA1	Message	Date
Yury Plyakhin	81a8363ae3	[Offload][SYCL] Refactoring: get rid of newline separators (#180274 ) Previously, kernel symbols in offload binaries for SYCL had to be stored as newline-separated strings and we had to use llvm::join and line_iterator. It was needed because Offload Binary v1 format did not store string value sizes. It is not necessary with Offload Binary v2 format, which stores string value size and hence eliminates the need for newline separators.	2026-02-10 12:01:13 -08:00
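The reasoning in the commit above (explicit sizes remove the need for in-band separators) can be illustrated with a minimal standalone sketch. It does not reproduce the actual OffloadBinary layout; the helper names `joinLines`, `splitLines`, `packSized`, and `unpackSized` and the `[uint64 size][bytes]` record shape are assumptions made only for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical v1-style encoding: no sizes are stored, so values are joined
// with an in-band '\n' separator.
std::string joinLines(const std::vector<std::string> &Symbols) {
  std::string Out;
  for (size_t I = 0; I < Symbols.size(); ++I) {
    if (I != 0)
      Out.push_back('\n');
    Out += Symbols[I];
  }
  return Out;
}

// Splits on '\n'; a value that itself contains '\n' is split incorrectly.
std::vector<std::string> splitLines(const std::string &Joined) {
  std::vector<std::string> Result;
  size_t Start = 0;
  while (true) {
    size_t End = Joined.find('\n', Start);
    if (End == std::string::npos) {
      Result.push_back(Joined.substr(Start));
      break;
    }
    Result.push_back(Joined.substr(Start, End - Start));
    Start = End + 1;
  }
  return Result;
}

// Hypothetical v2-style encoding: each value is stored as
// [uint64 size][bytes], so no separator byte is reserved.
std::string packSized(const std::vector<std::string> &Symbols) {
  std::string Out;
  for (const std::string &S : Symbols) {
    uint64_t Size = S.size();
    Out.append(reinterpret_cast<const char *>(&Size), sizeof(Size));
    Out.append(S);
  }
  return Out;
}

// Reads records back; stops at the first truncated record.
std::vector<std::string> unpackSized(const std::string &Blob) {
  std::vector<std::string> Result;
  size_t Pos = 0;
  while (Pos + sizeof(uint64_t) <= Blob.size()) {
    uint64_t Size = 0;
    std::memcpy(&Size, Blob.data() + Pos, sizeof(Size));
    Pos += sizeof(Size);
    if (Size > Blob.size() - Pos)
      break;
    Result.emplace_back(Blob.data() + Pos, static_cast<size_t>(Size));
    Pos += static_cast<size_t>(Size);
  }
  return Result;
}
```

With sized records, a value may contain any byte (including a newline) and still round-trips, whereas the separator-based form silently splits it.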
Yury Plyakhin	4d27530c69	[Offloading] Offload Binary Format V2: Support Multiple Entries (#169425 ) This PR updates the OffloadBinary format from version 1 to version 2, enabling support for multiple offloading entries in a single binary. This allows combining multiple device images into a single binary with common global metadata while maintaining backwards compatibility with version 1 binaries. # Key Changes ## Binary Format Enhancements Version 2 Format Changes: - Changed from single-entry to multi-entry design - Updated `Header` structure: - Renamed `EntryOffset` → `EntriesOffset` (offset to entries array) - Renamed `EntrySize` → `EntriesCount` (number of entries) - Added `StringEntry::ValueSize` field to support explicit string value sizes (enables non-null-terminated strings) - Introduced `OffloadEntryFlags` enum with `OIF_Metadata` flag for metadata-only entries (entries without binary images) API Changes: - `OffloadBinary::create()` now returns `Expected<SmallVector<std::unique_ptr<OffloadBinary>>>` instead of single binary - Added optional `Index` parameter to extract specific entry: `create(Buffer, std::optional<uint64_t> Index)` - `OffloadBinary::write()` now accepts `ArrayRef<OffloadingImage>` instead of single image - Added `OffloadBinary::extractHeader()` for header extraction Memory Management: - Implemented `SharedMemoryBuffer` class to enable memory sharing across multiple `OffloadBinary` instances from the same file - Multiple entries from a single serialized binary share the underlying buffer ## Testing Unit Tests (`unittests/Object/OffloadingTest.cpp`): - `checkMultiEntryBinaryExtraction`: Tests extracting all entries from a multi-entry binary - `checkIndexBasedExtraction`: Tests extracting specific entries by index, including out-of-bounds validation - `checkEdgeCases`: Tests edge cases including: - Empty string metadata - Empty image data - Large string values (4KB) Other Tests: - Updated `test/ObjectYAML/Offload/multiple_members.yaml` to 
include metadata-only entry --------- Co-authored-by: Joseph Huber <huberjn@outlook.com>	2026-02-05 07:46:57 -08:00
Nikita Popov	86755dd0bf	[llvm] Use ConstantInt::getAllOnesValue() Prefer getAllOnesValue() over get(-1). This is good practice to avoid issues with sign extension for large types.	2025-12-09 12:02:39 +01:00
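The sign-extension concern mentioned in the commit above can be shown with a small standalone model. This sketch does not use LLVM's `ConstantInt` or `APInt`; the `Wide128` type and the helpers `fromU64`, `allOnes`, and `isAllOnes` are hypothetical stand-ins for a type wider than 64 bits.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 128-bit integer made of two 64-bit halves.
struct Wide128 {
  uint64_t Lo;
  uint64_t Hi;
};

// Builds a 128-bit value from a 64-bit one. Without IsSigned the upper half
// is zero-filled; with IsSigned the sign bit is replicated into it.
Wide128 fromU64(uint64_t V, bool IsSigned = false) {
  uint64_t Hi = (IsSigned && (V >> 63) != 0) ? ~uint64_t(0) : uint64_t(0);
  return Wide128{V, Hi};
}

// Builds the all-ones value directly, independent of any extension rule.
Wide128 allOnes() { return Wide128{~uint64_t(0), ~uint64_t(0)}; }

bool isAllOnes(Wide128 W) {
  return W.Lo == ~uint64_t(0) && W.Hi == ~uint64_t(0);
}
```

In this model, passing `-1` through a 64-bit parameter only yields an all-ones wide value when sign extension is requested, which is why a dedicated all-ones constructor is the less error-prone choice.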
Yury Plyakhin	4e7ce57e0e	[Offload][NFC] Offload wrapper cleanup/refactoring (#169411 ) Addresses feedback from https://github.com/llvm/llvm-project/pull/147508#pullrequestreview-3272708203 : - Update access modifiers for SYCLWrapper members. - Update comments. - Update types.	2025-11-24 22:29:50 +00:00
Joseph Huber	6655681cd0	[llvm-offload-wrapper] Fix Triple and OpenMP handling (#167580 ) Summary: The OpenMP handling using an offload binary should be optional, it's only used for extra metadata for llvm-objdump. Also the triple was completely wrong, it didn't let anyone correctly choose between ELF and COFF handling.	2025-11-11 19:44:20 -06:00
serge-sans-paille	28d9f99a27	Remove unused standard headers: <string>, <optional>, <numeric>, <tuple> (#167232 )	2025-11-10 12:17:12 +00:00
Maksim Sabianin	65d730b4a5	[SYCL] Add offload wrapping for SYCL kind (#147508 ) This patch adds an Offload Wrapper for the SYCL kind. This is an essential step for SYCL offloading and the compilation flow. The usage of offload wrapping is added to the clang-linker-wrapper tool. Modifications: Implemented `bundleSYCL()` function to handle SYCL image bundling. Implemented `wrapSYCLBinaries()` function that is invoked from clang-linker-wrapper. SYCL Offload Wrapping uses specific data structures such as `__sycl.tgt_device_image` and `__sycl.tgt_bin_desc`. Each SYCL image maintains its own symbol table (unlike shared global tables in other targets). Therefore, symbols are encoded explicitly during the offload wrapping. Also, images refer to their own Offloading Entries arrays unlike other targets. The proposed `__sycl.tgt_device_image` uses Version 3 to differentiate from images generated by Intel DPC++. The structure proposed in this patch doesn't have fields deprecated in DPC++.	2025-09-26 17:02:31 +00:00
Nick Sarnie	2bbc740573	[Offload] Change ELF machine type for SPIR-V OpenMP image (#159623 ) This needs to match the runtime plugin (currently in PR [here](https://github.com/llvm/llvm-project/pull/158900)), and use the recently-added `INTELGT` machine type which is correct for Intel GPU images. --------- Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>	2025-09-22 15:53:19 +00:00
Kazu Hirata	5e3cc00060	[Offloading] Fix a warning This patch fixes: llvm/lib/Frontend/Offloading/PropertySet.cpp:95:12: error: unused variable '[It, Inserted]' [-Werror,-Wunused-variable]	2025-08-01 23:22:15 -07:00
Arvind Sudarsanam	ee67f78776	Fix error caused by reference to local binding (#151789 ) This change fixes one of the failures in https://github.com/llvm/llvm-project/pull/147321 Following code snippet: ` for (const auto &[CategoryName, PropSet] : PSRegistry) { J.attributeObject(CategoryName, [&] { for (const auto &[PropName, PropVal] : PropSet) { ` causes a build warning that is emitted as an error. error: reference to local binding 'PropSet' declared in enclosing lambda expression This is resolved by capturing PropSet in a local variable. Thanks Signed-off-by: Arvind Sudarsanam <arvind.sudarsanam@intel.com>	2025-08-02 00:57:26 -04:00
Justin Cai	185a23e865	[SYCL] Add property set types and JSON representation (#147321 ) This PR adds the `PropertySet` type, along with a pair of functions used to serialize and deserialize into a JSON representation. A property set is a key-value map, with values being one of 2 types - uint32 or byte array. A property set registry is a collection of property sets, indexed by a "category" name. In SYCL offloading, property sets will be used to communicate metadata about device images needed by the SYCL runtime. For example, there is a property set which has a byte array containing the numeric ID, offset, and size of each SYCL2020 spec constant. Another example is a property set describing the optional kernel features used in the module: does it use fp64? fp16? atomic64? This metadata will be computed by `clang-sycl-linker` and the JSON representation will be inserted in the string table of each output `OffloadBinary`. This JSON will be consumed by the SYCL offload wrapper and will be lowered to the binary form the SYCL runtime expects. For example, consider this SYCL program that calls a kernel that uses fp64: ```c++ #include <sycl/sycl.hpp> using namespace sycl; class MyKernel; int main() { queue q; auto p = malloc_shared<double>(1, q); *p = .1; q.single_task<MyKernel>([=]{ *p = 2; }).wait(); std::cout << *p << "\n"; free(p, q); } ``` The device code for this program would have the kernel marked with `!sycl_used_aspects`: ``` define spir_kernel void @_ZTS8MyKernel([...]) !sycl_used_aspects !n { [...] 
} !n = {i32 6} ``` `clang-sycl-linker` would recognize this metadata and then would output the following JSON in the `OffloadBinary`'s key-value map: ``` { "SYCL/device requirements": { // aspects contains a list of sycl::aspect values used // by the module; in this case just the value 6 encoded // as a 4-byte little-endian integer "aspects": "BjAwMA==" } } ``` The SYCL offload wrapper would lower those property sets to something like this: ```c++ struct _sycl_device_binary_property_set_struct { char *CategoryName; _sycl_device_binary_property PropertiesBegin; _sycl_device_binary_property PropertiesEnd; }; struct _sycl_device_binary_property_struct { char *PropertyName; void *ValAddr; uint64_t ValSize; }; // _sycl_device_binary_property_struct device_requirements[] = { /* PropertyName */ "aspects", /* ValAddr */ [pointer to the bytes 0x06 0x00 0x00 0x00], /* ValSize */ 4, }; _sycl_device_binary_property_set_struct properties[] = { /* CategoryName */ "SYCL/device requirements", /* PropertiesBegin */ device_requirements, /* PropertiesEnd */ std::end(device_requirements), } ``` --------- Co-authored-by: Arvind Sudarsanam <arvind.sudarsanam@intel.com>	2025-08-01 20:05:45 -07:00
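The data model described in the commit above (a registry of named property sets whose values are either a uint32 or a byte array) can be sketched in standalone C++17. The type aliases and the helpers `encodeLE32` and `makeDeviceRequirements` below are assumptions for illustration and are not the definitions added by the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <variant>
#include <vector>

// Hypothetical model: a value is either a uint32 or a byte array.
using ByteArray = std::vector<uint8_t>;
using PropertyValue = std::variant<uint32_t, ByteArray>;
// A property set maps property names to values.
using PropertySet = std::map<std::string, PropertyValue>;
// A registry maps a category name to its property set.
using PropertySetRegistry = std::map<std::string, PropertySet>;

// Encodes a 32-bit value as 4 little-endian bytes.
ByteArray encodeLE32(uint32_t V) {
  return {static_cast<uint8_t>(V & 0xFF),
          static_cast<uint8_t>((V >> 8) & 0xFF),
          static_cast<uint8_t>((V >> 16) & 0xFF),
          static_cast<uint8_t>((V >> 24) & 0xFF)};
}

// Builds a registry holding one "aspects" byte array, with each aspect value
// stored as a 4-byte little-endian integer.
PropertySetRegistry
makeDeviceRequirements(const std::vector<uint32_t> &Aspects) {
  ByteArray Bytes;
  for (uint32_t A : Aspects) {
    ByteArray Enc = encodeLE32(A);
    Bytes.insert(Bytes.end(), Enc.begin(), Enc.end());
  }
  PropertySetRegistry Registry;
  Registry["SYCL/device requirements"]["aspects"] = Bytes;
  return Registry;
}
```

For the single aspect value 6 from the example, this model stores the bytes 0x06 0x00 0x00 0x00 under the "aspects" key, matching the lowered form shown in the commit message.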
Nikita Popov	979c275097	[IR] Store Triple in Module (NFC) (#129868 ) The module currently stores the target triple as a string. This means that any code that wants to actually use the triple first has to instantiate a Triple, which is somewhat expensive. The change in #121652 caused a moderate compile-time regression due to this. While it would be easy enough to work around, I think that architecturally, it makes more sense to store the parsed Triple in the module, so that it can always be directly queried. For this change, I've opted not to add any magic conversions between std::string and Triple for backwards-compatibility purposes, and instead write out needed Triple()s or str()s explicitly. This is because I think a decent number of them should be changed to work on Triple as well, to avoid unnecessary conversions back and forth. The only interesting part in this patch is that the default triple is Triple("") instead of Triple() to preserve existing behavior. The former defaults to using the ELF object format instead of unknown object format. We should fix that as well.	2025-03-06 10:27:47 +01:00
Nick Sarnie	f7b3559ce0	[clang-linker-wrapper] Add ELF packaging for spirv64-intel OpenMP images (#125737 ) Add manual ELF packaging for `spirv64-intel` images as there is no SPIR-V linker available. This format will be expected by the runtime plugin we will submit in the future and is compatible with the format we already use downstream. --------- Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>	2025-02-06 15:46:44 +00:00
Joseph Huber	f1e917d07b	[Offload] Unify offloading entries into a single section (#125731 ) Summary: This patch unifies the existing offloading entries into a single section called `llvm_offload_entries`. This lets us use a more unified offloading infrastructure so that all targets share the same handling. The effect is that people in the runtimes now need to check if the kind is what they expect, but the expectation is that you can combine multiple potential providers into a compile job. Doesn't fully work yet because of other runtime issues, but some day. Mostly this helps the future of liboffload where we want to handle different languages than OpenMP.	2025-02-06 08:24:01 -06:00
Joseph Huber	13dcc95dcd	[Offload] Rework offloading entry type to be more generic (#124018 ) Summary: The previous offloading entry type did not fit the current use-cases very well. This widens it and adds a version to prevent further annoyances. It also includes the kind to better sort who's using it. The first 64-bytes are reserved as zero so the OpenMP runtime can detect the old format for binary compatibility.	2025-01-28 07:26:13 -06:00
Mats Jun Larsen	d7c14c8f97	[IR] Replace of PointerType::getUnqual(Type) with opaque version (NFC) (#123909 ) Follow up to https://github.com/llvm/llvm-project/issues/123569	2025-01-23 18:23:05 +09:00
Joseph Huber	70a16b90ff	[HIP] Support managed variables using the new driver (#123437 ) Summary: Previously, managed variables didn't work in rdc mode using the new driver because we just didn't register them. This was previously ignored because we didn't have enough space in the current struct format. This patch amends that by just emitting a struct pair for the two variables and using the single pointer. In the future, a more extensible entry format would be nice, but that can be done later.	2025-01-22 09:13:14 -06:00
Jay Foad	d6fc7d3ab1	Fix typo "intead"	2024-11-21 14:48:38 +00:00
Kazu Hirata	36ada1b9b2	[Frontend] Remove unused includes (NFC) (#116927 ) Identified with misc-include-cleaner.	2024-11-20 06:52:17 -08:00
Joseph Huber	42eb54b774	[Clang] Put offloading globals in the `.llvm.rodata.offloading` section (#111890 ) Summary: For our offloading entries, we currently store all the string names of kernels that the runtime will need to load from the target executable. These are available via pointer in the `__tgt_offload_entry` struct, however this makes it difficult to obtain from the object itself. This patch simply puts the strings in a named section so they can be easily queried. The motivation behind this is that when the linker wrapper is doing linking, it wants to know which kernels the host executable is calling. We could get this already via the `.relaomp_offloading_entries` section and trawling through the string table, but that's quite annoying and not portable. The follow-up to this should be to make the linker wrapper get a list of all used symbols the device link job should count as "needed" so we can handle static linking more directly.	2024-10-28 07:17:50 -07:00
Fabian Mora	cfc76b6498	[llvm][offload] Move AMDGPU offload utilities to LLVM (#102487 ) This patch moves utilities from `offload/plugins-nextgen/amdgpu/utils/UtilitiesRTL.h` to `llvm/Frontend/Offloading/Utility.h` to be reused by other projects. Concretely the following changes were made: - Rename `KernelMetaDataTy` to `AMDGPUKernelMetaData`. - Remove unused fields `KernelObject`, `KernelSegmentSize`, `ExplicitArgumentCount` and `ImplicitArgumentCount` from `AMDGPUKernelMetaData`. - Return the produced error if `ELFObj.sections()` failed instead of using `cantFail`. - Added `AGPRCount` field to `AMDGPUKernelMetaData`. - Added a default invalid value to all the fields in `AMDGPUKernelMetaData`.	2024-08-20 09:03:06 -04:00
Joseph Huber	fa9e90f5d2	[Reland][Libomptarget] Statically link all plugin runtimes (#87009 ) This patch overhauls the `libomptarget` and plugin interface. Currently, we define a C API and compile each plugin as a separate shared library. Then, `libomptarget` loads these API functions and forwards its internal calls to them. This was originally designed to allow multiple implementations of a library to be live. However, since then no one has used this functionality and it prevents us from using much nicer interfaces. If the old behavior is desired it should instead be implemented as a separate plugin. This patch replaces the `PluginAdaptorTy` interface with the `GenericPluginTy` that is used by the plugins. Each plugin exports a `createPlugin_<name>` function that is used to get the specific implementation. This code is now shared with `libomptarget`. There are some notable improvements to this. 1. Massively improved lifetimes of life runtime objects 2. The plugins can use a C++ interface 3. Global state does not need to be duplicated for each plugin + libomptarget 4. Easier to use and add features and improve error handling 5. Less function call overhead / Improved LTO performance. Additional changes in this plugin are related to contending with the fact that state is now shared. Initialization and deinitialization is now handled correctly and in phase with the underlying runtime, allowing us to actually know when something is getting deallocated. Depends on https://github.com/llvm/llvm-project/pull/86971 https://github.com/llvm/llvm-project/pull/86875 https://github.com/llvm/llvm-project/pull/86868	2024-05-09 09:38:22 -05:00
Joseph Huber	e5e66073c3	Revert "[Libomptarget] Statically link all plugin runtimes (#87009 )" Caused failures on build-bots, reverting to investigate. This reverts commit 80f9e814ec896fdc57ee84afad8ac4cb1f8e4627.	2024-05-09 07:05:23 -05:00
Joseph Huber	80f9e814ec	[Libomptarget] Statically link all plugin runtimes (#87009 ) This patch overhauls the `libomptarget` and plugin interface. Currently, we define a C API and compile each plugin as a separate shared library. Then, `libomptarget` loads these API functions and forwards its internal calls to them. This was originally designed to allow multiple implementations of a library to be live. However, since then no one has used this functionality and it prevents us from using much nicer interfaces. If the old behavior is desired it should instead be implemented as a separate plugin. This patch replaces the `PluginAdaptorTy` interface with the `GenericPluginTy` that is used by the plugins. Each plugin exports a `createPlugin_<name>` function that is used to get the specific implementation. This code is now shared with `libomptarget`. There are some notable improvements to this. 1. Massively improved lifetimes of life runtime objects 2. The plugins can use a C++ interface 3. Global state does not need to be duplicated for each plugin + libomptarget 4. Easier to use and add features and improve error handling 5. Less function call overhead / Improved LTO performance. Additional changes in this plugin are related to contending with the fact that state is now shared. Initialization and deinitialization is now handled correctly and in phase with the underlying runtime, allowing us to actually know when something is getting deallocated. Depends on https://github.com/llvm/llvm-project/pull/86971 https://github.com/llvm/llvm-project/pull/86875 https://github.com/llvm/llvm-project/pull/86868	2024-05-09 06:35:54 -05:00
Joseph Huber	470aefb240	[Offload][NFC] Remove `omp_` prefix from offloading entries (#88071 ) Summary: These entries are generic for offloading with the new driver now. Having the `omp` prefix was a historical artifact and is confusing when used for CUDA. This patch just renames them for now, future patches will rework the binary format to make it more common.	2024-04-09 15:50:15 -05:00
Joseph Huber	421085fd74	[Offload] Change unregister library to use `atexit` instead of destructor (#86830 ) Summary: The 'new driver' sets up the lifetime of a registered library using global constructors and destructors. Currently, this is put at priority 1 which isn't strictly conformant as it will conflict with system utilities. We now use 101 as this is the lowest suggested for non-system constructors and will still run before user constructors. Secondly, there were issues with the CUDA runtime when destructed with a global destructor. Because the global ones are in any order and potentially run before other things we were hitting an edge case where the OpenMP runtime was uninitialized after `_dl_fini` was called. This would result in us erroring when we call into a destroyed `libcuda.so` instance. Using `atexit` is what CUDA / HIP use and it prevents this from happening. Most everything uses `atexit` except system utilities and because of the constructor priority it will be unregistered after everything else but not after `_dl_fini`.	2024-03-27 14:09:37 -05:00
Joseph Huber	3ee8c93769	[Offload] Fix NVPTX global entry names Summary: This was missed, the NVPTX globals cannot use a `.`.	2024-02-21 09:56:21 -06:00
Joseph Huber	5c84054223	[LinkerWrapper] Support relocatable linking for offloading (#80066 ) Summary: The standard GPU compilation process embeds each intermediate object file into the host file at the `.llvm.offloading` section so it can be linked later. We also use a special section called something like `omp_offloading_entries` to store all the globals that need to be registered by the runtime. The linker-wrapper's job is to link the embedded device code stored at this section and then emit code to register the linked image and the kernels and globals in the offloading entry section. One downside to RDC linking is that it can become quite big for very large projects that wish to make use of static linking. This patch changes the support for relocatable linking via `-r` to support a kind of "partial" RDC compilation for offloading languages. This primarily requires manually editing the embedded data in the output object file for the relocatable link. We need to rename the output section to make it distinct from the input sections that will be merged. We then delete the old embedded object code so it won't be linked further. We then need to rename the old offloading section so that it is private to the module. A runtime solution could also be done to defer entries that don't belong to the given GPU executable, but this is easier. Note that this does not work with COFF linking, only the ELF method for handling offloading entries, that could be made to work similarly. Given this support, the following compilation path should produce two distinct images for OpenMP offloading. ``` $ clang foo.c -fopenmp --offload-arch=native -c $ clang foo.c -lomptarget.devicertl --offload-link -r -o merged.o $ clang main.c merged.o -fopenmp --offload-arch=native $ ./a.out ``` Or similarly for HIP to effectively perform non-RDC mode compilation for a subset of files. 
``` $ clang -x hip foo.c --offload-arch=native --offload-new-driver -fgpu-rdc -c $ clang -x hip foo.c -lomptarget.devicertl --offload-link -r -o merged.o $ clang -x hip main.c merged.o --offload-arch=native --offload-new-driver -fgpu-rdc $ ./a.out ``` One question is whether or not this should be the default behavior of `-r` when run through the linker-wrapper or a special option. Standard `-r` behavior is still possible if used without invoking the linker-wrapper and it is guaranteed to be correct.	2024-02-07 08:20:07 -06:00
Joseph Huber	3bf881635c	[Offload] Fix entry global names on NVPTX target Summary: The PTX language rejects globals with `.` in the name. We need to change the global name if we are targeting NVPTX to prevent the toolchain from complaining.	2024-02-05 08:42:02 -06:00
Joseph Huber	a551703cb5	[Offload] Fix the offloading wrapper when merged multiple times. (#79231 ) Summary: The offloading wrapper is an object file that contains code necessary to register offloading entries for the given runtime. Currently, we expected only one of these to be present when we make the final executable. However, in the case of relocatable linking with `-r` we can end up with multiple of these being generated before finally creating the executable. This patch simply changes the definitions of these globals to be mergeable. This allows multiples of these to participate in a single link job. For ELF, we just make the dummy variable internal and used so it sets up the section as expected. For COFF we make the entries weak_odr so they merge to a single symbol.	2024-01-24 13:50:35 -06:00
Fabian Mora	9fa9d9a7e1	[llvm][frontend][offloading] Move clang-linker-wrapper/OffloadWrapper.* to llvm/Frontend/Offloading (#78057 ) This patch moves `clang/tools/clang-linker-wrapper/OffloadWrapper.*` to `llvm/Frontend/Offloading` allowing them to be re-utilized by other projects. Additionally, it makes minor modifications to the API to make it more flexible. Concretely: - The `wrap` methods now have additional arguments `EntryArray`, `Suffix` and `EmitSurfacesAndTextures` to specify some additional options. - The `EntryArray` is now constructed by the caller. This change is needed to enable JIT compilation, as ORC doesn't fully support `__start_` and `__stop_` symbols. Thus, to JIT the code, the `EntryArray` has to be constructed explicitly in the IR. - The `Suffix` field is used when emitting the descriptor, registration methods, etc, to make them more readable. It is empty by default. - The `EmitSurfacesAndTextures` field controls whether to emit surface and texture registration code, as those functions were removed from `CUDART` in CUDA 12. It is true by default. - The function `getOffloadingEntryInitializer` was added to help create the `EntryArray`, as it returns the constant initializer and not a global variable.	2024-01-15 16:30:07 -05:00
Joseph Huber	97f3be2c5a	[CUDA][HIP] Improve variable registration with the new driver (#73177 ) Summary: This patch adds support for registering texture / surface variables from CUDA / HIP. Additionally, we now properly track the `extern` and `const` flags that are also used in these runtime functions. This does not implement the `managed` variables yet as those seem to require some extra handling I'm not familiar with. The issue is that the current offload entry isn't large enough to carry size and alignment information along with an extra global.	2023-12-07 15:44:23 -06:00
Joseph Huber	52204a29ab	[Offload] Initial support for registering offloading entries on COFF targets (#72697 ) Summary: This patch provides the initial support to allow handling the new driver's offloading entries. Normally, the ELF target can emit variables at C-identifier named sections and the linker will provide a pointer to the section. For COFF target, instead the linker merges sections containing a `$` in alphabetical order. We thus can emit these variables at sections and then emit two variables that are guaranteed to be sorted before and after the others to traverse it. Previous patches consolidated the handling of offloading entries so that this patch more easily can handle mapping them to the appropriate section. Ideally, the only remaining step to allow the new driver to run on Windows targets is to accurately map the following `ld.lld` arguments to their `llvm-link` equivalents. These are used inside the linker-wrapper, so we should simply need to remap the arguments to the same functionality if possible. ``` -o, -output -l, --library -L, --library-path -v, --version -rpath -whole-archive, -no-whole-archive ``` I have not tested this at runtime as I do not have access to a windows machine. This patch was adapted from some initial efforts in https://reviews.llvm.org/D137470.	2023-11-21 06:48:34 -06:00
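The COFF ordering trick described in the commit above can be modelled with a short standalone sketch: if sections whose names contain `$` are merged in alphabetical order, two marker sections with suitably chosen suffixes always bracket the real entries. The section names and the helpers `mergeOrder` and `countEntriesBetweenMarkers` are hypothetical and only simulate the sorting rule; no linker is involved.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Simulates the merge rule: contributions are laid out in alphabetical order
// of their full section name (including the part after '$').
std::vector<std::string> mergeOrder(std::vector<std::string> Sections) {
  std::sort(Sections.begin(), Sections.end());
  return Sections;
}

// Counts the contributions that land strictly between the start and stop
// markers after merging. Returns 0 if either marker is missing or misordered.
size_t countEntriesBetweenMarkers(const std::vector<std::string> &Sections,
                                  const std::string &Start,
                                  const std::string &Stop) {
  std::vector<std::string> Merged = mergeOrder(Sections);
  auto Begin = std::find(Merged.begin(), Merged.end(), Start);
  auto End = std::find(Merged.begin(), Merged.end(), Stop);
  if (Begin == Merged.end() || End == Merged.end() || End < Begin)
    return 0;
  return static_cast<size_t>(End - Begin) - 1;
}
```

In this model, a start marker with suffix `$OA`, entries with suffix `$OE`, and a stop marker with suffix `$OZ` always sort as start, entries, stop, regardless of the order in which the contributions were emitted.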
Joseph Huber	9c0e64999b	[Offloading][NFC] Refactor handling of offloading entries (#72544 ) Summary: This patch is a simple refactoring of code out of the linker wrapper into a common location. The main motivation behind this change is to make it easier to change the handling in the future to accept a triple to be used to emit entries that function on that target.	2023-11-17 08:26:20 -06:00
Paulo Matos	7b9d73c2f9	[NFC] Remove Type::getInt8PtrTy (#71029 ) Replace this with PointerType::getUnqual(). Followup to the opaque pointer transition. Fixes an in-code TODO item.	2023-11-07 17:26:26 +01:00
Joseph Huber	078ae8cd64	[Offloading][NFC] Move creation of offloading entries from OpenMP (#70116 ) Summary: This patch is a first step to remove dependencies on the OpenMPIRBuilder for creating generic offloading entries. This patch changes no functionality and merely moves the code around. In the future the interface will be changed to allow for more code re-use in the registration and creation of offloading entries as well as a more generic interface for CUDA, HIP, OpenMP, and SYCL(?). Doing this as a first step to reduce the noise involved in the functional changes.	2023-10-25 09:25:43 -04:00

36 Commits