Summary:
We always want to use the detected path. The clang driver's detection
is far more sophisticated, so we should use it whenever possible. Also
update the usage so we properly fall back to the path instead of
incorrectly using `/bin` when none is provided.
CUDA 12.9 has removed the fatbinary tool's `--image` argument that we
have been using until now. `--image3` has been supported since CUDA 9,
so we do not need CUDA SDK version checks.
Summary:
This patch reworks how we create offloading toolchains. Previously we
would handle this separately for each of the different kinds. This
patch changes it to use the target triple and the offloading kind to
determine the proper toolchain. In the old case where the user only
passes `--offload-arch`, we instead infer the triple from the passed
arguments. This is a pretty major overhaul, but it currently passes all
the clang tests with only minor changes to error messages.
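For instance, something like the following (a rough sketch; the
architectures and source file are placeholders) now picks the device
toolchain purely from the `--offload-arch` value:
```console
$ clang -c -fopenmp --offload-arch=gfx90a foo.c  # triple inferred as amdgcn-amd-amdhsa
$ clang -c -fopenmp --offload-arch=sm_80 foo.c   # triple inferred as nvptx64-nvidia-cuda
```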
Summary:
These commands both do the same thing and behave like the same tool,
so they are now handled by a single tool. Invoking it as `nvptx-arch`
or `amdgpu-arch` causes it to only emit architectures for that name.
Currently there is only the -nogpuinc option for disabling the default
CUDA/HIP wrapper headers. However, there are situations where -nogpuinc
needs to be overridden to re-enable the CUDA/HIP wrapper headers. This
patch adds --[no-]offload-inc for that purpose. When both exist, the
last one wins. -nogpuinc and -nocudainc are now aliases of
--no-offload-inc.
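For example (an illustrative sketch; the HIP source and offload
architecture are placeholders), whichever option comes last takes
effect:
```console
$ clang -c foo.hip --offload-arch=gfx906 -nogpuinc --offload-inc  # wrapper headers enabled
$ clang -c foo.hip --offload-arch=gfx906 --offload-inc -nogpuinc  # wrapper headers disabled
```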
This patch moves the CommonArgs utilities into a location visible by the
Frontend Drivers, so that the Frontend Drivers may share option parsing
code with the Compiler Driver. This is useful when the Frontend Drivers
would like to verify that their incoming options are well-formed and
also not reinvent the option parsing wheel.
We already see code in the Clang/Flang Drivers that is parsing and
verifying its incoming options. E.g. OPT_ffp_contract. This option is
parsed in the Compiler Driver, Clang Driver, and Flang Driver, all with
slightly different parsing code. It would be nice if the Frontend
Drivers were not required to duplicate this Compiler Driver code. That
way there is no/low maintenance burden on keeping all these parsing
functions in sync.
Along those lines, the Frontend Drivers will now have a useful mechanism
to verify their incoming options are well-formed. Currently, the
Frontend Drivers trust that the Compiler Driver is not passing back junk
in some cases. The Frontend Drivers may even accept junk with no error
at all. E.g.:
`clang -cc1 -mprefer-vector-width=junk test.c`
With this patch, we'll now be able to tighten up incoming options to
the Frontend Drivers in a lightweight way.
---------
Co-authored-by: Cameron McInally <cmcinally@nvidia.com>
Co-authored-by: Shafik Yaghmour <shafik.yaghmour@intel.com>
These are identified by misc-include-cleaner. I've filtered out those
that break builds. Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.
Summary:
We support `-nogpulib` to disable implicit libraries. In the future we
will want to change the default linking of these libraries based on the
user's language. This patch just introduces a positive variant, so
alongside `-nogpulib` we can now use `-gpulib` to explicitly enable
them.
A later patch will make the default a variable in the ROCmToolChain
depending on the target language.
Summary:
Currently we conditionally enable NVPTX lowering depending on the
language (C/C++/OpenMP). Unfortunately this causes problems because the
option is only present if the backend was enabled, which makes this
error if you only want to emit LLVM-IR.
This patch instead makes it the only accepted lowering. The reason we
had it as opt-in before is that it is not handled by CUDA. So, this
patch also introduces diagnostics to prevent *all* creation of
device-side global constructors and destructors. We already did this
for variables, now we do it for attributes as well.
This inverts the responsibility of blocking this from the backend to
the language, as it should be, given that support for this is language
dependent.
Summary:
This patch cleans up how we query the offloading toolchain. We create a
single function that is more similar to the existing `getToolChain`
driver function and make all the offloading handlers use it.
Summary:
OpenMP was weirdly split between using the bound architecture from
`--offload-arch=` and the old `-march=` option which only worked for
single jobs. This patch removes that special handling. The main benefit
here is that we can now use `getToolchainArgs` without it throwing an
error.
I'm assuming SYCL doesn't care about this because they don't use an
architecture.
Summary:
We pass the `-nvptx-lower-global-ctor-dtor` option to support the
`libc`-like use case, which sometimes needs global constructors. This
only affects the backend. If the NVPTX target is not enabled, this
option will be unknown, which prevents you from compiling generic IR
for this target.
Summary:
We previously built this for every single architecture to deal with
incompatibility. This patch updates it to use the 'generic' IR that
`libc` and other projects use. Who knows if this will have any side
effects; it is probably worth testing more, but it passes the tests I
expect to pass on my side.
We do not have support for thread-safe statics on the GPU side.
However, we do sometimes end up with empty local static initializers,
and those happen to trigger calls to `__cxa_guard*`, which breaks
compilation.
Partially addresses https://github.com/llvm/llvm-project/issues/117023
Summary:
The C library for GPUs provides the ability to target regular C/C++
programs by providing the C library and a file containing kernels that
call the `main` function. This is mostly used for unit tests, so this
patch provides a quick way to add them without needing to know the
paths. I currently do this explicitly, but according to the libc++
contributors we don't want to require specifying these paths manually.
See the discussion in https://github.com/llvm/llvm-project/pull/104515.
I just default to `lib/` if the target-specific directory isn't found,
because the linker will give a reasonable error message if the files
aren't there. Basically, the use case looks like this.
```console
$ clang test.c --target=amdgcn-amd-amdhsa -mcpu=native -startfiles -stdlib
$ amdhsa-loader a.out
PASS!
```
When passed -nocudalib/-nogpulib, the CUDA argument handling would bail
out before handling -fcuda-short-ptr, meaning the frontend and backend
data layouts would mismatch.
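A rough reproduction sketch (the offload architecture is a
placeholder); before this fix, the frontend and backend disagreed on
the pointer sizes when these flags were combined:
```console
$ clang -c foo.cu --offload-arch=sm_80 -nocudalib -fcuda-short-ptr
```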
Summary:
I initially thought that it would be convenient to automatically link
these libraries like they are for standard C/C++ targets. However, this
created issues when trying to use C++ as a GPU target. This patch moves
the logic so the library is now implicitly passed as part of the
offloading toolchain instead, if found. This means that the user needs
to set the target toolchain for the link job for automatic detection,
but it can still be done manually via `-Xoffload-linker -lc`.
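For instance (a sketch; the offload architecture is a placeholder), an
OpenMP link can still pull the GPU `libc` in by hand:
```console
$ clang -fopenmp --offload-arch=gfx90a main.c -Xoffload-linker -lc
```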
(This is the clang-related part.)
Without these explicit includes, removing other headers that implicitly
include llvm-config.h may have non-trivial side effects. For example,
`clangd` may report even `llvm-config.h` itself as unused when it only
defines a macro that is checked with #ifdef. This is amplified by
different build configs, which use different sets of macros.
Through the new `-foffload-via-llvm` flag, CUDA kernels can now be
lowered to the LLVM/Offload API. On the Clang side, this is simply done
by using the OpenMP offload toolchain and emitting calls to `llvm*`
functions to orchestrate the kernel launch rather than `cuda*`
functions. These `llvm*` functions are implemented on top of the
existing LLVM/Offload API.
As we are about to redefine the Offload API, this will help us in the
design process as a second offload language.
We do not support any CUDA APIs yet; however, we could:
https://www.osti.gov/servlets/purl/1892137
For proper host execution we need to resurrect/rebase
https://tianshilei.me/wp-content/uploads/2021/12/llpp-2021.pdf
(which was designed for debugging).
```
❯❯❯ cat test.cu
extern "C" {
void *llvm_omp_target_alloc_shared(size_t Size, int DeviceNum);
void llvm_omp_target_free_shared(void *DevicePtr, int DeviceNum);
}
__global__ void square(int *A) { *A = 42; }
int main(int argc, char **argv) {
  int DevNo = 0;
  int *Ptr = reinterpret_cast<int *>(llvm_omp_target_alloc_shared(4, DevNo));
  *Ptr = 7;
  printf("Ptr %p, *Ptr %i\n", Ptr, *Ptr);
  square<<<1, 1>>>(Ptr);
  printf("Ptr %p, *Ptr %i\n", Ptr, *Ptr);
  llvm_omp_target_free_shared(Ptr, DevNo);
}
❯❯❯ clang++ test.cu -O3 -o test123 -foffload-via-llvm --offload-arch=native
❯❯❯ llvm-objdump --offloading test123
test123: file format elf64-x86-64
OFFLOADING IMAGE [0]:
kind elf
arch gfx90a
triple amdgcn-amd-amdhsa
producer openmp
❯❯❯ LIBOMPTARGET_INFO=16 ./test123
Ptr 0x155448ac8000, *Ptr 7
Ptr 0x155448ac8000, *Ptr 42
```
When working on very busy systems, check-offload frequently fails many
tests with this diagnostic:
```
clang: error: cannot determine amdgcn architecture: /tmp/llvm/build/bin/amdgpu-arch: Child timed out: ; consider passing it via '-march'
```
This patch accepts the environment variable
`CLANG_TOOLCHAIN_PROGRAM_TIMEOUT` to set the timeout. It also increases
the timeout from 10 to 60 seconds.
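For illustration (the value and compile line here are arbitrary), the
timeout can be raised for a single invocation via the environment:
```console
$ CLANG_TOOLCHAIN_PROGRAM_TIMEOUT=120 clang -c -fopenmp --offload-arch=native foo.c
```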
Summary:
The `nvlink-wrapper` can do LTO now, which means we can still create
some LLVM-IR without needing an architecture. In the case that we try to
invoke `nvlink` internally, that will still fail. This patch simply
defers the error until later so we can use `--lto-emit-llvm` to get the
IR without specifying an architecture.
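A rough sketch of the intended use (assuming `-Wl` options are
forwarded to `clang-nvlink-wrapper` on the standalone NVPTX toolchain);
note that no architecture is given:
```console
$ clang foo.c --target=nvptx64-nvidia-cuda -flto -Wl,--lto-emit-llvm
```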
Summary:
This is necessary for LTO when the user specifies it or has a CUDA
version new enough to support it. Previously this would just fall back
to the default.
Fixes https://github.com/llvm/llvm-project/issues/100606
Summary:
The previous patches (the other commits in this chain) allow the
offloading toolchain to directly invoke the device linker. Because of
this, we can now just have the toolchain implicitly include `-lc` and
`-lm` like a standard target does. This removes the old handling that
went through the fat binary `-lcgpu`.
Summary:
The `clang-nvlink-wrapper` is a utility that I removed a while back
during the transition to the new driver. This patch adds it back as a
new, upgraded version that does LTO + archive linking. It's not an easy
choice to reintroduce something I happily deleted, but this is the only
way to move forward with improving GPU support in LLVM.
While NVIDIA provides a linker called 'nvlink', its main interface is
very difficult to work with. It does not provide LTO or static linking,
requires all files to be named with a non-standard `.cubin` extension,
and rejects link jobs that other linkers would be fine with (i.e. empty
ones). I have spent a great deal of time hacking around this in the GPU
`libc` implementation, where I deliberately avoid LTO and static
linking and have about 100 lines of hacky CMake dedicated to storing
these files in a format that the clang-linker-wrapper accepts, just to
avoid this limitation.
The main reason I want to reintroduce this tool is that I am planning
on creating a more standard C/C++ toolchain for GPUs. This will install
files like the following.
```
<install>/lib/nvptx64-nvidia-cuda/libc.a
<install>/lib/nvptx64-nvidia-cuda/libc++.a
<install>/lib/nvptx64-nvidia-cuda/libomp.a
<install>/lib/clang/19/lib/nvptx64-nvidia-cuda/libclang_rt.builtins.a
```
Linking in these libraries will then simply require passing `-lc`, as
is already done for non-GPU toolchains. However, this doesn't work with
the currently deficient `nvlink` linker, so I consider this a blocking
issue for massively improving the state of building GPU libraries.
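With that layout in place, linking would look something like the
following (a sketch; the architecture is a placeholder):
```console
$ clang test.c --target=nvptx64-nvidia-cuda -march=sm_89 -lc
```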
In the future we may be able to convince NVIDIA to port their linker to
`ld.lld`, but for now this is the only workable solution that allows us
to hack around the weird behavior of their closed-source software.
This also copies some amount of logic from the clang-linker-wrapper,
but not enough that I feel it is worthwhile to merge them. In the
future it may be possible to delete that handling from there entirely.
Summary:
The utilities `nvptx-arch` and `amdgpu-arch` are used to support
`--offload-arch=native`, among other things, in clang. However, they
rely on the GPU drivers to query the available devices. In certain
cases these drivers can become locked up, which leads to indefinite
hangs on any compiler jobs running in the meantime.
This patch adds a ten-second timeout period for these utilities before
the job is killed and an error is emitted.
It has been noticed that arguments were being passed twice to ptxas.
This has been fixed by filtering out the arguments before appending
them to the new DAL created by CudaToolChain::TranslateArgs.
github: https://github.com/llvm/llvm-project/pull/86807
This PR adds `-march=generic` support for the NVPTX backend. This
fulfills a TODO introduced in #79873.
With this PR, users can explicitly request the "default" CUDA
architecture, which makes sure that no specific architecture is
specified.
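For example (a minimal sketch using the standalone NVPTX toolchain),
emitting code without committing to a specific architecture:
```console
$ clang -S foo.c --target=nvptx64-nvidia-cuda -march=generic
```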
This PR does not address any compatibility issues between different CUDA
versions.
---------
Co-authored-by: Joseph Huber <huberjn@outlook.com>
Summary:
The old driver embeds PTX in RDC mode and so does the `nvcc` compiler.
The new driver currently does not do this, so we should keep it
consistent in this case. This simply requires adding the assembler
output as an input to the offloading action that gets fed to fatbin.
Summary:
One recurring problem we have with the OpenMP libraries is that they
can conflict with ones found on the system; this occurs when there are
two copies and the one used for linking is not attached to the
corresponding clang compiler. LLVM already uses target-specific
directories for this, like with libc++, which are always searched
first. This patch changes the install directory to be, for example,
`lib/x86_64-unknown-linux-gnu`.
Notable changes are that users may need to adjust their LD_LIBRARY_PATH
settings, or use the default rt-rpath options. This should fix problems
where users are linking the wrong versions of static libraries.
Summary:
Recent changes to the `libc` project caused the headers to be installed
to `include/<triple>` for the GPU and the libraries to be in
`lib/<triple>`. This means we should automatically append these search
paths so they can be found by default. This allows the following to work
targeting AMDGPU.
```shell
$ clang foo.c -flto -mcpu=native --target=amdgcn-amd-amdhsa -lc <install>/lib/amdgcn-amd-amdhsa/crt1.o
$ amdhsa-loader a.out
```
Summary:
We support standalone compilation for the NVPTX architecture using
'nvlink' as our linker. Because of the special handling required to
transform input files to cubins, as nvlink expects for some reason, we
didn't use the standard AddLinkerInput method. However, this also meant
that we weren't forwarding options passed with -Wl to the linker. Add
this support in for the standalone toolchain path.
Revived from https://reviews.llvm.org/D149978
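For example (a sketch; both the architecture and the forwarded flag are
placeholders), a `-Wl` option now reaches `nvlink`:
```console
$ clang test.c --target=nvptx64-nvidia-cuda -march=sm_89 -Wl,-v
```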
Summary:
The NVPTX tools require an architecture to be used; however, if we are
creating generic LLVM-IR we should be able to leave it unspecified.
This will result in the `target-cpu` attributes not being set on the
functions, so the architecture can be chosen when the IR is linked into
other code. This allows the standalone `--target=nvptx64-nvidia-cuda`
toolchain to create LLVM-IR similar to how CUDA's deviceRTL looks from
C/C++.
Summary:
We support `--target=nvptx64-nvidia-cuda` as a way to target the NVPTX
architecture directly, like a standard CPU target. This patch simply
uses the existing support for handling `--offload-arch=native` to also
apply to the standalone toolchain.
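A sketch of the resulting usage, assuming the standalone toolchain
spells the native query as `-march=native`:
```console
$ clang -c test.c --target=nvptx64-nvidia-cuda -march=native
```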
This patch replaces uses of StringRef::{starts,ends}with with
StringRef::{starts,ends}_with for consistency with
std::{string,string_view}::{starts,ends}_with in C++20.
I'm planning to deprecate and eventually remove
StringRef::{starts,ends}with.