206 Commits

Author SHA1 Message Date
Joseph Huber
9da4d74a88
[Clang] Always pass detected CUDA path to 'clang-nvlink-wrapper' (#152789)
Summary:
We always want to use the detected path. The clang driver's detection is
far more sophisticated so we should use that whenever possible. Also
update the usage so we properly fall back to path instead of incorrectly
using the `/bin` if not provided.
2025-08-11 07:22:11 -05:00
Artem Belevich
12eab1a7b8
[CUDA] Use --image3 to construct fat binary (#151760)
CUDA-12.9 has removed fatbinary tool's `--image` argument we've been
using till now.

--image3 has been supported since cuda-9, so we do not need CUDA SDK
version checks.
2025-08-01 13:14:14 -07:00
Artem Belevich
507b879b6e
[CUDA] add support for targeting sm_103/sm_121 with CUDA-12.9 (#151587) 2025-07-31 13:38:54 -07:00
Joseph Huber
a7d93653a6
[Clang] Rework creating offloading toolchains (#125556)
Summary:
This patch reworks how we create offloading toolchains. Previously we
would handle this separately for all the different kinds. This patch
instead changes this to use the target triple and the offloading kind to
determine the proper toolchain. In the old case where the user only
passes `--offload-arch` we instead infer the triple from the passed
arguments. This is a pretty major overhaul but currently passes all the
clang tests with only minor changes to error messages.
2025-07-21 18:36:39 -05:00
Joseph Huber
ddb018f8d3
[Clang][NFC] Add alias target for amdgpu-arch-tool and nvptx-arch-tool (#147558)
Summary:
These commands both do the same thing and behave like the same tool.
Now, the `nvptx-arch` and `amdgpu-arch` tools cause it to only emit
architectures for that name.
2025-07-08 11:40:22 -05:00
Yaxun (Sam) Liu
44936c8d13
[CUDA][HIP] add options --[no-]offload-inc (#140106)
Currently there is only option -nogpuinc for disabling
the default CUDA/HIP wrapper headers. However, there
are situations where -nogpuinc needs to be overriden
for enabling CUDA/HIP wrapper headers. This patch
adds --[no-]offload-inc for that purpose. When both
exist, the last wins. -nogpuinc and -nocudainc are
now alias to --no-offload-inc.
2025-06-23 11:02:06 -04:00
Cameron McInally
a42bb8b57a
[Driver] Move CommonArgs to a location visible by the Frontend Drivers (#142800)
This patch moves the CommonArgs utilities into a location visible by the
Frontend Drivers, so that the Frontend Drivers may share option parsing
code with the Compiler Driver. This is useful when the Frontend Drivers
would like to verify that their incoming options are well-formed and
also not reinvent the option parsing wheel.

We already see code in the Clang/Flang Drivers that is parsing and
verifying its incoming options. E.g. OPT_ffp_contract. This option is
parsed in the Compiler Driver, Clang Driver, and Flang Driver, all with
slightly different parsing code. It would be nice if the Frontend
Drivers were not required to duplicate this Compiler Driver code. That
way there is no/low maintenance burden on keeping all these parsing
functions in sync.

Along those lines, the Frontend Drivers will now have a useful mechanism
to verify their incoming options are well-formed. Currently, the
Frontend Drivers trust that the Compiler Driver is not passing back junk
in some cases. The Language Drivers may even accept junk with no error
at all. E.g.:

  `clang -cc1 -mprefer-vector-width=junk test.c'

With this patch, we'll now be able to tighten up incomming options to
the Frontend drivers in a lightweight way.

---------

Co-authored-by: Cameron McInally <cmcinally@nvidia.com>
Co-authored-by: Shafik Yaghmour <shafik.yaghmour@intel.com>
2025-06-06 17:59:24 -04:00
Kazu Hirata
6c37341943
[Driver] Remove unused includes (NFC) (#141448)
These are identified by misc-include-cleaner.  I've filtered out those
that break builds.  Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.
2025-05-26 09:13:36 -07:00
Brad Smith
dddcbc26d6
[Driver][LTO] Move common code for LTO to addLTOOptions() (#74178) 2025-05-23 23:03:37 -04:00
Joseph Huber
f6e3d33c00
[Clang][NFC] Introduce --offloadlib positive flag for nogpulib and alias to --no-offloadlib (#126567)
Summary:
We support `nogpulib` to disable implicit libraries. In the future we
will want to change the default linking of these libraries based on the
user language. This patch just introduces a positive variant so now we
can do `-nogpulib -gpulib` to disable it.

Later patch will make the default a variable in the ROCmToolChain
depending on the target languages.
2025-02-13 07:59:08 -06:00
Joseph Huber
3d9409f5bc
[NVPTX] Make ctor/dtor lowering always enabled in NVPTX (#126544)
Summary:
Currently we conditionally enable NVPTX lowering depending on the
language (C/C++/OpenMP). Unfortunately this causes problems because this
option is only present if the backend was enabled, which causes this to
error if you try to make LLVM-IR.

This patch instead makes it the only accepted lowering. The reason we
had it as opt-in before is because it is not handled by CUDA. So, this
pach also introduces diagnostics to prevent *all* creation of
device-side global constructors and destructors. We already did this for
variables, now we do it for attributes as well.

This inverts the responsibility of blocking this from the backend to the
langauage like it should be given that support for this is language
dependent.
2025-02-10 14:25:44 -06:00
Joseph Huber
6e2f08b2f8
[Clang][NFC] Clean up fetching the offloading toolchain (#125095)
Summary:
This patch cleans up how we query the offloading toolchain. We create a
single that is more similar to the existing `getToolChain` driver
function and make all the offloading handlers use it.
2025-02-06 16:35:53 -06:00
Joseph Huber
272ce90ed4
[Clang] Make OpenMP offloading consistently use the bound architecture (#125135)
Summary:
OpenMP was weirdly split between using the bound architecture from
`--offload-arch=` and the old `-march=` option which only worked for
single jobs. This patch removes that special handling. The main benefit
here is that we can now use `getToolchainArgs` without it throwing an
error.

I'm assuming SYCL doesn't care about this because they don't use an
architecture.
2025-01-31 10:32:24 -06:00
Joseph Huber
0c71fdd157
[NVPTX] Fix ctor / dtor lowering when NVPTX target is not enabled (#124116)
Summary:
We pass the `-nvptx-lower-global-ctor-dtor` option to support the `libc`
like use-case which needs global constructors sometimes. This only
affects the backend. If the NVPTX target is not enabled this option will
be unknown which prevents you from compiling generic IR for this.
2025-01-23 08:14:52 -06:00
Sergey Kozub
97c3a990f0
Remove incorrect CUDA defines (#123898)
Remove CUDA_127 and CUDA_129 defines incorrectly added in
https://github.com/llvm/llvm-project/pull/123398
2025-01-22 12:15:32 +01:00
Sergey Kozub
616979ebd7
[NVPTX] Add support for PTX 8.6 and CUDA 12.6 (12.8) (#123398)
Add CUDA versions 12.7, 12.8, 12.9 which support PTX8.6+ (enables using Blackwell-specific instructions).
2025-01-21 11:00:24 +01:00
Joseph Huber
953beb9fe9
[CUDA] Move CUDA to new driver by default (#122312)
Summary:
This patch updates the --offload-new-driver flag to be default for CUDA.
This mostly just required updating a lot of tests to use the old format.
I tried to update them where possible, but some were directly checking
the old format.


https://discourse.llvm.org/t/rfc-use-the-new-offloding-driver-for-cuda-and-hip-compilation-by-default/77468/18
2025-01-10 10:58:26 -06:00
Joseph Huber
34f8573a51
[OpenMP] Use generic IR for the OpenMP DeviceRTL (#119091)
Summary:
We previously built this for every single architecture to deal with
incompatibility. This patch updates it to use the 'generic' IR that
`libc` and other projects use. Who knows if this will have any
side-effects, probably worth testing more but it passes the tests I
expect to pass on my side.
2024-12-24 18:05:28 -06:00
Artem Belevich
689c532192
[CUDA] pass -fno-threadsafe-statics to GPU sub-compilations. (#117074)
We do not have support for the threadsafe statics on the GPU side.

However, we do sometimes end up with empty local static initializers,
and those happen to trigger calls to `__cxa_guard*`, which breaks
compilation.

Partially addresses https://github.com/llvm/llvm-project/issues/117023
2024-11-22 10:19:59 -08:00
Joseph Huber
d4c4180417
[Clang] Add a flag to include GPU startup files (#112025)
Summary:
The C library for GPUs provides the ability to target regular C/C++
programs by providing the C library and a file containing kernels that
call the `main` function. This is mostly used for unit tests, this patch
provides a quick way to add them without needing to know the paths. I
currently do this explicitly, but according to the libc++ contributors
we don't want to need to specify these paths manually. See the
discussion in https://github.com/llvm/llvm-project/pull/104515.

I just default to `lib/` if the target-specific one isn't found because
the linker will handle giving a reasonable error message if it's not
found. Basically the use-case looks like this.

```console
$ clang test.c --target=amdgcn-amd-amdhsa -mcpu=native -startfiles -stdlib
$ amdhsa-loader a.out
PASS!
```
2024-10-28 07:17:19 -07:00
Joseph Huber
f7b6dc821a [Clang] Fix missing - in argument to nvlinker 2024-10-18 19:16:42 -05:00
Joseph Huber
416731bf7f
[NvlinkWrapper] Use -plugin-opt=mattr= instead of a custom feature (#111712)
Summary:
We don't need a custom flag for this, LLVM had a way to get the features
which are forwarded via `plugin-opt`.
2024-10-18 17:32:23 -05:00
Artem Belevich
30a06e8022
[CUDA] Add support for CUDA-12.6 and sm_100 (#112028)
This is a copy of #97402(with minor updates), which is now ready to land.

---------

Co-authored-by: Sergey Kozub <skozub@nvidia.com>
2024-10-14 11:51:05 -07:00
Fraser Cormack
72a957ba4c
[Cuda] Handle -fcuda-short-ptr even with -nocudalib (#111682)
When passed -nocudalib/-nogpulib, Cuda's argument handling would bail
out before handling -fcuda-short-ptr, meaning the frontend and backend
data layouts would mismatch.
2024-10-09 16:17:43 +01:00
Joseph Huber
ba8c96593c
[Clang] Do not implicitly link C libraries for the GPU targets (#109052)
Summary:
I initially thought that it would be convenient to automatically link
these libraries like they are for standard C/C++ targets. However, this
created issues when trying to use C++ as a GPU target. This patch moves
the logic to now implicitly pass it as part of the offloading toolchain
instead, if found. This means that the user needs to set the target
toolchain for the link job for automatic detection, but can still be
done manually via `-Xoffload-linker -lc`.
2024-09-18 06:44:07 -07:00
Daniil Fukalov
b8d6885ff6
[NFC] Add explicit #include llvm-config.h where its macros are used, clang part. (#107301)
(this is clang related part)

Without these explicit includes, removing other headers, who implicitly
include llvm-config.h, may have non-trivial side effects. For example,
`clagd` may report even `llvm-config.h` as "no used" in case it defines
a macro, that is explicitly used with #ifdef. It is actually amplified
with different build configs which use different set of macros.
2024-09-06 16:41:24 +02:00
Johannes Doerfert
80525dfcde
[Offload][CUDA] Allow CUDA kernels to use LLVM/Offload (#94549)
Through the new `-foffload-via-llvm` flag, CUDA kernels can now be
lowered to the LLVM/Offload API. On the Clang side, this is simply done
by using the OpenMP offload toolchain and emitting calls to `llvm*`
functions to orchestrate the kernel launch rather than `cuda*`
functions. These `llvm*` functions are implemented on top of the
existing LLVM/Offload API.

As we are about to redefine the Offload API, this wil help us in the
design process as a second offload language.

We do not support any CUDA APIs yet, however, we could:
  https://www.osti.gov/servlets/purl/1892137

For proper host execution we need to resurrect/rebase
  https://tianshilei.me/wp-content/uploads/2021/12/llpp-2021.pdf
(which was designed for debugging).

```
❯❯❯ cat test.cu
extern "C" {
void *llvm_omp_target_alloc_shared(size_t Size, int DeviceNum);
void llvm_omp_target_free_shared(void *DevicePtr, int DeviceNum);
}

__global__ void square(int *A) { *A = 42; }

int main(int argc, char **argv) {
  int DevNo = 0;
  int *Ptr = reinterpret_cast<int *>(llvm_omp_target_alloc_shared(4, DevNo));
  *Ptr = 7;
  printf("Ptr %p, *Ptr %i\n", Ptr, *Ptr);
  square<<<1, 1>>>(Ptr);
  printf("Ptr %p, *Ptr %i\n", Ptr, *Ptr);
  llvm_omp_target_free_shared(Ptr, DevNo);
}

❯❯❯ clang++ test.cu -O3 -o test123 -foffload-via-llvm --offload-arch=native

❯❯❯ llvm-objdump --offloading test123

test123:        file format elf64-x86-64

OFFLOADING IMAGE [0]:
kind            elf
arch            gfx90a
triple          amdgcn-amd-amdhsa
producer        openmp

❯❯❯ LIBOMPTARGET_INFO=16 ./test123
Ptr 0x155448ac8000, *Ptr 7
Ptr 0x155448ac8000, *Ptr 42
```
2024-08-12 17:44:58 -07:00
Joel E. Denny
1ea0865dd6
[Clang] Add env var for nvptx-arch/amdgpu-arch timeout (#102521)
When working on very busy systems, check-offload frequently fails many
tests with this diagnostic:

```
clang: error: cannot determine amdgcn architecture: /tmp/llvm/build/bin/amdgpu-arch: Child timed out: ; consider passing it via '-march'
```

This patch accepts the environment variable
`CLANG_TOOLCHAIN_PROGRAM_TIMEOUT` to set the timeout. It also increases
the timeout from 10 to 60 seconds.
2024-08-09 13:39:29 -04:00
macurtis-amd
13dd795ef1
[clang][NFC] Make OffloadLTOMode getter a separate method (#101200)
Minor readability improvement (IMHO). Also makes it easier to find the
places where we are getting the offload lto mode.
2024-08-05 10:06:51 -05:00
Joseph Huber
2bf58f5d27
[Clang] Suppress missing architecture error when doing LTO (#100652)
Summary:
The `nvlink-wrapper` can do LTO now, which means we can still create
some LLVM-IR without needing an architecture. In the case that we try to
invoke `nvlink` internally, that will still fail. This patch simply
defers the error until later so we can use `--lto-emit-llvm` to get the
IR without specifying an architecture.
2024-07-31 14:41:55 -05:00
Joseph Huber
2b162286ed
[NVPTX] Correctly forward the PTX feature to the nvlink wrapper (#100607)
Summary:
This is necessary for LTO when the user specifies it or has a CUDA
version that supports a sufficiently high version. Previously it would
default.

Fixes https://github.com/llvm/llvm-project/issues/100606
2024-07-25 13:42:44 -05:00
Joseph Huber
4f516aa04b
[Clang] Make the GPU toolchains implicitly link -lm and -lc (#98170)
Summary:
The previous patches (The other commits in this chain) allow the
offloading toolchain to directly invoke the device linker. Because of
this, we can now just have the toolchain implicitly include `-lc` and
`-lm` like a standard target does. This removes the old handling that
went through the fat binary `-lcgpu`.
2024-07-23 18:30:30 -05:00
Joseph Huber
adbe247701 Reapply "[Clang] Correctly forward --cuda-path to the nvlink wrapper (#100170)"
This reverts commit 7d388aeabb34cd954aa57e4321ad3aa9f382c557.
2024-07-23 14:52:30 -05:00
Joseph Huber
7d388aeabb Revert "[Clang] Correctly forward --cuda-path to the nvlink wrapper (#100170)"
This reverts commit 7e1fcf5dd657d465c3fc846f56c6f9d3a4560b43.
2024-07-23 14:51:40 -05:00
Joseph Huber
7e1fcf5dd6
[Clang] Correctly forward --cuda-path to the nvlink wrapper (#100170)
Summary:
This was not forwarded properly as it would try to pass it to `nvlink`.

Fixes https://github.com/llvm/llvm-project/issues/100168
2024-07-23 14:41:57 -05:00
Joseph Huber
37d0568a65
[Clang] Introduce 'clang-nvlink-wrapper' to work around 'nvlink' (#96561)
Summary:
The `clang-nvlink-wrapper` is a utility that I removed awhile back
during the transition to the new driver. This patch adds back in a new,
upgraded version that does LTO + archive linking. It's not an easy
choice to reintroduce something I happily deleted, but this is the only
way to move forward with improving GPU support in LLVM.

While NVIDIA provides a linker called 'nvlink', its main interface is
very difficult to work with. It does not provide LTO, or static linking,
requires all files to be named a non-standard `.cubin`, and rejects link
jobs that other linkers would be fine with (i.e empty). I have spent a
great deal of time hacking around this in the GPU `libc` implementation,
where I deliberately avoid LTO and static linking and have about 100
lines of hacky CMake dedicated to storing these files in a format that
the clang-linker-wrapper accepts to avoid this limitation.

The main reason I want to re-intorudce this tool is because I am
planning on creating a more standard C/C++ toolchain for GPUs to use.
This will install files like the following.
```
<install>/lib/nvptx64-nvidia-cuda/libc.a
<install>/lib/nvptx64-nvidia-cuda/libc++.a
<install>/lib/nvptx64-nvidia-cuda/libomp.a
<install>/lib/clang/19/lib/nvptx64-nvidia-cuda/libclang_rt.builtins.a
```
Linking in these libraries will then simply require passing `-lc` like
is already done for non-GPU toolchains. However, this doesn't work with
the currently deficient `nvlink` linker, so I consider this a blocking
issue to massively improving the state of building GPU libraries.

In the future we may be able to convince NVIDIA to port their linker to
`ld.lld`, but for now this is the only workable solution that allows us
to hack around the weird behavior of their closed-source software.
This also copies some amount of logic from the clang-linker-wrapper,
but not enough for it to be worthwhile to merge them I feel. In the
future it may be possible to delete that handling from there entirely.
2024-07-22 18:20:14 -05:00
Jakub Chlanda
ab20086422
[CUDA][NFC] CudaArch to OffloadArch rename (#97028)
Rename `CudaArch` to `OffloadArch` to better reflect its content and the
use.
Apply a similar rename to helpers handling the enum.
2024-06-30 07:56:07 +02:00
Joseph Huber
2981f3a284
[Clang] Add timeout for GPU detection utilities (#94751)
Summary:
The utilities `nvptx-arch` and `amdgpu-arch` are used to support
`--offload-arch=native` among other utilities in clang. However, these
rely on the GPU drivers to query the features. In certain cases these
drivers can become locked up, which will lead to indefinate hangs on any
compiler jobs running in the meantime.

This patch adds a ten second timeout period for these utilities before
it kills the job and errors out.
2024-06-07 08:45:35 -05:00
Andrey Portnoy
8407779bb7
[CUDA] Mark CUDA-12.5 as supported and introduce ptx 8.5. (#94113)
This PR is based on https://github.com/llvm/llvm-project/pull/91516.
2024-06-05 10:09:24 -07:00
Artem Belevich
2f956a35ed
[CUDA] Mark CUDA-12.4 as supported and introduce ptx 8.4. (#91516) 2024-05-08 12:02:57 -07:00
Jefferson Le Quellec
2921a0928c Make the argument -Xcuda-ptxas visible to the driver in cl-mode
It has been noticed that the arguments are being passed twice to ptxas.
This also has been fixed by filtering out the arguments before appending
them to the new DAL created by CudaToolChain::TranslateArgs.

github:https://github.com/llvm/llvm-project/pull/86807
2024-04-08 14:11:43 +01:00
Yichen Yan
047b2b241d
[NVPTX] Add -march=general option to mirror default configuration (#85222)
This PR adds `-march=generic` support for the NVPTX backend. This
fulfills a TODO introduced in #79873.

With this PR, users can explicitly request the "default" CUDA
architecture, which makes sure that no specific architecture is
specified.

This PR does not address any compatibility issues between different CUDA
versions.

---------

Co-authored-by: Joseph Huber <huberjn@outlook.com>
2024-03-15 17:16:10 -05:00
Joseph Huber
3a56b5a27d
[CUDA] Include PTX in non-RDC mode using the new driver (#84367)
Summary:
The old driver embed PTX in rdc-mode and so does the `nvcc` compiler.
The new drivers currently does not do this, so we should keep it
consistent in this case. This simply requires adding the assembler
output as an input to the offloading action that gets fed to fatbin.
2024-03-07 16:53:41 -06:00
Joseph Huber
1977404d20
[OpenMP] Respect LLVM per-target install directories (#83282)
Summary:
One recurring problem we have with the OpenMP libraries is that they are
potentially conflicting with ones found on the system, this occurs when
there are two copies and one is used for linking that it not attached to
the correspoding clang compiler. LLVM already uses target specific
directories for this, like with libc++, which are always searched first.
This patch changes the install directory to be
`lib/x86_64-unknown-linux-gnu` for example.

Notable changes would be that users will need to change their
LD_LIBRARY_PATH settings optionally, or use default rt-rpath options.
This should fix problems were users are linking the wrong versions of
static libraries
2024-02-28 15:39:27 -06:00
Joseph Huber
99660082cb
[Clang] Append target search paths for direct offloading compilation (#82699)
Summary:
Recent changes to the `libc` project caused the headers to be installed
to `include/<triple>` for the GPU and the libraries to be in
`lib/<triple>`. This means we should automatically append these search
paths so they can be found by default. This allows the following to work
targeting AMDGPU.

```shell
$ clang foo.c -flto -mcpu=native --target=amdgcn-amd-amdhsa -lc <install>/lib/amdgcn-amd-amdhsa/crt1.o
$ amdhsa-loader a.out
```
2024-02-23 14:21:02 -06:00
Joseph Huber
d5a15f3116
[Clang][NVPTX] Allow passing arguments to the linker while standalone (#73030)
Summary:
We support standalone compilation for the NVPTX architecture using
'nvlink' as our linker. Because of the special handling required to
transform input files to cubins, as nvlink expects for some reason, we
didn't use the standard AddLinkerInput method. However, this also meant
that we weren't forwarding options passed with -Wl to the linker. Add
this support in for the standalone toolchain path.

Revived from https://reviews.llvm.org/D149978
2024-02-22 16:27:53 -06:00
Joseph Huber
7155c1ef65
[NVPTX] Allow compiling LLVM-IR without -march set (#79873)
Summary:
The NVPTX tools require an architecture to be used, however if we are
creating generic LLVM-IR we should be able to leave it unspecified. This
will result in the `target-cpu` attributes not being set on the
functions so it can be changed when linked into code. This allows the
standalone `--target=nvptx64-nvidia-cuda` toolchain to create LLVM-IR
simmilar to how CUDA's deviceRTL looks from C/C++
2024-01-30 21:44:43 -06:00
Joseph Huber
82d335e70f
[NVPTX] Add support for -march=native in standalone NVPTX (#79373)
Summary:
We support `--target=nvptx64-nvidia-cuda` as a way to target the NVPTX
architecture from standard CPU. This patch simply uses the existing
support for handling `--offload-arch=native` to also apply to the
standalone toolchain.
2024-01-25 15:56:13 -06:00
Kazu Hirata
10886a8f0a [Driver] Use SmallString::operator std::string (NFC) 2024-01-19 22:24:09 -08:00
Kazu Hirata
f3dcc2351c
[clang] Use StringRef::{starts,ends}_with (NFC) (#75149)
This patch replaces uses of StringRef::{starts,ends}with with
StringRef::{starts,ends}_with for consistency with
std::{string,string_view}::{starts,ends}_with in C++20.

I'm planning to deprecate and eventually remove
StringRef::{starts,ends}with.
2023-12-13 08:54:13 -08:00