166 Commits

Author SHA1 Message Date
Jefferson Le Quellec
2921a0928c Make the argument -Xcuda-ptxas visible to the driver in cl-mode
It has been noticed that the arguments are being passed twice to ptxas.
This also has been fixed by filtering out the arguments before appending
them to the new DAL created by CudaToolChain::TranslateArgs.

github:https://github.com/llvm/llvm-project/pull/86807
2024-04-08 14:11:43 +01:00
Yichen Yan
047b2b241d
[NVPTX] Add -march=general option to mirror default configuration (#85222)
This PR adds `-march=generic` support for the NVPTX backend. This
fulfills a TODO introduced in #79873.

With this PR, users can explicitly request the "default" CUDA
architecture, which makes sure that no specific architecture is
specified.

This PR does not address any compatibility issues between different CUDA
versions.

---------

Co-authored-by: Joseph Huber <huberjn@outlook.com>
2024-03-15 17:16:10 -05:00
Joseph Huber
3a56b5a27d
[CUDA] Include PTX in non-RDC mode using the new driver (#84367)
Summary:
The old driver embed PTX in rdc-mode and so does the `nvcc` compiler.
The new drivers currently does not do this, so we should keep it
consistent in this case. This simply requires adding the assembler
output as an input to the offloading action that gets fed to fatbin.
2024-03-07 16:53:41 -06:00
Joseph Huber
1977404d20
[OpenMP] Respect LLVM per-target install directories (#83282)
Summary:
One recurring problem we have with the OpenMP libraries is that they are
potentially conflicting with ones found on the system, this occurs when
there are two copies and one is used for linking that it not attached to
the correspoding clang compiler. LLVM already uses target specific
directories for this, like with libc++, which are always searched first.
This patch changes the install directory to be
`lib/x86_64-unknown-linux-gnu` for example.

Notable changes would be that users will need to change their
LD_LIBRARY_PATH settings optionally, or use default rt-rpath options.
This should fix problems were users are linking the wrong versions of
static libraries
2024-02-28 15:39:27 -06:00
Joseph Huber
99660082cb
[Clang] Append target search paths for direct offloading compilation (#82699)
Summary:
Recent changes to the `libc` project caused the headers to be installed
to `include/<triple>` for the GPU and the libraries to be in
`lib/<triple>`. This means we should automatically append these search
paths so they can be found by default. This allows the following to work
targeting AMDGPU.

```shell
$ clang foo.c -flto -mcpu=native --target=amdgcn-amd-amdhsa -lc <install>/lib/amdgcn-amd-amdhsa/crt1.o
$ amdhsa-loader a.out
```
2024-02-23 14:21:02 -06:00
Joseph Huber
d5a15f3116
[Clang][NVPTX] Allow passing arguments to the linker while standalone (#73030)
Summary:
We support standalone compilation for the NVPTX architecture using
'nvlink' as our linker. Because of the special handling required to
transform input files to cubins, as nvlink expects for some reason, we
didn't use the standard AddLinkerInput method. However, this also meant
that we weren't forwarding options passed with -Wl to the linker. Add
this support in for the standalone toolchain path.

Revived from https://reviews.llvm.org/D149978
2024-02-22 16:27:53 -06:00
Joseph Huber
7155c1ef65
[NVPTX] Allow compiling LLVM-IR without -march set (#79873)
Summary:
The NVPTX tools require an architecture to be used, however if we are
creating generic LLVM-IR we should be able to leave it unspecified. This
will result in the `target-cpu` attributes not being set on the
functions so it can be changed when linked into code. This allows the
standalone `--target=nvptx64-nvidia-cuda` toolchain to create LLVM-IR
simmilar to how CUDA's deviceRTL looks from C/C++
2024-01-30 21:44:43 -06:00
Joseph Huber
82d335e70f
[NVPTX] Add support for -march=native in standalone NVPTX (#79373)
Summary:
We support `--target=nvptx64-nvidia-cuda` as a way to target the NVPTX
architecture from standard CPU. This patch simply uses the existing
support for handling `--offload-arch=native` to also apply to the
standalone toolchain.
2024-01-25 15:56:13 -06:00
Kazu Hirata
10886a8f0a [Driver] Use SmallString::operator std::string (NFC) 2024-01-19 22:24:09 -08:00
Kazu Hirata
f3dcc2351c
[clang] Use StringRef::{starts,ends}_with (NFC) (#75149)
This patch replaces uses of StringRef::{starts,ends}with with
StringRef::{starts,ends}_with for consistency with
std::{string,string_view}::{starts,ends}_with in C++20.

I'm planning to deprecate and eventually remove
StringRef::{starts,ends}with.
2023-12-13 08:54:13 -08:00
Artem Belevich
631c6e834c
[CUDA] Add support for CUDA-12.3 and sm_90a (#74895) 2023-12-11 12:18:28 -08:00
Brad Smith
8a4b9e9965
[Driver] Move assertion check before checking Output.isFilename (#67210) 2023-09-25 20:19:25 -04:00
Takuya Shimizu
01b88dd66d [NFC] Remove unused variables declared in conditions
D152495 makes clang warn on unused variables that are declared in conditions like `if (int var = init) {}`
This patch is an NFC fix to suppress the new warning in llvm,clang,lld builds to pass CI in the above patch.

Differential Revision: https://reviews.llvm.org/D158016
2023-08-30 10:05:06 +09:00
Yaxun (Sam) Liu
e17882430e [CUDA][HIP] Rename and fix -fcuda-approx-transcendentals
Rename -fcuda-approx-transcendentals as
-fgpu-approx-transcendentals and pass it
to both device and host clang -cc1.

Fix its interaction with -ffast-math to allow
-fno-gpu-approx-transcendentals to override
the implicit -fcuda-approx-transcendentals
due to -ffast-math.

Rename the predefined macro to be
__CLANG_GPU_APPROX_TRANSCENDENTALS__.
Emit the macro for both device and host compilation.

Reviewed by: Artem Belevich, Fangrui Song

Differential Revision: https://reviews.llvm.org/D154797
2023-07-25 12:01:41 -04:00
Joseph Huber
d2ac0069a2 [Clang] Only emit CUDA version warnings when creating the CUDA toolchain
This warning primarily applies to users of the CUDA langues as there may
be new features we rely on. The other two users of the toolchain are
OpenMP via `-fopenmp --offload-arch=sm_70` and a cross-compiled build
via `--target=nvptx64-nvida-cuda -march=sm_70`. Both of these do not
rely directly on things that would change significantly between CUDA
versions, and the way they are built can sometims make this warning
print many times.

This patch changees the behaiour to only check for the version when
building for CUDA offloading specifically, the other two will not have
this check.

Reviewed By: tra

Differential Revision: https://reviews.llvm.org/D155606
2023-07-18 13:48:11 -05:00
Artem Belevich
ffb635cb2d [CUDA] bump supported CUDA version to 12.1/11.8
Differential Revision: https://reviews.llvm.org/D151361
2023-05-25 11:57:55 -07:00
Artem Belevich
a825f3754b [CUDA] Relax restrictions on GPU-side variadic functions
Allow parsing GPU-side variadic functions when we're compiling with CUDA-9 or
newer. We still do not allow accessing variadic arguments.

CUDA-9 was the version which introduced PTX-6.0 which allows implementing
variadic functions, so older versions can't have variadics in principle.

This is required for dealing with headers in recent CUDA versions that rely on
variadic function declarations in some of the templated code in libcu++.
E.g. https://github.com/llvm/llvm-project/issues/58410

Differential Revision: https://reviews.llvm.org/D150718
2023-05-17 12:51:01 -07:00
Joseph Huber
c2c917f7f6 [Clang] Change default triple to LLVM_HOST_TRIPLE for the CUDA toolchain
When cross-compiling NVPTX we use the triple to indicate which paths to
search for the CUDA toolchain. Currently this uses the default target
triple. This might not be exactly correct, as this is the default triple
used to compile binaries, not the host system. We want the host triple
because it indicates which folders should hold CUDA.

Reviewed By: tra

Differential Revision: https://reviews.llvm.org/D150136
2023-05-08 15:54:50 -05:00
Joseph Huber
f05ce9045a [NVPTX] Add NVPTXCtorDtorLoweringPass to handle global ctors / dtors
This patch mostly adapts the existing AMDGPUCtorDtorLoweringPass for use
by the Nvidia backend. This pass transforms the ctor / dtor list into a
kernel call that can be used to invoke those functinos. Furthermore, we
emit globals such that the names and addresses of these constructor
functions can be found by the driver. Unfortunately, since NVPTX has no
way to emit variables at a named section, nor a functioning linker to
provide the begin / end symbols, we need to mangle these names and have
an external application find them.

This work is related to the work in D149398 and D149340.

Reviewed By: tra

Differential Revision: https://reviews.llvm.org/D149451
2023-05-04 07:13:00 -05:00
Kiran Chandramohan
ab49747f9d [NFC][Clang] Move DebugOptions to llvm/Frontend for reuse in Flang
This patch moves the Debug Options to llvm/Frontend so that it can be shared by Flang as well.

Reviewed By: kiranchandramohan, awarzynski

Differential Revision: https://reviews.llvm.org/D142347
2023-03-29 12:01:54 +00:00
Joseph Huber
861764b1c5 [NVPTX] Fix NVPTX output name in the driver with -save-temps
Summary:
Currently, OpenMP and direct compilation uses an NVPTX toolchain to
directly invoke the CUDA tools from Clang to do the assembling and
linking of NVPTX codes. This breaks under `-save-temps` because of a
workaround. The `nvlink` linker does not accept `.o` files, so we need
to be selective when we output these. The previous logic keyed off of
presense in the temp files and wasn't a great solution. Change this to
just query the input args for `-c` to see if we stop at the assembler.

Fixes https://github.com/llvm/llvm-project/issues/60767
2023-02-15 07:39:59 -06:00
Archibald Elliott
d768bf994f [NFC][TargetParser] Replace uses of llvm/Support/Host.h
The forwarding header is left in place because of its use in
`polly/lib/External/isl/interface/extract_interface.cc`, but I have
added a GCC warning about the fact it is deprecated, because it is used
in `isl` from where it is included by Polly.
2023-02-10 09:59:46 +00:00
Archibald Elliott
8e3d7cf5de [NFC][TargetParser] Remove llvm/Support/TargetParser.h 2023-02-07 11:08:21 +00:00
Joseph Huber
db202286eb [Clang][NFC] Fix out-of-date comments on 'clang-offload-bundler'
Summary:
These comments are confusing as the `clang-offload-bundler` is no longer
used by these toolchains.
2023-01-26 13:03:01 -06:00
Joseph Huber
11656f204a [CUDA] Fix output from ptxas being removes as a temporary file
Summary:
The logic here is to add the `.cubin` temporary file if we had to create
a new filename to handle it. Unfortuantely the logic was wrong because
we compare `const char *` values here. This logic seems to have been
wrong for some time, but was never noticed since we never used the
relocatable output.

Fixes https://github.com/llvm/llvm-project/issues/60301
2023-01-25 16:24:30 -06:00
Joseph Huber
0660397e68 [CUDA] Allow targeting NVPTX directly without a host toolchain
Currently, the NVPTX compilation toolchain can only be invoked either
through CUDA or OpenMP via `--offload-device-only`. This is because we
cannot build a CUDA toolchain without an accompanying host toolchain for
the offloading. When using `--target=nvptx64-nvidia-cuda` this results
in generating calls to the GNU assembler and linker, leading to errors.

This patch abstracts the portions of the CUDA toolchain that are
independent of the host toolchain or offloading kind into a new base
class called `NVPTXToolChain`. We still need to read the host's triple
to build the CUDA installation, so if not present we just assume it will
match the host's system for now, or the user can provide the path
explicitly.

This should allow the compiler driver to create NVPTX device images
directly from C/C++ code.

Reviewed By: tra

Differential Revision: https://reviews.llvm.org/D140158
2023-01-18 18:18:25 -06:00
Joseph Huber
debfa43117 [Clang][NFC] Clang-format CUDA toolchain file
Summary:
This file is not formatted, which makes further changes to it more
difficult. Format it.
2023-01-18 17:15:03 -06:00
Joseph Huber
52b9a39742 [OpenMP] Make -fopenmp-target= use the nvptx-arch tool
Previously, if the user did not provide an architecture when using
`-fopenmp-targets=nvptx64` we used the value from
`CLANG_OPENMP_DEFAULT_NVPTX_ARCH` which is defined at compile time. This
isn't ideal because it means that the default is set when the LLVM
compiler it built. Instead this patch uses the `nvptx-arch` tool to
query it at runtime. This matches the existing behaviour of the AMDGPU
toolchain with its `amdgpu-arch` tool.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D141708
2023-01-13 16:52:06 -06:00
Joseph Huber
26d62674cf [Clang] Explicitly move returned values converted to expected
Summary:
These can cause failures on GCC-7 it seems. We should explicitly move
them to prevent this from causing build failures.
2023-01-12 14:38:03 -06:00
Kadir Cetinkaya
4921b0a285
[clang][Driver][CUDA] Get rid of unused LibPath
LibPath discovered during InstallationDetection wasn't used anywhere.
Moreover it actually resulted in discarding installations that don't have any
`/lib` directory.

This is causing troubles for our pipelines downstream, that want to perform
syntax-only analysis on the sources.

Differential Revision: https://reviews.llvm.org/D141467
2023-01-12 10:36:43 +01:00
Joseph Huber
56ebfca4bc [CUDA][HIP] Add support for --offload-arch=native to CUDA and refactor
This patch adds basic support for `--offload-arch=native` to CUDA. This
is done using the `nvptx-arch` tool that was introduced previously. Some
of the logic for handling executing these tools was factored into a
common helper as well. This patch does not add support for OpenMP or the
"new" driver. That will be done later.

Reviewed By: yaxunl

Differential Revision: https://reviews.llvm.org/D141051
2023-01-11 10:30:30 -06:00
Kazu Hirata
6eb0b0a045 Don't include Optional.h
These files no longer use llvm::Optional.
2022-12-14 21:16:22 -08:00
Kazu Hirata
22731dbd75 [clang] Use std::nullopt instead of None in comments (NFC)
This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-12-04 20:31:05 -08:00
Fangrui Song
2c5d49cffc [Driver] llvm::Optional => std::optional
and change a few referenced Basic and llvm/lib/WindowsDriver API
2022-12-03 20:17:05 +00:00
Artem Belevich
9a01cca660 Add support for CUDA-11.8 and sm_{87,89,90} GPUs.
Differential Revision: https://reviews.llvm.org/D135306
2022-10-07 13:59:28 -07:00
Joseph Huber
937aaead87 [CUDA] Fix arguments after removing unused private variable
Summary:
A previous patch removed the use of the `OK` private variable in CUDA
which resulted in usused variable warnings. this was fixed in
f886f7e8ef7aa4f54298db792a656373af90440c but did not change the
constructor to accurately represent its removal. This patch removes it
from the interface entirely.
2022-08-26 15:28:34 -05:00
Sterling Augustine
f886f7e8ef Remove unused private variable. 2022-08-26 12:43:05 -07:00
Joseph Huber
47166968db [OpenMP] Deprecate the old driver for OpenMP offloading
Recently OpenMP has transitioned to using the "new" driver which
primarily merges the device and host linking phases into a single
wrapper that handles both at the same time. This replaced a few tools
that were only used for OpenMP offloading, such as the
`clang-offload-wrapper` and `clang-nvlink-wrapper`. The new driver
carries some marked benefits compared to the old driver that is now
being deprecated. Things like device-side LTO, static library
support, and more compatible tooling. As such, we should be able to
completely deprecate the old driver, at least for OpenMP. The old driver
support will still exist for CUDA and HIP, although both of these can
currently be compiled on Linux with `--offload-new-driver` to use the new
method.

Note that this does not deprecate the `clang-offload-bundler`, although
it is unused by OpenMP now, it is still used by the HIP toolchain both
as their device binary format and object format.

When I proposed deprecating this code I heard some vendors voice
concernes about needing to update their code in their fork. They should
be able to just revert this commit if it lands.

Reviewed By: jdoerfert, MaskRay, ye-luo

Differential Revision: https://reviews.llvm.org/D130020
2022-08-26 13:47:09 -05:00
John Ericson
3adda398ce [clang][lldb][cmake] Use new *_INSTALL_LIBDIR_BASENAME CPP macro
Use this instead of `*_LIBDIR_SUFFIX`, from which it is computed.

This gets us ready for D130586, in which `*_LIBDIR_SUFFIX` is
deprecated.

Differential Revision: https://reviews.llvm.org/D132300
2022-08-20 12:52:21 -04:00
John Ericson
e941b031d3 Revert "[cmake] Use CMAKE_INSTALL_LIBDIR too"
This reverts commit f7a33090a91015836497c75f173775392ab0304d.

Unfortunately this causes a number of failures that didn't show up in my
local build.
2022-08-18 22:46:32 -04:00
John Ericson
f7a33090a9 [cmake] Use CMAKE_INSTALL_LIBDIR too
We held off on this before as `LLVM_LIBDIR_SUFFIX` conflicted with it.
Now we return this.

`LLVM_LIBDIR_SUFFIX` is kept as a deprecated way to set
`CMAKE_INSTALL_LIBDIR`. The other `*_LIBDIR_SUFFIX` are just removed
entirely.

I imagine this is too potentially-breaking to make LLVM 15. That's fine.
I have a more minimal version of this in the disto (NixOS) patches for
LLVM 15 (like previous versions). This more expansive version I will
test harder after the release is cut.

Reviewed By: sebastian-ne, ldionne, #libc, #libc_abi

Differential Revision: https://reviews.llvm.org/D130586
2022-08-18 15:33:35 -04:00
Joseph Huber
3b52341116 [CUDA] Fix output name being replaced in device-only mode
When performing device only compilation, there was an issue where
`cubin` outputs were being renamed to `cubin` despite the user's name.
This is required in a normal compilation flow as the Nvidia tools only
understand specific filenames instead of checking magic bytes for some
unknown reason. We do not want to perform this transformation when the
user is performing device only compilation.

Reviewed By: tra

Differential Revision: https://reviews.llvm.org/D131278
2022-08-05 19:08:41 -04:00
Joseph Huber
56ab966a04 [CUDA] Stop adding CUDA features twice
We currently call the `addNVPTXFeatures` function in two places, inside
of the CUDA Toolchain and inside of Clang in the standard entry point.
We normally add features to the job in Clang, so the call inside of the
CUDA toolchain is redundant and results in `+ptx` features being added.
Since we remove this call, we no longer will have a cached CUDA
installation so we will usually create it twice.

Reviewed By: tra

Differential Revision: https://reviews.llvm.org/D128752
2022-06-29 09:34:09 -04:00
Joseph Huber
4d3c010f1d [CUDA] Do not embed a fatbinary when using the new driver
Previously, when using the new driver we created a fatbinary with the
PTX and Cubin output. This was mainly done in an attempt to create some
backwards compatibility with the existing CUDA support that embeds the
fatbinary in each TU. This will most likely be more work than necessary
to actually implement. The linker wrapper cannot do anything with these
embedded PTX files because we do not know how to link them, and if we
did want to include multiple files it should go through the
`clang-offload-packager` instead. Also this didn't repsect the setting
that disables embedding PTX (although it wasn't used anyway).

Reviewed By: tra

Differential Revision: https://reviews.llvm.org/D128441
2022-06-23 15:40:43 -04:00
Joseph Huber
4205f4aba4 [Cuda] Add the features using the last argument
Summary:
We should use the last argument so this flag can be overridden properly.
2022-05-13 18:05:02 -04:00
Joseph Huber
7dc23abbd3 [CUDA] Add a flag to manually specify the target feature to use with CUDA
Summary:
Normally we parse through the CUDA installation to disover the needed
features. However, we may want to build libraries on targets that do not
currently have CUDA installed but still need to know which features to
make use of when creating the PTX or bitcode. This flag is a simple way
to specify this so we can compile certain codes withotu a valid CUDA
installation.

Ideally this could be done via an -Xarch or simimlar flag but currently
they cannot handle this. We would need to support using an -Xarch flag
that takes multiple arguments that then pass them to the -Xclang
functionality.
2022-05-13 16:30:58 -04:00
Joseph Huber
8477a0d769 [OpenMP] Allow compiling multiple target architectures with OpenMP
This patch adds support for OpenMP to use the `--offload-arch` and
`--no-offload-arch` options. Traditionally, OpenMP has only supported
compiling for a single architecture via the `-Xopenmp-target` option.
Now we can pass in a bound architecture and use that if given, otherwise
we default to the value of the `-march` option as before.

Note that this only applies the basic support, the OpenMP target runtime
does not yet know how to choose between multiple architectures.
Additionally other parts of the offloading toolchain (e.g. LTO) require
the `-march` option, these should be worked out later.

Reviewed By: tra

Differential Revision: https://reviews.llvm.org/D124721
2022-05-06 16:57:16 -04:00
Shilei Tian
20a9fb953e [Clang][OpenMP] Fix the issue that temp cubin files are not removed after compilation when using new OpenMP driver
The root cause of this is, in `NVPTX::Assembler::ConstructJob`, the output file name might not match the `Output`'s file name passed into the function because `CudaToolChain::getInputFilename` is a specialized version. That means the real output file is not added to the temp files list, which will be all removed in the d'tor of `Compilation`. In order to "fix" it, in the function `NVPTX::OpenMPLinker::ConstructJob`, before calling `clang-nvlink-wrapper`, the function calls `getToolChain().getInputFilename(II)` to get the right output file name for each input, and add it to temp file, and then they can be removed w/o any issue. However, this whole logic doesn't work when using the new OpenMP driver because `NVPTX::OpenMPLinker::ConstructJob` is not called at all, which causing the issue that the cubin file generated in each single unit compilation is out of track.

In this patch, we add the real output file into temp files if its name doesn't match `Output`. We add it when the file is an output instead of doing it when it is an input, like what we did in `NVPTX::OpenMPLinker::ConstructJob`, which makes more sense.

Reviewed By: jhuber6

Differential Revision: https://reviews.llvm.org/D124253
2022-04-22 18:07:28 -04:00
Joseph Huber
d912232741 [CUDA][FIX] Fix name conflict in getNVPTXTargetFeatures
Summary:
There was a naming conflict in the getNVPTXTargetFeatures function that
prevented some compilers from correctly disambiguating between the
enumeration and variable of the same name. Rename the variable to avoid
this.
2022-03-23 23:07:51 -04:00
Joseph Huber
a3248e4b28 [CUDA] Add getTargetFeatures for the NVPTX toolchain
The NVPTX toolchain uses target features to determine the PTX version to
use. However this isn't exposed externally like most other toolchain
specific target features are. Add this functionaliy in preparation for
using it in for OpenMP offloading.

Reviewed By: jdoerfert, tra

Differential Revision: https://reviews.llvm.org/D122089
2022-03-21 16:32:36 -04:00