It has been noticed that the arguments are being passed twice to ptxas.
This also has been fixed by filtering out the arguments before appending
them to the new DAL created by CudaToolChain::TranslateArgs.
github:https://github.com/llvm/llvm-project/pull/86807
This PR adds `-march=generic` support for the NVPTX backend. This
fulfills a TODO introduced in #79873.
With this PR, users can explicitly request the "default" CUDA
architecture, which makes sure that no specific architecture is
specified.
This PR does not address any compatibility issues between different CUDA
versions.
---------
Co-authored-by: Joseph Huber <huberjn@outlook.com>
Summary:
The old driver embed PTX in rdc-mode and so does the `nvcc` compiler.
The new drivers currently does not do this, so we should keep it
consistent in this case. This simply requires adding the assembler
output as an input to the offloading action that gets fed to fatbin.
Summary:
One recurring problem we have with the OpenMP libraries is that they are
potentially conflicting with ones found on the system, this occurs when
there are two copies and one is used for linking that it not attached to
the correspoding clang compiler. LLVM already uses target specific
directories for this, like with libc++, which are always searched first.
This patch changes the install directory to be
`lib/x86_64-unknown-linux-gnu` for example.
Notable changes would be that users will need to change their
LD_LIBRARY_PATH settings optionally, or use default rt-rpath options.
This should fix problems were users are linking the wrong versions of
static libraries
Summary:
Recent changes to the `libc` project caused the headers to be installed
to `include/<triple>` for the GPU and the libraries to be in
`lib/<triple>`. This means we should automatically append these search
paths so they can be found by default. This allows the following to work
targeting AMDGPU.
```shell
$ clang foo.c -flto -mcpu=native --target=amdgcn-amd-amdhsa -lc <install>/lib/amdgcn-amd-amdhsa/crt1.o
$ amdhsa-loader a.out
```
Summary:
We support standalone compilation for the NVPTX architecture using
'nvlink' as our linker. Because of the special handling required to
transform input files to cubins, as nvlink expects for some reason, we
didn't use the standard AddLinkerInput method. However, this also meant
that we weren't forwarding options passed with -Wl to the linker. Add
this support in for the standalone toolchain path.
Revived from https://reviews.llvm.org/D149978
Summary:
The NVPTX tools require an architecture to be used, however if we are
creating generic LLVM-IR we should be able to leave it unspecified. This
will result in the `target-cpu` attributes not being set on the
functions so it can be changed when linked into code. This allows the
standalone `--target=nvptx64-nvidia-cuda` toolchain to create LLVM-IR
simmilar to how CUDA's deviceRTL looks from C/C++
Summary:
We support `--target=nvptx64-nvidia-cuda` as a way to target the NVPTX
architecture from standard CPU. This patch simply uses the existing
support for handling `--offload-arch=native` to also apply to the
standalone toolchain.
This patch replaces uses of StringRef::{starts,ends}with with
StringRef::{starts,ends}_with for consistency with
std::{string,string_view}::{starts,ends}_with in C++20.
I'm planning to deprecate and eventually remove
StringRef::{starts,ends}with.
D152495 makes clang warn on unused variables that are declared in conditions like `if (int var = init) {}`
This patch is an NFC fix to suppress the new warning in llvm,clang,lld builds to pass CI in the above patch.
Differential Revision: https://reviews.llvm.org/D158016
Rename -fcuda-approx-transcendentals as
-fgpu-approx-transcendentals and pass it
to both device and host clang -cc1.
Fix its interaction with -ffast-math to allow
-fno-gpu-approx-transcendentals to override
the implicit -fcuda-approx-transcendentals
due to -ffast-math.
Rename the predefined macro to be
__CLANG_GPU_APPROX_TRANSCENDENTALS__.
Emit the macro for both device and host compilation.
Reviewed by: Artem Belevich, Fangrui Song
Differential Revision: https://reviews.llvm.org/D154797
This warning primarily applies to users of the CUDA langues as there may
be new features we rely on. The other two users of the toolchain are
OpenMP via `-fopenmp --offload-arch=sm_70` and a cross-compiled build
via `--target=nvptx64-nvida-cuda -march=sm_70`. Both of these do not
rely directly on things that would change significantly between CUDA
versions, and the way they are built can sometims make this warning
print many times.
This patch changees the behaiour to only check for the version when
building for CUDA offloading specifically, the other two will not have
this check.
Reviewed By: tra
Differential Revision: https://reviews.llvm.org/D155606
Allow parsing GPU-side variadic functions when we're compiling with CUDA-9 or
newer. We still do not allow accessing variadic arguments.
CUDA-9 was the version which introduced PTX-6.0 which allows implementing
variadic functions, so older versions can't have variadics in principle.
This is required for dealing with headers in recent CUDA versions that rely on
variadic function declarations in some of the templated code in libcu++.
E.g. https://github.com/llvm/llvm-project/issues/58410
Differential Revision: https://reviews.llvm.org/D150718
When cross-compiling NVPTX we use the triple to indicate which paths to
search for the CUDA toolchain. Currently this uses the default target
triple. This might not be exactly correct, as this is the default triple
used to compile binaries, not the host system. We want the host triple
because it indicates which folders should hold CUDA.
Reviewed By: tra
Differential Revision: https://reviews.llvm.org/D150136
This patch mostly adapts the existing AMDGPUCtorDtorLoweringPass for use
by the Nvidia backend. This pass transforms the ctor / dtor list into a
kernel call that can be used to invoke those functinos. Furthermore, we
emit globals such that the names and addresses of these constructor
functions can be found by the driver. Unfortunately, since NVPTX has no
way to emit variables at a named section, nor a functioning linker to
provide the begin / end symbols, we need to mangle these names and have
an external application find them.
This work is related to the work in D149398 and D149340.
Reviewed By: tra
Differential Revision: https://reviews.llvm.org/D149451
This patch moves the Debug Options to llvm/Frontend so that it can be shared by Flang as well.
Reviewed By: kiranchandramohan, awarzynski
Differential Revision: https://reviews.llvm.org/D142347
Summary:
Currently, OpenMP and direct compilation uses an NVPTX toolchain to
directly invoke the CUDA tools from Clang to do the assembling and
linking of NVPTX codes. This breaks under `-save-temps` because of a
workaround. The `nvlink` linker does not accept `.o` files, so we need
to be selective when we output these. The previous logic keyed off of
presense in the temp files and wasn't a great solution. Change this to
just query the input args for `-c` to see if we stop at the assembler.
Fixes https://github.com/llvm/llvm-project/issues/60767
The forwarding header is left in place because of its use in
`polly/lib/External/isl/interface/extract_interface.cc`, but I have
added a GCC warning about the fact it is deprecated, because it is used
in `isl` from where it is included by Polly.
Summary:
The logic here is to add the `.cubin` temporary file if we had to create
a new filename to handle it. Unfortuantely the logic was wrong because
we compare `const char *` values here. This logic seems to have been
wrong for some time, but was never noticed since we never used the
relocatable output.
Fixes https://github.com/llvm/llvm-project/issues/60301
Currently, the NVPTX compilation toolchain can only be invoked either
through CUDA or OpenMP via `--offload-device-only`. This is because we
cannot build a CUDA toolchain without an accompanying host toolchain for
the offloading. When using `--target=nvptx64-nvidia-cuda` this results
in generating calls to the GNU assembler and linker, leading to errors.
This patch abstracts the portions of the CUDA toolchain that are
independent of the host toolchain or offloading kind into a new base
class called `NVPTXToolChain`. We still need to read the host's triple
to build the CUDA installation, so if not present we just assume it will
match the host's system for now, or the user can provide the path
explicitly.
This should allow the compiler driver to create NVPTX device images
directly from C/C++ code.
Reviewed By: tra
Differential Revision: https://reviews.llvm.org/D140158
Previously, if the user did not provide an architecture when using
`-fopenmp-targets=nvptx64` we used the value from
`CLANG_OPENMP_DEFAULT_NVPTX_ARCH` which is defined at compile time. This
isn't ideal because it means that the default is set when the LLVM
compiler it built. Instead this patch uses the `nvptx-arch` tool to
query it at runtime. This matches the existing behaviour of the AMDGPU
toolchain with its `amdgpu-arch` tool.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D141708
LibPath discovered during InstallationDetection wasn't used anywhere.
Moreover it actually resulted in discarding installations that don't have any
`/lib` directory.
This is causing troubles for our pipelines downstream, that want to perform
syntax-only analysis on the sources.
Differential Revision: https://reviews.llvm.org/D141467
This patch adds basic support for `--offload-arch=native` to CUDA. This
is done using the `nvptx-arch` tool that was introduced previously. Some
of the logic for handling executing these tools was factored into a
common helper as well. This patch does not add support for OpenMP or the
"new" driver. That will be done later.
Reviewed By: yaxunl
Differential Revision: https://reviews.llvm.org/D141051
Summary:
A previous patch removed the use of the `OK` private variable in CUDA
which resulted in usused variable warnings. this was fixed in
f886f7e8ef7aa4f54298db792a656373af90440c but did not change the
constructor to accurately represent its removal. This patch removes it
from the interface entirely.
Recently OpenMP has transitioned to using the "new" driver which
primarily merges the device and host linking phases into a single
wrapper that handles both at the same time. This replaced a few tools
that were only used for OpenMP offloading, such as the
`clang-offload-wrapper` and `clang-nvlink-wrapper`. The new driver
carries some marked benefits compared to the old driver that is now
being deprecated. Things like device-side LTO, static library
support, and more compatible tooling. As such, we should be able to
completely deprecate the old driver, at least for OpenMP. The old driver
support will still exist for CUDA and HIP, although both of these can
currently be compiled on Linux with `--offload-new-driver` to use the new
method.
Note that this does not deprecate the `clang-offload-bundler`, although
it is unused by OpenMP now, it is still used by the HIP toolchain both
as their device binary format and object format.
When I proposed deprecating this code I heard some vendors voice
concernes about needing to update their code in their fork. They should
be able to just revert this commit if it lands.
Reviewed By: jdoerfert, MaskRay, ye-luo
Differential Revision: https://reviews.llvm.org/D130020
Use this instead of `*_LIBDIR_SUFFIX`, from which it is computed.
This gets us ready for D130586, in which `*_LIBDIR_SUFFIX` is
deprecated.
Differential Revision: https://reviews.llvm.org/D132300
We held off on this before as `LLVM_LIBDIR_SUFFIX` conflicted with it.
Now we return this.
`LLVM_LIBDIR_SUFFIX` is kept as a deprecated way to set
`CMAKE_INSTALL_LIBDIR`. The other `*_LIBDIR_SUFFIX` are just removed
entirely.
I imagine this is too potentially-breaking to make LLVM 15. That's fine.
I have a more minimal version of this in the disto (NixOS) patches for
LLVM 15 (like previous versions). This more expansive version I will
test harder after the release is cut.
Reviewed By: sebastian-ne, ldionne, #libc, #libc_abi
Differential Revision: https://reviews.llvm.org/D130586
When performing device only compilation, there was an issue where
`cubin` outputs were being renamed to `cubin` despite the user's name.
This is required in a normal compilation flow as the Nvidia tools only
understand specific filenames instead of checking magic bytes for some
unknown reason. We do not want to perform this transformation when the
user is performing device only compilation.
Reviewed By: tra
Differential Revision: https://reviews.llvm.org/D131278
We currently call the `addNVPTXFeatures` function in two places, inside
of the CUDA Toolchain and inside of Clang in the standard entry point.
We normally add features to the job in Clang, so the call inside of the
CUDA toolchain is redundant and results in `+ptx` features being added.
Since we remove this call, we no longer will have a cached CUDA
installation so we will usually create it twice.
Reviewed By: tra
Differential Revision: https://reviews.llvm.org/D128752
Previously, when using the new driver we created a fatbinary with the
PTX and Cubin output. This was mainly done in an attempt to create some
backwards compatibility with the existing CUDA support that embeds the
fatbinary in each TU. This will most likely be more work than necessary
to actually implement. The linker wrapper cannot do anything with these
embedded PTX files because we do not know how to link them, and if we
did want to include multiple files it should go through the
`clang-offload-packager` instead. Also this didn't repsect the setting
that disables embedding PTX (although it wasn't used anyway).
Reviewed By: tra
Differential Revision: https://reviews.llvm.org/D128441
Summary:
Normally we parse through the CUDA installation to disover the needed
features. However, we may want to build libraries on targets that do not
currently have CUDA installed but still need to know which features to
make use of when creating the PTX or bitcode. This flag is a simple way
to specify this so we can compile certain codes withotu a valid CUDA
installation.
Ideally this could be done via an -Xarch or simimlar flag but currently
they cannot handle this. We would need to support using an -Xarch flag
that takes multiple arguments that then pass them to the -Xclang
functionality.
This patch adds support for OpenMP to use the `--offload-arch` and
`--no-offload-arch` options. Traditionally, OpenMP has only supported
compiling for a single architecture via the `-Xopenmp-target` option.
Now we can pass in a bound architecture and use that if given, otherwise
we default to the value of the `-march` option as before.
Note that this only applies the basic support, the OpenMP target runtime
does not yet know how to choose between multiple architectures.
Additionally other parts of the offloading toolchain (e.g. LTO) require
the `-march` option, these should be worked out later.
Reviewed By: tra
Differential Revision: https://reviews.llvm.org/D124721
The root cause of this is, in `NVPTX::Assembler::ConstructJob`, the output file name might not match the `Output`'s file name passed into the function because `CudaToolChain::getInputFilename` is a specialized version. That means the real output file is not added to the temp files list, which will be all removed in the d'tor of `Compilation`. In order to "fix" it, in the function `NVPTX::OpenMPLinker::ConstructJob`, before calling `clang-nvlink-wrapper`, the function calls `getToolChain().getInputFilename(II)` to get the right output file name for each input, and add it to temp file, and then they can be removed w/o any issue. However, this whole logic doesn't work when using the new OpenMP driver because `NVPTX::OpenMPLinker::ConstructJob` is not called at all, which causing the issue that the cubin file generated in each single unit compilation is out of track.
In this patch, we add the real output file into temp files if its name doesn't match `Output`. We add it when the file is an output instead of doing it when it is an input, like what we did in `NVPTX::OpenMPLinker::ConstructJob`, which makes more sense.
Reviewed By: jhuber6
Differential Revision: https://reviews.llvm.org/D124253
Summary:
There was a naming conflict in the getNVPTXTargetFeatures function that
prevented some compilers from correctly disambiguating between the
enumeration and variable of the same name. Rename the variable to avoid
this.
The NVPTX toolchain uses target features to determine the PTX version to
use. However this isn't exposed externally like most other toolchain
specific target features are. Add this functionaliy in preparation for
using it in for OpenMP offloading.
Reviewed By: jdoerfert, tra
Differential Revision: https://reviews.llvm.org/D122089