This patch provides a single point for handling the logic behind
choosing common bitcode libraries. The intention is that the users of
ROCm installation detector will not have to rewrite options handling
code each time the bitcode libraries are queried. This is not too
distant from detectors for other architecture that encapsulate the
similar decision making process, providing cleaner interface. The only
flag left in `getCommonBitcodeLibs` (main point of entry) is
`NeedsASanRT`, this is deliberate, as in order to calculate it we need
to consult `ToolChain`.
These are identified by misc-include-cleaner. I've filtered out those
that break builds. Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.
Summary:
We support `nogpulib` to disable implicit libraries. In the future we
will want to change the default linking of these libraries based on the
user language. This patch just introduces a positive variant so now we
can do `-nogpulib -gpulib` to disable it.
Later patch will make the default a variable in the ROCmToolChain
depending on the target languages.
- This reverts commit
0c6c4a9993.
- Add '-mcode-object-version=5' as to explicitly use code object
version 5 to match with 'FAIL' diagnostic.
- Add Requires directive to support lit test run on platforms
registered with x86_64 and amdgpu.
Summary:
OpenMP was weirdly split between using the bound architecture from
`--offload-arch=` and the old `-march=` option which only worked for
single jobs. This patch removes that special handling. The main benefit
here is that we can now use `getToolchainArgs` without it throwing an
error.
I'm assuming SYCL doesn't care about this because they don't use an
architecture.
ASan bitcode linking is currently available for HIPAMD,OpenMP and
OpenCL. Moving sanitizer specific common parts of logic to appropriate
API's so as to reduce code redundancy and maintainability.
Summary:
Now that we have a functional build for `libc++` on the GPU, it will now
find the target specific headers in `include/amdgcn-amd-amdhsa`. This is
a problem for offloading via OpenMP because we need the CPU and GPU
headers to match exactly. All the other toolchains forward this
correctly except the AMDGPU OpenMP one, fix this by overriding it to use
the host toolchain instead of the device one, so the triple is not
returned as `amdgcn-amd-amdhsa`.
-fcuda-is-device flag is not used for OpenMP offloading for AMD GPUs and
it does not need to be added as clang cc1 option for OpenMP code.
This PR has the same functionality as
https://github.com/llvm/llvm-project/pull/96909 but it doesn't introduce
regression for virtual function support.
Summary:
AMDGPU supports a `target-id` feature which is used to qualify targets
with different incompatible features. These are both rules and target
features. Currently, we pass `-target-cpu` twice when offloading to
OpenMP, and do not pass the target-id features at all. The effect was
that passing something like `--offload-arch=gfx90a:xnack+` would show up
as `-target-cpu=gfx90a:xnack+ -target-cpu=gfx90a`. Thus ignoring the
xnack completely and passing it twice. This patch fixes that to pass it
once and then separate it like how HIP does.
Lazyly initialize uncommon toolchain detector
Cuda and rocm toolchain detectors are currently run unconditionally,
while their result may not be used at all. Make their initialization
lazy so that the discovery code is not run in common cases.
Reapplied since 77910ac374656319ff114ef251fda358d4aa166a landed and
fixes the test ordering issue.
Differential Revision: https://reviews.llvm.org/D142606
Cuda and rocm toolchain detectors are currently run unconditionally,
while their result may not be used at all. Make their initialization
lazy so that the discovery code is not run in common cases.
Differential Revision: https://reviews.llvm.org/D142606
The AMDGPU target can only emit LLVM-IR, so we can always rely on LTO to
link the static version of the runtime optimally. Using the static
library only has a few advantages. Namely, it avoids several known bugs
and allows us to optimize out more functions. This is legal since the
changes in D142486 and D142484
Depends on D142486 D142484
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D142491
Previously we had some special handling here that errored out if
multiple architectures were detected. This isn't a problem anymore as
the runtime can handle multi-archicture binaries automatically. So it's
safe to simply take the first architecture that we know works. If users
use `--offload-arch=native` instead it will build for all the
architectures at the same time rather than just picking one. This patch
makes it consisten with the NVPTX version.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D142138
This patch adds basic support for `--offload-arch=native` to CUDA. This
is done using the `nvptx-arch` tool that was introduced previously. Some
of the logic for handling executing these tools was factored into a
common helper as well. This patch does not add support for OpenMP or the
"new" driver. That will be done later.
Reviewed By: yaxunl
Differential Revision: https://reviews.llvm.org/D141051
Previously, we linked in the ROCm device libraries which provide math
and other utility functions late. This is not stricly correct as this
library contains several flags that are only set per-TU, such as fast
math or denormalization. This patch changes this to pass the bitcode
libraries per-TU using the same method we use for the CUDA libraries.
This has the advantage that we correctly propagate attributes making
this implementation more correct. Additionally, many annoying unused
functions were not being fully removed during LTO. This lead to
erroneous warning messages and remarks on unused functions.
I am not sure if not finding these libraries should be a hard error. let
me know if it should be demoted to a warning saying that some device
utilities will not work without them.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D133726
Recently OpenMP has transitioned to using the "new" driver which
primarily merges the device and host linking phases into a single
wrapper that handles both at the same time. This replaced a few tools
that were only used for OpenMP offloading, such as the
`clang-offload-wrapper` and `clang-nvlink-wrapper`. The new driver
carries some marked benefits compared to the old driver that is now
being deprecated. Things like device-side LTO, static library
support, and more compatible tooling. As such, we should be able to
completely deprecate the old driver, at least for OpenMP. The old driver
support will still exist for CUDA and HIP, although both of these can
currently be compiled on Linux with `--offload-new-driver` to use the new
method.
Note that this does not deprecate the `clang-offload-bundler`, although
it is unused by OpenMP now, it is still used by the HIP toolchain both
as their device binary format and object format.
When I proposed deprecating this code I heard some vendors voice
concernes about needing to update their code in their fork. They should
be able to just revert this commit if it lands.
Reviewed By: jdoerfert, MaskRay, ye-luo
Differential Revision: https://reviews.llvm.org/D130020
This patch adds support for OpenMP to use the `--offload-arch` and
`--no-offload-arch` options. Traditionally, OpenMP has only supported
compiling for a single architecture via the `-Xopenmp-target` option.
Now we can pass in a bound architecture and use that if given, otherwise
we default to the value of the `-march` option as before.
Note that this only applies the basic support, the OpenMP target runtime
does not yet know how to choose between multiple architectures.
Additionally other parts of the offloading toolchain (e.g. LTO) require
the `-march` option, these should be worked out later.
Reviewed By: tra
Differential Revision: https://reviews.llvm.org/D124721
Now that the old device runtime has been deleted there is only a single
target that differs by the triple and the architecture. Simplify the
scheme for identifying the library but directly using the triple.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D119638
Summary:
The new runtime was deleted. AMD's old runtime used the triple name
`amdgcn` while the new runtime used `amdgpu`. This was not updated when
the old runtime was removed causing the library to not be found on
AMDGPU.
This patch completely removes the old OpenMP device runtime. Previously,
the old runtime had the prefix `libomptarget-new-` and the old runtime
was simply called `libomptarget-`. This patch makes the formerly new
runtime the only runtime available. The entire project has been deleted,
and all references to the `libomptarget-new` runtime has been replaced
with `libomptarget-`.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D118934
Summary:
This patch adds support for linking the OpenMP device bitcode library
late when doing LTO. This simply passes it in as an additional device
file when doing the final device linking phase with LTO. This has the
advantage that we don't link it multiple times, and the device
references do not get inlined and prevent us from doing needed OpenMP
optimizations when we have visiblity of the whole module.
Fix some failings where the implicit conversion of an Error to an
Expected triggered the deleted copy constructor.
Depends on D116675
Differential revision: https://reviews.llvm.org/D117048
Once we linked in math files, potentially even if we link in only other
"system libraries", we want to optimize the code again. This is not only
reasonable but also helps to hide various problems with the missing
attribute annotations in the math libraries.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D116906
Extracted from D117246. This reflects the march value used by the
compile back into the toolchain arguments, letting downstream processes
such as LTO rely on it being present. Subsequent patches should also be able
to remove the two other calls to checkSystemForAMDGPU.
Reviewed By: jonchesterfield
Differential Revision: https://reviews.llvm.org/D117706
The new runtime is currently broken for AMD offloading. This patch makes
the default the old runtime only for the AMD target.
Reviewed By: ronlieb
Differential Revision: https://reviews.llvm.org/D114965
This patch changes the `-fopenmp-target-new-runtime` option which controls if
the new or old device runtime is used to be true by default. Disabling this to
use the old runtime now requires using `-fno-openmp-target-new-runtime`.
Reviewed By: JonChesterfield, tianshilei1992, gregrodgers, ronlieb
Differential Revision: https://reviews.llvm.org/D114890
Passes same tests as the current deviceRTL. Includes cmake change from D111987.
CI is showing a different set of pass/fails to local, committing this
without the tests enabled by default while debugging that difference.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D112227
Passes same tests as the current deviceRTL. Includes cmake change from D111987.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D112227
An archive containing device code object files can be passed to
clang command line for linking. For each given offload target
it creates a device specific archives which is either passed to llvm-link
if the target is amdgpu, or to clang-nvlink-wrapper if the target is
nvptx. -L/-l flags are used to specify these fat archives on the command
line. E.g.
clang++ -fopenmp -fopenmp-targets=nvptx64 main.cpp -L. -lmylib
It currently doesn't support linking an archive directly, like:
clang++ -fopenmp -fopenmp-targets=nvptx64 main.cpp libmylib.a
Linking with x86 offload also does not work.
Reviewed By: ye-luo
Differential Revision: https://reviews.llvm.org/D105191
An archive containing device code object files can be passed to
clang command line for linking. For each given offload target
it creates a device specific archives which is either passed to llvm-link
if the target is amdgpu, or to clang-nvlink-wrapper if the target is
nvptx. -L/-l flags are used to specify these fat archives on the command
line. E.g.
clang++ -fopenmp -fopenmp-targets=nvptx64 main.cpp -L. -lmylib
It currently doesn't support linking an archive directly, like:
clang++ -fopenmp -fopenmp-targets=nvptx64 main.cpp libmylib.a
Linking with x86 offload also does not work.
Reviewed By: ye-luo
Differential Revision: https://reviews.llvm.org/D105191
This fixes the 'unused linker option: -lm' warning when compiling
program with -c.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D107952
Math libraries are linked only when -lm is specified. This is because
host system could be missing rocm-device-libs.
Reviewed By: JonChesterfield, yaxunl
Differential Revision: https://reviews.llvm.org/D105981
Moving `InputInfo.h` from `lib/Driver/` into `include/Driver` to be able to expose it in an API consumed from outside of `clangDriver`.
Reviewed By: dexonsmith
Differential Revision: https://reviews.llvm.org/D106787