This partially reverts commit aa964f157f9b50fab3895afbfda6e0915cf6bb4a
because it caused perf regressions in rccl due to drop of -mllvm
-amgpu-kernarg-preload-count=16 from the linker step. Potentially it
could cause similar regressions for other HIP apps using -mllvm options
with -fgpu-rdc.
Fixes: SWDEV-443345
This patch adds the Driver changes needed for enabling HIP parallel algorithm offload on AMDGPU targets. What this change does can be summed up as follows:
- add two flags, one for enabling `hipstdpar` compilation, the second enabling the optional allocation interposition mode;
- the flags correspond to new LangOpt members;
- if we are compiling or linking with --hipstdpar, we enable HIP; in the compilation case C and C++ inputs are treated as HIP inputs;
- the ROCm / AMDGPU driver is augmented to look for and include an implementation detail forwarding header; we error out if the user requested `hipstdpar` but the header or its dependencies cannot be found.
Tests for the behaviour described above are also added.
Reviewed by: MaskRay, yaxunl
Differential Revision: https://reviews.llvm.org/D155775
Summary:
This feature is not needed anymore and is replaced by different
implementations. The code guarded by this flag also causes us to emit an
invalid argument to `-mlink-builtin-bitcode` that will cause errors if
ever actually executed. Remove this feature.
Rename -fcuda-approx-transcendentals as
-fgpu-approx-transcendentals and pass it
to both device and host clang -cc1.
Fix its interaction with -ffast-math to allow
-fno-gpu-approx-transcendentals to override
the implicit -fcuda-approx-transcendentals
due to -ffast-math.
Rename the predefined macro to be
__CLANG_GPU_APPROX_TRANSCENDENTALS__.
Emit the macro for both device and host compilation.
Reviewed by: Artem Belevich, Fangrui Song
Differential Revision: https://reviews.llvm.org/D154797
currently clang passes -mllvm options to the device lld linker plugin
when compiling HIP. This is against default clang behavior
which is only passing -mllvm options to linker plugin specified through -Wl
options. This patch lets clang only pass -Xoffload-linker -mllvm= options
to device lld linker plugin.
Fixes: https://github.com/llvm/llvm-project/issues/63604
Reviewed by: Joseph Huber, Matt Arsenault
Differential Revision: https://reviews.llvm.org/D154145
Add the --whole-archive flag when linking HIP programs to instruct lld
to go through every archive library to link in all the kernel functions
(entry pointers to the GPU program); otherwise, lld may skip some
library files if there are no more symbols that need to be resolved.
Differential Revision: https://reviews.llvm.org/D152207
Change-Id: I084d3d606f9cee646f9adc65f4b648c9bcb252e6
For AMDGCN default to mapping --lto-O# to --lto-CGO# in a 1:1 manner
(i.e. clang -O<N> implies --lto-O<N> and --lto-CGO<N>).
Ensure there is a means to override this via -Xoffload-linker and begin
to claim these arguments to avoid incorrect warnings that they are not
used.
Reviewed By: yaxunl, MaskRay
Differential Revision: https://reviews.llvm.org/D142499
Lazyly initialize uncommon toolchain detector
Cuda and rocm toolchain detectors are currently run unconditionally,
while their result may not be used at all. Make their initialization
lazy so that the discovery code is not run in common cases.
Reapplied since 77910ac374656319ff114ef251fda358d4aa166a landed and
fixes the test ordering issue.
Differential Revision: https://reviews.llvm.org/D142606
Cuda and rocm toolchain detectors are currently run unconditionally,
while their result may not be used at all. Make their initialization
lazy so that the discovery code is not run in common cases.
Differential Revision: https://reviews.llvm.org/D142606
The AMDGPU target only emits shared libraries currently. This patch
changes the handling of the PIC level to be managed in the
AMDGPUToolChain rather than having a special case for it. This causes
`--target=amdgcn--` to no longer set the PIC. This should be an
acceptable change since that doesn't use a correct toolchain anyway.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D142999
When -fgpu-rdc is used for linking relocatable objects, clang driver launches
clang-offload-bundler to extract a device relocatable object from each input
relocatable object file and passes the extracted files to lld. The input relocatable
object file could either come from HIP program or C++ program. The relocatable
object file from C++ program does not contain device relocatable objects, therefore
clang-offload-bundler extracts an empty file and passes it to lld. lld treates
empty file as linker script. When there is no object input file to lld, lld
will emit error:
target emulation unknown: -m or at least one .o file required
This patch adds "elf64_amdgpu" to lld so that lld always know the target
no matter whether there are object input files or not.
Reviewed by: Artem Belevich, Fangrui Song
Differential Revision: https://reviews.llvm.org/D138221
Previously, we linked in the ROCm device libraries which provide math
and other utility functions late. This is not stricly correct as this
library contains several flags that are only set per-TU, such as fast
math or denormalization. This patch changes this to pass the bitcode
libraries per-TU using the same method we use for the CUDA libraries.
This has the advantage that we correctly propagate attributes making
this implementation more correct. Additionally, many annoying unused
functions were not being fully removed during LTO. This lead to
erroneous warning messages and remarks on unused functions.
I am not sure if not finding these libraries should be a hard error. let
me know if it should be demoted to a warning saying that some device
utilities will not work without them.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D133726
Fix a typo about -fno-gpu-sanitize handling and disable warnings when
-fno-gpu-sanitize is specified.
Reviewed by: Artem Belevich
Differential Revision: https://reviews.llvm.org/D121302
HIP programs compiled with -c -fgpu-rdc generate clang-offload-bundler
bundles which contain bitcode for different GPU's.
Such files can be archived to an archive file which can be linked with
HIP programs with -fgpu-rdc.
This patch adds suppor of linking archive of bundled bitcode.
When an archive of bundled bitcode is passed to clang by -l, for each
GPU specified through --offload-arch, clang extracts bitcode from
the archive and creates a new archive for that GPU and pass it
to lld.
Reviewed by: Artem Belevich
Differential Revision: https://reviews.llvm.org/D120070
Fixes: SWDEV-321741, SWDEV-315773
This patch refactors the HIP tool chain for new HIP tool chain, HIPSPV
tool chain, which is added in the follow up patch part 2.
Rename HIPToolChain to HIPAMDToolChain and Renames HIP.* files to HIPAMD.*.
Introduce HIPUtility.* file where common HIP utilities, shared among HIP
tool chain implementations, are placed in.
Move constructHIPFatbinCommand() and
constructGenerateObjFileFromHIPFatBinary() to HIPUtility. HIPSPV tool
chain is going to use them.
Tweak bundle target ID in constructHIPFatbinCommand(): extra dashes are
dropped if the Target ID is empty and 'hip' offload kind is made default
for non-AMD targets.
Patch by: Henry Linjamäki
Reviewed by: Yaxun Liu, Artem Belevich, Eric Christopher
Differential Revision: https://reviews.llvm.org/D110549