Added --offload-compression-level= option to clang and
-compression-level=
option to clang-offload-bundler for controlling compression level.
Added support of long distance matching (LDM) for llvm::zstd which is
off
by default. Enable it for clang-offload-bundler by default since it
improves compression rate in general.
Change default compression level to 3 for zstd for clang-offload-bundler
since it works well for bundle entry size from 1KB to 32MB, which should
cover most of the clang-offload-bundler usage. Users can still specify
compression level by -compression-level= option if necessary.
`-fgpu-rdc` mode allows device functions call device functions in
different TU. However, currently all device objects have to be linked
together since only one fat binary is supported. This is time consuming
for AMDGPU backend since it only supports LTO.
There are use cases that objects can be divided into groups in which
device functions are self-contained but host functions are not. It is
desirable to link/optimize/codegen the device code and generate a fatbin
for each group, whereas partially link the host code with `ld -r` or
generate a static library by using the `--emit-static-lib` option of
clang. This avoids linking all device code together, therefore decreases
the linking time for `-fgpu-rdc`.
Previously, clang emits an external symbol `__hip_fatbin` for all
objects for `-fgpu-rdc`. With this patch, clang emits an unique external
symbol `__hip_fatbin_{cuid}` for the fat binary for each object. When a
group of objects are linked together to generate a fatbin, the symbols
are merged by alias and point to the same fat binary. Each group has its
own fat binary. One executable or shared library can have multiple fat
binaries. Device linking is done for undefined fab binary symbols only
to avoid repeated linking. `__hip_gpubin_handle` is also uniquefied and
merged to avoid repeated registering. Symbol `__hip_cuid_{cuid}` is
introduced to facilitate debugging and tooling.
Fixes: https://github.com/llvm/llvm-project/issues/77018
HIP toolchain uses llvm-mc to generate a host object embedding device
binary for -fgpu-rdc. Due to lack of .note.GNU-stack section, the
generated relocatable has executable stack marking, which disables
protection from executable stack for any HIP programs compiled with
-fgpu-rdc.
This patch adds .note.GNU-stack section to the input to llvm-mc to fix
the executable stack marking.
Fixes: https://github.com/llvm/llvm-project/issues/71711
Original PR: https://github.com/llvm/llvm-project/pull/67162
The commit was reverted due to UB detected by santizer:
https://lab.llvm.org/buildbot/#/builders/238/builds/5955
clang/lib/Driver/OffloadBundler.cpp:1012:25: runtime error:
load of misaligned address 0xaaaae2d90e7c for type
'const uint64_t' (aka 'const unsigned long'), which
requires 8 byte alignment
It was fixed by using memcpy instead of dereferencing int*
casted from unaligned char*.
This reverts commit a1e81d2ead02e041471ec2299d7382f80c4dbba6.
Revert "Fix test hip-offload-compress-zlib.hip"
This reverts commit ba01ce60665848478ba4e76190907153a8c26fe9.
Revert due to sanity fail at
https://lab.llvm.org/buildbot/#/builders/5/builds/37188https://lab.llvm.org/buildbot/#/builders/238/builds/5955
/b/sanitizer-aarch64-linux-bootstrap-ubsan/build/llvm-project/clang/lib/Driver/OffloadBundler.cpp:1012:25: runtime error: load of misaligned address 0xaaaae2d90e7c for type 'const uint64_t' (aka 'const unsigned long'), which requires 8 byte alignment
0xaaaae2d90e7c: note: pointer points here
bc 00 00 00 94 dc 29 9a 89 fb ca 2b 78 9c 8b 8f 77 f6 71 f4 73 8f f7 77 73 f3 f1 77 74 89 77 0a
^
#0 0xaaaaba125f70 in clang::CompressedOffloadBundle::decompress(llvm::MemoryBuffer const&, bool) /b/sanitizer-aarch64-linux-bootstrap-ubsan/build/llvm-project/clang/lib/Driver/OffloadBundler.cpp:1012:25
#1 0xaaaaba126150 in clang::OffloadBundler::ListBundleIDsInFile(llvm::StringRef, clang::OffloadBundlerConfig const&) /b/sanitizer-aarch64-linux-bootstrap-ubsan/build/llvm-project/clang/lib/Driver/OffloadBundler.cpp:1089:7
Will reland after fixing it.
Add option -f[no-]offload-compress to clang to enable/disable
compression of device binary for HIP. By default it is disabled.
Add option -compress to clang-offload-bundler to enable compression of
offload bundle. By default it is disabled.
When enabled, zstd or zlib is used for compression when available.
When disabled, it is NFC compared to previous behavior. The same offload
bundle format is used as before.
Clang-offload-bundler automatically detects whether the input file to be
unbundled is compressed and the compression method and decompress if
necessary.
Currently, clang-offload-bundler has -inputs and -outputs options that accept
values with comma as the delimiter. This causes issues with file paths
containing commas, which are valid file paths on Linux.
This add two new options -input and -output, which accept one single file,
and allow multiple instances. This allows arbitrary file paths. The old
-inputs and -outputs options will be kept for backward compatibility, but
are not allowed to be used with -input and -output options for simplicity.
In the future, -inputs and -outputs options will be phasing out.
RFC: https://discourse.llvm.org/t/rfc-adding-input-and-output-options-to-clang-offload-bundler/60049
Patch by: Siu Chi Chan
Reviewed by: Yaxun Liu
Differential Revision: https://reviews.llvm.org/D120662
This patch refactors the HIP tool chain for new HIP tool chain, HIPSPV
tool chain, which is added in the follow up patch part 2.
Rename HIPToolChain to HIPAMDToolChain and Renames HIP.* files to HIPAMD.*.
Introduce HIPUtility.* file where common HIP utilities, shared among HIP
tool chain implementations, are placed in.
Move constructHIPFatbinCommand() and
constructGenerateObjFileFromHIPFatBinary() to HIPUtility. HIPSPV tool
chain is going to use them.
Tweak bundle target ID in constructHIPFatbinCommand(): extra dashes are
dropped if the Target ID is empty and 'hip' offload kind is made default
for non-AMD targets.
Patch by: Henry Linjamäki
Reviewed by: Yaxun Liu, Artem Belevich, Eric Christopher
Differential Revision: https://reviews.llvm.org/D110549