In LTO, part of LLVM's middle-end runs after linking has finished. LTO's
semantics depend on the complete set of extracted bitcode files being
known at this time. If the middle-end inserts new calls to library
functions (libfuncs) that are implemented in bitcode, this could extract
new bitcode object files into the link. These cannot be compiled,
leading to undefined symbol references.
Additionally, the middle-end in LTO may reason that such library
functions have no references, and it may internalize them, then
manipulate their API or even delete them. Afterwards, it may emit a call
to them, again producing undefined symbol references.
This patch resolves the former issue by ensuring that the middle-end
emits no new references to symbols defined in bitcode, and it resolves
the latter issue by ensuring that extracted bitcode for libfuncs is
considered external, since new calls may be emitted to them at any time.
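The two rules above can be modeled as a small policy. This is a minimal sketch under one reading of the description. The "former issue" concerns bitcode libfuncs that were not extracted. The "latter issue" concerns libfuncs whose bitcode was extracted. `LibfuncPolicy`, its members, and the example symbol names are hypothetical and are not LLVM APIs.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical model of the two rules; not an LLVM API.
struct LibfuncPolicy {
  // Libfuncs defined in bitcode members that were NOT extracted into the link.
  std::set<std::string> lazyBitcodeLibfuncs;
  // Libfuncs defined in bitcode that WAS extracted into the link.
  std::set<std::string> extractedBitcodeLibfuncs;

  // Former issue: a new call must not pull a new bitcode file into the link,
  // because that file can no longer be compiled at this point.
  bool mayEmitNewCall(const std::string &name) const {
    return lazyBitcodeLibfuncs.count(name) == 0;
  }

  // Latter issue: an extracted libfunc may gain callers at any time, so it is
  // treated as external even if it currently has no visible references.
  bool mayInternalize(const std::string &name, bool hasVisibleRefs) const {
    if (extractedBitcodeLibfuncs.count(name) != 0)
      return false;
    return !hasVisibleRefs;
  }
};
```

Keeping both sets separate makes it explicit that the fixes address different phases: one constrains what the middle-end may emit, and the other constrains what it may internalize.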
The new semantics are not yet established for MachO LLD, which does not
yet appear to have any special handling for libcalls in LTO. It also
does not yet support distributed ThinLTO; doing so would require
additional (de)serialization work.
This is the patch referenced in @ilovepi's and my talk at the last LLVM
devmeeting: "LT-Uh-Oh"
Gemini 3.1 was used in porting to COFF and WASM LLDs.
The in-process ThinLTO backend typically generates object files in
memory and adds them directly to the link, except when the ThinLTO cache
is in use. DTLTO is unusual in that it adds files to the link from disk
in all cases.
When the ThinLTO cache is not in use, ThinLTO adds files via an
`AddStreamFn` callback provided by the linker, which ultimately appends
to a `SmallVector` in LLD. When the cache is in use, the linker supplies
an `AddBufferFn` callback that adds files more efficiently (by moving
`MemoryBuffer` ownership).
This patch adds a mandatory `AddBufferFn` to the DTLTO ThinLTO backend.
The backend uses this to add files to the link more efficiently.
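The difference between the two callbacks comes down to copying bytes versus transferring ownership. The following is a simplified sketch, assuming a hypothetical `Buffer` type standing in for `MemoryBuffer` and a hypothetical `Linker` sink. The real `AddStreamFn`/`AddBufferFn` signatures differ.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins: Buffer for llvm::MemoryBuffer, Linker for LLD.
struct Buffer {
  std::string data;
};

struct Linker {
  std::vector<std::string> streamedObjects;          // AddStreamFn-style sink
  std::vector<std::unique_ptr<Buffer>> ownedBuffers; // AddBufferFn-style sink

  // Stream path: the backend produces bytes, which are copied into the linker.
  void addStream(const std::string &bytes) { streamedObjects.push_back(bytes); }

  // Buffer path: ownership of an already-loaded buffer is moved; no copy.
  void addBuffer(std::unique_ptr<Buffer> buf) {
    ownedBuffers.push_back(std::move(buf));
  }
};
```

DTLTO reads its outputs from disk. Moving the loaded buffer therefore avoids a second copy of every object file.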
Additionally:
- Move AddStream from CGThinBackend to InProcessThinBackend, for reader
clarity.
- Modify linker comments that implied the AddBuffer path is
cache-specific.
For a Clang link (Debug build with sanitizers and instrumentation) using
an optimized toolchain (PGO non-LTO, llvmorg-22.1.0), measuring the mean
`Add DTLTO files to the link` time trace scope duration:
- On Windows (Windows 11 Pro Build 26200, AMD Family 25 @ ~4.5 GHz, 16
cores/32 threads, 64 GB RAM), this patch reduces the mean from
2799.148 ms to 157.972 ms.
- On Linux (Ubuntu 24.04.3 LTS Kernel 6.14, Ryzen 9 5950X, 16
cores/32 threads, boost up to 5.09 GHz, 64 GB RAM), this patch reduces
the mean from 255.291 ms to 41.630 ms.
Based on work by @romanova-ekaterina and @kbelochapka.
Without this change, passing -fthinlto-index causes -fpass-plugin
arguments to be ignored. We want to be able to use plugins with
distributed ThinLTO, so add support for this.
DTLTO emits temporary files to allow distribution of archive member
inputs.
It also emits temporary files from the ThinLTO backend, such as the
index files needed for each distributed ThinLTO backend compilation.
This change brings archive member temporary files into line with those
produced by the ThinLTO backend. They are now emitted in the same
location, warnings are emitted if they cannot be deleted, and they are
cleaned up on abnormal exit (e.g. Ctrl-C). All temporary files are
preserved when --save-temps is specified.
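The cleanup policy can be summarized as follows: register every temporary file and delete them all at the end unless `--save-temps` is given. Any path that cannot be removed is reported so the caller can warn about it. This is a sketch with a hypothetical `TempFileRegistry` class. It does not cover the abnormal-exit case. LLVM's Support library offers `llvm::sys::RemoveFileOnSignal` for Ctrl-C cleanup, but whether DTLTO uses it is not stated here.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <utility>
#include <vector>

// Hypothetical registry; LLD's actual mechanism may differ.
class TempFileRegistry {
public:
  explicit TempFileRegistry(bool saveTemps) : saveTemps(saveTemps) {}

  void add(std::string path) { paths.push_back(std::move(path)); }

  // Delete every registered file unless --save-temps was given. Return the
  // paths that could not be removed so the caller can emit warnings.
  std::vector<std::string> cleanup() {
    std::vector<std::string> failed;
    if (!saveTemps)
      for (const std::string &p : paths)
        if (std::remove(p.c_str()) != 0)
          failed.push_back(p);
    paths.clear();
    return failed;
  }

private:
  bool saveTemps;
  std::vector<std::string> paths;
};
```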
The existing signal-handling test has been extended to cover the full
set of DTLTO temporary files, and a new test has been added to exercise
temporary file handling in normal operation. Additionally, a minimal
test has been added to show the COFF behaviour.
SIE Internal tracker: TOOLCHAIN-21022
This patch implements support for handling archive members in DTLTO.
Unlike ThinLTO, where archive members are passed as in-memory buffers,
DTLTO requires archive members to be materialized as individual files on
the filesystem.
This is necessary because DTLTO invokes clang externally, which expects
file-based inputs.
To support this, this implementation identifies archive members among
the input files, saves them to the filesystem, and updates their
module_id to match their file paths.
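The three steps above (identify, save, update the module ID) can be sketched as below. `BitcodeInput`, `sanitize`, and `materializeArchiveMembers` are hypothetical names. The file-naming scheme (replace parentheses, spaces, and slashes with underscores and append `.bc`) is an assumption and not necessarily the one DTLTO uses.

```cpp
#include <cassert>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical input record; LLD tracks this state differently.
struct BitcodeInput {
  std::string moduleId; // e.g. "lib.a(foo.o at 8)" for an archive member
  std::string bytes;    // the member's bitcode
  bool isArchiveMember;
};

// Replace characters that are awkward in file names (an assumed scheme).
std::string sanitize(const std::string &id) {
  std::string out;
  for (char c : id)
    out += (c == '(' || c == ')' || c == ' ' || c == '/') ? '_' : c;
  return out;
}

// Write each archive member to tempDir and point its module ID at the file,
// so an externally invoked compiler can consume it as a file-based input.
bool materializeArchiveMembers(std::vector<BitcodeInput> &inputs,
                               const std::string &tempDir) {
  for (BitcodeInput &in : inputs) {
    if (!in.isArchiveMember)
      continue;
    std::string path = tempDir + "/" + sanitize(in.moduleId) + ".bc";
    std::ofstream os(path, std::ios::binary);
    if (!os.write(in.bytes.data(), in.bytes.size()))
      return false;
    in.moduleId = path;
  }
  return true;
}
```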
Add DTLTO linker option `--thinlto-remote-compiler-prepend-arg` to
enable support for the multi-call LLVM driver that requires an
additional option to specify the subcommand, e.g. "llvm clang ...".
Fixes https://github.com/llvm/llvm-project/issues/159125.
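The option's effect on the remote command line can be pictured as inserting extra arguments between the compiler path and the regular arguments. The "llvm clang ..." example above follows this shape. This is a sketch. `buildRemoteCommand` is hypothetical, and it assumes the option may be repeated to prepend several arguments.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: each --thinlto-remote-compiler-prepend-arg value is
// placed right after the compiler path, before the regular arguments, which
// is where a multi-call driver expects its subcommand.
std::vector<std::string>
buildRemoteCommand(const std::string &compiler,
                   const std::vector<std::string> &prependArgs,
                   const std::vector<std::string> &args) {
  std::vector<std::string> cmd{compiler};
  cmd.insert(cmd.end(), prependArgs.begin(), prependArgs.end());
  cmd.insert(cmd.end(), args.begin(), args.end());
  return cmd;
}
```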
This patch introduces support for Integrated Distributed ThinLTO (DTLTO)
in ELF LLD.
DTLTO enables the distribution of ThinLTO backend compilations via
external distribution systems, such as Incredibuild, during the
traditional link step: https://llvm.org/docs/DTLTO.html.
It is expected that users will invoke DTLTO through the compiler driver
(e.g., Clang) rather than calling LLD directly. A Clang-side interface
for DTLTO will be added in a follow-up patch.
Note: Bitcode members of archives (thin or non-thin) are not currently
supported. This will be addressed in a future change. As a consequence
of this lack of support, this patch is not sufficient to allow for
self-hosting an LLVM build with DTLTO. Theoretically,
--start-lib/--end-lib could be used instead of archives in a self-host
build. However, it's unclear how --start-lib/--end-lib can be easily
used with the LLVM build system.
Testing:
- ELF LLD `lit` test coverage has been added, using a mock distributor
to avoid requiring Clang.
- Cross-project `lit` tests cover integration with Clang.
For the design discussion of the DTLTO feature, see: #126654.
isExported, intended to replace exportDynamic, is primarily set in two
locations: (a) after parseSymbolVersion and (b) during demoteSymbols.
In the future, we should try removing exportDynamic. Currently,
merging exportDynamic/isExported would cause riscv-gp.s to fail:
* The first isExported computation considers the undefined symbol exported.
* The symbol is then defined as a linker-synthesized symbol.
* isExported remains true, while it should be false.
We've noticed that for large builds, executing the thin link can take on the
order of tens of minutes. We are only using a single thread to write the
sharded indices and import files for each input bitcode file. While we
need to ensure the index file produced lists modules in a deterministic
order, that doesn't prevent us from executing the rest of the work in
parallel.
In this change we use a thread pool to execute as much of the backend's
work as possible in parallel. In local testing on a machine with 80
cores, this change makes a thin-link for ~100,000 input files run in ~2
minutes. Without this change it takes upwards of 10 minutes.
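The key property is that per-module work runs concurrently while results are consumed in input order, so deterministic output does not depend on thread scheduling. The patch uses a thread pool. The sketch below substitutes `std::async` for brevity, which launches one task per module and would not suit 100,000 inputs. `writeIndexFor` is a hypothetical placeholder for writing one module's index and imports files.

```cpp
#include <cassert>
#include <future>
#include <string>
#include <vector>

// Hypothetical per-module work: in the real thin link this writes the
// sharded index and imports files for one bitcode input.
std::string writeIndexFor(const std::string &module) {
  return module + ".thinlto.bc";
}

// Run independent per-module work concurrently, but collect results in
// input order so ordering-sensitive output stays deterministic.
std::vector<std::string> runThinLink(const std::vector<std::string> &modules) {
  std::vector<std::future<std::string>> pending;
  pending.reserve(modules.size());
  for (const std::string &m : modules)
    pending.push_back(std::async(std::launch::async, writeIndexFor, m));
  std::vector<std::string> outputs;
  outputs.reserve(modules.size());
  for (auto &f : pending)
    outputs.push_back(f.get()); // blocks until done; preserves input order
  return outputs;
}
```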
---------
Co-authored-by: Nuri Amari <nuriamari@fb.com>
This reapplies commit 1911a50fae8a441b445eb835b98950710d28fc88 with a
minor fix in lld/ELF/LTO.cpp which sets Options.BBAddrMap when
`--lto-basic-block-sections=labels` is passed.
Remove the global variable `symtab` and add a member variable
(`std::unique_ptr<SymbolTable>`) to `Ctx` instead.
This is one step toward eliminating global states.
Pull Request: https://github.com/llvm/llvm-project/pull/109612
Fix the use-after-free bug and reapply
https://github.com/llvm/llvm-project/pull/106193
* Without the fix, the string referenced by `objSym.Name` could be
destroyed even though the string saver keeps a copy of the referenced
string. This caused a use-after-free.
* The fix ([latest
commit](9776ed44cf))
updates `objSym.Name` to reference (via `StringRef`) the string saver's
copy.
Test:
1. For `lld/test/ELF/lto/asmundef.ll`, its test failure is reproducible
with `-DLLVM_USE_SANITIZER=Address` and gone with the fix.
2. Run all tests by following
https://github.com/google/sanitizers/wiki/SanitizerBotReproduceBuild#try-local-changes.
* Without the fix, `ELF/lto/asmundef.ll` aborted the multi-stage test at
`@@@BUILD_STEP stage2/asan_ubsan check@@@`, defined
[here](https://github.com/llvm/llvm-zorg/blob/main/zorg/buildbot/builders/sanitizers/buildbot_fast.sh#L30)
* With the fix, the [multi-stage
test](https://github.com/llvm/llvm-zorg/blob/main/zorg/buildbot/builders/sanitizers/buildbot_fast.sh)
passes stage2 {asan, ubsan, masan}. This is also the test used by
https://lab.llvm.org/buildbot/#/builders/169
**Original commit message**
`StringMap<T>` creates a [copy of the
string](d4c519e7b2/llvm/include/llvm/ADT/StringMapEntry.h (L55-L58))
for entry insertions and intentionally keeps copies [since the
implementation optimizes string memory
usage](d4c519e7b2/llvm/include/llvm/ADT/StringMap.h (L124)).
On the other hand, the linker keeps copies of symbol names [1] in
`lld::elf::parseFiles` [2] before invoking `compileBitcodeFiles` [3].
This change proposes to optimize away string copies inside
[LTO::GlobalResolutions](24e791b416/llvm/include/llvm/LTO/LTO.h (L409)),
which will make LTO indexing more memory efficient for ELF. There are
similar opportunities for other (COFF, wasm, MachO) formats.
The optimization takes place for lld (ELF) only. For other use
cases (gold plugin, `llvm-lto2`, etc.), LTO owns a string saver to keep
copies and uses the global resolution key for de-duplication.
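The ownership model can be illustrated with a map whose keys are views into storage owned elsewhere. The map stores no copy of each name, which is the memory saving. In exchange, every key must point at a copy that outlives the map. A view of a short-lived string would dangle, which is the kind of use-after-free the reapplied patch fixes by making `objSym.Name` reference the saver's copy. This is a sketch. `NameSaver`, `Resolution`, and `GlobalResolutions` are hypothetical stand-ins for `StringSaver`, LTO's resolution record, and `LTO::GlobalResolutions`.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <string_view>
#include <unordered_map>

// Hypothetical stand-in for a string saver. std::deque::emplace_back never
// relocates existing elements, so views returned earlier stay valid.
class NameSaver {
public:
  std::string_view save(std::string_view s) {
    storage.emplace_back(s);
    return storage.back();
  }

private:
  std::deque<std::string> storage;
};

struct Resolution {
  bool prevailing = false;
};

// Keyed by string_view: keys must point into storage (e.g. the saver) that
// outlives the map, never into a temporary.
using GlobalResolutions = std::unordered_map<std::string_view, Resolution>;
```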
Together with @kazutakahirata's work to make `ComputeCrossModuleImport`
more memory efficient, we see a ~20% peak memory usage reduction in a
binary where peak memory usage needs to go down. Thanks to the
optimization in
329ba523cc,
the max (as opposed to the sum) of `ComputeCrossModuleImport` or
`GlobalResolution` shows up in peak memory usage.
* Regarding correctness, the set of
[resolved](80c47ad3ae/llvm/lib/LTO/LTO.cpp (L739))
[per-module
symbols](80c47ad3ae/llvm/include/llvm/LTO/LTO.h (L188-L191))
is a subset of
[llvm::lto::InputFile::Symbols](80c47ad3ae/llvm/include/llvm/LTO/LTO.h (L120)).
Bitcode symbol parsing already saves symbol names when iterating
`obj->symbols` in `BitcodeFile::parse`. This change updates
`BitcodeFile::parseLazy` to keep copies of per-module undefined symbols.
* Presumably the undefined symbols in an LTO unit (copied in this patch
into the linker's unique saver) are a small set compared with the set of
symbols in global-resolution (copied before this patch), making this a
worthwhile trade-off. Benchmarking this change alone shows measurable
memory savings across various benchmarks.
[1] ELF
1cea5c2138/lld/ELF/InputFiles.cpp (L1748)
[2]
ef7b18a53c/lld/ELF/Driver.cpp (L2863)
[3]
ef7b18a53c/lld/ELF/Driver.cpp (L2995)
Summary:
Currently the `--lto-emit-llvm` option writes out the
post-internalization bitcode. This is the bitcode before any
optimizations or other pipelines have been run on it. This patch changes
that to use the pre-codegen module, which is the state of the LLVM-IR
after the optimizations have been run.
I believe that this makes sense as the `--lto-emit-llvm` option seems to
imply that we should emit the final output of the LLVM pass as if it
were the desired output. This should include optimizations at the
requested optimization level. My main motivation for this change is to
be able to use this to link several LLVM-IR files into a single one that
I can then pass back to `ld.lld` later (for JIT purposes).
Today `-split-machine-functions` and `-fbasic-block-sections={all,list}`
cannot be combined with `-basic-block-sections=labels` (the labels
option will be ignored).
The inconsistency comes from the way basic block address map -- the
underlying mechanism for basic block labels -- encodes basic block
addresses
(https://lists.llvm.org/pipermail/llvm-dev/2020-July/143512.html).
Specifically, basic block offsets are computed relative to the function
begin symbol. This relies on functions being contiguous which is not the
case for MFS and basic block section binaries. This means Propeller
cannot use binary profiles collected from these binaries, which limits
the applicability of Propeller for iterative optimization.
To make the `SHT_LLVM_BB_ADDR_MAP` feature work with basic block section
binaries, we propose modifying the encoding of this section as follows.
First let us review the current encoding which emits the address of each
function and its number of basic blocks, followed by basic block entries
for each basic block.
| Description | Field |
|--|--|
| Address of the function | Function Address |
| Number of basic blocks in this function | NumBlocks |
| BB entry 1 | |
| BB entry 2 | |
| ... | |
| BB entry #NumBlocks | |
To make this work for basic block sections, we treat each basic block
section similar to a function, except that basic block sections of the
same function must be encapsulated in the same structure so we can map
all of them to their single function.
We modify the encoding to first emit the number of basic block sections
(BB ranges) in the function. Then we emit the address map of each basic
block section as before: the base address of the section, its
number of blocks, and BB entries for its basic blocks. The first section
in the BB address map is always the function entry section.
| Description | Field |
|--|--|
| Number of sections for this function | NumBBRanges |
| Section 1 begin address | BaseAddress[1] |
| Number of basic blocks in section 1 | NumBlocks[1] |
| BB entries for Section 1 | |
| ... | |
| Section #NumBBRanges begin address | BaseAddress[NumBBRanges] |
| Number of basic blocks in section #NumBBRanges | NumBlocks[NumBBRanges] |
| BB entries for Section #NumBBRanges | |
The encoding of basic block entries remains as before with the minor
change that each basic block offset is now computed relative to the
begin symbol of its containing BB section.
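The layout described above can be sketched as an encoder over a flat list of integers. This is a simplified model, not the real section writer. The actual `SHT_LLVM_BB_ADDR_MAP` uses ULEB128 fields, a version header, and richer BB entries. Here each BB entry is reduced to a hypothetical (offset, size) pair, with the offset taken relative to its section's begin address as the text states.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Simplified model; field widths and BB entry contents are assumptions.
struct BasicBlock {
  uint64_t address; // absolute start address
  uint64_t size;
};

struct BBRange {
  uint64_t baseAddress;           // begin symbol of the BB section
  std::vector<BasicBlock> blocks; // blocks placed in this section
};

// Encode one function. ranges[0] must be the function entry section.
std::vector<uint64_t> encodeFunction(const std::vector<BBRange> &ranges) {
  std::vector<uint64_t> out;
  out.push_back(ranges.size()); // NumBBRanges
  for (const BBRange &r : ranges) {
    out.push_back(r.baseAddress);   // BaseAddress[i]
    out.push_back(r.blocks.size()); // NumBlocks[i]
    for (const BasicBlock &bb : r.blocks) {
      // Offsets are relative to the containing section rather than the
      // function start, so they stay valid when sections are not contiguous.
      out.push_back(bb.address - r.baseAddress);
      out.push_back(bb.size);
    }
  }
  return out;
}
```

A function split by MFS into a hot section at 0x1000 and a cold section at 0x5000 would encode as two ranges. The cold block's offset is then 0 rather than a large distance from the function start.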
This patch adds a new boolean codegen option `-basic-block-address-map`.
Correspondingly, the front-end flag `-fbasic-block-address-map` and LLD
flag `--lto-basic-block-address-map` are introduced.
Analogously, we add a new TargetOption field `BBAddrMap`. This means BB
address maps are either generated for all functions in the compiling
unit, or for none (depending on `TargetOptions::BBAddrMap`).
This patch keeps the functionality of the old
`-fbasic-block-sections=labels` option but does not remove it. A
subsequent patch will remove the obsolete option.
We refactor the `BasicBlockSections` pass by separating the BB address
map and BB sections handling into their own functions (named
`handleBBAddrMap` and `handleBBSections`). `handleBBSections` renumbers
basic blocks and places them in their assigned sections.
`handleBBAddrMap` is invoked after `handleBBSections` (if requested) and
only renumbers the blocks.
- New tests added:
- Two tests basic-block-address-map-with-basic-block-sections.ll and
basic-block-address-map-with-mfs.ll to exercise the combination of
`-basic-block-address-map` with `-basic-block-sections=list` and
`-split-machine-functions`.
- A driver sanity test for the `-fbasic-block-address-map` option
(basic-block-address-map.c).
- An LLD test for testing the `--lto-basic-block-address-map` option.
This reuses the LLVM IR from `lld/test/ELF/lto/basic-block-sections.ll`.
- Renamed and modified the two existing codegen tests for basic block
address map (`basic-block-sections-labels-functions-sections.ll` and
`basic-block-sections-labels.ll`)
- Removed `SHT_LLVM_BB_ADDR_MAP_V0` tests. Full deprecation of
`SHT_LLVM_BB_ADDR_MAP_V0` and `SHT_LLVM_BB_ADDR_MAP` version less than 2
will happen in a separate PR in a few months.
Port COFF's https://reviews.llvm.org/D78221 and
https://reviews.llvm.org/D137217 to ELF. For the in-process ThinLTO
link, `ld.lld --save-temps a.o d/b.o -o out` will create
ELF relocatable files `out.lto.a.o`/`d/out.lto.b.o` instead of
`out1.lto.o`/`out2.lto.o`. Deriving the LTO-generated relocatable file
name from bitcode file names helps debugging.
The relocatable file name from the first regular LTO partition does not
change: `out.lto.o`. The second, if present due to `--lto-partition=`,
changes from `out1.lto.o` to `lto.1.o`.
For an archive member, e.g. `d/a.a(coll.o at 8)`,
the relocatable file is `d/out.lto.a.a(coll.o at 8).o`.
`--lto-emit-asm` file names are changed similarly. `--lto-emit-asm -o
out` now creates `out.lto.s` instead of `out`, therefore the
`--lto-emit-asm -o -` idiom no longer works. However, I think this new
behavior (which matches COFF) is better since keeping or removing
`--lto-emit-asm` will dump different files, instead of overwriting the
`-o` output file from an executable/shared object to an assembly file.
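The naming rule for in-process ThinLTO relocatable files can be reconstructed from the examples given. The file is placed in the bitcode input's directory and named after the output and the input's file name. A trailing `.o` is dropped before `.o` is appended. This is a hypothetical sketch that reproduces only those examples. `ltoObjectName` is not LLD's function, and the real code may treat an `-o` path containing a directory, or member names containing `/`, differently.

```cpp
#include <cassert>
#include <string>

// Hypothetical reconstruction matching the examples in the text only.
std::string ltoObjectName(const std::string &output, const std::string &input) {
  // Split the bitcode input into directory (with trailing '/') and file name.
  std::string::size_type slash = input.rfind('/');
  std::string dir =
      slash == std::string::npos ? "" : input.substr(0, slash + 1);
  std::string base =
      slash == std::string::npos ? input : input.substr(slash + 1);
  // Drop a trailing ".o" so "a.o" maps to "out.lto.a.o", not "out.lto.a.o.o".
  if (base.size() >= 2 && base.compare(base.size() - 2, 2, ".o") == 0)
    base.resize(base.size() - 2);
  return dir + output + ".lto." + base + ".o";
}
```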
Reviewers: rnk, igorkudrin, xur-llvm, teresajohnson, ZequanWu
Reviewed By: teresajohnson
Pull Request: https://github.com/llvm/llvm-project/pull/78835
Based on https://reviews.llvm.org/D45375 . Introduce a new InputFile
kind `InternalKind`, use it for
* `ctx.internalFile`: for linker-defined symbols and some synthesized
`Undefined`
* `createInternalFile`: for symbol assignments and --defsym
I picked "internal" instead of "synthetic" to avoid confusion with
SyntheticSection.
Currently a symbol's file is one of: nullptr, ObjKind, SharedKind,
BitcodeKind, BinaryKind. Now it's non-null (I plan to add an
`assert(file)` to Symbol::Symbol and change `toString(const InputFile *)`
separately).
Debugging and error reporting get improved. The immediate user-facing
difference is a more descriptive "File" column in the --cref output. This
patch may unlock further simplification.
Currently, each symbol assignment gets its own
`createInternalFile(cmd->location)`, so two symbol assignments in a linker
script do not share the same file. Making the file the same would be
nice, but would require non-trivial code.
Discussion about this approach: https://discourse.llvm.org/t/rfc-safer-whole-program-class-hierarchy-analysis/65144/18
When enabling WPD in an environment where native binaries are present, types we want to optimize can be derived from inside these native files and devirtualizing them can lead to correctness issues. RTTI can be used as a way to determine all such types in native files and exclude them from WPD providing a safe checked way to enable WPD.
The approach is:
1. In the linker, identify if RTTI is available for all native types. If not, under `--lto-validate-all-vtables-have-type-infos` `--lto-whole-program-visibility` is automatically disabled. This is done by examining all .symtab symbols in object files and .dynsym symbols in DSOs for vtable (_ZTV) and typeinfo (_ZTI) symbols and ensuring there's always a match for every vtable symbol.
2. During thinlink, if `--lto-validate-all-vtables-have-type-infos` is set and RTTI is available for all native types, identify all typename (_ZTS) symbols via their corresponding typeinfo (_ZTI) symbols that are used natively or outside of our summary and exclude them from WPD.
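Step 1 amounts to pairing each vtable symbol with a typeinfo symbol for the same mangled type. In the Itanium ABI, `_ZTV<type>` and `_ZTI<type>` share the mangled type suffix. A minimal sketch of that check over a flat list of native symbol names follows. The input is assumed to be the union of object-file `.symtab` and DSO `.dynsym` names, and `vtablesWithoutTypeInfo` is a hypothetical name.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Returns the vtable symbols (_ZTV<type>) with no matching typeinfo symbol
// (_ZTI<type>). A non-empty result means RTTI is not available for all
// native types, so whole-program visibility would be disabled.
std::vector<std::string>
vtablesWithoutTypeInfo(const std::vector<std::string> &nativeSymbols) {
  std::set<std::string> typeInfos;
  for (const std::string &s : nativeSymbols)
    if (s.rfind("_ZTI", 0) == 0)    // starts with "_ZTI"
      typeInfos.insert(s.substr(4)); // keep the mangled type name
  std::vector<std::string> missing;
  for (const std::string &s : nativeSymbols)
    if (s.rfind("_ZTV", 0) == 0 && typeInfos.count(s.substr(4)) == 0)
      missing.push_back(s);
  return missing;
}
```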
Testing:
- ninja check-all
- A large Meta service that uses boost, glog and libstdc++.so runs successfully with WPD via --lto-whole-program-visibility. Previously, native types in boost caused incorrect devirtualization that led to crashes.
Reviewed By: MaskRay, tejohnson
Differential Revision: https://reviews.llvm.org/D155659
This will make it easy for callers to see issues with and fix up calls
to createTargetMachine after a future change to the params of
TargetMachine.
This matches other nearby enums.
For downstream users, this should be a fairly straightforward
replacement, e.g. s/CodeGenOpt::Aggressive/CodeGenOptLevel::Aggressive
or s/CGFT_/CodeGenFileType::
The unified LTO pipeline creates a single LTO bitcode structure that can
be used by Thin or Full LTO. This means that the LTO mode can be chosen
at link time and that all LTO bitcode produced by the pipeline is
compatible, from an optimization perspective. This makes the behavior of
LTO a bit more predictable by normalizing the set of LTO features
supported by each LTO bitcode file.
Example usage:
clang -flto -funified-lto -fuse-ld=lld foo.c
clang -flto=thin -funified-lto -fuse-ld=lld foo.c
clang -c -flto -funified-lto foo.c  # -flto={full,thin} are identical in terms of compilation actions
clang -flto=thin -fuse-ld=lld foo.o  # pass --lto=thin to ld.lld
clang -c -flto -funified-lto foo.c
clang -flto -fuse-ld=lld foo.o
The RFC discussing the details and rationale for this change is here:
https://discourse.llvm.org/t/rfc-a-unified-lto-bitcode-frontend/61774
Differential Revision: https://reviews.llvm.org/D123805
Currently, the --thinlto-prefix-replace="oldpath;newpath" option is used during
distributed ThinLTO thin links to specify the mapping of the input bitcode object
files' directory tree (oldpath) to the directory tree (newpath) used for both:
1) the output files of the thin link itself (the .thinlto.bc index files and the
optional .imports files)
2) the specified object file paths written to the response file given in the
--thinlto-index-only=${response} option, which is used by the final native
link and must match the paths of the native object files that will be
produced by ThinLTO backend compiles.
This patch expands the --thinlto-prefix-replace option to allow a separate directory
tree mapping to be specified for the object file paths written to the response file
(number 2 above). This is important to support builds and build systems where the
same output directory may not be written by multiple build actions (e.g. the thin link
and the ThinLTO backend compiles).
The new format is: --thinlto-prefix-replace="origpath;outpath[;objpath]"
This replaces the origpath directory tree of the thin link input files with
outpath when writing the thin link index and imports outputs (number 1
above). If objpath is specified it replaces origpath of the input files with
objpath when writing the response file (number 2 above), otherwise it
falls back to the old behavior of using outpath for this as well.
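The option semantics can be sketched as a parser plus a prefix substitution. `outpath` is applied to the index and imports outputs. `objpath` is applied to the response-file entries and falls back to `outpath` when omitted. `PrefixReplace`, `parsePrefixReplace`, and `replacePrefix` are hypothetical names, and LLD's actual parsing and error handling may differ.

```cpp
#include <cassert>
#include <string>

// Hypothetical parsed form of --thinlto-prefix-replace="orig;out[;obj]".
struct PrefixReplace {
  std::string orig, out, obj;
};

bool parsePrefixReplace(const std::string &arg, PrefixReplace &pr) {
  std::string::size_type first = arg.find(';');
  if (first == std::string::npos)
    return false;
  pr.orig = arg.substr(0, first);
  std::string::size_type second = arg.find(';', first + 1);
  if (second == std::string::npos) {
    pr.out = arg.substr(first + 1);
    pr.obj = pr.out; // old behavior: objpath falls back to outpath
  } else {
    pr.out = arg.substr(first + 1, second - first - 1);
    pr.obj = arg.substr(second + 1);
  }
  return true;
}

// Replace a leading `from` prefix of `path` with `to`; other paths unchanged.
std::string replacePrefix(const std::string &path, const std::string &from,
                          const std::string &to) {
  if (path.compare(0, from.size(), from) != 0)
    return path;
  return to + path.substr(from.size());
}
```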
Reviewed By: tejohnson, MaskRay
Differential Revision: https://reviews.llvm.org/D144596
Allow controlling the CodeGenOpt::Level independent of the LTO
optimization level in LLD via new options for the COFF, ELF, MachO, and
wasm frontends to lld. Most are spelled as --lto-CGO[0-3], but COFF is
spelled as -opt:lldltocgo=[0-3].
See D57422 for discussion surrounding the issue of how to set the CG opt
level. The ultimate goal is to let each function control its CG opt
level, but until then the current default means it is impossible to
specify a CG opt level lower than 2 while using LTO. This option gives
the user a means to control it for as long as it is not handled on a
per-function basis.
Reviewed By: MaskRay, #lld-macho, int3
Differential Revision: https://reviews.llvm.org/D141970
D138560 was abandoned because the use case can already be covered by `-Xoffload-linker --lto-emit-asm`.
However the output from `--lto-emit-asm` doesn't have
comments like the Clang `-S` output.
This patch adds verbose assembly output to LLD ELF LTO
so that the resulting assembly file more closely matches Clang's.
Having comments is especially important on targets such as AMDGPU because
they contain additional information about the kernel(s) being compiled.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D141268
MC and lld/ELF defaults were flipped in 2016. For Clang, the CMake option
ENABLE_X86_RELAX_RELOCATIONS has defaulted to on since 2020. It makes sense
for the TargetOptions default to be true now.
R_X86_64_GOTPCRELX/R_X86_64_REX_GOTPCRELX require GNU ld newer than 2015-10
(subsumed by the current requirement of -fbinutils-version=).
This should fix `rustc -Z plt=no` PIC relocatable files with GNU ld.
(See https://github.com/rust-lang/rust/pull/106380)
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated. The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.
This is part of an effort to migrate from llvm::Optional to
std::optional:
https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716