385 Commits

Author SHA1 Message Date
Andrew Ng
564c3de67d
[ThinLTO][NFC] Improve performance of addThinLTO (#166067)
Avoid the construction of `GUID` when not required. This improves the
performance of a LLD `--thinlto-index-only` link of `clang` by ~4-5% on
both Windows and Linux.
2025-11-03 11:10:44 +00:00
Kazu Hirata
042ac912b1
[llvm] Add "override" where appropriate (NFC) (#165168)
Note that "override" makes "virtual" redundant.

Identified with modernize-use-override.
2025-10-26 13:34:32 -07:00
Teresa Johnson
750a58337e
[ThinLTO] Simplify checking for single external copy (NFCI) (#164861)
Replace a loop over all summary copies with a simple check for a single
externally available copy of a symbol. The usage of this result has
changed since it was added and we now only need to know if there is a
single one.
2025-10-23 21:09:53 -07:00
Teresa Johnson
0341fb63f2
[ThinLTO] Avoid creating map entries on lookup (NFCI) (#164873)
We could inadvertently create new entries in the PrevailingModuleForGUID
map during lookup, which was always using operator[]. In most cases we
will have one for external symbols, but not in cases where the
prevailing copy is in a native object. Or if this happened to be looked
up for a local.

Make the map private and create and use accessors.
2025-10-23 19:40:42 -07:00
Andrew Ng
cf20a2685e
[DTLTO][Clang][LLD] Fix DTLTO for multi-call LLVM driver toolchain (#162456)
Add DTLTO linker option `--thinlto-remote-compiler-prepend-arg` to
enable support for the multi-call LLVM driver that requires an
additional option to specify the subcommand, e.g. "llvm clang ...".

Fixes https://github.com/llvm/llvm-project/issues/159125.
2025-10-23 16:34:28 +01:00
Teresa Johnson
25a915a91c
[ThinLTO] Add HasLocal flag to GlobalValueSummaryInfo (#164647)
Add a flag to the GlobalValueSummaryInfo indicating whether the
associated SummaryList (all summaries with the same GUID) contains any
summaries with local linkage. This flag is set when building the index,
so it is associated with the original linkage type before
internalization and promotion. Consumers should check the
withInternalizeAndPromote() flag on the index before using it.

In most cases we expect a 1-1 mapping between a GUID and a summary with
local linkage, because for locals the GUID is computed from the hash of
"modulepath;name". However, there can be multiple locals with the same
GUID if translation units are not compiled with enough path. And in rare
but theoretically possible cases, there can be hash collisions on the
underlying MD5 computation. So to be safe when looking for local
summaries, analyses currently look through all summaries in the list.
These lists can be extremely long in the case of large binaries with
template function defs in widely used headers (i.e. linkonce_odr).

A follow on change will use this flag to reduce ThinLTO analysis time in
WPD by 5-6% for a large target (details in PR164046 which will be
reworked to use this flag).

Note that in the past we have tried to keep bits related to the GUID in
the ValueInfo (which has a pointer to the associated
GlobalValueSummaryInfo), via its PointerIntPair. However, we are out of
bits there. This change does add a byte to every GlobalValueSummaryInfo
instance, which I measured as a little under 0.90% overhead in a large
target. However, it enables adding 7 bits of other per-GUID flags in the
future without adding more overhead. Note that it was lower overhead to
add this to the GlobalValueSummaryInfo than the ValueInfo, which tends
to be copied into other maps.
2025-10-22 11:33:58 -07:00
Teresa Johnson
eb74d8e03c
[ThinLTO] Add index flag for internalization/promotion status (#164530)
Add an index-wide flag indicating whether index-based internalization
and promotion have completed. This will be used in a follow on change.
2025-10-22 07:30:43 -07:00
Teresa Johnson
683e2bf059
[ThinLTO] Make SummaryList private (NFC) (#164355)
In preparation for a follow on change that will require checking every
time a new summary is added to the SummaryList for a GUID, make the
SummaryList private and require all accesses to go through one of two
new interfaces. Most changes are to access the list via the read only
getSummaryList() method, and the few that add new summaries (e.g. while
building the combined summary) use the new addSummary() method.
2025-10-21 06:53:40 -07:00
Teresa Johnson
2a7e7e2ac4
[MemProf] Convert removal of memprof attrs and metadata to a pass (#163841)
In preparation for a follow on fix that removes these attributes and
metadata in non-LTO pipelines, convert updateMemProfAttributes to a new
MemProfRemoveInfo pass that executes at the start of the LTO backend
pass pipelines when we don't have an index indicating that we linked
with a library support hot cold operator new.

This is largely NFC from an end user perspective but changes where the
removal can be observed, hence the test updates.

A follow on change will use the new pass for non-LTO pipelines (for
cases when the bitcode is initially matched with memprof data but we
decide to complete the compile without LTO).
2025-10-16 12:25:51 -07:00
Nicolai Hähnle
11a4b2d950
Cleanup the LLVM exported symbols namespace (#161240)
There's a pattern throughout LLVM of cl::opts being exported. That in
itself is probably a bit unfortunate, but what's especially bad about it
is that a lot of those symbols are in the global namespace. Move them
into the llvm namespace.

While doing this, I noticed some other variables in the global namespace
and moved them as well.
2025-10-01 15:32:07 -07:00
Tobias Stadler
dfbd76bda0
[Remarks] Restructure bitstream remarks to be fully standalone (#156715)
Currently there are two serialization modes for bitstream Remarks:
standalone and separate. The separate mode splits remark metadata (e.g.
the string table) from actual remark data. The metadata is written into
the object file by the AsmPrinter, while the remark data is stored in a
separate remarks file. This means we can't use bitstream remarks with
tools like opt that don't generate an object file. Also, it is confusing
to post-process bitstream remarks files, because only the standalone
files can be read by llvm-remarkutil. We always need to use dsymutil
to convert the separate files to standalone files, which only works for
MachO. It is not possible for clang/opt to directly emit bitstream
remark files in standalone mode, because the string table can only be
serialized after all remarks were emitted.

Therefore, this change completely removes the separate serialization
mode. Instead, the remark string table is now always written to the end
of the remarks file. This requires us to tell the serializer when to
finalize remark serialization. This automatically happens when the
serializer goes out of scope. However, often the remark file goes out of
scope before the serializer is destroyed. To diagnose this, I have added
an assert to alert users that they need to explicitly call
finalizeLLVMOptimizationRemarks.

This change paves the way for further improvements to the remark
infrastructure, including more tooling (e.g. #159784), size optimizations
for bitstream remarks, and more.

Pull Request: https://github.com/llvm/llvm-project/pull/156715
2025-09-22 16:41:39 +01:00
Alexandre Ganea
5cda2424c8
[LLD][COFF] Add more --time-trace tags for ThinLTO linking (#156471)
In order to better see what's going on during ThinLTO linking, this PR
adds more profile tags when using `--time-trace` on a `lld-link.exe`
invocation.

After PR, linking `clang.exe`:

<img width="3839" height="2026" alt="Capture d’écran 2025-09-02 082021"
src="https://github.com/user-attachments/assets/bf0c85ba-2f85-4bbf-a5c1-800039b56910"
/>

Linking a custom (Unreal Engine game) binary gives a completly
different picture, probably because of using Unity files, and the sheer
amount of input files (here, providing over 60 GB of .OBJs/.LIBs).

<img width="1940" height="1008" alt="Capture d’écran 2025-09-02 102048"
src="https://github.com/user-attachments/assets/60b28630-7995-45ce-9e8c-13f3cb5312e0"
/>
2025-09-05 15:28:19 -04:00
Matt Arsenault
3e5d8a1439 Reapply "RuntimeLibcalls: Generate table of libcall name lengths (#153… (#153864)
This reverts commit 334e9bf2dd01fbbfe785624c0de477b725cde6f2.

Check if llvm-nm exists before building the benchmark.
2025-08-16 09:53:50 +09:00
gulfemsavrun
334e9bf2dd
Revert "RuntimeLibcalls: Generate table of libcall name lengths (#153… (#153864)
…210)"

This reverts commit 9a14b1d254a43dc0d4445c3ffa3d393bca007ba3.

Revert "RuntimeLibcalls: Return StringRef for libcall names (#153209)"

This reverts commit cb1228fbd535b8f9fe78505a15292b0ba23b17de.

Revert "TableGen: Emit statically generated hash table for runtime
libcalls (#150192)"

This reverts commit 769a9058c8d04fc920994f6a5bbb03c8a4fbcd05.

Reverted three changes because of a CMake error while building llvm-nm
as reported in the following PR:
https://github.com/llvm/llvm-project/pull/150192#issuecomment-3192223073
2025-08-15 13:32:27 -07:00
Matt Arsenault
cb1228fbd5
RuntimeLibcalls: Return StringRef for libcall names (#153209)
Does not yet fully propagate this down into the TargetLowering
uses, many of which are relying on null checks on the returned
value.
2025-08-15 09:55:39 +09:00
Peter Collingbourne
6f2029dc4a LTO: Fix bug, Res was meant to be InputRes. 2025-07-30 14:30:00 -07:00
Peter Collingbourne
ff38981a58
LTO: Redesign the CFI !aliases metadata.
With the current aliases metadata we lose information about which groups
of aliases survive symbol resolution. This causes various problems such
as #150075 where symbol resolution breaks the link between alias groups.

In this redesign of the aliases metadata, we stop representing the
individual aliases in !aliases. Instead, the individual aliases are
represented in !cfi.functions in the same way as functions, and the
alias groups (i.e. groups of symbols with the same address) are stored
in !aliases. At symbol resolution time, we filter out all non-prevailing
members of !aliases; the resulting set is used by LowerTypeTests to
recreate the aliases.

With this change it is now possible for a jump table entry to refer
to an alias in one of the ThinLTO object files (e.g. if a function is
non-prevailing but its alias is prevailing), so instead of deleting them,
rename them with the ".cfi" suffix.

Fixes #150070.

Fixes #150075.

Reviewers: teresajohnson, vitalybuka

Reviewed By: vitalybuka

Pull Request: https://github.com/llvm/llvm-project/pull/150690
2025-07-30 14:04:11 -07:00
Vitaly Buka
b2d876a693
[LTO][NFC] Switch LTO API from output parameter to return value (#151272) 2025-07-30 09:32:38 -07:00
Kazu Hirata
72c0fc2305
[LTO] Remove an unnecessary cast (NFC) (#146275)
&I is already of const uint8_t *.
2025-06-29 19:19:38 -07:00
Matt Arsenault
b88e1f6a79
TableGen: Generate enum for runtime libcall implementations (#144973)
Work towards separating the ABI existence of libcalls vs. the
lowering selection. Set libcall selection through enums, rather
than through raw string names.
2025-06-27 17:40:43 +09:00
Jeremy Morse
97ac6483aa
[DebugInfo][RemoveDIs] Delete debug-info-format flag (#143746)
This flag was used to let us incrementally introduce debug records
into LLVM, however everything is now using records. It serves no
purpose now, so delete it.
2025-06-12 11:51:58 +01:00
Jeremy Morse
0e4b8b8f81
[DebugInfo][RemoveDIs] Rip out the UseNewDbgInfoFormat flag (#143207)
Start removing debug intrinsics support -- starting with the flag that
controls production of their replacement, debug records. This patch
removes the command-line-flag and with it the ability to switch back to
intrinsics. The module / function / block level "IsNewDbgInfoFormat"
flags get hardcoded to true, I'll to incrementally remove things that
depend on those flags.
2025-06-09 19:36:34 +01:00
Andrew Rogers
1a435522c0
[llvm] annotate interfaces in llvm/LTO for DLL export (#142499)
## Purpose

This patch is one in a series of code-mods that annotate LLVM’s public
interface for export. This patch annotates the `llvm/LTO` library. These
annotations currently have no meaningful impact on the LLVM build;
however, they are a prerequisite to support an LLVM Windows DLL (shared
library) build.

## Background

This effort is tracked in #109483. Additional context is provided in
[this
discourse](https://discourse.llvm.org/t/psa-annotating-llvm-public-interface/85307),
and documentation for `LLVM_ABI` and related annotations is found in the
LLVM repo
[here](https://github.com/llvm/llvm-project/blob/main/llvm/docs/InterfaceExportAnnotations.rst).

The bulk of these changes were generated automatically using the
[Interface Definition Scanner (IDS)](https://github.com/compnerd/ids)
tool, followed formatting with `git clang-format`.

The following manual adjustments were also applied after running IDS on
Linux:
- Add `LLVM_ABI` to a small number of symbols that require export but
are not declared in headers

## Validation

Local builds and tests to validate cross-platform compatibility. This
included llvm, clang, and lldb on the following configurations:

- Windows with MSVC
- Windows with Clang
- Linux with GCC
- Linux with Clang
- Darwin with Clang
2025-06-03 09:53:57 -07:00
Kazu Hirata
64e89353b2
[LTO] Remove unused includes (NFC) (#141355)
These are identified by misc-include-cleaner.  I've filtered out those
that break builds.  Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.
2025-05-24 09:37:37 -07:00
bd1976bris
6520b21ce0
[DTLTO][LLVM] Integrated Distributed ThinLTO (DTLTO) (#127749)
This patch adds initial support for Integrated Distributed ThinLTO
(DTLTO) in LLVM, which manages distribution internally during the
traditional link step. This enables compatibility with any build
system that supports in-process ThinLTO. In contrast, existing
approaches to distributed ThinLTO, which split the thin-link
(--thinlto-index-only), backend compilation, and final link into
separate steps, require build system support, e.g. Bazel.

This patch implements the core DTLTO mechanism, which enables
delegation of ThinLTO backend jobs to an external process (the
distributor). The distributor can then manage job distribution through
systems like Incredibuild. A generic JSON interface is used to
communicate with the distributor, allowing for the creation of new
distributors (and thus integration with different distribution
systems) without modifying LLVM.

Please see llvm/docs/dtlto.rst for more details.

RFC: https://discourse.llvm.org/t/rfc-integrated-distributed-thinlto/69641
Design Review: https://github.com/llvm/llvm-project/pull/126654
2025-05-23 20:07:53 +01:00
Kazu Hirata
aa15596b5f
[llvm] Remove unused local variables (NFC) (#138478) 2025-05-04 21:33:54 -07:00
Owen Rodley
d3d856ad84
Clean up external users of GlobalValue::getGUID(StringRef) (#129644)
See https://discourse.llvm.org/t/rfc-keep-globalvalue-guids-stable/84801
for context.

This is a non-functional change which just changes the interface of
GlobalValue, in preparation for future functional changes. This part
touches a fair few users, so is split out for ease of review. Future
changes to the GlobalValue implementation can then be focused purely on
that class.

This does the following:

* Rename GlobalValue::getGUID(StringRef) to
  getGUIDAssumingExternalLinkage. This is simply making explicit at the
  callsite what is currently implicit.
* Where possible, migrate users to directly calling getGUID on a
  GlobalValue instance.
* Otherwise, where possible, have them call the newly renamed
  getGUIDAssumingExternalLinkage, to make the assumption explicit.


There are a few cases where neither of the above are possible, as the
caller saves and reconstructs the necessary information to compute the
GUID themselves. We want to migrate these callers eventually, but for
this first step we leave them be.
2025-04-28 11:09:43 +10:00
Shilei Tian
b0fede358f
[ThinLTO] Don't convert functions to declarations if force-import-all is enabled (#134541)
On one hand, we intend to force import all functions when the option is
enabled.
On the other hand, we currently drop definitions of some functions and
convert
them to declarations, which contradicts this intent.

With this PR, functions will no longer be converted to declarations when
`force-import-all` is enabled.
2025-04-12 18:22:22 -04:00
Kazu Hirata
fae34938f6
[llvm] Use *Set::insert_range (NFC) (#132591)
DenseSet, SmallPtrSet, SmallSet, SetVector, and StringSet recently
gained C++23-style insert_range.  This patch uses insert_range with
iterator ranges.  For each case, I've verified that foos is defined as
make_range(foo_begin(), foo_end()) or in a similar manner.
2025-03-22 22:14:45 -07:00
Mingming Liu
d99033e4b4
[LTO][WPD] Suppress WPD on a class if the LTO unit doesn't have the prevailing definition of this class (#131721)
Before this patch, whole program devirtualization is suppressed on a
class if any superclass is visible to regular object files, by recording
the class GUID in `VisibleToRegularObjSymbols`.

This patch suppresses whole program devirtualization on a class if the
LTO unit doesn't have the prevailing definition of this class (e.g., the
prevailing definition is in a shared library)

Implementation summaries:
1. In llvm/lib/LTO/LTO.cpp, `IsVisibleToRegularObj` is updated to look
at the global resolution's `IsPrevailing` bit for ThinLTO and
regularLTO.
2. In llvm/tools/llvm-lto2/llvm-lto2.cpp, 
- three command line options are added so `llvm-lto2` can override
`Conf.HasWholeProgramVisibility`, `Conf.ValidateAllVtablesHaveTypeInfos`
and `Conf.AllVtablesHaveTypeInfos`.
    
The test case is reduced from a small C++ program (main.cc, lib.cc/h
pasted below in [1]). To reproduce the program failure without this
patch, compile lib.cc into a shared library, and provide it to a ThinLTO
build of main.cc (commands are pasted in [2]).

[1]

* lib.h

```
#include <cstdio>

class Derived {
 public:
  void dispatch();
  virtual void print();
  virtual void sum();
};

void Derived::dispatch() {
  static_cast<Derived*>(this)->print();
  static_cast<Derived*>(this)->sum();
}

void Derived::sum() {
  printf("Derived::sum\n");
}

__attribute__((noinline)) void* create(int i);
__attribute__((noinline)) void* getPtr(int i);

```

* lib.cc

```
#include "lib.h"
#include <cstdio>
#include <iostream>

class Derived2 : public Derived {
public:
  void print() override {
    printf("DerivedSharedLib\n");
  }
  void sum() override {
    printf("DerivedSharedLib::sum\n");
  }
};


void Derived::print() {
  printf("Derived\n");
}

__attribute__((noinline)) void* create(int i) {
  if (i & 1)
    return new Derived2();
  return new Derived();
}

```

* main.cc

```
cat main.cc
#include "lib.h"

class DerivedN : public Derived {
 public:
};

__attribute__((noinline)) void* getPtr(int x) {
  return new DerivedN();
}


int main() {
  Derived*b = static_cast<Derived*>(create(201));
  b->dispatch();
  delete b;

  Derived* a = static_cast<Derived*>(getPtr(202));
  a->dispatch();
  delete a;
  return 0;
}

```

[2]
```
# compile lib.o in a shared library.
$ ./bin/clang++ -O2 -fPIC -c lib.cc -o lib.o
$ ./bin/clang++ -shared -o libdata.so lib.o

# Provide the shared library in `-ldata`
$ ./bin/clang++ -v -g -ldata --save-temps -fno-discard-value-names -Wl,-mllvm,-print-before=wholeprogramdevirt -Wl,-mllvm,-wholeprogramdevirt-check=trap -Rpass=wholeprogramdevirt -Wl,--lto-whole-program-visibility -Wl,--lto-validate-all-vtables-have-type-infos -mllvm -disable-icp=true -Wl,-mllvm,-disable-icp=false -flto=thin -fwhole-program-vtables  -fno-split-lto-unit -fuse-ld=lld   main.cc  -L . -o main >/tmp/wholeprogramdevirt.ir 2>&1

# Run the program hits a segmentation fault with  `-Wl,-mllvm,-wholeprogramdevirt-check=trap`

$ LD_LIBRARY_PATH=. ./main
DerivedSharedLib
Trace/breakpoint trap (core dumped)

```
2025-03-19 22:10:57 -07:00
Vitaly Buka
7dd5f23279
[NFC][LTO] Move GUID calculation into CfiFunctionIndex (#130370)
Preparation for CFI Index refactoring,
which will fix O(N^2) in ThinLTO indexing.
2025-03-07 18:59:48 -08:00
Nikita Popov
979c275097
[IR] Store Triple in Module (NFC) (#129868)
The module currently stores the target triple as a string. This means
that any code that wants to actually use the triple first has to
instantiate a Triple, which is somewhat expensive. The change in #121652
caused a moderate compile-time regression due to this. While it would be
easy enough to work around, I think that architecturally, it makes more
sense to store the parsed Triple in the module, so that it can always be
directly queried.

For this change, I've opted not to add any magic conversions between
std::string and Triple for backwards-compatibilty purses, and instead
write out needed Triple()s or str()s explicitly. This is because I think
a decent number of them should be changed to work on Triple as well, to
avoid unnecessary conversions back and forth.

The only interesting part in this patch is that the default triple is
Triple("") instead of Triple() to preserve existing behavior. The former
defaults to using the ELF object format instead of unknown object
format. We should fix that as well.
2025-03-06 10:27:47 +01:00
Rahul Joshi
6924fc0326
[LLVM] Add Intrinsic::getDeclarationIfExists (#112428)
Add `Intrinsic::getDeclarationIfExists` to lookup an existing
declaration of an intrinsic in a `Module`.
2024-10-16 07:21:10 -07:00
Kyungwoo Lee
e547d041fa Fix build failure for [CGData][ThinLTO] Global Outlining with Two-CodeGen Rounds (#90933) 2024-10-09 16:09:22 -07:00
Kyungwoo Lee
dc85d5263e
[CGData][ThinLTO] Global Outlining with Two-CodeGen Rounds (#90933)
This feature is enabled by `-codegen-data-thinlto-two-rounds`, which
effectively runs the `-codegen-data-generate` and `-codegen-data-use` in
two rounds to enable global outlining with ThinLTO.
1. The first round: Run both optimization + codegen with a scratch
output.
Before running codegen, we serialize the optimized bitcode modules to a
temporary path.
 2. From the scratch object files, we merge them into the codegen data.
3. The second round: Read the optimized bitcode modules and start the
codegen only this time.
Using the codegen data, the machine outliner effectively performs the
global outlining.
 
Depends on #90934, #110461 and  #110463.
This is a patch for
https://discourse.llvm.org/t/rfc-enhanced-machine-outliner-part-2-thinlto-nolto/78753.
2024-10-09 15:37:41 -07:00
Kyungwoo Lee
1b53aaec55
[ThinLTO][NFC] Refactor ThinBackend (#110461)
This is a prep for https://github.com/llvm/llvm-project/pull/90933.
 
 - Change `ThinBackend` from a function to a type.
 - Store the parallelism level in the type, which will be used when creating two-codegen round backends that inherit this value.
 - `ThinBackendProc` is hoisted to `LTO.h` from `LTO.cpp` to provide its body for `ThinBackend`. However, `emitFiles()` is still implemented separately in `LTO.cpp`, distinct from its parent class.
2024-10-07 22:24:04 -07:00
Nuri Amari
2edd897a42
Make WriteIndexesThinBackend multi threaded (#109847)
We've noticed that for large builds executing thin-link can take on the
order of 10s of minutes. We are only using a single thread to write the
sharded indices and import files for each input bitcode file. While we
need to ensure the index file produced lists modules in a deterministic
order, that doesn't prevent us from executing the rest of the work in
parallel.

In this change we use a thread pool to execute as much of the backend's
work as possible in parallel. In local testing on a machine with 80
cores, this change makes a thin-link for ~100,000 input files run in ~2
minutes. Without this change it takes upwards of 10 minutes.

---------

Co-authored-by: Nuri Amari <nuriamari@fb.com>
2024-10-07 08:16:46 -07:00
Kyungwoo Lee
ed59d571f2
[ThinLTO][NFC] Refactor FileCache (#110463)
This is a prep for https://github.com/llvm/llvm-project/pull/90933.
 
  - Change `FileCache` from a function to a type.
  - Store the cache directory in the type, which will be used when creating additional caches for two-codegen round runs that inherit this value.
2024-10-04 07:50:28 -07:00
Kyungwoo Lee
c1959813d6
[CGData][ThinLTO][NFC] Prep for two-codegen rounds (#90934)
This is NFC for https://github.com/llvm/llvm-project/pull/90933.

- Create a lambda function, `RunBackends`, to group the backend
operations into a single function.
- Explicitly pass the `CodeGenOnly` argument to thinBackend, instead of
depending on a configuration value.

Depends on https://github.com/llvm/llvm-project/pull/90304.
This is a patch for
https://discourse.llvm.org/t/rfc-enhanced-machine-outliner-part-2-thinlto-nolto/78753.
2024-10-03 09:58:01 -07:00
Kazu Hirata
3dad29b677
[LTO] Remove unused includes (NFC) (#108110)
clangd reports these as unused headers.  My manual inspection agrees
with the findings.
2024-09-10 19:36:04 -07:00
Mingming Liu
09b231cb38
Re-apply "[NFCI][LTO][lld] Optimize away symbol copies within LTO global resolution in ELF" (#107792)
Fix the use-after-free bug and re-apply
https://github.com/llvm/llvm-project/pull/106193
* Without the fix, the string referenced by `objSym.Name` could be
destroyed even if string saver keeps a copy of the referenced string.
This caused use-after-free.
* The fix ([latest
commit](9776ed44cf))
updates `objSym.Name` to reference (via `StringRef`) the string saver's
copy.

Test:
1. For `lld/test/ELF/lto/asmundef.ll`, its test failure is reproducible
with `-DLLVM_USE_SANITIZER=Address` and gone with the fix.
3. Run all tests by following
https://github.com/google/sanitizers/wiki/SanitizerBotReproduceBuild#try-local-changes.
* Without the fix, `ELF/lto/asmundef.ll` aborted the multi-stage test at
`@@@BUILD_STEP stage2/asan_ubsan check@@@`, defined
[here](https://github.com/llvm/llvm-zorg/blob/main/zorg/buildbot/builders/sanitizers/buildbot_fast.sh#L30)
* With the fix, the [multi-stage
test](https://github.com/llvm/llvm-zorg/blob/main/zorg/buildbot/builders/sanitizers/buildbot_fast.sh)
pass stage2 {asan, ubsan, masan}. This is also the test used by
https://lab.llvm.org/buildbot/#/builders/169


**Original commit message**

`StringMap<T>` creates a [copy of the
string](d4c519e7b2/llvm/include/llvm/ADT/StringMapEntry.h (L55-L58))
for entry insertions and intentionally keep copies [since the
implementation optimizes string memory
usage](d4c519e7b2/llvm/include/llvm/ADT/StringMap.h (L124)).
On the other hand, linker keeps copies of symbol names [1] in
`lld:🧝:parseFiles` [2] before invoking `compileBitcodeFiles` [3].

This change proposes to optimize away string copies inside
[LTO::GlobalResolutions](24e791b416/llvm/include/llvm/LTO/LTO.h (L409)),
which will make LTO indexing more memory efficient for ELF. There are
similar opportunities for other (COFF, wasm, MachO) formats.

The optimization takes place for lld (ELF) only. For the rest of use
cases (gold plugin, `llvm-lto2`, etc), LTO owns a string saver to keep
copies and use global resolution key for de-duplication.

Together with @kazutakahirata's work to make `ComputeCrossModuleImport`
more memory efficient, we see a ~20% peak memory usage reduction in a
binary where peak memory usage needs to go down. Thanks to the
optimization in
329ba523cc,
the max (as opposed to the sum) of `ComputeCrossModuleImport` or
`GlobalResolution` shows up in peak memory usage.
* Regarding correctness, the set of
[resolved](80c47ad3ae/llvm/lib/LTO/LTO.cpp (L739))
[per-module
symbols](80c47ad3ae/llvm/include/llvm/LTO/LTO.h (L188-L191))
is a subset of
[llvm::lto::InputFile::Symbols](80c47ad3ae/llvm/include/llvm/LTO/LTO.h (L120)).
And bitcode symbol parsing saves symbol name when iterating
`obj->symbols` in `BitcodeFile::parse` already. This change updates
`BitcodeFile::parseLazy` to keep copies of per-module undefined symbols.
* Presumably the undefined symbols in a LTO unit (copied in this patch
in linker unique saver) is a small set compared with the set of symbols
in global-resolution (copied before this patch), making this a
worthwhile trade-off. Benchmarking this change alone shows measurable
memory savings across various benchmarks.

[1] ELF
1cea5c2138/lld/ELF/InputFiles.cpp (L1748)
[2]
ef7b18a53c/lld/ELF/Driver.cpp (L2863)
[3]
ef7b18a53c/lld/ELF/Driver.cpp (L2995)
2024-09-09 11:16:58 -07:00
Mingming Liu
1cc4c87198
Revert "[NFCI][LTO][lld] Optimize away symbol copies within LTO global resolution in ELF" (#107788)
Reverts llvm/llvm-project#106193 while investigating bot failures
https://lab.llvm.org/buildbot/#/builders/169/builds/2989/steps/9/logs/stdio
2024-09-08 16:45:59 -07:00
Mingming Liu
9ade4e2646
[NFCI][LTO][lld] Optimize away symbol copies within LTO global resolution in ELF (#106193)
`StringMap<T>` creates a [copy of the
string](d4c519e7b2/llvm/include/llvm/ADT/StringMapEntry.h (L55-L58))
for entry insertions and intentionally keep copies [since the
implementation optimizes string memory
usage](d4c519e7b2/llvm/include/llvm/ADT/StringMap.h (L124)).
On the other hand, linker keeps copies of symbol names [1] in
`lld:🧝:parseFiles` [2] before invoking `compileBitcodeFiles` [3].

This change proposes to optimize away string copies inside
[LTO::GlobalResolutions](24e791b416/llvm/include/llvm/LTO/LTO.h (L409)),
which will make LTO indexing more memory efficient for ELF. There are
similar opportunities for other (COFF, wasm, MachO) formats.

The optimization takes place for lld (ELF) only. For the rest of use
cases (gold plugin, `llvm-lto2`, etc), LTO owns a string saver to keep
copies and use global resolution key for de-duplication.

Together with @kazutakahirata's work to make `ComputeCrossModuleImport`
more memory efficient, we see a ~20% peak memory usage reduction in a
binary where peak memory usage needs to go down. Thanks to the
optimization in
329ba523cc,
the max (as opposed to the sum) of `ComputeCrossModuleImport` or
`GlobalResolution` shows up in peak memory usage.
* Regarding correctness, the set of
[resolved](80c47ad3ae/llvm/lib/LTO/LTO.cpp (L739))
[per-module
symbols](80c47ad3ae/llvm/include/llvm/LTO/LTO.h (L188-L191))
is a subset of
[llvm::lto::InputFile::Symbols](80c47ad3ae/llvm/include/llvm/LTO/LTO.h (L120)).
And bitcode symbol parsing saves symbol name when iterating
`obj->symbols` in `BitcodeFile::parse` already. This change updates
`BitcodeFile::parseLazy` to keep copies of per-module undefined symbols.
* Presumably the undefined symbols in a LTO unit (copied in this patch
in linker unique saver) is a small set compared with the set of symbols
in global-resolution (copied before this patch), making this a
worthwhile trade-off. Benchmarking this change alone shows measurable
memory savings across various benchmarks.

[1] ELF
1cea5c2138/lld/ELF/InputFiles.cpp (L1748)
[2]
ef7b18a53c/lld/ELF/Driver.cpp (L2863)
[3]
ef7b18a53c/lld/ELF/Driver.cpp (L2995)
2024-09-08 14:52:03 -07:00
Mingming Liu
d4ddf06b0c
[NFCI]Remove EntryCount from FunctionSummary and clean up surrounding synthetic count passes. (#107471)
The primary motivation is to remove `EntryCount` from `FunctionSummary`.
This frees 8 bytes out of `sizeof(FunctionSummary)` (136 bytes as of
64498c5483).

While I'm at it, this PR clean up {SummaryBasedOptimizations,
SyntheticCountsPropagation} since they were not used and there are no
plans to further invest on them.

With this patch, bitcode writer writes a placeholder 0 at the byte
offset of `EntryCount` and bitcode reader can parse the function entry
count at the correct byte offset. Added a TODO to stop writing
`EntryCount` and bump bitcode version
2024-09-06 16:38:17 -07:00
Nick Sarnie
fedc7556ad
[ThinLTO] Don't always print ModulesToCompile debugging information (#106769)
Nothing went wrong in this case, we just successfully matched a module
by identifier. No need to print to std::error like we would for
something that should be user-visible.

Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
2024-09-03 07:50:23 -07:00
Kazu Hirata
5c0d61e318
[LTO] Reduce memory usage for import lists (#106772)
This patch reduces the memory usage for import lists by employing
memory-efficient data structures.

With this patch, an import list for a given destination module is
basically DenseSet<uint32_t> with each element indexing into the
deduplication table containing tuples of:

  {SourceModule, GUID, Definition/Declaration}

In one of our large applications, the peak memory usage goes down by
9.2% from 6.120GB to 5.555GB during the LTO indexing step.

This patch addresses several sources of space inefficiency associated
with std::unordered_map:

- std::unordered_map<GUID, ImportKind> takes up 16 bytes because of
  padding even though ImportKind only carries one bit of information.

- std::unordered_map uses pointers to elements, both in the hash table
  proper and for collision chains.

- We allocate an instance of std::unordered_map for each
  {Destination Module, Source Module} pair for which we have at least
  one import.  Most import lists have less than 10 imports, so the
  metadata like the size of std::unordered_map and the pointer to the
  hash table costs a lot relative to the actual contents.
2024-09-01 08:36:06 -07:00
Kazu Hirata
4f15039cf2
[LTO] Introduce new type alias ImportListsTy (NFC) (#106420)
The background is as follows.  I'm planning to reduce the memory
footprint of ThinLTO indexing by changing ImportMapTy, the data
structure used for an import list.  Once this patch lands, I'm
planning to change the type slightly.  The new type alias allows us to
update the type without touching many places.
2024-08-28 10:42:12 -07:00
Kazu Hirata
dbd7ce0ccd
[IR] Inroduce ModuleToSummariesForIndexTy (NFC) (#105906)
This patch introduces type alias ModuleToSummariesForIndexTy.

I'm planning to change the type slightly to allow heterogeneous lookup
(that is, std::map<K, V, std::less<>>) in a subsequent patch.  The
problem is that changing the type affects many places.  Using a type
alias reduces the impact.
2024-08-23 17:32:52 -07:00
Kazu Hirata
3563907969
[LTO] Turn ImportMapTy into a proper class (NFC) (#105748)
This patch turns type alias ImportMapTy into a proper class to provide
a more intuitive interface like:

  ImportList.addDefinition(...)

as opposed to:

  FunctionImporter::addDefinition(ImportList, ...)

Also, this patch requires all non-const accesses to go through
addDefinition, maybeAddDeclaration, and addGUID while providing const
accesses via:

  const ImportMapTyImpl &getImportMap() const { return ImportMap; }

I realize ImportMapTy may not be the best name as a class (maybe OK as
a type alias).  I am not renaming ImportMapTy in this patch at least
because there are 47 mentions of ImportMapTy under llvm/.
2024-08-22 21:56:01 -07:00
Kazu Hirata
5ddc79b093
[LTO] Use a range-based for loop (NFC) (#105467) 2024-08-21 07:23:30 -07:00