983 Commits

Author SHA1 Message Date
Kazu Hirata
3dad29b677
[LTO] Remove unused includes (NFC) (#108110)
clangd reports these as unused headers.  My manual inspection agrees
with the findings.
2024-09-10 19:36:04 -07:00
Mingming Liu
09b231cb38
Re-apply "[NFCI][LTO][lld] Optimize away symbol copies within LTO global resolution in ELF" (#107792)
Fix the use-after-free bug and re-apply
https://github.com/llvm/llvm-project/pull/106193
* Without the fix, the string referenced by `objSym.Name` could be
destroyed even if string saver keeps a copy of the referenced string.
This caused use-after-free.
* The fix ([latest
commit](9776ed44cf))
updates `objSym.Name` to reference (via `StringRef`) the string saver's
copy.
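
The lifetime issue can be sketched in standard C++, assuming `std::string_view` in place of `StringRef` and a small deque-backed class in place of the string saver. `SaverSketch`, `SymSketch` and `internName` are hypothetical names for illustration, not lld code.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <string_view>

// Hypothetical stand-in for a string saver: it owns copies of strings and
// hands out views that stay valid for the saver's lifetime.
class SaverSketch {
  // std::deque does not relocate existing elements on emplace_back, so
  // views into earlier strings remain valid.
  std::deque<std::string> Storage;

public:
  std::string_view save(std::string_view S) {
    Storage.emplace_back(S);
    return Storage.back();
  }
};

// Hypothetical symbol record holding a non-owning name.
struct SymSketch {
  std::string_view Name;
};

// Re-point the symbol's name at the saver-owned copy so that it no longer
// depends on the buffer it was first read from.
inline void internName(SymSketch &Sym, SaverSketch &Saver) {
  Sym.Name = Saver.save(Sym.Name);
}
```

Saving a copy is not enough on its own: the view held by the symbol has to be updated to point at that copy, which is what `internName` does here.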

Test:
1. For `lld/test/ELF/lto/asmundef.ll`, its test failure is reproducible
with `-DLLVM_USE_SANITIZER=Address` and gone with the fix.
2. Run all tests by following
https://github.com/google/sanitizers/wiki/SanitizerBotReproduceBuild#try-local-changes.
* Without the fix, `ELF/lto/asmundef.ll` aborted the multi-stage test at
`@@@BUILD_STEP stage2/asan_ubsan check@@@`, defined
[here](https://github.com/llvm/llvm-zorg/blob/main/zorg/buildbot/builders/sanitizers/buildbot_fast.sh#L30)
* With the fix, the [multi-stage
test](https://github.com/llvm/llvm-zorg/blob/main/zorg/buildbot/builders/sanitizers/buildbot_fast.sh)
passes stage2 {asan, ubsan, msan}. This is also the test used by
https://lab.llvm.org/buildbot/#/builders/169


**Original commit message**

`StringMap<T>` creates a [copy of the
string](d4c519e7b2/llvm/include/llvm/ADT/StringMapEntry.h (L55-L58))
for entry insertions and intentionally keeps copies [since the
implementation optimizes string memory
usage](d4c519e7b2/llvm/include/llvm/ADT/StringMap.h (L124)).
On the other hand, the linker keeps copies of symbol names [1] in
`lld::elf::parseFiles` [2] before invoking `compileBitcodeFiles` [3].

This change proposes to optimize away string copies inside
[LTO::GlobalResolutions](24e791b416/llvm/include/llvm/LTO/LTO.h (L409)),
which will make LTO indexing more memory efficient for ELF. There are
similar opportunities for other (COFF, wasm, MachO) formats.

The optimization takes place for lld (ELF) only. For the remaining use
cases (gold plugin, `llvm-lto2`, etc.), LTO owns a string saver to keep
copies and uses the global resolution key for de-duplication.

Together with @kazutakahirata's work to make `ComputeCrossModuleImport`
more memory efficient, we see a ~20% peak memory usage reduction in a
binary where peak memory usage needs to go down. Thanks to the
optimization in
329ba523cc,
the max (as opposed to the sum) of `ComputeCrossModuleImport` or
`GlobalResolution` shows up in peak memory usage.
* Regarding correctness, the set of
[resolved](80c47ad3ae/llvm/lib/LTO/LTO.cpp (L739))
[per-module
symbols](80c47ad3ae/llvm/include/llvm/LTO/LTO.h (L188-L191))
is a subset of
[llvm::lto::InputFile::Symbols](80c47ad3ae/llvm/include/llvm/LTO/LTO.h (L120)).
And bitcode symbol parsing saves symbol name when iterating
`obj->symbols` in `BitcodeFile::parse` already. This change updates
`BitcodeFile::parseLazy` to keep copies of per-module undefined symbols.
* Presumably the undefined symbols in an LTO unit (copied in this patch
into the linker's unique saver) are a small set compared with the set of symbols
in global-resolution (copied before this patch), making this a
worthwhile trade-off. Benchmarking this change alone shows measurable
memory savings across various benchmarks.

[1] ELF
1cea5c2138/lld/ELF/InputFiles.cpp (L1748)
[2]
ef7b18a53c/lld/ELF/Driver.cpp (L2863)
[3]
ef7b18a53c/lld/ELF/Driver.cpp (L2995)
2024-09-09 11:16:58 -07:00
Mingming Liu
1cc4c87198
Revert "[NFCI][LTO][lld] Optimize away symbol copies within LTO global resolution in ELF" (#107788)
Reverts llvm/llvm-project#106193 while investigating bot failures
https://lab.llvm.org/buildbot/#/builders/169/builds/2989/steps/9/logs/stdio
2024-09-08 16:45:59 -07:00
Mingming Liu
9ade4e2646
[NFCI][LTO][lld] Optimize away symbol copies within LTO global resolution in ELF (#106193)
`StringMap<T>` creates a [copy of the
string](d4c519e7b2/llvm/include/llvm/ADT/StringMapEntry.h (L55-L58))
for entry insertions and intentionally keeps copies [since the
implementation optimizes string memory
usage](d4c519e7b2/llvm/include/llvm/ADT/StringMap.h (L124)).
On the other hand, the linker keeps copies of symbol names [1] in
`lld::elf::parseFiles` [2] before invoking `compileBitcodeFiles` [3].

This change proposes to optimize away string copies inside
[LTO::GlobalResolutions](24e791b416/llvm/include/llvm/LTO/LTO.h (L409)),
which will make LTO indexing more memory efficient for ELF. There are
similar opportunities for other (COFF, wasm, MachO) formats.

The optimization takes place for lld (ELF) only. For the remaining use
cases (gold plugin, `llvm-lto2`, etc.), LTO owns a string saver to keep
copies and uses the global resolution key for de-duplication.
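
The two modes can be sketched in standard C++, assuming `std::string_view` keys in place of `StringRef` and a deque in place of the string saver. `ResolutionTableSketch` and its members are hypothetical names, not the `LTO::GlobalResolutions` interface.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <string_view>
#include <unordered_map>

// Hypothetical resolution table keyed by non-owning names.
class ResolutionTableSketch {
  std::unordered_map<std::string_view, int> Map;
  std::deque<std::string> Owned; // used only when the table must own names
  bool CallerOwnsNames;

public:
  explicit ResolutionTableSketch(bool CallerOwnsNames)
      : CallerOwnsNames(CallerOwnsNames) {}

  // Returns the counter for Name, inserting it on first use.
  int &get(std::string_view Name) {
    auto It = Map.find(Name);
    if (It != Map.end())
      return It->second; // already present, so no new copy is made
    if (!CallerOwnsNames) {
      // The caller gives no lifetime guarantee: keep one copy per new key.
      Owned.emplace_back(Name);
      Name = Owned.back();
    }
    return Map.emplace(Name, 0).first->second;
  }

  std::size_t ownedCopies() const { return Owned.size(); }
};
```

Looking the key up before saving it is what gives the de-duplication: a name is copied at most once, and not at all when the caller already keeps it alive.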

Together with @kazutakahirata's work to make `ComputeCrossModuleImport`
more memory efficient, we see a ~20% peak memory usage reduction in a
binary where peak memory usage needs to go down. Thanks to the
optimization in
329ba523cc,
the max (as opposed to the sum) of `ComputeCrossModuleImport` or
`GlobalResolution` shows up in peak memory usage.
* Regarding correctness, the set of
[resolved](80c47ad3ae/llvm/lib/LTO/LTO.cpp (L739))
[per-module
symbols](80c47ad3ae/llvm/include/llvm/LTO/LTO.h (L188-L191))
is a subset of
[llvm::lto::InputFile::Symbols](80c47ad3ae/llvm/include/llvm/LTO/LTO.h (L120)).
And bitcode symbol parsing saves symbol name when iterating
`obj->symbols` in `BitcodeFile::parse` already. This change updates
`BitcodeFile::parseLazy` to keep copies of per-module undefined symbols.
* Presumably the undefined symbols in an LTO unit (copied in this patch
into the linker's unique saver) are a small set compared with the set of symbols
in global-resolution (copied before this patch), making this a
worthwhile trade-off. Benchmarking this change alone shows measurable
memory savings across various benchmarks.

[1] ELF
1cea5c2138/lld/ELF/InputFiles.cpp (L1748)
[2]
ef7b18a53c/lld/ELF/Driver.cpp (L2863)
[3]
ef7b18a53c/lld/ELF/Driver.cpp (L2995)
2024-09-08 14:52:03 -07:00
Mingming Liu
d4ddf06b0c
[NFCI]Remove EntryCount from FunctionSummary and clean up surrounding synthetic count passes. (#107471)
The primary motivation is to remove `EntryCount` from `FunctionSummary`.
This frees 8 bytes out of `sizeof(FunctionSummary)` (136 bytes as of
64498c5483).

While I'm at it, this PR cleans up {SummaryBasedOptimizations,
SyntheticCountsPropagation} since they were not used and there are no
plans to further invest in them.

With this patch, bitcode writer writes a placeholder 0 at the byte
offset of `EntryCount` and bitcode reader can parse the function entry
count at the correct byte offset. Added a TODO to stop writing
`EntryCount` and bump bitcode version
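
A minimal sketch of the placeholder idea, assuming a made-up four-field record layout (`[Flags, InstCount, EntryCount, NumRefs]`). The real function summary record has a different layout; `SummarySketch`, `writeRecord` and `readRecord` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical in-memory summary after the entry count field is removed.
struct SummarySketch {
  std::uint64_t Flags;
  std::uint64_t InstCount;
  std::uint64_t NumRefs;
};

// Assumed on-disk layout: [Flags, InstCount, EntryCount, NumRefs].
inline std::vector<std::uint64_t> writeRecord(const SummarySketch &S) {
  // Slot 2 used to hold the entry count. Writing 0 there keeps every later
  // field at its old position, so the format version can stay unchanged.
  return {S.Flags, S.InstCount, 0, S.NumRefs};
}

inline SummarySketch readRecord(const std::vector<std::uint64_t> &R) {
  // Step over the retired slot; the fields after it keep their positions.
  return {R[0], R[1], R[3]};
}
```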
2024-09-06 16:38:17 -07:00
Nick Sarnie
fedc7556ad
[ThinLTO] Don't always print ModulesToCompile debugging information (#106769)
Nothing went wrong in this case, we just successfully matched a module
by identifier. No need to print to std::error like we would for
something that should be user-visible.

Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
2024-09-03 07:50:23 -07:00
Kazu Hirata
5c0d61e318
[LTO] Reduce memory usage for import lists (#106772)
This patch reduces the memory usage for import lists by employing
memory-efficient data structures.

With this patch, an import list for a given destination module is
basically DenseSet<uint32_t> with each element indexing into the
deduplication table containing tuples of:

  {SourceModule, GUID, Definition/Declaration}

In one of our large applications, the peak memory usage goes down by
9.2% from 6.120GB to 5.555GB during the LTO indexing step.

This patch addresses several sources of space inefficiency associated
with std::unordered_map:

- std::unordered_map<GUID, ImportKind> takes up 16 bytes because of
  padding even though ImportKind only carries one bit of information.

- std::unordered_map uses pointers to elements, both in the hash table
  proper and for collision chains.

- We allocate an instance of std::unordered_map for each
  {Destination Module, Source Module} pair for which we have at least
  one import.  Most import lists have less than 10 imports, so the
  metadata like the size of std::unordered_map and the pointer to the
  hash table costs a lot relative to the actual contents.
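
The layout described above can be sketched as follows, assuming `std::set<uint32_t>` as a stand-in for `DenseSet<uint32_t>` and a `std::map` for the deduplication table. `ImportIdTableSketch` and `ImportListSketch` are hypothetical names, and the actual data structures in the patch may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <tuple>
#include <vector>

enum class ImportKind : std::uint8_t { Definition, Declaration };

// One deduplicated entry: {SourceModule, GUID, Definition/Declaration}.
using ImportEntry = std::tuple<std::string, std::uint64_t, ImportKind>;

// Hypothetical deduplication table shared by all destination modules.
class ImportIdTableSketch {
  std::map<ImportEntry, std::uint32_t> Ids;
  std::vector<ImportEntry> Entries;

public:
  // Returns the existing ID for Entry, or assigns the next free one.
  std::uint32_t intern(const ImportEntry &Entry) {
    auto Result =
        Ids.emplace(Entry, static_cast<std::uint32_t>(Entries.size()));
    if (Result.second)
      Entries.push_back(Entry);
    return Result.first->second;
  }

  const ImportEntry &lookup(std::uint32_t Id) const { return Entries[Id]; }
  std::size_t size() const { return Entries.size(); }
};

// Per-destination import list: a set of 32-bit IDs into the shared table.
using ImportListSketch = std::set<std::uint32_t>;
```

Each destination module then pays 4 bytes per import plus the set's own overhead, while the tuple itself is stored once no matter how many destinations import it.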
2024-09-01 08:36:06 -07:00
Kazu Hirata
4f15039cf2
[LTO] Introduce new type alias ImportListsTy (NFC) (#106420)
The background is as follows.  I'm planning to reduce the memory
footprint of ThinLTO indexing by changing ImportMapTy, the data
structure used for an import list.  Once this patch lands, I'm
planning to change the type slightly.  The new type alias allows us to
update the type without touching many places.
2024-08-28 10:42:12 -07:00
Kazu Hirata
dbd7ce0ccd
[IR] Introduce ModuleToSummariesForIndexTy (NFC) (#105906)
This patch introduces type alias ModuleToSummariesForIndexTy.

I'm planning to change the type slightly to allow heterogeneous lookup
(that is, std::map<K, V, std::less<>>) in a subsequent patch.  The
problem is that changing the type affects many places.  Using a type
alias reduces the impact.
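
For reference, heterogeneous lookup with `std::less<>` looks roughly like this in standard C++. `ModuleCountsSketch` and `lookupCount` are hypothetical names; the mapped type of the real alias is different.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <string_view>

// std::less<> is a transparent comparator, so find() accepts any key type
// that can be compared with std::string, such as std::string_view.
using ModuleCountsSketch = std::map<std::string, int, std::less<>>;

inline int lookupCount(const ModuleCountsSketch &M, std::string_view Key) {
  auto It = M.find(Key); // no temporary std::string is built for the lookup
  return It == M.end() ? -1 : It->second;
}
```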
2024-08-23 17:32:52 -07:00
Kazu Hirata
3563907969
[LTO] Turn ImportMapTy into a proper class (NFC) (#105748)
This patch turns type alias ImportMapTy into a proper class to provide
a more intuitive interface like:

  ImportList.addDefinition(...)

as opposed to:

  FunctionImporter::addDefinition(ImportList, ...)

Also, this patch requires all non-const accesses to go through
addDefinition, maybeAddDeclaration, and addGUID while providing const
accesses via:

  const ImportMapTyImpl &getImportMap() const { return ImportMap; }

I realize ImportMapTy may not be the best name as a class (maybe OK as
a type alias).  I am not renaming ImportMapTy in this patch at least
because there are 47 mentions of ImportMapTy under llvm/.
2024-08-22 21:56:01 -07:00
Kazu Hirata
3082a381f5
[LTO] Introduce helper functions to add GUIDs to ImportList (NFC) (#105555)
The new helper functions make the intent clearer while hiding
implementation details, including how we handle previously added
entries.  Note that:

- If we are adding a GUID as a GlobalValueSummary::Definition, then we
  override a previously added GlobalValueSummary::Declaration entry
  for the same GUID.

- If we are adding a GUID as a GlobalValueSummary::Declaration, then a
  previously added GlobalValueSummary::Definition entry for the same
  GUID takes precedence, and no change is made.
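
The precedence rules above can be sketched with a plain map. This is a simplified illustration: `ImportListSketch` is a hypothetical class keyed by GUID only, and the real helpers take more arguments.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

enum class ImportKind { Definition, Declaration };

class ImportListSketch {
  std::unordered_map<std::uint64_t, ImportKind> Map;

public:
  // A definition always wins: it overrides an earlier declaration entry.
  void addDefinition(std::uint64_t GUID) {
    Map[GUID] = ImportKind::Definition;
  }

  // A declaration is recorded only if the GUID has no entry yet; emplace
  // leaves an existing entry (definition or declaration) untouched.
  void maybeAddDeclaration(std::uint64_t GUID) {
    Map.emplace(GUID, ImportKind::Declaration);
  }

  const std::unordered_map<std::uint64_t, ImportKind> &getImportMap() const {
    return Map;
  }
};
```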
2024-08-22 12:06:47 -07:00
Kazu Hirata
5ddc79b093
[LTO] Use a range-based for loop (NFC) (#105467) 2024-08-21 07:23:30 -07:00
Kazu Hirata
d6d8243dcd
[LTO] Use DenseSet in computeLTOCacheKey (NFC) (#105466)
The two instances of std::set are used only for membership checking
purposes in computeLTOCacheKey.  We do not need std::set's strengths
like iterators staying valid or the ability to traverse in a sorted
order.  This patch changes them to DenseSet.

While I am at it, this patch replaces count with contains for slightly
increased readability.
2024-08-21 07:20:23 -07:00
Kazu Hirata
0f22d47a7a
[LTO] Teach computeLTOCacheKey to return std::string (NFC) (#105331)
Without this patch, computeLTOCacheKey computes SHA1, creates its
hexadecimal representation with toHex, which returns std::string, and
then copies it to an output parameter of type SmallString.

This patch removes the indirection and teaches computeLTOCacheKey to
directly return the std::string computed by toHex.  With move
semantics, no buffer copy should be involved.
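
A rough before/after sketch of the same shape in standard C++. `toHexSketch`, `computeKeyOutParam` and `computeKey` are hypothetical helpers written for this illustration; they are not the LLVM functions and skip the hashing step entirely.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical hex encoder that returns its result by value.
inline std::string toHexSketch(const std::vector<std::uint8_t> &Bytes) {
  static const char Digits[] = "0123456789ABCDEF";
  std::string Out;
  Out.reserve(Bytes.size() * 2);
  for (std::uint8_t B : Bytes) {
    Out.push_back(Digits[B >> 4]);
    Out.push_back(Digits[B & 0x0F]);
  }
  return Out;
}

// Before: the hex string is built, then copied into an output parameter.
inline void computeKeyOutParam(std::string &Key,
                               const std::vector<std::uint8_t> &Digest) {
  std::string Hex = toHexSketch(Digest);
  Key.assign(Hex.begin(), Hex.end()); // extra buffer copy
}

// After: the string is returned directly and can be moved or elided.
inline std::string computeKey(const std::vector<std::uint8_t> &Digest) {
  return toHexSketch(Digest);
}
```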

While I am at it, this patch adds a Twine to concatenate two strings.
2024-08-20 20:56:47 -07:00
Peter Rong
74e4694b8c
[LTO] enable ObjCARCContractPass only on optimized build (#101114)
#92331 tried to run `ObjCARCContractPass` by default, but it caused a
regression on O0 builds and was reverted.
This patch tries to bring that back by:

1. Reverting the
[revert](1579e9ca9c).
2. Calling `createObjCARCContractPass` only on optimized builds.

Tests are updated to reflect the changes. Specifically, all `O0` tests
should not include `ObjCARCContractPass`.

Signed-off-by: Peter Rong <PeterRong@meta.com>
2024-08-09 13:04:25 -07:00
macurtis-amd
26e455bac0
[lld][LTO] Teach LTO to print pipeline passes (#101018)
I found this useful while debugging code generation differences between
old and new offloading drivers.
No functional change (intended).
2024-07-29 15:56:43 -04:00
Joseph Huber
615b7eeaa9 Reapply "[LLVM][LTO] Factor out RTLib calls and allow them to be dropped (#98512)"
This reverts commit 740161a9b98c9920dedf1852b5f1c94d0a683af5.

I moved the `ISD` dependencies into the CodeGen portion of the handling.
It's a little awkward, but it's the easiest solution I can think of for
now.
2024-07-20 09:29:31 -05:00
NAKAMURA Takumi
5893b1e297 Reformat 2024-07-20 12:36:57 +09:00
NAKAMURA Takumi
740161a9b9 Revert "[LLVM][LTO] Factor out RTLib calls and allow them to be dropped (#98512)"
This reverts commit c05126bdfc3b02daa37d11056fa43db1a6cdef69.
(llvmorg-19-init-17714-gc05126bdfc3b)
See #99610
2024-07-20 12:36:57 +09:00
Joseph Huber
c05126bdfc
[LLVM][LTO] Factor out RTLib calls and allow them to be dropped (#98512)
Summary:
The LTO pass and LLD linker have logic in them that forces extraction
and prevents internalization of needed runtime calls. However, these
currently take all RTLibcalls into account, even if the target does not
support them. The target opts out of a libcall if it sets its name to
nullptr. This patch pulls this logic out into a class in the header so
that LTO / lld can use it to determine if a symbol actually needs to be
kept.

This is important for targets like AMDGPU that want to be able to use
`lld` to perform the final link step, but do not want the overhead of
uncalled functions. (This easily adds about a second to the link time.)
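
The opt-out check can be sketched like this, assuming the per-target libcall names arrive as a table of C strings in which nullptr means "not supported". `LibcallSetSketch` and `mustKeep` are hypothetical names, not the class added by the patch.

```cpp
#include <cassert>
#include <set>
#include <string_view>
#include <vector>

// Hypothetical view of a target's libcall table. The stored views point at
// the caller's strings, which must outlive this object.
class LibcallSetSketch {
  std::set<std::string_view> Names;

public:
  explicit LibcallSetSketch(const std::vector<const char *> &Table) {
    for (const char *Name : Table)
      if (Name) // nullptr: the target opted out of this libcall
        Names.insert(Name);
  }

  // True if the symbol is a libcall the target may actually emit, so it
  // has to be extracted and must not be internalized.
  bool mustKeep(std::string_view Symbol) const {
    return Names.count(Symbol) != 0;
  }
};
```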
2024-07-16 06:22:09 -05:00
Mingming Liu
50fea9943f
Reland "[ThinLTO][Bitcode] Generate import type in bitcode" (#97253)
https://github.com/llvm/llvm-project/pull/87600 was reverted in order to
revert
6262763341.
Now https://github.com/llvm/llvm-project/pull/95482 is a fix-forward for
6262763341.
This patch is a reland for
https://github.com/llvm/llvm-project/pull/87600

**Changes on top of original patch**
In `llvm/include/llvm/IR/ModuleSummaryIndex.h`, make the type of
`GVSummaryPtrSet` an `unordered_set` which is more memory efficient when
the number of elements is smaller than 128 [1]

**Original commit message**

For distributed ThinLTO, the LTO indexing step generates combined
summary for each module, and postlink pipeline reads the combined
summary which stores the information for link-time optimization.

This patch populates the 'import type' of a summary in bitcode, and
updates bitcode reader to parse the bit correctly.

[1]
393eff4e02/llvm/lib/Support/SmallPtrSet.cpp (L43)
2024-07-08 22:20:33 -07:00
Mingming Liu
af784a5c13
[ThinLTO] Use a set rather than a map to track exported ValueInfos. (#97360)
https://github.com/llvm/llvm-project/pull/95482 is a reland of
https://github.com/llvm/llvm-project/pull/88024.
https://github.com/llvm/llvm-project/pull/95482 keeps indexing memory
usage reasonable by using unordered_map and doesn't make other changes
to originally reviewed code.

While discussing possible ways to minimize indexing memory usage, Teresa
asked whether I need `ExportSetTy` as a map or whether a set is sufficient. This
PR implements the idea. It uses a set rather than a map to track exported
ValueInfos.

Currently, `ExportLists` has two use cases, and neither needs to track a
ValueInfo's import/export status. So using a set is sufficient and
correct.
1) In both in-process and distributed ThinLTO, it's used to decide if a
function or global variable is visible [1] from another module after importing
creates additional cross-module references.
     * If a cross-module call edge is seen today, the callee must be visible
       to another module without keeping track of its export status already.
       For instance, this [2] is how callees of direct calls get exported.
2) For in-process ThinLTO [3], it's used to compute the LTO cache key.
     * The cache key computation already hashes [4] 'ImportList', and 'ExportList' is
        determined by 'ImportList'. So it's fine to not track 'import type' for the export list.

[1] 66cd8ec4c0/llvm/lib/LTO/LTO.cpp (L1815-L1819)
[2] 66cd8ec4c0/llvm/lib/LTO/LTO.cpp (L1783-L1794)
[3] 66cd8ec4c0/llvm/lib/LTO/LTO.cpp (L1494-L1496)
[4] b76100e220/llvm/lib/LTO/LTO.cpp (L194-L222)
2024-07-03 13:15:17 -07:00
Nikita Popov
4169338e75
[IR] Don't include Module.h in Analysis.h (NFC) (#97023)
Replace it with a forward declaration instead. Analysis.h is pulled in
by all passes, but not all passes need to access the module.
2024-06-28 14:30:47 +02:00
Joel E. Denny
d29fdfbc4e
[LTO] Avoid assert fail on failed pass plugin load (#96691)
Without this patch, passing -load-pass-plugin=nonexistent.so to
llvm-lto2 produces a backtrace because LTOBackend.cpp does not handle
the error correctly:

```
Failed to load passes from 'nonexistant.so'. Request ignored.
Expected<T> must be checked before access or destruction.
Unchecked Expected<T> contained error:
Could not load library 'nonexistant.so': nonexistant.so: cannot open shared object file: No such file or directoryPLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
```

Any tool using `lto::Config::PassPlugins` should suffer similarly.

Based on the message "Request ignored" and the continue statement, the
intention was apparently to continue on failure to load a plugin.
However, no one appears to rely on that behavior now given that it
crashes instead, and terminating is consistent with opt.
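
The "stop on the first failed plugin" behavior can be sketched without LLVM types. Here an empty error string stands in for a successful `Expected<T>`; `LoadResultSketch`, `LoaderSketch` and `loadPlugins` are hypothetical names and do not mirror the code in LTOBackend.cpp.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical loader result: an empty Error means success.
struct LoadResultSketch {
  std::string Error;
};
using LoaderSketch = std::function<LoadResultSketch(const std::string &)>;

// Loads plugins in order. On the first failure, the error is inspected and
// returned to the caller instead of being ignored, and loading stops.
inline std::string loadPlugins(const std::vector<std::string> &Paths,
                               const LoaderSketch &Load,
                               std::vector<std::string> &Loaded) {
  for (const std::string &Path : Paths) {
    LoadResultSketch Result = Load(Path);
    if (!Result.Error.empty())
      return "Failed to load passes from '" + Path + "': " + Result.Error;
    Loaded.push_back(Path);
  }
  return std::string(); // all plugins loaded
}
```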
2024-06-26 14:51:24 -04:00
Nikita Popov
8bb3b1440c [TensorSpec] Avoid JSON.h include (NFC)
Instead forward declare the two classes that are referenced.
2024-06-21 15:23:40 +02:00
Mingming Liu
8d9db947b7
Reland "[ThinLTO] Populate declaration import status except for distributed ThinLTO under a default-off new option" (#95482)
Make `FunctionsToImportTy` an `unordered_map` rather than `DenseMap`.
Credit goes to jvoung@ for the 'DenseMap -> unordered_map' change. This
is a reland of https://github.com/llvm/llvm-project/pull/92718

* `DenseMap` allocates space for a large number of key/value pairs and
wastes space when the number of elements is small.
* While the initial bucket count is zero [1], it quickly allocates buckets for 64 elements [2]
when the number of elements is small (for example, 3 or 4 elements). The programmer's
manual [3] also mentions it could waste space.
* Experiments show `FunctionsToImportTy.size()` is smaller than 4 for
multiple binaries with high indexing RAM usage. The `unordered_map` growth
factor is at most 2 in libc++ [4] for insert operations.
 
With this change, the `ComputeCrossModuleImport` RAM increase is smaller
than 0.5G on a couple of binaries with high indexing RAM usage. A wider
range of (pre-release) tests pass.

[1] ad79a14c9e/llvm/include/llvm/ADT/DenseMap.h (L431-L432) 
[2] ad79a14c9e/llvm/include/llvm/ADT/DenseMap.h (L849)
[3] https://llvm.org/docs/ProgrammersManual.html#llvm-adt-densemap-h
[4] ad79a14c9e/libcxx/include/__hash_table (L1525-L1526)

**Original commit message** 
The goal is to populate `declaration` import status if a new flag
`-import-declaration` is on.

* For in-process ThinLTO, the `declaration` status is visible to backend
`function-import` pass, so `FunctionImporter::importFunctions` should
read the import status and be no-op for declaration summaries.
Basically, the postlink pipeline is updated to keep its current behavior
(import definitions), but not updated to handle `declaration` summaries.
Two use cases ([better call-graph
sort](https://discourse.llvm.org/t/rfc-for-better-call-graph-sort-build-a-more-complete-call-graph-by-adding-more-indirect-call-edges/74029#support-cross-module-function-declaration-import-5)
or [cross-module
auto-init](https://github.com/llvm/llvm-project/pull/87597#discussion_r1556067195))
would use this bit differently.

* For distributed ThinLTO, the `declaration` status is not serialized to
bitcode. As discussed, https://github.com/llvm/llvm-project/pull/87600
will do this.
2024-06-20 10:50:31 -07:00
Nikita Popov
49ae2dcf36
[PassManager] Remove some unnecessary includes (NFC) (#96175)
SmallPtrSet.h and TimeProfiler.h are unused. CommandLine.h is only
needed for the UseNewDbgInfoFormat declaration, which can be moved to the
places that need it.
2024-06-20 17:41:35 +02:00
Pierre van Houtryve
cb20d4d205
[NFC][CodeGen] Remove dead ParallelCG.h/.cpp API (#95770)
LTOBackend inlined it a while ago and now uses a static copy. This API
was unused.

We can always restore it at some point if it's needed, but right now
it's just bloat.
2024-06-19 09:07:11 +02:00
Abhina Sree
d3342e5b92
[SystemZ][z/OS] Continue marking text files with OF_Text (#95111)
Text files should be opened with OF_Text to have the correct encoding.
2024-06-12 09:22:21 -04:00
Mingming Liu
707f4de428
Revert "Reland "[ThinLTO] Populate declaration import status except for distributed ThinLTO under a default-off new option" (#92718) (#94503)
This reverts commit e33db249b53fb70dce62db3ebd82d42239bd1d9d.

The change from *set to *map increases memory usage, and caused indexing
OOM in some applications. Need to profile offline to bring the memory
usage down.
2024-06-05 10:06:55 -07:00
Mingming Liu
53061eecdb
Revert "[ThinLTO][Bitcode] Generate import type in bitcode (#87600)" (#94502)
This reverts commit 6262763341fcd71a2b0708cf7485f9abd1d26ba8, to prepare
for the revert of https://github.com/llvm/llvm-project/pull/92718.


https://github.com/llvm/llvm-project/pull/92718 causes LTO indexing OOM
in some applications.
2024-06-05 09:59:46 -07:00
Nikita Popov
1579e9ca9c Revert "Run ObjCContractPass in Default Codegen Pipeline (#92331)"
This reverts commit 8cc8e5d6c6ac9bfc888f3449f7e424678deae8c2.
This reverts commit dae55c89835347a353619f506ee5c8f8a2c136a7.

Causes major compile-time regressions for unoptimized builds.
2024-05-24 08:14:26 +02:00
Nuri Amari
8cc8e5d6c6
Run ObjCContractPass in Default Codegen Pipeline (#92331)
Prior to this patch, when using -fthinlto-index= the ObjCARCContractPass isn't run prior to CodeGen, and instruction selection fails on IR containing ARC intrinsics. This patch is motivated by that use case.

The pass was previously added in various places where codegen is performed. This patch adds the pass to the default codegen pipeline, makes sure it bails immediately if no ARC intrinsics are found, and removes the ad hoc scheduling of the pass.

Co-authored-by: Nuri Amari <nuriamari@fb.com>
2024-05-23 10:04:55 -07:00
Mingming Liu
6262763341
[ThinLTO][Bitcode] Generate import type in bitcode (#87600)
For distributed ThinLTO, the LTO indexing step generates combined
summary for each module, and postlink pipeline reads the combined
summary which stores the information for link-time optimization.

This patch populates the 'import type' of a summary in bitcode, and
updates bitcode reader to parse the bit correctly.
2024-05-22 09:52:54 -07:00
Mingming Liu
e33db249b5
Reland "[ThinLTO] Populate declaration import status except for distributed ThinLTO under a default-off new option" (#92718)
The original PR was reviewed in
https://github.com/llvm/llvm-project/pull/88024, and this PR adds one
line (b9f04d199d)
to fix the test.

Limit to one thread for in-process ThinLTO to test `LLVM_DEBUG` log.
- This should fix build bot failure like
https://lab.llvm.org/buildbot/#/builders/259/builds/4727 and
https://lab.llvm.org/buildbot/#/builders/9/builds/43876
- I could repro the failure and see interleaved log messages by using
`-thinlto-threads=all`

**Original Commit Message:**

The goal is to populate `declaration` import status if a new flag
`-import-declaration` is on.

* For in-process ThinLTO, the `declaration` status is visible to backend
`function-import` pass, so `FunctionImporter::importFunctions` should
read the import status and be no-op for declaration summaries.
Basically, the postlink pipeline is updated to keep its current behavior
(import definitions), but not updated to handle `declaration` summaries.
Two use cases ([better call-graph
sort](https://discourse.llvm.org/t/rfc-for-better-call-graph-sort-build-a-more-complete-call-graph-by-adding-more-indirect-call-edges/74029#support-cross-module-function-declaration-import-5)
or [cross-module
auto-init](https://github.com/llvm/llvm-project/pull/87597#discussion_r1556067195))
would use this bit differently.

* For distributed ThinLTO, the `declaration` status is not serialized to
bitcode. As discussed, https://github.com/llvm/llvm-project/pull/87600
will do this.
2024-05-20 08:55:31 -07:00
Kazu Hirata
32ae9a28a5
[llvm] Use SmallString::str (NFC) (#92712) 2024-05-19 22:48:06 -07:00
Mingming Liu
6b0733e3a3
Revert "[ThinLTO] Populate declaration import status except for distributed ThinLTO under a default-off new option" (#92715)
Reverts llvm/llvm-project#88024

Build bot failures
(https://lab.llvm.org/buildbot/#/builders/259/builds/4727 and
https://lab.llvm.org/buildbot/#/builders/9/builds/43876)
2024-05-19 22:42:18 -07:00
Mingming Liu
8de7890572
[ThinLTO] Populate declaration import status except for distributed ThinLTO under a default-off new option (#88024)
The goal is to populate `declaration` import status if a new flag `-import-declaration` is on.

* For in-process ThinLTO, the `declaration` status is visible to backend
`function-import` pass, so `FunctionImporter::importFunctions` should
read the import status and be no-op for declaration summaries.
Basically, the postlink pipeline is updated to keep its current behavior
(import definitions), but not updated to handle `declaration` summaries.
Two use cases (better call-graph sort [1] and cross-module auto-init [2])
would use this bit differently.

* For distributed ThinLTO, the `declaration` status is not serialized to
bitcode. As discussed, https://github.com/llvm/llvm-project/pull/87600
will do this.

[1] https://discourse.llvm.org/t/rfc-for-better-call-graph-sort-build-a-more-complete-call-graph-by-adding-more-indirect-call-edges/74029#support-cross-module-function-declaration-import-5
[2] https://github.com/llvm/llvm-project/pull/87597#discussion_r1556067195
2024-05-19 22:22:47 -07:00
Mingming Liu
d34be649af
[ThinLTO]Sort imported GUIDs before cache key update (#92622)
Add 'sort' here since it's helpful when the container type
changes (for example, https://github.com/llvm/llvm-project/pull/88024
wants to change the container type from `unordered_set` to `DenseMap`).

@MaskRay points out `std::` doesn't randomize the iteration order of
`unordered_{set,map}`, and the iteration order for a single build is
deterministic.
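
A small sketch of why sorting helps: the key then depends only on the set's contents, not on how the container happens to iterate. FNV-1a is used here purely as a stand-in for the real cache key hash, and `hashImportedGUIDs` is a hypothetical function.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <unordered_set>
#include <vector>

inline std::uint64_t
hashImportedGUIDs(const std::unordered_set<std::uint64_t> &Imports) {
  // Copy and sort so the byte stream fed to the hash does not depend on
  // the container's iteration order.
  std::vector<std::uint64_t> Sorted(Imports.begin(), Imports.end());
  std::sort(Sorted.begin(), Sorted.end());

  std::uint64_t Hash = 0xcbf29ce484222325ULL; // FNV-1a 64-bit offset basis
  for (std::uint64_t GUID : Sorted) {
    for (int Byte = 0; Byte < 8; ++Byte) {
      Hash ^= (GUID >> (8 * Byte)) & 0xFF;
      Hash *= 0x100000001b3ULL; // FNV-1a 64-bit prime
    }
  }
  return Hash;
}
```

Two sets with the same elements produce the same value even if they were filled in a different order or with a different bucket count.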
2024-05-18 19:39:57 -07:00
Kazu Hirata
bb6df0804b
[llvm] Use StringRef::operator== instead of StringRef::equals (NFC) (#91441)
I'm planning to remove StringRef::equals in favor of
StringRef::operator==.

- StringRef::operator==/!= outnumber StringRef::equals by a factor of
  70 under llvm/ in terms of their usage.

- The elimination of StringRef::equals brings StringRef closer to
  std::string_view, which has operator== but not equals.

- S == "foo" is more readable than S.equals("foo"), especially for
  !Long.Expression.equals("str") vs Long.Expression != "str".
2024-05-08 10:33:53 -07:00
Florian Hahn
f3ac55fab8
[LTO] Reset DiscardValueNames in optimize(). (#78705)
libLTO parses options late, so at the moment the option is ignored. To
fix that, re-set it in optimize(), as at this point the options have been
parsed. When LTOCodeGenerator's constructor executes, the options
haven't been parsed by the linker to libLTO yet.

Note that we keep the value name of `%add = add..` because when the
module is imported, DiscardValueNames is still set to false (the default
when building with assertions).

I tried to improve this in libLTO, but I am not sure if there's a
suitable callback when all options have been set.

PR: https://github.com/llvm/llvm-project/pull/78705
2024-04-30 12:32:29 +01:00
Kazu Hirata
d6bf04f476
[LTO] Remove extraneous ArrayRef (NFC) (#90306)
We don't need to explicitly create these instances of ArrayRef because
Hasher::update takes ArrayRef, and ArrayRef can be implicitly
constructed from C arrays.
2024-04-26 18:38:15 -07:00
Pierre van Houtryve
e86ebe4ff8
[LTO] Allow target-specific module splitting (#83128)
Allow targets to implement custom module splitting logic for
--lto-partitions, see #89245

https://discourse.llvm.org/t/rfc-lto-target-specific-module-splittting/77252
2024-04-22 08:59:18 +02:00
Orlando Cazalet-Hyams
b3f98dff75
[RemoveDIs] Load into new debug info format by default in llvm-lto and llvm-lto2 (#86271)
Directly load all bitcode into the new debug info format in `llvm-lto`
and `llvm-lto2`. This means that new-mode bitcode no longer round-trips
back to old-mode after parsing, and that old-mode bitcode gets
auto-upgraded to new-mode debug info (which is the current in-memory
default in LLVM).
2024-03-22 13:52:11 +00:00
Fangrui Song
a331937197 [MC] Move CompressDebugSections/RelaxELFRelocations from TargetOptions/MCAsmInfo to MCTargetOptions
The convention is for such MC-specific options to reside in
MCTargetOptions. However, CompressDebugSections/RelaxELFRelocations do
not follow the convention: `CompressDebugSections` is defined in both
TargetOptions and MCAsmInfo and there is forwarding complexity.

Move the option to MCTargetOptions and hereby simplify the code. Rename
the misleading RelaxELFRelocations to X86RelaxRelocations. llvm-mc
-relax-relocations and llc -x86-relax-relocations can now be unified.
2024-03-06 23:19:59 -08:00
Mehdi Amini
716042a63f
Rename llvm::ThreadPool -> llvm::DefaultThreadPool (NFC) (#83702)
The base class llvm::ThreadPoolInterface will be renamed
llvm::ThreadPool in a subsequent commit.

This is a breaking change: clients that used to create a ThreadPool must
now create a DefaultThreadPool instead.
2024-03-05 18:00:46 -08:00
Mehdi Amini
67221ed886 More fix BUILD_SHARED_LIBS=ON build for platforms which require explicit link of -lpthread (NFC)
Some systems require explicitly providing -lpthread when linking. I don't
have such a system, so it is hard to find all the missing cases.
2024-03-02 19:44:03 -08:00
Jan Svoboda
695b630ae1
[ThinLTO] NFC: Merge duplicated functions together (#82421) 2024-02-26 09:44:01 -08:00
Igor Kudrin
ec24094b56
[LTO] Remove Config.UseDefaultPipeline (#82587)
This option is not used. It was added in
[D122133](https://reviews.llvm.org/D122133), 5856f30b, with the only
usage in `ClangLinkerWrapper.cpp`, which was later updated in a1d57fc2,
and then finally removed in [D142650](https://reviews.llvm.org/D142650),
6185246f.
2024-02-23 01:05:06 +07:00
Mehdi Amini
744616b3ae
Rename ThreadPool::getThreadCount() to getMaxConcurrency() (NFC) (#82296)
This is addressing a long-time TODO to rename this misleading API. The
old one is preserved for now but marked deprecated.
2024-02-19 18:07:12 -08:00