621 Commits

Author SHA1 Message Date
Paschalis Mpeis
cb9bacf57d
[AArch64][BOLT] Ensure tentative code layout for cold BBs runs. (#96609)
When split functions is used, BOLT may skip tentative code layout
estimation in some cases, like:
- when there is no profile data for some blocks (ie cold blocks)
- when there are cold functions in lite mode
- when skip functions is used
     
However, when rewriting the binary we still need to compute PC-relative
distances between hot and cold basic blocks. Without cold layout
estimation, BOLT uses '0x0' as the address of the first cold block,
leading to incorrect estimations of any PC-relative addresses.
 
This affects large binaries: the relaxStub method expands more
branches than necessary using the short-jump sequence, because it wrongly
believes it has exceeded the branch distance boundary.
 
This increases code size with a sequence that is both larger and slower;
however, the performance regression is expected to be minimal since this
only affects any called cold code.
 
Example of such an unnecessary relaxation:
from:
```armasm
b       .Ltmp1234
```
 
to:
```armasm
adrp    x16, .Ltmp1234
add     x16, x16, :lo12:.Ltmp1234
br      x16
```
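
A minimal sketch of why a zero cold-block address triggers the relaxation, assuming the usual +/-128 MiB reach of the AArch64 `b` instruction. The function name and the addresses are hypothetical and are not taken from BOLT's relaxStub code.

```python
# AArch64 `b` encodes a signed 26-bit word offset, i.e. a reach of +/-128 MiB.
B_RANGE = 1 << 27  # 128 MiB in bytes


def needs_relaxation(branch_addr, target_addr):
    """Return True if a direct `b` at branch_addr cannot reach target_addr."""
    offset = target_addr - branch_addr
    return not (-B_RANGE <= offset < B_RANGE)


# Hypothetical addresses, chosen only for illustration.
hot_branch = 0x4000_0000      # branch located in the hot fragment
cold_estimated = 0x4100_0000  # cold block with a tentative layout, 16 MiB away
cold_missing = 0x0            # placeholder address when no estimate exists

print(needs_relaxation(hot_branch, cold_estimated))  # False
print(needs_relaxation(hot_branch, cold_missing))    # True
```

With the placeholder address the distance looks like 1 GiB, so the branch is expanded even though the real cold block may be well within range.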
2024-10-17 08:59:05 +01:00
Maksim Panchenko
0e86e5214c
[BOLT][AArch64] Reduce the number of ADR relaxations (#111577)
If an ADR instruction references the same function, we can skip relaxation
even if the function is split, as long as the ADR is in the main fragment.
2024-10-08 16:15:00 -07:00
ShatianWang
4cab01f072
[BOLT] Profile quality stats -- CFG discontinuity (#109683)
In a perfect profile, each positive-execution-count block in the
function’s CFG should be reachable from a positive-execution-count
function entry block through a positive-execution-count path. This new
pass checks how well the BOLT input profile satisfies this “CFG
continuity” property.

More specifically, for each of the hottest 1000 functions, the pass
calculates the function’s fraction of basic block execution counts that
is “unreachable”. It then reports the 95th percentile of the
distribution of the 1000 unreachable fractions in a single BOLT-INFO
line. The smaller the reported value is, the better the BOLT profile
satisfies the CFG continuity property.

The default value of 1000 above can be changed via the hidden BOLT
option `-num-functions-for-continuity-check=[N]`. If more detailed stats
are needed, `-v=1` can be added to the BOLT invocation: the hottest N
functions will be grouped into 5 equally-sized buckets, from the hottest
to the coldest; for each bucket, various summary statistics of the
distribution of the fractions and the raw unreachable execution counts
will be reported.
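
The metric described above can be sketched as follows. This is an illustration under stated assumptions, not the pass itself: it only looks at block counts (not edge counts), uses a nearest-rank percentile, and all function and variable names are hypothetical.

```python
import math
from collections import deque


def unreachable_fraction(counts, succs, entries):
    """Fraction of a function's block execution count that cannot be reached
    from a positive-count entry block through positive-count blocks."""
    queue = deque(b for b in entries if counts[b] > 0)
    seen = set(queue)
    while queue:
        block = queue.popleft()
        for nxt in succs.get(block, ()):
            if counts[nxt] > 0 and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    unreachable = sum(c for b, c in counts.items() if c > 0 and b not in seen)
    return unreachable / total


def percentile(values, pct):
    """Nearest-rank percentile (an assumed definition)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


# Block "c" has a positive count but is only reachable through the 0-count block "b".
counts = {"entry": 100, "a": 100, "b": 0, "c": 40}
succs = {"entry": ["a", "b"], "b": ["c"]}
fraction = unreachable_fraction(counts, succs, ["entry"])
print(round(fraction, 3))  # 0.167
```

Applying `percentile(fractions, 95)` to the per-function fractions would then give a single summary number of the kind the pass reports.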
2024-10-08 19:07:43 -04:00
Tex Riddell
e237d8aac8
[BOLT] Fix tests broken by abe0dd1 (#110071)
abe0dd195a3b2630afdc5c1c233eb2a068b2d72f (#109553) changed default
llvm-objdump output for consecutive zeros.

This broke two tests:
BOLT :: AArch64/constant_island_pie_update.s
BOLT :: AArch64/update-weak-reference-symbol.s

This fixes the test failures by adding -z to llvm-objdump in RUN line.
2024-09-25 19:34:57 -07:00
Maksim Panchenko
4db0cc4c55
[BOLT] Allow sections in --print-only flag (#109622)
While printing functions, expand --print-only flag to accept section
names. E.g., "--print-only=\.init" will only print functions from
".init" section.
2024-09-25 23:44:06 +02:00
Maksim Panchenko
6fb39ac77b
[BOLT][merge-fdata] Initialize YAML profile header (#109613)
While merging profiles, some fields in the input header, e.g.
HashFunction, could be uninitialized leading to a UMR. Initialize merged
header with the first input header.

Fixes #109592
2024-09-25 23:18:34 +02:00
Amir Ayupov
300051159b [BOLT][test] Update log.test and perf_test
Address noisy tests by:
- perf_test: bumping sampling frequency to maximum,
- log.test: matching Binary Function "main"
2024-09-23 15:47:19 -07:00
sinan
31ac3d092b
[BOLT] Add .iplt support to x86 (#106513)
Add X86 support for parsing .iplt section and symbols.
2024-09-23 18:22:43 +08:00
Tom Stellard
773353b20a
[bolt][tests] Skip tests that use perf when perf counters are unavailable (#107892)
On the GitHub Action runners, perf always fails with the error below,
so we need to skip the perf tests on platforms like this that have
limited access to the perf counters.

```
Access to performance monitoring and observability operations is limited.
Consider adjusting /proc/sys/kernel/perf_event_paranoid setting to open
access to performance monitoring and observability operations for processes
without CAP_PERFMON, CAP_SYS_PTRACE or CAP_SYS_ADMIN Linux capability.
More information can be found at 'Perf events and tool security' document:
https://www.kernel.org/doc/html/latest/admin-guide/perf-security.html
perf_event_paranoid setting is 4:
  -1: Allow use of (almost) all events by all users
      Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK
>= 0: Disallow raw and ftrace function tracepoint access
>= 1: Disallow CPU event access
>= 2: Disallow kernel profiling
To make the adjusted perf_event_paranoid setting permanent preserve it
in /etc/sysctl.conf (e.g. kernel.perf_event_paranoid = <setting>)
```
2024-09-17 17:07:35 -07:00
Nikita Popov
827dd1ef2f
[Bolt] Explicitly request PIE in tests (#108818)
When clang is built with `-DCLANG_DEFAULT_PIE_ON_LINUX=OFF`, a number of
bolt tests fail:

    BOLT :: AArch64/build_id.c
    BOLT :: AArch64/plt-call.test
    BOLT :: X86/dwarf5-dwarf4-types-backward-forward-cross-reference.test
    BOLT :: X86/dwarf5-locexpr-referrence.test
    BOLT :: X86/internal-call-instrument.s
    BOLT :: X86/linux-static-keys.s
    BOLT :: X86/plt-call.test

Avoid this by explicitly adding `-fPIE` and `-pie` to the default flags
in tests, so we don't depend on the clang-side default.
2024-09-17 08:58:49 +02:00
Amir Ayupov
c00c62c113
[BOLT] Add pseudo probe inline tree to YAML profile
Add probe inline tree information to YAML profile, at function level:
- function GUID,
- checksum,
- parent node id,
- call site in the parent.

This information is used for pseudo probe block matching (#99891).

The encoding adds/changes probe information in multiple levels of
YAML profile:
- BinaryProfile: add pseudo_probe_desc with GUIDs and Hashes, which
  permits deduplication of data:
  - many GUIDs are duplicate as the same callee is commonly inlined
    into multiple callers,
  - hashes are also very repetitive, especially for functions with
    low block counts.
- FunctionProfile: add inline tree (see above). Top-level function
  is included as root of function inline tree, which makes guid and
  pseudo_probe_desc_hash fields redundant.
- BlockProfile: densely-encoded block probe information:
  - probes reference their containing inline tree node,
  - separate lists for block, call, indirect call probes,
  - block probe encoding is specialized: ids are encoded as bitset
    in uint64_t. If only block probe with id=1 is present, it's
    encoded as implicit entry (id=0, omitted).
  - inline tree nodes with identical probes share probe description
    where node indices are combined into a list.

On top of #107970, profile with new probe encoding has the following
characteristics (profile for a large binary):

- Profile without probe information: 33MB, 3.8MB compressed (baseline).
- Profile with inline tree information: 92MB, 14MB compressed.

Profile processing time (YAML parsing, inference, attaching steps):
- profile without pseudo probes: 5s,
- profile with pseudo probes, without pseudo probe matching: 11s,
- with pseudo probe matching: 12.5s.

Test Plan: updated pseudoprobe-decoding-inline.test

Reviewers: wlei-llvm, ayermolo, rafaelauler, dcci, maksfb

Reviewed By: wlei-llvm, rafaelauler

Pull Request: https://github.com/llvm/llvm-project/pull/107137
2024-09-12 20:51:35 -07:00
Amir Ayupov
c820bd3e33
[BOLT][NFC] Rename profile-use-pseudo-probes
The flag currently controls writing of probe information in YAML
profile. #99891 adds a separate flag to use probe information for stale
profile matching. Thus `profile-use-pseudo-probes` becomes a misnomer
and `profile-write-pseudo-probes` better captures the intent.

Reviewers: maksfb, WenleiHe, ayermolo, rafaelauler, dcci

Reviewed By: rafaelauler

Pull Request: https://github.com/llvm/llvm-project/pull/106364
2024-09-11 16:27:33 -07:00
Amir Ayupov
15fa3ba547
[BOLT][YAML] Allow unknown keys in the input (#100824)
This ensures forward compatibility, where old BOLT versions can consume
the profile created by newer versions with extra keys.
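
A minimal sketch of the forward-compatibility idea, assuming a reader that maps parsed YAML mappings onto a fixed record type. The record and its field names are hypothetical and do not reflect BOLT's actual YAML schema.

```python
from dataclasses import dataclass, fields


@dataclass
class BlockRecord:
    # Hypothetical fields known to an "old" reader.
    bid: int = 0
    exec_count: int = 0


def from_mapping(cls, data):
    """Build a record from a parsed mapping, silently skipping unknown keys."""
    known = {f.name for f in fields(cls)}
    return cls(**{k: v for k, v in data.items() if k in known})


# A mapping written by a hypothetical newer producer with an extra key.
newer_profile_entry = {"bid": 3, "exec_count": 42, "new_field": [1, 2, 3]}
record = from_mapping(BlockRecord, newer_profile_entry)
print(record)  # BlockRecord(bid=3, exec_count=42)
```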

Test Plan: added yaml-unknown-keys.test
2024-09-03 11:27:57 -07:00
Maksim Panchenko
abd69b3653
[BOLT] Handle internal calls in ValidateInternalCalls (#105736)
Move handling of all internal calls into the designated pass. Preserve
NOPs and mark functions as non-simple on non-X86 platforms.
2024-08-27 11:31:32 -07:00
Harini0924
7f3793207b
[BOLT][test] Removed the use of parentheses in BOLT tests with lit internal shell (#105720)
This patch addresses compatibility issues with the lit internal shell by
removing the use of subshell execution (parentheses and subshell syntax)
in the `BOLT` tests. The lit internal shell does not support
parentheses, so the tests have been refactored to use separate command
invocations, with outputs redirected to temporary files where necessary.

This change is relevant for enabling the lit internal shell by default,
as outlined in [[RFC] Enabling the Lit Internal Shell by
Default](https://discourse.llvm.org/t/rfc-enabling-the-lit-internal-shell-by-default/80179)

fixes: #102401
2024-08-23 08:20:11 -07:00
ShatianWang
cbd302410e
[BOLT] Improve BinaryFunction::inferFallThroughCounts() (#105450)
This PR improves how basic block execution count is updated when using
the BOLT option `-infer-fall-throughs`. Previously, if a 0-count
fall-through edge is assigned a positive inferred count N, then the
successor block's execution count will be incremented by N. Since the
successor's execution count is calculated using information besides
inflow sum (such as outflow sum), it likely is already correct, and
incrementing it by an additional N would be wrong. This PR improves how
the successor's execution count is updated by using the max over its
current count and N.
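
The difference between the two update rules can be sketched as below. The function name and the numbers are hypothetical; only the "increment by N" versus "max of current count and N" behavior comes from the description above.

```python
def update_successor_count(current, inferred, use_max=True):
    """Return the successor block's count after a 0-count fall-through edge
    is assigned the inferred count `inferred`."""
    if use_max:
        return max(current, inferred)  # new behavior
    return current + inferred          # previous behavior


# The successor already has a plausible count of 100 (e.g. derived from its
# outflow), and the fall-through edge is inferred to carry 100 as well.
print(update_successor_count(100, 100, use_max=False))  # 200
print(update_successor_count(100, 100))                 # 100
```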
2024-08-21 00:35:07 -04:00
Harini0924
4f5d866af7
[llvm-lit] Add REQUIRES: shell to BOLT permission test for lit internal shell (#103012)
This patch adds the `REQUIRES: shell` directive to the BOLT permission
test to ensure it only runs in environments with a full-featured
Unix-like shell. This change is necessary because the test relies on
advanced shell capabilities that are not supported by lit's internal
shell.

**Reasoning:** The BOLT permission test uses features like running
commands in the background with `&`, performing arithmetic operations,
and handling special number formats (octal). These features require a
more capable shell than what lit's internal shell provides. Without a
proper shell, the test could fail or behave unpredictably.

This change is relevant for enabling the lit internal shell by default,
as outlined in [[RFC] Enabling the Lit Internal Shell by
Default](https://discourse.llvm.org/t/rfc-enabling-the-lit-internal-shell-by-default/80179)
2024-08-13 19:58:59 -07:00
Connie
887f7002b6
[NFC][bolt][test] Change '|&' to '2>&1 |' for lit internal shell support (#102402)
This patch changes all references to '|&' in bolt tests to instead use
the '2>&1 |' syntax for better consistency across testing and so that
lit's internal shell can be used to run these tests. This addresses a
suggestion made in the comments of this RFC:
https://discourse.llvm.org/t/rfc-enabling-the-lit-internal-shell-by-default/80179.

Fixes https://github.com/llvm/llvm-project/issues/102388
2024-08-12 17:18:17 -07:00
Sayhaan Siddiqui
6aad62cf5b
[BOLT][DWARF] Add parallelization for processing of DWO debug information (#100282)
Enables parallelization for the processing of DWO CUs.
2024-08-08 16:41:51 -07:00
Davide Italiano
e49549ff19 Revert "[BOLT] Abort on out-of-section symbols in GOT (#100801)"
This reverts commit a4900f0d936f0e86bbd04bd9de4291e1795f1768.
2024-08-07 20:52:19 -07:00
Vladislav Khmelevsky
a4900f0d93
[BOLT] Abort on out-of-section symbols in GOT (#100801)
This patch aborts BOLT execution if it finds an out-of-section (section
end) symbol in the GOT. To handle such situations properly in the
future, we would need an arch-dependent way to analyze relocations or
their sequences; e.g., for ARM it would probably be ADRP + LDR analysis
to get the GOT entry address. Currently, this is also challenging
because GOT-related relocation symbols are replaced with
__BOLT_got_zero. Anyway, it seems to be quite a rare case, possibly
related only to static binaries. For the most part, it seems that it
should be handled at the linker stage, since a static binary should not
have a GOT at all. The LLD linker with relaxations enabled would replace
instruction addresses from the GOT directly with target symbols, which
eliminates the problem.

Anyway, in order to achieve detection of such cases, this patch fixes a
few things in BOLT:
1. For the end symbols, we're now using the section provided by ELF
binary. Previously it would be tied with a wrong section found by symbol
address.
2. The end symbols would have limited registration: we would only
add them in the name->data GlobalSymbols map, since using address->data
BinaryDataMap map would likely be impossible due to address duality of
such symbols.
3. The outdated BD->getSection (currently returning a reference, not
pointer) check in postProcessSymbolTable is replaced by getSize check in
order to allow zero-sized top-level symbols if they are located in
zero-sized sections. For the most part, such things could only be found
in tests, but I don't see a reason not to handle such cases.
4. Updated section-end-sym test and removed x86_64 requirement since
there is no reason for this (tested on aarch64 linux)

The test was provided by peterwaller-arm (thank you) in #100096 and
slightly modified by me.
2024-08-07 16:26:12 +04:00
Vladislav Khmelevsky
097ddd3565
[BOLT] Fix relocations handling (#100890)
After porting BOLT to RISC-V, some of the relocations were broken on both
AArch64 and X86.
On AArch64, an example of broken relocations is GOT relocations: while
handling them, we should replace the symbol with __BOLT_got_zero in order
to address the GOT entry, not the symbol that addresses this entry. This is
done further in the code, so it is too early to add the relocation here.
On X86, it is a mistake to add relocations without an addend. This is the
exact problem raised in #97937. Due to different code generation,
I had to use a gcc-generated YAML test, since with clang I wasn't able to
reproduce the problem.
Added tests for both architectures and made the problematic condition
RISC-V-specific.
2024-08-07 16:25:46 +04:00
sinan
6c8933e1a0
[BOLT] Skip PLT search for zero-value weak reference symbols (#69136)
Take a common weak reference pattern for example
```
    __attribute__((weak)) void undef_weak_fun();
    
      if (&undef_weak_fun)
        undef_weak_fun();
```
    
In this case, an undefined weak symbol `undef_weak_fun` has an address
of zero, and Bolt incorrectly changes the relocation for the
corresponding symbol to symbol@PLT, leading to incorrect runtime
behavior.
2024-08-07 18:02:42 +08:00
sinan
734c0488b6
[BOLT] Support map other function entry address (#101466)
Allow BOLT to map the old address to a new binary address if the old
address is the entry of the function.
2024-08-07 15:57:25 +08:00
Amir Ayupov
3f51bec466
[BOLT][NFC] Print timers in perf2bolt invocation
When BOLT is run in AggregateOnly mode (perf2bolt), it exits with code
zero, so destructors are not run and TimerGroup never prints the timers.

Add explicit printing just before the exit to honor options requesting
timers (`--time-rewrite`, `--time-aggr`).

Test Plan: updated bolt/test/timers.c

Reviewers: ayermolo, maksfb, rafaelauler, dcci

Reviewed By: dcci

Pull Request: https://github.com/llvm/llvm-project/pull/101270
2024-07-31 22:14:52 -07:00
Amir Ayupov
fb97b4f962
[BOLT][NFC] Add timers for MetadataManager invocations
Test Plan: added bolt/test/timers.c

Reviewers: ayermolo, maksfb, rafaelauler, dcci

Reviewed By: dcci

Pull Request: https://github.com/llvm/llvm-project/pull/101267
2024-07-31 22:12:34 -07:00
Sayhaan Siddiqui
33960ce5a8
[BOLT][DWARF] Sort GDBIndexTUEntryVector (#101264)
Sorts GDBIndexTUEntryVector in decreasing order by hash to ensure
determinism when parallelized.
2024-07-31 11:35:38 -07:00
Sayhaan Siddiqui
79dcd93b70
[BOLT][DWARF] Remove option to write to DWP (#100771)
Remove the --write-dwp option as well as related code and tests.
2024-07-30 16:58:01 -07:00
Vladislav Khmelevsky
803eaf2926
[BOLT][NFC] Fix test requirement (#100867)
Tests that are using instrumentation should have bolt-runtime in
requirements
2024-07-27 18:44:58 +04:00
Sayhaan Siddiqui
9a3e66e314
[BOLT][DWARF][NFC] Fix DebugStrOffsetsWriter (#100672)
Fix DebugStrOffsetsWriter so updateAddressMap can't be called after it
is finalized.
2024-07-26 18:58:25 -07:00
Tristan Ross
abc2eae682
[BOLT] Enable standalone build (#97130)
Continues #87196: as the author did not have much time, I have taken over
working on this PR. We would like to have this so it'll be easier to
package for Nix.

Can be tested by copying cmake, bolt, third-party, and llvm directories
out into their own directory with this PR applied and then build bolt.

---------

Co-authored-by: pca006132 <john.lck40@gmail.com>
2024-07-25 08:18:14 -07:00
Amir Ayupov
4d19676de4
[BOLT] Add profile-use-pseudo-probes option
Move pseudo probe profile generation under --profile-use-pseudo-probes
option. Note that updating pseudo probes is independent from this flag.

Test Plan: updated pseudoprobe-decoding-inline.test

Reviewers: maksfb, rafaelauler, ayermolo, dcci, WenleiHe

Reviewed By: WenleiHe

Pull Request: https://github.com/llvm/llvm-project/pull/100299
2024-07-24 07:31:01 -07:00
Amir Ayupov
9d2dd009b6
[BOLT] Support more than two jump table parents
Multi-way splitting can cause multiple fragments to access the same jump
table. Relax the assumption that a jump table can only have up to two
parents.

Test Plan: added bolt/test/X86/three-way-split-jt.s

Reviewers: ayermolo, dcci, rafaelauler, maksfb

Reviewed By: rafaelauler, dcci

Pull Request: https://github.com/llvm/llvm-project/pull/99988
2024-07-24 07:16:39 -07:00
Sayhaan Siddiqui
7cd7a1eab4
[BOLT][DWARF][NFC] Split processUnitDIE into two lambdas (#99957)
Split processUnitDIE into two lambdas to separate the processing of DWO
CUs and CUs in the main binary.
2024-07-23 12:59:40 -07:00
Eisuke Kawashima
8bc02bf5c6
fix(bolt/**.py): fix comparison to None (#94012)
from PEP8
(https://peps.python.org/pep-0008/#programming-recommendations):

> Comparisons to singletons like None should always be done with is or
is not, never the equality operators.
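
A short illustration of why the identity check is preferred: `==` can be overridden by a class, while `is` cannot. The toy class below is hypothetical and exists only to show the difference.

```python
class AlwaysEqual:
    """Toy class whose __eq__ claims equality with everything."""

    def __eq__(self, other):
        return True


obj = AlwaysEqual()
print(obj == None)  # True, which is misleading
print(obj is None)  # False, which is the intended answer
```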

Co-authored-by: Eisuke Kawashima <e-kwsm@users.noreply.github.com>
2024-07-19 16:59:56 -07:00
klensy
1ee8238f0e
[BOLT][test] Fix Filecheck typos (#93979)
Fixes a few FileCheck typos in tests and adds a missing(?) FileCheck call in
a test.

Co-authored-by: klensy <nightouser@gmail.com>
2024-07-19 16:57:14 -07:00
Shaw Young
296a956369
[BOLT] Match functions with call graph (#98125)
Implemented call graph function matching. First, two call graphs are
constructed for both profiled and binary functions. Then functions are
hashed based on the names of their callee/caller functions. Finally,
functions are matched based on these neighbor hashes and the 
longest common prefix of their names. The `match-with-call-graph` 
flag turns this matching on.
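
The matching steps above can be sketched as follows. This is a simplified illustration, not BOLT's implementation: the hash function, the graph representation (name mapped to caller and callee name sets), and the tie-breaking are all assumptions.

```python
import hashlib


def neighbor_hash(callers, callees):
    """Hash a function by the sorted names of its callers and callees."""
    key = ",".join(sorted(callers)) + "|" + ",".join(sorted(callees))
    return hashlib.sha1(key.encode()).hexdigest()


def common_prefix_len(a, b):
    """Length of the longest common prefix of two names."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def match_functions(profile_graph, binary_graph):
    """Map each profiled function to the binary function that shares its
    neighbor hash and has the longest common name prefix."""
    by_hash = {}
    for name, (callers, callees) in binary_graph.items():
        by_hash.setdefault(neighbor_hash(callers, callees), []).append(name)
    matches = {}
    for name, (callers, callees) in profile_graph.items():
        candidates = by_hash.get(neighbor_hash(callers, callees), [])
        if candidates:
            matches[name] = max(candidates, key=lambda c: common_prefix_len(name, c))
    return matches


# Hypothetical graphs: name -> (callers, callees).
profile_graph = {"foo.v1": ({"main"}, {"bar"})}
binary_graph = {
    "foo.v2": ({"main"}, {"bar"}),
    "foo_other": ({"main"}, {"bar"}),
    "baz": ({"main"}, set()),
}
print(match_functions(profile_graph, binary_graph))  # {'foo.v1': 'foo.v2'}
```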

Test Plan: Added match-with-call-graph.test. Matched 164 functions 
in a large binary with 10171 profiled functions.
2024-07-19 14:00:28 -07:00
Amir Ayupov
c905db67a0
[BOLT] Attach pseudo probes to blocks in YAML profile
Read pseudo probes in regular and BAT YAML profile generation, and
attach them to YAML profile basic blocks. This exposes GUID, probe id,
and probe type in profile for future use in stale profile matching.

Test Plan: updated pseudoprobe-decoding-inline.test

Reviewers: dcci, rafaelauler, ayermolo, maksfb

Reviewed By: rafaelauler

Pull Request: https://github.com/llvm/llvm-project/pull/99554
2024-07-18 21:01:40 -07:00
Amir Ayupov
9b007a199d
[BOLT] Expose pseudo probe function checksum and GUID (#99389)
Add a BinaryFunction field for pseudo probe function GUID.
Populate it during pseudo probe section parsing, and emit it in YAML
profile (both regular and BAT), along with function checksum.

To be used for stale function matching.

Test Plan: update pseudoprobe-decoding-inline.test
2024-07-18 20:58:16 -07:00
Amir Ayupov
3023b15fb1 [BOLT] Support POSSIBLE_PIC_FIXED_BRANCH
Detect and support fixed PIC indirect jumps of the following form:
```
movslq  En(%rip), %r1
leaq  PIC_JUMP_TABLE(%rip), %r2
addq  %r2, %r1
jmpq  *%r1
```

with PIC_JUMP_TABLE that looks like following:

```
  JT:  ----------
   E1:| L1 - JT  |
      |----------|
   E2:| L2 - JT  |
      |----------|
      |          |
         ......
   En:| Ln - JT  |
       ----------
```

The code could be produced by compilers, see
https://github.com/llvm/llvm-project/issues/91648.
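
A small sketch of how such a table resolves to branch targets, assuming 4-byte little-endian signed entries (consistent with the `movslq` load above). The addresses and helper name are hypothetical.

```python
import struct


def jump_table_targets(table_addr, raw_entries):
    """Each entry stores `Li - JT`; adding JT back yields the target Li."""
    return [table_addr + rel for (rel,) in struct.iter_unpack("<i", raw_entries)]


# Hypothetical layout: the table lives at JT, labels L1..L3 precede it.
JT = 0x2000
labels = [0x1000, 0x1010, 0x1040]
raw = struct.pack("<3i", *(label - JT for label in labels))

targets = jump_table_targets(JT, raw)
print([hex(t) for t in targets])  # ['0x1000', '0x1010', '0x1040']

# A fixed branch always reads the same entry (here the last one, En),
# so its single target is known statically.
print(hex(targets[-1]))  # 0x1040
```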

Test Plan: updated jump-table-fixed-ref-pic.test

Reviewers: maksfb, ayermolo, dcci, rafaelauler

Reviewed By: rafaelauler

Pull Request: https://github.com/llvm/llvm-project/pull/91667
2024-07-18 20:57:05 -07:00
Amir Ayupov
3fe50b6dde
[BOLT] Store FileSymRefs in a multimap
With aggressive ICF, it's possible to have different local symbols
(under different FILE symbols) to be mapped to the same address.

FileSymRefs only keeps a single SymbolRef per address, which prevents
fragment matching from finding the correct symbol to perform parent
function lookup.

Work around this issue by switching FileSymRefs to a multimap. In
future, uses of FileSymRefs can be replaced with SortedSymbols which
keeps essentially the same information.
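
The difference between the two containers can be sketched with plain dictionaries, using a dict of lists as a stand-in for a multimap. The symbol names, file names, and address are hypothetical.

```python
from collections import defaultdict

# Two local symbols from different files folded to the same address by ICF.
symbols = [
    (0x1000, "a.c", "helper.cold"),
    (0x1000, "b.c", "helper.cold"),
]

# Single-valued map: the second symbol silently replaces the first.
single = {}
for addr, file_name, sym in symbols:
    single[addr] = (file_name, sym)

# Multimap-like structure: every symbol at the address is kept.
multi = defaultdict(list)
for addr, file_name, sym in symbols:
    multi[addr].append((file_name, sym))

print(single[0x1000])      # ('b.c', 'helper.cold')
print(len(multi[0x1000]))  # 2
```

With all candidates retained, a lookup can pick the symbol whose file matches the fragment being processed.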

Test Plan: added ambiguous_fragment.test

Reviewers: dcci, ayermolo, maksfb, rafaelauler

Reviewed By: rafaelauler

Pull Request: https://github.com/llvm/llvm-project/pull/98992
2024-07-16 22:14:43 -07:00
Sayhaan Siddiqui
e140a8a3c8
[BOLT][DWARF][NFC] Refactor address writers (#98094)
Refactors address writers to create an instance for each CU and its DWO
CU.
2024-07-15 23:03:43 -07:00
Daniel Bertalan
c6b3f50194
[bolt][test] Require asserts in X86/match-functions-with-calls-as-anchors.test (#98882)
Otherwise, it fails due to the unsupported `--debug` flag in non-asserts
builds.
2024-07-15 21:40:50 +02:00
Paschalis Mpeis
587308c343
[BOLT][AArch64] Provide createDummyReturnFunction (#96626)
AArch64 needs this function when instrumenting statically-linked binaries.

Sample commands:
```bash
clang -Wl,-q test.c -static -o out
llvm-bolt -instrument -instrumentation-sleep-time=5 out -o out.instr
```
2024-07-15 07:20:47 +01:00
Shaw Young
131eb30584
[BOLT] Match blocks with calls as anchors (#96596)
Added another hash level – call hash – following opcode hash matching
for stale block matching. Call hash strings are the concatenation of the
lexicographically ordered names of each block’s called functions. This
change bolsters block matching in cases where some instructions have
been removed or added but calls remain constant.
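
A minimal sketch of the call hash string described above. The function name is hypothetical, and details such as separators or handling of duplicate callees are assumptions.

```python
def call_hash_string(called_functions):
    """Concatenate the lexicographically sorted names of a block's callees."""
    return "".join(sorted(called_functions))


# Two versions of a block whose other instructions differ but whose calls
# are the same, listed in a different order.
old_block_calls = ["memcpy", "free", "malloc"]
new_block_calls = ["malloc", "memcpy", "free"]

print(call_hash_string(old_block_calls))  # freemallocmemcpy
print(call_hash_string(old_block_calls) == call_hash_string(new_block_calls))  # True
```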

Test Plan: added match-functions-with-calls-as-anchors.test.
2024-07-10 15:46:47 -07:00
Sayhaan Siddiqui
7e10ad99ad
[BOLT][DWARF] Cleanup buffer initialization for DWO range writer (#97843)
Cleanup buffer initialization for DWO range writer instances to remove
empty buffer at the beginning.
2024-07-10 11:35:40 -07:00
Amir Ayupov
c641fc3a4c
[BOLT][test] Fix tests for aarch64 buildbot (#97620)
Fix broken tests in
[bolt-aarch64-ubuntu-clang-shared](https://lab.llvm.org/buildbot/#/builders/126/builds/138)
2024-07-09 20:02:01 -07:00
Amir Ayupov
dc1da93958
[BOLT][BAT] Add support for three-way split functions (#93760)
In three-way split functions, if only .warm fragment is present, BAT
incorrectly overwrites the map for .warm fragment by empty .cold
fragment.

Test Plan: updated register-fragments-bolt-symbols.s
2024-07-05 15:18:49 -07:00
Ádám Kallai
e2cee2c1e6
[BOLT][AArch64] Fixes assertion errors occurred when perf2bolt was executed (#83394)
BOLT only checks for the most common indirect branch pattern during
branch analysis.
Extended the logic with two other indirect patterns which slightly
differ from the expected one.
Those patterns may be hit when statically linking libc (pattern 2
requires 'lld' linker).

As a workaround, mark them as UNKNOWN branches for now.

Fixes: #83114
2024-07-05 16:24:22 +04:00
Alexander Yermolovich
361350fc89
[BOLT][DWARF] Deduplicate Foreign TU list (#97629)
There could be multiple TUs with the same hash in various DWO files. In
bigger binaries this could be in the thousands. Although they could be
structurally different and we need to output Entries for all of them,
for the purposes of figuring out a TU hash we only need one entry in
Foreign TU list.
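
The deduplication idea can be sketched as keeping the first occurrence of each hash. The tuple layout, hash values, and DWO file names below are hypothetical.

```python
def dedup_foreign_tus(entries):
    """Keep one entry per TU hash, preserving first-seen order."""
    seen = set()
    unique = []
    for tu_hash, dwo_name in entries:
        if tu_hash not in seen:
            seen.add(tu_hash)
            unique.append(tu_hash)
    return unique


# The same TU hash shows up in several DWO files.
entries = [
    (0xAAAA, "a.dwo"),
    (0xBBBB, "a.dwo"),
    (0xAAAA, "b.dwo"),
    (0xAAAA, "c.dwo"),
]
print([hex(h) for h in dedup_foreign_tus(entries)])  # ['0xaaaa', '0xbbbb']
```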
2024-07-04 07:20:06 -07:00