239 Commits

Author SHA1 Message Date
Amir Ayupov
08916cef7e
[BOLT] Set RawBranchCount in DataAggregator
Align DataAggregator (Linux perf and pre-aggregated profile reader) to
DataReader (fdata profile reader) behavior: set BF->RawBranchCount which
is used in profile density computation (#101094).

Reviewers: ayermolo, maksfb, dcci, rafaelauler, WenleiHe

Reviewed By: WenleiHe

Pull Request: https://github.com/llvm/llvm-project/pull/101093
2024-10-24 18:28:44 -07:00
Amir Ayupov
79d695f049
[BOLT][NFCI] Speedup BAT::writeMaps
For a large binary with BAT section of size 38 MB with ~170k maps,
reduces writeMaps time from 70s down to 1s.

The inefficiency was in the use of std::distance with std::map::iterator
which doesn't provide random access. Use sorted vector for lookups.

Test Plan: NFC

Reviewers: maksfb, rafaelauler, dcci, ayermolo

Reviewed By: maksfb

Pull Request: https://github.com/llvm/llvm-project/pull/112061
2024-10-11 21:40:53 -07:00
Kazu Hirata
1be849c529
[BOLT] Avoid repeated hash lookups (NFC) (#111782) 2024-10-09 20:19:58 -07:00
Kristof Beyls
6d216fb7b8
[perf2bolt] Improve heuristic to map in-process addresses to specific… (#109397)
… segments in Elf binary.

The heuristic is improved by also taking into account that only
executable segments should contain instructions.

Fixes #109384.
2024-09-23 15:14:51 +02:00
Amir Ayupov
cd774c873c [BOLT][NFC] Rename ProfilePseudoProbeDesc
Address build issues due to aliasing PseudoProbeDesc, e.g.
https://lab.llvm.org/buildbot/#/builders/113/builds/2743
2024-09-12 21:25:38 -07:00
Amir Ayupov
c00c62c113
[BOLT] Add pseudo probe inline tree to YAML profile
Add probe inline tree information to YAML profile, at function level:
- function GUID,
- checksum,
- parent node id,
- call site in the parent.

This information is used for pseudo probe block matching (#99891).

The encoding adds/changes probe information in multiple levels of
YAML profile:
- BinaryProfile: add pseudo_probe_desc with GUIDs and Hashes, which
  permits deduplication of data:
  - many GUIDs are duplicate as the same callee is commonly inlined
    into multiple callers,
  - hashes are also very repetitive, especially for functions with
    low block counts.
- FunctionProfile: add inline tree (see above). Top-level function
  is included as root of function inline tree, which makes guid and
  pseudo_probe_desc_hash fields redundant.
- BlockProfile: densely-encoded block probe information:
  - probes reference their containing inline tree node,
  - separate lists for block, call, indirect call probes,
  - block probe encoding is specialized: ids are encoded as bitset
    in uint64_t. If only block probe with id=1 is present, it's
    encoded as implicit entry (id=0, omitted).
  - inline tree nodes with identical probes share probe description
    where node indices are combined into a list.

On top of #107970, profile with new probe encoding has the following
characteristics (profile for a large binary):

- Profile without probe information: 33MB, 3.8MB compressed (baseline).
- Profile with inline tree information: 92MB, 14MB compressed.

Profile processing time (YAML parsing, inference, attaching steps):
- profile without pseudo probes: 5s,
- profile with pseudo probes, without pseudo probe matching: 11s,
- with pseudo probe matching: 12.5s.

Test Plan: updated pseudoprobe-decoding-inline.test

Reviewers: wlei-llvm, ayermolo, rafaelauler, dcci, maksfb

Reviewed By: wlei-llvm, rafaelauler

Pull Request: https://github.com/llvm/llvm-project/pull/107137
2024-09-12 20:51:35 -07:00
Amir Ayupov
ccc7a072db
[BOLT] Drop blocks without profile in BAT YAML (#107970)
Align BAT YAML (DataAggregator) to YAMLProfileWriter which drops blocks
without profile:

61372fc5db/bolt/lib/Profile/YAMLProfileWriter.cpp (L162-L176)

Test Plan: NFCI
2024-09-11 16:36:47 -07:00
Amir Ayupov
c820bd3e33
[BOLT][NFC] Rename profile-use-pseudo-probes
The flag currently controls writing of probe information in YAML
profile. #99891 adds a separate flag to use probe information for stale
profile matching. Thus `profile-use-pseudo-probes` becomes a misnomer
and `profile-write-pseudo-probes` better captures the intent.

Reviewers: maksfb, WenleiHe, ayermolo, rafaelauler, dcci

Reviewed By: rafaelauler

Pull Request: https://github.com/llvm/llvm-project/pull/106364
2024-09-11 16:27:33 -07:00
Amir Ayupov
15fa3ba547
[BOLT][YAML] Allow unknown keys in the input (#100824)
This ensures forward compatibility, where old BOLT versions can consume
the profile created by newer versions with extra keys.

Test Plan: added yaml-unknown-keys.test
2024-09-03 11:27:57 -07:00
Amir Ayupov
ee09f7d1fc
[MC][NFC] Reduce Address2ProbesMap size
Replace the map from addresses to list of probes with a flat vector
containing probe references sorted by their addresses.

Reduces pseudo probe parsing time from 9.56s to 8.59s and peak RSS from
9.66 GiB to 9.08 GiB as part of perf2bolt processing a large binary.

Test Plan:
```
bin/llvm-lit -sv test/tools/llvm-profgen
```

Reviewers: maksfb, rafaelauler, dcci, ayermolo, wlei-llvm

Reviewed By: wlei-llvm

Pull Request: https://github.com/llvm/llvm-project/pull/102904
2024-08-26 09:14:35 -07:00
Amir Ayupov
4d19676de4
[BOLT] Add profile-use-pseudo-probes option
Move pseudo probe profile generation under --profile-use-pseudo-probes
option. Note that updating pseudo probes is independent from this flag.

Test Plan: updated pseudoprobe-decoding-inline.test

Reviewers: maksfb, rafaelauler, ayermolo, dcci, WenleiHe

Reviewed By: WenleiHe

Pull Request: https://github.com/llvm/llvm-project/pull/100299
2024-07-24 07:31:01 -07:00
Shaw Young
296a956369
[BOLT] Match functions with call graph (#98125)
Implemented call graph function matching. First, two call graphs are
constructed for both profiled and binary functions. Then functions are
hashed based on the names of their callee/caller functions. Finally,
functions are matched based on these neighbor hashes and the 
longest common prefix of their names. The `match-with-call-graph` 
flag turns this matching on.

Test Plan: Added match-with-call-graph.test. Matched 164 functions 
in a large binary with 10171 profiled functions.
2024-07-19 14:00:28 -07:00
Amir Ayupov
79a0b66593 [BOLT] Add MC dependency for Profile 2024-07-18 21:47:58 -07:00
Amir Ayupov
c905db67a0
[BOLT] Attach pseudo probes to blocks in YAML profile
Read pseudo probes in regular and BAT YAML profile generation, and
attach them to YAML profile basic blocks. This exposes GUID, probe id,
and probe type in profile for future use in stale profile matching.

Test Plan: updated pseudoprobe-decoding-inline.test

Reviewers: dcci, rafaelauler, ayermolo, maksfb

Reviewed By: rafaelauler

Pull Request: https://github.com/llvm/llvm-project/pull/99554
2024-07-18 21:01:40 -07:00
Amir Ayupov
9b007a199d
[BOLT] Expose pseudo probe function checksum and GUID (#99389)
Add a BinaryFunction field for pseudo probe function GUID.
Populate it during pseudo probe section parsing, and emit it in YAML
profile (both regular and BAT), along with function checksum.

To be used for stale function matching.

Test Plan: update pseudoprobe-decoding-inline.test
2024-07-18 20:58:16 -07:00
Paschalis Mpeis
34433fdceb
[BOLT] Add -print-mappings option to heatmaps (#97567)
Emit a mapping in the legend between the characters/buckets and the text
sections, using:

```sh
llvm-heatmap-bolt -print-mappings ..
```

Example:
```
Legend:
..
Sections:
  a/A : .init      0x00000100-0x00000200
  b/B : .plt       0x00000200-0x00000500
  c/C : .text      0x00010000-0x000a0000
  d/D : .fini      0x000a0000-0x000f0000
..
```
2024-07-15 08:23:06 +01:00
Shaw Young
131eb30584
[BOLT] Match blocks with calls as anchors (#96596)
Added another hash level – call hash – following opcode hash matching
for stale block matching. Call hash strings are the concatenation of the
lexicographically ordered names of each blocks’ called functions. This 
change bolsters block matching in cases where some instructions have
been removed or added but calls remain constant.

Test Plan: added match-functions-with-calls-as-anchors.test.
2024-07-10 15:46:47 -07:00
Amir Ayupov
dc1da93958
[BOLT][BAT] Add support for three-way split functions (#93760)
In three-way split functions, if only .warm fragment is present, BAT
incorrectly overwrites the map for .warm fragment by empty .cold
fragment.

Test Plan: updated register-fragments-bolt-symbols.s
2024-07-05 15:18:49 -07:00
Shaw Young
37bee25497
[BOLT][NFC] Refactor function matching (#97502)
Moved function matching techniques into separate helper functions for
ease of understanding and to make space for additional function 
matching techniques to be added (e.g. call graph function matching).
2024-07-05 14:44:15 -07:00
shawbyoung
fd524d4df7 [BOLT] Add Demangle to Profile link components
Added Demangle to Profile link components to fix shared build.
2024-07-03 12:58:55 -07:00
Shaw Young
97dc50882c
[BOLT] Match functions with name similarity (#95884)
A mapping - from namespace to associated binary functions - is used to
match function profiles to binary based on the
'--name-similarity-function-matching-threshold' flag set edit distance
threshold. The flag is set to 0 (exact name matching) by default as it is
expensive, requiring the processing of all BFs.

Test Plan: Added name-similarity-function-matching.test. On a binary
with 5M functions, rewrite passes took ~520s without the flag and
~2018s with the flag set to 20.
2024-07-03 11:39:18 -07:00
Shaw Young
49fdbbcfed
[BOLT] Match functions with exact hash (#96572)
Added flag '--match-profile-with-function-hash' to match functions 
based on exact hash. After identical and LTO name matching, more 
functions can be recovered for inference with exact hash, in the case
of function renaming with no functional changes. Collisions are 
possible in the unlikely case where multiple functions share the same
exact hash. The flag is off by default as it requires the processing of 
all binary functions and subsequently is expensive.

Test Plan: added hashing-based-function-matching.test.
2024-06-29 21:19:00 -07:00
shawbyoung
902952ae04 Revert "[𝘀𝗽𝗿] initial version"
This reverts commit bb5ab1ffe719f5e801ef08ac08be975546aa3266.
2024-06-25 08:30:29 -07:00
shawbyoung
bb5ab1ffe7 [𝘀𝗽𝗿] initial version
Created using spr 1.3.4
2024-06-25 08:05:29 -07:00
shaw young
32e4906c28
Revert "[BOLT] Hash-based function matching" (#96568)
Reverts llvm/llvm-project#95821
2024-06-24 18:44:24 -04:00
shaw young
5e097c79d8
[BOLT] Hash-based function matching (#95821)
Using the hashes of binary and profiled functions
to recover functions with changed names.

Test Plan: added 
hashing-based-function-matching.test.
2024-06-24 15:29:44 -07:00
shaw young
753498eed1
[BOLT] Add sink block to flow CFG in profile inference (#95047)
Summary: Constructing an artificial sink block for the
flow CFG in stale profile inference to allow profile
inference to be run on CFGs with blocks that terminate
and have successors.

Testing Plan: Added infer_no_exits.test to verify that 
functions with exit blocks with a landing pad are 
covered by stale profile inference.

---------

Co-authored-by: Amir Ayupov <fads93@gmail.com>
2024-06-17 16:58:26 -07:00
shaw young
68fc8dffe4 [BOLT] Drop high discrepancy profiles in matching (#95156)
Summary: Functions with high discrepancy 
(measured by matched function blocks) 
can be ignored with an added command line 
argument for better performance.

Test Plan: Added 
stale-matching-min-matched-block.test

---------

Co-authored-by: Amir Ayupov <aaupov@fb.com>
2024-06-17 15:14:35 -07:00
Kazu Hirata
8901f718ea
Use StringRef::starts_with (NFC) (#94886) 2024-06-09 08:14:51 -07:00
shaw young
4be3083bb3
[BOLT] Remove mutable from BB::LayoutIndex (#93224)
Removed mutability from BB::LayoutIndex, subsequently removed const from
BB::SetLayout, and changed BF::dfs to track visited blocks with a set as
opposed to tracking and altering LayoutIndexes for more consistent code.
2024-05-31 11:52:22 -07:00
Amir Ayupov
720cade2b6
[BOLT][NFC] Avoid computing BF hash twice in YAML reader (#75096)
We compute BF hashes in `YAMLProfileReader::readProfile` when first
matching profile functions with binary functions, and second time in
`YAMLProfileReader::parseFunctionProfile` during the profile assignment
(we need to do that to account for LTO private functions with
mismatching suffix).

Avoid recomputing the hash if it's been set.
2024-05-24 14:00:03 -07:00
Amir Ayupov
d1d9545ed3
[BOLT][BAT] Add entries for deleted basic blocks
Deleted basic blocks are required for correct mapping of branches
modified by SCTC.

Increases BAT size, bytes:
- large binary: 8622496 -> 8703244.
- small binary (X86/bolt-address-translation.test): 928 -> 940.

Test Plan: updated bb-with-two-tail-calls.s

Reviewers: ayermolo, dcci, maksfb, rafaelauler

Reviewed By: rafaelauler

Pull Request: https://github.com/llvm/llvm-project/pull/91906
2024-05-23 19:19:07 -07:00
Amir Ayupov
465bfd41fa
[BOLT][NFC] Simplify BBHashMapTy (#91812) 2024-05-22 16:00:51 -07:00
Amir Ayupov
1529ec085a
[BOLT][NFC] Move out PrintProgramStats from Profile into Rewrite (#93075)
Eliminate the dependence of Profile on Passes.

Test Plan: NFC
2024-05-22 13:53:41 -07:00
Amir Ayupov
f3dc732b36
[BOLT][NFC] Make estimateEdgeCounts a BinaryFunctionPass (#93074) 2024-05-22 11:59:00 -07:00
shaw young
96378b3da8
[BOLT] Add NamedRegionTimer to inferStaleProfile (#93078) 2024-05-22 11:04:12 -07:00
Amir Ayupov
5057463814
[BOLT][NFC] Make BAT methods const (#91823) 2024-05-22 10:39:27 -07:00
shaw young
c8fc234ee2
[BOLT][NFC] Eliminate uses of throwing std::map::at (#92950)
Remove calls to std::unordered_map::at, std::map::at, and
std::vector::at.
2024-05-22 09:27:14 -07:00
Amir Ayupov
7c5c8b2f47
[BOLT][NFC] Move BAT::fetchParentAddress to header (#93061)
Unbreak shared build after
https://github.com/llvm/llvm-project/pull/91683
2024-05-22 09:14:10 -07:00
Amir Ayupov
97025bd9d5
[BOLT] Use getLocationName in YAMLProfileWriter (#92493)
Disambiguate local functions using the containing file symbol in BAT
mode. Make local function naming consistent across BAT fdata and YAML
profiles.

Test Plan: updated register-fragments-bolt-symbols.s
2024-05-21 20:24:46 -07:00
Amir Ayupov
935b946b1f
[BOLT] Process cross references between ignored functions in BAT mode (#92484)
To align YAML and fdata profiles produced in BAT mode, lift two
restrictions applied in non-relocation mode when BAT is present:
1) register secondary entry points from ignored functions,
2) treat functions with secondary entry points as simple.

This allows constructing CFG for non-simple functions in non-relocation
mode and emitting YAML profile for them, which can then be used for
optimizations in relocation mode.

Test Plan: added test ignored-interprocedural-reference.s
2024-05-21 20:22:12 -07:00
Amir Ayupov
a9b67490b2
[BOLT] Report adjusted program stats from perf2bolt in BAT mode (#91683) 2024-05-21 18:54:15 -07:00
Amir Ayupov
32c9d5ef4f Revert "[BOLT] Add NamedRegionTimer to inferStaleProfile (#92621)"
This reverts commit 9f2313829fd210f9923375e93bc11fe9685c26d5.

Creates a dependency cycle: lib/Rewrite depends on lib/Profile.
2024-05-21 13:55:32 -07:00
shaw young
9f2313829f
[BOLT] Add NamedRegionTimer to inferStaleProfile (#92621) 2024-05-21 13:26:57 -07:00
Kazu Hirata
1486653dcf
[BOLT] Use StringRef::contains (NFC) (#92842) 2024-05-20 19:18:45 -07:00
Amir Ayupov
91423d7193
[BOLT][NFC] Don't assign YAML profile to functions with no CFG (#92487)
YAML profile for non-simple functions without CFG is
  1) useless for optimizations,
  2) can't be attached, similar to fdata profile,
  3) would be reported as invalid/stale even if the profile is valid.

Don't attempt to attach the profile in this case, aligning the behavior
to DataReader.

Test Plan: added yaml-non-simple.test
2024-05-19 20:15:31 -07:00
Amir Ayupov
9f15aa009c
[BOLT][NFC] Rename DataAggregator::BranchInfo to TakenBranchInfo
Align the name to its counterpart `FTInfo` which avoids name aliasing
with llvm::bolt::BranchInfo and allows to drop namespace specifier.

Test Plan: NFC

Reviewers: maksfb, rafaelauler, ayermolo, dcci

Reviewed By: dcci

Pull Request: https://github.com/llvm/llvm-project/pull/92017
2024-05-16 20:02:51 -07:00
Amir Ayupov
b06f97b039
[BOLT] Allow pass-through blocks in YAMLProfileReader (#91828) 2024-05-13 18:02:38 -07:00
Amir Ayupov
4ecf2caf68
[BOLT] Use aggregated FuncBranchData in writeBATYAML
Switch from FuncBranchData intermediate maps (Intra/InterIndex)
to aggregated Data, same as one used by DataReader:
e62ce1f884/bolt/lib/Profile/DataReader.cpp (L385-L389)
This aligns the order of the output between YAMLProfileWriter and
writeBATYAML.

Test Plan: updated bolt-address-translation-yaml.test

Reviewers: rafaelauler, dcci, ayermolo, maksfb

Reviewed By: ayermolo, maksfb

Pull Request: https://github.com/llvm/llvm-project/pull/91289
2024-05-13 14:23:32 -07:00
Kazu Hirata
f841ca0c35
Use StringRef::operator== instead of StringRef::equals (NFC) (#91864)
I'm planning to remove StringRef::equals in favor of
StringRef::operator==.

- StringRef::operator==/!= outnumber StringRef::equals by a factor of
  276 under llvm-project/ in terms of their usage.

- The elimination of StringRef::equals brings StringRef closer to
  std::string_view, which has operator== but not equals.

- S == "foo" is more readable than S.equals("foo"), especially for
  !Long.Expression.equals("str") vs Long.Expression != "str".
2024-05-12 23:08:40 -07:00