Implement selective probe parsing for profiled functions only when
emitting probe information to YAML profile as suggested in
https://github.com/llvm/llvm-project/pull/102904#pullrequestreview-2248714190
For a large binary, this reduces probe parsing
- processing time from 10.5925s to 5.6295s,
- peak RSS from 10.54 to 7.98 GiB.
The flag currently controls writing of probe information in YAML
profile. #99891 adds a separate flag to use probe information for stale
profile matching. Thus `profile-use-pseudo-probes` becomes a misnomer
and `profile-write-pseudo-probes` better captures the intent.
Reviewers: maksfb, WenleiHe, ayermolo, rafaelauler, dcci
Reviewed By: rafaelauler
Pull Request: https://github.com/llvm/llvm-project/pull/106364
Pseudo probe function records contain GUIDs assigned by the compiler
using an IR function name. Thus suffixes added later (e.g. `.llvm.`
for internal symbols, `.destroy`/`.resume` for coroutine fragments,
and `.cold`/`.warm` for split fragments) cause GUID mismatch.
Address that by dropping those suffixes using `getCommonName` which is
a parametrized form of `getLTOCommonName`.
Replace unordered_map with a vector. Pre-parse the section to statically
allocate storage. Use BumpPtrAllocator for FuncName strings, keep
StringRef in FuncDesc.
Reduces peak RSS of pseudo probe parsing from 9.08 GiB to 8.89 GiB as
part of perf2bolt with a large binary.
Test Plan:
```
bin/llvm-lit -sv test/tools/llvm-profgen
```
Reviewers: wlei-llvm, rafaelauler, dcci, maksfb, ayermolo
Reviewed By: wlei-llvm
Pull Request: https://github.com/llvm/llvm-project/pull/102905
Replace the map from addresses to list of probes with a flat vector
containing probe references sorted by their addresses.
Reduces pseudo probe parsing time from 9.56s to 8.59s and peak RSS from
9.66 GiB to 9.08 GiB as part of perf2bolt processing a large binary.
Test Plan:
```
bin/llvm-lit -sv test/tools/llvm-profgen
```
Reviewers: maksfb, rafaelauler, dcci, ayermolo, wlei-llvm
Reviewed By: wlei-llvm
Pull Request: https://github.com/llvm/llvm-project/pull/102904
Use #102774 to allocate storage for decoded probes (`PseudoProbeVec`)
and function records (`InlineTreeVec`).
Leverage that to also shrink sizes of `MCDecodedPseudoProbe`:
- Drop Guid since it's accessible via `InlineTree`.
`MCDecodedPseudoProbeInlineTree`:
- Keep track of probes and inlinees using `ArrayRef`s now that probes
and function records belonging to the same function are allocated
contiguously.
This reduces peak RSS from 13.7 GiB to 9.7 GiB and pseudo probe parsing
time (as part of perf2bolt) from 15.3s to 9.6s for a large binary with
400MiB .pseudo_probe section containing 43M probes and 25M function
records.
Depends on:
#102774#102787#102788
Reviewers: maksfb, rafaelauler, dcci, ayermolo, wlei-llvm
Reviewed By: wlei-llvm
Pull Request: https://github.com/llvm/llvm-project/pull/102789
Pre-parse pseudo probes section counting the number of probes and
function records. These numbers are used in follow-up diff to
pre-allocate vectors for decoded probes and inline tree nodes.
Additional benefit is avoiding error handling during parsing.
This pre-parsing is fast: for a 404MiB .pseudo_probe section with
43373881 probes and 25228770 function records, it only takes 0.68±0.01s.
The total time of buildAddress2ProbeMap is 21s.
Reviewers: dcci, maksfb, rafaelauler, wlei-llvm, ayermolo
Reviewed By: wlei-llvm
Pull Request: https://github.com/llvm/llvm-project/pull/102774
Add a BinaryFunction field for pseudo probe function GUID.
Populate it during pseudo probe section parsing, and emit it in YAML
profile (both regular and BAT), along with function checksum.
To be used for stale function matching.
Test Plan: update pseudoprobe-decoding-inline.test
Note that llvm::support::endianness has been renamed to
llvm::endianness while becoming an enum class as opposed to an
enum. This patch replaces support::{big,little,native} with
llvm::endianness::{big,little,native}.
BOLT uses MCAsmLayout to calculate the output values of basic blocks.
This means output values are calculated based on a pre-linking state and
any changes to symbol values during linking will cause incorrect values
to be used.
This issue was first addressed in D154604 by adding all basic block
symbols to the symbol table for the linker to resolve them. However, the
runtime overhead of handling this huge symbol table turned out to be
prohibitively large.
This patch solves the issue in a different way. First, a temporary
section containing [input address, output symbol] pairs is emitted to the
intermediary object file. The linker will resolve all these references
so we end up with a section of [input address, output address] pairs.
This section is then parsed and used to:
- Replace BinaryBasicBlock::OffsetTranslationTable
- Replace BinaryFunction::InputOffsetToAddressMap
- Update BinaryBasicBlock::OutputAddressRange
Note that the reason this is more performant than the previous attempt
is that these symbol references do not cause entries to be added to the
symbol table. Instead, section-relative references are used for the
relocations.
Reviewed By: maksfb
Differential Revision: https://reviews.llvm.org/D155604
Use new MetdataRewriter interface to update pseudo probes and move
ProbeDecoder out of BinaryContext into new PseudoProbeRewriter class.
Depends on D154021
Reviewed By: Amir
Differential Revision: https://reviews.llvm.org/D154022
Differential Revision: https://reviews.llvm.org/D154023