
Match inline trees first between profile and the binary: by GUID, checksum, parent, and inline site for inlined functions. Map profile probes to binary probes via matched inline tree nodes. Each binary probe has an associated binary basic block. If all probes from one profile basic block map to the same binary basic block, it’s an exact match; otherwise the block is determined by majority vote and reported as a loose match.

Pseudo probe matching happens between exact hash matching and call/loose matching.

Introduce ProbeMatchSpec, a mechanism to match probes belonging to another binary function. For example, given functions foo and bar:
```
void foo() {
  bar();
}
```
- profiled binary: bar is not inlined => have top-level function bar
- new binary to which the profile is applied: bar is inlined into foo.

Currently, BOLT does 1:1 matching between profile functions and binary functions based on the name. #100446 will extend this to N:M, where multiple profiles can be matched to one binary function (as in the example above, where binary function foo would use profiles for foo and bar), and one profile can be matched to multiple binary functions (e.g. if bar was inlined into multiple functions).

In this diff, ProbeMatchSpecs would only have one BinaryFunctionProfile (existing name-based matching).

Test Plan: Added match-blocks-with-pseudo-probes.test

Performance test:
- Setup:
  - Baseline no-BOLT: Clang with pseudo probes, ThinLTO + CSSPGO (#79942)
  - BOLT fresh: BOLTed Clang using fresh profile,
  - BOLT stale (hash): BOLTed Clang using stale profile (collected on Clang 10K commits back), `-infer-stale-profile` (hash+call block matching)
  - BOLT stale (+probe): BOLTed Clang using stale profile, `-infer-stale-profile` with `-stale-matching-with-pseudo-probes` (hash+call+pseudo probe block matching)
  - 2S Intel SKX Xeon 6138 with 40C/80T and 256GB RAM, using 20C/40T for build,
  - BOLT profiles are collected on Clang compiling large preprocessed C++ file.
- Benchmark: building Clang (average of 5 runs), see driver in aaupov/llvm-devmtg-2022
- Results, wall time, lower is better:
  - Baseline no-BOLT: 429.52 +- 2.61s,
  - BOLT stale (hash): 413.21 +- 2.19s,
  - BOLT stale (+probe): 409.69 +- 1.41s,
  - BOLT fresh: 384.50 +- 1.80s.

---------

Co-authored-by: Amir Ayupov <aaupov@fb.com>
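
The exact/loose block matching rule described in the commit message can be illustrated with a small sketch. This is not BOLT's implementation: `BlockId`, `MatchKind`, `BlockMatch`, and `matchBlockByProbes` are hypothetical names, and resolving ties toward the lowest block id is an assumption made only to keep the example deterministic.

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical stand-ins for illustration; these are not BOLT's types.
using BlockId = uint32_t;

enum class MatchKind { None, Exact, Loose };

struct BlockMatch {
  MatchKind Kind;
  BlockId Block; // meaningful only when Kind != MatchKind::None
};

// MappedBlocks holds, for every probe of one profile basic block that was
// mapped to a binary probe, the binary basic block owning that binary probe.
BlockMatch matchBlockByProbes(const std::vector<BlockId> &MappedBlocks) {
  if (MappedBlocks.empty())
    return BlockMatch{MatchKind::None, 0};

  // Count how many probes voted for each binary basic block.
  std::map<BlockId, unsigned> Votes;
  for (BlockId B : MappedBlocks)
    ++Votes[B];

  // All probes agree on a single block: exact match.
  if (Votes.size() == 1)
    return BlockMatch{MatchKind::Exact, Votes.begin()->first};

  // Otherwise take the block with the most votes and report a loose match.
  // Assumption: ties go to the lowest block id (std::map iterates in order).
  auto Best = Votes.begin();
  for (auto It = Votes.begin(); It != Votes.end(); ++It)
    if (It->second > Best->second)
      Best = It;
  return BlockMatch{MatchKind::Loose, Best->first};
}
```

For instance, three probes that all land in block 4 give an exact match on block 4, while probes landing in blocks 4, 7, and 7 give a loose match on block 7.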
66 lines
2.1 KiB
Plaintext
## Test stale block matching with pseudo probes including inline tree matching.
# RUN: split-file %s %t
# RUN: llvm-bolt \
# RUN: %S/../../../llvm/test/tools/llvm-profgen/Inputs/inline-cs-pseudoprobe.perfbin \
# RUN: -o %t.bolt -data %t/yaml -infer-stale-profile -v=2 \
# RUN: --stale-matching-with-pseudo-probes 2>&1 | FileCheck %s

# CHECK: BOLT-WARNING: 3 (100.0% of all profiled) functions have invalid (possibly stale) profile
# CHECK: BOLT-INFO: inference found an exact pseudo probe match for 100.00% of basic blocks (3 out of 3 stale)

#--- yaml
---
header:
  profile-version: 1
  binary-name: 'inline-cs-pseudoprobe.perfbin'
  binary-build-id: '<unknown>'
  profile-flags: [ lbr ]
  profile-origin: perf data aggregator
  profile-events: ''
  dfs-order: false
  hash-func: xxh3
functions:
  - name: bar
    fid: 9
    hash: 0x1
    exec: 1
    nblocks: 1
    blocks:
      - bid: 0
        insns: 11
        hash: 0x1
        exec: 1
        probes: [ { blx: 9 } ]
    inline_tree: [ { } ]
  - name: foo
    fid: 10
    hash: 0x2
    exec: 1
    nblocks: 6
    blocks:
      - bid: 0
        insns: 3
        hash: 0x2
        exec: 1
        succ: [ { bid: 3, cnt: 0 } ]
        probes: [ { blx: 3 } ]
    inline_tree: [ { g: 1 }, { g: 0, cs: 8 } ]
  - name: main
    fid: 11
    hash: 0x3
    exec: 1
    nblocks: 6
    blocks:
      - bid: 0
        insns: 3
        hash: 0x3
        exec: 1
        succ: [ { bid: 3, cnt: 0 } ]
        probes: [ { blx: 3, id: 1 }, { blx: 1 } ]
    inline_tree: [ { g: 2 }, { g: 1, cs: 2 }, { g: 0, p: 1, cs: 8 } ]
pseudo_probe_desc:
  gs: [ 0xE413754A191DB537, 0x5CF8C24CDB18BDAC, 0xDB956436E78DD5FA ]
  gh: [ 2, 0, 1 ]
  hs: [ 0x200205A19C5B4, 0x10000FFFFFFFF, 0x10E852DA94 ]
...
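
The inline tree matching that this test exercises pairs profile and binary nodes by GUID, checksum, parent, and inline site. A minimal sketch of that idea follows, assuming a flattened tree in which parents precede their children. `InlineNode` and `matchInlineTrees` are hypothetical names, and the structure is a simplification for illustration, not BOLT's actual data model or the YAML encoding used in the test.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <tuple>
#include <vector>

// Hypothetical, simplified inline tree node (not BOLT's data structure).
struct InlineNode {
  uint64_t GUID;       // function GUID
  uint64_t Checksum;   // function checksum from the pseudo probe descriptor
  int Parent;          // index of the parent node, -1 for the root
  uint32_t InlineSite; // call site probe id in the parent, 0 for the root
};

// Returns a vector R where R[i] is the index of the binary node matched to
// profile node i, or -1 if there is no match.
// Assumption: in both vectors every parent appears before its children.
std::vector<int> matchInlineTrees(const std::vector<InlineNode> &Profile,
                                  const std::vector<InlineNode> &Binary) {
  using Key = std::tuple<int, uint32_t, uint64_t>;

  // Index binary nodes by (parent index, inline site, GUID).
  std::map<Key, int> BinaryIndex;
  for (std::size_t I = 0; I < Binary.size(); ++I)
    BinaryIndex[Key(Binary[I].Parent, Binary[I].InlineSite, Binary[I].GUID)] =
        static_cast<int>(I);

  std::vector<int> Result(Profile.size(), -1);
  for (std::size_t I = 0; I < Profile.size(); ++I) {
    const InlineNode &P = Profile[I];
    int BinParent = -1;
    if (P.Parent >= 0) {
      // Skip malformed input where the parent does not precede the child.
      if (static_cast<std::size_t>(P.Parent) >= I)
        continue;
      BinParent = Result[P.Parent];
      if (BinParent < 0)
        continue; // an unmatched parent leaves the whole subtree unmatched
    }
    auto It = BinaryIndex.find(Key(BinParent, P.InlineSite, P.GUID));
    if (It == BinaryIndex.end())
      continue;
    // A differing checksum means the function body changed: no match.
    if (Binary[It->second].Checksum != P.Checksum)
      continue;
    Result[I] = It->second;
  }
  return Result;
}
```

Matching top-down means that a node whose parent failed to match is never considered, which keeps probes of a changed inlined callee from being attributed to the wrong inline context.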