Intel's Architectural LBR supports capturing branch type information
as part of LBR stack (SDM Vol 3B, part 2, October 2024):
```
20.1.3.2 Branch Types
The IA32_LBR_x_INFO.BR_TYPE and IA32_LER_INFO.BR_TYPE fields encode
the branch types as shown in Table 20-3.
Table 20-3. IA32_LBR_x_INFO and IA32_LER_INFO Branch Type Encodings
Encoding | Branch Type
0000B | COND
0001B | NEAR_IND_JMP
0010B | NEAR_REL_JMP
0011B | NEAR_IND_CALL
0100B | NEAR_REL_CALL
0101B | NEAR_RET
011xB | Reserved
1xxxB | OTHER_BRANCH
For a list of branch operations that fall into the categories above,
see Table 20-2.
Table 20-2. Branch Type Filtering Details
Branch Type | Operations Recorded
COND | Jcc, J*CXZ, and LOOP*
NEAR_IND_JMP | JMP r/m*
NEAR_REL_JMP | JMP rel*
NEAR_IND_CALL | CALL r/m*
NEAR_REL_CALL | CALL rel* (excluding CALLs to the next sequential IP)
NEAR_RET | RET (0C3H)
OTHER_BRANCH | JMP/CALL ptr*, JMP/CALL m*, RET (0C8H), SYS*,
interrupts, exceptions (other than debug exceptions), IRET, INT3,
INTn, INTO, TSX Abort, EENTER, ERESUME, EEXIT, AEX, INIT, SIPI, RSM
```
Linux kernel can preserve branch type when `save_type` is enabled,
even if CPU does not support Architectural LBR:
f09079bd04/tools/perf/Documentation/perf-record.txt (L457-L460)
> - save_type: save branch type during sampling in case binary is not
available later.
For the platforms with Intel Arch LBR support (12th-Gen+ client or
4th-Gen Xeon+ server), the save branch type is unconditionally enabled
when the taken branch stack sampling is enabled.
Kernel-reported branch type values:
8c6bc74c7f/include/uapi/linux/perf_event.h (L251-L269)
This information is needed to disambiguate external returns (from
DSO/JIT) to an entry point or a landing pad, when BOLT can't
disassemble the branch source.
This patch adds new pre-aggregated types:
- return trace (R),
- external return fall-through (r).
For such types, the checks for fall-through start (not an entry or
a landing pad) are relaxed.
Depends on #143295.
Test Plan: updated callcont-fallthru.s
Define a pre-aggregated basic sample format:
```
E <event name>
S <location> <count>
```
`-nl` flag is required to use parsed basic samples.
Test Plan: update pre-aggregated-perf.test
The mispred and execnt values were originally recorded in reverse order
and then consumed in the opposite order to compensate.
This patch records and uses them in the same order (FDATA) for clarity.
That order is:
```
<is symbol?> <closest ELF symbol or DSO name> <relative FROM address>
<is symbol?> <closest ELF symbol or DSO name> <relative TO address>
<number of mispredictions> <number of branches>
```
We used to report PLT traces as invalid (mismatching disassembled
function contents) because PLT functions are marked as pseudo and
ignored, thus missing CFG. However, such traces are not mismatching
the function contents. Accept them without attaching the profile.
Test Plan: updated callcont-fallthru.s
Traces are triplets of branch source, target, and fall-through end (next
branch).
Traces simplify differentiation of fall-throughs into local- and
external-origin, which improves performance over profile with
undifferentiated fall-throughs by eliminating profile discontinuity in
call to continuation fall-throughs. This makes it possible to avoid
converting return profile into call to continuation profile which may
introduce statistical biases.
The existing format makes provisions for local- (F) and external- (f)
origin fall-throughs, but the profile producer needs to know function
boundaries. BOLT has that information readily available, so providing
the origin branch of a fall-through is a functional replacement of the
fall-through kind (f or F). This also has an effect of combining
branches and fall-throughs into a single record.
As traces subsume other pre-aggregated profile kinds, BOLT may drop
support for them soon. Users of pre-aggregated profile format are
advised to migrate to the trace format.
Test Plan: Updated callcont-fallthru.s
Disambiguate local functions using the containing file symbol in BAT
mode. Make local function naming consistent across BAT fdata and YAML
profiles.
Test Plan: updated register-fragments-bolt-symbols.s
This is an ongoing series of commits that are reformatting our
Python code. This catches the last of the python files to
reformat. Since they where so few I bunched them together.
Reformatting is done with `black`.
If you end up having problems merging this commit because you
have made changes to a python file, the best way to handle that
is to run git checkout --ours <yourfile> and then reformat it
with black.
If you run into any problems, post to discourse about it and
we will try to help.
RFC Thread below:
https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style
Reviewed By: jhenderson, #libc, Mordante, sivachandra
Differential Revision: https://reviews.llvm.org/D150784
No-LBR mode wasn't tested and slipped when mayHaveProfileData was added for
Lite mode. This enables processing of profiles collected without LBR and
converted with `perf2bolt -nl` option.
Test Plan:
bin/llvm-lit -a tools/bolt/test/X86/nolbr.s
https://github.com/rafaelauler/bolt-tests/pull/20
Reviewed By: #bolt, rafauler
Differential Revision: https://reviews.llvm.org/D140256
Summary:
Import the test. The assembly input has three functions with associated fdata.
The old link_fdata.sh script only replaces the symbol names with symbol values,
whereas fdata format expects to have symbol offsets against the anchor symbol.
Introduce the link_fdata.py script which is able to parse the input and produce
either an offset or an absolute symbol value.
(cherry picked from FBD32256351)