diff --git a/bolt/docs/index.rst b/bolt/docs/index.rst
index 07cefd047642..13ae27d6a4c4 100644
--- a/bolt/docs/index.rst
+++ b/bolt/docs/index.rst
@@ -250,6 +250,12 @@ contradict each other) you can use ``merge-fdata`` tool:
 Use ``combined.fdata`` for **Step 3** above to generate a universally optimized
 binary.
 
+Profile Formats
+---------------
+
+See `Profile Formats `__ for comprehensive documentation of all
+profile formats accepted by BOLT: perf.data, fdata, YAML, and pre-aggregated.
+
 License
 -------
 
diff --git a/bolt/docs/profiles.md b/bolt/docs/profiles.md
new file mode 100644
index 000000000000..d03e99746cb5
--- /dev/null
+++ b/bolt/docs/profiles.md
@@ -0,0 +1,212 @@
+# BOLT Profile Formats
+
+BOLT accepts profile data in several formats. This document describes each
+format, how to generate it, and how BOLT consumes it.
+
+The general recommended workflow is to convert unsymbolized profiles (perf.data
+or pre-aggregated) into symbolized (fdata or YAML):
+
+```
+$ perf2bolt executable \
+# perf.data is consumed directly:
+  -p perf.data
+# OR pre-aggregated requires `--pa` switch:
+  -p preagg --pa
+# fdata is the default output format, YAML is optionally emitted using `-w` flag:
+  -o perf.fdata [-w perf.yaml]
+# the output format for `-o` can be switched with `--profile-format`:
+  -o perf.yaml --profile-format=yaml
+```
+
+# Unsymbolized profiles
+Sample or trace profiles without symbol information accepted by
+perf2bolt, to be converted into symbolized profile formats, used by llvm-bolt.
+
+## Linux perf data
+
+### Collection
+Example with brstack:
+```bash
+perf record -j any,u -e cycles:u -o perf.data -- ./binary
+```
+
+### Consumption modes
+
+- **Branch samples (default)**: Branch stack samples from capable hardware
+  (Intel LBR, AMD LBRv2/BRS, ARM BRBE).
+  Used by default with `perf2bolt` and `llvm-bolt -p perf.data`.
+- **Basic aggregation (`-ba`)**: Sample-based profile without branch stacks.
+  Lower quality but works on hardware/VMs without branch sampling support.
+- **Tracing (`--itrace`)**: Synthesizing branch stacks from trace profile (Intel PT, ARM ETM).
+Requires a value (e.g. `i10usl`), see
+[perf documentation](https://github.com/torvalds/linux/blob/35f5aa9ccc83f4a4171cdb6ba023e514e2b2ecff/tools/perf/Documentation/itrace.txt)
+for details.
+- **ARM SPE (`--spe`)**: Statistical Profiling Extension on supported ARM
+  platforms providing short (1-deep) branch stacks.
+
+### Build-id verification
+
+BOLT verifies that the build-id in `perf.data` matches the input binary.
+Use `--ignore-build-id` to skip this check.
+
+## Pre-aggregated format
+
+Pre-aggregated profile for direct consumption by `perf2bolt --pa` or
+`llvm-bolt --pa`. Enables external tools to generate BOLT-compatible profiles
+without going through `perf.data`.
+
+### Entry types
+
+```
+E
+S
+[TR]
+B
+[Ff]
+r
+```
+
+Where:
+- `E` — Name of the sampling event used for subsequent entries.
+- `S` — Aggregated basic sample at ``.
+- `T` — Aggregated trace: branch from `` to `` with a
+  fall-through to ``.
+- `R` — Aggregated trace originating at a return.
+- `B` — Aggregated branch from `` to ``.
+- `F` — Aggregated fall-through from `` to ``.
+- `f` — Aggregated fall-through with external origin (disambiguates returns
+  hitting a basic block head from regular internal jumps).
+- `r` — Aggregated fall-through originating at an external return (no checks
+  performed for fall-through start).
+
+### Location format
+
+Locations have the format `[:]`:
+- `` — Hex offset from the object base load address.
+- `:` — Offset within the object identified by ``.
+- `X:` — External address (outside the profiled binary).
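As a rough illustration of how an external tool might read these entries back, the Python sketch below splits each line into an entry kind, its locations, and the trailing counts. It is not BOLT's parser. The number of locations per entry kind is inferred from the sample lines in this document, counts are assumed to be decimal, and the names `Location`, `Entry`, `parse_location`, and `parse_preaggregated` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumed number of location fields per entry kind, inferred from the sample
# lines in this document (not taken from BOLT's sources).
_LOCATION_COUNT = {"S": 1, "T": 3, "R": 3, "B": 2, "F": 2, "f": 2, "r": 2}


@dataclass
class Location:
    prefix: Optional[str]  # text before ':' if present, otherwise None
    offset: int            # hex offset, parsed to an integer
    external: bool         # True when the prefix is 'X'


@dataclass
class Entry:
    kind: str               # entry letter, e.g. 'S', 'T', 'B'
    event: Optional[str]    # most recent 'E' event name, if any
    locations: List[Location]
    counts: List[int]       # trailing numeric fields (assumed decimal)


def parse_location(text: str) -> Location:
    """Split an optional '<prefix>:' from a hex offset."""
    prefix, sep, offset = text.rpartition(":")
    return Location(prefix if sep else None, int(offset, 16), prefix == "X")


def parse_preaggregated(text: str) -> List[Entry]:
    """Parse pre-aggregated profile text into a list of entries."""
    entries: List[Entry] = []
    event: Optional[str] = None
    for raw in text.splitlines():
        fields = raw.split()
        if not fields:
            continue  # skip blank lines
        kind = fields[0]
        if kind == "E":
            event = fields[1]  # applies to all subsequent entries
            continue
        if kind not in _LOCATION_COUNT:
            raise ValueError(f"unknown entry kind: {kind!r}")
        n = _LOCATION_COUNT[kind]
        locations = [parse_location(f) for f in fields[1:1 + n]]
        counts = [int(f) for f in fields[1 + n:]]
        entries.append(Entry(kind, event, locations, counts))
    return entries
```

Keeping the trailing counts as an unlabeled list avoids committing to field names that this sketch cannot confirm.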
+
+### Examples
+
+Basic samples profile:
+```
+E cycles
+S 41be50 3
+E br_inst_retired.near_taken
+S 41be60 6
+```
+
+Trace profile combining branches and fall-throughs:
+```
+T 4b196f 4b19e0 4b19ef 2
+```
+
+Legacy branch profile with separate branches and fall-throughs:
+```
+F 41be50 41be50 3
+F 41be90 41be90 4
+B 4b1942 39b57f0 3 0
+B 4b196f 4b19e0 2 0
+```
+
+### Generation
+
+Pre-aggregated profiles can be generated by external tools. See
+[ebpf-bolt](https://github.com/aaupov/ebpf-bolt) for a reference
+implementation using eBPF-based collection.
+
+# Symbolized profiles
+The profiles accepted by llvm-bolt. fdata is the legacy format, YAML is the rich (metadata-enabled) format.
+
+## fdata format
+
+Plaintext, space-separated branch profile format written by `perf2bolt` and
+consumed by `llvm-bolt -data `. Also produced by BOLT instrumentation.
+
+### LBR mode format
+
+Each line records a branch:
+
+```
+
+```
+
+Where:
+- ``, ``: `1` if the name is an ELF symbol, `0` if
+  it is a DSO name. Special values: `2` for local symbols (includes
+  filename), `3`/`4`/`5` for memory events.
+- ``, ``: Symbol name or DSO name.
+- ``, ``: Hex offset relative to the symbol/DSO.
+- ``: Number of branch mispredictions.
+- ``: Total number of branches.
+
+Example:
+```
+1 main 3fb 0 /lib/ld-2.21.so 12 4 221
+```
+
+### No-LBR mode format
+
+Requires `no_lbr` header followed by an optional event name:
+
+```
+no_lbr
+
+```
+
+### Special headers
+
+- `boltedcollection`: Indicates profile collected on a BOLTed binary.
+  Requires BAT (BOLT Address Translation) tables for remapping.
+
+### Memory events format
+
+Memory event types use `` values 3, 4, 5 to record load address
+information alongside the instruction location.
+
+## YAML format
+
+Structured profile format with block-level granularity. More resilient to
+binary changes and supports stale profile matching.
+
+### Schema
+
+Defined in `ProfileYAMLMapping.h`:
+
+```yaml
+header:
+  profile-version:
+  binary-name:
+  binary-build-id:  # optional
+  profile-flags: [lbr|sample|memevent]
+  profile-origin:  # optional, how profile was obtained
+  profile-events:  # optional, event names
+  dfs-order:  # optional, default true
+  hash-func:  # optional, default std-hash
+functions:
+  - name:
+    fid:
+    hash:
+    exec:
+    nblocks:
+    blocks:
+      - bid:
+        insns:
+        hash:  # optional
+        exec:  # optional
+        succ: [{bid, cnt, mis}]  # optional
+        calls: [{off, fid, cnt}]  # optional
+    inline_tree: [...]  # optional, pseudo probe info
+```
+
+### Hash functions
+
+- `std-hash`: Standard hash function (default for backward compatibility).
+- `xxh3`: XXH3 hash function (recommended, better distribution).
+
+### Stale profile matching
+
+BOLT supports matching profiles to modified binaries using block hashes and
+call graph matching. When the binary changes between profile collection and
+optimization, BOLT uses the hash values to find corresponding blocks in the
+new binary.
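The idea of locating blocks by hash can be sketched as a plain dictionary lookup. This toy Python example is not BOLT's matching algorithm, which also uses call graph matching. It only shows exact-hash lookup over block records shaped like the `bid`, `hash`, and `exec` fields in the schema. The function name `match_blocks_by_hash` and the decision to skip hashes shared by several blocks are assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def match_blocks_by_hash(
    profile_blocks: List[dict],
    binary_block_hashes: List[int],
) -> Tuple[Dict[int, int], List[int]]:
    """Map profiled execution counts onto blocks of a changed binary.

    profile_blocks: records with 'bid', optional 'hash', optional 'exec'.
    binary_block_hashes: hash of each block in the new binary, by index.
    Returns (new block index -> exec count, unmatched profile bids).
    """
    index_by_hash: Dict[int, int] = {}
    ambiguous = set()
    for index, block_hash in enumerate(binary_block_hashes):
        if block_hash in index_by_hash:
            ambiguous.add(block_hash)  # same hash seen twice: do not guess
        index_by_hash[block_hash] = index

    matched: Dict[int, int] = {}
    unmatched: List[int] = []
    for block in profile_blocks:
        block_hash = block.get("hash")
        if (block_hash is None or block_hash in ambiguous
                or block_hash not in index_by_hash):
            unmatched.append(block["bid"])
            continue
        index = index_by_hash[block_hash]
        matched[index] = matched.get(index, 0) + block.get("exec", 0)
    return matched, unmatched
```

Blocks whose hash is missing, duplicated, or absent from the new binary are reported as unmatched instead of being assigned a count.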