
## Change:

* Added a `--dump-dot-func` command-line option that lets users dump CFGs only for specific functions instead of for all functions (previously the only available option was `--dump-dot-all`)

## Usage:

* Users can now specify function names or regex patterns (e.g., `--dump-dot-func=main,helper` or `--dump-dot-func="init.*"`) to generate .dot files only for functions of interest
* This aims to save time when analysing specific functions in large binaries (e.g., dumping graphs only for performance-critical functions identified through profiling) and reduces the output clutter of generating thousands of unnecessary .dot files

## Testing

The new test `dump-dot-func.test` confirms that the option does the following:

- [x] 1. `dump-dot-func` correctly filters a specified function
- [x] 2. It can achieve the above with regexes
- [x] 3. It can do 1. with a list of functions
- [x] Specifying no option creates no dot files
- [x] Passing in a non-existent function generates no dumping messages
- [x] `dump-dot-all` continues to work as expected
# BOLT - a post-link optimizer developed to speed up large applications

## SYNOPSIS

`llvm-bolt <executable> [-o outputfile] <executable>.bolt [-data=perf.fdata] [options]`

## OPTIONS

### Generic options:

- `-h`

  Alias for --help

- `--help`

  Display available options (--help-hidden for more)

- `--help-hidden`

  Display all available options

- `--help-list`

  Display list of available options (--help-list-hidden for more)

- `--help-list-hidden`

  Display list of all available options

- `--version`

  Display the version of this program

### Output options:

- `--bolt-info`

  Write bolt info section in the output binary

- `-o <string>`

  Output file

- `-w <string>`

  Save recorded profile to a file

### BOLT generic options:

- `--align-text=<uint>`

  Alignment of .text section

- `--allow-stripped`

  Allow processing of stripped binaries

- `--alt-inst-feature-size=<uint>`

  Size of feature field in .altinstructions

- `--alt-inst-has-padlen`

  Specify that .altinstructions has padlen field

- `--asm-dump[=<dump folder>]`

  Dump function into assembly

- `-b`

  Alias for --data

- `--bolt-id=<string>`

  Add any string to tag this execution in the output binary via bolt info section

- `--break-funcs=<func1,func2,func3,...>`

  List of functions to core dump on (debugging)

- `--check-encoding`

  Perform verification of LLVM instruction encoding/decoding. Every instruction
  in the input is decoded and re-encoded. If the resulting bytes do not match
  the input, a warning message is printed.

- `--comp-dir-override=<string>`

  Overrides DW_AT_comp_dir, and provides an alternative base location, which is
  used with DW_AT_dwo_name to construct a path to `*.dwo` files.

- `--create-debug-names-section`

  Creates .debug_names section, if the input binary doesn't have it already, for
  DWARF5 CU/TUs.

- `--cu-processing-batch-size=<uint>`

  Specifies the size of batches for processing CUs. Higher number has better
  performance, but more memory usage. Default value is 1.

- `--data=<string>`

  Data file

- `--data2=<string>`

  Data file

- `--debug-skeleton-cu`

  Prints out offsets for abbrev and debug_info of Skeleton CUs that get patched.

- `--debug-thread-count=<uint>`

  Specifies the number of threads to be used when processing DWO debug information.

- `--dot-tooltip-code`

  Add basic block instructions as tool tips on nodes

- `--dump-alt-instructions`

  Dump Linux alternative instructions info

- `--dump-cg=<string>`

  Dump callgraph to the given file

- `--dump-data`

  Dump parsed bolt data for debugging

- `--dump-dot-all`

  Dump function CFGs to graphviz format after each stage; enable '-print-loops'
  for color-coded blocks

- `--dump-dot-func=<func1,func2,func3,...>`

  Dump function CFGs to graphviz format for specified functions only;
  takes function name patterns (regex supported). Note: C++ function names
  must be passed using their mangled names.

- `--dump-linux-exceptions`

  Dump Linux kernel exception table

- `--dump-orc`

  Dump raw ORC unwind information (sorted)

- `--dump-para-sites`

  Dump Linux kernel paravirtual patch sites

- `--dump-pci-fixups`

  Dump Linux kernel PCI fixup table

- `--dump-smp-locks`

  Dump Linux kernel SMP locks

- `--dump-static-calls`

  Dump Linux kernel static calls

- `--dump-static-keys`

  Dump Linux kernel static keys jump table

- `--dwarf-output-path=<string>`

  Path to where .dwo files or dwp file will be written out to.

- `--dwp=<string>`

  Path and name to DWP file.

- `--dyno-stats`

  Print execution info based on profile

- `--dyno-stats-all`

  Print dyno stats after each stage

- `--dyno-stats-scale=<uint>`

  Scale to be applied while reporting dyno stats

- `--enable-bat`

  Write BOLT Address Translation tables

- `--force-data-relocations`

  Force relocations to data sections to always be processed

- `--force-patch`

  Force patching of original entry points

- `--funcs=<func1,func2,func3,...>`

  Limit optimizations to functions from the list

- `--funcs-file=<string>`

  File with list of functions to optimize

- `--funcs-file-no-regex=<string>`

  File with list of functions to optimize (non-regex)

- `--funcs-no-regex=<func1,func2,func3,...>`

  Limit optimizations to functions from the list (non-regex)

- `--hot-data`

  Hot data symbols support (relocation mode)

- `--hot-functions-at-end`

  If reorder-functions is used, order functions putting hottest last

- `--hot-text`

  Generate hot text symbols. Apply this option to a precompiled binary that
  manually calls into hugify, such that at runtime hugify call will put hot code
  into 2M pages. This requires relocation.

- `--hot-text-move-sections=<sec1,sec2,sec3,...>`

  List of sections containing functions used for hugifying hot text. BOLT makes
  sure these functions are not placed on the same page as the hot text.
  (default='.stub,.mover').

- `--insert-retpolines`

  Run retpoline insertion pass

- `--keep-aranges`

  Keep or generate .debug_aranges section if .gdb_index is written

- `--keep-tmp`

  Preserve intermediate .o file

- `--lite`

  Skip processing of cold functions

- `--log-file=<string>`

  Redirect journaling to a file instead of stdout/stderr

- `--long-jump-labels`

  Always use long jumps/nops for Linux kernel static keys

- `--match-profile-with-function-hash`

  Match profile with function hash

- `--max-data-relocations=<uint>`

  Maximum number of data relocations to process

- `--max-funcs=<uint>`

  Maximum number of functions to process

- `--no-huge-pages`

  Use regular size pages for code alignment

- `--no-threads`

  Disable multithreading

- `--pad-funcs=<func1:pad1,func2:pad2,func3:pad3,...>`

  List of functions to pad, each with the given number of bytes

- `--print-mappings`

  Print mappings in the legend, between characters/blocks and text sections
  (default false).

- `--profile-format=<value>`

  Format to dump profile output in aggregation mode, default is fdata
  - `fdata`: offset-based plaintext format
  - `yaml`: dense YAML representation

- `--r11-availability=<value>`

  Determine the availability of r11 before indirect branches
  - `never`: r11 not available
  - `always`: r11 available before calls and jumps
  - `abi`: r11 available before calls but not before jumps

- `--relocs`

  Use relocations in the binary (default=autodetect)

- `--remove-symtab`

  Remove .symtab section

- `--reorder-skip-symbols=<symbol1,symbol2,symbol3,...>`

  List of symbol names that cannot be reordered

- `--reorder-symbols=<symbol1,symbol2,symbol3,...>`

  List of symbol names that can be reordered

- `--retpoline-lfence`

  Determine if lfence instruction should exist in the retpoline

- `--skip-funcs=<func1,func2,func3,...>`

  List of functions to skip

- `--skip-funcs-file=<string>`

  File with list of functions to skip

- `--strict`

  Trust the input to be from a well-formed source

- `--tasks-per-thread=<uint>`

  Number of tasks to be created per thread

- `--terminal-trap`

  Assume that execution stops at trap instruction

- `--thread-count=<uint>`

  Number of threads

- `--top-called-limit=<uint>`

  Maximum number of functions to print in top called functions section

- `--trap-avx512`

  In relocation mode trap upon entry to any function that uses AVX-512
  instructions

- `--trap-old-code`

  Insert traps in old function bodies (relocation mode)

- `--update-debug-sections`

  Update DWARF debug sections of the executable

- `--use-gnu-stack`

  Use GNU_STACK program header for new segment (workaround for issues with
  strip/objcopy)

- `--use-old-text`

  Re-use space in old .text if possible (relocation mode)

- `-v <uint>`

  Set verbosity level for diagnostic output

- `--write-dwp`

  Output a single dwarf package file (dwp) instead of multiple non-relocatable
  dwarf object files (dwo).

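To make the pattern semantics of `--dump-dot-func` concrete, here is a small Python model of how a comma-separated list of patterns could select functions. It is a sketch under stated assumptions, not BOLT's implementation: the helper `select_functions_to_dump` is hypothetical, the option value is assumed to be split on commas, and each pattern is assumed to match a function's whole name (as if wrapped in `^...$`). Following the option's note, a C++ function is selected by its mangled name.

```python
import re


def select_functions_to_dump(function_names, option_value):
    """Return the names a --dump-dot-func value would select (hypothetical model).

    Assumptions: the value is a comma-separated list of regular expressions,
    and a function is selected when any pattern matches its entire name.
    """
    patterns = [re.compile(p) for p in option_value.split(",") if p]
    return [name for name in function_names
            if any(p.fullmatch(name) for p in patterns)]


if __name__ == "__main__":
    names = ["main", "helper", "init_config", "init_logging", "_Z3fooi"]
    print(select_functions_to_dump(names, "main,init.*"))
    # ['main', 'init_config', 'init_logging']
    print(select_functions_to_dump(names, "_Z3fooi"))  # mangled C++ name
    # ['_Z3fooi']
    print(select_functions_to_dump(names, "does_not_exist"))
    # []
```

Under these assumptions a pattern that matches no function selects nothing, so no .dot files would be written for it.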
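The `--pad-funcs` value pairs each function with a byte count. The hypothetical Python helper `parse_pad_funcs` below shows one way to read that `<func>:<bytes>` format; splitting each entry at its first colon and rejecting malformed or negative entries are assumptions of this sketch rather than documented BOLT behavior.

```python
def parse_pad_funcs(option_value):
    """Turn a --pad-funcs style string into {function_name: pad_bytes}.

    Hypothetical helper. Assumes entries look like "<name>:<bytes>", are
    separated by commas, and carry a non-negative decimal byte count.
    """
    padding = {}
    for entry in option_value.split(","):
        if not entry:
            continue  # tolerate empty entries such as a trailing comma
        name, sep, amount = entry.partition(":")  # split at the first colon
        if not sep or not name:
            raise ValueError(f"expected <func>:<bytes>, got {entry!r}")
        pad = int(amount)  # raises ValueError for a missing or non-numeric count
        if pad < 0:
            raise ValueError(f"padding must be non-negative, got {pad}")
        padding[name] = pad
    return padding


if __name__ == "__main__":
    print(parse_pad_funcs("main:16,hot_loop:64"))
    # {'main': 16, 'hot_loop': 64}
```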
### BOLT optimization options:

- `--align-blocks`

  Align basic blocks

- `--align-blocks-min-size=<uint>`

  Minimal size of the basic block that should be aligned

- `--align-blocks-threshold=<uint>`

  Align only blocks with frequency larger than containing function execution
  frequency specified in percent. E.g. 1000 means aligning blocks that are 10
  times more frequently executed than the containing function.

- `--align-functions=<uint>`

  Align functions at a given value (relocation mode)

- `--align-functions-max-bytes=<uint>`

  Maximum number of bytes to use to align functions

- `--assume-abi`

  Assume the ABI is never violated

- `--block-alignment=<uint>`

  Boundary to use for alignment of basic blocks

- `--bolt-seed=<uint>`

  Seed for randomization

- `--cg-from-perf-data`

  Use perf data directly when constructing the call graph for stale functions

- `--cg-ignore-recursive-calls`

  Ignore recursive calls when constructing the call graph

- `--cg-use-split-hot-size`

  Use hot/cold data on basic blocks to determine hot sizes for call graph
  functions

- `--cold-threshold=<uint>`

  Tenths of a percent of main entry frequency to use as a threshold when
  evaluating whether a basic block is cold (0 means it is only considered cold
  if the block has zero samples). Default: 0

- `--elim-link-veneers`

  Run veneer elimination pass

- `--eliminate-unreachable`

  Eliminate unreachable code

- `--equalize-bb-counts`

  Use same count for BBs that should have equivalent count (used in non-LBR and
  shrink wrapping)

- `--execution-count-threshold=<uint>`

  Perform profiling accuracy-sensitive optimizations only if function execution
  count >= the threshold (default: 0)

- `--fix-block-counts`

  Adjust block counts based on outgoing branch counts

- `--fix-func-counts`

  Adjust function counts based on basic blocks execution count

- `--force-inline=<func1,func2,func3,...>`

  List of functions to always consider for inlining

- `--frame-opt=<value>`

  Optimize stack frame accesses
  - `none`: do not perform frame optimization
  - `hot`: perform FOP on hot functions
  - `all`: perform FOP on all functions

- `--frame-opt-rm-stores`

  Apply additional analysis to remove stores (experimental)

- `--function-order=<string>`

  File containing an ordered list of functions to use for function reordering

- `--generate-function-order=<string>`

  File to dump the ordered list of functions to use for function reordering

- `--generate-link-sections=<string>`

  Generate a list of function sections in a format suitable for inclusion in a
  linker script

- `--group-stubs`

  Share stubs across functions

- `--hugify`

  Automatically put hot code on 2MB page(s) (hugify) at runtime. No manual call
  to hugify is needed in the binary (which is what --hot-text relies on).

- `--icf=<value>`

  Fold functions with identical code
  - `all`: Enable identical code folding
  - `none`: Disable identical code folding (default)
  - `safe`: Enable safe identical code folding

- `--icp`

  Alias for --indirect-call-promotion

- `--icp-calls-remaining-percent-threshold=<uint>`

  The percentage threshold against remaining unpromoted indirect call count for
  the promotion for calls

- `--icp-calls-topn`

  Alias for --indirect-call-promotion-calls-topn

- `--icp-calls-total-percent-threshold=<uint>`

  The percentage threshold against total count for the promotion for calls

- `--icp-eliminate-loads`

  Enable load elimination using memory profiling data when performing ICP

- `--icp-funcs=<func1,func2,func3,...>`

  List of functions to enable ICP for

- `--icp-inline`

  Only promote call targets eligible for inlining

- `--icp-jt-remaining-percent-threshold=<uint>`

  The percentage threshold against remaining unpromoted indirect call count for
  the promotion for jump tables

- `--icp-jt-targets`

  Alias for --icp-jump-tables-targets

- `--icp-jt-topn`

  Alias for --indirect-call-promotion-jump-tables-topn

- `--icp-jt-total-percent-threshold=<uint>`

  The percentage threshold against total count for the promotion for jump tables

- `--icp-jump-tables-targets`

  For jump tables, optimize indirect jmp targets instead of indices

- `--icp-mp-threshold`

  Alias for --indirect-call-promotion-mispredict-threshold

- `--icp-old-code-sequence`

  Use old code sequence for promoted calls

- `--icp-top-callsites=<uint>`

  Optimize hottest calls until at least this percentage of all indirect calls
  frequency is covered. 0 = all callsites

- `--icp-topn`

  Alias for --indirect-call-promotion-topn

- `--icp-use-mp`

  Alias for --indirect-call-promotion-use-mispredicts

- `--indirect-call-promotion=<value>`

  Indirect call promotion
  - `none`: do not perform indirect call promotion
  - `calls`: perform ICP on indirect calls
  - `jump-tables`: perform ICP on jump tables
  - `all`: perform ICP on calls and jump tables

- `--indirect-call-promotion-calls-topn=<uint>`

  Limit number of targets to consider when doing indirect call promotion on
  calls. 0 = no limit

- `--indirect-call-promotion-jump-tables-topn=<uint>`

  Limit number of targets to consider when doing indirect call promotion on jump
  tables. 0 = no limit

- `--indirect-call-promotion-topn=<uint>`

  Limit number of targets to consider when doing indirect call promotion. 0 = no
  limit

- `--indirect-call-promotion-use-mispredicts`

  Use misprediction frequency for determining whether or not ICP should be
  applied at a callsite. The -indirect-call-promotion-mispredict-threshold
  value will be used by this heuristic

- `--infer-fall-throughs`

  Infer execution count for fall-through blocks

- `--infer-stale-profile`

  Infer counts from stale profile data.

- `--inline-all`

  Inline all functions

- `--inline-ap`

  Adjust function profile after inlining

- `--inline-limit=<uint>`

  Maximum number of call sites to inline

- `--inline-max-iters=<uint>`

  Maximum number of inline iterations

- `--inline-memcpy`

  Inline memcpy using 'rep movsb' instruction (X86-only)

- `--inline-small-functions`

  Inline functions if increase in size is less than defined by
  -inline-small-functions-bytes

- `--inline-small-functions-bytes=<uint>`

  Max number of bytes for the function to be considered small for inlining
  purposes

- `--instrument`

  Instrument code to generate accurate profile data

- `--iterative-guess`

  In non-LBR mode, guess edge counts using iterative technique

- `--jt-footprint-optimize-for-icache`

  With jt-footprint-reduction, only process PIC jumptables and turn off other
  transformations that increase code size

- `--jt-footprint-reduction`

  Make jump tables size smaller at the cost of using more instructions at jump
  sites

- `--jump-tables=<value>`

  Jump tables support (default=basic)
  - `none`: do not optimize functions with jump tables
  - `basic`: optimize functions with jump tables
  - `move`: move jump tables to a separate section
  - `split`: split jump tables section into hot and cold based on function
    execution frequency
  - `aggressive`: aggressively split jump tables section based on usage of the
    tables

- `--keep-nops`

  Keep no-op instructions. By default they are removed.

- `--lite-threshold-count=<uint>`

  Similar to '-lite-threshold-pct' but specify threshold using absolute function
  call count. I.e. limit processing to functions executed at least the specified
  number of times.

- `--lite-threshold-pct=<uint>`

  Threshold (in percent) for selecting functions to process in lite mode. Higher
  threshold means fewer functions to process. E.g. threshold of 90 means only top
  10 percent of functions with profile will be processed.

- `--match-with-call-graph`

  Match functions with call graph

- `--memcpy1-spec=<func1,func2:cs1:cs2,func3:cs1,...>`

  List of functions with call sites for which to specialize memcpy() for size 1

- `--min-branch-clusters`

  Use a modified clustering algorithm geared towards minimizing branches

- `--name-similarity-function-matching-threshold=<uint>`

  Match functions using namespace and edit distance.

- `--no-inline`

  Disable all inlining (overrides other inlining options)

- `--no-scan`

  Do not scan cold functions for external references (may result in slower binary)

- `--peepholes=<value>`

  Enable peephole optimizations
  - `none`: disable peepholes
  - `double-jumps`: remove double jumps when able
  - `tailcall-traps`: insert tail call traps
  - `useless-branches`: remove useless conditional branches
  - `all`: enable all peephole optimizations

- `--plt=<value>`

  Optimize PLT calls (requires linking with -znow)
  - `none`: do not optimize PLT calls
  - `hot`: optimize executed (hot) PLT calls
  - `all`: optimize all PLT calls

- `--preserve-blocks-alignment`

  Try to preserve basic block alignment

- `--profile-ignore-hash`

  Ignore hash while reading function profile

- `--profile-use-dfs`

  Use DFS order for YAML profile

- `--reg-reassign`

  Reassign registers so as to avoid using REX prefixes in hot code

- `--reorder-blocks=<value>`

  Change layout of basic blocks in a function
  - `none`: do not reorder basic blocks
  - `reverse`: layout blocks in reverse order
  - `normal`: perform optimal layout based on profile
  - `branch-predictor`: perform optimal layout prioritizing branch predictions
  - `cache`: perform optimal layout prioritizing I-cache behavior
  - `cache+`: perform layout optimizing I-cache behavior
  - `ext-tsp`: perform layout optimizing I-cache behavior
  - `cluster-shuffle`: perform random layout of clusters

- `--reorder-data=<section1,section2,section3,...>`

  List of sections to reorder

- `--reorder-data-algo=<value>`

  Algorithm used to reorder data sections
  - `count`: sort hot data by read counts
  - `funcs`: sort hot data by hot function usage and count

- `--reorder-data-inplace`

  Reorder data sections in place

- `--reorder-data-max-bytes=<uint>`

  Maximum number of bytes to reorder

- `--reorder-data-max-symbols=<uint>`

  Maximum number of symbols to reorder

- `--reorder-functions=<value>`

  Reorder and cluster functions (works only with relocations)
  - `none`: do not reorder functions
  - `exec-count`: order by execution count
  - `hfsort`: use hfsort algorithm
  - `hfsort+`: use cache-directed sort
  - `cdsort`: use cache-directed sort
  - `pettis-hansen`: use Pettis-Hansen algorithm
  - `random`: reorder functions randomly
  - `user`: use function order specified by -function-order

- `--reorder-functions-use-hot-size`

  Use a function's hot size when doing clustering

- `--report-bad-layout=<uint>`

  Print top <uint> functions with suboptimal code layout on input

- `--report-stale`

  Print the list of functions with stale profile

- `--runtime-hugify-lib=<string>`

  Specify file name of the runtime hugify library

- `--runtime-instrumentation-lib=<string>`

  Specify file name of the runtime instrumentation library

- `--sctc-mode=<value>`

  Mode for simplify conditional tail calls
  - `always`: always perform sctc
  - `preserve`: only perform sctc when branch direction is preserved
  - `heuristic`: use branch prediction data to control sctc

- `--sequential-disassembly`

  Performs disassembly sequentially

- `--shrink-wrapping-threshold=<uint>`

  Percentage of prologue execution count to use as threshold when evaluating
  whether a block is cold enough to be profitable to move eligible spills there

- `--simplify-conditional-tail-calls`

  Simplify conditional tail calls by removing unnecessary jumps

- `--simplify-rodata-loads`

  Simplify loads from read-only sections by replacing the memory operand with
  the constant found in the corresponding section

- `--split-align-threshold=<uint>`

  When deciding to split a function, apply this alignment while doing the size
  comparison (see -split-threshold). Default value: 2.

- `--split-all-cold`

  Outline as many cold basic blocks as possible

- `--split-eh`

  Split C++ exception handling code

- `--split-functions`

  Split functions into fragments

- `--split-strategy=<value>`

  Strategy used to partition blocks into fragments
  - `profile2`: split each function into a hot and cold fragment using profiling
    information
  - `cdsplit`: split each function into a hot, warm, and cold fragment using
    profiling information
  - `random2`: split each function into a hot and cold fragment at a randomly
    chosen split point (ignoring any available profiling information)
  - `randomN`: split each function into N fragments at randomly chosen split
    points (ignoring any available profiling information)
  - `all`: split all basic blocks of each function into fragments such that each
    fragment contains exactly a single basic block

- `--split-threshold=<uint>`

  Split function only if its main size is reduced by more than given amount of
  bytes. Default value: 0, i.e. split iff the size is reduced. Note that on some
  architectures the size can increase after splitting.

- `--stale-matching-max-func-size=<uint>`

  The maximum size of a function to consider for inference.

- `--stale-matching-min-matched-block=<uint>`

  Percentage threshold of matched basic blocks at which stale profile inference
  is executed.

- `--stale-threshold=<uint>`

  Maximum percentage of stale functions to tolerate (default: 100)

- `--stoke`

  Turn on the stoke analysis

- `--strip-rep-ret`

  Strip 'repz' prefix from 'repz retq' sequence (on by default)

- `--tail-duplication=<value>`

  Duplicate unconditional branches that cross a cache line
  - `none`: do not apply
  - `aggressive`: aggressive strategy
  - `moderate`: moderate strategy
  - `cache`: cache-aware duplication strategy

- `--tsp-threshold=<uint>`

  Maximum number of hot basic blocks in a function for which to use a precise
  TSP solution while re-ordering basic blocks

- `--use-aggr-reg-reassign`

  Use register liveness analysis to try to find more opportunities for
  -reg-reassign optimization

- `--use-compact-aligner`

  Use compact approach for aligning functions

- `--use-edge-counts`

  Use edge count data when doing clustering

- `--verify-cfg`

  Verify the CFG after every pass

- `--x86-align-branch-boundary-hot-only`

  Only apply branch boundary alignment in hot code

- `--x86-strip-redundant-address-size`

  Remove redundant Address-Size override prefix

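Because `--cold-threshold` is expressed in tenths of a percent of the main entry count, a value of 5 corresponds to 0.5%. The Python sketch below works through that arithmetic; the function name `is_cold_block` is hypothetical, and treating a block as cold when its count is at or below the cutoff is an assumption chosen to agree with the documented default (0 marks only zero-count blocks as cold).

```python
def is_cold_block(block_count, entry_count, cold_threshold):
    """Hypothetical model of the --cold-threshold test.

    cold_threshold is in tenths of a percent of the entry count, so the
    cutoff is entry_count * cold_threshold / 1000. Assumption: a block whose
    count is at or below the cutoff is cold.
    """
    # Compare block_count <= entry_count * cold_threshold / 1000 using integers.
    return block_count * 1000 <= entry_count * cold_threshold


if __name__ == "__main__":
    # Entry count 20000 with --cold-threshold=5 gives a cutoff of 100 samples.
    print(is_cold_block(100, 20000, 5))  # True
    print(is_cold_block(101, 20000, 5))  # False
    # With the default of 0, only blocks without samples are cold.
    print(is_cold_block(0, 20000, 0), is_cold_block(1, 20000, 0))  # True False
```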
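`--icp-top-callsites` describes a greedy coverage rule. A minimal Python sketch of that rule, assuming per-call-site frequencies are already known (the `select_top_callsites` name and the dictionary input are hypothetical), takes call sites hottest first until they account for the requested share of all indirect-call frequency:

```python
def select_top_callsites(callsite_counts, top_callsites_pct):
    """Hypothetical model of --icp-top-callsites.

    callsite_counts maps a call-site label to its indirect-call frequency.
    Sites are taken hottest first until their combined frequency covers at
    least top_callsites_pct percent of the total; 0 selects every call site.
    """
    ranked = sorted(callsite_counts, key=callsite_counts.get, reverse=True)
    if top_callsites_pct == 0:
        return ranked
    total = sum(callsite_counts.values())
    selected, covered = [], 0
    for site in ranked:
        # Stop as soon as the requested coverage has been reached.
        if covered * 100 >= total * top_callsites_pct:
            break
        selected.append(site)
        covered += callsite_counts[site]
    return selected


if __name__ == "__main__":
    counts = {"a": 700, "b": 200, "c": 60, "d": 40}
    print(select_top_callsites(counts, 90))  # ['a', 'b']
    print(select_top_callsites(counts, 95))  # ['a', 'b', 'c']
    print(select_top_callsites(counts, 0))   # ['a', 'b', 'c', 'd']
```

Comparing `covered * 100` against `total * pct` keeps the check in integers, so a 90% target is met exactly when the selected sites reach 900 of 1000 samples.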
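The `--lite-threshold-pct` example (90 keeps roughly the top 10 percent of profiled functions) can be modeled as follows. This Python sketch is a simplification: the helper name `select_lite_functions` is hypothetical, functions without samples are excluded, and rounding the kept count up is an assumption, so the exact cut BOLT makes (for instance, how ties are handled) may differ.

```python
def select_lite_functions(exec_counts, threshold_pct):
    """Hypothetical model of --lite-threshold-pct.

    exec_counts maps function names to profiled execution counts. Functions
    with samples are ranked hottest first and the top (100 - threshold_pct)
    percent are kept; the kept count is rounded up (an assumption).
    """
    threshold_pct = min(max(threshold_pct, 0), 100)  # clamp to a valid percentage
    profiled = [name for name, count in exec_counts.items() if count > 0]
    profiled.sort(key=lambda name: exec_counts[name], reverse=True)
    keep_pct = 100 - threshold_pct
    keep = -(-len(profiled) * keep_pct // 100)  # ceiling division
    return profiled[:keep]


if __name__ == "__main__":
    counts = {f"f{i}": i * 10 for i in range(21)}  # f0 has no samples
    print(select_lite_functions(counts, 90))  # ['f20', 'f19']
```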
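The `--memcpy1-spec` placeholder `<func1,func2:cs1:cs2,func3:cs1,...>` mixes bare function names with names followed by call-site numbers. The hypothetical Python parser `parse_memcpy1_spec` below shows how such a value decomposes; reading a bare name as "all memcpy() call sites in that function" is an assumption of this sketch, not something the option text states.

```python
def parse_memcpy1_spec(option_value):
    """Split a --memcpy1-spec style value into {function: set_of_call_sites}.

    Hypothetical helper. An empty set stands for a bare function name, which
    this sketch assumes means every memcpy() call site in that function.
    """
    spec = {}
    for entry in option_value.split(","):
        if not entry:
            continue
        name, *sites = entry.split(":")  # first field is the function name
        spec[name] = {int(site) for site in sites}
    return spec


if __name__ == "__main__":
    print(parse_memcpy1_spec("func1,func2:1:3,func3:2"))
    # {'func1': set(), 'func2': {1, 3}, 'func3': {2}}
```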
### BOLT instrumentation options:

`llvm-bolt <executable> -instrument [-o outputfile] <instrumented-executable>`

- `--conservative-instrumentation`

  Disable instrumentation optimizations that sacrifice profile accuracy (for
  debugging, default: false)

- `--instrument-calls`

  Record profile for inter-function control flow activity (default: true)

- `--instrument-hot-only`

  Only insert instrumentation on hot functions (needs profile, default: false)

- `--instrumentation-binpath=<string>`

  Path to instrumented binary in case /proc/self/map_files is not accessible
  due to access restriction issues

- `--instrumentation-file=<string>`

  File name where instrumented profile will be saved (default: /tmp/prof.fdata)

- `--instrumentation-file-append-pid`

  Append PID to saved profile file name (default: false)

- `--instrumentation-no-counters-clear`

  Don't clear counters across dumps (use with instrumentation-sleep-time option)

- `--instrumentation-sleep-time=<uint>`

  Interval between profile writes (default: 0 = write only at program end).
  This is useful for service workloads when you want to dump profile every X
  minutes or if you are killing the program and the profile is not being dumped
  at the end.

- `--instrumentation-wait-forks`

  Wait until all forks of instrumented process will finish (use with
  instrumentation-sleep-time option)

### BOLT printing options:

- `--print-aliases`

  Print aliases when printing objects

- `--print-all`

  Print functions after each stage

- `--print-cfg`

  Print functions after CFG construction

- `--print-debug-info`

  Print debug info when printing functions

- `--print-disasm`

  Print function after disassembly

- `--print-dyno-opcode-stats=<uint>`

  Print per instruction opcode dyno stats and the function names:BB offsets of
  the nth highest execution counts

- `--print-dyno-stats-only`

  While printing functions output dyno-stats and skip instructions

- `--print-exceptions`

  Print exception handling data

- `--print-globals`

  Print global symbols after disassembly

- `--print-jump-tables`

  Print jump tables

- `--print-loops`

  Print loop related information

- `--print-mem-data`

  Print memory data annotations when printing functions

- `--print-normalized`

  Print functions after CFG is normalized

- `--print-only=<func1,func2,func3,...>`

  List of functions to print

- `--print-orc`

  Print ORC unwind information for instructions

- `--print-profile`

  Print functions after attaching profile

- `--print-profile-stats`

  Print profile quality/bias analysis

- `--print-pseudo-probes=<value>`

  Print pseudo probe info
  - `decode`: decode probes section from binary
  - `address_conversion`: update address2ProbesMap with output block address
  - `encoded_probes`: display the encoded probes in binary section
  - `all`: enable all debugging printout

- `--print-relocations`

  Print relocations when printing functions/objects

- `--print-reordered-data`

  Print section contents after reordering

- `--print-retpoline-insertion`

  Print functions after retpoline insertion pass

- `--print-sdt`

  Print all SDT markers

- `--print-sections`

  Print all registered sections

- `--print-unknown`

  Print names of functions with unknown control flow

- `--time-build`

  Print time spent constructing binary functions

- `--time-rewrite`

  Print time spent in rewriting passes

- `--print-after-branch-fixup`

  Print function after fixing local branches

- `--print-after-jt-footprint-reduction`

  Print function after jt-footprint-reduction pass

- `--print-after-lowering`

  Print function after instruction lowering

- `--print-cache-metrics`

  Calculate and print various metrics for instruction cache

- `--print-clusters`

  Print clusters

- `--print-estimate-edge-counts`

  Print function after edge counts are set for no-LBR profile

- `--print-finalized`

  Print function after CFG is finalized

- `--print-fix-relaxations`

  Print functions after fix relaxations pass

- `--print-fix-riscv-calls`

  Print functions after fix RISCV calls pass

- `--print-fop`

  Print functions after frame optimizer pass

- `--print-function-statistics=<uint>`

  Print statistics about basic block ordering

- `--print-icf`

  Print functions after ICF optimization

- `--print-icp`

  Print functions after indirect call promotion

- `--print-inline`

  Print functions after inlining optimization

- `--print-large-functions`

  Print functions that could not be overwritten due to excessive size

- `--print-longjmp`

  Print functions after longjmp pass

- `--print-optimize-bodyless`

  Print functions after bodyless optimization

- `--print-output-address-range`

  Print output address range for each basic block in the function
  when `BinaryFunction::print` is called

- `--print-peepholes`

  Print functions after peephole optimization

- `--print-plt`

  Print functions after PLT optimization

- `--print-regreassign`

  Print functions after regreassign pass

- `--print-reordered`

  Print functions after layout optimization

- `--print-reordered-functions`

  Print functions after clustering

- `--print-sctc`

  Print functions after conditional tail call simplification

- `--print-simplify-rodata-loads`

  Print functions after simplification of RO data loads

- `--print-sorted-by=<value>`

  Print functions sorted by order of dyno stats
  - `executed-forward-branches`: executed forward branches
  - `taken-forward-branches`: taken forward branches
  - `executed-backward-branches`: executed backward branches
  - `taken-backward-branches`: taken backward branches
  - `executed-unconditional-branches`: executed unconditional branches
  - `all-function-calls`: all function calls
  - `indirect-calls`: indirect calls
  - `PLT-calls`: PLT calls
  - `executed-instructions`: executed instructions
  - `executed-load-instructions`: executed load instructions
  - `executed-store-instructions`: executed store instructions
  - `taken-jump-table-branches`: taken jump table branches
  - `taken-unknown-indirect-branches`: taken unknown indirect branches
  - `total-branches`: total branches
  - `taken-branches`: taken branches
  - `non-taken-conditional-branches`: non-taken conditional branches
  - `taken-conditional-branches`: taken conditional branches
  - `all-conditional-branches`: all conditional branches
  - `linker-inserted-veneer-calls`: linker-inserted veneer calls
  - `all`: sorted by all names

- `--print-sorted-by-order=<value>`

  Use ascending or descending order when printing functions ordered by dyno stats

- `--print-split`

  Print functions after code splitting

- `--print-stoke`

  Print functions after stoke analysis

- `--print-uce`

  Print functions after unreachable code elimination

- `--print-veneer-elimination`

  Print functions after veneer elimination pass

- `--time-opts`

  Print time spent in each optimization

- `--print-all-options`

  Print all option values after command line parsing

- `--print-options`

  Print non-default options after command line parsing