## Change:
* Added `--dump-dot-func` command-line option that allows users to dump
CFGs only for specific functions instead of dumping all functions
(currently the only available option is `--dump-dot-all`)
## Usage:
* Users can now specify function names or regex patterns (e.g.,
`--dump-dot-func=main,helper` or `--dump-dot-func="init.*"`) to generate
.dot files only for functions of interest
* Aims to save time when analysing specific functions in large binaries
(e.g., only dumping graphs for performance-critical functions identified
through profiling) and to reduce output clutter from generating
thousands of unnecessary .dot files
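The selection rule described above can be sketched as follows. This is only an illustration in Python, not BOLT's implementation: the helper name `select_functions` is hypothetical, and it assumes that each comma-separated entry is treated as a regular expression that must match the whole function name.

```python
import re

def select_functions(option_value, function_names):
    """Return the names selected by a comma-separated list of names or regexes.

    Assumption: every entry must match the full function name.
    """
    patterns = [re.compile(p) for p in option_value.split(",") if p]
    return [
        name
        for name in function_names
        if any(p.fullmatch(name) for p in patterns)
    ]

# Hypothetical function names, for illustration only.
functions = ["main", "helper", "init_a", "init_b", "cleanup"]
print(select_functions("main,helper", functions))
print(select_functions("init.*", functions))
```

Under the full-match assumption, a plain name such as `main` selects only that function, while `init.*` selects every function whose name starts with `init`.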
## Testing
The introduced test `dump-dot-func.test` confirms the new option does
the following:
- [x] 1. `dump-dot-func` can correctly filter a specified function
- [x] 2. Can achieve the above with regexes
- [x] 3. Can do 1. with a list of functions
- [x] No option specified creates no dot files
- [x] Passing in a non-existent function generates no dumping messages
- [x] `dump-dot-all` continues to work as expected
This is a follow-up to #150963. The X86 HLT instruction may appear in
user-level code, in which case we should treat it as a terminator.
Handle it as a non-terminator in Linux kernel mode.
The name is misleading, as setting Fragment to nullptr does not
necessarily make it undefined - common and equated symbols have
a nullptr fragment as well.
For x86, the halt instruction is defined as a terminator instruction.
When building the CFG, the instruction sequence following the hlt
instruction is treated as an independent MBB. Since there is no jump
information, the predecessor of this MBB cannot be identified, and it is
considered an unreachable MBB that will be removed.
With this fix, the instruction sequences before and after hlt are no
longer placed in different blocks.
The FrameOptimizer pass runs by default on all targets, although it
defaults to --frame-opt=none. This patch prevents the pass from running on
non-x86 targets when any other value (hot, full) is specified.
In perf2bolt, we are observing sporadic crashes in the recently added
registerProfiledFunctions from #150622. Addresses provided by the
hardware (from LBR) might be -1, which clashes with the values LLVM
reserves in DenseSet as empty and tombstone keys. This causes DenseSet to assert
with "can't insert empty tombstone into map" when ingesting this
data. Revert this change for now to unbreak perf2bolt.
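One possible guard against this kind of clash, sketched in Python for illustration only: drop addresses that coincide with the reserved sentinel values before inserting them into the set. The constants assume a 64-bit key whose empty and tombstone values are the two largest representable integers, and the helper name is hypothetical. This is not what the revert itself does.

```python
MASK64 = 2**64 - 1
EMPTY_KEY = MASK64          # assumed empty sentinel (-1 as an unsigned 64-bit value)
TOMBSTONE_KEY = MASK64 - 1  # assumed tombstone sentinel

def insertable_addresses(addresses):
    """Return the addresses that do not collide with the reserved sentinel keys."""
    reserved = {EMPTY_KEY, TOMBSTONE_KEY}
    result = []
    for address in addresses:
        address &= MASK64  # view the value as an unsigned 64-bit integer
        if address not in reserved:
            result.append(address)
    return result

print([hex(a) for a in insertable_addresses([0x401000, -1, 0x401020])])
```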
When LLVM_LINK_LLVM_DYLIB is ON, `check-bolt` target reports unit test
failures:
BOLT-Unit :: Core/./CoreTests/failed_to_discover_tests_from_gtest
BOLT-Unit :: Profile/./ProfileTests/failed_to_discover_tests_from_gtest
The reason is that when llvm-lit runs a unit-test executable:
/path/to/CoreTests --gtest_list_tests '--gtest_filter=-*DISABLED_*'
an assertion is triggered with the following message:
LLVM ERROR: Option 'default' already exists!
This assertion triggers when the initializer of defaultListDAGScheduler
defined at SelectionDAGISel.cpp:219 is called as a statically-linked
function after already being called during the initialization of
libLLVM.
The issue can be traced down to LLVMTestingSupport library which depends
on libLLVM as neither COMPONENT_LIB nor DISABLE_LLVM_LINK_LLVM_DYLIB is
specified in a call to `add_llvm_library(LLVMTestingSupport ...)`.
Specifying DISABLE_LLVM_LINK_LLVM_DYLIB for LLVMTestingSupport makes
Clang unit tests fail, and COMPONENT_LIB is probably inappropriate for a
testing-specific library. Thus, as a workaround, this patch adds the
Error.cpp source from LLVMTestingSupport directly to the list of source
files of the CoreTests target (as it depends on
`llvm::detail::TakeError(llvm::Error)`) and removes LLVMTestingSupport
from the list of dependencies of ProfileTests.
Code written in assembly can have missing code markers. In BOLT, we can
compensate by recognizing that a function entry point should start a
code sequence.
Such code has been seen in the LuaJIT library.
While registering profiled functions, only handle each address once.
Speeds up `DataAggregator::preprocessProfile`.
Test Plan:
For an intermediate-size pre-aggregated profile (10MB), reduces parsing
time from ~0.41s down to ~0.16s.
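The idea of handling each address only once can be sketched as below. This is a minimal Python illustration, not the code in `DataAggregator`; `register_once` and the `register` callback are hypothetical names.

```python
def register_once(addresses, register):
    """Call `register` for each distinct address, in first-seen order."""
    seen = set()
    for address in addresses:
        if address in seen:
            continue  # already handled, skip the repeated work
        seen.add(address)
        register(address)
    return len(seen)

registered = []
count = register_once([0x1000, 0x2000, 0x1000, 0x1000], registered.append)
print(count, [hex(a) for a in registered])
```

The saving comes from skipping the per-address work for repeated entries, which are common when the same branch source or target appears in many profile records.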
This patch removes all uses of %T from lit tests within bolt/. %T has
been listed as deprecated for ~7 years and should not be used: it is not
unique per test, so tests that use the same filenames can race.
`getFallthroughsInTrace` requires CFG for functions not covered by BAT,
even in BAT/fdata mode. BAT-covered functions go through special
handling in fdata (`BAT->getFallthroughsInTrace`) and YAML
(`DataAggregator::writeBATYAML`) modes.
Since all modes (BAT/no-BAT, YAML/fdata) now need disassembly/CFG
construction:
- drop special BAT/fdata handling that omitted disassembly/CFG in
`RewriteInstance::run`, enabling *CFG for all non-BAT functions*,
- switch `getFallthroughsInTrace` to check if a function has CFG,
- which *allows emitting profile for non-simple functions* in all modes.
Previously, traces in non-simple functions were reported as invalid/
mismatching disassembled function contents. This change reduces the
number of such invalid traces and increases the number of profiled
functions. These functions may participate in function reordering via
call graph profile.
Test Plan: updated unclaimed-jt-entries.s
This patch introduces the following improvements:
- Catch an exception when the CMakeCache.txt is not present
- Bail out gracefully when llvm-bolt did not build successfully for the
current or previous revision.
- Always do a `--switch-back` even if building the old revision failed
Buildbot (`BOLTBuilder`) no longer relies on a wrapper script to run
tests. This patch guards the wrapper logic under a flag that is disabled
by default. This allows us to:
- Eliminate the need for special handling in some tests.
- Fix the issue of a wrapper loop (described below)
- Simplify the NFC-Mode setup.
**Background:**
Previously, tests ran unconditionally, which also compiled any missing
utilities and the unit tests.
The `nfc-check-setup.py` created:
- `llvm-bolt.new`, renamed from the current compilation
- `llvm-bolt.old`, built from the previous SHA
- `llvm-bolt`: a python wrapper pointing to `llvm-bolt.new`
Current behaviour and wrapper issue:
As before, the old/new binaries identify whether a patch affects BOLT.
If so, `ninja check-bolt` builds missing dependencies and runs tests,
overwriting the `llvm-bolt` wrapper with a binary.
However, if Ninja reports:
```
ninja: no work to do.
```
the wrapper remains in place. If the next commit also does no work,
`nfc-check-setup.py` renames the existing wrapper to `llvm-bolt.new`,
causing an infinite loop.
Allowing the wrapper logic to be disabled prevents this scenario and
simplifies the flow.
**Test plan:**
Creates llvm-bolt.new and llvm-bolt.old and stays on the previous revision:
```
./nfc-check-setup.py build
```
Creates llvm-bolt.new and llvm-bolt.old and returns to the current revision:
```
./nfc-check-setup.py build --switch-back
```
Creates llvm-bolt.new and llvm-bolt.old, returns to the current revision,
and creates a wrapper:
```
./nfc-check-setup.py build --switch-back --create-wrapper
```
Creates llvm-bolt.new and llvm-bolt.old, and passes an invalid argument
to the wrapper:
```
./nfc-check-setup.py build --switch-back --create-wrapper --random-arg
```
`__bolt_instr_data_dump()` finds the instrumented binary name by
iterating through the entries under `/proc/self/map_files`, then opens
the binary and memory-maps it onto the heap in order to locate the
`.bolt.instr.tables` section and read the descriptions.
If the binary name is already known and/or the binary is already
memory-mapped, we can pass the binary name and/or memory buffer directly
to `__bolt_instr_data_dump()` to save some work.
Distributions are making the choice to turn frame pointers on by
default. Nixpkgs recently turned them on, and the method they use to do
so implies that everything is built with them on by default.
https://github.com/NixOS/nixpkgs/pull/399014
Assuming that a well behaved distribution doing this puts
`-fno-omit-frame-pointer` at the beginning of the compiler invocation,
we can still re-enable omission by supplying `-fomit-frame-pointer`
during compilation.
This fixes some segfaults from stack corruption in binaries rewritten by
bolt with `llvm-bolt -instrument`.
See also: #147569
Fixes: #148595
In `addCFIInstruction`, we split the CFI information
between `CFIInstrMapType CIEFrameInstructions` and `CFIInstrMapType
FrameInstructions`. In some cases we can end up with the remember CFI in
`CIEFrameInstructions` and the restore CFI in `FrameInstructions`. This
patch adds a check to make sure we do not split remember and restore
states and fixes https://github.com/llvm/llvm-project/issues/133501.
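The invariant behind this check can be sketched in Python as follows. This is an illustration under simplifying assumptions, not the logic in `addCFIInstruction`: CFI instructions are modelled as a plain list of strings, the operation names mirror the DWARF `remember_state`/`restore_state` pair, and `split_keeps_state_pairs` is a hypothetical helper.

```python
def split_keeps_state_pairs(cfi_ops, split_index):
    """Return True if splitting before `split_index` separates no state pair.

    The prefix cfi_ops[:split_index] stands for the CIE instructions and the
    rest for the frame instructions. The split is acceptable only if every
    remember_state in the prefix is also restored within the prefix.
    """
    depth = 0
    for op in cfi_ops[:split_index]:
        if op == "remember_state":
            depth += 1
        elif op == "restore_state" and depth > 0:
            depth -= 1
    return depth == 0

# Hypothetical instruction sequence, for illustration only.
ops = ["def_cfa", "remember_state", "def_cfa_offset", "restore_state"]
print(split_keeps_state_pairs(ops, 1))
print(split_keeps_state_pairs(ops, 2))
```

A split at index 2 would leave the remember in the first group and the restore in the second, which is the situation the patch is meant to rule out.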
Leverage `sys::ProcessStatistics` to report the run time and memory
usage of perf script processes launched when reading perf data.
The reporting is enabled in debug mode with `-debug-only=aggregator`.
Switch buildid-list command to non-waiting `launchPerfProcess` to get
its runtime as well, unifying it with the rest of perf script processes.
Test Plan: NFC
When --hot-functions-at-end is used in combination with --use-old-text,
allocate code at the highest possible addresses within old .text.
This feature is mostly useful for HHVM, where it is beneficial to have
hot static code placed as close as possible to jitted code.
The MCSymbolRefExpr::create overload with the specifier parameter is
discouraged and being phased out. Expressions with relocation specifiers
should use MCSpecifierExpr instead.
zero-density.s causes spurious NFC mismatches, e.g.
https://lab.llvm.org/buildbot/#/builders/92/builds/21380
This is caused by NFC script wrapping llvm-bolt binary only, so that
perf2bolt invocations are replaced by `llvm-bolt --aggregate-only` to
achieve perf2bolt behavior. Add `show-density` to the list of flags
wrapping perf2bolt calls to avoid similar issues in the future.
Test Plan:
```
$ bolt/utils/nfc-check-setup.py --switch-back
$ bin/llvm-lit -a tools/bolt/test/X86/zero-density.s
```
Reapply "[NFC][DebugInfo][DWARF] Create new low-level dwarf library (#…
(#145959)
This reapplies cbf781f0bdf2f680abbe784faedeefd6f84c246e, with fixes for
the shared-library build and the unconventional sanitizer-runtime build.
Original Description:
This is the culmination of a series of changes described in [1].
Although somewhat large by line count, it is almost entirely mechanical,
creating a new library in DebugInfo/DWARF/LowLevel. This new library has
very minimal dependencies, allowing it to be used from more places than
the normal DebugInfo/DWARF library--in particular from MC.
1.
https://discourse.llvm.org/t/rfc-debuginfo-dwarf-refactor-into-to-lower-and-higher-level-libraries/86665/2
When all section contents are updated in-place, we can skip creation of
new segment(s), save disk space, and free up low memory addresses.
Currently, this feature only works with --use-gnu-stack.
Refactor the code for NewTextSegmentAddress to correctly point at the
true start of the segment when PHDR table is placed at the beginning. We
used to offset NewTextSegmentAddress by PHDR table plus cache line
alignment.
NFC for proper binaries. Some YAML binaries from our tests will diverge
due to bad segment address/offset alignment.
This is the culmination of a series of changes described in [1].
Although somewhat large by line count, it is almost entirely mechanical,
creating a new library in DebugInfo/DWARF/LowLevel. This new library has
very minimal dependencies, allowing it to be used from more places than
the normal DebugInfo/DWARF library--in particular from MC.
I am happy to put it in another location, or to structure it differently
if that makes sense. Some have suggested in BinaryFormat, but it is not
a great fit there. But if that makes more sense to the reviewers, I can
do that.
Another possibility would be to use pass-through headers to allow
clients who don't care to depend only on DebugInfo/DWARF. This would be
a much less invasive change, and perhaps easier for clients. But also a
system that hides details.
Either way, I'm open.
1.
https://discourse.llvm.org/t/rfc-debuginfo-dwarf-refactor-into-to-lower-and-higher-level-libraries/86665/2
Implement the detection of tail calls performed with untrusted link
register, which violates the assumption made on entry to every function.
Unlike other pauth gadgets, detection of this one involves some amount
of guessing which branch instructions should be checked as tail calls.
In gs-pacret-autiasp.s, the undefined call `bl g` causes inconsistent
basic block splitting: on some platforms BOLT emits two blocks, on
others one.
Defining a dummy `g` symbol forces a single basic block everywhere.
Currently NFC tests only trigger when the llvm-bolt binary itself
changes.
This patch adds `--check-bolt-sources`, which scans git output for any
modifications under bolt/, excluding:
- bolt/docs
- bolt/utils/docker
- bolt/utils/dot2html
If any matching files change between versions, a `.llvm-bolt.changes`
marker is created. Buildbots can then use this marker to trigger in-tree
tests.
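The path filter described above can be sketched as follows. This is a minimal Python illustration, not the code in the NFC script: `bolt_sources_changed` is a hypothetical helper, and it assumes the changed paths are already available as repository-relative strings (for example, as printed by `git diff --name-only`).

```python
EXCLUDED_PREFIXES = (
    "bolt/docs/",
    "bolt/utils/docker/",
    "bolt/utils/dot2html/",
)

def bolt_sources_changed(changed_paths):
    """Return True if any path under bolt/ outside the excluded directories changed."""
    return any(
        path.startswith("bolt/") and not path.startswith(EXCLUDED_PREFIXES)
        for path in changed_paths
    )

print(bolt_sources_changed(["bolt/lib/Core/BinaryFunction.cpp"]))
print(bolt_sources_changed(["bolt/docs/index.rst", "llvm/lib/MC/MCExpr.cpp"]))
```

When the function returns true, the script would then create the `.llvm-bolt.changes` marker for the buildbot to pick up.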
Address the issue that stems from how the density is computed.
Binary *function* density is the ratio of its total dynamic number of
executed bytes over the static size in bytes. The meaning of it is the
amount of dynamic profile information relative to its static size.
Binary *profile* density is the minimum *function* density among
*well-profiled* functions, taken as functions covering p99 samples, or, in
other words, excluding functions in the tail 1% of samples. p99 is an
arbitrary cutoff. The meaning of profile density is the *minimum amount
of profile information per function* to be able to optimize the program
well. The threshold for profile density is set empirically.
The dynamically executed bytes are taken directly from LBR fall-throughs
and for LBRs recorded in trampoline functions, such as
```
000000001a941ec0 <Sleef_expf8_u10>:
1a941ec0: jmpq *0x37b911fa(%rip) # <pnt_expf8_u10>
1a941ec6: nopw %cs:(%rax,%rax)
```
the fall-through has zero length:
```
# Branch Target NextBranch Count
T 1b171cf6 1a941ec0 1a941ec0 568562
```
But it's not correct to say this function has zero executed bytes; the
size of the next branch is simply not included in the fall-through.
If such functions have non-trivial sample count, they will fall in p99
samples, and cause the profile density to be zero.
To solve this, we can either:
1. Include fall-through end jump size into executed bytes:
is logically sound but technically challenging: the size needs to
come from disassembly (expensive), and the threshold needs to be
reevaluated with the updated definition of binary function density.
2. Exclude pass-through functions from density computation:
follows the intent of profile density which is to set the amount of
profile information needed to optimize the function well. Single
instruction pass-through functions don't need samples many times
the size to be optimized well.
Go with option 2 as a reasonable compromise.
Test Plan: added bolt/test/X86/zero-density.s
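The density definitions and option 2 above can be sketched in Python as follows. This is an illustration under stated assumptions, not BOLT's implementation: functions are ranked by density and accumulated until they cover 99% of samples, pass-through functions are identified by a hypothetical `is_pass_through` flag supplied by the caller, and all names and numbers are made up for the example.

```python
from dataclasses import dataclass

@dataclass
class FunctionProfile:
    name: str
    size: int                      # static size in bytes
    executed_bytes: int            # dynamic bytes taken from fall-throughs
    samples: int
    is_pass_through: bool = False  # hypothetical flag, set by the caller

def function_density(f):
    """Dynamically executed bytes relative to the static size."""
    return f.executed_bytes / f.size

def profile_density(functions, coverage=0.99):
    """Minimum density among the functions covering `coverage` of samples."""
    candidates = [f for f in functions if not f.is_pass_through and f.size > 0]
    total = sum(f.samples for f in candidates)
    if total == 0:
        return 0.0
    covered = 0
    density = 0.0
    # Assumption: walk functions from the densest down until coverage is reached.
    for f in sorted(candidates, key=function_density, reverse=True):
        density = function_density(f)
        covered += f.samples
        if covered >= coverage * total:
            break
    return density

def example(trampoline_excluded):
    return [
        FunctionProfile("hot", 100, 50000, 900),
        FunctionProfile("warm", 200, 20000, 95),
        FunctionProfile("cold", 400, 400, 5),
        FunctionProfile("trampoline", 6, 0, 500, trampoline_excluded),
    ]

print(profile_density(example(trampoline_excluded=False)))
print(profile_density(example(trampoline_excluded=True)))
```

In this made-up data set the trampoline has many samples but zero fall-through bytes, so including it drags the profile density to zero, while excluding it leaves the density determined by the `warm` function.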
After a label in a function without CFG information, use a reasonably
pessimistic estimation of register state (assume that any register that
can be clobbered in this function was actually clobbered) instead of the
most pessimistic "all registers are unsafe". This is the same estimation
as used by the dataflow variant of the analysis when the preceding
instruction is not known for sure.
Without this, leaf functions without CFG information are likely to have
false positive reports about non-protected return instructions, as
1) LR is unlikely to be signed and authenticated in a leaf function,
2) LR is likely to be used by a return instruction near the end of the
function, and
3) the register state is likely to be reset at least once during the
linear scan through the function.
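The difference between the two estimations can be sketched as follows. This is a simplified Python model, not the analysis code: the state is reduced to a set of trusted registers, the register names follow the AArch64 `x0`..`x30` convention (with `x30` as LR), and both helper names are hypothetical.

```python
ALL_REGISTERS = frozenset(f"x{i}" for i in range(31))
LR = "x30"

def most_pessimistic_state():
    """Every register is considered unsafe: nothing is trusted."""
    return frozenset()

def state_after_label(clobbered_in_function):
    """Trust only the registers that are never clobbered anywhere in the function."""
    return ALL_REGISTERS - frozenset(clobbered_in_function)

# A leaf function that only writes x0 and x1 never touches LR.
leaf_clobbers = {"x0", "x1"}
print(LR in most_pessimistic_state())
print(LR in state_after_label(leaf_clobbers))
```

With the "all registers are unsafe" reset, LR is distrusted after any label and the following return is reported. With the clobber-based estimate, LR stays trusted in a leaf function that never writes it.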