The name is misleading, as setting Fragment to nullptr does not
necessarily make it undefined - common and equated symbols have
a nullptr fragment as well.
For x86, the halt instruction is defined as a terminator instruction.
When building the CFG, the instruction sequence following the hlt
instruction is treated as an independent MBB. Since there is no jump
information, the predecessor of this MBB cannot be identified, and it is
considered an unreachable MBB that will be removed.
With this fix, the instruction sequences before and after hlt are no
longer placed in different blocks.
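
The effect of treating hlt as a block-ending terminator can be sketched
with a toy model. This is not the BOLT or LLVM implementation: the
mnemonic strings, `endsBlock`, and `splitIntoBlocks` are hypothetical
names used only for illustration.

```cpp
#include <string>
#include <vector>

// Hypothetical toy model: an instruction is just its mnemonic.
using Block = std::vector<std::string>;

// Returns true if the mnemonic ends a block in this toy model.
// Assumption: "jmp" and "ret" always end a block; "hlt" does so only
// when HltIsTerminator is set.
bool endsBlock(const std::string &Mnemonic, bool HltIsTerminator) {
  if (Mnemonic == "jmp" || Mnemonic == "ret")
    return true;
  return HltIsTerminator && Mnemonic == "hlt";
}

// Splits a linear instruction sequence into blocks, closing a block
// after every instruction for which endsBlock() returns true.
std::vector<Block> splitIntoBlocks(const std::vector<std::string> &Insts,
                                   bool HltIsTerminator) {
  std::vector<Block> Blocks;
  Block Current;
  for (const std::string &I : Insts) {
    Current.push_back(I);
    if (endsBlock(I, HltIsTerminator)) {
      Blocks.push_back(Current);
      Current.clear();
    }
  }
  if (!Current.empty())
    Blocks.push_back(Current);
  return Blocks;
}
```

With `HltIsTerminator` set, the instructions after hlt form a separate
block that has no recorded predecessor; with it cleared, they stay in
the same block as the code before hlt.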
The MCSymbolRefExpr::create overload with the specifier parameter is
discouraged and being phased out. Expressions with relocation specifiers
should use MCSpecifierExpr instead.
Reapply "[NFC][DebugInfo][DWARF] Create new low-level dwarf library (#…
(#145959)
This reapplies cbf781f0bdf2f680abbe784faedeefd6f84c246e, with fixes for
the shared-library build and the unconventional sanitizer-runtime build.
Original Description:
This is the culmination of a series of changes described in [1].
Although somewhat large by line count, it is almost entirely mechanical,
creating a new library in DebugInfo/DWARF/LowLevel. This new library has
very minimal dependencies, allowing it to be used from more places than
the normal DebugInfo/DWARF library--in particular from MC.
1.
https://discourse.llvm.org/t/rfc-debuginfo-dwarf-refactor-into-to-lower-and-higher-level-libraries/86665/2
I am happy to put it in another location, or to structure it differently
if that makes sense. Some have suggested in BinaryFormat, but it is not
a great fit there. But if that makes more sense to the reviewers, I can
do that.
Another possibility would be to use pass-through headers to allow
clients who don't care to depend only on DebugInfo/DWARF. This would be
a much less invasive change, and perhaps easier for clients. But also a
system that hides details.
Either way, I'm open.
Record the number of function invocations from external code - code
outside the binary, which may include JIT code and DSOs. Accounting for
external entry counts improves the fidelity of call graph flow
conservation analysis.
Test Plan: updated shrinkwrapping.test
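
A minimal sketch of why an external entry count helps a flow
conservation check. It assumes that conservation means "entries into a
function equal the sum of recorded incoming call counts". The struct and
function names are hypothetical and do not come from BOLT.

```cpp
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical per-function profile record.
struct FunctionProfile {
  uint64_t ExecCount;                       // times the function was entered
  std::vector<uint64_t> InternalCallCounts; // call edges from inside the binary
  uint64_t ExternalEntryCount;              // entries from JIT code, DSOs, etc.
};

// Returns the number of entries that no recorded caller accounts for.
// When UseExternal is false, external entries are ignored, as if they
// had never been recorded.
uint64_t unexplainedEntries(const FunctionProfile &F, bool UseExternal) {
  uint64_t Inflow = std::accumulate(F.InternalCallCounts.begin(),
                                    F.InternalCallCounts.end(), uint64_t{0});
  if (UseExternal)
    Inflow += F.ExternalEntryCount;
  return F.ExecCount > Inflow ? F.ExecCount - Inflow : 0;
}
```

In this model, a function entered 100 times with 50 internal calls and
50 external entries shows a gap of 50 without the external count and no
gap with it.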
When we call setIgnored() on functions that already have their CFG
built, these functions are not going to get emitted, and we risk missing
updates to external function references.
To mitigate the potential issues, run scanExternalRefs() on such
functions to create patches/relocations.
Since scanExternalRefs() relies on function relocations, we have to
preserve relocations until the function is emitted. As a result, the
memory overhead without debug info update could reach up to 2%.
We should never call fixBranches() on a function with invalid CFG. E.g.,
ValidateInternalCalls modifies CFG for its internal analysis purposes.
At the same time, it marks the function as non-simple with an assumption
that fixBranches() will never run on that function.
However, calculateEmittedSize() by default calls fixBranches() which can
lead to all sorts of issues, including assertions firing in
fixBranches().
The fix is to use the original size for non-simple functions in
calculateEmittedSize() since we are supposed to emit the function
unmodified. Additionally, add an assertion at the start of
fixBranches().
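
The shape of the fix can be sketched as follows. This is a simplified
model, not BOLT's code: `ToyFunction` and its fields are hypothetical,
and `RelaidSize` stands in for whatever size real emission would
produce.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a function in the binary.
struct ToyFunction {
  bool IsSimple;         // false once the CFG can no longer be trusted
  uint64_t OriginalSize; // size in the input binary
  uint64_t RelaidSize;   // size the toy "emitter" yields after fixing branches
  bool BranchesFixed;    // set when fixBranches() has run
};

// Must only run on functions with a valid CFG.
void fixBranches(ToyFunction &F) {
  assert(F.IsSimple && "fixBranches() requires a valid CFG");
  F.BranchesFixed = true;
}

// Non-simple functions are emitted unmodified, so their original size
// is used and fixBranches() is never reached for them.
uint64_t calculateEmittedSize(ToyFunction &F) {
  if (!F.IsSimple)
    return F.OriginalSize;
  fixBranches(F);
  return F.RelaidSize;
}
```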
lookupTarget takes StringRef and internally creates an instance of
std::string with the StringRef as part of constructing Triple, so we
don't need to create a temporary instance of std::string on our own.
When a conditional tail call is located in old code while BOLT is
operating in lite mode, the call requires an optional pending
relocation with a type that is currently not supported, resulting in a
build-time crash.
Until a proper fix is implemented, ignore conditional tail calls for
relocation purposes and mark their target functions to be patched, i.e.
to serve as veneers/thunks.
Sample is a general term covering both basic (IP) and branch (LBR)
profiles. Find and replace ambiguous uses of sample in a basic sample
sense.
Rename `RawBranchCount` to `RawSampleCount`, reflecting its use for
both kinds of profile.
Rename the `PF_LBR` profile type to `PF_BRANCH`, reflecting non-LBR based
branch profiles (non-brstack SPE, synthesized brstack ETM/PT).
Follow-up to #137644.
Test Plan: NFC
On AArch64, we create optional/weak relocations that may not be
processed because the relocated value overflows. When the overflow
happens, we used to enforce patching for all functions in the binary via
the --force-patch option. This PR relaxes the requirement and enforces
patching only for functions that are targets of optional relocations.
Moreover, if the compact code model is used, the relocation overflow is
guaranteed not to happen and the patching will be skipped.
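
The relaxed selection can be sketched like this. The types and the
`functionsToPatch` helper are hypothetical, and the model assumes that
"compact code model" is a single boolean known up front.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical record of a relocation and the function it targets.
struct RelocRef {
  std::string Target; // name of the function the relocation points to
  bool IsOptional;    // true for optional/weak relocations
};

// Returns the functions that need patching: only targets of optional
// relocations, and none at all under the compact code model, where the
// overflow cannot happen.
std::set<std::string> functionsToPatch(const std::vector<RelocRef> &Relocs,
                                       bool CompactCodeModel) {
  std::set<std::string> Result;
  if (CompactCodeModel)
    return Result;
  for (const RelocRef &R : Relocs)
    if (R.IsOptional)
      Result.insert(R.Target);
  return Result;
}
```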
Patch functions are used to fix instructions in the original code, i.e.,
they are not functions in a traditional sense, but rather pieces of
emitted code that are embedded into real functions.
We used to emit FDEs for all functions, including patch functions.
However, FDEs for patches are not only unnecessary, but they can lead to
problems with libraries and runtimes that consume FDEs, e.g. C++
exception handling runtime.
Note that we use named patches to fix function entry points and in that
case they behave more like regular functions. Thus we issue FDEs for
those.
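
As a sketch, the resulting rule is a small predicate. `PatchCandidate`
and `shouldEmitFDE` are hypothetical names, and an empty name is assumed
to mean "unnamed patch".

```cpp
#include <string>

// Hypothetical description of something the emitter is about to output.
struct PatchCandidate {
  bool IsPatch;     // true for patch "functions" embedded in real code
  std::string Name; // empty for unnamed patches (assumption of this sketch)
};

// Regular functions and named patches (used for function entry points)
// get an FDE; unnamed patches do not.
bool shouldEmitFDE(const PatchCandidate &F) {
  if (!F.IsPatch)
    return true;
  return !F.Name.empty();
}
```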
This patch adds code generation for RISCV64 instrumentation. The work
involves the following three points:
a) Implement support for instrumenting direct function calls and jumps
on RISC-V. This relies on atomic instructions (used to increment
counters), which are only available on RISC-V when the A extension is
used.
b) Implement support for instrumenting indirect function calls by
implementing the createInstrumentedIndCallHandlerEntryBB and
createInstrumentedIndCallHandlerExitBB interfaces. In this process, we
need to accurately record the target address and IndCallID to ensure
that the indirect call counters are recorded correctly.
c) Implement the RISCV64 BOLT runtime library, with some system call
interfaces implemented through inline assembly. Compute the difference
between the runtime address of the .text section and its static address
in the section header table, which in turn can be used to search for
indirect call descriptions.
The community code currently has problems with relocation in some
scenarios, but this has nothing to do with instrumentation. We may
continue to submit patches to fix the related bugs.
Some functions have a size of zero in the input binary's symbol
table, such as those compiled by an assembler. When figuring out function
sizes, we may create a label symbol if it doesn't point to any constant
island. However, before the function size is known, a marker symbol
cannot be correctly associated with a function, so all such
checks fail and we could end up adding a code label pointing
to a constant island as a secondary entry point and later mistakenly
marking the function as not simple.
Querying the global marker symbol array has a big throughput overhead.
Instead, we can run an extra check when post-processing entry points
to identify label symbols that actually point to constant islands.
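
The post-processing check can be sketched as a filter over candidate
entry points. This is a toy model: `FunctionEntries` and
`pruneIslandEntries` are hypothetical, and islands are represented as a
plain set of offsets.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical per-function state once the function size is known.
struct FunctionEntries {
  std::set<uint64_t> IslandOffsets;   // offsets that hold constant-island data
  std::vector<uint64_t> EntryOffsets; // candidate secondary entry points
};

// Drops candidate entry points that land on constant-island data and
// returns how many were removed.
std::size_t pruneIslandEntries(FunctionEntries &F) {
  const std::size_t OldSize = F.EntryOffsets.size();
  F.EntryOffsets.erase(
      std::remove_if(F.EntryOffsets.begin(), F.EntryOffsets.end(),
                     [&](uint64_t Off) {
                       return F.IslandOffsets.count(Off) != 0;
                     }),
      F.EntryOffsets.end());
  return OldSize - F.EntryOffsets.size();
}
```

Running the filter once per function avoids a lookup in a global marker
array for every symbol.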
To handle relative vftables, which are enabled with the clang option
`-fexperimental-relative-c++-abi-vtables`, we look for PC-relative
relocations whose fixup locations fall in vtable address ranges.
For such relocations, the actual target is the virtual function itself,
and the addend records the distance between the vtable slot for the
target virtual function and the first virtual function slot in the vtable,
which matches the generated code that calls the virtual function. So
we can skip the logic of handling "function + offset" and directly
save such relocations for future fixup after the new layout is known.
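
A worked example of the arithmetic, assuming the usual PC-relative
relocation formula S + A - P (symbol, addend, fixup location). The
helper names and the addresses in the usage note are made up for
illustration.

```cpp
#include <cstdint>

// Value stored by a PC-relative relocation: S + A - P.
int64_t pcRelValue(uint64_t S, int64_t A, uint64_t P) {
  return static_cast<int64_t>(S) + A - static_cast<int64_t>(P);
}

// Addend as described in the text: the distance between the slot being
// fixed up and the first virtual function slot of the vtable.
int64_t slotAddend(uint64_t SlotAddr, uint64_t FirstSlotAddr) {
  return static_cast<int64_t>(SlotAddr) - static_cast<int64_t>(FirstSlotAddr);
}
```

With the first slot at 0x5000, the fixed-up slot at 0x5008, and the
virtual function at 0x1200, the addend is 8 and the stored value is
0x1200 + 8 - 0x5008, which equals 0x1200 - 0x5000. The addend cancels
the slot offset, so the stored value depends only on the function
address and the first slot. Once the function moves, recomputing the
same formula with the new S is enough; no "function + offset" handling
is needed.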
When a pending relocation is created, it is also marked as optional
or not. It can be optional when the relocation is added as
part of an optimization (i.e., `scanExternalRefs`).
When BOLT tries to `flushPendingRelocations`, it safely skips any
optional relocations that cannot be encoded due to being out of
range. A prerequisite for that is the use of the `-force-patch`
flag. Otherwise, BOLT will bail out with a relevant message.
Background:
BOLT, as part of scanExternalRefs, identifies external references from
calls and creates pending relocations for them. When flushed, these
relocations update the references to point to the optimized functions.
This optimization can be disabled using `--no-scan`.
BOLT can assert if any of these pending relocations cannot be encoded.
This patch does not disable this optimization but instead selectively
applies it given that a pending relocation is optional and `-force-patch`
was enabled.
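
The skip-or-bail behavior can be sketched as follows. This is a toy
model rather than BOLT's code: `QueuedReloc`, `FlushStats`, and the
signed 28-bit range in `canEncode` are assumptions made for the example.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical pending relocation.
struct QueuedReloc {
  int64_t Value;   // value that has to be encoded
  bool IsOptional; // true when added as part of an optimization
};

struct FlushStats {
  unsigned Applied;
  unsigned Skipped;
  bool Failed; // true when flushing had to bail out
};

// Assumption: the encoding holds a signed 28-bit value.
bool canEncode(int64_t Value) {
  return Value >= -(int64_t{1} << 27) && Value < (int64_t{1} << 27);
}

// Applies encodable relocations. An out-of-range relocation is skipped
// only if it is optional and ForcePatch is set; otherwise flushing stops.
FlushStats flushPending(const std::vector<QueuedReloc> &Relocs,
                        bool ForcePatch) {
  FlushStats S{0, 0, false};
  for (const QueuedReloc &R : Relocs) {
    if (canEncode(R.Value)) {
      ++S.Applied;
    } else if (R.IsOptional && ForcePatch) {
      ++S.Skipped;
    } else {
      S.Failed = true;
      break;
    }
  }
  return S;
}
```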
In lite mode, we only emit code for a subset of functions while
preserving the original code in .bolt.org.text. This requires updating
code references in non-emitted functions to ensure that:
* Non-optimized versions of the optimized code never execute.
* Function pointer comparison semantics is preserved.
On x86-64, we can update code references in-place using "pending
relocations" added in scanExternalRefs(). However, on AArch64, this is
not always possible due to address range limitations and linker address
"relaxation".
There are two types of code-to-code references: control transfer (e.g.,
calls and branches) and function pointer materialization.
AArch64-specific control transfer instructions are covered by #116964.
For function pointer materialization, simply changing the immediate
field of an instruction is not always sufficient. In some cases, we need
to modify a pair of instructions, such as undoing linker relaxation and
converting NOP+ADR into ADRP+ADD sequence.
To achieve this, we use the instruction patch mechanism instead of
pending relocations. Instruction patches are emitted via the regular MC
layer, just like regular functions. However, they have a fixed address
and do not have an associated symbol table entry. This allows us to make
more complex changes to the code, ensuring that function pointers are
correctly updated. Such a mechanism should also be portable to RISC-V and
other architectures.
To summarize, for AArch64, we extend the scanExternalRefs() process to
undo linker relaxation and use instruction patches to partially
overwrite unoptimized code.
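
To illustrate why a NOP+ADR pair may need to become ADRP+ADD, here is a
sketch of the address arithmetic only. It does not encode instructions.
The helper names are hypothetical; the ranges follow the AArch64 ADR
(+/-1 MiB) and ADRP (4 KiB page granularity) semantics.

```cpp
#include <cstdint>

// ADR reaches targets within +/-1 MiB of the instruction address.
bool adrInRange(uint64_t PC, uint64_t Target) {
  const int64_t Delta =
      static_cast<int64_t>(Target) - static_cast<int64_t>(PC);
  return Delta >= -(int64_t{1} << 20) && Delta < (int64_t{1} << 20);
}

// Operands for an ADRP+ADD pair that materializes Target.
struct AdrpAdd {
  int64_t PageDelta; // ADRP: signed distance in 4 KiB pages
  uint32_t Lo12;     // ADD: low 12 bits of the target address
};

// PC is the address of the ADRP, i.e. the first instruction of the pair.
AdrpAdd computeAdrpAdd(uint64_t PC, uint64_t Target) {
  const int64_t PageDelta =
      static_cast<int64_t>(Target >> 12) - static_cast<int64_t>(PC >> 12);
  return {PageDelta, static_cast<uint32_t>(Target & 0xFFF)};
}
```

For a pair at 0x400120 and a function moved to 0x912344, ADR can no
longer reach the target, while ADRP with a page delta of 0x512 followed
by ADD with 0x344 can.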
Instead of filtering and modifying relocations in readRelocations(),
preserve the relocation info and use it in the symbolizing disassembler.
This change mostly affects AArch64, where we need to look at original
linker relocations in order to properly symbolize instruction operands.
We used to filter out relocations corresponding to NOP+ADR instruction
pairs that were a result of linker "relaxation" optimization. However,
these relocations will be useful for reversing the linker optimization.
Keep the relocations and ignore them while symbolizing ADR instruction
operands.
Add AArch64MCSymbolizer that symbolizes `MCInst` operands during
disassembly. The symbolization was previously done in
`BinaryFunction::disassemble()`, but it is also required by
`scanExternalRefs()` for "lite" mode functionality. Hence, similar to
x86, I've implemented the symbolizer interface that uses
`BinaryFunction` relocations to properly create instruction operands. I
expect the result of the disassembly to be identical after the change.
AArch64 disassembler was not calling `tryAddingSymbolicOperand()` for
`MOV` instructions. Fix that. Additionally, the disassembler marks `ldr`
instructions as branches by setting `IsBranch` parameter to true. Ignore
the parameter and rely on `MCPlusBuilder` interface instead.
I've modified the `--check-encoding` flag to check symbolization of
operands of instructions that have relocations against them.
Add BinaryContext::createInstructionPatch() interface for patching parts
of the original binary with new instruction sequences. Refactor
PatchEntries pass to use the new interface.
In analyzeInstructionForFuncReference(), use MCPlusBuilder interface
while scanning symbolic operands of MCInst. Should be NFC on x86, but
will make the function work on other architectures. Note that it's
currently unused on non-x86 as its functionality is exclusive to safe
ICF that runs on x86 only.
BOLT used to mark multi-entry functions non-simple in non-relocation
mode with the reasoning that we can't move them due to potentially
undetected references. However, in aggregation mode it doesn't apply as
BOLT doesn't perform optimizations.
Relax this constraint in case of an aggregation job.
Test Plan: added entry-point-fallthru.s
Sometimes we need to know the size of a symbol besides its address, so
maybe we can start using the existing `BOLTLinker::lookupSymbolInfo()`
(that returns symbol address and size) and remove
`BOLTLinker::lookupSymbol()` (that only returns symbol address). And for
both we need to check return value as it is wrapped in `std::optional<>`,
which makes the difference even smaller.
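
A sketch of the proposed single-lookup shape. `ToyLinker` is a
hypothetical stand-in, not `BOLTLinker`; only the idea of returning
address and size together in a `std::optional` is taken from the text.

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <string>

struct SymbolInfo {
  uint64_t Address;
  uint64_t Size;
};

// Hypothetical linker-like symbol table.
class ToyLinker {
  std::map<std::string, SymbolInfo> Symbols;

public:
  void define(const std::string &Name, uint64_t Address, uint64_t Size) {
    Symbols[Name] = SymbolInfo{Address, Size};
  }

  // Single lookup that yields both address and size, or nothing.
  std::optional<SymbolInfo> lookupSymbolInfo(const std::string &Name) const {
    auto It = Symbols.find(Name);
    if (It == Symbols.end())
      return std::nullopt;
    return It->second;
  }
};

// A caller that only needs the address pays the same optional check it
// would pay with an address-only lookup.
uint64_t addressOrZero(const ToyLinker &L, const std::string &Name) {
  if (auto Info = L.lookupSymbolInfo(Name))
    return Info->Address;
  return 0;
}
```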
When printing disassembly of a function with constant islands, include
the island info in the dump.
At the moment, only print islands in pre-CFG state. Include islands that
are interleaved with instructions.