Check that the function will be emitted in the final binary. Preserving
old function address is needed in case it is PLT trampiline, that is
currently not moved by the BOLT.
Differential Revision: https://reviews.llvm.org/D122098
BOLT treats aarch64 objects located in text as empty functions with
contant islands. Emit them with at least 8-byte alignment to the new
text section.
Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei
Differential Revision: https://reviews.llvm.org/D122097
AArch64 requires CI to be aligned to 8 bytes due to access instructions
restrictions. E.g. the ldr with imm, where imm must be aligned to 8 bytes.
Differential Revision: https://reviews.llvm.org/D122065
The cold text section alignment is set using the maximum alignment value
passed to the emitCodeAlignment. In order to calculate tentetive layout
right we will set the minimum alignment of such sections to the maximum
possible function alignment explicitly.
Differential Revision: https://reviews.llvm.org/D121392
For AArch64 in some cases/some distributions ld uses 64K alignment of LOAD segments by default.
Reviewed By: yota9, maksfb
Differential Revision: https://reviews.llvm.org/D119267
We were not handling correctly conversion from DW_AT_high_pc into DW_AT_ranges,
when size of DW_AT_high_pc is not 4/8 bytes.
Reviewed By: maksfb
Differential Revision: https://reviews.llvm.org/D120528
PC-relative memory operand could reference a different object from
the one located at the target address, e.g. when a negative offset
is used. Check relocations for the real referenced object.
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D120379
Refactor createBinaryContext and RewriteInstance/MachORewriteInstance
constructors to report an error in a library and fuzzer-friendly way instead of
returning a nullptr or exiting.
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D119658
This patch changes patchELFAllocatableRelaSections from going through
old relocations sections and update the relocation offsets to emitting
the relocations stored in binary sections. This is needed in case we
would like to remove and add dynamic relocations during BOLT work and it
is used by golang support pass. Note: Currently we emit relocations in
the old sections, so the total number of them should be equal or less
of old number.
Testing: No special tests are neeeded, since this patch does not fix
anything or add new functionality (it only prepares to add). Every
PIC-compiled test binary will use this code and thus become a test.
But just in case the aarch64 dynamic relocations tests were added.
Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei
Reviewed By: maksfb
Differential Revision: https://reviews.llvm.org/D117612
Added ability to append new entries to DIE. This is useful to standadize DWARF4
Split Dwarf, and simplify implementation of DWARF5.
Multiple DIEs can share an abbrev. So currently limitation is that only unique
Attributes can be added.
Reviewed By: maksfb
Differential Revision: https://reviews.llvm.org/D119577
As usual with that header cleanup series, some implicit dependencies now need to
be explicit:
llvm/DebugInfo/DWARF/DWARFContext.h no longer includes:
- "llvm/DebugInfo/DWARF/DWARFAcceleratorTable.h"
- "llvm/DebugInfo/DWARF/DWARFCompileUnit.h"
- "llvm/DebugInfo/DWARF/DWARFDebugAbbrev.h"
- "llvm/DebugInfo/DWARF/DWARFDebugAranges.h"
- "llvm/DebugInfo/DWARF/DWARFDebugFrame.h"
- "llvm/DebugInfo/DWARF/DWARFDebugLoc.h"
- "llvm/DebugInfo/DWARF/DWARFDebugMacro.h"
- "llvm/DebugInfo/DWARF/DWARFGdbIndex.h"
- "llvm/DebugInfo/DWARF/DWARFSection.h"
- "llvm/DebugInfo/DWARF/DWARFTypeUnit.h"
- "llvm/DebugInfo/DWARF/DWARFUnitIndex.h"
Plus llvm/Support/Errc.h not included by a bunch of llvm/DebugInfo/DWARF/DWARF*.h files
Preprocessed lines to build llvm on my setup:
after: 1065629059
before: 1066621848
Which is a great diff!
Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D119723
When a jump table is recovered in postProcessIndirectBranches(),
successors for the containing basic block are added in random order.
Make the order deterministic.
Reviewed By: yota9
Differential Revision: https://reviews.llvm.org/D119672
We can have a scenario where multiple CUs share an abbrev table.
We modify or don't modify one CU, which leads to other CUs having invalid abbrev section.
Example that caused it.
All of CUs shared the same abbrev table. First CU just had compile_unit and sub_program.
It was not modified. Next CU had DW_TAG_lexical_block with
DW_AT_low_pc/DW_AT_high_pc converted to DW_AT_low_pc/DW_AT_ranges.
We used unmodified abbrev section for first and subsequent CUs.
So when parsing subsequent CUs debug info was corrupted.
In this patch we will now duplicate all sections that are modified and are different.
This also means that if .debug_types is present and it shares Abbrev table, and
they usually are, we now can have two Abbrev tables. One for CU that was modified,
and unmodified one for TU.
Reviewed By: maksfb
Differential Revision: https://reviews.llvm.org/D118517
The aarch64 platform has special registers like X0_X1_X2_X3_X4_X5_X6_X7.
Using the downwards propagation this register will become a super
register for all X0..X7 and its super registers which is not right. This
patch replaces the downwards propagation with caching all the aliases using MCRegAliasIterator.
Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei
Reviewed By: maksfb
Differential Revision: https://reviews.llvm.org/D117394
Since we now re-write .debug_info the DWARF CU Offsets can change.
Just like for .debug_aranges the GDB Index will need to be updated.
Reviewed By: Amir, maksfb
Differential Revision: https://reviews.llvm.org/D118273
This patch fixes the removal of unreachable uncondtional branch located
after return instruction.
Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei
Reviewed By: Amir
Differential Revision: https://reviews.llvm.org/D117677
Summary:
Move the annotation to avoid dynamic memory allocations.
Improves the CPU time of instrumenting a large binary by 1% (+-0.8%, p-value 0.01)
Test Plan: NFC
Reviewers: maksfb
FBD30091656
This is a follow up to Fix size mismatch error with jemalloc.
4243b6582cf3bb5fbcde908913d4779ded731321
Although that fix works it increased memory footprint.
With this patch we go back to original memory footprint.
Reviewed By: maksfb
Differential Revision: https://reviews.llvm.org/D117341
Summary:
If `addUnknownControlFlow` in `BinaryFunction::postProcessIndirectBranches`
is invoked with a basic block that has multiple edges to the same successor,
it leads to an assertion in `BinaryBasicBlock::removePredecessor`.
For basic blocks with multiple edges to the same successor, the default
behavior of removePredecessor is to remove all occurrences of the
predecessor block in its predecessor list (Multiple=true).
Example:
```A -> B (two edges)
A->removeAllSuccessors()
for each successor of block A: // B twice
// this removes both occurrences of A in B's predecessors list
B->removePredecessor(A);
// this invocation triggers an assert as A is no longer in B's
// predecessor list
B->removePredecessor(A);
```
This issue is not fixed by NormalizeCFG as `removeAllSuccessor` is called
earlier (from `buildCFG` -> `postProcessIndirectBranches`).
Solve this issue by collecting the successors into a set (`SmallPtrSet`) first,
before invoking `SuccessorBB->removePredecessor(this)`.
GitHub issue: https://github.com/facebookincubator/BOLT/issues/187
(cherry picked from FBD30796979)
Summary:
Changed the behavior of how we handle .debug_info section.
Instead of patching it will now rewrite it.
With this approach we are no longer constrained to having new values
of the same size.
It handles re-writing by treating .debug_info as raw data.
It copies chunks of data between patches, with new data written in
between.
(cherry picked from FBD32519952)
Summary:
Fix according to Coding Standards doc, section Don't Use
Braces on Simple Single-Statement Bodies of if/else/loop Statements.
This set of changes applies to lib Core only.
(cherry picked from FBD33240028)
Summary:
Since nops are now removed in a separate pass, the profile is consumed
on a CFG with nops. If previously a profile was generated without nops,
the offsets in the profile could be different if branches included nops
either as a source or a destination.
This diff adjust offsets to make the profile reading backwards
compatible.
(cherry picked from FBD33231254)
Summary:
The patch moves the shortenInstructions and nop remove to separate binary
passes. As a result when llvm-bolt optimizations stage will begin the
instructions of the binary functions will be absolutely the same as it
was in the binary. This is needed for the golang support by llvm-bolt.
Some of the tests must be changed, since bb alignment nops might create
unreachable BBs in original functions.
Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei
(cherry picked from FBD32896517)
Summary:
This patch adds AArch64 relocations handling in case updating of
debug sections is enabled
Elvina Yakubova,
Advanced Software Technology Lab, Huawei
(cherry picked from FBD33077609)
Summary:
Gracefully handle binaries with split functions where two fragments are folded
into one, resulting in a fragment with two parent functions.
This behavior is expected in GCC8+ with -O2 optimization level, where both
function splitting and ICF are enabled by default.
On the BOLT side, the changes are:
- BinaryFunction: allow multiple parent fragments:
- `ParentFragment` --> `ParentFragments`,
- `setParentFragment` --> `addParentFragment`.
- BinaryContext:
- `populateJumpTables`: mark fragments to be skipped later,
- `registerFragment`: add a name heuristic check, return false if it failed,
- `processInterproceduralReferences`: check if `registerFragment`
succeeded, otherwise issue a warning,
- `skipMarkedFragments`: move out fragment traversal and skipping from
`populateJumpTables` into a separate function.
This change fixes an issue where unrelated functions might be registered
as fragments:
```
BOLT-WARNING: interprocedural reference between unrelated fragments:
bad_gs/1(*2) and amd_decode_mce.cold.27/1(*2)
```
(Linux kernel binary)
(cherry picked from FBD32786688)
Summary:
Refactor members of BinaryBasicBlock. Replace some std containers with
ADT equivalents. The size of BinaryBasicBlock on x86-64 Linux is reduced
from 232 bytes to 192 bytes.
(cherry picked from FBD33081850)
Summary:
Switched members of BinaryFunction to ADT where it was possible and
made sense. As a result, the size of BinaryFunction on x86-64 Linux
reduced from 1624 bytes to 1448.
(cherry picked from FBD32981555)
Summary:
For DWP case the AbbreviationsOffset is the offset of the abbrev
contribution in the DWP file, so can be none zero.
(cherry picked from FBD32961240)
Summary:
Currently, RuntimeDyld will not allocate a section without relocations
even if such a section is marked allocatable and defines symbols.
When we emit .debug_line for compile units with unchanged code, we
output original (input) data, without relocations. If all units are
emitted in this way, we will have no relocations in the emitted
.debug_line. RuntimeDyld will not allocate the section and as a result
we will write an empty .debug_line section.
To workaround the issue, always emit a relocation of RELOC_NONE type
when emitting raw contents to debug_line.
(cherry picked from FBD32909869)
Summary:
The debug message for the last fall-through block was printed under the
reverse condition, i.e. when the block was not a fall-through. Remove
the debug message. If we'll need such information, we can add a pass
with more analysis, i.e. checking the last instruction, if the block is
reachable, etc.
(cherry picked from FBD32670816)
Summary:
We were not tracking -DBUILD_SHARED_LIBS=ON and we introduced
some commits that break that by not specifying the correct dependencies.
Fix that.
(cherry picked from FBD32453377)
Summary:
Make BOLT build in VisualStudio compiler and run without
crashing on a simple test. Other tests are not running.
(cherry picked from FBD32378736)
Summary:
With LTO, it's possible for multiple DWARF compile units to share the
same abbreviation section set, i.e. to have the same abbrev_offset.
When units sharing the same abbrev set are located next to each other
and neither of them is being processed (i.e. contain processed
functions), it can trigger a bug in BOLT. When this happened,
the abbrev set is considered empty. Additionally, different units
may patch abbrev section differently.
The fix is to not rely on the next unit offset when detecting
abbreviation set boundaries and to delay writing abbrev section
until all units are processed.
(cherry picked from FBD31985046)
Summary:
Moves source files into separate components, and make explicit
component dependency on each other, so LLVM build system knows how to
build BOLT in BUILD_SHARED_LIBS=ON.
Please use the -c merge.renamelimit=230 git option when rebasing your
work on top of this change.
To achieve this, we create a new library to hold core IR files (most
classes beginning with Binary in their names), a new library to hold
Utils, some command line options shared across both RewriteInstance
and core IR files, a new library called Rewrite to hold most classes
concerned with running top-level functions coordinating the binary
rewriting process, and a new library called Profile to hold classes
dealing with profile reading and writing.
To remove the dependency from BinaryContext into X86-specific classes,
we do some refactoring on the BinaryContext constructor to receive a
reference to the specific backend directly from RewriteInstance. Then,
the dependency on X86 or AArch64-specific classes is transfered to the
Rewrite library. We can't have the Core library depend on targets
because targets depend on Core (which would create a cycle).
Files implementing the entry point of a tool are transferred to the
tools/ folder. All header files are transferred to the include/
folder. The src/ folder was renamed to lib/.
(cherry picked from FBD32746834)