Alternative instructions in the Linux kernel may modify control flow in
a function. As such, it is unsafe to optimize functions with alternative
instructions until we properly support CFG alternatives.
Previously, we marked functions with alt instructions before the
emission, but that could be too late if we remove or replace
instructions with alternatives. We could have marked functions as
non-simple immediately after reading .altinstructions, but it's nice to
be able to view functions after CFG is built. Thus assign the non-simple
status after building CFG.
Summary: Constructing an artificial sink block for the
flow CFG in stale profile inference to allow profile
inference to be run on CFGs with blocks that terminate
and have successors.
Testing Plan: Added infer_no_exits.test to verify that
functions with exit blocks with a landing pad are
covered by stale profile inference.
---------
Co-authored-by: Amir Ayupov <fads93@gmail.com>
Summary: Functions with high discrepancy
(measured by matched function blocks)
can be ignored with an added command line
argument for better performance.
Test Plan: Added
stale-matching-min-matched-block.test
---------
Co-authored-by: Amir Ayupov <aaupov@fb.com>
`convertCallToIndirectCall` applies the PLTCall optimization and returns
an (updated if needed) iterator to the converted call instruction. Since
AArch64 requires to inject additional instructions to implement this
pass, the relevant BasicBlock and an iterator was passed to the
`convertCallToIndirectCall`.
`NumCallsOptimized` is updated only on successful application of the
pass.
Tests:
- Inputs/plt-tailcall.c: an example of a tail call optimized PLT call.
- AArch64/plt-call.test: it is the actual A64 test, that runs the
PLTCall optimization on the above input file and verifies the
application of the pass to the calls: 'printf' and 'puts'.
.altinstructions section contains a list of structures where fields can
have different sizes while other fields could be present or not
depending on the kernel version. Add automatic detection of such
variations and use it by default. The user can still overwrite the
automatic detection with `--alt-inst-has-padlen` and
`--alt-inst-feature-size` options.
Previously when an entry was skipped in parent chain a child will point
to the next valid entry in the chain. After discussion in
https://github.com/llvm/llvm-project/pull/91808 this is not very useful.
Changed implemenation so that all the children of the entry that is
skipped won't have DW_IDX_parent.
In ValidateMemRefs pass, when we validate references in the form of
`Symbol + Addend`, we should check `Symbol` not `Symbol + Addend`
against aliasing a jump table.
Recommitting with a modified test case:
https://github.com/llvm/llvm-project/pull/88838
Co-authored-by: sinan <sinan.lin@linux.alibaba.com>
The intention is to check a section name different from
.gcc_except_table . Rather than using a linker script, use llvm-objcopy
--rename-section instead.
bolt/test/lit.local.cfg wants to use the system GCC installation but it
specifies a wrong triple ("linux" instead of "linux-gnu") and relies on
clangDriver's loose GCC installation detection to pick up "*-linux-gnu".
This loose behavior may not work. Use "linux-gnu" instead.
Note: neither "linux" nor "linux-gnu" detects "linux-musl" triples, so
these tests currently fail on musl based systems.
Other files changes are cosmetic.
CDSplit splits functions up to three ways: main fragment with no suffix,
and fragments with .cold and .warm suffixes.
Add .warm suffix to the regex used to recognize split fragments.
Test Plan: updated register-fragments-bolt-symbols.s
Then two tests rely on .interp being the first section.
llvm-bolt would crash if lld places .interp after .got
(f639b57f7993cadb82ee9c36f04703ae4430ed85).
For best portability, when a linker scripts specifies a SECTIONS
command, the first section for each PT_LOAD segment should be specified
with a MAXPAGESIZE alignment. Otherwise, linkers have freedom to decide
how to place orphan sections, which might break intention.
Update the folder titles for targets in the monorepository that have not
seen taken care of for some time. These are the folders that targets are
organized in Visual Studio and XCode (`set_property(TARGET <target>
PROPERTY FOLDER "<title>")`) when using the respective CMake's IDE
generator.
* Ensure that every target is in a folder
* Use a folder hierarchy with each LLVM subproject as a top-level folder
* Use consistent folder names between subprojects
* When using target-creating functions from AddLLVM.cmake, automatically
deduce the folder. This reduces the number of
`set_property`/`set_target_property`, but are still necessary when
`add_custom_target`, `add_executable`, `add_library`, etc. are used. A
LLVM_SUBPROJECT_TITLE definition is used for that in each subproject's
root CMakeLists.txt.
Disambiguate local functions using the containing file symbol in BAT
mode. Make local function naming consistent across BAT fdata and YAML
profiles.
Test Plan: updated register-fragments-bolt-symbols.s
To align YAML and fdata profiles produced in BAT mode, lift two
restrictions applied in non-relocation mode when BAT is present:
1) register secondary entry points from ignored functions,
2) treat functions with secondary entry points as simple.
This allows constructing CFG for non-simple functions in non-relocation
mode and emitting YAML profile for them, which can then be used for
optimizations in relocation mode.
Test Plan: added test ignored-interprocedural-reference.s
Exempt special symbols (hot text/data and _end symbol) from normal
handling. We only need to set their value and make them absolute.
If these symbols are handled as normal symbols and if they alias
functions we may create non-sensical symbols, e.g. __hot_start.cold.
Test Plan: updated hot-end-symbol.s
Reviewers: maksfb, rafaelauler, ayermolo, dcci
Reviewed By: dcci, maksfb
Pull Request: https://github.com/llvm/llvm-project/pull/92713
YAML profile for non-simple functions without CFG is
1) useless for optimizations,
2) can't be attached, similar to fdata profile,
3) would be reported as invalid/stale even if the profile is valid.
Don't attempt to attach the profile in this case, aligning the behavior
to DataReader.
Test Plan: added yaml-non-simple.test
Type unit DIE generated by clang contains DW_AT_comp_dir/DW_AT_dwo_name.
This was added to clang to help LLDB to figure out where type unit come
from when accessing an entry in a .debug_names accelerator table and
type units in .dwp file.
When BOLT writes out .dwo files it changes the name of them. User can
also specify directory of where they can be written out. Added support
to BOLT to update those attributes.
Switch from FuncBranchData intermediate maps (Intra/InterIndex)
to aggregated Data, same as one used by DataReader:
e62ce1f884/bolt/lib/Profile/DataReader.cpp (L385-L389)
This aligns the order of the output between YAMLProfileWriter and
writeBATYAML.
Test Plan: updated bolt-address-translation-yaml.test
Reviewers: rafaelauler, dcci, ayermolo, maksfb
Reviewed By: ayermolo, maksfb
Pull Request: https://github.com/llvm/llvm-project/pull/91289
Fix an issue where the profile for all branches that have a BRANCHENTRY
is dropped. If the branch has an entry in BAT, it will be translated to
its input offset. We used to only permit the basic block offset as a
branch source. Perform a lookup of containing basic block instead.
Test Plan: Updated bolt-address-translation-yaml.test
Reviewers: maksfb, dcci, rafaelauler, ayermolo
Reviewed By: maksfb
Pull Request: https://github.com/llvm/llvm-project/pull/91273
A compiler can generate a redundant indirection for a jump via a fixed
jump table target. Add a test case that covers such pattern that covers
PIC case. We already have non-PIC case detection.
Currently XFAIL.
Returns are ignored in perf/pre-aggregated/fdata profile reader (see
DataReader::convertBranchData). They are also omitted in
YAMLProfileWriter by virtue of not having the profile attached to them
in the reader, and YAMLProfileWriter converting the profile attached to
BinaryFunctions. Thus, return profile is universally ignored across all
profile types except BAT YAML.
To make returns ignored for YAML produced in BAT mode, we can:
1) ignore them in YAMLProfileReader,
2) omit them from YAML profile in profile conversion/writing.
The first option is prone to profile staleness issue, where the profiled
binary doesn't match the one to be optimized, and thus returns in the
profile can no longer be reliably detected (as we don't distinguish them
from calls in the profile).
The second option is robust to staleness but requires disassembling the
branch source instruction.
Test Plan: Updated bolt-address-translation-yaml.test
Reviewers: rafaelauler, dcci, ayermolo, maksfb
Reviewed By: maksfb
Pull Request: https://github.com/llvm/llvm-project/pull/90807
To match profile data to code we need to know branch instruction offsets
within a function. For this reason, we mark branches with the "Offset"
annotation while disassembling the code. However, _dynamic_ branches in
the Linux kernel could be NOPs in disassembled code, and we ignore them
while adding annotations. We need to explicitly add the "Offset"
annotation while creating dynamic branches.
Note that without this change, `getInstructionAtOffset()` would still
return a branch instruction if the offset matched the last instruction
in a basic block (and the profile data was matched correctly). However,
the function failed for cases when the searched instruction was followed
by an unconditional jump. "Offset" annotation solves this case.
Skip updating references for operands that do not directly
refer to jump table symbols but fall within a jump table's
address range to prevent unintended modifications.
Use known order of BOLT split function symbols: fragment symbols
immediately precede the parent fragment symbol.
Depends On: https://github.com/llvm/llvm-project/pull/89648
Test Plan: Added register-fragments-bolt-symbols.s
Fragment matching relies on symbol names to identify and register split
function fragments. However, as split fragments are often local symbols,
name aliasing is possible. For such cases, use symbol table to resolve
ambiguities.
This requires the presence of FILE symbols in the input binary. As BOLT
requires non-stripped binary, this is a reasonable assumption. Note that
`strip -g` removes FILE symbols by default, but `--keep-file-symbols`
can be used to preserve them.
Depends on: https://github.com/llvm/llvm-project/pull/89861
Test Plan:
Updated X86/fragment-lite.s