This change implements import call optimization for AArch64 Windows
(equivalent to the undocumented MSVC `/d2ImportCallOptimization` flag).
Import call optimization adds additional data to the binary which can be
used by the Windows kernel loader to rewrite indirect calls to imported
functions as direct calls. It uses the same [Dynamic Value Relocation
Table mechanism that was leveraged on x64 to implement
`/d2GuardRetpoline`](https://techcommunity.microsoft.com/blog/windowsosplatform/mitigating-spectre-variant-2-with-retpoline-on-windows/295618).
The change to the obj file is to add a new `.impcall` section with the
following layout:
```cpp
// Per section that contains calls to imported functions:
// uint32_t SectionSize: Size in bytes for information in this section.
// uint32_t Section Number
// Per call to imported function in section:
// uint32_t Kind: the kind of imported function.
// uint32_t BranchOffset: the offset of the branch instruction in its
// parent section.
// uint32_t TargetSymbolId: the symbol id of the called function.
```
NOTE: If the import call optimization feature is enabled, then the
`.impcall` section must be emitted, even if there are no calls to
imported functions.
The implementation is split across a few parts of LLVM:
* During AArch64 instruction selection, the `GlobalValue` for each call
to a global is recorded into the Extra Information for that node.
* During lowering to machine instructions, the called global value for
each call is noted in its containing `MachineFunction`.
* During AArch64 asm printing, if the import call optimization feature
is enabled:
- A (new) `.impcall` directive is emitted for each call to an imported
function.
- The `.impcall` section is emitted with its magic header (but is not
filled in).
* During COFF object writing, the `.impcall` section is filled in based
on each `.impcall` directive that were encountered.
The `.impcall` section can only be filled in when we are writing the
COFF object as it requires the actual section numbers, which are only
assigned at that point (i.e., they don't exist during asm printing).
I had tried to avoid using the Extra Information during instruction
selection and instead implement this either purely during asm printing
or in a `MachineFunctionPass` (as suggested in [on the
forums](https://discourse.llvm.org/t/design-gathering-locations-of-instructions-to-emit-into-a-section/83729/3))
but this was not possible due to how loading and calling an imported
function works on AArch64. Specifically, they are emitted as `ADRP` +
`LDR` (to load the symbol) then a `BR` (to do the call), so at the point
when we have machine instructions, we would have to work backwards
through the instructions to discover what is being called. An initial
prototype did work by inspecting instructions; however, it didn't
correctly handle the case where the same function was called twice in a
row, which caused LLVM to elide the `ADRP` + `LDR` and reuse the
previously loaded address. Worse than that, sometimes for the
double-call case LLVM decided to spill the loaded address to the stack
and then reload it before making the second call. So, instead of trying
to implement logic to discover where the value in a register came from,
I instead recorded the symbol being called at the last place where it
was easy to do: instruction selection.
Following discussions in #110443, and the following earlier discussions
in https://lists.llvm.org/pipermail/llvm-dev/2017-October/117907.html,
https://reviews.llvm.org/D38482, https://reviews.llvm.org/D38489, this
PR attempts to overhaul the `TargetMachine` and `LLVMTargetMachine`
interface classes. More specifically:
1. Makes `TargetMachine` the only class implemented under
`TargetMachine.h` in the `Target` library.
2. `TargetMachine` contains target-specific interface functions that
relate to IR/CodeGen/MC constructs, whereas before (at least on paper)
it was supposed to have only IR/MC constructs. Any Target that doesn't
want to use the independent code generator simply does not implement
them, and returns either `false` or `nullptr`.
3. Renames `LLVMTargetMachine` to `CodeGenCommonTMImpl`. This renaming
aims to make the purpose of `LLVMTargetMachine` clearer. Its interface
was moved under the CodeGen library, to further emphasis its usage in
Targets that use CodeGen directly.
4. Makes `TargetMachine` the only interface used across LLVM and its
projects. With these changes, `CodeGenCommonTMImpl` is simply a set of
shared function implementations of `TargetMachine`, and CodeGen users
don't need to static cast to `LLVMTargetMachine` every time they need a
CodeGen-specific feature of the `TargetMachine`.
5. More importantly, does not change any requirements regarding library
linking.
cc @arsenm @aeubanks
The MIRPrinter emits ` :: ` at the start of a MMO. The MIRLexer eats all
the white space after the operand and before the `::` when there is no
comment. We need to eat the space after the comment to allow MIRLexer to
parse comments on a MMO.
As part of FEAT_PAuthLR, a new DWARF Frame Instruction was introduced,
`DW_CFA_AARCH64_negate_ra_state_with_pc`. This instructs Libunwind that
the PC has been used with the signing instruction. This change includes
three commits
- Libunwind support for the newly introduced DWARF Instruction
- CodeGen Support for the DWARF Instructions
- Reversing the changes made in #96377. Due to
`DW_CFA_AARCH64_negate_ra_state_with_pc`'s requirements to be placed
immediately after the signing instruction, this would mean the CFI
Instruction location was not consistent with the generated location when
not using FEAT_PAuthLR. The commit reverses the changes and makes the
location consistent across the different branch protection options.
While this does have a code size effect, this is a negligible one.
For the ABI information, see here:
853286c7ab/aadwarf64/aadwarf64.rst (id23)
A fix-it patch for dbfca24 #110228.
No need for a container. This allows 8 flags for a register.
The virtual register flags vector had a memory leak because the vector's
memory is not freed.
The `BumpPtrAllocator` handles the deallocation and missed calling the
`std::vector<uint8_t> Flags` destructor.
The delegates' callback isn't invoked on parsing new virtual registers.
There are two places in the serialization where new virtual registers can be discovered: in register infos and in instructions.
[MIR] Serialize virtual register flags
This introduces target-specific vreg flag serialization. Flags are represented as `uint8_t` and the `TargetRegisterInfo` override provides methods `getVRegFlagValue` to deserialize and `getVRegFlagsOfReg` to serialize.
Following the addition of the llvm.fake.use intrinsic and corresponding
MIR instruction, two further changes are planned: to add an
-fextend-lifetimes flag to Clang that emits these intrinsics, and to
have -Og enable this flag by default. Currently, some logic for handling
fake uses is gated by the optdebug attribute, which is intended to be
switched on by -fextend-lifetimes (and by extension -Og later on).
However, the decision was made that a general optdebug attribute should
be incompatible with other opt_ attributes (e.g. optsize, optnone),
since they all express different intents for how to optimize the
program. We would still like to allow -fextend-lifetimes with optsize
however (i.e. -Os -fextend-lifetimes should be legal), since it may be a
useful configuration and there is no technical reason to not allow it.
This patch resolves this by tracking MachineFunctions that have fake
uses, allowing us to run passes that interact with them and skip passes
that clash with them.
This reapplies commit 1911a50fae8a441b445eb835b98950710d28fc88 with a
minor fix in lld/ELF/LTO.cpp which sets Options.BBAddrMap when
`--lto-basic-block-sections=labels` is passed.
This feature is supported via the newer option
`-fbasic-block-address-map`. Using the old option still works by
delegating to the newer option, while a warning is printed to show
deprecation.
This fixes a test failure when expensive checks are enabled. Use the
correct return value when computing machine function properties resulted
in an error (e.g. when conflicting with explicitly set values).
Without this, the machine verifier would crash even in the presence of
parsing errors which should have gently terminated execution.
This produces far too much terminal output, particularly for the
instruction reduction. Since it doesn't consider the liveness of of
the instructions it's deleting, it produces quite a lot of verifier
errors.
Allow setting the computed properties IsSSA, NoPHIs, NoVRegs for MIR
functions in MIR input. The default value is still the computed value.
If the property is set to false, the computed result is ignored. Conflicting
values (e.g. setting IsSSA where the input MIR is clearly not SSA) lead to
an error.
Closes#37787
It is almost always simpler to use {} instead of std::nullopt to
initialize an empty ArrayRef. This patch changes all occurrences I could
find in LLVM itself. In future the ArrayRef(std::nullopt_t) constructor
could be deprecated or removed.
S.substr(N) is simpler than S.slice(N, StringRef::npos) and
S.slice(N, S.size()). Also, substr is probably better recognizable
than slice thanks to std::string_view::substr.
In case of functions without a stack frame no "stack" field is
serialized into MIR which leads to isCalleeSavedInfoValid being false
when reading a MIR file back in. To fix this we should serialize
MachineFrameInfo::isCalleeSavedInfoValid() into MIR.
In new pass system, `MachineFunction` could be an analysis result again,
machine module pass can now fetch them from analysis manager.
`MachineModuleInfo` no longer owns them.
Remove `FreeMachineFunctionPass`, replaced by
`InvalidateAnalysisPass<MachineFunctionAnalysis>`.
Now `FreeMachineFunction` is replaced by
`InvalidateAnalysisPass<MachineFunctionAnalysis>`, the workaround in
`MachineFunctionPassManager` is no longer needed, there is no difference
between `unittests/MIR/PassBuilderCallbacksTest.cpp` and
`unittests/IR/PassBuilderCallbacksTest.cpp`.
CallSiteInfo is originally used only for argument - register pairs. Make
it struct, in which we can store additional data for call sites.
Also, the variables/methods used for CallSiteInfo are named for its
original use case, e.g., CallFwdRegsInfo. Refactor these for the
upcoming
use, e.g. addCallArgsForwardingRegs() -> addCallSiteInfo().
An upcoming patch will add type ids for indirect calls to propogate them
from
middle-end to the back-end. The type ids will be then used to emit the
call
graph section.
Original RFC:
https://lists.llvm.org/pipermail/llvm-dev/2021-June/151044.html
Updated RFC:
https://lists.llvm.org/pipermail/llvm-dev/2021-July/151739.html
Differential Revision: https://reviews.llvm.org/D107109?id=362888
Co-authored-by: Necip Fazil Yildiran <necip@google.com>
[GlobalISel] Implement convergence control tokens and intrinsics in GMIR
In the IR translator, convert the LLVM token type to LLT::token(), which is an
alias for the s0 type. These show up as implicit uses on convergent operations.
Differential Revision: https://reviews.llvm.org/D158147
https://github.com/llvm/llvm-project/pull/78171 added support for
non-consecutive local value numbers. This extends the support for global
value numbers (for globals and functions).
This means that it is now possible to delete an unnamed global
definition/declaration without breaking the IR.
This is a lot less common than unnamed local values, but it seems like
something we should support for consistency. (Unnamed globals are used a
lot in Rust though.)
This patch replaces uses of StringRef::{starts,ends}with with
StringRef::{starts,ends}_with for consistency with
std::{string,string_view}::{starts,ends}_with in C++20.
I'm planning to deprecate and eventually remove
StringRef::{starts,ends}with.
Most users of PseudoSourceValue.h only need PseudoSourceValue, not the
PseudoSourceValueManager. However, this header pulls in some very
expensive dependencies like ValueMap.h, which is only used for the
manager.
Split off the manager into a separate header and include it only where
used.
We were allowing extra immediate arguments, and only bothering to check
if registers were implicit or not.
Also consolidate extra operand checks in verifier, to make this
testable. We had 3 different places checking if you were trying to build
an instruction with more operands than allowed by the definition. We had
an assertion in addOperand, a direct check in the MIRParser to avoid the
assertion, and the machine verifier checks. Remove the assert and parser
check so the verifier can provide a consistent verification experience,
which will also handle instructions modified in place.
28b9126879
introduced the path cloning format in the basic-block-sections profile.
This PR validates and applies path clonings.
A path cloning is valid if all of these conditions hold:
1. All bb ids in the path are mapped to existing blocks.
2. Each two consecutive bb ids in the path have a successor relationship
in the CFG.
3. The path does not include a block with indirect branches, except
possibly as the last block.
Applying a path cloning involves cloning all blocks in the path (except
the first one) and setting up their branches.
Once all clonings are applied, the cluster information is used to guide
block layout in the modified function.
Some opcodes in MIR are defined to be convergent by the target by setting
IsConvergent in the corresponding TD file. For example, in AMDGPU, the opcodes
G_SI_CALL and G_INTRINSIC* are marked as convergent. But this is too
conservative, since calls to functions that do not execute convergent operations
should not be marked convergent. This information is available in LLVM IR.
The new flag MIFlag::NoConvergent now allows the IR translator to mark an
instruction as not performing any convergent operations. It is relevant only on
occurrences of opcodes that are marked isConvergent in the target.
Differential Revision: https://reviews.llvm.org/D157475
Record the call frame size on entry to each basic block. This is usually
zero except when a basic block has been split in the middle of a call
sequence.
This simplifies PEI::replaceFrameIndices which previously had to visit
basic blocks in a specific order and had special handling for
unreachable blocks. More importantly it paves the way for an equally
simple implementation of a backwards version of replaceFrameIndices,
which is required to fully convert PrologEpilogInserter to backwards
register scavenging, which is preferred because it does not rely on
accurate kill flags.
Differential Revision: https://reviews.llvm.org/D156113
Record the SP adjustment on entry to each basic block. This is almost
always zero except on targets like ARM which can split a basic block in
the middle of a call sequence.
This simplifies PEI::replaceFrameIndices which previously had to visit
basic blocks in a specific order and had special handling for
unreachable blocks. More importantly it paves the way for an equally
simple implementation of a backwards version of replaceFrameIndices,
which is required to fully convert PrologEpilogInserter to backwards
register scavenging, which is preferred because it does not rely on
accurate kill flags.
Differential Revision: https://reviews.llvm.org/D154281
to help debug and report better diagnostics for functions like
relaxDwarfCallFrameFragment (D153167).
In MCStreamer, some emitCFI* functions already take a SMLoc argument. Add a
SMLoc argument to the remaining functions that generate a MCCFIInstruction.
Sometimes an developer would like to have more control over cmov vs branch. We have unpredictable metadata in LLVM IR, but currently it is ignored by X86 backend. Propagate this metadata and avoid cmov->branch conversion in X86CmovConversion for cmov with this metadata.
Example:
```
int MaxIndex(int n, int *a) {
int t = 0;
for (int i = 1; i < n; i++) {
// cmov is converted to branch by X86CmovConversion
if (a[i] > a[t]) t = i;
}
return t;
}
int MaxIndex2(int n, int *a) {
int t = 0;
for (int i = 1; i < n; i++) {
// cmov is preserved
if (__builtin_unpredictable(a[i] > a[t])) t = i;
}
return t;
}
```
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D118118
Use big obj copy in range for-loop will call copy constructor every time,
which can be avoided by use ref instead.
Reviewed By: skan
Differential Revision: https://reviews.llvm.org/D150024
This commit implements the serialization and deserialization of the Machine
Function's EntryValueObjects.
Depends on D149879, D149778
Differential Revision: https://reviews.llvm.org/D149880