This reverts commit 05e08cdb3e576cc0887d1507ebd2f756460c7db7.
This adds the missing -mtriple flags in the MIR/X86 test files; their
absence caused these tests to fail, which was the reason for reverting the patch.
Introduce an `EnableCallGraphSection` target option that adds a
CalleeTypeIds field to CallSiteInfo, and teach the MIR parser/printer
to read and write the callee type ids.
Reviewers: ilovepi
Reviewed By: ilovepi
Pull Request: https://github.com/llvm/llvm-project/pull/87574
Deleting a basic block removes all references from jump tables, which
is O(n). When freeing a MachineFunction, all basic blocks are deleted
before the jump tables, causing O(n^2) runtime. Fix this by deallocating
the jump table first.
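A self-contained sketch of the destruction-order fix (stand-in types, not the actual MachineFunction/MachineJumpTableInfo classes):
```
#include <algorithm>
#include <memory>
#include <vector>

struct Block {};

struct JumpTableInfo {
  std::vector<std::vector<Block *>> Tables;
  // Deleting a block scrubs it from every table: O(#entries) per block.
  void removeReferences(Block *B) {
    for (auto &T : Tables)
      T.erase(std::remove(T.begin(), T.end(), B), T.end());
  }
};

struct Function {
  std::unique_ptr<JumpTableInfo> JTI;
  std::vector<std::unique_ptr<Block>> Blocks;

  ~Function() {
    // Before: blocks died first, and each death scanned the jump table
    // via removeReferences(), giving O(n^2) total work. After: release
    // the jump table first, so block deletion is O(1) per block.
    JTI.reset();
    Blocks.clear();
  }
};

int main() { Function F; }
```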
Test case generator:
```
import sys
n = int(sys.argv[1])
print("define void @f(i64 %c, ptr %p) {")
print(" switch i64 %c, label %d [")
for i in range(n):
    print(f" i64 {i}, label %h{i}")
print(" ]")
for i in range(n):
    print(f'h{i}:')
    print(f' store i64 {i*i}, ptr %p')
    print(' ret void')
print('d:')
print(' ret void')
print('}')
```
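For example (assuming the script is saved as gen.py), `python3 gen.py 5000 | llvm-as -o switch5k.bc` produces the input used in the benchmarks below.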
Improvement at 5000 entries:
```
Benchmark 1: ./llc.pre -filetype=obj -O0 <switch5k.bc
  Time (mean ± σ):      49.7 ms ±  1.0 ms
  Range (min … max):    48.0 ms … 52.1 ms    57 runs

Benchmark 2: ./llc.post -filetype=obj -O0 <switch5k.bc
  Time (mean ± σ):      39.4 ms ±  0.8 ms
  Range (min … max):    37.1 ms … 41.1 ms    72 runs

Summary
  ./llc.post -filetype=obj -O0 <switch5k.bc ran
    1.26 ± 0.04 times faster than ./llc.pre -filetype=obj -O0 <switch5k.bc
```
Improvement at 20000 entries:
```
Benchmark 1: ./llc.pre -filetype=obj -O0 <switch20k.bc
  Time (mean ± σ):     281.7 ms ±  1.0 ms
  Range (min … max):   280.2 ms … 283.0 ms    10 runs

Benchmark 2: ./llc.post -filetype=obj -O0 <switch20k.bc
  Time (mean ± σ):     123.9 ms ±  1.5 ms
  Range (min … max):   121.4 ms … 129.2 ms    23 runs

Summary
  ./llc.post -filetype=obj -O0 <switch20k.bc ran
    2.27 ± 0.03 times faster than ./llc.pre -filetype=obj -O0 <switch20k.bc
```
Pull Request: https://github.com/llvm/llvm-project/pull/144108
The LiveDebugValues pass and the instruction selector (which calls
salvageCopySSA) need to be consistent on what they consider a copy
instruction. With https://github.com/llvm/llvm-project/pull/75184, the
definition of a copy instruction was narrowed for AArch64 to exclude a
w->x ORR, treating it as a zero-extend rather than a copy. However, to
make sure LiveDebugValues still treats a w->x ORR as a copy, a new
function, isCopyLikeInstr, was created. We need to make sure that
salvageCopySSA also calls that function.
This patch addresses this mismatch.
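A minimal sketch of the shape of the fix, with hypothetical simplified signatures (the real predicates live on TargetInstrInfo and return more than a bool):
```
// Stand-ins; illustrative only.
struct MachineInstr {};

struct TargetInstrInfo {
  // Strict copies only (post-#75184 this excludes the AArch64 w->x ORR).
  virtual bool isCopyInstr(const MachineInstr &) const { return false; }
  // Copies plus copy-like instructions such as the w->x ORR zero-extend.
  virtual bool isCopyLikeInstr(const MachineInstr &MI) const {
    return isCopyInstr(MI);
  }
  virtual ~TargetInstrInfo() = default;
};

// salvageCopySSA must ask the same predicate as LiveDebugValues:
bool followAsCopy(const TargetInstrInfo &TII, const MachineInstr &MI) {
  return TII.isCopyLikeInstr(MI); // was: TII.isCopyInstr(MI)
}
```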
As part of the "RemoveDIs" project, BasicBlock::iterator now carries a
debug-info bit that's needed when getFirstNonPHI and similar feed into
instruction insertion positions. Call sites where that's necessary were
updated a year ago, but to ensure some type safety we'd like all calls
to getFirstNonPHI to use the iterator-returning version.
This patch changes a bunch of call-sites calling getFirstNonPHI to use
getFirstNonPHIIt, which returns an iterator. All these call sites are
where it's obviously safe to fetch the iterator then dereference it. A
follow-up patch will contain less-obviously-safe changes.
We'll eventually deprecate and remove the instruction-pointer
getFirstNonPHI, but not before adding concise documentation of what
considerations are needed (very few).
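An illustrative call-site change (a sketch; the exact StoreInst overloads vary across LLVM versions):
```
#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/Instructions.h"
using namespace llvm;

void insertAtFront(BasicBlock &BB, Value *Val, Value *Ptr) {
  // Before: a raw Instruction * drops the debug-info bit carried by the
  // iterator, so the insertion position can be subtly wrong:
  //   Instruction *I = BB.getFirstNonPHI();
  //   new StoreInst(Val, Ptr, I);

  // After: fetch the iterator and insert with it directly.
  BasicBlock::iterator InsertPt = BB.getFirstNonPHIIt();
  new StoreInst(Val, Ptr, InsertPt);
}
```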
---------
Co-authored-by: Stephen Tozer <Melamoto@gmail.com>
https://discourse.llvm.org/t/rfc-profile-guided-static-data-partitioning/83744
proposes to partition static data sections.
This patch introduces a codegen pass that records jump table hotness in
the in-memory state (machine jump table info and entries). Target
lowering and the asm printer consume this state and produce the `.hot`
section suffix. The follow-up PR
https://github.com/llvm/llvm-project/pull/122215 implements such
changes.
---------
Co-authored-by: Ellis Hoag <ellis.sparky.hoag@gmail.com>
Fixes the "use after poison" issue introduced by #121516 (see
<https://github.com/llvm/llvm-project/pull/121516#issuecomment-2585912395>).
The root cause of this issue is that #121516 introduced "Called Global"
information for call instructions, modeling how "Call Site" info is
stored in the machine function; however, it didn't extend the
copy/move/erase operations for call site information to also cover it.
The fix is to rename and update the existing copy/move/erase functions
so they also take care of Called Global info.
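A minimal sketch of the intended shape (simplified stand-ins, not the exact MachineFunction code; the helper names are illustrative):
```
#include <map>

struct MachineInstr {};
struct GlobalValue {};
struct CallSiteInfoEntry { /* argument-register info elided */ };

struct FunctionCallInfo {
  std::map<const MachineInstr *, CallSiteInfoEntry> CallSites;
  std::map<const MachineInstr *, const GlobalValue *> CalledGlobals;

  // One helper now maintains *both* side tables, so no entry can dangle
  // after its instruction is erased (the "use after poison").
  void eraseAdditionalCallInfo(const MachineInstr *MI) {
    CallSites.erase(MI);
    CalledGlobals.erase(MI);
  }

  void moveAdditionalCallInfo(const MachineInstr *Old,
                              const MachineInstr *New) {
    if (auto It = CallSites.find(Old); It != CallSites.end()) {
      CallSites[New] = It->second;
      CallSites.erase(It);
    }
    if (auto It = CalledGlobals.find(Old); It != CalledGlobals.end()) {
      CalledGlobals[New] = It->second;
      CalledGlobals.erase(It);
    }
  }
};
```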
Try to emit a DiagnosticInfo error if every register in the class is
reserved, and recover by forcing assignment to a reserved register. This
also reduces the number of redundant errors emitted, particularly with
the fast register allocator.
This is still broken in the case of undef uses. There are additional
complications in the greedy and fast allocators, so leave that for a
separate fix.
Following discussions in #110443, and the following earlier discussions
in https://lists.llvm.org/pipermail/llvm-dev/2017-October/117907.html,
https://reviews.llvm.org/D38482, https://reviews.llvm.org/D38489, this
PR attempts to overhaul the `TargetMachine` and `LLVMTargetMachine`
interface classes. More specifically:
1. Makes `TargetMachine` the only class implemented under
`TargetMachine.h` in the `Target` library.
2. `TargetMachine` contains target-specific interface functions that
relate to IR/CodeGen/MC constructs, whereas before (at least on paper)
it was supposed to have only IR/MC constructs. Any Target that doesn't
want to use the independent code generator simply does not implement
them, and returns either `false` or `nullptr`.
3. Renames `LLVMTargetMachine` to `CodeGenCommonTMImpl`. This renaming
aims to make the purpose of `LLVMTargetMachine` clearer. Its interface
was moved under the CodeGen library, to further emphasize its usage in
Targets that use CodeGen directly.
4. Makes `TargetMachine` the only interface used across LLVM and its
projects. With these changes, `CodeGenCommonTMImpl` is simply a set of
shared function implementations of `TargetMachine`, and CodeGen users
don't need to static cast to `LLVMTargetMachine` every time they need a
CodeGen-specific feature of the `TargetMachine`.
5. More importantly, does not change any requirements regarding library
linking.
cc @arsenm @aeubanks
A previous commit, d826b0c9, accidentally added a new MachineFunctionProperty,
HasFakeUses, that was unused by the commit (and resulted in an
uncovered-switch warning, which was fixed by the separate follow-up 1811e872);
this patch removes that enum value.
This reapplies commit 1911a50fae8a441b445eb835b98950710d28fc88 with a
minor fix in lld/ELF/LTO.cpp which sets Options.BBAddrMap when
`--lto-basic-block-sections=labels` is passed.
This feature is supported via the newer option
`-fbasic-block-address-map`. Using the old option still works by
delegating to the newer option, and a deprecation warning is printed.
The dominator tree gained an optimization to use block numbers instead
of a DenseMap to store blocks. Given that machine basic blocks already
have numbers, expose these via appropriate GraphTraits. For debugging,
block number epochs are added to MachineFunction -- this greatly helps
in finding uses of block numbers after RenumberBlocks().
In a few cases where dominator trees are preserved across renumberings,
the dominator tree is updated to use the new numbers.
This avoids another unserializable field. Move the DbgInfoAvailable
field into the AsmPrinter; it is really just a cache/convenience bit
for what is otherwise a direct check of IR module metadata.
I do not understand what this is for, but it's only used in
SelectionDAGBuilder, so move it to FunctionLoweringInfo like other
function scope DAG builder state. The intrinsics are not documented
in the LangRef or Intrinsics.td.
This removes the last piece of codegen state from MachineModuleInfo.
shouldRealignStack/canRealignStack are repeatedly called in PEI (through
hasStackRealignment). Checking function attributes is expensive, so
cache this data in the MachineFrameInfo, which had most data already.
This slightly changes the semantics of `MachineFrameInfo::ForcedRealign`,
which is now also true when the `stackrealign` attribute is set.
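A self-contained sketch of the caching pattern (stand-in types): compute the attribute-derived answer once, instead of re-walking function attributes on every PEI query.
```
#include <string>

// Stand-ins for the real classes; illustrative only.
struct Function {
  // Walking the attribute list is the expensive part being cached away.
  bool hasFnAttribute(const std::string &Kind) const {
    return Kind == "stackrealign";
  }
};

struct MachineFrameInfo {
  // Computed once at construction; PEI's repeated
  // shouldRealignStack/canRealignStack queries now read a cached bit.
  bool ForcedRealign = false; // also true for the stackrealign attribute
};

MachineFrameInfo makeFrameInfo(const Function &F, bool TargetForcesRealign) {
  MachineFrameInfo MFI;
  MFI.ForcedRealign = TargetForcesRealign || F.hasFnAttribute("stackrealign");
  return MFI;
}
```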
This is part of #70452 that changes the type used for the external
interface of MMO to LocationSize as opposed to uint64_t. This means the
constructors take LocationSize, and convert ~UINT64_C(0) to
LocationSize::beforeOrAfter(). The getSize methods return a
LocationSize.
This allows us to be more precise with unknown sizes, not accidentally
treating them as unsigned values, and in the future should allow us to
add proper scalable vector support but none of that is included in this
patch. It should mostly be an NFC.
GlobalISel is still expected to use the underlying LLT as it needs, and
is not expected to see unknown sizes for generic operations. Most of
the changes are hopefully fairly mechanical, adding a lot of getValue()
calls and protecting them with hasValue() where needed.
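A simplified self-contained sketch of the idea (the real LocationSize also tracks precise vs. upper-bound and scalable sizes):
```
#include <cstdint>
#include <optional>

class LocationSize {
  std::optional<uint64_t> Size; // nullopt == unknown
  LocationSize() = default;
  explicit LocationSize(uint64_t S) : Size(S) {}

public:
  static LocationSize precise(uint64_t S) { return LocationSize(S); }
  static LocationSize beforeOrAfterPointer() { return LocationSize(); }
  bool hasValue() const { return Size.has_value(); }
  uint64_t getValue() const { return *Size; }
};

// Illustrative consumer: with a raw uint64_t, an unknown size (~0ULL)
// would wrongly compare as a gigantic *known* size. Now the unknown
// case must be handled explicitly via hasValue().
bool knownToFitIn(LocationSize Sz, uint64_t Bytes) {
  return Sz.hasValue() && Sz.getValue() <= Bytes;
}
```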
Today `-split-machine-functions` and `-fbasic-block-sections={all,list}`
cannot be combined with `-basic-block-sections=labels` (the labels
option will be ignored).
The inconsistency comes from the way basic block address map -- the
underlying mechanism for basic block labels -- encodes basic block
addresses
(https://lists.llvm.org/pipermail/llvm-dev/2020-July/143512.html).
Specifically, basic block offsets are computed relative to the function
begin symbol. This relies on functions being contiguous, which is not the
case for MFS and basic block section binaries. This means Propeller
cannot use binary profiles collected from these binaries, which limits
the applicability of Propeller for iterative optimization.
To make the `SHT_LLVM_BB_ADDR_MAP` feature work with basic block section
binaries, we propose modifying the encoding of this section as follows.
First, let us review the current encoding, which emits the address of each
function and its number of basic blocks, followed by basic block entries
for each basic block.
| Field | Symbol |
|--|--|
| Address of the function | Function Address |
| Number of basic blocks in this function | NumBlocks |
| BB entry 1 | |
| BB entry 2 | |
| ... | |
| BB entry #NumBlocks | |
To make this work for basic block sections, we treat each basic block
section similarly to a function, except that basic block sections of the
same function must be encapsulated in the same structure so we can map
all of them to their single function.
We modify the encoding to first emit the number of basic block sections
(BB ranges) in the function. Then we emit the address map of each basic
block section as before: the base address of the section, its number of
blocks, and BB entries for its basic blocks. The first section in the BB
address map is always the function entry section.
| Field | Symbol |
|--|--|
| Number of sections for this function | NumBBRanges |
| Section 1 begin address | BaseAddress[1] |
| Number of basic blocks in section 1 | NumBlocks[1] |
| BB entries for Section 1 | |
| ... | |
| Section #NumBBRanges begin address | BaseAddress[NumBBRanges] |
| Number of basic blocks in section #NumBBRanges | NumBlocks[NumBBRanges] |
| BB entries for Section #NumBBRanges | |
The encoding of basic block entries remains as before with the minor
change that each basic block offset is now computed relative to the
begin symbol of its containing BB section.
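A sketch of the new in-memory shape (field names are illustrative, not the exact llvm::object API):
```
#include <cstdint>
#include <vector>

struct BBEntry {
  uint32_t ID;      // basic block ID
  uint64_t Offset;  // now relative to its *section's* begin symbol
  uint64_t Size;
  uint32_t Metadata;
};

struct BBRangeEntry {          // one per basic-block section
  uint64_t BaseAddress;        // BaseAddress[i]
  std::vector<BBEntry> Blocks; // NumBlocks[i] entries
};

struct BBAddrMapEntry {        // one per function
  // The first range is always the function-entry section.
  std::vector<BBRangeEntry> Ranges; // NumBBRanges ranges
};
```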
This patch adds a new boolean codegen option `-basic-block-address-map`.
Correspondingly, the front-end flag `-fbasic-block-address-map` and LLD
flag `--lto-basic-block-address-map` are introduced.
Analogously, we add a new TargetOption field `BBAddrMap`. This means BB
address maps are either generated for all functions in the compilation
unit, or for none (depending on `TargetOptions::BBAddrMap`).
This patch keeps the functionality of the old
`-fbasic-block-sections=labels` option but does not remove it. A
subsequent patch will remove the obsolete option.
We refactor the `BasicBlockSections` pass by separating the BB address
map and BB sections handling into their own functions (named
`handleBBAddrMap` and `handleBBSections`). `handleBBSections` renumbers
basic blocks and places them in their assigned sections.
`handleBBAddrMap` is invoked after `handleBBSections` (if requested) and
only renumbers the blocks.
- New tests added:
- Two tests, basic-block-address-map-with-basic-block-sections.ll and
basic-block-address-map-with-mfs.ll, to exercise the combination of
`-basic-block-address-map` with `-basic-block-sections=list` and
`-split-machine-functions`.
- A driver sanity test for the `-fbasic-block-address-map` option
(basic-block-address-map.c).
- An LLD test for testing the `--lto-basic-block-address-map` option.
This reuses the LLVM IR from `lld/test/ELF/lto/basic-block-sections.ll`.
- Renamed and modified the two existing codegen tests for basic block
address map (`basic-block-sections-labels-functions-sections.ll` and
`basic-block-sections-labels.ll`)
- Removed `SHT_LLVM_BB_ADDR_MAP_V0` tests. Full deprecation of
`SHT_LLVM_BB_ADDR_MAP_V0` and `SHT_LLVM_BB_ADDR_MAP` version less than 2
will happen in a separate PR in a few months.
Most users of PseudoSourceValue.h only need PseudoSourceValue, not the
PseudoSourceValueManager. However, this header pulls in some very
expensive dependencies like ValueMap.h, which is only used for the
manager.
Split off the manager into a separate header and include it only where
used.
Commit 28b9126879 introduced the path cloning format in the
basic-block-sections profile.
This PR validates and applies path clonings.
A path cloning is valid if all of these conditions hold:
1. All bb ids in the path are mapped to existing blocks.
2. Each two consecutive bb ids in the path have a successor relationship
in the CFG.
3. The path does not include a block with indirect branches, except
possibly as the last block.
Applying a path cloning involves cloning all blocks in the path (except
the first one) and setting up their branches.
Once all clonings are applied, the cluster information is used to guide
block layout in the modified function.
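A self-contained sketch of the three validity checks (stand-in CFG types, not the actual BasicBlockSections code):
```
#include <cstddef>
#include <vector>

struct Block {
  std::vector<Block *> Succs;
  bool HasIndirectBranch = false;
};

// Path is the sequence of blocks resolved from the profile's BB IDs;
// IDs that map to no existing block (condition 1) arrive as nullptr.
bool isValidCloning(const std::vector<Block *> &Path) {
  for (size_t I = 0; I < Path.size(); ++I) {
    if (!Path[I])
      return false; // 1. bb id does not map to an existing block
    if (I + 1 < Path.size()) {
      bool IsSucc = false;
      for (Block *S : Path[I]->Succs)
        IsSucc |= (S == Path[I + 1]);
      if (!IsSucc)
        return false; // 2. consecutive ids are not CFG successors
      if (Path[I]->HasIndirectBranch)
        return false; // 3. indirect branch allowed only as last block
    }
  }
  return true;
}
```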
Enable FoldImmediate for X86 by implementing X86InstrInfo::FoldImmediate.
Also enhance the peephole pass to delete identical instructions after FoldImmediate.
Differential Revision: https://reviews.llvm.org/D151848
This will make it easy for callers to spot and fix up calls to
createTargetMachine after a future change to TargetMachine's parameters.
This matches other nearby enums.
For downstream users, this should be a fairly straightforward
replacement,
e.g. s/CodeGenOpt::Aggressive/CodeGenOptLevel::Aggressive
or s/CGFT_/CodeGenFileType::
With the large code model, the label difference may not fit into 32 bits.
Even if we assume that any individual function is no larger than 2^32
and use a difference from the function entry to the target destination,
things like BOLT can rearrange blocks (even if BOLT doesn't necessarily
work with the large code model right now).
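As an illustration (labels are hypothetical), a 32-bit entry such as `.long .LBB0_3-.LJTI0_0` can overflow under the large code model, whereas a 64-bit entry can simply be `.quad .LBB0_3`.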
`.set` directives avoid static relocations in some 32-bit entry cases,
but we don't bother with set directives for 64-bit jump table entries
(we can do that later if somebody really cares about it).
check-llvm in a bootstrapped clang with the large code model passes.
Fixes #62894
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D159297
Fix https://github.com/llvm/llvm-project/issues/63579
```
% cat a.c
void foo() {}
% clang --target=arm-none-eabi -mthumb -mno-unaligned-access -fsanitize=kcfi a.c -S -o - | grep p2align
.p2align 1
% clang --target=armv6m-none-eabi -fsanitize=function a.c -S -o - | grep p2align
.p2align 1
```
Ensure that -fsanitize={function,kcfi} instrumented functions are aligned to at
least 4 bytes, so that loading the type hash before the function label will not
cause a misaligned access. This is especially important for -mno-unaligned-access
configurations that don't set `setMinFunctionAlignment` to 4 or greater.
With this patch, the generated assembly for the examples above will contain `.p2align 2`
before the type hash.
If `__attribute__((aligned(N)))` or `-falign-functions=N` is specified, the
larger alignment will be used.
Reviewed By: simon_tatham, samitolvanen
Differential Revision: https://reviews.llvm.org/D154125
Currently, to use PSI->isFunctionHotInCallGraph, we first need to
calculate BPI and then BFI, which is expensive. Instead, we can implement
this directly with MBFI. Also, as @wenlei mentioned in another patch
review, MachineSizeOpts already has isFunctionColdInCallGraph,
isFunctionHotInCallGraphNthPercentile, etc. implemented. These can be
refactored so they can be reused across the MachineFunctionSplitting
and MachineSizeOpts passes.
This CL does this: it refactors those internal static functions
into PSI as templated functions, so they can be accessed easily.
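A self-contained sketch of the templated form (stand-in types and simplified logic; the real helpers also have percentile variants):
```
#include <cstdint>
#include <optional>

// Stand-in for ProfileSummaryInfo; illustrative only.
struct ProfileSummaryInfo {
  uint64_t HotCountThreshold = 1000;
  bool isHotCount(uint64_t C) const { return C >= HotCountThreshold; }

  // Templated over the function and BFI types, so the same code serves
  // IR callers (Function/BlockFrequencyInfo) and Machine IR callers
  // (MachineFunction/MachineBlockFrequencyInfo).
  template <typename FuncT, typename BFIT>
  bool isFunctionHotInCallGraph(const FuncT &F, const BFIT &BFI) const {
    if (std::optional<uint64_t> Count = F.getEntryCount())
      return isHotCount(*Count);
    // No entry count: fall back to the hottest block frequency.
    return isHotCount(BFI.getMaxBlockCount());
  }
};
```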
Differential Revision: https://reviews.llvm.org/D153927
This patch fixes an issue where we were reusing constant pool entries that contained undef elements, despite the additional uses of the 'equivalent constant' requiring some/all of the elements to be zero.
The CanShareConstantPoolEntry helper function uses ConstantFoldCastOperand to bitcast the type mismatching constants to integer representations to allow comparison, but unfortunately this treats undef elements as zero (which they will be written out as in the final asm). This caused an issue where the original constant pool entry contained undef elements, which was shared with a later constant that required the elements to be zero. This then caused a later analysis pass to incorrectly discard these undef elements.
Ideally we need a more thorough analysis/merging of the constant pool entries so the elements are forced to real zero elements, but for now we just prevent reuse of the constant pool entry entirely if the constants don't have matching undef/poison elements.
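A self-contained sketch of the stricter reuse check (stand-in element representation; the real code compares llvm::Constant elements after bitcasting with ConstantFoldCastOperand):
```
#include <cstddef>
#include <vector>

enum class Elt { Zero, NonZero, Undef };

bool canShareConstantPoolEntry(const std::vector<Elt> &Existing,
                               const std::vector<Elt> &New) {
  if (Existing.size() != New.size())
    return false;
  for (size_t I = 0; I != Existing.size(); ++I) {
    // New rule: the constants must agree on *undef-ness*, not merely on
    // the zero bit pattern that undef happens to be written out as.
    if ((Existing[I] == Elt::Undef) != (New[I] == Elt::Undef))
      return false;
    // Existing rule: non-undef elements must match exactly.
    if (Existing[I] != Elt::Undef && Existing[I] != New[I])
      return false;
  }
  return true;
}
```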
Fixes #63108
Differential Revision: https://reviews.llvm.org/D152357
Annotation metadata supports adding individual annotation strings to the annotation block. This patch adds the ability to insert a tuple of strings into the metadata array.
The idea is that each tuple of strings represents a piece of related information. This makes it easier to parse related metadata, since it is all contained in one tuple.
For example, in remarks, any pass that implements annotation remarks can have different types of remarks and pass additional information for each.
The original behaviour of annotation remarks is preserved, and tuple annotations and single annotations can be mixed for the same instruction.
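A sketch of what attaching mixed annotations might look like through the C++ API (string contents are illustrative):
```
#include "llvm/IR/Instruction.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Metadata.h"
using namespace llvm;

void annotate(Instruction &I) {
  LLVMContext &Ctx = I.getContext();
  // A plain single-string annotation (the pre-existing form).
  Metadata *Single = MDString::get(Ctx, "auto-init");
  // A tuple of related strings (the form added by this patch).
  Metadata *Tuple = MDTuple::get(
      Ctx, {MDString::get(Ctx, "remark-kind"), MDString::get(Ctx, "detail")});
  // Both can live in the same !annotation list on one instruction.
  I.setMetadata(LLVMContext::MD_annotation, MDTuple::get(Ctx, {Single, Tuple}));
}
```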
Reviewed By: paquette
Differential Revision: https://reviews.llvm.org/D148328
This is a helper function to very slightly simplify many calls to
MachineInstr::getOperandNo.
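As an illustration (hypothetical call-site), `unsigned Idx = MI.getOperandNo(&MO);` simplifies to `unsigned Idx = MO.getOperandNo();`, assuming the helper is added as `MachineOperand::getOperandNo`.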
Differential Revision: https://reviews.llvm.org/D143250
Computing EH-related information was only relevant for analysis passes so far. Lifting it to IR will allow the IR Verifier to calculate EH funclet coloring and validate funclet operand bundles in a follow-up step.
Reviewed By: rnk, compnerd
Differential Revision: https://reviews.llvm.org/D138122
Add a flag state (and a MIR key) to MachineFunctions indicating whether they
contain instruction-referencing debug-info or not. Whether DBG_VALUEs or
DBG_INSTR_REFs are used needs to be determined by LiveDebugValues at least, and
using the current optimisation level as a proxy is proving unreliable.
Test updates are purely adding the flag to tests; in a couple of cases this
involves separating out VarLocBasedLDV/InstrRefBasedLDV tests into separate
files, as they can no longer share the same input.
Differential Revision: https://reviews.llvm.org/D141387
Let Propeller use specialized IDs for basic blocks, instead of MBB numbers.
This allows optimizations not just prior to the asm printer, but throughout the entire codegen pipeline.
This patch only implements the functionality under the new `LLVM_BB_ADDR_MAP` version, but the old version is still being used. A later patch will change the used version.
#### Background
Today Propeller uses machine basic block (MBB) numbers, which already exist, to map native assembly to machine IR. This is done as follows.
- Basic block addresses are captured and dumped into the `LLVM_BB_ADDR_MAP` section just before the AsmPrinter pass which writes out object files. This ensures that we have a mapping that is close to assembly.
- Profiling mapping works by taking a virtual address of an instruction and looking up the `LLVM_BB_ADDR_MAP` section to find the MBB number it corresponds to.
- While this works well today, we need to do better when we scale Propeller to target other Machine IR optimizations like spill code optimization. Register allocation happens earlier in the Machine IR pipeline and we need an annotation mechanism that is valid at that point.
- The current scheme will not work in this scenario because the MBB number of a particular basic block is not fixed and changes over the course of codegen (via renumbering, adding, and removing the basic blocks).
- In other words, the volatile MBB numbers do not provide a one-to-one correspondence throughout the lifetime of Machine IR. Profile annotation using MBB numbers is restricted to a fixed point; only valid at the exact point where it was dumped.
- Further, the object file can only be dumped before AsmPrinter and cannot be dumped at an arbitrary point in the Machine IR pass pipeline. Hence, MBB numbers are not suitable and we need something else.
#### Solution
We propose using fixed unique incremental MBB IDs for basic blocks instead of volatile MBB numbers. These IDs are assigned upon the creation of machine basic blocks. We modify `MachineFunction::CreateMachineBasicBlock` to assign the fixed ID to every newly created basic block. It assigns `MachineFunction::NextMBBID` to the MBB ID and then increments it, which ensures having unique IDs.
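A self-contained sketch of the ID assignment (simplified stand-ins for the real classes):
```
#include <memory>
#include <vector>

struct MachineBasicBlock {
  unsigned BBID;   // fixed for the block's lifetime
  unsigned Number; // volatile: changes with RenumberBlocks()
};

struct MachineFunction {
  unsigned NextMBBID = 0;
  std::vector<std::unique_ptr<MachineBasicBlock>> Blocks;

  MachineBasicBlock *CreateMachineBasicBlock() {
    auto MBB = std::make_unique<MachineBasicBlock>();
    MBB->BBID = NextMBBID++;                  // never reused or renumbered
    MBB->Number = (unsigned)Blocks.size();    // may change later
    Blocks.push_back(std::move(MBB));
    return Blocks.back().get();
  }
};
```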
To ensure correct profile attribution, multiple equivalent compilations must generate the same Propeller IDs. This is guaranteed as long as the MachineFunction passes run in the same order. Since the `NextMBBID` variable is scoped to `MachineFunction`, interleaving of codegen for different functions won't cause any inconsistencies.
The new encoding is generated under the new version number 2 and we keep backward-compatibility with older versions.
#### Impact on Size of the `LLVM_BB_ADDR_MAP` Section
Emitting the Propeller ID results in a 23% increase in the size of the `LLVM_BB_ADDR_MAP` section for the clang binary.
Reviewed By: tmsriram
Differential Revision: https://reviews.llvm.org/D100808
It was only ever used there, already. The previous location seems
left-over from when the personality function was specified on a
per-landingpad basis, instead of per-function.
This patch modifies SelectionDAG and FastISel to produce DBG_INSTR_REFs with
variadic expressions, and produce DBG_INSTR_REFs for debug values with variadic
location expressions. The former essentially means just prepending
DW_OP_LLVM_arg, 0 to the existing expression. The latter is achieved in
MachineFunction::finalizeDebugInstrRefs and InstrEmitter::EmitDbgInstrRef.
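For example (illustrative operands), `!DIExpression(DW_OP_plus_uconst, 8)` becomes `!DIExpression(DW_OP_LLVM_arg, 0, DW_OP_plus_uconst, 8)`.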
Reviewed By: jmorse, Orlando
Differential Revision: https://reviews.llvm.org/D133929
Prior to this patch, variadic DIExpressions (i.e. ones that contain
DW_OP_LLVM_arg) could only be created by salvaging debug values to create
stack value expressions, resulting in a DBG_VALUE_LIST being created. As of
the previous patch in this patch stack, DBG_INSTR_REF's syntax has been
changed to match DBG_VALUE_LIST in preparation for supporting variadic
expressions. This patch adds some minor changes needed to allow variadic
expressions that aren't stack values to exist, and allows variadic expressions
that are trivially reduceable to non-variadic expressions to be handled
similarly to non-variadic expressions.
Reviewed by: jmorse
Differential Revision: https://reviews.llvm.org/D133926
This change removes the `tidyLandingPads` function, which previously
had a few responsibilities:
1. Dealing with the deletion of an invoke, after MachineFunction lowering.
2. Dealing with the deletion of a landing pad BB, after MachineFunction lowering.
3. Cleaning up the type-id list generated by `MachineFunction::addLandingPad`.
Case 3 has been fixed in the generator, and the others are now handled
during table emission.
This change also removes `MachineFunction`'s `addCatchTypeInfo`,
`addFilterTypeInfo`, and `addCleanup` helper fns, as they had a single
caller, and being outlined didn't make it simpler.
Finally, as calling `tidyLandingPads` was effectively the only thing
`DwarfCFIExceptionBase` did, that class has been eliminated.