This is a conservative workaround for broken liveness tracking of
SUBREG_TO_REG to speculatively fix all targets. The current reported
failures are on X86 only, but this issue should appear for all targets
that use SUBREG_TO_REG. The next minimally correct refinement would be
to disallow only implicit defs.
The coalescer now introduces implicit-defs of the super register to
track the dependency on other subregisters. If we see such an implicit
operand, we cannot simply treat the subregister def as the result
operand in case downstream users depend on the implicitly defined
parts. Really target implementations should be considering the
implicit defs and trying to interpret them appropriately (maybe with
some generic helpers). The full implicit def could possibly be
reported as the move result, rather than the subregister def but that
requires additional work.
Hopefully fixes#64060 as well.
This needs to be applied to the release branch.
https://reviews.llvm.org/D156346
Currently coalescing with SUBREG_TO_REG introduces an invisible load
bearing undef. There is liveness for the super register not
represented in the MIR.
This is part 1 of a fix for regressions that appeared after
b7836d856206ec39509d42529f958c920368166b. The allocator started
recognizing undef-def subregister MOVs as copies. Since there was no
representation for the dependency on the high bits, different undef
segments of the super register ended up disconnected and downstream
users ended up observing different undefs than they did previously.
This does not yet fix the regression. The isCopyInstr handling needs
to start handling implicit-defs on any instruction.
I wanted to include an end to end IR test since the actual failure
only appeared with an interaction between the coalescer and the
allocator. It's a bit bigger than I'd like but I'm having a bit of
trouble reducing it to something which definitely shows a diff that's
meaningful.
The same problem likely exists everywhere trying to do anything with
SUBREG_TO_REG. I don't understand how this managed to be broken for so
long.
This needs to be applied to the release branch.
https://reviews.llvm.org/D156345
If this was coalescing a def of a subregister with a def of the super
register, it was introducing a redundant super-register def and
marking the subregister def as dead.
Resulting in something like:
dead $eax = MOVr0, implicit-def $rax, implicit-def $rax
Avoid this by checking if the new instruction already has the super
def, so we end up with this instead:
dead $eax = MOVr0, implicit-def $rax
The dead flag looks suspicious to me, seems like it's easy to buggily
interpret dead def of subreg and a non-dead def of an aliasing
register. It seems to be intentional though.
https://reviews.llvm.org/D156343
Permit an implicit-def of a virtual register when rematerializing if
it defines a super register of a subregister def. The
rematerialization pre-legality check should really have been checking
the implicit operands, but that should be fixed separately.
https://reviews.llvm.org/D156331
Not sure how to produce a test that demonstrates the problem
today. The coalescer would have to introduce a verifier caught SSA
violation, like multiple defs of a virtual register. I'm not sure what
would do that now, but an upcoming patch will.
https://reviews.llvm.org/D156271
- Add tests for computeOverflowFor*Sub functions
- extend the computeOverflowForSignedSub/computeOverflowForUnsignedSub
implementations with ConstantRange (#37109)
This avoids some redundant spills of subranges, and avoids a compile failure.
This greatly reduces the numbers of spills in a loop.
The main range is not informative when multiple instructions are needed to fully define
a register. A common scenario is a lowered reg_sequence where every subregister
is sequentially defined, but each def changes the main range's value number. If
we look at specific lanes at the use index, we can see the value is actually the
same.
In this testcase, there are a large number of materialized 64-bit constant defs
which are hoisted outside of the loop by MachineLICM. These are feeding REG_SEQUENCES,
which is not considered rematerializable inside the loop. After coalescing, the split
constant defs produce main ranges with an apparent phi def. There's no phi def if you look
at each individual subrange, and only half of the register is really redefined to a constant.
Fixes: SWDEV-380865
https://reviews.llvm.org/D147079
SplitKit creates questionably formed bundles of copies
when it needs to copy a subset of live lanes and can't do
it with a single subregister index. These are merely marked
as part of a bundle, and don't start with a BUNDLE instruction.
Queries for the slot index would give the first copy in the
bundle, and we need to inspect the operands of all the other
bundled copies.
Also fix and simplify detection of read lane subsets. This causes
some RISCV test regressions, but these look like accidentally beneficial
splits. I don't see a subrange based reason to perform these splits.
Avoids some really ugly regressions in a future patch.
https://reviews.llvm.org/D146859
https://reviews.llvm.org/D152789 added an `exit` op before each
`unreachable`. This means we never get to the `trap` instruction.
This change limits the insertion of `exit` instructions to the cases
where `unreachable` is not lowered to `trap`. Trap itself is changed to
be emitted as `trap; exit;` to convey to `ptxas` that it exits the CFG.
Replace some uses of `Type::getPointerTo` via 2 ways
* Remove entirely if it's only used to support an unnecessary bitcast
(remove the bitcast as well).
* Replace with `PointerType::get`/`PointerType::getUnqual`
NFC opaque pointer clean-up effort.
The goal in #66818 was to capture function entry counts, but those are not the same as the frequency of the entry (machine) basic block. This fixes that, and adds explicit profiles to the test.
We also increase the precision of `MachineBlockFrequencyInfo::getBlockFreqRelativeToEntryBlock` to double. Existing code uses it as float so should be unaffected.
Need to add NumSrcElts param to is..Mask functions in
ShuffleVectorInstruction class for better mask analysis. Mask.size() not
always matches the sizes of the permuted vector(s). Allows to better
estimate the cost in SLP and fix uses of the functions in other cases.
Differential Revision: https://reviews.llvm.org/D158449
Following on from D135150, this patch fixes another crash caused by this
DAG combine:
fadd (fma A, B, (fmul C, D)), E --> fma A, B, (fma C, D, E)
The combine calls ReplaceAllUsesOfValueWith to replace (fmul C, D) with
(fma C, D, E). This can cause nodes to get CSEd. In D135150 the problem
was that the (fma C, D, E) node got CSEd away. In this new case, the
problem is that the outer fadd node gets CSEd away. To fix it we have
to return SDValue(N, 0) from the combine and be careful not to add a
deleted node to the worklist.
This caused asserts:
llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp:2331:
virtual void llvm::DwarfDebug::endFunctionImpl(const llvm::MachineFunction *):
Assertion `LScopes.getAbstractScopesList().size() == NumAbstractSubprograms &&
"getOrCreateAbstractScope() inserted an abstract subprogram scope"' failed.
See comment on the code review for reproducer.
> RFC https://discourse.llvm.org/t/rfc-dwarfdebug-fix-and-improve-handling-imported-entities-types-and-static-local-in-subprogram-and-lexical-block-scopes/68544
>
> Similar to imported declarations, the patch tracks function-local types in
> DISubprogram's 'retainedNodes' field. DwarfDebug is adjusted in accordance with
> the aforementioned metadata change and provided a support of function-local
> types scoped within a lexical block.
>
> The patch assumes that DICompileUnit's 'enums field' no longer tracks local
> types and DwarfDebug would assert if any locally-scoped types get placed there.
>
> Reviewed By: jmmartinez
>
> Differential Revision: https://reviews.llvm.org/D144006
This reverts commit f8aab289b5549086062588fba627b0e4d3a5ab15.
Instead of ConstantExpr::getCast() with a fixed opcode, use the
corresponding getXYZ methods instead. For the one place creating
a pointer bitcast drop it entirely, as this is redundant with
opaque pointers.
There were a couple of issues with maintaining register def/uses held
in `MachineRegisterInfo`:
* when an operand is changed from one register to another, the
corresponding instruction must already be inserted into the function,
or MRI won't be updated
* when traversing the set of all uses of a register, that set must not
change
The operators are defined in DwarfDebug.cpp but are
referenced in the struct definitions of FrameIndexExpr and
EntryValueInfo in DwarfDebug.h, and since they weren't
declared before, gcc warned with
[694/5646] Building CXX object lib/CodeGen/AsmPrinter/CMakeFiles/LLVMAsmPrinter.dir/DwarfDebug.cpp.o
../lib/CodeGen/AsmPrinter/DwarfDebug.cpp:273:6: warning: 'bool llvm::operator<(const llvm::FrameIndexExpr&, const llvm::FrameIndexExpr&)' has not been declared within 'llvm'
273 | bool llvm::operator<(const FrameIndexExpr &LHS, const FrameIndexExpr &RHS) {
| ^~~~
In file included from ../lib/CodeGen/AsmPrinter/DwarfDebug.cpp:13:
../lib/CodeGen/AsmPrinter/DwarfDebug.h:112:15: note: only here as a 'friend'
112 | friend bool operator<(const FrameIndexExpr &LHS, const FrameIndexExpr &RHS);
| ^~~~~~~~
../lib/CodeGen/AsmPrinter/DwarfDebug.cpp:278:6: warning: 'bool llvm::operator<(const llvm::EntryValueInfo&, const llvm::EntryValueInfo&)' has not been declared within 'llvm'
278 | bool llvm::operator<(const EntryValueInfo &LHS, const EntryValueInfo &RHS) {
| ^~~~
In file included from ../lib/CodeGen/AsmPrinter/DwarfDebug.cpp:13:
../lib/CodeGen/AsmPrinter/DwarfDebug.h:121:15: note: only here as a 'friend'
121 | friend bool operator<(const EntryValueInfo &LHS, const EntryValueInfo &RHS);
| ^~~~~~~~
The legalizer currently generates lots of G_AND artifacts.
For example between boolean uses and defs there is always a G_AND with a mask of 1, but when the target uses ZeroOrOneBooleanContents, this is unnecessary.
Currently these artifacts have to be removed using post-legalize combines.
Omitting these artifacts at their source in the artifact combiner has a few advantages:
- We know that the emitted G_AND is very likely to be useless, so our KnownBits call is likely worth it.
- The G_AND and G_CONSTANT can interrupt e.g. G_UADDE/... sequences generated during legalization of wide adds which makes it harder to detect these sequences in the instruction selector (e.g. useful to prevent unnecessary reloading of AArch64 NZCV register).
- This cleans up a lot of legalizer output and even improves compilation-times.
AArch64 CTMark geomean: `O0` -5.6% size..text; `O0` and `O3` ~-0.9% compilation-time (instruction count).
Since this introduces KnownBits into code-paths used by `O0`, I reduced the default recursion depth.
This doesn't seem to make a difference in CTMark, but should prevent excessive recursive calls in the worst case.
Reviewed By: aemerson
Differential Revision: https://reviews.llvm.org/D159140
Need to add NumSrcElts param to is..Mask functions in
ShuffleVectorInstruction class for better mask analysis. Mask.size() not
always matches the sizes of the permuted vector(s). Allows to better
estimate the cost in SLP and fix uses of the functions in other cases.
Differential Revision: https://reviews.llvm.org/D158449
Splitting up patches for #20571. I found these comments generally useful
to add and not predicated on those changes. Hopefully they help future
travelers.
When hosting a loop invariant instruction the resulting register must be
live in
all the basic blocks of the loop body and the killed flags of the
register must
be cleared.
Before this patch killed flags of subregister to a hoisted superregister
was not
cleared in the loop body.
This was found in an out of tree target, but the testcase
mlicm-stack-write-check.mir was modified to trigger the case.
The `S_INLINEES` debug symbol is used to record all the functions that
are directly inlined within the current function (nested inlining is
ignored).
This change implements support for emitting the `S_INLINEES` debug
symbol in LLVM, and cleans up how the `S_INLINEES` and `S_CALLEES` debug
symbols are dumped.
This patch is extracted from D96035, it adds support for the accelerator
tables to the DWARFLinkerParallel functionality.
Differential Revision: https://reviews.llvm.org/D154793
In SUnit::removePred, N->WeakSuccsLeft is reduced but WeakSuccsLeft is checked.
Reviewed By: kerbowa
Differential Revision: https://reviews.llvm.org/D151311
I accidentally introduced this in
commit 330fa7d2a4e0 ("[TargetLowering] Deduplicate choosing InlineAsm
constraint between ISels (#67057)")
Fix forward.
Split a register generated from another split usually doesn't bring us
too much benefit. It may also cause dead loop as pr67188 shows if the
heuristic cost always satisfy the split condition. So prevent such
splitting.
It fixed pr67188.
It is a re-commit from reverted commit 3454cf67bd0a650097dc6ca99874a34e1d59b500.
Following discussion on https://reviews.llvm.org/D154205, make MachineLICM pass
handle subloops with only visiting outermost loop's blocks once.
Differential Revision: https://reviews.llvm.org/D154205
After 330fa7d2a4e0cfbb4b078 we were seeing nondeterministic failures of
llvm/test/CodeGen/ARM/thumb-big-stack.ll, with different code being
generated in different runs.
Switching sort -> stable_sort fixes this.
It looks like the old algorithm picked the first best option, and using
stable_sort restores that behavior.
There isn't a test for this yet since the combines aren't used atm, but it will
be tested as part of a future commit. I'm just making this a separate change
tidyness reasons.
Given a list of constraints for InlineAsm (ex. "imr") I'm looking to
modify the order in which they are chosen. Before doing so, I noticed a
fair
amount of logic is duplicated between SelectionDAGISel and GlobalISel
for this.
That is because SelectionDAGISel is also trying to lower immediates
during selection. If we detangle these concerns into:
1. choose the preferred constraint
2. attempt to lower that constraint
Then we can slide down the list of constraints until we find one that
can be lowered. That allows the implementation to be shared between
instruction selection frameworks.
This makes it so that later I might only need to adjust the priority of
constraints in one place, and have both selectors behave the same.
This patch adds a new code transformation to the `MachineSink` pass,
that tries to sink copies of an instruction, when the copies can be folded
into the addressing modes of load/store instructions, or
replace another instruction (currently, copies into a hard register).
The criteria for performing the transformation is that:
* the register pressure at the sink destination block must not
exceed the register pressure limits
* the latency and throughput of the load/store or the copy must not deteriorate
* the original instruction must be deleted
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D152828
This is part of the work to address the D155472 regressions, there's a number of issues with generalizing this fold which is why I'm just adding SRL support atm.
Differential Revision: https://reviews.llvm.org/D159533
Previously, we where taking `CurVT` before finalizing `ToCast` which
meant potentially returning an `SDValue` with an illegal `ValueType`
for the operation.
Fix is to just take `CurVT` after we have finalized `ToCast` with
`PeekThroughCastsAndTrunc`.