InitUndef pass need replace the implicit def with Undef pseudo, but
current remove method will make noreg2implicit borken.
This patch postpone the removal until all basicblock be processed.
This uses the recently introduced sink-and-fold support in MachineSink.
https://reviews.llvm.org/D152828
This enables folding ADDI into load/store addresses.
Enabling by default will be a separate PR.
In a recent series of refactorings (described here: https://discourse.llvm.org/t/riscv-transition-in-vector-pseudo-structure-policy-variants/71295), I greatly increased the number of IMPLICIT_DEF operands to our vector instructions. This has turned out to have an unexpected negative impact because MachineCSE does not CSE IMPLICIT_DEFs, and thus does not CSE any instruction with an IMPLICIT_DEF operand. SelectionDAG *does* CSE the same case, but that only covers the same block case, not the cross block case. This lead to the performance regression reported in https://github.com/llvm/llvm-project/issues/64282.
This change is a slightly ugly hack to side step the issue. Instead of fixing the root cause (lack of CSE for IMPLICIT_DEF) or undoing the operand changes, we leave the extra operand in place, and use NoReg in place of IMPLICIT_DEF. I then convert back to IMPLICIT_DEF just before register allocation so that ProcessImplicitDefs and TwoAddressInstructions can do the normal transforms to Undef tied registers.
We may end up backporting this into the 17.x release branch. Given how late in the release cycle this is landing, that's much less likely now, but still a possibility.
Differential Revision: https://reviews.llvm.org/D156909
`MachineInstr.h` is a commonly included file and this includes
`llvm/ADT/SmallSet.h` for one function `getUsedDebugRegs()`, which is
used only in one place.
According to `ClangBuildAnalyzer` (run solely on building LLVM, no other
projects) the second most expensive template to instantiate is the
`SmallSet::insert` method used in the `inline` implementation in
`getUsedDebugRegs()`:
```
**** Templates that took longest to instantiate:
554239 ms: std::unordered_map<int, int> (2826 times, avg 196 ms)
521187 ms: llvm::SmallSet<llvm::Register, 4>::insert (930 times, avg 560
ms)
...
```
By removing this method and putting its implementation in the one call
site we greatly reduce the template instantiation time and reduce the
number of includes.
When copying the implementation, I removed a check on `MO.getReg()` as
this is checked within `MO.isVirtual()`.
Differential Revision: https://reviews.llvm.org/D157720
The purpose of this code is to restrict overlap between source and destination registers. The tied input register is conceptually part of the destination. I can't see any reason why we need to prevent a partial undef tied source here, and skipping it reduces register pressure slightly.
Differential Revision: https://reviews.llvm.org/D156709
Reapplying after revert due to sanitizer failure. Includes fix to avoid querying dead lanes for vreg introduced by previous transform.
The code was written with the implicit assumption that each IMPLICIT_DEF either a) the tied operand, or b) an untied source, but not both. This is true right now, but an upcoming change may allow CSE of IMPLICIT_DEFs in some cases, so let's rewrite the code to handle that possibility.
I added an MIR case which demonstrates the multiple use IMPLICIT_DEF. To my knowledge, this is not a reachable configuration from IR right now.
As an aside, this makes the structure a much closer match with the sub-reg liveness case, and we can probably just merge these routines. (Future work.)
Differential Revision: https://reviews.llvm.org/D156477
The code was written with the implicit assumption that each IMPLICIT_DEF either a) the tied operand, or b) an untied source, but not both. This is true right now, but an upcoming change may allow CSE of IMPLICIT_DEFs in some cases, so let's rewrite the code to handle that possibility.
I added an MIR case which demonstrates the multiple use IMPLICIT_DEF. To my knowledge, this is not a reachable configuration from IR right now.
As an aside, this makes the structure a much closer match with the sub-reg liveness case, and we can probably just merge these routines. (Future work.)
Differential Revision: https://reviews.llvm.org/D156477
RISC-V vector instruction has register overlapping constraint for certain
instructions, and will cause illegal instruction trap if violated, we use
early clobber to model this constraint, but it can't prevent register allocator
allocated same or overlapped if the input register is undef value, so convert
IMPLICIT_DEF to temporary pseudo could prevent that happen, it's not best way
to resolve this. Ideally we should model the constraint right, but before we
model the constraint right, it's the approach to prevent that happen.
See also: https://github.com/llvm/llvm-project/issues/50157
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D129735
This reverts commit f1c4241fb6e50c507adafbe14faf82a755ab92ca.
It causes use-after-poison asan failures for
CodeGen/RISCV/rvv/undef-earlyclobber-chain.ll and CodeGen/RISCV/regalloc-last-chance-recoloring-failure.ll
RISC-V vector instruction has register overlapping constraint for certain
instructions, and will cause illegal instruction trap if violated, we use
early clobber to model this constraint, but it can't prevent register allocator
allocated same or overlapped if the input register is undef value, so convert
IMPLICIT_DEF to temporary pseudo could prevent that happen, it's not best way
to resolve this. Ideally we should model the constraint right, but before we
model the constraint right, it's the approach to prevent that happen.
See also: https://github.com/llvm/llvm-project/issues/50157
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D129735