This PR instruments the optimization passes in the SystemZ backend with
calls to `MachineFunction::substituteDebugValuesForInst` where
instruction substitutions are made to instructions that may compute
tracked values.
Tests are also added for each of the substitutions that were inserted.
Details on the individual passes follow.
### systemz-copy-physregs
When a copy targets an access register, we redirect the copy via an
auxiliary register. This leads to the final result being written by a
newly inserted SAR instruction, rather than the original MI, so we need
to update the debug value tracking to account for this.
### systemz-long-branch
This pass relaxes relative branch instructions based on the actual
locations of blocks. Only one of the branch instructions qualifies for
debug value tracking: BRCT, i.e. branch-relative-on-count, which
subtracts 1 from a register and branches if the result is not zero. This
is relaxed into an add-immediate and a conditional branch, so any
`debug-instr-number` present must move to the add-immediate instruction.
### systemz-post-rewrite
This pass replaces `LOCRMux` and `SELRMux` pseudoinstructions with
either the real versions of those instructions, or with branching
programs that implement the intent of the Pseudo. In all these cases,
any `debug-instr-number` attached to the pseudo needs to be reallocated
to the appropriate instruction in the result, either LOCR, SELR, or a
COPY.
### systemz-elim-compare
Similar to systemz-long-branch, for this pass, only few substitutions
are necessary, since it mainly deals with conditional branch
instructions. The only exceptiona are again branch-relative-on-count, as
it modifies a counter as part of the instruction, as well as any of the
load instructions that are affected.
It seems that there can be other cases with this that also can lead to
wrong code (discovered with csmith). This time it involved not the kill
flag but the undef flag.
Use the intersection of the flags from both MachineOperand:s instead
of the RegState from just one of them.
If both operands of a SELRMux use the same register which is killed, and
the SELRMux is expanded to a jump sequence, a broken MIR results if the
kill flag is not removed.
This patch replaces the SELRMux with a COPY in these cases.
This reverts commit 7f230feeeac8a67b335f52bd2e900a05c6098f20.
Breaks CodeGenCUDA/link-device-bitcode.cu in check-clang,
and many LLVM tests, see comments on https://reviews.llvm.org/D121169
SystemZExpandPseudo:s only job was to expand LOCRMux instructions into jump
sequences. This needs to be done if expandLOCRPseudo() or expandSELRPseudo()
fails to find a legal opcode (all registers "high" or "low"). This task has
now been moved to SystemZPostRewrite while removing the SystemZExpandPseudo
pass.
It is in fact preferred to expand these pseudos directly after register
allocation in SystemZPostRewrite since the hinted register combinations are
then not subject to later optimizations.
Review: Ulrich Weigand
https://reviews.llvm.org/D67432
llvm-svn: 371959
Summary:
This clang-tidy check is looking for unsigned integer variables whose initializer
starts with an implicit cast from llvm::Register and changes the type of the
variable to llvm::Register (dropping the llvm:: where possible).
Partial reverts in:
X86FrameLowering.cpp - Some functions return unsigned and arguably should be MCRegister
X86FixupLEAs.cpp - Some functions return unsigned and arguably should be MCRegister
X86FrameLowering.cpp - Some functions return unsigned and arguably should be MCRegister
HexagonBitSimplify.cpp - Function takes BitTracker::RegisterRef which appears to be unsigned&
MachineVerifier.cpp - Ambiguous operator==() given MCRegister and const Register
PPCFastISel.cpp - No Register::operator-=()
PeepholeOptimizer.cpp - TargetInstrInfo::optimizeLoadInstr() takes an unsigned&
MachineTraceMetrics.cpp - MachineTraceMetrics lacks a suitable constructor
Manual fixups in:
ARMFastISel.cpp - ARMEmitLoad() now takes a Register& instead of unsigned&
HexagonSplitDouble.cpp - Ternary operator was ambiguous between unsigned/Register
HexagonConstExtenders.cpp - Has a local class named Register, used llvm::Register instead of Register.
PPCFastISel.cpp - PPCEmitLoad() now takes a Register& instead of unsigned&
Depends on D65919
Reviewers: arsenm, bogner, craig.topper, RKSimon
Reviewed By: arsenm
Subscribers: RKSimon, craig.topper, lenary, aemerson, wuzish, jholewinski, MatzeB, qcolombet, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, wdng, nhaehnle, sbc100, jgravelle-google, kristof.beyls, hiraditya, aheejin, kbarton, fedor.sergeev, javed.absar, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, tpr, PkmX, jocewei, jsji, Petar.Avramovic, asbirlea, Jim, s.egerton, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65962
llvm-svn: 369041
This patch aims to reduce spilling and register moves by using the 3-address
versions of instructions per default instead of the 2-address equivalent
ones. It seems that both spilling and register moves are improved noticeably
generally.
Regalloc hints are passed to increase conversions to 2-address instructions
which are done in SystemZShortenInst.cpp (after regalloc).
Since the SystemZ reg/mem instructions are 2-address (dst and lhs regs are
the same), foldMemoryOperandImpl() can no longer trivially fold a spilled
source register since the reg/reg instruction is now 3-address. In order to
remedy this, new 3-address pseudo memory instructions are used to perform the
folding only when the dst and lhs virtual registers are known to be allocated
to the same physreg. In order to not let MachineCopyPropagation run and
change registers on these transformed instructions (making it 3-address), a
new target pass called SystemZPostRewrite.cpp is run just after
VirtRegRewriter, that immediately lowers the pseudo to a target instruction.
If it would have been possibe to insert a COPY instruction and change a
register operand (convert to 2-address) in foldMemoryOperandImpl() while
trusting that the caller (e.g. InlineSpiller) would update/repair the
involved LiveIntervals, the solution involving pseudo instructions would not
have been needed. This is perhaps a potential improvement (see Phabricator
post).
Common code changes:
* A new hook TargetPassConfig::addPostRewrite() is utilized to be able to run a
target pass immediately before MachineCopyPropagation.
* VirtRegMap is passed as an argument to foldMemoryOperand().
Review: Ulrich Weigand, Quentin Colombet
https://reviews.llvm.org/D60888
llvm-svn: 362868