SIInstrInfo::commuteInstructionImpl should accept indices to commute in
either order. This simplifies SIFoldOperands::tryAddToFoldList where
OtherIdx, CommuteIdx0 and CommuteIdx1 are no longer needed.
Track the live register state immediately before, instead of after,
MBBI. This makes it simple to track the state at the start or end of a
basic block without a separate (and poorly named) Tracking flag.
This changes the API of the backward(MachineBasicBlock::iterator I)
method, which now recedes to the state just before, instead of just
after, *I. Some clients are simplified by this change.
There is one small functional change shown in the lit tests where
multiple spilled registers all need to be reloaded before the same
instruction. The reloads will now be inserted in the opposite order.
This should not affect correctness.
moveToVALU previously only handled S_CSELECT_B64 in the trivial case
where it was semantically equivalent to a copy. Implement the general
case using V_CNDMASK_B64_PSEUDO and implement post-RA expansion of
V_CNDMASK_B64_PSEUDO with immediate as well as register operands.
When SI_PC_ADD_REL_OFFSET is expanded to S_GETPC/S_ADD/S_ADDC, the
GlobalAddress operands have to be adjusted by 4 or 12 bytes to account
for the offset from the end of the S_GETPC instruction to the literal
operands. Do this all in SIInstrInfo::expandPostRAPseudo instead of
duplicating the adjustment code in both AMDGPULegalizerInfo and
SITargetLowering. NFCI.
Add intrinsic for s_bitreplicate. Lower to S_BITREPLICATE_B64_B32
machine instruction in both GISel and Selection DAG.
Support VGPR arguments by inserting a `v_readfirstlane`.
We can use sign extended 64-bit literals, but only for signed operands.
At the moment we do not know if an operand is signed. Such operand will
be encoded as its low 32 bits and then either correctly sign extended or
incorrectly zero extended by HW.
The insertion point determined by RA while attempting spills and liverange
split at the beginning of a block goes wrong at times, and the newly
inserted vector instructions are placed before the exec-mask restore
instruction which is wrong. It occurs mainly due to the dependency on
isBasicBlockPrologue that doesn't account early inserted instructions
(spills and splits) during RA and causes the block prolog break.
A better approach for deciding the insertion point should be worked out.
For now, improving the helper function to consider all possible early
insertions. This patch includes the spill instructions. The copies
associated with liverange split should also be included in the block
prolog.
For a future patch, is it important to keep the lowered SGPR
spills to be recognized as spill instructions during regalloc.
Directly lowering them into V_WRITELANE/V_READLANE won't allow
us to attach the SPILL flag to their instructions.
This patch introduces the pseudo instructions with the SGPRSpill
flag set in their Desc. They will get lowered to equivalent
instructions later during post RA pseudo expansion.
This helped to catch a codegen failure caused by #69703. MachineVerifier
did not complain about this malformed COPY either before regalloc:
%9:vreg_64 = COPY %17:vgpr_32
Or after regalloc:
renamable $vgpr0_vgpr1 = COPY renamable $vgpr2, implicit $exec
But we can at least catch the problem when copyPhysReg tries to expand
it into 32-bit register moves and fails to find suitable source
registers:
$vgpr0 = V_MOV_B32_e32 $noreg, implicit $exec, implicit-def $vgpr0_vgpr1, implicit $vgpr2
$vgpr1 = V_MOV_B32_e32 $noreg, implicit $exec, implicit $vgpr2, implicit $exec
Extend the list of instructions that can be rematerialized in
SIInstrInfo::isReallyTriviallyReMaterializable() to support scalar
loads.
Try shrinking instructions to remat only the part needed for current
context. Add SIInstrInfo::reMaterialize target hook, and handle
shrinking of S_LOAD_DWORDX16_IMM to S_LOAD_DWORDX8_IMM as a proof of
concept.
Waterfall loop is overwriting SCC bit of status register. Make sure SCC
bit is saved and restored across.
We need to save/restore only in cases where SCC is live across waterfall
loop.
Co-authored-by: Sirish Pande <sirish.pande@amd.com>
Temporal divergence that was present in input or introduced in IR
transforms, like code-sinking or LICM, is handled in SIFixSGPRCopies
by changing sgpr source instr to vgpr instr.
After 5b657f5, that moved LICM after AMDGPUCodeGenPrepare,
machine-sinking can introduce temporal divergence by sinking
instructions outside of the cycle.
Add isSafeToSink callback in TargetInstrInfo.
V_FMAAK_F16_t16 takes VGPR_32_Lo128 operands whereas the original
instructions would have VGPR_32 operands. Switching the opcodes without
updating operands' register classes leads to MachineVerifier complaining
about the classes not matching instruction definitions. The problem only
reveals itself of builds with expensive checks enabled because of
missing -verify-machineinstrs in the test.
This is the third attempt to update CodeGen/AMDGPU/fma.f16.ll to run for
GFX11, following the second attempt in a1e38e0b8e3e, partially reverted
in eaf737a4e004.
Scratch instructions are always in addrspace(5), which can only alias
with flat (and itself). SMEM and buffer instructions can never reference
those address spaces, so they are trivially disjoint.
Names '64BitDPP' and especially 'DPP64' were found misleading, and
DPP64 can easily be mixed with DPP16 and DPP8 while these are
different concepts. DPP16 and DPP8 refers to lanes where DPP64
refers to the operand size.
In fact the essential part here is that these instructions are
executed on the DP ALU, so rename the feature accordingly.
I have also found a bug in a check for these instructions, which is
fixed here and a common utility function is now used.
Differential Revision: https://reviews.llvm.org/D158465
The wwm register spill pseudos are currently defined for VGPR_32
regclass. It causes a verifier error for gfx908 or above as the
regalloc sometimes restores the values to the vector superclass AV_32.
Fixing it by supporting AV wwm-spill pseudos as well.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D155646
While they are represent 32/16 bit immediate values they are already
included in encoding of the instructions that use them and are not true
literals. FMAMK and FMAAK instructions that use them are marked with fixed
size so getInstSizeInBytes will not increase the size for these operands.
We also add tests whose logic relies on KIMM16 and KIMM32 being considered
not inlinable.
Differential Revision: https://reviews.llvm.org/D157624
Currently `isTriviallyReMaterializable` calls
`isReallyTriviallyReMaterializable` and
`isReallyTriviallyReMaterializableGeneric`. The two interfaces
are confusing, but there are also some real issues with this.
The documentation of this function (see below) suggests that
`isReallyTriviallyRematerializable` allows the target to override the
default behaviour.
/// For instructions with opcodes for which the M_REMATERIALIZABLE flag is
/// set, this hook lets the target specify whether the instruction is actually
/// trivially rematerializable, taking into consideration its operands.
It however implements something different. The default behaviour
is the analysis done in `isReallyTriviallyReMaterializableGeneric`,
which is testing if it is safe to rematerialize the MachineInstr.
The result of `isReallyTriviallyReMaterializable` is only considered if
`isReallyTriviallyReMaterializableGeneric` returns `false`. That means
there is no way to override the default behaviour if
`isReallyTriviallyReMaterializableGeneric` returns true (i.e. it is safe to
rematerialize, but we'd rather not).
By making this a single interface, we can override the interface to do either.
Reviewed By: craig.topper, nemanjai
Differential Revision: https://reviews.llvm.org/D156520
This reverts commit a496c8be6e638ae58bb45f13113dbe3a4b7b23fd.
The workaround in c26dfc81e254c78dc23579cf3d1336f77249e1f6 should work
around the underlying problem with SUBREG_TO_REG.
Some opcodes in generic MIR represent calls to intrinsics, where the intrinsic
ID is the first non-def operand to the instruction. These are now represented as
a subclass of GenericMachineInstr, and the method MachineInstr::getIntrinsicID()
is now moved to this subclass GIntrinsic.
Some target-defined instructions behave like GMIR intrinsics, and have an
Intrinsic::ID operand. But they should not be recognized as generic intrinsics,
and should not use GIntrinsic::getIntrinsicID(). Separated these out by
introducing a new AMDGPU::getIntrinsicID().
Reviewed By: arsenm, Pierre-vh
Differential Revision: https://reviews.llvm.org/D155556
This restores commit baa3386edb11a2f9bcadda8cf58d56f3707c39fa.
Originally reverted in d0f7850b01cf17e50a4f4b00e3b84dded94df6b8.
And dependent commits.
Details in D150388.
This reverts commit 825b7f0ca5f2211ec3c93139f98d1e24048c225c.
This reverts commit 7a98f084c4d121244ef7286bc6503b6a181d446e.
This reverts commit b4a62b1fa546312d882fa12dfdcd015177d66826.
This reverts commit b7836d856206ec39509d42529f958c920368166b.
No conflicts in the code, few tests had conflicts in autogenerated CHECKs:
llvm/test/CodeGen/Thumb2/mve-float32regloops.ll
llvm/test/CodeGen/AMDGPU/fix-frame-reg-in-custom-csr-spills.ll
Reviewed By: alexfh
Differential Revision: https://reviews.llvm.org/D156381
This reverts commit baa3386edb11a2f9bcadda8cf58d56f3707c39fa.
The changes did not cover all occurrences of the deteleted function
MachineInstr::getIntrinsicID().
Some opcodes in generic MIR represent calls to intrinsics, where the intrinsic
ID is the first non-def operand to the instruction. These are now represented as
a subclass of GenericMachineInstr, and the method MachineInstr::getIntrinsicID()
is now moved to this subclass GIntrinsic.
Some target-defined instructions behave like GMIR intrinsics, and have an
Intrinsic::ID operand. But they should not be recognized as generic intrinsics,
and should not use GIntrinsic::getIntrinsicID(). Separated these out by
introducing a new AMDGPU::getIntrinsicID().
Reviewed By: arsenm, Pierre-vh
Differential Revision: https://reviews.llvm.org/D155556
I need this for future patch in the MC, while TII is not available
in the llvm-mc. Besides this is not a first time I want it there.
Differential Revision: https://reviews.llvm.org/D155228
So far, we haven't exposed the allocation of whole-wave
registers to regalloc. We hand-picked them for various
whole wave mode operations. With a future patch, we
want the allocator to efficiently allocate them rather
than using the custom pre-allocation pass.
Any liverange split of virtual registers involved in
whole-wave operations require the resulting COPY
introduced with the split to be performed for all
lanes. It isn't implemented in the compiler yet.
This patch would identify all such copies and
manipulate the exec mask around them to enable all
lanes without affecting the value of exec mask
elsewhere.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D143762
To reduce the register pressure during allocation,
when the allocator spills a virtual register that
corresponds to a whole wave mode operation, the
spill loads and restores should be activated for
all lanes by temporarily flipping all bits in exec
register to one just before the spills. It is not
implemented in the compiler as of today and this
patch enables the necessary support.
This is a pre-patch before the SGPR spill to virtual
VGPR lanes that would eventually causes the whole
wave register spills during allocation.
Reviewed By: arsenm, cdevadas
Differential Revision: https://reviews.llvm.org/D143759
Branch relaxation requires 2 additional SGPRs for AMDGPU to handle the
case when an indirect branch target is too far away. The register
scavanger may not find available registers, which causes a “did not find
scavenging index” assert to occur in assignRegToScavengingIndex.
In this patch, we estimate before register allocation whether an
indirect branch is likely to be needed, and reserve 2 SGPRs if the
branch distance is found to be above a threshold. The distance threshold
is an approximation as the exact code size and branch distance are
unknown prior to register allocation.
Patch by Corbin Robeck. Thanks!
Differential Review: https://reviews.llvm.org/D149775