Accelerate finding the base class for a physical register by
building a statically mapping table from physical registers
to base classes using TableGen.
Replace uses of SIRegisterInfo::getPhysRegClass with
TargetRegisterInfo::getPhysRegBaseClass in order to use
the computed table.
Reviewed By: arsenm, foad
Differential Revision: https://reviews.llvm.org/D139422
Currently, the custom SGPR spill lowering pass spills
SGPRs into physical VGPR lanes and the remaining VGPRs
are used by regalloc for vector regclass allocation.
This imposes many restrictions that we ended up with
unsuccessful SGPR spilling when there won't be enough
VGPRs and we are forced to spill the leftover into
memory during PEI. The custom spill handling during PEI
has many edge cases and often breaks the compiler time
to time.
This patch implements spilling SGPRs into virtual VGPR
lanes. Since we now split the register allocation for
SGPRs and VGPRs, the virtual registers introduced for
the spill lanes would get allocated automatically in
the subsequent regalloc invocation for VGPRs.
Spill to virtual registers will always be successful,
even in the high-pressure situations, and hence it avoids
most of the edge cases during PEI. We are now left with
only the custom SGPR spills during PEI for special registers
like the frame pointer which isn an unproblematic case.
This patch also implements the whole wave spills which
might occur if RA spills any live range of virtual registers
involved in the whole wave operations. Earlier, we had
been hand-picking registers for such machine operands.
But now with SGPR spills into virtual VGPR lanes, we are
exposing them to the allocator.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D124196
Unlike the callee-saved VGPR spill instructions emitted by
`PEI::spillCalleeSavedRegs`, the CS VGPR spills inserted during
emitPrologue/emitEpilogue require the exec bits flipping to avoid
clobbering the inactive lanes of VGPRs used for SGPR spilling.
Currently, these spill instructions are referenced from the SP at
function entry and when the callee performs a stack realignment,
they ended up getting incorrect stack offsets. Even if we try to
adjust the offsets, the FP-SP becomes a runtime entity with dynamic
stack realignment and the offsets would still be inaccurate.
To fix it, use FP as the frame base in the spill instructions
whenever the function has FP. The offsets obtained for the CS
objects would always be the right values from FP.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D134949
In general, a callee is free to use a scratch register without
preserving its previous state. However, the VGPR used for SGPR
spilling can potentially have its inactive lanes overwritten by
the writelane instructions. When the function returns, it can
cause unexpected behavior if the VGPR value is not preserved
appropriately.
The current scheme to preserve the inactive lanes of such
scratch VGPRs is not done rightly. It preserves all lanes
and causes the outgoing values (if any) getting overwritten
by the epilog restores. It then corrupts the return value.
To avoid such situation with scratch VGPRs, this patch ensures
we preserve only their inactive lanes.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D134526
There is a lot of customization and eventually code duplication in the
frame lowering that handles special SGPR spills like the one needed for
the Frame Pointer. Incorporating any additional SGPR spill currently
makes it difficult during PEI. This patch introduces a new spill builder
to efficiently handle such spill requirements. Various spill methods are
special handled using a separate class.
Reviewed By: sebastian-ne, scott.linder
Differential Revision: https://reviews.llvm.org/D132436
SILowerSGPRSpills pass handles the lowering of SGPR spills
into VGPR lanes. Some SGPR spills are handled later during
PEI. There is a common function used in both places to find
the free VGPR lane. This patch eliminates that dependency to
find the free VGPR by handling it separately for PEI. It is a
prerequisite patch for a future work to allow SGPR spills to
virtual VGPR lanes during SILowerSGPRSpills.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D124195
We always assume the vector register is dead or killed while
inserting the VGPR spills in the prolog. It is not always
true. Used the entry block liveIn data while setting the flag.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D124194
The custom VGPR spills inserted during frame lowering
maintain a separate list for WWM reserved registers.
Added them into WWMSpills that already tracks such
reserved registers. It unifies the spill insertion.
Reviewed By: nhaehnle, arsenm
Differential Revision: https://reviews.llvm.org/D124193
Since the writelane instruction used for SGPR spills can
modify inactive lanes, the callee must preserve the VGPR
this instruction modifies even if it was marked Caller-saved.
Reviewed By: arsenm, nhaehnle
Differential Revision: https://reviews.llvm.org/D124192
Both SGPR->VGPR and VGPR->AGPR spilling code give a fixup to the
spill frame indices referred in debug instructions so that they
can be entirely removed. We should skip the stack argument debug
objects while looking inside the bitvector with FI as the index
that tracks the spill indices being processed. The stack args will
have negative indices and would crash while accessing the bitvector.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D137277
All the prologue instructions should have unknown source location
co-ordinates while the epilogue instructions should have source
location of last non-debug instruction after which epilogue
instructions are insrted.
This ensures the prologue/epilogue markers are generated correctly
in the line table.
Changes are brought in from the downstream CFI patches.
Reviewed By: scott.linder
Differential Revision: https://reviews.llvm.org/D131485
Based on available register budget, reserve highest available VGPR for
AGPR copy before RA. After RA, shift it to lowest unused VGPR if the one
exist.
Fixes SWDEV-330006.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D123525
There's no reason to create these immediately. They can be created in
the prolog/epilog code like CSR spills. There's probably a cleaner way
to do this by utilizing the CSR spill code.
This makes the frame index used transient state for
PrologEpilogInserter, and thus makes serialization easier. Really this
doesn't need to be saved here but there isn't really a better place
for it.
Currently the return address ABI registers s[30:31], which fall in the call
clobbered register range, are added as a live-in on the function entry to
preserve its value when we have calls so that it gets saved and restored
around the calls.
But the DWARF unwind information (CFI) needs to track where the return address
resides in a frame and the above approach makes it difficult to track the
return address when the CFI information is emitted during the frame lowering,
due to the involvment of understanding the control flow.
This patch moves the return address ABI registers s[30:31] into callee saved
registers range and stops adding live-in for return address registers, so that
the CFI machinery will know where the return address resides when CSR
save/restore happen during the frame lowering.
And doing the above poses an issue that now the return instruction uses undefined
register `sgpr30_sgpr31`. This is resolved by hiding the return address register
use by the return instruction through the `SI_RETURN` pseudo instruction, which
doesn't take any input operands, until the `SI_RETURN` pseudo gets lowered to the
`S_SETPC_B64_return` during the `expandPostRAPseudo()`.
As an added benefit, this patch simplifies overall return instruction handling.
Note: The AMDGPU CFI changes are there only in the downstream code and another
version of this patch will be posted for review for the downstream code.
Reviewed By: arsenm, ronlieb
Differential Revision: https://reviews.llvm.org/D114652
In a future change, we will sometimes use a VGPR offset for doing
spills to memory, in which case we need 2 free VGPRs to do the SGPR
spill. In most cases we could spill the VGPR along with the SGPR being
spilled, but we don't have any free lanes for SGPR_1024 in wave32 so
we could still potentially need a second scavenging slot.
These have negative / out of bounds frame index values and would
assert when trying to set the BitVector. Fixed stack objects can't be
colored away so ignore them.
After the split register allocation changes in eebe841a47cb it is no
longer necessary to reserve a VGPR before RA. This can also create bugs
when IPRA is enabled since we cannot predict that a called function may
not reserve any register if it does not have any SGPR spills. If that
happens those functions may override reserved registers that are
normally callee saved. Added a test to show this.
Fixes: SWDEV-309900
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D115551
Removing dead frame indices for VGPR to AGPR spills is incorrect
when the frame index is shared by multiple objects, which may
occur due to stack slot coloring. The problem is that subsequent
code that processes the other object will assert because the stack
frame index is marked dead.
Removing dead frame indices is needed prior to stack slot
coloring, which is what happens with SGPR to VGPR spills. These
spills are lowered prior to stack slot coloring, but the VGPR
to AGPR spills are processed afterwards during the Prolog/Epilog
Inserter pass. This patch marks the VGPR to AGPR spill slot as
dead if the slot is not used by another object.
Differential Revision: https://reviews.llvm.org/D115996
AMDGPU is unusual in that the both stack is indexed in the same
direction as stack growth (up). We therefore always need the emergency
stack slots placed as low as possible to ensure they are in range of
load/store instruction immediate offsets. The existing logic is mostly
OK, but failed if we required stack realignment.
I don't understand what the existing control isFPCloseToIncomingSP is
supposed to mean, but can only be used to stop placing the scavenge
slots earlier. Make this explicit so that targets can opt-in rather
than opt-out only.
A future change will add SCC liveness checks. Since we are still
relying on forward register scavenging, add dead flags to avoid
spuriously detecting SCC as live.
Even if wave offset is not present we still need to do the rest
of the initialization. The mov into s32 was missing in the
kernels.
Fixes: SWDEV-310935
Differential Revision: https://reviews.llvm.org/D113628
There is no real source location for code inside prologue as it is
generated by compiler but source locations are being added to code
inside prologue as a side effect of https://reviews.llvm.org/D99269
because buildSpillLoadStore() is using source location of the real
instruction in the basic block if any.
Fixes: SWDEV-307590
Reviewed By: scott.linder, sebastian-ne
Differential Revision: https://reviews.llvm.org/D113100
Based on the reasoning of D53903, register operands of DBG_VALUE are
invariably treated as RegState::Debug operands. This change enforces
this invariant as part of MachineInstr::addOperand so that all passes
emit this flag consistently.
RegState::Debug is inconsistently set on DBG_VALUE registers throughout
LLVM. This runs the risk of a filtering iterator like
MachineRegisterInfo::reg_nodbg_iterator to process these operands
erroneously when not parsed from MIR sources.
This issue was observed in the development of the llvm-mos fork which
adds a backend that relies on physical register operands much more than
existing targets. Physical RegUnit 0 has the same numeric encoding as
$noreg (indicating an undef for DBG_VALUE). Allowing debug operands into
the machine scheduler correlates $noreg with RegUnit 0 (i.e. a collision
of register numbers with different zero semantics). Eventually, this
causes an assert where DBG_VALUE instructions are prohibited from
participating in live register ranges.
Reviewed By: MatzeB, StephenTozer
Differential Revision: https://reviews.llvm.org/D110105
With spilling into AGPRs enabled we cannot reliably predict
if we need to save FP or not. We may finally spill everything
into AGPRs and never touch stack. In this case we still may
save FP. This is deficiency but not an error, so avoid the
assert.
Differential Revision: https://reviews.llvm.org/D107404
If no scratch or flat instructions are used, we do not need to
initialize the flat scratch hardware register.
Differential Revision: https://reviews.llvm.org/D105920
AMDGPU normally spills SGPRs to VGPRs. Previously, since all register
classes are handled at the same time, this was problematic. We don't
know ahead of time how many registers will be needed to be reserved to
handle the spilling. If no VGPRs were left for spilling, we would have
to try to spill to memory. If the spilled SGPRs were required for exec
mask manipulation, it is highly problematic because the lanes active
at the point of spill are not necessarily the same as at the restore
point.
Avoid this problem by fully allocating SGPRs in a separate regalloc
run from VGPRs. This way we know the exact number of VGPRs needed, and
can reserve them for a second run. This fixes the most serious
issues, but it is still possible using inline asm to make all VGPRs
unavailable. Start erroring in the case where we ever would require
memory for an SGPR spill.
This is implemented by giving each regalloc pass a callback which
reports if a register class should be handled or not. A few passes
need some small changes to deal with leftover virtual registers.
In the AMDGPU implementation, a new pass is introduced to take the
place of PrologEpilogInserter for SGPR spills emitted during the first
run.
One disadvantage of this is currently StackSlotColoring is no longer
used for SGPR spills. It would need to be run again, which will
require more work.
Error if the standard -regalloc option is used. Introduce new separate
-sgpr-regalloc and -vgpr-regalloc flags, so the two runs can be
controlled individually. PBQB is not currently supported, so this also
prevents using the unhandled allocator.
This allows to convert the add instruction to s_addk_i32 and
v_add_nc_u32 instead of needing v_add_co_u32 when converting to a VALU
instruction.
Differential Revision: https://reviews.llvm.org/D103322
This patch adds TargetStackID::WasmLocal. This stack holds locations of
values that are only addressable by name -- not via a pointer to memory.
For the WebAssembly target, these objects are lowered to WebAssembly
local variables, which are managed by the WebAssembly run-time and are
not addressable by linear memory.
For the WebAssembly target IR indicates that an AllocaInst should be put
on TargetStackID::WasmLocal by putting it in the non-integral address
space WASM_ADDRESS_SPACE_WASM_VAR, with value 1. SROA will mostly lift
these allocations to SSA locals, but any alloca that reaches instruction
selection (usually in non-optimized builds) will be assigned the new
TargetStackID there. Loads and stores to those values are transformed
to new WebAssemblyISD::LOCAL_GET / WebAssemblyISD::LOCAL_SET nodes,
which then lower to the type-specific LOCAL_GET_I32 etc instructions via
tablegen patterns.
Differential Revision: https://reviews.llvm.org/D101140
This patch adds TargetStackID::WasmLocal. This stack holds locations of
values that are only addressable by name -- not via a pointer to memory.
For the WebAssembly target, these objects are lowered to WebAssembly
local variables, which are managed by the WebAssembly run-time and are
not addressable by linear memory.
For the WebAssembly target IR indicates that an AllocaInst should be put
on TargetStackID::WasmLocal by putting it in the non-integral address
space WASM_ADDRESS_SPACE_WASM_VAR, with value 1. SROA will mostly lift
these allocations to SSA locals, but any alloca that reaches instruction
selection (usually in non-optimized builds) will be assigned the new
TargetStackID there. Loads and stores to those values are transformed
to new WebAssemblyISD::LOCAL_GET / WebAssemblyISD::LOCAL_SET nodes,
which then lower to the type-specific LOCAL_GET_I32 etc instructions via
tablegen patterns.
Differential Revision: https://reviews.llvm.org/D101140
Since ca5f07f8c4bc96d16ed1992b810aa3897df157f2 already reverted
the cause for this warning, this commit now causes warnings about
a default label in a switch that covers the enum.
This reverts commit cf2eeb114c59cfc3a80133e96c585188fa16cc98.
Retrying after revert and fix (removed implicit def flag from operand). Now
passes with expensive_checks enabled.
Since there is a single scratch resource descriptor for all shaders, if there is
a wave32 and a wave64 shader (for instance for VsFs pairs)
then the const_index_stride will be incorrect for wave32 shaders.
Differential Revision: https://reviews.llvm.org/D101830
Change-Id: Ie3b8b2921237968caca91527dd0c97b1b0cc0360
Since there is a single scratch resource descriptor for all shaders, if there is
a wave32 and a wave64 shader (for instance for VsFs pairs)
then the const_index_stride will be incorrect for wave32 shaders.
Differential Revision: https://reviews.llvm.org/D101830
Change-Id: Id8de5566b0d1a07a814e2e7db016df9d20bf6d2c
This allows calling buildSpillLoadStore for an empty basic block, where
MI points at the end of the block instead of to an instruction.
This only happens with downstream CFI changes, so I was not able to
create a testcase that works with upstream LLVM.
Differential Revision: https://reviews.llvm.org/D101356
The values of registers in inactive lanes needs to be saved during
function calls.
Save all registers used for whole wave mode, similar to how it is done
for VGPRs that are used for SGPR spilling.
Differential Revision: https://reviews.llvm.org/D99429
Reapply with fixed tests on window.
The values of registers in inactive lanes needs to be saved during
function calls.
Save all registers used for whole wave mode, similar to how it is done
for VGPRs that are used for SGPR spilling.
Differential Revision: https://reviews.llvm.org/D99429
Spilling the fp or bp to scratch could overwrite VGPRs of inactive
lanes. Fix that by using only the active lanes of the scavenged VGPR.
This builds on the assumptions that
1. a function is never called with exec=0
2. lanes do not die in a function, i.e. exec!=0 in the function epilog
3. no new lanes are active when exiting the function, i.e. exec in the
epilog is a subset of exec in the prolog.
Differential Revision: https://reviews.llvm.org/D96869