76 Commits

Author SHA1 Message Date
Elizaveta Noskova
bbde6be841
[llvm] Support multiple save/restore points in mir (#119357)
Currently mir supports only one save and one restore point
specification:

```
  savePoint:       '%bb.1'
  restorePoint:    '%bb.2'
```

This patch makes it possible to have multiple save and multiple
restore points in mir:

```
  savePoints:
    - point:           '%bb.1'
  restorePoints:
    - point:           '%bb.2'
```
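
As a rough illustration of the format change, the sketch below rewrites the legacy single-point keys into the list form shown above. It works on a plain Python dict standing in for the parsed frame info; the helper name `upgrade_points` is hypothetical and not part of LLVM.

```python
def upgrade_points(frame_info):
    """Return a copy of frame_info using the list-style save/restore keys."""
    upgraded = {k: v for k, v in frame_info.items()
                if k not in ("savePoint", "restorePoint")}
    # Each legacy scalar becomes a one-element list of {'point': ...} entries.
    for old_key, new_key in (("savePoint", "savePoints"),
                             ("restorePoint", "restorePoints")):
        if frame_info.get(old_key):
            upgraded[new_key] = [{"point": frame_info[old_key]}]
    return upgraded


legacy = {"savePoint": "%bb.1", "restorePoint": "%bb.2"}
print(upgrade_points(legacy))
```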

Shrink-Wrap points split Part 3.
RFC:
https://discourse.llvm.org/t/shrink-wrap-save-restore-points-splitting/83581

Part 1: https://github.com/llvm/llvm-project/pull/117862
Part 2: https://github.com/llvm/llvm-project/pull/119355
Part 4: https://github.com/llvm/llvm-project/pull/119358
Part 5: https://github.com/llvm/llvm-project/pull/119359
2025-08-12 16:34:29 +03:00
Matt Arsenault
44ff1ed16e
AMDGPU: Move getMaxNumVectorRegs into GCNSubtarget (NFC) (#150889)
Addresses a TODO
2025-07-28 17:25:20 +09:00
Matt Arsenault
72b77c193f
AMDGPU: Avoid contraction in wwm allocation failure message (#150888) 2025-07-28 17:11:28 +09:00
Rahul Joshi
52c2e45c11
[NFC][CodeGen] Adopt MachineFunctionProperties convenience accessors (#141101) 2025-05-23 08:30:29 -07:00
Jie Fu
a4d1a9d6d5 [AMDGPU] Remove unused variables in SILowerSGPRSpills.cpp (NFC)
/llvm-project/llvm/lib/Target/AMDGPU/SILowerSGPRSpills.cpp:172:25:
error: unused variable 'RI' [-Werror,-Wunused-variable]
  const SIRegisterInfo *RI = ST.getRegisterInfo();
                        ^
1 error generated.
2025-04-25 17:59:55 +08:00
Diana Picus
5bad5d84a1
Reland [AMDGPU] Support block load/store for CSR #130013 (#137169)
Add support for using the existing SCRATCH_STORE_BLOCK and
SCRATCH_LOAD_BLOCK instructions for saving and restoring callee-saved
VGPRs. This is controlled by a new subtarget feature, block-vgpr-csr. It
does not include WWM registers - those will be saved and restored
individually, just like before. This patch does not change the ABI.

Use of this feature may lead to slightly increased stack usage, because
the memory is not compacted if certain registers don't have to be
transferred (this will happen in practice for calling conventions where
the callee and caller saved registers are interleaved in groups of 8).
However, if the registers at the end of the block of 32 don't have to be
transferred, we don't need to use a whole 128-byte stack slot - we can
trim some space off the end of the range.
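
The stack usage described above can be sketched as follows. This is a toy model, not the SIFrameLowering code: it assumes a 32-bit mask where bit i means register i of the block is transferred, and 4 bytes per register (128 bytes / 32 registers). The name `block_slot_bytes` is hypothetical.

```python
REGS_PER_BLOCK = 32   # registers covered by one block load/store
BYTES_PER_REG = 4     # 128-byte slot / 32 registers


def block_slot_bytes(transfer_mask):
    """Stack bytes needed for one block, given a 32-bit transfer mask.

    Gaps below the highest set bit still occupy space (no compaction);
    only untransferred registers at the end of the block are trimmed.
    """
    if not 0 <= transfer_mask < (1 << REGS_PER_BLOCK):
        raise ValueError("mask must fit in 32 bits")
    return transfer_mask.bit_length() * BYTES_PER_REG


# Interleaved groups of 8: the gap at bits 8..15 is not compacted,
# but the unused top 8 registers are trimmed off the end.
print(block_slot_bytes(0x00FF00FF))
```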

In order to implement this feature, we need to rely less on the
target-independent code in the PrologEpilogInserter, so we override
several new methods in SIFrameLowering. We also add new pseudos,
SI_BLOCK_SPILL_V1024_SAVE/RESTORE.

One peculiarity is that both the SI_BLOCK_SPILL_V1024_RESTORE pseudo and the
SCRATCH_LOAD_BLOCK instructions will have all the registers that are not
transferred added as implicit uses. This is done in order to inform
LiveRegUnits that those registers are not available before the restore
(since we're not really restoring them - so we can't afford to scavenge
them). Unfortunately, this trick doesn't work with the save, so before
the save all the registers in the block will be unavailable (see the
unit test).

This was reverted due to failures in the builds with expensive checks
on, now fixed by always updating LiveIntervals and SlotIndexes in
SILowerSGPRSpills.
2025-04-25 11:29:27 +02:00
Vikash Gupta
bdb63208b4
[AMDGPU][CodeGen] Using MBB's liveIn check in tandem with MCRegAliasIterator in SILowerSGPRSpills (#129848)
This patch replaces the use of MachineRegisterInfo's liveIn check with the
machine basic block's liveIn check, because the MRI's liveIns are inconsistent
with the entry MBB's liveIns when it comes to the machine verifier checks.

PS: It's an alternative solution to #126926.
2025-03-18 10:51:07 +05:30
Matt Arsenault
8387cbd0f9
AMDGPU: Delete spills of undef values (#119684)
It would be a bit more logical to preserve the undef and do the normal
expansion, but this is less work. This avoids verifier errors in a
future patch which starts deleting liveness from registers after
allocation failures which results in spills of undef values.

https://reviews.llvm.org/D122607

Move where undef sgpr spills are deleted
2024-12-17 13:08:38 +07:00
Kazu Hirata
be187369a0
[AMDGPU] Remove unused includes (NFC) (#116154)
Identified with misc-include-cleaner.
2024-11-13 21:10:03 -08:00
Akshat Oke
834b820f40
[AMDGPU] Correct pass dependencies for SILowerSGPRSpills (#109937)
Replace unused analysis (VirtRegMap) dependency with the used one (SlotIndexes)
Initializes `SlotIndexesWrapperPass` which is used by SILowerSGPRSpills to ensure that legacy pass manager finds it.
Removes the initialization for `VirtRegMapWrapperPass` since it is not requested in this pass.
2024-10-22 15:20:54 +05:30
Akshat Oke
93802815ab
[NewPM][CodeGen] Port VirtRegMap to NPM (#109936) 2024-10-22 15:15:56 +05:30
Christudasan Devadasan
ac0f64f06d
[AMDGPU] Split vgpr regalloc pipeline (#93526)
Allocating wwm-registers and per-thread VGPR operands
together imposes many challenges in the way the
registers are reused during allocation. There are
times when regalloc reuses the registers of regular
VGPR operations for wwm-operations in a small range,
unintentionally clobbering their inactive lanes and
causing correctness issues that are hard to trace.

This patch splits the VGPR allocation pipeline further
to allocate wwm-registers first and the regular VGPR
operands in a separate pipeline. The splitting would
ensure that the physical registers used for wwm
allocations won't take part in the next allocation
pipeline to avoid any such clobbering.
2024-09-30 19:55:42 +05:30
Pravin Jagtap
3659aa8079
[AMDGPU] Fix handling of DBG_VALUE_LIST while fixing the dead frame indices. (#109685)
Both the SGPR->VGPR and VGPR->AGPR spilling code fix up the spill
frame indices referred to in debug instructions so that they can be
entirely removed. The stack argument is present at the 0th index in
DBG_VALUE and at the 2nd index in DBG_VALUE_LIST.
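
The operand positions above can be sketched with a toy instruction model. The tuple encoding of operands and both helper names are assumptions of this sketch, not LLVM APIs; it also assumes DBG_VALUE_LIST may carry several location operands from index 2 onward.

```python
def debug_operand_index(opcode):
    """Index of the first operand that may hold a frame index."""
    if opcode == "DBG_VALUE":
        return 0
    if opcode == "DBG_VALUE_LIST":
        return 2
    raise ValueError(f"not a debug value opcode: {opcode}")


def frame_indices(opcode, operands):
    """Collect frame indices from a toy debug instruction.

    Operands are modeled as ("fi", n) for frame indices and as other
    tuples for everything else (an assumption of this sketch).
    """
    start = debug_operand_index(opcode)
    if opcode == "DBG_VALUE":
        candidates = operands[start:start + 1]  # single location operand
    else:
        candidates = operands[start:]           # possibly several
    return [op[1] for op in candidates if op[0] == "fi"]
```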

Fixes: SWDEV-484156
2024-09-24 14:41:45 +05:30
Akshat Oke
0b0874755d
[AMDGPU][NewPM] Port SILowerSGPRSpills to NPM (#108934) 2024-09-21 09:59:36 +05:30
Jay Foad
5e338f1f4a [AMDGPU] clang-tidy: use emplace_back instead of push_back. NFC. 2024-07-17 08:27:35 +01:00
paperchalice
abde52aa66
[CodeGen][NewPM] Port LiveIntervals to new pass manager (#98118)
- Add `LiveIntervalsAnalysis`.
- Add `LiveIntervalsPrinterPass`.
- Use `LiveIntervalsWrapperPass` in legacy pass manager.
- Use `std::unique_ptr` instead of raw pointer for `LICalc`, so
destructor and default move constructor can handle it correctly.

This would be the last analysis required by `PHIElimination`.
2024-07-10 19:34:48 +08:00
paperchalice
4010f894a1
[CodeGen][NewPM] Port SlotIndexes to new pass manager (#97941)
- Add `SlotIndexesAnalysis`.
- Add `SlotIndexesPrinterPass`.
- Use `SlotIndexesWrapperPass` in legacy pass.
2024-07-09 12:09:11 +08:00
bcahoon
353322f61d
[AMDGPU] Fix end() iterator dereference in SILowerSGPRSpills (#88828) 2024-04-18 09:34:27 -05:00
Christudasan Devadasan
230c13d59d
[AMDGPU] Pick available high VGPR for CSR SGPR spilling (#78669)
CSR SGPR spilling currently uses the earliest available physical VGPRs.
This imposes high register pressure when trying to allocate
large VGPR tuples within the default register budget.

This patch changes the spilling strategy by picking the VGPRs in
reverse order, the highest available VGPR first, and later, after regalloc,
shifting them back to the lowest available range. With that, the initial
VGPRs remain available for allocation, and finding a large number
of contiguous registers becomes more likely.
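
A minimal sketch of the two steps, using plain integers as VGPR numbers. Both function names are hypothetical, and the shifting policy (move each spill VGPR to the lowest free register below it) is an assumption made for illustration rather than the exact logic in the patch.

```python
def pick_spill_vgpr(available):
    """Before regalloc: take the highest available VGPR."""
    return max(available)


def shift_to_lowest(spill_vgprs, free_after_regalloc):
    """After regalloc: remap each spill VGPR to the lowest free VGPR below it."""
    free = set(free_after_regalloc)
    remap = {}
    for reg in sorted(spill_vgprs):
        lower = [f for f in free if f < reg]
        if lower:
            new = min(lower)
            free.remove(new)
            free.add(reg)      # the old register becomes free again
            remap[reg] = new
        else:
            remap[reg] = reg   # nothing lower is free, keep it in place
    return remap


print(pick_spill_vgpr({0, 1, 2, 255}))
print(shift_to_lowest([255, 254], {3, 4, 5}))
```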
2024-01-24 07:08:43 +05:30
Carl Ritson
4db4d7f282
[AMDGPU] SILowerSGPRSpills: do not update MRI reserve registers (#77888)
VGPRs used for spilling do not require explicit reservation with MRI.
freezeReservedRegs() executed before register allocation ensures these
are placed in the reserve set.

The only pass after SILowerSGPRSpills is SIPreAllocateWWMRegs which
explicitly tests for interference before register allocation so should
not reuse a WWM VGPR holding spill data. reserveReg prevents calculation
of correct liveness for physical registers which could be used to extend
SIPreAllocateWWMRegs.
2024-01-23 10:49:26 +09:00
Yashwant Singh
7ac532efc8
[AMDGPU] Introduce AMDGPU::SGPR_SPILL asm comment flag (#67091)
Use this flag to give more context to implicit def comments in assembly.

Reviewed on phabricator: 
https://reviews.llvm.org/D153754
2023-09-29 11:15:01 +05:30
Kazu Hirata
8a7f4eeb60 [llvm] Use llvm::is_contained (NFC) 2023-09-22 17:09:27 -07:00
Matt Arsenault
4d42e8b5d1 Reapply "[CodeGen]Allow targets to use target specific COPY instructions for live range splitting"
This reverts commit a496c8be6e638ae58bb45f13113dbe3a4b7b23fd.

The workaround in c26dfc81e254c78dc23579cf3d1336f77249e1f6 should work
around the underlying problem with SUBREG_TO_REG.
2023-07-31 20:15:45 -04:00
Vitaly Buka
a496c8be6e Revert "[CodeGen]Allow targets to use target specific COPY instructions for live range splitting"
And dependent commits.

Details in D150388.

This reverts commit 825b7f0ca5f2211ec3c93139f98d1e24048c225c.
This reverts commit 7a98f084c4d121244ef7286bc6503b6a181d446e.
This reverts commit b4a62b1fa546312d882fa12dfdcd015177d66826.
This reverts commit b7836d856206ec39509d42529f958c920368166b.

No conflicts in the code, few tests had conflicts in autogenerated CHECKs:
llvm/test/CodeGen/Thumb2/mve-float32regloops.ll
llvm/test/CodeGen/AMDGPU/fix-frame-reg-in-custom-csr-spills.ll

Reviewed By: alexfh

Differential Revision: https://reviews.llvm.org/D156381
2023-07-26 22:13:32 -07:00
Christudasan Devadasan
7a98f084c4 [AMDGPU][SILowerSGPRSpills] Spill SGPRs to virtual VGPRs
Currently, the custom SGPR spill lowering pass spills
SGPRs into physical VGPR lanes and the remaining VGPRs
are used by regalloc for vector regclass allocation.
This imposes so many restrictions that SGPR spilling ends
up failing when there aren't enough VGPRs, and we are
forced to spill the leftover into memory during PEI.
The custom spill handling during PEI has many edge cases
and breaks the compiler from time to time.

This patch implements spilling SGPRs into virtual VGPR
lanes. Since we now split the register allocation for
SGPRs and VGPRs, the virtual registers introduced for
the spill lanes would get allocated automatically in
the subsequent regalloc invocation for VGPRs.

Spill to virtual registers will always be successful,
even in the high-pressure situations, and hence it avoids
most of the edge cases during PEI. We are now left with
only the custom SGPR spills during PEI for special registers
like the frame pointer which is an unproblematic case.

Differential Revision: https://reviews.llvm.org/D124196
2023-07-07 23:14:32 +05:30
Christudasan Devadasan
b78b36e1a2 [AMDGPU] Implement whole wave register spill
To reduce the register pressure during allocation,
when the allocator spills a virtual register that
corresponds to a whole wave mode operation, the
spill loads and restores should be activated for
all lanes by temporarily flipping all bits in exec
register to one just before the spills. It is not
implemented in the compiler as of today and this
patch enables the necessary support.

This is a pre-patch before the SGPR spill to virtual
VGPR lanes that would eventually cause the whole
wave register spills during allocation.

Reviewed By: arsenm, cdevadas

Differential Revision: https://reviews.llvm.org/D143759
2023-07-07 22:51:45 +05:30
Christudasan Devadasan
a3028239a7 Revert "[AMDGPU][SILowerSGPRSpills] Spill SGPRs to virtual VGPRs"
This reverts commit 40ba0942e2ab1107f83aa5a0ee5ae2980bf47b1a.
2022-12-21 16:17:42 +05:30
Christudasan Devadasan
40ba0942e2 [AMDGPU][SILowerSGPRSpills] Spill SGPRs to virtual VGPRs
Currently, the custom SGPR spill lowering pass spills
SGPRs into physical VGPR lanes and the remaining VGPRs
are used by regalloc for vector regclass allocation.
This imposes so many restrictions that SGPR spilling ends
up failing when there aren't enough VGPRs, and we are
forced to spill the leftover into memory during PEI.
The custom spill handling during PEI has many edge cases
and breaks the compiler from time to time.

This patch implements spilling SGPRs into virtual VGPR
lanes. Since we now split the register allocation for
SGPRs and VGPRs, the virtual registers introduced for
the spill lanes would get allocated automatically in
the subsequent regalloc invocation for VGPRs.

Spill to virtual registers will always be successful,
even in the high-pressure situations, and hence it avoids
most of the edge cases during PEI. We are now left with
only the custom SGPR spills during PEI for special registers
like the frame pointer which is an unproblematic case.

This patch also implements the whole wave spills which
might occur if RA spills any live range of virtual registers
involved in the whole wave operations. Earlier, we had
been hand-picking registers for such machine operands.
But now with SGPR spills into virtual VGPR lanes, we are
exposing them to the allocator.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D124196
2022-12-17 11:56:32 +05:30
Christudasan Devadasan
b5efec4b27 [CodeGen] Additional Register argument to storeRegToStackSlot/loadRegFromStackSlot
With D134950, targets get notified when a virtual register is created and/or
cloned. Targets can do the needful with the delegate callback. AMDGPU propagates
the virtual register flags maintained in the target file itself. They are useful
to identify a certain type of machine operands while inserting spill stores and
reloads. Since RegAllocFast spills the physical register itself, there is no way
its virtual register can be mapped back to retrieve the flags. It can be solved
by passing the virtual register as an additional argument. This argument has no
use when the spill interfaces are called during the greedy allocator or even the
PrologEpilogInserter and can pass a null register in such cases.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D138656
2022-12-17 11:55:34 +05:30
Christudasan Devadasan
b25b4c0ab4 [AMDGPU] Separate out SGPR spills to VGPR lanes during PEI
SILowerSGPRSpills pass handles the lowering of SGPR spills
into VGPR lanes. Some SGPR spills are handled later during
PEI. There is a common function used in both places to find
the free VGPR lane. This patch eliminates that dependency to
find the free VGPR by handling it separately for PEI. It is a
prerequisite patch for a future work to allow SGPR spills to
virtual VGPR lanes during SILowerSGPRSpills.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D124195
2022-12-17 11:49:41 +05:30
Christudasan Devadasan
5692a7e84e [AMDGPU] Callee must always spill writelane VGPRs
Since the writelane instruction used for SGPR spills can
modify inactive lanes, the callee must preserve the VGPR
this instruction modifies even if it was marked Caller-saved.

Reviewed By: arsenm, nhaehnle

Differential Revision: https://reviews.llvm.org/D124192
2022-12-17 11:11:42 +05:30
Christudasan Devadasan
a8d7ad70aa [AMDGPU] Skip stack-arg dbg objects while fixing the dead frame indices
Both the SGPR->VGPR and VGPR->AGPR spilling code fix up the
spill frame indices referred to in debug instructions so that they
can be entirely removed. We should skip the stack argument debug
objects while looking inside the bitvector, indexed by FI,
that tracks the spill indices being processed. The stack args
have negative indices and would crash while accessing the bitvector.
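
The guard can be sketched as below, with a Python list of booleans standing in for the bitvector. The helper name is hypothetical. Note that in Python a negative index would silently read from the end of the list rather than crash, which makes the explicit check just as necessary in this model.

```python
def is_tracked_spill_fi(frame_index, spill_fis):
    """True if frame_index is one of the spill indices being processed."""
    # Stack-argument objects have negative frame indices; skip them
    # before touching the bit vector.
    if frame_index < 0:
        return False
    return frame_index < len(spill_fis) and spill_fis[frame_index]


spill_fis = [False, True, False]
print(is_tracked_spill_fi(-1, spill_fis))
print(is_tracked_spill_fi(1, spill_fis))
```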

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D137277
2022-11-04 15:28:35 +05:30
Matt Arsenault
74ef03d38a AMDGPU: Update SlotIndexes independently of LiveIntervals
Apparently StackColoring depends on SlotIndexes, but not
LiveIntervals. If regalloc fast were manually requested, LiveIntervals
would be dropped before SILowerSGPRSpills but not SlotIndexes.

SILowerSGPRSpills preserved SlotIndexes, but only through
LiveIntervals. As a result, SILowerSGPRSpills was incorrectly
reporting it preserved SlotIndexes. Start updating these directly,
instead of depending on LiveIntervals also being available.
2022-10-07 13:15:15 -07:00
Thomas Symalla
04c5fed5e0 [NFC] Fix wrong comment. 2022-07-05 13:37:44 +02:00
serge-sans-paille
989f1c72e0 Cleanup codegen includes
This is a (fixed) recommit of https://reviews.llvm.org/D121169

after:  1061034926
before: 1063332844

Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D121681
2022-03-16 08:43:00 +01:00
Nico Weber
a278250b0f Revert "Cleanup codegen includes"
This reverts commit 7f230feeeac8a67b335f52bd2e900a05c6098f20.
Breaks CodeGenCUDA/link-device-bitcode.cu in check-clang,
and many LLVM tests, see comments on https://reviews.llvm.org/D121169
2022-03-10 07:59:22 -05:00
serge-sans-paille
7f230feeea Cleanup codegen includes
after:  1061034926
before: 1063332844

Differential Revision: https://reviews.llvm.org/D121169
2022-03-10 10:00:30 +01:00
Venkata Ramanaiah Nalamothu
04fff547e2 [AMDGPU] Move call clobbered return address registers s[30:31] to callee saved range
Currently the return address ABI registers s[30:31], which fall in the call
clobbered register range, are added as a live-in on the function entry to
preserve its value when we have calls so that it gets saved and restored
around the calls.

But the DWARF unwind information (CFI) needs to track where the return address
resides in a frame and the above approach makes it difficult to track the
return address when the CFI information is emitted during the frame lowering,
because that requires understanding the control flow.

This patch moves the return address ABI registers s[30:31] into callee saved
registers range and stops adding live-in for return address registers, so that
the CFI machinery will know where the return address resides when CSR
save/restore happen during the frame lowering.

And doing the above poses an issue that now the return instruction uses undefined
register `sgpr30_sgpr31`. This is resolved by hiding the return address register
use by the return instruction through the `SI_RETURN` pseudo instruction, which
doesn't take any input operands, until the `SI_RETURN` pseudo gets lowered to the
`S_SETPC_B64_return` during the `expandPostRAPseudo()`.

As an added benefit, this patch simplifies overall return instruction handling.

Note: The AMDGPU CFI changes are there only in the downstream code and another
version of this patch will be posted for review for the downstream code.

Reviewed By: arsenm, ronlieb

Differential Revision: https://reviews.llvm.org/D114652
2022-03-09 12:18:02 +05:30
Matt Arsenault
d6fdbbcace AMDGPU: Add second emergency slot for SGPR to vmem for large frames
In a future change, we will sometimes use a VGPR offset for doing
spills to memory, in which case we need 2 free VGPRs to do the SGPR
spill. In most cases we could spill the VGPR along with the SGPR being
spilled, but we don't have any free lanes for SGPR_1024 in wave32 so
we could still potentially need a second scavenging slot.
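
The lane arithmetic behind the SGPR_1024 remark can be sketched as follows, assuming one 32-bit SGPR is spilled per VGPR lane and a VGPR has as many lanes as the wave size. The function name is hypothetical.

```python
def free_lanes_after_spill(sgpr_tuple_bits, wave_size):
    """Lanes left in a single VGPR after spilling one SGPR tuple into it."""
    lanes_needed = sgpr_tuple_bits // 32   # one lane per 32-bit SGPR
    return wave_size - lanes_needed


# SGPR_1024 fills all 32 lanes in wave32, leaving no lane for anything else.
print(free_lanes_after_spill(1024, 32))
print(free_lanes_after_spill(1024, 64))
```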
2022-02-02 19:05:05 -05:00
Jim Lin
d6b0734837 [NFC] Use Register instead of unsigned 2022-01-19 20:17:04 +08:00
Austin Kerbow
8470bf2b08 [AMDGPU] Do not reserve any VGPR for SGPR spills
After the split register allocation changes in eebe841a47cb it is no
longer necessary to reserve a VGPR before RA. This can also create bugs
when IPRA is enabled since we cannot predict that a called function may
not reserve any register if it does not have any SGPR spills. If that
happens those functions may override reserved registers that are
normally callee saved. Added a test to show this.

Fixes: SWDEV-309900

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D115551
2022-01-11 22:14:59 -08:00
Ron Lieberman
09b53296cf Revert "[AMDGPU] Move call clobbered return address registers s[30:31] to callee saved range"
This reverts commit 9075009d1fd5f2bf9aa6c2f362d2993691a316b3.

 Failed amdgpu runtime buildbot # 3514
2021-12-22 11:39:28 -05:00
RamNalamothu
9075009d1f [AMDGPU] Move call clobbered return address registers s[30:31] to callee saved range
Currently the return address ABI registers s[30:31], which fall in the call
clobbered register range, are added as a live-in on the function entry to
preserve its value when we have calls so that it gets saved and restored
around the calls.

But the DWARF unwind information (CFI) needs to track where the return address
resides in a frame and the above approach makes it difficult to track the
return address when the CFI information is emitted during the frame lowering,
because that requires understanding the control flow.

This patch moves the return address ABI registers s[30:31] into callee saved
registers range and stops adding live-in for return address registers, so that
the CFI machinery will know where the return address resides when CSR
save/restore happen during the frame lowering.

And doing the above poses an issue that now the return instruction uses undefined
register `sgpr30_sgpr31`. This is resolved by hiding the return address register
use by the return instruction through the `SI_RETURN` pseudo instruction, which
doesn't take any input operands, until the `SI_RETURN` pseudo gets lowered to the
`S_SETPC_B64_return` during the `expandPostRAPseudo()`.

As an added benefit, this patch simplifies overall return instruction handling.

Note: The AMDGPU CFI changes are there only in the downstream code and another
version of this patch will be posted for review for the downstream code.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D114652
2021-12-22 20:51:12 +05:30
Neubauer, Sebastian
d1f45ed58f [AMDGPU][NFC] Fix typos
Differential Revision: https://reviews.llvm.org/D113672
2021-11-12 11:37:21 +01:00
Kazu Hirata
4bef0304e1 [AArch64, AMDGPU] Use make_early_inc_range (NFC) 2021-11-03 09:22:51 -07:00
Jay Foad
d55db4b033 [AMDGPU] Remove unused VirtRegMap analysis. NFC. 2021-10-18 11:55:40 +01:00
hsmahesha
52cb3af08c [AMDGPU] Remove dead frame indices after sgpr spill.
All frame indices that are dead after the sgpr spill should be removed from
the function frame. Otherwise, later pass(es) such as "stack slot coloring" may
re-map the freed frame index ids, which in turn could mess up the bookkeeping
of "frame index to VGPR lane".

Reviewed By: cdevadas

Differential Revision: https://reviews.llvm.org/D111150
2021-10-12 09:58:49 +05:30
Jack Andersen
bd4dad87f4 [MachineInstr] Move MIParser's DBG_VALUE RegState::Debug invariant into MachineInstr::addOperand
Based on the reasoning of D53903, register operands of DBG_VALUE are
invariably treated as RegState::Debug operands. This change enforces
this invariant as part of MachineInstr::addOperand so that all passes
emit this flag consistently.

RegState::Debug is inconsistently set on DBG_VALUE registers throughout
LLVM. This runs the risk of a filtering iterator like
MachineRegisterInfo::reg_nodbg_iterator to process these operands
erroneously when not parsed from MIR sources.

This issue was observed in the development of the llvm-mos fork which
adds a backend that relies on physical register operands much more than
existing targets. Physical RegUnit 0 has the same numeric encoding as
$noreg (indicating an undef for DBG_VALUE). Allowing debug operands into
the machine scheduler correlates $noreg with RegUnit 0 (i.e. a collision
of register numbers with different zero semantics). Eventually, this
causes an assert where DBG_VALUE instructions are prohibited from
participating in live register ranges.

Reviewed By: MatzeB, StephenTozer

Differential Revision: https://reviews.llvm.org/D110105
2021-10-07 16:08:52 +01:00
Matt Arsenault
eebe841a47 RegAlloc: Allow targets to split register allocation
AMDGPU normally spills SGPRs to VGPRs. Previously, since all register
classes are handled at the same time, this was problematic. We don't
know ahead of time how many registers will be needed to be reserved to
handle the spilling. If no VGPRs were left for spilling, we would have
to try to spill to memory. If the spilled SGPRs were required for exec
mask manipulation, it is highly problematic because the lanes active
at the point of spill are not necessarily the same as at the restore
point.

Avoid this problem by fully allocating SGPRs in a separate regalloc
run from VGPRs. This way we know the exact number of VGPRs needed, and
can reserve them for a second run.  This fixes the most serious
issues, but it is still possible using inline asm to make all VGPRs
unavailable. Start erroring in the case where we ever would require
memory for an SGPR spill.

This is implemented by giving each regalloc pass a callback which
reports if a register class should be handled or not. A few passes
need some small changes to deal with leftover virtual registers.
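
The callback idea can be sketched with a toy model in which virtual registers are (name, register class) pairs and each run only handles the classes its filter accepts. All names here are hypothetical and the "allocation" itself is omitted; only the filtering and the leftover virtual registers are modeled.

```python
def run_regalloc(vregs, should_allocate):
    """Split vregs into (allocated, leftover) according to the class filter."""
    allocated = [v for v in vregs if should_allocate(v[1])]
    leftover = [v for v in vregs if not should_allocate(v[1])]
    return allocated, leftover


vregs = [("%0", "SGPR"), ("%1", "VGPR"), ("%2", "SGPR")]

# First run: SGPRs only. VGPR virtual registers are left over for later.
sgpr_done, remaining = run_regalloc(vregs, lambda rc: rc == "SGPR")
# Second run: VGPRs, once the number of VGPRs needed for spills is known.
vgpr_done, remaining = run_regalloc(remaining, lambda rc: rc == "VGPR")
print(sgpr_done, vgpr_done, remaining)
```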

In the AMDGPU implementation, a new pass is introduced to take the
place of PrologEpilogInserter for SGPR spills emitted during the first
run.

One disadvantage of this is currently StackSlotColoring is no longer
used for SGPR spills. It would need to be run again, which will
require more work.

Error if the standard -regalloc option is used. Introduce new separate
-sgpr-regalloc and -vgpr-regalloc flags, so the two runs can be
controlled individually. PBQP is not currently supported, so this also
prevents using the unhandled allocator.
2021-07-13 18:49:29 -04:00
Ruiling Song
4cee5cad28 [AMDGPU] Free reserved VGPR if no SGPR spill
I ran into a code generation behavior change when I tried to remove
the hasStackObject() check when reserving a VGPR for SGPR spills.
For example, consider the function `callee_no_stack_no_fp_elim_all` in the lit
test file `callee-frame-setup.ll`.
The generated code changed from:
```
  s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
  s_mov_b32 s4, s33
  s_mov_b32 s33, s32
  s_mov_b32 s33, s4
  s_setpc_b64 s[30:31]
```

into something like:
```
  s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
  v_writelane_b32 v63, s33, 0
  s_mov_b32 s33, s32
  v_readlane_b32 s33, v63, 0
  s_setpc_b64 s[30:31]
```

I think we still prefer the old version where only scalar instructions are needed.
The idea here is to free the reserved VGPR if there are no SGPR spills, so we will
very likely use a free SGPR for the fp/sp spill.

Reviewed by: arsenm

Differential Revision: https://reviews.llvm.org/D98344
2021-03-12 08:11:14 +08:00