171 Commits

Author SHA1 Message Date
Christudasan Devadasan
7a72a93580 [AMDGPU] Preserve only the inactive lanes of scratch vgprs
In general, a callee is free to use a scratch register without
preserving its previous state.  However, the VGPR used for SGPR
spilling can potentially have its inactive lanes overwritten by
the writelane instructions. When the function returns, it can
cause unexpected behavior if the VGPR value is not preserved
appropriately.

The current scheme to preserve the inactive lanes of such
scratch VGPRs is not done rightly. It preserves all lanes
and causes the outgoing values (if any) getting overwritten
by the epilog restores. It then corrupts the return value.

To avoid such situation with scratch VGPRs, this patch ensures
we preserve only their inactive lanes.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D134526
2022-12-17 11:51:43 +05:30
Christudasan Devadasan
20a940f1e2 [AMDGPU][SIFrameLowering] Unify PEI SGPR spill saves and restores
There is a lot of customization and eventually code duplication in the
frame lowering that handles special SGPR spills like the one needed for
the Frame Pointer. Incorporating any additional SGPR spill currently
makes it difficult during PEI. This patch introduces a new spill builder
to efficiently handle such spill requirements. Various spill methods are
special handled using a separate class.

Reviewed By: sebastian-ne, scott.linder

Differential Revision: https://reviews.llvm.org/D132436
2022-12-17 11:50:25 +05:30
Christudasan Devadasan
b25b4c0ab4 [AMDGPU] Separate out SGPR spills to VGPR lanes during PEI
SILowerSGPRSpills pass handles the lowering of SGPR spills
into VGPR lanes. Some SGPR spills are handled later during
PEI. There is a common function used in both places to find
the free VGPR lane. This patch eliminates that dependency to
find the free VGPR by handling it separately for PEI. It is a
prerequisite patch for a future work to allow SGPR spills to
virtual VGPR lanes during SILowerSGPRSpills.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D124195
2022-12-17 11:49:41 +05:30
Christudasan Devadasan
af5e5c40ff [AMDGPU] Add WWM reserved VGPRs to WWMSpills
The custom VGPR spills inserted during frame lowering
maintain a separate list for WWM reserved registers.
Added them into WWMSpills that already tracks such
reserved registers. It unifies the spill insertion.

Reviewed By: nhaehnle, arsenm

Differential Revision: https://reviews.llvm.org/D124193
2022-12-17 11:47:58 +05:30
Christudasan Devadasan
5692a7e84e [AMDGPU] Callee must always spill writelane VGPRs
Since the writelane instruction used for SGPR spills can
modify inactive lanes, the callee must preserve the VGPR
this instruction modifies even if it was marked Caller-saved.

Reviewed By: arsenm, nhaehnle

Differential Revision: https://reviews.llvm.org/D124192
2022-12-17 11:11:42 +05:30
Jeffrey Byrnes
4d2faf043b [AMDGPU][SIFrameLowering] Mark VGPR used for AGPR spills as reserved
Presently, there is an issue on MI100 (and probably other architecture) where the VGPR used for AGPR copies clobbers VGPR used for AGPR spill. AFAICT this is because in processFunctionBeforeFrameIndicesReplaced we think the VGPR register for AGPR spill is unused. This patch aims to correct that. This is a WIP while I work out issues with producing a good test. For now, I'm curious if this is generally a good / bad idea.

Differential Revision: https://reviews.llvm.org/D139673
2022-12-16 12:00:51 -08:00
Fangrui Song
67819a72c6 [CodeGen] llvm::Optional => std::optional 2022-12-13 09:06:36 +00:00
Krzysztof Parzyszek
c589730ad5 [YAML] Convert Optional to std::optional 2022-12-06 12:49:32 -08:00
Fangrui Song
4b1b9e22b3 Remove unused #include "llvm/ADT/Optional.h" 2022-12-05 04:21:08 +00:00
Kazu Hirata
20cde15415 [Target] Use std::nullopt instead of None (NFC)
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated.  The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.

This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-12-02 20:36:06 -08:00
Nicolai Hähnle
b7f44f7cf9 AMDGPU: Remove ImagePSV and move images to addrspace 7
Following up on the removal of BufferPSV in commit 43b86bf992 ("AMDGPU:
Remove BufferPseudoSourceValue")

It is unclear what exactly the right address space for images should be.
They seem morally closest to buffers, so that's what I went with. In
practical terms, address space 7 is better than address space 0 because
it can't alias with LDS.

Differential Revision: https://reviews.llvm.org/D138949
2022-11-30 11:32:34 +01:00
Nicolai Hähnle
43b86bf992 AMDGPU: Remove BufferPseudoSourceValue
The use of a PSV for buffer intrinsics is misleading because it may be
misinterpreted as all buffer intrinsics accessing the same address in
memory, which is clearly not true.

Instead, build MachineMemOperands without a pointer value but with an
address space, so that address space-based alias analysis can still
work.

There is a lot of test churn because previously address space 4
(constant address space) was used as an address space for buffer
intrinsics. This doesn't make much sense and seems to have been an
accident -- see the change in
AMDGPUTargetMachine::getAddressSpaceForPseudoSourceKind.

Differential Revision: https://reviews.llvm.org/D138711
2022-11-29 22:15:11 +01:00
Matt Arsenault
ffdbbd112c AMDGPU: Directly pass Function to mayUseAGPRs
This was taking the MachineFunction, but only inspecting the
underlying IR.
2022-11-02 10:48:51 -07:00
Stanislav Mekhanoshin
5a3fe9a039 [AMDGPU] Move SIModeRegisterDefaults to SI MFI
It does not belong to a general AMDGPU MFI.

Differential Revision: https://reviews.llvm.org/D134666
2022-09-28 13:13:40 -07:00
Vitaly Buka
20a80d60a8 Revert "[AMDGPU] Move SIModeRegisterDefaults to SI MFI"
Break msan bots. Details in D134666.

This reverts commit 0ce96e06ee0226938e723bd0c8e16e3d2d51f203.
2022-09-26 22:22:09 -07:00
Stanislav Mekhanoshin
0ce96e06ee [AMDGPU] Move SIModeRegisterDefaults to SI MFI
It does not belong to a general AMDGPU MFI.

Differential Revision: https://reviews.llvm.org/D134666
2022-09-26 13:20:24 -07:00
Jon Chesterfield
3a20597776 [amdgpu] Implement lds kernel id intrinsic
Implement an intrinsic for use lowering LDS variables to different
addresses from different kernels. This will allow kernels that cannot
reach an LDS variable to avoid wasting space for it.

There are a number of implicit arguments accessed by intrinsic already
so this implementation closely follows the existing handling. It is slightly
novel in that this SGPR is written by the kernel prologue.

It is necessary in the general case to put variables at different addresses
such that they can be compactly allocated and thus necessary for an
indirect function call to have some means of determining where a
given variable was allocated. Claiming an arbitrary SGPR into which
an integer can be written by the kernel, in this implementation based
on metadata associated with that kernel, which is then passed on to
indirect call sites is sufficient to determine the variable address.

The intent is to emit a __const array of LDS addresses and index into it.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D125060
2022-07-19 17:46:19 +01:00
Guillaume Chatelet
cef65864af [Alignment] Use Align for MaxKernArgAlign
Differential Revision: https://reviews.llvm.org/D128118
2022-06-22 13:40:37 +00:00
Matt Arsenault
cc5a1b3dd9 llvm-reduce: Add cloning of target MachineFunctionInfo
MIR support is totally unusable for AMDGPU without this, since the set
of reserved registers is set from fields here.

Add a clone method to MachineFunctionInfo. This is a subtle variant of
the copy constructor that is required if there are any MIR constructs
that use pointers. Specifically, at minimum fields that reference
MachineBasicBlocks or the MachineFunction need to be adjusted to the
values in the new function.
2022-06-07 10:14:48 -04:00
Matt Arsenault
cfe5168499 AMDGPU: Make PSV instances static members 2022-06-07 10:14:48 -04:00
Matt Arsenault
dd7e407d81 AMDGPU: Move SpilledReg from MFI to SIRegisterInfo
This isn't the most natural place for it, but it avoids a circular
include dependency in an out of tree patch.
2022-06-02 17:11:24 -04:00
hsmahesha
5bd87350a5 [AMDGPU] On gfx908, reserve VGPR for AGPR copy based on register budget.
Based on available register budget, reserve highest available VGPR for
AGPR copy before RA. After RA, shift it to lowest unused VGPR if the one
exist.

Fixes SWDEV-330006.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D123525
2022-04-21 07:57:26 +05:30
Matt Arsenault
987df725ac AMDGPU: Serialize VGPRForAGPRCopy 2022-04-19 22:14:52 -04:00
Matt Arsenault
b5ec131267 AMDGPU: Fix allocating GDS globals to LDS offsets
These don't seem to be very well used or tested, but try to make the
behavior a bit more consistent with LDS globals.

I'm not sure what the definition for amdgpu-gds-size is supposed to
mean. For now I assumed it's allocating a static size at the beginning
of the allocation, and any known globals are allocated after it.
2022-04-19 22:14:48 -04:00
Matt Arsenault
378bb8014d AMDGPU: Serialize a few more MachineFunctionInfo fields in MIR 2022-04-19 22:12:59 -04:00
Matt Arsenault
f90f4884c8 AMDGPU: Serialize gds size in MIR 2022-04-19 22:12:59 -04:00
Matt Arsenault
5cd17f9d43 AMDGPU: Serialize WWM registers 2022-04-19 21:44:43 -04:00
Matt Arsenault
e0d585d75a AMDGPU: Defer creation of WWM VGPR spill slots
There's no reason to create these immediately. They can be created in
the prolog/epilog code like CSR spills. There's probably a cleaner way
to do this by utilizing the CSR spill code.

This makes the frame index used transient state for
PrologEpilogInserter, and thus makes serialization easier. Really this
doesn't need to be saved here but there isn't really a better place
for it.
2022-04-19 21:07:13 -04:00
hsmahesha
ea47373af4 [AMDGPU][NFC] Organize code around reserving VGPR32 for AGPR copy.
This is an NFC patch in preparation to fix a bug related to always
reserving VGPR32 for AGPR copy.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D123651
2022-04-14 12:51:33 +05:30
Changpeng Fang
8384ced974 [AMDGPU][NFC]: Remove unnecessary MFI functions
Summary:
  hasHostcallPtr() and hasHeapPtr() are only used in metadata emit.
However, we can use the corresponding function attributes directly
instead introducing the functions.

Reviewers: arsenm

Differential Revision: https://reviews.llvm.org/D122600
2022-03-28 12:13:33 -07:00
Changpeng Fang
ca62b1db9f [AMDGPU][NFC]: Emit metadata for hidden_heap_v1 kernarg
Summary:
  Emit metadata for hidden_heap_v1 kernarg

Reviewers:
  sameerds, b-sumner

Fixes:
  SWDEV-307188

Differential Revision:
  https://reviews.llvm.org/D119027
2022-02-25 10:45:35 -08:00
Sebastian Neubauer
6527b2a4d5 [AMDGPU][NFC] Fix typos
Fix some typos in the amdgpu backend.

Differential Revision: https://reviews.llvm.org/D119235
2022-02-18 15:05:21 +01:00
Sameer Sahasrabuddhe
d8f99bb6e0 [AMDGPU] replace hostcall module flag with function attribute
The module flag to indicate use of hostcall is insufficient to catch
all cases where hostcall might be in use by a kernel. This is now
replaced by a function attribute that gets propagated to top-level
kernel functions via their respective call-graph.

If the attribute "amdgpu-no-hostcall-ptr" is absent on a kernel, the
default behaviour is to emit kernel metadata indicating that the
kernel uses the hostcall buffer pointer passed as an implicit
argument.

The attribute may be placed explicitly by the user, or inferred by the
AMDGPU attributor by examining the call-graph. The attribute is
inferred only if the function is not being sanitized, and the
implictarg_ptr does not result in a load of any byte in the hostcall
pointer argument.

Reviewed By: jdoerfert, arsenm, kpyzhov

Differential Revision: https://reviews.llvm.org/D119216
2022-02-11 22:51:56 +05:30
Stanislav Mekhanoshin
aeaf85b9c2 [AMDGPU] Select VGPR versions of MFMA if possible
We can select _vgprcd versions of MAI instructions and have no
AGPRs with the whole budget left for VGPRs if:

1. This is a kernel;
2. It has no calls;
3. It runs at least on 2 waves thus having not more that 256 VGPRs.
4. There is no inline asm requesting AGPRs.

Differential Revision: https://reviews.llvm.org/D117253
2022-02-08 10:19:41 -08:00
Matt Arsenault
d6fdbbcace AMDGPU: Add second emergency slot for SGPR to vmem for large frames
In a future change, we will sometimes use a VGPR offset for doing
spills to memory, in which case we need 2 free VGPRs to do the SGPR
spill. In most cases we could spill the VGPR along with the SGPR being
spilled, but we don't have any free lanes for SGPR_1024 in wave32 so
we could still potentially need a second scavenging slot.
2022-02-02 19:05:05 -05:00
Matt Arsenault
de1600a1d9 AMDGPU: Avoid enabling kernel workitem IDs with reqd_work_group_size 2022-01-18 13:52:04 -05:00
Austin Kerbow
8470bf2b08 [AMDGPU] Do not reserve any VGPR for SGPR spills
After the split register allocation changes in eebe841a47cb it is no
longer necessary to reserve a VGPR before RA. This can also create bugs
when IPRA is enabled since we cannot predict that a called function may
not reserve any register if it does not have any SGPR spills. If that
happens those functions may override reserved registers that are
normally callee saved. Added a test to show this.

Fixes: SWDEV-309900

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D115551
2022-01-11 22:14:59 -08:00
Brendon Cahoon
d45a247998 [AMDGPU] Don't remove VGPR to AGPR dead spills from frame info
Removing dead frame indices for VGPR to AGPR spills is incorrect
when the frame index is shared by multiple objects, which may
occur due to stack slot coloring. The problem is that subsequent
code that processes the other object will assert because the stack
frame index is marked dead.

Removing dead frame indices is needed prior to stack slot
coloring, which is what happens with SGPR to VGPR spills. These
spills are lowered prior to stack slot coloring, but the VGPR
to AGPR spills are processed afterwards during the Prolog/Epilog
Inserter pass. This patch marks the VGPR to AGPR spill slot as
dead if the slot is not used by another object.

Differential Revision: https://reviews.llvm.org/D115996
2021-12-23 11:09:19 -06:00
Matt Arsenault
06b90175e7 AMDGPU: Remove fixed function ABI option 2021-12-10 19:41:19 -05:00
Matt Arsenault
729bf9b26b AMDGPU: Enable fixed function ABI by default
Code using indirect calls is broken without this, and there isn't
really much value in supporting the old attempt to vary the argument
placement based on uses. This resulted in more argument shuffling code
anyway.

Also have the option stop implying all inputs need to be passed. This
will no rely on the amdgpu-no-* attributes to avoid passing
unnecessary values.
2021-12-04 10:49:18 -05:00
Stanislav Mekhanoshin
e5340ed30c [AMDGPU] Fix global isel for kernels using agprs on gfx90a
With Global ISel getReservedRegs() is called before function is
regbank selected for the first time. Defer caching of usesAGPRs()
in this case.

Differential Revision: https://reviews.llvm.org/D112644
2021-10-29 14:23:14 -07:00
Stanislav Mekhanoshin
ca0c92d6a1 [AMDGPU] Allow to use a whole register file on gfx90a for VGPRs
In a kernel which does not have calls or AGPR usage we can allocate
the whole vector register budget for VGPRs and have no AGPRs as
long as VGPRs stay addressable (i.e. below 256).

Differential Revision: https://reviews.llvm.org/D111764
2021-10-21 18:24:34 -07:00
hsmahesha
52cb3af08c [AMDGPU] Remove dead frame indices after sgpr spill.
All those frame indices which are dead after sgpr spill should be removed from
the function frame. Othewise, there is a side effect such as re-mapping of free
frame index ids by the later pass(es) like "stack slot coloring" which in turn
could mess-up with the book keeping of "frame index to VGPR lane".

Reviewed By: cdevadas

Differential Revision: https://reviews.llvm.org/D111150
2021-10-12 09:58:49 +05:30
Matt Arsenault
722b8e0e5a AMDGPU: Invert ABI attribute handling
Previously we assumed all callable functions did not need any
implicitly passed inputs, and added attributes to functions to
indicate when they were necessary. Requiring attributes for
correctness is pretty ugly, and it makes supporting indirect and
external calls more complicated.

This inverts the direction of the attributes, so an undecorated
function is assumed to need all implicit imputs. This enables
AMDGPUAttributor by default to mark when functions are proven to not
need a given input. This strips the equivalent functionality from the
legacy AMDGPUAnnotateKernelFeatures pass.

However, AMDGPUAnnotateKernelFeatures is not fully removed at this
point although it should be in the future. It is still necessary for
the two hacky amdgpu-calls and amdgpu-stack-objects attributes, which
would be better served by a trivial analysis on the IR during
selection. Additionally, AMDGPUAnnotateKernelFeatures still
redundantly handles the uniform-work-group-size attribute to be
removed in a future commit.

At this point when not using -amdgpu-fixed-function-abi, we are still
modifying the ABI based on these newly negated attributes. In the
future, this option will be removed and the locations for implicit
inputs will always be fixed. We will then use the new attributes to
avoid passing the values when unnecessary.
2021-09-09 18:24:28 -04:00
Matt Arsenault
98d7aa435f AMDGPU: Stop inferring use of llvm.amdgcn.kernarg.segment.ptr
We no longer use this intrinsic outside of the backend and no longer
support using it outside of kernels.
2021-08-26 20:30:03 -04:00
Stanislav Mekhanoshin
827dd17e26 [AMDGPU] Invert partial vgpr to agpr spill lane order
On targets requiring VGPR alignment we may end up spilling an
unaligned register if we were partially spilled odd number of
leading lanes. The reminder will start with an odd register.

This problem is solved by inverting the order of lanes to
be spillied so that we start from the end.

Differential Revision: https://reviews.llvm.org/D108732
2021-08-26 09:39:03 -07:00
Matt Arsenault
5beb9a0e6a AMDGPU: Respect compute ABI attributes with unknown OS
Unfortunately Mesa is still using amdgcn-- as the triple for OpenGL,
so we still have the awkward unknown OS case to deal with. Previously
if the HSA ABI intrinsics appeared, we we would not add the ABI
registers to the function. We would emit an error later, but we still
need to produce some compile result. Start adding the registers to any
compute function, regardless of the OS. This keeps the internal state
more consistent, and will help avoid numerous test crashes in a future
patch which starts assuming the ABI inputs are present on functions by
default.
2021-08-13 20:44:46 -04:00
Sebastian Neubauer
4359b870b1 [AMDGPU] Init scratch only if necessary
If no scratch or flat instructions are used, we do not need to
initialize the flat scratch hardware register.

Differential Revision: https://reviews.llvm.org/D105920
2021-07-14 10:45:22 +02:00
Matt Arsenault
eebe841a47 RegAlloc: Allow targets to split register allocation
AMDGPU normally spills SGPRs to VGPRs. Previously, since all register
classes are handled at the same time, this was problematic. We don't
know ahead of time how many registers will be needed to be reserved to
handle the spilling. If no VGPRs were left for spilling, we would have
to try to spill to memory. If the spilled SGPRs were required for exec
mask manipulation, it is highly problematic because the lanes active
at the point of spill are not necessarily the same as at the restore
point.

Avoid this problem by fully allocating SGPRs in a separate regalloc
run from VGPRs. This way we know the exact number of VGPRs needed, and
can reserve them for a second run.  This fixes the most serious
issues, but it is still possible using inline asm to make all VGPRs
unavailable. Start erroring in the case where we ever would require
memory for an SGPR spill.

This is implemented by giving each regalloc pass a callback which
reports if a register class should be handled or not. A few passes
need some small changes to deal with leftover virtual registers.

In the AMDGPU implementation, a new pass is introduced to take the
place of PrologEpilogInserter for SGPR spills emitted during the first
run.

One disadvantage of this is currently StackSlotColoring is no longer
used for SGPR spills. It would need to be run again, which will
require more work.

Error if the standard -regalloc option is used. Introduce new separate
-sgpr-regalloc and -vgpr-regalloc flags, so the two runs can be
controlled individually. PBQB is not currently supported, so this also
prevents using the unhandled allocator.
2021-07-13 18:49:29 -04:00
Stanislav Mekhanoshin
6fb02596a2 [AMDGPU] Add support for architected flat scratch
Add support for the readonly flat Scratch register initialized
by the SPI.

Differential Revision: https://reviews.llvm.org/D102432
2021-05-14 10:53:48 -07:00