21 Commits

Author SHA1 Message Date
Venkata Ramanaiah Nalamothu
04fff547e2 [AMDGPU] Move call clobbered return address registers s[30:31] to callee saved range
Currently the return address ABI registers s[30:31], which fall in the call
clobbered register range, are added as a live-in on the function entry to
preserve its value when we have calls so that it gets saved and restored
around the calls.

But the DWARF unwind information (CFI) needs to track where the return address
resides in a frame and the above approach makes it difficult to track the
return address when the CFI information is emitted during the frame lowering,
due to the involvment of understanding the control flow.

This patch moves the return address ABI registers s[30:31] into callee saved
registers range and stops adding live-in for return address registers, so that
the CFI machinery will know where the return address resides when CSR
save/restore happen during the frame lowering.

And doing the above poses an issue that now the return instruction uses undefined
register `sgpr30_sgpr31`. This is resolved by hiding the return address register
use by the return instruction through the `SI_RETURN` pseudo instruction, which
doesn't take any input operands, until the `SI_RETURN` pseudo gets lowered to the
`S_SETPC_B64_return` during the `expandPostRAPseudo()`.

As an added benefit, this patch simplifies overall return instruction handling.

Note: The AMDGPU CFI changes are there only in the downstream code and another
version of this patch will be posted for review for the downstream code.

Reviewed By: arsenm, ronlieb

Differential Revision: https://reviews.llvm.org/D114652
2022-03-09 12:18:02 +05:30
Matt Arsenault
f1f15d0285 AMDGPU: Fix failing test 2022-02-04 16:54:09 -05:00
Matt Arsenault
4622afa94c AMDGPU: Convert AMDGPUResourceUsageAnalysis to a Module pass
This is more precise in the face of indirect calls and aliases, still
assuming the call target is defined somewhere in the current module.

This sometimes changes the order the functions are printed, and also
changes the point where context errors are printed relative to
stdout. This also likely has negative consequences for compile time
and memory usage.
2022-02-04 15:56:04 -05:00
Matt Arsenault
935abab65c AMDGPU: Use module level register maximums for unknown callees
Compute the theoretical register budget based on the IR function
signature/attributes, and use the global maximum register budgets for
unknown callees.

This should fix the kernel reported register usage in the presence of
indirect calls. The previous fix in
2b08f6af62afbf32e89a6a392dbafa92c62f7bdf was incorrect becauset it was
only taking the maximum in the known call graph, and missing something
that was either outside of it or codegened later.

This fixes a second case I discovered where calls to aliases also did
not work as expected. CallGraphAnalysis misses these, so functions
called through aliases were not codegened ahead of callers as
expected. CallGraphAnalysis should probably be fixed to understand
this case, and there's likely a bug with IPRA here. This fixes
numerous failures in the conformance test at -O0.
2022-02-04 15:56:03 -05:00
Matt Arsenault
045be6ff36 AMDGPU/GlobalISel: Fold wave address into mubuf addressing modes 2022-01-26 15:25:26 -05:00
Matt Arsenault
dc2457c8cf AMDGPU: Fix crashing on calls to C functions from graphics contexts
If we had one of the shader calling conventions calling a default
calling convention callee, this would crash when the caller did not
have anything to pass to the workitem ID.

This is illegal, but we still need to produce something
sensible. llvm-reduce likes to replace calls to intrinsics with calls
to null or undef, so this does appear and is helpful to avoid hard
erroring.

Pass undef in this case, as already happened for the other implicit
arguments. It might make sense to define the behavior here and pass
null for the pointers, and -1 for the workitem ID. We do have extra
bits in the workitem ID, so this wouldn't conflict with a valid value.
2022-01-17 10:13:25 -05:00
Ron Lieberman
09b53296cf Revert "[AMDGPU] Move call clobbered return address registers s[30:31] to callee saved range"
This reverts commit 9075009d1fd5f2bf9aa6c2f362d2993691a316b3.

 Failed amdgpu runtime buildbot # 3514
2021-12-22 11:39:28 -05:00
RamNalamothu
9075009d1f [AMDGPU] Move call clobbered return address registers s[30:31] to callee saved range
Currently the return address ABI registers s[30:31], which fall in the call
clobbered register range, are added as a live-in on the function entry to
preserve its value when we have calls so that it gets saved and restored
around the calls.

But the DWARF unwind information (CFI) needs to track where the return address
resides in a frame and the above approach makes it difficult to track the
return address when the CFI information is emitted during the frame lowering,
due to the involvment of understanding the control flow.

This patch moves the return address ABI registers s[30:31] into callee saved
registers range and stops adding live-in for return address registers, so that
the CFI machinery will know where the return address resides when CSR
save/restore happen during the frame lowering.

And doing the above poses an issue that now the return instruction uses undefined
register `sgpr30_sgpr31`. This is resolved by hiding the return address register
use by the return instruction through the `SI_RETURN` pseudo instruction, which
doesn't take any input operands, until the `SI_RETURN` pseudo gets lowered to the
`S_SETPC_B64_return` during the `expandPostRAPseudo()`.

As an added benefit, this patch simplifies overall return instruction handling.

Note: The AMDGPU CFI changes are there only in the downstream code and another
version of this patch will be posted for review for the downstream code.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D114652
2021-12-22 20:51:12 +05:30
Neubauer, Sebastian
26924b57e8 [AMDGPU] Ignore special ABI registers for graphics
Fixed ABI arguments are compute specific and should not be added to
graphics shaders or functions, so do not try to add them.

Differential Revision: https://reviews.llvm.org/D115344
2021-12-13 16:44:37 +01:00
Matt Arsenault
06b90175e7 AMDGPU: Remove fixed function ABI option 2021-12-10 19:41:19 -05:00
Matt Arsenault
729bf9b26b AMDGPU: Enable fixed function ABI by default
Code using indirect calls is broken without this, and there isn't
really much value in supporting the old attempt to vary the argument
placement based on uses. This resulted in more argument shuffling code
anyway.

Also have the option stop implying all inputs need to be passed. This
will no rely on the amdgpu-no-* attributes to avoid passing
unnecessary values.
2021-12-04 10:49:18 -05:00
Thomas Symalla
76cbe62262 [AMDGPU] Changes the AMDGPU_Gfx calling convention by making the SGPRs 4..29 callee-save. This is to avoid superfluous s_movs when executing amdgpu_gfx function calls as the callee is likely not going to change the argument values.
This patch changes the AMDGPU_Gfx calling convention. It defines the SGPR registers s[4:29] as callee-save and leaves some SGPRs usable for callers. The intention is to avoid unneccessary s_mov instructions for arguments the caller would otherwise save and restore in these registers.

Reviewed By: sebastian-ne

Differential Revision: https://reviews.llvm.org/D111637
2021-11-04 21:50:18 +01:00
Sebastian Neubauer
2b08f6af62 [AMDGPU] Improve register computation for indirect calls
First, collect the register usage in each function, then apply the
maximum register usage of all functions to functions with indirect
calls.

This is more accurate than guessing the maximum register usage without
looking at the actual usage.

As before, assume that indirect calls will hit a function in the
current module.

Differential Revision: https://reviews.llvm.org/D105839
2021-07-20 13:48:50 +02:00
Sebastian Neubauer
db646de3ee [AMDGPU] Set optional PAL metadata
Set informational fields in the .shader_functions table.

Also correct the documentation, .scratch_memory_size and .lds_size are
integers.

Differential Revision: https://reviews.llvm.org/D105116
2021-07-06 11:58:00 +02:00
madhur13490
c27e8141b3 [AMDGPU][IndirectCalls] Fix register usage propagation for indirect/external calls
This patch computes max SGPRs and VGPRs used by module
in presence of indirect calls and makes that
as register requirement for functions/kernels
which makes indirect calls.

This patch also refactors code AMDGPUSubTarget.cpp
which add a "base" variants of getMaxNumSGPRs which
is used by MachineFunction and new Function version.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D103636
2021-06-12 11:59:34 +05:30
Jay Foad
7f81c5a5ba [AMDGPU] getMemOperandsWithOffset: add vaddr operand for stack access BUF instructions
A consequence is that checkInstOffsetsDoNotOverlap can now distinguish
sp+offset from fp+offset, so it knows that it shouldn't try to work out
whether the accesses overlap just by comparing the offsets. For example
in these two instructions:

MIR:
BUFFER_STORE_DWORD_OFFSET %0:vgpr_32(s32), $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 4, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable store 4 into stack + 4, addrspace 5)
%4:vgpr_32 = BUFFER_LOAD_DWORD_OFFEN %stack.0.alloca, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 4 from `i8 addrspace(5)* undef`, addrspace 5)

ISA:
buffer_store_dword v0, off, s[0:3], s32 offset:4
buffer_load_dword v0, off, s[0:3], s34

Differential Revision: https://reviews.llvm.org/D73957
2021-05-14 10:10:43 +01:00
Matt Arsenault
9719f17011 AMDGPU: Move handling of allocation of fixed ABI inputs
For the fixed ABI, set this in the initial argument constructor,
rather than relying on the allocation logic to set the values. Also
stop passing them for amdgpu_gfx, since the DAG path seems to skip
these. I'm unclear on what amdgpu_gfx's expectations are.  This will
allow moving the special input registers out of the normal argument
range.
2021-02-03 09:27:59 -05:00
Christudasan Devadasan
ff8a1cae18 [AMDGPU] Fix the inconsistency in soffset for MUBUF stack accesses.
During instruction selection, there is an inconsistency in choosing
the initial soffset value. With certain early passes, this value is
getting modified and that brought additional fixup during
eliminateFrameIndex to work for all cases. This whole transformation
looks trivial and can be handled better.

This patch clearly defines the initial value for soffset and keeps it
unchanged before eliminateFrameIndex. The initial value must be zero
for MUBUF with a frame index. The non-frame index MUBUF forms that
use a raw offset from SP will have the stack register for soffset.
During frame elimination, the soffset remains zero for entry functions
with zero dynamic allocas and no callsites, or else is updated to the
appropriate frame/stack register.

Also, did some code clean up and made all asserts around soffset
stricter to match.

Reviewed By: scott.linder

Differential Revision: https://reviews.llvm.org/D95071
2021-01-22 14:20:59 +05:30
Sebastian Neubauer
5733167f54 [AMDGPU] Mark amdgpu_gfx functions as module entry function
- Allows lds allocations
- Writes resource usage into COMPUTE_PGM_RSRC1 registers in PAL metadata

Differential Revision: https://reviews.llvm.org/D92946
2020-12-14 10:43:39 +01:00
Sebastian Neubauer
edd675643d [AMDGPU] Emit stack frame size in metadata
Add .shader_functions to pal metadata, which contains the stack frame
size for all non-entry-point functions.

Differential Revision: https://reviews.llvm.org/D90036
2020-11-25 16:30:02 +01:00
Sebastian Neubauer
a022b1ccd8 [AMDGPU] Add amdgpu_gfx calling convention
Add a calling convention called amdgpu_gfx for real function calls
within graphics shaders. For the moment, this uses the same calling
convention as other calls in amdgpu, with registers excluded for return
address, stack pointer and stack buffer descriptor.

Differential Revision: https://reviews.llvm.org/D88540
2020-11-09 16:51:44 +01:00