33 Commits

Author SHA1 Message Date
Emma Pilkington
4490003a22
[AMDGPU] Rename COV module flag to amdhsa_code_object_version (#79905)
The previous name 'amdgpu_code_object_version', was misleading since
this is really a property of the HSA OS. The new spelling also matches
the asm directive I added in bc82cfb.
2024-03-06 09:51:48 -05:00
Diana Picus
bc6955f18c
[AMDGPU] Don't fix the scavenge slot at offset 0 (#79136)
At the moment, the emergency spill slot is a fixed object for entry
functions and chain functions, and a regular stack object otherwise.
This patch adopts the latter behaviour for entry/chain functions too. It
seems this was always the intention [1] and it will also save us a bit
of stack space in cases where the first stack object has a large
alignment.

[1]
34c8b835b1
2024-02-09 09:20:25 +01:00
Saiyedul Islam
777b6de7a4
[AMDGPU][NFC] Test autogenerated llc tests for COV5 (#74339)
Regenerate a few llc tests to test for COV5 instead of the default ABI
version.
2023-12-12 14:35:13 +05:30
Yashwant Singh
7ac532efc8
[AMDGPU] Introduce AMDGPU::SGPR_SPILL asm comment flag (#67091)
Use this flag to give more context to implicit def comments in assembly.

Reviewed on phabricator: 
https://reviews.llvm.org/D153754
2023-09-29 11:15:01 +05:30
Jay Foad
e0919b189b [CodeGen] Renumber slot indexes before register allocation (#66334)
RegAllocGreedy uses SlotIndexes::getApproxInstrDistance to approximate
the length of a live range for its heuristics. Renumbering all slot
indexes with the default instruction distance ensures that this estimate
will be as accurate as possible, and will not depend on the history of
how instructions have been added to and removed from SlotIndexes's maps.

This also means that enabling -early-live-intervals, which runs the
SlotIndexes analysis earlier, will not cause large amounts of churn due
to different register allocator decisions.
2023-09-19 11:18:12 +01:00
Saiyedul Islam
466a8149b3
Revert "[AMDGPU] Make default AMDHSA Code Object Version to be 5 (#65410)" (#66060)
This reverts commit 0a8d17e79b02a92814a2a788d79df1f54d70ec3e.
2023-09-12 15:13:59 +05:30
Saiyedul Islam
0a8d17e79b
[AMDGPU] Make default AMDHSA Code Object Version to be 5 (#65410)
Also update LIT tests and docs.
For more details, see
https://llvm.org/docs/AMDGPUUsage.html#code-object-v5-metadata

Reviewed By: arsenm, jhuber6

Github PR: #65410

Differential Revision: https://reviews.llvm.org/D129818
2023-09-12 13:53:31 +05:30
Matt Arsenault
4d42e8b5d1 Reapply "[CodeGen]Allow targets to use target specific COPY instructions for live range splitting"
This reverts commit a496c8be6e638ae58bb45f13113dbe3a4b7b23fd.

The workaround in c26dfc81e254c78dc23579cf3d1336f77249e1f6 should work
around the underlying problem with SUBREG_TO_REG.
2023-07-31 20:15:45 -04:00
Vitaly Buka
a496c8be6e Revert "[CodeGen]Allow targets to use target specific COPY instructions for live range splitting"
And dependent commits.

Details in D150388.

This reverts commit 825b7f0ca5f2211ec3c93139f98d1e24048c225c.
This reverts commit 7a98f084c4d121244ef7286bc6503b6a181d446e.
This reverts commit b4a62b1fa546312d882fa12dfdcd015177d66826.
This reverts commit b7836d856206ec39509d42529f958c920368166b.

No conflicts in the code, few tests had conflicts in autogenerated CHECKs:
llvm/test/CodeGen/Thumb2/mve-float32regloops.ll
llvm/test/CodeGen/AMDGPU/fix-frame-reg-in-custom-csr-spills.ll

Reviewed By: alexfh

Differential Revision: https://reviews.llvm.org/D156381
2023-07-26 22:13:32 -07:00
Christudasan Devadasan
7a98f084c4 [AMDGPU][SILowerSGPRSpills] Spill SGPRs to virtual VGPRs
Currently, the custom SGPR spill lowering pass spills
SGPRs into physical VGPR lanes and the remaining VGPRs
are used by regalloc for vector regclass allocation.
This imposes many restrictions that we ended up with
unsuccessful SGPR spilling when there won't be enough
VGPRs and we are forced to spill the leftover into
memory during PEI. The custom spill handling during PEI
has many edge cases and often breaks the compiler time
to time.

This patch implements spilling SGPRs into virtual VGPR
lanes. Since we now split the register allocation for
SGPRs and VGPRs, the virtual registers introduced for
the spill lanes would get allocated automatically in
the subsequent regalloc invocation for VGPRs.

Spill to virtual registers will always be successful,
even in the high-pressure situations, and hence it avoids
most of the edge cases during PEI. We are now left with
only the custom SGPR spills during PEI for special registers
like the frame pointer which is an unproblematic case.

Differential Revision: https://reviews.llvm.org/D124196
2023-07-07 23:14:32 +05:30
Jay Foad
5aea839ab3 [AMDGPU] Switch to backwards scavenging in eliminateFrameIndex
Frame index elimination runs backwards so we must use backwards
scavenging. Otherwise, when a scavenged register is spilled, the
scavenger will remember that the register is in use until the restore
point, but it will never reach that restore point. The result is that in
some cases it will keep scavenging different registers instead of
reusing the same one.

Differential Revision: https://reviews.llvm.org/D152394
2023-06-07 20:59:05 +01:00
Jay Foad
8fcb4fa847 [RegScavenger] Change scavengeRegister to pick registers in allocation order
This matches what scavengeRegisterBackwards does.

This is in preparation for converting most uses of scavengeRegister to
scavengeRegisterBackwards, to reduce test case churn when that lands and
to help with bisection if anything goes wrong.

Differential Revision: https://reviews.llvm.org/D150792
2023-05-19 21:39:19 +01:00
Christudasan Devadasan
a3028239a7 Revert "[AMDGPU][SILowerSGPRSpills] Spill SGPRs to virtual VGPRs"
This reverts commit 40ba0942e2ab1107f83aa5a0ee5ae2980bf47b1a.
2022-12-21 16:17:42 +05:30
Nikita Popov
bdf2fbba9c [AMDGPU] Convert some tests to opaque pointers (NFC) 2022-12-19 12:41:13 +01:00
Christudasan Devadasan
40ba0942e2 [AMDGPU][SILowerSGPRSpills] Spill SGPRs to virtual VGPRs
Currently, the custom SGPR spill lowering pass spills
SGPRs into physical VGPR lanes and the remaining VGPRs
are used by regalloc for vector regclass allocation.
This imposes many restrictions that we ended up with
unsuccessful SGPR spilling when there won't be enough
VGPRs and we are forced to spill the leftover into
memory during PEI. The custom spill handling during PEI
has many edge cases and often breaks the compiler time
to time.

This patch implements spilling SGPRs into virtual VGPR
lanes. Since we now split the register allocation for
SGPRs and VGPRs, the virtual registers introduced for
the spill lanes would get allocated automatically in
the subsequent regalloc invocation for VGPRs.

Spill to virtual registers will always be successful,
even in the high-pressure situations, and hence it avoids
most of the edge cases during PEI. We are now left with
only the custom SGPR spills during PEI for special registers
like the frame pointer which isn an unproblematic case.

This patch also implements the whole wave spills which
might occur if RA spills any live range of virtual registers
involved in the whole wave operations. Earlier, we had
been hand-picking registers for such machine operands.
But now with SGPR spills into virtual VGPR lanes, we are
exposing them to the allocator.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D124196
2022-12-17 11:56:32 +05:30
Pierre van Houtryve
a88deb4b65 [AMDGPU] Use aperture registers instead of S_GETREG
Fixes a longstanding TODO in the codebase where we were using S_GETREG + shift to do something that could simply be done with an inline constant (register).

Patch based on D31874 by @kzhuravl
Depends on D137767

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D137542
2022-11-30 12:25:10 +00:00
Ron Lieberman
ca856fff1c Revert "enable code-object-version=5"
very sorry wrong repo.

This reverts commit d882ba7aeac4b496dccd1b10cb58bd691786b691.
2022-11-29 15:21:09 -06:00
Ron Lieberman
d882ba7aea enable code-object-version=5 2022-11-29 15:11:57 -06:00
Alexander Timofeev
32bd75716c PEI should be able to use backward walk in replaceFrameIndicesBackward.
The backward register scavenger has correct register
liveness information. PEI should leverage the backward register scavenger.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D137574
2022-11-18 15:57:34 +01:00
Fangrui Song
6c7666a408 Revert D137574 "PEI should be able to use backward walk in replaceFrameIndicesBackward."
This reverts commit e05ce03cfa0b36e9b99149e21afcb1fc039df813.

Caused asan use-after-poison to 4 DebugInfo/AMDGPU/ tests.
Triggered in PEI::replaceFrameIndicesBackward called llvm::MachineInstr::getNumOperands
2022-11-15 19:19:46 +00:00
Alexander Timofeev
e05ce03cfa PEI should be able to use backward walk in replaceFrameIndicesBackward.
The backward register scavenger has correct register
liveness information. PEI should leverage the backward register scavenger.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D137574
2022-11-15 15:20:25 +01:00
Sebastian Neubauer
a5d4f82b73 [AMDGPU] Make enable-flat-scratch a subtarget feature
Use a subtarget feature instead of a command line argument to reduce
global state.
We want to enable flat scratch for graphics in some cases and this
doesn't work well with command line options.

Differential Revision: https://reviews.llvm.org/D119425
2022-02-11 18:23:07 +01:00
Matt Arsenault
68468bbe15 AMDGPU: Avoid null check during addrspacecast lowering
If we know the source is a valid object, we do not need to insert a
null check. This misses a lot of opportunities from
metadata/attributes not tracked in codegen.
2022-01-10 13:27:39 -05:00
Nico Weber
085f078307 Revert "Revert D109159 "[amdgpu] Enable selection of s_cselect_b64.""
This reverts commit 859ebca744e634dcc89a2294ffa41574f947bd62.
The change contained many unrelated changes and e.g. restored
unit test failes for the old lld port.
2022-01-05 13:10:25 -05:00
David Salinas
859ebca744 Revert D109159 "[amdgpu] Enable selection of s_cselect_b64."
This reverts commit 640beb38e7710b939b3cfb3f4c54accc694b1d30.

That commit caused performance degradtion in Quicksilver test QS:sGPU and a functional test failure in (rocPRIM rocprim.device_segmented_radix_sort).
Reverting until we have a better solution to s_cselect_b64 codegen cleanup

Change-Id: Ibf8e397df94001f248fba609f072088a46abae08

Reviewed By: kzhuravl

Differential Revision: https://reviews.llvm.org/D115960

Change-Id: Id169459ce4dfffa857d5645a0af50b0063ce1105
2022-01-05 17:57:32 +00:00
Matt Arsenault
729bf9b26b AMDGPU: Enable fixed function ABI by default
Code using indirect calls is broken without this, and there isn't
really much value in supporting the old attempt to vary the argument
placement based on uses. This resulted in more argument shuffling code
anyway.

Also have the option stop implying all inputs need to be passed. This
will no rely on the amdgpu-no-* attributes to avoid passing
unnecessary values.
2021-12-04 10:49:18 -05:00
Matt Arsenault
2959e082e1 AMDGPU: Assume all amdhsa kernarg passed implicit arguments by default
Previously we would require adding an attribute to kernels to enable
the inputs passed in the kernarg segment, accessed by
llvm.amdgcn.implicitarg.ptr. This violates the principle of being
correct by default. Some OpenMP testcases were broken recently since
it wasn't correctly setting this attribute, and no known frontends are
setting this to anything other than the maximum.

Most of the test changes are from load widening of argument loads
since there now more implied dereferenceable bytes.
2021-12-04 10:38:25 -05:00
Stanislav Mekhanoshin
476ab0f809 [AMDGPU] Fixed stack pointer init with architected flat scratch
Even if wave offset is not present we still need to do the rest
of the initialization. The mov into s32 was missing in the
kernels.

Fixes: SWDEV-310935

Differential Revision: https://reviews.llvm.org/D113628
2021-11-10 17:18:38 -08:00
Joe Nash
3ce1b9631a [AMDGPU] Switch PostRA sched to MachineSched
Use GCNHazardRecognizer in postra sched.
Updated tests for the new schedules.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D109536

Change-Id: Ia86ba2ae168f12fb34b4d8efdab491f84d936cde
2021-09-14 15:11:27 -04:00
Matt Arsenault
722b8e0e5a AMDGPU: Invert ABI attribute handling
Previously we assumed all callable functions did not need any
implicitly passed inputs, and added attributes to functions to
indicate when they were necessary. Requiring attributes for
correctness is pretty ugly, and it makes supporting indirect and
external calls more complicated.

This inverts the direction of the attributes, so an undecorated
function is assumed to need all implicit imputs. This enables
AMDGPUAttributor by default to mark when functions are proven to not
need a given input. This strips the equivalent functionality from the
legacy AMDGPUAnnotateKernelFeatures pass.

However, AMDGPUAnnotateKernelFeatures is not fully removed at this
point although it should be in the future. It is still necessary for
the two hacky amdgpu-calls and amdgpu-stack-objects attributes, which
would be better served by a trivial analysis on the IR during
selection. Additionally, AMDGPUAnnotateKernelFeatures still
redundantly handles the uniform-work-group-size attribute to be
removed in a future commit.

At this point when not using -amdgpu-fixed-function-abi, we are still
modifying the ABI based on these newly negated attributes. In the
future, this option will be removed and the locations for implicit
inputs will always be fixed. We will then use the new attributes to
avoid passing the values when unnecessary.
2021-09-09 18:24:28 -04:00
Michael Liao
640beb38e7 [amdgpu] Enable selection of s_cselect_b64.
Differential Revision: https://reviews.llvm.org/D109159
2021-09-07 10:45:07 -04:00
Sebastian Neubauer
4359b870b1 [AMDGPU] Init scratch only if necessary
If no scratch or flat instructions are used, we do not need to
initialize the flat scratch hardware register.

Differential Revision: https://reviews.llvm.org/D105920
2021-07-14 10:45:22 +02:00
Sebastian Neubauer
a12e551882 [AMDGPU] Precommit flat-scratch-init.ll test 2021-07-14 10:45:22 +02:00