154 Commits

Author SHA1 Message Date
Archibald Elliott
8e3d7cf5de [NFC][TargetParser] Remove llvm/Support/TargetParser.h 2023-02-07 11:08:21 +00:00
Ruiling Song
9119d9bfce AMDGPU/SIInsertWait: Skip dummy tied source
For D16 memory load instructions, the hardware usually only write to half
of the 32bit register, but we define the destination register using
32bit register for the MachineIR instruction. Without the extra tied
source register, LLVM framework will think previous write to the other
half of the register being dead. This is because by using 32bit register
as the destination register, LLVM will think the instruction will always
overwrite the whole 32bit register. By adding the extra tied source,
LLVM will think we are reading the register, so previous write to the
register will not be dead. This dummy tied source is introducing
unnecessary read-after-write dependency. The change here is to bypass the
tied source that can be skipped, thus avoiding an unnecessary s_waitcnt.

Reviewed by: foad

Differential Revision: https://reviews.llvm.org/D140537
2023-01-11 09:59:35 +08:00
Jay Foad
6443c0ee02 [AMDGPU] Stop using make_pair and make_tuple. NFC.
C++17 allows us to call constructors pair and tuple instead of helper
functions make_pair and make_tuple.

Differential Revision: https://reviews.llvm.org/D139828
2022-12-14 13:22:26 +00:00
Stephen Thomas
ab2e27faa4 [AMDGPU] Small cleanup in insertWaitcntInBlock()
Move some code that checks if an instruction is a waitcount into a separate
function, mainly to aid readability in the logic where it is used.

Differential Revision: https://reviews.llvm.org/D139522
2022-12-07 11:58:59 +00:00
Ruiling Song
0eaf6759ae [AMDGPU][InsertWaits] No wait for WAW for global/scratch_load
global/scratch_load will return in order they are issued. No
need to insert a s_waitcnt for WAW hazard.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D138476
2022-11-23 09:57:50 +08:00
Jay Foad
bc0c89d3d8 [AMDGPU] More cleanup after D117544. NFC. 2022-11-22 21:15:22 +00:00
Jay Foad
6ed6e8e3b8 [AMDGPU] Remove RegStrictDom variable. NFC.
D117544 removed the only substantive use of RegStrictDom. Now we can
simplify by using StrictDom for everything.
2022-11-22 17:08:00 +00:00
Jay Foad
dce7c09bd4 [AMDGPU] Define and use new allZeroWaitcnt helper. NFC. 2022-11-22 17:01:36 +00:00
Stephen Thomas
69dd9910c7 [AMDGPU] Declutter applyPreexistingWaitcnt()
Declutter applyPreexistingWaitcnt() a little by abstracting the code
that updates the operands to S_WAITCNT and S_WAITCNT_VSCNT instructions
into discrete functions.

Differential Revision: https://reviews.llvm.org/D137624
2022-11-08 11:51:16 +00:00
Pierre van Houtryve
7425077e31 [AMDGPU] Add & use hasNamedOperand, NFC
In a lot of places, we were just calling `getNamedOperandIdx` to check if the result was != or == to -1.
This is fine in itself, but it's verbose and doesn't make the intention clear, IMHO. I added a `hasNamedOperand` and replaced all cases I could find with regexes and manually.

Reviewed By: arsenm, foad

Differential Revision: https://reviews.llvm.org/D137540
2022-11-08 07:57:21 +00:00
Jay Foad
86dc6a3c0f [AMDGPU] Constify a couple of methods. NFC. 2022-11-02 11:04:54 +00:00
Stephen Thomas
c8a90316fa [AMDGPU] Small cleanups in wait counter code
A small number of cleanups and refactors intended to enhance readability in
two passes that deal with s_waitcnt instructions.

Differential Revision: https://reviews.llvm.org/D136677
2022-10-28 11:05:02 +01:00
Jay Foad
f0ca946bf9 [AMDGPU] New helper function SIInsertWaitcnts::getVmemWaitEventType
This just commons up and simplifies some logic that was repeated in
SIInsertWaitcnts::updateEventWaitcntAfter. NFCI.

Differential Revision: https://reviews.llvm.org/D136253
2022-10-19 16:22:50 +01:00
Baptiste
b556726ccc [AMDGPU] Avoid flushing the vmcnt counter in loop preheaders if not necessary
One of the conditions to flush the vmcnt counter in loop preheaders is: The loop
contains a use of a vgpr that is defined out of the loop. The code currently
checks if a waitcnt is needed by looking at the score of that vgpr in the score
brackets. This is not enough and may cause the generation of an unnecessary
vmcnt flush. This patch fixes that case.

Differential Revision: https://reviews.llvm.org/D130313
2022-09-28 13:05:50 -04:00
Austin Kerbow
2c82a126d7 [AMDGPU] Omit unnecessary waitcnt before barriers
It is not necessary to wait for all outstanding memory operations before
barriers on hardware that can back off of the barrier in the event of an
exception when traps are enabled. Add a new subtarget feature which
tracks which HW has this ability.

Reviewed By: #amdgpu, rampitec

Differential Revision: https://reviews.llvm.org/D130722
2022-07-29 11:12:36 -07:00
Baptiste Saleil
79e77a9f39 [AMDGPU] Flush the vmcnt counter in loop preheaders when necessary
waitcnt vmcnt instructions are currently generated in loop bodies before using
values loaded outside of the loop. In some cases, it is better to flush the
vmcnt counter in a loop preheader before entering the loop body. This patch
detects these cases and generates waitcnt instructions to flush the counter.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D115747
2022-06-23 10:53:21 -04:00
Joe Nash
2a68364745 [AMDGPU] gfx11 waitcnt support for VINTERP and LDSDIR instructions
Reviewed By: rampitec, #amdgpu

Differential Revision: https://reviews.llvm.org/D127781
2022-06-17 09:30:37 -04:00
Jay Foad
6c372daa84 [AMDGPU] New GFX11 intrinsic llvm.amdgcn.s.sendmsg.rtn
Add new intrinsic and codegen support for the s_sendmsg_rtn_b32 and
s_sendmsg_rtn_b64 instructions.

Differential Revision: https://reviews.llvm.org/D127315
2022-06-10 08:15:23 +01:00
Joe Nash
d21b9b4946 [AMDGPU] gfx11 scalar alu instructions
MC layer support for SOP(scalar alu operations) including encoding
support for s_delay_alu and s_sendmsg_rtn.

Contributors:
Jay Foad <jay.foad@amd.com>

Patch 7/N for upstreaming of AMDGPU gfx11 architecture.

Depends on D125319

Reviewed By: #amdgpu, arsenm

Differential Revision: https://reviews.llvm.org/D125498
2022-05-17 13:35:41 -04:00
Stanislav Mekhanoshin
791ec1c68e [AMDGPU] Add intrinsics llvm.amdgcn.{raw|struct}.buffer.load.lds
Differential Revision: https://reviews.llvm.org/D124884
2022-05-17 10:32:13 -07:00
Stanislav Mekhanoshin
51e02409f0 [AMDGPU] Produce waitcounts for LDS DMA
MUBUF and FLAT LDS DMA operations need a wait on vmcnt before LDS written
can be accessed. A load from LDS to VMEM does not need a wait.

Differential Revision: https://reviews.llvm.org/D124626
2022-04-29 11:14:11 -07:00
Austin Kerbow
7f97ac94f7 Revert "[AMDGPU] Omit unnecessary waitcnt before barriers"
This reverts commit 8d0c34fd4fb66ea0d19563154a59658e4b7f35d4.
2022-04-18 21:24:08 -07:00
Venkata Ramanaiah Nalamothu
04fff547e2 [AMDGPU] Move call clobbered return address registers s[30:31] to callee saved range
Currently the return address ABI registers s[30:31], which fall in the call
clobbered register range, are added as a live-in on the function entry to
preserve its value when we have calls so that it gets saved and restored
around the calls.

But the DWARF unwind information (CFI) needs to track where the return address
resides in a frame and the above approach makes it difficult to track the
return address when the CFI information is emitted during the frame lowering,
due to the involvment of understanding the control flow.

This patch moves the return address ABI registers s[30:31] into callee saved
registers range and stops adding live-in for return address registers, so that
the CFI machinery will know where the return address resides when CSR
save/restore happen during the frame lowering.

And doing the above poses an issue that now the return instruction uses undefined
register `sgpr30_sgpr31`. This is resolved by hiding the return address register
use by the return instruction through the `SI_RETURN` pseudo instruction, which
doesn't take any input operands, until the `SI_RETURN` pseudo gets lowered to the
`S_SETPC_B64_return` during the `expandPostRAPseudo()`.

As an added benefit, this patch simplifies overall return instruction handling.

Note: The AMDGPU CFI changes are there only in the downstream code and another
version of this patch will be posted for review for the downstream code.

Reviewed By: arsenm, ronlieb

Differential Revision: https://reviews.llvm.org/D114652
2022-03-09 12:18:02 +05:30
Austin Kerbow
8d0c34fd4f [AMDGPU] Omit unnecessary waitcnt before barriers
It is not necessary to wait for all outstanding memory operations before
barriers on hardware that can back off of the barrier in the event of an
exception when traps are enabled. Add a new subtarget feature which
tracks which HW has this ability.

Reviewed By: #amdgpu, rampitec

Differential Revision: https://reviews.llvm.org/D120544
2022-03-07 08:23:53 -08:00
Sebastian Neubauer
6527b2a4d5 [AMDGPU][NFC] Fix typos
Fix some typos in the amdgpu backend.

Differential Revision: https://reviews.llvm.org/D119235
2022-02-18 15:05:21 +01:00
Joe Nash
c87c61c52c [AMDGPU] Fix AGPR offset for waitcnt
An enum value stores the offset between AGPR ranges and VGPR
ranges in the internal storage of SIInsertWaitcnts. It said 226 when
it should say 256, causing some portion of the ranges to overlap. That
in turn causes 'aliasing' between the registers, potentially inserting
waitcnts that are not required.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D119749
2022-02-14 15:16:21 -05:00
Sebastian Neubauer
f1e36474b9 [AMDGPU][NFC] Fix debug prints
Print the instructions instead of pointers.
2022-01-24 13:55:00 +01:00
Piotr Sobczak
8dfb417e67 [AMDGPU] Fix missing waitcnt issue
Ignore out of order counters when merging brackets. The fact that
there was a pending event in the old state does not guarantee that
the waitcnt was generated, so we still need to conservatively re-process
the block.

The patch fixes a correctness issue where the block was not re-processed
and the waitcnt not inserted in consequence.

Differential Revision: https://reviews.llvm.org/D117544
2022-01-19 10:54:44 +01:00
Ron Lieberman
09b53296cf Revert "[AMDGPU] Move call clobbered return address registers s[30:31] to callee saved range"
This reverts commit 9075009d1fd5f2bf9aa6c2f362d2993691a316b3.

 Failed amdgpu runtime buildbot # 3514
2021-12-22 11:39:28 -05:00
RamNalamothu
9075009d1f [AMDGPU] Move call clobbered return address registers s[30:31] to callee saved range
Currently the return address ABI registers s[30:31], which fall in the call
clobbered register range, are added as a live-in on the function entry to
preserve its value when we have calls so that it gets saved and restored
around the calls.

But the DWARF unwind information (CFI) needs to track where the return address
resides in a frame and the above approach makes it difficult to track the
return address when the CFI information is emitted during the frame lowering,
due to the involvment of understanding the control flow.

This patch moves the return address ABI registers s[30:31] into callee saved
registers range and stops adding live-in for return address registers, so that
the CFI machinery will know where the return address resides when CSR
save/restore happen during the frame lowering.

And doing the above poses an issue that now the return instruction uses undefined
register `sgpr30_sgpr31`. This is resolved by hiding the return address register
use by the return instruction through the `SI_RETURN` pseudo instruction, which
doesn't take any input operands, until the `SI_RETURN` pseudo gets lowered to the
`S_SETPC_B64_return` during the `expandPostRAPseudo()`.

As an added benefit, this patch simplifies overall return instruction handling.

Note: The AMDGPU CFI changes are there only in the downstream code and another
version of this patch will be posted for review for the downstream code.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D114652
2021-12-22 20:51:12 +05:30
Jakub Kuderski
1e93f3895f [AMDGPU] Use enum_seq to iterator over InstCounterTypes. NFC.
Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D115900
2021-12-18 16:07:28 -05:00
Jakub Kuderski
d9ae852fcc [AMDGPU] Fix data race in SIInsertWaitcnts
The race condition happened when two pass managers ran on two different modules but modified/read the global variables.

To address this, I considered using singletons and freestanding functions to allow getting/setting `HardwareLimits` and `RegisterEncoding`, or making it local to the pass. I chose the latter and made it a member of `WaitcntsBrackets`, to minimizes the amount of global state.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D115896
2021-12-18 16:03:09 -05:00
Jay Foad
6a7db0dc8e [AMDGPU] Skip some work on subtargets without scalar stores. NFC. 2021-12-15 12:46:33 +00:00
David Stuttard
0e8590f065 [AMDGPU] Add support for in-order bvh in waitcnt pass
bvh should be handled separately from vmem and vmem with sampler instructions
for waitcnt handling.

Differential Revision: https://reviews.llvm.org/D114794
2021-12-02 14:26:11 +00:00
Kazu Hirata
ee0133dc6d [llvm] Use range-for loops (NFC) 2021-11-16 09:01:56 -08:00
Neubauer, Sebastian
d1f45ed58f [AMDGPU][NFC] Fix typos
Differential Revision: https://reviews.llvm.org/D113672
2021-11-12 11:37:21 +01:00
Thomas Symalla
76cbe62262 [AMDGPU] Changes the AMDGPU_Gfx calling convention by making the SGPRs 4..29 callee-save. This is to avoid superfluous s_movs when executing amdgpu_gfx function calls as the callee is likely not going to change the argument values.
This patch changes the AMDGPU_Gfx calling convention. It defines the SGPR registers s[4:29] as callee-save and leaves some SGPRs usable for callers. The intention is to avoid unneccessary s_mov instructions for arguments the caller would otherwise save and restore in these registers.

Reviewed By: sebastian-ne

Differential Revision: https://reviews.llvm.org/D111637
2021-11-04 21:50:18 +01:00
Piotr Sobczak
a4db7025a9 [AMDGPU] Remove assert
Remove assert introduced in D101177, following post-commit feedback.
2021-05-12 14:52:37 +02:00
Piotr Sobczak
68137ef568 [AMDGPU] Skip invariant loads when avoiding WAR conflicts
No need to handle invariant loads when avoiding WAR conflicts, as
there cannot be a vector store to the same memory location.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D101177
2021-05-12 10:57:05 +02:00
Austin Kerbow
4433f4601e [AMDGPU] Fix extra waitcnt being added with BUFFER_INVL2
The waitcnt pass would increment the number of vmem events for some buffer
invalidates that were not handled by the pass.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D102252
2021-05-11 13:17:33 -07:00
Austin Kerbow
6617a5a5ea [AMDGPU] Move insertion of function entry waitcnt later
This allows tracking these as preexisting waitcnt.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D101380
2021-05-05 17:58:38 -07:00
Austin Kerbow
f5199d7ae0 [AMDGPU] Revise handling of preexisting waitcnt
Preexisting waitcnt may not update the scoreboard if the instruction
being examined needed to wait on fewer counters than what was encoded in
the old waitcnt instruction. Fixing this results in the elimination of
some redudnat waitcnt.

These changes also enable combining consecutive waitcnt into a single
S_WAITCNT or S_WAITCNT_VSCNT instruction.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D100281
2021-05-05 17:21:33 -07:00
Stanislav Mekhanoshin
a8d9d50762 [AMDGPU] gfx90a support
Differential Revision: https://reviews.llvm.org/D96906
2021-02-17 16:01:32 -08:00
Stanislav Mekhanoshin
5cf9292ce3 [AMDGPU] Add two TSFlags: IsAtomicNoRtn and IsAtomicRtn
We are using AtomicNoRet map in multiple places to determine
if an instruction atomic, rtn or nortn atomic. This method
does not work always since we have some instructions which
only has rtn or nortn version.

One such instruction is ds_wrxchg_rtn_b32 which does not have
nortn version. This has caused changes in memory legalizer
tests.

Differential Revision: https://reviews.llvm.org/D96639
2021-02-15 11:27:59 -08:00
Dmitry Preobrazhensky
745064e36b [AMDGPU][MC] Refactored exp tgt handling
Summary:
- Separated tgt encoding from parsing;
- Separated tgt decoding from printing;
- Improved errors handling;
- Disabled leading zeroes in index. The following code is no longer accepted: exp pos00 v3, v2, v1, v0

Reviewers: arsenm, rampitec, foad

Differential Revision: https://reviews.llvm.org/D95216
2021-01-26 14:54:15 +03:00
dfukalov
560d7e0411 [NFC][AMDGPU] Split AMDGPUSubtarget.h to R600 and GCN subtargets
... to reduce headers dependency.

Reviewed By: rampitec, arsenm

Differential Revision: https://reviews.llvm.org/D95036
2021-01-20 22:22:45 +03:00
dfukalov
6a87e9b08b [NFC][AMDGPU] Reduce include files dependency.
Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D93813
2021-01-07 22:22:05 +03:00
Jay Foad
7ecf19697e [AMDGPU] Fix and extend vccz workarounds
We have workarounds for two different cases where vccz can get out of
sync with the value in vcc. This fixes them in two ways:

1. Fix the case where the def of vcc was in a previous basic block, by
pessimistically assuming that vccz might be incorrect at a basic block
boundary.

2. Fix the handling of pre-existing waitcnt instructions by calling
generateWaitcntInstBefore before examining ScoreBrackets to determine
whether there's an outstanding smem read operation.

Differential Revision: https://reviews.llvm.org/D91636
2020-11-18 15:26:06 +00:00
Jay Foad
a6ecb2eb3d [AMDGPU] Add comments. NFC. 2020-11-16 16:34:13 +00:00
Jay Foad
6881a82e8c [AMDGPU] Fix scheduling of exp pos4
Also fix a similar issue in SIInsertWaitcnts, but I don't think that fix
has any effect in practice.

Differential Revision: https://reviews.llvm.org/D91290
2020-11-12 19:57:14 +00:00