209 Commits

Author SHA1 Message Date
Stanislav Mekhanoshin
e0a16371c6
[AMDGPU] Omit isReg() check for all_uses() in SIInsertWaitcnts. NFC. (#109041) 2024-09-18 00:08:23 -07:00
Stanislav Mekhanoshin
731a68383f
[AMDGPU] Refine operand iterators in the SIInsertWaitcnts. NFCI. (#108884) 2024-09-17 02:58:08 -07:00
Stanislav Mekhanoshin
18f1c980bc
[AMDGPU] Avoid unneeded waitcounts before spill stores (#108303)
Implicit defs and uses on spill stores were accounted as real defs and
uses, while only exist for liveness accounting. As a result unneded
waits were generated.

Fixes: SWDEV-484177
2024-09-14 02:22:28 -07:00
Kazu Hirata
78505ade2c
[AMDGPU] Use range-based for loops (NFC) (#106184) 2024-08-27 06:46:01 -07:00
Jay Foad
fa2dccb377
[AMDGPU] Remove one case of vmcnt loop header flushing for GFX12 (#105550)
When a loop contains a VMEM load whose result is only used outside the
loop, do not bother to flush vmcnt in the loop head on GFX12. A wait for
vmcnt will be required inside the loop anyway, because VMEM instructions
can write their VGPR results out of order.
2024-08-23 10:31:33 +01:00
Jay Foad
5506831f7b
[AMDGPU] GFX12 VMEM loads can write VGPR results out of order (#105549)
Fix SIInsertWaitcnts to account for this by adding extra waits to avoid
WAW dependencies.
2024-08-22 11:46:51 +01:00
Alexis Engelke
8cae9dcd4a
[AMDGPU] Clear load addresses between functions (#102515)
SLoadAddresses previously held data across different functions and used
these for dominance queries of blocks in different functions. This is
not intended; clear the state at the end of the pass.
2024-08-08 21:26:17 +02:00
Carl Ritson
62aa596ba1
[AMDGPU] Add no return image_sample intrinsics and instructions (#97542)
An appropriately configured image resource descriptor can trigger
image_sample instructions to store outputs directly to a linked memory
location instead of returning to VGPRs.

This is opaque to the backend as instruction encoding is unchanged;
however, a mechanism is require to allow frontends to communicate that
these instructions do not require destination VGPRs and store to memory.
Flagging these as stores means they will not be optimized away.
2024-07-20 17:26:58 +09:00
Jay Foad
80d261493e [AMDGPU] clang-tidy: use override consistently. NFC. 2024-07-16 15:55:39 +01:00
Jay Foad
63a1242ae3 [AMDGPU] clang-tidy: define trivial constructors with = default. NFC. 2024-07-16 15:41:54 +01:00
paperchalice
79d0de2ac3
[CodeGen][NewPM] Port machine-loops to new pass manager (#97793)
- Add `MachineLoopAnalysis`.
- Add `MachineLoopPrinterPass`.
- Convert to `MachineLoopInfoWrapperPass` in legacy pass manager.
2024-07-09 09:11:18 +08:00
paperchalice
4b24c2dfb5
[CodeGen][NewPM] Split MachinePostDominators into a concrete analysis result (#95113)
`MachinePostDominators` version of #94571.
2024-06-12 14:29:22 +08:00
Jay Foad
558f3ea4ae
[AMDGPU] Remove #if 0 code for indexed resources in SIInsertWaitcnts (#92905)
I do not understand what optimization this was supposed to implement.
It has never been enabled. I suspect it no longer applies to GCN/RDNA
architectures.
2024-05-21 13:51:42 +01:00
Jay Foad
4e86b0006b
[AMDGPU] Remove #if 0 code for buffer stores in SIInsertWaitcnts (#92903) 2024-05-21 13:33:49 +01:00
Jay Foad
f3aaaafe50
[AMDGPU] Remove #if 0 code for fences in SIInsertWaitcnts (#92902)
We insert required waits for fences in SIMemoryLegalizer.
2024-05-21 13:33:20 +01:00
Nicolai Hähnle
ec1f28dc97
AMDGPU/gfx12: avoid crashing on legacy waitcnt intrinsics (#92306)
They *are* still accepted by the HW but have a conservative effect.

Leave them untouched since handling them would complicate the logic a
bit, and developers who code to such a low level really need to revisit
what they're doing anyway.
2024-05-15 22:23:18 +02:00
David Stuttard
f898161bfa
[AMDGPU] Fix image_msaa_load waitcnt insertion for pre-gfx12 (#90710)
https://github.com/llvm/llvm-project/pull/90201 made some fixes for
gfx12
image_msaa_load waitcnt insertion.
That fix might break in some situations for pre-gfx12 - this fixes that
by
explitly checking for VSAMPLE which always requires a s_wait_samplecnt
and
leaves the previous logic intact for non-gfx12.
2024-05-01 11:37:57 +01:00
David Stuttard
5fb1e2825f
[AMDGPU] Enhance s_waitcnt insertion before barrier for gfx12 (#90595)
Code to determine if a waitcnt is required before a barrier instruction
only
considered S_BARRIER.
gfx12 adds barrier_signal/wait so need to enhance the existing code to
look for
a barrier start (which is just an S_BARRIER for earlier architectures).
2024-05-01 11:37:13 +01:00
Jay Foad
0b21b25eac
[AMDGPU] Do not optimize away pre-existing waitcnt instructions at -O0 (#90716)
The autogenerated memory legalizer tests use -O0 so this allows us to
see the exact waitcnts that were inserted by the memory legalizer
without them being optimized away.
2024-05-01 11:29:11 +01:00
David Stuttard
62dea99a7d
[AMDGPU] Fix gfx12 waitcnt type for image_msaa_load (#90201)
image_msaa_load is actually encoded as a VSAMPLE instruction and
requires the appropriate waitcnt variant.
2024-04-30 10:41:51 +01:00
Xu Zhang
f6d431f208
[CodeGen] Make the parameter TRI required in some functions. (#85968)
Fixes #82659

There are some functions, such as `findRegisterDefOperandIdx` and  `findRegisterDefOperand`, that have too many default parameters. As a result, we have encountered some issues due to the lack of TRI  parameters, as shown in issue #82411.

Following @RKSimon 's suggestion, this patch refactors 9 functions, including `{reads, kills, defines, modifies}Register`,  `registerDefIsDead`, and `findRegister{UseOperandIdx, UseOperand, DefOperandIdx, DefOperand}`, adjusting the order of the TRI parameter and making it required. In addition, all the places that call these functions have also been updated correctly to ensure no additional impact.

After this, the caller of these functions should explicitly know whether to pass the `TargetRegisterInfo` or just a `nullptr`.
2024-04-24 14:24:14 +01:00
Emma Pilkington
16fed31f44
[AMDGPU] Fix debug line table for MSG_DEALLOC_VGPRS optimization (#88924)
Deallocating VGPRs interferes with doing a context save, which is needed for GDB
to report a breakpoint. So, in this sequence:

  s_sendmsg MSG_DEALLOC_VGPRS
  s_endpgm

We now use the debug location of the s_endpgm for the s_sendmsg, so a breakpoint
set in the debugger at the end of a shader will be hit before deallocating VGPRs.
2024-04-18 09:29:32 -04:00
Jun Wang
86842e1f72
[AMDGPU] New clang option for emitting a waitcnt instruction after each memory instruction (#79236)
This patch introduces a new command-line option for clang, namely,
amdgpu-precise-mem-op (or precise-memory in the backend). When this option is specified, a waitcnt
instruction is generated after each memory load/store instruction. The
counter values are always 0, but which counters are involved depends on
the memory instruction.

---------

Co-authored-by: Jun Wang <jun.wang7@amd.com>
2024-04-10 10:47:04 -07:00
Jay Foad
3cf539fb04
[AMDGPU] Combine or remove redundant waitcnts at the end of each MBB (#87539)
Call generateWaitcnt unconditionally at the end of
SIInsertWaitcnts::insertWaitcntInBlock. Even if we don't need to
generate a new waitcnt instruction it has the effect of combining or
removing redundant waitcnts that were already present. Tests show
various small improvements in waitcnt placement.
2024-04-04 10:14:16 +01:00
Christudasan Devadasan
c54f22f5fe
[AMDGPU] Add eventMask function in WaitcntGenerator class (NFC) (#85210)
This would bring a cleaner interface while obtaining wait event masks by
combining various wait event types in the derived classes.
2024-03-15 10:22:58 +05:30
vangthao95
f37c6d55c6
[AMDGPU][NFC] Refactor SIInsertWaitcnts zero waitcnt generation (#82575)
Move the allZero* waitcnt generation methods into WaitcntGenerator
class.
2024-02-22 15:55:26 -08:00
Jie Fu
779af9b713 [AMDGPU] Fix -Wunused-variable in SIInsertWaitcnts.cpp (NFC)
llvm-project/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp:1539:10:
 error: unused variable 'SWaitInst' [-Werror,-Wunused-variable]
    auto SWaitInst =
         ^
1 error generated.
2024-01-18 19:28:48 +08:00
Jay Foad
ba52f06f9d
[AMDGPU] CodeGen for GFX12 S_WAIT_* instructions (#77438)
Update SIMemoryLegalizer and SIInsertWaitcnts to use separate wait
instructions per counter (e.g. S_WAIT_LOADCNT) and split VMCNT into
separate LOADCNT, SAMPLECNT and BVHCNT counters.
2024-01-18 10:47:45 +00:00
Stanislav Mekhanoshin
021def6c22
[AMDGPU] Use alias info to relax waitcounts for LDS DMA (#74537)
LDA DMA loads increase VMCNT and a load from the LDS stored must wait on
this counter to only read memory after it is written. Wait count
insertion pass does not track memory dependencies, it tracks register
dependencies. To model the LDS dependency a pseudo register is used in
the scoreboard, acting like if LDS DMA writes it and LDS load reads it.

This patch adds 8 more pseudo registers to use for independent LDS
locations if we can prove they are disjoint using alias analysis.

Fixes: SWDEV-433427
2024-01-17 23:44:15 -08:00
Jay Foad
36ef291d63
[AMDGPU] Fix hang caused by VS_CNT handling at calls (#78318)
Fix a potential hang introduced by #77439 and #77935. This line:

  setScoreUB(VS_CNT, getScoreLB(VS_CNT) + getWaitCountMax(VS_CNT));

could potentialy set UB lower than it was before, which confused
SIInsertWaitcnts's fixed point algorithm.

This was only triggered a STORE instruction with an implicit-def, which
seems odd but apparently happens for some spills.
2024-01-17 10:24:29 +00:00
Jay Foad
9d8e53818d
[AMDGPU] Refactor getNonSoftWaitcntOpcode and its callers (#77933)
This avoids listing all soft waitcnt opcodes in two places
(getNonSoftWaitcntOpcode and isSoftWaitcnt) and avoids the need for
helpers isWaitcnt and isWaitcntVsCnt.
2024-01-12 17:12:09 +00:00
Jay Foad
dec74a8347
[AMDGPU] Fix VS_CNT overflow assertion (#77935)
Always set the upper bound for VS_CNT higher than the lower bound.
Before #77439 this code was only executed on function entry where the
lower bound was 0 so it was not a problem.

Fixes #77931
2024-01-12 17:11:19 +00:00
Diana Picus
16945bc16d
[AMDGPU] Don't send DEALLOC_VGPRs after calls (#77439)
Calls do not have to wait for VsCnt, so after they return there might
still be scratch stores in progress. It's important that we don't send
the DEALLOC_VGPR message in that case, since that might release the
VGPRs and scratch allocation before those stores are complete.
2024-01-11 09:14:52 +01:00
Mirko Brkušanin
7ca4473dd9
[AMDGPU] Add new cache flushing instructions for GFX12 (#76944)
Co-authored-by: Diana Picus <Diana-Magda.Picus@amd.com>
2024-01-08 14:06:58 +00:00
Jay Foad
7e5019e82b
[AMDGPU] Simplify WaitcntBrackets::getRegInterval with getPhysRegBaseClass (#74087)
This means that getRegInterval no longer depends on the MCInstrDesc, so
it could be simplified further to take just a MachineOperand or just a
physical register. NFCI.
2023-12-18 14:16:02 +00:00
Jie Fu
f0b44ce28e [AMDGPU] Fix -Wunused-variable in SIInsertWaitcnts.cpp (NFC)
llvm-project/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp:1322:10:
 error: unused variable 'SWaitInst' [-Werror,-Wunused-variable]
    auto SWaitInst =
         ^
llvm-project/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp:1334:10:
 error: unused variable 'SWaitInst' [-Werror,-Wunused-variable]
    auto SWaitInst = BuildMI(Block, It, DL, TII->get(AMDGPU::S_WAITCNT_VSCNT))
         ^
2 errors generated.
2023-12-15 20:00:18 +08:00
Pierre van Houtryve
ef067f5204
[AMDGPU][SIInsertWaitcnts] Do not add s_waitcnt when the counters are known to be 0 already (#72830)
Co-authored-by: Juan Manuel MARTINEZ CAAMAÑO <juamarti@amd.com>
2023-12-15 12:33:32 +01:00
Pierre van Houtryve
f1ea77f7be
[AMDGPU][SIInsertWaitcnts] Set initial state for VS_CNT in non-kernel functions (#75436)
Split from #72830
2023-12-15 08:31:14 +01:00
Stanislav Mekhanoshin
c6ecbcb48b
[AMDGPU] Fix no waitcnt produced between LDS DMA and ds_read on gfx10 (#75245)
BUFFER_LOAD_DWORD_LDS was incorrectly touching vscnt instead of the
vmcnt. This is VMEM load and DS store, so it shall use vmcnt.
2023-12-13 10:49:36 -08:00
Mariusz Sikora
7f55d7de1a
[AMDGPU] GFX12: Add Split Workgroup Barrier (#74836)
Co-authored-by: Vang Thao <Vang.Thao@amd.com>
2023-12-13 15:01:13 +01:00
Ivan Kosarev
d1e3d32088
[AMDGPU][NFCI] Decouple actual register encodings from HWEncoding values. (#69452)
The HWEncoding values currently form a strange mix of actual register
codes for some subtargets and types of operands and informational flags.
This patch removes the dependency allowing arbitrary changes in the
structure of HWEncoding values without breaking register encodings.

Such changes, in turn, would make it possible to speed up and simplify
getAVOperandEncoding() testing for AGPRs as well as other functions
dealing with register codes downstream. They would also allow to
maintain the same format of HWEncoding values across our downstream code
bases, thus simplifying merging in mainline changes.
2023-10-25 13:24:50 +01:00
Ivan Kosarev
637dfc5f9a [AMDGPU][True16] Support disassembling .h registers.
Differential Revision: https://reviews.llvm.org/D156939
2023-09-27 12:02:50 +01:00
Juan Manuel MARTINEZ CAAMAÑO
356494c36e [NFC][AMDGPU] Perform a single lookup in map in SIInsertWaitcnts::isPreheaderToFlush 2023-09-20 14:02:04 +02:00
Arthur Eubanks
0a1aa6cda2
[NFC][CodeGen] Change CodeGenOpt::Level/CodeGenFileType into enum classes (#66295)
This will make it easy for callers to see issues with and fix up calls
to createTargetMachine after a future change to the params of
TargetMachine.

This matches other nearby enums.

For downstream users, this should be a fairly straightforward
replacement,
e.g. s/CodeGenOpt::Aggressive/CodeGenOptLevel::Aggressive
or s/CGFT_/CodeGenFileType::
2023-09-14 14:10:14 -07:00
Luke Drummond
ce0d16f574 [NFC][AMDGPU] assert scoreboard index is in range
`getRegInterval` can theoretically return AGPRs or SGPRS which aren't
valid when updating the VgprMemTypes array. Make this clear with an
assert.
2023-08-25 13:30:06 +01:00
Jay Foad
3091bdb86d [AMDGPU] Do not release VGPRs at -O0
This was an oversight when the GFX11 early release VGPRs optimization
was reimplemented in D153279.

Sending the DEALLOC_VGPRS message is a performance optimization so there
is no need to do it at -O0. In addition it makes some kinds of post
mortem debugging hard or impossible, since VGPR values are no longer
available to inspect at the s_endpgm instruction.

Differential Revision: https://reviews.llvm.org/D157599
2023-08-10 14:58:06 +01:00
Jay Foad
e61ca23289 [AMDGPU] Add and use SIInstrFlags::GWS. NFC.
This reduces the number of places where we have to check for a list of
DS_GWS_* opcodes.

Differential Revision: https://reviews.llvm.org/D157099
2023-08-07 12:05:14 +01:00
Jay Foad
7fa7a08f21 [AMDGPU] Insert s_nop before s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
Differential Revision: https://reviews.llvm.org/D155681
2023-07-19 10:33:11 +01:00
Jay Foad
f2c164c815 [AMDGPU] Do not wait for vscnt on function entry and return
SIInsertWaitcnts inserts waitcnt instructions to resolve data
dependencies. The GFX10+ vscnt (VMEM store count) counter is never used
in this way. It is only used to resolve memory dependencies, and that is
handled by SIMemoryLegalizer. Hence there is no need to conservatively
wait for vscnt to be 0 on function entry and before returns.

Differential Revision: https://reviews.llvm.org/D153537
2023-07-04 12:22:38 +01:00
Jay Foad
98c4ab146b [AMDGPU] Simplify BlockInfo in SIInsertWaitcnts. NFC. 2023-06-20 20:10:12 +01:00