llvm-project

Author	SHA1	Message	Date
Austin Kerbow	e75f586b81	[AMDGPU] Relax lds dma waitcnt with no aliasing pair (#131842 ) If we cannot find any lds DMA instruction that is aliased by some load from lds, we will still insert vmcnt(0). This is overly cautious since handling inter-thread dependences is normally managed by the memory model instead of the waitcnt pass, so this change updates the behavior to be more in line with how other types of memory events are handled.	2025-03-24 10:38:47 -07:00
Akshat Oke	f10dc76f03	[AMDGPU][NPM] Port SIInsertWaitcnts to NPM (#130061 )	2025-03-24 21:36:45 +05:30
Kazu Hirata	3c2731ce46	[AMDGPU] Avoid repeated hash lookups (NFC) (#132657 )	2025-03-23 22:30:09 -07:00
Stephen Thomas	2e3fa4ba9e	[AMDGPU] Insert before and after instructions that always use GDS (#131338 ) It is an architectural requirement that there must be no outstanding GDS instructions when an "always GDS" instruction is issued, and also that an always GDS instruction must be allowed to complete. Insert waits on DScnt/LGKMcnt prior to (if necessary) and subsequent to (unconditionally) any always GDS instruction, and an additional S_NOP if the subsequent wait is followed by S_ENDPGM. Always GDS instructions are GWS instructions, DS_ORDERED_COUNT, DS_ADD_GS_REG_RTN, and DS_SUB_GS_REG_RTN (the latter two are considered always GDS as of this patch).	2025-03-21 09:33:04 +00:00
Diana Picus	8a53324aa5	[AMDGPU] Deallocate VGPRs before exiting in dynamic VGPR mode (#130037 ) In dynamic VGPR mode, Waves must deallocate all VGPRs before exiting. If the shader program does not do this, hardware inserts `S_ALLOC_VGPR 0` before S_ENDPGM, but this may incur some performance cost. Therefore it's better if the compiler proactively generates that instruction. This patch extends `si-insert-waitcnts` to deallocate the VGPRs via a `S_ALLOC_VGPR 0` before any `S_ENDPGM` when in dynamic VGPR mode.	2025-03-19 09:00:36 +01:00
Brox Chen	222b99d3aa	[AMDGPU][True16][CodeGen] update waitcnt for true16 (#128927 ) update waitcnt pass to check hi16 and lo16 in true16 mode --------- Co-authored-by: Jay Foad <jay.foad@gmail.com>	2025-03-11 10:59:51 -04:00
Jay Foad	d6c0839c9c	[AMDGPU] Reduce size of SGPR arrays in SIInsertWaitcnts. NFC. (#130097 )	2025-03-06 13:44:16 +00:00
Dmitri Gribenko	4e6721b70d	[llvm] Fix an unused variable warning	2025-03-06 14:43:13 +01:00
Jay Foad	59e0704a52	[AMDGPU] Remove RegisterEncoding from SIInsertWaitcnts. NFC. (#130056 ) The information in this struct seemed useless. VGPR0 and SGPR0 were always 0. VGPRL and SGPRL were only used in assertions.	2025-03-06 12:50:00 +00:00
Mariusz Sikora	cd3acd1bff	[AMDGPU] Remove unused s_barrier_{init,join,leave} instructions (#129548 )	2025-03-04 17:52:43 +01:00
Rahul Joshi	bee9664970	[TableGen] Emit OpName as an enum class instead of a namespace (#125313 ) - Change InstrInfoEmitter to emit OpName as an enum class instead of an anonymous enum in the OpName namespace. - This will help clearly distinguish between values that are OpNames vs just operand indices and should help avoid bugs due to confusion between the two. - Rename OpName::OPERAND_LAST to NUM_OPERAND_NAMES. - Emit declaration of getOperandIdx() along with the OpName enum so it doesn't have to be repeated in various headers. - Also updated AMDGPU, RISCV, and WebAssembly backends to conform to the new definition of OpName (mostly mechanical changes).	2025-02-12 08:19:30 -08:00
Stanislav Mekhanoshin	8a20c6459e	[AMDGPU] Create new option for force flush load counter (#124974 ) In certain situations it is beneficial to wait for all outstanding loads regardless of which specific load's data we need. This may reduce the number of cache requests. Fixes: SWDEV-511507	2025-01-30 11:14:38 -08:00
Kazu Hirata	be187369a0	[AMDGPU] Remove unused includes (NFC) (#116154 ) Identified with misc-include-cleaner.	2024-11-13 21:10:03 -08:00
Stanislav Mekhanoshin	3277c7cd28	[AMDGPU] Skip VGPR deallocation for waveslot limited kernels (#112765 ) MSG_DEALLOC_VGPRS slows down very small waveslot limited kernels. It's been identified this message is only really needed for VGPR limited kernels. A kernel becomes VGPR limited if a total number of VGPRs per SIMD / number of used VGPRs is more than a number of wave slots.	2024-10-21 09:39:52 -07:00
Shilei Tian	a74659445d	[AMDGPU] Skip terminators when forcing emit zero flag (#112116 ) When forcing emit zero, we need to skip terminators of a MBB; otherwise the terminator list of the MBB would be broken.	2024-10-14 11:46:18 -04:00
Jay Foad	cbc4be2dd5	[AMDGPU] Use MachineInstr::mayLoadOrStore. NFC.	2024-10-14 15:37:56 +01:00
Shilei Tian	ed77df56f2	[NFC] clang-format llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp	2024-10-14 00:57:01 -04:00
Shilei Tian	3da7d55b35	[NFC][AMDGPU] Remove unnecessary member `ForceEmitZeroWaitcnts` (#112114 ) We can use `ForceEmitZeroFlag` directly.	2024-10-14 00:54:16 -04:00
Jay Foad	8d13e7b8c3	[AMDGPU] Qualify auto. NFC. (#110878 ) Generated automatically with: $ clang-tidy -fix -checks=-*,llvm-qualified-auto $(find lib/Target/AMDGPU/ -type f)	2024-10-03 13:07:54 +01:00
Jay Foad	e64ef74e64	[AMDGPU] Remember to clear a DenseMap between runs of SIInsertWaitcnts (#110650 ) This caused nondeterministic codegen in some cases.	2024-10-02 10:07:54 +01:00
Gang Chen	c66dee4c6b	[AMDGPU] Refactor several functions for merging with downstream work. (#110562 ) For setScore, the root function is setScoreByInterval with RegInterval input For determineWait, the root function is determineWait with RegInterval input	2024-10-01 08:28:55 -07:00
Stanislav Mekhanoshin	4f90e75bdc	[AMDGPU] Do not count implicit VGPRs in SIInsertWaitcnts (#109049 ) When generating waitcounts before a use or def, skip implicit VGPRs. We never have real implicit VGPR operands on memory instructions; they are only there for super-reg liveness accounting. Some other instructions (MOVRELS as an example) may have real implicit VGPR uses though. This is less than ideal, but most of the problems are observed with spills.	2024-09-25 00:41:49 -07:00
Stanislav Mekhanoshin	e0a16371c6	[AMDGPU] Omit isReg() check for all_uses() in SIInsertWaitcnts. NFC. (#109041 )	2024-09-18 00:08:23 -07:00
Stanislav Mekhanoshin	731a68383f	[AMDGPU] Refine operand iterators in the SIInsertWaitcnts. NFCI. (#108884 )	2024-09-17 02:58:08 -07:00
Stanislav Mekhanoshin	18f1c980bc	[AMDGPU] Avoid unneeded waitcounts before spill stores (#108303 ) Implicit defs and uses on spill stores were accounted as real defs and uses, while they only exist for liveness accounting. As a result, unneeded waits were generated. Fixes: SWDEV-484177	2024-09-14 02:22:28 -07:00
Kazu Hirata	78505ade2c	[AMDGPU] Use range-based for loops (NFC) (#106184 )	2024-08-27 06:46:01 -07:00
Jay Foad	fa2dccb377	[AMDGPU] Remove one case of vmcnt loop header flushing for GFX12 (#105550 ) When a loop contains a VMEM load whose result is only used outside the loop, do not bother to flush vmcnt in the loop head on GFX12. A wait for vmcnt will be required inside the loop anyway, because VMEM instructions can write their VGPR results out of order.	2024-08-23 10:31:33 +01:00
Jay Foad	5506831f7b	[AMDGPU] GFX12 VMEM loads can write VGPR results out of order (#105549 ) Fix SIInsertWaitcnts to account for this by adding extra waits to avoid WAW dependencies.	2024-08-22 11:46:51 +01:00
Alexis Engelke	8cae9dcd4a	[AMDGPU] Clear load addresses between functions (#102515 ) SLoadAddresses previously held data across different functions and used these for dominance queries of blocks in different functions. This is not intended; clear the state at the end of the pass.	2024-08-08 21:26:17 +02:00
Carl Ritson	62aa596ba1	[AMDGPU] Add no return image_sample intrinsics and instructions (#97542 ) An appropriately configured image resource descriptor can trigger image_sample instructions to store outputs directly to a linked memory location instead of returning to VGPRs. This is opaque to the backend as instruction encoding is unchanged; however, a mechanism is required to allow frontends to communicate that these instructions do not require destination VGPRs and store to memory. Flagging these as stores means they will not be optimized away.	2024-07-20 17:26:58 +09:00
Jay Foad	80d261493e	[AMDGPU] clang-tidy: use override consistently. NFC.	2024-07-16 15:55:39 +01:00
Jay Foad	63a1242ae3	[AMDGPU] clang-tidy: define trivial constructors with = default. NFC.	2024-07-16 15:41:54 +01:00
paperchalice	79d0de2ac3	[CodeGen][NewPM] Port `machine-loops` to new pass manager (#97793 ) - Add `MachineLoopAnalysis`. - Add `MachineLoopPrinterPass`. - Convert to `MachineLoopInfoWrapperPass` in legacy pass manager.	2024-07-09 09:11:18 +08:00
paperchalice	4b24c2dfb5	[CodeGen][NewPM] Split `MachinePostDominators` into a concrete analysis result (#95113 ) `MachinePostDominators` version of #94571.	2024-06-12 14:29:22 +08:00
Jay Foad	558f3ea4ae	[AMDGPU] Remove #if 0 code for indexed resources in SIInsertWaitcnts (#92905 ) I do not understand what optimization this was supposed to implement. It has never been enabled. I suspect it no longer applies to GCN/RDNA architectures.	2024-05-21 13:51:42 +01:00
Jay Foad	4e86b0006b	[AMDGPU] Remove #if 0 code for buffer stores in SIInsertWaitcnts (#92903 )	2024-05-21 13:33:49 +01:00
Jay Foad	f3aaaafe50	[AMDGPU] Remove #if 0 code for fences in SIInsertWaitcnts (#92902 ) We insert required waits for fences in SIMemoryLegalizer.	2024-05-21 13:33:20 +01:00
Nicolai Hähnle	ec1f28dc97	AMDGPU/gfx12: avoid crashing on legacy waitcnt intrinsics (#92306 ) They are still accepted by the HW but have a conservative effect. Leave them untouched since handling them would complicate the logic a bit, and developers who code to such a low level really need to revisit what they're doing anyway.	2024-05-15 22:23:18 +02:00
David Stuttard	f898161bfa	[AMDGPU] Fix image_msaa_load waitcnt insertion for pre-gfx12 (#90710 ) https://github.com/llvm/llvm-project/pull/90201 made some fixes for gfx12 image_msaa_load waitcnt insertion. That fix might break in some situations for pre-gfx12. This fixes that by explicitly checking for VSAMPLE, which always requires an s_wait_samplecnt, and leaves the previous logic intact for non-gfx12.	2024-05-01 11:37:57 +01:00
David Stuttard	5fb1e2825f	[AMDGPU] Enhance s_waitcnt insertion before barrier for gfx12 (#90595 ) Code to determine if a waitcnt is required before a barrier instruction only considered S_BARRIER. gfx12 adds barrier_signal/wait so need to enhance the existing code to look for a barrier start (which is just an S_BARRIER for earlier architectures).	2024-05-01 11:37:13 +01:00
Jay Foad	0b21b25eac	[AMDGPU] Do not optimize away pre-existing waitcnt instructions at -O0 (#90716 ) The autogenerated memory legalizer tests use -O0 so this allows us to see the exact waitcnts that were inserted by the memory legalizer without them being optimized away.	2024-05-01 11:29:11 +01:00
David Stuttard	62dea99a7d	[AMDGPU] Fix gfx12 waitcnt type for image_msaa_load (#90201 ) image_msaa_load is actually encoded as a VSAMPLE instruction and requires the appropriate waitcnt variant.	2024-04-30 10:41:51 +01:00
Xu Zhang	f6d431f208	[CodeGen] Make the parameter TRI required in some functions. (#85968 ) Fixes #82659 There are some functions, such as `findRegisterDefOperandIdx` and `findRegisterDefOperand`, that have too many default parameters. As a result, we have encountered some issues due to the lack of TRI parameters, as shown in issue #82411. Following @RKSimon 's suggestion, this patch refactors 9 functions, including `{reads, kills, defines, modifies}Register`, `registerDefIsDead`, and `findRegister{UseOperandIdx, UseOperand, DefOperandIdx, DefOperand}`, adjusting the order of the TRI parameter and making it required. In addition, all the places that call these functions have also been updated correctly to ensure no additional impact. After this, the caller of these functions should explicitly know whether to pass the `TargetRegisterInfo` or just a `nullptr`.	2024-04-24 14:24:14 +01:00
Emma Pilkington	16fed31f44	[AMDGPU] Fix debug line table for MSG_DEALLOC_VGPRS optimization (#88924 ) Deallocating VGPRs interferes with doing a context save, which is needed for GDB to report a breakpoint. So, in the sequence `s_sendmsg MSG_DEALLOC_VGPRS` followed by `s_endpgm`, we now use the debug location of the s_endpgm for the s_sendmsg, so a breakpoint set in the debugger at the end of a shader will be hit before deallocating VGPRs.	2024-04-18 09:29:32 -04:00
Jun Wang	86842e1f72	[AMDGPU] New clang option for emitting a waitcnt instruction after each memory instruction (#79236 ) This patch introduces a new command-line option for clang, namely, amdgpu-precise-mem-op (or precise-memory in the backend). When this option is specified, a waitcnt instruction is generated after each memory load/store instruction. The counter values are always 0, but which counters are involved depends on the memory instruction. --------- Co-authored-by: Jun Wang <jun.wang7@amd.com>	2024-04-10 10:47:04 -07:00
Jay Foad	3cf539fb04	[AMDGPU] Combine or remove redundant waitcnts at the end of each MBB (#87539 ) Call generateWaitcnt unconditionally at the end of SIInsertWaitcnts::insertWaitcntInBlock. Even if we don't need to generate a new waitcnt instruction it has the effect of combining or removing redundant waitcnts that were already present. Tests show various small improvements in waitcnt placement.	2024-04-04 10:14:16 +01:00
Christudasan Devadasan	c54f22f5fe	[AMDGPU] Add eventMask function in WaitcntGenerator class (NFC) (#85210 ) This would bring a cleaner interface while obtaining wait event masks by combining various wait event types in the derived classes.	2024-03-15 10:22:58 +05:30
vangthao95	f37c6d55c6	[AMDGPU][NFC] Refactor SIInsertWaitcnts zero waitcnt generation (#82575 ) Move the allZero* waitcnt generation methods into WaitcntGenerator class.	2024-02-22 15:55:26 -08:00
Jie Fu	779af9b713	[AMDGPU] Fix -Wunused-variable in SIInsertWaitcnts.cpp (NFC) llvm-project/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp:1539:10: error: unused variable 'SWaitInst' [-Werror,-Wunused-variable] auto SWaitInst = ^ 1 error generated.	2024-01-18 19:28:48 +08:00
Jay Foad	ba52f06f9d	[AMDGPU] CodeGen for GFX12 S_WAIT_* instructions (#77438 ) Update SIMemoryLegalizer and SIInsertWaitcnts to use separate wait instructions per counter (e.g. S_WAIT_LOADCNT) and split VMCNT into separate LOADCNT, SAMPLECNT and BVHCNT counters.	2024-01-18 10:47:45 +00:00

231 Commits