llvm-project

Author	SHA1	Message	Date
Austin Kerbow	89503bda38	[AMDGPU] Add structural stall heuristic to scheduling strategies (#169617 ) Implements a structural stall heuristic that considers both resource hazards and latency constraints when selecting instructions. In coexec, this changes the pending queue from a binary “not ready to issue” distinction into part of a unified candidate comparison. Pending instructions still identify structural stalls in the current cycle, but they are now evaluated directly against available instructions by stall cost, making the heuristics both more intuitive and more expressive. - Add getStructuralStallCycles() to GCNSchedStrategy that computes the number of cycles an instruction must wait due to: - Resource conflicts on unbuffered resources (from the SchedModel) - Sequence-dependent hazards (from GCNHazardRecognizer) - Add getHazardWaitStates() to GCNHazardRecognizer that returns the number of wait states until all hazards for an instruction are resolved, providing cycle-accurate hazard information for scheduling heuristics.	2026-03-23 11:33:43 -07:00
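The unified comparison described in the commit above can be sketched as follows. This is a minimal illustration, not the GCNSchedStrategy code: the `Candidate` struct, the function names, and the choice to combine the two stall sources with `max` are all assumptions made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical candidate record (not an LLVM type).
struct Candidate {
  int id;                 // illustrative instruction identifier
  unsigned resourceStall; // cycles until a needed unbuffered resource is free
  unsigned hazardWait;    // wait states until all hazards are resolved
};

// Assumption: both constraints must be satisfied, so the stall is the larger
// of the two values.
inline unsigned structuralStallCycles(const Candidate &c) {
  return std::max(c.resourceStall, c.hazardWait);
}

// Available and pending instructions go through the same comparison.
// Returns the index of the candidate with the lowest stall cost; the first
// candidate wins ties.
inline std::size_t pickLowestStall(const std::vector<Candidate> &cands) {
  assert(!cands.empty() && "need at least one candidate");
  std::size_t best = 0;
  for (std::size_t i = 1; i < cands.size(); ++i)
    if (structuralStallCycles(cands[i]) < structuralStallCycles(cands[best]))
      best = i;
  return best;
}
```

A real strategy would fold this cost into a longer chain of tie-breakers; the sketch only shows why a pending instruction with a short stall can beat an available one with a long stall.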
Austin Kerbow	7f77ca0dbd	[AMDGPU] Include TRANS instructions in WMMA coexecution hazard checking (#186269 )	2026-03-16 16:10:58 -07:00
Jay Foad	673a71f018	[CodeGen] Make ShouldPreferAnother const. NFC. (#185606 )	2026-03-10 15:03:56 +00:00
Jay Foad	d2c0937c5a	[AMDGPU] Make GCNHazardRecognizer "check" functions const. NFC. (#185416 )	2026-03-09 14:15:54 +00:00
Jay Foad	fff2f0ba78	[AMDGPU] Handle GFX1250 hazards between WMMA and VOPD (#183573 ) Hazards between WMMA and VALU were handled in #149865 but this only worked for regular VOP* VALU encodings, not for VOPD. Fixes: #183546	2026-02-27 19:51:53 +00:00
Dark Steve	254cb2a326	[AMDGPU] Hoist WMMA coexecution hazard V_NOPs from loops to preheaders (#176895 ) On GFX1250, V_NOPs inserted for WMMA coexecution hazards are placed at the use-site. When the hazard-consuming instruction is inside a loop and the WMMA is outside, these NOPs execute every iteration even though the hazard only needs to be covered once. This patch hoists the V_NOPs to the loop preheader, reducing executions from N iterations to 1. ``` Example (assuming a hazard requiring K V_NOPs): Before: bb.0 (preheader): WMMA writes vgpr0 bb.1 (loop): V_NOP xK, VALU reads vgpr0, branch bb.1 -> K NOPs executed per iteration After: bb.0 (preheader): WMMA writes vgpr0, V_NOP xK bb.1 (loop): VALU reads vgpr0, branch bb.1 -> K NOPs executed once ``` For nested loops, V_NOPs are hoisted to the outermost preheader where no WMMA hazard exists within the loop. Hoisting is restricted to strict preheaders (not any single predecessor) to avoid introducing V_NOPs on unrelated control flow paths. The optimization is controlled by `-amdgpu-wmma-vnop-hoisting` (default: on). Fixes: SWDEV-573407	2026-02-26 17:19:00 +05:30
vporpo	5a756c8a3a	[AMDGPU][SIInsertWaitcnts][NFC] Make Waitcnt members private (#180772 ) This patch makes Waitcnt member variables private and replaces their accesses with calls to set() or get(). This will help us change the implementation to an array in the follow-up patch.	2026-02-23 11:44:19 -08:00
Mariusz Sikora	3c0f5045e1	[AMDGPU] Add FeatureGFX13 and SMEM encoding for gfx13 (#177567 ) For now list of features is based on gfx12 and gfx1250 --------- Co-authored-by: Jay Foad <jay.foad@amd.com>	2026-01-26 14:16:36 +01:00
Sam Elliott	7184229fea	[NFC][MI] Tidy Up RegState enum use (2/2) (#177090 ) This Change makes `RegState` into an enum class, with bitwise operators. It also: - Updates declarations of flag variables/arguments/returns from `unsigned` to `RegState`. - Updates empty RegState initializers from 0 to `{}`. If this is causing problems in downstream code: - Adopt the `RegState getXXXRegState(bool)` functions instead of using a ternary operator such as `bool ? RegState::XXX : 0`. - Adopt the `bool hasRegState(RegState, RegState)` function instead of using a bitwise check of the flags.	2026-01-23 00:19:03 -08:00
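The migration pattern described in the commit above (an enum class with bitwise operators, a `get...` helper in place of a ternary with `0`, and a `has...` query in place of a raw bit test) might look roughly like this. `DemoRegState`, its flag values, and the helper names are hypothetical stand-ins, and the "all bits of the test value are set" semantics of the query is an assumption, not LLVM's definition.

```cpp
#include <cstdint>

// Hypothetical flag set; names and values are illustrative, not LLVM's RegState.
// Bit 0 is left unused here, echoing the reserved 0x1 value from the log.
enum class DemoRegState : std::uint32_t {
  NoFlags = 0,
  Define = 1u << 1,
  Kill = 1u << 2,
  Dead = 1u << 3,
};

constexpr DemoRegState operator|(DemoRegState a, DemoRegState b) {
  return static_cast<DemoRegState>(static_cast<std::uint32_t>(a) |
                                   static_cast<std::uint32_t>(b));
}

constexpr DemoRegState operator&(DemoRegState a, DemoRegState b) {
  return static_cast<DemoRegState>(static_cast<std::uint32_t>(a) &
                                   static_cast<std::uint32_t>(b));
}

// Stands in for `isKill ? Kill : 0`, which does not compile once the flags
// are an enum class.
constexpr DemoRegState getKillDemoState(bool isKill) {
  return isKill ? DemoRegState::Kill : DemoRegState::NoFlags;
}

// Stands in for a raw `(flags & Kill) != 0` check.
// Assumed semantics: true when every bit of `test` is set in `value`.
constexpr bool hasDemoState(DemoRegState value, DemoRegState test) {
  return (value & test) == test;
}
```

For example, `hasDemoState(DemoRegState::Define | getKillDemoState(true), DemoRegState::Kill)` evaluates to true, while the same query for `DemoRegState::Dead` evaluates to false.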
Sam Elliott	2042887709	Reland "[NFC][MI] Tidy Up RegState enum use (1/2)" (#176277 ) This Change is to prepare to make RegState into an enum class. It: - Updates documentation to match the order in the code. - Brings the `get<>RegState` functions together and makes them `constexpr`. - Adopts the `get<>RegState` where RegStates were being chosen with ternary operators in backend code. - Introduces `hasRegState` to make querying RegState easier once it is an enum class. - Adopts `hasRegState` where equivalent was done with bitwise arithmetic. - Introduces `RegState::NoFlags`, which will be used for the lack of flags. - Documents that `0x1` is a reserved flag value used to detect if someone is passing `true` instead of flags (due to implicit bool to unsigned conversions). - Updates two calls to `MachineInstrBuilder::addReg` which were passing `false` to the flags operand, to no longer pass a value. - Documents that `getRegState` seems to have forgotten a call to `getEarlyClobberRegState`. This PR relands llvm/llvm-project#176091 (commit 1d616cdca3aba9d22f120888bb6b09b75ca90b92) which was reverted in llvm/llvm-project#176190 (commit 6309cd8668fc2ae589f156b23f86821f4ce5b7ea).	2026-01-16 13:05:06 -08:00
Sam Elliott	6309cd8668	Revert "[NFC][MI] Tidy Up RegState enum use (1/2)" (#176190 ) Reverts llvm/llvm-project#176091 Reverting because some compilers were erroring on the call to `Reg.isReg()` (which is not `constexpr`) in a `constexpr` function.	2026-01-15 07:58:05 -08:00
Sam Elliott	1d616cdca3	[NFC][MI] Tidy Up RegState enum use (1/2) (#176091 ) This Change is to prepare to make RegState into an enum class. It: - Updates documentation to match the order in the code. - Brings the `get<>RegState` functions together and makes them `constexpr`. - Adopts the `get<>RegState` where RegStates were being chosen with ternary operators in backend code. - Introduces `hasRegState` to make querying RegState easier once it is an enum class. - Adopts `hasRegState` where equivalent was done with bitwise arithmetic. - Introduces `RegState::NoFlags`, which will be used for the lack of flags. - Documents that `0x1` is a reserved flag value used to detect if someone is passing `true` instead of flags (due to implicit bool to unsigned conversions). - Updates two calls to `MachineInstrBuilder::addReg` which were passing `false` to the flags operand, to no longer pass a value. - Documents that `getRegState` seems to have forgotten a call to `getEarlyClobberRegState`.	2026-01-15 07:47:05 -08:00
LU-JOHN	49381c3000	[NFC][AMDGPU] Declare variables initialized with getDebugLoc as const ref (#174434 ) Declare variables initialized with getDebugLoc as a const reference. Signed-off-by: John Lu <John.Lu@amd.com>	2026-01-05 12:37:47 -06:00
LU-JOHN	7e2b79b049	[AMDGPU] Generate more efficient code to avoid shift64 hazard (#171871 ) Generate more efficient code to avoid shift64 hazard when dst != src1. Transform: dst = shiftrev64 amt, src1 to: dst.sub0 = amt; dst = shiftrev64 dst.sub0, src1 --------- Signed-off-by: John Lu <John.Lu@amd.com>	2026-01-05 09:19:15 -06:00
Stephen Thomas	7c328d8a0a	[AMDGPU][GCNHazardRecognizer] Remove instances of hardcoded S_WAITCNT_DEPCTR operand values (#171811 ) Two S_WAITCNT_DEPCTR instructions are constructed with hardcoded operand values. Replace these with appropriate calls to AMDGPU::DepCtr::encodeFieldVmVsrc(). NFC, except that the original code was setting reserved operand bits that should-be-zero, and this is now corrected.	2025-12-11 13:26:54 +00:00
Stanislav Mekhanoshin	fffe9bcbc7	[AMDGPU] Allow hazard checks for WMMA co-exec (#168805 ) Now we are just inserting V_NOP instructions, try to schedule something into the shadow. It is still somewhat imprecise, for example AdvanceCycle() will use TII.getNumWaitStates() anyway, but in a scheduling mode we are not required to be precise. We must be finally precise in the hazard recognizer mode. Then EmittedInstrs buffer is also limited to MaxLookAhead even though VALU-only hazards may actually never expire and require an endless buffer. But that's OK, we can at least mitigate what the buffer can hold. The buffer is also currently much bigger than any of the VALU hazards may need. That said, the rest of the 'fix*' functions here that use V_NOPs can be changed the same way. This one is just the worst because it may require up to 9 nops.	2025-12-01 11:46:30 -08:00
Stanislav Mekhanoshin	e6ae2462bd	[AMDGPU] Refactor hazard recognizer for VALU-pipeline hazards. NFCI. (#168801 ) This is in preparation of handling these in scheduler. I do not expect any changes to the produced code here, it is just an infrastructure. Our current problem with the VALU pipeline hazards is that we only insert V_NOP instructions in the hazard recognizer mode, but ignore it during scheduling. This patch is meant to create a mechanism to actually account for that during scheduling.	2025-12-01 10:59:52 -08:00
Jay Foad	d748c81218	[AMDGPU] Change the immediate operand of s_waitcnt_depctr / s_wait_alu (#169378 ) The 16-bit immediate operand of s_waitcnt_depctr / s_wait_alu has some unused bits. Previously codegen would set these bits to 1, but setting them to 0 matches the SP3 assembler behaviour better, which in turn means that we can print them using the human readable SP3 syntax: s_wait_alu 0xfffd ; unused bits set to 1 s_wait_alu 0xff9d ; unused bits set to 0 s_wait_alu depctr_va_vcc(0) ; unused bits set to 0, human readable Note that the set of unused bits changed between GFX10.1 and GFX10.3.	2025-11-25 11:55:26 +00:00
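The two raw encodings quoted in the commit above differ only in the unused bits, which can be checked with a little bit arithmetic. In this sketch the mask is derived purely from those two example values (`0xfffd` and `0xff9d`); the constant and function names are made up for illustration, and the mask should not be read as the definition for any particular target, since the log notes that the set of unused bits differs between GFX10.1 and GFX10.3.

```cpp
#include <cstdint>

// Bits that differ between the two encodings quoted in the log:
// 0xfffd ^ 0xff9d == 0x0060. Valid for that example only.
constexpr std::uint16_t kExampleUnusedBits = 0xfffd ^ 0xff9d;

// Clears the unused bits of an immediate, which is the direction codegen moved in.
constexpr std::uint16_t clearUnusedBits(std::uint16_t imm,
                                        std::uint16_t unusedMask) {
  return static_cast<std::uint16_t>(imm & ~unusedMask);
}

static_assert(kExampleUnusedBits == 0x0060, "mask derived from the example");
static_assert(clearUnusedBits(0xfffd, kExampleUnusedBits) == 0xff9d,
              "old encoding maps to new encoding");
```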
Robert Imschweiler	0b82415c59	[AMDGPU] Consider FLAT instructions for VMEM hazard detection (#137170 ) In general, "Flat instructions look at the per-workitem address and determine for each work item if the target memory address is in global, private or scratch memory." (RDNA2 ISA) That means that FLAT instructions need to be considered for VMEM hazards even without "specific segment". Also, LDS DMA should be considered for LDS hazard detection. See also #137148	2025-11-18 18:41:04 +01:00
Sergei Barannikov	86d712cda4	[AMDGPU] Use MCRegUnit, insert explicit casts to/from unsigned (NFC) (#167889 ) The casts are currently no-ops because `MCRegUnit` is typedef'ed to `unsigned`, but this will change soon enough and explicit casts will be required.	2025-11-13 21:39:02 +03:00
Kazu Hirata	50faea28fb	[llvm] Use conventional enum declarations (NFC) (#166318 ) This patch replaces: using Foo = enum { A, B, C }; with the more conventional: enum Foo { A, B, C }; These two enum declaration styles are not identical, but their difference does not matter in these .cpp files. With the "using Foo" style, the enum is unnamed and cannot be forward-declared, whereas the conventional style creates a named enum that can be. Since these changes are confined to .cpp files, this distinction has no practical impact here.	2025-11-04 07:12:53 -08:00
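The difference between the two declaration styles in the commit above can be seen in a small standalone sketch. All names here are invented for the example. Note that in standard C++ an unscoped enum can only be forward-declared when it has a fixed underlying type, which is why `NamedKind` is given `: int`.

```cpp
// Old style: an unnamed enum that is only reachable through an alias.
// There is no enum name to forward-declare.
using LegacyKind = enum { LegacyA, LegacyB, LegacyC };

// Conventional style: a named enum. With a fixed underlying type it can be
// declared before it is defined.
enum NamedKind : int;                   // forward declaration
constexpr int toInt(NamedKind k);       // usable before the definition
enum NamedKind : int { NamedA, NamedB, NamedC };

constexpr int toInt(NamedKind k) { return static_cast<int>(k); }
constexpr int toInt(LegacyKind k) { return static_cast<int>(k); }
```

Both styles give the enumerators the same values; only the ability to name the type ahead of its definition differs.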
Carl Ritson	385c12134a	[AMDGPU] Rework GFX11 VALU Mask Write Hazard (#138663 ) Apply additional counter waits to address VALU writes to SGPRs. Rework expiry detection and apply wait coalescing to mitigate some of the additional waits.	2025-10-28 16:09:28 +09:00
Matt Arsenault	1a5494ca4a	AMDGPU: Use RegClassByHwMode to manage operand VGPR operand constraints (#158272 ) This removes special case processing in TargetInstrInfo::getRegClass to fixup register operands which depending on the subtarget support AGPRs, or require even aligned registers. This regresses assembler diagnostics, which currently work by hackily accepting invalid cases and then post-rejecting a validly parsed instruction. On the plus side this now emits a comment when disassembling unaligned registers for targets with the alignment requirement.	2025-10-08 11:19:54 +09:00
Carl Ritson	e60ca86621	[AMDGPU] Refine GCNHazardRecognizer hasHazard() (#138841 ) Remove recursion to avoid stack overflow on large CFGs. Avoid worklist for hazard search within single MachineBasicBlock. Ensure predecessors are visited for all state combinations.	2025-09-24 18:42:11 +09:00
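The general shape of a non-recursive backward search, as outlined in the commit above, can be sketched with an explicit worklist. This is a generic illustration and not the `hasHazard()` implementation: the `Block` struct, the instruction-count budget, and keying the visited set on (block, remaining budget) are all assumptions chosen to keep the example small.

```cpp
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

// Hypothetical CFG node (not MachineBasicBlock).
struct Block {
  std::vector<bool> isHazard;     // one flag per instruction, in program order
  std::vector<std::size_t> preds; // indices of predecessor blocks
};

// Walks backwards from the end of block `start`, looking at most `limit`
// instructions back along any path, and reports whether a hazard-producing
// instruction is within reach. An explicit worklist replaces recursion, so
// the depth of the CFG does not affect stack usage.
bool hazardWithin(const std::vector<Block> &cfg, std::size_t start, int limit) {
  using State = std::pair<std::size_t, int>; // (block index, remaining budget)
  std::vector<State> worklist{{start, limit}};
  std::set<State> visited; // a block may be revisited with a different budget
  while (!worklist.empty()) {
    State s = worklist.back();
    worklist.pop_back();
    if (!visited.insert(s).second)
      continue; // this exact state was already explored
    const Block &b = cfg[s.first];
    int budget = s.second;
    // Scan the block from its last instruction towards its first.
    for (std::size_t i = b.isHazard.size(); i > 0 && budget > 0; --i, --budget)
      if (b.isHazard[i - 1])
        return true;
    if (budget > 0)
      for (std::size_t p : b.preds)
        worklist.push_back({p, budget});
  }
  return false;
}
```

Because the budget is bounded and each (block, budget) pair is explored once, the search terminates even when the CFG contains loops or empty blocks.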
Stanislav Mekhanoshin	32c2393ca5	[AMDGPU] Handle S_GETREG_B32_const in the hazard recognizer. NFCI (#160364 )	2025-09-23 14:30:24 -07:00
Jay Foad	f15c6ff6cb	[AMDGPU] Make use of SIInstrInfo::isWaitcnt. NFC. (#154087 )	2025-08-18 16:18:46 +01:00
Stanislav Mekhanoshin	4198649c19	[AMDGPU] Use encodeFieldVaVdst in hazard recognizer. NFCI. (#153881 ) Co-authored-by: Stephen Thomas <Stephen.Thomas@amd.com> --------- Co-authored-by: Stephen Thomas <Stephen.Thomas@amd.com>	2025-08-15 17:50:27 -07:00
Stanislav Mekhanoshin	b7ec10ca6c	[AMDGPU] Update GCNHazardRecognizer's understanding of gfx12 waitcount instructions (#153880 ) This simply updates the pass's cognizance of these instructions, and for the most part the hazards where they might be encountered do not exist for gfx12. Nonetheless, encountering them has to be checked for as doing so would indicate a compiler error. Co-authored-by: Stephen Thomas <Stephen.Thomas@amd.com> --------- Co-authored-by: Stephen Thomas <Stephen.Thomas@amd.com>	2025-08-15 17:18:41 -07:00
Stanislav Mekhanoshin	4f34c740ab	[AMDGPU] w/a for s_setreg_b32 gfx1250 hazard with MODE register (#153879 )	2025-08-15 16:08:13 -07:00
Stanislav Mekhanoshin	f1fc50748a	[AMDGPU] w/a hazard with writing s102/103 and reading FLAT_SCRATCH_BASE (#153878 )	2025-08-15 15:23:06 -07:00
Stanislav Mekhanoshin	1f25c4883e	[AMDGPU] Mitigate DS_ATOMIC_ASYNC_BARRIER_ARRIVE_B64 bug (#153872 ) DS_ATOMIC_ASYNC_BARRIER_ARRIVE_B64 shall not be claused (we already do not clause DS instructions) and needs waits before and after.	2025-08-15 14:17:54 -07:00
Stanislav Mekhanoshin	29976f2e58	[AMDGPU] Handle S_GETREG_B32 hazard on gfx1250 (#153848 ) GFX1250 SPG says: S_GETREG_B32 does not wait for idle before executing. The user must S_WAIT_ALU 0 before S_GETREG_B32 on: STATUS, STATE_PRIV, EXCP_FLAG_PRIV, or EXCP_FLAG_USER.	2025-08-15 11:38:22 -07:00
Stanislav Mekhanoshin	33abf05af4	[AMDGPU] gfx1250 v_permlane_* instructions (#151749 )	2025-08-01 16:14:19 -07:00
Changpeng Fang	e47d5eb454	[AMDGPU] Hazard handling for gfx1250 wmma instructions (#149865 ) If both instructions are xdl WMMA, a hazard exists when the first WMMA writes a register (D0) and the second WMMA reads it (A1/B1/Index1). If the first instruction is an xdl WMMA and the second one is a VALU, three kinds of hazards exist: WMMA writes (D0), VALU reads (Use1); WMMA writes (D0), VALU writes (D1); WMMA reads (A0/B0/Index0), VALU writes (D1). The actual number of hazard slots depends on the category of the first xdl WMMA as well as whether the second instruction is an xdl WMMA or a VALU. If there are not enough unrelated VALUs between the two instructions, the appropriate number of V_NOPs (to cover the shortfall) will be inserted to satisfy the hazard handling requirements.	2025-07-21 13:24:10 -07:00
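The last sentence of the commit above amounts to a simple shortfall calculation, sketched below. The function name and parameters are hypothetical; how the required slot count is derived from the WMMA category is not modelled here.

```cpp
#include <algorithm>

// requiredSlots:    hazard slots needed between the WMMA and its consumer
//                   (depends on the WMMA category; supplied by the caller here).
// independentValus: unrelated VALU instructions already between the two.
// Returns how many V_NOPs would have to be inserted to cover the shortfall.
constexpr int vnopsToInsert(int requiredSlots, int independentValus) {
  return std::max(0, requiredSlots - independentValus);
}
```

With a requirement of 4 slots and 1 unrelated VALU already in place, the shortfall is 3; when enough independent instructions are present, the result is 0 and nothing is inserted.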
Diana Picus	20d8398825	[AMDGPU] ISel & PEI for whole wave functions (#145858 ) Whole wave functions are functions that will run with a full EXEC mask. They will not be invoked directly, but instead will be launched by way of a new intrinsic, `llvm.amdgcn.call.whole.wave` (to be added in a future patch). These functions are meant as an alternative to the `llvm.amdgcn.init.whole.wave` or `llvm.amdgcn.strict.wwm` intrinsics. Whole wave functions will set EXEC to -1 in the prologue and restore the original value of EXEC in the epilogue. They must have a special first argument, `i1 %active`, that is going to be mapped to EXEC. They may have either the default calling convention or amdgpu_gfx. The inactive lanes need to be preserved for all registers used, active lanes only for the CSRs. At the IR level, arguments to a whole wave function (other than `%active`) contain poison in their inactive lanes. Likewise, the return value for the inactive lanes is poison. This patch contains the following work: * 2 new pseudos, SI_SETUP_WHOLE_WAVE_FUNC and SI_WHOLE_WAVE_FUNC_RETURN used for managing the EXEC mask. SI_SETUP_WHOLE_WAVE_FUNC will return a SReg_1 representing `%active`, which needs to be passed into SI_WHOLE_WAVE_FUNC_RETURN. * SelectionDAG support for generating these 2 new pseudos and the special handling of %active. Since the return may be in a different basic block, it's difficult to add the virtual reg for %active to SI_WHOLE_WAVE_FUNC_RETURN, so we initially generate an IMPLICIT_DEF which is later replaced via a custom inserter. * Expansion of the 2 pseudos during prolog/epilog insertion. PEI also marks any used VGPRs as WWM registers, which are then spilled and restored with the usual logic. Future patches will include the `llvm.amdgcn.call.whole.wave` intrinsic and a lot of optimization work (especially in order to reduce spills around function calls). 
--------- Co-authored-by: Matt Arsenault <Matthew.Arsenault@amd.com> Co-authored-by: Shilei Tian <i@tianshilei.me>	2025-07-21 10:39:09 +02:00
Changpeng Fang	560e7df689	AMDGPU: Handle the co-execution hazards for TRANS for gfx1250 (#149024 ) For the co-execution of the TRANS ops, the requirement is: 1 independent op or V_NOP (since TRANS takes 2 cycles) after the trans op before its sources can be overwritten or the output of the trans op can be used.	2025-07-16 10:58:54 -07:00
Shilei Tian	c0e9084b1c	[AMDGPU] Add a debug option `-amdgpu-snop-padding` for `GCNHazardRecognizer` (#146587 ) This can help to identify whether there are potential hazards. Co-authored-by: Byrnes, Jeffrey <Jeffrey.Byrnes@amd.com>	2025-07-02 08:16:38 -04:00
Harrison Hao	b2379bd5d5	[AMDGPU] Support bottom-up postRA scheduling. (#135295 ) Solely relying on top‑down scheduling can underutilize hardware, since long‑latency instructions often end up scheduled too late and their latency isn’t well hidden. Adding bottom‑up post‑RA scheduling lets us move those instructions earlier, which improves latency hiding and yields roughly a 2% performance gain on key benchmarks.	2025-06-05 22:07:06 +08:00
Robert Imschweiler	e55172f139	[AMDGPU] Classify FLAT instructions as VMEM (#137148 ) Also adapt hazard and wait handling.	2025-05-07 09:20:52 +02:00
Brox Chen	cd54d581b5	[AMDGPU][True16][CodeGen] add v_cndmask_t16 to hazardmask (#128912 ) add v_cndmask_t16 to hazardmask	2025-03-14 12:31:57 -04:00
sstipano	531c48546d	[AMDGPU][NFC] Move isXDL and isDGEMM to SIInstrInfo. (#129103 )	2025-02-28 03:14:51 +01:00
Fabian Ritter	8615f9aaff	[AMDGPU] Replace gfx940 and gfx941 with gfx942 in llvm (#126763 ) gfx940 and gfx941 are no longer supported. This is one of a series of PRs to remove them from the code base. This PR removes all non-documentation occurrences of gfx940/gfx941 from the llvm directory, and the remaining occurrences in clang. Documentation changes will follow. For SWDEV-512631	2025-02-19 10:20:48 +01:00
Rahul Joshi	bee9664970	[TableGen] Emit OpName as an enum class instead of a namespace (#125313 ) - Change InstrInfoEmitter to emit OpName as an enum class instead of an anonymous enum in the OpName namespace. - This will help clearly distinguish between values that are OpNames vs just operand indices and should help avoid bugs due to confusion between the two. - Rename OpName::OPERAND_LAST to NUM_OPERAND_NAMES. - Emit declaration of getOperandIdx() along with the OpName enum so it doesn't have to be repeated in various headers. - Also updated AMDGPU, RISCV, and WebAssembly backends to conform to the new definition of OpName (mostly mechanical changes).	2025-02-12 08:19:30 -08:00
Vigneshwar Jayakumar	1188b1ff7b	AMDGPU: Handle gfx950 XDL Write-VGPR-VALU-WAW wait state change (#126132 ) There are additional wait states for XDL write VALU WAW hazard in gfx950 compared to gfx940.	2025-02-12 01:32:23 +07:00
Vigneshwar Jayakumar	a2263eba4d	AMDGPU: Handle gfx950 XDL-write-VGPR-VALU-Mem-Exp wait state change (#126727 )	2025-02-12 01:30:53 +07:00
Vigneshwar Jayakumar	c837f57286	AMDGPU: Handle gfx950 XDL-write-VGPR-Overlap-Src-AB wait state (#126732 ) gfx950 needs more wait states than gfx940.	2025-02-11 22:30:16 +07:00
Carl Ritson	a3a3e6997b	[AMDGPU] Rewrite GFX12 SGPR hazard handling to dedicated pass (#118750 ) - Algorithm operates over whole IR to attempt to minimize waits. - Add support for VALU->VALU SGPR hazards via VA_SDST/VA_VCC.	2025-01-30 11:21:11 +09:00
Chinmay Deshpande	9ca1323de1	[AMDGPU] Fix crash due to missing check for FLAT instructions that don't use vector registers when computing VALU hazard (#123627 )	2025-01-21 05:50:58 -08:00
Brox Chen	8a0c2e7567	[AMDGPU][True16][MC][CodeGen] true16 for v_cndmask_b16 (#119736 ) Support true16 format for v_cndmask_b16 in MC and CodeGen in true16 and fake16 flow. Since we are replacing `v_cndmask_b16` to `v_cndmask_b16_t16/fake16`, we have to at least update the fake16 codeGen to get codeGen test passing. For this case, we have to update the true16 and with fake16 together, otherwise some of the true16 tests will fail	2025-01-16 17:18:28 -05:00
Pravin Jagtap	5e007afa9d	[AMDGPU] Handle hazard in v_scalef32_sr_fp4_* conversions (#118589 ) Presently, the compiler selectively adds a nop when opsel != 0, i.e. only when partially writing to high bytes. Experiments in SWDEV-499733 and SWDEV-501347 suggest that we need a nop for the above cases irrespective of opsel values. Note: We might need to add a few others into the same table.	2024-12-11 18:38:10 +05:30

200 Commits