llvm-project

Author	SHA1	Message	Date
Robert Imschweiler	0b82415c59	[AMDGPU] Consider FLAT instructions for VMEM hazard detection (#137170 ) In general, "Flat instructions look at the per-workitem address and determine for each work item if the target memory address is in global, private or scratch memory." (RDNA2 ISA) That means that FLAT instructions need to be considered for VMEM hazards even without "specific segment". Also, LDS DMA should be considered for LDS hazard detection. See also #137148	2025-11-18 18:41:04 +01:00
Sergei Barannikov	86d712cda4	[AMDGPU] Use MCRegUnit, insert explicit casts to/from unsigned (NFC) (#167889 ) The casts are currently no-ops because `MCRegUnit` is typedef'ed to `unsigned`, but this will change soon enough and explicit casts will be required.	2025-11-13 21:39:02 +03:00
Kazu Hirata	50faea28fb	[llvm] Use conventional enum declarations (NFC) (#166318 ) This patch replaces: using Foo = enum { A, B, C }; with the more conventional: enum Foo { A, B, C }; These two enum declaration styles are not identical, but their difference does not matter in these .cpp files. With the "using Foo" style, the enum is unnamed and cannot be forward-declared, whereas the conventional style creates a named enum that can be. Since these changes are confined to .cpp files, this distinction has no practical impact here.	2025-11-04 07:12:53 -08:00
Carl Ritson	385c12134a	[AMDGPU] Rework GFX11 VALU Mask Write Hazard (#138663 ) Apply additional counter waits to address VALU writes to SGPRs. Rework expiry detection and apply wait coalescing to mitigate some of the additional waits.	2025-10-28 16:09:28 +09:00
Matt Arsenault	1a5494ca4a	AMDGPU: Use RegClassByHwMode to manage operand VGPR operand constraints (#158272 ) This removes special case processing in TargetInstrInfo::getRegClass to fixup register operands which depending on the subtarget support AGPRs, or require even aligned registers. This regresses assembler diagnostics, which currently work by hackily accepting invalid cases and then post-rejecting a validly parsed instruction. On the plus side this now emits a comment when disassembling unaligned registers for targets with the alignment requirement.	2025-10-08 11:19:54 +09:00
Carl Ritson	e60ca86621	[AMDGPU] Refine GCNHazardRecognizer hasHazard() (#138841 ) Remove recursion to avoid stack overflow on large CFGs. Avoid worklist for hazard search within single MachineBasicBlock. Ensure predecessors are visited for all state combinations.	2025-09-24 18:42:11 +09:00
Stanislav Mekhanoshin	32c2393ca5	[AMDGPU] Handle S_GETREG_B32_const in the hazard recognizer. NFCI (#160364 )	2025-09-23 14:30:24 -07:00
Jay Foad	f15c6ff6cb	[AMDGPU] Make use of SIInstrInfo::isWaitcnt. NFC. (#154087 )	2025-08-18 16:18:46 +01:00
Stanislav Mekhanoshin	4198649c19	[AMDGPU] Use encodeFieldVaVdst in hazard recognizer. NFCI. (#153881 ) Co-authored-by: Stephen Thomas <Stephen.Thomas@amd.com>	2025-08-15 17:50:27 -07:00
Stanislav Mekhanoshin	b7ec10ca6c	[AMDGPU] Update GCNHazardRecognizer's understanding of gfx12 waitcount instructions (#153880 ) This simply updates the pass's cognizance of these instructions, and for the most part the hazards where they might be encountered do not exist for gfx12. Nonetheless, encountering them has to be checked for as doing so would indicate a compiler error. Co-authored-by: Stephen Thomas <Stephen.Thomas@amd.com>	2025-08-15 17:18:41 -07:00
Stanislav Mekhanoshin	4f34c740ab	[AMDGPU] w/a for s_setreg_b32 gfx1250 hazard with MODE register (#153879 )	2025-08-15 16:08:13 -07:00
Stanislav Mekhanoshin	f1fc50748a	[AMDGPU] w/a hazard with writing s102/103 and reading FLAT_SCRATCH_BASE (#153878 )	2025-08-15 15:23:06 -07:00
Stanislav Mekhanoshin	1f25c4883e	[AMDGPU] Mitigate DS_ATOMIC_ASYNC_BARRIER_ARRIVE_B64 bug (#153872 ) DS_ATOMIC_ASYNC_BARRIER_ARRIVE_B64 shall not be claused (we already do not clause DS instructions) and needs waits before and after.	2025-08-15 14:17:54 -07:00
Stanislav Mekhanoshin	29976f2e58	[AMDGPU] Handle S_GETREG_B32 hazard on gfx1250 (#153848 ) GFX1250 SPG says: S_GETREG_B32 does not wait for idle before executing. The user must S_WAIT_ALU 0 before S_GETREG_B32 on: STATUS, STATE_PRIV, EXCP_FLAG_PRIV, or EXCP_FLAG_USER.	2025-08-15 11:38:22 -07:00
Stanislav Mekhanoshin	33abf05af4	[AMDGPU] gfx1250 v_permlane_* instructions (#151749 )	2025-08-01 16:14:19 -07:00
Changpeng Fang	e47d5eb454	[AMDGPU] Hazard handling for gfx1250 wmma instructions (#149865 ) If both instructions are xdl WMMA, a hazard exists when the first WMMA writes a register (D0) and the second WMMA reads it (A1/B1/Index1). If the first instruction is an xdl WMMA and the second one is a VALU, three kinds of hazards exist: WMMA writes (D0), VALU reads (Use1); WMMA writes (D0), VALU writes (D1); WMMA reads (A0/B0/Index0), VALU writes (D1). The actual number of hazard slots depends on the category of the first xdl WMMA as well as on whether the second instruction is an xdl WMMA or a VALU. If there are not enough unrelated VALUs between the two instructions, the appropriate number of V_NOPs (to cover the shortfall) will be inserted to satisfy the hazard handling requirements.	2025-07-21 13:24:10 -07:00
Diana Picus	20d8398825	[AMDGPU] ISel & PEI for whole wave functions (#145858 ) Whole wave functions are functions that will run with a full EXEC mask. They will not be invoked directly, but instead will be launched by way of a new intrinsic, `llvm.amdgcn.call.whole.wave` (to be added in a future patch). These functions are meant as an alternative to the `llvm.amdgcn.init.whole.wave` or `llvm.amdgcn.strict.wwm` intrinsics. Whole wave functions will set EXEC to -1 in the prologue and restore the original value of EXEC in the epilogue. They must have a special first argument, `i1 %active`, that is going to be mapped to EXEC. They may have either the default calling convention or amdgpu_gfx. The inactive lanes need to be preserved for all registers used, active lanes only for the CSRs. At the IR level, arguments to a whole wave function (other than `%active`) contain poison in their inactive lanes. Likewise, the return value for the inactive lanes is poison. This patch contains the following work: * 2 new pseudos, SI_SETUP_WHOLE_WAVE_FUNC and SI_WHOLE_WAVE_FUNC_RETURN used for managing the EXEC mask. SI_SETUP_WHOLE_WAVE_FUNC will return a SReg_1 representing `%active`, which needs to be passed into SI_WHOLE_WAVE_FUNC_RETURN. * SelectionDAG support for generating these 2 new pseudos and the special handling of %active. Since the return may be in a different basic block, it's difficult to add the virtual reg for %active to SI_WHOLE_WAVE_FUNC_RETURN, so we initially generate an IMPLICIT_DEF which is later replaced via a custom inserter. * Expansion of the 2 pseudos during prolog/epilog insertion. PEI also marks any used VGPRs as WWM registers, which are then spilled and restored with the usual logic. Future patches will include the `llvm.amdgcn.call.whole.wave` intrinsic and a lot of optimization work (especially in order to reduce spills around function calls). 
--------- Co-authored-by: Matt Arsenault <Matthew.Arsenault@amd.com> Co-authored-by: Shilei Tian <i@tianshilei.me>	2025-07-21 10:39:09 +02:00
Changpeng Fang	560e7df689	AMDGPU: Handle the co-execution hazards for TRANS for gfx1250 (#149024 ) For the co-execution of the TRANS ops, the requirement is: 1 independent op or V_NOP (since TRANS takes 2 cycles) after the trans op before its sources can be overwritten or the output of the trans op can be used.	2025-07-16 10:58:54 -07:00
Shilei Tian	c0e9084b1c	[AMDGPU] Add a debug option `-amdgpu-snop-padding` for `GCNHazardRecognizer` (#146587 ) This can help to identify whether there are potential hazards. Co-authored-by: Byrnes, Jeffrey <Jeffrey.Byrnes@amd.com>	2025-07-02 08:16:38 -04:00
Harrison Hao	b2379bd5d5	[AMDGPU] Support bottom-up postRA scheduling. (#135295 ) Solely relying on top‑down scheduling can underutilize hardware, since long‑latency instructions often end up scheduled too late and their latency isn’t well hidden. Adding bottom‑up post‑RA scheduling lets us move those instructions earlier, which improves latency hiding and yields roughly a 2% performance gain on key benchmarks.	2025-06-05 22:07:06 +08:00
Robert Imschweiler	e55172f139	[AMDGPU] Classify FLAT instructions as VMEM (#137148 ) Also adapt hazard and wait handling.	2025-05-07 09:20:52 +02:00
Brox Chen	cd54d581b5	[AMDGPU][True16][CodeGen] add v_cndmask_t16 to hazardmask (#128912 )	2025-03-14 12:31:57 -04:00
sstipano	531c48546d	[AMDGPU][NFC] Move isXDL and isDGEMM to SIInstrInfo. (#129103 )	2025-02-28 03:14:51 +01:00
Fabian Ritter	8615f9aaff	[AMDGPU] Replace gfx940 and gfx941 with gfx942 in llvm (#126763 ) gfx940 and gfx941 are no longer supported. This is one of a series of PRs to remove them from the code base. This PR removes all non-documentation occurrences of gfx940/gfx941 from the llvm directory, and the remaining occurrences in clang. Documentation changes will follow. For SWDEV-512631	2025-02-19 10:20:48 +01:00
Rahul Joshi	bee9664970	[TableGen] Emit OpName as an enum class instead of a namespace (#125313 ) - Change InstrInfoEmitter to emit OpName as an enum class instead of an anonymous enum in the OpName namespace. - This will help clearly distinguish between values that are OpNames vs just operand indices and should help avoid bugs due to confusion between the two. - Rename OpName::OPERAND_LAST to NUM_OPERAND_NAMES. - Emit declaration of getOperandIdx() along with the OpName enum so it doesn't have to be repeated in various headers. - Also updated AMDGPU, RISCV, and WebAssembly backends to conform to the new definition of OpName (mostly mechanical changes).	2025-02-12 08:19:30 -08:00
Vigneshwar Jayakumar	1188b1ff7b	AMDGPU: Handle gfx950 XDL Write-VGPR-VALU-WAW wait state change (#126132 ) There are additional wait states for XDL write VALU WAW hazard in gfx950 compared to gfx940.	2025-02-12 01:32:23 +07:00
Vigneshwar Jayakumar	a2263eba4d	AMDGPU: Handle gfx950 XDL-write-VGPR-VALU-Mem-Exp wait state change (#126727 )	2025-02-12 01:30:53 +07:00
Vigneshwar Jayakumar	c837f57286	AMDGPU: Handle gfx950 XDL-write-VGPR-Overlap-Src-AB wait state (#126732 ) gfx950 needs additional wait states compared to gfx940	2025-02-11 22:30:16 +07:00
Carl Ritson	a3a3e6997b	[AMDGPU] Rewrite GFX12 SGPR hazard handling to dedicated pass (#118750 ) - Algorithm operates over whole IR to attempt to minimize waits. - Add support for VALU->VALU SGPR hazards via VA_SDST/VA_VCC.	2025-01-30 11:21:11 +09:00
Chinmay Deshpande	9ca1323de1	[AMDGPU] Fix crash due to missing check for FLAT instructions that don't use vector registers when computing VALU hazard (#123627 )	2025-01-21 05:50:58 -08:00
Brox Chen	8a0c2e7567	[AMDGPU][True16][MC][CodeGen] true16 for v_cndmask_b16 (#119736 ) Support true16 format for v_cndmask_b16 in MC and CodeGen in true16 and fake16 flow. Since we are replacing `v_cndmask_b16` with `v_cndmask_b16_t16/fake16`, we have to at least update the fake16 codeGen to get the codeGen tests passing. In this case, we have to update true16 together with fake16, otherwise some of the true16 tests will fail	2025-01-16 17:18:28 -05:00
Pravin Jagtap	5e007afa9d	[AMDGPU] Handle hazard in v_scalef32_sr_fp4_* conversions (#118589 ) Presently, the compiler selectively adds a nop when opsel != 0, i.e. only when partially writing to high bytes. Experiments in SWDEV-499733 and SWDEV-501347 suggest that we need a nop for the above cases irrespective of opsel values. Note: We might need to add a few others to the same table.	2024-12-11 18:38:10 +05:30
Matt Arsenault	39337ff2dc	AMDGPU: Handle cvt_scale F32/F16->F4/F8 gfx950 hazard (#117844 ) gfx950 SP changes doc says: No 4 clk forwarding on opcodes that convert from F32/F16->F8 or F32/F16->F4. Must insert a NOP or instruction writing some other destination VREG after a conversion to F4/F8 since it writes either low/high half or bytes. Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com> Co-authored-by: Jeffrey Byrnes <Jeffrey.Byrnes@amd.com>	2024-12-02 09:23:17 -05:00
Matt Arsenault	27a8afa3fc	AMDGPU: Handle gfx950 valu write vdst + permlane read hazard (#117287 )	2024-11-25 09:33:04 -08:00
Matt Arsenault	c3fe5ad6be	AMDGPU: Handle vcmpx+permlane gfx950 hazard (#117286 ) Confusingly, this is a different hazard to the one on gfx10 with a subtarget feature.	2024-11-25 09:27:53 -08:00
Matt Arsenault	3db4f5b0da	AMDGPU: Refine gfx950 xdl-write-vgpr hazard cases (#117285 ) The 2-pass XDL write VGPR, read by non-XDL SGEMM/DGEMM case was 1 wait state overly conservative. Previously, for gfx940, the XDL/non-XDL cases happened to have the same number of cycles in all cases. Now the XDL consumer case has an additional state for 2 pass sources.	2024-11-25 09:23:51 -08:00
Matt Arsenault	85601fd78f	AMDGPU: Handle v_mfma_f64_16x16x4_f64 write VGPR read srca/srcb hazard change for gfx950 (#117284 ) Increase in wait states from 11 to 19. The index for smfmac counts like srcA/srcB.	2024-11-22 20:30:06 -08:00
Matt Arsenault	db08d78c3e	AMDGPU: Handle v_mfma_f64_16x16x4_f64 srcc write VGPR hazard change for gfx950 (#117283 ) Read by sgemm/dgemm in srcc after v_mfma_f64_16x16x4_f64 increases from 9 to 17 wait states.	2024-11-22 20:26:58 -08:00
Matt Arsenault	8cb6c9907c	AMDGPU: Handle gfx950 XDL-write-overlapped-smfma-src-c wait state change (#117263 ) These have an additional wait state compared to gfx940.	2024-11-22 20:23:46 -08:00
Matt Arsenault	b078b882b9	AMDGPU: Handle gfx950 change in mfma_f64_16x16x4 + valu hazard (#117262 ) Increase from 11 wait states to 19	2024-11-22 20:20:23 -08:00
Juan Manuel Martinez Caamaño	d617371375	[AMDGPU] Use the SchedModel available in SIInstrInfo (#110859 ) Instead of allocating and initializing a new instance in `GCNHazardRecognizer` and `AMDGPUInsertDelayAlu`.	2024-10-02 18:17:27 +02:00
Carl Ritson	86627149f6	[AMDGPU] Mitigate GFX12 VALU read SGPR hazard (#100067 ) Any SGPR read by a VALU can potentially obscure SALU writes to the same register. Insert s_wait_alu instructions to mitigate the hazard on affected paths. Compute a global cache of SGPRs with any VALU reads and use this to avoid inserting mitigation for SGPRs never accessed by VALUs. To avoid excessive search when compile time is a priority, implement a secondary mode where all SALU writes are mitigated. Co-authored-by: Shilei Tian <shilei.tian@amd.com>	2024-09-04 12:15:20 +09:00
Carl Ritson	987ffc31f8	[AMDGPU] Refactor code for GETPC bundle updates in hazards (NFCI) As suggested in review for PR #100067. Refactor code for S_GETPC_B64 bundle updates for use with multiple hazard mitigations.	2024-08-23 11:58:47 +09:00
Jeffrey Byrnes	7bcf4d63cf	[AMDGPU] Correctly insert s_nops for dst forwarding hazard (#100276 ) MI300 ISA section 4.5 states there is a hazard between "VALU op which uses OPSEL or SDWA with changes the result’s bit position" and "VALU op consumes result of that op". This includes the case where the second op is SDWA with the same dest and dst_sel != DWORD && dst_unused == UNUSED_PRESERVE. In this case, there is an implicit read of the first op's dst and the compiler needs to resolve this hazard. Confirmed with HW team. We model dst_unused == UNUSED_PRESERVE as a tied-def of an implicit operand, so this PR checks for that. MI300_SP_MAS section 1.3.9.2 specifies that CVT_SR_FP8_F32 and CVT_SR_BF8_F32 with opsel[3:2] != 0 have a dest forwarding issue. Currently, we only add a check for CVT_SR_FP8_F32 with opsel[3] != 0; this PR adds support for opsel[2] != 0 as well	2024-08-22 11:38:24 -07:00
Carl Ritson	939a6624ac	[AMDGPU] Implement workaround for GFX11.5 export priority (#99273 ) On GFX11.5 shaders having completed exports need to execute/wait at a lower priority than shaders still executing exports. Add code to maintain normal priority of 2 for shaders that export and drop to priority 0 after exports.	2024-07-23 17:03:21 +09:00
Jay Foad	aeafdc21d2	[AMDGPU] Use using instead of typedef. NFC.	2024-07-16 16:44:12 +01:00
Jay Foad	0606747c96	[AMDGPU] Remove some pointless fallthrough annotations	2024-05-01 16:04:35 +01:00
Xu Zhang	f6d431f208	[CodeGen] Make the parameter TRI required in some functions. (#85968 ) Fixes #82659 There are some functions, such as `findRegisterDefOperandIdx` and `findRegisterDefOperand`, that have too many default parameters. As a result, we have encountered some issues due to the lack of TRI parameters, as shown in issue #82411. Following @RKSimon 's suggestion, this patch refactors 9 functions, including `{reads, kills, defines, modifies}Register`, `registerDefIsDead`, and `findRegister{UseOperandIdx, UseOperand, DefOperandIdx, DefOperand}`, adjusting the order of the TRI parameter and making it required. In addition, all the places that call these functions have also been updated correctly to ensure no additional impact. After this, the caller of these functions should explicitly know whether to pass the `TargetRegisterInfo` or just a `nullptr`.	2024-04-24 14:24:14 +01:00
Austin Kerbow	0234d90d81	[AMDGPU] Extend MFMA padding option to gfx90a+ (#86768 ) It was shown experimentally that this may have some benefit on newer HW.	2024-03-31 10:46:05 -07:00
Matt Arsenault	a6382de399	AMDGPU: Refactor mfma hazard handling [NFC] (#84276 ) Try to make this editable by using functions for the number of wait states as a function of the number of passes. I'm assuming the current hazard test coverage is comprehensive. This could probably use another round to further simplify it. Alternatively, I believe this could all be expressed in a constant table indexed by an instruction classify function and number of passes.	2024-03-07 14:39:59 +05:30
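
The oldest entry listed above (a6382de399) floats the idea that the MFMA wait-state logic could be expressed as a constant table indexed by an instruction classification and the number of passes, and several later entries describe padding with NOPs when too few independent instructions separate a hazard pair. The following is only a rough sketch of that idea. The category names, the set of pass counts, and every number in the table are hypothetical placeholders chosen for illustration; none of them come from the LLVM source or from hardware documentation.

```cpp
#include <array>
#include <cstddef>

// Hypothetical hazard categories; these are NOT the names used in LLVM.
enum class HazardKind : std::size_t { WriteThenSrcCRead = 0, WriteThenValuRead = 1 };

// Assumed set of pass counts an instruction may take (illustrative only).
constexpr std::array<int, 4> kPasses = {2, 4, 8, 16};

// Placeholder wait-state table: one row per HazardKind, one column per
// entry in kPasses. The values are invented and do not describe real hardware.
constexpr std::array<std::array<int, 4>, 2> kWaitStates = {{
    {2, 4, 8, 16},
    {5, 7, 11, 19},
}};

// Map a pass count to its column index, or -1 if it is not in kPasses.
constexpr int passIndex(int passes) {
  for (std::size_t i = 0; i < kPasses.size(); ++i)
    if (kPasses[i] == passes)
      return static_cast<int>(i);
  return -1;
}

// Look up the required wait states; returns -1 for an unknown pass count.
constexpr int requiredWaitStates(HazardKind kind, int passes) {
  const int idx = passIndex(passes);
  if (idx < 0)
    return -1;
  return kWaitStates[static_cast<std::size_t>(kind)][static_cast<std::size_t>(idx)];
}

// NOPs still needed once `elapsed` wait states have already been covered
// by unrelated instructions between the two hazardous instructions.
constexpr int nopsToInsert(int required, int elapsed) {
  return required > elapsed ? required - elapsed : 0;
}
```

With these placeholder values, `nopsToInsert(requiredWaitStates(HazardKind::WriteThenValuRead, 16), 5)` evaluates to 14, and the result is 0 whenever enough unrelated instructions already sit in between. A table like this keeps per-target differences in data rather than in branching code, which appears to be the motivation hinted at in that commit message.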


182 Commits