PR #149247 made the MD accessible to the backend, so we can now leverage
it in the memory model. The first use case is detecting whether a flat op
can access scratch memory.
This benefits both the MemoryLegalizer and InsertWaitCnt passes.
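As a rough illustration only (not the PR's actual code), a conservative backend-side check might look like the sketch below; `flatOpMayAccessScratch` is a hypothetical helper name:

```cpp
// Hypothetical sketch, assuming AMDGPU backend context (the private
// target header SIInstrInfo.h is available in-tree).
#include "SIInstrInfo.h"
#include "llvm/CodeGen/MachineInstr.h"
#include "llvm/CodeGen/MachineMemOperand.h"
#include "llvm/Support/AMDGPUAddrSpace.h"

using namespace llvm;

// Decide whether a FLAT op may access scratch, based on the address
// spaces recorded on its memory operands (where IR-level MD ends up).
static bool flatOpMayAccessScratch(const MachineInstr &MI) {
  if (!SIInstrInfo::isFLAT(MI))
    return false;
  // No memory operands: be conservative and assume scratch is reachable.
  if (MI.memoperands_empty())
    return true;
  for (const MachineMemOperand *MMO : MI.memoperands()) {
    unsigned AS = MMO->getAddrSpace();
    // A generic (flat) or private access may still touch scratch.
    if (AS == AMDGPUAS::FLAT_ADDRESS || AS == AMDGPUAS::PRIVATE_ADDRESS)
      return true;
  }
  return false;
}
```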
Sec. 4.6.7.1 of the gfx1250 SPG states that if an SGPR is used
as an operand, only one SGPR will be read for both the low and high
operations. As a result, the corresponding bits in `op_sel` and
`op_sel_hi` must be the same when the operand is an SGPR.
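A minimal sketch of the resulting constraint, with illustrative names (`OpSel`/`OpSelHi` are the decoded bit vectors, `OpIdx` the source operand's index):

```cpp
// Illustrative check, not the in-tree verifier: for an SGPR source of a
// packed (VOP3P) op, the hardware reads a single SGPR for both the low
// and high halves, so the operand's op_sel and op_sel_hi bits must match.
static bool isValidPackedSel(unsigned OpSel, unsigned OpSelHi,
                             unsigned OpIdx, bool SrcIsSGPR) {
  if (!SrcIsSGPR)
    return true; // VGPR sources may select halves independently
  unsigned Bit = 1u << OpIdx;
  return (OpSel & Bit) == (OpSelHi & Bit);
}
```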
Co-authored-by: Tian, Shilei <Shilei.Tian@amd.com>
Whole wave functions are functions that will run with a full EXEC mask.
They will not be invoked directly, but instead will be launched by way
of a new intrinsic, `llvm.amdgcn.call.whole.wave` (to be added in
a future patch). These functions are meant as an alternative to the
`llvm.amdgcn.init.whole.wave` or `llvm.amdgcn.strict.wwm` intrinsics.
Whole wave functions will set EXEC to -1 in the prologue and restore the
original value of EXEC in the epilogue. They must have a special first
argument, `i1 %active`, that is going to be mapped to EXEC. They may
have either the default calling convention or amdgpu_gfx. The inactive
lanes need to be preserved for all registers used; the active lanes, only
for the CSRs.
At the IR level, arguments to a whole wave function (other than
`%active`) contain poison in their inactive lanes. Likewise, the return
value for the inactive lanes is poison.
This patch contains the following work:
* 2 new pseudos, SI_SETUP_WHOLE_WAVE_FUNC and SI_WHOLE_WAVE_FUNC_RETURN,
used for managing the EXEC mask. SI_SETUP_WHOLE_WAVE_FUNC will return
an SReg_1 representing `%active`, which needs to be passed into
SI_WHOLE_WAVE_FUNC_RETURN.
* SelectionDAG support for generating these 2 new pseudos and the
special handling of `%active`. Since the return may be in a different
basic block, it's difficult to add the virtual reg for `%active` to
SI_WHOLE_WAVE_FUNC_RETURN, so we initially generate an IMPLICIT_DEF,
which is later replaced via a custom inserter.
* Expansion of the 2 pseudos during prolog/epilog insertion (sketched
below). PEI also marks any used VGPRs as WWM registers, which are then
spilled and restored with the usual logic.
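As a rough wave64 sketch of the prologue half of that expansion (the in-tree code necessarily handles more; the opcode choice here is an assumption):

```cpp
#include "SIInstrInfo.h"
#include "llvm/CodeGen/MachineInstrBuilder.h"

using namespace llvm;

// SI_SETUP_WHOLE_WAVE_FUNC conceptually becomes an s_or_saveexec with an
// all-ones mask: the original EXEC lands in DstReg (this is `%active`
// viewed as a lane mask) and EXEC itself becomes -1. The return pseudo
// then restores EXEC from the same register in the epilogue.
static void expandSetupWholeWaveFunc(MachineBasicBlock &MBB,
                                     MachineBasicBlock::iterator I,
                                     const SIInstrInfo &TII,
                                     Register DstReg) {
  BuildMI(MBB, I, DebugLoc(), TII.get(AMDGPU::S_OR_SAVEEXEC_B64), DstReg)
      .addImm(-1);
}
```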
Future patches will include the `llvm.amdgcn.call.whole.wave` intrinsic
and a lot of optimization work (especially in order to reduce spills
around function calls).
---------
Co-authored-by: Matt Arsenault <Matthew.Arsenault@amd.com>
Co-authored-by: Shilei Tian <i@tianshilei.me>
WMMA XDL instructions are tracked as TRANS ops, and the compiler should
treat them the same as TRANS in S_DELAY_ALU insertion. We use a searchable
table so the InsertDelayAlu pass can recognize these WMMA XDL instructions.
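The pass-side query might then look like the sketch below; `isWMMAXDLInstr` stands in for the TableGen-generated accessor, whose real name may differ:

```cpp
#include "SIInstrInfo.h"
#include "llvm/CodeGen/MachineInstr.h"

using namespace llvm;

// Assumed: generated from the searchable table of WMMA XDL opcodes.
bool isWMMAXDLInstr(unsigned Opcode);

// WMMA XDL ops get the same TRANS-style delay modeling as real TRANS
// ops when S_DELAY_ALU dependencies are computed.
static bool needsTransDelay(const MachineInstr &MI) {
  return SIInstrInfo::isTRANS(MI) || isWMMAXDLInstr(MI.getOpcode());
}
```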
Co-authored-by: Stefan Stipanovic <Stefan.Stipanovic@amd.com>
This patch tracks the register operands of both VMEM (FLAT, MUBUF,
MTBUF) and SMEM load-store operations and inserts an S_WAIT_XCNT
instruction with a sufficient wait-count before potentially redefining
them. For VMEM instructions, XNACKs are returned in the same order as
the instructions were issued, and hence non-zero counter values can be
inserted. However, SMEM execution is out-of-order, and so is its XNACK
reception. Thus, only a zero counter value can be inserted to capture
SMEM dependencies.
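A minimal sketch of the resulting rule, with invented names, assuming `s_wait_xcnt N` blocks until at most N tracked events remain outstanding:

```cpp
// Since VMEM XNACKs retire in issue order, it suffices to wait until
// only the ops issued *after* the producing instruction remain
// outstanding; SMEM retires out of order, so only a full drain is sound.
static unsigned requiredXcnt(bool ProducerIsSMEM,
                             unsigned OpsIssuedAfterProducer) {
  return ProducerIsSMEM ? 0 : OpsIssuedAfterProducer;
}
```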
Move canGuaranteeTCO and mayTailCallThisCC into AMDGPUBaseInfo instead
of keeping two copies for DAG/GlobalISel.
Also remove isKernelCC, which doesn't agree with isKernel and doesn't
seem very useful.
While at it, also move all the CC-related helpers into AMDGPUBaseInfo.h and
mark them constexpr.
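The shared helpers end up with roughly this shape (a simplified sketch; the exact set of calling conventions covered is the in-tree code's business):

```cpp
#include "llvm/IR/CallingConv.h"

// Simplified sketch of the now-shared constexpr helpers.
namespace llvm::AMDGPU {
constexpr bool canGuaranteeTCO(CallingConv::ID CC) {
  return CC == CallingConv::Fast;
}
constexpr bool mayTailCallThisCC(CallingConv::ID CC) {
  return CC == CallingConv::C || CC == CallingConv::AMDGPU_Gfx ||
         canGuaranteeTCO(CC);
}
} // namespace llvm::AMDGPU
```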
This reapplies 067caaa and 382a085 (reverting b35f6e2) with fixes to
issues detected by the address sanitizer (MIs have to be removed from
live intervals before being removed from their parent MBB).
Original commit description below.
AMDGPU scheduler's `PreRARematStage` attempts to increase function
occupancy w.r.t. ArchVGPR usage by rematerializing trivial
ArchVGPR-defining instructions next to their single use. It first
collects all eligible trivially rematerializable instructions in the
function, then sinks them one-by-one while recomputing occupancy in all
affected regions each time to determine if and when it has managed to
increase overall occupancy. If it does, changes are committed to the
scheduler's state; otherwise modifications to the IR are reverted and
the scheduling stage gives up.
In both cases, this scheduling stage currently involves repeated queries
for up-to-date occupancy estimates and some state copying to enable
reversal of sinking decisions when occupancy is revealed not to
increase. The current implementation also does not accurately track
register pressure changes in all regions affected by sinking decisions.
This commit refactors this scheduling stage, improving RP tracking and
splitting the stage into two distinct steps to avoid repeated occupancy
queries and IR/state rollbacks.
- Analysis and collection (`canIncreaseOccupancyOrReduceSpill`). The
number of ArchVGPRs to save to reduce spilling or increase function
occupancy by 1 (when there is no spilling) is computed. Then,
instructions eligible for rematerialization are collected, stopping as
soon as enough have been identified to be able to achieve our goal
(according to slightly optimistic heuristics). If there aren't enough
such instructions, the scheduling stage stops here.
- Rematerialization (`rematerialize`). Instructions collected in the
first step are rematerialized one-by-one. Now we are able to directly
update the scheduler's state since we have already done the occupancy
analysis and know we won't have to roll back any state. Register
pressures for impacted regions are recomputed only once, as opposed to
at every sinking decision.
If the stage attempted to increase occupancy and neither the
rematerializations alone nor rescheduling afterwards managed to improve
it, then all rematerializations are rolled back.
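Schematically, the refactored stage now reads as follows; only the two method names quoted above are real, and the surrounding driver is paraphrased:

```cpp
bool canIncreaseOccupancyOrReduceSpill(); // stage method, per the text
void rematerialize();                     // stage method, per the text

bool runPreRARematStage() {
  // Step 1: compute the ArchVGPR savings target and collect just enough
  // trivially rematerializable defs to (optimistically) reach it.
  if (!canIncreaseOccupancyOrReduceSpill())
    return false; // not enough candidates; nothing to undo
  // Step 2: commit the remats, recomputing pressure once per affected
  // region; no rollback machinery is needed at this point (modulo the
  // occupancy-targeted case handled after rescheduling).
  rematerialize();
  return true;
}
```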
This also reverts the related "[AMDGPU] Regenerate mfma-loop.ll test"
commit. These commits introduce a memory error detected by ASan
(#125885).
This reverts commit 382a085a95b0abeac77b150b7b644b372bd08e78.
This reverts commit 067caaafb58a156d0d77229422607782a639f5b5.
Add a target feature for point sample acceleration and enable it for
relevant targets.
Also add support to insert waitcnts where required when point sample
acceleration may have occurred. This has implications for out-of-order
returns, which is why the extra waitcnts are required.
Add a VMEM_NOSAMPLER bit in the register masks to determine when a
waitcnt is required.
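For context, the waitcnt pass already buckets VMEM returns by type; conceptually the new bit extends that bucketing (a sketch, not the exact in-tree enum):

```cpp
// Registers last written by sampler ops and by other VMEM ops are
// tracked in separate buckets. When point sample acceleration may have
// occurred, returns can come back out of order, so mixing buckets on
// the same register forces a wait back to zero before reuse.
enum VmemType { VMEM_NOSAMPLER, VMEM_SAMPLER, VMEM_BVH };
```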
In dynamic VGPR mode, we can allocate up to 8 blocks of either 16 or 32
VGPRs (based on a chip-wide setting which we can model with a Subtarget
feature). Update some of the subtarget helpers to reflect this.
In particular:
- getVGPRAllocGranule is set to the block size
- getAddressableNumVGPRs will limit itself to 8 * the size of a block
(see the sketch below)
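A worked sketch of those rules as hypothetical standalone helpers (the real subtarget queries take a subtarget object rather than a raw block size):

```cpp
// Dynamic VGPR mode allocates in blocks of BlockSize (16 or 32, a
// chip-wide setting), with at most 8 blocks addressable per wave.
constexpr unsigned getVGPRAllocGranule(unsigned BlockSize) {
  return BlockSize;
}
constexpr unsigned getAddressableNumVGPRs(unsigned BlockSize) {
  return 8 * BlockSize; // 128 with 16-VGPR blocks, 256 with 32
}
// E.g. a function using 40 VGPRs needs (40 + 15) / 16 == 3 blocks of 16.
constexpr unsigned getNumVGPRBlocks(unsigned NumVGPRs, unsigned BlockSize) {
  return (NumVGPRs + BlockSize - 1) / BlockSize;
}
```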
We also try to be more careful about how many VGPR blocks we allocate.
Therefore, when deciding if we should revert scheduling after a given
stage, we check that we haven't increased the number of VGPR blocks that
need to be allocated.
---------
Co-authored-by: Jannik Silvanus <jannik.silvanus@amd.com>
From GFX10 onwards it is possible to employ benevolent scheduling of
waves. This patch unconditionally enables, for the `amdhsa` OS, the bit
which controls that capability: it is beneficial for algorithms that
rely on more complex concurrent coordination, and it is generally
performance-neutral otherwise.