llvm-project

Author	SHA1	Message	Date
Sirish Pande	dbde0ba150	[AMDGPU] Rename 1_5xVGPRs to 1536VGPRs to be more contextual. NFC (#190245 ) Renaming feature from 1_5xVGPRs to 1536VGPRs to be more contextual.	2026-04-02 15:35:13 -05:00
Petar Avramovic	5226289b8e	Revert "AMDGPU: Codegen for v_dual_dot2acc_f32_f16/bf16 from VOP3" (#190159 ) This reverts commit 47f6a19181b426baa03182ab6a7a41e16b35301d. Breaks MIOpen, don't have proper fix yet.	2026-04-02 14:05:08 +00:00
Petar Avramovic	2f38a8fc57	AMDGPU: Codegen for v_dual_dot2acc_f32_f16/bf16 from VOP3 (#179226 ) For V_DOT2_F32_F16 and V_DOT2_F32_BF16 add their VOPDName and mark them with usesCustomInserter which will be used to add pre-RA register allocation hints to preferably assign dst and src2 to the same physical register. When the hint is satisfied, canMapVOP3PToVOPD recognises the instruction as eligible for VOPD pairing by checking if it is VOP2 like: dst==src2, no source modifiers, no clamp, and src1 is a register. Mark both instructions as commutable to allow a literal in src1 to be moved to src0, since VOPD only permits a literal in src0.	2026-03-25 11:47:07 +01:00
Sameer Sahasrabuddhe	e6789f94b9	[AMDGPU] Introduce ASYNC_CNT on GFX1250 (#185810 ) Async operations transfer data between global memory and LDS. Their progress is tracked by the ASYNC_CNT counter on GFX1250 and later architectures. This change introduces the representation of that counter in SIInsertWaitCnts. For now, the programmer must manually insert s_wait_asyncnt instructions. Later changes will add compiler assistance for generating the waits by including this counter in the asyncmark instructions. Assisted-by: Claude Sonnet 4.5 This is part of a stack: - #185813 - #185810	2026-03-20 09:52:05 +00:00
Stanislav Mekhanoshin	59bc629bf3	[AMDGPU] Fix decoding of SETREG MSBs (#187578 ) Decoding of the immediate was wrong with non-zero offset and did not factor MSB fixup offset handling.	2026-03-19 14:08:03 -07:00
vporpo	4df2725a2e	[AMDGPU][AMDGPUBaseInfo] Replace Waitcnt members with array (#182927 ) This patch replaces the member variables of Waitcnt with an array. This helps in several ways: (i) it helps replace switch cases with array accesses, and (ii) it makes it possible to operate on all elements with a loop, which is much easier and should require less maintenance if we add more counters.	2026-03-19 11:37:10 -07:00
Jannik Silvanus	6cc0cb69e4	AMDGPU: Don't limit VGPR usage based on occupancy in dVGPR mode (#185981 ) The maximum VGPR usage of a shader is limited based on the target occupancy, ensuring that the targeted number of waves actually fit onto a CU/WGP. However, in dynamic VGPR mode, we should not do that, because VGPRs are allocated dynamically at runtime, and there are no static constraints based on occupancy. Fix that in this patch. Also fixup the getMinNumVGPRs helper to behave consistently by always returning zero in dVGPR mode. This also fixes a problem where AMDGPUAsmPrinter bumps the VGPR usage to at least the result of getMinNumVGPRs, per my understanding in order to avoid an occupancy that is higher than the occupancy target. That was causing incorrect (too high) VGPR usages in dVGPR mode with medium-sized workgroups (say 768).	2026-03-16 14:39:26 +01:00
Mirko Brkušanin	efd20a3603	[AMDGPU] Codegen for min/max instructions for gfx1170 (#185625 ) gfx1170 does not have s_minimum/maximum_f16/f32 instructions so a new feature `SALUMinimumMaximumInsts` is added for gfx12+ subtargets.	2026-03-12 12:32:56 +01:00
Stanislav Mekhanoshin	b5fc8a15e0	[AMDGPU] Recover high VGPRs from S_SETREG_IMM32_B32 in disasm (#185968 )	2026-03-11 14:37:14 -07:00
Stanislav Mekhanoshin	f16d872c73	[AMDGPU] Add asm comments if setreg changes MSBs (#185774 )	2026-03-11 11:37:03 -07:00
Addmisol	1b61537d44	[AMDGPU][InstCombine] Fold unused m0 operand to poison for sendmsg intrinsics (#183755 ) Fold the second operand (m0) of llvm.amdgcn.s.sendmsg and llvm.amdgcn.s.sendmsghalt to poison when the message type does not use m0. Only MSG_GS_ALLOC_REQ (message ID 9) actually reads the m0 value. All other message types ignore it, so we can fold the operand to poison, which eliminates unnecessary s_mov_b32 m0, 0 instructions in the generated code. Fixes https://github.com/llvm/llvm-project/issues/183605 - Added InstCombine case for amdgcn_s_sendmsg and amdgcn_s_sendmsghalt intrinsics - Extract message ID using 8-bit mask to handle both pre-GFX11 (4-bit) and GFX11+ (8-bit) encoding - Only preserve m0 operand for ID_GS_ALLOC_REQ	2026-03-11 19:27:45 +01:00
Mirko Brkušanin	a5aa136eb3	[AMDGPU] Add GFX11_7Insts feature, eliminate isGFX1170 helpers. NFC (#185878 )	2026-03-11 17:05:18 +01:00
Matt Arsenault	8ec961e1a9	Reapply "AMDGPU: Annotate group size ABI loads with range metadata (#185420 )" (#185588 ) This reverts commit d5685ac6db0ae4cbca1745f18d8f2f7dc7d673a5. Fix off by one error. The end of the range is open.	2026-03-10 07:41:26 +00:00
Matt Arsenault	3545e51093	Revert "AMDGPU: Annotate group size ABI loads with range metadata (#185420 )" (#185521 ) This reverts commit 76daf31b4000623d5c9548348a859ea3ed8712e1. Bot failure.	2026-03-10 01:04:02 +00:00
Matt Arsenault	76daf31b40	AMDGPU: Annotate group size ABI loads with range metadata (#185420 ) We previously did the same for the grid size when annotated. The group size is easier, so it's weird that this wasn't implemented first.	2026-03-09 19:11:59 +01:00
Mariusz Sikora	2e93eb71b6	[AMDGPU] Use subtarget feature for flat offset bit width instead of arch checks (#183742 )	2026-03-05 10:58:42 +01:00
Matt Arsenault	2c95b8d518	AMDGPU: Clean up print handling of AMDGPUTargetID (#184643 ) Provide print to raw_ostream method and use it where applicable.	2026-03-04 18:52:22 +01:00
Matt Arsenault	8bb41c929f	AMDGPU: Fix copy of Triple (#184594 )	2026-03-04 12:41:56 +00:00
Changpeng Fang	5b144c0aec	[AMDGPU] Add suffix _d4 to tensor load/store with 4 groups D#, NFC (#184176 ) Rename TENSOR_LOAD_TO_LDS to TENSOR_LOAD_TO_LDS_d4; Rename TENSOR_STORE_FROM_LDS to TENSOR_STORE_FROM_LDS_d4; Also rename function names in a couple of tests to reflect this change.	2026-03-03 14:10:38 -08:00
Mariusz Sikora	610b40706f	[AMDGPU] Add VOP2 to gfx13 (#182812 ) Co-authored-by: Ivan Kosarev <ivan.kosarev@amd.com>	2026-02-24 08:42:49 +01:00
Mariusz Sikora	1911488ca3	[AMDGPU] Add VOPD to gfx13 (#182815 ) Co-authored-by: Jay Foad <jay.foad@amd.com>	2026-02-24 08:42:03 +01:00
vporpo	5a756c8a3a	[AMDGPU][SIInsertWaitcnts][NFC] Make Waitcnt members private (#180772 ) This patch makes Waitcnt member variables private and replaces their accesses with calls to set() or get(). This will help us change the implementation to an array in the follow-up patch.	2026-02-23 11:44:19 -08:00
Mariusz Sikora	61fa74b899	[AMDGPU] Add VEXPORT encoding for gfx13 (#181788 ) Co-authored-by: Jay Foad <jay.foad@amd.com>	2026-02-19 08:46:50 +01:00
Jay Foad	27144f4c2e	[TableGen] Return int32_t from InstrMapping table lookup functions. NFC. (#182079 ) Since #182059 there is only one case in which these functions return -1, so callers no longer need to distinguish between (int64_t)-1 and (uint32_t)-1, so we can go back to a 32-bit return value like it was before #180954.	2026-02-18 18:49:58 +00:00
Mirko Brkušanin	829afc4c91	[AMDGPU] Add WMMA and SWMMAC instructions for gfx1170 (#180731 ) Introduce two new subtarget features: - WMMA256bInsts for GFX11 WMMA instructions and - WMMA128bInsts for GFX1170 and GFX12 WMMA and SWMMAC instructions Some WMMA instructions have changed from GFX 11.0 to GFX 11.7 so new Real versions were added with "_gfx1170" suffix. For consistency all WMMA and SWMMAC GFX11.7 instructions use this suffix. To resolve decoding issues between different formats for some WMMA instructions between GFX 11 and GFX 11.7, new decoding tables were added.	2026-02-18 19:17:48 +01:00
sstipano	5ec5701db3	Reapply "[MC][TableGen] Expand Opcode field of MCInstrDesc" (#180321 ) (#180954 ) Difference from the previous version is that this one doesn't actually encode opcodes in matcher tables as 32 bits, but still as 16 bits.	2026-02-12 09:17:02 +01:00
vporpo	9898082bd3	[AMDGPU][SIInsertWaitcnt][NFC] Access Waitcnt elements using InstCounterType (#178345 ) This patch introduces `get(T)` and `set(T, Val)` functions for Waitcnt and removes getCounterRef() and getWait(). For this to work we also need to move InstrCounterType to AMDGPUBaseInfo.h. Please note that the member variables are still public to keep this patch small. They will be replaced in the follow-up patch.	2026-02-09 17:54:08 -08:00
Vladimir Vereschaka	19d681177f	Revert "[MC][TableGen] Expand Opcode field of MCInstrDesc" (#180321 ) Reverts llvm/llvm-project#179652 This PR causes the out-of-memory build failures on many Windows builders.	2026-02-06 21:58:50 -08:00
sstipano	13d8870d45	[MC][TableGen] Expand Opcode field of MCInstrDesc (#179652 ) Increase width of Opcode to `int` from `short` to allow more capacity.	2026-02-06 20:21:48 +01:00
vporpo	1658456ccf	[AMDGPU] Introduce custom MIR formatting for s_wait_alu (#176316 ) This patch implements a custom printer/parser for the immediate operand of s_wait_alu that prints/parses the decoded counter values. Format: ``` .<counter1>_<value1>_<counter2>_<value2> ``` Example: `s_wait_alu .VaVdst_1_VmVsrc_1` ; Which is equivalent to this: `s_wait_alu 8167` Features: - If a counter is at its maximum value it won't get printed. - The parser will error out if a counter is greater or equal to its max value. - If all counters are disabled we can use 'AllOff'. - For now we also accept numeric values for backwards compatibility with older MIR. Note: This is similar to https://github.com/llvm/llvm-project/pull/96004 but for `s_wait_alu`.	2026-01-31 10:46:59 -08:00
Mariusz Sikora	3c0f5045e1	[AMDGPU] Add FeatureGFX13 and SMEM encoding for gfx13 (#177567 ) For now list of features is based on gfx12 and gfx1250 --------- Co-authored-by: Jay Foad <jay.foad@amd.com>	2026-01-26 14:16:36 +01:00
Shilei Tian	c253b9f9ca	[AMDGPU] Fix inline constant encoding for `v_pk_fmac_f16` (#176659 ) This PR handles `v_pk_fmac_f16` inline constant encoding/decoding differences between pre-GFX11 and GFX11+ hardware. - Pre-GFX11: fp16 inline constants produce `(f16, 0)` - value in low 16 bits, zero in high. - GFX11+: fp16 inline constants are duplicated to both halves `(f16, f16)`. Fixes #94116.	2026-01-20 19:14:59 -05:00
Shilei Tian	e59ed9a29e	[AMDGPU] Introduce `AMDGPUSubtargetFeature` multiclass to reduce boilerplate (#176981 ) Many `SubtargetFeature` definitions in `AMDGPU.td` follow a repetitive pattern where a `FeatureXYZ` is paired with a `HasXYZ` predicate. This creates significant code duplication. This PR introduces `AMDGPUSubtargetFeature` multiclass that generates both the `SubtargetFeature` and its corresponding `Predicate` from a single definition. The multiclass accepts an optional `GenPredicate` parameter (default 1) to skip predicate generation when not needed. Not converted: - Features with dependencies - multiclass doesn't support this yet. Will do it in a follow-up. - Features with irregular predicates (e.g., Predicate without `AssemblerPredicate`, negated `Predicate`, complex multi-feature conditions). For those without `AssemblerPredicate`, this can be done by adding an extra optional argument to indicate whether `AssemblerPredicate` is needed. Will be done in a follow-up. - Features where field name doesn't match the `HasXYZ` pattern. 148 features converted, saving ~529 lines of code.	2026-01-20 18:46:02 +00:00
Pankaj Dwivedi	6b86e24ec1	[AMDGPU][SIInsertWaitcnt] Address review feedback for waitcnt profiling expansion (#175922 )	2026-01-17 14:57:45 +05:30
Ramkumar Ramachandra	d69335bac9	[LLVM] Clean up code using [not_]equal_to (NFC) (#175824 ) Use llvm::[not_]equal_to landed in d2a521750 ([ADT] Introduce bind_{front,back}, [not_]equal_to, #175056) across LLVM for cleaner code.	2026-01-13 21:19:39 +00:00
Pankaj Dwivedi	3dfb782333	[AMDGPU][SIInsertWaitcnt] Implement Waitcnt Expansion for Profiling (#169345 ) Reference issue: https://github.com/ROCm/llvm-project/issues/67 This patch adds support for expanding s_waitcnt instructions into sequences with decreasing counter values, enabling PC-sampling profilers to identify which specific memory operation is causing a stall. This is controlled via: Clang flag: -mamdgpu-expand-waitcnt-profiling / -mno-amdgpu-expand-waitcnt-profiling Function attribute: "amdgpu-expand-waitcnt-profiling" When enabled, instead of emitting a single waitcnt, the pass generates a sequence that waits for each outstanding operation individually. For example, if there are 5 outstanding memory operations and the target is to wait until 2 remain: Original: s_waitcnt vmcnt(2) Expanded: s_waitcnt vmcnt(4) s_waitcnt vmcnt(3) s_waitcnt vmcnt(2) The expansion starts from (Outstanding - 1) down to the target value, since waitcnt(Outstanding) would be a no-op (the counter is already at that value). - Uses ScoreBrackets to determine the actual number of outstanding operations - Only expands when operations complete in-order - Skips expansion for mixed event types (e.g., LDS+SMEM on same counter) - Skips expansion for scalar memory (always out-of-order) Related previous work for reference - PR: llvm/llvm-project#79236 (related `-amdgpu-waitcnt-forcezero`) --------- Co-authored-by: Pierre van Houtryve <pierre.vanhoutryve@amd.com>	2026-01-12 17:35:06 +05:30
Jay Foad	475f022cb7	[AMDGPU] Add support for GFX12 expert scheduling mode 2 (#170319 )	2026-01-09 15:49:10 +00:00
Sameer Sahasrabuddhe	130fa98a29	[AMDGPU][NFC] dump Waitcnt using an ostream operator (#171251 )	2025-12-10 20:45:49 +05:30
anjenner	740a3ad1f7	AMDGPU: Add codegen for atomicrmw operations usub_cond and usub_sat (#141068 ) Split off from https://github.com/llvm/llvm-project/pull/105553 as per discussion there.	2025-12-05 12:37:33 +00:00
Stanislav Mekhanoshin	31ec45a3d9	[AMDGPU] Fix VGPR lowering for V_DUAL_FMAMK_F32 (#170567 ) Fixes: https://github.com/llvm/llvm-project/issues/170552	2025-12-03 15:12:26 -08:00
Jay Foad	d748c81218	[AMDGPU] Change the immediate operand of s_waitcnt_depctr / s_wait_alu (#169378 ) The 16-bit immediate operand of s_waitcnt_depctr / s_wait_alu has some unused bits. Previously codegen would set these bits to 1, but setting them to 0 matches the SP3 assembler behaviour better, which in turn means that we can print them using the human readable SP3 syntax: s_wait_alu 0xfffd ; unused bits set to 1 s_wait_alu 0xff9d ; unused bits set to 0 s_wait_alu depctr_va_vcc(0) ; unused bits set to 0, human readable Note that the set of unused bits changed between GFX10.1 and GFX10.3.	2025-11-25 11:55:26 +00:00
Changpeng Fang	5f38ae4a77	[AMDGPU] update LDS block size for gfx1250 (#167614 ) LDS block size should be 2048 bytes (512 dwords) based on current spec.	2025-11-17 16:03:47 -08:00
Shilei Tian	72a6ae6844	[AMDGPU] Fix wrong MSB encoding for V_FMAMK instructions (#168107 ) These instructions use `src0`, `imm`, `src1` as operand. Fixes SWDEV-566579.	2025-11-14 22:50:17 +00:00
Craig Topper	8eb28ca83d	[AMDGPU] Remove implicit conversions of MCRegister to unsigned. NFC (#167284 ) Use MCRegister instead of MCPhysReg or use MCRegister::id().	2025-11-11 08:54:27 -08:00
Ivan Kosarev	20f41ed8c1	[AMDGPU][MC] Avoid creating lit64() operands unless asked or needed. (#161191 ) There should normally be no need to generate implicit lit64() modifiers on the assembler side. It's the encoder's responsibility to recognise literals that are implicitly 64 bits wide. The exceptions are where we rewrite floating-point operand values as integer ones, which would not be assembled back to the original values unless wrapped into lit64(). Respect explicit lit() modifiers for non-inline values as necessary to avoid regressions in MC tests. This change still doesn't prevent use of inline constants where lit()/lit64 is specified; subject to a separate patch. On disassembling, only create lit64() operands where necessary for correct round-tripping. Add round-tripping tests where useful and feasible.	2025-10-08 10:51:55 +01:00
Matt Arsenault	1a5494ca4a	AMDGPU: Use RegClassByHwMode to manage operand VGPR operand constraints (#158272 ) This removes special case processing in TargetInstrInfo::getRegClass to fixup register operands which depending on the subtarget support AGPRs, or require even aligned registers. This regresses assembler diagnostics, which currently work by hackily accepting invalid cases and then post-rejecting a validly parsed instruction. On the plus side this now emits a comment when disassembling unaligned registers for targets with the alignment requirement.	2025-10-08 11:19:54 +09:00
Matt Arsenault	cb53a2de37	AMDGPU: Account for read/write register intrinsics for AGPR usage (#161988 ) Fix the special case intrinsics that can directly reference a physical register. There's no reason to use this.	2025-10-08 02:09:22 +00:00
Diana Picus	ebbc0e97b9	[AMDGPU] Remove subtarget features for dynamic VGPRs (#160822 ) Users of the backend are expected to enable dynamic VGPRs via the `amdgpu-dynamic-vgpr-block-size` attribute instead of the subtarget features (see https://github.com/llvm/llvm-project/pull/133444).	2025-10-06 09:50:11 +02:00
Stanislav Mekhanoshin	f693a7f2c2	[AMDGPU] Fix high vgpr printing with true16 (#160209 )	2025-09-23 09:51:21 -07:00
Ivan Kosarev	7ba7021951	[AMDGPU][MC] Keep MCOperands unencoded. (#158685 ) We have proper encoding facilities to encode operands and instructions; there's no need to pollute the MC representation with encoding details. Supposed to be an NFCI, but happens to fix some re-encoded instruction codes in disassembler tests. The 64-bit operands are to be addressed in following patches introducing MC-level representation for lit() and lit64() modifiers, to then be respected by both the assembler and disassembler.	2025-09-16 09:01:01 +01:00
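
The waitcnt expansion described in commit 3dfb782333 above can be illustrated with a small sketch. This is not the code from SIInsertWaitcnts; the function name `expand_waitcnt` and its parameters are hypothetical, and the fallback for the case where nothing needs expanding is an assumption of this sketch. It only reproduces the arithmetic stated in the commit message: the sequence starts at (Outstanding - 1) and counts down to the target value.

```python
def expand_waitcnt(counter: str, outstanding: int, target: int) -> list[str]:
    """Model the expanded wait sequence for a single counter.

    counter:     counter name as it appears in the assembly, e.g. "vmcnt"
    outstanding: number of operations currently in flight
    target:      number of operations allowed to remain outstanding
    """
    if outstanding <= target:
        # Assumption of this sketch: with nothing to expand, keep the
        # single original wait unchanged.
        return [f"s_waitcnt {counter}({target})"]
    # Start at outstanding - 1, because waiting for `outstanding` would be
    # a no-op (the counter is already at that value), then step down to
    # the target so each completed operation gets its own wait.
    return [
        f"s_waitcnt {counter}({n})"
        for n in range(outstanding - 1, target - 1, -1)
    ]


# The example from the commit message: 5 outstanding, wait until 2 remain.
for line in expand_waitcnt("vmcnt", 5, 2):
    print(line)
```

The sketch ignores the conditions the commit lists for skipping expansion (out-of-order completion, mixed event types, scalar memory); a real implementation would check those first.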

464 Commits