439 Commits

Author SHA1 Message Date
sstipano
5ec5701db3
Reapply "[MC][TableGen] Expand Opcode field of MCInstrDesc" (#180321) (#180954)
Difference from the previous version is that this one doesn't actually
encode opcodes in matcher tables as 32 bits, but still as 16 bits.
2026-02-12 09:17:02 +01:00
vporpo
9898082bd3
[AMDGPU][SIInsertWaitcnt][NFC] Access Waitcnt elements using InstCounterType (#178345)
This patch introduces `get(T)` and `set(T, Val)` functions for Waitcnt
and removes getCounterRef() and getWait(). For this to work we also need
to move InstrCounterType to AMDGPUBaseInfo.h.

Please note that the member variables are still public to keep this
patch small.
They will be replaced in the follow-up patch.
2026-02-09 17:54:08 -08:00
Vladimir Vereschaka
19d681177f
Revert "[MC][TableGen] Expand Opcode field of MCInstrDesc" (#180321)
Reverts llvm/llvm-project#179652

This PR causes out-of-memory build failures on many Windows
builders.
2026-02-06 21:58:50 -08:00
sstipano
13d8870d45
[MC][TableGen] Expand Opcode field of MCInstrDesc (#179652)
Increase width of Opcode to `int` from `short` to allow more capacity.
2026-02-06 20:21:48 +01:00
vporpo
1658456ccf
[AMDGPU] Introduce custom MIR formatting for s_wait_alu (#176316)
This patch implements a custom printer/parser for the immediate operand
of s_wait_alu that prints/parses the decoded counter values.

Format:
```
 .<counter1>_<value1>_<counter2>_<value2>
```

Example:
 `s_wait_alu .VaVdst_1_VmVsrc_1`
 ; Which is equivalent to this:
 `s_wait_alu 8167`

Features:
- If a counter is at its maximum value it won't get printed.
- The parser will error out if a counter is greater than or equal to its
max value.
- If all counters are disabled we can use 'AllOff'.
- For now we also accept numeric values for backwards compatibility with
older MIR.

Note: This is similar to https://github.com/llvm/llvm-project/pull/96004
but for `s_wait_alu`.
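
A minimal sketch of the textual format described above, not the actual LLVM printer/parser. The counter table is hypothetical: the names come from the example in this message, while the maximum values and the exact spelling of the all-disabled form (`AllOff` without a leading dot) are assumptions made for illustration. Numeric immediates and the bit-level encoding are deliberately left out.

```python
from typing import Dict, List, Tuple

# Hypothetical counter table: (name, maximum value). The maxima are
# assumptions for illustration, not taken from the commit.
COUNTERS: List[Tuple[str, int]] = [("VaVdst", 15), ("VmVsrc", 7)]


def format_counters(values: Dict[str, int]) -> str:
    """Print only counters below their maximum; 'AllOff' if none are."""
    parts = []
    for name, max_value in COUNTERS:
        value = values.get(name, max_value)
        if value < max_value:
            parts.append(f"{name}_{value}")
    return "." + "_".join(parts) if parts else "AllOff"


def parse_counters(text: str) -> Dict[str, int]:
    """Parse '.<counter1>_<value1>_...' into a full counter mapping."""
    # Counters that are not mentioned stay at their maximum (disabled).
    result = {name: max_value for name, max_value in COUNTERS}
    if text == "AllOff":
        return result
    if not text.startswith("."):
        raise ValueError("expected a leading '.'")
    tokens = text[1:].split("_")
    if len(tokens) % 2 != 0:
        raise ValueError("expected <counter>_<value> pairs")
    maxima = dict(COUNTERS)
    for name, raw in zip(tokens[0::2], tokens[1::2]):
        if name not in maxima:
            raise ValueError(f"unknown counter: {name}")
        if not raw.isdigit():
            raise ValueError(f"invalid value for {name}: {raw}")
        value = int(raw)
        # Mirrors the rule above: values at or beyond the maximum are errors.
        if value >= maxima[name]:
            raise ValueError(f"{name} must be below {maxima[name]}")
        result[name] = value
    return result
```

With this table, `format_counters({"VaVdst": 1, "VmVsrc": 1})` yields the string from the example, and parsing that string returns the same mapping.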
2026-01-31 10:46:59 -08:00
Mariusz Sikora
3c0f5045e1
[AMDGPU] Add FeatureGFX13 and SMEM encoding for gfx13 (#177567)
For now, the list of features is based on gfx12 and gfx1250.

---------

Co-authored-by: Jay Foad <jay.foad@amd.com>
2026-01-26 14:16:36 +01:00
Shilei Tian
c253b9f9ca
[AMDGPU] Fix inline constant encoding for v_pk_fmac_f16 (#176659)
This PR handles `v_pk_fmac_f16` inline constant encoding/decoding
differences between pre-GFX11 and GFX11+ hardware.

- Pre-GFX11: fp16 inline constants produce `(f16, 0)` - value in low 16
bits, zero in high.
- GFX11+: fp16 inline constants are duplicated to both halves `(f16,
f16)`.
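
The difference between the two behaviours can be sketched as follows. This only illustrates the 32-bit value an fp16 inline constant expands to, as described in the two bullets above; the function names are hypothetical and this is not the backend's code.

```python
import struct


def f16_bits(value: float) -> int:
    """Return the IEEE half-precision bit pattern of value."""
    return struct.unpack("<H", struct.pack("<e", value))[0]


def packed_inline_constant(value: float, gfx11_plus: bool) -> int:
    """Expand an fp16 inline constant to the 32-bit packed operand value."""
    lo = f16_bits(value)
    # Pre-GFX11: (f16, 0), high half is zero.
    # GFX11+:    (f16, f16), the constant is duplicated into both halves.
    hi = lo if gfx11_plus else 0
    return (hi << 16) | lo
```

For the constant 1.0 (half-precision bits `0x3C00`), the sketch gives `0x00003C00` before GFX11 and `0x3C003C00` on GFX11+.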

Fixes #94116.
2026-01-20 19:14:59 -05:00
Shilei Tian
e59ed9a29e
[AMDGPU] Introduce AMDGPUSubtargetFeature multiclass to reduce boilerplate (#176981)
Many `SubtargetFeature` definitions in `AMDGPU.td` follow a repetitive
pattern where a `FeatureXYZ` is paired with a `HasXYZ` predicate. This
creates significant code duplication.

This PR introduces `AMDGPUSubtargetFeature` multiclass that generates
both the `SubtargetFeature` and its corresponding `Predicate` from a
single definition. The multiclass accepts an optional `GenPredicate`
parameter (default 1) to skip predicate generation when not needed.

Not converted:

- Features with dependencies - multiclass doesn't support this yet. Will
do it in a follow-up.
- Features with irregular predicates (e.g., Predicate without
`AssemblerPredicate`, negated `Predicate`, complex multi-feature
conditions). For those without `AssemblerPredicate`, this can be done by
adding an extra optional argument to indicate whether
`AssemblerPredicate` is needed. Will be done in a follow-up.
- Features where field name doesn't match the `HasXYZ` pattern.

148 features converted, saving ~529 lines of code.
2026-01-20 18:46:02 +00:00
Pankaj Dwivedi
6b86e24ec1
[AMDGPU][SIInsertWaitcnt] Address review feedback for waitcnt profiling expansion (#175922) 2026-01-17 14:57:45 +05:30
Ramkumar Ramachandra
d69335bac9
[LLVM] Clean up code using [not_]equal_to (NFC) (#175824)
Use llvm::[not_]equal_to landed in d2a521750 ([ADT] Introduce
bind_{front,back}, [not_]equal_to, #175056) across LLVM for cleaner
code.
2026-01-13 21:19:39 +00:00
Pankaj Dwivedi
3dfb782333
[AMDGPU][SIInsertWaitcnt] Implement Waitcnt Expansion for Profiling (#169345)
Reference issue: https://github.com/ROCm/llvm-project/issues/67

This patch adds support for expanding s_waitcnt instructions into
sequences with decreasing counter values, enabling PC-sampling profilers
to identify which specific memory operation is causing a stall.

This is controlled via:
Clang flag: -mamdgpu-expand-waitcnt-profiling /
-mno-amdgpu-expand-waitcnt-profiling
Function attribute: "amdgpu-expand-waitcnt-profiling"

When enabled, instead of emitting a single waitcnt, the pass generates a
sequence that waits for each outstanding operation individually. For
example, if there are 5 outstanding memory operations and the target is
to wait until 2 remain:


**Original**: 
s_waitcnt vmcnt(2)

**Expanded**:  
s_waitcnt vmcnt(4)
s_waitcnt vmcnt(3)
s_waitcnt vmcnt(2)

The expansion starts from (Outstanding - 1) down to the target value,
since waitcnt(Outstanding) would be a no-op (the counter is already at
that value).

- Uses ScoreBrackets to determine the actual number of outstanding
operations
- Only expands when operations complete in-order
- Skips expansion for mixed event types (e.g., LDS+SMEM on same counter)
- Skips expansion for scalar memory (always out-of-order)
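
The counting rule above can be sketched in a few lines. This is an illustration of the sequence only, assuming in-order completion and a single event type; `expand_waitcnt` is a hypothetical helper, and the fallback to a single instruction when nothing needs expanding is an assumption rather than the pass's documented behaviour.

```python
from typing import List


def expand_waitcnt(outstanding: int, target: int,
                   counter: str = "vmcnt") -> List[str]:
    """Return the expanded waitcnt sequence for one in-order counter."""
    if outstanding <= target:
        # Nothing to step through; assumed fallback to the single wait.
        return [f"s_waitcnt {counter}({target})"]
    # Start at Outstanding - 1 because waiting on Outstanding is a no-op.
    return [f"s_waitcnt {counter}({n})"
            for n in range(outstanding - 1, target - 1, -1)]
```

`expand_waitcnt(5, 2)` reproduces the three-instruction sequence shown in the example: `vmcnt(4)`, `vmcnt(3)`, `vmcnt(2)`.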

Related previous work for reference:
- **PR**: llvm/llvm-project#79236 (related `-amdgpu-waitcnt-forcezero`)

---------

Co-authored-by: Pierre van Houtryve <pierre.vanhoutryve@amd.com>
2026-01-12 17:35:06 +05:30
Jay Foad
475f022cb7
[AMDGPU] Add support for GFX12 expert scheduling mode 2 (#170319) 2026-01-09 15:49:10 +00:00
Sameer Sahasrabuddhe
130fa98a29
[AMDGPU][NFC] dump Waitcnt using an ostream operator (#171251) 2025-12-10 20:45:49 +05:30
anjenner
740a3ad1f7
AMDGPU: Add codegen for atomicrmw operations usub_cond and usub_sat (#141068)
Split off from https://github.com/llvm/llvm-project/pull/105553 as per
discussion there.
2025-12-05 12:37:33 +00:00
Stanislav Mekhanoshin
31ec45a3d9
[AMDGPU] Fix VGPR lowering for V_DUAL_FMAMK_F32 (#170567)
Fixes: https://github.com/llvm/llvm-project/issues/170552
2025-12-03 15:12:26 -08:00
Jay Foad
d748c81218
[AMDGPU] Change the immediate operand of s_waitcnt_depctr / s_wait_alu (#169378)
The 16-bit immediate operand of s_waitcnt_depctr / s_wait_alu has some
unused bits. Previously codegen would set these bits to 1, but setting
them to 0 matches the SP3 assembler behaviour better, which in turn
means that we can print them using the human readable SP3 syntax:

s_wait_alu 0xfffd ; unused bits set to 1
s_wait_alu 0xff9d ; unused bits set to 0
s_wait_alu depctr_va_vcc(0) ; unused bits set to 0, human readable

Note that the set of unused bits changed between GFX10.1 and GFX10.3.
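
The relationship between the two hexadecimal immediates above can be checked with a small sketch. The mask `0x0060` is inferred purely from the difference between `0xfffd` and `0xff9d` in this example; it is an assumption, and, as noted, the real set of unused bits depends on the target generation.

```python
# Hypothetical mask of unused bits, inferred from the example above
# (0xfffd ^ 0xff9d). The real set differs between GFX10.1 and GFX10.3.
UNUSED_BITS = 0x0060


def clear_unused(imm: int) -> int:
    """New behaviour: unused bits of the 16-bit immediate are set to 0."""
    return imm & ~UNUSED_BITS & 0xFFFF


def set_unused(imm: int) -> int:
    """Previous behaviour: unused bits of the 16-bit immediate are set to 1."""
    return (imm | UNUSED_BITS) & 0xFFFF
```

Here `clear_unused(0xfffd)` gives `0xff9d` and `set_unused(0xff9d)` gives `0xfffd`, matching the first two lines of the example.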
2025-11-25 11:55:26 +00:00
Changpeng Fang
5f38ae4a77
[AMDGPU] update LDS block size for gfx1250 (#167614)
LDS block size should be 2048 bytes (512 dwords) based on current spec.
2025-11-17 16:03:47 -08:00
Shilei Tian
72a6ae6844
[AMDGPU] Fix wrong MSB encoding for V_FMAMK instructions (#168107)
These instructions use `src0`, `imm`, `src1` as operands.

Fixes SWDEV-566579.
2025-11-14 22:50:17 +00:00
Craig Topper
8eb28ca83d
[AMDGPU] Remove implicit conversions of MCRegister to unsigned. NFC (#167284)
Use MCRegister instead of MCPhysReg or use MCRegister::id().
2025-11-11 08:54:27 -08:00
Ivan Kosarev
20f41ed8c1
[AMDGPU][MC] Avoid creating lit64() operands unless asked or needed. (#161191)
There should normally be no need to generate implicit lit64()
modifiers on the assembler side. It's the encoder's responsibility
to recognise literals that are implicitly 64 bits wide.

The exceptions are where we rewrite floating-point operand values
as integer ones, which would not be assembled back to the original
values unless wrapped into lit64().

Respect explicit lit() modifiers for non-inline values as
necessary to avoid regressions in MC tests. This change still
doesn't prevent use of inline constants where lit()/lit64 is
specified; subject to a separate patch.

On disassembling, only create lit64() operands where necessary for
correct round-tripping.

Add round-tripping tests where useful and feasible.
2025-10-08 10:51:55 +01:00
Matt Arsenault
1a5494ca4a
AMDGPU: Use RegClassByHwMode to manage operand VGPR operand constraints (#158272)
This removes special-case processing in TargetInstrInfo::getRegClass to
fix up register operands that, depending on the subtarget, support AGPRs
or require even-aligned registers.

This regresses assembler diagnostics, which currently work by hackily
accepting invalid cases and then post-rejecting a validly parsed
instruction.
On the plus side this now emits a comment when disassembling unaligned
registers for targets with the alignment requirement.
2025-10-08 11:19:54 +09:00
Matt Arsenault
cb53a2de37
AMDGPU: Account for read/write register intrinsics for AGPR usage (#161988)
Fix the special case intrinsics that can directly reference a physical
register. There's no reason to use this.
2025-10-08 02:09:22 +00:00
Diana Picus
ebbc0e97b9
[AMDGPU] Remove subtarget features for dynamic VGPRs (#160822)
Users of the backend are expected to enable dynamic VGPRs via the
`amdgpu-dynamic-vgpr-block-size` attribute instead of the subtarget
features (see https://github.com/llvm/llvm-project/pull/133444).
2025-10-06 09:50:11 +02:00
Stanislav Mekhanoshin
f693a7f2c2
[AMDGPU] Fix high vgpr printing with true16 (#160209) 2025-09-23 09:51:21 -07:00
Ivan Kosarev
7ba7021951
[AMDGPU][MC] Keep MCOperands unencoded. (#158685)
We have proper encoding facilities to encode operands and instructions;
there's no need to pollute the MC representation with encoding details.

Supposed to be an NFCI, but happens to fix some re-encoded instruction
codes in disassembler tests.

The 64-bit operands are to be addressed in following patches introducing
MC-level representation for lit() and lit64() modifiers, to then be
respected by both the assembler and disassembler.
2025-09-16 09:01:01 +01:00
Shilei Tian
1180c2ced0
[AMDGPU] Support lowering of cluster related intrinsics (#157978)
Since much of the code is interconnected, this also changes how the workgroup ID is lowered.

Co-authored-by: Jay Foad <jay.foad@amd.com>
Co-authored-by: Ivan Kosarev <ivan.kosarev@amd.com>
2025-09-12 21:11:17 -04:00
Stanislav Mekhanoshin
d267fac3bc
[AMDGPU] Use subtarget call to determine number of VGPRs (#157927)
Since the register file was enlarged, it is no longer valid to
call VGPR_32RegClass.getNumRegs() to get the total number of arch
registers available on a subtarget.

Fixes: SWDEV-550425
2025-09-11 00:39:56 -07:00
Stanislav Mekhanoshin
5dfb9649cb
[AMDGPU] Prevent VOPD combining of VGPRs with different MSBs (#157168) 2025-09-05 13:34:53 -07:00
Stanislav Mekhanoshin
1f0f3473e6
[AMDGPU] High VGPR lowering on gfx1250 (#156965) 2025-09-04 16:20:47 -07:00
Aleksandar Spasojevic
1b47135c9d
[AMDGPU] Ensure positive InstOffset for buffer operations (#145504)
GFX12+ buffer ops require a non-negative InstOffset per the AMD hardware spec.
The assembler and disassembler now reject negative buffer offsets.
2025-09-04 15:37:46 +02:00
Stanislav Mekhanoshin
6aebbb0a85
[AMDGPU] Define 1024 VGPRs on gfx1250 (#156765)
This is baseline support; it is not usable yet.
2025-09-03 16:25:18 -07:00
Matt Arsenault
d7484684e5
AMDGPU: Refactor isImmOperandLegal (#155607)
The goal is to expose more variants that can operate without
preconstructed MachineInstrs or MachineOperands.
2025-09-03 09:06:18 +09:00
Stanislav Mekhanoshin
e2901f1610
[AMDGPU] Adjust VGPR allocation encoding on gfx1250 (#156546) 2025-09-02 15:53:17 -07:00
Jay Foad
cf5243619a
[AMDGPU] Common up two local memory size calculations. NFCI. (#154784) 2025-08-22 08:44:11 +01:00
Pierre van Houtryve
6f7c77fe90
[AMDGPU] Check noalias.addrspace in mayAccessScratchThroughFlat (#151319)
PR #149247 made the MD accessible by the backend so we can now leverage
it in the memory model. The first use case here is detecting if a flat op
can access scratch memory.
Benefits both the MemoryLegalizer and InsertWaitCnt.
2025-08-19 07:42:59 +02:00
Stanislav Mekhanoshin
a26c3e9491
[AMDGPU] User SGPR count increased to 32 on gfx1250 (#154205) 2025-08-18 15:04:56 -07:00
Stanislav Mekhanoshin
57c1e01e48
[AMDGPU] Don't allow wgp mode on gfx1250 (#153680)
- gfx1250 only supports cu mode
2025-08-14 15:16:56 -07:00
Stanislav Mekhanoshin
49f2093477
[AMDGPU] Increase LDS to 320K on gfx1250 (#153645) 2025-08-14 12:52:00 -07:00
Stanislav Mekhanoshin
ea14834966
[AMDGPU] Per-subtarget DPP instruction classification (#153096)
This is NFCI at this point.
2025-08-11 15:41:02 -07:00
Stanislav Mekhanoshin
dddeb07c2e
[AMDGPU] Restrict packed math FP32 instructions to read only one SGPR per operand on gfx12+ (#152465)
Sec. 4.6.7.1 of the gfx1250 SPG states that if an SGPR is used
as an operand, only one SGPR will be read for both the low and high
operations. As a result, the corresponding bits in `op_sel` and
`op_sel_hi` must be the same when the operand is an SGPR.
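
The constraint can be expressed as a simple per-operand check. This is a sketch under the assumption that bit `i` of `op_sel` and `op_sel_hi` corresponds to source operand `i`; the function name and the way operands are described are hypothetical and do not reflect the backend's actual verifier.

```python
from typing import List


def op_sel_consistent(is_sgpr: List[bool], op_sel: int, op_sel_hi: int) -> bool:
    """Check that op_sel and op_sel_hi agree for every SGPR source operand."""
    for index, sgpr in enumerate(is_sgpr):
        lo_bit = (op_sel >> index) & 1
        hi_bit = (op_sel_hi >> index) & 1
        # Only one SGPR is read for both halves, so the selectors must match.
        if sgpr and lo_bit != hi_bit:
            return False
    return True
```

For example, with an SGPR in operand 0 and VGPRs elsewhere, `op_sel=0b000` with `op_sel_hi=0b001` is rejected, while `op_sel_hi=0b110` is accepted because only the VGPR operands differ.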

Co-authored-by: Tian, Shilei <Shilei.Tian@amd.com>
2025-08-07 16:13:34 -07:00
Stanislav Mekhanoshin
d08c2977e8
[AMDGPU] Add MC support for new gfx1250 src_flat_scratch_base_lo/hi (#152203) 2025-08-05 14:35:48 -07:00
Matt Arsenault
1d7a0fa08a
AMDGPU: Move asm constraint physreg parsing to utils (#150903)
Also fixes an assertion on out of bound physical register
indexes.
2025-08-01 16:11:11 +09:00
Stanislav Mekhanoshin
ce40863209
[AMDGPU] Add v_cvt_sr|pk_bf8|fp8_f16 gfx1250 instructions (#151415) 2025-07-30 17:24:45 -07:00
Changpeng Fang
67e2faa50c
[AMDGPU] MC support for async load and store on gfx1250 (#151030) 2025-07-28 13:45:37 -07:00
Stanislav Mekhanoshin
a0b854d576
[AMDGPU] MC support for gfx1250 scale_offset modifier (#149881) 2025-07-21 15:04:59 -07:00
Changpeng Fang
d6094370cb
AMDGPU: Support v_wmma_f32_16x16x128_f8f6f4 on gfx1250 (#149684)
Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
2025-07-21 10:09:42 -07:00
Changpeng Fang
b52cf756ce
AMDGPU: Treat WMMA XDL ops as TRANS in S_DELAY_ALU insertion for gfx1250 (#149208)
WMMA XDL instructions are tracked as TRANS ops and the compiler should
consider them the same as TRANS in S_DELAY_ALU insertion. We use a searchable
table for the InsertDelayAlu pass to recognize these WMMA XDL instructions.

Co-authored-by: Stefan Stipanovic <Stefan.Stipanovic@amd.com>
2025-07-16 17:07:48 -07:00
Stanislav Mekhanoshin
5277021c3c
[AMDGPU] Add gfx1250 v_fmac_f64 implementation (#148725) 2025-07-14 15:39:04 -07:00
Stanislav Mekhanoshin
7920dff394
[AMDGPU] VOPD/VOPD3 changes for gfx1250 (#147602) 2025-07-10 14:15:01 -07:00
Stanislav Mekhanoshin
00a85e5704
[AMDGPU] gfx1250: MC support for 64-bit literals (#147861) 2025-07-09 22:25:47 -07:00