415 Commits

Author SHA1 Message Date
Ivan Kosarev
7ba7021951
[AMDGPU][MC] Keep MCOperands unencoded. (#158685)
We have proper encoding facilities to encode operands and instructions;
there's no need to pollute the MC representation with encoding details.

Supposed to be an NFCI, but happens to fix some re-encoded instruction
codes in disassembler tests.

The 64-bit operands are to be addressed in following patches introducing
MC-level representation for lit() and lit64() modifiers, to then be
respected by both the assembler and disassembler.
2025-09-16 09:01:01 +01:00
Shilei Tian
1180c2ced0
[AMDGPU] Support lowering of cluster related intrinsics (#157978)
Since much of the code is connected, this also changes how workgroup id is lowered.

Co-authored-by: Jay Foad <jay.foad@amd.com>
Co-authored-by: Ivan Kosarev <ivan.kosarev@amd.com>
2025-09-12 21:11:17 -04:00
Stanislav Mekhanoshin
d267fac3bc
[AMDGPU] Use subtarget call to determine number of VGPRs (#157927)
Since the register file was increased, it is no longer valid to
call VGPR_32RegClass.getNumRegs() to get the total number of arch
registers available on a subtarget.

Fixes: SWDEV-550425
2025-09-11 00:39:56 -07:00
Stanislav Mekhanoshin
5dfb9649cb
[AMDGPU] Prevent VOPD combining of VGPRs with different MSBs (#157168) 2025-09-05 13:34:53 -07:00
Stanislav Mekhanoshin
1f0f3473e6
[AMDGPU] High VGPR lowering on gfx1250 (#156965) 2025-09-04 16:20:47 -07:00
Aleksandar Spasojevic
1b47135c9d
[AMDGPU] Ensure positive InstOffset for buffer operations (#145504)
GFX12+ buffer ops require positive InstOffset per AMD hardware spec.
Modified assembler/disassembler to reject negative buffer offsets.
2025-09-04 15:37:46 +02:00
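The rule described in the entry above can be illustrated with a minimal standalone sketch. `isLegalBufferInstOffset` and its `OffsetBits` parameter are hypothetical names, not the LLVM API, and treating the field as a signed immediate whose width is supplied by the caller is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not the LLVM API): returns true if Offset is
// non-negative and fits in a signed immediate field of OffsetBits bits.
// The field width is an input here, not a claim about any real encoding.
constexpr bool isLegalBufferInstOffset(int64_t Offset, unsigned OffsetBits) {
  // Largest value representable in a signed field of OffsetBits bits.
  // Callers are expected to pass 1 <= OffsetBits <= 63.
  return Offset >= 0 && Offset <= (int64_t{1} << (OffsetBits - 1)) - 1;
}
```

An assembler-style check would reject an instruction when this predicate is false, instead of silently encoding the negative value.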
Stanislav Mekhanoshin
6aebbb0a85
[AMDGPU] Define 1024 VGPRs on gfx1250 (#156765)
This is baseline support; it is not usable yet.
2025-09-03 16:25:18 -07:00
Matt Arsenault
d7484684e5
AMDGPU: Refactor isImmOperandLegal (#155607)
The goal is to expose more variants that can operate without
preconstructed MachineInstrs or MachineOperands.
2025-09-03 09:06:18 +09:00
Stanislav Mekhanoshin
e2901f1610
[AMDGPU] Adjust VGPR allocation encoding on gfx1250 (#156546) 2025-09-02 15:53:17 -07:00
Jay Foad
cf5243619a
[AMDGPU] Common up two local memory size calculations. NFCI. (#154784) 2025-08-22 08:44:11 +01:00
Pierre van Houtryve
6f7c77fe90
[AMDGPU] Check noalias.addrspace in mayAccessScratchThroughFlat (#151319)
PR #149247 made the MD accessible by the backend so we can now leverage
it in the memory model. The first use case here is detecting if a flat op
can access scratch memory.
Benefits both the MemoryLegalizer and InsertWaitCnt.
2025-08-19 07:42:59 +02:00
Stanislav Mekhanoshin
a26c3e9491
[AMDGPU] User SGPR count increased to 32 on gfx1250 (#154205) 2025-08-18 15:04:56 -07:00
Stanislav Mekhanoshin
57c1e01e48
[AMDGPU] Don't allow wgp mode on gfx1250 (#153680)
- gfx1250 only supports cu mode
2025-08-14 15:16:56 -07:00
Stanislav Mekhanoshin
49f2093477
[AMDGPU] Increase LDS to 320K on gfx1250 (#153645) 2025-08-14 12:52:00 -07:00
Stanislav Mekhanoshin
ea14834966
[AMDGPU] Per-subtarget DPP instruction classification (#153096)
This is NFCI at this point.
2025-08-11 15:41:02 -07:00
Stanislav Mekhanoshin
dddeb07c2e
[AMDGPU] Restrict packed math FP32 instructions to read only one SGPR per operand on gfx12+ (#152465)
Sec. 4.6.7.1 of the gfx1250 SPG states that if an SGPR is used
as an operand, only one SGPR will be read for both the low and high
operations. As a result, the corresponding bits in `op_sel` and
`op_sel_hi` must be the same when the operand is an SGPR.

Co-authored-by: Tian, Shilei <Shilei.Tian@amd.com>
2025-08-07 16:13:34 -07:00
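The `op_sel` / `op_sel_hi` constraint from the entry above could be checked roughly as follows. This is a sketch only: `isLegalPackedFP32Operand` is a hypothetical name, and modelling the modifiers as plain bit masks indexed by source operand is an assumption, not the backend's actual representation.

```cpp
#include <cassert>

// Hypothetical check (not the LLVM API). For source operand SrcIdx, an SGPR
// operand is accepted only if bit SrcIdx of op_sel equals bit SrcIdx of
// op_sel_hi, since a single SGPR feeds both the low and high operations.
// Non-SGPR operands are not restricted by this rule.
constexpr bool isLegalPackedFP32Operand(bool IsSGPR, unsigned OpSel,
                                        unsigned OpSelHi, unsigned SrcIdx) {
  return !IsSGPR ||
         ((OpSel >> SrcIdx) & 1u) == ((OpSelHi >> SrcIdx) & 1u);
}
```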
Stanislav Mekhanoshin
d08c2977e8
[AMDGPU] Add MC support for new gfx1250 src_flat_scratch_base_lo/hi (#152203) 2025-08-05 14:35:48 -07:00
Matt Arsenault
1d7a0fa08a
AMDGPU: Move asm constraint physreg parsing to utils (#150903)
Also fixes an assertion on out of bound physical register
indexes.
2025-08-01 16:11:11 +09:00
Stanislav Mekhanoshin
ce40863209
[AMDGPU] Add v_cvt_sr|pk_bf8|fp8_f16 gfx1250 instructions (#151415) 2025-07-30 17:24:45 -07:00
Changpeng Fang
67e2faa50c
[AMDGPU] MC support for async load and store on gfx1250 (#151030) 2025-07-28 13:45:37 -07:00
Stanislav Mekhanoshin
a0b854d576
[AMDGPU] MC support for gfx1250 scale_offset modifier (#149881) 2025-07-21 15:04:59 -07:00
Changpeng Fang
d6094370cb
AMDGPU: Support v_wmma_f32_16x16x128_f8f6f4 on gfx1250 (#149684)
Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
2025-07-21 10:09:42 -07:00
Changpeng Fang
b52cf756ce
AMDGPU: Treat WMMA XDL ops as TRANS in S_DELAY_ALU insertion for gfx1250 (#149208)
WMMA XDL instructions are tracked as TRANS ops and the compiler should
consider them the same as TRANS in S_DELAY_ALU insertion. We use a searchable
table for the InsertDelayAlu pass to recognize these WMMA XDL instructions.

Co-authored-by: Stefan Stipanovic <Stefan.Stipanovic@amd.com>
2025-07-16 17:07:48 -07:00
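A searchable table of the kind mentioned in the entry above can be sketched as a sorted array with a binary search. The opcode numbers, the `DelayType` enum, and both function names are hypothetical placeholders for illustration; the real table is generated by TableGen and is not reproduced here.

```cpp
#include <algorithm>
#include <array>
#include <cassert>

// Hypothetical opcode numbers, kept sorted so binary search is valid.
constexpr std::array<unsigned, 4> WMMAXDLOpcodes = {101, 205, 310, 422};

// Hypothetical classification used by a delay-insertion pass.
enum class DelayType { VALU, TRANS, SALU };

// Returns true if Opcode appears in the sorted table.
inline bool isWMMAXDL(unsigned Opcode) {
  return std::binary_search(WMMAXDLOpcodes.begin(), WMMAXDLOpcodes.end(),
                            Opcode);
}

// Opcodes found in the table are treated as TRANS; anything else keeps the
// classification the caller already computed.
inline DelayType classifyForDelayAlu(unsigned Opcode, DelayType Default) {
  return isWMMAXDL(Opcode) ? DelayType::TRANS : Default;
}
```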
Stanislav Mekhanoshin
5277021c3c
[AMDGPU] Add gfx1250 v_fmac_f64 implementation (#148725) 2025-07-14 15:39:04 -07:00
Stanislav Mekhanoshin
7920dff394
[AMDGPU] VOPD/VOPD3 changes for gfx1250 (#147602) 2025-07-10 14:15:01 -07:00
Stanislav Mekhanoshin
00a85e5704
[AMDGPU] gfx1250: MC support for 64-bit literals (#147861) 2025-07-09 22:25:47 -07:00
Changpeng Fang
eda3161c35
AMDGPU: Implement tensor load and store instructions for gfx1250 (#146636)
Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
2025-07-03 13:49:34 -07:00
Christudasan Devadasan
08b8d467d4
[AMDGPU][GFX1250] Insert S_WAIT_XCNT for SMEM and VMEM load-stores (#145566)
This patch tracks the register operands of both VMEM (FLAT, MUBUF,
MTBUF) and SMEM load-store operations and inserts a S_WAIT_XCNT
instruction with sufficient wait-count before potentially redefining
them. For VMEM instructions, XNACK is returned in the same order as
they were issued and hence non-zero counter values can be inserted.
However, SMEM execution is out-of-order and so is their XNACK reception.
Thus, only zero counter value can be inserted to capture SMEM dependencies.
2025-06-25 10:40:36 +05:30
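The difference between the in-order and out-of-order cases in the entry above can be shown with a deliberately simplified model. `xcntWaitFor` is a hypothetical function, and the model assumes a single stream of tracked operations identified by issue index; it is not the logic of the actual wait-count insertion pass.

```cpp
#include <cassert>

// Simplified, hypothetical model. NumIssued operations have been issued and
// we depend on the one with 0-based issue index IssueIdx.
//  - In-order (VMEM) returns: the operation is done once at most
//    NumIssued - 1 - IssueIdx younger operations remain outstanding, so that
//    value can be used as the wait count.
//  - Out-of-order (SMEM) returns: nothing can be inferred from a non-zero
//    count, so the only safe wait count is 0.
inline unsigned xcntWaitFor(bool IsSMEM, unsigned NumIssued,
                            unsigned IssueIdx) {
  assert(IssueIdx < NumIssued && "dependency must be an issued operation");
  return IsSMEM ? 0u : NumIssued - 1 - IssueIdx;
}
```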
Diana Picus
a201f8872a
[AMDGPU] Replace dynamic VGPR feature with attribute (#133444)
Use a function attribute (amdgpu-dynamic-vgpr) instead of a subtarget
feature, as requested in #130030.
2025-06-24 11:09:36 +02:00
Stanislav Mekhanoshin
fa0b84f23c
[AMDGPU] Rename call instructions from b64 to i64 (#145103)
These get renamed in gfx1250 and on from B64 to I64:

  S_CALL_I64
  S_GET_PC_I64
  S_RFE_I64
  S_SET_PC_I64
  S_SWAP_PC_I64
2025-06-21 21:42:09 -07:00
Diana Picus
40a7dce9ef
[AMDGPU] Remove duplicated/confusing helpers. NFCI (#142598)
Move canGuaranteeTCO and mayTailCallThisCC into AMDGPUBaseInfo instead
of keeping two copies for DAG/Global ISel.

Also remove isKernelCC, which doesn't agree with isKernel and doesn't
seem very useful.

While at it, also move all the CC-related helpers into AMDGPUBaseInfo.h and
mark them constexpr.
2025-06-05 11:19:20 +02:00
Kazu Hirata
70ef89b913
[AMDGPU] Use std::optional::value_or (NFC) (#140006) 2025-05-14 23:50:13 -07:00
Lucas Ramirez
6456ee056f
Reapply "[AMDGPU][Scheduler] Refactor ArchVGPR rematerialization during scheduling (#125885)" (#139548)
This reapplies 067caaa and 382a085 (reverting b35f6e2) with fixes to
issues detected by the address sanitizer (MIs have to be removed from
live intervals before being removed from their parent MBB).

Original commit description below.

AMDGPU scheduler's `PreRARematStage` attempts to increase function
occupancy w.r.t. ArchVGPR usage by rematerializing trivial
ArchVGPR-defining instructions next to their single use. It first
collects all eligible trivially rematerializable instructions in the
function, then sinks them one-by-one while recomputing occupancy in all
affected regions each time to determine if and when it has managed to
increase overall occupancy. If it does, changes are committed to the
scheduler's state; otherwise modifications to the IR are reverted and
the scheduling stage gives up.

In both cases, this scheduling stage currently involves repeated queries
for up-to-date occupancy estimates and some state copying to enable
reversal of sinking decisions when occupancy is revealed not to
increase. The current implementation also does not accurately track
register pressure changes in all regions affected by sinking decisions.

This commit refactors this scheduling stage, improving RP tracking and
splitting the stage into two distinct steps to avoid repeated occupancy
queries and IR/state rollbacks.

- Analysis and collection (`canIncreaseOccupancyOrReduceSpill`). The
number of ArchVGPRs to save to reduce spilling or increase function
occupancy by 1 (when there is no spilling) is computed. Then,
instructions eligible for rematerialization are collected, stopping as
soon as enough have been identified to be able to achieve our goal
(according to slightly optimistic heuristics). If there aren't enough of
such instructions, the scheduling stage stops here.
- Rematerialization (`rematerialize`). Instructions collected in the
first step are rematerialized one-by-one. Now we are able to directly
update the scheduler's state since we have already done the occupancy
analysis and know we won't have to roll back any state. Register
pressures for impacted regions are recomputed only once, as opposed to
at every sinking decision.

In the case where the stage attempted to increase occupancy, and if both
rematerializations alone and rescheduling after were unable to improve
occupancy, then all rematerializations are rolled back.
2025-05-13 11:11:00 +02:00
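The collection step described in the entry above (gather candidates until the savings target is met, otherwise give up before touching any state) can be sketched as below. `Candidate`, `SavedVGPRs`, and `collectRematCandidates` are hypothetical names, and counting savings as a simple per-candidate sum is an assumption; the real heuristics are more involved.

```cpp
#include <cassert>
#include <vector>

// Hypothetical candidate: an instruction whose rematerialization is
// estimated to free SavedVGPRs registers.
struct Candidate {
  unsigned SavedVGPRs;
};

// Step 1 of the two-step scheme: walk the eligible instructions in order and
// stop as soon as the estimated savings reach Target. Returns false (and
// leaves Picked empty) when the target cannot be reached, in which case the
// stage would stop without having modified anything.
inline bool collectRematCandidates(const std::vector<Candidate> &Eligible,
                                   unsigned Target,
                                   std::vector<Candidate> &Picked) {
  Picked.clear();
  unsigned Saved = 0;
  for (const Candidate &C : Eligible) {
    if (Saved >= Target)
      break; // Enough candidates identified; no need to look further.
    Picked.push_back(C);
    Saved += C.SavedVGPRs;
  }
  if (Saved < Target) {
    Picked.clear();
    return false;
  }
  return true;
}
```

Because the decision is made before any instruction moves, the second step can commit changes directly and recompute register pressure once per affected region.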
Vitaly Buka
b35f6e26a5
Revert "[AMDGPU][Scheduler] Refactor ArchVGPR rematerialization during scheduling (#125885)" (#139341)
And related "[AMDGPU] Regenerate mfma-loop.ll test"

#125885 introduced a memory error detected by ASan.

This reverts commit 382a085a95b0abeac77b150b7b644b372bd08e78.
This reverts commit 067caaafb58a156d0d77229422607782a639f5b5.
2025-05-09 17:51:46 -07:00
Ivan Kosarev
66d3980b53
[AMDGPU][NFC] Remove _DEFERRED operands. (#139123)
All immediates are deferred now.
2025-05-09 10:10:53 +01:00
Ivan Kosarev
c290f48a45
[AMDGPU][NFC] Remove unused operand types. (#139062) 2025-05-08 12:48:25 +01:00
Lucas Ramirez
067caaafb5
[AMDGPU][Scheduler] Refactor ArchVGPR rematerialization during scheduling (#125885)
AMDGPU scheduler's `PreRARematStage` attempts to increase function
occupancy w.r.t. ArchVGPR usage by rematerializing trivial
ArchVGPR-defining instructions next to their single use. It first
collects all eligible trivially rematerializable instructions in the
function, then sinks them one-by-one while recomputing occupancy in all
affected regions each time to determine if and when it has managed to
increase overall occupancy. If it does, changes are committed to the
scheduler's state; otherwise modifications to the IR are reverted and
the scheduling stage gives up.

In both cases, this scheduling stage currently involves repeated queries
for up-to-date occupancy estimates and some state copying to enable
reversal of sinking decisions when occupancy is revealed not to
increase. The current implementation also does not accurately track
register pressure changes in all regions affected by sinking decisions.

This commit refactors this scheduling stage, improving RP tracking and
splitting the stage into two distinct steps to avoid repeated occupancy
queries and IR/state rollbacks.

- Analysis and collection (`canIncreaseOccupancyOrReduceSpill`). The
number of ArchVGPRs to save to reduce spilling or increase function
occupancy by 1 (when there is no spilling) is computed. Then,
instructions eligible for rematerialization are collected, stopping as
soon as enough have been identified to be able to achieve our goal
(according to slightly optimistic heuristics). If there aren't enough of
such instructions, the scheduling stage stops here.
- Rematerialization (`rematerialize`). Instructions collected in the
first step are rematerialized one-by-one. Now we are able to directly
update the scheduler's state since we have already done the occupancy
analysis and know we won't have to roll back any state. Register
pressures for impacted regions are recomputed only once, as opposed to
at every sinking decision.

In the case where the stage attempted to increase occupancy, and if both
rematerializations alone and rescheduling after were unable to improve
occupancy, then all rematerializations are rolled back.
2025-05-08 12:51:06 +02:00
Brox Chen
cbe8f3ad76
[AMDGPU][True16][MC] fix fmac_f16_t16 vop3 format (#135464)
Add fmac_f16_t16_e64 to the isfmac check to fix the vop3 format of the
fmac_f16_t16 instruction.
2025-04-13 18:10:31 -04:00
Shilei Tian
3e742b517a
[NFC][AMDGPU] clang-format AMDGPUBaseInfo.[h,cpp] (#133559) 2025-03-29 10:28:34 -04:00
Shilei Tian
02b45f4b81
[AMDGPU] Add a new function getIntegerPairAttribute (#133271)
The new function will return `std::nullopt` when any error occurs.
2025-03-27 15:38:54 -04:00
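The "return `std::nullopt` on any error" pattern from the entry above might look like the following standalone sketch. `parseUnsigned` and `parseIntegerPair` are hypothetical helpers that parse a `"first,second"` string; they are not the signature or behavior of the LLVM function, which operates on function attributes. The sketch assumes C++17.

```cpp
#include <cassert>
#include <charconv>
#include <cstddef>
#include <optional>
#include <string_view>
#include <system_error>
#include <utility>

// Hypothetical helper: parses all of S as an unsigned decimal integer.
inline std::optional<unsigned> parseUnsigned(std::string_view S) {
  unsigned Value = 0;
  const char *End = S.data() + S.size();
  auto [Ptr, Ec] = std::from_chars(S.data(), End, Value);
  // Reject parse errors, overflow, empty input, and trailing characters.
  if (Ec != std::errc() || Ptr != End)
    return std::nullopt;
  return Value;
}

// Hypothetical helper: parses "first,second". Any malformed input yields
// std::nullopt, so the caller handles every failure in one place.
inline std::optional<std::pair<unsigned, unsigned>>
parseIntegerPair(std::string_view Attr) {
  const std::size_t Comma = Attr.find(',');
  if (Comma == std::string_view::npos)
    return std::nullopt;
  const auto First = parseUnsigned(Attr.substr(0, Comma));
  const auto Second = parseUnsigned(Attr.substr(Comma + 1));
  if (!First || !Second)
    return std::nullopt;
  return std::make_pair(*First, *Second);
}
```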
Shilei Tian
f1ac2afe21
Reapply "[AMDGPU] Use COV6 by default (#118515)" (#130963)
This reverts commit 68bcba6d7a1cc18996c0bcb7c62267c62d2040d0.
2025-03-21 15:26:45 -04:00
Diana Picus
1f84495255
[AMDGPU] Update target helpers & GCNSchedStrategy for dynamic VGPRs (#130047)
In dynamic VGPR mode, we can allocate up to 8 blocks of either 16 or 32
VGPRs (based on a chip-wide setting which we can model with a Subtarget
feature). Update some of the subtarget helpers to reflect this.

In particular:
- getVGPRAllocGranule is set to the block size
- getAddressableNumVGPR will limit itself to 8 * size of a block

We also try to be more careful about how many VGPR blocks we allocate.
Therefore, when deciding if we should revert scheduling after a given
stage, we check that we haven't increased the number of VGPR blocks that
need to be allocated.

---------

Co-authored-by: Jannik Silvanus <jannik.silvanus@amd.com>
2025-03-19 10:29:38 +01:00
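The arithmetic in the entry above can be written out as a small sketch. The function names are hypothetical stand-ins for the subtarget helpers, and the block-count comparison shows only the one condition mentioned in the message, not the scheduler's full revert logic.

```cpp
#include <cassert>

// Up to 8 blocks can be allocated in dynamic VGPR mode (from the entry above).
constexpr unsigned MaxDynamicVGPRBlocks = 8;

// Hypothetical helpers: BlockSize is the chip-wide block size, 16 or 32.
constexpr unsigned allocGranule(unsigned BlockSize) { return BlockSize; }

constexpr unsigned addressableNumVGPRs(unsigned BlockSize) {
  return MaxDynamicVGPRBlocks * BlockSize;
}

// Number of blocks needed to hold NumVGPRs registers, rounding up.
constexpr unsigned numVGPRBlocks(unsigned NumVGPRs, unsigned BlockSize) {
  return (NumVGPRs + BlockSize - 1) / BlockSize;
}

// True if a scheduling stage raised the number of blocks to allocate, which
// is one reason to consider reverting that stage.
constexpr bool increasesVGPRBlocks(unsigned Before, unsigned After,
                                   unsigned BlockSize) {
  return numVGPRBlocks(After, BlockSize) > numVGPRBlocks(Before, BlockSize);
}
```

With a block size of 16, going from 30 to 32 VGPRs stays within two blocks, while going from 32 to 33 needs a third block.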
Alex Voicu
c1fabd681f
[llvm][AMDGPU] Enable FWD_PROGRESS bit for GFX10+ (#128367)
From GFX10 onwards it is possible to employ benevolent scheduling of
waves. This patch unconditionally enables, for the `amdhsa` OS, the bit
which controls that capability, as it is beneficial for algorithms that
rely on more complex concurrent coordination and it is generally
performance neutral otherwise.
2025-03-17 23:17:46 +00:00
Shilei Tian
dccc0a836c
[NFC][AMDGPU] Replace more direct arch comparison with isAMDGCN() (#131379)
This is an extension of #131357. Hopefully this would be the last one.
2025-03-14 17:02:15 -04:00
Ana Mihajlovic
65ade6d2eb
[AMDGPU] Merge consecutive wait_alu instruction (#128916) 2025-03-12 10:27:27 +01:00
Jay Foad
44607666b3
[AMDGPU] Simplify conditional expressions. NFC. (#129228)
Simplify `cond ? val : false` to `cond && val` and similar.
2025-03-03 10:40:49 +00:00
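The equivalence behind the simplification in the entry above can be verified for boolean operands with a short sketch. `beforeSimplify` and `afterSimplify` are hypothetical names used only for this comparison.

```cpp
#include <cassert>

// The original form: a conditional expression with a constant false arm.
constexpr bool beforeSimplify(bool Cond, bool Val) {
  return Cond ? Val : false;
}

// The simplified form. Short-circuit evaluation means Val is still only
// evaluated when Cond is true, matching the conditional expression.
constexpr bool afterSimplify(bool Cond, bool Val) { return Cond && Val; }

// Exhaustive compile-time check over all four input combinations.
static_assert(beforeSimplify(false, false) == afterSimplify(false, false),
              "forms must agree");
static_assert(beforeSimplify(false, true) == afterSimplify(false, true),
              "forms must agree");
static_assert(beforeSimplify(true, false) == afterSimplify(true, false),
              "forms must agree");
static_assert(beforeSimplify(true, true) == afterSimplify(true, true),
              "forms must agree");
```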
Brox Chen
e6f6a1e863
[AMDGPU][True16][CodeGen] uaddsat/usubsat true16 selection in gisel (#128233)
Enable gisel selection for uaddsat and usubsat in true16 flow

This patch includes:

1. Added VGPR_16_Lo128/VGPR_16 to register bank and update register info
for recognizing 16bit regclass id and bit width
2. uaddsat/usubsat test update
2025-02-25 17:09:34 -05:00
Brox Chen
7c24041895
[AMDGPU][True16][CodeGen] reopen "FLAT_load using D16 pseudo instruction" (#127673)
The previous patch, https://github.com/llvm/llvm-project/pull/114500, was
merged but hit a buildbot failure and was therefore reverted.

It seems AMDGPU::OpName::OPERAND_LAST was removed around the time the
previous patch was merged, which caused the compile error.
Fixed and reopened here.
2025-02-18 18:16:23 -05:00
Nikita Popov
2cb5241c77
Revert "[AMDGPU][True16][CodeGen] FLAT_load using D16 pseudo instruction (#114500)"
This reverts commit f7a5f067885b7f6cc4a000c8392adf6b777a9108.

Fails to build with:

llvm/lib/Target/AMDGPU/AMDGPUMCInstLower.cpp:126:37: error: no member named 'OPERAND_LAST' in 'llvm::AMDGPU::OpName'
  126 |   uint16_t OpName = AMDGPU::OpName::OPERAND_LAST;
2025-02-18 17:16:12 +01:00
Brox Chen
f7a5f06788
[AMDGPU][True16][CodeGen] FLAT_load using D16 pseudo instruction (#114500)
Implement new pseudos with the suffix _t16 for FLAT_LOAD which have
VGPR_16 as the load dst. Lower the pseudos to the existing real
instructions with VGPR_32 src or dst (which makes them consistent with
the hardware encoding). This patch reduces VGPR usage by making hi
halves of VGPRs available for other values.

More 8/16-bit load/store instructions will be supported in
upcoming patches.
2025-02-18 11:05:25 -05:00