1005 Commits

Author SHA1 Message Date
Akash Dutta
c256966fe2
[AMDGPU]: Unpack packed instructions overlapped by MFMAs post-RA scheduling (#157968)
This is a cleaned up version of PR #151704. These optimizations are now
performed post-RA scheduling.
2025-09-19 09:41:02 -07:00
Matt Arsenault
daed12d00d
AMDGPU: Remove unnecessary AGPR legalize logic (#159491)
The manual legalizeOperands code only needs to consider cases that
require full instruction context to know if the operand is legal.
This does not need to handle basic operand register class constraints.
2025-09-19 09:51:46 +09:00
Matt Arsenault
aa8b624518
AMDGPU: Remove unnecessary operand legalization for WMMAs (#159370)
The operand constraints already express this constraint, and
InstrEmitter will respect them.
2025-09-18 09:20:18 +09:00
Matt Arsenault
d57aa484e1
AMDGPU: Constrain regclass when replacing SGPRs with VGPRs (#159369) 2025-09-18 07:36:28 +09:00
Jay Foad
eeced0d073
[AMDGPU] Use larger immediate values in S_NOP (#158990)
The S_NOP instruction has an immediate operand which is one less than
the number of cycles to delay for. The maximum value that may be encoded
in this field was increased in GFX8 and again in GFX12.
2025-09-16 15:51:06 +01:00
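
The S_NOP encoding described in the commit above (immediate = cycles - 1) can be illustrated with a minimal sketch. The helper name `splitNopDelay` and the `maxImm` parameter are hypothetical and are not taken from the patch. The per-generation maximum is passed in by the caller because the commit message does not state the exact limits.

```cpp
#include <algorithm>
#include <vector>

// Split a delay of `cycles` into a sequence of S_NOP immediates.
// Each immediate encodes (cycles covered by that S_NOP) - 1.
// `maxImm` is the largest immediate the subtarget can encode (an assumption
// supplied by the caller; the real limits depend on the GPU generation).
std::vector<unsigned> splitNopDelay(unsigned cycles, unsigned maxImm) {
  std::vector<unsigned> imms;
  const unsigned maxCyclesPerNop = maxImm + 1;
  while (cycles > 0) {
    const unsigned chunk = std::min(cycles, maxCyclesPerNop);
    imms.push_back(chunk - 1); // encoded value is one less than the delay
    cycles -= chunk;
  }
  return imms;
}
```

With a larger `maxImm`, the same delay needs fewer S_NOP instructions, which is the benefit the commit is after.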
Stanislav Mekhanoshin
72aa946762
[AMDGPU] Drop high 32 bits of aperture registers (#158725)
Fixes: SWDEV-551181
2025-09-16 02:11:39 -07:00
Carl Ritson
fdb06d9792
[AMDGPU] Refactor out common exec mask opcode patterns (NFCI) (#154718)
Create utility mechanism for finding wave size dependent opcodes used to
manipulate exec/lane masks.
2025-09-16 03:22:14 +00:00
Matt Arsenault
1dc4db8f1e
AMDGPU: Relax verifier for agpr/vgpr loads and stores (#158391) 2025-09-13 16:34:02 +09:00
Matt Arsenault
7289f2cd0c
CodeGen: Remove MachineFunction argument from getRegClass (#158188)
This is a low level utility to parse the MCInstrInfo and should
not depend on the state of the function.
2025-09-12 19:22:02 +09:00
Matt Arsenault
3b48c64d08
AMDGPU: Move spill pseudo special case out of adjustAllocatableRegClass (#158246)
This is special for the same reason av_mov_b64_imm_pseudo is special.
2025-09-12 18:35:57 +09:00
Matt Arsenault
9e1d656c68
AMDGPU: Remove MIMG special case in adjustAllocatableRegClass (#158184)
I have no idea why this was here. MIMG atomics use tied operands
for the input and output, so AV classes should have always worked.
We have poor test coverage for AGPRs with atomics, so add a partial
set. Everything seems to work OK, although it seems image cmpswap
always uses VGPRs unnecessarily.
2025-09-12 09:02:24 +00:00
Matt Arsenault
5a21128f24
AMDGPU: Relax legal register operand constraint (#157989)
Find a common subclass instead of directly checking for a subclass
relationship. This fixes folding logic for unaligned register defs
into aligned use contexts. e.g., a vreg_64 def into an av_64_align2
use should be able to find the common subclass vreg_align2. This
avoids regressions in future patches.

Checking the subclass was also redundant on the subregister path;
getMatchingSuperRegClass is sufficient.
2025-09-12 08:57:47 +09:00
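
The difference between a direct subclass check and a common-subclass check in the commit above can be shown with a toy model. This is only a sketch: register classes are modeled as bit sets over eight made-up register pairs, and the constants and helper names are hypothetical rather than LLVM's actual TargetRegisterInfo API.

```cpp
#include <bitset>

// Toy model: bits 0-3 are VGPR pairs, bits 4-7 are AGPR pairs.
// Even-numbered bits stand for pairs that start at an aligned register.
using RegSet = std::bitset<8>;

const RegSet VReg64(0x0F);       // all VGPR pairs, aligned or not
const RegSet AV64Align2(0x55);   // aligned VGPR pairs and aligned AGPR pairs
const RegSet VReg64Align2(0x05); // aligned VGPR pairs only

// True if every register in `sub` is also in `super`.
bool isSubClassOf(const RegSet &sub, const RegSet &super) {
  return (sub & super) == sub;
}

// Registers acceptable to both classes; empty means no common subclass.
RegSet commonSubClass(const RegSet &a, const RegSet &b) { return a & b; }
```

In this model `VReg64` is not a subclass of `AV64Align2`, so a strict subclass check rejects the fold, while `commonSubClass(VReg64, AV64Align2)` equals `VReg64Align2` and is non-empty, so the relaxed check accepts it.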
Matt Arsenault
1c325a07f8
AMDGPU: Stop checking allocatable in adjustAllocatableRegClass (#158105)
This no longer does anything.
2025-09-12 08:56:34 +09:00
Petar Avramovic
41c685975e
AMDGPU/UniformityAnalysis: fix G_ZEXTLOAD and G_SEXTLOAD (#157845)
Use same rules for G_ZEXTLOAD and G_SEXTLOAD as for G_LOAD.
Flat addrspace(0) and private addrspace(5) G_ZEXTLOAD and G_SEXTLOAD
should always be divergent.
2025-09-10 17:57:15 +02:00
Stanislav Mekhanoshin
b0ee92be94
[AMDGPU] Restrict scale operands of WMMA to low 256 VGPRs (#157526)
These cannot accept high registers.
2025-09-08 15:44:51 -07:00
Matt Arsenault
727e9f5ea5
CodeGen: Pass SubtargetInfo to TargetGenInstrInfo constructors (#157337)
This will make it possible for tablegen to make subtarget
dependent decisions without adding new arguments to every
target.

---------

Co-authored-by: Sergei Barannikov <barannikov88@gmail.com>
2025-09-08 12:12:19 +09:00
Matt Arsenault
884130bf93
AMDGPU: Allow folding multiple uses of some immediates into copies (#154757)
In some cases this will require an avoidable re-defining of
a register, but it works out better most of the time. Also allow
folding 64-bit immediates into subregister extracts, unless it would
break an inline constant.

We could be more aggressive here, but this set of conditions seems
to do a reasonable job without introducing too many regressions.
2025-09-06 08:22:09 +09:00
Matt Arsenault
d096b1d48e
AMDGPU: Remove flat special case in getRegClass (#156991) 2025-09-06 07:42:16 +09:00
Stanislav Mekhanoshin
1f0f3473e6
[AMDGPU] High VGPR lowering on gfx1250 (#156965) 2025-09-04 16:20:47 -07:00
Pierre van Houtryve
e2bd10cf16
[AMDGPU][gfx1250] Add 128B cooperative atomics (#156418)
- Add clang built-ins + sema/codegen
- Add IR Intrinsic + verifier
- Add DAG/GlobalISel codegen for the intrinsics
- Add lowering in SIMemoryLegalizer using a MMO flag.
2025-09-04 09:19:25 +00:00
Diana Picus
018dc1b397
[AMDGPU] Tail call support for whole wave functions (#145860)
Support tail calls to whole wave functions (trivial) and from whole wave
functions (slightly more involved because we need a new pseudo for the
tail call return, that patches up the EXEC mask).

Move the expansion of whole wave function return pseudos (regular and
tail call returns) to prolog epilog insertion, since that's where we
patch up the EXEC mask.
2025-09-04 10:34:43 +02:00
Matt Arsenault
a23a5b0683
AMDGPU: Remove the DS special case in getRegClass (#156696)
These instructions should now have proper representation
with separate instructions for operands which must be paired.
2025-09-04 15:14:17 +09:00
Matt Arsenault
dc170c7e31
AMDGPU: Special case align requirement for AV_MOV_B64_IMM_PSEUDO
This should not require aligned registers. Fixes expensive_checks
test failure. I don't see a better way until the new system
to specify the alignment per register is done.
2025-09-04 09:55:39 +09:00
Matt Arsenault
dd5eb46690
AMDGPU: Fold 64-bit immediate into copy to AV class (#155615)
This is in preparation for patches which will introduce more
copies to av registers.
2025-09-03 09:29:59 +09:00
Matt Arsenault
d7484684e5
AMDGPU: Refactor isImmOperandLegal (#155607)
The goal is to expose more variants that can operate without
preconstructed MachineInstrs or MachineOperands.
2025-09-03 09:06:18 +09:00
Matt Arsenault
d6a72cb300
AMDGPU: Fix fixme for out of bounds indexing in usesConstantBus check (#155603)
This loop over all the operands in the MachineInstr will eventually
go past the end of the MCInstrDesc's explicit operands. We don't
need the instr desc to compute the constant bus usage, just the
register and whether it's implicit or not. The check here is slightly
conservative. e.g. a random vcc implicit use appended to an instruction
will falsely report a constant bus use.
2025-09-02 17:25:08 +00:00
Matt Arsenault
e3e1652d18
AMDGPU: Add version of isImmOperandLegal for MCInstrDesc (#155560)
This avoids the need for a pre-constructed instruction, at least
for the first argument.
2025-09-03 01:18:41 +09:00
Chris Jackson
7d0203b39f
[AMDGPU] Prevent generation of unused SGPR IMPLICIT_DEF assignments (#155241)
Dead VGPR->SGPR copies were converted to IMPLICIT_DEF assignments that
were unused. Prevent these from being created and update the numerous
affected tests.
2025-08-27 13:18:18 +01:00
Matt Arsenault
de99aabed6
AMDGPU: Remove unused argument from adjustAllocatableRegClass (#155554) 2025-08-27 06:00:34 +00:00
Matt Arsenault
05f208ac0b
AMDGPU: Stop checking if registers are reserved in adjustAllocatableRegClass (#155125)
This function is used to implement TargetInstrInfo::getRegClass and
conceptually should not depend on the dynamic state of the function.
2025-08-26 20:09:32 +09:00
Matt Arsenault
db024764c1
AMDGPU: Fix not diagnosing unaligned VGPRs for vsrc operands (#155104)
This was not checking the alignment requirement for 64-bit
operands which accept inline immediates. Not all custom operand
types were handled in the switch, so round out with explicit
handling of all enum values, and change the default to use
the default checks for unhandled cases.

Fixes #155095
2025-08-25 17:42:58 +09:00
Matt Arsenault
52ed03db59
AMDGPU: Simplify foldImmediate with register class based checks (#154682)
Generalize the code over the properties of the mov instruction,
rather than maintaining parallel logic to figure out the type
of mov to use. I've maintained the behavior with 16-bit physical
SGPRs, though I think the behavior here is broken and corrupting
any value that happens to be live in the high bits. It just happens
there's no way to separately write to those with a real instruction
but I don't think we should be trying to make assumptions around
that property.

This is NFC-ish. It now does a better job with imm pseudos which
practically won't reach here. This also will make it easier
to support more folds in a future patch.

I added a couple of new tests with 16-bit extract of 64-bit sources.
2025-08-23 02:13:50 +00:00
Matt Arsenault
2b46f31ee3
AMDGPU: Sign extend immediates for 32-bit subregister extracts (#154870)
extractSubregFromImm previously would sign extend the 16-bit subregister
extracts, but not the 32-bit. We try to consistently store immediates
as sign extended, since not doing it can result in misreported
isInlineImmediate checks.
2025-08-22 16:50:36 +09:00
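
Why sign extension matters for the inline-immediate check in the commit above can be sketched as follows. The helper names are hypothetical and this is not the body of extractSubregFromImm. The range check covers only the integer inline constants (-16 to 64) and ignores the floating-point ones.

```cpp
#include <cstdint>

// Extract the low or high 32 bits of a 64-bit immediate and store the
// result sign extended, so negative 32-bit values stay negative.
// (The unsigned-to-signed narrowing wraps on all mainstream compilers and
// is guaranteed to do so from C++20.)
int64_t extractLo32SignExtended(int64_t imm) {
  return static_cast<int32_t>(static_cast<uint32_t>(imm));
}

int64_t extractHi32SignExtended(int64_t imm) {
  return static_cast<int32_t>(
      static_cast<uint32_t>(static_cast<uint64_t>(imm) >> 32));
}

// Integer inline-constant range only; a simplification for illustration.
bool isInlinableIntLiteral(int64_t v) { return v >= -16 && v <= 64; }
```

For the immediate `0x00000001FFFFFFFF`, the sign-extended low half is -1, which is in the inline range, whereas the zero-extended value `0xFFFFFFFF` would be reported as not inlinable.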
Matt Arsenault
fc5fcc0c95
AMDGPU: Start using AV_MOV_B64_IMM_PSEUDO (#154500) 2025-08-22 13:59:36 +09:00
Matt Arsenault
694a488708
AMDGPU: Add pseudoinstruction for 64-bit agpr or vgpr constants (#154499)
64-bit version of 7425af4b7aaa31da10bd1bc7996d3bb212c79d88. We
still need to lower to 32-bit v_accagpr_write_b32s, so this has
a unique value restriction that requires both halves of the constant
to be 32-bit inline immediates. This only introduces the new
pseudo definitions, but doesn't try to use them yet.
2025-08-20 22:54:37 +09:00
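
The value restriction described in the commit above, that both 32-bit halves must be inline immediates, can be sketched like this. The function names are hypothetical, and the check is deliberately simplified to the integer inline range (-16 to 64); the real restriction would also admit the floating-point inline constants.

```cpp
#include <cstdint>

// Integer inline-constant range only; a simplification for illustration.
bool isIntInlineConstant32(int32_t v) { return v >= -16 && v <= 64; }

// A 64-bit constant qualifies only if each half, taken on its own as a
// 32-bit value, is an inline constant, because the pseudo is still lowered
// to two 32-bit writes.
bool bothHalvesAreIntInlineConstants(uint64_t imm) {
  const auto lo = static_cast<int32_t>(static_cast<uint32_t>(imm));
  const auto hi = static_cast<int32_t>(static_cast<uint32_t>(imm >> 32));
  return isIntInlineConstant32(lo) && isIntInlineConstant32(hi);
}
```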
Matt Arsenault
ed0e531044
AMDGPU: Use Register type for isStackAccess (#154320) 2025-08-19 23:00:45 +09:00
Pierre van Houtryve
6f7c77fe90
[AMDGPU] Check noalias.addrspace in mayAccessScratchThroughFlat (#151319)
PR #149247 made the MD accessible by the backend so we can now leverage
it in the memory model. The first use case here is detecting if a flat op
can access scratch memory.
Benefits both the MemoryLegalizer and InsertWaitCnt.
2025-08-19 07:42:59 +02:00
Stanislav Mekhanoshin
906c9e9542
[AMDGPU] Remove misplaced assert. (#154187)
The assert that RegScavenger is required for long branching is now
placed below the code that uses s_add_pc64, where it is actually
used.
2025-08-18 13:58:54 -07:00
Stanislav Mekhanoshin
13716843eb
[AMDGPU] Make s_setprio_inc_wg a scheduling boundary (#154188) 2025-08-18 13:20:38 -07:00
Stanislav Mekhanoshin
ea14834966
[AMDGPU] Per-subtarget DPP instruction classification (#153096)
This is NFCI at this point.
2025-08-11 15:41:02 -07:00
Stanislav Mekhanoshin
dddeb07c2e
[AMDGPU] Restrict packed math FP32 instructions to read only one SGPR per operand on gfx12+ (#152465)
Sec. 4.6.7.1 of the gfx1250 SPG states that if an SGPR is used
as an operand, only one SGPR will be read for both the low and high
operations. As a result, the corresponding bits in `op_sel` and
`op_sel_hi` must be the same when the operand is an SGPR.

Co-authored-by: Tian, Shilei <Shilei.Tian@amd.com>
2025-08-07 16:13:34 -07:00
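
The `op_sel`/`op_sel_hi` rule from the commit above can be expressed as a small legality check. This is a sketch under assumptions: the function name and the mask-based representation (bit i describes source operand i) are hypothetical and do not mirror the actual verifier code.

```cpp
#include <cstdint>

// Returns true if, for every source operand that is an SGPR, the matching
// bits of opSel and opSelHi agree. Because only one SGPR is read for both
// the low and high operation, the two selectors must pick the same one.
bool isPackedFP32OpSelLegal(unsigned numSrcs, uint32_t sgprMask,
                            uint32_t opSel, uint32_t opSelHi) {
  for (unsigned i = 0; i < numSrcs; ++i) {
    const uint32_t bit = 1u << i;
    const bool isSGPR = (sgprMask & bit) != 0;
    if (isSGPR && ((opSel & bit) != (opSelHi & bit)))
      return false; // low and high halves would read different SGPRs
  }
  return true;
}
```

VGPR operands are left unconstrained in this model, so differing selector bits are rejected only where `sgprMask` marks the operand as an SGPR.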
Shilei Tian
351b38f266
[AMDGPU] Mark address space cast from private to flat as divergent if target supports globally addressable scratch (#152376)
Globally addressable scratch is a new feature introduced in gfx1250.
However, this feature changes how scratch space is mapped into the flat
aperture, making address space casts from private to flat no longer
uniform.
2025-08-06 17:08:56 -04:00
Changpeng Fang
32161e9de3
[AMDGPU] Do not fold an immediate into instructions with frame indexes (#151263)
Do not fold an immediate into an instruction that already has a frame
index operand. A frame index could possibly turn out to be another immediate.

Fixes: SWDEV-536263

---------

Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
2025-08-06 11:47:37 -07:00
Stanislav Mekhanoshin
33abf05af4
[AMDGPU] gfx1250 v_permlane_* instructions (#151749) 2025-08-01 16:14:19 -07:00
Stanislav Mekhanoshin
ce40863209
[AMDGPU] Add v_cvt_sr|pk_bf8|fp8_f16 gfx1250 instructions (#151415) 2025-07-30 17:24:45 -07:00
Brox Chen
2a3f72ee6e
[AMDGPU][CodeGen][True16] Correct size calculation for d16 insts (#151042)
D16 pseudo instructions are introduced in true16 mode to represent a D16
load/store. In MC lowering, the pseudo instructions are lowered to the
corresponding D16 Lo/Hi MC Inst respecting the register allocation.

However, the pseudo instruction has size 0 and causes an issue in the
Inst size estimation. Use D16 Lo when calculating inst size.
2025-07-29 13:01:57 -04:00
Pierre van Houtryve
2ad4e93ded
[AMDGPU][gfx1250] Use SCOPE_SE for stores that may hit scratch (#150586) 2025-07-28 11:40:56 +02:00
Changpeng Fang
400ce1a3d3
[AMDGPU] Support AMDGPUClamp for bf16 on gfx1250 (#150663)
The scalar version uses V_MAX_BF16_PSEUDO, which is expanded to V_PK_MAX_BF16
with unused high bits. If V_PK_MAX_BF16 were produced directly instead,
it would create a problem when folding the clamp into other scalar
instructions due to incompatible clamp bits.

FIXME-TRUE16: enable bf16 clamp with true16

---------

Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
2025-07-25 12:13:06 -07:00
Jay Foad
8005c6a108
[AMDGPU] Simplify SIInstrInfo::isLegalToSwap. NFC. (#149058) 2025-07-25 13:02:34 +01:00
Stanislav Mekhanoshin
2346968807
[AMDGPU] Add V_ADD|SUB|MUL_U64 gfx1250 opcodes (#150291) 2025-07-23 13:17:56 -07:00