933 Commits

Author SHA1 Message Date
Jay Foad
9cacc4138e
[AMDGPU] Move S_ADD_U64_PSEUDO handling into getVALUOp. NFC. (#142934)
S_ADD_U64_PSEUDO and S_SUB_U64_PSEUDO are not "special cases" so can be
handled in getVALUOp instead of moveToVALUImpl.
2025-06-05 16:49:24 +01:00
Brox Chen
b668b6439a
[AMDGPU][True16][CodeGen] legalize 16bit and 32bit use-def chain for moveToVALU in si-fix-sgpr-lowering (#138734)
Two changes in this patch:
1. Covered another case in the legalizeOperandVALUt16 functions and the COPY
lowering: when a SALU16 result is used by a SALU32 instruction, a
reg_sequence must be inserted after moving to VALU (previously only the
case of SALU32 used by SALU16 was considered).
2. Moved the useMI analysis into addUsersToMoveVALUList, and legalize the
targeted operand when needed.

Turn on the frem test in true16 mode for gfx1150, which was failing before
this patch. A few bitcast tests are also affected by this change, with some
v_mov instructions being replaced by dual movs.
2025-06-04 09:53:10 -04:00
Matt Arsenault
65b90c59ce
AMDGPU: Remove redundant operand folding checks (#140587)
This was pre-filtering out a specific situation from being
added to the fold candidate list. The operand legality will
ultimately be checked with isOperandLegal before the fold is
performed, so I don't see the benefit of pre-filtering this one
case.
2025-05-29 19:38:45 +02:00
Justin Bogner
b7bb256703
Warn on misuse of DiagnosticInfo classes that hold Twines (#137397)
This annotates the `Twine` passed to the constructors of the various
DiagnosticInfo subclasses with `[[clang::lifetimebound]]`, which causes
us to warn when we would try to print the twine after it had already
been destructed.

We also update `DiagnosticInfoUnsupported` to hold a `const Twine &`
like all of the other DiagnosticInfo classes, since this warning allows
us to clean up all of the places where it was being used incorrectly.
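Here is a minimal sketch of the hazard. It uses `std::string` in place of `llvm::Twine`, and `DiagSketch` is a hypothetical class, not an LLVM type. Like the DiagnosticInfo classes, the object stores a reference to its constructor argument. Annotating that parameter with `[[clang::lifetimebound]]` lets Clang flag a temporary that would be destroyed before the stored reference is read. On compilers without the attribute, the macro expands to nothing.

```cpp
#include <cassert>
#include <string>

// Expand to [[clang::lifetimebound]] where supported, otherwise to nothing.
#ifdef __has_cpp_attribute
#if __has_cpp_attribute(clang::lifetimebound)
#define LIFETIMEBOUND_SKETCH [[clang::lifetimebound]]
#endif
#endif
#ifndef LIFETIMEBOUND_SKETCH
#define LIFETIMEBOUND_SKETCH
#endif

// Hypothetical diagnostic that holds a reference to its message, not a copy.
class DiagSketch {
  const std::string &Msg;

public:
  explicit DiagSketch(const std::string &M LIFETIMEBOUND_SKETCH) : Msg(M) {}
  const std::string &message() const { return Msg; }
};

// Fine: the referenced string outlives the diagnostic object.
std::string describe(const std::string &Text) {
  DiagSketch D(Text);
  return D.message();
}

// Dangling (not compiled here): the temporary dies at the end of the
// full-expression, so reading D.message() afterwards is undefined behavior.
// With the attribute, Clang's dangling-reference warnings can flag this line:
//   DiagSketch D(std::string("temporary"));
```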
2025-05-28 12:26:39 -07:00
Ivan Kosarev
66d3980b53
[AMDGPU][NFC] Remove _DEFERRED operands. (#139123)
All immediates are deferred now.
2025-05-09 10:10:53 +01:00
Ivan Kosarev
c290f48a45
[AMDGPU][NFC] Remove unused operand types. (#139062) 2025-05-08 12:48:25 +01:00
Brox Chen
09d01be856
[AMDGPU][True16][CodeGen] replace subreg_to_reg to req_sequence (#138746)
Since subreg_to_reg is considered broken in LLVM, replace subreg_to_reg
with reg_sequence.
2025-05-07 10:28:10 -04:00
Frederik Harwath
f541a3aad8
[AMDGPU] SIInstrInfo: Fix resultDependsOnExec for VOPC instructions (#134629)
SIInstrInfo::resultDependsOnExec assumes that operand 0 of a comparison
is always the destination of the instruction. This is not true for
instructions in VOPC form where it is "src0". This led to a crash in
machine-cse.

---------

Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
2025-04-22 10:17:35 +02:00
Philip Reames
f2ecd86e34
[Analysis] Remove implicit LocationSize conversion from uint64_t (#133342)
This change removes the uint64_t constructor on LocationSize
preventing implicit conversion, and fixes up the using APIs to adapt to
the change. Note that I'm adding a couple of explicit conversion points
on routines where passing in a fixed offset as an integer seems likely
to have well understood semantics.

We had an unfortunate case which arose if you tried to pass a TypeSize
value to a parameter of LocationSize type. We'd find the implicit
conversion path through TypeSize -> uint64_t -> LocationSize which works
just fine for fixed values, but loses information and fails assertions
if the TypeSize was scalable. This change breaks the first link in that
implicit conversion chain since that seemed to be the easier one.
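Here is a simplified model of the conversion chain described above. `TypeSizeSketch` and `LocationSizeSketch` are hypothetical stand-ins, not LLVM's real classes.
- The size type's implicit `uint64_t` conversion mirrors the first link in the chain.
- The location type has no implicit `uint64_t` constructor, which breaks the chain.
- Callers must instead use a named factory that keeps the scalable flag.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in for TypeSize: a minimum size plus a scalable flag.
struct TypeSizeSketch {
  uint64_t MinValue;
  bool Scalable;
  // First link of the chain: an implicit conversion that drops Scalable.
  operator uint64_t() const { return MinValue; }
};

// Hypothetical stand-in for LocationSize with no implicit constructors.
class LocationSizeSketch {
  uint64_t Value;
  bool Scalable;
  LocationSizeSketch(uint64_t V, bool S) : Value(V), Scalable(S) {}

public:
  // Explicit, named conversion points.
  static LocationSizeSketch precise(uint64_t V) { return {V, false}; }
  static LocationSizeSketch precise(TypeSizeSketch TS) {
    return {TS.MinValue, TS.Scalable};
  }
  uint64_t getValue() const { return Value; }
  bool isScalable() const { return Scalable; }
};

// Neither an integer nor a TypeSizeSketch converts silently any more.
static_assert(!std::is_convertible<uint64_t, LocationSizeSketch>::value,
              "no implicit uint64_t -> LocationSizeSketch");
static_assert(!std::is_convertible<TypeSizeSketch, LocationSizeSketch>::value,
              "no implicit TypeSizeSketch -> LocationSizeSketch");
```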
2025-04-18 07:46:31 -07:00
Brox Chen
bf388f8a43
[AMDGPU][True16][CodeGen] legalize operands when move16bit SALU to VALU (#133985)
This is a follow up PR from
https://github.com/llvm/llvm-project/pull/132089.

When a V2S copy and its useMI are lowered to VALU, this patch checks
whether the newly generated VALU instruction is a true16 instruction and,
if so, adds subreg access on all operands where necessary.

an example MIR looks like:
```
%1:vgpr_32 = V_CVT_F32_U32_e64 %0:vgpr_32, 0, 0 ...
%2:sreg_32 = COPY %1:vgpr_32
%3:sreg_32 = S_FLOOR_F16 %2:sreg_32, ...
```
currently lowered to
```
%1:vgpr_32 = V_CVT_F32_U32_e64 %0:vgpr_32, 0, 0 ...
%2:vgpr_16 = V_FLOOR_F16_t16_e64 0, %1:vgpr_32, 0, 0, 0 ...
```
after this patch
```
%1:vgpr_32 = V_CVT_F32_U32_e64 %0:vgpr_32, 0, 0 ...
%2:vgpr_16 = V_FLOOR_F16_t16_e64 0, %1.lo16:vgpr_32, 0, 0, 0 ...
```
2025-04-03 12:26:41 -04:00
Brox Chen
dd1d41f833
[AMDGPU][True16][CodeGen] fix moveToVALU with proper subreg access in true16 (#132089)
There are V2S copies between vgpr16 and sgpr32 in true16 mode. This is
because both vgpr16 and sgpr32 are selectable for a 16-bit src in ISel.

When a V2S copy and its useMI are lowered to VALU, this patch:
1. Checks whether the newly generated VALU instruction is used by a true16
instruction, and adds subreg access if necessary.
2. Legalizes the V2S copy by replacing it with subreg_to_reg.

an example MIR looks like:
```
%2:sgpr_32 = COPY %1:vgpr_16
%3:sgpr_32 = S_OR_B32 %2:sgpr_32, ...
%4:vgpr_16 = V_ADD_F16_t16 %3:sgpr_32, ...
```
currently lowered to
```
%2:vgpr_32 = COPY %1:vgpr_16
%3:vgpr_32 = V_OR_B32 %2:vgpr_32, ...
%4:vgpr_16 = V_ADD_F16_t16 %3:vgpr_32, ...
```
after this patch
```
%2:vgpr_32 = SUBREG_TO_REG 0, %1:vgpr_16, lo16
%3:vgpr_32 = V_OR_B32 %2:vgpr_32, ...
%4:vgpr_16 = V_ADD_F16_t16 %3.lo16:vgpr_32, ...
```
2025-04-01 12:40:18 -04:00
Stephen Thomas
2e3fa4ba9e
[AMDGPU] Insert before and after instructions that always use GDS (#131338)
It is an architectural requirement that there must be no outstanding GDS
instructions when an "always GDS" instruction is issued, and also that
an always GDS instruction must be allowed to complete.

Insert waits on DScnt/LGKMcnt prior to (if necessary) and subsequent to
(unconditionally) any always GDS instruction, and an additional S_NOP if
the subsequent wait was followed by S_ENDPGM.

Always GDS instructions are GWS instructions, DS_ORDERED_COUNT,
DS_ADD_GS_REG_RTN, and DS_SUB_GS_REG_RTN (the latter two are considered
always GDS as of this patch).
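Here is a toy model of the insertion rule described above, operating on a list of opcode-name strings. `WAIT_DSCNT_0` is a placeholder for a wait on DScnt/LGKMcnt. The opcode predicates are simplified assumptions; this is not the real SIInsertWaitcnts logic.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Simplified predicates over opcode names (assumed naming scheme).
static bool startsWith(const std::string &S, const char *Prefix) {
  return S.rfind(Prefix, 0) == 0;
}
static bool isAlwaysGDS(const std::string &Op) {
  return startsWith(Op, "DS_GWS_") || Op == "DS_ORDERED_COUNT" ||
         Op == "DS_ADD_GS_REG_RTN" || Op == "DS_SUB_GS_REG_RTN";
}
static bool isDS(const std::string &Op) { return startsWith(Op, "DS_"); }

// Returns a copy of Insts with waits inserted around always-GDS instructions.
std::vector<std::string> insertGDSWaits(const std::vector<std::string> &Insts) {
  std::vector<std::string> Out;
  bool Outstanding = false; // has a DS instruction issued since the last wait?
  for (std::size_t I = 0; I < Insts.size(); ++I) {
    const std::string &Op = Insts[I];
    if (isAlwaysGDS(Op)) {
      if (Outstanding)
        Out.push_back("WAIT_DSCNT_0"); // wait before: only if something is pending
      Out.push_back(Op);
      Out.push_back("WAIT_DSCNT_0"); // wait after: unconditional
      Outstanding = false;
      if (I + 1 < Insts.size() && Insts[I + 1] == "S_ENDPGM")
        Out.push_back("S_NOP"); // extra NOP when the wait precedes S_ENDPGM
      continue;
    }
    Out.push_back(Op);
    if (isDS(Op))
      Outstanding = true;
  }
  return Out;
}
```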
2025-03-21 09:33:04 +00:00
Shilei Tian
b7852939b5
[NFC][AMDGPU] Replace multiple calls to MI.getOpcode() with Opcode (#131400) 2025-03-14 20:14:12 -04:00
Mirko Brkušanin
a6089a949f
[AMDGPU] Ignore RegMask operands when folding operands to SALU insts (#130813)
Otherwise we hit an assert in isInlineConstant.
2025-03-12 09:59:24 +01:00
Matt Arsenault
7425af4b7a
AMDGPU: Add pseudoinstruction for agpr or vgpr constants (#130042) 2025-03-07 09:18:22 +07:00
Matt Arsenault
4fb31e4401 AMDGPU: Use const reference for DebugLoc 2025-03-04 13:56:52 +07:00
sstipano
531c48546d
[AMDGPU][NFC] Move isXDL and isDGEMM to SIInstrInfo. (#129103) 2025-02-28 03:14:51 +01:00
Frederik Harwath
50b508cc7b
[AMDGPU] Verify SdwaSel value range (#128898)
Make the MachineVerifier check that the value provided for an SDWA selection is a
valid value for the SdwaSel enum.
2025-02-27 08:11:29 +01:00
Brox Chen
364b97f23b
[AMDGPU][True16][CodeGen] 16bit spill support in true16 mode (#128060)
Enables 16-bit values to be spilled to scratch.

Note that the memory instructions used are defined as reading and writing
VGPR_32 but do not clobber the unspecified 16 bits of those registers,
so spills and reloads of the lo and hi halves of the registers work.
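Here is a bit-level model of why this works, assuming 32-bit register values and hypothetical 16-bit read/write helpers. Reloading one half leaves the other half untouched. That lets the lo and hi halves be spilled and reloaded independently.

```cpp
#include <cassert>
#include <cstdint>

// Read one 16-bit half of a 32-bit register value.
uint16_t readLo16(uint32_t Reg) { return static_cast<uint16_t>(Reg & 0xFFFFu); }
uint16_t readHi16(uint32_t Reg) { return static_cast<uint16_t>(Reg >> 16); }

// Reload one half without clobbering the other half.
uint32_t writeLo16(uint32_t Reg, uint16_t V) { return (Reg & 0xFFFF0000u) | V; }
uint32_t writeHi16(uint32_t Reg, uint16_t V) {
  return (Reg & 0x0000FFFFu) | (static_cast<uint32_t>(V) << 16);
}
```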
2025-02-26 16:17:20 -05:00
Brox Chen
bb62af7d14
[AMDGPU][True16][CodeGen] true16 codegen for valu op (#124797)
true16 selection for VALU ops; enable the `real-true16` attribute and update
the codegen test
2025-02-26 10:50:49 -05:00
Pierre van Houtryve
0f0d3fb6b5
[AMDGPU] Do not allow M0 as v_readlane_b32 dst (#128867)
See #128851 - this is the same patch, but for v_readlane_b32.

This instruction is used much less often, so fewer changes were
required.
2025-02-26 14:13:39 +01:00
Pierre van Houtryve
5231736329
[AMDGPU] Do not allow M0 as v_readfirstlane_b32 dst (#128851)
M0 can only be written to by the SALU, so `v_readfirstlane_b32 m0` is
effectively useless. Represent this by restricting the dest RC of that
instruction to `SReg_32_XM0` which excludes M0.

There are a lot of test changes due to the register class changing, but
most changes are trivial. In some cases, an extra register and
`s_mov_b32` is needed.

Fixes SWDEV-513269
2025-02-26 13:14:03 +01:00
Craig Topper
571b787b83
[CodeGen] Change copyPhysReg interface to use Register instead of MCRegister. (#128473)
NVPTX, SPIRV, and WebAssembly pass virtual registers to this function
since they don't perform register allocation. We need to use Register to
avoid a virtual register being converted to MCRegister by the caller.
2025-02-24 09:55:34 -08:00
Benjamin Kramer
ddf24086f1 [AMDGPU] Remove unused variables. NFC 2025-02-19 18:05:22 +01:00
Brox Chen
210036a22e
[AMDGPU][True16][CodeGen] true16 codegen pattern for fma (#127240)
Previous PR https://github.com/llvm/llvm-project/pull/122950 got
reverted because it caused a buildbot failure: another patch was merged
while that PR was under review, leaving one test out of date.

Reopen this PR with the issue fixed.
2025-02-19 11:37:24 -05:00
Matt Arsenault
22d65d8989
AMDGPU: Teach isOperandLegal about SALU literal restrictions (#127626)
isOperandLegal mostly implemented the VALU operand rules, and
largely ignored SALU restrictions. This theoretically avoids
folding literals into SALU insts which already have a literal
operand. This issue is currently avoided due to a bug in
SIFoldOperands; this change will allow using raw operand
legality rules.

This breaks the formation of s_fmaak_f32 in SIFoldOperands,
but it probably should not have been forming there in the first
place. TwoAddressInsts or RA should generally handle that,
and this only worked by accident.
2025-02-19 10:53:03 +07:00
Matt Arsenault
eb7c947272
AMDGPU: Correct legal literal operand logic for multiple uses (#127594)
The same literal can be used multiple times in an instruction,
not just once. We were not tracking the used value to verify this,
so correct this.

This helps avoid regressions in a future patch.
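Here is a simplified check illustrating the corrected rule, assuming an encoding with a single literal slot. Any number of operands may use a literal, as long as they all use the same value. `OperandSketch` and `literalsLegal` are hypothetical names, not SIInstrInfo APIs.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical operand model.
struct OperandSketch {
  enum Kind { Reg, InlineImm, Literal } K;
  int64_t Value;
};

// True if every literal operand uses the same value (one literal slot).
bool literalsLegal(const std::vector<OperandSketch> &Ops) {
  std::optional<int64_t> UsedLiteral; // value already occupying the slot
  for (const OperandSketch &Op : Ops) {
    if (Op.K != OperandSketch::Literal)
      continue;
    if (UsedLiteral && *UsedLiteral != Op.Value)
      return false; // a second, different literal does not fit
    UsedLiteral = Op.Value;
  }
  return true;
}
```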
2025-02-18 19:58:42 +07:00
Matt Arsenault
7c03865a1e
AMDGPU: Extract lambda used in foldImmediate into a helper function (#127484)
It was also too permissive for a more general utility; only return
the original immediate if there is no subregister.
2025-02-18 17:16:50 +07:00
Matt Arsenault
c5def84ca4
AMDGPU: Handle brev and not cases in getConstValDefinedInReg (#127483)
We should not encounter these cases in the peephole-opt use today,
but have the common helper function handle them anyway.
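Here is a sketch of the constant computation this enables. It assumes move-like opcodes that bit-reverse (`S_BREV_B32`-style) or bitwise-negate (`S_NOT_B32`-style) a 32-bit immediate. The enum and function names are hypothetical; the real helper operates on MachineInstrs.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical subset of move-like opcodes whose result is a known constant.
enum class MoveKind { Mov, Brev, Not, Other };

// Reverse the bit order of a 32-bit value.
uint32_t reverseBits32(uint32_t V) {
  uint32_t R = 0;
  for (int I = 0; I < 32; ++I)
    if (V & (1u << I))
      R |= 1u << (31 - I);
  return R;
}

// Constant left in the destination register, if it can be computed.
std::optional<uint32_t> constValDefined(MoveKind K, uint32_t Imm) {
  switch (K) {
  case MoveKind::Mov:
    return Imm;
  case MoveKind::Brev:
    return reverseBits32(Imm);
  case MoveKind::Not:
    return ~Imm;
  case MoveKind::Other:
    break;
  }
  return std::nullopt;
}
```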
2025-02-18 11:23:49 +07:00
Matt Arsenault
83d7f4b8c3
AMDGPU: Implement getConstValDefinedInReg and use in foldImmediate (NFC) (#127482)
This is NFC because it currently only matters for cases that are not
isMoveImmediate, and we do not yet implement any of those. This just
moves the implementation of foldImmediate to use the common interface,
similar to how x86 does it.
2025-02-18 11:21:02 +07:00
Matt Arsenault
4dee305ce2
AMDGPU: Fix foldImmediate breaking register class constraints (#127481)
This fixes a verifier error when folding an immediate materialized
into an aligned vgpr class into a copy to an unaligned virtual register.
2025-02-18 10:34:48 +07:00
Kazu Hirata
02d4aac55c
[AMDGPU] Remove materializeImmediate (#127420)
The last use was removed in:

  commit cbf34a5f7701148d68951320a72f483849b22eaf
  Author: Juan Manuel Martinez Caamaño <jmartinezcaamao@gmail.com>
  Date:   Fri Aug 23 14:06:17 2024 +0200
2025-02-16 22:47:14 -08:00
Brox Chen
cf1165cb9c
Revert "[AMDGPU][True16][CodeGen] true16 codegen pattern for fma (#12… (#127175)
Reverting this patch since it caused a buildbot failure.

This reverts commit 2a7487cc2e0fb8bd91784e2d9636a65baa6d90ed.
2025-02-14 02:28:45 -05:00
Brox Chen
2a7487cc2e
[AMDGPU][True16][CodeGen] true16 codegen pattern for fma (#122950)
true16 codegen pattern for f16 fma.

Created shrink-mad-fma-gfx10.mir as a duplicate of shrink-mad-fma to
separate the pre-GFX11 and GFX11 MIR tests.
2025-02-14 02:16:00 -05:00
Rahul Joshi
bee9664970
[TableGen] Emit OpName as an enum class instead of a namespace (#125313)
- Change InstrInfoEmitter to emit OpName as an enum class
  instead of an anonymous enum in the OpName namespace.
- This will help clearly distinguish between values that are 
  OpNames vs just operand indices and should help avoid
  bugs due to confusion between the two.
- Rename OpName::OPERAND_LAST to NUM_OPERAND_NAMES.
- Emit declaration of getOperandIdx() along with the OpName
  enum so it doesn't have to be repeated in various headers.
- Also updated AMDGPU, RISCV, and WebAssembly backends
  to conform to the new definition of OpName (mostly
  mechanical changes).
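Here is a sketch of what the scoped enum buys. It uses a hypothetical hand-written `OpName` and a two-opcode lookup table in place of the TableGen-generated one. Because the enum no longer converts implicitly to an integer, a name and an operand index cannot be mixed up silently. The second table row also shows why a fixed operand position is fragile: in a VOPC-like form with an implicit destination, `src0` is operand 0.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Scoped enum: distinct from plain operand indices.
enum class OpName : uint16_t { vdst, src0, src1, NUM_OPERAND_NAMES };

static_assert(!std::is_convertible<OpName, int>::value,
              "OpName no longer converts implicitly to an index");

// Hypothetical lookup: operand index of Name in one of two fictional opcodes,
// or -1 if the opcode has no such operand.
int16_t getOperandIdxSketch(unsigned Opcode, OpName Name) {
  static const int16_t
      Table[2][static_cast<unsigned>(OpName::NUM_OPERAND_NAMES)] = {
          {0, 1, 2},  // opcode 0: VOP3-like, explicit vdst is operand 0
          {-1, 0, 1}, // opcode 1: VOPC-like, implicit dst, src0 is operand 0
      };
  if (Opcode >= 2 || Name == OpName::NUM_OPERAND_NAMES)
    return -1;
  return Table[Opcode][static_cast<unsigned>(Name)];
}
```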
2025-02-12 08:19:30 -08:00
Jon Chesterfield
4f358d75d0 [amdgpu][nfc] Post-commit feedback on c39fba209 2025-01-30 20:07:44 +00:00
Jon Chesterfield
c39fba209c
[AMDGPU] S_SET_GPR_IDX_ON can be passed an immediate index (#125086)
Oversight found by an ISel fuzzing effort. The code assumed the argument is
a register, but in some cases it can be an immediate. TableGen's type for the
instruction is SSrc_b32, i.e. either a register or an immediate is fine. Added
the repro from the bug reporter as a test case; prior to this patch LLVM
would assert in getReg.

Fixes SWDEV-508589
2025-01-30 16:40:12 +00:00
Brox Chen
5d1c596ab4
[AMDGPU][True16][MC] true16 for minimummaximum/max/min/max3/min3 (#124184)
true16 support for gfx12 instructions including:

v_minimummaximum_f16
v_maximumminimum_f16
v_maximum_f16
v_minimum_f16
v_maximum3_f16
v_minimum3_f16
2025-01-27 16:52:59 -05:00
Venkata Ramanaiah Nalamothu
f7d8336a2f
[llvm] Pass MachineInstr flags to storeRegToStackSlot/loadRegFromStackSlot (NFC) (#120622)
This patch is in preparation to enable setting the MachineInstr::MIFlag
flags, i.e. FrameSetup/FrameDestroy, on callee saved register
spill/reload instructions in prologue/epilogue. This eventually helps in
setting the prologue_end and epilogue_begin markers more accurately.

The DWARF Spec in "6.4 Call Frame Information" says:

The code that allocates space on the call frame stack and performs the
save operation is called the subroutine’s prologue, and the code that
performs the restore operation and deallocates the frame is called its
epilogue.

which means the callee saved register spills and reloads are part of
prologue (a.k.a frame setup) and epilogue (a.k.a frame destruction),
respectively. And, IIUC, LLVM backend uses FrameSetup/FrameDestroy flags
to identify instructions that are part of call frame setup and
destruction.

In the trunk, while most targets consistently set
FrameSetup/FrameDestroy on save/restore call frame information (CFI)
instructions of callee saved registers, they do not consistently set
those flags on the actual callee saved register spill/reload
instructions.

I believe this patch provides a clean mechanism to set
FrameSetup/FrameDestroy flags on the actual callee saved register
spill/reload instructions as needed. And, by having default argument of
MachineInstr::NoFlags for Flags, this patch is NFC.

With this patch, the targets have to just pass FrameSetup/FrameDestroy
flag to the storeRegToStackSlot/loadRegFromStackSlot calls from the
target derived spillCalleeSavedRegisters and restoreCalleeSavedRegisters
to set those flags on callee saved register spill/reload instructions.
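Here is a sketch of why adding the parameter is NFC. It uses a hypothetical free function and flag enum in place of the real `TargetInstrInfo` hooks. Existing callers that omit the argument get `NoFlags`, exactly as before. Prologue/epilogue code can opt in by passing a flag.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical flag values named after MachineInstr::MIFlag members.
enum MIFlagSketch : uint32_t {
  NoFlags = 0,
  FrameSetup = 1u << 0,
  FrameDestroy = 1u << 1,
};

// Hypothetical spill instruction record.
struct SpillSketch {
  unsigned Reg;
  int FrameIndex;
  uint32_t Flags;
};

// New trailing parameter with a default: old call sites compile unchanged.
SpillSketch storeRegToStackSlotSketch(unsigned Reg, int FrameIndex,
                                      uint32_t Flags = NoFlags) {
  return SpillSketch{Reg, FrameIndex, Flags};
}
```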

Also, this patch makes it very easy to set the source line information
on callee saved register spill/reload instructions which is needed by
the DwarfDebug.cpp implementation to set prologue_end and epilogue_begin
markers more accurately.

As per DwarfDebug.cpp implementation:

prologue_end is the first known non-DBG_VALUE and non-FrameSetup
location that marks the beginning of the function body

epilogue_begin is the first FrameDestroy location that has been seen in
the epilogue basic block
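Here is a minimal sketch of the prologue_end rule quoted above. It works over a hypothetical flattened instruction record in which `Line == 0` means no known location. It returns the index of the first instruction that is neither a DBG_VALUE nor FrameSetup and has a location.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical instruction summary; Line == 0 means "no known location".
struct InstSketch {
  bool IsDbgValue;
  bool IsFrameSetup;
  unsigned Line;
};

// Index of the instruction that would carry prologue_end, or -1 if none.
int findPrologueEnd(const std::vector<InstSketch> &Insts) {
  for (std::size_t I = 0; I < Insts.size(); ++I) {
    const InstSketch &MI = Insts[I];
    if (MI.IsDbgValue || MI.IsFrameSetup || MI.Line == 0)
      continue;
    return static_cast<int>(I);
  }
  return -1;
}
```

This shows why flagging callee-saved spills as FrameSetup matters. An unflagged spill with a line number would be picked as the start of the function body.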

With this patch, the targets have to just do the following to set the
source line information on callee saved register spill/reload
instructions, without hampering the LLVM's efforts to avoid adding
source line information on the artificial code generated by the
compiler.

    <Foo>InstrInfo::storeRegToStackSlot() {
    ...
      DebugLoc DL =
          Flags & MachineInstr::FrameSetup ? DebugLoc() : MBB.findDebugLoc(I);
    ...
    }

    <Foo>InstrInfo::loadRegFromStackSlot() {
    ...
      DebugLoc DL =
          Flags & MachineInstr::FrameDestroy ? MBB.findDebugLoc(I) : DebugLoc();
    ...
    }

While I understand this patch would break out-of-tree backend builds, I
think it is in the right direction.

One immediate use case that benefits from this patch: fixing #120553
becomes simpler.
2025-01-22 13:36:39 +05:30
Kazu Hirata
ceaaa2b9ae [AMDGPU] Fix warnings
This patch fixes:

  llvm/lib/Target/AMDGPU/SIInstrInfo.cpp:2792:14: error: comparison of
  integers of different signs: 'unsigned int' and 'int'
  [-Werror,-Wsign-compare]

  llvm/lib/Target/AMDGPU/SIInstrInfo.cpp:2797:14: error: comparison of
  integers of different signs: 'unsigned int' and 'int'
  [-Werror,-Wsign-compare]
2025-01-21 20:24:30 -08:00
Shoreshen
7c58d6363a
[AMDGPU] Add commute for some VOP3 inst (#121326)
Add commuting for some VOP3 instructions, allow commuting when both
operands are inline constants, and adjust tests.

Fixes #111205
2025-01-22 11:08:26 +07:00
Austin Kerbow
657fb4433e
[AMDGPU] Add target hook to isGlobalMemoryObject (#112781)
We want special handling for IGLP instructions in the scheduler but they
should still be treated like they have side effects by other passes. Add
a target hook to the ScheduleDAGInstrs DAG builder so that we have more
control over this.
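Here is a sketch of the hook pattern described above. The class names are hypothetical, and a single side-effects flag stands in for the real checks. The generic default keeps treating side-effecting instructions as scheduling barriers. The target override exempts IGLP instructions only for callers of this hook.

```cpp
#include <cassert>

// Hypothetical instruction summary.
struct InstrSketch {
  bool HasSideEffects;
  bool IsIGLP;
};

// Generic hook with a conservative default.
struct TargetInstrInfoSketch {
  virtual ~TargetInstrInfoSketch() = default;
  virtual bool isGlobalMemoryObject(const InstrSketch &MI) const {
    return MI.HasSideEffects;
  }
};

// Target override: IGLP instructions are handled specially by the scheduler.
struct SIInstrInfoSketch : TargetInstrInfoSketch {
  bool isGlobalMemoryObject(const InstrSketch &MI) const override {
    if (MI.IsIGLP)
      return false;
    return TargetInstrInfoSketch::isGlobalMemoryObject(MI);
  }
};
```

Other passes that inspect the instruction's side effects directly are unaffected; only code that goes through the hook sees the override.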
2025-01-11 09:57:57 -08:00
Matt Arsenault
f6365a47a1
AMDGPU: Fix assert on physreg MUBUF rsrc operand (#120815)
The stack case uses a physical register and should not ordinarily
reach here, but strange things happen at -O0. The testcase still
errors because we do not yet attempt to handle arbitrary dynamically
sized allocas.

Fixes: SWDEV-503538
2025-01-07 08:11:05 +07:00
Brox Chen
ce831a231a
[AMDGPU][True16][MC] true16 for v_fma_f16 (#119477)
Support true16 format for v_fma_f16 in MC.

Since v_fma_f16 is being replaced by v_fma_f16_t16/v_fma_f16_fake16
post-GFX11, the CodeGen pattern has to be updated to use v_fma_f16_fake16
so that the CodeGen tests pass. No pattern is modified or created;
v_fma_f16 is simply replaced with its fake16 form.
2025-01-06 15:02:04 -05:00
Brox Chen
e10b12e656
[AMDGPU][True16][MC] true16 for v_div_fixup_f16 (#119613)
Support true16 format for v_div_fixup_f16 in MC.
2024-12-18 18:01:13 -05:00
Ruiling, Song
67c55b1ffc
[AMDGPU] Make max dwords of memory cluster configurable (#119342)
We find it helpful to increase the value for graphics workloads. Make it
configurable so we can experiment with a different value.
2024-12-18 14:17:27 +08:00
Matt Arsenault
5e53a8dadb
AMDGPU: Fix verifier assert with out of bounds subregister indexes (#119799)
The manual check for aligned VGPR classes would assert if a virtual
register used an index not supported by the register class.
2024-12-13 11:52:11 +09:00
Matt Arsenault
1944d192bd
AMDGPU: Use isWave[32|64] instead of comparing size value (#117411) 2024-11-23 09:30:57 -08:00
Matt Arsenault
d1cca3133a
AMDGPU: Add v_permlane16_swap_b32 and v_permlane32_swap_b32 for gfx950 (#117260)
This was a bit annoying because these introduce a new special case
encoding usage. op_sel is repurposed as a subset of dpp controls,
and is eligible for VOP3->VOP1 shrinking. For some reason fi also
uses an enum value, so we need to convert the raw boolean to 1 instead
of -1.

The 2 registers are swapped, so this has 2 defs. Ideally the builtin
would return a pair, but that's difficult so return a vector instead.
This would make a hypothetical builtin that supports v2f16 directly
uglier.
2024-11-22 20:12:50 -08:00
Brox Chen
4cc278587f
[AMDGPU][True16][MC] VOPC profile fake16 pseudo update (#113175)
Update VOPC profile with VOP3 pseudo:

1. On GFX11+, v_cmp_class_f16 has src1 type f16 for literals, however
it's semantically interpreted as an integer. Update VOPC class f16
profile from operand type f16, i16 to f16, f16, currently updating it
for fake16 format, and will update t16 format in the following patch.
2. 16bit V_CMP_CLASS instructions (V_CMP_**_U/I/F16) are named with
`t16` but actually use 32-bit registers. Correct this by updating the
pseudo definitions with useRealTrue16/useFakeTrue16 predicates and
rename these `t16` instructions to `fake16`.
3. Update the inst select so that `t16`/`fake16` instructions are
selected in true16/fake16 flow.
4. The MIR test files are affected by the renaming of these 16-bit
V_CMP instructions, but there is no functional change to the emitted code.
2024-11-22 12:12:13 -05:00