918 Commits

Author SHA1 Message Date
Matt Arsenault
4fb31e4401 AMDGPU: Use const reference for DebugLoc 2025-03-04 13:56:52 +07:00
sstipano
531c48546d
[AMDGPU][NFC] Move isXDL and isDGEMM to SIInstrInfo. (#129103) 2025-02-28 03:14:51 +01:00
Frederik Harwath
50b508cc7b
[AMDGPU] Verify SdwaSel value range (#128898)
Make the MachineVerifier check that the value provided for an SDWA selection is a
valid value for the SdwaSel enum.
2025-02-27 08:11:29 +01:00
Brox Chen
364b97f23b
[AMDGPU][True16][CodeGen] 16bit spill support in true16 mode (#128060)
Enables 16-bit values to be spilled to scratch.

Note, the memory instructions used are defined as reading and writing
VGPR_32, but do not clobber the unspecified 16-bits of those registers,
and so spills and reloads of lo and hi halves of the registers work.
2025-02-26 16:17:20 -05:00
Brox Chen
bb62af7d14
[AMDGPU][True16][CodeGen] true16 codegen for valu op (#124797)
true16 selection for valu ops, enable `real-true16` attribute and update
the codegen test
2025-02-26 10:50:49 -05:00
Pierre van Houtryve
0f0d3fb6b5
[AMDGPU] Do not allow M0 as v_readlane_b32 dst (#128867)
See #128851 - this is the same patch, but for v_readlane_b32.

This instruction is used much less often so there were less changes
required.
2025-02-26 14:13:39 +01:00
Pierre van Houtryve
5231736329
[AMDGPU] Do not allow M0 as v_readfirstlane_b32 dst (#128851)
M0 can only be written to by the SALU, so `v_readfirstlane_b32 m0` is
effectively useless. Represent this by restricting the dest RC of that
instruction to `SReg_32_XM0` which excludes M0.

There is a lot of test changes due to the register class changing, but
most changes are trivial. In some cases, an extra register and
`s_mov_b32` is needed.

Fixes SWDEV-513269
2025-02-26 13:14:03 +01:00
Craig Topper
571b787b83
[CodeGen] Change copyPhysReg interface to use Register instead of MCRegister. (#128473)
NVPTX, SPIRV, and WebAssembly pass virtual registers to this function
since they don't perform register allocation. We need to use Register to
avoid a virtual register being converted to MCRegister by the caller.
2025-02-24 09:55:34 -08:00
Benjamin Kramer
ddf24086f1 [AMDGPU] Remove unused variables. NFC 2025-02-19 18:05:22 +01:00
Brox Chen
210036a22e
[AMDGPU][True16][CodeGen] true16 codegen pattern for fma (#127240)
Previous PR https://github.com/llvm/llvm-project/pull/122950 get
reverted since it hit the buildbot failure. Another patch get merged
when this PR is under review, and thus causing one test not up to date.

repen this PR and fixed the issue.
2025-02-19 11:37:24 -05:00
Matt Arsenault
22d65d8989
AMDGPU: Teach isOperandLegal about SALU literal restrictions (#127626)
isOperandLegal mostly implemented the VALU operand rules, and
largely ignored SALU restrictions. This theoretically avoids
folding literals into SALU insts which already have a literal
operand. This issue is currently avoided due to a bug in
SIFoldOperands; this change will allow using raw operand
legality rules.

This breaks the formation of s_fmaak_f32 in SIFoldOperands,
but it probably should not have been forming there in the first
place. TwoAddressInsts or RA should generally handle that,
and this only worked by accident.
2025-02-19 10:53:03 +07:00
Matt Arsenault
eb7c947272
AMDGPU: Correct legal literal operand logic for multiple uses (#127594)
The same literal can be used multiple times in an instruction,
not just once. We were not tracking the used value to verify this,
so correct this.

This helps avoid regressions in a future patch.
2025-02-18 19:58:42 +07:00
Matt Arsenault
7c03865a1e
AMDGPU: Extract lambda used in foldImmediate into a helper function (#127484)
It was also too permissive for a more general utilty, only return
the original immediate if there is no subregister.
2025-02-18 17:16:50 +07:00
Matt Arsenault
c5def84ca4
AMDGPU: Handle brev and not cases in getConstValDefinedInReg (#127483)
We should not encounter these cases in the peephole-opt use today,
but get the common helper function to handle these.
2025-02-18 11:23:49 +07:00
Matt Arsenault
83d7f4b8c3
AMDGPU: Implement getConstValDefinedInReg and use in foldImmediate (NFC) (#127482)
This is NFC because it currently only matters for cases that are not
isMoveImmediate, and we do not yet implement any of those. This just
moves the implementation of foldImmediate to use the common  interface,
similar to how x86 does it.
2025-02-18 11:21:02 +07:00
Matt Arsenault
4dee305ce2
AMDGPU: Fix foldImmediate breaking register class constraints (#127481)
This fixes a verifier error when folding an immediate materialized
into an aligned vgpr class into a copy to an unaligned virtual register.
2025-02-18 10:34:48 +07:00
Kazu Hirata
02d4aac55c
[AMDGPU] Remove materializeImmediate (#127420)
The lase use was removed in:

  commit cbf34a5f7701148d68951320a72f483849b22eaf
  Author: Juan Manuel Martinez Caamaño <jmartinezcaamao@gmail.com>
  Date:   Fri Aug 23 14:06:17 2024 +0200
2025-02-16 22:47:14 -08:00
Brox Chen
cf1165cb9c
Revert "[AMDGPU][True16][CodeGen] true16 codegen pattern for fma (#12… (#127175)
Reverting this patch since it raise buildbot failure

This reverts commit 2a7487cc2e0fb8bd91784e2d9636a65baa6d90ed.
2025-02-14 02:28:45 -05:00
Brox Chen
2a7487cc2e
[AMDGPU][True16][CodeGen] true16 codegen pattern for fma (#122950)
true16 codegen pattern for f16 fma.

created a duplicated shrink-mad-fma-gfx10.mir from shrink-mad-fma to
seperate pre-GFX11 and GFX11 mir test.
2025-02-14 02:16:00 -05:00
Rahul Joshi
bee9664970
[TableGen] Emit OpName as an enum class instead of a namespace (#125313)
- Change InstrInfoEmitter to emit OpName as an enum class
  instead of an anonymous enum in the OpName namespace.
- This will help clearly distinguish between values that are 
  OpNames vs just operand indices and should help avoid
  bugs due to confusion between the two.
- Rename OpName::OPERAND_LAST to NUM_OPERAND_NAMES.
- Emit declaration of getOperandIdx() along with the OpName
  enum so it doesn't have to be repeated in various headers.
- Also updated AMDGPU, RISCV, and WebAssembly backends
  to conform to the new definition of OpName (mostly
  mechanical changes).
2025-02-12 08:19:30 -08:00
Jon Chesterfield
4f358d75d0 [amdgpu][nfc] Post-commit feedback on c39fba209 2025-01-30 20:07:44 +00:00
Jon Chesterfield
c39fba209c
[AMDGPU] S_SET_GPR_IDX_ON can be passed an immediate index (#125086)
Oversight found by ISel fuzz effort. Assuming the argument is a
register, in some cases it can be an immediate. Tablegen's type for the
instruction is SSrc_b32, i.e. register or immediate fine. Added the
repro from the bug reporter as a test case - prior to this patch llvm
will assert in getReg.

Fixes SWDEV-508589
2025-01-30 16:40:12 +00:00
Brox Chen
5d1c596ab4
[AMDGPU][True16][MC] true16 for minimummaximum/max/min/max3/min3 (#124184)
true16 support for gfx12 instructions including:

v_minimummaximum_f16
v_maximumminimum_f16
v_maximum_f16
v_minimum_f16
v_maximum3_f16
v_minimum3_f16
2025-01-27 16:52:59 -05:00
Venkata Ramanaiah Nalamothu
f7d8336a2f
[llvm] Pass MachineInstr flags to storeRegToStackSlot/loadRegFromStackSlot (NFC) (#120622)
This patch is in preparation to enable setting the MachineInstr::MIFlag
flags, i.e. FrameSetup/FrameDestroy, on callee saved register
spill/reload instructions in prologue/epilogue. This eventually helps in
setting the prologue_end and epilogue_begin markers more accurately.

The DWARF Spec in "6.4 Call Frame Information" says:

The code that allocates space on the call frame stack and performs the
save
operation is called the subroutine’s prologue, and the code that
performs
the restore operation and deallocates the frame is called its epilogue.

which means the callee saved register spills and reloads are part of
prologue (a.k.a frame setup) and epilogue (a.k.a frame destruction),
respectively. And, IIUC, LLVM backend uses FrameSetup/FrameDestroy flags
to identify instructions that are part of call frame setup and
destruction.

In the trunk, while most targets consistently set
FrameSetup/FrameDestroy on save/restore call frame information (CFI)
instructions of callee saved registers, they do not consistently set
those flags on the actual callee saved register spill/reload
instructions.

I believe this patch provides a clean mechanism to set
FrameSetup/FrameDestroy flags on the actual callee saved register
spill/reload instructions as needed. And, by having default argument of
MachineInstr::NoFlags for Flags, this patch is a NFC.

With this patch, the targets have to just pass FrameSetup/FrameDestroy
flag to the storeRegToStackSlot/loadRegFromStackSlot calls from the
target derived spillCalleeSavedRegisters and restoreCalleeSavedRegisters
to set those flags on callee saved register spill/reload instructions.

Also, this patch makes it very easy to set the source line information
on callee saved register spill/reload instructions which is needed by
the DwarfDebug.cpp implementation to set prologue_end and epilogue_begin
markers more accurately.

As per DwarfDebug.cpp implementation:

prologue_end is the first known non-DBG_VALUE and non-FrameSetup
location
    that marks the beginning of the function body

epilogue_begin is the first FrameDestroy location that has been seen in
the
    epilogue basic block

With this patch, the targets have to just do the following to set the
source line information on callee saved register spill/reload
instructions, without hampering the LLVM's efforts to avoid adding
source line information on the artificial code generated by the
compiler.

    <Foo>InstrInfo::storeRegToStackSlot() {
    ...
      DebugLoc DL =
Flags & MachineInstr::FrameSetup ? DebugLoc() : MBB.findDebugLoc(I);
    ...
    }

    <Foo>InstrInfo::loadRegFromStackSlot() {
    ...
      DebugLoc DL =
Flags & MachineInstr::FrameDestroy ? MBB.findDebugLoc(I) : DebugLoc();
    ...
    }

While I understand this patch would break out-of-tree backend builds, I
think it is in the right direction.

One immediate use case that can benefit from this patch is fixing
#120553 becomes simpler.
2025-01-22 13:36:39 +05:30
Kazu Hirata
ceaaa2b9ae [AMDGPU] Fix warnings
This patch fixes:

  llvm/lib/Target/AMDGPU/SIInstrInfo.cpp:2792:14: error: comparison of
  integers of different signs: 'unsigned int' and 'int'
  [-Werror,-Wsign-compare]

  llvm/lib/Target/AMDGPU/SIInstrInfo.cpp:2797:14: error: comparison of
  integers of different signs: 'unsigned int' and 'int'
  [-Werror,-Wsign-compare]
2025-01-21 20:24:30 -08:00
Shoreshen
7c58d6363a
[AMDGPU] Add commute for some VOP3 inst (#121326)
add commute for some VOP3 inst, allow commute for both inline constant
operand, adjust tests

Fixes #111205
2025-01-22 11:08:26 +07:00
Austin Kerbow
657fb4433e
[AMDGPU] Add target hook to isGlobalMemoryObject (#112781)
We want special handing for IGLP instructions in the scheduler but they
should still be treated like they have side effects by other passes. Add
a target hook to the ScheduleDAGInstrs DAG builder so that we have more
control over this.
2025-01-11 09:57:57 -08:00
Matt Arsenault
f6365a47a1
AMDGPU: Fix assert on physreg MUBUF rsrc operand (#120815)
The stack case uses a physical register and should not ordinarily
reach here, but strange things happen at -O0. The testcase still
errors because we do not yet attempt to handle arbitrary dynamic
sized allocas yet.

Fixes: SWDEV-503538
2025-01-07 08:11:05 +07:00
Brox Chen
ce831a231a
[AMDGPU][True16][MC] true16 for v_fma_f16 (#119477)
Support true16 format for v_fma_f16 in MC.

Since we are replacing v_fma_f16 to v_fma_f16_t16/v_fma_f16_fake16 in
Post-GFX11, have to update the CodeGen pattern for v_fma_f16_fake16 to
get CodeGen test passing. There is no pattern modified/created, but just
replacing the v_fma_f16 with fake16 format.
2025-01-06 15:02:04 -05:00
Brox Chen
e10b12e656
[AMDGPU][True16][MC] true16 for v_div_fixup_f16 (#119613)
Support true16 format for v_div_fixup_f16 in MC.
2024-12-18 18:01:13 -05:00
Ruiling, Song
67c55b1ffc
[AMDGPU] Make max dwords of memory cluster configurable (#119342)
We find it helpful to increase the value for graphics workload. Make it
configurable so we can experiment with a different value.
2024-12-18 14:17:27 +08:00
Matt Arsenault
5e53a8dadb
AMDGPU: Fix verifier assert with out of bounds subregister indexes (#119799)
The manual check for aligned VGPR classes would assert if a virtual
register used an index not supported by the register class.
2024-12-13 11:52:11 +09:00
Matt Arsenault
1944d192bd
AMDGPU: Use isWave[32|64] instead of comparing size value (#117411) 2024-11-23 09:30:57 -08:00
Matt Arsenault
d1cca3133a
AMDGPU: Add v_permlane16_swap_b32 and v_permlane32_swap_b32 for gfx950 (#117260)
This was a bit annoying because these introduce a new special case
encoding usage. op_sel is repurposed as a subset of dpp controls,
and is eligible for VOP3->VOP1 shrinking. For some reason fi also
uses an enum value, so we need to convert the raw boolean to 1 instead
of -1.

The 2 registers are swapped, so this has 2 defs. Ideally the builtin
would return a pair, but that's difficult so return a vector instead.
This would make a hypothetical builtin that supports v2f16 directly
uglier.
2024-11-22 20:12:50 -08:00
Brox Chen
4cc278587f
[AMDGPU][True16][MC] VOPC profile fake16 pseudo update (#113175)
Update VOPC profile with VOP3 pseudo:

1. On GFX11+, v_cmp_class_f16 has src1 type f16 for literals, however
it's semantically interpreted as an integer. Update VOPC class f16
profile from operand type f16, i16 to f16, f16, currently updating it
for fake16 format, and will update t16 format in the following patch.
2. 16bit V_CMP_CLASS instructions (V_CMP_**_U/I/F16) are named with
`t16`, but actually using 32 bit registers. Correct it by updating the
pseudo definitions with useRealTrue16/useFakeTrue16 predicates and
rename these `t16` instructions to `fake16`.
3. Update the inst select so that `t16`/`fake16` instructions are
selected in true16/fake16 flow.
4. The mir test file are impacted for a name change of these impacted 16
bit V_CMP instructions, but non-functional change to emitted code
2024-11-22 12:12:13 -05:00
Christudasan Devadasan
2b5b57c5cf
[AMDGPU] Skip non-wwm reg implicit-def from bb prolog (#115834)
Currently all implicit-def instructions are part of
bb prolog. We should only include the wwm-register's
implicit definitions into the BB prolog. The other
vector class registers' implicit defs when exist at
the bb top might cause interference when pushed the
LR_split copy insertion downwards. The SplitKit is
very strict on altering the insertion points and will
assert such instances.
2024-11-12 23:30:57 +05:30
Brox Chen
e8644e3b47
[AMDGPU][True16][MC] VOP2 update instructions with fake16 format (#114436)
Some old "t16" VOP2 instructions are actually in fake16 format. Correct
and update test file
2024-11-05 16:12:49 -05:00
Matt Arsenault
8e61aaa021
AMDGPU: Fix illegal commute with frame index (#114497)
In ca409892c5396fa3fbb8ea4dbf53d0e952f36d09, frame indexes started
being treated more like registers, rather than immediates. Update
the commute logic to avoid failing the verifier by moving illegal
SGPR operands in place of a frame index.
2024-11-01 10:02:29 -07:00
Christudasan Devadasan
3c5cea650d
[AMDGPU]: Add implicit-def to the BB prolog (#112872)
IMPLICIT_DEF inserted for a wwm-register at the
very first block or the predecessor block where
it is used for sgpr spilling can appear at a block
begin that requires spill-insertion during per-lane
VGPR regalloc phase. The presence of the IMPLICIT_DEF
currently breaks the BB prolog.

Fixes: SWDEV-490717
2024-10-21 13:21:16 +05:30
Nikita Popov
255a99c29f
[APInt] Fix APInt constructions where value does not fit bitwidth (NFCI) (#80309)
This fixes all the places that hit the new assertion added in
https://github.com/llvm/llvm-project/pull/106524 in tests. That is,
cases where the value passed to the APInt constructor is not an N-bit
signed/unsigned integer, where N is the bit width and signedness is
determined by the isSigned flag.

The fixes either set the correct value for isSigned, set the
implicitTrunc flag, or perform more calculations inside APInt.

Note that the assertion is currently still disabled by default, so this
patch is mostly NFC.
2024-10-17 08:48:08 +02:00
Brox Chen
35e937b4de
[AMDGPU][True16][CodeGen] fp conversion in true/fake16 format (#101678)
fp conversion V_CVT_F_F/V_CVT_F_U instructions true16 format were
previously implemented using fake16 profile.

With the MC support inplace, correct and support these instructions in
true16/fake16 format in CodeGen
2024-10-16 12:26:01 -04:00
Changpeng Fang
f6e93b8147
AMDGPU: Minor improvement and cleanup for waterfall loop generation (#111886)
First, ReadlanePieces should be in the scope of each MachineOperand. It
is not correct if we declare in a outer scope without clearing after the
use for a MachineOperand.
Additionally, we do not need the OrigBB argyment for
emitLoadScalarOpsFromVGPRLoop, since MachineFunction (the only use) can
be obtained from LoopBB (or BodyBB).
2024-10-10 12:13:36 -07:00
Christudasan Devadasan
6636f32615
[AMDGPU] Include WWM register spill into BB Prolog (#111496)
With #93526 we split the regalloc pipeline further
to have a standalone allocation for wwm registers
and per-lane VGPRs. Currently the presence of the
wwm-spill reloads inserted at the bb-top limits the
isBasicPrologue function during the per-lane vgpr
regalloc to skip past the exec manipulation instruction
and ended up causing incorrect codegen. The wmm-spill
inserted during the wwm-regalloc pipeline should also
be included in the bb-prolog so that the per-lane vgpr
regalloc pipeline can identify the appropriate insertion
points for their spills and copies.
2024-10-08 15:13:12 +05:30
Yaxun (Sam) Liu
3b88805ca2
[AMDGPU] Fix SDWA commuting (#106920)
SDWA insts miss reverse opcode, which causes them to be treated as
commutable with default reverse opcode i.e. their own opcode. As a
result, SWDA F16 sub A, B and Sub B, A are merged by machine CSE. The
correct behavior is to merged sub A, B and subrev B, A instead of sub B,
A. This issues caused failures in rocFFT tests.

Another issue is that src0_sel and src1_sel are not swapped when SDWA
insts are commuted.

Verified that this fixes rocFFT tests failure.
2024-10-04 15:53:40 -04:00
Jay Foad
8d13e7b8c3
[AMDGPU] Qualify auto. NFC. (#110878)
Generated automatically with:
$ clang-tidy -fix -checks=-*,llvm-qualified-auto $(find
lib/Target/AMDGPU/ -type f)
2024-10-03 13:07:54 +01:00
Jay Foad
735a5f67e3
[AMDGPU] When allocating VGPRs, VGPR spills are not part of the prologue (#109439)
PRs #69924 and #72140 modified SIInstrInfo::isBasicBlockPrologue to skip
over EXEC modifications and spills when allocating VGPRs. But treating
VGPR spills as part of the prologue can confuse the register allocator
as in #109294, so restrict it to SGPR spills, which were inserted during
SGPR allocation which is done in an earlier pass.

Fixes: #109294
Fixes: SWDEV-485841
2024-09-30 13:24:55 +01:00
Corbin Robeck
661666d43a
[AMDGPU] Move renamedInGFX9 from TableGen to SIInstrInfo helper function/macro to free up a bit slot (#82787)
Follow on to #81525 and #81901 in the series of consolidating bits in
TSFlags.

Remove renamedInGFX9 from SIInstrFormats.td and move to helper
function/macro in SIInstrInfo. renamedInGFX9 points to V_{add, sub,
subrev, addc, subb, subbrev}_ U32 and V_{div_fixup_F16, fma_F16,
interp_p2_F16, mad_F16, mad_U16, mad_I16}.
2024-09-25 20:38:51 -04:00
Georgi Mirazchiyski
c30fa3cde7
[AMDGPU] Fix has_single_bit assertion for Mask in SIInstrInfo (#109785)
Convert the `int64_t` Mask to `uint64_t` for `llvm::has_single_bit` to
compile.
2024-09-24 15:54:05 +04:00
Georgi Mirazchiyski
6cfe6a6b3e
[NFC][AMDGPU] Assert no bad shift operations will happen (#108416)
The assumption in the asserts is based on the fact that no SGPR/VGPR
register Arg mask in the ISelLowering and Legalizer can equal zero. They
are implicitly set to ~0 by default (meaning non-masked) or explicitly
to a non-zero value.
The `optimizeCompareInstr` case is different from the above described.
It requires the mask to be a power-of-two because it's a special-case
optimization, hence in this case we still cannot have an invalid shift.
This commit also silences static analysis tools wrt potential bad shifts
that could result from the output of `countr_zero(Mask)`.
2024-09-24 14:47:57 +04:00
Jun Wang
f6a8eb98b1
[AMDGPU][MC] Disallow null as saddr in flat instructions (#101730)
Some flat instructions have an saddr operand. When 'null' is provided as
saddr, it may have the same encoding as another instruction. For
example, the instructions 'global_atomic_add v1, v2, null' and
'global_atomic_add v[1:2], v2, off' have the same encoding. This patch
disallows having null as saddr.
2024-09-24 11:08:41 +04:00