87 Commits

Author SHA1 Message Date
Sam Elliott
2042887709
Reland "[NFC][MI] Tidy Up RegState enum use (1/2)" (#176277)
This Change is to prepare to make RegState into an enum class. It:
- Updates documentation to match the order in the code.
- Brings the `get<>RegState` functions together and makes them
`constexpr`.
- Adopts the `get<>RegState` where RegStates were being chosen with
ternary operators in backend code.
- Introduces `hasRegState` to make querying RegState easier once it is
an enum class.
- Adopts `hasRegState` where equivalent was done with bitwise
arithmetic.
- Introduces `RegState::NoFlags`, which will be used for the lack of
flags.
- Documents that `0x1` is a reserved flag value used to detect if
someone is passing `true` instead of flags (due to implicit bool to
unsigned conversions).
- Updates two calls to `MachineInstrBuilder::addReg` which were passing
`false` to the flags operand, to no longer pass a value.
- Documents that `getRegState` seems to have forgotten a call to
`getEarlyClobberRegState`.

This PR relands llvm/llvm-project#176091 (commit
1d616cdca3aba9d22f120888bb6b09b75ca90b92) which was reverted in
llvm/llvm-project#176190 (commit
6309cd8668fc2ae589f156b23f86821f4ce5b7ea).
2026-01-16 13:05:06 -08:00
Sam Elliott
6309cd8668
Revert "[NFC][MI] Tidy Up RegState enum use (1/2)" (#176190)
Reverts llvm/llvm-project#176091

Reverting because some compilers were erroring on the call to
`Reg.isReg()` (which is not `constexpr`) in a `constexpr` function.
2026-01-15 07:58:05 -08:00
Sam Elliott
1d616cdca3
[NFC][MI] Tidy Up RegState enum use (1/2) (#176091)
This Change is to prepare to make RegState into an enum class. It:
- Updates documentation to match the order in the code.
- Brings the `get<>RegState` functions together and makes them
`constexpr`.
- Adopts the `get<>RegState` where RegStates were being chosen with
ternary operators in backend code.
- Introduces `hasRegState` to make querying RegState easier once it is
an enum class.
- Adopts `hasRegState` where equivalent was done with bitwise
arithmetic.
- Introduces `RegState::NoFlags`, which will be used for the lack of
flags.
- Documents that `0x1` is a reserved flag value used to detect if
someone is passing `true` instead of flags (due to implicit bool to
unsigned conversions).
- Updates two calls to `MachineInstrBuilder::addReg` which were passing
`false` to the flags operand, to no longer pass a value.
- Documents that `getRegState` seems to have forgotten a call to
`getEarlyClobberRegState`.
2026-01-15 07:47:05 -08:00
Jay Foad
72c69aefba
[AMDGPU] Make use of getFunction and getMF. NFC. (#167872) 2025-11-14 11:00:57 +00:00
Matt Arsenault
55422e804b
CodeGen: Remove TRI argument from getRegClass (#158225)
TargetInstrInfo now directly holds a reference to TargetRegisterInfo
and does not need TRI passed in anywhere.
2025-11-10 15:43:55 -08:00
Matt Arsenault
de4aa9cdea
AMDGPU: Minor SDWA pass cleanups (#166629)
Don't use low level regclass query in SDWA pass.
2025-11-07 20:50:01 -08:00
Matt Arsenault
1a5494ca4a
AMDGPU: Use RegClassByHwMode to manage operand VGPR operand constraints (#158272)
This removes special case processing in TargetInstrInfo::getRegClass to
fixup register operands which depending on the subtarget support AGPRs,
or require even aligned registers.

This regresses assembler diagnostics, which currently work by hackily
accepting invalid cases and then post-rejecting a validly parsed
instruction.
On the plus side this now emits a comment when disassembling unaligned
registers for targets with the alignment requirement.
2025-10-08 11:19:54 +09:00
Jay Foad
3cb2174219
[AMDGPU] Skip debug uses in SIPeepholeSDWA (#160092) 2025-09-22 14:40:00 +01:00
Frederik Harwath
d0d79fd1ac
[AMDGPU] si-peephole-sdwa: reuse getOne{NonDBGUse,Def} (NFC) (#156455)
This patch changes the findSingleRegDef function from si-peephole-sdwa
to reuse MachineRegisterInfo::getOneDef and findSingleRefUse to use a
new MachineRegisterInfo::getOneNonDBGUse function.
2025-09-03 10:35:32 +02:00
Jun Wang
063cee7bde
[AMDGPU][MC] Allow opsel for v_max_i16 etc in GFX10 (#143982)
In GFX10, a number of VOP3 instructions should allow opsel, including
V_MAX_I16, V_MAX_U16, V_MIN_I16, V_MIN_U16, V_MUL_LO_U16, V_LSHLREV_B16,
V_LSHRREV_B16, and V_ASHRREV_I16.
2025-06-26 14:08:13 -07:00
Frederik Harwath
8a198f89bf
[AMDGPU] si-peephole-sdwa: Remove dead code from createSDWAversion (#141462)
In an earlier state of this code, it was possible for an existing SDWA
MI to reach the code in the "createSDWAversion" function. This is no
longer possible; see assert at the top of the function.

Remove code that tries to handle operands on pre-existing SDWA
instructions from the function.
2025-05-26 15:28:03 +02:00
Frederik Harwath
d45031ce52
[AMDGPU] si-peephole-sdwa: Disable V_CNDMASK_B32 conversion with sext (#140760)
The sext modifier on an operand of V_CNDMASK_B32_sdwa gets erroneously
turned into a neg modifier in the assembly output.

As a workaround, to avoid miscompilation, this patch disables the
conversion of V_CNDMASK_B32 to the SDWA form if any operand uses an sext
modifier.

Fixes #138766.

---------

Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
2025-05-26 09:33:09 +02:00
Frederik Harwath
1377535d99
[AMDGPU] si-peephole-sdwa: Fix cndmask vcc use for wave32 (#139541)
Before V_CNDMASK_B32_e64 gets converted to SDWA form, a conversion to
V_CNDMASK_B32_e32 occurs.
The vcc use of this instruction must be fixed into a vcc_lo use for wave32.
This fix only happens after the final conversion to the SDWA form. This led
to a compiler error in situations where the conversion to SDWA aborts.

Make sure that the vcc-fix gets applied even if the SDWA conversion is
not completed.

---------

Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
2025-05-14 07:37:01 +02:00
Kazu Hirata
e1cff21f65 [AMDGPU] Fix a warning
This patch fixes:

  llvm/lib/Target/AMDGPU/SIPeepholeSDWA.cpp:1102:8: error: unused
  variable 'Converted' [-Werror,-Wunused-variable]
2025-05-05 10:07:22 -07:00
Frederik Harwath
721cba476d
[AMDGPU] SIPeepholeSDWA: Handle V_CNDMASK_B32_e64 (#137930)
The VOP3 form of the V_CNDMASK_B32 instruction takes a carry-in
operand. The conversion to SDWA implies a conversion to VOP2 form
which reads from VCC instead.

Convert V_CNDMASK_B32_e64 instructions that might be converted to SDWA
to V_CNDMASK_B32_e32 first and introduce a copy of the carry-in operand
to VCC.

Closes #133431.

---------

Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
2025-05-05 18:14:14 +02:00
Frederik Harwath
ba9bd22e1b
[AMDGPU] Account for existing SDWA selections (#123221)
The si-peephole-sdwa pass adjusts the selections on sdwa instructions to
the selections on their operands during its conversions. For instance,
if an instruction selects `BYTE_0` and its operand selects `WORD_1`, the
combined selection should be `BYTE_2`, i.e. "`BYTE_0` of `WORD_1`". The
existing implementation does not always handle this correctly in some
complex situations with instructions across different basic blocks as
demonstrated by the test cases included in this PR.

This PR adds an additional selection combination step to the conversion
to fix this issue. It reverts the changes made by PR #123942 which had
disabled the conversion of preexisting SDWA instructions completely as a
quick fix.

---------

Co-authored-by: Jeffrey Byrnes <Jeffrey.Byrnes@amd.com>
Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
2025-03-03 17:07:28 +01:00
Frederik Harwath
bfd9bc2745
[AMDGPU] SIPeepholeSDWA: Disable on existing SDWA instructions (#124131)
This PR reapplies the changes from PR #123942 which had to be reverted
because of a test failure. The test has been adjusted.
2025-01-24 09:12:32 +01:00
Nico Weber
99d450e9f5 Revert "[AMDGPU] SIPeepholeSDWA: Disable on existing SDWA instructions (#123942)"
This reverts commit 6fdaaafd89d7cbc15dafe3ebf1aa3235d148aaab.
Breaks check-llvm, see
https://github.com/llvm/llvm-project/pull/123942#issuecomment-2609861953
2025-01-23 09:19:42 -05:00
Frederik Harwath
6fdaaafd89
[AMDGPU] SIPeepholeSDWA: Disable on existing SDWA instructions (#123942)
This is meant as a short-term workaround for an invalid conversion in
this pass that occurs because existing SDWA selections are not correctly
taken into account during the conversion.

See the draft PR #123221 for an attempt to fix the actual issue.

---------

Co-authored-by: Frederik Harwath <fharwath@amd.com>
2025-01-23 14:32:01 +01:00
Jay Foad
8d13e7b8c3
[AMDGPU] Qualify auto. NFC. (#110878)
Generated automatically with:
$ clang-tidy -fix -checks=-*,llvm-qualified-auto $(find
lib/Target/AMDGPU/ -type f)
2024-10-03 13:07:54 +01:00
Akshat Oke
e1ee07d0ff
[AMDGPU][NewPM] Port SIPeepholeSDWA pass to NPM (#107049) 2024-09-11 14:30:16 +04:00
Jay Foad
63fae3ed65
[AMDGPU] clang-tidy: no else after return etc. NFC. (#99298) 2024-07-17 21:11:00 +01:00
Jeffrey Byrnes
f903e3ec77 [AMDGPU] Reset kill flags for multiple uses of SDWAInst Ops
Change-Id: I8b56d86a55c397623567945a87ad2f55749680bc
2024-07-01 09:14:02 -07:00
Brian Favela
e7e90dd1c1
[AMDGPU] Adding multiple use analysis to SIPeepholeSDWA (#94800)
Allow for multiple uses of an operand where each instruction can be
promoted to SDWA.

For instance:

; v_and_b32 v2, lit(0x0000ffff), v2
; v_and_b32 v3, 6, v2
; v_and_b32 v2, 1, v2

Can be folded to:
; v_and_b32 v3, 6, sel_lo(v2)
; v_and_b32 v2, 1, sel_lo(v2)
2024-06-14 19:14:19 +02:00
Pierre van Houtryve
52d5b8e02d
[AMDGPU] Don't form sext/abs/neg fp8 cvt (#83843)
gfx940 does not allow abs/sext/neg on v_cvt_fp8/bf8 & pk variants.

Fixes SWDEV-447468
2024-03-06 10:38:20 +01:00
Valery Pykhtin
a845ea3878
[AMDGPU] Fix SDWA 'preserve' transformation for instructions in different basic blocks. (#82406)
This fixes crash when operand sources for V_OR instruction reside in
different basic blocks.
2024-02-28 14:47:33 +01:00
Pierre van Houtryve
d2edff839d
[AMDGPU] PeepholeSDWA: Don't assume inst srcs are registers (#69576)
To fix that ticket we only needed to address the V_LSHLREV_B16 case, but
I did it for all insts just in case.

Fixes #66899
2023-10-19 12:13:45 +02:00
David Green
2802739dfd [NFC] Replace ;; with ; 2023-06-11 10:25:24 +01:00
Jay Foad
a07584d57d [CodeGen] Make more use of MachineOperand::getOperandNo. NFC.
Differential Revision: https://reviews.llvm.org/D143252
2023-02-07 11:50:57 +00:00
Jay Foad
768aed1378 [MC] Make more use of MCInstrDesc::operands. NFC.
Change MCInstrDesc::operands to return an ArrayRef so we can easily use
it everywhere instead of the (IMHO ugly) opInfo_begin and opInfo_end.
A future patch will remove opInfo_begin and opInfo_end.

Also use it instead of raw access to the OpInfo pointer. A future patch
will remove this pointer.

Differential Revision: https://reviews.llvm.org/D142213
2023-01-23 11:31:41 +00:00
Jay Foad
6443c0ee02 [AMDGPU] Stop using make_pair and make_tuple. NFC.
C++17 allows us to call constructors pair and tuple instead of helper
functions make_pair and make_tuple.

Differential Revision: https://reviews.llvm.org/D139828
2022-12-14 13:22:26 +00:00
Fangrui Song
67819a72c6 [CodeGen] llvm::Optional => std::optional 2022-12-13 09:06:36 +00:00
Kazu Hirata
20cde15415 [Target] Use std::nullopt instead of None (NFC)
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated.  The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.

This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-12-02 20:36:06 -08:00
Kazu Hirata
09e0aeaaaa [AMDGPU] Use std::optional in SIPeepholeSDWA.cpp (NFC)
This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-11-25 22:40:00 -08:00
Yashwant Singh
2652db4d68 Handling ADD|SUB U64 decomposed Pseudos not getting lowered to SDWA form
This patch fixes some of the V_ADD/SUB_U64_PSEUDO not getting converted to their sdwa form.
We still get below patterns in generated code:
v_and_b32_e32 v0, 0xff, v0
v_add_co_u32_e32 v0, vcc, v1, v0
v_addc_co_u32_e64 v1, s[0:1], 0, 0, vcc

and,
v_and_b32_e32 v2, 0xff, v2
v_add_co_u32_e32 v0, vcc, v0, v2
v_addc_co_u32_e32 v1, vcc, 0, v1, vcc

1st and 2nd instructions of both above examples should have been folded into sdwa add with BYTE_0 src operand.

The reason being the pseudo instruction is broken down into VOP3 instruction pair of V_ADD_CO_U32_e64 and V_ADDC_U32_e64.
The sdwa pass attempts lowering them to their VOP2 form before converting them into sdwa instructions. However V_ADDC_U32_e64
cannot be shrunk to it's VOP2 form if it has non-reg src1 operand.
This change attempts to fix that problem by only shrinking V_ADD_CO_U32_e64 instruction.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D136663
2022-11-17 10:01:40 +05:30
Pierre van Houtryve
7425077e31 [AMDGPU] Add & use hasNamedOperand, NFC
In a lot of places, we were just calling `getNamedOperandIdx` to check if the result was != or == to -1.
This is fine in itself, but it's verbose and doesn't make the intention clear, IMHO. I added a `hasNamedOperand` and replaced all cases I could find with regexes and manually.

Reviewed By: arsenm, foad

Differential Revision: https://reviews.llvm.org/D137540
2022-11-08 07:57:21 +00:00
Sebastian Neubauer
6527b2a4d5 [AMDGPU][NFC] Fix typos
Fix some typos in the amdgpu backend.

Differential Revision: https://reviews.llvm.org/D119235
2022-02-18 15:05:21 +01:00
Christudasan Devadasan
399b7de0ea [AMDGPU] Add a regclass flag for scalar registers
Along with vector RC flags, this scalar flag will
make various regclass queries like `isVGPR` more
accurate.

Regclasses other than vectors are currently set
with the new flag even though certain unallocatable
classes aren't truly scalars. It would be ok as long
as they remain unallocatable.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D110053
2021-12-01 23:31:07 -05:00
Christudasan Devadasan
654c89d85a [AMDGPU] Make vector superclasses allocatable
The combined vector register classes with both
VGPRs and AGPRs are currently unallocatable.
This patch turns them into allocatable as a
prerequisite to enable copy between VGPR and
AGPR registers during regalloc.

Also, added the missing AV register classes from
192b to 1024b.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D109300
2021-11-26 00:42:12 -05:00
Neubauer, Sebastian
d1f45ed58f [AMDGPU][NFC] Fix typos
Differential Revision: https://reviews.llvm.org/D113672
2021-11-12 11:37:21 +01:00
dfukalov
560d7e0411 [NFC][AMDGPU] Split AMDGPUSubtarget.h to R600 and GCN subtargets
... to reduce headers dependency.

Reviewed By: rampitec, arsenm

Differential Revision: https://reviews.llvm.org/D95036
2021-01-20 22:22:45 +03:00
Joe Nash
314e29ed2b [AMDGPU] Add _e64 suffix to VOP3 Insts
Previously, instructions which could be
expressed as VOP3 in addition to another
encoding had a _e64 suffix on the tablegen
record name, while those
only available as VOP3 did not. With this
patch, all VOP3s will have the _e64 suffix.
The assembly does not change, only  the mir.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D94341

Change-Id: Ia8ec8890d47f8f94bbbdac43745b4e9dd2b03423
2021-01-12 18:33:18 -05:00
dfukalov
6a87e9b08b [NFC][AMDGPU] Reduce include files dependency.
Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D93813
2021-01-07 22:22:05 +03:00
Jay Foad
3497860203 [AMDGPU] Remove uses of Register::isPhysicalRegister/isVirtualRegister
... in favour of the isPhysical/isVirtual methods.
2020-08-20 17:59:11 +01:00
Stanislav Mekhanoshin
0462aef5f3 [AMDGPU] Inhibit SDWA if target instruction has FI
Differential Revision: https://reviews.llvm.org/D85918
2020-08-13 11:34:28 -07:00
Matt Arsenault
79f67cae91 AMDGPU: Rename add/sub with carry out instructions
The hardware has created a real mess in the naming for add/sub, which
have been renamed basically every generation. Switch the carry out
pseudos to have the gfx9/gfx10 names. We were using the original SI/CI
v_add_i32/v_sub_i32 names. Later targets reintroduced these names as
carryless instructions with a saturating clamp bit, which we do not
define. Do this rename so we can unambiguously add these missing
instructions.

The carry-in versions should also be renamed, but at least those had a
consistent _u32 name to begin with. The 16-bit instructions were also
renamed, but aren't ambiguous.

This does regress assembler error message quality in some cases. In
mismatched wave32/wave64 situations, this will switch from
"unsupported instruction" to "invalid operand", with the error
pointing at the wrong position. I couldn't quite follow how the
assembler selects these, but the previous behavior seemed accidental
to me. It looked like there was a partial attempt to handle this which
was never completed (i.e. there is an AMDGPUOperand::isBoolReg but it
isn't used for anything).
2020-07-16 13:16:30 -04:00
Matt Arsenault
07cd19efa2 AMDGPU: Fix dropping MI flags when rewriting instructions
All 3 passes that change instruction encodings were dropping MI
flags. This avoids scheduling regressions caused by setting
mayRaiseFPExceptions on FP instructions for non-strictfp functions.
2020-05-27 13:27:06 -04:00
Hans Wennborg
a19de32095 Fix unused function warning (PR44808) 2020-02-12 15:12:48 +01:00
Tim Renouf
3d5ba7c60f AMDGPU: Fixed indeterminate map iteration in SIPeepholeSDWA
Differential Revision: https://reviews.llvm.org/D70783

Change-Id: Ic26f915a4acb4c00ecefa9d09d7c24cec370ed06
2019-12-02 12:08:49 +00:00
Daniel Sanders
0c47611131 Apply llvm-prefer-register-over-unsigned from clang-tidy to LLVM
Summary:
This clang-tidy check is looking for unsigned integer variables whose initializer
starts with an implicit cast from llvm::Register and changes the type of the
variable to llvm::Register (dropping the llvm:: where possible).

Partial reverts in:
X86FrameLowering.cpp - Some functions return unsigned and arguably should be MCRegister
X86FixupLEAs.cpp - Some functions return unsigned and arguably should be MCRegister
X86FrameLowering.cpp - Some functions return unsigned and arguably should be MCRegister
HexagonBitSimplify.cpp - Function takes BitTracker::RegisterRef which appears to be unsigned&
MachineVerifier.cpp - Ambiguous operator==() given MCRegister and const Register
PPCFastISel.cpp - No Register::operator-=()
PeepholeOptimizer.cpp - TargetInstrInfo::optimizeLoadInstr() takes an unsigned&
MachineTraceMetrics.cpp - MachineTraceMetrics lacks a suitable constructor

Manual fixups in:
ARMFastISel.cpp - ARMEmitLoad() now takes a Register& instead of unsigned&
HexagonSplitDouble.cpp - Ternary operator was ambiguous between unsigned/Register
HexagonConstExtenders.cpp - Has a local class named Register, used llvm::Register instead of Register.
PPCFastISel.cpp - PPCEmitLoad() now takes a Register& instead of unsigned&

Depends on D65919

Reviewers: arsenm, bogner, craig.topper, RKSimon

Reviewed By: arsenm

Subscribers: RKSimon, craig.topper, lenary, aemerson, wuzish, jholewinski, MatzeB, qcolombet, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, wdng, nhaehnle, sbc100, jgravelle-google, kristof.beyls, hiraditya, aheejin, kbarton, fedor.sergeev, javed.absar, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, tpr, PkmX, jocewei, jsji, Petar.Avramovic, asbirlea, Jim, s.egerton, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65962

llvm-svn: 369041
2019-08-15 19:22:08 +00:00