llvm-project

Author	SHA1	Message	Date
Sam Elliott	2042887709	Reland "[NFC][MI] Tidy Up RegState enum use (1/2)" (#176277 ) This Change is to prepare to make RegState into an enum class. It: - Updates documentation to match the order in the code. - Brings the `get<>RegState` functions together and makes them `constexpr`. - Adopts the `get<>RegState` where RegStates were being chosen with ternary operators in backend code. - Introduces `hasRegState` to make querying RegState easier once it is an enum class. - Adopts `hasRegState` where equivalent was done with bitwise arithmetic. - Introduces `RegState::NoFlags`, which will be used for the lack of flags. - Documents that `0x1` is a reserved flag value used to detect if someone is passing `true` instead of flags (due to implicit bool to unsigned conversions). - Updates two calls to `MachineInstrBuilder::addReg` which were passing `false` to the flags operand, to no longer pass a value. - Documents that `getRegState` seems to have forgotten a call to `getEarlyClobberRegState`. This PR relands llvm/llvm-project#176091 (commit 1d616cdca3aba9d22f120888bb6b09b75ca90b92) which was reverted in llvm/llvm-project#176190 (commit 6309cd8668fc2ae589f156b23f86821f4ce5b7ea).	2026-01-16 13:05:06 -08:00
Sam Elliott	6309cd8668	Revert "[NFC][MI] Tidy Up RegState enum use (1/2)" (#176190 ) Reverts llvm/llvm-project#176091 Reverting because some compilers were erroring on the call to `Reg.isReg()` (which is not `constexpr`) in a `constexpr` function.	2026-01-15 07:58:05 -08:00
Sam Elliott	1d616cdca3	[NFC][MI] Tidy Up RegState enum use (1/2) (#176091 ) This Change is to prepare to make RegState into an enum class. It: - Updates documentation to match the order in the code. - Brings the `get<>RegState` functions together and makes them `constexpr`. - Adopts the `get<>RegState` where RegStates were being chosen with ternary operators in backend code. - Introduces `hasRegState` to make querying RegState easier once it is an enum class. - Adopts `hasRegState` where equivalent was done with bitwise arithmetic. - Introduces `RegState::NoFlags`, which will be used for the lack of flags. - Documents that `0x1` is a reserved flag value used to detect if someone is passing `true` instead of flags (due to implicit bool to unsigned conversions). - Updates two calls to `MachineInstrBuilder::addReg` which were passing `false` to the flags operand, to no longer pass a value. - Documents that `getRegState` seems to have forgotten a call to `getEarlyClobberRegState`.	2026-01-15 07:47:05 -08:00
Jay Foad	72c69aefba	[AMDGPU] Make use of getFunction and getMF. NFC. (#167872 )	2025-11-14 11:00:57 +00:00
Matt Arsenault	55422e804b	CodeGen: Remove TRI argument from getRegClass (#158225 ) TargetInstrInfo now directly holds a reference to TargetRegisterInfo and does not need TRI passed in anywhere.	2025-11-10 15:43:55 -08:00
Matt Arsenault	de4aa9cdea	AMDGPU: Minor SDWA pass cleanups (#166629 ) Don't use low level regclass query in SDWA pass.	2025-11-07 20:50:01 -08:00
Matt Arsenault	1a5494ca4a	AMDGPU: Use RegClassByHwMode to manage operand VGPR operand constraints (#158272 ) This removes special case processing in TargetInstrInfo::getRegClass to fixup register operands which depending on the subtarget support AGPRs, or require even aligned registers. This regresses assembler diagnostics, which currently work by hackily accepting invalid cases and then post-rejecting a validly parsed instruction. On the plus side this now emits a comment when disassembling unaligned registers for targets with the alignment requirement.	2025-10-08 11:19:54 +09:00
Jay Foad	3cb2174219	[AMDGPU] Skip debug uses in SIPeepholeSDWA (#160092 )	2025-09-22 14:40:00 +01:00
Frederik Harwath	d0d79fd1ac	[AMDGPU] si-peephole-sdwa: reuse getOne{NonDBGUse,Def} (NFC) (#156455 ) This patch changes the findSingleRegDef function from si-peephole-sdwa to reuse MachineRegisterInfo::getOneDef and findSingleRefUse to use a new MachineRegisterInfo::getOneNonDBGUse function.	2025-09-03 10:35:32 +02:00
Jun Wang	063cee7bde	[AMDGPU][MC] Allow opsel for v_max_i16 etc in GFX10 (#143982 ) In GFX10, a number of VOP3 instructions should allow opsel, including V_MAX_I16, V_MAX_U16, V_MIN_I16, V_MIN_U16, V_MUL_LO_U16, V_LSHLREV_B16, V_LSHRREV_B16, and V_ASHRREV_I16.	2025-06-26 14:08:13 -07:00
Frederik Harwath	8a198f89bf	[AMDGPU] si-peephole-sdwa: Remove dead code from createSDWAversion (#141462 ) In an earlier state of this code, it was possible for an existing SDWA MI to reach the code in the "createSDWAversion" function. This is no longer possible; see assert at the top of the function. Remove code that tries to handle operands on pre-existing SDWA instructions from the function.	2025-05-26 15:28:03 +02:00
Frederik Harwath	d45031ce52	[AMDGPU] si-peephole-sdwa: Disable V_CNDMASK_B32 conversion with sext (#140760 ) The sext modifier on an operand of V_CNDMASK_B32_sdwa gets erroneously turned into a neg modifier in the assembly output. As a workaround, to avoid miscompilation, this patch disables the conversion of V_CNDMASK_B32 to the SDWA form if any operand uses an sext modifier. Fixes #138766. --------- Co-authored-by: Matt Arsenault <arsenm2@gmail.com>	2025-05-26 09:33:09 +02:00
Frederik Harwath	1377535d99	[AMDGPU] si-peephole-sdwa: Fix cndmask vcc use for wave32 (#139541 ) Before V_CNDMASK_B32_e64 gets converted to SDWA form, a conversion to V_CNDMASK_B32_e32 occurs. The vcc use of this instruction must be fixed into a vcc_lo use for wave32. This fix only happens after the final conversion to the SDWA form. This led to a compiler error in situations where the conversion to SDWA aborts. Make sure that the vcc-fix gets applied even if the SDWA conversion is not completed. --------- Co-authored-by: Matt Arsenault <arsenm2@gmail.com>	2025-05-14 07:37:01 +02:00
Kazu Hirata	e1cff21f65	[AMDGPU] Fix a warning This patch fixes: llvm/lib/Target/AMDGPU/SIPeepholeSDWA.cpp:1102:8: error: unused variable 'Converted' [-Werror,-Wunused-variable]	2025-05-05 10:07:22 -07:00
Frederik Harwath	721cba476d	[AMDGPU] SIPeepholeSDWA: Handle V_CNDMASK_B32_e64 (#137930 ) The VOP3 form of the V_CNDMASK_B32 instruction takes a carry-in operand. The conversion to SDWA implies a conversion to VOP2 form which reads from VCC instead. Convert V_CNDMASK_B32_e64 instructions that might be converted to SDWA to V_CNDMASK_B32_e32 first and introduce a copy of the carry-in operand to VCC. Closes #133431. --------- Co-authored-by: Matt Arsenault <arsenm2@gmail.com>	2025-05-05 18:14:14 +02:00
Frederik Harwath	ba9bd22e1b	[AMDGPU] Account for existing SDWA selections (#123221 ) The si-peephole-sdwa pass adjusts the selections on sdwa instructions to the selections on their operands during its conversions. For instance, if an instruction selects `BYTE_0` and its operand selects `WORD_1`, the combined selection should be `BYTE_2`, i.e. "`BYTE_0` of `WORD_1`". The existing implementation does not always handle this correctly in some complex situations with instructions across different basic blocks as demonstrated by the test cases included in this PR. This PR adds an additional selection combination step to the conversion to fix this issue. It reverts the changes made by PR #123942 which had disabled the conversion of preexisting SDWA instructions completely as a quick fix. --------- Co-authored-by: Jeffrey Byrnes <Jeffrey.Byrnes@amd.com> Co-authored-by: Matt Arsenault <arsenm2@gmail.com>	2025-03-03 17:07:28 +01:00
Frederik Harwath	bfd9bc2745	[AMDGPU] SIPeepholeSDWA: Disable on existing SDWA instructions (#124131 ) This PR reapplies the changes from PR #123942 which had to be reverted because of a test failure. The test has been adjusted.	2025-01-24 09:12:32 +01:00
Nico Weber	99d450e9f5	Revert "[AMDGPU] SIPeepholeSDWA: Disable on existing SDWA instructions (#123942 )" This reverts commit 6fdaaafd89d7cbc15dafe3ebf1aa3235d148aaab. Breaks check-llvm, see https://github.com/llvm/llvm-project/pull/123942#issuecomment-2609861953	2025-01-23 09:19:42 -05:00
Frederik Harwath	6fdaaafd89	[AMDGPU] SIPeepholeSDWA: Disable on existing SDWA instructions (#123942 ) This is meant as a short-term workaround for an invalid conversion in this pass that occurs because existing SDWA selections are not correctly taken into account during the conversion. See the draft PR #123221 for an attempt to fix the actual issue. --------- Co-authored-by: Frederik Harwath <fharwath@amd.com>	2025-01-23 14:32:01 +01:00
Jay Foad	8d13e7b8c3	[AMDGPU] Qualify auto. NFC. (#110878 ) Generated automatically with: $ clang-tidy -fix -checks=-*,llvm-qualified-auto $(find lib/Target/AMDGPU/ -type f)	2024-10-03 13:07:54 +01:00
Akshat Oke	e1ee07d0ff	[AMDGPU][NewPM] Port SIPeepholeSDWA pass to NPM (#107049 )	2024-09-11 14:30:16 +04:00
Jay Foad	63fae3ed65	[AMDGPU] clang-tidy: no else after return etc. NFC. (#99298 )	2024-07-17 21:11:00 +01:00
Jeffrey Byrnes	f903e3ec77	[AMDGPU] Reset kill flags for multiple uses of SDWAInst Ops Change-Id: I8b56d86a55c397623567945a87ad2f55749680bc	2024-07-01 09:14:02 -07:00
Brian Favela	e7e90dd1c1	[AMDGPU] Adding multiple use analysis to SIPeepholeSDWA (#94800 ) Allow for multiple uses of an operand where each instruction can be promoted to SDWA. For instance: ; v_and_b32 v2, lit(0x0000ffff), v2 ; v_and_b32 v3, 6, v2 ; v_and_b32 v2, 1, v2 Can be folded to: ; v_and_b32 v3, 6, sel_lo(v2) ; v_and_b32 v2, 1, sel_lo(v2)	2024-06-14 19:14:19 +02:00
Pierre van Houtryve	52d5b8e02d	[AMDGPU] Don't form sext/abs/neg fp8 cvt (#83843 ) gfx940 does not allow abs/sext/neg on v_cvt_fp8/bf8 & pk variants. Fixes SWDEV-447468	2024-03-06 10:38:20 +01:00
Valery Pykhtin	a845ea3878	[AMDGPU] Fix SDWA 'preserve' transformation for instructions in different basic blocks. (#82406 ) This fixes crash when operand sources for V_OR instruction reside in different basic blocks.	2024-02-28 14:47:33 +01:00
Pierre van Houtryve	d2edff839d	[AMDGPU] PeepholeSDWA: Don't assume inst srcs are registers (#69576 ) To fix that ticket we only needed to address the V_LSHLREV_B16 case, but I did it for all insts just in case. Fixes #66899	2023-10-19 12:13:45 +02:00
David Green	2802739dfd	[NFC] Replace ;; with ;	2023-06-11 10:25:24 +01:00
Jay Foad	a07584d57d	[CodeGen] Make more use of MachineOperand::getOperandNo. NFC. Differential Revision: https://reviews.llvm.org/D143252	2023-02-07 11:50:57 +00:00
Jay Foad	768aed1378	[MC] Make more use of MCInstrDesc::operands. NFC. Change MCInstrDesc::operands to return an ArrayRef so we can easily use it everywhere instead of the (IMHO ugly) opInfo_begin and opInfo_end. A future patch will remove opInfo_begin and opInfo_end. Also use it instead of raw access to the OpInfo pointer. A future patch will remove this pointer. Differential Revision: https://reviews.llvm.org/D142213	2023-01-23 11:31:41 +00:00
Jay Foad	6443c0ee02	[AMDGPU] Stop using make_pair and make_tuple. NFC. C++17 allows us to call constructors pair and tuple instead of helper functions make_pair and make_tuple. Differential Revision: https://reviews.llvm.org/D139828	2022-12-14 13:22:26 +00:00
Fangrui Song	67819a72c6	[CodeGen] llvm::Optional => std::optional	2022-12-13 09:06:36 +00:00
Kazu Hirata	20cde15415	[Target] Use std::nullopt instead of None (NFC) This patch mechanically replaces None with std::nullopt where the compiler would warn if None were deprecated. The intent is to reduce the amount of manual work required in migrating from Optional to std::optional. This is part of an effort to migrate from llvm::Optional to std::optional: https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716	2022-12-02 20:36:06 -08:00
Kazu Hirata	09e0aeaaaa	[AMDGPU] Use std::optional in SIPeepholeSDWA.cpp (NFC) This is part of an effort to migrate from llvm::Optional to std::optional: https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716	2022-11-25 22:40:00 -08:00
Yashwant Singh	2652db4d68	Handling ADD|SUB U64 decomposed Pseudos not getting lowered to SDWA form This patch fixes some of the V_ADD/SUB_U64_PSEUDO not getting converted to their sdwa form. We still get below patterns in generated code: v_and_b32_e32 v0, 0xff, v0 v_add_co_u32_e32 v0, vcc, v1, v0 v_addc_co_u32_e64 v1, s[0:1], 0, 0, vcc and, v_and_b32_e32 v2, 0xff, v2 v_add_co_u32_e32 v0, vcc, v0, v2 v_addc_co_u32_e32 v1, vcc, 0, v1, vcc 1st and 2nd instructions of both above examples should have been folded into sdwa add with BYTE_0 src operand. The reason being the pseudo instruction is broken down into VOP3 instruction pair of V_ADD_CO_U32_e64 and V_ADDC_U32_e64. The sdwa pass attempts lowering them to their VOP2 form before converting them into sdwa instructions. However V_ADDC_U32_e64 cannot be shrunk to its VOP2 form if it has non-reg src1 operand. This change attempts to fix that problem by only shrinking V_ADD_CO_U32_e64 instruction. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D136663	2022-11-17 10:01:40 +05:30
Pierre van Houtryve	7425077e31	[AMDGPU] Add & use `hasNamedOperand`, NFC In a lot of places, we were just calling `getNamedOperandIdx` to check if the result was != or == to -1. This is fine in itself, but it's verbose and doesn't make the intention clear, IMHO. I added a `hasNamedOperand` and replaced all cases I could find with regexes and manually. Reviewed By: arsenm, foad Differential Revision: https://reviews.llvm.org/D137540	2022-11-08 07:57:21 +00:00
Sebastian Neubauer	6527b2a4d5	[AMDGPU][NFC] Fix typos Fix some typos in the amdgpu backend. Differential Revision: https://reviews.llvm.org/D119235	2022-02-18 15:05:21 +01:00
Christudasan Devadasan	399b7de0ea	[AMDGPU] Add a regclass flag for scalar registers Along with vector RC flags, this scalar flag will make various regclass queries like `isVGPR` more accurate. Regclasses other than vectors are currently set with the new flag even though certain unallocatable classes aren't truly scalars. It would be ok as long as they remain unallocatable. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D110053	2021-12-01 23:31:07 -05:00
Christudasan Devadasan	654c89d85a	[AMDGPU] Make vector superclasses allocatable The combined vector register classes with both VGPRs and AGPRs are currently unallocatable. This patch turns them into allocatable as a prerequisite to enable copy between VGPR and AGPR registers during regalloc. Also, added the missing AV register classes from 192b to 1024b. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D109300	2021-11-26 00:42:12 -05:00
Neubauer, Sebastian	d1f45ed58f	[AMDGPU][NFC] Fix typos Differential Revision: https://reviews.llvm.org/D113672	2021-11-12 11:37:21 +01:00
dfukalov	560d7e0411	[NFC][AMDGPU] Split AMDGPUSubtarget.h to R600 and GCN subtargets ... to reduce headers dependency. Reviewed By: rampitec, arsenm Differential Revision: https://reviews.llvm.org/D95036	2021-01-20 22:22:45 +03:00
Joe Nash	314e29ed2b	[AMDGPU] Add _e64 suffix to VOP3 Insts Previously, instructions which could be expressed as VOP3 in addition to another encoding had a _e64 suffix on the tablegen record name, while those only available as VOP3 did not. With this patch, all VOP3s will have the _e64 suffix. The assembly does not change, only the mir. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D94341 Change-Id: Ia8ec8890d47f8f94bbbdac43745b4e9dd2b03423	2021-01-12 18:33:18 -05:00
dfukalov	6a87e9b08b	[NFC][AMDGPU] Reduce include files dependency. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D93813	2021-01-07 22:22:05 +03:00
Jay Foad	3497860203	[AMDGPU] Remove uses of Register::isPhysicalRegister/isVirtualRegister ... in favour of the isPhysical/isVirtual methods.	2020-08-20 17:59:11 +01:00
Stanislav Mekhanoshin	0462aef5f3	[AMDGPU] Inhibit SDWA if target instruction has FI Differential Revision: https://reviews.llvm.org/D85918	2020-08-13 11:34:28 -07:00
Matt Arsenault	79f67cae91	AMDGPU: Rename add/sub with carry out instructions The hardware has created a real mess in the naming for add/sub, which have been renamed basically every generation. Switch the carry out pseudos to have the gfx9/gfx10 names. We were using the original SI/CI v_add_i32/v_sub_i32 names. Later targets reintroduced these names as carryless instructions with a saturating clamp bit, which we do not define. Do this rename so we can unambiguously add these missing instructions. The carry-in versions should also be renamed, but at least those had a consistent _u32 name to begin with. The 16-bit instructions were also renamed, but aren't ambiguous. This does regress assembler error message quality in some cases. In mismatched wave32/wave64 situations, this will switch from "unsupported instruction" to "invalid operand", with the error pointing at the wrong position. I couldn't quite follow how the assembler selects these, but the previous behavior seemed accidental to me. It looked like there was a partial attempt to handle this which was never completed (i.e. there is an AMDGPUOperand::isBoolReg but it isn't used for anything).	2020-07-16 13:16:30 -04:00
Matt Arsenault	07cd19efa2	AMDGPU: Fix dropping MI flags when rewriting instructions All 3 passes that change instruction encodings were dropping MI flags. This avoids scheduling regressions caused by setting mayRaiseFPExceptions on FP instructions for non-strictfp functions.	2020-05-27 13:27:06 -04:00
Hans Wennborg	a19de32095	Fix unused function warning (PR44808)	2020-02-12 15:12:48 +01:00
Tim Renouf	3d5ba7c60f	AMDGPU: Fixed indeterminate map iteration in SIPeepholeSDWA Differential Revision: https://reviews.llvm.org/D70783 Change-Id: Ic26f915a4acb4c00ecefa9d09d7c24cec370ed06	2019-12-02 12:08:49 +00:00
Daniel Sanders	0c47611131	Apply llvm-prefer-register-over-unsigned from clang-tidy to LLVM Summary: This clang-tidy check is looking for unsigned integer variables whose initializer starts with an implicit cast from llvm::Register and changes the type of the variable to llvm::Register (dropping the llvm:: where possible). Partial reverts in: X86FrameLowering.cpp - Some functions return unsigned and arguably should be MCRegister X86FixupLEAs.cpp - Some functions return unsigned and arguably should be MCRegister X86FrameLowering.cpp - Some functions return unsigned and arguably should be MCRegister HexagonBitSimplify.cpp - Function takes BitTracker::RegisterRef which appears to be unsigned& MachineVerifier.cpp - Ambiguous operator==() given MCRegister and const Register PPCFastISel.cpp - No Register::operator-=() PeepholeOptimizer.cpp - TargetInstrInfo::optimizeLoadInstr() takes an unsigned& MachineTraceMetrics.cpp - MachineTraceMetrics lacks a suitable constructor Manual fixups in: ARMFastISel.cpp - ARMEmitLoad() now takes a Register& instead of unsigned& HexagonSplitDouble.cpp - Ternary operator was ambiguous between unsigned/Register HexagonConstExtenders.cpp - Has a local class named Register, used llvm::Register instead of Register. PPCFastISel.cpp - PPCEmitLoad() now takes a Register& instead of unsigned& Depends on D65919 Reviewers: arsenm, bogner, craig.topper, RKSimon Reviewed By: arsenm Subscribers: RKSimon, craig.topper, lenary, aemerson, wuzish, jholewinski, MatzeB, qcolombet, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, wdng, nhaehnle, sbc100, jgravelle-google, kristof.beyls, hiraditya, aheejin, kbarton, fedor.sergeev, javed.absar, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, tpr, PkmX, jocewei, jsji, Petar.Avramovic, asbirlea, Jim, s.egerton, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65962 llvm-svn: 369041	2019-08-15 19:22:08 +00:00
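
Note on commit ba9bd22e1b: its message says that if an instruction selects `BYTE_0` and its operand selects `WORD_1`, the combined selection should be `BYTE_2`. The following is only a minimal sketch of that arithmetic, not the code from the pass. The `Sel` enum, the `ByteRange` struct, and the functions `toRange`, `fromRange`, and `combineSel` are hypothetical names introduced for illustration. The sketch assumes each selection can be treated as a byte range inside a 32-bit register.

```cpp
#include <cassert>

// Hypothetical stand-ins for the selection names in the commit message;
// this is not the enum used by the AMDGPU backend.
enum class Sel { BYTE_0, BYTE_1, BYTE_2, BYTE_3, WORD_0, WORD_1, DWORD, INVALID };

// A selection viewed as a byte range within a 32-bit register.
struct ByteRange {
  unsigned Offset; // index of the first selected byte (0..3)
  unsigned Width;  // number of selected bytes (1, 2 or 4; 0 means invalid)
};

ByteRange toRange(Sel S) {
  switch (S) {
  case Sel::BYTE_0: return {0, 1};
  case Sel::BYTE_1: return {1, 1};
  case Sel::BYTE_2: return {2, 1};
  case Sel::BYTE_3: return {3, 1};
  case Sel::WORD_0: return {0, 2};
  case Sel::WORD_1: return {2, 2};
  case Sel::DWORD:  return {0, 4};
  case Sel::INVALID: break;
  }
  return {0, 0};
}

Sel fromRange(ByteRange R) {
  if (R.Width == 1 && R.Offset < 4)
    return static_cast<Sel>(static_cast<unsigned>(Sel::BYTE_0) + R.Offset);
  if (R.Width == 2 && (R.Offset == 0 || R.Offset == 2))
    return R.Offset == 0 ? Sel::WORD_0 : Sel::WORD_1;
  if (R.Width == 4 && R.Offset == 0)
    return Sel::DWORD;
  return Sel::INVALID;
}

// Computes "InstSel of OperandSel": the instruction's selection is applied
// inside the region the operand already selects. Returns Sel::INVALID when
// the instruction's selection does not fit inside the operand's region.
Sel combineSel(Sel OperandSel, Sel InstSel) {
  ByteRange Op = toRange(OperandSel);
  ByteRange In = toRange(InstSel);
  if (In.Width == 0 || In.Offset + In.Width > Op.Width)
    return Sel::INVALID;
  return fromRange({Op.Offset + In.Offset, In.Width});
}
```

In this sketch, `combineSel(Sel::WORD_1, Sel::BYTE_0)` yields `Sel::BYTE_2`, matching the example in the commit message. A combination that does not fit, such as `WORD_1` of `WORD_0`, is reported as `Sel::INVALID` instead of being guessed.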


87 Commits
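
Note on commits 1d616cdca3 and 2042887709: the messages mention a `hasRegState` query, a `RegState::NoFlags` value, and a reserved `0x1` flag that detects a caller passing `true` where flags are expected. The sketch below shows why reserving `0x1` can work. It is not the LLVM implementation: the `sketch` namespace, the enum layout, the function signatures, and `looksLikeBool` are assumptions made for illustration.

```cpp
#include <cassert>

namespace sketch {

// Simplified, hypothetical flag set. The values are illustrative only; the
// point is that bit 0x1 is never assigned to a real flag.
enum RegStateSketch : unsigned {
  NoFlags  = 0x0, // explicit name for "no flags set"
  Reserved = 0x1, // `true` converts to 1, so this bit signals a bool misuse
  Define   = 0x2,
  Implicit = 0x4,
  Kill     = 0x8,
};

// Query helper in the spirit of the `hasRegState` named in the commit
// message (the signature here is assumed).
constexpr bool hasRegState(unsigned Flags, unsigned Test) {
  return (Flags & Test) != 0;
}

// Bool-to-flag helpers that replace ternary operators at call sites.
constexpr unsigned getDefRegState(bool B) { return B ? Define : NoFlags; }
constexpr unsigned getKillRegState(bool B) { return B ? Kill : NoFlags; }

// Hypothetical check: a flags argument with the reserved bit set most likely
// came from an implicit bool-to-unsigned conversion.
constexpr bool looksLikeBool(unsigned Flags) {
  return hasRegState(Flags, Reserved);
}

} // namespace sketch
```

In this sketch, `sketch::looksLikeBool(true)` holds because `true` converts to `1`, while any combination built from the `get...RegState` helpers leaves the reserved bit clear.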