llvm-project

Author	SHA1	Message	Date
Jay Foad	9cacc4138e	[AMDGPU] Move S_ADD_U64_PSEUDO handling into getVALUOp. NFC. (#142934 ) S_ADD_U64_PSEUDO and S_SUB_U64_PSEUDO are not "special cases" so can be handled in getVALUOp instead of moveToVALUImpl.	2025-06-05 16:49:24 +01:00
Brox Chen	b668b6439a	[AMDGPU][True16][CodeGen] legalize 16bit and 32bit use-def chain for moveToVALU in si-fix-sgpr-lowering (#138734 ) Two changes in this patch: 1. Cover another case in the legalizeOperandVALUt16 functions and the COPY lowering: when a SALU16 is used by a SALU32, a reg_sequence must be inserted after the move to VALU (previously only the case of a SALU32 used by a SALU16 was considered). 2. Move the useMI analysis into addUsersToMoveVALUList and legalize the targeted operand when needed. Turn on the frem test with true16 mode for gfx1150, which was failing before this patch. A few bitcast tests are also impacted by this change, with some v_mov replaced by dual mov.	2025-06-04 09:53:10 -04:00
Matt Arsenault	65b90c59ce	AMDGPU: Remove redundant operand folding checks (#140587 ) This was pre-filtering out a specific situation from being added to the fold candidate list. The operand legality will ultimately be checked with isOperandLegal before the fold is performed, so I don't see the plus in pre-filtering this one case.	2025-05-29 19:38:45 +02:00
Justin Bogner	b7bb256703	Warn on misuse of DiagnosticInfo classes that hold Twines (#137397 ) This annotates the `Twine` passed to the constructors of the various DiagnosticInfo subclasses with `[[clang::lifetimebound]]`, which causes us to warn when we would try to print the twine after it had already been destructed. We also update `DiagnosticInfoUnsupported` to hold a `const Twine &` like all of the other DiagnosticInfo classes, since this warning allows us to clean up all of the places where it was being used incorrectly.	2025-05-28 12:26:39 -07:00
Ivan Kosarev	66d3980b53	[AMDGPU][NFC] Remove _DEFERRED operands. (#139123 ) All immediates are deferred now.	2025-05-09 10:10:53 +01:00
Ivan Kosarev	c290f48a45	[AMDGPU][NFC] Remove unused operand types. (#139062 )	2025-05-08 12:48:25 +01:00
Brox Chen	09d01be856	[AMDGPU][True16][CodeGen] replace subreg_to_reg with reg_sequence (#138746 ) Since subreg_to_reg is considered broken in LLVM, replace subreg_to_reg with reg_sequence.	2025-05-07 10:28:10 -04:00
Frederik Harwath	f541a3aad8	[AMDGPU] SIInstrInfo: Fix resultDependsOnExec for VOPC instructions (#134629 ) SIInstrInfo::resultDependsOnExec assumes that operand 0 of a comparison is always the destination of the instruction. This is not true for instructions in VOPC form where it is "src0". This led to a crash in machine-cse. --------- Co-authored-by: Matt Arsenault <arsenm2@gmail.com>	2025-04-22 10:17:35 +02:00
Philip Reames	f2ecd86e34	[Analysis] Remove implicit LocationSize conversion from uint64_t (#133342 ) This change removes the uint64_t constructor on LocationSize preventing implicit conversion, and fixes up the using APIs to adapt to the change. Note that I'm adding a couple of explicit conversion points on routines where passing in a fixed offset as an integer seems likely to have well understood semantics. We had an unfortunate case which arose if you tried to pass a TypeSize value to a parameter of LocationSize type. We'd find the implicit conversion path through TypeSize -> uint64_t -> LocationSize which works just fine for fixed values, but loses information and fails assertions if the TypeSize was scalable. This change breaks the first link in that implicit conversion chain since that seemed to be the easier one.	2025-04-18 07:46:31 -07:00
Brox Chen	bf388f8a43	[AMDGPU][True16][CodeGen] legalize operands when move16bit SALU to VALU (#133985 ) This is a follow up PR from https://github.com/llvm/llvm-project/pull/132089. When a V2S copy and its useMI are lowered to VALU, this patch checks whether the generated new VALU is a true16 inst and, if so, adds subreg access on all operands where necessary. An example MIR looks like: ``` %1:vgpr_32 = V_CVT_F32_U32_e64 %0:vgpr_32, 0, 0 ... %2:sreg_32 = COPY %1:vgpr_32 %3:sreg_32 = S_FLOOR_F16 %2:sreg_32, ... ``` currently lowered to ``` %1:vgpr_32 = V_CVT_F32_U32_e64 %0:vgpr_32, 0, 0 ... %2:vgpr_16 = V_FLOOR_F16_t16_e64 0, %1:vgpr_32, 0, 0, 0 ... ``` after this patch ``` %1:vgpr_32 = V_CVT_F32_U32_e64 %0:vgpr_32, 0, 0 ... %2:vgpr_16 = V_FLOOR_F16_t16_e64 0, %1.lo16:vgpr_32, 0, 0, 0 ... ```	2025-04-03 12:26:41 -04:00
Brox Chen	dd1d41f833	[AMDGPU][True16][CodeGen] fix moveToVALU with proper subreg access in true16 (#132089 ) There are V2S copies between vgpr16 and sgpr32 in true16 mode. This is caused by vgpr16 and sgpr32 both being selectable by a 16bit src in ISel. When a V2S copy and its useMI are lowered to VALU, this patch: 1. Checks whether the generated new VALU is used by a true16 inst and adds subreg access if necessary. 2. Legalizes the V2S copy by replacing it with subreg_to_reg. An example MIR looks like: ``` %2:sgpr_32 = COPY %1:vgpr_16 %3:sgpr_32 = S_OR_B32 %2:sgpr_32, ... %4:vgpr_16 = V_ADD_F16_t16 %3:sgpr_32, ... ``` currently lowered to ``` %2:vgpr_32 = COPY %1:vgpr_16 %3:vgpr_32 = V_OR_B32 %2:vgpr_32, ... %4:vgpr_16 = V_ADD_F16_t16 %3:vgpr_32, ... ``` after this patch ``` %2:vgpr_32 = SUBREG_TO_REG 0, %1:vgpr_16, lo16 %3:vgpr_32 = V_OR_B32 %2:vgpr_32, ... %4:vgpr_16 = V_ADD_F16_t16 %3.lo16:vgpr_32, ... ```	2025-04-01 12:40:18 -04:00
Stephen Thomas	2e3fa4ba9e	[AMDGPU] Insert before and after instructions that always use GDS (#131338 ) It is an architectural requirement that there must be no outstanding GDS instructions when an "always GDS" instruction is issued, and also that an always GDS instruction must be allowed to complete. Insert waits on DScnt/LGKMcnt prior to (if necessary) and subsequent to (unconditionally) any always GDS instruction, and an additional S_NOP if the subsequent wait is followed by S_ENDPGM. Always GDS instructions are GWS instructions, DS_ORDERED_COUNT, DS_ADD_GS_REG_RTN, and DS_SUB_GS_REG_RTN (the latter two are considered always GDS as of this patch).	2025-03-21 09:33:04 +00:00
Shilei Tian	b7852939b5	[NFC][AMDGPU] Replace multiple calls to `MI.getOpcode()` with `Opcode` (#131400 )	2025-03-14 20:14:12 -04:00
Mirko Brkušanin	a6089a949f	[AMDGPU] Ignore RegMask operands when folding operands to SALU insts (#130813 ) Otherwise we hit an assert in isInlineConstant.	2025-03-12 09:59:24 +01:00
Matt Arsenault	7425af4b7a	AMDGPU: Add pseudoinstruction for agpr or vgpr constants (#130042 )	2025-03-07 09:18:22 +07:00
Matt Arsenault	4fb31e4401	AMDGPU: Use const reference for DebugLoc	2025-03-04 13:56:52 +07:00
sstipano	531c48546d	[AMDGPU][NFC] Move isXDL and isDGEMM to SIInstrInfo. (#129103 )	2025-02-28 03:14:51 +01:00
Frederik Harwath	50b508cc7b	[AMDGPU] Verify SdwaSel value range (#128898 ) Make the MachineVerifier check that the value provided for an SDWA selection is a valid value for the SdwaSel enum.	2025-02-27 08:11:29 +01:00
Brox Chen	364b97f23b	[AMDGPU][True16][CodeGen] 16bit spill support in true16 mode (#128060 ) Enables 16-bit values to be spilled to scratch. Note, the memory instructions used are defined as reading and writing VGPR_32, but do not clobber the unspecified 16-bits of those registers, and so spills and reloads of lo and hi halves of the registers work.	2025-02-26 16:17:20 -05:00
Brox Chen	bb62af7d14	[AMDGPU][True16][CodeGen] true16 codegen for valu op (#124797 ) true16 selection for valu ops, enable `real-true16` attribute and update the codegen test	2025-02-26 10:50:49 -05:00
Pierre van Houtryve	0f0d3fb6b5	[AMDGPU] Do not allow M0 as v_readlane_b32 dst (#128867 ) See #128851 - this is the same patch, but for v_readlane_b32. This instruction is used much less often so there were fewer changes required.	2025-02-26 14:13:39 +01:00
Pierre van Houtryve	5231736329	[AMDGPU] Do not allow M0 as v_readfirstlane_b32 dst (#128851 ) M0 can only be written to by the SALU, so `v_readfirstlane_b32 m0` is effectively useless. Represent this by restricting the dest RC of that instruction to `SReg_32_XM0` which excludes M0. There are a lot of test changes due to the register class changing, but most changes are trivial. In some cases, an extra register and `s_mov_b32` are needed. Fixes SWDEV-513269	2025-02-26 13:14:03 +01:00
Craig Topper	571b787b83	[CodeGen] Change copyPhysReg interface to use Register instead of MCRegister. (#128473 ) NVPTX, SPIRV, and WebAssembly pass virtual registers to this function since they don't perform register allocation. We need to use Register to avoid a virtual register being converted to MCRegister by the caller.	2025-02-24 09:55:34 -08:00
Benjamin Kramer	ddf24086f1	[AMDGPU] Remove unused variables. NFC	2025-02-19 18:05:22 +01:00
Brox Chen	210036a22e	[AMDGPU][True16][CodeGen] true16 codegen pattern for fma (#127240 ) The previous PR https://github.com/llvm/llvm-project/pull/122950 was reverted because it hit a buildbot failure. Another patch was merged while that PR was under review, which left one test out of date. Reopen the PR with the issue fixed.	2025-02-19 11:37:24 -05:00
Matt Arsenault	22d65d8989	AMDGPU: Teach isOperandLegal about SALU literal restrictions (#127626 ) isOperandLegal mostly implemented the VALU operand rules, and largely ignored SALU restrictions. This theoretically avoids folding literals into SALU insts which already have a literal operand. This issue is currently avoided due to a bug in SIFoldOperands; this change will allow using raw operand legality rules. This breaks the formation of s_fmaak_f32 in SIFoldOperands, but it probably should not have been forming there in the first place. TwoAddressInsts or RA should generally handle that, and this only worked by accident.	2025-02-19 10:53:03 +07:00
Matt Arsenault	eb7c947272	AMDGPU: Correct legal literal operand logic for multiple uses (#127594 ) The same literal can be used multiple times in an instruction, not just once. We were not tracking the used value to verify this, so correct this. This helps avoid regressions in a future patch.	2025-02-18 19:58:42 +07:00
Matt Arsenault	7c03865a1e	AMDGPU: Extract lambda used in foldImmediate into a helper function (#127484 ) It was also too permissive for a more general utility; only return the original immediate if there is no subregister.	2025-02-18 17:16:50 +07:00
Matt Arsenault	c5def84ca4	AMDGPU: Handle brev and not cases in getConstValDefinedInReg (#127483 ) We should not encounter these cases in the peephole-opt use today, but get the common helper function to handle these.	2025-02-18 11:23:49 +07:00
Matt Arsenault	83d7f4b8c3	AMDGPU: Implement getConstValDefinedInReg and use in foldImmediate (NFC) (#127482 ) This is NFC because it currently only matters for cases that are not isMoveImmediate, and we do not yet implement any of those. This just moves the implementation of foldImmediate to use the common interface, similar to how x86 does it.	2025-02-18 11:21:02 +07:00
Matt Arsenault	4dee305ce2	AMDGPU: Fix foldImmediate breaking register class constraints (#127481 ) This fixes a verifier error when folding an immediate materialized into an aligned vgpr class into a copy to an unaligned virtual register.	2025-02-18 10:34:48 +07:00
Kazu Hirata	02d4aac55c	[AMDGPU] Remove materializeImmediate (#127420 ) The last use was removed in: commit cbf34a5f7701148d68951320a72f483849b22eaf Author: Juan Manuel Martinez Caamaño <jmartinezcaamao@gmail.com> Date: Fri Aug 23 14:06:17 2024 +0200	2025-02-16 22:47:14 -08:00
Brox Chen	cf1165cb9c	Revert "[AMDGPU][True16][CodeGen] true16 codegen pattern for fma (#12… (#127175 ) Reverting this patch since it raised a buildbot failure. This reverts commit 2a7487cc2e0fb8bd91784e2d9636a65baa6d90ed.	2025-02-14 02:28:45 -05:00
Brox Chen	2a7487cc2e	[AMDGPU][True16][CodeGen] true16 codegen pattern for fma (#122950 ) true16 codegen pattern for f16 fma. Created a duplicate shrink-mad-fma-gfx10.mir from shrink-mad-fma to separate the pre-GFX11 and GFX11 MIR tests.	2025-02-14 02:16:00 -05:00
Rahul Joshi	bee9664970	[TableGen] Emit OpName as an enum class instead of a namespace (#125313 ) - Change InstrInfoEmitter to emit OpName as an enum class instead of an anonymous enum in the OpName namespace. - This will help clearly distinguish between values that are OpNames vs just operand indices and should help avoid bugs due to confusion between the two. - Rename OpName::OPERAND_LAST to NUM_OPERAND_NAMES. - Emit declaration of getOperandIdx() along with the OpName enum so it doesn't have to be repeated in various headers. - Also updated AMDGPU, RISCV, and WebAssembly backends to conform to the new definition of OpName (mostly mechanical changes).	2025-02-12 08:19:30 -08:00
Jon Chesterfield	4f358d75d0	[amdgpu][nfc] Post-commit feedback on c39fba209	2025-01-30 20:07:44 +00:00
Jon Chesterfield	c39fba209c	[AMDGPU] S_SET_GPR_IDX_ON can be passed an immediate index (#125086 ) Oversight found by ISel fuzz effort. The code assumed the argument is a register, but in some cases it can be an immediate. Tablegen's type for the instruction is SSrc_b32, i.e. either a register or an immediate is fine. Added the repro from the bug reporter as a test case - prior to this patch llvm will assert in getReg. Fixes SWDEV-508589	2025-01-30 16:40:12 +00:00
Brox Chen	5d1c596ab4	[AMDGPU][True16][MC] true16 for minimummaximum/max/min/max3/min3 (#124184 ) true16 support for gfx12 instructions including: v_minimummaximum_f16 v_maximumminimum_f16 v_maximum_f16 v_minimum_f16 v_maximum3_f16 v_minimum3_f16	2025-01-27 16:52:59 -05:00
Venkata Ramanaiah Nalamothu	f7d8336a2f	[llvm] Pass MachineInstr flags to storeRegToStackSlot/loadRegFromStackSlot (NFC) (#120622 ) This patch is in preparation to enable setting the MachineInstr::MIFlag flags, i.e. FrameSetup/FrameDestroy, on callee saved register spill/reload instructions in prologue/epilogue. This eventually helps in setting the prologue_end and epilogue_begin markers more accurately. The DWARF Spec in "6.4 Call Frame Information" says: The code that allocates space on the call frame stack and performs the save operation is called the subroutine’s prologue, and the code that performs the restore operation and deallocates the frame is called its epilogue. which means the callee saved register spills and reloads are part of prologue (a.k.a frame setup) and epilogue (a.k.a frame destruction), respectively. And, IIUC, LLVM backend uses FrameSetup/FrameDestroy flags to identify instructions that are part of call frame setup and destruction. In the trunk, while most targets consistently set FrameSetup/FrameDestroy on save/restore call frame information (CFI) instructions of callee saved registers, they do not consistently set those flags on the actual callee saved register spill/reload instructions. I believe this patch provides a clean mechanism to set FrameSetup/FrameDestroy flags on the actual callee saved register spill/reload instructions as needed. And, by having default argument of MachineInstr::NoFlags for Flags, this patch is a NFC. With this patch, the targets have to just pass FrameSetup/FrameDestroy flag to the storeRegToStackSlot/loadRegFromStackSlot calls from the target derived spillCalleeSavedRegisters and restoreCalleeSavedRegisters to set those flags on callee saved register spill/reload instructions. 
Also, this patch makes it very easy to set the source line information on callee saved register spill/reload instructions which is needed by the DwarfDebug.cpp implementation to set prologue_end and epilogue_begin markers more accurately. As per DwarfDebug.cpp implementation: prologue_end is the first known non-DBG_VALUE and non-FrameSetup location that marks the beginning of the function body epilogue_begin is the first FrameDestroy location that has been seen in the epilogue basic block With this patch, the targets have to just do the following to set the source line information on callee saved register spill/reload instructions, without hampering the LLVM's efforts to avoid adding source line information on the artificial code generated by the compiler. <Foo>InstrInfo::storeRegToStackSlot() { ... DebugLoc DL = Flags & MachineInstr::FrameSetup ? DebugLoc() : MBB.findDebugLoc(I); ... } <Foo>InstrInfo::loadRegFromStackSlot() { ... DebugLoc DL = Flags & MachineInstr::FrameDestroy ? MBB.findDebugLoc(I) : DebugLoc(); ... } While I understand this patch would break out-of-tree backend builds, I think it is in the right direction. One immediate use case that can benefit from this patch is fixing #120553 becomes simpler.	2025-01-22 13:36:39 +05:30
Kazu Hirata	ceaaa2b9ae	[AMDGPU] Fix warnings This patch fixes: llvm/lib/Target/AMDGPU/SIInstrInfo.cpp:2792:14: error: comparison of integers of different signs: 'unsigned int' and 'int' [-Werror,-Wsign-compare] llvm/lib/Target/AMDGPU/SIInstrInfo.cpp:2797:14: error: comparison of integers of different signs: 'unsigned int' and 'int' [-Werror,-Wsign-compare]	2025-01-21 20:24:30 -08:00
Shoreshen	7c58d6363a	[AMDGPU] Add commute for some VOP3 inst (#121326 ) add commute for some VOP3 inst, allow commute for both inline constant operand, adjust tests Fixes #111205	2025-01-22 11:08:26 +07:00
Austin Kerbow	657fb4433e	[AMDGPU] Add target hook to isGlobalMemoryObject (#112781 ) We want special handling for IGLP instructions in the scheduler but they should still be treated like they have side effects by other passes. Add a target hook to the ScheduleDAGInstrs DAG builder so that we have more control over this.	2025-01-11 09:57:57 -08:00
Matt Arsenault	f6365a47a1	AMDGPU: Fix assert on physreg MUBUF rsrc operand (#120815 ) The stack case uses a physical register and should not ordinarily reach here, but strange things happen at -O0. The testcase still errors because we do not yet attempt to handle arbitrary dynamic sized allocas. Fixes: SWDEV-503538	2025-01-07 08:11:05 +07:00
Brox Chen	ce831a231a	[AMDGPU][True16][MC] true16 for v_fma_f16 (#119477 ) Support true16 format for v_fma_f16 in MC. Since we are replacing v_fma_f16 with v_fma_f16_t16/v_fma_f16_fake16 in Post-GFX11, we have to update the CodeGen pattern for v_fma_f16_fake16 to get the CodeGen tests passing. No pattern is modified or created; v_fma_f16 is just replaced with the fake16 format.	2025-01-06 15:02:04 -05:00
Brox Chen	e10b12e656	[AMDGPU][True16][MC] true16 for v_div_fixup_f16 (#119613 ) Support true16 format for v_div_fixup_f16 in MC.	2024-12-18 18:01:13 -05:00
Ruiling, Song	67c55b1ffc	[AMDGPU] Make max dwords of memory cluster configurable (#119342 ) We find it helpful to increase the value for graphics workloads. Make it configurable so we can experiment with a different value.	2024-12-18 14:17:27 +08:00
Matt Arsenault	5e53a8dadb	AMDGPU: Fix verifier assert with out of bounds subregister indexes (#119799 ) The manual check for aligned VGPR classes would assert if a virtual register used an index not supported by the register class.	2024-12-13 11:52:11 +09:00
Matt Arsenault	1944d192bd	AMDGPU: Use isWave[32|64] instead of comparing size value (#117411 )	2024-11-23 09:30:57 -08:00
Matt Arsenault	d1cca3133a	AMDGPU: Add v_permlane16_swap_b32 and v_permlane32_swap_b32 for gfx950 (#117260 ) This was a bit annoying because these introduce a new special case encoding usage. op_sel is repurposed as a subset of dpp controls, and is eligible for VOP3->VOP1 shrinking. For some reason fi also uses an enum value, so we need to convert the raw boolean to 1 instead of -1. The 2 registers are swapped, so this has 2 defs. Ideally the builtin would return a pair, but that's difficult so return a vector instead. This would make a hypothetical builtin that supports v2f16 directly uglier.	2024-11-22 20:12:50 -08:00
Brox Chen	4cc278587f	[AMDGPU][True16][MC] VOPC profile fake16 pseudo update (#113175 ) Update VOPC profile with VOP3 pseudo: 1. On GFX11+, v_cmp_class_f16 has src1 type f16 for literals, although it is semantically interpreted as an integer. Update the VOPC class f16 profile from operand types f16, i16 to f16, f16. This patch updates it for the fake16 format; the t16 format will be updated in a following patch. 2. 16bit V_CMP_CLASS instructions (V_CMP_**_U/I/F16) are named with `t16`, but actually use 32 bit registers. Correct this by updating the pseudo definitions with useRealTrue16/useFakeTrue16 predicates and renaming these `t16` instructions to `fake16`. 3. Update the inst select so that `t16`/`fake16` instructions are selected in the true16/fake16 flow. 4. The MIR test files are impacted by the name change of these 16 bit V_CMP instructions, but there is no functional change to the emitted code.	2024-11-22 12:12:13 -05:00

933 Commits