llvm-project

Author	SHA1	Message	Date
Jay Foad	0448a1c0dc	[AMDGPU] Simplify commuted operand handling. NFCI. (#71965 ) SIInstrInfo::commuteInstructionImpl should accept indices to commute in either order. This simplifies SIFoldOperands::tryAddToFoldList where OtherIdx, CommuteIdx0 and CommuteIdx1 are no longer needed.	2023-11-10 21:51:52 +00:00
Jay Foad	d5f3b3b3b1	[RegScavenger] Simplify state tracking for backwards scavenging (#71202 ) Track the live register state immediately before, instead of after, MBBI. This makes it simple to track the state at the start or end of a basic block without a separate (and poorly named) Tracking flag. This changes the API of the backward(MachineBasicBlock::iterator I) method, which now recedes to the state just before, instead of just after, *I. Some clients are simplified by this change. There is one small functional change shown in the lit tests where multiple spilled registers all need to be reloaded before the same instruction. The reloads will now be inserted in the opposite order. This should not affect correctness.	2023-11-08 09:49:07 +00:00
Stanislav Mekhanoshin	c3851a987b	[AMDGPU] Remove dead handling of S_SETPC_B64 (#71275 ) At the very least there are no tests covering this. Nothing breaks when I remove it.	2023-11-06 10:35:49 -08:00
Jessica Del	6e4692c9ee	[AMDGPU] - Add s_wqm intrinsics (#71048 ) Add intrinsics to generate `s_wqm_b32` and `s_wqm_b64`. Support VGPR arguments by inserting a `v_readfirstlane`.	2023-11-03 14:48:59 +01:00
Jay Foad	1590cac494	[AMDGPU] Implement moveToVALU for S_CSELECT_B64 (#70352 ) moveToVALU previously only handled S_CSELECT_B64 in the trivial case where it was semantically equivalent to a copy. Implement the general case using V_CNDMASK_B64_PSEUDO and implement post-RA expansion of V_CNDMASK_B64_PSEUDO with immediate as well as register operands.	2023-11-02 10:08:09 +00:00
Jessica Del	41cf94e6b8	[AMDGPU] - Add s_quadmask intrinsics (#70804 ) Add intrinsics to generate `s_quadmask_b32` and `s_quadmask_b64`. Support VGPR arguments by inserting a `v_readfirstlane`.	2023-11-02 10:37:52 +01:00
Jay Foad	86f2e09250	[AMDGPU] Tweak handling of GlobalAddress operands in SI_PC_ADD_REL_OFFSET (#70960 ) When SI_PC_ADD_REL_OFFSET is expanded to S_GETPC/S_ADD/S_ADDC, the GlobalAddress operands have to be adjusted by 4 or 12 bytes to account for the offset from the end of the S_GETPC instruction to the literal operands. Do this all in SIInstrInfo::expandPostRAPseudo instead of duplicating the adjustment code in both AMDGPULegalizerInfo and SITargetLowering. NFCI.	2023-11-01 19:48:30 +00:00
Jay Foad	2be251fbf4	[AMDGPU] Simplify expandPostRAPseudo for SI_PC_ADD_REL_OFFSET. NFC.	2023-11-01 15:51:20 +00:00
Jessica Del	b8d3ccdff1	[AMDGPU] - Add s_bitreplicate intrinsic (#69209 ) Add intrinsic for s_bitreplicate. Lower to S_BITREPLICATE_B64_B32 machine instruction in both GISel and Selection DAG. Support VGPR arguments by inserting a `v_readfirstlane`.	2023-10-31 11:26:45 +01:00
Stanislav Mekhanoshin	ee6d62db99	[AMDGPU] Prevent folding of the negative i32 literals as i64 (#70274 ) We can use sign extended 64-bit literals, but only for signed operands. At the moment we do not know if an operand is signed. Such operand will be encoded as its low 32 bits and then either correctly sign extended or incorrectly zero extended by HW.	2023-10-30 08:07:43 -07:00
Christudasan Devadasan	a0eb6b88f9	[AMDGPU] Try to fix the block prologs broken by RA inserted instructions (#69924 ) The insertion point determined by RA while attempting spills and liverange split at the beginning of a block goes wrong at times, and the newly inserted vector instructions are placed before the exec-mask restore instruction which is wrong. It occurs mainly due to the dependency on isBasicBlockPrologue that doesn't account for early inserted instructions (spills and splits) during RA and causes the block prolog break. A better approach for deciding the insertion point should be worked out. For now, improving the helper function to consider all possible early insertions. This patch includes the spill instructions. The copies associated with liverange split should also be included in the block prolog.	2023-10-27 19:10:18 +05:30
Christudasan Devadasan	f9cd789658	[AMDGPU] Add pseudo instructions for SGPR spill to VGPR (#69923 ) For a future patch, it is important to keep the lowered SGPR spills to be recognized as spill instructions during regalloc. Directly lowering them into V_WRITELANE/V_READLANE won't allow us to attach the SPILL flag to their instructions. This patch introduces the pseudo instructions with the SGPRSpill flag set in their Desc. They will get lowered to equivalent instructions later during post RA pseudo expansion.	2023-10-27 17:24:10 +05:30
Jay Foad	3c58e53041	[AMDGPU] Use const reference in SIInstrInfo::buildExtractSubReg. NFC.	2023-10-26 15:42:24 +01:00
Jay Foad	7caff73e38	[AMDGPU] Assert that we can find subregs in copyPhysReg. NFC. (#70332 ) This helped to catch a codegen failure caused by #69703. MachineVerifier did not complain about this malformed COPY either before regalloc: %9:vreg_64 = COPY %17:vgpr_32 Or after regalloc: renamable $vgpr0_vgpr1 = COPY renamable $vgpr2, implicit $exec But we can at least catch the problem when copyPhysReg tries to expand it into 32-bit register moves and fails to find suitable source registers: $vgpr0 = V_MOV_B32_e32 $noreg, implicit $exec, implicit-def $vgpr0_vgpr1, implicit $vgpr2 $vgpr1 = V_MOV_B32_e32 $noreg, implicit $exec, implicit $vgpr2, implicit $exec	2023-10-26 15:39:10 +01:00
Christudasan Devadasan	16fbc45f48	Revert "[AMDGPU] Cleanup hasUnwantedEffectsWhenEXECEmpty function (#70206 )" This reverts commit 7ce613fc77af092dd6e9db71ce3747b75bc5616e.	2023-10-26 17:04:28 +05:30
Piotr Sobczak	ba3d6e0499	[AMDGPU] Rematerialize scalar loads (#68778 ) Extend the list of instructions that can be rematerialized in SIInstrInfo::isReallyTriviallyReMaterializable() to support scalar loads. Try shrinking instructions to remat only the part needed for current context. Add SIInstrInfo::reMaterialize target hook, and handle shrinking of S_LOAD_DWORDX16_IMM to S_LOAD_DWORDX8_IMM as a proof of concept.	2023-10-26 11:34:33 +02:00
Christudasan Devadasan	7ce613fc77	[AMDGPU] Cleanup hasUnwantedEffectsWhenEXECEmpty function (#70206 ) The readlane & writelane instructions don't really depend on the EXEC mask and they should return false from here.	2023-10-25 22:10:16 +05:30
Stanislav Mekhanoshin	98e95a0055	[AMDGPU] Make S_MOV_B64_IMM_PSEUDO foldable (#69483 ) With the legality checks in place it is now safe to do. S_MOV_B64 shall not be used with wide literals, thus updating the test.	2023-10-18 13:38:20 -07:00
Stanislav Mekhanoshin	47ed921985	[AMDGPU] Add legality check when folding short 64-bit literals (#69391 ) We can only fold it if it can fit into 32-bit. I believe it did not trigger yet because we do not select 64-bit literals generally.	2023-10-18 09:22:23 -07:00
Sirish Pande	28e4f97320	[AMDGPU] Save/Restore SCC bit across waterfall loop. (#68363 ) Waterfall loop is overwriting SCC bit of status register. Make sure SCC bit is saved and restored across. We need to save/restore only in cases where SCC is live across waterfall loop. Co-authored-by: Sirish Pande <sirish.pande@amd.com>	2023-10-18 08:43:29 -05:00
Stanislav Mekhanoshin	a22a1fe151	[AMDGPU] support 64-bit immediates in SIInstrInfo::FoldImmediate (#69260 ) This is a part of https://github.com/llvm/llvm-project/issues/67781. Until we select more 64-bit move immediates the impact is minimal.	2023-10-17 10:53:22 -07:00
Petar Avramovic	2fa7d652d0	AMDGPU: Fix temporal divergence introduced by machine-sink (#67456 ) Temporal divergence that was present in input or introduced in IR transforms, like code-sinking or LICM, is handled in SIFixSGPRCopies by changing sgpr source instr to vgpr instr. After 5b657f5, that moved LICM after AMDGPUCodeGenPrepare, machine-sinking can introduce temporal divergence by sinking instructions outside of the cycle. Add isSafeToSink callback in TargetInstrInfo.	2023-10-06 15:00:08 +02:00
Ivan Kosarev	f04aa1f814	[AMDGPU][CodeGen] Fold immediates in src1 operands of V_MAD/MAC/FMA/FMAC. (#68002 )	2023-10-05 14:22:29 +03:00
Ivan Kosarev	cf80defae2	[AMDGPU][GFX11] Do not rewrite V_FMA/FMAC_* to V_FMAAK_F16_t16 on operand legalization. (#66202 ) V_FMAAK_F16_t16 takes VGPR_32_Lo128 operands whereas the original instructions would have VGPR_32 operands. Switching the opcodes without updating operands' register classes leads to MachineVerifier complaining about the classes not matching instruction definitions. The problem only reveals itself on builds with expensive checks enabled because of missing -verify-machineinstrs in the test. This is the third attempt to update CodeGen/AMDGPU/fma.f16.ll to run for GFX11, following the second attempt in a1e38e0b8e3e, partially reverted in eaf737a4e004.	2023-10-04 12:41:46 +01:00
Ivan Kosarev	64482d5766	[AMDGPU] Fix passing CodeGen/AMDGPU/frem.ll on gfx1150. (#67425 ) We would currently crash on it trying to use t16 instructions instead of fake16 ones.	2023-09-26 15:13:23 +01:00
Ivan Kosarev	287f6cdd17	[AMDGPU] Remove the support for non-True16 copies between different register sizes. Differential Revision: https://reviews.llvm.org/D156985	2023-09-26 14:46:34 +01:00
Ivan Kosarev	758df22bcf	[AMDGPU][True16] Support emitting copies between different register sizes. Differential Revision: https://reviews.llvm.org/D156105	2023-09-26 12:15:34 +01:00
Mirko Brkušanin	ecfdc23dd2	[AMDGPU] Select gfx1150 SALU Float instructions (#66885 )	2023-09-21 12:22:55 +02:00
Nicolai Hähnle	2eb767c9e1	AMDGPU: Scratch instructions are trivially disjoint from SMEM and buffer instructions (#65287 ) Scratch instructions are always in addrspace(5), which can only alias with flat (and itself). SMEM and buffer instructions can never reference those address spaces, so they are trivially disjoint.	2023-09-08 07:43:36 +02:00
Stanislav Mekhanoshin	294f632859	[AMDGPU] Move add64/sub64 to VALU This is NFCI as far as I can tell, but I see no reason not to do it. Differential Revision: https://reviews.llvm.org/D159077	2023-08-29 09:31:04 -07:00
Mateusz Hurnik	232f0c9a9a	[NFC][AMDGPU] Remove redundant code As the result of this constant function is unused it is redundant. Reviewed by: arsenm Differential Revision: https://reviews.llvm.org/D158747	2023-08-25 11:18:48 +01:00
Stanislav Mekhanoshin	cfe9a134bb	[AMDGPU] Rename 64BitDPP feature and fix the checks Names '64BitDPP' and especially 'DPP64' were found misleading, and DPP64 can easily be mixed with DPP16 and DPP8 while these are different concepts. DPP16 and DPP8 refers to lanes where DPP64 refers to the operand size. In fact the essential part here is that these instructions are executed on the DP ALU, so rename the feature accordingly. I have also found a bug in a check for these instructions, which is fixed here and a common utility function is now used. Differential Revision: https://reviews.llvm.org/D158465	2023-08-22 11:00:10 -07:00
Mirko Brkusanin	de82fde22d	AMDGPU/Uniformity/GlobalISel: G_AMDGPU atomics are always divergent Patch by: Acim Maravic Differential Revision: https://reviews.llvm.org/D157091	2023-08-18 18:23:40 +02:00
Christudasan Devadasan	81827f8cfb	[AMDGPU] Support wwm-reg AV spill pseudos The wwm register spill pseudos are currently defined for VGPR_32 regclass. It causes a verifier error for gfx908 or above as the regalloc sometimes restores the values to the vector superclass AV_32. Fixing it by supporting AV wwm-spill pseudos as well. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D155646	2023-08-17 20:04:18 +05:30
Stanislav Mekhanoshin	f7480bc5c1	[AMDGPU] Decouple V_PK_MOV_B32 from FeaturePackedFP32Ops This is not an FP32 operation. Differential Revision: https://reviews.llvm.org/D157909	2023-08-14 14:22:52 -07:00
Mirko Brkusanin	1e5359c6ba	[AMDGPU] Treat KIMM32 and KIMM16 operand types as noninlinable While they represent 32/16 bit immediate values they are already included in encoding of the instructions that use them and are not true literals. FMAMK and FMAAK instructions that use them are marked with fixed size so getInstSizeInBytes will not increase the size for these operands. We also add tests whose logic relies on KIMM16 and KIMM32 being considered not inlinable. Differential Revision: https://reviews.llvm.org/D157624	2023-08-11 18:46:39 +02:00
Matt Arsenault	4b1702e87a	AMDGPU: Fix counting source modifiers as literal constants This fixes over estimating code size. This was broken by 79f52af4cd9a76485dd50bcdbb5d393eb7a70103. https://reviews.llvm.org/D157103	2023-08-07 18:40:16 -04:00
Sander de Smalen	bbb95893de	[TII] NFCI: Simplify the interface for isTriviallyReMaterializable Currently `isTriviallyReMaterializable` calls `isReallyTriviallyReMaterializable` and `isReallyTriviallyReMaterializableGeneric`. The two interfaces are confusing, but there are also some real issues with this. The documentation of this function (see below) suggests that `isReallyTriviallyRematerializable` allows the target to override the default behaviour. /// For instructions with opcodes for which the M_REMATERIALIZABLE flag is /// set, this hook lets the target specify whether the instruction is actually /// trivially rematerializable, taking into consideration its operands. It however implements something different. The default behaviour is the analysis done in `isReallyTriviallyReMaterializableGeneric`, which is testing if it is safe to rematerialize the MachineInstr. The result of `isReallyTriviallyReMaterializable` is only considered if `isReallyTriviallyReMaterializableGeneric` returns `false`. That means there is no way to override the default behaviour if `isReallyTriviallyReMaterializableGeneric` returns true (i.e. it is safe to rematerialize, but we'd rather not). By making this a single interface, we can override the interface to do either. Reviewed By: craig.topper, nemanjai Differential Revision: https://reviews.llvm.org/D156520	2023-08-07 13:01:06 +00:00
Jay Foad	e61ca23289	[AMDGPU] Add and use SIInstrFlags::GWS. NFC. This reduces the number of places where we have to check for a list of DS_GWS_* opcodes. Differential Revision: https://reviews.llvm.org/D157099	2023-08-07 12:05:14 +01:00
Matt Arsenault	4d42e8b5d1	Reapply "[CodeGen]Allow targets to use target specific COPY instructions for live range splitting" This reverts commit a496c8be6e638ae58bb45f13113dbe3a4b7b23fd. The workaround in c26dfc81e254c78dc23579cf3d1336f77249e1f6 should work around the underlying problem with SUBREG_TO_REG.	2023-07-31 20:15:45 -04:00
Sameer Sahasrabuddhe	7c760b224b	Restore "[GlobalISel] GIntrinsic subclass to represent intrinsics in Generic Machine IR" Some opcodes in generic MIR represent calls to intrinsics, where the intrinsic ID is the first non-def operand to the instruction. These are now represented as a subclass of GenericMachineInstr, and the method MachineInstr::getIntrinsicID() is now moved to this subclass GIntrinsic. Some target-defined instructions behave like GMIR intrinsics, and have an Intrinsic::ID operand. But they should not be recognized as generic intrinsics, and should not use GIntrinsic::getIntrinsicID(). Separated these out by introducing a new AMDGPU::getIntrinsicID(). Reviewed By: arsenm, Pierre-vh Differential Revision: https://reviews.llvm.org/D155556 This restores commit baa3386edb11a2f9bcadda8cf58d56f3707c39fa. Originally reverted in d0f7850b01cf17e50a4f4b00e3b84dded94df6b8.	2023-07-27 14:49:17 +05:30
Vitaly Buka	a496c8be6e	Revert "[CodeGen]Allow targets to use target specific COPY instructions for live range splitting" And dependent commits. Details in D150388. This reverts commit 825b7f0ca5f2211ec3c93139f98d1e24048c225c. This reverts commit 7a98f084c4d121244ef7286bc6503b6a181d446e. This reverts commit b4a62b1fa546312d882fa12dfdcd015177d66826. This reverts commit b7836d856206ec39509d42529f958c920368166b. No conflicts in the code, few tests had conflicts in autogenerated CHECKs: llvm/test/CodeGen/Thumb2/mve-float32regloops.ll llvm/test/CodeGen/AMDGPU/fix-frame-reg-in-custom-csr-spills.ll Reviewed By: alexfh Differential Revision: https://reviews.llvm.org/D156381	2023-07-26 22:13:32 -07:00
Sameer Sahasrabuddhe	d0f7850b01	Revert "[GlobalISel] GIntrinsic subclass to represent intrinsics in Generic Machine IR" This reverts commit baa3386edb11a2f9bcadda8cf58d56f3707c39fa. The changes did not cover all occurrences of the deleted function MachineInstr::getIntrinsicID().	2023-07-27 10:14:24 +05:30
Sameer Sahasrabuddhe	baa3386edb	[GlobalISel] GIntrinsic subclass to represent intrinsics in Generic Machine IR Some opcodes in generic MIR represent calls to intrinsics, where the intrinsic ID is the first non-def operand to the instruction. These are now represented as a subclass of GenericMachineInstr, and the method MachineInstr::getIntrinsicID() is now moved to this subclass GIntrinsic. Some target-defined instructions behave like GMIR intrinsics, and have an Intrinsic::ID operand. But they should not be recognized as generic intrinsics, and should not use GIntrinsic::getIntrinsicID(). Separated these out by introducing a new AMDGPU::getIntrinsicID(). Reviewed By: arsenm, Pierre-vh Differential Revision: https://reviews.llvm.org/D155556	2023-07-27 10:00:45 +05:30
Stanislav Mekhanoshin	7972b9c829	[AMDGPU] Move SIEncodingFamily into SIDefines.h. NFC. I need this for future patch in the MC, while TII is not available in the llvm-mc. Besides this is not a first time I want it there. Differential Revision: https://reviews.llvm.org/D155228	2023-07-13 12:42:28 -07:00
Christudasan Devadasan	b4a62b1fa5	[AMDGPU] Enable whole wave register copy So far, we haven't exposed the allocation of whole-wave registers to regalloc. We hand-picked them for various whole wave mode operations. With a future patch, we want the allocator to efficiently allocate them rather than using the custom pre-allocation pass. Any liverange split of virtual registers involved in whole-wave operations require the resulting COPY introduced with the split to be performed for all lanes. It isn't implemented in the compiler yet. This patch would identify all such copies and manipulate the exec mask around them to enable all lanes without affecting the value of exec mask elsewhere. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D143762	2023-07-07 22:58:55 +05:30
Christudasan Devadasan	b78b36e1a2	[AMDGPU] Implement whole wave register spill To reduce the register pressure during allocation, when the allocator spills a virtual register that corresponds to a whole wave mode operation, the spill loads and restores should be activated for all lanes by temporarily flipping all bits in exec register to one just before the spills. It is not implemented in the compiler as of today and this patch enables the necessary support. This is a pre-patch before the SGPR spill to virtual VGPR lanes that would eventually cause the whole wave register spills during allocation. Reviewed By: arsenm, cdevadas Differential Revision: https://reviews.llvm.org/D143759	2023-07-07 22:51:45 +05:30
Ivan Kosarev	9d8171f8c4	[AMDGPU][Codegen] Clean up legalizeOpWithMove(). The removed logic was added in <https://reviews.llvm.org/rG0c93c9ecee0624f8469f5a971a09fbc9e9cc1061>, but now doesn't seem to be needed. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D154220	2023-06-30 16:48:16 +01:00
Brendon Cahoon	853b2a84cb	[AMDGPU] Reserve SGPR pair when long branches are present Branch relaxation requires 2 additional SGPRs for AMDGPU to handle the case when an indirect branch target is too far away. The register scavenger may not find available registers, which causes a “did not find scavenging index” assert to occur in assignRegToScavengingIndex. In this patch, we estimate before register allocation whether an indirect branch is likely to be needed, and reserve 2 SGPRs if the branch distance is found to be above a threshold. The distance threshold is an approximation as the exact code size and branch distance are unknown prior to register allocation. Patch by Corbin Robeck. Thanks! Differential Review: https://reviews.llvm.org/D149775	2023-06-29 16:50:46 -05:00
Ivan Kosarev	4e312abdfd	[AMDGPU][NFC] Add a getRegBitWidth() helper for TargetRegisterClass operands. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D152257	2023-06-07 11:41:11 +01:00

783 Commits