llvm-project

Author	SHA1	Message	Date
Jay Foad	8fdfd34cd2	[AMDGPU] Remove GDS and GWS for GFX12 (#76148 )	2023-12-21 15:27:08 +00:00
Mariusz Sikora	a018c8cdbb	GFX12: Add LoopDataPrefetchPass (#75625 ) It is currently disabled by default. It will need experiments on real HW to tune it and decide on its profitability. --------- Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>	2023-12-19 08:32:16 +01:00
Stanislav Mekhanoshin	94230ce548	[AMDGPU] Fix lack of LDS DMA check in the AA handling (#75249 ) SIInstrInfo::areMemAccessesTriviallyDisjoint does DS offset checks, but does not account for LDS DMA instructions. Added these checks. Without them the code falls through and returns true, which is wrong. As a result mayAlias would always return false for LDS DMA and a regular LDS instruction or 2 LDS DMA instructions. At the moment this is NFCI because we do not use this AA in a context which may touch LDS DMA instructions. This is also unreachable now because of the ordered memory ref checks just above in the function and LDS DMA is marked as volatile. This volatile marking is removed in PR #75247, therefore I'd submit this check before #75247.	2023-12-18 10:58:50 -08:00
Mariusz Sikora	414d27419f	[AMDGPU] GFX12: select @llvm.prefetch intrinsic (#74576 ) Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>	2023-12-15 17:15:55 +01:00
Mirko Brkušanin	07a6d73664	[AMDGPU] CodeGen for GFX12 VFLAT, VSCRATCH and VGLOBAL instructions (#75493 )	2023-12-15 15:01:40 +01:00
Mirko Brkušanin	5879162f7f	[AMDGPU] CodeGen for GFX12 VBUFFER instructions (#75492 )	2023-12-15 13:45:03 +01:00
Mirko Brkušanin	26b14aedb7	[AMDGPU] CodeGen for GFX12 VIMAGE and VSAMPLE instructions (#75488 )	2023-12-15 12:40:23 +01:00
Pierre van Houtryve	ef067f5204	[AMDGPU][SIInsertWaitcnts] Do not add s_waitcnt when the counters are known to be 0 already (#72830 ) Co-authored-by: Juan Manuel MARTINEZ CAAMAÑO <juamarti@amd.com>	2023-12-15 12:33:32 +01:00
Mirko Brkušanin	569ef8ddd9	[AMDGPU] Add pseudo scalar trans instructions for GFX12 (#75204 )	2023-12-15 10:41:05 +01:00
Jay Foad	3e6da3252f	[AMDGPU] Add GFX12 s_sleep_var instruction and intrinsic (#75499 )	2023-12-14 21:11:39 +00:00
Mariusz Sikora	7f55d7de1a	[AMDGPU] GFX12: Add Split Workgroup Barrier (#74836 ) Co-authored-by: Vang Thao <Vang.Thao@amd.com>	2023-12-13 15:01:13 +01:00
Piotr Sobczak	6eec80133b	[AMDGPU] Min/max changes for GFX12 (#75214 ) Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>	2023-12-13 14:18:10 +01:00
Jay Foad	220e095a2c	[AMDGPU] Remove unused function splitScalar64BitAddSub	2023-12-12 15:01:57 +00:00
Piotr Sobczak	cd138fddf1	[AMDGPU] Turn off clang-format in moveToVALU (#75188 )	2023-12-12 15:38:06 +01:00
Mariusz Sikora	a97028ac51	[AMDGPU] Update VOP instructions for GFX12 (#74853 ) Co-authored-by: Mirko Brkusanin <Mirko.Brkusanin@amd.com>	2023-12-12 11:38:24 +01:00
Alex Bradbury	b717365216	[MachineScheduler][NFCI] Add Offset and OffsetIsScalable args to shouldClusterMemOps (#73778 ) These are picked up from getMemOperandsWithOffsetWidth but weren't then being passed through to shouldClusterMemOps, which forces backends to collect the information again if they want to use the kind of heuristics typically used for the similar shouldScheduleLoadsNear function (e.g. checking the offset is within 1 cache line). This patch just adds the parameters, but doesn't attempt to use them. There is potential to use them in the current PPC and AArch64 shouldClusterMemOps implementation, and I intend to use the offset in the heuristic for RISC-V. I've left these for future patches in the interest of being as incremental as possible. As noted in the review and in an inline FIXME, an ElementCount-style abstraction may later be used to condense these two parameters to one argument. ElementCount isn't quite suitable as it doesn't support negative offsets.	2023-12-06 15:30:48 +00:00
Alex Bradbury	6cf3566850	[NFC][MachineScheduler] Rename NumLoads parameter of shouldClusterMemOps to ClusterSize (#73757 ) As the same hook is called for both load and store clustering, NumLoads is a misleading name. Use ClusterSize instead.	2023-11-29 09:47:03 +00:00
Stanislav Mekhanoshin	87d884b5c8	[AMDGPU] Fix folding of v2i16/v2f16 splat imms (#72709 ) We can use inline constants with packed 16-bit operands, but these should use op_sel. Currently splat of inlinable constants is considered legal, which is not really true if we fail to fold it with op_sel and drop the high half. It may be legal as a literal but not as inline constant, but then usual literal checks must be performed. This patch makes these splat literals illegal but adds additional logic to the operand folding to keep current folds. This logic is somewhat heavy though. This has fixed constant bus violation in the fdot2 test.	2023-11-28 09:07:26 -08:00
Jay Foad	0d40831765	[AMDGPU] Allow folding to FMAAK with SGPR and immediate operand on GFX10+ (#72266 ) Allow foldImmediate to create instructions like: v_fmaak_f32 v0, s0, v0, 0x42000000 This instruction has two "scalar values": s0 and 0x42000000. On GFX10+ this is allowed. This fold was originally implemented before the compiler supported GFX10, when all ASICs were limited to one scalar value.	2023-11-28 14:36:37 +00:00
Jay Foad	cf1e0c0b07	[AMDGPU] Define new targets gfx1200 and gfx1201 (#73133 ) Define target names and ELF numbers for new GFX12 targets gfx1200 and gfx1201. For now they behave identically to GFX11.	2023-11-23 16:44:05 +00:00
Jay Foad	be2388c0d9	[AMDGPU] Prefer v_madak_f32 over v_madmk_f32 to reduce vgpr pressure (#72506 ) As explained in the comment in SIInstrInfo::FoldImmediate, if we have a choice between v_madak_f32 and v_madmk_f32 we should choose the former so that the literal that is not folded into the instruction can be materialized in an sgpr instead of a vgpr.	2023-11-16 12:50:26 +00:00
Christudasan Devadasan	ce7fd498ed	[AMDGPU] RA inserted scalar instructions can be at the BB top (#72140 ) We adjust the insertion point at the BB top for spills/copies during RA to ensure they are placed after the exec restore instructions required for the divergent control flow execution. This is, however, required only for the vector operations. The insertions for scalar registers can still go to the BB top.	2023-11-16 10:30:03 +05:30
Jay Foad	1e8c17e9c7	[AMDGPU] Allow folding to FMAMK with SGPR and immediate operand on GFX10+ (#72258 ) Allow foldImmediate to create instructions like: v_fmamk_f32 v0, s0, 0x42000000, v0 This instruction has two "scalar values": s0 and 0x42000000. On GFX10+ this is allowed. This fold was originally implemented before the compiler supported GFX10, when all ASICs were limited to one scalar value.	2023-11-15 10:58:00 +00:00
Jay Foad	0448a1c0dc	[AMDGPU] Simplify commuted operand handling. NFCI. (#71965 ) SIInstrInfo::commuteInstructionImpl should accept indices to commute in either order. This simplifies SIFoldOperands::tryAddToFoldList where OtherIdx, CommuteIdx0 and CommuteIdx1 are no longer needed.	2023-11-10 21:51:52 +00:00
Jay Foad	d5f3b3b3b1	[RegScavenger] Simplify state tracking for backwards scavenging (#71202 ) Track the live register state immediately before, instead of after, MBBI. This makes it simple to track the state at the start or end of a basic block without a separate (and poorly named) Tracking flag. This changes the API of the backward(MachineBasicBlock::iterator I) method, which now recedes to the state just before, instead of just after, *I. Some clients are simplified by this change. There is one small functional change shown in the lit tests where multiple spilled registers all need to be reloaded before the same instruction. The reloads will now be inserted in the opposite order. This should not affect correctness.	2023-11-08 09:49:07 +00:00
Stanislav Mekhanoshin	c3851a987b	[AMDGPU] Remove dead handling of S_SETPC_B64 (#71275 ) At the very least there are no tests covering this. Nothing breaks when I remove it.	2023-11-06 10:35:49 -08:00
Jessica Del	6e4692c9ee	[AMDGPU] - Add s_wqm intrinsics (#71048 ) Add intrinsics to generate `s_wqm_b32` and `s_wqm_b64`. Support VGPR arguments by inserting a `v_readfirstlane`.	2023-11-03 14:48:59 +01:00
Jay Foad	1590cac494	[AMDGPU] Implement moveToVALU for S_CSELECT_B64 (#70352 ) moveToVALU previously only handled S_CSELECT_B64 in the trivial case where it was semantically equivalent to a copy. Implement the general case using V_CNDMASK_B64_PSEUDO and implement post-RA expansion of V_CNDMASK_B64_PSEUDO with immediate as well as register operands.	2023-11-02 10:08:09 +00:00
Jessica Del	41cf94e6b8	[AMDGPU] - Add s_quadmask intrinsics (#70804 ) Add intrinsics to generate `s_quadmask_b32` and `s_quadmask_b64`. Support VGPR arguments by inserting a `v_readfirstlane`.	2023-11-02 10:37:52 +01:00
Jay Foad	86f2e09250	[AMDGPU] Tweak handling of GlobalAddress operands in SI_PC_ADD_REL_OFFSET (#70960 ) When SI_PC_ADD_REL_OFFSET is expanded to S_GETPC/S_ADD/S_ADDC, the GlobalAddress operands have to be adjusted by 4 or 12 bytes to account for the offset from the end of the S_GETPC instruction to the literal operands. Do this all in SIInstrInfo::expandPostRAPseudo instead of duplicating the adjustment code in both AMDGPULegalizerInfo and SITargetLowering. NFCI.	2023-11-01 19:48:30 +00:00
Jay Foad	2be251fbf4	[AMDGPU] Simplify expandPostRAPseudo for SI_PC_ADD_REL_OFFSET. NFC.	2023-11-01 15:51:20 +00:00
Jessica Del	b8d3ccdff1	[AMDGPU] - Add s_bitreplicate intrinsic (#69209 ) Add intrinsic for s_bitreplicate. Lower to S_BITREPLICATE_B64_B32 machine instruction in both GISel and Selection DAG. Support VGPR arguments by inserting a `v_readfirstlane`.	2023-10-31 11:26:45 +01:00
Stanislav Mekhanoshin	ee6d62db99	[AMDGPU] Prevent folding of the negative i32 literals as i64 (#70274 ) We can use sign extended 64-bit literals, but only for signed operands. At the moment we do not know if an operand is signed. Such an operand will be encoded as its low 32 bits and then either correctly sign extended or incorrectly zero extended by HW.	2023-10-30 08:07:43 -07:00
Christudasan Devadasan	a0eb6b88f9	[AMDGPU] Try to fix the block prologs broken by RA inserted instructions (#69924 ) The insertion point determined by RA while attempting spills and liverange split at the beginning of a block goes wrong at times, and the newly inserted vector instructions are placed before the exec-mask restore instruction, which is wrong. It occurs mainly due to the dependency on isBasicBlockPrologue, which doesn't account for early inserted instructions (spills and splits) during RA and causes the block prolog break. A better approach for deciding the insertion point should be worked out. For now, improving the helper function to consider all possible early insertions. This patch includes the spill instructions. The copies associated with liverange split should also be included in the block prolog.	2023-10-27 19:10:18 +05:30
Christudasan Devadasan	f9cd789658	[AMDGPU] Add pseudo instructions for SGPR spill to VGPR (#69923 ) For a future patch, it is important that the lowered SGPR spills are still recognized as spill instructions during regalloc. Directly lowering them into V_WRITELANE/V_READLANE won't allow us to attach the SPILL flag to their instructions. This patch introduces the pseudo instructions with the SGPRSpill flag set in their Desc. They will get lowered to equivalent instructions later during post RA pseudo expansion.	2023-10-27 17:24:10 +05:30
Jay Foad	3c58e53041	[AMDGPU] Use const reference in SIInstrInfo::buildExtractSubReg. NFC.	2023-10-26 15:42:24 +01:00
Jay Foad	7caff73e38	[AMDGPU] Assert that we can find subregs in copyPhysReg. NFC. (#70332 ) This helped to catch a codegen failure caused by #69703. MachineVerifier did not complain about this malformed COPY either before regalloc: %9:vreg_64 = COPY %17:vgpr_32 Or after regalloc: renamable $vgpr0_vgpr1 = COPY renamable $vgpr2, implicit $exec But we can at least catch the problem when copyPhysReg tries to expand it into 32-bit register moves and fails to find suitable source registers: $vgpr0 = V_MOV_B32_e32 $noreg, implicit $exec, implicit-def $vgpr0_vgpr1, implicit $vgpr2 $vgpr1 = V_MOV_B32_e32 $noreg, implicit $exec, implicit $vgpr2, implicit $exec	2023-10-26 15:39:10 +01:00
Christudasan Devadasan	16fbc45f48	Revert "[AMDGPU] Cleanup hasUnwantedEffectsWhenEXECEmpty function (#70206 )" This reverts commit 7ce613fc77af092dd6e9db71ce3747b75bc5616e.	2023-10-26 17:04:28 +05:30
Piotr Sobczak	ba3d6e0499	[AMDGPU] Rematerialize scalar loads (#68778 ) Extend the list of instructions that can be rematerialized in SIInstrInfo::isReallyTriviallyReMaterializable() to support scalar loads. Try shrinking instructions to remat only the part needed for current context. Add SIInstrInfo::reMaterialize target hook, and handle shrinking of S_LOAD_DWORDX16_IMM to S_LOAD_DWORDX8_IMM as a proof of concept.	2023-10-26 11:34:33 +02:00
Christudasan Devadasan	7ce613fc77	[AMDGPU] Cleanup hasUnwantedEffectsWhenEXECEmpty function (#70206 ) The readlane & writelane instructions don't really depend on the EXEC mask and they should return false from here.	2023-10-25 22:10:16 +05:30
Stanislav Mekhanoshin	98e95a0055	[AMDGPU] Make S_MOV_B64_IMM_PSEUDO foldable (#69483 ) With the legality checks in place it is now safe to do. S_MOV_B64 shall not be used with wide literals, thus updating the test.	2023-10-18 13:38:20 -07:00
Stanislav Mekhanoshin	47ed921985	[AMDGPU] Add legality check when folding short 64-bit literals (#69391 ) We can only fold it if it can fit into 32-bit. I believe it did not trigger yet because we do not select 64-bit literals generally.	2023-10-18 09:22:23 -07:00
Sirish Pande	28e4f97320	[AMDGPU] Save/Restore SCC bit across waterfall loop. (#68363 ) Waterfall loop is overwriting SCC bit of status register. Make sure SCC bit is saved and restored across. We need to save/restore only in cases where SCC is live across waterfall loop. Co-authored-by: Sirish Pande <sirish.pande@amd.com>	2023-10-18 08:43:29 -05:00
Stanislav Mekhanoshin	a22a1fe151	[AMDGPU] support 64-bit immediates in SIInstrInfo::FoldImmediate (#69260 ) This is a part of https://github.com/llvm/llvm-project/issues/67781. Until we select more 64-bit move immediates the impact is minimal.	2023-10-17 10:53:22 -07:00
Petar Avramovic	2fa7d652d0	AMDGPU: Fix temporal divergence introduced by machine-sink (#67456 ) Temporal divergence that was present in input or introduced in IR transforms, like code-sinking or LICM, is handled in SIFixSGPRCopies by changing sgpr source instr to vgpr instr. After 5b657f5, that moved LICM after AMDGPUCodeGenPrepare, machine-sinking can introduce temporal divergence by sinking instructions outside of the cycle. Add isSafeToSink callback in TargetInstrInfo.	2023-10-06 15:00:08 +02:00
Ivan Kosarev	f04aa1f814	[AMDGPU][CodeGen] Fold immediates in src1 operands of V_MAD/MAC/FMA/FMAC. (#68002 )	2023-10-05 14:22:29 +03:00
Ivan Kosarev	cf80defae2	[AMDGPU][GFX11] Do not rewrite V_FMA/FMAC_* to V_FMAAK_F16_t16 on operand legalization. (#66202 ) V_FMAAK_F16_t16 takes VGPR_32_Lo128 operands whereas the original instructions would have VGPR_32 operands. Switching the opcodes without updating operands' register classes leads to MachineVerifier complaining about the classes not matching instruction definitions. The problem only reveals itself on builds with expensive checks enabled because of missing -verify-machineinstrs in the test. This is the third attempt to update CodeGen/AMDGPU/fma.f16.ll to run for GFX11, following the second attempt in a1e38e0b8e3e, partially reverted in eaf737a4e004.	2023-10-04 12:41:46 +01:00
Ivan Kosarev	64482d5766	[AMDGPU] Fix passing CodeGen/AMDGPU/frem.ll on gfx1150. (#67425 ) We would currently crash on it trying to use t16 instructions instead of fake16 ones.	2023-09-26 15:13:23 +01:00
Ivan Kosarev	287f6cdd17	[AMDGPU] Remove the support for non-True16 copies between different register sizes. Differential Revision: https://reviews.llvm.org/D156985	2023-09-26 14:46:34 +01:00
Ivan Kosarev	758df22bcf	[AMDGPU][True16] Support emitting copies between different register sizes. Differential Revision: https://reviews.llvm.org/D156105	2023-09-26 12:15:34 +01:00

806 Commits