llvm-project

Author	SHA1	Message	Date
Fangrui Song	9e9907f1cf	[AMDGPU,test] Change llc -march= to -mtriple= (#75982) Similar to 806761a7629df268c8aed49657aeccffa6bca449. For IR files without a target triple, -mtriple= specifies the full target triple while -march= merely sets the architecture part of the default target triple, leaving a target triple which may not make sense, e.g. amdgpu-apple-darwin. Therefore, -march= is error-prone and not recommended for tests without a target triple. The issue has been benign as we recognize $unknown-apple-darwin as ELF instead of rejecting it outright. This patch changes AMDGPU tests to not rely on the default OS/environment components. Tests that need fixes are not changed: ``` LLVM :: CodeGen/AMDGPU/fabs.f64.ll LLVM :: CodeGen/AMDGPU/fabs.ll LLVM :: CodeGen/AMDGPU/floor.ll LLVM :: CodeGen/AMDGPU/fneg-fabs.f64.ll LLVM :: CodeGen/AMDGPU/fneg-fabs.ll LLVM :: CodeGen/AMDGPU/r600-infinite-loop-bug-while-reorganizing-vector.ll LLVM :: CodeGen/AMDGPU/schedule-if-2.ll ```	2024-01-16 21:54:58 -08:00
Nicolai Hähnle	49b492048a	AMDGPU: Fix packed 16-bit inline constants (#76522) Consistently treat packed 16-bit operands as 32-bit values, because that's really what they are. The attempt to treat them differently was ultimately incorrect and led to miscompiles, e.g. when using non-splat constants such as (1, 0) as operands. Recognize 32-bit float constants for i/u16 instructions. This is a bit odd conceptually, but it matches HW behavior and SP3. Remove isFoldableLiteralV216; there was too much magic in the dependency between it and its use in SIFoldOperands. Instead, we now simply rely on checking whether a constant is an inline constant, and trying a bunch of permutations of the low and high halves. This is more obviously correct and leads to some new cases where inline constants are used as shown by tests. Move the logic for switching packed add vs. sub into SIFoldOperands. This has two benefits: all logic that optimizes for inline constants in packed math is now in one place; and it applies to both SelectionDAG and GISel paths. Disable the use of opsel with v_dot* instructions on gfx11. They are documented to ignore opsel on src0 and src1. It may be interesting to re-enable the use of opsel on src2 as a future optimization. A similar "proper" fix of what inline constants mean could potentially be applied to unpacked 16-bit ops. However, it's less clear what the benefit would be, and there are surely places where we'd have to carefully audit whether values are properly sign- or zero-extended. It is best to keep such a change separate. Fixes: Corruption in FSR 2.0 (latent bug exposed by an LLPC change)	2024-01-04 00:10:15 +01:00
Jay Foad	0d40831765	[AMDGPU] Allow folding to FMAAK with SGPR and immediate operand on GFX10+ (#72266) Allow foldImmediate to create instructions like: v_fmaak_f32 v0, s0, v0, 0x42000000 This instruction has two "scalar values": s0 and 0x42000000. On GFX10+ this is allowed. This fold was originally implemented before the compiler supported GFX10, when all ASICs were limited to one scalar value.	2023-11-28 14:36:37 +00:00
Jay Foad	a4196666ac	[AMDGPU] Revert "Preliminary patch for divergence driven instruction selection. Operands Folding 1." (#71710) This reverts commit 201f892b3b597f24287ab6a712a286e25a45a7d9.	2023-11-13 13:53:10 +00:00
Ivan Kosarev	cf80defae2	[AMDGPU][GFX11] Do not rewrite V_FMA/FMAC_* to V_FMAAK_F16_t16 on operand legalization. (#66202) V_FMAAK_F16_t16 takes VGPR_32_Lo128 operands whereas the original instructions would have VGPR_32 operands. Switching the opcodes without updating operands' register classes leads to MachineVerifier complaining about the classes not matching instruction definitions. The problem only reveals itself on builds with expensive checks enabled because of missing -verify-machineinstrs in the test. This is the third attempt to update CodeGen/AMDGPU/fma.f16.ll to run for GFX11, following the second attempt in a1e38e0b8e3e, partially reverted in eaf737a4e004.	2023-10-04 12:41:46 +01:00
Ivan Kosarev	eaf737a4e0	[AMDGPU] Remove the GFX11 runs in CodeGen/AMDGPU/fma.f16.ll. It still fails with expensive checks enabled. This partially reverts: a1e38e0b8e3e [AMDGPU][GFX11] Add more test coverage for FMA instructions.	2023-09-12 10:30:52 +01:00
Ivan Kosarev	a1e38e0b8e	[AMDGPU][GFX11] Add more test coverage for FMA instructions. (#65935) This is another attempt to update the tests to run for GFX11. Previously done in <https://reviews.llvm.org/D153269>, and then reverted in <https://reviews.llvm.org/rG2d3e6c440244ad94777aa13566b0376eb3c088f1> due to a failure on a buildbot with expensive checks enabled. Commit 4b1702e87a2687569b197aea4721353f8b788182 fixed the problem.	2023-09-12 09:40:10 +01:00
Konstantina Mitropoulou	17fc78e7a4	[DAGCombiner] Change foldAndOrOfSETCC() to optimize and/or patterns with floating points. This reverts commit 48fa79a503a7cf380f98b6335fbd349afae1bd86. Reviewed By: brooksmoses Differential Revision: https://reviews.llvm.org/D159240	2023-08-31 11:36:50 -07:00
Konstantina Mitropoulou	48fa79a503	Revert "[DAGCombiner] Change foldAndOrOfSETCC() to optimize and/or patterns with floating points." This reverts commit 5ec13535235d07eafd64058551bc495f87c283b1.	2023-08-24 20:39:04 -07:00
Konstantina Mitropoulou	5ec1353523	[DAGCombiner] Change foldAndOrOfSETCC() to optimize and/or patterns with floating points. CMP(A,C)||CMP(B,C) => CMP(MIN/MAX(A,B), C) CMP(A,C)&&CMP(B,C) => CMP(MIN/MAX(A,B), C) If the operands are proven to be non NaN, then the optimization can be applied for all predicates. We can apply the optimization for the following predicates for FMINNUM/FMAXNUM (for quiet and signaling NaNs) and for FMINNUM_IEEE/FMAXNUM_IEEE if we can prove that the operands are not signaling NaNs. - ordered lt/le and || - ordered gt/ge and || - unordered lt/le and && - unordered gt/ge and && Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D155267	2023-08-24 10:48:56 -07:00
Jay Foad	f2c164c815	[AMDGPU] Do not wait for vscnt on function entry and return SIInsertWaitcnts inserts waitcnt instructions to resolve data dependencies. The GFX10+ vscnt (VMEM store count) counter is never used in this way. It is only used to resolve memory dependencies, and that is handled by SIMemoryLegalizer. Hence there is no need to conservatively wait for vscnt to be 0 on function entry and before returns. Differential Revision: https://reviews.llvm.org/D153537	2023-07-04 12:22:38 +01:00
Ivan Kosarev	2d3e6c4402	[AMDGPU] Drop GFX11 runs for dagcombine-fma-fmad.ll and fma.f16.ll. They cause failures on the llvm-clang-x86_64-expensive-checks-debian buildbot. This partially reverts D153269 [AMDGPU][GFX11] Add test coverage for FMA instructions.	2023-06-20 11:32:44 +01:00
Ivan Kosarev	dec42ffa28	[AMDGPU][GFX11] Add test coverage for FMA instructions. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D153269	2023-06-20 10:50:03 +01:00
Jay Foad	e2eee902a4	[AMDGPU] Fix an assertion failure when folding into src2 of V_FMAC_F16 D139469 "[AMDGPU] Enable OMod on more VOP3 instructions" caused an assertion failure when trying to fold into src2 of V_FMAC_F16. It would temporarily convert the instruction to V_FMA_F16_gfx9 and add an opsel operand, but if the fold still failed then it would forget to remove the opsel operand. Differential Revision: https://reviews.llvm.org/D144558	2023-02-22 14:26:03 +00:00
Jay Foad	afa0ed33df	[AMDGPU] Fix shrinking of F16 FMA on newer subtargets D125803 introduced shrinking of F16 FMA to FMAAK/FMAMK in SIShrinkInstructions (useful on GFX10+ where VOP3 instructions may have a literal operand) but failed to handle the V_FMA_F16_gfx9_e64 form of the opcode which is used on GFX9+. Differential Revision: https://reviews.llvm.org/D133489	2022-09-08 16:41:04 +01:00
Jay Foad	5b652f77e0	[AMDGPU] Add basic tests for emitting v_fma_f16 and friends	2022-09-08 14:40:36 +01:00

16 Commits
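
The foldAndOrOfSETCC() entry above (5ec1353523) rewrites CMP(A,C)||CMP(B,C) and CMP(A,C)&&CMP(B,C) into a single compare against MIN/MAX(A,B). The sketch below is not LLVM code. It only checks the underlying identity for the non-NaN case with the "less than" and "greater than" predicates, using Python's built-in min/max as stand-ins for FMINNUM/FMAXNUM. The function names and the sample values are assumptions made up for this illustration, and NaN operands are deliberately left out because the commit message places extra conditions on them.

```python
import itertools
import operator

# Illustrative only: Python's min/max stand in for FMINNUM/FMAXNUM,
# and every input is assumed to be non-NaN.

def or_of_compares(pred, a, b, c):
    """Unfolded form: CMP(A,C) || CMP(B,C)."""
    return pred(a, c) or pred(b, c)

def and_of_compares(pred, a, b, c):
    """Unfolded form: CMP(A,C) && CMP(B,C)."""
    return pred(a, c) and pred(b, c)

def folded_or(pred, a, b, c):
    """Folded form of the OR: one compare against MIN (lt) or MAX (gt)."""
    pick = min if pred is operator.lt else max
    return pred(pick(a, b), c)

def folded_and(pred, a, b, c):
    """Folded form of the AND: one compare against MAX (lt) or MIN (gt)."""
    pick = max if pred is operator.lt else min
    return pred(pick(a, b), c)

def identities_hold(values):
    """Exhaustively compare folded and unfolded forms over all triples."""
    for pred in (operator.lt, operator.gt):
        for a, b, c in itertools.product(values, repeat=3):
            if or_of_compares(pred, a, b, c) != folded_or(pred, a, b, c):
                return False
            if and_of_compares(pred, a, b, c) != folded_and(pred, a, b, c):
                return False
    return True

# Hypothetical sample set: finite values, signed zeros and infinities, no NaN.
SAMPLES = [-2.5, -0.0, 0.0, 1.0, 3.75, float("inf"), float("-inf")]

if __name__ == "__main__":
    print(identities_hold(SAMPLES))
```

The choice of MIN or MAX depends on both the predicate and the logical operator: "A < C or B < C" holds exactly when the smaller of A and B is below C, while "A < C and B < C" needs the larger one to be below C.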