llvm-project

Author	SHA1	Message	Date
Austin Kerbow	864a2b25be	[AMDGPU] Reserve extra SGPR blocks with XNACK "any" TID Setting ASMPrinter was relying on feature bits to set up extra SGPRs in the kernel descriptor for the xnack_mask. This was broken for the dynamic XNACK "any" TID setting which could cause user SGPRs to be clobbered if the number of SGPRs reserved was near a granulated block boundary. When XNACK was enabled this worked correctly in the ASMParser which meant some kernels were only failing without "-save-temps". Fixes: SWDEV-382764 Reviewed By: kzhuravl Differential Revision: https://reviews.llvm.org/D145401	2023-03-17 20:26:23 -07:00
Matt Arsenault	9356ec1516	CodeGen: Reorder case handling for is.fpclass legalization Subnormal and zero checks can be combined into one, so move the code closer to reduce the diff in a future change.	2023-03-17 11:29:50 -04:00
Vitaly Buka	aa15fe98b6	Revert "[AMDGPUUnifyDivergentExitNodes] Add NewPM support" Introduces nullptr dereference. This reverts commit a5455e32b364dabe499ec11722626d4bbaf047ba.	2023-03-16 19:03:46 -07:00
Mirko Brkusanin	d5c0c1b6f0	[AMDGPU] Select flat atomic fmin/fmax Also disables global atomic fmin/fmax x2 patterns on gfx11 Differential Revision: https://reviews.llvm.org/D146137	2023-03-16 18:07:26 +01:00
Anshil Gandhi	a5455e32b3	[AMDGPUUnifyDivergentExitNodes] Add NewPM support Meanwhile, use UniformityAnalysis instead of LegacyDivergenceAnalysis to collect divergence info. Reviewed By: arsenm, sameerds Differential Revision: https://reviews.llvm.org/D141355	2023-03-16 16:13:29 +00:00
Nikita Popov	bbfb13a5ff	[ConstExpr] Remove select constant expression This removes the select constant expression, as part of https://discourse.llvm.org/t/rfc-remove-most-constant-expressions/63179. Uses of this expression have already been removed in advance, so this just removes related infrastructure and updates tests. Differential Revision: https://reviews.llvm.org/D145382	2023-03-16 10:32:08 +01:00
Konstantina Mitropoulou	6bc5aa592a	[AMDGPU] Update mul.ll with auto-generated checks Reviewed By: foad Differential Revision: https://reviews.llvm.org/D145990	2023-03-15 08:16:28 -07:00
pvanhout	723a53caaf	[AMDGPU] Avoid constant bus limitation on V_BFE GISel pattern For D141247 - if that pattern was used by GISel it could cause constant bus limitation failures. Just use inline immediates instead of S_MOV to avoid the issue. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D146131	2023-03-15 15:01:33 +01:00
pvanhout	f90849dfa3	[AMDGPU] Use UniformityAnalysis in AtomicOptimizer Adds & uses a new `isDivergentUse` API in UA. UniformityAnalysis now requires CycleInfo as well as the new temporal divergence API can query it. ----- Original patch that adds `isDivergentUse` by @sameerds The user of a temporally divergent value is marked as divergent in the uniformity analysis. But the same user may also have been marked divergent for other reasons, thus losing this information about temporal divergence. But some clients need to specifically check for temporal divergence. This change restores such an API, that already existed in DivergenceAnalysis. Reviewed By: sameerds, foad Differential Revision: https://reviews.llvm.org/D146018	2023-03-15 09:39:55 +01:00
pvanhout	64b45db34a	[AMDGPU] Select v_sat_pk_u8_i16 The backend knew about `v_sat_pk_u8_i16` but never made use of it. This patch adds selection patterns (DAG/GISel) for that instruction. I think it'll be very rarely used, but at least it's possible to use it. Solves #58266 (https://github.com/llvm/llvm-project/issues/58266) Reviewed By: foad Differential Revision: https://reviews.llvm.org/D144729	2023-03-15 09:36:12 +01:00
Matt Arsenault	cd60bff329	CodeGen: Add some additional is_fpclass lowering tests Cover more cases in preparation for making greater use of fcmp based lowerings. Also add more tests for the inverted cases. Test iszero | isnan test masks. We should probably just generate every combination of test masks.	2023-03-15 01:13:08 -04:00
Simon Pilgrim	4bf004e07e	[DAG] Fold (bitcast (logicop (bitcast x), (c))) -> (logicop x, (bitcast c)) iff the current logicop type is illegal Try to remove extra bitcasts around logicops if we're dealing with illegal types Fixes the regressions in D145939 Differential Revision: https://reviews.llvm.org/D146032	2023-03-14 14:41:11 +00:00
pvanhout	1f1fea6c38	Reland: [DAG/AMDGPU] Use UniformityAnalysis in DAGISel Switch DAGISel over to UniformityAnalysis, which was one of the last remaining users of the DivergenceAnalysis. No explosions seen during internal testing so this looks like a smooth transition. Reviewed By: sameerds Differential Revision: https://reviews.llvm.org/D145918	2023-03-14 14:38:45 +01:00
pvanhout	0ea6f0e158	[AMDGPU] Don't run `llc-pipeline.ll` when expensive_checks are enabled AMDGPU ISel can add extra passes when expensive checks are enabled. This means the pipeline can be reordered and the checks may fail. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D146038	2023-03-14 14:12:36 +01:00
pvanhout	0e79106fc9	Revert "[DAG/AMDGPU] Use UniformityAnalysis in DAGISel" This reverts commit 0022b5803fd4f5a4e9fcf233267c0ffa1b88f763.	2023-03-14 11:48:58 +01:00
pvanhout	0022b5803f	[DAG/AMDGPU] Use UniformityAnalysis in DAGISel Switch DAGISel over to UniformityAnalysis, which was one of the last remaining users of the DivergenceAnalysis. No explosions seen during internal testing so this looks like a smooth transition. Reviewed By: sameerds Differential Revision: https://reviews.llvm.org/D145918	2023-03-14 11:18:28 +01:00
Chen Zheng	4f0ed16a46	Reland rGf35a09daebd0a90daa536432e62a2476f708150d and rG63854f91d3ee1056796a5ef27753648396cac6ec [DAGCombiner] handle more store value forwarding When lowering calls on target like PPC, some stack loads will be generated for by value parameters. Node CALLSEQ_START prevents such loads from being combined. Suggested by @RolandF, this patch removes the unnecessary loads for the byval parameter by extending ForwardStoreValueToDirectLoad Reviewed By: nemanjai, RolandF Differential Revision: https://reviews.llvm.org/D138899	2023-03-12 21:59:18 -04:00
Simon Pilgrim	f759275c1c	[AMDGPU] Regenerate sdwa-peephole.ll	2023-03-12 13:50:25 +00:00
Jon Chesterfield	d3dda422bf	[amdgpu][nfc] Replace ad hoc LDS frame recalculation with absolute_symbol MD Post ISel, LDS variables are absolute values. Representing them as such is simpler than the frame recalculation currently used to build assembler tables from their addresses. This is a precursor to lowering dynamic/external LDS accesses from non-kernel functions. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D144221	2023-03-12 13:47:48 +00:00
Simon Pilgrim	b53ea2b9c5	[DAG] visitAND - fold (and (any_ext V), c) -> (zero_ext (and (trunc V), c)) if profitable. Try to more aggressively narrow masks of extended values. This is mainly for cases where the mask is trying to zero out any_extended upper bits, assuming we can zext/trunc the values for free. This catches a few actual missed folds, as well as helps canonicalize a number of other cases which were being caught in isel etc. Differential Revision: https://reviews.llvm.org/D145866	2023-03-12 13:25:23 +00:00
Mirko Brkusanin	2eada459c7	[AMDGPU][MachineVerifier] Fix vdata reg count for MIMG d16 Differential Revision: https://reviews.llvm.org/D145785	2023-03-10 14:47:49 +01:00
Max Kazantsev	6b03ce374e	[LICM] Simplify (X < A && X < B) into (X < MIN(A, B)) if MIN(A, B) is loop-invariant We don't do this transform in InstCombine in general case for arbitrary values, because cost of AND and 2 ICMP's isn't higher than of MIN and ICMP. However, LICM also has a notion about the loop structure. This transform becomes profitable if `A` and `B` are loop-invariant and `X` is not: by doing this, we can compute min outside the loop. Differential Revision: https://reviews.llvm.org/D143726 Reviewed By: nikic	2023-03-10 17:36:52 +07:00
Max Kazantsev	279f0c02ad	[Test] Regenerate tests using update_llc_test_checks.py	2023-03-10 11:34:16 +07:00
Valery Pykhtin	8f6c47b7a4	[AMDGPU] Speedup GCNDownwardRPTracker::advanceBeforeNext The function makes liveness tests for the entire live register set for every instruction it passes by. This becomes very slow on high RP regions such as ASAN enabled code. Instead only uses of last tracked instruction should be tested and this greatly improves compilation time. This patch revealed few bugs in SIFormMemoryClauses and PreRARematStage::sinkTriviallyRematInsts which should be fixed first. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D136267	2023-03-09 15:18:02 +01:00
Diana Picus	99d053a97d	AMDGPU: Update checks for a couple of tests. NFC	2023-03-09 15:09:19 +01:00
Petar Avramovic	ded69779be	Fix SGPR + VGPR + offset Scratch offset folding Values in SGPR and VGPR register are treated as unsigned by hardware. When value in 32-bit SGPR or VGPR base can be negative calculate offset using 32-bit add instructions, otherwise use sgpr(unsigned) + vgpr(unsigned) + offset. LoopStrengthReduce.cpp changes offsets to negative and in some iterations value in SGPR or VGPR register could be negative. Differential Revision: https://reviews.llvm.org/D144957	2023-03-09 10:53:41 +01:00
Petar Avramovic	3ae310d0ae	Fix VGPR + offset Scratch offset folding Values in VGPR register are treated as unsigned by hardware. When value in 32-bit VGPR base can be negative calculate offset using 32-bit add instruction, otherwise use vgpr base(unsigned) + offset. Does not affect case where whole offset comes from VGPR register (immediate offset is 0). LoopStrengthReduce.cpp changes offsets to negative and in some iterations value in VGPR register could be negative. Differential Revision: https://reviews.llvm.org/D144956	2023-03-09 10:52:44 +01:00
Petar Avramovic	5e56d59999	Fix SGPR + offset Scratch offset folding Values in SGPR register are treated as unsigned by hardware. When value in 32-bit SGPR base can be negative calculate offset using 32-bit add instruction, otherwise use sgpr base(unsigned) + offset. Does not affect case where whole offset comes from SGPR register (immediate offset is 0). LoopStrengthReduce.cpp changes offsets to negative and in some iterations value in SGPR register could be negative. Differential Revision: https://reviews.llvm.org/D144955	2023-03-09 10:52:44 +01:00
Stanislav Mekhanoshin	e7ec123c6a	[AMDGPU] Implement idempotent atomic lowering This turns an idempotent atomic operation into an atomic load. Fixes: SWDEV-385135 Differential Revision: https://reviews.llvm.org/D144759	2023-03-08 14:09:59 -08:00
Stanislav Mekhanoshin	59162e3859	[AMDGPU] Skip buffer_wbl2 before atomic fence acquire Memory models for gfx90a and gfx940 do not require buffer_wbl2 before the fence for acquire ordering, but we do insert the full release. Fixes: SWDEV-386785 Differential Revision: https://reviews.llvm.org/D145524	2023-03-08 01:24:20 -08:00
Christudasan Devadasan	2171f04c12	[AMDGPU] Extend WorkGroupID* codegen for compute shaders Currently, the codegen support for llvm.amdgcn.workgroup.id* intrinsics is enabled only for compute kernels. In addition, this patch enables their selection for compute shaders on subtargets that have architected SGPRs. Differential Revision: https://reviews.llvm.org/D145045	2023-03-08 07:36:19 +05:30
Florian Hahn	7019624ee1	[SCEV] Strengthen nowrap flags via ranges for ARs on construction. At the moment, proveNoWrapViaConstantRanges is only used when creating SCEV[Zero,Sign]ExtendExprs. We can get significant improvements by strengthening flags after creating the AddRec. I'll also share a follow-up patch that removes the code to strengthen flags when creating SCEV[Zero,Sign]ExtendExprs. Modifying AddRecs while creating those can lead to surprising changes. Compile-time looks neutral: https://llvm-compile-time-tracker.com/compare.php?from=94676cf8a13c511a9acfc24ed53c98964a87bde3&to=aced434e8b103109104882776824c4136c90030d&stat=instructions:u Reviewed By: mkazantsev, nikic Differential Revision: https://reviews.llvm.org/D144050	2023-03-07 17:10:34 +01:00
pvanhout	edca49cfb7	[AMDGPU] Match med3 for (max (min ..)) We previously only matched (min (max ...)) Depends on D144728 Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D145159	2023-03-07 11:14:31 +01:00
pvanhout	764e39048c	[AMDGPU] Precommit test: v_sat_pk_u8_i16.ll Differential Revision: https://reviews.llvm.org/D144728	2023-03-07 11:07:13 +01:00
Jay Foad	5281f5c1e6	[AMDGPU] Add GFX9,GFX10,GFX11 checks for llvm.amdgcn.s.buffer.load	2023-03-06 18:19:50 +00:00
Jay Foad	e73d3150b1	[AMDGPU] Generate checks for llvm.amdgcn.s.buffer.load	2023-03-06 18:19:50 +00:00
pvanhout	036431e31e	[AMDGPU] Use UniformityAnalysis in LateCodeGenPrepare Reviewed By: foad Differential Revision: https://reviews.llvm.org/D145366	2023-03-06 13:35:57 +01:00
pvanhout	dbebebf6f6	[AMDGPU] Use UniformityAnalysis in CodeGenPrepare A little extra change was needed in UA because it didn't consider InvokeInst and it made call-constexpr.ll assert. Reviewed By: sameerds, arsenm Differential Revision: https://reviews.llvm.org/D145358	2023-03-06 13:26:51 +01:00
Jay Foad	271010bf50	[AMDGPU] Restore temporal divergence in test The loop in this test was supposed to have temporal divergence but this was broken by r367221. Fix it.	2023-03-06 12:09:52 +00:00
pvanhout	7a5d850da2	[AMDGPU] Use UniformityAnalysis in RewriteUndefsForPHI Reviewed By: foad Differential Revision: https://reviews.llvm.org/D145359	2023-03-06 12:15:33 +01:00
Matt Arsenault	9f4746b65f	AMDGPU: Combine down fcopysign f64 magnitude Copy through the low bits and only apply an f32 copysign to the high half. This is effectively what we do for codegen anyway, but this provides some combine benefits. The cases involving constants show some small improvements. https://reviews.llvm.org/D142682	2023-03-06 05:54:25 -04:00
Matt Arsenault	606a62ce27	AMDGPU: Force sign operand of f64 fcopysign to f32 The fcopysign DAG operation, unlike the IR one, allows different types for the sign and magnitude. We can reduce the bitwidth of the high operand since only the sign bit matters. The default combine only introduces mixed fcopysign operand types from fpext/fptrunc. We effectively do this already during selection, but doing it earlier in the combiner should expose new combine opportunities (e.g. the existing tests now eliminate the load of the low half of the double). Unfortunately this isn't enough to handle the case I'm interested in just yet.	2023-03-05 19:54:13 -04:00
Matt Arsenault	bd1f7c417f	AMDGPU: Try to push fneg as integer into select I initially attempted to select the source modifier from xor of a sign mask. This proved to be more difficult since foldBinOpIntoSelect does not consider free fneg of integers and undoes the combine.	2023-03-05 18:53:16 -04:00
Jeffrey Byrnes	b89236a96f	[AMDGPU] Vectorize misaligned global loads & stores Based on experimentation on gfx906,908,90a and 1030, wider global loads / stores are more performant than multiple narrower ones independent of alignment -- this is especially true when combining 8 bit loads / stores, in which case speedup was usually 2x across all alignments. Differential Revision: https://reviews.llvm.org/D145170 Change-Id: I6ee6c76e6ace7fc373cc1b2aac3818fc1425a0c1	2023-03-03 13:18:25 -08:00
Jay Foad	7442f8635b	[AMDGPU] Fix invalid instid value in s_delay_alu instruction Differential Revision: https://reviews.llvm.org/D145232	2023-03-03 21:08:26 +00:00
Jay Foad	08bdff862c	[AMDGPU] Fix error message for illegal copy	2023-03-03 11:46:01 +00:00
Jay Foad	f5ab447cf6	[AMDGPU] Add test case for AMDGPUInsertDelayAlu bug	2023-03-03 11:08:39 +00:00
Petar Avramovic	c77bd1fe15	AMDGPU: Add more flat scratch load and store tests for 8 and 16-bit types Add tests for more complicated scratch load and store patterns. Includes: - sign and zero extending loads of i8 and i16 to i32 into 32-bit register - D16 instructions that affect only high or low 16 bits of 32-bit register - D16 sign and zero extending loads of i8 to i16 into high or low 16 bits of 32-bit register - D16 loads of i16 to high or low 16 bits of 32-bit register - D16 stores of i8 and i16 from high 16 bits of 32-bit register Differential Revision: https://reviews.llvm.org/D145081	2023-03-02 13:20:14 +01:00
Anshil Gandhi	7474cd3e2e	[SIAnnotateControlFlow] Use Uniformity analysis Reviewed By: foad Differential Revision: https://reviews.llvm.org/D145013	2023-03-01 10:19:45 -07:00
Anshil Gandhi	1b52c7be91	[AMDGPUUnifyDivergentExitNodes] Use Uniformity Analysis Reviewed By: foad Differential Revision: https://reviews.llvm.org/D145018	2023-03-01 10:17:11 -07:00
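
The LICM entry above (6b03ce374e) describes rewriting (X < A && X < B) into (X < MIN(A, B)) when A and B are loop-invariant and X is not. The following is only a source-level sketch of that idea, not code from the patch: the function names countBelowBoth and countBelowMin are hypothetical, and signed 32-bit comparison is assumed for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical "before" form: two comparisons run on every iteration,
// even though a and b never change inside the loop.
int countBelowBoth(const std::vector<int32_t> &xs, int32_t a, int32_t b) {
  int count = 0;
  for (int32_t x : xs)
    if (x < a && x < b)
      ++count;
  return count;
}

// Hypothetical "after" form: min(a, b) is loop-invariant, so it is computed
// once outside the loop and each iteration performs a single comparison.
int countBelowMin(const std::vector<int32_t> &xs, int32_t a, int32_t b) {
  const int32_t bound = std::min(a, b);
  int count = 0;
  for (int32_t x : xs)
    if (x < bound)
      ++count;
  return count;
}
```

The two forms agree because x < a && x < b holds exactly when x is below the smaller of a and b. As the commit message notes, the rewrite only pays off when the min can be computed outside the loop; for arbitrary values it is not cheaper than the AND of two compares.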


6243 Commits