llvm-project

Author	SHA1	Message	Date
Matt Arsenault	cd60bff329	CodeGen: Add some additional is_fpclass lowering tests Cover more cases in preparation for making greater use of fcmp based lowerings. Also add more tests for the inverted cases. Test iszero | isnan test masks. We should probably just generate every combination of test masks.	2023-03-15 01:13:08 -04:00
Simon Pilgrim	4bf004e07e	[DAG] Fold (bitcast (logicop (bitcast x), (c))) -> (logicop x, (bitcast c)) iff the current logicop type is illegal Try to remove extra bitcasts around logicops if we're dealing with illegal types Fixes the regressions in D145939 Differential Revision: https://reviews.llvm.org/D146032	2023-03-14 14:41:11 +00:00
pvanhout	1f1fea6c38	Reland: [DAG/AMDGPU] Use UniformityAnalysis in DAGISel Switch DAGISel over to UniformityAnalysis, which was one of the last remaining users of the DivergenceAnalysis. No explosions seen during internal testing so this looks like a smooth transition. Reviewed By: sameerds Differential Revision: https://reviews.llvm.org/D145918	2023-03-14 14:38:45 +01:00
pvanhout	0ea6f0e158	[AMDGPU] Don't run `llc-pipeline.ll` when expensive_checks are enabled AMDGPU ISel can add extra passes when expensive checks are enabled. This means the pipeline can be reordered and the checks may fail. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D146038	2023-03-14 14:12:36 +01:00
pvanhout	0e79106fc9	Revert "[DAG/AMDGPU] Use UniformityAnalysis in DAGISel" This reverts commit 0022b5803fd4f5a4e9fcf233267c0ffa1b88f763.	2023-03-14 11:48:58 +01:00
pvanhout	0022b5803f	[DAG/AMDGPU] Use UniformityAnalysis in DAGISel Switch DAGISel over to UniformityAnalysis, which was one of the last remaining users of the DivergenceAnalysis. No explosions seen during internal testing so this looks like a smooth transition. Reviewed By: sameerds Differential Revision: https://reviews.llvm.org/D145918	2023-03-14 11:18:28 +01:00
Chen Zheng	4f0ed16a46	Reland rGf35a09daebd0a90daa536432e62a2476f708150d and rG63854f91d3ee1056796a5ef27753648396cac6ec [DAGCombiner] handle more store value forwarding When lowering calls on target like PPC, some stack loads will be generated for by value parameters. Node CALLSEQ_START prevents such loads from being combined. Suggested by @RolandF, this patch removes the unnecessary loads for the byval parameter by extending ForwardStoreValueToDirectLoad Reviewed By: nemanjai, RolandF Differential Revision: https://reviews.llvm.org/D138899	2023-03-12 21:59:18 -04:00
Simon Pilgrim	f759275c1c	[AMDGPU] Regenerate sdwa-peephole.ll	2023-03-12 13:50:25 +00:00
Jon Chesterfield	d3dda422bf	[amdgpu][nfc] Replace ad hoc LDS frame recalculation with absolute_symbol MD Post ISel, LDS variables are absolute values. Representing them as such is simpler than the frame recalculation currently used to build assembler tables from their addresses. This is a precursor to lowering dynamic/external LDS accesses from non-kernel functions. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D144221	2023-03-12 13:47:48 +00:00
Simon Pilgrim	b53ea2b9c5	[DAG] visitAND - fold (and (any_ext V), c) -> (zero_ext (and (trunc V), c)) if profitable. Try to more aggressively narrow masks of extended values. This is mainly for cases where the mask is trying to zero out any_extended upper bits, assuming we can zext/trunc the values for free. This catches a few actual missed folds, as well as helps canonicalize a number of other cases which were being caught in isel etc. Differential Revision: https://reviews.llvm.org/D145866	2023-03-12 13:25:23 +00:00
Mirko Brkusanin	2eada459c7	[AMDGPU][MachineVerifier] Fix vdata reg count for MIMG d16 Differential Revision: https://reviews.llvm.org/D145785	2023-03-10 14:47:49 +01:00
Max Kazantsev	6b03ce374e	[LICM] Simplify (X < A && X < B) into (X < MIN(A, B)) if MIN(A, B) is loop-invariant We don't do this transform in InstCombine in general case for arbitrary values, because cost of AND and 2 ICMP's isn't higher than of MIN and ICMP. However, LICM also has a notion about the loop structure. This transform becomes profitable if `A` and `B` are loop-invariant and `X` is not: by doing this, we can compute min outside the loop. Differential Revision: https://reviews.llvm.org/D143726 Reviewed By: nikic	2023-03-10 17:36:52 +07:00
Max Kazantsev	279f0c02ad	[Test] Regenerate tests using update_llc_test_checks.py	2023-03-10 11:34:16 +07:00
Valery Pykhtin	8f6c47b7a4	[AMDGPU] Speedup GCNDownwardRPTracker::advanceBeforeNext The function makes liveness tests for the entire live register set for every instruction it passes by. This becomes very slow on high RP regions such as ASAN enabled code. Instead only uses of last tracked instruction should be tested and this greatly improves compilation time. This patch revealed few bugs in SIFormMemoryClauses and PreRARematStage::sinkTriviallyRematInsts which should be fixed first. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D136267	2023-03-09 15:18:02 +01:00
Diana Picus	99d053a97d	AMDGPU: Update checks for a couple of tests. NFC	2023-03-09 15:09:19 +01:00
Petar Avramovic	ded69779be	Fix SGPR + VGPR + offset Scratch offset folding Values in SGPR and VGPR register are treated as unsigned by hardware. When value in 32-bit SGPR or VGPR base can be negative calculate offset using 32-bit add instructions, otherwise use sgpr(unsigned) + vgpr(unsigned) + offset. LoopStrengthReduce.cpp changes offsets to negative and in some iterations value in SGPR or VGPR register could be negative. Differential Revision: https://reviews.llvm.org/D144957	2023-03-09 10:53:41 +01:00
Petar Avramovic	3ae310d0ae	Fix VGPR + offset Scratch offset folding Values in VGPR register are treated as unsigned by hardware. When value in 32-bit VGPR base can be negative calculate offset using 32-bit add instruction, otherwise use vgpr base(unsigned) + offset. Does not affect case where whole offset comes from VGPR register (immediate offset is 0). LoopStrengthReduce.cpp changes offsets to negative and in some iterations value in VGPR register could be negative. Differential Revision: https://reviews.llvm.org/D144956	2023-03-09 10:52:44 +01:00
Petar Avramovic	5e56d59999	Fix SGPR + offset Scratch offset folding Values in SGPR register are treated as unsigned by hardware. When value in 32-bit SGPR base can be negative calculate offset using 32-bit add instruction, otherwise use sgpr base(unsigned) + offset. Does not affect case where whole offset comes from SGPR register (immediate offset is 0). LoopStrengthReduce.cpp changes offsets to negative and in some iterations value in SGPR register could be negative. Differential Revision: https://reviews.llvm.org/D144955	2023-03-09 10:52:44 +01:00
Stanislav Mekhanoshin	e7ec123c6a	[AMDGPU] Implement idempotent atomic lowering This turns an idempotent atomic operation into an atomic load. Fixes: SWDEV-385135 Differential Revision: https://reviews.llvm.org/D144759	2023-03-08 14:09:59 -08:00
Stanislav Mekhanoshin	59162e3859	[AMDGPU] Skip buffer_wbl2 before atomic fence acquire Memory models for gfx90a and gfx940 do not require buffer_wbl2 before the fence for acquire ordering, but we do insert the full release. Fixes: SWDEV-386785 Differential Revision: https://reviews.llvm.org/D145524	2023-03-08 01:24:20 -08:00
Christudasan Devadasan	2171f04c12	[AMDGPU] Extend WorkGroupID* codegen for compute shaders Currently, the codegen support for llvm.amdgcn.workgroup.id* intrinsics are enabled only for compute kernels. In addition, this patch enables their selection for compute shaders on subtargets that have architected SGPRs. Differential Revision: https://reviews.llvm.org/D145045	2023-03-08 07:36:19 +05:30
Florian Hahn	7019624ee1	[SCEV] Strengthen nowrap flags via ranges for ARs on construction. At the moment, proveNoWrapViaConstantRanges is only used when creating SCEV[Zero,Sign]ExtendExprs. We can get significant improvements by strengthening flags after creating the AddRec. I'll also share a follow-up patch that removes the code to strengthen flags when creating SCEV[Zero,Sign]ExtendExprs. Modifying AddRecs while creating those can lead to surprising changes. Compile-time looks neutral: https://llvm-compile-time-tracker.com/compare.php?from=94676cf8a13c511a9acfc24ed53c98964a87bde3&to=aced434e8b103109104882776824c4136c90030d&stat=instructions:u Reviewed By: mkazantsev, nikic Differential Revision: https://reviews.llvm.org/D144050	2023-03-07 17:10:34 +01:00
pvanhout	edca49cfb7	[AMDGPU] Match med3 for (max (min ..)) We previously only matched (min (max ...)) Depends on D144728 Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D145159	2023-03-07 11:14:31 +01:00
pvanhout	764e39048c	[AMDGPU] Precommit test: v_sat_pk_u8_i16.ll Differential Revision: https://reviews.llvm.org/D144728	2023-03-07 11:07:13 +01:00
Jay Foad	5281f5c1e6	[AMDGPU] Add GFX9,GFX10,GFX11 checks for llvm.amdgcn.s.buffer.load	2023-03-06 18:19:50 +00:00
Jay Foad	e73d3150b1	[AMDGPU] Generate checks for llvm.amdgcn.s.buffer.load	2023-03-06 18:19:50 +00:00
pvanhout	036431e31e	[AMDGPU] Use UniformityAnalysis in LateCodeGenPrepare Reviewed By: foad Differential Revision: https://reviews.llvm.org/D145366	2023-03-06 13:35:57 +01:00
pvanhout	dbebebf6f6	[AMDGPU] Use UniformityAnalysis in CodeGenPrepare A little extra change was needed in UA because it didn't consider InvokeInst and it made call-constexpr.ll assert. Reviewed By: sameerds, arsenm Differential Revision: https://reviews.llvm.org/D145358	2023-03-06 13:26:51 +01:00
Jay Foad	271010bf50	[AMDGPU] Restore temporal divergence in test The loop in this test was supposed to have temporal divergence but this was broken by r367221. Fix it.	2023-03-06 12:09:52 +00:00
pvanhout	7a5d850da2	[AMDGPU] Use UniformityAnalysis in RewriteUndefsForPHI Reviewed By: foad Differential Revision: https://reviews.llvm.org/D145359	2023-03-06 12:15:33 +01:00
Matt Arsenault	9f4746b65f	AMDGPU: Combine down fcopysign f64 magnitude Copy through the low bits and only apply an f32 copysign to the high half. This is effectively what we do for codegen anyway, but this provides some combine benefits. The cases involving constants show some small improvements. https://reviews.llvm.org/D142682	2023-03-06 05:54:25 -04:00
Matt Arsenault	606a62ce27	AMDGPU: Force sign operand of f64 fcopysign to f32 The fcopysign DAG operation, unlike the IR one, allows different types for the sign and magnitude. We can reduce the bitwidth of the high operand since only the sign bit matters. The default combine only introduces mixed fcopysign operand types from fpext/fptrunc. We effectively do this already during selection, but doing it earlier in the combiner should expose new combine opportunities (e.g. the existing tests now eliminate the load of the low half of the double). Unfortunately this isn't enough to handle the case I'm interested in just yet.	2023-03-05 19:54:13 -04:00
Matt Arsenault	bd1f7c417f	AMDGPU: Try to push fneg as integer into select I initially attempted to select the source modifier from xor of a sign mask. This proved to be more difficult since foldBinOpIntoSelect does not consider free fneg of integers and undoes the combine.	2023-03-05 18:53:16 -04:00
Jeffrey Byrnes	b89236a96f	[AMDGPU] Vectorize misaligned global loads & stores Based on experimentation on gfx906,908,90a and 1030, wider global loads / stores are more performant than multiple narrower ones independent of alignment -- this is especially true when combining 8 bit loads / stores, in which case speedup was usually 2x across all alignments. Differential Revision: https://reviews.llvm.org/D145170 Change-Id: I6ee6c76e6ace7fc373cc1b2aac3818fc1425a0c1	2023-03-03 13:18:25 -08:00
Jay Foad	7442f8635b	[AMDGPU] Fix invalid instid value in s_delay_alu instruction Differential Revision: https://reviews.llvm.org/D145232	2023-03-03 21:08:26 +00:00
Jay Foad	08bdff862c	[AMDGPU] Fix error message for illegal copy	2023-03-03 11:46:01 +00:00
Jay Foad	f5ab447cf6	[AMDGPU] Add test case for AMDGPUInsertDelayAlu bug	2023-03-03 11:08:39 +00:00
Petar Avramovic	c77bd1fe15	AMDGPU: Add more flat scratch load and store tests for 8 and 16-bit types Add tests for more complicated scratch load and store patterns. Includes: - sign and zero extending loads of i8 and i16 to i32 into 32-bit register - D16 instructions that affect only high or low 16 bits of 32-bit register - D16 sign and zero extending loads of i8 to i16 into high or low 16 bits of 32-bit register - D16 loads of i16 to high or low 16 bits of 32-bit register - D16 stores of i8 and i16 from high 16 bits of 32-bit register Differential Revision: https://reviews.llvm.org/D145081	2023-03-02 13:20:14 +01:00
Anshil Gandhi	7474cd3e2e	[SIAnnotateControlFlow] Use Uniformity analysis Reviewed By: foad Differential Revision: https://reviews.llvm.org/D145013	2023-03-01 10:19:45 -07:00
Anshil Gandhi	1b52c7be91	[AMDGPUUnifyDivergentExitNodes] Use Uniformity Analysis Reviewed By: foad Differential Revision: https://reviews.llvm.org/D145018	2023-03-01 10:17:11 -07:00
Zhongyunde	15d5c59280	[InstCombine] Improvement the analytics through the dominating condition Address the dominating condition, the urem fold is benefit from the analytics improvements. Fix https://github.com/llvm/llvm-project/issues/60546 NOTE: delete the calls in simplifyBinaryIntrinsic and foldICmpWithDominatingICmp is used to reduce compile time. Reviewed By: nikic, arsenm, erikdesjardins Differential Revision: https://reviews.llvm.org/D144248	2023-03-01 17:03:34 +08:00
Anshil Gandhi	a78301560d	[AMDGPU] Replace LegacyDA with Uniformity Analysis in AnnotateUniformValues Reviewed By: sameerds Differential Revision: https://reviews.llvm.org/D144162	2023-02-28 13:05:38 -07:00
zhongyunde	d514726d31	[AMDGPU] Update the CHECK autogenerated as it's expired Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D144771	2023-02-28 02:01:25 +08:00
Diana Picus	44f1cb04e5	[AMDGPU] Run update scripts on existing tests. NFC Update a few tests where the checks aren't exactly kosher. Differential Revision: https://reviews.llvm.org/D144639	2023-02-27 09:48:57 +01:00
Jon Chesterfield	bf579a7049	[amdgpu] Change LDS lowering default to hybrid Postponed from D139433 until the bug fixed by D139874 could be resolved. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D141852	2023-02-24 15:20:12 +00:00
Jon Chesterfield	8e2f8387cc	[amdgpu] Add test case showing bug prior to D141852	2023-02-24 15:15:07 +00:00
Vikram	64fc892cda	[AMDGPU] Autogenerate carryout-selection.ll, uaddo.ll, usubo.ll (NFC) Differential Revision: https://reviews.llvm.org/D143987	2023-02-24 02:03:56 -05:00
Piotr Sobczak	ab174c57f4	[AMDGPU] Add more tests for buffer intrinsics Add more tests for buffer intrinsics with large voffsets.	2023-02-23 14:39:12 +01:00
Mirko Brkusanin	926746d22a	[AMDGPU][GFX11] Legalize and select partial NSA MIMG instructions If more registers are needed for VAddr than the NSA format allows, then the final register can act as a contiguous set of remaining addresses. Update legalizer to pack register for this new format and allow instruction selection to use NSA encoding when number of addresses exceeds max size. Also update SIShrinkInstructions to handle partial NSA. Differential Revision: https://reviews.llvm.org/D144034	2023-02-23 13:33:34 +01:00
Diana Picus	da629d3381	[AMDGPU] Add GISel RUN lines to 2 existing tests. NFC This adds a bit of coverage for GlobalISel. Differential Revision: https://reviews.llvm.org/D144555	2023-02-23 09:46:54 +01:00
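
The LICM entry in the list above (6b03ce374e) rewrites `(X < A && X < B)` as `(X < MIN(A, B))` when `A` and `B` are loop-invariant and `X` is not, so the minimum can be computed once outside the loop. The following is only a source-level sketch of that equivalence in C++, not the pass itself: the function names `count_below_both` and `count_below_min` are hypothetical, and signed `int` comparisons are assumed for illustration.

```cpp
#include <algorithm>
#include <vector>

// Form before the transform: two comparisons and an AND per iteration.
// `a` and `b` are loop-invariant; `x` changes every iteration.
int count_below_both(const std::vector<int>& xs, int a, int b) {
    int n = 0;
    for (int x : xs) {
        if (x < a && x < b) {
            ++n;
        }
    }
    return n;
}

// Form after the transform: the invariant min(a, b) is computed once
// before the loop, leaving a single comparison per iteration.
int count_below_min(const std::vector<int>& xs, int a, int b) {
    const int m = std::min(a, b);  // hoisted out of the loop
    int n = 0;
    for (int x : xs) {
        if (x < m) {
            ++n;
        }
    }
    return n;
}
```

Both functions return the same count for any input, because `x < a && x < b` holds exactly when `x` is below the smaller of `a` and `b`. As the commit message notes, the rewrite only pays off when the minimum can be moved out of the loop.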

6233 Commits