llvm-project

Author	SHA1	Message	Date
Florian Hahn	b06a45c66f	[VPlan] Add all blocks to outer loop if present during ::execute (NFCI). This ensures that all blocks created during VPlan execution are properly added to an enclosing loop, if present. Split off from https://github.com/llvm/llvm-project/pull/108378 and also needed once more of the skeleton blocks are created directly via VPlan. This also allows removing the custom logic for early-exit loop vectorization added as part of https://github.com/llvm/llvm-project/pull/117008.	2024-12-31 19:34:34 +00:00
Simon Pilgrim	b195bb87e1	[VectorCombine] scalarizeLoadExtract - consistently use LoadInst and ExtractElementInst specific operand getters. NFC. Noticed while investigating the hung builds reported after af83093933ca73bc82c33130f8bda9f1ae54aae2	2024-12-31 14:42:39 +00:00
Florian Hahn	ddef380cd6	[VPlan] Move simplifyRecipe(s) definitions up to allow re-use (NFC) Move definitions to allow easy reuse in https://github.com/llvm/llvm-project/pull/108378.	2024-12-31 13:23:19 +00:00
Muhammad Omair Javaid	332d2647ff	Revert "[LV]: Teach LV to recursively (de)interleave. (#89018)" This reverts commit ccfe0de0e1e37ed369c9bf89dd0188ba0afb2e9a. This breaks the LLVM build on the AArch64 SVE Linux buildbots: https://lab.llvm.org/buildbot/#/builders/143/builds/4462 https://lab.llvm.org/buildbot/#/builders/17/builds/4902 https://lab.llvm.org/buildbot/#/builders/4/builds/4399 https://lab.llvm.org/buildbot/#/builders/41/builds/4299	2024-12-31 03:12:24 +05:00
Simon Pilgrim	d5a96eb125	Revert af83093933ca73bc82c33130f8bda9f1ae54aae2 "[VectorCombine] eraseInstruction - ensure we reattempt to fold other users of an erased instruction's operands" Reports of hung builds, but I don't have time to investigate at the moment.	2024-12-30 21:20:56 +00:00
Simon Pilgrim	af83093933	[VectorCombine] eraseInstruction - ensure we reattempt to fold other users of an erased instruction's operands. As we're reducing the use count of the operands it's more likely that they will now fold, as they were previously being prevented by a m_OneUse check, or the cost of retaining the extra instruction had been too high. This is necessary for some upcoming patches, although the only change so far is instruction ordering as it allows some SSE folds of 256/512-bit with 128-bit subvectors to occur earlier in foldShuffleToIdentity as the subvector concats are free. Pulled out of #120984	2024-12-30 17:52:42 +00:00
Florian Hahn	16d19aaedf	[VPlan] Manage created blocks directly in VPlan. (NFC) (#120918) This patch changes the way blocks are managed by VPlan. Previously all blocks reachable from entry would be cleaned up when a VPlan is destroyed. With this patch, each VPlan keeps track of blocks created for it in a list and this list is then used to delete all blocks in the list when the VPlan is destroyed. To do so, block creation is funneled through helpers directly in VPlan. The main advantage of doing so is it simplifies CFG transformations, as those do not have to take care of deleting any blocks, just adjusting the CFG. This helps to simplify https://github.com/llvm/llvm-project/pull/108378 and https://github.com/llvm/llvm-project/pull/106748. This also simplifies handling of 'immutable' blocks a VPlan holds references to, which at the moment only include the scalar header block. PR: https://github.com/llvm/llvm-project/pull/120918	2024-12-30 12:08:12 +00:00
Florian Hahn	7f3428d3ed	[VPlan] Compute induction end values in VPlan. (#112145) Use createDerivedIV to compute IV end values directly in VPlan, instead of creating them up-front. This allows updating IV users outside the loop as follow-up. Depends on https://github.com/llvm/llvm-project/pull/110004 and https://github.com/llvm/llvm-project/pull/109975. PR: https://github.com/llvm/llvm-project/pull/112145	2024-12-29 19:05:08 +00:00
Simon Pilgrim	f2f02b21cd	[VectorCombine] foldShuffleOfBinops - only accept exact matching cmp predicates. m_SpecificCmp allowed equivalent predicate+flags which don't necessarily work after being folded from "shuffle (cmpop), (cmpop)" into "cmpop (shuffle), (shuffle)". Fixes #121110	2024-12-28 09:21:31 +00:00
Fangrui Song	edc42b2dc1	[SLP] Migrate away from PointerUnion::get	2024-12-27 21:01:09 -08:00
Zequan Wu	4d8f9594b2	Revert "Reland "[LoopVectorizer] Add support for partial reductions" (#120721)" This reverts commit c858bf620c3ab2a4db53e84b9365b553c3ad1aa6 as it causes an optimization crash at -O2, see https://github.com/llvm/llvm-project/pull/120721#issuecomment-2563192057	2024-12-27 11:51:54 -08:00
Florian Hahn	8caeb2e0c2	[VPlan] Always create initial blocks in constructor (NFC). Update C++ unit tests to use VPlanTestBase to construct initial VPlan, using a constructor that creates the VP blocks directly in the constructor. Split off from and in preparation for https://github.com/llvm/llvm-project/pull/120918.	2024-12-27 17:43:22 +00:00
Alexey Bataev	07ba457525	[SLP][NFC]Add dump of combined entries, where applicable	2024-12-27 07:56:10 -08:00
Hassnaa Hamdi	ccfe0de0e1	[LV]: Teach LV to recursively (de)interleave. (#89018) Currently available intrinsics are only ld2/st2, which don't support interleaving factor > 2. This patch teaches the LV to use ld2/st2 recursively to support high interleaving factors.	2024-12-27 12:42:07 +00:00
Elvis Wang	47e1c87a61	[VPlan] Set debug location for VPReduction/VPWidenIntrinsicRecipe. (#120054) This patch adds the missing debug location for VPReduction/VPWidenIntrinsicRecipe.	2024-12-27 10:37:21 +08:00
Florian Hahn	2dfe1b4042	[VPlan] Remove stray space when printing reverse vector pointer. printFlags() takes care of printing the required space, remove the extra printed space between flags and operands.	2024-12-26 21:26:17 +00:00
Alexey Bataev	889215a30e	[SLP]Followup fix for the poisonous logical op in reductions. If the VectorizedTree may still generate a poisonous value, but it is not the original operand of the reduction op, we need to check whether Res is still the operand in order to generate correct code. Fixes #114905	2024-12-26 05:11:26 -08:00
LiqinWeng	b5f0ec80d5	[VPlan] Remove redundant printing final in VPlan::execute (#121048) Multiple prints will cause problems when testing ir-bb	2024-12-25 10:11:02 +08:00
Alexey Bataev	07d284d4eb	[SLP]Add cost estimation for gather node reshuffling. Adds cost estimation for the variants of the permutations of the scalar values, used in gather nodes. Currently, SLP just unconditionally emits shuffles for the reused buildvectors, but in some cases it is better to leave them as buildvectors rather than shuffles, if the cost of such buildvectors is better.
X86, AVX512, -O3+LTO
Metric: size..text
Program                                                                         size..text
                                                                                    results     results0   diff
test-suite :: External/SPEC/CINT2006/445.gobmk/445.gobmk.test                     912998.00    913238.00   0.0%
test-suite :: MultiSource/Benchmarks/MiBench/consumer-lame/consumer-lame.test     203070.00    203102.00   0.0%
test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test        1396320.00   1396448.00   0.0%
test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test         1396320.00   1396448.00   0.0%
test-suite :: MultiSource/Benchmarks/Bullet/bullet.test                           309790.00    309678.00  -0.0%
test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test        12477607.00  12470807.00  -0.1%
CINT2006/445.gobmk - extra code vectorized
MiBench/consumer-lame - small variations
CFP2017speed/638.imagick_s
CFP2017rate/538.imagick_r - extra vectorized code
Benchmarks/Bullet - extra code vectorized
CFP2017rate/526.blender_r - extra vector code
RISC-V, sifive-p670, -O3+LTO
CFP2006/433.milc - regressions, should be fixed by https://github.com/llvm/llvm-project/pull/115173
CFP2006/453.povray - extra vectorized code
CFP2017rate/508.namd_r - better vector code
CFP2017rate/510.parest_r - extra vectorized code
SPEC/CFP2017rate - extra/better vector code
CFP2017rate/526.blender_r - extra vectorized code
CFP2017rate/538.imagick_r - extra vectorized code
CINT2006/403.gcc - extra vectorized code
CINT2006/445.gobmk - extra vectorized code
CINT2006/464.h264ref - extra vectorized code
CINT2006/483.xalancbmk - small variations
CINT2017rate/525.x264_r - better vectorization
Reviewers: RKSimon
Reviewed By: RKSimon
Pull Request: https://github.com/llvm/llvm-project/pull/115201	2024-12-24 15:35:29 -05:00
Florian Hahn	2d038caeeb	[VPlan] Remove stray space when printing VPWidenCastRecipe. printFlags() already takes care of printing a single space if there are no flags. Remove the extra space when printing a recipe without flags.	2024-12-24 20:23:48 +00:00
Alexey Bataev	852feea820	[SLP]Propagate AssumptionCache where possible	2024-12-24 09:20:26 -08:00
Alexey Bataev	0d6cb0ae9d	[SLP]Fix strict weak ordering criterion in comparators Fixes #121019	2024-12-24 08:13:57 -08:00
Alexey Bataev	f0f8dab712	[SLP]Check if the first reduced value requires freeze/swap, if it may be too poisonous. If several reduced values are combined and the first reduced value is just the original reduced value of the bool logical op, it needs to be frozen to prevent the propagation of the poison value. Fixes #114905	2024-12-24 07:40:35 -08:00
Sam Tebbs	c858bf620c	Reland "[LoopVectorizer] Add support for partial reductions" (#120721) This re-lands the reverted #92418. When the VF is small enough so that dividing the VF by the scaling factor results in 1, the reduction phi execution thinks the VF is scalar and sets the reduction's output as a scalar value, tripping assertions expecting a vector value. The latest commit in this PR fixes that by using `State.VF` in the scalar check, rather than the divided VF. --------- Co-authored-by: Nicholas Guy <nicholas.guy@arm.com>	2024-12-24 12:08:17 +00:00
Alexey Bataev	030829a7e5	[SLP]Drop samesign flag if the vector node has reduced bitwidth. If the operands of the icmp instructions have reduced bitwidth after MinBitwidth analysis, the samesign flag must be dropped to preserve correctness of the transformation. Fixes #120823	2024-12-23 16:55:11 -08:00
Benjamin Maxwell	9ab5474e56	[LV] Rename `ToVectorTy` to `toVectorTy` (NFC) (#120404) This is for consistency with other helpers (and also follows the LLVM naming conventions).	2024-12-23 23:33:44 +00:00
Florian Hahn	c7a777322d	[VPlan] Replace else-if dyn_cast with cast (NFC). The recipes handled here are either VPWidenIntrinsic or VPWidenCast, so replace the else-if dyn_cast with a single else + cast.	2024-12-23 19:46:22 +00:00
Simon Pilgrim	e3f8c229f5	[VectorCombine] foldInsExtVectorToShuffle - inserting into a poison base vector can be modelled as a single src shuffle. We already canonicalized an undef base vector to the RHS to improve further folding; this extends that to improve the shuffle cost estimate of the single src shuffle	2024-12-23 15:49:17 +00:00
Simon Pilgrim	29c89d7265	[VectorCombine] foldShuffleOfShuffles - fold "shuffle (shuffle x, y, m1), (shuffle y, x, m2)" -> "shuffle x, y, m3" (#120959) foldShuffleOfShuffles currently only folds unary shuffles to ensure we don't end up with a merged shuffle with more than 2 sources, but this prevented cases where both shuffles were sharing sources. This patch generalizes the merge process to find up to 2 sources as it merges with the inner shuffles; it also moves the undef/poison handling stages into the merge loop as well. Fixes #120764	2024-12-23 14:56:15 +00:00
Han-Kuan Chen	11676da808	[SLP] Normalize debug messages for newTreeEntry. (#119514) A debug message should follow after newTreeEntry. Make ExtractValueInst and ExtractElementInst use setOperand directly.	2024-12-23 21:42:02 +08:00
LiqinWeng	b1fab4f849	[LV][VPlan] Initialize the variable 'VPID' of the createEVLRecipe (#120926) Resolve the compilation error caused by the merge issue: #119510	2024-12-23 09:23:22 +08:00
LiqinWeng	8a51471d83	[LV][VPlan] Extract the implementation of transform Recipe to EVLRecipe into a small function. NFC (#119510)	2024-12-23 08:28:19 +08:00
Simon Pilgrim	bf873aa3ec	[VectorCombine] foldShuffleToIdentity - add debug message for match. Helps with debugging to show that the fold found the match.	2024-12-22 17:21:44 +00:00
Simon Pilgrim	f96337e04e	[VectorCombine] foldConcatOfBoolMasks - add debug message for match + cost-comparison. Helps with debugging to show that the fold found the match, and shows the old + new costs to indicate whether the fold was/wasn't profitable.	2024-12-22 16:21:02 +00:00
Florian Hahn	e1833e3a7e	[VPlan] Simplify redundant VPDerivedIVRecipe (NFC). Split DerivedIV simplification off from https://github.com/llvm/llvm-project/pull/112145 and use to remove the need for extra checks in createScalarIVSteps. Required an extra simplification run after IV transforms.	2024-12-22 09:39:19 +00:00
LiqinWeng	86fa35ce7e	[LV][VPlan] Use opcode to retrieve the VPID of the CallRecipe, rather than underlying instruction (#120816) This patch may cause the flags in the CallRecipe to be lost after EVL transformation, and it has been addressed in the patch: #119847	2024-12-22 10:28:20 +08:00
Florian Hahn	9b496deb90	[VPlan] Set and use debug location for VPPredInstPHIRecipe. Update the recipe to always set its debug location and use it during IR generation.	2024-12-21 21:57:47 +00:00
Florian Hahn	bb86c5dd4d	[VPlan] Use inferScalarType in VPInstruction::ResumePhi codegen (NFC). Use VPlan-based type analysis to retrieve type of phi node. Also adds missing type inference for ResumePhi and ComputeReductionResult opcodes.	2024-12-21 15:55:21 +00:00
vporpo	7a38445ee2	[SandboxVec][DAG] Register move instr callback (#120146) This patch implements the move instruction notifier for the DAG. Whenever an instruction moves the notifier will maintain the DAG.	2024-12-20 23:10:24 -08:00
Simon Pilgrim	82b5bda42c	[VectorCombine] Add "VC: Erasing" debug message to help the log show when dead WorkList instructions are erased.	2024-12-20 17:59:14 +00:00
Simon Pilgrim	e3157d3f0d	[VectorCombine] foldBitcastShuffle - add debug message for match + cost-comparison. Helps with debugging to show that the fold found the match, and shows the old + new costs to indicate whether the fold was/wasn't profitable.	2024-12-20 17:59:13 +00:00
David Green	70eac255b8	[VectorCombine] Add fp cast handling for shuffletoidentity (#120641) This fixes some regressions from recent changes to vector combine in #120216. It allows shuffleToIdentity to look through fp casts as other casts, and makes sure mismatching vector types in splats and casts do not block the transform, as only the lanes should matter.	2024-12-20 15:05:08 +00:00
Simon Pilgrim	b87a5fb9fd	[VectorCombine] Add "VC: Visiting" debug message to help the log show the instruction folding order.	2024-12-20 14:57:58 +00:00
Simon Pilgrim	5f0db7c112	[VectorCombine] Add "VECTORCOMBINE on <FUNCTION_NAME>" title debug message to help finding vectorcombine stages in the debug log	2024-12-20 13:32:49 +00:00
Simon Pilgrim	c5434804ee	[VectorCombine] foldInsExtVectorToShuffle - add debug message for match + cost-comparison. Helps with debugging to show that the fold found the match, and shows the old + new costs to indicate whether the fold was/wasn't profitable.	2024-12-20 13:32:49 +00:00
hanbeom	ff93ca7d6c	[VectorCombine] Combine scalar fneg with insert/extract to vector fneg when length is different (#120461) insertelt DestVec, (fneg (extractelt SrcVec, Index)), Index -> shuffle DestVec, (shuffle (fneg SrcVec), poison, SrcMask), Mask. The original combine left the case of vectors of different lengths as a TODO; this commit implements it (see baab4aa1ba).	2024-12-20 10:44:49 +00:00
Florian Hahn	5f096fd221	Revert "[LoopVectorizer] Add support for partial reductions (#92418)" This reverts commit 060d62b48aeb5080ffcae1dc56e41a06c6f56701. It looks like this is triggering an assertion when building llvm-test-suite on ARM64 macOS. Reproducer from MultiSource/Benchmarks/Ptrdist/bc/number.c:
target datalayout = "e-m:o-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-n32:64-S128-Fn32"
target triple = "arm64-apple-macosx15.0.0"
define void @test(i64 %idx.neg, i8 %0) #0 {
entry:
  br label %while.body
while.body: ; preds = %while.body, %entry
  %n1ptr.0.idx131 = phi i64 [ %n1ptr.0.add, %while.body ], [ %idx.neg, %entry ]
  %n2ptr.0.idx130 = phi i64 [ %n2ptr.0.add, %while.body ], [ 0, %entry ]
  %sum.1129 = phi i64 [ %add99, %while.body ], [ 0, %entry ]
  %n1ptr.0.add = add i64 %n1ptr.0.idx131, 1
  %conv = sext i8 %0 to i64
  %n2ptr.0.add = add i64 %n2ptr.0.idx130, 1
  %1 = load i8, ptr null, align 1
  %conv97 = sext i8 %1 to i64
  %mul = mul i64 %conv97, %conv
  %add99 = add i64 %mul, %sum.1129
  %cmp94 = icmp ugt i64 %n1ptr.0.idx131, 0
  %cmp95 = icmp ne i64 %n2ptr.0.idx130, -1
  %2 = and i1 %cmp94, %cmp95
  br i1 %2, label %while.body, label %while.end.loopexit
while.end.loopexit: ; preds = %while.body
  %add99.lcssa = phi i64 [ %add99, %while.body ]
  ret void
}
attributes #0 = { "target-cpu"="apple-m1" }
> opt -p loop-vectorize
Assertion failed: ((VF.isScalar() || V->getType()->isVectorTy()) && "scalar values must be stored as (0, 0)"), function set, file VPlan.h, line 284.	2024-12-19 21:46:51 +00:00
Finn Plummer	45c01e8a33	[NFC][TargetTransformInfo][VectorUtils] Consolidate `isVectorIntrinsic...` api (#117635)
- update `VectorUtils:isVectorIntrinsicWithScalarOpAtArg` to use TTI for all uses, to allow specification of target specific intrinsics
- add TTI to the `isVectorIntrinsicWithStructReturnOverloadAtField` api
- update TTI api to provide `isTargetIntrinsicWith...` functions and consistently name them
- move `isTriviallyScalarizable` to VectorUtils
- update all uses of the api and provide the TTI parameter
Resolves #117030	2024-12-19 11:54:26 -08:00
Nicholas Guy	060d62b48a	[LoopVectorizer] Add support for partial reductions (#92418) Following on from https://github.com/llvm/llvm-project/pull/94499, this patch adds support to the Loop Vectorizer to emit the partial reduction intrinsics where they may be beneficial for the target. --------- Co-authored-by: Samuel Tebbs <samuel.tebbs@arm.com>	2024-12-19 11:42:40 +00:00
David Sherwood	c18fda02e1	[LoopVectorize] Use new single string variant of reportVectorizationFailure (#120414)	2024-12-19 10:07:13 +00:00

5358 Commits