llvm-project

Author	SHA1	Message	Date
David Sherwood	1feeeb47e5	[LoopVectorize][NFC] Move "LV: Selecting VF" debug output (#120744 ) Move the debug output that prints out the selected VF from selectVectorizationFactor -> computeBestVF. This means that the output will still be written even after removing the assert for the legacy and vplan cost models matching.	2025-01-06 10:39:34 +00:00
Florian Hahn	f4230b4332	[VPlan] Add and use debug location for VPScalarCastRecipe. Update the recipe to always take a debug location and set it.	2025-01-05 20:08:51 +00:00
Florian Hahn	f48884ded8	[VPlan] Remove loop region in optimizeForVFAndUF. (#108378 ) Update optimizeForVFAndUF to completely remove the vector loop region when possible. At the moment, we cannot remove the region if it contains * widened IVs: the recipe is needed to generate the step vector * reductions: ComputeReductionResults requires the reduction phi recipe for codegen. Both cases can be addressed by more explicit modeling. The patch also includes a number of updates to allow executing VPlans without a vector loop region. Depends on https://github.com/llvm/llvm-project/pull/110004	2025-01-05 15:50:42 +00:00
Florian Hahn	df4a615c98	[VPlan] Convert induction increment check to be VPlan-based. Check the VPlan directly to determine if a VPValue is an optimizable IV or IV use instead of checking the underlying IR instructions. Split off from https://github.com/llvm/llvm-project/pull/112147. This refactoring enables moving IV end value creation from the legacy fixupIVUsers to a VPlan-based transform. There is one case we now won't optimize, that is IVs with subtracts and non-constant steps. But as this is a minor optimization and doesn't impact correctness, the benefits of performing the check in VPlan should outweigh the missed case.	2025-01-05 11:16:01 +00:00
Florian Hahn	b95cce9904	[VPlan] Update wide induction inc recipes to use same step as Wide IV. Update wide induction increments to use the same step as the corresponding wide induction. This enables detecting induction increments directly in VPlan and removes redundant splats.	2025-01-04 20:04:59 +00:00
Florian Hahn	11c6af666b	[VPlan] Fix name ExitVPBB -> MiddleVPBB (NFC). ExitVPBB actually refers to the middle block, clarify name.	2025-01-03 19:28:03 +00:00
John Brawn	073e65a8e5	[LoopVectorize] Make needsExtract notice scalarized instructions (#119720 ) LoopVectorizationCostModel::needsExtract should recognise instructions that have been widened by scalarizing as scalar instructions, and thus not needing an extract when used by later scalarized instructions. This fixes an incorrect cost calculation in computePredInstDiscount, where we are adding a scalarization overhead cost when we shouldn't, though I haven't come up with a test case where it makes a difference. It will make a difference when the cost model switches to using the cost kind TCK_CodeSize for optsize, as not doing this causes the test LoopVectorize/X86/small-size.ll to get worse.	2025-01-02 14:31:36 +00:00
Florian Hahn	207e485f4b	[VPlan] Track VectorPH during skeleton creation. (NFC) Split off from https://github.com/llvm/llvm-project/pull/108378. This ensures that the logic works even if no vector region exists.	2025-01-02 11:09:03 +00:00
Florian Hahn	c7ebe4fd0a	[VPlan] Replace VPBBs with VPIRBBs during skeleton creation (NFC). Move replacement of VPBBs for vector preheader, middle block and scalar preheader from VPlan::execute to skeleton creation, which actually creates the IR basic blocks. For now, the vector preheader can only be replaced after prepareToExecute as it may create new instructions in the vector preheader.	2025-01-01 22:05:43 +00:00
Florian Hahn	418dedc234	[VPlan] Remove redundant setting of insert point in ::executePlan (NFC). The entry block is a VPIRBasicBlock wrapping the original loop's preheader, so the insert point doesn't need to be set.	2025-01-01 21:44:22 +00:00
Florian Hahn	b06a45c66f	[VPlan] Add all blocks to outer loop if present during ::execute (NFCI). This ensures that all blocks created during VPlan execution are properly added to an enclosing loop, if present. Split off from https://github.com/llvm/llvm-project/pull/108378 and also needed once more of the skeleton blocks are created directly via VPlan. This also allows removing the custom logic for early-exit loop vectorization added as part of https://github.com/llvm/llvm-project/pull/117008.	2024-12-31 19:34:34 +00:00
Muhammad Omair Javaid	332d2647ff	Revert "[LV]: Teach LV to recursively (de)interleave. (#89018 )" This reverts commit ccfe0de0e1e37ed369c9bf89dd0188ba0afb2e9a. This breaks LLVM build on AArch64 SVE Linux buildbots https://lab.llvm.org/buildbot/#/builders/143/builds/4462 https://lab.llvm.org/buildbot/#/builders/17/builds/4902 https://lab.llvm.org/buildbot/#/builders/4/builds/4399 https://lab.llvm.org/buildbot/#/builders/41/builds/4299	2024-12-31 03:12:24 +05:00
Florian Hahn	16d19aaedf	[VPlan] Manage created blocks directly in VPlan. (NFC) (#120918 ) This patch changes the way blocks are managed by VPlan. Previously all blocks reachable from entry would be cleaned up when a VPlan is destroyed. With this patch, each VPlan keeps track of blocks created for it in a list and this list is then used to delete all blocks in the list when the VPlan is destroyed. To do so, block creation is funneled through helpers directly in VPlan. The main advantage of doing so is it simplifies CFG transformations, as those do not have to take care of deleting any blocks, just adjusting the CFG. This helps to simplify https://github.com/llvm/llvm-project/pull/108378 and https://github.com/llvm/llvm-project/pull/106748. This also simplifies handling of 'immutable' blocks a VPlan holds references to, which at the moment only include the scalar header block. PR: https://github.com/llvm/llvm-project/pull/120918	2024-12-30 12:08:12 +00:00
Florian Hahn	7f3428d3ed	[VPlan] Compute induction end values in VPlan. (#112145 ) Use createDerivedIV to compute IV end values directly in VPlan, instead of creating them up-front. This allows updating IV users outside the loop as follow-up. Depends on https://github.com/llvm/llvm-project/pull/110004 and https://github.com/llvm/llvm-project/pull/109975. PR: https://github.com/llvm/llvm-project/pull/112145	2024-12-29 19:05:08 +00:00
Zequan Wu	4d8f9594b2	Revert "Reland "[LoopVectorizer] Add support for partial reductions" (#120721 )" This reverts commit c858bf620c3ab2a4db53e84b9365b553c3ad1aa6 as it causes an optimization crash on -O2, see https://github.com/llvm/llvm-project/pull/120721#issuecomment-2563192057	2024-12-27 11:51:54 -08:00
Hassnaa Hamdi	ccfe0de0e1	[LV]: Teach LV to recursively (de)interleave. (#89018 ) Currently available intrinsics are only ld2/st2, which don't support interleaving factor > 2. This patch teaches the LV to use ld2/st2 recursively to support high interleaving factors.	2024-12-27 12:42:07 +00:00
Elvis Wang	47e1c87a61	[VPlan] Set debug location for VPReduction/VPWidenIntrinsicRecipe. (#120054 ) This patch adds the missing debug location for VPReduction/VPWidenIntrinsicRecipe.	2024-12-27 10:37:21 +08:00
Sam Tebbs	c858bf620c	Reland "[LoopVectorizer] Add support for partial reductions" (#120721 ) This re-lands the reverted #92418 When the VF is small enough so that dividing the VF by the scaling factor results in 1, the reduction phi execution thinks the VF is scalar and sets the reduction's output as a scalar value, tripping assertions expecting a vector value. The latest commit in this PR fixes that by using `State.VF` in the scalar check, rather than the divided VF. --------- Co-authored-by: Nicholas Guy <nicholas.guy@arm.com>	2024-12-24 12:08:17 +00:00
Benjamin Maxwell	9ab5474e56	[LV] Rename `ToVectorTy` to `toVectorTy` (NFC) (#120404 ) This is for consistency with other helpers (and also follows the LLVM naming conventions).	2024-12-23 23:33:44 +00:00
Florian Hahn	5f096fd221	Revert "[LoopVectorizer] Add support for partial reductions (#92418 )" This reverts commit 060d62b48aeb5080ffcae1dc56e41a06c6f56701. It looks like this is triggering an assertion when building llvm-test-suite on ARM64 macOS. Reproducer from MultiSource/Benchmarks/Ptrdist/bc/number.c target datalayout = "e-m:o-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-n32:64-S128-Fn32" target triple = "arm64-apple-macosx15.0.0" define void @test(i64 %idx.neg, i8 %0) #0 { entry: br label %while.body while.body: ; preds = %while.body, %entry %n1ptr.0.idx131 = phi i64 [ %n1ptr.0.add, %while.body ], [ %idx.neg, %entry ] %n2ptr.0.idx130 = phi i64 [ %n2ptr.0.add, %while.body ], [ 0, %entry ] %sum.1129 = phi i64 [ %add99, %while.body ], [ 0, %entry ] %n1ptr.0.add = add i64 %n1ptr.0.idx131, 1 %conv = sext i8 %0 to i64 %n2ptr.0.add = add i64 %n2ptr.0.idx130, 1 %1 = load i8, ptr null, align 1 %conv97 = sext i8 %1 to i64 %mul = mul i64 %conv97, %conv %add99 = add i64 %mul, %sum.1129 %cmp94 = icmp ugt i64 %n1ptr.0.idx131, 0 %cmp95 = icmp ne i64 %n2ptr.0.idx130, -1 %2 = and i1 %cmp94, %cmp95 br i1 %2, label %while.body, label %while.end.loopexit while.end.loopexit: ; preds = %while.body %add99.lcssa = phi i64 [ %add99, %while.body ] ret void } attributes #0 = { "target-cpu"="apple-m1" } > opt -p loop-vectorize Assertion failed: ((VF.isScalar() || V->getType()->isVectorTy()) && "scalar values must be stored as (0, 0)"), function set, file VPlan.h, line 284.	2024-12-19 21:46:51 +00:00
Nicholas Guy	060d62b48a	[LoopVectorizer] Add support for partial reductions (#92418 ) Following on from https://github.com/llvm/llvm-project/pull/94499, this patch adds support to the Loop Vectorizer to emit the partial reduction intrinsics where they may be beneficial for the target. --------- Co-authored-by: Samuel Tebbs <samuel.tebbs@arm.com>	2024-12-19 11:42:40 +00:00
David Sherwood	c18fda02e1	[LoopVectorize] Use new single string variant of reportVectorizationFailure (#120414 )	2024-12-19 10:07:13 +00:00
Florian Hahn	0e8d022ffe	[VPlan] Handle exit phis with multiple operands in addUsersInExitBlocks. (#120260 ) Currently the addUsersInExitBlocks incorrectly assumes exit phis only have a single operand, which may not be the case for loops with early exits when they share a common exit block. Also further relax the assertion in fixupIVUsers to allow exit values if they come from the loop latch/middle.block. PR: https://github.com/llvm/llvm-project/pull/120260	2024-12-18 14:47:16 +00:00
David Sherwood	13107cb094	[LoopVectorize] Enable more early exit vectorisation tests (#117008 ) PR #112138 introduced initial support for dispatching to multiple exit blocks via split middle blocks. This patch fixes a few issues so that we can enable more tests to use the new enable-early-exit-vectorization flag. Fixes are: 1. The code to bail out for any loop live-out values happens too late. This is because collectUsersInExitBlocks ignores induction variables, which get dealt with in fixupIVUsers. I've moved the check much earlier in processLoop by looking for outside users of loop-defined values. 2. We shouldn't yet be interleaving when vectorising loops with uncountable early exits, since we've not added support for this yet. 3. Similarly, we also shouldn't be creating vector epilogues. 4. Similarly, we shouldn't enable tail-folding. 5. The existing implementation doesn't yet support loops that require scalar epilogues, although I plan to add that as part of PR #88385. 6. The new split middle blocks weren't being added to the parent loop.	2024-12-18 09:25:45 +00:00
Nikita Popov	1157187496	[VPlan] Propagate all GEP flags (#119899 ) Store GEPNoWrapFlags instead of only InBounds and propagate them.	2024-12-17 13:48:50 +01:00
Florian Hahn	0e528ac404	[VPlan] Use start value operand for FindLastIV reduction phis. Update VPReductionPHIRecipe::execute to use the start value from the start value operand of the recipe. This is needed to make sure we resume from the correct value during epilogue vectorization. At the moment, the start value is set to the sentinel value in adjustRecipesForReductions, as the original start value needs to be used when creating ResumePhi recipes. Fixes a mis-compile introduced by b3cba9be41bfa8 in SPEC2017 on AArch64.	2024-12-16 23:29:49 +00:00
Florian Hahn	f9120dc2a6	[VPlan] Make sure vector trip count is ready for prepareToExecute (NFC) Split off from https://github.com/llvm/llvm-project/pull/112145. This ensures that getOrCreateVectorTripCount creates the trip count as needed when induction resume value creation is moved to VPlan and no longer creates the vector trip count early.	2024-12-16 20:44:20 +00:00
Florian Hahn	89d5272841	[VPlan] Remove getPreheader(). (NFC) The preheader is now the entry block, connected to the vector.ph. Clean up after https://github.com/llvm/llvm-project/pull/114292.	2024-12-16 19:48:02 +00:00
Florian Hahn	95e509a989	[VPlan] Add VPWidenInduction recipe as common base class (NFC). (#120008 ) This helps to simplify some existing code and new code (https://github.com/llvm/llvm-project/pull/112145) PR: https://github.com/llvm/llvm-project/pull/120008	2024-12-16 09:40:03 +00:00
Florian Hahn	2067e604a4	[VPlan] Manage VPWidenPointerInduction debug location via recipe. Update VPWidenPointerInduction to manage its debug location via recipe. This makes sure we emit a proper debug location for VPWidenPointerInductionRecipes.	2024-12-15 14:41:07 +00:00
Florian Hahn	734a204fbd	[VPlan] Manage VPWidenIntOrFPInduction debug location via recipe (NFC). Properly set VPWidenIntOrFpInductionRecipe's debug location in the recipe and use it, instead of using the debug location of the underlying IR instruction.	2024-12-15 13:45:28 +00:00
Florian Hahn	4e828f8d74	[VPlan] Perform DT expensive input DT verification earlier (NFC). After 6c8f41d33674, DT adjustments for the skeleton are applied as VPBBs are executed. Move input DT verification up before starting to execute any VPBBs to avoid checking DT while the CFG and DT are in an incomplete state. This fixes a number of verification failures with expensive checks enabled, including https://lab.llvm.org/buildbot/#/builders/16/builds/10584	2024-12-12 20:06:20 +00:00
Florian Hahn	6c8f41d336	[VPlan] Hook IR blocks into VPlan during skeleton creation (NFC) (#114292 ) As a first step to move towards modeling the full skeleton in VPlan, start by wrapping IR blocks created during legacy skeleton creation in VPIRBasicBlocks and hook them into the VPlan. This means the skeleton CFG is represented in VPlan, just before execute. This allows moving parts of skeleton creation into recipes in the VPBBs gradually. Note that this allows retiring some manual DT updates, as this will be handled automatically during VPlan execution. PR: https://github.com/llvm/llvm-project/pull/114292	2024-12-12 15:58:16 +00:00
Florian Hahn	a480d51722	[VPlan] Use existing vector trip count VPValue for resume phi (NFC) Instead of going through getOrAddLiveIn to get a VPValue for the vector trip count retrieve it directly from VPlan via getVectorTripCount. Small simplification following 0e70289f373.	2024-12-12 11:03:47 +00:00
Mel Chen	b3cba9be41	[LoopVectorize] Vectorize select-cmp reduction pattern for increasing integer induction variable (#67812 ) Consider the following loop: ``` int rdx = init; for (int i = 0; i < n; ++i) rdx = (a[i] > b[i]) ? i : rdx; ``` We can vectorize this loop if `i` is an increasing induction variable. The final reduced value will be the maximum of `i` that the condition `a[i] > b[i]` is satisfied, or the start value `init`. This patch added new RecurKind enums - IFindLastIV and FFindLastIV. --------- Co-authored-by: Alexey Bataev <5361294+alexey-bataev@users.noreply.github.com>	2024-12-12 16:48:31 +08:00
Florian Hahn	5fae408d3a	[VPlan] Dispatch to multiple exit blocks via middle blocks. (#112138 ) A more lightweight variant of https://github.com/llvm/llvm-project/pull/109193, which dispatches to multiple exit blocks via the middle blocks. The patch also introduces a bit of required scaffolding to enable early-exit vectorization, including an option. At the moment, early-exit vectorization doesn't come with legality checks, and is only used if the option is provided and the loop has metadata forcing vectorization. This is only intended to be used for testing during bring-up, with @david-arm enabling auto early-exit vectorization plugging in the changes from https://github.com/llvm/llvm-project/pull/88385. PR: https://github.com/llvm/llvm-project/pull/112138	2024-12-11 21:11:05 +00:00
Florian Hahn	0e7f18791c	[LV] Relax assertion in fixupIVUsers (NFC). Adjust the assertion in fixupIVUsers to only require a unique exit block if there are any values to fix up. This enables the bring up of multi-exit loop vectorization without requiring a scalar epilogue. Split off as suggested from https://github.com/llvm/llvm-project/pull/112138.	2024-12-10 10:39:28 +00:00
Florian Hahn	56ddbeff83	[LV] Use getUniqueLatchExitBlock in createVectorLoopSkeleton (NFC). Use getUniqueLatchExitBlock instead of getUniqueExitBlock in preparation for multi-exit vectorization without requiring a scalar epilogue. Split off as suggested from https://github.com/llvm/llvm-project/pull/112138	2024-12-10 09:35:08 +00:00
Florian Hahn	e9834209aa	[VPlan] Move convertToConcreteRecipes to end of VPlan-opt phase (NFCI). Adjust placement as suggested in https://github.com/llvm/llvm-project/pull/114305, after some refactoring to prepare for the move.	2024-12-10 09:13:13 +00:00
Florian Hahn	0e70289f37	[VPlan] Create canonical IV resume value for epilogue in VPlan. (NFCI) Update the code to create induction resume PHIs to also create a resume phi for the canonical induction during epilogue vectorization. This unifies the code for handling induction resume values and removes the need to manually create a resume PHI and return it during epilogue creation. Overall it helps to move the code for updating the canonical induction resume value to the place where all other header phi resume values are updated. This is NFC, modulo order of the created phis.	2024-12-09 23:11:38 +00:00
Florian Hahn	adfe54f7da	[VPlan] Directly check VectorizingEpilogue in ::executePlan (NFC). Directly check VectorizingEpilogue which directly indicates that the epilogue is vectorized.	2024-12-09 22:21:25 +00:00
Florian Hahn	4fd8dbc184	[LV] Move code to prepare VPlan for epilogue vector loop to helper (NFC) Move code to prepare the VPlan for the epilogue vector loop to a helper to reduce size and complexity of processLoop.	2024-12-09 21:56:10 +00:00
Kazu Hirata	9099d694f6	[Vectorize] Fix a warning This patch fixes: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:2699:49: error: captured structured bindings are a C++20 extension [-Werror,-Wc++20-extensions]	2024-12-09 10:26:48 -08:00
Igor Kirillov	337936a83b	[LV] Ignore some costs when loop gets fully unrolled (#106699 ) When VF has a fixed width and equals the number of iterations, and we are not tail folding by masking, comparison instruction and induction operation will be DCEed later. Ignoring the costs of these instructions improves the cost model.	2024-12-09 18:17:52 +00:00
Florian Hahn	eff0d8103c	[VPlan] Adjust original position of convertToConcreteRecipes. Restore the original position of the call before afef545efab77a8 to fix a number of crashes.	2024-12-08 21:52:51 +00:00
Florian Hahn	afef545efa	[VPlan] Address post-commit for #114305 . Apply suggested renaming and adjust placement as suggested in https://github.com/llvm/llvm-project/pull/114305. Also drop unneeded RPOT creation.	2024-12-08 21:24:19 +00:00
Florian Hahn	156da98683	[VPlan] Move printing final VPlan to ::execute (NFC). This moves printing of the final VPlan to ::execute. This ensures the final VPlan is printed, including recipes that get introduced by late, lowering transforms and skeleton construction. Split off from https://github.com/llvm/llvm-project/pull/114292, to simplify the diff.	2024-12-07 09:39:10 +00:00
Florian Hahn	7f7f540a48	Reapply "[VPlan] Update scalar induction resume values in VPlan. (#110577 )" This reverts commit f09b16e2671cbcdf7cb7dc7ed705db092a9deda1. The crash when building llvm-test-suite with stage2 should have been fixed by 1091fad31a83d5ab87eb6fa11fe3bdb3f0d152ea.	2024-12-06 19:41:51 +00:00
Nikita Popov	f09b16e267	Revert "[VPlan] Update scalar induction resume values in VPlan. (#110577 )" This reverts commit 0678e2058364ec10b94560d27ec7138dfa003287. This reverts commit 1091fad31a83d5ab87eb6fa11fe3bdb3f0d152ea. Causes crashes in llvm-test-suite when using stage 2 clang.	2024-12-06 18:01:42 +01:00
Florian Hahn	0678e20583	[VPlan] Update scalar induction resume values in VPlan. (#110577 ) Updated ILV.createInductionResumeValues (now createInductionResumeVPValue) to directly update the VPIRInstructions wrapping the original phis with the created resume values. This is the first step towards modeling them completely in VPlan. Subsequent patches will move creation of the resume values completely into VPlan. Depends on https://github.com/llvm/llvm-project/pull/109975. PR: https://github.com/llvm/llvm-project/pull/110577	2024-12-06 12:26:19 +00:00


2346 Commits