Instead of looking up the narrower reduction type via getRecurrenceType,
we can generate the needed extend directly at construction and re-use
the truncated value from the loop.
PR: https://github.com/llvm/llvm-project/pull/141860
Similar to FindLastIV, add FindFirstIVSMin to support select (icmp(), x, y)
reductions where one of x or y is a decreasing induction, producing an SMin
reduction. It uses the signed maximum as the sentinel value.
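A hedged C++ sketch of the kind of source pattern this targets (the
function, array and key names are illustrative, not taken from the
patch): the decreasing induction makes the final value of idx the
smallest matching index, which an SMin reduction with a signed-max
sentinel reproduces.
```
#include <climits>

// Illustrative only: select(icmp(), induction, idx) with a decreasing
// induction; INT_MAX acts as the signed-max sentinel.
int findSMin(const int *a, int n, int key) {
  int idx = INT_MAX;                 // sentinel: signed max
  for (int i = n - 1; i >= 0; --i)   // decreasing induction
    idx = (a[i] == key) ? i : idx;   // vectorizable as an SMin reduction
  return idx;                        // still INT_MAX if nothing matched
}
```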
PR: https://github.com/llvm/llvm-project/pull/140451
Until now the feature to enable vectorisation of some early exit
loops with uncountable exits was controlled by a flag that was off by
default. Now that we have efficient code generation for
vectorising such loops (see PR #130766) and there is still some
time before the next LLVM release, it seems like a good point
to enable the feature by default. If any issues arise post-commit
it can be easily reverted.
Using this patch I built and ran the LLVM test suite successfully,
which on neoverse-v1 led to the vectorisation of 114 additional
early exit loops. I also built and ran SPEC2017 successfully for
both neoverse-v1 and neoverse-v2.
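For context, a hedged example of the kind of loop this covers (names
are illustrative): the exit below depends on loaded data, so its trip
count is uncountable, yet the loop can now be vectorized by default on
targets that support it.
```
// Early-exit search loop with an uncountable (data-dependent) exit.
int firstMatch(const int *a, int n, int key) {
  for (int i = 0; i < n; ++i)
    if (a[i] == key)   // uncountable early exit
      return i;
  return -1;
}
```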
Currently, if the user enables interleaving during vectorisation of
uncountable early exit loops via the interleave_count pragma and the
enable-early-exit-vectorization option, the loop will be miscompiled.
There is ongoing work to fix this, but for now it seems safer to ignore
the hint until interleaving is supported.
---------
Co-authored-by: Paul Walker <paul.walker@arm.com>
Explicitly unroll VPReplicateRecipes outside replicate regions by VF,
replacing them with VF single-scalar recipes. Extracts for operands are
added as needed and the scalar results are combined into a vector using
a new BuildVector VPInstruction.
It also adds a few folds to simplify unnecessary extracts/BuildVectors.
It also adds a BuildStructVector opcode for handling of calls that have
struct return types.
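As a plain-C++ analogy of the unrolling described above (purely
illustrative, not VPlan code; the function and the use of std::sin are
assumptions): each lane becomes a single-scalar computation fed by
operand extracts, with a BuildVector-style recombination of the results.
```
#include <array>
#include <cmath>

// VF = 4 analogy: one scalar operation per lane, results rebuilt into a
// "vector".
std::array<double, 4> replicateByVF(const std::array<double, 4> &V) {
  std::array<double, 4> R;
  for (unsigned Lane = 0; Lane < 4; ++Lane) // one single-scalar op per lane
    R[Lane] = std::sin(V[Lane]);            // extract operand lane, scalar op
  return R;                                 // BuildVector of the scalar results
}
```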
VPReplicateRecipes in replicate regions will be unrolled as a follow-up,
turning non-single-scalar VPReplicateRecipes into 'abstract' recipes,
i.e. not executable.
PR: https://github.com/llvm/llvm-project/pull/142433
Replace redundant ExtractLastElement VPInstructions early. This is NFC,
as the VPInstruction computing the final result is vector-to-scalar,
producing a single scalar already. This enables follow-up changes to
model more aspects of reductions directly in VPlan.
A shuffle takes two input vectors and a mask, producing a new
vector of size <MaskElts x SrcEltTy>. Historically it has been assumed
that the SrcTy and the DstTy are the same for getShuffleCost, with that
being relaxed in recent years. If the Tp passed to getShuffleCost is the
SrcTy, then the DstTy can be calculated from the Mask elts and the src
elt size, but the Mask is not always provided and the Tp is not reliably
always the SrcTy. This has led to situations, notably in the SLP
vectorizer but also in the generic cost routines, where assumptions
about how vectors will be legalized are built into the generic cost
routines - for example whether they will widen or promote, with the cost
modelling assuming they will widen while the default lowering promotes
integer vectors.
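A minimal sketch of that calculation for the fixed-length case (an
illustration of the relationship, not code from the patch):
```
#include "llvm/ADT/ArrayRef.h"
#include "llvm/IR/DerivedTypes.h"
using namespace llvm;

// The destination has one lane per mask element, using the source
// element type.
static VectorType *getDstTyFromMask(VectorType *SrcTy, ArrayRef<int> Mask) {
  return FixedVectorType::get(SrcTy->getElementType(), Mask.size());
}
```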
This patch attempts to start improving that - it originally tried to
alter more of the cost model but that too quickly became too many
changes at once, so this patch just plumbs in a DstTy to getShuffleCost
so that DstTy and SrcTy can be reliably distinguished. The callers of
getShuffleCost have been updated to try and include a DstTy that is more
accurate. Otherwise the patch aims to be largely non-functional, keeping
SrcTy as the primary type used in the shuffle cost routines and only
using DstTy where it was already used in the past (for InsertSubVector,
for example).
Some asserts have been added that help to check for consistent values
when a Mask and a DstTy are provided to getShuffleCost. Some of them
took a while to get right, and some non-mask calls might still be
incorrect. Hopefully this will provide a useful base on which to support
more shuffles that alter size.
Going mostly by the comment here - but it says "vscale is not
necessarily a power-of-2". Both in-tree targets have vscale as a power
of two, and we have an existing TTI hook for that.
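Assuming the hook being referred to is
TargetTransformInfo::isVScaleKnownToBeAPowerOfTwo (an assumption on my
part), the check could look roughly like:
```
#include "llvm/Analysis/TargetTransformInfo.h"
using namespace llvm;

// Hedged sketch: gate the stronger assumption on the target's answer.
static bool canAssumePowerOfTwoVScale(const TargetTransformInfo &TTI) {
  return TTI.isVScaleKnownToBeAPowerOfTwo();
}
```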
Similarly to VPWidenIntOrFpInductionRecipe, if we want to support it in
EVL tail folding we need to increment the induction by EVL steps instead
of VF*UF steps, but currently this is hard-wired in
VPWidenPointerInductionRecipe.
This adds an operand for the number of elements unrolled and plumbs it
through, so that we can swap it out in
VPlanTransforms::tryAddExplicitVectorLength further down the line.
This patch adds a test that shows incorrect branch weights being set in
function
EpilogueVectorizerEpilogueLoop::emitMinimumVectorEpilogueIterCountCheck.
PPCTTIImpl defines hasActiveVectorLength and also getVPMemoryOpCost, but
they appear unused (i.e. removing them causes no test changes).
Remove them, as they complicate the interface for hasActiveVectorLength.
This simplifies the only use in LV as now no placeholder values need to
be passed.
PR: https://github.com/llvm/llvm-project/pull/142310
This is prep work for enabling better UF calculations when using
vscale-based VFs to vectorise loops with vscale-based trip counts.
NOTE: NFC because all uses remain fixed-length until a following PR
changes LoopVectorize's version of getSmallConstantTripCount().
When the fixed-order recurrence phi is live-out from the loop, the
vectorizer uses VPInstruction::ExtractPenultimateElement to extract the
penultimate element from the recurrence vector. However, this is not
feasible when the VF is vscale x 1, since vscale could be 1, making the
vector contain only one element.
This patch changes the behavior for vscale x 1 by extracting the last
element from the vector produced by splicing the recurrence phi and the
previous value. This ensures we can still determine the correct live-out
value of the recurrence phi.
The motivation for this PR is to make #115274 easier to implement; it
should allow us to add EVL support by just passing EVL to the VF
operand.
The current difficulty with widening IVs with EVL is that
VPWidenIntOrFpInductionRecipe generates its own backedge value. Since
it's a VPHeaderPHIRecipe the VF operand must be in the preheader, which
means we can't use the EVL since it's defined in the loop body.
The gist of this PR is to take the approach in #114305 and expand
VPWidenIntOrFpInductionRecipe into several recipes for the initial
value, phi and backedge value just before execution. I.e. this example:
```
vector.ph:
Successor(s): vector loop
<x1> vector loop: {
vector.body:
WIDEN-INDUCTION %i = phi %start, %step, %vf
...
EMIT branch-on-count ...
No successors
}
```
gets expanded to:
```
vector.ph:
...
vp<%induction.start> = ...
vp<%induction.increment> = ...
Successor(s): vector loop
<x1> vector loop: {
vector.body:
ir<%i> = WIDEN-PHI vp<%induction.start>, vp<%vec.ind.next>
...
vp<%vec.ind.next> = add ir<%i>, vp<%induction.increment>
EMIT branch-on-count ...
No successors
}
```
This allows us to use a value defined in the loop as the backedge value,
and also means we can just reuse the existing backedge fixups in
VPlan::execute without having to handle it specially ourselves.
After this #115274 should just become a matter of setting the VF operand
to EVL (and building the increment step in the loop body, not the
preheader).
This fixes a crash where all incoming values for the epilogue resume
value are zero, because there are no remaining iterations to execute for
the epilogue loop.
In fixVectorizedLoop we call setProfileInfoAfterUnrolling to update the
profile information after vectorising; however, for scalable VFs we
pessimistically assume vscale=1. We can improve upon this by using the
value of vscale used for tuning, i.e. when targeting neoverse-v1 the
expected value is 2.
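A minimal standalone sketch of the improved estimate (the wrapper and
parameter names below are assumptions for illustration; VScaleForTuning
stands for the target's tuning value of vscale, however it is obtained):
```
#include <optional>

// Estimated elements processed per vector iteration for a possibly
// scalable VF.
static unsigned
estimateElementsPerIteration(unsigned KnownMinVF, bool IsScalable,
                             std::optional<unsigned> VScaleForTuning) {
  unsigned Estimate = KnownMinVF;
  if (IsScalable)
    Estimate *= VScaleForTuning.value_or(1); // e.g. 2 for neoverse-v1
  return Estimate;                           // previously assumed vscale = 1
}
```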
Register pressure was only considered if the vector bandwidth was being
maximised (chosen either by the target or user options), but #132190
inadvertently caused high pressure VFs to be pruned even when max
bandwidth wasn't enabled. This PR returns to the previous behaviour.
Skip induction phis when checking for simplifications, as they may not
be lowered directly to a corresponding PHI recipe. Reductions
and first-order recurrences will get lowered to phi recipes, unless they
are removed. Considering them for simplifications allows removing them
if there are no remaining users.
NFC as currently reduction and recurrence phis are not
simplified/removed if dead.
This reverts commit 0604dc199c019b23746f4a54885ba0c75569cdae.
The recommitted version addresses post-commit comments and adjusts where
the branch weights are added. It now runs before VPlans are optimized
for VF and UF, since that optimization may remove the vector loop
region, which previously caused a crash when trying to get the middle
block afterwards. Test case added in 72f99b75afc12bb.
Original message:
Manage branch weights for the BranchOnCond in the middle block in VPlan.
This requires updating VPInstruction to inherit from VPIRMetadata, which
in general makes sense as there are a number of opcodes that could take
metadata.
There are other branches (part of the skeleton) that also need branch
weights adding.
PR: https://github.com/llvm/llvm-project/pull/143035
Similar to modeling the start value as an operand, also model the
sentinel value as an operand explicitly. This makes all required
information for code-gen available directly in VPlan.
PR: https://github.com/llvm/llvm-project/pull/142291
There are many places in VPlan and LoopVectorize where we use
getKnownMinValue to discover the number of elements in a vector. Where
we expect the vector to have a fixed length, I have used the stronger
getFixedValue call. I believe this is clearer and adds extra protection
in the form of an assert in getFixedValue that the vector is not
scalable.
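A short sketch of the distinction, using LLVM's ElementCount (where both
calls live); the example function is illustrative only:
```
#include "llvm/Support/TypeSize.h"
using namespace llvm;

void elementCountExample() {
  ElementCount FixedVF = ElementCount::getFixed(4);
  unsigned N = FixedVF.getFixedValue();       // 4; asserts not scalable
  ElementCount ScalableVF = ElementCount::getScalable(4);
  unsigned M = ScalableVF.getKnownMinValue(); // 4; real length is 4 * vscale
  (void)N;
  (void)M;
}
```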
While looking at VPFirstOrderRecurrencePHIRecipe::computeCost I also
took the liberty of simplifying the code.
In theory I believe this patch should be NFC, but I'm reluctant to add
that to the title in case we're just missing tests for some of the VPlan
changes. I built and ran the LLVM test suite when targeting neoverse-v1
and it seemed ok.
Move the logic to create the iteration count check to a separate helper,
so it can be re-used when creating the skeleton for epilogue
vectorization as well.
This caused assertion failures:
llvm/lib/Transforms/Vectorize/VPlan.h:4021:
llvm::VPBasicBlock* llvm::VPlan::getMiddleBlock():
Assertion `LoopRegion && "cannot call the function after vector loop region has been removed"' failed.
See comment on the PR.
> Manage branch weights for the BranchOnCond in the middle block in VPlan.
> This requires updating VPInstruction to inherit from VPIRMetadata, which
> in general makes sense as there are a number of opcodes that could take
> metadata.
>
> There are other branches (part of the skeleton) that also need branch
> weights adding.
>
> PR: https://github.com/llvm/llvm-project/pull/143035
This reverts commit db8d34db26e9ea92c08d6e813eca9cce40c48478.
Currently the loop vectorizer can only vectorize interleave groups for
power-of-2 factors at scalable VFs, by recursively applying the
[de]interleave2 intrinsics.
However after https://github.com/llvm/llvm-project/pull/124825 and
#139893, we now have [de]interleave intrinsics for all factors up to 8,
which is enough to support all types of segmented loads and stores on
RISC-V.
Now that the interleaved access pass has been taught to lower these in
#139373 and #141512, this patch teaches the loop vectorizer to emit
these intrinsics for factors up to 8, which enables scalable
vectorization for non-power-of-2 factors.
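For illustration (a hedged example, not taken from the patch), a
factor-3 interleave group that can now be vectorized at scalable VFs via
the new intrinsics on targets with suitable segmented load support:
```
// rgb is laid out as r0,g0,b0,r1,g1,b1,...: three interleaved members.
void sumChannels(float *sum, const float *rgb, int n) {
  for (int i = 0; i < n; ++i)
    sum[i] = rgb[3 * i] + rgb[3 * i + 1] + rgb[3 * i + 2];
}
```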
As far as I'm aware, no in-tree target will vectorize a scalable
interleave group above factor 8 because the maximum interleave factor is
capped at 4 on AArch64 and 8 on RISC-V, and the
`-max-interleave-group-factor` CLI option defaults to 8, so the
recursive [de]interleaving code has been removed for now.
Factors of 3 with scalable VFs are also turned off in AArch64 since
there's no lowering for [de]interleave3 just yet either.
Manage branch weights for the BranchOnCond in the middle block in VPlan.
This requires updating VPInstruction to inherit from VPIRMetadata, which
in general makes sense as there are a number of opcodes that could take
metadata.
There are other branches (part of the skeleton) that also need branch
weights adding.
PR: https://github.com/llvm/llvm-project/pull/143035
Directly check via GeneratedRTChecks if any checks have been added,
instead of needing to go through ILV. This simplifies the code and
enables further refactoring in follow-up patches.
Following the work in PR #107279, this patch applies the annotative
DebugLocs, which indicate that a particular instruction is intentionally
missing a location for a given reason, to existing sites in the compiler
where their conditions apply. This is NFC in ordinary LLVM builds (each
function `DebugLoc::getFoo()` is inlined as `DebugLoc()`), but marks the
instruction in coverage-tracking builds so that it will be ignored by
Debugify, allowing only real errors to be reported. From a developer
standpoint, it also communicates the intentionality and reason for a
missing DebugLoc.
Some notes for reviewers:
- The difference between `I->dropLocation()` and
`I->setDebugLoc(DebugLoc::getDropped())` is that the former _may_ decide
to keep some debug info alive, while the latter will always be empty; in
this patch, I always used the latter (even if the former could
technically be correct), because the former could result in some
(barely) different output, and I'd prefer to keep this patch purely NFC
(see the sketch after these notes).
- I've generally documented the uses of `DebugLoc::getUnknown()`, with
the exception of the vectorizers - in summary, they are a huge cause of
dropped source locations, and I don't have the time or the domain
knowledge currently to solve that, so I've plastered it all over them as
a form of "fixme".
fixReductionScalarResumeWhenVectorizingEpilog updates the resume phis in
the scalar preheader. Instead of looking at all recipes in the middle
block and finding their resume-phi users we can iterate over all resume
phis in the scalar preheader directly.
This slightly simplifies the code and removes the need to look for the
resume phi.
Also slightly simplifies https://github.com/llvm/llvm-project/pull/141860.
Add a new VPInstruction::ReductionStartVector opcode to create the start
values for wide reductions. This more accurately models the start value
creation in VPlan and simplifies VPReductionPHIRecipe::execute. Down the
line it also allows removing VPReductionPHIRecipe::RdxDesc.
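For intuition only, a hedged sketch of what a start vector can look like
for an ordinary integer add reduction at VF = 4 (how the new opcode
materializes it is not spelled out here, so treat this as an
assumption): the scalar start value occupies one lane and the remaining
lanes hold the neutral element, so a final horizontal add reproduces the
scalar start.
```
#include <array>

// Lane 0 carries the start value; the other lanes are the add identity.
std::array<int, 4> addReductionStartVector(int Start) {
  return {Start, 0, 0, 0};
}
```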
PR: https://github.com/llvm/llvm-project/pull/142290
It should be sufficient to check that the resume phi has the correct
type, as it has the vector trip count as incoming value and starts at 0
otherwise. There is no need to find the middle block.
Apart from the stylistic improvement, lookup has the nice property of
returning a default-constructed object on failure-to-find, while find
returns the end iterator, which cannot be dereferenced.
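A small sketch of that difference, assuming an llvm::DenseMap-style
container (the key/value types here are hypothetical):
```
#include "llvm/ADT/DenseMap.h"
using namespace llvm;

int *findValue(const DenseMap<int, int *> &Map, int Key) {
  // lookup: default-constructed result (nullptr here) on failure-to-find.
  int *V = Map.lookup(Key);
  // find: returns Map.end() on failure, which must not be dereferenced.
  auto It = Map.find(Key);
  return It != Map.end() ? It->second : V;
}
```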
After updating #118638 on tip of tree, expanding
VPWidenIntOrFpInductionRecipes fails because it needs the loop region to
get the latch to insert the increment into:
  VPBasicBlock *ExitingBB =
      Plan->getVectorLoopRegion()->getExitingBasicBlock();
  Builder.setInsertPoint(ExitingBB,
                         ExitingBB->getTerminator()->getIterator());
  auto *Next = Builder.createNaryOp(AddOp, {Prev, Inc}, Flags,
                                    WidenIVR->getDebugLoc(), "vec.ind.next");
However after #117506, the region is dissolved so it doesn't work.
This moves the dissolveLoopRegions step to after
convertToConcreteRecipes, so we can use the region when expanding
VPWidenIntOrFpInductionRecipes.
Move VPlan-based calculateRegisterUsage from LoopVectorize
to VPlanAnalysis.cpp. It is a VPlan-based analysis and this helps
to reduce the size of LoopVectorize.
PR: https://github.com/llvm/llvm-project/pull/135673
CodeRegions were previously passed as Value*, but then immediately
cast to BasicBlock. Let's keep the type information around until the
use cases for non-BasicBlock code regions actually materialize.
This patch implements the VPlan-based cost model for VPReduction,
VPExtendedReduction and VPMulAccumulateReduction.
With this patch, we can calculate the reduction cost with the VPlan-based
cost model, so the reduction costs in `precomputeCost()` are removed.
Ref: original instruction-based implementation:
https://reviews.llvm.org/D93476