llvm-project

Author	SHA1	Message	Date
Florian Hahn	c467d38090	[LV] Fix offset handling for epilogue resume values. (NFCI) (#189259 ) Instead of replacing all uses of the canonical IV with an add of the resume value and then relying on the fold to simplify, directly create offset versions of both the canonical IV and its increment. The original offset computations were incorrect, but did not result in mis-compiles due to the corresponding fold. Split off from approved https://github.com/llvm/llvm-project/pull/156262.	2026-03-29 17:04:50 +00:00
Florian Hahn	3b58a740d3	[VPlan] Move flags to separate variable (NFC). Addresses post-commit feedback missed in https://github.com/llvm/llvm-project/pull/188966.	2026-03-27 12:44:11 +00:00
Florian Hahn	90c1c588f8	[VPlan] Don't set WrapFlags for truncated IVs. (#188966 ) The wrap flags from the IV bin-op are not guaranteed to apply to truncated inductions, which are evaluated in narrower types. Instead of dropping them late (in expandVPWidenIntOrFpInduction), do not add them at the outset, to prevent invalid transforms based on incorrect flags in the future. PR: https://github.com/llvm/llvm-project/pull/188966	2026-03-27 12:39:03 +00:00
Florian Hahn	5aae014ed5	[LV] Refine tripcount estimate using minimum iteration count rt check. (#188135 ) When not folding the tail, the minimum iteration count check ensures that the vector loop is not executed if computing the trip count wraps around to zero, as the trip count must be at least VF when vectorizing without tail-folding. Add and use a new tryToRefineConstantMaxTripCount helper. This ensures we do not create dead main loops when vectorizing the epilogue, as we choose smaller main VFs. PR: https://github.com/llvm/llvm-project/pull/188135	2026-03-26 20:48:53 +00:00
Florian Hahn	40304d8fef	Reapply "[VPlan] Remove manual region removal when simplifying for VF and UF. (#181252 )" (#188589 ) This reverts commit e30f9c19464bcf1bf1e9f69b63884fb78ad2d05d. Re-land, now that the reported crash causing the revert has been fixed as part of 77fb84889 (#187504). Original message: Replace manual region dissolution code in simplifyBranchConditionForVFAndUF with the general removeBranchOnConst. simplifyBranchConditionForVFAndUF now just creates a (BranchOnCond true) or updates BranchOnTwoConds. The loop then gets automatically removed by running removeBranchOnConst. This removes a bunch of special logic to handle header phi replacements and CFG updates. With the new code, there's no restriction on what kind of header phi recipes the loop contains. Note that VPEVLBasedIVRecipe needs to be marked as readnone. This is technically unrelated, but I could not find an independent test that would be impacted. The code to deal with epilogue resume values now needs updating, because we may simplify a reduction directly to the start value. PR: https://github.com/llvm/llvm-project/pull/181252	2026-03-26 10:14:10 +00:00
Florian Hahn	77fb848894	Reapply "[LV] Simplify and unify resume value handling for epilogue vec." (#187504 ) This reverts commit cdaf29f84dd0abbd1f961982799059c92d76625b. This version skips removeBranchOnConst when vectorizing the epilogue, as it may trigger folds that remove the resume phi used as the resume value from the epilogue. This fixes https://github.com/llvm/llvm-project/issues/187323. Original message: This patch tries to drastically simplify resume value handling for the scalar loop when vectorizing the epilogue. It uses a simpler, uniform approach for updating all resume values in the scalar loop: 1. Create ResumeForEpilogue recipes for all scalar resume phis in the main loop (the epilogue plan will have exactly the same scalar resume phis, in exactly the same order). 2. Update ::execute for ResumeForEpilogue to set the underlying value when executing. This is not super clean, but allows easy lookup of the generated IR value when we update the resume phis in the epilogue. Once we connect the 2 plans together explicitly, this can be removed. 3. Use the list of ResumeForEpilogue VPInstructions from the main loop to update the resume/bypass values from the epilogue. This simplifies the code quite a bit, makes it more robust (should fix https://github.com/llvm/llvm-project/issues/179407) and also fixes a mis-compile in the existing tests (see change in llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-sub-epilogue-vec.ll, where previously we would incorrectly resume using the start value when the epilogue iteration check failed). In some cases, we get simpler code due to additional CSE; in other cases, the induction end value computations get moved from the epilogue iteration check to the vector preheader. We could try to sink the instructions as cleanup, but it is probably not worth the trouble. Fixes https://github.com/llvm/llvm-project/issues/179407. PR for recommit https://github.com/llvm/llvm-project/pull/188134	2026-03-23 22:09:40 +00:00
Alan Zhao	c624851037	[LoopVectorize] Fix an integer narrowing conversion in `getPredBlockCostDivisor(...)` (#187605 ) `LoopVectorizationCostModel::getPredBlockCostDivisor(...)` may return large `uint64_t` values that get coerced to an `unsigned` by `VPCostContext::getPredBlockCostDivisor(...)`, which can cause division by zero. Fixes #187584	2026-03-23 17:22:05 +00:00
Florian Hahn	19b0c68ee0	[VPlan] Skip epilogue vectorization if dead after narrowing IGs. (#187016 ) When narrowing interleave groups, the main vector loop processes IC iterations instead of VF * IC. Update selectEpilogueVectorizationFactor to use the effective VF, checking if the canonical IV controlling the loop now steps by UF instead of VFxUF. This avoids epilogue vectorization with dead epilogue vector loops and also prevents crashes in cases where we can prove both the epilogue and scalar loop are dead. Fixes https://github.com/llvm/llvm-project/issues/186846 PR: https://github.com/llvm/llvm-project/pull/187016	2026-03-20 12:33:16 +00:00
Sander de Smalen	a971089cb8	[LV] Explain why a less profitable VF was chosen (NFCI) (#187469 ) I was very puzzled the other day when the debug output showed that VF 8 had a cost of X and VF 16 had a cost of X/2, yet it still chose VF 8. This PR adds some extra debug output to explain why this happens.	2026-03-20 07:21:17 +00:00
Florian Hahn	fd3cf1c160	[LV] Move dereferenceability check from Legal to VPlan (NFC) (#185323 ) Instead of checking dereferenceability early during LoopVectorizationLegality, defer the check to VPlan construction via areAllLoadsDereferenceable. This is in preparation for supporting early exit vectorization of non-dereferenceable loads, e.g. via speculative loads (https://discourse.llvm.org/t/rfc-provide-intrinsics-for-speculative-loads/89692) or first-faulting loads. Detection in VPlan allows easily replacing potentially non-deref loads with other loads as needed. PR: https://github.com/llvm/llvm-project/pull/185323	2026-03-19 19:21:45 +00:00
Florian Hahn	cdaf29f84d	Revert "[LV] Simplify and unify resume value handling for epilogue vec." (#187504 ) Reverts llvm/llvm-project#185969 This is suspected to cause a miscompile in 549.fotonik3d_r from SPEC 2017 FP	2026-03-19 14:38:37 +00:00
Graham Hunter	b227fab5a6	[NFC][LV] Introduce enums for uncountable exit detail and style (#184808 ) Recursively splitting out some work from #183318; this covers the enums for early exit loop type (none, readonly, readwrite) and the style used (just readonly and masked-handle-ee-in-scalar-tail for now) and refactoring for basic use of those enums.	2026-03-19 14:17:25 +00:00
Florian Hahn	78a8f00977	Revert "[VPlan] Create header phis once regions have been created (NFC)." This reverts commit 91b928f919364b29e241821fc639b9ef56dab1a5. This complicates some analyses that need to happen on the scalar VPlan, before regions have been created, e.g. https://github.com/llvm/llvm-project/pull/185323/.	2026-03-19 12:53:12 +00:00
Elvis Wang	53f8f3b017	Reland [LV] Replace remaining LogicalAnd to vp.merge in EVL optimization. (#184068 ) (#187199 ) This patch replaces the remaining LogicalAnd with vp.merge in the second pass so as not to break the `m_RemoveMask` pattern in optimizeMaskToEVL. Also skip the cost model comparison when the plan contains `vp_merge`, which the legacy model does not cost. This can help to remove the header mask for FindLast reduction (CSA) loops. Original PR: https://github.com/llvm/llvm-project/pull/184068 Original buildbot failure: https://lab.llvm.org/buildbot/#/builders/213/builds/2497	2026-03-19 07:56:42 +08:00
John Brawn	a083e19efe	[VPlan] Add the cost of spills when considering register pressure (#179646 ) Currently, when considering register pressure is enabled, we reject any VF that has higher pressure than the number of registers. However, this can result in failing to vectorize in cases where it's beneficial, because the cost of the extra spills is less than the benefit we get from vectorizing. Deal with this by instead calculating the cost of spills and adding that to the rest of the cost, so we can detect this kind of situation and still vectorize, while avoiding vectorizing in cases where the extra cost makes it not worth it.	2026-03-18 15:30:39 +00:00
Alexis Engelke	080bc25728	[IR][NFCI] Remove *WithoutDebug (#187240 ) The function instructionsWithoutDebug serves two uses: skipping debug intrinsics and skipping pseudo instructions. However, these functions are expensive due to out-of-line filtering using std::function. Ideally, the filter should be inlined, but that would require including IntrinsicInst.h in BasicBlock.h. We no longer use debug intrinsics, so the first use (parameter false) is no longer needed. The second use is sometimes needed, but in many cases PseudoProbe instructions can be filtered out more easily at the call sites. Therefore, remove instructionsWithoutDebug/sizeWithoutDebug. c-t-t stage2-O3 -0.21%.	2026-03-18 15:08:41 +00:00
Florian Hahn	91b928f919	[VPlan] Create header phis once regions have been created (NFC). Since 1b29ac1d1857ea42273fc7862ea019a74a55195d, regions are constructed as part of the scalar transforms; moving header phi creation after region creation slightly simplifies the code.	2026-03-17 08:02:56 +00:00
Florian Hahn	013f2542a2	[LV] Simplify and unify resume value handling for epilogue vec. (#185969 ) This patch tries to drastically simplify resume value handling for the scalar loop when vectorizing the epilogue. It uses a simpler, uniform approach for updating all resume values in the scalar loop: 1. Create ResumeForEpilogue recipes for all scalar resume phis in the main loop (the epilogue plan will have exactly the same scalar resume phis, in exactly the same order). 2. Update ::execute for ResumeForEpilogue to set the underlying value when executing. This is not super clean, but allows easy lookup of the generated IR value when we update the resume phis in the epilogue. Once we connect the 2 plans together explicitly, this can be removed. 3. Use the list of ResumeForEpilogue VPInstructions from the main loop to update the resume/bypass values from the epilogue. This simplifies the code quite a bit, makes it more robust (should fix https://github.com/llvm/llvm-project/issues/179407) and also fixes a mis-compile in the existing tests (see change in llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-sub-epilogue-vec.ll, where previously we would incorrectly resume using the start value when the epilogue iteration check failed). In some cases, we get simpler code due to additional CSE; in other cases, the induction end value computations get moved from the epilogue iteration check to the vector preheader. We could try to sink the instructions as cleanup, but it is probably not worth the trouble. Fixes https://github.com/llvm/llvm-project/issues/179407.	2026-03-16 21:21:59 +00:00
Florian Hahn	9de31c4a3e	[VPlan] Create zero resume value for CanIV directly (NFC). The start value of the canonical IV is always 0. Assert and generate zero VPValue manually in preparation for https://github.com/llvm/llvm-project/pull/156262. Split off as suggested.	2026-03-14 21:13:20 +00:00
Alexis Engelke	efcd3b6108	[IPO][InstCombine][Vectorize][NFCI] Drop uses of BranchInst (#186596 ) Refactor remaining parts of Transforms apart from Scalar and Utils.	2026-03-14 17:49:00 +00:00
Florian Hahn	1b29ac1d18	[LV] Move predication, early exit & region handling to VPlan0 (NFCI) (#185305 ) Move handleEarlyExits, predication and region creation to operate directly on VPlan0. This means they only have to run once, reducing compile time a bit; the relative order remains unchanged. Introducing the regions at this point in particular unlocks performing more transforms once, on the initial VPlan, instead of running them for each VF. Whether a scalar epilogue is required is still determined by the legacy cost model, so we still need to account for that in the VF-specific VPlan logic. PR: https://github.com/llvm/llvm-project/pull/185305	2026-03-14 17:14:08 +00:00
Florian Hahn	475cc4fe0b	[VPlan] Account for any-of costs in legacy cost model. Some VPlan transforms, like vectorizing fmin without fast-math, introduce AnyOfs, which have costs assigned in the VPlan-based cost model, but not in the legacy cost model. Account for their cost as is done for other similar VPInstructions, such as EVL. Fixes https://github.com/llvm/llvm-project/issues/185867.	2026-03-12 21:51:23 +00:00
Alexis Engelke	4fd826d1f9	[IR] Split Br into UncondBr and CondBr (#184027 ) BranchInst currently represents both unconditional and conditional branches. However, these are quite different operations that are often handled separately. Therefore, split them into separate opcodes and classes to allow distinguishing these operations in the type system. Additionally, this also slightly improves compile-time performance.	2026-03-11 12:31:10 +00:00
Florian Hahn	c79a058a6a	[VPlan] Materialize VectorTripCount in narrowInterleaveGroups. (#182146 ) When narrowInterleaveGroups transforms a plan, VF and VFxUF are materialized (replaced with concrete values). This patch also materializes the VectorTripCount in the same transform. This ensures that VectorTripCount is properly computed when the narrow interleave transform is applied, instead of using the original VF + UF to compute the vector trip count. The previous behavior generated correct code, but executed fewer iterations in the vector loop. The change also enables, as a follow-up, stricter verification to prevent accesses of UF, VF, VFxUF, etc. after materialization. Note that in some cases we now miss branch folding, but that should be addressed separately (https://github.com/llvm/llvm-project/pull/181252). Fixes one of the violations that access VectorTripCount after UF and VF have been materialized. PR: https://github.com/llvm/llvm-project/pull/182146	2026-03-10 12:33:30 +00:00
Aiden Grossman	e30f9c1946	Revert "Reapply "[VPlan] Remove manual region removal when simplifying for VF and UF. (#181252 )"" This reverts commit 6aa115bba55054b0dc81ebfc049e8c7a29e614b2. This is causing crashes. See #185345 for details.	2026-03-09 04:24:01 +00:00
Florian Hahn	6aa115bba5	Reapply "[VPlan] Remove manual region removal when simplifying for VF and UF. (#181252 )" This reverts commit d7e037c8383e66e5c07897f144f6d8ef47258682. Recommit with a small fix to properly handle ordered reductions when connecting the epilogue. Original message: Replace manual region dissolution code in simplifyBranchConditionForVFAndUF with the general removeBranchOnConst. simplifyBranchConditionForVFAndUF now just creates a (BranchOnCond true) or updates BranchOnTwoConds. The loop then gets automatically removed by running removeBranchOnConst. This removes a bunch of special logic to handle header phi replacements and CFG updates. With the new code, there's no restriction on what kind of header phi recipes the loop contains. Note that VPEVLBasedIVRecipe needs to be marked as readnone. This is technically unrelated, but I could not find an independent test that would be impacted. The code to deal with epilogue resume values now needs updating, because we may simplify a reduction directly to the start value. PR: https://github.com/llvm/llvm-project/pull/181252	2026-03-08 11:13:40 +00:00
Florian Hahn	2ce5f91425	[VPlan] Optimize resume values of IVs together with other exit values. (#174239 ) Remove updateScalarResumePhis and create extracts for live-outs early in addInitialSkeleton. Instead of extracting from the header phi recipes for the resume values (which is incorrect), extract the last lane of the backedge value. Then update optimizeInductionExitUsers to optimize both the scalar resume values for IVs and IV exit values together. This removes the need to pass state between transforms and addresses a TODO. PR: https://github.com/llvm/llvm-project/pull/174239	2026-03-06 17:05:53 +00:00
Benjamin Maxwell	03c34bb59e	[LV] Support interleaving with FindLast reductions (#184099 ) This extends the existing support to work with arbitrary interleave factors. The main change here is reworking the ExtractLastActive VPInstruction to take a variable number of arguments and handling it in unrollRecipeByUF and VPInstruction::generate. The select condition for all mask/data values in a find-last recurrence is true if the mask for any part is true. Because of this, the masks for inactive parts will be updated to all-false when the parts with active lanes are updated. This ensures the mask/data for the last active element always corresponds to the greatest part with an active lane. This means finding the last element in the middle block simply requires chaining the `extract.last.active` to forward the result from the last active part through any inactive parts ahead of it.	2026-03-06 15:30:58 +00:00
Luke Lau	825129378e	[VPlan] Move tail folding out of VPlanPredicator. NFC (#176143 ) Currently, the logic for introducing a header mask and predicating the vector loop region is done inside introduceMasksAndLinearize. This splits the tail-folding part out into an individual VPlan transform so that VPlanPredicator.cpp doesn't need to worry about tail folding, which seemed to be a temporary measure according to a comment in VPlanTransforms.h. To perform tail folding independently, this splits the "body" of the vector loop region between the phis in the header and the branch + iv increment in the latch: Before:
```
+-------------------------------------------+
|%iv = ...                                  |
|...                                        |
|%iv.next = add %iv, vfxuf                  |
|branch-on-count %iv.next, vector-trip-count|
+-------------------------------------------+
```
After:
```
+-------------------------------------------+
|%iv = ...                                  |
|%wide.iv = widen-canonical-iv ...          |
|%header-mask = icmp ule %wide.iv, BTC      |---+
|branch-on-cond %header-mask                |   |
+-------------------------------------------+   |
                      |                         |
                      v                         |
+-------------------------------------------+   |
|...                                        |   |
+-------------------------------------------+   |
                      |                         |
                      v                         |
+-------------------------------------------+   |
|%iv.next = add %iv, vfxuf                  |<--+
|branch-on-count %iv.next, vector-trip-count|
+-------------------------------------------+
```
Phis are then inserted in the latch for any value in the loop body that has outside uses, with poison as their incoming value from the header edge. The motivation for this is to allow us to share the same "predicate all successor blocks" type of predication we do for tail folding, but for early-exit loops in #172454. This may also allow us to directly emit an EVL-based header mask, instead of having to match + transform the existing header mask in addExplicitVectorLength.
This also allows us to eventually handle recurrences in the same transform, avoiding the need to special-case tail folding in addReductionResultComputation.	2026-03-05 08:17:37 +00:00
Benjamin Maxwell	c6bb6a7e42	[LV] Add `-force-target-supports-masked-memory-ops` option (#184325 ) This can be used to make target agnostic tail-folding tests much less verbose, as masked loads/stores can be used rather than scalar predication.	2026-03-04 13:36:29 +00:00
Luke Lau	bcc272b322	[LV] Remove DataAndControlFlowWithoutRuntimeCheck. NFC (#183762 ) After #144963 and #183292 we never emit the runtime check, so DataAndControlFlowWithoutRuntimeCheck is equivalent to DataAndControlFlow. With that we only need to store one tail folding style instead of two, because we don't need to distinguish whether or not the IV update overflows (to a non-zero value)	2026-03-02 21:14:04 +08:00
Tomer Shafir	265c1f4833	[LV] Add debug print for TTI.MaxInterleaveFactor (NFC) (#183309 ) It is not currently visible in the debug output. --------- Co-authored-by: Sander de Smalen <sander.desmalen@arm.com>	2026-03-02 10:21:58 +02:00
Florian Hahn	3cf53f684d	[LV] Handle sunk reverse VPInstruction in planContainsAdditionalSimps. LICM can now sink reverse VPInstructions outside the loop region; they won't be considered when computing costs. Account for that in planContainsAdditionalSimplifications. Fixes https://github.com/llvm/llvm-project/issues/183592.	2026-03-01 18:44:46 +00:00
Benjamin Maxwell	74c0ee7e72	[TTI] Remove TargetLibraryInfo from IntrinsicCostAttributes (NFC) (#183764 ) This is a remnant from when `sincos` costs used the vector mappings from `TargetLibraryInfo::getVectorMappingInfo`.	2026-03-01 10:16:16 +00:00
Florian Hahn	d7e037c838	Revert "[VPlan] Remove manual region removal when simplifying for VF and UF. (#181252 )" This reverts commit 9c53215d213189d1f62e8f6ee7ba73a089ac2269. Appears to cause crashes with ordered reductions, revert while I investigate	2026-02-27 21:29:41 +00:00
Florian Hahn	9c53215d21	[VPlan] Remove manual region removal when simplifying for VF and UF. (#181252 ) Replace manual region dissolution code in simplifyBranchConditionForVFAndUF with the general removeBranchOnConst. simplifyBranchConditionForVFAndUF now just creates a (BranchOnCond true) or updates BranchOnTwoConds. The loop then gets automatically removed by running removeBranchOnConst. This removes a bunch of special logic to handle header phi replacements and CFG updates. With the new code, there's no restriction on what kind of header phi recipes the loop contains. Note that VPEVLBasedIVRecipe needs to be marked as readnone. This is technically unrelated, but I could not find an independent test that would be impacted. The code to deal with epilogue resume values now needs updating, because we may simplify a reduction directly to the start value. PR: https://github.com/llvm/llvm-project/pull/181252	2026-02-27 16:49:54 +00:00
Luke Lau	d43213fe80	Revert "[VPlan] Don't drop NUW flag on tail folded canonical IVs (#183301 )" (#183698 ) This reverts commit b0b3e3e1c7f6387eabc2ef9ff1fea311e63a4299. After thinking about this for a bit, I don't think this is correct. vscale being a power-of-2 only guarantees the canonical IV increment overflows to zero, but not overflows in general.	2026-02-27 09:13:33 +00:00
Luke Lau	b0b3e3e1c7	[VPlan] Don't drop NUW flag on tail folded canonical IVs (#183301 ) After #183080 vscale can no longer be a non-power of 2, which means the canonical IV can't overflow with tail folding w/ scalable vectors anymore. Therefore we don't need to drop the NUW flag. IVUpdateMayOverflow is left to be removed in a separate PR since it removes further runtime checks.	2026-02-27 07:19:49 +00:00
Florian Hahn	32b8b9ba1e	[VPlan] Simplify ExitingIVValue and use for tail-folded IVs. (#182507 ) Now that we have ExitingIVValue, we can also use it for tail-folded loops; the only difference is that we have to compute the end value with the original trip count instead of the vector trip count. This allows removing the induction increment operand only used when tail-folding. PR: https://github.com/llvm/llvm-project/pull/182507	2026-02-26 11:48:04 +00:00
Benjamin Maxwell	3c566a698a	[LV] Fix miscompile with conditional scalar assignment + tail folding (#182492 ) Previously, we could miscompile when vectorizing conditional scalar assignments with forced tail folding, as the backedge select could be based on the header mask, not the assignment conditional. This resulted in a number of failures in the LLVM test suite when building with `-O3 -march=armv8-a+sve -mllvm -prefer-predicate-over-epilogue=predicate-dont-vectorize`. The patch reworks `handleFindLastReductions()` to correctly handle tail folding.	2026-02-26 09:00:16 +00:00
Luke Lau	1e560c181a	[LV] Remove CheckNeededWithTailFolding from addMinimumIterationCheck. NFC (#183066 ) The IV can no longer overflow with tail folding after #183080.	2026-02-25 18:08:16 +00:00
Luke Lau	a8f5f4a9fc	[VPlan] Assert vplan-verify-each result and fix LastActiveLane verification (#182254 ) Currently, if -vplan-verify-each is enabled and a pass fails the verifier, it will output the failure to stderr but will still finish with a zero exit code. This adds an assertion that fires when verification fails, so that e.g. lit will pick up verifier failures in the in-tree tests with an EXPENSIVE_CHECKS build. Currently the LastActiveLane verification fails in several tests, so this also includes a fix to handle more prefix masks. All of the prefix masks that the verifier encounters are of the form `icmp ult/ule monotonically-increasing-sequence, uniform`, which always generate a prefix mask. Tested that llvm-test-suite + SPEC CPU 2017 now pass with -vplan-verify-each enabled for RISC-V.	2026-02-26 01:49:39 +08:00
Paul Walker	ab360b1e7e	[LLVM][TTI] Remove the isVScaleKnownToBeAPowerOfTwo hook. (#183292 ) After https://github.com/llvm/llvm-project/pull/183080 this is no longer a configurable property. NOTE: No test changes expected beyond llvm/test/Transforms/LoopVectorize/scalable-predication.ll, which has been removed because it only existed to verify the now unsupported functionality.	2026-02-25 14:09:52 +00:00
Luke Lau	0e9880cc04	[VPlan] Remove verifyLate from VPlanVerifier. NFC (#182799 ) We can instead just check if the VPlan has been unrolled.	2026-02-23 23:06:37 +08:00
Jonas Paulsson	d3081aafc4	[SystemZ, LoopVectorizer] Enable vectorization of epilogue loops. (#172925 ) This enables vectorization of epilogue loops produced by LoopVectorizer on SystemZ. LoopVectorizationCostModel::isEpilogueVectorizationProfitable() and TTI.preferEpilogueVectorization() have been refactored slightly so that targets can override preferEpilogueVectorization(ElementCount Iters) and directly control this, whereas before this depended on TTI.getMaxInterleaveFactor() as well. The Iters passed to preferEpilogueVectorization() reflects the total number of scalar iterations performed in the vectorized loop (including interleaving). The default implementation of preferEpilogueVectorization() now subsumes the old check against getMaxInterleaveFactor(). This patch should be NFC for other targets.	2026-02-22 10:59:09 -06:00
Florian Hahn	4ce4987381	[VPlan] Optimize FindLast of FindIV w/o sentinel. (#172569 ) For FindLast reductions selecting an IV, we can avoid the horizontal AnyOf in the vector loop by introducing an independent boolean reduction to track if the condition was ever true in the loop. If it was never true in the loop, we select the start value; otherwise, we select the min/max of the FindIV reduction, as required by the predicate. The main advantage of this approach is that we have 2 independent reductions that do not require a horizontal AnyOf reduction in the loop. Currently, this requires a non-wrapping IV, but this can be relaxed in the future by selecting a canonical IV, which is then transformed to the specific derived IV for the reduction after the loop. Depends on https://github.com/llvm/llvm-project/pull/177870. PR: https://github.com/llvm/llvm-project/pull/172569	2026-02-20 21:48:35 +00:00
Luke Lau	6a5375fbce	[VPlan] Plumb recurrence FMFs through VPReductionPHIRecipe via VPIRFlags. NFC (#181694 ) In order to be able to create selects for reduction phis through tail folding in foldTailByMasking (#176143), make VPReductionPHIRecipe an instance of VPIRFlags and plumb the FMFs from the original RdxDesc. This allows us to remove more uses of the RecurrenceDescriptor in addReductionResultComputation, which should help untie it from LoopVectorizationLegality.	2026-02-19 11:23:47 +00:00
Florian Hahn	4042975b63	[LV] Support argmin/argmax with strict predicates. (#170223 ) Extend handleMultiUseReductions to support strict predicates (>, <), which match the first index of the min/max rather than the last index matched by non-strict predicates. Builds on top of https://github.com/llvm/llvm-project/pull/141431. FindLast reductions with strict predicates are adjusted to compute the correct result as follows: 1. Find the first canonical indices corresponding to partial min/max values, using loop reductions. 2. Find which of the partial min/max values are equal to the overall min/max value. 3. Select among the canonical indices those corresponding to the overall min/max value. 4. Find the first canonical index of overall min/max and scale it back to the original IV using VPDerivedIVRecipe. 5. If the overall min/max equals the starting min/max, the condition in the loop was always false, due to being strict; return the original start value in that case.	2026-02-19 10:52:27 +00:00
Sander de Smalen	114e20805f	[LV] Fix sub-reduction PHI in vectorized epilogue (#182072 ) When the vectorized epilogue loop uses partial reductions, the PHI node in the loop must start at 0 (because for partial sub-reductions the sub is done in the middle block) and the compute-reduction-result must subtract from the partial result (as calculated in the middle block of the main vector loop), instead of subtracting from the original init value. This fixes the issue as reported on #178919 by @aeubanks.	2026-02-19 10:15:32 +00:00
Florian Hahn	cb87d346b4	[VPlan] Retrieve vector trip count from BranchOnCount (NFC). In preparePlanForMainVectorLoop, the vector trip count may already be materialized, e.g. when narrowing interleave groups. VPSymbolicValues should not be accessed after materializing. Instead retrieve the trip count directly from the branch-on-count. This is NFC at the moment, but is needed to tighten VPSymbolicValue access verification.	2026-02-18 21:08:33 +00:00
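
The narrowing problem fixed in c624851037 (a `uint64_t` divisor coerced to `unsigned`) can be reproduced outside LLVM. This is a minimal sketch assuming a 32-bit `unsigned`; `wideDivisor`, `narrowedDivisor`, and `scaledCost` are hypothetical helpers, not the actual `getPredBlockCostDivisor` implementations, and the power-of-two divisor is only illustrative.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a cost-model helper that returns a 64-bit
// divisor; the name and the power-of-two formula are illustrative only.
uint64_t wideDivisor(unsigned Shift) {
  assert(Shift < 64 && "shift amount out of range");
  return uint64_t(1) << Shift;
}

// Hypothetical caller that stores the result in `unsigned`. Assuming a 32-bit
// unsigned, the conversion keeps only the low 32 bits, so 2^32 becomes 0 and a
// later division by this value is undefined behavior.
unsigned narrowedDivisor(unsigned Shift) {
  return static_cast<unsigned>(wideDivisor(Shift));
}

// Keeping the whole computation in uint64_t avoids the truncation.
uint64_t scaledCost(uint64_t Cost, unsigned Shift) {
  return Cost / wideDivisor(Shift);
}
```

In this sketch, `narrowedDivisor(31)` still yields 2147483648, but `narrowedDivisor(32)` yields 0, while `scaledCost` divides by the untruncated 64-bit value.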

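The five steps listed for 4042975b63 (argmin/argmax with strict predicates) can be checked against a scalar loop with a lane-wise simulation. This is a sketch, not the VPlan implementation: it assumes a fixed VF, a trip count that is a multiple of VF (no tail), and an increasing IV `IVStart + IVStep * i`; `ArgMin`, `scalarArgMin`, and `vectorArgMin` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct ArgMin {
  int64_t Min;
  int64_t Index;
};

// Scalar reference loop: the strict `<` keeps the FIRST index of the minimum.
ArgMin scalarArgMin(const std::vector<int64_t> &A, int64_t StartMin,
                    int64_t StartIdx, int64_t IVStart, int64_t IVStep) {
  ArgMin R{StartMin, StartIdx};
  for (std::size_t I = 0; I < A.size(); ++I)
    if (A[I] < R.Min)
      R = {A[I], IVStart + IVStep * static_cast<int64_t>(I)};
  return R;
}

// Hypothetical lane-wise simulation of the vectorized loop (fixed VF, no tail).
ArgMin vectorArgMin(const std::vector<int64_t> &A, std::size_t VF,
                    int64_t StartMin, int64_t StartIdx, int64_t IVStart,
                    int64_t IVStep) {
  assert(VF > 0 && A.size() % VF == 0);
  // Step 1: per-lane partial minima and the first canonical index reaching them.
  std::vector<int64_t> LaneMin(VF, StartMin);
  std::vector<uint64_t> LaneIdx(VF, UINT64_MAX); // no index recorded yet
  for (std::size_t Base = 0; Base < A.size(); Base += VF)
    for (std::size_t L = 0; L < VF; ++L)
      if (A[Base + L] < LaneMin[L]) {
        LaneMin[L] = A[Base + L];
        LaneIdx[L] = Base + L;
      }
  // Step 2: the overall minimum across lanes.
  int64_t Overall = *std::min_element(LaneMin.begin(), LaneMin.end());
  // Step 5: the strict compare never fired, so keep the original start value.
  if (Overall == StartMin)
    return {StartMin, StartIdx};
  // Steps 3-4: smallest canonical index among lanes holding the overall min,
  // scaled back to the original IV.
  uint64_t First = UINT64_MAX;
  for (std::size_t L = 0; L < VF; ++L)
    if (LaneMin[L] == Overall)
      First = std::min(First, LaneIdx[L]);
  return {Overall, IVStart + IVStep * static_cast<int64_t>(First)};
}
```

Taking the smallest canonical index across lanes works because each lane already holds the first index, within that lane, at which its minimum was reached.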

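The idea behind 4ce4987381 (two independent reductions instead of a horizontal AnyOf inside the loop) can be sketched the same way. All names are hypothetical, and the sketch assumes a fixed VF, no tail, a strictly increasing non-wrapping IV (so "last" means "max"), and FindIV lanes initialized to the IV's first value so that lanes that never fire cannot win the final max. That initialization is an assumption of this sketch, not necessarily what VPlan emits.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Scalar reference: remember the last IV value whose element exceeds Threshold.
int64_t scalarFindLastIV(const std::vector<int64_t> &A, int64_t Threshold,
                         int64_t Start, int64_t IVStart, int64_t IVStep) {
  int64_t R = Start;
  for (std::size_t I = 0; I < A.size(); ++I)
    if (A[I] > Threshold)
      R = IVStart + IVStep * static_cast<int64_t>(I);
  return R;
}

// Hypothetical lane-wise simulation with two independent per-lane reductions:
// a max-reduction over IV values (FindIV) and a boolean OR ("ever true").
int64_t vectorFindLastIV(const std::vector<int64_t> &A, std::size_t VF,
                         int64_t Threshold, int64_t Start, int64_t IVStart,
                         int64_t IVStep) {
  assert(VF > 0 && A.size() % VF == 0 && IVStep > 0);
  // No sentinel: lanes start at the IV's first value, which is <= every IV
  // value the loop can produce, so idle lanes never win the final max.
  std::vector<int64_t> LaneIV(VF, IVStart);
  std::vector<bool> LaneAny(VF, false);
  for (std::size_t Base = 0; Base < A.size(); Base += VF)
    for (std::size_t L = 0; L < VF; ++L) {
      bool Cond = A[Base + L] > Threshold;
      int64_t IV = IVStart + IVStep * static_cast<int64_t>(Base + L);
      LaneIV[L] = Cond ? std::max(LaneIV[L], IV) : LaneIV[L];
      LaneAny[L] = LaneAny[L] || Cond;
    }
  // After the loop: horizontal OR and max, then select the start value if the
  // condition was never true.
  bool Any = std::find(LaneAny.begin(), LaneAny.end(), true) != LaneAny.end();
  return Any ? *std::max_element(LaneIV.begin(), LaneIV.end()) : Start;
}
```

In this sketch, the loop body only performs per-lane selects and ORs; all cross-lane work happens once, after the loop.
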
2935 Commits