llvm-project

Author	SHA1	Message	Date
Florian Hahn	c79a058a6a	[VPlan] Materialize VectorTripCount in narrowInterleaveGroups. (#182146 ) When narrowInterleaveGroups transforms a plan, VF and VFxUF are materialized (replaced with concrete values). This patch also materializes the VectorTripCount in the same transform. This ensures that VectorTripCount is properly computed when the narrow interleave transform is applied, instead of using the original VF + UF to compute the vector trip count. The previous behavior generated correct code, but executed fewer iterations in the vector loop. The change also enables stricter verification to prevent accesses of UF, VF, VFxUF etc after materialization as follow-up. Note that in some cases we now miss branch folding, but that should be addressed separately, https://github.com/llvm/llvm-project/pull/181252 Fixes one of the violations accessing a VectorTripCount after UF and VF being materialized PR: https://github.com/llvm/llvm-project/pull/182146	2026-03-10 12:33:30 +00:00
Sander de Smalen	0da00c325b	[LV] Support float and pointer FindLast reductions (#184101 ) This duplicates #182313 with some very small modifications on top, as @dheaton-arm is unable to finish the PR and I'm unable to push to his branch. Expands support for the `FindLast` Recurrence Kind to floating-point and pointer types, thereby enabling conditional scalar assignment (CSA) for these types. Originally authored by @dheaton-arm --------- Co-authored-by: Damian Heaton <Damian.Heaton@arm.com>	2026-03-09 10:27:06 +00:00
Aiden Grossman	e30f9c1946	Revert "Reapply "[VPlan] Remove manual region removal when simplifying for VF and UF. (#181252 )"" This reverts commit 6aa115bba55054b0dc81ebfc049e8c7a29e614b2. This is causing crashes. See #185345 for details.	2026-03-09 04:24:01 +00:00
Florian Hahn	2207296d3f	[VPlan] Fold constant trunc after EVL simplification. This fixes a crash for the new test after 6aa115bba55054b0dc81ebfc049e8c7a29e614b2.	2026-03-08 19:31:20 +00:00
Florian Hahn	6aa115bba5	Reapply "[VPlan] Remove manual region removal when simplifying for VF and UF. (#181252 )" This reverts commit d7e037c8383e66e5c07897f144f6d8ef47258682. Recommit with a small fix to properly handle ordered reductions when connecting the epilogue. Original message: Replace manual region dissolution code in simplifyBranchConditionForVFAndUF with using general removeBranchOnConst. simplifyBranchConditionForVFAndUF now just creates a (BranchOnCond true) or updates BranchOnTwoConds. The loop then gets automatically removed by running removeBranchOnConst. This removes a bunch of special logic to handle header phi replacements and CFG updates. With the new code, there's no restriction on what kind of header phi recipes the loop contains. Note that VPEVLBasedIVRecipe needs to be marked as readnone. This is technically unrelated, but I could not find an independent test that would be impacted. The code to deal with epilogue resume values now needs updating, because we may simplify a reduction directly to the start value. PR: https://github.com/llvm/llvm-project/pull/181252	2026-03-08 11:13:40 +00:00
Florian Hahn	2ce5f91425	[VPlan] Optimize resume values of IVs together with other exit values. (#174239 ) Remove updateScalarResumePhis and create extracts for live-outs early in addInitialSkeleton. Instead of extracting from the header phi recipes for the resume values (which is incorrect), extract the last lane of the backedge value. Then update optimizeInductionExitUsers to optimize both the scalar resume values for IVs and IV exit values together. This removes the need to pass state between transforms and addresses a TODO. PR: https://github.com/llvm/llvm-project/pull/174239	2026-03-06 17:05:53 +00:00
Florian Hahn	d316fb0797	[VPlan] Replicate VPScalarIVStepsRecipe by VF outside replicate regions. (#170053 ) Extend replicateByVF to also handle VPScalarIVStepsRecipe. To do so, the patch adds a new lane operand to VPScalarIVStepsRecipe, which is only added when replicating. This enables removing a number of lane 0 computations. The lane operand will also be used to explicitly replicate replicate regions in a follow-up. Depends on https://github.com/llvm/llvm-project/pull/169796 Depends on https://github.com/llvm/llvm-project/pull/170906 PR: https://github.com/llvm/llvm-project/pull/170053	2026-03-05 12:42:20 +00:00
Ramkumar Ramachandra	ca0d100e79	[VPlan] Use VPlan::getZero to improve code (NFC) (#184591 )	2026-03-04 21:21:35 +00:00
Florian Hahn	c370f5af6c	[VPlan] Preserve IsSingleScalar for hoisted predicated load. (#184453 ) The predicated loads may be single scalar (e.g. for VF = 1). We should preserve IsSingleScalar when hoisting them. As all loads access the same address, IsSingleScalar must match across all loads in the group. This fixes an assertion when interleaving-only with hoisted loads. Fixes https://github.com/llvm/llvm-project/issues/184372 PR: https://github.com/llvm/llvm-project/pull/184453	2026-03-04 14:32:00 +00:00
Florian Hahn	bbde3e3b59	[VPlan] Preserve IsSingleScalar for sunken predicated stores. (#184329 ) The predicated stores may be single scalar (e.g. for VF = 1). We should preserve IsSingleScalar. As all stores access the same address, IsSingleScalar must match across all stores in the group. This fixes an assertion when interleaving-only with sunken stores. Fixes https://github.com/llvm/llvm-project/issues/184317 PR: https://github.com/llvm/llvm-project/pull/184329	2026-03-03 14:08:00 +00:00
Ramkumar Ramachandra	b4743b2641	[VPlan] Introduce VPlan::get(Zero\|AllOnes) (NFC) (#184085 )	2026-03-03 09:47:05 +00:00
Luke Lau	bcc272b322	[LV] Remove DataAndControlFlowWithoutRuntimeCheck. NFC (#183762 ) After #144963 and #183292 we never emit the runtime check, so DataAndControlFlowWithoutRuntimeCheck is equivalent to DataAndControlFlow. With that we only need to store one tail folding style instead of two, because we don't need to distinguish whether or not the IV update overflows (to a non-zero value)	2026-03-02 21:14:04 +08:00
Jan Patrick Lehr	60fec80bdc	Revert "[VPlan] Remove unused VPExpandSCEVRecipe before expansion" (#184108 ) Reverts llvm/llvm-project#181329 Breaks: https://lab.llvm.org/buildbot/#/builders/123/builds/36163 Local revert fixes the issue seen in the buildbot.	2026-03-02 12:45:48 +00:00
Mel Chen	c62c00c524	[VPlan] Remove unused VPExpandSCEVRecipe before expansion (#181329 ) VPExpandSCEVRecipe may become unused after VPlan optimizations. This patch removes VPExpandSCEVRecipes with no users before expansion in expandSCEVs, avoiding generating dead code during VPlan execution.	2026-03-02 09:04:59 +00:00
Florian Hahn	320220e48b	[VPlan] Support arbitrary predicated early exits. (#182396 ) This removes the restriction requiring a single predicated early exit. Using MaskedCond, we only combine early-exit conditions with block masks from non-exiting control flow. This means we have to ensure that we check the early exit conditions in program order, to make sure we take the first exit in program order that exits at the first lane for the combined exit condition. To do so, sort the exits by their reverse post-order numbers. Depends on https://github.com/llvm/llvm-project/pull/182395 PR: https://github.com/llvm/llvm-project/pull/182396	2026-03-01 16:07:05 +00:00
Florian Hahn	72525fb4ee	[VPlan] Materialize UF after unrolling (NFCI). Move materialization of the symbolic UF directly to unrollByUF. At this point, unrolling materializes the decision and it is natural to also materialize the symbolic UF here.	2026-02-28 12:44:15 +00:00
Luke Lau	6f9c68d320	[VPlan] Don't adjust trip count for DataAndControlFlowWithoutRuntimeCheck (#183729 ) Previously, the canonical IV increment may have overflowed to a non-zero value due to vscale being a non power-of-two. So we used to emit a runtime check for this. If you didn't want the runtime check, DataAndControlFlowWithoutRuntimeCheck skipped it and instead tweaked the trip count so it wouldn't overflow. However #144963 stopped the check from ever being emitted because vscale is always a power-of-two on AArch64 and RISC-V, so it never overflowed to a non-zero value. And in #183292 the code to emit the check was removed. But we never restored the trip count back to normal when the target's vscale was a power-of-two. Now that vscale is always a power-of-two, this PR avoids adjusting it. A follow up NFC can then remove DataAndControlFlowWithoutRuntimeCheck.	2026-02-28 04:01:58 +00:00
Florian Hahn	d7e037c838	Revert "[VPlan] Remove manual region removal when simplifying for VF and UF. (#181252 )" This reverts commit 9c53215d213189d1f62e8f6ee7ba73a089ac2269. Appears to cause crashes with ordered reductions, revert while I investigate	2026-02-27 21:29:41 +00:00
Florian Hahn	9c53215d21	[VPlan] Remove manual region removal when simplifying for VF and UF. (#181252 ) Replace manual region dissolution code in simplifyBranchConditionForVFAndUF with using general removeBranchOnConst. simplifyBranchConditionForVFAndUF now just creates a (BranchOnCond true) or updates BranchOnTwoConds. The loop then gets automatically removed by running removeBranchOnConst. This removes a bunch of special logic to handle header phi replacements and CFG updates. With the new code, there's no restriction on what kind of header phi recipes the loop contains. Note that VPEVLBasedIVRecipe needs to be marked as readnone. This is technically unrelated, but I could not find an independent test that would be impacted. The code to deal with epilogue resume values now needs updating, because we may simplify a reduction directly to the start value. PR: https://github.com/llvm/llvm-project/pull/181252	2026-02-27 16:49:54 +00:00
Luke Lau	c5c0fe663c	[VPlan] Remove non-power-of-2 scalable VF comment. NFC (#183719 ) No longer holds after #183080	2026-02-27 10:45:17 +00:00
Sander de Smalen	a1f83ba1b6	[LV] NFCI: Move extend optimization to transformToPartialReduction. (#182860 ) The reason for doing this in `transformToPartialReduction` is so that we can create the VPExpressions directly when transforming reductions into partial reductions (to be done in a follow-up PR). I also intend to see if we can merge the in-loop reductions with partial reductions, so that there will be no need for the separate `convertToAbstractRecipes` VPlan Transform pass.	2026-02-27 08:38:13 +00:00
Ramkumar Ramachandra	bd5f9384d8	[VPlan] Extend interleave-group-narrowing to WidenCast (#183204 ) WidenCast is very similar to Widen recipes. Fixes #128062.	2026-02-26 14:50:25 +00:00
Florian Hahn	c5d6feb315	[VPlan] Limit interleave group narrowing to consecutive wide loads. Tighten check in canNarrowLoad to require consecutive wide loads; we cannot properly narrow gathers at the moment. Fixes https://github.com/llvm/llvm-project/issues/183345.	2026-02-26 12:52:31 +00:00
Florian Hahn	32b8b9ba1e	[VPlan] Simplify ExitingIVValue and use for tail-folded IVs. (#182507 ) Now that we have ExitingIVValue, we can also use it for tail-folded loops; the only difference is that we have to compute the end value with the original trip count instead of the vector trip count. This allows removing the induction increment operand only used when tail-folding. PR: https://github.com/llvm/llvm-project/pull/182507	2026-02-26 11:48:04 +00:00
Benjamin Maxwell	3c566a698a	[LV] Fix miscompile with conditional scalar assignment + tail folding (#182492 ) Previously, we could miscompile when vectorizing conditional scalar assignments with forced tail folding, as the backedge select could be based on the header mask, not the assignment conditional. This resulted in a number of failures in the LLVM test suite when building with `-O3 -march=armv8-a+sve -mllvm -prefer-predicate-over-epilogue=predicate-dont-vectorize`. The patch reworks `handleFindLastReductions()` to correctly handle tail folding.	2026-02-26 09:00:16 +00:00
Florian Hahn	bf4705c05b	[VPlan] Supported conditionally executed single early exits. (#182395 ) Add support for a single early exit that is executed conditionally. We need to make sure the mask from any non-exiting control flow is combined with the early exit condition. To do so, introduce a MaskedCond VPInstruction, which is inserted as user of the early-exit condition, at the point of the early-exit branch. The VPInstruction will get masked automatically if needed by the predicator, ensuring that we properly account for it when checking whether the early exit has been taken. Note that this does not allow for instructions that require predication after the early exit. This requires additional work in progress: https://github.com/llvm/llvm-project/pull/172454 As an alternative to MaskedCond, we could also predicate before handling early exiting blocks: https://github.com/llvm/llvm-project/pull/181830 PR: https://github.com/llvm/llvm-project/pull/182395	2026-02-25 14:28:04 +00:00
Florian Hahn	804572136e	[VPlan] Allow recursive narrowing in interleave group narrowing. (#167310 ) This allows canNarrowOps to recursively check if operands can be narrowed, enabling narrowing of longer chains of operations that feed interleave groups. Depends on https://github.com/llvm/llvm-project/pull/167309. PR: https://github.com/llvm/llvm-project/pull/167310	2026-02-24 21:30:00 +00:00
Benjamin Maxwell	3b5a05d0b2	Revert "[VPlan] Strengthen materializeFactors with assert (NFC) (#181665 )" (#183014 ) This PR did not solve the TODO as intended. Reverting so the TODO is not lost. This reverts commit aab9412a69a07787e9ec98b25709d709b7b537a6.	2026-02-24 18:03:01 +00:00
Ramkumar Ramachandra	e147b3a05e	[VPlan] Fix alias logic in canHoistOrSinkWithNoAliasCheck (#179504 ) The correct way to check if two memory locations may alias is outlined in ScopedNoAliasAAResult::alias: extract this into a helper, to fix the current logic.	2026-02-24 16:14:44 +00:00
Florian Hahn	72c0a074db	[VPlan] Move out canNarrowOps (NFC). (#167309 ) Move definition of canNarrowOps out to static function, to make it easier to extend + generalize PR: https://github.com/llvm/llvm-project/pull/167309	2026-02-24 14:20:47 +00:00
Florian Hahn	6b352aa8ea	Revert "[VPlan] Add simple driver option to run some individual transforms. (#178522 )" This reverts commit 3df1c6f88bfbbd76d9256c55358bb75e02e33779. Causes build-failures without assertions https://lab.llvm.org/buildbot/#/builders/159/builds/41683	2026-02-23 22:55:42 +00:00
Florian Hahn	3df1c6f88b	[VPlan] Add simple driver option to run some individual transforms. (#178522 ) Add an alternative to test VPlan in more isolation via a new `vplan-test-transform` option, which builds VPlan0 for each loop in the input IR and then can invoke a set of transforms on it. In order to allow different recipe types to be created, a new widen-from-metadata transform is added, which transforms VPInstructions to different recipes, based on custom !vplan.widen metadata. Currently this supports creating widen & replicate recipes, but can easily be extended in the future. Currently the handling is intentionally bare-bones, to be extended gradually as needed. PR: https://github.com/llvm/llvm-project/pull/178522	2026-02-23 22:49:00 +00:00
Luke Lau	ff88b83fed	[VPlan] Handle extracts for middle blocks also used by early exiting blocks. NFC (#181789 ) Currently createExtractsForLiveOuts only handles creating extracts when the middle block has one predecessor, but if an early exit exits to the same block as the latch then it might have multiple predecessors. This handles the latter case to avoid the need to handle it in VPlanTransforms::handleUncountableEarlyExits. Addresses the comment in https://github.com/llvm/llvm-project/pull/174864#discussion_r2794153217	2026-02-23 04:03:49 +00:00
Florian Hahn	4ce4987381	[VPlan] Optimize FindLast of FindIV w/o sentinel. (#172569 ) For FindLast reduction selecting an IV, we can avoid the horizontal AnyOf in the vector loop, by introducing an independent boolean reduction to track if the condition was ever true in the loop. If it was never true in the loop, we select the start value, otherwise we select the min/max of the FindIV reduction, as required by the predicate. The main advantage of this approach is that we have 2 independent reductions, that do not require a horizontal AnyOf reduction in the loop. Currently this requires a non-wrapping IV, but this can be relaxed in the future by selecting a canonical IV, which is then transformed to the specific derived IV for the reduction after the loop. Depends on https://github.com/llvm/llvm-project/pull/177870. PR: https://github.com/llvm/llvm-project/pull/172569	2026-02-20 21:48:35 +00:00
Sander de Smalen	46bfd69343	[LV] NFCI: Add RecurKind to VPPartialReductionChain (#181705 ) This avoids having to pass around the RecurKind or re-figure it out from the VPReductionPHI node. This is useful in a follow-up PR, where we need to distinguish between a `Sub` and `AddWithSub` recurrence, which can't be deduced from the `ReductionBinOp` field.	2026-02-19 13:35:10 +00:00
Luke Lau	6a5375fbce	[VPlan] Plumb recurrence FMFs through VPReductionPHIRecipe via VPIRFlags. NFC (#181694 ) In order to be able to create selects for reduction phis through tail folding in foldTailByMasking (#176143), make VPReductionPHIRecipe an instance of VPIRFlags and plumb the FMFs from the original RdxDesc. This allows us to remove more uses of the RecurrenceDescriptor in addReductionResultComputation, which should help untie it from LoopVectorizationLegality.	2026-02-19 11:23:47 +00:00
Benjamin Maxwell	867272d52a	[LV] Pass symbolic VF to CalculateTripCountMinusVF and CanonicalIVIncrementForPart (NFC) (#180542 ) This makes it easier to update the runtime VF per VPlan.	2026-02-18 08:58:47 +00:00
Shih-Po Hung	97fa3e5936	[NFC][VPlan] Rename VPEVLBasedIVPHIRecipe to VPCurrentIterationPHIRecipe (#177114 ) This is groundwork for #151300, which aims to support first-faulting loads in non-tail-folded early-exit loops. Per #175900, we need a variable-length stepping transform that can be shared between EVL and non-EVL loops. The idea is to have an EVL-independent counter and transform for tracking the cumulative number of processed elements. This patch renames the existing counter (VPEVLBasedIVPHIRecipe) and transform (canonicalizeEVLLoops) to be EVL-independent: - Rename VPEVLBasedIVPHIRecipe to VPCurrentIterationRecipe to reflect its general purpose of tracking processed element count. - Rename canonicalizeEVLLoops to convertToVariableLengthStep. This is NFC.	2026-02-18 07:04:58 +00:00
Ramkumar Ramachandra	aab9412a69	[VPlan] Strengthen materializeFactors with assert (NFC) (#181665 ) This fixes a TODO.	2026-02-17 16:18:22 +00:00
Luke Lau	09a0615686	[VPlan] Simplify worklist in reassociateHeaderMask. NFC (#181595 ) Addresses review comments from https://github.com/llvm/llvm-project/pull/180898#pullrequestreview-3791945590. We don't need to recursively collect direct users of the header mask, we can do that as a separate step so that the main worklist loop only handles potentially reassociatable candidates. Also add back mention of tail folding to comment and a TODO.	2026-02-17 14:36:25 +00:00
Florian Hahn	cdaeecabf7	[VPlan] Only remove backedge if IV is still incremented by VFxUF. After 6f253e87dd, VFxUF may have been replaced by UF, in which case the simplification is no longer correct. Tighten check to make sure the increment is still what we expect. Fixes a miscompile in the added test case.	2026-02-17 11:40:32 +00:00
Ramkumar Ramachandra	2b7c1f9d82	[VPlan] Directly unroll VectorEndPointerRecipe (#172372 ) Directly unroll VectorEndPointerRecipe following 0636225b ([VPlan] Directly unroll VectorPointerRecipe, #168886). It allows us to leverage existing VPlan simplifications to optimize. Co-authored-by: Luke Lau <luke@igalia.com> Co-authored-by: Florian Hahn <flo@fhahn.com>	2026-02-16 09:59:55 +00:00
Ramkumar Ramachandra	edfe43cc9e	[VPlan] Factor common VPDT-sort in sink-replicate (NFC) (#179214 )	2026-02-16 07:24:55 +00:00
Florian Hahn	6f253e87dd	Reapply "[VPlan] Run narrowInterleaveGroups during general VPlan optimizations. (#149706 )" This reverts commit 8d29d09309654541fb2861524276ada6a3ebf84c. The underlying issue causing the revert has been fixed independently as 301fa24671256734df6b7ee65f23ad885400108e. Original message: Move narrowInterleaveGroups to the general VPlan optimization stage. To do so, narrowInterleaveGroups now has to find a suitable VF where all interleave groups are consecutive and saturate the full vector width. If such a VF is found, the original VPlan is split into 2: a) a new clone which contains all VFs of Plan, except VFToOptimize, and b) the original Plan with VFToOptimize as single VF. The original Plan is then optimized. If a new copy for the other VFs has been created, it is returned and the caller has to add it to the list of candidate plans. Together with https://github.com/llvm/llvm-project/pull/149702, this allows taking the narrowed interleave groups into account when computing costs to choose the best VF and interleave count. One example where we currently miss interleaving/unrolling when narrowing interleave groups is https://godbolt.org/z/Yz77zbacz PR: https://github.com/llvm/llvm-project/pull/149706	2026-02-15 20:10:10 +00:00
Florian Hahn	f3a816598d	[VPlan] Add VPSymbolicValue for UF. (NFC) Add a symbolic unroll factor (UF) to VPlan similar to VF & VFxUF that gets replaced with the concrete UF during plan execution, similar to how VF is used for the vectorization factor. This is a preparatory change that allows transforms to use the symbolic UF before the concrete UF is determined. Note that the old getUF that returns the concrete UF after unrolling has been renamed to getConcreteUF. Split off from the re-commit of 8d29d093096 (https://github.com/llvm/llvm-project/pull/149706) as suggested.	2026-02-15 15:24:35 +00:00
Florian Hahn	f26e8595c3	[VPlan] Use VPlan::getConstantInt in a few more cases (NFC). VPlan::getConstantInt() allows for slightly more compact creation of VPIRValues wrapping ConstantInts.	2026-02-15 14:45:33 +00:00
Brian Cain	02429c4633	[LV] Fix strict weak ordering violation in handleUncountableEarlyExits sort (#181462 ) The sort comparator used VPDT.dominates() which returns true for dominates(A, A), violating the irreflexivity requirement of strict weak ordering. With _GLIBCXX_DEBUG enabled (LLVM_ENABLE_EXPENSIVE_CHECKS=ON), std::sort validates this property and aborts: Error: comparison doesn't meet irreflexive requirements, assert(!(a < a)). Use properlyDominates() instead, which correctly returns false for equal inputs while preserving the intended dominance-based ordering. This fixes a crash introduced by ede1a9626b89 ("[LV] Vectorize early exit loops with multiple exits.").	2026-02-14 23:05:39 -06:00
Florian Hahn	ede1a9626b	[LV] Vectorize early exit loops with multiple exits. (#174864 ) Building on top of the recent changes to introduce BranchOnTwoConds, this patch adds support for vectorizing loops with multiple early exits, all dominating a countable latch. The early exits must form a dominance chain, so we can simply check which early exit has been taken in dominance order. Currently LoopVectorizationLegality ensures that all exits other than the latch must be uncountable. handleUncountableEarlyExits now collects those uncountable exits and processes each exit. In the vector region, we compute if any exit has been taken, by taking the OR of all early exit conditions (EarlyExitConds) and checking if there's any active lane. If the early exit is taken, we exit the loop and compute which early exit has been taken. The first taken early exit is the one where its exit condition is true in the first active lane of EarlyExitConds. We create a chain of dispatch blocks outside the loop to check this for the early exit blocks ordered by dominance. Depends on https://github.com/llvm/llvm-project/pull/174016. PR: https://github.com/llvm/llvm-project/pull/174864	2026-02-13 16:44:23 +00:00
Ramkumar Ramachandra	ec0b22ff47	[VPlan] Reuse introduces-broadcast logic in narrowToSingleScalars (#174444 ) narrowToSingleScalarRecipes' operands check is a bit too restrictive by permitting a single user. Factor out and reuse the existing introduces-broadcast logic to improve results.	2026-02-13 15:56:57 +00:00
Florian Hahn	a55fbab0cf	[VPlan] Run initial recipe simplification on VPlan0. (#176828 ) In some cases, LV gets simplifiable IR as input. Directly apply simplifications on the initial VPlan0 to avoid vectorization in cases where the loop body can be folded away. Using the end-to-end pipeline, this is relatively rare, but when reducing test cases, the reduction often ends up with cases with trivial folds. Rejecting those will result in more robust & realistic test cases. As follow-up, I also plan to add initial dead recipe removal. Depends on https://github.com/llvm/llvm-project/pull/176795. PR: https://github.com/llvm/llvm-project/pull/176828	2026-02-13 12:01:22 +00:00

690 Commits