llvm-project

Author	SHA1	Message	Date
Alexey Bataev	e317f42455	[SLP]Recalculate dependencies for the buildvector schedule node, if they have copyable node Need to recalculate the deps for all buildvector nodes with copyable deps to prevent a compiler crash during scheduling of instructions	2026-02-28 12:29:47 -08:00
Florian Hahn	72525fb4ee	[VPlan] Materialize UF after unrolling (NFCI). Move materialization of the symbolic UF directly to unrollByUF. At this point, unrolling materializes the decision and it is natural to also materialize the symbolic UF here.	2026-02-28 12:44:15 +00:00
Luke Lau	6f9c68d320	[VPlan] Don't adjust trip count for DataAndControlFlowWithoutRuntimeCheck (#183729 ) Previously, the canonical IV increment may have overflowed to a non-zero value due to vscale being a non power-of-two. So we used to emit a runtime check for this. If you didn't want the runtime check, DataAndControlFlowWithoutRuntimeCheck skipped it and instead tweaked the trip count so it wouldn't overflow. However #144963 stopped the check from ever being emitted because vscale is always a power-of-two on AArch64 and RISC-V, so it never overflowed to a non-zero value. And in #183292 the code to emit the check was removed. But we never restored the trip count back to normal when the target's vscale was a power-of-two. Now that vscale is always a power-of-two, this PR avoids adjusting it. A follow up NFC can then remove DataAndControlFlowWithoutRuntimeCheck.	2026-02-28 04:01:58 +00:00
Alexey Bataev	12e1075b64	[SLP]Fix operand reordering when estimating profitability of operands Need to swap operand for a single instruction, not for the same lane of the first and second instruction in the list	2026-02-27 16:16:22 -08:00
Florian Hahn	73d655a598	[VPlan] Support unrolling/cloning masked VPInstructions. Account for masked VPInstruction when verifying the operands in the constructor. Fixes a crash when trying to unroll VPlans for predicated early exits.	2026-02-27 22:14:45 +00:00
Florian Hahn	d7e037c838	Revert "[VPlan] Remove manual region removal when simplifying for VF and UF. (#181252 )" This reverts commit 9c53215d213189d1f62e8f6ee7ba73a089ac2269. Appears to cause crashes with ordered reductions, revert while I investigate	2026-02-27 21:29:41 +00:00
Akash Dutta	cf28f23f10	[SLP] Reject duplicate shift amounts in matchesShlZExt reorder path (#183627 ) In the reordered RHS path of matchesShlZExt, the code never checked that each shift amount (0, Stride, 2×Stride, …) appears at most once. When the same shift appeared in multiple lanes, it still filled Order, producing a non-permutation (e.g. Order = [0,0,0,1]). That led to bad shuffle masks and miscompilation (e.g. shuffles with poison). The patch adds an explicit duplicate check: before setting Order[Idx] = Pos, it ensures Pos has not been seen before, using a SmallBitVector SeenPositions(VF). If a position is seen twice, the function returns false and the optimization is not applied.	2026-02-27 13:00:58 -06:00
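The duplicate check described in the entry above can be illustrated with a small model. This is only a sketch under stated assumptions, not the SLP vectorizer's code: `build_order` is a hypothetical helper, the shift amounts are given as plain integers, and a Python list stands in for the `SmallBitVector SeenPositions(VF)` mentioned in the message.

```python
def build_order(shift_amounts, stride):
    """Return the lane order implied by the shift amounts, or None.

    Hypothetical model: lane `idx` shifts by `shift_amounts[idx]`, which must
    be a multiple of `stride` and name a distinct position in [0, VF).
    """
    if stride <= 0:
        return None
    vf = len(shift_amounts)
    seen = [False] * vf  # stands in for SmallBitVector SeenPositions(VF)
    order = [0] * vf
    for idx, amount in enumerate(shift_amounts):
        if amount < 0 or amount % stride != 0:
            return None
        pos = amount // stride
        # Reject out-of-range positions and positions already claimed by
        # another lane; without this, Order would not be a permutation.
        if pos >= vf or seen[pos]:
            return None
        seen[pos] = True
        order[idx] = pos
    return order
```

With a stride of 8, the shift amounts `[8, 0, 24, 16]` give the permutation `[1, 0, 3, 2]`, while `[0, 0, 0, 8]` (which would otherwise produce `[0, 0, 0, 1]`) is rejected.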
Florian Hahn	9c53215d21	[VPlan] Remove manual region removal when simplifying for VF and UF. (#181252 ) Replace manual region dissolution code in simplifyBranchConditionForVFAndUF with using general removeBranchOnConst. simplifyBranchConditionForVFAndUF now just creates a (BranchOnCond true) or updates BranchOnTwoConds. The loop then gets automatically removed by running removeBranchOnConst. This removes a bunch of special logic to handle header phi replacements and CFG updates. With the new code, there's no restriction on what kind of header phi recipes the loop contains. Note that VPEVLBasedIVRecipe needs to be marked as readnone. This is technically unrelated, but I could not find an independent test that would be impacted. The code to deal with epilogue resume values now needs updating, because we may simplify a reduction directly to the start value. PR: https://github.com/llvm/llvm-project/pull/181252	2026-02-27 16:49:54 +00:00
Luke Lau	d8671280d4	[VPlan] Add nuw to unrolled canonical IVs (#183716 ) After #183080, the canonical IV (not the increment!) can't overflow. So now canonical IVs that are unrolled will have steps that don't overflow, so we can add the nuw flag. This allows us to tighten the VPlanVerifier isKnownMonotonic check by restricting it to adds with nuw.	2026-02-27 11:46:29 +00:00
Luke Lau	c5c0fe663c	[VPlan] Remove non-power-of-2 scalable VF comment. NFC (#183719 ) No longer holds after #183080	2026-02-27 10:45:17 +00:00
Luke Lau	d43213fe80	Revert "[VPlan] Don't drop NUW flag on tail folded canonical IVs (#183301 )" (#183698 ) This reverts commit b0b3e3e1c7f6387eabc2ef9ff1fea311e63a4299. After thinking about this for a bit, I don't think this is correct. vscale being a power-of-2 only guarantees the canonical IV increment overflows to zero, but not overflows in general.	2026-02-27 09:13:33 +00:00
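The distinction drawn in the revert above (overflowing to zero versus not overflowing at all) can be checked with a toy counter. This is a sketch only: it assumes an 8-bit unsigned counter starting at 0, and `first_wrapped_value` is a hypothetical helper, not anything in LLVM.

```python
def first_wrapped_value(step, bits=8):
    """Value of an unsigned `bits`-wide counter right after its first wrap.

    Toy model: the counter starts at 0 and is repeatedly incremented by `step`.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    modulus = 1 << bits
    iv = 0
    # Advance until the next increment would no longer fit in `bits` bits.
    while iv + step < modulus:
        iv += step
    # The increment that wraps: keep only the low `bits` bits.
    return (iv + step) % modulus
```

A power-of-two step such as 64 wraps to exactly 0, whereas a step of 48 wraps to 32. In both cases the increment did wrap, so a power-of-two step alone does not justify a no-unsigned-wrap flag.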
Sander de Smalen	a1f83ba1b6	[LV] NFCI: Move extend optimization to transformToPartialReduction. (#182860 ) The reason for doing this in `transformToPartialReduction` is so that we can create the VPExpressions directly when transforming reductions into partial reductions (to be done in a follow-up PR). I also intend to see if we can merge the in-loop reductions with partial reductions, so that there will be no need for the separate `convertToAbstractRecipes` VPlan Transform pass.	2026-02-27 08:38:13 +00:00
Simon Pilgrim	92704064e5	[VectorCombine][X86] Ensure we recognise free sign extends of vector comparison results (#183575 ) Unless we're working with AVX512 mask predicate types, sign extending a vXi1 comparison result back to the width of the comparison source types is free. VectorCombine::foldShuffleOfCastops - pass the original CastInst in the getCastInstrCost calls to track the source comparison instruction. Fixes #165813	2026-02-27 07:55:39 +00:00
Luke Lau	b0b3e3e1c7	[VPlan] Don't drop NUW flag on tail folded canonical IVs (#183301 ) After #183080 vscale can no longer be a non-power of 2, which means the canonical IV can't overflow with tail folding w/ scalable vectors anymore. Therefore we don't need to drop the NUW flag. IVUpdateMayOverflow is left to be removed in a separate PR since it removes further runtime checks.	2026-02-27 07:19:49 +00:00
Florian Hahn	d5e501725e	Reapply "[VPlan] Use VPInstructionWithType for Load in VPlan0 (NFC)" This reverts commit 97835516393311d681d1ff6bec67e1093f94890e. Unit tests have been updated	2026-02-26 22:39:33 +00:00
Aiden Grossman	9783551639	Revert "[VPlan] Use VPInstructionWithType for Load in VPlan0 (NFC)" This reverts commit 2576ee1fd93fb87699650734ffafdb8092062d59. This was causing test failures when running check-llvm-unit.	2026-02-26 22:35:10 +00:00
Florian Hahn	2576ee1fd9	[VPlan] Use VPInstructionWithType for Load in VPlan0 (NFC) VPInstructionWithType directly allows modeling the loaded type.	2026-02-26 22:08:09 +00:00
Ramkumar Ramachandra	bd5f9384d8	[VPlan] Extend interleave-group-narrowing to WidenCast (#183204 ) WidenCast is very similar to Widen recipes. Fixes #128062.	2026-02-26 14:50:25 +00:00
Florian Hahn	c5d6feb315	[VPlan] Limit interleave group narrowing to consecutive wide loads. Tighten check in canNarrowLoad to require consecutive wide loads; we cannot properly narrow gathers at the moment. Fixes https://github.com/llvm/llvm-project/issues/183345.	2026-02-26 12:52:31 +00:00
Florian Hahn	32b8b9ba1e	[VPlan] Simplify ExitingIVValue and use for tail-folded IVs. (#182507 ) Now that we have ExitingIVValue, we can also use it for tail-folded loops; the only difference is that we have to compute the end value with the original trip count instead of the vector trip count. This allows removing the induction increment operand only used when tail-folding. PR: https://github.com/llvm/llvm-project/pull/182507	2026-02-26 11:48:04 +00:00
Benjamin Maxwell	3c566a698a	[LV] Fix miscompile with conditional scalar assignment + tail folding (#182492 ) Previously, we could miscompile when vectorizing conditional scalar assignments with forced tail folding, as the backedge select could be based on the header mask, not the assignment conditional. This resulted in a number of failures in the LLVM test suite when building with `-O3 -march=armv8-a+sve -mllvm -prefer-predicate-over-epilogue=predicate-dont-vectorize`. The patch reworks `handleFindLastReductions()` to correctly handle tail folding.	2026-02-26 09:00:16 +00:00
Luke Lau	1e560c181a	[LV] Remove CheckNeededWithTailFolding from addMinimumIterationCheck. NFC (#183066 ) The IV can no longer overflow with tail folding after #183080.	2026-02-25 18:08:16 +00:00
Luke Lau	a8f5f4a9fc	[VPlan] Assert vplan-verify-each result and fix LastActiveLane verification (#182254 ) Currently if -vplan-verify-each is enabled and a pass fails the verifier, it will output the failure to stderr but will still finish with a zero exit code. This adds an assert that the verification succeeds so that e.g. lit will pick up verifier failures in the in-tree tests with an EXPENSIVE_CHECKS build. Currently the LastActiveLane verification fails in several tests, so this also includes a fix to handle more prefix masks. All of the prefix masks that the verifier encounters are of the form `icmp ult/ule monotonically-increasing-sequence, uniform`, which always generate a prefix mask. Tested that llvm-test-suite + SPEC CPU 2017 now pass with -vplan-verify-each enabled for RISC-V.	2026-02-26 01:49:39 +08:00
Florian Hahn	bf4705c05b	[VPlan] Supported conditionally executed single early exits. (#182395 ) Add support for a single early exit that is executed conditionally. To make sure the mask from any non-exiting control flow is combined with the early exit condition, introduce a MaskedCond VPInstruction, which is inserted as user of the early-exit condition, at the point of the early-exit branch. The VPInstruction will get masked automatically if needed by the predicator, ensuring that we properly account for it when checking whether the early exit has been taken. Note that this does not allow for instructions that require predication after the early exit. This requires additional work in progress: https://github.com/llvm/llvm-project/pull/172454 As an alternative to MaskedCond, we could also predicate before handling early exiting blocks: https://github.com/llvm/llvm-project/pull/181830 PR: https://github.com/llvm/llvm-project/pull/182395	2026-02-25 14:28:04 +00:00
Paul Walker	ab360b1e7e	[LLVM][TTI] Remove the isVScaleKnownToBeAPowerOfTwo hook. (#183292 ) After https://github.com/llvm/llvm-project/pull/183080 this is no longer a configurable property. NOTE: No test changes expected beyond llvm/test/Transforms/LoopVectorize/scalable-predication.ll which has been removed because it only existed to verify the now unsupported functionality.	2026-02-25 14:09:52 +00:00
Florian Hahn	804572136e	[VPlan] Allow recursive narrowing in interleave group narrowing. (#167310 ) This allows canNarrowOps to recursively check if operands can be narrowed, enabling narrowing of longer chains of operations that feed interleave groups. Depends on https://github.com/llvm/llvm-project/pull/167309. PR: https://github.com/llvm/llvm-project/pull/167310	2026-02-24 21:30:00 +00:00
Benjamin Maxwell	3b5a05d0b2	Revert "[VPlan] Strengthen materializeFactors with assert (NFC) (#181665 )" (#183014 ) This PR did not solve the TODO as intended. Reverting so the TODO is not lost. This reverts commit aab9412a69a07787e9ec98b25709d709b7b537a6.	2026-02-24 18:03:01 +00:00
Ramkumar Ramachandra	e147b3a05e	[VPlan] Fix alias logic in canHoistOrSinkWithNoAliasCheck (#179504 ) The correct way to check if two memory locations may alias is outlined in ScopedNoAliasAAResult::alias: extract this into a helper, to fix the current logic.	2026-02-24 16:14:44 +00:00
Florian Hahn	72c0a074db	[VPlan] Move out canNarrowOps (NFC). (#167309 ) Move definition of canNarrowOps out to static function, to make it easier to extend + generalize PR: https://github.com/llvm/llvm-project/pull/167309	2026-02-24 14:20:47 +00:00
Alexey Bataev	c08079d8e7	[SLP]Add single-use check for the bitcasted reduction If the reduced value, to be bitcasted, is used multiple times, it will require emission of the extractelement instruction. Such nodes should not be bitcasted, should be vectorized as vector instructions. Fixes https://github.com/llvm/llvm-project/pull/181940#issuecomment-3950734168	2026-02-24 05:27:38 -08:00
Florian Hahn	6b352aa8ea	Revert "[VPlan] Add simple driver option to run some individual transforms. (#178522 )" This reverts commit 3df1c6f88bfbbd76d9256c55358bb75e02e33779. Causes build-failures without assertions https://lab.llvm.org/buildbot/#/builders/159/builds/41683	2026-02-23 22:55:42 +00:00
Florian Hahn	3df1c6f88b	[VPlan] Add simple driver option to run some individual transforms. (#178522 ) Add an alternative to test VPlan in more isolation via a new `vplan-test-transform` option, which builds VPlan0 for each loop in the input IR and then can invoke a set of transforms on it. In order to allow different recipe types to be created, a new widen-from-metadata transform is added, which transforms VPInstructions to different recipes, based on custom !vplan.widen metadata. Currently this supports creating widen & replicate recipes, but can easily be extended in the future. Currently the handling is intentionally bare-bones, to be extended gradually as needed. PR: https://github.com/llvm/llvm-project/pull/178522	2026-02-23 22:49:00 +00:00
Luke Lau	0e9880cc04	[VPlan] Remove verifyLate from VPlanVerifier. NFC (#182799 ) We can instead just check if the VPlan has been unrolled.	2026-02-23 23:06:37 +08:00
Valeriy Savchenko	966a4618b8	[VectorCombine] Support ashr sign-bit extraction (#181998 ) This change extends a sign-bit reduction fold introduced earlier. Prior to it, we only supported LSHR instructions for sign-bits extraction. Similar logic can be applied to ASHR and the fold can be generalized. ## Alive2 proofs \| Reduction \| == 0 \| == -1 / -N \| slt 0 \| sgt -1 / -N \| \|-----------\|------\|------------\|-------\|-------------\| \| or \| [proof](https://alive2.llvm.org/ce/z/DaSMPt) \| [proof](https://alive2.llvm.org/ce/z/wzR48R) \| [proof](https://alive2.llvm.org/ce/z/rfyr_7) \| [proof](https://alive2.llvm.org/ce/z/MTFFe5) \| \| and \| [proof](https://alive2.llvm.org/ce/z/PmmpbX) \| [proof](https://alive2.llvm.org/ce/z/7_9hSn) \| [proof](https://alive2.llvm.org/ce/z/wudWY3) \| [proof](https://alive2.llvm.org/ce/z/QZ33KB) \| \| umax \| [proof](https://alive2.llvm.org/ce/z/gQGnDc) \| [proof](https://alive2.llvm.org/ce/z/dMsoQF) \| [proof](https://alive2.llvm.org/ce/z/QwFbae) \| [proof](https://alive2.llvm.org/ce/z/3dbmy6) \| \| umin \| [proof](https://alive2.llvm.org/ce/z/Z2pZUQ) \| [proof](https://alive2.llvm.org/ce/z/6FQgGR) \| [proof](https://alive2.llvm.org/ce/z/95-em6) \| [proof](https://alive2.llvm.org/ce/z/PW7c-m) \| \| add \| [proof](https://alive2.llvm.org/ce/z/FVhuhj) \| [proof](https://alive2.llvm.org/ce/z/h1B9jQ) \| [proof](https://alive2.llvm.org/ce/z/DmiYRr) \| [proof](https://alive2.llvm.org/ce/z/P4WDN5) \|	2026-02-23 13:01:39 +00:00
Luke Lau	4bd0e49f8a	[VPlan] Remove verifyEVLRecipe (#182798 ) In #182254 we want to start aborting compilation when the verifier fails between passes, but currently we run into various EVL related failures. The EVL is used in quite a few more places than when the verification was originally added, all of which need to be handled by the verifier. I think this is also exacerbated by the fact that many recipes nowadays are converted to concrete recipes later in the pipeline which duplicates the number of patterns we need to match. The EVL transform itself has also changed much since its original implementation, i.e. non-trapping recipes don't use EVL (#127180) and VP recipes are generated via pattern matching instead of unconditionally (#155394), so I'm not sure if the verification is as relevant today. Rather than try to add more patterns this PR removes the verification to reduce the maintenance cost. Split off from #182254	2026-02-23 10:09:00 +00:00
Luke Lau	ff88b83fed	[VPlan] Handle extracts for middle blocks also used by early exiting blocks. NFC (#181789 ) Currently createExtractsForLiveOuts only handles creating extracts when the middle block has one predecessor, but if an early exit exits to the same block as the latch then it might have multiple predecessors. This handles the latter case to avoid the need to handle it in VPlanTransforms::handleUncountableEarlyExits. Addresses the comment in https://github.com/llvm/llvm-project/pull/174864#discussion_r2794153217	2026-02-23 04:03:49 +00:00
Florian Hahn	690312e088	[LV] Add additional find-last reduction tests with sink-able exprs.	2026-02-22 21:38:54 +00:00
Jonas Paulsson	d3081aafc4	[SystemZ, LoopVectorizer] Enable vectorization of epilogue loops. (#172925 ) This enables vectorization of epilogue loops produced by LoopVectorizer on SystemZ. LoopVectorizationCostModel::isEpilogueVectorizationProfitable() and TTI.preferEpilogueVectorization() have been refactored slightly so that targets can override preferEpilogueVectorization(ElementCount Iters) and directly control this, whereas before this depended on TTI.getMaxInterleaveFactor() as well. The Iters passed to preferEpilogueVectorization() reflects the total number of scalar iterations performed in the vectorized loop (including interleaving). The default implementation of preferEpilogueVectorization() now subsumes the old check against getMaxInterleaveFactor(). This patch should be NFC for other targets.	2026-02-22 10:59:09 -06:00
Florian Hahn	4ce4987381	[VPlan] Optimize FindLast of FindIV w/o sentinel. (#172569 ) For FindLast reduction selecting an IV, we can avoid the horizontal AnyOf in the vector loop, by introducing an independent boolean reduction to track if the condition was ever true in the loop. If it was never true in the loop, we select the start value, otherwise we select the min/max of the FindIV reduction, as required by the predicate. The main advantage of this approach is that we have 2 independent reductions, that do not require a horizontal AnyOf reduction in the loop. Currently this requires a non-wrapping IV, but this can be relaxed in the future by selecting a canonical IV, which is then transformed to the specific derived IV for the reduction after the loop. Depends on https://github.com/llvm/llvm-project/pull/177870. PR: https://github.com/llvm/llvm-project/pull/172569	2026-02-20 21:48:35 +00:00
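The two-reduction scheme in the entry above can be modelled as follows. This is a rough sketch, not VPlan code: it assumes an increasing, non-wrapping IV equal to the element index, models vector lanes as index `i % vf`, and `find_last_iv` is a hypothetical name.

```python
def find_last_iv(values, pred, start, vf=4):
    """Model of `for i: if pred(values[i]): r = i`, with `r` initially `start`.

    Two independent per-lane reductions are kept in the loop: a max over the
    selected IV values and a boolean "was the condition ever true".
    """
    lane_iv = [0] * vf        # max-reduction over selected IV values
    lane_any = [False] * vf   # boolean reduction, independent of lane_iv
    for i, v in enumerate(values):
        lane = i % vf
        if pred(v):
            lane_iv[lane] = max(lane_iv[lane], i)
            lane_any[lane] = True
    # The horizontal reductions happen once, after the loop.
    return max(lane_iv) if any(lane_any) else start
```

Because the boolean reduction records whether any lane matched, a match at index 0 is distinguishable from no match at all, so no sentinel value is needed for the IV reduction.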
Alexey Bataev	95a960daa0	[SLP]Do not convert inversed cmp nodes, if they reordered/reused If the cmp node with inversed compares must be reordered/shuffled with the reuses, disable transformation for such nodes for now, they require some special processing. Fixes https://github.com/llvm/llvm-project/pull/181580#issuecomment-3933026221	2026-02-20 06:04:51 -08:00
Hari Limaye	ac23bf6f2a	[LoopIdiomVectorize] Bail when vectorization is disabled (#181142 ) Bail on vectorizing a loop in LoopIdiomVectorize when the loop carries hints that indicate vectorization is disabled. This means that LoopIdiomVectorize will now respect vectorize(disable) loop hints.	2026-02-20 13:01:51 +00:00
Luke Lau	e55b6c0919	[LV] Allow tail folding with IVs with outside users (#182322 ) #149042 added last-active-lane and removed the restriction that we couldn't tail fold loops that had outside users (in AllowedExit). However we still have a restriction that IVs can't have outside users. This was added separately to the AllowedExit restriction in #81609, but it looks like #149042 didn't remove it. AFAICT we currently extract the correct lane for IVs, so this PR relaxes the restriction. This helps a good few loops get tail folded in llvm-test-suite. -force-tail-folding-style=none was added to pr5881-scev-expansion.ll to preserve the original scev expansion, since otherwise we end up with a cttz.elts(false, false, true, true) that blocks SCEV analysis. We should probably teach ConstantFolding to fold it.	2026-02-20 04:32:59 +00:00
Alexey Bataev	29d4fea59b	[SLP]Handle mixed select-to-bitcasts and general reductions If the reduction tree represents mixed select-to-bitcasts and general reductions, need to handle them correctly to avoid a compiler crash Fixes https://github.com/llvm/llvm-project/pull/181940#issuecomment-3929220929	2026-02-19 13:38:34 -08:00
Alexey Bataev	38d804725f	[SLP]Do not mark for transforming to buildvector inversed compares Inversed compares must remain vector nodes, they should be converted to gathers to generate correct code. Fixes issue reported in https://github.com/llvm/llvm-project/pull/181580#issuecomment-3926951332	2026-02-19 09:37:50 -08:00
Sander de Smalen	46bfd69343	[LV] NFCI: Add RecurKind to VPPartialReductionChain (#181705 ) This avoids having to pass around the RecurKind or re-figure it out from the VPReductionPHI node. This is useful in a follow-up PR, where we need to distinguish between a `Sub` and `AddWithSub` recurrence, which can't be deduced from the `ReductionBinOp` field.	2026-02-19 13:35:10 +00:00
Luke Lau	6a5375fbce	[VPlan] Plumb recurrence FMFs through VPReductionPHIRecipe via VPIRFlags. NFC (#181694 ) In order to be able to create selects for reduction phis through tail folding in foldTailByMasking (#176143), make VPReductionPHIRecipe an instance of VPIRFlags and plumb the FMFs from the original RdxDesc. This allows us to remove more uses of the RecurrenceDescriptor in addReductionResultComputation, which should help untie it from LoopVectorizationLegality.	2026-02-19 11:23:47 +00:00
Florian Hahn	4042975b63	[LV] Support argmin/argmax with strict predicates. (#170223 ) Extend handleMultiUseReductions to support strict predicates (>, <), matching the first index instead of the last for non-strict predicates. Builds on top of https://github.com/llvm/llvm-project/pull/141431. FindLast reductions with strict predicates are adjusted to compute the correct result as follows: 1. Find the first canonical indices corresponding to partial min/max values, using loop reductions. 2. Find which of the partial min/max values are equal to the overall min/max value. 3. Select among the canonical indices those corresponding to the overall min/max value. 4. Find the first canonical index of overall min/max and scale it back to the original IV using VPDerivedIVRecipe. 5. If the overall min/max equals the starting min/max, the condition in the loop was always false, due to being strict; return the original start value in that case.	2026-02-19 10:52:27 +00:00
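Steps 1 to 5 in the entry above can be sketched for the argmax case. This is an illustrative model only, not the LoopVectorize implementation: lanes are modelled as index `i % vf`, the parameters `iv_start` and `iv_step` are assumptions standing in for the scaling done by VPDerivedIVRecipe, and `argmax_strict` is a hypothetical name.

```python
def argmax_strict(values, start_max, start_idx, vf=4, iv_start=0, iv_step=1):
    """Model of `if (v > max) { max = v; idx = iv; }` split across `vf` lanes."""
    lane_max = [start_max] * vf
    lane_idx = [None] * vf  # step 1: first canonical index of each partial max
    for i, v in enumerate(values):
        lane = i % vf
        if v > lane_max[lane]:  # strict predicate keeps the first occurrence
            lane_max[lane] = v
            lane_idx[lane] = i
    overall = max(lane_max)
    # Step 5: no element beat the start value, so the condition was never true.
    if overall == start_max:
        return start_max, start_idx
    # Steps 2 and 3: keep the indices of lanes whose partial max is the overall max.
    candidates = [lane_idx[l] for l in range(vf) if lane_max[l] == overall]
    # Step 4: first canonical index, scaled back to the original IV.
    return overall, iv_start + min(candidates) * iv_step
```

For `[3, 7, 2, 7, 7, 1]` with a start maximum of 0, three lanes hold the overall maximum 7 (first seen at canonical indices 4, 1 and 3), and the smallest of these, 1, is returned, matching the scalar loop.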
Sander de Smalen	114e20805f	[LV] Fix sub-reduction PHI in vectorized epilogue (#182072 ) When the vectorized epilogue loop uses partial reductions, the PHI node in the loop must start at 0 (because for partial sub-reductions the sub is done in the middle block) and the compute-reduction-result must subtract from the partial result (as calculated in the middle block of the main vector loop), instead of subtracting from the original init value. This fixes the issue as reported on #178919 by @aeubanks.	2026-02-19 10:15:32 +00:00
Ricardo Jesus	c255e3df64	[LoopIdiomVectorize] Test all needles when vectorising find_first_of loops. (#179298 ) Fixes #179187 - as described in the issue, the current FindFirstByte transformation in LoopIdiomVectorizePass will incorrectly early-exit as soon as a needle matching a search element is found, even if a previous search element could match a subsequent needle. This patch ensures all needles are tested before we return a matching search element.	2026-02-19 09:07:13 +00:00
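The fix described in the entry above can be illustrated with a blocked model of `find_first_of`. This is a sketch only, not the LoopIdiomVectorize transformation: it assumes the search elements and the needles are both processed in chunks of `width` elements, and `find_first_of_blocked` is a hypothetical name.

```python
def find_first_of_blocked(haystack, needles, width=4):
    """Index of the first haystack element that occurs in needles, or -1."""
    for start in range(0, len(haystack), width):
        block = haystack[start:start + width]
        matched = [False] * len(block)
        # Test every chunk of needles before deciding. Returning as soon as
        # one chunk matches could report a later search element even though
        # an earlier one matches a needle in a later chunk.
        for nstart in range(0, len(needles), width):
            chunk = needles[nstart:nstart + width]
            for lane, elem in enumerate(block):
                if elem in chunk:
                    matched[lane] = True
        if any(matched):
            return start + matched.index(True)
    return -1
```

For the haystack `"ab"` and needles `"bxxxa"`, the first needle chunk `"bxxx"` matches only `b` (index 1), but the later chunk `"a"` matches index 0, which is the correct result.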
Alexey Bataev	c6425aa9ae	[SLP]Support reduced or selects of bitmask as cmp bitcast Converts reduced or(select %cmp, bitmask, 0) to zext(bitcast %vector_cmp to i<num_reduced_values>) to in Reviewers: RKSimon, hiraditya Pull Request: https://github.com/llvm/llvm-project/pull/181940	2026-02-18 18:01:42 -05:00

7228 Commits