llvm-project

Author	SHA1	Message	Date
Hans Wennborg	ed004cf42b	Revert "[VPlan] Only use isAddressSCEVForCost in legacy getAddressAccSCEV (NFCI)" This caused assertion failures: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:7265: VectorizationFactor llvm::LoopVectorizationPlanner::computeBestVF(): Assertion `(BestFactor.Width == LegacyVF.Width || BestPlan.hasEarlyExit() || !Legal->getLAI()->getSymbolicStrides().empty() || UsesEVLGatherScatter || planContainsAdditionalSimplifications( getPlanFor(BestFactor.Width), CostCtx, OrigLoop, BestFactor.Width) || planContainsAdditionalSimplifications( getPlanFor(LegacyVF.Width), CostCtx, OrigLoop, LegacyVF.Width)) && " VPlan cost model and legacy cost model disagreed"' failed. See comment on https://github.com/llvm/llvm-project/pull/171204 This reverts commit 01d34eb38fa0587cb95eedd3bada8257abc122f8.	2026-01-09 15:38:32 +01:00
Florian Hahn	4998280c3f	[LV] Find reduction result VPInstruction from backedge value (NFC). Split off from https://github.com/llvm/llvm-project/pull/174026. Make the lookup of the reduction phi recipe/compute-reduction-result VPInstruction independent of the latter having the reduction phi as operand.	2026-01-07 21:12:07 +00:00
Florian Hahn	31b93d6e38	[VPlan] Add specialized VPValue subclasses for different types (NFC) (#172758 ) This patch adds VPValue sub-classes for the different cases we currently have: * VPIRValue: A live-in VPValue that wraps an underlying IR value * VPSymbolicValue: A symbolic VPValue not tied to an underlying value, e.g. the vector trip count or VF VPValues * VPRecipeValue: A VPValue defined by a VPDef/VPRecipeBase. This has multiple benefits: * clearer constructors for each kind of VPValue * limited scope: for example allows moving VPDef member to VPRecipeValue, reducing size of other VPValues. * stricter type checking for member variables (e.g. using VPLiveIn in the Value -> live-in map in VPlan, or using VPSymbolicValue for symbolic member VPValues) There probably are additional opportunities for cleanups as follow-ups. PR: https://github.com/llvm/llvm-project/pull/172758	2026-01-07 20:29:05 +00:00
Shih-Po Hung	39d6f10e33	[LV] Conservatively predicate SDiv/SRem (#170818 ) Conservatively predicate sdiv/srem: - RHS may carry poison in masked‑off lanes. - RHS could be −1 while LHS has masked‑off lanes (risking INT_MIN/−1 overflow). We’ll relax this once we can prove non‑wrap/non‑poison conditions. Fixes #170775.	2026-01-07 04:25:38 +00:00
Florian Hahn	01d34eb38f	[VPlan] Only use isAddressSCEVForCost in legacy getAddressAccSCEV (NFCI) Follow-up to https://github.com/llvm/llvm-project/pull/171204 and 1f331e453f to only rely on isAddressSCEVForCost in legacy getAddressAccSCEV, completely aligning the decisions of the VPlan and legacy cost models.	2026-01-06 19:18:13 +00:00
Florian Hahn	16830b2164	[VPlan] Remove VPWidenSelectRecipe, use VPWidenRecipe instead (NFCI). (#174234 ) All extra state has been removed from VPWidenSelectRecipe at this point. There's no benefit of having a separate recipe and Select can easily be handled by the existing VPWidenRecipe. PR: https://github.com/llvm/llvm-project/pull/174234	2026-01-05 22:33:37 +00:00
Florian Hahn	990883a690	[VPlan] Handle Alloca in VPReplicateRecipe::computeCost. (NFCI) Handle Alloca in the VPlan-based cost model. This also updates the cost in the legacy cost model to clarify that we always compute the scalar cost.	2026-01-03 17:40:51 +00:00
Florian Hahn	2d60f87111	[VPlan] Only use legacy cost for instructions only used by exit conds. (#174029 ) Currently we need to precompute costs for exit conditions, to match the legacy cost, as they will get replaced by a compare against the canonical IV (or others, like active-lane-mask or EVL based) and the original compare will get removed. This is not true for instructions with users other than the exit condition. Those will remain, and we can just use the VPlan-based cost model to get more accurate results. This improves results in some cases, like @test_value_in_exit_compare_chain_used_outside because the IV increment user outside the loop is replaced by computing the final value outside the loop. It also fixes a crash introduced by f196b1d66ff (#146525). PR: https://github.com/llvm/llvm-project/pull/174029	2025-12-31 13:34:54 +00:00
Florian Hahn	524b1788c4	[VPlan] Add BranchOnTwoConds, use for early exit plans. (#172750) This PR introduces a new BranchOnTwoConds VPInstruction that takes 2 boolean operands and must be placed in a block with 3 successors. If condition I is true, it branches to successor I, otherwise it falls through to check the next condition. If both conditions are false, it branches to the third successor. This new branch recipe is used for early-exit loops, to simplify the representation in VPlan initially, by avoiding the need for splitting the middle block early on, in a way that preserves the single-exit block property of regions. All exits still go through the latch block, but they can go to more than 2 successors. This idea was part of one of the original proposals for how to model early exits in VPlan, but at that point in time, there was no good way to handle this during code-gen, and we went with the early split-middle block approach initially. Now that we dissolve regions before ::execute, the new recipe can be lowered nicely after regions have been removed, to a set of VPBBs and BranchOnCond recipes. The initial lowering preserves the original structure with the split middle blocks. Follow-ups will improve the lowering to avoid this splitting, providing performance gains. PR: https://github.com/llvm/llvm-project/pull/172750	2025-12-29 19:39:38 +00:00
Florian Hahn	d777b1a230	[VPlan] Skip phi recipes in tryToBuildVPlan (NFC). No phi recipes are being transformed in the main loop any longer, so skip phi recipes. This also makes it possible to clarify which recipes need skipping explicitly: those that have already been transformed. Follow-up to post-commit comment in https://github.com/llvm/llvm-project/pull/168291.	2025-12-27 17:02:48 +00:00
Florian Hahn	c2a8739cd1	[VPlan] Split off VPReductionRecipe creation for in-loop reductions (NFC) (#168784 ) This patch splits off VPReductionRecipe creation for in-loop reductions to a separate transform from adjustInLoopReductions, which has been renamed. The new transform has been updated to work directly on VPInstructions, and gets applied after header phis have been processed, once on VPlan0. Builds on top of https://github.com/llvm/llvm-project/pull/168291 and https://github.com/llvm/llvm-project/pull/166099 which should be reviewed first. PR: https://github.com/llvm/llvm-project/pull/168784	2025-12-25 14:02:58 +00:00
Florian Hahn	c43ccefc9f	[VPlan] Use PSE to construct SCEVs in getSCEVExprForVPValue (NFCI). getSCEVExprForVPValue is used to create SCEVs for expressions from the original loop, which may be predicated. Use PSE to construct predicated SCEVs if possible. This matches the legacy LV code behavior. Currently should be NFC, but will enable migrating more SCEV/cost-based computations to VPlan. The patch requires exposing a new getPredicatedSCEV helper to PredicatedScalarEvolution which just takes a SCEV, to avoid needing to go through IR values, which isn't an option for getSCEVExprForVPValue.	2025-12-21 22:39:49 +00:00
Florian Hahn	1f78f6a2d6	[LV] Check Addr in getAddressAccessSCEV in terms of SCEV expressions. (#171204) getAddressAccessSCEV previously had some restrictive checks that limited pointer SCEV expressions passed to TTI to GEPs with operands that must either be invariant or marked as inductions. As a consequence, the check rejected things like `GEP %base, (%iv + 1)`, while the SCEV for the GEP should be as easily analyzable as for `GEP %base, %v`, with the only difference being the start of the AddRec adjusted by 1. This patch changes the code to use a SCEV-based check, limiting the address SCEV to be loop invariant, an affine AddRec (i.e. induction), or an add expression of such operands or a sign-extended AddRec. This catches all existing cases getAddressAccessSCEV caught, plus additional ones like the cases mentioned above. This means we pass address SCEVs in more cases, giving the backends a better chance to make informed decisions. It also unifies the decision when to use an address SCEV between the legacy and VPlan-based cost model. An illustrative example showing the impact is the gather-cost.ll tests. Previously they were considered not profitable to vectorize because we failed to determine that %gep.src_data = getelementptr inbounds [1536 x float], ptr @src_data, i64 0, i64 %mul has a relatively small constant stride. There may be some rough edges in the cost models, where not passing pointer SCEVs hid some incorrect modeling, but those issues should be fixed in the target cost models if they surface. PR: https://github.com/llvm/llvm-project/pull/171204	2025-12-19 22:05:27 +00:00
Mel Chen	f196b1d66f	[VPlan] Extract reverse operation for reverse accesses (#146525 ) This patch introduces VPInstruction::Reverse and extracts the reverse operations of loaded/stored values from reverse memory accesses. This extraction facilitates future support for permutation elimination within VPlan.	2025-12-18 14:57:48 +00:00
Florian Hahn	bab0dc4d48	Reapply "[LV] Mark checks as never succeeding for high cost cutoff." Reapply 8a115b6934a90441 with an update to tests handling remarks. The patch now directly emits a clear remark when we bail out due to the memory check threshold. Original message: When GeneratedRTChecks::create bails out due to exceeding the cost threshold, no runtime checks are generated and we must not proceed assuming checks have been generated. Mark the checks as never succeeding, to make sure we don't try to vectorize assuming the runtime checks hold. This fixes a case where we previously incorrectly vectorized assuming runtime checks had been generated when forcing vectorization via metadata. Fixes the mis-compile mentioned in https://github.com/llvm/llvm-project/pull/166247#issuecomment-3631471588	2025-12-17 20:21:49 +00:00
Ramkumar Ramachandra	1c6e5b2d04	[LV] Improve code using VPlan::get{ConstantInt,True} (NFC) (#172471 )	2025-12-16 13:03:43 +00:00
Luke Lau	67d0e21a62	Reapply "[VPlan] Remove legacy costing inside VPBlendRecipe::computeCost (#171846 )" (#172261 ) This reapplies #171846 with a test case and fix for a legacy cost-model mismatch assertion. In the previous version of the patch, we only considered the plan to contain simplifications when it had a VPBlendRecipe and VF.isScalar() was true. However for some VPlans we may have a blend with only the first lane used: BLEND ir<%phi> = ir<%foo.res> ir<%bar.res>/ir<%c> CLONE ir<%gep> = getelementptr ir<%p>, ir<%phi> vp<%5> = vector-pointer ir<%gep> And in the legacy cost model we cost a blend as a phi if it's uniform: // If we know that this instruction will remain uniform, check the cost of // the scalar version. if (isUniformAfterVectorization(I, VF)) VF = ElementCount::getFixed(1); So this replaces the VF.isScalar() check with vputils::onlyFirstLaneUsed, which matches how the VPlan cost model mirrored the legacy model beforehand. A VPInstruction::Select will also emit a scalar select for a vector VF if only the first lane is used, so this also updates VPBlendRecipe::computeCost to reflect that too.	2025-12-16 06:30:54 +00:00
Florian Hahn	83eea87a36	[VPlan] Create header phis once, after constructing VPlan0 (NFC). (#168291 ) Together with https://github.com/llvm/llvm-project/pull/168289 & https://github.com/llvm/llvm-project/pull/166099 we can construct header phis once up front, after creating VPlan0, as the induction/reduction/first-order-recurrence classification applies across all VFs. Depends on https://github.com/llvm/llvm-project/pull/168289 & https://github.com/llvm/llvm-project/pull/166099 PR: https://github.com/llvm/llvm-project/pull/168291	2025-12-15 22:12:10 +00:00
Florian Hahn	dbb4f5c2dd	[VPlan] Set VF scale factor in tryToCreatePartialReduction (NFCI). Split off unrelated change from approved https://github.com/llvm/llvm-project/pull/168291/ to land separately as suggested.	2025-12-15 21:18:07 +00:00
Florian Hahn	bcbbe2c2bc	[VPlan] Pass backedge value directly to FOR and reduction phis (NFC). Pass backedge values directly to VPFirstOrderRecurrencePHIRecipe and VPReductionPHIRecipe, as they must be provided and available. Split off from https://github.com/llvm/llvm-project/pull/168291.	2025-12-14 20:59:22 +00:00
Florian Hahn	53cf22f3a1	[VPlan] Simplify live-ins early using SCEV. (#155304 ) Use SCEV to simplify all live-ins during VPlan0 construction. This enables us to remove special SCEV queries when constructing VPWidenRecipes and improves results in some cases. This leads to simplifications in a number of cases in real-world applications (~250 files changed across LLVM, SPEC, ffmpeg) PR: https://github.com/llvm/llvm-project/pull/155304	2025-12-14 20:15:05 +00:00
Luke Lau	4ea8157773	Revert "[VPlan] Remove legacy costing inside VPBlendRecipe::computeCost (#171846 )" This reverts commit fd5f53aa9b21060063484fc6c346316a34a6464c. It's triggering legacy cost model assertions reported in https://github.com/llvm/llvm-project/pull/171846#issuecomment-3647640019	2025-12-13 20:05:34 +08:00
Florian Hahn	333ee931df	[LV] Update stale comment after 4e05d702f02a. (NFC) Address post-commit suggestion, update stale comment after 4e05d702f.	2025-12-12 21:36:56 +00:00
Florian Hahn	4e05d702f0	[LV] Always include middle block cost in isOutsideLoopWorkProfitable. (#171102 ) Always include the cost of the middle block in isOutsideLoopWorkProfitable. This addresses the TODO from https://github.com/llvm/llvm-project/pull/168949 and removes the temporary restriction. isOutsideLoopWorkProfitable already scales the cost outside loops according the expected trip counts. In practice this increases the minimum iteration threshold in a few cases. On a large IR corpus based on C/C++ workloads, ~50 out of 179450 vector loops have their thresholds increased slightly. PR: https://github.com/llvm/llvm-project/pull/171102	2025-12-11 21:41:47 +00:00
Luke Lau	fd5f53aa9b	[VPlan] Remove legacy costing inside VPBlendRecipe::computeCost (#171846 ) A VPBlendRecipe always emits selects, even when the VF is scalar. However the legacy cost model always costs all scalar non-header phis as a phi, and the VPlan cost model has to account for this. This can cause the cost to be a little off, for example not including the cost of the select in @smax_call_uniform leading to unprofitable vectorization. This removes this from the VPlan cost model and handles checks for the case in planContainsAdditionalSimplifications instead. I considered trying to make the legacy cost model more accurate but I'm not sure if it's possible. We need information as to whether or not the scalar VF we are costing is the original loop in which case it's actually a phi, or if it's a VPBlendRecipe that emits a select, potentially from a VF=1, UF>=1 VPlan.	2025-12-12 00:25:58 +08:00
Ramkumar Ramachandra	85fafd5db0	[SCEVExp] Get DL from SE, strip constructor arg (NFC) (#171823 )	2025-12-11 14:26:47 +00:00
Aiden Grossman	f29d06029f	Revert "[LV] Mark checks as never succeeding for high cost cutoff." This reverts commit 8a115b6934a90441d77ea54af73e7aaaa1394b38. This broke premerge. https://lab.llvm.org/staging/#/builders/192/builds/13326 /home/gha/llvm-project/clang/test/Frontend/optimization-remark-options.c:10:11: remark: loop not vectorized: cannot prove it is safe to reorder floating-point operations; allow reordering by specifying '#pragma clang loop vectorize(enable)' before the loop or by providing the compiler option '-ffast-math'	2025-12-09 21:32:09 +00:00
Florian Hahn	8a115b6934	[LV] Mark checks as never succeeding for high cost cutoff. When GeneratedRTChecks::create bails out due to exceeding the cost threshold, no runtime checks are generated and we must not proceed assuming checks have been generated. Mark the checks as never succeeding, to make sure we don't try to vectorize assuming the runtime checks hold. This fixes a case where we previously incorrectly vectorized assuming runtime checks had been generated when forcing vectorization via metadata. Fixes the mis-compile mentioned in https://github.com/llvm/llvm-project/pull/166247#issuecomment-3631471588	2025-12-09 20:37:21 +00:00
Pengcheng Wang	1ef0a56b55	[LV][NFC] Use foldTailWithEVL() (#171282 )	2025-12-09 16:58:53 +08:00
Luke Lau	0fbb45e7d6	[LV] Return getPredBlockCostDivisor in uint64_t When the probability of a block is extremely low, HeaderFreq / BBFreq may be larger than 32 bits. Previously this got truncated to uint32_t which could cause division by zero exceptions on x86. Widen the return type to uint64_t which should fit the entire range of BlockFrequency values. It's also worth noting that a frequency can never be zero according to BlockFrequency.h, so we shouldn't need to worry about divide by zero in getPredBlockCostDivisor itself.	2025-12-09 15:43:13 +08:00
Florian Hahn	65dd29b335	[LV] Compare induction start values via SCEV in assertion (NFCI). Instead of comparing plain VPValue in the assertion checking the start values, directly compare the SCEV's. This future-proofs the code in preparation of performing more simplifications/canonicalizations for live-ins.	2025-12-08 21:31:53 +00:00
Luke Lau	e8219e5ce8	[VPlan] Use BlockFrequencyInfo in getPredBlockCostDivisor (#158690 ) In 531.deepsjeng_r from SPEC CPU 2017 there's a loop that we unprofitably loop vectorize on RISC-V. The loop looks something like: ```c for (int i = 0; i < n; i++) { if (x0[i] == a) if (x1[i] == b) if (x2[i] == c) // do stuff... } ``` Because it's so deeply nested the actual inner level of the loop rarely gets executed. However we still deem it profitable to vectorize, which due to the if-conversion means we now always execute the body. This stems from the fact that `getPredBlockCostDivisor` currently assumes that blocks have 50% chance of being executed as a heuristic. We can fix this by using BlockFrequencyInfo, which gives a more accurate estimate of the innermost block being executed 12.5% of the time. We can then calculate the probability as `HeaderFrequency / BlockFrequency`. Fixing the cost here gives a 7% speedup for 531.deepsjeng_r on RISC-V. Whilst there's a lot of changes in the in-tree tests, this doesn't affect llvm-test-suite or SPEC CPU 2017 that much: - On armv9-a -flto -O3 there's 0.0%/0.2% more geomean loops vectorized on llvm-test-suite/SPEC CPU 2017. - On x86-64 -flto -O3 with PGO there's 0.9%/0% less geomean loops vectorized on llvm-test-suite/SPEC CPU 2017. Overall geomean compile time impact is 0.03% on stage1-ReleaseLTO: https://llvm-compile-time-tracker.com/compare.php?from=9eee396c58d2e24beb93c460141170def328776d&to=32fbff48f965d03b51549fdf9bbc4ca06473b623&stat=instructions%3Au	2025-12-08 14:28:26 +00:00
Florian Hahn	3fc7419236	[VPlan] Replace ExtractLast(Elem|LanePerPart) with ExtractLast(Lane/Part) (#164124) Replace ExtractLastElement and ExtractLastLanePerPart with more generic and specific ExtractLastLane and ExtractLastPart, which model distinct parts of extracting across parts and lanes. ExtractLastElement == ExtractLastLane(ExtractLastPart) and ExtractLastLanePerPart == ExtractLastLane, the latter clarifying the name of the opcode. A new m_ExtractLastElement matcher is provided for convenience. The patch should be NFC modulo printing changes. PR: https://github.com/llvm/llvm-project/pull/164124	2025-12-07 15:15:43 +00:00
Luke Lau	37ea097943	[VPlan] Remove VPWidenRecipe constructor with no underlying instruction. NFCI (#166521 ) My understanding is that a VPWidenRecipe should be used for recipes with an exact underlying scalar instruction, and VPInstruction should be used elsewhere e.g. for instructions generated as a part of the vectorization process. The only user of the VPWidenRecipe constructor that doesn't take an underlying instruction is in adjustRecipesForReductions, but we can just use VPInstruction there.	2025-12-04 19:19:23 +08:00
Florian Hahn	f0e1254bce	[LV] Use forced cost once for whole interleave group in legacy costmodel (#168270 ) The VPlan-based cost model assigns the forced cost once for a whole VPInterleaveRecipe. Update the legacy cost model to match this behavior. This fixes a cost-model divergence, and assigns the cost in a way that matches the generated code more accurately. PR: https://github.com/llvm/llvm-project/pull/168270	2025-12-02 21:39:54 +00:00
Florian Hahn	4b6ad11876	[VPlan] Sink predicated stores with complementary masks. (#168771) Extend the logic to hoist predicated loads (https://github.com/llvm/llvm-project/pull/168373) to sink predicated stores with complementary masks in a similar fashion. The patch refactors some of the existing logic for legality checks to be shared between hoisting and sinking, and adds a new sinking transform on top. With respect to the legality checks, for sinking stores the code also checks if there are any stores that may alias, not only loads. PR: https://github.com/llvm/llvm-project/pull/168771	2025-12-02 11:43:37 +00:00
Florian Hahn	24b87b8d48	[VPlan] Skip cost verification for loops with EVL gather/scatter. The VPlan-based cost model uses vp_gather/vp_scatter for gather/scatter costs, which is different to the legacy cost model and cannot be matched there. Don't verify the costs match for plans containing gather/scatters with EVL. Fixes https://github.com/llvm/llvm-project/issues/169948.	2025-11-29 22:00:30 +00:00
Florian Hahn	99addbf73d	[LV] Vectorize selecting last IV of min/max element. (#141431) Add support for vectorizing loops that select the index of the minimum or maximum element. The patch implements vectorizing those patterns by combining Min/Max and FindFirstIV reductions. It extends matching Min/Max reductions to allow in-loop users that are FindLastIV reductions. It records a flag indicating that the Min/Max reduction is used by another reduction. The extra user is then checked as part of the new `handleMultiUseReductions` VPlan transformation. It processes any reduction that has other reduction users. The reduction using the min/max reduction currently must be a FindLastIV reduction, which needs adjusting to compute the correct result: 1. We need to find the last IV for which the condition based on the min/max reduction is true, 2. Compare the partial min/max reduction result to its final value and, 3. Select the lanes of the partial FindLastIV reductions which correspond to the lanes matching the min/max reduction result. Depends on https://github.com/llvm/llvm-project/pull/140451 PR: https://github.com/llvm/llvm-project/pull/141431	2025-11-28 22:26:19 +00:00
Shih-Po Hung	b9bdec3021	[TTI][Vectorize] Migrate masked/gather-scatter/strided/expand-compress costing (NFCI) (#165532) In #160470, there is a discussion about the possibility of exploring a general approach for handling memory intrinsics. API changes: - Remove getMaskedMemoryOpCost, getGatherScatterOpCost, getExpandCompressMemoryOpCost, getStridedMemoryOpCost from Analysis/TargetTransformInfo. - Add getMemIntrinsicInstrCost. In BasicTTIImpl, map intrinsic IDs to existing target implementation until the legacy TTI hooks are retired. - masked_load/store → getMaskedMemoryOpCost - masked_/vp_gather/scatter → getGatherScatterOpCost - masked_expandload/compressstore → getExpandCompressMemoryOpCost - experimental_vp_strided_{load,store} → getStridedMemoryOpCost TODO: add support for vp_load_ff. No functional change intended; costs continue to route to the same target-specific hooks.	2025-11-28 05:14:37 +00:00
Florian Hahn	f8eca64a28	Reapply "[LV] Use ExtractLane(LastActiveLane, V) live outs when tail-folding. (#149042)" This reverts commit a6edeedbfa308876d6f2b1648729d52970bb07e6. The following fixes have landed, addressing issues causing the original revert: * https://github.com/llvm/llvm-project/pull/169298 * https://github.com/llvm/llvm-project/pull/167897 * https://github.com/llvm/llvm-project/pull/168949 Original message: Building on top of https://github.com/llvm/llvm-project/pull/148817, introduce a new abstract LastActiveLane opcode that gets lowered to Not(Mask) → FirstActiveLane(NotMask) → Sub(result, 1). When folding the tail, update all extracts for uses outside the loop to extract the value of the last active lane. See also https://github.com/llvm/llvm-project/issues/148603 PR: https://github.com/llvm/llvm-project/pull/149042	2025-11-26 20:03:55 +00:00
Florian Hahn	d58ebe339c	Revert "Reapply "[LV] Use ExtractLane(LastActiveLane, V) live outs when tail-folding. (#149042 )"" This reverts commit 72e51d389f66d9cc6b55fd74b56fbbd087672a43. Missed some test updates.	2025-11-26 19:41:39 +00:00
Florian Hahn	72e51d389f	Reapply "[LV] Use ExtractLane(LastActiveLane, V) live outs when tail-folding. (#149042)" This reverts commit a6edeedbfa308876d6f2b1648729d52970bb07e6. The following fixes have landed, addressing issues causing the original revert: * https://github.com/llvm/llvm-project/pull/169298 * https://github.com/llvm/llvm-project/pull/167897 * https://github.com/llvm/llvm-project/pull/168949 Original message: Building on top of https://github.com/llvm/llvm-project/pull/148817, introduce a new abstract LastActiveLane opcode that gets lowered to Not(Mask) → FirstActiveLane(NotMask) → Sub(result, 1). When folding the tail, update all extracts for uses outside the loop to extract the value of the last active lane. See also https://github.com/llvm/llvm-project/issues/148603 PR: https://github.com/llvm/llvm-project/pull/149042	2025-11-26 19:31:25 +00:00
Sam Tebbs	071d1fb8be	[LV] Use VPReductionRecipe for partial reductions (#147513 ) Partial reductions can easily be represented by the VPReductionRecipe class by setting their scale factor to something greater than 1. This PR merges the two together and gives VPReductionRecipe a VFScaleFactor so that it can choose to generate the partial reduction intrinsic at execute time. Stacked PRs: 1. https://github.com/llvm/llvm-project/pull/147026 2. https://github.com/llvm/llvm-project/pull/147255 3. https://github.com/llvm/llvm-project/pull/156976 4. https://github.com/llvm/llvm-project/pull/160154 5. https://github.com/llvm/llvm-project/pull/147302 6. https://github.com/llvm/llvm-project/pull/162503 7. -> https://github.com/llvm/llvm-project/pull/147513 Replaces https://github.com/llvm/llvm-project/pull/146073 .	2025-11-26 16:18:22 +00:00
Florian Hahn	4cc8cc81e3	[VPlan] Hoist predicated loads with complementary masks. (#168373) This patch adds a new VPlan transformation to hoist predicated loads, if we can prove they execute unconditionally, i.e. there are 2 predicated loads to the same address with complementary masks. Then we are guaranteed to execute one of them on each iteration, allowing us to remove the mask. The transform groups masked replicating loads by their address SCEV, then checks if there are 2 loads with complementary masks. If that is the case, we check if there are any writes that may alias the load address in the blocks between the first and last load with the same address. The transform operates after linearizing the CFG, but before introducing replicate regions, which means this is just checking a chain of consecutive blocks. Currently this only uses noalias metadata to check for no-alias (using the helpers added in https://github.com/llvm/llvm-project/pull/166247). Then we create an unpredicated VPReplicateRecipe at the position of the first load, then replace all users of the grouped loads with it. Small Alive2 proof for hoisting with complementary masks: https://alive2.llvm.org/ce/z/kUx742 PR: https://github.com/llvm/llvm-project/pull/168373	2025-11-26 13:55:14 +00:00
Florian Hahn	48eb697441	[LV] Count cost of middle block if TC <= VF. (#168949 ) If the expected trip count is less than the VF, the vector loop will only execute a single iteration. When that's the case, the cost of the middle block has the same impact as the cost of the vector loop. Include it in isOutsideLoopWorkProfitable to avoid vectorizing when the extra work in the middle block makes it unprofitable. Note that isOutsideLoopWorkProfitable already scales the cost of blocks outside the vector region, but the patch restricts accounting for the middle block to cases where VF <= ExpectedTC, to initially catch some worst cases and avoid regressions. This initial version should specifically avoid unprofitable tail-folding for loops with low trip counts after re-applying https://github.com/llvm/llvm-project/pull/149042. PR: https://github.com/llvm/llvm-project/pull/168949	2025-11-24 19:23:04 +00:00
Ramkumar Ramachandra	1abb055c57	[IVDesc] Make getCastInsts return an ArrayRef (NFC) (#169021 ) To make it clear that the return value is immutable.	2025-11-24 08:57:55 +00:00
Florian Hahn	080ca902c6	[VPlan] Create resume phis in scalar preheader early. (NFC) (#166099 ) Create phi recipes for scalar resume value up front in addInitialSkeleton during initial construction. This will allow moving the remaining code dealing with resume values to VPlan transforms/construction. PR: https://github.com/llvm/llvm-project/pull/166099	2025-11-22 20:45:41 +00:00
Florian Hahn	7acfbc23a7	[VPlan] Remove PtrIV::IsScalarAfterVectorization, use VPlan analysis. (#168289 ) Remove `VPWidenPointerInductionRecipe::IsScalarAfterVectorization` and replace it with `onlyScalarValuesUsed`. This removes the need to carry state from the legacy cost model through VPlan, and the VPlan-based analysis gives more accurate results, avoiding a number of extracts. PR: https://github.com/llvm/llvm-project/pull/168289	2025-11-20 18:58:25 +00:00
Florian Hahn	67e35bbebb	[LV] Check full partial reduction chains in order. (#168036) https://github.com/llvm/llvm-project/pull/162822 added another validation step to check if entries in a partial reduction chain have the same scale factor. But the validation was still dependent on the order of entries in PartialReductionChains, and would fail to reject some cases (e.g. if the first link matched the scale of the second link, but the second link is invalidated later). To fix that, group chains by their starting phi nodes, then perform the validation for each chain, and if it fails, invalidate the whole chain for the phi. Fixes https://github.com/llvm/llvm-project/issues/167243. Fixes https://github.com/llvm/llvm-project/issues/167867. PR: https://github.com/llvm/llvm-project/pull/168036	2025-11-20 15:54:57 +00:00
Sam Tebbs	3396b4654b	[LV] Allow partial reductions with an extended bin op (#165536 ) A pattern of the form reduce.add(ext(mul)) is valid for a partial reduction as long as the mul and its operands fulfill the requirements of a normal partial reduction. The mul's extend operands will be optimised to the wider extend, and we already have oneUse checks in place to make sure the mul and operands can be modified safely. 1. -> https://github.com/llvm/llvm-project/pull/165536 2. https://github.com/llvm/llvm-project/pull/165543	2025-11-20 10:22:11 +00:00
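
The two getPredBlockCostDivisor commits listed above (e8219e5ce8 and 0fbb45e7d6) describe the divisor as `HeaderFrequency / BlockFrequency` and explain why a 32-bit return type was too narrow. The following is only a minimal sketch of that arithmetic, not LLVM's implementation: the function names `predBlockCostDivisor`, `truncatedDivisor` and `scaledBlockCost` are hypothetical, and plain integers stand in for BlockFrequency values.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the divisor computation described in the commit
// message: header frequency divided by the predicated block's frequency.
// Plain integers are used here in place of LLVM's BlockFrequency type.
uint64_t predBlockCostDivisor(uint64_t HeaderFreq, uint64_t BlockFreq) {
  // The commit message notes that a block frequency is never zero.
  assert(BlockFreq > 0 && "block frequency must be non-zero");
  return HeaderFreq / BlockFreq;
}

// Illustration of the behaviour before the widening fix: the same quotient
// truncated to 32 bits. A quotient of exactly 2^32 truncates to 0, and using
// 0 as a divisor afterwards would be a division by zero.
uint32_t truncatedDivisor(uint64_t HeaderFreq, uint64_t BlockFreq) {
  return static_cast<uint32_t>(predBlockCostDivisor(HeaderFreq, BlockFreq));
}

// Hypothetical helper: scale the cost of a predicated block by how rarely it
// is expected to execute relative to the loop header.
uint64_t scaledBlockCost(uint64_t BlockCost, uint64_t Divisor) {
  assert(Divisor > 0 && "divisor must be non-zero");
  return BlockCost / Divisor;
}
```

With a header frequency of 8000 and an innermost-block frequency of 1000 (three nested branches at 50% each, i.e. 12.5%), `predBlockCostDivisor` yields 8, so a block cost of 40 is scaled to 5. With a header frequency of 2^32 and a block frequency of 1, the 64-bit divisor is 2^32 while the truncated variant yields 0.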


2834 Commits