llvm-project

Author	SHA1	Message	Date
Sander de Smalen	3157758190	[LV] Handle partial sub-reductions with sub in middle block. (#178919 ) Sub-reductions can be implemented in two ways: (1) negate the operand in the vector loop (the default way). (2) subtract the reduced value from the init value in the middle block. Note that both ways keep the reduction itself as an 'add' reduction, which is necessary because only llvm.vector.partial.reduce.add exists. The ISD nodes for partial reductions don't support folding the sub/negation into its operands because the following is not a valid transformation: ``` sub(0, mul(ext(a), ext(b))) -> mul(ext(a), ext(sub(0, b))) ``` It can therefore be better to choose option (2) such that the partial reduction is always positive (starting at '0') and to do a final subtract in the middle block. For AArch64 there are no dot-product instructions that can do a `partial.reduce.sub(acc, mul(ext(a), ext(b)))` operation. I'm not sure if such instructions exist for other targets. (If so then we may want to make this decision a target option) This PR also increases the AArch64 cost of a partial sub-reduction when this exists in an 'add-sub' reduction chain. Fixes https://github.com/llvm/llvm-project/issues/178703	2026-02-10 11:00:32 +00:00
Benjamin Maxwell	f22a178b13	Reland "[LV] Support conditional scalar assignments of masked operations" (#180708 ) This patch extends the support added in #158088 to loops where the assignment is non-speculatable (e.g. a conditional load or divide). For example, the following loop can now be vectorized: ``` int simple_csa_int_load( int* a, int* b, int default_val, int N, int threshold) { int result = default_val; for (int i = 0; i < N; ++i) if (a[i] > threshold) result = b[i]; return result; } ``` It does this by extending the recurrence matching from only looking for selects, to include phis where all operands are the header phi, except for one which can be an arbitrary value outside the recurrence. --- Reverts llvm/llvm-project#180275 (original PR: #178862) Additional type legalization for `ISD::VECTOR_FIND_LAST_ACTIVE` was added in #180290, which should resolve the backend crashes on x86.	2026-02-10 09:57:48 +00:00
Florian Hahn	2b9a1aee5a	[LV] Add additional tests for reductions with intermediate stores. (NFC) Adds missing test coverage for reductions with intermediate stores, including partial reductions with intermediate stores, as well as chained min/max reductions with intermediate stores.	2026-02-09 23:14:26 +00:00
Florian Hahn	d1ec04dfd4	[VPlan] Simplify single-entry VPWidenPHIRecipe. Include VPWidenPHIRecipe in phi simplification if there's a single incoming value.	2026-02-09 22:10:13 +00:00
Vishruth Thimmaiah	84f4b1e52d	Reland "[LoopVectorize] Support vectorization of overflow intrinsics" (#180526 ) Enables support for marking overflow intrinsics `uadd`, `sadd`, `usub`, `ssub`, `umul` and `smul` as trivially vectorizable. Fixes #174617 --- This patch is a reland of #174835. Reverts #179819	2026-02-09 15:32:04 +00:00
Florian Hahn	7defb0a4a3	[VPlan] Skip applying InstsToScalarize with forced instr costs. (#168269 ) ForceTargetInstructionCost in the legacy cost model overrides any costs from InstsToScalarize. Match the behavior in the VPlan-based cost model. This fixes a crash with -force-target-instr-cost for the added test case. PR: https://github.com/llvm/llvm-project/pull/168269	2026-02-09 13:20:44 +00:00
Kewen Meng	703c2762d3	Revert "[LV] Support conditional scalar assignments of masked operations" (#180275 ) Reverts llvm/llvm-project#178862 revert to unblock bot: https://lab.llvm.org/buildbot/#/builders/206/builds/13225	2026-02-06 13:24:40 -08:00
Benjamin Maxwell	4f90eb6427	[LV] Support conditional scalar assignments of masked operations (#178862 ) This patch extends the support added in #158088 to loops where the assignment is non-speculatable (e.g. a conditional load or divide). For example, the following loop can now be vectorized: ``` int simple_csa_int_load( int* a, int* b, int default_val, int N, int threshold) { int result = default_val; for (int i = 0; i < N; ++i) if (a[i] > threshold) result = b[i]; return result; } ``` It does this by extending the recurrence matching from only looking for selects, to include phis where all operands are the header phi, except for one which can be an arbitrary value outside the recurrence.	2026-02-06 11:43:06 +00:00
David Green	8f484ff2a0	[AArch64] Add FeatureUseFixedOverScalableIfEqualCost to Neoverse-V3 and Neoverse-V3ae (#179903 ) This was missing from neoverse-v3 and neoverse-v3ae, but should be present like neoverse-v2.	2026-02-05 17:55:49 +00:00
Alexander Kornienko	7165353506	Revert "[LoopVectorize] Support vectorization of overflow intrinsics" (#179819 ) Reverts llvm/llvm-project#174835, which causes clang crashes. See https://github.com/llvm/llvm-project/pull/174835#issuecomment-3844233831 and https://github.com/llvm/llvm-project/issues/179671 for details.	2026-02-05 15:41:49 +01:00
Florian Hahn	792f7b089a	[VPlan] Refine exit select check in transformtoPartialReduction. Make sure we find the actual select for the exit users and only use it for the final link in the chain. This fixes a miscompile after 90b3712d8a20efa2cbaadc177da576e485dce038.	2026-02-03 21:07:02 +00:00
Mel Chen	8c6658aca6	[VPlan] Sink recipes from the vector loop region in licm. (#168031 ) When a recipe can be safely sunk and all of its users are outside the vector loop region in the same dedicated exit block, the recipe does not need to be executed on every iteration. This patch extends the VPlan-based LICM (Loop Invariant Code Motion) to also sink such recipes from the vector loop region into the exit block. This reduces redundant computation and improves cost model accuracy. TODO: Support nested loop sinking TODO: Support sinking `VPReplicateRecipe` (requires `replicateByVF` fixes) TODO: Support recipes with multiple defined values (e.g., interleaved loads) TODO: Clone recipes without users to all exit blocks TODO: Support PHI node users by checking incoming value blocks TODO: Support sinking when users are in multiple blocks TODO: Clone recipes when users are on multiple exit paths Co-authored-by: Luke Lau <luke@igalia.com> --------- Co-authored-by: Luke Lau <luke@igalia.com> Co-authored-by: Luke Lau <luke_lau@icloud.com>	2026-02-03 07:57:15 +00:00
Florian Hahn	a0b99e32d3	[LV] Add additional partial reduction test coverage for #167851 . Add test cases for which earlier versions of https://github.com/llvm/llvm-project/pull/167851 were not NFC. Test chained_sext_adds is moved to a new file.	2026-01-30 20:31:32 +00:00
Florian Hahn	abfd56293c	[VPlan] Mark VPActiveLaneMaskPHIRecipe as readnone. (#177886 ) VPWidenActiveLaneMaskPHIRecipe does not have side-effects and also does not access memory. Mark accordingly. This allows hoisting of some invariant loads out of loops and also removing unused phi recipes in the future. In llvm/test/Transforms/LoopVectorize/AArch64/conditional-branches-cost.ll, the hoisting makes vectorization profitable. PR: https://github.com/llvm/llvm-project/pull/177886	2026-01-30 16:12:30 +00:00
Sander de Smalen	a726b1907a	NFC: Cleanup AArch64/partial-reduce-chained.ll This had some loop attributes that were unused. Also cleaned up the flags a little bit.	2026-01-30 14:59:38 +00:00
Sander de Smalen	b4c7518a0f	[LV] Add support for extended fadd reductions (#178447 ) This makes use of the llvm.vector.partial.reduce.fadd intrinsics added in #163975 to handle the following with FDOT: ``` float32_t fdot(float16_t *src, int N) { float32_t sum = 0.0f; for (int i=0; i<N; ++i) sum += src[i]; return sum; } ```	2026-01-30 08:27:57 +00:00
Damian Heaton	762ba885f9	[LV] Add support for llvm.vector.partial.reduce.fadd (#163975 ) Allows the Loop Vectorizer to generate `llvm.vector.partial.reduce.fadd` intrinsics when sequences which match its requirements are found.	2026-01-28 15:05:34 +00:00
Tomer Shafir	4239e858fe	[AArch64] Align nontemporal store/load little-endian checks (#177468 ) This patch aims to align all nontemporal store/load handling to systematically enforce a little-endian target. This has been the effective support LLVM had for NT store/load lowering (there has been no effective support for big-endian, even with the inconsistencies). The change in `llvm/lib/Target/AArch64/AArch64InstrInfo.td` is effectively an NFC, because the only lowering of LDNP, in `llvm/lib/Target/AArch64/AArch64ISelLowering.cpp`, has already checked for `isLittleEndian`. The change in `llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h` affects its single caller `llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp`. The previous logic was wrong, enabling vectorization of effectively illegal nontemporal store/load instructions on big-endian.	2026-01-27 21:37:21 +02:00
Florian Hahn	537c648fc0	[LV] Precommit extra argmin/argmax tests for #170223 . Precommit extra tests for https://github.com/llvm/llvm-project/pull/170223	2026-01-26 21:32:50 +00:00
Florian Hahn	7b445ddcd2	[LV] Add additional tests for early-exit loops loads not known deref. Add additional test coverage for loops with loads that are not known to be dereferenceable.	2026-01-25 23:09:58 +00:00
Mel Chen	149c76538e	[LV] Separate runtime check cost from total overhead in profitability check (#176754 ) In isOutsideLoopWorkProfitable function, there are two places where only the runtime check cost (RtC) should be used, but incorrectly included the costs of middle blocks and early-exit blocks. 1. VectorizeMemoryCheckThreshold comparison for interleaving-only 2. Minimum trip count that bounds runtime check overhead, i.e. MinTC2 calculation This results in an overly conservative minimum profitable trip count. This patch separates the runtime check cost from the total overhead cost, and uses only RtC for VectorizeMemoryCheckThreshold comparison and the MinTC2 calculation.	2026-01-23 07:29:56 +00:00
Florian Hahn	f41767db9d	[LV] Add replicating load/store cost tests for Apple CPUs. Add dedicated tests to check replicating load/store costs on Apple CPUs.	2026-01-21 16:37:12 +00:00
Florian Hahn	d3f2f1366d	[LV] Consider UserIC when limiting VF. (#174573 ) If a UserIC is provided, the vector loop will process VF * UserIC elements per iteration. Pass UserIC through to computeFeasibleMaxVF and use it to limit the max VF to factors where VF * UserIC <= MaxTripCount. This avoids creating dead vector loops with user-provided interleave counts. PR: https://github.com/llvm/llvm-project/pull/174573	2026-01-20 14:19:11 +00:00
David Sherwood	abc924356e	[LV][NFC] Update low trip count tail-folding tests (#176898 ) Whilst reviewing PR #176754 I realised there seemed to be some odd cost model issues for the tests in file LoopVectorize/AArch64/fold-tail-low-trip-count.ll where we seemed to be vectorising loops that aren't worth it. It turns out the tests were not targeting AArch64 despite being in the AArch64 directory. I fixed the RUN line for the file and also added a new file for RISCV so we get more test coverage.	2026-01-20 12:15:50 +00:00
Florian Hahn	e34fefdb35	[LV] Add extra tests with sink-able recipes. Add extra test coverage for https://github.com/llvm/llvm-project/pull/168031.	2026-01-19 18:33:55 +00:00
Florian Hahn	497a6d6722	Recommit "[VPlan] Only use isAddressSCEVForCost in legacy getAddressAccSCEV" This reverts commit ed004cf42bf57ca79b57bc3076ef83a8477426ea. The original commit exposed an independent cost issue, triggering an assertion. That issue has been fixed in 3457e7efc3. Reland the patch now that the assertion has been fixed.	2026-01-18 19:55:46 +00:00
Florian Hahn	3457e7efc3	[VPlan] Match inverted logical AND/OR for select costs. VPlan transforms may invert logical AND/OR selects, which can impact costs on targets where the select is not cheap but the boolean AND/OR is. Also match the inverted logical AND/OR to improve the accuracy of the cost estimation. This fixes the underlying cost divergence between the legacy and VPlan-based cost models that caused the revert of 01d34eb38fa058 in ed004cf42bf57c.	2026-01-18 16:15:42 +00:00
Florian Hahn	123acb24da	[LV] Add missing coverage for LV cost model code paths. Add a set of tests that expose crashes with some upcoming and pending patches.	2026-01-17 21:59:50 +00:00
Vishruth Thimmaiah	04baf1105f	[LoopVectorize] Support vectorization of overflow intrinsics (#174835 ) Enables support for marking overflow intrinsics `uadd`, `sadd`, `usub`, `ssub`, `umul` and `smul` as trivially vectorizable. Fixes #174617 --------- Signed-off-by: vishruth-thimmaiah <vishruththimmaiah@gmail.com>	2026-01-16 10:09:42 +00:00
Elvis Wang	aa11629192	[LV] Prevent `extract-lane` generate unused IRs with single vector operand. (#172798 ) When `extract-lane` has only a single vector operand, it can be simplified to `extractelement`. This patch makes `extract-lane` generate a plain `extractelement` in that case, so that no unused IR is generated. This patch is mostly NFC, since the unused IR would be removed by later IR passes anyway.	2026-01-16 13:59:51 +08:00
Florian Hahn	8c5352cf3e	[LV] Add additional cost and folding test coverage. (NFC)	2026-01-14 22:19:11 +00:00
Graham Hunter	2abd6d6d7a	[LV] Vectorize conditional scalar assignments (#158088 ) Based on Michael Maitland's previous work: https://github.com/llvm/llvm-project/pull/121222 This PR uses the existing recurrences code instead of introducing a new pass just for CSA autovec. I've also made recipes that are more generic.	2026-01-14 14:59:18 +00:00
Florian Hahn	d5c11b9a24	[VPlan] Replace PhiR operand of ComputeRdxResult with VPIRFlags. (#174026 ) Remove the artificial PhiR operand of ComputeReductionResult, which was only used to look up recurrence kind, in-loop and ordered properties. Instead, encode them as VPIRFlags as suggested by @ayalz in https://github.com/llvm/llvm-project/pull/170223. This addresses a TODO to make codegen for ComputeReductionResult independent of looking up information from other recipes. This is NFC w.r.t. codegen, the printing has been improved to include the reduction type, and whether it is in-loop/ordered. PR: https://github.com/llvm/llvm-project/pull/174026	2026-01-14 07:45:44 +00:00
David Sherwood	48ce7bb038	[LV] Fix bug in setVectorizedCallDecision (#175742 ) There is a bug in this logic: ``` InstructionCost Cost = ScalarCost; InstWidening Decision = CM_Scalarize; if (VectorCost <= Cost) { Cost = VectorCost; Decision = CM_VectorCall; } if (IntrinsicCost <= Cost) { Cost = IntrinsicCost; Decision = CM_IntrinsicCall; } ``` because it assumes that the comparisons behave sensibly in the face of invalid costs. Unfortunately, PR #174835 exposes an issue when attempting to vectorise the new test uadd_with_overflow_i32 for AArch64 targets. Specifically, there are situations where all costs are invalid (e.g. VF=vscale x 1), but some costs are more invalid than others. For example, when querying the intrinsic cost via the TTI hook we get an invalid cost with a non-zero value, whereas the vector cost is invalid with a zero value. That leads to us erroneously choosing CM_VectorCall as the call widening decision, despite the lack of a vector math variant. Inevitably this causes crashes because we create a VPCallWidenRecipe without a variant function. Fix this by only performing comparisons if the costs are valid. It now leads to us choosing CM_Scalarize more often, but it's a coin toss anyway between CM_Scalarize and CM_IntrinsicCall when both strategies are invalid. Potentially we could also create a new strategy called CM_Invalid, and avoid the creation of VPlans entirely.	2026-01-14 07:28:38 +00:00
Florian Hahn	4a807e8dd9	[VPlan] Optimize BranchOnTwoConds to chain of 2 simple branches. (#174016 ) This patch improves the lowering for BranchOnTwoConds added in https://github.com/llvm/llvm-project/pull/172750 by replacing the branch on OR with a chain of 2 branches. On Apple M cores, the new lowering is ~8-10% faster for std::find-like loops. It also makes it easier to determine the early exits in VPlan. I am also planning on extensions to support loops with multiple early exits and early-exits at different positions, which should also be slightly easier to do with the new representation. PR: https://github.com/llvm/llvm-project/pull/174016	2026-01-13 20:14:15 +00:00
Mircea Trofin	f9c561b561	[profcheck] Fix encoding of 0 loopEstimatedTrip count (#174896 ) We currently encode an estimated trip count of 0 as the latch having branch probabilities 0-0. That's an invalid pair of weights. The probability of a branch is computed as a fraction of its corresponding weight and the sum of the weights. In fact, `BranchProbabilityInfo::calcMetadataWeights` will convert this to a 1-1, meaning 50% - 50%, which isn't quite what we want. To indicate the loop is never taken, we just need to initialize the exit probability to non-zero (hence, 1) Related: https://reviews.llvm.org/D67905 Issue #147390	2026-01-12 17:12:35 -08:00
Sander de Smalen	de9120dd3b	[AArch64] Define cost of i16->i32 udot/sdot instructions (#174102 ) i16 -> i32 dot-product operations are natively supported with SVE2p1 and SME2. This updates the cost-model so that those operations are recognized as cheap by the LoopVectorizer.	2026-01-12 16:46:25 +00:00
Elvis Wang	cd2caf6580	[LV] Simplify extract-lane with scalar operand to the scalar value itself. (#174534 ) This patch simplifies extract-lane(%lane_num, %X) to %X when %X is a scalar value. Extracting from a scalar is redundant since there is only one value to extract.	2026-01-12 10:03:44 +08:00
Florian Hahn	2f7e218017	[VPlan] Add missing sext(sub) SCEV fold to getSCEVExprForVPValue. SCEV has a manual fold when doing SCEV construction from IR, that is not integrated in the regular SCEV construction functions. Mirror the behavior in getSCEVExprForVPValue, to match results when constructing SCEVs from IR. Fixes https://github.com/llvm/llvm-project/issues/174622.	2026-01-11 20:51:13 +00:00
Paul Osmialowski	6ca6a328ae	[AArch64][VecLib] Add vector function mappings for the modf, sincos, sincospi vector intrinsics (#175098 ) Following the improvements introduced in #109833 and the most recent development of the libamath library (used by `-fveclib=ArmPL`), this patch adds the missing mappings for the functions that return literal struct values.	2026-01-09 18:08:16 +00:00
Hans Wennborg	ed004cf42b	Revert "[VPlan] Only use isAddressSCEVForCost in legacy getAddressAccSCEV (NFCI)" This caused assertion failures: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:7265: VectorizationFactor llvm::LoopVectorizationPlanner::computeBestVF(): Assertion `(BestFactor.Width == LegacyVF.Width || BestPlan.hasEarlyExit() || !Legal->getLAI()->getSymbolicStrides().empty() || UsesEVLGatherScatter || planContainsAdditionalSimplifications( getPlanFor(BestFactor.Width), CostCtx, OrigLoop, BestFactor.Width) || planContainsAdditionalSimplifications( getPlanFor(LegacyVF.Width), CostCtx, OrigLoop, LegacyVF.Width)) && " VPlan cost model and legacy cost model disagreed"' failed. See the comment on https://github.com/llvm/llvm-project/pull/171204 This reverts commit 01d34eb38fa0587cb95eedd3bada8257abc122f8.	2026-01-09 15:38:32 +01:00
Florian Hahn	1dea577186	[SCEV] Handle URem pattern in getRangeRef. (#174456 ) Check if an scAddExpr expression represents a URem, and if it does, use the divisor to limit the conservative range. https://alive2.llvm.org/ce/z/VPxe7C PR: https://github.com/llvm/llvm-project/pull/174456	2026-01-07 11:32:43 +00:00
David Sherwood	97ee9b66c0	[LV] Teach m_One, m_ZeroInt patterns to look through broadcasts (#170159 ) In VPlanPatternMatch.h I have changed the int_pred_ty code to look through broadcasts in order to catch more cases, i.e. multiplying by a splat of one, etc.	2026-01-07 10:35:08 +00:00
Ramkumar Ramachandra	d12e99376f	Reland [VPlan] Simplify pow-of-2 (mul|udiv) -> (shl|lshr) (#174581 ) The original patch, landed as a2db31b0 ([VPlan] Simplify pow-of-2 (mul|udiv) -> (shl|lshr), #172477), had a critical commutative matcher bug, which has now been fixed. An assert has also been strengthened, following a post-commit review.	2026-01-06 20:36:26 +00:00
Florian Hahn	01d34eb38f	[VPlan] Only use isAddressSCEVForCost in legacy getAddressAccSCEV (NFCI) Follow-up to https://github.com/llvm/llvm-project/pull/171204 and 1f331e453f to only rely on isAddressSCEVForCost in legacy getAddressAccSCEV, completely aligning the decisions of the VPlan and legacy cost models.	2026-01-06 19:18:13 +00:00
Alex Bradbury	5a456c17d9	Revert "[VPlan] Simplify pow-of-2 (mul|udiv) -> (shl|lshr)" (#174559 ) Reverts llvm/llvm-project#172477 This is causing failures for RVA23 (including some tests running away in their execution causing OOM, hence the builder dying). I will attempt to follow up on the PR with a reproducer of some kind. https://lab.llvm.org/buildbot/#/builders/210/builds/7243	2026-01-06 10:26:51 +00:00
Ramkumar Ramachandra	a2db31b06f	[VPlan] Simplify pow-of-2 (mul|udiv) -> (shl|lshr) (#172477 )	2026-01-06 08:27:48 +00:00
Luke Lau	ad4bfac732	[IR] Split vector.splice into vector.splice.left and vector.splice.right (#170796 ) This PR implements the first change outlined in https://discourse.llvm.org/t/rfc-allow-non-constant-offsets-in-llvm-vector-splice/88974?u=lukel In order to allow non-immediate offsets in the llvm.vector.splice intrinsic, we need to separate out the "shift left" and "shift right" modes into two separate intrinsics. These modes were previously determined by whether the offset was positive or negative. The description in the LangRef has also been reworded in terms of sliding elements left or right and extracting either the upper or lower half, as opposed to extracting from a certain index, which brings it in line with the definition of `llvm.fshr.`/`llvm.fshl.`. This patch teaches AutoUpgrade.cpp to upgrade the old intrinsics into their new equivalent based on their offset, so existing uses of vector.splice should still work. Uses of llvm.vector.splice in `llvm/test/CodeGen` haven't been replaced in this PR to keep the diff small and kick the tyres on the AutoUpgrader a bit. I planned to do this in a follow-up NFC but can include it in this PR if reviewers prefer. Similarly the shuffle costing kind `SK_Splice` has just been kept the same for now, to be split into `SK_SpliceLeft` and `SK_SpliceRight` later.	2026-01-06 15:41:26 +08:00
Florian Hahn	4bbc1f6cb5	[LV] Add test case for costs of load of pointer inductions (NFC).	2026-01-05 22:45:44 +00:00
Florian Hahn	16830b2164	[VPlan] Remove VPWidenSelectRecipe, use VPWidenRecipe instead (NFCI). (#174234 ) All extra state has been removed from VPWidenSelectRecipe at this point. There's no benefit of having a separate recipe and Select can easily be handled by the existing VPWidenRecipe. PR: https://github.com/llvm/llvm-project/pull/174234	2026-01-05 22:33:37 +00:00
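
Note on 3157758190: the two ways of implementing a sub-reduction described in that message can be sketched in scalar C++ as below. This is only an illustration, not the vectorizer's code. The function names are hypothetical, and the 8-bit inputs sign-extended to 32 bits are an assumption chosen to resemble a dot-product style loop. The last two helpers show one counterexample to folding the negation into an operand before the extend, namely b = -128, where the 8-bit negation wraps.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Option (1): negate the product inside the loop; the reduction stays an add.
int32_t reduceNegateInLoop(int32_t init, const std::vector<int8_t> &a,
                           const std::vector<int8_t> &b) {
  assert(a.size() == b.size() && "inputs must have the same length");
  int32_t acc = init;
  for (std::size_t i = 0; i < a.size(); ++i)
    acc += -(int32_t(a[i]) * int32_t(b[i]));
  return acc;
}

// Option (2): accumulate a plain add reduction starting at 0, then do a
// single subtract from the init value after the loop ("middle block").
int32_t reduceSubAfterLoop(int32_t init, const std::vector<int8_t> &a,
                           const std::vector<int8_t> &b) {
  assert(a.size() == b.size() && "inputs must have the same length");
  int32_t partial = 0;
  for (std::size_t i = 0; i < a.size(); ++i)
    partial += int32_t(a[i]) * int32_t(b[i]);
  return init - partial;
}

// sub(0, mul(ext(a), ext(b))): negate after the widening multiply.
int32_t negateAfterMul(int8_t a, int8_t b) {
  return 0 - int32_t(a) * int32_t(b);
}

// mul(ext(a), ext(sub(0, b))): negate in 8 bits first, with wraparound.
// The unsigned-to-signed conversion wraps (guaranteed since C++20).
int32_t negateBeforeExt(int8_t a, int8_t b) {
  uint8_t wrapped = static_cast<uint8_t>(0u - static_cast<uint8_t>(b));
  int8_t negB = static_cast<int8_t>(wrapped);
  return int32_t(a) * int32_t(negB);
}
```

Both reduction functions return the same value for the same inputs, while `negateAfterMul(1, -128)` and `negateBeforeExt(1, -128)` differ, which is consistent with the message's statement that the fold is not valid.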

1137 Commits
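
Note on 48ce7bb038: the message describes fixing the call widening decision by only comparing costs that are valid. A minimal sketch of that idea follows, under stated assumptions: `Cost`, `Widening` and `chooseCallWidening` are hypothetical stand-ins, not LLVM's `InstructionCost` or `InstWidening`, and an invalid cost is modelled as `std::nullopt`. Letting a valid candidate replace an invalid current best is also an assumption of this sketch, not something the message states.

```cpp
#include <optional>

// Hypothetical stand-ins for the decision kinds named in the commit message.
enum class Widening { Scalarize, VectorCall, IntrinsicCall };

// Hypothetical cost type: std::nullopt models an invalid cost.
using Cost = std::optional<unsigned>;

Widening chooseCallWidening(Cost scalar, Cost vector, Cost intrinsic) {
  Widening decision = Widening::Scalarize;
  Cost best = scalar;

  // Consider a candidate only if its cost is valid, so that two invalid
  // costs are never ordered against each other.
  auto consider = [&](Cost candidate, Widening kind) {
    if (!candidate)
      return;
    if (!best || *candidate <= *best) {
      best = candidate;
      decision = kind;
    }
  };

  consider(vector, Widening::VectorCall);
  consider(intrinsic, Widening::IntrinsicCall);
  return decision;
}
```

With every cost invalid, this sketch falls back to `Widening::Scalarize`, matching the message's remark that scalarization is chosen more often after the fix.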