llvm-project

Author	SHA1	Message	Date
Sander de Smalen	a4e6f495c3	[AArch64] More accurately model cost of partial reductions (#181707 ) With #181706 using the cost-model to decide whether using partial reductions is profitable, we need to more accurately represent the cost of certain partial reduction operations: * Reflect the fact that MLALB/T instructions can be used for 16-bit -> 32-bit partial reductions (or MLAL/MLAL2 for NEON). * Calculate the cost of expanding the partial reduction in ISel for reductions that don't have an explicit instruction, rather than returning a random number. For sub-reductions we scale the cost to make them slightly cheaper, so that they're still candidates for forming cdot operations.	2026-03-30 08:49:41 +01:00
Florian Hahn	617ec39fd0	[VPlan] Add printing test for dissolving replicate regions. (#189192 ) Add VPlan printing test for https://github.com/llvm/llvm-project/pull/186252 https://github.com/llvm/llvm-project/pull/189022	2026-03-28 21:01:42 +00:00
Ramkumar Ramachandra	840e9a4ddd	[VPlan] Fix wrap-flags on WidenInduction unroll (#187710 ) Due to a somewhat recent change, IntOrFpInduction recipes have associated VPIRFlags. The VPlanUnroll logic for WidenInduction recipes predates this change, and computes incomplete wrap-flags: update it to simply use the flags on IntOrFpInduction recipes; PointerInduction recipes have no associated flags, and indeed, no flags should be used.	2026-03-27 13:26:04 +00:00
Florian Hahn	90c1c588f8	[VPlan] Don't set WrapFlags for truncated IVs. (#188966 ) The wrap flags from the IV bin-op are not guaranteed to apply to truncated inductions, which are evaluated in narrower types. Instead of dropping them late (in expandVPWidenIntOrFpInduction), do not add them at the outset, to prevent invalid transforms based on incorrect flags in the future. PR: https://github.com/llvm/llvm-project/pull/188966	2026-03-27 12:39:03 +00:00
Florian Hahn	99aa33d5b3	Reapply "[VPlan] Explicitly unroll replicate-regions without live-outs by VF." (#188947 ) This reverts commit 4562a953db9d9813a873b78144cee1df39c7e0c0. The recommit adjusts processLaneForReplicateRegion to first remap all operands, then update the new operands. This fixes a VPlan verification failure when running LV tests with expensive checks. Original message: This patch adds a new replicateReplicateRegionsByVF transform to unroll replicate-regions by VF, dissolving them. The transform creates VF copies of the replicate-region's content, connects them and converts recipes to single-scalar variants for the corresponding lanes. The initial version skips regions with live-outs (VPPredInstPHIRecipe), which will be added in follow-up patches. Depends on https://github.com/llvm/llvm-project/pull/170053 PR: https://github.com/llvm/llvm-project/pull/170212	2026-03-27 12:19:58 +00:00
Ramkumar Ramachandra	9d5684bb00	[LV] Regen iv_outside_user test with UTC (NFC) (#188934 ) To merge different CHECK prefixes to a common one.	2026-03-27 12:07:49 +00:00
Florian Hahn	849ba979bd	[VPlan] Add test showing incorrect flags on truncated inductions (NFC).	2026-03-27 11:00:26 +00:00
Florian Hahn	4562a953db	Revert "[VPlan] Explicitly unroll replicate-regions without live-outs by VF." (#188868 ) Reverts llvm/llvm-project#170212 appears to cause a failure with expensive checks: https://lab.llvm.org/buildbot/#/builders/187/builds/18306	2026-03-26 23:20:49 +00:00
Florian Hahn	cb1661b046	[VPlan] Explicitly unroll replicate-regions without live-outs by VF. (#170212 ) This patch adds a new replicateReplicateRegionsByVF transform to unroll replicate-regions by VF, dissolving them. The transform creates VF copies of the replicate-region's content, connects them and converts recipes to single-scalar variants for the corresponding lanes. The initial version skips regions with live-outs (VPPredInstPHIRecipe), which will be added in follow-up patches. Depends on https://github.com/llvm/llvm-project/pull/170053 PR: https://github.com/llvm/llvm-project/pull/170212	2026-03-26 21:35:29 +00:00
Florian Hahn	5aae014ed5	[LV] Refine tripcount estimate using minimum iteration count rt check. (#188135 ) When not folding the tail the minimum iteration count check ensures that the vector loop is not executed if computing the trip count wraps around to zero, as the trip count must be at least VF when vectorizing without tail-folding. Add and use a new tryToRefineConstantMaxTripCount helper. This ensures we do not create dead main loops when vectorizing the epilogue, as we choose smaller main VFs. PR: https://github.com/llvm/llvm-project/pull/188135	2026-03-26 20:48:53 +00:00
Ramkumar Ramachandra	76a9692254	[VPlan] Sink single-scalar replicates in licm (#187047 ) Refine the replicate bail-out in licm to permit single-scalar replicates.	2026-03-26 14:42:57 +00:00
Florian Hahn	40304d8fef	Reapply "[VPlan] Remove manual region removal when simplifying for VF and UF. (#181252 )" (#188589 ) This reverts commit e30f9c19464bcf1bf1e9f69b63884fb78ad2d05d. Re-land, now that the reported crash causing the revert has been fixed as part of 77fb84889 (#187504). Original message: Replace manual region dissolution code in simplifyBranchConditionForVFAndUF with using general removeBranchOnConst. simplifyBranchConditionForVFAndUF now just creates a (BranchOnCond true) or updates BranchOnTwoConds. The loop then gets automatically removed by running removeBranchOnConst. This removes a bunch of special logic to handle header phi replacements and CFG updates. With the new code, there's no restriction on what kind of header phi recipes the loop contains. Note that VPEVLBasedIVRecipe needs to be marked as readnone. This is technically unrelated, but I could not find an independent test that would be impacted. The code to deal with epilogue resume values now needs updating, because we may simplify a reduction directly to the start value. PR: https://github.com/llvm/llvm-project/pull/181252	2026-03-26 10:14:10 +00:00
Florian Hahn	6420dd833e	[LV] Add missing REQUIRES: asserts to new test from #188126 . Test checks debug output, and requires asserts.	2026-03-25 15:45:22 +00:00
John Brawn	5f49ce5eaf	[ARM] Consider register pressure when vectorizing with MVE (#188053 ) MVE only has 8 vector registers, so it's not too hard for the vectorizer to end up using more than that resulting in enough spilling that it's worse than not vectorizing. Enable shouldConsiderVectorizationRegPressure for targets with MVE so the vectorizer doesn't vectorize in those cases.	2026-03-25 10:46:49 +00:00
Rohit Aggarwal	d21e1a3798	[LIBM][AMDLIBM] - New vector calls for cdfnorm and round scalar calls (#187232 ) In amdlibm, new vector calls cdfnorm amd_vrd2_cdfnorm amd_vrd4_cdfnorm amd_vrd8_cdfnorm round amd_vrs16_roundf amd_vrs8_roundf amd_vrs4_roundf amd_vrd8_round amd_vrd4_round amd_vrd2_round Link to aocl repo - [aocl-libm-ose](https://github.com/amd/aocl-libm-ose)	2026-03-25 10:03:00 +00:00
Florian Hahn	86c1510418	[VPlan] Remove isVector guard in getCostForRecipeWithOpcode. (#188126 ) The legacy cost model computes and passes RHSInfo both when widening and replicating. Match behavior in VPlan-based cost model. The added test shows that we now compute the same cost as the legacy cost model. Without this change, the test added in llvm/test/Transforms/LoopVectorize/AArch64/predicated-costs.ll would crash with https://github.com/llvm/llvm-project/pull/187056. PR: https://github.com/llvm/llvm-project/pull/188126	2026-03-25 09:59:13 +00:00
David Sherwood	85e1c641eb	[LV][NFC] Remove some unused attributes from tests (#188091 ) The local_unnamed_addr and dso_local attributes add no value to any of the tests and simply increase file size, so I've removed all instances.	2026-03-24 06:52:31 +00:00
Florian Hahn	77fb848894	Reapply "[LV] Simplify and unify resume value handling for epilogue vec." (#187504 ) This reverts commit cdaf29f84dd0abbd1f961982799059c92d76625b. This version skips removeBranchOnConst when vectorizing the epilogue, as it may trigger folds that remove the resume phi used as resume value from the epilogue. This fixes https://github.com/llvm/llvm-project/issues/187323. Original message: This patch tries to drastically simplify resume value handling for the scalar loop when vectorizing the epilogue. It uses a simpler, uniform approach for updating all resume values in the scalar loop: 1. Create ResumeForEpilogue recipes for all scalar resume phis in the main loop (the epilogue plan will have exactly the same scalar resume phis, in exactly the same order) 2. Update ::execute for ResumeForEpilogue to set the underlying value when executing. This is not super clean, but allows easy lookup of the generated IR value when we update the resume phis in the epilogue. Once we connect the 2 plans together explicitly, this can be removed. 3. Use the list of ResumeForEpilogue VPInstructions from the main loop to update the resume/bypass values from the epilogue. This simplifies the code quite a bit, makes it more robust (should fix https://github.com/llvm/llvm-project/issues/179407) and also fixes a mis-compile in the existing tests (see change in llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-sub-epilogue-vec.ll, where previously we would incorrectly resume using the start value when the epilogue iteration check failed) In some cases, we get simpler code, due to additional CSE, in some cases the induction end value computations get moved from the epilogue iteration check to the vector preheader. We could try to sink the instructions as cleanup, but it is probably not worth the trouble. Fixes https://github.com/llvm/llvm-project/issues/179407. PR for recommit https://github.com/llvm/llvm-project/pull/188134	2026-03-23 22:09:40 +00:00
Florian Hahn	2f1e0d14f4	[LV] Add additional epilogue vector tests. Add additional epilogue vectorization tests for * https://github.com/llvm/llvm-project/issues/187323 * https://github.com/llvm/llvm-project/issues/185345	2026-03-23 20:44:00 +00:00
Andrei Elovikov	a9ae2fd79e	[NFC][LV] Fix what seems to be a typo in the test (#187769 ) The test was added in `4e9894498e`. Alternative fixes would be: * Remove unused GEP, although not clear why we'd want to overwrite stored `i64` with `ptr` store. * Keep this patch, but perform both GEPs with `i64` element type to reduce the diff. It's not clear if the scalarization caused by that type mismatch is intentional/relevant for the original change.	2026-03-23 17:28:32 +00:00
Alan Zhao	c624851037	[LoopVectorize] Fix an integer narrowing conversion in `getPredBlockCostDivisor(...)` (#187605 ) `LoopVectorizationCostModel::getPredBlockCostDivisor(...)` may return large `uint64_t` values that get coerced to an `unsigned` by `VPCostContext::getPredBlockCostDivisor(...)`, which can cause division by zero. Fixes #187584	2026-03-23 17:22:05 +00:00
Benjamin Maxwell	249b086545	[LV] Fix crash when extends are not widened in partial reduction matching (#187782 ) Fixes https://github.com/llvm/llvm-project/pull/185821#issuecomment-4098933551	2026-03-23 10:30:19 +00:00
Sander de Smalen	6feced2a7c	Fix select-best-vf-tripcount.ll buildbot failure This test failed on the llvm-clang-win-x-aarch64 buildbot. It seems the rounding is different, leading to a different output. Instead of: Cost for VF 4: 9 (Estimated cost per lane: 2.2) The test output on the Windows buildbot is: Cost for VF 4: 9 (Estimated cost per lane: 2.3)	2026-03-20 14:16:58 +00:00
Florian Hahn	19b0c68ee0	[VPlan] Skip epilogue vectorization if dead after narrowing IGs. (#187016 ) When narrowing interleave groups, the main vector loop processes IC iterations instead of VF * IC. Update selectEpilogueVectorizationFactor to use the effective VF, checking if the canonical IV controlling the loop now steps by UF instead of VFxUF. This avoids epilogue vectorization with dead epilogue vector loops and also prevents crashes in cases where we can prove both the epilogue and scalar loop are dead. Fixes https://github.com/llvm/llvm-project/issues/186846 PR: https://github.com/llvm/llvm-project/pull/187016	2026-03-20 12:33:16 +00:00
Ramkumar Ramachandra	1dfd268f10	[VPlan] Simplify mul x, -1 -> sub 0, x (#187551 ) Simplify exactly as InstCombine does. A follow-up would include simplifying add x, (sub 0, y) -> sub x, y. Alive2 proof: https://alive2.llvm.org/ce/z/Af7QiD	2026-03-20 12:07:51 +00:00
Ramkumar Ramachandra	b6accfa0b4	[LV] Regen induction-ptrcasts test with UTC (NFC) (#187678 )	2026-03-20 11:58:19 +00:00
Benjamin Maxwell	4b17135d14	[LV] Simplify `matchExtendedReductionOperand()` (NFCI) (#185821 ) This updates `matchExtendedReductionOperand` so the simple case of `UpdateR(PrevValue, ext(...))` is matched first as an early exit. The binop matching is then flattened to remove the extra layer of the `MatchExtends` lambda.	2026-03-20 09:29:28 +00:00
Sander de Smalen	a971089cb8	[LV] Explain why a less profitable VF was chosen (NFCI) (#187469 ) I was very puzzled the other day when it showed that VF 8 had a cost of X and VF 16 had a cost of X/2, yet it still chose VF 8. This PR adds some extra debug output to explain why this happens.	2026-03-20 07:21:17 +00:00
Florian Hahn	fd3cf1c160	[LV] Move dereferenceability check from Legal to VPlan (NFC) (#185323 ) Instead of checking dereferenceability early during LoopVectorizationLegality, defer the check to VPlan construction via areAllLoadsDereferenceable. This is in preparation for supporting early exit vectorization of non-dereferenceable loads, e.g. via speculative loads (https://discourse.llvm.org/t/rfc-provide-intrinsics-for-speculative-loads/89692) or first-faulting loads. Detection in VPlan allows easily replacing potentially non-deref loads with other loads as needed. PR: https://github.com/llvm/llvm-project/pull/185323	2026-03-19 19:21:45 +00:00
Florian Hahn	cdaf29f84d	Revert "[LV] Simplify and unify resume value handling for epilogue vec." (#187504 ) Reverts llvm/llvm-project#185969 This is suspected to cause a miscompile in 549.fotonik3d_r from SPEC 2017 FP	2026-03-19 14:38:37 +00:00
John Brawn	e8556ff6b6	[NFC] Remove fractional part of costs in maxbandwidth-regpressure.ll (#187498 ) This test is failing on the llvm-clang-x-aarch64 buildbot due to what looks like a difference in rounding behaviour when printing estimated cost per lane. Solve this by removing the fractional part, which is what we've done in the past when this has happened (e.g. commit aeb88f677).	2026-03-19 13:50:56 +00:00
John Brawn	191c84b822	[VPlan] Permit derived IV in isHeaderMask (#187360 ) When matching scalar steps of the canonical IV, also match a derived IV of the canonical IV if the derivation is essentially a no-op. Fixes a failure in the mve-reg-pressure-spills.ll test when expensive checks are enabled.	2026-03-19 12:05:07 +00:00
Koakuma	6aeeae676a	[SPARC][Tests] Add lit.local.cfg to SPARC LoopVectorize tests (#187489 )	2026-03-19 18:59:15 +07:00
Koakuma	23af867e6d	[SPARC] Add TTI implementation for getting register numbers and widths (#180660 ) Correctly inform transform passes about our registers; this prevents the issue with the `find-last` test where the loop vectorizer pass mistakenly thinks that the backend has vector capabilities and generates vector types, which causes the backend to crash. See also: https://github.com/sparclinux/issues/issues/69	2026-03-19 18:37:46 +07:00
Elvis Wang	53f8f3b017	Reland [LV] Replace remaining LogicalAnd to vp.merge in EVL optimization. (#184068 ) (#187199 ) This patch replaces the remaining LogicalAnd with vp.merge in the second pass to not break the `m_RemoveMask` pattern in the optimizeMaskToEVL. Also skip cost model comparison when the plan contains `vp_merge`, which won't be calculated by the legacy model. This can help to remove header mask for FindLast reduction (CSA) loops. Original PR: https://github.com/llvm/llvm-project/pull/184068 Original buildbot failure: https://lab.llvm.org/buildbot/#/builders/213/builds/2497	2026-03-19 07:56:42 +08:00
Florian Hahn	fce100e26e	[VPlan] Fix masked_cond expansion. masked_cond is used to combine early-exit conditions with masks from predicate. The early-exit condition should only be evaluated if the mask is true. Emit the mask first, to avoid incorrect poison propagation. Fixes https://github.com/llvm/llvm-project/issues/187061.	2026-03-18 20:26:04 +00:00
Florian Hahn	2be4a9b1b2	[LV] Add predicated early-exit tests showing poison prop issue. (NFC) Add tests showing incorrect poison propagation from https://github.com/llvm/llvm-project/issues/187061.	2026-03-18 20:00:45 +00:00
Florian Hahn	0ea2e5813f	[VPlan] Account for early-exit dispatch blocks when updating LI. (#185618 ) Now that we can vectorize loops with multiple early exits, we emit dispatch blocks after the middle block to go to a specific exit or continue in the dispatch chain. With that, we need to be a bit more careful when it comes to picking the loop the dispatch block belongs to. The dispatch block will belong to the innermost loop of all exit blocks reachable from the current block. Fixes https://github.com/llvm/llvm-project/issues/185362 PR: https://github.com/llvm/llvm-project/pull/185618	2026-03-18 18:37:34 +00:00
John Brawn	81d3f04f29	[NFC] Fix mve-reg-pressure-spills.ll test (#187316 ) This test needs a REQUIRES: asserts, as it uses -debug-only.	2026-03-18 16:48:48 +00:00
Andrei Elovikov	9b0c2a135e	[NFC] Update `LoopVectorize/predicator.ll` test (#187125 ) Align it with the style of `LoopVectorize/VPlan/predicator.ll`: * Move ascii-graphs close to IR to avoid scrolling through CHECKs when comparing the picture and actual IR * Rename `%cN` to ensure that `bbN` branches on `%cN`	2026-03-18 16:28:50 +00:00
John Brawn	a083e19efe	[VPlan] Add the cost of spills when considering register pressure (#179646 ) Currently when considering register pressure is enabled, we reject any VF that has higher pressure than the number of registers. However this can result in failing to vectorize in cases where it's beneficial, as the cost of the extra spills is less than the benefit we get from vectorizing. Deal with this by instead calculating the cost of spills and adding that to the rest of the cost, so we can detect this kind of situation and still vectorize while avoiding vectorizing in cases where the extra cost makes it not worth it.	2026-03-18 15:30:39 +00:00
Luke Lau	bf46a95f2c	[VPlan] Use target's index type for {First,Last}ActiveLane instead of i64 (#186361 ) Fixes #186005 On RV32 with zve32x, i.e. no legal 64 bit types either scalar or vector, @llvm.cttz.elts.i64 cannot be lowered and so returns an illegal cost for scalable VFs. However VPInstruction::FirstActiveLane and VPInstruction::LastActiveLane always use a hardcoded i64 type. This causes a legacy/VPlan cost model mismatch in the live-out.ll test, and in early-exit-live-out.ll prevents the scalable VF from being chosen. This PR teaches the two VPInstructions to use the target's index type, i.e. the width of a pointer in the default address space, so it will generate a 32 bit cttz.elts on RV32. This should be large enough to hold the maximum number of elements in a vector, as if the vector was any bigger it would imply it isn't accessible by memory. I considered using the canonical IV type but I don't think that will work since the canonical IV can be i64 on RV32, and it causes regressions due to extra zexting on 64-bit targets with a 32-bit IV.	2026-03-18 15:01:21 +00:00
Florian Hahn	49f9b4b44a	[LV] Add test for diff checks with ptrtoint subtract. (NFC) Adds extra test coverage for https://github.com/llvm/llvm-project/pull/180244.	2026-03-18 11:08:09 +00:00
Elvis Wang	3eb8b788b7	Revert "[LV] Replace remaining LogicalAnd to vp.merge in EVL optimization." (#187170 ) Reverts llvm/llvm-project#184068 This hit the cost model assertion in rva23 stage2 build bot. https://lab.llvm.org/buildbot/#/builders/213/builds/2497	2026-03-18 09:21:40 +08:00
Elvis Wang	52089f895e	[LV] Replace remaining LogicalAnd to vp.merge in EVL optimization. (#184068 ) This patch replaces the remaining LogicalAnd with vp.merge in the second pass to not break the `m_RemoveMask` pattern in the optimizeMaskToEVL. This can help to remove header mask for FindLast reduction (CSA) loops. PR: https://github.com/llvm/llvm-project/pull/184068	2026-03-18 08:39:27 +08:00
Elvis Wang	51b3b9b039	[LV] Optimize x && (x && y) -> x && y (#185806 ) This patch simplifies `x && (x && y)` and `x && (y && x)` to `x && y` by removing the extra logical-and. This helps to simplify mask calculation in the FindLast reduction and exposes more opportunities for replacement with EVL. PR link: https://github.com/llvm/llvm-project/pull/185806	2026-03-17 13:03:04 +08:00
Florian Hahn	013f2542a2	[LV] Simplify and unify resume value handling for epilogue vec. (#185969 ) This patch tries to drastically simplify resume value handling for the scalar loop when vectorizing the epilogue. It uses a simpler, uniform approach for updating all resume values in the scalar loop: 1. Create ResumeForEpilogue recipes for all scalar resume phis in the main loop (the epilogue plan will have exactly the same scalar resume phis, in exactly the same order) 2. Update ::execute for ResumeForEpilogue to set the underlying value when executing. This is not super clean, but allows easy lookup of the generated IR value when we update the resume phis in the epilogue. Once we connect the 2 plans together explicitly, this can be removed. 3. Use the list of ResumeForEpilogue VPInstructions from the main loop to update the resume/bypass values from the epilogue. This simplifies the code quite a bit, makes it more robust (should fix https://github.com/llvm/llvm-project/issues/179407) and also fixes a mis-compile in the existing tests (see change in llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-sub-epilogue-vec.ll, where previously we would incorrectly resume using the start value when the epilogue iteration check failed) In some cases, we get simpler code, due to additional CSE, in some cases the induction end value computations get moved from the epilogue iteration check to the vector preheader. We could try to sink the instructions as cleanup, but it is probably not worth the trouble. Fixes https://github.com/llvm/llvm-project/issues/179407.	2026-03-16 21:21:59 +00:00
Ramkumar Ramachandra	92e44b247f	Reland [VPlan] Extend interleave-group-narrowing to WidenCast (#186454 ) The patch was initially landed as bd5f9384, but then reverted due to an underlying issue in narrowInterleaveGroups, described in #185860. The issue has since been fixed. The reland is simply a conflict-resolved version of the original patch, which includes an additional test update. WidenCast is very similar to Widen recipes. Fixes #128062.	2026-03-16 12:21:48 +00:00
Luke Lau	ee4bb2cea3	[LV] Add more tests for blend masks. NFC (#186751 ) To be used in #184838	2026-03-16 09:14:35 +00:00
Eli Friedman	7bc3bb0196	[ScalarEvolution] Limit recursion in getRangeRef for PHI nodes. (#152823 ) Restrict PHI nodes that getRangeRef is allowed to recursively examine so we don't need a "visited" set. And fix createSCEVIter so it creates all the relevant SCEV nodes before getRangeRef tries to examine them. The tests that are affected have induction variables that aren't AddRecs. (Other cases are theoretically affected, but don't seem to show up in our tests.)	2026-03-13 16:27:39 -07:00
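
The narrowing problem described in c624851037 (a large `uint64_t` divisor coerced to `unsigned`, which can then be zero) can be reproduced in isolation. The sketch below is not the code from that patch: `narrowToU32` and `saturateToU32` are hypothetical helpers, `uint32_t` stands in for a 32-bit `unsigned`, and saturating is only one possible guard, assumed here for illustration.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: models the uint64_t -> 32-bit coercion.
// Unsigned conversion is modular, so the high 32 bits are simply discarded.
constexpr uint32_t narrowToU32(uint64_t V) {
  return static_cast<uint32_t>(V);
}

// Hypothetical guard (an assumption, not necessarily the fix that landed):
// clamp to the largest 32-bit value instead of truncating, so a nonzero
// input can never become a zero divisor.
constexpr uint32_t saturateToU32(uint64_t V) {
  return V > std::numeric_limits<uint32_t>::max()
             ? std::numeric_limits<uint32_t>::max()
             : static_cast<uint32_t>(V);
}

// 2^32 has no low bits set, so plain narrowing yields 0; dividing a cost by
// that value would be a division by zero.
static_assert(narrowToU32(uint64_t{1} << 32) == 0, "truncates to zero");
// The saturating variant stays nonzero for the same input.
static_assert(saturateToU32(uint64_t{1} << 32) == 0xFFFFFFFFu, "clamped");
```

Small values pass through both helpers unchanged, so a guard of this kind only affects inputs that would otherwise be truncated.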

3994 Commits