llvm-project

Author	SHA1	Message	Date
Mel Chen	f196b1d66f	[VPlan] Extract reverse operation for reverse accesses (#146525 ) This patch introduces VPInstruction::Reverse and extracts the reverse operations of loaded/stored values from reverse memory accesses. This extraction facilitates future support for permutation elimination within VPlan.	2025-12-18 14:57:48 +00:00
Florian Hahn	bab0dc4d48	Reapply "[LV] Mark checks as never succeeding for high cost cutoff." Reapply 8a115b6934a90441 with an update to tests handling remarks. The patch now directly emits a clear remark when we bail out due to the memory check threshold. Original message: When GeneratedRTChecks::create bails out due to exceeding the cost threshold, no runtime checks are generated and we must not proceed assuming checks have been generated. Mark the checks as never succeeding, to make sure we don't try to vectorize assuming the runtime checks hold. This fixes a case where we previously incorrectly vectorized assuming runtime checks had been generated when forcing vectorization via metadata. Fixes the mis-compile mentioned in https://github.com/llvm/llvm-project/pull/166247#issuecomment-3631471588	2025-12-17 20:21:49 +00:00
Luke Lau	67d0e21a62	Reapply "[VPlan] Remove legacy costing inside VPBlendRecipe::computeCost (#171846 )" (#172261 ) This reapplies #171846 with a test case and fix for a legacy cost-model mismatch assertion. In the previous version of the patch, we only considered the plan to contain simplifications when it had a VPBlendRecipe and VF.isScalar() was true. However for some VPlans we may have a blend with only the first lane used: BLEND ir<%phi> = ir<%foo.res> ir<%bar.res>/ir<%c> CLONE ir<%gep> = getelementptr ir<%p>, ir<%phi> vp<%5> = vector-pointer ir<%gep> And in the legacy cost model we cost a blend as a phi if it's uniform: // If we know that this instruction will remain uniform, check the cost of // the scalar version. if (isUniformAfterVectorization(I, VF)) VF = ElementCount::getFixed(1); So this replaces the VF.isScalar() check with vputils::onlyFirstLaneUsed, which matches how the VPlan cost model mirrored the legacy model beforehand. A VPInstruction::Select will also emit a scalar select for a vector VF if only the first lane is used, so this also updates VPBlendRecipe::computeCost to reflect that too.	2025-12-16 06:30:54 +00:00
Ramkumar Ramachandra	0636225b93	[VPlan] Directly unroll VectorPointerRecipe (#168886 ) In an effort to get rid of VPUnrollPartAccessor and directly unroll recipes, start by directly unrolling VectorPointerRecipe, allowing for VPlan-based simplifications and simplification of the corresponding execute.	2025-12-15 10:54:06 +00:00
Florian Hahn	a99a982440	[LV] Add test coverage for remark for unprofitable RT checks. Add test coverage for remark when runtime checks are not profitable with threshold provided. Also make sure that X86 remark tests actually pass an X86 triple, which is needed for the threshold remark. Also clean up the tests a bit.	2025-12-13 22:44:09 +00:00
Luke Lau	4ea8157773	Revert "[VPlan] Remove legacy costing inside VPBlendRecipe::computeCost (#171846 )" This reverts commit fd5f53aa9b21060063484fc6c346316a34a6464c. It's triggering legacy cost model assertions reported in https://github.com/llvm/llvm-project/pull/171846#issuecomment-3647640019	2025-12-13 20:05:34 +08:00
Luke Lau	fd5f53aa9b	[VPlan] Remove legacy costing inside VPBlendRecipe::computeCost (#171846 ) A VPBlendRecipe always emits selects, even when the VF is scalar. However the legacy cost model always costs all scalar non-header phis as a phi, and the VPlan cost model has to account for this. This can cause the cost to be a little off, for example not including the cost of the select in @smax_call_uniform leading to unprofitable vectorization. This removes this from the VPlan cost model and handles checks for the case in planContainsAdditionalSimplifications instead. I considered trying to make the legacy cost model more accurate but I'm not sure if it's possible. We need information as to whether or not the scalar VF we are costing is the original loop in which case it's actually a phi, or if it's a VPBlendRecipe that emits a select, potentially from a VF=1, UF>=1 VPlan.	2025-12-12 00:25:58 +08:00
Florian Hahn	de53b1a4ef	[LV] Simplify IR for gather-cost.ll, auto-generate checks. (NFC) Simplify tests and auto-generate check in preparation for further updates.	2025-12-08 19:19:51 +00:00
Florian Hahn	1054a6e9de	[SCEV] Handle non-constant start values in AddRec UDiv canonicalization. (#170474 ) Follow-up to https://github.com/llvm/llvm-project/pull/169576 to enable UDiv canonicalization if the start of the AddRec is not constant. The fold is not restricted to constant start values, as long as we are able to compute a constant remainder. The fold is only applied if the subtraction of the remainder can be folded into the start expression, but that is just to avoid creating more complex AddRecs. For reference, the proof from #169576 is https://alive2.llvm.org/ce/z/iu2tav PR: https://github.com/llvm/llvm-project/pull/170474	2025-12-03 21:13:11 +00:00
Florian Hahn	5d876093b7	[SCEV] Allow udiv canonicalization of potentially-wrapping AddRecs (#169576 ) Extend the {X,+,N}/C => {(X - X%N),+,N}/C canonicalization to handle AddRecs that may wrap, when X < N <= C and both N,C are powers of 2. The alignment and power-of-2 properties ensure division results remain equivalent for all offsets [(X - X%N), X). Alive2 Proof: https://alive2.llvm.org/ce/z/iu2tav Fixes https://github.com/llvm/llvm-project/issues/168709 PR: https://github.com/llvm/llvm-project/pull/169576	2025-12-02 14:09:53 +00:00
Florian Hahn	b76089c7f3	[VPlan] Skip uses-scalars restriction if one of ops needs broadcast. (#168246 ) Update the logic in narrowToSingleScalar to allow narrowing even if not all users use scalars, if at least one of the operands already needs broadcasting. In that case, there won't be any additional broadcasts introduced. This should allow removing the special handling for stores, which can introduce additional broadcasts currently. Fixes https://github.com/llvm/llvm-project/issues/169668. PR: https://github.com/llvm/llvm-project/pull/168246	2025-11-28 10:26:27 +00:00
Florian Hahn	f8eca64a28	Reapply "[LV] Use ExtractLane(LastActiveLane, V) live outs when tail-folding. (#149042 )" This reverts commit a6edeedbfa308876d6f2b1648729d52970bb07e6. The following fixes have landed, addressing issues causing the original revert: * https://github.com/llvm/llvm-project/pull/169298 * https://github.com/llvm/llvm-project/pull/167897 * https://github.com/llvm/llvm-project/pull/168949 Original message: Building on top of https://github.com/llvm/llvm-project/pull/148817, introduce a new abstract LastActiveLane opcode that gets lowered to Not(Mask) → FirstActiveLane(NotMask) → Sub(result, 1). When folding the tail, update all extracts for uses outside the loop to extract the value of the last active lane. See also https://github.com/llvm/llvm-project/issues/148603 PR: https://github.com/llvm/llvm-project/pull/149042	2025-11-26 20:03:55 +00:00
Florian Hahn	d58ebe339c	Revert "Reapply "[LV] Use ExtractLane(LastActiveLane, V) live outs when tail-folding. (#149042 )"" This reverts commit 72e51d389f66d9cc6b55fd74b56fbbd087672a43. Missed some test updates.	2025-11-26 19:41:39 +00:00
Florian Hahn	72e51d389f	Reapply "[LV] Use ExtractLane(LastActiveLane, V) live outs when tail-folding. (#149042 )" This reverts commit a6edeedbfa308876d6f2b1648729d52970bb07e6. The following fixes have landed, addressing issues causing the original revert: * https://github.com/llvm/llvm-project/pull/169298 * https://github.com/llvm/llvm-project/pull/167897 * https://github.com/llvm/llvm-project/pull/168949 Original message: Building on top of https://github.com/llvm/llvm-project/pull/148817, introduce a new abstract LastActiveLane opcode that gets lowered to Not(Mask) → FirstActiveLane(NotMask) → Sub(result, 1). When folding the tail, update all extracts for uses outside the loop to extract the value of the last active lane. See also https://github.com/llvm/llvm-project/issues/148603 PR: https://github.com/llvm/llvm-project/pull/149042	2025-11-26 19:31:25 +00:00
Ramkumar Ramachandra	2d4a8dadba	[VPlan] Use DL index type consistently for GEPs (#169396 ) In preparation to strip VPUnrollPartAccessor and unroll recipes directly, strip unnecessary complication in getGEPIndexTy, as the unroll part will no longer be available in follow-ups (see #168886 for instance). The patch also helps by doing a mass test update up-front. Narrowing the GEP index type conditionally does not yield any benefit, and the change is non-functional in terms of emitted assembly. While at it, avoid hard-coding address-space 0, and use the pointer operand's address space to get the GEP index type.	2025-11-26 12:25:55 +00:00
David Sherwood	c0a7b15d01	[LV][NFC] Remove remaining uses of undef in tests (#169357 ) Split off from PR #163525, this standalone patch replaces almost all the remaining cases where undef is used as value in loop vectoriser tests. This will reduce the likelihood of contributors hitting the `undef deprecator` warning in github. NOTE: The remaining use of undef in iv_outside_user.ll will be fixed in a separate PR. I've removed the test stride_undef from version-mem-access.ll, since there is already a stride_poison test.	2025-11-26 11:24:10 +00:00
Ramkumar Ramachandra	cb63e99e58	[VPlan] Include flags in VectorPointerRecipe::printRecipe (#169466 ) The change is non-functional with respect to emitted IR.	2025-11-25 10:26:51 +00:00
Ramkumar Ramachandra	c25e0d3e29	[VPlan] Simplify x + 0 -> x (#169394 )	2025-11-25 05:58:41 +00:00
Florian Hahn	48eb697441	[LV] Count cost of middle block if TC <= VF. (#168949 ) If the expected trip count is less than the VF, the vector loop will only execute a single iteration. When that's the case, the cost of the middle block has the same impact as the cost of the vector loop. Include it in isOutsideLoopWorkProfitable to avoid vectorizing when the extra work in the middle block makes it unprofitable. Note that isOutsideLoopWorkProfitable already scales the cost of blocks outside the vector region, but the patch restricts accounting for the middle block to cases where VF <= ExpectedTC, to initially catch some worst cases and avoid regressions. This initial version should specifically avoid unprofitable tail-folding for loops with low trip counts after re-applying https://github.com/llvm/llvm-project/pull/149042. PR: https://github.com/llvm/llvm-project/pull/168949	2025-11-24 19:23:04 +00:00
Ramkumar Ramachandra	299ea95747	[VPlan] Drop poison-generating flags on induction trunc (#168922 ) After truncating an integer-induction, neither nuw nor nsw hold. Fixes #168902. Co-authored-by: Florian Hahn <flo@fhahn.com>	2025-11-21 08:14:46 +00:00
Florian Hahn	a3f6c4308a	[LV] Add a low-trip count test without folding the tail. Add a low trip count test that is currently vectorized but unprofitable, for https://github.com/llvm/llvm-project/issues/167858.	2025-11-20 21:24:33 +00:00
Florian Hahn	7acfbc23a7	[VPlan] Remove PtrIV::IsScalarAfterVectorization, use VPlan analysis. (#168289 ) Remove `VPWidenPointerInductionRecipe::IsScalarAfterVectorization` and replace it with `onlyScalarValuesUsed`. This removes the need to carry state from the legacy cost model through VPlan, and the VPlan-based analysis gives more accurate results, avoiding a number of extracts. PR: https://github.com/llvm/llvm-project/pull/168289	2025-11-20 18:58:25 +00:00
Florian Hahn	827ff2c1ce	[LV] Add tests for loops with low trip counts requiring tail-folding. Add extra tests for over-eager tail-folding for tiny trip-count loops. Reduced from https://github.com/llvm/llvm-project/issues/167858.	2025-11-20 18:42:12 +00:00
Florian Hahn	7c34848ae1	[VPlan] Hoist loads with invariant addresses using noalias metadata. (#166247 ) This patch implements a transform to hoist single-scalar replicated loads with invariant addresses out of the vector loop to the preheader when scoped noalias metadata proves they cannot alias with any stores in the loop. This enables hoisting of loads we can prove do not alias any stores in the loop due to memory runtime checks added during vectorization. PR: https://github.com/llvm/llvm-project/pull/166247	2025-11-18 09:35:48 +00:00
Ramkumar Ramachandra	ef023cae38	Reland [VPlan] Expand WidenInt inductions with nuw/nsw (#168354 ) Changes: The previous patch had to be reverted due to a mismatching-OpType assert in cse. The reduced-test has now been added corresponding to a RVV pointer-induction, and the pointer-induction case has been updated to use createOverflowingBinaryOp. While at it, record VPIRFlags in VPWidenInductionRecipe.	2025-11-17 13:44:25 +00:00
Alex Bradbury	f2336d4c7e	Revert "[VPlan] Expand WidenInt inductions with nuw/nsw" (#168080 ) Reverts llvm/llvm-project#163538 This is causing build failures on the two-stage RVV buildbots. e.g. https://lab.llvm.org/buildbot/#/builders/214/builds/1363. I've shared a reproducer and more information at https://github.com/llvm/llvm-project/pull/163538#issuecomment-3533482822 This reverts commit 355e0f94af5adabe90ac57110ce1b47596afd4cd.	2025-11-14 16:11:48 +00:00
Ramkumar Ramachandra	355e0f94af	[VPlan] Expand WidenInt inductions with nuw/nsw (#163538 ) While at it, record VPIRFlags in VPWidenInductionRecipe.	2025-11-14 12:10:55 +00:00
Florian Hahn	a6edeedbfa	Revert "[LV] Use ExtractLane(LastActiveLane, V) live outs when tail-folding. (#149042 )" This reverts commit 62d1a080e69e3c5e98840e000135afa7c688a77b. This appears to be causing some runtime failures on RISCV https://lab.llvm.org/buildbot/#/builders/210/builds/5221	2025-11-13 22:34:55 +00:00
Ramkumar Ramachandra	9ba738af2c	[VPlan] Fix assert in store-user in narrowToSingleScalars (#167686 ) Follow up on c2d4c7c18b96 ([VPlan] Permit more users in narrowToSingleScalars) to fix an assert related to WidenStore users of the recipe being narrowed in narrowToSingleScalars.	2025-11-12 17:53:24 +00:00
Florian Hahn	62d1a080e6	[LV] Use ExtractLane(LastActiveLane, V) live outs when tail-folding. (#149042 ) Building on top of https://github.com/llvm/llvm-project/pull/148817, introduce a new abstract LastActiveLane opcode that gets lowered to Not(Mask) → FirstActiveLane(NotMask) → Sub(result, 1). When folding the tail, update all extracts for uses outside the loop to extract the value of the last active lane. See also https://github.com/llvm/llvm-project/issues/148603 PR: https://github.com/llvm/llvm-project/pull/149042	2025-11-12 15:11:00 +00:00
Luke Lau	bfd4155f23	[VPlan] Don't apply predication discount to non-originally-predicated blocks (#160449 ) Split off from #158690. Currently if an instruction needs predicated due to tail folding, it will also have a predicated discount applied to it in multiple places. This is likely inaccurate because we can expect a tail folded instruction to be executed on every iteration bar the last. This fixes it by checking if the instruction/block was originally predicated, and in doing so prevents vectorization with tail folding where we would have had to scalarize the memory op anyway. On llvm-test-suite this causes 4 loops in total to no longer be vectorized with -O3 on arm64-apple-darwin, and there's no observable performance impact.	2025-11-10 12:10:40 +00:00
Ramkumar Ramachandra	2d1d5fe78e	[VPlan] Simplify branch-cond with getVectorTripCount (#155604 ) Call getVectorTripCount first, and call getTripCount failing that, in simplifyBranchConditionForVFAndUF, to simplify missed cases. While at it, strip the dead check for a zero TC.	2025-11-10 10:43:37 +00:00
Florian Hahn	b0b4616790	[VPlan] Handle single-scalar conds in VPWidenSelectRecipe. (#165506 ) Generalize VPWidenSelectRecipe codegen to consider single-scalar conditions instead of just loop-invariant ones. If the condition is a single-scalar, we can simply use a scalar condition. PR: https://github.com/llvm/llvm-project/pull/165506	2025-11-05 22:11:29 +00:00
David Sherwood	7b3fe5fd42	[LV][NFC] Remove undef values in some test cases (#164401 ) Split off from PR #163525, this standalone patch replaces simple cases where undef is used as a value for arithmetic or getelementptr instructions. This will reduce the likelihood of contributors hitting the `undef deprecator` warning in github.	2025-11-05 09:18:02 +00:00
Florian Hahn	af9a4263a1	[LAA] Only use inbounds/nusw in isNoWrap if the GEP is dereferenced. (#161445 ) Update isNoWrap to only use the inbounds/nusw flags from GEPs that are guaranteed to be dereferenced on every iteration. This fixes a case where we incorrectly determine no dependence. I think the issue is isolated to code that evaluates the resulting AddRec at BTC, just using it to compute the distance between accesses should still be fine; if the access does not execute in a given iteration, there's no dependence in that iteration. But isolating the code is not straight-forward, so be conservative for now. The practical impact should be very minor (only one loop changed across a corpus with 27k modules from large C/C++ workloads). Fixes https://github.com/llvm/llvm-project/issues/160912. PR: https://github.com/llvm/llvm-project/pull/161445	2025-11-04 17:08:12 +00:00
Florian Hahn	b7e922a3da	[VPlan] Convert BuildVector with all-equal values to Broadcast. (#165826 ) Fold BuildVector where all operands are equal to Broadcast of the first operand. This will subsequently make it easier to remove additional buildvectors/broadcasts, e.g. via https://github.com/llvm/llvm-project/pull/165506. PR: https://github.com/llvm/llvm-project/pull/165826	2025-11-01 17:28:42 -07:00
Florian Hahn	317b42ef5c	[VPlan] Remove original recipe after narrowing to single-scalar. Directly remove RepOrWidenR after replacing all uses. Removing the dead user early unlocks additional opportunities for further narrowing.	2025-10-31 04:38:16 +00:00
Florian Hahn	98d3a25f74	[VPlan] Don't preserve LCSSA in expandSCEVs. (#165505 ) This follows similar reasoning as 45ce88758d24 (https://github.com/llvm/llvm-project/pull/159556): LV does not preserve LCSSA, it constructs it just before processing a loop to vectorize. Runtime check expressions are invariant to that loop, so expanding them should not break LCSSA form for the loop we are about to vectorize. LV creates SCEV and memory runtime checks early on and then disconnects the blocks temporarily. The patch fixes a mis-compile, where previously LCSSA construction during SCEV expand may replace uses in currently unreachable SCEV/memory check blocks. Fixes https://github.com/llvm/llvm-project/issues/162512 PR: https://github.com/llvm/llvm-project/pull/165505	2025-10-29 18:25:46 +00:00
paperchalice	249883d0c5	[test][Transforms] Remove unsafe-fp-math uses part 2 (NFC) (#164786 ) Post cleanup for #164534.	2025-10-23 20:31:31 +08:00
Florian Hahn	bfc322dd72	Revert "[VPlan] Run narrowInterleaveGroups during general VPlan optimizations. (#149706 )" This reverts commit 8d29d09309654541fb2861524276ada6a3ebf84c. There have been reports of mis-compiles in https://github.com/llvm/llvm-project/pull/149706. Revert while I investigate.	2025-10-22 21:27:11 +01:00
Florian Hahn	8d29d09309	[VPlan] Run narrowInterleaveGroups during general VPlan optimizations. (#149706 ) Move narrowInterleaveGroups to the general VPlan optimization stage. To do so, narrowInterleaveGroups now has to find a suitable VF where all interleave groups are consecutive and saturate the full vector width. If such a VF is found, the original VPlan is split into 2: a) a new clone which contains all VFs of Plan, except VFToOptimize, and b) the original Plan with VFToOptimize as single VF. The original Plan is then optimized. If a new copy for the other VFs has been created, it is returned and the caller has to add it to the list of candidate plans. Together with https://github.com/llvm/llvm-project/pull/149702, this allows to take the narrowed interleave groups into account when computing costs to choose the best VF and interleave count. One example where we currently miss interleaving/unrolling when narrowing interleave groups is https://godbolt.org/z/Yz77zbacz PR: https://github.com/llvm/llvm-project/pull/149706	2025-10-21 11:37:42 +01:00
David Sherwood	822c291aac	[LV][NFC] Remove undef from phi incoming values (#163762 ) Split off from PR #163525, this standalone patch replaces use of undef as incoming PHI values with zero, in order to reduce the likelihood of contributors hitting the `undef deprecator` warning in github.	2025-10-21 10:49:27 +01:00
Florian Hahn	9317975a7a	[VPlan] Match legacy behavior w.r.t. using pointer phis as scalar addrs. When the legacy cost model scalarizes loads that are used as addresses for other loads and stores, it looks to phi nodes, if they are direct address operands of loads/stores. Match this behavior in isUsedByLoadStoreAddress, to fix a divergence between legacy and VPlan-based cost model.	2025-10-20 11:09:25 +01:00
Nikita Popov	573ca36753	[IR] Replace alignment argument with attribute on masked intrinsics (#163802 ) The `masked.load`, `masked.store`, `masked.gather` and `masked.scatter` intrinsics currently accept a separate alignment immarg. Replace this with an `align` attribute on the pointer / vector of pointers argument. This is the standard representation for alignment information on intrinsics, and is already used by all other memory intrinsics. This means the signatures now match llvm.expandload, llvm.vp.load, etc. (Things like llvm.memcpy used to have a separate alignment argument as well, but were already migrated a long time ago.) It's worth noting that the masked.gather and masked.scatter intrinsics previously accepted a zero alignment to indicate the ABI type alignment of the element type. This special case is gone now: If the align attribute is omitted, the implied alignment is 1, as usual. If ABI alignment is desired, it needs to be explicitly emitted (which the IRBuilder API already requires anyway).	2025-10-20 08:50:09 +00:00
Florian Hahn	b9ce7656e9	[VPlan] Add VPInstruction to unpack vector values to scalars. (#155670 ) Add a new Unpack VPInstruction (name to be improved) to explicitly extract scalar values from vectors. Test changes are movements of the extracts: they are now generated together and also directly after the producer. Depends on https://github.com/llvm/llvm-project/pull/155102 (included in PR) PR: https://github.com/llvm/llvm-project/pull/155670	2025-10-19 18:49:05 +00:00
Nikita Popov	8fa4a1029c	[LoopVectorize] Regenerate test checks (NFC)	2025-10-16 18:21:42 +02:00
Ramkumar Ramachandra	b71515cc76	[VPlan] Extend licm to hoist assumes (#162636 ) Assumes are safe to hoist if they're guaranteed to execute, since they don't alias, and don't throw. This mirrors what the IR-LICM does.	2025-10-16 13:59:32 +00:00
David Sherwood	c48aa54656	[LV][NFC] Remove undef from function return values (#163578 ) Split off from PR #163525, this standalone patch replaces `ret * undef` returns with `ret void` in order to reduce the likelihood of contributors hitting the `undef deprecator` warning in github.	2025-10-16 09:49:38 +01:00
Florian Hahn	ba69e33e13	[LV] Consistently apply address def scalarization across loop. Consistently scalarize loads used as part of address computations across all uses in the loop. This aligns the VPlan and legacy cost model and fixes a divergence crash. It doesn't matter if the load and address users are in different blocks, as long as they are in the same loop, the scalar value can be used. This removes a number of insert/extracts.	2025-10-09 22:04:15 +01:00
Florian Hahn	98ce434870	[VPlan] Skip VPBlendRecipe in isUsedByLoadStoreAddress. VPBlendRecipes are introduced as part of if-conversion, potentially adding a def-use chain from a load used in a compare to another load/store. In the scalar IR, there is no connection via def-use chains, so the legacy cost model won't consider the load used by memory operation. Skipping blends brings the VPlan-based cost-computation in line with the legacy cost model after https://github.com/llvm/llvm-project/pull/162157.	2025-10-08 18:43:23 +01:00
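
The udiv canonicalization described in commit 5d876093b7 ({X,+,N}/C => {(X - X%N),+,N}/C when X < N <= C and both N and C are powers of 2) can be sanity-checked by brute force. The sketch below is not LLVM code: it models a wrapping 8-bit AddRec with plain `uint8_t` arithmetic, and the helper names `udivFoldHolds` and `udivFoldHoldsForAll8Bit` are hypothetical. The Alive2 proof linked in the commit remains the authoritative argument.

```cpp
#include <cassert>
#include <cstdint>

// True if V is a power of two.
static bool isPowerOf2(unsigned V) { return V != 0 && (V & (V - 1)) == 0; }

// Walks a wrapping 8-bit AddRec {X,+,N} next to the canonicalized
// {(X - X%N),+,N} and checks that dividing by C gives the same result
// at every step. 512 steps cover more than one full wrap-around.
bool udivFoldHolds(uint8_t X, uint8_t N, uint8_t C) {
  // Preconditions taken from the commit message.
  assert(isPowerOf2(N) && isPowerOf2(C) && X < N && N <= C);
  uint8_t Orig = X;
  uint8_t Canon = static_cast<uint8_t>(X - X % N);
  for (unsigned I = 0; I < 512; ++I) {
    if (Orig / C != Canon / C)
      return false;
    Orig = static_cast<uint8_t>(Orig + N);   // wraps modulo 256
    Canon = static_cast<uint8_t>(Canon + N); // wraps modulo 256
  }
  return true;
}

// Exhaustively checks every valid (X, N, C) combination for 8-bit values.
bool udivFoldHoldsForAll8Bit() {
  for (unsigned N = 1; N <= 128; N <<= 1)
    for (unsigned C = N; C <= 128; C <<= 1)
      for (unsigned X = 0; X < N; ++X)
        if (!udivFoldHolds(static_cast<uint8_t>(X), static_cast<uint8_t>(N),
                           static_cast<uint8_t>(C)))
          return false;
  return true;
}
```

Calling `udivFoldHoldsForAll8Bit()` exercises the fold for all 8-bit inputs that satisfy the preconditions; it says nothing about wider types beyond illustrating why alignment and the power-of-2 restriction matter.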
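
Commit 299ea95747 states that neither nuw nor nsw holds after truncating an integer induction. A small scalar model may help show why. The scenario is an assumption chosen for illustration and does not come from the commit: a 32-bit induction starting at 0 with step 1, truncated to 8 bits. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Models the nsw condition for an 8-bit add: true if A + B leaves the
// signed 8-bit range.
bool addWrapsSignedI8(int8_t A, int8_t B) {
  int Wide = int(A) + int(B);
  return Wide < INT8_MIN || Wide > INT8_MAX;
}

// Models the nuw condition for an 8-bit add: true if A + B leaves the
// unsigned 8-bit range.
bool addWrapsUnsignedI8(uint8_t A, uint8_t B) {
  return unsigned(A) + unsigned(B) > UINT8_MAX;
}

// The wide induction 0, 1, 2, ... never overflows 32 bits for these trip
// counts, so its increment could carry nsw. The truncated increment wraps
// in the signed sense as soon as the narrow value reaches 127.
bool truncatedIVWrapsSigned(uint32_t TripCount) {
  for (uint32_t IV = 0; IV < TripCount; ++IV) {
    uint8_t Narrow = static_cast<uint8_t>(IV); // models trunc i32 -> i8
    if (addWrapsSignedI8(static_cast<int8_t>(Narrow), 1))
      return true;
  }
  return false;
}

// Same idea for nuw: the truncated increment wraps once the narrow value
// reaches 255.
bool truncatedIVWrapsUnsigned(uint32_t TripCount) {
  for (uint32_t IV = 0; IV < TripCount; ++IV) {
    uint8_t Narrow = static_cast<uint8_t>(IV); // models trunc i32 -> i8
    if (addWrapsUnsignedI8(Narrow, 1))
      return true;
  }
  return false;
}
```

With a trip count of 200 the truncated increment already violates nsw but not nuw; with 300 it violates both, while the 32-bit induction wraps in neither case.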
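
The lowering named in commit 62d1a080e6, Not(Mask) → FirstActiveLane(NotMask) → Sub(result, 1), can be modelled on a plain boolean vector. This is a scalar sketch, not the VPlan implementation. It assumes the mask is a tail-folding prefix mask (active lanes first, inactive lanes after) and that FirstActiveLane yields the lane count when no lane is set; both assumptions are mine, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Not(Mask): flip every lane.
std::vector<bool> notMask(const std::vector<bool> &Mask) {
  std::vector<bool> Result(Mask.size());
  for (std::size_t I = 0; I < Mask.size(); ++I)
    Result[I] = !Mask[I];
  return Result;
}

// FirstActiveLane: index of the first set lane. Assumption: returns the
// number of lanes when no lane is set.
std::size_t firstActiveLane(const std::vector<bool> &Mask) {
  for (std::size_t I = 0; I < Mask.size(); ++I)
    if (Mask[I])
      return I;
  return Mask.size();
}

// LastActiveLane: the first inactive lane of the original mask, minus one.
std::size_t lastActiveLane(const std::vector<bool> &Mask) {
  // Assumption: prefix mask with at least lane 0 active.
  assert(!Mask.empty() && Mask[0]);
  return firstActiveLane(notMask(Mask)) - 1;
}
```

For a mask with three active lanes out of four, `lastActiveLane` returns 2; for a fully active mask of four lanes, the inverted mask has no set lane, so the result is 4 - 1 = 3.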

1041 Commits