1066 Commits

Author SHA1 Message Date
Benjamin Maxwell
f22a178b13
Reland "[LV] Support conditional scalar assignments of masked operations" (#180708)
This patch extends the support added in #158088 to loops where the
assignment is non-speculatable (e.g. a conditional load or divide).

For example, the following loop can now be vectorized:

```
int simple_csa_int_load(
  int* a, int* b, int default_val, int N, int threshold)
{
  int result = default_val;
  for (int i = 0; i < N; ++i)
    if (a[i] > threshold)
      result = b[i];
  return result;
}
```

It does this by extending the recurrence matching from only looking for
selects, to include phis where all operands are the header phi, except
for one which can be an arbitrary value outside the recurrence.

---

Reverts llvm/llvm-project#180275 (original PR: #178862)

Additional type legalization for `ISD::VECTOR_FIND_LAST_ACTIVE` was
added in #180290, which should resolve the backend crashes on x86.
2026-02-10 09:57:48 +00:00
Florian Hahn
3192fe2c7b
[VPlan] Fall back to legacy cost model if PtrSCEV is nullptr.
There are some cases when PtrSCEV can be nullptr. Fall back to legacy
cost model, to not call isLoopInvariant with nullptr.

Fixes a crash after 0c4f8094939d2.
2026-02-08 11:55:12 +00:00
Kewen Meng
703c2762d3
Revert "[LV] Support conditional scalar assignments of masked operations" (#180275)
Reverts llvm/llvm-project#178862 

revert to unblock bot:
https://lab.llvm.org/buildbot/#/builders/206/builds/13225
2026-02-06 13:24:40 -08:00
Benjamin Maxwell
4f90eb6427
[LV] Support conditional scalar assignments of masked operations (#178862)
This patch extends the support added in #158088 to loops where the
assignment is non-speculatable (e.g. a conditional load or divide).

For example, the following loop can now be vectorized:

```
int simple_csa_int_load(
  int* a, int* b, int default_val, int N, int threshold)
{
  int result = default_val;
  for (int i = 0; i < N; ++i)
    if (a[i] > threshold)
      result = b[i];
  return result;
}
```

It does this by extending the recurrence matching from only looking for
selects, to include phis where all operands are the header phi, except
for one which can be an arbitrary value outside the recurrence.
2026-02-06 11:43:06 +00:00
Mel Chen
8c6658aca6
[VPlan] Sink recipes from the vector loop region in licm. (#168031)
When a recipe can be safely sunk and all of its users are outside the
vector loop region in the same dedicated exit block, the recipe does not
need to be executed on every iteration.
This patch extends the VPlan-based LICM (Loop Invariant Code Motion) to
also sink such recipes from the vector loop region into the exit block.
This reduces redundant computation and improves cost model accuracy.

TODO: Support nested loop sinking
TODO: Support sinking `VPReplicateRecipe` (requires `replicateByVF`
fixes)
TODO: Support recipes with multiple defined values (e.g., interleaved
loads)
TODO: Clone recipes without users to all exit blocks
TODO: Support PHI node users by checking incoming value blocks
TODO: Support sinking when users are in multiple blocks
TODO: Clone recipes when users are on multiple exit paths

Co-authored-by: Luke Lau <luke@igalia.com>

---------

Co-authored-by: Luke Lau <luke@igalia.com>
Co-authored-by: Luke Lau <luke_lau@icloud.com>
2026-02-03 07:57:15 +00:00
Florian Hahn
90b3712d8a
Reapply "[VPlan] Detect and create partial reductions in VPlan. (NFCI) (#167851)"
This reverts commit d1e477b00b49c63ff4dd513eeb14a5b18bc055d7.

Recommit with a extra checks making sure extends are VPWidenCastRecipes,
rejecting VPReplicateRecipes.

Original message:
As a first step, move the existing partial reduction detection logic to
VPlan, trying to preserve the existing code structure & behavior as
closely as possible.

With this, partial reductions are detected and created together in a
single step.

This allows forming partial reductions and bundling them up if
profitable together in a follow-up.

PR: https://github.com/llvm/llvm-project/pull/167851
2026-02-01 16:27:27 +00:00
Mel Chen
149c76538e
[LV] Separate runtime check cost from total overhead in profitability check (#176754)
In isOutsideLoopWorkProfitable function, there are two places where only
the runtime check cost (RtC) should be used, but incorrectly included
the costs of middle blocks and early-exit blocks.
1. VectorizeMemoryCheckThreshold comparison for interleaving-only
2. Minimum trip count that bounds runtime check overhead, i.e. MinTC2
calculation
This results in an overly conservative minimum profitable trip count.

This patch separates the runtime check cost from the total overhead
cost, and uses only RtC for VectorizeMemoryCheckThreshold comparison and
the MinTC2 calculation.
2026-01-23 07:29:56 +00:00
Florian Hahn
69fbab2b72
[VPlan] Fall back to legacy cost if operands may be force-scalarized.
If any of the operands of a VPReplicateRecipe have been
force-scalarized, then the legacy cost model skips the scalarization
overhead, but we cannot match this in the VPlan cost model.

Bail out for now in those very rare cases.

Fixes https://github.com/llvm/llvm-project/issues/176720.
2026-01-19 21:25:54 +00:00
Florian Hahn
8c5352cf3e
[LV] Add additional cost and folding test coverage. (NFC) 2026-01-14 22:19:11 +00:00
Graham Hunter
2abd6d6d7a
[LV] Vectorize conditional scalar assignments (#158088)
Based on Michael Maitland's previous work:
https://github.com/llvm/llvm-project/pull/121222

This PR uses the existing recurrences code instead of introducing a
new pass just for CSA autovec. I've also made recipes that are more
generic.
2026-01-14 14:59:18 +00:00
Florian Hahn
8f182526de
[VPlan] Don't fold UDiv in replicate regions. (#175460)
The UDiv fold added in d12e993 (#174581) is currently also applied to
replicate regions, which means we may end up with VPInstructions in
replicate regions, which is currently nots supported.

Fixes https://github.com/llvm/llvm-project/issues/175295.

PR: https://github.com/llvm/llvm-project/pull/175460
2026-01-12 12:16:48 +00:00
Aiden Grossman
7450a75b93
[VPlan] Allow truncation for lanes in VPScalarIVStepsRecipe (#175268)
VPScalarIVStepsRecipe relies on APInt truncation in order to vectorize
blocks with a width greater than the maximum value the types of some of
their (changing) operands are able to hold (e.g., an i1 input with a
vector width of 4). Simply reenable implicit truncation in
ConstantInt::get() to cover this case.

Remove the helper function given it is only called in one place to
prevent accidentally using it elsewhere where we probably do not want
implicit truncation turned on.

This fixes another case that we saw after
acb78bde6fb613a9af2a604bc69fa744a8cee850 did not fix that issue, which
had the same stack trace. We still want to keep lane constants as
unsigned.

Somewhat similar to 6d1e7d4982fabc9e245897056a5425496df6a7a3.

This test case comes from a tensorflow/XLA compilation from a test case
in https://github.com/google-research/spherical-cnn.
2026-01-10 10:34:46 -05:00
Aiden Grossman
acb78bde6f
[VPlan] Use unsigned integers for lane start indices (#175231)
a83c89495ba6fe0134dcaa02372c320cc7ff0dbf caused assertion failures here
as if we have a single bit induction variable and two lanes (0 and 1),
then the second lane index (1) will be out of bounds of what a signed
1-bit integer can hold. Lane indices are always >0 according to
VPlanHelpers.h:125, and the lane representation in this code is also
unsigned.

The test case come from tensorflow/XLA.
2026-01-09 14:28:28 -08:00
David Sherwood
97ee9b66c0
[LV] Teach m_One, m_ZeroInt patterns to look through broadcasts (#170159)
In VPlanPatternMatch.h I have changed the int_pred_ty code to look
through broadcasts in order to catch more cases, i.e. multiplying by a
splat of one, etc.
2026-01-07 10:35:08 +00:00
Shih-Po Hung
39d6f10e33
[LV] Conservatively predicate SDiv/SRem (#170818)
Conservatively predicate sdiv/srem:
- RHS may carry poison in masked‑off lanes.
- RHS could be −1 while LHS has masked‑off lanes (risking INT_MIN/−1
overflow).

We’ll relax this once we can prove non‑wrap/non‑poison conditions.

Fixes #170775.
2026-01-07 04:25:38 +00:00
Ramkumar Ramachandra
d12e99376f
Reland [VPlan] Simplify pow-of-2 (mul|udiv) -> (shl|lshr) (#174581)
The original patch, landed as a2db31b0 ([VPlan] Simplify pow-of-2
(mul|udiv) -> (shl|lshr), #172477) had a critical commutative matcher
bug, which has now been fixed. An assert has also been strengthened,
following a post-commit review.
2026-01-06 20:36:26 +00:00
Alex Bradbury
5a456c17d9
Revert "[VPlan] Simplify pow-of-2 (mul|udiv) -> (shl|lshr)" (#174559)
Reverts llvm/llvm-project#172477

This is causing failures for RVA23 (including some tests running away in
their execution causing OOM, hence the builder dying). I will attempt to
follow up on the PR with a reproducer of some kind.
https://lab.llvm.org/buildbot/#/builders/210/builds/7243
2026-01-06 10:26:51 +00:00
Ramkumar Ramachandra
a2db31b06f
[VPlan] Simplify pow-of-2 (mul|udiv) -> (shl|lshr) (#172477) 2026-01-06 08:27:48 +00:00
Nikita Popov
707d18c8e7
[VPlan] Use getSigned() for index in VectorEndPointer recipe (#174426)
The stride can be negative here, so we should use getSigned().

This avoids an assertion failure with
https://github.com/llvm/llvm-project/pull/171456. It also avoids a
miscompile if the index is >64-bit, but I don't think that can happen in
practice.
2026-01-06 09:10:59 +01:00
Florian Hahn
16830b2164
[VPlan] Remove VPWidenSelectRecipe, use VPWidenRecipe instead (NFCI). (#174234)
All extra state has been removed from VPWidenSelectRecipe at this point.
There's no benefit of having a separate recipe and Select can easily be
handled by the existing VPWidenRecipe.

PR: https://github.com/llvm/llvm-project/pull/174234
2026-01-05 22:33:37 +00:00
Florian Hahn
b4d833135a
[VPlan] Handle non-free bitcasts in getCostForRecipeWithOpcode.
Update bitcast cost handling to match the legacy cost model.
2026-01-02 18:04:13 +00:00
Florian Hahn
5ee82dffc6
[VPlan] Handle addrspacecast/ptrtoaddr in VPlan-based cost model.
Also handle missing PtrToAddrs and AddrSpaceCast in
getCostForRecipeWithOpcode.

This makes sure all cast opcodes are handled, fixing a crash on loops
replicating addrspacecast and ptrtoaddrs.
2026-01-01 10:35:35 +00:00
Florian Hahn
2d60f87111
[VPlan] Only use legacy cost for instructions only used by exit conds. (#174029)
Currently we need to precompute costs for exit conditions, to match the
legacy cost, as they will get replaced by a compare against the
canonical IV (or others, like active-lane-mask or EVL based) and the
original compare will get removed.

This is not true for instructions with users other than the exit
condition. Those will remain, and we can just use the VPlan-based cost
model to get more accurate results.

This improves results in some cases, like
@test_value_in_exit_compare_chain_used_outside because the IV increment
user outside the loop is replaced by computing the final value outside
the loop.

It also fixes a crash introduced by f196b1d66ff (#146525).

PR: https://github.com/llvm/llvm-project/pull/174029
2025-12-31 13:34:54 +00:00
Florian Hahn
746eced47d
[LV] Add extra tests for computing replicating cast costs (NFC) 2025-12-30 22:08:04 +00:00
Florian Hahn
1f78f6a2d6
[LV] Check Addr in getAddressAccessSCEV in terms of SCEV expressions. (#171204)
getAddressAccessSCEV previously had some restrictive checks that limited
pointer SCEV expressions passed to TTI to GEPs with operands that must
either be invariant or marked as inductions.

As a consequence, the check rejected things like `GEP %base, (%iv + 1)`,
while the SCEV for the GEP should be as easily analyzeable as for `GEP
%base, %v`, with the only difference being the of the AddRec start
adjusted by 1.

This patch changes the code to use a SCEV-based check, limiting the
address SCEV to be loop invariant, an affine AddRec (i.e. induction ),
or an add expression of such operands or a sign-extended AddRec.

This catches all existing cases getAddressAccessSCEV caught, plus
additional ones like the cases mentioned above.

This means we pass address SCEVs in more cases, giving the backends a
better change to make informed decisions. It also unifies the decision
when to use an address SCEV between the legacy and VPlan-based cost
model.

An illustrative example of showing the impact are the gather-cost.ll
tests. Previously they were considered not profitable to vectorize
because we failed to determine that
 %gep.src_data = getelementptr inbounds [1536 x float], ptr @src_data,
                                                        i64 0, i64 %mul
has a relatively small constant stride.

There may be some rough edges in the cost models, where not passing
pointer SCEVs hid some incorrect modeling, but those issues should be
fixed in the target cost models if they surface.


PR: https://github.com/llvm/llvm-project/pull/171204
2025-12-19 22:05:27 +00:00
Mel Chen
f196b1d66f
[VPlan] Extract reverse operation for reverse accesses (#146525)
This patch introduces VPInstruction::Reverse and extracts the reverse
operations of loaded/stored values from reverse memory accesses. This
extraction facilitates future support for permutation elimination within
VPlan.
2025-12-18 14:57:48 +00:00
Florian Hahn
bab0dc4d48
Reapply "[LV] Mark checks as never succeeding for high cost cutoff."
Reapply 8a115b6934a90441 with an update to tests handling remarks.

The patch now directly emits a clear remark when we bail out
due to the memory check threshold.

Original message:
When GeneratedRTChecks::create bails out due to exceeding the cost
threshold, no runtime checks are generated and we must not proceed
assuming checks have been generated.

Mark the checks as never succeeding, to make sure we don't try to
vectorize assuming the runtime checks hold. This fixes a case where we
previously incorrectly vectorized assuming runtime checks had been
generated when forcing vectorization via metadate.

Fixes the mis-compile mentioned in
https://github.com/llvm/llvm-project/pull/166247#issuecomment-3631471588
2025-12-17 20:21:49 +00:00
Luke Lau
67d0e21a62
Reapply "[VPlan] Remove legacy costing inside VPBlendRecipe::computeCost (#171846)" (#172261)
This reapplies #171846 with a test case and fix for a legacy cost-model
mismatch assertion.

In the previous version of the patch, we only considered the plan to
contain simplifications when it had a VPBlendRecipe and VF.isScalar()
was true.

However for some VPlans we may have a blend with only the first lane
used:

    BLEND ir<%phi> = ir<%foo.res> ir<%bar.res>/ir<%c>
    CLONE ir<%gep> = getelementptr ir<%p>, ir<%phi>
    vp<%5> = vector-pointer ir<%gep>

And in the legacy cost model we cost a blend as a phi if it's uniform:

// If we know that this instruction will remain uniform, check the cost
of
    // the scalar version.
    if (isUniformAfterVectorization(I, VF))
      VF = ElementCount::getFixed(1);

So this replaces the VF.isScalar() check with
vputils::onlyFirstLaneUsed, which matches how the VPlan cost model
mirrored the legacy model beforehand.

A VPInstruction::Select will also emit a scalar select for a vector VF
if only the first lane is used, so this also updates
VPBlendRecipe::computeCost to reflect that too.
2025-12-16 06:30:54 +00:00
Ramkumar Ramachandra
0636225b93
[VPlan] Directly unroll VectorPointerRecipe (#168886)
In an effort to get rid of VPUnrollPartAccessor and directly unroll
recipes, start by directly unrolling VectorPointerRecipe, allowing for
VPlan-based simplifications and simplification of the corresponding
execute.
2025-12-15 10:54:06 +00:00
Florian Hahn
a99a982440
[LV] Add test coverage for remark for unprofitable RT checks.
Add test coverage for remark when runtime checks are not profitable with
threshold provided.

Also make sure that X86 remark tests actually passes an X86 triple,
which is needed for the threshold remark.

Also clean up the tests a bit.
2025-12-13 22:44:09 +00:00
Luke Lau
4ea8157773 Revert "[VPlan] Remove legacy costing inside VPBlendRecipe::computeCost (#171846)"
This reverts commit fd5f53aa9b21060063484fc6c346316a34a6464c.

It's triggering legacy cost model assertions reported in
https://github.com/llvm/llvm-project/pull/171846#issuecomment-3647640019
2025-12-13 20:05:34 +08:00
Luke Lau
fd5f53aa9b
[VPlan] Remove legacy costing inside VPBlendRecipe::computeCost (#171846)
A VPBlendRecipe always emits selects, even when the VF is scalar.

However the legacy cost model always costs all scalar non-header phis as
a phi, and the VPlan cost model has to account for this.

This can cause the cost to be a little off, for example not including
the cost of the select in @smax_call_uniform leading to unprofitable
vectorization.

This removes this from the VPlan cost model and handles checks for the
case in planContainsAdditionalSimplifications instead.

I considered trying to make the legacy cost model more accurate but I'm
not sure if it's possible. We need information as to whether or not the
scalar VF we are costing is the original loop in which case it's
actually a phi, or if it's a VPBlendRecipe that emits a select,
potentially from a VF=1, UF>=1 VPlan.
2025-12-12 00:25:58 +08:00
Florian Hahn
de53b1a4ef
[LV] Simplify IR for gather-cost.ll, auto-generate checks. (NFC)
Simplify tests and auto-generate check in preparation for further
updates.
2025-12-08 19:19:51 +00:00
Florian Hahn
1054a6e9de
[SCEV] Handle non-constant start values in AddRec UDiv canonicalization. (#170474)
Follow-up to https://github.com/llvm/llvm-project/pull/169576 to enable
UDiv canonicalization if the start of the AddRec is not constant.

The fold is not restricted to constant start values, as long as we are
able to compute a constant remainder. The fold is only applied if the
subtraction of the remainder can be folded into to start expression, but
that is just to avoid creating more complex AddRecs.

For reference, the proof from #169576 is
https://alive2.llvm.org/ce/z/iu2tav

PR: https://github.com/llvm/llvm-project/pull/170474
2025-12-03 21:13:11 +00:00
Florian Hahn
5d876093b7
[SCEV] Allow udiv canonicalization of potentially-wrapping AddRecs (#169576)
Extend the {X,+,N}/C => {(X - X%N),+,N}/C canonicalization to handle
AddRecs that may wrap, when X < N <= C and both N,C are powers of 2. The
alignment and power-of-2 properties ensure division results remain
equivalent for all offsets [(X - X%N), X).

Alive2 Proof: https://alive2.llvm.org/ce/z/iu2tav

Fixes https://github.com/llvm/llvm-project/issues/168709

PR: https://github.com/llvm/llvm-project/pull/169576
2025-12-02 14:09:53 +00:00
Florian Hahn
b76089c7f3
[VPlan] Skip uses-scalars restriction if one of ops needs broadcast. (#168246)
Update the logic in narrowToSingleScalar to allow narrowing even if not
all users use scalars, if at least one of the operands already needs
broadcasting.

In that case, there won't be any additional broadcasts introduced. This
should allow removing the special handling for stores, which can
introduce additional broadcasts currently.

Fixes https://github.com/llvm/llvm-project/issues/169668.

PR: https://github.com/llvm/llvm-project/pull/168246
2025-11-28 10:26:27 +00:00
Florian Hahn
f8eca64a28
Reapply "[LV] Use ExtractLane(LastActiveLane, V) live outs when tail-folding. (#149042)"
This reverts commit a6edeedbfa308876d6f2b1648729d52970bb07e6.

The following fixes have landed, addressing issues causing the original
revert:
* https://github.com/llvm/llvm-project/pull/169298
* https://github.com/llvm/llvm-project/pull/167897
* https://github.com/llvm/llvm-project/pull/168949

Original message:
Building on top of https://github.com/llvm/llvm-project/pull/148817,
introduce a new abstract LastActiveLane opcode that gets lowered to
Not(Mask) → FirstActiveLane(NotMask) → Sub(result, 1).

When folding the tail, update all extracts for uses outside the loop the
extract the value of the last actice lane.

See also https://github.com/llvm/llvm-project/issues/148603

PR: https://github.com/llvm/llvm-project/pull/149042
2025-11-26 20:03:55 +00:00
Florian Hahn
d58ebe339c
Revert "Reapply "[LV] Use ExtractLane(LastActiveLane, V) live outs when tail-folding. (#149042)""
This reverts commit 72e51d389f66d9cc6b55fd74b56fbbd087672a43.

Missed some test updates.
2025-11-26 19:41:39 +00:00
Florian Hahn
72e51d389f
Reapply "[LV] Use ExtractLane(LastActiveLane, V) live outs when tail-folding. (#149042)"
This reverts commit a6edeedbfa308876d6f2b1648729d52970bb07e6.

The following fixes have landed, addressing issues causing the original
revert:
* https://github.com/llvm/llvm-project/pull/169298
* https://github.com/llvm/llvm-project/pull/167897
* https://github.com/llvm/llvm-project/pull/168949

Original message:
Building on top of https://github.com/llvm/llvm-project/pull/148817,
introduce a new abstract LastActiveLane opcode that gets lowered to
Not(Mask) → FirstActiveLane(NotMask) → Sub(result, 1).

When folding the tail, update all extracts for uses outside the loop the
extract the value of the last actice lane.

See also https://github.com/llvm/llvm-project/issues/148603

PR: https://github.com/llvm/llvm-project/pull/149042
2025-11-26 19:31:25 +00:00
Ramkumar Ramachandra
2d4a8dadba
[VPlan] Use DL index type consistently for GEPs (#169396)
In preparation to strip VPUnrollPartAccessor and unroll recipes
directly, strip unnecessary complication in getGEPIndexTy, as the unroll
part will no longer be available in follow-ups (see #168886 for
instance). The patch also helps by doing a mass test update up-front.
Narrowing the GEP index type conditionally does not yield any benefit,
and the change is non-functional in terms of emitted assembly. While at
it, avoid hard-coding address-space 0, and use the pointer operand's
address space to get the GEP index type.
2025-11-26 12:25:55 +00:00
David Sherwood
c0a7b15d01
[LV][NFC] Remove remaining uses of undef in tests (#169357)
Split off from PR #163525, this standalone patch replaces almost all the
remaining cases where undef is used as value in loop vectoriser tests.
This will reduce the likelihood of contributors hitting the `undef
deprecator` warning in github.
NOTE: The remaining use of undef in iv_outside_user.ll will be fixed in
a separate PR.

I've removed the test stride_undef from version-mem-access.ll, since
there is already a stride_poison test.
2025-11-26 11:24:10 +00:00
Ramkumar Ramachandra
cb63e99e58
[VPlan] Include flags in VectorPointerRecipe::printRecipe (#169466)
The change is non-functional with respect to emitted IR.
2025-11-25 10:26:51 +00:00
Ramkumar Ramachandra
c25e0d3e29
[VPlan] Simplify x + 0 -> x (#169394) 2025-11-25 05:58:41 +00:00
Florian Hahn
48eb697441
[LV] Count cost of middle block if TC <= VF. (#168949)
If the expected trip count is less than the VF, the vector loop will
only execute a single iteration. When that's the case, the cost of the
middle block has the same impact as the cost of the vector loop. Include
it in isOutsideLoopWorkProfitable to avoid vectorizing when the extra
work in the middle block makes it unprofitable.

Note that isOutsideLoopWorkProfitable already scales the cost of blocks
outside the vector region, but the patch restricts accounting for the
middle block to cases where VF <= ExpectedTC, to initially catch some
worst cases and avoid regressions.

This initial version should specifically avoid unprofitable tail-folding
for loops with low trip counts after re-applying
https://github.com/llvm/llvm-project/pull/149042.

PR: https://github.com/llvm/llvm-project/pull/168949
2025-11-24 19:23:04 +00:00
Ramkumar Ramachandra
299ea95747
[VPlan] Drop poison-generating flags on induction trunc (#168922)
After truncating an integer-induction, neither nuw nor nsw hold.

Fixes #168902.

Co-authored-by: Florian Hahn <flo@fhahn.com>
2025-11-21 08:14:46 +00:00
Florian Hahn
a3f6c4308a [LV] Add test a low-trip count test without folding the tail.
Add a low trip count test that is currently vectorized but unprofitable,
for https://github.com/llvm/llvm-project/issues/167858.
2025-11-20 21:24:33 +00:00
Florian Hahn
7acfbc23a7
[VPlan] Remove PtrIV::IsScalarAfterVectorization, use VPlan analysis. (#168289)
Remove `VPWidenPointerInductionRecipe::IsScalarAfterVectorization` and
replace it with `onlyScalarValuesUsed`. This removes the need to carry
state from the legacy cost model through VPlan, and the VPlan-based
analysis gives more accurate results, avoiding a number of extracts.

PR: https://github.com/llvm/llvm-project/pull/168289
2025-11-20 18:58:25 +00:00
Florian Hahn
827ff2c1ce
[LV] Add tests for loops with low trip counts requiring tail-folding.
Add extra tests for over-eager tail-folding for tiny trip-count loops.

Reduced from https://github.com/llvm/llvm-project/issues/167858.
2025-11-20 18:42:12 +00:00
Florian Hahn
7c34848ae1
[VPlan] Hoist loads with invariant addresses using noalias metadata. (#166247)
This patch implements a transform to hoists single-scalar replicated
loads with invariant addresses out of the vector loop to the preheader
when scoped noalias metadata proves they cannot alias with any stores in
the loop.

This enables hosting of loads we can prove do not alias any stores in
the loop due to memory runtime checks added during vectorization.

PR: https://github.com/llvm/llvm-project/pull/166247
2025-11-18 09:35:48 +00:00
Ramkumar Ramachandra
ef023cae38
Reland [VPlan] Expand WidenInt inductions with nuw/nsw (#168354)
Changes: The previous patch had to be reverted to a mismatching-OpType
assert in cse. The reduced-test has now been added corresponding to a
RVV pointer-induction, and the pointer-induction case has been updated
to use createOverflowingBinaryOp.

While at it, record VPIRFlags in VPWidenInductionRecipe.
2025-11-17 13:44:25 +00:00