511 Commits

Author SHA1 Message Date
Mel Chen
f196b1d66f
[VPlan] Extract reverse operation for reverse accesses (#146525)
This patch introduces VPInstruction::Reverse and extracts the reverse
operations of loaded/stored values from reverse memory accesses. This
extraction facilitates future support for permutation elimination within
VPlan.
2025-12-18 14:57:48 +00:00
Mel Chen
e655317cf1
[LV][EVL] Add test case for checking debug info when tail folding by EVL. nfc (#172429) 2025-12-18 08:59:37 +00:00
Ramkumar Ramachandra
0636225b93
[VPlan] Directly unroll VectorPointerRecipe (#168886)
In an effort to get rid of VPUnrollPartAccessor and directly unroll
recipes, start by directly unrolling VectorPointerRecipe, allowing for
VPlan-based simplifications and simplification of the corresponding
execute.
2025-12-15 10:54:06 +00:00
Ramkumar Ramachandra
c5b90103da
[VPlan] Use nuw when computing {VF,VScale}xUF (#170710)
These quantities should never unsigned-wrap. This matches the behavior
if only VFxUF is used (and not VF): when computing both VF and VFxUF,
nuw should hold for each step separately.
2025-12-08 15:46:02 +00:00
Luke Lau
e8219e5ce8
[VPlan] Use BlockFrequencyInfo in getPredBlockCostDivisor (#158690)
In 531.deepsjeng_r from SPEC CPU 2017 there's a loop that we
unprofitably loop vectorize on RISC-V.

The loop looks something like:

```c
  for (int i = 0; i < n; i++) {
    if (x0[i] == a)
      if (x1[i] == b)
        if (x2[i] == c)
          // do stuff...
  }
```

Because it's so deeply nested the actual inner level of the loop rarely
gets executed. However we still deem it profitable to vectorize, which
due to the if-conversion means we now always execute the body.

This stems from the fact that `getPredBlockCostDivisor` currently
assumes that blocks have 50% chance of being executed as a heuristic.

We can fix this by using BlockFrequencyInfo, which gives a more accurate
estimate of the innermost block being executed 12.5% of the time. We can
then calculate the probability as `HeaderFrequency / BlockFrequency`.

Fixing the cost here gives a 7% speedup for 531.deepsjeng_r on RISC-V.

Whilst there's a lot of changes in the in-tree tests, this doesn't
affect llvm-test-suite or SPEC CPU 2017 that much:

- On armv9-a -flto -O3 there's 0.0%/0.2% more geomean loops vectorized
on llvm-test-suite/SPEC CPU 2017.
- On x86-64 -flto -O3 **with PGO** there's 0.9%/0% less geomean loops
vectorized on llvm-test-suite/SPEC CPU 2017.

Overall geomean compile time impact is 0.03% on stage1-ReleaseLTO:
https://llvm-compile-time-tracker.com/compare.php?from=9eee396c58d2e24beb93c460141170def328776d&to=32fbff48f965d03b51549fdf9bbc4ca06473b623&stat=instructions%3Au
2025-12-08 14:28:26 +00:00
Florian Hahn
24b87b8d48
[VPlan] Skip cost verification for loops with EVL gather/scatter.
The VPlan-based cost model use vp_gather/vp_scatter for gather/scatter
costs, which is different to the legacy cost model and cannot be matched
there. Don't verify the costs match for plans containing gather/scatters
with EVL.

Fixes https://github.com/llvm/llvm-project/issues/169948.
2025-11-29 22:00:30 +00:00
Florian Hahn
b76089c7f3
[VPlan] Skip uses-scalars restriction if one of ops needs broadcast. (#168246)
Update the logic in narrowToSingleScalar to allow narrowing even if not
all users use scalars, if at least one of the operands already needs
broadcasting.

In that case, there won't be any additional broadcasts introduced. This
should allow removing the special handling for stores, which can
introduce additional broadcasts currently.

Fixes https://github.com/llvm/llvm-project/issues/169668.

PR: https://github.com/llvm/llvm-project/pull/168246
2025-11-28 10:26:27 +00:00
Florian Hahn
8459508227
[VPlan] Handle scalar VPWidenPointerInd in convertToConcreteRecipes. (#169338)
In some case, VPWidenPointerInductions become only used by scalars after
legalizeAndOptimizationInducftions was already run, for example due to
some VPlan optimizations.

Move the code to scalarize VPWidenPointerInductions to a helper and use
it if needed.

This fixes a crash after #148274 in the added test case.

Fixes https://github.com/llvm/llvm-project/issues/169780
2025-11-27 21:52:15 +00:00
Luke Lau
1c7ec06b16
[VPlan] Optimize LastActiveLane to EVL - 1 (#169766)
With EVL tail folding, the LastActiveLane can be computed with EVL - 1.
This removes the need for a header mask and vfirst.m for loops with live
outs on RISC-V:

     # %bb.5:                                # %for.cond.cleanup7
    -       vsetvli zero, zero, e32, m2, ta, ma
    -       vmv.v.x v8, s1
    -       vmsleu.vv       v10, v8, v22
    -       vfirst.m        a0, v10
    -       srli    a1, a0, 63
    -       czero.nez       a0, a0, a1
    -       czero.eqz       a1, s8, a1
    -       or      a0, a0, a1
    -       addi    a0, a0, -1
    -       vsetvli zero, zero, e64, m4, ta, ma
    -       vslidedown.vx   v8, v12, a0
    +       addi    s1, s1, -1
    +       vslidedown.vx   v8, v12, s1
2025-11-27 17:03:08 +08:00
Florian Hahn
f8eca64a28
Reapply "[LV] Use ExtractLane(LastActiveLane, V) live outs when tail-folding. (#149042)"
This reverts commit a6edeedbfa308876d6f2b1648729d52970bb07e6.

The following fixes have landed, addressing issues causing the original
revert:
* https://github.com/llvm/llvm-project/pull/169298
* https://github.com/llvm/llvm-project/pull/167897
* https://github.com/llvm/llvm-project/pull/168949

Original message:
Building on top of https://github.com/llvm/llvm-project/pull/148817,
introduce a new abstract LastActiveLane opcode that gets lowered to
Not(Mask) → FirstActiveLane(NotMask) → Sub(result, 1).

When folding the tail, update all extracts for uses outside the loop the
extract the value of the last actice lane.

See also https://github.com/llvm/llvm-project/issues/148603

PR: https://github.com/llvm/llvm-project/pull/149042
2025-11-26 20:03:55 +00:00
Florian Hahn
d58ebe339c
Revert "Reapply "[LV] Use ExtractLane(LastActiveLane, V) live outs when tail-folding. (#149042)""
This reverts commit 72e51d389f66d9cc6b55fd74b56fbbd087672a43.

Missed some test updates.
2025-11-26 19:41:39 +00:00
Florian Hahn
72e51d389f
Reapply "[LV] Use ExtractLane(LastActiveLane, V) live outs when tail-folding. (#149042)"
This reverts commit a6edeedbfa308876d6f2b1648729d52970bb07e6.

The following fixes have landed, addressing issues causing the original
revert:
* https://github.com/llvm/llvm-project/pull/169298
* https://github.com/llvm/llvm-project/pull/167897
* https://github.com/llvm/llvm-project/pull/168949

Original message:
Building on top of https://github.com/llvm/llvm-project/pull/148817,
introduce a new abstract LastActiveLane opcode that gets lowered to
Not(Mask) → FirstActiveLane(NotMask) → Sub(result, 1).

When folding the tail, update all extracts for uses outside the loop the
extract the value of the last actice lane.

See also https://github.com/llvm/llvm-project/issues/148603

PR: https://github.com/llvm/llvm-project/pull/149042
2025-11-26 19:31:25 +00:00
Ramkumar Ramachandra
2d4a8dadba
[VPlan] Use DL index type consistently for GEPs (#169396)
In preparation to strip VPUnrollPartAccessor and unroll recipes
directly, strip unnecessary complication in getGEPIndexTy, as the unroll
part will no longer be available in follow-ups (see #168886 for
instance). The patch also helps by doing a mass test update up-front.
Narrowing the GEP index type conditionally does not yield any benefit,
and the change is non-functional in terms of emitted assembly. While at
it, avoid hard-coding address-space 0, and use the pointer operand's
address space to get the GEP index type.
2025-11-26 12:25:55 +00:00
Ramkumar Ramachandra
cb63e99e58
[VPlan] Include flags in VectorPointerRecipe::printRecipe (#169466)
The change is non-functional with respect to emitted IR.
2025-11-25 10:26:51 +00:00
Ramkumar Ramachandra
37f7b3128d
Reland [VPlan] Handle WidenGEP in narrowToSingleScalars (#167880)
Changes: Fix a missed update to WidenGEP::usesFirstLaneOnly, and include
reduced-case test that was previously hitting the new assert: the
underlying reason was that VPWidenGEP::usesScalars was too weak, and the
single-scalar WidenGEP was not narrowed by narrowToSingleScalarRecipes.

This allows us to strip a special case in VPWidenGEP::execute.
2025-11-24 18:11:58 +00:00
Ramkumar Ramachandra
299ea95747
[VPlan] Drop poison-generating flags on induction trunc (#168922)
After truncating an integer-induction, neither nuw nor nsw hold.

Fixes #168902.

Co-authored-by: Florian Hahn <flo@fhahn.com>
2025-11-21 08:14:46 +00:00
Florian Hahn
7c34848ae1
[VPlan] Hoist loads with invariant addresses using noalias metadata. (#166247)
This patch implements a transform to hoists single-scalar replicated
loads with invariant addresses out of the vector loop to the preheader
when scoped noalias metadata proves they cannot alias with any stores in
the loop.

This enables hosting of loads we can prove do not alias any stores in
the loop due to memory runtime checks added during vectorization.

PR: https://github.com/llvm/llvm-project/pull/166247
2025-11-18 09:35:48 +00:00
Ramkumar Ramachandra
ef023cae38
Reland [VPlan] Expand WidenInt inductions with nuw/nsw (#168354)
Changes: The previous patch had to be reverted to a mismatching-OpType
assert in cse. The reduced-test has now been added corresponding to a
RVV pointer-induction, and the pointer-induction case has been updated
to use createOverflowingBinaryOp.

While at it, record VPIRFlags in VPWidenInductionRecipe.
2025-11-17 13:44:25 +00:00
Alex Bradbury
f2336d4c7e
Revert "[VPlan] Expand WidenInt inductions with nuw/nsw" (#168080)
Reverts llvm/llvm-project#163538

This is causing build failures on the two-stage RVV buildbots. e.g.
https://lab.llvm.org/buildbot/#/builders/214/builds/1363. I've shared a
reproducer and more information at
https://github.com/llvm/llvm-project/pull/163538#issuecomment-3533482822

This reverts commit 355e0f94af5adabe90ac57110ce1b47596afd4cd.
2025-11-14 16:11:48 +00:00
Ramkumar Ramachandra
355e0f94af
[VPlan] Expand WidenInt inductions with nuw/nsw (#163538)
While at it, record VPIRFlags in VPWidenInductionRecipe.
2025-11-14 12:10:55 +00:00
Luke Lau
851f8f7984
[VPlan] Disable partial reductions again with EVL tail folding (#167863)
VPPartialReductionRecipe doesn't yet support an EVL variant, and we
guard against this by not calling convertToAbstractRecipes when we're
tail folding with EVL.

However recently some things got shuffled around which means we may
detect some scaled reductions in collectScaledReductions and store them
in ScaledReductionMap, where outside of convertToAbstractRecipes we may
look them up and start e.g. adding a scale factor to an otherwise
regular VPReductionPHI.

This fixes it by skipping collectScaledReductions, and fixes #167861
2025-11-14 06:30:12 +00:00
Florian Hahn
a6edeedbfa Revert "[LV] Use ExtractLane(LastActiveLane, V) live outs when tail-folding. (#149042)"
This reverts commit 62d1a080e69e3c5e98840e000135afa7c688a77b.

This appears to be causing some runtime failures on RISCV
https://lab.llvm.org/buildbot/#/builders/210/builds/5221
2025-11-13 22:34:55 +00:00
Luke Lau
c0f7d51e8a
[VPlan] Simplify ExplicitVectorLength(%AVL) -> %AVL when AVL <= VF (#167647)
[`llvm.experimental.get.vector.length`](https://llvm.org/docs/LangRef.html#id2399)
has the property that if the AVL (%cnt) is less than or equal to VF
(%max_lanes) then the return value is just AVL.

This patch uses SCEV to simplify this in optimizeForVFAndUF, and adds
`ExplicitVectorLength` to
`VPInstruction::opcodeMayReadOrWriteFromMemory` so it gets removed once
dead.
2025-11-13 13:17:01 +00:00
Florian Hahn
62d1a080e6
[LV] Use ExtractLane(LastActiveLane, V) live outs when tail-folding. (#149042)
Building on top of https://github.com/llvm/llvm-project/pull/148817,
introduce a new abstract LastActiveLane opcode that gets lowered to
Not(Mask) → FirstActiveLane(NotMask) → Sub(result, 1).

When folding the tail, update all extracts for uses outside the loop the
extract the value of the last actice lane.

See also https://github.com/llvm/llvm-project/issues/148603

PR: https://github.com/llvm/llvm-project/pull/149042
2025-11-12 15:11:00 +00:00
Luke Lau
02c68b3ef7
[VPlan] Plumb scalable register size through narrowInterleaveGroups (#167505)
On RISC-V narrowInterleaveGroups doesn't kick in because the wrong
VectorRegWidth is passed to isConsecutiveInterleaveGroup.

narrowInterleaveGroups is always passed the RGK_FixedWidthVector
register size, but on RISC-V the RGK_ScalableVector size is twice as
large because we want to use LMUL 2. This causes the `GroupSize ==
VectorRegWidth` check to fail.

This fixes it by using the scalable register size whenever the VF is
scalable and plumbing it through as a potentially scalable TypeSize.

Note that this only makes a difference when tail folding is disabled, as
narrowInterleaveGroups can't handle EVL based IVs yet.
2025-11-12 11:14:53 +00:00
Mel Chen
68a4af6acc
[LV][EVL] Replace VPInstruction::Select with vp.merge for predicated div/rem (#154072)
Since div/rem operations don’t support a mask operand, the lanes of the
divisor that are masked out are currently replaced with 1 using
VPInstruction::Select before the predicated div/rem operation.
This patch replaces
```
  VPInstruction::Select(logical_and(header_mask, conditional_mask), LHS, RHS)
```
with
```
  vp.merge(conditional_mask, LHS, RHS, EVL)
```
so that the header mask can be replaced by EVL in this usage scenario
when tail folding with EVL.
2025-11-12 08:03:57 +00:00
Ramkumar Ramachandra
c8c328406c
Revert "[VPlan] Handle WidenGEP in narrowToSingleScalars" (#167509)
This reverts commit fdd52f5fe130fb8b98f4aed3d15aa0789cce6b40, as it
causes buildbot failures. This will give us time to investigate the
failure.

https://lab.llvm.org/buildbot/#/builders/210/builds/5160
2025-11-11 14:29:28 +00:00
Ramkumar Ramachandra
fdd52f5fe1
[VPlan] Handle WidenGEP in narrowToSingleScalars (#166740)
This allows us to strip a special case in VPWidenGEP::execute.
2025-11-11 10:33:55 +00:00
Ramkumar Ramachandra
c2d4c7c18b
[VPlan] Permit more users in narrowToSingleScalars (#166559)
narrowToSingleScalarRecipes can permit users that are WidenStore, or a
VPInstruction that has a suitable opcode. This is a generalization and
extension of the existing code.
2025-11-10 17:03:14 +00:00
Ramkumar Ramachandra
2d1d5fe78e
[VPlan] Simplify branch-cond with getVectorTripCount (#155604)
Call getVectorTripCount first, and call getTripCount failing that, in
simplifyBranchConditionForVFAndUF, to simplify missed cases. While at
it, strip the dead check for a zero TC.
2025-11-10 10:43:37 +00:00
Luke Lau
bac427a0f6
[VPlan] Remove no-longer-needed EVL VPlan debug output tests. NFC (#166158)
These VPlan debug output tests were added in
https://github.com/llvm/llvm-project/pull/108351 and
https://github.com/llvm/llvm-project/pull/110412, whenever we used to
convert regular widening recipes to VP intrinsics during EVL tail
folding.

Nowadays we don't convert these recipes so there's nothing really to be
gained from testing them. This removes the VPlan tests since an upcoming
patch slightly perturbs these VPlans and removing them seems easier than
manually going through and updating them all.

I've kept behind the LLVM IR/UTC counterparts in
`tail-folding-{cast,call}-intrinsics.ll`, since even though they also
aren't really testing anything useful at least they're easy to update.
2025-11-07 16:34:10 +08:00
Florian Hahn
b0b4616790
[VPlan] Handle single-scalar conds in VPWidenSelectRecipe. (#165506)
Generalize VPWidenSelectRecipe codegen to consider single-scalar
conditions instead of just loop-invariant ones.

If the condition is a single-scalar, we can simply use a scalar
condition.

PR: https://github.com/llvm/llvm-project/pull/165506
2025-11-05 22:11:29 +00:00
Florian Hahn
af9a4263a1
[LAA] Only use inbounds/nusw in isNoWrap if the GEP is dereferenced. (#161445)
Update isNoWrap to only use the inbounds/nusw flags from GEPs that are
guaranteed to be dereferenced on every iteration. This fixes a case
where we incorrectly determine no dependence.

I think the issue is isolated to code that evaluates the resulting
AddRec at BTC, just using it to compute the distance between accesses
should still be fine; if the access does not execute in a given
iteration, there's no dependence in that iteration. But isolating the
code is not straight-forward, so be conservative for now. The practical
impact should be very minor (only one loop changed across a corpus with
27k modules from large C/C++ workloads.

Fixes https://github.com/llvm/llvm-project/issues/160912.

PR: https://github.com/llvm/llvm-project/pull/161445
2025-11-04 17:08:12 +00:00
Sam Tebbs
6b19a546aa
[LV] Bundle partial reductions inside VPExpressionRecipe (#147302)
This PR bundles partial reductions inside the VPExpressionRecipe class.

Stacked PRs:
1. https://github.com/llvm/llvm-project/pull/147026
2. https://github.com/llvm/llvm-project/pull/147255
3. https://github.com/llvm/llvm-project/pull/156976
4. https://github.com/llvm/llvm-project/pull/160154
5. -> https://github.com/llvm/llvm-project/pull/147302
6. https://github.com/llvm/llvm-project/pull/162503
7. https://github.com/llvm/llvm-project/pull/147513
2025-10-23 11:18:55 +00:00
Florian Hahn
35b9f20449
[LV] Check for TruncInsts in canTruncateToMinimalBitwidth.
TruncInst must truncate at most to their destination. Return false if
MinBWs contains a destination size > the trunc result type size.

Fixes https://github.com/llvm/llvm-project/issues/162688.
2025-10-20 22:31:16 +01:00
Nikita Popov
573ca36753
[IR] Replace alignment argument with attribute on masked intrinsics (#163802)
The `masked.load`, `masked.store`, `masked.gather` and `masked.scatter`
intrinsics currently accept a separate alignment immarg. Replace this
with an `align` attribute on the pointer / vector of pointers argument.

This is the standard representation for alignment information on
intrinsics, and is already used by all other memory intrinsics. This
means the signatures now match llvm.expandload, llvm.vp.load, etc.
(Things like llvm.memcpy used to have a separate alignment argument as
well, but were already migrated a long time ago.)

It's worth noting that the masked.gather and masked.scatter intrinsics
previously accepted a zero alignment to indicate the ABI type alignment
of the element type. This special case is gone now: If the align
attribute is omitted, the implied alignment is 1, as usual. If ABI
alignment is desired, it needs to be explicitly emitted (which the
IRBuilder API already requires anyway).
2025-10-20 08:50:09 +00:00
Florian Hahn
b9ce7656e9
[VPlan] Add VPInstruction to unpack vector values to scalars. (#155670)
Add a new Unpack VPInstruction (name to be improved) to explicitly
extract scalars values from vectors.

Test changes are movements of the extracts: they are no generated
together and also directly after the producer.

Depends on https://github.com/llvm/llvm-project/pull/155102 (included in
PR)

PR: https://github.com/llvm/llvm-project/pull/155670
2025-10-19 18:49:05 +00:00
Nikita Popov
8fa4a1029c [LoopVectorize] Regenerate test checks (NFC) 2025-10-16 18:21:42 +02:00
Florian Hahn
4bf5ab4f9d
[VPlan] Set flags when constructing truncs using VPWidenCastRecipe.
VPWidenCastRecipes with Trunc opcodes where missing the correct OpType
for IR flags. Update createWidenCast to set the correct flags for
truncs, and use it consistenly.

Fixes https://github.com/llvm/llvm-project/issues/162374.
2025-10-12 14:01:12 +01:00
Ramkumar Ramachandra
2a02d57efb
[IR] Mark vector intrinsics speculatable (#162334)
The vector intrinsics in question have no undefined behavior, and have
no other effect besides returning the result: they should hence be
marked speculatable.
2025-10-09 09:41:59 +01:00
Florian Hahn
8907b6d393
[VPlan] Remove original loop blocks if dead. (#155497)
Build on top of https://github.com/llvm/llvm-project/pull/154510 to
completely remove the blocks of dead scalar loops.

Depends on https://github.com/llvm/llvm-project/pull/154510. 

PR: https://github.com/llvm/llvm-project/pull/155497
2025-10-01 16:53:59 +00:00
Florian Hahn
41a2dfc0d7
[VPlan] Allow multiple users of (broadcast %evl).
CSE may replace multiple redundant broadcasts of EVL with a single
broadcast which may have more than 1 user. Adjust the verifier to allow
this.

Fixes a crash when building llvm-test-suite with EVL:
https://lab.llvm.org/buildbot/#/builders/210/builds/3303
2025-09-27 21:47:54 +01:00
Florian Hahn
8460dbb450
[VPlan] Mark VPInstruction::Broadcast as not reading/writing memory.
This enables additional DCE/CSE opportunities and ensures that we don't
end up with multiple redundant users of a VPInstruction using EVL. It
fixes a verifier error in the added test_3_inductions test.
2025-09-27 20:48:42 +01:00
Florian Hahn
78af056137
[VPlan] Run CSE closer to VPlan::execute. (#160572)
Additional CSE opportunities are exposed after converting to concrete
recipes/dissolving regions and materializing various expressions. Run
CSE later, to capitalize on some of the late opportunities.

PR: https://github.com/llvm/llvm-project/pull/160572
2025-09-26 09:38:58 +00:00
Luke Lau
aa6a33ae65
[LV] Remove EVLIndVarSimplify pass (#160454)
Initially this was needed to replace the fixed-step canonical IV with
the variable-step EVL IV, but this was eventually superseded by the loop
vectorizer doing this transform itself in #147222. The pass was then
removed from the RISC-V pipeline in #151483 and the loop vectorizer
stopped emitting the metadata used by the pass in #155760, so now
there's no users of it.
2025-09-25 12:18:20 +08:00
Shih-Po Hung
0d22f8344a
[LV][EVL] Remove metadata on EVL vectorized loops (#155760)
This patch  removes the metadata emission for EVL‑vectorized loops,
since there is no current in-tree consumer: 
   1) after VPlan performs canonical IV replacement #147222 and 
2) RISCV dropped EVLIndVarSimplifyPass #151483, which was the only user
of this metadata.
2025-09-23 07:39:33 +08:00
Florian Hahn
addfdb5273
[LV] Skip select cost for invariant divisors in legacy cost model.
For UDiv/SDiv with invariant divisors, the created selects will be
hoisted out. Don't compute their cost for each iteration, to match the
more accurate VPlan-based cost modeling.

Fixes https://github.com/llvm/llvm-project/issues/159402.
2025-09-21 15:08:50 +01:00
Florian Hahn
50b9ca4dda
[VPlan] Simplify Plan's entry in removeBranchOnConst. (#154510)
After https://github.com/llvm/llvm-project/pull/153643, there may be a
BranchOnCond with constant condition in the entry block.

Simplify those in removeBranchOnConst. This removes a number of
redundant conditional branch from entry blocks.

In some cases, it may also make the original scalar loop unreachable,
because we know it will never execute. In that case, we need to remove
the loop from LoopInfo, because all unreachable blocks may dominate each
other, making LoopInfo invalid. In those cases, we can also completely
remove the loop, for which I'll share a follow-up patch.

Depends on https://github.com/llvm/llvm-project/pull/153643.

PR: https://github.com/llvm/llvm-project/pull/154510
2025-09-18 19:25:05 +01:00
Sander de Smalen
17e008db17
[IR] NFC: Remove 'experimental' from partial.reduce.add intrinsic (#158637)
The partial reduction intrinsics are no longer experimental, because
they've been used in production for a while and are unlikely to change.
2025-09-17 11:44:47 +01:00
Ramkumar Ramachandra
46fcece2a8
[VPlan] Extend CSE to eliminate GEPs (#156699)
The motivation for this patch is to close the gap between the
VPlan-based CSE and the legacy CSE, to make it easier to remove the
legacy CSE. Before this patch, stubbing out the legacy CSE leads to 22
test failures, and after this patch, there are only 12 failures, and all
of them seem to have a single root cause:
VPlanTransforms::createInterleaveGroups() and
VPInterleaveGroup::execute(). The improvements from this patch are of
course welcome.

While developing the patch, a miscompile was found when GEP
source-element-types differ, and this has been fixed.

Co-authored-by: Florian Hahn <flo@fhahn.com>
Co-authored-by: Luke Lau <luke@igalia.com>
2025-09-16 10:14:32 +00:00