52 Commits

Author SHA1 Message Date
Florian Hahn
db85babddd
[VPlan] Use m_Intrinsic to match assumes/noalias_scope_decl (NFC).
Use pattern matching to check for intrinsics to slightly simplify code.
2025-11-27 18:50:34 +00:00
Florian Hahn
f8eca64a28
Reapply "[LV] Use ExtractLane(LastActiveLane, V) live outs when tail-folding. (#149042)"
This reverts commit a6edeedbfa308876d6f2b1648729d52970bb07e6.

The following fixes have landed, addressing issues causing the original
revert:
* https://github.com/llvm/llvm-project/pull/169298
* https://github.com/llvm/llvm-project/pull/167897
* https://github.com/llvm/llvm-project/pull/168949

Original message:
Building on top of https://github.com/llvm/llvm-project/pull/148817,
introduce a new abstract LastActiveLane opcode that gets lowered to
Not(Mask) → FirstActiveLane(NotMask) → Sub(result, 1).

When folding the tail, update all extracts for uses outside the loop the
extract the value of the last actice lane.

See also https://github.com/llvm/llvm-project/issues/148603

PR: https://github.com/llvm/llvm-project/pull/149042
2025-11-26 20:03:55 +00:00
Florian Hahn
d58ebe339c
Revert "Reapply "[LV] Use ExtractLane(LastActiveLane, V) live outs when tail-folding. (#149042)""
This reverts commit 72e51d389f66d9cc6b55fd74b56fbbd087672a43.

Missed some test updates.
2025-11-26 19:41:39 +00:00
Florian Hahn
72e51d389f
Reapply "[LV] Use ExtractLane(LastActiveLane, V) live outs when tail-folding. (#149042)"
This reverts commit a6edeedbfa308876d6f2b1648729d52970bb07e6.

The following fixes have landed, addressing issues causing the original
revert:
* https://github.com/llvm/llvm-project/pull/169298
* https://github.com/llvm/llvm-project/pull/167897
* https://github.com/llvm/llvm-project/pull/168949

Original message:
Building on top of https://github.com/llvm/llvm-project/pull/148817,
introduce a new abstract LastActiveLane opcode that gets lowered to
Not(Mask) → FirstActiveLane(NotMask) → Sub(result, 1).

When folding the tail, update all extracts for uses outside the loop the
extract the value of the last actice lane.

See also https://github.com/llvm/llvm-project/issues/148603

PR: https://github.com/llvm/llvm-project/pull/149042
2025-11-26 19:31:25 +00:00
Florian Hahn
091aece72b
[VPlan] Remove redundant transferFlags call from replicateByVF (NFC).
Flags are now passed on construction/cloning. Remove unnecessary
transferFlags call, and make code independent of VPRecipeWithIRFlags, to
support additional recipes in the future.
2025-11-25 20:57:42 +00:00
Florian Hahn
2befda2225
[VPlan] Populate and use VPIRFlags from initial VPInstruction. (#168450)
Update VPlan to populate VPIRFlags during VPInstruction construction and
use it when creating widened recipes, instead of constructing VPIRFlags
from the underlying IR instruction each time. The VPRecipeWithIRFlags
constructor taking an underlying instruction and setting the flags based
on it has been removed.

This centralizes initial VPIRFlags creation and ensures flags are
consistently available throughout VPlan transformations and makes sure
we don't accidentally re-add flags from the underlying instruction that
already got dropped during transformations.

Follow-up to https://github.com/llvm/llvm-project/pull/167253, which did
the same for VPIRMetadata.

Should be NFC w.r.t. to the generated IR.

PR: https://github.com/llvm/llvm-project/pull/168450
2025-11-18 15:15:14 +00:00
Florian Hahn
a6edeedbfa Revert "[LV] Use ExtractLane(LastActiveLane, V) live outs when tail-folding. (#149042)"
This reverts commit 62d1a080e69e3c5e98840e000135afa7c688a77b.

This appears to be causing some runtime failures on RISCV
https://lab.llvm.org/buildbot/#/builders/210/builds/5221
2025-11-13 22:34:55 +00:00
Florian Hahn
62d1a080e6
[LV] Use ExtractLane(LastActiveLane, V) live outs when tail-folding. (#149042)
Building on top of https://github.com/llvm/llvm-project/pull/148817,
introduce a new abstract LastActiveLane opcode that gets lowered to
Not(Mask) → FirstActiveLane(NotMask) → Sub(result, 1).

When folding the tail, update all extracts for uses outside the loop the
extract the value of the last actice lane.

See also https://github.com/llvm/llvm-project/issues/148603

PR: https://github.com/llvm/llvm-project/pull/149042
2025-11-12 15:11:00 +00:00
Ramkumar Ramachandra
eab44600fb
[VPlan] Rename onlyFirst(Lane|Part)Used (NFC) (#166562)
Rename onlyFirst(Lane|Part)Used to usesFirst(Lane|Part)Only, in line
with usesScalars, for clarity.
2025-11-06 10:07:58 +00:00
Florian Hahn
6e83937f39
[VPlan] Add getConstantInt helpers for constant int creation (NFC).
Add getConstantInt helper methods to VPlan to simplify the common
pattern of creating constant integer live-ins.

Suggested as follow-up in
https://github.com/llvm/llvm-project/pull/164127.
2025-11-01 04:13:01 +00:00
Florian Hahn
a943132761
[VPlan] Add VPRegionBlock::getCanonicalIVType (NFC). (#164127)
Split off from https://github.com/llvm/llvm-project/pull/156262.

Similar to VPRegionBlock::getCanonicalIV, add helper to get the type of
the canonical IV, in preparation for removing VPCanonicalIVPHIRecipe.

PR: https://github.com/llvm/llvm-project/pull/164127
2025-10-31 20:05:02 -07:00
Florian Hahn
b9ce7656e9
[VPlan] Add VPInstruction to unpack vector values to scalars. (#155670)
Add a new Unpack VPInstruction (name to be improved) to explicitly
extract scalars values from vectors.

Test changes are movements of the extracts: they are no generated
together and also directly after the producer.

Depends on https://github.com/llvm/llvm-project/pull/155102 (included in
PR)

PR: https://github.com/llvm/llvm-project/pull/155670
2025-10-19 18:49:05 +00:00
Florian Hahn
4f23767852
[VPlan] Add m_FirstActiveLane matcher (NFC).
Add m_FirstActiveLane, to slightly simplify pattern matching in
preparation for https://github.com/llvm/llvm-project/pull/149042.
2025-10-15 18:55:26 +01:00
Florian Hahn
861519327a
[VPlan] Move getCanonicalIV to VPRegionBlock (NFC). (#163020)
The canonical IV is tied to region blocks; move getCanonicalIV there and
update all users.

PR: https://github.com/llvm/llvm-project/pull/163020
2025-10-15 12:48:35 +01:00
Ramkumar Ramachandra
869c76dda3
[VPlan] Allow zero-operand m_BranchOn(Cond|Count) (NFC) (#162721) 2025-10-13 08:50:09 +01:00
Ramkumar Ramachandra
f68f3b9a7e
[VPlan] Allow zero-operand m_VPInstruction (NFC) (#159550) 2025-09-18 12:40:31 +01:00
Ramkumar Ramachandra
148a83543b
[LV] Introduce m_One and improve (0|1)-match (NFC) (#157419) 2025-09-15 10:34:06 +00:00
Florian Hahn
b8eaceb39b
[VPlan] Explicitly replicate VPInstructions by VF. (#155102)
Extend replicateByVF added in #142433 (aa240293190) to also explicitly
unroll replicating VPInstructions.

Now the only remaining case where we replicate for all lanes is
VPReplicateRecipes in replicate regions.

PR: https://github.com/llvm/llvm-project/pull/155102
2025-09-12 17:06:26 +01:00
Florian Hahn
1efa997317
[VPlan] Handle stores to single-scalar addr in narrowToSingleScalars.
Move handling of stores to single-scalar/uniform address from
replicateByVF to narrowToSingleScalar.
2025-09-10 21:58:29 +01:00
Ramkumar Ramachandra
1e0e0e0a56
[VPlan] Improve style around container-inserts (NFC) (#155174) 2025-08-26 14:12:59 +01:00
Florian Hahn
9f87cd68a4
[VPlan] Add m_ExtractLastElement matcher. (NFC) 2025-08-23 21:21:03 +01:00
Florian Hahn
d67dba5e88
[VPlan] Check Def2LaneDefs first in cloneForLane. (NFC)
If we have entries in Def2LaneDefs, we always have to use it. Move the
check before.

Otherwise we may not pick the correct operand, e.g. if Op was a
replicate recipe that got single-scalar after replicating it.

Fixes https://github.com/llvm/llvm-project/issues/154330.
2025-08-21 11:34:49 +01:00
Florian Hahn
7e9989390d
[VPlan] Materialize Build(Struct)Vectors for VPReplicateRecipes. (NFCI) (#151487)
Materialze Build(Struct)Vectors explicitly for VPRecplicateRecipes, to
serve their users requiring a vector, instead of doing so when unrolling
by VF.

Now we only need to implicitly build vectors in VPTransformState::get
for VPInstructions. Once they are also unrolled by VF we can remove the
code-path alltogether.

PR: https://github.com/llvm/llvm-project/pull/151487
2025-08-18 20:49:42 +01:00
Luke Lau
aea82a780a
[VPlan] Remove some getCanonicalIV() uses. NFC (#152969)
A lot of time getCanonicalIV() is used to get the canonical IV type,
e.g. to instantiate a VPTypeAnalysis or to get the LLVMContext.

However VPTypeAnalysis has a constructor that takes the VPlan directly
and there's a method on VPlan to get the LLVMContext directly, so use
those instead where possible.

This lets us remove a constructor on VPTypeAnalysis.

Also remove an unused LLVMContext argument in UnrollState whilst we're
here.
2025-08-11 18:12:05 +08:00
Luke Lau
94a6cd464e
[VPlan] Expand VPWidenPointerInductionRecipe into separate recipes (#148274)
This is the VPWidenPointerInductionRecipe equivalent of #118638, with
the motivation of allowing us to use the EVL as the induction step.

There is a new VPInstruction added, WidePtrAdd to allow adding the step
vector to the induction phi, since VPInstruction::PtrAdd only handles
scalars or multiple scalar lanes.

Originally this transformation was copied from the original recipe's
execute code, but it's since been simplifed by teaching
`unrollWidenInductionByUF` to unroll the recipe, which brings it inline
with VPWidenIntOrFpInductionRecipe.
2025-08-05 16:54:02 +08:00
Florian Hahn
80c43b6c07
[VPlan] Add ExtractLane VPInst to extract across multiple parts. (#148817)
This patch adds a new ExtractLane VPInstruction which extracts across
multiple parts using a wide index, to be used in combination with
FirstActiveLane.

The patch updates early-exit codegen to use it instead ExtractElement,
which is only per-part. With this change, interleaving should work
correctly with early-exit loops.

The patch removes the restrictions added in 6f43754e9 (#145877), but
does not yet automatically select interleave counts > 1 for early-exit
loops.

I'll share a patch as follow-up. The cost of extracting a lane adds
non-trivial overhead in the exit block, so that should be considered
when picking the interleave count.

PR: https://github.com/llvm/llvm-project/pull/148817
2025-07-27 08:08:25 +01:00
James Y Knight
093afed969
[VPlan] Fix miscompile after PR #142433. (#147398)
This fixes a bug introduced by aa2402931908317f5cc19b164ef17c5a74f2ae67,
"[VPlan] Unroll VPReplicateRecipe by VF", which cloned a
VPReplicateRecipe without transferring the flags from the original.

That can cause incorrect nsw/nuw flags to be emitted on the new
instructions, which may result in miscompiles.

It turns out there were no test-cases in the repo which end up hitting
the situation where the recipe requires instruction clones to have
different flags from the underlying instruction. The existing tests
covered the flags being correct when the replacement instruction is a
vectorized version of the initial instruction, but not when it required
clones. A new test is added covering this.
2025-07-08 14:51:27 -04:00
Florian Hahn
20fbbd7675
[LV] Add support for cmp reductions with decreasing IVs. (#140451)
Similar to FindLastIV, add FindFirstIVSMin to support select (icmp(), x, y)
reductions where one of x or y is a decreasing induction, producing a SMin
 reduction. It uses signed max as sentinel value.

PR: https://github.com/llvm/llvm-project/pull/140451
2025-06-29 11:17:03 +01:00
Florian Hahn
1949536494
[VPlan] Also visit VPBBs outside loop region when unrolling by VF.
Make sure all VPBBs outside the top-level loop region and directly
inside the region are visited; all those blocks may contain
VPReplicateRecipes that need unrolling.

This makes sure we unroll VPRepicateRecipes by VF if they are hoisted
out of the loop, but cannot be converted to single scalar recipes yet.
2025-06-28 19:02:22 +01:00
Florian Hahn
ec62dee703
[VPlan] Handle FirstActiveLane when unrolling. (#145394)
Currently FirstActiveLane is not handled correctly during
 unrolling. This is currently causing mis-compiles when
 vectorizing early-exit loops with interleaving forced.

This patch updates handling of FirstActiveLane to be analogous to
computing final reduction results: during unrolling, the created copies
for its original operand are added as additional operands, and
FirstActiveLane will always produce the index of the first active lane
across all unrolled iterations.

Note that some of the generated code is still incorrect, as we also need
to handle ExtractElement with FirstActiveLane operands. I will share
patches for those soon as well.

PR: https://github.com/llvm/llvm-project/pull/145394
2025-06-27 08:44:57 +01:00
Florian Hahn
5b76cdba5a
[VPlan] Handle AnyOf when unrolling. (#145340)
Currently AnyOf is not handled correctly during unrolling. This is
currently causing mis-compiles when vectorizing early-exit loops with
interleaving forced (even though selectInterleaveCount will currently
only pick IC = 1, unless forced by the user).

This patch updates handling of AnyOf to be analogous to computing final
reduction results: during unrolling, the created copies for its original
operand are added as additional operands, and AnyOf will always produce
the reduced value across all unrolled iterations.

Note that the generated code is still incorrect, as we also need to
handle FirstActiveLane and ExtractElement with FirstActiveLane operands.
I will share patches for those soon as well.

PR: https://github.com/llvm/llvm-project/pull/145340
2025-06-26 14:19:38 +01:00
Florian Hahn
aa24029319
[VPlan] Unroll VPReplicateRecipe by VF. (#142433)
Explicitly unroll VPReplicateRecipes outside replicate regions by VF,
replacing them by VF single-scalar recipes. Extracts for operands are
added as needed and the scalar results are combined to a vector using a
new BuildVector VPInstruction.

It also adds a few folds to simplify unnecessary extracts/BuildVectors.

It also adds a BuildStructVector opcode for handling of calls that have
struct return types.

VPReplicateRecipe in replicate regions can will be unrolled as follow
up, turing non-single-scalar VPReplicateRecipes into 'abstract', i.e.
not executable.

PR: https://github.com/llvm/llvm-project/pull/142433
2025-06-26 11:19:09 +01:00
Florian Hahn
c4c2d777f4
[VPlan] Fix handling of ReductionStartVector for rdxs when unrolling.
Update handling of ReductionStartVector in VPlanUnroll for partial
reductions. The new code makes sure all parts are properly set to the
cloned ReductionStartVector.

Fixes a mis-compile reported for
https://github.com/llvm/llvm-project/pull/142290.
2025-06-19 13:26:19 +01:00
Florian Hahn
f68848015f
[VPlan] Manage Sentinel value for FindLastIV in VPlan. (#142291)
Similar to modeling the start value as operand, also model the sentinel
value as operand explicitly. This makes all require information for
code-gen available directly in VPlan.

PR: https://github.com/llvm/llvm-project/pull/142291
2025-06-13 19:17:01 +01:00
Florian Hahn
6108d50aed
[VPlan] Add ReductionStartVector VPInstruction. (#142290)
Add a new VPInstruction::ReductionStartVector opcode to create the start
values for wide reductions. This more accurately models the start value
creation in VPlan and simplifies VPReductionPHIRecipe::execute. Down the
line it also allows removing VPReductionPHIRecipe::RdxDesc.

PR: https://github.com/llvm/llvm-project/pull/142290
2025-06-09 20:59:12 +01:00
Florian Hahn
5520ab3d50
[VPlan] Add ComputeAnyOfResult VPInstruction (NFC) (#141932)
Add a dedicated opcode for any-of reduction, similar to
https://github.com/llvm/llvm-project/pull/132689 and
https://github.com/llvm/llvm-project/pull/132690.

The patch also explictly adds the start value to not require
RecurrenceDescriptor during execute. It also allows freezing the start
value to make it poison-safe.

PR: https://github.com/llvm/llvm-project/pull/141932
2025-06-03 14:33:53 +01:00
Florian Hahn
c0506a11f4
[VPlan] Separate out logic to manage IR flags to VPIRFlags (NFC). (#140621)
This patch moves the logic to manage IR flags to a separate VPIRFlags
class. For now, VPRecipeWithIRFlags is the only class that inherits
VPIRFlags. The new class allows for simpler passing of flags when
constructing recipes, simplifying the constructors for various recipes
(VPInstruction in particular, which now just has 2 constructors, one
taking an extra VPIRFlags argument.

This mirrors the approach taken for VPIRMetadata and makes it easier to
extend in the future. The patch also adds a unified flagsValidForOpcode
to check if the flags in a VPIRFlags match the provided opcode.

PR: https://github.com/llvm/llvm-project/pull/140621
2025-05-25 11:13:11 +01:00
Florian Hahn
df21288247
[VPlan] Replace ExtractFromEnd with Extract(Last|Penultimate)Element (NFC). (#137030)
ExtractFromEnd only has 2 uses, extracting the last and penultimate
elements. Replace it with 2 separate opcodes, removing the need to
materialize and handle a constant argument.

PR: https://github.com/llvm/llvm-project/pull/137030
2025-04-25 16:27:29 +01:00
Florian Hahn
54b33eba16
[VPlan] Add opcode to create step for wide inductions. (#119284)
This patch adds a WideIVStep opcode that can be used to create a vector
with the steps to increment a wide induction. The opcode has 2 operands
* the vector step
* the scale of the vector step

The opcode is later converted into a sequence of recipes that convert
the scale and step to the target type, if needed, and then multiply
vector step by scale.

This simplifies code that needs to materialize step vectors, e.g.
replacing wide IVs as follow up to
https://github.com/llvm/llvm-project/pull/108378 with an increment of
the wide IV step.

PR: https://github.com/llvm/llvm-project/pull/119284
2025-04-14 23:20:44 +02:00
Florian Hahn
8ddbc01295
[VPlan] Manage FindLastIV start value in ComputeFindLastIVResult (NFC) (#132690)
Keep the start value as operand of ComputeFindLastIVResult. A follow-up
patch will use this to make sure the start value is frozen if needed.

Depends on https://github.com/llvm/llvm-project/pull/132689

PR: https://github.com/llvm/llvm-project/pull/132690
2025-03-27 18:34:13 +00:00
Florian Hahn
420c056f85
[VPlan] Add ComputeFindLastIVResult opcode (NFC). (#132689)
This moves the logic for computing the FindLastIV reduction result to
its own opcode. A follow-up patch will update the new opcode to also
take the start value, to fix
https://github.com/llvm/llvm-project/issues/126836.

PR: https://github.com/llvm/llvm-project/pull/132689
2025-03-26 10:49:09 +00:00
Luke Lau
a4dc02c0e7
[VPlan] Rename VPReverseVectorPointerRecipe to VPVectorEndPointerRecipe. NFC (#131086)
After #128718 lands there will be two ways of performing a reversed
widened memory access, either by performing a consecutive unit-stride
access and a reverse, or a strided access with a negative stride.

Even though both produce a reversed vector, only the former needs
VPReverseVectorPointerRecipe which computes a pointer to the last
element of each part. A strided reverse still needs a pointer to the
first element of each part so it will use VPVectorPointerRecipe.

This renames VPReverseVectorPointerRecipe to VPVectorEndPointerRecipe to
clarify that a reversed access may not necessarily need a pointer to the
last element.
2025-03-19 00:09:15 +08:00
Luke Lau
5e54c92314
[VPlan] Fix crash when unrolling in-loop reduction chains (#129840)
If an in-loop reduction is chained e.g.

    WIDEN-REDUCTION-PHI ir<%rdx> = phi ir<0>, ir<%add2>
    REDUCE ir<%add1> = ir<%rdx> + reduce.add (ir<%x>)
    REDUCE ir<%add2> = ir<%add1> + reduce.add (ir<%y>)

When we try to unroll the second add reduction, we crash because we
currently expect the chain to be a VPReductionPHIRecipe, when in fact
it's the previous reduction. This relaxes the cast to a dyn_cast, so we
end up unrolling to:

    WIDEN-REDUCTION-PHI ir<%rdx> = phi ir<0>, ir<%add2>
    WIDEN-REDUCTION-PHI ir<%rdx>.1 = phi ir<0>, ir<%add2>.1, ir<1>
    WIDEN-REDUCTION-PHI ir<%rdx>.2 = phi ir<0>, ir<%add2>.2, ir<2>
    WIDEN-REDUCTION-PHI ir<%rdx>.3 = phi ir<0>, ir<%add2>.3, ir<3>
    REDUCE ir<%add1> = ir<%rdx> + reduce.add (ir<%x>)
    REDUCE ir<%add1>.1 = ir<%rdx>.1 + reduce.add (ir<%x>.1)
    REDUCE ir<%add1>.2 = ir<%rdx>.2 + reduce.add (ir<%x>.2)
    REDUCE ir<%add1>.3 = ir<%rdx>.3 + reduce.add (ir<%x>.3)
    REDUCE ir<%add2> = ir<%add1> + reduce.add (ir<%y>)
    REDUCE ir<%add2>.1 = ir<%add1>.1 + reduce.add (ir<%y>.1)
    REDUCE ir<%add2>.2 = ir<%add1>.2 + reduce.add (ir<%y>.2)
    REDUCE ir<%add2>.3 = ir<%add1>.3 + reduce.add (ir<%y>.3)

This fixes a crash when building 525.x264_r from SPEC CPU 2017 on
AArch64 with -mllvm -prefer-inloop-reductions
2025-03-05 19:13:23 +08:00
Florian Hahn
6c8f41d336
[VPlan] Hook IR blocks into VPlan during skeleton creation (NFC) (#114292)
As a first step to move towards modeling the full skeleton in VPlan,
start by wrapping IR blocks created during legacy skeleton creation in
VPIRBasicBlocks and hook them into the VPlan. This means the skeleton
CFG is represented in VPlan, just before execute. This allows moving
parts of skeleton creation into recipes in the VPBBs gradually.

Note that this allows retiring some manual DT updates, as this will be
handled automatically during VPlan execution.

PR: https://github.com/llvm/llvm-project/pull/114292
2024-12-12 15:58:16 +00:00
Florian Hahn
4f7f71b7bc
[VPlan] Compare APInt instead of getSExtValue to fix crash in unroll.
getSExtValue assumes the result fits in 64 bits, but this may not be the
case for indcutions with wider types. Instead, directly perform the
compare on the APInt for the ConstantInt.

Fixes https://github.com/llvm/llvm-project/issues/118850.
2024-12-06 16:28:49 +00:00
Kazu Hirata
2c0f463b25
[Vectorize] Simplify code with DenseMap::operator[] (NFC) (#115635) 2024-11-10 07:24:47 -08:00
Kazu Hirata
aa825b74af
[Vectorize] Remove unused includes (NFC) (#114643)
Identified with misc-include-cleaner.
2024-11-03 08:58:51 -08:00
Florian Hahn
b021464d35
[VPlan] Introduce scalar loop header in plan, remove VPLiveOut. (#109975)
Update VPlan to include the scalar loop header. This allows retiring
VPLiveOut, as the remaining live-outs can now be handled by adding
operands to the wrapped phis in the scalar loop header.

Note that the current version only includes the scalar loop header, no
other loop blocks and also does not wrap it in a region block.

PR: https://github.com/llvm/llvm-project/pull/109975
2024-10-31 21:36:44 +01:00
Shih-Po Hung
266ff98cba
[LV][VPlan] Use VF VPValue in VPVectorPointerRecipe (#110974)
Refactors VPVectorPointerRecipe to use the VF VPValue to obtain the
runtime VF, similar to #95305.

Since only reverse vector pointers require the runtime VF, the patch
sets VPUnrollPart::PartOpIndex to 1 for vector pointers and 2 for
reverse vector pointers. As a result, the generation of reverse vector
pointers is moved into a separate recipe.
2024-10-26 23:18:50 +08:00
Florian Hahn
21ac5c8661
[VPlan] Remove duplicated ExtractFromEnd handling from unoll (NFC).
ExtractFromEnd is already handled earlier, remove duplicated code.
2024-09-26 11:38:45 +01:00