83 Commits

Author SHA1 Message Date
Florian Hahn
3c5b05427d
[VPlan] Pass underlying instr to getMemoryOpCost in ::computeCost.
Pass underlying instruction to getMemoryOpCost in
VPReplicateRecipe::computeCost if UsedByLoadStoreAddress is true.
Some targets use the underlying instruction to improve costs,
and this is needed to match the legacy cost model.

Fixes https://github.com/llvm/llvm-project/issues/177780.
Fixes https://github.com/llvm/llvm-project/issues/177772.
2026-02-08 16:15:39 +00:00
Ramkumar Ramachandra
d12e99376f
Reland [VPlan] Simplify pow-of-2 (mul|udiv) -> (shl|lshr) (#174581)
The original patch, landed as a2db31b0 ([VPlan] Simplify pow-of-2
(mul|udiv) -> (shl|lshr), #172477) had a critical commutative matcher
bug, which has now been fixed. An assert has also been strengthened,
following a post-commit review.
2026-01-06 20:36:26 +00:00
Alex Bradbury
5a456c17d9
Revert "[VPlan] Simplify pow-of-2 (mul|udiv) -> (shl|lshr)" (#174559)
Reverts llvm/llvm-project#172477

This is causing failures for RVA23 (including some tests running away in
their execution causing OOM, hence the builder dying). I will attempt to
follow up on the PR with a reproducer of some kind.
https://lab.llvm.org/buildbot/#/builders/210/builds/7243
2026-01-06 10:26:51 +00:00
Ramkumar Ramachandra
a2db31b06f
[VPlan] Simplify pow-of-2 (mul|udiv) -> (shl|lshr) (#172477) 2026-01-06 08:27:48 +00:00
Florian Hahn
12ec050b9b
[LV] Remove some unnecessary uses of poison from tests. 2025-10-17 21:20:44 +01:00
Florian Hahn
8907b6d393
[VPlan] Remove original loop blocks if dead. (#155497)
Build on top of https://github.com/llvm/llvm-project/pull/154510 to
completely remove the blocks of dead scalar loops.

Depends on https://github.com/llvm/llvm-project/pull/154510. 

PR: https://github.com/llvm/llvm-project/pull/155497
2025-10-01 16:53:59 +00:00
Florian Hahn
50b9ca4dda
[VPlan] Simplify Plan's entry in removeBranchOnConst. (#154510)
After https://github.com/llvm/llvm-project/pull/153643, there may be a
BranchOnCond with constant condition in the entry block.

Simplify those in removeBranchOnConst. This removes a number of
redundant conditional branch from entry blocks.

In some cases, it may also make the original scalar loop unreachable,
because we know it will never execute. In that case, we need to remove
the loop from LoopInfo, because all unreachable blocks may dominate each
other, making LoopInfo invalid. In those cases, we can also completely
remove the loop, for which I'll share a follow-up patch.

Depends on https://github.com/llvm/llvm-project/pull/153643.

PR: https://github.com/llvm/llvm-project/pull/154510
2025-09-18 19:25:05 +01:00
Florian Hahn
351d398a37
[VPlan] Run final VPlan simplifications before codegen.
Dissolving the hierarchical VPlan CFG and converting abstract to
concrete recipes can expose additional simplification opportunities.

Do a final run of simplifyRecipes before executing the VPlan.
2025-08-16 18:54:27 +01:00
Florian Hahn
86813aa786
[VPlan] Add dedicated user for resume phi with epilogue vectorization.
Epilogue vectorization currently relies on the resume phi for the
canonical induction being always available, which is why VPPhi are
considered to have side-effects, to prevent their removal.

This patch adds a new ResumeForEpilogue opcode to mark the resume phi as
used for epilogue vectorization. This allows treating VPPhis in general
as not having side-effects, enabling removal of unused VPPhis.
2025-08-10 21:21:16 +01:00
Nikita Popov
c23b4fbdbb
[IR] Remove size argument from lifetime intrinsics (#150248)
Now that #149310 has restricted lifetime intrinsics to only work on
allocas, we can also drop the explicit size argument. Instead, the size
is implied by the alloca.

This removes the ability to only mark a prefix of an alloca alive/dead.
We never used that capability, so we should remove the need to handle
that possibility everywhere (though many key places, including stack
coloring, did not actually respect this).
2025-08-08 11:09:34 +02:00
Florian Hahn
25d1285eec
[VPlan] Replace single-entry VPPhis with their incoming values.
Replace trivial, single-entry VPPhis with their incoming values,
2025-08-06 20:03:31 +01:00
Florian Hahn
c9dd14d1d4
[VPlan] Compute interleave count for VPlan. (#149702)
Move selectInterleaveCount to LoopVectorizationPlanner and retrieve some
information directly from VPlan. Register pressure was already computed
for a VPlan, and with this patch we now also check for reductions
directly on VPlan, as well as checking how many load and store
operations remain in the loop.

This should be mostly NFC, but we may compute slightly different
interleave counts, except for some edge cases, e.g. where dead loads
have been removed. This shouldn't happen in practice, and the patch
doesn't cause changes across a large test corpus on AArch64.

Computing the interleave count based on VPlan allows for making better
decisions in presence of VPlan optimizations, for example when
operations on interleave groups are narrowed.

Note that there are a few test changes for tests that were still
checking the legacy cost-model output when it was computed in
selectInterleaveCount.

PR: https://github.com/llvm/llvm-project/pull/149702
2025-08-05 09:42:55 +01:00
Florian Hahn
89ae085859
[VPlan] Remove VPVectorPointer for part 0 after unrolling. (#149735)
VPVectorPointer for part 0 is just the pointer operand. Simplify it
after unrolling. This removes a large number of redundant GEPs with
index 0.

PR: https://github.com/llvm/llvm-project/pull/149735
2025-07-27 13:53:26 +01:00
Florian Hahn
fa3ec0c17c
[VPlan] Materialize constant vector trip counts before final opts. (#142309)
Materialize constant vector trip counts before ::execute, if the trip
count can be computed as Original (TC / (VF * UF)) * (VF * UF). For now
this excludes when the tail is folded or scalar epilogues are required.

This enables removing a number of redundant branches from the middle
block.

For now this is also only done when not vectorizing the epilogue, as the
simplification complicates stitching the 2 plans together.

PR: https://github.com/llvm/llvm-project/pull/142309
2025-07-26 17:16:36 +01:00
Ramkumar Ramachandra
cf1f116f78
[VPlan] Introduce constant folder in simplifyRecipe (#125365)
Introduce a VPlan-level constant folder in simplifyRecipe that tries to
fold a recipe to a constant using TargetFolder.
2025-05-20 14:16:01 +01:00
Florian Hahn
07c085af3e
[VPlan] Add narrowToSingleScalarRecipe transform. (#139150)
Add a new convertToUniformRecipes transform which uses VPlan-based
uniformity analysis to determine if wide recipes and replicate recipes
can be converted to uniform recipes.

There are a few places where we ad-hoc convert recipes to uniform
recipes, which this transform will eventually replace. There are a few
more generalizations required to do so which I plan to do as follow-ups.

By converting the recipes to uniform recipes, we effectively materialize
the information from the VPlan-based analysis.

Note that there is one regression at the moment in SystemZ/pr47665.ll
due to trivial constant folding opportunities in the input IR. This will
be fixed by VPlan-based constant folding
(https://github.com/llvm/llvm-project/pull/125365/)

PR: https://github.com/llvm/llvm-project/pull/139150
2025-05-18 09:32:27 +01:00
Björn Pettersson
092b6e73e6
[InstCombine] Handle "add like" in ADD+GEP->GEP+GEP rewrites (#135156)
Considering that "or disjoint" is the canonical for certain add
operations, then I think we want to support such "add like" operations
when doing ADD+GEP->GEP+GEP rewrites to make things more consistent.

Problem was found when improving ValueTracking, which turned an ADD into
OR, and then suddenly optimizations got worse due to these rewrites no
longer triggering.
2025-04-14 17:11:13 +02:00
Florian Hahn
5550d30228
[VPlan] Check captured operand when simplifying redundant OR.
Follow-up to 0f607f to actually use the captured operand X instead of Y.
2025-04-13 13:23:27 +01:00
Florian Hahn
0f607f3df5
[VPlan] Simplify 'or x, true' -> true.
Add additional OR simplification to fix a divergence between legacy and
VPlan-based cost model.

This adds a new m_AllOnes matcher by generalizing specific_intval to
int_pred_ty, which takes a predicate to check to support matching both
specific APInts and other APInt predices, like isAllOnes.

Fixes https://github.com/llvm/llvm-project/issues/131359.
2025-04-13 12:09:40 +01:00
Florian Hahn
5fbd0658a0
[VPlan] Add initial CFG simplification, removing BranchOnCond true. (#106748)
Add an initial CFG simplification transform, which removes the dead
edges for blocks terminated with BranchOnCond true.

At the moment, this removes the edge between middle block and scalar
preheader when folding the tail.

PR: https://github.com/llvm/llvm-project/pull/106748
2025-04-04 15:44:26 +01:00
Hari Limaye
bf5627c85e
[LV] Optimize VPWidenIntOrFpInductionRecipe for known TC (#118828)
Optimize the IR generated for a VPWidenIntOrFpInductionRecipe to use the
narrowest type necessary, when the trip-count of a loop is known to be
constant and the only use of the recipe is the condition used by the
vector loop's backedge branch.
2025-03-28 14:47:40 +00:00
Nikita Popov
29441e4f5f
[IR] Convert from nocapture to captures(none) (#123181)
This PR removes the old `nocapture` attribute, replacing it with the new
`captures` attribute introduced in #116990. This change is
intended to be essentially NFC, replacing existing uses of `nocapture`
with `captures(none)` without adding any new analysis capabilities.
Making use of non-`none` values is left for a followup.

Some notes:
* `nocapture` will be upgraded to `captures(none)` by the bitcode
   reader.
* `nocapture` will also be upgraded by the textual IR reader. This is to
   make it easier to use old IR files and somewhat reduce the test churn in
   this PR.
* Helper APIs like `doesNotCapture()` will check for `captures(none)`.
* MLIR import will convert `captures(none)` into an `llvm.nocapture`
   attribute. The representation in the LLVM IR dialect should be updated
   separately.
2025-01-29 16:56:47 +01:00
Florian Hahn
f48884ded8
[VPlan] Remove loop region in optimizeForVFAndUF. (#108378)
Update optimizeForVFAndUF to completely remove the vector loop region
when possible. At the moment, we cannot remove the region if it contains

* widened IVs: the recipe is needed to generate the step vector
* reductions: ComputeReductionResults requires the reduction phi recipe
for codegen.

Both cases can be addressed by more explicit modeling.

The patch also includes a number of updates to allow executing VPlans
without a vector loop region.

Depends on https://github.com/llvm/llvm-project/pull/110004
2025-01-05 15:50:42 +00:00
Florian Hahn
7f3428d3ed
[VPlan] Compute induction end values in VPlan. (#112145)
Use createDerivedIV to compute IV end values directly in VPlan, instead
of creating them up-front.

This allows updating IV users outside the loop as follow-up.

Depends on https://github.com/llvm/llvm-project/pull/110004 and
https://github.com/llvm/llvm-project/pull/109975.

PR: https://github.com/llvm/llvm-project/pull/112145
2024-12-29 19:05:08 +00:00
Florian Hahn
82821254f5
[LV] Use IVUpdateMayOverflow to set HasNUW. (#111758)
If IVUpdateMayOverflow is false, we proved that the induction increment
cannot overflow in the vector loop. This allows setting NUW in some
cases when folding the tail.

PR: https://github.com/llvm/llvm-project/pull/111758
2024-11-28 10:12:41 +00:00
David Sherwood
3097c60928
[LoopVectorize][NFC] Rewrite tests to check output of vplan cost model (#113697)
Currently it's very difficult to improve the cost model for tail-folded
loops because as soon as you add a VPInstruction::computeCost function
that adds the costs of instructions such as
VPInstruction::ActiveLaneMask
and VPInstruction::ExplicitVectorLength the assert in
LoopVectorizationPlanner::computeBestVF fails for some tests. This is
because the VF chosen by the legacy cost model doesn't match the vplan
cost model. See PR #90191. This assert is currently making it difficult
to improve the cost model.

Hopefully we will be in a position to remove the assert soon, however
in order to do that we have to fix up a whole bunch of tests that rely
upon the legacy cost model output. I've tried my best to update
these tests to use vplan output instead.

There is still work needed for the VF=1 case because the vplan cost
model is not printed out in this case. I've not attempted to fix those
in this patch.
2024-11-19 08:55:39 +00:00
Paul Walker
38fffa630e
[LLVM][IR] Use splat syntax when printing Constant[Data]Vector. (#112548) 2024-11-06 11:53:33 +00:00
Florian Hahn
4eb9838409
[VPlan] Generalize VPValue::isDefinedOutsideLoopRegions.
Update isDefinedOutsideLoopRegions to check if a recipe is defined
outside any region. Split off already approved
https://github.com/llvm/llvm-project/pull/95842 now that this can be
tested separately after landing VPlan-based LICM
https://github.com/llvm/llvm-project/issues/107501
2024-09-20 15:34:00 +01:00
Florian Hahn
a861ed411a
[VPlan] Add initial loop-invariant code motion transform. (#107894)
Add initial transform to move out loop-invariant recipes.

This also helps to fix a divergence between legacy and VPlan-based cost
model due to legacy using ScalarEvolution::isLoopInvariant in some
cases.

Fixes https://github.com/llvm/llvm-project/issues/107501.

PR: https://github.com/llvm/llvm-project/pull/107894
2024-09-20 11:22:03 +01:00
Florian Hahn
ea83e1c05a
[LV] Assign cost to all interleave members when not interleaving.
At the moment, the full cost of all interleave group members is assigned
to the instruction at the group's insert position, even if the decision
was to not form an interleave group.

This can lead to inaccurate cost estimates, e.g. if the instruction at
the insert position is dead. If the decision is to not vectorize but
scalarize or scather/gather, then the cost will be to total cost for all
members. In those cases, assign individual the cost per member, to more
closely reflect to choice per instruction.

This fixes a divergence between legacy and VPlan-based cost model.

Fixes https://github.com/llvm/llvm-project/issues/108098.
2024-09-11 21:04:34 +01:00
Florian Hahn
bb60dd391f
[VPlan] Only use force-target-instruction-cost for recipes with insts.
To match the behavior of the legacy cost model, only apply
-force-target-instruction-cost to recipes with underlying instructions
for now, as only original IR instructions are considered by the legacy
cost model.

This fixes a difference between legacy and VPlan based cost model,
triggering the verification assertion, reported by @JonPsson1.
2024-07-23 21:05:10 +01:00
Florian Hahn
c8c0b18b5d
[LV] Update tests to not have dead interleave groups.
Update existing tests with dead interleave groups by adding users. This
ensures the tests keep testing what they were intended to test with a
planned change to skip unused instructions in cost computations.
2024-07-21 14:03:40 +01:00
Florian Hahn
b8741cc185
[VPlan] Relax assertion retrieving a scalar from VPTransformState::get.
The current assertion VPTransformState::get when retrieving a single
scalar only does not account for cases where a def has multiple users,
some demanding all scalar lanes, some demanding only a single scalar.

For an example, see the modified test case. Relax the assertion by also
allowing requesting scalar lanes only when the Def doesn't have only its
first lane used.

Fixes https://github.com/llvm/llvm-project/issues/88849.
2024-07-19 11:33:57 +01:00
Florian Hahn
17f98baf70
[LV] Add test with users both demanding all lanes and first-lane-only.
Add a test case where scalar steps are used by both a VPReplicateRecipe
(demands all scalar lanes) and a VPInstruction that only demands the
first lane.

Test case for https://github.com/llvm/llvm-project/issues/88849.
2024-07-19 10:29:43 +01:00
Florian Hahn
9a5a8731e7
[VPlan] Introduce ResumePhi VPInstruction, use to create phi for FOR. (#94760)
This patch introduces a new ResumePhi VPInstruction which creates a phi
in a leaf block of a VPlan. The first use is to create the phi node for
fixed-order recurrence resume values in the scalar preheader.

The VPInstruction takes 2 operands: 1) the incoming value from the
middle-block and a default value to be used for all other incoming
blocks.

In follow-up changes, it will also be used to create phis for reduction
and induction resume values.

Depends on https://github.com/llvm/llvm-project/pull/92651

PR: https://github.com/llvm/llvm-project/pull/94760
2024-07-11 16:08:04 +01:00
Florian Hahn
29b8b72117
[LV] Move check if any vector insts will be generated to VPlan. (#96622)
This patch moves the check if any vector instructions will be generated
from getInstructionCost to be based on VPlan. This simplifies
getInstructionCost, is more accurate as we check the final result and
also allows us to exit early once we visit a recipe that generates
vector instructions.

The helper can then be re-used by the VPlan-based cost model to match
the legacy selectVectorizationFactor behavior, this fixing a crash and
paving the way to recommit
https://github.com/llvm/llvm-project/pull/92555.

PR: https://github.com/llvm/llvm-project/pull/96622
2024-07-07 20:08:01 +01:00
Florian Hahn
959ff45bda
[LV] Regenerate test checks for zero_unroll.ll (NFC).
Regenerate test checks to better show impact of
https://github.com/llvm/llvm-project/pull/96622.
2024-07-05 11:37:13 +01:00
Florian Hahn
99d6c6d936
[VPlan] Model branch cond to enter scalar epilogue in VPlan. (#92651)
This patch moves branch condition creation to enter the scalar epilogue
loop to VPlan. Modeling the branch in the middle block also requires
modeling the successor blocks. This is done using the recently
introduced VPIRBasicBlock.

Note that the middle.block is still created as part of the skeleton and
then patched in during VPlan execution. Unfortunately the skeleton needs
to create the middle.block early on, as it is also used for induction
resume value creation and is also needed to properly update the
dominator tree during skeleton creation.

After this patch lands, I plan to move induction resume value and phi
node creation in the scalar preheader to VPlan. Once that is done, we
should be able to create the middle.block in VPlan directly.

This is a re-worked version based on the earlier
https://reviews.llvm.org/D150398 and the main change is the use of
VPIRBasicBlock.

Depends on https://github.com/llvm/llvm-project/pull/92525

PR: https://github.com/llvm/llvm-project/pull/92651
2024-07-05 10:08:42 +01:00
David Green
352a836176
[InstCombine] Canonicalize non-i8 gep of mul to i8 (#96606)
This is a small canonicalization for `gep i32, p, (mul x, C)` -> `gep
i8, p, (mul x, C*4)`, so that the mul can combine both of the constant
multiplications, and we take a small step towards canonicalizing more
geps to i8.

It currently doesn't attempt to check for multiple uses on the mul, but
that should be possible if it sounds better. Let me know what you think
of the idea in general.
2024-06-26 14:25:54 +01:00
Ramkumar Ramachandra
bb0d29a72d
[LV] fix logical error in trunc cost (#91136)
In LoopVectorizationCostModel::getInstructionCost(), when the condition
canTruncateToMinimalBitwidth() is satisfied, for a trunc, the source
type is computed as the smallest type of the source vector and the
destination vector, and the destination type is computed as the largest
type of the instruction and destination type. This is clearly a logical
error, as the original source vector type could be smaller than the
original destination vector type, and the trunc semantics are broken
because we're attempting to widen.

Fixes #47665.
2024-05-24 18:01:58 +01:00
Ramkumar Ramachandra
dc148c9fb8
[LV] add test for #47665, #88802 (#91135) 2024-05-24 10:50:43 +01:00
Nilanjana Basu
c1c5b854ad
[LV] Remove loop trip count threshold for deciding whether to interleave a loop (#67725)
A set of microbenchmarks (https://github.com/llvm/llvm-test-suite/pull/26) showed that loop interleaving can be beneficial for loops with low trip count as well. Loop interleaving count computation is updated accordingly in prior patches while this patch removes the loop trip count threshold for interleaving.
2024-02-05 17:23:58 -08:00
Jonas Paulsson
62b7e35f10
[SystemZ] Don't assert for i128 vectors in getInterleavedMemoryOpCost() (#78009)
This assert does not seem justified given that the LoopVectorizer can
form interleave groups containing i128 elements where the number of
elements per vector is indeed just one.
2024-01-15 17:31:18 +01:00
Craig Topper
7ec4f6094e
[InstCombine] Infer disjoint flag on Or instructions. (#72912)
The disjoint flag was recently added to IR in #72583

We already set it when we turn an add into an or. This patch sets it on Ors that weren't converted from an Add.
2023-12-02 14:11:12 -08:00
Craig Topper
03d4a9d94d
[InstCombine] Set disjoint flag when turning Add into Or. (#72702)
The disjoint flag was recently added to IR in #72583
2023-11-27 12:54:11 -08:00
Nikita Popov
2b7c347c7f [LoopVectorize] Convert test to opaque pointers (NFC)
I'm keeping the bitcast in the input here, because without it
we end up introducing a stride 1 assumption and end up testing
a different case.
2023-06-12 14:49:45 +02:00
Tobias Hieta
f84bac329b
[NFC][Py Reformat] Reformat lit.local.cfg python files in llvm
This is a follow-up to b71edfaa4ec3c998aadb35255ce2f60bba2940b0
since I forgot the lit.local.cfg files in that one.

Reformatting is done with `black`.

If you end up having problems merging this commit because you
have made changes to a python file, the best way to handle that
is to run git checkout --ours <yourfile> and then reformat it
with black.

If you run into any problems, post to discourse about it and
we will try to help.

RFC Thread below:

https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style

Reviewed By: barannikov88, kwk

Differential Revision: https://reviews.llvm.org/D150762
2023-05-17 17:03:15 +02:00
Florian Hahn
35af27c30a
[VPlan] Only create extracts for recurrence exits if there are live-outs.
Move the code to collect live-out earlier and only generate extracts for
exit values if there are any live-outs that use them.

Depends on D147472.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D147567
2023-04-10 21:08:34 +01:00
Nikita Popov
9ed2f14c87 [AsmParser] Remove typed pointer auto-detection
IR is now always parsed in opaque pointer mode, unless
-opaque-pointers=0 is explicitly given. There is no automatic
detection of typed pointers anymore.

The -opaque-pointers=0 option is added to any remaining IR tests
that haven't been migrated yet.

Differential Revision: https://reviews.llvm.org/D141912
2023-01-18 09:58:32 +01:00
Nikita Popov
2fab927546 [LoopVectorize] Convert some tests to opaque pointers (NFC)
Check lines for some of these tests were regenerated. The difference
is that with opaque pointers SCEVExpander always emits i8 GEPs,
making the address calculation explicit. This is a known problem
that will be solved long term by making all address calculations
explicit.
2023-01-04 17:25:42 +01:00