976 Commits

Author SHA1 Message Date
Florian Hahn
50b9ca4dda
[VPlan] Simplify Plan's entry in removeBranchOnConst. (#154510)
After https://github.com/llvm/llvm-project/pull/153643, there may be a
BranchOnCond with constant condition in the entry block.

Simplify those in removeBranchOnConst. This removes a number of
redundant conditional branches from entry blocks.

In some cases, it may also make the original scalar loop unreachable,
because we know it will never execute. In that case, we need to remove
the loop from LoopInfo, because all unreachable blocks may dominate each
other, making LoopInfo invalid. In those cases, we can also completely
remove the loop, for which I'll share a follow-up patch.

Depends on https://github.com/llvm/llvm-project/pull/153643.

PR: https://github.com/llvm/llvm-project/pull/154510
2025-09-18 19:25:05 +01:00
Ramkumar Ramachandra
46fcece2a8
[VPlan] Extend CSE to eliminate GEPs (#156699)
The motivation for this patch is to close the gap between the
VPlan-based CSE and the legacy CSE, to make it easier to remove the
legacy CSE. Before this patch, stubbing out the legacy CSE leads to 22
test failures, and after this patch, there are only 12 failures, and all
of them seem to have a single root cause:
VPlanTransforms::createInterleaveGroups() and
VPInterleaveGroup::execute(). The improvements from this patch are of
course welcome.

While developing the patch, a miscompile was found when GEP
source-element-types differ, and this has been fixed.

Co-authored-by: Florian Hahn <flo@fhahn.com>
Co-authored-by: Luke Lau <luke@igalia.com>
2025-09-16 10:14:32 +00:00
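The miscompile mentioned in 46fcece2a8 can be illustrated with a minimal Python sketch. The `Gep` tuple and the `cse` function are hypothetical stand-ins, not VPlan's data structures; the sketch only assumes that a GEP's address depends on its source element type, so that type has to be part of the CSE key.

```python
from typing import NamedTuple

class Gep(NamedTuple):
    """Toy stand-in for a GEP: address = base + index * sizeof(source_type)."""
    source_type: str
    base: str
    index: str

def cse(geps, include_source_type=True):
    """Map each GEP to the first equivalent GEP seen so far (hypothetical helper)."""
    seen = {}
    replacement = {}
    for gep in geps:
        key = gep if include_source_type else (gep.base, gep.index)
        replacement[gep] = seen.setdefault(key, gep)
    return replacement

byte_gep = Gep("i8", "%p", "%i")    # address: %p + %i * 1
word_gep = Gep("i32", "%p", "%i")   # address: %p + %i * 4

# With the source element type in the key, the two GEPs stay distinct.
safe = cse([byte_gep, word_gep])
# Without it, the i32 GEP is wrongly replaced by the i8 GEP.
unsafe = cse([byte_gep, word_gep], include_source_type=False)
```

In the unsafe variant the second GEP would compute a different address after replacement, which is the kind of error the commit describes.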
Florian Hahn
1858532c48
[VPlan] Handle predicated UDiv in VPReplicateRecipe::computeCost.
Account for predicated UDiv,SDiv,URem,SRem in
VPReplicateRecipe::computeCost: compute costs of extra phis and apply
getPredBlockCostDivisor.

Fixes https://github.com/llvm/llvm-project/issues/158660
2025-09-15 21:46:50 +01:00
Florian Hahn
4949cb4a5e
[VPlan] Track VPValues instead of VPRecipes in calculateRegisterUsage. (#155301)
Update calculateRegisterUsageForPlan to track live-ness of VPValues
instead of recipes. This gives slightly more accurate results for
recipes that define multiple values (i.e. VPInterleaveRecipe).

When tracking the live-ness of recipes, all VPValues defined by a
VPInterleaveRecipe are considered alive until the last use of any of
them. When tracking the live-ness of individual VPValues, we can
accurately track the individual values until their last use.

Note the changes in large-loop-rdx.ll and pr47437.ll. This patch
restores the original behavior before introducing VPlan-based liveness
tracking.

PR: https://github.com/llvm/llvm-project/pull/155301
2025-09-15 20:55:11 +01:00
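The difference between the two tracking schemes in 4949cb4a5e can be sketched with a toy liveness model. Everything here is an illustrative assumption (the `max_live_values` helper, the index-based intervals, the value names); it is not the logic of calculateRegisterUsageForPlan.

```python
def max_live_values(recipes, last_use, track_values):
    """Return the peak number of simultaneously live values.

    recipes:  list of (definition index, [names of values defined]).
    last_use: value name -> index of its last use.
    track_values: True for per-value liveness, False for per-recipe liveness.
    """
    intervals = []
    for def_index, values in recipes:
        recipe_end = max(last_use[v] for v in values)
        for v in values:
            # Per-recipe tracking keeps every value alive until the
            # last use of any value defined by the same recipe.
            end = last_use[v] if track_values else recipe_end
            intervals.append((def_index, end))
    points = range(max(end for _, end in intervals) + 1)
    return max(sum(start <= p <= end for start, end in intervals) for p in points)

# One recipe defining four values (like an interleave group), then a second recipe.
recipes = [(0, ["a", "b", "c", "d"]), (4, ["e", "f"])]
last_use = {"a": 1, "b": 2, "c": 3, "d": 8, "e": 8, "f": 8}

per_value = max_live_values(recipes, last_use, track_values=True)    # 4
per_recipe = max_live_values(recipes, last_use, track_values=False)  # 6
```

In this example per-recipe tracking overestimates the peak because `a`, `b` and `c` are treated as live until the last use of `d`.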
Antonio Frighetto
370607065d
[llvm] Regenerate test checks including TBAA semantics (NFC)
Tests exercising TBAA metadata (both purposefully and not), and
previously generated via UTC, have been regenerated and updated
to version 6.
2025-09-12 20:01:17 +02:00
Joel E. Denny
0e3c5566c0
[PGO] Add llvm.loop.estimated_trip_count metadata (#152775)
This patch implements the `llvm.loop.estimated_trip_count` metadata
discussed in [[RFC] Fix Loop Transformations to Preserve Block
Frequencies](https://discourse.llvm.org/t/rfc-fix-loop-transformations-to-preserve-block-frequencies/85785).
As the RFC explains, that metadata enables future patches, such as PR
#128785, to fix block frequency issues without losing estimated trip
counts.
2025-09-11 15:55:18 -04:00
Florian Hahn
1efa997317
[VPlan] Handle stores to single-scalar addr in narrowToSingleScalars.
Move handling of stores to single-scalar/uniform address from
replicateByVF to narrowToSingleScalar.
2025-09-10 21:58:29 +01:00
Nikita Popov
a301e1a895
[InstCombine] Split GEPs with multiple non-zero offsets (#151333)
Split GEPs that have more than one non-zero offset into two GEPs. This
is in preparation for the ptradd migration, which can only represent
GEPs with a single offset.

This also enables CSE and LICM of the common base.
2025-09-10 16:51:58 +02:00
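The idea behind a301e1a895 can be sketched with toy byte-offset arithmetic. The helpers and the strides below are assumptions for illustration, not InstCombine code: an address with two non-zero offsets is computed in two single-offset steps, and the first step becomes a shared base.

```python
def address(base, strides, indices):
    """Toy GEP: base plus the sum of index * stride, in bytes."""
    return base + sum(i * s for i, s in zip(indices, strides))

def split_address(base, strides, indices):
    """Same address, computed as two steps with one offset each."""
    row = address(base, strides[:1], indices[:1])   # first GEP: row offset only
    return address(row, strides[1:], indices[1:])   # second GEP: column offset only

# Assumed layout: rows of 10 four-byte elements, so strides are 40 and 4 bytes.
STRIDES = (40, 4)
BASE = 1000

same = all(
    address(BASE, STRIDES, (i, j)) == split_address(BASE, STRIDES, (i, j))
    for i in range(5)
    for j in range(10)
)
```

Two accesses with the same `i` but different `j` share the intermediate `row` address, which is what makes the common base available to CSE and LICM.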
Florian Hahn
c3e76b2770
[VPlan] Keep common flags during CSE. (#157664)
During CSE, we don't have to drop all poison-generating flags on
mis-match, we can keep the ones common on both recipes.

PR: https://github.com/llvm/llvm-project/pull/157664
2025-09-10 10:20:48 +00:00
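The flag handling in c3e76b2770 amounts to a set intersection. The sketch below is a minimal illustration with a hypothetical `merge_flags` helper and flag names as plain strings; it does not reflect how VPlan stores recipe flags.

```python
def merge_flags(kept_recipe_flags, removed_recipe_flags):
    """Flags that remain valid when one recipe replaces the other.

    A poison-generating flag is only safe to keep if both recipes had it.
    """
    return frozenset(kept_recipe_flags) & frozenset(removed_recipe_flags)

# Both recipes carry nuw, only one carries nsw: nuw survives, nsw is dropped.
merged = merge_flags({"nuw", "nsw"}, {"nuw"})

# Identical flags are kept unchanged; disjoint flags leave nothing.
unchanged = merge_flags({"nuw", "nsw"}, {"nsw", "nuw"})
nothing = merge_flags({"nuw"}, {"nsw"})
```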
Florian Hahn
c4b17bf9ed
[VPlan] Slightly extend ExtractLastElement fold to single-scalars.
Update ExtractLastElement fold to support single scalar recipes, if all
their users only use scalars.
2025-09-09 22:08:08 +01:00
Florian Hahn
9b1b93766d
Reapply "[SCEVExp] Add helper to clean up dead instructions after expansion. (#157308)"
This reverts commit eeb43806eb1b40e690aeeba496ee974172202df9.

Recommit with a fix for MSan failure (
https://lab.llvm.org/buildbot/#/builders/169/builds/14799), by adding a
set to track deleted values. Using the InsertedInstructions set is not
sufficient, as it uses asserting value handles as keys, which may
dereference the value at construction.

Original message:

Add new helper to erase dead instructions inserted during SCEV expansion
but not being used due to InstSimplifyFolder simplifications.

Together with https://github.com/llvm/llvm-project/pull/157307 this also
allows removing some specialized folds, e.g.
https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Utils/ScalarEvolutionExpander.cpp#L2205

PR: https://github.com/llvm/llvm-project/pull/157308
2025-09-09 09:47:41 +01:00
Florian Hahn
eeb43806eb
Revert "[SCEVExp] Add helper to clean up dead instructions after expansion. (#157308)"
This reverts commit 528b13df571c86a2c5b8305d7974f135d785e30f.

Triggers MSan errors in some configurations, e.g.
https://lab.llvm.org/buildbot/#/builders/169/builds/14799
2025-09-08 14:52:28 +01:00
Florian Hahn
408a2e7cee
[LV] Remove instcombine,simplifycfg and dce from some tests.
Remove instcombine, simplifycfg and dce from some tests, as they make it
a bit more difficult to see the codegen coming out of LV and most
simplifications are already done on the VPlan-level.

Also modernizes some check lines.
2025-09-08 12:01:37 +01:00
Florian Hahn
528b13df57
[SCEVExp] Add helper to clean up dead instructions after expansion. (#157308)
Add new helper to erase dead instructions inserted during SCEV expansion
but not being used due to InstSimplifyFolder simplifications.

Together with https://github.com/llvm/llvm-project/pull/157307 this also
allows removing some specialized folds, e.g.
https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Utils/ScalarEvolutionExpander.cpp#L2205

PR: https://github.com/llvm/llvm-project/pull/157308
2025-09-08 10:53:20 +01:00
Florian Hahn
b50ad945dd
[InstSimplify] Simplify extractvalue (umul_with_overflow(x, 1)). (#157307)
Look through extractvalue to simplify umul_with_overflow where one of
the operands is 1.

This removes some redundant instructions when expanding SCEVs, which in
turn makes the runtime check cost estimate more accurate, reducing the
minimum iterations for which vectorization is profitable.

PR: https://github.com/llvm/llvm-project/pull/157307
2025-09-07 18:32:40 +01:00
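The simplification in b50ad945dd rests on a simple arithmetic fact, which can be checked exhaustively for a small bit width. The `umul_with_overflow` function below is a Python model written for this illustration (8-bit by default), not the LLVM intrinsic itself.

```python
def umul_with_overflow(x, y, bits=8):
    """Model of an unsigned multiply returning (wrapped result, overflow bit)."""
    full = x * y
    return full % (1 << bits), full >= (1 << bits)

# Multiplying by 1 always yields (x, False), so extracting either field
# can be simplified to x or to false respectively.
always_trivial = all(umul_with_overflow(x, 1) == (x, False) for x in range(256))

# For contrast, a multiplication that does overflow in 8 bits.
wrapped = umul_with_overflow(200, 2)
```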
Florian Hahn
2654690006
[LV] Add additional test with SCEV predicate.
The SCEV predicate in the existing tests for optimizing for size is
known to be false. Add additional test with a predicate that cannot be
proven true/false.

Also generate checks with latest version of script.
2025-09-07 16:14:52 +01:00
Florian Hahn
afa0e70cc6
[LV] Remove instcombine,simplifycfg and dce from some tests.
Remove instcombine, simplifycfg and dce from some tests, as they make it
a bit more difficult to see the codegen coming out of LV and most
simplifications are already done on the VPlan-level.

Also modernizes some check lines.
2025-09-07 10:28:25 +01:00
Florian Hahn
e0f00bd645
[LV] Don't consider second op as invariant in getDivRemSpeculationCost.
The second operand when using a safe divisor will always be a select in
the loop, so won't be invariant; don't treat it as such.

This fixes a divergence between the legacy and VPlan-based cost models.

Fixes https://github.com/llvm/llvm-project/issues/156066.
2025-09-06 14:06:04 +01:00
Florian Hahn
f8972c8280
[SCEVExp] Fix early exit in ComputeEndCheck. (#156910)
ComputeEndCheck incorrectly returned false for unsigned predicates
starting at zero and a positive step.

The AddRec could still wrap if Step * trunc ExitCount wraps or trunc
ExitCount strips leading 1s.

Fixes https://github.com/llvm/llvm-project/issues/156849.

PR: https://github.com/llvm/llvm-project/pull/156910
2025-09-05 15:13:11 +00:00
Phoebe Wang
94b164c218
[X86][AVX10] Remove EVEX512 and AVX10-256 implementations (#157034)
The 256-bit maximum vector register size control was removed from AVX10
whitepaper, ref: https://cdrdv2.intel.com/v1/dl/getContent/784343

We have warned about these options in LLVM21 through #132542. This patch
removes the underlying implementations in LLVM22.
2025-09-05 14:08:59 +00:00
Florian Hahn
74ec38fad0
[SCEV] Fold (C * A /u C) -> A, if A is a multiple of C and C a pow-of-2. (#156730)
Alive2 Proof: https://alive2.llvm.org/ce/z/JoHJE9

PR: https://github.com/llvm/llvm-project/pull/156730
2025-09-05 08:45:13 +00:00
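The fold in 74ec38fad0 can be checked by brute force for a small bit width. The sketch assumes the expression is read as C * (A /u C) in wrapping unsigned arithmetic; the helper names and the 8-bit width are choices made for this illustration, and the Alive2 proof linked in the commit remains the authoritative argument.

```python
BITS = 8
MASK = (1 << BITS) - 1

def mul(a, b):
    """Wrapping unsigned multiplication."""
    return (a * b) & MASK

def udiv(a, b):
    """Unsigned division (operands are non-negative here)."""
    return a // b

def fold_holds(c, a):
    """True if C * (A /u C) equals A."""
    return mul(c, udiv(a, c)) == a

powers_of_two = [1 << k for k in range(BITS)]
counterexamples = [
    (c, a)
    for c in powers_of_two
    for a in range(0, 1 << BITS, c)   # every multiple of c that fits in 8 bits
    if not fold_holds(c, a)
]
```

When A is not a multiple of C the fold does not hold; for example `fold_holds(4, 6)` is false because the division rounds down.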
Ramkumar Ramachandra
c14052e20b
[VPlan] Let Not preserve uniformity in isSingleScalar (#156676)
LogicalAnd and WidePtrAdd should also preserve uniformity, but we don't
have test coverage to enable adding them.
2025-09-04 11:27:14 +01:00
Ramkumar Ramachandra
e4c0b3e111
[VPlan] Simplify x && false -> false, x | 0 -> x (#156345)
The OR x, 0 -> x simplification has been introduced to avoid
regressions.
2025-09-04 10:29:59 +01:00
Florian Hahn
ce5a1158b8
[LV] Regenerate checks for missing branch weights.
2025-09-03 21:37:52 +01:00
Luke Lau
c33ccfa52b
[VPlan] Reassociate (x & y) & z -> x & (y & z) (#155383)
This PR reassociates logical ands in order to enable more
simplifications.

The driving motivation for this is that with tail folding all blocks
inside the loop body will end up using the header mask. However this can
end up nestled deep within a chain of logical ands from other edges.

Typically the header mask will be a leaf nested in the LHS, e.g.
(headermask & y) & z. So pulling it out allows it to be simplified
further, e.g. allows it to be optimised away to VP intrinsics with EVL
tail folding.
2025-09-03 01:09:19 +00:00
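The reassociation in c33ccfa52b can be sketched on a toy expression tree. The tuple encoding and the `reassociate` helper are hypothetical, and the sketch only rewrites at the root; it is not the VPlan pattern matcher and ignores poison semantics.

```python
def reassociate(expr):
    """Rewrite ("and", ("and", x, y), z) into ("and", x, ("and", y, z)).

    Leaves are strings; every tuple is assumed to be a logical and.
    The rewrite repeats until the left operand of the root is a leaf.
    """
    while isinstance(expr, tuple) and isinstance(expr[1], tuple):
        _, (_, x, y), z = expr
        expr = ("and", x, ("and", y, z))
    return expr

# The header mask starts as a leaf nested deep in the left-hand side.
nested = ("and", ("and", ("and", "headermask", "a"), "b"), "c")
pulled_out = reassociate(nested)
```

After the rewrite the header mask is a direct operand of the outermost and, where later simplifications can find it.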
Ramkumar Ramachandra
d8fd511480
[VPlan] Introduce CSE pass (#151872)
Introduce a simple common-subexpression-elimination pass at the
VPlan-level, running late during the execution of the VPlan. The
long-term vision is to get rid of the legacy non-VPlan-based cse routine
in LV, but this patch doesn't yet fully subsume it.
2025-09-02 12:23:29 +01:00
Nikita Popov
055bfc0271
[InstCombine] Strip leading zero indices from GEP (#155415)
GEPs are often in the form `gep [N x %T], ptr %p, i64 0, i64 %idx`.
Canonicalize these to `gep %T, ptr %p, i64 %idx`.

This enables transforms that only support one GEP index to work and
improves CSE.

Various transforms were recently hardened to make sure they still work
without the leading index.
2025-09-01 09:58:11 +02:00
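The canonicalization in 055bfc0271 preserves the computed address because a zero index contributes nothing to the offset. The sketch below checks this with toy byte-offset arithmetic; the element size, array length and helper names are assumptions for illustration.

```python
ELEM_SIZE = 4                  # assumed size of %T in bytes
N = 16                         # assumed array length
ARRAY_SIZE = N * ELEM_SIZE     # size of [N x %T] in bytes

def gep_offset(indices, strides):
    """Toy GEP offset: sum of index * stride, in bytes."""
    return sum(i * s for i, s in zip(indices, strides))

def with_leading_zero(idx):
    # Models: gep [N x %T], ptr %p, i64 0, i64 %idx
    return gep_offset([0, idx], [ARRAY_SIZE, ELEM_SIZE])

def canonical(idx):
    # Models: gep %T, ptr %p, i64 %idx
    return gep_offset([idx], [ELEM_SIZE])

offsets_match = all(with_leading_zero(i) == canonical(i) for i in range(-8, 32))
```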
Luke Lau
c9faedd760
[VPlan] Fold common edges away in convertPhisToBlends (#150368)
If a phi is widened with tail folding, all of its predecessors will have
a mask of the form

    %x = logical-and %active-lane-mask, %foo
    %y = logical-and %active-lane-mask, %bar
    %z = logical-and %active-lane-mask, %baz
    ...

We can remove the common %active-lane-mask from all of these edge masks,
which in turn allows us to simplify a lot of VPBlendRecipes.

In particular, it allows the header mask to be removed in selects with
EVL tail folding, improving RISC-V codegen on SPEC CPU 2017 for
525.x264_r, and supersedes #147243.

This also allows us to remove VPBlendRecipe and directly emit
VPInstruction::Select in another patch.
2025-09-01 07:03:33 +00:00
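The edge-mask rewrite in c9faedd760 can be sketched as follows. The pair encoding of `logical-and` and the `strip_common_mask` helper are hypothetical; the sketch only shows the structural step of dropping a mask shared by every incoming edge, not why that is legal inside VPlan.

```python
def strip_common_mask(edge_masks, common):
    """Drop `common` from every edge mask if all edges share it.

    edge_masks: list of (lhs, rhs) pairs standing for `logical-and lhs, rhs`.
    Returns the simplified masks, or the input unchanged if any edge differs.
    """
    if edge_masks and all(lhs == common for lhs, _ in edge_masks):
        return [rhs for _, rhs in edge_masks]
    return edge_masks

masks = [
    ("%active-lane-mask", "%foo"),
    ("%active-lane-mask", "%bar"),
    ("%active-lane-mask", "%baz"),
]
simplified = strip_common_mask(masks, "%active-lane-mask")

# If one edge does not use the common mask, nothing is changed.
mixed = [("%active-lane-mask", "%foo"), ("%other", "%bar")]
untouched = strip_common_mask(mixed, "%active-lane-mask")
```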
Florian Hahn
0aac22758a
[LV] Correctly cost chains of replicating calls in legacy CM.
Check for scalarized calls in needsExtract to fix a divergence between
legacy and VPlan-based cost model.

The legacy cost model was missing a check for scalarized calls in
needsExtract, which meant it incorrectly assumed the result of a
scalarized call needs extracting.

Exposed by https://github.com/llvm/llvm-project/pull/154617.

Fixes https://github.com/llvm/llvm-project/issues/156091.
2025-08-31 15:13:47 +01:00
Florian Hahn
5ebd59806b
[VPlan] Fold BinaryAnd x, 0 -> 0 in simplifyRecipe.
This also fixes a cost-model divergence in the newly added
tests in constant-fold.ll
2025-08-27 22:35:08 +01:00
David Sherwood
958cec0ab1
[LV] Remove use of llc from vectoriser tests (#154759)
There were 5 X86 loop vectoriser tests that were piping the output from
opt into llc. I think in the directory test/Transforms/LoopVectorize we
should only be testing the output from the loop vectoriser pass. Any
codegen tests should live in test/CodeGen/X86 instead.

avx512.ll: it looks like we were really just testing that we generate
the right vector length.
fp32_to_uint32-cost-model.ll/fp64_to_uint32-cost-model.ll: the tests
only seem to care that we're not scalarising the fptoui, so I've
modified the test to check for vector ops. I've assumed there are
already codegen tests for fptoui vector operations.
vectorization-remarks-loopid-dbg.ll: I've copied this test to
CodeGen/X86/vectorization-remarks-loopid-dbg.ll for the llc RUN line
variant
vectorization-remarks.ll: seems to test the same thing as
vectorization-remarks-loopid-dbg.ll
2025-08-26 09:59:06 +01:00
Florian Hahn
954097dd61
[VPlan] Use SCEV to check subtract in getOptimizableIVOf.
Simplify checks for IV subtractions in getOptimizableIVOf by using SCEV.
This slightly generalizes the patterns we can handle.
2025-08-23 22:00:01 +01:00
Luke Lau
c97c6869b6
[VPlan] Allow folding not (cmp eq) -> icmp ne with other select users (#154497)
Currently we only allow folding not (cmp eq) -> icmp ne if the not is
the only user of the compare.
However a common scenario is that some select might also use the
compare. We can still fold the not if we also swizzle the arms of the
selects.

This helps avoid regressions in #150368
2025-08-22 07:59:14 +08:00
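The fold in c97c6869b6 relies on the fact that inverting a compare and swapping the arms of every select that uses it leaves all results unchanged. The two functions below are a small Python model of the before and after shapes, written for this illustration.

```python
from itertools import product

def before(a, b, t, f):
    """Original shape: cmp eq, a `not` user, and a select user."""
    cmp_eq = a == b
    return (not cmp_eq), (t if cmp_eq else f)

def after(a, b, t, f):
    """Folded shape: cmp ne replaces the `not`; the select arms are swapped."""
    cmp_ne = a != b
    return cmp_ne, (f if cmp_ne else t)

# Exhaustive check over a small range of operand values.
equivalent = all(
    before(a, b, t, f) == after(a, b, t, f)
    for a, b, t, f in product(range(3), repeat=4)
)
```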
Florian Hahn
e41aaf5a64
[VPlan] Use VPIRMetadata for VPInterleaveRecipe. (#153084)
Use VPIRMetadata for VPInterleaveRecipe to preserve noalias metadata
added by versioning.

This still uses InterleaveGroup's logic to preserve existing metadata
from IR. This can be migrated separately.

Fixes https://github.com/llvm/llvm-project/issues/153006.

PR: https://github.com/llvm/llvm-project/pull/153084
2025-08-21 18:58:10 +01:00
Florian Hahn
d67dba5e88
[VPlan] Check Def2LaneDefs first in cloneForLane. (NFC)
If we have entries in Def2LaneDefs, we always have to use it. Move the
check before.

Otherwise we may not pick the correct operand, e.g. if Op was a
replicate recipe that got single-scalar after replicating it.

Fixes https://github.com/llvm/llvm-project/issues/154330.
2025-08-21 11:34:49 +01:00
Shih-Po Hung
cf0e86118d
[VPlan] Handle canonical VPWidenIntOrFpInduction in branch-condition simplification (#153539)
SimplifyBranchConditionForVFAndUF only recognized canonical IVs and a
few PHI recipes in the loop header. With more IV-step optimizations,
the canonical widen-canonical-iv can be replaced by a canonical
VPWidenIntOrFpInduction, which the pass did not handle, causing
regressions (missed simplifications).

This patch replaces canonical VPWidenIntOrFpInduction with a StepVector
in the vector preheader since the vector loop region only executes once.
2025-08-21 07:34:54 +08:00
Tobias Stadler
8135b7c1ab
[LV] Emit all remarks for unvectorizable instructions (#153833)
If ExtraAnalysis is requested, emit all remarks caused by unvectorizable instructions - instead of only the first.
This is in line with how other places handle DoExtraAnalysis and it can be quite helpful to get info about all instructions in a loop that prevent vectorization.
2025-08-18 18:04:53 +01:00
Florian Hahn
351d398a37
[VPlan] Run final VPlan simplifications before codegen.
Dissolving the hierarchical VPlan CFG and converting abstract to
concrete recipes can expose additional simplification opportunities.

Do a final run of simplifyRecipes before executing the VPlan.
2025-08-16 18:54:27 +01:00
Florian Hahn
2b1e06598f
[LV] Regenerate some more check lines. (NFC)
2025-08-15 15:53:19 +01:00
Florian Hahn
8a0c7e9b32
[LV] Regenerate some more tests.
2025-08-14 21:21:03 +01:00
Florian Hahn
8cdab07aaa
Reapply "[VPlan] Remove trivial dead VPPhi cycles."
This reverts commit 1c7c8e3ad39957285524ff116d9a6aec0d9b62f9.

Recommit with a fix for the verifier error caused for EVL recipes.

Extra test coverage added in 6f939da60e.
2025-08-12 22:09:30 +01:00
Florian Hahn
1c7c8e3ad3
Revert "[VPlan] Remove trivial dead VPPhi cycles."
This reverts commit 1f17bb133f4f49942a1e0245291811ca3c99a7d2.

This seems to be breaking some RISCV bots, reverting for now
https://lab.llvm.org/buildbot/#/builders/210/builds/1266
2025-08-11 22:05:30 +01:00
Florian Hahn
1f17bb133f
[VPlan] Remove trivial dead VPPhi cycles.
Update removeDeadRecipes to remove trivial dead VPPhi cycles.

Should effectively be NFC end-to-end.
2025-08-11 21:29:49 +01:00
Florian Hahn
86813aa786
[VPlan] Add dedicated user for resume phi with epilogue vectorization.
Epilogue vectorization currently relies on the resume phi for the
canonical induction being always available, which is why VPPhi are
considered to have side-effects, to prevent their removal.

This patch adds a new ResumeForEpilogue opcode to mark the resume phi as
used for epilogue vectorization. This allows treating VPPhis in general
as not having side-effects, enabling removal of unused VPPhis.
2025-08-10 21:21:16 +01:00
Florian Hahn
82d633e9ff
[VPlan] Materialize vector trip count using VPInstructions. (#151925)
Materialize the vector trip count computation using VPInstruction
instead of directly creating IR. This is one of the last few steps
needed to model the full vector skeleton in VPlan. It also simplifies
vector-trip count computations for scalable vectors, as we can re-use
the UF x VF computation.

PR: https://github.com/llvm/llvm-project/pull/151925
2025-08-08 11:44:32 +01:00
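The computation that 82d633e9ff moves into VPInstructions can be sketched in its simplest form: round the trip count down to a multiple of UF x VF. The `vector_trip_count` helper is an assumption for illustration; it ignores tail folding and the case where a scalar epilogue is required, both of which adjust the result in the real vectorizer.

```python
def vector_trip_count(trip_count, vf, uf):
    """Number of scalar iterations covered by the vector loop.

    Simplified model: the largest multiple of vf * uf not exceeding trip_count.
    """
    step = vf * uf   # the UF x VF product, computed once and reused
    return trip_count - trip_count % step

# 103 iterations with VF=4 and UF=2: the vector loop covers 96 of them
# and the remaining 7 run in the scalar remainder loop.
covered = vector_trip_count(103, vf=4, uf=2)
remainder = 103 - covered
```

For scalable vectors, VF is only known at run time, which is why reusing a single UF x VF value is convenient.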
Nikita Popov
c23b4fbdbb
[IR] Remove size argument from lifetime intrinsics (#150248)
Now that #149310 has restricted lifetime intrinsics to only work on
allocas, we can also drop the explicit size argument. Instead, the size
is implied by the alloca.

This removes the ability to only mark a prefix of an alloca alive/dead.
We never used that capability, so we should remove the need to handle
that possibility everywhere (though many key places, including stack
coloring, did not actually respect this).
2025-08-08 11:09:34 +02:00
Florian Hahn
25d1285eec
[VPlan] Replace single-entry VPPhis with their incoming values.
Replace trivial, single-entry VPPhis with their incoming values,
2025-08-06 20:03:31 +01:00
Ramkumar Ramachandra
5dfc2d4535
[LV] Regen some tests with UTC (#152128)
2025-08-05 18:01:02 +01:00
Luke Lau
94a6cd464e
[VPlan] Expand VPWidenPointerInductionRecipe into separate recipes (#148274)
This is the VPWidenPointerInductionRecipe equivalent of #118638, with
the motivation of allowing us to use the EVL as the induction step.

There is a new VPInstruction added, WidePtrAdd to allow adding the step
vector to the induction phi, since VPInstruction::PtrAdd only handles
scalars or multiple scalar lanes.

Originally this transformation was copied from the original recipe's
execute code, but it's since been simplified by teaching
`unrollWidenInductionByUF` to unroll the recipe, which brings it in line
with VPWidenIntOrFpInductionRecipe.
2025-08-05 16:54:02 +08:00
Florian Hahn
eee9755881
[LV] Refine check to find epilogue IV resume value.
Make sure to check that the vector trip count is contained in the list of
incoming values to serve as tie-breaker with phis with all-zero incoming
values.

Fixes https://github.com/llvm/llvm-project/issues/151686.
2025-08-01 20:54:39 +01:00