973 Commits

Author SHA1 Message Date
Paul Walker
7b8fd8f31b
[LLVM][SCEV] Look through common vscale multiplicand when simplifying compares. (#141798)
My usecase is simplifying the control flow generated by LoopVectorize
when vectorising loops whose tripcount is a function of the runtime
vector length. This can be problematic because:

* CSE is a pre-LoopVectorize transform and so it's common for an IR
function to include several calls to llvm.vscale(). (NOTE: Code
generation will typically remove the duplicates)
* Pre-LoopVectorize instcombines will rewrite some multiplies as shifts.
This leads to a mismatch between VL based maths of the scalar loop and
that created for the vector loop, which prevents some obvious
simplifications.

SCEV does not suffer these issues because it effectively does CSE during
construction and shifts are represented as multiplies.
2025-09-19 12:57:13 +01:00
Florian Hahn
0c028bbf33
[LV] Always add uniform pointers to uniforms list.
Always add pointers proved to be uniform via legal/SCEV to worklist.
This extends the existing logic to handle a few more pointers known to
be uniform.
2025-09-18 22:56:19 +01:00
Florian Hahn
70a7ffdc29
[LV] Add missing test cover for replicating load/store costs. 2025-09-18 19:47:06 +01:00
Florian Hahn
50b9ca4dda
[VPlan] Simplify Plan's entry in removeBranchOnConst. (#154510)
After https://github.com/llvm/llvm-project/pull/153643, there may be a
BranchOnCond with constant condition in the entry block.

Simplify those in removeBranchOnConst. This removes a number of
redundant conditional branch from entry blocks.

In some cases, it may also make the original scalar loop unreachable,
because we know it will never execute. In that case, we need to remove
the loop from LoopInfo, because all unreachable blocks may dominate each
other, making LoopInfo invalid. In those cases, we can also completely
remove the loop, for which I'll share a follow-up patch.

Depends on https://github.com/llvm/llvm-project/pull/153643.

PR: https://github.com/llvm/llvm-project/pull/154510
2025-09-18 19:25:05 +01:00
Hassnaa Hamdi
e8aa0b688a
[LV]: Ensure fairness when selecting epilogue VF. (#155547)
Consider IC when deciding if epilogue profitable for scalable vectors,
same as fixed-width vectors.
2025-09-17 14:48:10 +01:00
Sander de Smalen
17e008db17
[IR] NFC: Remove 'experimental' from partial.reduce.add intrinsic (#158637)
The partial reduction intrinsics are no longer experimental, because
they've been used in production for a while and are unlikely to change.
2025-09-17 11:44:47 +01:00
Ramkumar Ramachandra
46fcece2a8
[VPlan] Extend CSE to eliminate GEPs (#156699)
The motivation for this patch is to close the gap between the
VPlan-based CSE and the legacy CSE, to make it easier to remove the
legacy CSE. Before this patch, stubbing out the legacy CSE leads to 22
test failures, and after this patch, there are only 12 failures, and all
of them seem to have a single root cause:
VPlanTransforms::createInterleaveGroups() and
VPInterleaveGroup::execute(). The improvements from this patch are of
course welcome.

While developing the patch, a miscompile was found when GEP
source-element-types differ, and this has been fixed.

Co-authored-by: Florian Hahn <flo@fhahn.com>
Co-authored-by: Luke Lau <luke@igalia.com>
2025-09-16 10:14:32 +00:00
Florian Hahn
4949cb4a5e
[VPlan] Track VPValues instead of VPRecipes in calculateRegisterUsage. (#155301)
Update calculateRegisterUsageForPlan to track live-ness of VPValues
instead of recipes. This gives slightly more accurate results for
recipes that define multiple values (i.e. VPInterleaveRecipe).

When tracking the live-ness of recipes, all VPValues defined by an
VPInterleaveRecipe are considered alive until the last use of any of
them. When tracking the live-ness of individual VPValues, we can
accurately track the individual values until their last use.

Note the changes in large-loop-rdx.ll and pr47437.ll. This patch
restores the original behavior before introducing VPlan-based liveness
tracking.

PR: https://github.com/llvm/llvm-project/pull/155301
2025-09-15 20:55:11 +01:00
Florian Hahn
985dc69a2d
[LV] Add test for missed interleaving after narrowing interleave groups.
Add extra test coverage for
https://github.com/llvm/llvm-project/pull/149706. The added loop should
be interleaved, after narrowing interleave groups, which requires moving
the transform earlier.
2025-09-15 17:33:59 +01:00
Florian Hahn
2848e28012
[LV] Add test with partial reduction without narrowing. 2025-09-15 11:56:58 +01:00
Joel E. Denny
0e3c5566c0
[PGO] Add llvm.loop.estimated_trip_count metadata (#152775)
This patch implements the `llvm.loop.estimated_trip_count` metadata
discussed in [[RFC] Fix Loop Transformations to Preserve Block
Frequencies](https://discourse.llvm.org/t/rfc-fix-loop-transformations-to-preserve-block-frequencies/85785).
As the RFC explains, that metadata enables future patches, such as PR
#128785, to fix block frequency issues without losing estimated trip
counts.
2025-09-11 15:55:18 -04:00
Florian Hahn
055e4ff35a
[VPlan] Don't narrow op multiple times in narrowInterleaveGroups.
Track which ops already have been narrowed, to avoid narrowing the same
operation multiple times. Repeated narrowing will lead to incorrect
results, because we could first narrow from an interleave group -> wide
load, and then narrow the wide load > single-scalar load.

Fixes thttps://github.com/llvm/llvm-project/issues/156190.
2025-09-10 19:22:42 +01:00
Florian Hahn
7b828738c6
[LV] Add tests with multiple store groups re-using widened ops.
Test coverage for https://github.com/llvm/llvm-project/issues/156190.
2025-09-10 17:10:46 +01:00
Nikita Popov
a301e1a895
[InstCombine] Split GEPs with multiple non-zero offsets (#151333)
Split GEPs that have more than one non-zero offset into two GEPs. This
is in preparation for the ptradd migration, which can only represent
such GEPs.

This also enables CSE and LICM of the common base.
2025-09-10 16:51:58 +02:00
Hassnaa Hamdi
5739142345
[LV][AArch64][NFC]: Change TC in a test case. (#157512)
- In sve-epilog-vscale-fixed.ll file, it tests the preference of
fixed-width epilogue VF vs scalable when costs are equal. This NFC patch
is changing the TC in the test case to be unknown to avoid folding the
epilogue in future LV changes.
2025-09-10 12:41:49 +01:00
David Green
204917ea97
[LoopVectorizer][AArch64] Add a -sve-vscale-for-tuning override option. (#156916)
It can be useful for debugging and tuning to be able to alter the
VScaleForTuning. This adds a quick option to the aarch64 subtarget for
altering it.
2025-09-09 10:46:12 +01:00
Florian Hahn
9b1b93766d
Reapply "[SCEVExp] Add helper to clean up dead instructions after expansion. (#157308)"
This reverts commit eeb43806eb1b40e690aeeba496ee974172202df9.

Recommit with with a fix for MSan failure (
https://lab.llvm.org/buildbot/#/builders/169/builds/14799), by adding a
set to track deleted values. Using the InsertedInstructions set is not
sufficient, as it use asserting value handles as keys, which may
dereference the value at construction.

Original message:

Add new helper to erase dead instructions inserted during SCEV expansion
but not being used due to InstSimplifyFolder simplifications.

Together with https://github.com/llvm/llvm-project/pull/157307 this also
allows removing some specialized folds, e.g.
https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Utils/ScalarEvolutionExpander.cpp#L2205

PR: https://github.com/llvm/llvm-project/pull/157308
2025-09-09 09:47:41 +01:00
Florian Hahn
eeb43806eb
Revert "[SCEVExp] Add helper to clean up dead instructions after expansion. (#157308)"
This reverts commit 528b13df571c86a2c5b8305d7974f135d785e30f.

Triggers MSan errors in some configurations, e.g.
https://lab.llvm.org/buildbot/#/builders/169/builds/14799
2025-09-08 14:52:28 +01:00
Florian Hahn
528b13df57
[SCEVExp] Add helper to clean up dead instructions after expansion. (#157308)
Add new helper to erase dead instructions inserted during SCEV expansion
but not being used due to InstSimplifyFolder simplifications.

Together with https://github.com/llvm/llvm-project/pull/157307 this also
allows removing some specialized folds, e.g.
https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Utils/ScalarEvolutionExpander.cpp#L2205

PR: https://github.com/llvm/llvm-project/pull/157308
2025-09-08 10:53:20 +01:00
Florian Hahn
afa0e70cc6
[LV] Remove instcombine,simplifycfg and dce from some tests.
Remove instcombine, simplifycfg and dce from some tests, as they make it
a bit more difficult to see the codegen coming out of LV and most
simplifications are already done on the VPlan-level.

Also modernizes some check lines.
2025-09-07 10:28:25 +01:00
Florian Hahn
59d72b57b0
[LV] Modernize and regenerate checks for some tests. 2025-09-06 20:52:29 +01:00
Luke Lau
4e5e65e55d
[VPlan] Only compute reg pressure if considered. NFCI (#156923)
In #149056 VF pruning was changed so that it only pruned VFs that
stemmed from MaxBandwidth being enabled.

However we always compute register pressure regardless of whether or not
max bandwidth is permitted for any VFs (via
`MaxPermissibleVFWithoutMaxBW`).

This skips the computation if not needed and renames the method for
clarity.

The diff in reg-usage.ll is due to the scalable VPlan not actually
having any maxbandwidth VFs, so I've changed it to check the
fixed-length VF instead, which is affected by maxbandwidth.
2025-09-05 00:23:47 +00:00
Hassnaa Hamdi
35b22764e2
[LV][AArch64] Prefer epilogue with fixed-width over scalable VF. (#155546)
In case of equal costs Prefer epilogue with fixed-width over scalable VF.
That is helpful in cases like post-LTO vectorization where epilogue with
fixed-width VF can be removed when we eventually know that the trip count
is less than the epilogue iterations.
2025-09-04 19:31:30 +01:00
Ramkumar Ramachandra
e4c0b3e111
[VPlan] Simplify x && false -> false, x | 0 -> x (#156345)
The OR x, 0 -> x simplification has been introduced to avoid
regressions.
2025-09-04 10:29:59 +01:00
Florian Hahn
f1e91bff42
[LV] Regenerate more checks for missing branch weights. 2025-09-03 22:18:04 +01:00
Florian Hahn
ce5a1158b8
[LV] Regenerate checks for missing branch weights. 2025-09-03 21:37:52 +01:00
Luke Lau
c33ccfa52b
[VPlan] Reassociate (x & y) & z -> x & (y & z) (#155383)
This PR reassociates logical ands in order to enable more
simplifications.

The driving motivation for this is that with tail folding all blocks
inside the loop body will end up using the header mask. However this can
end up nestled deep within a chain of logical ands from other edges.

Typically the header mask will be a leaf nested in the LHS, e.g.
(headermask & y) & z. So pulling it out allows it to be simplified
further, e.g. allows it to be optimised away to VP intrinsics with EVL
tail folding.
2025-09-03 01:09:19 +00:00
Ramkumar Ramachandra
d8fd511480
[VPlan] Introduce CSE pass (#151872)
Introduce a simple common-subexpression-elimination pass at the
VPlan-level, running late during the execution of the VPlan. The
long-term vision is to get rid of the legacy non-VPlan-based cse routine
in LV, but this patch doesn't yet fully subsume it.
2025-09-02 12:23:29 +01:00
David Sherwood
e867b85118
[LV] Always emit branch weights for vector epilogue (#155437)
We currently only emit the branch weights for the epilogue
iteration count check if there was already branch weight
data for the scalar loop. However, the code makes no use
of the existing branch weight when estimating the
likelihood of taking a particular branch and so we can
just always add the branch weights regardless. These
hints should hopefully improve code generation.
2025-09-02 10:15:21 +01:00
Kerry McLaughlin
f0e9bba024
[LoopVectorize] Generate wide active lane masks (#147535)
This patch adds a new flag (-enable-wide-lane-mask) which allows
LoopVectorize to generate wider-than-VF active lane masks when it
is safe to do so (i.e. the mask is used for data and control flow).

The transform in extractFromWideActiveLaneMask creates vector
extracts from the first active lane mask in the header & loop body,
modifying the active lane mask phi operands to use the extracts.

An additional operand is passed to the ActiveLaneMask instruction,
the value of which is used as a multiplier of VF when generating the
mask.
By default this is 1, and is updated to UF by
extractFromWideActiveLaneMask.

The motivation for this change is to improve interleaved loops when
SVE2.1 is available, where we can make use of the whilelo instruction
which returns a predicate pair.

This is based on a PR that was created by @momchil-velikov (#81140)
and contains tests which were added there.
2025-09-01 13:53:30 +01:00
Nikita Popov
055bfc0271
[InstCombine] Strip leading zero indices from GEP (#155415)
GEPs are often in the form `gep [N x %T], ptr %p, i64 0, i64 %idx`.
Canonicalize these to `gep %T, ptr %p, i64 %idx`.

This enables transforms that only support one GEP index to work and
improves CSE.

Various transforms were recently hardened to make sure they still work
without the leading index.
2025-09-01 09:58:11 +02:00
Luke Lau
c9faedd760
[VPlan] Fold common edges away in convertPhisToBlends (#150368)
If a phi is widened with tail folding, all of its predecessors will have
a mask of the form

    %x = logical-and %active-lane-mask, %foo
    %y = logical-and %active-lane-mask, %bar
    %z = logical-and %active-lane-mask, %baz
    ...

We can remove the common %active-lane-mask from all of these edge masks,
which in turn allows us to simplify a lot of VPBlendRecipes.

In particular, it allows the header mask to be removed in selects with
EVL tail folding, improving RISC-V codegen on SPEC CPU 2017 for
525.x264_r, and supersedes #147243.

This also allows us to remove VPBlendRecipe and directly emit
VPInstruction::Select in another patch.
2025-09-01 07:03:33 +00:00
Florian Hahn
465b17c450
[VPlan] Support scalable VFs in narrowInterleaveGroups. (#154842)
Update narrowInterleaveGroups to support scalable VFs. After the
transform, the vector loop will process a single iteration of the
original vector loop for fixed-width vectors and vscale iterations for
scalable vectors.
2025-08-31 20:45:07 +01:00
Florian Hahn
13aff91e7c
Revert "[VPlan] Support plans with vector pointers in narrowInterleaveGroups."
This reverts commit 806a797c52d8018639f5cdcce5eb375b17c87f5e as it
introduced a miscompile.
2025-08-31 19:37:24 +01:00
Florian Hahn
806a797c52
[VPlan] Support plans with vector pointers in narrowInterleaveGroups.
After narrowing interleave groups and related memory operations, all
vector pointers should be removed. Remove the check.

In preparation for https://github.com/llvm/llvm-project/pull/149706.
2025-08-29 20:55:40 +01:00
Florian Hahn
c7d1425016
[LV] Auto-generated some check lines of tests. 2025-08-27 17:49:11 +01:00
Florian Hahn
5faed1ad84
[VPlan] Add VPlan-based addMinIterCheck, replace ILV for non-epilogue. (#153643)
This patch adds a new VPlan-based addMinimumIterationCheck, which
replaced the ILV version for the non-epilogue case.

The VPlan-based version constructs a SCEV expression to compute the
minimum iterations, use that to check if the check is known true or
false. Otherwise it creates a VPExpandSCEV recipe and emits a
compare-and-branch.

When using epilogue vectorization, we still need to create the minimum
trip-count-check during the legacy skeleton creation. The patch moves
the definitions out of ILV.

PR: https://github.com/llvm/llvm-project/pull/153643
2025-08-26 15:52:31 +01:00
Mircea Trofin
3af4597ac9
[NFC][SimplifyCFG] Simplify operators for the combined predicate in mergeConditionalStoreToAddress (#155058)
This is about code readability. The operands in the disjunction forming the combined predicate in `mergeConditionalStoreToAddress` could sometimes be negated twice. This patch addresses that.

2 tests needed updating because they exposed the double negation and now they don’t.
2025-08-26 07:07:59 -07:00
Kerry McLaughlin
884c03e71b
[LV] Return Invalid from getLegacyCost when instruction cost forced. (#154543)
LoopVectorizationCostModel::expectedCost will only override the cost
returned by getInstructionCost when valid. This patch ensures we do
the same in VPCostContext::getLegacyCost, avoiding the "VPlan cost
model and legacy cost model disagreed" assert in the included test.
2025-08-26 10:26:57 +01:00
Florian Hahn
c950a72974
[VPlan] Support scalar VF for ExtractLane and FirstActiveLane.
Extend ExtractLane and FirstActiveLane to support scalable VFs. This
allows correct handling when interleaving with VF = 1.

Alive2 proofs:
 - Fixed codegen with this patch: https://alive2.llvm.org/ce/z/8Y5_Vc
   (verifies as correct)
 - Original codegen: https://alive2.llvm.org/ce/z/twdg3X (doesn't
   verify)

Fixes https://github.com/llvm/llvm-project/issues/154967.
2025-08-25 21:45:21 +01:00
Florian Hahn
f492eb9509
[VPlan] Make VPInstruction::AnyOf poison-safe. (#154156)
AnyOf reduces multiple input vectors to a single boolean value. When
used for early-exit vectorization, we need to consider any lane after
the early exit being poison. Any poison lane would result in poison
after the AnyOf reduction. To prevent this, freeze all inputs to AnyOf.

Fixes https://github.com/llvm/llvm-project/issues/153946.
Fixes https://github.com/llvm/llvm-project/issues/155162.

https://alive2.llvm.org/ce/z/FD-XxA

PR: https://github.com/llvm/llvm-project/pull/154156
2025-08-25 18:55:23 +01:00
Ramkumar Ramachandra
66be00d635
[VPlan] Introduce m_Cmp; match more compares (#154771)
Extend [Specific]Cmp_match to handle floating-point compares, and
introduce m_Cmp that matches both integer and floating-point compares.
Use it in simplifyRecipe to match and simplify the general case of
compares. The change has necessitated a bugfix in
VPReplicateRecipe::execute.
2025-08-24 13:27:06 +01:00
Florian Hahn
524665fe96
[LV] Auto-generate check lines for tail-fold-uniform-memops.ll. (NFC) 2025-08-23 17:25:44 +01:00
Florian Hahn
ba5d487ac4
[LV] Add test with interleave groups separated by offset.
Adds extra test coverage for
https://github.com/llvm/llvm-project/pull/91196.
2025-08-21 21:40:47 +01:00
Shih-Po Hung
cf0e86118d
[VPlan] Handle canonical VPWidenIntOrFpInduction in branch-condition simplification (#153539)
SimplifyBranchConditionForVFAndUF only recognized canonical IVs and a
few PHI
recipes in the loop header. With more IV-step optimizations,
the canonical widen-canonical-iv can be replaced by a canonical
VPWidenIntOrFpInduction,
which the pass did not handle, causing regressions (missed
simplifications).

This patch replaces canonical VPWidenIntOrFpInduction with a StepVector
in the vector preheader
since the vector loop region only executes once.
2025-08-21 07:34:54 +08:00
Florian Hahn
7d33743324
[LV] Add tests for narrowing interleave groups with scalable vectors. 2025-08-20 22:31:24 +01:00
Florian Hahn
b0d0e04693
[LV] Add test where we choose VF * IC is larger than trip count. 2025-08-20 20:40:49 +01:00
Florian Hahn
dc23869f98
[LV] Handle vector trip count being zero in preparePlanForEpiVectorLoop.
After a485e0e, we may not set the vector trip count in
preparePlanForEpilogueVectorLoop if it is zero. We should not choose a
VF * UF that makes the main vector loop dead (i.e. vector trip count is
zero), but there are some cases where this can happen currently.

In those cases, set EPI.VectorTripCount to zero.
2025-08-20 11:54:22 +01:00
Florian Hahn
23ea79de61
[LV] Add more tests for costs of predicated udivs and calls.
Adds missing test coverage for the cost model. Also reduce the size of
check lines a bit, by using a common prefix and filtering out after
scalar.ph.
2025-08-19 20:04:31 +01:00
David Sherwood
13d8ba7dea
[LV][TTI] Calculate cost of extracting last index in a scalable vector (#144086)
There are a couple of places in the loop vectoriser where we
want to calculate the cost of extracting the last lane in a
vector. However, we wrongly assume that asking for the cost
of extracting lane (VF.getKnownMinValue() - 1) is an accurate
representation of the cost of extracting the last lane. For
SVE at least, this is non-trivial as it requires the use of
whilelo and lastb instructions.

To solve this problem I have added a new
getReverseVectorInstrCost interface where the index is used
in reverse from the end of the vector. Suppose a vector has
a given ElementCount EC, the extracted/inserted lane would be
EC - 1 - Index. For scalable vectors this index is unknown at
compile time. I've added a AArch64 hook that better represents
the cost, and also a RISCV hook that maintains compatibility
with the behaviour prior to this PR.

I've also taken the liberty of adding support in vplan for
calculating the cost of VPInstruction::ExtractLastElement.
2025-08-19 09:31:37 +01:00