3111 Commits

Author SHA1 Message Date
Florian Hahn
8fa440a1e0
[LV] Add tests for speculatively loading ptrs with UB/poison ops.
Test cases for https://github.com/llvm/llvm-project/issues/142957.
2025-06-06 21:18:41 +01:00
Paul Walker
6955a7d134 [NFC][LLVM][Instrumentation][LoopVectorize] Regenerate test checks. 2025-06-05 11:38:30 +00:00
Luke Lau
5458ea5122 [LV] Regenerate UTC variable names in RISCV/interleaved-accesses.ll. NFC 2025-06-04 01:07:32 +01:00
Ramkumar Ramachandra
86f18d394e
[LV] Re-org tests; introduce iv-select-cmp-decreasing.ll (#141769)
Having FindFirstIV tests in if-reduction.ll is misleading, and
iv-select-cmp.ll is already too large.
2025-06-03 18:13:51 +01:00
Florian Hahn
11713e86b0
[LV] Move VPlan-based calculateRegisterUsage to VPlanAnalysis (NFC). (#135673)
Move VPlan-based calculateRegisterUsage from LoopVectorize
to VPlanAnalysis.cpp. It is a VPlan-based analysis and this helps
to reduce the size of LoopVectorize.

PR: https://github.com/llvm/llvm-project/pull/135673
2025-06-02 17:40:50 +01:00
Ramkumar Ramachandra
b8c4eea3d8
[VPlan] Simplify PredPHI LiveIn -> LiveIn (#142271)
5f39be5 ([VPlan] Use InstSimplifyFolder instead of TargetFolder) updated
simplifyRecipe to fold live-ins to Values that are not necessarily
Constant, but forgot to update the corresponding PredPHI folder, which
still folds PredPHI constant -> constant. Update it to fold PredPHI
LiveIn -> LiveIn.

Fixes #141968.
2025-06-02 14:56:35 +01:00
Florian Hahn
0ba63b2f22
[SCEV] Add additional test coverage for loop-guards reasoning.
Add additional tests showing missed opportunities when using loop guards
for reasoning in SCEV, depending on the order the guards appear in the
IR.
2025-06-01 22:39:37 +01:00
Madhur Amilkanthwar
67ff713052
[NFC][LV] Remove incorrect comment about lack of support (#142126) 2025-05-30 13:25:55 +02:00
Florian Hahn
9ea4924720
[VPlan] Use EMIT-SCALAR for single-scalar VPPhis (NFC).
Follow-up to https://github.com/llvm/llvm-project/pull/141428, to also
use EMIT-SCALAR for VPPhis that are single scalars.
2025-05-29 11:20:07 +01:00
Florian Hahn
5b85e4b08d
[VPlan] Use EMIT-SCALAR when printing single-scalar VPInstructions. (#141428)
By using SINGLE-SCALAR when printing, it is clear in the debug output
that those VPInstructions only produce a single scalar.

Split off in preparation for
https://github.com/llvm/llvm-project/pull/140623.

PR: https://github.com/llvm/llvm-project/pull/141428
2025-05-29 09:29:06 +01:00
Elvis Wang
332fe08f1d
[VPlan] Implement VPlan-based cost model for VPReduction, VPExtendedReduction and VPMulAccumulateReduction. (#113903)
This patch implement the VPlan-based cost model for VPReduction,
VPExtendedReduction and VPMulAccumulateReduction.

With this patch, we can calculate the reduction cost by the VPlan-based
cost model so remove the reduction costs in `precomputeCost()`.

Ref: Original instruction based implementation:
https://reviews.llvm.org/D93476
2025-05-29 11:15:16 +08:00
Florian Hahn
440a8adb86
[VPlan] Use VPIRFlags to manage FMFs for ComputeReductionResult (NFC).
Manage fast-math flags using VPIRFlags from VPInstruciton, in inline
with other VPInstructions. With this change, we now print the correctly
flags for ComputeReductionResult, other than that NFC.
2025-05-28 20:54:58 +01:00
Paul Walker
9aebf4c399 [NFC][LLVM] Tests for vectorisation of loops with vscale base trip counts. 2025-05-28 12:42:41 +00:00
Ramkumar Ramachandra
5f39be5917
[VPlan] Use InstSimplifyFolder instead of TargetFolder (#141222)
For more powerful folding with operands that are not necessarily
all-constant, use InstSimplifyFolder instead of TargetFolder in
tryToConstantFold, and rename the function tryToFoldLiveIns.
2025-05-28 11:00:14 +02:00
Florian Hahn
d56deea1e4
[VPlan] Connect Entry to scalar preheader during initial construction. (#140132)
Update initial construction to connect the Plan's entry to the scalar
preheader during initial construction. This moves a small part of the
 skeleton creation out of ILV and will also enable replacing
 VPInstruction::ResumePhi with regular VPPhi recipes.

Resume phis need 2 incoming values to start with, the second being the
bypass value from the scalar ph (and used to replicate the incoming
value for other bypass blocks). Adding the extra edge ensures we
incoming values for resume phis match the incoming blocks.

PR: https://github.com/llvm/llvm-project/pull/140132
2025-05-27 16:07:56 +01:00
Luke Lau
841c8d48a6 [LV] Add tests for more interleave group factors on AArch64 and RISC-V. NFC
The plan is to eventually add support for scalably vectorizing these for
non-power-of-2 factors, see https://github.com/llvm/llvm-project/pull/139893

Simultaneously, we need to add a test to make sure we don't generate
@llvm.vector.[de]interleave3 for AArch64 if we can't lower it (yet)
2025-05-26 18:21:27 +01:00
Philip Reames
041d189f01
[RISCV][TTI] Adjust costing in getPartialReductionCost for zvqdotq (#141430)
Two changes:

1) Handle fixed vector cases now that 77a3f8 has landed. 
2) Fix a mistake in the original costing - the VF passed in is the
   input VF, not the output VF.  Given that we should be costing the
   accumulator type with VF/4.

Note that (2) does not cause any visible test differences as the
vectorizer (outside of maximize-bandwidth mode) does not consider wide
enough VF for the costing difference to matter.
2025-05-26 08:23:56 -07:00
Florian Hahn
dcef154b5c
[VPlan] Replace VPRegionBlock with explicit CFG before execute (NFCI). (#117506)
Building on top of https://github.com/llvm/llvm-project/pull/114305,
replace VPRegionBlocks with explicit CFG before executing.

This brings the final VPlan closer to the IR that is generated and
helps to simplify codegen.

It will also enable further simplifications of phi handling during
execution and transformations that do not have to preserve the 
canonical IV required by loop regions. This for example could include
replacing the canonical IV with an EVL based phi while completely
removing the original canonical IV.

PR: https://github.com/llvm/llvm-project/pull/117506
2025-05-24 19:17:16 +01:00
Florian Hahn
e089d48944
[VPlan] VPWidenGEPRecipe uses first lane of invariant indices (NFC)
Update VPWidenGEPRecipe::onlyFirstLaneUsed to return true for indices
that are defined outside the loop regions, if the base pointer is not
invariant.
2025-05-24 17:32:05 +01:00
Florian Hahn
69f2ff3e9b
[LV] Add test case showing unnecessary broadcast of invariant GEP idx. 2025-05-24 15:19:21 +01:00
Luke Lau
4b4699a13c
[InstCombine] Don't cover up poison elements for shifts when folding shuffles thru binops (#141303)
As noted in the TODO, we don't need to cover up the poison elements
placed in the unused lanes for shifts, since it's not UB unlike div/rem.

New poison elements are only introduced in cases like

ShMask = <1,1,2,2> and C = <5,5,6,6> --> NewC = <poison,5,6,poison>

And the resulting shuffle won't use the poison lanes.
2025-05-24 13:47:18 +01:00
Florian Hahn
a9b2998e31
[VPlan] Skip cost assert if VPlan converted to single-scalar recipes.
Check if a VPlan transform converted recipes to single-scalar
VPReplicateRecipes (after 07c085af3efcd67503232f99a1652efc6e54c1a9). If
that's the case, the legacy cost model incorrectly overestimates the cost.

Fixes https://github.com/llvm/llvm-project/issues/141237.
2025-05-24 11:09:27 +01:00
Philip Reames
a21fb74c0c
[RISCV][TTI] Implement getPartialReductionCost for the vqdotq cases (#140974)
Doing so tells the loop vectorizer that the partial.reduce intrinsic is
profitable to use over the plain extend/multiply/reduce.add sequence.
2025-05-23 07:15:06 -07:00
Florian Hahn
95ba5508e5
Reapply "[VPlan] Move predication to VPlanTransform (NFC). (#128420)"
This reverts commit 793bb6b257fa4d9f4af169a4366cab3da01f2e1f.

The recommitted version contains a fix to make sure only the original
phis are processed in convertPhisToBlends nu collecting them in a vector
first. This fixes a crash when no mask is needed, because there is only
a single incoming value.

Original message:
This patch moves the logic to predicate and linearize a VPlan to a
dedicated VPlan transform. It mostly ports the existing logic directly.

There are a number of follow-ups planned in the near future to
further improve on the implementation:
* Edge and block masks are cached in VPPredicator, but the block masks
are still made available to VPRecipeBuilder, so they can be accessed
during recipe construction. As a follow-up, this should be replaced by
adding mask operands to all VPInstructions that need them and use that
during recipe construction.
* The mask caching in a map also means that this map needs updating each
time a new recipe replaces a VPInstruction; this would also be handled
by adding mask operands.

PR: https://github.com/llvm/llvm-project/pull/128420
2025-05-22 08:16:15 +01:00
Mohammad Bashir
bcdce987c0
Fix regression tests with bad FileCheck checks (#140373)
Fixes https://github.com/llvm/llvm-project/issues/140149
2025-05-22 07:59:57 +03:00
Philip Reames
c21416d1f9 [RISCV][TTI] Add test coverage for getPartialReductionCost [nfc]
Adding testing in advance of a change to cost the zvqdotq instructions
such that we emit them from LV.
2025-05-21 15:12:23 -07:00
Florian Hahn
bf15aadcbc
[VPlan] Don't try to narrow predicated VPReplicateRecipe.
We cannot convert predicated recipes to uniform ones at the moment.
This fixes a crash reported for https://github.com/llvm/llvm-project/pull/139150.
2025-05-21 22:13:55 +01:00
Ramkumar Ramachandra
cf1f116f78
[VPlan] Introduce constant folder in simplifyRecipe (#125365)
Introduce a VPlan-level constant folder in simplifyRecipe that tries to
fold a recipe to a constant using TargetFolder.
2025-05-20 14:16:01 +01:00
Sam Tebbs
70501ed2f0
[LoopVectorizer] Prune VFs based on plan register pressure (#132190)
This PR moves the register usage checking to after the plans are
created, so that any recipes that optimise register usage (such as
partial reductions) can be properly costed and not have their VF pruned
unnecessarily.

Depends on https://github.com/llvm/llvm-project/pull/137746
2025-05-19 13:27:17 +01:00
Florian Hahn
07c085af3e
[VPlan] Add narrowToSingleScalarRecipe transform. (#139150)
Add a new convertToUniformRecipes transform which uses VPlan-based
uniformity analysis to determine if wide recipes and replicate recipes
can be converted to uniform recipes.

There are a few places where we ad-hoc convert recipes to uniform
recipes, which this transform will eventually replace. There are a few
more generalizations required to do so which I plan to do as follow-ups.

By converting the recipes to uniform recipes, we effectively materialize
the information from the VPlan-based analysis.

Note that there is one regression at the moment in SystemZ/pr47665.ll
due to trivial constant folding opportunities in the input IR. This will
be fixed by VPlan-based constant folding
(https://github.com/llvm/llvm-project/pull/125365/)

PR: https://github.com/llvm/llvm-project/pull/139150
2025-05-18 09:32:27 +01:00
Florian Hahn
ba93685ea2
[VPlan] Also use original parent loop for exit VPBBs.
When vectorizing loops with early exits that is nested within another
one, one of the loop exits may be outside both loops, so setting adding
it to the parent loop is incorrect. Also use the original parent loop
for exit blocks.
2025-05-16 21:12:39 +01:00
Elvis Wang
664c937b43
[VPlan] Implement VPExtendedReduction, VPMulAccumulateReductionRecipe and corresponding vplan transformations. (#137746)
This patch introduce two new recipes.

* VPExtendedReductionRecipe
  - cast + reduction.

* VPMulAccumulateReductionRecipe
  - (cast) + mul + reduction.

This patch also implements the transformation that match following
patterns via vplan and converts to abstract recipes for better cost
estimation.

* VPExtendedReduction
  - reduce(cast(...))

* VPMulAccumulateReductionRecipe
  - reduce.add(mul(...))
  - reduce.add(mul(ext(...), ext(...))
  - reduce.add(ext(mul(ext(...), ext(...))))

The converted abstract recipes will be lower to the concrete recipes
(widen-cast + widen-mul + reduction) just before recipe execution.

Note that this patch still relies on legacy cost model the calculate the
cost for these patters.
Will enable vplan-based cost decision in #113903.

Split from #113903.
2025-05-16 10:25:38 +08:00
Min-Yih Hsu
0ab67ec191
[LV][EVL] Introduce the EVLIndVarSimplify Pass for EVL-vectorized loops (#131005)
When we enable EVL-based loop vectorization w/ predicated tail-folding,
each vectorized loop has effectively two induction variables: one
calculates the step using (VF x vscale) and the other one increases the
IV by values returned from experiment.get.vector.length. The former,
also known as canonical IV, is more favorable for analyses as it's
"countable" in the sense of SCEV; the latter (EVL-based IV), however, is
more favorable to codegen, at least for those that support scalable
vectors like AArch64 SVE and RISC-V.

The idea is that we use canonical IV all the way until the end of all
vectorizers, where we replace it with EVL-based IV using EVLIVSimplify
introduced here. Such that we can have the best from both worlds.

This Pass is enabled by default in RISC-V. However, since we haven't
really vectorize loops with predicate tail-folding by default, this Pass
is no-op at this moment.
2025-05-14 13:49:50 -07:00
Florian Hahn
7a9fd62278
[VPlan] Use VPlan operand order for VPBlendRecipes. (#139475)
Don't use the order of incoming values of IR phis when creating 
VPBlendRecipes. Instead, simply use the incoming operands and
blocks from the VPWidenPHIRecipe.

Note that this changes the order of the incoming operands/masks for some
blends.

PR: https://github.com/llvm/llvm-project/pull/139475
2025-05-14 14:56:35 +01:00
Florian Hahn
5fa64d65e9
[VPlan] Use printPhiOperands for VPPhi.
Split off from  https://github.com/llvm/llvm-project/pull/139151 to land
printing improvements separately.

Updates printing of VPPhi operands to be consistent with
VPWidenPHIRecipe.
2025-05-10 12:49:29 +01:00
Florian Hahn
8c6c525a6b
[LV] Don't consider FORs as profitable to scalarize.
Fixed-order recurrence phis cannot be scalarized, they will always be
widened at the moment. Make sure they are not incorrectly considered
profitable to scalarize, similar to 41c1a7be3f1a2556e.

Fixes https://github.com/llvm/llvm-project/issues/139060.
Fixes https://github.com/llvm/llvm-project/issues/139065.
2025-05-09 20:29:22 +01:00
Ramkumar Ramachandra
f058333941
[LV] Regen a test with UTC (#139235) 2025-05-09 14:26:20 +01:00
Florian Hahn
e854c381c6
[VPlan] Manage noalias/alias_scope metadata in VPlan. (#136450)
Use VPIRMetadata added in
https://github.com/llvm/llvm-project/pull/135272
to also manage no-alias metadata added by versioning.

Note that this means we have to build the no-alias metadata up-front
once. If it is not used, it will be discarded automatically.

This also fixes a case where incorrect metadata was added to wide
loads/stores that got converted from an interleave group.

Compile-time impact is neutral:

https://llvm-compile-time-tracker.com/compare.php?from=38bf1af41c5425a552a53feb13c71d82873f1c18&to=2fd7844cfdf5ec0f1c2ce0b9b3ae0763245b6922&stat=instructions:u
2025-05-09 11:19:12 +01:00
Florian Hahn
d06d43a9e8
[VPlan] Add printPhiOperands to VPPhiAccessors, use for wide phis.
(NFC modulo debug output changes)

Add generic helper to print phi operands (incoming values) together with
their incoming blocks.

As more and more transforms are added, keeping the incoming blocks of
phis becomes more important. Print incoming blocks via VPPhiAcessors, to
make debugging easier.
2025-05-08 20:56:48 +01:00
Florian Hahn
339dc9500b
[VPlan] Retain exit conditions and edges in initial VPlan (NFC). (#137709)
Update initial VPlan construction to include exit conditions and edges.

The loop region is now first constructed without entry/exiting. Those
are set after inserting the region in the CFG, to preserve the original
predecessor/successor order of blocks.

For now, all early exits are disconnected before forming the regions,
but a follow-up will update uncountable exit handling to also happen
here. This is required to enable VPlan predication and remove the
dependence any IR BBs
(https://github.com/llvm/llvm-project/pull/128420).

PR: https://github.com/llvm/llvm-project/pull/137709
2025-05-08 18:10:52 +01:00
Ramkumar Ramachandra
c4f723a7c3
[LV] Strip unmaintainable MinBWs assert (#136858)
tryToWiden attempts to replace an Instruction with a Constant from SCEV,
but forgets to erase the Instruction from the MinBWs map, leading to an
assert in VPlanTransforms::truncateToMinimalBitwidths. Going forward,
the assertion in truncateToMinimalBitwidths is unmaintainable, as LV
could simplify the expression at any point: fix the bug by stripping the
unmaintable assertion.

Fixes #125278.
2025-05-08 11:49:54 +01:00
Luke Lau
1484f82cbc
[VPlan] Add VPInstruction::StepVector and use it in VPWidenIntOrFpInductionRecipe (#129508)
Split off from #118638, this adds VPInstruction::StepVector, which
generates integer step vectors (0,1,2,...,VF). This is a step towards
eventually modelling all the separate parts of
VPWidenIntOrFpInductionRecipe in VPlan.

This is then used by VPWidenIntOrFpInductionRecipe, where we materialize
it just before unrolling so the operands stay in a fixed position.

The need for a separate operand in VPWidenIntOrFpInductionRecipe, as
well as the need to update it in
optimizeVectorInductionWidthForTCAndVFUF, should be removed with #118638
when everything is expanded in convertToConcreteRecipes.
2025-05-08 18:47:44 +08:00
Florian Hahn
127f48668b
[LV] Add test showing incorrect metadata merging when narrowing IGs.
Add test showing that incorrect tbaa metadata is added to the widened
loads and stores when narrowing interleave groups.

The widened loads/stores currently have the TBAA metadata of the first
load/store, even though the wide accesses also access data with types of
the second load/store.
2025-05-08 11:13:25 +01:00
Paul Walker
01813e8929
[LLVM][VecLib] Refactor LIBMVEC integration to be target neutral. (#138262)
Renames LIBMVEC-X86 to LIBMVEC and updates TLI to only add the existing
x86 specific mapping when targeting x86.
2025-05-07 11:05:25 +01:00
Min-Yih Hsu
e0537c0768
[LV][EVL] Attach a new metadata on EVL vectorized loops (#131000)
This patch attaches a new metadata, `llvm.loop.isvectorized.withevl`, on
loops vectorized with explicit vector length. This will help other
optimizations down in the pipeline that focus on EVL-vectorized loop

This approach is much safer than, said IR pattern matching to figure out
if a loop is EVL-vectorized or not.
2025-05-06 10:06:37 -07:00
Maryam Moghadas
a750893fea
[VPlan][LV] Fix invalid truncation in VPScalarIVStepsRecipe (#137832)
Replace CreateTrunc with CreateSExtOrTrunc in VPScalarIVStepsRecipe to
safely handle type conversion. This prevents assertion failures from
invalid truncation when StartIdx0 has a smaller integer type than
IntStepTy. The assertion was introduced by commit 783a846.
Fixes https://github.com/llvm/llvm-project/issues/137185
2025-05-06 12:48:21 -04:00
Florian Hahn
9a26b2903b
[VPlan] Don't rely on region check in isUniformAfterVectorization. (#137883)
Generalize isUniformAfterVectorization check to not rely on the region,
but purely work on checking operands and opcodes.

This will be needed when disolving the vector region
(https://github.com/llvm/llvm-project/pull/117506) and improves codegen
slightly in some cases.

PR: https://github.com/llvm/llvm-project/pull/137883
2025-05-02 15:42:21 +01:00
Sam Tebbs
2876dbcd66
[AArch64] Don't allow mixed partial reductions without i8mm (#137602)
Partial reductions with mixed extends should only be allowed if i8mm is
present.
2025-05-01 16:06:37 +01:00
Samuel Tebbs
fa769655e7 [LV] NFC: Make VPPartialReductionRecipe a VPReductionRecipe 2025-04-30 19:44:40 +01:00
Luke Lau
2cd829fc2c
[VectorUtils][VPlan] Consolidate VPWidenIntrinsicRecipe::onlyFirstLaneUsed and isVectorIntrinsicWithScalarOpAtArg (#137497)
We can reuse isVectorIntrinsicWithScalarOpAtArg in VectorUtils to
determine if only the first lane will be used for a
VPWidenIntrinsicRecipe, provided that we also move the VP EVL operand
check into it.

This was needed by a local patch I was working on that created a
VPWidenIntrinsicRecipe with a VP intrinsic, and prevents the need to
update the scalar arguments in two places.
2025-05-01 01:25:41 +08:00