297 Commits

Author SHA1 Message Date
Kazu Hirata
fae34938f6
[llvm] Use *Set::insert_range (NFC) (#132591)
DenseSet, SmallPtrSet, SmallSet, SetVector, and StringSet recently
gained C++23-style insert_range.  This patch uses insert_range with
iterator ranges.  For each case, I've verified that foos is defined as
make_range(foo_begin(), foo_end()) or in a similar manner.
2025-03-22 22:14:45 -07:00
Florian Hahn
dfa665f19c
[VPlan] Add transformation to narrow interleave groups. (#106441)
This patch adds a new narrowInterleaveGroups transfrom, which tries
convert a plan with interleave groups with VF elements to a plan that
instead replaces the interleave groups with wide loads and stores
processing VF elements.

This effectively is a very simple form of loop-aware SLP, where we
use interleave groups to identify candidates.

This initial version is quite restricted and hopefully serves as a
starting point for how to best model those kinds of transforms.

Depends on https://github.com/llvm/llvm-project/pull/106431.

Fixes https://github.com/llvm/llvm-project/issues/82936.

PR: https://github.com/llvm/llvm-project/pull/106441
2025-03-22 21:40:17 +00:00
Kazu Hirata
3520dc5e7a [Vectorize] Fix a build
This patch fixes:

  llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:2263:19: error:
  expected ';' after return statement
2025-03-20 12:52:27 -07:00
Florian Hahn
c73ad7ba20
[VPlan] Add transformation to narrow interleave groups.
This patch adds a new narrowInterleaveGroups transfrom, which tries
convert a plan with interleave groups with VF elements to a plan that
instead replaces the interleave groups with wide loads and stores
processing VF elements.

This effectively is a very simple form of loop-aware SLP, where we
use interleave groups to identify candidates.

This initial version is quite restricted and hopefully serves as a
starting point for how to best model those kinds of transforms. For now
it only transforms load interleave groups feeding store groups.

Depends on #106431.

This lands the main parts of the approved
https://github.com/llvm/llvm-project/pull/106441 as suggested to break
things up a bit more.
2025-03-20 19:41:37 +00:00
Florian Hahn
2e13ec561c
[VPlan] Bail out on non-intrinsic calls in VPlanNativePath.
Update initial VPlan-construction in VPlanNativePath in line with the
inner loop path, in that it bails out when encountering constructs it
cannot handle, like non-intrinsic calls.

Fixes https://github.com/llvm/llvm-project/issues/131071.
2025-03-19 21:35:15 +00:00
Florian Hahn
870f753f1f
[VPlan] Also materialize broadcasts for backedge-taken-counts (NFC).
Also include VPlan's BTC in the set of VPValues to materialize
broadcasts for, if it is used.
2025-03-18 22:35:18 +00:00
Florian Hahn
d51bc83511
[VPlan] Only skip live-ins with constants in materializeBroadccast (NFC)
Currently this should be NFC, but will be needed in future patches.
2025-03-18 20:23:16 +00:00
Luke Lau
a4dc02c0e7
[VPlan] Rename VPReverseVectorPointerRecipe to VPVectorEndPointerRecipe. NFC (#131086)
After #128718 lands there will be two ways of performing a reversed
widened memory access, either by performing a consecutive unit-stride
access and a reverse, or a strided access with a negative stride.

Even though both produce a reversed vector, only the former needs
VPReverseVectorPointerRecipe which computes a pointer to the last
element of each part. A strided reverse still needs a pointer to the
first element of each part so it will use VPVectorPointerRecipe.

This renames VPReverseVectorPointerRecipe to VPVectorEndPointerRecipe to
clarify that a reversed access may not necessarily need a pointer to the
last element.
2025-03-19 00:09:15 +08:00
David Sherwood
3b6d0093aa
[LV][NFC] Refactor code for extracting first active element (#131118)
Refactor the code to extract the first active element of a
vector in the early exit block, in preparation for PR #130766.
I've replaced the VPInstruction::ExtractFirstActive nodes with
a combination of a new VPInstruction::FirstActiveLane node and
a Instruction::ExtractElement node.
2025-03-14 11:14:09 +00:00
Florian Hahn
02575f887b
[VPlan] Use VPInstruction for VPScalarPHIRecipe. (NFCI) (#129767)
Now that all phi nodes manage their incoming blocks through the
VPlan-predecessors, there should be no need for having a dedicate
recipe, it should be sufficient to allow PHI opcodes in VPInstruction.

Follow-ups will also migrate VPWidenPHIRecipe and possibly others,
building on top of https://github.com/llvm/llvm-project/pull/129388.

PR: https://github.com/llvm/llvm-project/pull/129767
2025-03-13 18:35:07 +00:00
Florian Hahn
62994c3291
[VPlan] Also introduce explicit broadcasts for values from entry VPBB.
Update and generalize materializeBroadcasts to also introduce explicit
broadcasts for VPValues defined in the Plans Entry block.

This fixes a crash when trying to insert the broadcasts generated by
VPTransformState::get after the generating instruction, which isn't
possible after invoke instructions.

Fixes https://github.com/llvm/llvm-project/issues/128838.
2025-03-12 22:03:19 +00:00
Florian Hahn
8132c4f554
[VPlan] Also introduce broadcasts for live-ins used in vec preheader.
Slightly generalize materializeLiveInBroadcasts to also introduce
broadcasts for live-ins used in the vector preheader. This should cover
all live-ins.

If the live-in is used in the vector preheader, insert the broadcast at
the beginning of the block.
2025-03-11 21:19:14 +00:00
David Sherwood
055db3ec33
[LV] Optimise latch exit induction users for some early exit loops (#128880)
This is the first of two PRs that attempts to improve the IR
generated in the exit blocks of vectorised loops with uncountable
early exits. In this PR I am improving the generated code for
users of induction variables in early exit loops that have a
unique exit block, when exiting via the latch.

I have moved some of the code for calculating the exit values in
latch exit blocks from `optimizeInductionExitUsers` into a new
function `optimizeLatchExitInductionUser`.

I intend to follow this up very soon with another patch to
optimise the code for induction users in the vector.early.exit
block.
2025-03-11 10:13:16 +00:00
Florian Hahn
8dd160f476
Revert "[VPlan] Fold NOT into predicate of wide compares." (#130347)
Reverts llvm/llvm-project#129430

this seems to have introduced a divergence between legacy and
VPlan-based cost model

https://lab.llvm.org/buildbot/#/builders/30/builds/17159
2025-03-07 21:18:49 +00:00
Florian Hahn
cb3ce30ca8
[VPlan] Fold NOT into predicate of wide compares. (#129430)
Add simplification to fold negation into a compare, if the negation is
the only user of the compare. This removes a number of redundant
negations.

Alive2 Proofs for FPCMP test changes:  https://alive2.llvm.org/ce/z/WGDz9U

PR: https://github.com/llvm/llvm-project/pull/129430
2025-03-07 20:32:43 +00:00
Florian Hahn
b2d70e8796
[VPlan] Use Builder to create cast recipes in VPlanTransforms (NFC).
Use VPBuilder in a few more places. This avoids manual insertions and
will make changing the cast recipe easier in the future.
2025-03-04 13:39:12 +00:00
Florian Hahn
15770a1e9d
[VPlan] Remove dead recipes in entry when merging regions. (NFC)
Also remove recipes in the entry of the region that will be removed.
This makes sure we don't leave any dead users around. NFC at the moment,
but avoids causing issues in the future.
2025-03-04 08:26:27 +00:00
Mel Chen
9b4ad2fe50
[LV][EVL] Support fixed-order recurrence idiom with EVL tail folding. (#124093)
This patch converts the llvm.vector.splice intrinsic to
llvm.experimental.vp.splice, ensuring that fixed-order recurrences
execute correctly when tail folding by EVL is enable.
Due to the non-VFxUF penultimate EVL issue, the EVL from the previous
iteration will be preserved and used in llvm.experimental.vp.splice.
2025-03-03 21:27:13 +08:00
Florian Hahn
6ce41db6b0
[VPlan] Preserve DebugLoc for VPBranchOnMaskRecipe.
Update code to set and generate debug location for branch recipe
2025-02-27 20:19:42 +00:00
Florian Hahn
1e1b9bccc0
[VPlan] Simplify BLEND %a, %b, NOT(%m) -> BLEND %b, %a, %m. (#128375)
Avoid negations for normalized blends by reordering operands.

PR: https://github.com/llvm/llvm-project/pull/128375
2025-02-27 17:43:24 +00:00
Florian Hahn
4277c21059
[VPlan] Introduce explicit broadcasts for live-ins. (#124644)
Add a new VPInstruction::Broadcast opcode and use it to materialize
explicit broadcasts of live-ins. The initial patch only materlizes the
broadcasts if the vector preheader dominates all uses that need it.
Later patches will pick the best valid insert point, thus retiring
implicit hoisting of broadcasts from VPTransformsState::get().

PR: https://github.com/llvm/llvm-project/pull/124644
2025-02-26 13:57:51 +00:00
Florian Hahn
522b05afb6
[VPlan] Construct immutable VPIRBBs for exit blocks at construction(NFC) (#128374)
Constract immutable VPIRBasicBlocks for all exit blocks up front and
keep a list of them. Same as the scalar header, they are leaf nodes of
the VPlan and won't change. Some exit blocks may be unreachable, e.g. if
the scalar epilogue always executes or depending on optimizations.

This simplifies both the way we retrieve the exit blocks as well as
hooking up the exit blocks.

PR: https://github.com/llvm/llvm-project/pull/128374
2025-02-25 14:23:27 +00:00
Florian Hahn
baa77e30f0
[LV] Remove some redundant casts (NFC). 2025-02-24 21:46:29 +00:00
Luke Lau
e23ab73335
[VPlan] Don't convert widen recipes to VP intrinsics in EVL transform (#127180)
This is a copy of #126177, since it was automatically and permanently
closed because I messed up the source branch on my remote

This patch proposes to avoid converting widening recipes to VP
intrinsics during the EVL transform.

IIUC we initially did this to avoid `vl` toggles on RISC-V. However we
now have the RISCVVLOptimizer pass which mostly makes this redundant.

Emitting regular IR instead of VP intrinsics allows more generic
optimisations, both in the middle end and DAGCombiner, and we generally
have better patterns in the RISC-V backend for non-VP nodes. Sticking to
regular IR instructions is likely a lot less work than reimplementing
all of these optimisations for VP intrinsics, and on SPEC CPU 2017 we get
noticeably better code generation.
2025-02-22 19:38:11 +08:00
Florian Hahn
6ff8a06de9
[VPlan] Run recipe removal and simplification after optimizeForVFAndUF. (#125926)
Run recipe simplification and dead recipe removal after VPlan-based
unrolling and optimizeForVFAndUF, to clean up any redundant or dead
recipes introduced by them. Currently this is NFC, as it removes the
corresponding removeDeadRecipes run in optimizeForVFAndUF and no
additional simplifications kick in after unrolling yet. That is changing
with https://github.com/llvm/llvm-project/pull/123655.

Note that with this change, pattern-matching is now applied after
EVL-based recipes have been introduced.

Trying to match VPWidenEVLRecipe when not explicitly requested might
apply a pattern with 2 operands to one with 3 due to the extra EVL
operand and VPWidenEVLRecipe being a subclass of VPWidenRecipe.

To prevent this, update Recipe_match::match to only match
VPWidenEVLRecipe if it is in the requested recipe types (RecipeTy).

PR: https://github.com/llvm/llvm-project/pull/125926
2025-02-08 13:33:46 +00:00
Florian Hahn
ee806646ad
[VPlan] Consistently use hasScalarVFOnly (NFC).
Consistently use hasScalarVFOnly instead of using
hasVF(ElementCount::getFixed(1)). Also add an assert to ensure all cases
are covered by hasScalarVFOnly.
2025-02-08 12:19:25 +00:00
Florian Hahn
5008277322
[VPlan] Move auxiliary declarations out of VPlan.h (NFC). (#124104)
Nothing in VPlan.h directly depends on VPTransformState, VPCostContext,
VPFRange, VPlanPrinter or VPSlotTracker. Move them out to a separate
header to reduce the size of widely used VPlan.h.

This is a first step towards more cleanly separating declarations in
VPlan.

Besides reducing VPlan.h's size, this also allows including additional
VPlan-related headers in VPlanHelpers.h for use there. An example is
using VPDominatorTree in VPTransformState
(https://github.com/llvm/llvm-project/pull/117138).

PR: https://github.com/llvm/llvm-project/pull/124104
2025-02-02 13:44:07 +00:00
David Sherwood
3bc2dade36
[LoopVectorize] Enable vectorisation of early exit loops with live-outs (#120567)
This work feeds part of PR
https://github.com/llvm/llvm-project/pull/88385, and adds support for
vectorising
loops with uncountable early exits and outside users of loop-defined
variables. When calculating the final value from an uncountable early
exit we need to calculate the vector lane that triggered the exit,
and hence determine the value at the point we exited.

All code for calculating the last value when exiting the loop early
now lives in a new vector.early.exit block, which sits between the
middle.split block and the original exit block. Doing this required
two fixes:

1. The vplan verifier incorrectly assumed that the block containing
a definition always dominates the block of the user. That's not true
if you can arrive at the use block from multiple incoming blocks.
This is possible for early exit loops where both the early exit and
the latch jump to the same block.
2. We were adding the new vector.early.exit to the wrong parent loop.
It needs to have the same parent as the actual early exit block from
the original loop.

I've added a new ExtractFirstActive VPInstruction that extracts the
first active lane of a vector, i.e. the lane of the vector predicate
that triggered the exit.

NOTE: The IR generated for dealing with live-outs from early exit
loops is unoptimised, as opposed to normal loops. This inevitably
leads to poor quality code, but this can be fixed up later.
2025-01-30 10:37:00 +00:00
Florian Hahn
2b55ef187c
[VPlan] Add helper to run VPlan passes, verify after run (NFC). (#123640)
Add new runPass helpers to run a VPlan transformation. This makes it
easier to add additional checks/functionality for each transform run. In
this patch, an option is added to run the verifier after each VPlan
transform.

Follow-ups will use the same helper to also support printing VPlans
after each transform.

Note that the verifier at the moment requires there to be a canonical IV
and vector loop region, so the final lowering transforms aren't run via
runPass yet.

PR: https://github.com/llvm/llvm-project/pull/123640
2025-01-29 10:50:01 +00:00
Florian Hahn
09a29fcc8d
[VPlan] Don't collect live-ins in collectUsersInExitBlocks. (NFC) (#123819)
Live-ins don't need to be handled, other than adding to the exit phi
recipe. Do that early and assert that otherwise the exit value is
defined in the vector loop region.

This should enable simply skipping other exit values that do not need
further fixing, e.g. if handling the exit value from the early exit
directly in handleUncountableEarlyExit.

PR: https://github.com/llvm/llvm-project/pull/123819
2025-01-27 16:12:07 +00:00
Florian Hahn
2c87133c62
Reapply "[VPlan] Update final IV exit value via VPlan. (#112147)"
This reverts the revert commit 58326f1d5b5b379590af92dd129b2f3b3e96af46.

The build failure in sanitizer stage2 builds has been fixed with
0d39fe6f5bb3edf0bddec09a8c6417377390aeac.

Original commit message:
Model updating IV users directly in VPlan, replace fixupIVUsers.

Now simple extracts are created for all phis in the exit block during
initial VPlan construction. A later VPlan transform
(optimizeInductionExitUsers) replaces extracts of inductions with
their pre-computed values if possible.

This completes the transition towards modeling all live-outs directly in
VPlan.

There are a few follow-ups:
* emit extracts initially also for resume phis, and optimize them
   tougher with IV exit users
* support for VPlans with multiple exits in optimizeInductionExitUsers.

Depends on https://github.com/llvm/llvm-project/pull/110004,
https://github.com/llvm/llvm-project/pull/109975 and
https://github.com/llvm/llvm-project/pull/112145.
2025-01-19 19:32:03 +00:00
Florian Hahn
58326f1d5b
Revert "[VPlan] Update final IV exit value via VPlan. (#112147)"
This reverts commit c2d15ac4d4432788557e77c15ce572ac655a8fec.

Causes build failures on PPC stage2 & fuchsia bots
    https://lab.llvm.org/buildbot/#/builders/168/builds/7650
    https://lab.llvm.org/buildbot/#/builders/11/builds/11248
2025-01-18 13:40:33 +00:00
Florian Hahn
c2d15ac4d4
[VPlan] Update final IV exit value via VPlan. (#112147)
Model updating IV users directly in VPlan, replace fixupIVUsers.

Now simple extracts are created for all phis in the exit block during
initial VPlan construction. A later VPlan transform 
(optimizeInductionExitUsers) replaces extracts of inductions with 
their pre-computed values if possible.

This completes the transition towards modeling all live-outs directly in
VPlan.

There are a few follow-ups:
* emit extracts initially also for resume phis, and optimize them 
   tougher with IV exit users
* support for VPlans with multiple exits in optimizeInductionExitUsers.


Depends on https://github.com/llvm/llvm-project/pull/110004,
https://github.com/llvm/llvm-project/pull/109975 and
https://github.com/llvm/llvm-project/pull/112145.
2025-01-18 13:22:34 +00:00
Luke Lau
f925e54554
[VPlan] Fix mutating whilst iterating over users in EVL transform (#122885)
This fixes a miscompilation extracted from 525.x264_r, where we were
failing to update the runtime VF of a VPReverseVectorPointerRecipe.

We were removing a use of VF whilst iterating over the users() iterator,
which messed up the iterator in-flight and caused us to miss some
recipes. This fixes it by copying the users into a SmallVector first.

Fixes #122681
Fixes #122682
2025-01-14 22:17:51 +08:00
Florian Hahn
8df64ed777 [LV] Don't consider IV increments uniform if exit value is used outside.
In some cases, there might be a chain of uniform instructions producing
the exit value. To generate correct code in all cases, consider the IV
increment not uniform, if there are users outside the loop.

Instead, let VPlan narrow the IV, if possible using the logic from
3ff1d01985752.

Test case from #122602 verified with Alive2:
    https://alive2.llvm.org/ce/z/bA4EGj

Fixes https://github.com/llvm/llvm-project/issues/122496.
Fixes https://github.com/llvm/llvm-project/issues/122602.
2025-01-12 22:03:21 +00:00
Florian Hahn
3ff1d01985 Recommit "[VPlan] Try to narrow wide and replicating recipes to uniform recipes."
This reverts commit 0ebb3ac7c92c4c1c44e7f3d17832d75ec5a42a67.

Re-applies commit with typos fixed.
2025-01-12 20:10:28 +00:00
Florian Hahn
0ebb3ac7c9 Revert "[VPlan] Try to narrow wide and replicating recipes to uniform recipes."
This reverts commit 1afba19913253dda865a8e57b37b9f4dabead1ac.

Typo breaking the build
2025-01-12 19:37:45 +00:00
Florian Hahn
1afba19913 [VPlan] Try to narrow wide and replicating recipes to uniform recipes.
Use the existing VPlan-based analysis to identify recipes that only have
their first lane demanded and transform them to uniform recpliate
recipes. This simplifies the generated code in some places and prepares
for fixing https://github.com/llvm/llvm-project/issues/122496.
2025-01-12 19:32:01 +00:00
Florian Hahn
7f59b4e998
[VPlan] Skip non-induction phi recipes in legalizeAndOptimizeInductions.
The body of the loop only applies to wide induction recipes, skip any other
header phi recipes up-frond
2025-01-11 20:33:02 +00:00
Florian Hahn
7ffb691595
[VPlan] Remove dead ToRemove (NFC). 2025-01-09 22:02:32 +00:00
Florian Hahn
f9369cc602
[VPlan] Make sure last IV increment value is available if needed.
Legalize extract-from-ends using uniform VPReplicateRecipe of wide
inductions to use regular VPReplicateRecipe, so the correct end value
is available.

Fixes https://github.com/llvm/llvm-project/issues/121745.
2025-01-06 22:40:41 +00:00
Florian Hahn
f4230b4332
[VPlan] Add and use debug location for VPScalarCastRecipe.
Update the recipe it always take a debug location and set it.
2025-01-05 20:08:51 +00:00
Florian Hahn
f48884ded8
[VPlan] Remove loop region in optimizeForVFAndUF. (#108378)
Update optimizeForVFAndUF to completely remove the vector loop region
when possible. At the moment, we cannot remove the region if it contains

* widened IVs: the recipe is needed to generate the step vector
* reductions: ComputeReductionResults requires the reduction phi recipe
for codegen.

Both cases can be addressed by more explicit modeling.

The patch also includes a number of updates to allow executing VPlans
without a vector loop region.

Depends on https://github.com/llvm/llvm-project/pull/110004
2025-01-05 15:50:42 +00:00
Luke Lau
7700695739
[VPlan] Fix crash with EVL tail folding intrinsic with no corresponding VP (#121542)
This fixes a crash when building SPEC CPU 2017 with EVL tail folding
when widening @llvm.log10 intrinsics.

@llvm.log10 and some other intrinsics don't have a corresponding VP
intrinsic, so this fixes the crash by removing the assert and bailing
instead.
2025-01-05 11:41:56 +08:00
Florian Hahn
5f5792aedb
[VPlan] Use removeDeadRecipes in optimizeForVFAndUF (NFCI)
Split off from https://github.com/llvm/llvm-project/pull/108378.
2025-01-02 20:10:46 +00:00
Florian Hahn
ddef380cd6
[VPlan] Move simplifyRecipe(s) definitions up to allow re-use (NFC)
Move definitions to allow easy reuse in
https://github.com/llvm/llvm-project/pull/108378.
2024-12-31 13:23:19 +00:00
Florian Hahn
16d19aaedf
[VPlan] Manage created blocks directly in VPlan. (NFC) (#120918)
This patch changes the way blocks are managed by VPlan. Previously all
blocks reachable from entry would be cleaned up when a VPlan is
destroyed. With this patch, each VPlan keeps track of blocks created for
it in a list and this list is then used to delete all blocks in the list
when the VPlan is destroyed. To do so, block creation is funneled
through helpers in directly in VPlan.

The main advantage of doing so is it simplifies CFG transformations, as
those do not have to take care of deleting any blocks, just adjusting
the CFG. This helps to simplify
https://github.com/llvm/llvm-project/pull/108378 and
https://github.com/llvm/llvm-project/pull/106748.

This also simplifies handling of 'immutable' blocks a VPlan holds
references to, which at the moment only include the scalar header block.

PR: https://github.com/llvm/llvm-project/pull/120918
2024-12-30 12:08:12 +00:00
Florian Hahn
c7a777322d
[VPlan] Replace else-if dyn_cast with cast (NFC).
The recipes handled here are either VPWidenIntrinsic or VPWidenCast, so
replace the else-if dyn_cast with a single else + cast.
2024-12-23 19:46:22 +00:00
LiqinWeng
b1fab4f849
[LV][VPlan] Initialize the variable 'VPID' of the createEVLRecipe (#120926)
Resolve the compilation error caused by the merge issue: #119510
2024-12-23 09:23:22 +08:00
LiqinWeng
8a51471d83
[LV][VPlan] Extract the implementation of transform Recipe to EVLRecipe into a small function. NFC (#119510) 2024-12-23 08:28:19 +08:00