275 Commits

Author SHA1 Message Date
Florian Hahn
baa77e30f0
[LV] Remove some redundant casts (NFC). 2025-02-24 21:46:29 +00:00
Luke Lau
e23ab73335
[VPlan] Don't convert widen recipes to VP intrinsics in EVL transform (#127180)
This is a copy of #126177, since it was automatically and permanently
closed because I messed up the source branch on my remote

This patch proposes to avoid converting widening recipes to VP
intrinsics during the EVL transform.

IIUC we initially did this to avoid `vl` toggles on RISC-V. However we
now have the RISCVVLOptimizer pass which mostly makes this redundant.

Emitting regular IR instead of VP intrinsics allows more generic
optimisations, both in the middle end and DAGCombiner, and we generally
have better patterns in the RISC-V backend for non-VP nodes. Sticking to
regular IR instructions is likely a lot less work than reimplementing
all of these optimisations for VP intrinsics, and on SPEC CPU 2017 we get
noticeably better code generation.
2025-02-22 19:38:11 +08:00
Florian Hahn
6ff8a06de9
[VPlan] Run recipe removal and simplification after optimizeForVFAndUF. (#125926)
Run recipe simplification and dead recipe removal after VPlan-based
unrolling and optimizeForVFAndUF, to clean up any redundant or dead
recipes introduced by them. Currently this is NFC, as it removes the
corresponding removeDeadRecipes run in optimizeForVFAndUF and no
additional simplifications kick in after unrolling yet. That is changing
with https://github.com/llvm/llvm-project/pull/123655.

Note that with this change, pattern-matching is now applied after
EVL-based recipes have been introduced.

Trying to match VPWidenEVLRecipe when not explicitly requested might
apply a pattern with 2 operands to one with 3 due to the extra EVL
operand and VPWidenEVLRecipe being a subclass of VPWidenRecipe.

To prevent this, update Recipe_match::match to only match
VPWidenEVLRecipe if it is in the requested recipe types (RecipeTy).

PR: https://github.com/llvm/llvm-project/pull/125926
2025-02-08 13:33:46 +00:00
Florian Hahn
ee806646ad
[VPlan] Consistently use hasScalarVFOnly (NFC).
Consistently use hasScalarVFOnly instead of using
hasVF(ElementCount::getFixed(1)). Also add an assert to ensure all cases
are covered by hasScalarVFOnly.
2025-02-08 12:19:25 +00:00
Florian Hahn
5008277322
[VPlan] Move auxiliary declarations out of VPlan.h (NFC). (#124104)
Nothing in VPlan.h directly depends on VPTransformState, VPCostContext,
VPFRange, VPlanPrinter or VPSlotTracker. Move them out to a separate
header to reduce the size of widely used VPlan.h.

This is a first step towards more cleanly separating declarations in
VPlan.

Besides reducing VPlan.h's size, this also allows including additional
VPlan-related headers in VPlanHelpers.h for use there. An example is
using VPDominatorTree in VPTransformState
(https://github.com/llvm/llvm-project/pull/117138).

PR: https://github.com/llvm/llvm-project/pull/124104
2025-02-02 13:44:07 +00:00
David Sherwood
3bc2dade36
[LoopVectorize] Enable vectorisation of early exit loops with live-outs (#120567)
This work feeds part of PR
https://github.com/llvm/llvm-project/pull/88385, and adds support for
vectorising
loops with uncountable early exits and outside users of loop-defined
variables. When calculating the final value from an uncountable early
exit we need to calculate the vector lane that triggered the exit,
and hence determine the value at the point we exited.

All code for calculating the last value when exiting the loop early
now lives in a new vector.early.exit block, which sits between the
middle.split block and the original exit block. Doing this required
two fixes:

1. The vplan verifier incorrectly assumed that the block containing
a definition always dominates the block of the user. That's not true
if you can arrive at the use block from multiple incoming blocks.
This is possible for early exit loops where both the early exit and
the latch jump to the same block.
2. We were adding the new vector.early.exit to the wrong parent loop.
It needs to have the same parent as the actual early exit block from
the original loop.

I've added a new ExtractFirstActive VPInstruction that extracts the
first active lane of a vector, i.e. the lane of the vector predicate
that triggered the exit.

NOTE: The IR generated for dealing with live-outs from early exit
loops is unoptimised, as opposed to normal loops. This inevitably
leads to poor quality code, but this can be fixed up later.
2025-01-30 10:37:00 +00:00
Florian Hahn
2b55ef187c
[VPlan] Add helper to run VPlan passes, verify after run (NFC). (#123640)
Add new runPass helpers to run a VPlan transformation. This makes it
easier to add additional checks/functionality for each transform run. In
this patch, an option is added to run the verifier after each VPlan
transform.

Follow-ups will use the same helper to also support printing VPlans
after each transform.

Note that the verifier at the moment requires there to be a canonical IV
and vector loop region, so the final lowering transforms aren't run via
runPass yet.

PR: https://github.com/llvm/llvm-project/pull/123640
2025-01-29 10:50:01 +00:00
Florian Hahn
09a29fcc8d
[VPlan] Don't collect live-ins in collectUsersInExitBlocks. (NFC) (#123819)
Live-ins don't need to be handled, other than adding to the exit phi
recipe. Do that early and assert that otherwise the exit value is
defined in the vector loop region.

This should enable simply skipping other exit values that do not need
further fixing, e.g. if handling the exit value from the early exit
directly in handleUncountableEarlyExit.

PR: https://github.com/llvm/llvm-project/pull/123819
2025-01-27 16:12:07 +00:00
Florian Hahn
2c87133c62
Reapply "[VPlan] Update final IV exit value via VPlan. (#112147)"
This reverts the revert commit 58326f1d5b5b379590af92dd129b2f3b3e96af46.

The build failure in sanitizer stage2 builds has been fixed with
0d39fe6f5bb3edf0bddec09a8c6417377390aeac.

Original commit message:
Model updating IV users directly in VPlan, replace fixupIVUsers.

Now simple extracts are created for all phis in the exit block during
initial VPlan construction. A later VPlan transform
(optimizeInductionExitUsers) replaces extracts of inductions with
their pre-computed values if possible.

This completes the transition towards modeling all live-outs directly in
VPlan.

There are a few follow-ups:
* emit extracts initially also for resume phis, and optimize them
   tougher with IV exit users
* support for VPlans with multiple exits in optimizeInductionExitUsers.

Depends on https://github.com/llvm/llvm-project/pull/110004,
https://github.com/llvm/llvm-project/pull/109975 and
https://github.com/llvm/llvm-project/pull/112145.
2025-01-19 19:32:03 +00:00
Florian Hahn
58326f1d5b
Revert "[VPlan] Update final IV exit value via VPlan. (#112147)"
This reverts commit c2d15ac4d4432788557e77c15ce572ac655a8fec.

Causes build failures on PPC stage2 & fuchsia bots
    https://lab.llvm.org/buildbot/#/builders/168/builds/7650
    https://lab.llvm.org/buildbot/#/builders/11/builds/11248
2025-01-18 13:40:33 +00:00
Florian Hahn
c2d15ac4d4
[VPlan] Update final IV exit value via VPlan. (#112147)
Model updating IV users directly in VPlan, replace fixupIVUsers.

Now simple extracts are created for all phis in the exit block during
initial VPlan construction. A later VPlan transform 
(optimizeInductionExitUsers) replaces extracts of inductions with 
their pre-computed values if possible.

This completes the transition towards modeling all live-outs directly in
VPlan.

There are a few follow-ups:
* emit extracts initially also for resume phis, and optimize them 
   tougher with IV exit users
* support for VPlans with multiple exits in optimizeInductionExitUsers.


Depends on https://github.com/llvm/llvm-project/pull/110004,
https://github.com/llvm/llvm-project/pull/109975 and
https://github.com/llvm/llvm-project/pull/112145.
2025-01-18 13:22:34 +00:00
Luke Lau
f925e54554
[VPlan] Fix mutating whilst iterating over users in EVL transform (#122885)
This fixes a miscompilation extracted from 525.x264_r, where we were
failing to update the runtime VF of a VPReverseVectorPointerRecipe.

We were removing a use of VF whilst iterating over the users() iterator,
which messed up the iterator in-flight and caused us to miss some
recipes. This fixes it by copying the users into a SmallVector first.

Fixes #122681
Fixes #122682
2025-01-14 22:17:51 +08:00
Florian Hahn
8df64ed777 [LV] Don't consider IV increments uniform if exit value is used outside.
In some cases, there might be a chain of uniform instructions producing
the exit value. To generate correct code in all cases, consider the IV
increment not uniform, if there are users outside the loop.

Instead, let VPlan narrow the IV, if possible using the logic from
3ff1d01985752.

Test case from #122602 verified with Alive2:
    https://alive2.llvm.org/ce/z/bA4EGj

Fixes https://github.com/llvm/llvm-project/issues/122496.
Fixes https://github.com/llvm/llvm-project/issues/122602.
2025-01-12 22:03:21 +00:00
Florian Hahn
3ff1d01985 Recommit "[VPlan] Try to narrow wide and replicating recipes to uniform recipes."
This reverts commit 0ebb3ac7c92c4c1c44e7f3d17832d75ec5a42a67.

Re-applies commit with typos fixed.
2025-01-12 20:10:28 +00:00
Florian Hahn
0ebb3ac7c9 Revert "[VPlan] Try to narrow wide and replicating recipes to uniform recipes."
This reverts commit 1afba19913253dda865a8e57b37b9f4dabead1ac.

Typo breaking the build
2025-01-12 19:37:45 +00:00
Florian Hahn
1afba19913 [VPlan] Try to narrow wide and replicating recipes to uniform recipes.
Use the existing VPlan-based analysis to identify recipes that only have
their first lane demanded and transform them to uniform recpliate
recipes. This simplifies the generated code in some places and prepares
for fixing https://github.com/llvm/llvm-project/issues/122496.
2025-01-12 19:32:01 +00:00
Florian Hahn
7f59b4e998
[VPlan] Skip non-induction phi recipes in legalizeAndOptimizeInductions.
The body of the loop only applies to wide induction recipes, skip any other
header phi recipes up-frond
2025-01-11 20:33:02 +00:00
Florian Hahn
7ffb691595
[VPlan] Remove dead ToRemove (NFC). 2025-01-09 22:02:32 +00:00
Florian Hahn
f9369cc602
[VPlan] Make sure last IV increment value is available if needed.
Legalize extract-from-ends using uniform VPReplicateRecipe of wide
inductions to use regular VPReplicateRecipe, so the correct end value
is available.

Fixes https://github.com/llvm/llvm-project/issues/121745.
2025-01-06 22:40:41 +00:00
Florian Hahn
f4230b4332
[VPlan] Add and use debug location for VPScalarCastRecipe.
Update the recipe it always take a debug location and set it.
2025-01-05 20:08:51 +00:00
Florian Hahn
f48884ded8
[VPlan] Remove loop region in optimizeForVFAndUF. (#108378)
Update optimizeForVFAndUF to completely remove the vector loop region
when possible. At the moment, we cannot remove the region if it contains

* widened IVs: the recipe is needed to generate the step vector
* reductions: ComputeReductionResults requires the reduction phi recipe
for codegen.

Both cases can be addressed by more explicit modeling.

The patch also includes a number of updates to allow executing VPlans
without a vector loop region.

Depends on https://github.com/llvm/llvm-project/pull/110004
2025-01-05 15:50:42 +00:00
Luke Lau
7700695739
[VPlan] Fix crash with EVL tail folding intrinsic with no corresponding VP (#121542)
This fixes a crash when building SPEC CPU 2017 with EVL tail folding
when widening @llvm.log10 intrinsics.

@llvm.log10 and some other intrinsics don't have a corresponding VP
intrinsic, so this fixes the crash by removing the assert and bailing
instead.
2025-01-05 11:41:56 +08:00
Florian Hahn
5f5792aedb
[VPlan] Use removeDeadRecipes in optimizeForVFAndUF (NFCI)
Split off from https://github.com/llvm/llvm-project/pull/108378.
2025-01-02 20:10:46 +00:00
Florian Hahn
ddef380cd6
[VPlan] Move simplifyRecipe(s) definitions up to allow re-use (NFC)
Move definitions to allow easy reuse in
https://github.com/llvm/llvm-project/pull/108378.
2024-12-31 13:23:19 +00:00
Florian Hahn
16d19aaedf
[VPlan] Manage created blocks directly in VPlan. (NFC) (#120918)
This patch changes the way blocks are managed by VPlan. Previously all
blocks reachable from entry would be cleaned up when a VPlan is
destroyed. With this patch, each VPlan keeps track of blocks created for
it in a list and this list is then used to delete all blocks in the list
when the VPlan is destroyed. To do so, block creation is funneled
through helpers in directly in VPlan.

The main advantage of doing so is it simplifies CFG transformations, as
those do not have to take care of deleting any blocks, just adjusting
the CFG. This helps to simplify
https://github.com/llvm/llvm-project/pull/108378 and
https://github.com/llvm/llvm-project/pull/106748.

This also simplifies handling of 'immutable' blocks a VPlan holds
references to, which at the moment only include the scalar header block.

PR: https://github.com/llvm/llvm-project/pull/120918
2024-12-30 12:08:12 +00:00
Florian Hahn
c7a777322d
[VPlan] Replace else-if dyn_cast with cast (NFC).
The recipes handled here are either VPWidenIntrinsic or VPWidenCast, so
replace the else-if dyn_cast with a single else + cast.
2024-12-23 19:46:22 +00:00
LiqinWeng
b1fab4f849
[LV][VPlan] Initialize the variable 'VPID' of the createEVLRecipe (#120926)
Resolve the compilation error caused by the merge issue: #119510
2024-12-23 09:23:22 +08:00
LiqinWeng
8a51471d83
[LV][VPlan] Extract the implementation of transform Recipe to EVLRecipe into a small function. NFC (#119510) 2024-12-23 08:28:19 +08:00
Florian Hahn
e1833e3a7e
[VPlan] Simplify redundant VPDerivedIVRecipe (NFC).
Split DerivedIV simplification off from
https://github.com/llvm/llvm-project/pull/112145 and use to remove the
need for extra checks in createScalarIVSteps. Required an extra
simplification run after IV transforms.
2024-12-22 09:39:19 +00:00
LiqinWeng
86fa35ce7e
[LV][VPlan] Use opcode to retrieve the VPID of the CallRecipe, rather than underlying instruction (#120816)
This patch may cause the flags in the CallRecipe to be lost after EVL
transformation, and it has been addressed in the patch: #119847
2024-12-22 10:28:20 +08:00
Florian Hahn
9b496deb90
[VPlan] Set and use debug location for VPPredInstPHIRecipe.
Update the recipe it always set its debug location and use it during IR
generation.
2024-12-21 21:57:47 +00:00
Luke Lau
c2a879ecaa
[VPlan] Fix VPTypeAnalysis cache clobbering in EVL transform (#120252)
When building SPEC CPU 2017 with RISC-V and EVL tail folding, this
assertion in VPTypeAnalysis would trigger during the transformation to
EVL recipes:


d8a0709b10/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp (L135-L142)

It was caused by this recipe:

```
WIDEN ir<%shr> = vp.or ir<%add33>, ir<0>, vp<%6>
```

Having its type inferred as i16, when ir<%add33> and ir<0> had inferred
types of i32 somehow.

The cause of this turned out to be because the VPTypeAnalysis cache was
getting clobbered: In this transform we were erasing recipes but keeping
around the same mapping from VPValue* to Type*. In the meantime, new
recipes would be created which would have the same address as the old
value. They would then incorrectly get the old erased VPValue*'s cached
type:

```
  --- before ---
  0x600001ec5030: WIDEN ir<%mul21.neg> = vp.mul vp<%11>, ir<0>, vp<%6>
  0x600001ec5450: <badref> <- some value that was erased
  --- after ---
  0x600001ec5030: WIDEN ir<%mul21.neg> = vp.mul vp<%11>, ir<0>, vp<%6>
  0x600001ec5450: WIDEN ir<%shr> = vp.or ir<%add33>, ir<0>, vp<%6>  <- a new value that happens to have the same address
```

This fixes this by deferring the erasing of recipes till after the
transformation.

The test case might be a bit flakey since it just happens to have the
right conditions to recreate this. I tried to add an assert in
inferScalarType that every VPValue in the cache was valid, but couldn't
find a way of telling if a VPValue had been erased.

---------

Co-authored-by: Florian Hahn <flo@fhahn.com>
2024-12-18 11:28:28 +08:00
Luke Lau
4a7f60d328
[VPlan] Handle VPWidenCastRecipe without underlying value in EVL transform (#120194)
This fixes a crash that shows up when building SPEC CPU 2017 with EVL
tail folding on RISC-V.

A VPWidenCastRecipe doesn't always have an underlying value, and in the
case of this crash this happens whenever a widened cast is created via
truncateToMinimalBitwidths.

Fix this by just using the opcode stored in the recipe itself.

I think a similar issue exists with VPWidenIntrinsicRecipe and how it's
widened, but I haven't run into any crashes with it just yet.
2024-12-18 11:28:07 +08:00
Florian Hahn
734a204fbd
[VPlan] Manage VPWidenIntOrFPInduction debug location via recipe (NFC).
Properly set VPWidenIntOrFpInductionRecipe's debug location in the
recipe and use it, instead of using the debug location of the underlying
IR instruction.
2024-12-15 13:45:28 +00:00
Florian Hahn
2564f1e199
[VPlan] Simplify Not(Not(A)) -> A.
Follow-up simplification to 5fae408d3a4c073ee4.
2024-12-14 20:08:26 +00:00
Florian Hahn
6c8f41d336
[VPlan] Hook IR blocks into VPlan during skeleton creation (NFC) (#114292)
As a first step to move towards modeling the full skeleton in VPlan,
start by wrapping IR blocks created during legacy skeleton creation in
VPIRBasicBlocks and hook them into the VPlan. This means the skeleton
CFG is represented in VPlan, just before execute. This allows moving
parts of skeleton creation into recipes in the VPBBs gradually.

Note that this allows retiring some manual DT updates, as this will be
handled automatically during VPlan execution.

PR: https://github.com/llvm/llvm-project/pull/114292
2024-12-12 15:58:16 +00:00
Luke Lau
b26fe5b7e9
[VPlan] Use variadic isa<> in a few more places. NFC (#119538) 2024-12-12 13:26:39 +08:00
Florian Hahn
5fae408d3a
[VPlan] Dispatch to multiple exit blocks via middle blocks. (#112138)
A more lightweight variant of
https://github.com/llvm/llvm-project/pull/109193,
which dispatches to multiple exit blocks via the middle blocks.

The patch also introduces a bit of required scaffolding to enable
early-exit vectorization, including an option. At the moment, early-exit
vectorization doesn't come with legality checks, and is only used if the
option is provided and the loop has metadata forcing vectorization. This
is only intended to be used for testing during bring-up, with @david-arm
enabling auto early-exit vectorization plugging in the changes from
https://github.com/llvm/llvm-project/pull/88385.

PR: https://github.com/llvm/llvm-project/pull/112138
2024-12-11 21:11:05 +00:00
LiqinWeng
b759020cc8
[LV][EVL] Support cast instruction with EVL-vectorization (#108351) 2024-12-11 10:01:41 +08:00
Florian Hahn
afef545efa
[VPlan] Address post-commit for #114305.
Apply suggested renaming and adjust placement as suggested in
https://github.com/llvm/llvm-project/pull/114305. Also drop unneeded
RPOT creation.
2024-12-08 21:24:19 +00:00
Florian Hahn
a7fda0e1e4
[VPlan] Introduce VPScalarPHIRecipe, use for can & EVL IV codegen (NFC). (#114305)
Introduce a general recipe to generate a scalar phi. Lower
VPCanonicalIVPHIRecipe and VPEVLBasedIVRecipe to VPScalarIVPHIrecipe
before plan execution, avoiding the need for duplicated ::execute
implementations. There are other cases that could benefit, including
in-loop reduction phis and pointer induction phis.

Builds on a similar idea as
https://github.com/llvm/llvm-project/pull/82270.

PR: https://github.com/llvm/llvm-project/pull/114305
2024-12-03 14:53:51 +00:00
Florian Hahn
77767986ed
[LV] Use IsaPred in a few more places (NFC).
Simplifies the code slightly by removing explicit lambdas.
2024-12-01 18:47:53 +00:00
LiqinWeng
4a3f46de50
[LV][EVL] Support call instruction with EVL-vectorization (#110412) 2024-11-28 10:05:08 +08:00
Florian Hahn
590f451b60
[VPlan] Allow setting IR name for VPDerivedIVRecipe (NFCI).
Allow setting the name to use for the generated IR value of the derived
IV in preparations for https://github.com/llvm/llvm-project/pull/112145.

This is analogous to VPInstruction::Name.
2024-11-24 20:39:12 +00:00
Shih-Po Hung
632c5d2991
[VPlan] Support VPReverseVectorPointer in DataWithEVL vectorization (#113667)
VPReverseVectorPointer relies on the runtime VF, but in DataWithEVL
tail-folding, EVL (which can be less than VF at runtime) should be used
instead.

This patch updates the logic to check the users of VF and replaces the
second operand if the user is VPReverseVectorPointer.
2024-11-22 17:18:39 +08:00
Stephen Tozer
caa9a82797
[DebugInfo][LoopVectorizer] Avoid dropping !dbg in optimizeForVFAndUF (#114243)
Prior to this patch, optimizeForVFAndUF may optimize the conditional
branch for a VPBasicblock to have a constant condition, but
unnecessarily drops the DILocation attachment when it does so; this
patch changes it to preserve the DILocation.
2024-11-14 09:33:46 +00:00
Florian Hahn
ccb40b0b7a
[VPlan] Add insertOnEdge to VPBlockUtils (NFC).
Add a new helper to insert a new VPBlockBase on an edge between 2
blocks. Suggested in https://github.com/llvm/llvm-project/pull/114292
and also useful for some existing code.
2024-11-09 21:19:39 +00:00
Florian Hahn
95eeae195e
[VPlan] Add PredIdx and SuccIdx arguments to connectBlocks (NFC).
Add extra arguments to connectBlocks which allow selecting which
existing predecessor/successor to update. This avoids having to
disconnect blocks first unnecessarily.

Suggested in https://github.com/llvm/llvm-project/pull/114292.
2024-11-09 17:18:40 +00:00
Mel Chen
4480a22c2b
[LV][EVL] Emit vp.merge intrinsic to enable out-loop reduction in EVL vectorization. (#101641)
Following #90184, this patch emits vp.merge intrinsic, which is used to
set the inactive lanes in a select operation to the RHS instead of
undef. Currently, it is applied to out-loop reduction for EVL
vectorization.

This patch performs transformation to convert 
  select(header_mask, LHS, RHS)
into
  vp.merge(all-true, LHS, RHS, EVL) 
And always use the predicated reduction select to set the incoming value
of the reduction phi to support out-loop reduction when using tail
folding with EVL.

TODO: Postpone the adjustment of the predicated reduction select to
VPlanTransform. The current adjustment might be too early, which could
lead to a situation where the predicated reduction select is adjusted,
but the EVL recipes cannot be successfully generated during
VPlanTransform.
2024-11-06 14:53:49 +08:00
Kazu Hirata
aa825b74af
[Vectorize] Remove unused includes (NFC) (#114643)
Identified with misc-include-cleaner.
2024-11-03 08:58:51 -08:00