83 Commits

Author SHA1 Message Date
Florian Hahn
dcef154b5c
[VPlan] Replace VPRegionBlock with explicit CFG before execute (NFCI). (#117506)
Building on top of https://github.com/llvm/llvm-project/pull/114305,
replace VPRegionBlocks with explicit CFG before executing.

This brings the final VPlan closer to the IR that is generated and
helps to simplify codegen.

It will also enable further simplifications of phi handling during
execution and transformations that do not have to preserve the 
canonical IV required by loop regions. This for example could include
replacing the canonical IV with an EVL based phi while completely
removing the original canonical IV.

PR: https://github.com/llvm/llvm-project/pull/117506
2025-05-24 19:17:16 +01:00
Florian Hahn
95ba5508e5
Reapply "[VPlan] Move predication to VPlanTransform (NFC). (#128420)"
This reverts commit 793bb6b257fa4d9f4af169a4366cab3da01f2e1f.

The recommitted version contains a fix to make sure only the original
phis are processed in convertPhisToBlends nu collecting them in a vector
first. This fixes a crash when no mask is needed, because there is only
a single incoming value.

Original message:
This patch moves the logic to predicate and linearize a VPlan to a
dedicated VPlan transform. It mostly ports the existing logic directly.

There are a number of follow-ups planned in the near future to
further improve on the implementation:
* Edge and block masks are cached in VPPredicator, but the block masks
are still made available to VPRecipeBuilder, so they can be accessed
during recipe construction. As a follow-up, this should be replaced by
adding mask operands to all VPInstructions that need them and use that
during recipe construction.
* The mask caching in a map also means that this map needs updating each
time a new recipe replaces a VPInstruction; this would also be handled
by adding mask operands.

PR: https://github.com/llvm/llvm-project/pull/128420
2025-05-22 08:16:15 +01:00
Florian Hahn
793bb6b257
Revert "[VPlan] Move predication to VPlanTransform (NFC). (#128420)"
This reverts commit b263c08e1a0b54a871915930aa9a1a6ba205b099.

Looks like this triggers a crash in one of the Fortran tests. Reverting
while I investigate
    https://lab.llvm.org/buildbot/#/builders/41/builds/6825
2025-05-21 19:24:21 +01:00
Florian Hahn
b263c08e1a
[VPlan] Move predication to VPlanTransform (NFC). (#128420)
This patch moves the logic to predicate and linearize a VPlan to a
dedicated VPlan transform. It mostly ports the existing logic directly.

There are a number of follow-ups planned in the near future to
further improve on the implementation:
* Edge and block masks are cached in VPPredicator, but the block masks
are still made available to VPRecipeBuilder, so they can be accessed
during recipe construction. As a follow-up, this should be replaced by
adding mask operands to all VPInstructions that need them and use that
during recipe construction.
* The mask caching in a map also means that this map needs updating each
time a new recipe replaces a VPInstruction; this would also be handled
by adding mask operands.


PR: https://github.com/llvm/llvm-project/pull/128420
2025-05-21 15:47:33 +01:00
Elvis Wang
664c937b43
[VPlan] Implement VPExtendedReduction, VPMulAccumulateReductionRecipe and corresponding vplan transformations. (#137746)
This patch introduce two new recipes.

* VPExtendedReductionRecipe
  - cast + reduction.

* VPMulAccumulateReductionRecipe
  - (cast) + mul + reduction.

This patch also implements the transformation that match following
patterns via vplan and converts to abstract recipes for better cost
estimation.

* VPExtendedReduction
  - reduce(cast(...))

* VPMulAccumulateReductionRecipe
  - reduce.add(mul(...))
  - reduce.add(mul(ext(...), ext(...))
  - reduce.add(ext(mul(ext(...), ext(...))))

The converted abstract recipes will be lower to the concrete recipes
(widen-cast + widen-mul + reduction) just before recipe execution.

Note that this patch still relies on legacy cost model the calculate the
cost for these patters.
Will enable vplan-based cost decision in #113903.

Split from #113903.
2025-05-16 10:25:38 +08:00
Florian Hahn
2f55123cbb
[VPlan] Handle early exit before forming regions. (NFC) (#138393)
Move early-exit handling up front to original VPlan construction, before
introducing early exits.

This builds on https://github.com/llvm/llvm-project/pull/137709, which
adds exiting edges to the original VPlan, instead of adding exit blocks
later.

This retains the exit conditions early, and means we can handle early
exits before forming regions, without the reliance on VPRecipeBuilder.

Once we retain all exits initially, handling early exits before region
construction ensures the regions are valid; otherwise we would leave
edges exiting the region from elsewhere than the latch.

Removing the reliance on VPRecipeBuilder removes the dependence on
mapping IR BBs to VPBBs and unblocks predication as VPlan transform:
https://github.com/llvm/llvm-project/pull/128420.

Depends on https://github.com/llvm/llvm-project/pull/137709 (included in
PR).

PR: https://github.com/llvm/llvm-project/pull/138393
2025-05-12 12:53:20 +01:00
Florian Hahn
cfde685e22
[VPlan] Sink VPB2IRBB lookups to VPRecipeBuilder (NFC).
This allows migrating some more code to be based on VPBBs in
VPRecipeBuilder, in preparation for
https://github.com/llvm/llvm-project/pull/128420.
2025-05-10 22:00:58 +01:00
Luke Lau
1484f82cbc
[VPlan] Add VPInstruction::StepVector and use it in VPWidenIntOrFpInductionRecipe (#129508)
Split off from #118638, this adds VPInstruction::StepVector, which
generates integer step vectors (0,1,2,...,VF). This is a step towards
eventually modelling all the separate parts of
VPWidenIntOrFpInductionRecipe in VPlan.

This is then used by VPWidenIntOrFpInductionRecipe, where we materialize
it just before unrolling so the operands stay in a fixed position.

The need for a separate operand in VPWidenIntOrFpInductionRecipe, as
well as the need to update it in
optimizeVectorInductionWidthForTCAndVFUF, should be removed with #118638
when everything is expanded in convertToConcreteRecipes.
2025-05-08 18:47:44 +08:00
Florian Hahn
edb690dc5b
Reapply "[VPlan] Add canonical IV during construction (NFC)."
This reverts commit d431921677ae923d189ff2d6f188f676a2964ed8.

Missing gtests have been updated.

Original message:

This addresses an existing TODO and simply moves the current code to add
canonical IV recipes to the initial skeleton construction, at the same
place where the corresponding region will be introduced.
2025-05-03 10:54:59 +01:00
Florian Hahn
d431921677
Revert "[VPlan] Add canonical IV during construction (NFC)."
This reverts commit e17122fffa8d233fcf9f717354ecda46173f1b8d.

Revert as this seems to break some unit tests on some bots.
2025-04-29 22:55:11 +01:00
Florian Hahn
e17122fffa
[VPlan] Add canonical IV during construction (NFC).
This addresses an existing TODO and simply moves the current code to add
canonical IV recipes to the initial skeleton construction, at the same
place where the corresponding region will be introduced.
2025-04-29 22:38:59 +01:00
Florian Hahn
d2ce88a939
[VPlan] Create initial skeleton before creating regions. (NFC)
Move out the logic to prepare for vectorization to a separate transform,
before creating loop regions. This was discussed as follow-up
in https://github.com/llvm/llvm-project/pull/136455.

This just moves the existing code around slightly  and will simplify
follow-up patches to include the exiting edges during initial VPlan
construction.
2025-04-28 21:51:32 +01:00
Florian Hahn
7cce38beea
[VPlan] Remove dead SE argument from handleUncountableEarlyExit (NFC).
ScalarEvolution is not used by the function, remove the dead arg.
2025-04-24 19:59:05 +01:00
Florian Hahn
e232d28eff
[VPlan] Move plain CFG construction to VPlanConstruction. (NFC)
Follow-up as discussed in https://github.com/llvm/llvm-project/pull/129402.

After bc03d6cce257, the VPlanHCFGBuilder doesn't actually build a HCFG
any longer. Move what remains directly into VPlanConstruction.cpp.
2025-04-18 21:52:05 +01:00
Kazu Hirata
f0621b31f8 [Vectorize] Fix a warning
This patch fixes:

  llvm/lib/Transforms/Vectorize/VPlanTransforms.h:31:1: error: class
  'VFRange' was previously declared as a struct; this is valid, but
  may result in linker errors under the Microsoft C++ ABI
  [-Werror,-Wmismatched-tags]
2025-04-17 16:23:52 -07:00
Elvis Wang
69ade7c090
[LV] Check if the VF is scalar by VFRange in handleUncountableEarlyExit. (#135294)
This patch check if the plan contains scalar VF by VFRange instead of
Plan.
This patch also clamp the range to contains either only scalar or only
vector VFs to prevent mis-compile.

Split from #113903.
2025-04-18 06:51:36 +08:00
Florian Hahn
bc03d6cce2
[VPlan] Introduce all loop regions as VPlan transform. (NFC) (#129402)
Further simplify VPlan CFG builder by moving introduction of inner
regions to a VPlan transform, building on
https://github.com/llvm/llvm-project/pull/128419.

The HCFG builder now only constructs plain CFGs. I will move it to
VPlanConstruction as follow-up.

Depends on https://github.com/llvm/llvm-project/pull/128419.

PR: https://github.com/llvm/llvm-project/pull/129402
2025-04-16 13:30:45 +02:00
Florian Hahn
54b33eba16
[VPlan] Add opcode to create step for wide inductions. (#119284)
This patch adds a WideIVStep opcode that can be used to create a vector
with the steps to increment a wide induction. The opcode has 2 operands
* the vector step
* the scale of the vector step

The opcode is later converted into a sequence of recipes that convert
the scale and step to the target type, if needed, and then multiply
vector step by scale.

This simplifies code that needs to materialize step vectors, e.g.
replacing wide IVs as follow up to
https://github.com/llvm/llvm-project/pull/108378 with an increment of
the wide IV step.

PR: https://github.com/llvm/llvm-project/pull/119284
2025-04-14 23:20:44 +02:00
Florian Hahn
c73ad7ba20
[VPlan] Add transformation to narrow interleave groups.
This patch adds a new narrowInterleaveGroups transfrom, which tries
convert a plan with interleave groups with VF elements to a plan that
instead replaces the interleave groups with wide loads and stores
processing VF elements.

This effectively is a very simple form of loop-aware SLP, where we
use interleave groups to identify candidates.

This initial version is quite restricted and hopefully serves as a
starting point for how to best model those kinds of transforms. For now
it only transforms load interleave groups feeding store groups.

Depends on #106431.

This lands the main parts of the approved
https://github.com/llvm/llvm-project/pull/106441 as suggested to break
things up a bit more.
2025-03-20 19:41:37 +00:00
Florian Hahn
2e13ec561c
[VPlan] Bail out on non-intrinsic calls in VPlanNativePath.
Update initial VPlan-construction in VPlanNativePath in line with the
inner loop path, in that it bails out when encountering constructs it
cannot handle, like non-intrinsic calls.

Fixes https://github.com/llvm/llvm-project/issues/131071.
2025-03-19 21:35:15 +00:00
Florian Hahn
62994c3291
[VPlan] Also introduce explicit broadcasts for values from entry VPBB.
Update and generalize materializeBroadcasts to also introduce explicit
broadcasts for VPValues defined in the Plans Entry block.

This fixes a crash when trying to insert the broadcasts generated by
VPTransformState::get after the generating instruction, which isn't
possible after invoke instructions.

Fixes https://github.com/llvm/llvm-project/issues/128838.
2025-03-12 22:03:19 +00:00
Florian Hahn
fd267082ee
[VPlan] Refactor VPlan creation, add transform introducing region (NFC). (#128419)
Create an empty VPlan first, then let the HCFG builder create a plain
CFG for the top-level loop (w/o a top-level region). The top-level
region is introduced by a separate VPlan-transform. This is instead of
creating the vector loop region before building the VPlan CFG for the
input loop.

This simplifies the HCFG builder (which should probably be renamed) and
moves along the roadmap ('buildLoop') outlined in [1].

As follow-up, I plan to also preserve the exit branches in the initial
VPlan out of the CFG builder, including connections to the exit blocks.

The conversion from plain CFG with potentially multiple exits to a
single entry/exit region will be done as VPlan transform in a follow-up.

This is needed to enable VPlan-based predication. Currently early exit
support relies on building the block-in masks on the original CFG,
because exiting branches and conditions aren't preserved in the VPlan.
So in order to switch to VPlan-based predication, we will have to
preserve them in the initial plain CFG, so the exit conditions are
available explicitly when we convert to single entry/exit regions.

Another follow-up is updating the outer loop handling to also introduce
VPRegionBlocks for nested loops as transform. Currently the existing
logic in the builder will take care of creating VPRegionBlocks for
nested loops, but not the top-level loop.

[1]
https://llvm.org/devmtg/2023-10/slides/techtalks/Hahn-VPlan-StatusUpdateAndRoadmap.pdf

PR: https://github.com/llvm/llvm-project/pull/128419
2025-03-09 15:05:35 +00:00
Florian Hahn
4277c21059
[VPlan] Introduce explicit broadcasts for live-ins. (#124644)
Add a new VPInstruction::Broadcast opcode and use it to materialize
explicit broadcasts of live-ins. The initial patch only materlizes the
broadcasts if the vector preheader dominates all uses that need it.
Later patches will pick the best valid insert point, thus retiring
implicit hoisting of broadcasts from VPTransformsState::get().

PR: https://github.com/llvm/llvm-project/pull/124644
2025-02-26 13:57:51 +00:00
Florian Hahn
6ff8a06de9
[VPlan] Run recipe removal and simplification after optimizeForVFAndUF. (#125926)
Run recipe simplification and dead recipe removal after VPlan-based
unrolling and optimizeForVFAndUF, to clean up any redundant or dead
recipes introduced by them. Currently this is NFC, as it removes the
corresponding removeDeadRecipes run in optimizeForVFAndUF and no
additional simplifications kick in after unrolling yet. That is changing
with https://github.com/llvm/llvm-project/pull/123655.

Note that with this change, pattern-matching is now applied after
EVL-based recipes have been introduced.

Trying to match VPWidenEVLRecipe when not explicitly requested might
apply a pattern with 2 operands to one with 3 due to the extra EVL
operand and VPWidenEVLRecipe being a subclass of VPWidenRecipe.

To prevent this, update Recipe_match::match to only match
VPWidenEVLRecipe if it is in the requested recipe types (RecipeTy).

PR: https://github.com/llvm/llvm-project/pull/125926
2025-02-08 13:33:46 +00:00
David Sherwood
3bc2dade36
[LoopVectorize] Enable vectorisation of early exit loops with live-outs (#120567)
This work feeds part of PR
https://github.com/llvm/llvm-project/pull/88385, and adds support for
vectorising
loops with uncountable early exits and outside users of loop-defined
variables. When calculating the final value from an uncountable early
exit we need to calculate the vector lane that triggered the exit,
and hence determine the value at the point we exited.

All code for calculating the last value when exiting the loop early
now lives in a new vector.early.exit block, which sits between the
middle.split block and the original exit block. Doing this required
two fixes:

1. The vplan verifier incorrectly assumed that the block containing
a definition always dominates the block of the user. That's not true
if you can arrive at the use block from multiple incoming blocks.
This is possible for early exit loops where both the early exit and
the latch jump to the same block.
2. We were adding the new vector.early.exit to the wrong parent loop.
It needs to have the same parent as the actual early exit block from
the original loop.

I've added a new ExtractFirstActive VPInstruction that extracts the
first active lane of a vector, i.e. the lane of the vector predicate
that triggered the exit.

NOTE: The IR generated for dealing with live-outs from early exit
loops is unoptimised, as opposed to normal loops. This inevitably
leads to poor quality code, but this can be fixed up later.
2025-01-30 10:37:00 +00:00
Florian Hahn
2b55ef187c
[VPlan] Add helper to run VPlan passes, verify after run (NFC). (#123640)
Add new runPass helpers to run a VPlan transformation. This makes it
easier to add additional checks/functionality for each transform run. In
this patch, an option is added to run the verifier after each VPlan
transform.

Follow-ups will use the same helper to also support printing VPlans
after each transform.

Note that the verifier at the moment requires there to be a canonical IV
and vector loop region, so the final lowering transforms aren't run via
runPass yet.

PR: https://github.com/llvm/llvm-project/pull/123640
2025-01-29 10:50:01 +00:00
Florian Hahn
09a29fcc8d
[VPlan] Don't collect live-ins in collectUsersInExitBlocks. (NFC) (#123819)
Live-ins don't need to be handled, other than adding to the exit phi
recipe. Do that early and assert that otherwise the exit value is
defined in the vector loop region.

This should enable simply skipping other exit values that do not need
further fixing, e.g. if handling the exit value from the early exit
directly in handleUncountableEarlyExit.

PR: https://github.com/llvm/llvm-project/pull/123819
2025-01-27 16:12:07 +00:00
Florian Hahn
2c87133c62
Reapply "[VPlan] Update final IV exit value via VPlan. (#112147)"
This reverts the revert commit 58326f1d5b5b379590af92dd129b2f3b3e96af46.

The build failure in sanitizer stage2 builds has been fixed with
0d39fe6f5bb3edf0bddec09a8c6417377390aeac.

Original commit message:
Model updating IV users directly in VPlan, replace fixupIVUsers.

Now simple extracts are created for all phis in the exit block during
initial VPlan construction. A later VPlan transform
(optimizeInductionExitUsers) replaces extracts of inductions with
their pre-computed values if possible.

This completes the transition towards modeling all live-outs directly in
VPlan.

There are a few follow-ups:
* emit extracts initially also for resume phis, and optimize them
   tougher with IV exit users
* support for VPlans with multiple exits in optimizeInductionExitUsers.

Depends on https://github.com/llvm/llvm-project/pull/110004,
https://github.com/llvm/llvm-project/pull/109975 and
https://github.com/llvm/llvm-project/pull/112145.
2025-01-19 19:32:03 +00:00
Florian Hahn
58326f1d5b
Revert "[VPlan] Update final IV exit value via VPlan. (#112147)"
This reverts commit c2d15ac4d4432788557e77c15ce572ac655a8fec.

Causes build failures on PPC stage2 & fuchsia bots
    https://lab.llvm.org/buildbot/#/builders/168/builds/7650
    https://lab.llvm.org/buildbot/#/builders/11/builds/11248
2025-01-18 13:40:33 +00:00
Florian Hahn
c2d15ac4d4
[VPlan] Update final IV exit value via VPlan. (#112147)
Model updating IV users directly in VPlan, replace fixupIVUsers.

Now simple extracts are created for all phis in the exit block during
initial VPlan construction. A later VPlan transform 
(optimizeInductionExitUsers) replaces extracts of inductions with 
their pre-computed values if possible.

This completes the transition towards modeling all live-outs directly in
VPlan.

There are a few follow-ups:
* emit extracts initially also for resume phis, and optimize them 
   tougher with IV exit users
* support for VPlans with multiple exits in optimizeInductionExitUsers.


Depends on https://github.com/llvm/llvm-project/pull/110004,
https://github.com/llvm/llvm-project/pull/109975 and
https://github.com/llvm/llvm-project/pull/112145.
2025-01-18 13:22:34 +00:00
Florian Hahn
5fae408d3a
[VPlan] Dispatch to multiple exit blocks via middle blocks. (#112138)
A more lightweight variant of
https://github.com/llvm/llvm-project/pull/109193,
which dispatches to multiple exit blocks via the middle blocks.

The patch also introduces a bit of required scaffolding to enable
early-exit vectorization, including an option. At the moment, early-exit
vectorization doesn't come with legality checks, and is only used if the
option is provided and the loop has metadata forcing vectorization. This
is only intended to be used for testing during bring-up, with @david-arm
enabling auto early-exit vectorization plugging in the changes from
https://github.com/llvm/llvm-project/pull/88385.

PR: https://github.com/llvm/llvm-project/pull/112138
2024-12-11 21:11:05 +00:00
Florian Hahn
afef545efa
[VPlan] Address post-commit for #114305.
Apply suggested renaming and adjust placement as suggested in
https://github.com/llvm/llvm-project/pull/114305. Also drop unneeded
RPOT creation.
2024-12-08 21:24:19 +00:00
Florian Hahn
a7fda0e1e4
[VPlan] Introduce VPScalarPHIRecipe, use for can & EVL IV codegen (NFC). (#114305)
Introduce a general recipe to generate a scalar phi. Lower
VPCanonicalIVPHIRecipe and VPEVLBasedIVRecipe to VPScalarIVPHIrecipe
before plan execution, avoiding the need for duplicated ::execute
implementations. There are other cases that could benefit, including
in-loop reduction phis and pointer induction phis.

Builds on a similar idea as
https://github.com/llvm/llvm-project/pull/82270.

PR: https://github.com/llvm/llvm-project/pull/114305
2024-12-03 14:53:51 +00:00
Florian Hahn
2dfb1c664c
[VPlan] Try to hoist Previous (and operands), if sinking fails for FORs. (#108945)
In some cases, Previous (and its operands) can be hoisted. This allows
supporting additional cases where sinking of all users of to FOR fails,
e.g. due having to sink recipes with side-effects.

This fixes a crash where we fail to create a scalar VPlan for a
first-order recurrence, but can create a vector VPlan, because the trunc
instruction of an IV which generates the previous value of the
recurrence has been optimized to a truncated induction recipe, thus
hoisting it to the beginning.

Fixes https://github.com/llvm/llvm-project/issues/106523.

PR: https://github.com/llvm/llvm-project/pull/108945
2024-10-23 13:12:03 -07:00
Alexey Bataev
f148d5791b
[LV]Initial support for safe distance in predicated DataWithEVL vectorization mode.
Enabled initial support for max safe distance in DataWithEVL mode. If
max safe distance is required, need to emit special code:
CMP = icmp ult AVL, MAX_SAFE_DISTANCE
SAFE_AVL = select CMP, AVL, MAX_SAFE_DISTANCE
EVL = call i32 @llvm.experimental.get.vector.length(i64 SAFE_AVL)

while vectorize the loop in DataWithEVL tail folding mode.

Reviewers: fhahn

Reviewed By: fhahn

Pull Request: https://github.com/llvm/llvm-project/pull/102897
2024-10-18 15:51:49 -04:00
Florian Hahn
7f74651837
[VPlan] Use pointer to member 0 as VPInterleaveRecipe's pointer arg. (#106431)
Update VPInterleaveRecipe to always use the pointer to member 0 as
pointer argument. This in many cases helps to remove unneeded index
adjustments and simplifies VPInterleaveRecipe::execute.

In some rare cases, the address of member 0 does not dominate the insert
position of the interleave group. In those cases a PtrAdd VPInstruction
is emitted to compute the address of member 0 based on the address of
the insert position. Alternatively we could hoist the recipe computing
the address of member 0.
2024-10-06 22:53:13 +01:00
Florian Hahn
53266f73f0
[VPlan] Run DCE after unrolling.
This cleans up a number of dead recipes after unrolling if only their
first or last parts are used. This simplifies a number of tests.

Fixes https://github.com/llvm/llvm-project/issues/109581.
2024-09-22 22:08:46 +01:00
Florian Hahn
8ec406757c
[VPlan] Implement unrolling as VPlan-to-VPlan transform. (#95842)
This patch implements explicit unrolling by UF  as VPlan transform. In
follow up patches this will allow simplifying VPTransform state (no need
to store unrolled parts) as well as recipe execution (no need to
generate code for multiple parts in an each recipe). It also allows for
more general optimziations (e.g. avoid generating code for recipes that
are uniform-across parts).

It also unifies the logic dealing with unrolled parts in a single place,
rather than spreading it out across multiple places (e.g. VPlan post
processing for header-phi recipes previously.)

In the initial implementation, a number of recipes still take the
unrolled part as additional, optional argument, if their execution
depends on the unrolled part.

The computation for start/step values for scalable inductions changed
slightly. Previously the step would be computed as scalar and then
splatted, now vscale gets splatted and multiplied by the step in a
vector mul.

This has been split off https://github.com/llvm/llvm-project/pull/94339
which also includes changes to simplify VPTransfomState and recipes'
::execute.

The current version mostly leaves existing ::execute untouched and
instead sets VPTransfomState::UF to 1.

A follow-up patch will clean up all references to VPTransformState::UF.

Another follow-up patch will simplify VPTransformState to only store a
single vector value per VPValue.

PR: https://github.com/llvm/llvm-project/pull/95842
2024-09-21 19:47:37 +01:00
David Sherwood
f3029b330a
[NFC][LoopVectorize] Avoid passing ScalarEvolution to VPlanTransforms::optimize (#108380)
Whilst trying to write some VPlan unit tests I realised
that we don't need to pass a ScalarEvolution object into
VPlanTransforms::optimize because the only thing we
actually need is a LLVMContext.
2024-09-13 12:09:00 +01:00
Florian Hahn
16910a21ee
[VPlan] Move logic to create interleave groups to VPlanTransforms (NFC).
This is a step towards further breaking up the rather large
tryToBuildVPlanWithVPRecipes. It moves logic create interleave groups to
VPlanTransforms.cpp, where similar replacements for other recipes are
defined as well (e.g. EVL-based ones)
2024-08-28 15:56:09 +01:00
Shih-Po Hung
0338c55ea5
[LV, VPlan] Check if plan is compatible to EVL transform (#92092)
The transform updates all users of inductions to work based on EVL,
instead
of the VF directly. At the moment, widened inductions cannot be updated,
so
bail out if the plan contains any.
This patch introduces a check before applying EVL transform. If any
recipes in loop rely on RuntimeVF, the plan is discarded.
2024-05-25 08:22:49 +08:00
Alexey Bataev
413a66f339
[LV, VP]VP intrinsics support for the Loop Vectorizer + adding new tail-folding mode using EVL. (#76172)
This patch introduces generating VP intrinsics in the Loop Vectorizer.

Currently the Loop Vectorizer supports vector predication in a very
limited capacity via tail-folding and masked load/store/gather/scatter
intrinsics. However, this does not let architectures with active vector
length predication support take advantage of their capabilities.
Architectures with general masked predication support also can only take
advantage of predication on memory operations. By having a way for the
Loop Vectorizer to generate Vector Predication intrinsics, which (will)
provide a target-independent way to model predicated vector
instructions. These architectures can make better use of their
predication capabilities.

Our first approach (implemented in this patch) builds on top of the
existing tail-folding mechanism in the LV (just adds a new tail-folding
mode using EVL), but instead of generating masked intrinsics for memory
operations it generates VP intrinsics for loads/stores instructions. The
patch adds a new VPlanTransforms to replace the wide header predicate
compare with EVL and updates codegen for load/stores to use VP
store/load with EVL.

Other important part of this approach is how the Explicit Vector Length
is computed. (VP intrinsics define this vector length parameter as
Explicit Vector Length (EVL)). We use an experimental intrinsic
`get_vector_length`, that can be lowered to architecture specific
instruction(s) to compute EVL.

Also, added a new recipe to emit instructions for computing EVL. Using
VPlan in this way will eventually help build and compare VPlans
corresponding to different strategies and alternatives.

Differential Revision: https://reviews.llvm.org/D99750
2024-04-04 18:30:17 -04:00
Florian Hahn
20177c45db
[VPlan] Turn private members of VPlanTransforms to static funcs (NFC)
Private members of VPlanTransforms are only used inside
VPlanTransforms.cpp, just make them static.
2024-02-17 13:45:23 +00:00
Florian Hahn
debca7ee43
[VPlan] Move dropping of poison flags to VPlanTransforms. (NFC)
Move collectPoisonGeneratingFlags from InnerLoopVectorizer to
VPlanTransforms and also update its name. collectPoisonGeneratingFlags
already directly drops poison-generating flags, not only collecting it.
This means it is more appropriate to integerate it directly into the
VPlan transform pipeline.

The current implementation still calls back to legal to check if a block
needs predication, which should be improved in the future.
2024-02-14 12:28:58 +00:00
Kazu Hirata
8b1181133d [Transforms] Remove unused forward declarations (NFC) 2023-12-10 10:07:12 -08:00
Florian Hahn
70535f5e60
[VPlan] Replace IR based truncateToMinimalBitwidths with VPlan version.
This patch replaces the IR based truncateToMinimalBitwidths with a VPlan
version. This has 3 benefits:
1) the VPlan-based version is simpler; we don't need to implement
   special codegen for each supported instruction type like the IR based
   one.
2) Removes a dependency on the cost-model after VPlan execution and
3) Removes a use of getVPValue that uses underlying values after VPlan
   execution (See removed FIXME).

Depends on D149081.

Depends on D149079.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D149903
2023-12-02 16:12:38 +00:00
Florian Hahn
97687b7aea
[VPlan] Add active-lane-mask as VPlan-to-VPlan transformation.
This patch updates the mask creation code to always create compares of
the form (ICMP_ULE, wide canonical IV, backedge-taken-count) up front
when tail folding and introduce active-lane-mask as later
transformation.

This effectively makes (ICMP_ULE, wide canonical IV, backedge-taken-count)
the canonical form for tail-folding early on. Introducing more specific
active-lane-mask recipes is treated as a VPlan-to-VPlan optimization.

This has the advantage of keeping the logic  (and complexity) of
introducing active-lane-mask recipes in a single place, instead of
spreading the logic out across multiple functions. It also simplifies
initial VPlan construction and enables treating introducing EVL as
similar optimization.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D158779
2023-09-25 13:34:45 +01:00
Florian Hahn
a6d6730709
[LV] Split off code to optimize initial VPlan (NFC).
Split up tryToBuildVPlanWithVPRecipes into intial plan creation and
optimizations, by introducing a VPLanTransform::optimize helper.

Depends on D154640.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D154644
2023-08-04 13:21:20 +01:00
Florian Hahn
9259f41e62
[VPlan] Clear reduction flags directly as VPlanTransform.
After D150027, all relevant recipes should model their IR flags
directly. Instead of removing the flags after codegen as part of
fixReductions, drop poison generating flags directly from the recipes.

Depends on D150027.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D150028
2023-07-09 21:11:51 +01:00
Florian Hahn
6303fa369c
[VPlan] Remove DeadInsts arg from VPInstructionsToVPRecipes (NFC)
The argument isn't used. VPlan-based dead recipe removal can be used
instead.
2023-05-01 15:03:29 +01:00