349 Commits

Author SHA1 Message Date
LiqinWeng
b5f0ec80d5
[VPlan] Remove redundant printing of final VPlan in VPlan::execute (#121048)
Printing the final VPlan multiple times causes problems when testing ir-bb.
2024-12-25 10:11:02 +08:00
Florian Hahn
5ca3794e82
[VPlan] Move initial VPlan block creation to constructor. (NFC)
This sets up the initial blocks needed to initialize a VPlan directly
in the constructor. This will allow tracking of all created blocks
directly in VPlan, simplifying block deletion.
2024-12-18 22:00:30 +00:00
Florian Hahn
58cfa39861
[VPlan] Remove legacy VPlan() constructors (NFC).
The constructors were retained to reduce the diff during transition.

Remove them now.
2024-12-17 08:22:22 +00:00
Florian Hahn
95e509a989
[VPlan] Add VPWidenInduction recipe as common base class (NFC). (#120008)
This helps to simplify some existing code and new code
(https://github.com/llvm/llvm-project/pull/112145)

PR: https://github.com/llvm/llvm-project/pull/120008
2024-12-16 09:40:03 +00:00
Florian Hahn
c95af0844d
[VPlan] Move ::getVectorLoopRegion out of ifdef (NFC).
Fixes a build failure with assertions disabled after
6c8f41d336747.
2024-12-12 16:21:21 +00:00
Florian Hahn
6c8f41d336
[VPlan] Hook IR blocks into VPlan during skeleton creation (NFC) (#114292)
As a first step to move towards modeling the full skeleton in VPlan,
start by wrapping IR blocks created during legacy skeleton creation in
VPIRBasicBlocks and hook them into the VPlan. This means the skeleton
CFG is represented in VPlan, just before execute. This allows moving
parts of skeleton creation into recipes in the VPBBs gradually.

Note that this allows retiring some manual DT updates, as this will be
handled automatically during VPlan execution.

PR: https://github.com/llvm/llvm-project/pull/114292
2024-12-12 15:58:16 +00:00
Luke Lau
b26fe5b7e9
[VPlan] Use variadic isa<> in a few more places. NFC (#119538)
2024-12-12 13:26:39 +08:00
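Illustration (hypothetical names): LLVM's `isa<>` accepts several target types, so `isa<A>(R) || isa<B>(R)` can be written as `isa<A, B>(R)`. A minimal sketch of how such a variadic check can be built on `classof` with a C++17 fold expression, assuming a toy recipe hierarchy (`Recipe`, `WidenLoad`, `WidenStore`, `Phi`, `isaAny`) that is not the real LLVM code:

```cpp
#include <cassert>

// Toy recipe hierarchy with an LLVM-style kind discriminator.
// These classes are hypothetical stand-ins, not the real VPlan recipes.
struct Recipe {
  enum class Kind { WidenLoad, WidenStore, Phi };
  explicit Recipe(Kind K) : TheKind(K) {}
  Kind getKind() const { return TheKind; }

private:
  Kind TheKind;
};

struct WidenLoad : Recipe {
  WidenLoad() : Recipe(Kind::WidenLoad) {}
  static bool classof(const Recipe *R) { return R->getKind() == Kind::WidenLoad; }
};

struct WidenStore : Recipe {
  WidenStore() : Recipe(Kind::WidenStore) {}
  static bool classof(const Recipe *R) { return R->getKind() == Kind::WidenStore; }
};

struct Phi : Recipe {
  Phi() : Recipe(Kind::Phi) {}
  static bool classof(const Recipe *R) { return R->getKind() == Kind::Phi; }
};

// Hypothetical variadic check: true if V is an instance of any type in Ts.
// The fold expression expands to Ts1::classof(V) || Ts2::classof(V) || ...
template <typename... Ts, typename From>
bool isaAny(const From *V) {
  static_assert(sizeof...(Ts) > 0, "isaAny needs at least one target type");
  return (Ts::classof(V) || ...);
}
```

For example, `isaAny<WidenLoad, WidenStore>(R)` is true for either memory recipe and false for a `Phi`.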
Florian Hahn
5fae408d3a
[VPlan] Dispatch to multiple exit blocks via middle blocks. (#112138)
A more lightweight variant of
https://github.com/llvm/llvm-project/pull/109193,
which dispatches to multiple exit blocks via the middle blocks.

The patch also introduces a bit of required scaffolding to enable
early-exit vectorization, including an option. At the moment, early-exit
vectorization doesn't come with legality checks, and is only used if the
option is provided and the loop has metadata forcing vectorization. This
is only intended to be used for testing during bring-up, with @david-arm
enabling auto early-exit vectorization by plugging in the changes from
https://github.com/llvm/llvm-project/pull/88385.

PR: https://github.com/llvm/llvm-project/pull/112138
2024-12-11 21:11:05 +00:00
Florian Hahn
e9834209aa
[VPlan] Move convertToConcreteRecipes to end of VPlan-opt phase (NFCI).
Adjust placement as suggested in
https://github.com/llvm/llvm-project/pull/114305, after some refactoring
to prepare for the move.
2024-12-10 09:13:13 +00:00
Florian Hahn
0e70289f37
[VPlan] Create canonical IV resume value for epilogue in VPlan. (NFCI)
Update the code to create induction resume PHIs to also create a resume
phi for the canonical induction during epilogue vectorization. This
unifies the code for handling induction resume values and removes the
need to manually create a resume PHI and explicitly return it during
epilogue creation.

Overall it helps to move the code for updating the canonical induction
resume value to the place where all other header phi resume values are
updated.

This is NFC, modulo order of the created phis.
2024-12-09 23:11:38 +00:00
Florian Hahn
ec22b1ab47
[VPlan] Iterate over blocks in VPlan::execute in RPOT (NFC).
This prepares for more complex CFGs in VPlan, as in
    https://github.com/llvm/llvm-project/pull/114292
    https://github.com/llvm/llvm-project/pull/112138
2024-12-07 10:19:27 +00:00
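Why reverse post-order helps: in an acyclic CFG, RPOT visits every block only after all of its predecessors, so when a block with several predecessors (for example, a block reached from more than one exit) is generated, the IR for its predecessors already exists. A minimal sketch on a hypothetical toy `Block` type, not VPlan's own traversal utilities:

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical toy CFG node (not VPBlockBase).
struct Block {
  std::string Name;
  std::vector<Block *> Succs;
};

// Reverse post-order: record blocks in DFS post-order, then reverse.
std::vector<Block *> reversePostOrder(Block *Entry) {
  std::vector<Block *> PostOrder;
  std::unordered_set<Block *> Visited;
  std::function<void(Block *)> Visit = [&](Block *B) {
    if (!Visited.insert(B).second)
      return; // Already visited.
    for (Block *S : B->Succs)
      Visit(S);
    PostOrder.push_back(B); // All successors finished first.
  };
  Visit(Entry);
  std::reverse(PostOrder.begin(), PostOrder.end());
  return PostOrder;
}
```

On a diamond `entry -> {a, b} -> join`, this yields `entry, b, a, join`: `join` comes last, after both of its predecessors.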
Florian Hahn
156da98683
[VPlan] Move printing final VPlan to ::execute (NFC).
This moves printing of the final VPlan to ::execute. This ensures the
final VPlan is printed, including recipes introduced by late lowering
transforms and skeleton construction.

Split off from https://github.com/llvm/llvm-project/pull/114292, to
simplify the diff.
2024-12-07 09:39:10 +00:00
Florian Hahn
6797b0f0c0
[VPlan] Use RPOT for VPlan codegen and printing.
This splits off changes for more complex CFGs in VPlan from both
    https://github.com/llvm/llvm-project/pull/114292
    https://github.com/llvm/llvm-project/pull/112138

This simplifies their respective diffs.
2024-12-06 21:49:00 +00:00
Florian Hahn
a7fda0e1e4
[VPlan] Introduce VPScalarPHIRecipe, use for can & EVL IV codegen (NFC). (#114305)
Introduce a general recipe to generate a scalar phi. Lower
VPCanonicalIVPHIRecipe and VPEVLBasedIVRecipe to VPScalarPHIRecipe
before plan execution, avoiding the need for duplicated ::execute
implementations. There are other cases that could benefit, including
in-loop reduction phis and pointer induction phis.

Builds on a similar idea as
https://github.com/llvm/llvm-project/pull/82270.

PR: https://github.com/llvm/llvm-project/pull/114305
2024-12-03 14:53:51 +00:00
Florian Hahn
0dbdc6dc35
[VPlan] Simplify code to re-use existing basic blocks (NFCI).
Restructure and slightly simplify code to re-use existing basic blocks.
2024-11-24 19:14:29 +00:00
Finn Plummer
8663b8777e
[NFC][VectorUtils][TargetTransformInfo] Add isVectorIntrinsicWithOverloadTypeAtArg api (#114849)
This change allows target intrinsics to specify and override overloaded types.

- Updates `ReplaceWithVecLib` to not provide TTI as there most probably won't be a use-case
- Updates `SLPVectorizer` to use available TTI
- Updates `VPTransformState` to pass down TTI
- Updates `VPlanRecipe` to use passed-down TTI

This change will let us add scalarization for `asdouble`:  #114847
2024-11-21 11:04:25 -08:00
Florian Hahn
a5a1612deb
[VPlan] Consistently use DEBUG_TYPE loop-vectorize.
This ensures debug messages in VPlan.cpp are included in the commonly
used -debug-only=loop-vectorize.
2024-11-10 09:17:03 +00:00
Florian Hahn
8a7a7b5ffc
[VPlan] Remove unneeded code connecting blocks in VPBB:splitAt (NFC).
insertBlockAfter already takes care of transferring successors. Remove
unneeded code to transfer them manually.
2024-11-08 21:52:18 +00:00
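A toy model of the contract this commit relies on, assuming a hypothetical `Block` type with explicit predecessor and successor lists (not the VPBlockBase/VPBlockUtils interface): inserting `NewBB` after `BB` already hands over `BB`'s successors, so code that splits a block only has to create the new block and insert it.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical toy block with explicit predecessor/successor lists.
struct Block {
  std::vector<Block *> Preds;
  std::vector<Block *> Succs;
};

// NewBB is placed directly after BB, inherits all of BB's successors
// (in order), and becomes BB's only successor.
void insertBlockAfter(Block *NewBB, Block *BB) {
  assert(NewBB->Preds.empty() && NewBB->Succs.empty() &&
         "NewBB must not be connected yet");
  for (Block *Succ : BB->Succs) {
    // Redirect Succ's predecessor entry for BB to NewBB, in place.
    std::replace(Succ->Preds.begin(), Succ->Preds.end(), BB, NewBB);
    NewBB->Succs.push_back(Succ);
  }
  BB->Succs = {NewBB};
  NewBB->Preds.push_back(BB);
}
```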
Florian Hahn
596fd103f8
[VPlan] Share logic to connect predecessors in VPBB/VPIRBB execute (NFC)
This moves the common logic to connect IRBBs created for a VPBB to their
predecessors in the VPlan CFG, making it easier to keep in sync in the
future.
2024-11-04 19:01:39 +00:00
Kazu Hirata
aa825b74af
[Vectorize] Remove unused includes (NFC) (#114643)
Identified with misc-include-cleaner.
2024-11-03 08:58:51 -08:00
David Sherwood
4ed7bcb4a6
[VPlan][NFC] Add new getMiddleBlock interface to VPlan (#113558)
This work is in preparation for PRs #112138 and #88385 where
the middle block is not guaranteed to be the immediate successor
to the region block. I've simply added new getMiddleBlock()
interfaces to VPlan that for now just return

cast<VPBasicBlock>(VectorRegion->getSingleSuccessor())

Once PR #112138 lands we'll need to do more work to discover
the middle block.
2024-11-01 10:50:52 +00:00
Florian Hahn
3b4c45e4e5
[VPlan] Fix long comment added in b021464d35ca (NFC).
Fix formatting of comment added in b021464d35ca.
2024-10-31 21:05:00 +00:00
Florian Hahn
b021464d35
[VPlan] Introduce scalar loop header in plan, remove VPLiveOut. (#109975)
Update VPlan to include the scalar loop header. This allows retiring
VPLiveOut, as the remaining live-outs can now be handled by adding
operands to the wrapped phis in the scalar loop header.

Note that the current version only includes the scalar loop header, no
other loop blocks and also does not wrap it in a region block.

PR: https://github.com/llvm/llvm-project/pull/109975
2024-10-31 21:36:44 +01:00
Florian Hahn
7fe149cdf0
[VPlan] Replace getIRBasicBlock with IRBB in VPIRBB::execute (NFC).
Suggested in https://github.com/llvm/llvm-project/pull/109975. This
makes the function consistent throughout.
2024-10-27 16:22:18 +01:00
Florian Hahn
ef217a0f6b
[VPlan] Introduce and use getVectorPreheader (NFC).
Introduce a dedicated function to retrieve the vector preheader. This
ensures the correct block is used, even if the skeleton is extended.
2024-10-23 21:01:52 -07:00
Florian Hahn
1d9b3222f3
[VPlan] Implement VPWidenSelectRecipe::computeCost.
Implement VPlan-based cost computation for VPWidenSelectRecipe.
2024-10-22 03:10:04 +01:00
Florian Hahn
b497010854
[VPlan] Use VPInstruction::Name when assigning names (NFCI).
This slightly improves the printing of VPInstructions. NFC except debug
output.
2024-10-18 05:52:35 +01:00
David Sherwood
175461a22a
[NFC][LoopVectorize] Make replaceVPBBWithIRVPBB more efficient (#111514)
In replaceVPBBWithIRVPBB we spend time erasing and appending
predecessors and successors from a list, when all we really have to do
is replace the old with the new. Not only is this more efficient, but it
also preserves the ordering of successors and predecessors. This is
something which may become important for vectorising early exit loops
(see PR #88385), since a VPIRInstruction is the wrapper for a live-out
phi with extra operands that map to the incoming block according to the
block's predecessor.
2024-10-15 14:11:55 +01:00
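A minimal sketch of the in-place replacement described above, on a hypothetical toy `Block` type rather than VPlan's classes. Each neighbour's edge list is rewritten with `std::replace`, so `NewBB` takes exactly the slot `OldBB` had; erasing and re-appending would move it to the end, which breaks any positional mapping between phi operands and predecessors.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical toy block with explicit predecessor/successor lists.
struct Block {
  std::vector<Block *> Preds;
  std::vector<Block *> Succs;
};

// NewBB takes OldBB's place in the CFG; every edge list keeps its order.
void replaceBlockInPlace(Block *OldBB, Block *NewBB) {
  for (Block *P : OldBB->Preds)
    std::replace(P->Succs.begin(), P->Succs.end(), OldBB, NewBB);
  for (Block *S : OldBB->Succs)
    std::replace(S->Preds.begin(), S->Preds.end(), OldBB, NewBB);
  NewBB->Preds = std::move(OldBB->Preds);
  NewBB->Succs = std::move(OldBB->Succs);
  // Leave OldBB fully disconnected.
  OldBB->Preds.clear();
  OldBB->Succs.clear();
}
```

If `Exit` had predecessors `[OldBB, A]`, it ends up with `[NewBB, A]`; erase-and-append would produce `[A, NewBB]`.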
David Sherwood
0b2403197f
[LoopVectorize] In LoopVectorize.cpp start using getSymbolicMaxBackedgeTakenCount (#108833)
LoopVectorizationLegality currently only treats a loop as legal to vectorise
if PredicatedScalarEvolution::getBackedgeTakenCount returns a valid
SCEV; more precisely, the loop must have an exact backedge-taken
count. Therefore, in LoopVectorize.cpp we can safely replace all calls to
getBackedgeTakenCount with calls to getSymbolicMaxBackedgeTakenCount,
since the result is the same.

This also helps prepare the loop vectoriser for PR #88385.
2024-10-02 10:28:54 +01:00
Florian Hahn
725eb6bb12
[VPlan] Move createVPIRBasicBlock helper to VPIRBasicBlock (NFC).
Move the helper to VPIRBasicBlock to allow easier re-use outside
VPlan.cpp
2024-09-30 22:12:09 +01:00
Florian Hahn
aae7ac6685
[VPlan] Remove VPIteration, update to use directly VPLane instead (NFC)
After 8ec406757cb92 (https://github.com/llvm/llvm-project/pull/95842),
only the lane part of VPIteration is used.

Simplify the code by replacing remaining uses of VPIteration with VPLane directly.
2024-09-25 16:44:42 +01:00
Florian Hahn
3fbf6f8bb1
[LV] Remove more references of unrolled parts after 57f5d8f2fe.
Continue to clean up some now stale references of unroll parts and
related terminology as pointed out post-commit for 06c3a7d.
2024-09-24 15:50:31 +01:00
Florian Hahn
f76dae1586
[VPlan] Only store single scalar array per VPValue in VPTransformState (NFC)
After 8ec406757cb92 (https://github.com/llvm/llvm-project/pull/95842),
VPTransformState only stores a single scalar vector per VPValue.

Simplify the code by replacing the nested SmallVector in PerPartScalars with
a single SmallVector and rename to VPV2Scalars for clarity.
2024-09-23 19:24:28 +01:00
Florian Hahn
57f5d8f2fe
[VPlan] Only store single vector per VPValue in VPTransformState. (NFC)
After 8ec406757cb92 (https://github.com/llvm/llvm-project/pull/95842),
VPTransformState only stores a single vector value per VPValue.

Simplify the code by replacing the SmallVector in PerPartOutput with a
single Value * and rename to VPV2Vector for clarity.

Also remove the redundant Part argument from various accessors.
2024-09-23 11:28:24 +01:00
Florian Hahn
06c3a7d2d7
[VPlan] Remove unneeded State.UF after 8ec406757cb92 (NFC).
State.UF is not needed any longer after 8ec406757cb92
(https://github.com/llvm/llvm-project/pull/95842). Clean it up,
simplifying ::execute of existing recipes.
2024-09-22 20:42:37 +01:00
Florian Hahn
8ec406757c
[VPlan] Implement unrolling as VPlan-to-VPlan transform. (#95842)
This patch implements explicit unrolling by UF as a VPlan transform. In
follow-up patches this will allow simplifying VPTransformState (no need
to store unrolled parts) as well as recipe execution (no need to
generate code for multiple parts in each recipe). It also allows for
more general optimizations (e.g. avoiding code generation for recipes that
are uniform across parts).

It also unifies the logic dealing with unrolled parts in a single place,
rather than spreading it out across multiple places (e.g. VPlan
post-processing for header-phi recipes previously).

In the initial implementation, a number of recipes still take the
unrolled part as an additional, optional argument if their execution
depends on the unrolled part.

The computation for start/step values for scalable inductions changed
slightly. Previously the step would be computed as scalar and then
splatted, now vscale gets splatted and multiplied by the step in a
vector mul.

This has been split off from https://github.com/llvm/llvm-project/pull/94339,
which also includes changes to simplify VPTransformState and recipes'
::execute.

The current version mostly leaves existing ::execute untouched and
instead sets VPTransformState::UF to 1.

A follow-up patch will clean up all references to VPTransformState::UF.

Another follow-up patch will simplify VPTransformState to only store a
single vector value per VPValue.

PR: https://github.com/llvm/llvm-project/pull/95842
2024-09-21 19:47:37 +01:00
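A worked illustration of what the unrolled copies compute, as a hypothetical toy model rather than the VPlan transform itself: with vectorization factor VF and unroll factor UF, each vector-loop iteration covers VF * UF scalar iterations, and lane `L` of part `P` corresponds to scalar iteration `Base + P * VF + L`. For an integer induction with start `Start` and step `Step`, each part therefore needs its own offset `P * VF * Step`, the kind of per-part detail that explicit unrolling expresses in the plan rather than by looping over parts inside each recipe's ::execute.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical toy model (not VPlan): values of a widened integer induction
// in vector iteration VectorIter, after unrolling by UF.
// Result[P][L] is lane L of unrolled part P.
std::vector<std::vector<int64_t>>
unrolledInductionValues(int64_t Start, int64_t Step, int64_t VectorIter,
                        unsigned VF, unsigned UF) {
  std::vector<std::vector<int64_t>> Parts(UF, std::vector<int64_t>(VF));
  // Scalar iteration handled by lane 0 of part 0 in this vector iteration.
  int64_t Base = VectorIter * int64_t(VF) * int64_t(UF);
  for (unsigned P = 0; P < UF; ++P)
    for (unsigned L = 0; L < VF; ++L)
      Parts[P][L] = Start + (Base + int64_t(P) * VF + L) * Step;
  return Parts;
}
```

With Start = 0, Step = 2, VF = 4 and UF = 2, vector iteration 1 covers scalar iterations 8 to 15: part 0 holds 16, 18, 20, 22 and part 1 holds 24, 26, 28, 30.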
Florian Hahn
4eb9838409
[VPlan] Generalize VPValue::isDefinedOutsideLoopRegions.
Update isDefinedOutsideLoopRegions to check if a recipe is defined
outside any region. Split off from the already approved
https://github.com/llvm/llvm-project/pull/95842 now that this can be
tested separately after landing VPlan-based LICM
https://github.com/llvm/llvm-project/issues/107501
2024-09-20 15:34:00 +01:00
Florian Hahn
256100489d
[VPlan] Rename isDefinedOutside[Vector]Regions -> [Loop] (NFC)
Clarify name of helper, split off from
https://github.com/llvm/llvm-project/pull/95842/files#r1765556732.
2024-09-19 11:20:31 +01:00
Florian Hahn
0d736e296c
[VPlan] Add getSCEVExprForVPValue util, use to get trip count SCEV (NFC) (#94464)
Add a new getSCEVExprForVPValue utility which can be used to get a SCEV
expression for a VPValue. The initial implementation only returns SCEVs
for live-in IR values (by constructing a SCEV based on the live-in IR
value) and VPExpandSCEVRecipe. This is enough to serve its first use,
getting a SCEV for a VPlan's trip count, but will be extended in the
future.

It also removes createTripCountSCEV, as the new helper can be used to
retrieve the SCEV from the VPlan.

PR: https://github.com/llvm/llvm-project/pull/94464
2024-09-18 14:41:56 +01:00
David Sherwood
b84c42944a
[NFC][LoopVectorize] Rename variable in replaceVPBBWithIRVPBB (#108543)
I've renamed the variable in replaceVPBBWithIRVPBB from IRMiddleVPBB ->
IRVPBB, since the function is used for more than just replacing the
middle VP block.
2024-09-17 12:54:55 +01:00
David Sherwood
b29c5b66fd
[NFC][LoopVectorize] Dont pass LLVMContext to VPTypeAnalysis constructor (#108540)
We already pass a Type object into the VPTypeAnalysis constructor, which
can be used to obtain the context. While in the same area it also made
sense to avoid passing the context into the VPTransformState and
VPCostContext constructors.
2024-09-16 09:12:11 +01:00
Florian Hahn
012dbec604
[VPlan] Handle ForceTargetInstructionCost in precomputeCosts.
Make sure ForceTargetInstructionCost is respected in precomputeCosts.
2024-09-15 10:53:43 +01:00
Florian Hahn
f66509bf52
[VPlan] Clarify comment for replaceVPBBWithIRVPBB and add assert (NFCI).
Follow-up to suggestion during
https://github.com/llvm/llvm-project/pull/100735.

More specifically
9a40ed0919 (diff-6d0b73adfa9f8465923d2225ab6674ddcdeab71666f7a73dfaec7fa1246b3a1f)
2024-09-14 21:51:19 +01:00
Florian Hahn
f0c5caa814
[VPlan] Add VPIRInstruction, use for exit block live-outs. (#100735)
Add a new VPIRInstruction recipe to wrap existing IR instructions that are
not modified during execution, except for PHIs. For PHIs, a single VPValue
operand is allowed, and it is used to add a new incoming value for the
single predecessor VPBB. Except for PHIs, VPIRInstructions cannot have any
operands.

Depends on https://github.com/llvm/llvm-project/pull/100658.

PR: https://github.com/llvm/llvm-project/pull/100735
2024-09-14 21:21:55 +01:00
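A minimal sketch of the operand rule described above, modeling only the phi case with hypothetical toy types (`IRPhi`, `PhiWrapper`) rather than the real VPIRInstruction: the wrapped phi is not otherwise modified, and its optional single operand is appended as the incoming value for the single predecessor block.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy IR phi: (incoming value, incoming block) pairs, by name.
struct IRPhi {
  std::vector<std::pair<std::string, std::string>> Incoming;
};

// Hypothetical wrapper around an existing IR phi. It never rewrites the
// phi; it may only add one incoming value for the single predecessor.
struct PhiWrapper {
  IRPhi *WrappedPhi;
  std::vector<std::string> Operands; // at most one operand allowed

  void execute(const std::string &SinglePredBlock) const {
    assert(Operands.size() <= 1 && "a wrapped phi takes at most one operand");
    if (Operands.empty())
      return; // No new incoming value; the IR is not touched.
    WrappedPhi->Incoming.emplace_back(Operands.front(), SinglePredBlock);
  }
};
```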
David Sherwood
f3029b330a
[NFC][LoopVectorize] Avoid passing ScalarEvolution to VPlanTransforms::optimize (#108380)
Whilst trying to write some VPlan unit tests I realised
that we don't need to pass a ScalarEvolution object into
VPlanTransforms::optimize because the only thing we
actually need is a LLVMContext.
2024-09-13 12:09:00 +01:00
Florian Hahn
a794ee4559
[VPlan] Add VPValue for VF, use it for VPWidenIntOrFpInductionRecipe. (#95305)
Similar to VFxUF, also add a VF VPValue to VPlan and use it to get the
runtime VF in VPWidenIntOrFpInductionRecipe. Code for VF is only
generated if there are users of VF, to avoid unnecessary test changes.

PR: https://github.com/llvm/llvm-project/pull/95305
2024-09-10 10:41:35 +01:00
Florian Hahn
1a5a1e9781
[VPlan] Assert that VFxUF is always used.
Add assertion to ensure invariant discussed in
https://github.com/llvm/llvm-project/pull/95305.
2024-09-09 14:26:09 +01:00
Florian Hahn
96e1320a9a
[VPlan] Move properlyDominates to VPDominatorTree (NFCI).
This allows for easier re-use in additional places in the future. Also
move code to VPlanAnalysis.cpp
2024-08-28 13:58:12 +01:00
Ramkumar Ramachandra
71ede8d831
VPlan: factor out VPlanUtils into its own file (NFC) (#105857) 2024-08-28 13:54:41 +01:00
Florian Hahn
4e04286d61
[VPlan] Only use selectVectorizationFactor for cross-check (NFCI). (#103033)
Use getBestVF to select the VF up-front and only use
selectVectorizationFactor to get the legacy VF, to check that the
vectorization decision matches the VPlan-based cost model.

PR: https://github.com/llvm/llvm-project/pull/103033
2024-08-21 13:09:01 +02:00