332 Commits

Author SHA1 Message Date
Florian Hahn
8a7a7b5ffc
[VPlan] Remove unneeded code connecting blocks in VPBB:splitAt (NFC).
insertBlockAfter already takes care of transferring successors. Remove
unneeded code to transfer them manually.
2024-11-08 21:52:18 +00:00
Florian Hahn
596fd103f8
[VPlan] Share logic to connect predecessors in VPBB/VPIRBB execute (NFC)
This moves the common logic to connect IRBBs created for a VPBB to their
predecessors in the VPlan CFG, making it easier to keep in sync in the
future.
2024-11-04 19:01:39 +00:00
Kazu Hirata
aa825b74af
[Vectorize] Remove unused includes (NFC) (#114643)
Identified with misc-include-cleaner.
2024-11-03 08:58:51 -08:00
David Sherwood
4ed7bcb4a6
[VPlan][NFC] Add new getMiddleBlock interface to VPlan (#113558)
This work is in preparation for PRs #112138 and #88385 where
the middle block is not guaranteed to be the immediate successor
to the region block. I've simply add new getMiddleBlock()
interfaces to VPlan that for now just return

cast<VPBasicBlock>(VectorRegion->getSingleSuccessor())

Once PR #112138 lands we'll need to do more work to discover
the middle block.
2024-11-01 10:50:52 +00:00
Florian Hahn
3b4c45e4e5
[VPlan] Fix long comment added in b021464d35ca (NFC).
Fix formatting of comment added in b021464d35ca.
2024-10-31 21:05:00 +00:00
Florian Hahn
b021464d35
[VPlan] Introduce scalar loop header in plan, remove VPLiveOut. (#109975)
Update VPlan to include the scalar loop header. This allows retiring
VPLiveOut, as the remaining live-outs can now be handled by adding
operands to the wrapped phis in the scalar loop header.

Note that the current version only includes the scalar loop header, no
other loop blocks and also does not wrap it in a region block.

PR: https://github.com/llvm/llvm-project/pull/109975
2024-10-31 21:36:44 +01:00
Florian Hahn
7fe149cdf0
[VPlan] Replace getIRBasicBlock with IRBB in VPIRBB::execute (NFC).
Suggested in https://github.com/llvm/llvm-project/pull/109975. This
makes the function consistent throughout.
2024-10-27 16:22:18 +01:00
Florian Hahn
ef217a0f6b
[VPlan] Introduce and use getVectorPreheader (NFC).
Introduce a dedicated function to retrieve the vector preheader. This
ensures the correct block is used, even if the skeleton is exetended.
2024-10-23 21:01:52 -07:00
Florian Hahn
1d9b3222f3
[VPlan] Implement VPWidenSelectRecipe::computeCost.
Implement VPlan-based cost computation for VPWidenSelectRecipe.
2024-10-22 03:10:04 +01:00
Florian Hahn
b497010854
[VPlan] Use VPInstruction::Name when assigning names (NFCI).
This slightly improves the printing of VPInstructions. NFC except debug
output.
2024-10-18 05:52:35 +01:00
David Sherwood
175461a22a
[NFC][LoopVectorize] Make replaceVPBBWithIRVPBB more efficient (#111514)
In replaceVPBBWithIRVPBB we spend time erasing and appending
predecessors and successors from a list, when all we really have to do
is replace the old with the new. Not only is this more efficient, but it
also preserves the ordering of successors and predecessors. This is
something which may become important for vectorising early exit loops
(see PR #88385), since a VPIRInstruction is the wrapper for a live-out
phi with extra operands that map to the incoming block according to the
block's predecessor.
2024-10-15 14:11:55 +01:00
David Sherwood
0b2403197f
[LoopVectorize] In LoopVectorize.cpp start using getSymbolicMaxBackedgeTakenCount (#108833)
LoopVectorizationLegality currently only treats a loop as legal to vectorise
if PredicatedScalarEvolution::getBackedgeTakenCount returns a valid
SCEV, or more precisely that the loop must have an exact backedge taken
count. Therefore, in LoopVectorize.cpp we can safely replace all calls to
getBackedgeTakenCount with calls to getSymbolicMaxBackedgeTakenCount,
since the result is the same.

This also helps prepare the loop vectoriser for PR #88385.
2024-10-02 10:28:54 +01:00
Florian Hahn
725eb6bb12
[VPlan] Move createVPIRBasicBlock helper to VPIRBasicBlock (NFC).
Move the helper to VPIRBasicBlock to allow easier re-use outside
VPlan.cpp
2024-09-30 22:12:09 +01:00
Florian Hahn
aae7ac6685
[VPlan] Remove VPIteration, update to use directly VPLane instead (NFC)
After 8ec406757cb92 (https://github.com/llvm/llvm-project/pull/95842),
only the lane part of VPIteration is used.

Simplify the code by replacing remaining uses of VPIteration with VPLane directly.
2024-09-25 16:44:42 +01:00
Florian Hahn
3fbf6f8bb1
[LV] Remove more references of unrolled parts after 57f5d8f2fe.
Continue to clean up some now stale references of unroll parts and
related terminology as pointed out post-commit for 06c3a7d.
2024-09-24 15:50:31 +01:00
Florian Hahn
f76dae1586
[VPlan] Only store single scalar array per VPValue in VPTransState (NFC)
After 8ec406757cb92 (https://github.com/llvm/llvm-project/pull/95842),
VPTransformState only stores a single scalar vector per VPValue.

Simplify the code by replacing the nested SmallVector in PerPartScalars with
a single SmallVector and rename to VPV2Scalars for clarity.
2024-09-23 19:24:28 +01:00
Florian Hahn
57f5d8f2fe
[VPlan] Only store single vector per VPValue in VPTransformState. (NFC)
After 8ec406757cb92 (https://github.com/llvm/llvm-project/pull/95842),
VPTransformState only stores a single vector value per VPValue.

Simplify the code by replacing the SmallVector in PerPartOutput with a
single Value * and rename to VPV2Vector for clarity.

Also remove the redundant Part argument from various accessors.
2024-09-23 11:28:24 +01:00
Florian Hahn
06c3a7d2d7
[VPlan] Remove unneeded State.UF after 8ec406757cb92 (NFC).
State.UF is not needed any longer after 8ec406757cb92
(https://github.com/llvm/llvm-project/pull/95842). Clean it up,
simplifying ::execute of existing recipes.
2024-09-22 20:42:37 +01:00
Florian Hahn
8ec406757c
[VPlan] Implement unrolling as VPlan-to-VPlan transform. (#95842)
This patch implements explicit unrolling by UF  as VPlan transform. In
follow up patches this will allow simplifying VPTransform state (no need
to store unrolled parts) as well as recipe execution (no need to
generate code for multiple parts in an each recipe). It also allows for
more general optimziations (e.g. avoid generating code for recipes that
are uniform-across parts).

It also unifies the logic dealing with unrolled parts in a single place,
rather than spreading it out across multiple places (e.g. VPlan post
processing for header-phi recipes previously.)

In the initial implementation, a number of recipes still take the
unrolled part as additional, optional argument, if their execution
depends on the unrolled part.

The computation for start/step values for scalable inductions changed
slightly. Previously the step would be computed as scalar and then
splatted, now vscale gets splatted and multiplied by the step in a
vector mul.

This has been split off https://github.com/llvm/llvm-project/pull/94339
which also includes changes to simplify VPTransfomState and recipes'
::execute.

The current version mostly leaves existing ::execute untouched and
instead sets VPTransfomState::UF to 1.

A follow-up patch will clean up all references to VPTransformState::UF.

Another follow-up patch will simplify VPTransformState to only store a
single vector value per VPValue.

PR: https://github.com/llvm/llvm-project/pull/95842
2024-09-21 19:47:37 +01:00
Florian Hahn
4eb9838409
[VPlan] Generalize VPValue::isDefinedOutsideLoopRegions.
Update isDefinedOutsideLoopRegions to check if a recipe is defined
outside any region. Split off already approved
https://github.com/llvm/llvm-project/pull/95842 now that this can be
tested separately after landing VPlan-based LICM
https://github.com/llvm/llvm-project/issues/107501
2024-09-20 15:34:00 +01:00
Florian Hahn
256100489d
[VPlan] Rename isDefinedOutside[Vector]Regions -> [Loop] (NFC)
Clarify name of helper, split off from
https://github.com/llvm/llvm-project/pull/95842/files#r1765556732.
2024-09-19 11:20:31 +01:00
Florian Hahn
0d736e296c
[VPlan] Add getSCEVExprForVPValue util, use to get trip count SCEV (NFC) (#94464)
Add a new getSCEVExprForVPValue utility which can be used to get a SCEV
expression for a VPValue. The initial implementation only returns SCEVs
for live-in IR values (by constructing a SCEV based on the live-in IR
value) and VPExpandSCEVRecipe. This is enough to serve its first use,
getting a SCEV for a VPlan's trip count, but will be extended in the
future.

It also removes createTripCountSCEV, as the new helper can be used to
retrieve the SCEV from the VPlan.

PR: https://github.com/llvm/llvm-project/pull/94464
2024-09-18 14:41:56 +01:00
David Sherwood
b84c42944a
[NFC][LoopVectorize] Rename variable in replaceVPBBWithIRVPBB (#108543)
I've renamed the variable in replaceVPBBWithIRVPBB from IRMiddleVPBB ->
IRVPBB, since the function is used for more than just replacing the
middle VP block.
2024-09-17 12:54:55 +01:00
David Sherwood
b29c5b66fd
[NFC][LoopVectorize] Dont pass LLVMContext to VPTypeAnalysis constructor (#108540)
We already pass a Type object into the VPTypeAnalysis constructor, which
can be used to obtain the context. While in the same area it also made
sense to avoid passing the context into the VPTransformState and
VPCostContext constructors.
2024-09-16 09:12:11 +01:00
Florian Hahn
012dbec604
[VPlan] Handle ForceTargetInstructionCost in during precomputeCosts.
Make sure ForceTargetInstruction is respected in precomputeCosts.
2024-09-15 10:53:43 +01:00
Florian Hahn
f66509bf52
[VPlan] Clarify comment for replaceVPBBWithIRVPBB and add assert (NFCI).
Follow-up to suggestion during
https://github.com/llvm/llvm-project/pull/100735.

More specifically
9a40ed0919 (diff-6d0b73adfa9f8465923d2225ab6674ddcdeab71666f7a73dfaec7fa1246b3a1f)
2024-09-14 21:51:19 +01:00
Florian Hahn
f0c5caa814
[VPlan] Add VPIRInstruction, use for exit block live-outs. (#100735)
Add a new VPIRInstruction recipe to wrap existing IR instructions not to
be modified during execution, execept for PHIs. For PHIs, a single
VPValue
operand is allowed, and it is used to add a new incoming value for the
single predecessor VPBB. Expect PHIs, VPIRInstructions cannot have any
operands.

Depends on https://github.com/llvm/llvm-project/pull/100658.

PR: https://github.com/llvm/llvm-project/pull/100735
2024-09-14 21:21:55 +01:00
David Sherwood
f3029b330a
[NFC][LoopVectorize] Avoid passing ScalarEvolution to VPlanTransforms::optimize (#108380)
Whilst trying to write some VPlan unit tests I realised
that we don't need to pass a ScalarEvolution object into
VPlanTransforms::optimize because the only thing we
actually need is a LLVMContext.
2024-09-13 12:09:00 +01:00
Florian Hahn
a794ee4559
[VPlan] Add VPValue for VF, use it for VPWidenIntOrFpInductionRecipe. (#95305)
Similar to VFxUF, also add a VF VPValue to VPlan and use it to get the
runtime VF in VPWidenIntOrFpInductionRecipe. Code for VF is only
generated if there are users of VF, to avoid unnecessary test changes.

PR: https://github.com/llvm/llvm-project/pull/95305
2024-09-10 10:41:35 +01:00
Florian Hahn
1a5a1e9781
[VPlan] Assert that VFxUF is always used.
Add assertion to ensure invariant discussed in
https://github.com/llvm/llvm-project/pull/95305.
2024-09-09 14:26:09 +01:00
Florian Hahn
96e1320a9a
[VPlan] Move properlyDominates to VPDominatorTree (NFCI).
This allows for easier re-use in additional places in the future. Also
move code to VPlanAnalysis.cpp
2024-08-28 13:58:12 +01:00
Ramkumar Ramachandra
71ede8d831
VPlan: factor out VPlanUtils into its own file (NFC) (#105857) 2024-08-28 13:54:41 +01:00
Florian Hahn
4e04286d61
[VPlan] Only use selectVectorizationFactor for cross-check (NFCI). (#103033)
Use getBestVF to select VF up-front and only use
selectVectorizationFactor to get the VF legacy VF to check the
vectorization decision matches the VPlan-based cost model.

PR: https://github.com/llvm/llvm-project/pull/103033
2024-08-21 13:09:01 +02:00
Florian Hahn
f2fcd9cb97
[VPlan] Rename getBestPlanFor -> getPlanFor (NFC).
As suggested in https://github.com/llvm/llvm-project/pull/103033, more
accurately rename to getPlanFor , as it simplify returns the VPlan for
VF, relying on the fact that there is a single VPlan for each VF at the
moment.
2024-08-19 13:05:19 +01:00
Florian Hahn
cd60d10a10
[VPlan] Move some LoopVectorizationPlanner helpers to VPlan.cpp (NFC).
Members not requiring access to LoopVectorizationLegality or
LoopVectorizationCostModel can safely be moved out of the very large
LoopVectorization.cpp and are more accurately placed in VPlan.cpp
2024-08-19 09:58:46 +01:00
Mel Chen
e3d9b01a36
[VPlan][NFC] Make VPValue pointer const. (#101334) 2024-08-02 09:34:25 +08:00
Florian Hahn
b8741cc185
[VPlan] Relax assertion retrieving a scalar from VPTransformState::get.
The current assertion VPTransformState::get when retrieving a single
scalar only does not account for cases where a def has multiple users,
some demanding all scalar lanes, some demanding only a single scalar.

For an example, see the modified test case. Relax the assertion by also
allowing requesting scalar lanes only when the Def doesn't have only its
first lane used.

Fixes https://github.com/llvm/llvm-project/issues/88849.
2024-07-19 11:33:57 +01:00
Florian Hahn
b841e2eca3
Recommit "[VPlan] First step towards VPlan cost modeling. (#92555)"
This reverts commit 6f538f6a2d3224efda985e9eb09012fa4275ea92.

A number of crashes have been fixed by separate fixes, including
ttps://github.com/llvm/llvm-project/pull/96622. This version of the
PR also pre-computes the costs for branches (except the latch) instead
of computing their costs as part of costing of replicate regions, as
there may not be a direct correspondence between original branches and
number of replicate regions.

Original message:
This adds a new interface to compute the cost of recipes, VPBasicBlocks,
VPRegionBlocks and VPlan, initially falling back to the legacy cost model
for all recipes. Follow-up patches will gradually migrate recipes to
compute their own costs step-by-step.

It also adds getBestPlan function to LVP which computes the cost of all
VPlans and picks the most profitable one together with the most
profitable VF.

The VPlan selected by the VPlan cost model is executed and there is an
assert to catch cases where the VPlan cost model and the legacy cost
model disagree. Even though I checked a number of different build
configurations on AArch64 and X86, there may be some differences
that have been missed.

Additional discussions and context can be found in @arcbbb's
https://github.com/llvm/llvm-project/pull/67647 and
https://github.com/llvm/llvm-project/pull/67934 which is an earlier
version of the current PR.

PR: https://github.com/llvm/llvm-project/pull/92555
2024-07-10 14:22:21 +01:00
Florian Hahn
72937203dd
[VPlan] Create vector header and latch VPBBs in createInitialVPlan (NFC)
The empty header and latch blocks can be created together with the
vector loop region.

This is in preparation for splitting up the very large
tryToBuildVPlanWithVPRecipes into several distinct functions, as
suggested multiple times, including in
https://github.com/llvm/llvm-project/pull/94760
2024-07-09 12:41:12 +01:00
Florian Hahn
99d6c6d936
[VPlan] Model branch cond to enter scalar epilogue in VPlan. (#92651)
This patch moves branch condition creation to enter the scalar epilogue
loop to VPlan. Modeling the branch in the middle block also requires
modeling the successor blocks. This is done using the recently
introduced VPIRBasicBlock.

Note that the middle.block is still created as part of the skeleton and
then patched in during VPlan execution. Unfortunately the skeleton needs
to create the middle.block early on, as it is also used for induction
resume value creation and is also needed to properly update the
dominator tree during skeleton creation.

After this patch lands, I plan to move induction resume value and phi
node creation in the scalar preheader to VPlan. Once that is done, we
should be able to create the middle.block in VPlan directly.

This is a re-worked version based on the earlier
https://reviews.llvm.org/D150398 and the main change is the use of
VPIRBasicBlock.

Depends on https://github.com/llvm/llvm-project/pull/92525

PR: https://github.com/llvm/llvm-project/pull/92651
2024-07-05 10:08:42 +01:00
Florian Hahn
ef1773ad57
[VPlan] Rewrite cloneSESE to use 2 depth-first passes (NFCI).
Rewrite cloneSESE to perform 2 depth-first passes with the first one
cloning blocks and the second one updating the predecessors and
successors.

This is needed to preserve the correct predecessor/successor ordering
with https://github.com/llvm/llvm-project/pull/92651 and has been split
off as suggested.
2024-06-23 20:37:51 +01:00
Florian Hahn
31a94bd783
[VPlan] Rename Preheader -> Entry in createInitialVPlan (NFCI).
Split off from https://github.com/llvm/llvm-project/pull/92651 as
suggested.
2024-06-23 20:28:15 +01:00
Florian Hahn
f1f3c34b47
Revert "Recommit "[VPlan] First step towards VPlan cost modeling. (#92555)""
This reverts commit 242cc200ccb24e22eaf54aed7b0b0c84cfc54c0b and
eea150c84053035163f307b46549a2997a343ce9, as it is causing a build bot
failure and there have been a number of crashes reported at
https://github.com/llvm/llvm-project/pull/92555
2024-06-21 19:54:21 +01:00
Florian Hahn
eea150c840
[VPlan] Include IV phi and backedge cost in VPlan cost computation.
In WebAssembly, costs != 0 are assigned to be backedge and induction
phis, so make sure we include those costs in the VPlan-based cost model.

This fixes a downstream crash with WebAssembly after 242cc200ccb
(https://github.com/llvm/llvm-project/pull/92555)
2024-06-20 20:44:17 +01:00
Florian Hahn
242cc200cc
Recommit "[VPlan] First step towards VPlan cost modeling. (#92555)"
This reverts commit 6f538f6a2d3224efda985e9eb09012fa4275ea92.

Extra tests for crashes discovered when building Chromium have been
added in fb86cb7ec157689e, 3be7312f81ad2.

Original message:
This adds a new interface to compute the cost of recipes, VPBasicBlocks,
VPRegionBlocks and VPlan, initially falling back to the legacy cost model
for all recipes. Follow-up patches will gradually migrate recipes to
compute their own costs step-by-step.

It also adds getBestPlan function to LVP which computes the cost of all
VPlans and picks the most profitable one together with the most
profitable VF.

The VPlan selected by the VPlan cost model is executed and there is an
assert to catch cases where the VPlan cost model and the legacy cost
model disagree. Even though I checked a number of different build
configurations on AArch64 and X86, there may be some differences
that have been missed.

Additional discussions and context can be found in @arcbbb's
https://github.com/llvm/llvm-project/pull/67647 and
https://github.com/llvm/llvm-project/pull/67934 which is an earlier
version of the current PR.

PR: https://github.com/llvm/llvm-project/pull/92555
2024-06-20 17:32:52 +01:00
Florian Hahn
3808ba78de
[VPlan] Model middle block via VPIRBasicBlock. (#95816)
Use VPIRBasicBlock to wrap the middle block and implement patching up
branches in predecessors in VPIRBasicBlock::execute. The IR middle block
is only created after skeleton creation. Initially a regular
VPBasicBlock is created, which will later be replaced by a
VPIRBasicBlock once the middle IR basic block has been created.

Note that this slightly changes the order of instructions created in the
middle block; code generated by recipe execution in the middle block
will now be inserted before the terminator (and in between the compare
to used by the terminator). The original order will be restored in
https://github.com/llvm/llvm-project/pull/92651.


PR: https://github.com/llvm/llvm-project/pull/95816
2024-06-20 13:42:20 +01:00
Florian Hahn
c008647b3a
[VPlan] Introduce isHeaderMask helper (NFCI).
Split off from https://github.com/llvm/llvm-project/pull/92555
and slightly generalized to more precisely check for a header mask.

Use it to replace manual checks in collectHeaderMasks.
2024-06-19 20:17:01 +01:00
Florian Hahn
40a72f8cc4
[VPlan] Support extracting any lane of uniform value.
If the value we are extracting a lane from is uniform, only the first
lane will be set. Return lane 0 for any requested lane.

This fixes a crash when trying to extract the last lane for a
first-order recurrence resume value.

Fixes https://github.com/llvm/llvm-project/issues/95520.
2024-06-14 22:16:52 +01:00
Arthur Eubanks
6f538f6a2d Revert "Recommit "[VPlan] First step towards VPlan cost modeling. (#92555)""
This reverts commit 90fd99c0795711e1cf762a02b29b0a702f86a264.
This reverts commit 43e6f46936e177e47de6627a74b047ba27561b44.

Causes crashes, see comments on https://github.com/llvm/llvm-project/pull/92555.
2024-06-14 17:47:08 +00:00
Florian Hahn
90fd99c079
Recommit "[VPlan] First step towards VPlan cost modeling. (#92555)"
This reverts commit 46080abe9b136821eda2a1a27d8a13ceac349f8c.

Extra tests have been added in 52d29eb287.

Original message:
This adds a new interface to compute the cost of recipes, VPBasicBlocks,
VPRegionBlocks and VPlan, initially falling back to the legacy cost model
for all recipes. Follow-up patches will gradually migrate recipes to
compute their own costs step-by-step.

It also adds getBestPlan function to LVP which computes the cost of all
VPlans and picks the most profitable one together with the most
profitable VF.

The VPlan selected by the VPlan cost model is executed and there is an
assert to catch cases where the VPlan cost model and the legacy cost
model disagree. Even though I checked a number of different build
configurations on AArch64 and X86, there may be some differences
that have been missed.

Additional discussions and context can be found in @arcbbb's
https://github.com/llvm/llvm-project/pull/67647 and
https://github.com/llvm/llvm-project/pull/67934 which is an earlier
version of the current PR.

PR: https://github.com/llvm/llvm-project/pull/92555
2024-06-14 12:33:48 +01:00