Split up the not clearly named prepareForVectorization transform into
buildVPlan0, which adds the vector preheader, middle and scalar
preheader blocks, as well as the canonical induction recipes and sets
the trip count. The new transform is run directly after building the
plain CFG VPlan initially.
The remaining code handling early exits and adding the branch in the
middle block is renamed to handleEarlyExitsAndAddMiddleCheck and still
runs at the original position.
With the code movement, we only have to add the skeleton once to the
initial VPlan, and cloning will take care of the rest. It will also
enable moving other construction steps to work directly on VPlan0, like
adding resume phis.
PR: https://github.com/llvm/llvm-project/pull/150848
The initial VPlan closely reflects the original scalar loop, so unsing
VPWidenPHIRecipe here is premature. Widened phi recipes should only be
introduced together with other widened recipes.
PR: https://github.com/llvm/llvm-project/pull/150847
Update LV to vectorize maxnum/minnum reductions without fast-math flags,
by adding an extra check in the loop if any inputs to maxnum/minnum are
NaN, due to maxnum/minnum behavior w.r.t to signaling NaNs. Signed-zeros
are already handled consistently by maxnum/minnum.
If any input is NaN,
*exit the vector loop,
*compute the reduction result up to the vector iteration that contained
NaN inputs and
* resume in the scalar loop
New recurrence kinds are added for reductions using maxnum/minnum
without fast-math flags.
PR: https://github.com/llvm/llvm-project/pull/148239
Simplify the handling of exit users by generating all extracts first
(safe option), and have FOR handling optimize the extracts, similar to
already done for reductions and inductions.
NFC modulo first-order recurrence extract order in middle block.
VPlanConstruction.cpp doesn't use IR pattern matching, so using
VPlanPatternMatch can be moved to the top-level, to slightly reduce the
diff in follow-up changes.
Connect SCEV and memory runtime check block directly in VPlan as
VPIRBasicBlocks, removing ILV::emitSCEVChecks and
ILV::emitMemRuntimeChecks.
The new logic is currently split across
LoopVectorizationPlanner::addRuntimeChecks which collects a list of
{Condition, CheckBlock} pairs and performs some checks and emits remarks
if needed. The list of checks is then added to VPlan in
VPlanTransforms::connectCheckBlocks.
PR: https://github.com/llvm/llvm-project/pull/143879
Update initial construction to connect the Plan's entry to the scalar
preheader during initial construction. This moves a small part of the
skeleton creation out of ILV and will also enable replacing
VPInstruction::ResumePhi with regular VPPhi recipes.
Resume phis need 2 incoming values to start with, the second being the
bypass value from the scalar ph (and used to replicate the incoming
value for other bypass blocks). Adding the extra edge ensures we
incoming values for resume phis match the incoming blocks.
PR: https://github.com/llvm/llvm-project/pull/140132
This reverts commit 793bb6b257fa4d9f4af169a4366cab3da01f2e1f.
The recommitted version contains a fix to make sure only the original
phis are processed in convertPhisToBlends nu collecting them in a vector
first. This fixes a crash when no mask is needed, because there is only
a single incoming value.
Original message:
This patch moves the logic to predicate and linearize a VPlan to a
dedicated VPlan transform. It mostly ports the existing logic directly.
There are a number of follow-ups planned in the near future to
further improve on the implementation:
* Edge and block masks are cached in VPPredicator, but the block masks
are still made available to VPRecipeBuilder, so they can be accessed
during recipe construction. As a follow-up, this should be replaced by
adding mask operands to all VPInstructions that need them and use that
during recipe construction.
* The mask caching in a map also means that this map needs updating each
time a new recipe replaces a VPInstruction; this would also be handled
by adding mask operands.
PR: https://github.com/llvm/llvm-project/pull/128420
This patch moves the logic to predicate and linearize a VPlan to a
dedicated VPlan transform. It mostly ports the existing logic directly.
There are a number of follow-ups planned in the near future to
further improve on the implementation:
* Edge and block masks are cached in VPPredicator, but the block masks
are still made available to VPRecipeBuilder, so they can be accessed
during recipe construction. As a follow-up, this should be replaced by
adding mask operands to all VPInstructions that need them and use that
during recipe construction.
* The mask caching in a map also means that this map needs updating each
time a new recipe replaces a VPInstruction; this would also be handled
by adding mask operands.
PR: https://github.com/llvm/llvm-project/pull/128420
Move early-exit handling up front to original VPlan construction, before
introducing early exits.
This builds on https://github.com/llvm/llvm-project/pull/137709, which
adds exiting edges to the original VPlan, instead of adding exit blocks
later.
This retains the exit conditions early, and means we can handle early
exits before forming regions, without the reliance on VPRecipeBuilder.
Once we retain all exits initially, handling early exits before region
construction ensures the regions are valid; otherwise we would leave
edges exiting the region from elsewhere than the latch.
Removing the reliance on VPRecipeBuilder removes the dependence on
mapping IR BBs to VPBBs and unblocks predication as VPlan transform:
https://github.com/llvm/llvm-project/pull/128420.
Depends on https://github.com/llvm/llvm-project/pull/137709 (included in
PR).
PR: https://github.com/llvm/llvm-project/pull/138393
Update initial VPlan construction to include exit conditions and edges.
The loop region is now first constructed without entry/exiting. Those
are set after inserting the region in the CFG, to preserve the original
predecessor/successor order of blocks.
For now, all early exits are disconnected before forming the regions,
but a follow-up will update uncountable exit handling to also happen
here. This is required to enable VPlan predication and remove the
dependence any IR BBs
(https://github.com/llvm/llvm-project/pull/128420).
PR: https://github.com/llvm/llvm-project/pull/137709
Flip the region construction order to innermost first from outermost
first. This ensures we only set the final parent region for VPBBs once.
Split off from https://github.com/llvm/llvm-project/pull/137709.
Remove checking if the instruction is in the preheader or exit blocks.
Those checks are redundant and handled by checking if the instruction is
outside the loop below.
Split off as suggested from https://github.com/llvm/llvm-project/pull/137709.
This reverts commit d431921677ae923d189ff2d6f188f676a2964ed8.
Missing gtests have been updated.
Original message:
This addresses an existing TODO and simply moves the current code to add
canonical IV recipes to the initial skeleton construction, at the same
place where the corresponding region will be introduced.
This addresses an existing TODO and simply moves the current code to add
canonical IV recipes to the initial skeleton construction, at the same
place where the corresponding region will be introduced.
Move out the logic to prepare for vectorization to a separate transform,
before creating loop regions. This was discussed as follow-up
in https://github.com/llvm/llvm-project/pull/136455.
This just moves the existing code around slightly and will simplify
follow-up patches to include the exiting edges during initial VPlan
construction.
Simplify initial VPlan construction by not creating a separate
vector.latch block, which isn't needed and will get folded away later.
This has been suggested as independent clean-up multiple times.
Add incoming exit phi operands during the initial VPlan construction.
This ensures all users are added to the initial VPlan and is also needed
in preparation to retaining exiting edges during initial construction.
PR: https://github.com/llvm/llvm-project/pull/136455
Follow-up as discussed in https://github.com/llvm/llvm-project/pull/129402.
After bc03d6cce257, the VPlanHCFGBuilder doesn't actually build a HCFG
any longer. Move what remains directly into VPlanConstruction.cpp.
Create an empty VPlan first, then let the HCFG builder create a plain
CFG for the top-level loop (w/o a top-level region). The top-level
region is introduced by a separate VPlan-transform. This is instead of
creating the vector loop region before building the VPlan CFG for the
input loop.
This simplifies the HCFG builder (which should probably be renamed) and
moves along the roadmap ('buildLoop') outlined in [1].
As follow-up, I plan to also preserve the exit branches in the initial
VPlan out of the CFG builder, including connections to the exit blocks.
The conversion from plain CFG with potentially multiple exits to a
single entry/exit region will be done as VPlan transform in a follow-up.
This is needed to enable VPlan-based predication. Currently early exit
support relies on building the block-in masks on the original CFG,
because exiting branches and conditions aren't preserved in the VPlan.
So in order to switch to VPlan-based predication, we will have to
preserve them in the initial plain CFG, so the exit conditions are
available explicitly when we convert to single entry/exit regions.
Another follow-up is updating the outer loop handling to also introduce
VPRegionBlocks for nested loops as transform. Currently the existing
logic in the builder will take care of creating VPRegionBlocks for
nested loops, but not the top-level loop.
[1]
https://llvm.org/devmtg/2023-10/slides/techtalks/Hahn-VPlan-StatusUpdateAndRoadmap.pdf
PR: https://github.com/llvm/llvm-project/pull/128419