Move the logic to expand SCEVs directly to a late VPlan transform that
expands SCEVs in the entry block. This turns VPExpandSCEVRecipe into an
abstract recipe without execute, which clarifies how the recipe is
handled, i.e. it is not executed like regular recipes.
It also helps to simplify construction, as now scalar evolution isn't
required to be passed to the recipe.
Materialze Build(Struct)Vectors explicitly for VPRecplicateRecipes, to
serve their users requiring a vector, instead of doing so when unrolling
by VF.
Now we only need to implicitly build vectors in VPTransformState::get
for VPInstructions. Once they are also unrolled by VF we can remove the
code-path alltogether.
PR: https://github.com/llvm/llvm-project/pull/151487
Materialize VF and VFxUF computation using VPInstruction
instead of directly creating IR.
This is one of the last few steps needed to model the full vector
skeleton in VPlan.
This is mostly NFC, although in some cases we remove some unused
computations.
PR: https://github.com/llvm/llvm-project/pull/152879
A lot of time getCanonicalIV() is used to get the canonical IV type,
e.g. to instantiate a VPTypeAnalysis or to get the LLVMContext.
However VPTypeAnalysis has a constructor that takes the VPlan directly
and there's a method on VPlan to get the LLVMContext directly, so use
those instead where possible.
This lets us remove a constructor on VPTypeAnalysis.
Also remove an unused LLVMContext argument in UnrollState whilst we're
here.
Split up the not clearly named prepareForVectorization transform into
buildVPlan0, which adds the vector preheader, middle and scalar
preheader blocks, as well as the canonical induction recipes and sets
the trip count. The new transform is run directly after building the
plain CFG VPlan initially.
The remaining code handling early exits and adding the branch in the
middle block is renamed to handleEarlyExitsAndAddMiddleCheck and still
runs at the original position.
With the code movement, we only have to add the skeleton once to the
initial VPlan, and cloning will take care of the rest. It will also
enable moving other construction steps to work directly on VPlan0, like
adding resume phis.
PR: https://github.com/llvm/llvm-project/pull/150848
Materialize the vector trip count computation using VPInstruction
instead of directly creating IR. This is one of the last few steps
needed to model the full vector skeleton in VPlan. It also simplifies
vector-trip count computations for scalable vectors, as we can re-use
the UF x VF computation.
PR: https://github.com/llvm/llvm-project/pull/151925
Now that VPWidenPointerInductionRecipes are modelled in VPlan in
#148274, we can support them in EVL tail folding.
We need to replace their VFxUF operand with EVL as the increment is not
guaranteed to always be VF on the penultimate iteration, and UF is
always 1 with EVL tail folding.
We also need to move the creation of the backedge value to the latch so
that EVL dominates it.
With this we will no longer fail to convert a VPlan to EVL tail folding,
so adjust tryAddExplicitVectorLength to account for this. This brings us
to 99.4% of all vector loops vectorized on SPEC CPU 2017 with tail
folding vs no tail folding.
The test in only-compute-cost-for-vplan-vfs.ll previously relied on
widened pointer inductions with EVL tail folding to end up in a scenario
with no vector VPlans, so this also replaces it with an unvectorizable
fixed-order recurrence test from
first-order-recurrence-multiply-recurrences.ll that also gets discarded.
Explicitly compute the backedge-taken count using VPInstruction. This is
needed to model the full skeleton in VPlan.
NFC modulo some instruction re-ordering.
Loop regions require fixed-length steps and rounded-up trip counts, but
after dissolution creates explicit control flow, EVL loops can leverage
variable-length stepping with original trip counts.
This patch adds a post-dissolution transform pass to convert EVL loops
from fixed-length to variable-length stepping .
Materialize constant vector trip counts before ::execute, if the trip
count can be computed as Original (TC / (VF * UF)) * (VF * UF). For now
this excludes when the tail is folded or scalar epilogues are required.
This enables removing a number of redundant branches from the middle
block.
For now this is also only done when not vectorizing the epilogue, as the
simplification complicates stitching the 2 plans together.
PR: https://github.com/llvm/llvm-project/pull/142309
Update LV to vectorize maxnum/minnum reductions without fast-math flags,
by adding an extra check in the loop if any inputs to maxnum/minnum are
NaN, due to maxnum/minnum behavior w.r.t to signaling NaNs. Signed-zeros
are already handled consistently by maxnum/minnum.
If any input is NaN,
*exit the vector loop,
*compute the reduction result up to the vector iteration that contained
NaN inputs and
* resume in the scalar loop
New recurrence kinds are added for reductions using maxnum/minnum
without fast-math flags.
PR: https://github.com/llvm/llvm-project/pull/148239
Simplify the handling of exit users by generating all extracts first
(safe option), and have FOR handling optimize the extracts, similar to
already done for reductions and inductions.
NFC modulo first-order recurrence extract order in middle block.
## Purpose
Export a small number of private LLVM symbols so that unit tests can
still build/run when LLVM is built as a Windows DLL or a shared library
with default hidden symbol visibility.
## Background
The effort to build LLVM as a WIndows DLL is tracked in #109483.
Additional context is provided in [this
discourse](https://discourse.llvm.org/t/psa-annotating-llvm-public-interface/85307).
Some LLVM unit tests use internal/private symbols that are not part of
LLVM's public interface. When building LLVM as a DLL or shared library
with default hidden symbol visibility, the symbols are not available
when the unit test links against the DLL or shared library.
This problem can be solved in one of two ways:
1. Export the private symbols from the DLL.
2. Link the unit tests against the intermediate static libraries instead
of the final LLVM DLL.
This PR applies option 1. Based on the discussion of option 2 in
#145448, this option is preferable.
## Overview
* Adds a new `LLVM_ABI_FOR_TEST` export macro, which is currently just
an alias for `LLVM_ABI`.
* Annotates the sub-set of symbols under `llvm/lib` that are required to
get unit tests building using the new macro.
Connect SCEV and memory runtime check block directly in VPlan as
VPIRBasicBlocks, removing ILV::emitSCEVChecks and
ILV::emitMemRuntimeChecks.
The new logic is currently split across
LoopVectorizationPlanner::addRuntimeChecks which collects a list of
{Condition, CheckBlock} pairs and performs some checks and emits remarks
if needed. The list of checks is then added to VPlan in
VPlanTransforms::connectCheckBlocks.
PR: https://github.com/llvm/llvm-project/pull/143879
In addBranchWeightToMiddleTerminator we attempt to add branch weights to
the middle block terminator. We pessimistically assume vscale=1, whereas
we can improve the estimate by using the value of vscale used for
tuning.
Explicitly unroll VPReplicateRecipes outside replicate regions by VF,
replacing them by VF single-scalar recipes. Extracts for operands are
added as needed and the scalar results are combined to a vector using a
new BuildVector VPInstruction.
It also adds a few folds to simplify unnecessary extracts/BuildVectors.
It also adds a BuildStructVector opcode for handling of calls that have
struct return types.
VPReplicateRecipe in replicate regions can will be unrolled as follow
up, turing non-single-scalar VPReplicateRecipes into 'abstract', i.e.
not executable.
PR: https://github.com/llvm/llvm-project/pull/142433
The motivation of this PR is to make #115274 easier to implement, and
should allow us to add EVL support by just passing EVL to the VF
operand.
The current difficulty with widening IVs with EVL is that
VPWidenIntOrFpInductionRecipe generates its own backedge value. Since
it's a VPHeaderPHIRecipe the VF operand must be in the preheader, which
means we can't use the EVL since it's defined in the loop body.
The gist in this PR is to take the approach in #114305 and expand
VPWidenIntOrFpInductionRecipe into several recipes for the initial
value, phi and backedge value just before execution. I.e. this example:
```
vector.ph:
Successor(s): vector loop
<x1> vector loop: {
vector.body:
WIDEN-INDUCTION %i = phi %start, %step, %vf
...
EMIT branch-on-count ...
No successors
}
```
gets expanded to:
```
vector.ph:
...
vp<%induction.start> = ...
vp<%induction.increment> = ...
Successor(s): vector loop
<x1> vector loop: {
vector.body:
ir<%i> = WIDEN-PHI vp<%induction.start>, vp<%vec.ind.next>
...
vp<%vec.ind.next> = add ir<%i>, vp<%induction.increment>
EMIT branch-on-count ...
No successors
}
```
This allows us to a value defined in the loop in the backedge value, and
also means we can just reuse the existing backedge fixups in
VPlan::execute without having to specially handle it ourselves.
After this #115274 should just become a matter of setting the VF operand
to EVL (and building the increment step in the loop body, not the
preheader).
This reverts commit 0604dc199c019b23746f4a54885ba0c75569cdae.
The recommitted version addresses post-commit comments and adjusts the
place the branch weights are added. It now runs before VPlans are optimized
for VF and UF, which may remove the vector loop region, causing a crash
trying to get the middle block after that. Test case added in
72f99b75afc12bb.
Original message:
Manage branch weights for the BranchOnCond in the middle block in VPlan.
This requires updating VPInstruction to inherit from VPIRMetadata, which
in general makes sense as there are a number of opcodes that could take
metadata.
There are other branches (part of the skeleton) that also need branch
weights adding.
PR: https://github.com/llvm/llvm-project/pull/143035
Building on top of https://github.com/llvm/llvm-project/pull/114305,
replace VPRegionBlocks with explicit CFG before executing.
This brings the final VPlan closer to the IR that is generated and
helps to simplify codegen.
It will also enable further simplifications of phi handling during
execution and transformations that do not have to preserve the
canonical IV required by loop regions. This for example could include
replacing the canonical IV with an EVL based phi while completely
removing the original canonical IV.
PR: https://github.com/llvm/llvm-project/pull/117506
This reverts commit 793bb6b257fa4d9f4af169a4366cab3da01f2e1f.
The recommitted version contains a fix to make sure only the original
phis are processed in convertPhisToBlends nu collecting them in a vector
first. This fixes a crash when no mask is needed, because there is only
a single incoming value.
Original message:
This patch moves the logic to predicate and linearize a VPlan to a
dedicated VPlan transform. It mostly ports the existing logic directly.
There are a number of follow-ups planned in the near future to
further improve on the implementation:
* Edge and block masks are cached in VPPredicator, but the block masks
are still made available to VPRecipeBuilder, so they can be accessed
during recipe construction. As a follow-up, this should be replaced by
adding mask operands to all VPInstructions that need them and use that
during recipe construction.
* The mask caching in a map also means that this map needs updating each
time a new recipe replaces a VPInstruction; this would also be handled
by adding mask operands.
PR: https://github.com/llvm/llvm-project/pull/128420
This patch moves the logic to predicate and linearize a VPlan to a
dedicated VPlan transform. It mostly ports the existing logic directly.
There are a number of follow-ups planned in the near future to
further improve on the implementation:
* Edge and block masks are cached in VPPredicator, but the block masks
are still made available to VPRecipeBuilder, so they can be accessed
during recipe construction. As a follow-up, this should be replaced by
adding mask operands to all VPInstructions that need them and use that
during recipe construction.
* The mask caching in a map also means that this map needs updating each
time a new recipe replaces a VPInstruction; this would also be handled
by adding mask operands.
PR: https://github.com/llvm/llvm-project/pull/128420
This patch introduce two new recipes.
* VPExtendedReductionRecipe
- cast + reduction.
* VPMulAccumulateReductionRecipe
- (cast) + mul + reduction.
This patch also implements the transformation that match following
patterns via vplan and converts to abstract recipes for better cost
estimation.
* VPExtendedReduction
- reduce(cast(...))
* VPMulAccumulateReductionRecipe
- reduce.add(mul(...))
- reduce.add(mul(ext(...), ext(...))
- reduce.add(ext(mul(ext(...), ext(...))))
The converted abstract recipes will be lower to the concrete recipes
(widen-cast + widen-mul + reduction) just before recipe execution.
Note that this patch still relies on legacy cost model the calculate the
cost for these patters.
Will enable vplan-based cost decision in #113903.
Split from #113903.
Move early-exit handling up front to original VPlan construction, before
introducing early exits.
This builds on https://github.com/llvm/llvm-project/pull/137709, which
adds exiting edges to the original VPlan, instead of adding exit blocks
later.
This retains the exit conditions early, and means we can handle early
exits before forming regions, without the reliance on VPRecipeBuilder.
Once we retain all exits initially, handling early exits before region
construction ensures the regions are valid; otherwise we would leave
edges exiting the region from elsewhere than the latch.
Removing the reliance on VPRecipeBuilder removes the dependence on
mapping IR BBs to VPBBs and unblocks predication as VPlan transform:
https://github.com/llvm/llvm-project/pull/128420.
Depends on https://github.com/llvm/llvm-project/pull/137709 (included in
PR).
PR: https://github.com/llvm/llvm-project/pull/138393
Split off from #118638, this adds VPInstruction::StepVector, which
generates integer step vectors (0,1,2,...,VF). This is a step towards
eventually modelling all the separate parts of
VPWidenIntOrFpInductionRecipe in VPlan.
This is then used by VPWidenIntOrFpInductionRecipe, where we materialize
it just before unrolling so the operands stay in a fixed position.
The need for a separate operand in VPWidenIntOrFpInductionRecipe, as
well as the need to update it in
optimizeVectorInductionWidthForTCAndVFUF, should be removed with #118638
when everything is expanded in convertToConcreteRecipes.
This reverts commit d431921677ae923d189ff2d6f188f676a2964ed8.
Missing gtests have been updated.
Original message:
This addresses an existing TODO and simply moves the current code to add
canonical IV recipes to the initial skeleton construction, at the same
place where the corresponding region will be introduced.
This addresses an existing TODO and simply moves the current code to add
canonical IV recipes to the initial skeleton construction, at the same
place where the corresponding region will be introduced.
Move out the logic to prepare for vectorization to a separate transform,
before creating loop regions. This was discussed as follow-up
in https://github.com/llvm/llvm-project/pull/136455.
This just moves the existing code around slightly and will simplify
follow-up patches to include the exiting edges during initial VPlan
construction.
Follow-up as discussed in https://github.com/llvm/llvm-project/pull/129402.
After bc03d6cce257, the VPlanHCFGBuilder doesn't actually build a HCFG
any longer. Move what remains directly into VPlanConstruction.cpp.
This patch fixes:
llvm/lib/Transforms/Vectorize/VPlanTransforms.h:31:1: error: class
'VFRange' was previously declared as a struct; this is valid, but
may result in linker errors under the Microsoft C++ ABI
[-Werror,-Wmismatched-tags]
This patch check if the plan contains scalar VF by VFRange instead of
Plan.
This patch also clamp the range to contains either only scalar or only
vector VFs to prevent mis-compile.
Split from #113903.
This patch adds a WideIVStep opcode that can be used to create a vector
with the steps to increment a wide induction. The opcode has 2 operands
* the vector step
* the scale of the vector step
The opcode is later converted into a sequence of recipes that convert
the scale and step to the target type, if needed, and then multiply
vector step by scale.
This simplifies code that needs to materialize step vectors, e.g.
replacing wide IVs as follow up to
https://github.com/llvm/llvm-project/pull/108378 with an increment of
the wide IV step.
PR: https://github.com/llvm/llvm-project/pull/119284
This patch adds a new narrowInterleaveGroups transfrom, which tries
convert a plan with interleave groups with VF elements to a plan that
instead replaces the interleave groups with wide loads and stores
processing VF elements.
This effectively is a very simple form of loop-aware SLP, where we
use interleave groups to identify candidates.
This initial version is quite restricted and hopefully serves as a
starting point for how to best model those kinds of transforms. For now
it only transforms load interleave groups feeding store groups.
Depends on #106431.
This lands the main parts of the approved
https://github.com/llvm/llvm-project/pull/106441 as suggested to break
things up a bit more.
Update initial VPlan-construction in VPlanNativePath in line with the
inner loop path, in that it bails out when encountering constructs it
cannot handle, like non-intrinsic calls.
Fixes https://github.com/llvm/llvm-project/issues/131071.
Update and generalize materializeBroadcasts to also introduce explicit
broadcasts for VPValues defined in the Plans Entry block.
This fixes a crash when trying to insert the broadcasts generated by
VPTransformState::get after the generating instruction, which isn't
possible after invoke instructions.
Fixes https://github.com/llvm/llvm-project/issues/128838.
Create an empty VPlan first, then let the HCFG builder create a plain
CFG for the top-level loop (w/o a top-level region). The top-level
region is introduced by a separate VPlan-transform. This is instead of
creating the vector loop region before building the VPlan CFG for the
input loop.
This simplifies the HCFG builder (which should probably be renamed) and
moves along the roadmap ('buildLoop') outlined in [1].
As follow-up, I plan to also preserve the exit branches in the initial
VPlan out of the CFG builder, including connections to the exit blocks.
The conversion from plain CFG with potentially multiple exits to a
single entry/exit region will be done as VPlan transform in a follow-up.
This is needed to enable VPlan-based predication. Currently early exit
support relies on building the block-in masks on the original CFG,
because exiting branches and conditions aren't preserved in the VPlan.
So in order to switch to VPlan-based predication, we will have to
preserve them in the initial plain CFG, so the exit conditions are
available explicitly when we convert to single entry/exit regions.
Another follow-up is updating the outer loop handling to also introduce
VPRegionBlocks for nested loops as transform. Currently the existing
logic in the builder will take care of creating VPRegionBlocks for
nested loops, but not the top-level loop.
[1]
https://llvm.org/devmtg/2023-10/slides/techtalks/Hahn-VPlan-StatusUpdateAndRoadmap.pdf
PR: https://github.com/llvm/llvm-project/pull/128419
Add a new VPInstruction::Broadcast opcode and use it to materialize
explicit broadcasts of live-ins. The initial patch only materlizes the
broadcasts if the vector preheader dominates all uses that need it.
Later patches will pick the best valid insert point, thus retiring
implicit hoisting of broadcasts from VPTransformsState::get().
PR: https://github.com/llvm/llvm-project/pull/124644
Run recipe simplification and dead recipe removal after VPlan-based
unrolling and optimizeForVFAndUF, to clean up any redundant or dead
recipes introduced by them. Currently this is NFC, as it removes the
corresponding removeDeadRecipes run in optimizeForVFAndUF and no
additional simplifications kick in after unrolling yet. That is changing
with https://github.com/llvm/llvm-project/pull/123655.
Note that with this change, pattern-matching is now applied after
EVL-based recipes have been introduced.
Trying to match VPWidenEVLRecipe when not explicitly requested might
apply a pattern with 2 operands to one with 3 due to the extra EVL
operand and VPWidenEVLRecipe being a subclass of VPWidenRecipe.
To prevent this, update Recipe_match::match to only match
VPWidenEVLRecipe if it is in the requested recipe types (RecipeTy).
PR: https://github.com/llvm/llvm-project/pull/125926
This work feeds part of PR
https://github.com/llvm/llvm-project/pull/88385, and adds support for
vectorising
loops with uncountable early exits and outside users of loop-defined
variables. When calculating the final value from an uncountable early
exit we need to calculate the vector lane that triggered the exit,
and hence determine the value at the point we exited.
All code for calculating the last value when exiting the loop early
now lives in a new vector.early.exit block, which sits between the
middle.split block and the original exit block. Doing this required
two fixes:
1. The vplan verifier incorrectly assumed that the block containing
a definition always dominates the block of the user. That's not true
if you can arrive at the use block from multiple incoming blocks.
This is possible for early exit loops where both the early exit and
the latch jump to the same block.
2. We were adding the new vector.early.exit to the wrong parent loop.
It needs to have the same parent as the actual early exit block from
the original loop.
I've added a new ExtractFirstActive VPInstruction that extracts the
first active lane of a vector, i.e. the lane of the vector predicate
that triggered the exit.
NOTE: The IR generated for dealing with live-outs from early exit
loops is unoptimised, as opposed to normal loops. This inevitably
leads to poor quality code, but this can be fixed up later.
Add new runPass helpers to run a VPlan transformation. This makes it
easier to add additional checks/functionality for each transform run. In
this patch, an option is added to run the verifier after each VPlan
transform.
Follow-ups will use the same helper to also support printing VPlans
after each transform.
Note that the verifier at the moment requires there to be a canonical IV
and vector loop region, so the final lowering transforms aren't run via
runPass yet.
PR: https://github.com/llvm/llvm-project/pull/123640
Live-ins don't need to be handled, other than adding to the exit phi
recipe. Do that early and assert that otherwise the exit value is
defined in the vector loop region.
This should enable simply skipping other exit values that do not need
further fixing, e.g. if handling the exit value from the early exit
directly in handleUncountableEarlyExit.
PR: https://github.com/llvm/llvm-project/pull/123819
This reverts the revert commit 58326f1d5b5b379590af92dd129b2f3b3e96af46.
The build failure in sanitizer stage2 builds has been fixed with
0d39fe6f5bb3edf0bddec09a8c6417377390aeac.
Original commit message:
Model updating IV users directly in VPlan, replace fixupIVUsers.
Now simple extracts are created for all phis in the exit block during
initial VPlan construction. A later VPlan transform
(optimizeInductionExitUsers) replaces extracts of inductions with
their pre-computed values if possible.
This completes the transition towards modeling all live-outs directly in
VPlan.
There are a few follow-ups:
* emit extracts initially also for resume phis, and optimize them
tougher with IV exit users
* support for VPlans with multiple exits in optimizeInductionExitUsers.
Depends on https://github.com/llvm/llvm-project/pull/110004,
https://github.com/llvm/llvm-project/pull/109975 and
https://github.com/llvm/llvm-project/pull/112145.