This consists of the following changes:
* Merge several overloads of `VPlanTransforms::runPass` into a single
function
to avoid code duplication.
* Add helper macro `RUN_VPLAN_PASS` to capture the transformation name
and pass it to the helper above for printing.
* Add new `-vplan-print-after-all` option (somewhat similar to existing
`-vplan-verify-each`).
* Add two empty passes `printAfterInitialConstruction`/`printFinalVPlan`
so that initial/final
VPlans would be supported in `-vplan-print-after-all`
This follows the original future plans in
https://github.com/llvm/llvm-project/pull/123640.
In some cases, we identify patterns as reductions, even though they can
be simplified to a non-reduction.
Mark VPReductionPHIRecipe as not reading from memory & not having
side-effects, to clean them up.
We also need to remove ComputeReductionResult VPInstructions with
live-in arguments. This means there is actually no reduction, and we
need to fold it to the live in. Otherwise we would incorrectly reduce
the live-in.
PR: https://github.com/llvm/llvm-project/pull/176795
Move SubclassID to VPRecipeBase, and store VPRecipeBase directly in
VPRecipeValue, instead of VPDef. This allows for some additional
simplifications and VPDef now just holds various helpers to deal with
removing and adding VPValues.
This reverts commit 16395da0ff577750571b99fe28281ce6fb6a3ae8.
PR: https://github.com/llvm/llvm-project/pull/174282
Bail out on scalable vectors in helper. Currently this is not causing
issues, but fixes a potential crash that would be exposed by a follow-up
change.
Test would exposes the issue in the future has been added in
8c5352cf3e14ec0c56f592091899d229de8436a7.
When `extract-lane` only contains single vector operand. We can simplify
it to `extractelement`.
This patch makes `extract-lane` generate simple `extractelement` when it
only contains single vector operand to prevent unused IR generated.
This patch is mostly NFC, the unused IR should be removed in following
IR passes.
Fold select c, false, true -> not c. This allows for more accurate cost
estimation and fixes the underlying issue for the cost divergence
between legacy and VPlan-based cost model that caused the revert of
01d34eb38fa058 in ed004cf42bf57c.
https://alive2.llvm.org/ce/z/yVuSgW.
Addresses part of #153144 and splits off part of #166164
There are two parts to the EVL transform:
1) Convert the loop so the number of elements processed each iteration
is EVL, not VF. The IV and header mask are replaced with EVL-based
variants.
2) Optimize users of the EVL based header mask to VP intrinsic based
recipes.
(1) changes the semantics of the vector loop region, whereas (2) needs
to preserve them. This splits (2) out so we don't mix the two up, and
allows us to move (1) earlier in the pipeline in a future PR.
This patch improves the lowering for BranchOnTwoConds added in
https://github.com/llvm/llvm-project/pull/172750 by replacing the branch
on OR with a chain of 2 branches.
On Apple M cores, the new lowering is ~8-10% faster for std::find-like
loops. It also makes it easier to determine the early exits in VPlan. I
am also planning on extensions to support loops with multiple early
exits and early-exits at different positions, which should also be
slightly easier to do with the new representation.
PR: https://github.com/llvm/llvm-project/pull/174016
This patch simplifies extract-lane(%lane_num, %X) to %X when %X is a
scalar value. Extracting from a scalar is redundant since there is only
one value to extract.
This patch adds VPValue sub-classes for the different cases we currently
have:
* VPIRValue: A live-in VPValue that wraps an underlying IR value
* VPSymbolicValue: A symbolic VPValue not tied to an underlying value,
e.g. the vector trip count or VF VPValues
* VPRecipeValue: A VPValue defined by a VPDef/VPRecipeBase.
This has multiple benefits:
* clearer constructors for each kind of VPValue
* limited scope: for example allows moving VPDef member to VPRecipeValue,
reducing size of other VPValues.
* stricter type checking for member variables (e.g. using VPLiveIn in
the Value -> live-in map in VPlan, or using VPSymbolicValue for symbolic
member VPValues)
There probably are additional opportunities for cleanups as follow-ups.
PR: https://github.com/llvm/llvm-project/pull/172758
The original patch, landed as a2db31b0 ([VPlan] Simplify pow-of-2
(mul|udiv) -> (shl|lshr), #172477) had a critical commutative matcher
bug, which has now been fixed. An assert has also been strengthened,
following a post-commit review.
All extra state has been removed from VPWidenSelectRecipe at this point.
There's no benefit of having a separate recipe and Select can easily be
handled by the existing VPWidenRecipe.
PR: https://github.com/llvm/llvm-project/pull/174234
This fixes a crash after introducing BranchOnTwoConds (524b1788,
https://github.com/llvm/llvm-project/pull/172750) when trying to
replace BranchOnTwoConds with a VPBranchOnCond, without dissolving the
region.
In that case, we need to update the appropriate condition operand.
This PR introduces a new BranchOnTwoConds VPInstruction, that takes 2
boolean operands and must be placed in a block with 3 successors.
If condition I is true, branches to successor I, otherwise falls through
to check the next condition. If both conditions are false, branch to the
third successor.
This new branch recipe is used for early-exit loops, to simplify the
representation in VPlan initially, by avoid the need for splitting the
middle block early on, in a way that preserves the single-exit block
property of regions. All exits still go through the latch block, but
they can go to more than 2 successors.
This idea was part of one of the original proposals for how to model
early exits in VPlan, but at that point in time, there was no good way
to handle this during code-gen, and we went with the early split-middle
block approach initially.
Now that we dissolve regions before ::execute, the new recipe can be
lowered nicely after regions have been removed, to a set of VPBBs and
BranchOnCond recipes. The initial lowering preserves the original
structure with the split middle blocks. Follow-ups will improve the
lowering to avoid this splitting, providing performance gains.
PR: https://github.com/llvm/llvm-project/pull/172750
getSCEVExprForVPValue is used to create SCEVs for expressions from the
original loop, which may be predicated. Use PSE to construct predicated
SCEVs if possible. This matches the legacy LV code behavior.
Currently should be NFC, but will enable migrating more SCEV/cost-based
computations to VPlan.
The patch requires exposing a new getPredicatedSCEV helper to
PredicatedScalarEvolution which just takes a SCEV, to avoid needing to
go through IR values, which isn't an option for getSCEVExprForVPValue.
This patch introduces VPInstruction::Reverse and extracts the reverse
operations of loaded/stored values from reverse memory accesses. This
extraction facilitates future support for permutation elimination within
VPlan.
This patch optimizes vector scatters that have a uniform (single-scalar)
address by replacing them with "extract-last-lane + scalar store" when
the scatter is unmasked.
Notes:
- The legacy cost model can scalarize a store if both the address and
the value are uniform. In VPlan we materialize the stored value via
ExtractLastLane, so only the address must be uniform.
- Some of the loops won't be vectorized any sine no vector instructions
will be generated.
In an effort to get rid of VPUnrollPartAccessor and directly unroll
recipes, start by directly unrolling VectorPointerRecipe, allowing for
VPlan-based simplifications and simplification of the corresponding
execute.
ExtractLastLane is a no-op for scalar VFs. Update simplifyRecipe to
remove them. This also requires adjusting the code in VPlanUnroll.cpp to
split off handling of ExtractLastLane/ExtractPenultimateElement for
scalar VFs, which now needs to match ExtractLastPart.
PR: https://github.com/llvm/llvm-project/pull/171145
These quantities should never unsigned-wrap. This matches the behavior
if only VFxUF is used (and not VF): when computing both VF and VFxUF,
nuw should hold for each step separately.
Replace ExtractLastElement and ExtractLastLanePerPart with more generic
and specific ExtractLastLane and ExtractLastPart, which model distinct
parts of extracting across parts and lanes. ExtractLastElement ==
ExtractLastLane(ExtractLastPart) and ExtractLastLanePerPart ==
ExtractLastLane, the latter clarifying the name of the opcode. A new
m_ExtractLastElement matcher is provided for convenience.
The patch should be NFC modulo printing changes.
PR: https://github.com/llvm/llvm-project/pull/164124
Some recent patches have added more non-annotated empty locations to the
loop vectorizer, resulting in errors reported on the DebugLoc coverage
tracking buildbot:
https://lab.llvm.org/staging/#/builders/222/builds/1938
This patch adds "unknown" annotations in place of the empty locations,
allowing the buildbot to ignore them for now.
The logic for narrowing to single scalar recipes is in two different
places: narrowToSingleScalarRecipes and legalizeAndOptimizeInductions.
Consolidate them.
Extend the logic to hoist predicated loads
(https://github.com/llvm/llvm-project/pull/168373) to sink predicated
stores with complementary masks in a similar fashion.
The patch refactors some of the existing logic for legality checks to be
shared between hosting and sinking, and adds a new sinking transform on
top.
With respect to the legality checks, for sinking stores the code also
checks if there are any aliasing stores that may alias, not only loads.
PR: https://github.com/llvm/llvm-project/pull/168771
For scalable vectors, VPScsalarIVStepsRecipe cannot create all scalar
step values. At the moment, it creates a vector, in addition to to the
first lane. The only supported case for this is when only the last lane
is used. A recipe should not set both scalar and vector values.
Instead, we can simply use a vector induction. It would also be possible
to preserve the current vector code-gen, by creating VPInstructions
based on the first lane of VPScalarIVStepsRecipe, but using a vector
induction seems simpler.
PR: https://github.com/llvm/llvm-project/pull/169796
While attempting to remove the use of undef from more loop vectoriser
tests I discovered a bug where this assert was firing:
```
llvm::Constant* llvm::Constant::getSplatValue(bool) const: Assertion `this->getType()->isVectorTy() && "Only valid for vectors!"' failed.
...
#8 0x0000aaaab9e2fba4 llvm::Constant::getSplatValue
#9 0x0000aaaab9dfb844 llvm::ConstantFoldBinaryInstruction
```
This seems to be happening because we are incorrectly generating
WidePtrAdd recipes for scalar VFs. The PR fixes this by checking whether
a plan has a scalar VF only in legalizeAndOptimizeInductions.
This PR also removes the use of undef from the test `both` in
Transforms/LoopVectorize/iv_outside_user.ll, which is what started
triggering the assert.
Fixes#169334
Update the logic in narrowToSingleScalar to allow narrowing even if not
all users use scalars, if at least one of the operands already needs
broadcasting.
In that case, there won't be any additional broadcasts introduced. This
should allow removing the special handling for stores, which can
introduce additional broadcasts currently.
Fixes https://github.com/llvm/llvm-project/issues/169668.
PR: https://github.com/llvm/llvm-project/pull/168246
In some case, VPWidenPointerInductions become only used by scalars after
legalizeAndOptimizationInducftions was already run, for example due to
some VPlan optimizations.
Move the code to scalarize VPWidenPointerInductions to a helper and use
it if needed.
This fixes a crash after #148274 in the added test case.
Fixes https://github.com/llvm/llvm-project/issues/169780