- Remove file local functions out of `llvm` or anonymous namespace and
make them static.
- Use namespace qualifier to define `BoUpSLP` class and several template
specializations.
Replace the assert checking if CurrentLinkI is a CmpInst with a pattern
matching check in the if condition. This uses VPlan-level pattern matching
instead of inspecting the underlying instruction type.
Directly update induction increments with step value created for wide
inductions in createWidenInductionRecipes, which does not require
looking up via RecipeBuilder.
Currently in optimizeMaskToEVL we convert every widened load, store or
reduction to a VP predicated recipe with EVL, regardless of whether or
not it uses the header mask.
So currently we have to be careful when working on other parts VPlan to
make sure that the EVL transform doesn't break or transform something
incorrectly, because it's not a semantics preserving transform.
Forgetting to do so has caused miscompiles before, like the case that
was fixed in #113667
This PR rewrites it to work in terms of pattern matching, so it now only
converts a recipe to a VP predicated recipe if it is exactly masked with
the header mask.
After this the transform should be a true optimisation and not change
any semantics, so it shouldn't miscompile things if other parts of VPlan
change.
This fixes#152541, and allows us to move addExplicitVectorLength into
tryToBuildVPlanWithVPRecipes in #153144
It also splits out the load/store transforms into separate patterns for
reversed and non-reversed, which should make #146525 easier to implement
and reason about.
Follow up on 7c4f188 ([LV] Support multiplies by constants when forming
scaled reductions), introducing m_APInt, and improving code around
canConstantBeExtended: we change canConstantBeExtended to take an APInt.
When narrowing stores of a single-scalar, we currently use
ExtractLastElement, which extracts the last element across all parts.
This is not correct if the store's address is not uniform across all
parts. If it is only uniform-per-part, the last lane per part must be
extracted. Add a new ExtractLastLanePerPart opcode to handle this
correctly. Most transforms apply to both ExtractLastElement and
ExtractLastLanePerPart, with the only difference being their treatment
during unrolling.
Fixes https://github.com/llvm/llvm-project/issues/162498.
PR: https://github.com/llvm/llvm-project/pull/163056
Splitting out just the recipe finding code from #148626 into a utility
function (along with the extra pattern matchers). Hopefully this makes
reviewing a bit easier.
Added a gtest, since this isn't actually used anywhere yet.
Follow up on 7fb3a91 ([PatternMatch] Introduce match functor) to
introduce the VPlanPatternMatch version of the match functor to shorten
some idioms.
Co-authored-by: Luke Lau <luke@igalia.com>
The m_GetElementPtr matcher is incorrect and incomplete. Fix it to match
all possible GEPs to avoid misleading users. It currently just has one
use, and the change is non-functional for that use.
This patch adds a new flag (-enable-wide-lane-mask) which allows
LoopVectorize to generate wider-than-VF active lane masks when it
is safe to do so (i.e. the mask is used for data and control flow).
The transform in extractFromWideActiveLaneMask creates vector
extracts from the first active lane mask in the header & loop body,
modifying the active lane mask phi operands to use the extracts.
An additional operand is passed to the ActiveLaneMask instruction,
the value of which is used as a multiplier of VF when generating the
mask.
By default this is 1, and is updated to UF by
extractFromWideActiveLaneMask.
The motivation for this change is to improve interleaved loops when
SVE2.1 is available, where we can make use of the whilelo instruction
which returns a predicate pair.
This is based on a PR that was created by @momchil-velikov (#81140)
and contains tests which were added there.
This changes the branch condition to use the AVL's backedge value
instead of the EVL-based IV.
This allows us to emit bnez on RISC-V and removes a use of the trip
count, which should reduce register pressure.
To match phis with VPlanPatternMatch I've had to relax the assert that
the number of operands must exactly match the pattern for the Phi
opcode, and I've copied over m_ZExtOrSelf from the LLVM IR
PatternMatch.h.
Fixes#151459
Extend [Specific]Cmp_match to handle floating-point compares, and
introduce m_Cmp that matches both integer and floating-point compares.
Use it in simplifyRecipe to match and simplify the general case of
compares. The change has necessitated a bugfix in
VPReplicateRecipe::execute.
Instead of defining unary/binary/ternary/4ary overloads of each matcher,
we can use parameter packs to support arbitrary numbers of operands.
This allows us to remove the explicit N-ary definitions for each
matcher.
We need to rewrite Recipe_match's constructor to use a parameter pack
too, otherwise we end up with ambiguous overloads.
This is split off from #133993
On its own this simplification isn't that useful, but it allows us to
make the equivalent VPBlendRecipe optimisation more generic by operating
on VPInstructions.
In order to actually test this without #133993, I've had to also extend
the m_Not pattern matcher to also catch VPWidenRecipes, since I couldn't
really think of a straightforward way to create a VPInstruction::Select
with a negated condition.
Explicitly unroll VPReplicateRecipes outside replicate regions by VF,
replacing them by VF single-scalar recipes. Extracts for operands are
added as needed and the scalar results are combined to a vector using a
new BuildVector VPInstruction.
It also adds a few folds to simplify unnecessary extracts/BuildVectors.
It also adds a BuildStructVector opcode for handling of calls that have
struct return types.
VPReplicateRecipe in replicate regions can will be unrolled as follow
up, turing non-single-scalar VPReplicateRecipes into 'abstract', i.e.
not executable.
PR: https://github.com/llvm/llvm-project/pull/142433
Recipes that are vector-to-scalar are guaranteed to generate a scalar
value, so the extract is redundant after VPlan unrolling. Remove it.
This removes unneeded ExtractLastElement VPInstruction of reduction
result computations.
Similar to modeling the start value as operand, also model the sentinel
value as operand explicitly. This makes all require information for
code-gen available directly in VPlan.
PR: https://github.com/llvm/llvm-project/pull/142291
783a846 changed VPScalarIVStepsRecipe to take 3 arguments (adding
VF explicitly) instead of 2, but didn't change the corresponding
pattern matcher.
This matcher was only used in vputils::isHeaderMask, and no test
ever reached that function with a ScalarIVSteps recipe for the
value being matched -- it was always a WideCanonicalIV. So the
matcher bailed out immediately before checking arguments and
asserting that the number of arguments in the recipe was the
same provided by the matcher.
Since the constructors for ScalarIVSteps take 3 values, we should
be safe to update the matcher and guard it with a dedicated gtest.
m_CanonicalIV() on the other hand is removed; as a phi recipe it
may not have a consistent number of arguments to match, only
requiring one (the start value) when being constructed with the
assumption that a second incoming value is added for the backedge
later. In order to keep the matcher we would need to add multiple
matchers with different numbers of arguments for it depending on
what phase of vplan construction we were in, and ensure that we
never reorder matcher usage vs. vplan transformation. Since the
main IR PatternMatch.h doesn't contain any matchers for PHI nodes,
I think we can just remove it and match via m_Specific() using the
VPValue we get from Plan.getCanonicalIV().
With EVL tail folding an AnyOf reduction will emit an i1 vp.merge like
vp.merge true, (or phi, cond), phi, evl
We can remove the or and optimise this to
vp.merge cond, true, phi, evl
Which makes it slightly easier to pattern match in #134898.
This also adds a pattern matcher for calls to help match this.
Blended AnyOf reductions will use an and instead of an or, which we may
also be able to simplify in a later patch.
Add additional OR simplification to fix a divergence between legacy and
VPlan-based cost model.
This adds a new m_AllOnes matcher by generalizing specific_intval to
int_pred_ty, which takes a predicate to check to support matching both
specific APInts and other APInt predices, like isAllOnes.
Fixes https://github.com/llvm/llvm-project/issues/131359.
This copies over the implementation of m_Deferred which allows matching
values that were bound in the pattern, and uses it for the (X && Y) ||
(X && !Y) -> X simplifcation.
Optimize the IR generated for a VPWidenIntOrFpInductionRecipe to use the
narrowest type necessary, when the trip-count of a loop is known to be
constant and the only use of the recipe is the condition used by the
vector loop's backedge branch.
This is a copy of #126177, since it was automatically and permanently
closed because I messed up the source branch on my remote
This patch proposes to avoid converting widening recipes to VP
intrinsics during the EVL transform.
IIUC we initially did this to avoid `vl` toggles on RISC-V. However we
now have the RISCVVLOptimizer pass which mostly makes this redundant.
Emitting regular IR instead of VP intrinsics allows more generic
optimisations, both in the middle end and DAGCombiner, and we generally
have better patterns in the RISC-V backend for non-VP nodes. Sticking to
regular IR instructions is likely a lot less work than reimplementing
all of these optimisations for VP intrinsics, and on SPEC CPU 2017 we get
noticeably better code generation.
Run recipe simplification and dead recipe removal after VPlan-based
unrolling and optimizeForVFAndUF, to clean up any redundant or dead
recipes introduced by them. Currently this is NFC, as it removes the
corresponding removeDeadRecipes run in optimizeForVFAndUF and no
additional simplifications kick in after unrolling yet. That is changing
with https://github.com/llvm/llvm-project/pull/123655.
Note that with this change, pattern-matching is now applied after
EVL-based recipes have been introduced.
Trying to match VPWidenEVLRecipe when not explicitly requested might
apply a pattern with 2 operands to one with 3 due to the extra EVL
operand and VPWidenEVLRecipe being a subclass of VPWidenRecipe.
To prevent this, update Recipe_match::match to only match
VPWidenEVLRecipe if it is in the requested recipe types (RecipeTy).
PR: https://github.com/llvm/llvm-project/pull/125926
Check the VPlan directly to determine if a VPValue is an optimiziable IV
or IV use instead of checking the underlying IR instructions.
Split off from https://github.com/llvm/llvm-project/pull/112147. This
refactoring enables moving IV end value creation from the legacy
fixupIVUsers to a VPlan-based transform.
There is one case we now won't optimize, that is IVs with subtracts and
non-constant steps. But as this is a minor optimization and doesn't
impact correctness, the benefits of performing the check in VPlan should
outweigh the missed case.
Split DerivedIV simplification off from
https://github.com/llvm/llvm-project/pull/112145 and use to remove the
need for extra checks in createScalarIVSteps. Required an extra
simplification run after IV transforms.
Following #90184, this patch emits vp.merge intrinsic, which is used to
set the inactive lanes in a select operation to the RHS instead of
undef. Currently, it is applied to out-loop reduction for EVL
vectorization.
This patch performs transformation to convert
select(header_mask, LHS, RHS)
into
vp.merge(all-true, LHS, RHS, EVL)
And always use the predicated reduction select to set the incoming value
of the reduction phi to support out-loop reduction when using tail
folding with EVL.
TODO: Postpone the adjustment of the predicated reduction select to
VPlanTransform. The current adjustment might be too early, which could
lead to a situation where the predicated reduction select is adjusted,
but the EVL recipes cannot be successfully generated during
VPlanTransform.