This reverts commit d431921677ae923d189ff2d6f188f676a2964ed8.
Missing gtests have been updated.
Original message:
This addresses an existing TODO and simply moves the current code to add
canonical IV recipes to the initial skeleton construction, at the same
place where the corresponding region will be introduced.
This addresses an existing TODO and simply moves the current code to add
canonical IV recipes to the initial skeleton construction, at the same
place where the corresponding region will be introduced.
Move out the logic to prepare for vectorization to a separate transform,
before creating loop regions. This was discussed as follow-up
in https://github.com/llvm/llvm-project/pull/136455.
This just moves the existing code around slightly and will simplify
follow-up patches to include the exiting edges during initial VPlan
construction.
This reverts commit 8dd160f4767f971572eac065c8650d9202ff5bf9.
The recommit contains an adjustment to planContainsAdditionalSimplifications,
which considers changes to the original predicate for compares.
Original commit message:
Add simplification to fold negation into a compare, if the negation is
the only user of the compare. This removes a number of redundant
negations.
Alive2 Proofs for FPCMP test changes: https://alive2.llvm.org/ce/z/WGDz9U
PR: https://github.com/llvm/llvm-project/pull/129430
Simplify initial VPlan construction by not creating a separate
vector.latch block, which isn't needed and will get folded away later.
This has been suggested as independent clean-up multiple times.
ExtractFromEnd only has 2 uses, extracting the last and penultimate
elements. Replace it with 2 separate opcodes, removing the need to
materialize and handle a constant argument.
PR: https://github.com/llvm/llvm-project/pull/137030
Remove legacy ILV sinkScalarOperands, which is superseded by the
sinkScalarOperands VPlan transforms.
There are a few cases that aren't handled by VPlan's sinkScalarOperands,
because the recipes doesn't support replicating. Those are pointer
inductions and blends.
We could probably improve this further, by allowing replication for more
recipes, but I don't think the extra complexity is warranted.
Depends on https://github.com/llvm/llvm-project/pull/136021.
PR: https://github.com/llvm/llvm-project/pull/136023
Add incoming exit phi operands during the initial VPlan construction.
This ensures all users are added to the initial VPlan and is also needed
in preparation to retaining exiting edges during initial construction.
PR: https://github.com/llvm/llvm-project/pull/136455
willGenerateVectors switches on opcodes of a recipe, but Histogram is
missing in the switch statement, which could cause a crash in some
cases. The crash was initially observed when developing another patch.
InstructionCost is already an optional value, containing an Invalid
state that can be checked with isValid(). There is little point in
returning another optional from getValue(). Most uses do not make use of
it being a std::optional, dereferencing the value directly (either
isValid has been checked previously or the Cost is assumed to be valid).
The one case that does in AMDGPU used value_or which has been replaced
by a isValid() check.
Follow-up as discussed in https://github.com/llvm/llvm-project/pull/129402.
After bc03d6cce257, the VPlanHCFGBuilder doesn't actually build a HCFG
any longer. Move what remains directly into VPlanConstruction.cpp.
This patch moves the check for a single latch exit from computeMaxVF()
to LoopVectorizationLegality::canFoldTailByMasking(), as it duplicates
the logic when foldTailByMasking() returns false.
It also updates the NoScalarEpilogueNeeded logic to return false for
loops that are neither single-latch-exit nor early-exit. This avoids
applying tail-folding in unsupported cases and prevents triggering
assertions during analysis.
This patch check if the plan contains scalar VF by VFRange instead of
Plan.
This patch also clamp the range to contains either only scalar or only
vector VFs to prevent mis-compile.
Split from #113903.
For loops without loads/stores, where the smallest/widest types are
calculated from the reduction, the smallest type returned is always -1U
and it actually returns the smallest type as the widest type. This PR
fixes the calculation.
This follows from
https://github.com/llvm/llvm-project/pull/132190#discussion_r2044232607
Any VPlan we generate that contains a replicator region will result in
replicated blocks in the output, causing a large code size increase.
Reject such VPlans when optimizing for size, as the code size impact is
usually worse than having a scalar epilogue, which we already forbid
with optsize.
This change requires a lot of test changes. For tests of optsize
specifically I've updated the test with the new output, otherwise the
tests have been adjusted to not rely on optsize.
Fixes#66652
Fixed-order recurrence phis cannot be forced to be scalar, they will
always be widened at the moment.
Make sure we don't add them to ForcedScalars, otherwise the legacy cost
model will compute incorrect costs.
This fixes an assertion reported with
https://github.com/llvm/llvm-project/pull/129645.
This patch adds a WideIVStep opcode that can be used to create a vector
with the steps to increment a wide induction. The opcode has 2 operands
* the vector step
* the scale of the vector step
The opcode is later converted into a sequence of recipes that convert
the scale and step to the target type, if needed, and then multiply
vector step by scale.
This simplifies code that needs to materialize step vectors, e.g.
replacing wide IVs as follow up to
https://github.com/llvm/llvm-project/pull/108378 with an increment of
the wide IV step.
PR: https://github.com/llvm/llvm-project/pull/119284
This PR accounts for scaled reductions in `calculateRegisterUsage` to
reflect the fact that the number of lanes in their output is smaller
than the VF.
Depends on https://github.com/llvm/llvm-project/pull/126437
Now that VPlan is able to fold away redundant branches to the scalar
preheader, we can directly check in VPlan if the scalar tail may
execute. hasScalarTail returns true if the tail may execute.
We know that the scalar tail won't execute if the scalar preheader
doesn't have any predecessors, i.e. is not reachable.
This removes some late uses of the legacy cost model.
PR: https://github.com/llvm/llvm-project/pull/134674
There are some opcodes that currently require specialized recipes, due
to their result type not being implied by their operands, including
casts.
This leads to duplication from defining multiple full recipes.
This patch introduces a new VPInstructionWithType subclass that also
stores the result type. The general idea is to have opcodes needing to
specify a result type to use this general recipe. The current patch
replaces VPScalarCastRecipe with VInstructionWithType, a similar patch
for VPWidenCastRecipe will follow soon.
There are a few proposed opcodes that should also benefit, without the
need of workarounds:
* https://github.com/llvm/llvm-project/pull/129508
* https://github.com/llvm/llvm-project/pull/119284
PR: https://github.com/llvm/llvm-project/pull/129706
Add a version of calculateRegisterUsage that works estimates register
usage for a VPlan. This mostly just ports the existing code, with some
updates to figure out what recipes will generate vectors vs scalars.
There are number of changes in the computed register usages, but they
should be more accurate w.r.t. to the generated vector code.
There are the following changes:
* Scalar usage increases in most cases by 1, as we always create a
scalar canonical IV, which is alive across the loop and is not
considered by the legacy implementation
* Output is ordered by insertion, now scalar registers are added first
due the canonical IV phi.
* Using the VPlan, we now also more precisely know if an induction will
be vectorized or scalarized.
Depends on https://github.com/llvm/llvm-project/pull/126415
PR: https://github.com/llvm/llvm-project/pull/126437
Add a dedicated function to check if a plan is for a loop with an early
exit. This can easily be determined by checking the exit blocks.
This allows removing a use of Legal->hasUncountableEarlyExit() from
InnerLoopVectorizer.
PR: https://github.com/llvm/llvm-project/pull/134720
This patch changes the preferInLoopReduction function to take a
RecurKind instead of an unsigned Opcode.
This makes it possible to distinguish non-arithmetic reductions such as
min/max, AnyOf, and FindLastIV, and also helps unify IAnyOf with FAnyOf
and IFindLastIV with FFindLastIV.
Related patch #118393#131830
This reverts commit 46a2f4174a051f29a09dbc3844df763571c67309.
Recommits 2fd6f8fb5e3a with corresponding VPlan change to ensure
LoopInfo is updated for all blocks during VPlan execution if needed.
Add an initial CFG simplification transform, which removes the dead
edges for blocks terminated with BranchOnCond true.
At the moment, this removes the edge between middle block and scalar
preheader when folding the tail.
PR: https://github.com/llvm/llvm-project/pull/106748
If ScalarPH has predecessors, we may need to update its reduction resume
values. If there is a middle block, it must be the first predecessor.
Note that the first predecessor may not be the middle block, if the
middle block doesn't branch to the scalar preheader. In that case,
fixReductionScalarResumeWhenVectorizingEpilog will be a no-op.
In preparation for https://github.com/llvm/llvm-project/pull/106748.