Update VPRecipeBuilder to construct VPBlendRecipe from VPWidenPHIRecipe,
starting to thread recipes through the builder instead of the
underlying IR instruction up-front.
Landing first part of approved
https://github.com/llvm/llvm-project/pull/139475 separately as NFC as
suggested.
Use the fact that getSmallBestKnownTC returns an exact trip count, if
possible, and falls back to returning an estimate, to factor out some
code in selectInterleaveCount.
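A minimal sketch of the idea, assuming a hypothetical helper
bestKnownTripCount and a simplified clamp; neither is the actual
LoopVectorize code, and the clamping rule is only illustrative. Since
one query already prefers the exact count and falls back to an
estimate, the caller needs a single code path:

```cpp
#include <algorithm>
#include <cassert>
#include <optional>

// Hypothetical stand-in for the query described above: prefer the exact
// trip count and fall back to an estimate if no exact count is known.
std::optional<unsigned> bestKnownTripCount(std::optional<unsigned> Exact,
                                           std::optional<unsigned> Estimate) {
  if (Exact)
    return Exact;
  return Estimate;
}

// Illustrative caller: limit the interleave count so that VF * IC does
// not exceed the best known trip count. Uses a single path for exact and
// estimated trip counts.
unsigned clampInterleaveCount(unsigned IC, unsigned VF,
                              std::optional<unsigned> BestKnownTC) {
  assert(IC > 0 && VF > 0 && "expected non-zero IC and VF");
  if (!BestKnownTC)
    return IC; // Nothing known about the trip count, keep IC as is.
  unsigned MaxIC = std::max(1u, *BestKnownTC / VF);
  return std::min(IC, MaxIC);
}
```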
Move early-exit handling up front to original VPlan construction, before
introducing early exits.
This builds on https://github.com/llvm/llvm-project/pull/137709, which
adds exiting edges to the original VPlan, instead of adding exit blocks
later.
This retains the exit conditions early, and means we can handle early
exits before forming regions, without the reliance on VPRecipeBuilder.
Once we retain all exits initially, handling early exits before region
construction ensures the regions are valid; otherwise we would leave
edges exiting the region from elsewhere than the latch.
Removing the reliance on VPRecipeBuilder removes the dependence on
mapping IR BBs to VPBBs and unblocks predication as VPlan transform:
https://github.com/llvm/llvm-project/pull/128420.
Depends on https://github.com/llvm/llvm-project/pull/137709 (included in
PR).
PR: https://github.com/llvm/llvm-project/pull/138393
Move flattening of the CFG out of the loop that creates the wide
recipes. This simplifies the already large loop and prepares for moving
flattening to a separate transform.
Split off from #118638, this adds VPInstruction::StepVector, which
generates integer step vectors (0,1,2,...,VF-1). This is a step towards
eventually modelling all the separate parts of
VPWidenIntOrFpInductionRecipe in VPlan.
This is then used by VPWidenIntOrFpInductionRecipe, where we materialize
it just before unrolling so the operands stay in a fixed position.
The need for a separate operand in VPWidenIntOrFpInductionRecipe, as
well as the need to update it in
optimizeVectorInductionWidthForTCAndVFUF, should be removed with #118638
when everything is expanded in convertToConcreteRecipes.
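For illustration, a scalar model of what a step vector is and how a
widened induction can be derived from it. The function names are
hypothetical and this is not the VPlan implementation; it only assumes
that lane I of the step vector holds the value I:

```cpp
#include <cstdint>
#include <vector>

// Model of a step vector with VF lanes: lane I holds the value I.
std::vector<uint64_t> stepVector(unsigned VF) {
  std::vector<uint64_t> Lanes(VF);
  for (unsigned I = 0; I < VF; ++I)
    Lanes[I] = I;
  return Lanes;
}

// Model of the first vector value of a widened induction:
// lane I holds Start + I * Step.
std::vector<uint64_t> widenInduction(uint64_t Start, uint64_t Step,
                                     unsigned VF) {
  std::vector<uint64_t> Result = stepVector(VF);
  for (uint64_t &Lane : Result)
    Lane = Start + Lane * Step;
  return Result;
}
```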
This patch attaches new metadata, `llvm.loop.isvectorized.withevl`, to
loops vectorized with explicit vector length. This will help other
optimizations later in the pipeline that focus on EVL-vectorized loops.
This approach is much safer than, say, IR pattern matching to figure out
if a loop is EVL-vectorized or not.
This reverts commit d431921677ae923d189ff2d6f188f676a2964ed8.
Missing gtests have been updated.
Original message:
This addresses an existing TODO and simply moves the current code to add
canonical IV recipes to the initial skeleton construction, at the same
place where the corresponding region will be introduced.
Move out the logic to prepare for vectorization to a separate transform,
before creating loop regions. This was discussed as follow-up
in https://github.com/llvm/llvm-project/pull/136455.
This just moves the existing code around slightly and will simplify
follow-up patches to include the exiting edges during initial VPlan
construction.
This reverts commit 8dd160f4767f971572eac065c8650d9202ff5bf9.
The recommit contains an adjustment to planContainsAdditionalSimplifications,
which considers changes to the original predicate for compares.
Original commit message:
Add simplification to fold negation into a compare, if the negation is
the only user of the compare. This removes a number of redundant
negations.
Alive2 Proofs for FPCMP test changes: https://alive2.llvm.org/ce/z/WGDz9U
PR: https://github.com/llvm/llvm-project/pull/129430
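A minimal sketch of the fold for integer predicates only, using
hypothetical types (Pred, Compare) rather than the VPlan classes. The
floating-point case additionally has to account for ordered/unordered
predicates and is not modelled here:

```cpp
#include <cassert>

enum class Pred { EQ, NE, SLT, SGE, SGT, SLE };

// Returns the predicate that yields the negated result for all inputs.
Pred inversePredicate(Pred P) {
  switch (P) {
  case Pred::EQ:  return Pred::NE;
  case Pred::NE:  return Pred::EQ;
  case Pred::SLT: return Pred::SGE;
  case Pred::SGE: return Pred::SLT;
  case Pred::SGT: return Pred::SLE;
  case Pred::SLE: return Pred::SGT;
  }
  assert(false && "unknown predicate");
  return P;
}

bool evaluate(Pred P, int A, int B) {
  switch (P) {
  case Pred::EQ:  return A == B;
  case Pred::NE:  return A != B;
  case Pred::SLT: return A < B;
  case Pred::SGE: return A >= B;
  case Pred::SGT: return A > B;
  case Pred::SLE: return A <= B;
  }
  assert(false && "unknown predicate");
  return false;
}

// Hypothetical compare with a user count.
struct Compare {
  Pred P;
  unsigned NumUsers;
};

// Folds not(cmp) into cmp by inverting the predicate. This is only done
// if the negation is the single user, as other users would otherwise
// observe the inverted result. Returns true if the fold was applied.
bool foldNotIntoCompare(Compare &C) {
  if (C.NumUsers != 1)
    return false;
  C.P = inversePredicate(C.P);
  return true;
}
```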
Simplify initial VPlan construction by not creating a separate
vector.latch block, which isn't needed and will get folded away later.
This has been suggested as independent clean-up multiple times.
ExtractFromEnd only has 2 uses, extracting the last and penultimate
elements. Replace it with 2 separate opcodes, removing the need to
materialize and handle a constant argument.
PR: https://github.com/llvm/llvm-project/pull/137030
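To illustrate the shape of the change on plain vectors: two dedicated
operations replace a generic "extract at offset from end" that takes a
constant. The function names below are illustrative assumptions, not
necessarily the names of the new opcodes:

```cpp
#include <cassert>
#include <vector>

// Dedicated operation for the last element; no offset operand needed.
int extractLastElement(const std::vector<int> &V) {
  assert(!V.empty() && "need at least one element");
  return V[V.size() - 1];
}

// Dedicated operation for the second-to-last element.
int extractPenultimateElement(const std::vector<int> &V) {
  assert(V.size() >= 2 && "need at least two elements");
  return V[V.size() - 2];
}
```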
Remove legacy ILV sinkScalarOperands, which is superseded by the
sinkScalarOperands VPlan transforms.
There are a few cases that aren't handled by VPlan's sinkScalarOperands,
because the recipes don't support replicating. Those are pointer
inductions and blends.
We could probably improve this further, by allowing replication for more
recipes, but I don't think the extra complexity is warranted.
Depends on https://github.com/llvm/llvm-project/pull/136021.
PR: https://github.com/llvm/llvm-project/pull/136023
Add incoming exit phi operands during the initial VPlan construction.
This ensures all users are added to the initial VPlan and is also needed
in preparation to retaining exiting edges during initial construction.
PR: https://github.com/llvm/llvm-project/pull/136455
willGenerateVectors switches on opcodes of a recipe, but Histogram is
missing in the switch statement, which could cause a crash in some
cases. The crash was initially observed when developing another patch.
InstructionCost is already an optional value, containing an Invalid
state that can be checked with isValid(). There is little point in
returning another optional from getValue(). Most uses do not make use of
it being a std::optional, dereferencing the value directly (either
isValid has been checked previously or the Cost is assumed to be valid).
The one case that does, in AMDGPU, used value_or, which has been
replaced by an isValid() check.
Follow-up as discussed in https://github.com/llvm/llvm-project/pull/129402.
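A sketch of the resulting usage pattern, with a simplified stand-in
class (Cost) instead of the real InstructionCost. The helper
costOrDefault is hypothetical and only shows how a value_or-style use
can be expressed with an isValid() check once getValue() returns the
value directly:

```cpp
#include <cassert>
#include <cstdint>

// Simplified stand-in for a cost type that carries its own invalid state.
class Cost {
  int64_t Value = 0;
  bool Valid = true;

public:
  Cost(int64_t V) : Value(V) {}

  static Cost getInvalid() {
    Cost C(0);
    C.Valid = false;
    return C;
  }

  bool isValid() const { return Valid; }

  // Returns the value directly instead of wrapping it in std::optional;
  // callers are expected to have checked isValid() first.
  int64_t getValue() const {
    assert(isValid() && "querying the value of an invalid cost");
    return Value;
  }
};

// Replacement for a former getValue().value_or(Default) use.
int64_t costOrDefault(const Cost &C, int64_t Default) {
  return C.isValid() ? C.getValue() : Default;
}
```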
After bc03d6cce257, the VPlanHCFGBuilder doesn't actually build an HCFG
any longer. Move what remains directly into VPlanConstruction.cpp.
This patch moves the check for a single latch exit from computeMaxVF()
to LoopVectorizationLegality::canFoldTailByMasking(), as it duplicates
the logic when foldTailByMasking() returns false.
It also updates the NoScalarEpilogueNeeded logic to return false for
loops that are neither single-latch-exit nor early-exit. This avoids
applying tail-folding in unsupported cases and prevents triggering
assertions during analysis.
This patch checks whether the plan contains a scalar VF using the
VFRange instead of the Plan.
This patch also clamps the range to contain either only scalar or only
vector VFs to prevent miscompiles.
Split from #113903.
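A minimal sketch of such a clamp, assuming a hypothetical half-open
range [Start, End) of power-of-two VFs. The struct and function are
illustrative and not the actual VFRange API:

```cpp
#include <cassert>

// Hypothetical half-open range [Start, End) of power-of-two VFs.
struct Range {
  unsigned Start;
  unsigned End;
};

// Clamps R so that it contains either only the scalar VF (1) or only
// vector VFs, and returns true if the clamped range is scalar. A range
// starting at a vector VF cannot contain the scalar VF and is unchanged.
bool clampToScalarOrVector(Range &R) {
  assert(R.Start >= 1 && R.Start < R.End && "expected a non-empty range");
  bool IsScalar = R.Start == 1;
  if (IsScalar && R.End > 2)
    R.End = 2; // [1, 2) contains only VF = 1.
  return IsScalar;
}
```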
For loops without loads/stores, where the smallest/widest types are
calculated from the reduction, the smallest type returned is always -1U
and it actually returns the smallest type as the widest type. This PR
fixes the calculation.
This follows from
https://github.com/llvm/llvm-project/pull/132190#discussion_r2044232607
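A sketch of the intended result only, not the actual fix: given the
bit widths of the reduction types (a hypothetical input here), the
smallest type should be the minimum width and the widest type the
maximum width, so the -1U sentinel never leaks out for a non-empty
input:

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Returns {smallest, widest} over the given type widths in bits.
std::pair<unsigned, unsigned>
smallestAndWidest(const std::vector<unsigned> &Widths) {
  assert(!Widths.empty() && "expected at least one type width");
  unsigned MinWidth = -1U; // Sentinel; always overwritten below.
  unsigned MaxWidth = 0;
  for (unsigned W : Widths) {
    MinWidth = std::min(MinWidth, W);
    MaxWidth = std::max(MaxWidth, W);
  }
  return {MinWidth, MaxWidth};
}
```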
Any VPlan we generate that contains a replicator region will result in
replicated blocks in the output, causing a large code size increase.
Reject such VPlans when optimizing for size, as the code size impact is
usually worse than having a scalar epilogue, which we already forbid
with optsize.
This change requires a lot of test changes. For tests of optsize
specifically, I've updated the tests with the new output; otherwise the
tests have been adjusted to not rely on optsize.
Fixes #66652
Fixed-order recurrence phis cannot be forced to be scalar; they will
always be widened at the moment.
Make sure we don't add them to ForcedScalars, otherwise the legacy cost
model will compute incorrect costs.
This fixes an assertion reported with
https://github.com/llvm/llvm-project/pull/129645.
This patch adds a WideIVStep opcode that can be used to create a vector
with the steps to increment a wide induction. The opcode has 2 operands:
* the vector step
* the scale of the vector step
The opcode is later converted into a sequence of recipes that convert
the scale and step to the target type, if needed, and then multiply
vector step by scale.
This simplifies code that needs to materialize step vectors, e.g.
replacing wide IVs as a follow-up to
https://github.com/llvm/llvm-project/pull/108378 with an increment of
the wide IV step.
PR: https://github.com/llvm/llvm-project/pull/119284
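A scalar model of the expansion described above, using a hypothetical
function template in place of the recipe sequence: both operands are
converted to the target type first, then each lane of the vector step
is multiplied by the scale.

```cpp
#include <vector>

// Model of expanding a wide IV step: convert the vector step and the
// scale to TargetT (a no-op if they already have that type), then
// multiply lane-wise.
template <typename TargetT, typename StepT, typename ScaleT>
std::vector<TargetT> expandWideIVStep(const std::vector<StepT> &VectorStep,
                                      ScaleT Scale) {
  const TargetT ConvertedScale = static_cast<TargetT>(Scale);
  std::vector<TargetT> Result;
  Result.reserve(VectorStep.size());
  for (StepT S : VectorStep)
    Result.push_back(static_cast<TargetT>(S) * ConvertedScale);
  return Result;
}
```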