It is almost always simpler to use {} instead of std::nullopt to
initialize an empty ArrayRef. This patch changes all occurrences I could
find in LLVM itself. In future the ArrayRef(std::nullopt_t) constructor
could be deprecated or removed.
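A minimal sketch of the idea, assuming a simplified stand-in type: `MiniArrayRef` and `countIndices` below are hypothetical and only mimic the two constructors discussed here, they are not LLVM's actual ArrayRef. Both spellings produce the same empty view, but `{}` needs no extra header or special constructor.

```cpp
#include <cstddef>
#include <optional>

// Hypothetical, simplified stand-in for llvm::ArrayRef (illustration only).
template <typename T> class MiniArrayRef {
  const T *Data = nullptr;
  std::size_t Length = 0;

public:
  // Default construction yields an empty view; this is what `{}` selects.
  MiniArrayRef() = default;
  // Mirrors the std::nullopt_t constructor that could later be deprecated.
  MiniArrayRef(std::nullopt_t) {}
  MiniArrayRef(const T *D, std::size_t N) : Data(D), Length(N) {}

  const T *data() const { return Data; }
  std::size_t size() const { return Length; }
  bool empty() const { return Length == 0; }
};

// Hypothetical helper with an optional list of indices, defaulted with `{}`.
inline std::size_t countIndices(MiniArrayRef<int> Indices = {}) {
  return Indices.size();
}
```

Calling `countIndices()`, `countIndices({})` and `countIndices(std::nullopt)` all see an empty range in this sketch.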
This patch is split off from PR #88385 and concerns only the code
related to the legality of vectorising early exit loops. It is the first
step in adding support for vectorisation of a simple class of loops that
typically involves searching for something, i.e.
  for (int i = 0; i < n; i++) {
    if (p[i] == val)
      return i;
  }
  return n;
or
  for (int i = 0; i < n; i++) {
    if (p1[i] != p2[i])
      return i;
  }
  return n;
In this initial commit LoopVectorizationLegality will only consider
early exit loops legal for vectorising if they follow these criteria:
1. There are no stores in the loop.
2. The loop must have only one early exit like those shown in the above
example. I have referred to such exits as speculative early exits, to
distinguish from existing support for early exits where the
exit-not-taken count is known exactly at compile time.
3. The early exit block dominates the latch block.
4. The latch block must have an exact exit count.
5. There are no loads after the early exit block.
6. The loop must not contain reductions or recurrences. I don't see
anything fundamental blocking vectorisation of such loops, but I just
haven't done the work to support them yet.
7. We must be able to prove at compile-time that loops will not contain
faulting loads.
Tests have been added here:
Transforms/LoopVectorize/AArch64/simple_early_exit.ll
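To make the criteria above more concrete, here is a small sketch of one loop shaped to satisfy them and one that breaks criterion 1. The function names and the fixed size `N` are illustrative assumptions; passing a fixed-size array by reference is only meant to suggest how the loads could be shown not to fault, and whether the vectoriser actually accepts a given loop depends on what it can prove.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t N = 1024;

// One early exit, no stores, no loads after the early exit, no reductions,
// and a latch whose exit count (N) is known exactly.
std::size_t findFirst(const std::array<int, N> &p, int val) {
  for (std::size_t i = 0; i < N; i++) {
    if (p[i] == val)
      return i;
  }
  return N;
}

// Same search, but the store to `seen` inside the loop violates criterion 1,
// so this form would not be considered legal in this initial commit.
std::size_t findFirstAndMark(const std::array<int, N> &p,
                             std::array<int, N> &seen, int val) {
  for (std::size_t i = 0; i < N; i++) {
    seen[i] = 1;
    if (p[i] == val)
      return i;
  }
  return N;
}
```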
Add a new getSCEVExprForVPValue utility which can be used to get a SCEV
expression for a VPValue. The initial implementation only returns SCEVs
for live-in IR values (by constructing a SCEV based on the live-in IR
value) and VPExpandSCEVRecipe. This is enough to serve its first use,
getting a SCEV for a VPlan's trip count, but will be extended in the
future.
It also removes createTripCountSCEV, as the new helper can be used to
retrieve the SCEV from the VPlan.
PR: https://github.com/llvm/llvm-project/pull/94464
We already pass a Type object into the VPTypeAnalysis constructor, which
can be used to obtain the context. While in the same area it also made
sense to avoid passing the context into the VPTransformState and
VPCostContext constructors.
Add a new VPIRInstruction recipe to wrap existing IR instructions not to
be modified during execution, except for PHIs. For PHIs, a single VPValue
operand is allowed, and it is used to add a new incoming value for the
single predecessor VPBB. Except for PHIs, VPIRInstructions cannot have any
operands.
Depends on https://github.com/llvm/llvm-project/pull/100658.
PR: https://github.com/llvm/llvm-project/pull/100735
Whilst trying to write some VPlan unit tests I realised
that we don't need to pass a ScalarEvolution object into
VPlanTransforms::optimize because the only thing we
actually need is an LLVMContext.
At the moment, the full cost of all interleave group members is assigned
to the instruction at the group's insert position, even if the decision
was to not form an interleave group.
This can lead to inaccurate cost estimates, e.g. if the instruction at
the insert position is dead. If the decision is to not vectorize but
scalarize or scatter/gather, then the cost will be the total cost for all
members. In those cases, assign the cost individually per member, to more
closely reflect the choice per instruction.
This fixes a divergence between legacy and VPlan-based cost model.
Fixes https://github.com/llvm/llvm-project/issues/108098.
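The attribution rule described above can be sketched with a toy model. Everything here (`Decision`, `Member`, `attributeCosts`, the cost numbers) is hypothetical and not the cost model's real interface; it only shows the difference between charging the whole group at the insert position and charging each member individually.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical widening decisions for a group of memory accesses.
enum class Decision { Interleave, Scalarize, GatherScatter };

// Hypothetical group member with its own cost under the chosen decision.
struct Member {
  std::string Name;
  unsigned Cost;
  bool IsInsertPos;
};

// Returns the cost attributed to each member of the group.
std::map<std::string, unsigned>
attributeCosts(const std::vector<Member> &Group, Decision D) {
  std::map<std::string, unsigned> Result;
  if (D == Decision::Interleave) {
    // A real interleave group: charge the total at the insert position.
    unsigned Total = 0;
    for (const Member &M : Group)
      Total += M.Cost;
    for (const Member &M : Group)
      Result[M.Name] = M.IsInsertPos ? Total : 0;
    return Result;
  }
  // No group is formed: each member carries its own cost, so a dead
  // instruction at the insert position no longer distorts the estimate.
  for (const Member &M : Group)
    Result[M.Name] = M.Cost;
  return Result;
}
```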
The check for IV increments in collectUsersInEntryBlock currently
triggers for exit-block PHIs which use the IV start value, resulting in
us failing to add the input value for the middle block to these PHIs.
Fix this by amending the check for IV increments to only include
incoming values that are instructions inside the loop.
Fixes #108004
Update planContainsAdditionalSimplifications to also check phis not in
the loop header. This ensures we don't miss cases where VPBlendRecipes
(which correspond to such phis) have been simplified.
Fixes https://github.com/llvm/llvm-project/issues/107473.
Similar to VFxUF, also add a VF VPValue to VPlan and use it to get the
runtime VF in VPWidenIntOrFpInductionRecipe. Code for VF is only
generated if there are users of VF, to avoid unnecessary test changes.
PR: https://github.com/llvm/llvm-project/pull/95305
There are some cases where only the first operand is marked for
truncation. In that case, the compare won't be truncated which would
incorrectly trigger the assertion.
It also shows that the check before 3fe6a064f15c treated some compares as
truncated that cannot be truncated.
The current check for truncated compares in getInstructionCost misses
cases where either the first or both operands are constants.
Check directly if the compare is marked for truncation. In that case,
the minimum bitwidth is that of the operands.
The patch also adds asserts to ensure that.
This fixes a divergence between legacy and VPlan-based cost model, where
the legacy cost model incorrectly estimated the cost of compares with
truncated operands.
Fixes https://github.com/llvm/llvm-project/issues/107171.
Similarly to dd94537b4, setVectorizedCallDecision also did not consider
ForcedScalars. This led to VPlans not reflecting the decision by the
legacy cost model (cost computation would use scalar cost, VPlan would
have VPWidenCallRecipe).
To fix this, check if the call has been forced to scalar in
setVectorizedCallDecision.
Note that this requires moving setVectorizedCallDecision after
collectLoopUniforms (which sets ForcedScalars). collectLoopUniforms does
not depend on call decisions and can safely be moved.
Fixes https://github.com/llvm/llvm-project/issues/107051.
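The ordering requirement can be illustrated with a toy model. `ToyCostModel`, `collectForcedScalars` and `setCallDecision` are hypothetical stand-ins for the real functions named above and only capture the dependency: the call decision reads the set of forced scalars, so that set has to be populated first.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

enum class CallDecision { Widen, Scalarize };

// Hypothetical, heavily simplified model of the two analysis steps.
struct ToyCostModel {
  std::set<std::string> ForcedScalars;
  std::map<std::string, CallDecision> CallDecisions;

  // Stand-in for the step that populates ForcedScalars.
  void collectForcedScalars(const std::vector<std::string> &Forced) {
    ForcedScalars.insert(Forced.begin(), Forced.end());
  }

  // Stand-in for the call decision: a call forced to scalar must not be
  // recorded as widened.
  void setCallDecision(const std::string &Call) {
    CallDecisions[Call] = ForcedScalars.count(Call) ? CallDecision::Scalarize
                                                    : CallDecision::Widen;
  }
};
```

If `setCallDecision` runs before `collectForcedScalars` in this sketch, the call is recorded as widened even though it is later forced to scalar, which is the kind of mismatch described above.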
collectInstsToScalarize may decide to scalarize a call. If so, we have
to update the widening decision for the call, otherwise the call won't
be scalarized as expected during VPlan construction.
This issue was uncovered by f82543d509.
This moves the logic to create simplified operands using SCEV to MUL
recipe creation. This is needed to match the behavior of the legacy cost
model. TODOs are to extend to other opcodes and move to a transform.
Note that this also restricts the number of SCEV simplifications we
apply to more precisely match the cases handled by the legacy cost
model.
Fixes https://github.com/llvm/llvm-project/issues/107015.
Branches exiting the loop will remain regardless, so don't consider them
in collectValuesToIgnore.
This fixes another divergence between legacy and VPlan-based cost model.
Fixes https://github.com/llvm/llvm-project/issues/106780.
An optimizable cast can also be removed by VPlan simplifications. Remove
the restriction from planContainsAdditionalSimplifications, as this
causes it to miss relevant simplifications, triggering false positives
for the cost decision verification.
Also adds debug output for printing additional cost-precomputations.
Fixes https://github.com/llvm/llvm-project/issues/106641.
This ensures we skip any instructions identified to be ignored by the
legacy cost model as well. Fixes a divergence between legacy and
VPlan-based cost model.
Fixes https://github.com/llvm/llvm-project/issues/106417.
Improve operand analysis using SCEV for cost purposes. This fixes a
divergence between legacy and VPlan-based cost-modeling after
533e6bbd0d34.
Fixes https://github.com/llvm/llvm-project/issues/106248.
Live-ins that are used as exit values don't need to be extracted, they
can be passed through directly. This fixes a crash when trying to
extract from a live-in.
Fixes https://github.com/llvm/llvm-project/issues/106257.
This is a step towards further breaking up the rather large
tryToBuildVPlanWithVPRecipes. It moves the logic to create interleave
groups to VPlanTransforms.cpp, where similar replacements for other
recipes are defined as well (e.g. EVL-based ones).
Don't consider the cost of branches marked to be skipped in VPlan cost
pre-computation. Those aren't included in the legacy cost, so they
should not be included in the VPlan cost.
This patch fixes:
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:7245:1: error:
unused function 'planContainsAdditionalSimplifications'
[-Werror,-Wunused-function]
There are cases where VPlans contain some simplifications that are very
hard to accurately account for up-front in the legacy cost model. Those
cases are caused by un-simplified inputs, which trigger the assert
ensuring both the legacy and VPlan-based cost model agree on the VF.
To avoid false positives due to missed simplifications in general, only
trigger the assert if the chosen VPlan doesn't contain any additional
simplifications.
Fixes https://github.com/llvm/llvm-project/issues/104714.
Fixes https://github.com/llvm/llvm-project/issues/105713.
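The gating of the assert can be sketched as a single predicate. `vfChoiceIsConsistent` is a hypothetical helper, not the code in LoopVectorize.cpp; it only expresses that disagreement between the two cost models is tolerated when the chosen plan contains additional simplifications.

```cpp
// Hypothetical predicate: returns true when the consistency check passes.
bool vfChoiceIsConsistent(unsigned LegacyVF, unsigned VPlanVF,
                          bool PlanHasAdditionalSimplifications) {
  // The legacy model cannot account for these simplifications up-front,
  // so a differing VF is not treated as an error in that case.
  if (PlanHasAdditionalSimplifications)
    return true;
  return LegacyVF == VPlanVF;
}
```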
Move VPWiden[Load|Store]EVLRecipe::execute to VPlanRecipes.cpp in line
with other ::execute implementations that don't depend on anything
defined in LoopVectorization.cpp.
It's not possible to pick the best mask to remove when optimising
VPBlend at construction and so this patch refactors the code to move the
decision (and thus transformation) to VPlanTransforms.
NOTE: This patch does not change the decision of which mask to pick.
That will be done in a following PR to keep this patch as NFC from an
output point of view.
Use getBestVF to select VF up-front and only use
selectVectorizationFactor to get the legacy VF to check the
vectorization decision matches the VPlan-based cost model.
PR: https://github.com/llvm/llvm-project/pull/103033
Introduce explicit ExtractFromEnd recipes to extract the final values
for live-outs instead of implicitly extracting in VPLiveOut::fixPhi.
This is a follow-up to the recent changes of modeling extracts for
recurrences and consolidates live-out extract creation for fixed-order
recurrences at a single place: addLiveOutsForFirstOrderRecurrences.
It is also in preparation of replacing VPLiveOut with VPIRInstructions
wrapping the original scalar phis.
PR: https://github.com/llvm/llvm-project/pull/100658
As suggested in https://github.com/llvm/llvm-project/pull/103033, more
accurately rename to getPlanFor, as it simply returns the VPlan for
VF, relying on the fact that there is a single VPlan for each VF at the
moment.
As suggested in https://github.com/llvm/llvm-project/pull/103033, add a
remark when the UserVF is ignored due to it being larger than MaxUserVF.
Only changes behavior of diagnostic/debug output.
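A rough sketch of the behaviour described above, under stated assumptions: `UserVFResult`, `applyUserVF` and the remark wording are hypothetical and do not reproduce the actual diagnostic text. The point is only that an oversized UserVF is dropped and a remark is produced alongside.

```cpp
#include <optional>
#include <string>

// Hypothetical result: the accepted VF (if any) and an optional remark.
struct UserVFResult {
  std::optional<unsigned> VF;
  std::optional<std::string> Remark;
};

// Accepts UserVF when it does not exceed MaxUserVF; otherwise ignores it
// and reports why, leaving the choice of VF to the caller.
UserVFResult applyUserVF(unsigned UserVF, unsigned MaxUserVF) {
  if (UserVF <= MaxUserVF)
    return {UserVF, std::nullopt};
  return {std::nullopt, "UserVF " + std::to_string(UserVF) +
                            " ignored: larger than MaxUserVF " +
                            std::to_string(MaxUserVF)};
}
```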