This patch adds a new transform to remove dead recipes. For now, it only
removes dead recipes in the header, to keep the number of tests that require
updating manageable. Future patches will extend this to remove dead
recipes across the whole plan.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D118051
isCandidateForEpilogueVectorization will currently return false for loops
which contain reductions. This patch removes this restriction and makes
the following changes to support epilogue vectorisation with reductions:
- `fixReduction`: If fixReduction is being called during vectorisation of the
epilogue, the phi node it creates will need to additionally carry incoming
values from the middle block of the main loop.
- `createEpilogueVectorizedLoopSkeleton`: The incoming values of the phi
created by fixReduction are updated after the vec.epilog.iter.check block
is added. The phi is also moved to the preheader of the epilogue.
- `processLoop`: The start values of any VPReductionPHIRecipes are updated before
vectorising the epilogue loop. The getResumeInstr function added to the ILV
will return the resume instruction associated with the recurrence descriptor.
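
To illustrate the resume value described above, here is a minimal sketch in
plain C++ rather than LLVM IR. It assumes an integer add reduction; the
function name `sumWithEpilogue` and the vector factors 8 and 4 are made up for
the example and do not come from the patch. The point is only that the
epilogue's reduction starts from the value reduced in the main loop's middle
block instead of from the original start value.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of main vector loop + epilogue vector loop + scalar
// remainder for a sum reduction. Lanes are simulated with small arrays.
inline long sumWithEpilogue(const std::vector<int> &data) {
  constexpr std::size_t MainVF = 8;     // assumed main-loop vector factor
  constexpr std::size_t EpilogueVF = 4; // assumed epilogue vector factor
  const std::size_t n = data.size();
  std::size_t i = 0;

  // Main vector loop: one partial sum per lane.
  long mainLanes[MainVF] = {};
  for (; i + MainVF <= n; i += MainVF)
    for (std::size_t lane = 0; lane < MainVF; ++lane)
      mainLanes[lane] += data[i + lane];

  // "Middle block" of the main loop: reduce the lanes to a scalar.
  long resume = 0;
  for (long v : mainLanes)
    resume += v;

  // Epilogue vector loop: the reduction starts from the resume value
  // (placed in lane 0, with the add identity 0 in the other lanes).
  long epilogueLanes[EpilogueVF] = {resume};
  for (; i + EpilogueVF <= n; i += EpilogueVF)
    for (std::size_t lane = 0; lane < EpilogueVF; ++lane)
      epilogueLanes[lane] += data[i + lane];

  long result = 0;
  for (long v : epilogueLanes)
    result += v;

  // Scalar remainder loop.
  for (; i < n; ++i)
    result += data[i];
  return result;
}
```

With 23 elements, this sketch processes 16 in the main loop, 4 in the
epilogue and 3 in the scalar remainder.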
Reviewed By: sdesmalen
Differential Revision: https://reviews.llvm.org/D116928
9345ab3a4550 updated generateOverflowCheck to skip creating checks that
always evaluate to false. This in turn means that we only need to check
for overflows if the result of the multiplication is actually used.
Sink the Or for the overflow check into ComputeEndCheck, so it is only
created when there's an actual check.
Currently generateOverflowCheck always creates code for Step being
negative and positive, followed by a select at the end depending on
Step's sign.
This patch updates the code to create only the checks for a positive step
or only the checks for a negative step, if the sign is known.
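
A rough model of the idea in plain C++ on 8-bit values. This is a sketch, not
the SCEVExpander code: the names `StepSign`, `OverflowCheck` and
`checkUnsignedWrap`, and the `comparisons` counter, are invented for
illustration. When the sign is unknown, both end comparisons are computed and
one is selected; when the sign is known, only the relevant one is computed.

```cpp
#include <cassert>
#include <cstdint>

enum class StepSign { Unknown, NonNegative, Negative }; // hypothetical

struct OverflowCheck {
  bool wraps;      // result of the overflow check
  int comparisons; // number of end comparisons that were "emitted"
};

// Does Start + Step * BackedgeCount wrap in 8 unsigned bits?
inline OverflowCheck checkUnsignedWrap(std::uint8_t start, std::int8_t step,
                                       std::uint8_t backedgeCount,
                                       StepSign known) {
  // |Step| is computed in a wider type so that -128 is handled.
  unsigned absStep = step < 0 ? unsigned(-int(step)) : unsigned(step);
  unsigned product = absStep * backedgeCount;
  bool mulOverflow = product > 0xFFu;
  std::uint8_t mul = std::uint8_t(product);

  bool addWraps = false, subWraps = false, endCheck = false;
  int comparisons = 0;
  if (known == StepSign::NonNegative) {
    assert(step >= 0);
    addWraps = std::uint8_t(start + mul) < start;
    endCheck = addWraps;
    comparisons = 1;
  } else if (known == StepSign::Negative) {
    assert(step < 0);
    subWraps = std::uint8_t(start - mul) > start;
    endCheck = subWraps;
    comparisons = 1;
  } else {
    // Sign unknown: compute both sides, then select on the sign of Step.
    addWraps = std::uint8_t(start + mul) < start;
    subWraps = std::uint8_t(start - mul) > start;
    endCheck = step < 0 ? subWraps : addWraps;
    comparisons = 2;
  }
  return {endCheck || mulOverflow, comparisons};
}
```

Both variants give the same answer in this model; the known-sign path simply
does less work.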
Follow-up to D116696.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D116747
This patch updates SCEVExpander::expandUnionPredicate to not create
redundant 'or false, x' instructions. While those are trivially
foldable, they can be easily avoided and hinder code that checks the
size/cost of the generated checks before further folds.
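
As a toy illustration of the difference (not the actual expander code), the
sketch below models each expanded predicate as either a textual check or
"known false", and builds the OR chain both ways. The type alias `Check` and
both function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// std::nullopt models a predicate whose check is known to be false.
using Check = std::optional<std::string>;

// Old behaviour in this model: seed the chain with 'false' and OR in
// every check, producing redundant 'or(false, x)' nodes.
inline std::string expandUnionNaive(const std::vector<Check> &checks) {
  std::string acc = "false";
  for (const Check &c : checks)
    acc = "or(" + acc + ", " + c.value_or("false") + ")";
  return acc;
}

// New behaviour in this model: skip known-false checks and only create
// an 'or' once there are two real operands.
inline std::string expandUnionSkippingFalse(const std::vector<Check> &checks) {
  std::vector<std::string> live;
  for (const Check &c : checks)
    if (c)
      live.push_back(*c);
  if (live.empty())
    return "false";
  std::string acc = live[0];
  for (std::size_t i = 1; i < live.size(); ++i)
    acc = "or(" + acc + ", " + live[i] + ")";
  return acc;
}
```

Code that counts instructions before any folding sees one node fewer per
skipped operand in the second form.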
I am planning to look into a few other similar improvements to code
generated by SCEVExpander.
I remember @lebedev.ri working a while ago on doing some trivial folds
like that in IRBuilder itself, but there were concerns that such
changes may subtly break existing code.
Reviewed By: reames, lebedev.ri
Differential Revision: https://reviews.llvm.org/D116696
Upon further investigation and discussion,
this is actually the opposite direction from what we should be taking,
and this direction wouldn't solve the motivating problem anyway.
Additionally, some more (polly) tests have escaped being updated.
So, let's just take a step back here.
This reverts commit f3190dedeef9da2109ea57e4cb372f295ff53b88.
This reverts commit 749581d21f2b3f53e4fca4eb8728c942d646893b.
This reverts commit f3df87d57e096143670e0fd396e81d43393a2dd2.
This reverts commit ab1dbcecd6f0969976fafd62af34730436ad5944.
While we could emit such a tautological `select`,
it will stick around until the next instsimplify invocation,
which may happen after we count the cost of this redundant `select`.
Which is precisely what happens with loop vectorization legality checks,
and that artificially increases the cost of said checks,
which is bad.
There is prior art for this in `IRBuilderBase::CreateAnd()`/`IRBuilderBase::CreateOr()`.
Refs. https://reviews.llvm.org/D109368#3089809
This patch marks the induction increment of the main induction variable
of the vector loop as NUW when not folding the tail.
If the tail is not folded, we know that End - Start >= Step (either
statically or through the minimum iteration checks). We also know that both
Start % Step == 0 and End % Step == 0. We exit the vector loop if %IV +
%Step == %End. Hence we must exit the loop before %IV + %Step unsigned
overflows and we can mark the induction increment as NUW.
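
The argument can be checked by brute force on a narrow type. The sketch below
is a model in plain C++ with 8-bit values, not anything from the patch; the
function name `incrementNeverWraps` is made up, and it additionally assumes
Start < End as unsigned values, which the message above does not state
explicitly.

```cpp
#include <cassert>
#include <cstdint>

// Simulates the vector loop's induction variable and reports whether
// IV + Step stays within 8 unsigned bits until the exit condition
// IV + Step == End fires.
inline bool incrementNeverWraps(std::uint8_t start, std::uint8_t end,
                                std::uint8_t step) {
  // Preconditions from the commit message, plus the Start < End assumption.
  assert(step != 0 && start % step == 0 && end % step == 0);
  assert(start < end && std::uint8_t(end - start) >= step);

  unsigned iv = start;
  for (;;) {
    unsigned next = iv + step; // computed in a wider type to detect a wrap
    if (next > 0xFFu)
      return false; // the 8-bit increment would have wrapped
    if (next == end)
      return true; // loop exits before any wrap
    iv = next;
  }
}
```

Under these preconditions IV and End are both multiples of Step with IV < End,
so IV + Step <= End and the function returns true for every valid input.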
This should make SCEV return more precise bounds for the created vector
loops, used by later optimizations, like late unrolling.
At the moment quite a few tests still need to be updated, but before
doing so I'd like to get initial feedback to make sure I am not missing
anything.
Note that this could probably be further improved by using information
from the original IV.
An attempt at modeling the assumption in Alive2:
https://alive2.llvm.org/ce/z/H_DL_g
Part of a set of fixes required for PR50412.
Reviewed By: mkazantsev
Differential Revision: https://reviews.llvm.org/D103255
These intrinsics, not the icmp+select, are the canonical form nowadays,
so we might as well directly emit them.
This should not cause any regressions, but if it does,
then they would need to be fixed regardless.
Note that this doesn't deal with `SCEVExpander::isHighCostExpansion()`,
but that is a pessimization, not a correctness issue.
Additionally, the non-intrinsic form has issues with undef,
see https://reviews.llvm.org/D88287#2587863
Motivating examples are seen in the PhaseOrdering tests based on:
https://bugs.llvm.org/show_bug.cgi?id=43953#c2 - if we have
intrinsics there, some pass can fold them.
The intrinsics are still named "experimental" at this point, but
if there is no fallout from this patch, that will be a good
indicator that it is safe to finalize them.
Differential Revision: https://reviews.llvm.org/D80867
This was reverted because of a miscompilation. On closer inspection, the
problem was also visible in a changed LLVM regression test. This one-line
follow-up fix/recommit splats the IV whenever tail-folding is requested, even
if all users are scalar instructions after vectorisation, because with
tail-folding the splat IV is used by the predicate of the masked load/store
instructions. In general, splatting the IV is what we are trying to avoid when
it is unnecessary. The previous version omitted this splat, which caused the
miscompilation. The original commit message was:
If tail-folding of the scalar remainder loop is applied, the primary induction
variable is splat to a vector and used by the masked load/store vector
instructions, thus the IV does not remain scalar. Because we now mark
that the IV does not remain scalar for these cases, we don't emit the vector IV
if it is not used. Thus, the vectoriser produces less dead code.
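
To show why the IV does not remain scalar under tail-folding, here is a small
model in plain C++ (not vectoriser output). The function name
`addOneTailFolded` and the vector factor 4 are assumptions for the example.
Each lane's index is the splat IV plus the lane number, and the mask compares
that per-lane index against the trip count.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Tail-folded loop: there is no scalar remainder, the last vector
// iteration is partially masked instead.
inline std::vector<int> addOneTailFolded(const std::vector<int> &src) {
  constexpr std::size_t VF = 4; // assumed vector factor
  const std::size_t n = src.size();
  std::vector<int> dst(n, 0);
  for (std::size_t iv = 0; iv < n; iv += VF) {
    for (std::size_t lane = 0; lane < VF; ++lane) {
      std::size_t laneIV = iv + lane; // models splat(iv) + <0, 1, 2, 3>
      bool mask = laneIV < n;         // predicate of the masked load/store
      if (mask)
        dst[laneIV] = src[laneIV] + 1; // masked load, add, masked store
    }
  }
  return dst;
}
```

The mask depends on the per-lane values of the IV, which is why a vector form
of the IV is needed even when every other user is scalar.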
Thanks to Ayal Zaks for the direction on how to fix this.
If tail-folding of the scalar remainder loop is applied, the primary induction
variable is splat to a vector and used by the masked load/store vector
instructions, thus the IV does not remain scalar. Because we now mark
that the IV does not remain scalar for these cases, we don't emit the vector IV
if it is not used. Thus, the vectoriser produces less dead code.
Thanks to Ayal Zaks for the direction on how to fix this.
Differential Revision: https://reviews.llvm.org/D78911
This reverts commit r365260 which broke the following tests:
Clang :: CodeGenCXX/cfi-mfcall.cpp
Clang :: CodeGenObjC/ubsan-nullability.m
LLVM :: Transforms/LoopVectorize/AArch64/pr36032.ll
llvm-svn: 365284
Without this, we have the unfortunate property that tests are dependent on the order of operands passed to the CreateOr and CreateAnd functions. In actual usage, we'd promptly optimize them away, but it made tests slightly more verbose than they should have been.
llvm-svn: 365260
Summary:
Currently we express umin as `~umax(~x, ~y)`. However, this becomes
a problem for operands in non-integral pointer spaces, because `~x`
is not something we can compute for `x` non-integral. However, since
comparisons are generally still allowed, we are actually able to
express `umin(x, y)` directly as long as we don't try to express it
as a umax. Support this by adding an explicit umin/smin representation
to SCEV. We do this by factoring the existing getUMax/getSMax functions
into a new function that does all four. The previous two functions were
largely identical.
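
For plain integers the two formulations agree, which the following sketch
checks on 8-bit values. It is ordinary C++, not SCEV code, and the function
names are invented for the example. The direct form needs only a comparison,
which is what makes it usable when `~x` cannot be computed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// umin expressed through umax and bitwise-not: ~umax(~x, ~y).
inline std::uint8_t uminViaUmax(std::uint8_t x, std::uint8_t y) {
  std::uint8_t notX = std::uint8_t(~x);
  std::uint8_t notY = std::uint8_t(~y);
  return std::uint8_t(~std::max(notX, notY));
}

// umin expressed directly: only a comparison is required.
inline std::uint8_t uminDirect(std::uint8_t x, std::uint8_t y) {
  return x < y ? x : y;
}
```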
Reviewed By: sanjoy
Differential Revision: https://reviews.llvm.org/D50167
llvm-svn: 360159
As it's causing some bot failures (and per request from kbarton).
This reverts commit r358543/ab70da07286e618016e78247e4a24fcb84077fda.
llvm-svn: 358546
This reverts r319889.
Unfortunately, wrapping flags are not a part of SCEV's identity (they
do not participate in computing a hash value or in equality
comparisons) and in fact they could be assigned after the fact w/o
rebuilding a SCEV.
Grep for const_cast's to see quite a few examples, apparently all
for AddRec's at the moment.
So, if 2 expressions get built in 2 slightly different ways: one with
flags set in the beginning, the other with the flags attached later
on, we may end up with 2 expressions which are exactly the same but
have their operands swapped in one of the commutative N-ary
expressions, and at least one of them will have "sorted by complexity"
invariant broken.
2 identical SCEV's won't compare equal by pointer comparison as they
are supposed to.
A real-world reproducer is added as a regression test: the issue
described causes 2 identical SCEV expressions to have different order
of operands and therefore compare not equal, which in its turn
prevents LoadStoreVectorizer from vectorizing a pair of consecutive
loads.
On a larger example (the source of the test attached, which is a
bugpoint) I have seen even weirder behavior: adding a constant to an
existing SCEV changes the order of the existing terms, for instance,
getAddExpr(1, ((A * B) + (C * D))) returns (1 + (C * D) + (A * B)).
Differential Revision: https://reviews.llvm.org/D40645
llvm-svn: 340777