Currently, we hardly ever actually run SCEV verification, even in
tests with -verify-scev. This is because the NewPM LPM does not
verify SCEV. The reason for this is that SCEV verification can
actually change the result of subsequent SCEV queries, which means
that you see different transformations depending on whether
verification is enabled or not.
To allow verification in the LPM, this limits verification to
BECounts that have actually been cached. It will not calculate
new BECounts.
BackedgeTakenInfo::getExact() is still not entirely readonly,
it still calls getUMinFromMismatchedTypes(). But I hope that this
is not problematic in the same way. (This could be avoided by
performing the umin in the other SCEV instance, but this would
require duplicating some of the code.)
Differential Revision: https://reviews.llvm.org/D120551
For unreachable loops, any BECount is legal, and since D98706 SCEV
can make use of this for loops that are unreachable due to constant
branches. To avoid false positives, adjust SCEV verification to only
check BECounts in reachable loops.
Fixes https://github.com/llvm/llvm-project/issues/50523.
Differential Revision: https://reviews.llvm.org/D120651
This reverts commit 7cd273c339cfe8427404f881ae280bd9fae6ff78.
Several patches with tests fixes have been applied:
0cada82f0a30e5ae22dce66b58604ab9b47a3897 "[Test] Remove incorrect test in GVN"
97cb13615d6d9df254e3c0f3deef9eaedfe189b6 "[Test] Separate IndVars test into AArch64 and X86 parts"
985cc490f17d28b20392ee214895d947b85120ef "[Test] Remove separated test in IndVars",
and test failures caused by 5ec2386 should be resolved now.
The widen-loop-comp.ll in indvars has a target triple with
specified aarch64 architecture. This caused test failures with
db28934 "[IndVars] Pass TTI to replaceCongruentIVs" applied, because
with the patch indvars performed some target-specific
transforms, and for example if a build supported only X86,
then indvars would not have applied those transforms.
However, the checks in the test were generated as for aarch64.
Thus the test failures on such builds.
This patch separates widen-loop-comp.ll into two parts.
The first one is intended to be run only if a build supports aarch64.
This is now in AArch64 directory with a lit config.
The second one was added recently to show db28934 improvements.
This one is now in X86 directory.
This patch should resolve build issues caused by
5ec23863320ca12bfabb6dcff1d0425cb614b7a5.
This reapplies patch db289340c841990055a164e8eb2a3b5ff25677bf.
The test failures on build with expensive checks caused by the patch happened due
to the fact that we sorted loop Phis in replaceCongruentIVs using llvm::sort,
which shuffles the given container if the expensive checks are enabled,
so equivalent Phis in the sorted vector had different mutual order from run
to run. replaceCongruentIVs tries to replace narrow Phis with truncations
of wide ones. In some test cases there were several Phis with the same
width, so if their order differs from run to run, the narrow Phis would
be replaced with a different Phi, depending on the shuffling result.
The patch ae14fae0ff4304022beda5ab484f84ac0fdda807 fixed this issue by
replacing llvm::sort with llvm::stable_sort.
In IndVarSimplify after simplifying and extending loop IVs we call 'replaceCongruentIVs'.
This function optionally takes a TTI argument to be able to replace narrow IVs uses
with truncates of the widest one.
For some reason the TTI wasn't passed to the function, so it couldn't perform such
transform.
This patch fixes it.
Reviewed By: mkazantsev
Differential Revision: https://reviews.llvm.org/D113024
Only the most recent cpus support really 1cy 64-bit multiplies, and the X64 cost table represents a realistic worst case. The 1cy value was also discouraging vectorization when most vXi64 PMULDQ expansions aren't actually slower than scalarization.
Noticed while investigating PR51436.
Since d6de1e1a71406c75a4ea4d5a2fe84289f07ea3a1, no attributes is quivalent to
setting attribute to false.
This is a preliminary commit for https://reviews.llvm.org/D99080
We can prove more predicates when we have a context when eliminating ICmp.
As first (and very obvious) approximation we can use the ICmp instruction itself,
though in the future we are going to use a common dominator of all its users.
Need some refactoring before that.
Observed ~0.5% negative compile time impact.
Differential Revision: https://reviews.llvm.org/D98697
Reviewed By: lebedev.ri
By definition of Implication operator, `false -> true` and `false -> false`. It means that
`false` implies any predicate, no matter true or false. We don't need to go any further
trying to prove the statement we need and just always say that `false` implies it in this case.
In practice it means that we are trying to prove something guarded by `false` condition,
which means that this code is unreachable, and we can safely prove any fact or perform any
transform in this code.
Differential Revision: https://reviews.llvm.org/D98706
Reviewed By: lebedev.ri
If we have a recurrence of the form <Start, And, Step> we know that the value taken by the recurrence stabilizes on the first iteration (provided step is loop invariant). We can exploit that fact to remove the loop carried dependence in the recurrence.
Differential Revision: https://reviews.llvm.org/D97578 (and part)
These intrinsics, not the icmp+select are the canonical form nowadays,
so we might as well directly emit them.
This should not cause any regressions, but if it does,
then then they would needed to be fixed regardless.
Note that this doesn't deal with `SCEVExpander::isHighCostExpansion()`,
but that is a pessimization, not a correctness issue.
Additionally, the non-intrinsic form has issues with undef,
see https://reviews.llvm.org/D88287#2587863
This builds on the restricted after initial revert form of D93906, and adds back support for breaking backedges of inner loops. It turns out the original invalidation logic wasn't quite right, specifically around the handling of LCSSA.
When breaking the backedge of an inner loop, we can cause blocks which were in the outer loop only because they were also included in a sub-loop to be removed from both loops. This results in the exit block set for our original parent loop changing, and thus a need for new LCSSA phi nodes.
This case happens when the inner loop has an exit block which is also an exit block of the parent, and there's a block in the child which reaches an exit to said block without also reaching an exit to the parent loop.
(I'm describing this in terms of the immediate parent, but the problem is general for any transitive parent in the nest.)
The approach implemented here involves a potentially expensive LCSSA rebuild. Perf testing during review didn't show anything concerning, but we may end up needing to revert this if anyone encounters a practical compile time issue.
Differential Revision: https://reviews.llvm.org/D94378
This reverts commit dd6bb367d19e3bf18353e40de54d35480999a930.
Multi-stage builders are showing an assertion failure w/LCSSA not being preserved on entry to IndVars. Reason isn't clear, reverting while investigating.
The basic idea is that if SCEV can prove the backedge isn't taken, we can go ahead and get rid of the backedge (and thus the loop) while leaving the rest of the control in place. This nicely handles cases with dispatch between multiple exits and internal side effects.
Differential Revision: https://reviews.llvm.org/D93906
In 35676a4f9a536a2aab768af63ddbb15bc722d7f9 I've added handling for
non-trivial dominating conditions that imply non-zero on the true
branch. This adds the same support for the false branch.
The changes in pr45360.ll change block ordering and naming, but
don't change the control flow. The urem is still guaraded by a
non-zero check correctly.
In an effort to make code around flag determination more readable, and (possibly) prepare for a follow up change, factor out some of the flag detection logic. In the process, reduce the number of locations we mutate wrap flags by a couple.
Note that this isn't NFC. The old code tried for NSW xor (NUW || NW). This is, two different paths computed different sets of wrap flags. The new code will try for all three. The result is that some expressions end up with a few extra flags set.
We should first try to constant fold the add expression and only
strengthen nowrap flags afterwards. This allows us to determine
stronger flags if e.g. only two operands are left after constant
folding (and thus "guaranteed no wrap region" code applies) or the
resulting operands are non-negative and thus nsw->nuw strengthening
applies.
This reverts commit a10a64e7e334dc878d281aba9a46f751fe606567.
It broke polly/test/ScopInfo/NonAffine/non-affine-loop-condition-dependent-access_3.ll
The difference suggests that this may be a serious issue.
Fixed wrapping range case & proof methods reduced to constant range
checks to save compile time.
Differential Revision: https://reviews.llvm.org/D89381
It was reverted because of negative compile time impact. In this version,
less powerful proof methods are used (non-recursive reasoning only), and
scope limited to constant End values to avoid explision of complex proofs.
Differential Revision: https://reviews.llvm.org/D89381
We can sharpen the range of a AddRec if we know that it does not
self-wrap and know the symbolic iteration count in the loop. If we can
evaluate the value of AddRec on the last iteration and prove that at least
one its intermediate value lies between start and end, then no-wrap flag
allows us to conclude that all of them also lie between start and end. So
the estimate of range can be improved to union of ranges of start and end.
Differential Revision: https://reviews.llvm.org/D89381
Reviewed By: efriedma
Summary:
As [[ https://bugs.llvm.org/show_bug.cgi?id=45360 | PR45360 ]] reports,
with new cost-model we can sometimes end up being able to expand `udiv`/`urem` instructions.
And that exposes at least one instance of when we do that
regardless of whether or not it is safe to do.
In this particular case, it's `SimplifyIndvar::replaceIVUserWithLoopInvariant()`.
It seems to me, we simply need to check with `isSafeToExpandAt()` first.
The test isn't great. I'm not sure how to make it only run `-indvars`.
Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=45360 | PR45360 ]].
Reviewers: mkazantsev, reames, helloqirun
Reviewed By: mkazantsev
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D82108
https://bugs.llvm.org/show_bug.cgi?id=45360
This is reduced from the (runnable) test provided in the bug report.
The remainder operation is originally guarded, it never divides by zero.
Indvars should not make it execute unconditionally.
This is not a great test, running whole -O2 is fragile,
but i really don't understand why running -indvars on the IR before
that tranform happens doesn't work.