Reverted because the compile time impact is still too high.
isKnownViaNonRecursiveReasoning is used twice, we can do it just once.
Differential Revision: https://reviews.llvm.org/D92152
This reverts commit 3d4c0460ec6040fc071e56dc113afd181294591e.
Compile time impact is still high. Need to understand why.
Differential Revision: https://reviews.llvm.org/D92153
Previously we tried to using isKnownPredicateAt, but it makes an
extra query to isKnownPredicate, which has negative impact on compile
time. Let's try to use more lightweight isBasicBlockEntryGuardedByCond.
Differential Revision: https://reviews.llvm.org/D92152
This reverts commit 14f2ad0e3cc54d5eb254b545a469e8ffdb62b119.
Reverting to investigate compile time drop.
Differential Revision: https://reviews.llvm.org/D92152
A piece of code in `isLoopBackedgeGuardedByCond` basically duplicates
the dominators traversal from `isBlockEntryGuardedByCond` called from
`isKnownPredicateAt`, but it's less powerful because it does not give context
to `isImpliedCond`. This patch reuses the `isKnownPredicateAt `function there,
reducing the amount of code duplication and making it more powerful.
Differential Revision: https://reviews.llvm.org/D92152
Reviewed By: skatkov
Use more context to prove contextual facts about the last iteration. It is
only executed when the backedge is taken, so we can use `isLoopBackedgeGuardedByCond`
to make this check.
Differential Revision: https://reviews.llvm.org/D91535
Reviewed By: skatkov
The TypeSize warning would occur because RuntimePointerChecking::insert
was not scalable vector aware. The fix is to use
ScalarEvolution::getSizeOfExpr to grab the size of types.
Differential Revision: https://reviews.llvm.org/D90171
This reverts commit 7dcc8899174f44b7447bc48a9f2ff27f5458f8b7.
This patch introduced a logical error that breaks whole logic of this analysis.
All checks we are making are supposed to be loop-independent, so that we could
safely remove the range check. The 'nw' fact is loop-dependent, so we can remove
the check basing on facts from this very check.
Motivating examples will follow-up.
This reverts commit 2734a9ebf4a31df0131acdfc739395a5e692c342.
This patch appeared to not be a NFC. It introduced an execution path where
monotonicity check on limited space started relying in existing nsw/nuw
flags, which is illegal. The motivating test will follow-up.
SCEV makes a logical mistake when handling EitherMayExit in
case when both conditions must be met to exit the loop. The
mistake looks like follows: "if condition `A` fails within at most `X` first
iterations, and `B` fails within at most `Y` first iterations, then `A & B`
fails at most within `min (X, Y)` first iterations". This is wrong, because
both of them must fail at the same time.
Simple example illustrating this is following: we have an IV with step 1,
condition `A` = "IV is even", condition `B` = "IV is odd". Both `A` and `B`
will fail within first two iterations. But it doesn't mean that both of them
will fail within first two first iterations at the same time, which would mean
that IV is neither even nor odd at the same time within first 2 iterations.
We can only do so for known exact BE counts, but not for max.
Differential Revision: https://reviews.llvm.org/D91942
Reviewed By: nikic
Handling of `and` and `or` vastly uses copy-paste. Factored out into
a helper function as preparation step for further fix (see PR48225).
Differential Revision: https://reviews.llvm.org/D91864
Reviewed By: nikic
This is a cut down version of 1ec6e1 which was reverted due to a compile time issue. The key changes made from that patch: 1) only infer the flags needed along each path, 2) be careful to preserve order of checks, and 3) avoid computing NW flags at all since we need to prove the stronger property (does not cross 0) in the caller anyways.
Assuming this doesn't trip regressions, I'm going to try weakening (1). My end objective is to move flag inference into addrec construction. If I can't weaken (1) without compile time impact, I'll have a problem.
In an effort to make code around flag determination more readable, and (possibly) prepare for a follow up change, factor out some of the flag detection logic. In the process, reduce the number of locations we mutate wrap flags by a couple.
Note that this isn't NFC. The old code tried for NSW xor (NUW || NW). This is, two different paths computed different sets of wrap flags. The new code will try for all three. The result is that some expressions end up with a few extra flags set.
The SCEV code for constructing GEP expressions currently assumes
that the addition of the base and all the offsets is nsw if the GEP
is inbounds. While the addition of the offsets is indeed nsw, the
addition to the base address is not, as the base address is
interpreted as an unsigned value.
Fix the GEP expression code to not assume nsw for the base+offset
calculation. However, do assume nuw if we know that the offset is
non-negative. With this, we use the same behavior as the
construction of GEP addrecs does. (Modulo the fact that we
disregard SCEV unification, as the pre-existing FIXME points out).
Differential Revision: https://reviews.llvm.org/D90648
A piece of logic of `isLoopInvariantExitCondDuringFirstIterations` is actually
a generalized predicate monotonicity check. This patch moves it into the
corresponding method and generalizes it a bit.
Differential Revision: https://reviews.llvm.org/D90395
Reviewed By: apilipenko
Lift limitation on step being `+/- 1`. In fact, the only thing it is needed for
is proving no-self-wrap. We can instead check this flag directly.
Theoretically it can increase the scope of the transform, but I could not
construct such test easily.
Differential Revision: https://reviews.llvm.org/D91126
Reviewed By: apilipenko
This changes the definition of t2DoLoopStart from
t2DoLoopStart rGPR
to
GPRlr = t2DoLoopStart rGPR
This will hopefully mean that low overhead loops are more tied together,
and we can more reliably generate loops without reverting or being at
the whims of the register allocator.
This is a fairly simple change in itself, but leads to a number of other
required alterations.
- The hardware loop pass, if UsePhi is set, now generates loops of the
form:
%start = llvm.start.loop.iterations(%N)
loop:
%p = phi [%start], [%dec]
%dec = llvm.loop.decrement.reg(%p, 1)
%c = icmp ne %dec, 0
br %c, loop, exit
- For this a new llvm.start.loop.iterations intrinsic was added, identical
to llvm.set.loop.iterations but produces a value as seen above, gluing
the loop together more through def-use chains.
- This new instrinsic conceptually produces the same output as input,
which is taught to SCEV so that the checks in MVETailPredication are not
affected.
- Some minor changes are needed to the ARMLowOverheadLoop pass, but it has
been left mostly as before. We should now more reliably be able to tell
that the t2DoLoopStart is correct without having to prove it, but
t2WhileLoopStart and tail-predicated loops will remain the same.
- And all the tests have been updated. There are a lot of them!
This patch on it's own might cause more trouble that it helps, with more
tail-predicated loops being reverted, but some additional patches can
hopefully improve upon that to get to something that is better overall.
Differential Revision: https://reviews.llvm.org/D89881
Our range computation methods benefit from no-wrap flags. But if the ranges
were first computed before the flags were set, the cached range will be too
pessimistic.
We need to drop cached ranges whenever we sharpen AddRec's no wrap flags.
Differential Revision: https://reviews.llvm.org/D89847
Reviewed By: fhahn
Strengthening nowrap flags is relatively expensive. Make sure we
only do it if we're actually going to use the flags -- we don't
use them for many recursive invocations. Additionally, if we're
reusing an existing SCEV node, there's no point in trying to
strengthen the flags if we don't have any new baseline facts.
This change falls slightly short of being NFC, because the way
flags during add+addrec / mul+addrec folding are handled may be
more precise (as less operands are included in the calculation).
Instead of performing a sequence of pairwise additions, directly
construct a multi-operand add expression.
This should be NFC modulo any SCEV canonicalization deficiencies.
This is functionally-identical to the previous implementation,
just using a generic interface to do that instead of hand-rolled one,
with caching as a bonus. Thought the sinking is still recursive..
Note that SCEVRewriteVisitor<>'s default implementations
don't preserve NoWrap flags on Add/Mul (but does on AddRec!),
but here we know we can preserve them,
so `visitAddExpr()`/`visitMulExpr()` are specialized.
If we've got an SCEVPtrToIntExpr(op), where op is not an SCEVUnknown,
we want to sink the SCEVPtrToIntExpr into an operand,
so that the operation is performed on integers,
and eventually we end up with just an `SCEVPtrToIntExpr(SCEVUnknown)`.
Reviewed By: mkazantsev
Differential Revision: https://reviews.llvm.org/D89692
And use it to model LLVM IR's `ptrtoint` cast.
This is essentially an alternative to D88806, but with no chance for
all the problems it caused due to having the cast as implicit there.
(see rG7ee6c402474a2f5fd21c403e7529f97f6362fdb3)
As we've established by now, there are at least two reasons why we want this:
* It will allow SCEV to actually model the `ptrtoint` casts
and their operands, instead of treating them as `SCEVUnknown`
* It should help with initial problem of PR46786 - this should eventually allow us
to not loose pointer-ness of an expression in more cases
As discussed in [[ https://bugs.llvm.org/show_bug.cgi?id=46786 | PR46786 ]], in principle,
we could just extend `SCEVUnknown` with a `is ptrtoint` cast, because `ScalarEvolution::getPtrToIntExpr()`
should sink the cast as far down into the expression as possible,
so in the end we should always end up with `SCEVPtrToIntExpr` of `SCEVUnknown`.
But i think that it isn't the best solution, because it doesn't really matter
from memory consumption side - there probably won't be *that* many `SCEVPtrToIntExpr`s
for it to matter, and it allows for much better discoverability.
Reviewed By: mkazantsev
Differential Revision: https://reviews.llvm.org/D89456
URem operations with constant power-of-2 second operands are modeled as
such. This patch on its own has very little impact (e.g. no changes in
CodeGen for MultiSource/SPEC2000/SPEC2006 on X86 -O3 -flto), but I'll
soon post follow-up patches that make use of it to more accurately
determine the trip multiple.
Reviewed By: mkazantsev
Differential Revision: https://reviews.llvm.org/D89821
This reverts commit e038b60d9169733367393f733058f0ff23c28d3f.
This reverts commit a0d84d80315d0c725b5efcd889928bad1171ba56.
This revert was a mistake. The reason of the failures was
"Use uint64_t for branch weights instead of uint32_t"
Differential Revision: https://reviews.llvm.org/D87832
When we need to prove implication of expressions of different type width,
the default strategy is to widen everything to wider type and prove in this
type. This does not interact well with AddRecs with negative steps and
unsigned predicates: such AddRec will likely not have a `nuw` flag, and its
`zext` to wider type will not be an AddRec. In contraty, `trunc` of an AddRec
in some cases can easily be proved to be an `AddRec` too.
This patch introduces an alternative way to handling implications of different
type widths. If we can prove that wider type values actually fit in the narrow type,
we truncate them and prove the implication in narrow type.
The return was due to revert of underlying patch that this one depends on.
Unit test temporarily disabled because the required logic in SCEV is switched
off due to compile time reasons.
Differential Revision: https://reviews.llvm.org/D89548
We can sharpen the range of a AddRec if we know that it does not
self-wrap and know the symbolic iteration count in the loop. If we can
evaluate the value of AddRec on the last iteration and prove that at least
one its intermediate value lies between start and end, then no-wrap flag
allows us to conclude that all of them also lie between start and end. So
the estimate of range can be improved to union of ranges of start and end.
Switched off by default, can be turned on by flag.
Differential Revision: https://reviews.llvm.org/D89381
Reviewed By: lebedev.ri, nikic
This reverts commit c6ca26c0bfedb8f80d6f8cb9adde25b1d6aac1c5.
This breaks stage2 builds due to hitting this assert:
```
Assertion failed: (WeightSum <= UINT32_MAX && "Expected weights to scale down to 32 bits"), function calcMetadataWeights
```
when compiling AArch64RegisterBankInfo.cpp in LLVM.
Even if the exact exit count is unknown, we can still prove that this
exit will not be taken. If we can prove that the predicate is monotonic,
fulfilled on first & last iteration, and no overflow happened in between,
then the check can be removed.
Differential Revision: https://reviews.llvm.org/D87832
Reviewed By: apilipenko
Same change as 0dda6333175c1749f12be660456ecedade3bcf21, but for
mul expressions. We want to first fold any constant operans and
then strengthen the nowrap flags, as we can compute more precise
flags at that point.
Establish parity with the handling of add expressions, by always
constant folding mul expression operands before checking the depth
limit (this is a non-recursive simplification). The code was already
unconditionally constant folding the case where all operands were
constants, but was not folding multiple constant operands together
if there were also non-constant operands.
This requires picking out a different demonstration for depth-based
folding differences in the limit-depth.ll test.
Separate out the code handling constant folding into a separate
block, that is independent of other folds that need a constant
first operand. Also make some minor adjustments to make the
constant folding look nearly identical to the same code in
getAddExpr().
The only reason this change is not strictly NFC is that the
C1*(C2+V) fold is moved below the constant folding, which means
that it now also applies to C1*C2*(C3+V), as it should.
We should first try to constant fold the add expression and only
strengthen nowrap flags afterwards. This allows us to determine
stronger flags if e.g. only two operands are left after constant
folding (and thus "guaranteed no wrap region" code applies) or the
resulting operands are non-negative and thus nsw->nuw strengthening
applies.
We want to have a caching version of symbolic BE exit count
rather than recompute it every time we need it.
Differential Revision: https://reviews.llvm.org/D89954
Reviewed By: nikic, efriedma
When we need to prove implication of expressions of different type width,
the default strategy is to widen everything to wider type and prove in this
type. This does not interact well with AddRecs with negative steps and
unsigned predicates: such AddRec will likely not have a `nuw` flag, and its
`zext` to wider type will not be an AddRec. In contraty, `trunc` of an AddRec
in some cases can easily be proved to be an `AddRec` too.
This patch introduces an alternative way to handling implications of different
type widths. If we can prove that wider type values actually fit in the narrow type,
we truncate them and prove the implication in narrow type.
Differential Revision: https://reviews.llvm.org/D89548
Reviewed By: fhahn