This addresses a performance regression reported against 3c6e4191. That change (correctly) limited a transform based on assumed finiteness to mustprogress loops, but the previous change (38540d7) which introduced the mustprogress check utility only handled function attributes, not the loop metadata form.
It turns out that clang uses the function attribute form for C++, and the loop metadata form for C. As a result, 3c6e4191 ended up being a large regression in practice for C code as loops weren't being considered mustprogress despite the language semantics.
Currently, NoWrapFlags are dropped if we inline operands of SCEVAddExpr
operands. As a consequence, we always drop flags when building
expressions like `getAddExpr(A, getAddExpr(B, C, NUW), NUW)`.
We should be able to retain NUW flags common among all inlined
SCEVAddExpr and the original flags.
Reviewed By: nikic, mkazantsev
Differential Revision: https://reviews.llvm.org/D103877
Noticed via code inspection. We changed the semantics of the IR when we added mustprogress, and we appear to have not updated this location.
Differential Revision: https://reviews.llvm.org/D103834
The motivation here is simple loops with unsigned induction variables w/non-one steps. A toy example would be:
for (unsigned i = 0; i < N; i += 2) { body; }
Given C/C++ semantics, we do not get the nuw flag on the induction variable. Given that lack, we currently can't compute a bound for this loop. We can do better for many cases, depending on the contents of "body".
The basic intuition behind this patch is as follows:
* A step which evenly divides the iteration space must wrap through the same numbers repeatedly. And thus, we can ignore potential cornercases where we exit after the n-th wrap through uint32_max.
* Per C++ rules, infinite loops without side effects are UB. We already have code in SCEV which relies on this. In LLVM, this is tied to the mustprogress attribute.
Together, these let us conclude that the trip count of this loop must come before unsigned overflow unless the body would form a well defined infinite loop.
A couple notes for those reading along:
* I reused the loop properties code which is overly conservative for this case. I may follow up in another patch to generalize it for the actual UB rules.
* We could cache the n(s/u)w facts. I left that out because doing a pre-patch which cached existing inference showed a lot of diffs I had trouble fully explaining. I plan to get back to this, but I don't want it on the critical path.
Differential Revision: https://reviews.llvm.org/D103118
We might want to use it when creating SCEV proper in createSCEV(),
now that we don't `forgetValue()` in `SimplifyIndvar::strengthenOverflowingOperation()`,
which might have caused us to loose some optimization potential.
When we're remapping an AddRec, the AddRec constructed by a partial
rewrite might not make sense. This triggers an assertion complaining
it's not loop-invariant.
Instead of constructing the partially rewritten AddRec, just skip
straight to calling evaluateAtIteration.
Testcase was automatically reduced using llvm-reduce, so it's a little
messy, but hopefully makes sense.
Differential Revision: https://reviews.llvm.org/D102959
ExprValueMap is a map from SCEV * to a set-vector of (Value *, ConstantInt *) pair,
and while the map itself will likely be big-ish (have many keys),
it is a reasonable assumption that each key will refer to a small-ish
number of pairs.
In particular looking at n=512 case from
https://bugs.llvm.org/show_bug.cgi?id=50384,
the small-size of 4 appears to be the sweet spot,
it results in the least allocations while minimizing memory footprint.
```
$ for i in $(ls heaptrack.opt.*.gz); do echo $i; heaptrack_print $i | tail -n 6; echo ""; done
heaptrack.opt.0-orig.gz
total runtime: 14.32s.
calls to allocation functions: 8222442 (574192/s)
temporary memory allocations: 2419000 (168924/s)
peak heap memory consumption: 190.98MB
peak RSS (including heaptrack overhead): 239.65MB
total memory leaked: 67.58KB
heaptrack.opt.1-n1.gz
total runtime: 13.72s.
calls to allocation functions: 7184188 (523705/s)
temporary memory allocations: 2419017 (176338/s)
peak heap memory consumption: 191.38MB
peak RSS (including heaptrack overhead): 239.64MB
total memory leaked: 67.58KB
heaptrack.opt.2-n2.gz
total runtime: 12.24s.
calls to allocation functions: 6146827 (502355/s)
temporary memory allocations: 2418997 (197695/s)
peak heap memory consumption: 163.31MB
peak RSS (including heaptrack overhead): 211.01MB
total memory leaked: 67.58KB
heaptrack.opt.3-n4.gz
total runtime: 12.28s.
calls to allocation functions: 6068532 (494260/s)
temporary memory allocations: 2418985 (197017/s)
peak heap memory consumption: 155.43MB
peak RSS (including heaptrack overhead): 201.77MB
total memory leaked: 67.58KB
heaptrack.opt.4-n8.gz
total runtime: 12.06s.
calls to allocation functions: 6068042 (503321/s)
temporary memory allocations: 2418992 (200646/s)
peak heap memory consumption: 166.03MB
peak RSS (including heaptrack overhead): 213.55MB
total memory leaked: 67.58KB
heaptrack.opt.5-n16.gz
total runtime: 12.14s.
calls to allocation functions: 6067993 (499958/s)
temporary memory allocations: 2418999 (199307/s)
peak heap memory consumption: 187.24MB
peak RSS (including heaptrack overhead): 233.69MB
total memory leaked: 67.58KB
```
While that test may be an edge worst-case scenario,
https://llvm-compile-time-tracker.com/compare.php?from=dee85d47d9f15fc268f7b18f279dac2774836615&to=98a57e31b1947d5bcdf4a5605ac2ab32b4bd5f63&stat=instructions
agrees that this also results in improvements in the usual situations.
This patch implements getSmallConstantTripMultiple(L) correctly for multiple exit loops. The previous implementation was both imprecise, and violated the specified behavior of the method. This was fine in practice, because it turns out the function was both dead in real code, and not tested for the multiple exit case.
Differential Revision: https://reviews.llvm.org/D103189
This came up in review for another patch, see https://reviews.llvm.org/D102982#2782407 for full context.
I've reviewed the callers to make sure they can handle multiple exit loops w/non-zero returns. There's two cases in target cost models where results might change (Hexagon and PowerPC), but the results looked legal and reasonable. If a target maintainer wishes to back out the effect of the costing change, they should explicitly check for multiple exit loops and handle them as desired.
Differential Revision: https://reviews.llvm.org/D103182
When memoized values for a SCEV expressions are dropped, we also
drop all BECounts that make use of the SCEV expression. This is done
by iterating over all the ExitNotTaken counts and (recursively)
checking whether they use the SCEV expression. If there are many
exits, this will take a lot of time.
This patch improves the situation by pre-computing a set of all
used operands, so that we can determine whether a certain BEInfo
needs to be invalidated using a simple set lookup. Will still need
to loop over all BEInfos though.
This makes for a mild improvement on non-degenerate cases:
https://llvm-compile-time-tracker.com/compare.php?from=b661a55a253f4a1cf5a0fbcb86e5ba7b9fb1387b&to=be1393f450e594c53f0ad7e62339a6bc831b16f6&stat=instructions
For the degenerate case from https://bugs.llvm.org/show_bug.cgi?id=50384,
for n=128 I'm seeing run time drop from 1.6s to 1.1s.
Differential Revision: https://reviews.llvm.org/D102796
We already apply loop-guards when computing the maximum with unitary
steps. This extends the code to also do so when dealing with non-unitary
steps.
This allows us to infer a tighter maximum in some cases.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D102267
applyLoopGuards() already combines conditions from multiple nested
guards. However, it cannot use multiple conditions on the same guard,
combined using and/or. Add support for this by recursing into either
`and` or `or`, depending on the direction of the branch.
Differential Revision: https://reviews.llvm.org/D101692
I think currently isImpliedViaMerge can incorrectly return true for phis
in a loop/cycle, if the found condition involves the previous value of
Consider the case in exit_cond_depends_on_inner_loop.
At some point, we call (modulo simplifications)
isImpliedViaMerge(<=, %x.lcssa, -1, %call, -1).
The existing code tries to prove IncV <= -1 for all incoming values
InvV using the found condition (%call <= -1). At the moment this succeeds,
but only because it does not compare the same runtime value. The found
condition checks the value of the last iteration, but the incoming value
is from the *previous* iteration.
Hence we incorrectly determine that the *previous* value was <= -1,
which may not be true.
I think we need to be more careful when looking at the incoming values
here. In particular, we need to rule out that a found condition refers to
any value that may refer to one of the previous iterations. I'm not sure
there's a reliable way to do so (that also works of irreducible control
flow).
So for now this patch adds an additional requirement that the incoming
value must properly dominate the phi block. This should ensure the
values do not change in a cycle. I am not entirely sure if will catch
all cases and I appreciate a through second look in that regard.
Alternatively we could also unconditionally bail out in this case,
instead of checking the incoming values
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D101829
This seems to be a leftover from when the BackedgeTakenInfo
stored multiple exit counts with manual memory management. At
some point this was switchted to a simple vector, and there should
be no need to micro-manage the clearing anymore. We can simply
drop the loop from the map and the the destructor do its job.
Straight forward extension to the recently added infrastructure which was pioneered with shl. This was originally posted as part of D99687, but split off for ease of review.
(I also decided to exclude the unknown start sign case explicitly for simplicity of understanding.)
Differential Revision: https://reviews.llvm.org/D101181
These can be handled the same way as ule/ult, just using umax
instead of umin. This is useful in cases where the umax prevents
the upper bound from overflowing.
Differential Revision: https://reviews.llvm.org/D101196
ICMP_NE predicates directly overwrote the rewritten result,
instead of chaining it with previous rewrites, as was done for
ICMP_ULT and ICMP_ULE. This means that some guards were effectively
discarded, depending on their order.
Adding the switches to reduce diffs. I'm about to split that into an lshr part and an ashr part, doing the NFC part first makes it easier to maintain both diffs.
As being discussed in https://reviews.llvm.org/D100721,
this modelling is lossy, we can't reconstruct `ash`/`ashr exact`
from it, which means that whenever we actually expand the IR,
we've just pessimized the code..
It would be good to model this pattern, after all it comes up every time
you want to compute a distance between two pointers, but not at this cost.
This reverts commit ec54867df5e7f20e12146e628af34f0384308bcb.
I've run into some cases where a large fraction of compile-time is
spent invalidating SCEV. One of the causes is forgetLoop(), which
walks all values that are def-use reachable from the loop header
phis. When invalidating a topmost loop, that might be close to all
values in a function. Additionally, it's fairly common for there to
not actually be anything to invalidate, but we'll still be performing
this walk again and again.
My first thought was that we don't need to continue walking the uses
if the current value doesn't have a SCEV expression. However, this
isn't quite right, because SCEV construction can skip over values
(e.g. for a chain of adds, we might only create a SCEV expression
for the final value).
What this patch does instead is to only walk the (full) def-use chain
of loop phis that have a SCEV expression. If there's no expression
for a phi, then we also don't have any dependent expressions to
invalidate.
Differential Revision: https://reviews.llvm.org/D100264
"Does the predicate hold between two ranges?"
Not very surprisingly, some places were already doing this check,
without explicitly naming the algorithm, cleanup them all.
"Does the predicate hold between two ranges?"
Not very surprisingly, some places were already doing this check,
without explicitly naming the algorithm, cleanup them all.
A value from reachable block may come to a Phi node as its input from
unreachable block. This may confuse matchSimpleRecurrence which
has no access to DomTree and can falsely recognize something as a recurrency
because of this effect, as the attached test shows.
Patch `ae7b1e` deals with half of this problem, but it only accounts from
the case when an unreachable instruction comes to Phi as an input.
This patch provides a generalization by checking that no Phi block's
predecessor is unreachable (no matter what the input is).
Differential Revision: https://reviews.llvm.org/D99929
Reviewed By: reames
This fixes an issue introduced with my change d4648e, and reported in pr49768.
The root problem is that dominance collapses in unreachable code, and that LoopInfo explicitly only models reachable code. Since the recurrence matcher doesn't filter by reachability (and can't easily because not all consumers have domtree), we need to bailout before assuming that finding a recurrence implies we found a loop.
SCEV currently tries to prove implications of x pred y by also
trying to imply ~y pred ~x. This is expensive in terms of
compile-time (in fact, the majority of isImpliedCond compile-time
is spent here) and generally not fruitful. The issue is that this
also swaps the operands and thus breaks canonical ordering. If
originally we were trying to prove an implication like
X > C1 -> Y > C2, then we'll now try to prove X > C1 -> C3 > ~Y,
which will not work.
The only real case where we can get some use out of this transform
is if the original conditions were in the form X > C1 -> Y < C2, were
then swapped to X > C1 -> C2 > Y and are then swapped again here to
X > C1 -> ~Y > C3.
As such, handle this at a higher level, where we are doing the
swapping in the first place. There's four different ways that we
can line up a predicate and a swapped predicate, so we use some
heuristics to pick some profitable way.
Because we now try this transform at a higher level
(isImpliedCondOperands rather than isImpliedCondOperandsHelper),
we can also prove additional facts. Of the added tests, one was
proven previously while the other wasn't.
Differential Revision: https://reviews.llvm.org/D90926
This patch exploits the knowledge that we may be running many fewer than bitwidth iterations of the loop, and may be able to disallow the overflow case. This patch specifically implements only the shl case, but this can be generalized to ashr and lshr without difficulty.
Differential Revision: https://reviews.llvm.org/D98222
By definition of Implication operator, `false -> true` and `false -> false`. It means that
`false` implies any predicate, no matter true or false. We don't need to go any further
trying to prove the statement we need and just always say that `false` implies it in this case.
In practice it means that we are trying to prove something guarded by `false` condition,
which means that this code is unreachable, and we can safely prove any fact or perform any
transform in this code.
Differential Revision: https://reviews.llvm.org/D98706
Reviewed By: lebedev.ri
Provides API that allows to check predicate for being true or
false with one call. Current implementation is naive and just
calls isKnownPredicate twice, but further we can rework this
logic trying to use one check to prove both facts.
One of (and primary) callers of isBasicBlockEntryGuardedByCond is
isKnownPredicateAt, which makes isKnownPredicate check before it.
It already makes non-recursive check inside. So, on this execution
path this check is made twice. The only other caller is
isLoopEntryGuardedByCond. Moving the check there should save some
compile time.
This reverts commit 329aeb5db43f5e69df038fb20d2def77fe6f8595,
and relands commit 61f006ac655431bd44b9e089f74c73bec0c1a48c.
This is a continuation of D89456.
As it was suggested there, now that SCEV models `PtrToInt`,
we can try to improve SCEV's pointer handling.
In particular, i believe, i will need this in the future
to further fix `SCEVAddExpr`operation type handling.
This removes special handling of `ConstantPointerNull`
from `ScalarEvolution::createSCEV()`, and add constant folding
into `ScalarEvolution::getPtrToIntExpr()`.
This way, `null` constants stay as such in SCEV's,
but gracefully become zero integers when asked.
Reviewed By: Meinersbur
Differential Revision: https://reviews.llvm.org/D98147
This is a continuation of D89456.
As it was suggested there, now that SCEV models `PtrToInt`,
we can try to improve SCEV's pointer handling.
In particular, i believe, i will need this in the future
to further fix `SCEVAddExpr`operation type handling.
This removes special handling of `ConstantPointerNull`
from `ScalarEvolution::createSCEV()`, and add constant folding
into `ScalarEvolution::getPtrToIntExpr()`.
This way, `null` constants stay as such in SCEV's,
but gracefully become zero integers when asked.
Reviewed By: Meinersbur
Differential Revision: https://reviews.llvm.org/D98147