Modifying AddRecs when constructing other expressions can lead to
surprising changes. It also seems like it is not really beneficial i
most cases.
At the moment, there's a single regression, but we still might be able
to improve the flags at AddRec construction.
Might help with the issue discussed in D143409.
Reviewed By: mkazantsev
Differential Revision: https://reviews.llvm.org/D144051
There are a number of issues with the current code for converting
ule -> ult (etc) predicates for comparisons controlling finite loops:
* It sets nowrap flags, which may only hold for that particular
comparison, not globally. (PR60944)
* It doesn't check that the RHS is invariant. (I'm not sure this
can cause practical issues independently of the previous point.)
* It runs before simplifications that may be more profitable. (PR54191)
This patch moves the handling for this into computeExitLimitFromICmp(),
because it is somewhat tightly coupled with assumptions in that code,
and addresses the aforementioned issues.
Fixes https://github.com/llvm/llvm-project/issues/60944.
Fixes https://github.com/llvm/llvm-project/issues/54191.
Differential Revision: https://reviews.llvm.org/D145510
The motivation is that 'createInvariantCond' unconditionally
builds icmp in the loop block, while it could always do it
in preheader. Build it in preheader instead.
Patch by Aleksandr Popov!
Differential Revision: https://reviews.llvm.org/D141994
Reviewed By: nikic
This patch does two things, both related to support of multi-exit loops with
many exits that have known symbolic max exit count. They can theoretically
go independently, but I don't know how to write a test showing separate
impact.
Part 1: `SkipLastIter` can be set to `true` not when a particular exit has exit
count same as the whole loop (and therefore it must exit on the last iteration),
but when the aggregate of first few exits has umin same as whole loop exit count.
It means that it's not known which of them will exit exactly, but one of them will.
Part 2: when `SkipLastIter` is set, and exit count is `umin(a, b, c)`, instead of
`umin(a, b, c) - 1` use `umin(a - 1, b - 1, c - 1)`. We don't care about overflows
here, but the further logic knows how to deal with umin by element, but the
`SCEVAddExpr` node will confuse it.
Differential Revision: https://reviews.llvm.org/D141361
Reviewed By: nikic
When exit by condition `C1` dominates exit by condition `C2`, and
max symbolic exit count for `C1` matches those for loop, we will
apply more optimistic logic to `C2` by setting `SkipLastIter` for it,
meaning that it will do 1 iteration less because the dominating branch
must exit on the last loop iteration.
But when we have a single exit by condition `C1 & C2`, we cannot
apply the same logic, because there is no dominating condition.
However, if we can prove that the symbolic max exit count of `C1 & C2`
matches those of `C1`, it means that for `C2` we can assume that it
doesn't matter on the last iteration (because the whole thing is `false`
because `C1` must be `false`). Therefore, in this situation, we can handle
`C2` as if it had `SkipLastIter`.
Differential Revision: https://reviews.llvm.org/D139934
Reviewed By: nikic
A lit test used overaligned i8, apparently due to an old copy-paste
error, intending to specify i16 alignment.
Change the datalayout string to use naturally aligned i8.
This patch allows optimizeLoopExitWithUnknownExitCount to deal with
branches by conditions that are not immediately ICmp's, but aggregates
of ICmp's joined by arithmetic or logical AND/OR. Each ICmp is optimized
independently.
Differential Revision: https://reviews.llvm.org/D139832
Reviewed By: nikic
Recent improvements in symbolic exit count computation revealed some problems with
SCEV's ability to find invariant predicate during first iterations. Ultimately it is based on its
ability to prove some facts for value on the last iteration. This last value, when it includes
`umin` as part of exit count, isn't always simplified enough. The motivating example is following:
https://github.com/llvm/llvm-project/issues/59615
Could not prove:
```
Pred = 36, LHS = (-1 + (-1 * (2147483645 umin (-1 + %var)<nsw>))<nsw> + %var), RHS = %var
FoundPred = 36, FoundLHS = {1,+,1}<nuw><nsw><%bb3>, FoundRHS = %var
```
Can prove:
```
Pred = 36, LHS = (-1 + (-1 * (-1 + %var)<nsw>)<nsw> + %var), RHS = %var
FoundPred = 36, FoundLHS = {1,+,1}<nuw><nsw><%bb3>, FoundRHS = %var
```
Here ` (2147483645 umin (-1 + %var)<nsw>)` is exit count composed of two parts from
two different exits: `2147483645 ` and `(-1 + %var)<nsw>`. When it was only one (latter)
analyzeable exit, for it everything was easily provable. Unfortunately, in general case `umin`
in one of `add`'s operands doesn't guarantee that the whole sum reduces, especially in presence
of negative steps and lack of `nuw`. I don't think there is a generic legal way to somehow play
around this `umin`.
So the ad-hoc solution is following: if we failed to find an equivalent predicate that is invariant
during first `MaxIter` iterations, and `MaxIter = umin(a, b, c...)`, try to find solution for at least one
of `a`, `b`, `c`... Because they all are `uge` than `MaxIter`, whatever is true during `a (b, c)` iterations
is also true during `MaxIter` iterations.
Differential Revision: https://reviews.llvm.org/D140456
Reviewed By: nikic
Use FoldID to cache SignExtendExprs that get folded to a different
SCEV.
Depends on D137505.
Reviewed By: mkazantsev
Differential Revision: https://reviews.llvm.org/D137849
When creating SCEV expressions for ZExt, there's quite a bit of
reasoning done and in many places the reasoning in turn will try to
create new SCEVs for other ZExts.
This can have a huge compile-time impact. The attached test from #58402
takes an excessive amount of compile time; without the patch, the test
doesn't complete in 1500+ seconds, but with the patch it completes in 1
second.
To speed up this case, cache created ZExt expressions for given (SCEV, Ty) pairs.
Caching just ZExts is relatively straight-forward, but it might make
sense to extend it to other expressions in the future.
This has a slight positive impact on CTMark:
* O3: -0.03%
* ReleaseThinLTO: -0.03%
* ReleaseLTO-g: 0.00%
https://llvm-compile-time-tracker.com/compare.php?from=bf9de7464946c65f488fe86ea61bfdecb8c654c1&to=5ac0108553992fb3d58bc27b1518e8cf06658a32&stat=instructions:u
The patch also improves compile-time for some internal real-world workloads
where time spent in SCEV goes from ~300 seconds to ~3 seconds.
There are a few cases where computing & caching the result earlier may
return more pessimistic results, but the compile-time savings seem to
outweigh that.
Fixes#58402.
Reviewed By: mkazantsev
Differential Revision: https://reviews.llvm.org/D137505
Old logic: when loop symbolic max exit count matched *exact* block exit count,
assume that all subsequent blocks will do 1 iteration less.
New logic: when loop symbolic max exit count matched *symbolic max* block exit count,
assume that all subsequent blocks will do 1 iteration less.
The new logic is still legal and is more permissive in situations when exact
block exit count is not known.
Differential Revision: https://reviews.llvm.org/D139692
Reviewed By: nikic
LLVM *loves* to convert `add` of operands with no common bits
into an `or`. But SCEV really doesn't deal with `or` that well,
so try extra hard to recognize this `or` as an `add`.
I believe, previously this wasn't being done because of the recursive
of this, but now that the `createSCEV()` is not recursive,
this should be fine. Unless this is *too* costly compile-time wise...
https://alive2.llvm.org/ce/z/EfapCo
These limitations are too strict, and their only purpose is to avoid code
size explosion. These restrictions seem obsolete, and the size problem
is solved in other places through cheap expansion limits.
The motivation is that the old code cannot deal with comparisons against
induction variant's increment.
Differential Revision: https://reviews.llvm.org/D138412
Reviewed By: lebedev.ri, reames
At the moment, getRangeRef may overflow the stack for very deeply nested
expressions.
This patch introduces a new getRangeRefIter function, which first builds
a worklist of N-ary expressions and phi nodes, followed by their
operands iteratively.
getRangeRef has been extended to also take a Depth argument and it
switches to use getRangeRefIter once the depth reaches a certain
threshold.
This ensures compile-time is not impacted in general. Note that
the iterative algorithm may lead to a slightly different evaluation
order, which could result in slightly worse ranges for cyclic phis.
https://llvm-compile-time-tracker.com/compare.php?from=23c3eb7cdf3478c9db86f6cb5115821a8f0f5f40&to=e0e09fa338e77e53242bfc846e1484350ad79773&stat=instructionsFixes#49579.
Reviewed By: mkazantsev
Differential Revision: https://reviews.llvm.org/D130728
Additional SCEV verification highlighted a case where the cached loop
dispositions where incorrect after simplifying a phi node in IndVars.
Fix it by invalidating the phi before replacing it.
Fixes#58750
Additional SCEV verification highlighted a case where the cached loop
dispositions where incorrect after simplifying a condition in IndVars
and moving the user in LoopDeletion. Fix it by invalidating ICmp and all
its users.
Fixes#58515.
Moving an instruction can invalidate the cached block dispositions of
the corresponding SCEV. Invalidate the cached dispositions.
Also fixes a copy-paste error in forgetBlockAndLoopDispositions where
the start expression S was removed from BlockDispositions in the loop
but not the current values. This was also exposed by the new test case.
Fixes#58439.