Loop guards that apply to loop SCEV bounds allow IRCE for cases with
compound loop bounds such as:
if (K > 0 && M > 0)
for (i = 0; i < min(K, M); i++) {...}
if (K > 0 && M > 0)
for (i = min(K, M); i >= 0; i--) {...}
Otherwise SCEV couldn't prove that loops have safe bounds in these
cases.
Co-authored-by: Aleksander Popov <apopov@azul.com>
LSR uses SCEVExpander to generate induction formulas. The expander
internally tries to reuse existing IR expressions. To do that, it needs
to strip any poison generating flags (nsw, nuw, exact, nneg, etc..)
which may not be valid for the newly added users.
This is conservatively correct, but has the effect that LSR will strip
nneg flags on zext instructions involved in trip counts in loop
preheaders. To avoid this, this patch adjusts the expanded to reinfer
the flags on the CSE candidate if legal for all possible users.
This should fix the regression reported in
https://github.com/llvm/llvm-project/issues/71200.
This should arguably be done inside canReuseInstruction instead, but
doing it outside is more conservative compile time wise. Both
canReuseInstruction and isGuaranteedNotToBePoison walk operand lists, so
right now we are performing work which is roughly O(N^2) in the size of
the operand graph. We should fix that before making the per operand step
more expensive. My tenative plan is to land this, and then rework the
code to sink the logic into more core interfaces.
These tests rely on SCEV looking recognizing an "or" with no common
bits as an "add". Add the disjoint flag to relevant or instructions
in preparation for switching SCEV to use the flag instead of the
ValueTracking query. The IR with disjoint flag matches what
InstCombine would produce.
As far as I can tell, there's nothing in this code which actually
assumes the two predicates in (FoundLHS FoundPred FoundRHS) => (LHS Pred
RHS) are the same.
Noticed while investigating something else, this is purely an
oppurtunistic optimization while I'm looking at the code. Unfortunately,
this doesn't solve my original problem. :)
Add tests for compound loop bounds where IRCE is possible
if (K > 0 && M > 0)
for (i = 0; i < min(K, M); i++) {...}
if (K > 0 && M > 0)
for (i = min(K, M); i >= 0; i--) {...}
Co-authored-by: Aleksander Popov <apopov@azul.com>
As usual, making it easier for an upcoming test delta to be seen.
Note that several of these are examples of extremely bad testing practice.
Checking internal debug output (for no real purpose), and checking the
result of a fully O2 + llc run instead of reducing the specific problematic
pass.
zext nneg was recently added to the IR in #67982. Teaching SCEVExpander
to emit nneg when possible is valuable since SCEV may have proved
non-trivial facts about loop bounds which would otherwise be lost when
materializing the value.
We have guarantees that induction variable will not overflow in the main
loop after the loop constrained. Therefore we can add no wrap flags on
its base in order not to miss info that loop is countable.
Add NSW flag now, since adding NUW flag requires a bit more complicated
analysis.
Reviewed By: skatkov
Differential Revision: https://reviews.llvm.org/D154954
Here is activated check elimination which was parsed previously in
https://reviews.llvm.org/D154069
* Added runtime check that computed range's boundary doesn't overflow in
terms of range type.
* From the statement INT_MIN <= END <= INT_MAX is inferred check:
isNonNegative(INT_MAX - END) * isNonNegative(END - INT_MIN).
* If overflow happens, check will return 0 and the safe interval will be
empty.
Reviewed By: skatkov
Differential Revision: https://reviews.llvm.org/D154188
Introduced the following range checks forms parsing:
* IV - Offset vs Limit
* Offset - IV vs Limit
Range's end boundary is computed as (Offset +/- Limit ).
If it's not possible to prove at compile time that computed upper bound
will not overflow, then scale boundary computation to a wider type to
perform overflow check at runtime.
Runtime overflow will be implemented in the next patch. In the meantime
safe range for such kind of checks isn't computed.
Reviewed By: skatkov
Differential Revision: https://reviews.llvm.org/D154069
Added tests on range checks with non-strick predicate:
* N - IV > limit
* IV - N < limit
* IV + N < limit
Also added tests with known to be non-negative N
Differential Revision: https://reviews.llvm.org/D154593
IRCE expects true edge of range check's branch comes to loop.
If it meets reverse case - invert the branch.
Reviewed By: skatkov
Differential Revision: https://reviews.llvm.org/D148244
Added tests on 3 types of range checks which are not supported by IRCE now:
* N - IV >= limit
* IV - N <= limit
* IV + N <= limit
Reviewed By: skatkov
Differential Revision: https://reviews.llvm.org/D154062
It seems that existing logic is too strict about latch block exit count.
It is required to be computable, however it is not used in any computations,
and effectively the only thing it is used for is to get the type of computed
exit count.
Sometimes the exit count for latch block is not known, but the loop is still
finite because of other exits, and safe bounds are still computable. In this case,
we miss an opportunity to apply IRCE.
We could instead use a more relaxed version - max symbolic exit count, which,
if exists, is enough to say that the loop is finite, and its type should be good enough.
There is a subtlety with type: we do not support latch count type wider than range
check type. Because of that, we want to have the narrowest type available. So if it
can be computed from latch block immediately, take it. Otherwise, take whatever whole
loop provides and hope that it's type isn't too wide.
Differential Revision: https://reviews.llvm.org/D147910
Reviewed By: danilaml
When IRCE runs on outer loop and sees a check of an AddRec of
inner loop, it crashes with an assert in SCEV that the AddRec
must be loop invariant.
This adds a bail out if the AddRec which is checked in icmp
is for another loop.
Fixes https://github.com/llvm/llvm-project/issues/58912.
Differential Revision: https://reviews.llvm.org/D137822
This adds a test for https://github.com/llvm/llvm-project/issues/58912.
IRCE crashes when it tries to check whether it is possible to safely
calculate the bounds of a loop with IV AddRec which is in another loop.
Since SCEV learned to look through single value phis with
20d798bd47ec5191de1b2a8a031da06a04e612e1, whenever we add
a new input to a Phi, we should make sure that the old cached
value is dropped. Otherwise, it may lead to various miscompiles,
such as breach of dominance as shown in the bug
https://github.com/llvm/llvm-project/issues/57335
When trying to prove an implied condition on a phi by proving it
for all incoming values, we need to be careful about values coming
from a backedge, as these may refer to a previous loop iteration.
A variant of this issue was fixed in D101829, but the dominance
condition used there isn't quite right: It checks that the value
dominates the incoming block, which doesn't exclude backedges
(values defined in a loop will usually dominate the loop latch,
which is the incoming block of the backedge).
Instead, we should be checking for domination of the phi block.
Any values defined inside the loop will not dominate the loop
header phi.
Fixes https://github.com/llvm/llvm-project/issues/56242.
Differential Revision: https://reviews.llvm.org/D128640
The basic problem we have is that we're trying to reuse an instruction which is mapped to some SCEV. Since we can have multiple such instructions (potentially with different flags), this is analogous to our need to drop flags when performing CSE. A trivial implementation would simply drop flags on any instruction we decided to reuse, and that would be correct.
This patch is almost that trivial patch except that we preserve flags on the reused instruction when existing users would imply UB on overflow already. Adding new users can, at most, refine this program to one which doesn't execute UB which is valid.
In practice, this fixes two conceptual problems with the previous code: 1) a binop could have been canonicalized into a form with different opcode or operands, or 2) the inbounds GEP case which was simply unhandled.
On the test changes, most are pretty straight forward. We loose some flags (in some cases, they'd have been dropped on the next CSE pass anyways). The one that took me the longest to understand was the ashr-expansion test. What's happening there is that we're considering reuse of the mul, previously we disallowed it entirely, now we allow it with no flags. The surrounding diffs are all effects of generating the same mul with a different operand order, and then doing simple DCE.
The loss of the inbounds is unfortunate, but even there, we can recover most of those once we actually treat branch-on-poison as immediate UB.
Differential Revision: https://reviews.llvm.org/D112734
I think currently isImpliedViaMerge can incorrectly return true for phis
in a loop/cycle, if the found condition involves the previous value of
Consider the case in exit_cond_depends_on_inner_loop.
At some point, we call (modulo simplifications)
isImpliedViaMerge(<=, %x.lcssa, -1, %call, -1).
The existing code tries to prove IncV <= -1 for all incoming values
InvV using the found condition (%call <= -1). At the moment this succeeds,
but only because it does not compare the same runtime value. The found
condition checks the value of the last iteration, but the incoming value
is from the *previous* iteration.
Hence we incorrectly determine that the *previous* value was <= -1,
which may not be true.
I think we need to be more careful when looking at the incoming values
here. In particular, we need to rule out that a found condition refers to
any value that may refer to one of the previous iterations. I'm not sure
there's a reliable way to do so (that also works of irreducible control
flow).
So for now this patch adds an additional requirement that the incoming
value must properly dominate the phi block. This should ensure the
values do not change in a cycle. I am not entirely sure if will catch
all cases and I appreciate a through second look in that regard.
Alternatively we could also unconditionally bail out in this case,
instead of checking the incoming values
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D101829
Prevent cases in which the start value of IV is bigger than bound for
increasing.
Prevent cases in which the start value of IV is smaller than bound for
decreasing.
Differential Revision: https://reviews.llvm.org/D101174
This is a small patch to make FoldBranchToCommonDest poison-safe by default.
After fc3f0c9c, only two syntactic changes are needed to fix unit tests.
This does not cause any assembly difference in testsuite as well (-O3, X86-64 Manjaro).
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D99452
These intrinsics, not the icmp+select are the canonical form nowadays,
so we might as well directly emit them.
This should not cause any regressions, but if it does,
then then they would needed to be fixed regardless.
Note that this doesn't deal with `SCEVExpander::isHighCostExpansion()`,
but that is a pessimization, not a correctness issue.
Additionally, the non-intrinsic form has issues with undef,
see https://reviews.llvm.org/D88287#2587863
In the last change to IRCE the BPI is ignored if BFI is present, however
BFI and BPI have a different thresholds. Specifically BPI approach checks only
latch exit probability so it is expected if the loop has only one exit block (latch)
the behavior with BFI and BPI should be the same,
BPI approach by default uses threshold 10, so it considers the loop with estimated
number of iterations less then 10 should not be considered for IRCE optimization.
BFI approach uses the default value 3 and this is inconsistent.
The CL modifies the code to use the same threshold for both approaches..
The test is updated due to it has two side-exits (except latch) and each of them has a
probability 1/16, so BFI estimates the number of runtime iteration is about to 7
(1/16 + 1/16 + some for latch) and test fails.
Reviewers: mkazantsev, ebrevnov
Reviewed By: mkazantsev
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D91230
IRCE has some overhead for runtime checks and in case number of iteration is small
the overhead can kill the benefit from optimizations.
This CL bases on BlockFrequencyInfo of pre-header and header to estimate the
number of loop iterations. If it is less than irce-min-estimated-iters we do not transform the loop.
Probably it is better to make more complex cost model but for simplicity it seems the be enough.
The usage of BFI is added only for new pass manager and tries to use it efficiently.
Reviewers: ebrevnov, dantrushin, asbirlea, mkazantsev
Reviewed By: mkazantsev
Subscribers: llvm-commits, fhahn
Differential Revision: https://reviews.llvm.org/D89541
Summary:
Currently when --passes is used, any passes specified via -foo are
ignored. Explicitly bail out when that happens.
This requires changing some tests. Most were straightforward, but
codegenprepare-produced-address-math.ll is tricky. One of its RUNs runs
CodeGenPrepare. I tried porting CodeGenPrepare to the NPM, but ended up
getting stuck when I needed a TargetMachine. NPM doesn't have support
for MachineFunctions yet. So I just deleted that RUN line, since it was
mass-added in https://reviews.llvm.org/D54848 and is likely not that
useful.
Reviewers: echristo, hans
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D82271
For IR generated by a compiler, this is really simple: you just take the
datalayout from the beginning of the file, and apply it to all the IR
later in the file. For optimization testcases that don't care about the
datalayout, this is also really simple: we just use the default
datalayout.
The complexity here comes from the fact that some LLVM tools allow
overriding the datalayout: some tools have an explicit flag for this,
some tools will infer a datalayout based on the code generation target.
Supporting this properly required plumbing through a bunch of new
machinery: we want to allow overriding the datalayout after the
datalayout is parsed from the file, but before we use any information
from it. Therefore, IR/bitcode parsing now has a callback to allow tools
to compute the datalayout at the appropriate time.
Not sure if I covered all the LLVM tools that want to use the callback.
(clang? lli? Misc IR manipulation tools like llvm-link?). But this is at
least enough for all the LLVM regression tests, and IR without a
datalayout is not something frontends should generate.
This change had some sort of weird effects for certain CodeGen
regression tests: if the datalayout is overridden with a datalayout with
a different program or stack address space, we now parse IR based on the
overridden datalayout, instead of the one written in the file (or the
default one, if none is specified). This broke a few AVR tests, and one
AMDGPU test.
Outside the CodeGen tests I mentioned, the test changes are all just
fixing CHECK lines and moving around datalayout lines in weird places.
Differential Revision: https://reviews.llvm.org/D78403
IRCE pass checks that it can calculate loop bounds by checking
SCEV availability at loop entry. However it is possible that loop
bound SCEV is loop invariant, but instruction used to compute it
resides within loop. In such case adjusting loop bound in preheader
using IRBuilder leads to malformed SSA.
Use SCEVExpander instead to generate proper instructions.
Reviewed-by: mkazantsev
Differential Revision: https://reviews.llvm.org/D73496
We were failing to compute trip counts (both exact and maximum) for any loop which involved a comparison against either an umin or smin. It looks like this simply got missed when we added smin/umin to SCEV. (Note: umin was submitted separately earlier today. Turned out two folks hit this at the same time.)
Differential Revision: https://reviews.llvm.org/D67514
llvm-svn: 371776