BinomialCoefficient computes the value of W-bit IV at iteration It of a loop. When W is 1, we can call multiplicative inverse on 0 which triggers an assert since 1b76120.
Since the arithmetic is supposed to wrap if It or K does not fit in W bits, do the truncation into W bits after we do the shift.
Fixes#87798
In our analysis of guarding conditions, we were converting a-b == 0 into
a == b alternate form, but we were only checking for one of the two
forms for the sub. There's no requirement that the multiply only be on
the LHS of the add.
These tests highlight that we have missed oppurtunities proving
trip count bounds when our start/end values are sign extended
from smaller types and we have either a loop guard to relate our
start vs end, or a nsw/nuw fact to bound end.
This extends the work from 7755c26 to all of the different backend
taken count kinds that we print for the scev analysis printer. As
before, the goal is to cut down on confusion as i4 -1 is a very
different (unsigned) value from i32 -1.
When printing the result of SCEV's analysis, we can avoid printing
the predicated backedge taken count and the predicates if the predicates
are empty and no new information is provided. This helps to reduce the
verbosity of the output.
When printing the result of the analysis, i8 -1 and i64 -1 are quite
different in terms of analysis quality. In a recent conversion with
a new contributor, we ran into exactly this confusion.
Adding the type for constant scevs more globally seems worthwhile, but
introduces a much larger test diff. I'm splitting this off first since
it addresses the immediate need, and then going to do some further
changes to clarify a few related bits of analysis result output.
A few notes:
* pr34538.ll has bitrotten. The original test printed the analysis after transforms in some cases, but this appears to been lost during migration to new pass manager. Remove the now redundant pass invocations and simplify the test setup.
This patch extends `propagatesPoison` to handle more integer intrinsics.
It will turn more logical ands/ors into bitwise ands/ors.
See also https://reviews.llvm.org/D99671.
This patch adds a set of tests taken
from/llvm/test/Transforms/IndVarSimplify/iv-poison.ll with multiple
congruent IVs but different set of flags on the increments.
Extra tests for https://github.com/llvm/llvm-project/pull/80430.
Loop guards tend to provide better results when it comes to reasoning
about ranges than isLoopEntryGuardedByCond(). See the test change for
the motivating case.
I have retained both the loop guard check and the implied cond based
check for now, though the latter only seems to impact a single test and
only via side effects (nowrap flag calculation) at that.
Use the disjoint flag to convert or to add instead of calling the
haveNoCommonBitsSet() ValueTracking query. This ensures that we can
reliably undo add -> or canonicalization, even in cases where the
necessary information has been lost or is too complex to reinfer in
SCEV.
I have updated the bulk of the test coverage to add the necessary
disjoint flags in advance.
These tests rely on SCEV looking recognizing an "or" with no common
bits as an "add". Add the disjoint flag to relevant or instructions
in preparation for switching SCEV to use the flag instead of the
ValueTracking query. The IR with disjoint flag matches what
InstCombine would produce.
For `{{regex}}` we don't really need a capturing group, and only add it
to properly handle cases like `{{foo|bar}}`. This is problematic,
because the use of capturing groups makes our regex implementation
slower (we have to go through the "dissect" stage, which can have
quadratic complexity).
Unfortunately, our regex implementation does not support non-capturing
groups like `(?:regex)`. So instead, avoid adding the group entirely if
the regex doesn't contain any alternations.
This causes a slight difference in escaping behavior, where previously
it was possible to write `{{{{}}` and get the same behavior as
`{{\{\{}}`. This will no longer work. I don't think this is a problem,
especially as we recently taught update_analyze_test_checks.py to emit
`{{\{\{}}`, so this shouldn't get introduced in any new tests.
For CodeGen/X86/vector-interleaved-store-i16-stride-7.ll (our slowest
X86 test) this drops FileCheck time from 6s to 5s (the remainder is
spent in a different regex issue). I expect similar speedups in other
tests using a lot of `{{}}`.
In commit 5a9a02f67b771fb2edcf06 scalar evolution got support for
computing SCEV:s for (ashr(add(shl(x, n), c), m)) constructs. The code
however used APInt::getZExtValue without first checking that the APInt
would fit inside an uint64_t. When for example using 128-bit types we
ended up in assertion failures (or maybe miscompiles in non-assert
builds).
This patch simply avoid converting from APInt to uint64_t when creating
the truncated constant. We can just truncate the APInt instead.
As far as I can tell, there's nothing in this code which actually
assumes the two predicates in (FoundLHS FoundPred FoundRHS) => (LHS Pred
RHS) are the same.
Noticed while investigating something else, this is purely an
oppurtunistic optimization while I'm looking at the code. Unfortunately,
this doesn't solve my original problem. :)
zext nneg was recently added to the IR in #67982. This patch teaches
SimplifyIndVars to prefer zext nneg over *both* sext and plain zext,
when a local SCEV query indicates the source is non-negative.
The choice to prefer zext nneg over sext looks slightly aggressive
here, but probably isn't so much in practice. For cases where we'd
"remember" the range fact, instcombine would convert the sext into
a zext nneg anyways. The only cases where this produces a different
result overall are when SCEV knows a non-local fact, and it doesn't
get materialized into the IR. Those are exactly the cases where
using zext nneg are most useful. We do run the risk of e.g. a
missing combine - since we haven't updated most of them yet - but
that seems like a manageable risk.
Note that there are much deeper algorithmic changes we could make
to this code to exploit zext nneg, but this seemed like a reasonable
and low risk starting point.
update_analyze_test_checks.py will now insert check lines for
empty lines, which means that all the existing test coverage will
have a spurious change to check for the newline after "Predicates:".
I don't think we actually want to have that newline, so drop it
before it gets into more test coverage.
Remove support for zext and sext constant expressions. All places
creating them have been removed beforehand, so this just removes the
APIs and uses of these constant expressions in tests.
There is some additional cleanup that can be done on top of this, e.g.
we can remove the ZExtInst vs ZExtOperator footgun.
This is part of
https://discourse.llvm.org/t/rfc-remove-most-constant-expressions/63179.
I don't think there is a use case for having an index type that is wider
than the pointer type, and I'm not entirely clear what semantics this
would even have.
Also clarify the GEP semantics to explicitly say how they interact with
the index type width.
Fixes a corner case of the analysis: previously GlobalValues could be
affected by assumptions, but were not tracked within AffectedValues.
This patch allows assumptions which affect a given GlobalValue to be
looked up via `assumptionsFor()`.
A small update to llvm/test/Analysis/ScalarEvolution/ranges.ll was
necessary due to knowledge about a global value now being propagated
from AssumptionCache -> ValueTracking -> ScalarEvolution.
The following commit enabled the analysis of ranges for heap allocations:
22ca38da25e19a7c5fcfeb3f22159aba92ec381e
The range turns out to be empty in cases such as the one in test (which
is [1,1)), leading to an assertion failure. This patch fixes for the
same case.
Fixes https://github.com/llvm/llvm-project/issues/63856
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D159160
%x = shl i64 %w, n
%y = add i64 %x, c
%z = ashr i64 %y, m
The above given instruction triplet is seen many times in the generated
LLVM IR, but SCEV model is not able to compute the SCEV value of AShr
instruction in this case.
This patch models the two cases of the above instruction pattern using
the following expression:
=> sext(add(mul(trunc(w), 2^(n-m)), c >> m))
1) when n = m the expression reduces to sext(add(trunc(w), c >> n))
as n-m=0, and multiplying with 2^0 gives the same result.
2) when n > m the expression works as given above.
It also adds several unittest to verify that SCEV is able to compute
the value.
$ opt sext-add-inreg.ll -passes="print<scalar-evolution>"
Comparing the snippets of the result of SCEV analysis:
* SCEV of ashr before change
----------------------------
%idxprom = ashr exact i64 %sext, 32
--> %idxprom U: [-2147483648,2147483648) S: [-2147483648,2147483648)
Exits: 8 LoopDispositions: { %for.body: Variant }
* SCEV of ashr after change
---------------------------
%idxprom = ashr exact i64 %sext, 32
--> {0,+,1}<nuw><nsw><%for.body> U: [0,9) S: [0,9)
Exits: 8 LoopDispositions: { %for.body: Computable }
LoopDisposition of the given SCEV was LoopVariant before, after adding
the new way to model the instruction, the LoopDisposition becomes
LoopComputable as it is able to compute the SCEV of the instruction.
Differential Revision: https://reviews.llvm.org/D152278
Loop unrolling tends to produce chains of
`%x1 = add %x0, 1; %x2 = add %x1, 1; ...` with one add per unrolled
iteration. This patch simplifies these adds to `%xN = add %x0, N`
directly during unrolling, rather than waiting for InstCombine to do so.
The motivation for this is that having a single add (rather than
an add chain) on the induction variable makes it a simple recurrence,
which we specially recognize in a number of places. This allows
InstCombine to directly perform folds with that knowledge, instead
of first folding the add chains, and then doing other folds in another
InstCombine iteration.
Due to the reduced number of InstCombine iterations, this also
results in a small compile-time improvement.
Differential Revision: https://reviews.llvm.org/D153540
We know that certain pointers (e.g. non-extern-weak globals or
allocas in default address space) are not null, in which case the
lowest address they can be allocated at is their alignment.
This allows us to calculate better exit counts for loops that have
an additional null check in the guarding condition
(see alloca_icmp_null_exit_count).
Differential Revision: https://reviews.llvm.org/D153624
The object size and alignment based restriction on the possible
allocation range also applies to allocas, not just globals, so
handle them as well.
We shouldn't really need any type restriction here at all, but
for now stay conservative.
When multiplying several AddRecs, we do the following simplification:
{A1,+,A2,+,...,+,An}<L> * {B1,+,B2,+,...,+,Bn}<L> = {x=1 in [ sum y=x..2x
[ sum z=max(y-x, y-n)..min(x,n) [
choose(x, 2x)*choose(2x-y, x-z)*A_{y-z}*B_z]]
],+,...up to x=2n}
This is done iteratively, pair by pair. So if we try to multiply three AddRecs
A1, A2, A3, then we'd try to simplify A1 * A2 to A1' and then try to
simplify A1' * A3 if A1' is also an AddRec.
The transform is only legal if the loops of the two AddRecs are the same.
It is checked in the code, but the loop of one of the AddRecs is stored
in a local variable and doesn't get updated when we simplify a pair to a new AddRec.
In the motivating test the new AddRec A1' was created for a different loop and,
as the loop variable didn't get updated, the check for different loops passed and
the transform worked for two AddRecs from different loops.
So it created a wrong SCEV. And it caused LSR to replace an instruction with another
one that had the same SCEV as the incorrectly computed one.
Differential Revision: https://reviews.llvm.org/D153254
If we didn't find the extact ZExt expr in the rewrite map, check if
there's an entry for a smaller ZExt we can use instead.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D149786
This is now checked as part of the usual SCEV verification. There
is little value in checking this on each lookup.
These two maps are strictly synchronized nowadays, which was not
the case historically.
Make ValueTracking directly call the KnownBits shift helpers, which
provides more precise results.
Unfortunately, ValueTracking has a special case where sometimes we
determine non-zero shift amounts using isKnownNonZero(). I have my
doubts about the usefulness of that special-case (it is only tested
in a single unit test), but I've reproduced the special-case via an
extra parameter to the KnownBits methods.
Differential Revision: https://reviews.llvm.org/D151816
Before this patch, we can only use the MaxBECount for an AddRec's range
computation if the MaxBECount has <= bit width of the AddRec. This patch
reasons that if a MaxBECount has > bit width, and is <= the max value of
AddRec's bit width, we can still use the MaxBECount.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D151698
This exposed an issue in SCEVExpander/LCSSA, which has been fixed
in D150681.
-----
As far as I understand, the IsAvailableOnEntry() function basically
implements the same functionality as the properlyDominates() block
disposition. The primary difference (apart from a weaker
implementation) seems to be in this comment at the top:
// Checks if the SCEV S is available at BB. S is considered available at BB
// if S can be materialized at BB without introducing a fault.
However, I don't really understand why there would be such a
requirement. It's my understanding that SCEV explicitly does not
care about trapping udiv instructions itself, and it's the job of
SCEVExpander's isSafeToExpand() to make sure these don't get
expanded if they may trap.
Differential Revision: https://reviews.llvm.org/D149344
For always poison shifts, any KnownBits return value is valid.
Currently we return unknown, but returning zero is generally more
profitable. We had some code in ValueTracking that tried to do this,
but was actually dead code.
Differential Revision: https://reviews.llvm.org/D150648
This fixes assertion crash in https://github.com/llvm/llvm-project/issues/62380.
In the beginning of ScalarEvolution::getBackedgeTakenInfo
we make sure that BackedgeTakenCounts contains an entry
for the given loop.
Then we call computeBackedgeTakenCount which computes the result,
and in the end we insert it in the map like so:
return BackedgeTakenCounts.find(L)->second = std::move(Result);
So we expect that the entry for L still exists in the cache.
However, it can get deleted. When it has computed the result,
getBackedgeTakenInfo clears all the cached SCEVs that use the AddRecs in the loop.
In the crashing example, getBackedgeTakenInfo first gets called on an inner loop,
and during this call it gets called again on its parent loop.
This recursion happens after the call to computeBackedgeTakenCount.
And it happens so that some SCEV from the BTI of the child loop uses
an AddRec of the parent loop. So when we successfully compute BTI
for the parent loop, we erase already computed result for the child one.
The recursion happens in some debug only code that
updates statistics. The algorithm itself is non-recursive.
Namely the recursive call happens in BackedgeTakenInfo::getExact function
and its return value is only used to compare it against SCEVCouldNotCompute.
As suggested by nikic I replaced the NumTripCountsComputed and NumTripCountsNotComputed
with NumExitCountsComputed and NumExitCountsNotComputed respectively.
They are updated during computations made for single exits. It relieves us of the need
to compute exact exit count for the loop just to update the named
statistic and thus the recursion cannot happen anymore.
Differential Revision: https://reviews.llvm.org/D149251
We currently have getMinTrailingZeros(), from which we can get a SCEV's
multiple by computing 1 << MinTrailingZeroes. However, this only gets us
multiples that are a power of 2. This patch introduces a way to get max
constant multiples that are not just a power of 2. The logic is similar
to that of getMinTrailingZeros. getMinTrailingZerosImpl is replaced by
computing the max constant multiple, and counting the number of trailing
bits.
I have so far found this useful in two places:
1) Computing unsigned constant ranges. For example, if we have i8
{10,+,10}<nuw>, we know the max constant it can be is 250.
2) My original intent was to use this in getSmallConstantTripMultiples,
but it has no effect right now due to change from D110587. For
example, if we have backedge count `(6 * %N) - 1`, the trip count
becomes `1 + zext((6 * %N) - 1)`, and we cannot say that 6 is a
multiple of the SCEV. I plan to look further into this separately.
The implementation assumes the value is unsigned. It can probably be
extended to handle signed values as well.
If the code sees that a SCEV does not have <nuw>, it will fall back to
finding the max multiple that is a power of 2. Multiples that are a
power of 2 will still be a multiple even after the SCEV overflows. This
does not apply to other values. This is the 1st commit message:
---
This relands https://reviews.llvm.org/D141823. The verification fails
when expensive checks are turned on. This can occur when:
1. SCEV S's multiple is cached
2. SCEV S's no wrap flags are strengthened, and the multiple changes
3. SCEV verifier finds that S's cached and recomputed multiple are
different
We eliminate most cases by forgetting SCEVAddRecExpr's cached values
when the flags are modified, but there are still cases for other SCEV
types. We relax the check by making sure the cached multiple divides the
recomputed multiple, ensuring the cached multiple is correct,
conservative multiple.
Reviewed By: mkazantsev
Differential Revision: https://reviews.llvm.org/D149529