At the moment, the effectiveness of guards that contain divisibility
information (A % B == 0) depends on the order of the conditions.
This patch makes using divisibility information independent of the
order, by collecting and applying the divisibility information
separately.
We first collect all conditions in a vector, then collect the
divisibility information from all guards.
When processing other guards, we apply divisibility info collected
earlier.
After all guards have been processed, we add the divisibility info by
updating the rewrite expression recorded so far. This ensures we apply
the divisibility info to the largest rewrite expression.
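
The collect-first, apply-later scheme described above can be sketched
outside of LLVM. This is a minimal standalone model, not the code in
ScalarEvolution.cpp: the Guard struct, maxOfX and the restriction to a
single unsigned value X are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical guard model: a condition on one unsigned value X.
struct Guard {
  enum Kind { DivisibleBy, LessThan };
  Kind K;
  uint64_t V; // divisor for DivisibleBy, exclusive bound for LessThan
};

static uint64_t gcd64(uint64_t A, uint64_t B) {
  while (B != 0) {
    uint64_t T = A % B;
    A = B;
    B = T;
  }
  return A;
}

// Returns the largest value X may take under all guards.
// Overflow of the combined divisor is ignored in this sketch.
uint64_t maxOfX(const std::vector<Guard> &Guards) {
  // Phase 1: collect divisibility information from all guards first.
  uint64_t Divisor = 1;
  for (const Guard &G : Guards)
    if (G.K == Guard::DivisibleBy && G.V != 0)
      Divisor = Divisor / gcd64(Divisor, G.V) * G.V; // lcm

  // Phase 2: process the other guards, applying the collected divisor.
  uint64_t Max = UINT64_MAX - UINT64_MAX % Divisor;
  for (const Guard &G : Guards) {
    if (G.K != Guard::LessThan || G.V == 0)
      continue; // "X u< 0" is unsatisfiable; skipped in this sketch
    uint64_t Bound = G.V - 1;
    Bound -= Bound % Divisor; // round down to a multiple of Divisor
    Max = std::min(Max, Bound);
  }
  return Max;
}
```

Because the divisor is gathered before any bound is processed, the two
guards "X u< 10" and "X % 4 == 0" give the same maximum in either order.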
This helps to improve results in a few cases, one in
https://github.com/dtcxzyw/llvm-opt-benchmark/pull/2921 and another one
in a different large C/C++ based IR corpus.
PR: https://github.com/llvm/llvm-project/pull/163021
Move getPreviousSCEVDivisibleByDivisor from a lambda to a static
function and clarify the name (DividesBy -> DivisibleBy).
Split off refactoring from https://github.com/llvm/llvm-project/pull/163021.
Add a new variant of m_scev_Mul that binds a SCEVMulExpr and use it in
SCEVURem_match and also update 2 more places in ScalarEvolution.cpp that
can use m_scev_Mul as well.
PR: https://github.com/llvm/llvm-project/pull/163364
Follow-up to https://github.com/llvm/llvm-project/pull/160941.
Even if we don't have a context instruction for the caller, we should be
able to provide context instructions for SCEVUnknowns. Unless I am
missing something, a SCEVUnknown only becomes available at the point
where its underlying IR value has been defined. If it wraps an argument,
it should be safe to use the first instruction in the entry block; if it
wraps an instruction, the instruction itself can be used.
This allows getConstantMultiple to make better use of alignment
assumptions.
PR: https://github.com/llvm/llvm-project/pull/163260
My use case is simplifying the control flow generated by LoopVectorize
when vectorising loops whose trip count is a function of the runtime
vector length. This can be problematic because:
* CSE is a pre-LoopVectorize transform and so it's common for an IR
function to include several calls to llvm.vscale(). (NOTE: Code
generation will typically remove the duplicates)
* Pre-LoopVectorize instcombines will rewrite some multiplies as shifts.
This leads to a mismatch between VL based maths of the scalar loop and
that created for the vector loop, which prevents some obvious
simplifications.
SCEV does not suffer these issues because it effectively does CSE during
construction and shifts are represented as multiplies.
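
As a rough illustration of why representing shifts as multiplies removes
the mismatch, here is a toy canonical form. ToyExpr, makeMul and makeShl
are hypothetical names and do not correspond to the SCEV API; the sketch
assumes shift amounts below 64.

```cpp
#include <cstdint>
#include <string>

// Toy canonical form: Coeff * Sym, e.g. 4 * vscale.
struct ToyExpr {
  uint64_t Coeff;
  std::string Sym;
};

bool operator==(const ToyExpr &L, const ToyExpr &R) {
  return L.Coeff == R.Coeff && L.Sym == R.Sym;
}

// A multiply by a constant is stored directly.
ToyExpr makeMul(uint64_t C, const std::string &Sym) { return {C, Sym}; }

// A left shift is stored as a multiply by a power of two
// (assumes Amt < 64).
ToyExpr makeShl(const std::string &Sym, unsigned Amt) {
  return {uint64_t(1) << Amt, Sym};
}
```

With this form, "vscale << 2" and "vscale * 4" compare equal even if
they came from different calls and different instruction spellings.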
When adding a new predicate to a union, we currently do a bidirectional
implication for all the contained predicates. This means that the number
of implication checks is quadratic in the number of total predicates (if
they don't end up being eliminated).
Fix this by not checking for implication if the number of predicates
grows too large. The expectation is that if there is a large number of
predicates, we should be discarding them later anyway, as expanding them
would be too expensive.
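
A minimal sketch of the capped implication check, assuming a toy
predicate of the form "X u< Bound". Pred, PredUnion and the CheckLimit
parameter are hypothetical; the real threshold and predicate classes are
not shown in this message.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical predicate "X u< Bound".
struct Pred {
  unsigned Bound;
};

// "X u< A" implies "X u< B" whenever A <= B.
bool implies(const Pred &A, const Pred &B) { return A.Bound <= B.Bound; }

class PredUnion {
  std::vector<Pred> Preds;
  std::size_t CheckLimit; // assumed cutoff for implication checks

public:
  explicit PredUnion(std::size_t Limit) : CheckLimit(Limit) {}

  void add(Pred P) {
    // Only pay for the bidirectional checks while the union is small.
    if (Preds.size() < CheckLimit) {
      for (const Pred &Q : Preds)
        if (implies(Q, P))
          return; // P is redundant
      // Drop existing predicates that P makes redundant.
      Preds.erase(std::remove_if(Preds.begin(), Preds.end(),
                                 [&](const Pred &Q) { return implies(P, Q); }),
                  Preds.end());
    }
    Preds.push_back(P);
  }

  std::size_t size() const { return Preds.size(); }
};
```

Once the limit is reached, each add is a plain append, so the total
number of implication checks stays bounded instead of growing
quadratically.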
Fixes https://github.com/llvm/llvm-project/issues/156114.
Reverts llvm/llvm-project#157656
There are multiple reports that this is causing miscompiles in the MSan
test suite after bootstrapping and that this is causing miscompiles in
rustc. Let's revert for now, and work to capture a reproducer next week.
If we have a phi where one of its source blocks is an unreachable
block, we don't want to traverse back into the unreachable region. Doing
so allows e.g. finding a trivial self loop when walking back the
predecessor chain.
This reverts commit f0df1e3dd4ec064821f673ced7d83e5a2cf6afa1.
Recommit with extra check for SCEVCouldNotCompute. Test has been added in
b16930204b.
Original message:
Remove the fall-back to constant max BTC if the backedge-taken-count
cannot be computed.
The constant max backedge-taken count is computed considering loop
guards, so to avoid regressions we need to apply loop guards as needed.
Also remove the special handling for Mul in willNotOverflow, as this
should no longer be needed after 914374624f
(https://github.com/llvm/llvm-project/pull/155300).
PR: https://github.com/llvm/llvm-project/pull/155672
Remove the fall-back to constant max BTC if the backedge-taken-count
cannot be computed.
The constant max backedge-taken count is computed considering loop
guards, so to avoid regressions we need to apply loop guards as needed.
Also remove the special handling for Mul in willNotOverflow, as this
should no longer be needed after 914374624f
(https://github.com/llvm/llvm-project/pull/155300).
PR: https://github.com/llvm/llvm-project/pull/155672
Add support for identifying multiplication overflow in SCEV.
This is needed in LoopAccessAnalysis and that limitation was worked
around by 484417a.
This allows early-exit vectorization to work as expected in
vect.stats.ll test without needing the workaround.
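
One common way to detect multiplication overflow is to redo the
multiply in a wider type and compare. The sketch below shows that idea
on fixed-width integers; whether SCEV's implementation uses exactly this
scheme is an assumption here, and both function names are hypothetical.

```cpp
#include <cstdint>

// Unsigned: the 32-bit product is exact iff it matches the 64-bit one.
bool mulWillNotOverflowUnsigned(uint32_t A, uint32_t B) {
  uint64_t Wide = uint64_t(A) * uint64_t(B);
  uint32_t Narrow = A * B; // wraps modulo 2^32
  return Wide == uint64_t(Narrow);
}

// Signed: compute in 64 bits and check the result fits in 32 bits.
bool mulWillNotOverflowSigned(int32_t A, int32_t B) {
  int64_t Wide = int64_t(A) * int64_t(B);
  return Wide >= INT32_MIN && Wide <= INT32_MAX;
}
```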
This patch adds a new VPlan-based addMinimumIterationCheck, which
replaces the ILV version for the non-epilogue case.
The VPlan-based version constructs a SCEV expression to compute the
minimum iterations and uses it to determine whether the check is known
true or false. Otherwise it creates a VPExpandSCEV recipe and emits a
compare-and-branch.
When using epilogue vectorization, we still need to create the minimum
trip-count-check during the legacy skeleton creation. The patch moves
the definitions out of ILV.
PR: https://github.com/llvm/llvm-project/pull/153643
We just replaced SmallSet<T *, N> with SmallPtrSet<T *, N>, bypassing
the redirection found in SmallSet.h. With that, we no longer need to
include SmallSet.h in many files.
This patch replaces SmallSet<T *, N> with SmallPtrSet<T *, N>. Note
that SmallSet.h "redirects" SmallSet to SmallPtrSet for pointer
element types:
  template <typename PointeeType, unsigned N>
  class SmallSet<PointeeType*, N> : public SmallPtrSet<PointeeType*, N> {};
We only have 140 instances that rely on this "redirection", with the
vast majority of them under llvm/. Since relying on the redirection
doesn't improve readability, this patch replaces SmallSet with
SmallPtrSet for pointer element types.
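
The "redirection" is an ordinary partial specialization. The toy below
reproduces the pattern with standard containers so it compiles without
LLVM headers; ToySet and ToyPtrSet are hypothetical stand-ins and drop
the size parameter N.

```cpp
#include <set>
#include <type_traits>
#include <unordered_set>

// Stand-in for SmallPtrSet.
template <typename T> class ToyPtrSet : public std::unordered_set<T> {};

// Stand-in for the generic SmallSet.
template <typename T> class ToySet : public std::set<T> {};

// Pointer element types are redirected to the pointer set.
template <typename PointeeType>
class ToySet<PointeeType *> : public ToyPtrSet<PointeeType *> {};

static_assert(std::is_base_of<ToyPtrSet<int *>, ToySet<int *>>::value,
              "ToySet<int *> is really a ToyPtrSet<int *>");
static_assert(!std::is_base_of<ToyPtrSet<int>, ToySet<int>>::value,
              "non-pointer element types use the generic set");
```

A reader who sees ToySet<int *> at a use site cannot tell which
container is in play without opening the header, which is the
readability concern raised above.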
Similarly to https://github.com/llvm/llvm-project/pull/131538, we can
also try and check if a predicate is known to wrap given the backedge
taken count.
For now, this just checks directly when we try to create predicated
AddRecs. This both helps to avoid spending compile-time on optimizations
where we know the predicate is false, and can also help to allow
additional vectorization (e.g. by deciding to scalarize memory accesses
when otherwise we would try to create a predicated AddRec with a
predicate that's always false).
The initial version is quite restricted, but can be extended in
follow-ups to cover more cases.
PR: https://github.com/llvm/llvm-project/pull/151134
For the attached test:
Before the loop-idiom pass, we have a store into the inner loop which is
considered simple and one that does not have any side effects on the
loop. Post loop-idiom pass, we get a memset into the outer loop that is
considered to introduce side effects on the loop. This changes the
backedge taken count before and after the pass and hence, the crash with
verify-scev.
To fix the issue, treat non-volatile memory intrinsics as not having
side effects for forward-progress purposes.
Fixes #149377
Try to push the constant operand into a ZExt:
A + zext (-A + B) -> zext (B), if trunc (A) + -A + B does not
unsigned-wrap.
The actual code supports ZExts with arbitrary number of arguments, hence
the getAddExpr in the return.
This helps SCEV reasoning in some cases, commonly when adding an offset
to a zero-extended SCEV that subtracts the same offset.
Note that this is restricted to cases where we can fold away an operand
of the inner Add. This is needed to avoid bad interactions with patterns
when forming ZExts, which try to push the ZExt into the add operands.
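
The fold can be checked exhaustively on small types. The sketch below
uses i8 as the narrow type and i16 as the wide type, and assumes A is
representable in the narrow type; both function names are hypothetical
and this is a sanity check, not a replacement for the Alive2 proof.

```cpp
#include <cstdint>

// True if trunc(A) + (-A + B) does not unsigned-wrap in i8.
bool foldPreconditionHolds(uint8_t A, uint8_t B) {
  uint8_t Inner = uint8_t(B - A); // -A + B in i8, may wrap
  return unsigned(A) + unsigned(Inner) <= 255;
}

// Counts pairs where the precondition holds but
// A + zext(-A + B) differs from zext(B). Expected to be zero.
unsigned countZExtFoldCounterexamples() {
  unsigned Bad = 0;
  for (unsigned A = 0; A < 256; ++A)
    for (unsigned B = 0; B < 256; ++B) {
      if (!foldPreconditionHolds(uint8_t(A), uint8_t(B)))
        continue;
      uint8_t Inner = uint8_t(B - A);
      uint16_t Original = uint16_t(A + uint16_t(Inner)); // A + zext(Inner)
      uint16_t Folded = uint16_t(B);                     // zext(B)
      if (Original != Folded)
        ++Bad;
    }
  return Bad;
}
```

For example, A = 5 and B = 12 give an inner value of 7 and satisfy the
precondition, while A = 12 and B = 5 give an inner value of 249, which
wraps when 12 is added, so the fold does not apply.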
https://alive2.llvm.org/ce/z/q7d303
PR: https://github.com/llvm/llvm-project/pull/151227
Relax the NUW requirements for isKnownPredicateViaNoOverflow, if the
second operand (Y) is an ADD. The code only simplifies the condition if
C1 < C2, so if the second ADD is NUW, it doesn't matter whether the
first operand also has the NUW flag, as it cannot wrap if C1 < C2.
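
The reasoning can be sanity-checked by brute force on 8-bit values. The
sketch assumes the unsigned case, comparing X + C1 against X + C2 where
only the second add is known not to wrap; the function name is
hypothetical.

```cpp
#include <cstdint>

// Counts triples (X, C1, C2) with C1 < C2 and a non-wrapping X + C2
// where X + C1 wraps or X + C1 u<= X + C2 fails. Expected to be zero.
unsigned countNUWCounterexamples() {
  unsigned Bad = 0;
  for (unsigned X = 0; X < 256; ++X)
    for (unsigned C1 = 0; C1 < 256; ++C1)
      for (unsigned C2 = 0; C2 < 256; ++C2) {
        if (!(C1 < C2))
          continue;
        if (X + C2 > 255)
          continue; // second add must be NUW
        bool FirstWraps = X + C1 > 255;
        bool Holds = uint8_t(X + C1) <= uint8_t(X + C2);
        if (FirstWraps || !Holds)
          ++Bad;
      }
  return Bad;
}
```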
https://alive2.llvm.org/ce/z/b3dM7N
PR: https://github.com/llvm/llvm-project/pull/149795
Currently, the comparator is inconsistent when we have two external
globals and one internal global due to
```
if (IsGVNameSemantic(LGV) && IsGVNameSemantic(RGV))
return LGV->getName().compare(RGV->getName());
```
The internal global compares equal to (not strictly less than) the
external globals, but the external globals are not equal.
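
The inconsistency is a non-transitive equivalence, which a toy model
makes concrete. ToyGlobal and compareToy are hypothetical; the sketch
assumes names are only semantic for external globals and that every
other pair compares equal, which simplifies the real comparator.

```cpp
#include <string>

struct ToyGlobal {
  std::string Name;
  bool External; // assumed: names are only semantic for externals
};

// Mirrors the shape of the snippet above: names are compared only
// when both sides are external; any other pair compares equal.
int compareToy(const ToyGlobal &L, const ToyGlobal &R) {
  if (L.External && R.External)
    return L.Name.compare(R.Name);
  return 0;
}

// Returns true if A ~ I and I ~ B but A and B are not equivalent.
bool breaksTransitivity(const ToyGlobal &A, const ToyGlobal &I,
                        const ToyGlobal &B) {
  return compareToy(A, I) == 0 && compareToy(I, B) == 0 &&
         compareToy(A, B) != 0;
}
```

With two externals "a" and "b" and one internal global, the internal
one is equivalent to both while "a" and "b" are ordered, so the
relation is not a valid ordering for sorting.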
Fixes #147862.
Update SimplifyICmpOperands to only try subtracting 1 from RHS first, if
RHS is an op we can fold the subtract directly into. Otherwise try
adding to LHS first, as we can preserve NUW flags.
This improves results in a few cases, including the modified test case
from berkeley-abc and new code to be added in
https://github.com/llvm/llvm-project/pull/128061.
Note that there are more cases where the results can be improved by
better ordering here which I'll try to investigate as follow-up.
PR: https://github.com/llvm/llvm-project/pull/144404
Having a finite Depth (or recursion limit) for computeKnownBits is very
limiting, but is currently a load-bearing necessity, as all KnownBits
are recomputed on each call and there is no caching. As a prerequisite
for an effort to remove the recursion limit altogether, either using a
clever caching technique, or writing an easily-invalidable KnownBits
analysis, make the Depth argument in APIs in ValueTracking uniformly the
last argument with a default value. This would aid in removing the
argument when the time comes, as many callers that currently pass 0
explicitly are now updated to omit the argument altogether.
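
A small sketch of the signature convention, using a toy analysis rather
than the ValueTracking API. ToyValue, MaxToyDepth and knownTrailingZeros
are hypothetical names, and the limit of 6 is an assumption made for
this example.

```cpp
// Toy value: either an opaque leaf (ShiftedOperand == nullptr) or
// "ShiftedOperand << 1".
struct ToyValue {
  const ToyValue *ShiftedOperand;
};

constexpr unsigned MaxToyDepth = 6; // assumed recursion limit

// Depth is the last parameter and defaults to 0, so top-level callers
// can omit it; only recursive calls pass it explicitly.
unsigned knownTrailingZeros(const ToyValue &V, unsigned Depth = 0) {
  if (Depth == MaxToyDepth || !V.ShiftedOperand)
    return 0; // give up: nothing known
  return 1 + knownTrailingZeros(*V.ShiftedOperand, Depth + 1);
}
```

Callers written as knownTrailingZeros(V) need no change if the
parameter is removed later.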
These are identified by misc-include-cleaner. I've filtered out those
that break builds. Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.
try_emplace can default-construct values, so we do not need to do so
on our own. Plus, try_emplace(Key) is much simpler/shorter than
insert({Key, LongValueType()}).
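
The same contrast can be shown with std::map, which also provides
try_emplace since C++17. std::map is used here only as a stand-in for
the map types touched by this patch; Table and both helper functions
are hypothetical.

```cpp
#include <map>
#include <string>
#include <vector>

using Table = std::map<std::string, std::vector<int>>;

// Before: the default value is spelled out at the call site.
void addWithInsert(Table &T, const std::string &Key, int V) {
  auto It = T.insert({Key, std::vector<int>()}).first;
  It->second.push_back(V);
}

// After: try_emplace value-initializes the mapped type if Key is new
// and leaves an existing entry untouched.
void addWithTryEmplace(Table &T, const std::string &Key, int V) {
  auto It = T.try_emplace(Key).first;
  It->second.push_back(V);
}
```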