This patch simplifies the pattern `icmp X and/or C1, X and/or C2` when
one constant mask is the subset of the other.
If `C1 & C2 == C1`, `A = X and/or C1`, `B = X and/or C2`, we can do the
following folds:
`icmp ule A, B -> true`
`icmp ugt A, B -> false`
We can apply similar folds for signed predicates when `C1` and `C2` are
the same sign:
`icmp sle A, B -> true`
`icmp sgt A, B -> false`
Alive2: https://alive2.llvm.org/ce/z/Q4ekP5Fixes#65833.
Reapply after D156401, which stops PatternMatch from recognizing
binop constant expressions, which should avoid the infinite loops
and assertion failures this patch previously exposed.
-----
In preparation for removing support for and/or expressions, mark
them as undesirable. As such, we will no longer implicitly create
such expressions, but they still exist.
This reapplies the change for and, but also marks or as undesirable
at the same time. Only handling one of them can cause infinite
combine loops due to the asymmetric handling.
-----
In preparation for removing support for and/or expressions, mark
them as undesirable. As such, we will no longer implicitly create
such expressions, but they still exist.
This reverts commit 086ee99564afbb11449c08ea2e094f7f49fadde5.
This patch causes an infinite loop when building arch/mips/mm/c-r4k.c in
the Linux kernel. See the comment in Phabricator for a reduced
reproducer: https://reviews.llvm.org/rG086ee99564afbb11449c08ea2e094f7f49fadde5
Reapply after fixing an issue in canonicalizeLogicFirst() exposed
by this change (218f97578b26f7a89f7f8ed0748c31ef0181f80a).
-----
In preparation for removing support for and expressions, mark them
as undesirable. As such, we will no longer implicitly create such
expressions, but they still exist.
This reverts commit f8a36d8c3e264c4fccf8058e699201a452ea7bb7.
I believe this is causing an assertion failure on the
sanitizer-x86_64-linux buildbot:
clang++: /b/sanitizer-x86_64-linux/build/llvm-project/llvm/include/llvm/Support/Casting.h:578: decltype(auto) llvm::cast(From *) [To = llvm::BinaryOperator, From = llvm::Value]: Assertion `isa<To>(Val) && "cast<Ty>() argument of incompatible type!"' failed.
#10 0x000055bdd7e82408 canonicalizeLogicFirst(llvm::BinaryOperator&, llvm::IRBuilder<llvm::TargetFolder, llvm::IRBuilderCallbackInserter>&) /b/sanitizer-x86_64-linux/build/llvm-project/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp:2131:5
#11 0x000055bdd7e80183 llvm::InstCombinerImpl::visitAnd(llvm::BinaryOperator&) /b/sanitizer-x86_64-linux/build/llvm-project/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp:2661:20
Likely the code is encountering a constant expression in a case it
didn't before.
In preparation for removing support for add expressions, mark them
as undesirable. As such, we will no longer implicitly create such
expressions, but they still exist.
define i1 @compare_vscales() {
%vscale = call i64 @llvm.vscale.i64()
%vscalex2 = shl nuw nsw i64 %vscale, 1
%vscalex4 = shl nuw nsw i64 %vscale, 2
%cmp = icmp ult i64 %vscalex2, %vscalex4
ret i1 %cmp
}
This IR is currently emitted by LLVM. This icmp is redundant as this snippet
can be simplified to true or false as both operands originate from the same
@llvm.vscale.i64() call.
Differential Revision: https://reviews.llvm.org/D142542
When inferring that a GEP of a global variable is inbounds because
there is no notional overindexing, we need to check that the
global value type and the GEP source element type match.
This was not necessary with typed pointers (because we would have
a bitcast in between), but is necessary with opaque pointers.
We should be able to recover some of the safe cases by performing
an offset based inbounds inference in DL-aware ConstantFolding.
The only interesting test change is in @PR31262, where the following
fold is now performed, while it previously was not:
https://alive2.llvm.org/ce/z/a5Qmr6
llvm/test/Transforms/InstSimplify/ConstProp/gep.ll has not been
updated, because there is a tradeoff between folding and inrange
preservation there that we may want to discuss.
Updates have been performed using:
https://gist.github.com/nikic/98357b71fd67756b0f064c9517b62a34
The code was using exact sizing only, but since what we really need is just to make sure the offsets are in bounds, a minimum bound on the object size is sufficient.
To demonstrate the difference, support computing minimum sizes from obects of scalable vector type.
Remove some code which tried to handle the case of comparing two allocas where an object size could not be precisely computed. This code had zero coverage in tree, and at least one nasty bug.
The bug comes from the fact that the code uses the size of the result pointer as a proxy for whether the alloca can be of size zero. Since the result of an alloca is *always* a pointer type, and a pointer type can *never* be empty, this check was a nop. As a result, we blindly consider a zero offset from two allocas to never be equal. They can in fact be equal when one or more of the allocas is zero sized.
This is particularly ugly because instcombine contains the exact opposite rule. If instcombine reaches the allocas first, it combines them into one (making them equal). If instsimplify reaches the compare first, it would consider them not equal. This creates all kinds of fun scenarios for order of optimization reaching different and contradictory conclusions.
As described in https://bugs.llvm.org/show_bug.cgi?id=52429 this
fold is incorrect, because inbounds only guarantees that the
pointers don't wrap in the unsigned space: It is possible that
the sign boundary is crossed by an object.
I'm dropping the fold entirely rather than adjusting it, because
computePointerICmp() fully subsumes it (just with correct predicate
handling).
Differential Revision: https://reviews.llvm.org/D113343
As discussed in D89952,
instcombine can sometimes find a way to reduce similar patterns,
but it is incomplete.
InstSimplify uses the computeConstantRange() ValueTracking analysis
via simplifyICmpWithConstant(), so we just need to fill in the max
value of cttz to process any "icmp pred cttz(X), C" pattern (the
min value is initialized to zero automatically).
https://alive2.llvm.org/ce/z/Z_SLWZ
Follow-up to D89976.
As discussed in D89952,
instcombine can sometimes find a way to reduce similar patterns,
but it is incomplete.
InstSimplify uses the computeConstantRange() ValueTracking analysis
via simplifyICmpWithConstant(), so we just need to fill in the max
value of ctlz to process any "icmp pred ctlz(X), C" pattern (the
min value is initialized to zero automatically).
Follow-up to D89976.
As discussed in D89952,
instcombine can sometimes find a way to reduce similar patterns,
but it is incomplete.
InstSimplify uses the computeConstantRange() ValueTracking analysis
via simplifyICmpWithConstant(), so we just need to fill in the max
value of ctpop to process any "icmp pred ctpop(X), C" pattern (the
min value is initialized to zero automatically).
Differential Revision: https://reviews.llvm.org/D89976
This improves simplifications for pattern `icmp (X+Y), (X+Z)` -> `icmp Y,Z`
if only one of the operands has NSW set, e.g.:
icmp slt (x + 0), (x +nsw 1)
We can still safely rewrite this to:
icmp slt 0, 1
because we know that the LHS can't overflow if the RHS has NSW set and
C1 < C2 && C1 >= 0, or C2 < C1 && C1 <= 0
This simplification is useful because ScalarEvolutionExpander which is used to
generate code for SCEVs in different loop optimisers is not always able to put
back NSW flags across control-flow, thus inhibiting CFG simplifications.
Differential Revision: https://reviews.llvm.org/D89317
This revision adds the following peephole optimization
and it's negation:
%a = urem i64 %x, %y
%b = icmp ule i64 %a, %x
====>
%b = true
With John Regehr's help this optimization was checked with Alive2
which suggests it should be valid.
This pattern occurs in the bound checks of Rust code, the program
const N: usize = 3;
const T = u8;
pub fn split_mutiple(slice: &[T]) -> (&[T], &[T]) {
let len = slice.len() / N;
slice.split_at(len * N)
}
the method call slice.split_at will check that len * N is within
the bounds of slice, this bounds check is after some transformations
turned into the urem seen above and then LLVM fails to optimize it
any further. Adding this optimization would cause this bounds check
to be fully optimized away.
ref: https://github.com/rust-lang/rust/issues/74938
Differential Revision: https://reviews.llvm.org/D85092
The "null-pointer-is-valid" attribute needs to be checked by many
pointer-related combines. To make the check more efficient, convert
it from a string into an enum attribute.
In the future, this attribute may be replaced with data layout
properties.
Differential Revision: https://reviews.llvm.org/D78862