Currently OptimizePointerDifference() only handles single GEPs with a
common base, not GEP chains. This patch generalizes the support to
nested GEPs with a common base.
Finding the common base is a bit annoying because we want to stop as
soon as possible and not recurse into common GEP prefixes.
This helps avoid regressions from
https://github.com/llvm/llvm-project/pull/137297.
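For illustration, a minimal sketch of the kind of pattern this now covers (the
function and the constant offsets are hypothetical):
```
; Both pointers reach %base through GEP chains, so the difference of their
; ptrtoint values can be folded to a constant once the common base is found.
define i64 @gep_chain_diff(ptr %base) {
  %p1 = getelementptr inbounds i8, ptr %base, i64 16
  %p2 = getelementptr inbounds i8, ptr %p1, i64 4    ; %base + 20
  %q  = getelementptr inbounds i8, ptr %base, i64 8  ; %base + 8
  %pi = ptrtoint ptr %p2 to i64
  %qi = ptrtoint ptr %q to i64
  %d  = sub i64 %pi, %qi                             ; can now fold to 12
  ret i64 %d
}
```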
Logically it does not matter; getFreelyInvertedImpl doesn't
depend on the value for the m_ImmConstant case.
This use-count logic should probably be sunk into getFreelyInvertedImpl;
every caller appears to just perform a hasOneUse or hasNUses check,
so this could be changed to a simple use-count threshold.
The idea behind this canonicalization is that it allows us to handle fewer
patterns, because we know that some will be canonicalized away. This is
indeed very useful to e.g. know that constants are always on the right.
However, this is only useful if the canonicalization is actually
reliable. This is the case for constants, but not for arguments: Moving
these to the right makes it look like the "more complex" expression is
guaranteed to be on the left, but this is not actually the case in
practice. It fails as soon as you replace the argument with another
instruction.
The end result is that it looks like things correctly work in tests,
while they actually don't. We use the "thwart complexity-based
canonicalization" trick to handle this in tests, but it's often a
challenge for new contributors to get this right, and based on the
regressions this PR originally exposed, we clearly don't get this right
in many cases.
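For reference, the "thwart" trick usually looks something like this in tests (a
minimal sketch; the extra `mul` exists only to turn an argument into an
instruction):
```
define i8 @commuted_case(i8 %p, i8 %x) {
  %y = mul i8 %p, %p   ; thwart complexity-based canonicalization
  %r = and i8 %y, %x   ; %y is now an instruction, so it reliably stays as op0
  ret i8 %r
}
```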
For this reason, I think that it's better to remove this complexity
canonicalization. It will make it much easier to write tests for
commuted cases and make sure that they are handled.
This patch extends the transform `(icmp pred iM (shl iM %v, N), C) ->
(icmp pred i(M-N) (trunc iM %v to i(M-N)), (trunc (C>>N)))` to handle
icmps where the strictness of the predicate is flipped.
See the following case:
```
icmp ult i64 (shl X, 32), 8589934593 ->
icmp ule i64 (shl X, 32), 8589934592 ->
icmp ule i32 (trunc X, i32), 2 ->
icmp ult i32 (trunc X, i32), 3
```
Fixes the regression introduced by
https://github.com/llvm/llvm-project/pull/86111#issuecomment-2098203152.
Alive2 proofs: https://alive2.llvm.org/ce/z/-sp5n3
`nuw` cannot be propagated as we always use `ashr` here. I don't see the
value of fixing this (see the test `test_icmp_shl_nuw`).
Change all the cstval_pred_ty based PatternMatch helpers (things like
m_AllOnes and m_Zero) to only allow poison elements inside vector
splats, not undef elements.
Historically, we used to represent non-demanded elements in vectors
using undef. Nowadays, we use poison instead. As such, I believe that
support for undef in vector splats is no longer useful.
At the same time, while poison splat elements are pretty much always
safe to ignore, this is not generally the case for undef elements. We
have existing miscompiles in our tests due to this (see the
masked-merge-*.ll tests changed here) and it's easy to miss such cases
in the future, now that we write tests using poison instead of undef
elements.
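As a concrete illustration of why a poison lane in a splat is safe to look
through (a hedged sketch, not taken from the changed tests):
```
; Treating <-1, poison> as an all-ones splat and folding the xor to "not %x"
; is a refinement: the poison lane's result was poison anyway. An undef lane
; may instead be chosen as any value, which is what leads to the masked-merge
; miscompiles mentioned above.
define <2 x i32> @not_with_poison_lane(<2 x i32> %x) {
  %r = xor <2 x i32> %x, <i32 -1, i32 poison>
  ret <2 x i32> %r
}
```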
I think overall, keeping support for undef elements no longer makes
sense, and we should drop it. Once this is done consistently, I think we
may also consider allowing poison in m_APInt by default, as doing that
change is much less risky than doing the same with undef.
This change involves a substantial amount of test changes. For most
tests, I've just replaced undef with poison, as I don't think there is
value in retaining both. For some tests (where the distinction between
undef and poison is important), I've duplicated tests.
Alive2: https://alive2.llvm.org/ce/z/u49dQ9
It will improve the codegen of `std::_Vector_base<T>::~_Vector_base()` when `sizeof(T)` is not a power of 2.
NOTE: We can also fold `icmp signed-pred (sdiv exact X, C), (sdiv exact Y, C)` into `icmp signed-pred Y, X` when C is negative. But I don't think it enables more optimizations for real-world applications.
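A minimal sketch of the fold on the motivating shape, with 24 standing in for a
non-power-of-2 `sizeof(T)` (the function is hypothetical):
```
; Both operands are exact divisions by the same positive constant, so the
; divisions can be dropped from the compare.
define i1 @cmp_exact_sdiv(i64 %x, i64 %y) {
  %dx = sdiv exact i64 %x, 24
  %dy = sdiv exact i64 %y, 24
  %c = icmp slt i64 %dx, %dy   ; -> icmp slt i64 %x, %y
  ret i1 %c
}
```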
InstCombine canonicalizes `add` to `or` when possible, but this causes
some optimizations that apply to `add` to be missed, because they don't
realize that the `or` is equivalent to an `add`.
In this patch we generalize `foldICmpBinOp` to handle such cases.
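A sketch of the kind of case this catches (hypothetical function; the `eq` fold
is just one example of an add-style fold):
```
; With the disjoint flag the or is known to behave like an add (no common
; bits, so no carries), so add-based icmp folds can apply to it as well.
define i1 @or_disjoint_cmp(i64 %x, i64 %y) {
  %a = or disjoint i64 %x, %y
  %c = icmp eq i64 %a, %x      ; -> icmp eq i64 %y, 0
  ret i1 %c
}
```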
The disjoint flag was recently added to IR in #72583.
We already set it when we turn an add into an or. This patch sets it on Ors that weren't converted from an Add.
Remove support for zext and sext constant expressions. All places
creating them have been removed beforehand, so this just removes the
APIs and uses of these constant expressions in tests.
There is some additional cleanup that can be done on top of this, e.g.
we can remove the ZExtInst vs ZExtOperator footgun.
This is part of
https://discourse.llvm.org/t/rfc-remove-most-constant-expressions/63179.
This patch canonicalizes the pattern `and(zext(A), B)` into `select A, B
& 1, 0`. Thus, we can reuse transforms `select B == even, B & 1, 0 -> 0`
and `select B == odd, B & 1, 0 -> zext(B == odd)` in `InstCombine`.
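Sketched in IR, the canonicalization looks roughly like this (hypothetical
functions):
```
; Before: the low bit of %b is gated by the i1 value %a.
define i32 @and_zext(i1 %a, i32 %b) {
  %z = zext i1 %a to i32
  %r = and i32 %z, %b
  ret i32 %r
}

; After: the same computation as a select, so the select-based folds above
; (e.g. when %a is an even/odd test of %b) can kick in.
define i32 @and_zext_canonical(i1 %a, i32 %b) {
  %low = and i32 %b, 1
  %r = select i1 %a, i32 %low, i32 0
  ret i32 %r
}
```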
It is an alternative to #66676.
Alive2: https://alive2.llvm.org/ce/z/598phE
Fixes #66733.
Fixes #66606.
Fixes #28612.
If the shl has either nuw or nsw flags, then we know that bits
cannot be shifted out, so a power of two cannot become zero.
Proofs: https://alive2.llvm.org/ce/z/4QfebE
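As a hedged sketch of what this knowledge buys, assuming the usual
urem-by-power-of-two fold as a consumer (not taken from the patch's tests):
```
; 2 is a power of two and nuw guarantees its set bit is not shifted out, so
; %p is known to be a power of two and the urem can in principle become an and.
define i64 @urem_by_shl_pow2(i64 %x, i64 %sh) {
  %p = shl nuw i64 2, %sh
  %r = urem i64 %x, %p         ; -> and i64 %x, (%p - 1)
  ret i64 %r
}
```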
InstCombine tries to swap compare operands to match sub instructions
in order to expose "CSE opportunities". However, it doesn't really
make sense to perform this transform in the middle-end, as we cannot
actually CSE the instructions there.
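The pattern in question looks roughly like this (a hypothetical sketch):
```
; The compare is swapped (slt -> sgt with operands exchanged) so its operand
; order matches the existing sub, in the hope that later stages can reuse the
; subtraction.
define i32 @sub_then_cmp(i32 %a, i32 %b) {
  %d = sub i32 %a, %b
  %c = icmp slt i32 %b, %a     ; swapped form: icmp sgt i32 %a, %b
  %r = select i1 %c, i32 %d, i32 0
  ret i32 %r
}
```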
The backend already performs this fold in
18f5446a45/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp (L4236)
on the SDAG level; however, this only works within a single basic block.
To handle cross-BB cases, we do need to handle this in the IR layer.
This patch moves the fold from InstCombine to CGP in the backend,
while keeping the same (somewhat dubious) heuristic.
Differential Revision: https://reviews.llvm.org/D152541
https://alive2.llvm.org/ce/z/2iC4oB
This is similar to changes made for zext + lshr:
21d3871b7c90
6c39a3aae1dc
The existing fold did not account for extra uses, so we
see some instruction count reductions in the test diffs.
This is intended to improve analysis (icmp likely has more
transforms than any other opcode), make other transforms
more symmetric with zext/lshr, and it can be inverted
in codegen if profitable.
As with the earlier changes, there is potential to uncover
infinite combine loops, but I have not found any yet.
Complexity canonicalization guarantees that a binop and cast
are op0/op1 respectively. Adjusted generic test names to
show that this pattern is still useful.
The code tried to do this for (icmp sle (1 << Y), 0), but that is
canonicalized to sgt before we get there.
Simplify the code by removing the unreachable SGE and SLE handling.
Also remove the (1 << Y) >=u 2147483648 and (1 << Y) <u 2147483648
handling since those are canonicalized to (1 << Y) <s 0 and
(1 << Y) >=s 0 before we get there.
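For reference, a sketch of the sign-bit canonicalization that makes the removed
handling unreachable:
```
; icmp ult X, 2147483648 is just a sign-bit test, so it is rewritten as a
; signed compare before the (1 << Y) specific code ever runs.
define i1 @shl_signbit_clear(i32 %y) {
  %s = shl i32 1, %y
  %c = icmp ult i32 %s, 2147483648   ; -> icmp sgt i32 %s, -1, i.e. %s >=s 0
  ret i1 %c
}
```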
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D141753