As part of the RemoveDIs project we need LLVM to insert instructions using
iterators wherever possible, so that the iterators can carry a bit of
debug-info. This commit implements some of that by updating the contents of
llvm/lib/Transforms/Utils to always use iterator-versions of instruction
constructors.
There are two general flavours of update:
* Almost all call-sites just call getIterator on an instruction
* Several make use of an existing iterator (scenarios where the code is
actually significant for debug-info)
The underlying logic is that any call to getFirstInsertionPt or similar
APIs that identify the start of a block need to have that iterator passed
directly to the insertion function, without being converted to a bare
Instruction pointer along the way.
Noteworthy changes:
* FindInsertedValue now takes an optional iterator rather than an
instruction pointer, as we need to always insert with iterators,
* I've added a few iterator-taking versions of some value-tracking and
DomTree methods -- they just unwrap the iterator. These are purely
convenience methods to avoid extra syntax in some passes.
* A few calls to getNextNode become std::next instead (to keep in the
theme of using iterators for positions),
* SeparateConstOffsetFromGEP has it's insertion-position field changed.
Noteworthy because it's not a purely localised spelling change.
All this should be NFC.
This patch adds support for canonicalization of icmp with a scalable
splat. Some optimizations assume that `icmp pred X, APInt C` is in
canonical form.
Fixes https://github.com/llvm/llvm-project/issues/83931.
Noticed on #82241 - we don't need to use the IntegerType just for the scalar width, and we were calling it 3 times in different forms - we can just call Type::getScalarSizeInBits once and reuse.
The fold for icmp (gep (p, i1), gep (p, i2)) to icmp (i1, i2) is
currently limited to one of the GEPs either having one use or a constant
offset. I believe this is to avoid duplicating complex arithmetic both
in the GEP and the offset comparison.
This patch instead does the same thing that the indexed compare fold
does, which is to rewrite the GEP into i8 form if necessary, so that the
offset arithmetic is not repeated after the transform.
I ran into this problem in a case where there are multiple conditions on
the same pointer, which prevents them from getting folded.
This patch does the following folds:
```
icmp eq/ne (bitcast X to int), (bitcast +/-inf to int) -> llvm.is.fpclass(X, (~)fcPosInf/fcNegInf)
icmp eq/ne (bitcast X to int), (bitcast +0/-0 to int) -> llvm.is.fpclass(X, (~)fcPosZero/fcNegZero)
```
Alive2: https://alive2.llvm.org/ce/z/JJmEE9
This patch refactors the interface of the `computeKnownFPClass` family
to pass `SimplifyQuery` directly.
The motivation of this patch is to compute known fpclass with
`DomConditionCache`, which was introduced by
https://github.com/llvm/llvm-project/pull/73662. With
`DomConditionCache`, we can do more optimization with context-sensitive
information.
Example (extracted from
[fmt/format.h](e17bc67547/include/fmt/format.h (L3555-L3566))):
```
define float @test(float %x, i1 %cond) {
%i32 = bitcast float %x to i32
%cmp = icmp slt i32 %i32, 0
br i1 %cmp, label %if.then1, label %if.else
if.then1:
%fneg = fneg float %x
br label %if.end
if.else:
br i1 %cond, label %if.then2, label %if.end
if.then2:
br label %if.end
if.end:
%value = phi float [ %fneg, %if.then1 ], [ %x, %if.then2 ], [ %x, %if.else ]
%ret = call float @llvm.fabs.f32(float %value)
ret float %ret
}
```
We can prove the signbit of `%value` is always zero. Then the fabs can
be eliminated.
`(ctpop (not x))` <-> `(sub nuw nsw BitWidth(x), (ctpop x))`. The
`sub` expression can sometimes be constant folded depending on the use
case of `(ctpop (not x))`.
This patch adds fold for the following cases:
`(add/sub/disjoint_or C, (ctpop (not x))`
-> `(add/sub/disjoint_or C', (ctpop x))`
`(cmp pred C, (ctpop (not x))`
-> `(cmp swapped_pred C', (ctpop x))`
Where `C'` depends on how we constant fold `C` with `BitWidth(x)` for
the given opcode.
Proofs: https://alive2.llvm.org/ce/z/qUgfF3Closes#77859
This patch tries to flip the signedness of predicates when folding an
unsigned icmp with a signed min/max. It will enable more optimizations
as we canonicalizes a signed icmp into an unsigned icmp when both
operands are known to have the same sign.
Fixes#76672.
Compile-time impact:
http://llvm-compile-time-tracker.com/compare.php?from=949ec83eaf6fa6dbffb94c2ea9c0a4d5efdbd239&to=2deca1aea8a4e13609bab72c522a97d424f0fc2d&stat=instructions:u
|stage1-O3|stage1-ReleaseThinLTO|stage1-ReleaseLTO-g|stage1-O0-g|stage2-O3|stage2-O0-g|stage2-clang|
|--|--|--|--|--|--|--|
|-0.00%|+0.01%|+0.05%|-0.12%|-0.01%|-0.03%|-0.00%|
NOTE: We can flip the signedness of predicate if both operands are
negative. But I don't see the benefit of handling these cases.
Alive2: https://alive2.llvm.org/ce/z/u49dQ9
It will improve the codegen of `std::_Vector_base<T>::~_Vector_base()` when `sizeof(T)` is not a power of 2.
NOTE: We can also fold `icmp signed-pred (sdiv exact X, C), (sdiv exact Y, C)` into `icmp signed-pred (sdiv exact Y, C), (sdiv exact X, C)` when C is negative. But I don't think it enables more optimizations for real-world applications.
InstCombine canonicalizes `add` to `or` when possible, but this makes
some optimizations applicable to `add` to be missed because they don't
realize that the `or` is equivalent to `add`.
In this patch we generalize `foldICmpBinOp` to handle such cases.
Instead of checking whether the GEP as a whole is constant, only
check whether it has constant incides. This matches what we do in
other places in this code.
This has little practical impact, because it is mostly already
handled through other cases anyway. We see a difference for
non-inbounds equality comparisons.
We have a bunch of folds that basically perform X pred Y to ~Y pred ~X
for various special cases where this saves an instruction.
Generalize these folds to use isFreeToInvert(). We have to make sure
that we consume an instruction in either of the inversions, otherwise
we're just going to swap the icmp back and forth.
Fixes https://github.com/llvm/llvm-project/issues/74302.
This adds support for using dominating conditions in computeKnownBits()
when called from InstCombine. The implementation uses a
DomConditionCache, which stores which branches may provide information
that is relevant for a given value.
DomConditionCache is similar to AssumptionCache, but does not try to do
any kind of automatic tracking. Relevant branches have to be explicitly
registered and invalidated values explicitly removed. The necessary
tracking is done inside InstCombine.
The reason why this doesn't just do exactly the same thing as
AssumptionCache is that a lot more transforms touch branches and branch
conditions than assumptions. AssumptionCache is an immutable analysis
and mostly gets away with this because only a handful of places have to
register additional assumptions (mostly as a result of cloning). This is
very much not the case for branches.
This change regresses compile-time by about ~0.2%. It also improves
stage2-O0-g builds by about ~0.2%, which indicates that this change results
in additional optimizations inside clang itself.
Fixes https://github.com/llvm/llvm-project/issues/74242.
The indexed compare fold converts comparisons of GEPs with same
(indirect) base into comparisons of offset. Currently, it only supports
GEPs with the same source element type.
This change makes the transform operate on offsets instead, which
removes the type dependence. To keep closer to the scope of the original
implementation, this keeps the limitation that we should only have at
most one variable index per GEP.
This addresses the main regression from
https://github.com/llvm/llvm-project/pull/68882.
TBH I have some doubts that this is really a useful transform (at least
for the case where there are extra pointer users, so we have to
rematerialize pointers at some point). I can only assume it exists for a
reason...
This is nearly an NFC, the only change is potentially to order that
values are created/names.
Otherwise it is a slight speed boost/simplification to avoid having to
go through the `getFreelyInverted` recursive logic twice to simplify
the extra `not` op.
This PR fixes https://github.com/llvm/llvm-project/issues/55013 : the
max intrinsics is not generated for this simple loop case :
https://godbolt.org/z/hxz1xhMPh. This is caused by a ICMP not being
folded into a select, thus not generating the max intrinsics.
For the story :
Since LLVM 14, SCCP pass got smarter by folding sext into zext for
positive ranges : https://reviews.llvm.org/D81756. After this change,
InstCombine was sometimes unable to fold ICMP correctly as both of the
arguments pointed to mismatched zext/sext. To fix this, @rotateright
implemented this fix : https://reviews.llvm.org/D124419 that tries to
resolve the mismatch by knowing if the argument of a zext is positive
(in which case, it is like a sext) by using ValueTracking, however
ValueTracking is not smart enough to infer that the value is positive in
some cases. Recently, @nikic implemented #67982 which keeps the
information that a zext is non-negative. This PR simply uses this
information to do the folding accordingly.
TLDR : This PR uses the recent nneg tag on zext to fold the icmp
accordingly in instcombine.
This PR also contains test cases for sext/zext folding with InstCombine
as well as a x86 regression tests for the max/min case.
Looking through inttoptr / ptrtoint intermixed with GEPs is very
questionable from a provenance perspective. We also don't seem to
have any test coverage that shows this is useful (apart from one
test I added to guard against a crash).