When folding icmp over phi, add a special case for `icmp eq (zext(bool),
0)`, which is known to fold to `!bool` and thus won't increase the
instruction count. This helps convert more phis to i1, especially in loops.
This is based on the existing logic that already supports this for icmp of
ucmp/scmp.
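For reference, a minimal sketch of the fold (names are illustrative):
```
%z = zext i1 %b to i32
%c = icmp eq i32 %z, 0
==>
%c = xor i1 %b, true
```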
The current visitPHINode function in InstCombine can remove dead phi
cycles (each phi has exactly one use, which is another phi). However, it
cannot handle the case where the phis form a web (each phi has one or
more uses, all of which are phis). This change extends the algorithm so
that it can also remove such dead phi webs.
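For illustration, a sketch of a dead phi web (hypothetical names): every phi has one or more uses, but all of those uses are other phis in the web, so the whole web is dead and can be removed:
```
%a = phi i32 [ %b, %bb1 ], [ %c, %bb2 ]
%b = phi i32 [ %a, %bb3 ], [ %c, %bb4 ]
%c = phi i32 [ %a, %bb5 ], [ %b, %bb6 ]
```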
The op of phi transform wants to prevent moving an operation across a
backedge, as this may lead to an infinite combine loop.
Currently, this is done using isPotentiallyReachable(). The problem with
that is that all blocks inside a loop are reachable from each other.
This means that the op of phi transform is effectively completely
disabled for code inside loops, even when it's not actually operating on
a loop phi (just a phi that happens to be in a loop).
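For example, in a sketch like the following (hypothetical names), %phi is not a loop phi and folding %op into it would not move anything across the backedge, yet the reachability-based check rejected it because %then, %else and %merge are all mutually reachable inside the loop:
```
loop:
  br i1 %c, label %then, label %else
then:
  br label %merge
else:
  br label %merge
merge:
  %phi = phi i32 [ 0, %then ], [ %x, %else ]
  %op = add i32 %phi, 1
  br i1 %exit.cond, label %loop, label %exit
```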
Fix this by explicitly computing the backedges inside the function
instead. Do this via RPOT, which is a bit more efficient than using
FindFunctionBackedges() (which does it without any pre-computed
analyses).
For irreducible cycles, there may be multiple possible choices of
backedge, and this just picks one of them. This is still sufficient to
prevent combine loops.
This also removes the last use of LoopInfo in InstCombine -- I'll drop
the analysis in a followup.
This patch replaces all dominated uses of a condition with true/false to
improve context-sensitive optimizations. It eliminates a bunch of
branches in llvm-opt-benchmark.
As a side effect, it may introduce new phi nodes in some corner cases.
See the following case:
```
define i1 @test(i1 %cmp, i1 %cond) {
entry:
br i1 %cond, label %bb1, label %bb2
bb1:
br i1 %cmp, label %if.then, label %if.else
if.then:
br label %bb2
if.else:
br label %bb2
bb2:
%res = phi i1 [%cmp, %entry], [%cmp, %if.then], [%cmp, %if.else]
ret i1 %res
}
```
It will be simplified into:
```
define i1 @test(i1 %cmp, i1 %cond) {
entry:
br i1 %cond, label %bb1, label %bb2
bb1:
br i1 %cmp, label %if.then, label %if.else
if.then:
br label %bb2
if.else:
br label %bb2
bb2:
%res = phi i1 [%cmp, %entry], [true, %if.then], [false, %if.else]
ret i1 %res
}
```
I am planning to fix this in the late pipeline/CGP, since this problem
already exists before this patch.
The idea behind this canonicalization is that it allows us to handle fewer
patterns, because we know that some will be canonicalized away. This is
indeed very useful to e.g. know that constants are always on the right.
However, this is only useful if the canonicalization is actually
reliable. This is the case for constants, but not for arguments: Moving
these to the right makes it look like the "more complex" expression is
guaranteed to be on the left, but this is not actually the case in
practice. It fails as soon as you replace the argument with another
instruction.
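For example, the constant canonicalization is reliable, because a constant operand of a commutative operation is always moved to the right-hand side (illustrative snippet):
```
%r = add i32 7, %x
==>
%r = add i32 %x, 7
```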
The end result is that it looks like things work correctly in tests,
while they actually don't. We use the "thwart complexity-based
canonicalization" trick to handle this in tests, but it's often a
challenge for new contributors to get this right, and based on the
regressions this PR originally exposed, we clearly don't get this right
in many cases.
For this reason, I think that it's better to remove this complexity
canonicalization. It will make it much easier to write tests for
commuted cases and make sure that they are handled.
This patch canonicalizes getelementptr instructions with constant
indices to use the `i8` source element type. This makes it easier for
optimizations to recognize that two GEPs are identical, because they
don't need to see past many different ways to express the same offset.
This is a first step towards
https://discourse.llvm.org/t/rfc-replacing-getelementptr-with-ptradd/68699.
This is limited to constant GEPs only for now, as they have a clear
canonical form, while we're not yet sure how exactly to deal with
variable indices.
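As a sketch of the canonicalization (illustrative values), a GEP with a constant index is rewritten to an equivalent byte offset over i8:
```
%p = getelementptr inbounds i32, ptr %base, i64 3
==>
%p = getelementptr inbounds i8, ptr %base, i64 12
```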
The test llvm/test/Transforms/PhaseOrdering/switch_with_geps.ll gives
two representative examples of the kind of optimization improvement we
expect from this change. In the first test, SimplifyCFG can now realize
that all switch branches are actually the same. In the second test, it
can convert the switch into simple arithmetic. These are representative of
common optimization failures we see in Rust.
Fixes https://github.com/llvm/llvm-project/issues/69841.
This patch folds `switch(zext/sext(X))` into `switch(X)`.
The original motivation of this patch is to optimize a pattern found in
cvc5. For example:
```
%bf.load.i = load i16, ptr %d_kind.i, align 8
%bf.clear.i = and i16 %bf.load.i, 1023
%bf.cast.i = zext nneg i16 %bf.clear.i to i32
switch i32 %bf.cast.i, label %if.else [
  i32 335, label %if.then
  i32 303, label %if.then
]
if.then: ; preds = %entry, %entry
%d_children.i.i = getelementptr inbounds %"class.cvc5::internal::expr::NodeValue", ptr %0, i64 0, i32 3
%cmp.i.i.i.i.i = icmp eq i16 %bf.clear.i, 1023
%cond.i.i.i.i.i = select i1 %cmp.i.i.i.i.i, i32 -1, i32 %bf.cast.i
```
`%cmp.i.i.i.i.i` always evaluates to false because `%bf.clear.i` can
only be 335 or 303.
Folding `switch i32 %bf.cast.i` to `switch i16 %bf.clear.i` will help
`CVP` to handle this case.
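After the fold, the switch operates directly on the narrow value (sketch based on the snippet above):
```
switch i16 %bf.clear.i, label %if.else [
  i16 335, label %if.then
  i16 303, label %if.then
]
```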
See also
https://github.com/llvm/llvm-project/pull/76928#issuecomment-1877055722.
Compile-time impact:
http://llvm-compile-time-tracker.com/compare.php?from=7954c57124b495fbdc73674d71f2e366e4afe522&to=502b13ed34e561d995ae1f724cf06d20008bd86f&stat=instructions:u
|stage1-O3|stage1-ReleaseThinLTO|stage1-ReleaseLTO-g|stage1-O0-g|stage2-O3|stage2-O0-g|stage2-clang|
|--|--|--|--|--|--|--|
|+0.03%|+0.06%|+0.07%|+0.00%|-0.02%|-0.03%|+0.02%|
In InstCombinePHI, currently the only use of the PHI being an icmp is
checked as a requirement for reducing a value when it is known non-zero
(isKnownNonZero). However, this can be extended to include or(icmp).
This is always valid, as OR only adds bits and we are checking against 0.
There is a combine in instcombine that will look for phi cycles that only have
a single incoming value:
```
%0 = phi i64 [ %3, %exit ], [ %othervalue, %preheader ]
%3 = phi i64 [ %0, %body ], [ %othervalue, %body2 ]
```
This currently doesn't handle the case where %othervalue is itself a phi,
though, as the algorithm will recurse into that phi and fail due to multiple
incoming values. This adjusts the algorithm so that the initial value does
not need to be found immediately; it can instead be set from one of the phis
that would otherwise fail due to having multiple incoming values.
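For illustration, a sketch (hypothetical names) of the previously rejected case, where the single incoming value is itself a phi:
```
%other = phi i64 [ %x, %pre1 ], [ %y, %pre2 ]
%0 = phi i64 [ %3, %exit ], [ %other, %preheader ]
%3 = phi i64 [ %0, %body ], [ %other, %body2 ]
; the cycle {%0, %3} has the single incoming value %other,
; so both phis can be replaced by %other
```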
The icmp is currently folded into the phi only if both belong to the same BB.
This patch extends the fold beyond the BB.
We have seen scenarios where this seems to be beneficial.
Differential Revision: https://reviews.llvm.org/D157740
Set phi inputs to poison whenever we find a dead edge (either
during initial worklist population or the main InstCombine run),
instead of only doing this for successors of dead blocks.
This means that the phi operand is set to poison even for
critical edges without an intermediate block.
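For example (sketch with hypothetical names), if the edge from %bb1 to the phi's block is found to be dead, the corresponding input now becomes poison even though no intermediate block exists on that edge:
```
%p = phi i32 [ %x, %bb0 ], [ %y, %bb1 ]
==>
%p = phi i32 [ %x, %bb0 ], [ poison, %bb1 ]
```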
There are quite a few test changes, because the pattern is fairly
common in vectorizer output, for cases where we know the vectorized
loop will be entered.
Improve foldOpIntoPhi() for icmp operator to check if incoming PHI value can be replaced with constant based on implied condition.
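A sketch of the kind of case this enables (hypothetical names): the condition guarding an incoming block implies the icmp result for that incoming value, so it can be replaced with a constant:
```
entry:
  %c = icmp sgt i32 %x, 0
  br i1 %c, label %pos, label %join
pos:
  br label %join
join:
  %phi = phi i32 [ %x, %pos ], [ 0, %entry ]
  %cmp = icmp sgt i32 %phi, -1
; on the %pos edge, %c implies %x > 0 and thus %x > -1, so the
; incoming icmp result is true; the 0 incoming constant-folds to true
```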
Depends on D156619.
Differential Revision: https://reviews.llvm.org/D156620
Combine a binary operator of two phi nodes if, for each shared incoming
block, at least one of phi0's and phi1's incoming values is a specific
constant value, and that constant can be used to simplify the binary
operator.
For example:
```
%phi0 = phi i32 [0, %bb0], [%i, %bb1]
%phi1 = phi i32 [%j, %bb0], [0, %bb1]
%add = add i32 %phi0, %phi1
==>
%add = phi i32 [%j, %bb0], [%i, %bb1]
```
Fixes: https://github.com/llvm/llvm-project/issues/61137
Differential Revision: https://reviews.llvm.org/D145223
Relative to the previous attempt, this is rebased over the
InstSimplify fix in ac74e7a7806480a000c9a3502405c3dedd8810de,
which addresses the miscompile reported in PR58401.
-----
foldOpIntoPhi() currently only folds operations into the phi if all
but one of the operands constant-fold. The two exceptions to this are freeze
and select, where we allow more general simplification.
This patch makes foldOpIntoPhi() generally simplification based and
removes all the instruction-specific logic. We just try to simplify
the instruction for each operand, and for the (potentially) one
non-simplified operand, we move it into the new block with adjusted
operands.
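For illustration, a sketch (hypothetical names) of a case that simplification handles but constant folding alone does not: the zero incoming value simplifies the multiply even though %y is not a constant, and the one remaining case is moved into its incoming block:
```
%phi = phi i32 [ %a, %bb0 ], [ 0, %bb1 ]
%r = mul i32 %phi, %y
==>
; in %bb0: %r.0 = mul i32 %a, %y
%r = phi i32 [ %r.0, %bb0 ], [ 0, %bb1 ]
```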
This fixes https://github.com/llvm/llvm-project/issues/57448, which
was my original motivation for the change.
Differential Revision: https://reviews.llvm.org/D134954
Relative to the previous attempt, this adjusts simplification to
use the correct context instruction: We need to use the terminator
of the incoming block, not the original instruction.
-----
foldOpIntoPhi() currently only folds operations into the phi if all
but one of the operands constant-fold. The two exceptions to this are freeze
and select, where we allow more general simplification.
This patch makes foldOpIntoPhi() generally simplification based and
removes all the instruction-specific logic. We just try to simplify
the instruction for each operand, and for the (potentially) one
non-simplified operand, we move it into the new block with adjusted
operands.
This fixes https://github.com/llvm/llvm-project/issues/57448, which
was my original motivation for the change.
Differential Revision: https://reviews.llvm.org/D134954
The infinite loop seen on buildbots should be fixed by
11897708c0229c92802e747564e7c34b722f045f (assuming there are not
multiple infinite combine loops...)
-----
foldOpIntoPhi() currently only folds operations into the phi if all
but one of the operands constant-fold. The two exceptions to this are freeze
and select, where we allow more general simplification.
This patch makes foldOpIntoPhi() generally simplification based and
removes all the instruction-specific logic. We just try to simplify
the instruction for each operand, and for the (potentially) one
non-simplified operand, we move it into the new block with adjusted
operands.
This fixes https://github.com/llvm/llvm-project/issues/57448, which
was my original motivation for the change.
Differential Revision: https://reviews.llvm.org/D134954
Reapply with a fix for the case where an operand simplified back
to the original phi: We need to map this case to the new phi node.
-----
foldOpIntoPhi() currently only folds operations into the phi if all
but one of the operands constant-fold. The two exceptions to this are freeze
and select, where we allow more general simplification.
This patch makes foldOpIntoPhi() generally simplification based and
removes all the instruction-specific logic. We just try to simplify
the instruction for each operand, and for the (potentially) one
non-simplified operand, we move it into the new block with adjusted
operands.
This fixes https://github.com/llvm/llvm-project/issues/57448, which
was my original motivation for the change.