This patch tries to canonicalize the pattern `Overflow | icmp pred Res,
C2` into `Overflow | icmp pred X, C2 +/- C1`, where `Overflow` and `Res`
are return values of `xxx.with.overflow X, C1`.
Alive2: https://alive2.llvm.org/ce/z/PhR_3SFixes#75360.
When I originally added this fold, it did not actually fix my
motivation case, where the add was represented as an or. Now that
we have the disjoint flag this can finally be cleanly supported.
We were able to fold
or (mul X, Y), X --> mul X, (add Y, 1) (when the multiply has no common
bits with X)
This patch makes the transform work if the mul operands are commuted.
Part of the "RemoveDIs" project to remove debug intrinsics requires
passing block-positions around in iterators rather than as instruction
pointers, allowing some debug-info to reside in BasicBlock::iterator.
This means getInsertionPointAfterDef has to return an iterator, and as
it can return no-instruction that means returning an optional iterator.
This patch changes the signature for getInsertionPtAfterDef and then
patches up the various places that use it to handle the different type.
This would overall be an NFC patch, however in
InstCombinerImpl::freezeOtherUses I've started skipping any debug
intrinsics at the returned insert-position. This should not have any
_meaningful_ effect on the compiler output: at worst it means variable
assignments that are skipped will now cover the freeze instruction and
anything inserted before it, which should be inconsequential.
Sadly: this makes the function signature ugly. This is probably the
ugliest piece of fallout for the "RemoveDIs" work, but it serves the
overall purpose of improving compile times and not allowing `-g` to
affect compiler output, so should be worthwhile in the end.
This is nearly an NFC, the only change is potentially to order that
values are created/names.
Otherwise it is a slight speed boost/simplification to avoid having to
go through the `getFreelyInverted` recursive logic twice to simplify
the extra `not` op.
With the current logic of `if(isFreeToInvert(Op)) return Not(Op)` its
fairly easy to either 1) cause regressions or 2) infinite loops
if the folds we have for `Not(Op)` ever de-sync with the cases we
know are freely invertible.
This patch adds `getFreeInverted` which is able to build the free
inverted op along with check for free inversion to alleviate this
problem.
If there are two 'or' instructions concat variables in opposite order
and the first 'or' dominates the second one, the second 'or' can be
optimized to fshl to rotate shift first 'or'. This can eliminate an shl
and expose more optimization opportunity for bswap/bitreverse.
(icmp eq/ne (lshr x, C), (lshr y, C) gets optimized to `(icmp
ult/uge (xor x, y), (1 << C)`. This can cause the current equal by
parts detection to miss the high-bits as it may get optimized to the
new pattern.
This commit adds support for detecting / combining the ult/ugt
pattern.
Closes#69884
Base - Offset == 0 will get canonicalized to Base == Offset even
in multi-use contexts, at which point all of these patterns already
get handled by generic code.
This patch canonicalizes the pattern `(X +/- Y) & Y` into `~X & Y` when `Y` is a power of 2 or zero.
It will reduce the patterns to match in #67836 and exploit more optimization opportunities.
Alive2: https://alive2.llvm.org/ce/z/LBpvRF
This patch canonicalizes the pattern `and(zext(A), B)` into `select A, B
& 1, 0`. Thus, we can reuse transforms `select B == even, B & 1, 0 -> 0`
and `select B == odd, B & 1, 0 -> zext(B == odd)` in `InstCombine`.
It is an alternative to #66676.
Alive2: https://alive2.llvm.org/ce/z/598phEFixes#66733.
Fixes#66606.
Fixes#28612.
Technically increases the number of instructions if the
result isn't cast back to float. Even in this case it's
still probably a better canonical form since it enables FP value
tracking.
https://reviews.llvm.org/D151939
In the past we sort of pretended float might be implementable
as a non-IEEE type but that never realistically would work. Exotic
FP types would need to be added to the IR. Turning these
into FP operations enables FP tracking optimizations.
https://reviews.llvm.org/D151937
This is a resurrection of D18874. This was previously wrong with
fneg conflated with fsub, but we now have a proper fneg instruction.
Additionally, I think it is now clearer that IR float=IEEE float,
and a different bit layout would require adding a different IR type.
https://reviews.llvm.org/D151934
This extends foldCastedBitwiseLogic to handle the similar cases.
I have recently submitted a patch to implement a single fold like:
(A > 0) | (A < 0) -> zext (A != 0)
But it is not general enough, and some problems like
a < b & a >= b - 1 happen again.
So I generalize this fold by matching the pattern
bitwise(A >> C - 1, zext(icmp)), and replace A >> C - 1 with
zext(A < 0) here. (C is the scalar size bits of the type of A.)
Then we get bitwise(zext(A < 0), zext(icmp)), this will be folded
by original code in foldCastedBitwiseLogic, into
zext(bitwise(A < 0, icmp)). And finally, any related icmp fold will
be automatically implemented because bitwise(icmp,icmp) had been
implemented.
The proof of the correctness is obvious, because the folds below
were previously proved and implemented.
A >> C - 1 -> zext(A < 0)
bitwise(zext(A), zext(B)) -> zext(bitwise(A, B))
And the fold of this patch is the combination of folds above.
Fixes https://github.com/llvm/llvm-project/issues/63751.
Differential Revision: https://reviews.llvm.org/D154791
The underlying copyIRFlags() API accepts arbitrary values and can
work with flags on operators (i.e. instructions or constant
expressions). Remove the arbitrary limitation that the
CreateWithCopiedFlags() API imposes, so we can directly pass through
values matched by PatternMatch, which can be constant expressions.
The attached test case works fine now, but would crash with an
upcoming change to not produce and constant expressions.
If the and/or operand is an immediate constant, it will get folded
away anyway. Don't try to freely invert those operands.
A particularly degenerate case of this arises when both operands
are constant and the result is a constant, in which case we try
to invert users of a constant, resulting in an assertion failure.
Fixes https://github.com/llvm/llvm-project/issues/63791.
[InstCombine] Transform (A > 0) | (A < 0) -> zext (A != 0) fold
This extends **foldCastedBitwiseLogic** to handle the similar cases.
Actually, for `(A > B) | (A < B)`, when B != 0, it can be optimized to `zext( A != B )` by **foldAndOrOfICmpsUsingRanges**.
However, when B = 0, **transformZExtICmp** will transform `zext(A < 0) to i32` into `A << 31`,
which cannot be optimized by **foldAndOrOfICmpsUsingRanges**.
Because I'm new to LLVM and has no concise knowledge about how LLVM decides the order of optimization,
I choose to extend **foldCastedBitwiseLogic** to fold `( A << (X - 1) ) | ((A > 0) zext to iX) -> (A != 0) zext to iX`.
And the equivalent fold follows:
```
A << (X - 1) ) | ((A > 0) zext to iX
-> A < 0 | A > 0
-> (A != 0) zext to iX
```
It's proved by [[https://alive2.llvm.org/ce/z/33HzjE|alive-tv]]
Related issue:
[[https://github.com/llvm/llvm-project/issues/62586 | (a > b) | (a < b) is not simplified only for the case b=0 ]]
Reviewed By: goldstein.w.n
Differential Revision: https://reviews.llvm.org/D154126
Fold
binop(shift(ShiftedC1, ShAmt), shift(ShiftedC2, add(ShAmt, AddC)))
->
shift(binop(ShiftedC1, shift(ShiftedC2, AddC)), ShAmt)
where both shifts are the same and AddC is a valid shift amount.
Proofs: https://alive2.llvm.org/ce/z/PhVVeg
Differential Revision: https://reviews.llvm.org/D152927
If `Mask` and `Amt` are not constants and `binop1` and `binop2` are
the same we can transform to:
`(binop (lshift (binop X, Y), Amt), Mask)`
If `binop` is `add`, `lshift` must be `shl`.
If `Mask` and `Amt` are constants `C` and `C1` respectively.
We can transform to:
`(lshift1 (binop1 (binop2 X, (inv_lshift1 C, C1), Y)), C1)`
Saving an instruction IFF:
`lshift1` is same opcode as `lshift2`
Either `bitwise1` and/or `bitwise2` is `and`.
Proofs(1/2): https://alive2.llvm.org/ce/z/BjN-m_
Proofs(2/2): https://alive2.llvm.org/ce/z/bZn5QB
This is to help fix the regression caused in D151807
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D152568
This patch add another icmp fold for -1 case.
This fixes https://github.com/llvm/llvm-project/issues/62311,
where we want instcombine to merge all compare intructions together so
later passes like simplifycfg and slpvectorize can better optimize this
chained comparison.
Reviewed By: goldstein.w.n
Differential Revision: https://reviews.llvm.org/D151660
canonicalizeLogicFirst reorders logic op / math op for suitable
constants, and this commit makes this function pass through
nsw/nuw flags on the Add.
Differential Revision: https://reviews.llvm.org/D147568
This is motivated by patterns like !isfinite || zero. The AMDGPU math
libraries have a lot of patterns like this, and I'm trying to fix the
code to be more portable and less dependent on directly calling class
intrinsics.
I believe this is the first place where new is.fpclass calls are
introduced. There are more class-like compares that could be
recognized; this is a set I currently care about plus a few extras.
Keep the == 0 cases disabled for now. It depends on the denormal
mode. If we just check IEEE mode now, it will break my use case
without another patch I'm working on.
https://alive2.llvm.org/ce/z/2iC4oB
This is similar to changes made for zext + lshr:
21d3871b7c90
6c39a3aae1dc
The existing fold did not account for extra uses, so we
see some instruction count reductions in the test diffs.
This is intended to improve analysis (icmp likely has more
transforms than any other opcode), make other transforms
more symmetric with zext/lshr, and it can be inverted
in codegen if profitable.
As with the earlier changes, there is potential to uncover
infinite combine loops, but I have not found any yet.