724 Commits

Author SHA1 Message Date
Yingwei Zheng
caa2258250
[LLVM] Remove nuw neg (#86295)
This patch removes the APIs that create NUW neg. It is a trivial case
because `sub nuw 0, X` always gets simplified into zero.
I believe there are no optimization opportunities in real-world
applications where we could take advantage of the nuw flag.
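
A minimal sketch of why the case is trivial, assuming an 8-bit width and using `None` as a stand-in for poison (both are illustration choices, not part of the patch): `0 - X` wraps in unsigned arithmetic for every `X` except zero, so the only defined result is zero.

```python
BITS = 8  # assumed width for the illustration
MASK = (1 << BITS) - 1


def sub_nuw_zero(x):
    """Model `sub nuw 0, x`; None stands in for poison."""
    wide = 0 - x
    if wide < 0:
        return None  # the unsigned subtraction wrapped, so nuw yields poison
    return wide & MASK


def defined_results():
    """Every non-poison result over all BITS-bit inputs."""
    return {sub_nuw_zero(x) for x in range(MASK + 1)} - {None}
```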

Motivated by
https://github.com/llvm/llvm-project/pull/84792#discussion_r1524891134.

Compile-time improvement:
https://llvm-compile-time-tracker.com/compare.php?from=d1f182c895728d89c5c3d198b133e212a5d9d4a3&to=da7b7478b7cbb32c09d760f6b8d0e67901e0d533&stat=instructions:u
2024-03-26 20:56:16 +08:00
Yingwei Zheng
9eb399b854
[InstCombine] Support zext nneg in foldLogicCastConstant (#82355)
This patch extends [D36234](https://reviews.llvm.org/D36234) to handle
`zext nneg` instructions.
I found this while adding support for cast instructions in
`getFreelyInvertedImpl`.
2024-02-20 23:09:00 +08:00
SahilPatidar
4b483ecd55
[InstCombine] Fix failure to fold (and %x, (sext i1 %m)) -> (select %m, %x, 0) with multiple uses of %m (#81409)
Resolves #81288.
2024-02-19 11:07:16 +01:00
Eikansh Gupta
db870cfc9e [InstCombine] Extract helper from matchFunnelShift (NFC)
The matchFunnelShift function was both doing the pattern matching and
creating the fshl/fshr instruction if needed. This moves the pattern
matching code into the function convertOrOfShiftsToFunnelShift so that
it can be reused for other optimizations.
2024-02-16 16:46:41 +01:00
Yingwei Zheng
470c5b8011
[InstSimplify][InstCombine] Remove unnecessary m_c_* matchers. (#81712)
This patch removes unnecessary `m_c_*` matchers since we always
canonicalize `commutative_op Cst, X` into `commutative_op X, Cst`.

Compile-time impact:
https://llvm-compile-time-tracker.com/compare.php?from=bfc0b7c6891896ee8e9818f22800472510093864&to=d27b058bb9acaa43d3cadbf3cd889e8f79e5c634&stat=instructions:u
2024-02-14 16:40:36 +08:00
Nikita Popov
074f7c2235 [InstCombine] Remove redundant fold (NFCI)
This has been subsumed by simplifyAndOrWithOpReplaced().
2024-02-12 09:54:32 +01:00
Nikita Popov
35d6ae8110
[InstCombine] Handle multi-use in simplifyAndOrWithOpReplaced() (#81006)
Slightly generalize simplifyAndOrWithOpReplaced() by allowing it to
perform simplifications (without creating new instructions) in multi-use
cases. This way we can remove existing patterns without worrying about
multi-use edge cases.

I've opted to change the general way the implementation works to be more
similar to the standard simplifyWithOpReplaced(). We perform the operand
replacement generically, and then try to simplify the result or create a
new instruction if we're allowed to do so.
2024-02-08 09:44:51 +01:00
Yingwei Zheng
65bf93dd7b
[InstCombine] Clean up bitwise folds without one-use check (#80587)
This patch removes some bitwise folds that fail to check the one-use
constraint on the operands.
See also the comments
https://github.com/llvm/llvm-project/pull/77231#issuecomment-1904090035.
2024-02-08 01:15:05 +08:00
Yingwei Zheng
f37d81f8a3
[PatternMatch] Add a matching helper m_ElementWiseBitCast. NFC. (#80764)
This patch introduces a matching helper `m_ElementWiseBitCast`, which is
used for matching element-wise int <-> fp casts.
The motivation of this patch is to avoid duplicating checks in
https://github.com/llvm/llvm-project/pull/80740 and
https://github.com/llvm/llvm-project/pull/80414.
2024-02-07 21:02:13 +08:00
Yingwei Zheng
4858e9c9fe
[InstCombine] Canonicalize the fcmp range check idiom into fabs + fcmp (#76367)
This patch canonicalizes the fcmp range check idiom into `fabs + fcmp`
since the canonicalized form is better than the original form for the
backends.
Godbolt: https://godbolt.org/z/x3eqPb1fz
```
and (fcmp olt/ole/ult/ule x, C), (fcmp ogt/oge/ugt/uge x, -C) --> fabs(x) olt/ole/ult/ule C
or  (fcmp ogt/oge/ugt/uge x, C), (fcmp olt/ole/ult/ule x, -C) --> fabs(x) ogt/oge/ugt/uge C
```
Alive2: https://alive2.llvm.org/ce/z/MRtoYq
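
A small sketch of the `and` case with ordered predicates, using Python floats (doubles) as a stand-in for IR floating-point values. Python comparisons involving NaN are false, which matches ordered `fcmp` predicates. The sample values and the choice of `C` are assumptions for illustration; any extra conditions the patch places on `C` are not modelled here.

```python
import math


def range_check(x, c):
    """and (fcmp olt x, C), (fcmp ogt x, -C)"""
    return x < c and x > -c


def fabs_check(x, c):
    """fcmp olt fabs(x), C"""
    return math.fabs(x) < c


# Assumed sample points, including signed zeros, infinities and NaN.
SAMPLES = [0.0, -0.0, 0.5, -0.5, 1.0, -1.0, 2.0, -2.0,
           math.inf, -math.inf, math.nan]


def forms_agree(c=1.0):
    """True if both forms give the same answer on every sample."""
    return all(range_check(x, c) == fabs_check(x, c) for x in SAMPLES)
```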
2024-02-07 04:33:26 +08:00
elhewaty
2614672cc1
[InstCombine] Fold ((cst << x) & 1) --> x == 0 when cst is odd (#79772)
Fold ((cst << x) & 1) to zext(x == 0) when cst is odd.

Fixes: https://github.com/llvm/llvm-project/issues/73384
Alive2: https://alive2.llvm.org/ce/z/5RbaK6
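
The fold can be checked exhaustively on a small width. This sketch assumes 8-bit values and shift amounts below the bit width (larger shifts are poison in IR and are not modelled).

```python
BITS = 8  # assumed width for the illustration
MASK = (1 << BITS) - 1


def shifted_low_bit(cst, x):
    """((cst << x) & 1) on BITS-bit values, with x < BITS."""
    return ((cst << x) & MASK) & 1


def fold_holds():
    """Compare against zext(x == 0) for every odd constant."""
    return all(shifted_low_bit(cst, x) == int(x == 0)
               for cst in range(1, MASK + 1, 2)
               for x in range(BITS))
```

An even constant breaks the equivalence at `x == 0`, which is why the fold requires `cst` to be odd.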
2024-02-05 16:27:53 +01:00
Yingwei Zheng
f2816ff60c
[InstCombine] Simplify and/or by replacing operands with constants (#77231)
This patch tries to simplify `X | Y` by replacing occurrences of `Y` in
`X` with 0. Similarly, it tries to simplify `X & Y` by replacing
occurrences of `Y` in `X` with -1.
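
A rough sketch of the idea, assuming `X` is the bitwise expression `A ^ Y` on 4-bit values (the choice of `X` and the width are illustrative only). In `X | Y`, every bit where `Y` is 1 is already 1 in the result, so only the bits where `Y` is 0 matter, and there `Y` can be replaced by 0. The `and` case is symmetric with -1.

```python
BITS = 4  # assumed width for the illustration
MASK = (1 << BITS) - 1


def x_expr(a, y):
    """A sample X that mentions Y: X = A ^ Y."""
    return (a ^ y) & MASK


def or_fold_holds():
    """X | Y is unchanged when Y inside X is replaced by 0."""
    return all((x_expr(a, y) | y) == (x_expr(a, 0) | y)
               for a in range(MASK + 1) for y in range(MASK + 1))


def and_fold_holds():
    """X & Y is unchanged when Y inside X is replaced by -1 (all ones)."""
    return all((x_expr(a, y) & y) == (x_expr(a, MASK) & y)
               for a in range(MASK + 1) for y in range(MASK + 1))
```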

Alive2: https://alive2.llvm.org/ce/z/cNjDTR
Note: As the current implementation is too conservative in the one-use
checks, I cannot remove other existing hard-coded simplifications if
they involve more than two instructions (e.g., `A & ~(A ^ B) --> A &
B`).

Compile-time impact:
http://llvm-compile-time-tracker.com/compare.php?from=a085402ef54379758e6c996dbaedfcb92ad222b5&to=9d655c6685865ffce0ad336fed81228f3071bd03&stat=instructions%3Au

|stage1-O3|stage1-ReleaseThinLTO|stage1-ReleaseLTO-g|stage1-O0-g|stage2-O3|stage2-O0-g|stage2-clang|
|--|--|--|--|--|--|--|
|+0.01%|-0.00%|+0.00%|-0.02%|+0.01%|+0.02%|-0.01%|

Fixes #76554.
2024-01-31 14:30:55 +08:00
Yingwei Zheng
9acc404230
[InstCombine] Recognize more rotation patterns (#78107)
InstCombine already handles the pattern `(shl ShVal, (X & (Width - 1)))
| (lshr ShVal, ((-X) & (Width - 1)))`. Under certain circumstances, `X &
(Width - 1)` will be simplified to `X`. Therefore, this patch adds
support for the pattern `(shl ShVal, X) | (lshr ShVal, ((-X) & (Width -
1)))`.
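
A brute-force sketch of the new pattern, assuming `Width` is 8 and `X` is already known to be below `Width` (the situation in which the mask on the `shl` amount gets simplified away):

```python
BITS = 8  # assumed Width for the illustration
MASK = (1 << BITS) - 1


def rotl(v, x):
    """Reference rotate-left on BITS-bit values."""
    x %= BITS
    return ((v << x) | (v >> (BITS - x))) & MASK


def unmasked_pattern(v, x):
    """(shl v, x) | (lshr v, (-x) & (BITS - 1)), with x < BITS."""
    return ((v << x) & MASK) | (v >> ((-x) & (BITS - 1)))


def pattern_is_rotate():
    """True if the pattern equals a rotate for every value and amount."""
    return all(unmasked_pattern(v, x) == rotl(v, x)
               for v in range(MASK + 1) for x in range(BITS))
```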

Alive2: https://alive2.llvm.org/ce/z/P7JQ2V
2024-01-18 20:29:53 +08:00
Noah Goldstein
60e8915d22 [InstCombine] Add folds for (add/sub/disjoint_or/icmp C, (ctpop (not x)))
`(ctpop (not x))` <-> `(sub nuw nsw BitWidth(x), (ctpop x))`. The
`sub` expression can sometimes be constant folded depending on the use
case of `(ctpop (not x))`.

This patch adds folds for the following cases:

`(add/sub/disjoint_or C, (ctpop (not x)))`
    -> `(add/sub/disjoint_or C', (ctpop x))`
`(cmp pred C, (ctpop (not x)))`
    -> `(cmp swapped_pred C', (ctpop x))`

Where `C'` depends on how we constant fold `C` with `BitWidth(x)` for
the given opcode.
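
A small sketch of the underlying identity and of one comparison case, assuming 8-bit values. The concrete choice `C' = BitWidth - C` for `ult` is worked out here for illustration and is not taken from the patch.

```python
BITS = 8  # assumed BitWidth(x) for the illustration
MASK = (1 << BITS) - 1


def ctpop(x):
    """Population count of a non-negative integer."""
    return bin(x).count("1")


def ctpop_not(x):
    """ctpop (not x) on BITS-bit values."""
    return ctpop(~x & MASK)


def identity_holds():
    """ctpop(~x) == BitWidth - ctpop(x) for every x."""
    return all(ctpop_not(x) == BITS - ctpop(x) for x in range(MASK + 1))


def icmp_fold_holds():
    """icmp ult C, ctpop(~x)  versus  icmp ugt C', ctpop(x), C' = BITS - C."""
    return all((c < ctpop_not(x)) == ((BITS - c) > ctpop(x))
               for c in range(BITS + 1) for x in range(MASK + 1))
```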

Proofs: https://alive2.llvm.org/ce/z/qUgfF3

Closes #77859
2024-01-15 12:05:38 -08:00
Yingwei Zheng
29f98d6c25
[InstCombine] Fold bitwise logic with intrinsics (#77460)
This patch does the following folds:
```
bitwise(fshl (A, B, ShAmt), fshl(C, D, ShAmt)) -> fshl(bitwise(A, C), bitwise(B, D), ShAmt)
bitwise(fshr (A, B, ShAmt), fshr(C, D, ShAmt)) -> fshr(bitwise(A, C), bitwise(B, D), ShAmt)
bitwise(bswap(A), bswap(B)) -> bswap(bitwise(A, B))
bitwise(bswap(A), C) -> bswap(bitwise(A, bswap(C)))
bitwise(bitreverse(A), bitreverse(B)) -> bitreverse(bitwise(A, B))
bitwise(bitreverse(A), C) -> bitreverse(bitwise(A, bitreverse(C)))
```
Alive2: https://alive2.llvm.org/ce/z/iZN_TL
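
These folds work because `bswap` and `bitreverse` only permute bits, so they commute with bitwise logic. A sketch that checks the `bitreverse` cases exhaustively on i8 and the `bswap` cases on a sample of i16 values (the widths and the sampling stride are assumptions for illustration):

```python
import operator

OPS = (operator.and_, operator.or_, operator.xor)

# bitreverse on i8, as a lookup table.
REV8 = [int(f"{x:08b}"[::-1], 2) for x in range(256)]


def bswap16(x):
    """bswap on i16: swap the two bytes."""
    return ((x & 0xFF) << 8) | (x >> 8)


def bitreverse_folds_hold():
    """Check both bitreverse folds for and/or/xor on all i8 pairs."""
    return all(op(REV8[a], REV8[b]) == REV8[op(a, b)]
               and op(REV8[a], b) == REV8[op(a, REV8[b])]
               for op in OPS for a in range(256) for b in range(256))


def bswap_folds_hold():
    """Check both bswap folds for and/or/xor on sampled i16 pairs."""
    samples = range(0, 1 << 16, 251)
    return all(op(bswap16(a), bswap16(b)) == bswap16(op(a, b))
               and op(bswap16(a), b) == bswap16(op(a, bswap16(b)))
               for op in OPS for a in samples for b in samples)
```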
2024-01-10 19:33:18 +08:00
Yingwei Zheng
90802e652d
[InstCombine] Handle commuted cases of the fold ((B|C)&A)|B -> B|(A&C) (#76565)
Alive2: https://alive2.llvm.org/ce/z/Qdsqk6

The commit f1eda23514
didn't handle other cases that commute operands.
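
The identity itself can be confirmed by brute force. This sketch assumes 4-bit values; the commuted variants compute the same values, so only one operand order is spelled out.

```python
BITS = 4  # assumed width for the illustration
VALUES = range(1 << BITS)


def original(a, b, c):
    """((B | C) & A) | B"""
    return ((b | c) & a) | b


def folded(a, b, c):
    """B | (A & C)"""
    return b | (a & c)


def fold_holds():
    """True if both forms agree on every 4-bit triple."""
    return all(original(a, b, c) == folded(a, b, c)
               for a in VALUES for b in VALUES for c in VALUES)
```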
2023-12-29 23:58:58 +08:00
Yingwei Zheng
7a1a476116
[InstCombine] Fold (X & C1) | C2 into X & (C1 | C2) iff (X & C2) == C2 (#76470)
Alive2: https://alive2.llvm.org/ce/z/VKJYaS
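
A brute-force sketch of the condition, assuming 4-bit values. It checks that the two forms agree exactly when every bit of `C2` is already set in `X`.

```python
BITS = 4  # assumed width for the illustration
VALUES = range(1 << BITS)


def original(x, c1, c2):
    """(X & C1) | C2"""
    return (x & c1) | c2


def folded(x, c1, c2):
    """X & (C1 | C2)"""
    return x & (c1 | c2)


def condition_is_exact():
    """The forms are equal if and only if (X & C2) == C2."""
    return all((original(x, c1, c2) == folded(x, c1, c2)) == ((x & c2) == c2)
               for x in VALUES for c1 in VALUES for c2 in VALUES)
```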
2023-12-28 20:47:40 +08:00
Yingwei Zheng
0d454d6e59
[InstCombine] Fold xor of icmps using range information (#76334)
This patch folds xor of icmps into a single comparison using range-based reasoning as `foldAndOrOfICmpsUsingRanges` does.
Fixes #70928.
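
To illustrate the kind of range reasoning involved, here is a sketch with made-up constants on 8-bit values: `x u< 5` xor `x u< 10` is true exactly on the range [5, 10), which a single unsigned comparison on an offset value can express. The constants and the exact output form are assumptions, not taken from the patch.

```python
BITS = 8  # assumed width for the illustration
MASK = (1 << BITS) - 1


def xor_of_icmps(x):
    """(icmp ult x, 5) xor (icmp ult x, 10)"""
    return (x < 5) ^ (x < 10)


def single_icmp(x):
    """icmp ult (add x, -5), 5, with wrapping arithmetic"""
    return ((x - 5) & MASK) < 5


def forms_agree():
    """True if both forms agree on every 8-bit value."""
    return all(xor_of_icmps(x) == single_icmp(x) for x in range(MASK + 1))
```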
2023-12-25 07:14:31 +08:00
Yingwei Zheng
c59ea32f82
[InstCombine] Canonicalize icmp pred (X +/- C1), C2 into icmp pred X, C2 -/+ C1 with nowrap flag implied by with.overflow intrinsic (#75511)
This patch tries to canonicalize the pattern `Overflow | icmp pred Res,
C2` into `Overflow | icmp pred X, C2 +/- C1`, where `Overflow` and `Res`
are return values of `xxx.with.overflow X, C1`.
Alive2: https://alive2.llvm.org/ce/z/PhR_3S

Fixes #75360.
2023-12-16 17:58:57 +08:00
Yingwei Zheng
9cf3e31172
[InstCombine] Explicitly fold ~(~X >>s Y) into X >>s Y (#75473)
Fixes #75369.

This patch explicitly folds `~(~X >>s Y)` into `X >>s Y` to fix the assertion failure in #75369.
2023-12-14 23:06:38 +08:00
Nikita Popov
6e8b17d821 [InstCombine] Support or disjoint in displaced shift fold
When I originally added this fold, it did not actually fix my
motivation case, where the add was represented as an or. Now that
we have the disjoint flag this can finally be cleanly supported.
2023-12-07 15:00:40 +01:00
Craig Topper
56248caa3b
[InstCombine] Explicitly set disjoint flag when converting xor to or. (#74229)
2023-12-06 09:41:59 -08:00
Nikita Popov
a1b9736e9b [PatternMatch] Add m_c_DisjointOr (NFC)
Add commutative variant of m_DisjointOr.
2023-12-06 14:05:02 +01:00
Craig Topper
3e7ca05e93
[InstCombine] Use disjoint flag instead of calling haveNoCommonBitsSet. (#74222)
2023-12-03 12:34:49 -08:00
Craig Topper
5db1c6ed48 Revert "[InstCombine] Fix missed opportunity to fold 'or' into 'mul' operand. (#74225)"
This reverts commit e3b3c91dd0bbc8bd6f1ee562641daf1e554eb1b6.

This is causing an infinite loop on stage 2 builds.
2023-12-03 01:50:44 -08:00
Craig Topper
e3b3c91dd0
[InstCombine] Fix missed opportunity to fold 'or' into 'mul' operand. (#74225)
We were able to fold
or (mul X, Y), X --> mul X, (add Y, 1) (when the multiply has no common
bits with X)
    
This patch makes the transform work if the mul operands are commuted.
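
A brute-force sketch of the arithmetic behind the fold, assuming 8-bit wrapping values: when `mul X, Y` shares no set bits with `X`, the `or` behaves like an `add`, and `X*Y + X` is `X*(Y + 1)`.

```python
BITS = 8  # assumed width for the illustration
MASK = (1 << BITS) - 1


def fold_holds():
    """Check or (mul X, Y), X == mul X, (add Y, 1) whenever the bits are disjoint."""
    for x in range(MASK + 1):
        for y in range(MASK + 1):
            prod = (x * y) & MASK
            if prod & x:
                continue  # common bits set: the fold does not apply
            if (prod | x) != (x * ((y + 1) & MASK)) & MASK:
                return False
    return True
```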
2023-12-03 00:51:22 -08:00
Nikita Popov
93636581d3 [InstCombiner] Make isFreeToInvert() and friends instance functions (NFC)
This makes it possible to use SQ inside of them. There doesn't seem
to be any strong need for these to be static.
2023-12-01 15:40:12 +01:00
Jeremy Morse
2425e2940e
[DebugInfo][RemoveDIs] Have getInsertionPtAfterDef return an iterator (#73149)
Part of the "RemoveDIs" project to remove debug intrinsics requires
passing block-positions around in iterators rather than as instruction
pointers, allowing some debug-info to reside in BasicBlock::iterator.
This means getInsertionPointAfterDef has to return an iterator, and as
it can return no-instruction that means returning an optional iterator.

This patch changes the signature for getInsertionPtAfterDef and then
patches up the various places that use it to handle the different type.
This would overall be an NFC patch, however in
InstCombinerImpl::freezeOtherUses I've started skipping any debug
intrinsics at the returned insert-position. This should not have any
_meaningful_ effect on the compiler output: at worst it means variable
assignments that are skipped will now cover the freeze instruction and
anything inserted before it, which should be inconsequential.

Sadly: this makes the function signature ugly. This is probably the
ugliest piece of fallout for the "RemoveDIs" work, but it serves the
overall purpose of improving compile times and not allowing `-g` to
affect compiler output, so should be worthwhile in the end.
2023-11-30 12:19:57 +00:00
Noah Goldstein
b7c0f79926 [InstCombine] Replace isFreeToInvert + CreateNot with getFreelyInverted
This is nearly an NFC; the only change is potentially to the order in
which values are created and named.

Otherwise it is a slight speed boost/simplification, since it avoids
going through the `getFreelyInverted` recursive logic twice to simplify
the extra `not` op.
2023-11-20 17:59:27 -06:00
Noah Goldstein
3039691f53 [InstCombine] add getFreeInverted to perform folds for free inversion of op
With the current logic of `if(isFreeToInvert(Op)) return Not(Op)` it's
fairly easy to cause either 1) regressions or 2) infinite loops
if the folds we have for `Not(Op)` ever de-sync with the cases we
know are freely invertible.

This patch adds `getFreeInverted`, which builds the freely inverted op
while checking for free inversion, to alleviate this problem.
2023-11-20 17:59:27 -06:00
HaohaiWen
95d584c6ac
[InstCombine] Convert or concat to fshl if opposite or concat exists (#68502)
If two 'or' instructions concatenate variables in opposite order
and the first 'or' dominates the second one, the second 'or' can be
optimized to an fshl that rotates the result of the first 'or'. This can
eliminate a shl and expose more optimization opportunities for bswap/bitreverse.
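
A sketch of the equivalence, assuming two 16-bit halves concatenated into a 32-bit value (the widths, helper names and sample values are illustrative):

```python
M32 = (1 << 32) - 1


def or_concat(hi, lo):
    """(zext(hi) << 16) | zext(lo) for 16-bit halves."""
    return ((hi << 16) | lo) & M32


def fshl32(a, b, s):
    """Funnel shift left on i32: high 32 bits of (a:b) << s, with s < 32."""
    concat = (a << 32) | b
    return ((concat << s) >> 32) & M32


def rotate_matches_opposite_concat():
    """fshl(or1, or1, 16) equals the concat with the halves swapped."""
    samples = range(0, 1 << 16, 4369)
    return all(fshl32(or_concat(hi, lo), or_concat(hi, lo), 16)
               == or_concat(lo, hi)
               for hi in samples for lo in samples)
```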
2023-11-20 13:12:55 +08:00
Noah Goldstein
ad9147399f [InstCombine] Improve eq/ne by parts to handle ult/ugt equality pattern.
`(icmp eq/ne (lshr x, C), (lshr y, C))` gets optimized to `(icmp
ult/uge (xor x, y), (1 << C))`. This can cause the current equal-by-parts
detection to miss the high bits, as they may get optimized to the
new pattern.

This commit adds support for detecting and combining the ult/ugt
pattern.
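
The rewrite of the high-bits comparison can be confirmed by brute force. This sketch assumes 6-bit values and covers the `eq` case; the `ne` case is its negation.

```python
BITS = 6  # assumed width for the illustration
VALUES = range(1 << BITS)


def high_bits_equal(x, y, c):
    """icmp eq (lshr x, C), (lshr y, C)"""
    return (x >> c) == (y >> c)


def xor_form(x, y, c):
    """icmp ult (xor x, y), (1 << C)"""
    return (x ^ y) < (1 << c)


def forms_agree():
    """True if both forms agree for every pair and shift amount."""
    return all(high_bits_equal(x, y, c) == xor_form(x, y, c)
               for x in VALUES for y in VALUES for c in range(BITS))
```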

Closes #69884
2023-11-04 19:00:28 -05:00
Nikita Popov
95e4ad3f0f [InstCombine] Remove redundant add+and fold (NFCI)
This is handling a special case of demanded bits simplification
(which has multi-use support for adds, so it's not applicable in
that case either).
2023-10-24 17:10:27 +02:00
Nikita Popov
b5c44564e5 [InstCombine] Remove redundant folds in foldCastedBitwiseLogic() (NFCI)
The vector sext limitation the comment talks about was removed
a long time ago, in https://reviews.llvm.org/D36213.
2023-10-24 17:06:39 +02:00
Nikita Popov
14b0ae439f [InstCombine] Remove redundant fold in foldUnsignedUnderflowCheck() (NFCI)
Base - Offset == 0 will get canonicalized to Base == Offset even
in multi-use contexts, at which point all of these patterns already
get handled by generic code.
2023-10-24 16:57:18 +02:00
HaohaiWen
8ff3e4f39b
[InstCombine] Refactor matchFunnelShift to allow more pattern (NFC) (#68474)
The current implementation of matchFunnelShift only allows the
opposite-shift pattern. Refactor it to allow more patterns.
2023-10-19 09:06:30 +08:00
Yingwei Zheng
8a7e547798
[InstCombine] Canonicalize (X +/- Y) & Y into ~X & Y when Y is a power of 2 (#67915)
This patch canonicalizes the pattern `(X +/- Y) & Y` into `~X & Y` when `Y` is a power of 2 or zero.
It will reduce the patterns to match in #67836 and exploit more optimization opportunities.
Alive2: https://alive2.llvm.org/ce/z/LBpvRF
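
A brute-force sketch, assuming 8-bit wrapping values: adding or subtracting a single-bit `Y` always flips that bit of `X`, so masking with `Y` afterwards yields the inverted bit.

```python
BITS = 8  # assumed width for the illustration
MASK = (1 << BITS) - 1
POWERS = [0] + [1 << i for i in range(BITS)]  # zero or a power of 2


def add_form(x, y):
    """(X + Y) & Y"""
    return ((x + y) & MASK) & y


def sub_form(x, y):
    """(X - Y) & Y"""
    return ((x - y) & MASK) & y


def canonical(x, y):
    """~X & Y"""
    return (~x & MASK) & y


def fold_holds():
    """True if both forms match the canonical one for every X and valid Y."""
    return all(add_form(x, y) == canonical(x, y) == sub_form(x, y)
               for x in range(MASK + 1) for y in POWERS)
```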
2023-10-12 17:18:12 +08:00
Yingwei Zheng
a7f962c007
[InstCombine] Canonicalize and(zext(A), B) into select A, B & 1, 0 (#66740)
This patch canonicalizes the pattern `and(zext(A), B)` into `select A, B
& 1, 0`. Thus, we can reuse transforms `select B == even, B & 1, 0 -> 0`
and `select B == odd, B & 1, 0 -> zext(B == odd)` in `InstCombine`.
It is an alternative to #66676. 
Alive2: https://alive2.llvm.org/ce/z/598phE
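
A tiny sketch of the canonicalization, assuming `A` is an i1 modelled as a Python bool and `B` is an 8-bit value:

```python
BITS = 8  # assumed width of B for the illustration


def and_of_zext(a, b):
    """and(zext(A), B) with A an i1."""
    return int(a) & b


def select_form(a, b):
    """select A, B & 1, 0"""
    return (b & 1) if a else 0


def fold_holds():
    """True if both forms agree for both values of A and every B."""
    return all(and_of_zext(a, b) == select_form(a, b)
               for a in (False, True) for b in range(1 << BITS))
```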
Fixes #66733.
Fixes #66606.
Fixes #28612.
2023-09-29 02:51:58 +08:00
Nikita Popov
6cd5eb1f54 [InstCombine] Avoid some uses of ConstantExpr::getZExt() (NFC)
Add helpers getLosslessUnsignedTrunc/getLosslessSignedTrunc for
this common pattern.
2023-09-28 17:02:33 +02:00
Yingwei Zheng
4c241a9335
[InstCombine] Fold (-1 + A) & B into A ? 0 : B where A is effectively a bool
Solves issue https://github.com/llvm/llvm-project/issues/63321.

This patch explicitly folds `(-1 + A) & B` into `A ? 0 : B`. An additional trunc will be created when `A` is neither i1 nor <N x i1>.

https://alive2.llvm.org/ce/z/pWv9jJ
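
A sketch of the arithmetic, assuming 8-bit values with `A` restricted to 0 or 1: `-1 + A` is all ones when `A` is 0 and zero when `A` is 1, so the `and` acts as a select.

```python
BITS = 8  # assumed width for the illustration
MASK = (1 << BITS) - 1


def original(a, b):
    """(-1 + A) & B with wrapping arithmetic."""
    return ((-1 + a) & MASK) & b


def folded(a, b):
    """A ? 0 : B"""
    return 0 if a else b


def fold_holds():
    """True if both forms agree for A in {0, 1} and every B."""
    return all(original(a, b) == folded(a, b)
               for a in (0, 1) for b in range(MASK + 1))
```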

Reviewed By: goldstein.w.n

Differential Revision: https://reviews.llvm.org/D153148
2023-09-24 19:10:47 +08:00
Marc Auberer
1f313034cb
[InstCombine] Remove unnecessary one-use-check (#66419)
This removes a one-use check that is actually unnecessary.

Alive2: https://alive2.llvm.org/ce/z/qEkUEf
Original patch: https://reviews.llvm.org/D159380
2023-09-15 06:46:30 +02:00
Noah Goldstein
2a904f456a [InstCombine] Rename some shadow variables; NFC
Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D159325
2023-09-13 15:50:18 -05:00
Yingwei Zheng
780b046bd0
[InstCombine] Use m_c_And/m_c_Or instead of duplicate logic. NFC.
See also https://reviews.llvm.org/D153148#inline-1535588
2023-09-10 23:34:23 +08:00
Marc Auberer
904ac6fe6b [InstCombine] Fold ((A&B)^A)|((A&B)^B) to A^B
Depends on D159379

((A & B) ^ A) | ((A & B) ^ B) -> A ^ B
(A ^ (A & B)) | (B ^ (A & B)) -> A ^ B
((A & B) ^ B) | ((A & B) ^ A) -> A ^ B
(B ^ (A & B)) | (A ^ (A & B)) -> A ^ B

Alive2: https://alive2.llvm.org/ce/z/i44xmq
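
The identity follows from `(A & B) ^ A == A & ~B` and `(A & B) ^ B == B & ~A`. A brute-force sketch on 4-bit values (the width is an assumption for illustration):

```python
BITS = 4  # assumed width for the illustration
VALUES = range(1 << BITS)


def original(a, b):
    """((A & B) ^ A) | ((A & B) ^ B)"""
    return ((a & b) ^ a) | ((a & b) ^ b)


def fold_holds():
    """True if the expression equals A ^ B for every pair."""
    return all(original(a, b) == a ^ b for a in VALUES for b in VALUES)
```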
Baseline tests: https://reviews.llvm.org/D159379

Reviewed By: huihuiz

Differential Revision: https://reviews.llvm.org/D159380
2023-09-07 17:33:06 -07:00
Matt Arsenault
70aede228a InstCombine: Recognize fneg(fabs) as bitcasted integer
Technically increases the number of instructions if the
result isn't cast back to float. Even in this case it's
still probably a better canonical form since it enables FP value
tracking.

https://reviews.llvm.org/D151939
2023-08-31 19:07:36 -04:00
Matt Arsenault
5c0da5839d InstCombine: Recognize fabs as bitcasted integer
In the past we sort of pretended float might be implementable
as a non-IEEE type but that never realistically would work. Exotic
FP types would need to be added to the IR. Turning these
into FP operations enables FP tracking optimizations.

https://reviews.llvm.org/D151937
2023-08-31 19:03:48 -04:00
Matt Arsenault
50a9b3d8a5 InstCombine: Recognize fneg when performed as bitcasted integer
This is a resurrection of D18874. This was previously wrong with
fneg conflated with fsub, but we now have a proper fneg instruction.
Additionally, I think it is now clearer that IR float=IEEE float,
and a different bit layout would require adding a different IR type.

https://reviews.llvm.org/D151934
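
A sketch of the integer patterns behind `fneg` and, from the two later commits listed above, `fabs` and `fneg(fabs)`, assuming IEEE-754 binary32 and using Python's `struct` module for the bitcasts. The helper names are made up for the illustration, and the exact masks matched by the patches are assumed to be the sign-bit masks.

```python
import struct

SIGN = 0x80000000  # sign bit of an IEEE-754 binary32 value


def to_bits(f):
    """bitcast float -> i32"""
    return struct.unpack("<I", struct.pack("<f", f))[0]


def from_bits(i):
    """bitcast i32 -> float"""
    return struct.unpack("<f", struct.pack("<I", i))[0]


def fneg_via_int(f):
    """xor with the sign bit flips the sign."""
    return from_bits(to_bits(f) ^ SIGN)


def fabs_via_int(f):
    """and with the inverted sign mask clears the sign."""
    return from_bits(to_bits(f) & ~SIGN & 0xFFFFFFFF)


def fneg_fabs_via_int(f):
    """or with the sign bit forces the sign on."""
    return from_bits(to_bits(f) | SIGN)
```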
2023-08-31 18:59:34 -04:00
XChy
8a0b2ca821 [InstCombine] Transform bitwise (A >> C - 1, zext(icmp)) -> zext (bitwise(A < 0, icmp))
This extends foldCastedBitwiseLogic to handle the similar cases.
I have recently submitted a patch to implement a single fold like:

(A > 0) | (A < 0) -> zext (A != 0)

But it is not general enough, and some problems like
a < b & a >= b - 1 happen again.

So this patch generalizes the fold by matching the pattern
bitwise(A >> C - 1, zext(icmp)) and replacing A >> C - 1 with
zext(A < 0). (C is the scalar size in bits of the type of A.)

That gives bitwise(zext(A < 0), zext(icmp)), which the existing code
in foldCastedBitwiseLogic folds into zext(bitwise(A < 0, icmp)).
Finally, any related icmp fold applies automatically because
bitwise(icmp, icmp) is already implemented.

Correctness follows because the folds below were previously proved
and implemented.
  A >> C - 1 -> zext(A < 0)
  bitwise(zext(A), zext(B)) -> zext(bitwise(A, B))
And the fold of this patch is the combination of folds above.
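
A sketch of the two building blocks on 8-bit values, assuming the shift is a logical shift right and comparisons are signed. It also checks the motivating example `(A > 0) | (A < 0) -> zext(A != 0)` in its shifted form.

```python
BITS = 8  # assumed scalar size C for the illustration
VALUES = range(1 << BITS)


def to_signed(a):
    """Interpret a BITS-bit pattern as a signed integer."""
    return a - (1 << BITS) if a >> (BITS - 1) else a


def sign_shift_is_zext():
    """A >> (C - 1) == zext(A < 0) for a logical shift."""
    return all((a >> (BITS - 1)) == int(to_signed(a) < 0) for a in VALUES)


def composite_holds():
    """(A >> (C - 1)) | zext(A > 0) == zext(A != 0)"""
    return all(((a >> (BITS - 1)) | int(to_signed(a) > 0)) == int(a != 0)
               for a in VALUES)
```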

Fixes https://github.com/llvm/llvm-project/issues/63751.

Differential Revision: https://reviews.llvm.org/D154791
2023-07-24 13:04:32 +02:00
Nikita Popov
218f97578b [IR] Accept non-Instruction in BinaryOperator::CreateWithCopiedFlags() (NFC)
The underlying copyIRFlags() API accepts arbitrary values and can
work with flags on operators (i.e. instructions or constant
expressions). Remove the arbitrary limitation that the
CreateWithCopiedFlags() API imposes, so we can directly pass through
values matched by PatternMatch, which can be constant expressions.

The attached test case works fine now, but would crash with an
upcoming change to not produce `and` constant expressions.
2023-07-21 10:05:52 +02:00
Dhruv Chawla
20ae2d200d
[InstCombine] Generalize foldAndOrOfICmpEqZeroAndICmp
This patch generalizes the folds implemented by foldAndOrOfICmpEqZeroAndICmp,
which are:

(icmp eq X, 0) | (icmp ult Other, X) -> (icmp ule Other, X-1)
(icmp ne X, 0) & (icmp uge Other, X) -> (icmp ugt Other, X-1)

to the following:

(icmp eq X, C) | (icmp ult Other, (X - C)) -> (icmp ule Other, (X - (C + 1)))
(icmp ne X, C) & (icmp uge Other, (X - C)) -> (icmp ugt Other, (X - (C + 1)))

The function foldAndOrOfICmpEqZeroAndICmp is also renamed to
foldAndOrOfICmpEqConstantAndICmp to reflect the changes.
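
A brute-force sketch of the generalized folds, assuming 4-bit unsigned values with wrapping subtraction. Setting `C` to 0 recovers the original folds.

```python
BITS = 4  # assumed width for the illustration
MASK = (1 << BITS) - 1
VALUES = range(MASK + 1)


def or_fold_holds():
    """(X == C) | (Other u< (X - C))  ==  Other u<= (X - (C + 1))"""
    return all(((x == c) or (other < ((x - c) & MASK)))
               == (other <= ((x - (c + 1)) & MASK))
               for x in VALUES for c in VALUES for other in VALUES)


def and_fold_holds():
    """(X != C) & (Other u>= (X - C))  ==  Other u> (X - (C + 1))"""
    return all(((x != c) and (other >= ((x - c) & MASK)))
               == (other > ((x - (c + 1)) & MASK))
               for x in VALUES for c in VALUES for other in VALUES)
```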

Proofs: https://alive2.llvm.org/ce/z/yXGv6q

Fixes #63749.

Differential Revision: https://reviews.llvm.org/D154937
2023-07-12 11:13:37 +05:30