4965 Commits

Author SHA1 Message Date
Sanjay Patel
6fedc6a2b4 Revert "[InstCombine] add narrowing transform for low-masked binop with zext operand"
This reverts commit afa192cfb6049a15c5542d132d500b910b802c74.
This can cause an infinite loop as shown with an example in the
post-commit thread.
2022-06-10 08:25:10 -04:00
David Sherwood
8daaea206b [InstCombine] Use +0.0 instead of -0.0 as the FP identity for some folds
In foldSelectIntoOp we sometimes transform a select of a fadd into a
fadd of a select, where we select between data and an identity value.
For both fadd and fsub the identity is always -0.0, but if the nsz
flag is set on the select instruction we can use +0.0 instead. Doing
so then triggers other optimisations, such as when folding the select
of masked load into a new masked load.

Differential Revision: https://reviews.llvm.org/D126774
2022-06-10 12:42:34 +01:00
chenglin.bi
de7a6ae1ff [InstCombine] Optimize shl+lshr+and conversion pattern
if `C1` and `C3` are pow2 and `Log2(C3)+C2 < BitWidth`:
    ((C1 << X) >> C2) & C3 -> X == (Log2(C3)+C2-Log2(C1)) ? C3 : 0;

https://alive2.llvm.org/ce/z/Pus5bd

Fix issue https://github.com/llvm/llvm-project/issues/55739

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D126617
2022-06-10 09:36:58 +08:00
Sanjay Patel
afa192cfb6 [InstCombine] add narrowing transform for low-masked binop with zext operand
https://alive2.llvm.org/ce/z/hRy3rE

As shown in D123408, we can produce this pattern when moving
cast around, and we already have a related fold for a binop
with a constant operand.
2022-06-09 16:59:26 -04:00
Simon Moll
b8c2781ff6 [NFC] format InstructionSimplify & lowerCaseFunctionNames
Clang-format InstructionSimplify and convert all "FunctionName"s to
"functionName".  This patch does touch a lot of files but gets done with
the cleanup of InstructionSimplify in one commit.

This is the alternative to the less invasive clang-format only patch: D126783

Reviewed By: spatel, rengolin

Differential Revision: https://reviews.llvm.org/D126889
2022-06-09 16:10:08 +02:00
Biplob Mishra
d87bfa9ad0 [InstCombine] Combine instructions of type or/and where AND masks can be combined.
The patch simplifies some of the patterns as below

(A | (B & C0)) | (B & C1) -> A | (B & C0|C1)
((B & C0) | A) | (B & C1) -> (B & C0|C1) | A

In some scenarios like byte reverse on half word, we can see this pattern multiple times and this conversion can optimize these patterns.

Additionally this commit fixes the issue reported with the test case.
int f(int a, int b) {
  int c = ((unsigned char)(a >> 23) & 925);
  if (a)
    c = (a >> 23 & b) | ((unsigned char)(a >> 23) & 925) | (b >> 23 & 157);
  return c;
}

The previous revision/commit did not check one-use of an intermediate value that this transform re-uses.
When that value has another use, an existing transform will try to invert the transform here.
By adding one-use checks, we avoid the infinite loops seen with the earlier commit.

Differential Revision: https://reviews.llvm.org/D124119
2022-06-09 10:58:30 +01:00
Chenbing Zheng
38992d2c5e [InstCombine] improve fold for icmp-ugt-ashr
Existing condition for
fold icmp ugt (ashr X, ShAmtC), C --> icmp ugt X, ((C + 1) << ShAmtC) - 1
missed some boundary. It cause this fold don't work for some cases, and the
reason is due to signed number overflow.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D127188
2022-06-09 16:22:12 +08:00
Wael Yehia
0952cf5bbb [InstCombine] decomposeSimpleLinearExpr should bail out on negative operands.
InstCombine tries to rewrite

  %prod = mul nsw i64 %X,   Scale
  %acc = add nsw i64 %prod,   Offset
  %0 = alloca i8, i64 %acc, align 4
  %1 = bitcast i8* %0 to i32*
  Use ( %1 )

into

  %prod = mul nsw i64 %X,   Scale/4
  %acc = add nsw i64 %prod,   Offset/4
  %0 = alloca i32, i64 %acc, align 4
  Use (%0)

But it assumes Scale is unsigned, and performs an unsigned division.
So we should bail out if Scale cannot be interpreted as an unsigned safely.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D126546
2022-06-08 00:57:25 +00:00
Sanjay Patel
cae993d4c8 [InstCombine] [InstCombine] reduce left-shift-of-right-shifted constant via demanded bits
If we don't demand low bits and it is valid to pre-shift a constant:
(C2 >> X) << C1 --> (C2 << C1) >> X

https://alive2.llvm.org/ce/z/_UzTMP

This is the reverse-order shift sibling to 82040d414b3c ( D127122 ).
It seems likely that we would want to add this to the SDAG version of
the code too to keep it on par with IR.
2022-06-07 18:43:27 -04:00
Sanjay Patel
a4d2c5ecaa [InstCombine] reduce code duplication for accessing type; NFC 2022-06-07 18:43:27 -04:00
Sanjay Patel
82040d414b [InstCombine] reduce right-shift-of-left-shifted constant via demanded bits
If we don't demand high bits (zeros) and it is valid to pre-shift a constant:
(C2 << X) >> C1 --> (C2 >> C1) << X

https://alive2.llvm.org/ce/z/P3dWDW

There are a variety of related patterns, but I haven't found a single solution
that gets all of the motivating examples - so pulling this piece out of
D126617 along with more tests.

We should also handle the case where we shift-right followed by shift-left,
but I'll make that a follow-on patch assuming this one is ok. It seems likely
that we would want to add this to the SDAG version of the code too to keep it
on par with IR.

Differential Revision: https://reviews.llvm.org/D127122
2022-06-07 13:28:18 -04:00
Sanjay Patel
3f33d67d8a [InstCombine] fold mul with masked low bit operand to trunc+select
https://alive2.llvm.org/ce/z/o7rQ5q

This shows an extra instruction in some cases, but that is
caused by an existing canonicalization of trunc -> and+icmp.

Codegen should be better for any target where a multiply is
more costly than the most simple ALU op.

This ends up producing the requested x86 asm from issue #55618,
but it's not the same IR. We are missing a canonicalization
from the negate+mask pattern to the trunc+select created here.
2022-06-05 20:07:18 -04:00
Sanjay Patel
8689463bfb [InstCombine] make pattern matching more consistent; NFC
We could go either way on this and several similar matches.
Just matching as a binop is possibly slightly more efficient;
we don't need to re-confirm the opcode of the instruction.
2022-06-02 16:01:23 -04:00
Alexander Kornienko
aa98e7e1eb Revert "[InstCombine] Combine instructions of type or/and where AND masks can be combined."
This reverts commit ec4adf1f6c333d3d36663d8f763c0896407f1b61. The commit causes
clang to hang on a certain input:
```
$ cat q.cc
int f(int a, int b) {
  int c = ((unsigned char)(a >> 23) & 925);
  if (a)
    c = (a >> 23 & b) | ((unsigned char)(a >> 23) & 925) | (b >> 23 & 157);
  return c;
}

$ time ./clang-15-10515 --target=x86_64--linux-gnu -O1  -c q.cc
^C

real    0m45.072s
user    0m0.025s
sys     0m0.099s
```
2022-06-01 14:20:00 +02:00
Sanjay Patel
2bf6123f22 [InstCombine] fold icmp of sext bool based on limited range
X <=u (sext i1 Y) --> (X == 0) | Y

https://alive2.llvm.org/ce/z/W_tZzo

This is the conjugate/sibling pattern suggested with D126171
for a sign-extended bool value.
2022-05-31 12:37:56 -04:00
Nikita Popov
36cbdaa163 [InstCombine] Fix inbounds preservation when swapping GEPs (PR44206)
When reassociating GEPs, we can only keep inbounds if both original
GEPs were inbounds, and their offsets have the same sign. For the
sake of simplicity, I only handle the case where both offsets are
non-negative here.

It would probably be fine to just not preserve inbounds at all here,
but as I don't see a compile-time impact for adding the
isKnownNonNegative() calls I went with this more conservative
approach.

Fixes https://github.com/llvm/llvm-project/issues/44206.

Differential Revision: https://reviews.llvm.org/D126687
2022-05-31 15:45:02 +02:00
Danila Malyutin
4fb3fd7d82 [InstCombine] Fix const folding of switches with default case
In case phi was in the default block it could lead to multi-edge.
Fixes #55721.

Differential Revision: https://reviews.llvm.org/D126650
2022-05-31 15:13:58 +03:00
Nikita Popov
872d69e5d4 [InstCombine] Fix inbounds preservation when merging GEPs (PR55722)
Even if the total offset is inbounds, we might represent it by first
performing a large negative offset and then a small positive one.
With inbounds semantics as currently specified, each offset must
be inbounds individually, not just the overall offset of the GEP.

Fix this by checking that the sign of all offsets is the same.

Fixes https://github.com/llvm/llvm-project/issues/55722.
2022-05-31 11:54:01 +02:00
Sanjay Patel
a0c3c60728 [InstCombine] fold shift-right-by-constant with shift-right-of-constant operand
(C2 >> X) >> C1 --> (C2 >> C1) >> X

The shift-left form of this transform has existed since:
16f18ed7b555bce5163

...but it applies to matching shift right opcodes too:
https://alive2.llvm.org/ce/z/c5eQms
2022-05-30 15:30:01 -04:00
Sanjay Patel
c5d942a4fb [InstCombine] remove unnecessary one-use check from (C2 << X) << C1 fold
The restriction goes back to:
16f18ed7b555bce51
...but the fold only replaces a shift with a shift, so that's not necessary.
Generalizing to other opcodes is planned as a follow-up.
2022-05-30 15:17:54 -04:00
Nikita Popov
a770f534e6 [InstCombine] When swapping GEPs, only keep inbounds if both are
If only one of the GEPs is inbounds, then after swapping, there is
no guarantee that one of them will be inbounds as well
(see e.g. https://alive2.llvm.org/ce/z/agaCnp).

This is only a partial fix, because even if both are inbounds, the
result is not necessarily inbounds (if the offsets have different
signs).
2022-05-30 17:04:42 +02:00
Nikita Popov
2d7bab666f [InstCombine] Always create new GEPs when swapping GEPs
As the long explanatory comment attests, performing the modification
in place is pretty tricky. Drop this unnecessary complexity and
always create new instructions.

This should be NFC-ish, but can probably cause difference due to
worklist order.
2022-05-30 16:48:52 +02:00
zhongyunde
3e6ba89055 [InstCombine] Fold a mul with bool value into and
Fixes https://github.com/llvm/llvm-project/issues/55599
  X * Y --> X & Y, iff X, Y can be only {0, 1}.
https://alive2.llvm.org/ce/z/_RsTKF

Reviewed By: spatel, nikic

Differential Revision: https://reviews.llvm.org/D126040
2022-05-30 21:05:00 +08:00
Chenbing Zheng
ef256ed58e [InstCombine] bitcast (extractelement <1 x elt>, dest) -> bitcast(<1 x elt>, dest)
Only solve dest type is vector to avoid inverse transform in visitBitCast.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D125951
2022-05-30 10:16:32 +08:00
Sanjay Patel
b5b6aa4d53 [InstCombine] fold multiply by signbit-splat to cmp+select
(ashr i32 X, 31) * C --> (X < 0) ? -C : 0
https://alive2.llvm.org/ce/z/G8u9SS

With a constant operand, this is an improvement in IR
and codegen (where it can be converted to a mask op).

Without a constant operand, we would have to negate
the operand, so that is probably better left to the backend.

This is similar but not the same optimization that is requested
in #55618.
2022-05-27 11:54:19 -04:00
Sanjay Patel
5a6e085757 [InstCombine] reduce code duplication; NFC 2022-05-27 11:54:19 -04:00
Sanjay Patel
c4c750058f [InstCombine] fold mul of signbit directly to X < 0 ? Y : 0
This is effectively NFC (intentionally no test diffs)
because we already have the related fold that converts
the 'and' pattern to select. So this is just an efficiency
improvement.
2022-05-26 16:19:15 -04:00
Sanjay Patel
49f8b05137 [InstCombine] fold icmp equality with sdiv and SMIN
This extends the fold from D126410 / 3952c905ef08
to allow for the only case where it works with signed
division:
https://alive2.llvm.org/ce/z/k7_ypu

(X s/ Y) == SMIN --> (X == SMIN) && (Y == 1)
(X s/ Y) != SMIN --> (X != SMIN) || (Y != 1)

This is another improvement based on #55695.
2022-05-26 16:19:15 -04:00
Sanjay Patel
ed5be1523f [InstCombine] reduce code duplication in icmp+div folds; NFC 2022-05-26 16:19:15 -04:00
Sanjay Patel
3952c905ef [InstCombine] fold icmp equality with udiv and large constant
With large compare constant:
(X u/ Y) == C --> (X == C) && (Y == 1)
(X u/ Y) != C --> (X != C) || (Y != 1)

https://alive2.llvm.org/ce/z/EhKwh6

There are various potential missing icmp (div) transforms shown here:
https://github.com/llvm/llvm-project/issues/55695

This is a generalization for part of the udiv + equality.
I didn't check in detail, but some of those may only make sense as
codegen transforms.

This results in one extra instruction in IR, but it is better for
analysis, and looks much better in codegen on all targets that I tried.

Differential Revision: https://reviews.llvm.org/D126410
2022-05-26 09:08:47 -04:00
Chenbing Zheng
1486a9c9fe [InstCombine] [NFC] refector foldXorOfICmps
Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D126268
2022-05-26 11:07:18 +08:00
Chenbing Zheng
41aab93afc [InstCombine] bitcast(logic(bitcast(X), bitcast(Y))) -> bitcast'(logic(bitcast'(X), Y))
This patch break foldBitCastBitwiseLogic limite the destination
must have an integer element type, and eliminate one bitcast by
doing the logic op in the type of the input that has an integer
element type.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D126184
2022-05-26 10:23:44 +08:00
Chenbing Zheng
269e3f7369 [InstCombine] [NFC] Move transforms for truncated shifts into narrowBinOp
Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D126056
2022-05-25 10:21:39 +08:00
Sanjay Patel
05527b68a0 [InstCombine] fold more shuffles with FP<->Int cast operands
shuffle (cast X), (cast Y), Mask --> cast (shuffle X, Y, Mask)

This extends the transform added with 0353c2c996c5.

If the shuffle reduces vector length, the transform
reduces the width of the cast, so that should be a
win for most codegen (if not, it can be inverted).
2022-05-24 15:11:38 -04:00
Nikita Popov
e6e0eb3bc8 [InstCombine] Strip bitcasts in GEP diff fold
Bitcasts were stripped in one case, but not the other. Of course,
this no longer really matters with opaque pointers, but as I went
through the trouble of tracking this down, we may as well remove
one typed vs opaque pointer optimization discrepancy.
2022-05-24 16:12:01 +02:00
Nikita Popov
b2a13d3e2d [InstCombine] Use IRBuilder in freeze pushing transform (PR55619)
Use IRBuilder so that the newly created freeze instructions
automatically gets inserted back into the IC worklist.

The changed worklist processing order leads to some cosmetic
differences in tests.

Fixes https://github.com/llvm/llvm-project/issues/55619.
2022-05-24 15:48:28 +02:00
Nikita Popov
a7c079aaa2 [InstCombine] Support logical and in masked icmp fold
Most of the folds implemented in this function work fine with
logical operations. We only need to be careful for the cases that
work on non-constant masks, where the RHS operand shouldn't be
poison.

This is a conservative implementation that bails out of illegal
transforms, but we could also change these to insert freeze instead.
2022-05-24 11:16:33 +02:00
Nikita Popov
5abaabed22 [InstCombine] Use m_APInt() in asymmetric masked icmp fold
This is mostly intended as code cleanup, but it does also add
support for splat vectors to this fold.
2022-05-24 10:57:28 +02:00
Nikita Popov
c0e06c7448 [InstCombine] Handle logical and/or in recursive and/or of icmps fold
The and/or of icmps fold is also applied in reassociated form.
However, this currently only happens for bitwise and of bitwise
and, but not for bitwise and of logical and (or other combinations,
but this is the one being addressed here).

We can do this for bitwise+logical combinations as well, but need
to be a bit careful about which of the resulting ands are logical:
https://alive2.llvm.org/ce/z/WYSjGh
https://alive2.llvm.org/ce/z/guxYnz
https://alive2.llvm.org/ce/z/S5SYxY
https://alive2.llvm.org/ce/z/2rAWeW
2022-05-24 10:13:10 +02:00
Sanjay Patel
e8c20d995b [IR] add and use pattern match specialization for sqrt intrinsic; NFC
This was included in D126190 originally, but it's
independent and a useful change for readability.
2022-05-23 14:16:30 -04:00
Nikita Popov
f45c1e436e [InstCombine] Change operand order in recursive and/or of icmps fold
The order obviously doesn't matter for bitwise and/or, but would
matter for logical and/or, so change it to preserve the original
order.
2022-05-23 17:29:33 +02:00
Sanjay Patel
1ebad988b1 [InstCombine] fold icmp of zext bool based on limited range
X <u (zext i1 Y) --> (X == 0) && Y

https://alive2.llvm.org/ce/z/avQDRY

This is a generalization of 4069cccf3b4ff4a based on the post-commit suggestion.
This also adds the i1 type check and tests that were missing from the earlier
attempt; that commit caused several bot fails and was reverted.

Differential Revision: https://reviews.llvm.org/D126171
2022-05-23 09:59:21 -04:00
Nikita Popov
45226d04f0 [InstCombine] Reuse icmp of and/or folds for logical and/or
Similarly to a change recently done for fcmps, add a flag that
indicates whether the and/or is logical to foldAndOrOfICmps, and
reuse the function when folding logical and/or.

We were already calling some parts of it, but this gives us a
clearer indication of which parts may need poison-safe variants,
and would also allow to fold combinations of bitwise and logical
and/or.

This change should be close to NFC, because all folds this enables
were either already called previously, or can make use of implied
poison reasoning.
2022-05-23 15:37:07 +02:00
Sanjay Patel
cba0ebd576 Revert "[InstCombine] fold icmp with sub and bool"
This reverts commit 4069cccf3b4ff4afb743d3d371ead9e2d5491e3a.
This causes bot failures, and there's a possibly a better way to get this and other patterns.
2022-05-22 12:13:20 -04:00
Sanjay Patel
4069cccf3b [InstCombine] fold icmp with sub and bool
This is the specific pattern seen in #53432, but it can be extended
in multiple ways:
1. The 'zext' could be an 'and'
2. The 'sub' could be some other binop with a similar ==0 property (udiv).

There might be some way to generalize using knownbits, but that
would require checking that the 'bool' value is created with
some instruction that can be replaced with new icmp+logic.

https://alive2.llvm.org/ce/z/-KCfpa
2022-05-22 11:51:07 -04:00
Sanjay Patel
f0071d43e4 [InstCombine] add use check to fold of bitwise logic with cast ops
This was shown as a potential regression in D126040.
2022-05-20 09:08:53 -04:00
Chenbing Zheng
cf348f6a2c [InstCombine] [NFC] Use a pattern matcher for ExtractElementInst
Reviewed By: RKSimon, rampitec

Differential Revision: https://reviews.llvm.org/D125857
2022-05-20 10:31:40 +08:00
Jay Foad
6bec3e9303 [APInt] Remove all uses of zextOrSelf, sextOrSelf and truncOrSelf
Most clients only used these methods because they wanted to be able to
extend or truncate to the same bit width (which is a no-op). Now that
the standard zext, sext and trunc allow this, there is no reason to use
the OrSelf versions.

The OrSelf versions additionally have the strange behaviour of allowing
extending to a *smaller* width, or truncating to a *larger* width, which
are also treated as no-ops. A small amount of client code relied on this
(ConstantRange::castOp and MicrosoftCXXNameMangler::mangleNumber) and
needed rewriting.

Differential Revision: https://reviews.llvm.org/D125557
2022-05-19 11:23:13 +01:00
Chenbing Zheng
ffaaf2498b [InstCombine] (rot X, ?) == 0/-1 --> X == 0/-1
In this patch we add a function foldICmpInstWithConstantAllowUndef
to fold integer comparisons with a constant operand: icmp Pred X, C
where X is some kind of instruction and C is AllowUndef.

We move this fold to the new function, so that it can solve undef elts in a vector.

Reviewed By: spatel, RKSimon

Differential Revision: https://reviews.llvm.org/D125220
2022-05-19 11:22:26 +08:00
Chenbing Zheng
51df77f36d [InstCombine] Allow undef vectors when foldSelectToCopysign
Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D125671
2022-05-19 10:57:49 +08:00