4825 Commits

Author SHA1 Message Date
Alexander Shaposhnikov
6cf10b7e6e [InstCombine] Fold srem(X, PowerOf2) == C into (X & Mask) == C for positive C
This diff extends InstCombinerImpl::foldICmpSRemConstant to handle the cases
srem(X, PowerOf2) == C and
srem(X, PowerOf2) != C
for positive C.
This addresses the issue https://github.com/llvm/llvm-project/issues/54650

Differential revision: https://reviews.llvm.org/D122942

Test plan: make check-all
2022-04-03 03:57:05 +00:00
Sanjay Patel
5f8c2b884d [InstCombine] limit icmp fold with sub if other sub user is a phi
This is a hacky fix for:
https://github.com/llvm/llvm-project/issues/54558

As discussed there, codegen regressed when we opened up this transform
to allow extra uses ( 61580d0949fd3465 ), and it's not clear how to
undo the transforms at the later stage of compilation.

As noted in the code comments, there's a set of remaining folds that
are still limited to one-use, so we can try harder to refine and
expand the limitations on these folds, but it's likely to be an
up-and-down battle as we find and overcome similar regressions.

Differential Revision: https://reviews.llvm.org/D122909
2022-04-02 19:23:42 -04:00
Sanjay Patel
97ac0cd6c4 [InstCombine] fold fcmp with lossy casted constant (2nd try)
This is a retry of 9397bdc67eb2 - that was reverted until
we had a clang warning in place to alert users about a
possible mistake in source. The warning was added with
ab982eace6e4.

This is noted as a missing clang warning in #54222,
but it is also a missing optimization opportunity.

Alive2 proofs:
https://alive2.llvm.org/ce/z/Q8drDq
https://alive2.llvm.org/ce/z/pE6LRt

I don't see a single conversion for all predicates
using "getFCmpCode" logic, so other predicates are
left as a TODO item.
2022-04-02 19:23:01 -04:00
Roman Lebedev
308ca349cb
[InstCombine] Fold (X | C2) ^ C1 --> (X & ~C2) ^ (C1^C2)
These two are equivalent,
and i *think* the `and` form is more-ish canonical.

General proof: https://alive2.llvm.org/ce/z/RrF5s6

If constant on the (outer) `xor` is an `undef`,
the whole lane is dead: https://alive2.llvm.org/ce/z/mu4Sh2

However, if the constant on the (inner) `or` is an `undef`,
we must sanitize it first: https://alive2.llvm.org/ce/z/MHYJL7
I guess, producing a zero `and`-mask is optimal in that case.

alive-tv is happy about the entirety of `xor-of-or.ll`.
2022-04-03 00:12:56 +03:00
Hirochika Matsumoto
a3cffc1150 [InstCombine] Fold (ctpop(X) == 1) | (X == 0) into ctpop(X) < 2
https://alive2.llvm.org/ce/z/94yRMN

Fixes #54177

Differential Revision: https://reviews.llvm.org/D122077
2022-03-29 11:30:06 -04:00
Nikita Popov
682ef39b1a [InstCombine] Remove call to getPointerElementType()
This was erroneously re-introduced as part of
bb0b23174e4ab963df427393fbf21bddede499bf.
2022-03-29 16:52:29 +02:00
Johannes Doerfert
bb0b23174e [InstCombineCalls] Optimize call of bitcast even w/ parameter attributes
Before we gave up if a call through bitcast had parameter attributes.
Interestingly, we allowed attributes for the return value already. We
now handle both the same way, namely, we drop the ones that are
incompatible with the new type and keep the rest. This cannot cause
"more UB" than initially present.

Differential Revision: https://reviews.llvm.org/D119967
2022-03-28 20:57:52 -05:00
chenglin.bi
9a53793ab8 [InstCombine] Fold two select patterns into and-or
select (~a | c), a, b -> and a, (or c, b) https://alive2.llvm.org/ce/z/bnDobs
select (~c & b), a, b -> and b, (or a, c) https://alive2.llvm.org/ce/z/k2jJHJ

Differential Revision: https://reviews.llvm.org/D122152
2022-03-28 16:07:55 -04:00
Simon Pilgrim
6a094a6264 [InstCombine] SimplifyDemandedUseBits - remove ashr node if we only demand known sign bits
We already do this for SelectionDAG, but we're missing it here.

Noticed while re-triaging PR21929

Differential Revision: https://reviews.llvm.org/D122340
2022-03-25 15:39:08 +00:00
Sanjay Patel
5dbb53b1b4 [InstCombine] merge shuffled vector negate and multiply
Add the "(0 - X) --> (X * -1)" reverse identity to the list of alternate form binops.

We need a little hack to make the existing logic work because it does not expect to
move constants from op0 to op1, but the code comment hopefully makes that clear.
I don't think there are any other identities like that.

Fixes #54364

Differential Revision: https://reviews.llvm.org/D122390
2022-03-24 10:25:16 -04:00
Dávid Bolvanský
4397504c2d [NFCI] Fix set-but-unused warning in InstCombineAddSub.cpp 2022-03-24 08:33:40 +01:00
chenglin.bi
52f323d0f1 [InstCombine] Fold abs of known negative operand when source is sub
When abs source comes from (x - y), check if a "x > y" dominating
condition exists.

Fixes #54132

Differential Revision: https://reviews.llvm.org/D122013
2022-03-23 15:21:33 -04:00
Sanjay Patel
0fcff69bcb [InstCombine] try to narrow shifted bswap-of-zext (2nd try)
The first attempt at this missed a validity check.
This version includes a test of the narrow source
type for modulo-16-bits.

Original commit message:

This is the IR counterpart to 370ebc9d9a573d6
which provided a bswap narrowing fix for issue #53867.

Here we can be more general (although I'm not sure yet
what would happen for illegal types in codegen - too
rare to worry about?):
https://alive2.llvm.org/ce/z/3-CPfo

This will be more effective if we have moved the shift
after the bswap as proposed in D122010, but it is
independent of that patch.

Differential Revision: https://reviews.llvm.org/D122166
2022-03-23 11:28:37 -04:00
Nathan Chancellor
4e0008dcbe
Revert "[InstCombine] try to narrow shifted bswap-of-zext"
This reverts commit 9e9bda2e8f5b88715bad767a4b7740df32b040d2.

This causes a backend error when building the Linux kernel for arm64.
See https://reviews.llvm.org/D122166 for a simplified reproducer.
2022-03-22 17:32:33 -07:00
Philip Reames
7abefc4222 [instcombine] Fold away memset/memmove from otherwise unused alloca
The motivation for this is that while both memcpyopt and dse will catch this case, both are limited by MSSA's walk back threshold when finding clobbers.  As such, if you have a memcpy of an otherwise dead alloca placed towards the end of a long basic block with lots of other memory instructions, it would be missed.  This is a bit undesirable for such an "obviously" useless bit of code.

As noted in comments, we should probably generalize instcombine's escape analysis peephole (see visitAllocInst) to allow read xor write.  Doing that would subsume this code in a more general way, but is also a more involved change.  For the moment, I went with the easiest fix.
2022-03-22 13:48:48 -07:00
Sanjay Patel
ccf8c969c2 [InstCombine] reorder code, fix formatting; NFC
The affected code can be updated to solve #54364,
so make some cosmetic diffs before real changes.
2022-03-22 16:33:01 -04:00
Sanjay Patel
60820e53ec [InstCombine] try to canonicalize logical shift after bswap
When shifting by a byte-multiple:
bswap (shl X, C) --> lshr (bswap X), C
bswap (lshr X, C) --> shl (bswap X), C

This is an IR implementation of a transform suggested in D120648.
The "swaps cancel" test models the motivating optimization from
that proposal.

Alive2 checks (as noted in the other review, we could use
knownbits to handle shift-by-variable-amount, but that can be an
enhancement patch):
https://alive2.llvm.org/ce/z/pXUaRf
https://alive2.llvm.org/ce/z/ZnaMLf

Differential Revision: https://reviews.llvm.org/D122010
2022-03-22 09:10:55 -04:00
Sanjay Patel
9e9bda2e8f [InstCombine] try to narrow shifted bswap-of-zext
This is the IR counterpart to 370ebc9d9a573d6
which provided a bswap narrowing fix for issue #53867.

Here we can be more general (although I'm not sure yet
what would happen for illegal types in codegen - too
rare to worry about?):
https://alive2.llvm.org/ce/z/3-CPfo

This will be more effective if we have moved the shift
after the bswap as proposed in D122010, but it is
independent of that patch.

Differential Revision: https://reviews.llvm.org/D122166
2022-03-22 08:22:30 -04:00
Nikita Popov
fc8946fae7 [InstCombine] Remove integer SPF of SPF folds (NFCI)
Now that we canonicalize to intrinsics, these folds should no
longer be needed. Only one fold that also applies to floating-point
min/max is retained.
2022-03-18 10:20:48 +01:00
Andrew Wei
0af3e6a22d [InstCombine] Sink instructions with multiple users in a successor block.
This patch tries to sink instructions when they are only used in a successor block.

This is a further enhancement patch based on Anna's commit:
D109700, which allows sinking an instruction having multiple uses in a single user.

In this patch, sink instructions with multiple users in a single successor block will be supported.
It could fix a known issue from rust:
  https://github.com/rust-lang/rust/issues/51346#issuecomment-394443610

Reviewed By: nikic, reames

Differential Revision: https://reviews.llvm.org/D121585
2022-03-18 11:53:45 +08:00
Nikita Popov
4010a7a5d0 Reapply [InstCombine] Support switch in phi to cond fold
Reapply with an explicit check for multi-edges, as the expected
behavior of multi-edge dominance is unclear (D120811).

-----

For conditional branches, we know the value is i1 0 or i1 1 along
the outgoing edges. For switches we can apply exactly the same
optimization, just with the known values determined by the switch
cases.
2022-03-17 10:03:09 +01:00
Sanjay Patel
598721f866 [InstCombine] try harder to propagate 'nsz' through fneg-of-select
This can be viewed as swapping the select arms:
https://alive2.llvm.org/ce/z/jUvFMJ
...so we don't have the 'nsz' problem with the more general fold.

This unlocks other folds for the motivating fabs example.
This was discussed in issue #38828.
2022-03-15 11:05:29 -04:00
Simon Pilgrim
7e4cf582cf [InstCombine] Add general constant support to eq/ne icmp(add(X,C1),add(Y,C2)) -> icmp(add(X,C1-C2),Y) fold
A further extension for Issue #32161

For eq/ne comparisons - the sign mismatch and bounds constraints are redundant, so if the that fold fails, fallback and just fold the constants directly.

https://alive2.llvm.org/ce/z/cdodNQ

The loop rotation test change looks mostly benign - the backend doesn't seem to suffer? https://gcc.godbolt.org/z/dErMY78To

Differential Revision: https://reviews.llvm.org/D121551
2022-03-15 14:17:38 +00:00
Craig Topper
ce78e68261 [InstCombine] Fold select based logic of fcmps with same operands when FMF is present.
If we have a logical and/or in select form and the true/false operand
is an fcmp with poison generating FMF, we won't be able to fold it
to an and/or instruction. This prevents us from optimizing the case
where it is a logical operation of two fcmps with identical operands.

This patch adds explicit checks for this case that doesn't rely on
converting to and/or to do the optimization. It reuses the existing
foldLogicOfFCmps, but adds a new flag to disable the other combine
that is inside that function.

FMF flags from the two FCmps are intersected using the logic added in
D121243. The FIXME has been updated to indicate that we can only use
a union for the non-select form.

This allows us to optimize cases like this from compare-fp-3.c in the
gcc torture suite with fast math.

void
test1 (float x, float y)
{
  if ((x==y) && (x!=y))
    link_error0();
}

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D121323
2022-03-14 14:45:07 -07:00
Sanjay Patel
3491f2f4b0 [InstCombine] replace negated operand in fcmp with 0.0
X (any pred) -X --> X (any pred) 0.0

This works with all FP values and preserves FMF.
Alive2 examples:
https://alive2.llvm.org/ce/z/dj6jhp

This can also create one of the patterns that we match as "fabs"
as shown in one of the test diffs.
2022-03-10 12:53:32 -05:00
Sanjay Patel
9fac110bf7 Revert "[InstCombine] fold fcmp with lossy casted constant"
This reverts commit 9397bdc67eb2b9eedc247a89bef01c2484b48b89.

This optimization is likely to surprise programmers as seen
in post-commit comments, so we should add a clang warning
first (that is proposed in D121306).
2022-03-10 10:22:22 -05:00
Simon Pilgrim
808d9d260b [InstCombine] Add vector support to icmp(add(X,C1),add(Y,C2)) -> icmp(add(X,C1-C2),Y) fold
As discussed on Issue #32161 this fold can be generalized a lot more than it currently is, but this patch at least adds vector support.

Differential Revision: https://reviews.llvm.org/D121358
2022-03-10 13:30:48 +00:00
Craig Topper
f72fe2ef67 [InstCombine] Preserve FMF in foldLogicOfFCmps.
This patch intersects the fast math flags from the two fcmps instead
of dropping them.

I poked at this a bunch with Alive2 for nnan and ninf flags and it seemed
to check out. With the other flags it told me "Couldn't prove the
correctness of the transformation". Not sure if I should just preserve
nnan and ninf?

Reviewed By: spatel, lebedev.ri

Differential Revision: https://reviews.llvm.org/D121243
2022-03-09 09:17:09 -08:00
Sanjay Patel
9397bdc67e [InstCombine] fold fcmp with lossy casted constant
This is noted as a missing clang warning in #54222
(and we should still make that enhancement).

Alive2 proofs:
https://alive2.llvm.org/ce/z/Q8drDq
https://alive2.llvm.org/ce/z/pE6LRt

I don't see a single conversion for all predicates
using "getFCmpCode" logic, so other predicates are
left as a TODO item.
2022-03-08 12:41:12 -05:00
Arnold Schwaighofer
dcdc1f29bb InstCombine: Can't fold a phi arg load into the phi if the load is from a swifterror address
`swifterror` addresses are only allowed as operands to load, store, and
calls.

The following transformation is not allowed. It would create a phi with a
`swifterror` address operand.

```
 %addr = alloca swifterror i8*
 br %cond, label %bb1, label %b22

 bb1:
   %val1 = load i8*, i8** %addr
   br exit

 bb2:
   %val2 = load i8*, i8** %addr
   br exit

 exit:
   %val = phi [%val1, %bb1] [%val2, %bb2]
```

=>

```
 %addr = alloca swifterror i8*
 br %cond, label %bb1, label %b22

 bb1:
   br exit

 bb2:
   br exit

 exit:
   %val_addr = phi [%addr, %bb1] [%addr, %bb2]
   %val2 = load i8*, i8** %val_addr
```

rdar://89865485

Differential Revision: https://reviews.llvm.org/D121217
2022-03-08 09:09:51 -08:00
Augie Fackler
5e4c75db3b InstructionCombining: avoid eliding mismatched alloc/free pairs
Prior to this change LLVM would happily elide a call to any allocation
function and a call to any free function operating on the same unused
pointer. This can cause problems in some obscure cases, for example if
the body of operator::new can be inlined but the body of
operator::delete can't, as in this example from jyknight:

    #include <stdlib.h>
    #include <stdio.h>

    int allocs = 0;

    void *operator new(size_t n) {
        allocs++;
        void *mem = malloc(n);
        if (!mem) abort();
        return mem;
    }

    __attribute__((noinline)) void operator delete(void *mem) noexcept {
        allocs--;
        free(mem);
    }

    void deleteit(int*i) { delete i; }
    int main() {
        int*i = new int;
        deleteit(i);
        if (allocs != 0)
          printf("MEMORY LEAK! allocs: %d\n", allocs);
    }

This patch addresses the issue by introducing the concept of an
allocator function family and uses it to make sure that alloc/free
function pairs are only removed if they're in the same family.

Differential Revision: https://reviews.llvm.org/D117356
2022-03-04 10:41:10 -05:00
Craig Topper
608161225e [InstCombine][Analysis] Move getFCmpCode and getPredForFCmpCode to CmpInstAnalysis. NFC
The similar getICmpCode and getPredForICmpCode are already there.
This moves FP for consistency.

I think InstCombine is currently the only user of both.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D120754
2022-03-03 09:33:24 -08:00
Nikita Popov
c1b9667148 [InstCombine] Support opaque pointers in callee bitcast fold
To make this actually trigger, we also need to check whether the
function types differ, which is a hidden cast under opaque pointers.
The transform is somewhat less relevant there because it is
primarily about pointer bitcasts, but it can also happen with other
bit- or pointer-castable types.

Byval handling is easier with opaque pointers because there is no
need to adjust the byval type, we only need to make sure that it's
still a pointer.
2022-03-03 11:07:39 +01:00
Nikita Popov
6c8adc5054 [InstCombine] Remove unnecessary byval check in callee cast fold
The logic for handling this was fixed in
8d7f118ab2b9e51d6cf2811291e319b4d977eb8c, but the check for byval
on the callee was retained. This resulted in a weird situation
where the transform would work depending on whether the byval
was only on the call or on both the call and the function.
2022-03-03 10:55:14 +01:00
serge-sans-paille
59630917d6 Cleanup includes: Transform/Scalar
Estimated impact on preprocessor output line:
before: 1062981579
after:  1062494547

Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D120817
2022-03-03 07:56:34 +01:00
Nikita Popov
61580d0949 Reapply [InstCombine] Remove one-use limitation from X-Y==0 fold
This is a recommit without changes. I originally reverted this
due to a significant code-size regression on tramp3d-v4, however
further investigation showed that in the tramp3d-v4 case this
change enables additional optimizations (in particular more
jump threading), which happens to reduce the size of a function
just enough to be eligible for inlining at hot callsites, which
results in the code size increase. As such, this was just bad
luck.

-----

This one-use limitation is artificial, we do not increase
instruction count if we perform the fold with multiple uses. The
motivating case is shown in @sub_eq_zero_select, where the one-use
limitation causes us to miss a subsequent select fold.

I believe the backend is pretty good about reusing flag-producing
subs for cmps with same operands, so I think doing this is fine.

Differential Revision: https://reviews.llvm.org/D120337
2022-03-02 16:43:33 +01:00
Nikita Popov
5cf06d10f8 Revert "[InstCombine] Support switch in phi to cond fold"
This reverts commit 0817ce86b540f909eade6a8d7370e1b47e863a70.

Seeing some ppc64le stage2 failures, reverting to investigate.
2022-03-02 12:49:47 +01:00
Nikita Popov
0817ce86b5 [InstCombine] Support switch in phi to cond fold
For conditional branches, we know the value is i1 0 or i1 1 along
the outgoing edges. For switches we can apply exactly the same
optimization, just with the known values determined by the switch
cases.
2022-03-02 12:16:32 +01:00
serge-sans-paille
a494ae43be Cleanup includes: TransformsUtils
Estimation on the impact on preprocessor output:
before: 1065307662
after:  1064800684

Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D120741
2022-03-01 21:00:07 +01:00
Craig Topper
7bc6667845 [Analysis] Simplify the interface to llvm::getICmpCode. NFC
Instead of passing an InstCmpInt * and a bool just pass the predicate
from the caller.

I'm considering moving the similar FCmp functions from InstCombine
over here and this makes the interface consistent with what is used
for FCmp.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D120609
2022-03-01 09:53:27 -08:00
Nikita Popov
a1f442b278 [InstCombine] Support phi to cond fold with more than two preds
This transform can still be applied if there are more than two
phi inputs, as long as phi inputs with the same value are dominated
by the same idom edge.
2022-03-01 16:31:49 +01:00
Nikita Popov
26748bb15a [InstCombine] Slightly relax one-use check in abs canonicalization
Treat the icmp and sub symmetrically, and require that one of them
has one use, not the icmp in particular. This could be further
relaxed in the abs (but not nabs) case to not check one-use at
all.
2022-03-01 15:06:41 +01:00
Sanjay Patel
84812b9b07 [InstCombine] drop FMF in select->copysign transform
It is not correct to propagate flags from the select
to the new instructions:
https://alive2.llvm.org/ce/z/tNATrd
https://alive2.llvm.org/ce/z/VwcVzn

Fixes #54077
2022-03-01 08:51:41 -05:00
Nikita Popov
c2428a4fad [InstCombine] Remove SPF min/max check from select demanded bits (NFCI)
This should no longer be necessary now that we canonicalize to
intrinsics. This may not be entirely NFC in practice if worklist
order gets inverted and we perform demanded bits simplification
of a select user before the select is canonicalized.
2022-03-01 14:50:37 +01:00
Sanjay Patel
278b407a30 [InstCombine] fold mul-with-overflow intrinsic with -1 operand
extractvalue (any_mul_with_overflow X, -1), 0 --> -X

There are similar other potential transforms that we could do as
noted by the last TODO in the test diffs.

Fixes #54053
2022-02-28 14:13:48 -05:00
Sanjay Patel
f422c5d871 [InstCombine] fold select-of-zero-or-ones with negated op
(X u< 2) ? -X : -1 --> sext (X != 0)
(X u> 1) ? -1 : -X --> sext (X != 0)

https://alive2.llvm.org/ce/z/U3y5Bb
https://alive2.llvm.org/ce/z/hgi-4p

This is part of solving:
2022-02-28 12:07:49 -05:00
Nikita Popov
5423b0a525 [InstCombine] Remove not of SPF min/max fold (NFCI)
This should no longer be necessary now that we canonicalize to
intrinsics. Might not be strictly NFC due to worklist order.
2022-02-28 11:02:31 +01:00
Nikita Popov
d5ea3b2f33 [InstCombine] Remove sub of SPF min/max fold (NFCI)
This isn't necessary anymore, now that we canonicalize SPF min/max
to intrinsics. Might not be strictly NFC due to worklist order
changes.
2022-02-28 10:57:24 +01:00
Nikita Popov
9353ed6a53 [InstCombine] Don't call matchSAddSubSat() for SPF (NFC)
Only call it for intrinsic min/max. The moved implementation is
unchanged apart from the one-use check: It is now hardcoded to
one-use, without the two-use special case for SPF.
2022-02-28 10:41:56 +01:00
Nikita Popov
53602e4c70 [InstCombine] Remove SPF moveAddAfterMinMax() (NFC)
As SPF min/max is canonicalized to intrinsics before this point,
this change should be entirely NFC.
2022-02-28 10:28:16 +01:00