165 Commits

Author SHA1 Message Date
Noah Goldstein
01d8e1ca01 [ValueTracking] Handle non-canonical operand order in isImpliedCondICmps
We don't always have canonical order here, so do it manually.

Closes #85575
2024-03-17 17:46:06 -05:00
Paul Walker
fd07b8f809 [LLVM][tests/Transforms/InstCombine] Convert instances of ConstantExpr based splats to use splat().
This is mostly NFC but some output does change due to consistently
inserting into poison rather than undef and using i64 as the index
type for inserts.
2024-02-27 13:37:23 +00:00
Nikita Popov
90ba33099c
[InstCombine] Canonicalize constant GEPs to i8 source element type (#68882)
This patch canonicalizes getelementptr instructions with constant
indices to use the `i8` source element type. This makes it easier for
optimizations to recognize that two GEPs are identical, because they
don't need to see past many different ways to express the same offset.

This is a first step towards
https://discourse.llvm.org/t/rfc-replacing-getelementptr-with-ptradd/68699.
This is limited to constant GEPs only for now, as they have a clear
canonical form, while we're not yet sure how exactly to deal with
variable indices.

The test llvm/test/Transforms/PhaseOrdering/switch_with_geps.ll gives
two representative examples of the kind of optimization improvement we
expect from this change. In the first test SimplifyCFG can now realize
that all switch branches are actually the same. In the second test it
can convert it into simple arithmetic. These are representative of
common optimization failures we see in Rust.

Fixes https://github.com/llvm/llvm-project/issues/69841.
2024-01-24 15:25:29 +01:00
Nikita Popov
d77067d08a
[ValueTracking] Add dominating condition support in computeKnownBits() (#73662)
This adds support for using dominating conditions in computeKnownBits()
when called from InstCombine. The implementation uses a
DomConditionCache, which stores which branches may provide information
that is relevant for a given value.

DomConditionCache is similar to AssumptionCache, but does not try to do
any kind of automatic tracking. Relevant branches have to be explicitly
registered and invalidated values explicitly removed. The necessary
tracking is done inside InstCombine.

The reason why this doesn't just do exactly the same thing as
AssumptionCache is that a lot more transforms touch branches and branch
conditions than assumptions. AssumptionCache is an immutable analysis
and mostly gets away with this because only a handful of places have to
register additional assumptions (mostly as a result of cloning). This is
very much not the case for branches.

This change regresses compile-time by about ~0.2%. It also improves
stage2-O0-g builds by about ~0.2%, which indicates that this change results
in additional optimizations inside clang itself.

Fixes https://github.com/llvm/llvm-project/issues/74242.
2023-12-06 14:17:18 +01:00
Craig Topper
7ec4f6094e
[InstCombine] Infer disjoint flag on Or instructions. (#72912)
The disjoint flag was recently added to IR in #72583

We already set it when we turn an add into an or. This patch sets it on Ors that weren't converted from an Add.
2023-12-02 14:11:12 -08:00
Nikita Popov
ac75171d41 [InstCombine] Fix incorrect nneg inference on shift amount
Whether this is valid depends on the bit widths of the involved
integers.

Fixes https://github.com/llvm/llvm-project/issues/72927.
2023-11-21 15:47:55 +01:00
Nikita Popov
a1652fdb5e [InstCombine] Add tests for incorrect shift nneg inference (NFC)
The second test is a miscompile.
2023-11-21 15:47:55 +01:00
Yingwei Zheng
6da4ecdf92
[InstCombine] Infer shift flags with unknown shamt (#72535)
Alive2: https://alive2.llvm.org/ce/z/82Wr3q

Related patch:
2dd52b4527
2023-11-18 15:15:14 +08:00
Yingwei Zheng
4fdc289d4a
[InstCombine] Infer nsw flag for (X <<nuw C1) >>u C --> X << (C1 - C) (#72407)
Alive2: https://alive2.llvm.org/ce/z/nnHAPy
This missed optimization is discovered with the help of
https://github.com/AliveToolkit/alive2/pull/962.
2023-11-16 02:34:59 +08:00
Nikita Popov
5918f62301
[InstCombine] Infer zext nneg flag (#71534)
Use KnownBits to infer the nneg flag on zext instructions.

Currently we only set nneg when converting sext -> zext, but don't set
it when we have a zext in the first place. If we want to use it in
optimizations, we should make sure the flag inference is consistent.
2023-11-08 09:34:40 +01:00
Nikita Popov
b47ff36134
[InstCombine] Drop exact flag instead of increasing demanded bits (#70311)
Demanded bit simplification for lshr/ashr will currently demand the low
bits if the exact flag is set. This is because these bits must be zero
to satisfy the flag.

However, this means that our demanded bits simplification is worse for
lshr/ashr exact than it is for plain lshr/ashr, which is generally not
desirable.

Instead, drop the exact flag if a demanded bits simplification of the
operand succeeds, which may no longer satisfy the exact flag.

This matches what we do for the exact flag on udiv, as well as the
nuw/nsw flags on add/sub/mul.
2023-10-26 13:12:30 +02:00
Nikita Popov
cf3ac964dc [InstCombine] Add additional demanded bits tests for shifts (NFC) 2023-10-26 11:03:58 +02:00
Dmitriy Smirnov
e13bed4c5f [PATCH] [llvm] [InstCombine] Canonicalise ADD+GEP
This patch tries to canonicalise add + gep to gep + gep.

Co-authored-by: Paul Walker <paul.walker@arm.com>

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D155688
2023-10-06 12:29:06 +01:00
Nikita Popov
41895843b5 [InstCombine] Only perform one iteration
InstCombine is a worklist-driven algorithm, which works roughly
as follows:

* All instructions are initially pushed to the worklist.
  The initial order is in RPO program order.
* All newly inserted instructions get added to the worklist.
* When an instruction is folded, its users get added back to the
  worklist.
* When the use-count of an instruction decreases, it gets added
  back to the worklist.
* And a few of other heuristics on when we should revisit
  instructions.

On top of the worklist algorithm, InstCombine layers an additional
fix-point iteration: If any fold was performed in the previous
iteration, then InstCombine will re-populate the worklist from
scratch and fold the entire function again. This continues until
a fix-point is reached.

In the vast majority of cases, InstCombine will reach a fix-point
within a single iteration: However, a second iteration is performed
to verify that this is indeed the fixpoint. We can see this in the
statistics for llvm-test-suite:

    "instcombine.NumOneIteration": 411380,
    "instcombine.NumTwoIterations": 117921,
    "instcombine.NumThreeIterations": 236,
    "instcombine.NumFourOrMoreIterations": 2,

The way to read these numbers is that in 411380 cases, InstCombine
performs no folds. In 117921 cases it performs a fold and reaches
the fix-point within one iteration (the second iteration verifies
the fixpoint). In the remaining 238 cases, more than one iteration
is needed to reach the fixpoint.

In other words, only in 0.04% of cases are additional iterations
needed to reach a fixpoint. Conversely, in 22.3% of cases InstCombine
performs a completely useless extra iteration to verify the fix point.

This patch removes the fixpoint iteration from InstCombine, and always
only perform a single iteration. This results in a major compile-time
improvement of around 4% at negligible codegen impact.

This explicitly does accept that we will not reach a fixpoint in all
cases. However, this is mitigated by two factors: First, the data
suggests that this happens very rarely in practice. Second,
InstCombine runs many times during the optimization pipeline
(8 times even without LTO), so there are many chances to recover
such cases.

In order to prevent accidental optimization regressions in the
future, this implements a verify-fixpoint option, which is enabled
by default when instcombine is specified in -passes and disabled
when InstCombinePass() is constructed from C++. This means that
test cases need to explicitly use the no-verify-fixpoint option
if they fail to reach a fixed point (for a well understand reason
we cannot / do not want to avoid).

Differential Revision: https://reviews.llvm.org/D154579
2023-07-31 10:56:49 +02:00
Nikita Popov
0db5d8e123 Reapply [InstSimplify] Make simplifyWithOpReplaced() recursive (PR63104)
A similar assumption as for the x^x case also existed for the absorber
case, which lead to a stage2 miscompile. That assumption is not fixed.

-----

Support replacement of operands not only in the immediate
instruction, but also instructions it uses.

To the most part, this extension is straightforward, but there are
two bits worth highlighting:

First, we can now no longer assume that if the Op is a vector, the
instruction also returns a vector. If Op is a vector and the
instruction returns a scalar, we should consider it as a cross-lane
operation.

Second, for the x ^ x special case and the absorber special case, we
can no longer assume that one of the operands is RepOp, as we might
have a replacement higher up the instruction chain.

There is one optimization regression, but it is in a fuzzer-generated
test case.

Fixes https://github.com/llvm/llvm-project/issues/63104.
2023-07-18 10:36:39 +02:00
Nikita Popov
2bc7d02312 Revert "[InstSimplify] Make simplifyWithOpReplaced() recursive (PR63104)"
This is very likely the cause of a stage 2 failure in
Transforms/LoopVectorize/check-prof-info.ll. Revert until I can
investigate this.

This reverts commit 3d199d086e076f0b9b90d4c59f2226a417a639b5.
2023-07-14 18:33:39 +02:00
Nikita Popov
3d199d086e [InstSimplify] Make simplifyWithOpReplaced() recursive (PR63104)
Support replacement of operands not only in the immediate
instruction, but also instructions it uses.

To the most part, this extension is straightforward, but there are
two bits worth highlighting:

First, we can now no longer assume that if the Op is a vector, the
instruction also returns a vector. If Op is a vector and the
instruction returns a scalar, we should consider it as a cross-lane
operation.

Second, for the x ^ x special case, we can no longer assume that
the operand is RepOp, as we might have a replacement higher up the
instruction chain.

There is one optimization regression, but it is in a fuzzer-generated
test case.

Fixes https://github.com/llvm/llvm-project/issues/63104.
2023-07-14 16:33:40 +02:00
Nikita Popov
21827268ad [InstCombine] Fold add of zext and sext of i1
(zext a) + (sext a) is 0 if a is a bool.

The regression is in a fuzzer-generated test.

Proof: https://alive2.llvm.org/ce/z/KotnN6
2023-07-14 14:52:13 +02:00
Nikita Popov
fa45fb7f0c [InstCombine] Handle assumes in multi-use demanded bits simplification
This fixes the largest remaining discrepancy between results of
computeKnownBits() and SimplifyDemandedBits(). We only care about
the multi-use case here, because the assume necessarily introduces
an extra use.
2023-06-02 14:24:24 +02:00
Nikita Popov
f7d1baa414 [KnownBits] Return zero instead of unknown for always poison shifts
For always poison shifts, any KnownBits return value is valid.
Currently we return unknown, but returning zero is generally more
profitable. We had some code in ValueTracking that tried to do this,
but was actually dead code.

Differential Revision: https://reviews.llvm.org/D150648
2023-05-23 14:41:22 +02:00
Zhongyunde
90d30fde12 [InstCombine] Add frozen for the condition value of SelectInst
If the condition value of SelectInst may be a poison or undef value,
infer constant range at SelectInst use is incorrect, similar to D143883.
Fixes https://github.com/llvm/llvm-project/issues/62401

Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D149339
2023-04-27 21:35:54 +08:00
Noah Goldstein
5a3d9e0617 [InstCombine] Transform (shift X,Or(Y,BitWidth-1)) -> (shift X,BitWidth-1)
shl : https://alive2.llvm.org/ce/z/_B7Qca
lshr: https://alive2.llvm.org/ce/z/6eXz_W
ashr: https://alive2.llvm.org/ce/z/oGEx-q

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D145326
2023-03-06 20:30:06 -06:00
Noah Goldstein
9bb409ff1d [InstCombine] Add tests for transform (shift X,(Or Y, BitWidth-1)); NFC
Differential Revision: https://reviews.llvm.org/D145334
2023-03-06 20:29:57 -06:00
Sanjay Patel
8e8467d9d8 [InstCombine] canonicalize "extract lowest set bit" away from cttz intrinsic
1 << (cttz X) --> -X & X
https://alive2.llvm.org/ce/z/qv3E9e

This creates an extra use of the input value, so that's generally
not preferred, but there are advantages to this direction:
1. 'negate' and 'and' allow for better analysis than 'cttz'.
2. This is more likely to induce follow-on transforms (in the
   example from issue #60801, we'll get the decrement pattern).
3. The more basic ALU ops are more likely to result in better
   codegen across a variety of targets.

This won't solve the motivating bugs (see issue #60799) because
we do not recognize the redundant icmp+sel, and the x86 backend
may not have the pattern-matching to produce the optimal BMI
instructions.

Differential Revision: https://reviews.llvm.org/D144329
2023-02-19 17:29:40 -05:00
Sanjay Patel
a8831631c7 [InstCombine] add tests for 1<<cttz(x); NFC
issue #60799
issue #60801
2023-02-18 08:34:55 -05:00
Sanjay Patel
d4493dd1ed [InstCombine] add nuw to any (1<<x)
https://alive2.llvm.org/ce/z/9EjDKE

This was mentioned as a missing fold in D139598.

It can unlock follow-on folds in some cases.
This verifies one of the changed tests:
https://alive2.llvm.org/ce/z/B_btDM
2022-12-15 12:03:47 -05:00
William Huang
be4b1dd35b [InstCombine] Revert D125845
Reverting D125845 `[InstCombine] Canonicalize GEP of GEP by swapping constant-indexed GEP to the back` because multiple users reported performance regression

Reviewed By: davidxl

Differential Revision: https://reviews.llvm.org/D138950
2022-11-29 22:02:40 +00:00
Philip Reames
656e53e544 [instcombine] Add basic test coverage for demanded bits of scalable vectors 2022-10-21 07:59:04 -07:00
William Huang
6c767cef5a [InstCombine] Canonicalize GEP of GEP by swapping constant-indexed GEP to the back
Canonicalize GEP of GEP by swapping GEP with some suffix constant indices to the back (and GEP with all constant indices to the back of that), this allows more constant index GEP merging to happen. Exceptions are: If swapping violates use-def relations, or anti-optimizes LICM

For constant indexed GEP of GEP, if they cannot be merged directly, they will be casted to i8* and merged.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D125845
2022-10-20 17:41:26 +00:00
Bjorn Pettersson
4ab40eca08 [test][InstCombine] Update some test cases to use opaque pointers
These tests cases were converted using the script at
https://gist.github.com/nikic/98357b71fd67756b0f064c9517b62a34

Differential Revision: https://reviews.llvm.org/D135094
2022-10-03 22:17:59 +02:00
Sanjay Patel
1d1d1e6f22 [InstCombine] fold full-shift of sdiv to icmp+extend
This is a disguised sign-bit test with offset:
(X / +DivC) >> (Width - 1) --> ext (X <= -DivC)
(X / -DivC) >> (Width - 1) --> ext (X >= +DivC)

https://alive2.llvm.org/ce/z/cO8JO4

We don't match/test poison in the sdiv constant because
that would be immediate undefined behavior.
2022-09-18 13:13:14 -04:00
Sanjay Patel
0ae6bc0771 [InstCombine] add tests for full-right-shift of sdiv; NFC 2022-09-18 13:13:14 -04:00
Sanjay Patel
73919a87e9 [InstCombine] try multi-use demanded bits folds for 'add'
This patch enables a multi-use demanded bits fold (motivated by issue #57576):
https://alive2.llvm.org/ce/z/DsZakh

This mimics transforms that we already do on the single-use path.

Originally, this patch did not include the last part to form a constant, but
that can be removed independently to reduce risk. It's not clear what the
effect of either change will be when viewed end-to-end.

This is expected to be neutral or a slight win for compile-time.
See the "add-demand2" series for experimental timing results:
https://llvm-compile-time-tracker.com/?config=NewPM-O3&stat=instructions&remote=rotateright

Differential Revision: https://reviews.llvm.org/D133788
2022-09-14 09:30:59 -04:00
Craig Topper
d76c8f5127 [InstCombine] Add mul with negated power of 2 constant to canEvaluateShifted.
If we are right shifting a multiply by a negated power of 2 where
the power of 2 is the same as the shift amount, we can replace with
a negate followed by an And.

New tests have not been committed yet but the patch shows the diffs.
Let me know if you want any changes or additional tests.

Differential Revision: https://reviews.llvm.org/D130103
2022-07-20 11:00:22 -07:00
Craig Topper
3aff7870a7 [InstCombine] Pre-commit test for D130103. 2022-07-20 11:00:21 -07:00
Nuno Lopes
952e069393 [NFC] remove 'br undef' from InstCombine test cases
This is UB and allows the compiler to give any result, so these tests weren't meaningful
InstCombine tests are now clean of 'br undef'
2022-06-10 15:28:57 +01:00
Sanjay Patel
a004438959 [InstCombine] add/move tests for shift-of-constant-by-same-shift-by-constant; NFC 2022-05-30 15:17:54 -04:00
Nikita Popov
0863abe3ac [InstCombine] Fold icmp of select with non-constant operand
Try to push an icmp into a select even if the icmp operand isn't
constant - perform a generic SimplifyICmpInst instead.

This doesn't appear to impact compile-time much, and forming
logical and/or is generally profitable, as we have very good
support for them.
2022-05-06 16:04:39 +02:00
Bjorn Pettersson
acdc419c89 [test] Use -passes=instcombine instead of -instcombine in lots of tests. NFC
Another step moving away from the deprecated syntax of specifying
pass pipeline in opt.

Differential Revision: https://reviews.llvm.org/D119081
2022-02-07 14:26:59 +01:00
Sanjay Patel
2d031ec5e5 [InstCombine] add one-use check to opposite shift folds
Test comments say this might be intentional, but I don't
see any hard evidence to support it. The extra instruction
shows up as a potential regression in D117680.

One test does show a missed fold that might be recovered
with better demanded bits analysis.
2022-01-20 13:49:23 -05:00
Sanjay Patel
09c575e728 [InstCombine] add/move tests for shl with binop; NFC 2021-09-28 14:46:27 -04:00
Sanjay Patel
21429cf43a [InstCombine] generalize fold for (trunc (X u>> C1)) u>> C
This is another step towards trying to re-apply D110170
by eliminating conflicting transforms that cause infinite loops.
a47c8e40c734 was a previous patch in this direction.

The diffs here are mostly cosmetic, but intentional:
1. The existing code that would handle this pattern in FoldShiftByConstant()
   is limited to 'shl' only now. The formatting change to IsLeftShift shows
   that we could move several transforms into visitShl() directly for
   efficiency because they are not common shift transforms.

2. The tests are regenerated to show new instruction names to prove that
   we are getting (almost) identical logic results.

3. The one case where we differ ("trunc_sandwich_small_shift1") shows that
   we now use a narrow 'and' instruction. Previously, we relied on another
   transform to do that, but it is limited to legal types. That seems to
   be a legacy constraint from when IR analysis and codegen were less robust.

https://alive2.llvm.org/ce/z/JxyGA4

  declare void @llvm.assume(i1)

  define i8 @src(i32 %x, i32 %c0, i8 %c1) {
    ; The sum of the shifts must not overflow the source width.
    %z1 = zext i8 %c1 to i32
    %sum = add i32 %c0, %z1
    %ov = icmp ult i32 %sum, 32
    call void @llvm.assume(i1 %ov)

    %sh1 = lshr i32 %x, %c0
    %tr = trunc i32 %sh1 to i8
    %sh2 = lshr i8 %tr, %c1
    ret i8 %sh2
  }

  define i8 @tgt(i32 %x, i32 %c0, i8 %c1) {
    %z1 = zext i8 %c1 to i32
    %sum = add i32 %c0, %z1
    %maskc = lshr i8 -1, %c1

    %s = lshr i32 %x, %sum
    %t = trunc i32 %s to i8
    %a = and i8 %t, %maskc
    ret i8 %a
  }
2021-09-27 10:57:31 -04:00
Simon Pilgrim
5a14edd8ed [InstCombine] Ensure shifts are in range for (X << C1) / C2 -> X fold.
We can get here before out of range shift amounts have been handled - limit to BW-2 for sdiv and BW-1 for udiv

Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=38078
2021-09-25 12:57:43 +01:00
Simon Pilgrim
10c982e0b3 Revert rG1c9bec727ab5c53fa060560dc8d346a911142170 : [InstCombine] Fold (gep (oneuse(gep Ptr, Idx0)), Idx1) -> (gep Ptr, (add Idx0, Idx1)) (PR51069)
Reverted (manually due to merge conflicts) while regressions reported on PR51540 are investigated

As noticed on D106352, after we've folded "(select C, (gep Ptr, Idx), Ptr) -> (gep Ptr, (select C, Idx, 0))" if the inner Ptr was also a (now one use) gep we could then merge the geps, using the sum of the indices instead.

I've limited this to basic 2-op geps - a more general case further down InstCombinerImpl.visitGetElementPtrInst doesn't have the one-use limitation but only creates the add if it can be created via SimplifyAddInst.

https://alive2.llvm.org/ce/z/f8pLfD (Thanks Roman!)

Differential Revision: https://reviews.llvm.org/D106450
2021-08-23 21:09:26 +01:00
Simon Pilgrim
1c9bec727a [InstCombine] Fold (gep (oneuse(gep Ptr, Idx0)), Idx1) -> (gep Ptr, (add Idx0, Idx1)) (PR51069)
As noticed on D106352, after we've folded "(select C, (gep Ptr, Idx), Ptr) -> (gep Ptr, (select C, Idx, 0))" if the inner Ptr was also a (now one use) gep we could then merge the geps, using the sum of the indices instead.

I've limited this to basic 2-op geps - a more general case further down InstCombinerImpl.visitGetElementPtrInst doesn't have the one-use limitation but only creates the add if it can be created via SimplifyAddInst.

https://alive2.llvm.org/ce/z/f8pLfD (Thanks Roman!)

Differential Revision: https://reviews.llvm.org/D106450
2021-07-22 10:58:51 +01:00
Sanjay Patel
0be0a1237c [ValueTracking] improve analysis for "C << X" and "C >> X"
This is based on the example/comments in:
https://llvm.org/PR48984

I tried just lifting the restriction in computeKnownBitsFromShiftOperator()
as suggested in the bug report, but that doesn't catch all of the cases
shown here. I didn't step through to see exactly why that happened. But it
seems like a reasonable compromise to cheaply check the special-case of
shifting a constant.

There's a slight regression on a cmp transform as noted, but this is likely
the more important/common pattern, so we can fix that icmp pattern later if
needed.

Differential Revision: https://reviews.llvm.org/D95959
2021-02-09 12:38:06 -05:00
Sanjay Patel
9d230295d9 [InstCombine] add tests for demanded/known bits of shifted constant; NFC
These are variations of a missed analysis noted in:
https://llvm.org/PR48984
2021-02-04 10:31:22 -05:00
Nikita Popov
a6df39236f [InstSimplify] Fold out-of-bounds shift to poison
Make InstSimplify return poison rather than undef for out-of-bounds
shifts, as specified by LandRef:

> If op2 is (statically or dynamically) equal to or larger than the
> number of bits in op1, this instruction returns a poison value.

Differential Revision: https://reviews.llvm.org/D93998
2021-01-06 20:41:37 +01:00
Nikita Popov
766cf7f32e [InstSimplify] Fold division by zero to poison
Div/rem by zero is immediate undefined behavior and anything goes.
Currently we fold it to undef, this patch changes it to fold to
poison instead, which is slightly stronger.

Differential Revision: https://reviews.llvm.org/D93995
2021-01-03 20:52:45 +01:00
Roman Lebedev
0e76a9bc58
[NFC][InstCombine] Update few comment updates i missed in 0ac56e8eaaeb
As pointed out in post-commit review in that commit
2020-11-06 17:38:00 +03:00