6354 Commits

Author SHA1 Message Date
Yingwei Zheng
a77dedcacb
[InstSimplify][InstCombine][ConstantFold] Move vector div/rem by zero fold to InstCombine (#114280)
Previously we folded `div/rem X, C` into `poison` if any element of the
constant divisor `C` was zero or undef. However, this is incorrect when
threading udiv over a vector select:
https://alive2.llvm.org/ce/z/3Ninx5
```
define <2 x i32> @vec_select_udiv_poison(<2 x i1> %x) {
  %sel = select <2 x i1> %x, <2 x i32> <i32 -1, i32 -1>, <2 x i32> <i32 0, i32 1>
  %div = udiv <2 x i32> <i32 42, i32 -7>, %sel
  ret <2 x i32> %div
}
```
In this case, `threadBinOpOverSelect` folds `udiv <i32 42, i32 -7>, <i32
-1, i32 -1>` and `udiv <i32 42, i32 -7>, <i32 0, i32 1>` into
`zeroinitializer` and `poison`, respectively. One solution is to
introduce a new flag indicating that we are threading over a vector
select, but that requires modifying both `InstSimplify` and
`ConstantFold`.

However, this optimization doesn't provide benefits to real-world
programs:

https://dtcxzyw.github.io/llvm-opt-benchmark/coverage/data/zyw/opt-ci/actions-runner/_work/llvm-opt-benchmark/llvm-opt-benchmark/llvm/llvm-project/llvm/lib/IR/ConstantFold.cpp.html#L908

https://dtcxzyw.github.io/llvm-opt-benchmark/coverage/data/zyw/opt-ci/actions-runner/_work/llvm-opt-benchmark/llvm-opt-benchmark/llvm/llvm-project/llvm/lib/Analysis/InstructionSimplify.cpp.html#L1107

This patch moves the fold into InstCombine to avoid breaking numerous
existing tests.

Fixes #114191 and #113866 (only poison-safety issue).
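
To make the miscompile concrete, here is a small lane-by-lane model of the example above. It is only a sketch: `POISON` is a string sentinel, a zero divisor lane is modeled as a poison lane for simplicity, and the helper names (`udiv_lane`, `old_vector_fold`, `threaded`) are made up for this illustration rather than taken from LLVM.

```python
POISON = "poison"      # sentinel standing in for a poison lane (modeling assumption)
MASK32 = 0xFFFFFFFF
DIVIDEND = [42, -7]    # the constant dividend from the IR example


def udiv_lane(a, b):
    """Unsigned i32 division of one lane; a zero divisor is modeled as POISON."""
    a, b = a & MASK32, b & MASK32
    return POISON if b == 0 else a // b


def select(cond, tv, fv):
    """Lane-wise vector select."""
    return [t if c else f for c, t, f in zip(cond, tv, fv)]


def original(cond):
    # %sel = select %x, <-1, -1>, <0, 1> ; %div = udiv <42, -7>, %sel
    sel = select(cond, [-1, -1], [0, 1])
    return [udiv_lane(a, b) for a, b in zip(DIVIDEND, sel)]


def old_vector_fold(divisor):
    # Old behavior: the whole vector becomes poison if any divisor lane is zero.
    if any((d & MASK32) == 0 for d in divisor):
        return [POISON] * len(divisor)
    return [udiv_lane(a, b) for a, b in zip(DIVIDEND, divisor)]


def threaded(cond):
    # Fold each select arm separately, then select between the folded arms.
    return select(cond, old_vector_fold([-1, -1]), old_vector_fold([0, 1]))
```

With `cond = [True, False]`, `original` yields a well-defined value in lane 1 (the divisor lane is `1`), while `threaded` yields `POISON` there, because the second arm was folded to poison as a whole.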
2024-11-01 22:56:22 +08:00
Yingwei Zheng
e577f14b67
[InstCombine] Use m_NotForbidPoison when folding (X u< Y) ? -1 : (~X + Y) --> uadd.sat(~X, Y) (#114345)
Alive2: https://alive2.llvm.org/ce/z/mTGCo-
We cannot reuse `~X` if `m_AllOnes` matches a vector constant with some
poison elts. An alternative solution is to create a new not instead of
reusing `~X`, but it isn't worth the effort because we would need to add
a one-use check.

Fixes https://github.com/llvm/llvm-project/issues/113869.
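
For reference, the scalar identity behind this fold can be checked exhaustively at 8 bits. This sketch only covers the arithmetic; it does not model the poison-element issue that the patch fixes, and the function names are illustrative.

```python
BITS = 8
MASK = (1 << BITS) - 1   # all-ones value, i.e. -1 as an unsigned i8


def select_form(x, y):
    # (X u< Y) ? -1 : (~X + Y), with wrapping i8 arithmetic
    return MASK if x < y else ((~x & MASK) + y) & MASK


def uadd_sat(a, b):
    # Unsigned saturating add
    return min(a + b, MASK)


def fold_holds():
    # Compare both forms for every pair of i8 values
    return all(select_form(x, y) == uadd_sat(~x & MASK, y)
               for x in range(1 << BITS) for y in range(1 << BITS))
```

The identity works because `~X + Y` overflows exactly when `Y` is unsigned-greater than `X`, which is the case the select maps to `-1`.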
2024-11-01 22:18:44 +08:00
Yingwei Zheng
96b14f2ccb
[Reland][InstCombine] Fix FMF propagation in foldSelectIntoOp (#114499)
Relands #114356. Compared to the last version, this patch only merges
poison-generating/nsz flags from the select to fix LV regression in
`llvm/test/Transforms/PhaseOrdering/AArch64/predicated-reduction.ll`.
2024-11-01 12:22:57 +08:00
gulfemsavrun
d183dc7c24
Revert "[InstCombine] Fix FMF propagation in foldSelectIntoOp" (#114458)
Reverts llvm/llvm-project#114356 because it caused test failures.
https://lab.llvm.org/buildbot/#/builders/190/builds/8601

https://luci-milo.appspot.com/ui/p/fuchsia/builders/toolchain.ci/clang-base-linux-x64/b8732549597609293617/overview
2024-10-31 13:21:52 -07:00
Yingwei Zheng
cf1963afad
[InstCombine] Fix FMF propagation in foldSelectIntoOp (#114356)
Closes https://github.com/llvm/llvm-project/issues/113423.
2024-10-31 23:26:45 +08:00
Yingwei Zheng
18311093ab
[InstCombine] Do not fold shufflevector(select) if the select condition is a vector (#113993)
Since `shufflevector` is not element-wise, we cannot fold it into a
select when the select condition is a vector.
For shufflevector that doesn't change the length, it doesn't crash, but
it is still a miscompilation: https://alive2.llvm.org/ce/z/s8saCx

Fixes https://github.com/llvm/llvm-project/issues/113986.
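
A list-based sketch of why a vector condition breaks the fold, for a shuffle that keeps the length unchanged. Vectors are modeled as Python lists and a scalar condition as a plain `bool`; the helper names are illustrative only.

```python
def select(cond, tv, fv):
    """Select with either a scalar (bool) or a per-lane (list) condition."""
    if isinstance(cond, bool):
        return list(tv if cond else fv)
    return [t if c else f for c, t, f in zip(cond, tv, fv)]


def shuffle(v, mask):
    """Single-input shufflevector: result lane i takes v[mask[i]]."""
    return [v[i] for i in mask]


def before(cond, x, y, mask):
    # shufflevector(select(c, x, y), mask)
    return shuffle(select(cond, x, y), mask)


def after(cond, x, y, mask):
    # select(c, shufflevector(x, mask), shufflevector(y, mask))
    return select(cond, shuffle(x, mask), shuffle(y, mask))
```

With `x = [1, 2]`, `y = [3, 4]` and the lane swap `mask = [1, 0]`, a scalar condition gives the same result either way, but the vector condition `[True, False]` gives `[4, 1]` before the fold and `[2, 3]` after it, since the condition lanes are not shuffled along with the data.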
2024-10-29 10:39:07 +08:00
David Majnemer
902acde341 [InstCombine] Optimize away certain additions using modular arithmetic
We can turn:
```
  %add = add i8 %arg, C1
  %and = and i8 %add, C2
  %cmp = icmp eq i8 %and, C3
```

into:
```
  %and = and i8 %arg, C2
  %cmp = icmp eq i8 %and, (C3 - C1) & C2
```

This is only worth doing if the sequence is the sole user of the addition
operation.
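
The rewrite can be sanity-checked by brute force. The message does not spell out the precondition on the constants, so this sketch assumes `C2` is a low-bit mask (`2^k - 1`) and `C3 <= C2`, which is what turns the masked comparison into a congruence modulo `2^k`. A 5-bit width is used to keep the search small; all names are illustrative.

```python
BITS = 5
WRAP = (1 << BITS) - 1   # models wrapping arithmetic at this width


def original(arg, c1, c2, c3):
    # ((arg + C1) & C2) == C3
    return (((arg + c1) & WRAP) & c2) == c3


def rewritten(arg, c1, c2, c3):
    # (arg & C2) == ((C3 - C1) & C2)
    return (arg & c2) == (((c3 - c1) & WRAP) & c2)


def holds_for(c2, c3):
    # Check every arg and every C1 for one choice of (C2, C3)
    return all(original(a, c1, c2, c3) == rewritten(a, c1, c2, c3)
               for a in range(1 << BITS) for c1 in range(1 << BITS))


def holds_for_low_bit_masks():
    # Assumed precondition: C2 = 2^k - 1 and C3 <= C2
    return all(holds_for((1 << k) - 1, c3)
               for k in range(BITS + 1) for c3 in range(1 << k))
```

For a mask that is not a low-bit mask, such as `C2 = 0b10`, carries from the bits below the mask can change the result, and `holds_for(0b10, 0b10)` is false.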
2024-10-28 22:51:35 +00:00
Matthias Braun
5903c6af44
InstCombine: Fold shufflevector(select) and shufflevector(phi) (#113746)
- Transform `shufflevector(select(c, x, y), C)` to
  `select(c, shufflevector(x, C), shufflevector(y, C))` by re-using
  the `FoldOpIntoSelect` helper.
- Transform `shufflevector(phi(x, y), C)` to
  `phi(shufflevector(x, C), shufflevector(y, C))` by re-using the
  `foldOpIntoPhi` helper.
2024-10-28 15:35:17 -07:00
Yingwei Zheng
f78610af3f
[InstCombine] Add function attribute instcombine-no-verify-fixpoint (#113822)
This patch introduces a function attribute
`instcombine-no-verify-fixpoint` to avoid disabling fix-point
verification for unrelated tests in the same file.
Addresses the review comment at
https://github.com/llvm/llvm-project/pull/112642#discussion_r1804714387.
2024-10-28 17:45:08 +08:00
Yingwei Zheng
5155c38cee
[InstCombine] Don't check uses of constant exprs (#113684)
This patch skips constant expressions to avoid iterating over uses on
other functions.

Fix crash reported in
https://github.com/llvm/llvm-project/pull/105510#issuecomment-2437521147.
2024-10-28 15:09:20 +08:00
David Majnemer
5d4a0d54b5 [InstCombine] Teach takeLog2 about right shifts, truncation and bitwise-and
We left some easy opportunities for further simplifications.

log2(trunc(x)) is simply trunc(log2(x)). This is safe if we know that
trunc is NUW because it means that the truncation didn't drop any bits.
It is also safe if the caller is OK with zero as a possible answer.

log2(x >>u y) is simply `log2(x) - y`.

log2(x & y) is a funny one. It comes up when doing something like:
```
unsigned int f(unsigned int x, unsigned int y) {
  unsigned char a = 1u << x;
  return y / a;
}
```

LLVM would canonicalize this to:
```
  %shl = shl nuw i32 1, %x
  %conv1 = and i32 %shl, 255
  %div = udiv i32 %y, %conv1
```

In cases like these, we can ignore the mask entirely.
This is equivalent to `y >> x`.
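
A quick model of the C function above, to check the claimed equivalence. The shift amount is restricted to `0..7`, because for larger `x` the truncated `a` is zero and the division is undefined in C. The names `f` and `shift_form` are only for this sketch.

```python
def f(x, y):
    # Mirrors the C snippet: `a` is (1u << x) truncated to unsigned char
    a = (1 << x) & 0xFF
    return y // a


def shift_form(x, y):
    # The simplified form: y >> x
    return y >> x


def agree():
    # Sample of 32-bit y values, plus the extremes
    ys = list(range(0, 1 << 16, 257)) + [0xFFFFFFFF, 0x80000000]
    return all(f(x, y) == shift_form(x, y) for x in range(8) for y in ys)
```

Within that range the mask never clears the single set bit of `1 << x`, which is why it can be ignored when taking the log2 of the divisor.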
2024-10-28 05:13:04 +00:00
Jay Foad
90cdc03e7f
[IR] Fix undiagnosed cases of structs containing scalable vectors (#113455)
Type::isScalableTy and StructType::containsScalableVectorType failed to
detect some cases of structs containing scalable vectors because
containsScalableVectorType did not call back into isScalableTy to check
the element types. Fix this, which requires sharing the same Visited set
in both functions. Also change the external API so that callers are
never required to pass in a Visited set, and normalize the naming to
isScalableTy.
2024-10-25 12:56:10 +01:00
Noah Goldstein
294726d738 Reapply "[InstCombine] Folding (icmp eq/ne (and X, -P2), INT_MIN)" (#111236)
The underlying issue with msan was fixed by #113200
2024-10-23 09:12:08 -05:00
Andreas Jonson
00b47b98d4 [NFC] Fix missplaced comment 2024-10-22 20:51:46 +02:00
XChy
a2ba438f3e
[InstCombine] Preserve the flag from RHS only if the and is bitwise (#113164)
Fixes #113123
Alive proof: https://alive2.llvm.org/ce/z/hnqeLC
2024-10-21 22:30:31 +08:00
Kazu Hirata
8819267747
[InstCombine] Simplify code with SmallMapVector::operator[] (NFC) (#113022) 2024-10-19 14:38:40 -07:00
Ramkumar Ramachandra
7b65971e1f
InstCombine: sink loads with invariant.load metadata (#112692) 2024-10-18 10:35:56 +01:00
goldsteinn
c85611e858
[SimplifyLibCall][Attribute] Fix bug where we may keep range attr with incompatible type (#112649)
In a variety of places we change the bitwidth of a parameter but don't
update the attributes.

The issue in this case is from the `range` attribute when inlining
`__memset_chk`. `optimizeMemSetChk` will replace an `i32` with an
`i8`, and if the `i32` had a `range` attr associated it will cause an
error.

Fixes #112633
2024-10-17 10:32:55 -05:00
Jay Foad
85c17e4092
[LLVM] Make more use of IRBuilder::CreateIntrinsic. NFC. (#112706)
Convert many instances of:
  Fn = Intrinsic::getOrInsertDeclaration(...);
  CreateCall(Fn, ...)
to the equivalent CreateIntrinsic call.
2024-10-17 16:20:43 +01:00
Yingwei Zheng
095d49da76
[InstCombine] Set samesign when converting signed predicates into unsigned (#112642)
Alive2: https://alive2.llvm.org/ce/z/6cqdt-
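
As a reminder of why the flag is justified when a signed predicate is converted to an unsigned one: the two comparisons agree whenever both operands have the same sign bit. A brute-force sketch at 8 bits, with illustrative helper names:

```python
BITS = 8


def to_signed(v):
    """Reinterpret an unsigned i8 bit pattern as a signed value."""
    return v - (1 << BITS) if v >> (BITS - 1) else v


def same_sign(a, b):
    return (a >> (BITS - 1)) == (b >> (BITS - 1))


def signed_matches_unsigned():
    # slt and ult agree on every same-sign pair of i8 values
    return all((to_signed(a) < to_signed(b)) == (a < b)
               for a in range(1 << BITS) for b in range(1 << BITS)
               if same_sign(a, b))
```

When the signs differ the predicates disagree, for example `0x80` versus `0x01`: signed less-than is true, unsigned less-than is false.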
2024-10-17 20:43:48 +08:00
Nikita Popov
0f7d148db4 [InstCombine] Add shared helper for logical and bitwise and/or (NFC)
Add a helper for shared folds between logical and bitwise and/or
and move the and/or of icmp and fcmp folds in there. This makes
it easier to extend to more folds.

A possible extension would be to base the current and/or of icmp
reassociation logic on this helper, so that it for example also
applies to fcmp.
2024-10-17 14:25:44 +02:00
Ramkumar Ramachandra
682fa797b7
InstCombine/Select: remove redundant code (NFC) (#112388)
InstCombinerImpl::foldSelectInstWithICmp has some inlined code for
select-icmp-xor simplification, but this simplification is already done
by other code, via another path:

  (X & Y) == 0 ? X : X ^ Y ->
  ((X & Y) == 0 ? 0 : Y) ^ X ->
  (X & Y) ^ X ->
  X & ~Y

Cover the cases that it claims to simplify, and demonstrate that
stripping it doesn't cause test changes.
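
The chain of rewrites above can be checked exhaustively at 8 bits. This sketch assumes `Y` is a single-bit (power-of-two) constant, which is what the middle step `((X & Y) == 0 ? 0 : Y) -> (X & Y)` relies on; that precondition is an assumption of the sketch, not quoted from the patch.

```python
BITS = 8
MASK = (1 << BITS) - 1


def select_form(x, y):
    # (X & Y) == 0 ? X : X ^ Y
    return x if (x & y) == 0 else x ^ y


def and_not_form(x, y):
    # X & ~Y
    return x & (~y & MASK)


def holds_for_single_bit_y():
    # Assumed precondition: Y has exactly one bit set
    return all(select_form(x, 1 << k) == and_not_form(x, 1 << k)
               for x in range(1 << BITS) for k in range(BITS))
```

For a multi-bit `Y` the identity fails, for example `X = 1`, `Y = 3`: the select yields `2` while `X & ~Y` is `0`.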
2024-10-16 12:44:09 +01:00
Yingwei Zheng
0936195311
[InstCombine] Drop samesign in InstCombine (#112480)
Closes https://github.com/llvm/llvm-project/issues/112476.
2024-10-16 19:13:52 +08:00
Yingwei Zheng
3bf2295ee0
[InstCombine] Drop samesign flag in foldAndOrOfICmpsWithConstEq (#112489)
In 5dbfca30c1 we assumed that RHS being poison implies that LHS is also
poison. That no longer holds after the introduction of the `samesign`
flag.

This patch drops the `samesign` flag on RHS if the original expression
is a logical and/or.

Closes #112467.
2024-10-16 16:24:44 +08:00
Alexey Bader
583fa4f5b7
[InstCombine] Extend fcmp+select folding to minnum/maxnum intrinsics (#112088)
Today, InstCombine can fold fcmp+select patterns to minnum/maxnum
intrinsics when the nnan and nsz flags are set. The ordering of the
operands in both the fcmp and select instructions is important for the
folding to occur.

maxnum patterns:
1. (a op b) ? a : b -> maxnum(a, b), where op is one of {ogt, oge}
2. (a op b) ? b : a -> maxnum(a, b), where op is one of {ule, ult}

The second pattern is supposed to make the order of the operands in the
select instruction irrelevant. However, the pattern matching code uses
the CmpInst::getInversePredicate method to invert the comparison
predicate. This method doesn't take into account the fast-math flags,
which can lead to missing the folding opportunity.

The patch extends the pattern matching code to handle unordered fcmp
instructions. This allows the folding to occur even when the select
instruction has the operands in the inverse order.

New maxnum patterns:
1. (a op b) ? a : b -> maxnum(a, b), where op is one of {ugt, uge}
2. (a op b) ? b : a -> maxnum(a, b), where op is one of {ole, olt}

The same changes are applied to the minnum intrinsic.
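
A small model of the four maxnum patterns, to show that ordered and unordered predicates pick the same value once NaNs are excluded (the `nnan` assumption). Results are compared with `==`, so `0.0` and `-0.0` count as equal, which loosely mirrors `nsz`. The predicate helpers are written for this sketch only.

```python
import math


def unordered(a, b):
    return math.isnan(a) or math.isnan(b)


# fcmp predicates: ordered ones are false on NaN, unordered ones are true
def ogt(a, b): return not unordered(a, b) and a > b
def ugt(a, b): return unordered(a, b) or a > b
def ole(a, b): return not unordered(a, b) and a <= b
def ule(a, b): return unordered(a, b) or a <= b


SAMPLES = [-math.inf, -2.5, -0.0, 0.0, 1.0, 3.5, math.inf]  # no NaN (nnan)


def patterns_agree_without_nan():
    for a in SAMPLES:
        for b in SAMPLES:
            expected = max(a, b)
            results = [
                a if ogt(a, b) else b,   # existing pattern 1
                b if ule(a, b) else a,   # existing pattern 2
                a if ugt(a, b) else b,   # new pattern 1
                b if ole(a, b) else a,   # new pattern 2
            ]
            if any(r != expected for r in results):
                return False
    return True
```

The predicates only differ when an operand is NaN, e.g. `ugt(nan, 1.0)` is true while `ogt(nan, 1.0)` is false, which is why the fast-math flags matter for the inverted forms.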
2024-10-15 22:05:16 +04:00
Ramkumar Ramachandra
1c6c850937
InstCombine: extend select-equiv to support vectors (#111966)
foldSelectEquivalence currently doesn't support GVN-like replacements on
vector types. Put in the checks for potentially lane-crossing
operations, and lift the limitation.
2024-10-15 11:10:45 +01:00
Yingwei Zheng
9edc454ee6
[InstCombine] Drop range attributes in foldIsPowerOf2OrZero (#112178)
Closes https://github.com/llvm/llvm-project/issues/112078.
2024-10-14 20:52:55 +08:00
Ramkumar Ramachandra
c5f82f7893
ValueTracking: introduce llvm::isNotCrossLaneOperation (#112011)
Factor out and unify common code from InstSimplify and InstCombine that
partially guard against cross-lane vector operations into
llvm::isNotCrossLaneOperation in ValueTracking.

Alive2 proofs for changed tests: https://alive2.llvm.org/ce/z/68H4ka
2024-10-14 11:37:30 +01:00
Rahul Joshi
fa789dffb1
[NFC] Rename Intrinsic::getDeclaration to getOrInsertDeclaration (#111752)
Rename the function to reflect its correct behavior and to be consistent
with `Module::getOrInsertFunction`. This is also in preparation of
adding a new `Intrinsic::getDeclaration` that will have behavior similar
to `Module::getFunction` (i.e., just lookup, no creation).
2024-10-11 05:26:03 -07:00
Yingwei Zheng
6a65e98fa7
[InstCombine] Drop range attributes in foldIsPowerOf2 (#111946)
Fixes https://github.com/llvm/llvm-project/issues/111934.
2024-10-11 18:19:21 +08:00
Arthur Eubanks
e34d614e7d
[Passes] Remove -enable-infer-alignment-pass flag (#111873)
This flag has been on for a while without any complaints.
2024-10-10 12:28:46 -07:00
Kazu Hirata
2d8cd32ae5
[InstCombine] Avoid repeated hash lookups (NFC) (#111618) 2024-10-08 20:37:33 -07:00
David Green
d2408c417c
[InstCombine] Canonicalize more geps with constant gep bases and constant offsets. (#110033)
This is another small, but hopefully not performance-negative, step
towards canonicalizing to i8 geps. We look for geps with a constant-offset
base pointer of the form `gep (gep @glob, C1), x, C2` and expand the gep
instruction, so that the constants can hopefully be combined (or
the x offset can be computed in common).
2024-10-06 10:44:21 +01:00
Vitaly Buka
574266ce33
Revert "[InstCombine] Folding (icmp eq/ne (and X, -P2), INT_MIN)" (#111236)
Reverts #110880 because it exposed an issue in MSan instrumentation:
#111212.

This reverts commit a64643688526114b50c25b3eda8a57855bd2be87.
2024-10-04 23:20:40 -07:00
Benjamin Maxwell
6b3220afa6
[InstCombine] Avoid crash on aggregate types in SimplifyDemandedUseFPClass (#111128)
This disables folding for FP aggregates that are not poison/posZero
types, which is currently not supported. Note: fully handling
aggregates would also likely require teaching `computeKnownFPClass()` to
handle array and struct constants (which does not seem to be implemented
outside of zero init).
2024-10-04 15:00:24 +01:00
Nikita Popov
67d247a441
[InstCombine] Decompose more icmps into masks (#110836)
Extend decomposeBitTestICmp() to handle cases where the resulting
comparison is of the form `icmp (X & Mask) pred C` with non-zero
`C`. Add a flag to allow code to opt-in to this behavior and use it in
the "log op of icmp" fold infrastructure.

This addresses regressions from #97289.

Proofs: https://alive2.llvm.org/ce/z/hUhdbU
2024-10-04 10:17:23 +02:00
Noah Goldstein
a646436885 [InstCombine] Folding (icmp eq/ne (and X, -P2), INT_MIN)
Folds to `(icmp slt/sge X, (INT_MIN + P2))`

Proofs: https://alive2.llvm.org/ce/z/vpNFY5

Closes #110880
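
The `eq` form of this fold can be checked exhaustively at 8 bits for every power of two `P2`. This is only a sketch of the arithmetic; the function names are illustrative.

```python
BITS = 8
MASK = (1 << BITS) - 1
INT_MIN = -(1 << (BITS - 1))


def to_signed(v):
    """Reinterpret an unsigned i8 bit pattern as a signed value."""
    return v - (1 << BITS) if v >> (BITS - 1) else v


def and_form(x, p2):
    # icmp eq (and X, -P2), INT_MIN
    return (x & (-p2 & MASK)) == (INT_MIN & MASK)


def cmp_form(x, p2):
    # icmp slt X, INT_MIN + P2
    return to_signed(x) < INT_MIN + p2


def fold_holds():
    return all(and_form(x, 1 << k) == cmp_form(x, 1 << k)
               for x in range(1 << BITS) for k in range(BITS))
```

Masking with `-P2` clears the low `log2(P2)` bits, so the `and` equals `INT_MIN` exactly for the `P2` smallest signed values.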
2024-10-03 13:05:08 -05:00
Stephen Tozer
caa265e01c
[DebugInfo][InstCombine] Do not overwrite prior DILocation for new Insts (#108565)
When InstCombine replaces an old instruction with a new instruction, it
copies !dbg and !annotation metadata from old to new. For some
InstCombine patterns we set a specific DILocation on the new instruction
prior to insertion, however, which more accurately reflects the new
instruction. This more specific DILocation may be overwritten on
insertion by a less appropriate one, resulting in a less correct line
mapping. This patch changes this behaviour to only copy the DILocation
from old to new if the new instruction has no existing DILocation (which
will always be the case for a new instruction unless InstCombine has
specifically set one).
2024-10-03 17:08:45 +01:00
Marina Taylor
d0d12fc78a
[InstCombine] Fold (X==Z) ? (Y==Z) : (!(Y==Z) && X==Y) --> X==Y (#108619)
This corresponds to the canonicalized form of some logic that was
seen in Swift-generated code for comparing optional pointers:
`(X==Z || Y==Z) ? (X==Z && Y==Z) : X==Y --> X==Y`
where `Z` was the constant `0`.

https://alive2.llvm.org/ce/z/J_3aa9
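
Both the source-level form and its canonicalized form can be checked against `X == Y` over a small domain. This is a plain boolean sketch that ignores poison; the function names are illustrative.

```python
from itertools import product


def canonical_form(x, y, z):
    # (X==Z) ? (Y==Z) : (!(Y==Z) && X==Y)
    return (y == z) if x == z else ((not (y == z)) and x == y)


def source_form(x, y, z):
    # (X==Z || Y==Z) ? (X==Z && Y==Z) : X==Y
    return (x == z and y == z) if (x == z or y == z) else (x == y)


def both_reduce_to_equality():
    # Four distinct values are enough to cover every equality pattern of X, Y, Z
    return all(canonical_form(x, y, z) == source_form(x, y, z) == (x == y)
               for x, y, z in product(range(4), repeat=3))
```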
2024-10-03 15:33:30 +01:00
Nikita Popov
7de492f90d [InstCombine] Preserve nuw flag in indexed compare fold
If all the involved GEPs have the nuw flag, also preserve it on
the resulting adds and GEPs.
2024-10-02 16:03:47 +02:00
Yingwei Zheng
62cd07fb67
[InstCombine] Canonicalize sub mask, X -> ~X when high bits are ignored (#110635)
Alive2: https://alive2.llvm.org/ce/z/NJgBPL

The motivating case of this patch is to emit `andn` on RISC-V with zbb
for expressions like `(sub 63, X) & 63`.
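
The motivating expression is easy to verify exhaustively at 8 bits. The general statement checked here, that `(M - X) & M == ~X & M` for any low-bit mask `M`, is my reading of "high bits are ignored" and should be treated as an assumption of the sketch.

```python
BITS = 8
WRAP = (1 << BITS) - 1


def sub_form(x, m):
    # (sub M, X) & M, with wrapping i8 arithmetic
    return ((m - x) & WRAP) & m


def not_form(x, m):
    # ~X & M
    return (~x & WRAP) & m


def holds_for_low_bit_masks():
    return all(sub_form(x, (1 << k) - 1) == not_form(x, (1 << k) - 1)
               for x in range(1 << BITS) for k in range(BITS + 1))
```

It holds because `~X` equals `-1 - X`, and `-1` is congruent to `M` modulo `M + 1`.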
2024-10-02 12:48:06 +08:00
Nikita Popov
e565a4fa0b [IR] Extract helper for GEPNoWrapFlags intersection (NFC)
When combining two geps into one by adding the offsets, we have
to take some care when intersecting the flags, because nusw flags
cannot be straightforwardly preserved.

Add a helper for this on GEPNoWrapFlags so we won't have to repeat
this logic in various places.
2024-10-01 16:58:23 +02:00
Yingwei Zheng
2a2c35a9a6
[InstCombine] Fold icmp spred (mul nsw X, Z), (mul nsw Y, Z) into icmp spred X, Y (#110630)
```
icmp spred (mul nsw X, Z), (mul nsw Y, Z) -> icmp spred X, Y iff Z > 0
icmp spred (mul nsw X, Z), (mul nsw Y, Z) -> icmp spred Y, X iff Z < 0
```
Alive2: https://alive2.llvm.org/ce/z/9fXFfn
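
A brute-force sketch of the two rules at a 5-bit width, restricted to inputs where neither multiplication overflows (the `nsw` assumption). Names are illustrative.

```python
import operator

BITS = 5
LO, HI = -(1 << (BITS - 1)), (1 << (BITS - 1)) - 1
SIGNED_PREDS = [operator.lt, operator.le, operator.gt, operator.ge]


def fits(v):
    """True if v is representable, i.e. the nsw multiply did not overflow."""
    return LO <= v <= HI


def fold_holds():
    values = range(LO, HI + 1)
    for x in values:
        for y in values:
            for z in values:
                if z == 0 or not (fits(x * z) and fits(y * z)):
                    continue
                for pred in SIGNED_PREDS:
                    lhs = pred(x * z, y * z)
                    # Z > 0 keeps the operand order, Z < 0 swaps it
                    rhs = pred(x, y) if z > 0 else pred(y, x)
                    if lhs != rhs:
                        return False
    return True
```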
2024-10-01 22:16:05 +08:00
Nikita Popov
e2a855def5 [InstCombine] Fix SimplifyDemandedBits recursion cutoff for Arguments
There was a discrepancy between how SimplifyDemandedBits and
computeKnownBits handled the Argument case. computeKnownBits()
would use information from range attributes even once the
recursion limit has been reached.

Fixes https://github.com/llvm/llvm-project/issues/110631.
2024-10-01 11:44:13 +02:00
Yingwei Zheng
1efd1227b2
[InstCombine] Fold icmp eq/ne (X *nw Z), (Y *nw Z) -> icmp eq/ne Z, 0 when X != Y (#110413)
Alive2: https://alive2.llvm.org/ce/z/9oDP6K
I found this pattern in
04e75858d7/casadi/core/repmat.cpp (L70-L78).
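
A brute-force sketch of this fold at a 5-bit width, checking the `nuw` and `nsw` variants separately and only on inputs where the multiplications do not wrap. Names are illustrative.

```python
BITS = 5
UMAX = (1 << BITS) - 1
SMIN, SMAX = -(1 << (BITS - 1)), (1 << (BITS - 1)) - 1


def check(values, lo, hi):
    for x in values:
        for y in values:
            if x == y:          # precondition: X != Y
                continue
            for z in values:
                if not (lo <= x * z <= hi and lo <= y * z <= hi):
                    continue    # multiplication would wrap; no-wrap flag violated
                # icmp eq (X * Z), (Y * Z)  <=>  icmp eq Z, 0
                if (x * z == y * z) != (z == 0):
                    return False
    return True


def holds_nuw():
    return check(range(UMAX + 1), 0, UMAX)


def holds_nsw():
    return check(range(SMIN, SMAX + 1), SMIN, SMAX)
```

Without the no-wrap flag the fold is not valid: in wrapping 5-bit arithmetic `0 * 16` and `2 * 16` are both `0` even though `Z = 16` is non-zero.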
2024-09-30 10:21:20 +08:00
Simon Pilgrim
795c24c6fb
[InstCombine] foldVecExtTruncToExtElt - extend to handle trunc(lshr(extractelement(x,c1),c2)) -> extractelement(bitcast(x),c3) patterns. (#109689)
This patch moves the existing trunc+extractelement -> extractelement+bitcast fold into a foldVecExtTruncToExtElt helper and extends the helper to handle trunc+lshr+extractelement cases as well.

Fixes #107404
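
A byte-level sketch of the extended pattern, assuming a little-endian layout, a `<N x i32>` source, a truncation to `i8`, and a shift that is a multiple of 8. The index formula `c3 = c1 * 4 + c2 / 8` is derived for this sketch under those assumptions and is not quoted from the patch.

```python
import struct


def trunc_lshr_extract(vec, c1, c2):
    # trunc i8 (lshr (extractelement <N x i32> vec, c1), c2)
    return (vec[c1] >> c2) & 0xFF


def extract_after_bitcast(vec, c1, c2):
    # extractelement (bitcast vec to <4N x i8>), c3, on a little-endian target
    as_bytes = struct.pack("<%dI" % len(vec), *vec)
    c3 = c1 * 4 + c2 // 8
    return as_bytes[c3]


def forms_agree(vec):
    return all(trunc_lshr_extract(vec, c1, c2) == extract_after_bitcast(vec, c1, c2)
               for c1 in range(len(vec)) for c2 in (0, 8, 16, 24))
```

On a big-endian target the byte index would be mirrored within each element, so the constant `c3` would differ.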
2024-09-28 17:52:10 +01:00
Ramkumar Ramachandra
1832d609f7
InstCombine/Demanded: simplify srem case (NFC) (#110260)
The srem case of SimplifyDemandedUseBits partially duplicates
KnownBits::srem. It is guarded by a statement that takes the absolute
value of the RHS and checks whether it is a power of 2, but the abs()
call here is useless, since an srem with a negative RHS is flipped into one
with a positive RHS, adjusting LHS appropriately. Stripping the abs call
allows us to call KnownBits::srem instead of partially duplicating it.
2024-09-27 19:12:35 +01:00
Nikita Popov
5ef02a3fd4 [InstCombine] Fall through to computeKnownBits() for sdiv by -1
When dividing by -1 we were breaking out of the code entirely,
while we should fall through to computeKnownBits().

This fixes an instcombine-verify-known-bits discrepancy.

Fixes https://github.com/llvm/llvm-project/issues/109957.
2024-09-25 14:23:06 +02:00
Nikita Popov
b8d1bae648
[CmpInstAnalysis] Return decomposed bit test as struct (NFC) (#109819)
decomposeBitTestICmp() currently returns the result via two out
parameters plus an in-place modification of Pred. This changes it to
return an optional struct instead.

The motivation here is twofold. First, I'd like to extend this code to
handle cases where the comparison is against a value other than zero,
which would mean yet another out parameter. Second, while doing that I
was badly bitten by the in-place modification, so I'd like to get rid of
it.
2024-09-25 10:14:15 +02:00
Marina Taylor
5cd0900ef6
[InstCombine] Compare icmp inttoptr, inttoptr values directly (#107012)
InstCombine already has some rules for `icmp ptrtoint, ptrtoint` to drop
the casts and compare the source values. This change adds the same for
the reverse case with `inttoptr`.
2024-09-24 09:39:07 +02:00