8995 Commits

Author SHA1 Message Date
Yingwei Zheng
a77dedcacb
[InstSimplify][InstCombine][ConstantFold] Move vector div/rem by zero fold to InstCombine (#114280)
Previously we folded `div/rem X, C` into `poison` if any element of the
constant divisor `C` was zero or undef. However, this is incorrect when
threading udiv over a vector select:
https://alive2.llvm.org/ce/z/3Ninx5
```
define <2 x i32> @vec_select_udiv_poison(<2 x i1> %x) {
  %sel = select <2 x i1> %x, <2 x i32> <i32 -1, i32 -1>, <2 x i32> <i32 0, i32 1>
  %div = udiv <2 x i32> <i32 42, i32 -7>, %sel
  ret <2 x i32> %div
}
```
In this case, `threadBinOpOverSelect` folds `udiv <i32 42, i32 -7>, <i32
-1, i32 -1>` and `udiv <i32 42, i32 -7>, <i32 0, i32 1>` into
`zeroinitializer` and `poison`, respectively. One solution is to
introduce a new flag indicating that we are threading over a vector
select. But that requires modifying both `InstSimplify` and
`ConstantFold`.
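The mismatch can be reproduced with a small lane-wise model. This is only a sketch: `POISON`, `udiv_old_fold`, `original` and `threaded` are hypothetical names, poison is modelled as `None`, and a zero divisor lane is modelled as producing poison for that lane.

```python
POISON = None          # stand-in for an IR poison lane (modelling assumption)
MASK32 = 0xFFFFFFFF    # i32 lanes are treated as unsigned 32-bit values


def udiv_lane(a, b):
    """Unsigned divide of one lane; a zero divisor yields POISON in this model."""
    a &= MASK32
    b &= MASK32
    return POISON if b == 0 else a // b


def udiv_lanewise(a, b):
    return [udiv_lane(x, y) for x, y in zip(a, b)]


def udiv_old_fold(a, b):
    """The old fold: the whole result is poison if any divisor lane is zero."""
    if any((y & MASK32) == 0 for y in b):
        return [POISON] * len(a)
    return udiv_lanewise(a, b)


def select(cond, t, f):
    """Vector select: pick each lane independently."""
    return [x if c else y for c, x, y in zip(cond, t, f)]


def original(cond):
    # %sel = select %x, <-1, -1>, <0, 1> ; %div = udiv <42, -7>, %sel
    return udiv_lanewise([42, -7], select(cond, [-1, -1], [0, 1]))


def threaded(cond):
    # The udiv is threaded over the select and each arm is folded separately.
    return select(cond,
                  udiv_old_fold([42, -7], [-1, -1]),
                  udiv_old_fold([42, -7], [0, 1]))


print(original([True, False]))   # lane 1 divides by 1 and is well defined
print(threaded([True, False]))   # lane 1 comes from the all-poison arm
```

With `%x = <true, false>` the original computes a defined value in lane 1, while the threaded form takes that lane from an arm that was folded to poison as a whole.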

However, this optimization doesn't provide benefits to real-world
programs:

https://dtcxzyw.github.io/llvm-opt-benchmark/coverage/data/zyw/opt-ci/actions-runner/_work/llvm-opt-benchmark/llvm-opt-benchmark/llvm/llvm-project/llvm/lib/IR/ConstantFold.cpp.html#L908

https://dtcxzyw.github.io/llvm-opt-benchmark/coverage/data/zyw/opt-ci/actions-runner/_work/llvm-opt-benchmark/llvm-opt-benchmark/llvm/llvm-project/llvm/lib/Analysis/InstructionSimplify.cpp.html#L1107

This patch moves the fold into InstCombine to avoid breaking numerous
existing tests.

Fixes #114191 and #113866 (only poison-safety issue).
2024-11-01 22:56:22 +08:00
Yingwei Zheng
e577f14b67
[InstCombine] Use m_NotForbidPoison when folding (X u< Y) ? -1 : (~X + Y) --> uadd.sat(~X, Y) (#114345)
Alive2: https://alive2.llvm.org/ce/z/mTGCo-
We cannot reuse `~X` if `m_AllOnes` matches a vector constant with some
poison elts. An alternative solution is to create a new not instead of
reusing `~X`. But it isn't worth the effort because we would need to add a
one-use check.
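For fully defined scalar inputs, the arithmetic behind the fold can be checked exhaustively at 8 bits. This is a sketch with hypothetical helper names; it does not model the poison-element problem that this patch fixes.

```python
BITS = 8
MASK = (1 << BITS) - 1   # all-ones value, i.e. -1 as an unsigned i8


def uadd_sat(a, b):
    """Unsigned saturating add on BITS-wide values."""
    return min(a + b, MASK)


def select_form(x, y):
    """(X u< Y) ? -1 : (~X + Y), with wrapping addition."""
    not_x = ~x & MASK
    return MASK if x < y else (not_x + y) & MASK


def fold_holds():
    values = range(MASK + 1)
    return all(select_form(x, y) == uadd_sat(~x & MASK, y)
               for x in values for y in values)


print(fold_holds())
```

The check works because `~X + Y` overflows exactly when `Y` is unsigned-greater than `X`, which is the case the select replaces with all-ones.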

Fixes https://github.com/llvm/llvm-project/issues/113869.
2024-11-01 22:18:44 +08:00
Yingwei Zheng
96b14f2ccb
[Reland][InstCombine] Fix FMF propagation in foldSelectIntoOp (#114499)
Relands #114356. Compared to the last version, this patch only merges
poison-generating/nsz flags from the select to fix LV regression in
`llvm/test/Transforms/PhaseOrdering/AArch64/predicated-reduction.ll`.
2024-11-01 12:22:57 +08:00
c8ef
cf0b6cc711
Revert "[ConstantFold] Fold tgamma and tgammaf when the input parameter is a constant value." (#114496)
Reverts llvm/llvm-project#114065
2024-11-01 09:26:11 +08:00
c8ef
1f07f995cc
[ConstantFold] Fold tgamma and tgammaf when the input parameter is a constant value. (#114065)
This patch adds support for constant folding for the `tgamma` and
`tgammaf` libc functions.
2024-11-01 09:07:55 +08:00
gulfemsavrun
d183dc7c24
Revert "[InstCombine] Fix FMF propagation in foldSelectIntoOp" (#114458)
Reverts llvm/llvm-project#114356 because it caused test failures.
https://lab.llvm.org/buildbot/#/builders/190/builds/8601

https://luci-milo.appspot.com/ui/p/fuchsia/builders/toolchain.ci/clang-base-linux-x64/b8732549597609293617/overview
2024-10-31 13:21:52 -07:00
Artem Belevich
8129b6b53b
[NVPTX, InstCombine] instcombine known pointer AS checks. (#114325)
The change improves the code in general and, as a side effect, avoids
crashing on impossible address space casts guarded
by `__isGlobal/__isShared`, which partially fixes
https://github.com/llvm/llvm-project/issues/112760

It's still possible to trigger the issue by using explicit AS casts w/o
AS checks, but LLVM should no longer crash on valid code.

This is #112964 + a small fix for the crash on unintended argument
access, which was the root cause for reverting the earlier version of the patch.
2024-10-31 09:24:51 -07:00
Yingwei Zheng
cf1963afad
[InstCombine] Fix FMF propagation in foldSelectIntoOp (#114356)
Closes https://github.com/llvm/llvm-project/issues/113423.
2024-10-31 23:26:45 +08:00
Artem Belevich
04e876e6c6
Revert "[NVPTX] instcombine known pointer AS checks." (#114319)
Reverts llvm/llvm-project#112964

Crashes MLIR: https://lab.llvm.org/buildbot/#/builders/138/builds/5665
2024-10-30 15:34:08 -07:00
Artem Belevich
1cecc58c3f
[NVPTX] instcombine known pointer AS checks. (#112964)
The change improves the code in general and, as a side effect, avoids crashing
on impossible address space casts guarded by `__isGlobal/__isShared`, which
partially fixes https://github.com/llvm/llvm-project/issues/112760
It's still possible to trigger the issue by using explicit AS casts w/o
AS checks, but LLVM should no longer crash on valid code.
2024-10-30 15:13:06 -07:00
Yingwei Zheng
18311093ab
[InstCombine] Do not fold shufflevector(select) if the select condition is a vector (#113993)
Since `shufflevector` is not element-wise, we cannot fold it into
select when the select condition is a vector.
For shufflevector that doesn't change the length, it doesn't crash, but
it is still a miscompilation: https://alive2.llvm.org/ce/z/s8saCx
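A two-lane model shows why a length-preserving shuffle still miscompiles. This is a sketch with hypothetical helpers; the shuffle is modelled with a single source operand and a list of lane indices.

```python
def select(cond, t, f):
    """Vector select with a per-lane condition."""
    return [x if c else y for c, x, y in zip(cond, t, f)]


def shuffle(v, mask):
    """Single-operand shuffle: output lane i takes input lane mask[i]."""
    return [v[i] for i in mask]


x = [10, 11]
y = [20, 21]
mask = [1, 0]            # swap the two lanes, length unchanged

vector_cond = [True, False]
before = shuffle(select(vector_cond, x, y), mask)
after = select(vector_cond, shuffle(x, mask), shuffle(y, mask))
print(before, after)     # the lanes no longer line up with the condition

scalar_cond = [True, True]   # a scalar i1 condition behaves like a splat
same = (shuffle(select(scalar_cond, x, y), mask)
        == select(scalar_cond, shuffle(x, mask), shuffle(y, mask)))
print(same)
```

The shuffle moves data between lanes but the condition vector stays in place, so the two forms pick from different arms unless every lane of the condition agrees.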

Fixes https://github.com/llvm/llvm-project/issues/113986.
2024-10-29 10:39:07 +08:00
David Majnemer
902acde341 [InstCombine] Optimize away certain additions using modular arithmetic
We can turn:
```
  %add = add i8 %arg, C1
  %and = and i8 %add, C2
  %cmp = icmp eq i8 %and, C3
```

into:
```
  %and = and i8 %arg, C2
  %cmp = icmp eq i8 %and, (C3 - C1) & C2
```

This is only worth doing if the sequence is the sole user of the addition
operation.
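The message does not spell out preconditions. The sketch below assumes that `C2` is a low-bit mask of the form `2^k - 1` and that `C3` fits inside it, which makes the rewrite plain arithmetic modulo `2^k`; both conditions and all function names are assumptions of this example, not taken from the patch.

```python
def original(x, c1, c2, c3, bits):
    """((x + C1) & C2) == C3 with wrapping addition at the given width."""
    m = (1 << bits) - 1
    return (((x + c1) & m) & c2) == c3


def folded(x, c1, c2, c3, bits):
    """(x & C2) == ((C3 - C1) & C2) with wrapping subtraction."""
    m = (1 << bits) - 1
    return (x & c2) == (((c3 - c1) & m) & c2)


def fold_holds_for_low_masks(bits=5):
    """Exhaustively compare both forms for every low-bit mask C2 = 2^k - 1."""
    m = (1 << bits) - 1
    for k in range(bits + 1):
        c2 = (1 << k) - 1
        for c1 in range(m + 1):
            for c3 in range(c2 + 1):       # C3 must fit in the mask
                for x in range(m + 1):
                    if original(x, c1, c2, c3, bits) != folded(x, c1, c2, c3, bits):
                        return False
    return True


print(fold_holds_for_low_masks())
```

In this model a mask that is not of that form breaks the equivalence: with `C1 = 1`, `C2 = 2`, `C3 = 2` and `x = 0` the original comparison is false while the rewritten one is true.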
2024-10-28 22:51:35 +00:00
Matthias Braun
5903c6af44
InstCombine: Fold shufflevector(select) and shufflevector(phi) (#113746)
- Transform `shufflevector(select(c, x, y), C)` to
  `select(c, shufflevector(x, C), shufflevector(y, C))` by re-using
  the `FoldOpIntoSelect` helper.
- Transform `shufflevector(phi(x, y), C)` to
  `phi(shufflevector(x, C), shufflevector(y, C))` by re-using the
  `foldOpIntoPhi` helper.
2024-10-28 15:35:17 -07:00
Yingwei Zheng
f78610af3f
[InstCombine] Add function attribute instcombine-no-verify-fixpoint (#113822)
This patch introduces a function attribute
`instcombine-no-verify-fixpoint` to avoid disabling fix-point
verification for unrelated tests in the same file.
Addresses the comment at
https://github.com/llvm/llvm-project/pull/112642#discussion_r1804714387.
2024-10-28 17:45:08 +08:00
Yingwei Zheng
5155c38cee
[InstCombine] Don't check uses of constant exprs (#113684)
This patch skips constant expressions to avoid iterating over uses in
other functions.

Fix crash reported in
https://github.com/llvm/llvm-project/pull/105510#issuecomment-2437521147.
2024-10-28 15:09:20 +08:00
David Majnemer
5d4a0d54b5 [InstCombine] Teach takeLog2 about right shifts, truncation and bitwise-and
We left some easy opportunities for further simplifications.

log2(trunc(x)) is simply trunc(log2(x)). This is safe if we know that
trunc is NUW because it means that the truncation didn't drop any bits.
It is also safe if the caller is OK with zero as a possible answer.

log2(x >>u y) is simply `log2(x) - y`.

log2(x & y) is a funny one. It comes up when doing something like:
```
unsigned int f(unsigned int x, unsigned int y) {
  unsigned char a = 1u << x;
  return y / a;
}
```

LLVM would canonicalize this to:
```
  %shl = shl nuw i32 1, %x
  %conv1 = and i32 %shl, 255
  %div = udiv i32 %y, %conv1
```

In cases like these, we can ignore the mask entirely.
This is equivalent to `y >> x`.
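The claims about the mask and the right shift can be sanity-checked numerically. This is a sketch with hypothetical helper names; shift amounts of 8 or more are left out because the masked divisor becomes zero there, and division by zero is undefined anyway.

```python
def udiv_masked(x, y):
    """y / a where a = (unsigned char)(1u << x), mirroring the C example."""
    a = (1 << x) & 0xFF
    return y // a


def shift_form(x, y):
    return y >> x


def mask_can_be_ignored():
    samples = [0, 1, 2, 255, 256, 12345, 0x7FFFFFFF, 0xFFFFFFFF]
    return all(udiv_masked(x, y) == shift_form(x, y)
               for x in range(8) for y in samples)


def log2_of_right_shift_holds():
    """log2(p >>u s) == log2(p) - s for a power of two p = 2^k and s <= k."""
    return all(((1 << k) >> s).bit_length() - 1 == k - s
               for k in range(32) for s in range(k + 1))


print(mask_can_be_ignored(), log2_of_right_shift_holds())
```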
2024-10-28 05:13:04 +00:00
ssijaric-nv
14db069468
[InstCombine] Fix a cycle when folding fneg(select) with scalable vector types (#112465)
The two folding operations are causing a cycle for the following case
with scalable vector types:

define <vscale x 2 x double> @test_fneg_select_abs(<vscale x 2 x i1> %cond, <vscale x 2 x double> %b) {
  %1 = select <vscale x 2 x i1> %cond, <vscale x 2 x double> zeroinitializer, <vscale x 2 x double> %b
  %2 = fneg fast <vscale x 2 x double> %1
  ret <vscale x 2 x double> %2
}

1) fold fneg:  -(Cond ? C : Y) -> Cond ? -C : -Y

2) fold select: (Cond ? -X : -Y) -> -(Cond ? X : Y)

1) results in the following since '<vscale x 2 x double> zeroinitializer'
passes the check for the immediate constant:

  %.neg = fneg fast <vscale x 2 x double> zeroinitializer
  %b.neg = fneg fast <vscale x 2 x double> %b
  %1 = select fast <vscale x 2 x i1> %cond, <vscale x 2 x double> %.neg, <vscale x 2 x double> %b.neg

and so we end up going back and forth between 1) and 2).

Attempt to fold scalable vector constants, so that we end up with a
splat instead:

define <vscale x 2 x double> @test_fneg_select_abs(<vscale x 2 x i1> %cond, <vscale x 2 x double> %b) {
  %b.neg = fneg fast <vscale x 2 x double> %b
  %1 = select fast <vscale x 2 x i1> %cond, <vscale x 2 x double> shufflevector (<vscale x 2 x double> insertelement (<vscale x 2 x double> poison, double -0.000000e+00, i64 0), <vscale x 2 x double> poison, <vscale x 2 x i32> zeroinitializer), <vscale x 2 x double> %b.neg
  ret <vscale x 2 x double> %1
}
2024-10-25 10:47:39 -07:00
Noah Goldstein
294726d738 Reapply "[InstCombine] Folding (icmp eq/ne (and X, -P2), INT_MIN)" (#111236)
The underlying issue with msan was fixed by #113200
2024-10-23 09:12:08 -05:00
Alex MacLean
4c1b1f6d21
[NVPTX] Add support for clamped funnel shift intrinsics (#113228)
Add support for ``llvm.nvvm.fshl.clamp`` and ``llvm.nvvm.fshr.clamp``
intrinsics. These intrinsics are similar to the generic llvm funnel
shift, except that the shift value is clamped to the integer width.
Currently only ``i32`` is supported and is implemented with the
`shf.[rl].clamp.b32` PTX instruction.
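The difference from the generic funnel shift can be modelled as follows. This is a sketch only: the helper names are hypothetical, and it assumes the clamped intrinsics take their operands in the same order as the generic `llvm.fshl`/`llvm.fshr`, with the first operand as the high half.

```python
W = 32
MASK = (1 << W) - 1


def _concat(hi, lo):
    """Form the 64-bit value {hi:lo} with hi in the most significant bits."""
    return ((hi & MASK) << W) | (lo & MASK)


def fshl(hi, lo, s):
    """Generic funnel shift left: the shift amount wraps modulo the width."""
    return ((_concat(hi, lo) << (s % W)) >> W) & MASK


def fshl_clamp(hi, lo, s):
    """Clamped variant: the shift amount saturates at the width."""
    return ((_concat(hi, lo) << min(s, W)) >> W) & MASK


def fshr(hi, lo, s):
    """Generic funnel shift right: the shift amount wraps modulo the width."""
    return (_concat(hi, lo) >> (s % W)) & MASK


def fshr_clamp(hi, lo, s):
    """Clamped variant: the shift amount saturates at the width."""
    return (_concat(hi, lo) >> min(s, W)) & MASK


hi, lo = 0x12345678, 0x9ABCDEF0
print(hex(fshl(hi, lo, 32)), hex(fshl_clamp(hi, lo, 32)))
print(hex(fshr(hi, lo, 40)), hex(fshr_clamp(hi, lo, 40)))
```

For shift amounts below 32 the two variants agree; they differ only once the amount reaches the integer width, where the generic form wraps around and the clamped form shifts the full word out.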
2024-10-22 16:39:44 -07:00
Paul Walker
5bb34803a4 [NFC] Migrate tests to use autoupdate for CHECK lines. 2024-10-22 12:55:15 +00:00
c8ef
b90ea5caad
[ConstantFold] Fold erf and erff when the input parameter is a constant value. (#113079)
This patch adds support for constant folding for the `erf` and `erff`
libc functions.
2024-10-22 12:58:11 +08:00
Jake Egan
900b6369e2 [AIX][test] XFAIL constant folding log1p test
Test added by commit 47a6da2d4dc7d996eb2678243ac566822d59e483 fails on the AIX bot. So XFAIL for now to investigate further.
2024-10-21 11:27:15 -04:00
XChy
a2ba438f3e
[InstCombine] Preserve the flag from RHS only if the and is bitwise (#113164)
Fixes #113123
Alive proof: https://alive2.llvm.org/ce/z/hnqeLC
2024-10-21 22:30:31 +08:00
c8ef
1336e3d0b9
[ConstantFold] Fold ilogb and ilogbf when the input parameter is a constant value. (#113014)
This patch adds support for constant folding for the `ilogb` and
`ilogbf` libc functions.
2024-10-20 10:46:35 +08:00
Ramkumar Ramachandra
7b65971e1f
InstCombine: sink loads with invariant.load metadata (#112692) 2024-10-18 10:35:56 +01:00
Danila Malyutin
1a609052b6
[AArch64][InstCombine] Eliminate redundant barrier intrinsics (#112023)
If there are no memory ops on the path from one dmb to another then one
barrier can be eliminated.
2024-10-17 21:04:04 +04:00
goldsteinn
c85611e858
[SimplifyLibCall][Attribute] Fix bug where we may keep range attr with incompatible type (#112649)
In a variety of places we change the bitwidth of a parameter but don't
update the attributes.

The issue in this case is from the `range` attribute when inlining
`__memset_chk`. `optimizeMemSetChk` will replace an `i32` with an
`i8`, and if the `i32` had a `range` attr associated it will cause an
error.

Fixes #112633
2024-10-17 10:32:55 -05:00
Nikita Popov
d9cd607200 [InstCombine] Add tests for #110919 (NFC) 2024-10-17 14:57:38 +02:00
Yingwei Zheng
095d49da76
[InstCombine] Set samesign when converting signed predicates into unsigned (#112642)
Alive2: https://alive2.llvm.org/ce/z/6cqdt-
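The reason a signed predicate can be swapped for an unsigned one, and why `samesign` is a natural flag to attach afterwards, can be seen with an exhaustive 8-bit check. This is a sketch with hypothetical helper names; it only shows that the two orderings agree whenever both operands share a sign bit.

```python
BITS = 8
SIGN = 1 << (BITS - 1)


def to_signed(v):
    """Reinterpret an unsigned BITS-wide value as two's complement."""
    return v - (1 << BITS) if v & SIGN else v


def same_sign(x, y):
    return (x & SIGN) == (y & SIGN)


def slt_matches_ult_when_same_sign():
    values = range(1 << BITS)
    return all((to_signed(x) < to_signed(y)) == (x < y)
               for x in values for y in values if same_sign(x, y))


print(slt_matches_ult_when_same_sign())
# With different signs the predicates disagree, e.g. 0x80 vs 0x01:
print(to_signed(0x80) < to_signed(0x01), 0x80 < 0x01)
```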
2024-10-17 20:43:48 +08:00
Yingwei Zheng
aad3a1630e
[ValueTracking] Respect samesign flag in isKnownInversion (#112390)
In https://github.com/llvm/llvm-project/pull/93591 we introduced
`isKnownInversion` and assumed that `X` being poison implies `Y` is poison
because they share common operands. But after the introduction of `samesign`,
this assumption no longer holds if `X` is an icmp with the `samesign` flag.

Alive2 link: https://alive2.llvm.org/ce/z/rj3EwQ (Please run it locally
with this patch and https://github.com/AliveToolkit/alive2/pull/1098).

This approach is the most conservative way in my mind to address this
problem. If `X` has `samesign` flag, it will check if `Y` also has this
flag and make sure constant RHS operands have the same sign.

Fixes https://github.com/llvm/llvm-project/issues/112350.
2024-10-17 00:27:21 +08:00
Ramkumar Ramachandra
682fa797b7
InstCombine/Select: remove redundant code (NFC) (#112388)
InstCombinerImpl::foldSelectInstWithICmp has some inlined code for
select-icmp-xor simplification, but this simplification is already done
by other code, via another path:

  (X & Y) == 0 ? X : X ^ Y ->
  ((X & Y) == 0 ? 0 : Y) ^ X ->
  (X & Y) ^ X ->
  X & ~Y

Cover the cases that it claims to simplify, and demonstrate that
stripping it doesn't cause test changes.
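Each step of the chain can be checked exhaustively at 8 bits. The middle step appears to rely on `Y` being a single-bit value, which the message does not state, so this sketch assumes `Y` is a power of two; the step functions are hypothetical names for the four lines of the chain.

```python
BITS = 8
MASK = (1 << BITS) - 1
POWERS_OF_TWO = [1 << i for i in range(BITS)]


def step0(x, y):
    return x if (x & y) == 0 else x ^ y          # (X & Y) == 0 ? X : X ^ Y


def step1(x, y):
    return (0 if (x & y) == 0 else y) ^ x        # ((X & Y) == 0 ? 0 : Y) ^ X


def step2(x, y):
    return (x & y) ^ x                           # (X & Y) ^ X


def step3(x, y):
    return x & (~y & MASK)                       # X & ~Y


def chain_holds(ys):
    return all(step0(x, y) == step1(x, y) == step2(x, y) == step3(x, y)
               for x in range(MASK + 1) for y in ys)


print(chain_holds(POWERS_OF_TWO))
print(step0(1, 3), step3(1, 3))   # a multi-bit Y where the ends differ
```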
2024-10-16 12:44:09 +01:00
Yingwei Zheng
0936195311
[InstCombine] Drop samesign in InstCombine (#112480)
Closes https://github.com/llvm/llvm-project/issues/112476.
2024-10-16 19:13:52 +08:00
Yingwei Zheng
3bf2295ee0
[InstCombine] Drop samesign flag in foldAndOrOfICmpsWithConstEq (#112489)
In 5dbfca30c1 we assumed that RHS being poison implies LHS is also poison.
This no longer holds after the introduction of the samesign flag.

This patch drops the `samesign` flag on RHS if the original expression
is a logical and/or.

Closes #112467.
2024-10-16 16:24:44 +08:00
Alexey Bader
583fa4f5b7
[InstCombine] Extend fcmp+select folding to minnum/maxnum intrinsics (#112088)
Today, InstCombine can fold fcmp+select patterns to minnum/maxnum
intrinsics when the nnan and nsz flags are set. The ordering of the
operands in both the fcmp and select instructions is important for the
folding to occur.

maxnum patterns:
1. (a op b) ? a : b -> maxnum(a, b), where op is one of {ogt, oge}
2. (a op b) ? b : a -> maxnum(a, b), where op is one of {ule, ult}

The second pattern is supposed to make the order of the operands in the
select instruction irrelevant. However, the pattern matching code uses
the CmpInst::getInversePredicate method to invert the comparison
predicate. This method doesn't take into account the fast-math flags,
which can lead to missing the folding opportunity.

The patch extends the pattern matching code to handle unordered fcmp
instructions. This allows the folding to occur even when the select
instruction has the operands in the inverse order.

New maxnum patterns:
1. (a op b) ? a : b -> maxnum(a, b), where op is one of {ugt, uge}
2. (a op b) ? b : a -> maxnum(a, b), where op is one of {ole, olt}

The same changes are applied to the minnum intrinsic.
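The eight maxnum patterns can be modelled with plain floats. This is a sketch: `FCMP`, `select_direct` and `select_swapped` are hypothetical names, NaN inputs are left out of the main check to mirror `nnan`, and results are compared with `==` so that the sign of zero is ignored in the spirit of `nsz`.

```python
import math


def unordered(a, b):
    return math.isnan(a) or math.isnan(b)


# Ordered predicates are false on NaN; unordered predicates are true on NaN.
FCMP = {
    "ogt": lambda a, b: not unordered(a, b) and a > b,
    "oge": lambda a, b: not unordered(a, b) and a >= b,
    "olt": lambda a, b: not unordered(a, b) and a < b,
    "ole": lambda a, b: not unordered(a, b) and a <= b,
    "ugt": lambda a, b: unordered(a, b) or a > b,
    "uge": lambda a, b: unordered(a, b) or a >= b,
    "ult": lambda a, b: unordered(a, b) or a < b,
    "ule": lambda a, b: unordered(a, b) or a <= b,
}


def select_direct(op, a, b):
    return a if FCMP[op](a, b) else b            # (a op b) ? a : b


def select_swapped(op, a, b):
    return b if FCMP[op](a, b) else a            # (a op b) ? b : a


SAMPLES = [-math.inf, -2.5, -0.0, 0.0, 1.0, 3.75, math.inf]


def all_patterns_give_max():
    for a in SAMPLES:
        for b in SAMPLES:
            expected = max(a, b)
            if any(select_direct(op, a, b) != expected
                   for op in ("ogt", "oge", "ugt", "uge")):
                return False
            if any(select_swapped(op, a, b) != expected
                   for op in ("ule", "ult", "ole", "olt")):
                return False
    return True


print(all_patterns_give_max())
```

Without NaN inputs the ordered and unordered forms of a predicate agree, which is why both sets of patterns select the same value; they differ only on NaN, for example `ogt(NaN, 1.0)` is false while `ugt(NaN, 1.0)` is true.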
2024-10-15 22:05:16 +04:00
c8ef
47a6da2d4d
[ConstantFold] Fold log1p and log1pf when the input parameter is a constant value. (#112113)
This patch adds support for constant folding for the `log1p` and
`log1pf` libc functions.
2024-10-16 00:19:26 +08:00
Yingwei Zheng
9b7491e866
[IR] Add support for samesign in Operator::hasPoisonGeneratingFlags (#112358)
Fix https://github.com/llvm/llvm-project/issues/112356.
2024-10-15 23:07:16 +08:00
Ramkumar Ramachandra
1c6c850937
InstCombine: extend select-equiv to support vectors (#111966)
foldSelectEquivalence currently doesn't support GVN-like replacements on
vector types. Put in the checks for potentially lane-crossing
operations, and lift the limitation.
2024-10-15 11:10:45 +01:00
Ramkumar Ramachandra
fe526ae99b
InstCombine/test: cover foldSelectValueEquivalence (#111694)
Write dedicated tests for foldSelectValueEquivalence, demonstrating that
it does not perform many GVN-like replacements when:

- the comparison is a vector-type
- the comparison is a floating-point type

as a prelude to fixing these deficiencies.
2024-10-15 10:33:03 +01:00
Yingwei Zheng
8d8bb4032b
[Verifier] Verify attribute denormal-fp-math[-f32] (#112310)
Some typos are also fixed. Address
https://github.com/llvm/llvm-project/pull/112067#pullrequestreview-2363722447.
2024-10-15 17:32:16 +08:00
elhewaty
9efb07f261
[IR] Add samesign flag to icmp instruction (#111419)
Inspired by
https://discourse.llvm.org/t/rfc-signedness-independent-icmps/81423
2024-10-15 17:11:25 +08:00
Ramkumar Ramachandra
bdf241cab3
ValueTracking: handle more ops in isNotCrossLaneOperation (#112183)
Reuse llvm::isTriviallyVectorizable in llvm::isNotCrossLaneOperation, in
order to get it to handle more intrinsics.

Alive2 proofs for changed tests: https://alive2.llvm.org/ce/z/XSV_GT
2024-10-14 14:08:12 +01:00
Yingwei Zheng
9edc454ee6
[InstCombine] Drop range attributes in foldIsPowerOf2OrZero (#112178)
Closes https://github.com/llvm/llvm-project/issues/112078.
2024-10-14 20:52:55 +08:00
Ramkumar Ramachandra
c5f82f7893
ValueTracking: introduce llvm::isNotCrossLaneOperation (#112011)
Factor out and unify common code from InstSimplify and InstCombine that
partially guard against cross-lane vector operations into
llvm::isNotCrossLaneOperation in ValueTracking.

Alive2 proofs for changed tests: https://alive2.llvm.org/ce/z/68H4ka
2024-10-14 11:37:30 +01:00
Yingwei Zheng
966bee739c
[InstCombine][NFC] Fix typo in is_fpclass.ll (#112067)
This typo causes alive2 to crash.
2024-10-12 11:06:25 +08:00
Yingwei Zheng
6a65e98fa7
[InstCombine] Drop range attributes in foldIsPowerOf2 (#111946)
Fixes https://github.com/llvm/llvm-project/issues/111934.
2024-10-11 18:19:21 +08:00
braw-lee
3645c64d87
[SimplifyLibCalls] fdim constant fold (#109235)
2nd PR to fix #108695 

based on #108702

---------

Signed-off-by: Kushal Pal <kushalpal109@gmail.com>
2024-10-10 14:44:39 +04:00
David Green
5184d763c7
[InstCombine] Convert @log to @llvm.log if the input is known positive. (#111428)
Similar to 112aac4e8961b9626bb84f36deeaa5a674f03f5a, this converts log
libcalls to llvm.log.f64 intrinsics if we know they do not set errno, as
the input is not zero and not negative. As log sets errno if the
input is 0 (returning -inf) or if the input is negative (returning nan),
we also perform the conversion when we have noinf and nonan.
2024-10-10 09:54:25 +01:00
c8ef
923566a67d
[ConstantFold] Fold logb and logbf when the input parameter is a constant value. (#111232)
This patch adds support for constant folding for the `logb` and `logbf`
libc functions.
2024-10-10 07:56:16 +08:00
David Green
587f31fb28 [InstCombine] Add a test for converting log to an intrinsic. NFC 2024-10-09 09:25:13 +01:00
Matt Arsenault
a8e1311a1c
[RFC] IR: Define noalias.addrspace metadata (#102461)
This is intended to solve a problem with lowering atomics in
OpenMP and C++ common to AMDGPU and NVPTX.

In OpenCL and CUDA, it is undefined behavior for an atomic instruction
to modify an object in thread private memory. In OpenMP, it is defined.
Correspondingly, the hardware does not handle this correctly. For AMDGPU,
32-bit atomics work and 64-bit atomics are silently dropped. We therefore
need to codegen this by inserting a runtime address space check, performing
the private case without atomics, and falling back to issuing the real atomic
otherwise. This metadata allows us to avoid this extra check and branch.

Handle this by introducing metadata intended to be applied to atomicrmw,
indicating they cannot access the forbidden address space.
2024-10-07 23:21:42 +04:00