472 Commits

Author SHA1 Message Date
Slava Zakharin
c0c71463f6
[InstCombine] Optimize sub(sext(add(x,y)),sext(add(x,z))). (#144174)
This pattern can be often met in Flang generated LLVM IR,
for example, for the counts of the loops generated for array
expressions like: `a(x:x+y)` or `a(x+z:x+z)` or their variations.

In order to compute the loop count, Flang needs to subtract
the lower bound of the array slice from the upper bound
of the array slice. To avoid the sign wraps, it sign extends
the original values (that may be of any user data type)
to `i64`.

This peephole is really helpful in CPU2017/548.exchange2,
where we have multiple following statements like this:
```
block(row+1:row+2, 7:9, i7) = block(row+1:row+2, 7:9, i7) - 10
```

While this is just a 2x3 iterations loop nest, LLVM cannot
figure it out, ending up vectorizing the inner loop really
hard (with a vector epilog and scalar remainder). This, in turn,
causes problems for LSR that ends up creating too many loop-carried
values in the loop containing the above statement, which are then
causing too many spills/reloads.

Alive2: https://alive2.llvm.org/ce/z/gLgfYX

Related to #143219.
2025-06-19 10:13:58 -07:00
amordo
e4c3b037bc
[InstCombine] Fold tan(x) * cos(x) => sin(x) (#136319)
This patch enables folding `tan(x) * cos(x) -> sin(x)` under the `contract` flag.

Fixes https://github.com/llvm/llvm-project/issues/34950.
2025-06-18 23:12:31 +08:00
Ramkumar Ramachandra
b40e4ceaa6
[ValueTracking] Make Depth last default arg (NFC) (#142384)
Having a finite Depth (or recursion limit) for computeKnownBits is very
limiting, but is currently a load-bearing necessity, as all KnownBits
are recomputed on each call and there is no caching. As a prerequisite
for an effort to remove the recursion limit altogether, either using a
clever caching technique, or writing a easily-invalidable KnownBits
analysis, make the Depth argument in APIs in ValueTracking uniformly the
last argument with a default value. This would aid in removing the
argument when the time comes, as many callers that currently pass 0
explicitly are now updated to omit the argument altogether.
2025-06-03 17:12:24 +01:00
AZero13
ffd2633061
[InstCombine] Fold mul (shr exact (X, N)), 2^N + 1 -> add (X , shr exact (X, N)) (#112407)
Alive2 Proofs:
https://alive2.llvm.org/ce/z/aJnxyp
https://alive2.llvm.org/ce/z/dyeGEv
2025-02-13 14:25:09 +08:00
Jeremy Morse
304a99091c [NFC][DebugInfo] Use iterators for insertion at some final callsites
These are the callsites that have materialised in the last three weeks
since I last built with deprecation warnings.
2025-01-28 11:37:11 +00:00
Sushant Gokhale
e79bb8731a
[InstCombine] Fixup commit 7253c6f (#123315)
This should fix the assert failure we were getting for the darwin OS.
2025-01-17 02:14:04 -08:00
Sushant Gokhale
3b3590aa59
Revert "Revert "[InstCombine] Transform high latency, dependent FSQRT/FDIV into FMUL"" (#123313)
Reverts llvm/llvm-project#123289
2025-01-17 02:05:05 -08:00
Sushant Gokhale
606d0a7cdc
Revert "[InstCombine] Transform high latency, dependent FSQRT/FDIV into FMUL" (#123289)
Reverts llvm/llvm-project#87474
2025-01-16 22:50:20 -08:00
Sushant Gokhale
7253c6fde4
[InstCombine] Transform high latency, dependent FSQRT/FDIV into FMUL (#87474)
The proposed patch, in general, tries to transform the below code
sequence:
x = 1.0 / sqrt (a);
r1 = x * x;  // same as 1.0 / a
r2 = a / sqrt(a); // same as sqrt (a)

TO

(If x, r1 and r2 are all used further in the code) 
r1 = 1.0 / a
r2 = sqrt (a)
x = r1 * r2

The transform tries to make high latency sqrt and div operations
independent and also saves on one multiplication.

The patch was tested with SPEC17 suite with cpu=neoverse-v2. The
performance uplift achieved was:
544.nab_r   ~4%

No other regressions were observed. Also, no compile time differences
were observed with the patch.

Closes #54652
2025-01-16 21:09:15 -08:00
Veera
2d5f07c828
[InstCombine] Fold X udiv Y to X lshr cttz(Y) if Y is a power of 2 (#121386)
Fixes #115767

This PR folds `X udiv Y` to `X lshr cttz(Y)` if Y is a power of two
since bitwise operations are faster than division.

Proof: https://alive2.llvm.org/ce/z/qHmLta
2025-01-11 13:56:13 +08:00
Noah Goldstein
0d9c027ad7 [InstCombine] Make takeLog2 visible in all of InstCombine; NFC
Also add `tryGetLog2` helper that encapsulates the common pattern:

```
if (takeLog2(..., /*DoFold=*/false)) {
    Value * Log2 = takeLog2(..., /*DoFold=*/true);
    ...
}
```

Closes #122498
2025-01-10 16:21:35 -06:00
Yingwei Zheng
a77346bad0
[IRBuilder] Refactor FMF interface (#121657)
Up to now, the only way to set specified FMF flags in IRBuilder is to
use `FastMathFlagGuard`. It makes the code ugly and hard to maintain.

This patch introduces a helper class `FMFSource` to replace the original
parameter `Instruction *FMFSource` in IRBuilder. To maximize the
compatibility, it accepts an instruction or a specified FMF.
This patch also removes the use of `FastMathFlagGuard` in some simple
cases.

Compile-time impact:
https://llvm-compile-time-tracker.com/compare.php?from=f87a9db8322643ccbc324e317a75b55903129b55&to=9397e712f6010be15ccf62f12740e9b4a67de2f4&stat=instructions%3Au
2025-01-06 14:37:04 +08:00
Yingwei Zheng
24c2ba07ce
[InstCombine] Drop NSW when converting shl X, BW - 1 back into mul (#121633)
`X <<s BW - 1` and `X *s INT_MIN` are not equivalent.
Alive2: https://alive2.llvm.org/ce/z/MKKFrj
Closes https://github.com/llvm/llvm-project/issues/121584
2025-01-05 01:20:54 +08:00
Yingwei Zheng
a77dedcacb
[InstSimplify][InstCombine][ConstantFold] Move vector div/rem by zero fold to InstCombine (#114280)
Previously we fold `div/rem X, C` into `poison` if any element of the
constant divisor `C` is zero or undef. However, it is incorrect when
threading udiv over an vector select:
https://alive2.llvm.org/ce/z/3Ninx5
```
define <2 x i32> @vec_select_udiv_poison(<2 x i1> %x) {
  %sel = select <2 x i1> %x, <2 x i32> <i32 -1, i32 -1>, <2 x i32> <i32 0, i32 1>
  %div = udiv <2 x i32> <i32 42, i32 -7>, %sel
  ret <2 x i32> %div
}
```
In this case, `threadBinOpOverSelect` folds `udiv <i32 42, i32 -7>, <i32
-1, i32 -1>` and `udiv <i32 42, i32 -7>, <i32 0, i32 1>` into
`zeroinitializer` and `poison`, respectively. One solution is to
introduce a new flag indicating that we are threading over a vector
select. But it requires to modify both `InstSimplify` and
`ConstantFold`.

However, this optimization doesn't provide benefits to real-world
programs:

https://dtcxzyw.github.io/llvm-opt-benchmark/coverage/data/zyw/opt-ci/actions-runner/_work/llvm-opt-benchmark/llvm-opt-benchmark/llvm/llvm-project/llvm/lib/IR/ConstantFold.cpp.html#L908

https://dtcxzyw.github.io/llvm-opt-benchmark/coverage/data/zyw/opt-ci/actions-runner/_work/llvm-opt-benchmark/llvm-opt-benchmark/llvm/llvm-project/llvm/lib/Analysis/InstructionSimplify.cpp.html#L1107

This patch moves the fold into InstCombine to avoid breaking numerous
existing tests.

Fixes #114191 and #113866 (only poison-safety issue).
2024-11-01 22:56:22 +08:00
David Majnemer
5d4a0d54b5 [InstCombine] Teach takeLog2 about right shifts, truncation and bitwise-and
We left some easy opportunities for further simplifications.

log2(trunc(x)) is simply trunc(log2(x)). This is safe if we know that
trunc is NUW because it means that the truncation didn't drop any bits.
It is also safe if the caller is OK with zero as a possible answer.

log2(x >>u y) is simply `log2(x) - y`.

log2(x & y) is a funny one. It comes up when doing something like:
```
unsigned int f(unsigned int x, unsigned int y) {
  unsigned char a = 1u << x;
  return y / a;
}
```

LLVM would canonicalize this to:
```
  %shl = shl nuw i32 1, %x
  %conv1 = and i32 %shl, 255
  %div = udiv i32 %y, %conv1
```

In cases like these, we can ignore the mask entirely.
This is equivalent to `y >> x`.
2024-10-28 05:13:04 +00:00
AtariDreams
60e90a1929
[InstCombine] Check for undef first before freeze (#96769)
All of these insert freeze due to multi-use, which is only
relevant for undef values, not poison.
2024-07-13 18:39:41 +02:00
Noah Goldstein
afa3d58ee2 [InstCombine] Fold (mul (div exact X, C0), C1) -> (div exact X, C0/C1)
We can do this if `C0 % C1 == 0` and if we avoid UB in the signed
case.

Proofs: https://alive2.llvm.org/ce/z/HHWHDg

Closes #96915
2024-06-28 16:52:03 +08:00
Nikita Popov
2d209d964a
[IR] Add getDataLayout() helpers to BasicBlock and Instruction (#96902)
This is a helper to avoid writing `getModule()->getDataLayout()`. I
regularly try to use this method only to remember it doesn't exist...

`getModule()->getDataLayout()` is also a common (the most common?)
reason why code has to include the Module.h header.
2024-06-27 16:38:15 +02:00
Alex MacLean
a4ca22506c
[InstCombine] (uitofp bool X) * Y --> X ? Y : 0 (#96216)
Fold `mul (uitofp i1 X), Y` to `select i1 X, Y, 0.0` when the `mul` is
`nnan` and `nsz`

Proof: https://alive2.llvm.org/ce/z/_stiPm
2024-06-22 09:50:21 -07:00
Nikita Popov
76e889d3b0 [InstCombine] Avoid use of ConstantExpr::getShl()
Use the constant folding API instead (we later call isNotMinSignedValue
on it, so we do need the Constant* return type here). Use ImmConstant
to guarantee that constant folding succeeds.
2024-06-18 16:29:40 +02:00
Nikita Popov
3c553fc9e0
[InstCombine] Infer nuw on mul nsw with non-negative operands (#90170)
If a mul nsw has non-negative operands, it's also nuw.

Proof: https://alive2.llvm.org/ce/z/2Dz9Uu

Fixes https://github.com/llvm/llvm-project/issues/90020.
2024-04-29 09:53:09 +09:00
zhongyunde 00443407
56ca5ecf41 [InstCombine] Optimize powi(X, Y)/ (X * Z) with Ofast
foldFDivPowDivisor can address A / powi(x, y) to A * powi(x, -y),
while for small const value y, for example y=2, the instcombine will
transform powi(x, 2) to fmul x, x, so it is not optimal for A / powi(x, 2).

Fix https://github.com/llvm/llvm-project/issues/77171
2024-04-21 12:09:20 +08:00
zhongyunde 00443407
cb7cb83010 [InstCombine] Add check to avoid dependent optimization order, NFC
Since PR86428, foldPowiReassoc is called by both FMul and FDiv,
as the optimization of FDiv is placed after the FMul, so now
it is correct we don't add the checking of FDiv for powi(X, Y) / X.
But, we may add more matching scenarios later, so add the checking opcode
explicitly is easier to understand.
2024-04-21 12:07:56 +08:00
Nikita Popov
eb7ad8853c
[InstCombine] Remove some uses with replaceUndefsWith() (#89190)
Now that we don't accept undef splat in PatternMatch, we can remove some
uses of replaceUndefsWith(). I believe in all these cases only poison
splats are possible now, in which case no replacement is necessary.
2024-04-19 09:01:56 +09:00
Nikita Popov
1baa385065
[IR][PatternMatch] Only accept poison in getSplatValue() (#89159)
In #88217 a large set of matchers was changed to only accept poison
values in splats, but not undef values. This is because we now use
poison for non-demanded vector elements, and allowing undef can cause
correctness issues.

This patch covers the remaining matchers by changing the AllowUndef
parameter of getSplatValue() to AllowPoison instead. We also carry out
corresponding renames in matchers.

As a followup, we may want to change the default for things like m_APInt
to m_APIntAllowPoison (as this is much less risky when only allowing
poison), but this change doesn't do that.

There is one caveat here: We have a single place
(X86FixupVectorConstants) which does require handling of vector splats
with undefs. This is because this works on backend constant pool
entries, which currently still use undef instead of poison for
non-demanded elements (because SDAG as a whole does not have an explicit
poison representation). As it's just the single use, I've open-coded a
getSplatValueAllowUndef() helper there, to discourage use in any other
places.
2024-04-18 15:44:12 +09:00
Andy Kaylor
be50a259f1
Update foldFMulReassoc to respect absent fast-math flags (#88589)
This change updates a few of the transformations in foldFMulReassoc to
respect absent fast-math flags in cases where fmul and fdiv, fadd, or fsub
instructions were being folded but the code was only checking for
fast-math flags on the fmul instruction and was transferring flags to
the folded instruction that were not present on the other original 
instructions.

This fixes https://github.com/llvm/llvm-project/issues/82857
2024-04-16 16:22:31 -07:00
Nikita Popov
c50f7e9a42
[InstCombine] Remove mul of SPF abs fold (#88675)
Remove the fold working on abs in SPF representation now that we
canonicalize SPF to intrinsics.

This is not strictly NFC because the SPF fold might fire for
non-canonical IR due to multi-use, but given the lack of test coverage,
I assume this is not important.
2024-04-16 09:17:52 +09:00
AtariDreams
5d6b00929b
[NFC] Replace m_Sub(m_Zero(), X) with m_Neg(X) (#88461) 2024-04-12 18:24:03 +09:00
zhongyunde 00443407
bd9bb31bce [InstCombine] add restrict reassoc for the powi(X,Y) / X
add restrict reassoc for the powi(X,Y) / X according the discuss on PR69998.
2024-03-27 16:47:03 +08:00
zhongyunde 00443407
2938f1cff9 [InstCombine] Refactor powi(X,Y) / X to call foldPowiReassoc, NFC 2024-03-27 16:47:03 +08:00
Yingwei Zheng
caa2258250
[LLVM] Remove nuw neg (#86295)
This patch removes APIs that creating NUW neg. It is a trivial case
because `sub nuw 0, X` always gets simplified into zero.
I believe there is no optimization opportunities in the real-world
applications that we can take advantage of the nuw flag.

Motivated by
https://github.com/llvm/llvm-project/pull/84792#discussion_r1524891134.

Compile-time improvement:
https://llvm-compile-time-tracker.com/compare.php?from=d1f182c895728d89c5c3d198b133e212a5d9d4a3&to=da7b7478b7cbb32c09d760f6b8d0e67901e0d533&stat=instructions:u
2024-03-26 20:56:16 +08:00
Noah Goldstein
b3ee127e7d [InstCombine] integrate N{U,S}WAddLike into existing folds
Just went a quick replacement of `N{U,S}WAdd` with the `Like` variant
that old matches `or disjoint`

Closes #86082
2024-03-21 13:03:38 -05:00
Yingwei Zheng
2bfa7d0e16
[InstCombine] Fold fmul X, -0.0 into copysign(0.0, -X) (#85772)
`fneg + copysign` is better than fmul for analysis/codegen.
godbolt: https://godbolt.org/z/eEs6dGd1G
Alive2: https://alive2.llvm.org/ce/z/K3M5BA
2024-03-21 21:48:10 +08:00
SahilPatidar
e61e26091c
[InstCombine] Fold mul (sext bool X), Y into select X, -Y, 0 (#84792)
Alive2: https://alive2.llvm.org/ce/z/n_ns-W

Resolve #84608
2024-03-15 16:08:46 +08:00
zhongyunde 00443407
2d6988a45e [InstCombine] Add restrict reassoc for the operands of fmul
According the discussion, except the fmul itself, all its operands
should also have reassoc flag.
Add new API m_AllowReassoc to check reassoc flag
2024-03-14 22:05:21 +08:00
zhongyunde 00443407
1752b9e4c7 [InstCombine] create a helper function foldPowiReassoc, NFC 2024-03-14 22:05:20 +08:00
zhongyunde 00443407
098520244f [InstCombine] optimize powi(X,Y) * X with Ofast
Try to transform the powi(X, Y) * X into powi(X, Y+1) with Ofast

For this case, when the Y is 3, then powi(X, 4) is replaced by
X2 = X * X; X2 * X2 in the further step.
Similar to D109954, who requires reassoc.

Fixes https://github.com/llvm/llvm-project/issues/69862.
2024-03-14 22:05:20 +08:00
Zain Jaffal
f5811494b0
check if operand is div in fold FDivSqrtDivisor (#81970)
This patch fixes the issues introduced in
bb5c3899d1.

I moved the check for the instruction to be div before I check for the
fast math flags which resolves the crash in

```
float a, b;
double sqrt();
void c() { b = a / sqrt(a); }
```

---------

Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
2024-03-09 17:15:14 +00:00
Noah Goldstein
946ea4e3ca [InstCombine] Add folds for (fp_binop ({s|u}itofp x), ({s|u}itofp y))
The full fold is one of the following:
1) `(fp_binop ({s|u}itofp x), ({s|u}itofp y))`
    -> `({s|u}itofp (int_binop x, y))`
2) `(fp_binop ({s|u}itofp x), FpC)`
    -> `({s|u}itofp (int_binop x, (fpto{s|u}i FpC)))`

And support the following binops:
    `fmul` -> `mul`
    `fadd` -> `add`
    `fsub` -> `sub`

Proofs: https://alive2.llvm.org/ce/z/zuacA8

The proofs timeout, so they must be reproduced locally.

Closes #82555
2024-03-06 13:28:04 -06:00
Nikita Popov
f7f947e620 [InstCombine] Remove some uninteresting FIXMEs (NFC)
If there are two undef operands, the select would get folded away
entirely. One undef operand can occur if the other two operands
do not satisfy the poison implication check. However, I don't think
that handling this edge case is worthwhile in this fold. If we
wanted to handle this, it would be more natural to do so in the
simplifyValueKnownNonZero() fold (as this is actually the property
we would be exploiting -- this doesn't really have any relation
to taking the log2).
2024-02-12 10:36:29 +01:00
Martin Storsjö
f022aaf4e7 Revert "[InstCombine] Optimise x / sqrt(y / z) with fast-math pattern. (#76737)"
This reverts commit bb5c3899d1936ebdf7ebf5ca4347ee2e057bee7f.

That commit caused failed asserts like this:

$ cat repro.c
float a, b;
double sqrt();
void c() { b = a / sqrt(a); }
$ clang -target x86_64-linux-gnu -c -O2 -ffast-math repro.c
clang: ../lib/IR/Instruction.cpp:522: bool llvm::Instruction::hasAllowReassoc() const: Assertion `isa<FPMathOperator>(this) && "getting fast-math flag on invalid op"' failed.
2024-02-10 11:54:31 +02:00
Zain Jaffal
bb5c3899d1
[InstCombine] Optimise x / sqrt(y / z) with fast-math pattern. (#76737)
Replace the pattern with
x * sqrt(z/y)

---------

Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
2024-02-09 17:24:41 +00:00
AtariDreams
966f78bdf8
[InstCombine] Resolve TODO: nnan nsz X / -0.0 -> copysign(inf, X) (#79766) 2024-02-07 11:48:37 +05:30
Congcong Cai
64e94438a4
[InstCombine] combine mul(abs(x),abs(y)) to abs(mul(x,y)) (#78395)
Fixes: https://github.com/llvm/llvm-project/issues/78076
Alive2 Proof: https://alive2.llvm.org/ce/z/XEDy0f
2024-01-18 20:12:00 +08:00
Yingwei Zheng
0ce193708c
[InstCombine] Refactor folding of commutative binops over select/phi/minmax (#76692)
This patch cleans up the duplicate code for folding commutative binops
over `select/phi/minmax`.

Related commits:
+ select support:
88cc35b27e
+ phi support:
8674a023bc
+ minmax support:
624973806c
2024-01-04 15:11:28 +08:00
Z572
e6d2bb0ed8
[InstCombine] Simplifiy (-x * y * -x) into (x * y * x) (#72953)
fix https://github.com/llvm/llvm-project/issues/72259
proof: https://alive2.llvm.org/ce/z/HsrmTC
2023-12-21 19:13:09 +08:00
Z572
1c494198c3
[InstCombine] simplify (X * C0) / (X * C1) into C0 / C1. (#73204)
fix #72114
proof: https://alive2.llvm.org/ce/z/xqprFm
2023-12-13 17:17:06 +08:00
Nikita Popov
5295b12cd0 [PatternMatch] Add m_AddLike matcher (NFC)
This matches either a plain "add" or an "or disjoint" that can
be converted into an add. The AddLike terminology is adopted from
the SDAG layer.
2023-12-07 14:45:12 +01:00
Nikita Popov
410bf5e142 [InstCombine] Use disjoint flag in mul of or fold
Slightly more powerful if the information used to infer disjoint
was lost.
2023-12-05 15:24:50 +01:00
Nikita Popov
4b3ea337ad [ValueTracking] Convert isKnownNonNegative() to use SimplifyQuery (NFC) 2023-11-29 10:52:52 +01:00