478 Commits

Author SHA1 Message Date
Matt Arsenault
302ff8fd00
InstCombine: Use SimplifyDemandedFPClass on fmul (#177490)
Start trying to use SimplifyDemandedFPClass on instructions, starting
with fmul. This subsumes the old transform on multiply of 0. The
main change is the introduction of nnan/ninf. I do not think anywhere
was systematically trying to introduce fast math flags before, though
a few odd transforms would set them.

Previously we only called SimplifyDemandedFPClass on function returns
with nofpclass annotations. Start following the pattern of
SimplifyDemandedBits, where this will be called from relevant root
instructions.

I was wondering if this should go into InstCombineAggressive, but that
apparently does not make use of InstCombineInternal's worklist.
2026-02-10 09:49:31 +00:00
Snehasish Kumar
15a30e3acf
[InstCombine][profcheck] Fix profile metadata propagation in takeLog2 (#179331)
Pass the select inst to Builder.CreateSelect so that profile metadata is retained.

Assisted-by: gemini
2026-02-05 19:20:32 -08:00
Alan Zhao
89ce70920a
[InstCombine][profcheck] Add unknown branch weight to all selects in InstCombineMulDivRem.cpp (#178308)
These select instructions originate from code without existing branches,
so there's no existing branch weight data. Therefore, we add unknown
branch weights to all of them.

Tracking issue: #147390
2026-01-28 09:56:07 -08:00
Justin Lebar
4684e504a8
[Instcombine] Ensure simplifyValueKnownNonZero adds instrs in dominance order (#173703)
This was found by a fuzzer I'm working on.  The high-level design is to
randomly generate LLVM IR, run a pass on it, and then run the original
and new IR through the interpreter.  They should produce the same
results.  Right now I'm only fuzzing instcombine.
2026-01-09 10:30:28 +01:00
Gábor Spaits
58822a6933
[InstCombine] Fold (select C, (x bin_op a), x) bin_op b into x bin_op select C, (a bin_op b), b (#173511)
Fixes #154246.

The original pattern in the issue was the following:
```llvm
define i8 @src(i8 %arg0, i8 %arg1) {
  %v0 = icmp eq i8 %arg1, -1
  %v1 = or i8 %arg0, 4
  %v2 = select i1 %v0, i8 %v1, i8 %arg0
  %v3 = or i8 %v2, 1
  ret i8 %v3
}
```
to
```llvm
define i8 @tgt(i8 %arg0, i8 %arg1) {
  %v0 = icmp eq i8 %arg1, -1
  %v3.v = select i1 %v0, i8 5, i8 1
  %v3 = or i8 %arg0, %v3.v
  ret i8 %v3
}
```
(Alive2: https://alive2.llvm.org/ce/z/Nq6b4)

This PR works with other binary operators too like `add` and `mul`.
2026-01-04 20:45:25 +01:00
Hongyu Chen
75b0c89e62
[InstCombine][VectorCombine][NFC] Unify uses of lossless inverse cast (#156597)
This patch addresses
https://github.com/llvm/llvm-project/pull/155216#discussion_r2297724663.
This patch adds a helper function to put the inverse cast on constants,
with cast flags preserved(optional).
Follow-up patches will add trunc/ext handling on VectorCombine and flags
preservation on InstCombine.
2025-09-08 13:30:06 +00:00
Slava Zakharin
c0c71463f6
[InstCombine] Optimize sub(sext(add(x,y)),sext(add(x,z))). (#144174)
This pattern can be often met in Flang generated LLVM IR,
for example, for the counts of the loops generated for array
expressions like: `a(x:x+y)` or `a(x+z:x+z)` or their variations.

In order to compute the loop count, Flang needs to subtract
the lower bound of the array slice from the upper bound
of the array slice. To avoid the sign wraps, it sign extends
the original values (that may be of any user data type)
to `i64`.

This peephole is really helpful in CPU2017/548.exchange2,
where we have multiple following statements like this:
```
block(row+1:row+2, 7:9, i7) = block(row+1:row+2, 7:9, i7) - 10
```

While this is just a 2x3 iterations loop nest, LLVM cannot
figure it out, ending up vectorizing the inner loop really
hard (with a vector epilog and scalar remainder). This, in turn,
causes problems for LSR that ends up creating too many loop-carried
values in the loop containing the above statement, which are then
causing too many spills/reloads.

Alive2: https://alive2.llvm.org/ce/z/gLgfYX

Related to #143219.
2025-06-19 10:13:58 -07:00
amordo
e4c3b037bc
[InstCombine] Fold tan(x) * cos(x) => sin(x) (#136319)
This patch enables folding `tan(x) * cos(x) -> sin(x)` under the `contract` flag.

Fixes https://github.com/llvm/llvm-project/issues/34950.
2025-06-18 23:12:31 +08:00
Ramkumar Ramachandra
b40e4ceaa6
[ValueTracking] Make Depth last default arg (NFC) (#142384)
Having a finite Depth (or recursion limit) for computeKnownBits is very
limiting, but is currently a load-bearing necessity, as all KnownBits
are recomputed on each call and there is no caching. As a prerequisite
for an effort to remove the recursion limit altogether, either using a
clever caching technique, or writing a easily-invalidable KnownBits
analysis, make the Depth argument in APIs in ValueTracking uniformly the
last argument with a default value. This would aid in removing the
argument when the time comes, as many callers that currently pass 0
explicitly are now updated to omit the argument altogether.
2025-06-03 17:12:24 +01:00
AZero13
ffd2633061
[InstCombine] Fold mul (shr exact (X, N)), 2^N + 1 -> add (X , shr exact (X, N)) (#112407)
Alive2 Proofs:
https://alive2.llvm.org/ce/z/aJnxyp
https://alive2.llvm.org/ce/z/dyeGEv
2025-02-13 14:25:09 +08:00
Jeremy Morse
304a99091c [NFC][DebugInfo] Use iterators for insertion at some final callsites
These are the callsites that have materialised in the last three weeks
since I last built with deprecation warnings.
2025-01-28 11:37:11 +00:00
Sushant Gokhale
e79bb8731a
[InstCombine] Fixup commit 7253c6f (#123315)
This should fix the assert failure we were getting for the darwin OS.
2025-01-17 02:14:04 -08:00
Sushant Gokhale
3b3590aa59
Revert "Revert "[InstCombine] Transform high latency, dependent FSQRT/FDIV into FMUL"" (#123313)
Reverts llvm/llvm-project#123289
2025-01-17 02:05:05 -08:00
Sushant Gokhale
606d0a7cdc
Revert "[InstCombine] Transform high latency, dependent FSQRT/FDIV into FMUL" (#123289)
Reverts llvm/llvm-project#87474
2025-01-16 22:50:20 -08:00
Sushant Gokhale
7253c6fde4
[InstCombine] Transform high latency, dependent FSQRT/FDIV into FMUL (#87474)
The proposed patch, in general, tries to transform the below code
sequence:
x = 1.0 / sqrt (a);
r1 = x * x;  // same as 1.0 / a
r2 = a / sqrt(a); // same as sqrt (a)

TO

(If x, r1 and r2 are all used further in the code) 
r1 = 1.0 / a
r2 = sqrt (a)
x = r1 * r2

The transform tries to make high latency sqrt and div operations
independent and also saves on one multiplication.

The patch was tested with SPEC17 suite with cpu=neoverse-v2. The
performance uplift achieved was:
544.nab_r   ~4%

No other regressions were observed. Also, no compile time differences
were observed with the patch.

Closes #54652
2025-01-16 21:09:15 -08:00
Veera
2d5f07c828
[InstCombine] Fold X udiv Y to X lshr cttz(Y) if Y is a power of 2 (#121386)
Fixes #115767

This PR folds `X udiv Y` to `X lshr cttz(Y)` if Y is a power of two
since bitwise operations are faster than division.

Proof: https://alive2.llvm.org/ce/z/qHmLta
2025-01-11 13:56:13 +08:00
Noah Goldstein
0d9c027ad7 [InstCombine] Make takeLog2 visible in all of InstCombine; NFC
Also add `tryGetLog2` helper that encapsulates the common pattern:

```
if (takeLog2(..., /*DoFold=*/false)) {
    Value * Log2 = takeLog2(..., /*DoFold=*/true);
    ...
}
```

Closes #122498
2025-01-10 16:21:35 -06:00
Yingwei Zheng
a77346bad0
[IRBuilder] Refactor FMF interface (#121657)
Up to now, the only way to set specified FMF flags in IRBuilder is to
use `FastMathFlagGuard`. It makes the code ugly and hard to maintain.

This patch introduces a helper class `FMFSource` to replace the original
parameter `Instruction *FMFSource` in IRBuilder. To maximize the
compatibility, it accepts an instruction or a specified FMF.
This patch also removes the use of `FastMathFlagGuard` in some simple
cases.

Compile-time impact:
https://llvm-compile-time-tracker.com/compare.php?from=f87a9db8322643ccbc324e317a75b55903129b55&to=9397e712f6010be15ccf62f12740e9b4a67de2f4&stat=instructions%3Au
2025-01-06 14:37:04 +08:00
Yingwei Zheng
24c2ba07ce
[InstCombine] Drop NSW when converting shl X, BW - 1 back into mul (#121633)
`X <<s BW - 1` and `X *s INT_MIN` are not equivalent.
Alive2: https://alive2.llvm.org/ce/z/MKKFrj
Closes https://github.com/llvm/llvm-project/issues/121584
2025-01-05 01:20:54 +08:00
Yingwei Zheng
a77dedcacb
[InstSimplify][InstCombine][ConstantFold] Move vector div/rem by zero fold to InstCombine (#114280)
Previously we fold `div/rem X, C` into `poison` if any element of the
constant divisor `C` is zero or undef. However, it is incorrect when
threading udiv over an vector select:
https://alive2.llvm.org/ce/z/3Ninx5
```
define <2 x i32> @vec_select_udiv_poison(<2 x i1> %x) {
  %sel = select <2 x i1> %x, <2 x i32> <i32 -1, i32 -1>, <2 x i32> <i32 0, i32 1>
  %div = udiv <2 x i32> <i32 42, i32 -7>, %sel
  ret <2 x i32> %div
}
```
In this case, `threadBinOpOverSelect` folds `udiv <i32 42, i32 -7>, <i32
-1, i32 -1>` and `udiv <i32 42, i32 -7>, <i32 0, i32 1>` into
`zeroinitializer` and `poison`, respectively. One solution is to
introduce a new flag indicating that we are threading over a vector
select. But it requires to modify both `InstSimplify` and
`ConstantFold`.

However, this optimization doesn't provide benefits to real-world
programs:

https://dtcxzyw.github.io/llvm-opt-benchmark/coverage/data/zyw/opt-ci/actions-runner/_work/llvm-opt-benchmark/llvm-opt-benchmark/llvm/llvm-project/llvm/lib/IR/ConstantFold.cpp.html#L908

https://dtcxzyw.github.io/llvm-opt-benchmark/coverage/data/zyw/opt-ci/actions-runner/_work/llvm-opt-benchmark/llvm-opt-benchmark/llvm/llvm-project/llvm/lib/Analysis/InstructionSimplify.cpp.html#L1107

This patch moves the fold into InstCombine to avoid breaking numerous
existing tests.

Fixes #114191 and #113866 (only poison-safety issue).
2024-11-01 22:56:22 +08:00
David Majnemer
5d4a0d54b5 [InstCombine] Teach takeLog2 about right shifts, truncation and bitwise-and
We left some easy opportunities for further simplifications.

log2(trunc(x)) is simply trunc(log2(x)). This is safe if we know that
trunc is NUW because it means that the truncation didn't drop any bits.
It is also safe if the caller is OK with zero as a possible answer.

log2(x >>u y) is simply `log2(x) - y`.

log2(x & y) is a funny one. It comes up when doing something like:
```
unsigned int f(unsigned int x, unsigned int y) {
  unsigned char a = 1u << x;
  return y / a;
}
```

LLVM would canonicalize this to:
```
  %shl = shl nuw i32 1, %x
  %conv1 = and i32 %shl, 255
  %div = udiv i32 %y, %conv1
```

In cases like these, we can ignore the mask entirely.
This is equivalent to `y >> x`.
2024-10-28 05:13:04 +00:00
AtariDreams
60e90a1929
[InstCombine] Check for undef first before freeze (#96769)
All of these insert freeze due to multi-use, which is only
relevant for undef values, not poison.
2024-07-13 18:39:41 +02:00
Noah Goldstein
afa3d58ee2 [InstCombine] Fold (mul (div exact X, C0), C1) -> (div exact X, C0/C1)
We can do this if `C0 % C1 == 0` and if we avoid UB in the signed
case.

Proofs: https://alive2.llvm.org/ce/z/HHWHDg

Closes #96915
2024-06-28 16:52:03 +08:00
Nikita Popov
2d209d964a
[IR] Add getDataLayout() helpers to BasicBlock and Instruction (#96902)
This is a helper to avoid writing `getModule()->getDataLayout()`. I
regularly try to use this method only to remember it doesn't exist...

`getModule()->getDataLayout()` is also a common (the most common?)
reason why code has to include the Module.h header.
2024-06-27 16:38:15 +02:00
Alex MacLean
a4ca22506c
[InstCombine] (uitofp bool X) * Y --> X ? Y : 0 (#96216)
Fold `mul (uitofp i1 X), Y` to `select i1 X, Y, 0.0` when the `mul` is
`nnan` and `nsz`

Proof: https://alive2.llvm.org/ce/z/_stiPm
2024-06-22 09:50:21 -07:00
Nikita Popov
76e889d3b0 [InstCombine] Avoid use of ConstantExpr::getShl()
Use the constant folding API instead (we later call isNotMinSignedValue
on it, so we do need the Constant* return type here). Use ImmConstant
to guarantee that constant folding succeeds.
2024-06-18 16:29:40 +02:00
Nikita Popov
3c553fc9e0
[InstCombine] Infer nuw on mul nsw with non-negative operands (#90170)
If a mul nsw has non-negative operands, it's also nuw.

Proof: https://alive2.llvm.org/ce/z/2Dz9Uu

Fixes https://github.com/llvm/llvm-project/issues/90020.
2024-04-29 09:53:09 +09:00
zhongyunde 00443407
56ca5ecf41 [InstCombine] Optimize powi(X, Y)/ (X * Z) with Ofast
foldFDivPowDivisor can address A / powi(x, y) to A * powi(x, -y),
while for small const value y, for example y=2, the instcombine will
transform powi(x, 2) to fmul x, x, so it is not optimal for A / powi(x, 2).

Fix https://github.com/llvm/llvm-project/issues/77171
2024-04-21 12:09:20 +08:00
zhongyunde 00443407
cb7cb83010 [InstCombine] Add check to avoid dependent optimization order, NFC
Since PR86428, foldPowiReassoc is called by both FMul and FDiv,
as the optimization of FDiv is placed after the FMul, so now
it is correct we don't add the checking of FDiv for powi(X, Y) / X.
But, we may add more matching scenarios later, so add the checking opcode
explicitly is easier to understand.
2024-04-21 12:07:56 +08:00
Nikita Popov
eb7ad8853c
[InstCombine] Remove some uses with replaceUndefsWith() (#89190)
Now that we don't accept undef splat in PatternMatch, we can remove some
uses of replaceUndefsWith(). I believe in all these cases only poison
splats are possible now, in which case no replacement is necessary.
2024-04-19 09:01:56 +09:00
Nikita Popov
1baa385065
[IR][PatternMatch] Only accept poison in getSplatValue() (#89159)
In #88217 a large set of matchers was changed to only accept poison
values in splats, but not undef values. This is because we now use
poison for non-demanded vector elements, and allowing undef can cause
correctness issues.

This patch covers the remaining matchers by changing the AllowUndef
parameter of getSplatValue() to AllowPoison instead. We also carry out
corresponding renames in matchers.

As a followup, we may want to change the default for things like m_APInt
to m_APIntAllowPoison (as this is much less risky when only allowing
poison), but this change doesn't do that.

There is one caveat here: We have a single place
(X86FixupVectorConstants) which does require handling of vector splats
with undefs. This is because this works on backend constant pool
entries, which currently still use undef instead of poison for
non-demanded elements (because SDAG as a whole does not have an explicit
poison representation). As it's just the single use, I've open-coded a
getSplatValueAllowUndef() helper there, to discourage use in any other
places.
2024-04-18 15:44:12 +09:00
Andy Kaylor
be50a259f1
Update foldFMulReassoc to respect absent fast-math flags (#88589)
This change updates a few of the transformations in foldFMulReassoc to
respect absent fast-math flags in cases where fmul and fdiv, fadd, or fsub
instructions were being folded but the code was only checking for
fast-math flags on the fmul instruction and was transferring flags to
the folded instruction that were not present on the other original 
instructions.

This fixes https://github.com/llvm/llvm-project/issues/82857
2024-04-16 16:22:31 -07:00
Nikita Popov
c50f7e9a42
[InstCombine] Remove mul of SPF abs fold (#88675)
Remove the fold working on abs in SPF representation now that we
canonicalize SPF to intrinsics.

This is not strictly NFC because the SPF fold might fire for
non-canonical IR due to multi-use, but given the lack of test coverage,
I assume this is not important.
2024-04-16 09:17:52 +09:00
AtariDreams
5d6b00929b
[NFC] Replace m_Sub(m_Zero(), X) with m_Neg(X) (#88461) 2024-04-12 18:24:03 +09:00
zhongyunde 00443407
bd9bb31bce [InstCombine] add restrict reassoc for the powi(X,Y) / X
add restrict reassoc for the powi(X,Y) / X according the discuss on PR69998.
2024-03-27 16:47:03 +08:00
zhongyunde 00443407
2938f1cff9 [InstCombine] Refactor powi(X,Y) / X to call foldPowiReassoc, NFC 2024-03-27 16:47:03 +08:00
Yingwei Zheng
caa2258250
[LLVM] Remove nuw neg (#86295)
This patch removes APIs that creating NUW neg. It is a trivial case
because `sub nuw 0, X` always gets simplified into zero.
I believe there is no optimization opportunities in the real-world
applications that we can take advantage of the nuw flag.

Motivated by
https://github.com/llvm/llvm-project/pull/84792#discussion_r1524891134.

Compile-time improvement:
https://llvm-compile-time-tracker.com/compare.php?from=d1f182c895728d89c5c3d198b133e212a5d9d4a3&to=da7b7478b7cbb32c09d760f6b8d0e67901e0d533&stat=instructions:u
2024-03-26 20:56:16 +08:00
Noah Goldstein
b3ee127e7d [InstCombine] integrate N{U,S}WAddLike into existing folds
Just went a quick replacement of `N{U,S}WAdd` with the `Like` variant
that old matches `or disjoint`

Closes #86082
2024-03-21 13:03:38 -05:00
Yingwei Zheng
2bfa7d0e16
[InstCombine] Fold fmul X, -0.0 into copysign(0.0, -X) (#85772)
`fneg + copysign` is better than fmul for analysis/codegen.
godbolt: https://godbolt.org/z/eEs6dGd1G
Alive2: https://alive2.llvm.org/ce/z/K3M5BA
2024-03-21 21:48:10 +08:00
SahilPatidar
e61e26091c
[InstCombine] Fold mul (sext bool X), Y into select X, -Y, 0 (#84792)
Alive2: https://alive2.llvm.org/ce/z/n_ns-W

Resolve #84608
2024-03-15 16:08:46 +08:00
zhongyunde 00443407
2d6988a45e [InstCombine] Add restrict reassoc for the operands of fmul
According the discussion, except the fmul itself, all its operands
should also have reassoc flag.
Add new API m_AllowReassoc to check reassoc flag
2024-03-14 22:05:21 +08:00
zhongyunde 00443407
1752b9e4c7 [InstCombine] create a helper function foldPowiReassoc, NFC 2024-03-14 22:05:20 +08:00
zhongyunde 00443407
098520244f [InstCombine] optimize powi(X,Y) * X with Ofast
Try to transform the powi(X, Y) * X into powi(X, Y+1) with Ofast

For this case, when the Y is 3, then powi(X, 4) is replaced by
X2 = X * X; X2 * X2 in the further step.
Similar to D109954, who requires reassoc.

Fixes https://github.com/llvm/llvm-project/issues/69862.
2024-03-14 22:05:20 +08:00
Zain Jaffal
f5811494b0
check if operand is div in fold FDivSqrtDivisor (#81970)
This patch fixes the issues introduced in
bb5c3899d1.

I moved the check for the instruction to be div before I check for the
fast math flags which resolves the crash in

```
float a, b;
double sqrt();
void c() { b = a / sqrt(a); }
```

---------

Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
2024-03-09 17:15:14 +00:00
Noah Goldstein
946ea4e3ca [InstCombine] Add folds for (fp_binop ({s|u}itofp x), ({s|u}itofp y))
The full fold is one of the following:
1) `(fp_binop ({s|u}itofp x), ({s|u}itofp y))`
    -> `({s|u}itofp (int_binop x, y))`
2) `(fp_binop ({s|u}itofp x), FpC)`
    -> `({s|u}itofp (int_binop x, (fpto{s|u}i FpC)))`

And support the following binops:
    `fmul` -> `mul`
    `fadd` -> `add`
    `fsub` -> `sub`

Proofs: https://alive2.llvm.org/ce/z/zuacA8

The proofs timeout, so they must be reproduced locally.

Closes #82555
2024-03-06 13:28:04 -06:00
Nikita Popov
f7f947e620 [InstCombine] Remove some uninteresting FIXMEs (NFC)
If there are two undef operands, the select would get folded away
entirely. One undef operand can occur if the other two operands
do not satisfy the poison implication check. However, I don't think
that handling this edge case is worthwhile in this fold. If we
wanted to handle this, it would be more natural to do so in the
simplifyValueKnownNonZero() fold (as this is actually the property
we would be exploiting -- this doesn't really have any relation
to taking the log2).
2024-02-12 10:36:29 +01:00
Martin Storsjö
f022aaf4e7 Revert "[InstCombine] Optimise x / sqrt(y / z) with fast-math pattern. (#76737)"
This reverts commit bb5c3899d1936ebdf7ebf5ca4347ee2e057bee7f.

That commit caused failed asserts like this:

$ cat repro.c
float a, b;
double sqrt();
void c() { b = a / sqrt(a); }
$ clang -target x86_64-linux-gnu -c -O2 -ffast-math repro.c
clang: ../lib/IR/Instruction.cpp:522: bool llvm::Instruction::hasAllowReassoc() const: Assertion `isa<FPMathOperator>(this) && "getting fast-math flag on invalid op"' failed.
2024-02-10 11:54:31 +02:00
Zain Jaffal
bb5c3899d1
[InstCombine] Optimise x / sqrt(y / z) with fast-math pattern. (#76737)
Replace the pattern with
x * sqrt(z/y)

---------

Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
2024-02-09 17:24:41 +00:00
AtariDreams
966f78bdf8
[InstCombine] Resolve TODO: nnan nsz X / -0.0 -> copysign(inf, X) (#79766) 2024-02-07 11:48:37 +05:30
Congcong Cai
64e94438a4
[InstCombine] combine mul(abs(x),abs(y)) to abs(mul(x,y)) (#78395)
Fixes: https://github.com/llvm/llvm-project/issues/78076
Alive2 Proof: https://alive2.llvm.org/ce/z/XEDy0f
2024-01-18 20:12:00 +08:00