This pattern often appears in Flang-generated LLVM IR,
for example in the counts of the loops generated for array
expressions like `a(x:x+y)` or `a(x+z:x+z)` or their variations.
To compute the loop count, Flang needs to subtract
the lower bound of the array slice from its upper bound.
To avoid sign wraps, it sign-extends the original values
(which may be of any user data type) to `i64`.
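A minimal sketch of the resulting IR shape (assuming `i32` bounds; names are illustrative):
```
; hypothetical loop-count computation for a slice a(lb:ub)
%lb.ext = sext i32 %lb to i64
%ub.ext = sext i32 %ub to i64
%diff   = sub nsw i64 %ub.ext, %lb.ext
%count  = add nsw i64 %diff, 1
```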
This peephole is particularly helpful in CPU2017/548.exchange2,
which contains many statements like this:
```
block(row+1:row+2, 7:9, i7) = block(row+1:row+2, 7:9, i7) - 10
```
While this is just a 2x3-iteration loop nest, LLVM cannot
figure that out and ends up vectorizing the inner loop
aggressively (with a vector epilogue and a scalar remainder).
This, in turn, causes problems for LSR, which ends up creating
too many loop-carried values in the loop containing the above
statement, which then cause too many spills/reloads.
Alive2: https://alive2.llvm.org/ce/z/gLgfYX
Related to #143219.
Having a finite Depth (or recursion limit) for computeKnownBits is very
limiting, but is currently a load-bearing necessity, as all KnownBits
are recomputed on each call and there is no caching. As a prerequisite
for an effort to remove the recursion limit altogether, either using a
clever caching technique, or writing an easily-invalidable KnownBits
analysis, make the Depth argument in APIs in ValueTracking uniformly the
last argument with a default value. This would aid in removing the
argument when the time comes, as many callers that currently pass 0
explicitly are now updated to omit the argument altogether.
The proposed patch, in general, tries to transform the below code
sequence:
  x = 1.0 / sqrt(a);
  r1 = x * x;        // same as 1.0 / a
  r2 = a / sqrt(a);  // same as sqrt(a)
to (if x, r1, and r2 are all used further in the code):
  r1 = 1.0 / a;
  r2 = sqrt(a);
  x = r1 * r2;
The transform makes the high-latency sqrt and div operations
independent and also saves one multiplication.
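A minimal IR sketch of the transform (illustrative; assumes `double` operands and fast-math flags):
```
; before (x, r1, r2 all have further uses)
%sqrt = call fast double @llvm.sqrt.f64(double %a)
%x    = fdiv fast double 1.0, %sqrt
%r1   = fmul fast double %x, %x
%r2   = fdiv fast double %a, %sqrt
; after
%r1.n = fdiv fast double 1.0, %a
%r2.n = call fast double @llvm.sqrt.f64(double %a)
%x.n  = fmul fast double %r1.n, %r2.n
```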
The patch was tested with the SPEC2017 suite with cpu=neoverse-v2. The
performance uplift achieved was:
544.nab_r ~4%
No regressions were observed on the other benchmarks. Also, no
compile-time differences were observed with the patch.
Closes #54652
Fixes #115767
This PR folds `X udiv Y` to `X lshr cttz(Y)` if Y is a power of two,
since bitwise operations are faster than division.
Proof: https://alive2.llvm.org/ce/z/qHmLta
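For example (a sketch, assuming `%y` is known to be a power of two):
```
; before
%r = udiv i32 %x, %y
; after
%tz = call i32 @llvm.cttz.i32(i32 %y, i1 true)
%r2 = lshr i32 %x, %tz
```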
Also add a `tryGetLog2` helper that encapsulates the common pattern:
```
if (takeLog2(..., /*DoFold=*/false)) {
  Value *Log2 = takeLog2(..., /*DoFold=*/true);
  ...
}
```
Closes #122498
We left some easy opportunities for further simplifications.
log2(trunc(x)) is simply trunc(log2(x)). This is safe if we know that
trunc is NUW because it means that the truncation didn't drop any bits.
It is also safe if the caller is OK with zero as a possible answer.
log2(x >>u y) is simply `log2(x) - y`.
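A sketch of how these apply to the udiv fold (illustrative; the first case relies on the `nuw` on the trunc):
```
; log2(trunc nuw x): y / trunc(1 << x) simplifies to y >> trunc(x)
%shl  = shl nuw i32 1, %x
%t    = trunc nuw i32 %shl to i8
%div1 = udiv i8 %y, %t        ; => lshr i8 %y, (trunc i32 %x to i8)

; log2(p >>u z): x / (16 >> z) simplifies to x >> (4 - z)
%s    = lshr i32 16, %z
%div2 = udiv i32 %x, %s       ; => lshr i32 %x, (sub i32 4, %z)
```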
log2(x & y) is a funny one. It comes up when doing something like:
```
unsigned int f(unsigned int x, unsigned int y) {
  unsigned char a = 1u << x;
  return y / a;
}
```
LLVM would canonicalize this to:
```
%shl = shl nuw i32 1, %x
%conv1 = and i32 %shl, 255
%div = udiv i32 %y, %conv1
```
In cases like these, we can ignore the mask entirely.
This is equivalent to `y >> x`.
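That is, the whole sequence reduces to:
```
%div = lshr i32 %y, %x
```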
This is a helper to avoid writing `getModule()->getDataLayout()`. I
regularly try to use this method only to remember it doesn't exist...
`getModule()->getDataLayout()` is also a common (the most common?)
reason why code has to include the Module.h header.
Use the constant folding API instead (we later call isNotMinSignedValue
on it, so we do need the Constant* return type here). Use ImmConstant
to guarantee that constant folding succeeds.
foldFDivPowDivisor can transform A / powi(x, y) into A * powi(x, -y),
but for a small constant y, for example y = 2, instcombine will
transform powi(x, 2) into fmul x, x, so the fold is not optimal for A / powi(x, 2).
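A sketch of the preferred result for y = 2 (illustrative; reassoc flags assumed):
```
; A / powi(x, 2) is better served by dividing by the expanded multiply
%xx  = fmul reassoc double %x, %x
%div = fdiv reassoc double %a, %xx
```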
Fixes https://github.com/llvm/llvm-project/issues/77171
Since PR86428, foldPowiReassoc has been called by both FMul and FDiv.
Because the FDiv optimization is placed after the FMul one, it is
currently correct not to check for FDiv in the powi(X, Y) / X fold.
But we may add more matching scenarios later, and checking the opcode
explicitly is easier to understand.
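For reference, a sketch of the powi(X, Y) / X fold in IR (illustrative; assuming reassoc and a powi(X, Y - 1) result):
```
%p   = call reassoc double @llvm.powi.f64.i32(double %x, i32 %y)
%div = fdiv reassoc double %p, %x
; => %ym1 = sub i32 %y, 1
;    %div = call reassoc double @llvm.powi.f64.i32(double %x, i32 %ym1)
```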
Now that we don't accept undef splats in PatternMatch, we can remove some
uses of replaceUndefsWith(). I believe that in all these cases only
poison splats are possible now, in which case no replacement is necessary.
In #88217 a large set of matchers was changed to only accept poison
values in splats, but not undef values. This is because we now use
poison for non-demanded vector elements, and allowing undef can cause
correctness issues.
This patch covers the remaining matchers by changing the AllowUndef
parameter of getSplatValue() to AllowPoison instead. We also carry out
corresponding renames in matchers.
As a followup, we may want to change the default for things like m_APInt
to m_APIntAllowPoison (as this is much less risky when only allowing
poison), but this change doesn't do that.
There is one caveat here: We have a single place
(X86FixupVectorConstants) which does require handling of vector splats
with undefs. This is because this works on backend constant pool
entries, which currently still use undef instead of poison for
non-demanded elements (because SDAG as a whole does not have an explicit
poison representation). As it's just the single use, I've open-coded a
getSplatValueAllowUndef() helper there, to discourage use in any other
places.
This change updates a few of the transformations in foldFMulReassoc to
respect absent fast-math flags. In cases where fmul and fdiv, fadd, or
fsub instructions were being folded, the code was only checking for
fast-math flags on the fmul instruction and was transferring flags to
the folded instruction that were not present on the other original
instructions.
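For example (a sketch of the hazard; flags are illustrative):
```
; %mul is fast, but %div carries no flags; the folded result
; must not inherit "fast" wholesale from %mul
%div = fdiv double %x, %y
%mul = fmul fast double %div, %z
```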
This fixes https://github.com/llvm/llvm-project/issues/82857
Remove the fold working on abs in SPF representation now that we
canonicalize SPF to intrinsics.
This is not strictly NFC because the SPF fold might fire for
non-canonical IR due to multi-use, but given the lack of test coverage,
I assume this is not important.
Try to transform powi(X, Y) * X into powi(X, Y + 1) with Ofast.
In this case, when Y is 3, the resulting powi(X, 4) is replaced by
X2 = X * X; X2 * X2 in a further step.
Similar to D109954, which requires reassoc.
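A sketch in IR (illustrative; reassoc assumed):
```
%p = call reassoc double @llvm.powi.f64.i32(double %x, i32 3)
%m = fmul reassoc double %p, %x
; => %m = call reassoc double @llvm.powi.f64.i32(double %x, i32 4)
```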
Fixes https://github.com/llvm/llvm-project/issues/69862.
This patch fixes the issues introduced in bb5c3899d1.
I moved the check that the instruction is a div before the check for
the fast-math flags, which resolves the crash in
```
float a, b;
double sqrt();
void c() { b = a / sqrt(a); }
```
---------
Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
The full fold is one of the following:
1) `(fp_binop ({s|u}itofp x), ({s|u}itofp y))`
-> `({s|u}itofp (int_binop x, y))`
2) `(fp_binop ({s|u}itofp x), FpC)`
-> `({s|u}itofp (int_binop x, (fpto{s|u}i FpC)))`
And support the following binops:
`fmul` -> `mul`
`fadd` -> `add`
`fsub` -> `sub`
Proofs: https://alive2.llvm.org/ce/z/zuacA8
The proofs time out, so they must be reproduced locally.
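An example instance of fold 1) (a sketch; assumes the `i8` values convert exactly and the integer add cannot overflow):
```
; before
%xf = sitofp i8 %x to float
%yf = sitofp i8 %y to float
%r  = fadd float %xf, %yf
; after
%s  = add nsw i8 %x, %y
%r2 = sitofp i8 %s to float
```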
Closes #82555
If there are two undef operands, the select would get folded away
entirely. One undef operand can occur if the other two operands
do not satisfy the poison implication check. However, I don't think
that handling this edge case is worthwhile in this fold. If we
wanted to handle this, it would be more natural to do so in the
simplifyValueKnownNonZero() fold (as this is actually the property
we would be exploiting -- this doesn't really have any relation
to taking the log2).
This patch cleans up the duplicate code for folding commutative binops
over `select/phi/minmax`.
Related commits:
+ select support: 88cc35b27e
+ phi support: 8674a023bc
+ minmax support: 624973806c