453 Commits

Yingwei Zheng
db90673d16
[InstCombine] Re-queue users of phi when nsw/nuw flags of add are inferred (#113933)
This patch re-queues users of a phi node when one of its incoming add
instructions is updated. If an add instruction is updated, the analysis
results for phis may be improved, so we may be able to further fold
users of the phi node.

See the following case:
```
define i8 @trunc_in_loop_exit_block() {
; CHECK-LABEL: @trunc_in_loop_exit_block(
; CHECK-NEXT:  entry:
; CHECK-NEXT:    br label [[LOOP:%.*]]
; CHECK:       loop:
; CHECK-NEXT:    [[IV:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[LOOP_LATCH:%.*]] ]
; CHECK-NEXT:    [[PHI:%.*]] = phi i32 [ 1, [[ENTRY]] ], [ [[IV_NEXT]], [[LOOP_LATCH]] ]
; CHECK-NEXT:    [[CMP:%.*]] = icmp samesign ult i32 [[IV]], 100
; CHECK-NEXT:    br i1 [[CMP]], label [[LOOP_LATCH]], label [[EXIT:%.*]]
; CHECK:       loop.latch:
; CHECK-NEXT:    [[IV_NEXT]] = add nuw nsw i32 [[IV]], 1
; CHECK-NEXT:    br label [[LOOP]]
; CHECK:       exit:
; CHECK-NEXT:    [[TRUNC:%.*]] = trunc i32 [[PHI]] to i8
; CHECK-NEXT:    ret i8 [[TRUNC]]
;
entry:
  br label %loop

loop:
  %iv = phi i32 [ 0, %entry ], [ %iv.next, %loop.latch ]
  %phi = phi i32 [ 1, %entry ], [ %iv.next, %loop.latch ]
  %cmp = icmp ult i32 %iv, 100
  br i1 %cmp, label %loop.latch, label %exit

loop.latch:
  %iv.next = add i32 %iv, 1
  br label %loop

exit:
  %trunc = trunc i32 %phi to i8
  ret i8 %trunc
}
```
`%iv u< 100` -> infer `nsw/nuw` for `%iv.next = add i32 %iv, 1`
-> `%iv` is non-negative
-> infer `samesign` for `%cmp = icmp ult i32 %iv, 100`.
Without re-queuing users of phi nodes, we cannot improve `%cmp` in a
single iteration.

Address review comment
https://github.com/llvm/llvm-project/pull/112642#discussion_r1804712271.
This patch also fixes some non-fixpoint issues in tests.
2024-11-18 17:15:46 +08:00
Nikolay Panchenko
6c1fc8213e
[InstCombine] fold sub(zext(ptrtoint),zext(ptrtoint)) (#115369)
On a 32-bit target, if pointer arithmetic with `addrspace` is used in an
i64 computation, the missed folding in InstCombine results in suboptimal
performance, unlike the same code compiled for a 64-bit target.
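A minimal sketch (hypothetical names) of the kind of pattern left
unfolded before this change, on a target where addrspace(1) pointers are
32-bit:
```
define i64 @ptr_diff_as_i64(ptr addrspace(1) %base, i32 %i, i32 %j) {
  %p = getelementptr inbounds i8, ptr addrspace(1) %base, i32 %i
  %q = getelementptr inbounds i8, ptr addrspace(1) %base, i32 %j
  %pi = ptrtoint ptr addrspace(1) %p to i32
  %qi = ptrtoint ptr addrspace(1) %q to i32
  %pz = zext i32 %pi to i64
  %qz = zext i32 %qi to i64
  ; previously this sub was not reduced to arithmetic on %i and %j,
  ; unlike the equivalent code on a 64-bit target
  %d = sub i64 %pz, %qz
  ret i64 %d
}
```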
2024-11-15 15:36:35 +01:00
Rahul Joshi
fa789dffb1
[NFC] Rename Intrinsic::getDeclaration to getOrInsertDeclaration (#111752)
Rename the function to reflect its correct behavior and to be consistent
with `Module::getOrInsertFunction`. This is also in preparation of
adding a new `Intrinsic::getDeclaration` that will have behavior similar
to `Module::getFunction` (i.e, just lookup, no creation).
2024-10-11 05:26:03 -07:00
Noah Goldstein
a6edcea211 [InstCombine] Simplify (add/sub (sub/add) (sub/add)) irrespective of use-count
Added folds:
    - `(add (sub X, Y), (sub Z, X))` -> `(sub Z, Y)`
    - `(sub (add X, Y), (add X, Z))` -> `(sub Y, Z)`
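For example, a sketch of the first fold where the inner subs have extra
uses (`@use` is a placeholder for an arbitrary extra user):
```
declare void @use(i8)

define i8 @src(i8 %x, i8 %y, i8 %z) {
  %s1 = sub i8 %x, %y
  %s2 = sub i8 %z, %x
  call void @use(i8 %s1)     ; extra uses previously blocked the fold
  call void @use(i8 %s2)
  ; (%x - %y) + (%z - %x) == %z - %y, so %x cancels:
  %r = add i8 %s1, %s2       ; -> sub i8 %z, %y
  ret i8 %r
}
```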

The fold is typically handled in the `Reassociate` pass, but it fails
if the inner `sub`/`add` are multi-use. Less importantly, `Reassociate`
doesn't propagate flags correctly.

This patch adds the fold explicitly to InstCombine.

Proofs: https://alive2.llvm.org/ce/z/p6JyRP

Closes #105866
2024-08-27 11:43:17 -07:00
Volodymyr Vasylkun
be7d08cd59
[InstCombine] Fold sext(A < B) + zext(A > B) into ucmp/scmp(A, B) (#103833)
This change also covers the fold of `zext(A > B) - zext(A < B)` since it
is already being canonicalized into the aforementioned pattern.

Proof: https://alive2.llvm.org/ce/z/AgnfMn
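A minimal sketch of the signed variant of the pattern (illustrative, not
taken from the patch's tests):
```
define i32 @src(i32 %a, i32 %b) {
  %lt = icmp slt i32 %a, %b
  %gt = icmp sgt i32 %a, %b
  %sext = sext i1 %lt to i32   ; -1 when a < b
  %zext = zext i1 %gt to i32   ;  1 when a > b
  %r = add i32 %sext, %zext    ; -1 / 0 / 1
  ret i32 %r
}
; => %r folds to call i32 @llvm.scmp.i32.i32(i32 %a, i32 %b)
```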
2024-08-21 23:15:24 +01:00
Nikita Popov
a105877646
[InstCombine] Remove some of the complexity-based canonicalization (#91185)
The idea behind this canonicalization is that it allows us to handle fewer
patterns, because we know that some will be canonicalized away. This is
indeed very useful, e.g. to know that constants are always on the right.

However, this is only useful if the canonicalization is actually
reliable. This is the case for constants, but not for arguments: Moving
these to the right makes it look like the "more complex" expression is
guaranteed to be on the left, but this is not actually the case in
practice. It fails as soon as you replace the argument with another
instruction.

The end result is that it looks like things correctly work in tests,
while they actually don't. We use the "thwart complexity-based
canonicalization" trick to handle this in tests, but it's often a
challenge for new contributors to get this right, and based on the
regressions this PR originally exposed, we clearly don't get this right
in many cases.

For this reason, I think that it's better to remove this complexity
canonicalization. It will make it much easier to write tests for
commuted cases and make sure that they are handled.
2024-08-21 12:02:54 +02:00
Nikita Popov
dd9a99f2b6 [InstCombine] Preserve nsw in A + -B fold
This was already done for -B + A, but not for A + -B.

Proof: https://alive2.llvm.org/ce/z/F3V2yZ
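Sketch of the now-covered direction, assuming both the add and the
negation carry nsw:
```
define i8 @src(i8 %a, i8 %b) {
  %neg = sub nsw i8 0, %b
  %r = add nsw i8 %a, %neg   ; A + -B
  ret i8 %r
}
; => %r = sub nsw i8 %a, %b, keeping the nsw flag
```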
2024-08-16 16:33:12 +02:00
Yingwei Zheng
62e9f40949
[PatternMatch] Use m_SpecificCmp matchers. NFC. (#100878)
Compile-time improvement:
http://llvm-compile-time-tracker.com/compare.php?from=13996378d81c8fa9a364aeaafd7382abbc1db83a&to=861ffa4ec5f7bde5a194a7715593a1b5359eb581&stat=instructions:u
baseline: 803eaf29267c6aae9162d1a83a4a2ae508b440d3
```
Top 5 improvements:
  stockfish/movegen.ll 2541620819 2538599412 -0.12%
  minetest/profiler.cpp.ll 431724935 431246500 -0.11%
  abc/luckySwap.c.ll 581173720 580581935 -0.10%
  abc/kitTruth.c.ll 2521936288 2519445570 -0.10%
  abc/extraUtilTruth.c.ll 1216674614 1215495502 -0.10%
Top 5 regressions:
  openssl/libcrypto-shlib-sm4.ll 1155054721 1155943201 +0.08%
  openssl/libcrypto-lib-sm4.ll 1155054838 1155943063 +0.08%
  spike/vsm4r_vv.ll 1296430080 1297039258 +0.05%
  spike/vsm4r_vs.ll 1312496906 1313093460 +0.05%
  nuttx/lib_rand48.c.ll 126201233 126246692 +0.04%
Overall: -0.02112308%
```
2024-07-29 10:04:06 +08:00
Craig Topper
01ceb9840a
[InstCombine] Fold (zext (X +nuw C)) + -C --> zext(X) when zext has additional use. (#98533)
We have a general fold for (zext (X +nuw C2)) + C1 --> zext (X + (C2 +
trunc(C1)))
but this fold is disabled if the zext has an additional use.
    
If the two constants cancel, we can fold the whole expression to
zext(X) without increasing the number of instructions.
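A sketch of the cancelling case (`@use` is a placeholder for the extra
user):
```
declare void @use(i64)

define i64 @src(i32 %x) {
  %add = add nuw i32 %x, 42
  %ext = zext i32 %add to i64
  call void @use(i64 %ext)   ; extra use disables the general fold
  ; nuw guarantees zext(%x + 42) == zext(%x) + 42, so adding -42
  ; cancels the constant:
  %r = add i64 %ext, -42     ; -> zext i32 %x to i64
  ret i64 %r
}
```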
2024-07-12 07:40:29 -07:00
csstormq
96af114941
[InstCombine] Preserve the nsw/nuw flags for (X | Op01C) + Op1C --> X + (Op01C + Op1C) (#94586)
This patch preserves the `nsw` flag for
`(X | Op01C) + Op1C --> X + (Op01C + Op1C)` when the sum of `Op01C` and
`Op1C` will not overflow, and preserves the `nuw` flag unconditionally;
keeping `nsw` in turn allows a later `sdiv` to be simplified to `udiv`.

Alive2 Proofs (provided by @nikic): https://alive2.llvm.org/ce/z/nrdCZT,
https://alive2.llvm.org/ce/z/YnJHnH
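A sketch of the fold with the flags surviving (the `and` is only there
to make the `or` provably disjoint):
```
define i8 @src(i8 %x) {
  %masked = and i8 %x, -4    ; low two bits known zero
  %or = or i8 %masked, 3     ; disjoint, i.e. acts like an add
  %r = add nuw nsw i8 %or, 4
  ret i8 %r
}
; => %r = add nuw nsw i8 %masked, 7 (3 + 4 does not overflow, so nsw
;    is kept; nuw is kept unconditionally)
```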
2024-06-08 08:38:27 +08:00
Noah Goldstein
0310f7f2d0 [InstCombine] Fold (add X, (sext/zext (icmp eq X, C)))
We can convert this to a select based on the `(icmp eq X, C)`, then
constant fold the addition, the true arm being `(add C, (sext/zext 1))`
and the false arm being `(add X, 0)`, e.g.

    - `(select (icmp eq X, C), (add C, (sext/zext 1)), (add X, 0))`.
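For instance, with an illustrative constant `C = 10`:
```
define i32 @src(i32 %x) {
  %cmp = icmp eq i32 %x, 10
  %ext = zext i1 %cmp to i32
  %r = add i32 %x, %ext
  ret i32 %r
}
; => both arms constant fold:
;    %r = select i1 %cmp, i32 11, i32 %x
```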

This is essentially a specialization of the only case that seems to
actually show up from #89020

Closes #93840
2024-06-01 17:49:15 -05:00
Nikita Popov
0d335f78e4 [InstCombine] Handle more commuted cases in matchesSquareSum() 2024-05-09 12:35:16 +09:00
Nikita Popov
d26002ac38 [InstCombine] Fix use-after-free in OptimizePointerDifference()
EmitGEPOffset() may remove the old GEP, so be sure to cache the
inbounds flag beforehand.
2024-04-26 12:05:12 +09:00
Nikita Popov
cbe1760f02
[InstCombine] Allow multi-use OptimizePointerDifference() with two GEPs (#90017)
Currently, the OptimizePointerDifference fold does not trigger when
working on the sub of two geps where one of the geps has multiple uses,
to avoid duplicating the offset arithmetic too much.

However, there are cases where performing it would still be
clearly profitable, e.g. test_sub_ptradd_multiuse.

This patch drops the one-use restriction using the same strategy we use
in GEP comparison folds: If there are multiple uses, we rewrite the GEP
to use the expanded offset arithmetic instead (effectively
canonicalizing it into ptradd representation).

Fixes https://github.com/llvm/llvm-project/issues/88231.
2024-04-26 10:53:03 +09:00
Yingwei Zheng
cbb0477e9a
[InstCombine] Fold fneg over select (#89947)
As we fold fabs over select in
https://github.com/llvm/llvm-project/pull/86390, this patch folds fneg
over select to make sure nabs idioms are generated.
Addresses
https://github.com/llvm/llvm-project/pull/86390#discussion_r1568862289.

Alive2 for FMF propagation: https://alive2.llvm.org/ce/z/-h6Vuo
2024-04-25 23:14:37 +08:00
Yingwei Zheng
945eeb2d92
[InstCombine] Simplify (X / C0) * C1 + (X % C0) * C2 to (X / C0) * (C1 - C2 * C0) + X * C2 (#76285)
Since `DivRemPairPass` runs after `ReassociatePass` in the optimization
pipeline, I decided to do this simplification in `InstCombine`.

Alive2: https://alive2.llvm.org/ce/z/Jgsiqf
Fixes #76128.
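A sketch with illustrative constants C0 = 10, C1 = 7, C2 = 3; the
rewrite eliminates the `urem`:
```
define i32 @src(i32 %x) {
  %div = udiv i32 %x, 10
  %rem = urem i32 %x, 10
  %a = mul i32 %div, 7
  %b = mul i32 %rem, 3
  %r = add i32 %a, %b
  ret i32 %r
}
; rem == x - div*10, so the sum rewrites to div*(7 - 3*10) + x*3:
;   %t = mul i32 %div, -23
;   %m = mul i32 %x, 3
;   %r = add i32 %t, %m
```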
2024-04-24 17:01:49 +08:00
ZelinMa557
97c7124731
[InstCombine] Regard zext nneg as sext when folding add(zext neg(add)) (#88887)
Fixes #88348.
Proof: https://alive2.llvm.org/ce/z/fJnM7t
A test will be added later.

---------

Signed-off-by: ZelinMa557 <3388706467@qq.com>
2024-04-19 22:59:07 +08:00
Nikita Popov
1baa385065
[IR][PatternMatch] Only accept poison in getSplatValue() (#89159)
In #88217 a large set of matchers was changed to only accept poison
values in splats, but not undef values. This is because we now use
poison for non-demanded vector elements, and allowing undef can cause
correctness issues.

This patch covers the remaining matchers by changing the AllowUndef
parameter of getSplatValue() to AllowPoison instead. We also carry out
corresponding renames in matchers.

As a followup, we may want to change the default for things like m_APInt
to m_APIntAllowPoison (as this is much less risky when only allowing
poison), but this change doesn't do that.

There is one caveat here: We have a single place
(X86FixupVectorConstants) which does require handling of vector splats
with undefs. This is because this works on backend constant pool
entries, which currently still use undef instead of poison for
non-demanded elements (because SDAG as a whole does not have an explicit
poison representation). As it's just the single use, I've open-coded a
getSplatValueAllowUndef() helper there, to discourage use in any other
places.
2024-04-18 15:44:12 +09:00
Craig Topper
e15f47f267
[InstCombine] Don't use dominating conditions to transform sub into xor. (#88566)
Other passes are unable to reverse this transform if we use dominating
conditions.
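A sketch of the kind of case affected (hypothetical): under a dominating
`%x u< 4`, only the low two bits of `%x` can be set, so the sub is a
xor, but other passes cannot undo that rewrite without the same context.
```
define i8 @example(i8 %x) {
entry:
  %cmp = icmp ult i8 %x, 4
  br i1 %cmp, label %then, label %else
then:
  ; under the dominating condition, sub i8 3, %x == xor i8 %x, 3;
  ; this patch stops making that conversion
  %s = sub i8 3, %x
  ret i8 %s
else:
  ret i8 0
}
```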
    
Fixes #88239.
2024-04-17 13:16:08 -07:00
Harald van Dijk
60de56c743
[ValueTracking] Restore isKnownNonZero parameter order. (#88873)
Prior to #85863, the required parameters of llvm::isKnownNonZero were
Value and DataLayout. After, they are Value, Depth, and SimplifyQuery,
where SimplifyQuery is implicitly constructible from DataLayout. The
change to move Depth before SimplifyQuery needed callers to be updated
unnecessarily, and as commented in #85863, we actually want Depth to be
after SimplifyQuery anyway so that it can be defaulted and the caller
does not need to specify it.
2024-04-16 15:21:09 +01:00
Yingwei Zheng
e0a628715a
[ValueTracking] Convert isKnownNonZero to use SimplifyQuery (#85863)
This patch converts `isKnownNonZero` to use `SimplifyQuery`. Then we can
use the context information from `DomConditionCache`.

Fixes https://github.com/llvm/llvm-project/issues/85823.
Alive2: https://alive2.llvm.org/ce/z/QUvHVj
2024-04-12 23:47:20 +08:00
Yingwei Zheng
caa2258250
[LLVM] Remove nuw neg (#86295)
This patch removes the APIs that create NUW neg. It is a trivial case
because `sub nuw 0, X` always gets simplified into zero.
I believe there are no optimization opportunities in real-world
applications that could take advantage of the nuw flag.
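The trivial simplification in question, sketched:
```
define i8 @nuw_neg(i8 %x) {
  ; 0 - %x wraps unless %x == 0, so with nuw the result is either
  ; 0 or poison, and 0 is a valid refinement of both:
  %neg = sub nuw i8 0, %x
  ret i8 %neg                ; -> ret i8 0
}
```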

Motivated by
https://github.com/llvm/llvm-project/pull/84792#discussion_r1524891134.

Compile-time improvement:
https://llvm-compile-time-tracker.com/compare.php?from=d1f182c895728d89c5c3d198b133e212a5d9d4a3&to=da7b7478b7cbb32c09d760f6b8d0e67901e0d533&stat=instructions:u
2024-03-26 20:56:16 +08:00
Noah Goldstein
b3ee127e7d [InstCombine] integrate N{U,S}WAddLike into existing folds
Just a quick replacement of `N{U,S}WAdd` with the `Like` variant
that also matches `or disjoint`

Closes #86082
2024-03-21 13:03:38 -05:00
Noah Goldstein
946ea4e3ca [InstCombine] Add folds for (fp_binop ({s|u}itofp x), ({s|u}itofp y))
The full fold is one of the following:
1) `(fp_binop ({s|u}itofp x), ({s|u}itofp y))`
    -> `({s|u}itofp (int_binop x, y))`
2) `(fp_binop ({s|u}itofp x), FpC)`
    -> `({s|u}itofp (int_binop x, (fpto{s|u}i FpC)))`

And support the following binops:
    `fmul` -> `mul`
    `fadd` -> `add`
    `fsub` -> `sub`
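A sketch of fold 1 with `fadd`; the masks are only there to make the
integer add provably non-overflowing and the values exactly
representable:
```
define float @src(i8 %a, i8 %b) {
  %x = and i8 %a, 63
  %y = and i8 %b, 63
  %xf = sitofp i8 %x to float
  %yf = sitofp i8 %y to float
  %r = fadd float %xf, %yf
  ret float %r
}
; => the fadd of sitofps becomes a sitofp of an integer add:
;   %add = add nuw nsw i8 %x, %y
;   %r   = sitofp i8 %add to float
```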

Proofs: https://alive2.llvm.org/ce/z/zuacA8

The proofs time out, so they must be reproduced locally.

Closes #82555
2024-03-06 13:28:04 -06:00
Noah Goldstein
0f5849eeee [InstCombine] Move folding (add (sitofp x), (sitofp y)) impl to InstructionCombiner; NFC 2024-03-06 13:28:04 -06:00
Kai Luo
0f02431273
[InstCombine] Fold (sub (xor X, (sext C)), (sext C)) => (select C (neg X), X) (#79417)
This is useful when computing absdiff.
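Sketch of the fold, where C is an i1:
```
define i32 @src(i32 %x, i1 %c) {
  %sext = sext i1 %c to i32    ; 0 or -1
  %xor = xor i32 %x, %sext
  %r = sub i32 %xor, %sext     ; %x when %c == 0, -%x when %c == 1
  ret i32 %r
}
; => becomes:
;   %neg = sub i32 0, %x
;   %r   = select i1 %c, i32 %neg, i32 %x
```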

Correctness proofs: https://alive2.llvm.org/ce/z/eMbxps,
https://alive2.llvm.org/ce/z/SNCWJe.

---------

Co-authored-by: Yingwei Zheng <dtcxzyw@qq.com>
2024-02-26 09:43:20 +08:00
Yingwei Zheng
930996e9e4
[ValueTracking][NFC] Pass SimplifyQuery to computeKnownFPClass family (#80657)
This patch refactors the interface of the `computeKnownFPClass` family
to pass `SimplifyQuery` directly.
The motivation of this patch is to compute known fpclass with
`DomConditionCache`, which was introduced by
https://github.com/llvm/llvm-project/pull/73662. With
`DomConditionCache`, we can do more optimization with context-sensitive
information.

Example (extracted from
[fmt/format.h](e17bc67547/include/fmt/format.h (L3555-L3566))):
```
define float @test(float %x, i1 %cond) {
  %i32 = bitcast float %x to i32
  %cmp = icmp slt i32 %i32, 0
  br i1 %cmp, label %if.then1, label %if.else

if.then1:
  %fneg = fneg float %x
  br label %if.end

if.else:
  br i1 %cond, label %if.then2, label %if.end

if.then2:
  br label %if.end

if.end:
  %value = phi float [ %fneg, %if.then1 ], [ %x, %if.then2 ], [ %x, %if.else ]
  %ret = call float @llvm.fabs.f32(float %value)
  ret float %ret
}
```
We can prove the signbit of `%value` is always zero. Then the fabs can
be eliminated.
2024-02-06 02:30:12 +08:00
Noah Goldstein
60e8915d22 [InstCombine] Add folds for (add/sub/disjoint_or/icmp C, (ctpop (not x)))
`(ctpop (not x))` <-> `(sub nuw nsw BitWidth(x), (ctpop x))`. The
`sub` expression can sometimes be constant folded depending on the use
case of `(ctpop (not x))`.

This patch adds fold for the following cases:

`(add/sub/disjoint_or C, (ctpop (not x))`
    -> `(add/sub/disjoint_or C', (ctpop x))`
`(cmp pred C, (ctpop (not x))`
    -> `(cmp swapped_pred C', (ctpop x))`

Where `C'` depends on how we constant fold `C` with `BitWidth(x)` for
the given opcode.
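A sketch of the add case for i8, using the identity
ctpop(~x) == 8 - ctpop(x):
```
declare i8 @llvm.ctpop.i8(i8)

define i8 @src(i8 %x) {
  %not = xor i8 %x, -1
  %cnt = call i8 @llvm.ctpop.i8(i8 %not)
  %r = add i8 %cnt, 10        ; 10 + (8 - ctpop(x))
  ret i8 %r
}
; => %cnt2 = call i8 @llvm.ctpop.i8(i8 %x)
;    %r    = sub i8 18, %cnt2
```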

Proofs: https://alive2.llvm.org/ce/z/qUgfF3

Closes #77859
2024-01-15 12:05:38 -08:00
Yingwei Zheng
1220c9bafc
[InstCombine] Fold the log2_ceil idiom (#76661)
This patch folds the `log2_ceil` idiom:
```
(BW - ctlz(A)) + (is_power2(A) ? 0 : 1) ->
zext(ctpop(A) >u/!= 1) + (ctlz(A, true) ^ (BW - 1)) (canonical form) ->
BW - ctlz(A - 1, false)
```
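Sketched for i32 (BW = 32), going from the canonical form to the final
form:
```
declare i32 @llvm.ctpop.i32(i32)
declare i32 @llvm.ctlz.i32(i32, i1)

define i32 @src(i32 %a) {
  %ctpop = call i32 @llvm.ctpop.i32(i32 %a)
  %ne = icmp ne i32 %ctpop, 1
  %adj = zext i1 %ne to i32      ; +1 unless %a is a power of two
  %ctlz = call i32 @llvm.ctlz.i32(i32 %a, i1 true)
  %flip = xor i32 %ctlz, 31      ; == 31 - ctlz(%a) for nonzero %a
  %r = add i32 %adj, %flip
  ret i32 %r
}
; => folds to BW - ctlz(A - 1, false):
;   %dec   = add i32 %a, -1
;   %ctlz2 = call i32 @llvm.ctlz.i32(i32 %dec, i1 false)
;   %r     = sub i32 32, %ctlz2
```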

Alive2: https://alive2.llvm.org/ce/z/6mSbdi
2024-01-10 20:24:20 +08:00
Yingwei Zheng
0ce193708c
[InstCombine] Refactor folding of commutative binops over select/phi/minmax (#76692)
This patch cleans up the duplicate code for folding commutative binops
over `select/phi/minmax`.

Related commits:
+ select support:
88cc35b27e
+ phi support:
8674a023bc
+ minmax support:
624973806c
2024-01-04 15:11:28 +08:00
Craig Topper
7f1c8fc25a
[InstCombine] Use ConstantInt::getSigned to sign extend -2 for large types. (#76464)
Using `ConstantInt::get` will zero extend.

Fixes #76441
2023-12-27 12:27:12 -08:00
Craig Topper
56248caa3b
[InstCombine] Explicitly set disjoint flag when converting xor to or. (#74229) 2023-12-06 09:41:59 -08:00
Nikita Popov
e4710872e9 [InstCombine] Use disjoint flag in add of or fold
Use disjoint instead of haveNoCommonBitsSet(), which is slightly
stronger in case the information used to infer disjoint has been
lost.

Introduce the m_DisjointOr() matcher to make handling cases like
this cleaner.
2023-12-05 15:06:40 +01:00
shaojingzhi
9a99a1a39e
[InstCombine] Add one-use limitation to box multiply fold (#72876)
Check that the operands of I are used in no more than one place; if they
have other uses they cannot be deleted, and since a mul instruction has
far more weight than add and shl instructions in IR, the transform would
not achieve the goal of simplifying instructions, so just return null.
2023-12-04 14:14:59 +01:00
Antonio Frighetto
7d5f79f13b [InstCombine] Handle equality comparison when flooring by constant 2
Support `icmp eq` when reducing signed divisions by power of 2 to
arithmetic shift right, as `icmp ugt` may have been canonicalized
into `icmp eq` by the time additions are folded into `ashr`.

Fixes: https://github.com/llvm/llvm-project/issues/73622.

Proof: https://alive2.llvm.org/ce/z/8-eUdb.
2023-11-30 11:57:01 +01:00
Craig Topper
03d4a9d94d
[InstCombine] Set disjoint flag when turning Add into Or. (#72702)
The disjoint flag was recently added to IR in #72583
2023-11-27 12:54:11 -08:00
Noah Goldstein
5271d33077 [InstCombine] Add transform for (~X + ~Y) -> -2 - Y - X
Proof: https://alive2.llvm.org/ce/z/36FySK
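Sketch:
```
define i8 @src(i8 %x, i8 %y) {
  %nx = xor i8 %x, -1          ; ~x == -x - 1
  %ny = xor i8 %y, -1
  %r = add i8 %nx, %ny         ; (-x - 1) + (-y - 1) == -2 - y - x
  ret i8 %r
}
; => %s = sub i8 -2, %y
;    %r = sub i8 %s, %x
```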

Closes #66787.
2023-11-20 17:59:27 -06:00
Noah Goldstein
b7c0f79926 [InstCombine] Replace isFreeToInvert + CreateNot with getFreelyInverted
This is nearly an NFC; the only potential change is to the order in
which values are created, and to their names.

Otherwise it is a slight speed boost/simplification to avoid having to
go through the `getFreelyInverted` recursive logic twice to simplify
the extra `not` op.
2023-11-20 17:59:27 -06:00
Noah Goldstein
d01857803f [InstCombine] Make isFreeToInvert check recursively.
Some instructions (select/min/max) are inverted by just inverting the
operands, so whether they are free to invert is really just whether
their operands are free to invert.

Differential Revision: https://reviews.llvm.org/D159056
2023-11-20 17:59:26 -06:00
Noah Goldstein
dbf6f30926 [InstCombine] Add folds for (X + Y) - (W + Z)
If `Y` and `Z` are constants, we can simplify to `(X - W) + (Y - Z)`.
If `Y == Z`, we can fold to `X - W`.
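For example, with illustrative constant values for Y and Z:
```
define i8 @src(i8 %x, i8 %w) {
  %a = add i8 %x, 10
  %b = add i8 %w, 3
  %r = sub i8 %a, %b          ; (x + 10) - (w + 3)
  ret i8 %r
}
; => %d = sub i8 %x, %w
;    %r = add i8 %d, 7
```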

Note these transforms exist outside of InstCombine. The purpose of this
commit is primarily to make it so that folds can generate these
simplifiable patterns without having to worry about creating an
infinite loop.
2023-11-20 17:59:26 -06:00
Yingwei Zheng
dfe1d35c62
[InstCombine] Propagate NSW/NUW flags for (X - Y) - Z -> X - (Y + Z) (#72693)
Alive2: https://alive2.llvm.org/ce/z/gqeaVo
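Sketch of the nuw case:
```
define i8 @src(i8 %x, i8 %y, i8 %z) {
  %s = sub nuw i8 %x, %y
  %r = sub nuw i8 %s, %z       ; implies %x >= %y + %z
  ret i8 %r
}
; => %add = add nuw i8 %y, %z
;    %r   = sub nuw i8 %x, %add
```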

Related patch:
31d219d299
2023-11-20 00:02:23 +08:00
Yingwei Zheng
8e516d48fe
[InstCombine] Infer nuw flags for C-(X+C2) -> (C-C2)-X (#72373)
This patch improves https://reviews.llvm.org/D152068 by inferring NUW
flags for sub insts.
It is worth noting that we don't need to check overflow for `C-C2`.
Alive2: https://alive2.llvm.org/ce/z/uutGpS
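Sketch with illustrative constants:
```
define i8 @src(i8 %x) {
  %add = add nuw i8 %x, 10
  %r = sub nuw i8 100, %add    ; implies %x <= 90
  ret i8 %r
}
; => %r = sub nuw i8 90, %x (no overflow check needed for 100 - 10)
```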

This missed optimization is discovered with the help of
https://github.com/AliveToolkit/alive2/pull/962.
2023-11-16 02:35:47 +08:00
Z572
76ba660688
[InstCombine] Follow-up to "When -A + B both have nsw flag, set nsw flag." (#72282)

In 3c037b7306f57039e24a1470687cc39a795584ac, use cast instead of
dyn_cast for a cast that cannot fail.
2023-11-15 00:42:02 +08:00
Z572
3c037b7306
[InstCombine] When -A + B both have nsw flag, set nsw flag. (#72127)
Fixes #72119

https://alive2.llvm.org/ce/z/5f_QuC
2023-11-14 13:48:51 +08:00
Nikita Popov
25af06fd7a [InstCombine] Avoid use of FP cast constant expressions (NFC)
Use the constant folding API instead. As we're working on plain
ConstantFP, this should always succeed.
2023-11-06 15:22:33 +01:00
Dhruv Chawla
be57381a4a
[InstCombine] Create a class to lazily track computed known bits (#66611)
This patch adds a new class "WithCache" which stores a pointer to
any type passable to computeKnownBits along with KnownBits
information which is computed on-demand when getKnownBits()
is called. This allows reusing the known bits information when it is
passed as an argument to multiple functions.

It also changes a few functions to accept WithCache(s) so that
known bits information computed in some callees can be propagated to
others from the top level visitAddSub caller.

This gives a speedup of 0.14%:
https://llvm-compile-time-tracker.com/compare.php?from=499d41cef2e7bbb65804f6a815b9fa8b27efce0f&to=fbea87f1f1e6d5552e2bc309f8e201a3af6d28ec&stat=instructions:u
2023-10-17 21:40:18 +05:30
Nikita Popov
80fa5a6377 [ValueTracking] Use SimplifyQuery in haveNoCommonBitsSet() (NFC)
Pass SimplifyQuery instead of unpacked list of arguments.
2023-10-10 11:39:59 +02:00
Nikita Popov
1b8fb1a664 [InstCombine] Avoid some uses of ConstantExpr::getZExt() (NFC)
Let the IRBuilder constant fold instead.
2023-09-28 15:31:42 +02:00
Nikita Popov
1fc73cacb2 [InstCombine] Propagate nsw flag when negating
When pushing a `sub nsw 0, %x` negation into an expression, try to
preserve the nsw flag for the cases where this is possible. Do this
by passing the flag through recursive `Negator::negate()` calls.

Proofs: https://alive2.llvm.org/ce/z/oRPNcY

Differential Revision: https://reviews.llvm.org/D158510
2023-09-14 09:09:45 +02:00
Christoph Stiller
3af4590506 [InstCombine] Contracting x^2 + 2*x*y + y^2 to (x + y)^2 (float)
Resolves https://github.com/llvm/llvm-project/issues/61296 if https://reviews.llvm.org/D156026 didn't suffice.
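A sketch of the pattern, assuming reassociation-friendly fast-math
flags on every operation:
```
define float @src(float %x, float %y) {
  %xx = fmul reassoc float %x, %x
  %yy = fmul reassoc float %y, %y
  %xy = fmul reassoc float %x, %y
  %xy2 = fmul reassoc float %xy, 2.0
  %t = fadd reassoc float %xx, %xy2
  %r = fadd reassoc float %t, %yy    ; x^2 + 2xy + y^2
  ret float %r
}
; => %s = fadd reassoc float %x, %y
;    %r = fmul reassoc float %s, %s  ; (x + y)^2
```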

Reviewed By: goldstein.w.n

Differential Revision: https://reviews.llvm.org/D158079
2023-09-01 15:02:12 -05:00