llvm-project

Author	SHA1	Message	Date
Max Kazantsev	d02c3b1358	[Test] Add test showing potential conflict b/w AND elimination and IV widening	2022-12-26 14:37:49 +07:00
Max Kazantsev	9a7286b61f	[SCEV] Help getLoopInvariantExitCondDuringFirstIterations deal with complex `umin` exit counts. PR59615 Recent improvements in symbolic exit count computation revealed some problems with SCEV's ability to find an invariant predicate during the first iterations. Ultimately it is based on its ability to prove some facts for the value on the last iteration. This last value, when it includes `umin` as part of the exit count, isn't always simplified enough. The motivating example is the following: https://github.com/llvm/llvm-project/issues/59615 Could not prove: ``` Pred = 36, LHS = (-1 + (-1 * (2147483645 umin (-1 + %var)<nsw>))<nsw> + %var), RHS = %var FoundPred = 36, FoundLHS = {1,+,1}<nuw><nsw><%bb3>, FoundRHS = %var ``` Can prove: ``` Pred = 36, LHS = (-1 + (-1 * (-1 + %var)<nsw>)<nsw> + %var), RHS = %var FoundPred = 36, FoundLHS = {1,+,1}<nuw><nsw><%bb3>, FoundRHS = %var ``` Here `(2147483645 umin (-1 + %var)<nsw>)` is the exit count composed of two parts from two different exits: `2147483645` and `(-1 + %var)<nsw>`. When only the latter exit was analyzable, everything was easily provable for it. Unfortunately, in the general case `umin` in one of `add`'s operands doesn't guarantee that the whole sum reduces, especially in the presence of negative steps and the lack of `nuw`. I don't think there is a generic legal way to work around this `umin`. So the ad-hoc solution is the following: if we failed to find an equivalent predicate that is invariant during the first `MaxIter` iterations, and `MaxIter = umin(a, b, c...)`, try to find a solution for at least one of `a`, `b`, `c`... Because each of them is `uge` `MaxIter`, whatever is true during `a (b, c)` iterations is also true during `MaxIter` iterations. Differential Revision: https://reviews.llvm.org/D140456 Reviewed By: nikic	2022-12-21 18:12:17 +07:00
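The last step of the reasoning in the entry above (a fact that holds during the first `a` iterations also holds during the first `umin(a, b, c...)` iterations) can be illustrated with a small brute-force model. This is only a sketch in Python, not LLVM code: `holds_during_first` and `proven_for_some_operand` are hypothetical helper names, and iterations are modelled as plain integers rather than SCEV expressions.

```python
def holds_during_first(pred, n_iters):
    """True if pred(i) holds for every iteration i in [0, n_iters)."""
    return all(pred(i) for i in range(n_iters))


def proven_for_some_operand(pred, operands):
    """Model of the fallback: try each umin operand on its own.

    Every operand is >= min(operands), so success for any single
    operand implies the fact for the first min(operands) iterations.
    """
    return any(holds_during_first(pred, op) for op in operands)


# Two hypothetical exit counts; MaxIter is their unsigned minimum.
operands = [1000, 7]
max_iter = min(operands)

# "i < 10" fails for the operand 1000 but holds for the operand 7,
# which is enough to conclude it holds for the first max_iter iterations.
fact = lambda i: i < 10
print(proven_for_some_operand(fact, operands))
print(holds_during_first(fact, max_iter))
```

The model only shows why the fallback is sound; which operand SCEV can actually reason about depends on the expressions involved.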
Max Kazantsev	474c8fe9b7	[Test] Precommit test for PR59615	2022-12-21 11:39:05 +07:00
Nikita Popov	3ce360f15b	[IndVarSimplify] Convert more tests to opaque pointers (NFC)	2022-12-14 15:37:58 +01:00
Nikita Popov	8b7b5f9cfe	[IndVarSimplify] Regenerate test checks (NFC)	2022-12-14 15:35:58 +01:00
Nikita Popov	864bb84a42	[IndVarSimplify] Convert tests to opaque pointers (NFC) This leaves lftr.ll alone, because there is a suspicious test diff.	2022-12-13 14:50:13 +01:00
Roman Lebedev	67bbdd05c4	[NFC] Port all IndVarSimplify tests to `-passes=` syntax	2022-12-08 02:38:44 +03:00
Roman Lebedev	6017d9a628	[NFC] Port all IndVarSimplify tests to `-passes=` syntax	2022-12-07 22:22:09 +03:00
Bjorn Pettersson	a11faeed44	[test] Switch to use -passes syntax in various test cases	2022-12-01 21:25:59 +01:00
Max Kazantsev	57fd7ffeff	[IndVarSimplify] Lift limitations on IV being a Phi for turn-to-invariant These limitations are too strict, and their only purpose is to avoid code size explosion. These restrictions seem obsolete, and the size problem is solved in other places through cheap expansion limits. The motivation is that the old code cannot deal with comparisons against an induction variable's increment. Differential Revision: https://reviews.llvm.org/D138412 Reviewed By: lebedev.ri, reames	2022-11-22 12:53:37 +07:00
Max Kazantsev	41e41cc2c4	[Test] Add some test showing limitations of makeIVComparisonInvariant The transform doesn't work if argument isn't immediately a Phi.	2022-11-21 16:20:36 +07:00
Bjorn Pettersson	f15ed06a65	[test][IndVarSimplify] Use -passes syntax in RUN lines. NFC	2022-10-13 10:44:37 +02:00
Max Kazantsev	86d5586d78	[SCEVExpander] Recompute poison-generating flags on hoisting. PR57187 Instruction being hoisted could have nuw/nsw flags inferred from the old context, and we cannot simply move it to the new location keeping them because we are going to introduce new uses to them that didn't exist before. Example in https://github.com/llvm/llvm-project/issues/57187 shows how this can produce branch by poison from initially well-defined program. This patch forcefully recomputes poison-generating flag in the new context. Differential Revision: https://reviews.llvm.org/D132022 Reviewed By: fhahn, nikic	2022-09-13 12:56:35 +07:00
Max Kazantsev	e351e8213b	[Test] Remove addrspace1 ptr to not confuse alive2 addrspace here is not important for the test itself.	2022-08-19 13:25:50 +07:00
David Spickett	8375c3124d	[LLVM][IndvarSimplify] Move test that requires X86 This is failing on our bots that only build Arm/AArch64. https://lab.llvm.org/buildbot/#/builders/171/builds/19033/steps/5/logs/FAIL__LLVM__pr57187_ll	2022-08-17 11:14:08 +00:00
Max Kazantsev	ebabd6bf18	Return "[SCEV] Use context to strengthen flags of BinOps" This reverts commit 354fa0b48008eca701a110badd6974bf449df257. Returning as is. The patch was reverted due to a miscompile, but this patch is not causing it. This patch made it possible to infer some nuw flags in code guarded by a `false` condition, and then someone else managed to propagate the flag from dead code outside. Returning the patch to be able to reproduce the issue.	2022-08-16 14:12:36 +07:00
Max Kazantsev	354fa0b480	Revert "[SCEV] Use context to strengthen flags of BinOps" This reverts commit 34ae308c73e4d76dbdab25a6206d3fbc5ebafdf5. Our internal testing found a miscompile. Not sure if it's caused by this patch or it revealed something else. Reverting while investigating.	2022-08-15 18:51:59 +07:00
Max Kazantsev	34ae308c73	[SCEV] Use context to strengthen flags of BinOps Sometimes SCEV cannot infer nuw/nsw from something as simple as ``` len in [0, MAX_INT] ... iv = phi(0, iv.next) guard(iv <s len) guard(iv <u len) iv.next = iv + 1 ``` just because flag strengthening only relies on definition and does not use local facts. This patch adds support for the simplest case: inference of flags of `add(x, constant)` if we can contextually prove that `x <= max_int - constant`. If it has a negative CT impact, we can add an option to switch it off. I wouldn't expect that though. Differential Revision: https://reviews.llvm.org/D129643 Reviewed By: apilipenko	2022-08-03 14:08:57 +07:00
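The rule described in the entry above, that `add(x, constant)` cannot wrap when `x <= max_int - constant`, can be checked exhaustively at a small bit width. The following is a sketch in Python rather than LLVM code; the 8-bit width and all function names are assumptions made for illustration, and the signed check is restricted to non-negative constants.

```python
BITS = 8  # small width so every case can be enumerated
UMAX = (1 << BITS) - 1
SMAX = (1 << (BITS - 1)) - 1
SMIN = -(1 << (BITS - 1))


def add_is_nuw(x, c):
    """x + c does not wrap when both are treated as unsigned."""
    return x + c <= UMAX


def add_is_nsw(x, c):
    """x + c does not overflow when both are treated as signed."""
    return SMIN <= x + c <= SMAX


def unsigned_rule_holds():
    # nuw is justified exactly when x <= UMAX - c.
    return all(add_is_nuw(x, c) == (x <= UMAX - c)
               for x in range(UMAX + 1) for c in range(UMAX + 1))


def signed_rule_holds():
    # For a non-negative constant, nsw is justified exactly when x <= SMAX - c.
    return all(add_is_nsw(x, c) == (x <= SMAX - c)
               for x in range(SMIN, SMAX + 1) for c in range(SMAX + 1))


def guarded_increment_never_wraps():
    # The loop from the commit message: len in [0, SMAX] and 0 <= iv < len,
    # so iv <= SMAX - 1 and iv + 1 deserves both flags.
    return all(add_is_nuw(iv, 1) and add_is_nsw(iv, 1)
               for length in range(SMAX + 1) for iv in range(length))
```

All three checks return `True` at this width, which matches the intuition that the guards supply exactly the bound the flag inference needs.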
Nuno Lopes	9df0b254d2	[NFC] Switch a few uses of undef to poison as placeholders for unreachable code	2022-07-23 21:50:11 +01:00
Nikita Popov	af49bed933	[IndVars] Simplify instructions after replacing header phi with preheader value After replacing a loop phi with the preheader value, it's usually possible to simplify some of the using instructions, so do that as part of replaceLoopPHINodesWithPreheaderValues(). Doing this as part of IndVars is valuable, because it may make GEPs in the loop have constant offsets and allow the following SROA run to succeed (as demonstrated in the PhaseOrdering test). Differential Revision: https://reviews.llvm.org/D129293	2022-07-13 10:27:04 +02:00
Nikita Popov	a5ee62a141	[IndVars] Call replaceLoopPHINodesWithPreheaderValues() for already constant exits Currently we only call replaceLoopPHINodesWithPreheaderValues() if optimizeLoopExits() replaces the exit with an unconditional exit. However, it is very common that this already happens as part of eliminateIVComparison(), in which case we're leaving behind the dead header phi. Tweak the early bailout for already-constant exits to also call replaceLoopPHINodesWithPreheaderValues(). Differential Revision: https://reviews.llvm.org/D129214	2022-07-13 09:43:21 +02:00
Nikita Popov	d1e880acaa	[SCEV] Enable verification in LoopPM Currently, we hardly ever actually run SCEV verification, even in tests with -verify-scev. This is because the NewPM LPM does not verify SCEV. The reason for this is that SCEV verification can actually change the result of subsequent SCEV queries, which means that you see different transformations depending on whether verification is enabled or not. To allow verification in the LPM, this limits verification to BECounts that have actually been cached. It will not calculate new BECounts. BackedgeTakenInfo::getExact() is still not entirely readonly, it still calls getUMinFromMismatchedTypes(). But I hope that this is not problematic in the same way. (This could be avoided by performing the umin in the other SCEV instance, but this would require duplicating some of the code.) Differential Revision: https://reviews.llvm.org/D120551	2022-03-07 09:46:20 +01:00
Nikita Popov	aeab6167b0	[SCEV] Only verify BECounts for reachable loops (PR50523) For unreachable loops, any BECount is legal, and since D98706 SCEV can make use of this for loops that are unreachable due to constant branches. To avoid false positives, adjust SCEV verification to only check BECounts in reachable loops. Fixes https://github.com/llvm/llvm-project/issues/50523. Differential Revision: https://reviews.llvm.org/D120651	2022-03-01 11:52:35 +01:00
Nikita Popov	859567725d	[IndVars] Don't run full optimization pipeline in test (NFC) This extracts the IR prior to IndVarSimplify and only runs the single pass.	2022-02-17 09:28:33 +01:00
Dmitry Makogon	62f86d4f95	Reapply 5ec2386 "Reapply db28934 "[IndVars] Pass TTI to replaceCongruentIVs"" This reverts commit 7cd273c339cfe8427404f881ae280bd9fae6ff78. Several patches with tests fixes have been applied: 0cada82f0a30e5ae22dce66b58604ab9b47a3897 "[Test] Remove incorrect test in GVN" 97cb13615d6d9df254e3c0f3deef9eaedfe189b6 "[Test] Separate IndVars test into AArch64 and X86 parts" 985cc490f17d28b20392ee214895d947b85120ef "[Test] Remove separated test in IndVars", and test failures caused by 5ec2386 should be resolved now.	2021-11-10 17:36:14 +07:00
Dmitry Makogon	97cb13615d	[Test] Separate IndVars test into AArch64 and X86 parts The widen-loop-comp.ll in indvars has a target triple with specified aarch64 architecture. This caused test failures with db28934 "[IndVars] Pass TTI to replaceCongruentIVs" applied, because with the patch indvars performed some target-specific transforms, and for example if a build supported only X86, then indvars would not have applied those transforms. However, the checks in the test were generated as for aarch64. Thus the test failures on such builds. This patch separates widen-loop-comp.ll into two parts. The first one is intended to be run only if a build supports aarch64. This is now in AArch64 directory with a lit config. The second one was added recently to show db28934 improvements. This one is now in X86 directory. This patch should resolve build issues caused by 5ec23863320ca12bfabb6dcff1d0425cb614b7a5.	2021-11-10 16:15:20 +07:00
Douglas Yung	7cd273c339	Revert "Reapply db28934 "[IndVars] Pass TTI to replaceCongruentIVs"" This reverts commit 5ec23863320ca12bfabb6dcff1d0425cb614b7a5. This change is causing test failures on the PS4 linux build bot: https://lab.llvm.org/buildbot/#/builders/139/builds/12871	2021-11-09 10:28:41 -08:00
Dmitry Makogon	5ec2386332	Reapply db28934 "[IndVars] Pass TTI to replaceCongruentIVs" This reapplies patch db289340c841990055a164e8eb2a3b5ff25677bf. The test failures on build with expensive checks caused by the patch happened due to the fact that we sorted loop Phis in replaceCongruentIVs using llvm::sort, which shuffles the given container if the expensive checks are enabled, so equivalent Phis in the sorted vector had different mutual order from run to run. replaceCongruentIVs tries to replace narrow Phis with truncations of wide ones. In some test cases there were several Phis with the same width, so if their order differs from run to run, the narrow Phis would be replaced with a different Phi, depending on the shuffling result. The patch ae14fae0ff4304022beda5ab484f84ac0fdda807 fixed this issue by replacing llvm::sort with llvm::stable_sort.	2021-11-09 17:42:29 +07:00
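The nondeterminism described in the entry above can be modelled outside LLVM. The sketch below is Python, not the actual `replaceCongruentIVs` code: Phis are hypothetical `(name, width)` pairs, `pick_replacement` is an invented helper, and shuffling before a stable sort stands in for the behaviour the commit message attributes to `llvm::sort` under expensive checks.

```python
import random


def pick_replacement(phis, shuffle_first, rng=None):
    """Sort Phis widest-first and return the name of the first one.

    With shuffle_first=True, equal-width Phis can end up in any relative
    order, so the chosen Phi may differ from run to run.
    """
    items = list(phis)
    if shuffle_first:
        (rng or random).shuffle(items)
    # Python's sort is stable, also with reverse=True, so equal-width
    # entries keep whatever relative order they had before the sort.
    items.sort(key=lambda phi: phi[1], reverse=True)
    return items[0][0]


# Two Phis of the same width plus a narrower one (all names are made up).
phis = [("iv.a", 64), ("iv.b", 64), ("iv.narrow", 32)]

# Stable order: the first 64-bit Phi in the input always wins.
print(pick_replacement(phis, shuffle_first=False))

# Shuffled order: either 64-bit Phi may win, depending on the shuffle.
print({pick_replacement(phis, True, random.Random(seed)) for seed in range(20)})
```

Only the stable variant gives a result that depends solely on the input order, which is the property the `llvm::stable_sort` fix relies on.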
Dmitry Makogon	8d4eba6c0d	Revert "[IndVars] Pass TTI to replaceCongruentIVs" This reverts commit db289340c841990055a164e8eb2a3b5ff25677bf. The patch caused 2 crashes with expensive checks enabled.	2021-11-08 19:35:14 +07:00
Dmitry Makogon	db289340c8	[IndVars] Pass TTI to replaceCongruentIVs In IndVarSimplify after simplifying and extending loop IVs we call 'replaceCongruentIVs'. This function optionally takes a TTI argument to be able to replace narrow IVs uses with truncates of the widest one. For some reason the TTI wasn't passed to the function, so it couldn't perform such transform. This patch fixes it. Reviewed By: mkazantsev Differential Revision: https://reviews.llvm.org/D113024	2021-11-08 19:20:53 +07:00
Philip Reames	e69f6476a8	Autogen tests for ease of future update	2021-11-05 12:46:07 -07:00
Dmitry Makogon	dd000e67f0	[Test] Regenerate IndVars test's checks This just regenerates a certain IndVars test's checks.	2021-11-02 22:03:58 +07:00
Simon Pilgrim	c931d35216	[CostModel][X86] Increase i64 mul cost from 1 to 2 Only the most recent cpus support really 1cy 64-bit multiplies, and the X64 cost table represents a realistic worst case. The 1cy value was also discouraging vectorization when most vXi64 PMULDQ expansions aren't actually slower than scalarization. Noticed while investigating PR51436.	2021-09-23 14:48:21 +01:00
Simon Pilgrim	6de42e104f	[IndVarSimplify][X86] Regenerate loop-invariant-conditions.ll test checks	2021-07-07 13:58:28 +01:00
serge-sans-paille	4ab3041acb	Revert "[NFC] remove explicit default value for strboolattr attribute in tests" This reverts commit bda6e5bee04c75b1f1332b4fd1ac4e8ef6c3c247. See https://lab.llvm.org/buildbot/#/builders/109/builds/15424 for instance	2021-05-24 19:43:40 +02:00
serge-sans-paille	bda6e5bee0	[NFC] remove explicit default value for strboolattr attribute in tests Since d6de1e1a71406c75a4ea4d5a2fe84289f07ea3a1, no attribute is equivalent to setting attribute to false. This is a preliminary commit for https://reviews.llvm.org/D99080	2021-05-24 19:31:04 +02:00
Roman Lebedev	a26f1bf67e	[PassManager] Run additional LICM before LoopRotate

Loop rotation often has to perform code duplication from header into preheader, which introduces PHI nodes.

>>! In D99204, @thopre wrote:
>
> With loop peeling, it is important that unnecessary PHIs be avoided or
> it will leads to spurious peeling. One source of such PHIs is loop
> rotation which creates PHIs for invariant loads. Those PHIs are
> particularly problematic since loop peeling is now run as part of simple
> loop unrolling before GVN is run, and are thus a source of spurious
> peeling.
>
> Note that while some of the load can be hoisted and eventually
> eliminated by instruction combine, this is not always possible due to
> alignment issue. In particular, the motivating example [1] was a load
> inside a class instance which cannot be hoisted because the `this'
> pointer has an alignment of 1.
>
> [1] http://lists.llvm.org/pipermail/llvm-dev/attachments/20210312/4ce73c47/attachment.cpp

Now, we could enhance LoopRotate to avoid duplicating code when not needed, but instead hoist loop-invariant code, but isn't that a code duplication? (sic) We have LICM, and in fact we already run it right after LoopRotation.

We could try to move it to before LoopRotation, that is basically free from compile-time perspective: https://llvm-compile-time-tracker.com/compare.php?from=6c93eb4477d88af046b915bc955c03693b2cbb58&to=a4bee6d07732b1184c436da489040b912f0dc271&stat=instructions

But, looking at stats, i think it isn't great that we would no longer do LICM after LoopRotation, in particular:

| statistic name | LoopRotate-LICM | LICM-LoopRotate | Δ | % | abs(%) |
|---|---|---|---|---|---|
| asm-printer.EmittedInsts | 9015930 | 9015799 | -131 | 0.00% | 0.00% |
| indvars.NumElimCmp | 3536 | 3544 | 8 | 0.23% | 0.23% |
| indvars.NumElimExt | 36725 | 36580 | -145 | -0.39% | 0.39% |
| indvars.NumElimIV | 1197 | 1187 | -10 | -0.84% | 0.84% |
| indvars.NumElimIdentity | 143 | 136 | -7 | -4.90% | 4.90% |
| indvars.NumElimRem | 4 | 5 | 1 | 25.00% | 25.00% |
| indvars.NumLFTR | 29842 | 29890 | 48 | 0.16% | 0.16% |
| indvars.NumReplaced | 2293 | 2227 | -66 | -2.88% | 2.88% |
| indvars.NumSimplifiedSDiv | 6 | 8 | 2 | 33.33% | 33.33% |
| indvars.NumWidened | 26438 | 26329 | -109 | -0.41% | 0.41% |
| instcount.TotalBlocks | 1178338 | 1173840 | -4498 | -0.38% | 0.38% |
| instcount.TotalFuncs | 111825 | 111829 | 4 | 0.00% | 0.00% |
| instcount.TotalInsts | 9905442 | 9896139 | -9303 | -0.09% | 0.09% |
| lcssa.NumLCSSA | 425871 | 423961 | -1910 | -0.45% | 0.45% |
| licm.NumHoisted | 378357 | 378753 | 396 | 0.10% | 0.10% |
| licm.NumMovedCalls | 2193 | 2208 | 15 | 0.68% | 0.68% |
| licm.NumMovedLoads | 35899 | 31821 | -4078 | -11.36% | 11.36% |
| licm.NumPromoted | 11178 | 11154 | -24 | -0.21% | 0.21% |
| licm.NumSunk | 13359 | 13587 | 228 | 1.71% | 1.71% |
| loop-delete.NumDeleted | 8547 | 8402 | -145 | -1.70% | 1.70% |
| loop-instsimplify.NumSimplified | 12876 | 11890 | -986 | -7.66% | 7.66% |
| loop-peel.NumPeeled | 1008 | 925 | -83 | -8.23% | 8.23% |
| loop-rotate.NumNotRotatedDueToHeaderSize | 368 | 365 | -3 | -0.82% | 0.82% |
| loop-rotate.NumRotated | 42015 | 42003 | -12 | -0.03% | 0.03% |
| loop-simplifycfg.NumLoopBlocksDeleted | 240 | 242 | 2 | 0.83% | 0.83% |
| loop-simplifycfg.NumLoopExitsDeleted | 497 | 20 | -477 | -95.98% | 95.98% |
| loop-simplifycfg.NumTerminatorsFolded | 618 | 336 | -282 | -45.63% | 45.63% |
| loop-unroll.NumCompletelyUnrolled | 11028 | 11032 | 4 | 0.04% | 0.04% |
| loop-unroll.NumUnrolled | 12608 | 12529 | -79 | -0.63% | 0.63% |
| mem2reg.NumDeadAlloca | 10222 | 10221 | -1 | -0.01% | 0.01% |
| mem2reg.NumPHIInsert | 192110 | 192106 | -4 | 0.00% | 0.00% |
| mem2reg.NumSingleStore | 637650 | 637643 | -7 | 0.00% | 0.00% |
| scalar-evolution.NumBruteForceTripCountsComputed | 814 | 812 | -2 | -0.25% | 0.25% |
| scalar-evolution.NumTripCountsComputed | 283108 | 282934 | -174 | -0.06% | 0.06% |
| scalar-evolution.NumTripCountsNotComputed | 106712 | 106718 | 6 | 0.01% | 0.01% |
| simple-loop-unswitch.NumBranches | 5178 | 4752 | -426 | -8.23% | 8.23% |
| simple-loop-unswitch.NumCostMultiplierSkipped | 914 | 503 | -411 | -44.97% | 44.97% |
| simple-loop-unswitch.NumSwitches | 20 | 18 | -2 | -10.00% | 10.00% |
| simple-loop-unswitch.NumTrivial | 183 | 95 | -88 | -48.09% | 48.09% |

... but that actually regresses LICM (-12% `licm.NumMovedLoads`), loop-simplifycfg (`NumLoopExitsDeleted`, `NumTerminatorsFolded`), simple-loop-unswitch (`NumTrivial`).

What if we instead have LICM both before and after LoopRotate?

| statistic name | LoopRotate-LICM | LICM-LoopRotate-LICM | Δ | % | abs(%) |
|---|---|---|---|---|---|
| asm-printer.EmittedInsts | 9015930 | 9014474 | -1456 | -0.02% | 0.02% |
| indvars.NumElimCmp | 3536 | 3546 | 10 | 0.28% | 0.28% |
| indvars.NumElimExt | 36725 | 36681 | -44 | -0.12% | 0.12% |
| indvars.NumElimIV | 1197 | 1185 | -12 | -1.00% | 1.00% |
| indvars.NumElimIdentity | 143 | 146 | 3 | 2.10% | 2.10% |
| indvars.NumElimRem | 4 | 5 | 1 | 25.00% | 25.00% |
| indvars.NumLFTR | 29842 | 29899 | 57 | 0.19% | 0.19% |
| indvars.NumReplaced | 2293 | 2299 | 6 | 0.26% | 0.26% |
| indvars.NumSimplifiedSDiv | 6 | 8 | 2 | 33.33% | 33.33% |
| indvars.NumWidened | 26438 | 26404 | -34 | -0.13% | 0.13% |
| instcount.TotalBlocks | 1178338 | 1173652 | -4686 | -0.40% | 0.40% |
| instcount.TotalFuncs | 111825 | 111829 | 4 | 0.00% | 0.00% |
| instcount.TotalInsts | 9905442 | 9895452 | -9990 | -0.10% | 0.10% |
| lcssa.NumLCSSA | 425871 | 425373 | -498 | -0.12% | 0.12% |
| licm.NumHoisted | 378357 | 383352 | 4995 | 1.32% | 1.32% |
| licm.NumMovedCalls | 2193 | 2204 | 11 | 0.50% | 0.50% |
| licm.NumMovedLoads | 35899 | 35755 | -144 | -0.40% | 0.40% |
| licm.NumPromoted | 11178 | 11163 | -15 | -0.13% | 0.13% |
| licm.NumSunk | 13359 | 14321 | 962 | 7.20% | 7.20% |
| loop-delete.NumDeleted | 8547 | 8538 | -9 | -0.11% | 0.11% |
| loop-instsimplify.NumSimplified | 12876 | 12041 | -835 | -6.48% | 6.48% |
| loop-peel.NumPeeled | 1008 | 924 | -84 | -8.33% | 8.33% |
| loop-rotate.NumNotRotatedDueToHeaderSize | 368 | 365 | -3 | -0.82% | 0.82% |
| loop-rotate.NumRotated | 42015 | 42005 | -10 | -0.02% | 0.02% |
| loop-simplifycfg.NumLoopBlocksDeleted | 240 | 241 | 1 | 0.42% | 0.42% |
| loop-simplifycfg.NumTerminatorsFolded | 618 | 619 | 1 | 0.16% | 0.16% |
| loop-unroll.NumCompletelyUnrolled | 11028 | 11029 | 1 | 0.01% | 0.01% |
| loop-unroll.NumUnrolled | 12608 | 12525 | -83 | -0.66% | 0.66% |
| mem2reg.NumPHIInsert | 192110 | 192073 | -37 | -0.02% | 0.02% |
| mem2reg.NumSingleStore | 637650 | 637652 | 2 | 0.00% | 0.00% |
| scalar-evolution.NumTripCountsComputed | 283108 | 282998 | -110 | -0.04% | 0.04% |
| scalar-evolution.NumTripCountsNotComputed | 106712 | 106691 | -21 | -0.02% | 0.02% |
| simple-loop-unswitch.NumBranches | 5178 | 5185 | 7 | 0.14% | 0.14% |
| simple-loop-unswitch.NumCostMultiplierSkipped | 914 | 925 | 11 | 1.20% | 1.20% |
| simple-loop-unswitch.NumTrivial | 183 | 179 | -4 | -2.19% | 2.19% |
| simple-loop-unswitch.NumBranches | 5178 | 4752 | -426 | -8.23% | 8.23% |
| simple-loop-unswitch.NumCostMultiplierSkipped | 914 | 503 | -411 | -44.97% | 44.97% |
| simple-loop-unswitch.NumSwitches | 20 | 18 | -2 | -10.00% | 10.00% |
| simple-loop-unswitch.NumTrivial | 183 | 95 | -88 | -48.09% | 48.09% |

I.e. we end up with less instructions, less peeling, more LICM activity, also note how none of those 4 regressions are here. Namely:

| statistic name | LICM-LoopRotate | LICM-LoopRotate-LICM | Δ | % | abs(%) |
|---|---|---|---|---|---|
| asm-printer.EmittedInsts | 9015799 | 9014474 | -1325 | -0.01% | 0.01% |
| indvars.NumElimCmp | 3544 | 3546 | 2 | 0.06% | 0.06% |
| indvars.NumElimExt | 36580 | 36681 | 101 | 0.28% | 0.28% |
| indvars.NumElimIV | 1187 | 1185 | -2 | -0.17% | 0.17% |
| indvars.NumElimIdentity | 136 | 146 | 10 | 7.35% | 7.35% |
| indvars.NumLFTR | 29890 | 29899 | 9 | 0.03% | 0.03% |
| indvars.NumReplaced | 2227 | 2299 | 72 | 3.23% | 3.23% |
| indvars.NumWidened | 26329 | 26404 | 75 | 0.28% | 0.28% |
| instcount.TotalBlocks | 1173840 | 1173652 | -188 | -0.02% | 0.02% |
| instcount.TotalInsts | 9896139 | 9895452 | -687 | -0.01% | 0.01% |
| lcssa.NumLCSSA | 423961 | 425373 | 1412 | 0.33% | 0.33% |
| licm.NumHoisted | 378753 | 383352 | 4599 | 1.21% | 1.21% |
| licm.NumMovedCalls | 2208 | 2204 | -4 | -0.18% | 0.18% |
| licm.NumMovedLoads | 31821 | 35755 | 3934 | 12.36% | 12.36% |
| licm.NumPromoted | 11154 | 11163 | 9 | 0.08% | 0.08% |
| licm.NumSunk | 13587 | 14321 | 734 | 5.40% | 5.40% |
| loop-delete.NumDeleted | 8402 | 8538 | 136 | 1.62% | 1.62% |
| loop-instsimplify.NumSimplified | 11890 | 12041 | 151 | 1.27% | 1.27% |
| loop-peel.NumPeeled | 925 | 924 | -1 | -0.11% | 0.11% |
| loop-rotate.NumRotated | 42003 | 42005 | 2 | 0.00% | 0.00% |
| loop-simplifycfg.NumLoopBlocksDeleted | 242 | 241 | -1 | -0.41% | 0.41% |
| loop-simplifycfg.NumLoopExitsDeleted | 20 | 497 | 477 | 2385.00% | 2385.00% |
| loop-simplifycfg.NumTerminatorsFolded | 336 | 619 | 283 | 84.23% | 84.23% |
| loop-unroll.NumCompletelyUnrolled | 11032 | 11029 | -3 | -0.03% | 0.03% |
| loop-unroll.NumUnrolled | 12529 | 12525 | -4 | -0.03% | 0.03% |
| mem2reg.NumDeadAlloca | 10221 | 10222 | 1 | 0.01% | 0.01% |
| mem2reg.NumPHIInsert | 192106 | 192073 | -33 | -0.02% | 0.02% |
| mem2reg.NumSingleStore | 637643 | 637652 | 9 | 0.00% | 0.00% |
| scalar-evolution.NumBruteForceTripCountsComputed | 812 | 814 | 2 | 0.25% | 0.25% |
| scalar-evolution.NumTripCountsComputed | 282934 | 282998 | 64 | 0.02% | 0.02% |
| scalar-evolution.NumTripCountsNotComputed | 106718 | 106691 | -27 | -0.03% | 0.03% |
| simple-loop-unswitch.NumBranches | 4752 | 5185 | 433 | 9.11% | 9.11% |
| simple-loop-unswitch.NumCostMultiplierSkipped | 503 | 925 | 422 | 83.90% | 83.90% |
| simple-loop-unswitch.NumSwitches | 18 | 20 | 2 | 11.11% | 11.11% |
| simple-loop-unswitch.NumTrivial | 95 | 179 | 84 | 88.42% | 88.42% |

{F15983613} {F15983615} {F15983616} (this is vanilla llvm testsuite + rawspeed + darktable)

As an example of the code where early LICM only is bad, see: https://godbolt.org/z/GzEbacs4K

This does have an observable compile-time regression of +~0.5% geomean https://llvm-compile-time-tracker.com/compare.php?from=7c5222e4d1a3a14f029e5f614c9aefd0fa505f1e&to=5d81826c3411982ca26e46b9d0aff34c80577664&stat=instructions but i think that's basically nothing, and there's potential that it might be avoidable in the future by fixing clang to produce alignment information on function arguments, thus making the second run unneeded.

Differential Revision: https://reviews.llvm.org/D99249	2021-04-02 11:11:42 +03:00
Max Kazantsev	16370e02a7	[IndVars] Provide eliminateIVComparison with context We can prove more predicates when we have a context when eliminating ICmp. As first (and very obvious) approximation we can use the ICmp instruction itself, though in the future we are going to use a common dominator of all its users. Need some refactoring before that. Observed ~0.5% negative compile time impact. Differential Revision: https://reviews.llvm.org/D98697 Reviewed By: lebedev.ri	2021-03-19 12:28:22 +07:00
Max Kazantsev	fff1363ba0	[SCEV] Add false->any implication By definition of Implication operator, `false -> true` and `false -> false`. It means that `false` implies any predicate, no matter true or false. We don't need to go any further trying to prove the statement we need and just always say that `false` implies it in this case. In practice it means that we are trying to prove something guarded by `false` condition, which means that this code is unreachable, and we can safely prove any fact or perform any transform in this code. Differential Revision: https://reviews.llvm.org/D98706 Reviewed By: lebedev.ri	2021-03-19 11:29:48 +07:00
Philip Reames	239a618180	[instcombine] Collapse trivial and recurrences If we have a recurrence of the form <Start, And, Step> we know that the value taken by the recurrence stabilizes on the first iteration (provided step is loop invariant). We can exploit that fact to remove the loop carried dependence in the recurrence. Differential Revision: https://reviews.llvm.org/D97578 (and part)	2021-03-08 09:21:38 -08:00
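The stabilization claim in the entry above follows from `(x & s) & s == x & s`. A minimal Python sketch of the recurrence is given below; it is an illustration and not the InstCombine implementation, and `and_recurrence` is a hypothetical helper that assumes a loop-invariant integer step.

```python
def and_recurrence(start, step, n_iters):
    """Values taken by x = phi(start, x & step) over n_iters iterations."""
    values = [start]
    for _ in range(n_iters):
        values.append(values[-1] & step)
    return values


def stabilizes_after_first_iteration(start, step, n_iters=4):
    """Every value after the initial one equals start & step."""
    return all(v == (start & step)
               for v in and_recurrence(start, step, n_iters)[1:])


print(and_recurrence(0b1111, 0b1010, 3))  # [15, 10, 10, 10]
```

Because every value after the first equals `start & step`, the loop-carried dependence carries no information beyond that single `and`.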
Roman Lebedev	b46c085d2b	[NFCI] SCEVExpander: emit intrinsics for integral {u,s}{min,max} SCEV expressions These intrinsics, not the icmp+select are the canonical form nowadays, so we might as well directly emit them. This should not cause any regressions, but if it does, then they would need to be fixed regardless. Note that this doesn't deal with `SCEVExpander::isHighCostExpansion()`, but that is a pessimization, not a correctness issue. Additionally, the non-intrinsic form has issues with undef, see https://reviews.llvm.org/D88287#2587863	2021-03-06 21:52:46 +03:00
Philip Reames	ef51eed37b	[LoopDeletion] Handle inner loops w/untaken backedges This builds on the restricted after initial revert form of D93906, and adds back support for breaking backedges of inner loops. It turns out the original invalidation logic wasn't quite right, specifically around the handling of LCSSA. When breaking the backedge of an inner loop, we can cause blocks which were in the outer loop only because they were also included in a sub-loop to be removed from both loops. This results in the exit block set for our original parent loop changing, and thus a need for new LCSSA phi nodes. This case happens when the inner loop has an exit block which is also an exit block of the parent, and there's a block in the child which reaches an exit to said block without also reaching an exit to the parent loop. (I'm describing this in terms of the immediate parent, but the problem is general for any transitive parent in the nest.) The approach implemented here involves a potentially expensive LCSSA rebuild. Perf testing during review didn't show anything concerning, but we may end up needing to revert this if anyone encounters a practical compile time issue. Differential Revision: https://reviews.llvm.org/D94378	2021-01-22 16:31:29 -08:00
Philip Reames	7c63aac7bd	Revert "[LoopDeletion] Break backedge of loops when known not taken" This reverts commit dd6bb367d19e3bf18353e40de54d35480999a930. Multi-stage builders are showing an assertion failure w/LCSSA not being preserved on entry to IndVars. Reason isn't clear, reverting while investigating.	2021-01-04 09:50:47 -08:00
Philip Reames	dd6bb367d1	[LoopDeletion] Break backedge of loops when known not taken The basic idea is that if SCEV can prove the backedge isn't taken, we can go ahead and get rid of the backedge (and thus the loop) while leaving the rest of the control in place. This nicely handles cases with dispatch between multiple exits and internal side effects. Differential Revision: https://reviews.llvm.org/D93906	2021-01-04 09:19:29 -08:00
Arthur Eubanks	85af1d6257	[test] Fix pr45360.ll under NPM The IR is the same under the NPM, but some basic block labels and value names are different.	2020-12-28 14:42:52 -08:00
Nikita Popov	b218407512	[ValueTracking] Handle more non-trivial conditions in isKnownNonZero() In 35676a4f9a536a2aab768af63ddbb15bc722d7f9 I've added handling for non-trivial dominating conditions that imply non-zero on the true branch. This adds the same support for the false branch. The changes in pr45360.ll change block ordering and naming, but don't change the control flow. The urem is still guarded by a non-zero check correctly.	2020-12-26 15:48:04 +01:00
Nikita Popov	9ace4b337f	Revert "[SCEV] Factor out part of wrap flag detection logic [NFC-ish]" This reverts commit 1ec6e1eb8a084bffae8a40236eb9925d8026dd07. This change causes a significant compile-time regression: https://llvm-compile-time-tracker.com/compare.php?from=dd0b8b94d0796bd895cc998dd163b4fbebceb0b8&to=1ec6e1eb8a084bffae8a40236eb9925d8026dd07&stat=instructions I assume that this is due to the non-NFC part of the change, which now performs expensive nowrap inference even for nowrap flags that are not used by the particular code.	2020-11-15 10:19:44 +01:00
Philip Reames	1ec6e1eb8a	[SCEV] Factor out part of wrap flag detection logic [NFC-ish] In an effort to make code around flag determination more readable, and (possibly) prepare for a follow up change, factor out some of the flag detection logic. In the process, reduce the number of locations we mutate wrap flags by a couple. Note that this isn't NFC. The old code tried for NSW xor (NUW \|\| NW). That is, two different paths computed different sets of wrap flags. The new code will try for all three. The result is that some expressions end up with a few extra flags set.	2020-11-14 19:21:05 -08:00
Nikita Popov	0dda633317	[SCEV] Strength nowrap flags after constant folding We should first try to constant fold the add expression and only strengthen nowrap flags afterwards. This allows us to determine stronger flags if e.g. only two operands are left after constant folding (and thus "guaranteed no wrap region" code applies) or the resulting operands are non-negative and thus nsw->nuw strengthening applies.	2020-10-25 18:00:22 +01:00
Nikita Popov	c5718253c9	[IndVars] Regenerate test checks (NFC) Also run the test case through -instnamer.	2020-10-25 17:45:12 +01:00

60 Commits