llvm-project

Author	SHA1	Message	Date
Wenju He	0bda12b5bc	[NewPM] Add OptimizerEarly module extension point VectorizerStart extension is module callback in old PM, but is function callback in new PM. We lack a module extension point between end of buildModuleSimplificationPipeline and the function optimization (including vectorizer) pipeline. So this patch adds a new module extension point before the function optimization pipeline. Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D122296	2022-03-31 08:22:27 -07:00
Arthur Eubanks	9bd66b312c	[PassManager][Coroutine] Run passes under -O0 conditionally and run GlobalDCE CoroSplit lowers various coroutine intrinsics. It's a CGSCC pass and CGSCC passes don't run on unreachable functions. Normally GlobalDCE will come along and delete unreachable functions, but we don't run GlobalDCE under -O0, so an unreachable function with coroutine intrinsics may never have CoroSplit run on it. This patch adds GlobalDCE when coroutines intrinsics are present. It also now runs all coroutine passes conditional when coroutine intrinsics are present. This should also solve the -O0 regression reported in D105877 due to LazyCallGraph construction. Fixes https://github.com/llvm/llvm-project/issues/54117 Reviewed By: ChuanqiXu Differential Revision: https://reviews.llvm.org/D122275	2022-03-23 11:03:26 -07:00
Arthur Eubanks	4fc7c55fff	[NewPM] Actually recompute GlobalsAA before module optimization pipeline RequireAnalysis<GlobalsAA> doesn't actually recompute GlobalsAA. GlobalsAA isn't invalidated (unless specifically invalidated) because it's self-updating via ValueHandles, but can be imprecise during the self-updates. Rather than invalidating GlobalsAA, which would invalidate AAManager and any analyses that use AAManager, create a new pass that recomputes GlobalsAA. Fixes #53131. Differential Revision: https://reviews.llvm.org/D121167	2022-03-14 09:42:34 -07:00
Elia Geretto	942efa5927	[NewPM] Add extension points to LTO pipeline in PassBuilder This PR adds two extension points to the default LTO pipeline in PassBuilder, one at the beginning and one at the end. These two extension points already existed in the old pass manager, the aim is to replicate the same functionality in the new one. Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D120491	2022-02-25 14:48:54 -08:00
Arthur Eubanks	18da681034	[NFC] Remove unnecessary function pass managers	2022-02-25 10:03:17 -08:00
William S. Moses	d9da6a535f	[LICM][PhaseOrder] Don't speculate in LICM until after running loop rotate LICM will speculatively hoist code outside of loops. This requires removing information, like alias analysis (https://github.com/llvm/llvm-project/issues/53794), range information (https://bugs.llvm.org/show_bug.cgi?id=50550), among others. Prior to https://reviews.llvm.org/D99249 , LICM would only be run after LoopRotate. Running Loop Rotate prior to LICM prevents a instruction hoist from being speculative, if it was conditionally executed by the iteration (as is commonly emitted by clang and other frontends). Adding the additional LICM pass first, however, forces all of these instructions to be considered speculative, even if they are not speculative after LoopRotate. This destroys information, resulting in performance losses for discarding this additional information. This PR modifies LICM to accept a ``speculative'' parameter which allows LICM to be set to perform information-loss speculative hoists or not. Phase ordering is then modified to not perform the information-losing speculative hoists until after loop rotate is performed, preserving this additional information. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D119965	2022-02-17 20:13:07 -05:00
Roman Lebedev	371fcb720e	[SimplifyCFG][PhaseOrdering] Defer lowering switch into an integer range comparison and branch until after at least the IPSCCP That transformation is lossy, as discussed in https://github.com/llvm/llvm-project/issues/53853 and https://github.com/rust-lang/rust/issues/85133#issuecomment-904185574 This is an alternative to D119839, which would add a limited IPSCCP into SimplifyCFG. Unlike lowering switch to lookup, we still want this transformation to happen relatively early, but after giving a chance for the things like CVP to do their thing. It seems like deferring it just until the IPSCCP is enough for the tests at hand, but perhaps we need to be more aggressive and disable it until CVP. Fixes https://github.com/llvm/llvm-project/issues/53853 Refs. https://github.com/rust-lang/rust/issues/85133 Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D119854	2022-02-17 12:13:55 +03:00
Joseph Huber	9d3a47576c	[PassBuilder] Add OpenMPOpt to default LTO pipeline The LTO support for OpenMP offloading allows us to run the OpenMPOpt pass during the LTO pipeline. This patch introduces an early run of the Module pass and a late run of the CGSCC pass. These are quick no-ops if there is no OpenMP in the module. Depends on D118198 Differential Revision: https://reviews.llvm.org/D118611	2022-01-31 23:11:43 -05:00
Sjoerd Meijer	f269ec230e	[LoopFlatten] Move it from LPM2 to LPM1 In D110057 we moved LoopFlatten to a LoopPassManager. This caused a performance regression for our 64-bit targets (the 32-bit were unaffected), the pass is no longer triggering for a motivating example. The reason is that the IR is just very different than expected; we try to match loop statements and particular uses of induction variables. The easiest is to just move LoopFlatten to a place in the pipeline where the IR is as expected, which is just before IndVarSimplify. This means we move it from LPM2 to LPM1, so that it actually runs just a bit earlier from where it was running before. IndVarSimplify is responsible for significant rewrites that are difficult to "look through" in LoopFlatten. Differential Revision: https://reviews.llvm.org/D116612	2022-01-19 14:38:05 +00:00
Sjoerd Meijer	016022e5da	Recommit "[LoopFlatten] Move it to a LoopPassManager" This was reverted because of a performance regression, which is fixed by D116612 that I will commit directly after this change. This reverts commit e92d63b467e13227408f9726fbd66d07cc58811c.	2022-01-19 14:38:05 +00:00
David Green	e92d63b467	Revert "[LoopFlatten] Move it to a LoopPassManager" This commit caused performance regressions due to differences in the expected code during loop flattening. Reverting it until the fix is ready, which hopefully wont take too long. This reverts commit 86825fc2fb363b807569327880c05e4b0b5393ec.	2022-01-10 11:03:49 +00:00
Sjoerd Meijer	86825fc2fb	[LoopFlatten] Move it to a LoopPassManager In D109958 it was noticed that we could optimise the pipeline and avoid rerunning LoopSimplify/LCSSA for LoopFlatten by moving it to a LoopPassManager. Differential Revision: https://reviews.llvm.org/D110057	2021-12-30 12:32:14 +00:00
Florian Hahn	acea6e9cfa	[Passes] Only run extra vector passes if loops have been vectorized. This patch uses a similar trick as in D113947 to only run the extra passes after vectorization on functions where loops have been vectorized. The reason for running the 'extra vector passes' is simplification/unswitching of the runtime checks created by LV, there should be no need to run them if nothing got vectorized To do that, a new dummy analysis ShouldRunExtraVectorPasses has been added. If loops have been vectorized for a function, LV will cache the analysis. At the moment it uses MadeCFGChanges as proxy for loop vectorized, which isn't perfect (it could be too aggressive, e.g. because no runtime checks have been added), but should be good enough for now. The extra passes are now managed by a new FunctionPassManager that runs its passes only if ShouldRunExtraVectorPasses has been cached. Without this patch, `-extra-vectorizer-passes` has the following compile-time impact: NewPM-O3: +4.86% NewPM-ReleaseThinLTO: +3.56% NewPM-ReleaseLTO-g: +7.17% http://llvm-compile-time-tracker.com/compare.php?from=ead3979a92fc33add4710c4510d6906260dcb4ad&to=c292da649e2c6e88a31e702fdc474727d09c72bc&stat=instructions With this patch, that gets reduced to NewPM-O3: +1.43% NewPM-ReleaseThinLTO: +1.00% NewPM-ReleaseLTO-g: +1.58% http://llvm-compile-time-tracker.com/compare.php?from=ead3979a92fc33add4710c4510d6906260dcb4ad&to=e67d86b57810011cf285eb9aa1944781be6096f0&stat=instructions It is probably still too high to enable by default, but much better. Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D115052	2021-12-10 11:42:45 +00:00
Arthur Eubanks	5c7e783ebe	[NFC] Clarify comment about LoopDeletionPass in the optimization pipeline Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D115179	2021-12-07 09:58:12 -08:00
Nikita Popov	ae7f468073	[NewPM] Fix MergeFunctions scheduling MergeFunctions (as well as HotColdSplitting an IROutliner) are incorrectly scheduled under the new pass manager. The code makes it look like they run towards the end of the module optimization pipeline (as they should), while in reality the run at the start. This is because the OptimizePM populated around them is only scheduled later. I'm fixing this by moving these three passes until after OptimizePM to avoid splitting the function pass pipeline. It doesn't seem important to me that some of the function passes run after these late module passes. Differential Revision: https://reviews.llvm.org/D115098	2021-12-04 17:30:30 +01:00
Anton Afanasyev	c34d157fc7	[Passes] Move AggressiveInstCombine after InstCombine Swap AIC and IC neighbouring in pipeline. This looks more natural and even almost has no effect for now (three slightly touched tests of test-suite). Also this could be the first step towards merging AIC (or its part) to -O2 pipeline. After several changes in AIC (like D108091, D108201, D107766, D109515, D109236) there've been observed several regressions (like PR52078, PR52253, PR52289) that were fixed in different passes (see D111330, D112721) by extending their functionality, but these regressions were exposed since changed AIC prevents IC from making some of early optimizations. This is common problem and it should be fixed by just moving AIC after IC which looks more logically by itself: make aggressive instruction combining only after failed ordinary one. Fixes PR52289 Reviewed By: spatel, RKSimon Differential Revision: https://reviews.llvm.org/D113179	2021-12-04 14:22:43 +03:00
Nikita Popov	5b94037a30	[PhaseOrdering] Add test for incorrect merge function scheduling Add an -enable-merge-functions option to allow testing of function merging as it will actually happen in the optimization pipeline. Based on that add a test where we currently produce two identical functions without merging them due to incorrect pass scheduling under the new pass manager.	2021-12-04 10:12:04 +01:00
Liqiang Tao	7e8f9d6b38	[llvm][Inline] Add FunctionSimplificationPipeline to module inliner pipeline The FunctionSimplificationPipeline could effectively reduce the size of .text section when module inliner is enabled. Reviewed By: kazu Differential Revision: https://reviews.llvm.org/D114704	2021-12-03 12:13:31 +09:00
Florian Hahn	770a50b28c	[AnnotationRemarks] Support generating annotation remarks with -O0. This matches the legacy pass manager behavior. If remarks are not enabled the pass is effectively a no-op.	2021-12-02 15:01:02 +00:00
Arthur Eubanks	e3e25b5112	[NewPM] Add option to prevent rerunning function pipeline on functions in CGSCC adaptor In a CGSCC pass manager, we may visit the same function multiple times due to SCC mutations. In the inliner pipeline, this results in running the function simplification pipeline on a function multiple times even if it hasn't been changed since the last function simplification pipeline run. We use a newly introduced analysis to keep track of whether or not a function has changed since the last time the function simplification pipeline has run on it. If we see this analysis available for a function in a CGSCCToFunctionPassAdaptor, we skip running the function passes on the function. The analysis is queried at the end of the function passes so that it's available after the first time the function simplification pipeline runs on a function. This is a per-adaptor option so it doesn't apply to every adaptor. The goal of this is to improve compile times. However, currently we can't turn this on by default at least for the higher optimization levels since the function simplification pipeline is not robust enough to be idempotent in many cases, resulting in performance regressions if we stop running the function simplification pipeline on a function multiple times. We may be able to turn this on for -O1 in the near future, but turning this on for higher optimization levels would require more investment in the function simplification pipeline. Heavily inspired by D98103. Example compile time improvements with flag turned on: https://llvm-compile-time-tracker.com/compare.php?from=998dc4a5d3491d2ae8cbe742d2e13bc1b0cacc5f&to=5c27c913687d3d5559ef3ab42b5a3d513531d61c&stat=instructions Reviewed By: asbirlea, nikic Differential Revision: https://reviews.llvm.org/D113947	2021-11-17 09:06:46 -08:00
Arthur Eubanks	19867de9e7	[NewPM] Only invalidate modified functions' analyses in CGSCC passes + turn on eagerly invalidate analyses Previously, any change in any function in an SCC would cause all analyses for all functions in the SCC to be invalidated. With this change, we now manually invalidate analyses for functions we modify, then let the pass manager know that all function analyses should be preserved since we've already handled function analysis invalidation. So far this only touches the inliner, argpromotion, function-attrs, and updateCGAndAnalysisManager(), since they are the most used. This is part of an effort to investigate running the function simplification pipeline less on functions we visit multiple times in the inliner pipeline. However, this causes major memory regressions especially on larger IR. To counteract this, turn on the option to eagerly invalidate function analyses. This invalidates analyses on functions immediately after they're processed in a module or scc to function adaptor for specific parts of the pipeline. Within an SCC, if a pass only modifies one function, other functions in the SCC do not have their analyses invalidated, so in later function passes in the SCC pass manager the analyses may still be cached. It is only after the function passes that the eager invalidation takes effect. For the default pipelines this makes sense because the inliner pipeline runs the function simplification pipeline after all other SCC passes (except CoroSplit which doesn't request any analyses). Overall this has mostly positive effects on compile time and positive effects on memory usage. https://llvm-compile-time-tracker.com/compare.php?from=7f627596977624730f9298a1b69883af1555765e&to=39e824e0d3ca8a517502f13032dfa67304841c90&stat=instructions https://llvm-compile-time-tracker.com/compare.php?from=7f627596977624730f9298a1b69883af1555765e&to=39e824e0d3ca8a517502f13032dfa67304841c90&stat=max-rss D113196 shows that we slightly regressed compile times in exchange for some memory improvements when turning on eager invalidation. D100917 shows that we slightly improved compile times in exchange for major memory regressions in some cases when invalidating less in SCC passes. Turning these on at the same time keeps the memory improvements while keeping compile times neutral/slightly positive. Reviewed By: asbirlea, nikic Differential Revision: https://reviews.llvm.org/D113304	2021-11-15 14:44:53 -08:00
Arthur Eubanks	1d8750c3da	[NFC] Rename GVN -> GVNPass and SROA -> SROAPass To be more consistent with other pass struct names. There are still more passes that don't end with "Pass", but these are the important ones. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D112935	2021-11-09 10:35:58 -08:00
Liqiang Tao	6cad45d5c6	[llvm][Inline] Add a module level inliner Add module level inliner, which is a minimum viable product at this point. Also add some tests for it. RFC: https://lists.llvm.org/pipermail/llvm-dev/2021-August/152297.html Reviewed By: kazu Differential Revision: https://reviews.llvm.org/D106448	2021-11-09 11:03:29 +08:00
Arthur Eubanks	7175886a0f	[NewPM] Make eager analysis invalidation per-adaptor Follow-up change to D111575. We don't need eager invalidation on every adaptor. Most notably, adaptors running passes that use very few analyses, or passes that purely invalidate specific analyses. Also allow testing of this via a pipeline string "function<eager-inv>()". The compile time/memory impact of this is very comparable to D111575. https://llvm-compile-time-tracker.com/compare.php?from=9a2eec512a29df45c90c2fcb741e9d5c693b1383&to=b9f20bcdea138060967d95a98eab87ce725b22bb&stat=instructions Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D113196	2021-11-04 17:16:11 -07:00
Sjoerd Meijer	3fd1902ad8	[FuncSpec] Enable it only with -O3 Function specialisation was running at all optimisation levels (if enabled on the command line, it is not on by default). That was an oversight and not something we want to do. Function specialisation duplicates functions when it triggers, so the backend is processing more functions/instructions resulting in compile-time increases, which seems more appropriate with -O3 and inline with GCC. Please note that since function specialisation is not enabled by default, this didn't require updating any pass manager tests. Differential Revision: https://reviews.llvm.org/D112129	2021-11-04 13:59:00 +00:00
Roman Lebedev	9c2469c1dd	[PassManager] `buildModuleOptimizationPipeline()`: schedule `LoopDeletion` pass run before vectorization passes Test thanks to Michael Kuklinski from `#llvm`: https://godbolt.org/z/bdrah5Goo originally inspired by Daniel Lemire's https://lemire.me/blog/2021/10/26/in-c-is-empty-faster-than-comparing-the-size-with-zero/ We manage to deduce that the answer does not require looping, but we do that after the last `LoopDeletion` pass run, so we end up being stuck with a dead loop. Now, as with all things SCEV, this has a very expected ~`+0.12%` compile time performance regression: https://llvm-compile-time-tracker.com/compare.php?from=0ae7bf124a9bca76dd9a91b2f7379168ff13f562&to=c2ae57c9b961aeb4a28c747266949340613a6d84&stat=instructions (for comparison, doing that in function simplification pipeline would have been ~`+0.5` compile time performance regression, D112840) Looking at the transformation stats over vanilla test-suite, i think it's rather expected: ``` \| statistic name \| baseline \| proposed \| Δ \| % \| \|%\| \| \|--------------------------------------------------\|----------:\|----------:\|------:\|-------:\|-------:\| \| scalar-evolution.NumBruteForceTripCountsComputed \| 789 \| 888 \| 99 \| 12.55% \| 12.55% \| \| scalar-evolution.NumTripCountsNotComputed \| 105592 \| 117900 \| 12308 \| 11.66% \| 11.66% \| \| loop-delete.NumBackedgesBroken \| 542 \| 559 \| 17 \| 3.14% \| 3.14% \| \| regalloc.numExtends \| 81 \| 79 \| -2 \| -2.47% \| 2.47% \| \| indvars.NumFoldedUser \| 408 \| 400 \| -8 \| -1.96% \| 1.96% \| \| indvars.NumElimCmp \| 3831 \| 3758 \| -73 \| -1.91% \| 1.91% \| \| scalar-evolution.NumTripCountsComputed \| 299759 \| 304278 \| 4519 \| 1.51% \| 1.51% \| \| loop-delete.NumDeleted \| 8055 \| 8128 \| 73 \| 0.91% \| 0.91% \| \| machine-cse.NumCommutes \| 111 \| 110 \| -1 \| -0.90% \| 0.90% \| \| globaldce.NumFunctions \| 1187 \| 1192 \| 5 \| 0.42% \| 0.42% \| \| codegenprepare.NumSelectsExpanded \| 277 \| 278 \| 1 \| 0.36% \| 0.36% \| \| loop-unroll.NumRuntimeUnrolled \| 13841 \| 13791 \| -50 \| -0.36% \| 0.36% \| \| machinelicm.NumPostRAHoisted \| 1168 \| 1172 \| 4 \| 0.34% \| 0.34% \| \| phi-node-elimination.NumCriticalEdgesSplit \| 83054 \| 82879 \| -175 \| -0.21% \| 0.21% \| \| machine-cse.NumPREs \| 3085 \| 3079 \| -6 \| -0.19% \| 0.19% \| \| branch-folder.NumBranchOpts \| 108122 \| 107942 \| -180 \| -0.17% \| 0.17% \| \| loop-unroll.NumUnrolled \| 40136 \| 40067 \| -69 \| -0.17% \| 0.17% \| \| branch-folder.NumDeadBlocks \| 130818 \| 130607 \| -211 \| -0.16% \| 0.16% \| \| codegenprepare.NumBlocksElim \| 92856 \| 92714 \| -142 \| -0.15% \| 0.15% \| \| instsimplify.NumSimplified \| 103263 \| 103129 \| -134 \| -0.13% \| 0.13% \| \| instcombine.NumConstProp \| 26070 \| 26102 \| 32 \| 0.12% \| 0.12% \| \| instsimplify.NumExpand \| 1716 \| 1718 \| 2 \| 0.12% \| 0.12% \| \| loop-unroll.NumCompletelyUnrolled \| 9236 \| 9225 \| -11 \| -0.12% \| 0.12% \| \| branch-folder.NumHoist \| 2773 \| 2770 \| -3 \| -0.11% \| 0.11% \| \| regalloc.NumReloadsRemoved \| 10822 \| 10834 \| 12 \| 0.11% \| 0.11% \| \| regalloc.NumSnippets \| 11394 \| 11406 \| 12 \| 0.11% \| 0.11% \| \| machine-cse.NumCrossBBCSEs \| 1052 \| 1053 \| 1 \| 0.10% \| 0.10% \| \| machinelicm.NumCSEed \| 99887 \| 99784 \| -103 \| -0.10% \| 0.10% \| \| branch-folder.NumTailMerge \| 72501 \| 72435 \| -66 \| -0.09% \| 0.09% \| \| codegenprepare.NumExtUses \| 22007 \| 21987 \| -20 \| -0.09% \| 0.09% \| \| local.NumRemoved \| 68232 \| 68294 \| 62 \| 0.09% \| 0.09% \| \| loop-vectorize.LoopsAnalyzed \| 75483 \| 75413 \| -70 \| -0.09% \| 0.09% \| ``` Note that i'm only changing current PM, and not touching obsolete PM. This is an alternative to the function simplification pipeline variant of the same change, D112840. It has both less compile time impact (since the additional number of SCEV trip count calculations is way lass less than with the D112840), and it is much more powerful/impactful (almost 2x more loops deleted). I have checked, and doing this after loop rotation is favorable (more loops deleted). Reviewed By: mkazantsev Differential Revision: https://reviews.llvm.org/D112851	2021-11-03 19:24:49 +03:00
Florian Hahn	4a1d63d7d0	[VectorCombine] Add option to only run scalarization transforms. This patch adds a pass option to only run transforms that scalarize vector operations and do not create new vector instructions. When running VectorCombine early in the pipeline introducing new vector operations can have negative effects, like blocking loop or SLP vectorization. To avoid regressions, restrict the early VectorCombine run (when using -enable-matrix) to only perform scalarization and not introduce new vector operations. This is done as option to the pass directly, which is then set when adding the pass to the pipeline. This is done for the new pass manager only. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D111800	2021-10-15 20:35:58 +01:00
Hongtao Yu	42ad7e1bc9	[CSSPGO] Turn off PseudoProbeUpdatePass for non-FDO builds. PseudoProbeUpdatePass is used to distribute sample counts among dulplicated probes. It doesn't make sense for it to run without a sample profile. The pass takes 1% of the build time. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D111847	2021-10-14 17:12:49 -07:00
Florian Hahn	a7c6471a85	[Passes] Run vector-combine early with -fenable-matrix. IR with matrix intrinsics is likely to also contain large vector operations, which can benefit from early simplifications. This is the last step in a series of changes to improve code-gen for code using matrix subscript operators with the C/C++ matrix extension in CLang, like using matrix_t = double __attribute__((matrix_type(15, 15))); void foo(unsigned i, matrix_t &A, matrix_t &B) { for (unsigned j = 0; j < 4; ++j) for (unsigned k = 0; k < i; k++) B[k][j] -= A[k][j] * B[i][j]; } https://clang.godbolt.org/z/6dKxK1Ed7 Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D102496	2021-09-22 12:48:32 +01:00
Arthur Eubanks	c3ddc13d7d	[NFC] Split up PassBuilder.cpp PassBuilder.cpp is the slowest file to compile in LLVM. When trying to test changes to pipelines, it takes a long time to recompile. This doesn't actually speedup building PassBuilder.cpp itself since most of the time is spent in other large/duplicated functions caused by PassRegistry.def. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D109798	2021-09-15 15:30:39 -07:00

30 Commits