llvm-project

Author	SHA1	Message	Date
Florian Hahn	fb13dcf343	[ConstraintElim] Enable pass by default. The pass should help to close a functional gap when it comes to reasoning about related conditions in a relatively general way. It addresses multiple existing issues (linked below) and the need for a more powerful reasoning system was also discussed recently in https://discourse.llvm.org/t/rfc-alternative-approach-of-dealing-with-implications-from-comparisons-through-pos-analysis/65601/7 On AArch64, the new pass performs ~2000 simplifications on MultiSource,SPEC2006,SPEC2017 with -O3. Compile-time impact: NewPM-O3: +0.20% NewPM-ReleaseThinLTO: +0.32% NewPM-ReleaseLTO-g: +0.28% https://llvm-compile-time-tracker.com/compare.php?from=f01a3a893c147c1594b9a3fbd817456b209dabbf&to=577688758ef64fb044215ec3e497ea901bb2db28&stat=instructions:u Fixes #49344. Fixes #47888. Fixes #48253. Fixes #49229. Fixes #58074. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D135915	2023-01-04 18:00:37 +00:00
Florian Hahn	f3c1d92682	[ConstraintElim] Adjust position in LTO pipeline. This runs ConstraintElim earlier during LTO, similar to non-LTO. Discussed and split off from D135915.	2023-01-03 17:07:43 +00:00
Florian Hahn	9e6d2c82d6	[ConstraintElim] Move after first instcombine run. Running ConstraintElimination after the first InstCombine run results in slightly more simplifications on average. There is a tiny number of regressions, mostly due to CVP eliminating a condition that ConstraintElimination would use, but in most cases there's a slight improvement or no change. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D140853	2023-01-03 13:25:00 +00:00
Florian Hahn	60359f56aa	Revert "[IPSCCP] Enable specialization of functions." This reverts commit 2656572d485127cc30b8fe9752024d2a0f1c50db. It looks like CINT2017rate/502.gcc_r gets mis-compiled with LTO + PGO on AArch64 with function specialization.	2022-12-26 16:02:59 +00:00
Alexandros Lamprineas	2656572d48	[IPSCCP] Enable specialization of functions. This patch enables Function Specialization by default at all optimization levels except Os, Oz. Compilation Time Overhead: -------------------------- Measured the Instruction Count increase (Geomean) for CTMark from the llvm-testsuite as in https://llvm-compile-time-tracker.com. * {-O3, Non-LTO}: +0.136% Instruction Count * {-O3, LTO}: +0.346% Instruction Count Performance Uplift: ------------------- Measured +9.121% score increase for 505.mcf_r from SPEC Int 2017 (Tested on Neoverse N1 with -O3 + LTO) Correctness Testing: -------------------- * Passes bootstrap Clang with ASAN + LTO + FuncSpec aggressive options: { MaxClonesThreshold=10, SmallFunctionThreshold=10, AvgLoopIterationCount=30, SpecializeOnAddresses=true, EnableSpecializationForLiteralConstant=true, FuncSpecializationMaxIters=10 } * Builds Chromium and passes its unittests with the above options + ThinLTO. For more info please refer to https://discourse.llvm.org/t/rfc-should-we-enable-function-specialization/61518 Differential Revision: https://reviews.llvm.org/D140210	2022-12-25 10:05:21 +02:00
Alexandros Lamprineas	8136a0172b	[FuncSpec] Make the Function Specializer part of the IPSCCP pass. Reland 877a9f9abec61f06e39f1cd872e37b828139c2d1 since D138654 (parent) has been fixed with 9ebaf4fef4aac89d4eff08e48185d61bc893f14e and with 8f1e11c5a7d70f96943a72649daa69f152d73e90. Differential Revision: https://reviews.llvm.org/D126455	2022-12-10 14:39:49 +00:00
Roman Lebedev	4f7e5d2206	[SROA] For non-speculatable `load`s of `select`s -- split block, insert then/else blocks, form two-entry PHI node, take 2 Currently, SROA is CFG-preserving. Not doing so does not affect any pipeline test. (???) Internally, SROA requires the Dominator Tree, and uses it solely for the final `-mem2reg` call. By design, we can't really SROA an alloca if its address escapes somehow, but we have logic to deal with `load` of `select`/`PHI`, where at least one of the possible addresses prevents promotion, by speculating the `load`s and `select`ing between loaded values. As one would expect, that requires ensuring that the speculation is actually legal. Even ignoring complexity bailouts, that logic does not deal with everything, e.g. `isSafeToLoadUnconditionally()` does not recurse into hands of `select`. There can also be cases where the load is genuinely non-speculatable. So if we can't prove that the load can be speculated, unfold the select, produce a two-entry phi node, and perform a predicated load. Now, that transformation must obviously update the Dominator Tree, since we require it later on. Doing so is trivial. Additionally, we don't want to do this for the final SROA invocation (D136806). In the end, this ends up having negative (!) compile-time cost: https://llvm-compile-time-tracker.com/compare.php?from=c6d7e80ec4c17a415673b1cfd25924f98ac83608&to=ddf9600365093ea50d7e278696cbfa01641c959d&stat=instructions:u Though indeed, this only deals with `select`s, `PHI`s are still using speculation. Should we update some more analysis? Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D138238 This reverts commit 739611870d3b06605afe25cc07833f6a62de9545, and recommits 03e6d9d9d1d48e43f3efc35eb75369b90d4510d5 with a fixed assertion - we should check that DTU is there, not just assert false...	2022-12-08 20:19:55 +03:00
Roman Lebedev	739611870d	Revert "[SROA] For non-speculatable `load`s of `select`s -- split block, insert then/else blocks, form two-entry PHI node" The assertion about not modifying the CFG seems to not hold, will recommit in a bit. https://lab.llvm.org/buildbot#builders/139/builds/32412 This reverts commit 03e6d9d9d1d48e43f3efc35eb75369b90d4510d5. This reverts commit 4f90f4ada33718f9025d0870a4fe3fe88276b3da.	2022-12-08 19:51:15 +03:00
Roman Lebedev	03e6d9d9d1	[SROA] For non-speculatable `load`s of `select`s -- split block, insert then/else blocks, form two-entry PHI node Currently, SROA is CFG-preserving. Not doing so does not affect any pipeline test. (???) Internally, SROA requires the Dominator Tree, and uses it solely for the final `-mem2reg` call. By design, we can't really SROA an alloca if its address escapes somehow, but we have logic to deal with `load` of `select`/`PHI`, where at least one of the possible addresses prevents promotion, by speculating the `load`s and `select`ing between loaded values. As one would expect, that requires ensuring that the speculation is actually legal. Even ignoring complexity bailouts, that logic does not deal with everything, e.g. `isSafeToLoadUnconditionally()` does not recurse into hands of `select`. There can also be cases where the load is genuinely non-speculatable. So if we can't prove that the load can be speculated, unfold the select, produce a two-entry phi node, and perform a predicated load. Now, that transformation must obviously update the Dominator Tree, since we require it later on. Doing so is trivial. Additionally, we don't want to do this for the final SROA invocation (D136806). In the end, this ends up having negative (!) compile-time cost: https://llvm-compile-time-tracker.com/compare.php?from=c6d7e80ec4c17a415673b1cfd25924f98ac83608&to=ddf9600365093ea50d7e278696cbfa01641c959d&stat=instructions:u Though indeed, this only deals with `select`s, `PHI`s are still using speculation. Should we update some more analysis? Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D138238	2022-12-08 16:51:32 +03:00
Alexandros Lamprineas	0f0cb92cb2	Revert "[FuncSpec] Make the Function Specializer part of the IPSCCP pass." This reverts commit 877a9f9abec61f06e39f1cd872e37b828139c2d1. It depends on the parent revision 42c2dc401742266da3e0251b6c1ca491f4779963 which needs to be reverted as it broke some buildbots, so reverting both.	2022-12-08 12:41:43 +00:00
Alexandros Lamprineas	877a9f9abe	[FuncSpec] Make the Function Specializer part of the IPSCCP pass. The aim of this patch is to minimize the compilation time overhead of running Function Specialization. It is about 40% slower to run as a standalone pass (IPSCCP + FuncSpec vs IPSCCP with FuncSpec) according to my measurements. I compiled the llvm testsuite with NewPM-O3 + LTO and measured single threaded [user + system] time of IPSCCP and FuncSpec by passing the '-time-passes' option to lld. Then I compared the two configurations in terms of Instruction Count of the total compilation (not of the individual passes) as in https://llvm-compile-time-tracker.com. Geomean for non-LTO builds is -0.25% and LTO is -0.5% approximately. You can find more info below: https://discourse.llvm.org/t/rfc-should-we-enable-function-specialization/61518 Differential Revision: https://reviews.llvm.org/D126455	2022-12-08 12:14:27 +00:00
Sjoerd Meijer	8250180238	Revert "Recommit "[LoopFlatten] Enable it by default"" This reverts commit 3ea6a9a469fde168c527b1c34c09f6d684ec86af because of the reported miscompilation in: https://github.com/llvm/llvm-project/issues/59339	2022-12-05 15:14:12 +00:00
Sjoerd Meijer	3ea6a9a469	Recommit "[LoopFlatten] Enable it by default" The problem in 58441 that was reported after enabling this last time was fixed in 8e9e22f07bcbe2ee95478684cf31948370e4e51e.	2022-11-29 10:45:13 +00:00
Rong Xu	6327d263f5	[CHR] Add a threshold for the code duplication ControlHeightReduction (CHR) clones the code region to reduce the branches in the hot code path. The number of clones is linear in the depth of the region. Currently it does not have control over the code size increase. We are seeing one ~9000 BB function get expanded to ~250000 BBs, a 25x increase. This creates a big compile time issue for the downstream optimizations. This patch adds a cap on the number of clones for one region. Differential Revision: https://reviews.llvm.org/D138333	2022-11-22 11:36:40 -08:00
Sanjay Patel	163bb6d64e	[Passes][VectorCombine] enable early run generally and try load folds An early run of VectorCombine was added with D102496 specifically to deal with unnecessary vector ops produced with the C matrix extension. This patch is proposing to try those folds in general and add a pair of load folds to the menu. The load transform will partly solve (see PhaseOrdering diffs) a longstanding vectorization perf bug by removing redundant loads via GVN: issue #17113 The main reason for not enabling the extra pass generally in the initial patch was compile-time cost. The cost of VectorCombine was significantly (surprisingly) improved with: 87debdadaf18 https://llvm-compile-time-tracker.com/compare.php?from=ffe05b8f57d97bc4340f791cb386c8d00e0739f2&to=87debdadaf18f8a5c7e5d563889e10731dc3554d&stat=instructions:u ...so the extra run is going to cost very little now - the total cost of the 2 runs should be less than the 1 run before that micro-optimization: https://llvm-compile-time-tracker.com/compare.php?from=5e8c2026d10e8e2c93c038c776853bed0e7c8fc1&to=2c4b68eab5ae969811f422714e0eba44c5f7eefb&stat=instructions:u It may be possible to reduce the cost slightly more with a few more earlier-exits like that, but it's probably in the noise based on timing experiments. Differential Revision: https://reviews.llvm.org/D138353	2022-11-21 13:57:55 -05:00
Sanjay Patel	8f337f8ffe	[VectorCombine] generalize pass param name for early combines; NFC The option was added with https://reviews.llvm.org/D102496, and currently the name is accurate, but I am hoping to add a load transform that is not a scalarization. See issue #17113.	2022-11-21 13:57:55 -05:00
Roman Lebedev	8adfa29706	[Pipelines] Introduce SROA after (final, run-time) loop unrolling Now that we are done with loop unrolling, be it either by LoopVectorizer, or LoopUnroll passes, some variable-offset GEP's into alloca's could have become constant-offset, thus enabling SROA and alloca promotion, yet we don't capitalize on that, which is surprising. While it would be good to not introduce one more SROA invocation, but instead move the one from `PassBuilder::buildFunctionSimplificationPipeline()`, the existing test coverage says that is a bad idea, though it would be fine compile-time wise: https://llvm-compile-time-tracker.com/compare.php?from=b150d34c47efbd8fa09604bce805c0920360f8d7&to=5a9a5c855158b482552be8c7af3e73d67fa44805&stat=instructions So instead, I add yet another SROA run. I have checked, and it needs to be at least after said final loop unrolling. This is still fine compile-time wise: https://llvm-compile-time-tracker.com/compare.php?from=70324cd88328c0924e605fa81b696572560aa5c9&to=fb489bbef687ad821c3173a931709f9cad9aee8a&stat=instructions I've encountered this in real code; `SROA-after-final-loop-unrolling.ll` has been reduced from https://godbolt.org/z/fsdMhETh3 Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D136806	2022-11-17 21:31:30 +03:00
Arthur Eubanks	cbcf123af2	[LegacyPM] Remove cl::opts controlling optimization pass manager passes Move these to the new PM if they're used there. Part of removing the legacy pass manager for optimization pipeline. Reland with UseNewGVN usage in clang removed. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D137915	2022-11-14 09:38:17 -08:00
Arthur Eubanks	d7c1427953	Revert "[LegacyPM] Remove cl::opts controlling optimization pass manager passes" This reverts commit 7ec05fec7115a910b2e172de794adc462388c25e. Breaks bots, e.g. https://lab.llvm.org/buildbot#builders/217/builds/15008	2022-11-14 09:33:38 -08:00
Arthur Eubanks	7ec05fec71	[LegacyPM] Remove cl::opts controlling optimization pass manager passes Move these to the new PM if they're used there. Part of removing the legacy pass manager for optimization pipeline. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D137915	2022-11-14 09:23:17 -08:00
Arthur Eubanks	4fa328074e	[NewPM][Pipeline] Add PipelineTuningOption to set inliner threshold The legacy PM allowed you to set a custom inliner threshold via builder.Inliner = llvm::createFunctionInliningPass(inline_threshold); This allows the same thing to be done with the new PM optimization pipelines. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D137038	2022-11-02 10:47:51 -07:00
Paul Walker	ab8257ca0e	[NFC] Fix a few whitespace inconsistencies.	2022-10-20 14:52:25 +00:00
Pavel Samolysov	1c530500ab	[Pipelines] Introduce DAE after ArgumentPromotion The ArgumentPromotion pass uses Mem2Reg promotion at the end to cut down generated `alloca` instructions as well as meaningless `store`s, and this behavior can leave unused (dead) arguments. To eliminate the dead arguments, and thereby let DeadCodeElimination remove the inserted `GEP`s, `load`s and `cast`s that become dead in the callers, the DeadArgumentElimination pass should be run after the ArgumentPromotion one. Differential Revision: https://reviews.llvm.org/D128830	2022-09-22 15:33:46 -07:00
Nuno Lopes	d953d01737	Introduce -enable-global-analyses to allow users to disable inter-procedural analyses Alive2 doesn't support verification of optimizations that use inter-procedural analyses. Right now, clang uses GlobalsAA by default and there's no way to disable it. This leads to Alive2 producing false positives. The added flag allows us to skip global analyses altogether. Differential Revision: https://reviews.llvm.org/D134139	2022-09-19 11:59:35 +01:00
Vitaly Buka	181d408186	[pipelines] OptimizerEarlyEPCallbacks for ThinLTO prelink Similar to the OptimizerLastEPCallbacks workaround added in D96320. Probably NFC as-is; I don't see anything hooked to these callbacks yet, but we are looking to move sanitizers. Reviewed By: aeubanks, MaskRay Differential Revision: https://reviews.llvm.org/D133333	2022-09-06 15:54:04 -07:00
Arthur Eubanks	9599393eeb	Revert "[Pipelines] Introduce DAE after ArgumentPromotion" This reverts commit b10a341aa5b0b93b9175a8f11efc9a0955ab361e. This commit exposes the pre-existing https://github.com/llvm/llvm-project/issues/56503 in some edge cases. Will fix that and then reland this.	2022-09-01 08:52:19 -07:00
Pavel Samolysov	b10a341aa5	[Pipelines] Introduce DAE after ArgumentPromotion The ArgumentPromotion pass uses Mem2Reg promotion at the end to cut down generated `alloca` instructions as well as meaningless `store`s, and this behavior can leave unused (dead) arguments. To eliminate the dead arguments, and thereby let DeadCodeElimination remove the inserted `GEP`s, `load`s and `cast`s that become dead in the callers, the DeadArgumentElimination pass should be run after the ArgumentPromotion one. Differential Revision: https://reviews.llvm.org/D128830	2022-08-28 10:47:03 +03:00
Pavel Samolysov	f964417c32	Revert "[Pipelines] Introduce DAE after ArgumentPromotion" The commit breaks the compiler when a function is used as a function parameter (hm... for a function from the standard C library?): ``` static float strtof(char , char ) {} void a() { strtof(a, 0); } ``` This reverts commit 879f5118fc74657e4a5c4eff6810098e1eed75ac.	2022-08-26 13:43:09 +03:00
Pavel Samolysov	879f5118fc	[Pipelines] Introduce DAE after ArgumentPromotion The ArgumentPromotion pass uses Mem2Reg promotion at the end to cut down generated `alloca` instructions as well as meaningless `store`s, and this behavior can leave unused (dead) arguments. To eliminate the dead arguments, and thereby let DeadCodeElimination remove the inserted `GEP`s, `load`s and `cast`s that become dead in the callers, the DeadArgumentElimination pass should be run after the ArgumentPromotion one. Differential Revision: https://reviews.llvm.org/D128830	2022-08-25 10:55:47 +03:00
Pavel Samolysov	6703ad1e0c	Revert "[Pipelines] Introduce DAE after ArgumentPromotion" This reverts commit 3f20dcbf708cb23f79c4866d8285a8ae7bd885de.	2022-08-24 12:44:13 +03:00
Pavel Samolysov	3f20dcbf70	[Pipelines] Introduce DAE after ArgumentPromotion The ArgumentPromotion pass uses Mem2Reg promotion at the end to cut down generated `alloca` instructions as well as meaningless `store`s, and this behavior can leave unused (dead) arguments. To eliminate the dead arguments, and thereby let DeadCodeElimination remove the inserted `GEP`s, `load`s and `cast`s that become dead in the callers, the DeadArgumentElimination pass should be run after the ArgumentPromotion one. Differential Revision: https://reviews.llvm.org/D128830	2022-08-24 10:36:12 +03:00
Ellis Hoag	0f946a50a4	[InstrProf] Add option to disable loop opt after PGO Add the `-enable-post-pgo-loop-rotation` option to enable or disable the loop rotation transformation [1]. With some instrumentations, e.g., function entry coverage [2], loop rotation is not necessary and can lead to some surprising differences in codegen, even for functions where instrumentation is blocked with `noprofile` or `skipprofile`. The default value is `true` so the default behavior does not change. [1] https://www.llvm.org/docs/LoopTerminology.html#loop-terminology-loop-rotate [2] https://reviews.llvm.org/D116180 Reviewed By: phosek Differential Revision: https://reviews.llvm.org/D131817	2022-08-17 12:23:18 -07:00
Sanjay Patel	bfb9b8e075	[Passes] add a tail-call-elim pass near the end of the opt pipeline We call tail-call-elim near the beginning of the pipeline, but that is too early to annotate calls that get added later. In the motivating case from issue #47852, the missing 'tail' on memset leads to sub-optimal codegen. I experimented with removing the early instance of tail-call-elim instead of just adding another pass, but that appears to be slightly worse for compile-time: +0.15% vs. +0.08% time. "tailcall" shows adding the pass; "tailcall2" shows moving the pass to later, then adding the original early pass back (so 1596886802 is functionally equivalent to 180b0439dc ): https://llvm-compile-time-tracker.com/index.php?config=NewPM-O3&stat=instructions&remote=rotateright Note that there was an effort to split the tail call functionality into 2 passes - that could help reduce compile-time if we find that this change costs more in compile-time than expected based on the preliminary testing: D60031 Differential Revision: https://reviews.llvm.org/D130374	2022-07-25 15:25:47 -04:00
Alina Sbirlea	846d10f16a	Turn on flag to not re-run simplification pipeline. This patch turns on the flag `-enable-no-rerun-simplification-pipeline`, which means the simplification pipeline will not be rerun on unchanged functions in the CGSCCPass Manager. Compile time improvement: https://llvm-compile-time-tracker.com/compare.php?from=17457be1c393ff691cca032b04ea1698fedf0301&to=882301ebb893c8ef9f09fe1ea871f7995426fa07&stat=instructions No meaningful run time regressions observed in the llvm test suite and in additional internal workloads at this time. The example test in `test/Other/no-rerun-function-simplification-pipeline.ll` is a good means to understand the effect of this change: ``` define void @f1(void()* %p) alwaysinline { call void %p() ret void } define void @f2() #0 { call void @f1(void()* @f2) call void @f3() ret void } define void @f3() #0 { call void @f2() ret void } ``` There are two SCCs formed by the ModuleToPostOrderCGSCCAdaptor: (f1) and (f2, f3). The pass manager runs on the first SCC, leading to running the simplification pipeline (function and loop passes) on f1. With the flag on, after this, the output will have `Running analysis: ShouldNotRunFunctionPassesAnalysis on f1`. Next, the pass manager runs on the second SCC: (f2, f3). Since f1() was inlined, f2() now calls itself, and also calls f3(), while f3() only calls f2(). So the pass manager for the SCC first runs the Inliner on (f2, f3), then the simplification pipeline on f2. With the flag on, the output will have `Running analysis: ShouldNotRunFunctionPassesAnalysis on f2`; unless the inliner makes a change, this analysis remains preserved which means there's no reason to rerun the simplification pipeline. With the flag off, there is a second run of the simplification pipeline run on f2. Next, the same flow occurs for f3. The simplification pipeline is run on f3 a single time with the flag on, along with `ShouldNotRunFunctionPassesAnalysis on f3`, and twice with the flag off. 
The reruns occur only on f2 and f3 due to the additional ref edges.	2022-07-14 06:23:55 -07:00
Kazu Hirata	ec9a0e36d9	[IPO] Remove addLTOOptimizationPasses and addLateLTOOptimizationPasses (NFC) The last uses were removed on Apr 15, 2022 in commit 2e6ac54cf48aa04f7b05c382c33135b16d3f01ea. Differential Revision: https://reviews.llvm.org/D129460	2022-07-11 20:15:24 -07:00
Ben Dunbobbin	325e7e8b87	[LLVM][LTO][LLD] Enable Profile Guided Layout (--call-graph-profile-sort) for FullLTO The CGProfilePass needs to be run during FullLTO compilation at link time to emit the .llvm.call-graph-profile section to the compiled LTO object file. Currently, it is being run only during the initial LTO-prelink compilation stage (to produce the bitcode files to be consumed by the linker) and so the section is not produced. ThinLTO is not affected because: - For ThinLTO-prelink compilation the CGProfilePass pass is not run because ThinLTO-prelink passes are added via buildThinLTOPreLinkDefaultPipeline. Normal and FullLTO-prelink passes are both added via buildPerModuleDefaultPipeline which uses the LTOPreLink parameter to customize its behavior for the FullLTO-prelink pass differences. - ThinLTO backend compilation phase adds the CGProfilePass (see: buildModuleOptimizationPipeline). Adjust when the pass is run so that the .llvm.call-graph-profile section is produced correctly for FullLTO. Fixes #56185 (https://github.com/llvm/llvm-project/issues/56185)	2022-07-01 13:57:36 +01:00
Mingming Liu	e0d069598b	[Inline] Annotate inline pass name with link phase information for analysis. The annotation is flag gated; flag is turned off by default. Differential Revision: https://reviews.llvm.org/D125495	2022-06-24 10:06:43 -07:00
Fangrui Song	d86a206f06	Remove unneeded cl::ZeroOrMore for cl::opt/cl::list options	2022-06-05 00:31:44 -07:00
Fangrui Song	557efc9a8b	[llvm] Remove unneeded cl::ZeroOrMore for cl::opt options. NFC Some cl::ZeroOrMore were added to avoid the `may only occur zero or one times!` error. More were added due to cargo cult. Since the error has been removed, cl::ZeroOrMore is unneeded. Also remove cl::init(false) while touching the lines.	2022-06-03 21:59:05 -07:00
Arthur Eubanks	36096c2b38	[NFC][JumpThreading] Remove InsertFreezeWhenUnfoldingSelect pass parameter All callers pass true. select-unfold-freeze.ll is now a subset of select.ll so delete it. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D126501	2022-05-26 16:13:34 -07:00
Chuanqi Xu	405bf90235	[NFC] [Pipelines] Hoist CoroCleanup as Module Pass This is similar to previous patch https://reviews.llvm.org/D123925. It could also reduce the time we call declaresCoroCleanupIntrinsics. And it is helpful for further changes. Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D124362	2022-05-05 15:15:09 +08:00
Chuanqi Xu	7d40f562e7	[Pipelines] Hoist CoroCleanup to avoid blocking optimizations CoroCleanup is designed to lower all the remaining coroutine intrinsics. Its only requirement is to run after CoroSplit. However, the current position of CoroCleanup is far too late. The downside is that the unlowered coroutine intrinsics might block other optimizations too. So it should be a pure win to hoist the position of CoroCleanup. Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D124360	2022-05-05 15:13:27 +08:00
Mingming Liu	408bb9a375	Add a regression test to guard the 0 hot-caller threshold in SamplePGO + ThinLTO. - Add a comment near where the threshold is set.	2022-04-25 18:29:56 +00:00
Chuanqi Xu	f9bee35689	[Pipelines] Hoist CoroEarly as a module pass This change could reduce the time we call `declaresCoroEarlyIntrinsics`. And it is helpful for future changes. Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D123925	2022-04-19 11:04:24 +08:00
Wenju He	0bda12b5bc	[NewPM] Add OptimizerEarly module extension point VectorizerStart extension is module callback in old PM, but is function callback in new PM. We lack a module extension point between end of buildModuleSimplificationPipeline and the function optimization (including vectorizer) pipeline. So this patch adds a new module extension point before the function optimization pipeline. Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D122296	2022-03-31 08:22:27 -07:00
Arthur Eubanks	9bd66b312c	[PassManager][Coroutine] Run passes under -O0 conditionally and run GlobalDCE CoroSplit lowers various coroutine intrinsics. It's a CGSCC pass and CGSCC passes don't run on unreachable functions. Normally GlobalDCE will come along and delete unreachable functions, but we don't run GlobalDCE under -O0, so an unreachable function with coroutine intrinsics may never have CoroSplit run on it. This patch adds GlobalDCE when coroutine intrinsics are present. It also now runs all coroutine passes conditionally, only when coroutine intrinsics are present. This should also solve the -O0 regression reported in D105877 due to LazyCallGraph construction. Fixes https://github.com/llvm/llvm-project/issues/54117 Reviewed By: ChuanqiXu Differential Revision: https://reviews.llvm.org/D122275	2022-03-23 11:03:26 -07:00
Arthur Eubanks	4fc7c55fff	[NewPM] Actually recompute GlobalsAA before module optimization pipeline RequireAnalysis<GlobalsAA> doesn't actually recompute GlobalsAA. GlobalsAA isn't invalidated (unless specifically invalidated) because it's self-updating via ValueHandles, but can be imprecise during the self-updates. Rather than invalidating GlobalsAA, which would invalidate AAManager and any analyses that use AAManager, create a new pass that recomputes GlobalsAA. Fixes #53131. Differential Revision: https://reviews.llvm.org/D121167	2022-03-14 09:42:34 -07:00
Elia Geretto	942efa5927	[NewPM] Add extension points to LTO pipeline in PassBuilder This PR adds two extension points to the default LTO pipeline in PassBuilder, one at the beginning and one at the end. These two extension points already existed in the old pass manager, the aim is to replicate the same functionality in the new one. Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D120491	2022-02-25 14:48:54 -08:00
Arthur Eubanks	18da681034	[NFC] Remove unnecessary function pass managers	2022-02-25 10:03:17 -08:00
William S. Moses	d9da6a535f	[LICM][PhaseOrder] Don't speculate in LICM until after running loop rotate LICM will speculatively hoist code outside of loops. This requires removing information, like alias analysis (https://github.com/llvm/llvm-project/issues/53794), range information (https://bugs.llvm.org/show_bug.cgi?id=50550), among others. Prior to https://reviews.llvm.org/D99249 , LICM would only be run after LoopRotate. Running Loop Rotate prior to LICM prevents an instruction hoist from being speculative, if it was conditionally executed by the iteration (as is commonly emitted by clang and other frontends). Adding the additional LICM pass first, however, forces all of these instructions to be considered speculative, even if they are not speculative after LoopRotate. This destroys information, and discarding it results in performance losses. This PR modifies LICM to accept a "speculative" parameter which allows LICM to be set to perform information-loss speculative hoists or not. Phase ordering is then modified to not perform the information-losing speculative hoists until after loop rotate is performed, preserving this additional information. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D119965	2022-02-17 20:13:07 -05:00
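
Note on commit 846d10f16a above: its message describes two SCCs, (f1) and (f2, f3), with the pass manager visiting (f1) first. The bottom-up ordering can be illustrated with a minimal Python sketch. This is not LLVM's LazyCallGraph or ModuleToPostOrderCGSCCAdaptor; it is plain Tarjan SCC detection on a hand-written call graph taken from the example IR, it ignores ref edges, and the names `sccs_bottom_up` and `call_graph` are hypothetical.

```python
def sccs_bottom_up(graph):
    """Return the strongly connected components of `graph` (dict: caller ->
    list of callees), callees before callers. Tarjan's algorithm emits
    components in exactly this reverse-topological order."""
    index = {}        # discovery number of each visited node
    lowlink = {}      # smallest discovery number reachable from the node
    stack = []        # nodes of the components still being built
    on_stack = set()
    result = []
    counter = 0

    def visit(node):
        nonlocal counter
        index[node] = lowlink[node] = counter
        counter += 1
        stack.append(node)
        on_stack.add(node)
        for callee in graph[node]:
            if callee not in index:
                visit(callee)
                lowlink[node] = min(lowlink[node], lowlink[callee])
            elif callee in on_stack:
                lowlink[node] = min(lowlink[node], index[callee])
        if lowlink[node] == index[node]:
            # `node` is the root of a component: pop its members.
            component = []
            while True:
                member = stack.pop()
                on_stack.discard(member)
                component.append(member)
                if member == node:
                    break
            result.append(sorted(component))

    for node in graph:
        if node not in index:
            visit(node)
    return result


# Call edges from the example IR: f2 calls f1 and f3, f3 calls f2.
# (Assumption: the ref edge from passing @f2 as an argument is left out.)
call_graph = {"f1": [], "f2": ["f1", "f3"], "f3": ["f2"]}
order = sccs_bottom_up(call_graph)
print(order)  # [['f1'], ['f2', 'f3']]
```

The callee-only SCC (f1) comes out first, then the mutually recursive pair (f2, f3), matching the visiting order the commit message describes.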

74 Commits