llvm-project

Author	SHA1	Message	Date
Ryan Mansfield	67efbd0bf1	[LLVM] Fix various cl::desc typos and whitespace issues (NFC) (#121955 )	2025-01-08 11:07:23 +01:00
Florian Hahn	9e66206638	[Passes] Generalize ShouldRunExtraVectorPasses to allow re-use (NFCI). (#118323 ) Generalize ShouldRunExtraVectorPasses to ShouldRunExtraPasses, to allow re-use for other transformations. PR: https://github.com/llvm/llvm-project/pull/118323	2024-12-04 16:55:06 +00:00
Kyungwoo Lee	d23c5c2d65	[CGData] Global Merge Functions (#112671 ) This implements a global function merging pass. Unlike traditional function merging passes that use IR comparators, this pass employs a structurally stable hash to identify similar functions while ignoring certain constant operands. These ignored constants are tracked and encoded into a stable function summary. When merging, instead of explicitly folding similar functions and their call sites, we form a merging instance by supplying different parameters via thunks. The actual size reduction occurs when identically created merging instances are folded by the linker. Currently, this pass is wired to a pre-codegen pass, enabled by the `-enable-global-merge-func` flag. In a local merging mode, the analysis and merging steps occur sequentially within a module: - `analyze`: Collects stable function hashes and tracks locations of ignored constant operands. - `finalize`: Identifies merge candidates with matching hashes and computes the set of parameters that point to different constants. - `merge`: Uses the stable function map to optimistically create a merged function. We can enable a global merging mode similar to the global function outliner (https://discourse.llvm.org/t/rfc-enhanced-machine-outliner-part-2-thinlto-nolto/78753/), which will perform the above steps separately. - `-codegen-data-generate`: During the first round of code generation, we analyze local merging instances and publish their summaries. - Offline using `llvm-cgdata` or at link-time, we can finalize all these merging summaries that are combined to determine parameters. - `-codegen-data-use`: During the second round of code generation, we optimistically create merging instances within each module, and finally, the linker folds identically created merging instances. Depends on #112664 This is a patch for https://discourse.llvm.org/t/rfc-global-function-merging/82608.	2024-11-13 17:34:07 -08:00
Lei Wang	bc1aa2863b	[SampleFDO] Support enabling sample loader pass in O0 mode (#113985 ) Add support for enabling sample loader pass in O0 mode(under `-fsample-profile-use`). This can help verify PGO raw profile count quality or provide a more accurate performance proxy(predictor), as O0 mode has minimal or no compiler optimizations that might otherwise impact profile count accuracy. - Explicitly disable the sample loader inlining to ensure it only emits sampling annotation. - Use flattened profile for O0 mode. - Add the pass after `AddDiscriminatorsPass` pass to work with `-fdebug-info-for-profiling`.	2024-11-08 15:29:44 -08:00
Yuxuan Chen	c6414970d7	[Coroutines] Inline the `.noalloc` ramp function marked coro_safe_elide (#114004 )	2024-11-07 22:41:32 -08:00
Hari Limaye	fbd89bcc66	Reland "[LTO] Run Argument Promotion before IPSCCP" (#111853 ) Run ArgumentPromotion before IPSCCP in the LTO pipeline, to expose more constants to be propagated. We also run PostOrderFunctionAttrs to improve the information available to ArgumentPromotion's alias analysis, and SROA to clean up allocas. Relands #111163.	2024-11-06 13:54:48 +00:00
Shilei Tian	390300d9f4	[PassBuilder] Add `ThinOrFullLTOPhase` to optimizer pipeline (#114577 )	2024-11-03 23:25:29 -05:00
Shilei Tian	dc45ff1d2a	[PassBuilder] Add `ThinOrFullLTOPhase` to early simplication EP call backs (#114547 ) The early simplication pipeline is used in non-LTO and (Thin/Full)LTO pre-link stage. There are some passes that we want them in non-LTO mode, but not at LTO pre-link stage. The control is missing currently. This PR adds the support. To demonstrate the use, we only enable the internalization pass in non-LTO mode for AMDGPU because having it run in pre-link stage causes some issues.	2024-11-03 23:24:10 -05:00
Shilei Tian	5445edb5d6	[PassBuilder] Replace `bool LTOPreLink` with `ThinOrFullLTOPhase Phase` (#114564 ) This will allow more fine-grained control in the future.	2024-11-01 14:56:35 -04:00
Lei Wang	bef3b54ea1	[InstrPGO] Avoid using global variable to fix potential data race (#114364 ) In https://github.com/llvm/llvm-project/pull/109837, it sets a global variable(`PGOInstrumentColdFunctionOnly`) in PassBuilderPipelines.cpp which introduced a data race detected by TSan. To fix this, I decouple the flag setting, the flags are now set separately(`instrument-cold-function-only-path` is required to be used with `--pgo-instrument-cold-function-only`).	2024-10-31 21:28:13 -07:00
Paul Kirth	913cd11f94	[llvm][fatlto] Drop any CFI related instrumentation after emitting bitcode (#112788 ) We want to support CFI instrumentation for the bitcode section, without miscompiling the object code portion of a FatLTO object. We can reuse the existing mechanisms in the LowerTypeTestsPass to do that, by just adding the pass to the FatLTO pipeline after the EmbedBitcodePass with the correct options set. Fixes #112053	2024-10-31 12:40:21 -07:00
Dmitry Chernenkov	d924a9ba03	Revert "[InstrPGO] Support cold function coverage instrumentation (#109837 )" This reverts commit e517cfc531886bf6ed64b4e7109bb3141ac7f430.	2024-10-31 10:55:17 +00:00
Paul Kirth	b01e2a8b56	[llvm] Allow always dropping all llvm.type.test sequences Currently, the `DropTypeTests` parameter only fully works with phi nodes and llvm.assume instructions. However, we'd like CFI to work in conjunction with FatLTO, in so far as the bitcode section should be able to contain the CFI instrumentation, while any incompatible bits are dropped when compiling the object code. To do that, we need to drop the llvm.type.test instructions everywhere, and not just their uses in phi nodes. This patch updates the LowerTypeTest pass so that uses are removed, and replaced with `true` in all cases, and not just in phi nodes. Addressing this will allow us to fix #112053 by modifying the FatLTO pipeline. Reviewers: pcc, nikic Reviewed By: pcc Pull Request: https://github.com/llvm/llvm-project/pull/112787	2024-10-30 16:56:30 -07:00
Lei Wang	e517cfc531	[InstrPGO] Support cold function coverage instrumentation (#109837 ) This patch adds support for cold function coverage instrumentation based on sampling PGO counts. The major motivation is to detect dead functions for the services that are optimized with sampling PGO. If a function is covered by sampling profile count (e.g., those with an entry count > 0), we choose to skip instrumenting those functions, which significantly reduces the instrumentation overhead. More details about the implementation and flags: - Added a flag `--pgo-instrument-cold-function-only` in `PGOInstrumentation.cpp` as the main switch to control skipping the instrumentation. - Built the extra instrumentation passes(a bundle of passes in `addPGOInstrPasses`) under sampling PGO pipeline. This is controlled by `--instrument-cold-function-only-path` flag. - Added a driver flag `-fprofile-generate-cold-function-coverage`: - 1) Config the flags in one place, i,e. adding `--instrument-cold-function-only-path=<...>` and `--pgo-function-entry-coverage`. Note that the instrumentation file path is passed through `--instrument-sample-cold-function-path`, because we cannot use the `PGOOptions.ProfileFile` as it's already used by `-fprofile-sample-use=<...>`. - 2) makes linker to link `compiler_rt.profile` lib(see [ToolChain.cpp#L1125-L1131](https://github.com/llvm/llvm-project/blob/main/clang/lib/Driver/ToolChain.cpp#L1125-L1131) ). - Added a flag(`--pgo-cold-instrument-entry-threshold`) to config entry count to determine cold function. Overall, the full command is like: ``` clang++ -O2 -fprofile-generate-cold-function-coverage=<...> -fprofile-sample-use=<...> code.cc -o code ```	2024-10-28 10:13:45 -07:00
Teresa Johnson	1de71652fd	[MemProf] Support cloning for indirect calls with ThinLTO (#110625 ) This patch enables support for cloning in indirect callsites. This is done by synthesizing callsite records for each virtual call target from the profile metadata. In the thin link all the synthesized records for a particular indirect callsite initially share the same context node, but support is added to partition the callsites and outgoing edges based on the callee function, creating a separate node for each target. In the LTO backend, when cloning is needed we first perform indirect call promotion, then change the target of the new direct call to the desired clone. Note this is ThinLTO-specific, since for regular LTO indirect call promotion should have already occurred.	2024-10-11 13:53:35 -07:00
Arthur Eubanks	e34d614e7d	[Passes] Remove -enable-infer-alignment-pass flag (#111873 ) This flag has been on for a while without any complaints.	2024-10-10 12:28:46 -07:00
Hari Limaye	0a0f100a70	Revert "[LTO] Run Argument Promotion before IPSCCP" (#111839 ) Reverts llvm/llvm-project#111163, as this was merged prematurely.	2024-10-10 15:03:01 +01:00
Hari Limaye	b9754e9d28	[LTO] Run Argument Promotion before IPSCCP (#111163 ) Run ArgumentPromotion before IPSCCP in the LTO pipeline, to expose more constants to be propagated. We also run PostOrderFunctionAttrs to improve the information available to ArgumentPromotion's alias analysis, and SROA to clean up allocas.	2024-10-10 06:08:27 -04:00
Mircea Trofin	9c0ba62010	[ctx_prof] Relax the "profile use" case around `PGOOpt` (#108265 ) `PGOOpt` could have a value if, for instance, debug info for profiling is requested. Relaxing the requirement, for now, following that eventually we would factor `PGOOpt` to better capture the supported interplay between the various profiling options.	2024-09-11 13:39:50 -07:00
Yuxuan Chen	761bf333e3	[LLVM][Coroutines] Switch CoroAnnotationElidePass to a FunctionPass (#107897 ) After landing https://github.com/llvm/llvm-project/pull/99285 we found that the call graph update was causing the following crash when expensive checks are turned on ``` llvm-project/llvm/lib/Analysis/CGSCCPassManager.cpp:982: LazyCallGraph::SCC &updateCGAndAnalysisManagerForPass(LazyCallGraph &, LazyCallGraph::SCC &, LazyCallGraph::Node &, CGSCCAnalysisManager &, CGSCCUpdateResult &, FunctionAnalysisManager &, bool): Assertion `(RC == &TargetRC \|\| RC->isAncestorOf(Targe tRC)) && "New call edge is not trivial!"' failed. ``` I have to admit I believe that the call graph update process I did for that patch could be wrong. After reading the code in `CGSCCToFunctionPassAdaptor`, I am convinced that `CoroAnnotationElidePass` can be a FunctionPass and rely on the adaptor to update the call graph for us, so long as we properly invalidate the caller's analyses. After this patch, `llvm/test/Transforms/Coroutines/coro-transform-must-elide.ll` no longer fails under expensive checks.	2024-09-09 18:57:39 -07:00
Mircea Trofin	3b22618094	[ctx_prof] Insert the ctx prof flattener after the module inliner (#107499 ) This patch enables experimenting with the contextual profile. ICP is currently disabled in this case - will reenable it subsequently. Also subsequently the inline cost model / decision making would be updated to be context-aware. Right now, this just achieves "complete use" of the profile, in that it's ingested, maintained, and sunk to a flat profile when not needed anymore. Issue [#89287](https://github.com/llvm/llvm-project/issues/89287)	2024-09-09 18:16:24 -07:00
Yuxuan Chen	a416267a5f	[LLVM][Coroutines] Transform "coro_elide_safe" calls to switch ABI coroutines to the `noalloc` variant (#99285 ) This patch is episode three of the middle end implementation for the coroutine HALO improvement project published on discourse: https://discourse.llvm.org/t/language-extension-for-better-more-deterministic-halo-for-c-coroutines/80044 After we attribute the calls to some coroutines as "coro_elide_safe" in the C++ FE and creating a `noalloc` ramp function, we use a new middle end pass to move the call to coroutines to the noalloc variant. This pass should be run after CoroSplit. For each node we process in CoroSplit, we look for its callers and replace the attributed ones in presplit coroutines to the noalloc one. The transformed `noalloc` ramp function will also require a frame pointer to a block of memory it can use as an activation frame. We allocate this on the caller's frame with an alloca. Please note that we cannot safely transform such attributed calls in post-split coroutines due to memory lifetime reasons. The CoroSplit pass is responsible for creating the coroutine frame spills for all the allocas in the coroutine. Therefore it will be unsafe to create new allocas like this one in post-split coroutines. This happens relatively rarely because CGSCC performs the passes on the callees before the caller. However, if multiple coroutines coexist in one SCC, this situation does happen (and prevents us from having potentially unbound frame size due to recursion.) You can find episode 1: Clang FE of this patch series at https://github.com/llvm/llvm-project/pull/99282 Episode 2: CoroSplit at https://github.com/llvm/llvm-project/pull/99283	2024-09-08 23:09:40 -07:00
Mingming Liu	d4ddf06b0c	[NFCI]Remove EntryCount from FunctionSummary and clean up surrounding synthetic count passes. (#107471 ) The primary motivation is to remove `EntryCount` from `FunctionSummary`. This frees 8 bytes out of `sizeof(FunctionSummary)` (136 bytes as of `64498c5483`). While I'm at it, this PR clean up {SummaryBasedOptimizations, SyntheticCountsPropagation} since they were not used and there are no plans to further invest on them. With this patch, bitcode writer writes a placeholder 0 at the byte offset of `EntryCount` and bitcode reader can parse the function entry count at the correct byte offset. Added a TODO to stop writing `EntryCount` and bump bitcode version	2024-09-06 16:38:17 -07:00
Mircea Trofin	775c50709c	[ctx_prof] Flattened profile lowering pass (#107329 ) Pass to flatten and lower the contextual profile to profile (i.e. `MD_prof`) metadata. This is expected to be used after all IPO transformations have happened. Prior to lowering, the instrumentation is maintained during IPO and the contextual profile is kept in sync (see PRs #105469, #106154). Flattening (#104539) sums up all the counters belonging to all a function's context nodes. We first propagate counter values (from the flattened profile) using the same propagation algorithm as `PGOUseFunc::populateCounters`, then map the edge values to `branch_weights`. Functions. in the module that don't have an entry in the flattened profile are deemed cold, and any `MD_prof` metadata they may have is reset. The profile summary is also reset at this point. Issue [#89287](https://github.com/llvm/llvm-project/issues/89287)	2024-09-06 13:47:08 -07:00
Arthur Eubanks	fb14f1df54	[PGO][Pipeline] Enable PGOForceFunctionAttrs in PGO optimization pipelines (#106790 ) Remove flag that turns on the PGOForceFunctionAttrs pass and always add it to default pipelines when using PGO. This is NFC by default since PGOOpt->ColdOptType is by default ColdFuncOpt::Default. Remove -O2 RUN line in basic.ll since we now have the pipeline tests.	2024-09-03 10:35:08 -07:00
Shengchen Kan	87c86aa6b9	[X86,SimplifyCFG] Support hoisting load/store with conditional faulting (Part I) (#96878 ) This is simplifycfg part of https://github.com/llvm/llvm-project/pull/95515 In this PR, we support hoisting load/store with conditional faulting in `SimplifyCFGOpt::speculativelyExecuteBB` to eliminate conditional branches. This is for cases like ``` void test (int a, int b) { if (a) b = a; } ``` In the following patches, we will support the hoist in `SimplifyCFGOpt::hoistCommonCodeFromSuccessors`. That is for cases like ``` void test (int a, int c, int d) { if (a) c = a; else d = a; } ```	2024-08-29 10:42:44 +08:00
Mircea Trofin	3f18a0a71c	[nfc] Improve testability of PGOInstrumentationGen (#104490 ) Passing to the `PGOInstrumentationGen` pass whether it needs to produce contextual profiling instrumentation as a flag, in the process restructuring a bit the places that need to be aware of that (some were unnecessarily in `PGOInstrumentationUse`)	2024-08-16 09:45:29 -07:00
Mircea Trofin	aca01bff07	[ctx_prof] CtxProfAnalysis: populate module data (#102930 ) Continuing from #102084, which introduced the analysis, we now populate it with info about functions contained in the module. When we will update the profile due to e.g. inlined callsites, we'll ingest the callee's counters and callsites to the caller. We'll move those to the caller's respective index space (counter and callers), so we need to know and maintain where those currently end. We also don't need to keep profiles not pertinent to this module. This patch also introduces an arguably much simpler way to track the GUID of a function from the frontend compilation, through ThinLTO, and into the post-thinlink compilation step, which doesn't rely on keeping names around. A separate RFC and patches will discuss extending this to the current PGO (instrumented and sampled) and other consumers as an infrastructural component.	2024-08-14 18:46:25 -07:00
Mircea Trofin	4a2bf05980	Reapply "[ctx_prof] Fix the pre-thinlink "use" case (#102511 )" This reverts commit 967185eeb85abb77bd6b6cdd2b026d5c54b7d4f3. The problem was link dependencies, moved `UseCtxProfile` to `Analysis`.	2024-08-08 17:04:00 -07:00
Aiden Grossman	967185eeb8	Revert "[ctx_prof] Fix the pre-thinlink "use" case (#102511 )" This reverts commit 1a6d60e0162b3ef767c87c95512dd453bf4f4746. Broke some buildbots.	2024-08-08 21:14:56 +00:00
Mircea Trofin	1a6d60e016	[ctx_prof] Fix the pre-thinlink "use" case (#102511 ) Didn't notice in #101338 that the instrumentation in `llvm/test/Transforms/PGOProfile/ctx-prof-use-prelink.ll` was actually incorrect.	2024-08-08 16:45:04 -04:00
Mircea Trofin	dbbf0762b6	[ctx_prof] CtxProfAnalysis (#102084 ) This is an immutable analysis that loads and makes the contextual profile available to other passes. This patch introduces the analysis and an analysis printer pass. Subsequent patches will introduce the APIs that IPO passes will call to modify the profile as result of their changes.	2024-08-07 14:39:48 -04:00
Mircea Trofin	ba4da5a087	[ctx_prof] "Use" support for pre-thinlink. (#101338 ) There is currently no plan to support contextual profiling use in a non- ThinLTO scenario. In the pre-link phase, we only instrument and then immediately bail out to let the linker group functions under an entrypoint in the same module as the entrypoint. We don't actually care what the profile contains - just that we want to use a contextual profile. After that, in post-thinlink, we require the profile be passed again so we can actually use it. The earlier instrumentation will be used to match counter values. While the feature is in development, we add a hidden flag for the use scenario, but we can eventually tie it to the `PGOOptions` mechanism. We will use the same flag in both pre- and post-thinlink, because it simplifies things - usually the post-thinlink args are the same as the ones for pre-. This, despite the flag being basically treated as a boolean in pre-thinlink.	2024-08-02 20:51:27 -04:00
Wei Wang	3a9ef4e69a	[Pipelines] Do not run CoroSplit and CoroCleanup in LTO pre-link pipeline (#100205 ) This is re-land of #90310 after making asan skip pre-split coroutines in #99415. Skip CoroSplit and CoroCleanup in LTO pre-link pipeline so that CoroElide can happen after callee coroutine is imported into caller's module in ThinLTO.	2024-07-29 17:42:01 -07:00
Joseph Huber	8758091a70	[LLVM] Add 'ExpandVariadicsPass' to LTO default pipeline (#100479 ) Summary: This pass expands variadic functions into non-variadic function calls according to the target ABI. Currently, this is used as the lowering for the NVPTX and AMDGPU targets. This pass is currently only run late in the target's backend. However, during LTO we want to run it before the inliner pass so that the expanded functions can be inlined using standard heuristics. This pass is a no-op for unsupported targets, so this won't apply to any code that isn't already using it.	2024-07-25 09:21:05 -05:00
Tianqing Wang	3d494bfc7f	[SimplifyCFG] Increase budget for FoldTwoEntryPHINode() if the branch is unpredictable. (#98495 ) The `!unpredictable` metadata has been present for a long time, but it's usage in optimizations is still limited. This patch teaches `FoldTwoEntryPHINode()` to be more aggressive with an unpredictable branch to reduce mispredictions. A TTI interface `getBranchMispredictPenalty()` is added to distinguish between different hardwares to ensure we don't go too far for simpler cores. For simplicity, only a naive x86 implementation is included for the time being.	2024-07-23 07:47:21 +08:00
xur-llvm	b1ca2a9546	[PGO] Sampled instrumentation in PGO to speed up instrumentation binary (#69535 ) In comparison to non-instrumented binaries, PGO instrumentation binaries can be significantly slower. For highly threaded programs, this slowdown can reach 10x due to data races or false sharing within counters. This patch incorporates sampling into the PGO instrumentation process to enhance the speed of instrumentation binaries. The fundamental concept is similar to the one proposed in https://reviews.llvm.org/D63949. Three sampling modes are introduced: 1. Simple Sampling: When '-sampled-instr-bust-duration' is set to 1. 2. Fast Burst Sampling: When not using simple sampling, and '-sampled-instr-period' is set to 65535. This is the default mode of sampling. 3. Full Burst Sampling: When neither simple nor fast burst sampling is used. Utilizing this sampled instrumentation significantly improves the binary's execution speed. Measurements show up to 5x speedup with default settings. Fast burst sampling now results in only around 20% to 30% slowdown (compared to 8 to 10x slowdown without sampling). Out tests show that profile quality remains good with sampling, with edge counts typically showing more than 90% overlap. For applications whose behavior changes due to binary speed, sampling instrumentation can enhance performance. Observations have shown some apps experiencing up to a ~2% improvement in PGO. A potential drawback of this patch is the increased binary size and compilation time. The Sampling method in this patch does not improve single threaded program instrumentation binary speed.	2024-07-22 09:19:17 -07:00
YAMAMOTO Takashi	5d79110959	[Pipelines] Perform mergefunc after constmerge (#92498 ) Constmerge can fold switch jump tables, possibly making functions identical again. It can help mergefunc. On the other hand, the opposite seems unlikely. Fixes https://github.com/llvm/llvm-project/issues/92201.	2024-07-05 12:28:03 +02:00
Egor Pasko	cab81dd038	[EntryExitInstrumenter] Move passes out of clang into LLVM default pipelines (#92171 ) Move EntryExitInstrumenter(PostInlining=true) to as late as possible and EntryExitInstrumenter(PostInlining=false) to an early pre-inlining stage (but skip for ThinLTO post-link). This should fix the issues reported in https://github.com/rust-lang/rust/issues/92109 and https://github.com/llvm/llvm-project/issues/52853. These are caused by https://reviews.llvm.org/D97608.	2024-05-31 12:48:45 -07:00
Mircea Trofin	d311a62e2f	[ctx_profile] Decouple ctx instrumentation from PGOOpt (#92445 ) We currently don't support passing files and don't need frontend involvement either.	2024-05-16 13:41:36 -07:00
Mircea Trofin	174cdeced0	[nfc] Clarify when the various PGO instrumentation passes run (#92330 ) The code seems easier to read if it's centered on what the user wants rather than combinations of whatever internal variables.	2024-05-16 12:17:22 -07:00
Reid Kleckner	aa0776de46	Revert "[Pipelines] Do not run CoroSplit and CoroCleanup in LTO pre-link pipeline (#90310 )" and related patches This change is incorrect when thinlto and asan are enabled, and this can be observed by adding `-fsanitize=address` to the provided coro-elide-thinlto.cpp test. It results in the error "Coroutines cannot handle non static allocas yet", and ASan introduces a dynamic alloca. In other words, we must preserve the invariant that CoroSplit runs before ASan. If we move CoroSplit to the post post-link compile stage, ASan has to be moved to the post-link compile stage first. It would also be correct to make CoroSplit handle dynamic allocas so the pass ordering doesn't matter, but sanitizer instrumentation really ought to be last, after coroutine splitting. This reverts commit bafc5f42c0132171287d7cba7f5c14459be1f7b7. This reverts commit b1b1bfa7bea0ce489b5ea9134e17a43c695df5ec. This reverts commit 0232b77e145577ab78e3ed1fdbb7eacc5a7381ab. This reverts commit fb2d3056618e3d03ba9a695627c7b002458e59f0. This reverts commit 1cb33713910501c6352d0eb2a15b7a15e6e18695. This reverts commit cd68d7b3c0ebf6da5e235cfabd5e6381737eb7fe.	2024-05-10 21:28:13 +00:00
Mircea Trofin	96568f3539	[llvm][ctx_profile] Add instrumentation lowering (#90821 ) This adds the instrumentation lowering pass. (Tracking Issue: #89287, RFC referenced there)	2024-05-08 16:49:08 -07:00
Wei Wang	bafc5f42c0	[Pipelines][Coroutines] Tune coroutine passes only for ThinLTO pre-link pipeline (#90690 ) Follow up to #90310, limit the tune up only to ThinLTO pre-link as coroutine passes are not in MonoLTO backend	2024-04-30 21:40:04 -07:00
Wei Wang	cd68d7b3c0	[Pipelines] Do not run CoroSplit and CoroCleanup in LTO pre-link pipeline (#90310 ) Skip CoroSplit and CoroCleanup in LTO pre-link pipeline so that CoroElide can happen after callee coroutine is imported into caller's module in ThinLTO.	2024-04-29 10:24:53 -07:00
Arthur Eubanks	947b656add	[PGO] Check that PGOOpt exists before using PGOOpt->ColdOptType (#89139 ) This means that the pass is unusable without some sort of profile. We can revisit this decision later if we want to support running this pass without a profile.	2024-04-18 11:22:10 -07:00
Florian Hahn	0f82469314	[Passes] Run SimpleLoopUnswitch after introducing invariant branches. (#81271 ) IndVars may be able to replace a loop dependent condition with a loop invariant one, but loop-unswitch runs before IndVars, so the invariant check remains in the loop. For an example, consider a read-only loop with a bounds check: https://godbolt.org/z/8cdj4qhbG This patch uses a approach similar to the way extra cleanup passes are run on demand after vectorization (added in acea6e9cfa4c4a0e8678c7). It introduces a new ShouldRunExtraSimpleLoopUnswitch analysis marker, which IndVars can use to indicate that extra unswitching is beneficial. ExtraSimpleLoopUnswitchPassManager uses this analysis to determine whether to run its passes on a loop. Compile-time impact (geomean) ranges from +0.0% to 0.02% https://llvm-compile-time-tracker.com/compare.php?from=138c0beb109ffe47f75a0fe8c4dc2cdabe8a6532&to=19e6e99eeb280d426907ea73a21b139ba7225627&stat=instructions%3Au Compile-time impact (geomean) of unconditionally running SimpleLoopUnswitch ranges from +0.05% - +0.16% https://llvm-compile-time-tracker.com/compare.php?from=138c0beb109ffe47f75a0fe8c4dc2cdabe8a6532&to=2930dfd5accdce2e6f8d5146ae4d626add2065a2&stat=instructions:u Unconditionally running SimpleLoopUnswitch seems to indicate that there are multiple other scenarios where we fail to run unswitching when opportunities remain. Fixes https://github.com/llvm/llvm-project/issues/85551. PR: https://github.com/llvm/llvm-project/pull/81271	2024-04-12 22:07:29 +01:00
lifengxiang1025	e40cabfea4	[MemProf] Match function's summary and definition strictly (#83665 ) Problem description: https://github.com/llvm/llvm-project/pull/81008#issuecomment-1933468520 Solution: https://github.com/llvm/llvm-project/pull/81008#issuecomment-1934192548 (choose plan2)	2024-03-12 11:00:02 +08:00
Paul Kirth	2fef685363	[llvm][loop-rotate] Allow forcing loop-rotation (#82828 ) Many profitable optimizations cannot be performed at -Oz, due to unrotated loops. While this is worse for size (minimally), many of the optimizations significantly reduce code size, such as memcpy optimizations and other patterns found by loop idiom recognition. Related discussion can be found in issue #50308. This patch adds an experimental, backend-only flag to allow loop header duplication, regardless of the optimization level. Downstream consumers can experiment with this flag, and if it is profitable, we can adjust the compiler's defaults accordingly, and expose any useful frontend flags to opt into the new behavior.	2024-02-29 13:46:13 -08:00
Paul Kirth	777ac46ddb	[llvm] Remove pipeline checks for optsize for DFAJumpThreadingPass The pass itself checks whether to apply the optimization based on the minsize attribute, so there isn't much functional benefit to preventing the pass from being added. Gating the pass gets added to the pass pipeline complicates the interaction with -enable-dfa-jump-thread, as well. Reviewers: aeubanks Reviewed By: aeubanks Pull Request: https://github.com/llvm/llvm-project/pull/83318	2024-02-28 11:12:13 -08:00

1 2 3 4 5

217 Commits