llvm-project

Author	SHA1	Message	Date
Joel E. Denny	37e03b56b8	Revert "[PGO] Add `llvm.loop.estimated_trip_count` metadata" (#151585 ) Reverts llvm/llvm-project#148758 [As requested.](https://github.com/llvm/llvm-project/pull/148758#pullrequestreview-3076627201)	2025-07-31 15:56:31 -04:00
Joel E. Denny	f7b65011de	[PGO] Add `llvm.loop.estimated_trip_count` metadata (#148758 ) This patch implements the `llvm.loop.estimated_trip_count` metadata discussed in [[RFC] Fix Loop Transformations to Preserve Block Frequencies](https://discourse.llvm.org/t/rfc-fix-loop-transformations-to-preserve-block-frequencies/85785). As [suggested in the RFC comments](https://discourse.llvm.org/t/rfc-fix-loop-transformations-to-preserve-block-frequencies/85785/4), it adds the new metadata to all loops at the time of profile ingestion and estimates each trip count from the loop's `branch_weights` metadata. As [suggested in the PR #128785 review](https://github.com/llvm/llvm-project/pull/128785#discussion_r2151091036), it does so via a new `PGOEstimateTripCountsPass` pass, which creates the new metadata for each loop but omits the value if it cannot estimate a trip count due to the loop's form. An important observation not previously discussed is that `PGOEstimateTripCountsPass` often cannot estimate a loop's trip count, but later passes can sometimes transform the loop in a way that makes it possible. Currently, such passes do not necessarily update the metadata, but eventually that should be fixed. Until then, if the new metadata has no value, `llvm::getLoopEstimatedTripCount` disregards it and tries again to estimate the trip count from the loop's current `branch_weights` metadata.	2025-07-31 12:28:25 -04:00
Ryotaro Kasuga	3099b7eb5d	[Passes] Move LoopInterchange into optimization pipeline (#145503 ) As mentioned in https://github.com/llvm/llvm-project/pull/145071, LoopInterchange should be part of the optimization pipeline rather than the simplification pipeline. This patch moves LoopInterchange into the optimization pipeline. More contexts: - By default, LoopInterchange attempts to improve data locality, however, it also takes increasing vectorization opportunities into account. Given that, it is reasonable to run it as close to vectorization as possible. - I looked into previous changes related to the placement of LoopInterchange, but couldn’t find any strong motivation suggesting that it benefits other simplifications. - As far as I tried some tests (including llvm-test-suite), removing LoopInterchange from the simplification pipeline does not affect other simplifications. Therefore, there doesn't seem to be much value in keeping it there. - The new position reduces compile-time for ThinLTO, probably because it only runs once per function in post-link optimization, rather than both in pre-link and post-link optimization. I haven't encountered any cases where the positional difference affects optimization results, so please feel free to revert if you run into any issues.	2025-07-04 20:06:53 +09:00
Mircea Trofin	daa2a587cc	[TRE] Adjust function entry count when using instrumented profiles (#143987 ) The entry count of a function needs to be updated after a callsite is elided by TRE: before elision, the entry count accounted for the recursive call at that callsite. After TRE, we need to remove that callsite's contribution. This patch enables this for instrumented profiling cases because, there, we know the function entry count captured entries before TRE. We cannot currently address this for sample-based (because we don't know whether this function was TRE-ed in the binary that donated samples)	2025-06-23 08:07:31 -07:00
Nikita Popov	ae8c85c9ce	[Passes] Remove LoopInterchange from O1 pipeline (#145071 ) This is a fairly exotic pass, I don't think it makes a lot of sense to run it at O1, esp. as vectorization wouldn't run at O1 anyway.	2025-06-23 09:11:03 +02:00
Peter Collingbourne	3fa231f47c	Add SimplifyTypeTests pass. This pass figures out whether inlining has exposed a constant address to a lowered type test, and remove the test if so and the address is known to pass the test. Unfortunately this pass ends up needing to reverse engineer what LowerTypeTests did; this is currently inherent to the design of ThinLTO importing where LowerTypeTests needs to run at the start. Reviewers: teresajohnson Reviewed By: teresajohnson Pull Request: https://github.com/llvm/llvm-project/pull/141327	2025-06-05 11:09:20 -07:00
Snehasish Kumar	16c7b3c9f5	[MemProf] Split MemProfiler into Instrumentation and Use. (#142811 ) Most of the recent development on the MemProfiler has been on the Use part. The instrumentation has been quite stable for a while. As the complexity of the use grows (with undrifting, diagnostics etc) I figured it would be good to separate these two implementations.	2025-06-05 07:36:50 -07:00
Kazu Hirata	228f66807d	[llvm] Remove unused includes (NFC) (#142733 ) These are identified by misc-include-cleaner. I've filtered out those that break builds. Also, I'm staying away from llvm-config.h, config.h, and Compiler.h, which likely cause platform- or compiler-specific build failures.	2025-06-04 12:30:52 -07:00
Min-Yih Hsu	0ab67ec191	[LV][EVL] Introduce the EVLIndVarSimplify Pass for EVL-vectorized loops (#131005 ) When we enable EVL-based loop vectorization w/ predicated tail-folding, each vectorized loop has effectively two induction variables: one calculates the step using (VF x vscale) and the other one increases the IV by values returned from experiment.get.vector.length. The former, also known as canonical IV, is more favorable for analyses as it's "countable" in the sense of SCEV; the latter (EVL-based IV), however, is more favorable to codegen, at least for those that support scalable vectors like AArch64 SVE and RISC-V. The idea is that we use canonical IV all the way until the end of all vectorizers, where we replace it with EVL-based IV using EVLIVSimplify introduced here. Such that we can have the best from both worlds. This Pass is enabled by default in RISC-V. However, since we haven't really vectorize loops with predicate tail-folding by default, this Pass is no-op at this moment.	2025-05-14 13:49:50 -07:00
Chengjun	94d933676c	[AA] Move Target Specific AA before BasicAA (#125965 ) In this change, NVPTX AA is moved before Basic AA to potentially improve compile time. Additionally, it introduces a flag in the `ExternalAAWrapper` that allows other backends to run their target-specific AA passes before Basic AA, if desired. The change works for both New Pass Manager and Legacy Pass Manager. Original implementation by Princeton Ferro <pferro@nvidia.com>	2025-05-07 15:25:48 -07:00
Tianle Liu	e038c5401c	[LTO][Pipelines] Add 0 hot-caller threshold for SamplePGO + FullLTO (#135152 ) If a hot callsite function is not inlined in the 1st build, inlining the hot callsite in pre-link stage of SPGO 2nd build may lead to Function Sample not found in profile file in link stage. It will miss some profile info. ThinLTO has already considered and dealed with it by setting HotCallSiteThreshold to 0 to stop the inline. This patch just adds the same processing for FullLTO.	2025-04-14 11:21:08 +08:00
Mircea Trofin	4c90d977db	[ctxprof] Use the flattened contextual profile pre-thinlink (#134723 ) Flatten the profile pre-thinlink so that ThinLTO has something to work with for the parts of the binary that aren't covered by contextual profiles. Post-thinlink, the flattener is re-run and will actually change profile info, but just for the modules containing contextual trees ("specialized modules"). For the rest, the flattener just yanks out the instrumentation.	2025-04-08 17:30:49 -07:00
Paul Kirth	268c065eab	[fatlto] Add coroutine passes when using FatLTO with ThinLTO (#134434 ) When coroutines are used w/ both -ffat-lto-objects and -flto=thin, the coroutine passes are not added to the optimization pipelines. Ensure they are added before ModuleOptimization to generate a working ELF object. Fixes #134409.	2025-04-07 08:41:49 -07:00
Ellis Hoag	2044dd07da	[InstrProf] Remove -forder-file-instrumentation (#130192 )	2025-03-13 08:28:16 -07:00
Vitaly Buka	3ccacc4e44	Revert "[LTO][Pipelines][Coro] De-duplicate Coro passes" (#129977 ) Reverts llvm/llvm-project#128654 Breaks FatLTO https://github.com/llvm/llvm-project/pull/128654#issuecomment-2700053700	2025-03-06 07:57:30 -08:00
Mircea Trofin	eb1c3ace39	[ctxprof] Override type of instrumentation if `-profile-context-root` is specified (#128940 ) This patch makes it easy to enable ctxprof instrumentation for targets where the build has a bunch of defaults for instrumented PGO that we want to inherit for ctxprof. This is switching experimental defaults: we'll eventually enable ctxprof instrumentation through `PGOOpt` but that type is currently quite entangled and, for the time being, no point adding to that.	2025-02-26 19:56:59 -08:00
Mircea Trofin	f6703a4ff5	[ctxprof] don't inline weak symbols after instrumentation (#128811 ) Contextual profiling identifies functions by GUID. Functions that may get overridden by the linker with a prevailing copy may have, during instrumentation, different variants in different modules. If these variants get inlined before linking (here I assume thinlto), they will identify themselves to the ctxprof runtime as their GUID, leading to issues - they may have different counter counts, for instance. If we block their inlining in the pre-thinlink compilation, only the prevailing copy will survive post-thinlink and the confusion is avoided. The change introduces a small pass just for this purpose, which marks any symbols that could be affected by the above as `noinline` (even if they were `alwaysinline`). We already carried out some inlining (via the preinliner), before instrumenting, so technically the `alwaysinline` directives were honored. We could later (different patch) choose to mark them back to their original attribute (none or `alwaysinline`) post-thinlink, if we want to - but experimentally that doesn't really change much of the performance of the instrumented binary.	2025-02-26 11:01:37 -08:00
Vitaly Buka	31897e651a	[LTO][Pipelines][Coro] De-duplicate Coro passes (#128654 ) ``` if (!isLTOPostLink(Phase)) CoroPM.addPass(CoroEarlyPass()); if (!isLTOPreLink(Phase)) // Other Coro passes ``` Followup to #126168.	2025-02-25 20:14:19 -08:00
Vitaly Buka	30a7c816ee	[LTO][Pipelines][NFC] Exctract isLTOPostLink (#128653 )	2025-02-25 14:56:51 -08:00
Paul Kirth	63c1be7249	[llvm][fatlto] Add FatLTOCleanup pass (#125911 ) When using FatLTO, it is common to want to enable certain types of whole program optimizations (WPD) or security transforms (CFI), so that they can be made available when performing LTO. However, these transforms should not be used when compiling the non-LTO object code. Since the frontend must emit different IR, we cannot simply clone the module and optimize the LTO section and non-LTO section differently to work around this. Instead, we need to remove any problematic instruction sequences. This patch adds a new pass whose responsibility is to clean up the IR in the FatLTO pipeline after creating the bitcode section, which is after running the pre-link pipeline but before running module optimization. This allows us to safely drop any conflicting instructions or IR constructs that are inappropriate for non-LTO compilation.	2025-02-13 11:39:02 -08:00
Vitaly Buka	1032df6f60	[LTO][Pipelines][Coro] Handle coroutines in LTO pipeline (#126168 ) ThinLTO delays handling of coroutines to ThinLTO backend. However it's usually possible to use ThinLTO prelink objects for FullLTO. In this case we have left-over coroutines which crash in codegen. Issue #104525.	2025-02-12 21:39:32 -08:00
Vitaly Buka	be98428374	[NFC][Pipelines] Extract buildCoroConditionalWrapper (#126860 ) Helper for #126168. `Phase` will be used in followup patches.	2025-02-11 23:54:07 -08:00
Sjoerd Meijer	612df14c00	[Clang][Driver] Add an option to control loop-interchange (#125830 ) This introduces options `-floop-interchange` and `-fno-loop-interchange` to enable/disable the loop-interchange pass. This is part of the work that tries to get that pass enabled by default (#124911), where it was remarked that a user facing option to control this would be convenient to have. The option name is the same as GCC's.	2025-02-07 10:31:24 +00:00
Joseph Huber	d9500f5032	[OpenMP] Fix the OpenMPOpt pass incorrectly optimizing if definition was missing Summary: This code is intended to block transformations if the call isn't present, however the way it's coded it silently lets it pass if the definition doesn't exist at all. This previously was always valid since we included the runtime as one giant blob so everything was always there, but now that we want to move towards separate ones, it's not quite correct.	2025-02-06 21:38:36 -06:00
Axel Sorenson	d3161defd6	[PassBuilder] VectorizerEnd Extension Points (#123494 ) Added an extension point after vectorizer passes in the PassBuilder. Additionally, added extension points before and after vectorizer passes in `buildLTODefaultPipeline`. Credit goes to @mshockwave for guiding me through my first LLVM contribution (and my first open source contribution in general!) :) - Implemented `registerVectorizerEndEPCallback` - Implemented `invokeVectorizerEndEPCallbacks` - Added `VectorizerEndEPCallbacks` SmallVector - Added a command line option `passes-ep-vectorizer-end` to `NewPMDriver.cpp` - `buildModuleOptimizationPipeline` now calls `invokeVectorizerEndEPCallbacks` - `buildO0DefaultPipeline` now calls `invokeVectorizerEndEPCallbacks` - `buildLTODefaultPipeline` now calls BOTH `invokeVectorizerStartEPCallbacks` and `invokeVectorizerEndEPCallbacks` - Added LIT tests to `new-pm-defaults.ll`, `new-pm-lto-defaults.ll`, `new-pm-O0-ep-callbacks.ll`, and `pass-pipeline-parsing.ll` - Renamed `CHECK-EP-Peephole` to `CHECK-EP-PEEPHOLE` in `new-pm-lto-defaults.ll` for consistency. This code is intended for developers that wish to implement and run custom passes after the vectorizer passes in the PassBuilder pipeline. For example, in #91796, a pass was created that changed the induction variables of vectorized code. This is right after the vectorization passes.	2025-01-29 11:24:03 -08:00
gulfemsavrun	38902153fe	[PassBuilder] Add RelLookupTableConverterPass to LTO (#124053 ) [PassBuilder] Add RelLookupTableConverterPass to LTO This patch adds RelLookupTableConverterPass into the LTO post-link optimization pass pipeline. This optimization converts lookup tables to relative lookup tables to make them PIC-friendly, which is already included in the non-LTO pass pipeline. This patch adds this optimization to the post-link optimization pipeline to discover more opportunities in the LTO context.	2025-01-28 15:08:03 -08:00
Ryan Mansfield	67efbd0bf1	[LLVM] Fix various cl::desc typos and whitespace issues (NFC) (#121955 )	2025-01-08 11:07:23 +01:00
Florian Hahn	9e66206638	[Passes] Generalize ShouldRunExtraVectorPasses to allow re-use (NFCI). (#118323 ) Generalize ShouldRunExtraVectorPasses to ShouldRunExtraPasses, to allow re-use for other transformations. PR: https://github.com/llvm/llvm-project/pull/118323	2024-12-04 16:55:06 +00:00
Kyungwoo Lee	d23c5c2d65	[CGData] Global Merge Functions (#112671 ) This implements a global function merging pass. Unlike traditional function merging passes that use IR comparators, this pass employs a structurally stable hash to identify similar functions while ignoring certain constant operands. These ignored constants are tracked and encoded into a stable function summary. When merging, instead of explicitly folding similar functions and their call sites, we form a merging instance by supplying different parameters via thunks. The actual size reduction occurs when identically created merging instances are folded by the linker. Currently, this pass is wired to a pre-codegen pass, enabled by the `-enable-global-merge-func` flag. In a local merging mode, the analysis and merging steps occur sequentially within a module: - `analyze`: Collects stable function hashes and tracks locations of ignored constant operands. - `finalize`: Identifies merge candidates with matching hashes and computes the set of parameters that point to different constants. - `merge`: Uses the stable function map to optimistically create a merged function. We can enable a global merging mode similar to the global function outliner (https://discourse.llvm.org/t/rfc-enhanced-machine-outliner-part-2-thinlto-nolto/78753/), which will perform the above steps separately. - `-codegen-data-generate`: During the first round of code generation, we analyze local merging instances and publish their summaries. - Offline using `llvm-cgdata` or at link-time, we can finalize all these merging summaries that are combined to determine parameters. - `-codegen-data-use`: During the second round of code generation, we optimistically create merging instances within each module, and finally, the linker folds identically created merging instances. Depends on #112664 This is a patch for https://discourse.llvm.org/t/rfc-global-function-merging/82608.	2024-11-13 17:34:07 -08:00
Lei Wang	bc1aa2863b	[SampleFDO] Support enabling sample loader pass in O0 mode (#113985 ) Add support for enabling sample loader pass in O0 mode(under `-fsample-profile-use`). This can help verify PGO raw profile count quality or provide a more accurate performance proxy(predictor), as O0 mode has minimal or no compiler optimizations that might otherwise impact profile count accuracy. - Explicitly disable the sample loader inlining to ensure it only emits sampling annotation. - Use flattened profile for O0 mode. - Add the pass after `AddDiscriminatorsPass` pass to work with `-fdebug-info-for-profiling`.	2024-11-08 15:29:44 -08:00
Yuxuan Chen	c6414970d7	[Coroutines] Inline the `.noalloc` ramp function marked coro_safe_elide (#114004 )	2024-11-07 22:41:32 -08:00
Hari Limaye	fbd89bcc66	Reland "[LTO] Run Argument Promotion before IPSCCP" (#111853 ) Run ArgumentPromotion before IPSCCP in the LTO pipeline, to expose more constants to be propagated. We also run PostOrderFunctionAttrs to improve the information available to ArgumentPromotion's alias analysis, and SROA to clean up allocas. Relands #111163.	2024-11-06 13:54:48 +00:00
Shilei Tian	390300d9f4	[PassBuilder] Add `ThinOrFullLTOPhase` to optimizer pipeline (#114577 )	2024-11-03 23:25:29 -05:00
Shilei Tian	dc45ff1d2a	[PassBuilder] Add `ThinOrFullLTOPhase` to early simplication EP call backs (#114547 ) The early simplication pipeline is used in non-LTO and (Thin/Full)LTO pre-link stage. There are some passes that we want them in non-LTO mode, but not at LTO pre-link stage. The control is missing currently. This PR adds the support. To demonstrate the use, we only enable the internalization pass in non-LTO mode for AMDGPU because having it run in pre-link stage causes some issues.	2024-11-03 23:24:10 -05:00
Shilei Tian	5445edb5d6	[PassBuilder] Replace `bool LTOPreLink` with `ThinOrFullLTOPhase Phase` (#114564 ) This will allow more fine-grained control in the future.	2024-11-01 14:56:35 -04:00
Lei Wang	bef3b54ea1	[InstrPGO] Avoid using global variable to fix potential data race (#114364 ) In https://github.com/llvm/llvm-project/pull/109837, it sets a global variable(`PGOInstrumentColdFunctionOnly`) in PassBuilderPipelines.cpp which introduced a data race detected by TSan. To fix this, I decouple the flag setting, the flags are now set separately(`instrument-cold-function-only-path` is required to be used with `--pgo-instrument-cold-function-only`).	2024-10-31 21:28:13 -07:00
Paul Kirth	913cd11f94	[llvm][fatlto] Drop any CFI related instrumentation after emitting bitcode (#112788 ) We want to support CFI instrumentation for the bitcode section, without miscompiling the object code portion of a FatLTO object. We can reuse the existing mechanisms in the LowerTypeTestsPass to do that, by just adding the pass to the FatLTO pipeline after the EmbedBitcodePass with the correct options set. Fixes #112053	2024-10-31 12:40:21 -07:00
Dmitry Chernenkov	d924a9ba03	Revert "[InstrPGO] Support cold function coverage instrumentation (#109837 )" This reverts commit e517cfc531886bf6ed64b4e7109bb3141ac7f430.	2024-10-31 10:55:17 +00:00
Paul Kirth	b01e2a8b56	[llvm] Allow always dropping all llvm.type.test sequences Currently, the `DropTypeTests` parameter only fully works with phi nodes and llvm.assume instructions. However, we'd like CFI to work in conjunction with FatLTO, in so far as the bitcode section should be able to contain the CFI instrumentation, while any incompatible bits are dropped when compiling the object code. To do that, we need to drop the llvm.type.test instructions everywhere, and not just their uses in phi nodes. This patch updates the LowerTypeTest pass so that uses are removed, and replaced with `true` in all cases, and not just in phi nodes. Addressing this will allow us to fix #112053 by modifying the FatLTO pipeline. Reviewers: pcc, nikic Reviewed By: pcc Pull Request: https://github.com/llvm/llvm-project/pull/112787	2024-10-30 16:56:30 -07:00
Lei Wang	e517cfc531	[InstrPGO] Support cold function coverage instrumentation (#109837 ) This patch adds support for cold function coverage instrumentation based on sampling PGO counts. The major motivation is to detect dead functions for the services that are optimized with sampling PGO. If a function is covered by sampling profile count (e.g., those with an entry count > 0), we choose to skip instrumenting those functions, which significantly reduces the instrumentation overhead. More details about the implementation and flags: - Added a flag `--pgo-instrument-cold-function-only` in `PGOInstrumentation.cpp` as the main switch to control skipping the instrumentation. - Built the extra instrumentation passes(a bundle of passes in `addPGOInstrPasses`) under sampling PGO pipeline. This is controlled by `--instrument-cold-function-only-path` flag. - Added a driver flag `-fprofile-generate-cold-function-coverage`: - 1) Config the flags in one place, i,e. adding `--instrument-cold-function-only-path=<...>` and `--pgo-function-entry-coverage`. Note that the instrumentation file path is passed through `--instrument-sample-cold-function-path`, because we cannot use the `PGOOptions.ProfileFile` as it's already used by `-fprofile-sample-use=<...>`. - 2) makes linker to link `compiler_rt.profile` lib(see [ToolChain.cpp#L1125-L1131](https://github.com/llvm/llvm-project/blob/main/clang/lib/Driver/ToolChain.cpp#L1125-L1131) ). - Added a flag(`--pgo-cold-instrument-entry-threshold`) to config entry count to determine cold function. Overall, the full command is like: ``` clang++ -O2 -fprofile-generate-cold-function-coverage=<...> -fprofile-sample-use=<...> code.cc -o code ```	2024-10-28 10:13:45 -07:00
Teresa Johnson	1de71652fd	[MemProf] Support cloning for indirect calls with ThinLTO (#110625 ) This patch enables support for cloning in indirect callsites. This is done by synthesizing callsite records for each virtual call target from the profile metadata. In the thin link all the synthesized records for a particular indirect callsite initially share the same context node, but support is added to partition the callsites and outgoing edges based on the callee function, creating a separate node for each target. In the LTO backend, when cloning is needed we first perform indirect call promotion, then change the target of the new direct call to the desired clone. Note this is ThinLTO-specific, since for regular LTO indirect call promotion should have already occurred.	2024-10-11 13:53:35 -07:00
Arthur Eubanks	e34d614e7d	[Passes] Remove -enable-infer-alignment-pass flag (#111873 ) This flag has been on for a while without any complaints.	2024-10-10 12:28:46 -07:00
Hari Limaye	0a0f100a70	Revert "[LTO] Run Argument Promotion before IPSCCP" (#111839 ) Reverts llvm/llvm-project#111163, as this was merged prematurely.	2024-10-10 15:03:01 +01:00
Hari Limaye	b9754e9d28	[LTO] Run Argument Promotion before IPSCCP (#111163 ) Run ArgumentPromotion before IPSCCP in the LTO pipeline, to expose more constants to be propagated. We also run PostOrderFunctionAttrs to improve the information available to ArgumentPromotion's alias analysis, and SROA to clean up allocas.	2024-10-10 06:08:27 -04:00
Mircea Trofin	9c0ba62010	[ctx_prof] Relax the "profile use" case around `PGOOpt` (#108265 ) `PGOOpt` could have a value if, for instance, debug info for profiling is requested. Relaxing the requirement, for now, following that eventually we would factor `PGOOpt` to better capture the supported interplay between the various profiling options.	2024-09-11 13:39:50 -07:00
Yuxuan Chen	761bf333e3	[LLVM][Coroutines] Switch CoroAnnotationElidePass to a FunctionPass (#107897 ) After landing https://github.com/llvm/llvm-project/pull/99285 we found that the call graph update was causing the following crash when expensive checks are turned on ``` llvm-project/llvm/lib/Analysis/CGSCCPassManager.cpp:982: LazyCallGraph::SCC &updateCGAndAnalysisManagerForPass(LazyCallGraph &, LazyCallGraph::SCC &, LazyCallGraph::Node &, CGSCCAnalysisManager &, CGSCCUpdateResult &, FunctionAnalysisManager &, bool): Assertion `(RC == &TargetRC \|\| RC->isAncestorOf(Targe tRC)) && "New call edge is not trivial!"' failed. ``` I have to admit I believe that the call graph update process I did for that patch could be wrong. After reading the code in `CGSCCToFunctionPassAdaptor`, I am convinced that `CoroAnnotationElidePass` can be a FunctionPass and rely on the adaptor to update the call graph for us, so long as we properly invalidate the caller's analyses. After this patch, `llvm/test/Transforms/Coroutines/coro-transform-must-elide.ll` no longer fails under expensive checks.	2024-09-09 18:57:39 -07:00
Mircea Trofin	3b22618094	[ctx_prof] Insert the ctx prof flattener after the module inliner (#107499 ) This patch enables experimenting with the contextual profile. ICP is currently disabled in this case - will reenable it subsequently. Also subsequently the inline cost model / decision making would be updated to be context-aware. Right now, this just achieves "complete use" of the profile, in that it's ingested, maintained, and sunk to a flat profile when not needed anymore. Issue [#89287](https://github.com/llvm/llvm-project/issues/89287)	2024-09-09 18:16:24 -07:00
Yuxuan Chen	a416267a5f	[LLVM][Coroutines] Transform "coro_elide_safe" calls to switch ABI coroutines to the `noalloc` variant (#99285 ) This patch is episode three of the middle end implementation for the coroutine HALO improvement project published on discourse: https://discourse.llvm.org/t/language-extension-for-better-more-deterministic-halo-for-c-coroutines/80044 After we attribute the calls to some coroutines as "coro_elide_safe" in the C++ FE and creating a `noalloc` ramp function, we use a new middle end pass to move the call to coroutines to the noalloc variant. This pass should be run after CoroSplit. For each node we process in CoroSplit, we look for its callers and replace the attributed ones in presplit coroutines to the noalloc one. The transformed `noalloc` ramp function will also require a frame pointer to a block of memory it can use as an activation frame. We allocate this on the caller's frame with an alloca. Please note that we cannot safely transform such attributed calls in post-split coroutines due to memory lifetime reasons. The CoroSplit pass is responsible for creating the coroutine frame spills for all the allocas in the coroutine. Therefore it will be unsafe to create new allocas like this one in post-split coroutines. This happens relatively rarely because CGSCC performs the passes on the callees before the caller. However, if multiple coroutines coexist in one SCC, this situation does happen (and prevents us from having potentially unbound frame size due to recursion.) You can find episode 1: Clang FE of this patch series at https://github.com/llvm/llvm-project/pull/99282 Episode 2: CoroSplit at https://github.com/llvm/llvm-project/pull/99283	2024-09-08 23:09:40 -07:00
Mingming Liu	d4ddf06b0c	[NFCI]Remove EntryCount from FunctionSummary and clean up surrounding synthetic count passes. (#107471 ) The primary motivation is to remove `EntryCount` from `FunctionSummary`. This frees 8 bytes out of `sizeof(FunctionSummary)` (136 bytes as of `64498c5483`). While I'm at it, this PR clean up {SummaryBasedOptimizations, SyntheticCountsPropagation} since they were not used and there are no plans to further invest on them. With this patch, bitcode writer writes a placeholder 0 at the byte offset of `EntryCount` and bitcode reader can parse the function entry count at the correct byte offset. Added a TODO to stop writing `EntryCount` and bump bitcode version	2024-09-06 16:38:17 -07:00
Mircea Trofin	775c50709c	[ctx_prof] Flattened profile lowering pass (#107329 ) Pass to flatten and lower the contextual profile to profile (i.e. `MD_prof`) metadata. This is expected to be used after all IPO transformations have happened. Prior to lowering, the instrumentation is maintained during IPO and the contextual profile is kept in sync (see PRs #105469, #106154). Flattening (#104539) sums up all the counters belonging to all a function's context nodes. We first propagate counter values (from the flattened profile) using the same propagation algorithm as `PGOUseFunc::populateCounters`, then map the edge values to `branch_weights`. Functions. in the module that don't have an entry in the flattened profile are deemed cold, and any `MD_prof` metadata they may have is reset. The profile summary is also reset at this point. Issue [#89287](https://github.com/llvm/llvm-project/issues/89287)	2024-09-06 13:47:08 -07:00

1 2 3 4 5

243 Commits