477 Commits

Author SHA1 Message Date
Nikita Popov
c23b4fbdbb
[IR] Remove size argument from lifetime intrinsics (#150248)
Now that #149310 has restricted lifetime intrinsics to only work on
allocas, we can also drop the explicit size argument. Instead, the size
is implied by the alloca.

This removes the ability to only mark a prefix of an alloca alive/dead.
We never used that capability, so we should remove the need to handle
that possibility everywhere (though many key places, including stack
coloring, did not actually respect this).
2025-08-08 11:09:34 +02:00
Mircea Trofin
f675483905
[profcheck] Annotate select instructions (#152171)
For `select`, we don't have the equivalent of the branch probability analysis to offer defaults, so we make up our own and allow their overriding with flags.

Issue #147390
2025-08-06 02:48:50 +02:00
Mircea Trofin
9a60841dc4
[PGO][profcheck] ignore explicitly cold functions (#151778)
There is a case when branch profile metadata is OK to miss, namely, cold functions. The goal of the RFC (see the referenced issue) is to avoid accidental omission (and, at a later date, corruption) of profile metadata. However, asking cold functions to have all their conditional branches marked with "0" probabilities would be overdoing it. We can just ask cold functions to have an explicit 0 entry count.

This patch:
- injects an entry count for functions, unless they have one (synthetic or not)
- if the entry count is 0, doesn't inject, nor does it verify the rest of the metadata
- at verification, if the entry count is missing, it reports an error
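The three bullets above can be sketched as follows; this is a minimal illustration with invented names and an invented default count, not the actual pass code.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical sketch of the policy described above: injection adds an
// entry count unless one exists; functions with an explicit 0 entry
// count are skipped by the verifier, which otherwise reports a missing
// entry count (or missing branch weights) as an error.
struct FuncProfile {
  std::optional<unsigned> EntryCount; // profile entry count, if any
  bool BranchesAnnotated = false;     // do all cond. branches carry weights?
};

constexpr unsigned DefaultInjectedCount = 1000; // made-up default

void injectEntryCount(FuncProfile &F) {
  if (!F.EntryCount)
    F.EntryCount = DefaultInjectedCount; // inject unless already present
}

// Returns an error message, or an empty string if the function verifies.
std::string verifyProfile(const FuncProfile &F) {
  if (!F.EntryCount)
    return "missing entry count";
  if (*F.EntryCount == 0)
    return ""; // explicitly cold: rest of the metadata is not checked
  if (!F.BranchesAnnotated)
    return "missing branch weights";
  return "";
}
```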

Issue #147390
2025-08-04 03:53:49 +02:00
Joel E. Denny
37e03b56b8
Revert "[PGO] Add llvm.loop.estimated_trip_count metadata" (#151585)
Reverts llvm/llvm-project#148758

[As
requested.](https://github.com/llvm/llvm-project/pull/148758#pullrequestreview-3076627201)
2025-07-31 15:56:31 -04:00
Joel E. Denny
f7b65011de
[PGO] Add llvm.loop.estimated_trip_count metadata (#148758)
This patch implements the `llvm.loop.estimated_trip_count` metadata
discussed in [[RFC] Fix Loop Transformations to Preserve Block
Frequencies](https://discourse.llvm.org/t/rfc-fix-loop-transformations-to-preserve-block-frequencies/85785).
As [suggested in the RFC
comments](https://discourse.llvm.org/t/rfc-fix-loop-transformations-to-preserve-block-frequencies/85785/4),
it adds the new metadata to all loops at the time of profile ingestion
and estimates each trip count from the loop's `branch_weights` metadata.
As [suggested in the PR #128785
review](https://github.com/llvm/llvm-project/pull/128785#discussion_r2151091036),
it does so via a new `PGOEstimateTripCountsPass` pass, which creates the
new metadata for each loop but omits the value if it cannot estimate a
trip count due to the loop's form.

An important observation not previously discussed is that
`PGOEstimateTripCountsPass` *often* cannot estimate a loop's trip count,
but later passes can sometimes transform the loop in a way that makes it
possible. Currently, such passes do not necessarily update the metadata,
but eventually that should be fixed. Until then, if the new metadata has
no value, `llvm::getLoopEstimatedTripCount` disregards it and tries
again to estimate the trip count from the loop's current
`branch_weights` metadata.
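The estimation from `branch_weights` can be sketched roughly as below; this is not the exact LLVM implementation, just the usual estimate derived from the latch branch's weights.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Rough sketch: given the latch branch's weights -- how often the
// backedge was taken vs. how often the loop exited -- the estimated
// trip count is backedge/exit + 1, i.e. the average number of header
// executions per loop entry. No estimate is possible when the loop
// never exited in the profile (exit weight 0).
std::optional<uint64_t> estimateTripCount(uint64_t BackedgeWeight,
                                          uint64_t ExitWeight) {
  if (ExitWeight == 0)
    return std::nullopt; // loop form/profile does not allow an estimate
  return BackedgeWeight / ExitWeight + 1;
}
```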
2025-07-31 12:28:25 -04:00
Mircea Trofin
931228e28f
[PGO] Drive profile validator from opt (#147418)
Add option to `opt` to run the `ProfileInjectorPass` before the passes opt would run, and then `ProfileVerifierPass` after. This will then be a mode in which we run tests on a specialized buildbot, with the goal of finding passes that drop (and, later, corrupt) profile information.
2025-07-26 16:14:00 +02:00
xur-llvm
c9a8e15494
[ICP] Add a few tunings to indirect-call-promotion (#149892)

Indirect-call promotion (ICP) has been adjusted with the following
tunings:
(1) Candidate functions can now be ICP'd even if only a declaration is
    present.
(2) All non-cold candidate functions are now considered by ICP.
    Previously, only hot targets were considered.
(3) If one target cannot be ICP'd, proceed with the remaining targets
    instead of exiting the callsite.
This update hides all tunings under internal options and disables them
by default. They'll be enabled in a later update. There'll also be
another update to address the "not found" issue with indirect targets.
2025-07-24 09:55:28 -07:00
Mircea Trofin
df2d2d125b
[PGO] Add ProfileInjector and ProfileVerifier passes (#147388)
Adding 2 passes, one to inject `MD_prof` and one to check its presence. A subsequent patch will add these (similar to debugify) to `opt` (and, eventually, a variant of this, to `llc`)

Tracking issue: #147390
2025-07-23 21:34:58 +02:00
Snehasish Kumar
70233c61d6
Add minimum count threshold for indirect call promotion (#145282)
Allow users to set the minimum absolute count for indirect call promotion. This is primarily meant to control indirect call promotion for the synthetic value profile metadata introduced in #141164 for use by MemProf.
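The threshold check amounts to filtering value-profile targets by absolute count; a minimal sketch with invented names:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Illustrative sketch (names invented): keep only value-profile targets
// whose absolute count meets the minimum promotion threshold.
struct PromotionCandidate {
  uint64_t TargetGUID;
  uint64_t Count;
};

std::vector<PromotionCandidate>
filterByMinCount(const std::vector<PromotionCandidate> &Targets,
                 uint64_t MinCount) {
  std::vector<PromotionCandidate> Kept;
  for (const auto &C : Targets)
    if (C.Count >= MinCount)
      Kept.push_back(C); // below-threshold targets are not promoted
  return Kept;
}
```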
2025-06-26 12:10:59 -07:00
Teresa Johnson
3ec2de2753
[MemProf] Optionally save context size info on largest cold allocations (#142837)
Reapply PR142507 with fix for test: add in the same x86_64-linux
requirement as other tests as the stack ids are currently computed
differently on big endian systems. This will be investigated separately.

In order to allow selective reporting of context hinting during the LTO
link, and in the future to allow selective more aggressive cloning, add
an option to specify a minimum percent of the max cold size in the
profile summary. Contexts that meet that threshold will get context size
info metadata (and ThinLTO summary information) on the associated
allocations.

Specifying -memprof-report-hinted-sizes during the pre-LTO compile step
will continue to cause all contexts to receive this metadata. But
specifying -memprof-report-hinted-sizes only during the LTO link will
cause only those that meet the new threshold and have the metadata to
get reported.

To support this, because the alloc info summary and associated bitcode
requires the context size information to be in the same order as the
other context information, 0s are inserted for contexts without this
metadata. The bitcode writer uses a more compact format for the context
ids to allow better compression of the 0s.

As part of this change several helper methods are added to query whether
metadata contains context size info on any or all contexts.
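The threshold plus 0-padding scheme can be sketched as below; a simplified illustration, not the bitcode writer itself.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Sketch of the scheme described above (simplified): contexts whose cold
// size meets the minimum percent of the max cold size keep their size
// info; all others contribute a 0 placeholder, because the serialized
// size list must stay in the same order as (and parallel to) the other
// per-context information.
std::vector<uint64_t> buildSizeList(const std::vector<uint64_t> &ColdSizes,
                                    uint64_t MaxColdSize,
                                    unsigned MinPercent) {
  std::vector<uint64_t> Out;
  for (uint64_t S : ColdSizes)
    Out.push_back(S * 100 >= MaxColdSize * MinPercent ? S : 0);
  return Out;
}
```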
2025-06-04 13:08:56 -07:00
Mircea Trofin
e0909003ff
[ctxprof] Instrumentation: handle direct call targets to aliases (#142657)
This was an oversight. GlobalAliases aren't `Functions`, so `getCalledFunction` would return `nullptr` and the callsite would be deemed uninstrumentable.
2025-06-04 13:04:56 -07:00
Snehasish Kumar
d245b410a3
Revert "[MemProf] Drop unnecessary REQUIRES: x86-linux directives." (#142816)
Reverts llvm/llvm-project#142718

Breaks ppc aix builds:
https://lab.llvm.org/buildbot/#/builders/64/builds/4036
2025-06-04 10:08:55 -07:00
Snehasish Kumar
a87c4eef1d
[MemProf] Drop unnecessary REQUIRES: x86-linux directives. (#142718)
These tests now use the YAML profile and should work across all
platforms.
2025-06-04 08:37:02 -07:00
Teresa Johnson
6c1091ea3f
Revert "[MemProf] Optionally save context size info on largest cold allocations" (#142688)
Reverts llvm/llvm-project#142507 due to buildbot failures that I will
look into tomorrow.
2025-06-03 16:05:16 -07:00
Teresa Johnson
f2adae5780
[MemProf] Optionally save context size info on largest cold allocations (#142507)
In order to allow selective reporting of context hinting during the LTO
link, and in the future to allow selective more aggressive cloning, add
an option to specify a minimum percent of the max cold size in the
profile summary. Contexts that meet that threshold will get context size
info metadata (and ThinLTO summary information) on the associated
allocations.

Specifying -memprof-report-hinted-sizes during the pre-LTO compile step
will continue to cause all contexts to receive this metadata. But
specifying -memprof-report-hinted-sizes only during the LTO link will
cause only those that meet the new threshold and have the metadata to
get reported.

To support this, because the alloc info summary and associated bitcode
requires the context size information to be in the same order as the
other context information, 0s are inserted for contexts without this
metadata. The bitcode writer uses a more compact format for the context
ids to allow better compression of the 0s.

As part of this change several helper methods are added to query whether
metadata contains context size info on any or all contexts.
2025-06-03 14:20:38 -07:00
Kazu Hirata
c261bb7649
[memprof] Deduplicate alloc site matches (#142334)
With:

  commit 2425626d803002027cbf71c39df80cb7b56db0fb
  Author: Kazu Hirata <kazu@google.com>
  Date:   Sun Jun 1 08:09:58 2025 -0700

we print out a lot of duplicate alloc site matches.

This patch partially reverts the patch above.  The core idea of using
a map to deduplicate entries remains the same, but details are
different.  Specifically:

- This PR uses the [FullStackID, MatchLength] as the key, where
  MatchLength is the length of an alloc site match.

- AllocMatchInfo in this PR no longer has Matched because we always
  report matches.

- AllocMatchInfo in this PR no longer has NumFramesMatched because it
  has become part of the key.

This deduplication roughly halves the number of messages printed out.
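The keying scheme described above can be sketched as a simple map; types are simplified stand-ins for the real ones.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Sketch of the deduplication scheme: keyed by (FullStackId, MatchLength),
// so repeated reports of the same match collapse onto one entry while
// matches of different lengths for the same stack remain distinct.
using AllocMatchKey =
    std::pair<uint64_t /*FullStackId*/, unsigned /*MatchLength*/>;

std::map<AllocMatchKey, unsigned>
deduplicate(const std::vector<AllocMatchKey> &Matches) {
  std::map<AllocMatchKey, unsigned> Counts;
  for (const auto &M : Matches)
    ++Counts[M]; // duplicates collapse onto the same key
  return Counts;
}
```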
2025-06-02 07:59:34 -07:00
Kazu Hirata
2425626d80
[memprof] Print alloc site matches immediately (#142233)
Without this patch, we buffer alloc site matches in
FullStackIdToAllocMatchInfo and then print them out at the end of
MemProfUsePass.

This practice is problematic when we have multiple matches per alloc
site.  Consider:

  char *f1() { return new char[3]; }
  char *f2() { return f1(); }
  __attribute__((noinline)) char *f3() { return f2(); }

In this example, f1 contains an alloc site, of course, but so do f2
and f3 via inlining.  When something like this happens,
FullStackIdToAllocMatchInfo gets updated multiple times for the same
full stack ID at:

  FullStackIdToAllocMatchInfo[FullStackId] = { ... };

with different InlinedCallStack.size() each time.

This patch changes the behavior by immediately printing out alloc site
matches, potentially printing out multiple matches for the same
FullStackId.  It is up to the consumer of the message to figure out
the length of the longest matches for example.

For the test, this test adjusts an existing one,
memprof-dump-matched-alloc-site.ll.  Specifically, this patch
"restores" the IR and corresponding profile for f2 and f1 so that the
compiler generates a "MemProf notcold" message for each of f1, f2, and
f3.
2025-06-01 08:09:58 -07:00
Snehasish Kumar
c7b421deac
[MemProf] Attach value profile metadata to the IR using CalleeGuids. (#141164)
Use the newly introduced CalleeGuids in CallSiteInfo to annotate the IR
where necessary with value profile metadata. Use a synthetic count of 1
since we don't have actual counts in the profile collection.
2025-05-31 12:53:30 -07:00
xur-llvm
a004c703bc
[PGO] Make the PGO instrumentation insert point after alloca (#142043)
We're changing PGO instrumentation to insert the intrinsic after alloca
instructions. For sampled instrumentation, a conditional check is placed
before the intrinsic. If this intrinsic comes before an alloca, the
alloca (whose size might be unknown due to a Phi node) becomes
conditional, resulting in inefficient code. We have seen some stack
overflows due to this.

This patch guarantees the intrinsic is always after the alloca.
2025-05-30 14:37:06 -07:00
Teresa Johnson
49d48c32e0
[MemProf] Emit remarks when hinting allocations not needing cloning (#141859)
The context disambiguation code already emits remarks when hinting
allocations (by adding hotness attributes) during cloning. However,
we did not yet emit hints when applying the hotness attributes during
building of the metadata (during matching and again after inlining).
Add remarks when we apply the hint attributes for these
non-context-sensitive allocations.
2025-05-28 16:44:44 -07:00
Teresa Johnson
cc6f446d38
[MemProf] Add basic summary section support (#141805)
This patch adds support for a basic MemProf summary section, which is
built along with the indexed MemProf profile (e.g. when reading the raw
or YAML profiles), and serialized through the indexed profile just after
the header.

Currently only 6 fields are written, specifically the number of contexts
(total, cold, hot), and the max context size (cold, warm, hot).

To support forwards and backwards compatibility for added fields in the
indexed profile, the number of fields is serialized first. The code is
written to support forwards compatibility (reading newer profiles with
additional summary fields), and comments indicate how to implement
backwards compatibility (reading older profiles with fewer summary
fields) as needed.

Support is added to print the summary as YAML comments when displaying
both the raw and indexed profiles via `llvm-profdata show`. Because they
are YAML comments, the YAML reader ignores these (the summary is always
recomputed when building the indexed profile as described above).

This necessitated moving some options and a couple of interfaces out of
Analysis/MemoryProfileInfo.cpp and into the new
ProfileData/MemProfSummary.cpp file, as we need to classify context
hotness earlier and also compute context ids to build the summary from
older indexed profiles.
2025-05-28 13:12:41 -07:00
Arthur Eubanks
5ab017a30f
[PGO] Don't unconditionally request BBInfo in verifyFuncBFI() (#140804)
This breaks in the case where there are unreachable blocks after an
entry block with no successors, which don't have a `BBInfo`, causing
crashes.

`BBInfo` doesn't exist for unreachable blocks, see
https://reviews.llvm.org/D27280.

Fixes #135828.
2025-05-27 09:47:08 -07:00
Teresa Johnson
8836d68a0d
[MemProf] Optionally discard small non-cold contexts (#139113)
Adds a new option -memprof-callsite-cold-threshold that allows
specifying a percent that will cause non-cold contexts to be discarded
if the percent cold bytes at a callsite including that context exceeds
the given threshold. Default is 100% (no discarding).
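The discard condition can be sketched as a single percentage check; a minimal illustration of the rule described above, not the pass code.

```cpp
#include <cassert>
#include <cstdint>

// Sketch: discard non-cold contexts at a callsite when the cold-byte
// percentage there meets the threshold (the default of 100 effectively
// means no discarding, since callsites are rarely 100% cold).
bool shouldDiscardNotCold(uint64_t ColdBytes, uint64_t TotalBytes,
                          unsigned ThresholdPercent) {
  if (TotalBytes == 0)
    return false;
  return ColdBytes * 100 >= TotalBytes * ThresholdPercent;
}
```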

This reduces the amount of cloning needed to expose cold allocation
contexts when parts of the context are dominantly cold.

This motivated the change in PR138792, since discarding a context might
require a different decision about which not-cold contexts must be kept
to expose cloning requirements, so we need to determine that on the fly.

Additionally, this required a change to include the context size
information in the alloc trie in more cases, so we now guard the
inclusion of this information in the generated metadata on the option
values.
2025-05-09 15:56:54 -07:00
Teresa Johnson
764614e635
[MemProf] Restructure the pruning of unneeded NotCold contexts (#138792)
This change is mostly NFC, other than the addition of a new message
printed when contexts are pruned when -memprof-report-hinted-sizes is
enabled.

To prepare for a follow on change, adjust the way we determine which
NotCold contexts can be pruned (because they overlap with longer NotCold
contexts), and change the way we perform this pruning.

Instead of determining the points at which we need to keep NotCold
contexts during the building of the trie, we now determine this on the
fly as the MIB metadata nodes are recursively built. This simplifies a
follow on change that performs additional pruning of some NotCold
contexts, and which can affect which others need to be kept as the
longest overlapping NotCold contexts.
2025-05-07 17:34:44 -07:00
Kazu Hirata
cb96a3dc07
[memprof] Dump the number of matched frames (#137082)
This patch teaches readMemprof to dump the number of frames for each
allocation site match.  This information helps us analyze what part of
the call stack in the MemProf profile has matched the IR.

Aside from updating existing test cases, this patch adds one more test
case, memprof-dump-matched-alloc-site.ll, because none of the existing
test cases has the number of frames greater than one.
2025-04-23 21:29:16 -07:00
Mircea Trofin
1576fa1010
[ctxprof] Extend the notion of "cannot return" (#135651)
At the time of instrumentation (and instrumentation lowering), `noreturn` is not applied uniformly. Rather than running the `FunctionAttrs` pass, we just need to use `llvm::canReturn`, exposed in PR #135650
2025-04-16 10:39:34 -07:00
Mircea Trofin
e7aed23d32
[ctxprof] Handle instrumenting functions with musttail calls (#135121)
Functions with `musttail` calls can't be roots because we can't instrument their `ret` to release the context. This patch tags their `CtxRoot` field in their `FunctionData`. In compiler-rt we then know not to allow such functions become roots, and also not confuse `CtxRoot == 0x1` with there being a context root.

Currently we also lose the context tree under such cases. We can, in a subsequent patch, have the root detector search past these functions.
2025-04-14 10:01:25 -07:00
Mircea Trofin
4c90d977db
[ctxprof] Use the flattened contextual profile pre-thinlink (#134723)
Flatten the profile pre-thinlink so that ThinLTO has something to work with for the parts of the binary that aren't covered by contextual profiles. Post-thinlink, the flattener is re-run and will actually change profile info, but just for the modules containing contextual trees ("specialized modules"). For the rest, the flattener just yanks out the instrumentation.
2025-04-08 17:30:49 -07:00
Mircea Trofin
cfa6a5940e
[ctxprof] Don't lower instrumentation for noreturn functions (#134932)
`noreturn` functions are doubtfully interesting for performance optimization / profiling.
2025-04-08 14:48:41 -07:00
Mircea Trofin
b2dea4fd22
[ctxprof] root autodetection mechanism (#133147)
This is an optional mechanism that automatically detects roots. It's a best-effort mechanism, and its main goal is to *avoid* pointing at the message pump function as a root. This is the function that polls message queue(s) in an infinite loop, and is thus a bad root (it never exits).

High-level, when collection is requested - which should happen when a server has already been set up and is handling requests - we spend a bit of time sampling all the server's threads. Each sample is a stack which we insert in a `PerThreadCallsiteTrie`. After a while, we run for each `PerThreadCallsiteTrie` the root detection logic. We then traverse all the `FunctionData`, find the ones matching the detected roots, and allocate a `ContextRoot` for them. From here, we special-case `FunctionData` objects, in `__llvm_ctx_profile_get_context`, that have a `CtxRoot` and route them to `__llvm_ctx_profile_start_context`.

For this to work, on the llvm side, we need to have all functions call `__llvm_ctx_profile_release_context` because they _might_ be roots. This comes at a slight (percentages) penalty during collection - which we can afford since the overall technique is ~5x faster than normal instrumentation. We can later explore conditionally enabling autoroot detection and avoiding this penalty, if desired. 

Note that functions that `musttail call` can't have their return instrumented this way, and a subsequent patch will harden the mechanism against this case.

The mechanism could be used in combination with explicit root specification, too.
2025-04-08 06:59:38 -07:00
Mircea Trofin
1757a235e3
[ctxprof] Make ContextRoot an implementation detail (#131416)
`ContextRoot` and `FunctionData` are currently known by the llvm side, which has to instantiate and zero-initialize them.

This patch makes `FunctionData` the only global value that needs to be known and instantiated by the compiler. On the compiler-rt side, `ContextRoot`s are hung off `FunctionData`, when applicable.

This is for two reasons. First, it is a step towards root autodetection (in a subsequent patch). An autodetection mechanism would instantiate the `ContextRoot` for the detected roots, and then `__llvm_ctx_profile_get_context` would detect that and route to `__llvm_ctx_profile_start_context`.

The second reason is that we will hang off `ContextRoot` more complex datatypes (next patch), and we want to avoid too deep of a coupling between llvm and compiler-rt. Acting as a place to hang related data, `FunctionData` can stay simple - pointers and an (atomic) int (the mutex).
2025-03-18 22:03:26 -07:00
Mircea Trofin
215c47e4d3
[ctxprof] Missing test update post #131201 (#131428) 2025-03-14 21:46:10 -07:00
Mircea Trofin
a5b95487d6
[ctxprof] Missing test for #131269 (#131271) 2025-03-13 21:45:17 -07:00
Snehasish Kumar
e1ac57d53a
[MemProf] Extend CallSite information to include potential callees. (#130441)
* Added YAML traits for `CallSiteInfo`
* Updated the `MemProfReader` to pass `Frames` instead of the entire
`CallSiteInfo`
* Updated test cases to use `testing::Field`
* Add YAML sequence traits for CallSiteInfo in MemProfYAML
* Also extend IndexedMemProfRecord
* XFAIL the MemProfYaml round trip test until we update the profile
format

For now we only read and write the additional information from the YAML
format. The YAML round trip test will be enabled when the serialized format is updated.
2025-03-12 09:55:56 -07:00
Mircea Trofin
07d86d25c9
[ctxprof] Flat profile collection (#130655)
Collect flat profiles. We only do this when function activations that aren't otherwise collectible under a context root are encountered.

This allows us to reason about the full profile without concerning ourselves whether we are double-counting. For example, we can combine (during profile use) flattened contextual profiles with flat profiles.
2025-03-12 07:47:58 -07:00
Kazu Hirata
b488ce0a67
[memprof] Improve call site matching (#129770)
Suppose we have a call instruction satisfying:

- AllocInfoIter != LocHashToAllocInfo.end()
- CallSitesIter != LocHashToCallSites.end()
- !isAllocationWithHotColdVariant(CI->getCalledFunction(), TLI)

In this case, before this patch, we would take:

  if (AllocInfoIter != LocHashToAllocInfo.end())

but end up discarding the opportunity because of the call to
isAllocationWithHotColdVariant.

This can happen in C++ code like:

  new Something[100];

which is lowered to two calls -- new and the constructor.

This patch fixes the problem by falling back to the call site
annotation if we have !isAllocationWithHotColdVariant.
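The before/after control flow can be sketched with booleans standing in for the real checks; an illustration of the fallback, not the matcher itself.

```cpp
#include <cassert>

// Control-flow sketch of the fix: before, finding alloc info ended the
// search even when the callee had no hot/cold variant; now such calls
// fall back to call-site annotation instead of being discarded.
enum class Annotation { AllocSite, CallSite, None };

Annotation classify(bool HasAllocInfo, bool HasCallSiteInfo,
                    bool IsAllocWithHotColdVariant) {
  if (HasAllocInfo && IsAllocWithHotColdVariant)
    return Annotation::AllocSite;
  if (HasCallSiteInfo)
    return Annotation::CallSite; // fallback added by this patch
  return Annotation::None;
}
```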
2025-03-04 21:09:40 -08:00
Mircea Trofin
eb1c3ace39
[ctxprof] Override type of instrumentation if -profile-context-root is specified (#128940)
This patch makes it easy to enable ctxprof instrumentation for targets where the build has a bunch of defaults for instrumented PGO that we want to inherit for ctxprof.

This is switching experimental defaults: we'll eventually enable ctxprof instrumentation through `PGOOpt` but that type is currently quite entangled and, for the time being, no point adding to that.
2025-02-26 19:56:59 -08:00
Mircea Trofin
f6703a4ff5
[ctxprof] don't inline weak symbols after instrumentation (#128811)
Contextual profiling identifies functions by GUID. Functions that may get overridden by the linker with a prevailing copy may have, during instrumentation, different variants in different modules. If these variants get inlined before linking (here I assume thinlto), they will identify themselves to the ctxprof runtime as their GUID, leading to issues - they may have different counter counts, for instance.

If we block their inlining in the pre-thinlink compilation, only the prevailing copy will survive post-thinlink and the confusion is avoided.

The change introduces a small pass just for this purpose, which marks any symbols that could be affected by the above as `noinline` (even if they were `alwaysinline`). We already carried out some inlining (via the preinliner), before instrumenting, so technically the `alwaysinline` directives were honored.

We could later (different patch) choose to mark them back to their original attribute (none or `alwaysinline`) post-thinlink, if we want to - but experimentally that doesn't really change much of the performance of the instrumented binary.
2025-02-26 11:01:37 -08:00
Kazu Hirata
b7feccb31d
[memprof] Dump call site matching information (#125130)
MemProfiler.cpp annotates the IR with the memory profile so that we
can later duplicate contexts. This patch dumps the entire inline call
stack for each call site match.
2025-02-06 23:37:10 -08:00
Teresa Johnson
ae6d5dd58b
[MemProf] Prune unneeded non-cold contexts (#124823)
We can take advantage of the fact that we subsequently only clone cold
allocation contexts, since not cold behavior is the default, and
significantly reduce the amount of metadata (and later ThinLTO summary
and MemProfContextDisambiguation graph nodes) by pruning unnecessary not
cold contexts when building metadata from the trie.

Specifically, we only need to keep notcold contexts that overlap the
longest with cold allocations, to know how deeply to clone those
contexts to expose the cold allocation behavior.

For a large target this reduced ThinLTO bitcode object sizes by about
35%. It reduced the ThinLTO indexing time by about half and the peak
ThinLTO indexing memory by about 20%.
2025-01-29 10:38:31 -08:00
Teresa Johnson
2af819fa3d
[MemProf] Add test for hot hints (#124394)
The change in PR124219 required removing one of the tests added for
-memprof-use-hot-hints, since we no longer label any contexts as hot in
metadata, so add a new test that checks the hot attribute instead.
2025-01-26 07:53:53 -08:00
Teresa Johnson
c725a95e08
[MemProf] Convert Hot contexts to NotCold early (#124219)
While we convert hot contexts to notcold contexts during the cloning
step, their existence was greatly limiting the context trimming
performed when we add the MemProf profile to the IR. To address this,
any hot contexts are converted to notcold contexts immediately after the
first check for unambiguous allocation types, before that check is
repeated and metadata is added during context trimming.

Note that hot hints are now disabled by default, however, this avoids
adding unnecessary overhead if they are re-enabled.
2025-01-24 15:58:13 -08:00
Teresa Johnson
ae8b560899
[MemProf] Disable hot hints by default (#124338)
By default we were marking some contexts as hot, and adding hot hints to
unambiguously hot allocations. However, there is not yet support for
cloning to expose hot allocation contexts, and none is planned for the
foreseeable future.

While we convert hot contexts to notcold contexts during the cloning
step, their existence was greatly limiting the context trimming
performed when we add the MemProf profile to the IR. This change simply
disables the generation of hot contexts / hints by default, as few
allocations were unambiguously hot.

A subsequent change will address the issue when hot hints are optionally
enabled. See PR124219 for details.

This change resulted in significant overhead reductions for a large
target:
~48% reduction in the per-module ThinLTO bitcode summary sizes
~72% reduction in the distributed ThinLTO bitcode combined summary sizes
~68% reduction in thin link time
~34% reduction in thin link peak memory
2025-01-24 13:06:11 -08:00
Kazu Hirata
adf0c817f3
[memprof] Undrift MemProf profile even when some frames are missing (#120500)
This patch makes the MemProf undrifting process a little more lenient.
Consider an inlined call hierarchy:

  foo -> bar -> ::new

If bar tail-calls ::new, the profile appears to indicate that foo
directly calls ::new.  This is a problem because the perceived call
hierarchy in the profile looks different from what we can obtain from
the inline stack in the IR.

Recall that undrifting works by constructing and comparing a list of
direct calls from the profile and that from the IR.  This patch
modifies the construction of the latter.  Specifically, if foo calls
bar in the IR, but bar is missing the profile, we pretend that foo
directly calls some heap allocation function.  We apply this
transformation only in the inline stack leading to some heap
allocation function.
2024-12-20 15:40:08 -08:00
Teresa Johnson
c7451ffcb9
[MemProf] Supporting hinting mostly-cold allocations after cloning (#120633)
Optionally unconditionally hint allocations as cold or not cold during
the cloning step if the percentage of bytes allocated is at least that
of the given threshold. This is similar to PR120301 which supports this
during matching, but enables the same behavior during cloning, to reduce
the false positives that can be addressed by cloning at the cost of
carrying the additional size metadata/summary.
2024-12-20 11:27:54 -08:00
Kazu Hirata
a03343daa6
[memprof] YAMLify the profile for memprof_missing_leaf.ll (NFC) (#120488)
This patch converts the profile for memprof_missing_leaf.ll to the
recently introduced YAML-based text format.
2024-12-19 10:16:10 -08:00
Kazu Hirata
ac8a9f8fff
[memprof] Undrift MemProfRecord (#120138)
This patch undrifts source locations in MemProfRecord before readMemprof
starts the matching process.

The theory of operation is as follows:

1. Collect the lists of direct calls, one from the IR and the other
   from the profile.

2. Compute the correspondence (called undrift map in the patch)
   between the two lists with longestCommonSequence.

3. Apply the undrift map just before readMemprof consumes
   MemProfRecord.
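Steps 1-2 can be sketched with a classic LCS over callee names, with list positions standing in for the real source locations; an illustration of the approach, not the LLVM implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Sketch: given the sequence of direct callees from the (possibly
// drifted) profile and from the IR, a longest-common-subsequence over
// callee names yields pairs of positions that still correspond; those
// pairs form the undrift map applied in step 3.
std::map<size_t, size_t>
buildUndriftMap(const std::vector<std::string> &ProfileCalls,
                const std::vector<std::string> &IRCalls) {
  size_t N = ProfileCalls.size(), M = IRCalls.size();
  // L[I][J] = LCS length of ProfileCalls[I..] and IRCalls[J..].
  std::vector<std::vector<size_t>> L(N + 1, std::vector<size_t>(M + 1, 0));
  for (size_t I = N; I-- > 0;)
    for (size_t J = M; J-- > 0;)
      L[I][J] = ProfileCalls[I] == IRCalls[J]
                    ? L[I + 1][J + 1] + 1
                    : std::max(L[I + 1][J], L[I][J + 1]);
  // Walk the table once to pair up matching positions.
  std::map<size_t, size_t> UndriftMap; // profile position -> IR position
  for (size_t I = 0, J = 0; I < N && J < M;) {
    if (ProfileCalls[I] == IRCalls[J])
      UndriftMap[I++] = J++;
    else if (L[I + 1][J] >= L[I][J + 1])
      ++I; // this profile call has no surviving IR counterpart
    else
      ++J; // this IR call is new relative to the profile
  }
  return UndriftMap;
}
```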

The new functionality is gated by a flag that is off by default.
2024-12-18 14:21:25 -08:00
Teresa Johnson
a15e7b11da
[MemProf] Add option to hint allocations at a given cold byte percentage (#120301)
Optionally unconditionally hint allocations as cold or not cold during
the matching step if the percentage of bytes allocated is at least that
of the given threshold.
2024-12-17 15:53:56 -08:00
Teresa Johnson
d7d0e740cc
[MemProf] Refactor single alloc type handling and use in more cases (#120290)
Emit message when we have aliased contexts that are conservatively
hinted not cold. This is not a change in behavior, just in message when
the -memprof-report-hinted-sizes flag is enabled.
2024-12-17 12:50:49 -08:00
Kazu Hirata
8476ba71f2
[memprof] YAMLify one test (NFC) (#119955)
This patch replaces the raw binary profile with a YAML profile.

I've trimmed the profile by removing all MemProfRecords except the one
for _Z3foov.

This patch demonstrates that we can see !memprof generated even with a
YAML profile.
2024-12-15 22:22:25 -08:00