477 Commits

Author SHA1 Message Date
Nikita Popov
c23b4fbdbb
[IR] Remove size argument from lifetime intrinsics (#150248)
Now that #149310 has restricted lifetime intrinsics to only work on
allocas, we can also drop the explicit size argument. Instead, the size
is implied by the alloca.

This removes the ability to only mark a prefix of an alloca alive/dead.
We never used that capability, so we should remove the need to handle
that possibility everywhere (though many key places, including stack
coloring, did not actually respect this).
2025-08-08 11:09:34 +02:00
Mircea Trofin
f675483905
[profcheck] Annotate select instructions (#152171)
For `select`, we don't have the equivalent of the branch probability analysis to offer defaults, so we make up our own and allow their overriding with flags.

Issue #147390
2025-08-06 02:48:50 +02:00
Mircea Trofin
9a60841dc4
[PGO][profcheck] ignore explicitly cold functions (#151778)
There is a case when branch profile metadata is OK to miss, namely, cold functions. The goal of the RFC (see the referenced issue) is to avoid accidental omission (and, at a later date, corruption) of profile metadata. However, asking cold functions to have all their conditional branches marked with "0" probabilities would be overdoing it. We can just ask cold functions to have an explicit 0 entry count.

This patch:
- injects an entry count for functions, unless they have one (synthetic or not)
- if the entry count is 0, doesn't inject, nor does it verify the rest of the metadata
- at verification, if the entry count is missing, it reports an error
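The three bullets above can be sketched as follows; this is a minimal illustration with invented names and an invented default count, not the actual pass code.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical sketch of the policy described above: injection adds an
// entry count unless one exists; functions with an explicit 0 entry
// count are skipped by the verifier, which otherwise reports a missing
// entry count (or missing branch weights) as an error.
struct FuncProfile {
  std::optional<unsigned> EntryCount; // profile entry count, if any
  bool BranchesAnnotated = false;     // do all cond. branches carry weights?
};

constexpr unsigned DefaultInjectedCount = 1000; // made-up default

void injectEntryCount(FuncProfile &F) {
  if (!F.EntryCount)
    F.EntryCount = DefaultInjectedCount; // inject unless already present
}

// Returns an error message, or an empty string if the function verifies.
std::string verifyProfile(const FuncProfile &F) {
  if (!F.EntryCount)
    return "missing entry count";
  if (*F.EntryCount == 0)
    return ""; // explicitly cold: rest of the metadata is not checked
  if (!F.BranchesAnnotated)
    return "missing branch weights";
  return "";
}
```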

Issue #147390
2025-08-04 03:53:49 +02:00
Joel E. Denny
37e03b56b8
Revert "[PGO] Add llvm.loop.estimated_trip_count metadata" (#151585)
Reverts llvm/llvm-project#148758

[As
requested.](https://github.com/llvm/llvm-project/pull/148758#pullrequestreview-3076627201)
2025-07-31 15:56:31 -04:00
Joel E. Denny
f7b65011de
[PGO] Add llvm.loop.estimated_trip_count metadata (#148758)
This patch implements the `llvm.loop.estimated_trip_count` metadata
discussed in [[RFC] Fix Loop Transformations to Preserve Block
Frequencies](https://discourse.llvm.org/t/rfc-fix-loop-transformations-to-preserve-block-frequencies/85785).
As [suggested in the RFC
comments](https://discourse.llvm.org/t/rfc-fix-loop-transformations-to-preserve-block-frequencies/85785/4),
it adds the new metadata to all loops at the time of profile ingestion
and estimates each trip count from the loop's `branch_weights` metadata.
As [suggested in the PR #128785
review](https://github.com/llvm/llvm-project/pull/128785#discussion_r2151091036),
it does so via a new `PGOEstimateTripCountsPass` pass, which creates the
new metadata for each loop but omits the value if it cannot estimate a
trip count due to the loop's form.

An important observation not previously discussed is that
`PGOEstimateTripCountsPass` *often* cannot estimate a loop's trip count,
but later passes can sometimes transform the loop in a way that makes it
possible. Currently, such passes do not necessarily update the metadata,
but eventually that should be fixed. Until then, if the new metadata has
no value, `llvm::getLoopEstimatedTripCount` disregards it and tries
again to estimate the trip count from the loop's current
`branch_weights` metadata.
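The estimation from `branch_weights` can be sketched roughly as below; this is not the exact LLVM implementation, just the usual estimate derived from the latch branch's weights.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Rough sketch: given the latch branch's weights -- how often the
// backedge was taken vs. how often the loop exited -- the estimated
// trip count is backedge/exit + 1, i.e. the average number of header
// executions per loop entry. No estimate is possible when the loop
// never exited in the profile (exit weight 0).
std::optional<uint64_t> estimateTripCount(uint64_t BackedgeWeight,
                                          uint64_t ExitWeight) {
  if (ExitWeight == 0)
    return std::nullopt; // loop form/profile does not allow an estimate
  return BackedgeWeight / ExitWeight + 1;
}
```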
2025-07-31 12:28:25 -04:00
Mircea Trofin
931228e28f
[PGO] Drive profile validator from opt (#147418)
Add option to `opt` to run the `ProfileInjectorPass` before the passes opt would run, and then `ProfileVerifierPass` after. This will then be a mode in which we run tests on a specialized buildbot, with the goal of finding passes that drop (and, later, corrupt) profile information.
2025-07-26 16:14:00 +02:00
xur-llvm
c9a8e15494
[ICP] Add a few tunings to indirect-call-promotion (#149892)

Indirect-call promotion (ICP) has been adjusted with the following
tunings:
(1) Candidate functions can now be ICP'd even if only a declaration is
    present.
(2) All non-cold candidate functions are now considered by ICP.
    Previously, only hot targets were considered.
(3) If one target cannot be ICP'd, proceed with the remaining targets
    instead of exiting the callsite.
This update hides all tunings under internal options and disables them
by default. They'll be enabled in a later update. There'll also be
another update to address the "not found" issue with indirect targets.
2025-07-24 09:55:28 -07:00
Mircea Trofin
df2d2d125b
[PGO] Add ProfileInjector and ProfileVerifier passes (#147388)
Adding 2 passes, one to inject `MD_prof` and one to check its presence. A subsequent patch will add these (similar to debugify) to `opt` (and, eventually, a variant of this, to `llc`)

Tracking issue: #147390
2025-07-23 21:34:58 +02:00
Snehasish Kumar
70233c61d6
Add minimum count threshold for indirect call promotion (#145282)
Allow users to set the minimum absolute count for indirect call promotion. This is primarily meant to control indirect call promotion for the synthetic value profile metadata introduced in #141164 for use by MemProf.
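The threshold check amounts to filtering value-profile targets by absolute count; a minimal sketch with invented names:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Illustrative sketch (names invented): keep only value-profile targets
// whose absolute count meets the minimum promotion threshold.
struct PromotionCandidate {
  uint64_t TargetGUID;
  uint64_t Count;
};

std::vector<PromotionCandidate>
filterByMinCount(const std::vector<PromotionCandidate> &Targets,
                 uint64_t MinCount) {
  std::vector<PromotionCandidate> Kept;
  for (const auto &C : Targets)
    if (C.Count >= MinCount)
      Kept.push_back(C); // below-threshold targets are not promoted
  return Kept;
}
```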
2025-06-26 12:10:59 -07:00
Teresa Johnson
3ec2de2753
[MemProf] Optionally save context size info on largest cold allocations (#142837)
Reapply PR142507 with fix for test: add in the same x86_64-linux
requirement as other tests as the stack ids are currently computed
differently on big endian systems. This will be investigated separately.

In order to allow selective reporting of context hinting during the LTO
link, and in the future to allow selective more aggressive cloning, add
an option to specify a minimum percent of the max cold size in the
profile summary. Contexts that meet that threshold will get context size
info metadata (and ThinLTO summary information) on the associated
allocations.

Specifying -memprof-report-hinted-sizes during the pre-LTO compile step
will continue to cause all contexts to receive this metadata. But
specifying -memprof-report-hinted-sizes only during the LTO link will
cause only those that meet the new threshold and have the metadata to
get reported.

To support this, because the alloc info summary and associated bitcode
requires the context size information to be in the same order as the
other context information, 0s are inserted for contexts without this
metadata. The bitcode writer uses a more compact format for the context
ids to allow better compression of the 0s.

As part of this change several helper methods are added to query whether
metadata contains context size info on any or all contexts.
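The threshold plus 0-padding scheme can be sketched as below; a simplified illustration, not the bitcode writer itself.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Sketch of the scheme described above (simplified): contexts whose cold
// size meets the minimum percent of the max cold size keep their size
// info; all others contribute a 0 placeholder, because the serialized
// size list must stay in the same order as (and parallel to) the other
// per-context information.
std::vector<uint64_t> buildSizeList(const std::vector<uint64_t> &ColdSizes,
                                    uint64_t MaxColdSize,
                                    unsigned MinPercent) {
  std::vector<uint64_t> Out;
  for (uint64_t S : ColdSizes)
    Out.push_back(S * 100 >= MaxColdSize * MinPercent ? S : 0);
  return Out;
}
```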
2025-06-04 13:08:56 -07:00
Mircea Trofin
e0909003ff
[ctxprof] Instrumentation: handle direct call targets to aliases (#142657)
This was an oversight. GlobalAliases aren't `Functions`, so `getCalledFunction` would return `nullptr` and the callsite would be deemed uninstrumentable.
2025-06-04 13:04:56 -07:00
Snehasish Kumar
d245b410a3
Revert "[MemProf] Drop unnecessary REQUIRES: x86-linux directives." (#142816)
Reverts llvm/llvm-project#142718

Breaks ppc aix builds:
https://lab.llvm.org/buildbot/#/builders/64/builds/4036
2025-06-04 10:08:55 -07:00
Snehasish Kumar
a87c4eef1d
[MemProf] Drop unnecessary REQUIRES: x86-linux directives. (#142718)
These tests now use the YAML profile and should work across all
platforms.
2025-06-04 08:37:02 -07:00
Teresa Johnson
6c1091ea3f
Revert "[MemProf] Optionally save context size info on largest cold allocations" (#142688)
Reverts llvm/llvm-project#142507 due to buildbot failures that I will
look into tomorrow.
2025-06-03 16:05:16 -07:00
Teresa Johnson
f2adae5780
[MemProf] Optionally save context size info on largest cold allocations (#142507)
In order to allow selective reporting of context hinting during the LTO
link, and in the future to allow selective more aggressive cloning, add
an option to specify a minimum percent of the max cold size in the
profile summary. Contexts that meet that threshold will get context size
info metadata (and ThinLTO summary information) on the associated
allocations.

Specifying -memprof-report-hinted-sizes during the pre-LTO compile step
will continue to cause all contexts to receive this metadata. But
specifying -memprof-report-hinted-sizes only during the LTO link will
cause only those that meet the new threshold and have the metadata to
get reported.

To support this, because the alloc info summary and associated bitcode
requires the context size information to be in the same order as the
other context information, 0s are inserted for contexts without this
metadata. The bitcode writer uses a more compact format for the context
ids to allow better compression of the 0s.

As part of this change several helper methods are added to query whether
metadata contains context size info on any or all contexts.
2025-06-03 14:20:38 -07:00
Kazu Hirata
c261bb7649
[memprof] Deduplicate alloc site matches (#142334)
With:

  commit 2425626d803002027cbf71c39df80cb7b56db0fb
  Author: Kazu Hirata <kazu@google.com>
  Date:   Sun Jun 1 08:09:58 2025 -0700

we print out a lot of duplicate alloc site matches.

This patch partially reverts the patch above.  The core idea of using
a map to deduplicate entries remains the same, but details are
different.  Specifically:

- This PR uses the [FullStackID, MatchLength] as the key, where
  MatchLength is the length of an alloc site match.

- AllocMatchInfo in this PR no longer has Matched because we always
  report matches.

- AllocMatchInfo in this PR no longer has NumFramesMatched because it
  has become part of the key.

This deduplication roughly halves the number of messages printed out.
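The keying scheme described above can be sketched as a simple map; types are simplified stand-ins for the real ones.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Sketch of the deduplication scheme: keyed by (FullStackId, MatchLength),
// so repeated reports of the same match collapse onto one entry while
// matches of different lengths for the same stack remain distinct.
using AllocMatchKey =
    std::pair<uint64_t /*FullStackId*/, unsigned /*MatchLength*/>;

std::map<AllocMatchKey, unsigned>
deduplicate(const std::vector<AllocMatchKey> &Matches) {
  std::map<AllocMatchKey, unsigned> Counts;
  for (const auto &M : Matches)
    ++Counts[M]; // duplicates collapse onto the same key
  return Counts;
}
```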
2025-06-02 07:59:34 -07:00
Kazu Hirata
2425626d80
[memprof] Print alloc site matches immediately (#142233)
Without this patch, we buffer alloc site matches in
FullStackIdToAllocMatchInfo and then print them out at the end of
MemProfUsePass.

This practice is problematic when we have multiple matches per alloc
site.  Consider:

  char *f1() { return new char[3]; }
  char *f2() { return f1(); }
  __attribute__((noinline)) char *f3() { return f2(); }

In this example, f1 contains an alloc site, of course, but so do f2
and f3 via inlining.  When something like this happens,
FullStackIdToAllocMatchInfo gets updated multiple times for the same
full stack ID at:

  FullStackIdToAllocMatchInfo[FullStackId] = { ... };

with different InlinedCallStack.size() each time.

This patch changes the behavior by immediately printing out alloc site
matches, potentially printing out multiple matches for the same
FullStackId.  It is up to the consumer of the message to figure out
the length of the longest matches for example.

For the test, this test adjusts an existing one,
memprof-dump-matched-alloc-site.ll.  Specifically, this patch
"restores" the IR and corresponding profile for f2 and f1 so that the
compiler generates a "MemProf notcold" message for each of f1, f2, and
f3.
2025-06-01 08:09:58 -07:00
Snehasish Kumar
c7b421deac
[MemProf] Attach value profile metadata to the IR using CalleeGuids. (#141164)
Use the newly introduced CalleeGuids in CallSiteInfo to annotate the IR
where necessary with value profile metadata. Use a synthetic count of 1
since we don't have actual counts in the profile collection.
2025-05-31 12:53:30 -07:00
xur-llvm
a004c703bc
[PGO] Make the PGO instrumentation insert point after alloca (#142043)
We're changing PGO instrumentation to insert the intrinsic after alloca
instructions. For sampled instrumentation, a conditional check is placed
before the intrinsic. If this intrinsic comes before an alloca, the
alloca (whose size might be unknown due to a Phi node) becomes
conditional, resulting in inefficient code. We have seen some stack
overflows due to this.

This patch guarantees the intrinsic is always after the alloca.
2025-05-30 14:37:06 -07:00
Teresa Johnson
49d48c32e0
[MemProf] Emit remarks when hinting allocations not needing cloning (#141859)
The context disambiguation code already emits remarks when hinting
allocations (by adding hotness attributes) during cloning. However,
we did not yet emit hints when applying the hotness attributes during
building of the metadata (during matching and again after inlining).
Add remarks when we apply the hint attributes for these
non-context-sensitive allocations.
2025-05-28 16:44:44 -07:00
Teresa Johnson
cc6f446d38
[MemProf] Add basic summary section support (#141805)
This patch adds support for a basic MemProf summary section, which is
built along with the indexed MemProf profile (e.g. when reading the raw
or YAML profiles), and serialized through the indexed profile just after
the header.

Currently only 6 fields are written, specifically the number of contexts
(total, cold, hot), and the max context size (cold, warm, hot).

To support forwards and backwards compatibility for added fields in the
indexed profile, the number of fields is serialized first. The code is
written to support forwards compatibility (reading newer profiles with
additional summary fields), and comments indicate how to implement
backwards compatibility (reading older profiles with fewer summary
fields) as needed.

Support is added to print the summary as YAML comments when displaying
both the raw and indexed profiles via `llvm-profdata show`. Because they
are YAML comments, the YAML reader ignores these (the summary is always
recomputed when building the indexed profile as described above).

This necessitated moving some options and a couple of interfaces out of
Analysis/MemoryProfileInfo.cpp and into the new
ProfileData/MemProfSummary.cpp file, as we need to classify context
hotness earlier and also compute context ids to build the summary from
older indexed profiles.
2025-05-28 13:12:41 -07:00
Arthur Eubanks
5ab017a30f
[PGO] Don't unconditionally request BBInfo in verifyFuncBFI() (#140804)
This breaks in the case where there are unreachable blocks after an
entry block with no successors, which don't have a `BBInfo`, causing
crashes.

`BBInfo` doesn't exist for unreachable blocks, see
https://reviews.llvm.org/D27280.

Fixes #135828.
2025-05-27 09:47:08 -07:00
Teresa Johnson
8836d68a0d
[MemProf] Optionally discard small non-cold contexts (#139113)
Adds a new option -memprof-callsite-cold-threshold that allows
specifying a percent that will cause non-cold contexts to be discarded
if the percent cold bytes at a callsite including that context exceeds
the given threshold. Default is 100% (no discarding).
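The discard condition can be sketched as a single percentage check; a minimal illustration of the rule described above, not the pass code.

```cpp
#include <cassert>
#include <cstdint>

// Sketch: discard non-cold contexts at a callsite when the cold-byte
// percentage there meets the threshold (the default of 100 effectively
// means no discarding, since callsites are rarely 100% cold).
bool shouldDiscardNotCold(uint64_t ColdBytes, uint64_t TotalBytes,
                          unsigned ThresholdPercent) {
  if (TotalBytes == 0)
    return false;
  return ColdBytes * 100 >= TotalBytes * ThresholdPercent;
}
```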

This reduces the amount of cloning needed to expose cold allocation
contexts when parts of the context are dominantly cold.

This motivated the change in PR138792, since discarding a context might
require a different decision about which not-cold contexts must be kept
to expose cloning requirements, so we need to determine that on the fly.

Additionally, this required a change to include the context size
information in the alloc trie in more cases, so we now guard the
inclusion of this information in the generated metadata on the option
values.
2025-05-09 15:56:54 -07:00
Teresa Johnson
764614e635
[MemProf] Restructure the pruning of unneeded NotCold contexts (#138792)
This change is mostly NFC, other than the addition of a new message
printed when contexts are pruned when -memprof-report-hinted-sizes is
enabled.

To prepare for a follow on change, adjust the way we determine which
NotCold contexts can be pruned (because they overlap with longer NotCold
contexts), and change the way we perform this pruning.

Instead of determining the points at which we need to keep NotCold
contexts during the building of the trie, we now determine this on the
fly as the MIB metadata nodes are recursively built. This simplifies a
follow on change that performs additional pruning of some NotCold
contexts, and which can affect which others need to be kept as the
longest overlapping NotCold contexts.
2025-05-07 17:34:44 -07:00
Kazu Hirata
cb96a3dc07
[memprof] Dump the number of matched frames (#137082)
This patch teaches readMemprof to dump the number of frames for each
allocation site match.  This information helps us analyze what part of
the call stack in the MemProf profile has matched the IR.

Aside from updating existing test cases, this patch adds one more test
case, memprof-dump-matched-alloc-site.ll, because none of the existing
test cases has the number of frames greater than one.
2025-04-23 21:29:16 -07:00
Mircea Trofin
1576fa1010
[ctxprof] Extend the notion of "cannot return" (#135651)
At the time of instrumentation (and instrumentation lowering), `noreturn` is not applied uniformly. Rather than running the `FunctionAttrs` pass, we just need to use `llvm::canReturn`, exposed in PR #135650
2025-04-16 10:39:34 -07:00
Mircea Trofin
e7aed23d32
[ctxprof] Handle instrumenting functions with musttail calls (#135121)
Functions with `musttail` calls can't be roots because we can't instrument their `ret` to release the context. This patch tags their `CtxRoot` field in their `FunctionData`. In compiler-rt we then know not to allow such functions become roots, and also not confuse `CtxRoot == 0x1` with there being a context root.

Currently we also lose the context tree under such cases. We can, in a subsequent patch, have the root detector search past these functions.
2025-04-14 10:01:25 -07:00
Mircea Trofin
4c90d977db
[ctxprof] Use the flattened contextual profile pre-thinlink (#134723)
Flatten the profile pre-thinlink so that ThinLTO has something to work with for the parts of the binary that aren't covered by contextual profiles. Post-thinlink, the flattener is re-run and will actually change profile info, but just for the modules containing contextual trees ("specialized modules"). For the rest, the flattener just yanks out the instrumentation.
2025-04-08 17:30:49 -07:00
Mircea Trofin
cfa6a5940e
[ctxprof] Don't lower instrumentation for noreturn functions (#134932)
`noreturn` functions are doubtfully interesting for performance optimization / profiling.
2025-04-08 14:48:41 -07:00
Mircea Trofin
b2dea4fd22
[ctxprof] root autodetection mechanism (#133147)
This is an optional mechanism that automatically detects roots. It's a best-effort mechanism, and its main goal is to *avoid* pointing at the message pump function as a root. This is the function that polls message queue(s) in an infinite loop, and is thus a bad root (it never exits).

High-level, when collection is requested - which should happen when a server has already been set up and is handling requests - we spend a bit of time sampling all the server's threads. Each sample is a stack which we insert in a `PerThreadCallsiteTrie`. After a while, we run for each `PerThreadCallsiteTrie` the root detection logic. We then traverse all the `FunctionData`, find the ones matching the detected roots, and allocate a `ContextRoot` for them. From here, we special-case `FunctionData` objects, in `__llvm_ctx_profile_get_context`, that have a `CtxRoot` and route them to `__llvm_ctx_profile_start_context`.

For this to work, on the llvm side, we need to have all functions call `__llvm_ctx_profile_release_context` because they _might_ be roots. This comes at a slight (percentages) penalty during collection - which we can afford since the overall technique is ~5x faster than normal instrumentation. We can later explore conditionally enabling autoroot detection and avoiding this penalty, if desired. 

Note that functions that `musttail call` can't have their return instrumented this way, and a subsequent patch will harden the mechanism against this case.

The mechanism could be used in combination with explicit root specification, too.
2025-04-08 06:59:38 -07:00
Mircea Trofin
1757a235e3
[ctxprof] Make ContextRoot an implementation detail (#131416)
`ContextRoot` and `FunctionData` are currently known by the llvm side, which has to instantiate and zero-initialize them.

This patch makes `FunctionData` the only global value that needs to be known and instantiated by the compiler. On the compiler-rt side, `ContextRoot`s are hung off `FunctionData`, when applicable.

This is for two reasons. First, it is a step towards root autodetection (in a subsequent patch). An autodetection mechanism would instantiate the `ContextRoot` for the detected roots, and then `__llvm_ctx_profile_get_context` would detect that and route to `__llvm_ctx_profile_start_context`.

The second reason is that we will hang off `ContextRoot` more complex datatypes (next patch), and we want to avoid too deep of a coupling between llvm and compiler-rt. Acting as a place to hang related data, `FunctionData` can stay simple - pointers and an (atomic) int (the mutex).
2025-03-18 22:03:26 -07:00
Mircea Trofin
215c47e4d3
[ctxprof] Missing test update post #131201 (#131428) 2025-03-14 21:46:10 -07:00
Mircea Trofin
a5b95487d6
[ctxprof] Missing test for #131269 (#131271) 2025-03-13 21:45:17 -07:00
Snehasish Kumar
e1ac57d53a
[MemProf] Extend CallSite information to include potential callees. (#130441)
* Added YAML traits for `CallSiteInfo`
* Updated the `MemProfReader` to pass `Frames` instead of the entire
`CallSiteInfo`
* Updated test cases to use `testing::Field`
* Add YAML sequence traits for CallSiteInfo in MemProfYAML
* Also extend IndexedMemProfRecord
* XFAIL the MemProfYaml round trip test until we update the profile
format

For now we only read and write the additional information from the YAML
format. The YAML round trip test will be enabled when the serialized format is updated.
2025-03-12 09:55:56 -07:00
Mircea Trofin
07d86d25c9
[ctxprof] Flat profile collection (#130655)
Collect flat profiles. We only do this when function activations that aren't otherwise collectible under a context root are encountered.

This allows us to reason about the full profile without concerning ourselves whether we are double-counting. For example, we can combine (during profile use) flattened contextual profiles with flat profiles.
2025-03-12 07:47:58 -07:00
Kazu Hirata
b488ce0a67
[memprof] Improve call site matching (#129770)
Suppose we have a call instruction satisfying:

- AllocInfoIter != LocHashToAllocInfo.end()
- CallSitesIter != LocHashToCallSites.end()
- !isAllocationWithHotColdVariant(CI->getCalledFunction(), TLI)

In this case, before this patch, we would take:

  if (AllocInfoIter != LocHashToAllocInfo.end())

but end up discarding the opportunity because of the call to
isAllocationWithHotColdVariant.

This can happen in C++ code like:

  new Something[100];

which is lowered to two calls -- new and the constructor.

This patch fixes the problem by falling back to the call site
annotation if we have !isAllocationWithHotColdVariant.
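The before/after control flow can be sketched with booleans standing in for the real checks; an illustration of the fallback, not the matcher itself.

```cpp
#include <cassert>

// Control-flow sketch of the fix: before, finding alloc info ended the
// search even when the callee had no hot/cold variant; now such calls
// fall back to call-site annotation instead of being discarded.
enum class Annotation { AllocSite, CallSite, None };

Annotation classify(bool HasAllocInfo, bool HasCallSiteInfo,
                    bool IsAllocWithHotColdVariant) {
  if (HasAllocInfo && IsAllocWithHotColdVariant)
    return Annotation::AllocSite;
  if (HasCallSiteInfo)
    return Annotation::CallSite; // fallback added by this patch
  return Annotation::None;
}
```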
2025-03-04 21:09:40 -08:00
Mircea Trofin
eb1c3ace39
[ctxprof] Override type of instrumentation if -profile-context-root is specified (#128940)
This patch makes it easy to enable ctxprof instrumentation for targets where the build has a bunch of defaults for instrumented PGO that we want to inherit for ctxprof.

This is switching experimental defaults: we'll eventually enable ctxprof instrumentation through `PGOOpt` but that type is currently quite entangled and, for the time being, no point adding to that.
2025-02-26 19:56:59 -08:00
Mircea Trofin
f6703a4ff5
[ctxprof] don't inline weak symbols after instrumentation (#128811)
Contextual profiling identifies functions by GUID. Functions that may get overridden by the linker with a prevailing copy may have, during instrumentation, different variants in different modules. If these variants get inlined before linking (here I assume thinlto), they will identify themselves to the ctxprof runtime as their GUID, leading to issues - they may have different counter counts, for instance.

If we block their inlining in the pre-thinlink compilation, only the prevailing copy will survive post-thinlink and the confusion is avoided.

The change introduces a small pass just for this purpose, which marks any symbols that could be affected by the above as `noinline` (even if they were `alwaysinline`). We already carried out some inlining (via the preinliner), before instrumenting, so technically the `alwaysinline` directives were honored.

We could later (different patch) choose to mark them back to their original attribute (none or `alwaysinline`) post-thinlink, if we want to - but experimentally that doesn't really change much of the performance of the instrumented binary.
2025-02-26 11:01:37 -08:00
Kazu Hirata
b7feccb31d
[memprof] Dump call site matching information (#125130)
MemProfiler.cpp annotates the IR with the memory profile so that we
can later duplicate contexts. This patch dumps the entire inline call
stack for each call site match.
2025-02-06 23:37:10 -08:00
Teresa Johnson
ae6d5dd58b
[MemProf] Prune unneeded non-cold contexts (#124823)
We can take advantage of the fact that we subsequently only clone cold
allocation contexts, since not cold behavior is the default, and
significantly reduce the amount of metadata (and later ThinLTO summary
and MemProfContextDisambiguation graph nodes) by pruning unnecessary not
cold contexts when building metadata from the trie.

Specifically, we only need to keep notcold contexts that overlap the
longest with cold allocations, to know how deeply to clone those
contexts to expose the cold allocation behavior.

For a large target this reduced ThinLTO bitcode object sizes by about
35%. It reduced the ThinLTO indexing time by about half and the peak
ThinLTO indexing memory by about 20%.
2025-01-29 10:38:31 -08:00
Teresa Johnson
2af819fa3d
[MemProf] Add test for hot hints (#124394)
The change in PR124219 required removing one of the tests added for
-memprof-use-hot-hints, since we no longer label any contexts as hot in
metadata, so add a new test that checks the hot attribute instead.
2025-01-26 07:53:53 -08:00
Teresa Johnson
c725a95e08
[MemProf] Convert Hot contexts to NotCold early (#124219)
While we convert hot contexts to notcold contexts during the cloning
step, their existence was greatly limiting the context trimming
performed when we add the MemProf profile to the IR. To address this,
any hot contexts are converted to notcold contexts immediately after the
first check for unambiguous allocation types, before that check is
repeated and metadata is added during context trimming.

Note that hot hints are now disabled by default, however, this avoids
adding unnecessary overhead if they are re-enabled.
2025-01-24 15:58:13 -08:00
Teresa Johnson
ae8b560899
[MemProf] Disable hot hints by default (#124338)
By default we were marking some contexts as hot, and adding hot hints to
unambiguously hot allocations. However, there is not yet support for
cloning to expose hot allocation contexts, and none is planned for the
foreseeable future.

While we convert hot contexts to notcold contexts during the cloning
step, their existence was greatly limiting the context trimming
performed when we add the MemProf profile to the IR. This change simply
disables the generation of hot contexts / hints by default, as few
allocations were unambiguously hot.

A subsequent change will address the issue when hot hints are optionally
enabled. See PR124219 for details.

This change resulted in significant overhead reductions for a large
target:
~48% reduction in the per-module ThinLTO bitcode summary sizes
~72% reduction in the distributed ThinLTO bitcode combined summary sizes
~68% reduction in thin link time
~34% reduction in thin link peak memory
2025-01-24 13:06:11 -08:00
Kazu Hirata
adf0c817f3
[memprof] Undrift MemProf profile even when some frames are missing (#120500)
This patch makes the MemProf undrifting process a little more lenient.
Consider an inlined call hierarchy:

  foo -> bar -> ::new

If bar tail-calls ::new, the profile appears to indicate that foo
directly calls ::new.  This is a problem because the perceived call
hierarchy in the profile looks different from what we can obtain from
the inline stack in the IR.

Recall that undrifting works by constructing and comparing a list of
direct calls from the profile and that from the IR.  This patch
modifies the construction of the latter.  Specifically, if foo calls
bar in the IR, but bar is missing the profile, we pretend that foo
directly calls some heap allocation function.  We apply this
transformation only in the inline stack leading to some heap
allocation function.
2024-12-20 15:40:08 -08:00
Teresa Johnson
c7451ffcb9
[MemProf] Supporting hinting mostly-cold allocations after cloning (#120633)
Optionally unconditionally hint allocations as cold or not cold during
the cloning step if the percentage of bytes allocated is at least that
of the given threshold. This is similar to PR120301 which supports this
during matching, but enables the same behavior during cloning, to reduce
the false positives that can be addressed by cloning at the cost of
carrying the additional size metadata/summary.
2024-12-20 11:27:54 -08:00
Kazu Hirata
a03343daa6
[memprof] YAMLify the profile for memprof_missing_leaf.ll (NFC) (#120488)
This patch converts the profile for memprof_missing_leaf.ll to the
recently introduced YAML-based text format.
2024-12-19 10:16:10 -08:00
Kazu Hirata
ac8a9f8fff
[memprof] Undrift MemProfRecord (#120138)
This patch undrifts source locations in MemProfRecord before readMemprof
starts the matching process.

The theory of operation is as follows:

1. Collect the lists of direct calls, one from the IR and the other
   from the profile.

2. Compute the correspondence (called undrift map in the patch)
   between the two lists with longestCommonSequence.

3. Apply the undrift map just before readMemprof consumes
   MemProfRecord.
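Steps 1-2 can be sketched with a classic LCS over callee names, with list positions standing in for the real source locations; an illustration of the approach, not the LLVM implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Sketch: given the sequence of direct callees from the (possibly
// drifted) profile and from the IR, a longest-common-subsequence over
// callee names yields pairs of positions that still correspond; those
// pairs form the undrift map applied in step 3.
std::map<size_t, size_t>
buildUndriftMap(const std::vector<std::string> &ProfileCalls,
                const std::vector<std::string> &IRCalls) {
  size_t N = ProfileCalls.size(), M = IRCalls.size();
  // L[I][J] = LCS length of ProfileCalls[I..] and IRCalls[J..].
  std::vector<std::vector<size_t>> L(N + 1, std::vector<size_t>(M + 1, 0));
  for (size_t I = N; I-- > 0;)
    for (size_t J = M; J-- > 0;)
      L[I][J] = ProfileCalls[I] == IRCalls[J]
                    ? L[I + 1][J + 1] + 1
                    : std::max(L[I + 1][J], L[I][J + 1]);
  // Walk the table once to pair up matching positions.
  std::map<size_t, size_t> UndriftMap; // profile position -> IR position
  for (size_t I = 0, J = 0; I < N && J < M;) {
    if (ProfileCalls[I] == IRCalls[J])
      UndriftMap[I++] = J++;
    else if (L[I + 1][J] >= L[I][J + 1])
      ++I; // this profile call has no surviving IR counterpart
    else
      ++J; // this IR call is new relative to the profile
  }
  return UndriftMap;
}
```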

The new functionality is gated by a flag that is off by default.
2024-12-18 14:21:25 -08:00
Teresa Johnson
a15e7b11da
[MemProf] Add option to hint allocations at a given cold byte percentage (#120301)
Optionally unconditionally hint allocations as cold or not cold during
the matching step if the percentage of bytes allocated is at least that
of the given threshold.
2024-12-17 15:53:56 -08:00
Teresa Johnson
d7d0e740cc
[MemProf] Refactor single alloc type handling and use in more cases (#120290)
Emit message when we have aliased contexts that are conservatively
hinted not cold. This is not a change in behavior, just in message when
the -memprof-report-hinted-sizes flag is enabled.
2024-12-17 12:50:49 -08:00
Kazu Hirata
8476ba71f2
[memprof] YAMLify one test (NFC) (#119955)
This patch replaces the raw binary profile with a YAML profile.

I've trimmed the profile by removing all MemProfRecords except the one
for _Z3foov.

This patch demonstrates that we can see !memprof generated even with a
YAML profile.
2024-12-15 22:22:25 -08:00