85 Commits

Author SHA1 Message Date
Teresa Johnson
edb7f6c0da
[MemProf] Add more assertion checking to the edge removal helper (#125017)
Check a few unexpected cases (edge already removed, edge not in its
caller or callee edge lists).
2025-01-29 19:23:35 -08:00
Teresa Johnson
6c3bf34114
[MemProf] Fix summary identification for imported locals (#124659)
When we apply cloning decisions in the ThinLTO backend, we need to find
the corresponding summary for each function in the IR, and in some cases
for callee functions. This is complicated when the function was a
promoted local, in which case the GUID was formed from the hash of the
original source file prepended to the function name. Those functions
can be identified by the fact that they were given a ".llvm." suffix
during promotion.

We previously didn't do this correctly for promoted locals imported from
other modules, as we only tried the current module source name. This led
to crashes, in particular when the current module also had a local
function of the same original name. In particular, we were attempting to
iterate through the wrong summary's callsites, and there were fewer than
in the actual function so we accessed data off the end (in a release
build with assertion checking off - with assertion checking on we double
check the stack ids and that would have failed). Even if we hadn't
crashed or hit an assert, we could have applied the wrong cloning
decisions, leading to unsats at link time.

Luckily, function importing attaches thinlto_src_file metadata
containing the original source file name to all imported functions. It
normally doesn't do this by default; however, it always does if MemProf
context disambiguation is enabled. Therefore, we can just look to see if
the function contains this metadata and if so use it to recreate the
original GUID.

A similar issue can occur when looking for the ValueInfo / GUID of
a direct tail call to see if we synthesized a callsite record for a
missing tail call frame. In that case, the callee function may be a
declaration, if we imported its caller but not the callee function
definition. Because imported declarations don't get the thinlto_src_file
metadata, we instead look at its caller (which works because this
happens very early in the backend before any inlining).
2025-01-29 18:22:14 -08:00
Teresa Johnson
8a86e6aefe
[MemProf] Constify a couple of methods used during cloning (#124994)
This also helps ensure we don't inadvertently create map entries
by forcing use of at() instead of operator[].
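
The effect of constifying can be illustrated with a minimal sketch. It uses std::map rather than any LLVM container, and CloneTable, clonesFor and lookupDoesNotInsert are hypothetical names invented for this example, not code from the patch. Inside a const method operator[] does not compile, so lookups have to go through at(), which never inserts.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a class that owns a lookup table.
class CloneTable {
public:
  void add(const std::string &Name, unsigned NumClones) {
    Table[Name] = NumClones; // non-const method: operator[] may insert
  }

  // const method: Table[Name] would not compile here, because operator[]
  // inserts a default value when the key is missing. at() never inserts;
  // for std::map it throws std::out_of_range instead.
  unsigned clonesFor(const std::string &Name) const { return Table.at(Name); }

  std::size_t size() const { return Table.size(); }

private:
  std::map<std::string, unsigned> Table;
};

// Returns true if looking up a missing key left the table unchanged.
inline bool lookupDoesNotInsert() {
  CloneTable T;
  T.add("foo", 2);
  try {
    (void)T.clonesFor("bar");
  } catch (const std::out_of_range &) {
    // Expected: the key is absent and nothing was inserted.
  }
  return T.size() == 1;
}
```

With operator[] in a non-const lookup, the same query for "bar" would have silently grown the table to two entries.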
2025-01-29 14:18:11 -08:00
Kazu Hirata
e0c5a8553d
[memprof] Migrate away from PointerUnion::dyn_cast (NFC) (#124505)
Note that PointerUnion::dyn_cast has been soft deprecated in
PointerUnion.h:

  // FIXME: Replace the uses of is(), get() and dyn_cast() with
  //        isa<T>, cast<T> and the llvm::dyn_cast<T>

Literal migration would result in dyn_cast_if_present (see the
definition of PointerUnion::dyn_cast), but this patch uses cast
because we know which alternative to expect in the ternary expression.
2025-01-27 10:35:37 -08:00
Teresa Johnson
7ad8a3da47
[MemProf] Simplify edge iterations (NFC) (#123469)
Remove edge iterator parameters from the various helpers that move edges
onto other nodes, and their associated iterator update code, and instead
iterate over copies of the edge lists in the caller loops. This also
avoids the need to increment these iterators at every early loop
continue.

This simplifies the code, makes it less error prone when updating, and
in particular, facilitates adding handling of recursive contexts.

There were no measurable compile time and memory overhead effects for a
large target.
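
The pattern described above, iterating over a copy of an edge list while the original list is modified, might look roughly like the following sketch. Edge, Node and moveOddEdges are simplified, hypothetical stand-ins and not the types used in the pass.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical, simplified graph types.
struct Edge {
  int Callee;
};
struct Node {
  std::vector<std::shared_ptr<Edge>> CalleeEdges;
};

// Moves every edge with an odd callee id from From to To.
inline void moveOddEdges(Node &From, Node &To) {
  // Iterate over a copy, so erasing from From.CalleeEdges below cannot
  // invalidate the loop and no iterator needs to be passed to helpers.
  auto Snapshot = From.CalleeEdges;
  for (const auto &E : Snapshot) {
    if (E->Callee % 2 == 0)
      continue; // early continue needs no manual iterator increment
    To.CalleeEdges.push_back(E);
    auto &List = From.CalleeEdges;
    List.erase(std::remove(List.begin(), List.end(), E), List.end());
  }
}
```

The copy costs one extra vector of pointers per loop, which matches the message's note that no measurable overhead was seen.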
2025-01-22 11:35:52 -08:00
Kazu Hirata
debe7bd916
[memprof] Migrate away from PointerUnion::dyn_cast (NFC) (#123716)
Note that PointerUnion::dyn_cast has been soft deprecated in
PointerUnion.h:

  // FIXME: Replace the uses of is(), get() and dyn_cast() with
  //        isa<T>, cast<T> and the llvm::dyn_cast<T>

Literal migration would result in dyn_cast_if_present (see the
definition of PointerUnion::dyn_cast), but this patch uses cast
because we expect the arguments to be of the requested types.  Note
that all these cases have assert and/or dereferences just after cast,
implying that the return value from cast must be nonnull.

---------

Co-authored-by: Nikita Popov <github@npopov.com>
2025-01-21 15:02:41 -08:00
Teresa Johnson
0ca6b2b0cc
[MemProf] Fix an incorrect iterator increment (#123438)
We pass in a pointer to an Edge iterator to
moveEdgeToExistingCalleeClone, so that it can be correctly updated when
we remove edges during an edge iteration. We were not dereferencing this
pointer in one case, meaning we would increment the pointer and not the
iterator as intended.
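
A minimal sketch of this class of bug, using hypothetical names (EdgeList, advanceCorrect, advanceBuggy) and a plain vector of ints in place of the real edge type:

```cpp
#include <cassert>
#include <vector>

using EdgeList = std::vector<int>;
using EdgeIter = EdgeList::const_iterator;

// Intended behavior: advance the caller's iterator through the pointer.
inline void advanceCorrect(EdgeIter *EI) { ++(*EI); }

// Buggy variant: increments the local copy of the pointer itself, so the
// caller's iterator is left untouched.
inline void advanceBuggy(EdgeIter *EI) {
  ++EI;
  (void)EI; // silence unused-value warnings; the increment has no effect
}
```

Both versions compile cleanly, which is why a missing dereference of this kind is easy to overlook.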

This did not cause any issues, as it turns out that we would simply skip
the edge on the next iteration as it was already appropriately handled.
While in theory this incurred some extra compilation time, in practice
for a large application the effect was not significant. I confirmed that
there was no effect on any cloning from the fix.

I plan to send a follow up change to avoid the need to pass in an
iterator at all and simplify / consolidate the handling in the caller,
but want to fix this in case something requires a revert of the follow
on fix.
2025-01-21 11:31:29 -08:00
Kazu Hirata
cac3f5ecb9
[memprof] Add simplify_type (NFC) (#123556)
IndexCall is a simple wrapper around:

  PointerUnion<CallsiteInfo *, AllocInfo *>

Now, because we don't have CastInfo for IndexCall, we would have to
use getBase like so:

  dyn_cast_if_present<CallsiteInfo *>(Call.getBase())

This patch adds simplify_type<IndexCall>, which in turn enables
CastInfo for IndexCall, so we can drop getBase like so:

  dyn_cast_if_present<CallsiteInfo *>(Call)
2025-01-20 10:12:39 -08:00
Kazu Hirata
43fdd6e81d
[memprof] Migrate away from PointerUnion::is (NFC) (#122622)
Note that PointerUnion::is has been soft deprecated in
PointerUnion.h:

  // FIXME: Replace the uses of is(), get() and dyn_cast() with
  //        isa<T>, cast<T> and the llvm::dyn_cast<T>

In this patch, I'm calling call().getBase() for an instance of
PointerUnion.  call() alone would return an instance of IndexCall,
which wraps PointerUnion.  Note that isa<> cannot directly accept an
instance of IndexCall, at least without defining CastInfo.

I'm not touching PointerUnion::dyn_cast for now because it's a bit
complicated; we could blindly migrate it to dyn_cast_if_present, but
we should probably use dyn_cast when the operand is known to be
non-null.
2025-01-12 11:06:42 -08:00
Teresa Johnson
3055e86c71
[MemProf] Disable cloning of callsites in recursive cycles by default (#122354)
This disables the support added in PR121985 by default while we
investigate a compile time crash.
2025-01-09 12:01:43 -08:00
Teresa Johnson
b8ad6fb066
[MemProf] Allow cloning of callsites in recursive cycles (#121985)
Optionally (by default) no longer mark callsite nodes as Recursive,
which means they would be automatically skipped during cloning. This was
too conservative as it prevents cloning of any callsite that showed up
in any recursive cycle, even for non-recursive contexts.

While this will enable partial cloning of recursive contexts, the
recursive calls themselves will not be updated to call the correct
clone, possibly leading to some unnecessary but benign cloning and
affecting bytes hinted reporting. To prevent this, optional support
looks for recursive cycles in contexts during cloning and removes
those contexts from cloning. This requires some additional runtime
overhead, so is disabled by default for now.

Support for correct cloning of recursive cycles is WIP.
2025-01-07 17:00:46 -08:00
Teresa Johnson
c7451ffcb9
[MemProf] Supporting hinting mostly-cold allocations after cloning (#120633)
Optionally unconditionally hint allocations as cold or not cold during
the cloning step if the percentage of bytes allocated is at least that
of the given threshold. This is similar to PR120301 which supports this
during matching, but enables the same behavior during cloning, to reduce
the false positives that can be addressed by cloning at the cost of
carrying the additional size metadata/summary.
2024-12-20 11:27:54 -08:00
Teresa Johnson
2916352936
[MemProf] Skip unmatched callers when cloning (#120455)
Don't unnecessarily clone for a caller that wasn't matched to a call
instruction.

This necessitated updating a couple of tests that were either
unnecessarily cloning or unnecessarily processing an allocation and
hinting it not cold.
2024-12-18 12:47:19 -08:00
Kazu Hirata
1dac0cd41f
[memprof] Use ListSeparator (NFC) (#120047)
ListSeparator from StringExtras.h is essentially the same as
FieldSeparator being removed in this patch.  ListSeparator returns the
empty string on the first use via "operator StringRef()".  It returns
", " on subsequent uses.
2024-12-16 09:41:16 -08:00
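
The separator behavior described in the commit above (empty on first use, ", " afterwards) can be sketched with a small stand-in class. Separator and joinIds are hypothetical names written for this illustration; this is not LLVM's ListSeparator, only a model of the behavior the message describes.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal stand-in modelled on the description above.
class Separator {
public:
  explicit Separator(std::string S = ", ") : Sep(std::move(S)) {}

  // Returns the empty string on the first call and the separator afterwards.
  std::string next() {
    if (First) {
      First = false;
      return "";
    }
    return Sep;
  }

private:
  bool First = true;
  std::string Sep;
};

// Joins ids without any special-casing of the first or last element.
inline std::string joinIds(const std::vector<int> &Ids) {
  std::ostringstream OS;
  Separator LS;
  for (int Id : Ids)
    OS << LS.next() << Id;
  return OS.str();
}
```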
Teresa Johnson
9513f2fdf2
[MemProf] Print full context hash when reporting hinted bytes (#114465)
Improve the information printed when -memprof-report-hinted-sizes is
enabled. Now print the full context hash computed from the original
profile, similar to what we do when reporting matching statistics. This
will make it easier to correlate with the profile.

Note that the full context hash must be computed at profile match time
and saved in the metadata and summary, because we may trim the context
during matching when it isn't needed for distinguishing hotness.
Similarly, due to the context trimming, we may have more than one full
context id and total size pair per MIB in the metadata and summary,
which now get a list of these pairs.

Remove the old aggregate size from the metadata and summary support.
One other change from the prior support is that we no longer write the
size information into the combined index for the LTO backends, which
don't use this information, which reduces unnecessary bloat in
distributed index files.
2024-11-15 08:24:44 -08:00
Kazu Hirata
17bc738324
[memprof] Make ContextNode smaller (#116271)
With this patch, sizeof(ContextNode) goes down from 144 to 128.

Note that SmallVector<T, 0> uses uint32_t for its capacity and size
fields.

I could change other instances of std::vector to SmallVector<T, 0>,
but that would require updates to many places, so I am leaving them
alone for now.
2024-11-14 17:28:56 -08:00
Teresa Johnson
3654183afb
[MemProf] Allow promotion if target is a declaration (#115555)
Fixes an oversight in the MemProf ICP handling, that was blocking
promotion/cloning of indirect calls when the profiled target is a
declaration (i.e. wasn't imported). There is no issue promoting in
that case, and in fact the comment mentions we should attempt to at
least import as declarations to enable more promotion.

Note that normal ICP currently requires that the target be a definition,
which is how this check ended up here. The comment there says that it
must be a definition because ThinLTO could remove declarations for
symbols found to be globally dead in the binary. However, here we are
always performing MemProf ICP in the ThinLTO backends, which is after
the globally dead symbols are removed (via dropDeadSymbols before
starting the optimization pipeline) [1].

For now, guard this with an option (flag is off which means the new
promotion is enabled by default) to simplify debugging or disabling it
if this proves problematic.

[1] In fact we could also be more aggressive in regular ICP when invoked
in the ThinLTO backend
2024-11-09 07:05:43 -08:00
Teresa Johnson
594e11ce42
[MemProf] Avoid incorrect ICP symtab canonicalization (#115419)
ICP builds a symtab from the symbols in the module allowing mapping from
the VP metadata GUIDs to the Function. MemProf uses this same symtab
handling for its ICP during cloning. When symbols are added to the
symtab, the handling adds both a GUID computed from the function name,
or from the attached PGOFuncName metadata for locals, as well as a GUID
computed from the "canonicalized" name, which strips all "." suffixes
other than ".__uniq". This was originally meant to remove the ".llvm.*"
suffix added to promoted locals (done earlier in the ThinLTO backend).
In theory, it should no longer be needed as locals should have
PGOFuncName metadata.

However, this was causing a linker unsat, in code that used coroutines.
For an original coroutine function, there were several additional
functions created that had the same name, but different "." suffixes.
Therefore the canonical name for these additional functions had the same
GUID as that of the original function, leading to extra entries in the
symtab, and to selecting the wrong function for promotion. For regular
ICP this can happen, but is just a performance issue. However, for
memprof the promoted direct call calls a memprof clone, and because we
called the wrong function, in this case it didn't have a memprof clone
and we got a linker unsat.

We may be able to remove the canonical name handling for ICP in general,
but for now disable it for MemProf. At worst this could lead to not
finding a GUID in the symtab and not performing an ICP, so should be
conservatively correct.
2024-11-07 21:00:42 -08:00
Kazu Hirata
98ea1a81a2
[IPO] Remove unused includes (NFC) (#114716)
Identified with misc-include-cleaner.
2024-11-03 13:48:55 -08:00
Teresa Johnson
355e6948d4
[MemProf] Fix clone edge comparison (#113753)
The issue fixed in PR113337 exposed a bug in the comparisons done in
allocTypesMatch, which compares a vector of alloc types to those in the
given vector of Edges. The form of std::equal used, which didn't provide
the end iterator for the Edges vector, will iterate through as many
entries in the Edges vector as in the InAllocTypes vector, which can
fail if there are fewer entries in the Edges vector, because we may
dereference a bogus Edge pointer. This function is called twice, once
for the Node, with its callee edges, in which case the number of edges
should always match the number of entries in InAllocTypes, which is
computed from the Node's callee edges. It was also called for Node's
clones, and it turns out that after cloning and edge modifications done
for other allocations, the number of callee edges in Node and its clones
may no longer match. In some cases, more common with memprof ICP before
the PR113337, the number of clone edges can be smaller leading to a bad
dereference. I found for a large application even before adding memprof
ICP support we sometimes call this with fewer entries in the clone's
callee edges, but were getting lucky as they had allocation type None,
and we didn't end up attempting to dereference the bad edge pointer.

Fix this by passing Edges.end() to std::equal, which means std::equal
will fail if the number of entries in the 2 vectors is not equal.
However, this is too conservative, as clone edges may have been added or
removed since it was initially cloned, and in fact can be wrong as we
may not be comparing allocation types corresponding to the same callee.
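
The difference between the two forms of std::equal can be shown with a short sketch. It compares plain vectors of bytes rather than edges, and allocTypesEqual is a hypothetical helper, not the function from the patch. The four-iterator overload (C++14) returns false when the ranges differ in length and never reads past the end of the second range, whereas the three-iterator overload assumes the second range is at least as long as the first.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Compares two lists of allocation types. Passing both end iterators makes
// std::equal return false on a length mismatch instead of reading beyond
// the shorter range.
inline bool allocTypesEqual(const std::vector<uint8_t> &InAllocTypes,
                            const std::vector<uint8_t> &EdgeAllocTypes) {
  return std::equal(InAllocTypes.begin(), InAllocTypes.end(),
                    EdgeAllocTypes.begin(), EdgeAllocTypes.end());
}
```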

Therefore, a couple of enhancements are made to avoid regressing and
improve the checking and cloning:
- Don't bother calling the alloc type comparison when the clone and the
  Node's alloc type for the current allocation are precise (have a
  single allocation type) and are the same (which is guaranteed by an
  earlier check, and an assert is added to confirm that). In that case
  we can trivially determine that the clone can be used.
- Split the alloc type matching handling into a separate function for
  the clone case. In that case, for each of the InAllocType entries,
  attempt to find and compare to the clone callee edge with the same
  callee as the corresponding original node callee.

To create a test case I needed to take a spec application (xalancbmk),
and repeatedly apply random hot/cold-ness to the memprof contexts
when building, until I hit the problematic case. I then reduced that
full LTO IR using llvm-reduce and then manually.
2024-10-26 20:53:20 -07:00
Teresa Johnson
144ddca9ed
[MemProf] Avoid duplicate edges between nodes (#113337)
The recent change to add support for cloning indirect calls
inadvertently caused duplicate edges to be created between the same
caller/callee pair. This is due to the new moveCalleeEdgeToNewCaller
not properly guarding the addition of a new edge (ironically I was
testing for that in an assertion, but failed to handle that case
specially otherwise). Now simply move the context ids over to any
existing edge.
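
A rough sketch of the "merge into an existing edge" idea, under simplifying assumptions: edges live in one flat vector, nodes are identified by integers, and Edge and addOrMergeEdge are hypothetical names rather than the structures in the pass.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical, simplified edge record.
struct Edge {
  int Caller;
  int Callee;
  std::set<unsigned> ContextIds;
};

// Adds Ids to an existing Caller->Callee edge if one exists; only creates a
// new edge when there is none, so no duplicate edges can appear.
inline void addOrMergeEdge(std::vector<Edge> &Edges, int Caller, int Callee,
                           const std::set<unsigned> &Ids) {
  for (Edge &E : Edges) {
    if (E.Caller == Caller && E.Callee == Callee) {
      E.ContextIds.insert(Ids.begin(), Ids.end());
      return;
    }
  }
  Edges.push_back(Edge{Caller, Callee, Ids});
}
```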

This issue in turn led to some assumptions in cloning being violated,
resulting in a later crash.

Add a test for this case to checkNode.
2024-10-25 11:09:57 -07:00
Teresa Johnson
120e42d313
[MemProf] Improve metadata cleanup in LTO backend (#113039)
Previously we were attempting to remove the memprof-related metadata
when iterating through instructions in the LTO backend. However, we
missed some as there are a number of cases where we skip instructions,
or even entire functions. Simplify the cleanup and ensure all is removed
by doing a full sweep over all instructions after completing cloning.

This is largely NFC except with -memprof-report-hinted-sizes enabled,
because we were propagating and simplifying the metadata after inlining
in the LTO backend, which caused some stray messages as metadata was
re-converted to attributes.
2024-10-21 08:51:36 -07:00
Teresa Johnson
1de71652fd
[MemProf] Support cloning for indirect calls with ThinLTO (#110625)
This patch enables support for cloning in indirect callsites.

This is done by synthesizing callsite records for each virtual call
target from the profile metadata. In the thin link all the synthesized
records for a particular indirect callsite initially share the same
context node, but support is added to partition the callsites and
outgoing edges based on the callee function, creating a separate node
for each target.

In the LTO backend, when cloning is needed we first perform indirect
call promotion, then change the target of the new direct call to the
desired clone.

Note this is ThinLTO-specific, since for regular LTO indirect call
promotion should have already occurred.
2024-10-11 13:53:35 -07:00
Teresa Johnson
c616f19924
[MemProf] Refactor context node creation into a new helper (NFC) (#108408)
Simplify code by refactoring some common handling for node creation into
a helper function.
2024-09-27 11:36:40 -07:00
Teresa Johnson
9483ff9f09
Reapply "[MemProf] Streamline and avoid unnecessary context id duplication (#107918)" (#110036)
This reverts commit 12d4769cb84b2b2e60f9776fa043c6ea16f08ebb, reapplying
524a028f69cdf25503912c396ebda7ebf0065ed2 but with fixes for failures
seen in broader testing.
2024-09-26 13:41:56 -07:00
Teresa Johnson
02d6aad5cc
[MemProf] Reduce unnecessary context id computation (NFC) (#109857)
One of the memory reduction techniques was to compute node context ids
on the fly. This reduced memory at the expense of some compile time
increase.

For a large binary we were spending a lot of time invoking getContextIds
on the node during assignStackNodesPostOrder, because we were iterating
through the stack ids for a call from leaf to root (first to last node
in the parlance used in that code). However, all calls for a given entry
in the StackIdToMatchingCalls map share the same last node, so we can
borrow the approach used by similar code in updateStackNodes and compute
the context ids on the last node once, then iterate each call's stack
ids in reverse order while reusing the last node's context ids.

This reduced the thin link time by 43% for a large target. It isn't
clear why there wasn't a similar increase measured when introducing the
node context id recomputation, but the compile time was longer to start
with then.
2024-09-24 16:18:48 -07:00
Teresa Johnson
beb2ae7348
[MemProf] Refactor and clean up edge removal (#109188)
Add helper for removing an edge from the graph, and for checking if an
edge has been removed from the graph, and then update code to use those
consistently for removal and during edge iteration, respectively. Also
fix a couple of places that were incorrectly iterating over edge lists
that could in theory be updated during the iteration.
2024-09-19 09:31:50 -07:00
Teresa Johnson
12d4769cb8
Revert "[MemProf] Streamline and avoid unnecessary context id duplication (#107918)" (#108652)
This reverts commit 524a028f69cdf25503912c396ebda7ebf0065ed2, but
manually so that follow on PR108086 /
ae5f1a78d3a930466f927989faac8e0b9d820a7b
is retained (NFC patch to convert tuple to a struct).
2024-09-13 16:20:43 -07:00
Teresa Johnson
ae5f1a78d3
[MemProf] Convert CallContextInfo to a struct (NFC) (#108086)
As suggested in #107918, improve readability by converting this tuple to
a struct.
2024-09-10 16:27:56 -07:00
Teresa Johnson
524a028f69
[MemProf] Streamline and avoid unnecessary context id duplication (#107918)
Sort the list of calls such that those with the same stack ids are also
sorted by function. This allows processing of all matching calls (that
can share a context node) in bulk as they are all adjacent.

This has 2 benefits:
1. It reduces unnecessary work, specifically the handling to intersect
   the context ids with those along the graph edges for the stack ids,
   for calls that we know can share a node.
2. It simplifies detecting when we have matching stack ids but don't
   need to duplicate context ids. Specifically, we were previously
   still duplicating context ids whenever we saw another call with the
   same stack ids, but that isn't necessary if they will share a context
   node. With this change we now only duplicate context ids if we see
   some that not only have the same ids but also are in different
   functions.

This change reduced the amount of context id duplication and provided
reductions in both peak memory (~8%) and time (~5%) for a large
target.
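
The sorting idea described above can be sketched as follows. CallInfo, sortCalls and countSharedGroups are hypothetical, and the ordering used here (lexicographic on stack ids, then function id) is an assumption for illustration; the actual comparator in the pass may order differently. The point is only that calls able to share a node end up adjacent.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <tuple>
#include <vector>

// Hypothetical record: a call, the stack ids it matched, and its function.
struct CallInfo {
  int CallId;
  std::vector<unsigned> StackIds;
  int FuncId;
};

// Sorts so that calls with equal stack ids are adjacent and, within such a
// run, calls from the same function are adjacent as well.
inline void sortCalls(std::vector<CallInfo> &Calls) {
  std::stable_sort(Calls.begin(), Calls.end(),
                   [](const CallInfo &A, const CallInfo &B) {
                     return std::tie(A.StackIds, A.FuncId) <
                            std::tie(B.StackIds, B.FuncId);
                   });
}

// Counts runs of adjacent calls that share both stack ids and function. In
// this simplified model each run could be handled in bulk.
inline std::size_t countSharedGroups(const std::vector<CallInfo> &Sorted) {
  std::size_t Groups = 0;
  for (std::size_t I = 0; I < Sorted.size(); ++I)
    if (I == 0 || Sorted[I].StackIds != Sorted[I - 1].StackIds ||
        Sorted[I].FuncId != Sorted[I - 1].FuncId)
      ++Groups;
  return Groups;
}
```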
2024-09-10 10:11:33 -07:00
Teresa Johnson
e46f03bc31
[MemProf] Remove unnecessary data structure (NFC) (#107643)
Recent change #106623 added the CallToFunc map, but I subsequently
realized the same information is already available for the calls being
examined in the StackIdToMatchingCalls map we're iterating through.
2024-09-09 08:17:41 -07:00
Teresa Johnson
0ab3d6e143
Reapply "[MemProf] Reduce cloning overhead by sharing nodes when possible" (#102932) with fixes (#106623)
This reverts commit 11aa31f595325d6b2dede3364e4b86d78fffe635, restoring
commit 055e4319112282354327af9908091fdb25149e9b, with added fixes for
linker unsats.

In some cases multiple calls to different targets may end up with the
same debug information, and therefore callsite id. We will end up
sharing the node between these calls. We don't know which one matches
the callees until all nodes are matched with calls, at which point any
non-matching calls should be removed from the node. The fix extends the
handling in handleCallsitesWithMultipleTargets to do this, and adds
tests for various permutations of this situation.
2024-08-30 17:24:40 -07:00
Teresa Johnson
11aa31f595
Revert "[MemProf] Reduce cloning overhead by sharing nodes when possible" (#102932)
Reverts llvm/llvm-project#99832

This caused a couple of failures in wider testing; reverting for now and
will recommit once they are addressed.
2024-08-12 10:38:08 -07:00
lifengxiang1025
e6aeb3f4da
[MemProf] Fix when function has indirect call (#101170)
When a function has an indirect call in LTO mode, it triggers `assert(Alias)`
in `findProfiledCalleeThroughTailCalls`.
2024-08-01 10:16:53 +08:00
Teresa Johnson
055e431911
[MemProf] Reduce cloning overhead by sharing nodes when possible (#99832)
When assigning calls to nodes while building the graph, we can share
nodes between multiple calls in some cases. Specifically, when we
process the list of calls that had the same stack ids (possibly pruned,
because we are looking at the stack ids that actually had nodes in the
graph due to stack ids in the pruned allocation MIBs), for calls that
are located in the same function, we know that they will behave exactly
the same through cloning and function assignment. Therefore, instead of
creating nodes for all of them (requiring context id duplication), keep
a list of additional "matching calls" on the nodes. During function
assignment we simply update all the matching calls the same way as the
primary call.

This change not only reduces the number of nodes (both original and
cloned), but also greatly reduces the number of duplicated context ids
and the time to propagate them.

For a large target, I measured a 25% peak memory reduction and 42% time
reduction.
2024-07-23 12:44:06 -07:00
Teresa Johnson
edfe25064e
[MemProf] Consolidate increments in callee matching code (#99385)
To facilitate some follow on changes, consolidate the incrementing of
the edge iterator used during callee matching to the for loop statement.
This requires an additional adjustment in the case of tail call
handling.
2024-07-17 20:25:18 -07:00
Teresa Johnson
9f8205d9d8
[MemProf] Track and report profiled sizes through cloning (#98382)
If requested, via the -memprof-report-hinted-sizes option, track the
total profiled size of each MIB through the thin link, then report on
the corresponding allocation coldness after all cloning is complete.

To save size, a different bitcode record type is used for the allocation
info when the option is specified, and the sizes are kept separate from
the MIBs in the index.
2024-07-11 16:10:30 -07:00
Kazu Hirata
fef144cebb
Revert "[llvm] Use llvm::sort (NFC) (#96434)"
This reverts commit 05d167fc201b4f2e96108be0d682f6800a70c23d.

Reverting the patch fixes the following under EXPENSIVE_CHECKS:

  LLVM :: CodeGen/AMDGPU/sched-group-barrier-pipeline-solver.mir
  LLVM :: CodeGen/AMDGPU/sched-group-barrier-pre-RA.mir
  LLVM :: CodeGen/PowerPC/aix-xcoff-used-with-stringpool.ll
  LLVM :: CodeGen/PowerPC/merge-string-used-by-metadata.mir
  LLVM :: CodeGen/PowerPC/mergeable-string-pool-large.ll
  LLVM :: CodeGen/PowerPC/mergeable-string-pool-pass-only.mir
  LLVM :: CodeGen/PowerPC/mergeable-string-pool.ll
2024-06-25 11:18:40 -07:00
Kazu Hirata
05d167fc20
[llvm] Use llvm::sort (NFC) (#96434)
2024-06-23 10:38:51 -07:00
Nikita Popov
f1075a34ab
[FileSystem] Avoid <stack> include (NFC)
The standard pattern in LLVM is to directly use vectors for stacks,
without an additional std::stack wrapper to rename some methods.
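
As a small illustration of using a vector directly as a stack, here is a depth-first traversal with push_back, back and pop_back. The function sumReachable and the child-list encoding of the tree are made up for this example.

```cpp
#include <cassert>
#include <vector>

// Sums the ids of all nodes reachable from Root. Children[N] lists the
// children of node N. The vector plays the role of the stack.
inline int sumReachable(const std::vector<std::vector<int>> &Children,
                        int Root) {
  std::vector<int> Stack;
  Stack.push_back(Root);
  int Sum = 0;
  while (!Stack.empty()) {
    int N = Stack.back(); // top()
    Stack.pop_back();     // pop()
    Sum += N;
    for (int C : Children[N])
      Stack.push_back(C); // push()
  }
  return Sum;
}
```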
2024-06-21 13:44:46 +02:00
Kazu Hirata
5dc99af487
[llvm] Use llvm::is_contained (NFC) (#95362)
2024-06-13 08:09:13 -07:00
Kazu Hirata
e2d539bbba
[memprof] Fix comment typos (NFC)
2024-06-10 16:38:24 -07:00
Kazu Hirata
b7d976d4e5
[memprof] Use std::move in ContextEdge::ContextEdge (NFC) (#94687)
Since the constructor of ContextEdge takes ContextIds by value, we
should move it to the corresponding member variable as suggested by
clang-tidy's performance-unnecessary-value-param.

While we are at it, this patch updates a couple of callers.  To avoid
the ambiguity in the evaluation order among the constructor arguments,
I'm calling computeAllocType before calling the constructor.
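
A minimal sketch of both points, with simplified hypothetical types: Edge and makeEdge are illustrative, and computeAllocType here is a dummy with an invented rule, not the function from the pass. The by-value parameter is moved into the member, and the value derived from the set is computed before the constructor call so that it cannot depend on the order in which the constructor arguments are evaluated.

```cpp
#include <cassert>
#include <set>
#include <utility>

// Hypothetical, simplified edge type.
struct Edge {
  // Ids is taken by value, so moving it into the member avoids a second copy.
  Edge(int AllocType, std::set<unsigned> Ids)
      : AllocType(AllocType), ContextIds(std::move(Ids)) {}

  int AllocType;
  std::set<unsigned> ContextIds;
};

// Dummy stand-in: 0 for an empty set, 1 otherwise.
inline int computeAllocType(const std::set<unsigned> &Ids) {
  return Ids.empty() ? 0 : 1;
}

inline Edge makeEdge(std::set<unsigned> Ids) {
  // Computed first: in Edge(computeAllocType(Ids), std::move(Ids)) the
  // argument evaluation order is unspecified, so Ids could already have
  // been moved from when computeAllocType reads it.
  int AllocType = computeAllocType(Ids);
  return Edge(AllocType, std::move(Ids));
}
```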
2024-06-06 23:49:05 -07:00
Teresa Johnson
9eac38a000
[MemProf] Remove context id set from nodes and recompute on demand (#94415)
The ContextIds set on the ContextNode struct is not technically needed
as we can compute it from either the callee or caller edge context ids.
Remove it and add a helper to recompute from the edges on demand. Also
add helpers to compute the node allocation type and whether the context
ids are empty from the edges without needing to first compute the node's
context id set, to minimize the runtime cost increase.
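
The trade-off can be sketched with simplified, hypothetical types. The choice to read callee edges when present and caller edges otherwise is an assumption made for this example, as are all the names; the real helpers may differ.

```cpp
#include <cassert>
#include <set>
#include <vector>

struct Edge {
  std::set<unsigned> ContextIds;
};

struct Node {
  std::vector<Edge> CalleeEdges;
  std::vector<Edge> CallerEdges;

  // Assumption for this sketch: use callee edges if any, else caller edges.
  const std::vector<Edge> &edgesForIds() const {
    return CalleeEdges.empty() ? CallerEdges : CalleeEdges;
  }

  // Recomputes the node's context ids on demand instead of storing them.
  std::set<unsigned> getContextIds() const {
    std::set<unsigned> Ids;
    for (const Edge &E : edgesForIds())
      Ids.insert(E.ContextIds.begin(), E.ContextIds.end());
    return Ids;
  }

  // Cheaper query that avoids building the full set.
  bool emptyContextIds() const {
    for (const Edge &E : edgesForIds())
      if (!E.ContextIds.empty())
        return false;
    return true;
  }
};
```

Queries such as emptyContextIds show how the extra cost of recomputation can be limited: they stop at the first non-empty edge and allocate nothing.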

This yielded a 20% reduction in peak memory for a large thin link, for
about a 2% time increase (which is more than offset by some other recent
time efficiency improvements).
2024-06-06 11:04:45 -07:00
Teresa Johnson
4973ad4718
[MemProf][NFC] Use range for loop (#94308)
With the change in 2fa059195bb54f422cc996db96ac549888268eae we can now
use a range for loop.
2024-06-03 21:15:40 -07:00
Teresa Johnson
2fa059195b
[MemProf] Use remove_if to erase MapVector elements in bulk (#94269)
A cycle profile showed that we were spending a lot of time invoking
MapVector::erase. According to
https://llvm.org/docs/ProgrammersManual.html#llvm-adt-mapvector-h,
erasing elements one at a time is very inefficient for MapVector and it
is better to use remove_if.
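
The cost difference can be illustrated on a plain std::vector, used here as a stand-in since MapVector keeps its entries in a vector. Entry and eraseEmpty are hypothetical names, and this shows the standard erase-remove idiom rather than MapVector's own remove_if member.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

using Entry = std::pair<unsigned, int>; // (key, payload)

// Removes all entries with a zero payload in a single pass. Each survivor
// is shifted at most once, whereas erasing elements one at a time shifts
// the tail of the vector on every erase.
inline void eraseEmpty(std::vector<Entry> &Entries) {
  Entries.erase(std::remove_if(Entries.begin(), Entries.end(),
                               [](const Entry &E) { return E.second == 0; }),
                Entries.end());
}
```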

This change resulted in around 7% time reduction on a large thin link.

While here remove an unused function that also invokes erase on
MapVectors.
2024-06-03 20:43:52 -07:00
Teresa Johnson
61afebdacc
[MemProf][NFC] Switch to DenseMaps (#93868)
Change a couple of maps from std::map to DenseMap, which showed
a modest (3.6%) reduction in peak RSS.
2024-05-30 12:57:14 -07:00
Teresa Johnson
b2f6d323fc
[MemProf] Fix tailcall discovery checking for multiple callee chains (#92632)
When looking for missing frames due to tail calls, we were not checking
the output parameter of the recursive call in the correct place.
Make sure we check for the case when that recursive call returned false
due to multiple possible callee chains.

Extended the existing test a bit to catch this case.
2024-05-24 07:38:07 -07:00
Teresa Johnson
a332cfc986
[MemProf] Perform cloning for each allocation separately (#87112)
Restructures the cloning slightly to perform all cloning for each
allocation separately. The prior algorithm would sometimes miss cloning
opportunities in cases where trimmed cold contexts partially overlapped
with longer contexts for different allocations.

Most of the change is isolated to the helpers that move edges to new or
existing clones, which now support moving a subset of context ids.
2024-04-09 14:12:32 -07:00
Teresa Johnson
082e7c480e
[MemProf] Remove empty edges once after cloning (#85320)
Restructure the handling of edges that become empty during the cloning
process. Instead of removing them as they become empty (no context ids
and alloc type), do this once after all cloning is complete.

This has no effect on the cloning result, but prepares for a follow on
change that does improve the cloning. The structural change here reduces
the diffs for the follow on change, which would be much more difficult
with the previous handling.
2024-03-26 20:06:27 -07:00