llvm-project

Author	SHA1	Message	Date
Teresa Johnson	355e6948d4	[MemProf] Fix clone edge comparison (#113753 ) The issue fixed in PR113337 exposed a bug in the comparisons done in allocTypesMatch, which compares a vector of alloc types to those in the given vector of Edges. The form of std::equal used, which didn't provide the end iterator for the Edges vector, will iterate through as many entries in the Edges vector as in the InAllocTypes vector, which can fail if there are fewer entries in the Edges vector, because we may dereference a bogus Edge pointer. This function is called twice, once for the Node, with its callee edges, in which case the number of edges should always match the number of entries in allocTypesMatch, which is computed from the Node's callee edges. It was also called for Node's clones, and it turns out that after cloning and edge modifications done for other allocations, the number of callee edges in Node and its clones may no longer match. In some cases, more common with memprof ICP before the PR113337, the number of clone edges can be smaller leading to a bad dereference. I found for a large application even before adding memprof ICP support we sometimes call this with fewer entries in the clone's callee edges, but were getting lucky as they had allocation type None, and we didn't end up attempting to dereference the bad edge pointer. Fix this by passing Edges.end() to std::equal, which means std::equal will fail if the number of entries in the 2 vectors are not equal. However, this is too conservative, as clone edges may have been added or removed since it was initially cloned, and in fact can be wrong as we may not be comparing allocation types corresponding to the same callee. 
Therefore, a couple of enhancements are made to avoid regressing and improve the checking and cloning: - Don't bother calling the alloc type comparison when the clone and the Node's alloc type for the current allocation are precise (have a single allocation type) and are the same (which is guaranteed by an earlier check, and an assert is added to confirm that). In that case we can trivially determine that the clone can be used. - Split the alloc type matching handling into a separate function for the clone case. In that case, for each of the InAllocType entries, attempt to find and compare to the clone callee edge with the same callee as the corresponding original node callee. To create a test case I needed to take a spec application (xalancbmk), and repeatedly apply random hot/cold-ness to the memprof contexts when building, until I hit the problematic case. I then reduced that full LTO IR using llvm-reduce and then manually.	2024-10-26 20:53:20 -07:00
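The difference between the two forms of std::equal described in this commit can be illustrated with a minimal sketch. It uses plain integer vectors rather than the real edge and allocation type structures, and the function name `allocTypesMatchSketch` is hypothetical, not the code from the patch.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for the fixed comparison. The four-iterator form of
// std::equal also receives the end of the second range, so it returns false
// on a length mismatch instead of reading past the end of the shorter vector.
bool allocTypesMatchSketch(const std::vector<int> &InAllocTypes,
                           const std::vector<int> &EdgeAllocTypes) {
  return std::equal(InAllocTypes.begin(), InAllocTypes.end(),
                    EdgeAllocTypes.begin(), EdgeAllocTypes.end());
}

// By contrast, the three-iterator form
//   std::equal(A.begin(), A.end(), B.begin())
// assumes B holds at least as many elements as A. With a shorter B it reads
// out of bounds, which is undefined behavior.
```

As the commit message notes, a plain length check is stricter than needed for clones, which is why the patch adds separate handling that matches edges by callee.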
Teresa Johnson	144ddca9ed	[MemProf] Avoid duplicate edges between nodes (#113337 ) The recent change to add support for cloning indirect calls inadvertently caused duplicate edges to be created between the same caller/callee pair. This is due to the new moveCalleeEdgeToNewCaller not properly guarding the addition of a new edge (ironically I was testing for that in an assertion, but failed to handle that case specially otherwise). Now simply move the context ids over to any existing edge. This issue in turn led to some assumptions in cloning being violated, resulting in a later crash. Add a test for this case to checkNode.	2024-10-25 11:09:57 -07:00
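The idea of moving context ids onto an existing edge instead of adding a second edge can be sketched roughly as below. The graph representation (a map keyed by caller/callee pair) and the names `EdgeMap` and `moveCalleeEdgeSketch` are assumptions made for illustration; the real pass uses its own node and edge classes.

```cpp
#include <cstdint>
#include <map>
#include <set>
#include <utility>

// Hypothetical simplified graph: at most one edge per (caller, callee) pair,
// each carrying a set of context ids.
using NodeId = int;
using ContextIdSet = std::set<uint32_t>;
using EdgeMap = std::map<std::pair<NodeId, NodeId>, ContextIdSet>;

// Re-home the edge OldCaller->Callee so that it originates from NewCaller.
// If NewCaller already has an edge to Callee, the ids are merged into it
// rather than creating a duplicate edge.
void moveCalleeEdgeSketch(EdgeMap &Edges, NodeId OldCaller, NodeId NewCaller,
                          NodeId Callee) {
  auto It = Edges.find({OldCaller, Callee});
  if (It == Edges.end())
    return; // Nothing to move.
  ContextIdSet Ids = std::move(It->second);
  Edges.erase(It);
  // operator[] creates an empty edge only when none exists yet.
  Edges[{NewCaller, Callee}].insert(Ids.begin(), Ids.end());
}
```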
Teresa Johnson	120e42d313	[MemProf] Improve metadata cleanup in LTO backend (#113039 ) Previously we were attempting to remove the memprof-related metadata when iterating through instructions in the LTO backend. However, we missed some as there are a number of cases where we skip instructions, or even entire functions. Simplify the cleanup and ensure all is removed by doing a full sweep over all instructions after completing cloning. This is largely NFC except with -memprof-report-hinted-sizes enabled, because we were propagating and simplifying the metadata after inlining in the LTO backend, which caused some stray messages as metadata was re-converted to attributes.	2024-10-21 08:51:36 -07:00
Teresa Johnson	1de71652fd	[MemProf] Support cloning for indirect calls with ThinLTO (#110625 ) This patch enables support for cloning in indirect callsites. This is done by synthesizing callsite records for each virtual call target from the profile metadata. In the thin link all the synthesized records for a particular indirect callsite initially share the same context node, but support is added to partition the callsites and outgoing edges based on the callee function, creating a separate node for each target. In the LTO backend, when cloning is needed we first perform indirect call promotion, then change the target of the new direct call to the desired clone. Note this is ThinLTO-specific, since for regular LTO indirect call promotion should have already occurred.	2024-10-11 13:53:35 -07:00
Teresa Johnson	c616f19924	[MemProf] Refactor context node creation into a new helper (NFC) (#108408 ) Simplify code by refactoring some common handling for node creation into a helper function.	2024-09-27 11:36:40 -07:00
Teresa Johnson	9483ff9f09	Reapply "[MemProf] Streamline and avoid unnecessary context id duplication (#107918 )" (#110036 ) This reverts commit 12d4769cb84b2b2e60f9776fa043c6ea16f08ebb, reapplying 524a028f69cdf25503912c396ebda7ebf0065ed2 but with fixes for failures seen in broader testing.	2024-09-26 13:41:56 -07:00
Teresa Johnson	02d6aad5cc	[MemProf] Reduce unnecessary context id computation (NFC) (#109857 ) One of the memory reduction techniques was to compute node context ids on the fly. This reduced memory at the expense of some compile time increase. For a large binary we were spending a lot of time invoking getContextIds on the node during assignStackNodesPostOrder, because we were iterating through the stack ids for a call from leaf to root (first to last node in the parlance used in that code). However, all calls for a given entry in the StackIdToMatchingCalls map share the same last node, so we can borrow the approach used by similar code in updateStackNodes and compute the context ids on the last node once, then iterate each call's stack ids in reverse order while reusing the last node's context ids. This reduced the thin link time by 43% for a large target. It isn't clear why there wasn't a similar increase measured when introducing the node context id recomputation, but the compile time was longer to start with then.	2024-09-24 16:18:48 -07:00
Teresa Johnson	beb2ae7348	[MemProf] Refactor and clean up edge removal (#109188 ) Add helper for removing an edge from the graph, and for checking if an edge has been removed from the graph, and then update code to use those consistently for removal and during edge iteration, respectively. Also fix a couple of places that were incorrectly iterating over edge lists that could in theory be updated during the iteration.	2024-09-19 09:31:50 -07:00
Teresa Johnson	12d4769cb8	Revert "[MemProf] Streamline and avoid unnecessary context id duplication (#107918 )" (#108652 ) This reverts commit 524a028f69cdf25503912c396ebda7ebf0065ed2, but manually so that follow on PR108086 / ae5f1a78d3a930466f927989faac8e0b9d820a7b is retained (NFC patch to convert tuple to a struct).	2024-09-13 16:20:43 -07:00
Teresa Johnson	ae5f1a78d3	[MemProf] Convert CallContextInfo to a struct (NFC) (#108086 ) As suggested in #107918, improve readability by converting this tuple to a struct.	2024-09-10 16:27:56 -07:00
Teresa Johnson	524a028f69	[MemProf] Streamline and avoid unnecessary context id duplication (#107918 ) Sort the list of calls such that those with the same stack ids are also sorted by function. This allows processing of all matching calls (that can share a context node) in bulk as they are all adjacent. This has 2 benefits: 1. It reduces unnecessary work, specifically the handling to intersect the context ids with those along the graph edges for the stack ids, for calls that we know can share a node. 2. It simplifies detecting when we have matching stack ids but don't need to duplicate context ids. Specifically, we were previously still duplicating context ids whenever we saw another call with the same stack ids, but that isn't necessary if they will share a context node. With this change we now only duplicate context ids if we see some that not only have the same ids but also are in different functions. This change reduced the amount of context id duplication and provided reductions in both peak memory (~8%) and time (~5%) for a large target.	2024-09-10 10:11:33 -07:00
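The grouping effect of that sort can be shown with a small sketch. The record type `CallSketch`, the function `countSharableGroups`, and the exact sort key are assumptions for illustration; the real code sorts its own call records and may use additional criteria.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical simplified record for a call matched to a stack id sequence.
struct CallSketch {
  std::vector<unsigned> StackIds;
  std::string Func;
};

// Sort so that calls with identical stack ids are adjacent and, within such
// a run, grouped by enclosing function. Returns the number of distinct
// (stack ids, function) groups; calls in one group are the ones that could
// share a single node.
std::size_t countSharableGroups(std::vector<CallSketch> &Calls) {
  std::stable_sort(Calls.begin(), Calls.end(),
                   [](const CallSketch &A, const CallSketch &B) {
                     return std::tie(A.StackIds, A.Func) <
                            std::tie(B.StackIds, B.Func);
                   });
  std::size_t Groups = 0;
  for (std::size_t I = 0; I < Calls.size(); ++I)
    if (I == 0 || Calls[I].StackIds != Calls[I - 1].StackIds ||
        Calls[I].Func != Calls[I - 1].Func)
      ++Groups;
  return Groups;
}
```

Because members of a group end up adjacent, a single linear pass is enough to handle each group in bulk.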
Teresa Johnson	e46f03bc31	[MemProf] Remove unnecessary data structure (NFC) (#107643 ) Recent change #106623 added the CallToFunc map, but I subsequently realized the same information is already available for the calls being examined in the StackIdToMatchingCalls map we're iterating through.	2024-09-09 08:17:41 -07:00
Teresa Johnson	0ab3d6e143	Reapply "[MemProf] Reduce cloning overhead by sharing nodes when possible" (#102932 ) with fixes (#106623 ) This reverts commit 11aa31f595325d6b2dede3364e4b86d78fffe635, restoring commit 055e4319112282354327af9908091fdb25149e9b, with added fixes for linker unsats. In some cases multiple calls to different targets may end up with the same debug information, and therefore callsite id. We will end up sharing the node between these calls. We don't know which one matches the callees until all nodes are matched with calls, at which point any non-matching calls should be removed from the node. The fix extends the handling in handleCallsitesWithMultipleTargets to do this, and adds tests for various permutations of this situation.	2024-08-30 17:24:40 -07:00
Teresa Johnson	11aa31f595	Revert "[MemProf] Reduce cloning overhead by sharing nodes when possible" (#102932 ) Reverts llvm/llvm-project#99832 This caused a couple failures in wider testing, reverting for now and will recommit once they are addressed	2024-08-12 10:38:08 -07:00
lifengxiang1025	e6aeb3f4da	[MemProf] Fix when function has indirect call (#101170 ) When a function has an indirect call in LTO mode, it triggers `assert(Alias)` in `findProfiledCalleeThroughTailCalls`	2024-08-01 10:16:53 +08:00
Teresa Johnson	055e431911	[MemProf] Reduce cloning overhead by sharing nodes when possible (#99832 ) When assigning calls to nodes while building the graph, we can share nodes between multiple calls in some cases. Specifically, when we process the list of calls that had the same stack ids (possibly pruned, because we are looking at the stack ids that actually had nodes in the graph due to stack ids in the pruned allocation MIBs), for calls that are located in the same function, we know that they will behave exactly the same through cloning and function assignment. Therefore, instead of creating nodes for all of them (requiring context id duplication), keep a list of additional "matching calls" on the nodes. During function assignment we simply update all the matching calls the same way as the primary call. This change not only reduces the number of nodes (both original and cloned), but also greatly reduces the number of duplicated context ids and the time to propagate them. For a large target, I measured a 25% peak memory reduction and 42% time reduction.	2024-07-23 12:44:06 -07:00
Teresa Johnson	edfe25064e	[MemProf] Consolidate increments in callee matching code (#99385 ) To facilitate some follow on changes, consolidate the incrementing of the edge iterator used during callee matching to the for loop statement. This requires an additional adjustment in the case of tail call handling.	2024-07-17 20:25:18 -07:00
Teresa Johnson	9f8205d9d8	[MemProf] Track and report profiled sizes through cloning (#98382 ) If requested, via the -memprof-report-hinted-sizes option, track the total profiled size of each MIB through the thin link, then report on the corresponding allocation coldness after all cloning is complete. To save size, a different bitcode record type is used for the allocation info when the option is specified, and the sizes are kept separate from the MIBs in the index.	2024-07-11 16:10:30 -07:00
Kazu Hirata	fef144cebb	Revert "[llvm] Use llvm::sort (NFC) (#96434 )" This reverts commit 05d167fc201b4f2e96108be0d682f6800a70c23d. Reverting the patch fixes the following under EXPENSIVE_CHECKS: LLVM :: CodeGen/AMDGPU/sched-group-barrier-pipeline-solver.mir LLVM :: CodeGen/AMDGPU/sched-group-barrier-pre-RA.mir LLVM :: CodeGen/PowerPC/aix-xcoff-used-with-stringpool.ll LLVM :: CodeGen/PowerPC/merge-string-used-by-metadata.mir LLVM :: CodeGen/PowerPC/mergeable-string-pool-large.ll LLVM :: CodeGen/PowerPC/mergeable-string-pool-pass-only.mir LLVM :: CodeGen/PowerPC/mergeable-string-pool.ll	2024-06-25 11:18:40 -07:00
Kazu Hirata	05d167fc20	[llvm] Use llvm::sort (NFC) (#96434 )	2024-06-23 10:38:51 -07:00
Nikita Popov	f1075a34ab	[FileSystem] Avoid <stack> include (NFC) The standard pattern in LLVM is to directly use vectors for stacks, without an additional std::stack wrapper to rename some methods.	2024-06-21 13:44:46 +02:00
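The pattern this commit refers to, using a vector directly as a stack, looks roughly like the sketch below. The function `drainStack` is a hypothetical example, not code from the patch.

```cpp
#include <vector>

// Uses std::vector directly as a stack: push_back() pushes, back() reads the
// top, and pop_back() pops. Returns the values in the order they are popped
// (last in, first out).
std::vector<int> drainStack(std::vector<int> Stack) {
  std::vector<int> Popped;
  while (!Stack.empty()) {
    Popped.push_back(Stack.back()); // Equivalent of std::stack::top().
    Stack.pop_back();               // Equivalent of std::stack::pop().
  }
  return Popped;
}
```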
Kazu Hirata	5dc99af487	[llvm] Use llvm::is_contained (NFC) (#95362 )	2024-06-13 08:09:13 -07:00
Kazu Hirata	e2d539bbba	[memprof] Fix comment typos (NFC)	2024-06-10 16:38:24 -07:00
Kazu Hirata	b7d976d4e5	[memprof] Use std::move in ContextEdge::ContextEdge (NFC) (#94687 ) Since the constructor of ContextEdge takes ContextIds by value, we should move it to the corresponding member variable as suggested by clang-tidy's performance-unnecessary-value-param. While we are at it, this patch updates a couple of callers. To avoid the ambiguity in the evaluation order among the constructor arguments, I'm calling computeAllocType before calling the constructor.	2024-06-06 23:49:05 -07:00
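A minimal sketch of the pattern described in this commit follows. The types and the helper `computeAllocTypeSketch` are hypothetical stand-ins for ContextEdge and computeAllocType, chosen only to show why the helper is called before the constructor.

```cpp
#include <set>
#include <utility>

struct EdgeSketch {
  unsigned char AllocTypes;
  std::set<unsigned> ContextIds;
  // Ids is taken by value, so moving it into the member avoids a second copy.
  EdgeSketch(unsigned char AllocTypes, std::set<unsigned> Ids)
      : AllocTypes(AllocTypes), ContextIds(std::move(Ids)) {}
};

// Hypothetical helper standing in for computeAllocType.
unsigned char computeAllocTypeSketch(const std::set<unsigned> &Ids) {
  return Ids.empty() ? 0 : 1;
}

EdgeSketch makeEdge(std::set<unsigned> Ids) {
  // Compute the type first. Writing
  //   EdgeSketch(computeAllocTypeSketch(Ids), std::move(Ids))
  // would leave the evaluation order of the two arguments unspecified, so
  // Ids could already be moved from when the helper reads it.
  unsigned char Type = computeAllocTypeSketch(Ids);
  return EdgeSketch(Type, std::move(Ids));
}
```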
Teresa Johnson	9eac38a000	[MemProf] Remove context id set from nodes and recompute on demand (#94415 ) The ContextIds set on the ContextNode struct is not technically needed as we can compute it from either the callee or caller edge context ids. Remove it and add a helper to recompute from the edges on demand. Also add helpers to compute the node allocation type and whether the context ids are empty from the edges without needing to first compute the node's context id set, to minimize the runtime cost increase. This yielded a 20% reduction in peak memory for a large thin link, for about a 2% time increase (which is more than offset by some other recent time efficiency improvements).	2024-06-06 11:04:45 -07:00
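Recomputing a node's context ids from its edges, along with a cheaper emptiness check that avoids building the full set, might look like the sketch below. The struct layout and the choice of which edge list to scan are assumptions for illustration, not the actual ContextNode implementation.

```cpp
#include <set>
#include <vector>

struct EdgeIdsSketch {
  std::set<unsigned> ContextIds;
};

struct NodeSketch {
  std::vector<EdgeIdsSketch> CalleeEdges;
  std::vector<EdgeIdsSketch> CallerEdges;

  // Assumption for this sketch: either edge list carries the node's ids, so
  // scan the callee edges when present and the caller edges otherwise.
  const std::vector<EdgeIdsSketch> &edgesToScan() const {
    return CalleeEdges.empty() ? CallerEdges : CalleeEdges;
  }

  // Recompute the node's context ids on demand as the union over its edges.
  std::set<unsigned> getContextIds() const {
    std::set<unsigned> Ids;
    for (const EdgeIdsSketch &E : edgesToScan())
      Ids.insert(E.ContextIds.begin(), E.ContextIds.end());
    return Ids;
  }

  // Cheaper query that exits early and never materializes the union.
  bool emptyContextIds() const {
    for (const EdgeIdsSketch &E : edgesToScan())
      if (!E.ContextIds.empty())
        return false;
    return true;
  }
};
```

The trade-off matches the one reported in the commit: less memory held per node in exchange for some recomputation time.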
Teresa Johnson	4973ad4718	[MemProf][NFC] Use range for loop (#94308 ) With the change in 2fa059195bb54f422cc996db96ac549888268eae we can now use a range for loop.	2024-06-03 21:15:40 -07:00
Teresa Johnson	2fa059195b	[MemProf] Use remove_if to erase MapVector elements in bulk (#94269 ) A cycle profile showed that we were spending a lot of time invoking MapVector::erase. According to https://llvm.org/docs/ProgrammersManual.html#llvm-adt-mapvector-h, erasing elements one at a time is very inefficient for MapVector and it is better to use remove_if. This change resulted in around 7% time reduction on a large thin link. While here remove an unused function that also invokes erase on MapVectors.	2024-06-03 20:43:52 -07:00
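The cost difference between erasing one element at a time and erasing in bulk can be seen in a standard-library analogy. This sketch uses a std::vector of pairs and the erase-remove idiom rather than llvm::MapVector, so that it compiles without LLVM; the function name is hypothetical.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

using Entry = std::pair<int, int>;

// Bulk removal in a single pass: std::remove_if compacts the kept entries to
// the front, then one erase call trims the tail. Erasing matching elements
// one at a time would shift the remaining tail on every erase, which is
// quadratic in the worst case.
void eraseZeroValues(std::vector<Entry> &Entries) {
  Entries.erase(std::remove_if(Entries.begin(), Entries.end(),
                               [](const Entry &E) { return E.second == 0; }),
                Entries.end());
}
```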
Teresa Johnson	61afebdacc	[MemProf][NFC] Switch to DenseMaps (#93868 ) Change a couple of maps from std::map to DenseMap, which showed a modest (3.6%) reduction in peak RSS.	2024-05-30 12:57:14 -07:00
Teresa Johnson	b2f6d323fc	[MemProf] Fix tailcall discovery checking for multiple callee chains (#92632 ) When looking for missing frames due to tail calls, we were not checking the output parameter of the recursive call in the correct place. Make sure we check for the case when that recursive call returned false due to multiple possible callee chains. Extended the existing test a bit to catch this case.	2024-05-24 07:38:07 -07:00
Teresa Johnson	a332cfc986	[MemProf] Perform cloning for each allocation separately (#87112 ) Restructures the cloning slightly to perform all cloning for each allocation separately. The prior algorithm would sometimes miss cloning opportunities in cases where trimmed cold contexts partially overlapped with longer contexts for different allocations. Most of the change is isolated to the helpers that move edges to new or existing clones, which now support moving a subset of context ids.	2024-04-09 14:12:32 -07:00
Teresa Johnson	082e7c480e	[MemProf] Remove empty edges once after cloning (#85320 ) Restructure the handling of edges that become empty during the cloning process. Instead of removing them as they become empty (no context ids and alloc type), do this once after all cloning is complete. This has no effect on the cloning result, but prepares for a follow on change that does improve the cloning. The structural change here reduces the diffs for the follow on change, which would be much more difficult with the previous handling.	2024-03-26 20:06:27 -07:00
lifengxiang1025	e40cabfea4	[MemProf] Match function's summary and definition strictly (#83665 ) Problem description: https://github.com/llvm/llvm-project/pull/81008#issuecomment-1933468520 Solution: https://github.com/llvm/llvm-project/pull/81008#issuecomment-1934192548 (choose plan2)	2024-03-12 11:00:02 +08:00
Teresa Johnson	48bc9022b4	[MemProf] Fix the stack updating handling of pruned contexts (#81322 ) Fix a bug in the handling of cases where a callsite's stack ids partially overlap with the pruned context during matching of calls to the graph constructed from the profiled contexts. This fix makes the code match the comments.	2024-02-27 07:23:51 -08:00
Teresa Johnson	9eed89908c	[MemProf] Handle empty stack context during ThinLTO cloning (#81008 ) Fix for assert after PR#78264. Handle the case where the MIB context is empty after skipping the callsite context, because the callsite context is actually longer than the MIB context. Presumably this happened as a result of inlining, but in theory the metadata should have been replaced with an attribute in that case. Need to investigate why this is occurring, but for now handle this gracefully to fix the build regression.	2024-02-07 10:46:34 -08:00
lifengxiang1025	6ccb06a7ab	[MemProf] Fix assert when exists direct recursion (#78264 ) Fix assert in `MemProfContextDisambiguation::applyImport` when direct recursion exists.	2024-01-26 20:55:44 +08:00
Teresa Johnson	070738ba88	[MemProf][NFC] Explicitly specify llvm version of function_ref (#77783 ) As suggested in https://github.com/llvm/llvm-project/pull/75823, to avoid confusion with std::function_ref, qualify all uses with llvm:: (we were already using the llvm version, but this avoids ambiguity).	2024-01-16 11:20:55 -08:00
Teresa Johnson	c37699b9e3	[MemProf] Add missing <unordered_map> include to fix buildbot (#77788 ) Should fix buildbot failure https://lab.llvm.org/buildbot/#/builders/54/builds/8451 from #75823.	2024-01-11 07:48:55 -08:00
Teresa Johnson	26a8664ed4	[MemProf] Handle missing tail call frames (#75823 ) If tail call optimization was not disabled for the profiled binary, the call contexts will be missing frames for tail calls. Handle this by performing a limited search through tail call edges for the profiled callee when a discontinuity is detected. The search depth is adjustable but defaults to 5. If we are able to identify a short sequence of tail calls, update the graph for those calls. In the case of ThinLTO, synthesize the necessary CallsiteInfos for carrying the cloning information to the backends.	2024-01-11 06:57:48 -08:00
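A depth-limited search through tail call edges, which succeeds only when a single chain reaches the profiled callee, could be sketched as follows. The graph type, the function names, and the rule of rejecting ambiguous chains are simplifying assumptions; the default depth of 5 comes from the commit message.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical graph: function name -> functions it tail calls.
using TailCallGraph = std::map<std::string, std::vector<std::string>>;

// Counts the distinct tail call chains from Cur to Target that use at most
// DepthLeft edges. The depth bound also guarantees termination on cycles.
unsigned countChains(const TailCallGraph &G, const std::string &Cur,
                     const std::string &Target, unsigned DepthLeft) {
  if (Cur == Target)
    return 1;
  if (DepthLeft == 0)
    return 0;
  auto It = G.find(Cur);
  if (It == G.end())
    return 0;
  unsigned N = 0;
  for (const std::string &Next : It->second)
    N += countChains(G, Next, Target, DepthLeft - 1);
  return N;
}

// Accept the match only if exactly one chain is found within the depth
// limit; several possible chains are treated as ambiguous.
bool foundUniqueChain(const TailCallGraph &G, const std::string &Callee,
                      const std::string &ProfiledCallee,
                      unsigned MaxDepth = 5) {
  return countChains(G, Callee, ProfiledCallee, MaxDepth) == 1;
}
```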
Teresa Johnson	24a618f69e	[MemProf] Look through alias when applying cloning in ThinLTO backend (#72156 ) Mirror the handling in ModuleSummaryAnalysis to look through aliases when handling call instructions in the ThinLTO backend MemProf handling. Fixes #72094	2023-11-15 13:14:19 -08:00
Kazu Hirata	a5dca533bd	Use llvm::count (NFC)	2023-10-22 21:18:23 -07:00
Kazu Hirata	291d8ab3ed	[llvm] Use llvm::find_if (NFC)	2023-10-20 00:03:19 -07:00
Fangrui Song	2d854dd3e7	Move global namespace cl::opt inside llvm:: or internalize them	2023-10-10 19:58:03 -07:00
Benjamin Kramer	6e55370b81	Hide some implementation details so they can't cause ODR conflicts. NFC.	2023-07-14 15:54:04 +02:00
Kazu Hirata	c963892a45	[llvm] Use DenseMapBase::lookup (NFC)	2023-06-10 09:02:25 -07:00
Kan Wu	b8d2f7177c	[MemProf] Add hot allocation type Add "Hot" AllocationType (in addition to existing cold, notcold). Use lifetime access density as metric to identify hot allocations. Treat hot as notcold for MemProfContextDisambiguation for now before the disambiguation for "hot" is done. Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D149932	2023-05-08 10:34:53 -07:00
Teresa Johnson	1768898680	[MemProf] Control availability of hot/cold operator new from LTO link Adds an LTO option to indicate whether we are linking with an allocator that supports hot/cold operator new interfaces. If not, at the start of the LTO backends any existing memprof hot/cold attributes are removed from the IR, and we also remove memprof metadata so that post-LTO inlining doesn't add any new attributes. This is done via setting a new flag in the module summary index. It is important to communicate via the index to the LTO backends so that distributed ThinLTO handles this correctly, as they are invoked by separate clang processes and the combined index is how we communicate information from the LTO link. Specifically, for distributed ThinLTO the LTO related processes look like: ``` # Thin link: $ lld --thinlto-index-only obj1.o ... objN.o -llib ... # ThinLTO backends: $ clang -x ir obj1.o -fthinlto-index=obj1.o.thinlto.bc -c -O2 ... $ clang -x ir objN.o -fthinlto-index=objN.o.thinlto.bc -c -O2 ``` It is during the thin link (lld --thinlto-index-only) that we have visibility into linker dependences and want to be able to pass the new option via -Wl,-supports-hot-cold-new. This will be recorded in the summary indexes created for the distributed backend processes (*.thinlto.bc) and queried from there, so that we don't need to know during those individual clang backends what allocation library was linked. Since in-process ThinLTO and regular LTO also use a combined index, for consistency we query the flag out of the index in all LTO backends. Additionally, when the LTO option is disabled, exit early from the MemProfContextDisambiguation handling performed during LTO, as this is unnecessary. Depends on D149117 and D149192. Differential Revision: https://reviews.llvm.org/D149215	2023-05-08 08:02:21 -07:00
NAKAMURA Takumi	a29a97d76b	Fix a warning in D149117 [-Wunused-but-set-variable]	2023-05-06 11:30:04 +09:00
Teresa Johnson	a28261c711	[MemProf] Create single version of helper function (NFC) Small clean up to keep a single version of getAllocTypeAttributeString which was duplicated locally.	2023-05-05 18:31:35 -07:00
Teresa Johnson	cfad2d3a3d	[MemProf] Context disambiguation cloning pass [patch 4/4] Applies ThinLTO cloning decisions made during the thin link and recorded in the summary index to the IR during the ThinLTO backend. Depends on D141077. Differential Revision: https://reviews.llvm.org/D149117	2023-05-05 16:26:32 -07:00
Teresa Johnson	04f3c5a71e	Restore again "[MemProf] Context disambiguation cloning pass [patch 3/4]" This reverts commit f09807ca9dda2f588298d8733e89a81105c88120, restoring bfe7205975a63a605ff3faacd97fe4c1bf4c19b3 and follow on fix e3e6bc699574550f2ed1de07f4e5bcdddaa65557, now that the nondeterminism has been addressed by D149924. Differential Revision: https://reviews.llvm.org/D141077	2023-05-05 13:27:33 -07:00

66 Commits