We weren't performing node merging on newly created nodes in some cases.
Use a simple iteration over the node and its callers until no more
opportunities are found. I confirmed that for several large applications
the maximum number of iterations is 3 (meaning we only needed to do any
work on the first 2, as expected). This can potentially be made more
elegant in the future, but it is a simple and effective solution.
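As a rough sketch of the shape of that loop (the merge step callback stands
in for the pass's actual merging logic, which is not shown here):
  #include <functional>

  // Minimal fixpoint loop: keep applying the merge step until a full pass
  // makes no change. Returns how many passes were needed.
  unsigned iterateToFixpoint(const std::function<bool()> &MergeStep) {
    unsigned Iterations = 0;
    bool Changed = true;
    while (Changed) {
      ++Iterations;
      Changed = MergeStep(); // true if any merge happened on this pass
    }
    return Iterations;
  }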
Also fix a bug, exposed by the test case, in getting the function for a
call instruction in the FullLTO handling, by using an existing method to
look through aliases if needed.
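A hedged sketch of what looking through aliases amounts to (the helper here
is illustrative; the patch reuses an existing method rather than adding this
one):
  #include "llvm/IR/Function.h"
  #include "llvm/IR/InstrTypes.h"
  #include "llvm/Support/Casting.h"
  using namespace llvm;

  Function *getCalleeLookingThroughAliases(const CallBase &CB) {
    // getCalledFunction() returns null when the callee is an alias or a
    // bitcast, so strip those first and then check for a Function.
    return dyn_cast_or_null<Function>(
        CB.getCalledOperand()->stripPointerCastsAndAliases());
  }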
There is no reason to use std::map for the call maps maintained for
function clones during function clone assignment, as we don't iterate
over them and don't need deterministic ordering, so use the more
efficient DenseMap.
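Illustrative shape of the change (key/value types here are stand-ins for the
actual call info types):
  #include "llvm/ADT/DenseMap.h"
  #include "llvm/IR/Instruction.h"
  using namespace llvm;

  // The per-clone call maps are only queried, never iterated, so no
  // deterministic ordering is needed and a hash-based DenseMap avoids
  // std::map's per-node allocations and tree rebalancing.
  using CloneCallMap = DenseMap<Instruction *, Instruction *>;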
This patch fixes:
llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp:4771:9:
error: non-void lambda does not return a value in all control paths
[-Werror,-Wreturn-type]
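For illustration only (not the actual lambda in the file), the diagnostic
fires when a non-void lambda has a control path without a return, and the
fix is to return a value on every path:
  auto IsCold = [](int AllocType) -> bool {
    if (AllocType == 1)
      return true;
    return false; // without this, -Wreturn-type fires on the fall-through path
  };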
This reverts commit 314e22bcab2b0f3d208708431a14215058f0718f, reapplying
PR150735 with a fix for the unstable iteration order exposed by the new
tests (PR151039).
We iterate over a std::map indexed by FuncInfo, which is a pair of a
pointer and a clone number. In the ThinLTO case, this isn't an issue as
the function pointer always points to the same FunctionSummary object.
However, for regular LTO, this is a pointer to a Function object, which
is different for each clone. This will lead to unstable iteration order.
This was exposed in a test case added for PR150735, which added a new
instance of iteration over this map.
Since these function clones are added and numbered sequentially, change
this to a vector indexed by clone number, which points to a structure
containing the clone FuncInfo and the call map (the old map's key and
value, respectively).
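A hedged sketch of the resulting data structure (names and element types are
illustrative stand-ins, not the exact ones in the patch):
  #include "llvm/ADT/DenseMap.h"
  #include <utility>
  #include <vector>
  using namespace llvm;

  // Stand-ins: FuncInfo is a {function pointer, clone number} pair and
  // CallInfo wraps a call; the real types live in the pass.
  using FuncInfo = std::pair<void *, unsigned>;
  using CallInfo = void *;

  struct FuncCloneInfo {
    FuncInfo Clone;                       // the old map's key
    DenseMap<CallInfo, CallInfo> CallMap; // the old map's value
  };

  // Clones are created and numbered sequentially, so indexing by clone
  // number gives a stable order that doesn't depend on pointer values.
  std::vector<FuncCloneInfo> FuncCloneInfos;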
Fix a bug in function assignment where we were not assigning all
callsite clones to a function clone. This led to incorrect call updates
because multiple callsite clones could look like they were assigned to
the same function clone.
Add a stat and a debug message to help identify and debug cases where
this is still happening.
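The stat and debug message follow the usual LLVM pattern; the names and
wording below are placeholders, not the ones added by the patch:
  #include "llvm/ADT/Statistic.h"
  #include "llvm/Support/Debug.h"
  #include "llvm/Support/raw_ostream.h"
  using namespace llvm;

  #define DEBUG_TYPE "memprof-context-disambiguation"

  STATISTIC(UnassignedCallsiteClones,
            "Number of callsite clones not assigned to a function clone");

  void noteUnassignedCallsiteClone() {
    UnassignedCallsiteClones++;
    LLVM_DEBUG(dbgs() << "Callsite clone not assigned to a function clone\n");
  }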
We already included the assigned clone of the callsite node's callee in
the dot graph after function assignment. This adds the same information
for the enclosing caller function to aid debugging.
In rare cases the declaration of a function may not match its callsite
after function importing, when the declaration was imported from a
module where the function had void return type (presumably due to
incomplete types). Instead of using setCalledFunction() to change a call
to call its clone, which also updates the call's function type, just call
setCalledOperand() directly so that the only thing changed is the
function target.
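A minimal sketch of the distinction (the helper is illustrative;
CallBase::setCalledFunction and setCalledOperand are the relevant APIs):
  #include "llvm/IR/Function.h"
  #include "llvm/IR/InstrTypes.h"
  using namespace llvm;

  void redirectCallToClone(CallBase &CB, Function &Clone) {
    // CB.setCalledFunction(&Clone) would also overwrite the callsite's
    // function type with Clone's type, which may not match here.
    // setCalledOperand only swaps the call target.
    CB.setCalledOperand(&Clone);
  }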
Note this can't happen for the other places where we call
setCalledFunction: FullLTO requires the cloned callee to be defined in
the same FullLTO merged module; ThinLTO memprof ICP calls an ICP
facility to first perform the promotion and that will be blocked if the
function type doesn't match the callsite (the new test explicitly tests
this latter case).
Allow users to set the minimum absolute count for inlining of indirect
calls promoted during cloning. This is primarily meant to enable
generation of synthetic vp metadata introduced in PR141164 when
profiling memprof-optimized binaries.
g++-13 warned that:
llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp:1645:8: warning: variable ‘PrevIterCreatedNode’ set but not used [-Wunused-but-set-variable]
1645 | bool PrevIterCreatedNode = false;
| ^~~~~~~~~~~~~~~~~~~
This warning only fired when asserts were not enabled.
To aid in debugging, (optionally) dump the dot graph immediately after
the stack update phase (which matches nodes to interior callsites) and
before we clean up mismatched callee edges (either via tail call fixup,
indirect call fixup, or nulling otherwise).
Reapply PR142507 with fix for test: add in the same x86_64-linux
requirement as other tests as the stack ids are currently computed
differently on big endian systems. This will be investigated separately.
In order to allow selective reporting of context hinting during the LTO
link, and in the future to allow selective, more aggressive cloning, add
an option to specify a minimum percent of the max cold size in the
profile summary. Contexts that meet that threshold will get context size
info metadata (and ThinLTO summary information) on the associated
allocations.
Specifying -memprof-report-hinted-sizes during the pre-LTO compile step
will continue to cause all contexts to receive this metadata. But
specifying -memprof-report-hinted-sizes only during the LTO link will
cause only those that meet the new threshold and have the metadata to
get reported.
To support this, because the alloc info summary and associated bitcode
require the context size information to be in the same order as the
other context information, 0s are inserted for contexts without this
metadata. The bitcode writer uses a more compact format for the context
ids to allow better compression of the 0s.
As part of this change several helper methods are added to query whether
metadata contains context size info on any or all contexts.
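A hedged sketch of the threshold check (parameter and helper names are
illustrative; MaxColdSize comes from the profile summary and the percent
value from the new option):
  #include <cstdint>

  bool keepContextSizeInfo(uint64_t ColdBytes, uint64_t MaxColdSize,
                           unsigned MinPercentMaxColdSize) {
    // A context gets context size info metadata (and ThinLTO summary
    // information) only if its cold bytes reach the configured percentage
    // of the max cold size from the profile summary.
    return ColdBytes * 100 >= uint64_t(MinPercentMaxColdSize) * MaxColdSize;
  }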
Guard a loop that only exists to do assertion checking of stack ids on
memprof metadata so that it isn't compiled and executed under NDEBUG.
This is similar to how callsite metadata stack id verification is
guarded further below.
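The guard follows the standard pattern; the sketch below uses illustrative
container types rather than the pass's actual data structures:
  #include <cassert>
  #include <cstdint>
  #include <set>
  #include <vector>

  void verifyStackIds(const std::vector<uint64_t> &MetadataStackIds,
                      const std::set<uint64_t> &SummaryStackIds) {
  #ifndef NDEBUG
    // Verification-only loop: compiled and executed only when asserts are
    // enabled, mirroring the existing callsite metadata guard.
    for (uint64_t Id : MetadataStackIds)
      assert(SummaryStackIds.count(Id) && "stack id missing from summary");
  #else
    (void)MetadataStackIds;
    (void)SummaryStackIds;
  #endif
  }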
See https://discourse.llvm.org/t/rfc-keep-globalvalue-guids-stable/84801
for context.
This is a non-functional change which just changes the interface of
GlobalValue, in preparation for future functional changes. This part
touches a fair few users, so is split out for ease of review. Future
changes to the GlobalValue implementation can then be focused purely on
that class.
This does the following:
* Rename GlobalValue::getGUID(StringRef) to
getGUIDAssumingExternalLinkage. This is simply making explicit at the
callsite what is currently implicit.
* Where possible, migrate users to directly calling getGUID on a
GlobalValue instance.
* Otherwise, where possible, have them call the newly renamed
getGUIDAssumingExternalLinkage, to make the assumption explicit.
There are a few cases where neither of the above are possible, as the
caller saves and reconstructs the necessary information to compute the
GUID themselves. We want to migrate these callers eventually, but for
this first step we leave them be.
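A small sketch of the two call styles after this change (the wrapper
functions themselves are illustrative, not part of the patch):
  #include "llvm/ADT/StringRef.h"
  #include "llvm/IR/GlobalValue.h"
  using namespace llvm;

  GlobalValue::GUID guidOf(const GlobalValue &GV) {
    // Preferred: query the instance, which accounts for its actual linkage.
    return GV.getGUID();
  }

  GlobalValue::GUID guidOfExternalName(StringRef Name) {
    // Name-only path: the external-linkage assumption is now explicit in
    // the API name rather than implicit at the callsite.
    return GlobalValue::getGUIDAssumingExternalLinkage(Name);
  }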
We perform cloning for each allocation node separately. However, this
sometimes results in a situation where the same node calls multiple
clones of the same callee, created for different allocations. This
causes issues when assigning functions to these clones, as each node can
in reality only call a single callee clone.
To address this, before assigning functions, merge callee clone nodes as
needed using a post order traversal from the allocations. We attempt to
use existing clones as the merge node when legal, and to share them
among callers with the same properties (callers calling the same set of
callee clone nodes for the same allocations).
Without this fix, in some cases incorrect function assignment will lead
to calling the wrong allocation clone. In fact, this showed up in an
existing test, which I hadn't noticed because the test existed to check
earlier parts of the cloning process.
If we are replacing a sequence of stack nodes with a single node
representing inlined IR, and the stack id sequence contains recursion,
we may have already removed some edges. Handle this case correctly by
skipping the now removed edge.
DenseSet, SmallPtrSet, SmallSet, SetVector, and StringSet recently
gained C++23-style insert_range. This patch replaces:
Dest.insert(Src.begin(), Src.end());
with:
Dest.insert_range(Src);
This patch does not touch ranges with custom begin functions, like
succ_begin, for now.
When we need to reclone other callees of a caller node during function
assignment due to the creation of a new function clone, we need to skip
recursive edges on that caller. We don't want to reclone the callee in
that case (the callee being the caller itself), as that isn't necessary
and also isn't correct from a graph update perspective. Doing so resulted
in an assertion failure, and in an NDEBUG build it caused an infinite
loop.
To simplify debugging and analysis, particularly for very large
applications with large graphs, this patch adds support for highlighting
the nodes/edges of a single context id or of a single allocation's
context ids, and/or for exporting only those nodes/edges. When
highlighting, the specified nodes and edges are drawn in a brighter color
and larger.
This is controlled by the new -memprof-dot-scope={all,alloc,context}
flag, which determines how much to export, along with two companion
flags:
-memprof-dot-alloc-id=ID
-memprof-dot-context-id=ID
These two are interpreted differently depending on the value of
-memprof-dot-scope (where "all" is the default).
If exporting all, one of the above flags can optionally be passed to
highlight the nodes/edges for the given context id or allocation's
context ids.
If exporting alloc scope, an alloc id must be provided. A context id can
optionally be provided to highlight that context.
If exporting context scope, a context id must be provided.
The ids to use can be obtained either by looking at the full graph, or a
context id can be identified from the -memprof-report-hinted-sizes
output after PR128188 is merged.
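For example (illustrative id values; this assumes dot emission is already
enabled via the existing dot export options):
  -memprof-dot-scope=alloc -memprof-dot-alloc-id=2
  -memprof-dot-scope=context -memprof-dot-context-id=7
  -memprof-dot-scope=all -memprof-dot-context-id=7 (highlight only)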
During the whole program reporting of contexts when hinted byte
reporting is enabled via -memprof-report-hinted-sizes, also print the
internal context id. This is useful for debugging, as well as for
guiding the dot file dumping with some upcoming changes that will
accept a context id to focus the graph on a context of interest.
Invoke the backedge computation (refactored as a new method) at the end
of the graph construction, instead of at the start of cloning. That
makes more logical sense, and it also makes it easier to look at the
results in the postbuild dot graph with a follow-on change to display
those differently.
Two miscellaneous cleanups/improvements to the dot printing.
Remove a redundant "style=filled" in the Node attributes. No effect on
resulting graph.
Add a "color" attribute to the Edge, with the same color name as
"fillcolor". The latter only fills in the arrowhead, and the former is
what affects the line. This makes the edge colors more visible;
previously each edge was drawn in black with a colored-in arrowhead.
For the second change, I added the new Edge color attributes to the
checking in the two "basic.ll" tests, so we get some testing coverage of
the full printing. For the other affected tests I removed the final "]'"
after the fillcolor so it matches up through that attribute and ignores
the rest of the line.
In order to facilitate cloning of recursive cycles, we first identify
backedges using a standard DFS search from the root callers, then
initially defer recursively invoking the cloning function via those
edges. This is because the cloning opportunity along the backedge may
not be exposed until the current node is cloned for other non-backedge
callers that are cold after the earlier recursive cloning, resulting
in a cold predecessor of the backedge. So we recursively invoke the
cloning function for the backedges during the cloning of the current
node for its caller edges (which were sorted to enable handling cold
callers first).
There was no significant time or memory overhead measured for several
large applications.
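A self-contained sketch of the standard DFS backedge identification referred
to above (the adjacency-list graph and types are illustrative, not the
pass's node/edge classes):
  #include <cstddef>
  #include <utility>
  #include <vector>

  enum class Color { White, Grey, Black };

  static void dfs(size_t U, const std::vector<std::vector<size_t>> &Adj,
                  std::vector<Color> &State,
                  std::vector<std::pair<size_t, size_t>> &BackEdges) {
    State[U] = Color::Grey; // on the current DFS stack
    for (size_t V : Adj[U]) {
      if (State[V] == Color::Grey)
        BackEdges.emplace_back(U, V); // edge to an ancestor: a backedge
      else if (State[V] == Color::White)
        dfs(V, Adj, State, BackEdges);
    }
    State[U] = Color::Black; // fully processed
  }

  // Walk from the root callers; the edges collected here are the ones whose
  // recursive cloning invocation is initially deferred.
  std::vector<std::pair<size_t, size_t>>
  findBackEdges(const std::vector<std::vector<size_t>> &Adj,
                const std::vector<size_t> &Roots) {
    std::vector<Color> State(Adj.size(), Color::White);
    std::vector<std::pair<size_t, size_t>> BackEdges;
    for (size_t R : Roots)
      if (State[R] == Color::White)
        dfs(R, Adj, State, BackEdges);
    return BackEdges;
  }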
We were previously checking this after recursing on all callers, but if
we already have a single allocation type there is no need to even look
at any callers. This didn't show a significant improvement overall, but
it does reduce the number of times we enter identifyClones and do other
checks.
This change addresses a number of issues with the support added by
PR121985 which were exposed through more exhaustive testing,
specifically places that needed changes to perform correct graph updates
in the presence of cycles.
A new test case is added that reproduces these issues, and the default
is flipped back to enabling this handling.
When we apply cloning decisions in the ThinLTO backend, we need to find
the corresponding summary for each function in the IR, and in some cases
for callee functions. This is complicated when the function was a
promoted local, in which case the GUID was formed from the hash of the
original source file prepended to the function name. Those functions
can be identified by the fact that they were given a ".llvm." suffix
during promotion.
We previously didn't do this correctly for promoted locals imported from
other modules, as we only tried the current module source name. This led
to crashes, in particular when the current module also had a local
function with the same original name. Specifically, we were attempting to
iterate through the wrong summary's callsites, and there were fewer than
in the actual function, so we accessed data off the end (in a release
build with assertion checking off; with assertion checking on we double
check the stack ids, and that would have failed). Even if we hadn't
crashed or hit an assert, we could have applied the wrong cloning
decisions, leading to unsats at link time.
Luckily, function importing attaches thinlto_src_file metadata
containing the original source file name to all imported functions. It
doesn't do this by default; however, it always does if MemProf
context disambiguation is enabled. Therefore, we can just look to see if
the function contains this metadata and if so use it to recreate the
original GUID.
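A hedged sketch of that GUID recreation (the pass's actual code differs; the
metadata access and the split on the ".llvm." promotion suffix are the key
ingredients):
  #include "llvm/ADT/StringRef.h"
  #include "llvm/IR/Function.h"
  #include "llvm/IR/GlobalValue.h"
  #include "llvm/IR/Metadata.h"
  #include "llvm/IR/Module.h"
  #include "llvm/Support/Casting.h"
  using namespace llvm;

  GlobalValue::GUID originalGUID(const Function &F) {
    // Strip any ".llvm.<hash>" suffix added during promotion.
    StringRef OrigName = F.getName().split(".llvm.").first;
    // Prefer the source file recorded by the importer, falling back to the
    // current module's source file name.
    StringRef SrcFile = F.getParent()->getSourceFileName();
    if (MDNode *MD = F.getMetadata("thinlto_src_file"))
      SrcFile = cast<MDString>(MD->getOperand(0))->getString();
    // Locals are hashed on their global identifier, which prepends the
    // source file name; reuse the existing helper to build it.
    return GlobalValue::getGUIDAssumingExternalLinkage(
        GlobalValue::getGlobalIdentifier(OrigName,
                                         GlobalValue::InternalLinkage,
                                         SrcFile));
  }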
A similar issue can occur when looking for the ValueInfo / GUID of
a direct tail call to see if we synthesized a callsite record for a
missing tail call frame. In that case, the callee function may be a
declaration, if we imported its caller but not the callee function
definition. Because imported declarations don't get the thinlto_src_file
metadata, we instead look at its caller (which works because this
happens very early in the backend before any inlining).
Note that PointerUnion::dyn_cast has been soft deprecated in
PointerUnion.h:
// FIXME: Replace the uses of is(), get() and dyn_cast() with
// isa<T>, cast<T> and the llvm::dyn_cast<T>
Literal migration would result in dyn_cast_if_present (see the
definition of PointerUnion::dyn_cast), but this patch uses cast
because we know which alternative to expect in the ternary expression.
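For illustration, the shape of the migration when the active alternative is
known (CallsiteInfo/AllocInfo here are simplified stand-ins for the summary
types):
  #include "llvm/ADT/PointerUnion.h"
  #include "llvm/Support/Casting.h"
  using namespace llvm;

  struct CallsiteInfo { void *Opaque; }; // stand-in
  struct AllocInfo { void *Opaque; };    // stand-in

  CallsiteInfo *getCallsite(PointerUnion<CallsiteInfo *, AllocInfo *> Call) {
    // Before (member form, soft deprecated): Call.dyn_cast<CallsiteInfo *>()
    // After: the free-function cast<> asserts the expected alternative
    // instead of silently returning null.
    return cast<CallsiteInfo *>(Call);
  }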
Remove edge iterator parameters from the various helpers that move edges
onto other nodes, and their associated iterator update code, and instead
iterate over copies of the edge lists in the caller loops. This also
avoids the need to increment these iterators at every early loop
continue.
This simplifies the code, makes it less error prone when updating, and
in particular, facilitates adding handling of recursive contexts.
There were no measurable compile time or memory overhead effects for a
large target.
Note that PointerUnion::dyn_cast has been soft deprecated in
PointerUnion.h:
// FIXME: Replace the uses of is(), get() and dyn_cast() with
// isa<T>, cast<T> and the llvm::dyn_cast<T>
Literal migration would result in dyn_cast_if_present (see the
definition of PointerUnion::dyn_cast), but this patch uses cast
because we expect the arguments to be of the requested types. Note
that all these cases have assert and/or dereferences just after cast,
implying that the return value from cast must be nonnull.
---------
Co-authored-by: Nikita Popov <github@npopov.com>
We pass in a pointer to an Edge iterator to
moveEdgeToExistingCalleeClone, so that it can be correctly updated when
we remove edges during an edge iteration. We were not dereferencing this
pointer in one case, meaning we would increment the pointer and not the
iterator as intended.
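A simplified reproduction of the bug shape (types are stand-ins):
  #include <vector>
  using EdgeIter = std::vector<int>::iterator;

  void skipCurrentEdge(EdgeIter *EI) {
    // Buggy form: ++EI;  // advances the local pointer, not the iterator
    ++(*EI); // intended: advance the caller's iterator through the pointer
  }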
This did not cause any issues, as it turns out that we would simply skip
the edge on the next iteration as it was already appropriately handled.
While in theory this incurred some extra compilation time, in practice
for a large application the effect was not significant. I confirmed that
the fix had no effect on any cloning.
I plan to send a follow-up change to avoid the need to pass in an
iterator at all and to simplify / consolidate the handling in the caller,
but want to fix this separately in case something requires a revert of
that follow-on change.
IndexCall is a simple wrapper around:
PointerUnion<CallsiteInfo *, AllocInfo *>
Now, because we don't have CastInfo for IndexCall, we would have to
use getBase like so:
dyn_cast_if_present<CallsiteInfo *>(Call.getBase())
This patch adds simplify_type<IndexCall>, which in turn enables
CastInfo for IndexCall, so we can drop getBase like so:
dyn_cast_if_present<CallsiteInfo *>(Call)
Note that PointerUnion::is has been soft deprecated in
PointerUnion.h:
// FIXME: Replace the uses of is(), get() and dyn_cast() with
// isa<T>, cast<T> and the llvm::dyn_cast<T>
In this patch, I'm calling call().getBase() for an instance of
PointerUnion. call() alone would return an instance of IndexCall,
which wraps PointerUnion. Note that isa<> cannot directly accept an
instance of IndexCall, at least without defining CastInfo.
I'm not touching PointerUnion::dyn_cast for now because it's a bit
complicated; we could blindly migrate it to dyn_cast_if_present, but
we should probably use dyn_cast when the operand is known to be
non-null.