llvm-project

Author	SHA1	Message	Date
Ryan Mansfield	67efbd0bf1	[LLVM] Fix various cl::desc typos and whitespace issues (NFC) (#121955 )	2025-01-08 11:07:23 +01:00
Haohai Wen	ccc8e45404	[PseudoProbe] Fix cleanup for pseudo probe after annotation (#119660 ) When using -sample-profile-remove-probe, pseudo probe desc should also be removed and dwarf discriminator for call instruction should be restored.	2024-12-13 17:05:03 +08:00
Lei Wang	bc1aa2863b	[SampleFDO] Support enabling sample loader pass in O0 mode (#113985 ) Add support for enabling sample loader pass in O0 mode(under `-fsample-profile-use`). This can help verify PGO raw profile count quality or provide a more accurate performance proxy(predictor), as O0 mode has minimal or no compiler optimizations that might otherwise impact profile count accuracy. - Explicitly disable the sample loader inlining to ensure it only emits sampling annotation. - Use flattened profile for O0 mode. - Add the pass after `AddDiscriminatorsPass` pass to work with `-fdebug-info-for-profiling`.	2024-11-08 15:29:44 -08:00
Kazu Hirata	98ea1a81a2	[IPO] Remove unused includes (NFC) (#114716 ) Identified with misc-include-cleaner.	2024-11-03 13:48:55 -08:00
Antonio Frighetto	2ae968a0d9	[Instrumentation] Move out to Utils (NFC) (#108532 ) Utility functions have been moved out to Utils. Minor opportunity to drop the header where not needed.	2024-09-15 21:07:40 -07:00
Lei Wang	ce8c43fe27	Fix assertion of null pointer samples in inline replay mode (#99378 ) Fix https://github.com/llvm/llvm-project/issues/97108. In inline replay mode, `CalleeSamples` may be null and the order doesn't matter.	2024-07-18 10:16:44 -07:00
Lei Wang	18cdfa72e0	[SampleFDO] Stale profile call-graph matching (#95135 ) Profile staleness could be due to function renaming. Given that sample profile loader relies on exact string matching, a trivial change in the function signature( such as `int foo()` --> `long foo()` ) can make the mangled name different, the function profile(including all nested children profile) becomes unavailable. This patch introduces stale profile call-graph level matching, targeting at identifying the trivial function renaming and reusing the old function profile. Some noteworthy details: 1. Extend the LCS based CFG level matching to identify new function. - Extend to match function and profile have different name instead of the exact function name matching. This leverages LCS, i.e during the finding of callsite anchor matching, when two function name are different, try matching the functions instead of return. - In LCS, the equal function check is replaced by `functionMatchesProfile`. - Only try matching functions that are new functions(neither appears on each side). This reduces the matching scope as we don't need to match the originally matched function. 2. Determine the matching by call-site anchor similarity check. - A new function `functionMatchesProfile(IRFunc, ProfFunc)` is used to check the renaming for the possible <IRFunc, ProfFunc> pair, use the LCS(diff) matching to compute the equal set and we define: `Similarity = |equalSet * 2| / (|A| + |B|)`. The profile name is marked as renamed if the similarity is above a threshold(`-func-profile-similarity-threshold`) 3. Process the matching in top-down function order - when a caller's is done matching, the new function names are saved for later use, using top-down order will maximize the reused results. - `ProfileNameToFuncMap` is used to save or cache the matching result. 4. Update the original profile at the end using `ProfileNameToFuncMap`. 5. Added a new switch --salvage-unused-profile to control this, default is false. Verified on one Meta's internal big service, confirmed 90%+ of the found renaming pair is good. (There could be incorrect renaming pair if the num of the anchor is small, but checked that those functions are simple cold function)	2024-07-17 10:33:00 -07:00
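The similarity check described in 18cdfa72e0 can be illustrated with a small sketch. It is not the LLVM implementation: it swaps in a textbook dynamic-programming LCS for the diff-based matching the commit mentions, treats each function as a plain list of callee names (the call-site anchors), and the names `lcs_length`, `anchor_similarity`, `looks_renamed` and the default threshold value are all hypothetical.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences (classic DP)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            if x == y:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def anchor_similarity(ir_anchors, prof_anchors):
    """Similarity = 2 * |equal set| / (|A| + |B|), as defined in the commit message."""
    total = len(ir_anchors) + len(prof_anchors)
    if total == 0:
        return 0.0  # no anchors on either side: nothing to compare
    return 2.0 * lcs_length(ir_anchors, prof_anchors) / total


def looks_renamed(ir_anchors, prof_anchors, threshold=0.75):
    """Treat the pair as a rename when similarity is above the threshold.

    The 0.75 default is an illustrative assumption, not the value used by
    -func-profile-similarity-threshold.
    """
    return anchor_similarity(ir_anchors, prof_anchors) > threshold
```

For example, `anchor_similarity(["a", "b", "c", "d"], ["a", "c", "d", "e"])` evaluates to 0.75, because the common subsequence `a, c, d` has length 3 and the two lists hold 8 anchors in total.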
Mircea Trofin	ce03155a1b	[NFC] Coding style fixes: SampleProf (#98208 ) Also some control flow simplifications. Notably, this doesn't address `sampleprof_error`. I think the style there tries to match `std::error_category`. Also left `hash_value` as-is, because it matches what we do in Hashing.h	2024-07-09 14:35:49 -07:00
Kazu Hirata	3cf762b7b7	[Transforms] Migrate to a new version of getValueProfDataFromInst (#96380 )	2024-06-30 14:54:07 -07:00
Kazu Hirata	836ca5bbf7	[Transforms] Migrate to a new version of getValueProfDataFromInst (#95485 ) Note that the version of getValueProfDataFromInst that returns bool has been "deprecated" since: commit 1e15371dd8843dfc52b9435afaa133997c1773d8 Author: Mingming Liu <mingmingl@google.com> Date: Mon Apr 1 15:14:49 2024 -0700	2024-06-13 18:21:09 -07:00
Paul Kirth	294f3ce5dd	Reapply "[llvm][IR] Extend BranchWeightMetadata to track provenance o… (#95281 ) …f weights" #95136 Reverts #95060, and relands #86609, with the unintended code generation changes addressed. This patch implements the changes to LLVM IR discussed in https://discourse.llvm.org/t/rfc-update-branch-weights-metadata-to-allow-tracking-branch-weight-origins/75032 In this patch, we add an optional field to MD_prof metadata nodes for branch weights, which can be used to distinguish weights added from llvm.expect* intrinsics from those added via other methods, e.g. from profiles or inserted by the compiler. One of the major motivations, is for use with MisExpect diagnostics, which need to know if branch_weight metadata originates from an llvm.expect intrinsic. Without that information, we end up checking branch weights multiple times in the case if ThinLTO + SampleProfiling, leading to some inaccuracy in how we report MisExpect related diagnostics to users. Since we change the format of MD_prof metadata in a fundamental way, we need to update code handling branch weights in a number of places. We also update the lang ref for branch weights to reflect the change.	2024-06-12 12:52:28 -07:00
Paul Kirth	607afa0b63	Revert "[llvm][IR] Extend BranchWeightMetadata to track provenance of weights" (#95060 ) Reverts llvm/llvm-project#86609 This change causes compile-time regressions for stage2 builds (https://llvm-compile-time-tracker.com/compare.php?from=3254f31a66263ea9647c9547f1531c3123444fcd&to=c5978f1eb5eeca8610b9dfce1fcbf1f473911cd8&stat=instructions:u). It also introduced unintended changes to `.text` which should be addressed before relanding.	2024-06-11 08:06:06 +02:00
Paul Kirth	c5978f1eb5	[llvm][IR] Extend BranchWeightMetadata to track provenance of weights (#86609 ) This patch implements the changes to LLVM IR discussed in https://discourse.llvm.org/t/rfc-update-branch-weights-metadata-to-allow-tracking-branch-weight-origins/75032 In this patch, we add an optional field to MD_prof metadata nodes for branch weights, which can be used to distinguish weights added from `llvm.expect*` intrinsics from those added via other methods, e.g. from profiles or inserted by the compiler. One of the major motivations, is for use with MisExpect diagnostics, which need to know if branch_weight metadata originates from an llvm.expect intrinsic. Without that information, we end up checking branch weights multiple times in the case if ThinLTO + SampleProfiling, leading to some inaccuracy in how we report MisExpect related diagnostics to users. Since we change the format of MD_prof metadata in a fundamental way, we need to update code handling branch weights in a number of places. We also update the lang ref for branch weights to reflect the change.	2024-06-10 11:27:21 -07:00
William Junda Huang	5a23d31c50	[Sample Profile] Check hot callsite threshold when inlining a function with a sample profile (#93286 ) Currently if a callsite is hot as determined by the sample profile, it is unconditionally inlined barring invalid cases (such as recursion). Inline cost check should still apply because a function's hotness and its inline cost are two different things. For example if a function is calling another very large function multiple times (at different code paths), the large function should not be inlined even if its hot.	2024-05-28 16:41:53 -04:00
Nabeel Omer	686a206b26	[SampleProfileLoader] Fix integer overflow in generateMDProfMetadata (#90217 ) This patch fixes an integer overflow in the SampleProfileLoader pass. The issue occurs when weights are saturated and Profi isn't being used. This patch also adds a newline to a debug message to make it more readable.	2024-05-08 14:32:56 +01:00
Lei Wang	b7248d5363	[PseudoProbe] Add an option to remove pseudo probes after profile annotation (#90293 ) This can be used for testing perf overhead of pseudo-probe.	2024-04-29 09:27:33 -07:00
Lei Wang	1aceee7bb6	Remove unused variable (#88223 ) fix the CI	2024-04-09 19:25:08 -07:00
Lei Wang	1d99d7a6f8	[SampleFDO][NFC] Refactoring SampleProfileMatcher (#86988 ) Move all the stale profile matching stuffs into new files so that it can be shared for unit testing.	2024-03-28 20:03:03 -07:00
Lei Wang	f8bab38b6d	[CSSPGO] Fix the issue of missing callee profile matches (#85715 ) Two fixes related to the callee/inlinee profile: 1. Fix the bug that the matching results are missing to distribute to the callee profiles (should be pass-by-reference). 2. Narrow imported function matching to checksum mismatched functions. More context: before we run matchings for all imported functions even checksums are matched, however, after we fix 1), we got a regression, it's likely due to the matching is not no-op for checksum matched function, so we want to make it consistent to only run matching for checksum mismatched (imported)functions. Since the metadata(pseudo_probe_desc) are dropped for imported function, we leverage the function attribute mechanism and add a new function attribute(`profile-checksum-mismatch`) to transfer the info from pre-link to post-link.	2024-03-27 22:27:22 -07:00
Lei Wang	2598aa67c8	[CSSPGO] Reject high checksum mismatched profile (#84097 ) Error out the build if the checksum mismatch is extremely high, it's better to drop the profile rather than apply the bad profile. Note that the check is on a module level, the user could make big changes to functions in one single module but those changes might not be performance significant to the whole binary, so we want to be conservative, only expect to catch big perf regression. To do this, we select a set of the "hot" functions for the check. We use two parameter(`hot-func-cutoff-for-staleness-error` and `min-functions-for-staleness-error`) to control the function selection to make sure the selected are hot enough and the num of function is not small. Tuned the parameters on our internal services, it works to catch big perf regression due to the high mismatch .	2024-03-27 11:14:21 -07:00
Lei Wang	12a2bc301f	[CSSPGO] Fix the issue of preinliner import function list (#85719 ) By design, when the nested profile is pre-inliner based, we should fully honor pre-inliner decision, fix it by setting threshold to zero. We observed a perf win on one internal service, no negative impact for other big services.	2024-03-19 16:50:48 -07:00
Lei Wang	c98da372cb	[CSSPGO] Compute and report profile matching recovered callsites and samples (#79090 ) This change adds the support to compute and report the staleness metrics after stale profile matching so that we can know how effective the fuzzy matching is, i. e. how many callsites and samples are recovered by the matching. Some implementation notes: - The function checksum mismatch metrics are not applicable here as it's function-level metrics, checksum mismatch remains the same before and after matching, so we need to compute based on the callsite samples. - Added two new counters `NumRecoveredCallsites`, `RecoveredCallsiteSamples` for this and removed `TotalCallsiteSamples` as now the we can use the `TotalFuncHashSamples` as base, and renamed some counters. - In profile matching, we changed to use a state machine to represent the callsite's matching state changes. See the `MatchState` for the state, and used a new function `recordCallsiteMatchStates` to compute and record the callsite's match states changes before and after the matching, , the result is compressed and saved into a `FuncCallsiteMatchStates` map for later counting use. - Changed the counting function to run on module-level and moved it to the end of the whole process(`computeAndReportProfileStaleness`). The reason is before the callsite is only counted on top-level function, this change extends it to count(recursively) on the inlined functions and samples, which is more accurate.	2024-02-19 11:36:20 -08:00
Benjamin Kramer	9423e45987	[ProfileData] Copy CallTargetMaps a bit less. NFCI	2023-12-24 17:48:18 +01:00
Matthias Braun	cb4627d150	Add setBranchWeights convenience function. NFC (#72446 ) Add `setBranchWeights` convenience function to ProfDataUtils.h and use it where appropriate.	2023-11-16 10:55:19 -08:00
William Junda Huang	683f2df6e5	[SampleProfile] Fix bug where remapper returns empty string and crashes Sample Profile loader (#71479 ) Normally SampleContext does not allow constructing an object from an empty StringRef; this is to prevent bugs when reading the profile. However, empty names may be emitted by a function whose name is intentionally set to empty, or by a bug in the remapper that returns an empty string. Regardless, converting it to FunctionId first will prevent the assert, and that assert check is unnecessary, which will be addressed in another patch.	2023-11-10 21:38:13 +00:00
William Junda Huang	ef0e0adccd	[llvm-profdata] Do not create numerical strings for MD5 function names read from a Sample Profile. (#66164 ) This is phase 2 of the MD5 refactoring on Sample Profile following https://reviews.llvm.org/D147740 In previous implementation, when a MD5 Sample Profile is read, the reader first converts the MD5 values to strings, and then create a StringRef as if the numerical strings are regular function names, and later on IPO transformation passes perform string comparison over these numerical strings for profile matching. This is inefficient since it causes many small heap allocations. In this patch I created a class `ProfileFuncRef` that is similar to `StringRef` but it can represent a hash value directly without any conversion, and it will be more efficient (I will attach some benchmark results later) when being used in associative containers. ProfileFuncRef guarantees the same function name in string form or in MD5 form has the same hash value, which also fix a few issue in IPO passes where function matching/lookup only check for function name string, while returns a no-match if the profile is MD5. When testing on an internal large profile (> 1 GB, with more than 10 million functions), the full profile load time is reduced from 28 sec to 25 sec in average, and reading function offset table from 0.78s to 0.7s	2023-10-17 21:09:39 +00:00
Fangrui Song	111fcb0df0	[llvm] Fix duplicate word typos. NFC Those fixes were taken from https://reviews.llvm.org/D137338	2023-09-01 18:25:16 -07:00
wlei	f14a5ff635	[CSSPGO] Refactoring findIRAnchors Address feedback in https://reviews.llvm.org/D158817. Since `extractProbe` can be used for both callsite and BB probes, we can leverage this to unify the callsite handling code. Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D159169	2023-08-31 16:25:47 -07:00
Jie Fu	3b51881dd5	[CSSPGO] Silence -Wunused-but-set-variable warning without asserts (NFC) /data/home/jiefu/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:2189:8: error: variable 'IsFuncHashMismatch' set but not used [-Werror,-Wunused-but-set-variable] bool IsFuncHashMismatch = false; ^ 1 error generated.	2023-08-31 09:58:29 +08:00
wlei	4bb6bbb9bf	[CSSPGO] Skip reporting staleness metrics for imported functions Accumulating the staleness metrics from pre-link is less accurate than doing it from post-link time (assuming we use the offline profile mismatch as baseline). The reason is that there are some duplicated reports for the same functions: for example, one template function could be included in multiple TUs, but in post-thin-link time only one function is kept (linkonce_odr) and the others are marked as available-externally functions. Hence, this change skips reporting the metrics for imported functions (available-externally). I saw the post-link number is now very close to the offline number (dump the mismatched functions and count the metrics offline based on the entire profile), slightly smaller than the offline number due to some missing inlined functions. Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D156725	2023-08-30 18:00:23 -07:00
wlei	3365cd4544	[CSSPGO] Compute checksum mismatch recursively on nested profile Follow-up diff for https://reviews.llvm.org/D158891. Compute the checksum mismatch based on the original nested profile. Additionally, use a recursive way to compute the children mismatched samples in the nested tree even the top-level func checksum is matched. Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D158900	2023-08-30 18:00:23 -07:00
wlei	62a3f6c96e	[CSSPGO] Retire FlattenProfileForMatching - Always use flattened profile to find the profile anchors. Since profile under different contexts may have different inlined callsites, to get more profile anchors, we use a merged profile from all the contexts(the flattened profile) to find callsite anchors. - Compute the staleness metrics based on the original nested profile, as currently once a callsite is mismatched, all its children profile are dropped.(TODO: in future, we can improve to reuse the children valid profile) Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D158891	2023-08-30 18:00:23 -07:00
wlei	062af2e763	[CSSPGO] Support stale profile matching for LTO As in pre-link time, callsites could be optimized out by inlining, we don't have those original call targets in the IR in LTO time. Additionally, the inlined code doesn't actually belong to the original function, so the IR locations or pseudo probes parsed from it are incorrect and could mislead the matching later. This change adds the support to extract the original IR location info from the inlined code. Specifically, it makes sure to skip all the inlined code that doesn't belong to the original function, but before that, it processes the inline frames of the debug info to extract the base frame and recover its callsite and callee target (name). Measured on some stale profile instances, all showed some perf improvements. Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D156722	2023-08-30 18:00:23 -07:00
wlei	148cceb0d6	[CSSPGO] Refactoring SampleProfileMatcher::runOnFunction - rename `IRLocation` --> `IRAnchors`, `ProfileLocation` --> `ProfileAnchors` - reorganize runOnFunction, fact out the finding IR anchors code into `findIRAnchors` - introduce a new function `findProfileAnchors` to populate the profile related anchors, the result is saved into `ProfileAnchors`, it's later used for both mismatch report and matching, this can avoid to parse the `getBodySamples` and `getCallsiteSamples` for multiple times. - move the `MatchedCallsiteLocs` stuffs from `findIRAnchors` to `countProfileMismatches` so that all the staleness metrics report are computed in one function. - move all matching related into `runStaleProfileMatching`, and move all mismatching report into `countProfileMismatches` Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D158817	2023-08-30 18:00:23 -07:00
William Huang	da2855c0ba	[SampleProfile] Potential use after move in SampleProfileLoader::promoteMergeNotInlinedContextSamples SampleProfileLoader::promoteMergeNotInlinedContextSample adds certain uninlined functions to the sample profile map (unordered_map, which is previously read from a profile file). This action may cause the map to be rehashed, invalidating all pointers to FunctionSamples used by many members of SampleProfileLoader, while the existing code did nothing to guard against that. This bug is theoretical since adding a few new functions to a large profile usually won't trigger a rehash, or even if there's a rehash std::unordered_map tries its best to expand its capacity in-place. This bug will trigger if the container type of sample profile map is changed to llvm::DenseMap or other implementation, such as in D147740, for SampleProfReader's performance reason. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D157061	2023-08-16 20:32:15 +00:00
wlei	bfefeeb139	[SamplePGO] Fix ICE that callee samples returns null while finding import functions We found that in a special condition, the input callee `Samples` is null for `findExternalInlineCandidate`, which caused an ICE. In some rare cases, call instruction could be changed after being pushed into inline candidate queue, this is because earlier inlining may expose constant propagation which can change indirect call to direct call. When this happens, we may fail to find matching function samples for the candidate later(for example if the profile is stale), even if a match was found when the candidate was enqueued. See this reduced program: file1.c: ``` int bar(int x); int(foo())() { return bar; }; void func() { int (fptr)(int); fptr = foo(); a += (fptr)(10); } ``` file2.c: ``` int bar(int x) { return x + 1;} ``` The two CALL: `foo` and `(ptr)` are pushed into the queue at the beginning, say `foo` is hotter and popped first for inlining. During the inlining of `foo`, it performs the constant propagation for the function pointer `bar` and then changed `(ptr)` to a direct call `bar(..)`. Note that at this time, `(ptr)/bar` is still in the queue, later while it's popped out for inlining, it use the a different target name(bar) to look for the callee samples. At the same time, if the profile is stale and the new function is different from the old function in the profile, then this led the return of the null callee sample. Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D154637	2023-07-07 14:56:21 -07:00
wlei	444d2e1a54	[CSSPGO] Enable stale profile matching by default for CSSPGO We tested the stale profile matching on several Meta's internal services, all results are positive, for instance, in one service that refreshed its profile every one or two weeks, it consistently gave 1~2% performance improvement. We also observed an instance that a trivial refactoring caused a 2% regression and the matching can successfully recover the whole regression. Therefore, we'd like to turn it on by default for CSSPGO. Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D154027	2023-06-29 11:18:51 -07:00
Amir Ayupov	b244a4c4c9	[profi][NFC] Get rid of afdo_detail::TypeMap Parametrize SampleProfileInference and SampleProfileLoaderBaseImpl by function type (Function/MachineFunction) instead of block type (BasicBlock/MachineBasicBlock). Move out specializations to appropriate locations. This change makes it possible to use GraphTraits instead of a custom TypeMap and make SampleProfileInference not dependent on LLVM types, paving the way for generalizing SampleProfileInference interfaces to BOLT IR types (BinaryFunction/BinaryBasicBlock) in stale profile matching (D144500). Reviewed By: hoy Differential Revision: https://reviews.llvm.org/D152187	2023-06-06 13:48:37 -07:00
Hongtao Yu	b7d9322b49	[FS-AFDO] Load pseudo probe profile on MIR This change enables loading pseudo-probe based profile on MIR. Different from the IR profile loader, callsites are excluded from MIR profile loading since they are not assigned a FS discriminator. Using zero as the discriminator is not accurate and would undo the distribution work done by the IR loader based on pseudo probe distribution factor. We rely on block probes only for FS profile loading. Some refactoring is done to the IR profile loader so that `getProbeWeight` can be shared by both loaders. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D148584	2023-05-10 11:29:37 -07:00
wlei	ba3cbc7aad	fix a use-after-free failure	2023-04-28 15:55:51 -07:00
wlei	892daede72	[SamplePGO] Stale profile matching(part 2) Part 2 of https://reviews.llvm.org/D147456 Use callee name on IR as an anchor to match the call target/inlinee name in the profile. The advantages of this in particular: - Different from the traditional way of encoding hash signatures to every block that would affect binary/profile size and build speed, it doesn't require any additional information for this, all the data is already in the IR and profiles. - Effective for current nested profile layout in which once a callsite is mismatched all the inlinee's profiles are dropped. The input of the algorithm: - IR locations: the anchor is the callee name of direct callsite. - Profile locations: the anchor is the call target name for `BodySample`s or inlinee's profile name for `CallsiteSamples`. The two lists are populated by parsing the IR and profile and both can be generalized as a sequence of locations with an optional anchor. For example: say location `1.2(foo)` refers to a callsite at `1.2` with callee name `foo` and `1.3` refers to a non-directcall location `1.3`. ``` // The current build source code: int main() { 1. ... 2. foo(); 3. ... 4 ... 5. ... 6. bar(); 7. ... } ``` IR locations are populated and simplified as: `[1, 2(foo), 3, 5, 6(bar), 7]`. ``` ; The "stale" profile: main:350:1 1: 1 2: 3 3: 100 foo:100 4: 2 7: 2 8: 200 bar:200 9: 30 ``` Profile locations are populated and simplified as `[1, 2, 3(foo), 4, 7, 8(bar), 9]` Matching heuristic: - Match all the anchors in lexical order first. - Match non-anchors evenly between two anchors: Split the non-anchor range, the first half is matched based on the start anchor, the second half is matched based on the end anchor. So the example above is matched like: ``` [1, 2(foo), 3, 5, 6(bar), 7] | | | | | | [1, 2, 3(foo), 4, 7, 8(bar), 9] ``` 3 -> 4 matching is based on anchor `foo`, 5 -> 7 matching is based on anchor `bar`. The output mapping of matching is [2->3, 3->4, 5->7, 6->8, 7->9]. For the implementation, the anchors are saved in a map for fast look-up. The result mapping is saved into `IRToProfileLocationMap`(see https://reviews.llvm.org/D147456) and distributed to all FunctionSamples(`distributeIRToProfileLocationMap`) Clang-self build benchmark: Current build version: clang-10 The profiled version: clang-9 Results compared to a refresh profile(collected profile on clang-10) and to be fair, we invalidated new functions' profiles(both refresh and stale profile use the same profile list). 1) Regression to using refresh profile with this off : -3.93% 2) Regression to using refresh profile with this on : -1.1% So this algorithm can recover ~72% of the regression. Internal(Meta) large-scale services. we saw one real instance of a 3 week stale profile., it delivered a ~1.8% win. Notes or future work: - Classic AutoFDO support: the current version only supports pseudo-probe, but I believe it's not hard to extend to classic line-number based AutoFDO since pseudo-probe and line-number are shared the LineLocation structure. - The fuzzy matching is an open-ended area and there could be more heuristics to try out, but since the current version already recovers a reasonable percentage of regression(with some pseudo probe order change, it can recover close to 90%), I'm submitting the patch for review and we will try more heuristics in future. - Profile call target name are only available when the call is hit by samples, the missing anchor might mislead the matching, this can be mitigated in llvm-profgen to generate the call target for the zero samples. - This doesn't handle function name mismatch, we plan to solve it in future. Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D147545	2023-04-28 13:07:32 -07:00
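The matching heuristic in 892daede72 (anchors first, then non-anchors split evenly between neighbouring anchors) can be sketched roughly as follows. This is an illustration under simplifying assumptions, not the code in the patch: locations are plain integers rather than `LineLocation` values, and the function name `match_stale_locations` and its data layout are hypothetical.

```python
from collections import defaultdict, deque


def match_stale_locations(ir_locations, profile_anchors):
    """Map IR locations to stale-profile locations.

    ir_locations: list of (location, callee_name_or_None) in lexical order.
    profile_anchors: dict of profile location -> call target / inlinee name.
    Returns a dict holding only the locations whose position changed.
    """
    # Anchors are kept in a map keyed by callee name for fast look-up.
    sites_by_callee = defaultdict(deque)
    for loc in sorted(profile_anchors):
        sites_by_callee[profile_anchors[loc]].append(loc)

    mapping = {}

    def record(src, dst):
        if src != dst:
            mapping[src] = dst
        else:
            mapping.pop(src, None)  # identity mappings are not stored

    delta = 0     # offset implied by the most recently matched anchor
    pending = []  # non-anchor locations seen since that anchor
    for loc, callee in ir_locations:
        sites = sites_by_callee.get(callee) if callee else None
        if sites:
            # Anchor match: take the next profile callsite with this name.
            target = sites.popleft()
            record(loc, target)
            delta = target - loc
            # Re-match the second half of the pending range using the
            # new (end) anchor; the first half keeps the start anchor.
            for earlier in pending[(len(pending) + 1) // 2:]:
                record(earlier, earlier + delta)
            pending.clear()
        else:
            # Non-anchor: match forwards from the previous anchor.
            record(loc, loc + delta)
            pending.append(loc)
    return mapping
```

Running it on the example from the commit message, with IR locations `[(1, None), (2, "foo"), (3, None), (5, None), (6, "bar"), (7, None)]` and profile anchors `{3: "foo", 8: "bar"}`, yields `{2: 3, 3: 4, 5: 7, 6: 8, 7: 9}`, the same mapping the message lists.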
wlei	a98d6a11ea	[SamplePGO] Stale profile matching(part 1) AutoFDO/CSSPGO often has to deal with stale profiles collected on binaries built from several revisions behind release. It’s likely to get incorrect profile annotations using the stale profile, which results in unstable or low performing binaries. Currently for source location based profile, once a code change causes a profile mismatch, all the locations afterward are mismatched, the affected samples or inlining info are lost. If we can provide a matching framework to reuse parts of the mismatched profile - aka incremental PGO, it will make PGO more stable, also increase the optimization coverage and boost the performance of binary. This patch is the part 1 of stale profile matching, summary of the implementation: - Added a structure for the matching result:`LocToLocMap`, which is a location to location map meaning the location of current build is matched to the location of the previous build(to be used to query the “stale” profile). - In order to use the matching results for sample query, we need to pass them to all the location queries. For code cleanliness, we added a new pointer field(`IRToProfileLocationMap`) to `FunctionSamples`. - Added a wrapper(`mapIRLocToProfileLoc`) for the query to the location, the location from input IR will be remapped to the matched profile location. - Added a new switch `--salvage-stale-profile`. - Some refactoring for the staleness detection. Test case is in part 2 with the matching algorithm. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D147456	2023-04-28 13:07:32 -07:00
Bjorn Pettersson	a20f7efbc5	Remove several no longer needed includes. NFCI Mostly removing includes of InitializePasses.h and Pass.h in passes that no longer has support for the legacy PM.	2023-04-17 13:54:19 +02:00
wlei	339b8a0019	[AutoFDO] Use flattened profiles for profile staleness metrics For profile staleness report, before it only counts for the top-level function samples in the nested profile, the samples in the inlinees are ignored. This could affect the quality of the metrics when there are heavily inlined functions. This change adds a feature to flatten the nested profile and we're changing to use flatten profile as the input for stale profile detection and matching. Example for profile flattening: ``` Original profile: _Z3bazi:20301:1000 1: 1000 3: 2000 5: inline1:1600 1: 600 3: inline2:500 1: 500 Flattened profile: _Z3bazi:18701:1000 1: 1000 3: 2000 5: 600 inline1:600 inline1:1100:600 1: 600 3: 500 inline2: 500 inline2:500:500 1: 500 ``` This feature could be useful for offline analysis, like understanding the hotness of each individual function. So I'm adding the support to `llvm-profdata merge` under `--gen-flattened-profile`. Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D146452	2023-03-30 11:05:10 -07:00
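The flattening example in 339b8a0019 can be reproduced with a small model. The sketch below is an approximation rather than the `llvm-profdata` implementation: the `Profile` class, its field names, and the rule that an inlinee's entry count is estimated from its lowest body location are assumptions chosen to match the numbers shown in the commit message.

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    name: str
    total: int
    head: int = 0
    body: dict = field(default_factory=dict)      # location -> sample count
    calls: dict = field(default_factory=dict)     # location -> {callee: count}
    inlinees: dict = field(default_factory=dict)  # location -> nested Profile


def head_estimate(p):
    """Entry-count estimate: explicit head, else the first body sample (assumption)."""
    if p.head:
        return p.head
    return p.body[min(p.body)] if p.body else 0


def _merge(dst, src):
    """Accumulate src into dst when a function shows up in several contexts."""
    dst.total += src.total
    dst.head += src.head
    for loc, n in src.body.items():
        dst.body[loc] = dst.body.get(loc, 0) + n
    for loc, targets in src.calls.items():
        slot = dst.calls.setdefault(loc, {})
        for callee, n in targets.items():
            slot[callee] = slot.get(callee, 0) + n


def flatten(profile, out=None):
    """Return {function name: flat Profile} with every inlinee hoisted to top level."""
    out = {} if out is None else out
    flat = Profile(profile.name, profile.total, head_estimate(profile),
                   dict(profile.body),
                   {loc: dict(t) for loc, t in profile.calls.items()})
    for loc, child in profile.inlinees.items():
        entry = head_estimate(child)
        flat.total -= child.total                 # inlinee samples leave the caller
        flat.body[loc] = flat.body.get(loc, 0) + entry
        slot = flat.calls.setdefault(loc, {})     # callsite becomes a plain call
        slot[child.name] = slot.get(child.name, 0) + entry
        flatten(child, out)
    if flat.name in out:
        _merge(out[flat.name], flat)
    else:
        out[flat.name] = flat
    return out
```

Building the nested `_Z3bazi` profile from the message and calling `flatten` on it gives three top-level entries whose totals are 18701, 1100 and 500, matching the flattened listing above.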
Arthur Eubanks	fa6ea7a419	[AlwaysInliner] Make legacy pass like the new pass The legacy pass is only used in AMDGPU codegen, which doesn't care about running it in call graph order (it actually has to work around that fact). Make the legacy pass a module pass and share code with the new pass. This allows us to remove the legacy inliner infrastructure. Reviewed By: mtrofin Differential Revision: https://reviews.llvm.org/D146446	2023-03-21 11:04:22 -07:00
Arthur Eubanks	eecb8c5f06	[SampleProfile] Use LazyCallGraph instead of CallGraph The function order in some tests had to be changed because they relied on ordering of functions returned in an SCC which is consistent but unspecified.	2023-03-20 13:43:54 -07:00
Fangrui Song	1e6921131a	Move global namespace cl::opt inside llvm::	2023-02-14 00:09:44 -08:00
Steven Wu	516e301752	[NFC][Profile] Access profile through VirtualFileSystem Make the access to profile data going through virtual file system so the inputs can be remapped. In the context of the caching, it can make sure we capture the inputs and provided an immutable input as profile data. Reviewed By: akyrtzi, benlangmuir Differential Revision: https://reviews.llvm.org/D139052	2023-02-01 09:25:02 -08:00
Fangrui Song	21c4dc7997	std::optional::value => operator*/operator-> value() has undesired exception checking semantics and calls __throw_bad_optional_access in libc++. Moreover, the API is unavailable without _LIBCPP_NO_EXCEPTIONS on older Mach-O platforms (see _LIBCPP_AVAILABILITY_BAD_OPTIONAL_ACCESS). This fixes clang.	2022-12-17 00:42:05 +00:00
wlei	97e2aeab71	[AutoFDO] Use getHeadSamplesEstimate instead of getTotalSamples to compute profile callsite staleness Fix two issues for profile staleness report. 1) It should be more accurate to use the sum of all entry count(`getHeadSamplesEstimate`) for the callsite samples than the total samples, since even the top-level callsite is mismatched, it does affect the inlining but it can still be merged into base profile and used later. 2) I accidentally missed to persist the num of mismatched callsite into binary. Also added the asm testing to test the decoding of the section. Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D140063	2022-12-15 11:21:18 -08:00


369 Commits