llvm-project

Author	SHA1	Message	Date
Lei Wang	6e60330af5	[SampleFDO] Read call-graph matching recovered top-level function profile (#101053) With the extbinary profile format, initial profile loading only reads profiles for the function names currently in the module. If a function has been renamed, the sample loader skips its original profile (which is stored under a different name), so that case is missed. To address this, we explicitly load the top-level profile candidates for matching. If a match is found later, the function profile is preserved for use by the sample loader.	2024-09-04 11:08:40 -07:00
Lei Wang	18cdfa72e0	[SampleFDO] Stale profile call-graph matching (#95135) Profile staleness can be caused by function renaming. Because the sample profile loader relies on exact string matching, a trivial change in the function signature (such as `int foo()` --> `long foo()`) changes the mangled name, and the function profile (including all nested children profiles) becomes unavailable. This patch introduces stale profile call-graph level matching, which aims to identify trivial function renamings and reuse the old function profile. Some noteworthy details: 1. Extend the LCS-based CFG level matching to identify new functions. - Allow matching a function and a profile that have different names, instead of requiring an exact function name match. This leverages LCS: while finding callsite anchor matches, when two function names differ, try matching the functions instead of returning. - In LCS, the function equality check is replaced by `functionMatchesProfile`. - Only try matching functions that are new (the name does not appear on the other side). This reduces the matching scope, since functions that already matched do not need to be matched again. 2. Determine the match by a call-site anchor similarity check. - A new function `functionMatchesProfile(IRFunc, ProfFunc)` checks a possible <IRFunc, ProfFunc> pair for renaming. It uses the LCS (diff) matching to compute the equal set, and we define: `Similarity = |equalSet * 2| / (|A| + |B|)`. The profile name is marked as renamed if the similarity is above a threshold (`-func-profile-similarity-threshold`). 3. Process the matching in top-down function order. - When matching of a caller is done, the new function names are saved for later use; using top-down order maximizes reuse of the results. - `ProfileNameToFuncMap` is used to save or cache the matching result. 4. Update the original profile at the end using `ProfileNameToFuncMap`. 5. Added a new switch --salvage-unused-profile to control this; the default is false. Verified on one of Meta's large internal services: 90%+ of the renaming pairs found are good. (There could be incorrect renaming pairs if the number of anchors is small, but we checked that those functions are simple cold functions.)	2024-07-17 10:33:00 -07:00
Krzysztof Pszeniczny	21ba91c439	[SamplePGO] Add a cutoff for number of profile matching anchors (#95542) The algorithm added by PR #87375 can be potentially quadratic in the number of anchors. This is almost never a problem because normally functions have a reasonable number of function calls. However, in some rare cases of auto-generated code we observed very large functions that trigger quadratic behaviour here (resulting in >130GB of peak heap memory usage for clang). Let's add a knob for controlling the max number of callsites in a function above which stale profile matching won't be performed.	2024-06-17 19:50:58 +02:00
Lei Wang	5b6f151104	[SampleFDO] Improve stale profile matching by diff algorithm (#87375) This change improves the matching algorithm by using a diff algorithm. The current matching algorithm only processes callsites grouped by same-name functions and does not consider the order relationships between functions with different names, so it sometimes fails to handle ambiguous anchors. For example (`foo:1` means a callsite [callee_name: callsite_location]): ``` IR : foo:1 bar:2 foo:4 bar:5 Profile : bar:3 foo:5 bar:6 ``` Here `foo:1` is matched to the 2nd `foo:5`; using a diff algorithm (finding the longest common subsequence) helps with this issue. One well-known diff algorithm is the Myers diff algorithm (paper "An O(ND) Difference Algorithm and Its Variations", Eugene W. Myers); its variations have been implemented and used in many well-known tools, such as GNU diff and git diff. It provides an efficient way to find the longest common subsequence or the shortest edit script through graph searching. There are several variations and refinements of the algorithm, but in our case the number of function callsites is usually very small, so this change implements the basic greedy version, which should be good enough. We observed better matchings and positive perf improvement on our internal services.	2024-05-13 16:01:29 -07:00
Krzysztof Pszeniczny	e06d6ed1ef	[SamplePGO] Handle FS discriminators in SampleProfileMatcher (#90858) Currently the code uses FunctionSamples::getCallSiteIdentifier, which will sometimes incorrectly guess that FSAFDO discriminators are probe-based and convert them incorrectly. This change doesn't affect builds which don't use FSAFDO; it only fixes sample profile matching with FS discriminators. The test for this is manually updated to use discriminator value 15, which is a perfectly valid base discriminator in the FS world but satisfies `isPseudoProbeDiscriminator`, so `getBaseDiscriminatorFromDiscriminator` will incorrectly extract the probe index from it. Note: this change only affects how the base discriminators are extracted when doing stale profile matching in the IR-level sample profile loader. It doesn't add stale profile matching to the MIR-level FS profile loader pass.	2024-05-02 10:32:37 -07:00
Krzysztof Pszeniczny	17642c7602	[SamplePGO] Support -salvage-stale-profile without probes too (#86116) Currently -salvage-stale-profile is a no-op if the profile is not probe-based. We observed that it can help for regular, non-probe-based profiles too: some of our internal benchmarks show 0.2-0.3% QPS improvement. There seems to be no good reason to limit this flag to only work for probe-based profiles.	2024-04-03 09:46:00 -07:00
Lei Wang	1d99d7a6f8	[SampleFDO][NFC] Refactoring SampleProfileMatcher (#86988) Move all the stale profile matching code into new files so that it can be shared for unit testing.	2024-03-28 20:03:03 -07:00

7 Commits
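
The anchor matching from #87375 and the similarity check from #95135 can be illustrated with a small sketch. This is not the LLVM implementation: the commits describe a greedy Myers diff in C++, while the sketch below swaps in a plain dynamic-programming LCS, and the names `lcs_anchor_pairs` and `similarity` are hypothetical. Anchors are modeled as `(callee_name, callsite_location)` tuples, and returning 0.0 when both anchor lists are empty is an assumption, not something the commit messages specify.

```python
from typing import List, Tuple

Anchor = Tuple[str, int]  # (callee_name, callsite_location)


def lcs_anchor_pairs(ir: List[Anchor], prof: List[Anchor]) -> List[Tuple[int, int]]:
    """Return (ir_location, profile_location) pairs along one LCS of callee names."""
    n, m = len(ir), len(prof)
    # dp[i][j] = LCS length of ir[i:] and prof[j:], compared by callee name only.
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if ir[i][0] == prof[j][0]:
                dp[i][j] = 1 + dp[i + 1][j + 1]
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])

    # Walk forward through the table to recover one optimal matching.
    pairs = []
    i = j = 0
    while i < n and j < m:
        if ir[i][0] == prof[j][0]:
            pairs.append((ir[i][1], prof[j][1]))
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs


def similarity(ir: List[Anchor], prof: List[Anchor]) -> float:
    """Similarity = 2 * |equalSet| / (|A| + |B|), as defined in #95135."""
    total = len(ir) + len(prof)
    if total == 0:
        return 0.0  # assumption: no anchors means no evidence of a match
    return 2 * len(lcs_anchor_pairs(ir, prof)) / total


# The example from the #87375 commit message.
ir_anchors = [("foo", 1), ("bar", 2), ("foo", 4), ("bar", 5)]
prof_anchors = [("bar", 3), ("foo", 5), ("bar", 6)]
print(lcs_anchor_pairs(ir_anchors, prof_anchors))  # [(2, 3), (4, 5), (5, 6)]
print(similarity(ir_anchors, prof_anchors))        # 6/7, about 0.857
```

With order taken into account, `foo:1` in the IR is left unmatched and `foo:4` pairs with the profile's `foo:5`. The dynamic-programming table is quadratic in the number of anchors in both time and memory, which is the kind of cost the cutoff added in #95542 guards against for very large functions.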