9 Commits

Author SHA1 Message Date
Lei Wang
068d0c0f4b
[CSSPGO] Turn on call-graph matching by default for CSSPGO (#125938)
Tested call-graph matching on some of Meta's large services, it works to
reuse some renamed function profiles, no negative perf or significant
build speed regression observed. Turned it on by default for CSSPGO
mode.
2025-02-06 11:54:59 -08:00
Kazu Hirata
3a5791e75c
[SampleProf] Templatize longestCommonSequence (NFC) (#114633)
This patch moves the implementation of longestCommonSequence to a new
header file.

I'm planning to implement a profile undrifting algorithm for MemProf
so that the compiler can ingest somewhat stale MemProf profile and
still deliver most of the benefits that would be delivered if the
profile were completely up to date (with no line number or column
number differences).

Since the core undrifting algorithm is the same between MemProf and
AutoFDO, this patch turns longestCommonSequence into a template.  The
original longestCommonSequence implementation is repurposed and now
serves as a wrapper around a template specialization.

Note that the usage differences between MemProf and AutoFDO are minor.
For example, I'm planning to use line-column number pair instead of
LineLocation, which uses a discriminator.  To identify a function, I'm
planning to use uint64_t GUID instead of FunctionId.

For now, I'm returning matches via a function object InsertMatching
because it's impossible to infer the map type from LineLocation alone.
Specifically:

  std::unordered_map<LineLocation, LineLocation>

does not work because we cannot infer the hash functor
LineLocationHash.  I could define std::hash<LineLocation>.
Alternatively, in the future, I might switch to DenseMap and define
DenseMapInfo<LineLocation>.  This way:

  DenseMap<LineLocation, LineLocation>

automatically picks up DenseMapInfo<LineLocation>.
2024-11-04 13:20:25 -08:00
Lei Wang
6e60330af5
[SampleFDO] Read call-graph matching recovered top-level function profile (#101053)
With extbinary profile format, initial profile loading only reads
profile based on current function names in the module. However, if a
function is renamed, sample loader skips to load its original
profile(which has a different name), we will miss this case. To address
this, we load the top-level profile candidate explicitly for the
matching. If a match is found later, the function profile will be
further preserved for use by the sample loader.
2024-09-04 11:08:40 -07:00
Lei Wang
18cdfa72e0
[SampleFDO] Stale profile call-graph matching (#95135)
Profile staleness could be due to function renaming. Given that sample
profile loader relies on exact string matching, a trivial change in the
function signature( such as `int foo()` --> `long foo()` ) can make the
mangled name different, the function profile(including all nested
children profile) becomes unavailable.

This patch introduces stale profile call-graph level matching, targeting
at identifying the trivial function renaming and reusing the old
function profile.

Some noteworthy details:

1. Extend the LCS based CFG level matching to identify new function. 
- Extend to match function and profile have different name instead of
the exact function name matching. This leverages LCS, i.e during the
finding of callsite anchor matching, when two function name are
different, try matching the functions instead of return.
- In LCS, the equal function check is replaced by
`functionMatchesProfile`.
- Only try matching functions that are new functions(neither appears on
each side). This reduces the matching scope as we don't need to match
the originally matched function.
2.  Determine the matching by call-site anchor similarity check.
- A new function `functionMatchesProfile(IRFunc, ProfFunc)` is used to
check the renaming for the possible <IRFunc, ProfFunc> pair, use the
LCS(diff) matching to compute the equal set and we define: `Similarity =
|equalSet * 2| / (|A| + |B|)`. The profile name is marked as renamed if
the similarity is above a
threshold(`-func-profile-similarity-threshold`)

3.  Process the matching in top-down function order 
- when a caller's is done matching, the new function names are saved for
later use, using top-down order will maximize the reused results.
- `ProfileNameToFuncMap` is used to save or cache the matching result.
4. Update the original profile at the end using `ProfileNameToFuncMap`.

5. Added a new switch --salvage-unused-profile to control this, default
is false.

Verified on one Meta's internal big service, confirmed 90%+ of the found
renaming pair is good. (There could be incorrect renaming pair if the
num of the anchor is small, but checked that those functions are simple
cold function)
2024-07-17 10:33:00 -07:00
Krzysztof Pszeniczny
21ba91c439
[SamplePGO] Add a cutoff for number of profile matching anchors (#95542)
The algorithm added by PR #87375 can be potentially quadratic in the
number of anchors. This is almost never a problem because normally
functions have a reasonable number of function calls.

However, in some rare cases of auto-generated code we observed very
large functions that trigger quadratic behaviour here (resulting in
>130GB of peak heap memory usage for clang). Let's add a knob for
controlling the max number of callsites in a function above which stale
profile matching won't be performed.
2024-06-17 19:50:58 +02:00
Lei Wang
5b6f151104
[SampleFDO] Improve stale profile matching by diff algorithm (#87375)
This change improves the matching algorithm by using the diff algorithm,
the current matching algorithm only processes the callsites grouped by
the same name functions, it doesn't consider the order relationships
between different name functions, this sometimes fails to handle this
ambiguous anchor case. For example. (`Foo:1` means a
calliste[callee_name: callsite_location])
```
IR :      foo:1  bar:2  foo:4  bar:5 
Profile :        bar:3  foo:5  bar:6
```
The `foo:1` is matched to the 2nd `foo:5` and using the diff
algorithm(finding longest common subsequence ) can help on this issue.
One well-known diff algorithm is the Myers diff algorithm(paper "An
O(ND) Difference Algorithm and Its Variations∗" Eugene W. Myers), its
variations have been implemented and used in many famous tools, like the
GNU diff or git diff. It provides an efficient way to find the longest
common subsequence or the shortest edit script through graph searching.
There are several variations/refinements for the algorithm, but as in
our case, the num of function callsites is usually very small, so we
implemented the basic greedy version in this change which should be good
enough.
We observed better matchings and positive perf improvement on our
internal services.
2024-05-13 16:01:29 -07:00
Krzysztof Pszeniczny
e06d6ed1ef
[SamplePGO] Handle FS discriminators in SampleProfileMatcher (#90858)
Currently the code uses FunctionSamples::getCallSiteIdentifier which
will sometimes incorrectly guess that FSAFDO discriminators are probe
based and will convert them incorrectly.

This change doesn't affect builds which don't use FSAFDO, it only fixes
sample profile matching with FS discriminators.

The test for this is manually updated to use discriminator value 15,
which is a perfectly valid base discriminator in the FS world, but
satisfies `isPseudoProbeDiscriminator`, so
`getBaseDiscriminatorFromDiscriminator` will incorrectly extract the
probe index from it.

Note: this change only affects how the base discriminators will be
extracted when doing stale profile matching in the IR-level sample
profile loader. It doesn't add stale profile matching to the MIR-level
FS profile loader pass.
2024-05-02 10:32:37 -07:00
Krzysztof Pszeniczny
17642c7602
[SamplePGO] Support -salvage-stale-profile without probes too (#86116)
Currently -salvage-stale-profile is a no-op if the profile is not
probe-based. We observed that it can help for regular, non-probe- based
profiles too: some of our internal benchmarks show 0.2-0.3% QPS
improvement.

There seems to be no good reason to limit this flag to only work for
probe-based profiles.
2024-04-03 09:46:00 -07:00
Lei Wang
1d99d7a6f8
[SampleFDO][NFC] Refactoring SampleProfileMatcher (#86988)
Move all the stale profile matching stuffs into new files so that it can
be shared for unit testing.
2024-03-28 20:03:03 -07:00