346 Commits

Author SHA1 Message Date
Matthias Braun
cb4627d150
Add setBranchWeigths convenience function. NFC (#72446)
Add `setBranchWeights` convenience function to ProfDataUtils.h and use
it where appropriate.
2023-11-16 10:55:19 -08:00
William Junda Huang
683f2df6e5
[SampleProfile] Fix bug where remapper returns empty string and crashing Sample Profile loader (#71479)
Normally SampleContext does not allow using an empty StirngRef to
construct an object, this is to prevent bugs reading the profile.
However empty names may be emitted by a function which its name is
intentionally set to empty, or a bug in the remapper that returns an
empty string. Regardless, converting it to FunctionId first will prevent
the assert, and that assert check is unnecessary, which will be
addressed in another patch
2023-11-10 21:38:13 +00:00
William Junda Huang
ef0e0adccd
[llvm-profdata] Do not create numerical strings for MD5 function names read from a Sample Profile. (#66164)
This is phase 2 of the MD5 refactoring on Sample Profile following
https://reviews.llvm.org/D147740
    
In previous implementation, when a MD5 Sample Profile is read, the
reader first converts the MD5 values to strings, and then create a
StringRef as if the numerical strings are regular function names, and
later on IPO transformation passes perform string comparison over these
numerical strings for profile matching. This is inefficient since it
causes many small heap allocations.
In this patch I created a class `ProfileFuncRef` that is similar to
`StringRef` but it can represent a hash value directly without any
conversion, and it will be more efficient (I will attach some benchmark
results later) when being used in associative containers.

ProfileFuncRef guarantees the same function name in string form or in
MD5 form has the same hash value, which also fix a few issue in IPO
passes where function matching/lookup only check for function name
string, while returns a no-match if the profile is MD5.

When testing on an internal large profile (> 1 GB, with more than 10
million functions), the full profile load time is reduced from 28 sec to
25 sec in average, and reading function offset table from 0.78s to 0.7s
2023-10-17 21:09:39 +00:00
Fangrui Song
111fcb0df0 [llvm] Fix duplicate word typos. NFC
Those fixes were taken from https://reviews.llvm.org/D137338
2023-09-01 18:25:16 -07:00
wlei
f14a5ff635 [CSSPGO] Refactoring findIRAnchors
Address feedback in https://reviews.llvm.org/D158817. Since `extractProbe` can be used for both calliste and BB probe, we can leverage this to unify the callsite handling code.

Reviewed By: hoy, wenlei

Differential Revision: https://reviews.llvm.org/D159169
2023-08-31 16:25:47 -07:00
Jie Fu
3b51881dd5 [CSSPGO] Silence -Wunused-but-set-variable warning without asserts (NFC)
/data/home/jiefu/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:2189:8: error: variable 'IsFuncHashMismatch' set but not used [-Werror,-Wunused-but-set-variable]
  bool IsFuncHashMismatch = false;
       ^
1 error generated.
2023-08-31 09:58:29 +08:00
wlei
4bb6bbb9bf [CSSPGO] Skip reporting staleness metrics for imported functions
Accumulating the staleness metrics from per-link is less accurate than doing it from post-link time(assuming we use the offline profile mismatch as baseline), the reason is that there are some duplicated reports for the same functions, for example, one template function could be included in multiple TUs, but in post thin link time, only one function are kept(linkonce_odr) and others are marked as available-externally function. Hence, this change skips reporting the metrics for imported functions(available-externally).

I saw the post-link number is now very close to the offline number(dump the mismatched functions and count the metrics offline based on the entire profile), sightly smaller than offline number due to some missing inlined functions.

Reviewed By: hoy, wenlei

Differential Revision: https://reviews.llvm.org/D156725
2023-08-30 18:00:23 -07:00
wlei
3365cd4544 [CSSPGO] Compute checksum mismatch recursively on nested profile
Follow-up diff for https://reviews.llvm.org/D158891. Compute the checksum mismatch based on the original nested profile. Additionally, use a recursive way to compute the children mismatched samples in the nested tree even the top-level func checksum is matched.

Reviewed By: hoy, wenlei

Differential Revision: https://reviews.llvm.org/D158900
2023-08-30 18:00:23 -07:00
wlei
62a3f6c96e [CSSPGO] Retire FlattenProfileForMatching
- Always use flattened profile to find the profile anchors. Since profile under different contexts may have different inlined callsites, to get more profile anchors, we use a merged profile from all the contexts(the flattened profile) to find callsite anchors.

- Compute the staleness metrics based on the original nested profile, as currently once a callsite is mismatched, all its children profile are dropped.(TODO: in future, we can improve to reuse the children valid profile)

Reviewed By: hoy, wenlei

Differential Revision: https://reviews.llvm.org/D158891
2023-08-30 18:00:23 -07:00
wlei
062af2e763 [CSSPGO] Support stale profile matching for LTO
As in per-link time, callsites could be optimized out by inlining, we don't have those original call targets in the IR in LTO time. Additionally, the inlined code doesn't actually belong to the original function, the IR locations or pseudo probe parsed from it are incorrect and could mislead the matching later.

This change adds the support to extract the original IR location info from the inlined code, specifically, it make sure to skip all the inlined code that doesn't belong the original function, but before that, it processes the inline frames of the debug info to extract the base frame and recover its callsite and callee target(name).

Measured on some stale profile instances, all showed some perf improvements.

Reviewed By: hoy, wenlei

Differential Revision: https://reviews.llvm.org/D156722
2023-08-30 18:00:23 -07:00
wlei
148cceb0d6 [CSSPGO] Refactoring SampleProfileMatcher::runOnFunction
- rename `IRLocation` --> `IRAnchors`,  `ProfileLocation` --> `ProfileAnchors`
- reorganize runOnFunction, fact out the finding IR anchors code into `findIRAnchors`
- introduce a new function `findProfileAnchors` to populate the profile related anchors, the result is saved into `ProfileAnchors`, it's later used for both mismatch report and matching, this can avoid to parse the `getBodySamples` and `getCallsiteSamples` for multiple times.
- move the `MatchedCallsiteLocs` stuffs from `findIRAnchors` to `countProfileMismatches` so that all the staleness metrics report are computed in one function.
- move all matching related into `runStaleProfileMatching`, and move all mismatching report into `countProfileMismatches`

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D158817
2023-08-30 18:00:23 -07:00
William Huang
da2855c0ba [SampleProfile] Potential use after move in SampleProfileLoader::promoteMergeNotInlinedContextSamples
SampleProfileLoader::promoteMergeNotInlinedContextSample adds
certain uninlined functions to the sample profile map (unordered_map, which is
previously read from a profile file). This action may cause the map to
be rehashed, invalidating all pointers to FunctionSamples used by many
members of SampleProfileLoader, while the existing code did nothing to
guard against that. This bug is theoretical since adding a few
new functions to a large profile usually won't trigger a rehash, or even
if there's a rehash std::unordered_map tries its best to expand its
capacity in-place.

This bug will trigger if the container type of sample profile map is
changed to llvm::DenseMap or other implementation, such as in D147740,
for SampleProfReader's performance reason.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D157061
2023-08-16 20:32:15 +00:00
wlei
bfefeeb139 [SamplePGO] Fix ICE that callee samples returns null while finding import functions
We found that in a special condition, the input callee `Samples` is null for `findExternalInlineCandidate`, which caused an ICE.

In some rare cases, call instruction could be changed after being pushed into inline candidate queue, this is because earlier inlining may expose constant propagation which can change indirect call to direct call. When this happens, we may fail to find matching function samples for the candidate later(for example if the profile is stale), even if a match was found when the candidate was enqueued.

See this reduced program:

file1.c:
```
 int bar(int x);

 int(*foo())()  {
  return bar;
 };

void func()
 {
     int (*fptr)(int);
     fptr = foo();
     a +=  (*fptr)(10);
 }
```
file2.c:
```
int bar(int x) { return x + 1;}
```

The two CALL: `foo` and `(*ptr)` are pushed into the queue at the beginning, say `foo` is hotter and popped first for inlining. During the inlining of `foo`, it performs the constant propagation for the function pointer `bar` and then changed `(*ptr)` to a direct call `bar(..)`. Note that at this time, `(*ptr)/bar` is still in the queue, later while it's popped out for inlining, it use the a different target name(bar) to look for the callee samples. At the same time, if the profile is stale and the new function is different from the old function in the profile, then this led the return of the null callee sample.

Reviewed By: hoy, wenlei

Differential Revision: https://reviews.llvm.org/D154637
2023-07-07 14:56:21 -07:00
wlei
444d2e1a54 [CSSPGO] Enable stale profile matching by default for CSSPGO
We tested the stale profile matching on several Meta's internal services, all results are positive, for instance, in one service that refreshed its profile every one or two weeks, it consistently gave 1~2% performance improvement. We also observed an instance that a trivial refactoring caused a 2% regression and the matching can successfully recover the whole regression. Therefore, we'd like to turn it on by default for CSSPGO.

Reviewed By: hoy, wenlei

Differential Revision: https://reviews.llvm.org/D154027
2023-06-29 11:18:51 -07:00
Amir Ayupov
b244a4c4c9 [profi][NFC] Get rid of afdo_detail::TypeMap
Parametrize SampleProfileInference and SampleProfileLoaderBaseImpl by function
type (Function/MachineFunction) instead of block type
(BasicBlock/MachineBasicBlock). Move out specializations to appropriate
locations.

This change makes it possible to use GraphTraits instead of a custom TypeMap and
make SampleProfileInference not dependent on LLVM types, paving the way for
generalizing SampleProfileInference interfaces to BOLT IR types
(BinaryFunction/BinaryBasicBlock) in stale profile matching (D144500).

Reviewed By: hoy

Differential Revision: https://reviews.llvm.org/D152187
2023-06-06 13:48:37 -07:00
Hongtao Yu
b7d9322b49 [FS-AFDO] Load pseudo probe profile on MIR
This change enables loading pseudo-probe based profile on MIR. Different from the IR profile loader, callsites are excluded from MIR profile loading since they are not assinged a FS discriminator. Using zero as the discriminator is not accurate and would undo the distribution work done by the IR loader based on pseudo probe distribution factor. We reply on block probes only for FS profile loading.

Some refactoring is done to the IR profile loader so that `getProbeWeight` can be shared by both loaders.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D148584
2023-05-10 11:29:37 -07:00
wlei
ba3cbc7aad fix a use-after-free failure 2023-04-28 15:55:51 -07:00
wlei
892daede72 [SamplePGO] Stale profile matching(part 2)
Part 2 of https://reviews.llvm.org/D147456
Use callee name on IR as an anchor to match the call target/inlinee name in the profile. The advantages of this in particular:
- Different from the traditional way of encoding hash signatures to every block that would affect binary/profile size and build speed, it doesn't require any additional information for this, all the data is already in the IR and profiles.
- Effective for current nested profile layout in which once a callsite is mismatched all the inlinee's profiles are dropped.
**The input of the algorithm:**
- IR locations: the anchor is the callee name of direct callsite.
- Profile locations: the anchor is the call target name for `BodySample`s or inlinee's profile name for `CallsiteSamples`.
The two lists are populated by parsing the IR and profile and both can be generalized as a sequence of locations with an optional anchor.
For example: say location `1.2(foo)` refers to a callsite at `1.2` with callee name `foo` and `1.3` refers to a non-directcall location `1.3`.
```
// The current build source code:
   int main() {
1.     ...
2.     foo();
3.     ...
4      ...
5.     ...
6.     bar();
7.     ...
   }
```
IR locations are populated and simplified as: `[1, 2(foo), 3, 5, 6(bar), 7]`.
```
; The "stale" profile:
main:350:1
 1: 1
 2: 3
 3: 100 foo:100
 4: 2
 7: 2
 8: 200 bar:200
 9: 30
```
Profile locations are populated and simplified as `[1, 2, 3(foo), 4, 7, 8(bar), 9]`
**Matching heuristic:**
- Match all the anchors in lexical order first.
- Match non-anchors evenly between two anchors: Split the non-anchor range, the first half is matched based on the start anchor, the second half is matched based on the end anchor.
So the example above is matched like:
```
   [1,    2(foo), 3,  5,  6(bar), 7]
    |     |       |   |     |     |
   [1, 2, 3(foo), 4,  7,  8(bar), 9]
```
3 -> 4 matching is based on anchor `foo`, 5 -> 7 matching is based on anchor `bar`.
The output mapping of matching is [2->3, 3->4, 5->7, 6->8, 7->9].

For the implementation, the anchors are saved in a map for fast look-up. The result mapping is saved into `IRToProfileLocationMap`(see https://reviews.llvm.org/D147456) and distributed to all FunctionSamples(`distributeIRToProfileLocationMap`)

**Clang-self build benchmark: **
Current build version: clang-10
The profiled version:  clang-9
Results compared to a refresh profile(collected profile on clang-10) and to be fair, we invalidated new functions' profiles(both refresh and stale profile use the same profile list).
1) Regression to using refresh profile with this off : -3.93%
2) Regression to using refresh profile with this on  : -1.1%
So this algorithm can recover ~72% of the regression.
**Internal(Meta) large-scale services.**
we saw one real instance of a 3 week stale profile., it delivered a ~1.8% win.

**Notes or future work:**
- Classic AutoFDO support: the current version only supports pseudo-probe, but I believe it's not hard to extend to classic line-number based AutoFDO since pseudo-probe and line-number are shared the LineLocation structure.
- The fuzzy matching is an open-ended area and there could be more heuristics to try out, but since the current version already recovers a reasonable percentage of regression(with some pseudo probe order change, it can recover close to 90%), I'm submitting the patch for review and we will try more heuristics in future.
- Profile call target name are only available when the call is hit by samples, the missing anchor might mislead the matching, this can be mitigated in llvm-profgen to generate the call target for the zero samples.
- This doesn't handle function name mismatch, we plan to solve it in future.

Reviewed By: hoy, wenlei

Differential Revision: https://reviews.llvm.org/D147545
2023-04-28 13:07:32 -07:00
wlei
a98d6a11ea [SamplePGO] Stale profile matching(part 1)
AutoFDO/CSSPGO often has to deal with stale profiles collected on binaries built from several revisions behind release. It’s likely to get incorrect profile annotations using the stale profile, which results in unstable or low performing binaries. Currently for source location based profile, once a code change causes a profile mismatch, all the locations afterward are mismatched, the affected samples or inlining info are lost. If we can provide a matching framework to reuse parts of the mismatched profile - aka incremental PGO, it will make PGO more stable, also increase the optimization coverage and boost the performance of binary.

This patch is the part 1 of stale profile matching, summary of the implementation:
 - Added a structure for the matching result:`LocToLocMap`, which is a location to location map meaning the location of current build is matched to the location of the previous build(to be used to query the “stale” profile).
 - In order to use the matching results for sample query, we need to pass them to all the location queries. For code cleanliness, we added a new pointer field(`IRToProfileLocationMap`) to `FunctionSamples`.
 - Added a wrapper(`mapIRLocToProfileLoc`) for the query to the location, the location from input IR will be remapped to the matched profile location.
 - Added a new switch `--salvage-stale-profile`.
 - Some refactoring for the staleness detection.

Test case is in part 2 with the matching algorithm.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D147456
2023-04-28 13:07:32 -07:00
Bjorn Pettersson
a20f7efbc5 Remove several no longer needed includes. NFCI
Mostly removing includes of InitializePasses.h and Pass.h in
passes that no longer has support for the legacy PM.
2023-04-17 13:54:19 +02:00
wlei
339b8a0019 [AutoFDO] Use flattened profiles for profile staleness metrics
For profile staleness report, before it only counts for the top-level function samples in the nested profile, the samples in the inlinees are ignored. This could affect the quality of the metrics when there are heavily inlined functions. This change adds a feature to flatten the nested profile and we're changing to use flatten profile as the input for stale profile detection and matching.
Example for profile flattening:

```
Original profile:
_Z3bazi:20301:1000
 1: 1000
 3: 2000
 5: inline1:1600
   1: 600
   3: inline2:500
     1: 500

Flattened profile:
_Z3bazi:18701:1000
 1: 1000
 3: 2000
 5: 600 inline1:600
inline1:1100:600
 1: 600
 3: 500 inline2: 500
inline2:500:500
 1: 500
```
This feature could be useful for offline analysis, like understanding the hotness of each individual function. So I'm adding the support to `llvm-profdata merge` under `--gen-flattened-profile`.

Reviewed By: hoy, wenlei

Differential Revision: https://reviews.llvm.org/D146452
2023-03-30 11:05:10 -07:00
Arthur Eubanks
fa6ea7a419 [AlwaysInliner] Make legacy pass like the new pass
The legacy pass is only used in AMDGPU codegen, which doesn't care about running it in call graph order (it actually has to work around that fact).

Make the legacy pass a module pass and share code with the new pass.

This allows us to remove the legacy inliner infrastructure.

Reviewed By: mtrofin

Differential Revision: https://reviews.llvm.org/D146446
2023-03-21 11:04:22 -07:00
Arthur Eubanks
eecb8c5f06 [SampleProfile] Use LazyCallGraph instead of CallGraph
The function order in some tests had to be changed because they relied on ordering of functions returned in an SCC which is consistent but unspecified.
2023-03-20 13:43:54 -07:00
Fangrui Song
1e6921131a Move global namespace cl::opt inside llvm:: 2023-02-14 00:09:44 -08:00
Steven Wu
516e301752 [NFC][Profile] Access profile through VirtualFileSystem
Make the access to profile data going through virtual file system so the
inputs can be remapped. In the context of the caching, it can make sure
we capture the inputs and provided an immutable input as profile data.

Reviewed By: akyrtzi, benlangmuir

Differential Revision: https://reviews.llvm.org/D139052
2023-02-01 09:25:02 -08:00
Fangrui Song
21c4dc7997 std::optional::value => operator*/operator->
value() has undesired exception checking semantics and calls
__throw_bad_optional_access in libc++. Moreover, the API is unavailable without
_LIBCPP_NO_EXCEPTIONS on older Mach-O platforms (see
_LIBCPP_AVAILABILITY_BAD_OPTIONAL_ACCESS).

This fixes clang.
2022-12-17 00:42:05 +00:00
wlei
97e2aeab71 [AutoFDO] Use getHeadSamplesEstimate instead of getTotalSamples to compute profile callsite staleness
Fix two issues for profile staleness report.

1) It should be more accurate to use the sum of all entry count(`getHeadSamplesEstimate`) for the callsite samples than the total samples, since even the top-level callsite is mismatched, it does affect the inlining but it can still be merged into base profile and used later.

2) I accidentally missed to persist the num of mismatched callsite into binary.

Also added the asm testing to test the decoding of the section.

Reviewed By: hoy, wenlei

Differential Revision: https://reviews.llvm.org/D140063
2022-12-15 11:21:18 -08:00
Fangrui Song
75801e3b45 Transforms/IPO: llvm::Optional => std::optional 2022-12-05 07:07:19 +00:00
Fangrui Song
89fae41ef1 [IR] llvm::Optional => std::optional
Many llvm/IR/* files have been migrated by other contributors.
This migrates most remaining files.
2022-12-05 04:13:11 +00:00
Vasileios Porpodas
8a1ccb8ae0 [NFC] Removed call to getInstList() from range loops on BBs.
Differential Revision: https://reviews.llvm.org/D138605
2022-11-29 17:33:10 -08:00
wlei
47b0758049 [SampleFDO] Persist profile staleness metrics into binary
With https://reviews.llvm.org/D136627, now we have the metrics for profile staleness based on profile statistics, monitoring the profile staleness in real-time can help user quickly identify performance issues. For a production scenario, the build is usually incremental and if we want the real-time metrics, we should store/cache all the old object's metrics somewhere and pull them in a post-build time. To make it more convenient, this patch add an option to persist them into the object binary, the metrics can be reported right away by decoding the binary rather than polling the previous stdout/stderrs from a cache system.

For implementation, it writes the statistics first into a new metadata section(llvm.stats) then encode into a special ELF `.llvm_stats` section. The section data is formatted as a list of key/value pair so that future statistics can be easily extended. This is also under a new switch(`-persist-profile-staleness`)

In terms of size overhead, the metrics are computed at module level, so the size overhead should be small, measured on one of our internal service, it costs less than < 1MB for a 10GB+ binary.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D136698
2022-11-09 22:34:33 -08:00
wlei
63c27c5c75 Use getCanonicalFnName for callee name 2022-10-27 13:14:47 -07:00
wlei
d6a0585dd1 [SampleFDO] Compute and report profile staleness metrics
When a profile is stale and profile mismatch could happen, the mismatched samples are discarded, so we'd like to compute the mismatch metrics to quantify how stale the profile is, which will suggest user to refresh the profile if the number is high.

Two sets of metrics are introduced here:

 - (Num_of_mismatched_funchash/Total_profiled_funchash), (Samples_of_mismached_func_hash / Samples_of_profiled_function) : Here it leverages the FunctionSamples's checksums attribute which is a feature of pseudo probe. When the source code CFG changes, the function checksums will be different, later sample loader will discard the whole functions' samples, this metrics can show the percentage of samples are discarded due to this.
 -  (Num_of_mismatched_callsite/Total_profiled_callsite), (Samples_of_mismached_callsite / Samples_of_profiled_callsite) : This shows how many mismatching for the callsite location as callsite location mismatch will affect the inlining which is highly correlated with the performance. It goes through all the callsite location in the IR and profile, use the call target name to match, report the num of samples in the profile that doesn't match a IR callsite.

This is implemented in a new class(SampleProfileMatcher) and under a switch("--report-profile-staleness"), we plan to extend it with a fuzzy profile matching feature in the future.

Reviewed By: hoy, wenlei, davidxl

Differential Revision: https://reviews.llvm.org/D136627
2022-10-26 21:06:52 -07:00
Kazu Hirata
00874c48ea [IPO] Reorder parameters of InlineFunction (NFC)
With the recent addition of new parameter MergeAttributes (D134117),
callers need to specify several default parameters before getting to
specify the new parameter.

This patch reorders the parameters so that callers do not have to
specify as many default parameters.

Differential Revision: https://reviews.llvm.org/D134125
2022-09-20 09:09:38 -07:00
Kazu Hirata
284f0397e2 [Transforms] Merge function attributes within InlineFunction (NFC)
In the past, we've had a bug resulting in a compiler crash after
forgetting to merge function attributes (D105729).

This patch teaches InlineFunction to merge function attributes.  This
way, we minimize the "time" when the IR is valid, but the function
attributes are not.

Differential Revision: https://reviews.llvm.org/D134117
2022-09-17 23:10:23 -07:00
Kazu Hirata
56ea4f9bd3 [Transforms] Qualify auto in range-based for loops (NFC)
Identified with readability-qualified-auto.
2022-08-27 21:21:02 -07:00
Paul Kirth
3155e3070c [llvm][misexpect] Re-enable MisExpect for SampleProfiling
MisExpect was occasionally crashing under SampleProfiling, due to a division by zero.
We worked around that in D124302 by changing the assert to an early return.
This patch is intended to add a test case for the crashing scenario and
re-enable MisExpect for SampleProfiling.

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D124481
2022-08-26 20:24:10 +00:00
Kazu Hirata
50724716cd [Transforms] Qualify auto in range-based for loops (NFC)
Identified with readability-qualified-auto.
2022-08-14 12:51:58 -07:00
Arthur Eubanks
a3ac1cfaed [SampleProfile] Fix non-determinism in promoteMergeNotInlinedContextSamples()
We're seeing non-determinism with loading sample profiles. It seems to
be related to the order in which we merge FunctionSamples in
promoteMergeNotInlinedContextSamples(). Use a MapVector to iterate over
NonInlinedCallSites in the order entries were inserted.

Reviewed By: wenlei, davidxl

Differential Revision: https://reviews.llvm.org/D131592
2022-08-12 10:13:25 -07:00
Mircea Trofin
7b81a81d5f [NFC] FunctionSamples::getEntrySamples -> getHeadSamplesEstimate
The name `getEntrySamples` was misleading for 2 reasons. One, it's
close in name to `Function::getEntryCount`, but the equivalent here is
`getHeadSamples`; second, as opposed to the other get* APIs in
`FunctionSamples`, it performs an estimate/heuristic rather than just
retrieving raw data (or a non-heuristic derivate off that data, like
`getMaxCountInside`)

The new name should more clearly communicate its intent; and, being
close (in name) to `getHeadSamples`, it should allow the reader discover
the relation between them.

Also updated the doc comments for both `getHeadSamples[Estimate]` so a
reader may better understand the relation between them.

Differential Revision: https://reviews.llvm.org/D130281
2022-07-22 09:17:59 -07:00
Fangrui Song
dd5e3f0e27 [LegacyPM] Remove SampleProfileLoaderLegacyPass
Following recent changes removing non-core features of the legacy
PM/optimization pipeline (e.g. PGO), remove SamplePGO.
2022-07-17 12:09:46 -07:00
Kazu Hirata
611ffcf4e4 [llvm] Use value instead of getValue (NFC) 2022-07-13 23:11:56 -07:00
wlei
7e86b13c63 [CSSPGO][llvm-profgen] Reimplement SampleContextTracker using context trie
This is the followup patch to https://reviews.llvm.org/D125246 for the `SampleContextTracker` part. Before the promotion and merging of the context is based on the SampleContext(the array of frame), this causes a lot of cost to the memory. This patch detaches the tracker from using the array ref instead to use the context trie itself. This can save a lot of memory usage and benefit both the compiler's CS inliner and llvm-profgen's pre-inliner.

One structure needs to be specially treated is the `FuncToCtxtProfiles`, this is used to get all the functionSamples for one function to do the merging and promoting. Before it search each functions' context and traverse the trie to get the node of the context. Now we don't have the context inside the profile, instead we directly use an auxiliary map `ProfileToNodeMap` for profile , it initialize to create the FunctionSamples to TrieNode relations and keep updating it during promoting and merging the node.

Moreover, I was expecting the results before and after remain the same, but I found that the order of FuncToCtxtProfiles matter and affect the results. This can happen on recursive context case, but the difference should be small. Now we don't have the context, so I just used a vector for the order, the result is still deterministic.

Measured on one huge size(12GB) profile from one of our internal service. The profile similarity difference is 99.999%, and the running time is improved by 3X(debug mode) and the memory is reduced from 170GB to 90GB.

Reviewed By: hoy, wenlei

Differential Revision: https://reviews.llvm.org/D127031
2022-06-27 23:22:21 -07:00
Kazu Hirata
3b7c3a654c Revert "Don't use Optional::hasValue (NFC)"
This reverts commit aa8feeefd3ac6c78ee8f67bf033976fc7d68bc6d.
2022-06-25 11:56:50 -07:00
Kazu Hirata
aa8feeefd3 Don't use Optional::hasValue (NFC) 2022-06-25 11:55:57 -07:00
Mingming Liu
e0d069598b [Inline] Annotate inline pass name with link phase information for analysis.
The annotation is flag gated; flag is turned off by default.

Differential Revision: https://reviews.llvm.org/D125495
2022-06-24 10:06:43 -07:00
Mingming Liu
bc856eb3fc [SampleProfile][Inline] Annotate sample profile inline remarks with link phase (prelink/postlink) information.
Differential Revision: https://reviews.llvm.org/D126833
2022-06-22 17:00:53 -07:00
Kazu Hirata
e0e687a615 [llvm] Don't use Optional::hasValue (NFC) 2022-06-20 10:38:12 -07:00
Fangrui Song
d86a206f06 Remove unneeded cl::ZeroOrMore for cl::opt/cl::list options 2022-06-05 00:31:44 -07:00
Fangrui Song
557efc9a8b [llvm] Remove unneeded cl::ZeroOrMore for cl::opt options. NFC
Some cl::ZeroOrMore were added to avoid the `may only occur zero or one times!`
error. More were added due to cargo cult. Since the error has been removed,
cl::ZeroOrMore is unneeded.

Also remove cl::init(false) while touching the lines.
2022-06-03 21:59:05 -07:00