343 Commits

Author SHA1 Message Date
wlei
bfefeeb139 [SamplePGO] Fix ICE that callee samples returns null while finding import functions
We found that in a special condition, the input callee `Samples` is null for `findExternalInlineCandidate`, which caused an ICE.

In some rare cases, call instruction could be changed after being pushed into inline candidate queue, this is because earlier inlining may expose constant propagation which can change indirect call to direct call. When this happens, we may fail to find matching function samples for the candidate later(for example if the profile is stale), even if a match was found when the candidate was enqueued.

See this reduced program:

file1.c:
```
 int bar(int x);

 int(*foo())()  {
  return bar;
 };

void func()
 {
     int (*fptr)(int);
     fptr = foo();
     a +=  (*fptr)(10);
 }
```
file2.c:
```
int bar(int x) { return x + 1;}
```

The two CALL: `foo` and `(*ptr)` are pushed into the queue at the beginning, say `foo` is hotter and popped first for inlining. During the inlining of `foo`, it performs the constant propagation for the function pointer `bar` and then changed `(*ptr)` to a direct call `bar(..)`. Note that at this time, `(*ptr)/bar` is still in the queue, later while it's popped out for inlining, it use the a different target name(bar) to look for the callee samples. At the same time, if the profile is stale and the new function is different from the old function in the profile, then this led the return of the null callee sample.

Reviewed By: hoy, wenlei

Differential Revision: https://reviews.llvm.org/D154637
2023-07-07 14:56:21 -07:00
Fangrui Song
2cb8d5ca3a [Pseudo Probe] Do not place functions in nodeduplicate COMDATs
For a function not in an IR COMDAT, currently we place it into a nodeduplicate IR
COMDAT so that its text section and its associated .pseudo_probe section will be
in the same section group, which can be retained or discarded by the linker as a
unit. However, the section group wastes space.

After D153189 uses SHF_LINK_ORDER to ensure a .pseudo_probe section will be
discarded when its associated text section is discarded, we can remove the
nodeduplicate IR change.

In the following example, the .pseudo_probe associated with .text.f is discarded as expected.
```
clang -c -ffunction-sections -fpseudo-probe-for-profiling -xc =(printf 'void _start(){} void f(){}') -o a.o
ld.lld --gc-sections --print-gc-sections a.o
```

Reviewed By: hoy

Differential Revision: https://reviews.llvm.org/D153191
2023-06-17 15:40:20 -07:00
Fangrui Song
62d8614223 [Pseudo Probe] Make .pseudo_probe GC-able
* Add the SHF_LINK_ORDER flag so that the .pseudo_probe section is discarded when the associated text section is discarded.
* Add unique ID so that with `clang -ffunction-sections -fno-unique-section-names`, there is one separate .pseudo_probe for each text section (disambiguated by `.section ....,unique,id` in assembly)

The changes allow .pseudo_probe GC even if we don't place instrumented functions
in an IR comdat (see `getOrCreateFunctionComdat` in SampleProfileProbe.cpp).

Reviewed By: hoy

Differential Revision: https://reviews.llvm.org/D153189
2023-06-16 23:46:36 -07:00
Arthur Eubanks
3e39cfe5b4 Revert "Revert "InstSimplify: Require instruction be parented""
This reverts commit 0c03f48480f69b854f86d31235425b5cb71ac921.

Going to fix forward size regression instead due to more dependent patches needing to be reverted otherwise.
2023-06-16 13:53:31 -07:00
Arthur Eubanks
0c03f48480 Revert "InstSimplify: Require instruction be parented"
This reverts commit 1536e299e63d7788f38117b0212ca50eb76d7a3b.

Causes large binary size regressions, see comments on https://reviews.llvm.org/rG1536e299e63d7788f38117b0212ca50eb76d7a3b.
2023-06-16 11:24:29 -07:00
Alan Zhao
d6b4f6786b Revert "Revert "InstSimplify: Require instruction be parented""
This reverts commit 00264eac4d0938ae8a0826da38e4777be269124c.

Reason: caused a bunch of bots to break
2023-06-16 10:58:54 -07:00
Alan Zhao
00264eac4d Revert "InstSimplify: Require instruction be parented"
This reverts commit 1536e299e63d7788f38117b0212ca50eb76d7a3b.

Reason: causes a regression in the inliner (see https://crbug.com/1454531 and https://reviews.llvm.org/rG1536e299e63d7788f38117b0212ca50eb76d7a3b#1217141)
2023-06-16 10:36:49 -07:00
Matt Arsenault
1536e299e6 InstSimplify: Require instruction be parented
Unlike every other analysis and transform, simplifyInstruction
permitted operating on instructions which are not inserted
into a function. This created an edge case no other code needs
to really worry about, and limited transforms in cases that
can make use of the context function. Only the inliner and a handful
of other utilities were making use of this, so just fix up these
edge cases. Results in some IR ordering differences since
cloned blocks are inserted eagerly now. Plus some additional
simplifications trigger (e.g. some add 0s now folded out that
previously didn't).
2023-06-02 18:14:28 -04:00
Hongtao Yu
23da210624 [PseudoProbe] Do not force the calliste debug loc to inlined probes from __nodebug__ functions.
For pseudo probes we would like to keep their original dwarf discriminator (either a zero or null) until the first FS-discriminator pass. The inliner is a violation of that, given that it assigns inlinee instructions with no debug info with the that of the callsite. This is being disabled in this patch.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D151568
2023-05-26 13:00:16 -07:00
Shengchen Kan
c81a121f3f Revert "Revert "[X86] Remove patterns for ADC/SBB with immediate 8 and optimize during MC lowering, NFCI""
This reverts commit cb16b33a03aff70b2499c3452f2f817f3f92d20d.

In fact, the test https://bugs.chromium.org/p/chromium/issues/detail?id=1446973#c2
already passed after 5586bc539acb26cb94e461438de01a5080513401
2023-05-19 22:21:56 +08:00
Hans Wennborg
cb16b33a03 Revert "[X86] Remove patterns for ADC/SBB with immediate 8 and optimize during MC lowering, NFCI"
This caused compiler assertions, see comment on
https://reviews.llvm.org/D150107.

This also reverts the dependent follow-up change:

> [X86] Remove patterns for ADD/AND/OR/SUB/XOR/CMP with immediate 8 and optimize during MC lowering, NFCI
>
> This is follow-up of D150107.
>
> In addition, the function `X86::optimizeToFixedRegisterOrShortImmediateForm` can be
> shared with project bolt and eliminates the code in X86InstrRelaxTables.cpp.
>
> Differential Revision: https://reviews.llvm.org/D150949

This reverts commit 2ef8ae134828876ab3ebda4a81bb2df7b095d030 and
5586bc539acb26cb94e461438de01a5080513401.
2023-05-19 14:43:33 +02:00
Shengchen Kan
5586bc539a [X86] Remove patterns for ADD/AND/OR/SUB/XOR/CMP with immediate 8 and optimize during MC lowering, NFCI
This is follow-up of D150107.

In addition, the function `X86::optimizeToFixedRegisterOrShortImmediateForm` can be
shared with project bolt and eliminates the code in X86InstrRelaxTables.cpp.

Differential Revision: https://reviews.llvm.org/D150949
2023-05-19 18:22:30 +08:00
Hongtao Yu
b7d9322b49 [FS-AFDO] Load pseudo probe profile on MIR
This change enables loading pseudo-probe based profile on MIR. Different from the IR profile loader, callsites are excluded from MIR profile loading since they are not assinged a FS discriminator. Using zero as the discriminator is not accurate and would undo the distribution work done by the IR loader based on pseudo probe distribution factor. We reply on block probes only for FS profile loading.

Some refactoring is done to the IR profile loader so that `getProbeWeight` can be shared by both loaders.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D148584
2023-05-10 11:29:37 -07:00
Hongtao Yu
9272d0f079 [PseudoProbe] Clean up dwarf discriminator and avoid duplicating factor.
A pseudo probe is created with dwarf line information shared with its nearest instruction. If the instruction comes with a dwarf discriminator, it will be shared with the probe as well. This can confuse the later FS-AFDO discriminator assignment pass. To fix this, I'm cleaning up the discriminator fields for probes when they are inserted.

I also notice another possibility to change the discriminator field of pseudo probes in the pipeline before the FS discriminator assignment pass. That is the loop unroller, which assigns duplication factor to instruction being vectorized. I'm disabling that for pseudo probe intrinsics specifically, also for callsites with probes.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D148569
2023-05-10 11:26:23 -07:00
William Huang
d38d6ca179 [llvm-profdata] Deprecate Compact Binary Sample Profile Format
Remove support for compact binary sample profile format

Reviewed By: davidxl, wenlei

Differential Revision: https://reviews.llvm.org/D149400
2023-05-01 17:10:08 +00:00
wlei
892daede72 [SamplePGO] Stale profile matching(part 2)
Part 2 of https://reviews.llvm.org/D147456
Use callee name on IR as an anchor to match the call target/inlinee name in the profile. The advantages of this in particular:
- Different from the traditional way of encoding hash signatures to every block that would affect binary/profile size and build speed, it doesn't require any additional information for this, all the data is already in the IR and profiles.
- Effective for current nested profile layout in which once a callsite is mismatched all the inlinee's profiles are dropped.
**The input of the algorithm:**
- IR locations: the anchor is the callee name of direct callsite.
- Profile locations: the anchor is the call target name for `BodySample`s or inlinee's profile name for `CallsiteSamples`.
The two lists are populated by parsing the IR and profile and both can be generalized as a sequence of locations with an optional anchor.
For example: say location `1.2(foo)` refers to a callsite at `1.2` with callee name `foo` and `1.3` refers to a non-directcall location `1.3`.
```
// The current build source code:
   int main() {
1.     ...
2.     foo();
3.     ...
4      ...
5.     ...
6.     bar();
7.     ...
   }
```
IR locations are populated and simplified as: `[1, 2(foo), 3, 5, 6(bar), 7]`.
```
; The "stale" profile:
main:350:1
 1: 1
 2: 3
 3: 100 foo:100
 4: 2
 7: 2
 8: 200 bar:200
 9: 30
```
Profile locations are populated and simplified as `[1, 2, 3(foo), 4, 7, 8(bar), 9]`
**Matching heuristic:**
- Match all the anchors in lexical order first.
- Match non-anchors evenly between two anchors: Split the non-anchor range, the first half is matched based on the start anchor, the second half is matched based on the end anchor.
So the example above is matched like:
```
   [1,    2(foo), 3,  5,  6(bar), 7]
    |     |       |   |     |     |
   [1, 2, 3(foo), 4,  7,  8(bar), 9]
```
3 -> 4 matching is based on anchor `foo`, 5 -> 7 matching is based on anchor `bar`.
The output mapping of matching is [2->3, 3->4, 5->7, 6->8, 7->9].

For the implementation, the anchors are saved in a map for fast look-up. The result mapping is saved into `IRToProfileLocationMap`(see https://reviews.llvm.org/D147456) and distributed to all FunctionSamples(`distributeIRToProfileLocationMap`)

**Clang-self build benchmark: **
Current build version: clang-10
The profiled version:  clang-9
Results compared to a refresh profile(collected profile on clang-10) and to be fair, we invalidated new functions' profiles(both refresh and stale profile use the same profile list).
1) Regression to using refresh profile with this off : -3.93%
2) Regression to using refresh profile with this on  : -1.1%
So this algorithm can recover ~72% of the regression.
**Internal(Meta) large-scale services.**
we saw one real instance of a 3 week stale profile., it delivered a ~1.8% win.

**Notes or future work:**
- Classic AutoFDO support: the current version only supports pseudo-probe, but I believe it's not hard to extend to classic line-number based AutoFDO since pseudo-probe and line-number are shared the LineLocation structure.
- The fuzzy matching is an open-ended area and there could be more heuristics to try out, but since the current version already recovers a reasonable percentage of regression(with some pseudo probe order change, it can recover close to 90%), I'm submitting the patch for review and we will try more heuristics in future.
- Profile call target name are only available when the call is hit by samples, the missing anchor might mislead the matching, this can be mitigated in llvm-profgen to generate the call target for the zero samples.
- This doesn't handle function name mismatch, we plan to solve it in future.

Reviewed By: hoy, wenlei

Differential Revision: https://reviews.llvm.org/D147545
2023-04-28 13:07:32 -07:00
wlei
339b8a0019 [AutoFDO] Use flattened profiles for profile staleness metrics
For profile staleness report, before it only counts for the top-level function samples in the nested profile, the samples in the inlinees are ignored. This could affect the quality of the metrics when there are heavily inlined functions. This change adds a feature to flatten the nested profile and we're changing to use flatten profile as the input for stale profile detection and matching.
Example for profile flattening:

```
Original profile:
_Z3bazi:20301:1000
 1: 1000
 3: 2000
 5: inline1:1600
   1: 600
   3: inline2:500
     1: 500

Flattened profile:
_Z3bazi:18701:1000
 1: 1000
 3: 2000
 5: 600 inline1:600
inline1:1100:600
 1: 600
 3: 500 inline2: 500
inline2:500:500
 1: 500
```
This feature could be useful for offline analysis, like understanding the hotness of each individual function. So I'm adding the support to `llvm-profdata merge` under `--gen-flattened-profile`.

Reviewed By: hoy, wenlei

Differential Revision: https://reviews.llvm.org/D146452
2023-03-30 11:05:10 -07:00
wlei
8ab9eebb18 [Pseudo Probe] Add the test for probe desc
Added a test to https://reviews.llvm.org/D146657, make sure the guid and name are computed using the debug info name.

Reviewed By: hoy, wenlei

Differential Revision: https://reviews.llvm.org/D146826
2023-03-24 12:43:16 -07:00
Arthur Eubanks
eecb8c5f06 [SampleProfile] Use LazyCallGraph instead of CallGraph
The function order in some tests had to be changed because they relied on ordering of functions returned in an SCC which is consistent but unspecified.
2023-03-20 13:43:54 -07:00
Yuanfang Chen
9aae408d55 [NFC] fix typo funciton -> function
credits to @jmagee
2023-03-10 18:05:25 -08:00
Hongtao Yu
c38c8d6743 [PseudoProbe] Refactoring a test
As titled.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D144137
2023-02-15 14:07:51 -08:00
Hongtao Yu
eddec9de44 [Pseudo probe] Duplicate probes in vectorized loop body.
Prevoius pseudo probes were dropped out of a vectorized loop body during loop vectorization. This can result in the samples of the loop entry is used for the loop body, which in turn can cause undercounting of the loop iteration count. The undercounting can further prevent the loop from being vectorized in the next build. I'm fixing this by explicting allowing pseudo probes to be kept in the vectorized loop body, and by claiming a probe instruction is not "uniform", the vectorizer will duplicate it by the number of vector lanes.

For one internal service, I'm seeing the change causes the size increase of the .pseudoprobe section by 0.7%, which should count around 0.2% of the whole binary size.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D144066
2023-02-15 10:18:08 -08:00
Hongtao Yu
950487bddf [Pseudo Probe] Do not instrument EH blocks.
This change avoids inserting probes to EH blocks. Pseudo probe can prevent block merging when probes in the blocks look different. This has a chained effect to passes incurring exponential IR growth (such as jump threading) and as a consequence the compilation may time out.  Not inserting probes to EH blocks could mitigate the issue. Another benefit is that both IR size and binary size are smaller. Since EH blocks are usually cold, the change should have minimal impact to profile quality.

Testing:

Out of two internal large benchmarks, no perf impact seen. 1% size savings to both the `text` and the `pseudo_probe` section.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D142747
2023-01-30 13:26:56 -08:00
spupyrev
45b155924e [BOLT] using jump weights in profi
We want to use profile inference (profi) in BOLT for stale profile matching.
This is the second change for existing usages of profi (e.g., CSSPGO):

(i) Added the ability to provide (estimated) jump weights for the algorithm. The
goal of the algorithm is to create a valid control flow for a given function
(that is, one in which incoming counts equal outgoing counts for every basic
block while minimally modifying the original input block and jump weights). The
input jump weights will be provided based on collected LBR profiles in BOLT.

(ii) Added the corresponding options to ProfiParams.

(iii) Slightly modified / simplified the construction of the flow network in profi
so as it utilizes fewer auxiliary nodes. This is done by introducing parallel
edges to the network (which is supported by MMF) and reduces the size of the
network from 3*|V| to 2*|V|, where |V| is the number of basic blocks in the
function.

**Inference (profile quality) impact:**
The diff is supposed to be a no-op for the inferred counts. However, our
implementation of MCF is not fully deterministic and might return different
results depending on the input network model. Since we changed the model
construction, there are a few differences in comparison to the original
implementation. I checked manually on an internal benchmark and see a minor
difference (+/- 1 count for certain basic blocks) in just a dozen of instances
(out of 10000+ input functions). Hence, the diff is highly unlikely to have an
impact for existing prod workloads.

**Runtime impact:**
I measure up to 10% speedup for block-only (ie CSSPGO/AutoFDO) inference and up
to 50% speedup for block+jump inference (ie BOLT) in comparison to the original
unoptimized version.

Reviewed By: hoy

Differential Revision: https://reviews.llvm.org/D139870
2023-01-11 14:34:43 -08:00
spupyrev
61eb12e1f4 [BOLT] introducing profi params
We want to use profile inference (**profi**) in BOLT for stale profile matching.
To this end, I am making a few changes modifying the interface of the algorithm.
This is the first change for existing usages of profi (e.g., CSSPGO):
- introducing an object holding the algorithmic parameters;
- some renaming of existing options;
- dropped unused option, SampleProfileInferEntryCount, as we don't plan to change its default value;
- no changes in the output / tests.

Reviewed By: hoy

Differential Revision: https://reviews.llvm.org/D134756
2023-01-09 12:03:28 -08:00
Nikita Popov
25450788a4 [SampleProfile] Avoid branch on undef UB in tests (NFC) 2023-01-03 14:23:25 +01:00
Nikita Popov
bf5f05e3fe [SampleProfile] Regenerate test checks (NFC) 2022-12-22 16:24:03 +01:00
wlei
97e2aeab71 [AutoFDO] Use getHeadSamplesEstimate instead of getTotalSamples to compute profile callsite staleness
Fix two issues for profile staleness report.

1) It should be more accurate to use the sum of all entry count(`getHeadSamplesEstimate`) for the callsite samples than the total samples, since even the top-level callsite is mismatched, it does affect the inlining but it can still be merged into base profile and used later.

2) I accidentally missed to persist the num of mismatched callsite into binary.

Also added the asm testing to test the decoding of the section.

Reviewed By: hoy, wenlei

Differential Revision: https://reviews.llvm.org/D140063
2022-12-15 11:21:18 -08:00
Bjorn Pettersson
3528e63d89 [test] Remove duplicate RUN lines in Transform tests 2022-12-08 11:47:16 +01:00
Roman Lebedev
b2c2d49edc
[NFC] Port all SampleProfile tests to -passes= syntax 2022-12-08 02:38:50 +03:00
Bjorn Pettersson
a11faeed44 [test] Switch to use -passes syntax in various test cases 2022-12-01 21:25:59 +01:00
Arthur Eubanks
4b3202e639 [opt] Remove "new-pm" from some cl::opt names 2022-11-28 11:00:45 -08:00
Matt Arsenault
0d2271bb44 SampleProfile: Convert tests to opaque pointers
syntax.ll required removing some diffs that apparently looked like
pointers in message checking.
2022-11-27 21:27:50 -05:00
Matt Arsenault
5e49649d16 SampleProfile: Don't use anonymous values in test
These interfered with converting the test to opaque pointers.
2022-11-27 09:40:00 -05:00
wlei
18df04c944 Run test only on x86_64-linux to fix a build break 2022-11-09 23:06:15 -08:00
wlei
47b0758049 [SampleFDO] Persist profile staleness metrics into binary
With https://reviews.llvm.org/D136627, now we have the metrics for profile staleness based on profile statistics, monitoring the profile staleness in real-time can help user quickly identify performance issues. For a production scenario, the build is usually incremental and if we want the real-time metrics, we should store/cache all the old object's metrics somewhere and pull them in a post-build time. To make it more convenient, this patch add an option to persist them into the object binary, the metrics can be reported right away by decoding the binary rather than polling the previous stdout/stderrs from a cache system.

For implementation, it writes the statistics first into a new metadata section(llvm.stats) then encode into a special ELF `.llvm_stats` section. The section data is formatted as a list of key/value pair so that future statistics can be easily extended. This is also under a new switch(`-persist-profile-staleness`)

In terms of size overhead, the metrics are computed at module level, so the size overhead should be small, measured on one of our internal service, it costs less than < 1MB for a 10GB+ binary.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D136698
2022-11-09 22:34:33 -08:00
Nikita Popov
304f1d59ca [IR] Switch everything to use memory attribute
This switches everything to use the memory attribute proposed in
https://discourse.llvm.org/t/rfc-unify-memory-effect-attributes/65579.
The old argmemonly, inaccessiblememonly and inaccessiblemem_or_argmemonly
attributes are dropped. The readnone, readonly and writeonly attributes
are restricted to parameters only.

The old attributes are auto-upgraded both in bitcode and IR.
The bitcode upgrade is a policy requirement that has to be retained
indefinitely. The IR upgrade is mainly there so it's not necessary
to update all tests using memory attributes in this patch, which
is already large enough. We could drop that part after migrating
tests, or retain it longer term, to make it easier to import IR
from older LLVM versions.

High-level Function/CallBase APIs like doesNotAccessMemory() or
setDoesNotAccessMemory() are mapped transparently to the memory
attribute. Code that directly manipulates attributes (e.g. via
AttributeList) on the other hand needs to switch to working with
the memory attribute instead.

Differential Revision: https://reviews.llvm.org/D135780
2022-11-04 10:21:38 +01:00
Hongtao Yu
d5a963ab8b [PseudoProbe] Replace relocation with offset for entry probe.
Currently pseudo probe encoding for a function is like:
	- For the first probe, a relocation from it to its physical position in the code body
	- For subsequent probes, an incremental offset from the current probe to the previous probe

The relocation could potentially cause relocation overflow during link time. I'm now replacing it with an offset from the first probe to the function start address.

A source function could be lowered into multiple binary functions due to outlining (e.g, coro-split). Since those binary function have independent link-time layout, to really avoid relocations from .pseudo_probe sections to .text sections, the offset to replace with should really be the offset from the probe's enclosing binary function, rather than from the entry of the source function. This requires some changes to previous section-based emission scheme which now switches to be function-based. The assembly form of pseudo probe directive is also changed correspondingly, i.e, reflecting the binary function name.

Most of the source functions end up with only one binary function. For those don't, a sentinel probe is emitted for each of the binary functions with a different name from the source. The sentinel probe indicates the binary function name to differentiate subsequent probes from the ones from a different binary function. For examples, given source function

```
Foo() {
  …
  Probe 1
  …
  Probe 2
}
```

If it is transformed into two binary functions:

```
Foo:
   …

Foo.outlined:
   …
```

The encoding for the two binary functions will be separate:

```

GUID of Foo
  Probe 1

GUID of Foo
  Sentinel probe of Foo.outlined
  Probe 2
```

Then probe1 will be decoded against binary `Foo`'s address, and Probe 2 will be decoded against `Foo.outlined`. The sentinel probe of `Foo.outlined` makes sure there's not accidental relocation from `Foo.outlined`'s probes to `Foo`'s entry address.

On the BOLT side, to be minimal intrusive, the pseudo probe re-encoding sticks with the old encoding format. This is fine since unlike linker, Bolt processes the pseudo probe section as a whole and it is free from relocation overflow issues.

The change is downwards compatible as long as there's no mixed use of the old encoding and the new encoding.

Reviewed By: wenlei, maksfb

Differential Revision: https://reviews.llvm.org/D135912
Differential Revision: https://reviews.llvm.org/D135914
Differential Revision: https://reviews.llvm.org/D136394
2022-10-27 13:28:22 -07:00
wlei
d6a0585dd1 [SampleFDO] Compute and report profile staleness metrics
When a profile is stale and profile mismatch could happen, the mismatched samples are discarded, so we'd like to compute the mismatch metrics to quantify how stale the profile is, which will suggest user to refresh the profile if the number is high.

Two sets of metrics are introduced here:

 - (Num_of_mismatched_funchash/Total_profiled_funchash), (Samples_of_mismached_func_hash / Samples_of_profiled_function) : Here it leverages the FunctionSamples's checksums attribute which is a feature of pseudo probe. When the source code CFG changes, the function checksums will be different, later sample loader will discard the whole functions' samples, this metrics can show the percentage of samples are discarded due to this.
 -  (Num_of_mismatched_callsite/Total_profiled_callsite), (Samples_of_mismached_callsite / Samples_of_profiled_callsite) : This shows how many mismatching for the callsite location as callsite location mismatch will affect the inlining which is highly correlated with the performance. It goes through all the callsite location in the IR and profile, use the call target name to match, report the num of samples in the profile that doesn't match a IR callsite.

This is implemented in a new class(SampleProfileMatcher) and under a switch("--report-profile-staleness"), we plan to extend it with a fuzzy profile matching feature in the future.

Reviewed By: hoy, wenlei, davidxl

Differential Revision: https://reviews.llvm.org/D136627
2022-10-26 21:06:52 -07:00
Arthur Eubanks
c384b20b55 [opt] Remove temporary legacy pass name translations
And update corresponding tests.
2022-10-07 11:09:46 -07:00
Paul Kirth
3155e3070c [llvm][misexpect] Re-enable MisExpect for SampleProfiling
MisExpect was occasionally crashing under SampleProfiling, due to a division by zero.
We worked around that in D124302 by changing the assert to an early return.
This patch is intended to add a test case for the crashing scenario and
re-enable MisExpect for SampleProfiling.

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D124481
2022-08-26 20:24:10 +00:00
Fangrui Song
0271ae65a6 [test] Change test/SampleProfile to use opaque pointers 2022-07-17 17:38:35 -07:00
Fangrui Song
5250e7a0d8 [test] Change -sample-profile tests to -passes=
so that we can remove SampleProfileLoaderLegacyPass.
2022-07-17 12:00:41 -07:00
Fangrui Song
6f32e71b54 [test] Remove duplicate -sample-profile tests
When -passes=sample-profile is tested, -sample-profile is redundant.
2022-07-17 00:52:30 -07:00
Mingming Liu
bc856eb3fc [SampleProfile][Inline] Annotate sample profile inline remarks with link phase (prelink/postlink) information.
Differential Revision: https://reviews.llvm.org/D126833
2022-06-22 17:00:53 -07:00
Dávid Bolvanský
f02a0a69af [NFCI] Fixed missing colon in CHECK directives 2022-04-03 11:52:38 +02:00
Hongtao Yu
7a316c0a1f [CSSPGO] Turn on profi and ext-tsp when using probe-based profile.
Probe-based profile leads to a better performance when combined with profi and ext-tsp block layout. I'm turning them on by default.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D122442
2022-03-25 09:09:21 -07:00
Johannes Doerfert
a81fff8afd Reapply "[Intrinsics] Add nocallback to the default intrinsic attributes"
This reverts commit c5f789050daab25aad6770790987e2b7c0395936 and
reapplies 7aea3ea8c3b33c9bb338d5d6c0e4832be1d09ac3 with additional test
changes.
2022-03-25 09:36:50 -05:00
minglotus-6
e2074de6a8 [ProfSampleLoader] When disable-sample-loader-inlining is true, merge profiles of inlined instances to outlining versions.
When --disable-sample-loader-inlining is true, skip inline transformation, but merge profiles of inlined instances to outlining versions.

Differential Revision: https://reviews.llvm.org/D121862
2022-03-23 13:02:48 -07:00
Arthur Eubanks
d051c566cd [test] Remove the last couple uses of -analyze in llvm/test 2022-03-23 11:31:12 -07:00