90 Commits

Author SHA1 Message Date
Matthias Braun
48232594a0
llvm-profgen: Options cleanup / fixes (#147632)
- Add `cl::cat(ProfGenCategory)` to non-hidden options so they show up
  in `--help` output.
- Introduce `Options.h` for options referenced in multiple files.
2025-08-18 21:42:55 +00:00
Matthias Braun
43df97a909
llvm-profgen: Avoid "using namespace" in headers (#147631)
Avoid global `using namespace` directives in headers as they are bad
style.
2025-08-18 18:55:23 +00:00
Lei Wang
6c3c90b5a8
[CSSPGO]Add a flag to limit unsymbolized context depth (#121531)
Add a new flag (`--csprof-max-unsymbolized-context-depth`) that limits only
the unsymbolized context depth. Currently, `--csprof-max-context-depth`
applies to both symbolized and unsymbolized profile contexts, and there are
scenarios where it may not be flexible enough. For example, to limit the
context but still keep all the inlinings from the leaf frame, we could set
csprof-max-unsymbolized-context-depth >= 1.
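
A minimal sketch of why the two limits differ. It assumes that trimming keeps the leaf-most frames and that one unsymbolized leaf address can expand into several inlined frames; `trim_context`, `symbolize`, and the sample data are hypothetical and not taken from llvm-profgen.

```python
def trim_context(frames, max_depth):
    """Keep the max_depth leaf-most frames; a negative depth means no limit."""
    if max_depth < 0 or max_depth >= len(frames):
        return list(frames)
    return list(frames[len(frames) - max_depth:])


def symbolize(addresses, inline_frames):
    """Expand each address into its (hypothetical) list of inlined frames."""
    symbolized = []
    for address in addresses:
        symbolized.extend(inline_frames[address])
    return symbolized


# Hypothetical data: caller-to-leaf addresses and their inline expansions.
unsymbolized = [0x401000, 0x402000, 0x403000]
inline_frames = {
    0x401000: ["main"],
    0x402000: ["foo"],
    0x403000: ["bar", "baz", "leaf"],  # leaf address with two inlined callers
}

# Limiting the symbolized context to depth 1 drops the inlined callers.
symbolized_limit = trim_context(symbolize(unsymbolized, inline_frames), 1)
# Limiting the unsymbolized context to depth 1 keeps the leaf's inline chain.
unsymbolized_limit = symbolize(trim_context(unsymbolized, 1), inline_frames)

print(symbolized_limit)    # ['leaf']
print(unsymbolized_limit)  # ['bar', 'baz', 'leaf']
```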
2025-01-07 10:29:52 -08:00
Tim Creech
23609a383c
[llvm-profgen] Revert #99826 and #99026 (#100147)
Revert #99826 and #99026 to allow for additional input.
2024-08-02 21:16:48 +08:00
Tim Creech
01d783643a
[llvm-profgen] Add --sample-period to estimate absolute counts (#99826)
Without `--sample-period`, no assumptions are made about perf profile
sample frequencies. This is useful for comparing relative hotness of
different program locations within the same profile.

With `--sample-period`, LBR- and IP-based profile hit counts are
adjusted to estimate the absolute total event count for each program
location. This makes it reasonable to compare hit counts between
different profiles, e.g., between two LBR-based execution frequency
profiles with different sampling periods or between LBR-based execution
frequency profiles and IP-based branch mispredict profiles.
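
As a rough illustration of the adjustment, the sketch below assumes the estimate is simply the hit count multiplied by the sampling period. The function name, the location keys, and the numbers are hypothetical; the actual scaling in llvm-profgen may differ.

```python
def estimate_event_counts(hit_counts, sample_period):
    """Scale sampled hit counts by the sampling period (assumed linear scaling)."""
    return {location: hits * sample_period for location, hits in hit_counts.items()}


# Hypothetical profiles of the same location collected with different periods.
profile_a = estimate_event_counts({"foo:3": 40}, sample_period=1000003)
profile_b = estimate_event_counts({"foo:3": 400}, sample_period=100003)

# The raw hit counts differ by 10x, but the estimated totals are comparable.
print(profile_a["foo:3"], profile_b["foo:3"])
```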

This functionality is in support of HWPGO[^1], which aims to enable
feedback from a wider range of hardware events.

[^1]:
https://llvm.org/devmtg/2024-04/slides/TechnicalTalks/Xiao-EnablingHW-BasedPGO.pdf
2024-07-22 17:22:36 +08:00
Tim Creech
0caf0c93e7
[llvm-profgen] Support creating profiles of arbitrary events (#99026)
This change introduces two options which may be used to create profiles
of arbitrary PMU events.

1. `--leading-ip-only` provides a simple sample-IP-based profile mode.
This is not useful for building a profile of execution frequency, but it
is useful for building new types of profiles.

   For example, to build a profile of unpredictable branches:

       perf record -b -e branch-misses:upp -o perf.data ...
       llvm-profgen --perfdata perf.data --leading-ip-only ...

2. `--perf-event=event` enables the creation of a profile concerned with
a specific event or set of events. The names given should match the
"event" field as emitted by perf-script(1).

This option has two spellings: `--perf-event` and `--perf-events`. The
plural spelling accepts a comma-separated list. The singular spelling
appends a single event name to the set of events which will be used.
This is meant to accommodate event names containing commas.
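
The difference between the two spellings can be sketched as follows. This is an illustration of the described behavior only: `collect_events` is a hypothetical helper, and the third event name is made up to show a name that contains a comma.

```python
def collect_events(options):
    """Build the event list from (spelling, value) pairs in command-line order."""
    events = []
    for spelling, value in options:
        if spelling == "--perf-events":
            # Plural spelling: the value is a comma-separated list of names.
            names = [name for name in value.split(",") if name]
        elif spelling == "--perf-event":
            # Singular spelling: one name, commas stay part of the name.
            names = [value]
        else:
            raise ValueError(f"unexpected option: {spelling}")
        for name in names:
            if name not in events:
                events.append(name)
    return events


selected = collect_events([
    ("--perf-events", "br_inst_retired.near_taken:upp,br_misp_retired.all_branches:upp"),
    ("--perf-event", "cpu/event=0xc4,umask=0x20/upp"),  # hypothetical name with a comma
])
print(selected)
```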

Combined, these options allow generating multiple kinds of profiles from
a single `perf record` collection. For example, to generate both
execution frequency and branch mispredict profiles:

    perf record -c 1000003 -b -e br_inst_retired.near_taken:upp,br_misp_retired.all_branches:upp ...
    llvm-profgen --output execution.prof --perf-event=br_inst_retired.near_taken:upp ...
    llvm-profgen --leading-ip-only --output unpredictable.prof --perf-event=br_misp_retired.all_branches:upp ...

These additions are in support of more general HWPGO[^1], allowing
feedback from a wider range of hardware events.

[^1]:
https://llvm.org/devmtg/2024-04/slides/TechnicalTalks/Xiao-EnablingHW-BasedPGO.pdf

---------

Co-authored-by: Tim Creech <tcreech@tcreech.com>
2024-07-21 15:04:27 +08:00
Jay Foad
d4a0154902
[llvm-project] Fix typo "seperate" (#95373) 2024-06-13 20:20:27 +01:00
xur-llvm
2fa6eaf93b
[llvm-profgen] Add support for Linux kernel profile (#92831)
Add the support to handle Linux kernel perf files. The functionality is
under option -kernel. Note that currently only main kernel (in vmlinux)
is handled: kernel modules are not handled.

---------

Co-authored-by: Han Shen <shenhan@google.com>
2024-06-13 10:21:46 -07:00
Haohai Wen
8f5a2325c3
[llvm-profgen] Trim tail CR+LF for LBR record line (#93210)
On Windows, the perf script generated by SEP ends each LBR record line
with CR+LF. When llvm-profgen runs on Linux, the '\r' is treated as an
LBR record and triggers a warning.
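
A small sketch of the symptom and the fix, assuming records are separated by spaces and the line ends with a space followed by CR+LF. `split_lbr_records` is a hypothetical helper, not the llvm-profgen parser.

```python
def split_lbr_records(line):
    """Split an LBR line on spaces after trimming trailing CR and LF characters."""
    trimmed = line.rstrip("\r\n")
    return [token for token in trimmed.split(" ") if token]


windows_line = "0x4006c8/0xffffffff/P/-/-/0  0x40069b/0x400670/M/-/-/0 \r\n"

# Stripping only the newline leaves "\r" behind as a bogus third record.
untrimmed = [token for token in windows_line.rstrip("\n").split(" ") if token]
trimmed = split_lbr_records(windows_line)

print(len(untrimmed), len(trimmed))
```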
2024-05-24 09:03:53 +08:00
Haohai Wen
3f7f446d38
[llvm-profgen] Remove temporary perf script files (#86668)
The temporary perf script files converted from perf data can occupy a
lot of space for large projects. This patch removes them when
llvm-profgen exits normally or receives a signal.
2024-04-11 15:28:32 +08:00
Haohai Wen
8c03f400a8
[llvm-profgen] Support COFF binary (#83972)
Intel VTune/SEP supports collecting LBR on Windows and generating a
perf-script file in the same format as Linux perf script. This patch
teaches llvm-profgen to disassemble COFF binaries so that we can do
sampling-based PGO on Windows.
2024-03-15 09:02:26 +08:00
Matthias Braun
8466ab98ca
llvm-profgen: Fix race condition (#83489)
Fix race condition when multiple instances of `llvm-profgen` read from
the same inputs.
2024-02-29 14:53:11 -08:00
Kazu Hirata
586ecdf205
[llvm] Use StringRef::{starts,ends}_with (NFC) (#74956)
This patch replaces uses of StringRef::{starts,ends}with with
StringRef::{starts,ends}_with for consistency with
std::{string,string_view}::{starts,ends}_with in C++20.

I'm planning to deprecate and eventually remove
StringRef::{starts,ends}with.
2023-12-11 21:01:36 -08:00
Hongtao Yu
345fd0c10e [FS-AFDO] Generate pseudo-probe-based profiles with FS-discriminators.
This change enables generating pseudo-probe-based FS-AFDO profiles. The change is straightforward: it builds on the previous change {D147651} by injecting FS-discriminators into the various profile generation spots.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D147957
2023-05-10 11:28:54 -07:00
Fangrui Song
da2f5d0a41 [tools] llvm::Optional => std::optional 2022-12-14 08:01:04 +00:00
Kazu Hirata
b4482f7ca0 [tools] Use std::nullopt instead of None (NFC)
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated.  The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.

This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-12-02 21:11:40 -08:00
Matt Arsenault
e748db0f7f Support: Convert Program APIs to std::optional 2022-12-01 17:00:44 -05:00
wlei
467652486f [llvm-profgen] Fix inconsistent loading address issues
This fixes two issues related to the loading address:

1) When multiple MMAPs occur and their loading addresses differ, previously only the first MMAP was used as the base address, so all perf addresses after it used the wrong base address.

2) For a pseudo probe profile, the address is always based on the preferred loading address. If the base address is not equal to the preferred loading address, the pseudo probe address query will be wrong.

Solution: Instead of converting addresses to offsets lazily, all addresses are now converted at parsing time, based on the preferred loading address. There is no "offset" used in the profile generator any more.
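
The conversion can be sketched as rebasing each runtime address from the MMAP base in effect onto the preferred loading address. The helper name and all addresses below are hypothetical.

```python
def to_preferred_address(runtime_address, mmap_base, preferred_base):
    """Rebase a runtime address onto the binary's preferred loading address."""
    return runtime_address - mmap_base + preferred_base


PREFERRED_BASE = 0x400000

# Hypothetical samples as (MMAP base in effect, sampled runtime address).
# The binary was mapped at two different bases during the recording.
samples = [
    (0x7F0000000000, 0x7F00000006C8),
    (0x560000000000, 0x5600000006C8),
]

canonical = [
    to_preferred_address(address, base, PREFERRED_BASE) for base, address in samples
]
print([hex(address) for address in canonical])
```

Both samples land on the same canonical address, which is what lets later stages ignore the per-process base.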

Reviewed By: hoy, wenlei

Differential Revision: https://reviews.llvm.org/D126827
2022-10-13 23:19:30 -07:00
Kazu Hirata
89f1433225 Use llvm::lower_bound (NFC) 2022-09-03 11:17:37 -07:00
wlei
1b212d1098 [llvm-profgen] Fix perf script parsing issues
Fix two perf script parsing issues:

1) Redirect the error message to a new file. (Error messages mixed into the perf script could corrupt the MMAP event line and cause a parsing failure.)

2) Change the MMAP parsing error message to a warning, since the perf script can still be parsed using the preferred address as the base address.

Reviewed By: hoy, wenlei

Differential Revision: https://reviews.llvm.org/D131449
2022-08-08 15:51:07 -07:00
Corentin Jabot
b62e3a73e1 Replace to_hexString by utohexstr [NFC]
LLVM had two methods to convert a number to a hexadecimal string;
this removes one of them.

Differential Revision: https://reviews.llvm.org/D127958
2022-06-16 17:29:50 +02:00
Fangrui Song
d86a206f06 Remove unneeded cl::ZeroOrMore for cl::opt/cl::list options 2022-06-05 00:31:44 -07:00
Fangrui Song
557efc9a8b [llvm] Remove unneeded cl::ZeroOrMore for cl::opt options. NFC
Some cl::ZeroOrMore were added to avoid the `may only occur zero or one times!`
error. More were added due to cargo cult. Since the error has been removed,
cl::ZeroOrMore is unneeded.

Also remove cl::init(false) while touching the lines.
2022-06-03 21:59:05 -07:00
Hongtao Yu
9f732af583 [llvm-profgen] Filter out oversized LBR ranges.
As a follow up to {D123271}, LBR ranges that are too big should also be considered as invalid.

For example, the last two pairs in the following trace form a range [0x0d7b02b0, 0x368ba706] that covers a ton of functions in the binary. Such an oversized range should also be ignored.

   0x0c74505f/0x368b99a0 **0x368ba706**/0x0c745040  0x0d7b1c3f/**0x0d7b02b0**

Add a defensive check to filter out those ranges, based on the rule that a valid range should not cross an unconditional branch (call, return, unconditional jmp).
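
One way to picture the check: given a sorted list of unconditional branch addresses from disassembly, a range is rejected when any of them lies inside it. The helper and the branch addresses below are hypothetical; only the range comes from the trace above.

```python
from bisect import bisect_left


def crosses_unconditional_branch(start, end, unconditional_branches):
    """Return True if the sorted address list has an entry in [start, end)."""
    index = bisect_left(unconditional_branches, start)
    return index < len(unconditional_branches) and unconditional_branches[index] < end


# Hypothetical sorted addresses of calls, returns, and unconditional jumps.
branch_addresses = [0x0D7B0400, 0x20000000]

oversized = crosses_unconditional_branch(0x0D7B02B0, 0x368BA706, branch_addresses)
plausible = crosses_unconditional_branch(0x0D7B02B0, 0x0D7B0300, branch_addresses)
print(oversized, plausible)
```

The end address is excluded because it is itself the source of the next recorded branch.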

Reviewed By: hoy, wenlei

Differential Revision: https://reviews.llvm.org/D125448
2022-05-12 10:58:50 -07:00
Hongtao Yu
e36786d15f [CSSPGO] Rename ProfileIsCSNested and ProfileIsCSFlat
To be more clear and definitive, I'm renaming `ProfileIsCSFlat` back to `ProfileIsCS` which stands for full context-sensitive flat profiles.  `ProfileIsCSNested` is now renamed to `ProfileIsPreInlined` and is extended to be applicable for CS flat profiles too. More specifically, `ProfileIsPreInlined` is for any kind of profiles (flat or nested) that contain 'ShouldBeInlined' contexts. The flag is encoded in the profile summary section for extbinary profiles and is computed on-the-fly for text profiles.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D122602
2022-04-29 17:03:52 -07:00
wlei
bfcb2c1119 [llvm-profgen] Decouple artificial branch from LBR parser and fix external address related issues
This patch fixes two issues for both CS and non-CS.
1) For external-call-internal, the head samples of the internal function should be recorded.
2) Avoid ignoring LBRs after meeting the interrupt branch for CS profiles.

The LBR parser is shared between CS and non-CS, and we found that dealing with artificial branches inside the LBR parser is error-prone. Since artificial branches are mainly used for CS profile unwinding, this patch simplifies the LBR parser by decoupling the artificial branch code from it: the concept of an artificial branch is removed and split into two transitional branches (internal-to-external, external-to-internal). All processing of external branches is then left to the unwinder.

Specifically for the unwinder, recall that we introduced the external frame in https://reviews.llvm.org/D115550. We can treat an external address as a regular address and reuse the current unwind functions (unwindCall, unwindReturn). In the normal case, the external frame will match an external LBR, and it will be filtered out by `unwindLinear` without losing any context.

The data also shows that the interrupt or standalone LBR pattern (the unpaired case) does exist. We handle it by clearing the call stack and continuing to unwind. Here we leverage the checking in `unwindLinear`: a standalone LBR, whatever its type, has no other part to pair with, so it will eventually cause a wrong linear range such as [external, internal] or [internal, external]. The state is then set to invalid there.

Reviewed By: hoy, wenlei

Differential Revision: https://reviews.llvm.org/D118177
2022-04-28 16:07:28 -07:00
Wenlei He
17f6cba30d [llvm-profgen] Add process filter for perf reader
For profile generation, we need to filter raw perf samples for the binary of interest. Sometimes the binary name alone isn't enough, as binaries of the same name can be running on the system. This change adds a process ID filter to allow users to further disambiguate the input raw samples.

Differential Revision: https://reviews.llvm.org/D123869
2022-04-18 09:50:16 -07:00
Hongtao Yu
8a0406dcc8 [llvm-profgen] Filter out invalid LBR ranges.
The profiler can sometimes give us a LBR trace that implicates bogus code ranges. For example,

    0xc5acb56/0xc66c6c0 0xc628195/0xf31fbb0 0xc611261/0xc628130 0xc5c1a21/0xc6111c0 0x1f7edfd3/0xc5c3a50 0xc5c154f/0x1f7edec0 0xe8eed07/0xc5c11e0

Note that the first two pairs are supposed to form a linear execution range. In this case it is [0xf31fbb0, 0xc5acb56], which doesn't make sense because the start is greater than the end.

Such bogus ranges should be ruled out to avoid generating a bad profile. I'm fixing this for both CS and non-CS cases.
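
A sketch of how such ranges can be derived and checked, assuming the trace lists the newest branch first and that each adjacent pair forms a range from the older branch's target to the newer branch's source. Only the start-greater-than-end check is shown; the helper names are hypothetical and llvm-profgen may apply further checks.

```python
def linear_ranges(trace):
    """Derive (start, end) ranges from a newest-first 'source/target' LBR trace."""
    pairs = []
    for token in trace.split():
        source, target = token.split("/")[:2]
        pairs.append((int(source, 16), int(target, 16)))
    # Execution runs linearly from the older branch's target to the newer source.
    return [(older[1], newer[0]) for newer, older in zip(pairs, pairs[1:])]


def is_bogus_range(linear_range):
    """A range whose start lies after its end cannot be a linear execution."""
    start, end = linear_range
    return start > end


trace = (
    "0xc5acb56/0xc66c6c0 0xc628195/0xf31fbb0 0xc611261/0xc628130 "
    "0xc5c1a21/0xc6111c0 0x1f7edfd3/0xc5c3a50 0xc5c154f/0x1f7edec0 "
    "0xe8eed07/0xc5c11e0"
)
ranges = linear_ranges(trace)
print(hex(ranges[0][0]), hex(ranges[0][1]), is_bogus_range(ranges[0]))
```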

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D123271
2022-04-07 21:42:01 -07:00
Hongtao Yu
3f97016857 [llvm-profgen] Decoding pseudo probe for profiled function only.
Complete pseudo probe decoding can result in large memory usage. In practice only a small portion of the decoded probes is used in profile generation. I'm changing the full decoding mode to decode profiled functions only. We still do a full scan of the .pseudoprobe section due to a missing table of contents, but we don't have to build the in-memory data structure for functions not sampled.

To build the in-memory data structure for profiled functions only, I'm rewriting the previous non-recursive probe decoding logic to be recursive. This is easy to read and maintain.

I also have to change the previous representation of unsymbolized context from probe-based stack to address-based stack since the profiled functions are unknown yet by the time of virtual unwinding. The address-based stack will be converted to probe-based stack after virtual unwinding and on-demand probe decoding.

I'm seeing 20GB of memory saved for one of our large internal services.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D121643
2022-03-23 14:15:11 -07:00
serge-sans-paille
db29f4374d Cleanup include: DebugInfo/Symbolize
Estimation of the impact on preprocessor output
after: 1067349756
before:1067487786

Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D120433
2022-02-24 13:25:11 +01:00
Hongtao Yu
67db31115d [llvm-profgen] Clean up unnecessary memory reservations between phases.
Cleaning up data structures that are not used after a certain point. This further brings down peak memory usage by 15% for a large benchmark.

Before:
   note: Before parsePerfTraces
   note: VM: 40.73 GB   RSS: 39.18 GB
   note: Before parseAndAggregateTrace
   note: VM: 40.73 GB   RSS: 39.18 GB
   note: After parseAndAggregateTrace
   note: VM: 88.93 GB   RSS: 87.97 GB
   note: Before generateUnsymbolizedProfile
   note: VM: 88.95 GB   RSS: 87.99 GB
   note: After generateUnsymbolizedProfile
   note: VM: 93.50 GB   RSS: 92.53 GB
   note: After computeSizeForProfiledFunctions
   note: VM: 101.13 GB   RSS: 99.36 GB
   note: After generateProbeBasedProfile
   note: VM: 215.61 GB   RSS: 210.88 GB
   note: After postProcessProfiles
   note: VM: 237.48 GB   RSS: 212.50 GB

After:
   note: Before parsePerfTraces
   note: VM: 40.73 GB   RSS: 39.18 GB
   note: Before parseAndAggregateTrace
   note: VM: 40.73 GB   RSS: 39.18 GB
   note: After parseAndAggregateTrace
   note: VM: 88.93 GB   RSS: 87.96 GB
   note: Before generateUnsymbolizedProfile
   note: VM: 88.95 GB   RSS: 87.97 GB
   note: After generateUnsymbolizedProfile
   note: VM: 93.50 GB   RSS: 92.51 GB
   note: After computeSizeForProfiledFunctions
   note: VM: 93.50 GB   RSS: 92.53 GB
   note: After generateProbeBasedProfile
   note: VM: 164.87 GB   RSS: 163.55 GB
   note: After postProcessProfiles
   note: VM: 182.28 GB   RSS: 179.43 GB

Reviewed By: wenlei, wlei

Differential Revision: https://reviews.llvm.org/D118677
2022-02-01 16:27:54 -08:00
Hongtao Yu
fec57e5b17 Revert "[llvm-profgen] Clean up unnecessary memory reservations between phases."
This reverts commit 057e784b0962a7c5a17e858932bb6f03c7676c47.
2022-02-01 14:44:48 -08:00
Hongtao Yu
057e784b09 [llvm-profgen] Clean up unnecessary memory reservations between phases.
Cleaning up data structures that are not used after a certain point. This further brings down peak memory usage by 15% for a large benchmark.

Before:
   note: Before parsePerfTraces
   note: VM: 40.73 GB   RSS: 39.18 GB
   note: Before parseAndAggregateTrace
   note: VM: 40.73 GB   RSS: 39.18 GB
   note: After parseAndAggregateTrace
   note: VM: 88.93 GB   RSS: 87.97 GB
   note: Before generateUnsymbolizedProfile
   note: VM: 88.95 GB   RSS: 87.99 GB
   note: After generateUnsymbolizedProfile
   note: VM: 93.50 GB   RSS: 92.53 GB
   note: After computeSizeForProfiledFunctions
   note: VM: 101.13 GB   RSS: 99.36 GB
   note: After generateProbeBasedProfile
   note: VM: 215.61 GB   RSS: 210.88 GB
   note: After postProcessProfiles
   note: VM: 237.48 GB   RSS: 212.50 GB

After:
   note: Before parsePerfTraces
   note: VM: 40.73 GB   RSS: 39.18 GB
   note: Before parseAndAggregateTrace
   note: VM: 40.73 GB   RSS: 39.18 GB
   note: After parseAndAggregateTrace
   note: VM: 88.93 GB   RSS: 87.96 GB
   note: Before generateUnsymbolizedProfile
   note: VM: 88.95 GB   RSS: 87.97 GB
   note: After generateUnsymbolizedProfile
   note: VM: 93.50 GB   RSS: 92.51 GB
   note: After computeSizeForProfiledFunctions
   note: VM: 93.50 GB   RSS: 92.53 GB
   note: After generateProbeBasedProfile
   note: VM: 164.87 GB   RSS: 163.55 GB
   note: After postProcessProfiles
   note: VM: 182.28 GB   RSS: 179.43 GB

Reviewed By: wenlei, wlei

Differential Revision: https://reviews.llvm.org/D118677
2022-02-01 12:48:08 -08:00
wlei
b239b2b0db [llvm-profgen] Fix warning of enumerated and non-enumerated type in conditional expression
Differential Revision: https://reviews.llvm.org/D115842
2021-12-16 19:28:55 -08:00
wlei
0f53df864e [CSSPGO][llvm-profgen] Fix external address issues of perf reader (return to external addr part)
Previously we had an issue with artificial LBRs whose source is a return. Recall that "internal code (A) can return to an external address, then from the external address call new internal code (B), making an artificial branch that looks like a return from A to B, which can confuse the unwinder". We simply ignored the LBRs after this artificial LBR, which can miss some samples. This change fixes this by correctly unwinding them instead of ignoring them.

Some typical scenarios covered by this change:

1) Multiple sequential callbacks happen in an external address, e.g.

```
[ext, call, foo] [foo, return, ext] [ext, call, bar]
```
The unwinder should avoid having foo return from bar. The wrong call stack looks like [foo, bar].

2) The call stack before and after an external call should be correctly unwound.
```
 {call stack1}                                            {call stack2}
 [foo, call, ext]  [ext, call, bar]  [bar, return, ext]  [ext, return, foo ]
```
Call stack 1 should be the same as call stack 2. Neither should be truncated.

3) The call stack should be truncated after a call into external code, since we can't do inlining with external code.

```
 [foo, call, ext]  [ext, call, bar]  [bar, call, baz] [baz, return, bar ] [bar, return, ext]
```
The call stack of code in baz should not include foo.

### Implementation:

We leverage the artificial frame to fix #2 and #3: when we get a return artificial LBR, we push an extra artificial frame onto the stack. When we pop a frame, we check whether the parent is an artificial frame to pop (fix #2). Therefore, a call/return artificial LBR behaves just like a regular LBR and keeps the call stack.

While recording context on the trie, the artificial frame is used as a tag indicating that we should truncate the call stack (fix #3).

To differentiate #1 and #2, we leverage `getCallAddrFromFrameAddr`. Normally the target of the return should be the instruction after a call instruction, and `getCallAddrFromFrameAddr` will return the address of the call instruction. Otherwise, `getCallAddrFromFrameAddr` will return 0, which is the case of #1.

Reviewed By: hoy, wenlei

Differential Revision: https://reviews.llvm.org/D115550
2021-12-14 16:40:54 -08:00
wlei
30c3aba998 [llvm-profgen] Fix to use getUntrackedCallsites outside the loop
The unwinder was hoisted out in https://reviews.llvm.org/D115550, so fix the usage of getUntrackedCallsites.

Reviewed By: hoy, wenlei

Differential Revision: https://reviews.llvm.org/D115760
2021-12-14 16:40:53 -08:00
wlei
3dcb60db9a [CSSPGO][llvm-profgen] Fix external address issues of perf reader (leading external LBR part)
We can have the sampling just hit into the external addresses, in that case, both the top stack frame and the latest LBR target are external addresses. For example:
```
	        ffffffff
 0x4006c8/0xffffffff/P/-/-/0  0x40069b/0x400670/M/-/-/0

 	          ffffffff
	          40067e
0xffffffff/0xffffffff/P/-/-/0  0x4006c8/0xffffffff/P/-/-/0  0x40069b/0x400670/M/-/-/0
```
Previously we ignored the entire sample. However, we found that there can be internal LBRs in the remaining part of the sample, and the range between them is still valid, so we lose some valid LBRs. Those LBRs will be unwound based on an empty (context-less) call stack.

This change fixes it: instead of ignoring the entire sample, we only ignore the leading external addresses.

Note that the first outgoing LBR is useful since there is a valid range between its source and the next LBR's target.
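
One possible reading of this rule, sketched under assumptions: entries are (source, target) pairs listed newest first, leading entries with an external source are dropped, and the first outgoing entry (internal source, external target) is kept. The helper, the predicate, and the text range are hypothetical.

```python
def drop_leading_external(lbr_entries, is_external):
    """Drop leading (newest) LBR entries whose source address is external."""
    index = 0
    while index < len(lbr_entries) and is_external(lbr_entries[index][0]):
        index += 1
    return lbr_entries[index:]


def outside_binary(address):
    """Hypothetical text range of the profiled binary: [0x400000, 0x401000)."""
    return not (0x400000 <= address < 0x401000)


# LBR entries from the second example above, newest first.
lbr_entries = [
    (0xFFFFFFFF, 0xFFFFFFFF),
    (0x4006C8, 0xFFFFFFFF),
    (0x40069B, 0x400670),
]
kept = drop_leading_external(lbr_entries, outside_binary)

# The outgoing LBR still yields a range from the next LBR's target to its source.
valid_range = (kept[1][1], kept[0][0])
print(hex(valid_range[0]), hex(valid_range[1]))
```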

Reviewed By: hoy, wenlei

Differential Revision: https://reviews.llvm.org/D115538
2021-12-14 16:40:53 -08:00
Hongtao Yu
5740bb801a [CSSPGO] Use nested context-sensitive profile.
CSSPGO currently employs a flat profile format for context-sensitive profiles. Such a flat profile allows for precisely manipulating contexts that are either inlined or not inlined. This is a benefit over the nested profile format used by non-CS AutoFDO. A downside of this is the longer build time due to parsing and indexing the full CS contexts.

For a CS flat profile, though only the context profiles relevant to a module are loaded when that module is compiled, the cost to figure out what profiles are relevant is noticeably high when there're many contexts,  since the sample reader will need to scan all context strings anyway. On the contrary, a nested function profile has its related inline subcontexts isolated from other unrelated contexts. Therefore when compiling a set of functions, unrelated contexts will never need to be scanned.

In this change we are exploring using nested profile format for CSSPGO. This is expected to work based on an assumption that with a preinliner-computed profile all contexts are precomputed and expected to be inlined by the compiler. Contexts not expected to be inlined will be cut off and returned to corresponding base profiles (for top-level outlined functions). This naturally forms a nested profile where all nested contexts are expected to be inlined. The compiler will less likely optimize on derived contexts that are not precomputed.

A CS-nested profile will look exactly the same as a regular nested profile, except that each nested profile can come with attributes. With pseudo probes, a nested profile as shown below can also have a CFG checksum.

```

main:1968679:12
 2: 24
 3: 28 _Z5funcAi:18
 3.1: 28 _Z5funcBi:30
 3: _Z5funcAi:1467398
  0: 10
  1: 10 _Z8funcLeafi:11
  3: 24
  1: _Z8funcLeafi:1467299
   0: 6
   1: 6
   3: 287884
   4: 287864 _Z3fibi:315608
   15: 23
   !CFGChecksum: 138828622701
   !Attributes: 2
  !CFGChecksum: 281479271677951
  !Attributes: 2
```

Specific work included in this change:
- A recursive profile converter to convert CS flat profile to nested profile.
- Extend function checksum and attribute metadata to be stored in a nested way for text profile and extbinary profile.
- Unify sample loader inliner path for CS and preinlined nested profile.
- Changes in the sample loader to support probe-based nested profile.

I've seen promising results regarding build time. A nested profile can result in a 20% shorter build time than a CS flat profile while keeping on-par performance. This is with -duplicate-contexts-into-base=1.

Test Plan:

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D115205
2021-12-14 14:40:25 -08:00
wlei
41a681ce09 [FS-AFDO][llvm-profgen] Generate profile with FS-AFDO discriminator
To support generating profiles with FS discriminators, three kinds of changes are made in llvm-profgen:

1) Disassemble the .rodata section to check whether the FS discriminator variable (`__llvm_fs_discriminator__`) exists and set the corresponding flag in the binary.

2) Change the discriminator decoding in `getBaseDiscriminator` and `getDuplicationFactor`.

3) Set `FunctionSamples::ProfileIsFS` to true to enable FS functionality in ProfileData.

Reviewed By: xur, hoy, wenlei

Differential Revision: https://reviews.llvm.org/D113296
2021-11-30 15:57:59 -08:00
Wenlei He
f7976edc1e [llvm-profgen] Add switch to allow use of first loadable segment for calculating offset
Adding `-use-loadable-segment-as-base` to allow use of first loadable segment for calculating offset. By default first executable segment is used for calculating offset. The switch helps compatibility with unsymbolized profile generated from older tools.

Differential Revision: https://reviews.llvm.org/D113727
2021-11-15 19:00:27 -08:00
wlei
aab1810006 [llvm-profgen] Fix bug of setting function entry
Previously we set `isFuncEntry` flag  to true when the funcName from DWARF is equal to the name in symbol table and we use this flag to ignore reporting callsite sample that's from an intra func branch. However, in HHVM, it appears that the symbol table name is inconsistent with the dwarf info func name, it's likely due to `OptimizeGlobalAliases`.

This change is a workaround on the llvm-profgen side to mark only one range as the function entry and add warnings for the remaining inconsistencies.

This also fixed a missing `getCanonicalFnName` for symbol name which caused the mismatching as well.

Reviewed By: hoy, wenlei

Differential Revision: https://reviews.llvm.org/D113492
2021-11-12 12:18:43 -08:00
wlei
dc9f037955 [llvm-profgen] Refactor the code of getHashCode
Refactor to generate the hash code lazily. Tested on a clang self-build, with no observable regression in generation time.
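
Lazy hashing in general can be sketched as computing the value on first use and caching it. `ContextKey` here is a hypothetical stand-in, not the class touched by this commit, and the counter exists only to make the caching visible.

```python
from functools import cached_property


class ContextKey:
    """Hypothetical context key whose hash is computed only when first needed."""

    def __init__(self, frames):
        self.frames = tuple(frames)
        self.hash_computations = 0  # instrumentation for this example only

    @cached_property
    def hash_code(self):
        # Runs once; later accesses read the cached value.
        self.hash_computations += 1
        return hash(self.frames)


key = ContextKey(["main", "foo", "bar"])
computations_before = key.hash_computations
first = key.hash_code
second = key.hash_code
print(computations_before, key.hash_computations, first == second)
```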

Reviewed By: hoy, wenlei

Differential Revision: https://reviews.llvm.org/D113059
2021-11-02 19:56:20 -07:00
wlei
138202a8c3 [llvm-profgen] Warn on invalid range and show warning summary
Two things in this diff:

1) Warn on invalid ranges. There are currently three types of checks; see the detailed messages in the code.

2) In some situations, llvm-profgen gives lots of warnings about truncated stacks, which is noisy. This change adds a switch, `--show-detailed-warning`, so that the detailed warnings can be skipped. Instead, a summary of those warnings shows the percentage of cases with each issue.

Example of a warning summary:
```
warning: 0.05%(1120/2428958) cases with issue: Profile context truncated due to missing probe for call instruction.
warning: 0.00%(2/178637) cases with issue: Range does not belong to any functions, likely from external function.
```
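
The summary lines above follow a simple pattern that can be reproduced as below. `format_warning_summary` is a hypothetical helper written to match the sample output, not the function used in llvm-profgen.

```python
def format_warning_summary(issue_count, total_count, message):
    """Format one summary line as 'warning: P%(issues/total) cases with issue: MSG'."""
    percentage = 100.0 * issue_count / total_count if total_count else 0.0
    return (
        f"warning: {percentage:.2f}%({issue_count}/{total_count}) "
        f"cases with issue: {message}"
    )


truncated = format_warning_summary(
    1120,
    2428958,
    "Profile context truncated due to missing probe for call instruction.",
)
external = format_warning_summary(
    2,
    178637,
    "Range does not belong to any functions, likely from external function.",
)
print(truncated)
print(external)
```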

Reviewed By: hoy

Differential Revision: https://reviews.llvm.org/D111902
2021-11-02 19:55:55 -07:00
wlei
a5f411b7f8 [llvm-profgen] Allow unsymbolized profile as perf input
This change allows an unsymbolized profile as input. The unsymbolized profile is created by `llvm-profgen` with `--skip-symbolization`. It is produced after sample aggregation but before symbolization, so it has a much smaller file size. It can be used for sample merging and trimming, and is also useful for debugging or adding test cases. A switch `--unsymbolized-profile=file-path` is added for this.

Format of unsymbolized profile:
```

   [context stack1]    # If it's a CS profile
      number of entries in RangeCounter
      from_1-to_1:count_1
      from_2-to_2:count_2
      ......
      from_n-to_n:count_n
      number of entries in BranchCounter
      src_1->dst_1:count_1
      src_2->dst_2:count_2
      ......
      src_n->dst_n:count_n
    [context stack2]
      ......
```
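
A minimal reader for the layout above, written as a sketch. It assumes addresses are hexadecimal without a prefix and treats the context line as an opaque string; the sample text and the function name are hypothetical.

```python
def parse_unsymbolized_profile(text):
    """Parse records of [context], range counters, and branch counters."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    records = []
    i = 0
    while i < len(lines):
        context = None
        if lines[i].startswith("["):  # present only for CS profiles
            context = lines[i]
            i += 1
        range_count = int(lines[i])
        i += 1
        ranges = {}
        for line in lines[i:i + range_count]:
            span, count = line.rsplit(":", 1)
            start, end = span.split("-")
            ranges[(int(start, 16), int(end, 16))] = int(count)
        i += range_count
        branch_count = int(lines[i])
        i += 1
        branches = {}
        for line in lines[i:i + branch_count]:
            edge, count = line.rsplit(":", 1)
            source, target = edge.split("->")
            branches[(int(source, 16), int(target, 16))] = int(count)
        i += branch_count
        records.append({"context": context, "ranges": ranges, "branches": branches})
    return records


# Hypothetical input following the layout described above.
sample = """
[main:1 @ foo]
  2
  4006a0-4006c8:10
  400670-40069b:5
  1
  4006c8->400670:5
"""
records = parse_unsymbolized_profile(sample)
print(records[0]["context"], len(records[0]["ranges"]), len(records[0]["branches"]))
```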

Reviewed By: hoy, wenlei

Differential Revision: https://reviews.llvm.org/D111750
2021-10-25 23:58:08 -07:00
Kazu Hirata
4e3eebc6bd [tools, utils] Use StringRef::contains (NFC) 2021-10-22 17:22:13 -07:00
Wenlei He
a316343e19 [llvm-profgen] Allow generating AutoFDO profile from CSSPGO binary
Add `-use-dwarf-correlation` switch to allow llvm-profgen to generate AutoFDO profile for binaries built with CSSPGO (pseudo-probe).

Differential Revision: https://reviews.llvm.org/D111776
2021-10-14 09:11:56 -07:00
wlei
30ca33eab0 [llvm-profgen] Ignore the whole trace with the leading external branch
The first LBR entry can be an external branch, we should ignore the whole trace.

```
     7f7448e889e4 0x7f7448e889e4/0x7f7448e88826/P/-/-/1  0x7f7448e8899f/0x7f7448e889d8/P/-/-/4  ...
```

Reviewed By: wenlei, hoy

Differential Revision: https://reviews.llvm.org/D111749
2021-10-13 16:52:29 -07:00
wlei
ab5d65e685 [llvm-profgen] Ignore stack samples before aggregation
With `ignore-stack-samples`, we can ignore the call stack before sample aggregation, which avoids some redundant computation.

Reviewed By: hoy, wenlei

Differential Revision: https://reviews.llvm.org/D111577
2021-10-13 16:52:29 -07:00
Wenlei He
da4e5fc861 [llvm-profgen] Deduplicate PID when processing perf input
When parsing mmap events to retrieve PIDs, deduplicate them before passing the PID list to perf script. Perf script errors out when there are duplicated PIDs in the input; however, raw perf data may contain duplicated PIDs for a large binary where more than one mmap is needed to load the executable segment.
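
Order-preserving deduplication can be sketched as below. It assumes the PID list is handed on as a comma-separated string; `unique_pids` and the PIDs are hypothetical.

```python
def unique_pids(mmap_pids):
    """Remove duplicate PIDs while keeping first-seen order."""
    return list(dict.fromkeys(mmap_pids))


# Hypothetical PIDs collected from mmap events; a large binary may be mapped
# by several mmap events within the same process.
mmap_pids = [4321, 4321, 5000, 4321]
pid_list = ",".join(str(pid) for pid in unique_pids(mmap_pids))
print(pid_list)
```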

Differential Revision: https://reviews.llvm.org/D111384
2021-10-10 13:30:17 -07:00
Wenlei He
1f0bc617bd [llvm-profgen] Allow perf data as input
This change enables llvm-profgen to take raw perf data as an alternative input format. Sometimes we need to retrieve events for processes with a matching binary. Using perf data as input allows us to retrieve process IDs from mmap events for the matching binary, then filter by process ID during perf script generation.

Differential Revision: https://reviews.llvm.org/D110793
2021-09-29 22:57:35 -07:00