265 Commits

Author SHA1 Message Date
William Huang
31af18bcce [llvm-profdata] Refactoring Sample Profile Reader to increase FDO build speed using MD5 as key to Sample Profile map
This is phase 1 of multiple planned improvements on the sample profile loader.   The major change is to use MD5 hash code ((instead of the function itself) as the key to look up the function offset table and the profiles, which significantly reduce the time it takes to construct the map.

The optimization is based on the fact that many practical sample profiles are using MD5 values for function names to reduce profile size, so we shouldn't need to convert the MD5 to a string and then to a SampleContext and use it as the map's key, because it's extremely slow.

Several changes to note:

(1) For non-CS SampleContext, if it is already MD5 string, the hash value will be its integral value, instead of hashing the MD5 again. In phase 2 this is going to be optimized further using a union to represent MD5 function (without converting it to string) and regular function names.

(2) The SampleProfileMap is a wrapper to *map<uint64_t, FunctionSamples>, while providing interface allowing using SampleContext as key, so that existing code still work. It will check for MD5 collision (unlikely but not too unlikely, since we only takes the lower 64 bits) and handle it to at least guarantee compilation correctness (conflicting old profile is dropped, instead of returning an old profile with inconsistent context). Other code should not try to use MD5 as key to access the map directly, because it will not be able to handle MD5 collision at all. (see exception at (5) )

(3) Any SampleProfileMap::emplace() followed by SampleContext assignment if newly inserted, should be replaced with SampleProfileMap::Create(), which does the same thing.

(4) Previously we ensure an invariant that in SampleProfileMap, the key is equal to the Context of the value, for profile map that is eventually being used for output (as in llvm-profdata/llvm-profgen). Since the key became MD5 hash, only the value keeps the context now, in several places where an intermediate SampleProfileMap is created, each new FunctionSample's context is set immediately after insertion, which is necessary to "remember" the context otherwise irretrievable.

(5) When reading a profile, we cache the MD5 values of all functions, because they are used at least twice (one to index into FuncOffsetTable, the other into SampleProfileMap, more if there are additional sections), in this case the SampleProfileMap is directly accessed with MD5 value so that we don't recalculate it each time (expensive)

Performance impact:
When reading a ~1GB extbinary profile (fixed length MD5, not compressed) with 10 million function names and 2.5 million top level functions (non CS functions, each function has varying nesting level from 0 to 20), this patch improves the function offset table loading time by 20%, and improves full profile read by 5%.

Reviewed By: davidxl, snehasish

Differential Revision: https://reviews.llvm.org/D147740
2023-06-23 21:48:52 +00:00
Hongtao Yu
09742be818 [llvm-profgen] Remove target triple check to allow for more targets
Llvm-profgen internally uses the llvm libraries and the MCDesc interface to do disassembling and symblization and it never checks against target-specific instruction operators. This makes it quite transparent to targets and a first attempt for an aarch64 binary just works. Therefore I'm removing the unnecessary triple check to unblock for new targets.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D153449
2023-06-23 10:16:24 -07:00
Kazu Hirata
a3b9c1533e [tools] Use llvm::is_contained (NFC) 2023-06-19 23:36:14 -07:00
Mark Santaniello
27c37327da Avoid pointless canonicalize when using Dwarf names
CPU profile indicated memcmp was hot due to the two rfind calls in
getCanonicalFnName. If UseSymbolTable is false, we can avoid the cost entirely.

For CSSPGO profiles I've measured ~5% speedup with this change.

Profile similarity before/after matches 100%.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D151441
2023-05-25 08:14:11 -07:00
Hongtao Yu
345fd0c10e [FS-AFDO] Generate pseudo-probe-based profiles with FS-discriminators.
This change enables generating pseudo-probe-based FS-AFDO profiles. The change is straightforward based-on previous change {D147651} by just injecting FS-discriminators into various profile generation spot.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D147957
2023-05-10 11:28:54 -07:00
William Huang
d38d6ca179 [llvm-profdata] Deprecate Compact Binary Sample Profile Format
Remove support for compact binary sample profile format

Reviewed By: davidxl, wenlei

Differential Revision: https://reviews.llvm.org/D149400
2023-05-01 17:10:08 +00:00
wlei
339b8a0019 [AutoFDO] Use flattened profiles for profile staleness metrics
For profile staleness report, before it only counts for the top-level function samples in the nested profile, the samples in the inlinees are ignored. This could affect the quality of the metrics when there are heavily inlined functions. This change adds a feature to flatten the nested profile and we're changing to use flatten profile as the input for stale profile detection and matching.
Example for profile flattening:

```
Original profile:
_Z3bazi:20301:1000
 1: 1000
 3: 2000
 5: inline1:1600
   1: 600
   3: inline2:500
     1: 500

Flattened profile:
_Z3bazi:18701:1000
 1: 1000
 3: 2000
 5: 600 inline1:600
inline1:1100:600
 1: 600
 3: 500 inline2: 500
inline2:500:500
 1: 500
```
This feature could be useful for offline analysis, like understanding the hotness of each individual function. So I'm adding the support to `llvm-profdata merge` under `--gen-flattened-profile`.

Reviewed By: hoy, wenlei

Differential Revision: https://reviews.llvm.org/D146452
2023-03-30 11:05:10 -07:00
Hongtao Yu
5c2ae37bbe [CSSPGO][Preinliner] Trim cold call edges of the profiled call graph for a more stable profile generation.
I've noticed that for some services CSSPGO profile is less stable than non-CS AutoFDO profile from profiling to profiling without source changes. This is manifested by comparing profile similarities. For example in my experiments, AutoFDO profiles are always 99+% similar over same binary but different inputs (very close dynamic traffics) while CSSPGO profile similarity is around 90%.

The main source of the profile stability is the top-down order computed on the profiled call graph in the llvm-profgen CS preinliner. The top-down order is used to guide the CS preinliner to pre-compute an inline decision that is later on fulfilled by the compiler. A subtle change in the top-down order from run to run could cause a different inline decision computed. A deeper look in the diversion of the top-down order revealed that:
	- The topological sorting inside one SCC isn't quite right. This is fixed by {D130717}.
	- The profiled call graphs of the two sides of the A/B run isn't 100% the same. The call edges in the two runs do not subsume each other, and edges appear in both graphs may not have exactly the same weight. This is due to the nature that the graphs are dynamic. However, I saw that the graphs can be made more close by removing the cold edges from them and this bumped up the CSSPGO profile stableness to the same level of the AutoFDO profile.

Removing cold call edges from the dynamic call graph may have an impact on cold inlining, but so far I haven't seen any performance issues since the CS preinliner mainly targets hot callsites, and cold inlining can always be done by the compiler CGSCC inliner.

Also fixing an issue where the largest weight instead of the accumulated weight for a call edge is used in the profiled call graph.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D147013
2023-03-28 16:24:38 -07:00
Fangrui Song
1e6921131a Move global namespace cl::opt inside llvm:: 2023-02-14 00:09:44 -08:00
Hongtao Yu
39eb1c6145 [CSSPGO][Preinliner] Set default value of sample-profile-inline-limit-max to 50000.
The previous threshold 3000 is too small to enable any inlining for giant functions which come in with bigger size than that. In real world, I've seen a big hot function with 34000 dissasembly size. Motivated by that I'm changing the value to 50000.

With the new value the allowance size growth should still be reasonable, as it is also bounded by another threshold, i.e, --sample-profile-inline-growth-limit , which defaults to 12. The new value should mostly only affect giant functions.

I've seen for serveral internal services, the new threshold boosts performance, and it has neutral impact for other services without hot giant functions. So far I haven't seen any performance regression with that.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D143696
2023-02-13 09:17:50 -08:00
Archibald Elliott
62c7f035b4 [NFC][TargetParser] Remove llvm/ADT/Triple.h
I also ran `git clang-format` to get the headers in the right order for
the new location, which has changed the order of other headers in two
files.
2023-02-07 12:39:46 +00:00
Steven Wu
516e301752 [NFC][Profile] Access profile through VirtualFileSystem
Make the access to profile data going through virtual file system so the
inputs can be remapped. In the context of the caching, it can make sure
we capture the inputs and provided an immutable input as profile data.

Reviewed By: akyrtzi, benlangmuir

Differential Revision: https://reviews.llvm.org/D139052
2023-02-01 09:25:02 -08:00
Elena Lepilkina
537cdf92c4 [llvm-objdump][RISCV] Use new common method to parse ARCH RISCV attribute
Differential Revision: https://reviews.llvm.org/D139553
2023-01-16 16:57:55 +03:00
Archibald Elliott
f09cf34d00 [Support] Move TargetParsers to new component
This is a fairly large changeset, but it can be broken into a few
pieces:
- `llvm/Support/*TargetParser*` are all moved from the LLVM Support
  component into a new LLVM Component called "TargetParser". This
  potentially enables using tablegen to maintain this information, as
  is shown in https://reviews.llvm.org/D137517. This cannot currently
  be done, as llvm-tblgen relies on LLVM's Support component.
- This also moves two files from Support which use and depend on
  information in the TargetParser:
  - `llvm/Support/Host.{h,cpp}` which contains functions for inspecting
    the current Host machine for info about it, primarily to support
    getting the host triple, but also for `-mcpu=native` support in e.g.
    Clang. This is fairly tightly intertwined with the information in
    `X86TargetParser.h`, so keeping them in the same component makes
    sense.
  - `llvm/ADT/Triple.h` and `llvm/Support/Triple.cpp`, which contains
    the target triple parser and representation. This is very intertwined
    with the Arm target parser, because the arm architecture version
    appears in canonical triples on arm platforms.
- I moved the relevant unittests to their own directory.

And so, we end up with a single component that has all the information
about the following, which to me seems like a unified component:
- Triples that LLVM Knows about
- Architecture names and CPUs that LLVM knows about
- CPU detection logic for LLVM

Given this, I have also moved `RISCVISAInfo.h` into this component, as
it seems to me to be part of that same set of functionality.

If you get link errors in your components after this patch, you likely
need to add TargetParser into LLVM_LINK_COMPONENTS in CMake.

Differential Revision: https://reviews.llvm.org/D137838
2022-12-20 11:05:50 +00:00
David Blaikie
09e79659bf llvm-profgen: Fix use of stats to be under LLVM_ENABLE_STATS
This caused a -Wunused-variable warning in a without-assert+with-stats
build (because the stats were included but their use was not).

Stat use is meant to be gated by LLVM_ENABLE_STATS which can be set
independently of assertions.
2022-12-18 17:46:01 +00:00
Fangrui Song
21c4dc7997 std::optional::value => operator*/operator->
value() has undesired exception checking semantics and calls
__throw_bad_optional_access in libc++. Moreover, the API is unavailable without
_LIBCPP_NO_EXCEPTIONS on older Mach-O platforms (see
_LIBCPP_AVAILABILITY_BAD_OPTIONAL_ACCESS).

This fixes clang.
2022-12-17 00:42:05 +00:00
Florian Hahn
e2b9cd796b
[llvm-profgen] Fix build failure after 5d7950a403bec25e52.
This fixes a build failure with libc++
(`error: no matching function for call to 'max')`
2022-12-16 17:21:12 +00:00
Hongtao Yu
5d7950a403 [CSSPGO][llvm-profgen] Missing frame inference.
This change introduces a missing frame inferrer aiming at fixing missing frames. It current only handles missing frames due to the compiler tail call elimination (TCE) but could also be extended to supporting other scenarios like frame pointer omission. When a tail called function is sampled, the caller frame will be missing from the call chain because the caller frame is reused for the callee frame. While TCE is beneficial to both perf and reducing stack overflow, a workaround being made in this change aims to find back the missing frames as much as possible.

The idea behind this work is to build a dynamic call graph that consists of only tail call edges constructed from LBR samples and DFS-search for a unique path for a given source frame and target frame on the graph. The unique path will be used to fill in the missing frames between the source and target. Note that only a unique path counts. Multiple paths are treated unreachable since we don't want to overcount for any particular possible path.

A switch --infer-missing-frame is introduced and defaults to be on.

Some testing results:
- 0.4% perf win according to three internal benchmarks.
- About 2/3 of the missing tail call frames can be recovered, according to an internal benchmark.
- 10% more profile generation time.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D139367
2022-12-16 08:44:43 -08:00
Kazu Hirata
6eb0b0a045 Don't include Optional.h
These files no longer use llvm::Optional.
2022-12-14 21:16:22 -08:00
Fangrui Song
da2f5d0a41 [tools] llvm::Optional => std::optional 2022-12-14 08:01:04 +00:00
Fangrui Song
89fab98e88 [DebugInfo] llvm::Optional => std::optional
https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-12-05 00:09:22 +00:00
Kazu Hirata
b4482f7ca0 [tools] Use std::nullopt instead of None (NFC)
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated.  The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.

This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-12-02 21:11:40 -08:00
Matt Arsenault
e748db0f7f Support: Convert Program APIs to std::optional 2022-12-01 17:00:44 -05:00
Kazu Hirata
286223edc6 [llvm-profgen] Use std::optional in ProfiledBinary.cpp (NFC)
This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-11-26 19:01:24 -08:00
Fangrui Song
c2250d8bc0 [CSSPGO] Move cl::opt inside llvm:: after D100528 and D108342 2022-11-23 23:08:49 -08:00
Hongtao Yu
c566777818 [llvm-profgen] Fix a typo in collectProfiledFunctions
As titled. The change should have minimal impact since the targets of branch samples are mostly covered by range samples.

Reviewed By: wenlei, wlei

Differential Revision: https://reviews.llvm.org/D137203
2022-11-01 16:52:39 -07:00
Hongtao Yu
d5a963ab8b [PseudoProbe] Replace relocation with offset for entry probe.
Currently pseudo probe encoding for a function is like:
	- For the first probe, a relocation from it to its physical position in the code body
	- For subsequent probes, an incremental offset from the current probe to the previous probe

The relocation could potentially cause relocation overflow during link time. I'm now replacing it with an offset from the first probe to the function start address.

A source function could be lowered into multiple binary functions due to outlining (e.g, coro-split). Since those binary function have independent link-time layout, to really avoid relocations from .pseudo_probe sections to .text sections, the offset to replace with should really be the offset from the probe's enclosing binary function, rather than from the entry of the source function. This requires some changes to previous section-based emission scheme which now switches to be function-based. The assembly form of pseudo probe directive is also changed correspondingly, i.e, reflecting the binary function name.

Most of the source functions end up with only one binary function. For those don't, a sentinel probe is emitted for each of the binary functions with a different name from the source. The sentinel probe indicates the binary function name to differentiate subsequent probes from the ones from a different binary function. For examples, given source function

```
Foo() {
  …
  Probe 1
  …
  Probe 2
}
```

If it is transformed into two binary functions:

```
Foo:
   …

Foo.outlined:
   …
```

The encoding for the two binary functions will be separate:

```

GUID of Foo
  Probe 1

GUID of Foo
  Sentinel probe of Foo.outlined
  Probe 2
```

Then probe1 will be decoded against binary `Foo`'s address, and Probe 2 will be decoded against `Foo.outlined`. The sentinel probe of `Foo.outlined` makes sure there's not accidental relocation from `Foo.outlined`'s probes to `Foo`'s entry address.

On the BOLT side, to be minimal intrusive, the pseudo probe re-encoding sticks with the old encoding format. This is fine since unlike linker, Bolt processes the pseudo probe section as a whole and it is free from relocation overflow issues.

The change is downwards compatible as long as there's no mixed use of the old encoding and the new encoding.

Reviewed By: wenlei, maksfb

Differential Revision: https://reviews.llvm.org/D135912
Differential Revision: https://reviews.llvm.org/D135914
Differential Revision: https://reviews.llvm.org/D136394
2022-10-27 13:28:22 -07:00
wlei
bf97c1e066 [NFC] fix a wrong name change during rebase 2022-10-25 21:16:29 -07:00
wlei
91cc53d5a4 [llvm-profgen] Do not cache the frame location stack during computing inlined context size
In `computeInlinedContextSizeForRange`, the offset of range is only used one time, there is no need to cache the frame location stack.
Measured on one internal service binary, this can save 2GB memory usage and reduce a small run time (avoid one hash search).

Reviewed By: hoy, wenlei

Differential Revision: https://reviews.llvm.org/D128859
2022-10-25 21:08:36 -07:00
wlei
467652486f [llvm-profgen] Fix inconsistent loading address issues
This is to fix two issues related with loading address:

1) When multiple MMAPs occur and their loading address are different, before it only used the first MMap as base address, all perf address after it used the wrong base address.

2) For pseudo probe profile, the address is always based on preferred loading address. If the base address is not equal to the preferred loading address, the pseudo probe address query will be wrong.

Solution: Instead of converting the address to offset lazily, right now all the address after parsing are converted on the fly based on preferred loading address in the parsing time. There is no "offset" used in profile generator any more.

Reviewed By: hoy, wenlei

Differential Revision: https://reviews.llvm.org/D126827
2022-10-13 23:19:30 -07:00
wlei
c136d8582b [llvm-profgen] Remove CommaSeparated option from perf file cl
There could be comma in one perf file path, since at this point it only support one file as input, there is no need for the `llvm:🆑:MiscFlags::CommaSeparated` option.

Reviewed By: hoy, wenlei

Differential Revision: https://reviews.llvm.org/D134287
2022-09-21 09:46:21 -07:00
Kazu Hirata
89f1433225 Use llvm::lower_bound (NFC) 2022-09-03 11:17:37 -07:00
wlei
1b212d1098 [llvm-profgen] Fix perf script parsing issues
Fix two perf script parsing issues:

1) Redirect the error message to a new file. (the error message mixed in the perfscript could screw up the MMAP event line and cause a parsing failure)

2) Changed the MMap parsing error message to warning since the perfscript can still be parsed using the preferred address as base address.

Reviewed By: hoy, wenlei

Differential Revision: https://reviews.llvm.org/D131449
2022-08-08 15:51:07 -07:00
Kazu Hirata
a2d4501718 [llvm] Fix comment typos (NFC) 2022-08-07 00:16:14 -07:00
Kazu Hirata
b5188591a0 [llvm] Remove redundaunt virtual specifiers (NFC)
Identified with modernize-use-override.
2022-07-24 21:50:35 -07:00
Mircea Trofin
7b81a81d5f [NFC] FunctionSamples::getEntrySamples -> getHeadSamplesEstimate
The name `getEntrySamples` was misleading for 2 reasons. One, it's
close in name to `Function::getEntryCount`, but the equivalent here is
`getHeadSamples`; second, as opposed to the other get* APIs in
`FunctionSamples`, it performs an estimate/heuristic rather than just
retrieving raw data (or a non-heuristic derivate off that data, like
`getMaxCountInside`)

The new name should more clearly communicate its intent; and, being
close (in name) to `getHeadSamples`, it should allow the reader discover
the relation between them.

Also updated the doc comments for both `getHeadSamples[Estimate]` so a
reader may better understand the relation between them.

Differential Revision: https://reviews.llvm.org/D130281
2022-07-22 09:17:59 -07:00
Kazu Hirata
611ffcf4e4 [llvm] Use value instead of getValue (NFC) 2022-07-13 23:11:56 -07:00
wlei
7e86b13c63 [CSSPGO][llvm-profgen] Reimplement SampleContextTracker using context trie
This is the followup patch to https://reviews.llvm.org/D125246 for the `SampleContextTracker` part. Before the promotion and merging of the context is based on the SampleContext(the array of frame), this causes a lot of cost to the memory. This patch detaches the tracker from using the array ref instead to use the context trie itself. This can save a lot of memory usage and benefit both the compiler's CS inliner and llvm-profgen's pre-inliner.

One structure needs to be specially treated is the `FuncToCtxtProfiles`, this is used to get all the functionSamples for one function to do the merging and promoting. Before it search each functions' context and traverse the trie to get the node of the context. Now we don't have the context inside the profile, instead we directly use an auxiliary map `ProfileToNodeMap` for profile , it initialize to create the FunctionSamples to TrieNode relations and keep updating it during promoting and merging the node.

Moreover, I was expecting the results before and after remain the same, but I found that the order of FuncToCtxtProfiles matter and affect the results. This can happen on recursive context case, but the difference should be small. Now we don't have the context, so I just used a vector for the order, the result is still deterministic.

Measured on one huge size(12GB) profile from one of our internal service. The profile similarity difference is 99.999%, and the running time is improved by 3X(debug mode) and the memory is reduced from 170GB to 90GB.

Reviewed By: hoy, wenlei

Differential Revision: https://reviews.llvm.org/D127031
2022-06-27 23:22:21 -07:00
wlei
aa58b7b1e3 [CSSPGO][llvm-profgen] Reimplement computeSummaryAndThreshold using context trie
Follow-up patch to https://reviews.llvm.org/D125246, support `computeSummaryAndThreshold` based on context trie.

Reviewed By: hoy, wenlei

Differential Revision: https://reviews.llvm.org/D127026
2022-06-27 23:22:21 -07:00
wlei
eba5749262 [CSSPGO][llvm-profgen] Reimplement CS profile generator using context trie
Our investigation showed ProfileMap's key is the bottleneck of the memory consumption for CS profile generation on some large services. This patch tries to optimize it by storing the CS function samples using the context trie tree structure instead of the context frame array ref. Parts of code in `ContextTrieNode` are reused.

Our experiment on one internal service showed that the context key's memory can be reduced from 80GB to 300MB.

To be compatible with non-CS profiles, the profile writer still needs to use ProfileMap as input, so rebuild the ProfileMap using the context trie in `postProcessProfiles`.

The optimization is not complete yet, next step is to reimplement Pre-inliner or profile trimmer, after that, ProfileMap should be small to be written.

Reviewed By: hoy, wenlei

Differential Revision: https://reviews.llvm.org/D125246
2022-06-27 23:22:21 -07:00
Kazu Hirata
a7938c74f1 [llvm] Don't use Optional::hasValue (NFC)
This patch replaces Optional::hasValue with the implicit cast to bool
in conditionals only.
2022-06-25 21:42:52 -07:00
Kazu Hirata
3b7c3a654c Revert "Don't use Optional::hasValue (NFC)"
This reverts commit aa8feeefd3ac6c78ee8f67bf033976fc7d68bc6d.
2022-06-25 11:56:50 -07:00
Kazu Hirata
aa8feeefd3 Don't use Optional::hasValue (NFC) 2022-06-25 11:55:57 -07:00
Kazu Hirata
e0e687a615 [llvm] Don't use Optional::hasValue (NFC) 2022-06-20 10:38:12 -07:00
Corentin Jabot
b62e3a73e1 Replace to_hexString by touhexstr [NFC]
LLVM had 2 methods to convert a number to an hexa string,
this remove one of them.

Differential Revision: https://reviews.llvm.org/D127958
2022-06-16 17:29:50 +02:00
Hongtao Yu
7d293744a8 [CSSPGO][Preinliner] Set default value of sample-profile-inline-limit-max to 3000
The default value of sample-profile-inline-limit-max is defined as 10000 in sampleprofile.cpp. This is too big for cspreinliner which works with assembly size instead of IR size. The value 3000 turns out to be a good tradeoff. Compared to the value 10000, 3000 gives as good performance and code size, but lower build time.

Reviewed By: wenlei, wlei

Differential Revision: https://reviews.llvm.org/D127330
2022-06-08 12:10:11 -07:00
Fangrui Song
d86a206f06 Remove unneeded cl::ZeroOrMore for cl::opt/cl::list options 2022-06-05 00:31:44 -07:00
Fangrui Song
36c7d79dc4 Remove unneeded cl::ZeroOrMore for cl::opt options
Similar to 557efc9a8b68628c2c944678c6471dac30ed9e8e.
This commit handles options where cl::ZeroOrMore is more than one line below
cl::opt.
2022-06-04 00:10:42 -07:00
Fangrui Song
557efc9a8b [llvm] Remove unneeded cl::ZeroOrMore for cl::opt options. NFC
Some cl::ZeroOrMore were added to avoid the `may only occur zero or one times!`
error. More were added due to cargo cult. Since the error has been removed,
cl::ZeroOrMore is unneeded.

Also remove cl::init(false) while touching the lines.
2022-06-03 21:59:05 -07:00
Hongtao Yu
acfd0a3456 [llvm-profgen] Update callsite body samples by summing up all call target samples.
Current profile generation caculcates callsite body samples and call target samples separately. The former is done based on LBR range samples while the latter is done based on branch samples. Note that there's a subtle difference. LBR ranges is formed from two consecutive branch samples. Therefore the last entry in a LBR record will not be counted towards body samples while there's still a chance for it to be counted towards call targets if it is a function call. I'm making sense of the call body samples by updating it to the aggregation of call targets.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D122609
2022-05-16 09:13:37 -07:00