llvm-project

Author	SHA1	Message	Date
William Huang	31af18bcce	[llvm-profdata] Refactoring Sample Profile Reader to increase FDO build speed using MD5 as key to Sample Profile map This is phase 1 of multiple planned improvements on the sample profile loader. The major change is to use MD5 hash code ((instead of the function itself) as the key to look up the function offset table and the profiles, which significantly reduce the time it takes to construct the map. The optimization is based on the fact that many practical sample profiles are using MD5 values for function names to reduce profile size, so we shouldn't need to convert the MD5 to a string and then to a SampleContext and use it as the map's key, because it's extremely slow. Several changes to note: (1) For non-CS SampleContext, if it is already MD5 string, the hash value will be its integral value, instead of hashing the MD5 again. In phase 2 this is going to be optimized further using a union to represent MD5 function (without converting it to string) and regular function names. (2) The SampleProfileMap is a wrapper to *map<uint64_t, FunctionSamples>, while providing interface allowing using SampleContext as key, so that existing code still work. It will check for MD5 collision (unlikely but not too unlikely, since we only takes the lower 64 bits) and handle it to at least guarantee compilation correctness (conflicting old profile is dropped, instead of returning an old profile with inconsistent context). Other code should not try to use MD5 as key to access the map directly, because it will not be able to handle MD5 collision at all. (see exception at (5) ) (3) Any SampleProfileMap::emplace() followed by SampleContext assignment if newly inserted, should be replaced with SampleProfileMap::Create(), which does the same thing. (4) Previously we ensure an invariant that in SampleProfileMap, the key is equal to the Context of the value, for profile map that is eventually being used for output (as in llvm-profdata/llvm-profgen). Since the key became MD5 hash, only the value keeps the context now, in several places where an intermediate SampleProfileMap is created, each new FunctionSample's context is set immediately after insertion, which is necessary to "remember" the context otherwise irretrievable. (5) When reading a profile, we cache the MD5 values of all functions, because they are used at least twice (one to index into FuncOffsetTable, the other into SampleProfileMap, more if there are additional sections), in this case the SampleProfileMap is directly accessed with MD5 value so that we don't recalculate it each time (expensive) Performance impact: When reading a ~1GB extbinary profile (fixed length MD5, not compressed) with 10 million function names and 2.5 million top level functions (non CS functions, each function has varying nesting level from 0 to 20), this patch improves the function offset table loading time by 20%, and improves full profile read by 5%. Reviewed By: davidxl, snehasish Differential Revision: https://reviews.llvm.org/D147740	2023-06-23 21:48:52 +00:00
Hongtao Yu	09742be818	[llvm-profgen] Remove target triple check to allow for more targets Llvm-profgen internally uses the llvm libraries and the MCDesc interface to do disassembling and symblization and it never checks against target-specific instruction operators. This makes it quite transparent to targets and a first attempt for an aarch64 binary just works. Therefore I'm removing the unnecessary triple check to unblock for new targets. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D153449	2023-06-23 10:16:24 -07:00
Kazu Hirata	a3b9c1533e	[tools] Use llvm::is_contained (NFC)	2023-06-19 23:36:14 -07:00
Mark Santaniello	27c37327da	Avoid pointless canonicalize when using Dwarf names CPU profile indicated memcmp was hot due to the two rfind calls in getCanonicalFnName. If UseSymbolTable is false, we can avoid the cost entirely. For CSSPGO profiles I've measured ~5% speedup with this change. Profile similarity before/after matches 100%. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D151441	2023-05-25 08:14:11 -07:00
Hongtao Yu	345fd0c10e	[FS-AFDO] Generate pseudo-probe-based profiles with FS-discriminators. This change enables generating pseudo-probe-based FS-AFDO profiles. The change is straightforward based-on previous change {D147651} by just injecting FS-discriminators into various profile generation spot. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D147957	2023-05-10 11:28:54 -07:00
William Huang	d38d6ca179	[llvm-profdata] Deprecate Compact Binary Sample Profile Format Remove support for compact binary sample profile format Reviewed By: davidxl, wenlei Differential Revision: https://reviews.llvm.org/D149400	2023-05-01 17:10:08 +00:00
wlei	339b8a0019	[AutoFDO] Use flattened profiles for profile staleness metrics For profile staleness report, before it only counts for the top-level function samples in the nested profile, the samples in the inlinees are ignored. This could affect the quality of the metrics when there are heavily inlined functions. This change adds a feature to flatten the nested profile and we're changing to use flatten profile as the input for stale profile detection and matching. Example for profile flattening: ``` Original profile: _Z3bazi:20301:1000 1: 1000 3: 2000 5: inline1:1600 1: 600 3: inline2:500 1: 500 Flattened profile: _Z3bazi:18701:1000 1: 1000 3: 2000 5: 600 inline1:600 inline1:1100:600 1: 600 3: 500 inline2: 500 inline2:500:500 1: 500 ``` This feature could be useful for offline analysis, like understanding the hotness of each individual function. So I'm adding the support to `llvm-profdata merge` under `--gen-flattened-profile`. Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D146452	2023-03-30 11:05:10 -07:00
Hongtao Yu	5c2ae37bbe	[CSSPGO][Preinliner] Trim cold call edges of the profiled call graph for a more stable profile generation. I've noticed that for some services CSSPGO profile is less stable than non-CS AutoFDO profile from profiling to profiling without source changes. This is manifested by comparing profile similarities. For example in my experiments, AutoFDO profiles are always 99+% similar over same binary but different inputs (very close dynamic traffics) while CSSPGO profile similarity is around 90%. The main source of the profile stability is the top-down order computed on the profiled call graph in the llvm-profgen CS preinliner. The top-down order is used to guide the CS preinliner to pre-compute an inline decision that is later on fulfilled by the compiler. A subtle change in the top-down order from run to run could cause a different inline decision computed. A deeper look in the diversion of the top-down order revealed that: - The topological sorting inside one SCC isn't quite right. This is fixed by {D130717}. - The profiled call graphs of the two sides of the A/B run isn't 100% the same. The call edges in the two runs do not subsume each other, and edges appear in both graphs may not have exactly the same weight. This is due to the nature that the graphs are dynamic. However, I saw that the graphs can be made more close by removing the cold edges from them and this bumped up the CSSPGO profile stableness to the same level of the AutoFDO profile. Removing cold call edges from the dynamic call graph may have an impact on cold inlining, but so far I haven't seen any performance issues since the CS preinliner mainly targets hot callsites, and cold inlining can always be done by the compiler CGSCC inliner. Also fixing an issue where the largest weight instead of the accumulated weight for a call edge is used in the profiled call graph. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D147013	2023-03-28 16:24:38 -07:00
Fangrui Song	1e6921131a	Move global namespace cl::opt inside llvm::	2023-02-14 00:09:44 -08:00
Hongtao Yu	39eb1c6145	[CSSPGO][Preinliner] Set default value of sample-profile-inline-limit-max to 50000. The previous threshold 3000 is too small to enable any inlining for giant functions which come in with bigger size than that. In real world, I've seen a big hot function with 34000 dissasembly size. Motivated by that I'm changing the value to 50000. With the new value the allowance size growth should still be reasonable, as it is also bounded by another threshold, i.e, --sample-profile-inline-growth-limit , which defaults to 12. The new value should mostly only affect giant functions. I've seen for serveral internal services, the new threshold boosts performance, and it has neutral impact for other services without hot giant functions. So far I haven't seen any performance regression with that. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D143696	2023-02-13 09:17:50 -08:00
Archibald Elliott	62c7f035b4	[NFC][TargetParser] Remove llvm/ADT/Triple.h I also ran `git clang-format` to get the headers in the right order for the new location, which has changed the order of other headers in two files.	2023-02-07 12:39:46 +00:00
Steven Wu	516e301752	[NFC][Profile] Access profile through VirtualFileSystem Make the access to profile data going through virtual file system so the inputs can be remapped. In the context of the caching, it can make sure we capture the inputs and provided an immutable input as profile data. Reviewed By: akyrtzi, benlangmuir Differential Revision: https://reviews.llvm.org/D139052	2023-02-01 09:25:02 -08:00
Elena Lepilkina	537cdf92c4	[llvm-objdump][RISCV] Use new common method to parse ARCH RISCV attribute Differential Revision: https://reviews.llvm.org/D139553	2023-01-16 16:57:55 +03:00
Archibald Elliott	f09cf34d00	[Support] Move TargetParsers to new component This is a fairly large changeset, but it can be broken into a few pieces: - `llvm/Support/TargetParser` are all moved from the LLVM Support component into a new LLVM Component called "TargetParser". This potentially enables using tablegen to maintain this information, as is shown in https://reviews.llvm.org/D137517. This cannot currently be done, as llvm-tblgen relies on LLVM's Support component. - This also moves two files from Support which use and depend on information in the TargetParser: - `llvm/Support/Host.{h,cpp}` which contains functions for inspecting the current Host machine for info about it, primarily to support getting the host triple, but also for `-mcpu=native` support in e.g. Clang. This is fairly tightly intertwined with the information in `X86TargetParser.h`, so keeping them in the same component makes sense. - `llvm/ADT/Triple.h` and `llvm/Support/Triple.cpp`, which contains the target triple parser and representation. This is very intertwined with the Arm target parser, because the arm architecture version appears in canonical triples on arm platforms. - I moved the relevant unittests to their own directory. And so, we end up with a single component that has all the information about the following, which to me seems like a unified component: - Triples that LLVM Knows about - Architecture names and CPUs that LLVM knows about - CPU detection logic for LLVM Given this, I have also moved `RISCVISAInfo.h` into this component, as it seems to me to be part of that same set of functionality. If you get link errors in your components after this patch, you likely need to add TargetParser into LLVM_LINK_COMPONENTS in CMake. Differential Revision: https://reviews.llvm.org/D137838	2022-12-20 11:05:50 +00:00
David Blaikie	09e79659bf	llvm-profgen: Fix use of stats to be under LLVM_ENABLE_STATS This caused a -Wunused-variable warning in a without-assert+with-stats build (because the stats were included but their use was not). Stat use is meant to be gated by LLVM_ENABLE_STATS which can be set independently of assertions.	2022-12-18 17:46:01 +00:00
Fangrui Song	21c4dc7997	std::optional::value => operator*/operator-> value() has undesired exception checking semantics and calls __throw_bad_optional_access in libc++. Moreover, the API is unavailable without _LIBCPP_NO_EXCEPTIONS on older Mach-O platforms (see _LIBCPP_AVAILABILITY_BAD_OPTIONAL_ACCESS). This fixes clang.	2022-12-17 00:42:05 +00:00
Florian Hahn	e2b9cd796b	[llvm-profgen] Fix build failure after 5d7950a403bec25e52. This fixes a build failure with libc++ (`error: no matching function for call to 'max')`	2022-12-16 17:21:12 +00:00
Hongtao Yu	5d7950a403	[CSSPGO][llvm-profgen] Missing frame inference. This change introduces a missing frame inferrer aiming at fixing missing frames. It current only handles missing frames due to the compiler tail call elimination (TCE) but could also be extended to supporting other scenarios like frame pointer omission. When a tail called function is sampled, the caller frame will be missing from the call chain because the caller frame is reused for the callee frame. While TCE is beneficial to both perf and reducing stack overflow, a workaround being made in this change aims to find back the missing frames as much as possible. The idea behind this work is to build a dynamic call graph that consists of only tail call edges constructed from LBR samples and DFS-search for a unique path for a given source frame and target frame on the graph. The unique path will be used to fill in the missing frames between the source and target. Note that only a unique path counts. Multiple paths are treated unreachable since we don't want to overcount for any particular possible path. A switch --infer-missing-frame is introduced and defaults to be on. Some testing results: - 0.4% perf win according to three internal benchmarks. - About 2/3 of the missing tail call frames can be recovered, according to an internal benchmark. - 10% more profile generation time. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D139367	2022-12-16 08:44:43 -08:00
Kazu Hirata	6eb0b0a045	Don't include Optional.h These files no longer use llvm::Optional.	2022-12-14 21:16:22 -08:00
Fangrui Song	da2f5d0a41	[tools] llvm::Optional => std::optional	2022-12-14 08:01:04 +00:00
Fangrui Song	89fab98e88	[DebugInfo] llvm::Optional => std::optional https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716	2022-12-05 00:09:22 +00:00
Kazu Hirata	b4482f7ca0	[tools] Use std::nullopt instead of None (NFC) This patch mechanically replaces None with std::nullopt where the compiler would warn if None were deprecated. The intent is to reduce the amount of manual work required in migrating from Optional to std::optional. This is part of an effort to migrate from llvm::Optional to std::optional: https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716	2022-12-02 21:11:40 -08:00
Matt Arsenault	e748db0f7f	Support: Convert Program APIs to std::optional	2022-12-01 17:00:44 -05:00
Kazu Hirata	286223edc6	[llvm-profgen] Use std::optional in ProfiledBinary.cpp (NFC) This is part of an effort to migrate from llvm::Optional to std::optional: https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716	2022-11-26 19:01:24 -08:00
Fangrui Song	c2250d8bc0	[CSSPGO] Move cl::opt inside llvm:: after D100528 and D108342	2022-11-23 23:08:49 -08:00
Hongtao Yu	c566777818	[llvm-profgen] Fix a typo in collectProfiledFunctions As titled. The change should have minimal impact since the targets of branch samples are mostly covered by range samples. Reviewed By: wenlei, wlei Differential Revision: https://reviews.llvm.org/D137203	2022-11-01 16:52:39 -07:00
Hongtao Yu	d5a963ab8b	[PseudoProbe] Replace relocation with offset for entry probe. Currently pseudo probe encoding for a function is like: - For the first probe, a relocation from it to its physical position in the code body - For subsequent probes, an incremental offset from the current probe to the previous probe The relocation could potentially cause relocation overflow during link time. I'm now replacing it with an offset from the first probe to the function start address. A source function could be lowered into multiple binary functions due to outlining (e.g, coro-split). Since those binary function have independent link-time layout, to really avoid relocations from .pseudo_probe sections to .text sections, the offset to replace with should really be the offset from the probe's enclosing binary function, rather than from the entry of the source function. This requires some changes to previous section-based emission scheme which now switches to be function-based. The assembly form of pseudo probe directive is also changed correspondingly, i.e, reflecting the binary function name. Most of the source functions end up with only one binary function. For those don't, a sentinel probe is emitted for each of the binary functions with a different name from the source. The sentinel probe indicates the binary function name to differentiate subsequent probes from the ones from a different binary function. For examples, given source function ``` Foo() { … Probe 1 … Probe 2 } ``` If it is transformed into two binary functions: ``` Foo: … Foo.outlined: … ``` The encoding for the two binary functions will be separate: ``` GUID of Foo Probe 1 GUID of Foo Sentinel probe of Foo.outlined Probe 2 ``` Then probe1 will be decoded against binary `Foo`'s address, and Probe 2 will be decoded against `Foo.outlined`. The sentinel probe of `Foo.outlined` makes sure there's not accidental relocation from `Foo.outlined`'s probes to `Foo`'s entry address. On the BOLT side, to be minimal intrusive, the pseudo probe re-encoding sticks with the old encoding format. This is fine since unlike linker, Bolt processes the pseudo probe section as a whole and it is free from relocation overflow issues. The change is downwards compatible as long as there's no mixed use of the old encoding and the new encoding. Reviewed By: wenlei, maksfb Differential Revision: https://reviews.llvm.org/D135912 Differential Revision: https://reviews.llvm.org/D135914 Differential Revision: https://reviews.llvm.org/D136394	2022-10-27 13:28:22 -07:00
wlei	bf97c1e066	[NFC] fix a wrong name change during rebase	2022-10-25 21:16:29 -07:00
wlei	91cc53d5a4	[llvm-profgen] Do not cache the frame location stack during computing inlined context size In `computeInlinedContextSizeForRange`, the offset of range is only used one time, there is no need to cache the frame location stack. Measured on one internal service binary, this can save 2GB memory usage and reduce a small run time (avoid one hash search). Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D128859	2022-10-25 21:08:36 -07:00
wlei	467652486f	[llvm-profgen] Fix inconsistent loading address issues This is to fix two issues related with loading address: 1) When multiple MMAPs occur and their loading address are different, before it only used the first MMap as base address, all perf address after it used the wrong base address. 2) For pseudo probe profile, the address is always based on preferred loading address. If the base address is not equal to the preferred loading address, the pseudo probe address query will be wrong. Solution: Instead of converting the address to offset lazily, right now all the address after parsing are converted on the fly based on preferred loading address in the parsing time. There is no "offset" used in profile generator any more. Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D126827	2022-10-13 23:19:30 -07:00
wlei	c136d8582b	[llvm-profgen] Remove CommaSeparated option from perf file cl There could be comma in one perf file path, since at this point it only support one file as input, there is no need for the `llvm:🆑:MiscFlags::CommaSeparated` option. Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D134287	2022-09-21 09:46:21 -07:00
Kazu Hirata	89f1433225	Use llvm::lower_bound (NFC)	2022-09-03 11:17:37 -07:00
wlei	1b212d1098	[llvm-profgen] Fix perf script parsing issues Fix two perf script parsing issues: 1) Redirect the error message to a new file. (the error message mixed in the perfscript could screw up the MMAP event line and cause a parsing failure) 2) Changed the MMap parsing error message to warning since the perfscript can still be parsed using the preferred address as base address. Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D131449	2022-08-08 15:51:07 -07:00
Kazu Hirata	a2d4501718	[llvm] Fix comment typos (NFC)	2022-08-07 00:16:14 -07:00
Kazu Hirata	b5188591a0	[llvm] Remove redundaunt virtual specifiers (NFC) Identified with modernize-use-override.	2022-07-24 21:50:35 -07:00
Mircea Trofin	7b81a81d5f	[NFC] FunctionSamples::getEntrySamples -> getHeadSamplesEstimate The name `getEntrySamples` was misleading for 2 reasons. One, it's close in name to `Function::getEntryCount`, but the equivalent here is `getHeadSamples`; second, as opposed to the other get* APIs in `FunctionSamples`, it performs an estimate/heuristic rather than just retrieving raw data (or a non-heuristic derivate off that data, like `getMaxCountInside`) The new name should more clearly communicate its intent; and, being close (in name) to `getHeadSamples`, it should allow the reader discover the relation between them. Also updated the doc comments for both `getHeadSamples[Estimate]` so a reader may better understand the relation between them. Differential Revision: https://reviews.llvm.org/D130281	2022-07-22 09:17:59 -07:00
Kazu Hirata	611ffcf4e4	[llvm] Use value instead of getValue (NFC)	2022-07-13 23:11:56 -07:00
wlei	7e86b13c63	[CSSPGO][llvm-profgen] Reimplement SampleContextTracker using context trie This is the followup patch to https://reviews.llvm.org/D125246 for the `SampleContextTracker` part. Before the promotion and merging of the context is based on the SampleContext(the array of frame), this causes a lot of cost to the memory. This patch detaches the tracker from using the array ref instead to use the context trie itself. This can save a lot of memory usage and benefit both the compiler's CS inliner and llvm-profgen's pre-inliner. One structure needs to be specially treated is the `FuncToCtxtProfiles`, this is used to get all the functionSamples for one function to do the merging and promoting. Before it search each functions' context and traverse the trie to get the node of the context. Now we don't have the context inside the profile, instead we directly use an auxiliary map `ProfileToNodeMap` for profile , it initialize to create the FunctionSamples to TrieNode relations and keep updating it during promoting and merging the node. Moreover, I was expecting the results before and after remain the same, but I found that the order of FuncToCtxtProfiles matter and affect the results. This can happen on recursive context case, but the difference should be small. Now we don't have the context, so I just used a vector for the order, the result is still deterministic. Measured on one huge size(12GB) profile from one of our internal service. The profile similarity difference is 99.999%, and the running time is improved by 3X(debug mode) and the memory is reduced from 170GB to 90GB. Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D127031	2022-06-27 23:22:21 -07:00
wlei	aa58b7b1e3	[CSSPGO][llvm-profgen] Reimplement computeSummaryAndThreshold using context trie Follow-up patch to https://reviews.llvm.org/D125246, support `computeSummaryAndThreshold` based on context trie. Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D127026	2022-06-27 23:22:21 -07:00
wlei	eba5749262	[CSSPGO][llvm-profgen] Reimplement CS profile generator using context trie Our investigation showed ProfileMap's key is the bottleneck of the memory consumption for CS profile generation on some large services. This patch tries to optimize it by storing the CS function samples using the context trie tree structure instead of the context frame array ref. Parts of code in `ContextTrieNode` are reused. Our experiment on one internal service showed that the context key's memory can be reduced from 80GB to 300MB. To be compatible with non-CS profiles, the profile writer still needs to use ProfileMap as input, so rebuild the ProfileMap using the context trie in `postProcessProfiles`. The optimization is not complete yet, next step is to reimplement Pre-inliner or profile trimmer, after that, ProfileMap should be small to be written. Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D125246	2022-06-27 23:22:21 -07:00
Kazu Hirata	a7938c74f1	[llvm] Don't use Optional::hasValue (NFC) This patch replaces Optional::hasValue with the implicit cast to bool in conditionals only.	2022-06-25 21:42:52 -07:00
Kazu Hirata	3b7c3a654c	Revert "Don't use Optional::hasValue (NFC)" This reverts commit aa8feeefd3ac6c78ee8f67bf033976fc7d68bc6d.	2022-06-25 11:56:50 -07:00
Kazu Hirata	aa8feeefd3	Don't use Optional::hasValue (NFC)	2022-06-25 11:55:57 -07:00
Kazu Hirata	e0e687a615	[llvm] Don't use Optional::hasValue (NFC)	2022-06-20 10:38:12 -07:00
Corentin Jabot	b62e3a73e1	Replace to_hexString by touhexstr [NFC] LLVM had 2 methods to convert a number to an hexa string, this remove one of them. Differential Revision: https://reviews.llvm.org/D127958	2022-06-16 17:29:50 +02:00
Hongtao Yu	7d293744a8	[CSSPGO][Preinliner] Set default value of sample-profile-inline-limit-max to 3000 The default value of sample-profile-inline-limit-max is defined as 10000 in sampleprofile.cpp. This is too big for cspreinliner which works with assembly size instead of IR size. The value 3000 turns out to be a good tradeoff. Compared to the value 10000, 3000 gives as good performance and code size, but lower build time. Reviewed By: wenlei, wlei Differential Revision: https://reviews.llvm.org/D127330	2022-06-08 12:10:11 -07:00
Fangrui Song	d86a206f06	Remove unneeded cl::ZeroOrMore for cl::opt/cl::list options	2022-06-05 00:31:44 -07:00
Fangrui Song	36c7d79dc4	Remove unneeded cl::ZeroOrMore for cl::opt options Similar to 557efc9a8b68628c2c944678c6471dac30ed9e8e. This commit handles options where cl::ZeroOrMore is more than one line below cl::opt.	2022-06-04 00:10:42 -07:00
Fangrui Song	557efc9a8b	[llvm] Remove unneeded cl::ZeroOrMore for cl::opt options. NFC Some cl::ZeroOrMore were added to avoid the `may only occur zero or one times!` error. More were added due to cargo cult. Since the error has been removed, cl::ZeroOrMore is unneeded. Also remove cl::init(false) while touching the lines.	2022-06-03 21:59:05 -07:00
Hongtao Yu	acfd0a3456	[llvm-profgen] Update callsite body samples by summing up all call target samples. Current profile generation caculcates callsite body samples and call target samples separately. The former is done based on LBR range samples while the latter is done based on branch samples. Note that there's a subtle difference. LBR ranges is formed from two consecutive branch samples. Therefore the last entry in a LBR record will not be counted towards body samples while there's still a chance for it to be counted towards call targets if it is a function call. I'm making sense of the call body samples by updating it to the aggregation of call targets. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D122609	2022-05-16 09:13:37 -07:00

1 2 3 4 5 ...

265 Commits