llvm-project

Author	SHA1	Message	Date
spupyrev	42da84fda9	[BOLT] Always match stale entry blocks Two (minor) improvements for stale matching: - always match entry blocks to each other, even if there is a hash mismatch; - ignore nops in (loose) hash computation. I record a small improvement in inference quality on my benchmarks. Tests are not affected Reviewed By: Amir Differential Revision: https://reviews.llvm.org/D159488	2023-09-08 15:46:20 -07:00
spupyrev	1256ef274c	[BOLT] Fine-tuning hash computation for stale matching Fine-tuning hash computation for stale matching: - introducing a new "loose" basic block hash that allows to match many more blocks than before; - tweaking params of the inference algorithm that find (slightly) better solutions; - added more meaningful tests for stale matching. Tested the changes on several open-source benchmarks (clang, rocksdb, chrome) and one prod workload using different compiler modes (LTO/PGO etc). There is always an improvement in the quality of inferred profiles. (The current implementation is still not optimal but the diff is a step forward; I am open to further suggestions) Reviewed By: Amir Differential Revision: https://reviews.llvm.org/D156278	2023-08-31 07:29:02 -07:00
Job Noorman	23c8d38258	[BOLT] Calculate input to output address map using BOLTLinker BOLT uses MCAsmLayout to calculate the output values of basic blocks. This means output values are calculated based on a pre-linking state and any changes to symbol values during linking will cause incorrect values to be used. This issue was first addressed in D154604 by adding all basic block symbols to the symbol table for the linker to resolve them. However, the runtime overhead of handling this huge symbol table turned out to be prohibitively large. This patch solves the issue in a different way. First, a temporary section containing [input address, output symbol] pairs is emitted to the intermediary object file. The linker will resolve all these references so we end up with a section of [input address, output address] pairs. This section is then parsed and used to: - Replace BinaryBasicBlock::OffsetTranslationTable - Replace BinaryFunction::InputOffsetToAddressMap - Update BinaryBasicBlock::OutputAddressRange Note that the reason this is more performant than the previous attempt is that these symbol references do not cause entries to be added to the symbol table. Instead, section-relative references are used for the relocations. Reviewed By: maksfb Differential Revision: https://reviews.llvm.org/D155604	2023-08-21 10:36:20 +02:00
Amir Ayupov	d796f36fbc	[BOLT][NFC] Simplify DataAggregator Use short loop instead of duplicating the code for setHasProfileAvailable. Reviewed By: #bolt, maksfb Differential Revision: https://reviews.llvm.org/D154749	2023-07-31 14:54:41 -07:00
Amir Ayupov	b0b566b5da	[BOLT][YAML] Only read first profile per function Work around the issue of multiple profiles per function. Can happen with a stale profile which has separate profiles that in a new binary got merged and became aliases. Reviewed By: #bolt, maksfb Differential Revision: https://reviews.llvm.org/D156644	2023-07-31 13:48:09 -07:00
spupyrev	6d1502c654	[BOLT] (Minor) Changes in stale inference 1. Using ADT/Bitfields.h for hash computation; this is equivalent but shorter than the existing implementation 2. Getting rid of Layout indices for stale matching; using BB->getIndex for indexing Reviewed By: Amir Differential Revision: https://reviews.llvm.org/D155748	2023-07-27 15:29:03 -07:00
spupyrev	31e8a9f4d9	[BOLT] Add stale-related logging Adding some logs related to stale profile matching. The new data can be helpful to understand how "stale" the input profile is and how well the inference is able to utilize the stale data. Example of outputs on clang-10 built with LTO (profile collected on a year-old release): ``` BOLT-INFO: inferred profile for 2101 (18.52% of profiled, 100.00% of stale) functions responsible for 30.95% samples (14754697 out of 47670654) BOLT-INFO: stale inference matched 89.42% of basic blocks (79052 out of 88402 stale) responsible for 76.99% samples (645737 out of 838719 stale) ``` LTO+AutoFDO: ``` BOLT-INFO: inferred profile for 6146 (57.57% of profiled, 100.00% of stale) functions responsible for 90.34% samples (50891403 out of 56330313) BOLT-INFO: stale inference matched 74.55% of basic blocks (191295 out of 256589 stale) responsible for 57.30% samples (1288632 out of 2248799 stale) ``` Reviewed By: Amir, maksfb Differential Revision: https://reviews.llvm.org/D154737	2023-07-27 08:56:57 -07:00
Amir Ayupov	e8a75c3f6e	[BOLT][NFC] Simplify YAMLProfileReader - Add `FunctionSet` type alias. - Use any_of - Use ErrorOr handling pattern Reviewed By: #bolt, maksfb Differential Revision: https://reviews.llvm.org/D156043	2023-07-26 08:26:16 -07:00
Amir Ayupov	1e0d08e872	[BOLT] Add blocks order kind to YAML profile header Specify blocks order used in YAML profile. Needed to ensure profile backwards compatibility with pre-D155514 DFS order by default. Reviewed By: #bolt, maksfb Differential Revision: https://reviews.llvm.org/D156176	2023-07-24 21:33:05 -07:00
Amir Ayupov	69b7e257fe	[BOLT] Switch to using layout order in YAML Use layout order in YAML profile reading/writing. Preserve old behavior (DFS order) under `-profile-use-dfs` option. Reviewed By: spupyrev Differential Revision: https://reviews.llvm.org/D155514	2023-07-18 14:33:41 -07:00
Amir Ayupov	6c87315518	[BOLT] Sort CallSiteInfo targets by symbol name in YAMLWriter Align YAML and fdata profiles by sorting CallSiteInfo targets by symbol name, aligning it to fdata. By default, YAML CallSiteInfo is sorted by function id, which is the order of function in the binary. Follow-up to D152731, aligning yaml vs fdata, and in turn all three between to each other. Reviewed By: #bolt, rafauler Differential Revision: https://reviews.llvm.org/D152733	2023-06-20 15:20:24 -07:00
Kazu Hirata	b188f9f597	[BOLT] Use {StringMap,DenseMapBase}::lookup (NFC)	2023-06-16 07:48:19 -07:00
Amir Ayupov	224e4cc516	[BOLT] Sort BranchData in DataAggregator Align perf reader to fdata behavior by sorting BranchData after reading samples, in the same way as DataReader: `20c66a0c66/bolt/lib/Profile/DataReader.cpp (L1239)` Namely, that order affects CallSiteInfo annotations which determine the construction order of CallGraph, which in turn affects function reordering. Reviewed By: #bolt, rafauler Differential Revision: https://reviews.llvm.org/D152731	2023-06-15 12:08:57 -07:00
Amir Ayupov	5acac7db6e	[BOLT][NFCI] Use StringRef.split in launchPerfProcess Use StringRef method instead of reimplementing the splitting. Incidentally, it also fixes the duplicate printing of the command arguments: ``` PERF2BOLT: spawning perf job to read branch events Launching perf: /usr/bin/perf script^@-F^@pid,ip,brstack -F^@pid,ip,brstack pid,ip,brstack -f -i PERF2BOLT: spawning perf job to read mem events Launching perf: /usr/bin/perf script^@-F^@pid,event,addr,ip -F^@pid,event,addr,ip pid,event,addr,ip -f -i PERF2BOLT: spawning perf job to read process events Launching perf: /usr/bin/perf script^@--show-mmap-events --show-mmap-events -f -i PERF2BOLT: spawning perf job to read task events Launching perf: /usr/bin/perf script^@--show-task-events --show-task-events -f -i ``` Fixes it to: ``` PERF2BOLT: spawning perf job to read branch events Launching perf: /usr/bin/perf script -F pid,ip,brstack -f -i PERF2BOLT: spawning perf job to read mem events Launching perf: /usr/bin/perf script -F pid,event,addr,ip -f -i PERF2BOLT: spawning perf job to read process events Launching perf: /usr/bin/perf script --show-mmap-events -f -i PERF2BOLT: spawning perf job to read task events Launching perf: /usr/bin/perf script --show-task-events -f -i ``` Reviewed By: #bolt, rafauler Differential Revision: https://reviews.llvm.org/D152483	2023-06-09 06:24:17 -07:00
spupyrev	2316a10fe5	[BOLT] stale profile matching [part 2 out of 2] This is a first "serious" version of stale profile matching in BOLT. This diff extends the hash computation for basic blocks so that we can apply a fuzzy hash-based matching. The idea is to compute several "versions" of a hash value for a basic block. A loose version of a hash (computed by ignoring instruction operands) allows to match blocks in functions whose content has been changed, while stricter hash values (considering instruction opcodes with operands and even based on hashes of block's successors/predecessors) allow to resolve collisions. In order to save space and build time, individual hash components are blended into a single uint64_t. There are likely numerous ways of improving hash computation but already this simple variant provides significant perf benefits. Perf testing on the clang binary: collecting data on clang-10 and using it to optimize clang-11 (with ~1 year of commits in between). Next, we compare - //stale_clang// (clang-11 optimized with profile collected on clang-10 with infer-stale-profile=0) - //opt_clang// (clang-11 optimized with profile collected on clang-11) - //infer_clang// (clang-11 optimized with profile collected on clang-10 with infer-stale-profile=1) `LTO-only` mode: //stale_clang// vs //opt_clang//: task-clock [delta(%): 9.4252 ± 1.6582, p-value: 0.000002] (That is, there is a ~9.5% perf regression) //infer_clang// vs //opt_clang//: task-clock [delta(%): 2.1834 ± 1.8158, p-value: 0.040702] (That is, the regression is reduced to ~2%) Related BOLT logs: ``` BOLT-INFO: identified 2114 (18.61%) stale functions responsible for 30.96% samples BOLT-INFO: inferred profile for 2101 (18.52% of all profiled) functions responsible for 30.95% samples ``` `LTO+AutoFDO` mode: //stale_clang// vs //opt_clang//: task-clock [delta(%): 19.1293 ± 1.4131, p-value: 0.000002] //infer_clang// vs //opt_clang//: task-clock [delta(%): 7.4364 ± 1.3343, p-value: 0.000002] Related BOLT logs: ``` BOLT-INFO: identified 5452 (50.27%) stale functions responsible for 85.34% samples BOLT-INFO: inferred profile for 5442 (50.23% of all profiled) functions responsible for 85.33% samples ``` Reviewed By: Amir Differential Revision: https://reviews.llvm.org/D146661	2023-06-08 14:42:41 -07:00
Amir Ayupov	c061f75554	[BOLT] Handle recursive calls as inter-branches in DataAggregator Align yaml and fdata profiles by applying the same treatment to recursive calls (direct, indirect, tail). fdata profile increments entry count when handling recursive calls. Make perf/pre-aggregated perf reader (DataAggregator) do the same. Test Plan: In pre-aggregated-perf.test, add a dummy pre-aggregated branch entry between an indirect call in `frame_dummy` function and its entry point. Check that YAML profile gets incremented entry count for this function. End-to-end test: https://github.com/rafaelauler/bolt-tests/pull/24 Reviewed By: #bolt, maksfb Differential Revision: https://reviews.llvm.org/D152338	2023-06-08 04:17:07 -07:00
Amir Ayupov	713b28532e	[BOLT][NFC] Fix debug messages Fix debug printing, making it easier to compare two debug logs side by side: - `BinaryFunction::addRelocation`: print function name instead of `this` ptr, - `DataAggregator::doTrace`: remove duplicated function name. Reviewed By: #bolt, maksfb Differential Revision: https://reviews.llvm.org/D152314	2023-06-06 15:50:58 -07:00
spupyrev	44268271f6	[BOLT] stale profile matching [part 1 out of 2] BOLT often has to deal with profiles collected on binaries built from several revisions behind release. As a result, a certain percentage of functions is considered stale and not optimized. This diff adds an ability to match profile to functions that are not 100% binary identical, which increases the optimization coverage and boosts the performance of applications. The algorithm consists of two phases: matching and inference: - At the matching phase, we try to "guess" as many block and jump counts from the stale profile as possible. To this end, the content of each basic block is hashed and stored in the (yaml) profile. When BOLT optimizes a binary, it computes block hashes and identifies the corresponding entries in the stale profile. It yields a partial profile for every CFG in the binary. - At the inference phase, we employ a network flow-based algorithm (profi) to reconstruct "realistic" block and jump counts from the partial profile generated at the first stage. In practice, we don't always produce proper profile data but the majority (e.g., >90%) of CFGs get the correct counts. This is a first part of the change; the next stacked diff extends the block hashing and provides perf evaluation numbers. Reviewed By: maksfb Differential Revision: https://reviews.llvm.org/D144500	2023-06-06 12:13:52 -07:00
Amir Ayupov	a478a09131	[BOLT][NFC] Drop MMap events for deleted files Don't parse/handle mmap events with "(deleted)" filename. Reviewed By: #bolt, rafauler Differential Revision: https://reviews.llvm.org/D151948	2023-06-05 13:03:40 -07:00
Amir Ayupov	bce889c8df	[BOLT] Align BranchInfo and FuncBranchData in DataAggregator::recordTrace `DataAggregator::recordTrace` serves two purposes: - Attaching LBR fallthrough ("trace") information to CFG (`getBranchInfo`), which eventually gets emitted as YAML profile. - Populating vector of offsets that gets added to `FuncBranchData`, which eventually gets emitted as fdata profile. `recordTrace` is invoked from `getFallthroughsInTrace` which checks its return status and passes on the collected vector of offsets to `doTrace`. However, if a malformed trace is passed to `recordTrace` it might partially attach the profile to CFG and exit with false, not propagating the vector of offsets to `doTrace`. This leads to a difference between fdata and yaml profile collected from the same binary and the same perf file. (Skylake LBR errata might produce such malformed traces where the last entry is duplicated, resulting in invalid fallthrough path between the last two entries). There are two ways to handle this mismatch: conservative (aligned with fdata), or aggressive (aligned with yaml). Conservative approach would discard the trace entirely, buffering the CFG updates until all fallthroughs are confirmed. Aggressive approach would apply CFG updates and return the matching fallthroughs in the vector even if the trace is invalid (doesn't correspond to a valid fallthrough path). I chose to go with the former (conservative/fdata) approach which produces more accurate profile. We can't rely on pre-filtering such traces early (in LBR sample processing) as DataAggregator is used for both perf samples and pre-aggregated perf information which loses branch stack information. Test Plan: https://github.com/rafaelauler/bolt-tests/pull/22 Reviewed By: #bolt, rafauler Differential Revision: https://reviews.llvm.org/D151614	2023-05-30 18:03:45 -07:00
Amir Ayupov	860543d96e	[BOLT][NFC] Extract DataAggregator::parseLBRSample Reviewed By: #bolt, rafauler Differential Revision: https://reviews.llvm.org/D150986	2023-05-19 17:50:02 -07:00
Amir Ayupov	17f3cbe3af	[BOLT][NFC] Use llvm::make_range Use `llvm::make_range` convenience wrapper from ADT. Reviewed By: #bolt, rafauler Differential Revision: https://reviews.llvm.org/D145887	2023-05-17 10:50:56 -07:00
spupyrev	3e3a926be8	[BOLT][NFC] Add hash computation for basic blocks Extending yaml profile format with block hashes, which are used for stale profile matching. To avoid duplication of the code, created a new class with a collection of utilities for computing hashes. Reviewed By: Amir Differential Revision: https://reviews.llvm.org/D144306	2023-05-02 14:03:47 -07:00
spupyrev	92758a99c3	[BOLT] computing raw branch count for yaml profiles `Function.RawBranchCount` is initialized for fdata profile but not for yaml one. The diff adds the computation of the field for yaml profiles Reviewed By: Amir Differential Revision: https://reviews.llvm.org/D144211	2023-03-28 11:09:21 -07:00
Kazu Hirata	4e585e51c1	Use *{Map,Set}::contains (NFC)	2023-03-15 22:55:35 -07:00
Amir Ayupov	c7af4f383d	[BOLT][NFC] Simplify preprocessProfile Move out prepareToParse lambda, generalize it to handle mem events perf process. Reviewed By: #bolt, rafauler Differential Revision: https://reviews.llvm.org/D146002	2023-03-15 12:56:06 -07:00
Maksim Panchenko	73b89e3f38	[BOLT] Remove dependency on StringMap iteration order Remove the usage of StringMap in places where the iteration order affects the output since the iteration over StringMap is non-deterministic. Reviewed By: Amir Differential Revision: https://reviews.llvm.org/D145194	2023-03-03 09:21:26 -08:00
Amir Ayupov	dbb7316e02	[BOLT][NFC] Move getLTOCommonName to Utils Reuse getLTOCommonName in components other than Profile (to be used in Core) Reviewed By: #bolt, maksfb Differential Revision: https://reviews.llvm.org/D142259	2023-01-20 19:52:14 -08:00
Amir Ayupov	e5e07b01d8	[BOLT] Handle __uniq suffix added by -funique-internal-linkage-names In profile matching, if `.__uniq` suffix added for internal linkage symbols with `-funique-internal-linkage-names` prevents BOLT from matching to a binary function, try to strip the suffix and perform fuzzy name matching. Follow-up to D124117. Reviewed By: #bolt, maksfb Differential Revision: https://reviews.llvm.org/D142073	2023-01-20 19:23:06 -08:00
Amir Ayupov	4a7966ea1b	[BOLT][NFC] DataAggregator code cleanup Reviewed By: #bolt, rafauler Differential Revision: https://reviews.llvm.org/D139794	2023-01-18 13:44:44 -08:00
Joe Loser	a288d7f937	[llvm][ADT] Replace uses of `makeMutableArrayRef` with deduction guides Similar to how `makeArrayRef` is deprecated in favor of deduction guides, do the same for `makeMutableArrayRef`. Once all of the places in-tree are using the deduction guides for `MutableArrayRef`, we can mark `makeMutableArrayRef` as deprecated. Differential Revision: https://reviews.llvm.org/D141814	2023-01-16 14:49:37 -07:00
Amir Ayupov	f40d25dd8d	[BOLT][NFC] Use llvm::reverse Use llvm::reverse instead of `for (auto I = rbegin(), E = rend(); I != E; ++I)` Reviewed By: #bolt, rafauler Differential Revision: https://reviews.llvm.org/D140516	2023-01-03 17:32:11 -08:00
Amir Ayupov	6b05a62a6b	[BOLT] Check no-LBR samples in mayHaveProfileData No-LBR mode wasn't tested and slipped when mayHaveProfileData was added for Lite mode. This enables processing of profiles collected without LBR and converted with `perf2bolt -nl` option. Test Plan: bin/llvm-lit -a tools/bolt/test/X86/nolbr.s https://github.com/rafaelauler/bolt-tests/pull/20 Reviewed By: #bolt, rafauler Differential Revision: https://reviews.llvm.org/D140256	2023-01-03 14:43:36 -08:00
Matt Arsenault	765f3cafa1	bolt: Update more sys::Wait calls	2022-12-14 12:00:41 -05:00
Matt Arsenault	6be2db6ca5	bolt: Try to fix build after sys::Program API change Hopefully fixes build after 15a6e3c636977dc962a415c067182e6d57242116	2022-12-14 11:56:13 -05:00
Amir Ayupov	15d1e51750	[BOLT][NFC] Use std::optional for getLTOCommonName	2022-12-11 22:13:46 -08:00
Amir Ayupov	e8f5743e86	[BOLT][NFC] Use std::optional in BC	2022-12-11 22:13:46 -08:00
Amir Ayupov	835a9c2801	[BOLT][NFC] Use std::optional in DataAggregator	2022-12-11 22:13:46 -08:00
Amir Ayupov	3d573fdbb4	[BOLT][NFC] Use std::optional in BAT	2022-12-11 22:13:46 -08:00
Maksim Panchenko	0f915826cc	[BOLT] Handle access errors while reading profile When the user does not have permissions to access the profile, consume the error contained in Expected<> to avoid dumping stack to the user. Differential Revision: https://reviews.llvm.org/D139480	2022-12-07 17:11:30 -08:00
Amir Ayupov	2563fd63c6	[BOLT][NFC] Use std::optional in MCPlusBuilder Reviewed By: maksfb, #bolt Differential Revision: https://reviews.llvm.org/D139260	2022-12-06 14:51:38 -08:00
Krzysztof Parzyszek	3c255f679c	Process: convert Optional to std::optional This applies to GetEnv and FindInEnvPath.	2022-12-06 09:56:14 -08:00
Kazu Hirata	e324a80fab	[BOLT] Use std::nullopt instead of None (NFC) This patch mechanically replaces None with std::nullopt where the compiler would warn if None were deprecated. The intent is to reduce the amount of manual work required in migrating from Optional to std::optional. This is part of an effort to migrate from llvm::Optional to std::optional: https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716	2022-12-02 23:12:38 -08:00
Kazu Hirata	1028b165ee	[BOLT] Fix a build error This patch fixes: bolt/lib/Profile/DataAggregator.cpp:264:66: error: no viable conversion from 'Optional<llvm::StringRef>[3]' to 'ArrayRef<std::optional<StringRef>>'	2022-12-01 15:48:03 -08:00
Kazu Hirata	34bcadc38c	Use std::nullopt_t instead of NoneType (NFC) This patch replaces those occurrences of NoneType that would trigger an error if the definition of NoneType were missing in None.h. To keep this patch focused, I am deliberately not replacing None with std::nullopt in this patch or updating comments. They will be addressed in subsequent patches. This is part of an effort to migrate from llvm::Optional to std::optional: https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716 Differential Revision: https://reviews.llvm.org/D138539	2022-11-23 14:16:04 -08:00
Kazu Hirata	1fa870b1bd	Use None consistently (NFC) This patch replaces NoneType() and NoneType::None with None in preparation for migration from llvm::Optional to std::optional. In the std::optional world, we are not guranteed to be able to default-construct std::nullopt_t or peek what's inside it, so neither NoneType() nor NoneType::None has a corresponding expression in the std::optional world. Once we consistently use None, we should even be able to replace the contents of llvm/include/llvm/ADT/None.h with something like: using NoneType = std::nullopt_t; inline constexpr std::nullopt_t None = std::nullopt; to ease the migration from llvm::Optional to std::optional. Differential Revision: https://reviews.llvm.org/D138376	2022-11-20 00:24:40 -08:00
Rafael Auler	ba9cc6537c	[PERF2BOLT] Fix unittest failure Fix failure caused by commit e549ac072b "Do not issue parsing error on weird build ids".	2022-09-28 16:01:57 -07:00
Rafael Auler	e549ac072b	[PERF2BOLT] Do not issue parsing error on weird build ids In weird entries we were issueing a parse error. For example, in line 5 here: 6862acc063b0aa86595f52ff81628577df4296ff a.so 6862acc063b0aa86595f52ff81628577df4296ff a.so 6862acc063b0aa86595f52ff81628577df4296ff a.so db758cb3c970044e78d5a4c99b011708a9995636 bin1 60326683eab31acfd03435d9ed4ff9a8 bin2 7d448e51851b4bdb33eac84f90e74628a14a5f00 b.so 742aa26e0211794356cc25f415c25230a26aa045 c.so Error reading BOLT data input file: line 89, column 33: malformed field Fix that. Reviewed By: #bolt, Amir Differential Revision: https://reviews.llvm.org/D134822	2022-09-28 14:41:55 -07:00
serge-sans-paille	61cff9079c	[BOLT] Support building bolt when LLVM_LINK_LLVM_DYLIB is ON This does not link with libLLVM, but with static archives instead. Not super-great, but at least the build works, which is probably better than failing. Related to #57551 Differential Revision: https://reviews.llvm.org/D134434	2022-09-23 07:59:30 +02:00
Amir Ayupov	39336fc09c	[BOLT] Control aggregation mode output profile file format In perf2bolt and `-aggregate-only` BOLT mode, the output profile file is written in fdata format by default. Provide a knob `-profile-format=[fdata,yaml]` to control the format. Note that `-w` option still dumps in YAML format. Reviewed By: #bolt, maksfb Differential Revision: https://reviews.llvm.org/D133995	2022-09-19 13:37:10 -07:00

1 2 3 4 5

239 Commits