Record the number of function invocations from external code - code
outside the binary, which may include JIT code and DSOs. Accounting for
external entry counts improves the fidelity of call graph flow
conservation analysis.
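A minimal sketch of how external entry counts could feed a call graph flow check; the struct, helper names, and tolerance below are illustrative assumptions, not BOLT's actual API:
```
#include <algorithm>
#include <cstdint>

// Hypothetical per-function profile record, for illustration only.
struct FuncProfile {
  uint64_t ExternalEntryCount = 0; // invocations from JIT code / DSOs
  uint64_t InternalCallCount = 0;  // invocations from functions inside the binary
  uint64_t EntryCount = 0;         // times the function's entry block executed
};

// Attribute a recorded call edge to internal or external entries.
void recordCallEdge(FuncProfile &Callee, bool CallerIsInBinary, uint64_t Count) {
  if (CallerIsInBinary)
    Callee.InternalCallCount += Count;
  else
    Callee.ExternalEntryCount += Count; // without this, inflow is undercounted
}

// Call graph flow conservation: entries into a function should roughly equal
// the calls reaching it from all sources, inside and outside the binary.
bool flowIsConserved(const FuncProfile &F, double Tolerance = 0.1) {
  uint64_t Inflow = F.InternalCallCount + F.ExternalEntryCount;
  uint64_t Diff =
      Inflow > F.EntryCount ? Inflow - F.EntryCount : F.EntryCount - Inflow;
  return Diff <= Tolerance * std::max<uint64_t>(F.EntryCount, 1);
}
```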
Test Plan: updated shrinkwrapping.test
"Sample" is a general term covering both basic (IP) and branch (LBR)
profiles. Find and replace ambiguous uses of "sample" where a basic
sample is meant.
Rename `RawBranchCount` to `RawSampleCount`, reflecting its use for
both kinds of profile.
Rename the `PF_LBR` profile type to `PF_BRANCH`, reflecting branch
profiles that are not LBR-based (non-brstack SPE, synthesized brstack
from ETM/PT).
Follow-up to #137644.
Test Plan: NFC
Match inline trees first between the profile and the binary: by GUID,
checksum, parent, and inline site for inlined functions. Map profile
probes to binary probes via matched inline tree nodes. Each binary probe
has an associated binary basic block. If all probes from one profile
basic block map to the same binary basic block, it's an exact match;
otherwise, the block is determined by majority vote and reported as a
loose match.
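A rough sketch of the exact/loose decision described above, using simplified stand-in types rather than BOLT's actual data structures:
```
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <optional>
#include <unordered_map>
#include <utility>
#include <vector>

using BinaryBlockId = uint64_t;

// For one profile basic block, take the binary block that each of its mapped
// probes landed in, and decide between an exact and a loose match.
std::optional<std::pair<BinaryBlockId, bool /*IsExactMatch*/>>
mapProfileBlock(const std::vector<BinaryBlockId> &MappedProbeBlocks) {
  if (MappedProbeBlocks.empty())
    return std::nullopt;

  // Count how many probes landed in each binary block.
  std::unordered_map<BinaryBlockId, size_t> Votes;
  for (BinaryBlockId Block : MappedProbeBlocks)
    ++Votes[Block];

  // Exact match: every probe maps to the same binary block.
  if (Votes.size() == 1)
    return std::make_pair(MappedProbeBlocks.front(), /*IsExactMatch=*/true);

  // Loose match: pick the binary block with the most probes (majority vote).
  auto Best = Votes.begin();
  for (auto It = std::next(Votes.begin()); It != Votes.end(); ++It)
    if (It->second > Best->second)
      Best = It;
  return std::make_pair(Best->first, /*IsExactMatch=*/false);
}
```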
Pseudo probe matching happens between exact hash matching and call/loose
matching.
Introduce ProbeMatchSpec - a mechanism to match probes belonging to
another binary function. For example, given functions foo and bar:
```
void foo() {
  bar();
}
```
Profiled binary: bar is not inlined => the profile has a top-level
function bar.
New binary the profile is applied to: bar is inlined into foo.
Currently, BOLT does 1:1 matching between profile functions and binary
functions based on the name. #100446 will extend this to N:M where
multiple profiles can be matched to one binary function (as in the
example above where binary function foo would use profiles for foo and
bar), and one profile can be matched to multiple binary functions (e.g.
if bar was inlined into multiple functions).
In this diff, ProbeMatchSpecs only contain one BinaryFunctionProfile
(the existing name-based matching).
Test Plan: Added match-blocks-with-pseudo-probes.test
Performance test:
- Setup:
- Baseline no-BOLT: Clang with pseudo probes, ThinLTO + CSSPGO
(#79942)
- BOLT fresh: BOLTed Clang using fresh profile,
- BOLT stale (hash): BOLTed Clang using stale profile (collected on
Clang 10K commits back), `-infer-stale-profile` (hash+call block
matching)
- BOLT stale (+probe): BOLTed Clang using stale profile,
`-infer-stale-profile` with `-stale-matching-with-pseudo-probes`
(hash+call+pseudo probe block matching)
- 2S Intel SKX Xeon 6138 with 40C/80T and 256GB RAM, using 20C/40T for
build,
- BOLT profiles are collected on Clang compiling large preprocessed
C++ file.
- Benchmark: building Clang (average of 5 runs), see driver in
aaupov/llvm-devmtg-2022
- Results, wall time, lower is better:
- Baseline no-BOLT: 429.52 +- 2.61s,
- BOLT stale (hash): 413.21 +- 2.19s,
- BOLT stale (+probe): 409.69 +- 1.41s,
- BOLT fresh: 384.50 +- 1.80s.
---------
Co-authored-by: Amir Ayupov <aaupov@fb.com>
YAML function profiles have sparse function IDs, assigned from
sequential function IDs in the profiled binary. For example, for one
large binary, the YAML profile has 15K functions, but the highest ID is
~600K, close to the number of functions in the profiled binary.
In `matchProfileToFunction`, the `YamlProfileToFunction` vector was
resized to match the function ID, which entails a 40X overcommit. Change
the type of `YamlProfileToFunction` to DenseMap to reduce memory
utilization.
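A simplified illustration of the before/after; `BinaryFunction` and the profile record are stand-ins, and std::unordered_map is used in place of llvm::DenseMap to keep the sketch self-contained:
```
#include <cstdint>
#include <unordered_map>
#include <vector>

struct BinaryFunction;              // opaque stand-in for BOLT's class
struct YamlFunctionProfile {        // illustrative subset of a profile entry
  uint64_t Id;                      // sparse ID, may be far larger than the entry count
  BinaryFunction *BF;
};

// Before: indexing a vector by the sparse function ID forces it to grow to
// the largest ID (~600K slots for ~15K profiled functions).
void matchDense(std::vector<BinaryFunction *> &YamlProfileToFunction,
                const YamlFunctionProfile &YamlBF) {
  if (YamlProfileToFunction.size() <= YamlBF.Id)
    YamlProfileToFunction.resize(YamlBF.Id + 1); // ~40X overcommit in practice
  YamlProfileToFunction[YamlBF.Id] = YamlBF.BF;
}

// After: a hash map stores only the IDs that actually occur in the profile.
void matchSparse(
    std::unordered_map<uint64_t, BinaryFunction *> &YamlProfileToFunction,
    const YamlFunctionProfile &YamlBF) {
  YamlProfileToFunction[YamlBF.Id] = YamlBF.BF;
}
```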
#99891 makes use of it for profile lookup associated with a given binary
function.
The flag currently controls writing of probe information to the YAML
profile. #99891 adds a separate flag to use probe information for stale
profile matching. Thus `profile-use-pseudo-probes` becomes a misnomer,
and `profile-write-pseudo-probes` better captures the intent.
Reviewers: maksfb, WenleiHe, ayermolo, rafaelauler, dcci
Reviewed By: rafaelauler
Pull Request: https://github.com/llvm/llvm-project/pull/106364
This ensures forward compatibility: old BOLT versions can consume a
profile created by newer versions that contains extra keys.
Test Plan: added yaml-unknown-keys.test
Implemented call graph function matching. First, call graphs are
constructed for both the profiled and the binary functions. Then functions are
hashed based on the names of their callee/caller functions. Finally,
functions are matched based on these neighbor hashes and the
longest common prefix of their names. The `match-with-call-graph`
flag turns this matching on.
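A hedged sketch of the neighbor hashing and the prefix tie-breaker; the node layout and the use of std::hash are assumptions for illustration:
```
#include <algorithm>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Illustrative call-graph node: a function plus the names of its neighbors.
struct CGNode {
  std::string Name;
  std::vector<std::string> Callers;
  std::vector<std::string> Callees;
};

// Hash a function by the sorted names of its callers and callees, so a
// renamed function with the same call-graph neighborhood hashes identically.
uint64_t neighborHash(const CGNode &Node) {
  std::vector<std::string> Names = Node.Callers;
  Names.insert(Names.end(), Node.Callees.begin(), Node.Callees.end());
  std::sort(Names.begin(), Names.end());
  std::string Blob;
  for (const std::string &N : Names)
    Blob += N + '\0';                    // separator to avoid accidental collisions
  return std::hash<std::string>{}(Blob); // stand-in for the hash BOLT uses
}

// Tie-breaker when several candidates share a neighbor hash: prefer the
// candidate whose name shares the longest common prefix with the profile name.
size_t longestCommonPrefix(const std::string &A, const std::string &B) {
  size_t I = 0;
  while (I < A.size() && I < B.size() && A[I] == B[I])
    ++I;
  return I;
}
```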
Test Plan: Added match-with-call-graph.test. Matched 164 functions
in a large binary with 10171 profiled functions.
Added another hash level – call hash – following opcode hash matching
for stale block matching. Call hash strings are the concatenation of the
lexicographically ordered names of each block's called functions. This
change bolsters block matching in cases where some instructions have
been removed or added but calls remain constant.
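A small sketch of what such a call hash could look like; std::hash stands in for the string hash actually used:
```
#include <algorithm>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Build the call hash of a basic block: a hash of the lexicographically
// ordered names of the functions it calls. Blocks whose non-call instructions
// changed but whose calls are unchanged still agree at this hash level.
uint64_t blockCallHash(std::vector<std::string> CalleeNames) {
  std::sort(CalleeNames.begin(), CalleeNames.end());
  std::string Concat;
  for (const std::string &Name : CalleeNames)
    Concat += Name;
  return std::hash<std::string>{}(Concat);
}
```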
Test Plan: added match-functions-with-calls-as-anchors.test.
Moved function matching techniques into separate helper functions for
ease of understanding and to make space for additional function
matching techniques to be added (e.g. call graph function matching).
A mapping from namespace to associated binary functions is used to
match function profiles to binary functions based on the edit distance
threshold set by the '--name-similarity-function-matching-threshold'
flag. The flag is set to 0 (exact name matching) by default, as the
matching is expensive, requiring the processing of all BFs.
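A minimal sketch of the idea, assuming a hypothetical namespace-to-functions map and plain Levenshtein distance; the actual implementation details may differ:
```
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Classic Levenshtein edit distance (dynamic programming, two rows).
size_t editDistance(const std::string &A, const std::string &B) {
  std::vector<size_t> Prev(B.size() + 1), Cur(B.size() + 1);
  for (size_t J = 0; J <= B.size(); ++J)
    Prev[J] = J;
  for (size_t I = 1; I <= A.size(); ++I) {
    Cur[0] = I;
    for (size_t J = 1; J <= B.size(); ++J)
      Cur[J] = std::min({Prev[J] + 1, Cur[J - 1] + 1,
                         Prev[J - 1] + (A[I - 1] == B[J - 1] ? 0 : 1)});
    std::swap(Prev, Cur);
  }
  return Prev[B.size()];
}

// Only compare a profile name against binary functions in the same namespace,
// and accept the closest candidate within the configured threshold.
std::string matchBySimilarity(
    const std::string &Namespace, const std::string &ProfileName,
    const std::map<std::string, std::vector<std::string>> &NamespaceToFunctions,
    size_t Threshold) {
  auto It = NamespaceToFunctions.find(Namespace);
  if (It == NamespaceToFunctions.end())
    return {};
  std::string Best;
  size_t BestDist = Threshold + 1;
  for (const std::string &Candidate : It->second) {
    size_t Dist = editDistance(ProfileName, Candidate);
    if (Dist < BestDist) {
      BestDist = Dist;
      Best = Candidate;
    }
  }
  return Best; // empty if nothing is within the threshold
}
```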
Test Plan: Added name-similarity-function-matching.test. On a binary
with 5M functions, rewrite passes took ~520s without the flag and
~2018s with the flag set to 20.
Added the '--match-profile-with-function-hash' flag to match functions
based on exact hash. After identical and LTO name matching, more
functions can be recovered for inference with exact hash matching, e.g.,
when a function was renamed without functional changes. Collisions are
possible in the unlikely case where multiple functions share the same
exact hash. The flag is off by default as it requires processing all
binary functions and is therefore expensive.
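A hedged sketch of the mechanics, with simplified stand-in records; for simplicity this version skips ambiguous hashes rather than matching them:
```
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Illustrative records; the actual classes and hash are more involved.
struct BinaryFunc  { std::string Name; uint64_t Hash; bool HasProfile = false; };
struct ProfileFunc { std::string Name; uint64_t Hash; BinaryFunc *Match = nullptr; };

// Index all still-unmatched binary functions by hash, then match the remaining
// profile functions whose hash hits exactly one entry.
void matchByExactHash(std::vector<BinaryFunc> &BinaryFuncs,
                      std::vector<ProfileFunc> &ProfileFuncs) {
  std::unordered_multimap<uint64_t, BinaryFunc *> HashToBF;
  for (BinaryFunc &BF : BinaryFuncs)
    if (!BF.HasProfile)
      HashToBF.emplace(BF.Hash, &BF);

  for (ProfileFunc &PF : ProfileFuncs) {
    if (PF.Match)
      continue;                        // already matched by name
    if (HashToBF.count(PF.Hash) != 1)
      continue;                        // no hit, or an ambiguous collision
    BinaryFunc *BF = HashToBF.find(PF.Hash)->second;
    PF.Match = BF;
    BF->HasProfile = true;
  }
}
```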
Test Plan: added hashing-based-function-matching.test.
We compute BF hashes in `YAMLProfileReader::readProfile` when first
matching profile functions with binary functions, and a second time in
`YAMLProfileReader::parseFunctionProfile` during profile assignment
(needed to account for LTO private functions with a mismatching
suffix).
Avoid recomputing the hash if it's been set.
A YAML profile for non-simple functions without a CFG:
1) is useless for optimizations,
2) can't be attached, similar to an fdata profile,
3) would be reported as invalid/stale even if the profile is valid.
Don't attempt to attach the profile in this case, aligning the behavior
with DataReader.
Test Plan: added yaml-non-simple.test
This patch replaces uses of StringRef::{starts,ends}with with
StringRef::{starts,ends}_with for consistency with
std::{string,string_view}::{starts,ends}_with in C++20.
I'm planning to deprecate and eventually remove
StringRef::{starts,ends}with.
Provide backwards compatibility for YAML profiles that use `std::hash`:
the xxh3 hash is the default for newly produced profiles (which set
`std-hash: false`), whereas a profile that doesn't specify `std-hash`
will be treated as `std-hash: true`, preserving the old behavior.
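A small sketch of the reader-side defaulting; the helper names are hypothetical and the xxh3 call is a placeholder rather than LLVM's real entry point:
```
#include <cstdint>
#include <functional>
#include <optional>
#include <string>

// Placeholder for an xxh3 implementation (LLVM ships one); not the real hash.
uint64_t xxh3Hash64(const std::string &Data) {
  return std::hash<std::string>{}(Data) ^ 0x9e3779b97f4a7c15ULL;
}

// A profile without a `std-hash` key predates the xxh3 switch, so it defaults
// to the legacy std::hash behavior; new profiles write `std-hash: false`.
bool useStdHash(std::optional<bool> StdHashKeyFromHeader) {
  return StdHashKeyFromHeader.value_or(true);
}

// Pick the hash function according to the profile header.
uint64_t hashBlockContents(const std::string &Data, bool UseStdHash) {
  if (UseStdHash)
    return std::hash<std::string>{}(Data); // legacy profiles
  return xxh3Hash64(Data);                 // default for newly produced profiles
}
```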
D159460 regressed the bugfix in D156644. Fix that and emit a warning.
Add a test case.
Reviewed By: #bolt, maksfb
Differential Revision: https://reviews.llvm.org/D159529
Reduce YAML profile processing times:
- preprocessProfile: speed up buildNameMaps by replacing the ProfileNameToProfile
mapping with a ProfileFunctionNames set and a ProfileBFs vector.
Pre-look up the YamlBF->BF correspondence and memoize it in ProfileBFs
(see the sketch below).
- readProfile: replace iteration over all functions in the binary with iteration
over profile functions (strict match and LTO name match).
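A sketch of the memoization, assuming a simple name-keyed lookup map; the actual lookup involves more name handling than shown here:
```
#include <string>
#include <unordered_map>
#include <vector>

struct BinaryFunction;                          // opaque stand-in for BOLT's class
struct YamlFunctionProfile { std::string Name; };

// Resolve each profile entry to its binary function once, during
// pre-processing, and keep the result in a vector parallel to the profile.
// Later passes index ProfileBFs instead of re-scanning all binary functions.
std::vector<BinaryFunction *> buildProfileBFs(
    const std::vector<YamlFunctionProfile> &YamlFunctions,
    const std::unordered_map<std::string, BinaryFunction *> &NameToBF) {
  std::vector<BinaryFunction *> ProfileBFs;
  ProfileBFs.reserve(YamlFunctions.size());
  for (const YamlFunctionProfile &YamlBF : YamlFunctions) {
    auto It = NameToBF.find(YamlBF.Name);
    ProfileBFs.push_back(It == NameToBF.end() ? nullptr : It->second);
  }
  return ProfileBFs;
}
```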
On a large binary (1.9M functions) and a large YAML profile (121MB, 30K functions),
this reduces the profile step runtimes:
pre-process profile data: 12.4953s -> 10.7123s
process profile data: 9.8195s -> 5.6639s
Compared to fdata profile reading:
pre-process profile data: 8.0268s
process profile data: 1.0265s
process profile data pre-CFG: 0.1644s
Reviewed By: #bolt, maksfb
Differential Revision: https://reviews.llvm.org/D159460
Work around the issue of multiple profiles per function. This can
happen with a stale profile that has separate profiles for functions
that were merged and became aliases in a new binary.
Reviewed By: #bolt, maksfb
Differential Revision: https://reviews.llvm.org/D156644
1. Using ADT/Bitfields.h for hash computation; this is equivalent but shorter than the existing implementation
2. Getting rid of Layout indices for stale matching; using BB->getIndex for indexing
Reviewed By: Amir
Differential Revision: https://reviews.llvm.org/D155748
Specify the block order used in the YAML profile. Needed to ensure profile backwards
compatibility with the pre-D155514 DFS order by default.
Reviewed By: #bolt, maksfb
Differential Revision: https://reviews.llvm.org/D156176
Use layout order in YAML profile reading/writing. Preserve the old behavior (DFS order)
under the `-profile-use-dfs` option.
Reviewed By: spupyrev
Differential Revision: https://reviews.llvm.org/D155514
BOLT often has to deal with profiles collected on binaries built several
revisions behind the release. As a result, a certain percentage of functions is
considered stale and not optimized. This diff adds the ability to match the
profile to functions that are not 100% binary identical, which increases the
optimization coverage and boosts the performance of applications.
The algorithm consists of two phases, matching and inference:
- At the matching phase, we try to "guess" as many block and jump counts from
the stale profile as possible. To this end, the content of each basic block
is hashed and stored in the (yaml) profile. When BOLT optimizes a binary,
it computes block hashes and identifies the corresponding entries in the
stale profile. It yields a partial profile for every CFG in the binary.
- At the inference phase, we employ a network flow-based algorithm (profi) to
reconstruct "realistic" block and jump counts from the partial profile
generated at the first stage. In practice, we don't always produce proper
profile data but the majority (e.g., >90%) of CFGs get the correct counts.
This is the first part of the change; the next stacked diff extends the block
hashing and provides performance evaluation numbers.
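A simplified sketch of the matching phase only, with stand-in types; the real block hashes are structured rather than a single value, and the inference phase is not shown:
```
#include <cstdint>
#include <unordered_map>
#include <vector>

// Illustrative types: a stale-profile block carries a content hash and a
// count; a binary block only has its recomputed content hash.
struct StaleProfileBlock { uint64_t Hash; uint64_t Count; };
struct BinaryBlock       { uint64_t Hash; };

// Matching phase: look up each binary block's hash in the stale profile to
// "guess" its count. Unmatched blocks stay at 0 and are left for the
// inference phase (profi) to reconstruct.
std::vector<uint64_t>
matchBlockCounts(const std::vector<BinaryBlock> &BinaryBlocks,
                 const std::vector<StaleProfileBlock> &ProfileBlocks) {
  std::unordered_map<uint64_t, uint64_t> HashToCount;
  for (const StaleProfileBlock &PB : ProfileBlocks)
    HashToCount[PB.Hash] += PB.Count;

  std::vector<uint64_t> Counts(BinaryBlocks.size(), 0);
  for (size_t I = 0; I < BinaryBlocks.size(); ++I) {
    auto It = HashToCount.find(BinaryBlocks[I].Hash);
    if (It != HashToCount.end())
      Counts[I] = It->second; // partial profile for this CFG block
  }
  return Counts;
}
```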
Reviewed By: maksfb
Differential Revision: https://reviews.llvm.org/D144500
`Function.RawBranchCount` is initialized for the fdata profile but not for the
YAML one. This diff adds the computation of the field for YAML profiles.
Reviewed By: Amir
Differential Revision: https://reviews.llvm.org/D144211
Summary:
Refactor bolt/*/Profile to follow the braces rule for if/else/loop from
[LLVM Coding Standards](https://llvm.org/docs/CodingStandards.html).
(cherry picked from FBD33345741)
Summary:
Switched members of BinaryFunction to ADT containers where it was
possible and made sense. As a result, the size of BinaryFunction on
x86-64 Linux was reduced from 1624 bytes to 1448 bytes.
(cherry picked from FBD32981555)
Summary:
Moves source files into separate components and makes component
dependencies explicit, so the LLVM build system knows how to build BOLT
with BUILD_SHARED_LIBS=ON.
Please use the -c merge.renamelimit=230 git option when rebasing your
work on top of this change.
To achieve this, we create a new library to hold core IR files (most
classes beginning with Binary in their names), a new library to hold
Utils, some command line options shared across both RewriteInstance
and core IR files, a new library called Rewrite to hold most classes
concerned with running top-level functions coordinating the binary
rewriting process, and a new library called Profile to hold classes
dealing with profile reading and writing.
To remove the dependency from BinaryContext into X86-specific classes,
we do some refactoring on the BinaryContext constructor to receive a
reference to the specific backend directly from RewriteInstance. Then,
the dependency on X86 or AArch64-specific classes is transfered to the
Rewrite library. We can't have the Core library depend on targets
because targets depend on Core (which would create a cycle).
Files implementing the entry point of a tool are transferred to the
tools/ folder. All header files are transferred to the include/
folder. The src/ folder was renamed to lib/.
(cherry picked from FBD32746834)