llvm-project

Author	SHA1	Message	Date
Owen Rodley	d3d856ad84	Clean up external users of GlobalValue::getGUID(StringRef) (#129644) See https://discourse.llvm.org/t/rfc-keep-globalvalue-guids-stable/84801 for context. This is a non-functional change which just changes the interface of GlobalValue, in preparation for future functional changes. This part touches a fair few users, so is split out for ease of review. Future changes to the GlobalValue implementation can then be focused purely on that class. This does the following: * Rename GlobalValue::getGUID(StringRef) to getGUIDAssumingExternalLinkage. This is simply making explicit at the callsite what is currently implicit. * Where possible, migrate users to directly calling getGUID on a GlobalValue instance. * Otherwise, where possible, have them call the newly renamed getGUIDAssumingExternalLinkage, to make the assumption explicit. There are a few cases where neither of the above is possible, as the caller saves and reconstructs the necessary information to compute the GUID themselves. We want to migrate these callers eventually, but for this first step we leave them be.	2025-04-28 11:09:43 +10:00
Shaw Young	9a9af0a23f	[BOLT] Match blocks with pseudo probes (#99891) Match inline trees first between profile and the binary: by GUID, checksum, parent, and inline site for inlined functions. Map profile probes to binary probes via matched inline tree nodes. Each binary probe has an associated binary basic block. If all probes from one profile basic block map to the same binary basic block, it’s an exact match; otherwise the block is determined by majority vote and reported as a loose match. Pseudo probe matching happens between exact hash matching and call/loose matching. Introduce ProbeMatchSpec - a mechanism to match probes belonging to another binary function. For example, given functions foo and bar: ``` void foo() { bar(); } ``` In the profiled binary, bar is not inlined, so there is a top-level function bar. In the new binary to which the profile is applied, bar is inlined into foo. Currently, BOLT does 1:1 matching between profile functions and binary functions based on the name. #100446 will extend this to N:M where multiple profiles can be matched to one binary function (as in the example above where binary function foo would use profiles for foo and bar), and one profile can be matched to multiple binary functions (e.g. if bar was inlined into multiple functions). In this diff, ProbeMatchSpecs would only have one BinaryFunctionProfile (existing name-based matching). 
Test Plan: Added match-blocks-with-pseudo-probes.test Performance test: - Setup: - Baseline no-BOLT: Clang with pseudo probes, ThinLTO + CSSPGO (#79942) - BOLT fresh: BOLTed Clang using fresh profile, - BOLT stale (hash): BOLTed Clang using stale profile (collected on Clang 10K commits back), `-infer-stale-profile` (hash+call block matching) - BOLT stale (+probe): BOLTed Clang using stale profile, `-infer-stale-profile` with `-stale-matching-with-pseudo-probes` (hash+call+pseudo probe block matching) - 2S Intel SKX Xeon 6138 with 40C/80T and 256GB RAM, using 20C/40T for build, - BOLT profiles are collected on Clang compiling large preprocessed C++ file. - Benchmark: building Clang (average of 5 runs), see driver in aaupov/llvm-devmtg-2022 - Results, wall time, lower is better: - Baseline no-BOLT: 429.52 +- 2.61s, - BOLT stale (hash): 413.21 +- 2.19s, - BOLT stale (+probe): 409.69 +- 1.41s, - BOLT fresh: 384.50 +- 1.80s. --------- Co-authored-by: Amir Ayupov <aaupov@fb.com>	2024-11-12 07:21:03 -08:00
Amir Ayupov	7ec682b16b	[MC] Use StringRefs from pseudo_probe_desc section if it's mapped Add `IsMMapped` flag to `buildGUID2FuncDescMap` controlling whether to allocate a string in `FuncNameAllocator` or use StringRef directly. Keep it false by default, only set it for BOLT use case because BOLT keeps file sections in memory while processing them. llvm-profgen constructs GUID2FuncDescMap and then releases the binary. For a medium-sized binary with a 0.8 GiB .pseudo_probe_desc section, this saves 0.7 GiB peak RSS in perf2bolt. Test Plan: no-op for llvm-profgen, NFC for perf2bolt Reviewers: maksfb, dcci, wlei-llvm, rafaelauler, ayermolo Reviewed By: wlei-llvm Pull Request: https://github.com/llvm/llvm-project/pull/112996	2024-11-08 16:39:33 -08:00
Amir Ayupov	86ec59e2f7	[BOLT] Only parse probes for profiled functions in profile-write-pseudo-probes mode (#106365) Implement selective probe parsing for profiled functions only when emitting probe information to YAML profile as suggested in https://github.com/llvm/llvm-project/pull/102904#pullrequestreview-2248714190 For a large binary, this reduces probe parsing processing time from 10.5925s to 5.6295s and peak RSS from 10.54 GiB to 7.98 GiB.	2024-09-11 16:33:34 -07:00
Amir Ayupov	c820bd3e33	[BOLT][NFC] Rename profile-use-pseudo-probes The flag currently controls writing of probe information in YAML profile. #99891 adds a separate flag to use probe information for stale profile matching. Thus `profile-use-pseudo-probes` becomes a misnomer and `profile-write-pseudo-probes` better captures the intent. Reviewers: maksfb, WenleiHe, ayermolo, rafaelauler, dcci Reviewed By: rafaelauler Pull Request: https://github.com/llvm/llvm-project/pull/106364	2024-09-11 16:27:33 -07:00
Amir Ayupov	a66ce58ac6	[BOLT] Drop suffixes in parsePseudoProbe GUID assignment (#106243) Pseudo probe function records contain GUIDs assigned by the compiler using an IR function name. Thus suffixes added later (e.g. `.llvm.` for internal symbols, `.destroy`/`.resume` for coroutine fragments, and `.cold`/`.warm` for split fragments) cause a GUID mismatch. Address that by dropping those suffixes using `getCommonName`, which is a parametrized form of `getLTOCommonName`.	2024-09-11 14:42:51 -07:00
Amir Ayupov	a79cf0228e	[MC][NFC] Use vector for GUIDProbeFunctionMap Replace unordered_map with a vector. Pre-parse the section to statically allocate storage. Use BumpPtrAllocator for FuncName strings, keep StringRef in FuncDesc. Reduces peak RSS of pseudo probe parsing from 9.08 GiB to 8.89 GiB as part of perf2bolt with a large binary. Test Plan: ``` bin/llvm-lit -sv test/tools/llvm-profgen ``` Reviewers: wlei-llvm, rafaelauler, dcci, maksfb, ayermolo Reviewed By: wlei-llvm Pull Request: https://github.com/llvm/llvm-project/pull/102905	2024-08-26 09:15:53 -07:00
Amir Ayupov	ee09f7d1fc	[MC][NFC] Reduce Address2ProbesMap size Replace the map from addresses to list of probes with a flat vector containing probe references sorted by their addresses. Reduces pseudo probe parsing time from 9.56s to 8.59s and peak RSS from 9.66 GiB to 9.08 GiB as part of perf2bolt processing a large binary. Test Plan: ``` bin/llvm-lit -sv test/tools/llvm-profgen ``` Reviewers: maksfb, rafaelauler, dcci, ayermolo, wlei-llvm Reviewed By: wlei-llvm Pull Request: https://github.com/llvm/llvm-project/pull/102904	2024-08-26 09:14:35 -07:00
Amir Ayupov	04ebd1907c	[MC][NFC] Statically allocate storage for decoded pseudo probes and function records Use #102774 to allocate storage for decoded probes (`PseudoProbeVec`) and function records (`InlineTreeVec`). Leverage that to also shrink sizes of `MCDecodedPseudoProbe`: - Drop Guid since it's accessible via `InlineTree`. `MCDecodedPseudoProbeInlineTree`: - Keep track of probes and inlinees using `ArrayRef`s now that probes and function records belonging to the same function are allocated contiguously. This reduces peak RSS from 13.7 GiB to 9.7 GiB and pseudo probe parsing time (as part of perf2bolt) from 15.3s to 9.6s for a large binary with 400MiB .pseudo_probe section containing 43M probes and 25M function records. Depends on: #102774 #102787 #102788 Reviewers: maksfb, rafaelauler, dcci, ayermolo, wlei-llvm Reviewed By: wlei-llvm Pull Request: https://github.com/llvm/llvm-project/pull/102789	2024-08-26 09:09:13 -07:00
Amir Ayupov	121ed07975	[MC][NFC] Count pseudo probes and function records Pre-parse pseudo probes section counting the number of probes and function records. These numbers are used in follow-up diff to pre-allocate vectors for decoded probes and inline tree nodes. Additional benefit is avoiding error handling during parsing. This pre-parsing is fast: for a 404MiB .pseudo_probe section with 43373881 probes and 25228770 function records, it only takes 0.68±0.01s. The total time of buildAddress2ProbeMap is 21s. Reviewers: dcci, maksfb, rafaelauler, wlei-llvm, ayermolo Reviewed By: wlei-llvm Pull Request: https://github.com/llvm/llvm-project/pull/102774	2024-08-26 09:05:34 -07:00
Amir Ayupov	4d19676de4	[BOLT] Add profile-use-pseudo-probes option Move pseudo probe profile generation under --profile-use-pseudo-probes option. Note that updating pseudo probes is independent from this flag. Test Plan: updated pseudoprobe-decoding-inline.test Reviewers: maksfb, rafaelauler, ayermolo, dcci, WenleiHe Reviewed By: WenleiHe Pull Request: https://github.com/llvm/llvm-project/pull/100299	2024-07-24 07:31:01 -07:00
Amir Ayupov	9b007a199d	[BOLT] Expose pseudo probe function checksum and GUID (#99389) Add a BinaryFunction field for pseudo probe function GUID. Populate it during pseudo probe section parsing, and emit it in YAML profile (both regular and BAT), along with function checksum. To be used for stale function matching. Test Plan: update pseudoprobe-decoding-inline.test	2024-07-18 20:58:16 -07:00
Kazu Hirata	4a0ccfa865	Use llvm::endianness::{big,little,native} (NFC) Note that llvm::support::endianness has been renamed to llvm::endianness while becoming an enum class as opposed to an enum. This patch replaces support::{big,little,native} with llvm::endianness::{big,little,native}.	2023-10-12 21:21:45 -07:00
Job Noorman	23c8d38258	[BOLT] Calculate input to output address map using BOLTLinker BOLT uses MCAsmLayout to calculate the output values of basic blocks. This means output values are calculated based on a pre-linking state and any changes to symbol values during linking will cause incorrect values to be used. This issue was first addressed in D154604 by adding all basic block symbols to the symbol table for the linker to resolve them. However, the runtime overhead of handling this huge symbol table turned out to be prohibitively large. This patch solves the issue in a different way. First, a temporary section containing [input address, output symbol] pairs is emitted to the intermediary object file. The linker will resolve all these references so we end up with a section of [input address, output address] pairs. This section is then parsed and used to: - Replace BinaryBasicBlock::OffsetTranslationTable - Replace BinaryFunction::InputOffsetToAddressMap - Update BinaryBasicBlock::OutputAddressRange Note that the reason this is more performant than the previous attempt is that these symbol references do not cause entries to be added to the symbol table. Instead, section-relative references are used for the relocations. Reviewed By: maksfb Differential Revision: https://reviews.llvm.org/D155604	2023-08-21 10:36:20 +02:00
Maksim Panchenko	43dce27c06	[BOLT][NFCI] Migrate pseudo probes to MetadataRewriter interface Use new MetadataRewriter interface to update pseudo probes and move ProbeDecoder out of BinaryContext into new PseudoProbeRewriter class. Depends on D154021 Reviewed By: Amir Differential Revision: https://reviews.llvm.org/D154022 Differential Revision: https://reviews.llvm.org/D154023	2023-07-06 11:19:30 -07:00

15 Commits
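
The block-matching rule described in 9a9af0a23f (an exact match when every probe of a profile block maps to one binary block, otherwise a majority vote reported as a loose match) can be illustrated with a small standalone sketch. This is not BOLT's implementation: `MatchKind`, `BlockMatch`, and `matchBlockByProbes` are hypothetical names, blocks are represented as plain integer indices, and the tie-breaking rule is an assumption made only for this example.

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical result type (not a BOLT type): which binary block was chosen
// and whether every probe agreed on it.
enum class MatchKind { None, Exact, Loose };

struct BlockMatch {
  MatchKind Kind;
  uint32_t BinaryBlock; // meaningful only when Kind != None
};

// ProbeTargets holds, for each probe of one profile basic block, the index of
// the binary basic block that the probe mapped to.
BlockMatch matchBlockByProbes(const std::vector<uint32_t> &ProbeTargets) {
  if (ProbeTargets.empty())
    return {MatchKind::None, 0};

  // Count how many probes landed in each binary block.
  std::map<uint32_t, unsigned> Votes;
  for (uint32_t Block : ProbeTargets)
    ++Votes[Block];

  // Pick the block with the most votes. std::map iterates in ascending key
  // order, so ties go to the lowest block index (an assumption of this
  // sketch, not a documented BOLT rule).
  uint32_t Best = 0;
  unsigned BestCount = 0;
  for (const auto &[Block, Count] : Votes) {
    if (Count > BestCount) {
      Best = Block;
      BestCount = Count;
    }
  }

  // One distinct target means all probes agree: exact. Otherwise: loose.
  MatchKind Kind = Votes.size() == 1 ? MatchKind::Exact : MatchKind::Loose;
  return {Kind, Best};
}
```

Under these assumptions, `matchBlockByProbes({3, 3, 3})` yields an exact match on block 3, while `matchBlockByProbes({1, 2, 2})` yields a loose match on block 2.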