llvm-project

Author	SHA1	Message	Date
Amir Ayupov	6ee5ff95ab	[BOLT] Add profile density computation Reuse the definition of profile density from llvm-profgen (#92144): - the density is computed in perf2bolt using raw samples (perf.data or pre-aggregated data), - function density is the ratio of dynamically executed function bytes to the static function size in bytes, - profile density: - functions are sorted by density in decreasing order, accumulating their respective sample counts, - profile density is the smallest density covering 99% of total sample count. In other words, BOLT binary profile density is the minimum amount of profile information per function (excluding functions in the tail 1% of the sample count) that is sufficient to optimize the binary well. The density threshold of 60 was determined through experiments with large binaries by reducing the sample count and checking the resulting profile density and performance. The threshold is conservative. perf2bolt prints a warning if the density is below the threshold and suggests increasing the sampling duration and/or frequency to reach a given density, e.g.: ``` BOLT-WARNING: BOLT is estimated to optimize better with 2.8x more samples. ``` Test Plan: updated pre-aggregated-perf.test Reviewers: maksfb, wlei-llvm, rafaelauler, ayermolo, dcci, WenleiHe Reviewed By: WenleiHe, wlei-llvm Pull Request: https://github.com/llvm/llvm-project/pull/101094	2024-10-24 18:30:59 -07:00
sinan	c3bbc3a57d	[BOLT] Fix logs with no hex convention (#112650) Add `utohexstr` to ensure that offsets/addresses are correctly formatted as hexadecimal values.	2024-10-18 09:46:41 +08:00
Amir Ayupov	79d695f049	[BOLT][NFCI] Speedup BAT::writeMaps For a large binary with BAT section of size 38 MB with ~170k maps, reduces writeMaps time from 70s down to 1s. The inefficiency was in the use of std::distance with std::map::iterator which doesn't provide random access. Use sorted vector for lookups. Test Plan: NFC Reviewers: maksfb, rafaelauler, dcci, ayermolo Reviewed By: maksfb Pull Request: https://github.com/llvm/llvm-project/pull/112061	2024-10-11 21:40:53 -07:00
ShatianWang	4cab01f072	[BOLT] Profile quality stats -- CFG discontinuity (#109683 ) In a perfect profile, each positive-execution-count block in the function’s CFG should be reachable from a positive-execution-count function entry block through a positive-execution-count path. This new pass checks how well the BOLT input profile satisfies this “CFG continuity” property. More specifically, for each of the hottest 1000 functions, the pass calculates the function’s fraction of basic block execution counts that is “unreachable”. It then reports the 95th percentile of the distribution of the 1000 unreachable fractions in a single BOLT-INFO line. The smaller the reported value is, the better the BOLT profile satisfies the CFG continuity property. The default value of 1000 above can be changed via the hidden BOLT option `-num-functions-for-continuity-check=[N]`. If more detailed stats are needed, `-v=1` can be added to the BOLT invocation: the hottest N functions will be grouped into 5 equally-sized buckets, from the hottest to the coldest; for each bucket, various summary statistics of the distribution of the fractions and the raw unreachable execution counts will be reported.	2024-10-08 19:07:43 -04:00
Youngsuk Kim	0a5edb4de4	[bolt] Don't call llvm::raw_string_ostream::flush() (NFC) Don't call raw_string_ostream::flush(), which is essentially a no-op. As specified in the docs, raw_string_ostream is always unbuffered. (See 65b13610a5226b84889b923bae884ba395ad084d for further reference.)	2024-09-23 17:07:11 -05:00
Kristof Beyls	6d216fb7b8	[perf2bolt] Improve heuristic to map in-process addresses to specific segments in ELF binary (#109397) The heuristic is improved by also taking into account that only executable segments should contain instructions. Fixes #109384.	2024-09-23 15:14:51 +02:00
sinan	31ac3d092b	[BOLT] Add .iplt support to x86 (#106513 ) Add X86 support for parsing .iplt section and symbols.	2024-09-23 18:22:43 +08:00
Daniil Fukalov	65bc259a97	[NFC] Add explicit #include llvm-config.h where its macros are used, last part. (#107615) (This is the part related to bolt, lld and mlir.) Without these explicit includes, removing other headers that implicitly include llvm-config.h may have non-trivial side effects. For example, `clangd` may report even `llvm-config.h` as "not used" when it defines a macro that is explicitly used with #ifdef. The problem is amplified by different build configs, which use different sets of macros.	2024-09-20 19:59:39 +02:00
Amir Ayupov	cd774c873c	[BOLT][NFC] Rename ProfilePseudoProbeDesc Address build issues due to aliasing PseudoProbeDesc, e.g. https://lab.llvm.org/buildbot/#/builders/113/builds/2743	2024-09-12 21:25:38 -07:00
Amir Ayupov	c00c62c113	[BOLT] Add pseudo probe inline tree to YAML profile Add probe inline tree information to YAML profile, at function level: - function GUID, - checksum, - parent node id, - call site in the parent. This information is used for pseudo probe block matching (#99891). The encoding adds/changes probe information in multiple levels of YAML profile: - BinaryProfile: add pseudo_probe_desc with GUIDs and Hashes, which permits deduplication of data: - many GUIDs are duplicate as the same callee is commonly inlined into multiple callers, - hashes are also very repetitive, especially for functions with low block counts. - FunctionProfile: add inline tree (see above). Top-level function is included as root of function inline tree, which makes guid and pseudo_probe_desc_hash fields redundant. - BlockProfile: densely-encoded block probe information: - probes reference their containing inline tree node, - separate lists for block, call, indirect call probes, - block probe encoding is specialized: ids are encoded as bitset in uint64_t. If only block probe with id=1 is present, it's encoded as implicit entry (id=0, omitted). - inline tree nodes with identical probes share probe description where node indices are combined into a list. On top of #107970, profile with new probe encoding has the following characteristics (profile for a large binary): - Profile without probe information: 33MB, 3.8MB compressed (baseline). - Profile with inline tree information: 92MB, 14MB compressed. Profile processing time (YAML parsing, inference, attaching steps): - profile without pseudo probes: 5s, - profile with pseudo probes, without pseudo probe matching: 11s, - with pseudo probe matching: 12.5s. Test Plan: updated pseudoprobe-decoding-inline.test Reviewers: wlei-llvm, ayermolo, rafaelauler, dcci, maksfb Reviewed By: wlei-llvm, rafaelauler Pull Request: https://github.com/llvm/llvm-project/pull/107137	2024-09-12 20:51:35 -07:00
Amir Ayupov	a66ce58ac6	[BOLT] Drop suffixes in parsePseudoProbe GUID assignment (#106243 ) Pseudo probe function records contain GUIDs assigned by the compiler using an IR function name. Thus suffixes added later (e.g. `.llvm.` for internal symbols, `.destroy`/`.resume` for coroutine fragments, and `.cold`/`.warm` for split fragments) cause GUID mismatch. Address that by dropping those suffixes using `getCommonName` which is a parametrized form of `getLTOCommonName`.	2024-09-11 14:42:51 -07:00
Maksim Panchenko	abd69b3653	[BOLT] Handle internal calls in ValidateInternalCalls (#105736 ) Move handling of all internal calls into the designated pass. Preserve NOPs and mark functions as non-simple on non-X86 platforms.	2024-08-27 11:31:32 -07:00
Sayhaan Siddiqui	6aad62cf5b	[BOLT][DWARF] Add parallelization for processing of DWO debug information (#100282 ) Enables parallelization for the processing of DWO CUs.	2024-08-08 16:41:51 -07:00
Davide Italiano	e49549ff19	Revert "[BOLT] Abort on out-of-section symbols in GOT (#100801 )" This reverts commit a4900f0d936f0e86bbd04bd9de4291e1795f1768.	2024-08-07 20:52:19 -07:00
Vladislav Khmelevsky	a4900f0d93	[BOLT] Abort on out-of-section symbols in GOT (#100801) This patch aborts BOLT execution if it finds an out-of-section (section end) symbol in the GOT. To handle such situations properly in the future, we would need an arch-dependent way to analyze relocations or their sequences, e.g., for ARM it would probably be ADRP + LDR analysis in order to get the GOT entry address. Currently, it is also challenging because GOT-related relocation symbols are replaced with __BOLT_got_zero. Anyway, it seems to be quite a rare case, which seems to be related only to static binaries. For the most part, it seems that it should be handled at the linker stage, since a static binary should not have a GOT at all. The LLD linker with relaxations enabled would replace instruction addresses from the GOT directly with target symbols, which eliminates the problem. Anyway, in order to detect such cases, this patch fixes a few things in BOLT: 1. For the end symbols, we now use the section provided by the ELF binary. Previously, they would be tied to a wrong section found by symbol address. 2. The end symbols have limited registration: we only add them to the name->data GlobalSymbols map, since using the address->data BinaryDataMap map would likely be impossible due to the address duality of such symbols. 3. The outdated BD->getSection (currently returning a reference, not a pointer) check in postProcessSymbolTable is replaced by a getSize check in order to allow zero-sized top-level symbols if they are located in zero-sized sections. For the most part, such things could only be found in tests, but I don't see a reason not to handle such cases. 4. Updated the section-end-sym test and removed the x86_64 requirement since there is no reason for it (tested on aarch64 linux). The test was provided by peterwaller-arm (thank you) in #100096 and slightly modified by me.	2024-08-07 16:26:12 +04:00
Sayhaan Siddiqui	910012e7c5	[BOLT][DWARF][NFC] Split DIEBuilder::finish (#101244 ) Split DIEBuilder::finish so that code updating .debug_names is in a separate function.	2024-07-31 13:41:38 -07:00
Sayhaan Siddiqui	33960ce5a8	[BOLT][DWARF] Sort GDBIndexTUEntryVector (#101264 ) Sorts GDBIndexTUEntryVector in decreasing order by hash to ensure determinism when parallelized.	2024-07-31 11:35:38 -07:00
Sayhaan Siddiqui	79dcd93b70	[BOLT][DWARF] Remove option to write to DWP (#100771 ) Remove the --write-dwp option as well as related code and tests.	2024-07-30 16:58:01 -07:00
Sayhaan Siddiqui	9a3e66e314	[BOLT][DWARF][NFC] Fix DebugStrOffsetsWriter (#100672 ) Fix DebugStrOffsetsWriter so updateAddressMap can't be called after it is finalized.	2024-07-26 18:58:25 -07:00
Tristan Ross	abc2eae682	[BOLT] Enable standalone build (#97130) Continues #87196: the original author did not have much time, so I have taken over work on this PR. We would like to have this so that BOLT is easier to package for Nix. It can be tested by copying the cmake, bolt, third-party, and llvm directories out into their own directory with this PR applied and then building bolt. Co-authored-by: pca006132 <john.lck40@gmail.com>	2024-07-25 08:18:14 -07:00
Amir Ayupov	83ea7ce3a1	[BOLT][NFC] Track fragment relationships using EquivalenceClasses Three-way splitting can create references between split fragments (warm to cold or vice versa) that are not handled by `isChildOf/isParentOf/isChildOrParentOf`. Generalize fragment relationships to allow checking if two functions belong to one group, potentially in presence of ICF which can join multiple groups. Test Plan: NFC for existing tests Reviewers: maksfb, ayermolo, rafaelauler, dcci Reviewed By: rafaelauler Pull Request: https://github.com/llvm/llvm-project/pull/99979	2024-07-24 07:15:10 -07:00
Fangrui Song	86e21e1af2	[BOLT] Remove unused bool arguments from createMCObjectStreamer callers	2024-07-20 21:30:49 -07:00
Shaw Young	296a956369	[BOLT] Match functions with call graph (#98125 ) Implemented call graph function matching. First, two call graphs are constructed for both profiled and binary functions. Then functions are hashed based on the names of their callee/caller functions. Finally, functions are matched based on these neighbor hashes and the longest common prefix of their names. The `match-with-call-graph` flag turns this matching on. Test Plan: Added match-with-call-graph.test. Matched 164 functions in a large binary with 10171 profiled functions.	2024-07-19 14:00:28 -07:00
Amir Ayupov	c905db67a0	[BOLT] Attach pseudo probes to blocks in YAML profile Read pseudo probes in regular and BAT YAML profile generation, and attach them to YAML profile basic blocks. This exposes GUID, probe id, and probe type in profile for future use in stale profile matching. Test Plan: updated pseudoprobe-decoding-inline.test Reviewers: dcci, rafaelauler, ayermolo, maksfb Reviewed By: rafaelauler Pull Request: https://github.com/llvm/llvm-project/pull/99554	2024-07-18 21:01:40 -07:00
Amir Ayupov	9b007a199d	[BOLT] Expose pseudo probe function checksum and GUID (#99389 ) Add a BinaryFunction field for pseudo probe function GUID. Populate it during pseudo probe section parsing, and emit it in YAML profile (both regular and BAT), along with function checksum. To be used for stale function matching. Test Plan: update pseudoprobe-decoding-inline.test	2024-07-18 20:58:16 -07:00
Amir Ayupov	3023b15fb1	[BOLT] Support POSSIBLE_PIC_FIXED_BRANCH Detect and support fixed PIC indirect jumps of the following form: ``` movslq En(%rip), %r1 leaq PIC_JUMP_TABLE(%rip), %r2 addq %r2, %r1 jmpq *%r1 ``` with PIC_JUMP_TABLE that looks like following: ``` JT: ---------- E1:\| L1 - JT \| \|----------\| E2:\| L2 - JT \| \|----------\| \| \| ...... En:\| Ln - JT \| ---------- ``` The code could be produced by compilers, see https://github.com/llvm/llvm-project/issues/91648. Test Plan: updated jump-table-fixed-ref-pic.test Reviewers: maksfb, ayermolo, dcci, rafaelauler Reviewed By: rafaelauler Pull Request: https://github.com/llvm/llvm-project/pull/91667	2024-07-18 20:57:05 -07:00
Sayhaan Siddiqui	c0c157a518	[BOLT][DWARF][NFC] Remove DWO ranges base (#99284 ) Removes getters and setters for DWO ranges base due to it not being used.	2024-07-18 09:24:46 -07:00
Pavel Labath	09cbb45edd	[BOLT][DWARF][NFC] A better DIEBuilder for the llvm API change in #98905 (#99324 ) The caller (cloneAttribute) already switches on the reference type. By aligning the cases with the retrieval functions, we can avoid branching twice.	2024-07-18 09:46:29 +02:00
Amir Ayupov	3fe50b6dde	[BOLT] Store FileSymRefs in a multimap With aggressive ICF, it's possible to have different local symbols (under different FILE symbols) to be mapped to the same address. FileSymRefs only keeps a single SymbolRef per address, which prevents fragment matching from finding the correct symbol to perform parent function lookup. Work around this issue by switching FileSymRefs to a multimap. In future, uses of FileSymRefs can be replaced with SortedSymbols which keeps essentially the same information. Test Plan: added ambiguous_fragment.test Reviewers: dcci, ayermolo, maksfb, rafaelauler Reviewed By: rafaelauler Pull Request: https://github.com/llvm/llvm-project/pull/98992	2024-07-16 22:14:43 -07:00
Sayhaan Siddiqui	e140a8a3c8	[BOLT][DWARF][NFC] Refactor address writers (#98094 ) Refactors address writers to create an instance for each CU and its DWO CU.	2024-07-15 23:03:43 -07:00
Paschalis Mpeis	deff3afd35	[NFC][BOLT] Rename createDummyReturnFunction to createReturnInstructi.. (#98448 ) `createDummyReturnFunction` is not creating a function but instead only a function body that is simply a return statement. This patch renames it to: `createReturnInstructionList`	2024-07-15 16:30:40 +01:00
Paschalis Mpeis	34433fdceb	[BOLT] Add -print-mappings option to heatmaps (#97567 ) Emit a mapping in the legend between the characters/buckets and the text sections, using: ```sh llvm-heatmap-bolt -print-mappings .. ``` Example: ``` Legend: .. Sections: a/A : .init 0x00000100-0x00000200 b/B : .plt 0x00000200-0x00000500 c/C : .text 0x00010000-0x000a0000 d/D : .fini 0x000a0000-0x000f0000 .. ```	2024-07-15 08:23:06 +01:00
Paschalis Mpeis	587308c343	[BOLT][AArch64] Provide createDummyReturnFunction (#96626 ) AArch64 needs this function when instrumenting statically-linked binaries. Sample commands: ```bash clang -Wl,-q test.c -static -o out llvm-bolt -instrument -instrumentation-sleep-time=5 out -o out.instr ```	2024-07-15 07:20:47 +01:00
Shaw Young	131eb30584	[BOLT] Match blocks with calls as anchors (#96596 ) Added another hash level – call hash – following opcode hash matching for stale block matching. Call hash strings are the concatenation of the lexicographically ordered names of each blocks’ called functions. This change bolsters block matching in cases where some instructions have been removed or added but calls remain constant. Test Plan: added match-functions-with-calls-as-anchors.test.	2024-07-10 15:46:47 -07:00
Sayhaan Siddiqui	7e10ad99ad	[BOLT][DWARF] Cleanup buffer initialization for DWO range writer (#97843 ) Cleanup buffer initialization for DWO range writer instances to remove empty buffer at the beginning.	2024-07-10 11:35:40 -07:00
Sayhaan Siddiqui	f137be30a4	[BOLT][DWARF][NFC] Remove unnecessary SectionOffset (#97841 ) Removes unnecessary SectionOffset variable from DebugData.	2024-07-09 16:36:49 -07:00
Shaw Young	37bee25497	[BOLT][NFC] Refactor function matching (#97502 ) Moved function matching techniques into separate helper functions for ease of understanding and to make space for additional function matching techniques to be added (e.g. call graph function matching).	2024-07-05 14:44:15 -07:00
Alexander Yermolovich	361350fc89	[BOLT][DWARF] Deduplicate Foreign TU list (#97629 ) There could be multiple TUs with the same hash in various DWO files. In bigger binaries this could be in the thousands. Although they could be structurally different and we need to output Entries for all of them, for the purposes of figuring out a TU hash we only need one entry in Foreign TU list.	2024-07-04 07:20:06 -07:00
Sayhaan Siddiqui	5828b04b03	[BOLT][DWARF] Refactor legacy ranges writers (#96006) Refactors legacy ranges writers to create a writer for each instance of a DWO file. We now write out everything into .debug_ranges after all the DWO files are processed. This also changes the order in which ranges are written out: before, we wrote them out while in the main CU processing loop, and we now iterate through the CU buckets created by partitionCUs after the main processing loop.	2024-07-03 14:50:40 -07:00
Shaw Young	97dc50882c	[BOLT] Match functions with name similarity (#95884) A mapping from namespace to associated binary functions is used to match function profiles to binary functions, subject to the edit distance threshold set by the `--name-similarity-function-matching-threshold` flag. The flag is set to 0 (exact name matching) by default because similarity matching is expensive, requiring the processing of all BFs. Test Plan: Added name-similarity-function-matching.test. On a binary with 5M functions, rewrite passes took ~520s without the flag and ~2018s with the flag set to 20.	2024-07-03 11:39:18 -07:00
Amir Ayupov	344228ebf4	[BOLT] Drop macro-fusion alignment (#97358 ) 9d0754ada5dbbc0c009bcc2f7824488419cc5530 dropped MC support required for optimal macro-fusion alignment in BOLT. Remove the support in BOLT as performance measurements with large binaries didn't show a significant improvement. Test Plan: macro-fusion alignment was never upstreamed, so no upstream tests are affected.	2024-07-02 09:20:41 -07:00
Fangrui Song	e3e0df391c	[BOLT] Replace the MCAsmLayout parameter with MCAssembler Continue the MCAsmLayout removal work started by 67957a45ee1ec42ae1671cdbfa0d73127346cc95.	2024-07-01 18:02:34 -07:00
Nathan Sidwell	6c5b62b846	[BOLT][NFC] Separate isReversibleBranch's 2 semantics (#95572 ) `isUnsupportedBranch` was renamed (and inverted) to `isReversibleBranch`, as that was how it was being used. But one use in `BinaryFunction::disassemble` was using the original meaning to detect unsupported branches, and the `isUnsupportedBranch` had 2 separate semantic checks. Move the unsupported branch check from `isReversibleBranch` to a new entry point: `isUnsupportedInstruction`. Call that from `BinaryFunction::disassemble`. Move the dynamic branch check from X86's isReversibleBranch to the base class, as it is not an architecture-specific check. Remove unnecessary `isReversibleBranch` calls from Instrumentation and X86 MCPlusBuilder.	2024-06-28 07:45:37 -04:00
Maksim Panchenko	d16b21b17d	[BOLT][Linux] Support ORC for alternative instructions (#96709 ) Alternative instruction sequences in the Linux kernel can modify the stack and thus they need their own ORC unwind entries. Since there's only one ORC table, it has to be "shared" among multiple instruction sequences. The kernel achieves this by putting a restriction on instruction boundaries. If ORC state changes at a given IP, only one of the alternative sequences can have an instruction starting/ending at this IP. Then, developers can insert NOPs to guarantee the above requirement is met. The most common use of ORC with alternatives is "pushf; pop %rax" sequence used for paravirtualization. Note that newer kernel versions no longer use .parainstructions; instead, they utilize alternatives for the same purpose. Before we implement a better support for alternatives, we can safely skip ORC entries associated with them. Fixes #87052.	2024-06-27 19:26:11 -07:00
shaw young	2430a354bf	[BOLT][NFC] Move CallGraph from Passes to Core (#96922 ) Moved CallGraph and BinaryFunctionCallGraph from Passes to Core for future use in stale matching.	2024-06-27 16:34:47 -07:00
Paschalis Mpeis	a13bc9714a	[BOLT][AArch64] Implement PLTCall optimization (#93584) `convertCallToIndirectCall` applies the PLTCall optimization and returns an (updated if needed) iterator to the converted call instruction. Since AArch64 needs to inject additional instructions to implement this pass, the relevant BasicBlock and an iterator are passed to `convertCallToIndirectCall`. `NumCallsOptimized` is updated only on successful application of the pass. Tests: - Inputs/plt-tailcall.c: an example of a tail call optimized PLT call. - AArch64/plt-call.test: the actual A64 test, which runs the PLTCall optimization on the above input file and verifies the application of the pass to the calls 'printf' and 'puts'.	2024-06-11 19:21:11 +01:00
Sayhaan Siddiqui	727ecbeee3	[BOLT][DWARF][NFC] Remove old GDB Index functions (#95019 ) Remove old usages of GDB Index functions after replacing them with new ones.	2024-06-11 10:36:49 -07:00
Sayhaan Siddiqui	61df854d4c	[BOLT][DWARF][NFC] Replace usages of GDBIndex functions (#94701 ) Replace old usages of GDB Index functions to use the new class.	2024-06-10 10:46:20 -07:00
Nathan Sidwell	3fefb3c598	[BOLT][NFC] Infallible fns return void (#92018) Both `reverseBranchCondition` and `replaceBranchTarget` return a success boolean. But all but one caller ignore the return value, and the exception emits a fatal error on failure. Thus, just return nothing.	2024-06-07 06:59:52 -04:00
Sayhaan Siddiqui	2a6efe6a49	[BOLT][DWARF][NFC] Refactor GDB Index into a new file (#94405 ) Create a new class and file for functions that update GDB index.	2024-06-06 07:23:05 -07:00
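
The profile density definition given in commit 6ee5ff95ab can be sketched in a few lines of Python. This is only an illustration of the description in that commit message, not the perf2bolt implementation: `FunctionStats`, `profile_density`, the field names, and the handling of zero-sized functions are all hypothetical choices made for the example.

```python
from typing import NamedTuple, Sequence


class FunctionStats(NamedTuple):
    """Hypothetical per-function inputs; the names are not taken from BOLT."""
    executed_bytes: float  # dynamically executed function bytes (from samples)
    size_bytes: int        # static function size in bytes
    samples: int           # sample count attributed to the function


def profile_density(functions: Sequence[FunctionStats],
                    coverage_percent: int = 99) -> float:
    """Return the smallest function density covering coverage_percent of samples."""
    # Assumption: zero-sized functions are skipped to avoid division by zero.
    sized = [f for f in functions if f.size_bytes > 0]
    total_samples = sum(f.samples for f in sized)
    if total_samples == 0:
        return 0.0

    # Function density: executed bytes divided by static size.
    ranked = sorted(
        ((f.executed_bytes / f.size_bytes, f.samples) for f in sized),
        key=lambda pair: pair[0],
        reverse=True,  # densest functions first
    )

    accumulated = 0
    for density, samples in ranked:
        accumulated += samples
        # Integer comparison avoids floating-point rounding at the cutoff.
        if accumulated * 100 >= coverage_percent * total_samples:
            return density
    return ranked[-1][0]


example = [
    FunctionStats(executed_bytes=1000, size_bytes=10, samples=900),  # density 100
    FunctionStats(executed_bytes=500, size_bytes=10, samples=90),    # density 50
    FunctionStats(executed_bytes=10, size_bytes=10, samples=10),     # density 1
]
density = profile_density(example)
```

In this example the two densest functions together hold 990 of the 1000 samples, so the 99% cutoff is reached at the second function and the tail function with density 1 is excluded.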

481 Commits
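
The "CFG continuity" property described in commit 4cab01f072 can be illustrated with a small reachability sketch. It assumes that a positive-execution-count path is one that runs only through blocks and edges with positive counts; the function name `unreachable_fraction` and the dictionary-based CFG representation are hypothetical and are not BOLT's data structures.

```python
from collections import deque
from typing import Dict, Hashable, Iterable, Tuple


def unreachable_fraction(block_counts: Dict[Hashable, int],
                         edge_counts: Dict[Tuple[Hashable, Hashable], int],
                         entries: Iterable[Hashable]) -> float:
    """Fraction of block execution counts not reachable from a hot entry block."""
    total = sum(block_counts.values())
    if total == 0:
        return 0.0

    # Keep only edges that were actually taken (assumed meaning of a
    # positive-execution-count path).
    successors: Dict[Hashable, list] = {}
    for (src, dst), count in edge_counts.items():
        if count > 0:
            successors.setdefault(src, []).append(dst)

    # Breadth-first search from entry blocks with a positive count.
    reached = {b for b in entries if block_counts.get(b, 0) > 0}
    queue = deque(reached)
    while queue:
        block = queue.popleft()
        for nxt in successors.get(block, []):
            if nxt not in reached and block_counts.get(nxt, 0) > 0:
                reached.add(nxt)
                queue.append(nxt)

    unreachable = sum(count for block, count in block_counts.items()
                      if count > 0 and block not in reached)
    return unreachable / total


blocks = {"entry": 100, "a": 60, "b": 40, "orphan": 50}
edges = {("entry", "a"): 60, ("entry", "b"): 40, ("a", "orphan"): 0}
fraction = unreachable_fraction(blocks, edges, entries=["entry"])
```

Here `orphan` has a positive count but its only incoming edge was never taken, so 50 of the 250 total block counts are unreachable. The pass described in the commit computes such a fraction per function and reports a percentile over the hottest functions; that aggregation step is not shown.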