`__bolt_instr_data_dump()` finds the instrumented binary name by
iterating over the entries under `/proc/self/map_files`,
then opens the binary and memory-maps it onto the heap in order
to locate the `.bolt.instr.tables` section and read the descriptions.
If the binary name is already known and/or the binary is already
memory-mapped, we can pass the binary name and/or the memory
buffer directly to `__bolt_instr_data_dump()` to save some work.
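For reference, the entries under `/proc/self/map_files` are symlinks named
after each mapping's address range and pointing at the backing file
(abridged output, hypothetical paths):
```bash
$ ls -l /proc/self/map_files | head -2
lr-------- ... 400000-452000 -> /usr/bin/instrumented-binary
lr-------- ... 7f3c80000000-7f3c80021000 -> /usr/lib/libc.so.6
```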
Distributions are making the choice to turn frame pointers on by
default. Nixpkgs recently turned them on, and the method they use to do
so implies that everything is built with them on by default.
https://github.com/NixOS/nixpkgs/pull/399014
Assuming that a well-behaved distribution doing this puts
`-fno-omit-frame-pointer` at the beginning of the compiler invocation,
we can still re-enable omission by supplying `-fomit-frame-pointer`
during compilation.
This fixes some segfaults caused by stack corruption in binaries
rewritten by BOLT with `llvm-bolt -instrument`.
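This works because in both GCC and Clang the last of conflicting
`-f...frame-pointer` flags wins, e.g. (hypothetical file name):
```bash
# The distribution wrapper injects -fno-omit-frame-pointer first;
# our trailing -fomit-frame-pointer overrides it.
clang -fno-omit-frame-pointer -O2 -fomit-frame-pointer -c runtime.c
```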
See also: #147569
Fixes: #148595
In `addCFIInstruction`, we split the CFI information
between `CFIInstrMapType CIEFrameInstructions` and `CFIInstrMapType
FrameInstructions`. In some cases we can end up with the remember CFI in
`CIEFrameInstructions` and the matching restore CFI in
`FrameInstructions`. This patch adds a check to make sure we do not
split remember and restore states across the two maps.
Fixes https://github.com/llvm/llvm-project/issues/133501.
Leverage `sys::ProcessStatistics` to report the run time and memory
usage of perf script processes launched when reading perf data.
The reporting is enabled in debug mode with `-debug-only=aggregator`.
Switch the buildid-list command to the non-waiting `launchPerfProcess` to
collect its runtime as well, unifying it with the rest of the perf script
processes.
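For example (assuming an assertions-enabled build, which `-debug-only`
requires):
```bash
perf2bolt -p perf.data -o out.fdata -debug-only=aggregator ./binary
```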
Test Plan: NFC
When --hot-functions-at-end is used in combination with --use-old-text,
allocate code at the highest possible addresses within the old `.text`.
This feature is mostly useful for HHVM, where it is beneficial to have
hot static code placed as close as possible to jitted code.
The MCSymbolRefExpr::create overload with the specifier parameter is
discouraged and being phased out. Expressions with relocation specifiers
should use MCSpecifierExpr instead.
zero-density.s causes spurious NFC mismatches, e.g.
https://lab.llvm.org/buildbot/#/builders/92/builds/21380
This is caused by the NFC script wrapping only the llvm-bolt binary, so
that perf2bolt invocations are replaced by `llvm-bolt --aggregate-only`
to achieve perf2bolt behavior. Add `show-density` to the list of flags
for wrapped perf2bolt calls to avoid similar issues in the future.
Test Plan:
```
$ bolt/utils/nfc-check-setup.py --switch-back
$ bin/llvm-lit -a tools/bolt/test/X86/zero-density.s
```
Reapply "[NFC][DebugInfo][DWARF] Create new low-level dwarf library (#…
(#145959)
This reapplies cbf781f0bdf2f680abbe784faedeefd6f84c246e, with fixes for
the shared-library build and the unconventional sanitizer-runtime build.
Original Description:
This is the culmination of a series of changes described in [1].
Although somewhat large by line count, it is almost entirely mechanical,
creating a new library in DebugInfo/DWARF/LowLevel. This new library has
very minimal dependencies, allowing it to be used from more places than
the normal DebugInfo/DWARF library--in particular from MC.
1. https://discourse.llvm.org/t/rfc-debuginfo-dwarf-refactor-into-to-lower-and-higher-level-libraries/86665/2
When all section contents are updated in-place, we can skip creation of
new segment(s), save disk space, and free up low memory addresses.
Currently, this feature only works with --use-gnu-stack.
Refactor the code for NewTextSegmentAddress to correctly point at the
true start of the segment when the PHDR table is placed at the beginning.
We used to offset NewTextSegmentAddress by the PHDR table size plus
cache-line alignment.
NFC for proper binaries. Some YAML binaries from our tests will diverge
due to bad segment address/offset alignment.
This is the culmination of a series of changes described in [1].
Although somewhat large by line count, it is almost entirely mechanical,
creating a new library in DebugInfo/DWARF/LowLevel. This new library has
very minimal dependencies, allowing it to be used from more places than
the normal DebugInfo/DWARF library--in particular from MC.
I am happy to put it in another location, or to structure it differently
if that makes sense. Some have suggested BinaryFormat, but it is not
a great fit there. But if that makes more sense to the reviewers, I can
do that.
Another possibility would be to use pass-through headers to allow
clients who don't care to depend only on DebugInfo/DWARF. This would be
a much less invasive change, and perhaps easier for clients, but it
would also hide details.
Either way, I'm open.
1. https://discourse.llvm.org/t/rfc-debuginfo-dwarf-refactor-into-to-lower-and-higher-level-libraries/86665/2
Implement the detection of tail calls performed with untrusted link
register, which violates the assumption made on entry to every function.
Unlike other pauth gadgets, detection of this one involves some amount
of guessing which branch instructions should be checked as tail calls.
In gs-pacret-autiasp.s, the undefined call `bl g` causes inconsistent
basic block splitting: on some platforms BOLT emits two basic blocks, on
others only one.
Defining a dummy `g` symbol forces a single basic block everywhere.
Currently NFC tests only trigger when the llvm-bolt binary itself
changes.
This patch adds `--check-bolt-sources`, which scans git output for any
modifications under bolt/, excluding:
- bolt/docs
- bolt/utils/docker
- bolt/utils/dot2html
If any matching files change between versions, a `.llvm-bolt.changes`
marker is created. Buildbots can then use this marker to trigger in-tree
tests.
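A sketch of the intended bot-side flow (the marker location and
`run-in-tree-tests` are hypothetical placeholders):
```bash
bolt/utils/nfc-check-setup.py --check-bolt-sources
if [ -f .llvm-bolt.changes ]; then
  run-in-tree-tests  # e.g. ninja check-bolt
fi
```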
Address the issue that stems from how the density is computed.
Binary *function* density is the ratio of the function's total number of
dynamically executed bytes to its static size in bytes. It measures the
amount of dynamic profile information relative to the function's static
size.
Binary *profile* density is the minimum *function* density among
*well-profiled* functions, taken as the functions covering p99 of
samples, or, in other words, excluding the functions in the tail 1% of
samples. p99 is an arbitrary cutoff. The meaning of profile density is
the *minimum amount of profile information per function* needed to
optimize the program well. The threshold for profile density is set
empirically.
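As a worked example with made-up numbers, a 100-byte function whose LBR
fall-throughs cover 5000 dynamically executed bytes has:
```
function density = executed bytes / static size = 5000 / 100 = 50
```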
The dynamically executed bytes are taken directly from LBR fall-throughs.
For LBRs recorded in trampoline functions, such as
```
000000001a941ec0 <Sleef_expf8_u10>:
1a941ec0: jmpq *0x37b911fa(%rip) # <pnt_expf8_u10>
1a941ec6: nopw %cs:(%rax,%rax)
```
the fall-through has zero length:
```
# Branch Target NextBranch Count
T 1b171cf6 1a941ec0 1a941ec0 568562
```
But it's not correct to say this function has zero executed bytes; it's
just that the size of the next branch is not included in the
fall-through.
If such functions have a non-trivial sample count, they will fall into
the p99 samples and cause the profile density to be zero.
To solve this, we can either:
1. Include the fall-through end jump size in executed bytes:
   logically sound but technically challenging, as the size needs to
   come from disassembly (expensive), and the threshold needs to be
   reevaluated with the updated definition of binary function density.
2. Exclude pass-through functions from density computation:
   this follows the intent of profile density, which is to set the
   amount of profile information needed to optimize the function well.
   Single-instruction pass-through functions don't need samples many
   times their size to be optimized well.
Go with option 2 as a reasonable compromise.
Test Plan: added bolt/test/X86/zero-density.s
After a label in a function without CFG information, use a reasonably
pessimistic estimation of register state (assume that any register that
can be clobbered in this function was actually clobbered) instead of the
most pessimistic "all registers are unsafe". This is the same estimation
as used by the dataflow variant of the analysis when the preceding
instruction is not known for sure.
Without this, leaf functions without CFG information are likely to have
false positive reports about non-protected return instructions, as
1) LR is unlikely to be signed and authenticated in a leaf function and
2) LR is likely to be used by a return instruction near the end of the
function and
3) the register state is likely to be reset at least once during the
linear scan through the function
Instead of refusing to analyze an instruction completely when it is
unreachable according to the CFG reconstructed by BOLT, use a pessimistic
assumption about the register state when possible. Nevertheless,
unreachable basic blocks found in optimized code likely indicate
imprecise CFG reconstruction, so report a warning once per function.
Rename these relocation specifier constants, aligning with the naming
convention used by other targets (`S_` instead of `VK_`).
* ELF/COFF: AArch64MCExpr::VK_ => AArch64::S_ (VK_ABS/VK_PAGE_ABS are
also used by Mach-O as a hack)
* Mach-O: AArch64MCExpr::M_ => AArch64::S_MACHO_
* shared: AArch64MCExpr::None => AArch64::S_None
Apologies for the churn following the recent rename in #132595. This
change ensures consistency after introducing MCSpecifierExpr to replace
MCTargetSpecifier subclasses.
Pull Request: https://github.com/llvm/llvm-project/pull/144633
Replace AArch64MCExpr, which encodes expressions with relocation
specifiers, with the new generic MCSpecifierExpr interface, aligning
with other targets by phasing out target-specific XXXMCExpr classes.
Temporarily convert AArch64MCExpr to a namespace to avoid renaming
`AArch64MCExpr::VK_` constants in this PR. A follow-up patch will rename
these to `AArch64::S_` to match the convention used by other targets.
Move helper functions to AArch64MCAsmInfo.h, with the goal of eventually
removing AArch64MCExpr.h.
Pull Request: https://github.com/llvm/llvm-project/pull/144632
Remove the SymbolToFileName mapping from every local symbol to its
containing FILE symbol name, and reuse FileSymbols to disambiguate
local symbols instead.
Also remove the check for the `ld-temp.o` file symbol, which was added
to prevent the LTO build mode from affecting the disambiguated name.
This may cause incompatibility when using a profile collected on a
binary built in a different mode than the input binary.
Addresses #90661.
Speeds up file object discovery by 5-10% for large binaries:
- binary with ~1.2M symbols: 12.6422s -> 12.0297s
- binary with ~4.5M symbols: 48.8851s -> 43.7315s
This change speeds up fragment matching for large BOLTed binaries where
all fragments of global parent functions are put under the
`bolt-pseudo.o` file symbol:
- before: iterate over symbols under `bolt-pseudo.o` only to fail
  to find a parent,
- after: bail out immediately and use a global parent by name.
Test Plan: NFC, updated register-fragments-bolt-symbols.s
`BoltAddressTranslation::getFallthroughsInTrace` iterates over address
translation map entries and therefore has direct access to both original
and translated offsets. Return the translated offsets in the
fall-throughs list to avoid duplicate address translation inside
`doTrace`.
Test Plan: NFC
Address NFC mismatches caused by running perf2bolt from under the
wrapper script:
https://lab.llvm.org/buildbot/#/builders/92/builds/20938
> <stdin>:2:64: note: possible intended match here
> /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/bin/llvm-bolt.old: -spe is available only on AArch64.
Test Plan:
ninja check-bolt
Intel's Architectural LBR supports capturing branch type information
as part of the LBR stack (SDM Vol 3B, part 2, October 2024):
```
20.1.3.2 Branch Types
The IA32_LBR_x_INFO.BR_TYPE and IA32_LER_INFO.BR_TYPE fields encode
the branch types as shown in Table 20-3.
Table 20-3. IA32_LBR_x_INFO and IA32_LER_INFO Branch Type Encodings
Encoding | Branch Type
0000B | COND
0001B | NEAR_IND_JMP
0010B | NEAR_REL_JMP
0011B | NEAR_IND_CALL
0100B | NEAR_REL_CALL
0101B | NEAR_RET
011xB | Reserved
1xxxB | OTHER_BRANCH
For a list of branch operations that fall into the categories above,
see Table 20-2.
Table 20-2. Branch Type Filtering Details
Branch Type | Operations Recorded
COND | Jcc, J*CXZ, and LOOP*
NEAR_IND_JMP | JMP r/m*
NEAR_REL_JMP | JMP rel*
NEAR_IND_CALL | CALL r/m*
NEAR_REL_CALL | CALL rel* (excluding CALLs to the next sequential IP)
NEAR_RET | RET (0C3H)
OTHER_BRANCH | JMP/CALL ptr*, JMP/CALL m*, RET (0C8H), SYS*,
interrupts, exceptions (other than debug exceptions), IRET, INT3,
INTn, INTO, TSX Abort, EENTER, ERESUME, EEXIT, AEX, INIT, SIPI, RSM
```
The Linux kernel can preserve the branch type when `save_type` is
enabled, even if the CPU does not support Architectural LBR:
f09079bd04/tools/perf/Documentation/perf-record.txt (L457-L460)
> - save_type: save branch type during sampling in case binary is not
available later.
On platforms with Intel Arch LBR support (12th-gen+ client or 4th-gen
Xeon+ server), branch type saving is unconditionally enabled when taken
branch stack sampling is enabled.
Kernel-reported branch type values:
8c6bc74c7f/include/uapi/linux/perf_event.h (L251-L269)
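For example, branch type capture can be requested at collection time via
perf's branch filter:
```bash
perf record -j any,u,save_type -- ./binary
```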
This information is needed to disambiguate external returns (from
DSO/JIT) to an entry point or a landing pad, when BOLT can't
disassemble the branch source.
This patch adds new pre-aggregated types:
- return trace (R),
- external return fall-through (r).
For such types, the checks for fall-through start (not an entry or
a landing pad) are relaxed.
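A hypothetical pre-aggregated input illustrating the two new kinds,
assuming `R` uses the same field layout as the `T` trace record shown
earlier and `r` mirrors the fall-through layout:
```
R 7f21b171cf6 401000 401040 1200
r 401000 401040 800
```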
Depends on #143295.
Test Plan: updated callcont-fallthru.s
Since Linux 6.14, perf has been able to report SPE branch events
using the `brstack` format, which matches the layout of LBR/BRBE.
This patch reuses the existing LBR parsing logic to support SPE.
Example SPE brstack format:
```bash
perf script -i perf.data -F pid,brstack --itrace=bl
```
```
PID FROM / TO / PREDICTED
16984 0x72e342e5f4/0x72e36192d0/M/-/-/11/RET/-
16984 0x72e7b8b3b4/0x72e7b8b3b8/PN/-/-/11/COND/-
16984 0x72e7b92b48/0x72e7b92b4c/PN/-/-/8/COND/-
16984 0x72eacc6b7c/0x760cc94b00/P/-/-/9/RET/-
16984 0x72e3f210fc/0x72e3f21068/P/-/-/4//-
16984 0x72e39b8c5c/0x72e3627b24/P/-/-/4//-
16984 0x72e7b89d20/0x72e7b92bbc/P/-/-/4/RET/-
```
SPE brstack flags can be two characters long: `PN` or `MN`:
- `P` = predicted branch
- `M` = mispredicted branch
- `N` = optionally appears when the branch is NOT taken
- the flag is relevant only to conditional branches (e.g., `PN` in the
  samples above marks a predicted conditional branch that was not taken)
Example of usage with BOLT:
1. Capture SPE branch events:
```bash
perf record -e 'arm_spe_0/branch_filter=1/u' -- binary
```
2. Convert profile for BOLT:
```bash
perf2bolt -p perf.data -o perf.fdata --spe binary
```
3. Run BOLT Optimization:
```bash
llvm-bolt binary -o binary.bolted --data perf.fdata ...
```
A unit test verifies the parsing of the 'SPE brstack format'.
---------
Co-authored-by: Paschalis Mpeis <paschalis.mpeis@arm.com>
Some instruction-printing code used under LLVM_DEBUG does not handle CFI
instructions well. While CFI instructions seem to be harmless for the
correctness of the analysis results, they do not convey any useful
information to the analysis either, so skip them early.
Implement the detection of authentication instructions whose results can
be inspected by an attacker to know whether authentication succeeded.
As the properties of output registers of authentication instructions are
inspected, add a second set of analysis-related classes to iterate over
the instructions in reverse order.
Call continuation logic relies on assumptions about fall-through origin:
- the branch is external to the function,
- fall-through start is at the beginning of the block,
- the block is not an entry point or a landing pad.
Leverage trace information to explicitly check whether the origin is a
return instruction, and defer to the checks above only in the case of a
DSO-external branch source.
This covers both regular and BAT cases, addressing call continuation
fall-through undercounting in the latter mode, which improves BAT
profile quality metrics. For example, for one large binary:
- CFG discontinuity 21.83% -> 0.00%,
- CFG flow imbalance 10.77%/100.00% -> 3.40%/13.82% (weighted/worst)
- CG flow imbalance 8.49% -> 8.49%.
Depends on #143289.
Test Plan: updated callcont-fallthru.s
Consistently apply traces as defined in #127125 for branch profile
aggregation. This combines branches and fall-through records into one.
With large input binaries/profiles, the speed up in aggregation time
(`-time-aggr`, wall time):
- perf.data, pre-BOLT input: 154.5528s -> 144.0767s
- pre-aggregated data, pre-BOLT input: 15.1026s -> 9.0711s
- pre-aggregated data, BOLTed input: 15.4871s -> 10.0077s
Test Plan: NFC
The CMake flag LLVM_APPEND_VC_REV can be passed when building BOLT to
prevent embedding the VC revision. This patch enables this
functionality.
Usage: `-DLLVM_APPEND_VC_REV=OFF` when running CMake.
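For example:
```bash
cmake -G Ninja -DLLVM_ENABLE_PROJECTS=bolt -DLLVM_APPEND_VC_REV=OFF ../llvm
```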