llvm-project

Author	SHA1	Message	Date
Fangrui Song	34c7b7ccae	MCSymbol: Remove setUndefined The name is misleading, as setting Fragment to nullptr does not necessarily make it undefined - common and equated symbols have a nullptr fragment as well.	2025-08-17 15:57:27 -07:00
Amir Ayupov	49847148d4	[BOLT] Fix density for jump-through functions (#145619 ) Address the issue that stems from how the density is computed. Binary function density is the ratio of its total dynamic number of executed bytes over the static size in bytes. The meaning of it is the amount of dynamic profile information relative to its static size. Binary profile density is the minimum function density among well- -profiled functions, taken as functions covering p99 samples, or, in other words, excluding functions in the tail 1% of samples. p99 is an arbitrary cutoff. The meaning of profile density is the minimum amount of profile information per function to be able to optimize the program well. The threshold for profile density is set empirically. The dynamically executed bytes are taken directly from LBR fall-throughs and for LBRs recorded in trampoline functions, such as ``` 000000001a941ec0 <Sleef_expf8_u10>: 1a941ec0: jmpq *0x37b911fa(%rip) # <pnt_expf8_u10> 1a941ec6: nopw %cs:(%rax,%rax) ``` the fall-through has zero length: ``` # Branch Target NextBranch Count T 1b171cf6 1a941ec0 1a941ec0 568562 ``` But it's not correct to say this function has zero executed bytes, just the size of the next branch is not included in the fall-through. If such functions have non-trivial sample count, they will fall in p99 samples, and cause the profile density to be zero. To solve this, we can either: 1. Include fall-through end jump size into executed bytes: is logically sound but technically challenging: the size needs to come from disassembly (expensive), and the threshold need to be reevaluated with updated definition of binary function density. 2. Exclude pass-through functions from density computation: follows the intent of profile density which is to set the amount of profile information needed to optimize the function well. Single instruction pass-through functions don't need samples many times the size to be optimized well. Go with option 2 as a reasonable compromise. Test Plan: added bolt/test/X86/zero-density.s	2025-06-25 22:20:37 -07:00
Kazu Hirata	e401fb8c47	[BOLT] Use llvm::replace (NFC) (#140199 )	2025-05-16 07:30:29 -07:00
Amir Ayupov	e039d16ee5	[BOLT][NFC] Disambiguate sample as basic sample (#139350 ) Sample is a general term covering both basic (IP) and branch (LBR) profiles. Find and replace ambiguous uses of sample in a basic sample sense. Rename `RawBranchCount` into `RawSampleCount` reflecting its use for both kinds of profile. Rename `PF_LBR` profile type as `PF_BRANCH` reflecting non-LBR based branch profiles (non-brstack SPE, synthesized brstack ETM/PT). Follow-up to #137644. Test Plan: NFC	2025-05-12 17:15:16 -07:00
Maksim Panchenko	96e5ee23a7	[BOLT][AArch64] Add partial support for lite mode (#133014 ) In lite mode, we only emit code for a subset of functions while preserving the original code in .bolt.org.text. This requires updating code references in non-emitted functions to ensure that: * Non-optimized versions of the optimized code never execute. * Function pointer comparison semantics is preserved. On x86-64, we can update code references in-place using "pending relocations" added in scanExternalRefs(). However, on AArch64, this is not always possible due to address range limitations and linker address "relaxation". There are two types of code-to-code references: control transfer (e.g., calls and branches) and function pointer materialization. AArch64-specific control transfer instructions are covered by #116964. For function pointer materialization, simply changing the immediate field of an instruction is not always sufficient. In some cases, we need to modify a pair of instructions, such as undoing linker relaxation and converting NOP+ADR into ADRP+ADD sequence. To achieve this, we use the instruction patch mechanism instead of pending relocations. Instruction patches are emitted via the regular MC layer, just like regular functions. However, they have a fixed address and do not have an associated symbol table entry. This allows us to make more complex changes to the code, ensuring that function pointers are correctly updated. Such mechanism should also be portable to RISC-V and other architectures. To summarize, for AArch64, we extend the scanExternalRefs() process to undo linker relaxation and use instruction patches to partially overwrite unoptimized code.	2025-03-27 21:33:25 -07:00
Maksim Panchenko	1b8e0cf090	[BOLT] Never emit "large" functions (#115974 ) "Large" functions are functions that are too big to fit into their original slots after code modifications. CheckLargeFunctions pass is designed to prevent such functions from emission. Extend this pass to work with functions with constant islands. Now that CheckLargeFunctions covers all functions, it guarantees that we will never see such functions after code emission on all platforms (previously it was guaranteed on x86 only). Hence, we can get rid of RewriteInstance extensions that were meant to support "large" functions.	2024-11-13 09:58:44 -08:00
Shaw Young	9a9af0a23f	[BOLT] Match blocks with pseudo probes (#99891 ) Match inline trees first between profile and the binary: by GUID, checksum, parent, and inline site for inlined functions. Map profile probes to binary probes via matched inline tree nodes. Each binary probe has an associated binary basic block. If all probes from one profile basic block map to the same binary basic block, it’s an exact match, otherwise the block is determined by majority vote and reported as loose match. Pseudo probe matching happens between exact hash matching and call/loose matching. Introduce ProbeMatchSpec - a mechanism to match probes belonging to another binary function. For example, given functions foo and bar: ``` void foo() { bar(); } ``` profiled binary: bar is not inlined => have top-level function bar new binary where the profile is applied to: bar is inlined into foo. Currently, BOLT does 1:1 matching between profile functions and binary functions based on the name. #100446 will extend this to N:M where multiple profiles can be matched to one binary function (as in the example above where binary function foo would use profiles for foo and bar), and one profile can be matched to multiple binary functions (e.g. if bar was inlined into multiple functions). In this diff, ProbeMatchSpecs would only have one BinaryFunctionProfile (existing name-based matching). Test Plan: Added match-blocks-with-pseudo-probes.test Performance test: - Setup: - Baseline no-BOLT: Clang with pseudo probes, ThinLTO + CSSPGO (#79942) - BOLT fresh: BOLTed Clang using fresh profile, - BOLT stale (hash): BOLTed Clang using stale profile (collected on Clang 10K commits back), `-infer-stale-profile` (hash+call block matching) - BOLT stale (+probe): BOLTed Clang using stale profile, `-infer-stale-profile` with `-stale-matching-with-pseudo-probes` (hash+call+pseudo probe block matching) - 2S Intel SKX Xeon 6138 with 40C/80T and 256GB RAM, using 20C/40T for build, - BOLT profiles are collected on Clang compiling large preprocessed C++ file. - Benchmark: building Clang (average of 5 runs), see driver in aaupov/llvm-devmtg-2022 - Results, wall time, lower is better: - Baseline no-BOLT: 429.52 +- 2.61s, - BOLT stale (hash): 413.21 +- 2.19s, - BOLT stale (+probe): 409.69 +- 1.41s, - BOLT fresh: 384.50 +- 1.80s. --------- Co-authored-by: Amir Ayupov <aaupov@fb.com>	2024-11-12 07:21:03 -08:00
Amir Ayupov	6ee5ff95ab	[BOLT] Add profile density computation Reuse the definition of profile density from llvm-profgen (#92144): - the density is computed in perf2bolt using raw samples (perf.data or pre-aggregated data), - function density is the ratio of dynamically executed function bytes to the static function size in bytes, - profile density: - functions are sorted by density in decreasing order, accumulating their respective sample counts, - profile density is the smallest density covering 99% of total sample count. In other words, BOLT binary profile density is the minimum amount of profile information per function (excluding functions in tail 1% sample count) which is sufficient to optimize the binary well. The density threshold of 60 was determined through experiments with large binaries by reducing the sample count and checking resulting profile density and performance. The threshold is conservative. perf2bolt would print the warning if the density is below the threshold and suggest to increase the sampling duration and/or frequency to reach a given density, e.g.: ``` BOLT-WARNING: BOLT is estimated to optimize better with 2.8x more samples. ``` Test Plan: updated pre-aggregated-perf.test Reviewers: maksfb, wlei-llvm, rafaelauler, ayermolo, dcci, WenleiHe Reviewed By: WenleiHe, wlei-llvm Pull Request: https://github.com/llvm/llvm-project/pull/101094	2024-10-24 18:30:59 -07:00
Amir Ayupov	344228ebf4	[BOLT] Drop macro-fusion alignment (#97358 ) 9d0754ada5dbbc0c009bcc2f7824488419cc5530 dropped MC support required for optimal macro-fusion alignment in BOLT. Remove the support in BOLT as performance measurements with large binaries didn't show a significant improvement. Test Plan: macro-fusion alignment was never upstreamed, so no upstream tests are affected.	2024-07-02 09:20:41 -07:00
aengelke	c1a7c5ac73	[MC] Eliminate two symbol-related hash maps (#95464 ) Previously, a symbol insertion requires (at least) three hash table operations: - Lookup/create entry in Symbols (main symbol table) - Lookup NextUniqueID to deduplicate identical temporary labels - Add entry to UsedNames, which is also used to serve as storage for the symbol name in the MCSymbol. All three lookups are done with the same name, so combining these into a single table reduces the number of lookups to one. Thus, a pointer to a symbol table entry can be passed to createSymbol to avoid a duplicate lookup of the same name. The new symbol table entry value is placed in a separate header to avoid including MCContext in MCSymbol or vice versa.	2024-06-20 11:36:11 +02:00
shaw young	c8fc234ee2	[BOLT][NFC] Eliminate uses of throwing std::map::at (#92950 ) Remove calls to std::unordered_map::at, std::map::at, and std::vector::at.	2024-05-22 09:27:14 -07:00
Amir Ayupov	a9b67490b2	[BOLT] Report adjusted program stats from perf2bolt in BAT mode (#91683 )	2024-05-21 18:54:15 -07:00
Amir Ayupov	878642954f	[BOLT] Fix preserved offset in fixDoubleJumps (#92485 )	2024-05-19 13:23:04 -07:00
Amir Ayupov	54b17fa4ee	[BOLT] Preserve Offset annotation in fixDoubleJumps (#91898 ) Offset annotation was missed when optimizing an unconditional branch to a tail call. Test Plan: update bb-with-two-tail-calls.s	2024-05-13 12:31:48 -07:00
Amir Ayupov	6b9bca8faa	[BOLT] Preserve Offset annotation in SCTC (#91693 ) Offset annotation is used in writing BAT tables. Test Plan: updated sctc-bug4.test	2024-05-10 13:20:51 -07:00
Maksim Panchenko	6b1cf00400	[BOLT] Add support for Linux kernel static keys jump table (#86090 ) Runtime code modification used by static keys is the most ubiquitous self-modifying feature of the Linux kernel. The idea is to to eliminate the condition check and associated conditional jump on a hot path if that condition (based on a boolean value of a static key) does not change often. Whenever they condition changes, the kernel runtime modifies all code paths associated with that key flipping the code between nop and (unconditional) jump.	2024-03-21 14:05:21 -07:00
Maksim Panchenko	bba790db47	[BOLT] Refactor instruction creation interface. NFCI (#85292 ) Refactor MCPlusBuilder's create{Instruction}() functions that used to return bool. We almost never check the return value as we rely on llvm_unreachable() to detect unimplemented functionality. There were a couple of cases that checked the return value, but they would hit the unreachable condition first (at least in debug builds) before the return value gets checked.	2024-03-14 13:17:17 -07:00
Amir Ayupov	52cf07116b	[BOLT][NFC] Log through JournalingStreams (#81524 ) Make core BOLT functionality more friendly to being used as a library instead of in our standalone driver llvm-bolt. To accomplish this, we augment BinaryContext with journaling streams that are to be used by most BOLT code whenever something needs to be logged to the screen. Users of the library can decide if logs should be printed to a file, no file or to the screen, as before. To illustrate this, this patch adds a new option `--log-file` that allows the user to redirect BOLT logging to a file on disk or completely hide it by using `--log-file=/dev/null`. Future BOLT code should now use `BinaryContext::outs()` for printing important messages instead of `llvm::outs()`. A new test log.test enforces this by verifying that no strings are print to screen once the `--log-file` option is used. In previous patches we also added a new BOLTError class to report common and fatal errors, so code shouldn't call exit(1) now. To easily handle problems as before (by quitting with exit(1)), callers can now use `BinaryContext::logBOLTErrorsAndQuitOnFatal(Error)` whenever code needs to deal with BOLT errors. To test this, we have fatal.s that checks we are correctly quitting and printing a fatal error to the screen. Because this is a significant change by itself, not all code was yet ported. Code from Profiler libs (DataAggregator and friends) still print errors directly to screen. Co-authored-by: Rafael Auler <rafaelauler@fb.com> Test Plan: NFC	2024-02-12 14:53:53 -08:00
Amir Ayupov	13d60ce2f2	[BOLT][NFC] Propagate BOLTErrors from Core, RewriteInstance, and passes (2/2) (#81523 ) As part of the effort to refactor old error handling code that would directly call exit(1), in this patch continue the migration on libCore, libRewrite and libPasses to use the new BOLTError class whenever a failure occurs. Test Plan: NFC Co-authored-by: Rafael Auler <rafaelauler@fb.com>	2024-02-12 14:51:15 -08:00
Amir Ayupov	fa7dd4919a	[BOLT][NFC] Add BOLTError and return it from passes (1/2) (#81522 ) As part of the effort to refactor old error handling code that would directly call exit(1), in this patch we add a new class BOLTError and auxiliary functions `createFatalBOLTError()` and `createNonFatalBOLTError()` that allow BOLT code to bubble up the problem to the caller by using the Error class as a return type (or Expected). Also changes passes to use these. Co-authored-by: Rafael Auler <rafaelauler@fb.com> Test Plan: NFC	2024-02-12 14:39:59 -08:00
Amir Ayupov	a5f3d1a803	[BOLT][NFC] Return Error from BinaryFunctionPass::runOnFunctions (#81521 ) As part of the effort to refactor old error handling code that would directly call exit(1), in this patch we change the interface to `BinaryFunctionPass` to return an Error on `runOnFunctions()`. This gives passes the ability to report a serious problem to the caller (RewriteInstance class), so the caller may decide how to best handle the exceptional situation. Co-authored-by: Rafael Auler <rafaelauler@fb.com> Test Plan: NFC	2024-02-12 14:36:12 -08:00
Maksim Panchenko	7fe97f0420	[BOLT] Always run CheckLargeFunctions in non-relocation mode (#80922 ) We run CheckLargeFunctions pass in non-relocation mode to prevent the emission of functions that later could not be written to the output due to their large size. The main reason behind the pass is to prevent the emission of metadata for such functions since this metadata becomes incorrect if the function is left unmodified. Currently, the pass is enabled in non-relocation mode only when debug info output is also enabled. As we emit increasingly more kinds of metadata, e.g. for the Linux kernel, it becomes more challenging to track metadata that needs to be fixed. Hence, I'm enabling the pass to always run in non-relocation mode.	2024-02-08 14:21:49 -08:00
Maksim Panchenko	8ea7f1d20a	[BOLT][NFCI] Keep instruction annotations (#80382 ) We used to delete most instruction annotations before code emission. It was done to release memory taken by annotations and to reduce overall memory consumption. However, since the implementation of annotations has moved to using existing instruction operands, the memory overhead associated with them has reduced drastically. I measured that savings are less than 0.5% on large binaries and processing time is just slightly reduced if we keep them. Additionally, I plan to use annotations in pre-emission passes for the Linux kernel rewriter.	2024-02-06 19:59:53 -08:00
Amir Ayupov	3c64b24ed3	[BOLT] Add extra staleness logging (#80225 ) Report two extra metrics: - # of stale functions with matching block count, - # of stale blocks with matching instruction count.	2024-02-01 07:16:40 -08:00
Amir Ayupov	e9309b27d7	[BOLT] Report input staleness (#79496 ) It's beneficial to have uniform reporting in both `infer-stale-profile` on and off cases, primarily for logging purposes. Without this change, BOLT would report "input" staleness in `infer-stale-profile=0` case (without matching), and "output" staleness in `infer-stale-profile=1` case (after matching). This change makes BOLT report "input" staleness in both cases. "Output" staleness information is printed separately with "BOLT-INFO: inferred profile..."	2024-01-25 14:15:13 -08:00
Maksim Panchenko	f633f325a1	[BOLT] Fix NOP instruction emission on x86 (#72186 ) Use MCAsmBackend::writeNopData() interface to emit NOP instructions on x86. There are multiple forms of NOP instruction on x86 with different sizes. Currently, LLVM's assembly/disassembly does not support all forms correctly which can lead to a breakage of input code semantics, e.g. if the program relies on NOP instructions for reserving a patch space. Add "--keep-nops" option to preserve NOP instructions.	2023-11-13 18:12:39 -08:00
Maksim Panchenko	ec4a03c658	[BOLT] Enhance LowerAnnotations pass. NFCI. (#71847 ) After #70147, all primary annotation types are stored directly in the instruction and hence there's no need for the temporary storage we've used previously for repopulating preserved annotations.	2023-11-12 19:34:42 -08:00
Vladislav Khmelevsky	c6c04a83a7	[BOLT] Run EliminateUnreachableBlocks in parallel (#71299 ) The wall time for this pass decreased on my laptop from ~80 sec to 5 sec processing the clang.	2023-11-10 00:46:04 +04:00
maksfb	e28c393bd1	[BOLT] Reduce the number of emitted symbols. NFCI. (#70175 ) We emit a symbol before an instruction for a number of reasons, e.g. for tracking LocSyms, debug line, or if the instruction has a label annotation. Currently, we may emit multiple symbols per instruction. Reuse the same label instead of creating and emitting new ones when possible. I'm planning to refactor EH labels as well in a separate diff. Change getLabel() to return a pointer instead of std::optional<> since an empty label should be treated identically to no label.	2023-11-06 11:41:47 -08:00
Job Noorman	43e9eae6e8	[BOLT] Preserve label annotations for injected functions (#68713 ) Needed for instrumentation on RISC-V.	2023-10-11 07:26:20 +00:00
Job Noorman	ff5e2babcb	[BOLT] Improve handling of relocations targeting specific instructions (#66395 ) On RISC-V, there are certain relocations that target a specific instruction instead of a more abstract location like a function or basic block. Take the following example that loads a value from symbol `foo`: ``` nop 1: auipc t0, %pcrel_hi(foo) ld t0, %pcrel_lo(1b)(t0) ``` This results in two relocation: - auipc: `R_RISCV_PCREL_HI20` referencing `foo`; - ld: `R_RISCV_PCREL_LO12_I` referencing to local label `1` which points to the auipc instruction. It is of utmost importance that the `R_RISCV_PCREL_LO12_I` keeps referring to the auipc instruction; if not, the program will fail to assemble. However, BOLT currently does not guarantee this. BOLT currently assumes that all local symbols are jump targets and always starts a new basic block at symbol locations. The example above results in a CFG the looks like this: ``` .BB0: nop .BB1: auipc t0, %pcrel_hi(foo) ld t0, %pcrel_lo(.BB1)(t0) ``` While this currently works (i.e., the `R_RISCV_PCREL_LO12_I` relocation points to the correct instruction), it has two downsides: - Too many basic blocks are created (the example above is logically only one yet two are created); - If instructions are inserted in `.BB1` (e.g., by instrumentation), things will break since the label will not point to the auipc anymore. This patch proposes to fix this issue by teaching BOLT to track labels that should always point to a specific instruction. This is implemented as follows: - Add a new annotation type (`kLabel`) that allows us to annotate instructions with an `MCSymbol *`; - Whenever we encounter a relocation type that is used to refer to a specific instruction (`Relocation::isInstructionReference`), we register it without a symbol; - During disassembly, whenever we encounter an instruction with such a relocation, create a symbol for its target and store it in an offset to symbol map (to ensure multiple relocations referencing the same instruction use the same label); - After disassembly, iterate this map to attach labels to instructions via the new annotation type; - During emission, emit these labels right before the instruction. I believe the use of annotations works quite well for this use case as it allows us to reliably track instruction labels. If we were to store them as offsets in basic blocks, it would be error prone to keep them updated whenever instructions are inserted or removed. I have chosen to add labels as first-class annotations (as opposed to a generic one) because the documentation of `MCAnnotation` suggests that generic annotations are to be used for optional metadata that can be discarded without affecting correctness. As this is not the case for labels, a first-class annotation seemed more appropriate.	2023-10-06 06:46:16 +00:00
spupyrev	31e8a9f4d9	[BOLT] Add stale-related logging Adding some logs related to stale profile matching. The new data can be helpful to understand how "stale" the input profile is and how well the inference is able to utilize the stale data. Example of outputs on clang-10 built with LTO (profile collected on a year-old release): ``` BOLT-INFO: inferred profile for 2101 (18.52% of profiled, 100.00% of stale) functions responsible for 30.95% samples (14754697 out of 47670654) BOLT-INFO: stale inference matched 89.42% of basic blocks (79052 out of 88402 stale) responsible for 76.99% samples (645737 out of 838719 stale) ``` LTO+AutoFDO: ``` BOLT-INFO: inferred profile for 6146 (57.57% of profiled, 100.00% of stale) functions responsible for 90.34% samples (50891403 out of 56330313) BOLT-INFO: stale inference matched 74.55% of basic blocks (191295 out of 256589 stale) responsible for 57.30% samples (1288632 out of 2248799 stale) ``` Reviewed By: Amir, maksfb Differential Revision: https://reviews.llvm.org/D154737	2023-07-27 08:56:57 -07:00
Maksim Panchenko	deb53102a7	[BOLT] Remove unnecessary diagnostics When optimizations passes do not change anything, skip their diagnostics output. NFC otherwise. Reviewed By: Amir Differential Revision: https://reviews.llvm.org/D153386	2023-06-22 14:07:00 -07:00
spupyrev	44268271f6	[BOLT] stale profile matching [part 1 out of 2] BOLT often has to deal with profiles collected on binaries built from several revisions behind release. As a result, a certain percentage of functions is considered stale and not optimized. This diff adds an ability to match profile to functions that are not 100% binary identical, which increases the optimization coverage and boosts the performance of applications. The algorithm consists of two phases: matching and inference: - At the matching phase, we try to "guess" as many block and jump counts from the stale profile as possible. To this end, the content of each basic block is hashed and stored in the (yaml) profile. When BOLT optimizes a binary, it computes block hashes and identifies the corresponding entries in the stale profile. It yields a partial profile for every CFG in the binary. - At the inference phase, we employ a network flow-based algorithm (profi) to reconstruct "realistic" block and jump counts from the partial profile generated at the first stage. In practice, we don't always produce proper profile data but the majority (e.g., >90%) of CFGs get the correct counts. This is a first part of the change; the next stacked diff extends the block hashing and provides perf evaluation numbers. Reviewed By: maksfb Differential Revision: https://reviews.llvm.org/D144500	2023-06-06 12:13:52 -07:00
Rafael Auler	62a2feff57	[BOLT] Fix state of MCSymbols in lowering pass We have mostly harmless data races when running BinaryContext::calculateEmittedSize() in parallel, while performing split function pass. However, it is possible to end up in a state where some MCSymbols are still registered and our clean up failed. This happens rarely but it does happen, and when it happens, it is a difficult to diagnose heisenbug. To avoid this, add a new clean pass to perform a last check on MCSymbols, before they undergo our final emission pass, to verify that they are in a sane state. If we fail to do this, we might resolve some symbols to zero and crash the output binary. Reviewed By: #bolt, Amir Differential Revision: https://reviews.llvm.org/D137984	2023-05-16 14:54:16 -07:00
Amir Ayupov	be2f67c4d8	[BOLT][NFC] Replace anonymous namespace functions with static Follow LLVM Coding Standards guideline on using anonymous namespaces (https://llvm.org/docs/CodingStandards.html#anonymous-namespaces) and use `static` modifier for function definitions. Reviewed By: #bolt, maksfb Differential Revision: https://reviews.llvm.org/D143124	2023-02-06 18:05:41 -08:00
Amir Ayupov	69a9bbf106	[BOLT][NFC] Replace ambiguous BinarySection::isReadOnly with isWritable Address feedback in https://reviews.llvm.org/D102284#2755060 Reviewed By: yota9 Differential Revision: https://reviews.llvm.org/D141733	2023-01-18 14:53:07 -08:00
Maksim Panchenko	27cf96c4ec	[BOLT] Minor refactoring for -print-sorted-by option Only display used values for -print-sorted-by option when printing help. Differential Revision: https://reviews.llvm.org/D141209	2023-01-12 13:25:36 -08:00
hezuoqiang	e3b47d31ae	[BOLT] Modify the print option to a meaningful value Using the option `-print-sorted-by=.` cause to core dump, so change to a legal value. Reviewed By: maksfb Differential Revision: https://reviews.llvm.org/D140847	2023-01-09 19:05:21 -08:00
Gabriel Ravier	9966b3e728	[BOLT] Fixed some typos I went over the output of the following mess of a command: `(ulimit -m 2000000; ulimit -v 2000000; git ls-files -z \| parallel --xargs -0 cat \| aspell list --mode=none --ignore-case \| grep -E '^[A-Za-z][a-z]*$' \| sort \| uniq -c \| sort -n \| grep -vE '.{25}' \| aspell pipe -W3 \| grep : \| cut -d' ' -f2 \| less)` and proceeded to spend a few days looking at it to find probable typos and fixed a few hundred of them in all of the llvm project (note, the ones I found are not anywhere near all of them, but it seems like a good start). Reviewed By: Amir, maksfb Differential Revision: https://reviews.llvm.org/D130824	2022-09-30 17:07:04 +02:00
Amir Ayupov	90d87dbf4b	[BOLT] Report BB reordering %-age vs profiled and total number of functions Reviewed By: spupyrev Differential Revision: https://reviews.llvm.org/D134819	2022-09-29 12:35:45 +02:00
Amir Ayupov	873942e178	[BOLT] Change reorder-blocks deprecated option warning output Revert to using `BOLT-WARNING` Reviewed By: #bolt, maksfb Differential Revision: https://reviews.llvm.org/D132778	2022-09-08 15:48:41 -07:00
Fabian Parzefall	07f63b0ac5	[BOLT] Allocate FunctionFragment on heap This changes `FunctionFragment` from being used as a temporary proxy object to access basic block ranges to a heap-allocated object that can store fragment-specific information. Reviewed By: rafauler Differential Revision: https://reviews.llvm.org/D132050	2022-08-24 18:06:08 -07:00
Fabian Parzefall	d5c03def24	[BOLT] Towards FunctionLayout const-correctness A const-qualified reference to function layout allows accessing non-const qualified basic blocks on a const-qualified function. This patch adds or removes const-qualifiers where necessary to indicate where basic blocks are used in a non-const manner. Reviewed By: rafauler Differential Revision: https://reviews.llvm.org/D132049	2022-08-24 16:32:33 -07:00
Fabian Parzefall	f24c299e7d	Revert "[BOLT] Towards FunctionLayout const-correctness" This reverts commit 587d2653420d75ef10f30bd612d86f1e08fe9ea7.	2022-08-24 10:51:38 -07:00
Fabian Parzefall	5065134aa0	Revert "[BOLT] Allocate FunctionFragment on heap" This reverts commit 101344af1af82d1633c773b718788eaa813d7f79.	2022-08-24 10:51:36 -07:00
Fabian Parzefall	101344af1a	[BOLT] Allocate FunctionFragment on heap This changes `FunctionFragment` from being used as a temporary proxy object to access basic block ranges to a heap-allocated object that can store fragment-specific information. Reviewed By: rafauler Differential Revision: https://reviews.llvm.org/D132050	2022-08-24 10:17:17 -07:00
Fabian Parzefall	587d265342	[BOLT] Towards FunctionLayout const-correctness A const-qualified reference to function layout allows accessing non-const qualified basic blocks on a const-qualified function. This patch adds or removes const-qualifiers where necessary to indicate where basic blocks are used in a non-const manner. Reviewed By: rafauler Differential Revision: https://reviews.llvm.org/D132049	2022-08-24 10:17:17 -07:00
Fabian Parzefall	a191ea7d59	[BOLT] Make exception handling fragment aware This adds basic fragment awareness in the exception handling passes and generates the necessary symbols for fragments. Reviewed By: rafauler Differential Revision: https://reviews.llvm.org/D130520	2022-08-18 21:55:06 -07:00
Fabian Parzefall	aed75748de	[BOLT] Remove old layout from function layout To track whether a function's new layout is different from its old layout when updating it, the old layout would be kept around in memory indefinitely (if the new layout is different). This was used only for debugging/logging purposes. This patch forces the caller of function layout's update method to copy the old layout into a temporary if they need it by removing the old layout fields. Reviewed By: rafauler Differential Revision: https://reviews.llvm.org/D131413	2022-08-17 15:06:17 -07:00

1 2

74 Commits