llvm-project

Author	SHA1	Message	Date
Maksim Panchenko	2db9b6a93f	[BOLT] Make instruction size a first-class annotation (#72167 ) When NOP instructions are used to reserve space in the code, e.g. for patching, it becomes critical to preserve their original size while emitting the code. On x86, we rely on "Size" annotation for NOP instructions size, as the original instruction size is lost in the disassembly/assembly process. This change makes instruction size a first-class annotation and is affectively NFCI. A follow-up diff will use the annotation for code emission.	2023-11-13 14:33:39 -08:00
Vladislav Khmelevsky	cf18f142c0	[BOLT] Read .rela.dyn in static non-pie binary (#71635 ) Static non-pie binary doesn't have DYNAMIC segment and BOLT skips reading .rela.dyn section because of it. But such binaries might have this section for example to store IFUNC relocation which is resolved by linked-in startup files, so force reading this section for static executables.	2023-11-10 11:47:12 +04:00
Vladislav Khmelevsky	c6c04a83a7	[BOLT] Run EliminateUnreachableBlocks in parallel (#71299 ) The wall time for this pass decreased on my laptop from ~80 sec to 5 sec processing the clang.	2023-11-10 00:46:04 +04:00
spaette	1a2f83366b	[BOLT] Fix typos (#68121 ) Closes https://github.com/llvm/llvm-project/issues/63097 Before merging please make sure the change to bolt/include/bolt/Passes/StokeInfo.h is correct. bolt/include/bolt/Passes/StokeInfo.h ```diff // This Pass solves the two major problems to use the Stoke program without - // proting its code: + // probing its code: ``` I'm still not happy about the awkward wording in this comment. bolt/include/bolt/Passes/FixRelaxationPass.h ``` $ ed -s bolt/include/bolt/Passes/FixRelaxationPass.h <<<'9,12p' // This file declares the FixRelaxations class, which locates instructions with // wrong targets and fixes them. Such problems usually occures when linker // relaxes (changes) instructions, but doesn't fix relocations types properly // for them. $ ``` bolt/docs/doxygen.cfg.in bolt/include/bolt/Core/BinaryContext.h bolt/include/bolt/Core/BinaryFunction.h bolt/include/bolt/Core/BinarySection.h bolt/include/bolt/Core/DebugData.h bolt/include/bolt/Core/DynoStats.h bolt/include/bolt/Core/Exceptions.h bolt/include/bolt/Core/MCPlusBuilder.h bolt/include/bolt/Core/Relocation.h bolt/include/bolt/Passes/FixRelaxationPass.h bolt/include/bolt/Passes/InstrumentationSummary.h bolt/include/bolt/Passes/ReorderAlgorithm.h bolt/include/bolt/Passes/StackReachingUses.h bolt/include/bolt/Passes/StokeInfo.h bolt/include/bolt/Passes/TailDuplication.h bolt/include/bolt/Profile/DataAggregator.h bolt/include/bolt/Profile/DataReader.h bolt/lib/Core/BinaryContext.cpp bolt/lib/Core/BinarySection.cpp bolt/lib/Core/DebugData.cpp bolt/lib/Core/DynoStats.cpp bolt/lib/Core/Relocation.cpp bolt/lib/Passes/Instrumentation.cpp bolt/lib/Passes/JTFootprintReduction.cpp bolt/lib/Passes/ReorderData.cpp bolt/lib/Passes/RetpolineInsertion.cpp bolt/lib/Passes/ShrinkWrapping.cpp bolt/lib/Passes/TailDuplication.cpp bolt/lib/Rewrite/BoltDiff.cpp bolt/lib/Rewrite/DWARFRewriter.cpp bolt/lib/Rewrite/RewriteInstance.cpp bolt/lib/Utils/CommandLineOpts.cpp bolt/runtime/instr.cpp bolt/test/AArch64/got-ld64-relaxation.test bolt/test/AArch64/unmarked-data.test bolt/test/X86/Inputs/dwarf5-cu-no-debug-addr-helper.s bolt/test/X86/Inputs/linenumber.cpp bolt/test/X86/double-jump.test bolt/test/X86/dwarf5-call-pc-function-null-check.test bolt/test/X86/dwarf5-split-dwarf4-monolithic.test bolt/test/X86/dynrelocs.s bolt/test/X86/fallthrough-to-noop.test bolt/test/X86/tail-duplication-cache.s bolt/test/runtime/X86/instrumentation-ind-calls.s	2023-11-09 11:29:46 -08:00
Job Noorman	96b5e092dc	[BOLT] Support instrumentation hook via DT_FINI_ARRAY (#67348 ) BOLT currently hooks its its instrumentation finalization function via `DT_FINI`. However, this method of calling finalization routines is not supported anymore on newer ABIs like RISC-V. `DT_FINI_ARRAY` is preferred there. This patch adds support for hooking into `DT_FINI_ARRAY` instead if the binary does not have a `DT_FINI` entry. If it does, `DT_FINI` takes precedence so this patch should not change how the currently supported instrumentation targets behave. `DT_FINI_ARRAY` points to an array in memory of `DT_FINI_ARRAYSZ` bytes. It consists of pointer-length entries that contain the addresses of finalization functions. However, the addresses are only filled-in by the dynamic linker at load time using relative relocations. This makes hooking via `DT_FINI_ARRAY` a bit more complicated than via `DT_FINI`. The implementation works as follows: - While scanning the binary: find the section where `DT_FINI_ARRAY` points to, read its first dynamic relocation and use its addend to find the address of the fini function we will use to hook; - While writing the output file: overwrite the addend of the dynamic relocation with the address of the runtime library's fini function. Updating the dynamic relocation required a bit of boiler plate: since dynamic relocations are stored in a `std::multiset` which doesn't support getting mutable references to its items, functions were added to `BinarySection` to take an existing relocation and insert a new one.	2023-11-08 11:01:10 +00:00
Vladislav Khmelevsky	e2f1a95f2a	[BOLT][AArch64] Handle IFUNCS properly (#71104 ) Currently we were testing only the binaries compiled with O0, which results in indirect call to the IFUNC trampoline and the trampoline has associated IFUNC symbol with it. Compile with O3 results in direct calling the IFUNC trampoline and no symbols are associated with it, the IFUNC symbol address becomes the same as IFUNC resolver address. Since no symbol was associated the BF was not created before PLT analyze and be the algorithm we're going to analyze target relocation. As we're expecting the JUMP relocation we're also expecting the associated symbol with it to be presented. But for IFUNC relocation the IRELATIVE relocation is used and no symbol is associated with it, the addend value is pointing on the target symbol, so we need to find BF using it and use it's symbol in this situation. Currently this is checked only for AArch64 platform, so I've limited it in code to use this logic only for this platform, although I wouldn't be surprised if other platforms needs to activate this logic too.	2023-11-08 11:41:43 +04:00
Maksim Panchenko	0df154671b	[BOLT] Use Label annotation instead of EHLabel pseudo. NFCI. (#70179 ) When we need to attach EH label to an instruction, we can now use Label annotation instead of EHLabel pseudo instruction.	2023-11-06 14:43:14 -08:00
Maksim Panchenko	b336d741d0	[BOLT] Use direct storage for Label annotations. NFCI. (#70147 ) Store the Label annotation directly in the operand and avoid the extra allocation and indirection overheads associated with MCSimpleAnnotation.	2023-11-06 14:24:55 -08:00
maksfb	74e0a26fd1	[BOLT] Modify MCPlus annotation internals. NFCI. (#70412 ) When annotating MCInst instructions, attach extra annotation operands directly to the annotated instruction, instead of attaching them to an instruction pointed to by a special kInst operand. With this change, it's no longer necessary to allocate MCInst and most of the first-class annotations come with free memory as currently MCInst is declared with: SmallVector<MCOperand, 10> Operands; i.e. more operands than are normally being used. We still create a kInst operand with a nullptr instruction value to designate the beginning of annotation operands. However, this special operand might not be needed if we can rely on MCInstrDesc::NumOperands.	2023-11-06 12:14:22 -08:00
maksfb	e28c393bd1	[BOLT] Reduce the number of emitted symbols. NFCI. (#70175 ) We emit a symbol before an instruction for a number of reasons, e.g. for tracking LocSyms, debug line, or if the instruction has a label annotation. Currently, we may emit multiple symbols per instruction. Reuse the same label instead of creating and emitting new ones when possible. I'm planning to refactor EH labels as well in a separate diff. Change getLabel() to return a pointer instead of std::optional<> since an empty label should be treated identically to no label.	2023-11-06 11:41:47 -08:00
maksfb	6e26246c22	[BOLT][DWARF] Refactor address ranges processing (#71225 ) Create BinaryFunction::translateInputToOutputRange() and use it for updating DWARF debug ranges and location lists while de-duplicating the existing code. Additionally, move DWARF-specific code out of BinaryFunction and add print functions to facilitate debugging. Note that this change is deliberately kept "bug-level" compatible with the existing solution to keep it NFCI and make it easier to track any possible regressions in the future updates to the ranges-handling code.	2023-11-06 11:10:20 -08:00
Vladislav Khmelevsky	888742a121	[BOLT][AArch64] Handle .plt.got section (#71216 ) It seems that currently this section is only created by the mold linker if 2 conditions are met: 1. The PLT function was called directly. 2. The indirect access to PLT function was found (e.g. through ADRP relocation). Although mold created symbol for every plt entry I've removed them in yaml file to check that .plt.got was truly disassembled by bolt.	2023-11-04 00:47:24 +04:00
spupyrev	287fcd38a1	[BOLT] Rename cds to cdsort (#69966 ) Unify naming for the layout algorithms by renaming "cds" to "cdsort". This is NFC unless someone is already using the new algorithm (which is unlikely).	2023-11-02 12:46:36 -07:00
maksfb	8244ff6739	[BOLT] Fix incorrect basic block output addresses (#70000 ) Some optimization passes may duplicate basic blocks and assign the same input offset to a number of different blocks in a function. This is done e.g. to correctly map debugging ranges for duplicated code. However, duplicate input offsets present a problem when we use AddressMap to generate new addresses for basic blocks. The output address is calculated based on the input offset and will be the same for blocks with identical offsets. The result is potentially incorrect debug info and BAT records. To address the issue, we have to eliminate the dependency on input offsets while generating output addresses for a basic block. Each block has a unique label, hence we extend AddressMap to include address lookup based on MCSymbol and use the new functionality to update block addresses.	2023-10-24 12:22:43 -07:00
Job Noorman	b6b492880f	[BOLT][RISCV] Set minimum function alignment to 2 for RVC (#69837 ) In #67707, the minimum function alignment on RISC-V was set to 4. When RVC (compressed instructions) is enabled, the minimum alignment can be reduced to 2. This patch implements this by delegating the choice of minimum alignment to a new `MCPlusBuilder::getMinFunctionAlignment` function. This way, the target-dependent code in `BinaryFunction` is minimized.	2023-10-23 08:09:11 +00:00
Vladislav Khmelevsky	b7944f7c04	[BOLT] Return proper minimal alignment from BF (#67707 ) Currently minimal alignment of function is hardcoded to 2 bytes. Add 2 more cases: 1. In case BF is data in code return the alignment of CI as minimal alignment 2. For aarch64 and riscv platforms return the minimal value of 4 (added test for aarch64) Otherwise fallback to returning the 2 as it previously was.	2023-10-12 09:33:08 +04:00
Job Noorman	da37139ac9	[BOLT][NFC] Add allocator id to MCPlusBuilder::setLabel (#68707 ) This will be needed for some RISC-V instrumentation functions and is also consistent with other annotation setters.	2023-10-11 07:25:46 +00:00
Job Noorman	ff5e2babcb	[BOLT] Improve handling of relocations targeting specific instructions (#66395 ) On RISC-V, there are certain relocations that target a specific instruction instead of a more abstract location like a function or basic block. Take the following example that loads a value from symbol `foo`: ``` nop 1: auipc t0, %pcrel_hi(foo) ld t0, %pcrel_lo(1b)(t0) ``` This results in two relocation: - auipc: `R_RISCV_PCREL_HI20` referencing `foo`; - ld: `R_RISCV_PCREL_LO12_I` referencing to local label `1` which points to the auipc instruction. It is of utmost importance that the `R_RISCV_PCREL_LO12_I` keeps referring to the auipc instruction; if not, the program will fail to assemble. However, BOLT currently does not guarantee this. BOLT currently assumes that all local symbols are jump targets and always starts a new basic block at symbol locations. The example above results in a CFG the looks like this: ``` .BB0: nop .BB1: auipc t0, %pcrel_hi(foo) ld t0, %pcrel_lo(.BB1)(t0) ``` While this currently works (i.e., the `R_RISCV_PCREL_LO12_I` relocation points to the correct instruction), it has two downsides: - Too many basic blocks are created (the example above is logically only one yet two are created); - If instructions are inserted in `.BB1` (e.g., by instrumentation), things will break since the label will not point to the auipc anymore. This patch proposes to fix this issue by teaching BOLT to track labels that should always point to a specific instruction. This is implemented as follows: - Add a new annotation type (`kLabel`) that allows us to annotate instructions with an `MCSymbol *`; - Whenever we encounter a relocation type that is used to refer to a specific instruction (`Relocation::isInstructionReference`), we register it without a symbol; - During disassembly, whenever we encounter an instruction with such a relocation, create a symbol for its target and store it in an offset to symbol map (to ensure multiple relocations referencing the same instruction use the same label); - After disassembly, iterate this map to attach labels to instructions via the new annotation type; - During emission, emit these labels right before the instruction. I believe the use of annotations works quite well for this use case as it allows us to reliably track instruction labels. If we were to store them as offsets in basic blocks, it would be error prone to keep them updated whenever instructions are inserted or removed. I have chosen to add labels as first-class annotations (as opposed to a generic one) because the documentation of `MCAnnotation` suggests that generic annotations are to be used for optional metadata that can be discarded without affecting correctness. As this is not the case for labels, a first-class annotation seemed more appropriate.	2023-10-06 06:46:16 +00:00
Job Noorman	8fb83bf5f1	[BOLT][NFC] Add MCSubtargetInfo to MCPlusBuilder (#68223 ) On RISC-V, it's helpful to have access to `MCSubtargetInfo` while generating instructions in `MCPlusBuilder`. For example, a return instruction might be generated differently based on if the target supports compressed instructions (`c.jr ra`) or not (`jalr ra`).	2023-10-06 06:39:58 +00:00
Rafael Auler	853e126ce3	[BOLT] Support input binaries that use R_X86_GOTPC64 In large code model, the address of GOT is calculated by the static linker via R_X86_GOTPC64 reloc applied against a MOVABSQ instruction. In the final binary, it can be disassembled as a regular immediate, but because such immediate is the result of PC-relative pointer arithmetic, we need to parse this relocation and update this calculation whenever we move code, otherwise we break the code trying to read GOT. A test case showing how GOT is accessed was provided. Reviewed By: #bolt, maksfb Differential Revision: https://reviews.llvm.org/D158911	2023-10-02 23:12:44 -07:00
Vladislav Khmelevsky	846eb76761	[BOLT][AArch64] Fix instrumentation deadloop According to ARMv8-a architecture reference manual B2.10.5 software must avoid having any explicit memory accesses between exclusive load and associated store instruction. Otherwise exclusive monitor might clear the exclusivity without application-related cause which may result in the deadloop. Disable instrumentation for such functions, since between exclusive load and store there might be branches and we would insert instrumentation snippet which contains loads and stores. The better solution would be to analyze with BFS finding the exact BBs between load and store and not instrumenting them. Or even better to recognize such sequences and replace them with more complex one, e.g. loading value non exclusively, and for the brach where exclusive store is made make exclusive load and store sequentially, but for now just disable instrumentation for such functions completely. Differential Revision: https://reviews.llvm.org/D159520	2023-09-22 00:58:01 +04:00
Kristof Beyls	8fb28e45ce	[BOLT] Fix data race in MCPlusBuilder::getOrCreateAnnotationIndex (#67004 ) MCPlusBuilder::getOrCreateAnnotationIndex(Name) can be called from different threads, for example when making use of ParallelUtilities::runOnEachFunctionWithUniqueAllocId. The race occurs when an Index for a particular annotation Name needs to be created for the first time. For example, this can easily happen when multiple "copies" of an analysis pass run on different BinaryFunctions, and the analysis pass creates a new Annotation Index to be able to store analysis results as annotations. This was found by using the ThreadSanitizer. No regression test was added; I don't think there is good way to write regression tests that verify the absence of data races? --------- Co-authored-by: Amir Ayupov <fads93@gmail.com>	2023-09-21 19:53:09 +02:00
Job Noorman	dc925be68b	[BOLT][RISCV] Carry-over annotations when fixing calls (#66763 ) `FixRISCVCallsPass` changes all different forms of calls to `PseudoCALL` instructions. However, the original call's annotations were lost in the process. This patch fixes this by moving all annotations from the old to the new call. `MCPlusBuilder::moveAnnotations` had to be made public for this.	2023-09-21 06:37:47 +00:00
Job Noorman	c5ba61978c	[BOLT][RISCV] Add support for linker relaxation Calls on RISC-V are typically compiled to `auipc`/`jalr` pairs to allow a maximum target range (32-bit pc-relative). In order to optimize calls to near targets, linker relaxation may replace those pairs with, for example, single `jal` instructions. To allow BOLT to freely reassign function addresses in relaxed binaries, this patch proposes the following approach: - Expand all relaxed calls back to `auipc`/`jalr`; - Rely on JITLink to relax those back to shorter forms where possible. This is implemented by detecting all possible call instructions and replacing them with `PseudoCALL` (or `PseudoTAIL`) instructions. The RISC-V backend then expands those and adds the necessary relocations for relaxation. Since BOLT generally ignores pseudo instruction, this patch makes `MCPlusBuilder::isPseudo` virtual so that `RISCVMCPlusBuilder` can override it to exclude `PseudoCALL` and `PseudoTAIL`. To ensure JITLink knows about the correct section addresses while relaxing, reassignment of addresses has been moved to a post-allocation pass. Note that this is probably the time it had to be done in the first place since in `notifyResolved` (where it was done before), all symbols are supposed to be resolved already. Depends on D159082 Reviewed By: maksfb Differential Revision: https://reviews.llvm.org/D159089	2023-09-15 11:57:28 +02:00
Amir Ayupov	7b750943d7	[BOLT][NFC] Speedup YAML profile processing Reduce YAML profile processing times: - preprocessProfile: speed up buildNameMaps by replacing ProfileNameToProfile mapping with ProfileFunctionNames set and ProfileBFs vector. Pre-look up YamlBF->BF correspondence, memoize in ProfileBFs. - readProfile: replace iteration over all functions in the binary by iteration over profile functions (strict match and LTO name match). On a large binary (1.9M functions) and large YAML profile (121MB, 30k functions) reduces profile steps runtime: pre-process profile data: 12.4953s -> 10.7123s process profile data: 9.8195s -> 5.6639s Compared to fdata profile reading: pre-process profile data: 8.0268s process profile data: 1.0265s process profile data pre-CFG: 0.1644s Reviewed By: #bolt, maksfb Differential Revision: https://reviews.llvm.org/D159460	2023-09-11 16:07:57 -07:00
Job Noorman	eafe4ee2e8	[BOLT] Rename isLoad/isStore to mayLoad/mayStore As discussed in D159266, for some instructions it's impossible to know statically if they will load/store (e.g., predicated instructions). Therefore, mayLoad/mayStore are more appropriate names.	2023-09-01 09:36:05 +02:00
Job Noorman	76f040bda6	[BOLT] Provide generic implementations for isLoad/isStore `MCInstrDesc` provides the `mayLoad` and `mayStore` flags that seem appropriate to use as a target-independent way to implement `isLoad` and `isStore`. I believe this is currently good enough to use for the RISC-V target as well. I've provided a test for this that checks the generated dyno stats (which seems to be the only thing both `isLoad` and `isStore` are used for). Reviewed By: maksfb Differential Revision: https://reviews.llvm.org/D159266	2023-09-01 09:36:05 +02:00
spupyrev	1256ef274c	[BOLT] Fine-tuning hash computation for stale matching Fine-tuning hash computation for stale matching: - introducing a new "loose" basic block hash that allows to match many more blocks than before; - tweaking params of the inference algorithm that find (slightly) better solutions; - added more meaningful tests for stale matching. Tested the changes on several open-source benchmarks (clang, rocksdb, chrome) and one prod workload using different compiler modes (LTO/PGO etc). There is always an improvement in the quality of inferred profiles. (The current implementation is still not optimal but the diff is a step forward; I am open to further suggestions) Reviewed By: Amir Differential Revision: https://reviews.llvm.org/D156278	2023-08-31 07:29:02 -07:00
Job Noorman	475a93a07a	[BOLT] Calculate output values using BOLTLinker BOLT uses `MCAsmLayout` to calculate the output values of functions and basic blocks. This means output values are calculated based on a pre-linking state and any changes to symbol values during linking will cause incorrect values to be used. This issue can be triggered by enabling linker relaxation on RISC-V. Since linker relaxation can remove instructions, symbol values may change. This causes, among other things, the symbol table created by BOLT in the output executable to be incorrect. This patch solves this issue by using `BOLTLinker` to get symbol values instead of `MCAsmLayout`. This way, output values are calculated based on a post-linking state. To make sure the linker can update all necessary symbols, this patch also makes sure all these symbols are not marked as temporary so that they end-up in the object file's symbol table. Note that this patch only deals with symbols of binary functions (`BinaryFunction::updateOutputValues`). The technique described above turned out to be too expensive for basic block symbols so those are handled differently in D155604. Reviewed By: maksfb Differential Revision: https://reviews.llvm.org/D154604	2023-08-28 10:13:07 +02:00
Kazu Hirata	d791fa26a9	[BOLT] Use SmallPtrSet::contains (NFC)	2023-08-27 13:18:38 -07:00
Elvina Yakubova	6e4c230525	[BOLT][Instrumentation] Initial instrumentation support for AArch64 This commit adds code generation for AArch64 instrumentation, including direct and indirect calls support. Reviewed By: rafauler, yota9 Differential Revision: https://reviews.llvm.org/D151899	2023-08-24 19:34:57 +03:00
Denis Revunov	28fd2ca142	[BOLT] Fix trap value for non-X86 The trap value used by BOLT was assumed to be single-byte instruction. It made some functions unaligned on AArch64(e.g exceptions-instrumentation test) and caused emission failures. Fix that by changing fill value to StringRef. Reviewed By: rafauler Differential Revision: https://reviews.llvm.org/D158191	2023-08-24 01:29:41 +03:00
zhoujiapeng	9fee2ac044	[BOLT][NFC] Split createRelocation in X86 and share the second part This commit splits the createRelocation function for the X86 architecture into two parts, retaining the first half and moving the second half to a new function called extractFixupExpr. The purpose of this change is to make extractFixupExpr a shared function between AArch64 and X86 architectures, increasing code reusability and maintainability. Child revision: https://reviews.llvm.org/D156018 Reviewed By: Amir Differential Revision: https://reviews.llvm.org/D157217	2023-08-23 00:29:25 +08:00
Job Noorman	23c8d38258	[BOLT] Calculate input to output address map using BOLTLinker BOLT uses MCAsmLayout to calculate the output values of basic blocks. This means output values are calculated based on a pre-linking state and any changes to symbol values during linking will cause incorrect values to be used. This issue was first addressed in D154604 by adding all basic block symbols to the symbol table for the linker to resolve them. However, the runtime overhead of handling this huge symbol table turned out to be prohibitively large. This patch solves the issue in a different way. First, a temporary section containing [input address, output symbol] pairs is emitted to the intermediary object file. The linker will resolve all these references so we end up with a section of [input address, output address] pairs. This section is then parsed and used to: - Replace BinaryBasicBlock::OffsetTranslationTable - Replace BinaryFunction::InputOffsetToAddressMap - Update BinaryBasicBlock::OutputAddressRange Note that the reason this is more performant than the previous attempt is that these symbol references do not cause entries to be added to the symbol table. Instead, section-relative references are used for the relocations. Reviewed By: maksfb Differential Revision: https://reviews.llvm.org/D155604	2023-08-21 10:36:20 +02:00
hezuoqiang	a37e8a4bdc	[BOLT] Consider Code Fragments during regreassign During register swapping, the code fragments associated with the function need to be swapped together (which may be generated during PGO optimization). Fix https://github.com/llvm/llvm-project/issues/59730 Reviewed By: rafauler Differential Revision: https://reviews.llvm.org/D141931	2023-08-18 16:46:18 +08:00
Alexander Yermolovich	2c784f7d26	[BOLT][DWARF] Fix handling of invalid DIE references Compiler can generate DIE References that are invalid. Previously BOLT could assert when writing out IR to .debug_info. Changed where DIE offsets are changed so that it's always done. Thus making sure that assert is not triggered. Added more specific warnings, and ability to print out invalid referenced DIE offset when verbosity >=1. Reviewed By: Amir Differential Revision: https://reviews.llvm.org/D157746	2023-08-14 17:28:24 -07:00
Alexander Yermolovich	43fe9dcb71	[BOLT][DWARF][NFC] Remove addIndexAddress Removed unused API DebugAddrWriter::addIndexAddress. Reviewed By: Amir Differential Revision: https://reviews.llvm.org/D157357	2023-08-08 18:23:04 -07:00
spupyrev	b402487b74	[BOLT] A new code layout algorithm for function reordering [3b/3] This is a new algorithm for function layout (reordering) based on the call graph extracted from a profile data; see diffs down the stack for more details. This layout is very similar to the existing hfsort+, but perhaps a little better on some benchmarks. The goals of the change is as follows: (i) rename and replace hfsort+ with a newer (hopefully better) implementation. I'd prefer to keep both algs together for some time to simplify evaluation and transition, but do want to remove hfsort+ once we're confident that there are no regressions. (ii) unify the implementation of code layout algorithms across LLVM. Currently Passes/HfsortPlus.cpp and Utils/CodeLayout.cpp share many implementation-specific details; this diff unifies the code. Reviewed By: Amir Differential Revision: https://reviews.llvm.org/D153039	2023-07-31 10:49:06 -07:00
Alexander Yermolovich	75f770a68f	[BOLT][DWARF] Update handling of size 1 ranges and fix sub-programs with ranges When output range is only one entry, and input is low_pc/high_pc do not convert to ranges. This helps with size of .debug_ranges/.debug_rnglists. It also helps when either low_pc/high_pc is 0. We not generating potentially invalid ranges that result in LLDB error. Also fixed handling of DW_AT_subprogram with ranges. This can be created with -fbasic-block-sections=all. Reviewed By: maksfb Differential Revision: https://reviews.llvm.org/D156374	2023-07-30 17:32:32 -07:00
spupyrev	31e8a9f4d9	[BOLT] Add stale-related logging Adding some logs related to stale profile matching. The new data can be helpful to understand how "stale" the input profile is and how well the inference is able to utilize the stale data. Example of outputs on clang-10 built with LTO (profile collected on a year-old release): ``` BOLT-INFO: inferred profile for 2101 (18.52% of profiled, 100.00% of stale) functions responsible for 30.95% samples (14754697 out of 47670654) BOLT-INFO: stale inference matched 89.42% of basic blocks (79052 out of 88402 stale) responsible for 76.99% samples (645737 out of 838719 stale) ``` LTO+AutoFDO: ``` BOLT-INFO: inferred profile for 6146 (57.57% of profiled, 100.00% of stale) functions responsible for 90.34% samples (50891403 out of 56330313) BOLT-INFO: stale inference matched 74.55% of basic blocks (191295 out of 256589 stale) responsible for 57.30% samples (1288632 out of 2248799 stale) ``` Reviewed By: Amir, maksfb Differential Revision: https://reviews.llvm.org/D154737	2023-07-27 08:56:57 -07:00
Amir Ayupov	e8a75c3f6e	[BOLT][NFC] Simplify YAMLProfileReader - Add `FunctionSet` type alias. - Use any_of - Use ErrorOr handling pattern Reviewed By: #bolt, maksfb Differential Revision: https://reviews.llvm.org/D156043	2023-07-26 08:26:16 -07:00
Amir Ayupov	1e0d08e872	[BOLT] Add blocks order kind to YAML profile header Specify blocks order used in YAML profile. Needed to ensure profile backwards compatibility with pre-D155514 DFS order by default. Reviewed By: #bolt, maksfb Differential Revision: https://reviews.llvm.org/D156176	2023-07-24 21:33:05 -07:00
Maksim Panchenko	dd630d831c	[BOLT][NFC] Add post-CFG processing to MetadataRewriter interface Add MetadataRewriter::postCFGInitializer(). Reviewed By: jobnoorman Differential Revision: https://reviews.llvm.org/D155153	2023-07-13 11:09:57 -07:00
Maksim Panchenko	e6724cbd8a	[BOLT] Add reading support for Linux ORC sections Read ORC (oops rewind capability) info used for unwinding the stack by Linux Kernel. The info is stored in .orc_unwind and .orc_unwind_ip sections. There is also a related .orc_lookup section that is being populated by the kernel during runtime. Contents of the sections are sorted for quicker lookup by a post-link objtool. Unless we modify stack access instructions, we don't have to change ORC info attributed to instructions in the binary. However, we need to update instruction addresses and sort both sections based on the new layout. For pretty printing, we add "--print-orc" option that prints ORC info next to instructions in code dumps. Reviewed By: Amir Differential Revision: https://reviews.llvm.org/D154815	2023-07-13 11:07:29 -07:00
Shoaib Meenai	caf5b6a212	[bolt] Fix MSVC builds We need to explicitly mark DWARFUnitInfo as non-copyable since MSVC's STL has a `noexcept(false)` move constructor for `unordered_map`; see the added comment for more details. An alternative might be using SmallVector instead of std::vector, since that never tries to copy elements [1]. That would result in a bunch of API changes though, so I figured a smaller targeted fix was better. [1] https://llvm.org/docs/ProgrammersManual.html#llvm-adt-smallvector-h Reviewed By: ayermolo, maksfb Differential Revision: https://reviews.llvm.org/D154924	2023-07-11 09:39:25 -07:00
Alexander Yermolovich	dcfa2ab534	[BOLT][DWARF] Change to process and write out TUs first then CUs in batches To reduce memory footprint changed so that we process and write out TUs first, reset DIEBuilder and process CUs. CUs are processed in buckets. First bucket contains all the CUs with cross CU references. Rest processd one at a time. clang-17 build in debug mode, by clang-17. before 8:25.81 real, 834.37 user, 86.03 sys, 0 amem, 79525064 mmem 8:02.20 real, 820.46 user, 81.81 sys, 0 amem, 79501616 mmem 7:52.69 real, 802.01 user, 83.99 sys, 0 amem, 79534392 mmem after 7:49.35 real, 822.04 user, 66.19 sys, 0 amem, 34934260 mmem 7:42.16 real, 825.46 user, 63.52 sys, 0 amem, 34951660 mmem 7:46.71 real, 821.11 user, 63.14 sys, 0 amem, 34981164 mmem Reviewed By: maksfb Differential Revision: https://reviews.llvm.org/D151909	2023-07-10 14:42:04 -07:00
Alexander Yermolovich	8362418748	[BOLT][DWARF] Output DWO files as they are being processed Changed how we handle writing out .dwo and .dwp files. We now write out DWO sections sooner and destroy DIEBuilder. This should decrease memory footprint. Ran on clang-17 build in debug mode with split-dwarf. before 8:07.49 real, 664.62 user, 69.00 sys, 0 amem, 41601612 mmem 8:07.06 real, 669.60 user, 68.75 sys, 0 amem, 41822588 mmem 8:00.36 real, 664.14 user, 66.36 sys, 0 amem, 41561548 mmem after 8:21.85 real, 682.23 user, 69.64 sys, 0 amem, 39379880 mmem 8:04.58 real, 671.62 user, 66.50 sys, 0 amem, 39735800 mmem 8:10.02 real, 680.67 user, 67.24 sys, 0 amem, 39662888 mmem Reviewed By: maksfb Differential Revision: https://reviews.llvm.org/D151908	2023-07-10 14:42:04 -07:00
Alexander Yermolovich	c33536e9c3	[BOLT][DWARF] Numerous fixes for a new DWARFRewriter * Some cleanup and minor fixes for the new debug information re-writer before moving on to productatization. * The new rewriter wasn't handling binary with DWARF5 and DWARF4 with -fdebug-types-sections. * Removed dead cross cu reference code. * Added support for DW_AT_sibling. * With the new re-writer abbrev number can change which can lead to offset of Type Units changing. Before we would just copy raw data. Changed to write out Type Unit List. This is generated by gdb-add-index. * Fixed how bolt handles gdb-index generated by gdb-11 with types sections. Simplified logic that handles variations of gdb-index. * Clang can generate two type units with the same hash, but different content. LLD does not de-duplicate when ThinLTO is involved. Changed so that TU hash and offset are used to make TU's unique. * It is possible to have references within location expression to another DIE. Fixed it so that relative offset is updated correctly. * Removed all the code related to patching. * Removed dead code. Changed how we handling writting out TUs and TU Index. It now should fully work for DWARF4 and DWARF5. * Removed unused arguments from some APIs, changed return type to void, and other small cleanups. Reviewed By: maksfb Differential Revision: https://reviews.llvm.org/D151906	2023-07-10 14:42:03 -07:00
Rui Zhong	87fb0ea27e	[BOLT][DWARF] Implement new mechanism for DWARFRewriter This revision implement new mechanism for DWARFRewriter. In the new mechanism, we adopt the same way with DWARFLinker did. By parsing Debug information into IR, we are allowed to handle debug information more flexible. Now the debug information updating process relies on IR and IR will be written out to binary once the updating finished. A new class was added: DIEBuilder. This class is responsible for parsing debug information and raising it to the IR level. This class is also used to write out the .debug_info and .debug_abbrev sections. Since we output brand new Abbrev section we won't need to always convert low_pc/high_pc into ranges. When conversion does happen we can also remove low_pc entry. Reviewed By: maksfb, ayermolo Differential Revision: https://reviews.llvm.org/D130315	2023-07-10 14:42:03 -07:00
Nico Weber	de7781ea42	Revert "[DWARF][BOLT] Implement new mechanism for DWARFRewriter" This reverts commit 460a2244430fae192298a5fd9fa2a269e540e8c1. It breaks building on macOS, and it was landed with a review URL pointing to some Facebook-internal service. Also reverts a bunch of follow-ups: Revert "[BOLT][DWARF] Don't check string offsets" This reverts commit f9d6f48c8bf5acaac07502403c41cf0b0d89c8d2. Revert "[BOLT][DWARF] Change to process and write out TUs first then CUs in batches" This reverts commit 88e95c1e4bb6e2ad3bfd185b96341ad5c09eff6b. Revert "[BOLT][DWARF] Output DWO files as they are being processed" This reverts commit 46ca2e3fcd419b1246357ed3b9cd36630f16e64d. Revert "[BOLT][DWARF] Don't check string offsets" This reverts commit cfe4a4b04f219a9dbb4e3fc01883437b6ff0e702. Revert "[BOLT][DWARF] Numerous fixes for a new DWARFRewriter" This reverts commit 2701a661daa393ad5901ac88d420d7aa931eda0d.	2023-07-07 08:07:01 -04:00

1 2 3 4 5 ...

329 Commits