llvm-project

Author	SHA1	Message	Date
Alexander Yermolovich	bf2b035e58	[BOLT][DWARF] Fix handling .debug_str_offsets for type units (#75522 ) There was an assumption that TUs and CUs share a .debug_str_offsets contribution. For ThinLTO builds this is not the case. Changed so that we parse contributions for TUs also, and did some refactoring so that we don't re-parse contributions that were not modified.	2023-12-14 17:27:21 -08:00
Kazu Hirata	ad8fd5b185	[BOLT] Use StringRef::{starts,ends}_with (NFC) This patch replaces uses of StringRef::{starts,ends}with with StringRef::{starts,ends}_with for consistency with std::{string,string_view}::{starts,ends}_with in C++20. I'm planning to deprecate and eventually remove StringRef::{starts,ends}with.	2023-12-13 23:34:49 -08:00
Alexander Yermolovich	fb9a851224	[BOLT][DWARF] Fix handling of debug_str_offsets (#75100 ) We were not setting size field of .debug_str_offsets correctly. Fixed it, and added a test.	2023-12-11 15:56:32 -08:00
Amir Ayupov	b039ccc684	[BOLT] Provide backwards compatibility for YAML profile with std::hash (#74253 ) Provide backwards compatibility for YAML profile that uses `std::hash`: xxh3 hash is the default for newly produced profile (sets `std-hash: false`), whereas the profile that doesn't specify `std-hash` will be treated as `std-hash: true`, preserving old behavior.	2023-12-11 12:27:32 -08:00
ShatianWang	c43d0432ef	[BOLT] Create .text.warm for 3-way splitting (#73863 ) This commit explicitly adds a warm code section, .text.warm, when -split-functions -split-strategy=cdsplit is used. This replaces the previous approach of using .text.cold.0 as warm and .text.cold.1 as cold in 3-way function splitting. NFC.	2023-11-29 22:42:36 -05:00
ShatianWang	076bd22f57	[BOLT] Add structure of CDSplit to SplitFunctions (#73430 ) This commit establishes the general structure of the CDSplit strategy in SplitFunctions without incorporating the exact splitting logic. With -split-functions -split-strategy=cdsplit, the SplitFunctions pass will run twice: the first time is before function reordering and functions are hot-cold split; the second time is after function reordering and functions are hot-warm-cold split based on the fixed function ordering. Currently, all functions are hot-warm split after the entry block in the second splitting pass. Subsequent commits will introduce the precise splitting logic. NFC.	2023-11-29 15:43:21 -05:00
Alexander Yermolovich	b47b3bee7b	[BOLT][DWARF] Fix handling of DWARF5 DWP (#72729 ) Fixed handling of DWP as input. Previously BOLT crashed. Now it writes out the correct CU and all the TUs. A potential future improvement is to scan all the TUs used in this CU, and only include those.	2023-11-28 15:54:14 -08:00
ShatianWang	d333c0e062	[BOLT] Extend calculateEmittedSize() for block size calculation (#73076 ) This commit modifies BinaryContext::calculateEmittedSize() to update the BinaryBasicBlock::OutputAddressRange of each basic block in the function in place. BinaryBasicBlock::getOutputSize() now gives the emitted size of the basic block.	2023-11-23 15:28:31 -05:00
llongint	f3e54f2f97	[BOLT][NFC] Extract a function for dump MCInst (#67225 ) In GDB debugging, obtaining the assembly representation of MCInst is more intuitive.	2023-11-21 20:30:44 +08:00
Vladislav Khmelevsky	5b59540661	[BOLT] Enhance fixed indirect branch handling (#71324 ) Previously HasFixedIndirectBranch was set in BF so that isSimple would later be set to false, because the unreachable basic block elimination pass might remove a BB whose symbols are accessed by instructions other than calls. A better solution seems to be to add an extra entry point at the target offset instead of marking the BF as non-simple.	2023-11-16 09:30:55 +04:00
Vladislav Khmelevsky	c5a306f07e	[BOLT] Fix LSDA section handling (#71821 ) Currently BOLT finds the LSDA section by its name .gcc_except_table.main . But sometimes it might have a suffix, e.g. .gcc_except_table.main. Find the LSDA section by its address rather than by its name. Fixes #71804	2023-11-15 23:21:50 +04:00
Maksim Panchenko	e823136d43	[BOLT] Refactor --keep-nops option. NFC. (#72228 ) Run RemoveNops pass only if --keep-nops is set to false (default).	2023-11-14 11:28:13 -08:00
Maksim Panchenko	f633f325a1	[BOLT] Fix NOP instruction emission on x86 (#72186 ) Use MCAsmBackend::writeNopData() interface to emit NOP instructions on x86. There are multiple forms of NOP instruction on x86 with different sizes. Currently, LLVM's assembly/disassembly does not support all forms correctly which can lead to a breakage of input code semantics, e.g. if the program relies on NOP instructions for reserving a patch space. Add "--keep-nops" option to preserve NOP instructions.	2023-11-13 18:12:39 -08:00
Maksim Panchenko	2db9b6a93f	[BOLT] Make instruction size a first-class annotation (#72167 ) When NOP instructions are used to reserve space in the code, e.g. for patching, it becomes critical to preserve their original size while emitting the code. On x86, we rely on the "Size" annotation for NOP instruction size, as the original instruction size is lost in the disassembly/assembly process. This change makes instruction size a first-class annotation and is effectively NFCI. A follow-up diff will use the annotation for code emission.	2023-11-13 14:33:39 -08:00
Vladislav Khmelevsky	cf18f142c0	[BOLT] Read .rela.dyn in static non-pie binary (#71635 ) Static non-pie binary doesn't have DYNAMIC segment and BOLT skips reading .rela.dyn section because of it. But such binaries might have this section for example to store IFUNC relocation which is resolved by linked-in startup files, so force reading this section for static executables.	2023-11-10 11:47:12 +04:00
Vladislav Khmelevsky	c6c04a83a7	[BOLT] Run EliminateUnreachableBlocks in parallel (#71299 ) The wall time for this pass decreased on my laptop from ~80 sec to 5 sec processing the clang.	2023-11-10 00:46:04 +04:00
spaette	1a2f83366b	[BOLT] Fix typos (#68121 ) Closes https://github.com/llvm/llvm-project/issues/63097 Before merging please make sure the change to bolt/include/bolt/Passes/StokeInfo.h is correct. bolt/include/bolt/Passes/StokeInfo.h ```diff // This Pass solves the two major problems to use the Stoke program without - // proting its code: + // probing its code: ``` I'm still not happy about the awkward wording in this comment. bolt/include/bolt/Passes/FixRelaxationPass.h ``` $ ed -s bolt/include/bolt/Passes/FixRelaxationPass.h <<<'9,12p' // This file declares the FixRelaxations class, which locates instructions with // wrong targets and fixes them. Such problems usually occures when linker // relaxes (changes) instructions, but doesn't fix relocations types properly // for them. $ ``` bolt/docs/doxygen.cfg.in bolt/include/bolt/Core/BinaryContext.h bolt/include/bolt/Core/BinaryFunction.h bolt/include/bolt/Core/BinarySection.h bolt/include/bolt/Core/DebugData.h bolt/include/bolt/Core/DynoStats.h bolt/include/bolt/Core/Exceptions.h bolt/include/bolt/Core/MCPlusBuilder.h bolt/include/bolt/Core/Relocation.h bolt/include/bolt/Passes/FixRelaxationPass.h bolt/include/bolt/Passes/InstrumentationSummary.h bolt/include/bolt/Passes/ReorderAlgorithm.h bolt/include/bolt/Passes/StackReachingUses.h bolt/include/bolt/Passes/StokeInfo.h bolt/include/bolt/Passes/TailDuplication.h bolt/include/bolt/Profile/DataAggregator.h bolt/include/bolt/Profile/DataReader.h bolt/lib/Core/BinaryContext.cpp bolt/lib/Core/BinarySection.cpp bolt/lib/Core/DebugData.cpp bolt/lib/Core/DynoStats.cpp bolt/lib/Core/Relocation.cpp bolt/lib/Passes/Instrumentation.cpp bolt/lib/Passes/JTFootprintReduction.cpp bolt/lib/Passes/ReorderData.cpp bolt/lib/Passes/RetpolineInsertion.cpp bolt/lib/Passes/ShrinkWrapping.cpp bolt/lib/Passes/TailDuplication.cpp bolt/lib/Rewrite/BoltDiff.cpp bolt/lib/Rewrite/DWARFRewriter.cpp bolt/lib/Rewrite/RewriteInstance.cpp bolt/lib/Utils/CommandLineOpts.cpp bolt/runtime/instr.cpp 
bolt/test/AArch64/got-ld64-relaxation.test bolt/test/AArch64/unmarked-data.test bolt/test/X86/Inputs/dwarf5-cu-no-debug-addr-helper.s bolt/test/X86/Inputs/linenumber.cpp bolt/test/X86/double-jump.test bolt/test/X86/dwarf5-call-pc-function-null-check.test bolt/test/X86/dwarf5-split-dwarf4-monolithic.test bolt/test/X86/dynrelocs.s bolt/test/X86/fallthrough-to-noop.test bolt/test/X86/tail-duplication-cache.s bolt/test/runtime/X86/instrumentation-ind-calls.s	2023-11-09 11:29:46 -08:00
Job Noorman	96b5e092dc	[BOLT] Support instrumentation hook via DT_FINI_ARRAY (#67348 ) BOLT currently hooks its instrumentation finalization function via `DT_FINI`. However, this method of calling finalization routines is not supported anymore on newer ABIs like RISC-V. `DT_FINI_ARRAY` is preferred there. This patch adds support for hooking into `DT_FINI_ARRAY` instead if the binary does not have a `DT_FINI` entry. If it does, `DT_FINI` takes precedence so this patch should not change how the currently supported instrumentation targets behave. `DT_FINI_ARRAY` points to an array in memory of `DT_FINI_ARRAYSZ` bytes. It consists of pointer-length entries that contain the addresses of finalization functions. However, the addresses are only filled in by the dynamic linker at load time using relative relocations. This makes hooking via `DT_FINI_ARRAY` a bit more complicated than via `DT_FINI`. The implementation works as follows: - While scanning the binary: find the section where `DT_FINI_ARRAY` points to, read its first dynamic relocation and use its addend to find the address of the fini function we will use to hook; - While writing the output file: overwrite the addend of the dynamic relocation with the address of the runtime library's fini function. Updating the dynamic relocation required a bit of boilerplate: since dynamic relocations are stored in a `std::multiset` which doesn't support getting mutable references to its items, functions were added to `BinarySection` to take an existing relocation and insert a new one.	2023-11-08 11:01:10 +00:00
Vladislav Khmelevsky	e2f1a95f2a	[BOLT][AArch64] Handle IFUNCS properly (#71104 ) Previously we were testing only binaries compiled with O0, which results in an indirect call to the IFUNC trampoline, and the trampoline has an IFUNC symbol associated with it. Compiling with O3 results in a direct call to the IFUNC trampoline with no symbols associated with it; the IFUNC symbol address becomes the same as the IFUNC resolver address. Since no symbol was associated, the BF was not created before PLT analysis, and by the algorithm we're going to analyze the target relocation. As we're expecting the JUMP relocation, we're also expecting the symbol associated with it to be present. But for IFUNC the IRELATIVE relocation is used and no symbol is associated with it; the addend value points to the target symbol, so we need to find the BF using it and use its symbol in this situation. Currently this is checked only for the AArch64 platform, so I've limited this logic in code to that platform, although I wouldn't be surprised if other platforms need to activate this logic too.	2023-11-08 11:41:43 +04:00
Maksim Panchenko	0df154671b	[BOLT] Use Label annotation instead of EHLabel pseudo. NFCI. (#70179 ) When we need to attach EH label to an instruction, we can now use Label annotation instead of EHLabel pseudo instruction.	2023-11-06 14:43:14 -08:00
Maksim Panchenko	b336d741d0	[BOLT] Use direct storage for Label annotations. NFCI. (#70147 ) Store the Label annotation directly in the operand and avoid the extra allocation and indirection overheads associated with MCSimpleAnnotation.	2023-11-06 14:24:55 -08:00
maksfb	74e0a26fd1	[BOLT] Modify MCPlus annotation internals. NFCI. (#70412 ) When annotating MCInst instructions, attach extra annotation operands directly to the annotated instruction, instead of attaching them to an instruction pointed to by a special kInst operand. With this change, it's no longer necessary to allocate MCInst and most of the first-class annotations come with free memory as currently MCInst is declared with: SmallVector<MCOperand, 10> Operands; i.e. more operands than are normally being used. We still create a kInst operand with a nullptr instruction value to designate the beginning of annotation operands. However, this special operand might not be needed if we can rely on MCInstrDesc::NumOperands.	2023-11-06 12:14:22 -08:00
maksfb	e28c393bd1	[BOLT] Reduce the number of emitted symbols. NFCI. (#70175 ) We emit a symbol before an instruction for a number of reasons, e.g. for tracking LocSyms, debug line, or if the instruction has a label annotation. Currently, we may emit multiple symbols per instruction. Reuse the same label instead of creating and emitting new ones when possible. I'm planning to refactor EH labels as well in a separate diff. Change getLabel() to return a pointer instead of std::optional<> since an empty label should be treated identically to no label.	2023-11-06 11:41:47 -08:00
maksfb	6e26246c22	[BOLT][DWARF] Refactor address ranges processing (#71225 ) Create BinaryFunction::translateInputToOutputRange() and use it for updating DWARF debug ranges and location lists while de-duplicating the existing code. Additionally, move DWARF-specific code out of BinaryFunction and add print functions to facilitate debugging. Note that this change is deliberately kept "bug-level" compatible with the existing solution to keep it NFCI and make it easier to track any possible regressions in the future updates to the ranges-handling code.	2023-11-06 11:10:20 -08:00
Vladislav Khmelevsky	888742a121	[BOLT][AArch64] Handle .plt.got section (#71216 ) It seems that currently this section is only created by the mold linker if 2 conditions are met: 1. The PLT function was called directly. 2. The indirect access to PLT function was found (e.g. through ADRP relocation). Although mold created symbol for every plt entry I've removed them in yaml file to check that .plt.got was truly disassembled by bolt.	2023-11-04 00:47:24 +04:00
spupyrev	287fcd38a1	[BOLT] Rename cds to cdsort (#69966 ) Unify naming for the layout algorithms by renaming "cds" to "cdsort". This is NFC unless someone is already using the new algorithm (which is unlikely).	2023-11-02 12:46:36 -07:00
maksfb	8244ff6739	[BOLT] Fix incorrect basic block output addresses (#70000 ) Some optimization passes may duplicate basic blocks and assign the same input offset to a number of different blocks in a function. This is done e.g. to correctly map debugging ranges for duplicated code. However, duplicate input offsets present a problem when we use AddressMap to generate new addresses for basic blocks. The output address is calculated based on the input offset and will be the same for blocks with identical offsets. The result is potentially incorrect debug info and BAT records. To address the issue, we have to eliminate the dependency on input offsets while generating output addresses for a basic block. Each block has a unique label, hence we extend AddressMap to include address lookup based on MCSymbol and use the new functionality to update block addresses.	2023-10-24 12:22:43 -07:00
Job Noorman	b6b492880f	[BOLT][RISCV] Set minimum function alignment to 2 for RVC (#69837 ) In #67707, the minimum function alignment on RISC-V was set to 4. When RVC (compressed instructions) is enabled, the minimum alignment can be reduced to 2. This patch implements this by delegating the choice of minimum alignment to a new `MCPlusBuilder::getMinFunctionAlignment` function. This way, the target-dependent code in `BinaryFunction` is minimized.	2023-10-23 08:09:11 +00:00
Vladislav Khmelevsky	b7944f7c04	[BOLT] Return proper minimal alignment from BF (#67707 ) Currently the minimal alignment of a function is hardcoded to 2 bytes. Add 2 more cases: 1. In case BF is data in code, return the alignment of CI as the minimal alignment. 2. For aarch64 and riscv platforms, return the minimal value of 4 (added test for aarch64). Otherwise fall back to returning 2 as before.	2023-10-12 09:33:08 +04:00
Job Noorman	da37139ac9	[BOLT][NFC] Add allocator id to MCPlusBuilder::setLabel (#68707 ) This will be needed for some RISC-V instrumentation functions and is also consistent with other annotation setters.	2023-10-11 07:25:46 +00:00
Job Noorman	ff5e2babcb	[BOLT] Improve handling of relocations targeting specific instructions (#66395 ) On RISC-V, there are certain relocations that target a specific instruction instead of a more abstract location like a function or basic block. Take the following example that loads a value from symbol `foo`: ``` nop 1: auipc t0, %pcrel_hi(foo) ld t0, %pcrel_lo(1b)(t0) ``` This results in two relocations: - auipc: `R_RISCV_PCREL_HI20` referencing `foo`; - ld: `R_RISCV_PCREL_LO12_I` referencing local label `1` which points to the auipc instruction. It is of utmost importance that the `R_RISCV_PCREL_LO12_I` keeps referring to the auipc instruction; if not, the program will fail to assemble. However, BOLT currently does not guarantee this. BOLT currently assumes that all local symbols are jump targets and always starts a new basic block at symbol locations. The example above results in a CFG that looks like this: ``` .BB0: nop .BB1: auipc t0, %pcrel_hi(foo) ld t0, %pcrel_lo(.BB1)(t0) ``` While this currently works (i.e., the `R_RISCV_PCREL_LO12_I` relocation points to the correct instruction), it has two downsides: - Too many basic blocks are created (the example above is logically only one yet two are created); - If instructions are inserted in `.BB1` (e.g., by instrumentation), things will break since the label will not point to the auipc anymore. This patch proposes to fix this issue by teaching BOLT to track labels that should always point to a specific instruction. 
This is implemented as follows: - Add a new annotation type (`kLabel`) that allows us to annotate instructions with an `MCSymbol *`; - Whenever we encounter a relocation type that is used to refer to a specific instruction (`Relocation::isInstructionReference`), we register it without a symbol; - During disassembly, whenever we encounter an instruction with such a relocation, create a symbol for its target and store it in an offset to symbol map (to ensure multiple relocations referencing the same instruction use the same label); - After disassembly, iterate this map to attach labels to instructions via the new annotation type; - During emission, emit these labels right before the instruction. I believe the use of annotations works quite well for this use case as it allows us to reliably track instruction labels. If we were to store them as offsets in basic blocks, it would be error prone to keep them updated whenever instructions are inserted or removed. I have chosen to add labels as first-class annotations (as opposed to a generic one) because the documentation of `MCAnnotation` suggests that generic annotations are to be used for optional metadata that can be discarded without affecting correctness. As this is not the case for labels, a first-class annotation seemed more appropriate.	2023-10-06 06:46:16 +00:00
Job Noorman	8fb83bf5f1	[BOLT][NFC] Add MCSubtargetInfo to MCPlusBuilder (#68223 ) On RISC-V, it's helpful to have access to `MCSubtargetInfo` while generating instructions in `MCPlusBuilder`. For example, a return instruction might be generated differently based on if the target supports compressed instructions (`c.jr ra`) or not (`jalr ra`).	2023-10-06 06:39:58 +00:00
Rafael Auler	853e126ce3	[BOLT] Support input binaries that use R_X86_GOTPC64 In large code model, the address of GOT is calculated by the static linker via R_X86_GOTPC64 reloc applied against a MOVABSQ instruction. In the final binary, it can be disassembled as a regular immediate, but because such immediate is the result of PC-relative pointer arithmetic, we need to parse this relocation and update this calculation whenever we move code, otherwise we break the code trying to read GOT. A test case showing how GOT is accessed was provided. Reviewed By: #bolt, maksfb Differential Revision: https://reviews.llvm.org/D158911	2023-10-02 23:12:44 -07:00
Vladislav Khmelevsky	846eb76761	[BOLT][AArch64] Fix instrumentation deadloop According to ARMv8-a architecture reference manual B2.10.5 software must avoid having any explicit memory accesses between exclusive load and associated store instruction. Otherwise the exclusive monitor might clear the exclusivity without an application-related cause, which may result in the deadloop. Disable instrumentation for such functions, since between exclusive load and store there might be branches and we would insert an instrumentation snippet which contains loads and stores. A better solution would be to analyze with BFS, finding the exact BBs between load and store and not instrumenting them. Or even better, to recognize such sequences and replace them with a more complex one, e.g. loading the value non-exclusively, and for the branch where the exclusive store is made, performing the exclusive load and store sequentially, but for now just disable instrumentation for such functions completely. Differential Revision: https://reviews.llvm.org/D159520	2023-09-22 00:58:01 +04:00
Kristof Beyls	8fb28e45ce	[BOLT] Fix data race in MCPlusBuilder::getOrCreateAnnotationIndex (#67004 ) MCPlusBuilder::getOrCreateAnnotationIndex(Name) can be called from different threads, for example when making use of ParallelUtilities::runOnEachFunctionWithUniqueAllocId. The race occurs when an Index for a particular annotation Name needs to be created for the first time. For example, this can easily happen when multiple "copies" of an analysis pass run on different BinaryFunctions, and the analysis pass creates a new Annotation Index to be able to store analysis results as annotations. This was found by using the ThreadSanitizer. No regression test was added; I don't think there is good way to write regression tests that verify the absence of data races? --------- Co-authored-by: Amir Ayupov <fads93@gmail.com>	2023-09-21 19:53:09 +02:00
Job Noorman	dc925be68b	[BOLT][RISCV] Carry-over annotations when fixing calls (#66763 ) `FixRISCVCallsPass` changes all different forms of calls to `PseudoCALL` instructions. However, the original call's annotations were lost in the process. This patch fixes this by moving all annotations from the old to the new call. `MCPlusBuilder::moveAnnotations` had to be made public for this.	2023-09-21 06:37:47 +00:00
Job Noorman	c5ba61978c	[BOLT][RISCV] Add support for linker relaxation Calls on RISC-V are typically compiled to `auipc`/`jalr` pairs to allow a maximum target range (32-bit pc-relative). In order to optimize calls to near targets, linker relaxation may replace those pairs with, for example, single `jal` instructions. To allow BOLT to freely reassign function addresses in relaxed binaries, this patch proposes the following approach: - Expand all relaxed calls back to `auipc`/`jalr`; - Rely on JITLink to relax those back to shorter forms where possible. This is implemented by detecting all possible call instructions and replacing them with `PseudoCALL` (or `PseudoTAIL`) instructions. The RISC-V backend then expands those and adds the necessary relocations for relaxation. Since BOLT generally ignores pseudo instruction, this patch makes `MCPlusBuilder::isPseudo` virtual so that `RISCVMCPlusBuilder` can override it to exclude `PseudoCALL` and `PseudoTAIL`. To ensure JITLink knows about the correct section addresses while relaxing, reassignment of addresses has been moved to a post-allocation pass. Note that this is probably the time it had to be done in the first place since in `notifyResolved` (where it was done before), all symbols are supposed to be resolved already. Depends on D159082 Reviewed By: maksfb Differential Revision: https://reviews.llvm.org/D159089	2023-09-15 11:57:28 +02:00
Amir Ayupov	7b750943d7	[BOLT][NFC] Speedup YAML profile processing Reduce YAML profile processing times: - preprocessProfile: speed up buildNameMaps by replacing ProfileNameToProfile mapping with ProfileFunctionNames set and ProfileBFs vector. Pre-look up YamlBF->BF correspondence, memoize in ProfileBFs. - readProfile: replace iteration over all functions in the binary by iteration over profile functions (strict match and LTO name match). On a large binary (1.9M functions) and large YAML profile (121MB, 30k functions) reduces profile steps runtime: pre-process profile data: 12.4953s -> 10.7123s process profile data: 9.8195s -> 5.6639s Compared to fdata profile reading: pre-process profile data: 8.0268s process profile data: 1.0265s process profile data pre-CFG: 0.1644s Reviewed By: #bolt, maksfb Differential Revision: https://reviews.llvm.org/D159460	2023-09-11 16:07:57 -07:00
Job Noorman	eafe4ee2e8	[BOLT] Rename isLoad/isStore to mayLoad/mayStore As discussed in D159266, for some instructions it's impossible to know statically if they will load/store (e.g., predicated instructions). Therefore, mayLoad/mayStore are more appropriate names.	2023-09-01 09:36:05 +02:00
Job Noorman	76f040bda6	[BOLT] Provide generic implementations for isLoad/isStore `MCInstrDesc` provides the `mayLoad` and `mayStore` flags that seem appropriate to use as a target-independent way to implement `isLoad` and `isStore`. I believe this is currently good enough to use for the RISC-V target as well. I've provided a test for this that checks the generated dyno stats (which seems to be the only thing both `isLoad` and `isStore` are used for). Reviewed By: maksfb Differential Revision: https://reviews.llvm.org/D159266	2023-09-01 09:36:05 +02:00
spupyrev	1256ef274c	[BOLT] Fine-tuning hash computation for stale matching Fine-tuning hash computation for stale matching: - introducing a new "loose" basic block hash that allows matching many more blocks than before; - tweaking params of the inference algorithm to find (slightly) better solutions; - adding more meaningful tests for stale matching. Tested the changes on several open-source benchmarks (clang, rocksdb, chrome) and one prod workload using different compiler modes (LTO/PGO etc). There is always an improvement in the quality of inferred profiles. (The current implementation is still not optimal but the diff is a step forward; I am open to further suggestions) Reviewed By: Amir Differential Revision: https://reviews.llvm.org/D156278	2023-08-31 07:29:02 -07:00
Job Noorman	475a93a07a	[BOLT] Calculate output values using BOLTLinker BOLT uses `MCAsmLayout` to calculate the output values of functions and basic blocks. This means output values are calculated based on a pre-linking state and any changes to symbol values during linking will cause incorrect values to be used. This issue can be triggered by enabling linker relaxation on RISC-V. Since linker relaxation can remove instructions, symbol values may change. This causes, among other things, the symbol table created by BOLT in the output executable to be incorrect. This patch solves this issue by using `BOLTLinker` to get symbol values instead of `MCAsmLayout`. This way, output values are calculated based on a post-linking state. To make sure the linker can update all necessary symbols, this patch also makes sure all these symbols are not marked as temporary so that they end-up in the object file's symbol table. Note that this patch only deals with symbols of binary functions (`BinaryFunction::updateOutputValues`). The technique described above turned out to be too expensive for basic block symbols so those are handled differently in D155604. Reviewed By: maksfb Differential Revision: https://reviews.llvm.org/D154604	2023-08-28 10:13:07 +02:00
Kazu Hirata	d791fa26a9	[BOLT] Use SmallPtrSet::contains (NFC)	2023-08-27 13:18:38 -07:00
Elvina Yakubova	6e4c230525	[BOLT][Instrumentation] Initial instrumentation support for AArch64 This commit adds code generation for AArch64 instrumentation, including direct and indirect calls support. Reviewed By: rafauler, yota9 Differential Revision: https://reviews.llvm.org/D151899	2023-08-24 19:34:57 +03:00
Denis Revunov	28fd2ca142	[BOLT] Fix trap value for non-X86 The trap value used by BOLT was assumed to be a single-byte instruction. It made some functions unaligned on AArch64 (e.g. the exceptions-instrumentation test) and caused emission failures. Fix that by changing the fill value to StringRef. Reviewed By: rafauler Differential Revision: https://reviews.llvm.org/D158191	2023-08-24 01:29:41 +03:00
zhoujiapeng	9fee2ac044	[BOLT][NFC] Split createRelocation in X86 and share the second part This commit splits the createRelocation function for the X86 architecture into two parts, retaining the first half and moving the second half to a new function called extractFixupExpr. The purpose of this change is to make extractFixupExpr a shared function between AArch64 and X86 architectures, increasing code reusability and maintainability. Child revision: https://reviews.llvm.org/D156018 Reviewed By: Amir Differential Revision: https://reviews.llvm.org/D157217	2023-08-23 00:29:25 +08:00
Job Noorman	23c8d38258	[BOLT] Calculate input to output address map using BOLTLinker BOLT uses MCAsmLayout to calculate the output values of basic blocks. This means output values are calculated based on a pre-linking state and any changes to symbol values during linking will cause incorrect values to be used. This issue was first addressed in D154604 by adding all basic block symbols to the symbol table for the linker to resolve them. However, the runtime overhead of handling this huge symbol table turned out to be prohibitively large. This patch solves the issue in a different way. First, a temporary section containing [input address, output symbol] pairs is emitted to the intermediary object file. The linker will resolve all these references so we end up with a section of [input address, output address] pairs. This section is then parsed and used to: - Replace BinaryBasicBlock::OffsetTranslationTable - Replace BinaryFunction::InputOffsetToAddressMap - Update BinaryBasicBlock::OutputAddressRange Note that the reason this is more performant than the previous attempt is that these symbol references do not cause entries to be added to the symbol table. Instead, section-relative references are used for the relocations. Reviewed By: maksfb Differential Revision: https://reviews.llvm.org/D155604	2023-08-21 10:36:20 +02:00
hezuoqiang	a37e8a4bdc	[BOLT] Consider Code Fragments during regreassign During register swapping, the code fragments associated with the function need to be swapped together (which may be generated during PGO optimization). Fix https://github.com/llvm/llvm-project/issues/59730 Reviewed By: rafauler Differential Revision: https://reviews.llvm.org/D141931	2023-08-18 16:46:18 +08:00
Alexander Yermolovich	2c784f7d26	[BOLT][DWARF] Fix handling of invalid DIE references Compiler can generate DIE References that are invalid. Previously BOLT could assert when writing out IR to .debug_info. Changed where DIE offsets are changed so that it's always done. Thus making sure that assert is not triggered. Added more specific warnings, and ability to print out invalid referenced DIE offset when verbosity >=1. Reviewed By: Amir Differential Revision: https://reviews.llvm.org/D157746	2023-08-14 17:28:24 -07:00
Alexander Yermolovich	43fe9dcb71	[BOLT][DWARF][NFC] Remove addIndexAddress Removed unused API DebugAddrWriter::addIndexAddress. Reviewed By: Amir Differential Revision: https://reviews.llvm.org/D157357	2023-08-08 18:23:04 -07:00
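
The `DT_FINI_ARRAY` commit (96b5e092dc) in the list above mentions that items of a `std::multiset` cannot be modified in place, so a relocation is updated by taking the existing one out and inserting a new one. A minimal sketch of that take-then-insert pattern follows. The `Reloc` struct and the helpers `takeRelocationAt` and `patchAddend` are hypothetical, simplified stand-ins written for illustration; they are not BOLT's actual `Relocation` or `BinarySection` API.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Hypothetical, simplified relocation record (not BOLT's Relocation class).
struct Reloc {
  std::uint64_t Offset; // ordering key
  std::uint64_t Addend; // payload we want to change
  bool operator<(const Reloc &Other) const { return Offset < Other.Offset; }
};

using RelocSet = std::multiset<Reloc>;

// Remove a relocation located at Offset and copy it into Out.
// Returns false (leaving the set untouched) if none exists.
bool takeRelocationAt(RelocSet &Relocs, std::uint64_t Offset, Reloc &Out) {
  RelocSet::iterator It = Relocs.find(Reloc{Offset, 0});
  if (It == Relocs.end())
    return false;
  Out = *It;        // multiset elements are const, so copy the value out
  Relocs.erase(It); // erase only this one element, not all equivalents
  return true;
}

// Replace the addend of a relocation at Offset by take + modify + insert.
bool patchAddend(RelocSet &Relocs, std::uint64_t Offset,
                 std::uint64_t NewAddend) {
  Reloc Taken{0, 0};
  if (!takeRelocationAt(Relocs, Offset, Taken))
    return false;
  Taken.Addend = NewAddend;
  Relocs.insert(Taken);
  assert(Relocs.count(Reloc{Offset, 0}) >= 1 && "patched entry must exist");
  return true;
}
```

Erasing through the iterator rather than by key matters here: `erase(key)` on a multiset would drop every relocation that compares equal, while the iterator overload removes only the one that was copied out.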


342 Commits