Just enough features are implemented to process a simple "hello world"
executable and produce something that still runs (including libc calls).
This was mainly a matter of implementing support for various
relocations. Currently, the following are handled:
- R_RISCV_JAL
- R_RISCV_CALL
- R_RISCV_CALL_PLT
- R_RISCV_BRANCH
- R_RISCV_RVC_BRANCH
- R_RISCV_RVC_JUMP
- R_RISCV_GOT_HI20
- R_RISCV_PCREL_HI20
- R_RISCV_PCREL_LO12_I
- R_RISCV_RELAX
- R_RISCV_NONE
Executables linked with linker relaxation will probably fail to be
processed. BOLT relocates .text to a high address while leaving .plt at
its original (low) address. This causes PC-relative PLT calls that were
relaxed to a JAL to no longer fit their offset in the J-immediate, whose
range is only ±1 MiB. This will be addressed in a later patch.
Changes to the BOLT core are relatively minor. Two things were tricky to
implement and needed slightly larger changes. I'll explain those below.
The R_RISCV_CALL(_PLT) relocation is put on the first instruction of an
AUIPC/JALR pair; the second instruction does not get any relocation
(unlike other PCREL pairs). This causes issues with the combination of
the way BOLT processes binaries and the way the RISC-V MC layer handles
relocations:
- BOLT reassembles instructions one by one, and since the JALR doesn't
have a relocation, it simply gets copied without modification;
- Even though the MC layer handles R_RISCV_CALL properly (it adjusts
both the AUIPC and the JALR), it assumes the immediates of both
instructions are 0 (to be able to or-in a new value). This will most
likely not be the case for the JALR that got copied over.
To handle this difficulty without resorting to RISC-V-specific hacks in
the BOLT core, a new binary pass was added that searches for AUIPC/JALR
pairs and zeroes out the immediate of the JALR.
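Roughly, the pass does the following (a sketch, not the actual code;
isRISCVCall() and the JALR operand layout are assumptions):
```
// Sketch: find AUIPC/JALR pairs and zero out the JALR's immediate so
// the MC layer can safely or-in the low 12 bits of the new target.
for (BinaryBasicBlock &BB : Function) {
  for (auto II = BB.begin(); II != BB.end(); ++II) {
    auto NextII = std::next(II);
    if (NextII == BB.end())
      break;
    if (!isRISCVCall(*II, *NextII)) // hypothetical pair detection
      continue;
    MCInst &JALR = *NextII;
    // Operand 2 is assumed to be the I-immediate of the JALR.
    JALR.getOperand(2) = MCOperand::createImm(0);
  }
}
```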
A second difficulty was supporting ABS symbols. As far as I can tell,
ABS symbols were not handled at all, causing __global_pointer$ to break.
RewriteInstance::analyzeRelocation was updated to handle these
generically.
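For illustration, the generic handling amounts to something like this
sketch (not the actual diff; variable names are placeholders):
```
// Sketch: an absolute symbol's value is used as-is (plus addend); it is
// not relative to any section and never moves. __global_pointer$ is a
// typical example.
Expected<uint32_t> FlagsOrErr = Symbol.getFlags();
if (FlagsOrErr && (*FlagsOrErr & object::SymbolRef::SF_Absolute)) {
  Value = SymbolValue + Addend; // no PC-relative or section adjustment
  IsAbsolute = true;            // placeholder flag
}
```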
Tests are provided for all supported relocations. Note that in order to
test the correct handling of PLT entries, an ELF file produced by GCC
had to be used. While I tried to strip the YAML representation, it's
still quite large. Any suggestions on how to improve this would be
appreciated.
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D145687
In lite mode (default for X86), BOLT optimizes and relocates functions
with profile. The rest of the code is preserved, but if it references
relocated code, such references have to be updated. The update is
handled by the scanExternalRefs() function. Note that we cannot solely
rely on relocations written by the linker, as not all code references
are exposed to the linker. Additionally, the linker can modify certain
instructions such that relocations no longer match the code.
With this change, start using a symbolizing disassembler for scanning
code for references in scanExternalRefs(). Unlike the previous approach,
the symbolizer properly detects and creates references for instructions
with multiple/ambiguous symbolic operands and handles cases where a
relocation doesn't match any operand. See the test cases for examples.
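Conceptually, the scanning loop now looks something like this (a sketch
under assumed names; recordReference() is hypothetical):
```
// Sketch: disassemble with a symbolizer attached so operands that
// reference relocated code come back as symbolic expressions.
MCInst Inst;
uint64_t Size;
for (uint64_t Offset = 0; Offset < Code.size(); Offset += Size) {
  if (Disassembler->getInstruction(Inst, Size, Code.slice(Offset),
                                   SectionAddress + Offset, nulls()) !=
      MCDisassembler::Success) {
    Size = 1; // skip undecodable bytes
    continue;
  }
  // The symbolizer resolves multiple/ambiguous operands and cases where
  // a relocation does not match any operand.
  for (const MCOperand &Op : Inst)
    if (Op.isExpr())
      recordReference(Op.getExpr()); // hypothetical helper
}
```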
Reviewed By: Amir
Differential Revision: https://reviews.llvm.org/D152631
This is a first "serious" version of stale profile matching in BOLT. This diff
extends the hash computation for basic blocks so that we can apply fuzzy
hash-based matching. The idea is to compute several "versions" of a hash value
for a basic block. A loose version of a hash (computed by ignoring instruction
operands) allows matching blocks in functions whose content has changed,
while stricter hash values (considering instruction opcodes with operands and
even hashes of a block's successors/predecessors) allow resolving
collisions. In order to save space and build time, individual hash components
are blended into a single uint64_t.
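For illustration, the blending might look like the following sketch
(field widths are illustrative, not necessarily the ones used here):
```
// Sketch: pack several per-block hash components into one uint64_t so
// they can be stored and compared cheaply, from loosest to strictest.
struct BlendedBlockHash {
  uint16_t Offset;       // position of the block in the function
  uint16_t OpcodeHash;   // loose: instruction opcodes only
  uint16_t InstrHash;    // strict: opcodes together with operands
  uint16_t NeighborHash; // based on successor/predecessor blocks

  uint64_t combine() const {
    return (uint64_t(Offset) << 48) | (uint64_t(OpcodeHash) << 32) |
           (uint64_t(InstrHash) << 16) | uint64_t(NeighborHash);
  }
};
```
Matching can then proceed from the strictest comparison to the loosest,
falling back to the operand-free OpcodeHash when block contents changed.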
There are likely numerous ways of improving the hash computation, but even
this simple variant provides significant perf benefits.
**Perf testing** on the clang binary: collecting data on clang-10 and using it
to optimize clang-11 (with ~1 year of commits in between). Next, we compare
- //stale_clang// (clang-11 optimized with profile collected on clang-10 with **infer-stale-profile=0**)
- //opt_clang// (clang-11 optimized with profile collected on clang-11)
- //infer_clang// (clang-11 optimized with profile collected on clang-10 with **infer-stale-profile=1**)
`LTO-only` mode:
//stale_clang// vs //opt_clang//: task-clock [delta(%): 9.4252 ± 1.6582, p-value: 0.000002]
(That is, there is a ~9.5% perf regression)
//infer_clang// vs //opt_clang//: task-clock [delta(%): 2.1834 ± 1.8158, p-value: 0.040702]
(That is, the regression is reduced to ~2%)
Related BOLT logs:
```
BOLT-INFO: identified 2114 (18.61%) stale functions responsible for 30.96% samples
BOLT-INFO: inferred profile for 2101 (18.52% of all profiled) functions responsible for 30.95% samples
```
`LTO+AutoFDO` mode:
//stale_clang// vs //opt_clang//: task-clock [delta(%): 19.1293 ± 1.4131, p-value: 0.000002]
//infer_clang// vs //opt_clang//: task-clock [delta(%): 7.4364 ± 1.3343, p-value: 0.000002]
Related BOLT logs:
```
BOLT-INFO: identified 5452 (50.27%) stale functions responsible for 85.34% samples
BOLT-INFO: inferred profile for 5442 (50.23% of all profiled) functions responsible for 85.33% samples
```
Reviewed By: Amir
Differential Revision: https://reviews.llvm.org/D146661
Fix debug printing, making it easier to compare two debug logs side by side:
- `BinaryFunction::addRelocation`: print function name instead of `this` ptr,
- `DataAggregator::doTrace`: remove duplicated function name.
Reviewed By: #bolt, maksfb
Differential Revision: https://reviews.llvm.org/D152314
Extend the yaml profile format with block hashes, which are used for stale
profile matching. To avoid code duplication, a new class was created with a
collection of utilities for computing hashes.
Reviewed By: Amir
Differential Revision: https://reviews.llvm.org/D144306
When there is a direct jump right after an indirect one and no code
jumps to the direct jump, the direct jump is obviously dead code.
However, BOLT failed to recognize that, mistakenly placing both jmp
instructions in the same basic block and creating wrong successor
edges. Fix that, so we can safely run UCE (unreachable code elimination)
on such functions. This bug also caused validateCFG to fail and BOLT to
crash when running ICP on the affected function.
Reviewed By: #bolt, Amir
Differential Revision: https://reviews.llvm.org/D148055
All users of MCCodeEmitter::encodeInstruction use a raw_svector_ostream
to encode the instruction into a SmallVector. The raw_ostream, however,
incurs some overhead for the actual encoding.
This change allows an MCCodeEmitter to directly emit an instruction into
a SmallVector without using a raw_ostream, thereby allowing for
performance improvements in encoding. A default path that uses the
existing raw_ostream implementations is provided.
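The default path might look roughly like this (simplified sketch):
```
// Sketch: wrap the SmallVector in a raw_svector_ostream and forward to
// the existing raw_ostream-based overload.
void MCCodeEmitter::encodeInstruction(const MCInst &Inst,
                                      SmallVectorImpl<char> &CB,
                                      SmallVectorImpl<MCFixup> &Fixups,
                                      const MCSubtargetInfo &STI) const {
  raw_svector_ostream OS(CB);
  encodeInstruction(Inst, OS, Fixups, STI);
}
```
Targets can then override the SmallVector entry point to append bytes
directly and skip the stream indirection.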
Reviewed By: MaskRay, Amir
Differential Revision: https://reviews.llvm.org/D145791
`Function.RawBranchCount` is initialized for the fdata profile but not for
the yaml one. This diff adds the computation of that field for yaml profiles.
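The computation amounts to summing all successor counts recorded in the
profile, roughly (a sketch; the yaml field names are approximate):
```
// Sketch: derive RawBranchCount for a yaml profile by summing the
// execution counts of all successor edges of all blocks.
uint64_t RawBranchCount = 0;
for (const yaml::bolt::BinaryBasicBlockProfile &Block : YamlBF.Blocks)
  for (const yaml::bolt::SuccessorInfo &Succ : Block.Successors)
    RawBranchCount += Succ.Count;
BF.setRawBranchCount(RawBranchCount);
```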
Reviewed By: Amir
Differential Revision: https://reviews.llvm.org/D144211
The two methods don't belong in BinaryFunction.
Move the dispatch tables into target-specific MCPlusBuilder methods.
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D131813
This patch fixes a few problems with supporting dynamic relocations in
constant islands (CI).
1. After dynamic relocations and functions have been read, search for
dynamic relocations located in functions. Currently we expect them only
to be relative and only to be located in constant islands. Mark the
islands of such functions as having dynamic relocations and create a CI
access symbol at the relocation offset, so that BinaryData (BD) is
created for that location.
2. During function disassembly, when handling an address reference to a
constant island, check whether the referred external CI has a dynamic
relocation. If it has one, continue to refer to the original CI rather
than creating a local copy.
3. After the function disassembly stage, mark functions that have a
dynamic relocation in their CI as non-simple. We don't want such
functions to be optimized, since passes such as function splitting would
create two copies of the CI, which we are currently unable to support.
4. While updating output values for a BF, search for BDs located in its
CI and update their output locations.
5. At the dynamic relocation patching stage, search for binary data
located at the relocation offset. If it was moved, use the new offset
value rather than the old one (see the sketch after this list).
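Roughly, the step-5 lookup could look like the following sketch (names
approximate BOLT's API; not the actual diff):
```
// Sketch: when patching a dynamic relocation, prefer the output address
// of the BinaryData at the relocation offset in case the data has moved.
uint64_t getPatchedOffset(const BinaryContext &BC, uint64_t RelOffset) {
  if (const BinaryData *BD = BC.getBinaryDataAtAddress(RelOffset))
    return BD->getOutputAddress(); // CI was moved: use its new location
  return RelOffset; // not moved: keep the original offset
}
```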
Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei
Differential Revision: https://reviews.llvm.org/D143748
In case of a function with unknown control flow but with a single jump
table and a single jump table site, we attempt to match the jump table
to the site and update block successors using the jump table targets.
Restrict this behavior for split jump tables that have targets in a
fragment function, as sketched below.
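A sketch of the guard (hypothetical names; JumpTable::Entries and
getBasicBlockForLabel approximate BOLT's interfaces):
```
// Sketch: only match the jump table to the site when every entry
// targets a block of this function; fragment targets disable matching.
bool hasOnlyLocalTargets(const BinaryFunction &BF, const JumpTable &JT) {
  for (const MCSymbol *Entry : JT.Entries)
    if (!BF.getBasicBlockForLabel(Entry)) // target lives in a fragment
      return false;
  return true;
}
```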
Fixes https://github.com/llvm/llvm-project/issues/60795.
Reviewed By: #bolt, rafauler
Differential Revision: https://reviews.llvm.org/D144602
Use llvm::reverse instead of `for (auto I = rbegin(), E = rend(); I != E; ++I)`
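For example (Blocks and process() are placeholders):
```
// Before: manual reverse iterators.
for (auto I = Blocks.rbegin(), E = Blocks.rend(); I != E; ++I)
  process(*I);

// After: llvm::reverse from llvm/ADT/STLExtras.h.
for (auto &Block : llvm::reverse(Blocks))
  process(Block);
```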
Reviewed By: #bolt, rafauler
Differential Revision: https://reviews.llvm.org/D140516
Always use a non-symbolizing disassembler for instruction encoding
validation, as symbols will be treated as undefined/zeros by the encoder,
causing byte sequence mismatches.
Reviewed By: Amir
Differential Revision: https://reviews.llvm.org/D136118
Put code that creates references to symbol+addend behind MCPlusBuilder.
This will be used later in the validate-memory-references pass.
Reviewed By: #bolt, maksfb, yota9
Differential Revision: https://reviews.llvm.org/D134097
For functions with references to internal offsets from data, verify externally
referenced blocks against the set of jump table targets. Mark the function
as non-simple if there are any unclaimed data-to-code references.
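Conceptually, the check is (a sketch with hypothetical names):
```
// Sketch: any externally referenced block offset that no jump table
// claims is an unclaimed data-to-code reference.
bool HasUnclaimedReference = false;
for (uint64_t Offset : ExternallyReferencedOffsets)
  if (!JumpTableTargets.count(Offset)) {
    HasUnclaimedReference = true;
    break;
  }
if (HasUnclaimedReference)
  BF.setSimple(false); // too risky to rewrite this function
```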
Reviewed By: #bolt, maksfb
Differential Revision: https://reviews.llvm.org/D132495
For exception handling, LSDA call sites have to be emitted for each
fragment individually. With this patch, call sites and respective LSDA
symbols are generated and associated with each fragment of their
function, such that they can be used by the emitter.
Reviewed By: maksfb
Differential Revision: https://reviews.llvm.org/D132052
To generate all symbols correctly, it is necessary to record the address
of each fragment. This patch moves the address info for the main and
cold fragments from BinaryFunction to FunctionFragment, where this data
is recorded for all fragments.
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D132051
This changes `FunctionFragment` from being used as a temporary proxy
object to access basic block ranges to a heap-allocated object that can
store fragment-specific information.
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D132050
A const-qualified reference to function layout allows accessing
non-const qualified basic blocks on a const-qualified function. This
patch adds or removes const-qualifiers where necessary to indicate where
basic blocks are used in a non-const manner.
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D132049
Move the large lambda out of BinaryFunction::disassemble, reducing its size from
255 to 233 LoC.
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D132104
Move the large lambda out of BinaryFunction::disassemble, reducing its size from
295 to 255 LoC.
Differential Revision: https://reviews.llvm.org/D132101
Move the large lambda out of BinaryFunction::disassemble, reducing its size from
338 to 295 LoC.
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D132100
Move the large lambda out of BinaryFunction::disassemble, reducing its size from
377 to 338 LoC.
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D132099
This patch adds support for generating any number of sections that are
assigned to fragments of functions that are split more than two-way.
With this, a function's *nth* split fragment goes into section
`.text.cold.n`.
This also changes `FunctionLayout::erase` to make sure that there are
no empty fragments at the end of the function. This sometimes happens
when blocks are erased from the function. To avoid creating symbols
pointing to these fragments, they need to be removed.
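Conceptually, the cleanup amounts to (a sketch; the container name is
hypothetical):
```
// Sketch: after erasing blocks, drop empty fragments from the tail of
// the layout so no symbols are emitted for zero-sized fragments.
while (!Fragments.empty() && Fragments.back().empty())
  Fragments.pop_back();
```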
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D130521
This adds basic fragment awareness in the exception handling passes and
generates the necessary symbols for fragments.
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D130520
This changes code emission such that it can emit specific function
fragments instead of scanning all basic blocks of a function and just
emitting those that are hot or cold.
To implement this, `FunctionLayout` explicitly distinguishes the "main"
fragment (i.e. the one that contains the entry block and is associated
with the original symbol) from "split" fragments. Additionally,
`BinaryFunction` receives support for multiple cold symbols - one for
each split fragment.
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D130052
Functions that do not contain any code still have to be emitted. This
occurs on AArch64, where functions can consist only of a constant island.
To support fragment semantics in code emission, this commit adds a
guaranteed main fragment to the function layout. This fragment might be
empty, but it allows us to omit checks for whether the function is empty
in most places.
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D130051
This patch adds a dedicated class to keep track of each function's
layout. It also lays the groundwork for splitting functions into
multiple fragments (as opposed to a strict hot/cold split).
Reviewed By: maksfb
Differential Revision: https://reviews.llvm.org/D129518