To align YAML and fdata profiles produced in BAT mode, lift two
restrictions applied in non-relocation mode when BAT is present:
1) register secondary entry points from ignored functions,
2) treat functions with secondary entry points as simple.
This allows constructing the CFG for non-simple functions in non-relocation
mode and emitting a YAML profile for them, which can then be used for
optimizations in relocation mode.
Test Plan: added test ignored-interprocedural-reference.s
`isUnsupportedBranch` is not a very informative name, and doesn't match
its corresponding `reverseBranchCondition`, as I noted in PR #92018.
Here's a renaming to a more mnemonic name.
Returns are ignored in the perf/pre-aggregated/fdata profile reader (see
DataReader::convertBranchData). They are also omitted by
YAMLProfileWriter, since the reader doesn't attach a profile to them and
YAMLProfileWriter only converts the profile attached to BinaryFunctions.
Thus, the return profile is universally ignored across all profile types
except BAT YAML.
To make returns ignored for YAML produced in BAT mode, we can:
1) ignore them in YAMLProfileReader,
2) omit them from YAML profile in profile conversion/writing.
The first option is prone to the profile staleness issue, where the profiled
binary doesn't match the one to be optimized, and thus returns in the
profile can no longer be reliably detected (as we don't distinguish them
from calls in the profile).
The second option is robust to staleness but requires disassembling the
branch source instruction.
Test Plan: Updated bolt-address-translation-yaml.test
Reviewers: rafaelauler, dcci, ayermolo, maksfb
Reviewed By: maksfb
Pull Request: https://github.com/llvm/llvm-project/pull/90807
Make them start with 1 instead of 0 (reserved for primary entry point).
Test Plan:
```
bin/llvm-lit -a tools/bolt/test/X86/yaml-secondary-entry-discriminator.s
```
Reviewers: rafaelauler, ayermolo, maksfb, dcci
Reviewed By: maksfb
Pull Request: https://github.com/llvm/llvm-project/pull/86848
Runtime code modification used by static keys is the most ubiquitous
self-modifying feature of the Linux kernel. The idea is to eliminate
the condition check and the associated conditional jump on a hot path if
that condition (based on the boolean value of a static key) does not
change often. Whenever the condition changes, the kernel runtime
modifies all code paths associated with that key, flipping the code
between a NOP and an (unconditional) jump.
Add an external interface to register a branch in a function that is in
a disassembled state. This allows making custom modifications to the
disassembled code; e.g., a pre-CFG pass can add an instruction and register a
branch that will later be used during CFG construction.
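A minimal sketch of how a pre-CFG pass could use such an interface; the `registerBranch` name follows this description, and the exact signature is an assumption:
```
#include "bolt/Core/BinaryFunction.h"

#include <cstdint>

using namespace llvm::bolt;

// Hypothetical helper: record a branch from SrcOffset to DstOffset in a
// disassembled (pre-CFG) function so that CFG construction later creates
// the corresponding edge.
void addCustomBranch(BinaryFunction &BF, uint64_t SrcOffset,
                     uint64_t DstOffset) {
  // The pair ends up in TakenBranches, which is sorted and consumed when
  // the CFG is built.
  BF.registerBranch(SrcOffset, DstOffset);
}
```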
Move the code that sorts TakenBranches to right before the branches are used.
We can populate TakenBranches in pre-CFG post-processing and hence have
to postpone the sorting to a later point in the processing pipeline.
Such a pass will be added later; for now, this is NFC.
To avoid accidentally setting the label twice for the same instruction,
which can lead to a "lost" label, introduce a getOrSetInstLabel()
function. Rename the existing functions to getInstLabel()/setInstLabel() to
make it explicit that they operate on instruction labels. Add an
assertion in setInstLabel() that the instruction did not have a prior
label set.
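A short sketch of the intended usage; the function names follow this description, and the exact signatures are assumptions:
```
#include "bolt/Core/MCPlusBuilder.h"
#include "llvm/MC/MCInst.h"
#include "llvm/MC/MCSymbol.h"

using namespace llvm;
using namespace llvm::bolt;

// Hypothetical helper: attach a label to an instruction without ever
// overwriting (and thereby "losing") a previously set one.
MCSymbol *labelInstruction(MCPlusBuilder &MIB, MCInst &Inst, MCSymbol *Label) {
  if (MCSymbol *Existing = MIB.getInstLabel(Inst))
    return Existing;                         // reuse the existing label
  return MIB.getOrSetInstLabel(Inst, Label); // sets the label exactly once
}
```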
Make core BOLT functionality more friendly to being used as a
library instead of in our standalone driver llvm-bolt. To
accomplish this, we augment BinaryContext with journaling streams
that are to be used by most BOLT code whenever something needs to
be logged to the screen. Users of the library can decide whether logs
should be printed to a file, discarded, or printed to the screen as
before. To illustrate this, this patch adds a new option
`--log-file` that allows the user to redirect BOLT logging to a
file on disk or completely hide it by using
`--log-file=/dev/null`. Future BOLT code should now use
`BinaryContext::outs()` for printing important messages instead of
`llvm::outs()`. A new test, log.test, enforces this by verifying that
no strings are printed to the screen once the `--log-file` option is
used.
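For illustration, a sketch of the intended logging pattern (the helper and message text are made up; `BinaryContext::outs()` is the interface named above):
```
#include "bolt/Core/BinaryContext.h"

using namespace llvm::bolt;

// Illustrative helper: messages routed through the BinaryContext journaling
// stream respect --log-file, so they can be redirected or suppressed.
void reportProcessedFunctions(BinaryContext &BC, unsigned NumFunctions) {
  BC.outs() << "BOLT-INFO: processed " << NumFunctions << " functions\n";
}
```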
In previous patches we also added a new BOLTError class to report
common and fatal errors, so code shouldn't call exit(1) now. To
easily handle problems as before (by quitting with exit(1)),
callers can now use
`BinaryContext::logBOLTErrorsAndQuitOnFatal(Error)` whenever code
needs to deal with BOLT errors. To test this, we have fatal.s
that checks we are correctly quitting and printing a fatal error
to the screen.
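As a rough illustration of this pattern (the pass function below is hypothetical; only `BinaryContext::logBOLTErrorsAndQuitOnFatal` is taken from the description above):
```
#include "bolt/Core/BinaryContext.h"
#include "llvm/Support/Error.h"

using namespace llvm;
using namespace llvm::bolt;

// Hypothetical library code: report problems as llvm::Error instead of
// calling exit(1) directly.
static Error runRewriteStep(BinaryContext &BC) {
  // ... do the actual work, returning a BOLTError-backed Error on failure ...
  return Error::success();
}

// Driver-style caller: logs any returned errors via the journaling streams
// and quits if the error is fatal, reproducing the old behavior in one place.
static void runAndReport(BinaryContext &BC) {
  BC.logBOLTErrorsAndQuitOnFatal(runRewriteStep(BC));
}
```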
Because this is a significant change by itself, not all code has been
ported yet. Code in the Profiler libs (DataAggregator and friends)
still prints errors directly to the screen.
Co-authored-by: Rafael Auler <rafaelauler@fb.com>
Test Plan: NFC
As part of the effort to refactor old error handling code that
would directly call exit(1), this patch continues the migration
of libCore, libRewrite, and libPasses to the new BOLTError
class whenever a failure occurs.
Test Plan: NFC
Co-authored-by: Rafael Auler <rafaelauler@fb.com>
Provide backwards compatibility for YAML profile that uses `std::hash`:
the xxh3 hash is the default for newly produced profiles (which set
`std-hash: false`), whereas a profile that doesn't specify `std-hash`
will be treated as `std-hash: true`, preserving the old behavior.
Simplify code in fixBranches(). Mostly NFC, except that the x86-specific
check for code fragments now takes into account the presence of more than
two fragments. This should only matter when we split code into multiple
fragments and can run fixBranches() more than once.
Also, don't replace a branch target with the same one, as such an operation
may allocate memory for an extra MCSymbolRefExpr.
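A minimal sketch of the second point (the helper is hypothetical; the MCPlusBuilder accessor names are assumptions):
```
#include "bolt/Core/MCPlusBuilder.h"
#include "llvm/MC/MCContext.h"
#include "llvm/MC/MCInst.h"
#include "llvm/MC/MCSymbol.h"

using namespace llvm;
using namespace llvm::bolt;

// Hypothetical helper: rewrite a branch target only when it actually
// changes; constructing a new target typically allocates an MCSymbolRefExpr
// in the MCContext, which is wasted work for an identical target.
void retargetBranch(const MCPlusBuilder &MIB, MCInst &Branch,
                    const MCSymbol *NewTarget, MCContext &Ctx) {
  if (MIB.getTargetSymbol(Branch) == NewTarget)
    return;
  MIB.replaceBranchTarget(Branch, NewTarget, &Ctx);
}
```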
std::hash and ADT/Hashing::hash_value are non-deterministic functions
whose results might vary across implementations/processes/executions.
Use xxh3 instead for computing hashes of BinaryFunctions and
BinaryBasicBlocks for stale profile matching.
(A possible alternative is to use ADT/StableHashing.h, based on FNV
hashing, but xxh3 seems to be more popular in LLVM.)
This is to address https://github.com/llvm/llvm-project/issues/65241.
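For reference, a minimal sketch of deterministic hashing with xxh3 (what exactly is hashed per function/block is not shown here):
```
#include "llvm/ADT/StringRef.h"
#include "llvm/Support/xxhash.h"

#include <cstdint>

// xxh3 is deterministic across implementations, processes, and executions,
// unlike std::hash and llvm::hash_value, so hashes stored in a profile stay
// comparable with hashes recomputed later.
uint64_t hashForStaleMatching(llvm::StringRef Key) {
  return llvm::xxh3_64bits(Key);
}
```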
This is a follow-up to #73076. We need to reset output addresses for
deleted blocks; otherwise, the address translation may mistakenly
attribute the input address of a deleted block to a non-zero address.
While working on a test case, I've discovered that DWARF output ranges
were already broken for deleted basic blocks: #73428. I will provide a
test case for this PR with a DWARF address range fix PR.
This commit modifies BinaryContext::calculateEmittedSize() to update
the BinaryBasicBlock::OutputAddressRange of each basic block in the
function in place. BinaryBasicBlock::getOutputSize() now gives the
emitted size of the basic block.
Previously, HasFixedIndirectBranch was set in the BF so that isSimple could
later be set to false, because the unreachable-BB elimination pass might
remove a BB whose symbols are accessed by instructions other than calls.
A better solution is to add an extra entry point at the target offset
instead of marking the BF as non-simple.
Use MCAsmBackend::writeNopData() interface to emit NOP instructions on
x86. There are multiple forms of NOP instruction on x86 with different
sizes. Currently, LLVM's assembly/disassembly does not support all forms
correctly, which can break the semantics of the input code, e.g., if
the program relies on NOP instructions for reserving patch space.
Add "--keep-nops" option to preserve NOP instructions.
When NOP instructions are used to reserve space in the code, e.g. for
patching, it becomes critical to preserve their original size while
emitting the code. On x86, we rely on the "Size" annotation for the NOP
instruction size, as the original instruction size is lost in the
disassembly/assembly process.
This change makes instruction size a first-class annotation and is
effectively NFCI. A follow-up diff will use the annotation for code
emission.
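As a rough illustration of how the size annotation could be consumed during emission (the accessor names are assumptions):
```
#include "bolt/Core/MCPlusBuilder.h"
#include "llvm/MC/MCInst.h"

#include <cstdint>
#include <optional>

using namespace llvm;
using namespace llvm::bolt;

// Hypothetical helper: prefer the recorded input size of an instruction
// (e.g., a NOP reserving patch space) over a target-computed default.
uint64_t originalInstSize(const MCPlusBuilder &MIB, const MCInst &Inst,
                          uint64_t Default) {
  if (std::optional<uint32_t> Size = MIB.getSize(Inst))
    return *Size;
  return Default;
}
```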
When NOP instructions are removed by BOLT and a DWARF address range
falls past the removed instructions, it may lead to invalid DWARF ranges
in the output binary. E.g. the range may fall outside of the basic block
boundaries.
This fix makes sure the modified range fits within the containing basic
block. A proper fix requires tracking instructions within the block and
will come in a different PR.
In 8244ff6739a09cb75e6e7fd1c24b85e2b1397266, I've introduced an
assertion that incorrectly used BasicBlock::empty(). Some basic blocks
may contain only pseudo instructions and thus BB->empty() will evaluate
to false, while the actual code size will be zero.
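A sketch of the corrected check (an illustrative helper; the accessor name is an assumption):
```
#include "bolt/Core/BinaryBasicBlock.h"

// A block holding only pseudo instructions is not empty(), yet it emits no
// code, so size-related assertions should count non-pseudo instructions.
bool emitsNoCode(const llvm::bolt::BinaryBasicBlock &BB) {
  return BB.getNumNonPseudos() == 0; // BB.empty() would be the wrong test
}
```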
When annotating MCInst instructions, attach extra annotation operands
directly to the annotated instruction, instead of attaching them to an
instruction pointed to by a special kInst operand.
With this change, it's no longer necessary to allocate an MCInst, and most
first-class annotations come with free memory, as MCInst is currently
declared with `SmallVector<MCOperand, 10> Operands;`,
i.e., with more operands than are normally used.
We still create a kInst operand with a nullptr instruction value to
designate the beginning of annotation operands. However, this special
operand might not be needed if we can rely on MCInstrDesc::NumOperands.
When we create new code for the indirect call promotion optimization, we
should mark it as originating from the indirect jump instruction so that
BOLT address translation (BAT) can map it to the original instruction.
Create BinaryFunction::translateInputToOutputRange() and use it for
updating DWARF debug ranges and location lists while de-duplicating the
existing code. Additionally, move DWARF-specific code out of
BinaryFunction and add print functions to facilitate debugging.
Note that this change is deliberately kept "bug-level" compatible with
the existing solution to keep it NFCI and to make it easier to track
possible regressions in future updates to the range-handling code.
Some optimization passes may duplicate basic blocks and assign the same
input offset to a number of different blocks in a function. This is done
e.g. to correctly map debugging ranges for duplicated code.
However, duplicate input offsets present a problem when we use
AddressMap to generate new addresses for basic blocks. The output
address is calculated based on the input offset and will be the same for
blocks with identical offsets. The result is potentially incorrect debug
info and BAT records.
To address the issue, we have to eliminate the dependency on input
offsets while generating output addresses for a basic block. Each block
has a unique label, hence we extend AddressMap to include address lookup
based on MCSymbol and use the new functionality to update block
addresses.
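A hypothetical sketch of the symbol-based lookup (the AddressMap method shown here is an assumption):
```
#include "bolt/Core/AddressMap.h"
#include "bolt/Core/BinaryBasicBlock.h"

#include <cstdint>
#include <optional>

using namespace llvm::bolt;

// Hypothetical helper: resolve a block's output address through its unique
// label, so duplicated input offsets can no longer map to the same address.
uint64_t blockOutputAddress(const AddressMap &Map, const BinaryBasicBlock &BB,
                            uint64_t Fallback) {
  if (std::optional<uint64_t> Address = Map.lookup(BB.getLabel()))
    return *Address;
  return Fallback;
}
```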
On RISC-V, GNU as produces the following initial instruction in CIEs:
```
DW_CFA_def_cfa_register: r2
```
While I believe it is technically illegal to use this instruction
without first using a `DW_CFA_def_cfa` (since the offset is undefined),
both `readelf` and `llvm-dwarfdump` accept this and implicitly set the
offset to 0.
In BOLT, however, this triggers an assert (in `CFISnapshot::advanceTo`)
as it (correctly) believes the offset is not set. This patch fixes this
by setting the offset to 0 whenever executing `DW_CFA_def_cfa_register`
while the offset is undefined.
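A simplified sketch of the rule (a standalone helper that only mirrors the logic, not the actual CFISnapshot code):
```
#include <cstdint>
#include <optional>

// DW_CFA_def_cfa_register changes the CFA register and, if no offset has
// been defined yet (as in GNU as CIEs on RISC-V), implicitly treats the
// offset as 0, matching readelf and llvm-dwarfdump.
void applyDefCfaRegister(uint32_t NewRegister, uint32_t &CFARegister,
                         std::optional<int64_t> &CFAOffset) {
  CFARegister = NewRegister;
  if (!CFAOffset)
    CFAOffset = 0;
}
```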
Note that this is probably the simplest workaround, but it has a
downside: while emitting the CFI start, we check whether the initial
instructions are contained within `MCAsmInfo::getInitialFrameState` and
omit them if they are. This will not be true for GNU CIEs (since they
differ from LLVM's), which causes an unnecessary `DW_CFA_def_cfa_register`
to be emitted.
While technically correct, it would probably be better to replace the
GNU CIE with the one used by LLVM to avoid this situation. This would
solve the problem this patch solves while also preventing unnecessary
CFI instructions. However, this is a bit trickier to implement correctly
so I propose to keep this for a later time.
Note on testing: the test creates a simple function with three basic
blocks and forces the CFI state of the last one to be different from the
others using an arbitrary CFI instruction. Then,
`--reorder-blocks=reverse` is used to force `CFISnapshot::advanceTo` to
be called. This causes an assert on the current main branch.
Currently, the minimal alignment of a function is hardcoded to 2 bytes.
Add 2 more cases (sketched below):
1. If the BF is data in code, return the alignment of the CI as the minimal
alignment.
2. For the aarch64 and riscv platforms, return the minimal value of 4 (added
a test for aarch64).
Otherwise, fall back to returning 2 as before.
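A sketch of the selection order (an illustrative helper; the accessor names are assumptions):
```
#include "bolt/Core/BinaryContext.h"
#include "bolt/Core/BinaryFunction.h"

#include <cstdint>

using namespace llvm::bolt;

// Hypothetical helper mirroring the cases above.
uint64_t minFunctionAlignment(const BinaryContext &BC,
                              const BinaryFunction &BF) {
  // 1. Data in code: use the CI (constant island) alignment.
  if (BF.hasIslandsInfo())
    return BF.getConstantIslandAlignment();
  // 2. aarch64 and riscv: minimal value of 4.
  if (BC.isAArch64() || BC.isRISCV())
    return 4;
  // 3. Fall back to the previous hard-coded minimum of 2 bytes.
  return 2;
}
```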
On RISC-V, there are certain relocations that target a specific
instruction instead of a more abstract location like a function or basic
block. Take the following example that loads a value from symbol `foo`:
```
nop
1: auipc t0, %pcrel_hi(foo)
ld t0, %pcrel_lo(1b)(t0)
```
This results in two relocations:
- auipc: `R_RISCV_PCREL_HI20` referencing `foo`;
- ld: `R_RISCV_PCREL_LO12_I` referencing the local label `1`, which points
to the auipc instruction.
It is of utmost importance that the `R_RISCV_PCREL_LO12_I` keeps
referring to the auipc instruction; if not, the program will fail to
assemble. However, BOLT currently does not guarantee this.
BOLT currently assumes that all local symbols are jump targets and
always starts a new basic block at symbol locations. The example above
results in a CFG that looks like this:
```
.BB0:
nop
.BB1:
auipc t0, %pcrel_hi(foo)
ld t0, %pcrel_lo(.BB1)(t0)
```
While this currently works (i.e., the `R_RISCV_PCREL_LO12_I` relocation
points to the correct instruction), it has two downsides:
- Too many basic blocks are created (the example above is logically a
single block, yet two are created);
- If instructions are inserted in `.BB1` (e.g., by instrumentation),
things will break since the label will not point to the auipc anymore.
This patch proposes to fix this issue by teaching BOLT to track labels
that should always point to a specific instruction. This is implemented
as follows:
- Add a new annotation type (`kLabel`) that allows us to annotate
instructions with an `MCSymbol *`;
- Whenever we encounter a relocation type that is used to refer to a
specific instruction (`Relocation::isInstructionReference`), we
register it without a symbol;
- During disassembly, whenever we encounter an instruction with such a
relocation, create a symbol for its target and store it in an offset
to symbol map (to ensure multiple relocations referencing the same
instruction use the same label);
- After disassembly, iterate this map to attach labels to instructions
via the new annotation type;
- During emission, emit these labels right before the instruction.
I believe the use of annotations works quite well for this use case as
it allows us to reliably track instruction labels. If we were to store
them as offsets in basic blocks, it would be error prone to keep them
updated whenever instructions are inserted or removed.
I have chosen to add labels as first-class annotations (as opposed to a
generic one) because the documentation of `MCAnnotation` suggests that
generic annotations are to be used for optional metadata that can be
discarded without affecting correctness. As this is not the case for
labels, a first-class annotation seemed more appropriate.
Long tail calls use the following instruction sequence on RISC-V:
```
1: auipc xi, %pcrel_hi(sym)
jalr zero, %pcrel_lo(1b)(xi)
```
Since the second instruction in isolation looks like an indirect branch,
this confused BOLT and most functions containing a long tail call got
marked with "unknown control flow" and didn't get optimized as a
consequence.
This patch fixes this by detecting the long tail call sequence in
`analyzeIndirectBranch`. `FixRISCVCallsPass` also had to be updated to
expand long tail calls to `PseudoTAIL` instead of `PseudoCALL`.
Besides this, this patch also fixes a minor issue with compressed tail
calls (`c.jr`) not being detected.
Note that I had to change `BinaryFunction::postProcessIndirectBranches`
slightly: the documentation of `MCPlusBuilder::analyzeIndirectBranch`
mentions that the [`Begin`, `End`) range contains the instructions
immediately preceding `Instruction`. However, in
`postProcessIndirectBranches`, *all* the instructions in the BB were
passed in the range. This made it difficult to find the preceding
instruction, so I made sure *only* the preceding instructions are passed.
When the current state is `CFG_Finalized`, the function `validateCFG()` should return true directly.
Reviewed By: maksfb, yota9, Kepontry
Differential Revision: https://reviews.llvm.org/D159410
BOLT uses `MCAsmLayout` to calculate the output values of functions and
basic blocks. This means output values are calculated based on a
pre-linking state and any changes to symbol values during linking will
cause incorrect values to be used.
This issue can be triggered by enabling linker relaxation on RISC-V.
Since linker relaxation can remove instructions, symbol values may
change. This causes, among other things, the symbol table created by
BOLT in the output executable to be incorrect.
This patch solves this issue by using `BOLTLinker` to get symbol values
instead of `MCAsmLayout`. This way, output values are calculated based
on a post-linking state. To make sure the linker can update all
necessary symbols, this patch also makes sure all these symbols are not
marked as temporary so that they end up in the object file's symbol
table.
Note that this patch only deals with symbols of binary functions
(`BinaryFunction::updateOutputValues`). The technique described above
turned out to be too expensive for basic block symbols so those are
handled differently in D155604.
Reviewed By: maksfb
Differential Revision: https://reviews.llvm.org/D154604
BOLT uses MCAsmLayout to calculate the output values of basic blocks.
This means output values are calculated based on a pre-linking state and
any changes to symbol values during linking will cause incorrect values
to be used.
This issue was first addressed in D154604 by adding all basic block
symbols to the symbol table for the linker to resolve them. However, the
runtime overhead of handling this huge symbol table turned out to be
prohibitively large.
This patch solves the issue in a different way. First, a temporary
section containing [input address, output symbol] pairs is emitted to the
intermediary object file. The linker will resolve all these references
so we end up with a section of [input address, output address] pairs.
This section is then parsed and used to:
- Replace BinaryBasicBlock::OffsetTranslationTable
- Replace BinaryFunction::InputOffsetToAddressMap
- Update BinaryBasicBlock::OutputAddressRange
Note that the reason this is more performant than the previous attempt
is that these symbol references do not cause entries to be added to the
symbol table. Instead, section-relative references are used for the
relocations.
Reviewed By: maksfb
Differential Revision: https://reviews.llvm.org/D155604
We identify instructions to be instrumented based on the Offset annotation.
BOLT "expands" a conditional tail call into a conditional jump to a basic
block with an unconditional tail call. Move the Offset annotation from the
former CTC to the tail call.
For an expanded CTC, we keep the Offset attached to the original instruction,
which is converted into a regular conditional jump, while leaving the newly
created tail call without an Offset annotation. This leads to attempting the
instrumentation of the conditional jump, which points to a basic block with
an inherited input offset, thus creating an invalid edge description. At the
same time, the newly created tail call is skipped entirely, which means we're
not creating a call description for it.
If we instead reassign the Offset annotation from the conditional jump to the
tail call, we fix both issues. The conditional jump will be skipped, not
creating an invalid edge description, while the tail call will be handled
properly (uniformly with regular calls).
Reviewed By: #bolt, maksfb
Differential Revision: https://reviews.llvm.org/D156389
Adding some logs related to stale profile matching. The new data can be helpful
to understand how "stale" the input profile is and how well the inference is
able to utilize the stale data.
Example of outputs on clang-10 built with LTO (profile collected on a year-old release):
```
BOLT-INFO: inferred profile for 2101 (18.52% of profiled, 100.00% of stale) functions responsible for 30.95% samples (14754697 out of 47670654)
BOLT-INFO: stale inference matched 89.42% of basic blocks (79052 out of 88402 stale) responsible for 76.99% samples (645737 out of 838719 stale)
```
LTO+AutoFDO:
```
BOLT-INFO: inferred profile for 6146 (57.57% of profiled, 100.00% of stale) functions responsible for 90.34% samples (50891403 out of 56330313)
BOLT-INFO: stale inference matched 74.55% of basic blocks (191295 out of 256589 stale) responsible for 57.30% samples (1288632 out of 2248799 stale)
```
Reviewed By: Amir, maksfb
Differential Revision: https://reviews.llvm.org/D154737
A jump table in a split function may contain an entry matching the start
address of another fragment of the function. While converting addresses
to labels, we used to ignore such entries, resulting in an underpopulated
jump table. Change that so we always create one label per address.
Reviewed By: Amir
Differential Revision: https://reviews.llvm.org/D156013
Create LinuxKernelRewriter and move kernel-specific code to this class.
Depends on D154023
Reviewed By: Amir
Differential Revision: https://reviews.llvm.org/D154024
Migrate SDT markers processing to the new MetadataRewriter interface.
Depends on D154020
Reviewed By: Amir
Differential Revision: https://reviews.llvm.org/D154021
Just enough features are implemented to process a simple "hello world"
executable and produce something that still runs (including libc calls).
This was mainly a matter of implementing support for various
relocations. Currently, the following are handled:
- R_RISCV_JAL
- R_RISCV_CALL
- R_RISCV_CALL_PLT
- R_RISCV_BRANCH
- R_RISCV_RVC_BRANCH
- R_RISCV_RVC_JUMP
- R_RISCV_GOT_HI20
- R_RISCV_PCREL_HI20
- R_RISCV_PCREL_LO12_I
- R_RISCV_RELAX
- R_RISCV_NONE
Executables linked with linker relaxation will probably fail to be
processed. BOLT relocates .text to a high address while leaving .plt at
its original (low) address. This causes PC-relative PLT calls that were
relaxed to a JAL to not fit their offset in an I-immediate anymore. This
is something that will be addressed in a later patch.
Changes to the BOLT core are relatively minor. Two things were tricky to
implement and needed slightly larger changes. I'll explain those below.
The R_RISCV_CALL(_PLT) relocation is put on the first instruction of an
AUIPC/JALR pair; the second does not get any relocation (unlike other
PCREL pairs). This causes issues with the combination of the way BOLT
processes binaries and the way the RISC-V MC layer handles relocations:
- BOLT reassembles instructions one by one and since the JALR doesn't
have a relocation, it simply gets copied without modification;
- Even though the MC-layer handles R_RISCV_CALL properly (adjusts both
the AUIPC and the JALR), it assumes the immediates of both
instructions are 0 (to be able to or-in a new value). This will most
likely not be the case for the JALR that got copied over.
To handle this difficulty without resorting to RISC-V-specific hacks in
the BOLT core, a new binary pass was added that searches for
AUIPC/JALR pairs and zeroes out the immediate of the JALR.
A second difficulty was supporting ABS symbols. As far as I can tell,
ABS symbols were not handled at all, causing __global_pointer$ to break.
RewriteInstance::analyzeRelocation was updated to handle these
generically.
Tests are provided for all supported relocations. Note that in order to
test the correct handling of PLT entries, an ELF file produced by GCC
had to be used. While I tried to strip the YAML representation, it's
still quite large. Any suggestions on how to improve this would be
appreciated.
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D145687