llvm-project

Author	SHA1	Message	Date
Fangrui Song	109b7d965c	MC: Remove unneeded VK_None argument to MCSymbolRefExpr::create calls The MCSymbolRefExpr::create overload with the specifier parameter is discouraged and being phased out. Expressions with relocation specifiers should use MCSpecifierExpr instead.	2025-06-27 21:22:46 -07:00
Amir Ayupov	0c77468288	[BOLT] Expose external entry count for functions (#141674 ) Record the number of function invocations from external code - code outside the binary, which may include JIT code and DSOs. Accounting external entry counts improves the fidelity of call graph flow conservation analysis. Test Plan: updated shrinkwrapping.test	2025-06-10 14:31:22 -07:00
Maksim Panchenko	06f13f8684	[BOLT] Fix references in ignored functions in CFG state (#140678 ) When we call setIgnored() on functions that already have CFG built, these functions are not going to get emitted and we risk missing external function references being updated. To mitigate the potential issues, run scanExternalRefs() on such functions to create patches/relocations. Since scanExternalRefs() relies on function relocations, we have to preserve relocations until the function is emitted. As a result, the memory overhead without debug info update could reach up to 2%.	2025-06-02 12:33:54 -07:00
Maksim Panchenko	778801cc84	[BOLT] Never call fixBranches() on non-simple functions (#141112 ) We should never call fixBranches() on a function with invalid CFG. E.g., ValidateInternalCalls modifies CFG for its internal analysis purposes. At the same time, it marks the function as non-simple with an assumption that fixBranches() will never run on that function. However, calculateEmittedSize() by default calls fixBranches() which can lead to all sorts of issues, including assertions firing in fixBranches(). The fix is to use the original size for non-simple functions in calculateEmittedSize() since we are supposed to emit the function unmodified. Additionally, add an assertion at the start of fixBranches().	2025-05-22 14:01:54 -07:00
Kazu Hirata	7c8b39740b	[BOLT] Use llvm::is_contained (NFC) (#140984 )	2025-05-21 20:32:09 -07:00
Maksim Panchenko	51e222ef48	[BOLT][AArch64] Fix crash for conditional tail calls (#140669 ) When conditional tail call is located in old code while BOLT is operating in lite mode, the call will require optional pending relocation with a type that is currently not supported resulting in a build-time crash. Before a proper fix is implemented, ignore conditional tail calls for relocation purposes and mark their target functions to be patched, i.e. to be served as veneers/thunks.	2025-05-20 10:38:00 -07:00
Kazu Hirata	e401fb8c47	[BOLT] Use llvm::replace (NFC) (#140199 )	2025-05-16 07:30:29 -07:00
Amir Ayupov	0289ca09be	[BOLT] Print heatmap from perf2bolt (#139194 ) Add perf2bolt `--heatmap` option to produce heatmaps during profile aggregation. Distinguish exclusive mode (`llvm-bolt-heatmap`) and optional mode (`perf2bolt --heatmap`), which impacts perf.data handling: exclusive mode covers all addresses, whereas optional mode consumes attached profile only covering function addresses. Test Plan: updated perf2bolt tests: - pre-aggregated-perf.test: pre-aggregated data, - bolt-address-translation-yaml.test: pre-aggregated + BOLTed input, - perf_test.test: no-LBR perf data.	2025-05-13 13:23:18 -07:00
Amir Ayupov	e039d16ee5	[BOLT][NFC] Disambiguate sample as basic sample (#139350 ) Sample is a general term covering both basic (IP) and branch (LBR) profiles. Find and replace ambiguous uses of sample in a basic sample sense. Rename `RawBranchCount` into `RawSampleCount` reflecting its use for both kinds of profile. Rename `PF_LBR` profile type as `PF_BRANCH` reflecting non-LBR based branch profiles (non-brstack SPE, synthesized brstack ETM/PT). Follow-up to #137644. Test Plan: NFC	2025-05-12 17:15:16 -07:00
Maksim Panchenko	254c13d872	[BOLT][AArch64] Patch functions targeted by optional relocs (#138750 ) On AArch64, we create optional/weak relocations that may not be processed due to the relocated value overflow. When the overflow happens, we used to enforce patching for all functions in the binary via --force-patch option. This PR relaxes the requirement, and enforces patching only for functions that are target of optional relocations. Moreover, if the compact code model is used, the relocation overflow is guaranteed not to happen and the patching will be skipped.	2025-05-08 10:53:47 -07:00
Gergely Bálint	5b20b5721a	[BOLT][AArch64] Allow binary-analysis and heatmap tool to run with pac-ret binaries (#136664 ) OpNegateRAState support is only needed for tools that produce binaries.	2025-04-30 13:41:11 +01:00
Kazu Hirata	c6e7bb19f7	[BOLT] Use llvm::unique (NFC) (#136513 )	2025-04-20 18:29:51 -07:00
YongKang Zhu	823adc7a2d	[BOLT] Validate secondary entry point (#135731 ) Some functions have their sizes as zero in input binary's symbol table, like those compiled by assembler. When figuring out function sizes, we may create label symbol if it doesn't point to any constant island. However, before function size is known, marker symbol can not be correctly associated to a function and therefore all such checks would fail and we could end up adding a code label pointing to constant island as secondary entry point and later mistakenly marking the function as not simple. Querying the global marker symbol array has big throughput overhead. Instead we can run an extra check when post processing entry points to identify such label symbols that actually point to constant islands.	2025-04-15 13:19:15 -07:00
Paschalis Mpeis	3d24046b33	[BOLT] Skip out-of-range pending relocations (#116964 ) When a pending relocation is created it is also marked whether it is optional or not. It can be optional when such relocation is added as part of an optimization (i.e., `scanExternalRefs`). When bolt tries to `flushPendingRelocations`, it safely skips any optional relocations that cannot be encoded due to being out of range. A pre-requisite to that is the usage of the `-force-patch` flag. Alternatively, BOLT will bail out with a relevant message. Background: BOLT, as part of scanExternalRefs, identifies external references from calls and creates some pending relocations for them. Those when flushed will update references to point to the optimized functions. This optimization can be disabled using `--no-scan`. BOLT can assert if any of these pending relocations cannot be encoded. This patch does not disable this optimization but instead selectively applies it given that a pending relocation is optional and `-force-patch` was enabled.	2025-04-04 17:31:14 +01:00
Alexey Moksyakov	19a319667b	[bolt][aarch64] Adding test with unsupported indirect branches (#127655 ) This test contains the set of common indirect branch patterns. Adding the support will be step by step	2025-04-01 13:49:09 +03:00
Kazu Hirata	0c7be9392f	[BOLT] Use *Set::insert_range (NFC) (#133601 )	2025-03-29 16:52:16 -07:00
Maksim Panchenko	96e5ee23a7	[BOLT][AArch64] Add partial support for lite mode (#133014 ) In lite mode, we only emit code for a subset of functions while preserving the original code in .bolt.org.text. This requires updating code references in non-emitted functions to ensure that: * Non-optimized versions of the optimized code never execute. * Function pointer comparison semantics is preserved. On x86-64, we can update code references in-place using "pending relocations" added in scanExternalRefs(). However, on AArch64, this is not always possible due to address range limitations and linker address "relaxation". There are two types of code-to-code references: control transfer (e.g., calls and branches) and function pointer materialization. AArch64-specific control transfer instructions are covered by #116964. For function pointer materialization, simply changing the immediate field of an instruction is not always sufficient. In some cases, we need to modify a pair of instructions, such as undoing linker relaxation and converting NOP+ADR into ADRP+ADD sequence. To achieve this, we use the instruction patch mechanism instead of pending relocations. Instruction patches are emitted via the regular MC layer, just like regular functions. However, they have a fixed address and do not have an associated symbol table entry. This allows us to make more complex changes to the code, ensuring that function pointers are correctly updated. Such mechanism should also be portable to RISC-V and other architectures. To summarize, for AArch64, we extend the scanExternalRefs() process to undo linker relaxation and use instruction patches to partially overwrite unoptimized code.	2025-03-27 21:33:25 -07:00
Maksim Panchenko	bac21719a8	[BOLT] Pass unfiltered relocations to disassembler. NFCI (#131202 ) Instead of filtering and modifying relocations in readRelocations(), preserve the relocation info and use it in the symbolizing disassembler. This change mostly affects AArch64, where we need to look at original linker relocations in order to properly symbolize instruction operands.	2025-03-14 18:44:33 -07:00
Paschalis Mpeis	2f9d94981c	[BOLT] Change Relocation Type to 32-bit NFCI (#130792 )	2025-03-14 18:15:59 +00:00
chrisPyr	038fff3f24	[NFC][BOLT] Make file-local cl::opt global variables static (#126472 ) #125983	2025-03-05 22:11:05 -08:00
Maksim Panchenko	b971d4d7c8	[BOLT][AArch64] Add symbolizer for AArch64 disassembler. NFCI (#127969 ) Add AArch64MCSymbolizer that symbolizes `MCInst` operands during disassembly. The symbolization was previously done in `BinaryFunction::disassemble()`, but it is also required by `scanExternalRefs()` for "lite" mode functionality. Hence, similar to x86, I've implemented the symbolizer interface that uses `BinaryFunction` relocations to properly create instruction operands. I expect the result of the disassembly to be identical after the change. AArch64 disassembler was not calling `tryAddingSymbolicOperand()` for `MOV` instructions. Fix that. Additionally, the disassembler marks `ldr` instructions as branches by setting `IsBranch` parameter to true. Ignore the parameter and rely on `MCPlusBuilder` interface instead. I've modified `--check-encoding` flag to check symbolization of operands of instructions that have relocations against them.	2025-03-03 12:44:28 -08:00
Maksim Panchenko	074c2c6713	[BOLT] Refactor MCInst target symbol lookup. NFCI (#129131 ) In analyzeInstructionForFuncReference(), use MCPlusBuilder interface while scanning symbolic operands of MCInst. Should be NFC on x86, but will make the function work on other architectures. Note that it's currently unused on non-x86 as its functionality is exclusive to safe ICF that runs on x86 only.	2025-02-28 17:57:54 -08:00
Amir Ayupov	3968ebd00d	[BOLT] Keep multi-entry functions simple in aggregation mode (#128253 ) BOLT used to mark multi-entry functions non-simple in non-relocation mode with the reasoning that we can't move them due to potentially undetected references. However, in aggregation mode it doesn't apply as BOLT doesn't perform optimizations. Relax this constraint in case of an aggregation job. Test Plan: added entry-point-fallthru.s	2025-02-25 10:53:45 -08:00
YongKang Zhu	9fa77c1854	[BOLT][Linker][NFC] Remove lookupSymbol() in favor of lookupSymbolInfo() (#128070 ) Sometimes we need to know the size of a symbol besides its address, so maybe we can start using the existing `BOLTLinker::lookupSymbolInfo()` (that returns symbol address and size) and remove `BOLTLinker::lookupSymbol()` (that only returns symbol address). And for both we need to check return value as it is wrapped in `std::optional<>`, which makes the difference even smaller.	2025-02-20 17:14:33 -08:00
Maksim Panchenko	0ba391a85f	[BOLT] Improve constant island disassembly (#127971 ) * Add label that identifies constant island. * Support cases where the island is located after the function.	2025-02-20 11:16:01 -08:00
Maksim Panchenko	3115278c4e	[BOLT] Fixup for commit 137c378/#125961	2025-02-06 00:26:20 -08:00
Maksim Panchenko	137c3781e6	[BOLT][AArch64] Include constant islands in disassembly (#125961 ) When printing disassembly of a function with constant islands, include the island info in the dump. At the moment, only print islands in pre-CFG state. Include islands that are interleaved with instructions.	2025-02-05 22:41:40 -08:00
Maksim Panchenko	ef232a7e34	[BOLT][AArch64] Remove nops in functions with defined control flow (#124705 ) When a function has an indirect branch with unknown control flow, we preserve nops in order to keep all instruction offsets (from the start of the function) the same in case the indirect branch is used by a PC-relative jump table. However, when we know the control flow of the function, we should be able to safely remove nops.	2025-01-28 11:03:49 -08:00
Alexander Yermolovich	3c357a49d6	[BOLT] Add support for safe-icf (#116275 ) Identical Code Folding (ICF) folds functions that are identical into one function, and updates symbol addresses to the new address. This reduces the size of a binary, but can lead to problems. For example when function pointers are compared. This can be done either explicitly in the code or generated IR by optimization passes like Indirect Call Promotion (ICP). After ICF what used to be two different addresses become the same address. This can lead to a different code path being taken. This is where safe ICF comes in. Linker (LLD) does it using address significant section generated by clang. If symbol is in it, or an object doesn't have this section symbols are not folded. BOLT does not have the information regarding which objects do not have this section, so can't re-use this mechanism. This implementation scans code section and conservatively marks functions symbols as unsafe. It treats symbols as unsafe if they are used in non-control flow instruction. It also scans through the data relocation sections and does the same for relocations that reference a function symbol. The latter handles the case when function pointer is stored in a local or global variable, etc. If a relocation address points within a vtable these symbols are skipped.	2024-12-16 21:49:53 -08:00
Enna1	4d2bc0adc6	[BOLT] Extract comparator for sorting functions by index into helper function (#116217 ) This change extracts the comparator for sorting functions by index into a helper function `compareBinaryFunctionByIndex()` Not sure why the comparator used in `BinaryContext::getSortedFunctions()` is not same as the other two places. I think they should use the same comparator, so I also change `BinaryContext::getSortedFunctions()` to use `compareBinaryFunctionByIndex()` for sorting functions.	2024-11-27 09:01:12 +08:00
Daniel Sanders	74003f11b3	[mc] Add CFI directive to emit val_offset() rules (#113971 ) These specify that the value of the given register in the previous frame is the CFA plus some offset. This isn't very common but can be necessary if the original value is normally reconstructed from the stack/frame pointer instead of being saved on the stack and reloaded from there.	2024-11-11 11:38:36 -08:00
Kazu Hirata	41baa69a7e	[BOLT] Fix warnings (#114116 ) This patch fixes: bolt/lib/Core/BinaryFunction.cpp:2537:13: error: enumeration value 'OpNegateRAStateWithPC' not handled in switch [-Werror,-Wswitch] bolt/lib/Core/BinaryFunction.cpp:2661:13: error: enumeration value 'OpNegateRAStateWithPC' not handled in switch [-Werror,-Wswitch] bolt/lib/Core/BinaryFunction.cpp:2805:13: error: enumeration value 'OpNegateRAStateWithPC' not handled in switch [-Werror,-Wswitch]	2024-10-29 13:52:22 -07:00
Kazu Hirata	7928e14f5e	[BOLT] Avoid repeated map lookups (NFC) (#112118 )	2024-10-12 22:06:49 -07:00
Maksim Panchenko	4db0cc4c55	[BOLT] Allow sections in --print-only flag (#109622 ) While printing functions, expand --print-only flag to accept section names. E.g., "--print-only=\.init" will only print functions from ".init" section.	2024-09-25 23:44:06 +02:00
Maksim Panchenko	abd69b3653	[BOLT] Handle internal calls in ValidateInternalCalls (#105736 ) Move handling of all internal calls into the designated pass. Preserve NOPs and mark functions as non-simple on non-X86 platforms.	2024-08-27 11:31:32 -07:00
Maksim Panchenko	8f3050684e	[BOLT] Reduce CFI warning verbosity (#105336 ) CFI programs may have more saves than restores and this is completely benign from BOLT's perspective. Reduce the verbosity and print the warning only under `-v=1` and above.	2024-08-20 13:41:19 -07:00
Amir Ayupov	f83a89c1b1	[BOLT] Turn non-empty CFI StateStack assert into a warning (#102216 ) clang-15 can produce binaries with mismatched RememberState/RestoreState CFIs. This is benign for unwinding, so replace an assert with a warning.	2024-08-06 17:23:43 -07:00
Amir Ayupov	3023b15fb1	[BOLT] Support POSSIBLE_PIC_FIXED_BRANCH Detect and support fixed PIC indirect jumps of the following form: ``` movslq En(%rip), %r1 leaq PIC_JUMP_TABLE(%rip), %r2 addq %r2, %r1 jmpq *%r1 ``` with PIC_JUMP_TABLE that looks like following: ``` JT: ---------- E1:| L1 - JT | |----------| E2:| L2 - JT | |----------| | | ...... En:| Ln - JT | ---------- ``` The code could be produced by compilers, see https://github.com/llvm/llvm-project/issues/91648. Test Plan: updated jump-table-fixed-ref-pic.test Reviewers: maksfb, ayermolo, dcci, rafaelauler Reviewed By: rafaelauler Pull Request: https://github.com/llvm/llvm-project/pull/91667	2024-07-18 20:57:05 -07:00
Fangrui Song	2718654c54	[MC] Support .cfi_label GNU assembler 2.26 introduced the .cfi_label directive. It does not expand to any CFI instructions, but defines a label in .eh_frame/.debug_frame, which can be used by runtime patching code to locate the FDE. .cfi_label is not allowed for CIE's initial instructions, and can therefore be used to force the next instruction to be placed in a FDE instead of a CIE. In glibc since 2018, sysdeps/riscv/start.S utilizes .cfi_label to force DW_CFA_undefined to be placed in a FDE. arc/csky/loongarch ports have copied this use. ``` .cfi_startproc // DW_CFA_undefined is allowed for CIE's initial instructions. // Without .cfi_label, gas would place DW_CFA_undefined in a CIE. .cfi_label .Ldummy .cfi_undefined ra .cfi_endproc ``` No CFI instruction is associated with .cfi_label, so the `case MCCFIInstruction::OpLabel:` code in BOLT is unreachable and only to make -Wswitch happy. Close #97222 Pull Request: https://github.com/llvm/llvm-project/pull/97922	2024-07-07 12:41:13 -07:00
Amir Ayupov	344228ebf4	[BOLT] Drop macro-fusion alignment (#97358 ) 9d0754ada5dbbc0c009bcc2f7824488419cc5530 dropped MC support required for optimal macro-fusion alignment in BOLT. Remove the support in BOLT as performance measurements with large binaries didn't show a significant improvement. Test Plan: macro-fusion alignment was never upstreamed, so no upstream tests are affected.	2024-07-02 09:20:41 -07:00
Nathan Sidwell	6c5b62b846	[BOLT][NFC] Separate isReversibleBranch's 2 semantics (#95572 ) `isUnsupportedBranch` was renamed (and inverted) to `isReversibleBranch`, as that was how it was being used. But one use in `BinaryFunction::disassemble` was using the original meaning to detect unsupported branches, and the `isUnsupportedBranch` had 2 separate semantic checks. Move the unsupported branch check from `isReversibleBranch` to a new entry point: `isUnsupportedInstruction`. Call that from `BinaryFunction::disassemble`. Move the dynamic branch check from X86's isReversibleBranch to the base class, as it is not an architecture-specific check. Remove unnecessary `isReversibleBranch` calls from Instrumentation and X86 MCPlusBuilder.	2024-06-28 07:45:37 -04:00
Maksim Panchenko	d16b21b17d	[BOLT][Linux] Support ORC for alternative instructions (#96709 ) Alternative instruction sequences in the Linux kernel can modify the stack and thus they need their own ORC unwind entries. Since there's only one ORC table, it has to be "shared" among multiple instruction sequences. The kernel achieves this by putting a restriction on instruction boundaries. If ORC state changes at a given IP, only one of the alternative sequences can have an instruction starting/ending at this IP. Then, developers can insert NOPs to guarantee the above requirement is met. The most common use of ORC with alternatives is "pushf; pop %rax" sequence used for paravirtualization. Note that newer kernel versions no longer use .parainstructions; instead, they utilize alternatives for the same purpose. Before we implement a better support for alternatives, we can safely skip ORC entries associated with them. Fixes #87052.	2024-06-27 19:26:11 -07:00
Maksim Panchenko	ca06b61084	[BOLT] Omit CFI state while printing functions without CFI (#96723 ) If a function has no CFI program attached to it, do not print redundant empty CFI state for every basic block.	2024-06-27 17:26:58 -07:00
Nikita Popov	b23fe1088f	[bolt] Add missing <stack> include (NFC)	2024-06-21 14:02:15 +02:00
shaw young	4be3083bb3	[BOLT] Remove mutable from BB::LayoutIndex (#93224 ) Removed mutability from BB::LayoutIndex, subsequently removed const from BB::SetLayout, and changed BF::dfs to track visited blocks with a set as opposed to tracking and altering LayoutIndexes for more consistent code.	2024-05-31 11:52:22 -07:00
Amir Ayupov	f239490592	[BOLT][NFC] Define getExprValue helper (#91663 ) Move out common code extracting the address of a MCExpr. To be reused in #91667. Test Plan: NFC	2024-05-24 15:33:25 -07:00
Amir Ayupov	720cade2b6	[BOLT][NFC] Avoid computing BF hash twice in YAML reader (#75096 ) We compute BF hashes in `YAMLProfileReader::readProfile` when first matching profile functions with binary functions, and second time in `YAMLProfileReader::parseFunctionProfile` during the profile assignment (we need to do that to account for LTO private functions with mismatching suffix). Avoid recomputing the hash if it's been set.	2024-05-24 14:00:03 -07:00
Amir Ayupov	935b946b1f	[BOLT] Process cross references between ignored functions in BAT mode (#92484 ) To align YAML and fdata profiles produced in BAT mode, lift two restrictions applied in non-relocation mode when BAT is present: 1) register secondary entry points from ignored functions, 2) treat functions with secondary entry points as simple. This allows constructing CFG for non-simple functions in non-relocation mode and emitting YAML profile for them, which can then be used for optimizations in relocation mode. Test Plan: added test ignored-interprocedural-reference.s	2024-05-21 20:22:12 -07:00
Nathan Sidwell	76fdc2e527	[BOLT][NFC] Rename isUnsupportedBranch to isReversibleBranch (#92447 ) `isUnsupportedBranch` is not a very informative name, and doesn't match its corresponding `reverseBranchCondition`, as I noted in PR #92018. Here's a renaming to a more mnemonic name.	2024-05-17 15:40:40 -04:00
Nathan Sidwell	725014d866	[BOLT][NFC] Simplify CFG validation (#91977 ) Remove 'Valid' local boolean that has a single use, and return directly instead.	2024-05-14 09:36:34 -04:00

187 Commits