This patch replaces SmallSet<T *, N> with SmallPtrSet<T *, N>. Note
that SmallSet.h "redirects" SmallSet to SmallPtrSet for pointer
element types:
```
template <typename PointeeType, unsigned N>
class SmallSet<PointeeType *, N> : public SmallPtrSet<PointeeType *, N> {};
```
We only have 30 instances that rely on this "redirection". Since the
redirection doesn't improve readability, this patch replaces SmallSet
with SmallPtrSet for pointer element types.
I'm planning to remove the redirection eventually.
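For illustration, a typical call site changes only the type it names; the interface is identical either way, precisely because of the redirection above (a sketch, not a line from the patch):
```
#include "llvm/ADT/SmallPtrSet.h"

void example(int *p) {
  // Before: llvm::SmallSet<int *, 8> visited;
  llvm::SmallPtrSet<int *, 8> visited;
  visited.insert(p);      // same insert/count/erase interface either way,
  (void)visited.count(p); // since SmallSet<T *, N> already was a SmallPtrSet
}
```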
Clear `synthesizedAligns` to prevent stray relocations to an unrelated
text section. Enhance the test to check llvm-readelf -r output.
---
Without linker relaxation enabled for a particular relocatable file or
section (e.g., using .option norelax), the assembler will not generate
R_RISCV_ALIGN relocations for alignment directives. This becomes
problematic in a two-stage linking process:
```
ld -r a.o b.o -o ab.o
// b.o is norelax. Its alignment information is lost in ab.o.
ld ab.o -o ab
```
When ab.o is linked into an executable, the preceding relaxed section
(a.o's content) might shrink. Since there's no R_RISCV_ALIGN relocation
in b.o for the linker to act upon, the `.word 0x3a393837` data in b.o
may end up unaligned in the final executable.
To address the issue, this patch inserts NOP bytes and synthesizes an
R_RISCV_ALIGN relocation at the beginning of a text section when the
alignment >= 4.
For simplicity, when RVC is disabled, we synthesize an ALIGN relocation
(addend: 2) for a 4-byte aligned section, allowing the linker to trim
the excess 2 bytes.
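A rough sketch of the padding computation this implies; the helper name and the general clamping formula are assumptions extrapolated from the description, not the patch's code:
```
#include <algorithm>

// For a text section with alignment >= 4, insert this many NOP bytes at the
// section start and synthesize R_RISCV_ALIGN with this byte count as the
// addend; the linker then deletes whatever is excess.
unsigned synthesizedAlignPadding(unsigned alignment, bool rvc) {
  if (alignment < 4)
    return 0;
  unsigned minNop = rvc ? 2 : 4; // smallest encodable NOP
  // For simplicity, a non-RVC 4-byte-aligned section still gets addend 2,
  // letting the linker trim the excess 2 bytes.
  return std::max(alignment - minNop, 2u);
}
```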
See also https://sourceware.org/bugzilla/show_bug.cgi?id=33236
This reverts commit 6f53f1c8d2bdd13e30da7d1b85ed6a3ae4c4a856.
synthesizedAligns is not cleared, leading to stray relocations for
unrelated sections. Revert for now.
Without linker relaxation enabled for a particular relocatable file or
section (e.g., using .option norelax), the assembler will not generate
R_RISCV_ALIGN relocations for alignment directives. This becomes
problematic in a two-stage linking process:
```
ld -r a.o b.o -o ab.o
// b.o is norelax. Its alignment information is lost in ab.o.
ld ab.o -o ab
```
When ab.o is linked into an executable, the preceding relaxed section
(a.o's content) might shrink. Since there's no R_RISCV_ALIGN relocation
in b.o for the linker to act upon, the `.word 0x3a393837` data in b.o
may end up unaligned in the final executable.
To address the issue, this patch inserts NOP bytes and synthesizes an
R_RISCV_ALIGN relocation at the beginning of a text section when the
alignment >= 4.
For simplicity, when RVC is disabled, we synthesize an ALIGN relocation
(addend: 2) for a 4-byte aligned section, allowing the linker to trim
the excess 2 bytes.
See also https://sourceware.org/bugzilla/show_bug.cgi?id=33236
Pull Request: https://github.com/llvm/llvm-project/pull/151639
When using Temporal Profiling with the BP algorithm, we encounter an
issue with internal function reordering. In cases where the symbol
table contains entries like:
```
Symbol table '.symtab' contains 45 entries:
Num: Value Size Type Bind Vis Ndx Name
10: 0000000000000000 0 SECTION LOCAL DEFAULT 18 .text.L1
11: 0000000000000000 24 FUNC LOCAL DEFAULT 18 L1
```
The zero-sized section symbol .text.L1 gets stored in the secToSym map
first. However, when the function lookup searches for L1 (as seen in
[BPSectionOrdererBase.inc:191](https://github.com/llvm/llvm-project/blob/main/lld/include/lld/Common/BPSectionOrdererBase.inc#L191)),
it fails to find the correct entry in rootSymbolToSectionIdxs because
the section symbol has already claimed that slot.
This patch fixes the issue by skipping zero-sized symbols during the
addSections process, ensuring that function symbols are properly
registered for lookup.
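A self-contained sketch of the skip, with toy types standing in for lld's (rootSymbolToSectionIdxs and addSections are names from the description; the rest is assumed):
```
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <vector>

struct Sym { std::string name; uint64_t size; };

// Skip zero-sized symbols so that a zero-sized SECTION symbol (.text.L1)
// cannot claim the slot needed by the FUNC symbol (L1).
void addSections(const std::vector<Sym> &syms, unsigned sectionIdx,
                 std::map<std::string, std::set<unsigned>> &rootSymbolToSectionIdxs) {
  for (const Sym &s : syms) {
    if (s.size == 0)
      continue; // zero-sized: would shadow the real function symbol
    rootSymbolToSectionIdxs[s.name].insert(sectionIdx);
  }
}
```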
The fixed-point layout algorithm handles linker scripts, thunks, and
relaxOnce (to suppress out-of-range GOT-indirect-to-PC-relative
optimization). These passes are not needed for relocatable links because
they require address information that is not yet available.
Since we don't scan relocations for relocatable links, the
`createThunks` and `relaxOnce` functions are no-ops anyway, making these
passes redundant.
To prevent cluttering the line history, I place the `if (...) break;`
inside the for loop.
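A self-contained sketch of that control flow; the condition and helper names are stand-ins, not lld's code:
```
// Hypothetical helper: one iteration of script/thunk/relaxOnce work.
bool runOneLayoutPass();

void fixedPointLayout(bool relocatable) {
  for (;;) {
    bool changed = runOneLayoutPass();
    // The break lives inside the loop rather than wrapping it, keeping the
    // line history of the surrounding code intact. For -r links, thunk
    // creation and relaxOnce are no-ops, so a single pass suffices.
    if (relocatable)
      break;
    if (!changed)
      break; // addresses converged: fixed point reached
  }
}
```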
Pull Request: https://github.com/llvm/llvm-project/pull/152240
This patch adds support for bitcode members of thin archives to DTLTO
(https://llvm.org/docs/DTLTO.html) in ELF LLD.
For DTLTO, bitcode identifiers must be valid paths to bitcode files on
disk. Clang does not support archive inputs for ThinLTO backend
compilations. This patch adjusts the identifier for bitcode members of
thin archives in DTLTO links so that it is the path to the member file
on disk, allowing such members to be supported in DTLTO.
This patch is sufficient to allow for self-hosting an LLVM build with
DTLTO when thin archives are used.
Note: Bitcode members of non-thin archives remain unsupported. This will
be addressed in a future change.
Testing:
- LLD lit test coverage has been added to check that the identifier is
adjusted appropriately.
- A cross-project lit test has been added to show that a DTLTO link can
succeed when linking bitcode members of thin archives.
For the design discussion of the DTLTO feature, see: #126654.
On LoongArch, we perform the GOT-indirection-to-PC-relative optimization
in the normal or medium code model, with or without an accompanying
R_LARCH_RELAX relocation.
From:
* pcalau12i $a0, %got_pc_hi20(sym_got)
* ld.w/d $a0, $a0, %got_pc_lo12(sym_got)
To:
* pcalau12i $a0, %pc_hi20(sym)
* addi.w/d $a0, $a0, %pc_lo12(sym)
If the original code sequence can be relaxed into a single `pcaddi`
instruction, this optimization is not applied (see
https://github.com/llvm/llvm-project/pull/123566).
The GOT-related optimization is split across two locations because the
`relax()` function is part of an iterative fixed-point algorithm, and we
should minimize the work done there for better linker performance.
Note: Although the optimization has been performed, the GOT entries
still exist, similar to AArch64. Eliminating the entries would increase
code complexity.
DynamicReloc::AgainstSymbol is now true and DynamicReloc::AddendOnly is
now false; uses of the constants were replaced mechanically.
Reviewers: rnk, MaskRay
Reviewed By: MaskRay
Pull Request: https://github.com/llvm/llvm-project/pull/150813
Aside from Computed, Kind is now just AddendOnly and AgainstSymbol, so
it's really just a bool reflecting whether the resulting ELF relocation
should reference the symbol or not. Refactor DynamicReloc's storage to
reflect this, splitting Computed out into its own orthogonal isFinal
bool. As part of this, rename computeRaw to finalize to reflect that
it's side-effecting.
This also allows needsDynSymIndex() to work even after finalize(), so
drop the existing assertion.
A future commit will refactor the DynamicReloc API to take
isAgainstSymbol directly, now that the enum serves little purpose, as a
more invasive, mechanical change. For this commit we keep
DynamicReloc::Kind as the external API.
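A minimal sketch of the resulting storage; isFinal is named in the description, while the other member name is assumed:
```
class DynamicReloc {
  // ...
  bool isAgainstSymbol;  // should the ELF relocation reference the symbol?
  bool isFinal = false;  // set by finalize() (formerly computeRaw)
};
```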
Reviewers: MaskRay, arichardson
Reviewed By: MaskRay, arichardson
Pull Request: https://github.com/llvm/llvm-project/pull/150812
This second constructor is just a shorthand for an AddendOnly relocation
against dummySym with R_ADDEND, so write it as such.
Reviewers: arichardson, MaskRay
Reviewed By: MaskRay, arichardson
Pull Request: https://github.com/llvm/llvm-project/pull/150811
Instead of having a special DynamicReloc::Kind, we can just use a new
RelExpr for the needed calculation. The only odd thing we do to allow
this is keep a representative symbol for the OutputSection in question
(the first one we see for it) to use in this relocation's addend
calculation.
This reduces DynamicReloc to just AddendOnly vs AgainstSymbol, plus the
internal Computed.
Reviewers: MaskRay, arichardson
Reviewed By: MaskRay, arichardson
Pull Request: https://github.com/llvm/llvm-project/pull/150810
The former is just a special case of the latter, ignoring the expr and
always just using the addend. If we use R_ADDEND as expr (which
previously had no effect, and so was misleadingly R_ABS not R_ADDEND in
all but one use) then we don't need to maintain this as a separate case.
Aside from the internal Computed Kind, this just leaves MipsMultiGotPage
as a special case; the only difference between the other two Kind values
is what needsDynSymIndex returns.
Reviewers: MaskRay
Reviewed By: MaskRay
Pull Request: https://github.com/llvm/llvm-project/pull/150798
The former is just a special case of the latter, ignoring the expr and
always just using the addend, allowing (and enforcing) that sym is null.
If we just use dummySym then we don't need to maintain this as a
separate case, since R_ADDEND will return the addend unmodified for the
call to getRelocTargetVA.
Reviewers: MaskRay, arichardson
Reviewed By: MaskRay, arichardson
Pull Request: https://github.com/llvm/llvm-project/pull/150797
Currently we set the kind to AddendOnly in computeRaw() in order to
catch cases where we're not treating the DynamicReloc as computed.
Specifically, computeAddend() will then assert that sym is nullptr, and
so can catch any subsequent calls for relocations that have sym set.
However, if the DynamicReloc was already AddendOnly (or
MipsMultiGotPage), we will silently allow this, which does work
correctly, but is not the intended use. We also cannot catch cases where
needsDynSymIndex() is called after this point, which would give a
misleading value if the kind were previously against a symbol.
By introducing a new (internal) Computed kind we can be explicit and add
more rigorous assertions, rather than abusing AddendOnly.
Reviewers: arichardson, MaskRay
Reviewed By: arichardson, MaskRay
Pull Request: https://github.com/llvm/llvm-project/pull/150799
This ensures subsequent calls to elf::postScanRelocations with a new Ctx
will correctly use an instance with the right internalFile (with the old
one presumably deleted, even). It also avoids having to create a new
instance in elf::getErrorPlace, and will allow more uses of such a dummy
symbol in future commits.
Reviewers: MaskRay
Reviewed By: MaskRay
Pull Request: https://github.com/llvm/llvm-project/pull/150796
Linker relaxation to R_LARCH_GOT_PC_{HI20,LO12} is only possible when
the addend of the relocation is zero.
Note: For `ld.bfd`, GOT references with non-zero addends will trigger an
assert in LoongArch, but `lld` handles these cases without any errors.
```
ld.bfd: BFD (GNU Binutils) 2.44.0 assertion fail
/usr/src/debug/binutils/binutils-gdb/bfd/elfnn-loongarch.c:4248
```
The current implementation is dangerous if used in contexts that need a
single statement, since `invokeELFT(...);` is in fact two statements: a
switch statement and an empty statement.
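The hazard is easy to reproduce with any macro of the same shape; the macros below are illustrative stand-ins, not lld's actual definition, and the do-while(0) remedy shown is the classic fix rather than necessarily what the patch does:
```
#define INVOKE_BAD(fn) switch (kind) { case 64: fn(); break; }
#define INVOKE_OK(fn)                                                         \
  do {                                                                        \
    switch (kind) { case 64: fn(); break; }                                   \
  } while (0)

void run();
void fallback();

void dispatch(int kind, bool cond) {
  // if (cond) INVOKE_BAD(run); else fallback();
  //   does not compile: the expansion ends with `}`, so the trailing `;`
  //   becomes an empty statement and the `else` loses its matching `if`.
  if (cond)
    INVOKE_OK(run); // a single statement, so if/else nests as expected
  else
    fallback();
}
```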
This patch enables lld to read AArch64 Build Attributes and convert them
into GNU Properties.
Changes:
- Parses AArch64 Build Attributes from input object files.
- Converts known attributes into corresponding GNU Properties.
- Merges attributes when linking multiple objects.
Spec reference:
https://github.com/ARM-software/abi-aa/pull/230/files#r1030
---------
Co-authored-by: Sivan Shani <sivan.shani@arm.com>
PR #148920 was merged before I could share my comments.
* Fix the test filename. There are other minor suggestions, which can be
done in #148985.
* Make `isRelRoDataSection` concise, to be consistent with the majority of
helper functions.
https://discourse.llvm.org/t/rfc-profile-guided-static-data-partitioning/83744
proposes to partition a static data section (like `.data.rel.ro`) into
two sections, one grouping the cold ones and the other grouping the
rest.
lld requires all RELRO sections to be contiguous. To place
`.data.rel.ro.unlikely` in the middle of the RELRO sections, this change
adds `.data.rel.ro.unlikely` explicitly as a RELRO section.
---------
Co-authored-by: Sam Elliott <quic_aelliott@quicinc.com>
Without thunks, programs will encounter link errors complaining that the
branch target is out of range. Thunks will extend the range of branch
targets, which is a critical need for large programs. Thunks provide
this flexibility at a cost of some modest code size increase.
When configured with the maximal feature set, the hexagon port of the
linux kernel would often encounter these limitations when linking with
`lld`.
The relocations that will be extended by thunks are:
* R_HEX_B22_PCREL, R_HEX_{G,L}D_PLT_B22_PCREL, R_HEX_PLT_B22_PCREL:
±8 MiB on the baseline
* R_HEX_B15_PCREL: ±65,532 bytes
* R_HEX_B13_PCREL: ±16,380 bytes
* R_HEX_B9_PCREL: ±1,020 bytes
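The ranges follow from each branch encoding a signed, 4-byte-scaled word offset in an N-bit field; a rough check might look like this sketch (not lld's code, and the quoted bounds are slightly tighter at the edges):
```
#include "llvm/Support/MathExtras.h"

// An N-bit field scaled by 4 reaches roughly +/- 2^(N+1) bytes, e.g.
// N=22 gives +/- 8 MiB for B22_PCREL, N=9 gives about +/- 1 KiB.
static bool hexagonBranchInRange(unsigned fieldBits, int64_t byteOffset) {
  return llvm::isIntN(fieldBits + 2, byteOffset); // +2 for the x4 scaling
}
```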
Fixes #149689
---------
Co-authored-by: Alexey Karyakin <akaryaki@quicinc.com>
* Fix `.reloc constant` to mean `section_symbol+constant` instead of
`.+constant`. The initial .reloc support from MIPS incorrectly
interpreted the offset.
* Delay the evaluation of the offset expression until after
MCAssembler::layout, deleting a lot of code working with MCFragment.
* Delete many FIXMEs from https://reviews.llvm.org/D79625
* Some lld/ELF/Arch/LoongArch.cpp relaxation tests rely on `.reloc .,
R_LARCH_ALIGN` generating ALIGN relocations at specific locations.
Sort the relocations.
Merge the attributes of object files being linked together. The
`.hexagon.attributes` section can be used by loaders and analysis tools.
This is similar to .riscv.attributes, introduced in
8a900f2438b4a167b98404565ad4da2645cc9330 /
https://reviews.llvm.org/D138550.
The linker was crashing due to stack overflow when parsing ':ALIGN' in
an output section description. This commit fixes the linker script
parser so that the crash does not happen.
The root cause of the stack overflow is how we parse expressions
(readExpr) in linker script and the behavior of ScriptLexer::expect(...)
utility. ScriptLexer::expect does not do anything if errors have already
been encountered during linker script parsing. In particular, it never
increments the current token position in the script file, even if the
current token is the same as the expected token. This causes an infinite
call cycle on parsing an expression such as '(4096)' when an error has
already been encountered.
```
readExpr() calls readPrimary()
readPrimary() calls readParenExpr()
readParenExpr():
  expect("("); // no-op, current token still points to '('
  Expression *E = readExpr(); // the cycle continues...
```
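One way to break the cycle is simply to stop descending once an error has been diagnosed; this is a sketch of that shape under the names above (the hypothetical errorCount() guard may differ from the actual fix):
```
Expression *ScriptParser::readParenExpr() {
  if (errorCount())
    return nullptr; // a prior error means expect() cannot make progress,
                    // so bail out instead of recursing forever
  expect("(");
  Expression *e = readExpr();
  expect(")");
  return e;
}
```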
Closes #146722
Signed-off-by: Parth Arora <partaror@qti.qualcomm.com>
This patch introduces support for Integrated Distributed ThinLTO (DTLTO)
in ELF LLD.
DTLTO enables the distribution of ThinLTO backend compilations via
external distribution systems, such as Incredibuild, during the
traditional link step: https://llvm.org/docs/DTLTO.html.
It is expected that users will invoke DTLTO through the compiler driver
(e.g., Clang) rather than calling LLD directly. A Clang-side interface
for DTLTO will be added in a follow-up patch.
Note: Bitcode members of archives (thin or non-thin) are not currently
supported. This will be addressed in a future change. As a consequence
of this lack of support, this patch is not sufficient to allow for
self-hosting an LLVM build with DTLTO. Theoretically,
--start-lib/--end-lib could be used instead of archives in a self-host
build. However, it's unclear how --start-lib/--end-lib can be easily
used with the LLVM build system.
Testing:
- ELF LLD `lit` test coverage has been added, using a mock distributor
to avoid requiring Clang.
- Cross-project `lit` tests cover integration with Clang.
For the design discussion of the DTLTO feature, see: #126654.
Support TLSDESC to initial-exec (IE) or local-exec (LE) optimizations.
Introduce a new hook, RE_LOONGARCH_RELAX_TLS_GD_TO_IE_PAGE_PC, and use
the existing R_RELAX_TLS_GD_TO_IE_ABS to support TLSDESC => IE, and the
existing R_RELAX_TLS_GD_TO_LE to support TLSDESC => LE.
In the normal or medium code model, there are two forms of code sequences:
* pcalau12i $a0, %desc_pc_hi20(sym_desc)
* addi.d $a0, $a0, %desc_pc_lo12(sym_desc)
* ld.d $ra, $a0, %desc_ld(sym_desc)
* jirl $ra, $ra, %desc_call(sym_desc)
------
* pcaddi $a0, %desc_pcrel_20(sym_desc)
* ld.d $ra, $a0, %desc_ld(sym_desc)
* jirl $ra, $ra, %desc_call(sym_desc)
Convert to IE:
* pcalau12i $a0, %ie_pc_hi20(sym_ie)
* ld.[wd] $a0, $a0, %ie_pc_lo12(sym_ie)
Convert to LE:
* lu12i.w $a0, %le_hi20(sym_le) # le_hi20 != 0, otherwise NOP
* ori $a0, src, %le_lo12(sym_le) # le_hi20 != 0, src = $a0, otherwise src = $zero
For simplicity, for both tlsdescToIe and tlsdescToLe we always convert
the preceding instructions to NOPs, because both forms of code sequence
(corresponding to the relocation combinations
R_LARCH_TLS_DESC_PC_HI20+R_LARCH_TLS_DESC_PC_LO12 and
R_LARCH_TLS_DESC_PCREL20_S2) can then share the same processing.
TODO: When relaxation is enabled, the redundant NOPs can be removed.
This will be implemented in a future patch.
Note: The different forms of TLSDESC code sequences should not appear
interleaved in the normal, medium, or extreme code model; compilers do
not generate such code, and lld does not support it. This is thanks to
the guard in PostRASchedulerList.cpp in llvm:
```
Calls are not scheduling boundaries before register allocation,
but post-ra we don't gain anything by scheduling across calls
since we don't need to worry about register pressure.
```
For non-SHF_ALLOC sections, sh_addr is set to 0.
Skip sections without the SHF_ALLOC flag, so that `minVA` is not clamped
to 0 by non-SHF_ALLOC sections and the size of non-SHF_ALLOC sections
does not contribute to `maxVA`.
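A self-contained sketch of the fixed scan with toy types (names assumed):
```
#include <algorithm>
#include <cstdint>
#include <vector>

struct Sec { uint64_t flags, addr, size; };
constexpr uint64_t SHF_ALLOC = 0x2; // ELF section flag

// Non-SHF_ALLOC sections have sh_addr == 0, so including them would clamp
// minVA to 0 and inflate maxVA by their sizes.
void computeRange(const std::vector<Sec> &secs, uint64_t &minVA,
                  uint64_t &maxVA) {
  for (const Sec &s : secs) {
    if (!(s.flags & SHF_ALLOC))
      continue;
    minVA = std::min(minVA, s.addr);
    maxVA = std::max(maxVA, s.addr + s.size);
  }
}
```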
Fixed assertion failure when reading .eh_frame sections, and added
.eh_frame sections to tests.
This reverts commit 1e95349dbe329938d2962a78baa0ec421e9cd7d1.
Original commit message follows:
When code calls a function which then immediately tail calls another
function there is no need to go via the intermediate function. By
branching directly to the target function we reduce the program's working
set for a slight increase in runtime performance.
Normally it is relatively uncommon to have functions that just tail call
another function, but with LLVM control flow integrity we have jump tables
that replace the function itself as the canonical address. As a result,
when a function address is taken and called directly, for example after
a compiler optimization resolves the indirect call, or if code built
without control flow integrity calls the function, the call will go via
the jump table.
The impact of this optimization was measured using a large internal
Google benchmark. The results were as follows:
CFI enabled: +0.1% ± 0.05% queries per second
CFI disabled: +0.01% queries per second [not statistically significant]
The optimization is enabled by default at -O2 but may also be enabled
or disabled individually with --{,no-}branch-to-branch.
This optimization is implemented for AArch64 and X86_64 only.
lld's runtime performance (real execution time) after adding this
optimization was measured using firefox-x64 from lld-speed-test [1]
with ldflags "-O2 -S" on an Apple M2 Ultra. The results are as follows:
```
N Min Max Median Avg Stddev
x 512 1.2264546 1.3481076 1.2970261 1.2965788 0.018620888
+ 512 1.2561196 1.3839965 1.3214632 1.3209327 0.019443971
Difference at 95.0% confidence
0.0243538 +/- 0.00233202
1.87831% +/- 0.179859%
(Student's t, pooled s = 0.0190369)
```
[1] https://discourse.llvm.org/t/improving-the-reproducibility-of-linker-benchmarking/86057
Reviewers: zmodem, MaskRay
Reviewed By: MaskRay
Pull Request: https://github.com/llvm/llvm-project/pull/145579
This caused assertion failures in applyBranchToBranchOpt():
```
llvm/include/llvm/Support/Casting.h:578:
decltype(auto) llvm::cast(From*)
[with To = lld::elf::InputSection; From = lld::elf::InputSectionBase]:
Assertion `isa<To>(Val) && "cast<Ty>() argument of incompatible type!"' failed.
```
See comment on the PR (https://github.com/llvm/llvm-project/pull/138366)
This reverts commit 491b82a5ec1add78d2c93370580a2f1897b6a364.
This also reverts the follow-up "[lld] Use llvm::partition_point (NFC) (#145209)"
This reverts commit 2ac293f5ac4cf65c0c038bf75a88f1d6715e467d.
When code calls a function which then immediately tail calls another
function there is no need to go via the intermediate function. By
branching directly to the target function we reduce the program's working
set for a slight increase in runtime performance.
Normally it is relatively uncommon to have functions that just tail call
another function, but with LLVM control flow integrity we have jump tables
that replace the function itself as the canonical address. As a result,
when a function address is taken and called directly, for example after
a compiler optimization resolves the indirect call, or if code built
without control flow integrity calls the function, the call will go via
the jump table.
The impact of this optimization was measured using a large internal
Google benchmark. The results were as follows:
CFI enabled: +0.1% ± 0.05% queries per second
CFI disabled: +0.01% queries per second [not statistically significant]
The optimization is enabled by default at -O2 but may also be enabled
or disabled individually with --{,no-}branch-to-branch.
This optimization is implemented for AArch64 and X86_64 only.
lld's runtime performance (real execution time) after adding this
optimization was measured using firefox-x64 from lld-speed-test [1]
with ldflags "-O2 -S" on an Apple M2 Ultra. The results are as follows:
```
N Min Max Median Avg Stddev
x 512 1.2264546 1.3481076 1.2970261 1.2965788 0.018620888
+ 512 1.2561196 1.3839965 1.3214632 1.3209327 0.019443971
Difference at 95.0% confidence
0.0243538 +/- 0.00233202
1.87831% +/- 0.179859%
(Student's t, pooled s = 0.0190369)
```
[1] https://discourse.llvm.org/t/improving-the-reproducibility-of-linker-benchmarking/86057
Pull Request: https://github.com/llvm/llvm-project/pull/138366
Include the offset of a thunk in the ThunkSection when adding symbols.
At Thunk creation time the offset is set to 0 as we don't know where in
the ThunkSection the Thunk will end up. The symbol values are updated by
the setOffset() call in assignOffsets().
When we transform a thunk from a short one to a long one, we sometimes
add a mapping symbol. At this point the offset of the thunk is non-zero,
and we need to account for that when defining the symbol, as the
setOffset() call subtracts the old offset before adding the new one back
in.
To test, a second thunk that is converted to a long thunk was added to
aarch64-thunk-bit-multipass. This second thunk is given a non-zero
offset from the start of the ThunkSection so that we can observe the
mapping symbol being placed in the wrong location when the offset is not
accounted for.
Fixes: https://github.com/llvm/llvm-project/issues/142326
+ If `-z zicfilp=implicit` or the option is not specified, the output
has the ZICFILP feature enabled/disabled based on the input objects.
+ If `-z zicfilp=<never|unlabeled|func-sig>`, the output has the
ZICFILP feature forced <off|on to the "unlabeled" scheme|on to the
"func-sig" scheme>.
+ If `-z zicfiss=implicit` or the option is not specified, the output
has the ZICFISS feature enabled/disabled based on the input objects.
+ If `-z zicfiss=<never|always>`, the output has the ZICFISS feature
forced <off|on>.
Previously, the AArch64 PAuth ABI core values were stored as an
ArrayRef<uint8_t>, introducing unnecessary indirection.
This patch replaces the ArrayRef with two explicit uint64_t fields:
aarch64PauthAbiPlatform and aarch64PauthAbiVersion. This simplifies the
representation and improves readability.
No functional change intended, aside from improved error messages.
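A sketch of the new representation; the field names come from the description, while the enclosing struct is illustrative:
```
#include <cstdint>

struct Config {
  // Previously a single ArrayRef<uint8_t> holding both values indirectly.
  uint64_t aarch64PauthAbiPlatform = 0;
  uint64_t aarch64PauthAbiVersion = 0;
};
```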
The behavior of an undefined weak reference is implementation defined.
For static -no-pie linking, dynamic relocations are generally avoided (except
IRELATIVE). -shared linking generally emits dynamic relocations.
Dynamic -no-pie linking and -pie allow flexibility. Past changes
adjusted the behavior for better consistency and a simpler internal
representation, e.g. https://reviews.llvm.org/D63003 and
https://reviews.llvm.org/D105164 (generalized to undefined non-weak in
2fcaa00d1e2317a90c9071b735eb0e758b5dd58b).
GNU ld introduced -z [no]dynamic-undefined-weak option to fine-tune the
behavior. (The option is not very effective with -no-pie, e.g. on
x86-64, `ld.bfd a.o s.so -z dynamic-undefined-weak` generates
R_X86_64_NONE relocations instead of GLOB_DAT/JUMP_SLOT)
This patch implements -z [no]dynamic-undefined-weak option.
The effects are summarized as follows:
* Static -no-pie: no-op
* Dynamic -no-pie: nodynamic-undefined-weak suppresses GLOB_DAT/JUMP_SLOT
* Static -pie: dynamic-undefined-weak generates ABS/GLOB_DAT/JUMP_SLOT.
https://discourse.llvm.org/t/lld-weak-undefined-symbols-in-vdso-only/86749
* Dynamic -pie: nodynamic-undefined-weak suppresses ABS/GLOB_DAT/JUMP_SLOT
The -pie behavior likely stays stable while -no-pie (`!ctx.arg.isPic` in
`isStaticLinkTimeConstant`) behavior will likely change in the future.
The current default value of ctx.arg.zDynamicUndefined is selected to
prevent behavior changes.
Pull Request: https://github.com/llvm/llvm-project/pull/143831
This way, when mixing small and large text, the large text stays out of
the way of the rest of the binary.
Place large RX sections at the beginning rather than at the end so that
with `--no-rosegment`, the large text and rodata share a single PT_LOAD
segment. Place large RWX sections at the end to keep writable and
readonly sections separate.
Clang started emitting the large section flag for `.ltext` sections in
#73037.
* Merge the special case into isStaticLinkTimeConstant.
* Generalize isUndefWeak to isUndefined. Undefined non-weak is an error
case; we choose to be general, which also brings us in line with GNU ld.
Increase specificity by using the correct unit sizes. 'KBytes' is an
abbreviation for kB (1000 bytes), and the hardware industry as well as
several operating systems have now switched to 1000-byte kBs.
If this change is acceptable: GitHub sometimes mangles merges to use the
account's original email, and $dayjob asks that contributions carry my
work email. Thanks!
The new test (derived from riscv32 openssl/test/cmp_msg_test.c) revealed
oscillation in two R_RISCV_CALL_PLT jumps:
- First jump (~2^11 bytes away): alternated between 4 and 8 bytes.
- Second jump (~2^20 bytes away): alternated between 2 and 8 bytes.
The issue is not related to alignment. In 2019, GNU ld addressed a
similar problem by reducing the relaxation allowance for cross-section
relaxation (https://sourceware.org/bugzilla/show_bug.cgi?id=25181).
This approach would result in a suboptimal layout for the tight range
tested by riscv-relax-call.s.
This patch stabilizes the process by preventing `remove` from increasing
after a few passes, similar to the integrated assembler's fragment
relaxation.
(For the Android bit reproducer, `pass < 2` leads to a non-optimal
layout, while `pass < 3` and `pass < 4` produce identical output.)
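Conceptually, the damping looks like this sketch, keyed to the pass counts above (the names and exact threshold are assumptions, not lld's code):
```
#include <algorithm>
#include <cstdint>

// After the first few passes, a relocation's `remove` value may no longer
// grow, so two call sites cannot keep re-triggering each other's relaxation.
uint32_t stabilizedRemove(int pass, uint32_t candidate, uint32_t previous) {
  if (pass < 3)
    return candidate;                   // early passes: relax freely
  return std::min(candidate, previous); // later passes: no further growth
}
```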
Fix https://github.com/llvm/llvm-project/issues/113838
Possibly fix https://github.com/llvm/llvm-project/issues/123248 (inputs
are bitcode, subject to ever-changing code generation, not reproducible)
Pull Request: https://github.com/llvm/llvm-project/pull/142899