17981 Commits

Author SHA1 Message Date
Daniel Thornburgh
fecf609998
Reland "[LTO][LLD] Prevent invalid LTO libfunc transforms (#164916)" (#190642)
This reverts commit 1ec7e86b3a779df2a0af3f37e58c8f5b3a398d7f after issue
#190072 was fixed.
2026-04-06 19:20:45 +00:00
Jasmine Tang
e3e4b8481d
[WebAssembly] Add support for shared tags (#188367)
Mostly following the structure of other Shared* constructs

Fixes: #188120
2026-04-05 05:53:20 +00:00
Haohai Wen
6565e08c1e
[lld][COFF] Add /discard-section option to discard input sections by name (#189542)
This provides a general mechanism for COFF, similar to the /DISCARD/
output section in ELF linker scripts. The intention, though, is to
explicitly discard .llvmbc and .llvmcmd sections. (See the discussion in
#150897 and #188398 for more details.)
2026-04-04 20:26:12 +08:00
Fangrui Song
2f7bd4fa97
[ELF] Enable parallel relocation scanning for -z nocombreloc and PPC64 (#190309)
The `bool serial` condition in scanRelocations disabled parallelism for
three cases: -z nocombreloc, MIPS, and PPC64. Resolve two cases:

- nocombreloc: .rela.dyn is now always created with combreloc=true so
  non-relative relocations are sorted deterministically. Since
  #187964 already separates relative relocations unconditionally,
  the only remaining effect of -z nocombreloc is suppressing
  DT_RELACOUNT (gated on ctx.arg.zCombreloc in DynamicSection).

- PPC64: After #181496 moved scanning into scanSectionImpl, the
  sole thread-unsafe access is ctx.ppc64noTocRelax (DenseSet::insert).
  Protect it with ctx.relocMutex, which is already used for rare
  operations during parallel scanning.

MIPS retains serial scanning due to `MipsGotSection` mutations.
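The PPC64 fix locks only around the single thread-unsafe operation. The sketch below is a simplified illustration of that pattern. It assumes a `std::unordered_set` stands in for the `DenseSet` and a `std::mutex` stands in for `ctx.relocMutex`. `ScanContext`, `markNoTocRelax`, and `parallelScan` are hypothetical names, not lld code.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <thread>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for the linker context: a shared set that is rarely
// written during parallel scanning, guarded by one mutex.
struct ScanContext {
  std::mutex relocMutex;                    // plays the role of ctx.relocMutex
  std::unordered_set<uint64_t> noTocRelax;  // plays the role of ctx.ppc64noTocRelax
};

// Called from many scanning threads. The insert is the only thread-unsafe
// operation, so only it is serialized; the rest of the scan runs lock-free.
void markNoTocRelax(ScanContext &ctx, uint64_t symId) {
  std::lock_guard<std::mutex> lock(ctx.relocMutex);
  ctx.noTocRelax.insert(symId);
}

// Scan `numSections` sections on `numThreads` threads. In this toy model,
// every even-indexed section triggers the rare locked operation.
size_t parallelScan(ScanContext &ctx, unsigned numSections, unsigned numThreads) {
  std::vector<std::thread> workers;
  for (unsigned t = 0; t < numThreads; ++t)
    workers.emplace_back([&ctx, t, numSections, numThreads] {
      for (unsigned i = t; i < numSections; i += numThreads)
        if (i % 2 == 0)
          markNoTocRelax(ctx, i);
    });
  for (auto &w : workers)
    w.join();
  return ctx.noTocRelax.size();
}
```

The lock is uncontended in the common case because the guarded operation is rare. This is why reusing an existing mutex is cheaper than making the whole scan serial.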
2026-04-02 22:00:15 -07:00
Simi Pallipurath
dc9be4ee30
[LLD][ELF] Skip non-inputsections to avoid invalid cast in Arm BE8 handling (#188154)
This patch fixes https://github.com/llvm/llvm-project/issues/187033

In BE8 mode, instruction bytes are reversed for sections containing
code. This logic currently assumes that Arm mapping symbols (e.g. $a,
$t, $d) are always associated with InputSections.

However, mapping symbols can also be defined in other section types such
as mergeable sections (SHF_MERGE). These are not represented as
InputSection, and attempting to cast them using
cast_if_present<InputSection> results in an assertion failure.
2026-04-02 10:16:54 +01:00
Fangrui Song
6f9646a598
[ELF] Parallelize --gc-sections mark phase (#189321)
Add `markParallel` using level-synchronized `parallelFor`. Each BFS
level is processed in parallel; newly discovered sections are collected
in per-thread queues and merged for the next level.

The parallel path is used when `!TrackWhyLive && partitions.size()==1`.
`parallelFor` naturally degrades to serial when `--threads=1`.

Uses depth-limited inline recursion (depth<3) and optimistic
load-then-exchange dedup for best performance.
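Here is a simplified sketch of a level-synchronized parallel mark. It uses plain `std::thread` in place of LLVM's `parallelFor` and omits the depth-limited inline recursion. The graph type and `markParallelSketch` are hypothetical. Each frontier is split across threads, newly marked nodes go into per-thread queues, and the queues are concatenated to form the next level. The "optimistic load-then-exchange" step is the cheap relaxed load that filters out nodes that are already live before the atomic exchange runs.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical section graph: edges[i] lists the sections referenced by i.
using Graph = std::vector<std::vector<size_t>>;

std::vector<bool> markParallelSketch(const Graph &g, const std::vector<size_t> &roots,
                                     unsigned numThreads) {
  std::vector<std::atomic<bool>> live(g.size());  // value-initialized to false
  // Optimistic dedup: a relaxed load skips already-live nodes cheaply; the
  // exchange then decides which thread "wins" a newly discovered node.
  auto tryMark = [&](size_t n) {
    return !live[n].load(std::memory_order_relaxed) &&
           !live[n].exchange(true, std::memory_order_relaxed);
  };

  std::vector<size_t> frontier;
  for (size_t r : roots)
    if (tryMark(r))
      frontier.push_back(r);

  while (!frontier.empty()) {
    // Process one BFS level in parallel; each thread owns one queue.
    std::vector<std::vector<size_t>> queues(numThreads);
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < numThreads; ++t)
      workers.emplace_back([&, t] {
        for (size_t i = t; i < frontier.size(); i += numThreads)
          for (size_t succ : g[frontier[i]])
            if (tryMark(succ))
              queues[t].push_back(succ);
      });
    for (auto &w : workers)
      w.join();
    // Merge per-thread queues to form the next level.
    frontier.clear();
    for (auto &q : queues)
      frontier.insert(frontier.end(), q.begin(), q.end());
  }

  std::vector<bool> result(g.size());
  for (size_t i = 0; i < g.size(); ++i)
    result[i] = live[i].load();
  return result;
}
```

Because `exchange` returns the previous value, exactly one thread enqueues each node. This holds even when several threads race on the same successor.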

Linking a Release+Asserts clang (--gc-sections, --time-trace) on an old
x86-64:

8 threads: markLive 315ms -> 82ms (-234ms). Total 1562ms -> 1350ms
(1.16x).
16 threads: markLive 199ms -> 50ms (-149ms). Total 1017ms -> 862ms
(1.18x).

and on Apple M4: markLive 61ms -> 13ms. Total 317.3ms -> 272.7ms
(1.16x).
2026-04-02 06:42:00 +00:00
Fangrui Song
6a87416162
[ELF] Move Symbol::used to atomic flags field (#190117)
Move the `used` bitfield into the existing `std::atomic<uint16_t> flags`,
making it safe for concurrent access from parallel GC mark (#189321).
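The sketch below shows why an atomic flags word is safe where adjacent bitfields are not. It assumes a toy `SymbolSketch` with hypothetical flag names. Setting a bit with `fetch_or` is a single atomic read-modify-write, so concurrent setters of different bits cannot overwrite each other.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical symbol with boolean properties packed into one atomic word.
// With plain bitfields, two threads writing neighbouring bits would each do a
// non-atomic read-modify-write of the same memory location, which is a race.
struct SymbolSketch {
  enum : uint16_t { NEEDS_GOT = 1 << 0, NEEDS_PLT = 1 << 1, USED = 1 << 2 };
  std::atomic<uint16_t> flags{0};

  void setUsed() { flags.fetch_or(USED, std::memory_order_relaxed); }
  bool isUsed() const { return flags.load(std::memory_order_relaxed) & USED; }
};
```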
2026-04-01 23:21:13 -07:00
Fangrui Song
2118499a89
[ELF] Decouple SharedFile::isNeeded from GC mark. NFC (#190112)
... out of the per-relocation resolveReloc and into a post-GC scan of
global symbols. This decouples the --as-needed logic from the mark
algorithm, simplifying the imminent parallel GC mark.
2026-04-01 22:42:51 -07:00
Fangrui Song
0bde74ab04
[ELF] Pass SectionPiece by reference in getSectionPiece. NFC (#190110)
The generated assembly looks more optimized. In addition, this avoids
a widened load, which would cause a TSan-detected data race with parallel
--gc-sections (#189321).
2026-04-01 22:07:42 -07:00
Fangrui Song
8daaa26efd
[Support] Support nested parallel TaskGroup via work-stealing (#189293)
Nested TaskGroups run serially to prevent deadlock, as documented by
https://reviews.llvm.org/D61115 and refined by
https://reviews.llvm.org/D148984 to use threadIndex.

Enable nested parallelism by having worker threads actively execute
tasks from the work queue while waiting (work-stealing), instead of
just blocking. Root-level TaskGroups (main thread) keep the efficient
blocking Latch::sync(), so there is no overhead for the common
non-nested case.

In lld, https://reviews.llvm.org/D131247 worked around the limitation
by passing a single root TaskGroup into OutputSection::writeTo and
spawning 4MB-chunked tasks into it. However, SyntheticSection::writeTo
calls with internal parallelism (e.g. GdbIndexSection,
MergeNoTailSection) still ran serially on worker threads. With this
change, their internal parallelFor/parallelForEach calls parallelize
automatically via helpSync work-stealing.

The increased parallelism can reorder error messages from parallel
phases (e.g. relocation processing during section writes), so one lld
test is updated to use --threads=1 for deterministic output.
2026-04-01 19:20:16 -07:00
Zhaoxuan Jiang
fd609e5d33
[lld] Glob-based BP compression sort groups (#185661)
Add
--bp-compression-sort-section=<glob>[=<layout_priority>[=<match_priority>]]
to let users split input sections into multiple compression groups, run
balanced partitioning independently per group, and leave out sections
that are poor candidates for BP. This replaces the old coarse
--bp-compression-sort with a more explicit, user-controlled one.

In ELF, the glob matches input section names (.text.unlikely.cold1). In
Mach-O, it matches the concatenated segment+section name (__TEXT__text).

layout_priority controls group placement in the final layout.
match_priority resolves conflicts when multiple globs match the same
section: explicit priority beats positional matching, and among
positional specs the last match wins.
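A minimal sketch of the conflict-resolution rules described above: an explicit `match_priority` beats positional matching, and among positional specs the last match wins. The text does not say how two explicit priorities compare. This sketch *assumes* the larger value wins and that ties go to the later spec. The glob matcher is also simplified (only `*` and `?`). `SortSpec`, `globMatch`, and `pickSpec` are hypothetical names.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Simplified glob: '*' matches any run of characters, '?' matches one.
bool globMatch(const char *pat, const char *str) {
  if (*pat == '\0')
    return *str == '\0';
  if (*pat == '*')
    return globMatch(pat + 1, str) || (*str != '\0' && globMatch(pat, str + 1));
  if (*str != '\0' && (*pat == '?' || *pat == *str))
    return globMatch(pat + 1, str + 1);
  return false;
}

struct SortSpec {
  std::string glob;
  std::optional<int> matchPriority;  // absent => positional matching
};

// Returns the index of the winning spec, or -1 if no glob matches.
int pickSpec(const std::vector<SortSpec> &specs, const std::string &section) {
  int best = -1;
  for (int i = 0; i < (int)specs.size(); ++i) {
    if (!globMatch(specs[i].glob.c_str(), section.c_str()))
      continue;
    if (best < 0) {
      best = i;
      continue;
    }
    const SortSpec &cur = specs[i], &prev = specs[best];
    if (cur.matchPriority && !prev.matchPriority)
      best = i;  // explicit priority beats positional matching
    else if (!cur.matchPriority && prev.matchPriority)
      continue;  // keep the earlier explicit match
    else if (!cur.matchPriority)
      best = i;  // both positional: the last match wins
    else if (*cur.matchPriority >= *prev.matchPriority)
      best = i;  // assumption: higher priority wins, ties go to the later spec
  }
  return best;
}
```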

A CRTP hook getCompressionSubgroupKey() allows backends to further
subdivide glob groups into independent BP instances. This allows the Mach-O
backend to separate cold functions via N_COLD_FUNC in the future.

The deprecated --bp-compression-sort option keeps its existing
function/data behavior by assigning sections to fixed legacy groups.
2026-04-01 17:53:08 -07:00
Fangrui Song
42cc454777
[ELF] Optimize binary search in getSectionPiece (#187916)
Two optimizations to make getSectionPiece O(1) for common cases:

1. For non-string fixed-size merge sections, use direct computation
   (offset / entsize) instead of binary search.

2. Pre-resolve piece indices for non-section Defined symbols during
   splitSections. The piece index and intra-piece offset are packed
   into Defined::value as ((pieceIdx+1) << 32) | intraPieceOffset,
   replacing repeated binary searches (MarkLive, includeInSymtab,
   getRelocTargetVA) with a single upfront resolution.
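The two optimizations reduce to simple arithmetic, sketched below with hypothetical helper names (not lld's API). Optimization 1 is a division. Optimization 2 uses the stated packing `((pieceIdx+1) << 32) | intraPieceOffset`. The sketch assumes the intra-piece offset fits in 32 bits. The `+1` guarantees that a packed value has non-zero high bits; presumably this lets packed values be distinguished from plain offsets below 2^32.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// (1) Fixed-size, non-string merge sections: each piece is entsize bytes, so
// the piece containing `offset` is found by division, with no binary search.
inline std::pair<uint64_t, uint64_t> fixedSizePiece(uint64_t offset, uint64_t entsize) {
  return {offset / entsize, offset % entsize};  // (pieceIdx, intraPieceOffset)
}

// (2) Pre-resolved encoding stored in a 64-bit value.
inline uint64_t packPiece(uint32_t pieceIdx, uint32_t intraPieceOffset) {
  return ((uint64_t(pieceIdx) + 1) << 32) | intraPieceOffset;
}
inline bool isPacked(uint64_t value) { return (value >> 32) != 0; }
inline uint32_t packedPieceIdx(uint64_t value) { return uint32_t(value >> 32) - 1; }
inline uint32_t packedIntraOffset(uint64_t value) { return uint32_t(value); }
```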

On x86-64, references to mergeable strings use local labels:

    leaq .LC0(%rip), %rax  # R_X86_64_PC32 .LC0-4

The relocations use non-section symbols and benefit from optimization 2.
On many other targets (e.g. AArch64), the addend is 0 and the assembler
adjusts such relocations to reference section symbols, which still use
binary search.

On a clang link (clang-relassert reproduce tarball, x86-64):
- --gc-sections: 1.05x as fast
2026-03-30 20:51:30 -07:00
Kewen Meng
1ec7e86b3a
Revert "[LTO][LLD] Prevent invalid LTO libfunc transforms (#164916)"
This reverts commit 8b21fe60b43fe358321bca904ae307406725c002.

to unblock bot: https://lab.llvm.org/buildbot/#/builders/67/builds/1196
2026-03-30 22:25:25 -05:00
Shivam Gupta
14ce208a45
[LLD][AArch64] Handle R_AARCH64_TLS_DTPREL64 in non-alloc sections (#183962)
Clang plans to emit R_AARCH64_TLS_DTPREL64 in .debug_info (see PR
#146572), but LLD currently fails to recognize this relocation.

This prevents the debugger from correctly locating TLS variables when
using the DWARF DW_OP_GNU_push_tls_address or DW_AT_location with DTPREL
offsets.

This patch adds support for R_AARCH64_TLS_DTPREL64 by mapping it to
R_DTPREL.
2026-03-31 08:47:54 +05:30
Daniel Thornburgh
8b21fe60b4
[LTO][LLD] Prevent invalid LTO libfunc transforms (#164916)
In LTO, part of LLVM's middle-end runs after linking has finished. LTO's
semantics depend on the complete set of extracted bitcode files being
known at this time. If the middle-end inserts new calls to library
functions (libfuncs) that are implemented in bitcode, this could extract
new bitcode object files into the link. These cannot be compiled,
leading to undefined symbol references.

Additionally, the middle-end in LTO may reason that such library
functions have no references, and it may internalize them, then
manipulate their API or even delete them. Afterwards, it may emit a call
to them, again producing undefined symbol references.

This patch resolves the former issue by ensuring that the middle end
emits no new references to symbols defined in bitcode, and it resolves
the latter issue by ensuring that extracted bitcode for libfuncs is
considered external, since new calls may be emitted to them at any time.

The new semantics are not yet established for MachO LLD, which does not
yet appear to have any special handling for libcalls in LTO. It also
does not yet support distributed ThinLTO; doing so would require
additional (de)serialization work.

This is the patch referenced in @ilovepi's and my talk at the last LLVM
devmeeting: "LT-Uh-Oh"

Gemini 3.1 was used in porting to COFF and WASM LLDs.
2026-03-30 14:44:52 -07:00
Brian Cain
ba228181c2
[lld][Hexagon] Fix out-of-range PLT branch thunks (#186545)
Linking large Hexagon binaries (e.g. ASan runtime with >8 MiB of text)
fails with R_HEX_B22_PCREL / R_HEX_PLT_B22_PCREL relocation overflow on
calls to PLT entries, even though the thunk infrastructure exists and
needsThunks is set.

needsThunk() always used s.getVA() to compute the branch destination,
even for PLT calls where the actual destination is the PLT entry. This
meant the distance check used the wrong address and failed to create
thunks when the PLT entry was out of B22_PCREL range.

Fix by using s.getPltVA() when expr == R_PLT_PC. Also override
getThunkSectionSpacing() so ThunkSections are pre-created at appropriate
intervals for large binaries.
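The essence of the fix is to measure the branch distance to the address the branch actually reaches. The sketch below illustrates this with hypothetical types, not lld's `Symbol` or `RelExpr`. It assumes B22_PCREL encodes a signed 22-bit word offset, which gives a signed 24-bit byte displacement of about +/-8 MiB.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified branch target.
struct SymSketch {
  uint64_t va;     // the symbol's own address (s.getVA() in the text)
  uint64_t pltVA;  // the address of its PLT entry (s.getPltVA())
};

enum class RelExprSketch { PC, PLT_PC };

// Signed 24-bit byte displacement: [-8 MiB, +8 MiB).
bool b22InRange(int64_t disp) {
  return disp >= -(int64_t(1) << 23) && disp < (int64_t(1) << 23);
}

bool needsThunkSketch(RelExprSketch expr, uint64_t branchAddr, const SymSketch &s) {
  // A PLT call lands on the PLT entry, so that is the distance to check.
  uint64_t dest = expr == RelExprSketch::PLT_PC ? s.pltVA : s.va;
  return !b22InRange(int64_t(dest - branchAddr));
}
```

Using `s.va` for a PLT call can make a far-away PLT entry look reachable. In that case no thunk is created and the relocation overflows, which matches the failure described above.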
2026-03-30 14:06:47 -05:00
Brian Cain
8f4f515898
[lld][Hexagon] Fix TLS GD PLT to only create PLT entry for __tls_get_addr (#180297)
Previously, R_HEX_GD_PLT_* relocations would create PLT entries for TLS
symbols like 'foo' in addition to __tls_get_addr.

This fix skips NEEDS_PLT on TLS symbols with R_HEX_GD_PLT_*, creates the
__tls_get_addr symbol earlier with NEEDS_PLT, and changes
hexagonTLSSymbolUpdate to only rebind relocations.

Also add a test for the edge case where a GD_PLT relocation directly
references __tls_get_addr, which previously caused a crash due to
duplicate PLT entry creation.

---------

Co-authored-by: Fangrui Song <i@maskray.me>
2026-03-30 09:35:17 -05:00
Austin Hudson
1e99c9e4c7
[lld][COFF] Restore lto-embed-bitcode and -fembed-bitcode Bitcode Embedding Features (#188398)
Removes the patches introduced by #150897, which broke the documented LTO
embed-bitcode features for creating whole-program-bitcode representations
of executables, used in production analysis/rewriting toolsets. This
documented feature was available up to 21.1.8 and was broken by the 22.x
release.

This previously allowed users to have a whole-program-bitcode
section `.llvmbc` embedded inside the final executable.
2026-03-30 16:05:37 +03:00
Fangrui Song
fcd0e2cca0
[ELF] Remove redundant sec->repl != sec check in BPSectionOrderer. NFC (#189214)
ICF's InputSection::replace() calls markDead() on folded sections, so
`!sec->isLive()` already filters them.
2026-03-29 00:38:03 -07:00
Farid Zakaria
cf3a0f2553
[lld] update maintainers (#183803)
As a new contributor, it helps to see the correct maintainers listed.
2026-03-28 22:21:13 -07:00
Ilija Tovilo
1128d74438
[LLD][skip ci] Fix typo in linker_script.rst (#148867)
2026-03-27 15:50:25 -07:00
Ben Dunbobbin
80b304d14b
[DTLTO] Improve performance of adding files to the link (#186366)
The in-process ThinLTO backend typically generates object files in
memory and adds them directly to the link, except when the ThinLTO cache
is in use. DTLTO is unusual in that it adds files to the link from disk
in all cases.

When the ThinLTO cache is not in use, ThinLTO adds files via an
`AddStreamFn` callback provided by the linker, which ultimately appends
to a `SmallVector` in LLD. When the cache is in use, the linker supplies
an `AddBufferFn` callback that adds files more efficiently (by moving
`MemoryBuffer` ownership).

This patch adds a mandatory `AddBufferFn` to the DTLTO ThinLTO backend.
The backend uses this to add files to the link more efficiently.
Additionally:
- Move AddStream from CGThinBackend to InProcessThinBackend, for reader
  clarity.
- Modify linker comments that implied the AddBuffer path is
  cache-specific.

For a Clang link (Debug build with sanitizers and instrumentation) using
an optimized toolchain (PGO non-LTO, llvmorg-22.1.0), measuring the mean
`Add DTLTO files to the link` time trace scope duration:
- On Windows (Windows 11 Pro Build 26200, AMD Family 25 @ ~4.5 GHz, 16
  cores/32 threads, 64 GB RAM), this patch reduces the mean from
  2799.148 ms to 157.972 ms.
- On Linux (Ubuntu 24.04.3 LTS Kernel 6.14, Ryzen 9 5950X, 16
  cores/32 threads, boost up to 5.09 GHz, 64 GB RAM), this patch reduces
  the mean from 255.291 ms to 41.630 ms.

Based on work by @romanova-ekaterina and @kbelochapka.
2026-03-27 17:51:49 +00:00
Zhaoxuan Jiang
788ea11054
[lld-macho] Make safe ICF conservative without __llvm_addrsig (#188400)
MachO --icf=safe and --icf=safe_thunks used to keep folding code from
object files that did not contain __llvm_addrsig, which was inconsistent
with the conservative ELF/COFF behavior. Mark all symbols in such
objects as address-significant instead, and add regression coverage for
both safe ICF modes with and without addrsig.
2026-03-26 16:30:23 -07:00
Fangrui Song
bb443359a8
[ELF] Validate merge section offsets in getSymVA and match GNU ld (#188677)
Move the "offset is outside the section" error for merge sections from
getSectionPiece to getSymVA, where we know the offset comes from a
section symbol + addend. Include the offset value in the diagnostic.

Accept offset == section_size (one-past-end) to match GNU ld behavior,
while rejecting offset > section_size. Skip out-of-bounds offsets in
MarkLive to avoid assertion failures in getSectionPiece.
2026-03-26 10:29:36 -07:00
Fangrui Song
36aef4ba81
[ELF,test] Combine merge section out-of-bounds tests into merge-piece-oob.s (#188688)
2026-03-25 23:06:34 -07:00
Elia Geretto
ebd62d652c
[lld][ELF][clang][MTE] Add -z memtag-{mode,heap,stack} (#188205)
This change eliminates the Android-specific --android-memtag-* flags
from lld, replacing them with -z memtag-* generic equivalents. With
these generic flags, the linker will emit only the dynamic array tags
specified in the "Memtag ABI Extension to ELF", but no Android-specific
memtag note.

In addition, this change adds an --android-memtag-note flag which should
be used when the Android-specific memtag note should be emitted.

This change also modifies the clang driver to make use of the new flags.
2026-03-25 13:02:58 -07:00
Brian Cain
51c3f971a0
[lld][Hexagon] Test undefined weak branches (#186613)
Undefined weak branches do not need thunks (needsThunk() returns false
for them).

Add a test case to cover undef weak.
2026-03-25 15:03:35 +00:00
Fangrui Song
f599bfcd27
[ELF] Guard relocation section handling behind copyRelocs in addOrphanSections. NFC (#188409)
In addOrphanSections, getRelocatedSection() only returns non-null for -r
or --emit-relocs links. Guard code blocks with `copyRelocs` to skip
unnecessary dyn_cast + getRelocatedSection calls per section in the
common case. Hoist copyRelocs and relocatable to local variables so the
compiler does not reload them through ctx on every loop iteration.

"Assign sections" decreases by 1ms.
2026-03-25 04:59:03 +00:00
Derek Schuff
b60a39e3c2
[lld][WebAssembly] Propagate +atomics for ThinLTO when using --shared-memory (#188381)
When compiling WebAssembly with ThinLTO, functions are partitioned into
isolated `.bc` modules and dispatched to individual LTO backend threads.
During code generation, the `CoalesceFeaturesAndStripAtomics` pass
iterates over the module to gather the union of target features (like
`+atomics`) attached to defined functions. In particular when not using
threads, it lowers away atomics and TLS variables to their
single-threaded equivalents.

However, if a partitioned module only contains globally defined TLS
variables (e.g. there are no functions, or all functions were fully
inlined or stripped by dropDeadSymbols before ThinLTO optimization), the
module becomes completely devoid of function definitions. The coalescing
pass then falls back to fetching features from the `TargetMachine`.
Because in LTO the `TargetMachine` defaults to a generic target without
atomics enabled, the TLS is lowered away and the `wasm-feature-atomics`
flag is omitted from the resulting ThinLTO object partition, causing
`wasm-ld` to immediately reject it.

To fix this we take advantage of the fact that the linker always knows
whether threads are being used (via the --shared-memory flag). When
using shared memory, we enable +atomics and +bulk-memory in the
TargetMachine that is used for the backend, and the feature coalescing
pass will correctly detect the use of threads.
This only makes sense for atomics because of the global linker
configuration; for other features we wouldn't be able to do this, but we
don't rewrite away any other features anyway.
2026-03-25 00:30:39 +00:00
Fangrui Song
036b755dae
[ELF] Parallelize demoteAndCopyLocalSymbols. NFC (#187970)
Use parallelFor to process files in parallel, collecting Symbol*
pointers per-file, then merge into the symbol table serially.

Linking clang-14 (208K .symtab entries) is 1.04x as fast.
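The "parallel collect, serial merge" shape can be sketched as follows. Each file writes only to its own slot, and the slots are concatenated in file order, so the output does not depend on thread scheduling. `FileSketch`, `collectLocals`, and the filter that drops names starting with `.` are hypothetical stand-ins. The sketch uses plain `std::thread` instead of `parallelFor`.

```cpp
#include <cassert>
#include <string>
#include <thread>
#include <vector>

// Hypothetical input file holding local symbol names.
struct FileSketch {
  std::vector<std::string> locals;
};

std::vector<std::string> collectLocals(const std::vector<FileSketch> &files,
                                       unsigned numThreads) {
  // Phase 1 (parallel): one output slot per file, so no locking is needed.
  std::vector<std::vector<std::string>> perFile(files.size());
  std::vector<std::thread> workers;
  for (unsigned t = 0; t < numThreads; ++t)
    workers.emplace_back([&, t] {
      for (size_t i = t; i < files.size(); i += numThreads)
        for (const std::string &name : files[i].locals)
          if (!name.empty() && name[0] != '.')  // stand-in filter (assumption)
            perFile[i].push_back(name);
    });
  for (auto &w : workers)
    w.join();
  // Phase 2 (serial): merge in file order for a deterministic symbol table.
  std::vector<std::string> symtab;
  for (auto &v : perFile)
    symtab.insert(symtab.end(), v.begin(), v.end());
  return symtab;
}
```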
2026-03-23 04:52:55 +00:00
Fangrui Song
dc4df5da88
[ELF] Always separate relative relocations regardless of -z combreloc (#187964)
Remove the combreloc guard from addReloc and mergeRels so that
relative relocations are always routed to relativeRelocs, even with -z
nocombreloc or --pack-dyn-relocs=android.

Update AndroidPackedRelocationSection::updateAllocSize to iterate
both relativeRelocs and relocs.
2026-03-23 03:22:44 +00:00
Fangrui Song
076226f378
[ELF] Separate relative and non-relative dynamic relocations (#187959)
Previously, the flow was:

1. Parallel scan adds relative relocs to per-thread `relocsVec`
2. `mergeRels()` copies all into `relocs`
3. `partitionRels()` uses `stable_partition` to separate

Now, relative relocs are routed at `addReloc` time by checking
`reloc.type == relativeRel`. In `mergeRels`, sharded entries are
classified through the same `addReloc` path rather than blindly
appended. `relocsVec` may contain non-relative entries like
`R_AARCH64_AUTH_RELATIVE`.

This eliminates the `stable_partition` on the full relocation vector
(543K entries for clang) and avoids copying relative relocations into
`relocs` only to move them out again.

Linking an x86_64 release+assertions build of clang is 1.04x as fast.

`numRelativeRelocs` caches `relativeRelocs.size()` at `finalizeContents`
time for `DT_RELACOUNT`. Using a live `relativeRelocs.size()` would
cause `DynamicSection::writeTo` to emit an extra entry when thunks add
relocs after `.dynamic` is sized, overflowing into adjacent sections.
Tested by ppc64-long-branch-rel14.s.
2026-03-23 01:46:20 +00:00
Martin Storsjö
4c4925f1a2
[LLD] [ELF] Make {bti,gcs}-report=none silence warnings from force-bti/gcs=always (#186343)
Previously, the implicit warnings from force-bti (or gcs=always) weren't
possible to silence.

The force-ibt/cet-report flags could also be handled the same way, but I
haven't checked with GNU ld how they behave. And there, the force-ibt
flag only produces warnings if the IBT bit is missing, while cet-report
warns if either IBT or SHSTK are missing - but force-ibt probably
shouldn't implicitly start warning for missing SHSTK.

This addresses a discrepancy to GNU ld that was noted in #186173.
2026-03-22 14:17:52 +02:00
wanglei
655d5e7f69
[lld][ELF] Fix crash when relaxation pass encounters synthetic sections
In LoongArch and RISC-V, the relaxation pass iterates over input sections
within executable output sections. When a linker script places a synthetic
section (e.g., .got) into such an output section, the linker would crash
because synthetic sections do not have the relaxAux field initialized.

The relaxAux data structure is only allocated for non-synthetic sections
in initSymbolAnchors. This patch adds the necessary null checks in the
relaxation loops (relaxOnce and finalizeRelax) to skip sections that
do not require relaxation.

A null check is also added to elf::initSymbolAnchors to ensure the
subsequent sorting of anchors is safe.

Fixes: #184757

Reviewers: MaskRay

Pull Request: https://github.com/llvm/llvm-project/pull/184758
2026-03-16 10:06:34 +08:00
Martin Storsjö
887d2d4bf7
[LLD] [ELF] Make -z gcs=always implicitly warn on missing GCS, like force-bti (#186203)
This matches GNU ld, where gcs=always makes it implicitly warn about
missing GCS flags, by matching the existing code pattern used for BTI
and IBT.

Also test that warnings can be printed for both missing BTI and GCS for
the same object file.

This fixes #186173.
2026-03-13 10:52:29 +02:00
Fangrui Song
0da4396c1e
[ELF] Fix -u with TLS symbols: propagate type from STT_NOTYPE to STT_TLS (#185794)
-u creates an Undefined with STT_NOTYPE. When an object file provides
another Undefined with STT_TLS for the same symbol, Symbol::resolve
only updated binding, leaving type as STT_NOTYPE. This caused
sym.isTls() to return false in postScanRelocations, skipping TLS GOT
entry creation and leading to an out-of-range R_X86_64_GOTTPOFF error.

Fix: in resolve(Undefined), when the existing type is STT_NOTYPE,
adopt the incoming type.
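A pared-down model of the fix is shown below. It assumes an undefined symbol is reduced to its ELF type, and it omits binding merging. The `STT_NOTYPE` (0) and `STT_TLS` (6) values come from the ELF specification. `UndefSketch` and `resolveUndefined` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint8_t STT_NOTYPE = 0;
constexpr uint8_t STT_TLS = 6;

// Hypothetical undefined-symbol state (binding handling omitted).
struct UndefSketch {
  uint8_t type;
};

// A placeholder created with STT_NOTYPE (e.g. by -u) adopts the type of a
// later Undefined, so an STT_TLS reference makes the symbol TLS. A symbol
// that already has a concrete type keeps it.
void resolveUndefined(UndefSketch &existing, const UndefSketch &incoming) {
  if (existing.type == STT_NOTYPE)
    existing.type = incoming.type;
}
```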
2026-03-11 17:47:07 +00:00
Fangrui Song
3c171f4200
[ELF,test] Add test for -u error message referencing object file (#185938)
When -u creates an undefined symbol and a relocatable file has a weak
reference, the error message references the relocatable file, not
<internal>.
2026-03-11 10:38:21 -07:00
Sam Clegg
40cd48fd38
[lld][WebAssembly] Restore inactive checks in relocatable.ll test. NFC (#185569)
Back in 6474d1b20 this test was updated, removing the NORMAL vs SHARED
distinction in the output checking. However many of the NORMAL-NEXT
lines were left unmodified, making them effectively disabled.

This restores and updates the expectations.
2026-03-10 11:49:38 -07:00
Zhaoxuan Jiang
34b2de9e4c
[lld][MachO] Deduplicate branch-extension thunks for ICF-folded symbols (#185396)
After ICF, multiple symbols may resolve to the same address but remain
as distinct Symbol pointers. When used as keys in thunkMap, this caused
redundant branch-extension thunks to be created for the same target. Fix
this by providing a custom DenseMapInfo for thunkMap that hashes and
compares Defined symbols by (isec, value) instead of pointer identity.
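The same idea can be expressed with `std::unordered_map` and custom hash/equality functors in place of a custom `DenseMapInfo`. The sketch below is a hypothetical illustration, not the lld-macho code. Two distinct symbol objects that point at the same `(isec, value)` collapse into one map entry, so only one thunk would be created.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>

struct InputSectionSketch {};

// Hypothetical defined symbol; after ICF, distinct objects may share a location.
struct DefinedSketch {
  const InputSectionSketch *isec;
  uint64_t value;
};

// Hash and compare by location, not by pointer identity.
struct ByLocationHash {
  size_t operator()(const DefinedSketch *d) const {
    return std::hash<const void *>()(d->isec) ^ (std::hash<uint64_t>()(d->value) << 1);
  }
};
struct ByLocationEq {
  bool operator()(const DefinedSketch *a, const DefinedSketch *b) const {
    return a->isec == b->isec && a->value == b->value;
  }
};

// Value type (an int standing for a thunk index) is illustrative only.
using ThunkMap = std::unordered_map<const DefinedSketch *, int, ByLocationHash, ByLocationEq>;
```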
2026-03-09 18:01:05 -07:00
Nuri Amari
23cb4e5f46
Support -fpass-plugin + -fthinlto-index together (#183525)
Without this change, passing -fthinlto-index causes -fpass-plugin
arguments to be ignored. We want to be able to use plugins with
distributed thin-lto, so add support for this.
2026-03-06 10:17:01 -05:00
Fangrui Song
46d29d43ba
[ELF] Remove unused handleTlsRelocation (#184951)
Now that all targets use target-specific relocation scanning for TLS
(#181332 RISC-V being the last), handleTlsRelocation is unused.
2026-03-06 05:53:28 +00:00
Fangrui Song
4ea72c1e8c
[ELF] Add target-specific relocation scanning for RISC-V (#181332)
Implement RISCV::scanSectionImpl, following the pattern established
for x86 (#178846) and AArch64 (#181099). This merges the getRelExpr
and TLS handling for SHF_ALLOC sections into the target-specific
scanner, enabling devirtualization and eliminating abstraction
overhead.

- Inline relocation classification into scanSectionImpl with a switch
  on relocation type, replacing the generic rs.scan() path.
- Use processR_PC/processR_PLT_PC for common PC-relative and PLT
  relocations.
- Handle TLS IE and GD directly (RISC-V does not optimize GD/LD/IE).
- Replace TLS-optimization-specific expressions for TLSDESC, following
  the x86 pattern: R_RELAX_TLS_GD_TO_IE -> R_GOT_PC,
  R_RELAX_TLS_GD_TO_LE -> R_TPREL. Update relocateAlloc and relax()
  to dispatch on relocation type instead of RelExpr for TLSDESC.
- Simplify getRelExpr to only handle relocations needed by
  relocateNonAlloc and preprocessRelocs.
- Remove RISC-V-specific checks from handleTlsRelocation (isRISCV
  variable, TLSDESC label special cases).
- Move R_RISCV_VENDOR handling into the relocation type switch. An
  undefined vendor symbol now gets the standard undefined symbol error
  instead of a vendor-specific diagnostic.
2026-03-06 04:08:40 +00:00
Fangrui Song
dd79c925d1
[ELF] handleTlsGd: support disabling GD-to-IE/LE optimization. NFC (#184934)
Use this in ARM::scanSectionImpl for R_ARM_TLS_GD32 and the upcoming
RISC-V change.
2026-03-06 02:29:13 +00:00
Will
e9657a12b2
COFF: Allow hex literals in .def files: BASE/HEAPSIZE/STACKSIZE (#184764)
For a Win32 DLL, a .def file can have a custom executable base:
```
LIBRARY "stub.dll" BASE=0x10000000
```

Currently the parser enforces Base 10, but [Microsoft's
documentation](https://learn.microsoft.com/en-us/cpp/build/reference/rules-for-module-definition-statements?view=msvc-170)
states "Numeric arguments are specified in base 10 or hexadecimal".

This fixes that, and also HEAPSIZE and STACKSIZE (which use the same
function).
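The sketch below shows a parser that accepts both spellings of a numeric argument. It is not the COFF parser's code, which is LLVM-based. As a simplifying assumption, it treats only a `0x`/`0X` prefix as hexadecimal and does not interpret a leading `0` as octal. `parseDefNumber` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

// Parse "268435456" or "0x10000000"; returns nullopt on bad input or overflow.
std::optional<uint64_t> parseDefNumber(const std::string &s) {
  int base = 10;
  size_t pos = 0;
  if (s.size() > 2 && s[0] == '0' && (s[1] == 'x' || s[1] == 'X')) {
    base = 16;
    pos = 2;
  }
  if (pos == s.size())
    return std::nullopt;  // empty input
  uint64_t v = 0;
  for (; pos < s.size(); ++pos) {
    char c = s[pos];
    int digit;
    if (c >= '0' && c <= '9')
      digit = c - '0';
    else if (base == 16 && c >= 'a' && c <= 'f')
      digit = c - 'a' + 10;
    else if (base == 16 && c >= 'A' && c <= 'F')
      digit = c - 'A' + 10;
    else
      return std::nullopt;  // not a digit in this base
    if (v > (UINT64_MAX - digit) / base)
      return std::nullopt;  // v * base + digit would overflow
    v = v * base + digit;
  }
  return v;
}
```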

There are a few more instances of `getAsInteger` that expect base 10 -
for ordinals and the VERSION directive. Since I don't have an
in-the-wild example of a .def file using hexadecimal for these, I am
wary about changing those too.
2026-03-05 18:01:58 +02:00
Sam Clegg
ec15263cb8
[lld][WebAssembly] Convert weak-alias tests to assembly. NFC (#184667)
This actually both improves and simplifies the `Inputs/weak_alias`. With
the `.ll` version we ended up using memory and `__stack_pointer` and
locals, but LLVM ended up generating `call` rather than `call_indirect`
for the `call_alias_ptr` and `call_direct_ptr`. With the assembly tests
we can ensure the usage of `call_indirect` while avoiding all the other
stuff.
2026-03-04 13:05:02 -08:00
Vladislav Dzhidzhoev
63074da25d
[DebugInfo][DwarfDebug] Move emission of globals from beginModule() to endModule() (5/7) (#184219)
RFC
https://discourse.llvm.org/t/rfc-dwarfdebug-fix-and-improve-handling-imported-entities-types-and-static-local-in-subprogram-and-lexical-block-scopes/68544

This patch moves the emission of global variables from
`DwarfDebug::beginModule()` to `DwarfDebug::endModule()`.

It has the following effects:
1. The order of debug entities in the resulting DWARF changes.
2. Currently, if a DISubprogram requires emission of both concrete
out-of-line and inlined subprogram DIEs, and such a subprogram contains
a static local variable, the DIE for the variable is emitted into the
concrete out-of-line subprogram DIE. As a result, the variable is not
available in debugger when breaking at the inlined function instance.

It happens because static locals are emitted in
`DwarfDebug::beginModule()`, but abstract DIEs for functions that are
not completely inlined away are created only later during
`DwarfDebug::endFunctionImpl()` calls.

With this patch, DIEs for static local variables of subprograms that
have both inlined and the concrete out-of-line instances are placed into
abstract subprogram DIEs. They become visible in debugger when breaking
at concrete out-of-line and inlined function instances.

   `llvm/test/DebugInfo/Generic/inlined-static-var.ll` illustrates that.
3. It will allow simplifying abstract subprogram DIE creation by
reverting https://github.com/llvm/llvm-project/pull/159104 later.

This is needed to simplify DWARF emission in a context of proper support
of function-local static variables which comes in the next patch
(https://reviews.llvm.org/D144008), making all function-local entities
handled in `DwarfDebug::endModuleImpl()`.

Authored-by: Kristina Bessonova <kbessonova@accesssoftek.com>
Co-authored-by: David Blaikie <dblaikie@gmail.com>
Co-authored-by: Vladislav Dzhidzhoev <vdzhidzhoev@accesssoftek.com>
2026-03-04 20:32:12 +01:00
Brian Cain
9105d9c249
[lld][Hexagon] Fix findMaskR8 missing duplex support (#183936)
findMaskR8() lacked an isDuplex() check, unlike findMaskR6(),
findMaskR11(), and findMaskR16() which all handle duplex instructions.

When the assembler generates R_HEX_8_X on a duplex SA1_addi instruction
(e.g. `{ r0 = add(r0, ##target); memw(r1+#0) = r2 }`), the wrong mask
0x00001fe0 placed relocation bits at [12:5] instead of [25:20],
corrupting the low sub-instruction (e.g. memw became memb).

Add the isDuplex() check returning 0x03f00000, and add a comprehensive
test covering all duplex instruction x relocation type combinations
across findMaskR6, findMaskR8, findMaskR11, and findMaskR16.
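The mask decides where relocation bits land. Hexagon relocations scatter the low bits of the value, in order, into the bit positions set in the mask. The sketch below is a hypothetical `applyMaskSketch` in the spirit of that scheme, not a copy of lld's helper. With the old mask `0x00001fe0`, the bits go to [12:5], inside the low sub-instruction of a duplex. With `0x03f00000`, they go to [25:20].

```cpp
#include <cassert>
#include <cstdint>

// Deposit successive low bits of `data` into the set bit positions of `mask`,
// from least significant to most significant.
uint32_t applyMaskSketch(uint32_t mask, uint32_t data) {
  uint32_t result = 0;
  unsigned next = 0;  // index of the next data bit to place
  for (unsigned bit = 0; bit < 32; ++bit) {
    if ((mask >> bit) & 1) {
      result |= ((data >> next) & 1u) << bit;
      ++next;
    }
  }
  return result;
}
```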
2026-03-04 11:11:09 -06:00
Fangrui Song
c9355cc121
[ELF] Move ArmCmseSGSection into Arch/ARM.cpp (#184570)
Move the ArmCmseSGVeneer and ArmCmseSGSection class definitions from
SyntheticSections.h into the anonymous namespace in Arch/ARM.cpp, where
the implementations already reside. Rename ArmCmseSGVeneer to
CmseSGVeneer as it no longer needs the Arm prefix for disambiguation.
2026-03-04 09:09:31 +00:00
Fangrui Song
cd01e6526a
[ELF] Add target-specific relocation scanning for LoongArch (#182236)
Implement LoongArch::scanSectionImpl, following the pattern established
for x86, PPC64, SystemZ, AArch64. This merges the getRelExpr and TLS
handling for SHF_ALLOC sections into the target-specific scanner,
enabling devirtualization and eliminating abstraction overhead.

- Inline relocation classification into scanSectionImpl with a switch
  on relocation type, replacing the generic rs.scan() path.
- Use processR_PC/processR_PLT_PC for common PC-relative and PLT
  relocations.
- Inline TLS handling: IE->LE optimization for _PC_ variants only (not
  _PCADD_ or absolute), TLSDESC->IE/LE for non-extreme code model,
  GD/LD flag setting without going through generic handleTlsRelocation.
- Remove adjustTlsExpr by inlining its logic into scanSectionImpl.
- Remove LoongArch-specific code from Relocations.cpp:
  handleTlsRelocation, execOptimizeInLoongArch, and the sort condition.
- Simplify getRelExpr to only handle relocations needed by
  relocateNonAlloc, scanEhSection, and the extreme code model fallback
  in relocateAlloc.
2026-03-03 22:22:55 -08:00
Sam Clegg
928505c983
[lld][WebAssembly] Convert more tests to assembly. NFC (#184418)
The only expectations change here is that `__stack_pointer` is
no longer exported in the `archive-export.test` test. This is because
we don't enable the mutable-globals feature (since the assembly files
don't contain all the now-default features of the generic CPU).
2026-03-03 17:04:20 -08:00