After ICF, multiple symbols may resolve to the same address but remain
as distinct Symbol pointers. When used as keys in thunkMap, this caused
redundant branch-extension thunks to be created for the same target. Fix
this by providing a custom DenseMapInfo for thunkMap that hashes and
compares Defined symbols by (isec, value) instead of pointer identity.
Without this change, passing -fthinlto-index causes -fpass-plugin
arguments to be ignored. We want to be able to use plugins with
distributed thin-lto, so add support for this.
Implement RISCV::scanSectionImpl, following the pattern established
for x86 (#178846) and AArch64 (#181099). This merges the getRelExpr
and TLS handling for SHF_ALLOC sections into the target-specific
scanner, enabling devirtualization and eliminating abstraction
overhead.
- Inline relocation classification into scanSectionImpl with a switch
on relocation type, replacing the generic rs.scan() path.
- Use processR_PC/processR_PLT_PC for common PC-relative and PLT
relocations.
- Handle TLS IE and GD directly (RISC-V does not optimize GD/LD/IE).
- Replace TLS-optimization-specific expressions for TLSDESC, following
the x86 pattern: R_RELAX_TLS_GD_TO_IE -> R_GOT_PC,
R_RELAX_TLS_GD_TO_LE -> R_TPREL. Update relocateAlloc and relax()
to dispatch on relocation type instead of RelExpr for TLSDESC.
- Simplify getRelExpr to only handle relocations needed by
relocateNonAlloc and preprocessRelocs.
- Remove RISC-V-specific checks from handleTlsRelocation (isRISCV
variable, TLSDESC label special cases).
- Move R_RISCV_VENDOR handling into the relocation type switch. An
undefined vendor symbol now gets the standard undefined symbol error
instead of a vendor-specific diagnostic.
For a Win32 DLL, a .def file can have a custom executable base:
```
LIBRARY "stub.dll" BASE=0x10000000
```
Currently the parser enforces Base 10, but [Microsoft's
documentation](https://learn.microsoft.com/en-us/cpp/build/reference/rules-for-module-definition-statements?view=msvc-170)
states "Numeric arguments are specified in base 10 or hexadecimal".
This fixes that, and also HEAPSIZE and STACKSIZE (which use the same
function).
There are a few more instances of `getAsInteger` that expect base10 -
for ordinals and the VERSION directive. Since I don't have an
in-the-wild example of a .def file using hexadecimal for these, I am
wary about changing those too.
This actually both improves and simplifies the `Inputs/weak_alias`. With
the `.ll` version we ended up using memory and `__stack_pointer` and
locals, but LLVM ended up generated `call` rather than `call_indirect`
for the `call_alias_ptr` and `call_direct_ptr`. With the assembly tests
we can ensure the usage of `call_indirect` while avoiding all the other
stuff.
RFC
https://discourse.llvm.org/t/rfc-dwarfdebug-fix-and-improve-handling-imported-entities-types-and-static-local-in-subprogram-and-lexical-block-scopes/68544
This patch moves the emission of global variables from
`DwarfDebug::beginModule()` to `DwarfDebug::endModule()`.
It has the following effects:
1. The order of debug entities in the resulting DWARF changes.
2. Currently, if a DISubprogram requires emission of both concrete
out-of-line and inlined subprogram DIEs, and such a subprogram contains
a static local variable, the DIE for the variable is emitted into the
concrete out-of-line subprogram DIE. As a result, the variable is not
available in debugger when breaking at the inlined function instance.
It happens because static locals are emitted in
`DwarfDebug::beginModule()`, but abstract DIEs for functions that are
not completely inlined away are created only later during
`DwarfDebug::endFunctionImpl()` calls.
With this patch, DIEs for static local variables of subprograms that
have both inlined and the concrete out-of-line instances are placed into
abstract subprogram DIEs. They become visible in debugger when breaking
at concrete out-of-line and inlined function instances.
`llvm/test/DebugInfo/Generic/inlined-static-var.ll` illustrates that.
3. It will allow to simplify abstract subprogram DIEs creation by
reverting https://github.com/llvm/llvm-project/pull/159104 later.
This is needed to simplify DWARF emission in a context of proper support
of function-local static variables which comes in the next patch
(https://reviews.llvm.org/D144008), making all function-local entities
handled in `DwarfDebug::endModuleImpl()`.
Authored-by: Kristina Bessonova <kbessonova@accesssoftek.com>
Co-authored-by: David Blaikie <dblaikie@gmail.com>
Co-authored-by: Vladislav Dzhidzhoev <vdzhidzhoev@accesssoftek.com>
findMaskR8() lacked an isDuplex() check, unlike findMaskR6(),
findMaskR11(), and findMaskR16() which all handle duplex instructions.
When the assembler generates R_HEX_8_X on a duplex SA1_addi instruction
(e.g. `{ r0 = add(r0, ##target); memw(r1+#0) = r2 }`), the wrong mask
0x00001fe0 placed relocation bits at [12:5] instead of [25:20],
corrupting the low sub-instruction (e.g. memw became memb).
Add the isDuplex() check returning 0x03f00000, and add a comprehensive
test covering all duplex instruction x relocation type combinations
across findMaskR6, findMaskR8, findMaskR11, and findMaskR16.
Move the ArmCmseSGVeneer and ArmCmseSGSection class definitions from
SyntheticSections.h into the anonymous namespace in Arch/ARM.cpp, where
the implementations already reside. Rename ArmCmseSGVeneer to
CmseSGVeneer as it no longer needs the Arm prefix for disambiguation.
Implement LoongArch::scanSectionImpl, following the pattern established
for x86, PPC64, SystemZ, AArch64. This merges the getRelExpr and TLS
handling for SHF_ALLOC sections into the target-specific scanner,
enabling devirtualization and eliminating abstraction overhead.
- Inline relocation classification into scanSectionImpl with a switch
on relocation type, replacing the generic rs.scan() path.
- Use processR_PC/processR_PLT_PC for common PC-relative and PLT
relocations.
- Inline TLS handling: IE->LE optimization for _PC_ variants only (not
_PCADD_ or absolute), TLSDESC->IE/LE for non-extreme code model,
GD/LD flag setting without going through generic handleTlsRelocation.
- Remove adjustTlsExpr by inlining its logic into scanSectionImpl.
- Remove LoongArch-specific code from Relocations.cpp:
handleTlsRelocation, execOptimizeInLoongArch, and the sort condition.
- Simplify getRelExpr to only handle relocations needed by
relocateNonAlloc, scanEhSection, and the extreme code model fallback
in relocateAlloc.
The only expectations change here is that `__stack_pointer` is
no longer exports in the `archive-export.test` test. This is because
we don't enable the mutable-globals feature (since the assembly files
don't contains all the now-default features of the generic CPU).
Move PPC32Got2Section from SyntheticSections.h/cpp into the anonymous
namespace in Arch/PPC.cpp, renaming it to Got2Section.
Extracted from #184292. Moved initTargetSpecificSections after ctor and
before other hooks to match the linker's pass order.
Move MipsAbiFlagsSection, MipsOptionsSection, MipsReginfoSection, and
MipsRldMapSection from SyntheticSections.h/cpp into Arch/Mips.cpp.
The MIPS-specific section creation in createSyntheticSections is
replaced
by the initTargetSpecificSections hook.
so that we can move target-specific synthetic section creation from
createSyntheticSections into per-target initTargetSpecificSections
overrides. This reduces target-specific code in the shared
SyntheticSections.cpp. The subsequent commits (split from
https://github.com/llvm/llvm-project/pull/184057) will move these
target-specific classes to Arch/ files.
Previously, transitively inherited calls to
`target_include_directories(foo SYSTEM ...)` were being squashed into a
flat list of includes, effectively stripping off `-isystem` and
unintentionally forwarding warnings from such dependencies.
To correctly propagate `SYSTEM` dependencies, use
`target_link_libraries` to forward the parent target's link dependencies
to the OBJECT library (similar to the `_static` flow below). Unlike a
flat `target_include_directories`, this lets CMake resolve transitive
SYSTEM include directories through the proper dependency chain.
Note that `target_link_libraries` on an OBJECT library propagates all
usage requirements, not just includes. This also brings in transitive
`INTERFACE_COMPILE_DEFINITIONS`, `INTERFACE_COMPILE_OPTIONS`, and
`INTERFACE_COMPILE_FEATURES`. This is arguably more correct, as the
OBJECT library compiles the same sources and should see the same flags.
The existing `target_include_directories` call is retained for include
directories set directly on the target (not through link dependencies).
CMake deduplicates include directories that appear through both paths.
Compile definitions and options may technically appear twice (once via
the OBJECT library, once via the consuming target), but duplicate `-D`
and flag entries are harmless in practice.
Also fix `clang_target_link_libraries` and `mlir_target_link_libraries`
to forward the link type (PUBLIC/PRIVATE/INTERFACE) to `obj.*` targets.
Previously the type keyword was silently dropped, resulting in plain-
signature `target_link_libraries` calls. This is now required because
the new keyword-signature call in `llvm_add_library` would otherwise
conflict (CMake requires all calls on a target to use the same
signature).
…83889)"
This reverts commit 2342db00ab4d0305580814fb00f477b4b5cebec6.
Revert "[CMake] Propagate dependencies to OBJECT libraries in
`add_llvm_library` (#183541)"
This reverts commit e3c045415ae52167e197d4a6ed4ad5a04e49423a.
Currently for thin-lto, the imported static global values (functions,
variables, etc) will be promoted/renamed from e.g., foo() to
foo.llvm.(). Such a renaming caused difficulties in live patching
since function name is changed ([1]).
It is possible that some global value names have to be promoted to avoid
name collision and linker failure. But in practice, majority of name
promotions can be avoided.
In [2], the suggestion is that thin-lto pre-link decides whether
a particular global value needs name promotion or not. If yes, later on
in thinBackend() the name will be promoted.
I compiled a particular linux kernel version (latest bpf-next tree)
and found 1216 global values with suffix .llvm.. With this patch,
the number of promoted functions is 2, 98% reduction from the
original kernel build.
If some native objects are not participating with LTO, name promotions
have to be done to avoid potential linker issues. So the current
implementation cannot be on by default. But in certain cases, e.g., linux kernel
build, people can enable lld flag --lto-whole-program-visibility to reduce the
number of functions like foo.llvm.().
For ThinLTOCodeGenerator.cpp which is used by llvm-lto tool and a
few other rare cases, reducing the number of renaming due to promotion,
is not implemented as lld flag '-lto-whole-program-visibility' is not
supported in ThinLTOCodeGenerator.cpp for now. In summary, this pull
request only supports llvm-lto2 style workflow.
The feature is off by default. To enable the future, lld flag
'-lto-whole-program-visibility' and llvm flag
'-always-rename-promoted-locals=false' are needed.
The link [3] has more context for the pull request discussions.
[1] https://lpc.events/event/19/contributions/2212
[2] https://discourse.llvm.org/t/rfc-avoid-functions-like-foo-llvm-for-kernel-live-patch/89400
[3] https://github.com/llvm/llvm-project/pull/178587
Fix-forward for https://github.com/llvm/llvm-project/pull/183541.
Two callsites to target_link_libraries were not migrated to the
keyword signature.
Signed-off-by: Itay Bookstein <itay.bookstein@nextsilicon.com>
Similar to #132569 for RISC-V, replace the unofficial `@plt` and
`@gotpcrel` relocation specifiers, currently only used by clang
-fexperimental-relative-c++-abi-vtables, with %pltpcrel %gotpcrel. The
syntax is not used in humand-written assembly code, and is not supported
by GNU assembler.
Also replace the recent `@funcinit` with `%funcinit(x)`.
Currently for thin-lto, the imported static global values (functions,
variables, etc) will be promoted/renamed from e.g., foo() to
foo.llvm.<hash>(). Such a renaming caused difficulties in live patching
since function name is changed ([1]).
It is possible that some global value names have to be promoted to avoid
name collision and linker failure. But in practice, majority of name
promotions can be avoided.
In [2], the suggestion is that thin-lto pre-link decides whether
a particular global value needs name promotion or not. If yes, later on
in thinBackend() the name will be promoted.
I compiled a particular linux kernel version (latest bpf-next tree)
and found 1216 global values with suffix .llvm.<hash>. With this patch,
the number of promoted functions is 2, 98% reduction from the
original kernel build.
If some native objects are not participating with LTO, name promotions
have to be done to avoid potential linker issues. So the current
implementation cannot be on by default. But in certain cases, e.g., linux kernel
build, people can enable lld flag --lto-whole-program-visibility to reduce the
number of functions like foo.llvm.<hash>().
For ThinLTOCodeGenerator.cpp which is used by llvm-lto tool and a
few other rare cases, reducing the number of renaming due to promotion,
is not implemented as lld flag '-lto-whole-program-visibility' is not supported
in ThinLTOCodeGenerator.cpp for now. In summary, this pull request
only supports llvm-lto2 style workflow.
[1] https://lpc.events/event/19/contributions/2212
[2] https://discourse.llvm.org/t/rfc-avoid-functions-like-foo-llvm-for-kernel-live-patch/89400
Currently, WASM symbols taken from the export section of shared objects
lose their flags. This can result in link failures. For example, if a
TLS symbol is exported from a shared object, relocation fails because
`wasm-ld` thinks that the symbol is not flagged as a TLS symbol.
This PR populates symbol flags for symbols in the export section from
the flags stored in the dylink0 section.
The export info section was also not serialized by the WASM emitter for
YAML, which this PR fixes
wasm sections sizes are specified as u32s, and thus can be as large as
4GB. wasm-ld currently stores the offset into a section as an int32_t
which overflows on large sections and results in a crash. This change
makes it a int64_t to accommodate any valid wasm section and allow
catching even larger sections instead of wrapping around.
This PR fixes the issue by storing the offset as a int64_t, as well as
adding extra checks to handle un-encodeable sections to fail instead of
producing garbage wasm binaries, and also adds lit tests to make sure it
works. I confirmed the test fails on main but passes with this fix.
This is the same as https://github.com/llvm/llvm-project/pull/178287 but
deletes the temporary files the tests create and requires the tests run
on a 64-bit platform to avoid OOM issues due to the large binaries it
creates.
First, disallow R_X86_64_PC64 - generally only absolute relocations are
allowed in getDynRel. glibc and musl don't support R_X86_64_PC64 as
dynamic relocations.
Second, support R_X86_64_32 as dynamic relocation for the ILP32 ABI
(x32). GNU ld's behavior looks like:
- R_X86_64_32 => R_X86_64_RELATIVE
- R_X86_64_64 with addend 0 => R_X86_64_RELATIVE
- R_X86_64_64 with non-zero addend => R_X86_64_RELATIVE64 (unsupported
by musl; compilers do not generate such constructs to the best of my
knowledge)
For now we require R_X86_64_64 to be resolved at link-time for x32.
Fix#140465
Commit 21a4710c67a97838dd75cf60ed24da11280800f8 previously enabled
LoopVectorization and SLPVectorization CodeGen options for the ELF and
COFF LTO backends. Since the Mach-O LTO port did not exist at the time,
it missed this configuration.
This patch adds these options to the Mach-O LTO setup for consistency
with the other backends. Without this, SLP and loop vectorization passes
are silently skipped during Mach-O LTO for O2 and O3 builds.
Implement ARM::scanSectionImpl, following the pattern established for
AArch64 and other targets. This merges the getRelExpr and TLS handling
for SHF_ALLOC sections into the target-specific scanner, enabling
devirtualization and eliminating abstraction overhead.
- Inline relocation classification into scanSectionImpl with a switch
on relocation type, replacing the generic rs.scan() path.
- Use processR_PC/processR_PLT_PC for common PC-relative and PLT
relocations.
- Handle TLS inline: checkTlsLe for TLS LE, handleTlsIe<false> for
TLS IE (no IE-to-LE optimization for ARM), and direct flag/reloc
emission for TLS GD/LD (no GD/LD optimization for ARM).
- Set hasGotOffRel for R_GOTREL/R_GOTONLY_PC relocations.
- Simplify getRelExpr to only handle relocations needed by
relocateNonAlloc and preprocessRelocs.
Implement AArch64::scanSectionImpl, following the pattern established
for x86 (#178846), PPC64 (#181496), and SystemZ (#181563). This merges
the getRelExpr and TLS handling for SHF_ALLOC sections into the
target-specific scanner, enabling devirtualization and eliminating
abstraction overhead.
- Inline relocation classification into scanSectionImpl with a switch
on relocation type, replacing the generic rs.scan() path.
- Use processR_PC/processR_PLT_PC for common PC-relative and PLT
relocations, and handleTlsIe/handleTlsDesc for TLS IE/TLSDESC.
- Remove some AArch64-specific RelExpr members (RE_AARCH64_AUTH_GOT,
RE_AARCH64_AUTH_GOT_PC, RE_AARCH64_AUTH_GOT_PAGE_PC,
RE_AARCH64_AUTH_TLSDESC_PAGE, RE_AARCH64_AUTH_TLSDESC,
RE_AARCH64_RELAX_TLS_GD_TO_IE_PAGE_PC) by using regular RelExpr
members with flag-based dispatch (NEEDS_GOT_AUTH, NEEDS_TLSDESC_AUTH).
AUTH GOT relocations now call `sym.setFlags(NEEDS_GOT | NEEDS_GOT_AUTH)`
and `rs.processAux` directly.
- Remove adjustTlsExpr and handleAArch64PAuthTlsRelocation by inlining
their logic into scanSectionImpl and relocateAlloc.
- Simplify getRelExpr to only handle relocations needed by
relocateNonAlloc and EhInputSection::preprocessRelocs.
These can appear in .debug_info so, like other architectures (e.g.
X86_64), we still need to handle them in getRelExpr.
Fixes: aec1c984266c ("[ELF] Add target-specific relocation scanning for SystemZ (#181563)")
Implement Hexagon::scanSectionImpl, following the pattern established
for x86 (#178846) and PPC64. This merges the getRelExpr and TLS handling
for
SHF_ALLOC sections into the target-specific scanner, enabling
devirtualization and eliminating abstraction overhead.
- Inline relocation classification into scanSectionImpl with a switch
on relocation type, replacing the generic rs.scan() path.
- Use processR_PC/processR_PLT_PC for common PC-relative and PLT
relocations.
- Handle GD PLT relocations inline, always setting NEEDS_PLT. Remove
the R_HEX_GD_PLT special case from process().
- Handle TLS IE, GD GOT, and TPREL directly, bypassing
handleTlsRelocation. Remove EM_HEXAGON from the execOptimize check.
- Simplify getRelExpr to only handle relocations needed by
relocateNonAlloc and scanEhSection.
Implement SystemZ::scanSectionImpl, following the pattern established
for x86 (#178846) and PPC64 (#181496). This merges the getRelExpr and
TLS handling for SHF_ALLOC sections into the target-specific scanner,
enabling devirtualization and eliminating abstraction overhead.
- Inline relocation classification into scanSectionImpl with a switch
on relocation type, replacing the generic `rs.scan()` path.
- Use processR_PC/processR_PLT_PC for common PC-relative and PLT
relocations.
- Handle TLS GD, LD, and DTPREL directly, eliminating
handleTlsRelocation, getTlsGdRelaxSkip, and adjustTlsExpr overrides.
Replace R_RELAX_TLS_GD_TO_IE_GOT_OFF with R_GOT_OFF and
R_RELAX_TLS_GD_TO_LE/R_RELAX_TLS_LD_TO_LE with R_TPREL, using
type-based dispatch in relocate() for marker relocation types.
- Handle TLS IE inline without IE-to-LE optimization. Cannot use
`handleTlsIe`.
- Remove `sortRels`: instead of sorting relocations to process GDCALL
before PLT32DBL, skip PLT32DBL by peeking ahead at the next
relocation to check for a TLS marker (GDCALL/LDCALL).
This fixes SHT_CREL as an alternative to #149640
- Simplify getRelExpr to only handle relocations needed by
relocateNonAlloc and .eh_frame.
Fix#149511
Use AT() to ensure output sections with huge addresses are placed in separate
PT_LOAD segments. Without this, 2.lds and 3.lds created a huge PT_LOAD
segment, making the sparse file size larger than 4GiB, unsupported by
some 32-bit systems.
https://github.com/llvm/llvm-project/pull/179089#issuecomment-3908549089
This was obsoleted by Ctx::hasTlsIe when the latter was introduced, but
the old Config::hasTlsIe was not removed at the same time.
Fixes: 2b153088be4a ("[ELF] Set DF_STATIC_TLS for AArch64/PPC32/PPC64")
Move ctx.hasTlsIe stores from the relocation scan phase to
postScanRelocations. The ctx.hasTlsIe value is ignored for
`!shared` case, so we can remove some redundant assignment.
Delete the toAddr16Rel helper and the GOT_TLS-to-GOT16 remapping switch,
inlining their val adjustments as fallthrough cases in the main switch.
This reduces relocate() from 4 switches on type to 2 (TLS relaxation
pre-switch with early returns, and the unified write switch).
checkPPC64TLSRelax detects TLS GD/LD without TLSGD/TLSLD markers
(generated from old IBM XL) and disables TLS optimization. Previously it
set a per-file flag (ppc64DisableTLSRelax). Now scope it in the section
being scanned.
In addition, simplify the R_PPC64_TLSGD/R_PPC64_TLSLD marker handling:
the redundant `sym.setFlags(NEEDS_TLSIE)` is unnecessary as the
preceding GOT_TLSGD relocation already sets it.
Implement PPC::scanSectionImpl, following the pattern established for
x86 (#178846) and PPC64 (#181496). This merges the getRelExpr and TLS
handling for SHF_ALLOC sections into the target-specific scanner,
enabling devirtualization and eliminating abstraction overhead.
- Inline relocation classification into scanSectionImpl with a switch
on relocation type, replacing the generic rs.scan() path.
- Use processR_PC/processR_PLT_PC for common PC-relative and PLT
relocations.
- Handle R_PPC_PLTREL24 inline with addend masking via processAux,
removing the EM_PPC special case from process().
- Handle TLS GD/LD/IE directly, eliminating handleTlsRelocation,
getTlsGdRelaxSkip, and adjustTlsExpr overrides. Use handleTlsIe
for TLS IE, and handleTlsGd for R_PPC_GOT_TLSGD16.
- Use R_DTPREL unconditionally for DTPREL relocations, removing
R_RELAX_TLS_LD_TO_LE_ABS (PPC32 was the only user).
- Move TLS relaxation dispatch from relocateAlloc into relocate,
removing the relocateAlloc override.
- Simplify getRelExpr to only handle relocations needed by
relocateNonAlloc and .eh_frame.
Implement PPC64::scanSectionImpl, following the pattern established for
x86. This merges the getRelExpr and TLS handling for SHF_ALLOC sections
into the target-specific scanner, enabling devirtualization and
eliminating abstraction overhead.
- Inline relocation classification into scanSectionImpl with a switch
on relocation type, replacing the generic rs.scan() path.
- Use processR_PC/processR_PLT_PC for common PC-relative and PLT
relocations.
- Handle TLS GD, LD, and DTPREL directly, eliminating
handleTlsRelocation, getTlsGdRelaxSkip, and adjustTlsExpr overrides.
Use handleTlsIe for TLS IE, enabling IE-to-LE optimization even when
ppc64DisableTLSRelax is set (lifted a limitation from
the workaround patch https://reviews.llvm.org/D92959).
- Use processAux for R_PPC64_PCREL_OPT. Remove the PPC64-specific
special case from process().
- Replace RE_PPC64_RELAX_GOT_PC with R_RELAX_GOT_PC, which computes
the same value (sym + addend - PC).
- Replace RE_PPC64_RELAX_TOC with R_GOTREL, moving the
ctx.arg.tocOptimize check to relocateAlloc.
- Switch relocateAlloc from expr-based to type-based dispatch.
- Simplify getRelExpr to only handle relocations needed by
relocateNonAlloc.
This reverts commit 0c6d7a40187e5e6cbdff1cf5dbdb6fe91054bef4 and
follow-up 7dfa132936a89a966befb6045f306cb9905c6dab.
It landed prematurely with multiple issues in the implementation and
tests.
LLDB parses compiler versions out of DW_AT_producer DWARF attributes
into a VersionTuple. The Swift compiler recently switched to 5-component
version numbers. In order to support this version scheme without growing
the size of VersionTuple, this patch dedicates the last 10 bits of the
build version to a 5th "sub-build" component. The Swift compiler
currently uses 1 digit for this and promises to never use more than 3
digits for the last 3 components.
This patch still leaves 6 decimal digits for the build component for
other version schemes.
rdar://170181060