The fixed-point layout algorithm handles linker scripts, thunks, and
relaxOnce (to suppress out-of-range GOT-indirect-to-PC-relative
optimization). These passes are not needed for relocatable links because
they require address information that is not yet available.
Since we don't scan relocations for relocatable links, the
`createThunks` and `relaxOnce` functions are no-ops anyway, making these
passes redundant.
To prevent cluttering the line history, I place the `if (...) break;`
inside the for loop.
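
A minimal sketch of the loop shape described above, assuming a simplified model. `LayoutPasses`, `runLayout`, and the `maxPasses` limit are hypothetical names for illustration and are not lld's actual code:

```cpp
#include <functional>

// Hypothetical model of the layout passes; these are not lld's real types.
struct LayoutPasses {
  std::function<void()> assignAddresses;
  std::function<bool()> createThunks; // returns true if anything changed
  std::function<bool()> relaxOnce;    // returns true if anything changed
};

// Runs address assignment until nothing changes and returns the number of
// address-assignment passes performed.
int runLayout(const LayoutPasses &p, bool relocatable, int maxPasses = 30) {
  int passes = 0;
  for (;;) {
    p.assignAddresses();
    ++passes;
    // Relocatable links need a single assignment only: the early exit sits
    // inside the loop so the surrounding lines keep their history.
    if (relocatable)
      break;
    bool changed = p.createThunks();
    changed |= p.relaxOnce();
    if (!changed || passes >= maxPasses)
      break;
  }
  return passes;
}
```

With `relocatable` set, neither `createThunks` nor `relaxOnce` is called in this model.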
Pull Request: https://github.com/llvm/llvm-project/pull/152240
DynamicReloc::AgainstSymbol is now true and DynamicReloc::AddendOnly is
now false; uses of the constants were replaced mechanically.
Reviewers: rnk, MaskRay
Reviewed By: MaskRay
Pull Request: https://github.com/llvm/llvm-project/pull/150813
The former is just a special case of the latter: it ignores the expr,
always uses the addend, and allows (indeed requires) sym to be null.
If we just use dummySym then we don't need to maintain this as a
separate case, since R_ADDEND will return the addend unmodified for the
call to getRelocTargetVA.
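
To illustrate why the separate case is unnecessary, here is a simplified sketch. The two-value `RelExpr` enum, the one-field `Symbol`, and this `getRelocTargetVA` signature are assumptions made for the example, not lld's real definitions:

```cpp
#include <cstdint>

// Tiny illustrative subset of relocation expressions.
enum RelExpr { R_ABS, R_ADDEND };

// Hypothetical symbol model: only a virtual address.
struct Symbol {
  uint64_t va;
};

// Stand-in for a shared placeholder symbol.
static const Symbol dummySym{0};

// Simplified target computation: R_ADDEND never looks at the symbol.
uint64_t getRelocTargetVA(RelExpr expr, int64_t addend, const Symbol &sym) {
  switch (expr) {
  case R_ADDEND:
    return static_cast<uint64_t>(addend); // addend returned unmodified
  case R_ABS:
    return sym.va + static_cast<uint64_t>(addend);
  }
  return 0;
}
```

Because the `R_ADDEND` branch ignores `sym`, passing `dummySym` gives the same result as an addend-only relocation kind would.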
Reviewers: MaskRay, arichardson
Reviewed By: MaskRay, arichardson
Pull Request: https://github.com/llvm/llvm-project/pull/150797
PR #148920 was merged before I could share my comments.
* Fix the text filename. There are other minor suggestions, but they can
be done in #148985.
* Make `isRelRoDataSection` concise, to be consistent with the majority of
helper functions.
https://discourse.llvm.org/t/rfc-profile-guided-static-data-partitioning/83744
proposes to partition a static data section (like `.data.rel.ro`) into
two sections, one grouping the cold ones and the other grouping the
rest.
lld requires all relro sections to be contiguous. To place
`.data.rel.ro.unlikely` in the middle of all relro sections, this change
proposes to add `.data.rel.ro.unlikely` explicitly as RELRO section.
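
The contiguity requirement can be sketched as follows. This is a name-only model: `isRelroName` lists just three names for illustration, and both function names are hypothetical. The real check in lld also considers section flags and types:

```cpp
#include <string>
#include <vector>

// Illustrative subset of RELRO section names (assumption, not lld's list).
bool isRelroName(const std::string &name) {
  return name == ".data.rel.ro" || name == ".data.rel.ro.unlikely" ||
         name == ".bss.rel.ro";
}

// Returns true if all RELRO sections form one contiguous run in layout order.
bool relroIsContiguous(const std::vector<std::string> &layout) {
  int runs = 0;
  bool inRun = false;
  for (const std::string &name : layout) {
    bool relro = isRelroName(name);
    if (relro && !inRun)
      ++runs; // a new run of RELRO sections starts here
    inRun = relro;
  }
  return runs <= 1;
}
```

If `.data.rel.ro.unlikely` were not recognized as RELRO in this model, a layout that places it between two RELRO sections would split the run in two.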
---------
Co-authored-by: Sam Elliott <quic_aelliott@quicinc.com>
The behavior of an undefined weak reference is implementation defined.
For static -no-pie linking, dynamic relocations are generally avoided (except
IRELATIVE). -shared linking generally emits dynamic relocations.
Dynamic -no-pie linking and -pie allow flexibility. Changes adjust the
behavior for better consistency and simpler internal representation,
e.g. https://reviews.llvm.org/D63003 and https://reviews.llvm.org/D105164
(generalized to undefined non-weak in
2fcaa00d1e2317a90c9071b735eb0e758b5dd58b).
GNU ld introduced -z [no]dynamic-undefined-weak option to fine-tune the
behavior. (The option is not very effective with -no-pie, e.g. on
x86-64, `ld.bfd a.o s.so -z dynamic-undefined-weak` generates
R_X86_64_NONE relocations instead of GLOB_DAT/JUMP_SLOT)
This patch implements -z [no]dynamic-undefined-weak option.
The effects are summarized as follows:
* Static -no-pie: no-op
* Dynamic -no-pie: nodynamic-undefined-weak suppresses GLOB_DAT/JUMP_SLOT
* Static -pie: dynamic-undefined-weak generates ABS/GLOB_DAT/JUMP_SLOT.
https://discourse.llvm.org/t/lld-weak-undefined-symbols-in-vdso-only/86749
* Dynamic -pie: nodynamic-undefined-weak suppresses ABS/GLOB_DAT/JUMP_SLOT
The -pie behavior likely stays stable while -no-pie (`!ctx.arg.isPic` in
`isStaticLinkTimeConstant`) behavior will likely change in the future.
The current default value of ctx.arg.zDynamicUndefined is selected to
prevent behavior changes.
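
The four-way summary above can be encoded as a small decision function. This is a sketch of the table only: `LinkMode` and `emitsDynRelocForUndefWeak` are hypothetical names, `-shared` is not modeled, and the default value of the option is left to the caller:

```cpp
// Hypothetical description of a link, for illustration only.
struct LinkMode {
  bool pie;                  // -pie (true) or -no-pie (false)
  bool dynamicLink;          // dynamic (true) or static (false) linking
  bool dynamicUndefinedWeak; // -z dynamic-undefined-weak in effect
};

// Returns true if dynamic relocations (ABS/GLOB_DAT/JUMP_SLOT) would be
// produced for an undefined weak symbol under the summary above.
bool emitsDynRelocForUndefWeak(const LinkMode &m) {
  // Static -no-pie: the option is a no-op, dynamic relocations are avoided.
  if (!m.pie && !m.dynamicLink)
    return false;
  // Dynamic -no-pie, static -pie, dynamic -pie: the option decides.
  return m.dynamicUndefinedWeak;
}
```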
Pull Request: https://github.com/llvm/llvm-project/pull/143831
So that when mixing small and large text, large text stays out of the
way of the rest of the binary.
Place large RX sections at the beginning rather than at the end so that
with `--no-rosegment`, the large text and rodata share a single PT_LOAD
segment. Place large RWX sections at the end to keep writable and
readonly sections separate.
Clang started emitting the large section flag for `.ltext` sections in
#73037.
When using `-no-pie` without a `SECTIONS` command, the linker uses the
target's default image base. If `-Ttext=` or `--section-start` specifies
an output section address below this base, the result is likely
unintended.
- With `--no-rosegment`, the PT_LOAD segment covering the ELF header cannot include `.text` if `.text`'s address is too low, causing an `error: output file too large`.
- With default `--rosegment`:
  - If a read-only section (e.g., `.rodata`) exists, a similar `error: output file too large` occurs.
  - Without read-only sections, the PT_LOAD segment covering the ELF header and program headers includes no sections, which is unusual and likely undesired. This also causes non-ascending PT_LOAD `p_vaddr` values related to the PT_LOAD that overlaps with PT_PHDR (#138584).
To prevent these issues, report an error if a section address is below
the image base and suggest `--image-base`. This check also applies when
`--image-base` is explicitly set but is skipped when a `SECTIONS`
command is used.
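
A sketch of the check, assuming a simplified interface. `checkSectionAddress` and the message wording are hypothetical and do not reproduce lld's actual diagnostic:

```cpp
#include <cstdint>
#include <string>

// Returns an error message if the section address is below the image base,
// or an empty string if the layout is acceptable. The check is skipped when
// a SECTIONS command is present.
std::string checkSectionAddress(const std::string &name, uint64_t addr,
                                uint64_t imageBase, bool hasSectionsCommand) {
  if (hasSectionsCommand || addr >= imageBase)
    return "";
  // Illustrative wording only.
  return "section '" + name +
         "' address is below the image base; specify --image-base";
}
```

For example, `-Ttext=0x1000` with an image base of `0x200000` would be reported, while the same address with a `SECTIONS` command would not.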
Pull Request: https://github.com/llvm/llvm-project/pull/140187
When the last PT_LOAD segment is executable and includes BSS sections,
its p_memsz may exceed the aligned p_filesz. This change ensures p_memsz
is not reduced in such cases (e.g. --omagic).
In addition, disable this behavior when a SECTIONS command is specified.
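
Roughly, the adjustment looks like the sketch below. `Phdr`, `alignTo`, and `padLastExecutableLoad` are simplified stand-ins written for this example, and the assumption that `p_filesz` is rounded up to the page size is inferred from the description above:

```cpp
#include <algorithm>
#include <cstdint>

// Minimal program header model with only the two sizes that matter here.
struct Phdr {
  uint64_t p_filesz;
  uint64_t p_memsz;
};

// Rounds v up to a multiple of align; align must be a power of two.
uint64_t alignTo(uint64_t v, uint64_t align) {
  return (v + align - 1) & ~(align - 1);
}

// Pads the last executable PT_LOAD to a page boundary without ever
// shrinking p_memsz (which may be larger because of BSS).
void padLastExecutableLoad(Phdr &p, uint64_t pageSize,
                           bool hasSectionsCommand) {
  if (hasSectionsCommand)
    return; // behavior disabled with a SECTIONS command
  p.p_filesz = alignTo(p.p_filesz, pageSize);
  p.p_memsz = std::max(p.p_memsz, p.p_filesz);
}
```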
Refined behavior introduced in https://reviews.llvm.org/D37369 (2017).
The -z separate-loadable-segments --omagic test adds coverage for the
option combination, even if it might not be practical.
Pull Request: https://github.com/llvm/llvm-project/pull/139207
Following from the discussion in #132224, this seems like the best
approach to deal with a mix of XO and RX output sections in the same
binary. This change will also simplify the implementation of the
PURECODE section flag for AArch64.
To control this behaviour, the `--[no-]xosegment` flag is added to LLD
(similarly to `--[no-]rosegment`), which determines whether to allow
merging XO and RX sections in the same segment. The default value is
`--no-xosegment`, which is a breaking change compared to the previous
behaviour.
Release notes are also added, since this will be a breaking change.
`-z execute-only-report` checks that all executable sections have either
the SHF_AARCH64_PURECODE or SHF_ARM_PURECODE section flag set on AArch64
and ARM respectively.
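
The report check amounts to the following sketch. `Section`, `kShfExecInstr`, and `sectionsLackingPurecode` are names made up for the example; the target-specific PURECODE flag is passed in by the caller rather than hard-coded:

```cpp
#include <cstdint>
#include <string>
#include <vector>

// SHF_EXECINSTR from the generic ELF specification.
constexpr uint64_t kShfExecInstr = 0x4;

// Minimal section model for illustration.
struct Section {
  std::string name;
  uint64_t flags;
};

// Returns the names of executable sections that lack the given
// execute-only (PURECODE) flag.
std::vector<std::string>
sectionsLackingPurecode(const std::vector<Section> &secs,
                        uint64_t purecodeFlag) {
  std::vector<std::string> bad;
  for (const Section &s : secs)
    if ((s.flags & kShfExecInstr) && !(s.flags & purecodeFlag))
      bad.push_back(s.name);
  return bad;
}
```

Non-executable sections are ignored, so only code sections compiled without execute-only support show up in the result.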
A considerable number of changes are made in the address assignment
fixed-point loop, and an error in any of them could cause address
assignment not to converge. However, this is reported to the user as
either "thunk creation not converged" or "relaxation not converged".
We saw a confused bug report about this in the wild when spilling failed
to converge. (I'm working on a fix for that.)
We may eventually want a complete reason system when reporting address
assignment taking too many passes, but in the interim it seems prudent
to generalize the error message to "address assignment did not
converge".
--export-dynamic should be a no-op when ctx.hasDynsym is false.
* Drop unneeded ctx.hasDynsym checks.
* Static linking with --export-dynamic does not prevent devirtualization.
Reland 994cea3f0a2d0caf4d66321ad5a06ab330144d89 after bolt tests no
longer rely on -pie --unresolved-symbols=ignore-all with no input DSO
generating PLT entries.
---
Commit f10441ad003236ef3b9e5415a571d2be0c0ce5ce, while dropping a
special case for isUndefWeak and --no-dynamic-linker, made
--export-dynamic ineffective when -pie is used without any input DSO.
This change restores --export-dynamic and unifies -pie and -pie
--no-dynamic-linker when there is no input DSO.
* -pie with no input DSO suppresses undefined symbols in .dynsym.
Previously this only applied to -pie --no-dynamic-linker.
* As a side effect, -pie with no input DSO suppresses PLT.
Reland #120514 after 2f6e3df08a8b7cd29273980e47310cf09c6fdbd8 fixed
iteration order issue and libstdc++/libc++ differences.
---
Both options instruct the linker to optimize section layout with the
following goals:
* `--bp-compression-sort=[data|function|both]`: Improve Lempel-Ziv
compression by grouping similar sections together, resulting in a
smaller compressed app size.
* `--bp-startup-sort=function --irpgo-profile=<file>`: Utilize a
temporal profile file to reduce page faults during program startup.
The linker determines the section order by considering three groups:
* Function sections ordered according to the temporal profile
(`--irpgo-profile=`), prioritizing early-accessed and frequently
accessed functions.
* Function sections. Sections containing similar functions are placed
together, maximizing compression opportunities.
* Data sections. Similar data sections are placed together.
Within each group, the sections are ordered using the Balanced
Partitioning algorithm.
The linker constructs a bipartite graph with two sets of vertices:
sections and utility vertices.
* For profile-guided function sections:
+ The number of utility vertices is determined by the symbol order
within the profile file.
+ If `--bp-compression-sort-startup-functions` is specified, extra
utility vertices are allocated to prioritize nearby function similarity.
* For sections ordered for compression: Utility vertices are determined
by analyzing k-mers of the section content and relocations.
The call graph profile is disabled during this optimization.
When `--symbol-ordering-file=` is specified, sections described in that
file are placed earlier.
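
As an illustration of k-mer based utility vertices, the sketch below hashes every k-byte window of a section's content. It assumes FNV-1a as the hash and ignores relocations; `kmerHashes` and `sharedKmers` are hypothetical helpers, not the linker's implementation:

```cpp
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Hashes every window of k consecutive bytes with 64-bit FNV-1a and returns
// the set of distinct hashes. Each hash plays the role of a utility vertex.
std::set<uint64_t> kmerHashes(const std::vector<uint8_t> &data, size_t k) {
  std::set<uint64_t> out;
  if (k == 0 || data.size() < k)
    return out;
  for (size_t i = 0; i + k <= data.size(); ++i) {
    uint64_t h = 0xcbf29ce484222325ULL; // FNV-1a offset basis
    for (size_t j = 0; j < k; ++j) {
      h ^= data[i + j];
      h *= 0x100000001b3ULL; // FNV-1a prime
    }
    out.insert(h);
  }
  return out;
}

// Counts the utility vertices two sections have in common; a larger count
// suggests the sections compress better when placed together.
size_t sharedKmers(const std::vector<uint8_t> &a,
                   const std::vector<uint8_t> &b, size_t k) {
  std::set<uint64_t> ha = kmerHashes(a, k), hb = kmerHashes(b, k);
  size_t n = 0;
  for (uint64_t h : ha)
    n += hb.count(h);
  return n;
}
```

Sections that share many utility vertices are the ones a balanced partitioning pass would try to keep in the same half at each split.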
Co-authored-by: Pengying Xu <xpy66swsry@gmail.com>
The ELF/bp-section-orderer.s test is failing on some buildbots due to
what seem to be non-determinism issues; see the comments on the original
PR and #125450.
Reverting to get the build green.
This reverts commit 0154dce8d39d2688b09f4e073fe601099a399365 and
follow-up commits 046dd4b28b9c1a75a96cf63465021ffa9fe1a979 and
c92f20416e6dbbde9790067b80e75ef1ef5d0fa4.
Add new ELF linker options for profile-guided section ordering
optimizations:
- `--irpgo-profile=<file>`: Read IRPGO profile data for use with startup
and compression optimizations
- `--bp-startup-sort={none,function}`: Order sections based on profile
data to improve startup time
- `--bp-compression-sort={none,function,data,both}`: Order sections
using balanced partitioning to improve compressed size
- `--bp-compression-sort-startup-functions`: Additionally optimize
startup functions for compression
- `--verbose-bp-section-orderer`: Print statistics about balanced
partitioning section ordering
Thanks to @ellishg, @thevinster, and their team for their work.
---------
Co-authored-by: Fangrui Song <i@maskray.me>
Commit f10441ad003236ef3b9e5415a571d2be0c0ce5ce dropped a special case
for isUndefWeak and --no-dynamic-linker but also made --export-dynamic
ineffective for static PIE.
This change restores the --export-dynamic behavior and entirely drops
special handling of --no-dynamic-linker:
* -pie with no input DSO, similar to --no-dynamic-linker, suppresses
undefined symbols in .dynsym
The new behaviors resemble GNU ld more.
Commit 3733ed6f1c6b0eef1e13e175ac81ad309fc0b080 introduced isExported to
cache includeInDynsym. If we don't unnecessarily set isExported for
undefined symbols, exportDynamic/includeInDynsym can be replaced with
isExported.
Commit 2a26292388fcab0c857c91b2d08074c33abd37e8 made `isExported`
accurate except a few linker-synthesized symbols in finalizeSections.
We can collect these linker-synthesized symbols into a vector
and avoid recomputation for other symbols.
This is a reland of 1a4d6de1b532149b10522eae5dabce39e5f7c687 after
`isExported` has been made accurate by f10441ad003236ef3b9e5415a571d2be0c0ce5ce.
`includeInDynsym` has a special case for isUndefWeak and
--no-dynamic-linker, which can be removed if we simply disallow
dynamic symbols for static-pie.
The partition feature reports errors only when a symbol `isExported`.
We need to link in a DSO to trigger the mips error.
This reverts commit 048f35037779763963c4b4478a0884e828ea9538.
This reverts commit f7bbc40b0736cc417f57cd039b098b504cf6a71f.
Related to #95949. A developer with no prior lld contribution and very
little AMD contribution sneaked in these application-specific section
order rules we discourage.
Port https://reviews.llvm.org/D117354 from the MachO port.
If both --symbol-ordering-file and call graph profile are present, the
--symbol-ordering-file takes precedence, but the call graph profile is
still used for symbols that don't appear in the order file.
In addition, call graph profile described sections are now ordered
before other sections.
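
The precedence rules can be sketched as a priority assignment. `orderSections` and its string-based section model are hypothetical; real section ordering works on input sections and symbols rather than names:

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <vector>

// Orders sections so that order-file entries come first, then sections
// ordered by the call graph profile, then everything else in input order.
std::vector<std::string>
orderSections(const std::vector<std::string> &sections,
              const std::vector<std::string> &orderFile,
              const std::vector<std::string> &callGraphOrder) {
  std::map<std::string, int> prio; // lower value means earlier placement
  int next = 0;
  for (const std::string &s : orderFile)
    prio.emplace(s, next++);
  // emplace does not overwrite, so the order file takes precedence.
  for (const std::string &s : callGraphOrder)
    prio.emplace(s, next++);
  std::vector<std::string> out = sections;
  std::stable_sort(out.begin(), out.end(),
                   [&](const std::string &a, const std::string &b) {
                     auto pa = prio.find(a), pb = prio.find(b);
                     int ra = pa == prio.end() ? next : pa->second;
                     int rb = pb == prio.end() ? next : pb->second;
                     return ra < rb;
                   });
  return out;
}
```

Sections that appear in neither input share the lowest priority, and `std::stable_sort` keeps them in their original relative order.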
isExported, intended to replace exportDynamic, is primarily set in two
locations, (a) after parseSymbolVersion and (b) during demoteSymbols.
In the future, we should try removing exportDynamic. Currently,
merging exportDynamic/isExported would cause riscv-gp.s to fail:
* The first isExported computation considers the undefined symbol exported.
* The symbol is then defined as a linker-synthesized symbol.
* isExported remains true, while it should be false.
In #86751 we moved the IRELATIVE relocations to .rela.plt when
--pack-dyn-relocs=android was enabled but we neglected to also move
the __rela_iplt_{start,end} symbols. As a result, static binaries
linked with this flag were unable to find their IRELATIVE relocations.
Fix it by having the symbols surround the correct section.
Reviewers: MaskRay, smithp35
Reviewed By: MaskRay
Pull Request: https://github.com/llvm/llvm-project/pull/118585