The processing of `-oso_prefix` uses `llvm::sys::fs::real_path` from the
user value, but it is later tried to be matched with the result of
`make_absolute`. While `real_path` resolves special symbols like `.`,
`..` and `~`, and resolves symlinks along the path, `make_absolute` does
neither, causing an incompatibility in some situations.
In macOS, temporary directories would normally be reported as
`/var/folders/<random>`, but `/var` is in fact a symlink to
`private/var`. If own is working on a temporary directory and uses
`-oso_prefix .`, it will be expanded to `/private/var/folder/<random>`,
while `make_absolute` will expand to `/var/folder/<random>` instead, and
`-oso_prefix` will fail to remove the prefix from the `N_OSO` entries,
leaving absolute paths to the temporary directory in the resulting file.
This would happen in any situation in which the working directory
includes a symlink, not only in temporary directories.
One can change the usage of `make_absolute` to use `real_path` as well,
but `real_path` will mean checking the file system for each `N_OSO`
entry. The other solution is stop using `real_path` when processing
`-oso_prefix` and manually expand an input of `.` like `make_absolute`
will do.
This second option is the one implemented here, since it is the closest
to the visible behaviour of ld64 (like the removed comment notes), so it
is the better one for compatibility. This means that a test that checked
the usage of the tilde as `-oso_prefix` needs to be removed (since it
was done by using `real_path`), and two new tests are provided checking
that symlinks do not affect the result. The second test checks a change
in behaviour, in which if one provides the input files with a prefix of
`./`, even when using `-oso_prefix .` because the matching is textual,
the `./` prefix will stay in the `N_OSO` entries. This matches the
observed behaviour of ld64.
The backend emits `.loh` directives for arm64_32 as well. Our pass
already handles 32-bit pointer loads correctly (there was an extraneous
sanity check for 8-byte pointer sizes, I removed that here), so we can
enable them for all arm64 subtargets, including our upcoming arm64e
support.
Moving it away from the arm64 `TargetInfo` class will let us enable it
more easily for arm64_32 and the soon-to-be-added arm64e target as well.
This is the NFC part of #148964
This is causing use-after-frees due to references getting invalidating
after the loadedDylibs map grows, see comments on the PR.
This reverts commit 475a8a47ead32755bb1377d361afbd1928880e14.
Expand the `-order_file` also accept cstrings to order.
The purpose is to order hot cstrings for performance (implemented in
this diff), and then later on we can also order cold cstrings for
compression size win.
Due to the speciality of cstrings, there's no way to pass in symbol
names in the order file as the existing -order_file, so we expect `<hash
of cstring literal content>` to represent/identify each cstring.
```
// An order file has one entry per line, in the following format:
//
// <cpu>:<object file>:[<symbol name> | CStringEntryPrefix <cstring hash>]
//
// <cpu> and <object file> are optional.
// If not specified, then that entry tries to match either,
//
// 1) any symbol of the <symbol name>;
// Parsing this format is not quite straightforward because the symbol name
// itself can contain colons, so when encountering a colon, we consider the
// preceding characters to decide if it can be a valid CPU type or file path.
// If a symbol is matched by multiple entries, then it takes the
// lowest-ordered entry (the one nearest to the front of the list.)
//
// or 2) any cstring literal with the given hash, if the entry has the
// CStringEntryPrefix prefix defined below in the file. <cstring hash> is the
// hash of cstring literal content.
//
// Cstring literals are not symbolized, we can't identify them by name
// However, cstrings are deduplicated, hence unique, so we use the hash of
// the content of cstring literals to identify them and assign priority to it.
// We use the same hash as used in StringPiece, i.e. 31 bit:
// xxh3_64bits(string) & 0x7fffffff
//
```
The ordering of cstring has to happen during/before the finalizing of
the cstring section content in the `finalizeContents()` function, which
happens before the writer is run
---------
Co-authored-by: Sharon Xu <sharonxu@fb.com>
```
/// Symbols can be appended with "(.__uniq.xxxx)?.llvm.yyyy" where "xxxx" and
/// "yyyy" are numbers that could change between builds. We need to use the root
/// symbol name before this suffix so these symbols can be matched with profiles
/// which may have different suffixes.
```
Just like what we are doing in BP,
https://github.com/llvm/llvm-project/blob/main/lld/MachO/BPSectionOrderer.cpp#L127
the patch removes the suffixes when parsing the order file and getting
the symbol priority to have a better symbol match.
---------
Co-authored-by: Sharon Xu <sharonxu@fb.com>
Co-authored-by: Ellis Hoag <ellis.sparky.hoag@gmail.com>
With an upcoming change to llvm/ProfileData headers, this file ends up
with both llvm::Reloc in scope as well as the lld::macho::Reloc type
intended for use here. Qualify the use in this file with the intended
namespace to avoid compilation errors.
In `applyAdrpAddLdr()` we make a transformation that is identical to the
one in `applyAdrpAdd()`, so lets reuse that code. Also refactor
`forEachHint()` to use more `ArrayRef` and move around some lines for
consistancy.
Framework load paths can be either the top level framework name, or
subpaths of the framework bundle pointing to specific framework binary
versions. Extend the framework lookup logic to handle the latter case.
Enhance branch extension logic to handle __objc_stubs identically to
__stubs
The branch extension algorithm currently has specific handling for the
`__stubs` section:
1. It ensures all `__stubs` content is directly reachable via branches
from the text section.
2. It calculates the latest text section address that might require
thunks to reach the end of `__stubs`.
The `__objc_stubs` section requires precisely the same handling due to
its similar nature, but this was not implemented.
This commit generalizes the existing logic so it applies consistently to
both the `__stubs` and `__objc_stubs` sections, ensuring correct
reachability and thunk placement for both. Without this change it's
possible to get relocation errors during linking in scenarios where the
`__objc_stubs` section is large enough.
When loading frameworks it is possible to have load commands for the
same framework through symlinks and the real path. To avoid loading
these multiple times find the real path before checking the dylib cache.
Details:
When we have the following scenario:
- lib_a re-exports lib_b
- lib_b re-exports @rpath/lib_c
+ lib_b contains LC_RPATH
Previously, lld-macho cannot find lib_c because it was attempting to
resolve the '@rpath' from lib_b (which had no LC_RPATH defined). The
change here is to also consider all the LC_RPATH rom lib_b when trying
to find lib_c.
Inspired by real-life example when linking with
libXCTestSwiftSupport.dylib (which re-exports XCTest, which re-exports
XCTestCore)
This is a ~port of https://reviews.llvm.org/D117284. Like in that
change, archives without indices are treated as a collection of lazy
object files (as in `--start-lib/--end-lib`)
Porting the ELF follow-up to convert *all* archives to the lazy object
code path (https://reviews.llvm.org/D119074) is a natural next step, but
we would need to ensure the assertions about memory use hold for Mach-O.
NB: without an index, we can't do the part of the `-ObjC` scan where we
check for Objective-C symbols directly. We *can* still check for
`__obcj` sections so I wonder how much of a problem this actually is,
since I'm not sure how the "symbols but no sections" case can appear in
the wild.
In `OutputSegment.cpp`, we need to ensure a specific order for certain
sections. The current sorting logic incorrectly prioritizes code
sections over explicitly defined section orders. This is problematic
because the `__objc_stubs` section is both a code section *and* has a
specific ordering requirement. The current logic would incorrectly
prioritize its code section status, causing it to be sorted *before* the
`__stubs` section. This incorrect ordering breaks the branch extension
algorithm, ultimately leading to linker failures due to relocation
errors.
We also modify the `lld/test/MachO/arm64-objc-stubs.s` test to ensure
correct section ordering.
When using the linker flags `--icf=safe_thunks` and `--keep-icf-stabs`
together, an issue arises with the STABS debugging entries in the linked
output. The problem affects STABS entries for functions that are folded
via ICF using thunks.
For instance, if `func1` is merged into `func2` through a thunk, the
STABS entry for `func1` incorrectly points to the object file of
`func2`. This is incorrect behavior—each function’s STABS entry should
consistently point to its own original object file (e.g., the STABS
entry for `func1` should reference `func1`’s object file). This issue
causes `dsymutil` to not be able to retrieve the debug information for
the problematic function.
This patch corrects this behavior - making it so that STABS entries
always point to the correct object file.
When a library is specified with both `-l` and `-reexport_libraries`,
lld will emit two load commands for it, in contrast with ld64.
In an upcoming version of macOS, this fails dyld validation; see
https://crbug.com/404905688
---------
Co-authored-by: Mark Rowe <markrowe@chromium.org>>
The `--disable_verify` flag is implemented for ELF and is used to
disable LLVM module verification.
93afd8f9ac/lld/ELF/Options.td (L661)
This allows us to quickly suppress verification errors.
Refactor to add some early return logic to `applySafeThunksToRange` so
that we completely skip irrelevant ranges.
Also add a check for `isCodeSection` to ensure we only apply branch
thunks to code section (they don't make sense for anything else).
Currently this isn't an issue since there are no `keepUnique` non-code
sections - but this is not a hard restriction and may be implemented in
the future, so we should be able to handle (i.e. avoid) this scenario.
## Problem
The `safe_thunks` ICF optimization in `lld-macho` was creating thunks
that pointed to `InputSection`s instead of `Symbol`s. While, generally,
branch relocations can point to symbols or input sections, in this case
we need them to point to symbols as subsequently the branch extension
algorithm expects branches to always point to `Symbol`'s.
## Solution
This patch changes the ICF implementation so that safe thunks point to
`Symbol`'s rather than `InputSection`s.
## Testing
The existing `arm64-thunks.s` test is modified to include
`--icf=safe_thunks` to explicitly verify the interaction between ICF and
branch range extension thunks. Two functions were added that will be
merged together via a thunk. Before this patch, this test would generate
an assert - now this scenario is correctly handled.
ICF runs before BPSectionOrderer. When a section is ICF'ed, it seems
that the original sections are marked as not live, but are still kept
around. Prior to this patch, those ICF'ed sections would be passed to BP
and ordered before being skipped when writing the output. Now, these
sections are no longer passed to BP, saving runtime and possibly
improving BP's output.
In a large binary, I found that the number of sections ordered using BP
decreased, while the number of duplicate sections drastically decreased
as expected.
```
Functions for startup: 50755 -> 50520
Functions for compression: 165734 -> 105328
Duplicate functions: 1827231 -> 55230
```
Note that PointerUnion::dyn_cast has been soft deprecated in
PointerUnion.h:
// FIXME: Replace the uses of is(), get() and dyn_cast() with
// isa<T>, cast<T> and the llvm::dyn_cast<T>
Literal migration would result in dyn_cast_if_present (see the
definition of PointerUnion::dyn_cast), but this patch uses dyn_cast
because we expect referent to be nonnull.
Exposed by the test added in the reverted #120514
* Fix libstdc++/libc++ differences due to nth_element. https://github.com/llvm/llvm-project/pull/125450#issuecomment-2631404178
* Fix LLVM_ENABLE_REVERSE_ITERATION=1 differences
* Fix potential issue in `currentSize += D::getSize(*sections[*sectionIdxs.begin()])` where DenseSet was used, though not covered by a test
The ELF/bp-section-orderer.s test is failing on some buildbots due to
what seems like non-determinism issues, see comments on the original PR
and #125450
Reverting to green the build.
This reverts commit 0154dce8d39d2688b09f4e073fe601099a399365 and
follow-up commits 046dd4b28b9c1a75a96cf63465021ffa9fe1a979 and
c92f20416e6dbbde9790067b80e75ef1ef5d0fa4.
PR #117514 refactored BPSectionOrderer to be used by the ELF port
but introduced some inefficiency:
* BPSectionBase/BPSymbol are wrappers around a single pointer.
The numbers of sections and symbols could be huge, and the extra
allocations are memory inefficient.
* Reconstructing the returned DenseMap (since BPSectionBase != InputSectin)
is wasteful.
This patch refactors BPSectionOrderer with Curiously Recurring Template
Pattern and eliminates the inefficiency. In addition,
`symbolToSectionIdxs` is removed and `rootSymbolToSectionIdxs` building
is moved to lld/MachO: while getting sections for symbols is cheap in
Mach-O, it is awkward and inefficient in the ELF port.
While here, add a file-level comment and replace some `StringMap<*>`
(which copies strings) with `DenseMap<CachedHashStringRef, *>`.
Pull Request: https://github.com/llvm/llvm-project/pull/124482
Note that PointerUnion::dyn_cast has been soft deprecated in
PointerUnion.h:
// FIXME: Replace the uses of is(), get() and dyn_cast() with
// isa<T>, cast<T> and the llvm::dyn_cast<T>
This patch migrates uses of PointerUnion::dyn_cast to
dyn_cast_if_present (see the definition of PointerUnion::dyn_cast).
Note that we cannot use dyn_cast in any of the migrations in this
patch; placing
assert(!X.isNull());
just before any of dyn_cast_if_present in this patch triggers some
failure in check-lld.
The symbol string table does not have deduplication.
Here we add code to deduplicate the symbol string table.
This has a rather large size impact (20-30%) on unstripped binaries
(typically debug binaries) but no size impact on stripped
binaries(typically release binaries).
We enable deduplication by default and add a flag to disable it
(`-no-deduplicate-symbol-strings`).
Note that PointerUnion::dyn_cast has been soft deprecated in
PointerUnion.h:
// FIXME: Replace the uses of is(), get() and dyn_cast() with
// isa<T>, cast<T> and the llvm::dyn_cast<T>
Literal migration would result in dyn_cast_if_present (see the
definition of PointerUnion::dyn_cast), but this patch uses cast
because we know expect isa<Symbol *>(rel.referent) to be true.
Note that PointerUnion::dyn_cast has been soft deprecated in
PointerUnion.h:
// FIXME: Replace the uses of is(), get() and dyn_cast() with
// isa<T>, cast<T> and the llvm::dyn_cast<T>
Literal migration would result in dyn_cast_if_present (see the
definition of PointerUnion::dyn_cast), but this patch uses cast
because we know expect isa<InputSection *>(reloc.referent) to be true.
Adding the `safe_thunks` option in `Options.td` as it was missing there
- mentioned by @Colibrow in
https://github.com/llvm/llvm-project/pull/106573
Also documenting what the various options mean.
Help now looks like this:
```
..........
--error-limit=<value> Maximum number of errors to print before exiting (default: 20)
--help-hidden Display help for hidden options
--icf=[none,safe,safe_thunks,all]
Set level for identical code folding (default: none). Possible values:
none - Disable ICF
safe - Only folds non-address significant functions (as described by `__addrsig` section)
safe_thunks - Like safe, but replaces address-significant functions with thunks
all - Fold all identical functions
--ignore-auto-link-option=<value>
Ignore a single auto-linked library or framework. Useful to ignore invalid options that ld64 ignores
--irpgo-profile-sort=<profile>
Deprecated. Please use --irpgo-profile and --bp-startup-sort=function
..........
```
xxHash, inferior to xxh3, is discouraged. We try not to use xxhash in
lld.
Switch to read32le for content hash and xxh3/stable_hash_combine for
relocation hash. Remove the intermediate std::string for relocation
hash.
Change the tail hashing scheme to consider individual bytes instead.
This helps group 0102 and 0201 together. The benefit is negligible,
though.
Pull Request: https://github.com/llvm/llvm-project/pull/121729
This commit improves the memory efficiency of the lld-macho linker by
optimizing how thunks are printed in the map file. Previously, merging
vectors of input sections required creating a temporary vector, which
increased memory usage and in some cases caused the linker to run out of
memory as reported in comments on
https://github.com/llvm/llvm-project/pull/120496. The new approach
interleaves the printing of two arrays of ConcatInputSection in sorted
order without allocating additional memory for a merged array.
--order_file, call graph profile, and BalancedPartitioning currently
build the section order vector by decreasing priority (from SIZE_MAX to
0). However, it's conventional to use an increasing key (see
OutputSection::inputOrder).
Switch to increasing priorities, remove the global variable
highestAvailablePriority, and remove the highestAvailablePriority
parameter from BPSectionOrderer. Change size_t to int.
This improves consistenty with the ELF and COFF ports. The ELF port
utilizes negative priorities for --symbol-ordering-file and call graph
profile, and non-negative priorities for --shuffle-sections (no Mach-O
counterpart yet).
Pull Request: https://github.com/llvm/llvm-project/pull/121727