This commit provides linker support for Cortex-M Security Extensions (CMSE).
The specification for this feature can be found in ARM v8-M Security Extensions:
Requirements on Development Tools.
The linker synthesizes a security gateway veneer in a special section;
`.gnu.sgstubs`, when it finds non-local symbols `__acle_se_<entry>` and `<entry>`,
defined relative to the same text section and having the same address. The
address of `<entry>` is retargeted to the starting address of the
linker-synthesized security gateway veneer in section `.gnu.sgstubs`.
In summary, the linker translates input:
```
.text
entry:
__acle_se_entry:
[entry_code]
```
into:
```
.section .gnu.sgstubs
entry:
SG
B.W __acle_se_entry
.text
__acle_se_entry:
[entry_code]
```
If addresses of `__acle_se_<entry>` and `<entry>` are not equal, the linker
considers that `<entry>` already defines a secure gateway veneer so does not
synthesize one.
If `--out-implib=<out.lib>` is specified, the linker writes the list of secure
gateway veneers into a CMSE import library `<out.lib>`. The CMSE import library
will have 3 sections: `.symtab`, `.strtab`, `.shstrtab`. For every secure gateway
veneer <entry> at address `<addr>`, `.symtab` contains a `SHN_ABS` symbol `<entry>` with
value `<addr>`.
If `--in-implib=<in.lib>` is specified, the linker reads the existing CMSE import
library `<in.lib>` and preserves the entry function addresses in the resulting
executable and new import library.
Reviewed By: MaskRay, peter.smith
Differential Revision: https://reviews.llvm.org/D139092
This reverts commit 9246df7049b0bb83743f860caff4221413c63de2.
Reason: This patch broke the UBSan buildbots. See more information in
the original phabricator review: https://reviews.llvm.org/D139092
This reverts commit c4fea3905617af89d1ad87319893e250f5b72dd6.
I am reverting this for now until I figure out how to fix
the build bot errors and warnings.
Errors:
llvm-project/lld/ELF/Arch/ARM.cpp:1300:29: error: expected primary-expression before ‘>’ token
osec->writeHeaderTo<ELFT>(++sHdrs);
Warnings:
llvm-project/lld/ELF/Arch/ARM.cpp:1306:31: warning: left operand of comma operator has no effect [-Wunused-value]
This commit provides linker support for Cortex-M Security Extensions (CMSE).
The specification for this feature can be found in ARM v8-M Security Extensions:
Requirements on Development Tools.
The linker synthesizes a security gateway veneer in a special section;
`.gnu.sgstubs`, when it finds non-local symbols `__acle_se_<entry>` and `<entry>`,
defined relative to the same text section and having the same address. The
address of `<entry>` is retargeted to the starting address of the
linker-synthesized security gateway veneer in section `.gnu.sgstubs`.
In summary, the linker translates input:
```
.text
entry:
__acle_se_entry:
[entry_code]
```
into:
```
.section .gnu.sgstubs
entry:
SG
B.W __acle_se_entry
.text
__acle_se_entry:
[entry_code]
```
If addresses of `__acle_se_<entry>` and `<entry>` are not equal, the linker
considers that `<entry>` already defines a secure gateway veneer so does not
synthesize one.
If `--out-implib=<out.lib>` is specified, the linker writes the list of secure
gateway veneers into a CMSE import library `<out.lib>`. The CMSE import library
will have 3 sections: `.symtab`, `.strtab`, `.shstrtab`. For every secure gateway
veneer <entry> at address `<addr>`, `.symtab` contains a `SHN_ABS` symbol `<entry>` with
value `<addr>`.
If `--in-implib=<in.lib>` is specified, the linker reads the existing CMSE import
library `<in.lib>` and preserves the entry function addresses in the resulting
executable and new import library.
Reviewed By: MaskRay, peter.smith
Differential Revision: https://reviews.llvm.org/D139092
Arm has BE8 big endian configuration called a byte-invariant(every byte has the same address on little and big-endian systems).
When in BE8 mode:
1. Instructions are big-endian in relocatable objects but
little-endian in executables and shared objects.
2. Data is big-endian.
3. The data encoding of the ELF file is ELFDATA2MSB.
To support BE8 without an ABI break for relocatable objects,the linker takes on the responsibility of changing the endianness of instructions. At a high level the only difference between BE32 and BE8 in the linker is that for BE8:
1. The linker sets the flag EF_ARM_BE8 in the ELF header.
2. The linker endian reverses the instructions, but not data.
This patch adds BE8 big endian support for Arm. To endian reverse the instructions we'll need access to the mapping symbols. Code sections can contain a mix of Arm, Thumb and literal data. We need to endian reverse Arm instructions as words, Thumb instructions
as half-words and ignore literal data.The only way to find these transitions precisely is by using mapping symbols. The instruction reversal will need to take place after relocation. For Arm BE8 code sections (Section has SHF_EXECINSTR flag ) we inserted a step after relocation to endian reverse the instructions. The implementation strategy i have used here is to write all sections BE32 including SyntheticSections then endian reverse all code in InputSections via mapping symbols.
Reviewed By: peter.smith
Differential Revision: https://reviews.llvm.org/D150870
This reverts commit aa495214b39d475bab24b468de7a7c676ce9e366.
As discussed in https://github.com/llvm/llvm-project/issues/53475 this patch
allows for using LLD-as-a-lib. It also lets clients link only the drivers that
they want (see unit tests).
This also adds the unit test infra as in the other LLVM projects. Among the
test coverage, I've added the original issue from @krzysz00, see:
https://github.com/ROCmSoftwarePlatform/D108850-lld-bug-reproduction
Important note: this doesn't allow (yet) linking in parallel. This will come a
bit later hopefully, in subsequent patches, for COFF at least.
Differential revision: https://reviews.llvm.org/D119049
--remap-inputs-file= can be specified multiple times, each naming a
remap file that contains `from-glob=to-file` lines or `#`-led comments.
('=' is used a separator a la -fdebug-prefix-map=)
--remap-inputs-file= can be used to:
* replace an input file. E.g. `"*/libz.so=exp/libz.so"` can replace a resolved
`-lz` without updating the input file list or (if used) a response file.
When debugging an application where a bug is isolated to one single
input file, this option gives a convenient way to test fixes.
* remove an input file with `/dev/null` (changed to `NUL` on Windows), e.g.
`"a.o=/dev/null"`. A build system may add unneeded dependencies.
This option gives a convenient way to test the result removing some inputs.
`--remap-inputs=a.o=aa.o` can be specified to provide one pattern without using
an extra file.
(bash/zsh process substitution is handy for specifying a pattern without using
a remap file, e.g. `--remap-inputs-file=<(printf 'a.o=aa.o')`, but it may be
unavailable in some systems. An extra file can be inconvenient for a build
system.)
Exact patterns are tested before wildcard patterns. In case of a tie, the first
patterns wins. This is an implementation detail that users should not rely on.
Co-authored-by: Marco Elver <elver@google.com>
Link: https://discourse.llvm.org/t/rfc-support-exclude-inputs/70070
Reviewed By: melver, peter.smith
Differential Revision: https://reviews.llvm.org/D148859
When --threads= is unspecified, we set it to
`parallel::strategy.compute_thread_count()`, which uses
sched_getaffinity (Linux)/cpuset_getaffinity (FreeBSD)/std:🧵:hardware_concurrency (others).
With extensive testing on many machines (many configurations from
{aarch64,x86-64} x {Linux,FreeBSD,Windows} x allocators(native,mimalloc,rpmalloc) combinations)
with varying workloads, we discovered that when the concurrency is larger than
16, the linking process is slower than using --threads=16 due to parallelism
overhead outweighs optimizations. This is particularly harmful for machines with
many cores or when the link job competes with other jobs.
Cap parallel::strategy when --threads= is unspecified.
For some workloads changing the concurrency from 8 to 16 has nearly no improvement.
--thinlto-jobs= is unchanged since ThinLTO backend compiles are embarrassingly
parallel.
Link: https://discourse.llvm.org/t/avoidable-overhead-from-threading-by-default/69160
Reviewed By: peter.smith, andrewng
Differential Revision: https://reviews.llvm.org/D147493
This reverts commit da68d2164efcc1f5e57f090e2ae2219056b120a0.
This change is correct, but left a `config->threadCount` use that is error-prone
and may harm performance when parallel::strategy.compute_thread_count() > 16.
This implements support for relaxing these relocations to use the GP
register to compute addresses of globals in the .sdata and .sbss
sections.
This feature is off by default and must be enabled by passing
--relax-gp to the linker.
The GP register might not always be the "global pointer". It can
be used for other purposes. See discussion here
https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/371
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D143673
When --threads= is unspecified, we set it to
`parallel::strategy.compute_thread_count()`, which uses
sched_getaffinity (Linux)/cpuset_getaffinity (FreeBSD)/std:🧵:hardware_concurrency (others).
With extensive testing on many machines (many configurations from
{aarch64,x86-64} x {Linux,FreeBSD,Windows} x allocators(native,mimalloc,rpmalloc) combinations)
with varying workloads, we discovered that when the concurrency is larger than
16, the linking process is slower than using --threads=16 due to parallelism
overhead outweighs optimizations. This is particularly harmful for machines with
many cores or when the link job competes with other jobs.
Cap parallel::strategy when --threads= is unspecified.
For some workloads changing the concurrency from 8 to 16 has nearly no improvement.
--thinlto-jobs= is unchanged since ThinLTO backend compiles are embarrassingly
parallel.
Link: https://discourse.llvm.org/t/avoidable-overhead-from-threading-by-default/69160
Reviewed By: peter.smith
Differential Revision: https://reviews.llvm.org/D147493
Currently, the --thinlto-prefix-replace="oldpath;newpath" option is used during
distributed ThinLTO thin links to specify the mapping of the input bitcode object
files' directory tree (oldpath) to the directory tree (newpath) used for both:
1) the output files of the thin link itself (the .thinlto.bc index files and the
optional .imports files)
2) the specified object file paths written to the response file given in the
--thinlto-index-only=${response} option, which is used by the final native
link and must match the paths of the native object files that will be
produced by ThinLTO backend compiles.
This patch expands the --thinlto-prefix-replace option to allow a separate directory
tree mapping to be specified for the object file paths written to the response file
(number 2 above). This is important to support builds and build systems where the
same output directory may not be written by multiple build actions (e.g. the thin link
and the ThinLTO backend compiles).
The new format is: --thinlto-prefix-replace="origpath;outpath[;objpath]"
This replaces the origpath directory tree of the thin link input files with
outpath when writing the thin link index and imports outputs (number 1
above). If objpath is specified it replaces origpath of the input files with
objpath when writing the response file (number 2 above), otherwise it
falls back to the old behavior of using outpath for this as well.
Reviewed By: tejohnson, MaskRay
Differential Revision: https://reviews.llvm.org/D144596
Changes:
- Adding BE32 big endian Support for Arm.
- Replace the writele and readle with their endian-aware versions.
- Adding test cases for the big-endian be32 arm configuration.
Patch by: Milosz Plichta. This patch merges all the changes from
this patch https://reviews.llvm.org/D140203 as well.
Reviewed By: peter.smith, MaskRay
Differential Revision: https://reviews.llvm.org/D140202
Adds the new AArch64-ABI dynamic entry generation to LLD. This will
allow Android to move from the Android-specific ELF note onto the
dynamic entries.
Change the behaviour of an unspecified --android-memtag-mode. Now, when
unspecified, this will print a warning that you're doing a no-op, rather
than implicitly turning on sync mode. This is important for MTE globals
later, where a binary containing static tagged global descriptors
shouldn't have MTE turned on without specific intent being passed to the
linker.
For now, continue to emit the Android ELF note by default. In future, we
can probably make it only emit the note when provided a flag.
Do a quick NFC-cleanup of the ELF note while we're here. It doesn't
change anything about the ELF note itself, but makes it more clear to
the reader of the code what alignment requirements are being (previously
implicitly) met.
Reviewed By: fmayer, MaskRay
Differential Revision: https://reviews.llvm.org/D143769
Allow controlling the CodeGenOpt::Level independent of the LTO
optimization level in LLD via new options for the COFF, ELF, MachO, and
wasm frontends to lld. Most are spelled as --lto-CGO[0-3], but COFF is
spelled as -opt:lldltocgo=[0-3].
See D57422 for discussion surrounding the issue of how to set the CG opt
level. The ultimate goal is to let each function control its CG opt
level, but until then the current default means it is impossible to
specify a CG opt level lower than 2 while using LTO. This option gives
the user a means to control it for as long as it is not handled on a
per-function basis.
Reviewed By: MaskRay, #lld-macho, int3
Differential Revision: https://reviews.llvm.org/D141970
gcc warned like
../../lld/ELF/InputSection.cpp:75:37: warning: ISO C++11 requires at least one argument for the "..." in a variadic macro
75 | invokeELFT(parseCompressedHeader);
| ^
changes:
- BLX: The Arm architecture versions that support the branch and link
instruction (BLX), can rewrite BLs in place when a state change from Arm<->Thumb
is required. Armv4T does not have BLX and so needs thunks for state changes.
- v4T Thumb long branches needed their own thunk. We could have used the v6M
implementation, but v6M doesn't have Arm state and must resolve to rather
inefficient stack reshuffling. We also can't reuse v7 thumb thunks as they use
MOVV/MOVT, which wasn't available yet for v4T.
- Remove the `lack of BLX' warning. LLVM only supports Arm Architecture versions
upwards of v4, which we now all support in LLD.
- renamed existing thunks to better reflect their use:
ARMV5ABSLongThunk -> ARMV5LongLdrPcThunk,
ARMV5PILongThunk -> ARMV4PILongThunk
- removed isCompatibleWith method from ARMV5ABSLongThunk and ARMV5PILongThunk,
as they were identical to the ARMThunk parent class implementation.
Support for (efficient) position independent thunks for v4T will be added in a
follow-up patch, including possible related thunk renaming and code comment
cleanup.
Reviewed By: MaskRay, peter.smith
Differential Revision: https://reviews.llvm.org/D139888
Currently we take the first SHT_RISCV_ATTRIBUTES (.riscv.attributes) as the
output. If we link an object without an extension with an object with the
extension, the output Tag_RISCV_arch may not contain the extension and some
tools like objdump -d will not decode the related instructions.
This patch implements
Tag_RISCV_stack_align/Tag_RISCV_arch/Tag_RISCV_unaligned_access merge as
specified by
https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/master/riscv-elf.adoc#attributes
For the deprecated Tag_RISCV_priv_spec{,_minor,_revision}, dump the attribute to
the output iff all input agree on the value. This is different from GNU ld but
our simple approach should be ok for deprecated tags.
`RISCVAttributeParser::handler` currently warns about unknown tags. This
behavior is retained. In GNU ld arm, tags >= 64 (mod 128) are ignored with a
warning. If RISC-V ever wants to do something similar
(https://github.com/riscv-non-isa/riscv-elf-psabi-doc/issues/352), consider
documenting it in the psABI and changing RISCVAttributeParser.
Like GNU ld, zero value integer attributes and empty string attributes are not
dumped to the output.
Reviewed By: asb, kito-cheng
Differential Revision: https://reviews.llvm.org/D138550
Allowing incorrect version scripts is not a helpful default. Flip that
to help users find their bugs at build time rather than at run time.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D135402
When -fgpu-rdc is used for linking relocatable objects, clang driver launches
clang-offload-bundler to extract a device relocatable object from each input
relocatable object file and passes the extracted files to lld. The input relocatable
object file could either come from HIP program or C++ program. The relocatable
object file from C++ program does not contain device relocatable objects, therefore
clang-offload-bundler extracts an empty file and passes it to lld. lld treates
empty file as linker script. When there is no object input file to lld, lld
will emit error:
target emulation unknown: -m or at least one .o file required
This patch adds "elf64_amdgpu" to lld so that lld always know the target
no matter whether there are object input files or not.
Reviewed by: Artem Belevich, Fangrui Song
Differential Revision: https://reviews.llvm.org/D138221
Follows up on commit cd5d5ce235081005173566c99c592550021de058 by
additionally ignoring relative paths ending in "lto-wrapper.exe" as
can be the case for GCC cross-compiled for Windows.
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D138065
Like D134013, but for COFF and Mach-O.
Also expand the ELF test a bit. I at first didn't realize that `getValue()` for
`-mllvm -foo=bar` would return `-foo=bar` instead of just `bar`, and so
I wrote the test to check if we indeed get this wrong. We don't, but
having the test for it seems nice, so I'm including it.
Differential Revision: https://reviews.llvm.org/D137971
Allowing incorrect version scripts is not a helpful default. Flip that
to help users find their bugs at build time rather than at run time.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D135402
Mach-O ld64 supports -w to suppress warnings. GNU ld 2.40 will support the
option as well (https://sourceware.org/bugzilla/show_bug.cgi?id=29654).
This feature has some small value. E.g. when analyzing a large executable with
relocation overflow issues, we may use --noinhibit-exec --emit-relocs to get an
output file with static relocations despite relocation overflow issues. -w can
significantly improve the link time as printing the massive warnings is slow.
Reviewed By: peter.smith
Differential Revision: https://reviews.llvm.org/D136569
A reference to __real_foo should trigger archive extraction of the input file that defines foo, otherwise a link using --wrap=foo might fail to link with an undefined reference to foo.
This matches bfd linker behaviour.
Differential Revision: https://reviews.llvm.org/D135897
This reverts commit 096f93e73dc3f88636cdcb57515e3732385b452d.
Revert "[Libomptarget] Make the plugins ingore undefined exported symbols"
This reverts commit 3f62314c235bd2475c8e2b5b874b2932a444e823.
Revert "[LLD] Enable --no-undefined-version by default."
This reverts commit 7ec8b0d162e354c703f5390784287054601f9c69.
Three commits are reverted because of the current omp build fail
with GNU ld. See discussion here: https://reviews.llvm.org/rG096f93e73dc3
Allowing incorrect version scripts is not a helpful default. Flip that
to help users find their bugs at build time rather than at run time.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D135402
Add LLVM_LIBRARY_VISIBILITY to remove unneeded GOT and unique_ptr
indirection. We can move other global variables into ctx without
indirection concern. In the long term we may consider passing Ctx
as a parameter to various functions and eliminate global state as
much as possible and then remove `Ctx::reset`.
`config` has 1000+ uses so we try to avoid changing `config->foo`. Define a
wrapper with LLVM_LIBRARY_VISIBILITY to remove unneeded GOT and unique_ptr
indirection.
My x86-64 lld executable is 11+KiB smaller.