Previously we saved registers in the shadow space of callee before
calling __delayLoadHelper2. Now we save arguments in the shadow space of
the caller and allocate shadow space for the callee.
Fixes#51941
---------
Co-authored-by: Benjamin Santerre <benjamin.santerre@gmail.com>
MSVC linker accepts native ARM64 object files as input with
`-machine:arm64ec`, similar to `-machine:arm64x`. Its usefulness is very
limited; for example, both exports and imports are not reflected in the
PE structures and can't work. However, their symbol tables are otherwise
functional.
Since we already have handling of multiple symbol tables implemented for
ARM64X, the required changes are mostly about adjusting relevant checks
to account for them on the ARM64EC target.
Delay-load helper handling is a bit of a shortcut. The patch never pulls
it for native object files and just ensures that the code is fine with
that. In general, I think it would be nice to adjust the driver to pull
it only when it's actually referenced, which would allow applying the
same logic to the native symbol table on ARM64EC without worrying about
pulling too much.
For each imported module, emit null-terminated native import entries,
followed by null-terminated EC entries. If a view lacks imports for a
given module, only terminators are emitted. Use ARM64X relocations to
skip native entries in the EC view.
Move `delayLoadHelper` and `tailMergeUnwindInfoChunk` to `SymbolTable`
since they are different for each symbol table.
This change prepares for ARM64X delay-load imports support (#124600).
Delaying the `setLocation` call is problematic on ARM64X because the
order of addresses may not align with the order of symbols.
In hybrid images, the PE header references a single IAT for both native
and EC views, merging entries where possible. When merging isn't
feasible, different imports are grouped together, and ARM64X relocations
are emitted as needed.
`NullChunk` instances do write data, even if it's always zero. Setting
`hasData` to false causes `Writer::assignAddresses` to ignore them
when calculating `rawSize`. This typically isn't an issue, as null chunks
are usually positioned within a section, and later chunks adjust the
size accordingly.
However, on ARM64EC, the auxiliary IAT is placed at the end of the
`.rdata` section and terminates with a null chunk. As a result, `rawSize`
is never updated to account for it, and space for the null chunk is not
allocated. Consequently, when `NullChunk::writeTo` is called, it receives
an invalid pointer - either pointing to the next section or beyond the
allocated buffer.
Since commit dadc6f2488684, only the constructor of the `EdataContents`
class is used. Replace it with a function and skip the call when using a
custom `.edata` section.
This change prepares for hybrid ARM64X support, which requires two
`SymbolTable` instances: one for native symbols and one for EC symbols.
In such cases, `config.machine` will remain ARM64X, while the
`SymbolTable` instances will store ARM64 and ARM64EC machine types.
This change prepares for the introduction of separate hybrid namespaces.
Hybrid images will require two `SymbolTable` instances, making it
necessary to associate `InputFile` objects with the relevant one.
Currently, null chunks always follow other aligned chunks, so this patch
is NFC. However, it will become observable once support for ARM64X
imports is added. The import tables are shared between the native and EC
views. They are usually very similar, but in cases where they differ,
ARM64X relocations handle the discrepancies. If a DLL is only imported
by EC code, the native view will see it as importing zero functions from
this DLL (with ARM64X relocations replacing those null chunks with
actual imports). In this scenario, the null chunks may appear as the
very first chunks, meaning there is nothing else forcing their
alignment.
Fill the regular delay-load IAT with x86_64 delay-load thunks. Similarly
to regular imports, create an auxiliary IAT and its copy for ARM64EC
calls. These are filled with the same `__impchk_` thunks used for
regular imports, which perform an indirect call with
`__icall_helper_arm64ec` on the regular delay-load IAT. These auxiliary
IATs are exposed via CHPE metadata starting from version 2.
The MSVC linker creates one more copy of the auxiliary IAT. `__imp_func`
symbols refer to that hidden IAT, while the `#func` thunk performs a
call with the public auxiliary IAT. If the public auxiliary IAT is fine
for `#func`, it should be fine for calls using the `__imp_func` symbol
as well. Therefore, I made `__imp_func` refer to that IAT too.
In addition to the auxiliary IAT, ARM64EC modules also contain a copy of
it. At runtime, the auxiliary IAT is filled with the addresses of actual
ARM64EC functions when possible. If patching is detected, the OS may use
the IAT copy to revert the auxiliary IAT, ensuring that the call checker
is used for calls to imported functions.
In addition to the regular IAT, ARM64EC also includes an auxiliary IAT.
At runtime, the regular IAT is populated with the addresses of imported
functions, which may be x86_64 functions or the export thunks of ARM64EC
functions. The auxiliary IAT contains versions of functions that are
guaranteed to be directly callable by ARM64 code.
The linker fills the auxiliary IAT with the addresses of `__impchk_`
thunks. These thunks perform a call on the IAT address using
`__icall_helper_arm64ec` with the target address from the IAT. If the
imported function is an ARM64EC function, the OS may replace the address
in the auxiliary IAT with the address of the ARM64EC version of the
function (not its export thunk), avoiding the runtime call checker for
better performance.
This makes a difference when linking executables with delay loaded
libraries for arm32; the delay loader implementation can load data from
the registry with instructions that assume alignment.
This issue does not show up when linking in MinGW mode, because a
PseudoRelocTableChunk gets injected, which also sets alignment, even if
the chunk itself is empty.
The loader can usually handle an unaligned import dir chunk, but It's not
optimal and it's not what MSVC link.exe does.
Windows refuses to load ARM64X binaries with unaligned import directory.
aarch64 and arm64ec imports are shared in such binaries as much as
possible. As long as they use the same set of functions from given import
directory, both the directory and import addresses chunk are just shared.
When used set of functions differs, ARM64X dynamic relocations are used
to modify import dir to point to different names and import addresses for
its EC view. I suspect that the loader expects some alignment on ARM64X
dynamic relocation offset and may not be the case when relocated import
dir is not aligned.
MSVC link.exe allows overriding exports on the cmd-line with exports seen in OBJ directives. The typical case is what is described in #62329.
Before this patch, trying to override an export with `/export` or `/def` would generate a duplicate warning. This patches tries to replicate the MSVC behavior. A second override on the cmd-line would still generate the warning.
There's still a case which we don't cover: MSVC link.exe is able to demangle an exported OBJ directive function, and match it with a unmangled export function in a .def file. In the meanwhile, one can use the mangled export in the .def to cover that case.
This fixes#62329
Differential revision: https://reviews.llvm.org/D149611
For each symbol in a /delayloaded library, lld injects a small piece of
code to handle the symbol lazy loading. This code doesn't have unwind
information, which may be troublesome.
Provide these information for AMD64.
Thanks to Yannis Juglaret <yjuglaret@mozilla.com> for contributing the
unwinding info and for his support while crafting this patch.
Fix#59639
Differential Revision: https://reviews.llvm.org/D141691
This reverts commit 7370ff624d217b0f8f7512ca5b651a9b8095a411.
(and 47fb8ae2f9a4075de05433ef24f459b6befd1730).
This commit broke the symbol type in import libraries generated
for mingw autoexported symbols, when the source files were built
with LTO. I'll commit a testcase that showcases this issue after
the revert.
Delay-loaded imports creats a load thunk with a symbol name. Before this
change, the name uses a `__imp_load_` prefix. On the other hand, normal
import uses the `__imp_` prefix for the import address pointer. If an
import symbol named `load_func` is imported normally and another named
`func` is imported using delay-load, this can cause a symbol name
collision.
This patch changes delay-load imports to use `__imp___load_` prefix.
Because it is less likely for normal imports to have a name starting in
`__load_` this should reduce the chance of a name collision.
Reviewed By: mstorsjo
Differential Revision: https://reviews.llvm.org/D134464
Before this, LLD sets OrdinalBase to 0, which deviates from usual
practices. This technically would allow LLD to export a symbol using
ordinal 0, however LLD never use export ordinal 0, which results in
binaries with export tables always having an empty export at ordinal 0.
This change makes LLD set OrdinalBase to 1 and not create the empty
export with ordinal 0, which makes its behaviour more in line with both
the MSVC linker and the GNU linker.
Reviewed By: mstorsjo
Differential Revision: https://reviews.llvm.org/D134140
llvm::sort is beneficial even when we use the iterator-based overload,
since it can optionally shuffle the elements (to detect
non-determinism). However llvm::sort is not usable everywhere, for
example, in compiler-rt.
Reviewed By: nhaehnle
Differential Revision: https://reviews.llvm.org/D130406
Move all variables at file-scope or function-static-scope into a hosting structure (lld::CommonLinkerContext) that lives at lldMain()-scope. Drivers will inherit from this structure and add their own global state, in the same way as for the existing COFFLinkerContext.
See discussion in https://lists.llvm.org/pipermail/llvm-dev/2021-June/151184.html
Differential Revision: https://reviews.llvm.org/D108850
Original commit description:
[LLD] Remove global state in lld/COFF
This patch removes globals from the lldCOFF library, by moving globals
into a context class (COFFLinkingContext) and passing it around wherever
it's needed.
See https://lists.llvm.org/pipermail/llvm-dev/2021-June/151184.html for
context about removing globals from LLD.
I also haven't moved the `driver` or `config` variables yet.
Differential Revision: https://reviews.llvm.org/D109634
This reverts commit a2fd05ada9030eab2258fff25e77a05adccae128.
Original commits were b4fa71eed34d967195514fe9b0a5211fca2bc5bc
and e03c7e367adb8f228332e3c2ef8f45484597b719.
check for timer output"
Seems to be causing a number of asan test failures.
This reverts commit b4fa71eed34d967195514fe9b0a5211fca2bc5bc
and e03c7e367adb8f228332e3c2ef8f45484597b719.
This patch removes globals from the lldCOFF library, by moving globals
into a context class (COFFLinkingContext) and passing it around wherever
it's needed.
See https://lists.llvm.org/pipermail/llvm-dev/2021-June/151184.html for
context about removing globals from LLD.
I also haven't moved the `driver` or `config` variables yet.
Differential Revision: https://reviews.llvm.org/D109634
The following class isn't part of the export table; there's a
second correctly placed comment about the things that actually
belong to the export table.
As this isn't handled as a regular relocation, the normal handling of
maybeReportRelocationToDiscarded in Chunks.cpp doesn't apply here.
This would have caught the issue fixed by
82de4e075339f5ad8d68cfe31eb45b771d4750ae.
Differential Revision: https://reviews.llvm.org/D102115
This patch adds support for creating Guard Address-Taken IAT Entry Tables (.giats$y sections) in object files, matching the behavior of MSVC. These contain lists of address-taken imported functions, which are used by the linker to create the final GIATS table.
Additionally, if any DLLs are delay-loaded, the linker must look through the .giats tables and add the respective load thunks of address-taken imports to the GFIDS table, as these are also valid call targets.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D87544
This broke both Firefox and Chromium (PR47905) due to what seems like dllimport
function not being handled correctly.
> This patch adds support for creating Guard Address-Taken IAT Entry Tables (.giats$y sections) in object files, matching the behavior of MSVC. These contain lists of address-taken imported functions, which are used by the linker to create the final GIATS table.
> Additionally, if any DLLs are delay-loaded, the linker must look through the .giats tables and add the respective load thunks of address-taken imports to the GFIDS table, as these are also valid call targets.
>
> Reviewed By: rnk
>
> Differential Revision: https://reviews.llvm.org/D87544
This reverts commit cfd8481da1adba1952e0f6ecd00440986e49a946.
This patch adds support for creating Guard Address-Taken IAT Entry Tables (.giats$y sections) in object files, matching the behavior of MSVC. These contain lists of address-taken imported functions, which are used by the linker to create the final GIATS table.
Additionally, if any DLLs are delay-loaded, the linker must look through the .giats tables and add the respective load thunks of address-taken imports to the GFIDS table, as these are also valid call targets.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D87544
This patch adds support for creating Guard Address-Taken IAT Entry Tables (.giats$y sections) in object files, matching the behavior of MSVC. These contain lists of address-taken imported functions, which are used by the linker to create the final GIATS table.
Additionally, if any DLLs are delay-loaded, the linker must look through the .giats tables and add the respective load thunks of address-taken imports to the GFIDS table, as these are also valid call targets.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D87544
The alignment of ARM64 range extension thunks was fixed in
7c816492197a, but ARM range extension thunks, and import
and delay import thunks also need aligning (like all code on ARM
platforms).
I'm adding a test for alignment of ARM64 import thunks - not
specifically adding tests for misalignment of all of them though.
Differential Revision: https://reviews.llvm.org/D77796
E.g. for x86_64, previously each symbol's thunk was 87 bytes. Now
there's a 12 byte thunk per symbol, plus a shared 83 byte tail
function.
This is similar to what both MS link.exe and GNU tools do for
delay imports.
Differential Revision: https://reviews.llvm.org/D64288
llvm-svn: 365823
This patch does the same thing as r365595 to other subdirectories,
which completes the naming style change for the entire lld directory.
With this, the naming style conversion is complete for lld.
Differential Revision: https://reviews.llvm.org/D64473
llvm-svn: 365730
Shaves another pointer off of SectionChunk, reducing the size from 96 to
88 bytes, down from 144 before I started working on this. Combined with
D62356, this reduced peak memory usage when linking chrome_child.dll
from 713MB to 675MB, or 5%.
Create NonSectionChunk to provide virtual dispatch to the rest of the
chunk types.
Reviewers: ruiu, aganea
Differential Revision: https://reviews.llvm.org/D62362
llvm-svn: 361667