I couldn't find an existing way to pass an equivalent of `-mcpu=lime1` to
LTO codegen.
This commit provides one: pass `-mllvm -mcpu=lime1` to wasm-ld.
Without this change, files in `--start-lib`/`--end-lib` groups were being
marked as live, which meant their static constructors were being
included in the link.
Towards
https://github.com/llvm/llvm-project/issues/134809#issuecomment-2787206873
This change moves WasmSym from a static global struct to an instance
owned by Ctx, allowing it to be reset cleanly between linker runs. This
enables safe support for multiple invocations of wasm-ld within the same
process.
Changes done:
- Converted WasmSym from a static struct to a regular struct with
instance members.
- Added a std::unique_ptr<WasmSym> wasmSym field inside Ctx.
- Reset wasmSym in Ctx::reset() to clear state between links.
- Replaced all WasmSym:: references with ctx.wasmSym->.
- Removed global symbol definitions from Symbols.cpp that are no longer
needed.
Clearing wasmSym in ctx.reset() ensures a clean slate for each link
invocation, preventing symbol leakage across runs—critical when using
wasm-ld/lld as a reentrant library where global state can cause subtle,
hard-to-debug errors.
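The ownership change described above can be sketched roughly as follows. This is a minimal model, not lld's actual code: the `Symbol` type, the two `WasmSym` fields, and the `runLink` helper are simplified stand-ins chosen for illustration, and whether the real `reset()` recreates the object or only clears it is an assumption here.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-in for a linker symbol.
struct Symbol {
  std::string name;
};

// Instance members instead of static globals (fields are illustrative).
struct WasmSym {
  Symbol *stackPointer = nullptr;
  Symbol *tlsBase = nullptr;
};

struct Ctx {
  std::unique_ptr<WasmSym> wasmSym;

  Ctx() : wasmSym(std::make_unique<WasmSym>()) {}

  // Replace the old instance so no pointer from a previous link survives.
  void reset() { wasmSym = std::make_unique<WasmSym>(); }
};

// Simulates one link invocation that records a well-known symbol.
inline void runLink(Ctx &ctx, Symbol &sp) {
  assert(ctx.wasmSym->stackPointer == nullptr && "stale state from earlier link");
  ctx.wasmSym->stackPointer = &sp;
}
```

Calling `ctx.reset()` between two `runLink` calls is what keeps the second call from observing a pointer into the first link's symbols.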
---------
Co-authored-by: Vassil Vassilev <v.g.vassilev@gmail.com>
When generating C++ vtables, Clang declares virtual functions as
`void(void)` when their signature is not known (e.g. parameter types are
forward-declared). As WASM type checks imports, this would conflict with
the real definition during linking. Commit 59f959ff introduced a
workaround for this by deferring signature assignment until a definition
or direct call is seen.
When performing LTO, LLD first scans the bitcode files and creates
`DefinedFunction` symbol table entries for their contents. After LTO
codegen, they are replaced with `UndefinedFunction`s (so that the
definitions will be pulled in from the native LTO-d files when they are
added). At this point, if a function is only referenced in bitcode, its
signature remains `nullptr`.
From here, it should have behaved like in the non-LTO case: the first
direct call sets the signature. However, as the `isCalledDirectly` flag
was set to true, the missing signature was filled in by the type of the
first reference to the function, which could be a `void(void)` vtable
entry, which would then conflict with the real definition.
This commit sets `isCalledDirectly` to false so that the signature will
only be populated when a direct call is found.
See godotengine/godot#104497 and
emscripten-core/emscripten#10831
This commit adds support for WebAssembly's custom-page-sizes proposal to
`wasm-ld`. An overview of the proposal can be found
[here](https://github.com/WebAssembly/custom-page-sizes/blob/main/proposals/custom-page-sizes/Overview.md).
In a sentence, it allows customizing a Wasm memory's page size, enabling
Wasm to target environments with less than 64KiB of memory (the default
Wasm page size) available for Wasm memories.
This commit contains the following:
* Adds a `--page-size=N` CLI flag to `wasm-ld` for configuring the
linked Wasm binary's linear memory's page size.
* When the page size is configured to a non-default value, then the
final Wasm binary will use the encodings defined in the
custom-page-sizes proposal to declare the linear memory's page size.
* Defines a `__wasm_first_page_end` symbol, whose address points to the
end of the first page in the Wasm linear memory, i.e. its address equals
the Wasm memory's page size. This allows writing code that is compatible
with any page size and doesn't require re-compiling its object code. At
the same time, because it lowers to a constant rather than a memory
access, it enables link-time optimization.
* Adds tests for these new features.
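To illustrate why a smaller page size matters for small targets, here is a rough sketch of the page-count arithmetic. The helper name `pagesFor` and the constant name are made up for this example and are not part of `wasm-ld`.

```cpp
#include <cassert>
#include <cstdint>

// 64 KiB, the default Wasm page size.
const uint64_t kDefaultWasmPageSize = 65536;

// Number of pages needed to cover `bytes` of linear memory, rounding up.
// Hypothetical helper for illustration only.
inline uint64_t pagesFor(uint64_t bytes, uint64_t pageSize) {
  assert(pageSize > 0 && "page size must be non-zero");
  return (bytes + pageSize - 1) / pageSize;
}
```

With the default page size, a module that needs 1000 bytes still occupies one full 64 KiB page; with a page size of 1 it would declare exactly 1000 pages of one byte each.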
r? @sbc100
cc @sunfishcode
When emitting relocs with linked output (i.e. `--emit-relocs`), skip
relocs against dead symbols, since those symbols do not appear in the
output.
In most circumstances BSS segments are not required in the output binary
but combineOutputSegments was erroneously including them. This meant
that PIC binaries were including the BSS data as zero in the binary.
Fixes: https://github.com/emscripten-core/emscripten/issues/23683
The rule here, which I'm copying from the ELF linker, is that shared
library symbols should take precedence, unless the symbol has already
been extracted from the archive, e.g.:
```
$ wasm-ld foo.a foo.so ref.o // .so wins
$ wasm-ld foo.a ref.o foo.so // .a wins
```
In the first case the shared library takes precedence because the lazy
symbol is replaced by the .so symbol before it is extracted from the
archive. In the second example the ref.o file causes the archive to be
extracted before the .so file is processed, so in that case the archive
file wins.
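The ordering rule can be modelled with a small state machine for a single symbol `foo`. This is a simplified sketch, not the linker's real symbol table: the `State` and `InputKind` enums and the `resolveFoo` helper are invented for this example.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class InputKind { Archive, SharedLib, ObjectRef };

struct Input {
  InputKind kind;
  std::string name;
};

enum class State { Absent, Undefined, Lazy, Shared, Defined };

// Returns the file that ends up providing `foo`, or "" if nothing does.
inline std::string resolveFoo(const std::vector<Input> &inputs) {
  State state = State::Absent;
  std::string provider; // file currently providing (or lazily offering) foo
  for (const Input &in : inputs) {
    assert(!in.name.empty() && "inputs must be named");
    switch (in.kind) {
    case InputKind::Archive:
      if (state == State::Absent) {
        state = State::Lazy; // member is available but not yet extracted
        provider = in.name;
      } else if (state == State::Undefined) {
        state = State::Defined; // already referenced: extract immediately
        provider = in.name;
      }
      break;
    case InputKind::SharedLib:
      // A shared definition replaces lazy or undefined entries, but not a
      // definition that has already been extracted.
      if (state != State::Defined && state != State::Shared) {
        state = State::Shared;
        provider = in.name;
      }
      break;
    case InputKind::ObjectRef:
      if (state == State::Absent)
        state = State::Undefined;
      else if (state == State::Lazy)
        state = State::Defined; // the reference pulls in the archive member
      break;
    }
  }
  return (state == State::Shared || state == State::Defined) ? provider : "";
}
```

Feeding it `foo.a foo.so ref.o` yields `foo.so`, while `foo.a ref.o foo.so` yields `foo.a`, matching the two command lines above.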
Fixes: https://github.com/emscripten-core/emscripten/issues/23501
Some tools (e.g. Rust tooling) produce element segment descriptors with
neither elemkind nor element type descriptors, but with init exprs
instead of func indices (this is the flags value of 4 in
https://webassembly.github.io/spec/core/binary/modules.html#element-section).
LLVM doesn't fully model reference types or the various ways to
initialize element segments, but we do want to correctly parse and skip
over all type sections, so this change updates the object parser to
handle that case, and refactors for more clarity.
The test file is updated to include one additional elem segment with a
flags value of 4, an initializer value of (i32.const 0) and an empty
vector.
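As a rough illustration of what a parser has to decide from the flags value, the sketch below decodes the three flag bits according to the element-section encoding in the Wasm spec. The `ElemLayout` struct and `decodeElemFlags` function are hypothetical names for this example, not the ones used in LLVM's object parser.

```cpp
#include <cassert>
#include <cstdint>

// Which fields follow the flags value in an element segment.
struct ElemLayout {
  bool hasTableIndex; // explicit table index precedes the offset
  bool hasOffsetExpr; // active segments carry an offset expression
  bool hasKindOrType; // an elemkind byte or a reference type follows
  bool usesInitExprs; // vec(expr) instead of vec(funcidx)
};

inline ElemLayout decodeElemFlags(uint32_t flags) {
  assert(flags <= 7 && "only three flag bits are defined");
  bool passiveOrDeclarative = (flags & 0x1) != 0;
  ElemLayout layout;
  layout.hasTableIndex = !passiveOrDeclarative && (flags & 0x2) != 0;
  layout.hasOffsetExpr = !passiveOrDeclarative;
  layout.hasKindOrType = (flags & 0x3) != 0;
  layout.usesInitExprs = (flags & 0x4) != 0;
  return layout;
}
```

For a flags value of 4 this gives an active segment with an offset expression, no elemkind or type, and init exprs for the elements, which is the case described above.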
Also support parsing files that export imported (undefined) functions.
The commit 22b7b84860d39da71964c9b329937f2ee1d875ba
made the symbols provided by shared libraries "defined",
and thus effectively made it impossible to generate non-pie
dynamically linked executables using
--unresolved-symbols=import-dynamic.
This commit, based on https://github.com/llvm/llvm-project/pull/109249,
fixes it by checking sym->isShared() explicitly.
(As a bonus, you don't need to rely on
--unresolved-symbols=import-dynamic anymore.)
Fixes https://github.com/llvm/llvm-project/issues/107387
Change the global variable reference to a member access of another
variable `ctx`. In the future, we may pass through `ctx` to functions to
eliminate global variables.
Pull Request: https://github.com/llvm/llvm-project/pull/119835
and forward it to LinkerDriver's ctor so that some uses of the global
`config` can be dropped. This is similar to how the ELF port
migrates away from the global `config`.
Pull Request: https://github.com/llvm/llvm-project/pull/119829
Apologies for the large change, I looked for ways to break this up and
all of the ones I saw added real complexity. This change focuses on the
option's prefixed names and the array of prefixes. These are present in
every option and the dominant source of dynamic relocations for PIE or
PIC users of LLVM and Clang tooling. In some cases there are 100s or
1000s of them, as for the Clang driver, which has a huge number of
options.
This PR addresses this by building a string table and a prefixes table
that can be referenced with indices rather than pointers that require
dynamic relocations. This removes almost 7k dynamic relocations from the
`clang` binary, roughly 8% of the remaining dynamic relocations outside
of vtables. For busy-boxing use cases where many different option tables
are linked into the same binary, the savings add up a bit more.
The string table is a straightforward mechanism, but the prefixes
required some subtlety. They are encoded in a Pascal-string fashion with
a size followed by a sequence of offsets. This works relatively well for
the small realistic prefixes arrays in use.
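A minimal sketch of the idea, under assumptions: the table contents, the `OptionInfo` struct, and the helper names below are invented for illustration and do not match the real option library's layout. The point is that each option stores only integer offsets, which need no dynamic relocation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// All strings concatenated and NUL-terminated. Offset 0 is the empty string,
// offset 1 is "-", offset 3 is "--", offset 6 is "help".
static const char kStrTable[] = "\0-\0--\0help";

// Prefixes table, Pascal-string style: each entry is a count followed by
// that many offsets into kStrTable.
static const unsigned kPrefixesTable[] = {
    0,       // index 0: no prefixes
    1, 1,    // index 1: {"-"}
    2, 1, 3, // index 3: {"-", "--"}
};

// An option refers to its strings by index, not by pointer.
struct OptionInfo {
  unsigned prefixesIndex; // index into kPrefixesTable
  unsigned nameOffset;    // offset into kStrTable
};

inline std::string strAt(unsigned offset) {
  assert(offset < sizeof(kStrTable));
  return std::string(&kStrTable[offset]);
}

inline std::vector<std::string> prefixesAt(unsigned index) {
  unsigned count = kPrefixesTable[index];
  std::vector<std::string> out;
  for (unsigned i = 1; i <= count; ++i)
    out.push_back(strAt(kPrefixesTable[index + i]));
  return out;
}
```

An `OptionInfo{3, 6}` would then describe an option named `help` that accepts both the `-` and `--` prefixes.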
Lots of code has to change in order to land this, though: all of the
option library code has to be updated to use the string table and
prefixes table, and all the users of the options library have to be
updated to correctly instantiate the objects.
Some follow-up patches are in the works to provide an abstraction for this
style of code, and to start using the same technique for some of the
other strings here now that the infrastructure is in place.
Hi @sbc100
I was looking into a use case involving the link function (which got my
attention to reset).
I see that `lazyBitcodeFiles` variable was introduced here
https://github.com/llvm/llvm-project/pull/114327 but I don't see it
being reset while destroying the context eventually. Hopefully this
should be the correct way to address it.
This defines some new target features. These are subsets of existing
features that reflect implementation concerns:
- "call-indirect-overlong" - implied by "reference-types"; just the
overlong encoding for the `call_indirect` immediate, and not the actual
reference types.
- "bulk-memory-opt" - implied by "bulk-memory": just `memory.copy` and
`memory.fill`, and not the other instructions in the bulk-memory
proposal.
This is split out from https://github.com/llvm/llvm-project/pull/112035.
---------
Co-authored-by: Heejin Ahn <aheejin@gmail.com>
TLS-relative relocations always need to be relative to the TLS section
since they get added to `__tls_base` at runtime.
Without this change the tls base address was effectively being added to
the final value twice in this case.
This only affects code that is built with `-pthread` but linked without
shared memory (i.e. without threads).
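The double-add can be seen with a small arithmetic sketch. The function names and the concrete addresses are made up for illustration; only the relationship between the relocation value and `__tls_base` comes from the description above.

```cpp
#include <cassert>
#include <cstdint>

// Value the linker should store for a TLS-relative relocation: the symbol's
// offset from the start of the TLS section.
inline uint64_t tlsRelativeValue(uint64_t symbolAddr, uint64_t tlsSectionStart) {
  assert(symbolAddr >= tlsSectionStart && "symbol must live in the TLS section");
  return symbolAddr - tlsSectionStart;
}

// What the generated code computes at runtime: __tls_base plus the stored value.
inline uint64_t runtimeAddress(uint64_t tlsBase, uint64_t relocValue) {
  return tlsBase + relocValue;
}
```

If the TLS section starts at 1024 and the symbol sits at 1040, the stored value is 16 and the runtime address is 1040 as expected. Storing the absolute address 1040 instead would give 2064, i.e. the base counted twice.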
Fixes: https://github.com/emscripten-core/emscripten/issues/22880
For COFF and ELF that are mostly free of global states, lld::errs() and
lld::outs() should not be used. This migration change allows us to
remove lld::errs, which uses the global errorHandler().
Currently, WebAssembly/WASI target does not provide direct support for
code coverage.
This patch set fixes several issues to unlock the feature. The main
changes are:
1. Port `compiler-rt/lib/profile` to WebAssembly/WASI.
2. Adjust profile metadata sections for Wasm object file format.
- [CodeGen] Emit `__llvm_covmap` and `__llvm_covfun` as custom sections
instead of data segments.
- [lld] Align the interval space of custom sections at link time.
- [llvm-cov] Copy misaligned custom section data if the start address is
not aligned.
- [llvm-cov] Read `__llvm_prf_names` from data segments
3. [clang] Link with profile runtime libraries if requested
See each commit message for more details and rationale.
This is part of the effort to add code coverage support in Wasm target
of Swift toolchain.
Instead of always generating __wasm_apply_data_relocs when relevant
options like -pie and -shared are specified, generate it only when the
relevant relocations are actually necessary.
Note: omitting empty __wasm_apply_data_relocs is not a problem because
the export is optional in the spec (DynamicLinking.md) and all runtime
linker implementations I'm aware of implement it that way. (emscripten,
toywasm, wasm-tools)
Motivations:
* This possibly reduces the module size
* This is also a preparation to fix
https://github.com/llvm/llvm-project/issues/107387, for which it isn't
obvious if we need these relocations at the time of
createSyntheticSymbols. (unless we introduce a new explicit option like
--non-pie-dynamic-link.)
Followup to #104926.
We ran into issues on the emscripten waterfall where relocation against
`__dso_handle` was being reported as an error even though
`-r/--relocatable` was being used to generate object file output rather
than executable output.
`WASM_MEMORY_ADDR_REL_` and `WASM_TABLE_INDEX_REL_` relocations against
**undefined symbols** are not supported and, except for
`UnresolvedPolicy::ReportError`, lead to incorrect Wasm code, such as
invalid data address or invalid table index that cannot be patched
during later dynamic Wasm linking with modules declaring those symbols.
This is different to other relocations that support undefined symbols by
declaring corresponding Wasm imports.
For more robust behavior, `wasm-ld` should probably report an error for
such unsupported PIC relocations, independent of the `UnresolvedPolicy`.
* Don't call raw_string_ostream::flush(), which is essentially a no-op.
* Strip calls to raw_string_ostream::str(), to avoid an excess layer of indirection.
Add `allow-multiple-definition` flag to `wasm-ld`. This follows the ELF
linker logic. In case of duplication, the first symbol met is used.
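The "first symbol met is used" rule can be sketched as below. This is a simplified model with invented names (`Definition`, `LinkResult`, `resolveDefinitions`), not the actual symbol table code in `wasm-ld`.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

struct Definition {
  std::string symbol;
  std::string file;
};

struct LinkResult {
  std::map<std::string, std::string> chosen; // symbol -> defining file
  std::vector<std::string> errors;
};

inline LinkResult resolveDefinitions(const std::vector<Definition> &defs,
                                     bool allowMultipleDefinition) {
  LinkResult result;
  for (const Definition &d : defs) {
    assert(!d.symbol.empty() && "definitions must be named");
    // emplace keeps the existing entry, so the first definition wins.
    bool inserted = result.chosen.emplace(d.symbol, d.file).second;
    if (!inserted && !allowMultipleDefinition)
      result.errors.push_back("duplicate symbol: " + d.symbol);
    // With the flag set, the later definition is silently ignored.
  }
  return result;
}
```

With the flag enabled, two definitions of `foo` from `a.o` and `b.o` (in that order) resolve to `a.o` without an error; without it, the same input reports a duplicate.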
This PR resolves #97543.
This change is enough to allow `--strip-debug` to work on object files,
without breaking the relocation information or symbol table.
A more complete version of this change would instead reconstruct the
symbol table and relocation sections, but that is a much larger change.
Bug: #102002
This reverts commit 740161a9b98c9920dedf1852b5f1c94d0a683af5.
I moved the `ISD` dependencies into the CodeGen portion of the handling,
it's a little awkward but it's the easiest solution I can think of for
now.
We don't currently have a great way to detect the architecture of shared
object files under wasm. The current method involves checking if the
imported or exported memory is 64-bit. However some shared libraries
don't use linear memory at all.
See https://github.com/llvm/llvm-project/issues/98778