226 Commits

Author SHA1 Message Date
Nick Fitzgerald
6018930ef1
[lld][WebAssembly] Support for the custom-page-sizes WebAssembly proposal (#128942)
This commit adds support for WebAssembly's custom-page-sizes proposal to
`wasm-ld`. An overview of the proposal can be found
[here](https://github.com/WebAssembly/custom-page-sizes/blob/main/proposals/custom-page-sizes/Overview.md).
In a sentence, it allows customizing a Wasm memory's page size, enabling
Wasm to target environments with less than 64KiB of memory (the default
Wasm page size) available for Wasm memories.

This commit contains the following:

* Adds a `--page-size=N` CLI flag to `wasm-ld` for configuring the
linked Wasm binary's linear memory's page size.

* When the page size is configured to a non-default value, then the
final Wasm binary will use the encodings defined in the
custom-page-sizes proposal to declare the linear memory's page size.

* Defines a `__wasm_first_page_end` symbol, whose address points to the
first page in the Wasm linear memory, a.k.a. is the Wasm memory's page
size. This allows writing code that is compatible with any page size,
and doesn't require re-compiling its object code. At the same time,
because it just lowers to a constant rather than a memory access or
something, it enables link-time optimization.

* Adds tests for these new features.

r? @sbc100 

cc @sunfishcode
2025-03-04 09:39:30 -08:00
Hood Chatham
cc7f22ee6c
[object][WebAssembly] Add support for RUNTIME_PATH to yaml2obj and obj2yaml (#126080)
This is the first step of adding RPATH support for wasm. 

See corresponding update to the WebAssembly/tool-conventions repo on dynamic
linking: https://github.com/WebAssembly/tool-conventions/pull/246
2025-02-24 09:15:41 -08:00
Sam Clegg
48415777ea
Revert "[Object][WebAssembly] Fix data segment offsets higher than 2^31 (#125739)" (#125786)
This reverts commit c798a5c4d5c3c8cb21e6001f505d8f44217c2244.

This broke bunch of test the emscripten side. Reverting while we
investigate.
2025-02-04 16:16:17 -08:00
Sam Clegg
c798a5c4d5
[Object][WebAssembly] Fix data segment offsets higher than 2^31 (#125739)
Fixes: #58555
2025-02-04 14:06:07 -08:00
Derek Schuff
9fdc38c81c
[WebAssembly][Object] Support more elem segment flags (#123427)
Some tools (e.g. Rust tooling) produce element segment descriptors with
neither
elemkind or element type descriptors, but with init exprs instead of
func indices
(this is with the flags value of 4 in

https://webassembly.github.io/spec/core/binary/modules.html#element-section).
LLVM doesn't fully model reference types or the various ways to
initialize element
segments, but we do want to correctly parse and skip over all type
sections, so
this change updates the object parser to handle that case, and refactors
for more
clarity.

The test file is updated to include one additional elem segment with a
flags value
of 4, an initializer value of (32.const 0) and an empty vector. 

Also support parsing files that export imported (undefined) functions.
2025-01-17 17:26:44 -08:00
Lang Hames
d02c1676d7 [Support][Error] Add ErrorAsOutParameter constructor that takes an Error by ref.
ErrorAsOutParameter's Error* constructor supports cases where an Error might not
be passed in (because in the calling context it's known that this call won't
fail). Most clients always have an Error present however, and for them an Error&
overload is more convenient.
2024-11-29 15:57:53 +11:00
Kazu Hirata
e9c8106a90
[Object] Remove unused includes (NFC) (#116750)
Identified with misc-include-cleaner.
2024-11-19 19:42:09 -08:00
Kazu Hirata
4048c64306
[llvm] Remove redundant control flow statements (NFC) (#115831)
Identified with readability-redundant-control-flow.
2024-11-12 10:09:42 -08:00
Heejin Ahn
be64ca9123
[WebAssembly] Remove WASM_FEATURE_PREFIX_REQUIRED (NFC) (#113729)
This has not been emitted since

3f34e1b883.

The corresponding proposed tool-conventions change:
https://github.com/WebAssembly/tool-conventions/pull/236
2024-11-04 16:12:57 -08:00
Sam Clegg
22b7b84860
[lld][WebAssembly] Report undefined symbols in -shared/-pie builds (#75242)
Previously we would ignore all undefined symbols when using
`-shared` or `-pie`. All undefined symbols would be treated as imports
regardless of whether those symbols we defined in any shared library.
With this change we now track symbol in shared libraries and report
undefined symbols in the main program by default.
The old behavior is still available via the
`--unresolved-symbols=import-dynamic` command line flag.

This rationale for allowing this type of breaking change is that `-pie`
and `-shared` are both still experimental will warn as such, unless
`--experimental-pic` is passed.

As part of this change the linker now models shared library symbols
via new SharedFunctionSymbol and SharedDataSymbol types.

I've also added a new `--no-shlib-sigcheck` option that bypassed the
checking of functions signature in shared libraries. This is
specifically required by emscripten the case where the imports/exports
of shared libraries have been modified by via JS type legalization (this
is only needed when targeting old JS engines where bigint is not yet
available                                         

See https://github.com/emscripten-core/emscripten/issues/18198
2024-07-12 13:26:52 -07:00
Heejin Ahn
c179d50fd3
[WebAssembly] Add exnref type (#93586)
This adds (back) the exnref type restored in the new EH proposal adopted
in Oct 2023 CG meeting:

https://github.com/WebAssembly/exception-handling/blob/main/proposals/exception-handling/Exceptions.md:x
2024-05-28 16:10:11 -07:00
Derek Schuff
2eaeae7e9a
[Object][Wasm] Use offset instead of index for Global address and store size (#81781)
Currently the address reported by binutils for a global is its index;
but its offset (in the file or section) is more useful for binary size
attribution.
This PR treats globals similarly to functions, and tracks their offset
and size. It also centralizes the logic differentiating linked from object
and dylib files (where section addresses are 0).
2024-02-15 09:36:44 -08:00
Derek Schuff
01706e7677
[llvm-nm][WebAssembly] Print function symbol sizes (#81315)
nm already prints sizes for data symbols. Do that for function symbols 
too, and update objdump to also print size information.

Implements item 3 from https://github.com/llvm/llvm-project/issues/76107
2024-02-09 14:22:47 -08:00
Sam Clegg
c429f48b56
[Object][WebAssembly] Improve error on invalid relocation (#81203)
See https://github.com/emscripten-core/emscripten/issues/21140
2024-02-08 15:20:37 -08:00
Derek Schuff
5818572789
[Object][Wasm] Generate symbol info from name section names (#81063)
Currently symbol info is generated from a linking section or from export
names. This PR generates symbols in a WasmObjectFile from the name
section as well, which allows tools like objdump and nm to show useful
information for more linked binaries. There are some limitations:
most notably that we don't assume any particular ABI, so we don't get
detailed information about data symbols if the segments are merged
(which is the default).

Covers most of the desired functionality from #76107
2024-02-08 13:20:47 -08:00
Derek Schuff
8b0f47bfa4
[Object][Wasm] Use file offset for section addresses in linked wasm files (#80529)
Wasm has no unified virtual memory space as other object formats and
architectures do, so previously WasmObjectFile reported 0 for all
section addresses, and until 428cf71ff used section offsets for function
symbols. Now we use file offsets for function symbols, and this change
switches section addresses to do the same (in linked files). The main
result of this is that objdump now reports VMAs in section listings, and
also uses file offets rather than section offsets when disassembling
linked binaries (matching the behavior of other disassemblers and stack
traces produced by browwsers). To make this work, this PR also updates
objdump's generation of synthetics fallback symbols to match lib/Object
and also correctly plumbs symbol types for regular and dummy symbols
through to the backend to avoid needing special knowledge of address 0.

This also paves the way for generating symbols from name sections rather
than symbol tables or imports (see #76107) by allowing the
disassembler's synthetic fallback symbols match the name-section
generated symbols (in a followup PR).
2024-02-07 11:51:19 -08:00
Derek Schuff
ef1f999e13
[Object][Wasm] Move WasmSymbolInfo directly into WasmSymbol (NFC) (#80219)
Move the WasmSymbolInfos from their own vector on the WasmLinkingData
directly into the WasmSymbol object. Removing the const-ref to an
external object allows the vector of WasmSymbols to be safely
expanded/reallocated; generating symbol info from the name section will
require this, as the numbers of function and data segment names are
stored separately.

This is a step toward generating symbol information from name sections
for #76107
2024-02-02 10:44:52 -08:00
Derek Schuff
7f409cd82b
[Object][Wasm] Allow parsing of GC types in type and table sections (#79235)
This change allows a WasmObjectFile to be created from a wasm file even 
if it uses typed funcrefs and GC types. It does not significantly change how 
lib/Object models its various internal types (e.g. WasmSignature,
WasmElemSegment), so LLVM does not really "support" or understand such
files, but it is sufficient to parse the type, global and element sections, discarding
types that are not understood. This is useful for low-level binary tools such as
nm and objcopy, which use only limited aspects of the binary (such as function
definitions) or deal with sections as opaque blobs.

This is done by allowing `WasmValType` to have a value of `OTHERREF`
(representing any unmodeled reference type), and adding a field to
`WasmSignature` indicating it's a placeholder for an unmodeled reference 
type (since there is a 1:1 correspondence between WasmSignature objects
and types in the type section).
Then the object file parsers for the type and element sections are expanded
to parse encoded reference types and discard any unmodeled fields.
2024-01-25 09:48:38 -08:00
Derek Schuff
103fa3250c
[WebAssembly] Use ValType instead of integer types to model wasm tables (#78012)
LLVM models some features found in the binary format with raw integers
and others with nested or enumerated types. This PR switches modeling of
tables and segments to use wasm::ValType rather than uint32_t. This NFC
change is in preparation for modeling more reference types, but IMO is
also cleaner and closer to the spec.
2024-01-17 11:29:19 -08:00
Derek Schuff
428cf71ffa Reland "[WebAssembly][Object]Use file offset as function symbol address for linked files (#76198)"
WebAssembly doesn't have a single virtual memory space the way other object
formats or architectures do, so "addresses" mean different things depending
on the context.
Function symbol addresses in object files are offsets from the start of the code
section. This is good for linking and relocation. However when dealing with
linked binaries, offsets from the start of the file/module are more often
used (e.g. for stack traces in browsers), and are more useful for use
cases like binary size attribution. This PR changes Object to use
the file offset instead of the section offset for function symbols, but
only for linked (non-DSO) files.

This is a reland of fc5f51cf with a fix for the MSan failure (it was not caused
by this change, but it was revealed by the new tests).
2024-01-03 15:39:48 -08:00
Mitch Phillips
665d1a0eb4 Revert "[WebAssembly][Object]Use file offset as function symbol address for linked files (#76198)"
This reverts commit fc5f51cf5af4364b38bf22e491d46e1e892ade0c.

Reason: Broke the sanitizer buildbot -
https://lab.llvm.org/buildbot/#/builders/5/builds/39751/steps/12/logs/stdio
2024-01-03 11:23:10 +01:00
Derek Schuff
fc5f51cf5a
[WebAssembly][Object]Use file offset as function symbol address for linked files (#76198)
WebAssembly doesn't have a single virtual memory space the way other
object formats or architectures do, so "addresses" mean different things
depending on the context.
Function symbol addresses in object files are offsets from the start of
the code section. This is good for linking and relocation. However when
dealing with linked binaries, offsets from the start of the file/module
are more often used (e.g. for stack traces in browsers), and are more
useful for use cases like binary size attribution. This PR changes
Object to use the file offset instead of the section offset for function
symbols, but only for linked (non-DSO) files.

This implements item number 4 from #76107
2024-01-02 14:54:54 -08:00
DavidKorczynski
e8b6fa5f30
[WebAssembly] Add bounds check in parseCodeSection (#76407)
This is needed as otherwise `Ctx.Ptr` will be incremented to a position
outside it's available buffer, which is being used to read values e.g.
966d564e43/llvm/lib/Object/WasmObjectFile.cpp (L1469)

Fixes: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=28856

Signed-off-by: David Korczynski <david@adalogics.com>
2023-12-26 13:32:13 -08:00
Derek Schuff
35a5df2de6
[WebAssembly][Object] Record section start offsets at start of payload (#76188)
LLVM ObjectFile currently records the start offsets of sections as the
start of the section header, whereas most other tools (WABT, emscripten,
wasm-tools) record it as the start of the section content, after the
header. This affects binutils tools such as objdump and nm, but not
compilation/assembly (since that is driven by symbols and assembler
labels which already have their values inside the section payload rather
in the header. This patch updates LLVM to match the other tools.
2023-12-21 14:16:37 -08:00
Sam Clegg
4e8cb01b01
[WebAssembly] Add symbol information for shared libraries (#75238)
The current (experimental) spec for WebAssembly shared libraries does
not include a full symbol table like the object format. This change
extracts symbol information from the normal wasm exports.

This is the first step in having the linker report undefined symbols
when linking with shared libraries. The current behaviour is to ignore
all undefined symbols when linking with `-pie` or `-shared`.

See https://github.com/emscripten-core/emscripten/issues/18198
2023-12-20 11:13:09 -08:00
Kazu Hirata
586ecdf205
[llvm] Use StringRef::{starts,ends}_with (NFC) (#74956)
This patch replaces uses of StringRef::{starts,ends}with with
StringRef::{starts,ends}_with for consistency with
std::{string,string_view}::{starts,ends}_with in C++20.

I'm planning to deprecate and eventually remove
StringRef::{starts,ends}with.
2023-12-11 21:01:36 -08:00
Sam Clegg
afe957ea95
[WebAssembly] Allow absolute symbols in the linking section (symbol table) (#67493)
Fixes a crash in `-Wl,-emit-relocs` where the linker was not able to
write linker-synthetic absolute symbols to the symbol table.

This change adds a new symbol flag (`WASM_SYMBOL_ABS`), which means that
the symbol's offset is absolute and not relative to a given segment.
Such symbols include `__stack_low` and `__stack_low`.

Note that wasm object files never contains such symbols, only binaries
linked with `-Wl,-emit-relocs`.

Fixes: #67111
2023-10-03 13:16:16 -07:00
Sam Clegg
79cf24e211 [llvm-nm][WebAssembly] Report the size of data symbols
Fixes: https://github.com/llvm/llvm-project/issues/58839

Differential Revision: https://reviews.llvm.org/D158799
2023-08-25 16:45:50 -07:00
Derek Schuff
1b21067cf2 [WebAssembly][Objcopy] Write output section headers identically to inputs
Previously when objcopy generated section headers, it padded the LEB
that encodes the section size out to 5 bytes, matching the behavior of
clang. This is correct, but results in a binary that differs from the
input. This can sometimes have undesirable consequences (e.g. breaking
source maps).

This change makes the object reader remember the size of the LEB
encoding in the section header, so that llvm-objcopy can reproduce it
exactly. For sections not read from an object file (e.g. that
llvm-objcopy is adding itself), pad to 5 bytes.

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D155535
2023-07-27 15:43:51 -07:00
Brendan Dahl
220fe00a7c [WebAssembly] Support annotate clang attributes for marking functions.
Annotation attributes may be attached to a function to mark it with
custom data that will be contained in the final Wasm file. The
annotation causes a custom section named
"func_attr.annotate.<name>.<arg0>.<arg1>..." to be created that will
contain each function's index value that was marked with the annotation.

A new patchable relocation type for function indexes had to be created so
the custom section could be updated during linking.

Reviewed By: sbc100

Differential Revision: https://reviews.llvm.org/D150803
2023-07-11 15:17:26 -07:00
Job Noorman
8de9f2b558 Move SubtargetFeature.h from MC to TargetParser
SubtargetFeature.h is currently part of MC while it doesn't depend on
anything in MC. Since some LLVM components might have the need to work
with target features without necessarily needing MC, it might be
worthwhile to move SubtargetFeature.h to a different location. This will
reduce the dependencies of said components.

Note that I choose TargetParser as the destination because that's where
Triple lives and SubtargetFeatures feels related to that.

This issues came up during a JITLink review (D149522). JITLink would
like to avoid a dependency on MC while still needing to store target
features.

Reviewed By: MaskRay, arsenm

Differential Revision: https://reviews.llvm.org/D150549
2023-06-26 11:20:08 +02:00
Sam Clegg
d65ed8cde0 [lld][WebAssembly] Fix handling of mixed strong and weak references
When adding a undefined symbols to the symbol table, if the existing
reference is weak replace the symbol flags with (potentially) non-weak
binding.

Fixes: https://github.com/llvm/llvm-project/issues/60829

Differential Revision: https://reviews.llvm.org/D144747
2023-02-27 14:20:01 -08:00
Archibald Elliott
62c7f035b4 [NFC][TargetParser] Remove llvm/ADT/Triple.h
I also ran `git clang-format` to get the headers in the right order for
the new location, which has changed the order of other headers in two
files.
2023-02-07 12:39:46 +00:00
Elena Lepilkina
537cdf92c4 [llvm-objdump][RISCV] Use new common method to parse ARCH RISCV attribute
Differential Revision: https://reviews.llvm.org/D139553
2023-01-16 16:57:55 +03:00
Guillaume Chatelet
3e5f54d6d7 Revert D139098 "[Alignment] Use Align for ObjectFile::getSectionAlignment"
This breaks lld.

This reverts commit 10c47465e2505ddfee4e62a2ab2e535abea3ec56.
2022-12-09 09:45:04 +00:00
Guillaume Chatelet
10c47465e2 [Alignment] Use Align for ObjectFile::getSectionAlignment
Differential Revision: https://reviews.llvm.org/D139098
2022-12-09 09:34:43 +00:00
Dan Gohman
9f049e9993 [lld][WebAssemby] Allow import module names to be empty strings.
The component-model [canonical ABI] is currently using import names with
empty strings. Remove the special cases for empty strings from
WasmObjectFile.cpp so that they can pass through as-is.

[canonical ABI]: https://github.com/WebAssembly/component-model/blob/main/design/mvp/CanonicalABI.md

Differential Revision: https://reviews.llvm.org/D133037
2022-08-31 15:30:15 -07:00
Kazu Hirata
7094ab4ee7 [llvm] Modernize bool literals (NFC)
Identified with modernize-use-bool-literals.
2022-07-17 18:08:51 -07:00
Derek Schuff
5a082d9c1c [WebAssembly][Object] Remove requirement that objects must have code sections
When parsing name and linking sections, we currently require that the object
must have a code section (it seems that this was intended to verify section
ordering). However it can be useful for binaries to have their code sections
stripped out (e.g. if we just want the debug info). In that case we need
the rest of the known sections (so e.g. we know how many functions there
are, to verify the name section) but not the actual code.

I've removed the restriction completely. I think this is OK because the
section-parsing code already checks function and global indices in many
places for validity and will return appropriate errors if the relevant sections
are missing. Also we can't just replace the requirement of seeing a code section
with a requirement  that we see a function or global section, because a binary
may just not have any functions or globals.
But there's only an problem if the name or linking section tries to name a
nonexistent function.

Part of a fix for https://github.com/emscripten-core/emscripten/issues/13084

Differential Revision: https://reviews.llvm.org/D128094
2022-06-23 13:56:17 -07:00
Kazu Hirata
7a47ee51a1 [llvm] Don't use Optional::getValue (NFC) 2022-06-20 22:45:45 -07:00
Derek Schuff
2ae385e560 [WebAssembly] Add WASM_SEC_LAST_KNOWN to BinaryFormat section types list [NFC]
There are 3 places where we were using WASM_SEC_TAG as the "last" known
section type, which requires updating (or leaves a bug) when a new known
section type is added. Instead add a "last type" to the enum for this
purpose.

Differential Revision: https://reviews.llvm.org/D127164
2022-06-07 12:05:23 -07:00
Derek Schuff
a205f2904d [WebAssembly] Consolidate sectionTypeToString in BinaryFormat [NFC]
Currently there are 2 duplicate implementation, and I want to add
a use in a 3rd place. Combine them in lib/BinaryFormat so they can
be shared.

Also update toString for symbol and reloc types to use StringRef

Differential Revision: https://reviews.llvm.org/D126553
2022-05-27 09:26:36 -07:00
Sam Clegg
9b27fbd19c [WebAssembly] Fix asan issue from https://reviews.llvm.org/D121349 2022-03-15 11:36:56 -07:00
Sam Clegg
2481adb59c [WebAssembly] Fix asan issue from https://reviews.llvm.org/D121349 2022-03-14 19:57:50 -07:00
Sam Clegg
9504ab32b7 [WebAssembly] Second phase of implemented extended const proposal
This change continues to lay the ground work for supporting extended
const expressions in the linker.

The included test covers object file reading and writing and the YAML
representation.

Differential Revision: https://reviews.llvm.org/D121349
2022-03-14 08:55:47 -07:00
serge-sans-paille
e72c195fdc Cleanup LLVMObject headers
Most notably,

llvm/Object/Binary.h no longer includes llvm/Support/MemoryBuffer.h
llvm/Object/MachOUniversal*.h no longer include llvm/Object/Archive.h
llvm/Object/TapiUniversal.h no longer includes llvm/Object/TapiFile.h

llvm-project preprocessed size:
before: 1068185081
after:  1068324320

Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D119457
2022-02-10 21:13:44 +01:00
Quinn Pham
c71fbdd87b [NFC] Inclusive language: Remove instances of master in URLs
[NFC] This patch fixes URLs containing "master". Old URLs were either broken or
redirecting to the new URL.

Reviewed By: #libc, ldionne, mehdi_amini

Differential Revision: https://reviews.llvm.org/D113186
2021-11-05 08:48:41 -05:00
Sam Clegg
659a08399a [WebAssembly] Add import info to dylink section of shared libraries
See https://github.com/WebAssembly/tool-conventions/pull/175

Differential Revision: https://reviews.llvm.org/D111345
2021-10-15 11:49:16 -07:00
Heejin Ahn
9261ee32dc [WebAssembly] Make EH work with dynamic linking
This makes Wasm EH work with dynamic linking. So far we were only able
to handle destructors, which do not use any tags or LSDA info.

1. This uses `TargetExternalSymbol` for `GCC_except_tableN` symbols,
   which points to the address of per-function LSDA info. It is more
   convenient to use than `MCSymbol` because it can take additional
   target flags.

2. When lowering `wasm_lsda` intrinsic, if PIC is enabled, make the
   symbol relative to `__memory_base` and generate the `add` node. If
   PIC is disabled, continue to use the absolute address.

3. Make tag symbols (`__cpp_exception` and `__c_longjmp`) undefined in
   the backend, because it is hard to make it work with dynamic
   linking's loading order. Instead, we make all tag symbols undefined
   in the LLVM backend and import it from JS.

4. Add support for undefined tags to the linker.

Companion patches:
- https://github.com/WebAssembly/binaryen/pull/4223
- https://github.com/emscripten-core/emscripten/pull/15266

Reviewed By: sbc100

Differential Revision: https://reviews.llvm.org/D111388
2021-10-12 23:28:27 -07:00
Heejin Ahn
3ec1760d91 [WebAssembly] Remove WasmTagType
This removes `WasmTagType`. `WasmTagType` contained an attribute and a
signature index:
```
struct WasmTagType {
  uint8_t Attribute;
  uint32_t SigIndex;
};
```

Currently the attribute field is not used and reserved for future use,
and always 0. And that this class contains `SigIndex` as its property is
a little weird in the place, because the tag type's signature index is
not an inherent property of a tag but rather a reference to another
section that changes after linking. This makes tag handling in the
linker also weird that tag-related methods are taking both `WasmTagType`
and `WasmSignature` even though `WasmTagType` contains a signature
index. This is because the signature index changes in linking so it
doesn't have any info at this point. This instead moves `SigIndex` to
`struct WasmTag` itself, as we did for `struct WasmFunction` in D111104.

In this CL, in lib/MC and lib/Object, this now treats tag types in the
same way as function types. Also in YAML, this removes `struct Tag`,
because now it only contains the tag index. Also tags set `SigIndex` in
`WasmImport` union, as functions do.

I think this makes things simpler and makes tag handling more in line
with function handling. These two shares similar properties in that both
of them have signatures, but they are kind of nominal so having the same
signature doesn't mean they are the same element.

Also a drive-by fix: the reserved 'attirubute' part's encoding changed
from uleb32 to uint8 a while ago. This was fixed in lib/MC and
lib/Object but not in YAML. This doesn't change object files because the
field's value is always 0 and its encoding is the same for the both
encoding.

This is effectively NFC; I didn't mark it as such just because it
changed YAML test results.

Reviewed By: sbc100, tlively

Differential Revision: https://reviews.llvm.org/D111086
2021-10-05 17:11:22 -07:00