282 Commits

Author SHA1 Message Date
Fangrui Song
570e09047c MCSymbolWasm: Remove classof
The object file format specific derived classes are used in context
where the type is statically known. We don't use isa/dyn_cast and we
want to eliminate MCSymbol::Kind in the base class.
2025-08-03 17:28:33 -07:00
Fangrui Song
c2faf6a57f MCSectionWasm: Remove classof
The object file format specific derived classes are used in context like
MCStreamer and MCObjectTargetWriter where the type is statically known.
We don't use isa/dyn_cast and we want to eliminate
MCSection::SectionVariant in the base class.
2025-07-26 09:30:02 -07:00
Fangrui Song
2ba5e0ad17
MC: Encode FT_Align in fragment's variable-size tail
Follow-up to #148544

Pull Request: https://github.com/llvm/llvm-project/pull/149030
2025-07-20 00:46:51 -07:00
Fangrui Song
52f56edccf WasmObjectWriter: Simplify fragment walk in .init_array
and reduce the reliance on the FT_Align/FT_Data layout,
which will be changed by #149030
2025-07-19 13:24:24 -07:00
Fangrui Song
dc3a4c0fcf
MC: Restructure MCFragment as a fixed part and a variable tail
Refactor the fragment representation of `push rax; jmp foo; nop; jmp foo`,
previously encoded as
`MCDataFragment(push rax); MCRelaxableFragment(jmp foo); MCDataFragment(nop); MCRelaxableFragment(jmp foo)`,

to

```
MCFragment(fixed: push rax, variable: jmp foo)
MCFragment(fixed: nop, variable: jmp foo)
```

Changes:

* Eliminate MCEncodedFragment, moving content and fixup storage to MCFragment.
* The new MCFragment contains a fixed-size content (similar to previous
  MCDataFragment) and an optional variable-size tail.
* The variable-size tail supports FT_Relaxable, FT_LEB, FT_Dwarf, and
  FT_DwarfFrame, with plans to extend to other fragment types.
  dyn_cast/isa should be avoided for the converted fragment subclasses.
* In `setVarFixups`, source fixup offsets are relative to the variable part's start.
  Stored fixup (in `FixupStorage`) offsets are relative to the fixed part's start.
  A lot of code does `getFragmentOffset(Frag) + Fixup.getOffset()`,
  expecting the fixup offset to be relative to the fixed part's start.
* HexagonAsmBackend::fixupNeedsRelaxationAdvanced needs to know the
  associated instruction for a fixup. We have to add a `const MCFragment &` parameter.
* In MCObjectStreamer, extend `absoluteSymbolDiff` to apply to
  FT_Relaxable as otherwise there would be many more FT_DwarfFrame
  fragments in -g compilations.
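
The offset convention for `setVarFixups` can be illustrated with a small sketch. This is not LLVM's actual `MCFragment`; `SketchFragment`, `SketchFixup`, and `fixupAddress` are hypothetical names, and the byte storage is simplified to two vectors. It only shows the rebasing step: source offsets are relative to the variable tail, stored offsets are relative to the fixed part.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for illustration; these are NOT the real LLVM classes.
struct SketchFixup {
  std::size_t Offset; // byte offset at which the fixup applies
  int Kind;           // opaque fixup kind
};

struct SketchFragment {
  std::vector<std::uint8_t> Fixed;    // fixed-size content, e.g. `push rax`
  std::vector<std::uint8_t> Variable; // optional variable-size tail, e.g. `jmp foo`
  std::vector<SketchFixup> VarFixups; // stored relative to the fixed part's start

  // Source offsets are relative to the start of the variable part. They are
  // rebased so that stored offsets are relative to the start of the fixed part.
  void setVarFixups(const std::vector<SketchFixup> &Source) {
    VarFixups.clear();
    for (SketchFixup F : Source) {
      assert(F.Offset <= Variable.size() && "fixup outside the variable tail");
      F.Offset += Fixed.size();
      VarFixups.push_back(F);
    }
  }
};

// Mirrors the common `getFragmentOffset(Frag) + Fixup.getOffset()` pattern:
// callers can add the fragment offset without knowing where the tail begins.
inline std::size_t fixupAddress(std::size_t FragmentOffset,
                                const SketchFixup &F) {
  return FragmentOffset + F.Offset;
}
```

With a one-byte fixed part and a fixup at offset 1 in the tail, the stored offset becomes 2, so a fragment placed at offset 16 yields address 18.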

https://llvm-compile-time-tracker.com/compare.php?from=28e1473e8e523150914e8c7ea50b44fb0d2a8d65&to=778d68ad1d48e7f111ea853dd249912c601bee89&stat=instructions:u

```
stage2-O0-g instructions:u geomean (-0.07%)
stage1-ReleaseLTO-g (link only) max-rss geomean (-0.39%)
```

```
% /t/clang-old -g -c sqlite3.i -w -mllvm -debug-only=mc-dump &| awk '/^[0-9]+/{s[$2]++;tot++} END{print "Total",tot; n=asorti(s, si); for(i=1;i<=n;i++) print si[i],s[si[i]]}'
Total 59675
Align 2215
Data 29700
Dwarf 12044
DwarfCallFrame 4216
Fill 92
LEB 12
Relaxable 11396
% /t/clang-new -g -c sqlite3.i -w -mllvm -debug-only=mc-dump &| awk '/^[0-9]+/{s[$2]++;tot++} END{print "Total",tot; n=asorti(s, si); for(i=1;i<=n;i++) print si[i],s[si[i]]}'
Total 32287
Align 2215
Data 2312
Dwarf 12044
DwarfCallFrame 4216
Fill 92
LEB 12
Relaxable 11396
```

Pull Request: https://github.com/llvm/llvm-project/pull/148544
2025-07-15 21:56:55 -07:00
Fangrui Song
1fbfa333f6 MCAlignFragment: Rename fields and use uint8_t FillLen
* Rename the vague `Value` to `Fill`.
* FillLen is at most 8. Make the field smaller to facilitate encoding
  MCAlignFragment as a MCFragment union member.
* Replace an unreachable report_fatal_error with assert.
2025-07-13 14:07:10 -07:00
Fangrui Song
244e053b6c MC: Remove llvm/MC/MCFixupKindInfo.h
The file used to define `MCFixupKindInfo`, a simple structure,
which is now in MCAsmBackend.h.
2025-07-05 11:24:11 -07:00
Fangrui Song
b478c38c19 MCAsmBackend: Replace FKF_IsPCRel with isPCRel() 2025-07-03 00:51:20 -07:00
Fangrui Song
95756e67c2 MC: Rework .weakref
Use a variable symbol without any specifier instead of VK_WEAKREF.
Add code in ELFObjectWriter::executePostLayoutBinding to check
whether the target should be made an undefined weak symbol.

This change fixes several issues:

* Unreferenced `.weakref alias, target` no longer creates an undefined `target`.
* When `alias` is already defined, report an error instead of crashing.

.weakref is specific to ELF. llvm-ml has reused the VK_WEAKREF name for
a different concept. wasm incorrectly copied the ELF implementation.
Remove it.
2025-05-25 21:09:55 -07:00
Fangrui Song
7ff0cf6138 MCObjectWriter: Remove the MCAssembler argument from writeObject 2025-05-24 12:55:52 -07:00
Fangrui Song
7d71a35658 MCFixup: Remove FK_PCRel_ from getKindForSize
Remove FK_PCRel_* kinds from the generic fixup list, as they are not
generic like FK_Data_*. In getRelocType, FK_PCRel_* can be replaced with
FK_Data_* by leveraging the IsPCRel argument. Their inclusion in the
generic kind list caused confusion for PowerPC, RISCV, and VE targets.
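
As a rough illustration of folding the PC-relative distinction into the `IsPCRel` argument, the sketch below selects a relocation from a data-sized fixup kind plus a flag. The enum and function names (`SketchFixupKind`, `SketchReloc`, `getRelocTypeSketch`) are hypothetical and do not correspond to any target's real `getRelocType`.

```cpp
#include <cassert>

// Hypothetical kinds for illustration; real targets define their own enums.
enum class SketchFixupKind { Data4, Data8 };
enum class SketchReloc { None, Abs32, Abs64, PCRel32, PCRel64 };

// A single data-sized fixup kind plus the IsPCRel flag is enough to choose
// between the absolute and PC-relative relocation, so no separate
// PC-relative fixup kind is needed in the generic list.
inline SketchReloc getRelocTypeSketch(SketchFixupKind Kind, bool IsPCRel) {
  switch (Kind) {
  case SketchFixupKind::Data4:
    return IsPCRel ? SketchReloc::PCRel32 : SketchReloc::Abs32;
  case SketchFixupKind::Data8:
    return IsPCRel ? SketchReloc::PCRel64 : SketchReloc::Abs64;
  }
  assert(false && "unhandled fixup kind");
  return SketchReloc::None;
}
```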

The X86/M68k uses can be implemented as target-specific fixups.
2025-05-24 12:02:18 -07:00
Fangrui Song
e373f7a452 MC: Simplify recordRelocation
* Remove the MCAssembler * argument. Change subclasses to use MCAssembler *MCObjectWriter::Asm.
* Remove pure specifier and add an empty implementation
* Change MCFragment * to MCFragment &
2025-05-24 09:54:03 -07:00
Fangrui Song
2849b1282e MCObjectWriter: Add getContext and simplify code 2025-05-24 09:26:30 -07:00
Fangrui Song
a8433b88fa MCObjectWriter: Add member variable MCAssembler * and simplify code 2025-05-24 00:11:32 -07:00
Fangrui Song
50428fb5e9
[WebAssembly] Add WebAssembly::Specifier
Move wasm-specific members outside of MCSymbolRefExpr::VariantKind (a
legacy interface I am eliminating). Most changes are mechanical and
similar to what I've done for many ELF targets (e.g. X86 #132149)

Notes:

* `fixSymbolsInTLSFixups` is replaced with `setTLS` in
  `WebAssemblyWasmObjectWriter::getRelocType`, similar to what I've done
  for many ELF targets.
* `SymA->setUsedInGOT()` in `recordRelocation` is moved to
  `getRelocType`.

While here, rename "Modifier" to "Specifier":

> "Relocation modifier", though concise, suggests adjustments happen during the linker's relocation step rather than the assembler's expression evaluation. I landed on "relocation specifier" as the winner. It's clear, aligns with Arm and IBM’s usage, and fits the assembler's role seamlessly.

Pull Request: https://github.com/llvm/llvm-project/pull/133116
2025-04-08 19:44:40 -07:00
Fangrui Song
e592393610 MCValue: Replace getSymSpecifier with getSpecifier
Commit 52eb11f925ddeba4e1b3840fd636ee87387f3ada temporarily introduced
getSymSpecifier to prepare for "MCValue: Replace MCSymbolRefExpr members
with MCSymbol" (d5893fc2a7e1191afdb4940469ec9371a319b114). The
refactoring is now complete.
2025-04-05 23:44:57 -07:00
Fangrui Song
086af83688 [MC] Replace getSymA()->getSymbol() with getAddSym. NFC
We will replace the MCSymbolRefExpr member in MCValue with MCSymbol.
This change reduces dependence on MCSymbolRefExpr.
2025-04-05 13:23:14 -07:00
Heejin Ahn
4d1c827423
[WebAssembly] Support parsing .lto_set_conditional (#126546)
In the split-LTO-unit mode in ThinLTO, a compilation module is split
into two and global variables that meet specific criteria are moved to
the split module.
d21fc58aee/llvm/lib/Transforms/IPO/ThinLTOBitcodeWriter.cpp (L315-L366)

If an originally local-linkage global value is defined in the original
module and referenced in the split module, or vice versa, that value is
_promoted_ by attaching a module ID to its name to prevent name clashes,
because it can now be referenced from other modules.
d21fc58aee/llvm/lib/Transforms/IPO/ThinLTOBitcodeWriter.cpp (L46-L100)

And when that promoted global value is a function, a
`.lto_set_conditional` entry is written to the original module to avoid
breaking references from inline assembly:

d21fc58aee/llvm/lib/Transforms/IPO/ThinLTOBitcodeWriter.cpp (L84-L91)

The syntax of this is, if the original function name is `symbolA` and
the module ID is `123`,
```ll
module asm ".lto_set_conditional symbolA,symbolA.123"
```
These symbols are parsed here:

648981f913/llvm/lib/MC/MCParser/AsmParser.cpp (L6467)

The first function symbol in this `.lto_set_conditional` does not exist
as a function in the bitcode anymore because it was renamed to the second.
So they are not assigned as function symbols but they are not really
data either, so the object writer crashes here:
5b9e6c7993/llvm/lib/MC/WasmObjectWriter.cpp (L1820)

This PR makes the object writer just skip those symbols.

---

This problem was discovered when I was testing with
`-fwhole-program-vtables`. We didn't have this problem before with
ThinLTO because `-fsplit-lto-unit`, which splits LTO units when
possible, defaults to false, but it defaults to true when
`-fwhole-program-vtables` is used.
2025-04-02 03:15:29 +09:00
Fangrui Song
b73e144bdf MCValue: Simplify code with getSubSym
MCValue::SymB is a MCSymbolRefExpr *, which might become MCSymbol * in
the future. Simplify some code that uses MCValue::SymB.
2025-03-23 12:13:13 -07:00
Nick Fitzgerald
6018930ef1
[lld][WebAssembly] Support for the custom-page-sizes WebAssembly proposal (#128942)
This commit adds support for WebAssembly's custom-page-sizes proposal to
`wasm-ld`. An overview of the proposal can be found
[here](https://github.com/WebAssembly/custom-page-sizes/blob/main/proposals/custom-page-sizes/Overview.md).
In a sentence, it allows customizing a Wasm memory's page size, enabling
Wasm to target environments with less than 64KiB of memory (the default
Wasm page size) available for Wasm memories.

This commit contains the following:

* Adds a `--page-size=N` CLI flag to `wasm-ld` for configuring the
linked Wasm binary's linear memory's page size.

* When the page size is configured to a non-default value, then the
final Wasm binary will use the encodings defined in the
custom-page-sizes proposal to declare the linear memory's page size.

* Defines a `__wasm_first_page_end` symbol, whose address points to the
end of the first page in the Wasm linear memory, i.e. it equals the Wasm
memory's page size. This allows writing code that is compatible with any
page size and doesn't require re-compiling its object code. At the same
time, because it lowers to a constant rather than a memory access, it
enables link-time optimization.

* Adds tests for these new features.

r? @sbc100 

cc @sunfishcode
2025-03-04 09:39:30 -08:00
Kazu Hirata
73beb153c1
[MC] Avoid repeated hash lookups (NFC) (#123698) 2025-01-21 16:22:23 +08:00
Derek Schuff
9fdc38c81c
[WebAssembly][Object] Support more elem segment flags (#123427)
Some tools (e.g. Rust tooling) produce element segment descriptors with
neither elemkind nor element type descriptors, but with init exprs
instead of func indices (this is the flags value of 4 in
https://webassembly.github.io/spec/core/binary/modules.html#element-section).
LLVM doesn't fully model reference types or the various ways to
initialize element segments, but we do want to correctly parse and skip
over all type sections, so this change updates the object parser to
handle that case, and refactors for more clarity.

The test file is updated to include one additional elem segment with a
flags value of 4, an initializer value of (i32.const 0) and an empty
vector.

Also support parsing files that export imported (undefined) functions.
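
For orientation, the flag bits from the element section of the core spec can be decoded as in the sketch below. The bit meanings follow the spec's binary format; the constant, struct, and function names are hypothetical and are not the ones used in LLVM's object parser.

```cpp
#include <cassert>
#include <cstdint>

// Bit meanings follow the WebAssembly binary format's element section.
// All names here are hypothetical, chosen for this sketch only.
constexpr std::uint32_t ElemPassiveOrDeclarative = 0x1;
constexpr std::uint32_t ElemExplicitIndexOrDeclarative = 0x2;
constexpr std::uint32_t ElemUsesInitExprs = 0x4;

struct ElemSegmentShape {
  bool IsActive;      // an offset expression is present
  bool HasTableIndex; // an explicit table index precedes the offset
  bool HasKindOrType; // an elemkind byte or a reference type is present
  bool UsesInitExprs; // elements are init exprs rather than function indices
};

inline ElemSegmentShape decodeElemFlags(std::uint32_t Flags) {
  assert(Flags <= 7 && "the core spec defines flag values 0 through 7");
  ElemSegmentShape S;
  S.IsActive = (Flags & ElemPassiveOrDeclarative) == 0;
  S.HasTableIndex =
      S.IsActive && (Flags & ElemExplicitIndexOrDeclarative) != 0;
  // Only flag values 0 and 4 omit the elemkind / reference type field.
  S.HasKindOrType = (Flags & (ElemPassiveOrDeclarative |
                              ElemExplicitIndexOrDeclarative)) != 0;
  S.UsesInitExprs = (Flags & ElemUsesInitExprs) != 0;
  return S;
}
```

For a flags value of 4 this yields an active segment with no table index, no elemkind or type field, and init exprs as elements, which is the case described above.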
2025-01-17 17:26:44 -08:00
Fangrui Song
2cf3d05917 [MC] Simplify WasmObjectWriter
and make it fit well with the future when MCFragment content is
out-of-line.
2024-12-21 21:59:31 -08:00
George Stagg
ed91843d43
[WebAssembly] Handle symbols in .init_array sections (#119127)
Follow on from #111008.
2024-12-10 08:28:18 -08:00
George Stagg
ac5dd455ca
[WebAssembly] Support multiple .init_array fragments when writing Wasm objects (#111008) 2024-12-04 13:12:15 -08:00
Fangrui Song
b76100e220 WasmObjectWriter: replace the MCAsmLayout parameter with MCAssembler 2024-07-01 17:18:24 -07:00
Fangrui Song
78804f891c [MC] Remove the evaluateAsAbsolute overload that takes a MCAsmLayout parameter
Continue the MCAsmLayout removal work started by 67957a45ee1ec42ae1671cdbfa0d73127346cc95.
2024-07-01 15:38:18 -07:00
Fangrui Song
dbf12b2f77 [MC] Remove MCAsmLayout::{getSymbolOffset,getBaseSymbol}
The MCAsmLayout::* forwarders added by
67957a45ee1ec42ae1671cdbfa0d73127346cc95 have all been removed.
2024-07-01 11:51:26 -07:00
Fangrui Song
a40ca78bb9 [MC] Remove MCAsmLayout::{getSectionFileSize,getSectionAddressSize} 2024-07-01 11:27:32 -07:00
Fangrui Song
6b707a8cc1 [MC] Remove the MCAsmLayout parameter from MCObjectWriter::executePostLayoutBinding 2024-07-01 10:47:46 -07:00
Fangrui Song
23e6224374 [MC] Remove the MCAsmLayout parameter from MCObjectWriter::{writeObject,writeSectionData} 2024-07-01 10:04:59 -07:00
Fangrui Song
4289c422a8 [MC] Remove the MCAsmLayout parameter from MCObjectWriter::recordRelocation 2024-06-30 22:13:54 -07:00
aengelke
46beeaa394
[MC] Remove SectionKind from MCSection (#96067)
There are only three actual uses of the section kind in MCSection:
isText(), XCOFF, and WebAssembly. Store isText() in the MCSection, and
store other info in the actual section variants where required.

ELF and COFF flags also encode all relevant information, so for these
two section variants, remove the SectionKind parameter entirely.

This allows removing the string switch (which is unnecessary and
inaccurate) from createELFSectionImpl. This was introduced in
[D133456](https://reviews.llvm.org/D133456), but apparently, it was
never hit for non-writable sections anyway and the resulting kind was
never used.
2024-06-20 10:52:49 +02:00
Fangrui Song
de19f7b6d4
[MC] Replace fragment ilist with singly-linked lists
Fragments are allocated with `operator new` and stored in an ilist with
Prev/Next/Parent pointers. A more efficient representation would be an
array of fragments without the overhead of Prev/Next pointers.

As the first step, replace ilist with singly-linked lists.

* `getPrevNode` uses have been eliminated by previous changes.
* The last use of the `Prev` pointer remains: for each subsection, there is an insertion point and
  the current insertion point is stored at `CurInsertionPoint`.
* `HexagonAsmBackend::finishLayout` needs a backward iterator. Save all
  fragments within `Frags`. Hexagon programs are usually small, and the
  performance does not matter that much.

To eliminate `Prev`, change the subsection representation to
singly-linked lists for subsections and a pointer to the active
singly-linked list. The fragments from all subsections will be chained
together at layout time.
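
The subsection scheme can be pictured with the following sketch, which is only an approximation of the idea and not the code in the patch. `Frag`, `FragList`, and `chain` are hypothetical names: each subsection keeps a head and tail pointer, appends need no `Prev` link, and the lists are joined in order at layout time.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a fragment: a forward link only, no Prev pointer.
struct Frag {
  int Id;
  Frag *Next = nullptr;
};

// One singly-linked list per subsection, with a tail pointer for O(1) append.
struct FragList {
  Frag *Head = nullptr;
  Frag *Tail = nullptr;

  void append(Frag *F) {
    assert(F && !F->Next && "expected a detached fragment");
    if (Tail)
      Tail->Next = F;
    else
      Head = F;
    Tail = F;
  }
};

// At layout time, chain the per-subsection lists together in subsection
// order and return the head of the combined list. Empty lists are skipped.
inline Frag *chain(FragList *Lists, std::size_t N) {
  Frag *Head = nullptr;
  Frag *Tail = nullptr;
  for (std::size_t I = 0; I != N; ++I) {
    if (!Lists[I].Head)
      continue;
    if (Tail)
      Tail->Next = Lists[I].Head;
    else
      Head = Lists[I].Head;
    Tail = Lists[I].Tail;
  }
  return Head;
}
```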

Since fragment lists are disconnected before layout time, we can remove
`MCFragment::SubsectionNumber` (https://reviews.llvm.org/D69411). The
current implementation of `AttemptToFoldSymbolOffsetDifference` requires
future improvement for robustness.

Pull Request: https://github.com/llvm/llvm-project/pull/95077
2024-06-11 09:18:31 -07:00
Fangrui Song
cb63abca27 [MC] Remove getFragmentList uses. NFC 2024-06-10 18:27:34 -07:00
Sam Clegg
554a2fa4b2
[WebAssembly] Fix element segments in wasm64 object files (#94617)
Followup to #94487
2024-06-06 07:45:38 -07:00
Sam Clegg
c2244f8284
[WebAssembly] Set IS_64 flag correctly on __indirect_function_table in object files (#94487)
Follow up to #92042
2024-06-05 20:28:51 -07:00
Derek Schuff
7f409cd82b
[Object][Wasm] Allow parsing of GC types in type and table sections (#79235)
This change allows a WasmObjectFile to be created from a wasm file even 
if it uses typed funcrefs and GC types. It does not significantly change how 
lib/Object models its various internal types (e.g. WasmSignature,
WasmElemSegment), so LLVM does not really "support" or understand such
files, but it is sufficient to parse the type, global and element sections, discarding
types that are not understood. This is useful for low-level binary tools such as
nm and objcopy, which use only limited aspects of the binary (such as function
definitions) or deal with sections as opaque blobs.

This is done by allowing `WasmValType` to have a value of `OTHERREF`
(representing any unmodeled reference type), and adding a field to
`WasmSignature` indicating it's a placeholder for an unmodeled reference 
type (since there is a 1:1 correspondence between WasmSignature objects
and types in the type section).
Then the object file parsers for the type and element sections are expanded
to parse encoded reference types and discard any unmodeled fields.
2024-01-25 09:48:38 -08:00
Derek Schuff
103fa3250c
[WebAssembly] Use ValType instead of integer types to model wasm tables (#78012)
LLVM models some features found in the binary format with raw integers
and others with nested or enumerated types. This PR switches modeling of
tables and segments to use wasm::ValType rather than uint32_t. This NFC
change is in preparation for modeling more reference types, but IMO is
also cleaner and closer to the spec.
2024-01-17 11:29:19 -08:00
Kazu Hirata
f5f2c313ae [llvm] Use StringRef::consume_front (NFC) 2023-12-25 12:33:00 -08:00
Kazu Hirata
586ecdf205
[llvm] Use StringRef::{starts,ends}_with (NFC) (#74956)
This patch replaces uses of StringRef::{starts,ends}with with
StringRef::{starts,ends}_with for consistency with
std::{string,string_view}::{starts,ends}_with in C++20.

I'm planning to deprecate and eventually remove
StringRef::{starts,ends}with.
2023-12-11 21:01:36 -08:00
Kazu Hirata
4a0ccfa865 Use llvm::endianness::{big,little,native} (NFC)
Note that llvm::support::endianness has been renamed to
llvm::endianness while becoming an enum class as opposed to an
enum. This patch replaces support::{big,little,native} with
llvm::endianness::{big,little,native}.
2023-10-12 21:21:45 -07:00
Fangrui Song
111fcb0df0 [llvm] Fix duplicate word typos. NFC
Those fixes were taken from https://reviews.llvm.org/D137338
2023-09-01 18:25:16 -07:00
Brendan Dahl
220fe00a7c [WebAssembly] Support annotate clang attributes for marking functions.
Annotation attributes may be attached to a function to mark it with
custom data that will be contained in the final Wasm file. The
annotation causes a custom section named
"func_attr.annotate.<name>.<arg0>.<arg1>..." to be created that will
contain each function's index value that was marked with the annotation.
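
The section naming pattern can be sketched as a simple join. Only the `func_attr.annotate.<name>.<arg0>.<arg1>...` pattern comes from the description above; the helper name `annotateSectionName` and its signature are hypothetical, and how the real implementation renders each argument is not shown here.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Builds "func_attr.annotate.<name>.<arg0>.<arg1>..." from an annotation
// name and its already-stringified arguments. Hypothetical helper.
inline std::string annotateSectionName(const std::string &Name,
                                       const std::vector<std::string> &Args) {
  assert(!Name.empty() && "annotation name expected");
  std::string Section = "func_attr.annotate." + Name;
  for (const std::string &Arg : Args) {
    Section += '.';
    Section += Arg;
  }
  return Section;
}
```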

A new patchable relocation type for function indexes had to be created so
the custom section could be updated during linking.

Reviewed By: sbc100

Differential Revision: https://reviews.llvm.org/D150803
2023-07-11 15:17:26 -07:00
Akshay Khadse
65d4d62ab7 Fix uninitialized pointer members in MC
Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D148421
2023-04-18 17:45:58 +08:00
Guillaume Chatelet
e647b4f519 [reland][Alignment][NFC] Use the Align type in MCSection
Differential Revision: https://reviews.llvm.org/D138653
2022-11-24 13:19:18 +00:00
Guillaume Chatelet
3467f9c7d6 Revert D138653 "[Alignment][NFC] Use the Align type in MCSection"
This breaks the bolt project.
This reverts commit 409f0dc4a420db1c6b259d5ae965a070c169d930.
2022-11-24 12:42:30 +00:00
Guillaume Chatelet
409f0dc4a4 [Alignment][NFC] Use the Align type in MCSection
Differential Revision: https://reviews.llvm.org/D138653
2022-11-24 12:32:58 +00:00
Sam Clegg
c5c4ba37b1 [WebAssembly][MC] Avoid the need for .size directives for functions
Warn if `.size` is specified for a function symbol.  The size of a
function symbol is determined solely by its content.

I noticed this simplification was possible while debugging #57427, but
this change doesn't fix that specific issue.

Differential Revision: https://reviews.llvm.org/D132929
2022-08-31 14:28:56 -07:00
Kazu Hirata
129b531c9c [llvm] Use value_or instead of getValueOr (NFC) 2022-06-18 23:07:11 -07:00