870 Commits

Author SHA1 Message Date
Fangrui Song
57b0843f68 MCSymbol: Remove isUnset
The isUnset state lacks significance and should be treated as equivalent
to an undefined symbol.

Equated and common symbols seem to have subtle semantic distinctions in
GAS, justifying the use of distinct `SymContents*` values.

TODO: Ensure common symbols have a fragment, so that `isDefined()`
returns true, rejecting `.comm c, 4, 4; .set c, 4`
2025-08-03 21:31:11 -07:00
Fangrui Song
b77f51f3f1 MCSymbolXOFF: Remove classof
The object file format-specific derived classes are used in contexts
where the type is statically known. We don't use isa/dyn_cast and we
want to eliminate MCSymbol::Kind in the base class.
2025-08-03 18:49:36 -07:00
Fangrui Song
e9de1ee9f5 MC: Move useCodeAlign from MCSection to MCAsmInfo
To centralize assembly-related virtual functions to MCAsmInfo and move
toward making MCSection non-virtual.
2025-07-26 11:34:10 -07:00
Sterling-Augustine
29e8599aa9
Reapply "Support SFrame command-line and .cfi_section syntax (#150316) (#150509)
This reverts commit ad36e4284d66c3609ef8675ef02ff1844bc1951d, fixing a
single uninitialized bit (which cannot be detected with Address
Sanitizer).

This PR adds support for the llvm-mc command-line flag "--gsframe" and
adds ".sframe" to the legal values passed to ".cfi_sections". It plumbs the
option through a fair amount of the CFI handling code. Code to support
actual section generation follows in a future PR.

These options match the GNU assembler's supported syntax for SFrame, on
both the command line and in assembly files.
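As a rough, hypothetical illustration of what accepting `.sframe` means when parsing `.cfi_sections`, the standalone C++ sketch below records which unwind sections were requested and rejects unknown names. It is not llvm-mc's parser: `CfiSections` and `parseCfiSectionList` are invented names, and the sketch only knows the three section names shown.

```cpp
#include <cassert>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical record of which unwind sections `.cfi_sections` requested.
struct CfiSections {
  bool EHFrame = false;
  bool DebugFrame = false;
  bool SFrame = false;
};

// Strip leading and trailing spaces/tabs from one list entry.
static std::string trim(const std::string &S) {
  std::string::size_type B = S.find_first_not_of(" \t");
  if (B == std::string::npos)
    return "";
  std::string::size_type E = S.find_last_not_of(" \t");
  return S.substr(B, E - B + 1);
}

// Parse a comma-separated list such as ".eh_frame, .sframe".
// Returns std::nullopt if an entry is not a recognized section name.
std::optional<CfiSections> parseCfiSectionList(const std::string &Operands) {
  CfiSections Result;
  std::stringstream SS(Operands);
  std::string Item;
  while (std::getline(SS, Item, ',')) {
    const std::string Name = trim(Item);
    if (Name == ".eh_frame")
      Result.EHFrame = true;
    else if (Name == ".debug_frame")
      Result.DebugFrame = true;
    else if (Name == ".sframe") // the newly accepted value
      Result.SFrame = true;
    else
      return std::nullopt;
  }
  return Result;
}
```

In this model, adding a format is one more branch; the flags would then be consulted later when sections are actually emitted.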

First in a series of changes that will allow llvm-mc to produce sframe
.cfi sections. For more information about sframes, see
https://sourceware.org/binutils/docs-2.44/sframe-spec.html

and the llvm-RFC here:
https://discourse.llvm.org/t/rfc-adding-sframe-support-to-llvm/86900
2025-07-24 14:09:41 -07:00
Sterling-Augustine
ad36e4284d
Revert "Support SFrame command-line and .cfi_section syntax (#149935)" (#150316)
This reverts commit f9d0bd02d966e5c28aca9a6ceadd5ffec6aa9f78.
2025-07-23 14:32:14 -07:00
Sterling-Augustine
f9d0bd02d9
Support SFrame command-line and .cfi_section syntax (#149935)
This PR adds support for the llvm-mc command-line flag "--gsframe" and
adds ".sframe" to the legal values passed to ".cfi_sections". It plumbs the
option through a fair amount of the CFI handling code. Code to support
actual section generation follows in a future PR.

These options match the GNU assembler's supported syntax for SFrame, on
both the command line and in assembly files.

First in a series of changes that will allow llvm-mc to produce sframe
.cfi sections. For more information about sframes, see
https://sourceware.org/binutils/docs-2.44/sframe-spec.html

and the llvm-RFC here:
https://discourse.llvm.org/t/rfc-adding-sframe-support-to-llvm/86900
2025-07-23 10:43:38 -07:00
Fangrui Song
673e5422ea
MC: Fix fragment-in-BSS check
* Handle non-zero fill values for `.fill` and `.org` directives.
* Restore the fragment type check
  (5ee34ff1e5cc952116f0da943ddaeb1a71db2940 removed a reachable
  `llvm_unreachable`) to detect unintended API usage.

Remove virtual functions `getVirtualSectionKind` (added in
https://reviews.llvm.org/D78138) as they are unnecessary in diagnostics.
The a.out object file format has the BSS concept, which has been
inherited by COFF, XCOFF, Mach-O, and ELF object file formats.
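A hypothetical model of the check: a BSS section occupies no file space, so a fragment placed there is acceptable only if everything it would emit is zero. `FragmentKind`, `Fragment`, and `isValidInBss` are illustrative names rather than LLVM's classes, and treating an all-zero data fragment as acceptable is an assumption of this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical fragment kinds relevant to the check.
enum class FragmentKind { Fill, Org, Data };

struct Fragment {
  FragmentKind Kind;
  uint64_t FillValue = 0;        // pattern used by .fill / .org padding
  std::vector<uint8_t> Contents; // explicit bytes of a data fragment
};

// BSS has no file contents, so a fragment there may only produce zeros.
bool isValidInBss(const Fragment &F) {
  switch (F.Kind) {
  case FragmentKind::Fill:
  case FragmentKind::Org:
    return F.FillValue == 0; // a non-zero fill value is an error
  case FragmentKind::Data:
    return std::all_of(F.Contents.begin(), F.Contents.end(),
                       [](uint8_t B) { return B == 0; });
  }
  return false; // not reached with the kinds above
}
```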

Pull Request: https://github.com/llvm/llvm-project/pull/149721
2025-07-20 11:11:21 -07:00
Fangrui Song
6201761e96 MC: Rename isVirtualSection to isBssSection
The term BSS (Block Started by Symbol) is a standard, widely recognized
term, available in the a.out object file format and adopted by formats
like COFF, XCOFF, Mach-O (called S_ZEROFILL while `__bss` is also used),
and ELF. To avoid introducing unfamiliar terms, we should use
isBssSection instead of isVirtualSection.
2025-07-20 10:39:17 -07:00
Fangrui Song
3cb0c7f45b MC: Rework .reloc directive and fix the offset when it evaluates to a constant
* Fix `.reloc constant` to mean section_symbol+constant instead of
  .+constant. The initial .reloc support from MIPS incorrectly
  interpreted the offset.
* Delay the evaluation of the offset expression until after
  MCAssembler::layout, deleting a lot of code working with MCFragment.
* Delete many FIXMEs from https://reviews.llvm.org/D79625
* Some lld/ELF/Arch/LoongArch.cpp relaxation tests rely on .reloc .,
  R_LARCH_ALIGN generating ALIGN relocations at specific locations.
  Sort the relocations.
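A small model of the corrected interpretation, assuming symbol offsets are known once layout has finished: a bare constant is measured from the start of the section (section symbol + constant), whereas the old MIPS-derived behavior measured it from the location counter `.`. `RelocOffset`, `resolveRelocOffset`, and `oldConstantOffset` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>

// Offset operand of `.reloc`: a bare constant, or symbol + addend.
struct RelocOffset {
  std::optional<std::string> Symbol; // empty for a bare constant
  int64_t Addend = 0;
};

// Resolve the offset after layout, when symbol offsets are final.
// A bare constant is relative to the start of the section.
std::optional<uint64_t>
resolveRelocOffset(const RelocOffset &Off,
                   const std::map<std::string, uint64_t> &SymbolOffsets) {
  if (!Off.Symbol)
    return static_cast<uint64_t>(Off.Addend);
  auto It = SymbolOffsets.find(*Off.Symbol);
  if (It == SymbolOffsets.end())
    return std::nullopt; // unknown symbol: the offset cannot be resolved
  return It->second + static_cast<uint64_t>(Off.Addend);
}

// The old interpretation of a bare constant, for contrast: relative to `.`.
uint64_t oldConstantOffset(uint64_t Dot, int64_t Constant) {
  return Dot + static_cast<uint64_t>(Constant);
}
```

Under this model, a `.reloc 8, ...` written at offset 12 resolves to section offset 8, whereas the old reading would have produced 20.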
2025-07-17 00:36:11 -07:00
Fangrui Song
28e1473e8e
MC: Remove bundle alignment mode
PNaCl, which is being removed, has a Software Fault Isolation mechanism
that requires that certain instructions and groups of instructions do not
cross a bundle boundary. When `.bundle_align_mode` is in effect, each
instruction is placed in its own fragment, allowing flexible NOP
padding.
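The bundle constraint reduces to simple arithmetic. A minimal sketch, assuming a power-of-two bundle size and an instruction no larger than one bundle: if the instruction would straddle a boundary, NOP padding moves it to the next one. `bundlePadding` is a hypothetical helper, not the removed LLVM code, and it ignores the other bundle-locking modes.

```cpp
#include <cassert>
#include <cstdint>

// Bytes of NOP padding needed so that an instruction of InstSize bytes
// starting at Offset does not cross a BundleSize boundary.
// Assumes BundleSize is a power of two and InstSize <= BundleSize.
uint64_t bundlePadding(uint64_t Offset, uint64_t InstSize,
                       uint64_t BundleSize) {
  const uint64_t OffsetInBundle = Offset & (BundleSize - 1);
  if (OffsetInBundle + InstSize <= BundleSize)
    return 0; // fits in the current bundle
  return BundleSize - OffsetInBundle; // pad up to the next bundle boundary
}
```

Because the padding depends on the final offset of every instruction, each instruction needed its own fragment, which is the source of the complexity described next.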

This feature has significantly complicated our refactoring of MCStreamer
and MCFragment, leading to considerable effort spent untangling
it (including flushPendingLabels (75006466296ed4b0f845cbbec4bf77c21de43b40),
MCAssembler iteration improvement, and recent MCFragment refactoring).

* Make MCObjectStreamer::emitInstToData non-virtual and delete
  MCELFStreamer::emitInstToData
* Delete MCELFStreamer::emitValueImpl and emitValueToAlignment

Minor instructions:u decrease for both -O0 -g and -O3 builds
https://llvm-compile-time-tracker.com/compare.php?from=c06d3a7b728293cbc53ff91239d6cd87c0982ffb&to=9b078c7f228bc5b6cdbfe839f751c9407f8aec3e&stat=instructions:u

Pull Request: https://github.com/llvm/llvm-project/pull/148781
2025-07-15 19:36:19 -07:00
Fangrui Song
1fbfa333f6 MCAlignFragment: Rename fields and use uint8_t FillLen
* Rename the vague `Value` to `Fill`.
* FillLen is at most 8. Make the field smaller to facilitate encoding
  MCAlignFragment as a MCFragment union member.
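For context on why `FillLen` fits in a `uint8_t`: alignment padding repeats a fill pattern of at most 8 bytes (one 64-bit `Fill` value) across the gap. The hypothetical sketch below models that computation; the function names and the little-endian pattern order are assumptions, not taken from `MCAlignFragment`.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Bytes needed to advance Offset to the next multiple of Alignment
// (Alignment must be a power of two).
uint64_t paddingFor(uint64_t Offset, uint64_t Alignment) {
  return (Alignment - (Offset & (Alignment - 1))) & (Alignment - 1);
}

// Produce alignment padding by repeating a FillLen-byte pattern taken
// from Fill in little-endian order (an assumption of this sketch).
// Returns std::nullopt if the padding is not a whole number of patterns.
std::optional<std::vector<uint8_t>>
emitAlignPadding(uint64_t Offset, uint64_t Alignment, uint64_t Fill,
                 uint8_t FillLen) {
  assert(FillLen >= 1 && FillLen <= 8 && "fill pattern is at most 8 bytes");
  const uint64_t Count = paddingFor(Offset, Alignment);
  if (Count % FillLen != 0)
    return std::nullopt;
  std::vector<uint8_t> Out;
  for (uint64_t I = 0; I < Count; ++I)
    Out.push_back(static_cast<uint8_t>(Fill >> (8 * (I % FillLen))));
  return Out;
}
```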
* Replace an unreachable report_fatal_error with assert.
2025-07-13 14:07:10 -07:00
Fangrui Song
7b517cf743 MCParser: Add SMLoc to expressions
The information helps debugging, and will be used and tested when we
change MCFixup::getLoc to use the MCExpr location and remove
MCFixup::Loc.
2025-07-04 10:24:37 -07:00
Fangrui Song
7e3e2e1b8c MCParser: Add SMLoc to expressions
The information will be used when we change MCFixup::getLoc to use the
MCExpr location, making MCFixup smaller.
2025-07-04 00:58:07 -07:00
Fangrui Song
e878b7e349 MCParsedAsmOperand::print: Add MCAsmInfo parameter
so that subclasses can provide the appropriate MCAsmInfo to print
MCExpr objects.

At present, llvm/utils/TableGen/AsmMatcherEmitter.cpp constructs a
generic MCAsmInfo.
2025-06-28 12:05:33 -07:00
Fangrui Song
c73906ec69 MCParser: Reduce VK_None uses 2025-06-27 22:02:22 -07:00
Fangrui Song
5aa3e6baa0 MC: Reduce MCSymbolRefExpr::VK_None uses 2025-06-27 21:46:36 -07:00
Fangrui Song
97a32f2ad9 MC: Add MCSpecifierExpr to unify target MCExprs
Many targets define MCTargetExpr subclasses just to encode an expression
with a relocation specifier. Create a generic MCSpecifierExpr to be
inherited instead. Migrate M68k and SPARC as examples.
2025-06-07 11:33:40 -07:00
Fangrui Song
9ff30d4f1c MCParse: Disallow @ specifier in symbol equating
Relocation specifiers are attached to an instruction and cannot be used
in equating. GAS rejects `a = b@plt`. For now, handle just
MCSymbolRefExpr.
2025-06-01 15:20:16 -07:00
Fangrui Song
e015626f18 MC: Allow .set to reassign non-MCConstantExpr expressions
GNU Assembler supports symbol reassignment via .set, .equ, or =.
However, LLVM's integrated assembler only allows reassignment for
MCConstantExpr cases, as it struggles with scenarios like:

```
.data
.set x, 0
.long x         // reference the first instance
x = .-.data
.long x         // reference the second instance
.set x,.-.data
.long x         // reference the third instance
```

Between two assignments, we cannot ensure that a reference binds
to the earlier assignment. We use MCSymbol::IsUsed and other conditions
to reject potentially unsafe reassignments, but certain MCConstantExpr
uses could be unsafe as well.

This patch enables reassignment by cloning the symbol upon reassignment
and updating the symbol table. Existing references to the original
symbol remain unchanged, and the original symbol is excluded from the
emitted symbol table.
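The cloning strategy can be modeled in a few lines. In this hypothetical sketch, a reference captures whichever symbol object the name is bound to when it is parsed, and each assignment binds the name to a new object (for simplicity, even the first one), so earlier references keep their values and only the latest binding is emitted. `Symbol`, `SymbolTable`, and `replayThreeBindings` are illustrative names, and constants stand in for the real MCExpr values.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <vector>

// A symbol whose value is modeled as a constant (real values are MCExprs).
struct Symbol {
  std::string Name;
  int64_t Value;
};

class SymbolTable {
  std::map<std::string, Symbol *> Current;         // name -> latest binding
  std::vector<std::unique_ptr<Symbol>> AllSymbols; // owns every clone

public:
  // `.set Name, Value` (or `Name = Value`): bind the name to a fresh symbol
  // object; references taken earlier keep pointing at the old one.
  void assign(const std::string &Name, int64_t Value) {
    AllSymbols.push_back(std::make_unique<Symbol>(Symbol{Name, Value}));
    Current[Name] = AllSymbols.back().get();
  }

  // A reference captures the symbol currently bound to Name.
  const Symbol *reference(const std::string &Name) const {
    auto It = Current.find(Name);
    return It == Current.end() ? nullptr : It->second;
  }

  // Only the latest binding of each name would go into the symbol table.
  std::size_t numEmitted() const { return Current.size(); }
};

// Replays the three-assignment example in this commit message with
// constants in place of the expressions; returns what each reference sees.
std::vector<int64_t> replayThreeBindings() {
  SymbolTable T;
  std::vector<const Symbol *> Refs;
  T.assign("x", 0);
  Refs.push_back(T.reference("x"));
  T.assign("x", 4);
  Refs.push_back(T.reference("x"));
  T.assign("x", 8);
  Refs.push_back(T.reference("x"));
  std::vector<int64_t> Values;
  for (const Symbol *S : Refs)
    Values.push_back(S->Value);
  return Values;
}
```

Updating a single symbol in place would make all three references observe the last value; giving each binding its own object is what keeps them distinct.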
2025-05-26 21:58:18 -07:00
Fangrui Song
76ee2d34f7 MCParser: Error when .set reassigns a non-redefinable variable
The conditions in parseAssignmentExpression are conservative. We should
also report an error when .set reassigns a non-redefinable variable
(e.g. .equiv followed by .set, or .weakref followed by .set).

Make MCAsmStreamer::emitLabel call setOffset to make the behavior
similar to MCObjectStreamer. `isUndefined()` can now be replaced with
`isUnset()`.

Additionally, fix an AMDGPU API user (tested by a few tests including
MC/AMDGPU/hsa-v4.s)
2025-05-26 20:19:52 -07:00
Fangrui Song
343428c666 MC: Detect cyclic dependency for variable symbols
We report cyclic dependency errors for variable symbols and rely on
isSymbolUsedInExpression in parseAssignmentExpression at parse time,
which does not catch all setVariableValue cases (e.g. cyclic .weakref).
Instead, add a bit to MCSymbol and check it when walking the variable
value MCExpr. When a cycle is detected once the layout is final,
report an error and set the variable to a constant to avoid duplicate
errors.
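A minimal sketch of cycle detection with a per-symbol bit, assuming a toy expression form in which a variable is either a constant or the sum of other variables (the real code walks MCExpr trees). `Var`, `InProgress`, `evaluate`, and `runCycleDemo` are hypothetical names. On a cycle, the sketch counts one error and pins the variable to a constant, so later walks do not report the same cycle again.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Toy variable: either a constant or the sum of other variables' values.
struct Var {
  int64_t Constant = 0;              // used when Operands is empty
  std::vector<std::string> Operands; // value = sum of these variables
  bool InProgress = false;           // set while this value is being walked
};

// Evaluate Name. Returns std::nullopt on a cycle or an undefined name.
std::optional<int64_t> evaluate(std::map<std::string, Var> &Vars,
                                const std::string &Name, int &NumErrors) {
  auto It = Vars.find(Name);
  if (It == Vars.end())
    return std::nullopt;
  Var &V = It->second;
  if (V.InProgress) {
    // Cycle: count one error and pin the variable to a constant.
    ++NumErrors;
    V.Operands.clear();
    V.Constant = 0;
    return std::nullopt;
  }
  if (V.Operands.empty())
    return V.Constant;
  V.InProgress = true;
  // Copy the operands: a nested walk may rewrite V when it closes a cycle.
  const std::vector<std::string> Ops = V.Operands;
  int64_t Sum = 0;
  bool Ok = true;
  for (const std::string &Op : Ops) {
    std::optional<int64_t> R = evaluate(Vars, Op, NumErrors);
    if (!R) {
      Ok = false;
      break;
    }
    Sum += *R;
  }
  V.InProgress = false;
  if (!Ok)
    return std::nullopt;
  return Sum;
}

struct CycleDemo {
  bool CycleDetected;
  int64_t DValue;
  int NumErrors;
};

// a = b, b = a (a cycle), c = 5, d = c + c.
CycleDemo runCycleDemo() {
  std::map<std::string, Var> Vars;
  Vars["a"].Operands = {"b"};
  Vars["b"].Operands = {"a"};
  Vars["c"].Constant = 5;
  Vars["d"].Operands = {"c", "c"};
  int Errors = 0;
  const bool Cycle = !evaluate(Vars, "a", Errors).has_value();
  evaluate(Vars, "b", Errors); // a is now a constant: no second error
  const int64_t D = evaluate(Vars, "d", Errors).value_or(-1);
  return {Cycle, D, Errors};
}
```

Because the bit is cleared when a walk finishes, a variable reached twice without a cycle (like `c` in `d = c + c`) is not misreported.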

isSymbolUsedInExpression is considered deprecated, but it is still used
by AMDGPU (#112251).
2025-05-26 15:08:11 -07:00
Fangrui Song
cb7d68a77b MCParser: Replace deprecated alias MCAsmLexer with AsmLexer 2025-05-25 12:08:40 -07:00
Fangrui Song
a0901a2f87 Replace #include MCAsmLexer.h with AsmLexer.h
MCAsmLexer.h has been made a forwarder header since #134207
2025-05-25 11:57:29 -07:00
Fangrui Song
97ad399c48
MCParser: Move LCurly/RCurly testing into tokenIsStartOfStatement
Commit 8a0453e23abf27433b7539b2da2060d2df9fb39c (2015) added LCurly and
RCurly cases for Hexagon instruction bundles. While gas x86 also adopted
`{` in 2017 for pseudo prefixes (see `tc_symbol_chars`), `{` remains
uncommon among targets. Move `{` and `}` parsing into the newly
introduced `tokenIsStartOfStatement` hook (#137997).
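The shape of the generalized hook can be sketched as follows. This is a simplified, hypothetical model (the `TokenKind` enum and class names are stand-ins, and the real `MCTargetAsmParser` signature may differ): the generic parser asks a single target hook whether a token may begin a statement, a BPF-like target answers yes for `*`, and a Hexagon-like target answers yes for `{` and `}`. Treating identifiers as always starting a statement is a simplification of this sketch.

```cpp
#include <cassert>

// Simplified token kinds (the real lexer has many more).
enum class TokenKind { Identifier, Star, LCurly, RCurly, Integer };

// Generic target-parser interface: by default, no extra tokens may
// begin a statement.
class TargetAsmParserBase {
public:
  virtual ~TargetAsmParserBase() = default;
  virtual bool tokenIsStartOfStatement(TokenKind) const { return false; }
};

// A BPF-like target, where a statement may begin with `*`,
// e.g. `*(u32 *)(r1 + 0) = r2`.
class StarTarget : public TargetAsmParserBase {
public:
  bool tokenIsStartOfStatement(TokenKind K) const override {
    return K == TokenKind::Star;
  }
};

// A Hexagon-like target whose instruction bundles use `{` and `}`.
class CurlyTarget : public TargetAsmParserBase {
public:
  bool tokenIsStartOfStatement(TokenKind K) const override {
    return K == TokenKind::LCurly || K == TokenKind::RCurly;
  }
};

// Generic parser: identifiers always start a statement; any other token
// does so only if the target opts in through the single hook.
bool startsStatement(const TargetAsmParserBase &T, TokenKind K) {
  return K == TokenKind::Identifier || T.tokenIsStartOfStatement(K);
}
```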

Pull Request: https://github.com/llvm/llvm-project/pull/140101
2025-05-15 18:28:23 -07:00
Jason Eckhardt
d56f23e408
[AsmParser] Replace starIsStartOfStatement with tokenIsStartOfStatement. (#137997)
Currently `MCTargetAsmParser::starIsStartOfStatement` checks for `*` at
the start of the statement. There are other (currently) downstream
back-ends that need the same treatment for other tokens. Instead of
introducing bespoke APIs for each such token, we generalize (and rename)
starIsStartOfStatement as tokenIsStartOfStatement which takes the token
of interest as an argument.

Update the BPF AsmParser (the only upstream consumer today) to use the
new version.
2025-05-07 12:36:17 -05:00
Fangrui Song
ca5b3a0f51 [MC] Remove SetUsed on isUndefined and getFragment
Due to the known limitations of .set reassignment (see
https://sourceware.org/PR288), we use diagnostics to reject patterns
that could lead to errors (ae7ac010594f693fdf7b3ab879e196428d961e75 2009-06).

This code has been refined multiple times, see:

* 9b4a824217f1fe23f83045afe7521acb791bc2d0 (2010-05) `IsUsedInExpr`
* 46c79ef1132607aead144dfda0f26aa8b065214f (2010-11) renamed `IsUsedInExpr` to `IsUsed`

The related `SetUsed` bit seems unnecessary nowadays.
2025-04-13 00:53:29 -07:00
Fangrui Song
0816c7a95d MCParser: Remove unused enum constant 2025-04-10 23:31:38 -07:00
Fangrui Song
26475f5bdd
[AArch64] Refactor @plt, @gotpcrel, and @AUTH to use parseDataExpr
Following PR #132569 (RISC-V), which added `parseDataExpr` for parsing
expressions in data directives (e.g., `.word`), this PR migrates AArch64
`@plt`, `@gotpcrel`, and `@AUTH` from the `parsePrimaryExpr` workaround
to `parseDataExpr`. The goal is to align with the GNU assembler model,
where relocation specifiers apply to the entire operand rather than
individual terms, reducing complexity (especially evident in `@AUTH`
parsing).

Note: AArch64 ELF lacks an official syntax for data directives
(#132570). A prefix notation might be a preferable future direction.
I recommend `%specifier(expr)`.

AsmParser's `@specifier` parsing is suboptimal, necessitating lexer
workarounds. `@` might appear multiple times in an operand.
We should not use `@` beyond the existing AArch64 Mach-O instruction
operands.

In the test elf-reloc-ptrauth.s, many errors are now reported at parse
time.

Pull Request: https://github.com/llvm/llvm-project/pull/134202
2025-04-08 09:09:19 -07:00
Fangrui Song
b6a9618301 [MCParser] Rename confusing variable names
https://reviews.llvm.org/D24047 added `IsAtStartOfStatement` to
MCAsmLexer, while its subclass AsmLexer had a variable of the same name.
The assignment in `UnLex` is unnecessary, which is now removed.

60b403e75cd25a0c76aaaf4e6b176923acf49443 (2019) named the result of
`parseStatement` `Parsed`. `HasError` is a clearer name.
2025-04-04 21:04:06 -07:00
Fangrui Song
b6e2df54c4 [MC] Move some member variables from AsmParser to MCAsmParser
to eliminate some virtual functions and avoid duplication
between AsmParser/MasmParser.
2025-04-02 09:59:18 -07:00
Fangrui Song
36978fadb8 [MC] Add UseAtForSpecifier
Some ELF targets don't use @ for relocation specifiers.
We should not report `error: invalid variant` when @ is used.

Attempt to make expr@specifier parsing less hacky.
2025-04-01 00:06:05 -07:00
Fangrui Song
fe6fb910df
[RISCV] Replace @plt/@gotpcrel in data directives with %pltpcrel %gotpcrel
clang -fexperimental-relative-c++-abi-vtables might generate `@plt` and
`@gotpcrel` specifiers in data directives. The syntax is not used in
human-written assembly code, and is not supported by GNU assembler.
Note: the `@plt` in `.word foo@plt` is different from
the legacy `call func@plt` (where `@plt` is simply ignored).

The `@plt` syntax was selected simply due to a quirk of AsmParser:
the syntax was supported by all targets until I updated it
to be an opt-in feature in a0671758eb6e52a758bd1b096a9b421eec60204c.

RISC-V favors the `%specifier(expr)` syntax following MIPS and Sparc,
and we should follow this convention.

This PR adds support for `.word %pltpcrel(foo+offset)` and
`.word %gotpcrel(foo)`, and drops `@plt` and `@gotpcrel`.

* MCValue::SymA can no longer have a SymbolVariant. Add an assert
  similar to that of AArch64ELFObjectWriter.cpp before
  https://reviews.llvm.org/D81446 (see my analysis at
  https://maskray.me/blog/2025-03-16-relocation-generation-in-assemblers
  if intrigued)
* `jump foo@plt, x31` now has a different diagnostic.

Pull Request: https://github.com/llvm/llvm-project/pull/132569
2025-03-29 11:08:13 -07:00
Oliver Stannard
7ada6f111f
[AsmParser] Correctly handle .ifeqs nested in other conditional directives (#132713)
The parser function used for the .ifeqs and .ifnes directives was
missing the check for whether we are currently in an ignored block of an
outer conditional directive, causing the block to be evaluated when it
should not, for example:

```
.if 0
  .ifeqs "a", "a"
    // Should not be evaluated, but is
    nop
  .endif
.endif
```
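The bug class is easy to model: every conditional directive must consult whether an enclosing block is already being skipped, regardless of the condition it evaluates itself. In this hypothetical sketch (`CondStack` and its methods are illustrative, not AsmParser's code), all directives go through one `pushCondition` helper so none of them can forget that check.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical conditional-assembly state: one entry per open .if-style
// block, recording whether that block's body is being assembled.
class CondStack {
  std::vector<bool> Active;

public:
  // True while the current position is inside a skipped block.
  bool ignoring() const { return !Active.empty() && !Active.back(); }

  // Every conditional directive goes through here, so an enclosing skipped
  // block forces the new block to be skipped whatever its condition is.
  void pushCondition(bool Cond) { Active.push_back(!ignoring() && Cond); }

  // .if EXPR
  void onIf(long long Value) { pushCondition(Value != 0); }

  // .ifeqs A, B / .ifnes A, B: the bug was evaluating these without
  // consulting ignoring(); routing them through pushCondition avoids that.
  void onIfeqs(const std::string &A, const std::string &B) {
    pushCondition(A == B);
  }
  void onIfnes(const std::string &A, const std::string &B) {
    pushCondition(A != B);
  }

  // .endif
  void onEndif() {
    if (!Active.empty())
      Active.pop_back();
  }
};

// Replays the example in this commit message; returns whether the inner
// `nop` would be assembled.
bool nopIsAssembledInExample() {
  CondStack S;
  S.onIf(0);
  S.onIfeqs("a", "a");
  const bool Assembled = !S.ignoring();
  S.onEndif();
  S.onEndif();
  return Assembled;
}
```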
2025-03-24 14:03:20 +00:00
Fangrui Song
13bb2f450e [MC] Rename some VariantKind functions to use Specifier
Use the more appropriate term "relocation specifier" and avoid the
variable name `Kind`, which conflicts with MCExpr and FixupKind.
2025-03-20 22:06:16 -07:00
Fangrui Song
7722d7519c [MC] evaluateAsRelocatableImpl: remove the Fixup argument
Follow-up to d6fbffa23c84e622735b3e880fd800985c1c0072 . This commit
updates all call sites and removes the argument from the function.
2025-03-15 16:10:19 -07:00
Fangrui Song
f120b0d6d2 [MC] Remove MCSymbolRefExpr::VK_Invalid in favor of getVariantKindForName returning std::optional
so that when the enum members are moved to XXXTargetExpr::VariantKind,
they do not need to implement an invalid value.
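Returning `std::optional` instead of a sentinel means a target's kind enumeration no longer needs an invalid member. A hypothetical sketch of such a name lookup (the `Specifier` enum, its members, and `getSpecifierForName` are illustrative; real targets' names and kinds differ):

```cpp
#include <cassert>
#include <optional>
#include <string_view>
#include <utility>

// No "Invalid" member is needed: absence is expressed by std::nullopt.
enum class Specifier { None, GOT, PLT, GOTPCREL };

std::optional<Specifier> getSpecifierForName(std::string_view Name) {
  static constexpr std::pair<std::string_view, Specifier> Table[] = {
      {"got", Specifier::GOT},
      {"plt", Specifier::PLT},
      {"gotpcrel", Specifier::GOTPCREL},
  };
  for (const auto &[Str, Kind] : Table)
    if (Str == Name)
      return Kind;
  return std::nullopt; // the caller reports the unknown name
}
```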
2025-03-11 00:21:31 -07:00
Fangrui Song
fe56c4c019 [MC] Remove unneeded VK_None argument from MCSymbolRefExpr::create. NFC 2025-03-05 23:14:04 -08:00
Fangrui Song
98a640a2fa [MC] Move VariantKind info to MCAsmInfo
Follow-up to 14951a5a3120e50084b3c5fb217e2d47992a24d1

* Unify getVariantKindName and getVariantKindForName
* Allow each target to specify the preferred case (albeit ignored in MCParser)

Note: targets that use variant kinds should call MCExpr::print with a
non-null MAI to print variant kinds. operator<< passes a nullptr to
`MCExpr::print`, which should be avoided (e.g. Hexagon; fixed in
commit cf00ac81ac049cddb80aec1d6d88b8fab4f209e8).
2025-03-02 20:36:20 -08:00
Fangrui Song
8c7c791284 [MCParser] Use getVariantKindForName and move PPC specific VariantKind to PowerPC/ 2025-03-02 16:20:59 -08:00
Fangrui Song
18e09da255 [Mips] Rework relocation expression parsing
A relocation expression might be used in an immediate operand or a
memory offset. https://reviews.llvm.org/D23110 , which intended to
generalize chained relocation operators (%hi(%neg(%gp_rel(x)))),
inappropriately introduced intrusive changes to the generic code. This
patch drops the intrusive changes and significantly simplifies the code.
The new style is similar to pre-D23110 but much cleaner.

Some weird expressions allowed by gas are not supported for simplicity,
e.g. "%lo foo", "(%lo(foo))", "%lo(foo)+1".
"(%lo(foo))", while previously parsed, is not used in practice.
"%lo(foo)+1" and "%lo(2*4)+foo" were previously parsed but would lead to
an error anyway as the expression is not relocatable
(`evaluateSymbolicAdd` does not fold the Add when the RefKinds
differ).
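Chained operators such as `%hi(%neg(%gp_rel(x)))` fit a small recursive-descent rule: a `%name(` prefix parses an inner operand and wraps it, and anything else must be a plain symbol. The sketch below assumes a simplified grammar (identifier operands only, no arithmetic, no `($reg)` base as in a memory offset) and uses hypothetical names; it is not the MIPS AsmParser code, but like the reworked parser it rejects `%lo foo`, `(%lo(foo))`, and `%lo(foo)+1`.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Relocation operators from outermost to innermost, plus the symbol.
struct RelocOperand {
  std::vector<std::string> Operators;
  std::string Symbol;
};

class RelocOperandParser {
  std::string S;
  std::size_t Pos = 0;

  // Identifier characters: letters, digits, and '_'.
  std::string parseIdent() {
    const std::size_t Start = Pos;
    while (Pos < S.size() &&
           (std::isalnum(static_cast<unsigned char>(S[Pos])) || S[Pos] == '_'))
      ++Pos;
    return S.substr(Start, Pos - Start);
  }

  // operand := '%' ident '(' operand ')' | ident
  bool parseOperand(RelocOperand &Out) {
    if (Pos < S.size() && S[Pos] == '%') {
      ++Pos;
      std::string Op = parseIdent();
      if (Op.empty() || Pos >= S.size() || S[Pos] != '(')
        return false; // e.g. "%lo foo"
      ++Pos;
      Out.Operators.push_back(Op);
      if (!parseOperand(Out) || Pos >= S.size() || S[Pos] != ')')
        return false;
      ++Pos;
      return true;
    }
    Out.Symbol = parseIdent();
    return !Out.Symbol.empty(); // e.g. "(%lo(foo))" fails here
  }

public:
  explicit RelocOperandParser(std::string Src) : S(std::move(Src)) {}

  std::optional<RelocOperand> parse() {
    RelocOperand Out;
    if (!parseOperand(Out) || Pos != S.size())
      return std::nullopt; // trailing text such as "+1" is rejected
    return Out;
  }
};

std::optional<RelocOperand> parseRelocOperand(const std::string &Src) {
  return RelocOperandParser(Src).parse();
}
```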
2025-03-02 11:27:15 -08:00
Fangrui Song
1b1dc50505 [MCParser] Improve parseIntToken error message
Add a default argument, which is more readable than existing call sites
and encourages new call sites to omit the argument.

Omit " in ... directive" since the error message already includes the line.
2025-03-01 19:37:00 -08:00
Fangrui Song
077497d180 [MCParser] Remove parseParenExprOfDepth
Introduced by http://reviews.llvm.org/D9742 as a hack that later became
unneeded.

Primary test: llvm/test/MC/Mips/memory-offsets.s
2025-03-01 16:52:45 -08:00
Fangrui Song
5e6c0853fd [MCParser] Clean up onEndOfFile
and modernize NumOfMacroInstantiations
2025-03-01 15:58:19 -08:00
Fangrui Song
e3e9c5c873 [MC] Remove unneeded onLabelParsed and onLabelParsed from HLASM
They are only used by ARM and wasm.
2025-03-01 15:24:52 -08:00
Fangrui Song
56600c11ad MCAsmInfo: replace HLASM-specific variables with IsHLASM
HLASM is very different from the gas syntax. We don't expect other
targets to customize the differences. Unify the numerous variables.
2024-12-24 18:37:46 -08:00
Aleksei Vetrov
4a5f82b43b
[MC] Fix DWARF file table for files with empty DWARF (#119572)
Update root file in DWARF file/line table as soon as we see the first
"#line" directive.

This was moved from "enabledGenDwarfForAssembly", which is called right
before we emit DWARF information. But if the file is empty or contains
expressions that don't need DWARF, it is never called, leaving the
original root file in place instead of the file named in the "#line" directive.

Add a test checking for this case.

This is a reapply of #119229 with the following fix:

"MCContext::setMCLineTableRootFile" has the effect of adding
".debug_line" section to the output, even if DWARF generation is
disabled. Add a check and a test for this case.

Fixes: #119020
Fixes: #119229
2024-12-12 08:51:52 -08:00
David Blaikie
0d59fc2761
Revert "[MC] Fix DWARF file table for files with empty DWARF (#119020)" (#119486)
Reverts llvm/llvm-project#119229

Causes debug info to be unconditionally emitted, regardless of whether
it's requested.
2024-12-10 18:14:53 -08:00
Aleksei Vetrov
5041d06730
[MC] Fix DWARF file table for files with empty DWARF (#119020) (#119229)
Update root file in DWARF file/line table as soon as we see the first
"#line" directive.

This was moved from "enabledGenDwarfForAssembly", which is called right
before we emit DWARF information. But if the file is empty or contains
expressions that don't need DWARF, it is never called, leaving the
original root file in place instead of the file named in the "#line" directive.

Add a test checking for this case.

Fixes: #119020
2024-12-10 09:29:25 -08:00
abhishek-kaushik22
d20731ce6b
[CGData][GlobalIsel][Legalizer][DAG][MC][AsmParser][X86][AMX] Use std::move to avoid copy (#118068) 2024-12-06 09:46:15 +08:00
Janek van Oirschot
bd9145c8c2
Reapply [AMDGPU] Avoid resource propagation for recursion through multiple functions (#112251)
I was wrong in the last patch. I viewed the `Visited` set purely as a possible
recursion deterrent where functions calling a callee multiple times are
handled elsewhere. This doesn't consider cases where a function is
called multiple times by different callers still part of the same call
graph. New test shows the aforementioned case.

Reapplies #111004, fixes #115562.
2024-11-15 18:40:05 +00:00