Fixes#85578, a use-after-free caused by some `MCSymbolWasm` data being
freed too early.
Previously, `WebAssemblyAsmParser` owned the data that is moved to
`MCContext` by this PR, which caused problems when handling module ASM,
because the ASM parser was destroyed after parsing the module ASM, but
the symbols persisted.
The added test passes locally with an LLVM build with AddressSanitizer
enabled.
Implementation notes:
* I've called the added method
<code>allocate<b><i>Generic</i></b>String</code> and added the second
paragraph of its documentation to maybe guide people a bit on when to
use this method (based on my (limited) understanding of the `MCContext`
class). We could also just call it `allocateString` and remove that
second paragraph.
* The added `createWasmSignature` method does not support taking the
return and parameter types as arguments: Specifying them afterwards is
barely any longer and prevents them from being accidentally specified in
the wrong order.
* This removes a _"TODO: Do the uniquing of Signatures here instead of
ObjectFileWriter?"_ since the field it's attached to is also removed.
Let me know if you think that TODO should be preserved somewhere.
If you write a 32- and a 64-bit LDR instruction that both refer to the
same constant or symbol using the = syntax:
```
ldr w0, =something
ldr x1, =something
```
then the first call to `ConstantPool::addEntry` will insert the constant
into its cache of existing entries, and the second one will find the
cache entry and reuse it. This results in a 64-bit load from a 32-bit
constant, reading nonsense into the other half of the target register.
In this patch I've done the simplest fix: include the size of the
constant pool entry as part of the key used to index the cache. So now
32- and 64-bit constant loads will never share a constant pool entry.
There's scope for doing this better, in principle: you could imagine
merging the two slots with appropriate overlap, so that the 32-bit load
loads the LSW of the 64-bit value. But that's much more complicated: you
have to take endianness into account, and maybe also adjust the size of
an existing entry. This is the simplest fix that restores correctness.
And fix `if (hasRelocationAddend())` to `usesRela` to properly treat
SHT_LLVM_CALL_GRAPH_PROFILE as SHT_REL. The incorrect does not cause a
problem because the synthesized SHT_LLVM_CALL_GRAPH_PROFILE has zero
addends.
For very large stack frames, the offset from the stack pointer to a local can be more than 2^31 which overflows various `int` offsets in the frame lowering code.
This patch updates the frame lowering code to calculate the offsets as 64-bit values and resolves the overflows, resulting in the correct codegen for very large frames.
Fixes#48911
This change introduces a version 3 of the PSV data that includes support
for the name of the entry function as an offset into StringTable data to
a null-terminated utf-8 string.
Additional tests were added to ensure that the new value was properly
serialized/deserialized from object data.
Fixes#80175
---------
Co-authored-by: Cooper Partin <coopp@ntdev.microsoft.com>
I'm planning to add HwMode support to SubRegIdxRanges for RISC-V GPR
pairs. The MC layer is currently unaware of the HwMode for registers and
I'd like to keep it that way.
This information is not used by the MC layer so I think it is safe to
move it.
This change introduces a version 3 of the PSV data that includes support
for the name of the entry function as an offset into StringTable data to
a null-terminated utf-8 string.
Additional tests were added to ensure that the new value was properly
serialized/deserialized from object data.
Fixes#80175
---------
Co-authored-by: Cooper Partin <coopp@ntdev.microsoft.com>
Dwarf 5 allows separating filenames from .debug_line into a separate
.debug_line_str section. The strings are referenced relative to the
start of the .debug_line_str section. Previously, on COFF, the
relocation information instead caused offsets to be relocated to the
base address of the COFF-File. This lead to wrong offsets in linked
COFF (PE) files which caused the debugger to be unable to find the
correct source files.
This patch fixes this problem by making the offsets relative to the
start of the .debug_line_str section instead. There should be no
changes for ELF-Files as everything seems to be working there.
A test is also added to ensure that the correct relocation entries are
emitted.
The current values for PrivateGlobalPrefix and PrivateLabelPrefix (@@
and @ respectively) are, in hindsight, poor choices for multiple
reasons:
First, there exist externally visible routines from the language
environment that begin with @@. These functions are certainly not
local/private by any means and they should not share a prefix with
private globals.
Secondly, both private globals and private labels should be handled the
same way by GOFF, so it doesn't make much sense for them to have
separate prefixes. GOFF remains the only file format where these are
different and there is no reason for that to be the case
This PR:
* adds Lifetime intrinsics/instructions
* fixes how the binary header is emitted (correct version and better
approximation of Bound)
* add validation into more test cases
The convention is for such MC-specific options to reside in
MCTargetOptions. However, CompressDebugSections/RelaxELFRelocations do
not follow the convention: `CompressDebugSections` is defined in both
TargetOptions and MCAsmInfo and there is forwarding complexity.
Move the option to MCTargetOptions and hereby simplify the code. Rename
the misleading RelaxELFRelocations to X86RelaxRelocations. llvm-mc
-relax-relocations and llc -x86-relax-relocations can now be unified.
The ppa2list section isn't really part of the ppa2 section. The ppa2list
section contains the offset to the ppa2, and must be created with a
special section name (specifically, C_@@QPPA2). The binder searches for
a section with this name, then uses this value to locate the ppa2.
In GOFF terms, these are entirely separate sections; the PPA2 section
isn't even really a section but rather belongs to the code section. On
the other hand, the ppa2list section is a section in its own right and
resides in a separate TXT record.
A symbol with an `N_ALT_ENTRY` attribute may be defined in the middle of
a subsection, so it is reasonable to opt them out of the
`.cfi_{start,end}proc` nesting check.
Fixes: https://github.com/llvm/llvm-project/issues/82261
Linux kernel fs/binfmt_elf_fdpic.c supports FDPIC for MMU-less systems.
GCC/binutils/qemu support FDPIC ABI for ARM
(https://github.com/mickael-guene/fdpic_doc).
_ARM FDPIC Toolchain and ABI_ provides a summary.
This patch implements FDPIC relocations to the integrated assembler.
There are 6 static relocations and 2 dynamic relocations, with
R_ARM_FUNCDESC as both static and dynamic.
gas requires `--fdpic` to assemble data relocations like `.word f(FUNCDESC)`.
This patch adds `MCTargetOptions::FDPIC` and reports an error if FDPIC
is not set.
Pull Request: https://github.com/llvm/llvm-project/pull/82187
In WebAssembly, we have `WASM_SYMBOL_NO_STRIP` symbol flag to mark the
referenced content as retained. However, the flag is not enough to
express retained data that is not referenced by any symbol. This patch
adds a new segment flag`WASM_SEG_FLAG_RETAIN` to support "private"
linkage data that is retained by llvm.used.
This kind of data that is not referenced but must be retained is usually
used with encapsulation symbols (__start/__stop). Swift runtime uses
this technique and depends on the fact "all metadata sections in live
objects are retained", which was not guaranteed with `--gc-sections`
before this patch.
This is a revised version of https://reviews.llvm.org/D126950 (has been
reverted) based on @MaskRay's comments
C_FILE symbols. To match the behavior of the assembler and the legacy
compiler, this includes using the generic ".file" name for the C_FILE
symbol and generating the actual file name in an auxiliary entry.
This patch adds support for common and local symbols in the TOC for AIX.
Note that we need to update isVirtualSection so as a common symbol in
TOC will have the symbol type XTY_CM and will be initialized when placed
in the TOC so sections with this type are no longer virtual.
---------
Co-authored-by: Zaara Syeda <syzaara@ca.ibm.com>
This change allows a WasmObjectFile to be created from a wasm file even
if it uses typed funcrefs and GC types. It does not significantly change how
lib/Object models its various internal types (e.g. WasmSignature,
WasmElemSegment), so LLVM does not really "support" or understand such
files, but it is sufficient to parse the type, global and element sections, discarding
types that are not understood. This is useful for low-level binary tools such as
nm and objcopy, which use only limited aspects of the binary (such as function
definitions) or deal with sections as opaque blobs.
This is done by allowing `WasmValType` to have a value of `OTHERREF`
(representing any unmodeled reference type), and adding a field to
`WasmSignature` indicating it's a placeholder for an unmodeled reference
type (since there is a 1:1 correspondence between WasmSignature objects
and types in the type section).
Then the object file parsers for the type and element sections are expanded
to parse encoded reference types and discard any unmodeled fields.
Named '.amdhsa_code_object_version'. This directive sets the
e_ident[ABIVERSION] in the ELF header, and should be used as the assumed
COV for the rest of the asm file.
This commit also weakens the --amdhsa-code-object-version CL flag.
Previously, the CL flag took precedence over the IR flag. Now the IR
flag/asm directive take precedence over the CL flag. This is implemented
by merging a few COV-checking functions in AMDGPUBaseInfo.h.
Fix the bug where merge-fdata unconditionally outputs boltedcollection
line, regardless of whether input files have it set.
Test Plan:
Added bolt/test/X86/merge-fdata-nobat-mode.test which fails without this
fix.
LLVM models some features found in the binary format with raw integers
and others with nested or enumerated types. This PR switches modeling of
tables and segments to use wasm::ValType rather than uint32_t. This NFC
change is in preparation for modeling more reference types, but IMO is
also cleaner and closer to the spec.
* Placing 'G' before 'M' (SHF_MERGE) can be misleading as the sh_entsize
argument goes before the section group name, if a reader doesn't know
that the order of extra arguments is not affected by the order of flags.
* 'a', 'w', and 'x' indicate basic permission-related flags. Separating
them with 'G' is kinda ugly.
Simplify code and move 'G' after 'o'. The new output is more similar to
GCC.
When both SHF_LINK_ORDER | SHF_GROUP flags are set, GNU assembler from
2.35 onwards (https://sourceware.org/PR25381https://sourceware.org/binutils/docs/as/Section.html) parses the
SHF_LINK_ORDER argument before section group name, different from us.
This is unfortunate, but does not matter because the `.section` flag `o`
is a niche feature only used by compiler instrumentations, not adopted
by hand-written assembly, and using both flags is extremely rare. Let's
just match GNU assembler. There is another benefit: we now support
zero-flag section group with the SHF_LINK_ORDER flag, while previously
there isn't a syntax.
While here, print 'G' after 'o' to be clear that the 'G' argument is
parsed after the 'o' argument. To make the diff smaller, we don't print
'G' after 'w' in the absence of 'o' for now.
The shouldInsertExtraNopBytesForCodeAlign() need STI to check whether
relax is enabled or not. It is initialized when call setEmitNops. The
setEmitNops may not be called in a section which has instructions but is
not executable. In this case uninitialized STI will cause problems.
Thus, check hasEmitNops before call it.
Fixes:
https://github.com/llvm/llvm-project/pull/76552#issuecomment-1878952480
1, Follow RISCV 1df5ea29 to support generates relocs for .uleb128 which
can not be folded. Unlike RISCV, the located content of LoongArch should
be zero. LoongArch fixup uleb128 value by in-place addition and
subtraction reloc types named R_LARCH_{ADD,SUB}_ULEB128. The located
content can affect the result and R_LARCH_ADD_ULEB128 has enough info to
represent the first symbol value, so it needs to be set to zero.
2, Force relocs if sym is not in section so that it can emit relocs for
external symbol.
Fixes:
https://github.com/llvm/llvm-project/pull/72960#issuecomment-1866844679
Due to delayed decision for ADD/SUB relocations, RISCV and LoongArch may
go slow fragment walk path with available layout. When RISCV (or
LoongArch in the future) don't need insert nops, that means relax is
disabled. With available layout and not needing insert nops, the size of
AlignFragment should be a constant. So we can add it to Displacement for
folding A-B.
For instructions that don't map to a mnemonic string, the implementation
of MCInstPrinter::getMnemonic would return an invalid pointer due to the
result of the calculation of the instruction's position in the `AsmStrs`
table. This patch fixes the issue by ensuring those cases return a
`nullptr` value instead.
Fixes#74177.
This fixes#73109.
In instruction `addl %eax %rax`, because there is a missing comma in the
middle of two registers, the asm parser will treat it as a binary
expression.
```
%rax % rax --> register mod identifier
```
However, In `MCExpr::evaluateAsRelocatableImpl`, it only checks the left
side of the expression. This patch ensures the right side will also be
checked.
Refer to RISCV [1], LoongArch also need delayed decision for ADD/SUB
relocations. In handleAddSubRelocations, just return directly if SecA !=
SecB, handleFixup usually will finish the the rest of creating PCRel
relocations works. Otherwise we emit relocs depends on whether
relaxation is enabled. If not, we return true and avoid record ADD/SUB
relocations.
Now the two symbols separated by alignment directive will return without
folding symbol offset in AttemptToFoldSymbolOffsetDifference, which has
the same effect when relaxation is enabled.
[1] https://reviews.llvm.org/D155357
This patch replaces uses of StringRef::{starts,ends}with with
StringRef::{starts,ends}_with for consistency with
std::{string,string_view}::{starts,ends}_with in C++20.
I'm planning to deprecate and eventually remove
StringRef::{starts,ends}with.