GCC 13.3.0 -O3 miscompiles the `getImmFixupKind` after commit
c20379198c7fb66b9514d21ae1e07b0705e3e6fa (-O2 is good), leading to spurious
unreachable failure.
```
% ninja -C /tmp/out/custom-gcc-13 llc && /tmp/out/custom-gcc-13/bin/llc llvm/test/CodeGen/X86/2008-08-06-RewriterBug.ll -mtriple=i686
ninja: Entering directory `/tmp/out/custom-gcc-13'
ninja: no work to do.
Unknown immediate size
UNREACHABLE executed at /home/ray/llvm/llvm/lib/Target/X86/MCTargetDesc/X86BaseInfo.h:904!
```
The latest releases/gcc-13 branch contains the fix
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109934#c6 , resolving this
miscompile.
`Desc.TSFlags` works around the bug.
We will unify the generic fixup kinds FK_Data_ and FK_PCRel_. A
FK_PCRel_ kind is essentially the corresponding FK_Data_ fixup with the
PCRel flag set.
Rename these relocation specifier constants, aligning with the naming
convention used by other targets (`S_` instead of `VK_`).
Move constants to X86MCAsmInfo.h, with the goal of eventually removing
X86MCExpr.h.
Similar to #144633 for AArch64.
Remove FK_PCRel_* kinds from the generic fixup list, as they are not
generic like FK_Data_*. In getRelocType, FK_PCRel_* can be replaced with
FK_Data_* by leveraging the IsPCRel argument. Their inclusion in the
generic kind list caused confusion for PowerPC, RISCV, and VE targets.
The X86/M68k uses can be implemented as target-specific fixups.
Move target-specific members outside of MCSymbolRefExpr::VariantKind
(a legacy interface I am eliminating). Most changes are mechanic,
except:
* ELFObjectWriter::shouldRelocateWithSymbol
* The legacy generic code uses `ELFObjectWriter::fixSymbolsInTLSFixups`
to set `STT_TLS` (and use an unnecessary expression walk). The better
way is to do this in `getRelocType`, which I have done for
AArch64, PowerPC, and RISC-V.
In the future, we should encode expressions with a relocation specifier
as X86MCExpr and use MCValue::RefKind to hold the specifier of the
relocatable expression.
https://maskray.me/blog/2025-03-16-relocation-generation-in-assemblers
While here, rename "Modifier' to "Specifier":
> "Relocation modifier", though concise, suggests adjustments happen during the linker's relocation step rather than the assembler's expression evaluation. I landed on "relocation specifier" as the winner. It's clear, aligns with Arm and IBM’s usage, and fits the assembler's role seamlessly.
Pull Request: https://github.com/llvm/llvm-project/pull/132149
1. Fix the offset for R_X86_64_CODE_6_GOTTPOFF fixup, which is
introduced by #117277. It should be biased with the size of the
immediate field. Related tests are updated.
2. Rename reloc_riprel_6byte_relax to reloc_riprel_4byte_relax_evex as
the number of bytes represents the size of fixup, and "evex" suffix is added
as it's used for APX NDD/NF instructions with EVEX prefix.
3. Remove incorrectly setting R_X86_64_CODE_6_GOTTPOFF relocation type
for APX NDD/NF instructions with GOTPCREL symbol reference modifier.
For
add %reg1, name@GOTTPOFF(%rip), %reg2
add name@GOTTPOFF(%rip), %reg1, %reg2
{nf} add %reg1, name@GOTTPOFF(%rip), %reg2
{nf} add name@GOTTPOFF(%rip), %reg1, %reg2
{nf} add name@GOTTPOFF(%rip), %reg
add
`R_X86_64_CODE_6_GOTTPOFF` = 50
if the instruction starts at 6 bytes before the relocation offset. It's
similar to R_X86_64_GOTTPOFF.
Linker can treat `R_X86_64_CODE_6_GOTTPOFF` as `R_X86_64_GOTTPOFF` or
convert the instructions above to
add $name@tpoff, %reg1, %reg2
add $name@tpoff, %reg1, %reg2
{nf} add $name@tpoff, %reg1, %reg2
{nf} add $name@tpoff, %reg1, %reg2
{nf} add $name@tpoff, %reg
if the first byte of the instruction at the relocation `offset - 6` is
`0xd5` (namely, encoded w/REX2 prefix) when possible.
Binutils patch:
5bc71c2a6b
Binutils mailthread:
https://sourceware.org/pipermail/binutils/2024-February/132351.html
ABI discussion:
https://groups.google.com/g/x86-64-abi/c/FhEZjCtDLFw/m/VHDjN4orAgAJ
Blog: https://kanrobert.github.io/rfc/All-about-APX-relocation
For
lea name@tlsdesc(%rip), %reg
add
R_X86_64_CODE_4_GOTPC32_TLSDESC = 45
if the instruction starts at 4 bytes before the relocation offset. This
should be used if reg is one of the additional general-purpose
registers, r16-r31, in Intel APX. It is similar to
R_X86_64_GOTPC32_TLSDESC and linker optimization must take the different
instruction encoding into account.
Linker can convert the instructions with R_X86_64_CODE_4_GOTPC32_TLSDESC
to
mov $name@tpoff, %reg
if the first byte of the instruction at the relocation offset - 4 is
0xd5 (namely, encoded w/REX2 prefix) when possible.
Binutils patch:
a533c8df59
Binutils mailthread:
https://sourceware.org/pipermail/binutils/2023-December/131463.html
ABI discussion:
https://groups.google.com/g/x86-64-abi/c/ACwD-UQXVDs/m/vrgTenKyFwAJ
Blog: https://kanrobert.github.io/rfc/All-about-APX-relocation
Rename R_X86_64_REX2_GOTPCRELX to R_X86_64_CODE_4_GOTPCRELX, to align
with GCC/binutils and ABI.
GCC/binutils:
3d5a60de52
and
4a54cb0658
ABI:
357de358ba
For
mov name@GOTPCREL(%rip), %reg
test %reg, name@GOTPCREL(%rip)
binop name@GOTPCREL(%rip), %reg
where binop is one of adc, add, and, cmp, or, sbb, sub, xor
instructions, add
`R_X86_64_REX2_GOTPCRELX`/`R_X86_64_CODE_4_GOTPCRELX` = 43
if the instruction starts at 4 bytes before the relocation offset. It
similar to R_X86_64_GOTPCRELX.
Linker can treat `R_X86_64_REX2_GOTPCRELX`/`R_X86_64_CODE_4_GOTPCRELX`
as `R_X86_64_GOTPCREL` or convert the above instructions to
lea name(%rip), %reg
mov $name, %reg
test $name, %reg
binop $name, %reg
if the first byte of the instruction at the relocation `offset - 4` is
`0xd5` (namely, encoded w/ REX2 prefix) when possible.
Binutils patch:
3d5a60de52
Binutils mailthread:
https://sourceware.org/pipermail/binutils/2023-December/131462.html
ABI discussion: https://groups.google.com/g/x86-64-abi/c/KbzaNHRB6QU
Blog: https://kanrobert.github.io/rfc/All-about-APX-relocation
We can convert the MCRegister to bool instead. I think this should
allows us to remove MCRegister::operator==(int). All other comparisons
in tree are unsigned.
For very large stack frames, the offset from the stack pointer to a local can be more than 2^31 which overflows various `int` offsets in the frame lowering code.
This patch updates the frame lowering code to calculate the offsets as 64-bit values and resolves the overflows, resulting in the correct codegen for very large frames.
Fixes#48911
APX assembly syntax recommendations:
https://cdrdv2.intel.com/v1/dl/getContent/817241
NOTE:
The change in llvm/tools/llvm-exegesis/lib/X86/Target.cpp is for test
LLVM ::
tools/llvm-exegesis/X86/latency/latency-SETCCr-cond-codes-sweep.s
For `SETcc`, llvm-exegesis would randomly choose 1 other instruction to
test with `SETcc`, after selecting the instruction, llvm-exegesis would
check if the operand is initialized and valid, if not
`randomizeTargetMCOperand` would choose a value for invalid operand, it
misses support for condition code operand, which cause the flaky failure
after `CCMP` supported.
llvm-exegesis can choose `CCMP` without specifying ccmp feature b/c it
use `MCSubtarget` and only16/32/64 bit is considered.
llvm-exegesis doesn't choose other instructions b/c requirement in
`hasAliasingRegistersThrough`: the instruction should use GPR (defined
by `SETcc`) and define `EFLAGS` (used by `SETcc`).
The instruction-size limit of 15 bytes still applies to APX
instructions.
Note that it is possible for an EVEX-encoded legacy instruction to reach
the 15-byte instruction length limit: 4
bytes of EVEX prefix + 1 byte of opcode + 1 byte of ModRM + 1 byte of
SIB + 4 bytes of displacement + 4 bytes of immediate = 15 bytes in
total, e.g.
```
addq $184, -96, %rax # encoding:
[0x62,0xf4,0xfc,0x18,0x81,0x04,0x25,0xa0,0xff,0xff,0xff,0xb8,0x00,0x00,0x00]
```
If we added a segment prefix like fs, the length would be 16.
In such a case, no additional (ASIZE or segment override) prefix can be
used.
To help users find this issue earlier, especially for assembler users,
we change
the internal compiler error to error in this patch.
Diagnostic is aligned with GAS
https://sourceware.org/bugzilla/show_bug.cgi?id=31323
PUSH2 and POP2 are two new instructions for (respectively)
pushing/popping 2 GPRs at a time to/from
the stack. The opcodes of PUSH2 and POP2 are those of “PUSH r/m” and
“POP r/m” from legacy map 0, but we
require ModRM.Mod = 3 in order to disallow memory operand.
The 1-bit Push-Pop Acceleration hint described in #73092 applies to
PUSH2/POP2 too, then we have PUSH2P/POP2P.
For AT&T syntax, PUSH2[P] pushes the registers from right to left onto
the stack. POP2[P] pops the stack to registers from right to left. Intel
syntax has the opposite order - from left to right.
The assembly syntax is aligned with GCC & binutils
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/637718.html
JMPABS is a 64-bit only ISA extension, and acts as a near-direct branch
with an absolute target. The 64-bit immediate operand is treated an as
absolute effective address, which is subject to canonicality checks. It
is in legacy map 0 and requires REX2 prefix with `REX2.M0=0` and
`REX2.W=0`. All other REX2 payload bits are ignored.
blog: https://kanrobert.github.io/rfc/All-about-APX-JMPABS/
This patch
1. Extends `ExplicitVEXPrefix` to `ExplicitOpPrefix` for instrcutions
requires explicit `REX2` or `EVEX`
2. Adds `ATTR_REX2` and `IC_64BIT_REX2` to put `JMPABS` , `MOV EAX,
moffs32` in different tables to avoid opcode conflict
NOTE:
1. `ExplicitREX2Prefix` can be reused by the following PUSHP/POPP
instructions.
2. `ExplicitEVEXPrefix` will be used by the instructions promoted to
EVEX space for EGPR.
This is an alternative of D157485 and a pre-feature to support AVX10.
AVX10 Architecture Specification: https://cdrdv2.intel.com/v1/dl/getContent/784267
AVX10 Technical Paper: https://cdrdv2.intel.com/v1/dl/getContent/784343
RFC: https://discourse.llvm.org/t/rfc-design-for-avx10-feature-support/72661
Based on the feedbacks from LLVM and GCC community, we have agreed to
start from supporting `-m[no-]evex512` on existing AVX512 features.
The option `-mno-evex512` can be used with `-mavx512xxx` to build
binaries that can run on both legacy AVX512 targets and AVX10-256.
There're still arguments about what's the expected behavior when this
option as well as `-mavx512xxx` used together with `-mavx10.1-256`. We
decided to defer the support of `-mavx10.1` after we made consensus.
Or furthermore, we start from supporting AVX10.2 and not providing any
AVX10.1 options.
Reviewed By: RKSimon, skan
Differential Revision: https://reviews.llvm.org/D159250
This is an alternative of D157485 and a pre-feature to support AVX10.
AVX10 Architecture Specification: https://cdrdv2.intel.com/v1/dl/getContent/784267
AVX10 Technical Paper: https://cdrdv2.intel.com/v1/dl/getContent/784343
RFC: https://discourse.llvm.org/t/rfc-design-for-avx10-feature-support/72661
Based on the feedbacks from LLVM and GCC community, we have agreed to
start from supporting `-m[no-]evex512` on existing AVX512 features.
The option `-mno-evex512` can be used with `-mavx512xxx` to build
binaries that can run on both legacy AVX512 targets and AVX10-256.
There're still arguments about what's the expected behavior when this
option as well as `-mavx512xxx` used together with `-mavx10.1-256`. We
decided to defer the support of `-mavx10.1` after we made consensus.
Or furthermore, we start from supporting AVX10.2 and not providing any
AVX10.1 options.
Reviewed By: RKSimon, skan
Differential Revision: https://reviews.llvm.org/D159250
Depends on D145791
Storing instruction bytes directly in a SmallVector instead of a
raw_ostream yields better encoding performance (in some applications,
the improvment is ~1% of the complete back-end time).
Reviewed By: MaskRay, Amir
Differential Revision: https://reviews.llvm.org/D145792
1. Add a variable `HasRegOp` to record if the instruction has a register operand
2. Enumerate all the formats with a register operand in the switch
2. Add a default (unreachable) label in the switch (suggested by @reames)
Reviewed By: pengfei
Differential Revision: https://reviews.llvm.org/D144776