This patch adds the feature flags of LUT and SME_LUTv2, and the
assembly/disassembly
for the following instructions of NEON, SVE2 and SME2:
* NEON:
- LUT2
- LUT4
* SVE2:
- LUTI2_ZZZI
- LUTI4_ZZZI
- LUTI4_Z2ZZI
* SME:
- MOVT
- LUTI4_4ZZT2Z
- LUTI4_S_4ZZT2Z
That is according to this documentation:
https://developer.arm.com/documentation/ddi0602/2023-09
…r the assembly disassembly
This NFC patch refactors the assembly/disassembly class and multiclass
in the AArch64 backend to receive a new 2023/09 AArch64[1] ISA release.
The encoding for the 2023 instructions re-uses encoding blocks from
previous assembly/disassembly instructions.
The refactoring makes the class and multiclass for assembly/disassembly
generic so it can be used to describe the instructions for the new ISA.
[1]https://developer.arm.com/documentation/ddi0602/2023-09
This patch separates PNR registers into their own register class instead
of sharing a register class with PPR registers. This primarily allows us
to return more accurate register classes when applying assembly
constraints, but also more protection from supplying an incorrect
predicate type to an invalid register operand.
Add support for syntax highlighting assembly. The patch introduces new
RAII helper called WithMarkup that takes care of both emitting colors
and markup annotations. It makes adding markup easier and ensures colors
and annotations remain consistent.
This patch adopts the new helper in the AArch64 backend. If your backend
already uses markup annotations, adoption is as easy as using the new
MCInstPrinter::markup overload.
Differential revision: https://reviews.llvm.org/D159162
This change switches both targets from using target specific CompilerBarrier nodes to the recently introduced generic MEMBARRIER instruction.
A couple things to call out.
First, this changes the assembly comment printed. I'm not sure this matters, but if it does, we can simply drop this patch. This is a minor clean up at best.
Second, the ordering operand on the target instruction appears to be unused. We could easily add ordering to the generic instruction, but since we don't seem to have a motivating case in tree, I simply dropped the ordering when selecting to the generic instruction.
Differential Revision: https://reviews.llvm.org/D141513
This adds support for the new PM pstate system register introduced by
the v9.4-A Exception-based Event Profiling extension (FEAT_EBEP).
The new PM pstate register takes a 1-bit immediate and requires
different values to be specified for the higher bits of the Crm field.
To enable that, this patch creates an explicit separation between the
pstate system registers that take 4-bit and 1-bit immediate operands,
allowing each entry to specify the value for the 3 high bits of Crm.
This also updates other pstate registers to correctly accept 4-bit
immediates, matching their decoding specification from the Arm ARM.
These include: `PAN`, `UAO`, `DIT` and `SSBS`.
More information about this extension and the new register can be found
at:
* https://developer.arm.com/documentation/ddi0601/2022-09/AArch64-Registers/PM--PMU-Exception-Mask
Contributors:
* Lucas Prates
* Sam Elliott
Reviewed By: lenary
Differential Revision: https://reviews.llvm.org/D139925
Virtual Memory System Architecture (VMSA)
This is part of the 2022 A-Profile Architecture extensions and adds support for
the following:
- Translation Hardening Extension (FEAT_THE)
- 128-bit Page Table Descriptors (FEAT_D128)
- 56-bit Virtual Address (FEAT_LVA3)
- Support for 128-bit System Registers (FEAT_SYSREG128)
- System Instructions that can take 128-bit inputs (FEAT_SYSINSTR128)
- 128-bit Atomic Instructions (FEAT_LSE128)
- Permission Indirection Extension (FEAT_S1PIE, FEAT_S2PIE)
- Permission Overlay Extension (FEAT_S1POE, FEAT_S2POE)
- Memory Attribute Index Enhancement (FEAT_AIE)
New instructions added:
- FEAT_SYSREG128 adds MRRS and MSRR.
- FEAT_SYSINSTR128 adds the SYSP instruction and TLBIP aliases.
- FEAT_LSE128 adds LDCLRP, LDSET, and SWPP instructions.
- FEAT_THE adds the set of RCW* instructions.
Specs for individual instructions can be found here:
https://developer.arm.com/documentation/ddi0602/2022-09/Base-Instructions/
Contributors:
Keith Walker
Lucas Prates
Sam Elliott
Son Tuan Vu
Tomas Matheson
Differential Revision: https://reviews.llvm.org/D138920
This patch implements the 2022 Architecture General Data-Processing Instructions
They include:
Common Short Sequence Compression (CSSC) instructions
- scalar comparison instructions
SMAX, SMIN, UMAX, UMIN (32/64 bits) with or without immediate
- ABS (absolute), CNT (count non-zero bits), CTZ (count trailing zeroes)
- command-line options for CSSC
Associated with these instructions in the documentation is the Range Prefetch
Memory (RPRFM) instruction, which signals to the memory system that data memory
accesses from a specified range of addresses are likely to occur in the near
future. The instruction lies in hint space, and is made unconditional.
Specs for the individual instructions can be found here:
https://developer.arm.com/documentation/ddi0602/2022-09/Base-Instructions/
contributors to this patch:
- Cullen Rhodes
- Son Tuan Vu
- Mark Murray
- Tomas Matheson
- Sam Elliott
- Ties Stuij
Reviewed By: lenary
Differential Revision: https://reviews.llvm.org/D138488
This patch adds the assembly/disassembly for the following instruction:
SEL: Multi-vector conditionally select elements from two vectors
for 2 and 4 registers
Non-constiguous load with stride resgisters:
LD1B (scalar + immediate): Contiguous load of bytes to multiple strided vectors (immediate index).
(scalar + scalar): Contiguous load of bytes to multiple strided vectors (scalar index).
LD1D (scalar + immediate): Contiguous load of doublewords to multiple strided vectors (immediate index).
(scalar + scalar): Contiguous load of doublewords to multiple strided vectors (scalar index).
LD1H (scalar + immediate): Contiguous load of halfwords to multiple strided vectors (immediate index).
(scalar + scalar): Contiguous load of halfwords to multiple strided vectors (scalar index).
LD1W (scalar + immediate): Contiguous load of words to multiple strided vectors (immediate index).
(scalar + scalar): Contiguous load of words to multiple strided vectors (scalar index).
LDNT1B (scalar + immediate): Contiguous load non-temporal of bytes to multiple strided vectors (immediate index).
(scalar + scalar): Contiguous load non-temporal of bytes to multiple strided vectors (scalar index).
LDNT1D (scalar + immediate): Contiguous load non-temporal of doublewords to multiple strided vectors (immediate index).
(scalar + scalar): Contiguous load non-temporal of doublewords to multiple strided vectors (scalar index).
LDNT1H (scalar + immediate): Contiguous load non-temporal of halfwords to multiple strided vectors (immediate index).
(scalar + scalar): Contiguous load non-temporal of halfwords to multiple strided vectors (scalar index).
LDNT1W (scalar + immediate): Contiguous load non-temporal of words to multiple strided vectors (immediate index).
(scalar + scalar): Contiguous load non-temporal of words to multiple strided vectors (scalar index).
Non-constiguous store with stride resgisters:
ST1B (scalar + immediate): Contiguous store of bytes from multiple strided vectors (immediate index).
(scalar + scalar): Contiguous store of bytes from multiple strided vectors (scalar index).
ST1D (scalar + immediate): Contiguous store of doublewords from multiple strided vectors (immediate index).
(scalar + scalar): Contiguous store of doublewords from multiple strided vectors (scalar index).
ST1H (scalar + immediate): Contiguous store of halfwords from multiple strided vectors (immediate index).
(scalar + scalar): Contiguous store of halfwords from multiple strided vectors (scalar index).
ST1W (scalar + immediate): Contiguous store of words from multiple strided vectors (immediate index).
(scalar + scalar): Contiguous store of words from multiple strided vectors (scalar index).
STNT1B (scalar + immediate): Contiguous store non-temporal of bytes from multiple strided vectors (immediate index).
(scalar + scalar): Contiguous store non-temporal of bytes from multiple strided vectors (scalar index).
STNT1D (scalar + immediate): Contiguous store non-temporal of doublewords from multiple strided vectors (immediate index).
(scalar + scalar): Contiguous store non-temporal of doublewords from multiple strided vectors (scalar index).
STNT1H (scalar + immediate): Contiguous store non-temporal of halfwords from multiple strided vectors (immediate index).
(scalar + scalar): Contiguous store non-temporal of halfwords from multiple strided vectors (scalar index).
STNT1W (scalar + immediate): Contiguous store non-temporal of words from multiple strided vectors (immediate index).
(scalar + scalar): Contiguous store non-temporal of words from multiple strided vectors (scalar index).
The reference can be found here:
https://developer.arm.com/documentation/ddi0602/2022-09
This patch also adds a new SVE vector list to represent the stride loads/stores
ZPRVectorListStrided and the sets of 2 and 4 ZA registers:
ZZ_[b|h|w|d]_strided and ZZZZ_[b|h|w|d]_strided
Differential Revision: https://reviews.llvm.org/D136172
This patch adds the assembly/disassembly for the following instructions:
ZERO (ZT0): Zero ZT0.
LDR (ZT0): Load ZT0 register.
STR (ZT0): Store ZT0 register.
MOVT (scalar to ZT0): Move 8 bytes from general-purpose register to ZT0.
(ZT0 to scalar): Move 8 bytes from ZT0 to general-purpose register.
Consecutive:
LUTI2 (single): Lookup table read with 2-bit indexes.
(two registers): Lookup table read with 2-bit indexes.
(four registers): Lookup table read with 2-bit indexes.
LUTI4 (single): Lookup table read with 4-bit indexes.
(two registers): Lookup table read with 4-bit indexes.
(four registers): Lookup table read with 4-bit indexes.
The reference can be found here:
https://developer.arm.com/documentation/ddi0602/2022-09
This patch also adds a new register class and operand for zt0
and a another index operand uimm3s8
Differential Revision: https://reviews.llvm.org/D136088
This patch adds the assembly/disassembly for the following
predicate pair instructions:
pext: Set pair of predicates from predicate-as-counter
whilelt: While incrementing signed scalar less than scalar
whilele: While incrementing signed scalar less than or equal to scalar
whilegt: While incrementing signed scalar greater than scalar
whilege: While incrementing signed scalar greater than or equal to scalar
whilelo: While incrementing unsigned scalar lower than scalar
whilels: While incrementing unsigned scalar lower or same as scalar
whilehs: While decrementing unsigned scalar higher or same as scalar
whilehi: While decrementing unsigned scalar higher than scalar
The reference can be found here:
https://developer.arm.com/documentation/ddi0602/2022-09
Differential Revision: https://reviews.llvm.org/D136759
This patch adds the assembly/disassembly for the following instructions:
pext (predicate) : Set predicate from predicate-as-counter
ptrue (predicate-as-counter) : Initialise predicate-as-counter to all active
This patch also introduces the predicate-as-counter registers pn8, etc.
The reference can be found here:
https://developer.arm.com/documentation/ddi0602/2022-09
Differential Revision: https://reviews.llvm.org/D136678
This patch adds the assembly/disassembly for the following instructions:
INT:
SMLAL
SMLSL
UMLAL
UMLSL
FP:
BFMLAL
BFMLSL
FMLAL
FMLSL
For multiple and indexed vector, Multiple and Single vector and
Multi vectors, for 1, 2 and 4 ZA registers.
The reference can be found here:
https://developer.arm.com/documentation/ddi0602/2022-09
It also adds a new immediate:
uimm3s2range for off3
uimm2s2range for off2
to represent the vector select offset.
The new operands have the range between the first and the last vector position.
Depends on: D135563
Reviewed By: aemerson, sdesmalen
Re-landing the patch as the problem with https://reviews.llvm.org/D135563
is fixed in this commit: 1e4f82c2578cf5045ffe
Differential Revision: https://reviews.llvm.org/D135785
This patch adds the assembly/disassembly for the following instructions:
INT:
SMLAL
SMLSL
UMLAL
UMLSL
FP:
BFMLAL
BFMLSL
FMLAL
FMLSL
For multiple and indexed vector, Multiple and Single vector and
Multi vectors, for 1, 2 and 4 ZA registers.
The reference can be found here:
https://developer.arm.com/documentation/ddi0602/2022-09
It also adds a new immediate:
uimm3s2range for off3
uimm2s2range for off2
to represent the vector select offset.
The new operands have the range between the first and the last vector position.
Depends on: D135563
Reviewed By: aemerson, sdesmalen
Differential Revision: https://reviews.llvm.org/D135785
This patch has the prefered disassembly changed for SVE vector list.
For instance, instead of printing this assembly:
ld4d { z1.d, z2.d, z3.d, z4.d }, p0/z, [x0]
it will print this:
ld4d { z1.d-z4.d }, p0/z, [x0]
Differential Revision: https://reviews.llvm.org/D135952
We can remove the MatrixZADRegisterTable table of tile registers and
just calculate the register index directly.
Differential Revision: https://reviews.llvm.org/D127757
This reverts commit ef8206320769ad31422a803a0d6de6077fd231d2.
- It conflicts with the existing llvm::size in STLExtras, which will now
never be called.
- Calling it without llvm:: breaks C++17 compat
This change introduces subtarget features to predicate certain
instructions and system registers that are available only on
'A' profile targets. Those features are not present when
targeting a generic CPU, which is the default processor.
In other words the generic CPU now means the intersection of
'A' and 'R' profiles. To maintain backwards compatibility we
enable the features that correspond to -march=armv8-a when the
architecture is not explicitly specified on the command line.
References: https://developer.arm.com/documentation/ddi0600/latest
Differential Revision: https://reviews.llvm.org/D110065
Changes in architecture revision 00eac1:
* Tile slice index offset no longer prefixed with '#'.
* The syntax for 128-bit (.Q) ZA tile slice accesses must now include
an explicit zero index.
The reference can be found here:
https://developer.arm.com/documentation/ddi0602/2021-09
Reviewed By: david-arm
Differential Revision: https://reviews.llvm.org/D111212
Add a comment when there is a shifted value,
add x9, x0, #291, lsl #12 ; =1191936
but not when the immediate value is unshifted,
subs x9, x0, #256 ; =256
when the comment adds nothing additional to the reader.
Differential Revision: https://reviews.llvm.org/D107196
This patch adds the zero instruction for zeroing a list of 64-bit
element ZA tiles. The instruction takes a list of up to eight tiles
ZA0.D-ZA7.D, which must be in order, e.g.
zero {za0.d,za1.d,za2.d,za3.d,za4.d,za5.d,za6.d,za7.d}
zero {za1.d,za3.d,za5.d,za7.d}
The assembler also accepts 32-bit, 16-bit and 8-bit element tiles which
are mapped to corresponding 64-bit element tiles in accordance with the
architecturally defined mapping between different element size tiles,
e.g.
* Zeroing ZA0.B, or the entire array name ZA, is equivalent to zeroing
all eight 64-bit element tiles ZA0.D to ZA7.D.
* Zeroing ZA0.S is equivalent to zeroing ZA0.D and ZA4.D.
The preferred disassembly of this instruction uses the shortest list of
tile names that represent the encoded immediate mask, e.g.
* An immediate which encodes 64-bit element tiles ZA0.D, ZA1.D, ZA4.D and
ZA5.D is disassembled as {ZA0.S, ZA1.S}.
* An immediate which encodes 64-bit element tiles ZA0.D, ZA2.D, ZA4.D and
ZA6.D is disassembled as {ZA0.H}.
* An all-ones immediate is disassembled as {ZA}.
* An all-zeros immediate is disassembled as an empty list {}.
This patch adds the MatrixTileList asm operand and related parsing to support
this.
Depends on D105570.
The reference can be found here:
https://developer.arm.com/documentation/ddi0602/2021-06
Reviewed By: david-arm
Differential Revision: https://reviews.llvm.org/D105575
This patch adds the new system registers introduced in SME:
- ID_AA64SMFR0_EL1 (ro) SME feature identifier.
- SMCR_ELx (r/w) streaming mode control register for configuring
effective SVE Streaming SVE Vector length when the PE is in
Streaming SVE mode.
- SVCR (r/w) streaming vector control register, visible at all
exception levels. Provides access to PSTATE.SM and PSTATE.ZA
using MSR and MRS instructions.
- SMPRI_EL1 (r/w) streaming mode execution priority register.
- SMPRIMAP_EL2 (r/w) streaming mode priority mapping register.
- SMIDR_EL1 (ro) streaming mode identification register.
- TPIDR2_EL0 (r/w) for use by SME software to manage per-thread
SME context.
- MPAMSM_EL1 (r/w) MPAM (v8.4) streaming mode register, for
labelling memory accesses performed in streaming mode.
Also added in this patch are the SME mode change instructions.
Three MSR immediate instructions are implemented to set or clear
PSTATE.SM, PSTATE.ZA, or both respectively:
- MSR SVCRSM, #<imm1>
- MSR SVCRZA, #<imm1>
- MSR SVCRSMZA, #<imm1>
The following smstart/smstop aliases are also implemented for
convenience:
smstart -> MSR SVCRSMZA, #1
smstart sm -> MSR SVCRSM, #1
smstart za -> MSR SVCRZA, #1
smstop -> MSR SVCRSMZA, #0
smstop sm -> MSR SVCRSM, #0
smstop za -> MSR SVCRZA, #0
The reference can be found here:
https://developer.arm.com/documentation/ddi0602/2021-06
Reviewed By: david-arm
Differential Revision: https://reviews.llvm.org/D105576
SME introduces the ZA array, a new piece of architectural register state
consisting of a matrix of [SVLb x SVLb] bytes, where SVL is the
implementation defined Streaming SVE vector length and SVLb is the
number of 8-bit elements in a vector of SVL bits.
SME instructions consist of three types of matrix operands:
* Tiles: a ZA tile is a square, two-dimensional sub-array of elements
within the ZA array. These tiles make up the larger accumulator array
and the granularity varies based on the element size, i.e.
- ZAQ0..ZAQ15 (smallest tile granule)
- ZAD0..ZAD7
- ZAS0..ZAS3
- ZAH0..ZAH1
or ZAB0 (largest tile granule, single tile)
* Tile vectors: similar to regular tiles, but have an extra 'h' or 'v'
to tell how the vector at [reg+offset] is layed out in the tile,
horizontally or vertically. E.g. za1h.h or za15v.q, which corresponds
to vectors in registers ZAH1 and ZAQ15, respectively.
* Accumulator matrix: this is the entire accumulator array ZA.
This patch adds the register classes and related operands and parsing
for SME instructions operating on the accumulator array.
The ADDHA and ADDVA instructions which operate on tiles are also added
in this patch to make some use of the code added, later patches will
make use of the other operands introduced here.
The reference can be found here:
https://developer.arm.com/documentation/ddi0602/2021-06
Co-authored by: Sander de Smalen (@sdesmalen)
Reviewed By: david-arm
Differential Revision: https://reviews.llvm.org/D105570
This enables the no-aliases forms of many instructions.
Depends on D103004
Reviewed By: tmatheson
Differential Revision: https://reviews.llvm.org/D103005
The Arm Architecture Reference Manual says that the SystemHintOp_BTI
opcode is prefered when CRm:op2 matches 0100:xx0, but llvm-mc
currently accepts 0100:xxx, which isn't right.
Differential Revision: https://reviews.llvm.org/D102415
As mentioned in TODO comment, casting double to float causes NaNs to change bits.
To avoid the change, this patch adds support for single-floating-point immediate value on MachineCode.
Patch by Yuta Saito.
Differential Revision: https://reviews.llvm.org/D77384
This adds a GPR64x8 register class that will be needed as the data
operand to the LD64B/ST64B family of instructions in the v8.7-A
Accelerator Extension, which load or store a contiguous range of eight
x-regs. It has to be its own register class so that register allocation
will have visibility of the full set of registers actually read/written
by the instructions, which will be needed when we add intrinsics and/or
inline asm access to this piece of architecture.
Patch written by Simon Tatham.
Reviewed By: ostannard
Differential Revision: https://reviews.llvm.org/D91774
This introduces support for the v8.7-A architecture through a new
subtarget feature called "v8.7a". It adds two new "WFET" and "WFIT"
instructions, the nXS limited-TLB-maintenance qualifier for DSB and TLBI
instructions, a new CPU id register, ID_AA64ISAR2_EL1, and the new
HCRX_EL2 system register.
Based on patches written by Simon Tatham and Victor Campos.
Reviewed By: ostannard
Differential Revision: https://reviews.llvm.org/D91772
Similar to D77853. Change ADRP to print the target address in hex, instead of the raw immediate.
The behavior is similar to GNU objdump but we also include `0x`.
Note: GNU objdump is not consistent whether or not to emit `0x` for different architectures. We try emitting 0x consistently for all targets.
```
GNU objdump: adrp x16, 10000000
Old llvm-objdump: adrp x16, #0
New llvm-objdump: adrp x16, 0x10000000
```
`adrp Xd, 0x...` assembles to a relocation referencing `*ABS*+0x10000` which is not intended. We need to use a linker or use yaml2obj.
The main test is `test/tools/llvm-objdump/ELF/AArch64/pcrel-address.yaml`
Differential Revision: https://reviews.llvm.org/D93241
Follow-up of D72172 and D72180
This patch passes `uint64_t Address` to print methods of PC-relative
operands so that subsequent target specific patches can change
`*InstPrinter::print{Operand,PCRelImm,...}` to customize the output.
Add MCInstPrinter::PrintBranchImmAsAddress which is set to true by
llvm-objdump.
```
// Current llvm-objdump -d output
aarch64: 20000: bl #0
ppc: 20000: bl .+4
x86: 20000: callq 0
// Ideal output
aarch64: 20000: bl 0x20000
ppc: 20000: bl 0x20004
x86: 20000: callq 0x20005
// GNU objdump -d. The lack of 0x is not ideal because the result cannot be re-assembled
aarch64: 20000: bl 20000
ppc: 20000: bl 0x20004
x86: 20000: callq 20005
```
In `lib/Target/X86/X86GenAsmWriter1.inc` (generated by `llvm-tblgen -gen-asm-writer`):
```
case 12:
// CALL64pcrel32, CALLpcrel16, CALLpcrel32, EH_SjLj_Setup, JCXZ, JECXZ, J...
- printPCRelImm(MI, 0, O);
+ printPCRelImm(MI, Address, 0, O);
return;
```
Some targets have 2 `printOperand` overloads, one without `Address` and
one with `Address`. They should annotate derived `Operand` properly with
`let OperandType = "OPERAND_PCREL"`.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D76574