412 Commits

Author SHA1 Message Date
Lucas Prates
f516e91715 [AArch64] Add new v9.4-A PM pstate system register
This adds support for the new PM pstate system register introduced by
the v9.4-A Exception-based Event Profiling extension (FEAT_EBEP).

The new PM pstate register takes a 1-bit immediate and requires
different values to be specified for the higher bits of the Crm field.
To enable that, this patch creates an explicit separation between the
pstate system registers that take 4-bit and 1-bit immediate operands,
allowing each entry to specify the value for the 3 high bits of Crm.

This also updates other pstate registers to correctly accept 4-bit
immediates, matching their decoding specification from the Arm ARM.
These include: `PAN`,  `UAO`, `DIT` and `SSBS`.

More information about this extension and the new register can be found
at:
* https://developer.arm.com/documentation/ddi0601/2022-09/AArch64-Registers/PM--PMU-Exception-Mask

Contributors:
* Lucas Prates
* Sam Elliott

Reviewed By: lenary

Differential Revision: https://reviews.llvm.org/D139925
2022-12-19 15:07:52 +00:00
Sergei Barannikov
4d48ccfc88 [MC] Use MCRegister instead of unsigned in MCTargetAsmParser
Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D140273
2022-12-18 12:12:05 -08:00
Archibald Elliott
947d4fb373 [AArch64] RASv2 Assembly Support
This feature adds upstream support for FEAT_RASv2 and FEAT_PFAR. Both
are system-register-only, but FEAT_RAS is behind the command-line
extension "+ras", so FEAT_RASv2 is behind "+rasv2".

This patch includes support for ID_AA64MMFR4_EL1. This is an ID system
register so it is not behind any feature flags.

Differential Revision: https://reviews.llvm.org/D139936
2022-12-16 14:37:35 +00:00
Lucas Prates
2050e7ebe1 [Arm][AArch64] Add support for v8.9-A/v9.4-A base extensions
This implements the base extensions that are part of the v8.9-A and
v9.4-A architecture versions, including:

* The Clear BHB Instruction (FEAT_CLRBHB)
* The Speculation Restriction Instruction (FEAT_SPECRES2)
* The SLC target for the PRFM instruction
* New system registers:
  * ID_AA64PFR2_EL1
  * ID_AA64MMFR3_EL1
  * HFGITR2_EL2
  * SCTLR2_EL3

More information on the new extensions can be found on:

* https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/arm-a-profile-architecture-2022
* https://developer.arm.com/downloads/-/exploration-tools

Contributors: Sam Elliott, Tomas Matheson and Son Tuan Vu.

Reviewed By: lenary

Differential Revision: https://reviews.llvm.org/D139424
2022-12-08 10:15:29 +00:00
Tomas Matheson
541a1371c0 Revert "[AArch64] Improve TargetParser API"
This reverts commit e83f1502f1be7a2a3b9a33f5a73867767e78ba6b.

Did not build with C++20 and caused problems with dynamic libs.
2022-12-05 11:09:03 +00:00
Fangrui Song
b0df70403d [Target] llvm::Optional => std::optional
The updated functions are mostly internal with a few exceptions (virtual functions in
TargetInstrInfo.h, TargetRegisterInfo.h).
To minimize changes to LLVMCodeGen, GlobalISel files are skipped.

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-12-04 22:43:14 +00:00
Kazu Hirata
20cde15415 [Target] Use std::nullopt instead of None (NFC)
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated.  The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.

This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-12-02 20:36:06 -08:00
Fangrui Song
3c2f9daf2d [AArch64] Remove following .inst/after directive from AsmParser diagnostics
The part of the diagnostic is not useful because the instruction line is
printed. The new style follows generic code.
2022-12-01 21:44:03 +00:00
Tomas Matheson
e83f1502f1 [AArch64] Improve TargetParser API
Re-land with constexpr StringRef::substr():

The TargetParser depends heavily on a collection of macros and enums to tie
together information about architectures, CPUs and extensions. Over time this
has led to some pretty awkward API choices. For example, recently a custom
operator-- has been added to the enum, which effectively turns iteration into
a graph traversal and makes the ordering of the macro calls in the header
significant. More generally there is a lot of string <-> enum conversion
going on. I think this shows the extent to which the current data structures
are constraining us, and the need for a rethink.

Key changes:

 - Get rid of Arch enum, which is used to bind fields together. Instead of
   passing around ArchKind, use the named ArchInfo objects directly or via
   references.

 - The list of all known ArchInfo becomes an array of pointers.

 - ArchKind::operator-- is replaced with ArchInfo::implies(), which defines
   which architectures are predecessors to each other. This allows features
   from predecessor architectures to be added in a more intuitive way.

 - Free functions of the form f(ArchKind) are converted to ArchInfo::f(). Some
   functions become unnecessary and are deleted.

 - Version number and profile are added to the ArchInfo. This makes comparison
   of architectures easier and moves a couple of functions out of clang and
   into AArch64TargetParser.

 - clang::AArch64TargetInfo ArchInfo is initialised to Armv8a not INVALID.

 - AArch64::ArchProfile which is distinct from ARM::ArchProfile

 - Give things sensible names and add some comments.

Differential Revision: https://reviews.llvm.org/D138792
2022-12-01 15:30:07 +00:00
Tomas Matheson
d1ef4b0a8d Revert "[AArch64] Improve TargetParser API"
Buildbots unhappy about constexpr function.

This reverts commit 450de8008bb0ccb5dfc9dd69b6f5b434158772bd.
2022-12-01 13:06:54 +00:00
Tomas Matheson
450de8008b [AArch64] Improve TargetParser API
The TargetParser depends heavily on a collection of macros and enums to tie
together information about architectures, CPUs and extensions. Over time this
has led to some pretty awkward API choices. For example, recently a custom
operator-- has been added to the enum, which effectively turns iteration into
a graph traversal and makes the ordering of the macro calls in the header
significant. More generally there is a lot of string <-> enum conversion
going on. I think this shows the extent to which the current data structures
are constraining us, and the need for a rethink.

Key changes:

 - Get rid of Arch enum, which is used to bind fields together. Instead of
   passing around ArchKind, use the named ArchInfo objects directly or via
   references.

 - The list of all known ArchInfo becomes an array of pointers.

 - ArchKind::operator-- is replaced with ArchInfo::implies(), which defines
   which architectures are predecessors to each other. This allows features
   from predecessor architectures to be added in a more intuitive way.

 - Free functions of the form f(ArchKind) are converted to ArchInfo::f(). Some
   functions become unnecessary and are deleted.

 - Version number and profile are added to the ArchInfo. This makes comparison
   of architectures easier and moves a couple of functions out of clang and
   into AArch64TargetParser.

 - clang::AArch64TargetInfo ArchInfo is initialised to Armv8a not INVALID.

 - AArch64::ArchProfile which is distinct from ARM::ArchProfile

 - Give things sensible names and add some comments.

Differential Revision: https://reviews.llvm.org/D138792
2022-12-01 12:50:23 +00:00
Tomas Matheson
f57f086714 [AArch64TargetParser] getArchFeatures -> getArchFeature
Differential Revision: https://reviews.llvm.org/D138753
2022-12-01 12:50:17 +00:00
Tomas Matheson
7fea6f2e0e [AArch64] Assembly support for VMSA
Virtual Memory System Architecture (VMSA)

This is part of the 2022 A-Profile Architecture extensions and adds support for
the following:

 - Translation Hardening Extension (FEAT_THE)
 - 128-bit Page Table Descriptors (FEAT_D128)
 - 56-bit Virtual Address (FEAT_LVA3)
 - Support for 128-bit System Registers (FEAT_SYSREG128)
 - System Instructions that can take 128-bit inputs (FEAT_SYSINSTR128)
 - 128-bit Atomic Instructions (FEAT_LSE128)
 - Permission Indirection Extension (FEAT_S1PIE, FEAT_S2PIE)
 - Permission Overlay Extension (FEAT_S1POE, FEAT_S2POE)
 - Memory Attribute Index Enhancement (FEAT_AIE)

New instructions added:
 - FEAT_SYSREG128 adds MRRS and MSRR.
 - FEAT_SYSINSTR128 adds the SYSP instruction and TLBIP aliases.
 - FEAT_LSE128 adds LDCLRP, LDSET, and SWPP instructions.
 - FEAT_THE adds the set of RCW* instructions.

Specs for individual instructions can be found here:
https://developer.arm.com/documentation/ddi0602/2022-09/Base-Instructions/

Contributors:
  Keith Walker
  Lucas Prates
  Sam Elliott
  Son Tuan Vu
  Tomas Matheson

Differential Revision: https://reviews.llvm.org/D138920
2022-11-30 13:37:02 +00:00
Sander de Smalen
7f01737687 [AArch64][AsmParser] SME: Allow h/v suffix to be upper-case. 2022-11-28 11:42:43 +00:00
Kazu Hirata
fc07a54ef6 [AsmParser] Use std::optional in AArch64AsmParser.cpp (NFC)
This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-11-25 22:14:44 -08:00
Tomas Matheson
a6aaa969f7 [AArch64] Assembly support for FEAT_LRCPC3
This patch implements assembly support for the 2022 A-Profile Architecture
extension FEAT_LRCPC3. FEAT_LRCPC3 is AArch64 only and introduces new
variants of load/store instructions with release consistency ordering.

Specs for individual instructions can be found here:
https://developer.arm.com/documentation/ddi0602/2022-09/Base-Instructions/

This feature is optionally available from v8.2a and therefore not enabled by
default.

Contributors:
  Lucas Prates
  Sam Elliot
  Son Tuan Vu
  Tomas Matheson

Differential Revision: https://reviews.llvm.org/D138579
2022-11-25 18:59:07 +00:00
Ties Stuij
cb261e30fb [AArch64][clang] implement 2022 General Data-Processing instructions
This patch implements the 2022 Architecture General Data-Processing Instructions

They include:

Common Short Sequence Compression (CSSC) instructions
- scalar comparison instructions
  SMAX, SMIN, UMAX, UMIN (32/64 bits) with or without immediate
- ABS (absolute), CNT (count non-zero bits), CTZ (count trailing zeroes)
- command-line options for CSSC

Associated with these instructions in the documentation is the Range Prefetch
Memory (RPRFM) instruction, which signals to the memory system that data memory
accesses from a specified range of addresses are likely to occur in the near
future. The instruction lies in hint space, and is made unconditional.

Specs for the individual instructions can be found here:
https://developer.arm.com/documentation/ddi0602/2022-09/Base-Instructions/

contributors to this patch:
- Cullen Rhodes
- Son Tuan Vu
- Mark Murray
- Tomas Matheson
- Sam Elliott
- Ties Stuij

Reviewed By: lenary

Differential Revision: https://reviews.llvm.org/D138488
2022-11-22 14:23:12 +00:00
Kazu Hirata
6ba4b62af8 Return None instead of Optional<T>() (NFC)
This patch replaces:

  return Optional<T>();

with:

  return None;

to make the migration from llvm::Optional to std::optional easier.
Specifically, I can deprecate None (in my source tree, that is) to
identify all the instances of None that should be replaced with
std::nullopt.

Note that "return None" far outnumbers "return Optional<T>();".  There
are more than 2000 instances of "return None" in our source tree.

All of the instances in this patch come from functions that return
Optional<T> except Archive::findSym and ASTNodeImporter::import, where
we return Expected<Optional<T>>.  Note that we can construct
Expected<Optional<T>> from any parameter convertible to Optional<T>,
which None certainly is.

This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716

Differential Revision: https://reviews.llvm.org/D138464
2022-11-21 19:06:42 -08:00
Ties Stuij
983f63f7f0 [AArch64][ARM] add Armv8.9-a/Armv9.4-a identifier support
For both ARM and AArch64 add support for specifying -march=armv8.9a/armv9.4a to
clang. Add backend plumbing like target parser and predicate support.

For a summary of Amv8.9/Armv9.4 features, see:
https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/arm-a-profile-architecture-2022

For detailed information, consult the Arm Architecture Reference Manual for
A-profile architecture:
https://developer.arm.com/documentation/ddi0487/latest/

People who contributed to this patch:
- Keith Walker
- Ties Stuij

Reviewed By: tmatheson

Differential Revision: https://reviews.llvm.org/D138010
2022-11-16 10:20:14 +00:00
Caroline Concatto
4dad168e67 [NFC][AArch64] SME2 Add instruction name convention and fix LookupTable number of registers
This patch adds the name convention for SME instructions.
This patch fixes the number of registers for LookUpTable in the AsmParser.
The number of registers is not used atm, but it is needed.
The switch case in getNumRegsForRegKind needs to have all the
RegKind enum.
2022-11-15 12:09:51 +00:00
Caroline Concatto
3eacda4547 [AArch64] Add all SME2.1 instructions Assembly/Disassembly
This patch adds a new feature flag:
sme-f16f16 to represent FEAT_SME-F16F16

This patch add the following instructions:
SME2.1 stand alone instructions:
   MOVAZ (array to vector, four registers): Move and zero four ZA single-vector groups to vector registers.
         (array to vector, two registers): Move and zero two ZA single-vector groups to vector registers.
         (tile to vector, four registers): Move and zero four ZA tile slices to vector registers.
         (tile to vector, single): Move and zero ZA tile slice to vector register.
         (tile to vector, two registers): Move and zero two ZA tile slices to vector registers.

   LUTI2 (Strided four registers): Lookup table read with 2-bit indexes.
         (Strided two registers): Lookup table read with 2-bit indexes.

   LUTI4 (Strided four registers): Lookup table read with 4-bit indexes.
         (Strided two registers): Lookup table read with 4-bit indexes.

   ZERO (double-vector): Zero ZA double-vector groups.
        (quad-vector): Zero ZA quad-vector groups.
        (single-vector): Zero ZA single-vector groups.

SME2p1 and SME-F16F16:
 All instructions are half precision elements:
   FADD: Floating-point add multi-vector to ZA array vector accumulators.

   FSUB: Floating-point subtract multi-vector from ZA array vector accumulators.

   FMLA (multiple and indexed vector): Multi-vector floating-point fused multiply-add by indexed element.
        (multiple and single vector): Multi-vector floating-point fused multiply-add by vector.
        (multiple vectors): Multi-vector floating-point fused multiply-add.

   FMLS (multiple and indexed vector): Multi-vector floating-point fused multiply-subtract by indexed element.
        (multiple and single vector): Multi-vector floating-point fused multiply-subtract by vector.
        (multiple vectors): Multi-vector floating-point fused multiply-subtract.

   FCVT (widening): Multi-vector floating-point convert from half-precision to single-precision (in-order).

   FCVTL: Multi-vector floating-point convert from half-precision to deinterleaved single-precision.

   FMOPA (non-widening): Floating-point outer product and accumulate.

   FMOPS (non-widening): Floating-point outer product and subtract.

SME2p1 and B16B16:
   BFADD: BFloat16 floating-point add multi-vector to ZA array vector accumulators.

   BFSUB: BFloat16 floating-point subtract multi-vector from ZA array vector accumulators.

   BFCLAMP: Multi-vector BFloat16 floating-point clamp to minimum/maximum number.

   BFMLA (multiple and indexed vector): Multi-vector BFloat16 floating-point fused multiply-add by indexed element.
         (multiple and single vector): Multi-vector BFloat16 floating-point fused multiply-add by vector.
         (multiple vectors): Multi-vector BFloat16 floating-point fused multiply-add.

   BFMLS (multiple and indexed vector): Multi-vector BFloat16 floating-point fused multiply-subtract by indexed element.
         (multiple and single vector): Multi-vector BFloat16 floating-point fused multiply-subtract by vector.
         (multiple vectors): Multi-vector BFloat16 floating-point fused multiply-subtract.

   BFMAX (multiple and single vector): Multi-vector BFloat16 floating-point maximum by vector.
         (multiple vectors): Multi-vector BFloat16 floating-point maximum.

   BFMAXNM (multiple and single vector): Multi-vector BFloat16 floating-point maximum number by vector.
           (multiple vectors): Multi-vector BFloat16 floating-point maximum number.

   BFMIN (multiple and single vector): Multi-vector BFloat16 floating-point minimum by vector.
         (multiple vectors): Multi-vector BFloat16 floating-point minimum.

   BFMINNM (multiple and single vector): Multi-vector BFloat16 floating-point minimum number by vector.
           (multiple vectors): Multi-vector BFloat16 floating-point minimum number.

   BFMOPA (non-widening): BFloat16 floating-point outer product and accumulate.

   BFMOPS (non-widening): BFloat16 floating-point outer product and subtract.

The reference can be found here:

https://developer.arm.com/documentation/ddi0602/2022-09

Differential Revision: https://reviews.llvm.org/D137571
2022-11-14 14:56:16 +00:00
Caroline Concatto
ecab1bc0dc [AArch64]SME2 Multi vector Sel Load and Store instructions
This patch adds the assembly/disassembly for the following instruction:

   SEL: Multi-vector conditionally select elements from two vectors
        for 2 and 4 registers

Non-constiguous load with stride resgisters:

  LD1B (scalar + immediate): Contiguous load of bytes to multiple strided vectors (immediate index).
       (scalar + scalar): Contiguous load of bytes to multiple strided vectors (scalar index).
  LD1D (scalar + immediate): Contiguous load of doublewords to multiple strided vectors (immediate index).
       (scalar + scalar): Contiguous load of doublewords to multiple strided vectors (scalar index).
  LD1H (scalar + immediate): Contiguous load of halfwords to multiple strided vectors (immediate index).
       (scalar + scalar): Contiguous load of halfwords to multiple strided vectors (scalar index).
  LD1W (scalar + immediate): Contiguous load of words to multiple strided vectors (immediate index).
       (scalar + scalar): Contiguous load of words to multiple strided vectors (scalar index).

  LDNT1B (scalar + immediate): Contiguous load non-temporal of bytes to multiple strided vectors (immediate index).
         (scalar + scalar): Contiguous load non-temporal of bytes to multiple strided vectors (scalar index).
  LDNT1D (scalar + immediate): Contiguous load non-temporal of doublewords to multiple strided vectors (immediate index).
         (scalar + scalar): Contiguous load non-temporal of doublewords to multiple strided vectors (scalar index).
  LDNT1H (scalar + immediate): Contiguous load non-temporal of halfwords to multiple strided vectors (immediate index).
         (scalar + scalar): Contiguous load non-temporal of halfwords to multiple strided vectors (scalar index).
  LDNT1W (scalar + immediate): Contiguous load non-temporal of words to multiple strided vectors (immediate index).
         (scalar + scalar): Contiguous load non-temporal of words to multiple strided vectors (scalar index).

Non-constiguous store with stride resgisters:

  ST1B (scalar + immediate): Contiguous store of bytes from multiple strided vectors (immediate index).
       (scalar + scalar): Contiguous store of bytes from multiple strided vectors (scalar index).
  ST1D (scalar + immediate): Contiguous store of doublewords from multiple strided vectors (immediate index).
       (scalar + scalar): Contiguous store of doublewords from multiple strided vectors (scalar index).
  ST1H (scalar + immediate): Contiguous store of halfwords from multiple strided vectors (immediate index).
       (scalar + scalar): Contiguous store of halfwords from multiple strided vectors (scalar index).
  ST1W (scalar + immediate): Contiguous store of words from multiple strided vectors (immediate index).
       (scalar + scalar): Contiguous store of words from multiple strided vectors (scalar index).

  STNT1B (scalar + immediate): Contiguous store non-temporal of bytes from multiple strided vectors (immediate index).
         (scalar + scalar): Contiguous store non-temporal of bytes from multiple strided vectors (scalar index).
  STNT1D (scalar + immediate): Contiguous store non-temporal of doublewords from multiple strided vectors (immediate index).
         (scalar + scalar): Contiguous store non-temporal of doublewords from multiple strided vectors (scalar index).
  STNT1H (scalar + immediate): Contiguous store non-temporal of halfwords from multiple strided vectors (immediate index).
         (scalar + scalar): Contiguous store non-temporal of halfwords from multiple strided vectors (scalar index).
  STNT1W (scalar + immediate): Contiguous store non-temporal of words from multiple strided vectors (immediate index).
         (scalar + scalar): Contiguous store non-temporal of words from multiple strided vectors (scalar index).

    The reference can be found here:

        https://developer.arm.com/documentation/ddi0602/2022-09

This patch also adds a new SVE vector list to represent the stride loads/stores
ZPRVectorListStrided and the sets of 2 and 4 ZA registers:
ZZ_[b|h|w|d]_strided and ZZZZ_[b|h|w|d]_strided

Differential Revision: https://reviews.llvm.org/D136172
2022-11-10 16:04:57 +00:00
Keith Walker
00d98e6572 [AArch64] RME MEC instructions and system registers
This patch adds assembler/disassembler support for
RME MEC (Memory Encryption Contexts).

Cache maintence instructions added:
- DC CIPAPA
- DC CIGDPAPA

System registers added:
- MECIDR_EL2
- MECID_P0_EL2
- MECID_A0_EL2
- MECID_P1_EL2
- MECID_A1_EL2
- VMECID_P_EL2
- VMECID_A_EL2
- MECID_RL_A_EL3

Differential Revision: https://reviews.llvm.org/D137431
2022-11-10 14:05:12 +00:00
Caroline Concatto
1a917568e7 [AArch64]SME2 MOV Instructions
This patch adds the assembly/disassembly for the following instructions:

MOVA (array to vector, four registers): Move four ZA single-vector groups to four vector registers.
     (array to vector, two registers): Move two ZA single-vector groups to two vector registers.
     (tile to vector, four registers): Move four ZA tile slices to four vector registers.
     (tile to vector, single): Move ZA tile slice to vector register.
     (tile to vector, two registers): Move two ZA tile slices to two vector registers.
     (vector to array, four registers): Move four vector registers to four ZA single-vector groups.
     (vector to array, two registers): Move two vector registers to two ZA single-vector groups.
     (vector to tile, four registers): Move four vector registers to four ZA tile slices.
     (vector to tile, single): Move vector register to ZA tile slice.
     (vector to tile, two registers): Move two vector registers to two ZA tile slices.

The reference can be found here:

https://developer.arm.com/documentation/ddi0602/2022-09

It add more sizes for Matrix Operand:
MatrixOp8 and MatrixOp16
two implicit operands uimm0s2range and uimm0s4range.
and  uimm1s2range that are immediates

Differential Revision: https://reviews.llvm.org/D136142
2022-11-08 09:51:56 +00:00
David Sherwood
cf69895ab3 [AArch64][SVE2] Add the SVE2.1 BF16 instructions
This patch adds the new FEAT_B16B16 feature as well as the
assembly/disassembly for all of the B16B16 instructions:

bfadd:   BFloat16 floating-point add vectors
bfsub:   BFloat16 floating-point subtract vectors
bfmul:   BFloat16 floating-point multiply vectors
bfclamp: BFloat16 floating-point clamp to minimum/maximum number
bfmax:   BFloat16 floating-point maximum
bfmaxnm: BFloat16 floating-point maximum number
bfmin:   BFloat16 floating-point minimum
bfminnm: BFloat16 floating-point minimum number
bfmla:   BFloat16 floating-point fused multiply-add vectors
bfmls:   BFloat16 floating-point fused multiply-subtract vectors

The reference can be found here:
https://developer.arm.com/documentation/ddi0602/2022-09

Differential Revision: https://reviews.llvm.org/D137321
2022-11-07 15:29:40 +00:00
David Sherwood
12a6572d41 [AArch64] Add SME2.1 target feature for Armv9-A 2022 Architecture Extension
First patch in a series adding MC layer support for SME2.1.

This patch adds the following feature:

sme2p1

The reference can be found here:
https://developer.arm.com/documentation/ddi0602/2022-09

Differential Revision: https://reviews.llvm.org/D137410
2022-11-07 14:38:28 +00:00
Caroline Concatto
73d2a4cfd8 [AArch64] SME2 -Fix failing buildbots because of warning
This patch is to solve this:
    https://lab.llvm.org/buildbot#builders/36/builds/26801
Created by this patch: a20112a74cb34f

[AArch64]SME2 instructions that use ZTO operand
2022-11-03 08:34:56 +00:00
Caroline Concatto
a20112a74c [AArch64]SME2 instructions that use ZTO operand
This patch adds the assembly/disassembly for the following instructions:
  ZERO (ZT0): Zero ZT0.
  LDR (ZT0): Load ZT0 register.
  STR (ZT0): Store ZT0 register.
  MOVT (scalar to ZT0): Move 8 bytes from general-purpose register to ZT0.
       (ZT0 to scalar): Move 8 bytes from ZT0 to general-purpose register.
 Consecutive:
   LUTI2 (single): Lookup table read with 2-bit indexes.
         (two registers): Lookup table read with 2-bit indexes.
         (four registers): Lookup table read with 2-bit indexes.
   LUTI4 (single): Lookup table read with 4-bit indexes.
         (two registers): Lookup table read with 4-bit indexes.
         (four registers): Lookup table read with 4-bit indexes.

The reference can be found here:

https://developer.arm.com/documentation/ddi0602/2022-09

This patch also adds a new register class and operand for zt0
and a another index operand uimm3s8

Differential Revision: https://reviews.llvm.org/D136088
2022-11-03 07:35:21 +00:00
David Sherwood
be369ea31b [AArch64][SVE2] Add the SVE2.1 while & pext predicate pair instructions
This patch adds the assembly/disassembly for the following
predicate pair instructions:

pext:    Set pair of predicates from predicate-as-counter
whilelt: While incrementing signed scalar less than scalar
whilele: While incrementing signed scalar less than or equal to scalar
whilegt: While incrementing signed scalar greater than scalar
whilege: While incrementing signed scalar greater than or equal to scalar
whilelo: While incrementing unsigned scalar lower than scalar
whilels: While incrementing unsigned scalar lower or same as scalar
whilehs: While decrementing unsigned scalar higher or same as scalar
whilehi: While decrementing unsigned scalar higher than scalar

The reference can be found here:
https://developer.arm.com/documentation/ddi0602/2022-09

Differential Revision: https://reviews.llvm.org/D136759
2022-11-02 08:39:03 +00:00
David Sherwood
5f7a8cf026 [AArch64][SVE2] Add the SVE2.1 cntp instruction
This patch adds the assembly/disassembly for the following instructions:

cntp : Set scalar to count from predicate-as-counter

The reference can be found here:
https://developer.arm.com/documentation/ddi0602/2022-09

Differential Revision: https://reviews.llvm.org/D136747
2022-11-01 13:24:37 +00:00
David Sherwood
3eafca58e2 [AArch64][SVE2] Add the SVE2.1 contiguous load to multiple consecutive vector
This patch adds the assembly/disassembly for the following instructions:

ld1*   : Contiguous load of bytes to multiple consecutive vectors -
         (scalar + scalar) and (scalar + immediate)
ldnt1* : Contiguous load non-temporal of bytes to multiple consecutive
         vectors - (scalar + scalar) and (scalar + immediate)

The reference can be found here:
https://developer.arm.com/documentation/ddi0602/2022-09

Differential Revision: https://reviews.llvm.org/D136680
2022-11-01 09:31:51 +00:00
Caroline Concatto
340cf2118a [AArch64]SME2 Multi-vector - Index/Single/Multi Array Vectors LONG INT MLA sources
This patch adds the assembly/disassembly for the following instructions:

 SMLALL: (multiple and indexed vector): Multi-vector signed integer multiply-add long long by indexed element.
         (multiple and single vector): Multi-vector signed integer multiply-add long long by vector.
         (multiple vectors): Multi-vector signed integer multiply-add long long.

 SMLSLL: (multiple and indexed vector): Multi-vector signed integer multiply-subtract long long by indexed element.
         (multiple and single vector): Multi-vector signed integer multiply-subtract long long by vector.
         (multiple vectors): Multi-vector signed integer multiply-subtract long long.

 SUMLALL: (multiple and indexed vector): Multi-vector signed by unsigned integer multiply-add long long by indexed element.
          (multiple and single vector): Multi-vector signed by unsigned integer multiply-add long long by vector.

 UMLALL: (multiple and indexed vector): Multi-vector unsigned integer multiply-add long long by indexed element.
         (multiple and single vector): Multi-vector unsigned integer multiply-add long long by vector.
         (multiple vectors): Multi-vector unsigned integer multiply-add long long.

 UMLSLL: (multiple and indexed vector): Multi-vector unsigned integer multiply-subtract long long by indexed element.
         (multiple and single vector): Multi-vector unsigned integer multiply-subtract long long by vector.
         (multiple vectors): Multi-vector unsigned integer multiply-subtract long long.

 USMLALL: (multiple and indexed vector): Multi-vector unsigned by signed integer multiply-add long long by indexed element.
          (multiple and single vector): Multi-vector unsigned by signed integer multiply-add long long by vector.
          (multiple vectors): Multi-vector unsigned by signed integer multiply-add long long.

    The reference can be found here:

    https://developer.arm.com/documentation/ddi0602/2022-09

          It also adds a new immediate:
              uimm2s4range for off2
              uimm1s4range for o1
            to represent the vector select offset.
         The new operands have the range between the first and the last vector position.

Depends on : D135785

Differential Revision: https://reviews.llvm.org/D136075
2022-10-28 17:39:23 +01:00
David Sherwood
3232725852 Fix whitespace introduced by 891aaff9a8a9997582eac1bb1edb8d4b4e117ef1 2022-10-27 18:24:51 +00:00
David Sherwood
891aaff9a8 [AArch64][SVE2] Add the SVE2.1 pext and ptrue predicate-as-counter instructions
This patch adds the assembly/disassembly for the following instructions:

pext (predicate) : Set predicate from predicate-as-counter
ptrue (predicate-as-counter) : Initialise predicate-as-counter to all active

This patch also introduces the predicate-as-counter registers pn8, etc.

The reference can be found here:
https://developer.arm.com/documentation/ddi0602/2022-09

Differential Revision: https://reviews.llvm.org/D136678
2022-10-27 18:23:35 +00:00
David Sherwood
fcd545863d [AArch64] Add SVE2.1 target feature for Armv9-A 2022 Architecture Extension
First patch in a series adding MC layer support for SVE2.1.

This patch adds the following feature:

sve2p1

Some of the existing SVE instructions added for SME are now
also available under the sve2p1 feature, which are now guarded
by the HasSVE2p1orSME predicate.

The reference can be found here:
https://developer.arm.com/documentation/ddi0602/2022-09

Differential Revision: https://reviews.llvm.org/D136352
2022-10-21 14:02:32 +00:00
Caroline Concatto
fc0cde8af5 [AArch64]SME2 Multi-vector - Index/Single/Multi Array Vectors FMA sources
This patch adds the assembly/disassembly for the following instructions:
  INT:
     SMLAL
     SMLSL
     UMLAL
     UMLSL
  FP:
    BFMLAL
    BFMLSL
    FMLAL
    FMLSL
For multiple and indexed vector, Multiple and Single vector and
Multi vectors, for 1, 2 and 4 ZA registers.

The reference can be found here:

https://developer.arm.com/documentation/ddi0602/2022-09

It also adds a new immediate:
  uimm3s2range for off3
  uimm2s2range for off2
to represent the vector select offset.
The new operands have the range between the first and the last vector position.

Depends on: D135563

Reviewed By: aemerson, sdesmalen

Re-landing the patch as the problem with https://reviews.llvm.org/D135563
is fixed in this commit: 1e4f82c2578cf5045ffe

Differential Revision: https://reviews.llvm.org/D135785
2022-10-21 14:20:44 +01:00
Caroline Concatto
1e4f82c257 [AArch64]SME2 Multi-single vector SVE Destructive 2 and 4 Registers
This patch adds the assembly/disassembly for the following instructions:
  ADD (to vector): Add replicated single vector to multi-vector with multi-vector result.
  SQDMULH (multiple and single vector): Multi-vector signed saturating doubling multiply high by vector.
for 2 and 4 ZA SVE registers.

The reference can be found here:

https://developer.arm.com/documentation/ddi0602/2022-09

It also adds more size for the multiple register tuple:
  ZZ_b_mul_r,  ZZ_h_mul_r,
  ZZZZ_b_mul_r,  ZZZZ_h_mul_r,
for 8 bits and 16 bits with 2 and 4 ZA registers.

Depends on: D135468

With a fix for Mips for this test:
llvm/test/MC/Mips/mips64r6/valid.s

Differential Revision: https://reviews.llvm.org/D135563
2022-10-21 14:01:29 +01:00
Caroline Concatto
9895447006 Revert "[AArch64]SME2 Multi-single vector SVE Destructive 2 and 4 Registers"
This reverts commit 4c4909703d74883e5cc49edcbd22b783135d2897.

This patch was breaking this test:

llvm/test/MC/Mips/mips64r6/valid.s

I will push again when fixed
2022-10-20 19:43:31 +01:00
Caroline Concatto
c05b1bde34 Revert "[AArch64]SME2 Multi-vector - Index/Single/Multi Array Vectors FMA sources"
This reverts commit 3fee9358baab54e4ed646a106297e7fb6f1b4cff.
2022-10-20 19:43:30 +01:00
Caroline Concatto
3fee9358ba [AArch64]SME2 Multi-vector - Index/Single/Multi Array Vectors FMA sources
This patch adds the assembly/disassembly for the following instructions:
  INT:
     SMLAL
     SMLSL
     UMLAL
     UMLSL
  FP:
    BFMLAL
    BFMLSL
    FMLAL
    FMLSL
For multiple and indexed vector, Multiple and Single vector and
Multi vectors, for 1, 2 and 4 ZA registers.

The reference can be found here:

https://developer.arm.com/documentation/ddi0602/2022-09

It also adds a new immediate:
  uimm3s2range for off3
  uimm2s2range for off2
to represent the vector select offset.
The new operands have the range between the first and the last vector position.

Depends on: D135563

Reviewed By: aemerson, sdesmalen

Differential Revision: https://reviews.llvm.org/D135785
2022-10-20 19:09:48 +01:00
Caroline Concatto
4c4909703d [AArch64]SME2 Multi-single vector SVE Destructive 2 and 4 Registers
This patch adds the assembly/disassembly for the following instructions:
  ADD (to vector): Add replicated single vector to multi-vector with multi-vector result.
  SQDMULH (multiple and single vector): Multi-vector signed saturating doubling multiply high by vector.
for 2 and 4 ZA SVE registers.

The reference can be found here:

https://developer.arm.com/documentation/ddi0602/2022-09

It also adds more size for the multiple register tuple:
  ZZ_b_mul_r,  ZZ_h_mul_r,
  ZZZZ_b_mul_r,  ZZZZ_h_mul_r,
for 8 bits and 16 bits with 2 and 4 ZA registers.

Depends on: D135468

Differential Revision: https://reviews.llvm.org/D135563
2022-10-20 18:54:41 +01:00
Caroline Concatto
9db12a45e4 [AArch64]SME2 Multiple vector ternary int/float 2 and 4 registers
This patch adds the assembly/disassembly for the following instructions:
       For INT:
               ADD(array results, multiple vectors): Add multi-vector to multi-vector with ZA array vector results.
               SUB(array results, multiple vectors): Subtract multi-vector from multi-vector with ZA array vector results.
       For FP:
              FMLA (multiple vectors): Multi-vector floating-point fused multiply-add.
              FMLS (multiple vectors): Multi-vector floating-point fused multiply-subtract.

The reference can be found here:

https://developer.arm.com/documentation/ddi0602/2022-09

   This patch also adds a  register operand to represent multiples of ZA multi-vectors.
They are:
        ZZ_s_mul_r, ZZ_d_mul_r, ZZZZ_s_mul_r and ZZZZ_d_mul_r
and represent the Zn or Zm times 2 or 4 according to the vector group.

Depends on: D135455

Differential Revision: https://reviews.llvm.org/D135468
2022-10-20 18:44:22 +01:00
Caroline Concatto
2ecbe8c38c [AArch64] SME2 Single-multi vector ternary int/FP 2 and 4 registers
This patch adds the assembly/disassembly for the following instructions:

For INT:
    ADD(array results, multiple and single vector): Add replicated single
        vector to multi-vector with ZA array vector results.
    SUB(array results, multiple and single vector): Subtract replicated single
        vector from multi-vector with ZA array vector results.
For FP:
    FMLA (multiple and single vector): Multi-vector floating-point fused
          multiply-add by vector.
    FMLS (multiple and single vector): Multi-vector floating-point
          multiply-subtract long by vector.
The reference can be found here:

https://developer.arm.com/documentation/ddi0602/2022-09

The Matriz Operand has 2 new sizes 32(.s) and 64(.d) bits
(MatrixOp32 and MatrixOp64)

Depends on: D135448

Depends on:  D135952

Differential Revision: https://reviews.llvm.org/D135455
2022-10-19 17:49:48 +01:00
Caroline Concatto
579ca5e7e1 [AArch64] Replace sme-i64 by sme-i16i64 and sme-f64 by sme-f64f64
The names in developer.arm for these SME features are:
  HaveSMEI16I64 and HaveSMEF64F64
so the new flag names are consistent with the documentation page

Reviewed By: sdesmalen, c-rhodes

Differential Revision: https://reviews.llvm.org/D135974
2022-10-19 10:56:46 +01:00
Eli Friedman
d6481dc88c [AArch64][Windows] Add MC support for save_any_reg.
Representing this as 12 separate operations is a bit ugly, but
trying to represent the different modes using a bitfield seemed worse.

Differential Revision: https://reviews.llvm.org/D135417
2022-10-18 11:45:27 -07:00
Martin Storsjö
918f6f581d [AArch64] [SEH] Rename pac_sign_return_address to pac_sign_lr
This new opcode was initially documented as "pac_sign_return_address"
in https://github.com/MicrosoftDocs/cpp-docs/pull/4202, but was
soon afterwards renamed into "pac_sign_lr" in
https://github.com/MicrosoftDocs/cpp-docs/pull/4209, as the other
name was unwieldy, and there were no other external references to
that name anywhere.

Rename our external .seh assembler directive - it hasn't been merged
for very long yet, so there's probably no external use to account for.

Rename all other internal references to the opcode similarly.

Differential Revision: https://reviews.llvm.org/D135762
2022-10-12 22:19:59 +03:00
Martin Storsjö
c43bff64e9 [AArch64] Add support for the SEH opcode for return address signing
This was documented upstream in
https://github.com/MicrosoftDocs/cpp-docs/pull/4202.

Differential Revision: https://reviews.llvm.org/D135276
2022-10-12 11:07:11 +03:00
Kazu Hirata
8b1b0d1d81 Revert "Use std::is_same_v instead of std::is_same (NFC)"
This reverts commit c5da37e42d388947a40654b7011f2a820ec51601.

This patch seems to break builds with some versions of MSVC.
2022-08-20 23:00:39 -07:00
Kazu Hirata
c5da37e42d Use std::is_same_v instead of std::is_same (NFC) 2022-08-20 22:36:26 -07:00
Fangrui Song
de9d80c1c5 [llvm] LLVM_FALLTHROUGH => [[fallthrough]]. NFC
With C++17 there is no Clang pedantic warning or MSVC C5051.
2022-08-08 11:24:15 -07:00