src2 was incorrectly defined as VSrc_f16, but it is tied to dst, which is VGPR_32. As a result, the disassembler failed to decode src2.
Differential Revision: https://reviews.llvm.org/D140299
Fix a bug with neg_lo:[0,1,0] and neg_hi:[0,1,0] modifiers - they are accepted but not encoded.
Differential Revision: https://reviews.llvm.org/D140470
Generate brh, brw and brd instructions for byte-swap operations
on P10, and generate a single instruction for a 32-bit swap followed
by a 16-bit right shift.
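The 32-bit swap followed by a 16-bit right shift is arithmetically the same as byte-reversing the low halfword, which is why one instruction can cover the pattern. A minimal Python sketch of that identity follows; the helper names `bswap16`, `bswap32` and `swap_then_shift` are illustrative only and are not LLVM APIs:

```python
def bswap32(x: int) -> int:
    """Reverse the four bytes of a 32-bit value."""
    return int.from_bytes((x & 0xFFFFFFFF).to_bytes(4, "little"), "big")


def bswap16(x: int) -> int:
    """Reverse the two bytes of a 16-bit value."""
    return int.from_bytes((x & 0xFFFF).to_bytes(2, "little"), "big")


def swap_then_shift(x: int) -> int:
    """A 32-bit byte swap followed by a logical right shift by 16."""
    return bswap32(x) >> 16


# The two forms agree on the low halfword of the input.
value = 0x11223344
print(hex(swap_then_shift(value)), hex(bswap16(value)))
```

The sketch only demonstrates the arithmetic equivalence; it does not model how the backend selects the instruction.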
Reviewed By: stefanp
Differential Revision: https://reviews.llvm.org/D140414
This adds support for the missing Non-maskable Interrupts (FEAT_NMI)
feature from Armv8.8-A, which consists of the `ALLINT` pstate register.
This is a second iteration of the patch from D131389, building on top of
the D139925 changes that enable better support for `msr (immediate)`
instructions that take 1-bit immediates.
Contributors:
* David Candler
* Tomas Matheson
* Sam Elliott
Reviewed By: lenary, tmatheson
Differential Revision: https://reviews.llvm.org/D140216
This adds support for the new PM pstate system register introduced by
the v9.4-A Exception-based Event Profiling extension (FEAT_EBEP).
The new PM pstate register takes a 1-bit immediate and requires
different values to be specified for the higher bits of the Crm field.
To enable that, this patch creates an explicit separation between the
pstate system registers that take 4-bit and 1-bit immediate operands,
allowing each entry to specify the value for the 3 high bits of Crm.
This also updates other pstate registers to correctly accept 4-bit
immediates, matching their decoding specification from the Arm ARM.
These include: `PAN`, `UAO`, `DIT` and `SSBS`.
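One way to picture the separation: a 4-bit pstate register uses the whole Crm field for its immediate, while a 1-bit pstate register combines three fixed high bits from its table entry with the 1-bit immediate. A minimal Python sketch of that packing, assuming the immediate occupies the lowest Crm bit; the function names and the example values are hypothetical and are not taken from the patch or the Arm ARM:

```python
def crm_for_imm4(imm: int) -> int:
    """4-bit form: the immediate fills the whole Crm field."""
    if not 0 <= imm <= 0xF:
        raise ValueError("immediate must fit in 4 bits")
    return imm


def crm_for_imm1(crm_high: int, imm: int) -> int:
    """1-bit form: three fixed high bits plus a 1-bit immediate (assumed layout)."""
    if not 0 <= crm_high <= 0b111:
        raise ValueError("crm_high must fit in 3 bits")
    if imm not in (0, 1):
        raise ValueError("immediate must be 0 or 1")
    return (crm_high << 1) | imm


# Hypothetical entry whose high Crm bits are 0b001, written with immediate 1.
print(bin(crm_for_imm1(0b001, 1)))
```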
More information about this extension and the new register can be found
at:
* https://developer.arm.com/documentation/ddi0601/2022-09/AArch64-Registers/PM--PMU-Exception-Mask
Contributors:
* Lucas Prates
* Sam Elliott
Reviewed By: lenary
Differential Revision: https://reviews.llvm.org/D139925
This patch adds upstream support for FEAT_RASv2 and FEAT_PFAR. Both
are system-register-only, but FEAT_RAS is behind the command-line
extension "+ras", so FEAT_RASv2 is behind "+rasv2".
This patch includes support for ID_AA64MMFR4_EL1. This is an ID system
register so it is not behind any feature flags.
Differential Revision: https://reviews.llvm.org/D139936
The disassembler can successfully decode an SGPR register even when only
VGPR registers are valid for the operand (e.g. VReg_* and VISrc_* operands).
In InstPrinter, detect when the operand's register class does not contain
the register that is being printed. This does not result in an error.
The intended use is for disassembler tests.
Differential Revision: https://reviews.llvm.org/D139646
After https://reviews.llvm.org/D137653 named sub-operands can be used
in the auto-generated instruction decoders. This allows the
auto-generated decoders to work properly, so all the hand-coded
decoders in the sparc target can be removed.
In some instances, a manually-written decoder had not been implemented
for an instruction, and thus that instruction was not decoded
properly. These have been fixed (and tests added).
Differential Revision: https://reviews.llvm.org/D137727
OMod was disabled if OpSel was enabled, but that restriction is more
specific than necessary. Any VOP3 with float operands can use OMod.
On GFX11, FMAC_F16_e64 can use op_sel.
Previously, SIFoldOperands and convertToThreeAddress were accidentally correct when
they reinterpreted the zero OMod operand on V_FMAC_F16_e64 as the OpSel operand on
V_FMA_F16_gfx9_e64. Now we explicitly add op_sel if required.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D139469
The main motivation for this change is to avoid ambiguity: mapping
symbol names may not be unique across a binary, so they do not uniquely
identify a target address. As a result, mapping symbols used as branch
target labels make llvm-objdump output less readable.
Another point is that mapping symbols sometimes appear in
non-allocatable sections, such as debug info sections, which makes
objdump output even more confusing.
For example, a small AArch64 executable may contain plenty of `$d[.*]`
symbols and none of them would be useful as a label for resolving
a branch or a memory operand target address:
```
0000000000000254 l .note.ABI-tag 0000000000000000 $d
00000000000008d4 l .eh_frame 0000000000000000 $d
0000000000000868 l .rodata 0000000000000000 $d
0000000000011028 l .data 0000000000000000 $d
0000000000010db8 l .fini_array 0000000000000000 $d
0000000000010db0 l .init_array 0000000000000000 $d
00000000000008e8 l .eh_frame 0000000000000000 $d
0000000000011034 l .bss 0000000000000000 $d
```
Note that GNU objdump doesn't use mapping symbols as branch target
labels for any of the targets that support such symbols (ARM, AArch64, CSKY).
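The filtering idea can be sketched in a few lines of Python. This is not the llvm-objdump implementation; it assumes the ARM/AArch64 naming convention of `$` followed by one of `a`, `d`, `t` or `x` and an optional `.suffix`, and the names `is_mapping_symbol` and `label_candidates` are hypothetical:

```python
import re

# Assumed convention: "$a", "$d", "$t" or "$x", optionally followed by ".<suffix>".
MAPPING_SYMBOL = re.compile(r"\$[adtx](\..*)?")


def is_mapping_symbol(name: str) -> bool:
    """Return True if the whole name looks like a mapping symbol."""
    return MAPPING_SYMBOL.fullmatch(name) is not None


def label_candidates(symbols):
    """Drop mapping symbols from a list of (address, name) pairs."""
    return [(addr, name) for addr, name in symbols if not is_mapping_symbol(name)]


symbols = [(0x254, "$d"), (0x868, "$d"), (0x700, "main"), (0x720, "$x.1")]
print(label_candidates(symbols))
```

Only `main` survives the filter here, so it is the only name that would be considered as a label for a branch or memory operand target.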
Differential Revision: https://reviews.llvm.org/D139131
Virtual Memory System Architecture (VMSA)
This is part of the 2022 A-Profile Architecture extensions and adds support for
the following:
- Translation Hardening Extension (FEAT_THE)
- 128-bit Page Table Descriptors (FEAT_D128)
- 56-bit Virtual Address (FEAT_LVA3)
- Support for 128-bit System Registers (FEAT_SYSREG128)
- System Instructions that can take 128-bit inputs (FEAT_SYSINSTR128)
- 128-bit Atomic Instructions (FEAT_LSE128)
- Permission Indirection Extension (FEAT_S1PIE, FEAT_S2PIE)
- Permission Overlay Extension (FEAT_S1POE, FEAT_S2POE)
- Memory Attribute Index Enhancement (FEAT_AIE)
New instructions added:
- FEAT_SYSREG128 adds MRRS and MSRR.
- FEAT_SYSINSTR128 adds the SYSP instruction and TLBIP aliases.
- FEAT_LSE128 adds LDCLRP, LDSETP, and SWPP instructions.
- FEAT_THE adds the set of RCW* instructions.
Specs for individual instructions can be found here:
https://developer.arm.com/documentation/ddi0602/2022-09/Base-Instructions/
Contributors:
Keith Walker
Lucas Prates
Sam Elliott
Son Tuan Vu
Tomas Matheson
Differential Revision: https://reviews.llvm.org/D138920
In revision B.q and before of the Armv8-M architecture reference
manual, the vector/scalar forms of the `vmla` and `vmlas` instructions
came in signed and unsigned integer forms, such as `vmla.s8 q0,q1,r2`
or `vmlas.u32 q3,q4,r5`.
Revision B.r has changed this. There are no longer signed and unsigned
versions of these instructions, since they were functionally identical
anyway. Now there is just `vmla.i8` (or `i16` or `i32`, and similarly
for `vmlas`). Bit 28 of the instruction encoding, which was previously
0 for signed or 1 for unsigned, is now expected to be 0 always.
This change updates LLVM to the new version of the architecture. The
obsoleted encodings for unsigned integers are now decoding errors, and
only the still-valid encoding is ever emitted. This shouldn't break
any existing assembly code, because the old signed and unsigned
versions of the mnemonic are still accepted by the assembler (which is
standard practice anyway for all signedness-agnostic MVE integer
instructions).
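The bit-28 rule can be illustrated with a small Python sketch. It assumes the 32-bit word is already known to be a vector/scalar `vmla` or `vmlas` encoding, and the helper names are hypothetical; the test values are arbitrary bit patterns, not real instruction encodings:

```python
SIGNEDNESS_BIT = 28  # formerly 0 for signed, 1 for unsigned


def uses_obsolete_unsigned_form(encoding: int) -> bool:
    """True if bit 28 is set, i.e. the encoding is now a decoding error."""
    return bool((encoding >> SIGNEDNESS_BIT) & 1)


def emitted_encoding(encoding: int) -> int:
    """Encoding with bit 28 cleared, the only form that is still emitted."""
    return encoding & ~(1 << SIGNEDNESS_BIT) & 0xFFFFFFFF


word = 1 << SIGNEDNESS_BIT  # arbitrary pattern with only bit 28 set
print(uses_obsolete_unsigned_form(word), hex(emitted_encoding(word)))
```

This mirrors the behaviour described above: the `.s8`, `.u8` and `.i8` spellings are all accepted by the assembler, but they share the encoding with bit 28 clear.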
Reviewed By: dmgreen, lenary
Differential Revision: https://reviews.llvm.org/D138827
This patch implements assembly support for the 2022 A-Profile Architecture
extension FEAT_LRCPC3. FEAT_LRCPC3 is AArch64 only and introduces new
variants of load/store instructions with release consistency ordering.
Specs for individual instructions can be found here:
https://developer.arm.com/documentation/ddi0602/2022-09/Base-Instructions/
This feature is optionally available from v8.2a and therefore not enabled by
default.
Contributors:
Lucas Prates
Sam Elliott
Son Tuan Vu
Tomas Matheson
Differential Revision: https://reviews.llvm.org/D138579
The following system registers have been missing upstream:
- ID_DFR1_EL1
- AMCG1IDR_EL0 (present when FEAT_AMUv1p1 implemented - optional from v8.6-a)
- HAFGRTR_EL2 (present when FEAT_AMUv1 and FEAT_FGT are implemented)
With regard to HAFGRTR_EL2, this is only present when you have both
extensions. As FEAT_FGT is part of a later architecture, we group it
with those registers. In all honesty, this is a good example of the
kinds of place where just enabling all system registers all the time
would be easiest.
Differential Revision: https://reviews.llvm.org/D138553
This adds support for the 2022 Debug and PMU extensions that are part of
the v8.9-A and v9.4-A architecture versions. This includes:
* New architecture extension for the v9.4-A Instrumentation Extension
(FEAT_ITE), including 'trcit' instruction and system registers
* New system registers for:
* 2022 Debug features (FEAT_Debugv8p9)
* 2022 Performance Monitors Extension features (FEAT_PMUv3p9)
* PMU Snapshot extension (FEAT_PMUv3_SS)
* PMU Fixed-function instruction counter (FEAT_PMUv3_ICNTR)
* System Performance Monitors Extension (FEAT_SPMU)
* Synchronous-exception-based event profiling (FEAT_SEBEP)
* Fine Grained Traps Extension (FEAT_FGT2)
* SPE Data Source filtering (FEAT_SPE_FDS)
More information on the new extensions can be found on:
* https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/arm-a-profile-architecture-2022
* https://developer.arm.com/downloads/-/exploration-tools
Changes by Son Tuan Vu, Sam Elliott and me.
Reviewed By: tmatheson
Differential Revision: https://reviews.llvm.org/D138556
This patch implements the 2022 Architecture General Data-Processing Instructions.
They include:
Common Short Sequence Compression (CSSC) instructions
- scalar comparison instructions
SMAX, SMIN, UMAX, UMIN (32/64 bits) with or without immediate
- ABS (absolute), CNT (count non-zero bits), CTZ (count trailing zeroes)
- command-line options for CSSC
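As a rough reference for what the CSSC operations listed above compute, here is a small Python model. It is a sketch based on the instruction names only: the wrap-around of ABS on the most negative value and the result of CTZ on zero (the register width) are assumptions that should be checked against the Arm specification, and the function names are illustrative.

```python
def _mask(bits: int) -> int:
    return (1 << bits) - 1


def _to_signed(x: int, bits: int) -> int:
    """Interpret the low `bits` of x as a two's-complement value."""
    x &= _mask(bits)
    return x - (1 << bits) if x >> (bits - 1) else x


def smax(a: int, b: int, bits: int = 64) -> int:
    return max(_to_signed(a, bits), _to_signed(b, bits)) & _mask(bits)


def smin(a: int, b: int, bits: int = 64) -> int:
    return min(_to_signed(a, bits), _to_signed(b, bits)) & _mask(bits)


def umax(a: int, b: int, bits: int = 64) -> int:
    return max(a & _mask(bits), b & _mask(bits))


def umin(a: int, b: int, bits: int = 64) -> int:
    return min(a & _mask(bits), b & _mask(bits))


def abs_(x: int, bits: int = 64) -> int:
    # Assumption: the most negative value wraps to itself.
    return abs(_to_signed(x, bits)) & _mask(bits)


def cnt(x: int, bits: int = 64) -> int:
    """Count the non-zero bits."""
    return bin(x & _mask(bits)).count("1")


def ctz(x: int, bits: int = 64) -> int:
    """Count trailing zeroes; assumed to return the width for a zero input."""
    x &= _mask(bits)
    return bits if x == 0 else (x & -x).bit_length() - 1
```

For example, with 32-bit operands `smax(0xFFFFFFFF, 1, 32)` treats the first operand as -1 and picks 1, while `umax` on the same inputs picks `0xFFFFFFFF`.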
Associated with these instructions in the documentation is the Range Prefetch
Memory (RPRFM) instruction, which signals to the memory system that data memory
accesses from a specified range of addresses are likely to occur in the near
future. The instruction lies in hint space, and is made unconditional.
Specs for the individual instructions can be found here:
https://developer.arm.com/documentation/ddi0602/2022-09/Base-Instructions/
Contributors to this patch:
- Cullen Rhodes
- Son Tuan Vu
- Mark Murray
- Tomas Matheson
- Sam Elliott
- Ties Stuij
Reviewed By: lenary
Differential Revision: https://reviews.llvm.org/D138488
Add missing assembler/disassembler tests for INSTID_SALU_CYCLE_2
and INSTID_SALU_CYCLE_3 which are possible arguments in S_DELAY_ALU.
Differential Revision: https://reviews.llvm.org/D138482
D136149 and D136148 renamed the MC test files for VOP3 promoted from VOP1 and
VOP2 in a consistent way. Do the same for VOP3 coming from VOPC.
Reviewed By: dp
Differential Revision: https://reviews.llvm.org/D137950
Add a new instruction called SUBUFS that does saturating subtract.
This instruction is only for Future CPU.
Reviewed By: amyk
Differential Revision: https://reviews.llvm.org/D137643
This adds support for atomic_load, atomic_store, atomic_cmpxchg
and atomic_rmw.
Fixes #48236
Reviewed by: myhsu, efriedma
Differential Revision: https://reviews.llvm.org/D136525
A new register class as well as a number of related subregisters are being added
to Future CPU. These registers are Dense Math Registers (DMR) and are 1024 bits
long. These registers can also be used in consecutive pairs which leads to a
register that is 2048 bits.
This patch also adds 7 new instructions that use these registers. More
instructions will be added in future patches.
Reviewed By: amyk, saghir
Differential Revision: https://reviews.llvm.org/D136366
The sme2 predicate was an AssemblerPredicate, not
AssemblerPredicateWithAll like all the other features, meaning it wasn't
included in +all. This fixes that inconsistency, allowing the
instructions to be decoded by default.
Differential Revision: https://reviews.llvm.org/D137016
This prepares for an upcoming change to make --print-imm-hex the default
behavior of llvm-objdump. These tests were updated in a semi-automatic
fashion.
See D136972 for details.
This is a follow-on to https://reviews.llvm.org/D134073.
The number of MIPS16 changes here is a bit surprising. Many of the
fields with mismatched names were NOT previously choosing the correct
argument positionally, but instead doing something completely wrong
(e.g. it would encode a register where an immediate was expected).
But, machine-code generation for MIPS16 has never actually functioned.
It's also fully untested; thus the MIPS16 changes, despite changing
behavior, break (and fix) zero tests. This change does not fix
MIPS16 output, but it ought to be at least incrementally less broken.
Outside MIPS16, I believe the only functional change is to the 'ginvi'
instruction: it was previously encoding garbage into a field which was
specified to be '00'. Fortunately, it was covered by tests -- and the
tests were testing the incorrect behavior. So, fixed.
Differential Revision: https://reviews.llvm.org/D134220
Correct v_cndmask_b32 to support abs/neg modifiers in dpp/sdwa/e64 variants.
Correct v_cndmask_b16 for proper disassembly of abs/neg modifiers in e64_dpp variants.
Differential Revision: https://reviews.llvm.org/D135900