For legacy (arithmetic) instructions, the operand size override prefix (0x66)
is used to switch the operand data size from 32b to 16b (in 32/64-bit mode),
16b to 32b (in 16-bit mode). That's why we set OpSize16 for 16-bit instructions
and set OpSize32 for 32-bit instructions.
But it's not a generic rule any more after APX. APX adds 4 variants for
arithmetic instructions: promoted EVEX, NDD (new data destination), NF (no flag),
NF_NDD. All the 4 variants are in EVEX space and only legal in 64-bit
mode. EVEX.pp is set to 01 for the 16-bit instructions to encode 0x66.
For APX, we should set OpSizeFixed for 8/16/32/64-bit variants and set PD for
the 16-bit variants.
Hence, to reuse the classes ITy and its subclasses BinOp* for APX instructions,
we extract the OpSize setting from the class ITy.
It helps simplify the class definitions. Now, the only explicit usage of PS is
to check prefix 0x66/0xf2/0xf3 can not be used a prefix, e.g. wbinvd.
See 82974e0114f02ffc07557e217d87f8dc4e100a26 for more details.
This implements assembly support for the Memory Systems Extensions
introduced as part of the Armv9.5-A architecture version.
The changes include:
* New subtarget feature for FEAT_TLBIW.
* New system registers for FEAT_HDBSS:
* HDBSSBR_EL2 and HDBSSPROD_EL2.
* New system registers for FEAT_HACDBS:
* HACDBSBR_EL2 and HACDBSCONS_EL2.
* New TLBI instructions for FEAT_TLBIW:
* VMALLWS2E1(nXS), VMALLWS2E1IS(nXS) and VMALLWS2E1OS(nXS).
* New system register for FEAT_FGWTE3:
* FGWTE3_EL3.
This adds the following instructions which are added in PAuthLR:
- PACIA171615
- PACIB171615
- AUTIA171615
- AUTIB171615
Also updates some encodings to match final published values.
Documentation can be found here:
https://developer.arm.com/documentation/ddi0602/2023-12/Base-Instructions
Co-authored-by: Lucas Prates <lucas.prates@arm.com>
Instead, we add a `BranchOnly` parameter to indicate that only
branches with its predecessors will be fused.
X86 is the only user of `createBranchMacroFusionDAGMutation`.
This can help us aovid introducing new classes T_MAP*PS|PD|XS|XD
when a new opcode map is supported.
And, T_MAP*PS|PD|XS|XD does not look better than T_MAP*, PS|PD|XS|XD.
`VEX_4V` does not look simpler than `VEX, VVVV`. It's kind of confusing
b/c classes like `VEX_L`, `VEX_LIG` do not imply `VEX` but it does.
For APX, we have promote EVEX, NDD, NF and NDD_NF instructions. All of
the 4 variants are in EVEX space and NDD/NDD_NF set the VVVV fields.
To extract the common fields (e.g EVEX) into a class and set VVVV
conditionally, we need VVVV to not imply other prefixes.
ISD::VP_MERGE treats the false operand as the source for elements past
VL. The vmerge instruction encodes 3 registers and treats the vd
register as the source for the tail.
This patch adds a new ISD opcode that models the tail source explicitly.
During lowering we copy the false operand to this operand.
I think we can merge RISCVISD::VSELECT_VL with this new opcode by using
an UNDEF passthru, but I'll save that for another patch.
By looking at whether a global is large instead of looking at the code
model.
This also fixes references to large data in the small code model.
We now always fold any 32-bit offset into the addressing mode with the
large code model since it uses 64-bit relocations.
Depositing value into the lowest byte/word is a common code pattern.
This patch improves the code generation for it to avoid redundant AND
and OR operations.
IR intrinsics were already defined, but no codegen support had been
added.
I extracted this code from our downstream. Some of it may have come from
https://repo.hca.bsc.es/gitlab/rferrer/llvm-epi/ originally.
According to Intel SDE, ADCX reads CF and ADOX reads OF. `Uses` was
set to empty by accident, the bug was not exposed b/c compiler never
emits these instructions.
1. Simplify the variable name
2. Change HasOddOpcode to HasEvenOpcode b/c
a. opcode of any 8-bit arithmetic instruction is even
b. opcode of a 16/32/64-bit arithmetic instruction is usually
odd, but it can be even sometimes, e.g. INC/DEC, ADCX/ADOX
c. so that we can remove `let Opcode = o` for the mentioned corner
cases.
- Adds a new +pc option to -mbranch-protection that will enable
the use of PC as a diversifier in PAC branch protection code.
- When +pauth-lr is enabled (-march=armv9.5a+pauth-lr) in combination
with -mbranch-protection=pac-ret+pc, the new 9.5-a instructions
(pacibsppc, retaasppc, etc) are used.
Documentation for the relevant instructions can be found here:
https://developer.arm.com/documentation/ddi0602/2023-09/Base-Instructions/
Co-authored-by: Lucas Prates <lucas.prates@arm.com>
Add assembly/disassembly support for the new PAuthLR instructions
introduced in Armv9.5-A:
- AUTIASPPC/AUTIBSPPC
- PACIASPPC/PACIBSPPC
- PACNBIASPPC/PACNBIBSPPC
- RETAASPPC/RETABSPPC
- PACM
Documentation for these instructions can be found here:
https://developer.arm.com/documentation/ddi0602/2023-09/Base-Instructions/
1. Remove redandunt classes
2. Correct comments
3. Move duplicated `let` statement into class definition
4. Simplify the variable name and align the code
This adds post-legalizing lowering of G_UNMERGE_VALUES which take a vector and
produce scalar values for each lane. They are converted to a G_EXTRACT_VECTOR_ELT
for each lane, allowing all the existing tablegen patterns to apply to them.
A couple of tablegen patterns need to be altered to make sure the type of the
constant operand is known, so that the patterns are recognized under global
isel.
Closes#75662
NDD (new data destination) instructions need to set NoCD8 and EVEX_4V.
EVEX_4V already implies EVEX. If NoCD8 implied EVEX too, we would not
be able to reuse the class.
emitPopInst checks a single function exit MBB. If other paths also exit
the function and any of there terminators uses LR implicitly, it is not
save to clear the Restored bit.
Check all terminators for the function before clearing Restored.
This fixes a mis-compile in outlined-fn-may-clobber-lr-in-caller.ll
where the machine-outliner previously introduced BLs that clobbered LR
which in turn is used by the tail call return.
Alternative to #73553
This introduces assembly support for the Checked Pointer Arithmetic
Extension (FEAT_CPA), annouced as part of the Armv9.5-A architecture
version.
The changes include:
* New subtarget feature for FEAT_CPA
* New scalar instruction for pointer arithmetic
* ADDPT, SUBPT, MADDPT, and MSUBPT
* New SVE instructions for pointer arithmetic
* ADDPT (vectors, predicated), ADDPT (vectors, unpredicated)
* SUBPT (vectors, predicated), SUBPT (vectors, unpredicated)
* MADPT and MLAPT
* New ID_AA64ISAR3_EL1 system register
Mode details about the extension can be found at:
* https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/arm-a-profile-architecture-developments-2023
* https://developer.arm.com/documentation/ddi0602/2023-09/
Co-authored-by: Rodolfo Wottrich <rodolfo.wottrich@arm.com>
Return ConstantPoolSDNode instead of const ConstantPoolSDNode - doesn't affect the accessors at all and makes it easier to use result in calls expecting a SDNode.
Return poison instead of undef for non-demanded lanes in the AMDGPU
demanded element simplification hook.
Also bail out of dmask is 0, as this case has special semantics:
> If DMASK==0, the TA overrides DMASK=1 and puts zeros in VGPR followed by
> LWE status if exists. TFE status is not generated since the fetch is dropped.