75658 Commits

Author SHA1 Message Date
Kazu Hirata
1daf2994de [llvm] Use StringRef::contains (NFC) 2023-12-23 22:21:52 -08:00
Shengchen Kan
17ff25a58e [X86][NFC] Not infer OpSize from Xi8|16|32|64
For legacy (arithmetic) instructions, the operand size override prefix (0x66)
is used to switch the operand data size from 32b to 16b (in 32/64-bit mode)
and from 16b to 32b (in 16-bit mode). That's why we set OpSize16 for 16-bit
instructions and OpSize32 for 32-bit instructions.

But this is no longer a generic rule after APX. APX adds 4 variants for
arithmetic instructions: promoted EVEX, NDD (new data destination), NF (no flag),
NF_NDD. All the 4 variants are in EVEX space and only legal in 64-bit
mode. EVEX.pp is set to 01 for the 16-bit instructions to encode 0x66.
For APX, we should set OpSizeFixed for 8/16/32/64-bit variants and set PD for
the 16-bit variants.

Hence, to reuse the classes ITy and its subclasses BinOp* for APX instructions,
we extract the OpSize setting from the class ITy.
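To illustrate the legacy rule described above, here is a minimal C model of how the effective operand size follows from the CPU mode, the 0x66 prefix and REX.W. It is a sketch, not LLVM code: `cpu_mode` and `operand_size_bits` are hypothetical names, and instruction-specific defaults (e.g. instructions that default to 64-bit operands in 64-bit mode) are deliberately ignored.

```c
#include <stdbool.h>

/* Hypothetical model of legacy operand-size selection. Only the 0x66 prefix
   and REX.W are considered; instruction-specific defaults are ignored. */
enum cpu_mode { MODE_16, MODE_32, MODE_64 };

unsigned operand_size_bits(enum cpu_mode mode, bool has_66, bool rex_w)
{
    if (mode == MODE_64 && rex_w)
        return 64;               /* REX.W takes precedence over 0x66 */
    if (mode == MODE_16)
        return has_66 ? 32 : 16; /* 16-bit mode: 0x66 switches 16b -> 32b */
    return has_66 ? 16 : 32;     /* 32/64-bit mode: 0x66 switches 32b -> 16b */
}
```

For the APX variants, which are only legal in 64-bit mode, the message notes that EVEX.pp = 01 plays the role of the 0x66 byte for the 16-bit forms.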
2023-12-24 12:00:25 +08:00
Shengchen Kan
6e20df1a3b [X86][NFC] Set default OpPrefix to PS for XOP/VEX/EVEX instructions
It helps simplify the class definitions. Now the only explicit use of PS is
to indicate that 0x66/0xf2/0xf3 cannot be used as a prefix, e.g. for wbinvd.

See 82974e0114f02ffc07557e217d87f8dc4e100a26 for more details.
2023-12-24 10:20:40 +08:00
Momchil Velikov
4b6968952e
[AArch64] Implement spill/fill of predicate pair register classes (#76068)
We were getting an ICE (internal compiler error) with, e.g.:
```c
#include <arm_sve.h>

void g();
svboolx2_t f0(int64_t i, int64_t n) {
    svboolx2_t r = svwhilelt_b16_x2(i, n);
    g();
    return r;
}
```
2023-12-22 15:54:12 +00:00
Lucas Duarte Prates
e4f1c52832
[AArch64] Assembly support for the Armv9.5-A Memory System Extensions (#76237)
This implements assembly support for the Memory System Extensions
introduced as part of the Armv9.5-A architecture version.
The changes include:
* New subtarget feature for FEAT_TLBIW.
* New system registers for FEAT_HDBSS:
  * HDBSSBR_EL2 and HDBSSPROD_EL2.
* New system registers for FEAT_HACDBS:
  * HACDBSBR_EL2 and HACDBSCONS_EL2.
* New TLBI instructions for FEAT_TLBIW:
  * VMALLWS2E1(nXS), VMALLWS2E1IS(nXS) and VMALLWS2E1OS(nXS).
* New system register for FEAT_FGWTE3:
  * FGWTE3_EL3.
2023-12-22 14:40:29 +00:00
Tomas Matheson
f5ab0bb148
[AArch64] paci<k>171615 auti<k>171615 assembly (#76227)
This adds the following instructions, introduced in PAuthLR:
 - PACIA171615
 - PACIB171615
 - AUTIA171615
 - AUTIB171615

Also updates some encodings to match final published values.

Documentation can be found here:

https://developer.arm.com/documentation/ddi0602/2023-12/Base-Instructions

Co-authored-by: Lucas Prates <lucas.prates@arm.com>
2023-12-22 13:54:21 +00:00
Lucas Duarte Prates
7109a462cd
[AArch64] Assembly support for the Armv9.5-A RAS Extensions (#76161)
This implements assembly support for the RAS extensions introduced as
part of the Armv9.5-A architecture version.
The changes include:
* New system registers for Delegated SError exceptions for EL3 (FEAT_E3DSE):
  * VDISR_EL3
  * VSESR_EL3

More details about these extensions can be found at:
* https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/arm-a-profile-architecture-developments-2023
* https://developer.arm.com/documentation/ddi0602/2023-09/

Co-authored-by: Jirui Wu <jirui.wu@arm.com>
Co-authored-by: Oliver Stannard <oliver.stannard@arm.com>
2023-12-22 10:06:06 +00:00
Wang Pengcheng
17858ce6f3
[MacroFusion] Remove createBranchMacroFusionDAGMutation (#76209)
Instead, we add a `BranchOnly` parameter to indicate that only
branches will be fused with their predecessors.

X86 is the only user of `createBranchMacroFusionDAGMutation`.
2023-12-22 16:31:38 +08:00
Shengchen Kan
ff32ab3ae7 [X86][NFC] Not imply TB in PS|PD|XS|XD
This can help us avoid introducing new classes T_MAP*PS|PD|XS|XD
when a new opcode map is supported.

And, T_MAP*PS|PD|XS|XD does not look better than T_MAP*, PS|PD|XS|XD.
2023-12-22 15:44:30 +08:00
XinWang10
1d4691a233
[X86][MC] Support Enc/Dec for EGPR for promoted CMPCCXADD instruction (#76125)
R16-R31 were added to the GPRs in
https://github.com/llvm/llvm-project/pull/70958.
This patch supports the encoding/decoding for the promoted CMPCCXADD
instruction in EVEX space.

RFC:
https://discourse.llvm.org/t/rfc-design-for-apx-feature-egpr-and-ndd-support/73031/4
2023-12-22 15:19:56 +08:00
Wang Pengcheng
f9c908862a
[RISCV] Split TuneShiftedZExtFusion (#76032)
We split `TuneShiftedZExtFusion` into three fusions to make them
reusable and match the GCC implementation[1].

The zexth/zextw fusions can be reused by XiangShan[2] and other
commercial processors, but shifted zero extension is not so common.

`macro-fusions-veyron-v1.mir` is renamed so that it is not tied to a
specific processor.

References:
[1] https://gcc.gnu.org/pipermail/gcc-patches/2023-November/637303.html
[2] https://xiangshan-doc.readthedocs.io/zh_CN/latest/frontend/decode
2023-12-22 14:37:26 +08:00
wangpc
90f816e61f [RISCV] Rename TuneVeyronFusions to TuneVentanaVeyron
Fusion features are also added to the processor definition.
2023-12-22 14:29:31 +08:00
XinWang10
7b3323fffb
[X86][MC] Support Enc/Dec for EGPR for promoted CET instruction (#76023)
R16-R31 were added to the GPRs in
https://github.com/llvm/llvm-project/pull/70958.
This patch supports the encoding/decoding for the promoted CET instructions
in EVEX space.

RFC:
https://discourse.llvm.org/t/rfc-design-for-apx-feature-egpr-and-ndd-support/73031/4
2023-12-22 14:11:32 +08:00
Vitaly Buka
0ccc1e7acd Revert "[AArch64] Fold more load.x into load.i with large offset"
Issue #76202

This reverts commit f5687636415969e6d945659a0b78734abdfb0f06.
2023-12-21 21:12:40 -08:00
Matt Arsenault
248fba0cd8 AMDGPU: Remove pointless setOperationAction for xint_to_fp
The legalize action for uint_to_fp/sint_to_fp uses the source integer
type, not the result FP type so setting an action on an FP type does
nothing.
2023-12-22 11:24:35 +07:00
Shengchen Kan
62d8ae0a1e [X86][NFC] Remove class (VEX/EVEX/XOP)_4V and add class VVVV
`VEX_4V` does not look simpler than `VEX, VVVV`. It's somewhat confusing
b/c classes like `VEX_L` and `VEX_LIG` do not imply `VEX`, but `VEX_4V` does.

For APX, we have promoted EVEX, NDD, NF and NDD_NF instructions. All of
the 4 variants are in EVEX space and NDD/NDD_NF set the VVVV fields.
To extract the common fields (e.g. EVEX) into a class and set VVVV
conditionally, we need VVVV to not imply other prefixes.
2023-12-22 10:38:15 +08:00
Craig Topper
e64f5d6305
[RISCV] Replace RISCVISD::VP_MERGE_VL with a new node that has a separate passthru operand. (#75682)
ISD::VP_MERGE treats the false operand as the source for elements past
VL. The vmerge instruction encodes 3 registers and treats the vd
register as the source for the tail.

This patch adds a new ISD opcode that models the tail source explicitly.
During lowering we copy the false operand to this operand.

I think we can merge RISCVISD::VSELECT_VL with this new opcode by using
an UNDEF passthru, but I'll save that for another patch.
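The difference between the two nodes can be sketched as scalar C loops over the lanes. This is an illustrative model only: `vp_merge` and `merge_passthru` are hypothetical helpers, vectors are plain `int` arrays, and `evl`/`vl` are assumed to be at most `n`.

```c
#include <stdbool.h>
#include <stddef.h>

/* ISD::VP_MERGE as described above: lanes at or past EVL come from f. */
void vp_merge(int *out, const bool *mask, const int *t, const int *f,
              size_t evl, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = (i < evl) ? (mask[i] ? t[i] : f[i]) : f[i];
}

/* vmerge-style node: the tail comes from a separate passthru operand. */
void merge_passthru(int *out, const bool *mask, const int *t, const int *f,
                    const int *passthru, size_t vl, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = (i < vl) ? (mask[i] ? t[i] : f[i]) : passthru[i];
}
```

Passing `f` as the `passthru` argument reproduces `vp_merge`, which mirrors the lowering step described above where the false operand is copied into the new passthru operand.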
2023-12-21 14:34:49 -08:00
Arthur Eubanks
7433b1ca3e Reapply "[X86] Set SHF_X86_64_LARGE for globals with explicit well-known large section name (#74381)"
This reverts commit 19fff858931bf575b63a0078cc553f8f93cced20.

This is reapplied now that explicit large globals are handled properly in the small code model.
2023-12-21 10:51:30 -08:00
Arthur Eubanks
2366d53d8d
[X86] Fix more medium code model addressing modes (#75641)
This is done by looking at whether a global is large instead of looking at
the code model.

This also fixes references to large data in the small code model.

We now always fold any 32-bit offset into the addressing mode with the
large code model since it uses 64-bit relocations.
2023-12-21 10:40:56 -08:00
Tomas Matheson
7bd17212ef Re-land "[AArch64] Codegen support for FEAT_PAuthLR" (#75947)
This reverts commit 9f0f5587426a4ff24b240018cf8bf3acc3c566ae.

Fix expensive checks failure by properly marking register def for ADR.
2023-12-21 18:32:55 +00:00
David Li
f44079db22
[ISel] Add pattern matching for depositing subreg value (#75978)
Depositing a value into the lowest byte/word is a common code pattern.
This patch improves the code generation for it to avoid redundant AND
and OR operations.
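At the source level, the pattern looks roughly like the helpers below (hypothetical names; the exact DAG patterns matched by the patch are not reproduced here). Written naively, each deposit is an AND of the destination with an inverted mask followed by an OR with the masked source; a target that can write the low byte or word subregister directly may not need either operation.

```c
#include <stdint.h>

/* Keep the upper bits of x and replace its lowest byte with the low byte of y. */
uint32_t deposit_low_byte(uint32_t x, uint32_t y)
{
    return (x & ~UINT32_C(0xFF)) | (y & UINT32_C(0xFF));
}

/* The same pattern for the lowest 16-bit word. */
uint32_t deposit_low_word(uint32_t x, uint32_t y)
{
    return (x & ~UINT32_C(0xFFFF)) | (y & UINT32_C(0xFFFF));
}
```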
2023-12-21 10:18:57 -08:00
Tomas Matheson
192f720178 Re-land "[AArch64] Add FEAT_PAuthLR assembler support" (#75947)
This reverts commit 199a0f9f5aaf72ff856f68e3bb708e783252af17.
Fixed the left-shift of signed integer which was causing UB.
2023-12-21 18:09:31 +00:00
Craig Topper
0dcff0db3a
[RISCV] Add codegen support for experimental.vp.splice (#74688)
IR intrinsics were already defined, but no codegen support had been
added.

I extracted this code from our downstream. Some of it may have come from
https://repo.hca.bsc.es/gitlab/rferrer/llvm-epi/ originally.
2023-12-21 08:38:32 -08:00
Tomas Matheson
199a0f9f5a Revert "[AArch64] Add FEAT_PAuthLR assembler support"
This reverts commit 934b1099cbf14fa3f86a269dff957da8e5fb619f.

Buildbot failures on sanitizer-x86_64-linux-fast
2023-12-21 16:26:39 +00:00
Tomas Matheson
9f0f558742 Revert "[AArch64] Codegen support for FEAT_PAuthLR"
This reverts commit 5992ce90b8c0fac06436c3c86621fbf6d5398ee5.

Buildbot failures with expensive checks enabled.
2023-12-21 16:25:55 +00:00
Kazu Hirata
e01c063684 [llvm] Use DenseMap::contains (NFC) 2023-12-21 08:18:47 -08:00
Jay Foad
8fdfd34cd2
[AMDGPU] Remove GDS and GWS for GFX12 (#76148) 2023-12-21 15:27:08 +00:00
Shengchen Kan
8eccf2b872 [X86] Set Uses = [EFLAGS] for ADCX/ADOX
According to Intel SDE, ADCX reads CF and ADOX reads OF. `Uses` was
set to empty by accident; the bug was not exposed b/c the compiler never
emits these instructions.
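Why the flag is an input can be seen from a small C model of one add-with-carry step. This is a sketch assuming 64-bit operands; `add_with_flag` is a hypothetical helper. For ADCX the `flag` argument stands for CF and for ADOX it stands for OF; in both cases it is read as the carry-in and overwritten with the carry-out, and no other flag is modeled.

```c
#include <stdbool.h>
#include <stdint.h>

/* One ADCX/ADOX step: *flag is the carry-in (CF or OF) and receives the
   carry-out. */
uint64_t add_with_flag(uint64_t a, uint64_t b, bool *flag)
{
    uint64_t sum = a + b;                   /* wraps modulo 2^64 */
    bool carry1 = sum < a;                  /* carry out of a + b */
    uint64_t res = sum + (*flag ? 1u : 0u); /* add the incoming flag */
    bool carry2 = res < sum;                /* carry out of adding the flag */
    *flag = carry1 || carry2;               /* at most one of the two is set */
    return res;
}
```

Keeping separate CF and OF variables gives two independent carry chains, which is how the pair is typically used in multi-precision arithmetic.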
2023-12-21 23:01:00 +08:00
Shengchen Kan
2fe94cead0 [X86][NFC] Refine code in X86InstrArithmetic.td
1. Simplify the variable name
2. Change HasOddOpcode to HasEvenOpcode b/c
  a. opcode of any 8-bit arithmetic instruction is even
  b. opcode of a 16/32/64-bit arithmetic instruction is usually
     odd, but it can be even sometimes, e.g. INC/DEC, ADCX/ADOX
  c. so that we can remove `let Opcode = o` for the mentioned corner
     cases.
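The even/odd rule in item 2 can be summarized with a small C function. It is a hypothetical model, not the TableGen definition: `arith_opcode` takes the even base opcode and returns the opcode for a given operand width, setting bit 0 for the 16/32/64-bit forms unless the instruction is one of the even-opcode exceptions.

```c
#include <stdbool.h>

/* Hypothetical model of the HasEvenOpcode rule; base is the even opcode
   used by the 8-bit form. */
unsigned arith_opcode(unsigned base, unsigned width_bits, bool has_even_opcode)
{
    if (width_bits == 8 || has_even_opcode)
        return base;  /* 8-bit forms and the exceptions keep the even opcode */
    return base | 1u; /* 16/32/64-bit forms usually set bit 0 */
}
```

For example, `arith_opcode(0x00, 32, false)` gives 0x01, the opcode of ADD r/m32, r32, while ADCX (66 0F 38 F6) keeps its even opcode 0xF6 at 32 bits.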
2023-12-21 22:24:59 +08:00
Tomas Matheson
5992ce90b8 [AArch64] Codegen support for FEAT_PAuthLR
- Adds a new +pc option to -mbranch-protection that will enable
  the use of PC as a diversifier in PAC branch protection code.

- When +pauth-lr is enabled (-march=armv9.5a+pauth-lr) in combination
  with -mbranch-protection=pac-ret+pc, the new 9.5-A instructions
  (pacibsppc, retaasppc, etc) are used.

Documentation for the relevant instructions can be found here:
https://developer.arm.com/documentation/ddi0602/2023-09/Base-Instructions/

Co-authored-by: Lucas Prates <lucas.prates@arm.com>
2023-12-21 14:18:33 +00:00
Oliver Stannard
934b1099cb [AArch64] Add FEAT_PAuthLR assembler support
Add assembly/disassembly support for the new PAuthLR instructions
introduced in Armv9.5-A:

- AUTIASPPC/AUTIBSPPC
- PACIASPPC/PACIBSPPC
- PACNBIASPPC/PACNBIBSPPC
- RETAASPPC/RETABSPPC
- PACM

Documentation for these instructions can be found here:
https://developer.arm.com/documentation/ddi0602/2023-09/Base-Instructions/
2023-12-21 14:18:33 +00:00
Shengchen Kan
b223aebd3f [X86][NFC] Refine code in X86InstrArithmetic.td
1. Remove redundant classes
2. Correct comments
3. Move duplicated `let` statement into class definition
4. Simplify the variable name and align the code
2023-12-21 20:50:09 +08:00
zhongyunde 00443407
f568763641 [AArch64] Fold more load.x into load.i with large offset
The list of load.x instructions follows canFoldIntoAddrMode from D152828.
Also support LDRSroX, which was missing from canFoldIntoAddrMode.
2023-12-21 18:54:15 +08:00
zhongyunde 00443407
32878c2065 [AArch64] merge index address with large offset into base address
An example of this transformation: https://gcc.godbolt.org/z/nhYcWq1WE
Fold
  mov     w8, #56952
  movk    w8, #15, lsl #16
  ldrb    w0, [x0, x8]
into
  add     x0, x0, 1036288
  ldrb    w0, [x0, 3704]

Only LDRBBroX is supported for now.
Fix https://github.com/llvm/llvm-project/issues/71917
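The arithmetic behind the example above can be checked with a short C sketch. The MOV/MOVK pair builds the offset (15 << 16) | 56952 = 1039992, which is then split into a multiple of 4096 for the ADD (which accepts a 12-bit immediate shifted left by 12) and a remainder below 4096 for the unsigned 12-bit LDRB offset. `split_large_offset` and `offset_split` are hypothetical names, and the sketch assumes a byte-sized access, so the LDRB immediate is not scaled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical split of a large byte offset into an ADD part and an LDRB part. */
typedef struct {
    uint32_t add_imm; /* multiple of 4096: fits ADD's imm12, LSL #12 form */
    uint32_t ldr_imm; /* 0..4095: fits LDRB's unsigned imm12 (unscaled) */
} offset_split;

offset_split split_large_offset(uint32_t offset)
{
    assert(offset < (UINT32_C(1) << 24)); /* both halves must be encodable */
    offset_split s;
    s.add_imm = offset & ~UINT32_C(0xFFF);
    s.ldr_imm = offset & UINT32_C(0xFFF);
    return s;
}
```

For the offset in the example, the split gives 1036288 + 3704, matching the `add`/`ldrb` pair shown above.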
2023-12-21 18:54:14 +08:00
David Green
c0931d4950 [AArch64][GlobalISel] Lower scalarizing G_UNMERGE_VALUES to G_EXTRACT_VECTOR_ELT
This adds post-legalizing lowering of G_UNMERGE_VALUES which take a vector and
produce scalar values for each lane. They are converted to a G_EXTRACT_VECTOR_ELT
for each lane, allowing all the existing tablegen patterns to apply to them.

A couple of tablegen patterns need to be altered to make sure the type of the
constant operand is known, so that the patterns are recognized under global
isel.

Closes #75662
2023-12-21 09:22:23 +00:00
Yeting Kuo
9b561ca044
[RISCV] Make performFP_TO_INTCombine fold with ISD::FRINT. (#76020)
Fold (fp_to_int (frint X)) to (fcvt X) without rounding mode.
2023-12-21 15:03:36 +08:00
Brandon Wu
b3769adbc5
[RISCV] Fix wrong lmul for sf_vfnrclip (#76016) 2023-12-21 13:24:26 +08:00
Shengchen Kan
b26c0ed93a [X86][NFC] Remove class BinOpRM_ImplicitUse b/c it's used once only 2023-12-21 11:31:39 +08:00
Shengchen Kan
5fa46daab3 [X86] Replace EVEX_NoCD8 with EVEX, NoCD8
This fixes the build error after
61b58123a3137323d6876006a6171d42e5e03cc1
2023-12-21 11:05:56 +08:00
Shengchen Kan
61b58123a3 [X86][NFC] Not imply EVEX in NoCD8
NDD (new data destination) instructions need to set NoCD8 and EVEX_4V.
EVEX_4V already implies EVEX. If NoCD8 implied EVEX too, we would not
be able to reuse the class.
2023-12-21 10:46:25 +08:00
Craig Topper
b03f0c596a
[RISCV] Add sifive-p450 CPU. (#75760)
This is an out-of-order core with no vector unit. More information:
https://www.sifive.com/cores/performance-p450-470

Scheduler model and other tuning will come in separate patches.
2023-12-20 09:52:02 -08:00
Florian Hahn
b1a5ee1feb
[ARM] Check all terms in emitPopInst when clearing Restored for LR. (#75527)
emitPopInst checks a single function exit MBB. If other paths also exit
the function and any of their terminators uses LR implicitly, it is not
safe to clear the Restored bit.

Check all terminators for the function before clearing Restored.

This fixes a mis-compile in outlined-fn-may-clobber-lr-in-caller.ll
where the machine-outliner previously introduced BLs that clobbered LR
which in turn is used by the tail call return.

Alternative to #73553
2023-12-20 16:56:15 +01:00
Lucas Duarte Prates
d43fc5a6ad Reland: [AArch64] Assembly support for the Checked Pointer Arithmetic Extension (#73777)
This introduces assembly support for the Checked Pointer Arithmetic
Extension (FEAT_CPA), announced as part of the Armv9.5-A architecture
version.

The changes include:
* New subtarget feature for FEAT_CPA
* New scalar instructions for pointer arithmetic
  * ADDPT, SUBPT, MADDPT, and MSUBPT
* New SVE instructions for pointer arithmetic
  * ADDPT (vectors, predicated), ADDPT (vectors, unpredicated)
  * SUBPT (vectors, predicated), SUBPT (vectors, unpredicated)
  * MADPT and MLAPT
* New ID_AA64ISAR3_EL1 system register

More details about the extension can be found at:
* https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/arm-a-profile-architecture-developments-2023
* https://developer.arm.com/documentation/ddi0602/2023-09/

Co-authored-by: Rodolfo Wottrich <rodolfo.wottrich@arm.com>
2023-12-20 15:43:17 +00:00
Simon Pilgrim
6ec350b483 [X86] SimplifyDemandedVectorEltsForTargetShuffle - don't simplify constant mask if it has multiple uses
Avoid generating extra constant vectors
2023-12-20 15:22:48 +00:00
Hassnaa Hamdi
f3dcc0cba9
[LLVM][AArch64][tblgen]: Match clamp pattern (#75529)
Add isel pattern to replace min(max(v1,v2),v3) with clamp
Add tests for uclamp, sclamp, bfclamp, fclamp.
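For the signed integer case, the pattern being matched can be written as a scalar reference in C. This is a sketch with hypothetical helper names; it does not model the floating-point forms (fclamp, bfclamp), whose NaN handling is not covered here. When `v2 <= v3`, the expression clamps `v1` into the range `[v2, v3]`.

```c
/* Scalar reference for min(max(v1, v2), v3) on signed integers. */
int smax_int(int a, int b) { return a > b ? a : b; }
int smin_int(int a, int b) { return a < b ? a : b; }

int clamp_ref(int v1, int v2, int v3)
{
    return smin_int(smax_int(v1, v2), v3);
}
```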
2023-12-20 14:36:58 +00:00
Matt Arsenault
9e574a3936 DAG: Fix expansion of bf16 sourced extloads
Also fix assorted vector extload failures for AMDGPU.
2023-12-20 19:24:27 +07:00
Simon Pilgrim
3974d89bde [X86] getTargetConstantPoolFromBasePtr - drop const qualifier
Return ConstantPoolSDNode instead of const ConstantPoolSDNode. This doesn't affect the accessors at all and makes it easier to use the result in calls expecting an SDNode.
2023-12-20 10:40:13 +00:00
Momchil Velikov
52820bdd68
[AArch64] Update target feature requirements of SVE bfloat instructions (#75596)
According to the latest update of the ISA
https://developer.arm.com/documentation/ddi0602/2023-09/?lang=en all of
the affected instruction encodings now require

    (FEAT_SVE2 or FEAT_SME2) and FEAT_SVE_B16B16
2023-12-20 10:16:40 +00:00
Nikita Popov
9d60e95bcd
[AMDGPU] Use poison instead of undef for non-demanded elements (#75914)
Return poison instead of undef for non-demanded lanes in the AMDGPU
demanded element simplification hook.

Also bail out if dmask is 0, as this case has special semantics:

> If DMASK==0, the TA overrides DMASK=1 and puts zeros in VGPR followed by
> LWE status if exists. TFE status is not generated since the fetch is dropped.
2023-12-20 11:01:59 +01:00
Yeting Kuo
b7376c3196
[RISCV][NFC] Add comments and tests for frint case of performFP_TO_INT_SATCombine. (#76014)
performFP_TO_INT_SATCombine could also handle the pattern (fp_to_int_sat
(frint X)).
2023-12-20 14:56:28 +08:00