Phoebe Wang
5c73c5c9bf
[X86][NFC] Add missing immediate qualifier to VSM3RNDS2 instruction ( #131576 )
2025-03-17 17:59:39 +08:00
Phoebe Wang
ee2722fc88
[X86][AVX10.2-BF16] Remove [NE]P from intrinsic and instruction name ( #123335 )
...
Ref.: https://cdrdv2.intel.com/v1/dl/getContent/828965
2025-01-24 15:49:28 +08:00
Phoebe Wang
24f177df61
[X86][AVX10.2-BF16] Update VCOMISBF16 intrinsics and instructions ( #123307 )
...
- Add `I` to intrinsics and instructions
- Add `_` before sbf16 in intrinsics
Ref.: https://cdrdv2.intel.com/v1/dl/getContent/828965
2025-01-24 08:37:29 +08:00
Mikołaj Piróg
25653e558c
[AVX10.2] Update convert chapter intrinsic and mnemonics names ( #123656 )
...
The Intel spec for AVX10.2
(https://cdrdv2.intel.com/v1/dl/getContent/828965) has been updated.
This PR changes the relevant names from the "AVX10 CONVERT INSTRUCTIONS"
chapter.
2025-01-23 22:23:56 +08:00
Simon Pilgrim
90e9895a93
[X86] Handle BSF/BSR "zero-input pass through" behaviour ( #123623 )
...
Intel docs have been updated to be similar to AMD and now describe
BSF/BSR as not changing the destination register if the input value was
zero, which allows us to support CTTZ/CTLZ zero-input cases by presetting
the destination so that it yields a NumBits result (BSR is a bit messy as
its result has to be XOR'd to create a CTLZ result). VIA/Zhaoxin x86_64 CPUs
have also been confirmed to match this behaviour.
This patch adjusts the X86ISD::BSF/BSR nodes to take a "pass through"
argument for zero-input cases. By default this is set to UNDEF to match
existing behaviour, but it can be set to a suitable value if supported.
There are still some limits to this: it's only supported for x86_64
capable processors (and I've only enabled it for x86_64 codegen), and
Intel CPUs sometimes zero the upper 32 bits of a pass through register
when used for BSR32/BSF32 with a zero source value (i.e. the whole
64 bits may not get passed through).
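A minimal sketch of the pass-through idea, written as a plain Python model rather than LLVM code. The function names and the BSR seed value (`2 * bits - 1`) are assumptions derived from the description above, not taken from the patch: seeding the destination with `bits ^ (bits - 1)` makes the final XOR produce `bits` for a zero input.

```python
def bsf(src: int, passthrough: int, bits: int = 64) -> int:
    """Model of BSF: index of the lowest set bit, or the old destination if src == 0."""
    src &= (1 << bits) - 1
    if src == 0:
        return passthrough  # destination register is left unchanged
    return (src & -src).bit_length() - 1


def bsr(src: int, passthrough: int, bits: int = 64) -> int:
    """Model of BSR: index of the highest set bit, or the old destination if src == 0."""
    src &= (1 << bits) - 1
    if src == 0:
        return passthrough  # destination register is left unchanged
    return src.bit_length() - 1


def cttz(x: int, bits: int = 64) -> int:
    # Seed the destination with NumBits so a zero input yields NumBits.
    return bsf(x, passthrough=bits, bits=bits)


def ctlz(x: int, bits: int = 64) -> int:
    # BSR returns a bit index, so XOR with (bits - 1) converts it to a
    # leading-zero count. Assumed seed: 2 * bits - 1 == bits ^ (bits - 1),
    # which the XOR turns into `bits` for a zero input.
    return bsr(x, passthrough=2 * bits - 1, bits=bits) ^ (bits - 1)
```

The model ignores the 32-bit caveat mentioned above (upper bits of the pass-through register possibly being zeroed).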
Fixes #122004
2025-01-23 12:59:59 +00:00
Phoebe Wang
4f40b07533
[X86][AVX10.2-SATCVT][NFC] Remove NE from intrinsic and instruction name ( #123275 )
...
Ref.: https://cdrdv2.intel.com/v1/dl/getContent/828965
2025-01-22 22:53:47 +08:00
Phoebe Wang
13c6abfac8
[X86][AVX10.2-MINMAX][NFC] Remove NE[P] from intrinsic and instruction ( #123272 )
...
Ref.: https://cdrdv2.intel.com/v1/dl/getContent/828965
2025-01-21 19:55:09 +08:00
Phoebe Wang
9cd774d1e4
[X86][NFC] Move "_Int" after "k"/"kz" ( #121450 )
...
Address comment at
https://github.com/llvm/llvm-project/pull/121373#discussion_r1900402932
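As an illustration of the rename, the sketch below moves a trailing `_Int` marker behind a `k`/`kz` mask suffix. The helper name and the sample instruction names are hypothetical and only show the suffix ordering; this is not the script used for the patch.

```python
import re

# Matches "_Int" immediately followed by a trailing "k" or "kz" mask suffix.
_INT_BEFORE_MASK = re.compile(r"_Int(kz?)$")


def move_int_after_mask(name: str) -> str:
    """Rewrite '<base>_Intk' / '<base>_Intkz' as '<base>k_Int' / '<base>kz_Int'."""
    return _INT_BEFORE_MASK.sub(r"\1_Int", name)
```

Names without a mask suffix, or already in the new order, pass through unchanged.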
2025-01-02 21:02:19 +08:00
Phoebe Wang
23ec9ee17e
[X86][AVX10.2] Lower fminimum/fmaximum to VMINMAX* ( #121373 )
2025-01-02 11:30:26 +08:00
Simon Pilgrim
29f11f0a32
[X86] Add missing reg/imm attributes to VRNDSCALES instruction names ( #117203 )
...
More canonicalization of the instruction names to make them predictable; this more closely matches the VRNDSCALEP / VROUND equivalent instructions
2024-11-22 17:45:30 +00:00
Simon Pilgrim
3a5cf6d99b
[X86] Rename AVX512 VEXTRACT/INSERT??x? to VEXTRACT/INSERT??X? ( #116826 )
...
Use uppercase in the subvector description ("32x2" -> "32X2" etc.); this matches what we already do in VBROADCAST??X?, and we try to use uppercase for all x86 instruction mnemonics anyway (and lowercase just for the arg description suffix).
2024-11-20 08:25:01 +00:00
Simon Pilgrim
7dcefb37a4
[X86] Tidy up AVX512 FPCLASS instruction naming ( #116661 )
...
FPCLASS is a unary instruction with an immediate operand - update the naming to match similar instructions (e.g. VPSHUFD) by only using the source reg/mem and immediate in the instruction name
2024-11-19 11:26:46 +00:00
Simon Pilgrim
d4f2b71c3f
[X86] Fix position of immediate argument in AVX512 VPCMP comparisons ( #116646 )
...
The 'i' arg was being put between the 'm' and 'b' args instead of afterwards like other avx512 instructions (VCMPPS/D, VPERMILPS/D etc.).
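A small sketch of the suffix ordering described here, assuming a name built from a register destination, a register-or-memory source, an optional broadcast marker, an optional immediate, and an optional mask suffix. The helper `operand_suffix` and its parameters are hypothetical, for illustration only.

```python
def operand_suffix(src_is_mem: bool, broadcast: bool = False,
                   has_imm: bool = False, mask: str = "") -> str:
    """Build an operand suffix with the immediate 'i' placed after 'm' and 'b'."""
    if broadcast and not src_is_mem:
        raise ValueError("broadcast only applies to a memory source")
    if mask not in ("", "k", "kz"):
        raise ValueError("mask must be '', 'k' or 'kz'")
    parts = ["r", "m" if src_is_mem else "r"]
    if broadcast:
        parts.append("b")
    if has_imm:
        parts.append("i")  # immediate comes after the 'm'/'b' qualifiers
    parts.append(mask)
    return "".join(parts)
```

With this ordering a broadcast memory compare with an immediate ends in `rmbi` rather than `rmib`.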
2024-11-19 10:00:24 +00:00
Mahesh-Attarde
e61a7dc256
[X86][AVX512] Use comx for compare ( #113567 )
...
We added the AVX10.2 COMEF ISA to LLVM, but it does not optimize correctly in
the scenario shown below.
Summary
Input
```
define i1 @oeq(float %x, float %y) {
  %1 = fcmp oeq float %x, %y
  ret i1 %1
}
define i1 @une(float %x, float %y) {
  %1 = fcmp une float %x, %y
  ret i1 %1
}
define i1 @ogt(float %x, float %y) {
  %1 = fcmp ogt float %x, %y
  ret i1 %1
}
// Prior to AVX10.2, default code generation
oeq: # @oeq
cmpeqss xmm0, xmm1
movd eax, xmm0
and eax, 1
ret
une: # @une
cmpneqss xmm0, xmm1
movd eax, xmm0
and eax, 1
ret
ogt: # @ogt
ucomiss xmm0, xmm1
seta al
ret
```
This patch removes `cmpeqss` and `cmpneqss`. For the complete transform,
see the unit test.
This continues the work added in PR
https://github.com/llvm/llvm-project/pull/113098.
Earlier, legalization and combine expanded the `setcc oeq:ch` node into an
`and` of `setcc eq` and `setcc o`. Following suggestions from the community,
the new internal transform is:
```
Optimized type-legalized selection DAG: %bb.0 'hoeq:'
SelectionDAG has 11 nodes:
t0: ch,glue = EntryToken
t2: f16,ch = CopyFromReg t0, Register:f16 %0
t4: f16,ch = CopyFromReg t0, Register:f16 %1
t14: i8 = setcc t2, t4, setoeq:ch
t10: ch,glue = CopyToReg t0, Register:i8 $al, t14
t11: ch = X86ISD::RET_GLUE t10, TargetConstant:i32<0>, Register:i8 $al, t10:1
Optimized legalized selection DAG: %bb.0 'hoeq:'
SelectionDAG has 12 nodes:
t0: ch,glue = EntryToken
t2: f16,ch = CopyFromReg t0, Register:f16 %0
t4: f16,ch = CopyFromReg t0, Register:f16 %1
t15: i32 = X86ISD::UCOMX t2, t4
t17: i8 = X86ISD::SETCC TargetConstant:i8<4>, t15
t10: ch,glue = CopyToReg t0, Register:i8 $al, t17
t11: ch = X86ISD::RET_GLUE t10, TargetConstant:i32<0>, Register:i8 $al, t10:1
```
The earlier transform is described here:
https://github.com/llvm/llvm-project/pull/113098#discussion_r1810307663
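To illustrate why a single condition check can replace the `and` of two checks, here is a rough Python model of the flag results. The UCOMISS flag table is the documented one (unordered sets ZF, PF and CF). The UCOMX behaviour is an assumption made for this sketch: ZF is taken to be set only for an ordered-equal result, which is how the single `X86ISD::SETCC` in the DAG dump above is read here. All function names are hypothetical.

```python
import math


def ucomi_flags(x: float, y: float) -> tuple:
    """(ZF, PF, CF) as produced by UCOMISS."""
    if math.isnan(x) or math.isnan(y):
        return (1, 1, 1)  # unordered
    if x > y:
        return (0, 0, 0)
    if x < y:
        return (0, 0, 1)
    return (1, 0, 0)      # equal


def ucomx_zf(x: float, y: float) -> int:
    """ASSUMPTION: the enhanced-EFLAGS compare sets ZF only for ordered-equal."""
    unordered = math.isnan(x) or math.isnan(y)
    return int(not unordered and x == y)


def oeq_via_ucomi(x: float, y: float) -> bool:
    # Needs two checks: ZF set (equal or unordered) AND PF clear (ordered).
    zf, pf, _ = ucomi_flags(x, y)
    return bool(zf and not pf)


def oeq_via_ucomx(x: float, y: float) -> bool:
    # Under the assumption above, one check on ZF is enough.
    return bool(ucomx_zf(x, y))
```

Both helpers agree on every input; the difference is only in how many flag checks are needed.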
---------
Co-authored-by: mattarde <mattarde@intel.com>
2024-10-30 16:17:25 +08:00
Freddy Ye
5aa1275d03
[X86] Support SM4 EVEX version intrinsics/instructions. ( #113402 )
...
Ref.: https://cdrdv2.intel.com/v1/dl/getContent/671368
2024-10-28 10:46:16 +08:00
Mahesh-Attarde
311e4e3245
[X86][AVX10.2] Support AVX10.2 MOVZXC new Instructions. ( #108537 )
...
Ref.: https://cdrdv2.intel.com/v1/dl/getContent/828965
Chapter 14 INTEL® AVX10 ZERO-EXTENDING PARTIAL VECTOR COPY INSTRUCTIONS
---------
Co-authored-by: mattarde <mattarde@intel.com>
2024-09-18 21:01:51 +08:00
Mahesh-Attarde
f5ad9e1ca5
[X86][AVX10.2] Support AVX10.2-COMEF new instructions. ( #108063 )
...
Ref.: https://cdrdv2.intel.com/v1/dl/getContent/828965
Chapter 8 AVX10 COMPARE SCALAR FP WITH ENHANCED EFLAGS INSTRUCTIONS
---------
Co-authored-by: mattarde <mattarde@intel.com>
2024-09-18 17:55:29 +08:00
Simon Pilgrim
1e33bd2031
[X86] Add missing immediate qualifier to the (V)PINSR/PEXTR instruction names
...
Makes it easier to algorithmically recreate the instruction name in various analysis scripts I'm working on
2024-09-15 14:12:55 +01:00
Simon Pilgrim
7048857f52
[X86] Add missing immediate qualifier to the (V)EXTRACTPS instruction names
...
Makes it easier to algorithmically recreate the instruction name in various analysis scripts I'm working on
2024-09-15 13:41:46 +01:00
Simon Pilgrim
614a064cac
[X86] Add missing immediate qualifier to the (V)INSERT/EXTRACT/PERM2 instruction names ( #108593 )
...
Makes it easier to algorithmically recreate the instruction name in various analysis scripts I'm working on
2024-09-15 11:42:13 +01:00
Malay Sanghi
a409ebc1fc
[X86][AVX10.2] Support AVX10.2-SATCVT-DS new instructions. ( #102592 )
...
Ref.: https://cdrdv2.intel.com/v1/dl/getContent/828965
2024-09-12 22:45:20 +08:00
Freddy Ye
83ad644afa
[X86][AVX10.2] Support AVX10.2-BF16 new instructions. ( #101603 )
...
Ref.: https://cdrdv2.intel.com/v1/dl/getContent/828965
2024-09-04 08:13:24 +08:00
Freddy Ye
7c4cadfc43
[X86][AVX10.2] Support AVX10.2-CONVERT new instructions. ( #101600 )
...
Ref.: https://cdrdv2.intel.com/v1/dl/getContent/828965
2024-08-21 15:44:06 +08:00
Freddy Ye
80721e0d6c
[X86][AVX10.2] Support AVX10.2-SATCVT new instructions. ( #101599 )
...
Ref.: https://cdrdv2.intel.com/v1/dl/getContent/828965
2024-08-06 19:37:49 +08:00
Phoebe Wang
b0329206db
[X86][AVX10.2] Support AVX10.2 VNNI FP16/INT8/INT16 new instructions ( #101783 )
...
Ref.: https://cdrdv2.intel.com/v1/dl/getContent/828965
2024-08-05 18:57:42 +08:00
Freddy Ye
3d5cc7e1e6
[X86][AVX10.2] Support AVX10.2-MINMAX new instructions. ( #101598 )
...
Ref.: https://cdrdv2.intel.com/v1/dl/getContent/828965
2024-08-05 11:06:02 +08:00
Phoebe Wang
259ca9ee9c
Reland "[X86][AVX10.2] Support AVX10.2 option and VMPSADBW/VADDP[D,H,S] new instructions ( #101452 )" ( #101616 )
...
Ref.: https://cdrdv2.intel.com/v1/dl/getContent/828965
2024-08-03 09:26:07 +08:00
Phoebe Wang
2e0588d5e1
Revert "[X86][AVX10.2] Support AVX10.2 option and VMPSADBW/VADDP[D,H,S] new instructions" ( #101612 )
...
Reverts llvm/llvm-project#101452
Several buildbots failed. Revert first.
2024-08-02 13:04:10 +08:00
Phoebe Wang
10bad2c8d7
[X86][AVX10.2] Support AVX10.2 option and VMPSADBW/VADDP[D,H,S] new instructions ( #101452 )
...
Ref.: https://cdrdv2.intel.com/v1/dl/getContent/828965
2024-08-02 12:10:50 +08:00
Freddy Ye
13b265c7b5
[X86][MC] Support Intel FRED and LKGS instructions. ( #91909 )
...
Spec reference: https://cdrdv2.intel.com/v1/dl/getContent/678938
2024-05-15 10:40:16 +08:00
Freddy Ye
de3e4a9dfe
[X86][APX] Remove KEYLOCKER and SHA promotions from EVEX MAP4. ( #89173 )
...
APX spec: https://cdrdv2.intel.com/v1/dl/getContent/784266
The change happened in version 4.0.
Opcodes of the removed instructions:
AESDEC128KL
AESDEC256KL
AESDECWIDE128KL
AESDECWIDE256KL
AESENC128KL
AESENC256KL
AESENCWIDE128KL
AESENCWIDE256KL
ENCODEKEY128
ENCODEKEY256
SHA1MSG1
SHA1MSG2
SHA1NEXTE
SHA1RNDS4
SHA256MSG1
SHA256MSG2
SHA256RNDS2
2024-04-19 10:56:59 +08:00
Freddy Ye
f4509cf284
[X86][MC] Support enc/dec for SETZUCC and promoted SETCC. ( #86473 )
...
apx-spec: https://cdrdv2.intel.com/v1/dl/getContent/784266
apx-syntax-recommendation:
https://cdrdv2.intel.com/v1/dl/getContent/817241
2024-04-11 10:18:29 +08:00
Simon Pilgrim
ecb34599bd
[X86] Add missing immediate qualifier to the (V)ROUND instructions ( #87636 )
...
Makes it easier to algorithmically recreate the instruction name in various analysis scripts I'm working on
2024-04-04 15:20:16 +01:00
Freddy Ye
db7d243978
[X86][MC] Support enc/dec for IMULZU. ( #86653 )
...
apx-spec: https://cdrdv2.intel.com/v1/dl/getContent/784266
apx-syntax-recommendation:
https://cdrdv2.intel.com/v1/dl/getContent/817241
2024-03-29 15:52:41 +08:00
XinWang10
7b766a6f50
[X86] Support APX CMOV/CFCMOV instructions ( #82592 )
...
This patch supports ND CMOV instructions and CFCMOV instructions.
RFC:
https://discourse.llvm.org/t/rfc-design-for-apx-feature-egpr-and-ndd-support/73031/4
2024-03-17 20:18:56 +08:00
Simon Pilgrim
0858c906db
[X86] Add missing register qualifier to the VBLENDVPD/VBLENDVPS/VPBLENDVB instruction names
...
Matches the SSE variants (which have a 0 qualifier to indicate the explicit xmm0 dependency)
2024-03-11 15:48:07 +00:00
Simon Pilgrim
1ec5b1f483
[X86] Add missing immediate qualifier to the (V)PCLMULQDQ instruction names
2024-03-11 13:39:25 +00:00
Simon Pilgrim
2b8f1daf78
[X86] Add missing immediate qualifier to the SSE42 (V)PCMPEST/PCMPIST string instruction names
2024-03-11 13:02:48 +00:00
Simon Pilgrim
92d7aca441
[X86] Add missing immediate qualifier to the (V)CMPSS/D instructions ( #84496 )
...
Matches (V)CMPPS/D and makes it easier to algorithmically recreate the instruction name in various analysis scripts I'm working on
2024-03-09 16:21:25 +00:00
Shengchen Kan
1ca8092e87
[X86][MC] Support encoding/decoding for APX CCMP/CTEST ( #83863 )
...
APX assembly syntax recommendations:
https://cdrdv2.intel.com/v1/dl/getContent/817241
NOTE:
The change in llvm/tools/llvm-exegesis/lib/X86/Target.cpp is for the test
LLVM ::
tools/llvm-exegesis/X86/latency/latency-SETCCr-cond-codes-sweep.s
For `SETcc`, llvm-exegesis randomly chooses one other instruction to
test with `SETcc`. After selecting the instruction, llvm-exegesis
checks whether each operand is initialized and valid; if not,
`randomizeTargetMCOperand` chooses a value for the invalid operand. It
lacked support for condition code operands, which caused the flaky failure
once `CCMP` was supported.
llvm-exegesis can choose `CCMP` without the ccmp feature being specified b/c
it uses `MCSubtarget` and only 16/32/64-bit mode is considered.
llvm-exegesis doesn't choose other instructions b/c of the requirement in
`hasAliasingRegistersThrough`: the instruction should use a GPR (defined
by `SETcc`) and define `EFLAGS` (used by `SETcc`).
2024-03-08 20:54:33 +08:00
XinWang10
d9e875dcc1
[X86][MC] Support encoding/decoding for APX variant LZCNT/TZCNT/POPCNT instructions ( #79954 )
...
Two variants: promoted legacy, NF (no flags update).
The syntax of NF instructions is aligned with GNU binutils.
https://sourceware.org/pipermail/binutils/2023-September/129545.html
2024-01-31 21:10:02 +08:00
Shengchen Kan
7c3ee7cbe6
[X86][tablegen] Fix the broadcast tables ( #79675 )
2024-01-28 09:06:27 +08:00
XinWang10
02d56801ee
[X86] Support APX promoted RAO-INT and MOVBE instructions ( #77431 )
...
R16-R31 were added to the GPRs in
https://github.com/llvm/llvm-project/pull/70958.
This patch supports the promoted RAO-INT and MOVBE instructions in EVEX
space.
RFC:
https://discourse.llvm.org/t/rfc-design-for-apx-feature-egpr-and-ndd-support/73031/4
2024-01-26 14:33:45 +08:00
XinWang10
816cc9d24b
[X86][MC] Support Enc/Dec for NF BMI instructions ( #76709 )
...
Promoted BMI instructions were supported in #73899
2024-01-25 10:33:14 +08:00
Shengchen Kan
5c68c6d70f
[X86] Support encoding/decoding and lowering for APX variant SHL/SHR/SAR/ROL/ROR/RCL/RCR/SHLD/SHRD ( #78853 )
...
Four variants: promoted legacy, ND (new data destination), NF (no flags
update) and NF_ND (NF + ND).
The syntax of NF instructions is aligned with GNU binutils.
https://sourceware.org/pipermail/binutils/2023-September/129545.html
2024-01-23 10:23:27 +08:00
Shengchen Kan
4daea501c4
[X86][MC] Support encoding/decoding for APX variant MUL/IMUL/DIV/IDIV instructions ( #76919 )
...
Four variants: promoted legacy, ND (new data destination), NF (no flags
update) and NF_ND (NF + ND).
The syntax of NF instructions is aligned with GNU binutils.
https://sourceware.org/pipermail/binutils/2023-September/129545.html
2024-01-05 17:16:55 +08:00
Shengchen Kan
dd9681f839
[X86][MC] Support encoding/decoding for APX variant INC/DEC/ADCX/ADOX instructions ( #76721 )
...
Four variants: promoted legacy, ND (new data destination), NF (no flags
update) and NF_ND (NF + ND).
The syntax of NF instructions is aligned with GNU binutils.
https://sourceware.org/pipermail/binutils/2023-September/129545.html
2024-01-04 10:12:12 +08:00
XinWang10
d8db2733c8
[X86][MC] Support Enc/Dec for EGPR for promoted CRC32 ( #76434 )
...
R16-R31 were added to the GPRs in
https://github.com/llvm/llvm-project/pull/70958.
This patch supports the encoding/decoding for the promoted CRC32 instruction
in EVEX space.
RFC:
https://discourse.llvm.org/t/rfc-design-for-apx-feature-egpr-and-ndd-support/73031/4
2024-01-02 10:48:42 +08:00
Shengchen Kan
d3ddb93d04
[X86] Fix typo about the internal name of instructions
...
64ri -> 64ri32
2023-12-29 12:18:34 +08:00
Shengchen Kan
d79ccee8dc
[X86][MC] Support encoding/decoding for APX variant ADD/SUB/ADC/SBB/OR/XOR/NEG/NOT instructions ( #76319 )
...
Four variants: promoted legacy, ND (new data destination), NF (no flags
update) and NF_ND (NF + ND).
The syntax of NF instructions is aligned with GNU binutils.
https://sourceware.org/pipermail/binutils/2023-September/129545.html
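A toy model of how the ND and NF variants differ from the legacy form, using ADD as the example. This is a sketch for illustration only: the function `apx_add`, the register/flag dictionaries, and the choice to model just ZF are assumptions, not the encoder/decoder logic from the patch.

```python
MASK64 = (1 << 64) - 1


def apx_add(regs: dict, flags: dict, dst: str, src: str,
            ndd: str = None, nf: bool = False) -> int:
    """Compute regs[dst] + regs[src] under the chosen APX variant.

    ndd: new data destination; if given, the result goes there and
         regs[dst] is left untouched (ND form).
    nf:  no-flags form; if True, the flags are not updated.
    """
    result = (regs[dst] + regs[src]) & MASK64
    target = dst if ndd is None else ndd
    regs[target] = result
    if not nf:
        flags["ZF"] = result == 0  # only ZF is modeled in this sketch
    return result
```

Passing both `ndd` and `nf=True` corresponds to the NF_ND combination; passing neither behaves like the promoted legacy form.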
2023-12-28 21:22:03 +08:00