66 Commits

Author SHA1 Message Date
Phoebe Wang
5c73c5c9bf
[X86][NFC] Add missing immediate qualifier to VSM3RNDS2 instruction (#131576) 2025-03-17 17:59:39 +08:00
Phoebe Wang
ee2722fc88
[X86][AVX10.2-BF16] Remove [NE]P from intrinsic and instruction name (#123335)
Ref.: https://cdrdv2.intel.com/v1/dl/getContent/828965
2025-01-24 15:49:28 +08:00
Phoebe Wang
24f177df61
[X86][AVX10.2-BF16] Update VCOMISBF16 intrinsics and instructions (#123307)
- Add `I` to intrinsics and instructions
- Add `_` before sbf16 in intrinsics

Ref.: https://cdrdv2.intel.com/v1/dl/getContent/828965
2025-01-24 08:37:29 +08:00
Mikołaj Piróg
25653e558c
[AVX10.2] Update convert chapter intrinsic and mnemonics names (#123656)
Intel spec for avx10.2
(https://cdrdv2.intel.com/v1/dl/getContent/828965) has been updated.
This PR changes relevant names from the "AVX10 CONVERT INSTRUCTIONS"
chapter .
2025-01-23 22:23:56 +08:00
Simon Pilgrim
90e9895a93
[X86] Handle BSF/BSR "zero-input pass through" behaviour (#123623)
Intel docs have been updated to be similar to AMD and now describe
BSF/BSR as not changing the destination register if the input value was
zero, which allows us to support CTTZ/CTLZ zero-input cases by setting
the destination to support a NumBits result (BSR is a bit messy as it
has to be XOR'd to create a CTLZ result). VIA/Zhaoxin x86_64 CPUs have also
been confirmed to match this behaviour.

This patch adjusts the X86ISD::BSF/BSR nodes to take a "pass through"
argument for zero-input cases, by default this is set to UNDEF to match
existing behaviour, but it can be set to a suitable value if supported.

There are still some limits to this - its only supported for x86_64
capable processors (and I've only enabled it for x86_64 codegen), and
Intel CPUs sometimes zero the upper 32-bits of a pass through register
when used for BSR32/BSF32 with a zero source value (i.e. the whole
64bits may not get passed through).

Fixes #122004
2025-01-23 12:59:59 +00:00
Phoebe Wang
4f40b07533
[X86][AVX10.2-SATCVT][NFC] Remove NE from intrinsic and instruction name (#123275)
Ref.: https://cdrdv2.intel.com/v1/dl/getContent/828965
2025-01-22 22:53:47 +08:00
Phoebe Wang
13c6abfac8
[X86][AVX10.2-MINMAX][NFC] Remove NE[P] from intrinsic and instruction (#123272)
Ref.: https://cdrdv2.intel.com/v1/dl/getContent/828965
2025-01-21 19:55:09 +08:00
Phoebe Wang
9cd774d1e4
[X86][NFC] Move "_Int" after "k"/"kz" (#121450)
Address comment at
https://github.com/llvm/llvm-project/pull/121373#discussion_r1900402932
2025-01-02 21:02:19 +08:00
Phoebe Wang
23ec9ee17e
[X86][AVX10.2] Lower fmininum/fmaximum to VMINMAX* (#121373) 2025-01-02 11:30:26 +08:00
Simon Pilgrim
29f11f0a32
[X86] Add missing reg/imm attributes to VRNDSCALES instruction names (#117203)
More canonicalization of the instruction names to make the predictable - more closely matches VRNDSCALEP / VROUND equivalent instructions
2024-11-22 17:45:30 +00:00
Simon Pilgrim
3a5cf6d99b
[X86] Rename AVX512 VEXTRACT/INSERT??x? to VEXTRACT/INSERT??X? (#116826)
Use uppercase in the subvector description ("32x2" -> "32X4" etc.) - matches what we already do in VBROADCAST??X?, and we try to use uppercase for all x86 instruction mnemonics anyway (and lowercase just for the arg description suffix).
2024-11-20 08:25:01 +00:00
Simon Pilgrim
7dcefb37a4
[X86] Tidyup up AVX512 FPCLASS instruction naming (#116661)
FPCLASS is a unary instruction with an immediate operand - update the naming to match similar instructions (e.g. VPSHUFD) by only using the source reg/mem and immediate in the instruction name
2024-11-19 11:26:46 +00:00
Simon Pilgrim
d4f2b71c3f
[X86] Fix position of immediate argument in AVX512 VPCMP comparisons (#116646)
The 'i' arg was being put between the 'm' and 'b' args instead of afterwards like other avx512 instructions (VCMPPS/D, VPERMILPS/D etc.).
2024-11-19 10:00:24 +00:00
Mahesh-Attarde
e61a7dc256
[X86][AVX512] Use comx for compare (#113567)
We added AVX10.2 COMEF ISA in LLVM, This does not optimize correctly in
scenario mentioned below.
Summary
Input 
```
define i1 @oeq(float %x, float %y) {
    %1 = fcmp oeq float %x, %y
    ret i1 %1
}define i1 @une(float %x, float %y) {
    %1 = fcmp une float %x, %y
    ret i1 %1
}define i1 @ogt(float %x, float %y) {
    %1 = fcmp ogt float %x, %y
    ret i1 %1
}
// Prior AVX10.2, default code generation

oeq:                                    # @oeq
        cmpeqss xmm0, xmm1
        movd    eax, xmm0
        and     eax, 1
        ret
une:                                    # @une
        cmpneqss        xmm0, xmm1
        movd    eax, xmm0
        and     eax, 1
        ret
ogt:                                    # @ogt
        ucomiss xmm0, xmm1
        seta    al
        ret 
```

This patch will remove `cmpeqss` and `cmpneqss`. For complete transform
check unit test.

Continuing on what PR https://github.com/llvm/llvm-project/pull/113098
added

Earlier Legalization and combine expanded `setcc oeq:ch` node into `and`
and `setcc eq` , `setcc o`. From suggestions in community
new internal transform
```
Optimized type-legalized selection DAG: %bb.0 'hoeq:'
SelectionDAG has 11 nodes:
  t0: ch,glue = EntryToken
      t2: f16,ch = CopyFromReg t0, Register:f16 %0
      t4: f16,ch = CopyFromReg t0, Register:f16 %1
    t14: i8 = setcc t2, t4, setoeq:ch
  t10: ch,glue = CopyToReg t0, Register:i8 $al, t14
  t11: ch = X86ISD::RET_GLUE t10, TargetConstant:i32<0>, Register:i8 $al, t10:1

Optimized legalized selection DAG: %bb.0 'hoeq:'
SelectionDAG has 12 nodes:
  t0: ch,glue = EntryToken
        t2: f16,ch = CopyFromReg t0, Register:f16 %0
        t4: f16,ch = CopyFromReg t0, Register:f16 %1
      t15: i32 = X86ISD::UCOMX t2, t4
    t17: i8 = X86ISD::SETCC TargetConstant:i8<4>, t15
  t10: ch,glue = CopyToReg t0, Register:i8 $al, t17
  t11: ch = X86ISD::RET_GLUE t10, TargetConstant:i32<0>, Register:i8 $al, t10:1
```
Earlier transform is mentioned here
https://github.com/llvm/llvm-project/pull/113098#discussion_r1810307663

---------

Co-authored-by: mattarde <mattarde@intel.com>
2024-10-30 16:17:25 +08:00
Freddy Ye
5aa1275d03
[X86] Support SM4 EVEX version intrinsics/instructions. (#113402)
Ref.: https://cdrdv2.intel.com/v1/dl/getContent/671368
2024-10-28 10:46:16 +08:00
Mahesh-Attarde
311e4e3245
[X86][AVX10.2] Support AVX10.2 MOVZXC new Instructions. (#108537)
Ref.: https://cdrdv2.intel.com/v1/dl/getContent/828965

Chapter 14 INTEL® AVX10 ZERO-EXTENDING PARTIAL VECTOR COPY INSTRUCTIONS

---------

Co-authored-by: mattarde <mattarde@intel.com>
2024-09-18 21:01:51 +08:00
Mahesh-Attarde
f5ad9e1ca5
[X86][AVX10.2] Support AVX10.2-COMEF new instructions. (#108063)
Ref.: https://cdrdv2.intel.com/v1/dl/getContent/828965

Chapter 8  AVX10 COMPARE SCALAR FP WITH ENHANCED EFLAGS INSTRUCTIONS

---------

Co-authored-by: mattarde <mattarde@intel.com>
2024-09-18 17:55:29 +08:00
Simon Pilgrim
1e33bd2031 [X86] Add missing immediate qualifier to the (V)PINSR/PEXTR instruction names
Makes it easier to algorithmically recreate the instruction name in various analysis scripts I'm working on
2024-09-15 14:12:55 +01:00
Simon Pilgrim
7048857f52 [X86] Add missing immediate qualifier to the (V)EXTRACTPS instruction names
Makes it easier to algorithmically recreate the instruction name in various analysis scripts I'm working on
2024-09-15 13:41:46 +01:00
Simon Pilgrim
614a064cac
[X86] Add missing immediate qualifier to the (V)INSERT/EXTRACT/PERM2 instruction names (#108593)
Makes it easier to algorithmically recreate the instruction name in various analysis scripts I'm working on
2024-09-15 11:42:13 +01:00
Malay Sanghi
a409ebc1fc
[X86][AVX10.2] Support AVX10.2-SATCVT-DS new instructions. (#102592)
Ref.: https://cdrdv2.intel.com/v1/dl/getContent/828965
2024-09-12 22:45:20 +08:00
Freddy Ye
83ad644afa
[X86][AVX10.2] Support AVX10.2-BF16 new instructions. (#101603)
Ref.: https://cdrdv2.intel.com/v1/dl/getContent/828965
2024-09-04 08:13:24 +08:00
Freddy Ye
7c4cadfc43
[X86][AVX10.2] Support AVX10.2-CONVERT new instructions. (#101600)
Ref.: https://cdrdv2.intel.com/v1/dl/getContent/828965
2024-08-21 15:44:06 +08:00
Freddy Ye
80721e0d6c
[X86][AVX10.2] Support AVX10.2-SATCVT new instructions. (#101599)
Ref.: https://cdrdv2.intel.com/v1/dl/getContent/828965
2024-08-06 19:37:49 +08:00
Phoebe Wang
b0329206db
[X86][AVX10.2] Support AVX10.2 VNNI FP16/INT8/INT16 new instructions (#101783)
Ref.: https://cdrdv2.intel.com/v1/dl/getContent/828965
2024-08-05 18:57:42 +08:00
Freddy Ye
3d5cc7e1e6
[X86][AVX10.2] Support AVX10.2-MINMAX new instructions. (#101598)
Ref.: https://cdrdv2.intel.com/v1/dl/getContent/828965
2024-08-05 11:06:02 +08:00
Phoebe Wang
259ca9ee9c
Reland "[X86][AVX10.2] Support AVX10.2 option and VMPSADBW/VADDP[D,H,S] new instructions (#101452)" (#101616)
Ref.: https://cdrdv2.intel.com/v1/dl/getContent/828965
2024-08-03 09:26:07 +08:00
Phoebe Wang
2e0588d5e1
Revert "[X86][AVX10.2] Support AVX10.2 option and VMPSADBW/VADDP[D,H,S] new instructions" (#101612)
Reverts llvm/llvm-project#101452

There are several buildbot failed. Revert first.
2024-08-02 13:04:10 +08:00
Phoebe Wang
10bad2c8d7
[X86][AVX10.2] Support AVX10.2 option and VMPSADBW/VADDP[D,H,S] new instructions (#101452)
Ref.: https://cdrdv2.intel.com/v1/dl/getContent/828965
2024-08-02 12:10:50 +08:00
Freddy Ye
13b265c7b5
[X86][MC] Support Intel FRED and LKGS instructions. (#91909)
Spec reference: https://cdrdv2.intel.com/v1/dl/getContent/678938
2024-05-15 10:40:16 +08:00
Freddy Ye
de3e4a9dfe
[X86][APX] Remove KEYLOCKER and SHA promotions from EVEX MAP4. (#89173)
APX spec: https://cdrdv2.intel.com/v1/dl/getContent/784266
Change happended in version 4.0.
Removed instructions' Opcodes:
AESDEC128KL
AESDEC256KL
AESDECWIDE128KL
AESDECWIDE256KL
AESENC128KL
AESENC256KL
AESENCWIDE128KL
AESENCWIDE256KL
ENCODEKEY128
ENCODEKEY256
SHA1MSG1
SHA1MSG2
SHA1NEXTE
SHA1RNDS4
SHA256MSG1
SHA256MSG2
SHA256RNDS2
2024-04-19 10:56:59 +08:00
Freddy Ye
f4509cf284
[X86][MC] Support enc/dec for SETZUCC and promoted SETCC. (#86473)
apx-spec: https://cdrdv2.intel.com/v1/dl/getContent/784266
apx-syntax-recommendation:
https://cdrdv2.intel.com/v1/dl/getContent/817241
2024-04-11 10:18:29 +08:00
Simon Pilgrim
ecb34599bd
[X86] Add missing immediate qualifier to the (V)ROUND instructions (#87636)
Makes it easier to algorithmically recreate the instruction name in various analysis scripts I'm working on
2024-04-04 15:20:16 +01:00
Freddy Ye
db7d243978
[X86][MC] Support enc/dec for IMULZU. (#86653)
apx-spec: https://cdrdv2.intel.com/v1/dl/getContent/784266
apx-syntax-recommendation:
https://cdrdv2.intel.com/v1/dl/getContent/817241
2024-03-29 15:52:41 +08:00
XinWang10
7b766a6f50
[X86] Support APX CMOV/CFCMOV instructions (#82592)
This patch support ND CMOV instructions and CFCMOV instructions.

RFC:
https://discourse.llvm.org/t/rfc-design-for-apx-feature-egpr-and-ndd-support/73031/4
2024-03-17 20:18:56 +08:00
Simon Pilgrim
0858c906db [X86] Add missing register qualifier to the VBLENDVPD/VBLENDVPS/VPBLENDVB instruction names
Matches the SSE variants (which has a 0 qualifier to indicate the xmm0 explicit dependency)
2024-03-11 15:48:07 +00:00
Simon Pilgrim
1ec5b1f483 [X86] Add missing immediate qualifier to the (V)PCLMULQDQ instruction names 2024-03-11 13:39:25 +00:00
Simon Pilgrim
2b8f1daf78 [X86] Add missing immediate qualifier to the SSE42 (V)PCMPEST/PCMPIST string instruction names 2024-03-11 13:02:48 +00:00
Simon Pilgrim
92d7aca441
[X86] Add missing immediate qualifier to the (V)CMPSS/D instructions (#84496)
Matches (V)CMPPS/D and makes it easier to algorithmically recreate the instruction name in various analysis scripts I'm working on
2024-03-09 16:21:25 +00:00
Shengchen Kan
1ca8092e87
[X86][MC] Support encoding/decoding for APX CCMP/CTEST (#83863)
APX assembly syntax recommendations:
  https://cdrdv2.intel.com/v1/dl/getContent/817241

NOTE:
The change in llvm/tools/llvm-exegesis/lib/X86/Target.cpp is for test
LLVM ::
tools/llvm-exegesis/X86/latency/latency-SETCCr-cond-codes-sweep.s

For `SETcc`, llvm-exegesis would randomly choose 1 other instruction to
test with `SETcc`, after selecting the instruction, llvm-exegesis would
check if the operand is initialized and valid, if not
`randomizeTargetMCOperand` would choose a value for invalid operand, it
misses support for condition code operand, which cause the flaky failure
after `CCMP` supported.

llvm-exegesis can choose `CCMP` without specifying ccmp feature b/c it
use `MCSubtarget` and only16/32/64 bit is considered.
llvm-exegesis doesn't choose other instructions b/c requirement in
`hasAliasingRegistersThrough`: the instruction should use GPR (defined
by `SETcc`) and define `EFLAGS` (used by `SETcc`).
2024-03-08 20:54:33 +08:00
XinWang10
d9e875dcc1
[X86][MC] Support encoding/decoding for APX variant LZCNT/TZCNT/POPCNT instructions (#79954)
Two variants: promoted legacy, NF (no flags update).

The syntax of NF instructions is aligned with GNU binutils.
https://sourceware.org/pipermail/binutils/2023-September/129545.html
2024-01-31 21:10:02 +08:00
Shengchen Kan
7c3ee7cbe6
[X86][tablgen] Fix the broadcast tables (#79675) 2024-01-28 09:06:27 +08:00
XinWang10
02d56801ee
[X86] Support APX promoted RAO-INT and MOVBE instructions (#77431)
R16-R31 was added into GPRs in
https://github.com/llvm/llvm-project/pull/70958,
This patch supports the promoted RAO-INT and MOVBE instructions in EVEX
space.

RFC:
https://discourse.llvm.org/t/rfc-design-for-apx-feature-egpr-and-ndd-support/73031/4
2024-01-26 14:33:45 +08:00
XinWang10
816cc9d24b
[X86][MC] Support Enc/Dec for NF BMI instructions (#76709)
Promoted BMI instructions were supported in #73899
2024-01-25 10:33:14 +08:00
Shengchen Kan
5c68c6d70f
[X86] Support encoding/decoding and lowering for APX variant SHL/SHR/SAR/ROL/ROR/RCL/RCR/SHLD/SHRD (#78853)
Four variants: promoted legacy, ND (new data destination), NF (no flags
update) and NF_ND (NF + ND).

The syntax of NF instructions is aligned with GNU binutils.
https://sourceware.org/pipermail/binutils/2023-September/129545.html
2024-01-23 10:23:27 +08:00
Shengchen Kan
4daea501c4
[X86][MC] Support encoding/decoding for APX variant MUL/IMUL/DIV/IDIV instructions (#76919)
Four variants: promoted legacy, ND (new data destination), NF (no flags
update) and NF_ND (NF + ND).

The syntax of NF instructions is aligned with GNU binutils.
https://sourceware.org/pipermail/binutils/2023-September/129545.html
2024-01-05 17:16:55 +08:00
Shengchen Kan
dd9681f839
[X86][MC] Support encoding/decoding for APX variant INC/DEC/ADCX/ADOX instructions (#76721)
Four variants: promoted legacy, ND (new data destination), NF (no flags
update) and NF_ND (NF + ND).

The syntax of NF instructions is aligned with GNU binutils.
https://sourceware.org/pipermail/binutils/2023-September/129545.html
2024-01-04 10:12:12 +08:00
XinWang10
d8db2733c8
[X86][MC] Support Enc/Dec for EGPR for promoted CRC32 (#76434)
R16-R31 was added into GPRs in
https://github.com/llvm/llvm-project/pull/70958,
This patch supports the encoding/decoding for promoted CRC32 instruction
in EVEX space.

RFC:
https://discourse.llvm.org/t/rfc-design-for-apx-feature-egpr-and-ndd-support/73031/4
2024-01-02 10:48:42 +08:00
Shengchen Kan
d3ddb93d04 [X86] Fix typo about the internal name of instructions
64ri -> 64ri32
2023-12-29 12:18:34 +08:00
Shengchen Kan
d79ccee8dc
[X86][MC] Support encoding/decoding for APX variant ADD/SUB/ADC/SBB/OR/XOR/NEG/NOT instructions (#76319)
Four variants: promoted legacy, ND (new data destination), NF (no flags
update) and NF_ND (NF + ND).

The syntax of NF instructions is aligned with GNU binutils.
https://sourceware.org/pipermail/binutils/2023-September/129545.html
2023-12-28 21:22:03 +08:00