62 Commits

Author SHA1 Message Date
Shengchen Kan
8e77390c06
[X86][CodeGen] Support folding memory broadcast in X86InstrInfo::foldMemoryOperandImpl (#79761) 2024-01-31 12:51:03 +08:00
Shengchen Kan
169553688c [X86][NFC] Remove TB_FOLDED_BCAST and format code in X86InstrFoldTables.cpp 2024-01-30 00:27:16 +08:00
Shengchen Kan
d133ada946 [X86][CodeGen] Add missing BroadcastTable1 in X86MemUnfoldTable 2024-01-30 00:02:18 +08:00
Shengchen Kan
cfb702676c [X86][NFC] Rename lookupBroadcastFoldTable to lookupBroadcastFoldTableBySize
Address RKSimon's comments in #79761
2024-01-29 23:23:07 +08:00
Shengchen Kan
7c3ee7cbe6
[X86][tablgen] Fix the broadcast tables (#79675) 2024-01-28 09:06:27 +08:00
Simon Pilgrim
b8bbd5fe6f [X86] X86InstrFoldTables.cpp - add Op4 Broadcast Fold/Unfold table entries
Prep work for #73509 (missed in #73654)
2023-11-30 15:20:42 +00:00
Shengchen Kan
a4e1aa256b
[X86][tablgen] Auto-gen broadcast tables (#73654)
1. Add TB_BCAST_SH for FP16
2. Auto-gen 4 broadcast tables BroadcastTable[1-4]

issue: https://github.com/llvm/llvm-project/issues/66360
2023-11-30 22:24:31 +08:00
Shengchen Kan
bafa51c8a5 [X86] Rename X86MemoryFoldTableEntry to X86FoldTableEntry, NFCI
b/c it's used for element that folds a load, store or broadcast.
2023-11-28 19:49:14 +08:00
Shengchen Kan
c66c15a76d [X86] Rename some variables for memory fold and format code, NFCI
1. Rename the names of tables to simplify the print
2. Align the abbreviation in the same file Instr -> Inst
3. Clang-format
4. Capitalize the first char of the variable name
2023-11-28 19:07:44 +08:00
Shengchen Kan
1b1f3c20b5 [X86][CodeGen] Remove duplicated code for the table checks, NFCI 2023-11-28 14:07:32 +08:00
Simon Pilgrim
0b91de5ea3 [X86] Add X86FixupVectorConstantsPass to re-fold AVX512 vector load folds as broadcast folds
This patch analyzes AVX512 instructions for full vector width folded loads from the constant pool and attempts to determine if it can be replaced with a smaller broadcast folded variant. Typically the broadcast opportunities were missed by type-width mismatches or mulituse limitations which have been removed in later passes.

As well as introducing broadcast fold tables (which can hopefully be extended/automated in the future), this also handles mismatches in the AND/ANDN/OR/XOR/TERNLOG type-widths, catching additional missed opportunities.

This is patch is pulled from the ongoing work based on D150143, but without removing the existing DAG constant broadcast lowering code - this patch is currently a late stage cleanup only.

The intention is to add additional broadcast/extension handling of constants in future patches, but it turned out that AVX512 broadcast handling was the easiest to start with.

Differential Revision: https://reviews.llvm.org/D150526
2023-05-23 10:58:17 +01:00
Shengchen Kan
f3d9abf1f8 [X86][mem-fold] Use the generated memory folding table
Reviewed By: yubing

Differential Revision: https://reviews.llvm.org/D147527
2023-04-06 19:49:39 +08:00
Bing1 Yu
4f4b2161ec [X86][NFC] Move MemoryFoldTable2Addr MemoryFoldTable0~4 into X86InstrFoldTables.def
Reviewed By: pengfei, skan

Differential Revision: https://reviews.llvm.org/D142083
2023-02-02 16:38:50 +08:00
serge-sans-paille
38818b60c5
Move from llvm::makeArrayRef to ArrayRef deduction guides - llvm/ part
Use deduction guides instead of helper functions.

The only non-automatic changes have been:

1. ArrayRef(some_uint8_pointer, 0) needs to be changed into ArrayRef(some_uint8_pointer, (size_t)0) to avoid an ambiguous call with ArrayRef((uint8_t*), (uint8_t*))
2. CVSymbol sym(makeArrayRef(symStorage)); needed to be rewritten as CVSymbol sym{ArrayRef(symStorage)}; otherwise the compiler is confused and thinks we have a (bad) function prototype. There was a few similar situation across the codebase.
3. ADL doesn't seem to work the same for deduction-guides and functions, so at some point the llvm namespace must be explicitly stated.
4. The "reference mode" of makeArrayRef(ArrayRef<T> &) that acts as no-op is not supported (a constructor cannot achieve that).

Per reviewers' comment, some useless makeArrayRef have been removed in the process.

This is a follow-up to https://reviews.llvm.org/D140896 that introduced
the deduction guides.

Differential Revision: https://reviews.llvm.org/D140955
2023-01-05 14:11:08 +01:00
Freddy Ye
23f02693ec [X86] Add AVX-VNNI-INT8 instructions.
For more details about these instructions, please refer to the latest ISE document: https://www.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html

Reviewed By: pengfei, skan

Differential Revision: https://reviews.llvm.org/D135938
2022-10-28 10:39:54 +08:00
Freddy Ye
0e720e6ada [X86] Add AVX-IFMA instructions.
For more details about these instructions, please refer to the latest ISE document: https://www.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html

Reviewed By: pengfei, skan

Differential Revision: https://reviews.llvm.org/D135932
2022-10-28 09:42:30 +08:00
Nicolai Hähnle
ede600377c ManagedStatic: remove many straightforward uses in llvm
(Reapply after revert in e9ce1a588030d8d4004f5d7e443afe46245e9a92 due to
Fuchsia test failures. Removed changes in lib/ExecutionEngine/ other
than error categories, to be checked in more detail and reapplied
separately.)

Bulk remove many of the more trivial uses of ManagedStatic in the llvm
directory, either by defining a new getter function or, in many cases,
moving the static variable directly into the only function that uses it.

Differential Revision: https://reviews.llvm.org/D129120
2022-07-10 10:29:15 +02:00
Nicolai Hähnle
e9ce1a5880 Revert "ManagedStatic: remove many straightforward uses in llvm"
This reverts commit e6f1f062457c928c18a88c612f39d9e168f65a85.

Reverting due to a failure on the fuchsia-x86_64-linux buildbot.
2022-07-10 09:54:30 +02:00
Nicolai Hähnle
e6f1f06245 ManagedStatic: remove many straightforward uses in llvm
Bulk remove many of the more trivial uses of ManagedStatic in the llvm
directory, either by defining a new getter function or, in many cases,
moving the static variable directly into the only function that uses it.

Differential Revision: https://reviews.llvm.org/D129120
2022-07-10 09:15:08 +02:00
Shengchen Kan
4d21497006 [X86] Remove TB_NO_REVERSE for 2 memory folding entries
```
X86::MMX_MOVD64from64rr -> X86::MMX_MOVQ64mr
X86::MMX_MOVD64grr -> X86::MMX_MOVD64mr
```

These two entries were added in llvm-svn: 372770.
I think these two should be reversable.

Reviewed By: RKSimon, pengfei

Differential Revision: https://reviews.llvm.org/D122217
2022-04-06 17:21:12 +08:00
Craig Topper
9933015fdd [X86] Fold MMX_MOVD64from64rr + store to MMX_MOVQ64mr instead of MMX_MOVD64from64mr.
MMX_MOVD64from64rr moves an MMX register to a 64-bit GPR.

MMX_MOVD64from64mr is the memory version of moving a MMX register to a
64-bit GPR. It requires the REX.W bit to be set. There are no isel
patterns that use this instruction.

MMX_MOVQ64mr is the MMX register store instruction. It doesn't
require a REX.W prefix. This makes it one byte shorter to encode
than MMX_MOVD64from64mr in many cases.

Both store instructions output the same mnemonic string. The assembler
would choose MMX_MOVQ64mr if it was to parse the output. Which is
another reason using it is the correct thing to do.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D122241
2022-03-22 14:21:55 -07:00
Shengchen Kan
021b42367a [X86] Rename MMX_MOVD64from64rm to MMX_MOVD64from64mr b/c it stores sth, NFC
Reviewed By: pengfei, RKSimon

Differential Revision: https://reviews.llvm.org/D122216
2022-03-22 17:59:28 +08:00
Simon Pilgrim
41052fd699 [X86][MMX] Remove superfluous 'i' from MMX cvt opnames. NFCI.
This is a very old copy+paste typo - none of these cvt ops have an immediate operand.

Noticed while trying to merge MMX instructions into some existing SSE instruction scheduler instregex patterns.
2021-12-12 17:59:16 +00:00
Simon Pilgrim
0a08813cad [X86][MMX] Remove superfluous 'i' from MMX binop opnames. NFCI.
This is a very old copy+paste typo - none of these binops have an immediate operand.

Noticed while trying to merge MMX instructions into some existing SSE instruction scheduler instregex patterns.
2021-12-12 17:59:16 +00:00
Wang, Pengfei
ab40dbfe03 [X86] AVX512FP16 instructions enabling 6/6
Enable FP16 complex FMA instructions.

Ref.: https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D105269
2021-08-30 13:08:45 +08:00
Wang, Pengfei
c728bd5bba [X86] AVX512FP16 instructions enabling 5/6
Enable FP16 FMA instructions.

Ref.: https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D105268
2021-08-24 09:07:19 +08:00
Wang, Pengfei
b088536ce9 [X86] AVX512FP16 instructions enabling 4/6
Enable FP16 unary operator instructions.

Ref.: https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D105267
2021-08-22 08:59:35 +08:00
Wang, Pengfei
2379949aad [X86] AVX512FP16 instructions enabling 3/6
Enable FP16 conversion instructions.

Ref.: https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D105265
2021-08-18 09:03:41 +08:00
Wang, Pengfei
f1de9d6dae [X86] AVX512FP16 instructions enabling 2/6
Enable FP16 binary operator instructions.

Ref.: https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D105264
2021-08-15 08:56:33 +08:00
Liu, Chen3
756f597841 [X86] Support Intel avxvnni
This patch mainly made the following changes:

1. Support AVX-VNNI instructions;
2. Introduce ExplicitVEXPrefix flag so that vpdpbusd/vpdpbusds/vpdpbusds/vpdpbusds instructions only use vex-encoding when user explicity add {vex} prefix.

Differential Revision: https://reviews.llvm.org/D89105
2020-10-31 12:39:51 +08:00
Craig Topper
e28376ec28 [X86] Add i32->float and i64->double bitcast pseudo instructions to store folding table.
We have pseudo instructions we use for bitcasts between these types.
We have them in the load folding table, but not the store folding
table. This adds them there so they can be used for stack spills.

I added an exact size check so that we don't fold when the stack slot
is larger than the GPR. Otherwise the upper bits in the stack slot
would be garbage. That would be fine for Eli's test case in PR47874,
but I'm not sure its safe in general.

A step towards fixing PR47874. Next steps are to change the ADDSSrr_Int
pseudo instructions to use FR32 as the second source register class
instead of VR128. That will keep the coalescer from promoting the
register class of the bitcast instruction which will make the stack
slot 4 bytes instead of 16 bytes.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D89656
2020-10-19 12:53:14 -07:00
Craig Topper
912cd8a37f [X86] Add vpternlog to the broadcast unfolding table. 2020-07-02 13:43:44 -07:00
Georgii Rymar
1647ff6e27 [ADT/STLExtras.h] - Add llvm::is_sorted wrapper and update callers.
It can be used to avoid passing the begin and end of a range.
This makes the code shorter and it is consistent with another
wrappers we already have.

Differential revision: https://reviews.llvm.org/D78016
2020-04-14 14:11:02 +03:00
Craig Topper
7badb38918 [X86] Fix copy/paste mistake in comment. NFC 2020-02-14 09:47:50 -08:00
Craig Topper
0152b106ae [X86] Add the recently added (V)CVTSS2SI/CVTSD2SI instructions used for LRINT/LLRINT to the load folding tables. 2020-02-08 17:54:48 -08:00
Simon Pilgrim
10417ad2e4 [X86] Standardize BROADCAST enum names (PR31079)
Tweak EVEX implementation names so it matches the other variants by adding the 'r' prefix. Oddly some of the subvec broadcast ops already matched.
2020-02-08 16:55:00 +00:00
Simon Pilgrim
0ed79e9b8f [X86] Standardize VPSLLDQ/VPSRLDQ enum names (PR31079)
Tweak EVEX implementation names so it matches the other variants
2020-02-08 14:54:44 +00:00
Craig Topper
8b5ad3d16e [X86] Add broadcast load unfolding support for VPTESTMD/Q and VPTESTNMD/Q.
llvm-svn: 373138
2019-09-28 01:56:36 +00:00
Simon Pilgrim
a7f27f357d [X86] Add MMX MOVD/MOVQ stores to folding tables to support stack folding
llvm-svn: 372770
2019-09-24 16:15:32 +00:00
Craig Topper
0e533ca4bb [X86] Add broadcast load unfolding support for VCMPPS/PD.
llvm-svn: 371487
2019-09-10 05:49:53 +00:00
Craig Topper
a88f58ff0e [X86] Add broadcast load unfolding support for vpcmpeq/vpcmpgt/vpcmp/vpcmpu.
llvm-svn: 371368
2019-09-09 07:46:11 +00:00
Craig Topper
8c2ab1c4cb [X86] Add broadcast load unfold support for smin/umin/smax/umax.
llvm-svn: 371366
2019-09-09 06:32:24 +00:00
Craig Topper
ad7822329f [X86] Add broadcast load unfolding support for VMAXPS/PD and VMINPS/PD.
llvm-svn: 371363
2019-09-09 04:25:01 +00:00
Craig Topper
1829a09bea [X86] Add support for unfold broadcast loads from FMA instructions.
llvm-svn: 371323
2019-09-07 21:54:40 +00:00
Craig Topper
3ab210862a [X86] Add initial support for unfolding broadcast loads from arithmetic instructions to enable LICM hoisting of the load
MachineLICM can hoist an invariant load, but if that load is folded it needs to be unfolded. On AVX512 sometimes this load is an broadcast load which we were previously unable to unfold. This patch adds initial support for that with a very basic list of supported instructions as a starting point.

Differential Revision: https://reviews.llvm.org/D67017

llvm-svn: 370620
2019-09-01 22:14:36 +00:00
Craig Topper
46f2b583a2 [X86] Add MOVSDrr->MOVLPDrm entry to load folding table. Add custom handling to turn UNPCKLPDrr->MOVHPDrm when load is under aligned.
If the load is aligned we can turn UNPCKLPDrr into UNPCKLPDrm.

llvm-svn: 365287
2019-07-08 02:10:20 +00:00
Fangrui Song
dc8de6037c Simplify std::lower_bound with llvm::{bsearch,lower_bound}. NFC
llvm-svn: 364006
2019-06-21 05:40:31 +00:00
Craig Topper
587427716c [X86] Remove MOVDI2SSrm/MOV64toSDrm/MOVSS2DImr/MOVSDto64mr CodeGenOnly instructions.
The isel patterns for these use a bitcast and load/store, but
DAG combine should have canonicalized those away.

For the purposes of the memory folding table these opcodes can be
replaced by the MOVSSrm_alt/MOVSDrm_alt and MOVSSmr/MOVSDmr opcodes.

llvm-svn: 363644
2019-06-18 03:23:15 +00:00
Craig Topper
f3f968adcd [X86] Add TB_NO_REVERSE to some memory folding table entries where the register form requires 64-bit mode, but the memory form does not.
We don't know if its safe to unfold if we're in 32-bit mode.

This is simlar to what was done to some load opcodes in r363523.

I think its pretty unlikely we will try to unfold these anyway so
I don't think this is testable.

llvm-svn: 363595
2019-06-17 18:38:07 +00:00
Craig Topper
9f2f127009 [X86] Add TB_NO_REVERSE to some folding table entries where the register from uses the REX prefix, but the memory form does not.
It would not be safe to unfold the memory form the register form
without checking that we are compiling for 64-bit mode.

This probaby isn't a real functional issue since we are unlikely
to unfold any of these instructions since they don't have any
tied registers, aren't commutable, and don't have any inputs
other than the address.

llvm-svn: 363523
2019-06-16 22:33:09 +00:00