15 Commits

Author SHA1 Message Date
Simon Pilgrim
1d56138d74 [X86] X86FixupVectorConstants - create f32/f64 broadcast constants if the source constant data was f32/f64
This partially reverts 33819f3bfb9c - the asm comments become a lot messier in #73509 - we're better off ensuring the constant data is the correct type in DAG
2023-12-12 10:32:04 +00:00
Simon Pilgrim
33819f3bfb [X86] X86FixupVectorConstants - create f32/f64 broadcast constants if the source constant data was ANY floating point type
We don't need an exact match, this is mainly cleanup for cases where v2f32 style types have been cast to f64 etc.
2023-12-11 16:23:04 +00:00
Simon Pilgrim
d1deeae094
[X86] Rename VBROADCASTF128/VBROADCASTI128 to VBROADCASTF128rm/VBROADCASTI128rm (#75040)
Add missing rm postfix to show these are load instructions
2023-12-11 11:52:53 +00:00
Simon Pilgrim
539e60c34a [X86] X86FixupVectorConstantsPass - consistently use non-DQI 128/256-bit subvector broadcasts
Without the predicate there's no benefit to using the DQI variants instead of the default AVX512F instructions
2023-11-30 18:33:52 +00:00
Shengchen Kan
bafa51c8a5 [X86] Rename X86MemoryFoldTableEntry to X86FoldTableEntry, NFCI
b/c it's used for element that folds a load, store or broadcast.
2023-11-28 19:49:14 +08:00
Simon Pilgrim
1552b91162 [X86] X86FixupVectorConstantsPass - attempt to match VEX logic ops back to EVEX if we can create a broadcast fold
On non-DQI AVX512 targets, X86InstrInfo::setExecutionDomainCustom will convert EVEX int-domain instructions to VEX fp-domain instructions. But, if we have the chance to use a broadcast fold we're better off using a EVEX instruction, so handle a reverse fold.
2023-11-21 18:01:29 +00:00
Simon Pilgrim
6155fa69fd [X86] X86FixupVectorConstantsPass - pull out the hasAVX2() test and use single ConvertToBroadcast call. NFC.
Matches AVX512 ConvertToBroadcast calls and makes it easier to add extension support in the future.
2023-11-02 17:32:25 +00:00
Simon Pilgrim
f6ff2cc7e0 [X86] X86FixupVectorConstantsPass - attempt to replace full width integer vector constant loads with broadcasts on AVX2+ targets (REAPPLIED)
lowerBuildVectorAsBroadcast will not broadcast splat constants in all cases, resulting in a lot of situations where a full width vector load that has failed to fold but is loading splat constant values could use a broadcast load instruction just as cheaply, and save constant pool space.

This is an updated commit of ab4b924832ce26c21b88d7f82fcf4992ea8906bb after being reverted at 78de45fd4a902066617fcc9bb88efee11f743bc6
2023-06-14 12:48:33 +01:00
Simon Pilgrim
834cc88c5d [X86] X86FixupVectorConstantsPass - attempt to replace full width fp vector constant loads with broadcasts on AVX+ targets (REAPPLIED)
lowerBuildVectorAsBroadcast will not broadcast splat constants in all cases, resulting in a lot of situations where a full width vector load that has failed to fold but is loading splat constant values could use a broadcast load instruction just as cheaply, and save constant pool space.

NOTE: SSE3 targets can use MOVDDUP but not all SSE era CPUs can perform this as cheaply as a vector load, we will need to add scheduler model checks if we want to pursue this.

This is an updated commit of 98061013e01207444cfd3980cde17b5e75764fbe after being reverted at a279a09ab9524d1d74ef29b34618102d4b202e2f
2023-06-13 12:10:11 +01:00
Simon Pilgrim
a279a09ab9 Revert rG98061013e01207444cfd3980 - [X86] X86FixupVectorConstantsPass - attempt to replace full width fp vector constant loads with broadcasts on AVX+ targets
Reverting while we address an existing issue exposed by this (Issue #63108)
2023-06-06 18:44:24 +01:00
Simon Pilgrim
78de45fd4a Revert rGab4b924832ce26c21b88d7f82fcf4992ea8906bb - [X86] X86FixupVectorConstantsPass - attempt to replace full width integer vector constant loads with broadcasts on AVX2+ targets
Reverting while we address an existing issue exposed by this (Issue #63108)
2023-06-06 18:07:33 +01:00
Simon Pilgrim
d6a36619ce [X86] X86FixupVectorConstantsPass - use VBROADCASTSS/VBROADCASTSD for integer vector loads on AVX1-only targets
Matches behaviour in lowerBuildVectorAsBroadcast
2023-05-31 16:39:09 +01:00
Simon Pilgrim
ab4b924832 [X86] X86FixupVectorConstantsPass - attempt to replace full width integer vector constant loads with broadcasts on AVX2+ targets
lowerBuildVectorAsBroadcast will not broadcast splat constants in all cases, resulting in a lot of situations where a full width vector load that has failed to fold but is loading splat constant values could use a broadcast load instruction just as cheaply, and save constant pool space.
2023-05-30 13:17:26 +01:00
Simon Pilgrim
98061013e0 [X86] X86FixupVectorConstantsPass - attempt to replace full width fp vector constant loads with broadcasts on AVX+ targets
lowerBuildVectorAsBroadcast will not broadcast splat constants in all cases, resulting in a lot of situations where a full width vector load that has failed to fold but is loading splat constant values could use a broadcast load instruction just as cheaply, and save constant pool space.

NOTE: SSE3 targets can use MOVDDUP but not all SSE era CPUs can perform this as cheaply as a vector load, we will need to add scheduler model checks if we want to pursue this.
2023-05-29 16:10:52 +01:00
Simon Pilgrim
0b91de5ea3 [X86] Add X86FixupVectorConstantsPass to re-fold AVX512 vector load folds as broadcast folds
This patch analyzes AVX512 instructions for full vector width folded loads from the constant pool and attempts to determine if it can be replaced with a smaller broadcast folded variant. Typically the broadcast opportunities were missed by type-width mismatches or mulituse limitations which have been removed in later passes.

As well as introducing broadcast fold tables (which can hopefully be extended/automated in the future), this also handles mismatches in the AND/ANDN/OR/XOR/TERNLOG type-widths, catching additional missed opportunities.

This is patch is pulled from the ongoing work based on D150143, but without removing the existing DAG constant broadcast lowering code - this patch is currently a late stage cleanup only.

The intention is to add additional broadcast/extension handling of constants in future patches, but it turned out that AVX512 broadcast handling was the easiest to start with.

Differential Revision: https://reviews.llvm.org/D150526
2023-05-23 10:58:17 +01:00