28 Commits

Author SHA1 Message Date
Simon Pilgrim
bef25ae297 [X86] X86FixupVectorConstants - use explicit register bitwidth for the loaded vector instead of using constant pool bitwidth
Fixes #81136 - we might be loading from a constant pool entry wider than the destination register bitwidth, affecting the vextload scale calculation.

ConvertToBroadcastAVX512 doesn't yet set an explicit bitwidth (it defaults to the constant pool bitwidth) because the original register width is difficult to look up through the fold tables. As we only use rebuildSplatCst there, this shouldn't cause any miscompilations, although it might prevent folding to a broadcast if only the lower bits match a splattable pattern.
2024-02-08 17:39:19 +00:00
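A minimal sketch of the scale calculation that bef25ae297 fixes, written as standalone C++ rather than against LLVM. `ExtLoadShape` and `extLoadShape` are hypothetical names, and the sketch assumes an extending load that widens each SrcEltBits-wide element to DstEltBits. The element count comes from the destination register width; deriving it from a wider constant pool entry is what inflates the count.

```cpp
#include <cassert>

// Hypothetical description of an extending load (not an LLVM type).
struct ExtLoadShape {
  unsigned NumElts; // elements written to the destination register
  unsigned MemBits; // bits of truncated constant actually read from memory
};

// Derive the shape from the destination register width. Passing the
// constant pool entry width instead would overcount NumElts whenever the
// pool entry is wider than the register being loaded.
ExtLoadShape extLoadShape(unsigned RegBitWidth, unsigned DstEltBits,
                          unsigned SrcEltBits) {
  assert(DstEltBits > SrcEltBits && "extension must widen each element");
  assert(RegBitWidth % DstEltBits == 0 && "register holds whole elements");
  unsigned NumElts = RegBitWidth / DstEltBits;
  return {NumElts, NumElts * SrcEltBits};
}
```

For example, a 128-bit destination with i32 lanes extended from i8 reads 4 elements (32 bits of memory); plugging in a 256-bit pool entry width instead would claim 8 elements (64 bits).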
Simon Pilgrim
f407be32fe [X86] X86FixupVectorConstants - rename FixupEntry::BitWidth to FixupEntry::MemBitWidth NFC.
Make it clearer that this refers to the width of the constant element stored in memory, which won't match the register element width after a sext/zextload.
2024-02-08 16:35:14 +00:00
Simon Pilgrim
b846613837 [X86] X86FixupVectorConstants - add destination register width to rebuildSplatCst/rebuildZeroUpperCst/rebuildExtCst callbacks
As found in #81136, we aren't correctly handling cases where the constant pool entry is wider than the destination register width, causing incorrect scaling of the truncated constant for load-extension cases.

This first patch just pulls out the destination register width argument; it's still currently driven by the constant pool entry, but that will be addressed in a follow-up.
2024-02-08 16:35:13 +00:00
Simon Pilgrim
50d38cf934 [X86] X86FixupVectorConstants.cpp - update comment to describe all the constant load ops performed by the pass
2024-02-07 14:07:02 +00:00
Simon Pilgrim
69ffa7be3b [X86] X86FixupVectorConstants - load+zero vector constants that can be stored in a truncated form (#80428)
Further develops the vsextload support added in #79815 / b5d35feacb7246573c6a4ab2bddc4919a4228ed5: reduces the size of the vector constant by storing it in the constant pool in a truncated form and zero-extending it as part of the load.
2024-02-05 12:17:58 +00:00
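The zero-extension test behind 69ffa7be3b can be sketched as follows, assuming each element is held in a DstEltBits-wide lane and the candidate storage width is SrcEltBits. `truncateForZExt` is a hypothetical helper for illustration, not the pass's rebuildExtCst callback.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical helper: if every element has all bits at or above SrcEltBits
// clear, return the elements (storing each at SrcEltBits width is the
// truncation) so a zero-extending load can rebuild them; otherwise nullopt.
std::optional<std::vector<uint64_t>>
truncateForZExt(const std::vector<uint64_t> &Elts, unsigned DstEltBits,
                unsigned SrcEltBits) {
  assert(SrcEltBits < DstEltBits && DstEltBits <= 64);
  std::vector<uint64_t> Truncated;
  for (uint64_t V : Elts) {
    assert((DstEltBits == 64 || (V >> DstEltBits) == 0) && "element wider than lane");
    if ((V >> SrcEltBits) != 0) // SrcEltBits < 64, so the shift is defined
      return std::nullopt;      // upper bits set: zero-extension can't rebuild it
    Truncated.push_back(V);
  }
  return Truncated;
}
```

With i32 lanes, {1, 256} cannot be stored as i8 but can be stored as i16.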
Simon Pilgrim
b5d35feacb [X86] X86FixupVectorConstants - load+sign-extend vector constants that can be stored in a truncated form (#79815)
Reduce the size of the vector constant by storing it in the constant pool in a truncated form, and sign-extend it as part of the load.

I've extended the existing FixupConstant functionality to support these sext constant rebuilds - we still select the smallest stored constant entry and prefer vzload/broadcast/vextload for the same bitwidth to avoid domain flips.

I intend to add the matching load+zero-extend handling in a future PR, but that requires some alterations to the existing MC shuffle comments handling first.
2024-02-02 11:28:58 +00:00
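A similar sketch for the sign-extension case in b5d35feacb: an element can be stored truncated only if sign-extending its low SrcEltBits bits reproduces the original lane exactly. `signExtendBits` and `fitsSExt` are hypothetical helpers; the pass itself works on LLVM APInt and Constant values.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Sign-extend the low Bits bits of V to a full 64-bit pattern.
uint64_t signExtendBits(uint64_t V, unsigned Bits) {
  assert(Bits >= 1 && Bits <= 64);
  if (Bits == 64)
    return V;
  uint64_t SignBit = uint64_t(1) << (Bits - 1);
  V &= (uint64_t(1) << Bits) - 1;
  return (V ^ SignBit) - SignBit; // wraps modulo 2^64, giving the sext pattern
}

// Hypothetical check: can every DstEltBits-wide element be stored as a
// SrcEltBits-wide value and rebuilt exactly by a sign-extending load?
bool fitsSExt(const std::vector<uint64_t> &Elts, unsigned DstEltBits,
              unsigned SrcEltBits) {
  assert(SrcEltBits < DstEltBits && DstEltBits <= 64);
  uint64_t DstMask =
      DstEltBits == 64 ? ~uint64_t(0) : (uint64_t(1) << DstEltBits) - 1;
  for (uint64_t V : Elts)
    if ((signExtendBits(V, SrcEltBits) & DstMask) != (V & DstMask))
      return false;
  return true;
}
```

For i32 lanes, 0xFFFFFF80 (-128) fits in i8, but 0x80 (+128) does not and would need i16 storage.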
Simon Pilgrim
6ac4fe8de0 [X86] X86FixupVectorConstants.cpp - refactor constant search loop to take array of sorted candidates
Pulled out of #79815 - refactors the internal FixupConstant logic to just accept an array of vzload/broadcast candidates that are pre-sorted by ascending constant pool size
2024-02-01 16:06:36 +00:00
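The candidate search from 6ac4fe8de0 reduces to "take the first match in a pre-sorted list". In this sketch `FixupKind`, `Candidate` and `pickFixup` are hypothetical stand-ins; the pass's real candidates carry opcodes and rebuild callbacks. Ties on MemBits keep their listed order, which is one way to express a vzload-before-broadcast preference.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical fixup kinds; the real pass works on opcodes and callbacks.
enum class FixupKind { ZeroUpper, Broadcast, SExtLoad, ZExtLoad };

struct Candidate {
  FixupKind Kind;
  unsigned MemBits; // constant pool bits this fixup would need
};

// Return the first candidate the predicate accepts. Candidates must already
// be sorted by ascending MemBits; ties keep their given order, so the caller
// encodes its preference (e.g. vzload before broadcast) in that order.
template <typename MatchFn>
std::optional<Candidate> pickFixup(const std::vector<Candidate> &Sorted,
                                   MatchFn Matches) {
  assert(std::is_sorted(Sorted.begin(), Sorted.end(),
                        [](const Candidate &A, const Candidate &B) {
                          return A.MemBits < B.MemBits;
                        }));
  for (const Candidate &C : Sorted)
    if (Matches(C))
      return C;
  return std::nullopt;
}
```

Because the list is sorted up front, the first match is automatically the one with the smallest constant pool entry.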
Shengchen Kan
cfb702676c [X86][NFC] Rename lookupBroadcastFoldTable to lookupBroadcastFoldTableBySize
Address RKSimon's comments in #79761
2024-01-29 23:23:07 +08:00
Mikael Holmen
e4375bf47f [X86] Fix warning about unused variable [NFC]
Without this, gcc complains with:
 ../lib/Target/X86/X86FixupVectorConstants.cpp:70:13: warning: unused variable 'CUndef' [-Wunused-variable]
    70 |   if (auto *CUndef = dyn_cast<UndefValue>(C))
       |             ^~~~~~

Remove the unused variable and change dyn_cast to isa.
2024-01-25 11:30:51 +01:00
Simon Pilgrim
8b43c1be23 [X86] X86FixupVectorConstants - shrink vector load to movss/movsd/movd/movq 'zero upper' instructions (#79000)
If we're loading a vector constant that is known to be zero in the upper elements, then attempt to shrink the constant and just scalar load the lower 32/64 bits.

Always choose the vzload/broadcast with the smallest constant load, and prefer vzload over broadcast for the same bitwidth to avoid domain flips (mainly an AVX1 issue).

Fixes #73783
2024-01-24 14:00:51 +00:00
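The "zero upper" test from 8b43c1be23 can be sketched on a constant stored as little-endian 64-bit words; `isZeroUpper` is a hypothetical helper. A scalar load from memory (movd/movq or movss/movsd) zeroes the rest of the vector register, so it can only replace the full-width load when every bit above the scalar is already zero.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical check over little-endian 64-bit words: true if every bit at
// or above ScalarBits (32 or 64) is zero, so a ScalarBits-wide scalar load
// could stand in for the full-width vector constant load.
bool isZeroUpper(const std::vector<uint64_t> &Words, unsigned ScalarBits) {
  assert((ScalarBits == 32 || ScalarBits == 64) && !Words.empty());
  if (ScalarBits == 32 && (Words[0] >> 32) != 0)
    return false; // bits 32..63 of the first word must be clear
  for (std::size_t I = 1; I < Words.size(); ++I)
    if (Words[I] != 0)
      return false; // any non-zero upper word rules out a scalar load
  return true;
}
```

Following the preference in the commit message, a caller would try the 32-bit form before the 64-bit one, since it needs the smaller constant.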
Simon Pilgrim
4e64ed9780 [X86] Update X86::getConstantFromPool to take base OperandNo instead of Displacement MachineOperand
This allows us to check the entire constant address calculation, and ensure we're not performing any runtime address math into the constant pool (noticed in an upcoming patch).
2024-01-22 15:40:45 +00:00
Simon Pilgrim
c1729c8df2 [X86] X86FixupVectorConstants.cpp - pull out rebuildConstant helper for future patches. NFC.
Add helper to convert raw APInt bit stream into ConstantDataVector elements.

This was used internally by rebuildSplatableConstant but will be reused in future patches for #73783 and #71078
2024-01-22 11:44:51 +00:00
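As a rough, non-LLVM analogue of the rebuildConstant helper added in c1729c8df2, this sketch splits a raw bit stream (little-endian 64-bit words) into fixed-width elements. `splitIntoElements` is a hypothetical name and only handles element widths that divide 64; the real helper builds ConstantDataVector elements from an APInt.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical analogue: Words holds the bits little-endian, 64 at a time,
// and each EltBits-wide element (8/16/32/64) is read from consecutive bits.
std::vector<uint64_t> splitIntoElements(const std::vector<uint64_t> &Words,
                                        unsigned TotalBits, unsigned EltBits) {
  assert(EltBits == 8 || EltBits == 16 || EltBits == 32 || EltBits == 64);
  assert(TotalBits % EltBits == 0 && TotalBits <= Words.size() * 64);
  uint64_t Mask = EltBits == 64 ? ~uint64_t(0) : (uint64_t(1) << EltBits) - 1;
  std::vector<uint64_t> Elts;
  for (unsigned Bit = 0; Bit < TotalBits; Bit += EltBits)
    // EltBits divides 64, so an element never straddles two words.
    Elts.push_back((Words[Bit / 64] >> (Bit % 64)) & Mask);
  return Elts;
}
```

The same bit stream yields {1, 2, 3, 4} as i16 elements or {0x0002000100000000-style} wider pairs as i32, which is why the rebuild needs an explicit element width.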
Simon Pilgrim
d12dffacaa [X86] Add X86::getConstantFromPool helper function to replace duplicate implementations.
We had the same helper function in shuffle decode / vector constant code - move this to X86InstrInfo to avoid duplication.
2024-01-18 11:59:46 +00:00
Simon Pilgrim
1d56138d74 [X86] X86FixupVectorConstants - create f32/f64 broadcast constants if the source constant data was f32/f64
This partially reverts 33819f3bfb9c: the asm comments become a lot messier in #73509, so we're better off ensuring the constant data is the correct type in the DAG.
2023-12-12 10:32:04 +00:00
Simon Pilgrim
33819f3bfb [X86] X86FixupVectorConstants - create f32/f64 broadcast constants if the source constant data was ANY floating point type
We don't need an exact match, this is mainly cleanup for cases where v2f32 style types have been cast to f64 etc.
2023-12-11 16:23:04 +00:00
Simon Pilgrim
d1deeae094 [X86] Rename VBROADCASTF128/VBROADCASTI128 to VBROADCASTF128rm/VBROADCASTI128rm (#75040)
Add missing rm postfix to show these are load instructions
2023-12-11 11:52:53 +00:00
Simon Pilgrim
539e60c34a [X86] X86FixupVectorConstantsPass - consistently use non-DQI 128/256-bit subvector broadcasts
Without the predicate, there's no benefit to using the DQI variants instead of the default AVX512F instructions.
2023-11-30 18:33:52 +00:00
Shengchen Kan
bafa51c8a5 [X86] Rename X86MemoryFoldTableEntry to X86FoldTableEntry, NFCI
Because it's used for entries that fold a load, store or broadcast.
2023-11-28 19:49:14 +08:00
Simon Pilgrim
1552b91162 [X86] X86FixupVectorConstantsPass - attempt to match VEX logic ops back to EVEX if we can create a broadcast fold
On non-DQI AVX512 targets, X86InstrInfo::setExecutionDomainCustom will convert EVEX int-domain instructions to VEX fp-domain instructions. But if we have the chance to use a broadcast fold, we're better off using an EVEX instruction, so handle a reverse fold.
2023-11-21 18:01:29 +00:00
Simon Pilgrim
6155fa69fd [X86] X86FixupVectorConstantsPass - pull out the hasAVX2() test and use single ConvertToBroadcast call. NFC.
Matches AVX512 ConvertToBroadcast calls and makes it easier to add extension support in the future.
2023-11-02 17:32:25 +00:00
Simon Pilgrim
f6ff2cc7e0 [X86] X86FixupVectorConstantsPass - attempt to replace full width integer vector constant loads with broadcasts on AVX2+ targets (REAPPLIED)
lowerBuildVectorAsBroadcast will not broadcast splat constants in all cases, resulting in a lot of situations where a full width vector load that has failed to fold but is loading splat constant values could use a broadcast load instruction just as cheaply, and save constant pool space.

This is an updated commit of ab4b924832ce26c21b88d7f82fcf4992ea8906bb after being reverted at 78de45fd4a902066617fcc9bb88efee11f743bc6
2023-06-14 12:48:33 +01:00
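A sketch of the splat check these broadcast commits rely on: find the narrowest element width whose bit pattern repeats across the whole constant, so a broadcast of that element could replace the full-width load. `smallestSplatBits` is hypothetical, considers only 8/16/32/64-bit elements, and leaves out 128-bit subvector broadcasts and per-target checks (e.g. AVX1 vs AVX2).

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>
#include <optional>
#include <vector>

// Hypothetical splat search over a constant stored as little-endian 64-bit
// words. Returns the smallest element width whose low pattern repeats across
// all TotalBits, or std::nullopt if no width in {8,16,32,64} works.
std::optional<unsigned> smallestSplatBits(const std::vector<uint64_t> &Words,
                                          unsigned TotalBits) {
  assert(TotalBits % 64 == 0 && TotalBits == Words.size() * 64);
  for (unsigned SplatBits : {8u, 16u, 32u, 64u}) {
    if (SplatBits >= TotalBits)
      break; // "broadcasting" the whole vector saves nothing
    uint64_t Mask =
        SplatBits == 64 ? ~uint64_t(0) : (uint64_t(1) << SplatBits) - 1;
    uint64_t First = Words[0] & Mask;
    bool IsSplat = true;
    for (unsigned Bit = SplatBits; Bit < TotalBits && IsSplat; Bit += SplatBits)
      IsSplat = ((Words[Bit / 64] >> (Bit % 64)) & Mask) == First;
    if (IsSplat)
      return SplatBits;
  }
  return std::nullopt;
}
```

Trying widths in ascending order means the first hit also gives the smallest constant pool entry.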
Simon Pilgrim
834cc88c5d [X86] X86FixupVectorConstantsPass - attempt to replace full width fp vector constant loads with broadcasts on AVX+ targets (REAPPLIED)
lowerBuildVectorAsBroadcast will not broadcast splat constants in all cases, resulting in a lot of situations where a full width vector load that has failed to fold but is loading splat constant values could use a broadcast load instruction just as cheaply, and save constant pool space.

NOTE: SSE3 targets can use MOVDDUP, but not all SSE-era CPUs can perform this as cheaply as a vector load, so we will need to add scheduler model checks if we want to pursue this.

This is an updated commit of 98061013e01207444cfd3980cde17b5e75764fbe after being reverted at a279a09ab9524d1d74ef29b34618102d4b202e2f
2023-06-13 12:10:11 +01:00
Simon Pilgrim
a279a09ab9 Revert rG98061013e01207444cfd3980 - [X86] X86FixupVectorConstantsPass - attempt to replace full width fp vector constant loads with broadcasts on AVX+ targets
Reverting while we address an existing issue exposed by this (Issue #63108)
2023-06-06 18:44:24 +01:00
Simon Pilgrim
78de45fd4a Revert rGab4b924832ce26c21b88d7f82fcf4992ea8906bb - [X86] X86FixupVectorConstantsPass - attempt to replace full width integer vector constant loads with broadcasts on AVX2+ targets
Reverting while we address an existing issue exposed by this (Issue #63108)
2023-06-06 18:07:33 +01:00
Simon Pilgrim
d6a36619ce [X86] X86FixupVectorConstantsPass - use VBROADCASTSS/VBROADCASTSD for integer vector loads on AVX1-only targets
Matches behaviour in lowerBuildVectorAsBroadcast
2023-05-31 16:39:09 +01:00
Simon Pilgrim
ab4b924832 [X86] X86FixupVectorConstantsPass - attempt to replace full width integer vector constant loads with broadcasts on AVX2+ targets
lowerBuildVectorAsBroadcast will not broadcast splat constants in all cases, resulting in a lot of situations where a full width vector load that has failed to fold but is loading splat constant values could use a broadcast load instruction just as cheaply, and save constant pool space.
2023-05-30 13:17:26 +01:00
Simon Pilgrim
98061013e0 [X86] X86FixupVectorConstantsPass - attempt to replace full width fp vector constant loads with broadcasts on AVX+ targets
lowerBuildVectorAsBroadcast will not broadcast splat constants in all cases, resulting in a lot of situations where a full width vector load that has failed to fold but is loading splat constant values could use a broadcast load instruction just as cheaply, and save constant pool space.

NOTE: SSE3 targets can use MOVDDUP, but not all SSE-era CPUs can perform this as cheaply as a vector load, so we will need to add scheduler model checks if we want to pursue this.
2023-05-29 16:10:52 +01:00
Simon Pilgrim
0b91de5ea3 [X86] Add X86FixupVectorConstantsPass to re-fold AVX512 vector load folds as broadcast folds
This patch analyzes AVX512 instructions for full vector width folded loads from the constant pool and attempts to determine if they can be replaced with a smaller broadcast folded variant. Typically the broadcast opportunities were missed because of type-width mismatches or multi-use limitations which have been removed in later passes.

As well as introducing broadcast fold tables (which can hopefully be extended/automated in the future), this also handles mismatches in the AND/ANDN/OR/XOR/TERNLOG type-widths, catching additional missed opportunities.

This patch is pulled from the ongoing work based on D150143, but without removing the existing DAG constant broadcast lowering code; this patch is currently a late-stage cleanup only.

The intention is to add additional broadcast/extension handling of constants in future patches, but it turned out that AVX512 broadcast handling was the easiest to start with.

Differential Revision: https://reviews.llvm.org/D150526
2023-05-23 10:58:17 +01:00
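On the AND/ANDN/OR/XOR/TERNLOG type-width mismatches mentioned in 0b91de5ea3: these ops treat every bit independently, so a constant that splats at a narrower width can still use the instruction's own 32-bit (D form) or 64-bit (Q form) embedded broadcast by repeating the narrower pattern. `widenSplat` is a hypothetical helper for illustration, not part of the pass.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: replicate a SplatBits-wide pattern to fill one
// BcstBits-wide broadcast element. Valid for bitwise ops only, where the
// lane width doesn't change the result.
uint64_t widenSplat(uint64_t Value, unsigned SplatBits, unsigned BcstBits) {
  assert(SplatBits != 0 && SplatBits <= BcstBits && BcstBits <= 64 &&
         BcstBits % SplatBits == 0);
  uint64_t Mask =
      SplatBits == 64 ? ~uint64_t(0) : (uint64_t(1) << SplatBits) - 1;
  uint64_t Result = 0;
  for (unsigned Shift = 0; Shift < BcstBits; Shift += SplatBits)
    Result |= (Value & Mask) << Shift; // copy the pattern into each slot
  return Result;
}
```

For example, an i8 splat of 0xFF becomes the 32-bit broadcast element 0xFFFFFFFF, and an i32 splat of 1 becomes the 64-bit element 0x0000000100000001.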