llvm-project

Author	SHA1	Message	Date
Simon Pilgrim	bef25ae297	[X86] X86FixupVectorConstants - use explicit register bitwidth for the loaded vector instead of using constant pool bitwidth Fixes #81136 - we might be loading from a constant pool entry wider than the destination register bitwidth, affecting the vextload scale calculation. ConvertToBroadcastAVX512 doesn't yet set an explicit bitwidth (it will default to the constant pool bitwidth) due to difficulties in looking up the original register width through the fold tables, but as we only use rebuildSplatCst this shouldn't cause any miscompilations, although it might prevent folding to broadcast if only the lower bits match a splatable pattern.	2024-02-08 17:39:19 +00:00
Simon Pilgrim	f407be32fe	[X86] X86FixupVectorConstants - rename FixupEntry::BitWidth to FixupEntry::MemBitWidth NFC. Make it clearer that this refers to the width of the constant element stored in memory - which won't match the register element width after a sext/zextload	2024-02-08 16:35:14 +00:00
Simon Pilgrim	b846613837	[X86] X86FixupVectorConstants - add destination register width to rebuildSplatCst/rebuildZeroUpperCst/rebuildExtCst callbacks As found on #81136 - we aren't correctly handling cases where the constant pool entry is wider than the destination register width, causing incorrect scaling of the truncated constant for load-extension cases. This first patch just pulls out the destination register width argument; it's still currently driven by the constant pool entry but that will be addressed in a followup.	2024-02-08 16:35:13 +00:00
Simon Pilgrim	50d38cf934	[X86] X86FixupVectorConstants.cpp - update comment to describe all the constant load ops performed by the pass	2024-02-07 14:07:02 +00:00
Simon Pilgrim	69ffa7be3b	[X86] X86FixupVectorConstants - load+zero vector constants that can be stored in a truncated form (#80428) Further develops the vsextload support added in #79815 / b5d35feacb7246573c6a4ab2bddc4919a4228ed5 - reduces the size of the vector constant by storing it in the constant pool in a truncated form, and zero-extending it as part of the load.	2024-02-05 12:17:58 +00:00
Simon Pilgrim	b5d35feacb	[X86] X86FixupVectorConstants - load+sign-extend vector constants that can be stored in a truncated form (#79815) Reduce the size of the vector constant by storing it in the constant pool in a truncated form, and sign-extending it as part of the load. I've extended the existing FixupConstant functionality to support these sext constant rebuilds - we still select the smallest stored constant entry and prefer vzload/broadcast/vextload for same bitwidth to avoid domain flips. I intend to add the matching load+zero-extend handling in a future PR, but that requires some alterations to the existing MC shuffle comments handling first.	2024-02-02 11:28:58 +00:00
Simon Pilgrim	6ac4fe8de0	[X86] X86FixupVectorConstants.cpp - refactor constant search loop to take array of sorted candidates Pulled out of #79815 - refactors the internal FixupConstant logic to just accept an array of vzload/broadcast candidates that are pre-sorted in ascending constant pool size	2024-02-01 16:06:36 +00:00
Shengchen Kan	cfb702676c	[X86][NFC] Rename lookupBroadcastFoldTable to lookupBroadcastFoldTableBySize Address RKSimon's comments in #79761	2024-01-29 23:23:07 +08:00
Mikael Holmen	e4375bf47f	[X86] Fix warning about unused variable [NFC] Without this gcc complains like ../lib/Target/X86/X86FixupVectorConstants.cpp:70:13: warning: unused variable 'CUndef' [-Wunused-variable] 70 \| if (auto *CUndef = dyn_cast<UndefValue>(C)) \| ^~~~~~ Remove the unused variable and change dyn_cast to isa.	2024-01-25 11:30:51 +01:00
Simon Pilgrim	8b43c1be23	[X86] X86FixupVectorConstants - shrink vector load to movss/movsd/movd/movq 'zero upper' instructions (#79000) If we're loading a vector constant that is known to be zero in the upper elements, then attempt to shrink the constant and just scalar load the lower 32/64 bits. Always choose the vzload/broadcast with the smallest constant load, and prefer vzload over broadcasts for same bitwidth to avoid domain flips (mainly an AVX1 issue). Fixes #73783	2024-01-24 14:00:51 +00:00
Simon Pilgrim	4e64ed9780	[X86] Update X86::getConstantFromPool to take base OperandNo instead of Displacement MachineOperand This allows us to check the entire constant address calculation, and ensure we're not performing any runtime address math into the constant pool (noticed in an upcoming patch).	2024-01-22 15:40:45 +00:00
Simon Pilgrim	c1729c8df2	[X86] X86FixupVectorConstants.cpp - pull out rebuildConstant helper for future patches. NFC. Add helper to convert raw APInt bit stream into ConstantDataVector elements. This was used internally by rebuildSplatableConstant but will be reused in future patches for #73783 and #71078	2024-01-22 11:44:51 +00:00
Simon Pilgrim	d12dffacaa	[X86] Add X86::getConstantFromPool helper function to replace duplicate implementations. We had the same helper function in shuffle decode / vector constant code - move this to X86InstrInfo to avoid duplication.	2024-01-18 11:59:46 +00:00
Simon Pilgrim	1d56138d74	[X86] X86FixupVectorConstants - create f32/f64 broadcast constants if the source constant data was f32/f64 This partially reverts 33819f3bfb9c - the asm comments become a lot messier in #73509 - we're better off ensuring the constant data is the correct type in DAG	2023-12-12 10:32:04 +00:00
Simon Pilgrim	33819f3bfb	[X86] X86FixupVectorConstants - create f32/f64 broadcast constants if the source constant data was ANY floating point type We don't need an exact match, this is mainly cleanup for cases where v2f32 style types have been cast to f64 etc.	2023-12-11 16:23:04 +00:00
Simon Pilgrim	d1deeae094	[X86] Rename VBROADCASTF128/VBROADCASTI128 to VBROADCASTF128rm/VBROADCASTI128rm (#75040) Add missing rm postfix to show these are load instructions	2023-12-11 11:52:53 +00:00
Simon Pilgrim	539e60c34a	[X86] X86FixupVectorConstantsPass - consistently use non-DQI 128/256-bit subvector broadcasts Without the predicate there's no benefit to using the DQI variants instead of the default AVX512F instructions	2023-11-30 18:33:52 +00:00
Shengchen Kan	bafa51c8a5	[X86] Rename X86MemoryFoldTableEntry to X86FoldTableEntry, NFCI because it's used for elements that fold a load, store or broadcast.	2023-11-28 19:49:14 +08:00
Simon Pilgrim	1552b91162	[X86] X86FixupVectorConstantsPass - attempt to match VEX logic ops back to EVEX if we can create a broadcast fold On non-DQI AVX512 targets, X86InstrInfo::setExecutionDomainCustom will convert EVEX int-domain instructions to VEX fp-domain instructions. But, if we have the chance to use a broadcast fold we're better off using an EVEX instruction, so handle a reverse fold.	2023-11-21 18:01:29 +00:00
Simon Pilgrim	6155fa69fd	[X86] X86FixupVectorConstantsPass - pull out the hasAVX2() test and use single ConvertToBroadcast call. NFC. Matches AVX512 ConvertToBroadcast calls and makes it easier to add extension support in the future.	2023-11-02 17:32:25 +00:00
Simon Pilgrim	f6ff2cc7e0	[X86] X86FixupVectorConstantsPass - attempt to replace full width integer vector constant loads with broadcasts on AVX2+ targets (REAPPLIED) lowerBuildVectorAsBroadcast will not broadcast splat constants in all cases, resulting in a lot of situations where a full width vector load that has failed to fold but is loading splat constant values could use a broadcast load instruction just as cheaply, and save constant pool space. This is an updated commit of ab4b924832ce26c21b88d7f82fcf4992ea8906bb after being reverted at 78de45fd4a902066617fcc9bb88efee11f743bc6	2023-06-14 12:48:33 +01:00
Simon Pilgrim	834cc88c5d	[X86] X86FixupVectorConstantsPass - attempt to replace full width fp vector constant loads with broadcasts on AVX+ targets (REAPPLIED) lowerBuildVectorAsBroadcast will not broadcast splat constants in all cases, resulting in a lot of situations where a full width vector load that has failed to fold but is loading splat constant values could use a broadcast load instruction just as cheaply, and save constant pool space. NOTE: SSE3 targets can use MOVDDUP but not all SSE era CPUs can perform this as cheaply as a vector load, we will need to add scheduler model checks if we want to pursue this. This is an updated commit of 98061013e01207444cfd3980cde17b5e75764fbe after being reverted at a279a09ab9524d1d74ef29b34618102d4b202e2f	2023-06-13 12:10:11 +01:00
Simon Pilgrim	a279a09ab9	Revert rG98061013e01207444cfd3980 - [X86] X86FixupVectorConstantsPass - attempt to replace full width fp vector constant loads with broadcasts on AVX+ targets Reverting while we address an existing issue exposed by this (Issue #63108)	2023-06-06 18:44:24 +01:00
Simon Pilgrim	78de45fd4a	Revert rGab4b924832ce26c21b88d7f82fcf4992ea8906bb - [X86] X86FixupVectorConstantsPass - attempt to replace full width integer vector constant loads with broadcasts on AVX2+ targets Reverting while we address an existing issue exposed by this (Issue #63108)	2023-06-06 18:07:33 +01:00
Simon Pilgrim	d6a36619ce	[X86] X86FixupVectorConstantsPass - use VBROADCASTSS/VBROADCASTSD for integer vector loads on AVX1-only targets Matches behaviour in lowerBuildVectorAsBroadcast	2023-05-31 16:39:09 +01:00
Simon Pilgrim	ab4b924832	[X86] X86FixupVectorConstantsPass - attempt to replace full width integer vector constant loads with broadcasts on AVX2+ targets lowerBuildVectorAsBroadcast will not broadcast splat constants in all cases, resulting in a lot of situations where a full width vector load that has failed to fold but is loading splat constant values could use a broadcast load instruction just as cheaply, and save constant pool space.	2023-05-30 13:17:26 +01:00
Simon Pilgrim	98061013e0	[X86] X86FixupVectorConstantsPass - attempt to replace full width fp vector constant loads with broadcasts on AVX+ targets lowerBuildVectorAsBroadcast will not broadcast splat constants in all cases, resulting in a lot of situations where a full width vector load that has failed to fold but is loading splat constant values could use a broadcast load instruction just as cheaply, and save constant pool space. NOTE: SSE3 targets can use MOVDDUP but not all SSE era CPUs can perform this as cheaply as a vector load, we will need to add scheduler model checks if we want to pursue this.	2023-05-29 16:10:52 +01:00
Simon Pilgrim	0b91de5ea3	[X86] Add X86FixupVectorConstantsPass to re-fold AVX512 vector load folds as broadcast folds This patch analyzes AVX512 instructions for full vector width folded loads from the constant pool and attempts to determine if it can be replaced with a smaller broadcast folded variant. Typically the broadcast opportunities were missed by type-width mismatches or multiuse limitations which have been removed in later passes. As well as introducing broadcast fold tables (which can hopefully be extended/automated in the future), this also handles mismatches in the AND/ANDN/OR/XOR/TERNLOG type-widths, catching additional missed opportunities. This patch is pulled from the ongoing work based on D150143, but without removing the existing DAG constant broadcast lowering code - this patch is currently a late stage cleanup only. The intention is to add additional broadcast/extension handling of constants in future patches, but it turned out that AVX512 broadcast handling was the easiest to start with. Differential Revision: https://reviews.llvm.org/D150526	2023-05-23 10:58:17 +01:00

28 Commits