This fixes some regressions from recent changes to vector combine in
#120216. It allows shuffleToIdentity to look through fp casts like other
casts, and makes sure mismatched vector types in splats and casts do
not block the transform, as only the lanes should matter.
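For illustration only (a hand-written sketch, not from the patch's tests; the function name is made up), this is the sort of lane-wise pattern that can be collapsed once fp casts are looked through:
```
define <4 x double> @concat_fpext(<4 x float> %x) {
  %lo = shufflevector <4 x float> %x, <4 x float> poison, <2 x i32> <i32 0, i32 1>
  %hi = shufflevector <4 x float> %x, <4 x float> poison, <2 x i32> <i32 2, i32 3>
  %elo = fpext <2 x float> %lo to <2 x double>
  %ehi = fpext <2 x float> %hi to <2 x double>
  ; every lane of %r is the fpext of the matching lane of %x, so the whole
  ; sequence is equivalent to a single vector fpext of %x
  %r = shufflevector <2 x double> %elo, <2 x double> %ehi, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
  ret <4 x double> %r
}
```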
insertelt DestVec, (fneg (extractelt SrcVec, Index)), Index
-> shuffle DestVec, (shuffle (fneg SrcVec), poison, SrcMask), Mask
The original combine left handling vectors of different lengths as a TODO; this commit implements it (see
#[baab4aa1ba])
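A hand-written sketch of the new different-length case (the @src/@tgt names are illustrative, not from the tests):
```
; before: negate one lane of the narrow source and insert it into the wider destination
define <4 x float> @src(<4 x float> %dst, <2 x float> %src) {
  %e = extractelement <2 x float> %src, i64 1
  %n = fneg float %e
  %r = insertelement <4 x float> %dst, float %n, i64 3
  ret <4 x float> %r
}

; after: negate the whole source, widen it with a shuffle, then blend it into the destination
define <4 x float> @tgt(<4 x float> %dst, <2 x float> %src) {
  %nsrc = fneg <2 x float> %src
  %wide = shufflevector <2 x float> %nsrc, <2 x float> poison, <4 x i32> <i32 poison, i32 poison, i32 poison, i32 1>
  %r = shufflevector <4 x float> %dst, <4 x float> %wide, <4 x i32> <i32 0, i32 1, i32 2, i32 7>
  ret <4 x float> %r
}
```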
- update `VectorUtils::isVectorIntrinsicWithScalarOpAtArg` to use TTI for
all uses, to allow specification of target-specific intrinsics
- add TTI to the `isVectorIntrinsicWithStructReturnOverloadAtField` api
- update TTI api to provide `isTargetIntrinsicWith...` functions and
consistently name them
- move `isTriviallyScalarizable` to VectorUtils
- update all uses of the api and provide the TTI parameter
Resolves #117030
insertelt DestVec, (fneg (extractelt SrcVec, Index)), Index
-> shuffle DestVec, (shuffle (fneg SrcVec), poison, SrcMask), Mask
The original combine left handling vectors of different lengths as a TODO.
We don't fold "shuffle (binop), (binop)" -> "binop (shuffle), (shuffle)" if the old/new costs are equal, but we can relax this if either new shuffle will constant fold as it will reduce instruction count.
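A hand-written sketch of the case this relaxation targets (illustrative names); the shuffle of the two constant operands constant-folds away, so the new form has one fewer instruction:
```
define <4 x i32> @src(<4 x i32> %a, <4 x i32> %b) {
  %x = add <4 x i32> %a, <i32 1, i32 2, i32 3, i32 4>
  %y = add <4 x i32> %b, <i32 5, i32 6, i32 7, i32 8>
  %r = shufflevector <4 x i32> %x, <4 x i32> %y, <4 x i32> <i32 0, i32 1, i32 4, i32 5>
  ret <4 x i32> %r
}

define <4 x i32> @tgt(<4 x i32> %a, <4 x i32> %b) {
  %s = shufflevector <4 x i32> %a, <4 x i32> %b, <4 x i32> <i32 0, i32 1, i32 4, i32 5>
  ; the shuffle of the two constant operands constant-folds to <1, 2, 5, 6>
  %r = add <4 x i32> %s, <i32 1, i32 2, i32 5, i32 6>
  ret <4 x i32> %r
}
```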
NFC refactor to make it easier to also use the fold for icmp/fcmp patterns in a future patch - match the Shuffle with general Instruction operands and avoid explicit use of the BinaryOperator matchers as much as possible for the general costing / fold.
With the introduction of CmpPredicate in 51a895a (IR: introduce struct
with CmpInst::Predicate and samesign), PatternMatch is one of the first
key pieces of infrastructure that must be updated to match a CmpInst
respecting samesign information. Implement this change to Cmp-matchers.
This is a preparatory step in migrating the codebase over to
CmpPredicate. Since no functional changes are desired at this stage,
we have chosen not to migrate CmpPredicate::operator==(CmpPredicate)
calls to use CmpPredicate::getMatching(), as that would have visible
impact on tests that are not yet written: instead, we call
CmpPredicate::operator==(Predicate), preserving the old behavior, while
also inserting a few FIXME comments for follow-ups.
Mask/Bool vectors are often bitcast to/from scalar integers, in particular when concatenating mask results; often this is due to the difficulty of working with vectors of bools in C/C++. On x86 this typically involves the MOVMSK/KMOV instructions.
To concatenate bool masks, these are typically cast to scalars, which are then zero-extended, shifted and OR'd together.
This patch attempts to match these scalar concatenation patterns and convert them to vector shuffles instead. This in turn often assists with further vector combines, depending on the cost model.
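A hand-written sketch of the scalar concatenation pattern and its shuffle form (illustrative names; assumes the usual lane-0-to-lowest-bit layout when bitcasting i1 vectors):
```
define i8 @src(<4 x i32> %a, <4 x i32> %b) {
  %m0 = icmp eq <4 x i32> %a, zeroinitializer
  %m1 = icmp eq <4 x i32> %b, zeroinitializer
  ; concatenate the two masks as scalars: bitcast, zero-extend, shift, or
  %s0 = bitcast <4 x i1> %m0 to i4
  %s1 = bitcast <4 x i1> %m1 to i4
  %z0 = zext i4 %s0 to i8
  %z1 = zext i4 %s1 to i8
  %sh = shl i8 %z1, 4
  %r = or i8 %z0, %sh
  ret i8 %r
}

define i8 @tgt(<4 x i32> %a, <4 x i32> %b) {
  %m0 = icmp eq <4 x i32> %a, zeroinitializer
  %m1 = icmp eq <4 x i32> %b, zeroinitializer
  ; concatenate the masks with a vector shuffle and do a single bitcast
  %c = shufflevector <4 x i1> %m0, <4 x i1> %m1, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
  %r = bitcast <8 x i1> %c to i8
  ret i8 %r
}
```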
Reapplied patch from #119559 - fixed use-after-free issue.
Fixes #111431
Noticed while investigating a crash in #119559 - we don't account for I being replaced and its Type being reallocated. So hoist the checks to the start of the loop.
Mask/Bool vectors are often bitcast to/from scalar integers, in particular when concatenating mask results; often this is due to the difficulty of working with vectors of bools in C/C++. On x86 this typically involves the MOVMSK/KMOV instructions.
To concatenate bool masks, these are typically cast to scalars, which are then zero-extended, shifted and OR'd together.
This patch attempts to match these scalar concatenation patterns and convert them to vector shuffles instead. This in turn often assists with further vector combines, depending on the cost model.
Fixes #111431
foldInsExtVectorToShuffle is likely to be inserting into an undef value, so make sure we've canonicalized this to the RHS in the folded shuffle to help further VectorCombine folds.
Minor tweak to help #34072
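A hand-written sketch of the canonicalized output (illustrative names): the real source stays as the LHS operand and the poison vector ends up on the RHS of the folded shuffle:
```
define <4 x float> @src(<4 x float> %v) {
  %e = extractelement <4 x float> %v, i64 1
  %r = insertelement <4 x float> poison, float %e, i64 2
  ret <4 x float> %r
}

define <4 x float> @tgt(<4 x float> %v) {
  ; %v stays as the LHS operand and poison is the RHS operand of the shuffle
  %r = shufflevector <4 x float> %v, <4 x float> poison, <4 x i32> <i32 poison, i32 poison, i32 1, i32 poison>
  ret <4 x float> %r
}
```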
foldShuffleOfShuffles already handles "shuffle (shuffle x, undef), (shuffle y, undef)" patterns, this patch relaxes the requirement so it can handle cases where only a single operand is a shuffle (and the other can be any other value and will be kept in place).
Fixes #86068
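A hand-written sketch of the newly handled case where only one operand is itself a shuffle (illustrative names):
```
define <4 x i32> @src(<4 x i32> %x, <4 x i32> %y) {
  %s0 = shufflevector <4 x i32> %x, <4 x i32> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
  ; only the first operand of this shuffle is itself a shuffle; %y is not
  %r = shufflevector <4 x i32> %s0, <4 x i32> %y, <4 x i32> <i32 0, i32 5, i32 2, i32 7>
  ret <4 x i32> %r
}

define <4 x i32> @tgt(<4 x i32> %x, <4 x i32> %y) {
  ; the inner shuffle is merged into the outer one; %y is kept in place
  %r = shufflevector <4 x i32> %x, <4 x i32> %y, <4 x i32> <i32 3, i32 5, i32 1, i32 7>
  ret <4 x i32> %r
}
```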
Don't use TCK_RecipThroughput independently in every VectorCombine fold.
Some prep work to allow a potential future patch to use VectorCombine to optimise for code size for -Os/Oz builds (setting TCK_CodeSize instead of TCK_RecipThroughput).
There's still more cleanup to do as a lot of get*Cost calls are relying on the default TargetCostKind value (usually TCK_RecipThroughput but not always).
insert (DstVec, (extract SrcVec, ExtIdx), InsIdx) --> shuffle (DstVec, SrcVec, Mask)
This commit combines an extract/insert pair on vectors into a shuffle of the two vectors.
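A hand-written sketch of the fold (illustrative names):
```
define <4 x float> @src(<4 x float> %dst, <4 x float> %src) {
  %e = extractelement <4 x float> %src, i64 2
  %r = insertelement <4 x float> %dst, float %e, i64 0
  ret <4 x float> %r
}

define <4 x float> @tgt(<4 x float> %dst, <4 x float> %src) {
  ; lane 0 comes from %src lane 2 (mask index 4 + 2), the rest from %dst
  %r = shufflevector <4 x float> %dst, <4 x float> %src, <4 x i32> <i32 6, i32 1, i32 2, i32 3>
  ret <4 x float> %r
}
```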
#114901 exposed that foldExtractedCmps didn't account for non-commutative binops, and was disabled by 05e838f428555bcc4507bd37912da60ea9110ef6
This patch re-enables support for non-commutative binops by ensuring that the LHS/RHS arg order of the binop is retained.
The fold needs to be adjusted to correctly track the LHS/RHS operands, which will take some refactoring; for now, just disable the fold in this case.
Fixes #114901
There are artificial one-use limitations on foldExtractedCmps. Adjust
the costs to account for multi-use, and strip the one-use matcher,
lifting the limitations.
It is almost always simpler to use {} instead of std::nullopt to
initialize an empty ArrayRef. This patch changes all occurrences I could
find in LLVM itself. In future the ArrayRef(std::nullopt_t) constructor
could be deprecated or removed.
Consider the following case:
```
define <2 x i32> @test(<2 x i64> %vec.ind16, <2 x i32> %broadcast.splat20) {
%19 = icmp eq <2 x i64> %vec.ind16, zeroinitializer
%20 = zext <2 x i1> %19 to <2 x i32>
%21 = lshr <2 x i32> %20, %broadcast.splat20
ret <2 x i32> %21
}
```
After https://github.com/llvm/llvm-project/pull/104606, we shrink the
lshr into:
```
define <2 x i32> @test(<2 x i64> %vec.ind16, <2 x i32> %broadcast.splat20) {
%1 = icmp eq <2 x i64> %vec.ind16, zeroinitializer
%2 = trunc <2 x i32> %broadcast.splat20 to <2 x i1>
%3 = lshr <2 x i1> %1, %2
%4 = zext <2 x i1> %3 to <2 x i32>
ret <2 x i32> %4
}
```
It is incorrect since `lshr i1 X, 1` returns `poison`.
This patch adds an additional check on the shamt operand. The lshr will get
shrunk iff we ensure that the shamt is less than the bitwidth of the smaller
type. As `computeKnownBits(&I, *DL).countMaxActiveBits() > BW` always
evaluates to true for `lshr(zext(X), Y)`, this check will only apply to
bitwise logical instructions.
Alive2: https://alive2.llvm.org/ce/z/j_RmTa
Fixes https://github.com/llvm/llvm-project/issues/108698.
Check whether it is possible and profitable to transform
`binop(zext(value), other)` into `zext(binop(value, trunc(other)))`.
When a CPU architecture has an illegal scalar type iX but the vector type
<N x iX> is legal, scalar expressions may be extended to a legal type iY
before vectorisation. This extension can result in underutilization of the
vector lanes, as more lanes could be processed per instruction with the
narrower type.
Vectorisers may not always recognize opportunities for type shrinking, and
this patch aims to address that limitation.
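A hand-written sketch of the shrink for a bitwise binop, where it is unconditionally safe (illustrative names):
```
define <8 x i32> @src(<8 x i8> %x, <8 x i32> %y) {
  %zx = zext <8 x i8> %x to <8 x i32>
  %r = and <8 x i32> %zx, %y
  ret <8 x i32> %r
}

define <8 x i32> @tgt(<8 x i8> %x, <8 x i32> %y) {
  ; do the binop in the narrow type and extend the result instead
  %ty = trunc <8 x i32> %y to <8 x i8>
  %a = and <8 x i8> %x, %ty
  %r = zext <8 x i8> %a to <8 x i32>
  ret <8 x i32> %r
}
```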
This extends the existing foldTruncFromReductions transform to handle
sext and zext as well. This is only legal for the bitwise reductions
(and/or/xor) and not the arithmetic ones (add, mul). Use the same
costing decision to drive whether we do the transform.
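A hand-written sketch of the extended fold on a zext'd and-reduction (illustrative names):
```
declare i32 @llvm.vector.reduce.and.v8i32(<8 x i32>)
declare i8 @llvm.vector.reduce.and.v8i8(<8 x i8>)

define i32 @src(<8 x i8> %x) {
  %e = zext <8 x i8> %x to <8 x i32>
  %r = call i32 @llvm.vector.reduce.and.v8i32(<8 x i32> %e)
  ret i32 %r
}

define i32 @tgt(<8 x i8> %x) {
  ; reduce in the narrow type, then extend the scalar result
  %r8 = call i8 @llvm.vector.reduce.and.v8i8(<8 x i8> %x)
  %r = zext i8 %r8 to i32
  ret i32 %r
}
```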
Workaround until I can get #96884 fixed properly - when trying to find identity sequences, peek through any bitcasts to see if the values all came from the same source. We don't run CSE frequently enough to merge all the bitcasts that we end up with.