llvm-project

Author	SHA1	Message	Date
Simon Pilgrim	92af82a48d	[VectorCombine] Fold "shuffle (binop (shuffle, shuffle)), undef" --> "binop (shuffle), (shuffle)" (#114101 ) Add foldPermuteOfBinops - to fold a permute (single source shuffle) through a binary op that is being fed by other shuffles. Fixes #94546 Fixes #49736	2024-10-31 10:58:09 +00:00
Simon Pilgrim	80c8ecd565	[VectorCombine] Add baseline "shuffle (binop (shuffle, shuffle)), undef" tests for #114101	2024-10-30 13:42:58 +00:00
Ramkumar Ramachandra	1f919aa778	VectorCombine: lift one-use limitation in foldExtractedCmps (#110902 ) There are artificial one-use limitations on foldExtractedCmps. Adjust the costs to account for multi-use, and strip the one-use matcher, lifting the limitations.	2024-10-10 14:10:41 +01:00
David Green	c136d3237a	[VectorCombine] Do not try to operate on OperandBundles. (#111635 ) This bails out if we see an intrinsic with an operand bundle on it, to make sure we don't process the bundles incorrectly. Fixes #110382.	2024-10-09 16:20:03 +01:00
Philip Reames	0b524efa95	[RISCV][TTI] Reduce cost of a <N x i1> build_vector pattern (#109449 ) This is a follow up to 7f6bbb3. When lowering a <N x i1> build_vector, we currently choose to extend to i8, perform the build_vector there, and then truncate back in vector. Our costing on the other hand accounts for it as if we performed a vector extend, an insert, and a vector extract for every element. This significantly overestimates the cost. Note that we can likely do better in our build_vector lowering here by packing the bits in scalar, and doing a build_vector of the packed bits. Regardless, our costing should match our lowering.	2024-09-23 07:21:54 -07:00
Yingwei Zheng	87663fdab9	[VectorCombine] Don't shrink lshr if the shamt is not less than bitwidth (#108705 ) Consider the following case: ``` define <2 x i32> @test(<2 x i64> %vec.ind16, <2 x i32> %broadcast.splat20) { %19 = icmp eq <2 x i64> %vec.ind16, zeroinitializer %20 = zext <2 x i1> %19 to <2 x i32> %21 = lshr <2 x i32> %20, %broadcast.splat20 ret <2 x i32> %21 } ``` After https://github.com/llvm/llvm-project/pull/104606, we shrink the lshr into: ``` define <2 x i32> @test(<2 x i64> %vec.ind16, <2 x i32> %broadcast.splat20) { %1 = icmp eq <2 x i64> %vec.ind16, zeroinitializer %2 = trunc <2 x i32> %broadcast.splat20 to <2 x i1> %3 = lshr <2 x i1> %1, %2 %4 = zext <2 x i1> %3 to <2 x i32> ret <2 x i32> %4 } ``` It is incorrect since `lshr i1 X, 1` returns `poison`. This patch adds additional check on the shamt operand. The lshr will get shrunk iff we ensure that the shamt is less than bitwidth of the smaller type. As `computeKnownBits(&I, *DL).countMaxActiveBits() > BW` always evaluates to true for `lshr(zext(X), Y)`, this check will only apply to bitwise logical instructions. Alive2: https://alive2.llvm.org/ce/z/j_RmTa Fixes https://github.com/llvm/llvm-project/issues/108698.	2024-09-15 18:38:06 +08:00
Igor Kirillov	1b57cbcf25	[VectorCombine] Refactor Insertion Point setting in shrinkType (#108398 )	2024-09-13 10:03:31 +01:00
Igor Kirillov	958a337132	[VectorCombine] Fix trunc generated between PHINodes (#108228 )	2024-09-12 10:20:56 +01:00
Han-Kuan Chen	0ccc6092d2	[VectorCombine] Add foldShuffleOfIntrinsics. (#106502 )	2024-09-10 21:10:09 +08:00
Igor Kirillov	bf694841f5	[VectorCombine] Add type shrinking and zext propagation for fixed-width vector types (#104606 ) Check that `binop(zext(value), other)` is possible and profitable to transform into: `zext(binop(value, trunc(other)))`. When CPU architecture has illegal scalar type iX, but vector type <N * iX> is legal, scalar expressions before vectorisation may be extended to a legal type iY. This extension could result in underutilization of vector lanes, as more lanes could be used at one instruction with the lower type. Vectorisers may not always recognize opportunities for type shrinking, and this patch aims to address that limitation.	2024-09-10 10:09:03 +01:00
David Spickett	d1f3a92ea9	Revert "[AArch64] Remove special-case inserted shuffle cost." This reverts commit 19b785b7334d01354e8430634bab3c3341c671ca. My bisect must have been wrong because they're still failing, and there are follow ups to this that would need unpicking anyway.	2024-07-29 11:24:39 +00:00
David Spickett	19b785b733	Revert "[AArch64] Remove special-case inserted shuffle cost." This reverts commit f67fa3be4db68afc08c7f3d9523f1533fa5687b7. Caused test suite failures on AArch64: https://lab.llvm.org/buildbot/#/builders/17/builds/1349	2024-07-29 11:03:25 +00:00
Vitaly Buka	497e2e8cf8	[NFC][VectorCombine] Add negative sanitizer tests (#100832 ) They already work as expected.	2024-07-26 17:20:57 -07:00
David Green	f67fa3be4d	[AArch64] Remove special-case inserted shuffle cost. This special case tried to measure if the shuffle vector will be multiple inserts into an existing vector, with one of the lanes already in-place. If so it reduces the cost by 1 to represent that it can insert n-1 vector lanes. This isn't always true though as the original vector may need to be moved to a new value to start inserting new values into it, if other values from the original are still needed. This didn't affect performance much when I tried it, but should hopefully start to address a regression we see from differences in SLP vectorization lane orders.	2024-07-25 17:46:48 +01:00
Luke Lau	58854facb3	[RISCV] Don't cost vector arithmetic fp ops as cheaper than scalar (#99594 ) I was comparing some SPEC CPU 2017 benchmarks across rva22u64 and rva22u64_v, and noticed that in a few cases rva22u64_v was considerably slower. One of them was 519.lbm_r, which has a large loop that was being unprofitably vectorized. It has an if/else in the loop which requires large amounts of predication when vectorized, but despite the loop vectorizer taking this into account the vector cost came out as cheaper than the scalar. It looks like the reason for this is that we cost scalar floating point ops as 2, but their vector equivalents as 1 (for LMUL 1). This comes from how we use BasicTTIImpl for scalars which treats floats as twice as expensive as integers. This patch doubles the cost of vector floating point arithmetic ops so that they're at least as expensive as their scalar counterparts, which gives a 13% speedup on 519.lbm_r at -O3 on the spacemit-x60. Fixes #62576 (the last point there about scalar fsub/fmul)	2024-07-22 13:56:10 +08:00
Philip Reames	ded35c0c3a	[vectorcombine] Pull sext/zext through reduce.or/and/xor (#99548 ) This extends the existing foldTruncFromReductions transform to handle sext and zext as well. This is only legal for the bitwise reductions (and/or/xor) and not the arithmetic ones (add, mul). Use the same costing decision to drive whether we do the transform.	2024-07-18 13:56:40 -07:00
Philip Reames	5e8cd29d62	[RISCV] Add coverage for vector combine reduce(cast x) transformation This covers both the existing trunc transform - basically checking that it performs sanely with the RISCV cost model - and a planned change to handle sext/zext as well.	2024-07-18 11:39:25 -07:00
Simon Pilgrim	da286c8bf6	[VectorCombine] foldShuffleToIdentity - peek through bitcasts to see if they come from the same value to form identity sequence (#98334 ) Workaround until I can get #96884 fixed properly - when trying to find identity sequences, peek through any bitcasts to see if the values all came from the same source. We don't run CSE frequently enough to merge all the bitcasts that we end up with.	2024-07-15 21:36:23 +01:00
Simon Pilgrim	eb656ea6d7	[VectorCombine] Add vectorcombine specific test coverage for #98334 Don't rely on phaseordering tests alone	2024-07-15 11:14:43 +01:00
Simon Pilgrim	ef5b1ec0dd	[VectorCombine] foldShuffleToIdentity - ensure casts have the same source type	2024-07-09 13:10:11 +01:00
Simon Pilgrim	7e054c33d4	[VectorCombine] foldShuffleOfCastops - don't restrict to oneuse but compare total costs instead Some casts (especially bitcasts but others as well) are incredibly cheap (or free), so don't limit the shuffle(cast(x),cast(y)) -> cast(shuffle(x,y)) to oneuse cases, but instead compare the total before/after costs of possibly repeating some casts.	2024-07-08 14:57:51 +01:00
Elvis Wang	4762f3bab0	[RISCV][TTI] Add cost of type based binOp VP intrinsics with functionalOPC. (#93435 ) Intrinsics not supported in the backend will fall into BasicTTIImpl, which will check if the VP intrinsic is a type based instruction. All type based instructions will fall into `getTypeBasedIntrinsicInstrCost()`, which doesn't support instructions with scalable vector types. This patch adds the instruction cost for type based binOp VP intrinsic instructions in the backend to get the valid instruction costs. The cost of type based binOp VP intrinsics will be the same as their non-VP counterparts.	2024-07-05 08:13:18 +08:00
David Green	76c8e1d857	[VectorCombine] Guard against the lane zero select predicate being scalar All but the first lane was being checked, but this could leave the first lane with a scalar select predicate. This just extends the check to make sure the types are all the same	2024-06-28 17:27:16 +01:00
David Green	efa8463ab9	[VectorCombine] Add free concats to shuffleToIdentity. (#94954 ) This is another relatively small adjustment to shuffleToIdentity, which has had a few knock-on effects that needed a few more changes. It attempts to detect free concats, that will be legalized to multiple vector operations. For example if the lanes are '[a[0], a[1], b[0], b[1]]' and a and b are v2f64 under aarch64. In order to do this: - isFreeConcat detects whether the input has piece-wise identities from multiple inputs that can become a concat. - A tree of concat shuffles is created to concatenate the input values into a single vector. This is a little different to most other inputs as they are created from multiple values that are being combined together, and we cannot rely on the Lane0 insert location always being valid. - The insert location is changed to the original location instead of updating per item, which ensures it is valid due to the order that we visit and create items.	2024-06-25 07:55:08 +01:00
David Green	a1bdb01656	[VectorCombine] Change shuffleToIdentity to use Use. NFC When looking up through shuffles, a Value can be multiple different leaf types (for example an identity from one position, a splat from another). We currently detect this by recalculating which type of leaf it is when generating, but as more types of leafs are added (#94954) this doesn't scale very well. This patch switches it to use Use, not Value, to more accurately detect which type of leaf each Use should have.	2024-06-17 15:25:33 +01:00
Simon Pilgrim	22530e7985	[CostModel][X86] Update vXi8 mul costs for AVX512BW/AVX2/AVX1/SSE Later levels were inheriting some of the worst case costs from SSE/AVX1 etc. Based off llvm-mca numbers from the check_cost_tables.py script in https://github.com/RKSimon/llvm-scripts Cleanup prep work for #90748	2024-06-16 07:27:35 +01:00
David Green	d1b5a4b0c5	[VectorCombine] Add tests for shuffleToIdentity of concats. NFC	2024-06-07 22:28:00 +01:00
Henry Jiang	b5b61cce96	[VectorCombine] Preserves the maximal legal FPMathFlags during foldShuffleToIdentity (#94295 ) The `VectorCombine::foldShuffleToIdentity` does not preserve fast math flags when folding the shuffle, leading to unexpected vectorized result and missed optimizations with FMA instructions. We can conservatively take the maximal legal set of fast math flags whenever we fold shuffles to identity to enable further optimizations in the backend. --------- Co-authored-by: Henry Jiang <henry.jiang1@ibm.com>	2024-06-05 11:35:37 -04:00
David Green	93d8d74ae6	[VectorCombine] Remove requirement for Instructions in shuffleToIdentity (#93543 ) This removes the check that both operands of the original shuffle are instructions, which is a relic from a previous version that held more variables as Instructions.	2024-05-29 09:36:53 +01:00
David Green	1c6746e2db	[VectorCombine] Add support for zext/sext/trunc to shuffleToIdentity (#92696 ) This is one of the simple additions to shuffleToIdentity that help it look through intermediate zext/sext instructions.	2024-05-29 08:56:41 +01:00
David Green	516a9f5183	[VectorCombine] Add Cmp and Select for shuffleToIdentity (#92794 ) Other than some additional checks needed for compare predicates and selects with scalar condition operands, these are relatively simple additions to what already exists.	2024-05-28 13:10:19 +01:00
David Green	f53f2a8c92	[VectorCombine] Add constant splat handling for shuffleToIdentity (#92797 ) This just adds splat constants, which can be treated like any other splat which hopefully makes them very simple. It does not try to handle more complex constant vectors yet, just the more common splats.	2024-05-28 10:55:57 +01:00
Ramkumar Ramachandra	63d81311a2	VectorCombine: add tests written for InstSimplify (#92776 ) 2141907 (InstSimplify: increase shufflevector test coverage) was recently merged as a pre-commit test for some work that was misguided. It turns out that InstSimplify can never work on those tests, but the tests are useful nevertheless; move them to VectorCombine to support the development of VectorCombine::foldShuffleToIdentity.	2024-05-21 08:06:34 +01:00
David Green	285f1392da	[VectorCombine] Some more tests for different cmp's and fp consts. NFC	2024-05-20 18:27:33 +01:00
David Green	8b8a38a7b4	[VectorCombine] Additional extend tests for shuffleToIdentity. NFC	2024-05-19 10:18:26 +01:00
David Green	c3677e4522	[VectorCombine] Don't transform single shuffles in shuffleToIdentity This will help in later patches where the checks for operands being instructions is removed, and might help not remove unnecessary poison lanes.	2024-05-18 23:37:55 +01:00
Jay Foad	1650f1b3d7	Fix typo "indicies" (#92232 )	2024-05-15 13:10:16 +01:00
David Green	b7ed097f29	[VectorCombine] Add intrinsics handling to shuffleToIdentity (#91000 ) This is probably the most involved addition, as it tries to make use of isTriviallyVectorizable with isVectorIntrinsicWithScalarOpAtArg to handle a number of different intrinsics that are all lane-wise. Additional tests have been added for some of the different intrinsics from isVectorIntrinsicWithScalarOpAtArg / isVectorIntrinsicWithOverloadTypeAtArg.	2024-05-12 20:31:11 +01:00
Ramkumar Ramachandra	57b9c15227	VectorCombine: fix logical error after m_Trunc match (#91201 ) The matcher m_Trunc() matches an Operator with a given Opcode, which could either be an Instruction or ConstExpr. VectorCombine::foldTruncFromReductions() incorrectly assumes that the pattern matched is always an Instruction, and attempts a cast. Fix this. Fixes #88796.	2024-05-08 09:47:55 +01:00
Ramkumar Ramachandra	9ef28cf88c	VectorCombine: add test for crash #88796 (#91200 )	2024-05-08 09:43:49 +01:00
David Green	d145f40963	[VectorCombine] shuffleToIdentity - guard against call instructions. The shuffleToIdentity fold needs to be a bit more careful about the difference between call instructions and intrinsics. The second can be handled, but the first should result in bailing out. This patch also adds some extra intrinsic tests from #91000. Fixes #91078	2024-05-05 10:47:11 +01:00
David Green	a4d10266d2	[VectorCombine] Add foldShuffleToIdentity (#88693 ) This patch adds a basic version of a combine that attempts to remove shuffles that when combined simplify away to an identity shuffle. For example: %ab = shufflevector <8 x half> %a, <8 x half> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0> %at = shufflevector <8 x half> %a, <8 x half> poison, <4 x i32> <i32 7, i32 6, i32 5, i32 4> %abt = fneg <4 x half> %at %abb = fneg <4 x half> %ab %r = shufflevector <4 x half> %abt, <4 x half> %abb, <8 x i32> <i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0> By looking through the shuffles and fneg, it can be simplified to: %r = fneg <8 x half> %a The code tracks each lane starting from the original shuffle, keeping track of a vector of {src, idx}. As we propagate up through the instructions we will either look through intermediate instructions (binops and unops) or see a collection of lanes that all have the same src and incrementing idx (an identity). We can also see a single value with identical lanes, which we can treat like a splat. Only the basic version is added here, handling identities, splats, binops and unops. In follow-up patches other instructions can be added such as constants, intrinsics, cmp/sel and zext/sext/trunc.	2024-05-03 19:14:38 +01:00
Simon Pilgrim	282b56f43d	[VectorCombine] foldShuffleOfBinops - add support for length changing shuffles (#88899 ) Refactor to be closer to foldShuffleOfCastops - sibling patch to #88743 that can be used to address some of the issues identified in #88693	2024-04-24 10:18:49 +01:00
Simon Pilgrim	c45fbfdb8e	[VectorCombine][X86] shuffle-of-binops.ll - adjust no matching operand test to use FDIV Use of FDIV allows us to show a definite cost improvement with #88899	2024-04-23 17:31:01 +01:00
Simon Pilgrim	3197146cc6	[VectorCombine][AArch64] shuffletoidentity.ll - regenerate checks Reduce diffs in #88899	2024-04-23 17:06:18 +01:00
Simon Pilgrim	7f4f237cd8	[VectorCombine] foldShuffleOfShuffles - add missing arguments to getShuffleCost calls. Ensure the getShuffleCost arguments/instruction args are populated - minor extension to #88743 to help improve shuffle costs for certain corner cases (e.g. shuffles of loads)	2024-04-23 11:53:08 +01:00
Simon Pilgrim	b4c6607add	[VectorCombine][X86] Add test showing foldShuffleOfShuffles folding shuffles that would be better separate On AVX+ targets a broadcast load can be treated as free.	2024-04-23 11:11:14 +01:00
Simon Pilgrim	bddfbe748b	[VectorCombine] foldShuffleOfShuffles - fold "shuffle (shuffle x, undef), (shuffle y, undef)" -> "shuffle x, y" (#88743 ) Another step towards cleaning up shuffles that have been split, often across bitcasts between SSE intrinsic. Strip shuffles entirely if we fold to an identity shuffle.	2024-04-22 15:57:59 +01:00
Simon Pilgrim	4cc9c6d98d	[VectorCombine] foldShuffleOfBinops - don't fold shuffle(divrem(x,y),divrem(z,w)) if mask contains poison Fixes #89390	2024-04-22 09:00:38 +01:00
Simon Pilgrim	3fbaad5a0c	[VectorCombine] Add test coverage for #89390	2024-04-22 09:00:37 +01:00
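
The {src, idx} lane tracking described in a4d10266d2 (foldShuffleToIdentity) can be illustrated with a small Python model of the example from that commit message. This is only a sketch: the helper names `shuffle` and `is_identity` are hypothetical, the model propagates lanes forward from `%a` rather than walking up from the final shuffle as the commit describes, and it is not the LLVM implementation.

```python
# Each vector is modelled as a list of (src, idx) pairs recording which
# lane of which original value ends up in that position.

def shuffle(lhs, rhs, mask):
    """Model shufflevector: mask values < len(lhs) select from lhs, the rest from rhs.
    rhs may be None (poison) as long as the mask never selects from it."""
    n = len(lhs)
    return [lhs[m] if m < n else rhs[m - n] for m in mask]

def is_identity(lanes, src, width):
    """True if every lane i holds lane i of the same source value."""
    return lanes == [(src, i) for i in range(width)]

a = [("a", i) for i in range(8)]          # %a : <8 x half>
ab = shuffle(a, None, [3, 2, 1, 0])       # %ab
at = shuffle(a, None, [7, 6, 5, 4])       # %at
# fneg is lane-wise, so lane provenance passes through unchanged.
abt, abb = at, ab                         # %abt = fneg %at, %abb = fneg %ab
r = shuffle(abt, abb, [7, 6, 5, 4, 3, 2, 1, 0])  # %r

# All eight lanes come from %a in order, so the shuffles cancel and only
# the lane-wise fneg needs to be applied to the whole of %a.
print(is_identity(r, "a", 8))
```

Because the final lanes form an identity over `%a`, the shuffles are redundant in this model, which matches the simplified `%r = fneg <8 x half> %a` given in the commit message.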


260 Commits