The vector combiner will process all instructions as it first loops
through the function, adding any newly added and deleted instructions to
a worklist which is then processed when all nodes are done. This leaves
extra uses in the graph while the initial processing is performed, leading
to sub-optimal decisions being made for other combines. This changes it
so that trivially dead instructions are removed immediately. The main
change this requires is to make sure iterator invalidation does not
occur.
Fixes #153012
As we tolerate unfoldable constant expressions in `scalarizeOpOrCmp`, we
may fold
```llvm
define void @bug(ptr %ptr1, ptr %ptr2, i64 %idx) #0 {
entry:
%158 = insertelement <2 x i64> <i64 5, i64 ptrtoint (ptr @val to i64)>, i64 %idx, i32 0
%159 = or disjoint <2 x i64> splat (i64 2), %158
store <2 x i64> %159, ptr %ptr2
ret void
}
```
to
```llvm
define void @bug(ptr %ptr1, ptr %ptr2, i64 %idx) {
entry:
%.scalar = or disjoint i64 2, %idx
%0 = or <2 x i64> splat (i64 2), <i64 5, i64 ptrtoint (ptr @val to i64)>
%1 = insertelement <2 x i64> %0, i64 %.scalar, i64 0
store <2 x i64> %1, ptr %ptr2, align 16
ret void
}
```
The result is then folded back by `foldInsExtBinop`, resulting in an
infinite loop.
This patch only performs the scalarization when InstSimplify can fold
the constant expression.
Attempt to narrow a phi of shufflevector instructions where the two
incoming values have the same operands but different masks.
Related to #128938.
---------
Co-authored-by: Leon Clark <leoclark@amd.com>
Reopens #128938.
Attempt to shrink the size of vector loads where only some of the
incoming lanes are used for rebroadcasts in shufflevector instructions.
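A minimal sketch of the kind of rewrite this aims for (the function
names and types are illustrative, not taken from the actual tests): when
only lane 0 of a loaded vector is rebroadcast, the load can be shrunk to
just the lanes that are used.
```llvm
; before: a full 4-lane load whose only use is a broadcast of lane 0
define <4 x float> @broadcast_lane0(ptr %p) {
  %v = load <4 x float>, ptr %p, align 16
  %b = shufflevector <4 x float> %v, <4 x float> poison, <4 x i32> zeroinitializer
  ret <4 x float> %b
}

; after (sketch): only the used lane is loaded; the broadcast remains
define <4 x float> @broadcast_lane0_shrunk(ptr %p) {
  %v = load <1 x float>, ptr %p, align 16
  %b = shufflevector <1 x float> %v, <1 x float> poison, <4 x i32> zeroinitializer
  ret <4 x float> %b
}
```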
---------
Co-authored-by: Leon Clark <leoclark@amd.com>
Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>
Using GEP to index into a vector is not disallowed, but not recommended.
The SPIR-V backend needs to generate structured access into types, which
is impossible with an untyped GEP instruction unless we add more info to
the IR. Finding a solution is a work-in-progress, but in the meantime,
we'd like to reduce the number of failures.
Preventing this optimization from rewriting extract/insert
instructions into a GEP helps us lower more code to SPIR-V. This change
should be OK as it's only active when targeting SPIR-V and only
disables a non-recommended transformation.
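For illustration, this is roughly the kind of rewrite being disabled
when targeting SPIR-V (a hypothetical sketch, not a test from this
change): a load plus variable-index extract scalarized through a GEP
that indexes into the vector type.
```llvm
; before: extract a variable lane from a loaded vector
define i32 @extract_lane(ptr %p, i64 %idx) {
  %v = load <4 x i32>, ptr %p, align 16
  %e = extractelement <4 x i32> %v, i64 %idx
  ret i32 %e
}

; after: the GEP-based form that VectorCombine would normally produce,
; which indexes into the vector type through an untyped pointer
define i32 @extract_lane_gep(ptr %p, i64 %idx) {
  %gep = getelementptr inbounds <4 x i32>, ptr %p, i64 0, i64 %idx
  %e = load i32, ptr %gep, align 4
  ret i32 %e
}
```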
Related to #145002
This patch generalizes the existing foldBitOpOfBitcasts optimization in the VectorCombine pass to handle additional cast operations beyond just bitcast.
Fixes: [#146037](https://github.com/llvm/llvm-project/issues/146037)
Summary
The optimization now supports folding bitwise operations (AND/OR/XOR)
with the following cast operations:
- bitcast (original functionality)
- trunc (truncate)
- sext (sign extend)
- zext (zero extend)
The transformation pattern is:
bitop(castop(x), castop(y)) -> castop(bitop(x, y))
This reduces the number of cast instructions from 2 to 1, improving
performance on targets where cast operations
are expensive or where performing bitwise operations on narrower types
is beneficial.
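For example (an illustrative sketch, not one of the added tests), an
`and` of two `zext`s becomes a `zext` of a narrower `and`:
```llvm
; before: two zexts feeding a bitwise and
define <4 x i32> @and_of_zexts(<4 x i16> %x, <4 x i16> %y) {
  %xz = zext <4 x i16> %x to <4 x i32>
  %yz = zext <4 x i16> %y to <4 x i32>
  %r = and <4 x i32> %xz, %yz
  ret <4 x i32> %r
}

; after: the and is performed on the narrow type, with a single zext
define <4 x i32> @and_of_zexts_folded(<4 x i16> %x, <4 x i16> %y) {
  %a = and <4 x i16> %x, %y
  %r = zext <4 x i16> %a to <4 x i32>
  ret <4 x i32> %r
}
```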
Implementation Details
- Renamed foldBitOpOfBitcasts to foldBitOpOfCastops to reflect broader
functionality
- Extended pattern matching to handle any CastInst operation
- Added validation for each cast type's constraints (e.g., trunc
requires source > dest)
- Updated cost model to use the actual cast opcode
- Preserves IR flags from original instructions
- Handles multi-use scenarios appropriately
Testing
- Added comprehensive tests in
test/Transforms/VectorCombine/bitop-of-castops.ll
- Tests cover all supported cast types with all bitwise operations
- Includes negative tests for unsupported patterns
- All existing VectorCombine tests pass
getVectorInstrCostHelper would return costs of zero for vector
inserts/extracts that move data between GPR and vector registers, if
there was no 'real' use, i.e. there was no corresponding existing
instruction.
This meant that passes like LoopVectorize and SLPVectorizer, which
likely are the main users of the interface, would underestimate the cost
of insert/extracts that move data between GPR and vector registers,
which can have non-trivial costs.
The patch removes the special case and only returns costs of zero for
lane 0 if there is no need to transfer between integer and vector
registers.
This impacts a number of SLP tests, and most of them look like general
improvements. I think the change should make things more accurate for
any AArch64 target, but if not it could also just be Apple CPU specific.
I am seeing +2% end-to-end improvements on SLP-heavy workloads.
PR: https://github.com/llvm/llvm-project/pull/146526
Add a new scalarization transform that tries to convert extracts of a
vector ZExt to a set of scalar shift and mask operations. This can be
profitable if the cost of extracting is the same or higher than the cost
of 2 scalar ops. This is the case on AArch64 for example.
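A minimal sketch of the idea, assuming a little-endian lane layout
(names and types are illustrative):
```llvm
; before: the whole vector is zero-extended just to extract one lane
define i32 @extract_of_zext(<4 x i8> %v) {
  %z = zext <4 x i8> %v to <4 x i32>
  %e = extractelement <4 x i32> %z, i64 1
  ret i32 %e
}

; after (sketch): the narrow vector is moved to a scalar register once
; and the lane is recovered with a shift and a mask
define i32 @extract_of_zext_scalarized(<4 x i8> %v) {
  %bits = bitcast <4 x i8> %v to i32
  %shift = lshr i32 %bits, 8   ; lane 1 starts at bit 8 (little endian)
  %lane = and i32 %shift, 255  ; mask back down to the original i8 width
  ret i32 %lane
}
```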
For AArch64, this shows up in a number of workloads, including av1aom,
gmsh, minizinc and astc-encoder.
PR: https://github.com/llvm/llvm-project/pull/142976
Some intrinsics like llvm.abs or llvm.powi have a scalar argument even
when the overloaded type is a vector.
This patch handles these in scalarizeOpOrCmp to allow scalarizing them.
In the test, the leftover vector powi isn't folded away to poison; this
should be fixed in a separate patch.
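A rough sketch of what this enables for llvm.powi, whose exponent
operand stays scalar (illustrative only; as noted above, the leftover
vector powi on the constant lanes is not yet folded away):
```llvm
declare <2 x double> @llvm.powi.v2f64.i32(<2 x double>, i32)
declare double @llvm.powi.f64.i32(double, i32)

; before: a powi of an insert of %x into a constant base vector
define <2 x double> @powi_scalarize(double %x, i32 %exp) {
  %ins = insertelement <2 x double> <double 1.0, double 2.0>, double %x, i64 0
  %r = call <2 x double> @llvm.powi.v2f64.i32(<2 x double> %ins, i32 %exp)
  ret <2 x double> %r
}

; after (sketch): lane 0 is computed with a scalar powi and reinserted
; into the vector powi of the remaining constant lanes
define <2 x double> @powi_scalarized(double %x, i32 %exp) {
  %s = call double @llvm.powi.f64.i32(double %x, i32 %exp)
  %vecc = call <2 x double> @llvm.powi.v2f64.i32(<2 x double> <double 1.0, double 2.0>, i32 %exp)
  %r = insertelement <2 x double> %vecc, double %s, i64 0
  ret <2 x double> %r
}
```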
Currently, LLVM fails to convert certain pblendvb intrinsics into select
instructions when the blend mask is derived from complex boolean logic
operations. This occurs even when the mask is ultimately based on
sign-extended comparison results, preventing further optimization
opportunities.
Fixes #66513
---------
Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>
The shuffle merging code assumes that the shuffle sources are all the
same type, which fails if we've changed length and don't have 2 inner
shuffles. We already handle length-changing shuffles if we do have 2
inner shuffles.
This patch creates a fake "all poison" shuffle mask and reuses the other
shuffle's sources, which can be safely used with the existing merge
code.
The alternative was a considerable refactor of the merge code to account
for different vector widths.
Fixes #144656
This adds support for unary operators, and unary + ternary intrinsics in
scalarizeOpOrCmp (FKA scalarizeBinOpOrCmp).
The motivation behind this is to scalarize more intrinsics in
VectorCombine rather than in DAGCombine, so we can sink splats across
basic blocks: see https://github.com/llvm/llvm-project/pull/137786
The main change required is to generalize the existing VecC0/VecC1 rules
across n-ary ops:
- An operand can either be a constant vector or an insert of a scalar
into a constant vector
- If it's an insert, the index needs to be static and in bounds
- If it's an insert, all indices need to be the same across all operands
- If all the operands are constant vectors, bail as it will get constant
folded anyway
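For example, under these rules a ternary intrinsic such as fma whose
operands are all lane-0 inserts into constant vectors can be scalarized
(a hypothetical sketch, not one of the added tests):
```llvm
declare <4 x float> @llvm.fma.v4f32(<4 x float>, <4 x float>, <4 x float>)
declare float @llvm.fma.f32(float, float, float)

; before: every operand is an insert of a scalar into a constant vector
; at the same static, in-bounds index
define <4 x float> @fma_scalarize(float %a, float %b, float %c) {
  %ia = insertelement <4 x float> poison, float %a, i64 0
  %ib = insertelement <4 x float> poison, float %b, i64 0
  %ic = insertelement <4 x float> poison, float %c, i64 0
  %v = call <4 x float> @llvm.fma.v4f32(<4 x float> %ia, <4 x float> %ib, <4 x float> %ic)
  ret <4 x float> %v
}

; after: a single scalar fma inserted back at lane 0
define <4 x float> @fma_scalarized(float %a, float %b, float %c) {
  %s = call float @llvm.fma.f32(float %a, float %b, float %c)
  %v = insertelement <4 x float> poison, float %s, i64 0
  ret <4 x float> %v
}
```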
This addresses a TODO where previously scalarizeBinopOrCmp
conservatively bailed if one of the operands was a load.
getVectorInstrCost was updated to take in values in
https://reviews.llvm.org/D140498 so we can pass in the scalar value to
be inserted, which should return an accurate cost for a gather.
To prevent regressions on x86 this tries to constant fold NewVecC up
front so we can pass it into TTI and get a more accurate cost.
We want to remove this restriction on RISC-V since this is always
profitable whether or not the scalar is a load.
Currently VectorCombine can scalarize vector compares and binary ops.
This extends it to also scalarize binary-op like intrinsics like umax,
minnum etc.
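A sketch of the kind of fold this enables, in the form VectorCombine
expects (illustrative, not one of the added tests):
```llvm
declare <4 x i32> @llvm.umax.v4i32(<4 x i32>, <4 x i32>)
declare i32 @llvm.umax.i32(i32, i32)

; before: one operand is an insert into a constant vector, the other is
; a constant splat
define <4 x i32> @umax_scalarize(i32 %x) {
  %ins = insertelement <4 x i32> <i32 1, i32 2, i32 3, i32 4>, i32 %x, i64 0
  %r = call <4 x i32> @llvm.umax.v4i32(<4 x i32> %ins, <4 x i32> splat (i32 42))
  ret <4 x i32> %r
}

; after: a scalar umax for lane 0, inserted into the constant-folded
; umax of the remaining lanes
define <4 x i32> @umax_scalarized(i32 %x) {
  %s = call i32 @llvm.umax.i32(i32 %x, i32 42)
  %r = insertelement <4 x i32> <i32 42, i32 42, i32 42, i32 42>, i32 %s, i64 0
  ret <4 x i32> %r
}
```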
The motivation behind this is to scalarize more intrinsics in
VectorCombine rather than in DAGCombine, so we can sink splats across
basic blocks: see #137786
This currently has very little effect on generated code because
InstCombine doesn't yet canonicalize binary intrinsics where one operand
is a constant into the form that VectorCombine expects, i.e. `binop
(shuffle insert) const --> shuffle (binop insert const)`. The plan is to
land this first and then in a subsequent patch teach InstCombine to do
the canonicalization to avoid regressions in the meantime.
This uses `isTriviallyVectorizable` to determine whether or not an
intrinsic is safe to scalarize. There's also `isTriviallyScalarizable`,
but this seems more geared towards the Scalarizer pass and includes
intrinsics with multiple return values.
It also only handles intrinsics with two operands of the same type as
the return type. In the future we could generalize this to handle
arbitrary numbers of operands, including unary operators too, e.g. fneg
or fma, as well as different operand types, e.g. powi or scmp.
The intent of this code is to split larger vectors into smaller shuffles, but
it is currently triggering on some small vector types. Limit it to vectors
larger than 128 bits.
The shuffle needn't be twice the original number of vector elements, so
the intermediate type used between the shuffle and the intrinsic should
use the ShuffleDstTy number of elements.
I found this when looking at shuffle costs and do not have a test where it
alters the output, but have added some cases where the shuffle output is
not twice the size of the input.
This adds a test that exercises the part of scalarizeBinOpOrCmp that
produces immediate UB as described in
https://github.com/llvm/llvm-project/pull/138095#discussion_r2070133432,
but is fortunately currently folded into a correct transform.
I also noticed a bunch of immediate UB in some of the existing tests so
this also cleans them up. They should still all be scalarized though.
Since e39f6c1844fab59c638d8059a6cf139adb42279a, opt will infer the
correct datalayout when given a triple. Avoid explicitly specifying it
in tests that depend on the AMDGPU target being present to avoid the
string becoming out of sync with the TargetInfo value.
Only tests with REQUIRES: amdgpu-registered-target or a local lit.cfg
were updated to ensure that tests for non-target-specific passes that
happen to use the AMDGPU layout still pass when building with a limited
set of targets.
Reviewed By: shiltian, arsenm
Pull Request: https://github.com/llvm/llvm-project/pull/137921
Previously only fixed vector splats were handled. This adds support for
scalable vectors too by allowing ConstantExpr splats.
We need to add the extra V->getType()->isVectorTy() check because a
ConstantExpr might be a scalar to vector bitcast.
By allowing ConstantExprs this also allows fixed vector ConstantExprs to
be folded, which causes the diffs in
llvm/test/Analysis/ValueTracking/known-bits-from-operator-constexpr.ll
and llvm/test/Transforms/InstSimplify/ConstProp/cast-vector.ll. I can
remove them from this PR if reviewers would prefer.
Fixes #132922
In the previous code (#128032), the destination vector was passed as the
getShuffleCost argument. Because the shuffle mask indexes into the
concatenation of the two source vectors, the maximum mask value is twice
the size of the source vector. This causes a problem if the destination
vector is smaller than the source vector and the mask specifies an index
that exceeds the size of the destination vector.
Fix the problem by correcting the previous code, which was using the
wrong argument in the cost calculation.
Fixes #130250
(shuffle(select(c1,t1,f1)), (select(c2,t2,f2)), m)
-> (select (shuffle c1,c2,m), (shuffle t1,t2,m), (shuffle f1,f2,m))
The behaviour of SelectInst on vectors is the same as for
`V'select[i] = Condition[i] ? V'True[i] : V'False[i]`.
If a ShuffleVector is performed on two selects, it behaves like:
`V'[mask] = (V'select[i] = Condition[i] ? V'True[i] : V'False[i])`
That's why a ShuffleVector of two SelectInsts is equivalent to
first shuffling Condition/True/False and then applying a SelectInst to
that result.
This patch implements the transform described above.
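A sketch of the transform in IR (mask and types are illustrative):
```llvm
; before: a shuffle of two vector selects
define <4 x i32> @shuffle_of_selects(<4 x i1> %c1, <4 x i32> %t1, <4 x i32> %f1, <4 x i1> %c2, <4 x i32> %t2, <4 x i32> %f2) {
  %s1 = select <4 x i1> %c1, <4 x i32> %t1, <4 x i32> %f1
  %s2 = select <4 x i1> %c2, <4 x i32> %t2, <4 x i32> %f2
  %r = shufflevector <4 x i32> %s1, <4 x i32> %s2, <4 x i32> <i32 0, i32 4, i32 1, i32 5>
  ret <4 x i32> %r
}

; after: shuffle the conditions and both arms, then select once
define <4 x i32> @select_of_shuffles(<4 x i1> %c1, <4 x i32> %t1, <4 x i32> %f1, <4 x i1> %c2, <4 x i32> %t2, <4 x i32> %f2) {
  %cs = shufflevector <4 x i1> %c1, <4 x i1> %c2, <4 x i32> <i32 0, i32 4, i32 1, i32 5>
  %ts = shufflevector <4 x i32> %t1, <4 x i32> %t2, <4 x i32> <i32 0, i32 4, i32 1, i32 5>
  %fs = shufflevector <4 x i32> %f1, <4 x i32> %f2, <4 x i32> <i32 0, i32 4, i32 1, i32 5>
  %r = select <4 x i1> %cs, <4 x i32> %ts, <4 x i32> %fs
  ret <4 x i32> %r
}
```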
Proof: https://alive2.llvm.org/ce/z/97wfHp
Fixes #120775
Replace a binary op of two reductions with one reduction of the binary op
applied to vectors. For example:
```
%v0_red = tail call i32 @llvm.vector.reduce.add.v16i32(<16 x i32> %v0)
%v1_red = tail call i32 @llvm.vector.reduce.add.v16i32(<16 x i32> %v1)
%res = add i32 %v0_red, %v1_red
```
gets transformed to:
```
%1 = add <16 x i32> %v0, %v1
%res = call i32 @llvm.vector.reduce.add.v16i32(<16 x i32> %1)
```
The `foldInsExtVectorToShuffle` function combines an extractelement from one vector and an insertelement into another vector into a single shuffle. However, we only supported vectors of the same size.
This commit allows combining extract/insert for vectors with the same element type but different lengths by converting the length of the vectors.
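A hypothetical sketch of the kind of pattern this now handles (the exact
shuffles emitted may differ): extracting from a <4 x i32> and inserting
into an <8 x i32> becomes a length-changing shuffle followed by a
two-source shuffle.
```llvm
; before: extract lane 2 of %src and insert it at lane 5 of %dst
define <8 x i32> @ins_ext(<4 x i32> %src, <8 x i32> %dst) {
  %e = extractelement <4 x i32> %src, i64 2
  %r = insertelement <8 x i32> %dst, i32 %e, i64 5
  ret <8 x i32> %r
}

; after (sketch): %src is length-converted with a poison-padded shuffle,
; then a single two-source shuffle places src lane 2 into dst lane 5
define <8 x i32> @ins_ext_shuffled(<4 x i32> %src, <8 x i32> %dst) {
  %src.wide = shufflevector <4 x i32> %src, <4 x i32> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 poison, i32 poison, i32 poison, i32 poison>
  %r = shufflevector <8 x i32> %dst, <8 x i32> %src.wide, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 10, i32 6, i32 7>
  ret <8 x i32> %r
}
```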
Proof: https://alive2.llvm.org/ce/z/ELNLr7
Fixed https://github.com/llvm/llvm-project/issues/120772
If we're interleaving 2 constant splats, for instance `<vscale x 8 x
i32> <splat of 666>` and `<vscale x 8 x i32> <splat of 777>`, we can
create a larger splat `<vscale x 8 x i64> <splat of ((777 << 32) |
666)>` first before casting it back into `<vscale x 16 x i32>`.
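A sketch in IR on a little-endian target, using the constants from the
example above:
```llvm
declare <vscale x 16 x i32> @llvm.vector.interleave2.nxv16i32(<vscale x 8 x i32>, <vscale x 8 x i32>)

; before: interleaving two constant splats
define <vscale x 16 x i32> @interleave_splats() {
  %v = call <vscale x 16 x i32> @llvm.vector.interleave2.nxv16i32(<vscale x 8 x i32> splat (i32 666), <vscale x 8 x i32> splat (i32 777))
  ret <vscale x 16 x i32> %v
}

; after (sketch): one wider splat of (777 << 32) | 666 = 0x3090000029A,
; bitcast back to the interleaved element type
define <vscale x 16 x i32> @interleave_splats_folded() {
  %v = bitcast <vscale x 8 x i64> splat (i64 3337189589658) to <vscale x 16 x i32>
  ret <vscale x 16 x i32> %v
}
```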
Add a foldInsExtBinop fold to clean up missed vectorization cases that can happen on targets with cheap insert/extract instructions, which prevent foldExtractExtract (binop(extract(x),extract(y)) -> extract(binop(x,shuffle(y)))) from helping with the merge.
These were based on instruction count, not throughput - we can probably improve these further, but these throughput numbers match the worst expanded shuffles we see in the vector-shuffle-128-v* codegen tests.
If we're just moving a single element around inside a 128-bit lane (probably as an alternative to extracting it), we can assume this is cheap as a single PSRLDQ/PSHUFD/SHUFPS.
I've got the horrid feeling we're moving towards matching all SSE shuffle patterns inside the cost model, but I'm going to do my best to avoid this for now :|
CmpPredicate::getMatching implicitly assumes that both predicates are
integer-predicates, and this has led to a crash being reported in
VectorCombine after e409204 (VectorCombine: teach foldExtractedCmps
about samesign). FP predicates are simple enough to handle as there is
never any samesign information associated with them: hence handle them
in CmpPredicate::getMatching, fixing the VectorCombine crash and
guarding against future incorrect usages.
Follow up on 4a0d53a (PatternMatch: migrate to CmpPredicate) to get rid
of one of the FIXMEs it introduced by replacing a predicate comparison
with CmpPredicate::getMatching.
foldPermuteOfBinops currently requires both binop operands to be oneuse shuffles to fold the shuffles across the binop, but there will be cases where it's still profitable to fold across the binop with only one foldable shuffle.
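For instance (a hypothetical sketch), folding the outer permute across an fadd where only the first operand is a oneuse shuffle can cancel the shuffles on that operand entirely:
```llvm
; before: only %x goes through a (oneuse) shuffle before the binop
define <4 x float> @permute_of_binop(<4 x float> %x, <4 x float> %y) {
  %sx = shufflevector <4 x float> %x, <4 x float> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
  %b = fadd <4 x float> %sx, %y
  %r = shufflevector <4 x float> %b, <4 x float> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
  ret <4 x float> %r
}

; after (sketch): the outer permute is folded into both operands; the
; composed mask on %x is the identity, so only %y still needs a shuffle
define <4 x float> @permute_of_binop_folded(<4 x float> %x, <4 x float> %y) {
  %sy = shufflevector <4 x float> %y, <4 x float> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
  %r = fadd <4 x float> %x, %sy
  ret <4 x float> %r
}
```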
improveShuffleKindFromMask matches this as a SK_InsertSubvector of a v1f32 (which legalises to f32) into a v4f32 base vector, making it easy to recognise. MOVSS is limited to index 0.
Avoid always assuming the worst for v4f32 2 input shuffles, and match the SHUFPS pattern where possible - each pair of output elements must come from the same source register.