llvm-project

Author	SHA1	Message	Date
Simon Pilgrim	769c22f25b	[VectorCombine] Fold reduce(trunc(x)) -> trunc(reduce(x)) iff cost effective (#81852) Vector truncations can be pretty expensive, especially on X86, whilst scalar truncations are often free. If the cost of performing the add/mul/and/or/xor reduction is cheap enough on the pre-truncated type, then avoid the vector truncation entirely. Fixes https://github.com/llvm/llvm-project/issues/81469	2024-02-19 11:32:23 +00:00
Jeremy Morse	2425e2940e	[DebugInfo][RemoveDIs] Have getInsertionPtAfterDef return an iterator (#73149) Part of the "RemoveDIs" project to remove debug intrinsics requires passing block-positions around in iterators rather than as instruction pointers, allowing some debug-info to reside in BasicBlock::iterator. This means getInsertionPointAfterDef has to return an iterator, and as it can return no-instruction that means returning an optional iterator. This patch changes the signature for getInsertionPtAfterDef and then patches up the various places that use it to handle the different type. This would overall be an NFC patch, however in InstCombinerImpl::freezeOtherUses I've started skipping any debug intrinsics at the returned insert-position. This should not have any _meaningful_ effect on the compiler output: at worst it means variable assignments that are skipped will now cover the freeze instruction and anything inserted before it, which should be inconsequential. Sadly: this makes the function signature ugly. This is probably the ugliest piece of fallout for the "RemoveDIs" work, but it serves the overall purpose of improving compile times and not allowing `-g` to affect compiler output, so should be worthwhile in the end.	2023-11-30 12:19:57 +00:00
Youngsuk Kim	859338a695	[llvm] Replace uses of Type::getPointerTo (NFC) Work towards removing method Type::getPointerTo. Opaque ptr cleanup effort.	2023-11-29 10:22:31 -06:00
Nikita Popov	03f05a4e72	[IR] Don't include GenericDomTreeConstruction.h in header (NFC) The whole point of the GenericDomTree.h vs GenericDomTreeConstruction.h distinction is that the latter only needs to be included in the source file and not the header.	2023-11-22 09:06:36 +01:00
Michael Maitland	acef83c142	[VectorCombine] Fix crash in scalarizeVPIntrinsic (#72039) When getSplatOp returns nullptr, the intrinsic cannot be scalarized. This patch includes a test case that fixes a crash from trying to scalarize the VPIntrinsic when getSplatOp returns nullptr. This fixes https://github.com/llvm/llvm-project/issues/72034.	2023-11-11 19:54:15 -05:00
Nikita Popov	6a06155c53	[VectorCombine] Discard ScalarizationResults if transform aborted Fixes https://github.com/llvm/llvm-project/issues/69820.	2023-10-31 11:24:30 +01:00
Nabeel Omer	8e31acf8ca	[VectorCombine] Add special handling for truncating shuffles (#70013) When dealing with a truncating shuffle, we can end up in a situation where the type passed to getShuffleCost is the type of the result of the shuffle, and the mask references an element which is out of bounds of the result vector. If dealing with truncating shuffles, pass the type of the input vectors to `getShuffleCost()` in order to avoid an out-of-bounds assertion.	2023-10-24 15:03:43 +01:00
Hans Wennborg	e2fc68c3db	Typos: 'maxium', 'minium'	2023-10-23 10:42:28 +02:00
Luke Lau	c35939b22e	[VectorCombine] Use isSafeToSpeculativelyExecute to guard VP scalarization (#69494) Previously we were just matching against a fixed list of VP intrinsics that we knew couldn't be speculated, but we can reuse the logic in isSafeToSpeculativelyExecuteWithOpcode. This also allows speculation in more cases, e.g. when the divisor is known to be non-zero. Unfortunately we can't reuse the exact same function call for VP intrinsics with functional intrinsics instead of opcodes, because isSafeToSpeculativelyExecute needs an instruction that already exists. So this just copies the logic by peeking into the function attributes of the intrinsic.	2023-10-19 12:45:21 -04:00
Alexey Bataev	c2ae16f6a7	[VectorCombine] Fix a crash during long vector analysis. If analysis of a single vector is requested, the original type must be used to avoid a crash.	2023-10-09 14:22:37 -07:00
Simon Pilgrim	bea3967271	[VectorCombine] Rename foldBitcastShuf -> foldBitcastShuffle. NFC. Consistently use the term "Shuffle" in all vector combiner folds.	2023-10-09 11:28:50 +01:00
Simon Pilgrim	94795a37e8	[VectorCombine] foldBitcastShuf - add support for length changing shuffles Allow length changing shuffle masks in the "bitcast (shuf V, MaskC) --> shuf (bitcast V), MaskC'" fold. It also exposes some poor shuffle mask detection for extract/insert subvector cases inside improveShuffleKindFromMask. First stage towards addressing Issue #67803	2023-10-06 11:59:51 +01:00
Simon Pilgrim	d3e66a88c2	[VectorCombine] foldBitcastShuf - compute scale factors using shuffle type element size instead of element count. NFCI. First step towards supporting length changing shuffles	2023-10-05 18:58:36 +01:00
Nikita Popov	3b82397965	[VectorCombine] Check for non-byte-sized element type We should check whether the element type is non-byte-sized, not the vector type. For types like <32 x i1> the whole type is byte-sized, but the individual elements (that we scalarize to) are not. Fixes https://github.com/llvm/llvm-project/issues/67060.	2023-09-28 14:18:30 +02:00
Ben Shi	ea0ee55c02	[VectorCombine] Enable transform 'scalarizeLoadExtract' for non constant indexes (#65445) Enable the transform if a non constant index is guaranteed to be safe via a UREM/AND.	2023-09-26 09:41:53 +08:00
Michael Maitland	e0aaa1956d	[VectorCombine][RISCV] Convert VPIntrinsics with splat operands to splats (#65706) of the scalar operation VP Intrinsics whose vector operands are both splat values may be simplified into the scalar version of the operation and the result is splatted. This issue is the intrinsic dual of #65072.	2023-09-20 18:27:51 -04:00
Ben Shi	87143ff9f2	[VectorCombine] Fix a spot in commit 068357d9b09cd635b1c2f126d119ce9afecb28f7 My previous commit led to a crash in "Builders/sanitizer-x86_64-linux-fast" (see https://lab.llvm.org/buildbot/#/builders/5/builds/36746). This patch fixes it.	2023-09-18 15:01:47 +08:00
Ben Shi	068357d9b0	[VectorCombine] Enable transform 'scalarizeLoadExtract' for scalable vector types (#65443) The transform 'scalarizeLoadExtract' can be applied to scalable vector types if the index is less than the minimum number of elements. The check whether the index is less than the minimum number of elements is located at lines 1175~1180. 'scalarizeLoadExtract' calls 'canScalarizeAccess' and checks the returned result to decide whether this transform is safe. At the beginning of the function 'canScalarizeAccess', the index is checked: 1. whether it is less than the number of elements of a fixed vector type; 2. whether it is less than the minimum number of elements of a scalable vector type. Otherwise 'canScalarizeAccess' returns unsafe and this transform is prevented.	2023-09-18 10:49:18 +08:00
Ben Shi	ad35d916cd	[VectorCombine] Enable transform 'foldSingleElementStore' for scalable vector types The transform 'foldSingleElementStore' can be applied to scalable vector types if the index is less than the minimum number of elements. Reviewed By: dmgreen, nikic Differential Revision: https://reviews.llvm.org/D157676	2023-08-23 17:12:36 +08:00
Nuno Lopes	d75fb17963	[VectorCombine] Use poison instead of undef as placeholder [NFC] These vector lanes are never accessed. They are used for shifting a value into the right lane and therefore only 1 value of the whole vector is actually used	2023-07-19 10:29:08 +01:00
ManuelJBrito	d22edb9794	[IR][NFC] Change UndefMaskElem to PoisonMaskElem Following the change in shufflevector semantics, poison will be used to represent undefined elements in shufflevector masks. Differential Revision: https://reviews.llvm.org/D149256	2023-04-27 18:01:54 +01:00
Bjorn Pettersson	a20f7efbc5	Remove several no longer needed includes. NFCI Mostly removing includes of InitializePasses.h and Pass.h in passes that no longer have support for the legacy PM.	2023-04-17 13:54:19 +02:00
Kazu Hirata	c83c4b58d1	[Transforms] Apply fixes from performance-for-range-copy (NFC)	2023-04-16 08:25:28 -07:00
Sanjay Patel	af39acda88	[VectorCombine] fix insertion point of shuffles As shown in issue #60649, the new shuffles were being inserted before a phi, and that is invalid. It seems like most test coverage for this fold (foldSelectShuffle) lives in the AArch64 dir, but this doesn't repro there for a base target.	2023-02-10 10:57:11 -05:00
Arthur Eubanks	15977742d3	Reland [LegacyPM] Remove some legacy passes These are part of the optimization pipeline, of which the legacy pass manager version is deprecated. Namely * Internalize * StripSymbols * StripNonDebugSymbols * StripDeadDebugInfo * StripDeadPrototypes * VectorCombine * WarnMissedTransformations Fixed previously failing ocaml tests (one of them seems to already be failing?)	2023-02-07 12:56:05 -08:00
Arthur Eubanks	1b254022b2	Revert "[LegacyPM] Remove some legacy passes" This reverts commit a4b4f62beb0bf40123181e5f5bdf32ef54f87166. Ocaml bindings tests failing.	2023-02-07 10:17:45 -08:00
Arthur Eubanks	a4b4f62beb	[LegacyPM] Remove some legacy passes These are part of the optimization pipeline, of which the legacy pass manager version is deprecated. Namely * Internalize * StripSymbols * StripNonDebugSymbols * StripDeadDebugInfo * StripDeadPrototypes * VectorCombine * WarnMissedTransformations	2023-02-07 09:57:48 -08:00
ShihPo Hung	5fb3a57ea7	[Cost] Add CostKind to getVectorInstrCost and its related users LoopUnroll estimates the loop size via getInstructionCost(), but getInstructionCost() cannot pass CostKind to getVectorInstrCost(). Likewise, getShuffleCost() cannot pass it to getBroadcastShuffleOverhead(), getPermuteShuffleOverhead(), getExtractSubvectorOverhead(), and getInsertSubvectorOverhead(). To address this, this patch adds an argument CostKind to these functions. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D142116	2023-01-21 05:29:24 -08:00
Benjamin Kramer	b6942a2880	[NFC] Hide implementation details in anonymous namespaces	2023-01-08 17:37:02 +01:00
Matt Devereau	ee4d6c8bf0	[VectorCombine] Enable scalarizeBinopOrCmp for scalable vectors This reverts a change to exclude scalarizeBinopOrCmp in VectorCombine for scalable vectors which caused poor scalable Binop codegen. Differential Revision: https://reviews.llvm.org/D138545	2022-11-23 13:17:21 +00:00
Benjamin Kramer	f116107f2d	[VectorCombine] Don't touch instruction after foldSingleElementStore, it might be deleted Use after free found by asan.	2022-11-22 21:12:42 +01:00
Sanjay Patel	ede6d608f4	[VectorCombine] switch on opcode to compile faster This follows 87debdadaf18 to further eliminate wasting time calling helper functions only to early return to the main run loop. Once again, this results in significant savings based on experimental data: https://llvm-compile-time-tracker.com/compare.php?from=01023bfcd33f922ed8c934ce563e54abe8bfe246&to=3dce4f70b73e48ccb045decb634c185e6b4c67d5&stat=instructions:u This is NFCI other than making the pass faster. The total cost of VectorCombine runs in an -O3 build appears to be well under 0.1% of compile-time now, so there's not much left to do AFAICT. There's a TODO about making the code cleaner, but it probably doesn't change timing much. I didn't include those changes here because it requires updating much more code.	2022-11-22 10:23:32 -05:00
Sanjay Patel	163bb6d64e	[Passes][VectorCombine] enable early run generally and try load folds An early run of VectorCombine was added with D102496 specifically to deal with unnecessary vector ops produced with the C matrix extension. This patch is proposing to try those folds in general and add a pair of load folds to the menu. The load transform will partly solve (see PhaseOrdering diffs) a longstanding vectorization perf bug by removing redundant loads via GVN: issue #17113 The main reason for not enabling the extra pass generally in the initial patch was compile-time cost. The cost of VectorCombine was significantly (surprisingly) improved with: 87debdadaf18 https://llvm-compile-time-tracker.com/compare.php?from=ffe05b8f57d97bc4340f791cb386c8d00e0739f2&to=87debdadaf18f8a5c7e5d563889e10731dc3554d&stat=instructions:u ...so the extra run is going to cost very little now - the total cost of the 2 runs should be less than the 1 run before that micro-optimization: https://llvm-compile-time-tracker.com/compare.php?from=5e8c2026d10e8e2c93c038c776853bed0e7c8fc1&to=2c4b68eab5ae969811f422714e0eba44c5f7eefb&stat=instructions:u It may be possible to reduce the cost slightly more with a few more earlier-exits like that, but it's probably in the noise based on timing experiments. Differential Revision: https://reviews.llvm.org/D138353	2022-11-21 13:57:55 -05:00
Sanjay Patel	8f337f8ffe	[VectorCombine] generalize pass param name for early combines; NFC The option was added with https://reviews.llvm.org/D102496, and currently the name is accurate, but I am hoping to add a load transform that is not a scalarization. See issue #17113.	2022-11-21 13:57:55 -05:00
Sanjay Patel	87debdadaf	[VectorCombine] check instruction type before dispatching to folds No externally visible change is intended, but this appears to be a noticeable (surprising) improvement in compile-time based on: https://llvm-compile-time-tracker.com/compare.php?from=0f3e72e86c8c7c6bf0ec24bf1e2acd74b4123e7b&to=5e8c2026d10e8e2c93c038c776853bed0e7c8fc1&stat=instructions:u The early returns in the individual fold functions are not good enough to avoid the overhead of the many "fold*" calls, so this speeds up the main instruction loop enough to make a difference.	2022-11-18 16:03:18 -05:00
Sanjay Patel	b57819e130	[VectorCombine] widen a load with subvector insert This adapts/copies code from the existing fold that allows widening of load scalar+insert. It can help in IR because it removes a shuffle, and the backend can already narrow loads if that is profitable in codegen. We might be able to consolidate more of the logic, but handling this basic pattern should be enough to make a small difference on one of the motivating examples from issue #17113. The final goal of combining loads on those patterns is not solved though. Differential Revision: https://reviews.llvm.org/D137341	2022-11-10 14:11:32 -05:00
Sanjay Patel	710e34e136	[VectorCombine] move load safety checks to helper function; NFC These checks can be re-used with other potential transforms such as a load of a subvector-insert.	2022-11-04 10:39:37 -04:00
Sanjay Patel	8d76fbb5f0	[VectorCombine] fix crashing on match of non-canonical fneg We can't assume that operand 0 is the negated operand because the matcher handles "fsub -0.0, X" (and also +0.0 with FMF). By capturing the extract within the match, we avoid the bug and make the transform more robust (can't assume that this pass will only see canonical IR).	2022-10-17 10:47:48 -04:00
Sanjay Patel	baab4aa1ba	[VectorCombine] convert scalar fneg with insert/extract to vector fneg insertelt DestVec, (fneg (extractelt SrcVec, Index)), Index --> shuffle DestVec, (fneg SrcVec), Mask This is a specialized form of what could be a more general fold for a binop. It's also possible that fneg is overlooked by SLP in this kind of insert/extract pattern since it's a unary op. This shows up in the motivating example from issue #58139, but it won't solve it (that probably requires some x86-specific backend changes). There are also some small enhancements (see TODO comments) that can be done as follow-up patches. Differential Revision: https://reviews.llvm.org/D135278	2022-10-10 14:59:56 -04:00
Matt Arsenault	2adae8e1b7	VectorCombine: Pass through AssumptionCache	2022-09-19 19:25:22 -04:00
Kazu Hirata	56ea4f9bd3	[Transforms] Qualify auto in range-based for loops (NFC) Identified with readability-qualified-auto.	2022-08-27 21:21:02 -07:00
Fangrui Song	7d6017fd31	[TTI] Change new getVectorInstrCost overload to use const reference after D131114 A const reference is preferred over a non-null const pointer. `Type *` is kept as is to match the other overload. Reviewed By: davidxl Differential Revision: https://reviews.llvm.org/D131197	2022-08-04 15:16:51 -07:00
Mingming Liu	bc8f2f3649	[AArch64][TTI][NFC] Overload method 'getVectorInstrCost' to provide vector instruction itself, as a context information for cost estimation. 1) Overloaded (instruction-based) method is a wrapper around the current (opcode-based) method. 2) This patch also changes a few callsites (VectorCombine.cpp, SLPVectorizer.cpp, CodeGenPrepare.cpp) to call the overloaded method. 3) This is a split of D128302. Differential Revision: https://reviews.llvm.org/D131114	2022-08-04 12:58:25 -07:00
David Green	4b7913c357	[VectorCombine] Only consider shuffle uses with the same type. The backend getShuffleCosts do not currently handle shuffles that change size very well. Limit the shuffles we collect to the same type to make sure they do not cause issues as reported in D128732.	2022-07-16 13:23:39 +01:00
Sander de Smalen	519d7876cb	[VectorCombine] Avoid creating shuffle for extract-extract pattern on scalable vector. This addresses https://github.com/llvm/llvm-project/issues/56377 Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D129136	2022-07-07 08:37:04 +00:00
David Green	5493f8fc59	[VectorCombine] Improve shuffle select shuffle-of-shuffles This is an extension to the code added in D123911 which added vector combine folding of shuffle-select patterns, attempting to reduce the total amount of shuffling required in patterns like: %x = shuffle %i1, %i2 %y = shuffle %i1, %i2 %a = binop %x, %y %b = binop %x, %y shuffle %a, %b, selectmask This patch extends the handling of shuffles that are dependent on one another, which can arise from the SLP vectorizer, as-in: %x = shuffle %i1, %i2 %y = shuffle %x The input shuffles can also be omitted, in which case they are treated like identity shuffles. This patch also attempts to calculate a better ordering of input shuffles, which can help getting lower cost input shuffles, pushing complex shuffles further down the tree. This is a recommit with some additional checks for supported forms and out-of-bounds mask elements, with some extra tests. Differential Revision: https://reviews.llvm.org/D128732	2022-07-05 17:16:18 +01:00
Nikita Popov	b69c75d53f	Revert "[VectorCombine] Improve shuffle select shuffle-of-shuffles" This reverts commit 19a1e20b8a0f69da2a871eae6cbd03d1314ee02d. Clang crashes while linking bullet from llvm-test-suite in ReleaseLTO-g cmake configuration.	2022-07-05 09:31:20 +02:00
David Green	19a1e20b8a	[VectorCombine] Improve shuffle select shuffle-of-shuffles This is an extension to the code added in D123911 which added vector combine folding of shuffle-select patterns, attempting to reduce the total amount of shuffling required in patterns like: %x = shuffle %i1, %i2 %y = shuffle %i1, %i2 %a = binop %x, %y %b = binop %x, %y shuffle %a, %b, selectmask This patch extends the handling of shuffles that are dependent on one another, which can arise from the SLP vectorizer, as-in: %x = shuffle %i1, %i2 %y = shuffle %x The input shuffles can also be omitted, in which case they are treated like identity shuffles. This patch also attempts to calculate a better ordering of input shuffles, which can help getting lower cost input shuffles, pushing complex shuffles further down the tree. Differential Revision: https://reviews.llvm.org/D128732	2022-07-04 13:38:43 +01:00
Nikita Popov	bdba8278d9	[VectorCombine] Avoid ConstantExpr::get() (NFC) Use IRBuilder APIs instead, which will still constant fold.	2022-06-29 17:17:52 +02:00
Kazu Hirata	c399b3a608	[Vectorize] Use llvm::is_contained (NFC)	2022-06-18 15:49:15 -07:00

148 Commits