llvm-project

Author	SHA1	Message	Date
Sam Tebbs	795e35a653	Reland "[LoopVectorizer] Add support for partial reductions" with non-phi operand fix. (#121744) This relands the reverted #120721 with a fix for cases where neither reduction operand is the reduction phi. Only 63114239cc8d26225a0ef9920baacfc7cc00fc58 and 63114239cc8d26225a0ef9920baacfc7cc00fc58 are new on top of the reverted PR. --------- Co-authored-by: Nicholas Guy <nicholas.guy@arm.com>	2025-01-13 11:20:35 +00:00
Mel Chen	56a37a3c76	[SLPVectorizer] Refactor HorizontalReduction::createOp (NFC) (#121549 ) This patch simplifies select-based integer min/max reductions by utilizing `llvm::getMinMaxReductionPredicate`, and generates intrinsic-based min/max reductions by utilizing `llvm::getMinMaxReductionIntrinsicOp`.	2025-01-13 16:11:31 +08:00
Florian Hahn	8df64ed777	[LV] Don't consider IV increments uniform if exit value is used outside. In some cases, there might be a chain of uniform instructions producing the exit value. To generate correct code in all cases, consider the IV increment not uniform, if there are users outside the loop. Instead, let VPlan narrow the IV, if possible using the logic from 3ff1d01985752. Test case from #122602 verified with Alive2: https://alive2.llvm.org/ce/z/bA4EGj Fixes https://github.com/llvm/llvm-project/issues/122496. Fixes https://github.com/llvm/llvm-project/issues/122602.	2025-01-12 22:03:21 +00:00
Florian Hahn	3ff1d01985	Recommit "[VPlan] Try to narrow wide and replicating recipes to uniform recipes." This reverts commit 0ebb3ac7c92c4c1c44e7f3d17832d75ec5a42a67. Re-applies commit with typos fixed.	2025-01-12 20:10:28 +00:00
Florian Hahn	0ebb3ac7c9	Revert "[VPlan] Try to narrow wide and replicating recipes to uniform recipes." This reverts commit 1afba19913253dda865a8e57b37b9f4dabead1ac. Typo breaking the build	2025-01-12 19:37:45 +00:00
Florian Hahn	1afba19913	[VPlan] Try to narrow wide and replicating recipes to uniform recipes. Use the existing VPlan-based analysis to identify recipes that only have their first lane demanded and transform them to uniform replicate recipes. This simplifies the generated code in some places and prepares for fixing https://github.com/llvm/llvm-project/issues/122496.	2025-01-12 19:32:01 +00:00
Florian Hahn	7f59b4e998	[VPlan] Skip non-induction phi recipes in legalizeAndOptimizeInductions. The body of the loop only applies to wide induction recipes, so skip any other header phi recipes up-front.	2025-01-11 20:33:02 +00:00
Vasileios Porpodas	25b90c4ef6	[SandboxVec][SeedCollector][NFC] Remove redundant 'else' and move the assertion within the 'if'	2025-01-10 14:54:44 -08:00
vporpo	9248428db7	[SandboxVec][DAG][NFC] Refactor setNextNode() and setPrevNode() (#122363 ) This patch updates DAG's `setNextNode()` and `setPrevNode()` to update both nodes of the link.	2025-01-10 13:32:33 -08:00
Han-Kuan Chen	35e76b6a4f	Revert "[SLP] NFC. Replace MainOp and AltOp in TreeEntry with InstructionsState. (#120198 )" This reverts commit f3d6cdc5aebafac3961d4fccbd2ca0e302c6082c.	2025-01-10 10:09:54 -08:00
Alexey Bataev	681c83a2f9	[SLP]Fix mask generation after cost estimation When estimating the cost of entries shuffles for buildvectors, need to rebuild original mask, not a generated submask, used for subregisters analysis. Fixes #122430	2025-01-10 09:32:35 -08:00
Alex MacLean	986f2ac48f	[SLPVectorizer] minor tweaks around lambdas for compatibility with older compilers (#122348) Older versions of MSVC do not have great lambda support and are not able to handle uses of class data or lambdas with implicit return types in some cases. These minor changes improve the source's compatibility with older MSVC and don't hurt readability either.	2025-01-10 09:18:28 -08:00
Alexey Bataev	3c9c94a24f	Revert "[SLP]Fix mask generation after cost estimation" This reverts commit 547ba9730bf05df3383150f730a689f2c8336206 to fix buildbots reported in https://lab.llvm.org/buildbot/#/builders/123/builds/11370, https://lab.llvm.org/buildbot/#/builders/133/builds/9492	2025-01-10 08:46:42 -08:00
Alexey Bataev	547ba9730b	[SLP]Fix mask generation after cost estimation When estimating the cost of entries shuffles for buildvectors, need to rebuild original mask, not a generated submask, used for subregisters analysis. Fixes #122430	2025-01-10 08:17:56 -08:00
Mel Chen	e0f14e11c7	[SLPVectorizer] Refine the scope of RdxOpcode in HorizontalReduction::createOp (NFC) (#122239 ) This patch is one part of unifying IAnyOf and FAnyOf reduction. #118393 The related patch is #118777.	2025-01-10 16:01:36 +08:00
Han-Kuan Chen	f3d6cdc5ae	[SLP] NFC. Replace MainOp and AltOp in TreeEntry with InstructionsState. (#120198 ) Add TreeEntry::hasState. Add assert for getTreeEntry. Remove the OpValue parameter from the canReuseExtract function. Remove the Opcode parameter from the ComputeMaxBitWidth lambda function.	2025-01-09 23:41:52 -08:00
Han-Kuan Chen	5454ac28b3	Revert "[SLP] NFC. Replace MainOp and AltOp in TreeEntry with InstructionsState. (#120198 )" This reverts commit 760f550de25792db83cd39c88ef57ab6d80a41a0.	2025-01-09 18:41:47 -08:00
Han-Kuan Chen	36b423e0f8	[SLP] NFC. Refactor getSameOpcode and reduce for loop iterations. (#122241 ) Replace Cnt and AltIndex with MainOp and AltOp. Reduce the number of iterations in the for loop.	2025-01-10 09:06:07 +08:00
Han-Kuan Chen	760f550de2	[SLP] NFC. Replace MainOp and AltOp in TreeEntry with InstructionsState. (#120198 ) Add TreeEntry::hasState. Add assert for getTreeEntry. Remove the OpValue parameter from the canReuseExtract function. Remove the Opcode parameter from the ComputeMaxBitWidth lambda function.	2025-01-10 09:05:39 +08:00
Florian Hahn	7ffb691595	[VPlan] Remove dead ToRemove (NFC).	2025-01-09 22:02:32 +00:00
vporpo	6312beef78	[SandboxVec][BottomUpVec] Use SeedCollector and slice seeds (#120826 ) With this patch we switch from the temporary dummy seeds to actual seeds provided by the seed collector. The seeds get sliced and each slice is used as the starting point for vectorization.	2025-01-09 11:53:48 -08:00
Alexey Bataev	5ff36748cf	[SLP]Fix mask processing for reused gathered scalars Need to sync the mask between cost and actual emission to avoid bugs in mask calculation Fixes #122324	2025-01-09 11:24:48 -08:00
Florian Hahn	b0697dc1de	[LV] Only check isVectorizableEarlyExitLoop with multiple exits. (#121994 ) Currently we emit early-exit related debug messages/remarks even when there is a single exit. Update to only check isVectorizableEarlyExitLoop if there isn't a single exit block. PR: https://github.com/llvm/llvm-project/pull/121994	2025-01-09 12:05:19 +00:00
Benjamin Maxwell	f88ef1bd1b	[LV] Teach LoopVectorizationLegality about struct vector calls (#119221 ) This is a split-off from #109833 and only adds code relating to checking if a struct-returning call can be vectorized. This initial patch only allows the case where all users of the struct return are `extractvalue` operations that can be widened. ``` %call = tail call { float, float } @foo(float %in_val) %extract_a = extractvalue { float, float } %call, 0 %extract_b = extractvalue { float, float } %call, 1 ``` Note: The tests require the VFABI changes from #119000 to pass.	2025-01-09 09:27:29 +00:00
Alexey Bataev	5b76a2e51b	[SLP]Correctly calculate mask for the inserted vector	2025-01-08 15:18:06 -08:00
Alexey Bataev	0d921f96d4	[SLP][NFC]Introduce and use createInsertVector helper function, NFC	2025-01-08 14:26:13 -08:00
David Green	676c641718	[VectorCombine] Use getInstructionCost to cost Shuffle. (#122068 ) This allows it to produce a more accurate cost for the shuffle, using the more accurate calls to getShuffleCost in getInstructionCost. It helps fix some of the regressions from vector combine a little while ago, now that we have better subvector extract costs.	2025-01-08 20:48:40 +00:00
Alexey Bataev	1160994602	[SLP]Fix a crash for very long GEP chains Need to check if the GEP bases are equal and return false early. Also, need to return false if the lookup is too deep, considering bases equal too. Fixes a crash in the assertion.	2025-01-08 06:47:41 -08:00
Luke Lau	f0d5104c94	[VPlan] Handle some VPInstructions in may{Read,Write}FromMemory (#120058 ) This just copies the same conservative definition from mayWriteToMemory, and enables more VPInstructions to be hoisted out in LICM. I think this should give more accurate costs, and I was able to build llvm-test-suite without the legacy-vplan cost model assertion going off.	2025-01-08 15:17:26 +08:00
Vyacheslav Klochkov	9184c42869	[LoadStoreVectorizer] Postprocess and merge equivalence classes (#121861) This patch introduces a new method: void Vectorizer::mergeEquivalenceClasses(EquivalenceClassMap &EQClasses) const; The method is called at the end of Vectorizer::collectEquivalenceClasses() and is needed to merge equivalence classes that differ only by their underlying objects (UO1 and UO2), where UO1 is a 1-level-indirection underlying base for UO2. This situation arises due to the limited lookup depth used during the search of underlying bases with llvm::getUnderlyingObject(ptr). Using any fixed lookup depth can result in the creation of multiple equivalence classes that only differ by 1-level indirection bases. The new approach merges equivalence classes if they have adjacent bases (1-level indirection). If a series of equivalence classes form a ladder of 1-step/level indirections, they are all merged into a single equivalence class. This provides more opportunities for the load-store vectorizer to generate better vectors. --------- Signed-off-by: Klochkov, Vyacheslav N <vyacheslav.n.klochkov@intel.com>	2025-01-07 17:17:26 -08:00
Florian Hahn	0eaa69eb23	[VPlan] Handle VPExpandSCEVRecipe in isUniformAfterVectorization. VPExpandSCEVRecipes must be placed in the entry and are always uniform. This fixes a crash by always identifying them as uniform, even if the main vector loop region has been removed. Fixes https://github.com/llvm/llvm-project/issues/121897.	2025-01-07 21:35:09 +00:00
Florian Mayer	ef391dbc29	[LV] Drop incorrect inbounds for reverse vector pointer when folding tail (#120730) When folding the tail, we may compute an address that we don't compute in the original scalar loop, and it may not be inbounds. Drop Inbounds in that case.	2025-01-07 06:14:01 -08:00
Simon Pilgrim	a5e129ccde	[CostModel][X86] getVectorInstrCost - correctly cost v4f32 insertelement into index 0 This is just the MOVSS instruction (SSE41 INSERTPS is still necessary for index != 0) This exposed an issue in VectorCombine::foldInsExtFNeg - we need to use the more general SK_PermuteTwoSrc shuffle kind to allow getShuffleCost to match other shuffle kinds (not just SK_Select).	2025-01-07 12:23:45 +00:00
Florian Hahn	f9369cc602	[VPlan] Make sure last IV increment value is available if needed. Legalize extract-from-ends using uniform VPReplicateRecipe of wide inductions to use regular VPReplicateRecipe, so the correct end value is available. Fixes https://github.com/llvm/llvm-project/issues/121745.	2025-01-06 22:40:41 +00:00
Simon Pilgrim	d993b11b86	[VectorCombine] Remove superfluous whitespace from debug log comment. NFC.	2025-01-06 15:37:15 +00:00
David Sherwood	1feeeb47e5	[LoopVectorize][NFC] Move "LV: Selecting VF" debug output (#120744 ) Move the debug output that prints out the selected VF from selectVectorizationFactor -> computeBestVF. This means that the output will still be written even after removing the assert for the legacy and vplan cost models matching.	2025-01-06 10:39:34 +00:00
Yingwei Zheng	a77346bad0	[IRBuilder] Refactor FMF interface (#121657 ) Up to now, the only way to set specified FMF flags in IRBuilder is to use `FastMathFlagGuard`. It makes the code ugly and hard to maintain. This patch introduces a helper class `FMFSource` to replace the original parameter `Instruction *FMFSource` in IRBuilder. To maximize the compatibility, it accepts an instruction or a specified FMF. This patch also removes the use of `FastMathFlagGuard` in some simple cases. Compile-time impact: https://llvm-compile-time-tracker.com/compare.php?from=f87a9db8322643ccbc324e317a75b55903129b55&to=9397e712f6010be15ccf62f12740e9b4a67de2f4&stat=instructions%3Au	2025-01-06 14:37:04 +08:00
Florian Hahn	f4230b4332	[VPlan] Add and use debug location for VPScalarCastRecipe. Update the recipe to always take a debug location and set it.	2025-01-05 20:08:51 +00:00
Florian Hahn	f48884ded8	[VPlan] Remove loop region in optimizeForVFAndUF. (#108378 ) Update optimizeForVFAndUF to completely remove the vector loop region when possible. At the moment, we cannot remove the region if it contains * widened IVs: the recipe is needed to generate the step vector * reductions: ComputeReductionResults requires the reduction phi recipe for codegen. Both cases can be addressed by more explicit modeling. The patch also includes a number of updates to allow executing VPlans without a vector loop region. Depends on https://github.com/llvm/llvm-project/pull/110004	2025-01-05 15:50:42 +00:00
Simon Pilgrim	054e7c5971	[VectorCombine] foldInsExtVectorToShuffle - ignore shuffle costs for 'identity' insertion masks <u,1,u,u> 'inplace' single src shuffles can be treated as free identity shuffles - ignore any shuffle cost (similar to what we already do in other folds like foldShuffleOfShuffles) - eventually getShuffleCost should just return TCC_Free in these cases but in a lot of the targets' shuffle cost logic this currently ends up treated as a generic SK_PermuteSingleSrc. We still want to generate the shuffle as it will help further shuffle folds with the additional PoisonMaskElem 'undemanded' elements.	2025-01-05 13:02:31 +00:00
Florian Hahn	df4a615c98	[VPlan] Convert induction increment check to be VPlan-based. Check the VPlan directly to determine if a VPValue is an optimizable IV or IV use instead of checking the underlying IR instructions. Split off from https://github.com/llvm/llvm-project/pull/112147. This refactoring enables moving IV end value creation from the legacy fixupIVUsers to a VPlan-based transform. There is one case we now won't optimize, that is IVs with subtracts and non-constant steps. But as this is a minor optimization and doesn't impact correctness, the benefits of performing the check in VPlan should outweigh the missed case.	2025-01-05 11:16:01 +00:00
Luke Lau	7700695739	[VPlan] Fix crash with EVL tail folding intrinsic with no corresponding VP (#121542 ) This fixes a crash when building SPEC CPU 2017 with EVL tail folding when widening @llvm.log10 intrinsics. @llvm.log10 and some other intrinsics don't have a corresponding VP intrinsic, so this fixes the crash by removing the assert and bailing instead.	2025-01-05 11:41:56 +08:00
Florian Hahn	b95cce9904	[VPlan] Update wide induction inc recipes to use same step as Wide IV. Update wide induction increments to use the same step as the corresponding wide induction. This enables detecting induction increments directly in VPlan and removes redundant splats.	2025-01-04 20:04:59 +00:00
Florian Hahn	20d491bb99	[VPlan] Remove re-using vector PH in VPBasicBlock::execute (NFC). Remove logic to re-use the previous basic block for the vector pre header from VPBasicBlock::execute. The preheader is now modeled as VPIRBasicBlock, so the code is no longer needed. Split off from https://github.com/llvm/llvm-project/pull/108378.	2025-01-03 19:56:44 +00:00
Florian Hahn	11c6af666b	[VPlan] Fix name ExitVPBB -> MiddleVPBB (NFC). ExitVPBB actually refers to the middle block, clarify name.	2025-01-03 19:28:03 +00:00
Han-Kuan Chen	c50370c67a	[SLP] NFC. Use InstructionsState::valid if users just want to know whether VL has same opcode. (#120217 ) Add assert for InstructionsState::getOpcode. Use InstructionsState::getOpcode only when necessary.	2025-01-04 00:44:57 +08:00
Simon Pilgrim	e3ec5a7286	[VectorCombine] foldShuffleOfBinops - fold shuffle(binop(shuffle(x),shuffle(z)),binop(shuffle(y),shuffle(w)) -> binop(shuffle(x,z),shuffle(y,w)) (#120984 ) Some patterns (in particular horizontal style patterns) can end up with shuffles straddling both sides of a binop/cmp. Where individually the folds aren't worth it, by merging the (oneuse) shuffles we can notably reduce the net instruction count and cost. One of the final steps towards finally addressing #34072	2025-01-03 10:29:07 +00:00
Florian Hahn	5f5792aedb	[VPlan] Use removeDeadRecipes in optimizeForVFAndUF (NFCI) Split off from https://github.com/llvm/llvm-project/pull/108378.	2025-01-02 20:10:46 +00:00
Simon Pilgrim	035e64c0ec	[VectorCombine] eraseInstruction - ensure we reattempt to fold other users of an erased instruction's operands (REAPPLIED) As we're reducing the use count of the operands it's more likely that they will now fold, as they were previously being prevented by a m_OneUse check, or the cost of retaining the extra instruction had been too high. This is necessary for some upcoming patches, although the only change so far is instruction ordering as it allows some SSE folds of 256/512-bit with 128-bit subvectors to occur earlier in foldShuffleToIdentity as the subvector concats are free. Reapplied with a fix for foldSingleElementStore/scalarizeLoadExtract which were replacing/removing memory operations - we need to ensure that the worklist is populated in the correct order so all users of the old memory operations are erased first, so there are no remaining users of the loads when it's time to remove them as well. Pulled out of #120984	2025-01-02 18:19:02 +00:00
Simon Pilgrim	f739aa4004	[VectorCombine] replaceValue - add "VC: Replacing" debug message to help the log show replacement for old/new.	2025-01-02 17:23:13 +00:00

5412 Commits