llvm-project

Author	SHA1	Message	Date
Florian Hahn	88e9c56990	[LV] Don't adjust name of recurrence phi in scalar loop (NFC). Adjusting the name of the recurrence phi in the scalar loop is a bit inconsistent, as we do not adjust any other names in the scalar loops (including other phis). Remove this adjustment in preparation for https://github.com/llvm/llvm-project/pull/94760/ and as discussed there.	2024-07-10 18:37:35 +01:00
Florian Hahn	b841e2eca3	Recommit "[VPlan] First step towards VPlan cost modeling. (#92555)" This reverts commit 6f538f6a2d3224efda985e9eb09012fa4275ea92. A number of crashes have been fixed by separate fixes, including https://github.com/llvm/llvm-project/pull/96622. This version of the PR also pre-computes the costs for branches (except the latch) instead of computing their costs as part of costing of replicate regions, as there may not be a direct correspondence between original branches and number of replicate regions. Original message: This adds a new interface to compute the cost of recipes, VPBasicBlocks, VPRegionBlocks and VPlan, initially falling back to the legacy cost model for all recipes. Follow-up patches will gradually migrate recipes to compute their own costs step-by-step. It also adds getBestPlan function to LVP which computes the cost of all VPlans and picks the most profitable one together with the most profitable VF. The VPlan selected by the VPlan cost model is executed and there is an assert to catch cases where the VPlan cost model and the legacy cost model disagree. Even though I checked a number of different build configurations on AArch64 and X86, there may be some differences that have been missed. Additional discussions and context can be found in @arcbbb's https://github.com/llvm/llvm-project/pull/67647 and https://github.com/llvm/llvm-project/pull/67934 which is an earlier version of the current PR. PR: https://github.com/llvm/llvm-project/pull/92555	2024-07-10 14:22:21 +01:00
Alexey Bataev	3742c2a83c	[SLP]Use stored signedness after minbitwidth analysis. Need to use stored signedness info for the root node instead of recalculating it after the vectorization, which may lead to a compiler crash.	2024-07-10 03:58:00 -07:00
Han-Kuan Chen	ac299ed2c7	[SLP] Provide an universal interface for FixedVectorType::get. NFC. (#96845) SLP vectorizes scalar types to vector types. In the future, we will try to make SLP vectorize vector types to vector types. We add getWidenedType as a helper function. For example, SLP will make the following code %v0 = load i32, ptr %in0, align 4 %v1 = load i32, ptr %in1, align 4 %v2 = load i32, ptr %in2, align 4 %v3 = load i32, ptr %in3, align 4 into a load <4 x i32>. The ScalarTy is i32 and VF is 4. In the future, SLP will make the following code %v0 = load <4 x i32>, ptr %in0, align 4 %v1 = load <4 x i32>, ptr %in1, align 4 %v2 = load <4 x i32>, ptr %in2, align 4 %v3 = load <4 x i32>, ptr %in3, align 4 into a load <16 x i32>. The ScalarTy is <4 x i32> and VF is 4. reference: https://discourse.llvm.org/t/rfc-make-slp-vectorizer-revectorize-vector-instructions/79436	2024-07-10 11:50:35 +08:00
Alexey Bataev	af21bc1917	[SLP]Fix a crash on attempt to revectorize vectorized phi. If the PHI node is vectorized during vectorization of its operands, no need to try to vectorize its operands once again.	2024-07-09 14:11:08 -07:00
Florian Hahn	ef89e3efa9	[VPlan] Collect ephemeral values for VPlan. Port collectEphemeralValues to VPlan as collectEphemeralRecipesForVPlan, use it in willGenerateVectors. This fixes a regression caused by 29b8b72117 for loops where the only vector values are ephemeral.	2024-07-09 21:34:49 +01:00
Alexey Bataev	822a818786	[SLP][NFC]Add comments for the code, NFC.	2024-07-09 10:06:34 -07:00
Alexey Bataev	a988821123	[SLP]Keep the original order in the reductions. The patch tries to keep the original order of the instructions in the reductions. Previously, the first two instructions were switched, giving reverse order. This is the first step toward supporting ordered reductions. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/98025	2024-07-09 12:26:42 -04:00
Florian Hahn	7346e7cc47	[VPlan] Update HCFG builder after 72937203dd3b to fix leak. Update buildPlainCFG to re-use the vector and latch VPBBs created as part of the initial skeleton in 72937203dd3b. This should fix the leak sanitizer failure discovered by https://lab.llvm.org/buildbot/#/builders/52/builds/619.	2024-07-09 15:28:43 +01:00
Alexey Bataev	2cba218ca5	[SLP]Fix PR98133: Inserting PHI after debug-records! The phi-node-to-be-deleted still should be inserted as the first instruction in the block to avoid random compiler crashes. Fixes https://github.com/llvm/llvm-project/issues/98133	2024-07-09 05:44:45 -07:00
Simon Pilgrim	ef5b1ec0dd	[VectorCombine] foldShuffleToIdentity - ensure casts have the same source type	2024-07-09 13:10:11 +01:00
Florian Hahn	c16e37867c	[VPlan] Clarify setting Lane in fixPhi (NFCI). Split off from https://github.com/llvm/llvm-project/pull/94760, clarify as suggested.	2024-07-09 13:09:40 +01:00
Florian Hahn	72937203dd	[VPlan] Create vector header and latch VPBBs in createInitialVPlan (NFC) The empty header and latch blocks can be created together with the vector loop region. This is in preparation for splitting up the very large tryToBuildVPlanWithVPRecipes into several distinct functions, as suggested multiple times, including in https://github.com/llvm/llvm-project/pull/94760	2024-07-09 12:41:12 +01:00
Florian Hahn	a2a0ef567c	[VPlan] Retrieve LatchVPBB from region in adjustRecipesForRed (NFC) The HeaderVPBB is retrieved in a similar fashion already. This is in preparation for splitting up the very large tryToBuildVPlanWithVPRecipes into several distinct functions, as suggested multiple times, including in https://github.com/llvm/llvm-project/pull/94760	2024-07-09 10:48:43 +01:00
Alexey Bataev	f5ee07a1b5	[SLP]Improve instruction reordering mode detection. The "instruction" reordering mode should be selected only if there are compatible instructions in other operands, which can be reordered. Otherwise, better to select splat reordering mode. Metric: size..text. Program: test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test; results: 12383340.00; results0: 12383324.00; diff: -0.0%. Some 4x operations get replaced by 8x. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/97485	2024-07-08 16:01:55 -04:00
Florian Hahn	0577cdaa32	[LV] Split checking if tail-folding is possible, collecting masked ops. (#77612) Introduce new canFoldTail helper which only checks if tail-folding is possible, but without modifying MaskedOps. Just because tail-folding is possible doesn't mean the tail will be folded; that's up to the cost-model to decide. Separating the check if tail-folding is possible and preparing for tail-folding makes sure that MaskedOps is only populated when tail-folding is actually selected. PR: https://github.com/llvm/llvm-project/pull/77612	2024-07-08 16:34:42 +01:00
Alexey Bataev	385118644c	[SLP]Remove operands upon marking instruction for deletion. If the instruction is marked for deletion, better to drop all its operands and mark them for deletion too (if allowed). This allows more vectorizable patterns and generates fewer useless extractelement instructions. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/97409	2024-07-08 07:56:48 -07:00
Simon Pilgrim	7e054c33d4	[VectorCombine] foldShuffleOfCastops - don't restrict to oneuse but compare total costs instead Some casts (especially bitcasts but others as well) are incredibly cheap (or free), so don't limit the shuffle(cast(x),cast(y)) -> cast(shuffle(x,y)) to oneuse cases, but instead compare the total before/after costs of possibly repeating some casts.	2024-07-08 14:57:51 +01:00
Alexey Bataev	4c47b41771	[SLP]Allow matching and shuffling of extractelement vector operands with different VF. Allows better codegen with the free resizing of small VF vector operands and then regular shuffling of the operands of the same size and simplifies the code. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/97414	2024-07-08 09:27:08 -04:00
tcwzxx	c2fe75f99c	Make the logic for checking scatter vectorized nodes of GEP clearer (#97826) There is no functional change. Authored-by: zhizhixu <zhizhixu@tencent.com>	2024-07-08 06:08:04 -04:00
Florian Hahn	29b8b72117	[LV] Move check if any vector insts will be generated to VPlan. (#96622) This patch moves the check if any vector instructions will be generated from getInstructionCost to be based on VPlan. This simplifies getInstructionCost, is more accurate as we check the final result and also allows us to exit early once we visit a recipe that generates vector instructions. The helper can then be re-used by the VPlan-based cost model to match the legacy selectVectorizationFactor behavior, thus fixing a crash and paving the way to recommit https://github.com/llvm/llvm-project/pull/92555. PR: https://github.com/llvm/llvm-project/pull/96622	2024-07-07 20:08:01 +01:00
Kazu Hirata	75bc20ff89	[llvm] Remove redundant calls to std::unique_ptr<T>::get (NFC) (#97914)	2024-07-07 08:23:41 +09:00
Florian Hahn	ac03ae30cf	[LV] Preserve LAA in LoopVectorize (NFCI). LoopVectorize already always preserves DT, LI and SCEV. If any changes get made to the CFG, cached LAA info for loops are cleared. LoopAccessAnalysis also implements ::invalidate to clear the analysis if SE, DT or LI gets invalidated. Hence it should be safe to preserve LAA and save a small amount of compile-time.	2024-07-05 21:41:31 +01:00
Florian Hahn	eedc2c8cb2	[LV] Remove now obsolete DT updates of scalar exit block. Remove manual DT updates of scalar exit blocks during legacy skeleton creation, as they are not needed after 99d6c6d9365. This fixes DT verification failures with expensive checks, including https://lab.llvm.org/buildbot/#/builders/16/builds/1270.	2024-07-05 11:20:44 +01:00
Florian Hahn	99d6c6d936	[VPlan] Model branch cond to enter scalar epilogue in VPlan. (#92651) This patch moves branch condition creation to enter the scalar epilogue loop to VPlan. Modeling the branch in the middle block also requires modeling the successor blocks. This is done using the recently introduced VPIRBasicBlock. Note that the middle.block is still created as part of the skeleton and then patched in during VPlan execution. Unfortunately the skeleton needs to create the middle.block early on, as it is also used for induction resume value creation and is also needed to properly update the dominator tree during skeleton creation. After this patch lands, I plan to move induction resume value and phi node creation in the scalar preheader to VPlan. Once that is done, we should be able to create the middle.block in VPlan directly. This is a re-worked version based on the earlier https://reviews.llvm.org/D150398 and the main change is the use of VPIRBasicBlock. Depends on https://github.com/llvm/llvm-project/pull/92525 PR: https://github.com/llvm/llvm-project/pull/92651	2024-07-05 10:08:42 +01:00
Simon Pilgrim	b546096d94	[VectorCombine] foldShuffleToIdentity - handle bitcasts with equal element counts (#97731) Basic initial patch for #96884 that just handles the case where we bitcast between float/integers of the same element width	2024-07-05 09:47:42 +01:00
Florian Hahn	8299bfaf29	[VPlan] Extract reduction result insertion point to variable (NFCI). Split off from https://github.com/llvm/llvm-project/pull/92651 as suggested.	2024-07-04 16:25:49 +01:00
Florian Hahn	2b3b405b09	[LV] Don't vectorize first-order recurrence with VF <vscale x 1 x ..> The assertion added as part of https://github.com/llvm/llvm-project/pull/93395 surfaced cases where first-order recurrences are vectorized with <vscale x 1 x ..>. If vscale is 1, then we are unable to extract the penultimate value (second to last lane). Previously this case got mis-compiled, trying to extract from an invalid lane (-1) https://llvm.godbolt.org/z/3adzYYcf9. Fixes https://github.com/llvm/llvm-project/issues/97452.	2024-07-04 11:44:51 +01:00
Jon Roelofs	d3a76b03d8	[llvm][SLPVectorizer] Fix a bad cast assertion (#97621) Fixes: rdar://128092379	2024-07-03 16:25:32 -07:00
Alexey Bataev	873c3f7e78	Revert "[SLP]Remove operands upon marking instruction for deletion." This reverts commit bbd52dd44ceee80e3b6ba6a9b2bd8ee9a9713833 to fix a crash revealed in https://lab.llvm.org/buildbot/#/builders/4/builds/505	2024-07-03 13:05:17 -07:00
Alexey Bataev	bbd52dd44c	[SLP]Remove operands upon marking instruction for deletion. If the instruction is marked for deletion, better to drop all its operands and mark them for deletion too (if allowed). This allows more vectorizable patterns and generates fewer useless extractelement instructions. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/97409	2024-07-03 15:11:18 -04:00
Alexey Bataev	4eecf3c650	[SLP]Reorder buildvector/reduction vectorization and fuse the loops. Currently SLP vectorizer tries at first to find reduction nodes, and then vectorize buildvector sequences. Need to try to vectorize wide buildvector sequences at first and only then try to vectorize reductions, and then smaller buildvector sequences. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/96943	2024-07-03 14:36:30 -04:00
Min-Yih Hsu	8b55d342b6	[RISCV][LoopIdiomVectorize] Support VP intrinsics in LoopIdiomVectorize (#94082) Teach LoopIdiomVectorize to use VP intrinsics to replace the byte compare loops. Right now only RISC-V uses LoopIdiomVectorize of this style.	2024-07-02 18:48:28 -07:00
Min-Yih Hsu	de5ff38a0d	[LoopIdiomVectorize][NFC] Factoring out the part that handles vectorization strategy (#94682) To pave the way for porting LIV to RISC-V, which uses VP intrinsics for vectors. NFC.	2024-07-02 18:21:41 -07:00
Gabriel Baraldi	380beaec86	Fix potential crash in SLPVectorizer caused by missing check (#95937) I'm not super familiar with this code, but it seems that we were just missing a check. The original code that triggered this did not have uselistorders, but llvm-reduce created them and it reproduces the same issue in a much more compact way. Fixes https://github.com/llvm/llvm-project/issues/95016	2024-07-02 08:15:51 -04:00
Youngsuk Kim	2051736f7b	[llvm][Transforms] Avoid 'raw_string_ostream::str' (NFC) Since `raw_string_ostream` doesn't own the string buffer, it is desirable (in terms of memory safety) for users to directly reference the string buffer rather than use `raw_string_ostream::str()`. Work towards TODO comment to remove `raw_string_ostream::str()`.	2024-06-30 09:03:29 -05:00
Alexey Bataev	d70963a762	[SLP]Fix the cost of the adjusted extracts in per-register analysis. Previous patch did not pass the list of the extract indices by reference, so the compiler just ignored them. Pass indices by reference and fix the per-register analysis. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/96808	2024-06-28 14:33:08 -07:00
Alexey Bataev	a9c12e481b	Revert "[SLP]Fix the cost of the adjusted extracts in per-register analysis." This reverts commit 784152056ea40a800a8fd9f4157a428dfb7a6de8 to fix buildbots issues reported in https://lab.llvm.org/buildbot/#/builders/4/builds/315 and https://lab.llvm.org/buildbot/#/builders/35/builds/481	2024-06-28 13:41:51 -07:00
Alexey Bataev	784152056e	[SLP]Fix the cost of the adjusted extracts in per-register analysis. Previous patch did not pass the list of the extract indices by reference, so the compiler just ignored them. Pass indices by reference and fix the per-register analysis. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/96808	2024-06-28 15:49:47 -04:00
David Green	76c8e1d857	[VectorCombine] Guard against the lane zero select predicate being scalar All but the first lane was being checked, but this could leave the first lane with a scalar select predicate. This just extends the check to make sure the types are all the same	2024-06-28 17:27:16 +01:00
Nikita Popov	9df71d7673	[IR] Add getDataLayout() helpers to Function and GlobalValue (#96919) Similar to https://github.com/llvm/llvm-project/pull/96902, this adds `getDataLayout()` helpers to Function and GlobalValue, replacing the current `getParent()->getDataLayout()` pattern.	2024-06-28 08:36:49 +02:00
Nikita Popov	2d209d964a	[IR] Add getDataLayout() helpers to BasicBlock and Instruction (#96902) This is a helper to avoid writing `getModule()->getDataLayout()`. I regularly try to use this method only to remember it doesn't exist... `getModule()->getDataLayout()` is also a common (the most common?) reason why code has to include the Module.h header.	2024-06-27 16:38:15 +02:00
Florian Hahn	06079233f8	[VPlan] Return std::nullopt early if plans are empty. Fixes a crash caused by abf5969.	2024-06-27 12:25:59 +01:00
Kolya Panchenko	49e5cd2acc	[LV][NFC] Marked functions as const. Added LLVM_DEBUG. (#96681)	2024-06-26 17:38:18 -04:00
Alexey Bataev	6f582b7ed3	[SLP][NFC]Remove extra check for VU.	2024-06-26 05:39:37 -07:00
Alexey Bataev	0280f97b36	[SLP]Fix PR95925: extract vectorized index of the potential buildvector sequence. If the vectorized scalar is not the insert value in the buildvector sequence but the index, it should be always extracted.	2024-06-25 14:07:51 -07:00
Alexey Bataev	228c2e1473	[SLP]Fix incorrect promotion of nodes before shuffling. If the base node is signed, but some values are unsigned, still the whole node should be considered signed. Also, an extra bitwidth analysis should be performed, when estimating the minimal bitwidth.	2024-06-25 13:39:28 -07:00
Han-Kuan Chen	de7c1396f2	[SLP] NFC. Refactor and add getAltInstrMask helper function. (#94709) Co-authored-by: Alexey Bataev <a.bataev@gmx.com>	2024-06-26 00:42:38 +08:00
Nikita Popov	8263bec533	[SLP] Use poison instead of undef in reorderScalars() (#96619) -1 mask elements are specified to return poison rather than undef nowadays, so update the reorderScalars() implementation to match.	2024-06-25 14:23:40 +02:00
Ramkumar Ramachandra	0f111ba790	LoopInfo: introduce Loop::getLocStr; unify debug output (#93051) Introduce a Loop::getLocStr stolen from LoopVectorize's static function getDebugLocString in order to have uniform debug output headers across LoopVectorize, LoopAccessAnalysis, and LoopDistribute. The motivation for this change is to have UpdateTestChecks recognize the headers and automatically generate CHECK lines for debug output, with minimal special-casing.	2024-06-25 13:12:15 +01:00

4613 Commits