llvm-project

Author	SHA1	Message	Date
Florian Hahn	0d0abb351b	[VPlan] Use ResumePhi to create reduction resume phis. (#110004) Use VPInstruction::ResumePhi to create phi nodes for reduction resume values in the scalar preheader, similar to how ResumePhis are used for first-order recurrence resume values after 9a5a8731e77. This allows simplifying createAndCollectMergePhiForReduction to only collect reduction resume phis when vectorizing epilogue loops and adding extra incoming edges from the main vector loop. Updating phis for the epilogue vector loops requires special attention, because additional incoming values from the bypass blocks need to be added. PR: https://github.com/llvm/llvm-project/pull/110004	2024-10-28 20:14:08 +01:00
Shih-Po Hung	266ff98cba	[LV][VPlan] Use VF VPValue in VPVectorPointerRecipe (#110974) Refactors VPVectorPointerRecipe to use the VF VPValue to obtain the runtime VF, similar to #95305. Since only reverse vector pointers require the runtime VF, the patch sets VPUnrollPart::PartOpIndex to 1 for vector pointers and 2 for reverse vector pointers. As a result, the generation of reverse vector pointers is moved into a separate recipe.	2024-10-26 23:18:50 +08:00
Ramkumar Ramachandra	f719cfa868	LAA: be less conservative in isNoWrap (#112553) isNoWrap has exactly one caller which handles Assume = true separately, but too conservatively. Instead, pass Assume to isNoWrap, so it is threaded into getPtrStride, which has the correct handling for the Assume flag. Also note that the Stride == 1 check in isNoWrap is incorrect: getPtrStride returns Strides == 1 or -1, except when isNoWrapAddRec or Assume are true, assuming ShouldCheckWrap is true; we can include the case of -1 Stride, and when isNoWrapAddRec is true. With this change, passing Assume = true to getPtrStride could return a non-unit stride, and we correctly handle that case as well.	2024-10-22 09:55:51 +01:00
Alexey Bataev	f148d5791b	[LV]Initial support for safe distance in predicated DataWithEVL vectorization mode. Enables initial support for max safe distance in DataWithEVL mode. If a max safe distance is required, special code must be emitted when vectorizing the loop in DataWithEVL tail-folding mode: `CMP = icmp ult AVL, MAX_SAFE_DISTANCE`; `SAFE_AVL = select CMP, AVL, MAX_SAFE_DISTANCE`; `EVL = call i32 @llvm.experimental.get.vector.length(i64 SAFE_AVL)`. Reviewers: fhahn Reviewed By: fhahn Pull Request: https://github.com/llvm/llvm-project/pull/102897	2024-10-18 15:51:49 -04:00
Florian Hahn	b497010854	[VPlan] Use VPInstruction::Name when assigning names (NFCI). This slightly improves the printing of VPInstructions. NFC except debug output.	2024-10-18 05:52:35 +01:00
Florian Hahn	3860e29e0e	[VPlan] Mark VPVectorPointerRecipe as not having sideeffects. VectorPointer doesn't read from memory or have any sideeffects. Mark it accordingly.	2024-10-16 06:10:19 +01:00
Florian Hahn	34cdd67c85	[VPlan] Use VPWidenIntrinsicRecipe to vp.select. (#110489) Use VPWidenIntrinsicRecipe (https://github.com/llvm/llvm-project/pull/110486) to create vp.select intrinsics. This potentially offers an alternative to duplicating EVL recipes for all existing recipes. There are some recipes that will need duplicates (at least at the moment), due to extra code-gen needs (e.g. widening loads and stores). But in cases where the intrinsic can be used directly, creating the widened intrinsic directly would reduce the need to duplicate some recipes. PR: https://github.com/llvm/llvm-project/pull/110489	2024-10-15 21:48:15 +01:00
Florian Hahn	65da32c634	[LV] Account for any-of reduction when computing costs of blend phis. Any-of reductions are narrowed to i1. Update the legacy cost model to use the correct type when computing the cost of a phi that gets lowered to selects (BLEND). This fixes a divergence between legacy and VPlan-based cost models after 36fc291b6ec6d. Fixes https://github.com/llvm/llvm-project/issues/111874.	2024-10-11 11:27:22 +01:00
David Sherwood	72f339de45	[LoopVectorize] Use predicated version of getSmallConstantMaxTripCount (#109928) There are a number of places where we call getSmallConstantMaxTripCount without passing a vector of predicates: getSmallBestKnownTC, isIndvarOverflowCheckKnownFalse, computeMaxVF, and isMoreProfitable. I've changed all of these to now pass in a predicate vector so that we get the benefit of making better vectorisation choices when we know the max trip count for loops that require SCEV predicate checks. I've tried to add tests that cover all the cases affected by these changes.	2024-10-11 10:10:15 +01:00
Florian Hahn	3ec6f805c5	[VPlan] Don't create GEP x, 0 for interleave group pointers. The GEP with offset 0 is redundant; remove it. This addresses a TODO from 7f74651837b (#106431).	2024-10-08 12:08:13 +01:00
Luke Lau	366e469db9	[RISCV] Add cost tests for more interleave factors. NFC This shows how we're not properly scaling the cost with the number of factors, i.e. a factor 8 interleave costs the same as a factor 2 interleave at VF=2.	2024-10-08 17:16:17 +08:00
Luke Lau	e98875af4c	[RISCV] Add scalable interleave cost tests. NFC This gets the cost from the recipe output rather than the individual instruction cost. The factor 3 test was left alone since we don't support anything else other than factor 2 for scalable vectors currently.	2024-10-08 13:39:13 +08:00
Florian Hahn	7f74651837	[VPlan] Use pointer to member 0 as VPInterleaveRecipe's pointer arg. (#106431) Update VPInterleaveRecipe to always use the pointer to member 0 as pointer argument. This in many cases helps to remove unneeded index adjustments and simplifies VPInterleaveRecipe::execute. In some rare cases, the address of member 0 does not dominate the insert position of the interleave group. In those cases a PtrAdd VPInstruction is emitted to compute the address of member 0 based on the address of the insert position. Alternatively we could hoist the recipe computing the address of member 0.	2024-10-06 22:53:13 +01:00
Shih-Po Hung	26fca7256e	[VPlan][NFC] Use patterns in test check (#111086)	2024-10-04 17:19:07 +08:00
Elvis Wang	a068b974b1	[VPlan] Implement VPWidenLoad/StoreEVLRecipe::computeCost(). (#109644) Currently the EVL recipes transfer the tail masking to the EVL. But in the legacy cost model, the mask still exists and its instruction cost is included. To fix the difference between the VPlan-based cost model and the legacy cost model, we always calculate the instruction cost for the mask in the EVL recipes. Note that we should remove the mask cost in the EVL recipes once we no longer need to compare against the legacy cost model. This patch also fixes #109468.	2024-09-26 07:10:25 +08:00
Alexey Bataev	60ed2361c0	[LV][EVL]Explicitly model AVL as sub, original TC, EVL_PHI. Patch explicitly models AVL as sub original TC, EVL_PHI instead of having it in EXPLICIT-VECTOR-LENGTH VPInstruction. Required for correct safe dependence distance support. Reviewers: fhahn, ayalz Reviewed By: ayalz Pull Request: https://github.com/llvm/llvm-project/pull/108869	2024-09-25 08:58:29 -04:00
Florian Hahn	53266f73f0	[VPlan] Run DCE after unrolling. This cleans up a number of dead recipes after unrolling if only their first or last parts are used. This simplifies a number of tests. Fixes https://github.com/llvm/llvm-project/issues/109581.	2024-09-22 22:08:46 +01:00
Florian Hahn	8ec406757c	[VPlan] Implement unrolling as VPlan-to-VPlan transform. (#95842) This patch implements explicit unrolling by UF as VPlan transform. In follow up patches this will allow simplifying VPTransformState (no need to store unrolled parts) as well as recipe execution (no need to generate code for multiple parts in each recipe). It also allows for more general optimizations (e.g. avoid generating code for recipes that are uniform across parts). It also unifies the logic dealing with unrolled parts in a single place, rather than spreading it out across multiple places (e.g. VPlan post processing for header-phi recipes previously.) In the initial implementation, a number of recipes still take the unrolled part as additional, optional argument, if their execution depends on the unrolled part. The computation for start/step values for scalable inductions changed slightly. Previously the step would be computed as scalar and then splatted, now vscale gets splatted and multiplied by the step in a vector mul. This has been split off https://github.com/llvm/llvm-project/pull/94339 which also includes changes to simplify VPTransformState and recipes' ::execute. The current version mostly leaves existing ::execute untouched and instead sets VPTransformState::UF to 1. A follow-up patch will clean up all references to VPTransformState::UF. Another follow-up patch will simplify VPTransformState to only store a single vector value per VPValue. PR: https://github.com/llvm/llvm-project/pull/95842	2024-09-21 19:47:37 +01:00
Florian Hahn	4eb9838409	[VPlan] Generalize VPValue::isDefinedOutsideLoopRegions. Update isDefinedOutsideLoopRegions to check if a recipe is defined outside any region. Split off already approved https://github.com/llvm/llvm-project/pull/95842 now that this can be tested separately after landing VPlan-based LICM https://github.com/llvm/llvm-project/issues/107501	2024-09-20 15:34:00 +01:00
Florian Hahn	a861ed411a	[VPlan] Add initial loop-invariant code motion transform. (#107894) Add initial transform to move out loop-invariant recipes. This also helps to fix a divergence between legacy and VPlan-based cost model due to legacy using ScalarEvolution::isLoopInvariant in some cases. Fixes https://github.com/llvm/llvm-project/issues/107501. PR: https://github.com/llvm/llvm-project/pull/107894	2024-09-20 11:22:03 +01:00
Shih-Po Hung	ffcff2f465	[VPlan][NFC] Fix the value name of VECTOR_GEP (#107544) This patch passes the string `"vector.gep"` to CreateGEP instead of CreateMul.	2024-09-18 19:22:36 +08:00
LiqinWeng	a2994b2999	[LV][NFC] Unify printing for WidenEVLRecipe with other EVL recipes (#108177)	2024-09-18 15:03:37 +08:00
Luke Lau	30d7dcc1db	[RISCV] Add asserts requirement to loop vectorizer tests Hopefully this fixes a buildbot failure on fuchsia where opt doesn't have -debug-only	2024-09-17 14:18:36 +08:00
Luke Lau	41f1b467a2	[RISCV] Account for zvfhmin and zvfbfmin promotion in register usage (#108370) A half with only zvfhmin or bfloat will end up getting promoted to a f32 for most instructions. Unless the loop consists only of memory ops and permutation instructions which don't need promotion (is this common?), we'll end up using double the LMUL of what's currently being returned by getRegUsageForType. Since this is used by the loop vectorizer, it seems better to be conservative and assume that any usage of a zvfhmin half/bfloat will end up being widened to a f32.	2024-09-17 13:50:19 +08:00
Florian Hahn	f0c5caa814	[VPlan] Add VPIRInstruction, use for exit block live-outs. (#100735) Add a new VPIRInstruction recipe to wrap existing IR instructions not to be modified during execution, except for PHIs. For PHIs, a single VPValue operand is allowed, and it is used to add a new incoming value for the single predecessor VPBB. Except for PHIs, VPIRInstructions cannot have any operands. Depends on https://github.com/llvm/llvm-project/pull/100658. PR: https://github.com/llvm/llvm-project/pull/100735	2024-09-14 21:21:55 +01:00
Florian Hahn	ea83e1c05a	[LV] Assign cost to all interleave members when not interleaving. At the moment, the full cost of all interleave group members is assigned to the instruction at the group's insert position, even if the decision was to not form an interleave group. This can lead to inaccurate cost estimates, e.g. if the instruction at the insert position is dead. If the decision is to not vectorize but scalarize or scatter/gather, then the cost will be the total cost for all members. In those cases, assign the individual cost per member, to more closely reflect the choice per instruction. This fixes a divergence between legacy and VPlan-based cost model. Fixes https://github.com/llvm/llvm-project/issues/108098.	2024-09-11 21:04:34 +01:00
Florian Hahn	e3c537ff90	[VPlan] Consider non-header phis in planContainsAdditionalSimp. Update planContainsAdditionalSimplifications to also check phis not in the loop header. This ensures we don't miss cases where VPBlendRecipes (which correspond to such phis) have been simplified. Fixes https://github.com/llvm/llvm-project/issues/107473.	2024-09-10 21:37:14 +01:00
Florian Hahn	a794ee4559	[VPlan] Add VPValue for VF, use it for VPWidenIntOrFpInductionRecipe. (#95305) Similar to VFxUF, also add a VF VPValue to VPlan and use it to get the runtime VF in VPWidenIntOrFpInductionRecipe. Code for VF is only generated if there are users of VF, to avoid unnecessary test changes. PR: https://github.com/llvm/llvm-project/pull/95305	2024-09-10 10:41:35 +01:00
Kolya Panchenko	00e40c9b5b	[LV] Support binary and unary operations with EVL-vectorization (#93854) The patch adds `VPWidenEVLRecipe` which represents `VPWidenRecipe` + EVL argument. The new recipe replaces `VPWidenRecipe` in `tryAddExplicitVectorLength` for each binary and unary operation. Follow up patches will extend support for remaining cases, like `FCmp` and `ICmp`.	2024-09-06 11:41:36 -04:00
Florian Hahn	cf2ecc7c1c	[LV] Remove over-aggressive assert from 3fe6a064f15c. There are some cases where only the first operand is marked for truncation. In that case, the compare won't be truncated which would incorrectly trigger the assertion. It also shows that the check pre 3fe6a064f15c also considered compares truncated that cannot be truncated.	2024-09-05 18:20:16 +01:00
Florian Hahn	3fe6a064f1	[LV] Check if compare is truncated directly in getInstructionCost. The current check for truncated compares in getInstructionCost misses cases where either the first or both operands are constants. Check directly if the compare is marked for truncation. In that case, the minimum bitwidth is that of the operands. The patch also adds asserts to ensure that. This fixes a divergence between legacy and VPlan-based cost model, where the legacy cost model incorrectly estimated the cost of compares with truncated operands. Fixes https://github.com/llvm/llvm-project/issues/107171.	2024-09-04 20:50:06 +01:00
Philip Reames	1fbb6b4efc	[LV] Prefer FLT_MIN/MAX for fmin/fmax reductions with ninf (#107141) Analogous to 2c7786e94a1058bd4f96794a1d4f70dcb86e5cc5, cleanup a case where the vectorizer is emitting a non-canonical identity value given the available flags. We use largest/smallest value during ISEL, and VP expansion, but not during vectorization. Since the fmin/fmax/fminimum/fmaximum intrinsics don't require a start value, this difference is only visible when masking of inactive lanes is required. Primary motivation of this change is simply to remove a difference between versions of the code which reason about the identity value of a reduction so I can kill all but one off. In review, it was pointed out that this is actually a functional fix as well. The old code used inf on a ninf reduction instruction - whose result is poison! That wasn't the intent of the code.	2024-09-03 12:21:54 -07:00
Philip Reames	2c7786e94a	Prefer use of 0.0 over -0.0 for fadd reductions w/nsz (in IR) (#106770) This is a follow up to 924907bc6, and is mostly motivated by consistency but does include one additional optimization. In general, we prefer 0.0 over -0.0 as the identity value for an fadd. We use that value in several places, but don't in others. So, let's be consistent and use the same identity (when nsz allows) everywhere. This creates a bunch of test churn, but due to 924907bc6, most of that churn doesn't actually indicate a change in codegen. The exception is that this change enables the use of 0.0 for nsz, but not reassoc, fadd reductions. Or said differently, it allows the neutral value of an ordered fadd reduction to be 0.0.	2024-09-03 09:16:37 -07:00
Florian Hahn	654bb4e9f2	[LV] Don't consider branches leaving loop in collectValuesToIgnore. Branches exiting the loop will remain regardless, so don't consider them in collectValuesToIgnore. This fixes another divergence between legacy and VPlan-based cost model. Fixes https://github.com/llvm/llvm-project/issues/106780.	2024-09-01 20:35:36 +01:00
Florian Hahn	f0e34f3818	[VPlan] Don't skip optimizable truncs in planContainsAdditionalSimps. An optimizable cast can also be removed by VPlan simplifications. Remove the restriction from planContainsAdditionalSimplifications, as this causes it to miss relevant simplifications, triggering false positives for the cost decision verification. Also adds debug output for printing additional cost-precomputations. Fixes https://github.com/llvm/llvm-project/issues/106641.	2024-08-30 11:29:30 +01:00
Florian Hahn	c4906588ce	[VPlan] Use skipCostComputation when pre-computing induction costs. This ensures we skip any instructions identified to be ignored by the legacy cost model as well. Fixes a divergence between legacy and VPlan-based cost model. Fixes https://github.com/llvm/llvm-project/issues/106417.	2024-08-29 21:20:00 +01:00
Maciej Gabka	95d2d1cba0	Move stepvector intrinsic out of experimental namespace (#98043) This patch moves the stepvector intrinsic out of the experimental namespace. The intrinsic has existed in LLVM for several years now and is widely used.	2024-08-28 12:48:20 +01:00
Mel Chen	dfde1a7232	[LV][NFC] Update and clean up the test case LoopVectorize/RISCV/inloop-reduction.ll. (#102907)	2024-08-28 17:46:58 +08:00
Florian Hahn	cb4efe1d07	[VPlan] Don't trigger VF assertion if VPlan has extra simplifications. There are cases where VPlans contain some simplifications that are very hard to accurately account for up-front in the legacy cost model. Those cases are caused by un-simplified inputs, which trigger the assert ensuring both the legacy and VPlan-based cost model agree on the VF. To avoid false positives due to missed simplifications in general, only trigger the assert if the chosen VPlan doesn't contain any additional simplifications. Fixes https://github.com/llvm/llvm-project/issues/104714. Fixes https://github.com/llvm/llvm-project/issues/105713.	2024-08-22 21:38:06 +01:00
Florian Hahn	4e04286d61	[VPlan] Only use selectVectorizationFactor for cross-check (NFCI). (#103033) Use getBestVF to select the VF up-front and only use selectVectorizationFactor to get the legacy VF, to check that the vectorization decision matches the VPlan-based cost model. PR: https://github.com/llvm/llvm-project/pull/103033	2024-08-21 13:09:01 +02:00
Florian Hahn	99741ac285	[VPlan] Introduce explicit ExtractFromEnd recipes for live-outs. (#100658) Introduce explicit ExtractFromEnd recipes to extract the final values for live-outs instead of implicitly extracting in VPLiveOut::fixPhi. This is a follow-up to the recent changes of modeling extracts for recurrences and consolidates live-out extract creation for fixed-order recurrences at a single place: addLiveOutsForFirstOrderRecurrences. It is also in preparation of replacing VPLiveOut with VPIRInstructions wrapping the original scalar phis. PR: https://github.com/llvm/llvm-project/pull/100658	2024-08-21 10:06:44 +02:00
Florian Hahn	e9e3a183d6	[LV] Don't cost branches and conditions to empty blocks. Update the legacy cost model to skip branches with successor blocks that are empty or only contain dead instructions, together with their conditions. Such branches and conditions won't result in any generated code and will be cleaned up by VPlan transforms. This fixes a difference between the legacy and VPlan-based cost model. When running LV in its usual pipeline position, such dead blocks should already have been cleaned up, but they might be generated manually or by fuzzers. Fixes https://github.com/llvm/llvm-project/issues/100591.	2024-08-18 12:51:17 +01:00
Florian Hahn	c7a44ec031	[VPlan] Check successors in VPlan to check if scalar epi required (NFC) Now that the branches to the scalar epilogue are modeled in VPlan directly, check the VPlan to see if a scalar epilogue is required. Preparation for https://github.com/llvm/llvm-project/pull/100658.	2024-08-12 15:33:52 +01:00
Craig Topper	59728193a6	[RISCV] Disable fixed length vectors with Zve32* without Zvl64b. (#102405) Fixed length vectors use scalable vector containers. With Zve32* and not Zvl64b, vscale is 0.5 due to RVVBitsPerBlock being 64. To support this correctly we need to lower RVVBitsPerBlock to 32 and change our type mapping. But we need RVVBitsPerBlock to always be >= ELEN. This means we need two different mappings depending on ELEN. That is a non-trivial amount of work so disable fixed length vectors without Zvl64b for now. We had almost no tests for Zve32x without Zvl64b which is probably why we never realized that it was broken. Fixes #102352.	2024-08-08 09:17:43 -07:00
Philip Reames	d3fd28a134	[RISCV][TTI] Properly model odd vector sized LD/ST operations (#100436 ) The motivation for this change is the costing of a LD or ST with nearly power of 2 vectors (e.g. <3 x i32> or <7 x i32>) on V. There's an experimental option in SLP to allow emitting these if the cost model says they're profitable. This really helps with e.g. RGB vectors. Our actual lowering for these depends on whether a wider container type is known available. If so, we use a vle or vse on the wider type with a restricted VL. If not, we split until a legal type is found, and then apply the vle/vse on the sub-pieces. This change is intentionally restricted to only the case where promotion (widening w/VL predication) is involved. We appear to have at least one bug in our splitting lowering (see discussion on review), and to avoid exposing this more widely, I chose to not adjust costs for the splitting case. The current splitting costing assumes scalarization (which is not true of the actual lowering), but that has the effect of biasing vectorization away from such cases strongly. For the widening case, the true cost scales with the next largest legal type. The default implementation assumes that such a type is scalarized. Changing that brings our cost in line with our actual lowering decision. Note that since scalarization is not possible for scalable types, the prior costing falsely returned Invalid for that case.	2024-07-26 12:52:20 -07:00
Florian Hahn	67a55e01e3	[VPlan] Replace getBestPlan by getBestVF use also for epilogue vec. (#98821) Replace getBestPlan by getBestVF which simply finds the best VF out of the VFs for the available VPlans. Then use getBestPlan to retrieve the corresponding VPlan. This allows using getBestVF & getBestPlan for epilogue vectorization as well. As the same plan may be used to vectorize both the main and epilogue loop, restricting the VF of the best plan would cause issues. PR: https://github.com/llvm/llvm-project/pull/98821	2024-07-26 14:06:46 +01:00
Alexey Bataev	7432ad6af5	[LV][VP][NFC]Add tests for safe store/load forwarding/dependence distance. Reviewers: fhahn Reviewed By: fhahn Pull Request: https://github.com/llvm/llvm-project/pull/100635	2024-07-25 19:55:37 -04:00
Philip Reames	ea202f9f2e	[LV,RISCV] Regenerate a test to reduce spurious deltas in upcoming change	2024-07-25 12:22:59 -07:00
Florian Hahn	b72689a5cb	[LV] Ignore live-out users in cost model if scalar epilogue is required. Follow-up to ba8126b6fef79. If a scalar epilogue is required, users outside the loop won't use live-outs from the vector loop but from the scalar epilogue. Ignore them if that is the case. This fixes another case where the VPlan-based cost-model more accurately computes cost. Fixes https://github.com/llvm/llvm-project/issues/100464.	2024-07-25 11:16:18 +01:00
Florian Hahn	ba8126b6fe	[LV] Mark dead instructions in loop as free. Update collectValuesToIgnore to also ignore dead instructions in the loop. Such instructions will be removed by VPlan-based DCE and won't be considered by the VPlan-based cost model. This closes a gap between the legacy and VPlan-based cost model. In practice with the default pipelines, there shouldn't be any dead instructions in loops reaching LoopVectorize, but it is easy to generate such cases by hand or automatically via fuzzers. Fixes https://github.com/llvm/llvm-project/issues/99701.	2024-07-24 09:31:32 +01:00
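
The safe-distance EVL computation from f148d5791b, combined with the AVL modeling from 60ed2361c0 (AVL = original TC - EVL_PHI), can be illustrated with a minimal scalar sketch in Python. This is not the vectorizer's code. It assumes, as a simplification, that `llvm.experimental.get.vector.length` returns `min(AVL, VF)`; the function names `safe_avl`, `get_vector_length`, and `evl_schedule` are hypothetical.

```python
def safe_avl(avl, max_safe_distance):
    """Clamp AVL to the max safe distance.

    Mirrors: CMP = icmp ult AVL, MAX_SAFE_DISTANCE
             SAFE_AVL = select CMP, AVL, MAX_SAFE_DISTANCE
    """
    return avl if avl < max_safe_distance else max_safe_distance


def get_vector_length(avl, vf):
    """Hypothetical stand-in for llvm.experimental.get.vector.length.

    Assumption: the result is min(avl, vf). The real intrinsic may
    return other values in some situations.
    """
    return min(avl, vf)


def evl_schedule(trip_count, vf, max_safe_distance):
    """Return the EVL used in each iteration of the vector loop."""
    if vf <= 0 or max_safe_distance <= 0:
        raise ValueError("vf and max_safe_distance must be positive")
    evl_phi = 0  # number of elements processed so far
    steps = []
    while evl_phi < trip_count:
        avl = trip_count - evl_phi  # AVL = sub original TC, EVL_PHI
        evl = get_vector_length(safe_avl(avl, max_safe_distance), vf)
        steps.append(evl)
        evl_phi += evl
    return steps
```

Under these assumptions, `evl_schedule(10, 4, 3)` yields `[3, 3, 3, 1]`: the max safe distance of 3 caps every step even though VF is 4.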

247 Commits