llvm-project

Author	SHA1	Message	Date
Luke Lau	f0d5104c94	[VPlan] Handle some VPInstructions in may{Read,Write}FromMemory (#120058 ) This just copies the same conservative definition from mayWriteToMemory, and enables more VPInstructions to be hoisted out in LICM. I think this should give more accurate costs, and I was able to build llvm-test-suite without the legacy-vplan cost model assertion going off.	2025-01-08 15:17:26 +08:00
Florian Mayer	ef391dbc29	[LV] Drop incorrect inbounds for reverse vector pointer when folding tail (#120730 ) When folding the tail, we may compute an address that we don't in the original scalar loop and it may not be inbounds. Drop Inbounds in that case.	2025-01-07 06:14:01 -08:00
Florian Hahn	f48884ded8	[VPlan] Remove loop region in optimizeForVFAndUF. (#108378 ) Update optimizeForVFAndUF to completely remove the vector loop region when possible. At the moment, we cannot remove the region if it contains * widened IVs: the recipe is needed to generate the step vector * reductions: ComputeReductionResults requires the reduction phi recipe for codegen. Both cases can be addressed by more explicit modeling. The patch also includes a number of updates to allow executing VPlans without a vector loop region. Depends on https://github.com/llvm/llvm-project/pull/110004	2025-01-05 15:50:42 +00:00
Luke Lau	7700695739	[VPlan] Fix crash with EVL tail folding intrinsic with no corresponding VP (#121542 ) This fixes a crash when building SPEC CPU 2017 with EVL tail folding when widening @llvm.log10 intrinsics. @llvm.log10 and some other intrinsics don't have a corresponding VP intrinsic, so this fixes the crash by removing the assert and bailing instead.	2025-01-05 11:41:56 +08:00
Muhammad Omair Javaid	332d2647ff	Revert "[LV]: Teach LV to recursively (de)interleave. (#89018 )" This reverts commit ccfe0de0e1e37ed369c9bf89dd0188ba0afb2e9a. This breaks LLVM build on AArch64 SVE Linux buildbots https://lab.llvm.org/buildbot/#/builders/143/builds/4462 https://lab.llvm.org/buildbot/#/builders/17/builds/4902 https://lab.llvm.org/buildbot/#/builders/4/builds/4399 https://lab.llvm.org/buildbot/#/builders/41/builds/4299	2024-12-31 03:12:24 +05:00
Florian Hahn	7f3428d3ed	[VPlan] Compute induction end values in VPlan. (#112145 ) Use createDerivedIV to compute IV end values directly in VPlan, instead of creating them up-front. This allows updating IV users outside the loop as follow-up. Depends on https://github.com/llvm/llvm-project/pull/110004 and https://github.com/llvm/llvm-project/pull/109975. PR: https://github.com/llvm/llvm-project/pull/112145	2024-12-29 19:05:08 +00:00
Hassnaa Hamdi	ccfe0de0e1	[LV]: Teach LV to recursively (de)interleave. (#89018 ) Currently available intrinsics are only ld2/st2, which don't support interleaving factor > 2. This patch teaches the LV to use ld2/st2 recursively to support high interleaving factors.	2024-12-27 12:42:07 +00:00
Elvis Wang	47e1c87a61	[VPlan] Set debug location for VPReduction/VPWidenIntrinsicRecipe. (#120054 ) This patch adds the missing debug location for VPReduction/VPWidenIntrinsicRecipe.	2024-12-27 10:37:21 +08:00
LiqinWeng	86fa35ce7e	[LV][VPlan] Use opcode to retrieve the VPID of the CallRecipe, rather than underlying instruction (#120816 ) This patch may cause the flags in the CallRecipe to be lost after EVL transformation, and it has been addressed in the patch: #119847	2024-12-22 10:28:20 +08:00
Luke Lau	b1f4a0201a	[LV] Update failing test with middle block. NFC	2024-12-18 11:51:48 +08:00
Luke Lau	c2a879ecaa	[VPlan] Fix VPTypeAnalysis cache clobbering in EVL transform (#120252 ) When building SPEC CPU 2017 with RISC-V and EVL tail folding, this assertion in VPTypeAnalysis would trigger during the transformation to EVL recipes: `d8a0709b10/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp (L135-L142)` It was caused by this recipe: ``` WIDEN ir<%shr> = vp.or ir<%add33>, ir<0>, vp<%6> ``` Having its type inferred as i16, when ir<%add33> and ir<0> had inferred types of i32 somehow. The cause of this turned out to be because the VPTypeAnalysis cache was getting clobbered: In this transform we were erasing recipes but keeping around the same mapping from VPValue* to Type. In the meantime, new recipes would be created which would have the same address as the old value. They would then incorrectly get the old erased VPValue's cached type: ``` --- before --- 0x600001ec5030: WIDEN ir<%mul21.neg> = vp.mul vp<%11>, ir<0>, vp<%6> 0x600001ec5450: <badref> <- some value that was erased --- after --- 0x600001ec5030: WIDEN ir<%mul21.neg> = vp.mul vp<%11>, ir<0>, vp<%6> 0x600001ec5450: WIDEN ir<%shr> = vp.or ir<%add33>, ir<0>, vp<%6> <- a new value that happens to have the same address ``` This fixes this by deferring the erasing of recipes till after the transformation. The test case might be a bit flakey since it just happens to have the right conditions to recreate this. I tried to add an assert in inferScalarType that every VPValue in the cache was valid, but couldn't find a way of telling if a VPValue had been erased. --------- Co-authored-by: Florian Hahn <flo@fhahn.com>	2024-12-18 11:28:28 +08:00
Luke Lau	4a7f60d328	[VPlan] Handle VPWidenCastRecipe without underlying value in EVL transform (#120194 ) This fixes a crash that shows up when building SPEC CPU 2017 with EVL tail folding on RISC-V. A VPWidenCastRecipe doesn't always have an underlying value, and in the case of this crash this happens whenever a widened cast is created via truncateToMinimalBitwidths. Fix this by just using the opcode stored in the recipe itself. I think a similar issue exists with VPWidenIntrinsicRecipe and how it's widened, but I haven't run into any crashes with it just yet.	2024-12-18 11:28:07 +08:00
Florian Hahn	4ad0fdd163	[VPlan] Remove reverse() of predecessors from VPInstruction::generate. This was originally done to reduce the diff for the change. Remove it and update the remaining tests. NFC modulo reordering of incoming values. Clean up after https://github.com/llvm/llvm-project/pull/114292.	2024-12-17 20:44:32 +00:00
Luke Lau	4746395bd7	[VPlan] Omit zero add in VPWidenIntOrFpInductionRecipe (#119668 ) I'm not sure if getStepVector was used for other things in the past where StartIdx was non-zero, but nowadays VPWidenIntOrFpInductionRecipe is the only user of it, and just passes zero to it. I presume InstCombine was already catching this so hopefully removing this won't affect codegen.	2024-12-16 11:55:48 +08:00
Florian Hahn	6c8f41d336	[VPlan] Hook IR blocks into VPlan during skeleton creation (NFC) (#114292 ) As a first step to move towards modeling the full skeleton in VPlan, start by wrapping IR blocks created during legacy skeleton creation in VPIRBasicBlocks and hook them into the VPlan. This means the skeleton CFG is represented in VPlan, just before execute. This allows moving parts of skeleton creation into recipes in the VPBBs gradually. Note that this allows retiring some manual DT updates, as this will be handled automatically during VPlan execution. PR: https://github.com/llvm/llvm-project/pull/114292	2024-12-12 15:58:16 +00:00
LiqinWeng	77b6910b27	[Test] Fix the failed test of #108351 (#119495 )	2024-12-11 11:43:25 +08:00
LiqinWeng	b759020cc8	[LV][EVL] Support cast instruction with EVL-vectorization (#108351 )	2024-12-11 10:01:41 +08:00
Florian Hahn	156da98683	[VPlan] Move printing final VPlan to ::execute (NFC). This moves printing of the final VPlan to ::execute. This ensures the final VPlan is printed, including recipes that get introduced by late, lowering transforms and skeleton construction. Split off from https://github.com/llvm/llvm-project/pull/114292, to simplify the diff.	2024-12-07 09:39:10 +00:00
Florian Hahn	6797b0f0c0	[VPlan] Use RPOT for VPlan codegen and printing. This splits off changes for more complex CFGs in VPlan from both https://github.com/llvm/llvm-project/pull/114292 and https://github.com/llvm/llvm-project/pull/112138 to simplify their respective diffs.	2024-12-06 21:49:00 +00:00
Florian Hahn	7f7f540a48	Reapply "[VPlan] Update scalar induction resume values in VPlan. (#110577 )" This reverts commit f09b16e2671cbcdf7cb7dc7ed705db092a9deda1. The crash when building llvm-test-suite with stage2 should have been fixed by 1091fad31a83d5ab87eb6fa11fe3bdb3f0d152ea.	2024-12-06 19:41:51 +00:00
Nikita Popov	f09b16e267	Revert "[VPlan] Update scalar induction resume values in VPlan. (#110577 )" This reverts commit 0678e2058364ec10b94560d27ec7138dfa003287. This reverts commit 1091fad31a83d5ab87eb6fa11fe3bdb3f0d152ea. Causes crashes in llvm-test-suite when using stage 2 clang.	2024-12-06 18:01:42 +01:00
Florian Hahn	0678e20583	[VPlan] Update scalar induction resume values in VPlan. (#110577 ) Updated ILV.createInductionResumeValues (now createInductionResumeVPValue) to directly update the VPIRInstructions wrapping the original phis with the created resume values. This is the first step towards modeling them completely in VPlan. Subsequent patches will move creation of the resume values completely into VPlan. Depends on https://github.com/llvm/llvm-project/pull/109975. PR: https://github.com/llvm/llvm-project/pull/110577	2024-12-06 12:26:19 +00:00
Florian Hahn	82821254f5	[LV] Use IVUpdateMayOverflow to set HasNUW. (#111758 ) If IVUpdateMayOverflow is false, we proved that the induction increment cannot overflow in the vector loop. This allows setting NUW in some cases when folding the tail. PR: https://github.com/llvm/llvm-project/pull/111758	2024-11-28 10:12:41 +00:00
LiqinWeng	4a3f46de50	[LV][EVL] Support call instruction with EVL-vectorization (#110412 )	2024-11-28 10:05:08 +08:00
Mark Goncharov	93caee17ad	[RISCV][SLEEF]: Support SLEEF vector library for RISC-V target. (#114014 ) SLEEF math vector library now supports RISC-V target. Commit: https://github.com/shibatch/sleef/pull/477 This patch enables the use of auto-vectorization with subsequent replacement by the corresponding SLEEF function.	2024-11-26 12:25:54 +03:00
Florian Hahn	e2519b674c	[VPlan] Print incoming VPBB for Phi VPIRInstruction (NFC). Print the incoming block for Phi VPIRInstructions, for better debugging & testing.	2024-11-23 19:06:58 +00:00
Shih-Po Hung	632c5d2991	[VPlan] Support VPReverseVectorPointer in DataWithEVL vectorization (#113667 ) VPReverseVectorPointer relies on the runtime VF, but in DataWithEVL tail-folding, EVL (which can be less than VF at runtime) should be used instead. This patch updates the logic to check the users of VF and replaces the second operand if the user is VPReverseVectorPointer.	2024-11-22 17:18:39 +08:00
Paul Walker	56c091ea71	[LLVM][IR] Use splat syntax when printing ConstantExpr based splats. (#116856 ) This brings the printing of scalable vector constant splats in line with their fixed length counterparts.	2024-11-21 11:21:12 +00:00
David Sherwood	3097c60928	[LoopVectorize][NFC] Rewrite tests to check output of vplan cost model (#113697 ) Currently it's very difficult to improve the cost model for tail-folded loops because as soon as you add a VPInstruction::computeCost function that adds the costs of instructions such as VPInstruction::ActiveLaneMask and VPInstruction::ExplicitVectorLength the assert in LoopVectorizationPlanner::computeBestVF fails for some tests. This is because the VF chosen by the legacy cost model doesn't match the vplan cost model. See PR #90191. This assert is currently making it difficult to improve the cost model. Hopefully we will be in a position to remove the assert soon, however in order to do that we have to fix up a whole bunch of tests that rely upon the legacy cost model output. I've tried my best to update these tests to use vplan output instead. There is still work needed for the VF=1 case because the vplan cost model is not printed out in this case. I've not attempted to fix those in this patch.	2024-11-19 08:55:39 +00:00
Luke Lau	d119d43e92	[LV] Add missing REQUIRES: asserts to test	2024-11-14 17:41:40 +09:00
Luke Lau	050e2d325a	[LV] Remove assertions in IV overflow check (#115705 ) In #111310 an assert was added that for the IV overflow check used with tail folding, the overflow check is never known. However when applying the loop guards, it looks like it's possible that we might actually know the IV won't overflow: this occurs in 500.perlbench_r from SPEC CPU 2017 and triggers the assertion: Assertion failed: (!isIndvarOverflowCheckKnownFalse(Cost, VF * UF) && !SE.isKnownPredicate(CmpInst::getInversePredicate(ICmpInst::ICMP_ULT), TC2OverflowSCEV, SE.getSCEV(Step)) && "unexpectedly proved overflow check to be known"), function emitIterationCountCheck, file LoopVectorize.cpp, line 2501. There is a discrepancy between `isIndvarOverflowCheckKnownFalse` and the ICMP_ULT check, because the former uses `getSmallConstantMaxTripCount` which only takes into account trip counts that fit into 32 bits. There doesn't seem to be an easy way to make the assertion aware of this, so this PR just removes it for now. There are two potential follow up things from this PR: 1. We miss calculating the max trip count in `@trip_count_max_1024`, it looks like we might need to apply loop guards somewhere in `ScalarEvolution::computeExitLimitFromICmp` 2. In `@overflow_at_0`, if `%tc == 0` then the overflow check will always return false, even though it will overflow Fixes https://github.com/llvm/llvm-project/issues/115755	2024-11-14 17:04:49 +09:00
Luke Lau	9e77f59005	[LV] Account for vp_merge in out of loop EVL reductions in legacy cost model (#115903 ) In #101641, support for out of loop reductions with EVL tail folding was added by transforming selects to vp_merges in transformRecipestoEVLRecipes. Whilst the select was previously free, the vp_merge wasn't and incurs a cost on RISC-V with the VPlan cost model. But this diverged from the legacy cost model and caused the "VPlan cost model and legacy cost model disagreed" assertion to trigger when building 502.gcc_r from SPEC CPU 2017. Neither the select nor vp_merge recipes from the VPlan exist in the underlying instructions, so I thought it would make the most sense to fix this by adding the cost to the underlying phi instruction in getInstructionCost. It's worth noting that on RISC-V this vp_merge won't actually generate any instructions because the mask is all true, and will be folded away. So we should update the cost model at some point to reflect that.	2024-11-14 16:55:18 +09:00
Luke Lau	b2e2d8b3f6	[RISCV] Enable scalable loop vectorization for zvfhmin/zvfbfmin (#115272 ) This PR enables scalable loop vectorization for f16 with zvfhmin and bf16 with zvfbfmin. Enabling this was dependent on filling out the gaps for scalable zvfhmin/zvfbfmin codegen, but everything that the loop vectorizer might emit should now be handled. It does this by marking f16 and bf16 as legal in `isLegalElementTypeForRVV`. There are a few users of `isLegalElementTypeForRVV` that have already been enabled in other PRs: - `isLegalStridedLoadStore` #115264 - `isLegalInterleavedAccessType` #115257 - `isLegalMaskedLoadStore` #115145 - `isLegalMaskedGatherScatter` #114945 The remaining user is `isLegalToVectorizeReduction`. We can't promote f16/bf16 reductions to f32 so we need to disable them for scalable vectors. The cost model actually marks these as invalid, but for out-of-tree reductions `ComputeReductionResult` doesn't get costed and it will end up emitting a reduction intrinsic regardless, so we still need to mark them as illegal. We might be able to remove this restriction later for fmax and fmin reductions.	2024-11-11 13:29:48 +08:00
Florian Hahn	a5a1612deb	[VPlan] Consistently use DEBUG_TYPE loop-vectorize. This ensures debug messages in VPlan.cpp are included in the commonly used -debug-only=loop-vectorize.	2024-11-10 09:17:03 +00:00
Florian Hahn	144bdf3eb7	[VPlan] Also check if plan for best legacy VF contains simplifications. The plan for the VF chosen by the legacy cost model could also contain additional simplifications that cause cost differences. Also check if it contains simplifications. Fixes https://github.com/llvm/llvm-project/issues/114860.	2024-11-08 20:53:03 +00:00
Paul Walker	38fffa630e	[LLVM][IR] Use splat syntax when printing Constant[Data]Vector. (#112548 )	2024-11-06 11:53:33 +00:00
Mel Chen	4480a22c2b	[LV][EVL] Emit vp.merge intrinsic to enable out-loop reduction in EVL vectorization. (#101641 ) Following #90184, this patch emits the vp.merge intrinsic, which is used to set the inactive lanes in a select operation to the RHS instead of undef. Currently, it is applied to out-loop reduction for EVL vectorization. This patch converts select(header_mask, LHS, RHS) into vp.merge(all-true, LHS, RHS, EVL), and always uses the predicated reduction select to set the incoming value of the reduction phi to support out-loop reduction when using tail folding with EVL. TODO: Postpone the adjustment of the predicated reduction select to VPlanTransform. The current adjustment might be too early, which could lead to a situation where the predicated reduction select is adjusted, but the EVL recipes cannot be successfully generated during VPlanTransform.	2024-11-06 14:53:49 +08:00
Luke Lau	beb12f92c7	[RISCV] Add +optimized-nfN-segment-load-store (#114414 ) This is a follow up to #111511, where after benchmarking we learnt that the Banana Pi F3 has fast segmented loads for not just NF=2, but also NF=3 and NF=4: https://github.com/preames/bp3-microarch#vlseg_lmul_x_sew_throughput This adds tuning features to allow these segment loads and stores to be costed cheaper and enables it for the spacemit-x60. It also enables +optimized-nf2-segment-load-store by default in the generic tuning to maintain the previous behaviour when compiled without -mcpu or -mtune.	2024-11-04 06:43:58 +08:00
Florian Hahn	b021464d35	[VPlan] Introduce scalar loop header in plan, remove VPLiveOut. (#109975 ) Update VPlan to include the scalar loop header. This allows retiring VPLiveOut, as the remaining live-outs can now be handled by adding operands to the wrapped phis in the scalar loop header. Note that the current version only includes the scalar loop header, no other loop blocks and also does not wrap it in a region block. PR: https://github.com/llvm/llvm-project/pull/109975	2024-10-31 21:36:44 +01:00
Luke Lau	14045de250	[RISCV] Account for factor in interleave memory op costs (#111511 ) Currently we cost an interleaved memory op as if it were a load/store of the widened vector type, but this was undercosting in all cases when compared to the measured performance of today's hardware. On the x280 at NF=2 and spacemit-x60 at NF=2,3 and 4, a segmented load is carried out as a wide load and NF LMUL shuffle ops: https://github.com/preames/bp3-microarch#vlseg_lmul_x_sew_throughput All other NFs go through a slow path. On the spacemit-x60 this is proportional to VLMAX * NF, and on the x280 proportional to the number of segments. This patch increases the cost by implementing a wide load + NF LMUL shuffle op cost for the lowest common denominator NF=2, and then a slower cost proportional to VL for the other NFs. In a follow up patch we can add a tuning flag to use the faster cost model for NF=3 and 4 on the spacemit-x60. Note that the FIXME about illegal vectors seems to have been fixed in #100436	2024-10-31 05:36:46 +08:00
Florian Hahn	0d0abb351b	[VPlan] Use ResumePhi to create reduction resume phis. (#110004 ) Use VPInstruction::ResumePhi to create phi nodes for reduction resume values in the scalar preheader, similar to how ResumePhis are used for first-order recurrence resume values after 9a5a8731e77. This allows simplifying createAndCollectMergePhiForReduction to only collect reduction resume phis when vectorizing epilogue loops and adding extra incoming edges from the main vector loop. Updating phis for the epilogue vector loops requires special attention, because additional incoming values from the bypass blocks need to be added. PR: https://github.com/llvm/llvm-project/pull/110004	2024-10-28 20:14:08 +01:00
Shih-Po Hung	266ff98cba	[LV][VPlan] Use VF VPValue in VPVectorPointerRecipe (#110974 ) Refactors VPVectorPointerRecipe to use the VF VPValue to obtain the runtime VF, similar to #95305. Since only reverse vector pointers require the runtime VF, the patch sets VPUnrollPart::PartOpIndex to 1 for vector pointers and 2 for reverse vector pointers. As a result, the generation of reverse vector pointers is moved into a separate recipe.	2024-10-26 23:18:50 +08:00
Ramkumar Ramachandra	f719cfa868	LAA: be less conservative in isNoWrap (#112553 ) isNoWrap has exactly one caller which handles Assume = true separately, but too conservatively. Instead, pass Assume to isNoWrap, so it is threaded into getPtrStride, which has the correct handling for the Assume flag. Also note that the Stride == 1 check in isNoWrap is incorrect: getPtrStride returns Strides == 1 or -1, except when isNoWrapAddRec or Assume are true, assuming ShouldCheckWrap is true; we can include the case of -1 Stride, and when isNoWrapAddRec is true. With this change, passing Assume = true to getPtrStride could return a non-unit stride, and we correctly handle that case as well.	2024-10-22 09:55:51 +01:00
Alexey Bataev	f148d5791b	[LV]Initial support for safe distance in predicated DataWithEVL vectorization mode. Enabled initial support for max safe distance in DataWithEVL mode. If a max safe distance is required, special code needs to be emitted: CMP = icmp ult AVL, MAX_SAFE_DISTANCE SAFE_AVL = select CMP, AVL, MAX_SAFE_DISTANCE EVL = call i32 @llvm.experimental.get.vector.length(i64 SAFE_AVL) when vectorizing the loop in DataWithEVL tail folding mode. Reviewers: fhahn Reviewed By: fhahn Pull Request: https://github.com/llvm/llvm-project/pull/102897	2024-10-18 15:51:49 -04:00
Florian Hahn	b497010854	[VPlan] Use VPInstruction::Name when assigning names (NFCI). This slightly improves the printing of VPInstructions. NFC except debug output.	2024-10-18 05:52:35 +01:00
Florian Hahn	3860e29e0e	[VPlan] Mark VPVectorPointerRecipe as not having sideeffects. VectorPointer doesn't read from memory or have any sideeffects. Mark it accordingly.	2024-10-16 06:10:19 +01:00
Florian Hahn	34cdd67c85	[VPlan] Use VPWidenIntrinsicRecipe to create vp.select. (#110489 ) Use VPWidenIntrinsicRecipe (https://github.com/llvm/llvm-project/pull/110486) to create vp.select intrinsics. This potentially offers an alternative to duplicating EVL recipes for all existing recipes. There are some recipes that will need duplicates (at least at the moment), due to extra code-gen needs (e.g. widening loads and stores). But in cases where the intrinsic can be used directly, creating the widened intrinsic directly would reduce the need to duplicate some recipes. PR: https://github.com/llvm/llvm-project/pull/110489	2024-10-15 21:48:15 +01:00
Florian Hahn	65da32c634	[LV] Account for any-of reduction when computing costs of blend phis. Any-of reductions are narrowed to i1. Update the legacy cost model to use the correct type when computing the cost of a phi that gets lowered to selects (BLEND). This fixes a divergence between legacy and VPlan-based cost models after 36fc291b6ec6d. Fixes https://github.com/llvm/llvm-project/issues/111874.	2024-10-11 11:27:22 +01:00
David Sherwood	72f339de45	[LoopVectorize] Use predicated version of getSmallConstantMaxTripCount (#109928 ) There are a number of places where we call getSmallConstantMaxTripCount without passing a vector of predicates: getSmallBestKnownTC isIndvarOverflowCheckKnownFalse computeMaxVF isMoreProfitable I've changed all of these to now pass in a predicate vector so that we get the benefit of making better vectorisation choices when we know the max trip count for loops that require SCEV predicate checks. I've tried to add tests that cover all the cases affected by these changes.	2024-10-11 10:10:15 +01:00
Florian Hahn	3ec6f805c5	[VPlan] Don't create GEP x, 0 for interleave group pointers. The GEP with offset 0 is redundant, remove it. This addresses a TODO from 7f74651837b (#106431).	2024-10-08 12:08:13 +01:00
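
The safe-distance clamp described in f148d5791b (icmp ult, then select, then get.vector.length) can be sketched as a scalar model. This is a rough illustration only, not code from the patch: `modelEVL` and `evlSchedule` are hypothetical names, and `@llvm.experimental.get.vector.length` is modelled here as a plain `min(SAFE_AVL, VF)`, which is a simplifying assumption about the intrinsic's result.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical scalar model of the per-iteration clamp; not LLVM code.
// Assumption: get.vector.length is modelled as min(SAFE_AVL, VF).
std::uint64_t modelEVL(std::uint64_t AVL, std::uint64_t MaxSafeDistance,
                       std::uint64_t VF) {
  // CMP = icmp ult AVL, MAX_SAFE_DISTANCE
  // SAFE_AVL = select CMP, AVL, MAX_SAFE_DISTANCE
  std::uint64_t SafeAVL = AVL < MaxSafeDistance ? AVL : MaxSafeDistance;
  // EVL = get.vector.length(SAFE_AVL), under the modelling assumption above.
  return std::min(SafeAVL, VF);
}

// Returns the EVL used in each vector iteration for a loop of TripCount
// elements. Every step is at most MaxSafeDistance and at most VF.
std::vector<std::uint64_t> evlSchedule(std::uint64_t TripCount,
                                       std::uint64_t MaxSafeDistance,
                                       std::uint64_t VF) {
  assert(MaxSafeDistance > 0 && VF > 0 && "zero step would never terminate");
  std::vector<std::uint64_t> Steps;
  for (std::uint64_t Done = 0; Done < TripCount;) {
    std::uint64_t EVL = modelEVL(TripCount - Done, MaxSafeDistance, VF);
    Steps.push_back(EVL);
    Done += EVL; // the remaining AVL shrinks by the lanes just processed
  }
  return Steps;
}
```

In this model a safe distance smaller than VF caps every step: 10 elements with a max safe distance of 3 and VF of 8 are processed in steps of 3, 3, 3 and 1.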


287 Commits