llvm-project

Author	SHA1	Message	Date
Florian Hahn	144bdf3eb7	[VPlan] Also check if plan for best legacy VF contains simplifications. The plan for the VF chosen by the legacy cost model could also contain additional simplifications that cause cost differences. Also check if it contains simplifications. Fixes https://github.com/llvm/llvm-project/issues/114860.	2024-11-08 20:53:03 +00:00
David Sherwood	d77a36e01b	[LoopVectorize] Use new getUniqueLatchExitBlock routine (#108231 ) With PR #88385 I am introducing support for vectorising more loops with early exits that don't require a scalar epilogue. As such, if a loop doesn't have a unique exit block it will not automatically imply we require a scalar epilogue. Also, in all places in the code today where we use the variable LoopExitBlock we actually mean the exit block from the latch. Therefore, it seemed reasonable to add a new getUniqueLatchExitBlock that allows the caller to determine the exit block taken from the latch and use this instead of getUniqueExitBlock. I also renamed LoopExitBlock to be LatchExitBlock. I feel this not only better reflects how the variable is used today, but also prepares the code for PR #88385. While doing this I also noticed that one of the comments in requiresScalarEpilogue is wrong when we require a scalar epilogue, i.e. when we're not exiting from the latch block. This doesn't always imply we have multiple exits, e.g. see the test in Transforms/LoopVectorize/unroll_nonlatch.ll where the latch unconditionally branches back to the only exiting block.	2024-11-06 10:35:35 +00:00
Mel Chen	4480a22c2b	[LV][EVL] Emit vp.merge intrinsic to enable out-loop reduction in EVL vectorization. (#101641 ) Following #90184, this patch emits vp.merge intrinsic, which is used to set the inactive lanes in a select operation to the RHS instead of undef. Currently, it is applied to out-loop reduction for EVL vectorization. This patch performs transformation to convert select(header_mask, LHS, RHS) into vp.merge(all-true, LHS, RHS, EVL) And always use the predicated reduction select to set the incoming value of the reduction phi to support out-loop reduction when using tail folding with EVL. TODO: Postpone the adjustment of the predicated reduction select to VPlanTransform. The current adjustment might be too early, which could lead to a situation where the predicated reduction select is adjusted, but the EVL recipes cannot be successfully generated during VPlanTransform.	2024-11-06 14:53:49 +08:00
Mel Chen	70de0b8bea	[LV][NFC] Simplify initialization of MinProfitableTripCount (#113445 ) Iteration runtime check confirms whether the trip count is greater than VFxUF at least. Therefore, there is no need to adjust the MinProfitableTripCount to VF if it is zero. Retaining the original MinProfitableTripCount information is also beneficial for supporting more profitable runtime checks in the future.	2024-11-05 15:13:59 +08:00
Florian Hahn	17bad1a9da	[LV] Bail out on header phis in shouldConsiderInvariant. This fixes an infinite recursion in rare cases. Fixes https://github.com/llvm/llvm-project/issues/113794.	2024-11-01 20:51:25 +00:00
David Sherwood	4ed7bcb4a6	[VPlan][NFC] Add new getMiddleBlock interface to VPlan (#113558 ) This work is in preparation for PRs #112138 and #88385 where the middle block is not guaranteed to be the immediate successor to the region block. I've simply add new getMiddleBlock() interfaces to VPlan that for now just return cast<VPBasicBlock>(VectorRegion->getSingleSuccessor()) Once PR #112138 lands we'll need to do more work to discover the middle block.	2024-11-01 10:50:52 +00:00
Florian Hahn	b021464d35	[VPlan] Introduce scalar loop header in plan, remove VPLiveOut. (#109975 ) Update VPlan to include the scalar loop header. This allows retiring VPLiveOut, as the remaining live-outs can now be handled by adding operands to the wrapped phis in the scalar loop header. Note that the current version only includes the scalar loop header, no other loop blocks and also does not wrap it in a region block. PR: https://github.com/llvm/llvm-project/pull/109975	2024-10-31 21:36:44 +01:00
Florian Hahn	5bd1af5abc	[LV] Directly store VPlan in InnerLoopVectorizer (NFC). The current VPlan is already passed to multiple functions and more in the future. Store it once directly in InnerLoopVectorizer.	2024-10-30 18:39:50 +00:00
Piotr Fusik	3c02fea737	[LV][NFC] Remove stray semicolons (#114057 )	2024-10-30 04:07:14 +01:00
Florian Hahn	0d0abb351b	[VPlan] Use ResumePhi to create reduction resume phis. (#110004 ) Use VPInstruction::ResumePhi to create phi nodes for reduction resume values in the scalar preheader, similar to how ResumePhis are used for first-order recurrence resume values after 9a5a8731e77. This allows simplifying createAndCollectMergePhiForReduction to only collect reduction resume phis when vectorizing epilogue loops and adding extra incoming edges from the main vector loop. Updating phis for the epilogue vector loops requires special attention, because additional incoming values from the bypass blocks need to be added. PR: https://github.com/llvm/llvm-project/pull/110004	2024-10-28 20:14:08 +01:00
Shih-Po Hung	266ff98cba	[LV][VPlan] Use VF VPValue in VPVectorPointerRecipe (#110974 ) Refactors VPVectorPointerRecipe to use the VF VPValue to obtain the runtime VF, similar to #95305. Since only reverse vector pointers require the runtime VF, the patch sets VPUnrollPart::PartOpIndex to 1 for vector pointers and 2 for reverse vector pointers. As a result, the generation of reverse vector pointers is moved into a separate recipe.	2024-10-26 23:18:50 +08:00
Florian Hahn	9648271a3c	[LV] Pass flag indicating epilogue is vectorized to executePlan (NFC) This clarifies the flag, which is now only passed if the epilogue loop is being vectorized.	2024-10-25 20:39:47 +02:00
Florian Hahn	7b9f988a53	[VPlan] Limit stride replacement to vector region and middle VPBB (NFC). At the moment this in NFC, but ensures we only replace uses that are dominated by runtime checks as we model more of the skeleton in VPlan.	2024-10-24 15:08:36 -07:00
Florian Hahn	2437784a17	[LV] Replace unreachable by folding into else with assert (NFC). Simplify code as suggested post-commit in https://github.com/llvm/llvm-project/pull/110576/.	2024-10-21 21:46:48 -07:00
Elvis Wang	b3edc764f7	[VPlan] Implement VPWidenCastRecipe::computeCost(). (NFCI) (#111339 ) This patch implement `VPWidenCastRecipe::computeCost()` and skip cast recipies in the in-loop reduction.	2024-10-22 12:23:49 +08:00
Florian Hahn	173907b5d7	[LV] Move logic to check if op is invariant to legacy cost model. (NFC) This allows the function to be re-used in other places	2024-10-20 17:26:15 -07:00
Florian Hahn	cba5c77a71	[VPlan] Mark unreachable code path when retrieving the scalar PH. (NFCI)	2024-10-19 19:14:21 -07:00
Florian Hahn	2deb3a26fa	[LV] Fixup IV users only once during epilogue vectorization. (NFC) Induction users only need to be updated when vectorizing the epilogue. Avoid running fixupIVUsers when vectorizing the main loop during epilogue vectorization.	2024-10-19 18:11:06 -07:00
Florian Hahn	2a6b09e0d3	[LV] Use type from InsertPos for cost computation of interleave groups. Previously the legacy cost model would pick the type for the cost computation depending on the order of the members in the input IR. This is incompatible with the VPlan-based cost model (independent of original IR order) and also doesn't match code-gen, which uses the type of the insert position. Update the legacy cost model to use the type (and address space) from the Group's insert position. This brings the legacy cost model in line with the legacy cost model and fixes a divergence between both models. Note that the X86 cost model seems to assign different costs to groups with i64 and double types. Added a TODO to check. Fixes https://github.com/llvm/llvm-project/issues/112922.	2024-10-18 19:12:40 -07:00
Florian Hahn	c7496cebac	[LV] Use SCEV to check if minimum iteration check is known. (#111310 ) Use SCEV to check if the minimum iteration check (TC < Step) is known to be false. This is a first step towards addressing https://github.com/llvm/llvm-project/issues/111098. To catch the exact case from the issue, we need to do extra work to make sure the wrap flags on the shl are preserved and used by SCEV. Note that skeleton creation will be gradually moved to VPlan and this simplification should be done as VPlan transform eventually. The current plan is to move skeleton creation to VPlan starting from parts closest to the parts already created by VPlan, starting with induction resume value creation (started with https://github.com/llvm/llvm-project/pull/110577), then memory and SCEV checks and finally minimum iteration checks. PR: https://github.com/llvm/llvm-project/pull/111310	2024-10-18 15:22:59 -07:00
Alexey Bataev	f148d5791b	[LV]Initial support for safe distance in predicated DataWithEVL vectorization mode. Enabled initial support for max safe distance in DataWithEVL mode. If max safe distance is required, need to emit special code: CMP = icmp ult AVL, MAX_SAFE_DISTANCE SAFE_AVL = select CMP, AVL, MAX_SAFE_DISTANCE EVL = call i32 @llvm.experimental.get.vector.length(i64 SAFE_AVL) while vectorize the loop in DataWithEVL tail folding mode. Reviewers: fhahn Reviewed By: fhahn Pull Request: https://github.com/llvm/llvm-project/pull/102897	2024-10-18 15:51:49 -04:00
Florian Hahn	1b4a173fa4	[LV] Remove unneeded LoopScalarBody member variable. (NFC)	2024-10-17 04:15:47 +01:00
Piotr Fusik	e55869ae8a	[LV][NFC] Fix typos (#111971 )	2024-10-16 08:30:22 +02:00
Philip Reames	b3c687b4e9	[LV] Check early for supported interleave factors with scalable types [nfc] (#111592 ) Previously, the cost model was returning an invalid cost. This simply moves the check from one place to another. This is mostly to make the cost modeling code a bit easier to follow. --------- Co-authored-by: Mel Chen <mel.chen@sifive.com>	2024-10-15 07:37:46 -07:00
Florian Hahn	ec42778071	[LV] Remove unused type declaration from ILV (NFC).	2024-10-12 20:36:42 +01:00
Florian Hahn	65da32c634	[LV] Account for any-of reduction when computing costs of blend phis. Any-of reductions are narrowed to i1. Update the legacy cost model to use the correct type when computing the cost of a phi that gets lowered to selects (BLEND). This fixes a divergence between legacy and VPlan-based cost models after 36fc291b6ec6d. Fixes https://github.com/llvm/llvm-project/issues/111874.	2024-10-11 11:27:22 +01:00
David Sherwood	72f339de45	[LoopVectorize] Use predicated version of getSmallConstantMaxTripCount (#109928 ) There are a number of places where we call getSmallConstantMaxTripCount without passing a vector of predicates: getSmallBestKnownTC isIndvarOverflowCheckKnownFalse computeMaxVF isMoreProfitable I've changed all of these to now pass in a predicate vector so that we get the benefit of making better vectorisation choices when we know the max trip count for loops that require SCEV predicate checks. I've tried to add tests that cover all the cases affected by these changes.	2024-10-11 10:10:15 +01:00
Florian Hahn	bb937e276d	[LV] Compute value of escaped induction based on the computed end value. (#110576 ) Update fixupIVUsers to compute the value for escaped inductions using the already computed end value of the induction (EndValue), but subtracting the step. This results in slightly simpler codegen, as we avoid computing the full transformed index at VectorTripCount - 1. PR: https://github.com/llvm/llvm-project/pull/110576	2024-10-10 20:04:46 +01:00
Piotr Fusik	a7a4daa429	[LV][NFC] Improve readability with `bool` instead of `auto` (#111532 )	2024-10-10 12:30:18 +02:00
David Sherwood	e080be5ac2	[NFC][LoopVectorize] Clean up some code around getting a context (#111114 ) There are several places in LoopVectorize where we do more work than necessary to obtain a LLVMContext. I've tried to make the code more efficient.	2024-10-09 09:28:16 +01:00
Florian Hahn	6fbbe152fa	[VPlan] Introduce VPWidenIntrinsicRecipe to separate from libcall. (#110486 ) This patch splits off intrinsic hanlding to a new VPWidenIntrinsicRecipe. VPWidenIntrinsicRecipes only need access to the intrinsic ID to widen and the scalar result type (in case the intrinsic is overloaded on the result type). It does not need access to an underlying IR call instruction or function. This means VPWidenIntrinsicRecipe can be created easily without access to underlying IR.	2024-10-08 22:37:20 +01:00
Florian Hahn	36fc291b6e	[VPlan] Implement VPBlendRecipe::computeCost. Implement VPBlendRecipe::computeCost. VPBlendRecipe is currently is also used if only the first lane is used. This also requires pre-computing costs for forced scalars and instructions considered profitable to scalarize. For those, the cost will be computed separately in the legacy cost model. This will also be needed when implementing VPReplicateRecipe::computeCost.	2024-10-08 21:33:42 +01:00
David Sherwood	66b282014c	[LoopVectorize] Remove redundant code in emitSCEVChecks (#111132 ) There was some code in emitSCEVChecks to update the dominator tree if LoopBypassBlocks is empty, however there are no tests that fail when replacing this code with an assert. I built both SPEC2017 and the LLVM test suite and also didn't see any build failures. I've removed the code for now and added an assert to guard this in case anything changes, since it seems pointless to have code that's impossible to defend.	2024-10-07 07:58:27 +01:00
Florian Hahn	7f74651837	[VPlan] Use pointer to member 0 as VPInterleaveRecipe's pointer arg. (#106431 ) Update VPInterleaveRecipe to always use the pointer to member 0 as pointer argument. This in many cases helps to remove unneeded index adjustments and simplifies VPInterleaveRecipe::execute. In some rare cases, the address of member 0 does not dominate the insert position of the interleave group. In those cases a PtrAdd VPInstruction is emitted to compute the address of member 0 based on the address of the insert position. Alternatively we could hoist the recipe computing the address of member 0.	2024-10-06 22:53:13 +01:00
Florian Hahn	45b526afa2	[LV] Honor uniform-after-vectorization in setVectorizedCallDecision. The legacy cost model always computes the cost for uniforms as cost of VF = 1, but VPWidenCallRecipes would be created, as setVectorizedCallDecisions would not consider uniform calls. Fix setVectorizedCallDecision to set to Scalarize, if the call is uniform-after-vectorization. This fixes a bug in VPlan construction uncovered by the VPlan-based cost model. Fixes https://github.com/llvm/llvm-project/issues/111040.	2024-10-06 10:35:06 +01:00
Florian Hahn	9de327c94d	[LV] Generalize predication checks from 2c8836c899 for operands. This fixes another case where the VPlan-based and legacy cost models disagree. If any of the operands is predicated, it can't be trivially hoisted and we should consider the cost for evaluating it each loop iteration. Fixes https://github.com/llvm/llvm-project/issues/108697.	2024-10-02 20:16:41 +01:00
Florian Hahn	6d6eea92e3	[LV] Use SCEV to simplify wide binop operand to constant. The legacy cost model uses SCEV to determine if the second operand of a binary op is a constant. Update the VPlan construction logic to mirror the current legacy behavior, to fix a difference in the cost models. Fixes https://github.com/llvm/llvm-project/issues/109528. Fixes https://github.com/llvm/llvm-project/issues/110440.	2024-10-02 13:45:49 +01:00
David Sherwood	0b2403197f	[LoopVectorize] In LoopVectorize.cpp start using getSymbolicMaxBackedgeTakenCount (#108833 ) LoopVectorizationLegality currently only treats a loop as legal to vectorise if PredicatedScalarEvolution::getBackedgeTakenCount returns a valid SCEV, or more precisely that the loop must have an exact backedge taken count. Therefore, in LoopVectorize.cpp we can safely replace all calls to getBackedgeTakenCount with calls to getSymbolicMaxBackedgeTakenCount, since the result is the same. This also helps prepare the loop vectoriser for PR #88385.	2024-10-02 10:28:54 +01:00
Jeremy Morse	96f37ae453	[NFC] Use initial-stack-allocations for more data structures (#110544 ) This replaces some of the most frequent offenders of using a DenseMap that cause a malloc, where the typical element-count is small enough to fit in an initial stack allocation. Most of these are fairly obvious, one to highlight is the collectOffset method of GEP instructions: if there's a GEP, of course it's going to have at least one offset, but every time we've called collectOffset we end up calling malloc as well for the DenseMap in the MapVector.	2024-09-30 23:15:18 +01:00
Ramkumar Ramachandra	1e0d3c68c7	LV: reuse getSmallBestKnownTC in a TC estimation (NFC) (#105834 ) GeneratedRTChecks::getCost duplicates getSmallBestKnownTC partially, when attempting to get the best trip-count estimate. Since the intent of this code is to get the best trip-count estimate, and getSmallBestKnownTC is written for exactly this purpose, replace the partial code-duplication with a call to this function.	2024-09-30 15:46:02 +01:00
Mel Chen	f8373cb0f9	[LV] Reuse VPReplicateRecipe to handle scalar stores in exit block. (#106342 ) This patch separates the computation of the final reduction result and the intermediate stores of reduction. --------- Co-authored-by: Florian Hahn <flo@fhahn.com>	2024-09-30 15:35:09 +08:00
Florian Hahn	2c8836c899	[LV] Don't consider predicated insts as invariant unconditionally in CM. Predicated instructions cannot hoisted trivially, so don't treat them as uniform value in the cost model. This fixes a difference between legacy and VPlan-based cost model. Fixes https://github.com/llvm/llvm-project/issues/110295.	2024-09-29 20:31:24 +01:00
Florian Hahn	1edd22030c	[LV] Retrieve reduction resume values directly for epilogue vec. (NFC) Use the reduction resume values from the phis in the scalar header, instead of collecting them in a map. This removes some complexity from the general executePlan code paths and pushes it to only the epilogue vectorization part.	2024-09-29 10:37:58 +01:00
Florian Hahn	deda2f03f8	[LV] Remove redundant InnerLoopUnroller class (NFCI). All members of InnerLoopUnroller were removed previously and it now just forwards to InnerLoopVectorizer's constructor. Remove it.	2024-09-28 20:49:14 +01:00
Florian Hahn	a4b27e7f27	[LV] Remove noalias intrinsics handling from scalarizeInstruction (NFC). This is now handled by the explicit unroller.	2024-09-27 19:38:38 +01:00
Graham Hunter	6f1a8c2da2	[LV] Vectorize histogram operations (#99851 ) This patch implements autovectorization support for the 'all-in-one' histogram intrinsic, which seems to have more support than the 'standalone' intrinsic. See https://discourse.llvm.org/t/rfc-vectorization-support-for-histogram-count-operations/74788/ for an overview of the work and my notes on the tradeoffs between the two approaches.	2024-09-27 13:08:55 +01:00
Florian Hahn	2b125e899b	[LV] Don't pass loop preheader to getOrCreateVectorTripCount (NFCI). The vector trip count must already be created when fixupIVUsers is called. Don't pass the vector preheader there and delay retrieving the vector loop header. This ensures we are re-using the already computed trip count. Computing the trip count from scratch would not be correct, as the IR may not be in a valid state yet.	2024-09-25 20:39:05 +01:00
Florian Hahn	aae7ac6685	[VPlan] Remove VPIteration, update to use directly VPLane instead (NFC) After 8ec406757cb92 (https://github.com/llvm/llvm-project/pull/95842), only the lane part of VPIteration is used. Simplify the code by replacing remaining uses of VPIteration with VPLane directly.	2024-09-25 16:44:42 +01:00
Philip Reames	d288574363	[TTI][RISCV] Model cost of loading constants arms of selects and compares (#109824 ) This follows in the spirit of 7d82c99403f615f6236334e698720bf979959704, and extends the costing API for compares and selects to provide information about the operands passed in an analogous manner. This allows us to model the cost of materializing the vector constant, as some select-of-constants are significantly more expensive than others when you account for the cost of materializing the constants involved. This is a stepping stone towards fixing https://github.com/llvm/llvm-project/issues/109466. A separate SLP patch will be required to utilize the new API.	2024-09-25 07:25:57 -07:00
Florian Hahn	4be1c19a9f	[VPlan] Adjust AnyOf after creating ComputeReductionResult (NFC). Prepares for a follow-up change to use VPInstruction::ResumePhi to create the resume phi for reductions.	2024-09-25 14:13:50 +01:00

1 2 3 4 5 ...

2277 Commits