llvm-project

Author	SHA1	Message	Date
Shih-Po Hung	ffcff2f465	[VPlan][NFC] Fix the value name of VECTOR_GEP (#107544 ) This patch passes the string `"vector.gep"` to CreateGEP instead of CreateMul.	2024-09-18 19:22:36 +08:00
LiqinWeng	a2994b2999	[LV][NFC] Unify printing for WidenEVLRecipe with other EVL recipes (#108177 )	2024-09-18 15:03:37 +08:00
Florian Hahn	3c5c61a414	[LV] Add first order rec test where hoisting can improve over sinking.	2024-09-17 09:25:39 +01:00
Florian Hahn	c48a1ebec1	[LV] Remove force-vector-width/force-vector-interleave from X86 test. Update target-specific test to not force VF/UF, but instead use the cost-model. There are similar tests already outside X86 and those force VF & UF. With this change, the target specific test checks the cost model. Changes in picked VF/UF are limited to test_pr62954_scalar_epilogue_required, and should preserve the original spirit of the test.	2024-09-17 08:59:24 +01:00
Luke Lau	30d7dcc1db	[RISCV] Add asserts requirement to loop vectorizer tests Hopefully this fixes a buildbot failure on fuchsia where opt doesn't have -debug-only	2024-09-17 14:18:36 +08:00
Luke Lau	41f1b467a2	[RISCV] Account for zvfhmin and zvfbfmin promotion in register usage (#108370 ) A half with only zvfhmin or bfloat will end up getting promoted to an f32 for most instructions. Unless the loop consists only of memory ops and permutation instructions which don't need promotion (is this common?), we'll end up using double the LMUL of what's currently being returned by getRegUsageForType. Since this is used by the loop vectorizer, it seems better to be conservative and assume that any usage of a zvfhmin half/bfloat will end up being widened to an f32	2024-09-17 13:50:19 +08:00
Florian Hahn	6749f2bbfe	[LV] Add pointer induction test variant with inbounds, remove TODO. The function doesn't crash any more with inbounds, add a variant with inbounds.	2024-09-15 21:48:18 +01:00
Florian Hahn	012dbec604	[VPlan] Handle ForceTargetInstructionCost during precomputeCosts. Make sure ForceTargetInstructionCost is respected in precomputeCosts.	2024-09-15 10:53:43 +01:00
Florian Hahn	f0c5caa814	[VPlan] Add VPIRInstruction, use for exit block live-outs. (#100735 ) Add a new VPIRInstruction recipe to wrap existing IR instructions not to be modified during execution, except for PHIs. For PHIs, a single VPValue operand is allowed, and it is used to add a new incoming value for the single predecessor VPBB. Except for PHIs, VPIRInstructions cannot have any operands. Depends on https://github.com/llvm/llvm-project/pull/100658. PR: https://github.com/llvm/llvm-project/pull/100735	2024-09-14 21:21:55 +01:00
Florian Hahn	08d294df55	[VPlan] Simplify VPBuilder insert point when adding users in exit block. Simplifies setting the insert point, addressing a TODO.	2024-09-12 22:47:03 +01:00
Florian Hahn	ed41497498	[LAA] Also reset CanUseDiffCheck in RTPointerChecking::reset(). RuntimePointerChecking::reset() is used to reset its state between subsequent analysis invocations. Also reset CanUseDiffCheck to its default (true). Otherwise it might have been set to false during a previous analysis invocation, which unnecessarily pessimizes the subsequent analysis invocations with a pruned set of dependences. This is in line with the other fields being reset.	2024-09-12 09:31:59 +01:00
Florian Hahn	ea83e1c05a	[LV] Assign cost to all interleave members when not interleaving. At the moment, the full cost of all interleave group members is assigned to the instruction at the group's insert position, even if the decision was to not form an interleave group. This can lead to inaccurate cost estimates, e.g. if the instruction at the insert position is dead. If the decision is to not vectorize but scalarize or scatter/gather, then the cost will be the total cost for all members. In those cases, assign the cost individually per member, to more closely reflect the choice per instruction. This fixes a divergence between legacy and VPlan-based cost model. Fixes https://github.com/llvm/llvm-project/issues/108098.	2024-09-11 21:04:34 +01:00
Hari Limaye	7858e14547	[LV] Amend check for IV increments in collectUsersInEntryBlock (#108020 ) The check for IV increments in collectUsersInEntryBlock currently triggers for exit-block PHIs which use the IV start value, resulting in us failing to add the input value for the middle block to these PHIs. Fix this by amending the check for IV increments to only include incoming values that are instructions inside the loop. Fixes #108004	2024-09-11 16:43:34 +01:00
Florian Hahn	e3c537ff90	[VPlan] Consider non-header phis in planContainsAdditionalSimp. Update planContainsAdditionalSimplifications to also check phis not in the loop header. This ensures we don't miss cases where VPBlendRecipes (which correspond to such phis) have been simplified. Fixes https://github.com/llvm/llvm-project/issues/107473.	2024-09-10 21:37:14 +01:00
Florian Hahn	a794ee4559	[VPlan] Add VPValue for VF, use it for VPWidenIntOrFpInductionRecipe. (#95305 ) Similar to VFxUF, also add a VF VPValue to VPlan and use it to get the runtime VF in VPWidenIntOrFpInductionRecipe. Code for VF is only generated if there are users of VF, to avoid unnecessary test changes. PR: https://github.com/llvm/llvm-project/pull/95305	2024-09-10 10:41:35 +01:00
Florian Hahn	34034381b7	[VPlan] Consistently use VTC for vector trip count in vplan-printing.ll. The inconsistency surfaced in https://github.com/llvm/llvm-project/pull/95305. Split off to reduce the diff.	2024-09-09 21:36:28 +01:00
Florian Hahn	aa158bf402	[LV] Update tests to replace some code with loop varying instructions. Update some tests with loop-invariant instructions, where hoisting them out of the loop changes the vectorization decision. This should preserve their original spirit when making further improvements.	2024-09-09 14:10:12 +01:00
Kolya Panchenko	00e40c9b5b	[LV] Support binary and unary operations with EVL-vectorization (#93854 ) The patch adds `VPWidenEVLRecipe` which represents `VPWidenRecipe` + EVL argument. The new recipe replaces `VPWidenRecipe` in `tryAddExplicitVectorLength` for each binary and unary operations. Follow up patches will extend support for remaining cases, like `FCmp` and `ICmp`	2024-09-06 11:41:36 -04:00
ErikHogeman	78e1e6ace6	[LV] Check for vector-to-scalar casts in legalizer (#106244 ) The code makes assumptions later on the operations and their inputs being scalar in the loops that are processed, so we should make sure this is the case in the legalizer.	2024-09-06 11:20:14 +02:00
Florian Hahn	cf2ecc7c1c	[LV] Remove over-aggressive assert from 3fe6a064f15c. There are some cases where only the first operand is marked for truncation. In that case, the compare won't be truncated which would incorrectly trigger the assertion. It also shows that the check pre 3fe6a064f15c also considered compares truncated that cannot be truncated.	2024-09-05 18:20:16 +01:00
Florian Hahn	3fe6a064f1	[LV] Check if compare is truncated directly in getInstructionCost. The current check for truncated compares in getInstructionCost misses cases where either the first or both operands are constants. Check directly if the compare is marked for truncation. In that case, the minimum bitwidth is that of the operands. The patch also adds asserts to ensure that. This fixes a divergence between legacy and VPlan-based cost model, where the legacy cost model incorrectly estimated the cost of compares with truncated operands. Fixes https://github.com/llvm/llvm-project/issues/107171.	2024-09-04 20:50:06 +01:00
Madhur Amilkanthwar	cd46829e54	[LV] Fix emission of debug message in legality check (#101924 ) The successful-vectorization message is emitted even after "Result" is false. "Result" = false indicates failure of one of the legality checks, and thus the success message should not be printed.	2024-09-04 16:28:39 +05:30
Florian Hahn	3bd161e98d	[LV] Honor forced scalars in setVectorizedCallDecision. Similarly to dd94537b4, setVectorizedCallDecision also did not consider ForcedScalars. This led to VPlans not reflecting the decision by the legacy cost model (cost computation would use scalar cost, VPlan would have VPWidenCallRecipe). To fix this, check if the call has been forced to scalar in setVectorizedCallDecision. Note that this requires moving setVectorizedCallDecision after collectLoopUniforms (which sets ForcedScalars). collectLoopUniforms does not depend on call decisions and can safely be moved. Fixes https://github.com/llvm/llvm-project/issues/107051.	2024-09-03 21:06:32 +01:00
Philip Reames	1fbb6b4efc	[LV] Prefer FLT_MIN/MAX for fmin/fmax reductions with ninf (#107141 ) Analogous to 2c7786e94a1058bd4f96794a1d4f70dcb86e5cc5, cleanup a case where the vectorizer is emitting a non-canonical identity value given the available flags. We use largest/smallest value during ISEL, and VP expansion, but not during vectorization. Since the fmin/fmax/fminimum/fmaximum intrinsics don't require a start value, this difference is only visible when masking of inactive lanes is required. Primary motivation of this change is simply to remove a difference between versions of code which reason about the identity value of a reduction so I can kill all but one off. In review, it was pointed out that this is actually a functional fix as well. The old code used inf on a noinf reduction instruction - whose result is poison! That wasn't the intent of the code.	2024-09-03 12:21:54 -07:00
Philip Reames	2c7786e94a	Prefer use of 0.0 over -0.0 for fadd reductions w/nsz (in IR) (#106770 ) This is a follow up to 924907bc6, and is mostly motivated by consistency but does include one additional optimization. In general, we prefer 0.0 over -0.0 as the identity value for an fadd. We use that value in several places, but don't in others. So, let's be consistent and use the same identity (when nsz allows) everywhere. This creates a bunch of test churn, but due to 924907bc6, most of that churn doesn't actually indicate a change in codegen. The exception is that this change enables the use of 0.0 for nsz, but not reassoc, fadd reductions. Or said differently, it allows the neutral value of an ordered fadd reduction to be 0.0.	2024-09-03 09:16:37 -07:00
Florian Hahn	dd94537b40	[LV] Update call widening decision when scalarizing calls. collectInstsToScalarize may decide to scalarize a call. If so, we have to update the widening decision for the call, otherwise the call won't be scalarized as expected during VPlan construction. This issue was uncovered by f82543d509.	2024-09-03 14:12:41 +01:00
Simon Pilgrim	6c8746b6e3	[Analysis] getIntrinsicForCallSite - add vectorization support for acos/asin/atan and cosh/sinh/tanh libcalls (#106844 ) Followup to #106584 - ensure acos/asin/atan and cosh/sinh/tanh libcalls correctly map to the llvm intrinsic equivalents	2024-09-03 10:05:56 +01:00
Florian Hahn	954ed05c10	[VPlan] Simplify MUL operands at recipe construction. This moves the logic to create simplified operands using SCEV to MUL recipe creation. This is needed to match the behavior of the legacy's cost model. TODOs are to extend to other opcodes and move to a transform. Note that this also restricts the number of SCEV simplifications we apply to more precisely match the cases handled by the legacy cost model. Fixes https://github.com/llvm/llvm-project/issues/107015.	2024-09-02 21:25:31 +01:00
Florian Hahn	50a02e7c68	[VPlan] Pass intrinsic inst to TTI in VPWidenCallRecipe::computeCost. Follow-up to 9ccf825, adjust computeCost to also pass IntrinsicInst to TTI if available, as there are multiple places in TTI which use the IntrinsicInst. Fixes https://github.com/llvm/llvm-project/issues/107016.	2024-09-02 20:47:37 +01:00
Florian Hahn	b0de7fa466	[VPlan] Use op from underlying call in computeCost if needed. This fixes a divergence between legacy and VPlan-based cost model, e.g. if one of the operands has a first-order recurrence phi as operand.	2024-09-02 14:00:10 +01:00
Nikita Popov	f044564db1	[InstCombine] Make backedge check in op of phi transform more precise (#106075 ) The op of phi transform wants to prevent moving an operation across a backedge, as this may lead to an infinite combine loop. Currently, this is done using isPotentiallyReachable(). The problem with that is that all blocks inside a loop are reachable from each other. This means that the op of phi transform is effectively completely disabled for code inside loops, even when it's not actually operating on a loop phi (just a phi that happens to be in a loop). Fix this by explicitly computing the backedges inside the function instead. Do this via RPOT, which is a bit more efficient than using FindFunctionBackedges() (which does it without any pre-computed analyses). For irreducible cycles, there may be multiple possible choices of backedge, and this just picks one of them. This is still sufficient to prevent combine loops. This also removes the last use of LoopInfo in InstCombine -- I'll drop the analysis in a followup.	2024-09-02 09:09:21 +02:00
Florian Hahn	654bb4e9f2	[LV] Don't consider branches leaving loop in collectValuesToIgnore. Branches exiting the loop will remain regardless, so don't consider them in collectValuesToIgnore. This fixes another divergence between legacy and VPlan-based cost model. Fixes https://github.com/llvm/llvm-project/issues/106780.	2024-09-01 20:35:36 +01:00
Yingwei Zheng	380fa875ab	[InstCombine] Replace all dominated uses of condition with constants (#105510 ) This patch replaces all dominated uses of condition with true/false to improve context-sensitive optimizations. It eliminates a bunch of branches in llvm-opt-benchmark. As a side effect, it may introduce new phi nodes in some corner cases. See the following case: ``` define i1 @test(i1 %cmp, i1 %cond) { entry: br i1 %cond, label %bb1, label %bb2 bb1: br i1 %cmp, label %if.then, label %if.else if.then: br %bb2 if.else: br %bb2 bb2: %res = phi i1 [%cmp, %entry], [%cmp, %if.then], [%cmp, %if.else] ret i1 %res } ``` It will be simplified into: ``` define i1 @test(i1 %cmp, i1 %cond) { entry: br i1 %cond, label %bb1, label %bb2 bb1: br i1 %cmp, label %if.then, label %if.else if.then: br %bb2 if.else: br %bb2 bb2: %res = phi i1 [%cmp, %entry], [true, %if.then], [false, %if.else] ret i1 %res } ``` I am planning to fix this in late pipeline/CGP since this problem exists before the patch.	2024-09-01 09:49:23 +08:00
Simon Pilgrim	4d412bedcc	[LoopVectorize][X86] amdlibm-calls.ll - add missing sinh and f64 test coverage to all functions Shows failure to vectorise acos/asin/atan and cosh/sinh/tanh libcalls if they don't have a corresponding veclib mapping	2024-08-31 11:48:22 +01:00
Philip Reames	4b553f4916	Regen a bunch of vectorizer tests to avoid naming churn in upcoming review	2024-08-30 10:13:02 -07:00
Simon Pilgrim	d58d105cda	[Analysis] isTriviallyVectorizable - add vectorization support for acos/asin/atan and cosh/sinh/tanh intrinsics (#106584 ) Show fallback cases in amdlibm tests where it doesn't have that specific op	2024-08-30 16:49:23 +01:00
Paul Walker	ce5620ba9a	[LLVM][VPlan] Pick more optimal initial value for VPBlend. (#104019 ) By choosing an initial value whose mask is only used by the blend we can remove the need for the mask entirely.	2024-08-30 13:30:23 +01:00
Florian Hahn	f0e34f3818	[VPlan] Don't skip optimizable truncs in planContainsAdditionalSimps. An optimizable cast can also be removed by VPlan simplifications. Remove the restriction from planContainsAdditionalSimplifications, as this causes it to miss relevant simplifications, triggering false positives for the cost decision verification. Also adds debug output for printing additional cost-precomputations. Fixes https://github.com/llvm/llvm-project/issues/106641.	2024-08-30 11:29:30 +01:00
Florian Hahn	c4906588ce	[VPlan] Use skipCostComputation when pre-computing induction costs. This ensures we skip any instructions identified to be ignored by the legacy cost model as well. Fixes a divergence between legacy and VPlan-based cost model. Fixes https://github.com/llvm/llvm-project/issues/106417.	2024-08-29 21:20:00 +01:00
Simon Pilgrim	81acc84997	[LoopVectorize][X86] amdlibm-calls.ll - add 2/4/8/16 vector widths test checks for fallback to llvm intrinsics Check for cases where there isn't an amdlib call but it still vectorises the math call	2024-08-29 17:31:55 +01:00
Simon Pilgrim	2f95298727	[LoopVectorize][X86] amdlibm-calls.ll - add additional 2/4/8/16 vector widths test checks This should cover most amdlibm functions, but still not added every VF combo (e.g. 2f32/16f64 often vectorises to the llvm intrinsic for that vector type)	2024-08-29 14:27:31 +01:00
Simon Pilgrim	c57abc66e2	[LoopVectorize][X86] amdlibm-calls.ll - cleanup test checks for 2/4/8/16 vector widths This cleans up the existing tests and shows the gaps in the test checks (for instance we're often testing VF4 + VF16 but not VF8 even though amdlibm supports it).	2024-08-29 14:27:31 +01:00
Florian Hahn	0a272d3a17	[LV] Use SCEV to analyze second operand for cost query. Improve operand analysis using SCEV for cost purposes. This fixes a divergence between legacy and VPlan-based cost-modeling after 533e6bbd0d34. Fixes https://github.com/llvm/llvm-project/issues/106248.	2024-08-29 12:08:27 +01:00
Florian Hahn	7912abe149	[LV] Add extra tests with interleave groups and different insert pos. Add additional test coverage for interleave groups with different insert positions.	2024-08-28 19:35:31 +01:00
Florian Hahn	4b84288f00	[VPlan] Pass live-ins used as exit values straight to live-out. Live-ins that are used as exit values don't need to be extracted, they can be passed through directly. This fixes a crash when trying to extract from a live-in. Fixes https://github.com/llvm/llvm-project/issues/106257.	2024-08-28 19:12:05 +01:00
Maciej Gabka	95d2d1cba0	Move stepvector intrinsic out of experimental namespace (#98043 ) This patch moves the stepvector intrinsic out of the experimental namespace. This intrinsic has existed in LLVM for several years now, and is widely used.	2024-08-28 12:48:20 +01:00
Mel Chen	dfde1a7232	[LV][NFC] Update and clean up the test case LoopVectorize/RISCV/inloop-reduction.ll. (#102907 )	2024-08-28 17:46:58 +08:00
Florian Hahn	d43a80936d	Revert "[LAA] Remove loop-invariant check added in 234cc40adc61." This reverts commit a80053322b765eec93951e21db490c55521da2d8. The new asserts exposed an underlying issue where the expanded bounds could wrap, causing the parts of the code to incorrectly determine that accesses do not overlap. Reproducer below based on @mstorsjo's test case. opt -passes='print<access-info>' target datalayout = "e-m:e-p:32:32-Fi8-i64:64-v128:64:128-a:0:32-n32-S64" define i32 @j(ptr %P, i32 %x, i32 %y) { entry: %gep.P.4 = getelementptr inbounds nuw i8, ptr %P, i32 4 %gep.P.8 = getelementptr inbounds nuw i8, ptr %P, i32 8 br label %loop loop: %1 = phi i32 [ %x, %entry ], [ %sel, %loop.latch ] %iv = phi i32 [ %y, %entry ], [ %iv.next, %loop.latch ] %gep.iv = getelementptr inbounds i64, ptr %gep.P.8, i32 %iv %l = load i32, ptr %gep.iv, align 4 %c.1 = icmp eq i32 %l, 3 br i1 %c.1, label %loop.latch, label %if.then if.then: ; preds = %for.body store i64 0, ptr %gep.iv, align 4 %l.2 = load i32, ptr %gep.P.4 br label %loop.latch loop.latch: %sel = phi i32 [ %l.2, %if.then ], [ %1, %loop ] %iv.next = add nsw i32 %iv, 1 %c.2 = icmp slt i32 %iv.next, %sel br i1 %c.2, label %loop, label %exit exit: %res = phi i32 [ %iv.next, %loop.latch ] ret i32 %res }	2024-08-27 11:55:47 +01:00
Florian Hahn	a80053322b	[LAA] Remove loop-invariant check added in 234cc40adc61. 234cc40adc61 introduced a loop-invariance check to limit the compile-time impact of the newly added checks. This patch removes the restriction and avoids extra compile-time impact by sinking the check to exits where we would return an unknown dependence. This notably reduces the number of times the extra checks are executed while not missing out on any improvements from them. https://llvm-compile-time-tracker.com/compare.php?from=33e7cd6ff23f6c904314d17c68dc58168fd32d09&to=7c55e66d4f31ce8262b90c119a8e84e1f9515ff1&stat=instructions:u	2024-08-26 10:24:00 +01:00
Florian Hahn	533e6bbd0d	[VPlan] Simplify live-ins if they are SCEVConstant. The legacy cost model in some parts checks if any of the operands are constants via SCEV. Update VPlan construction to replace live-ins that are constants via SCEV with such constants. This means VPlans (and codegen) reflect what we are computing the cost of and removes another case where the legacy and VPlan cost model diverged. Fixes https://github.com/llvm/llvm-project/issues/105722.	2024-08-26 09:15:58 +01:00
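
Note on 2c7786e94a (fadd identity): the entry turns on which value is a neutral element for floating-point addition. The following is a minimal sketch in plain Python, not LLVM code, and the helper names `sign_bit` and `reduce_fadd` are hypothetical. It illustrates why -0.0 is the identity in general, while 0.0 is only safe as a start value when the sign of a zero result may be ignored (which is what nsz permits).

```python
import math

def sign_bit(x):
    """Return True if x carries a negative sign, including -0.0."""
    return math.copysign(1.0, x) < 0.0

def reduce_fadd(start, values):
    """Ordered (left-to-right) floating-point add reduction from `start`."""
    acc = start
    for v in values:
        acc = acc + v
    return acc

# Hypothetical input: every element is -0.0, the one case where the
# choice of start value is observable.
all_neg_zero = [-0.0, -0.0]

strict = reduce_fadd(-0.0, all_neg_zero)   # -0.0 start preserves the sign
relaxed = reduce_fadd(0.0, all_neg_zero)   # 0.0 start yields +0.0

print(sign_bit(strict))    # True
print(sign_bit(relaxed))   # False
print(strict == relaxed)   # True: the two results differ only in sign of zero
```

For any input containing a non-zero element or a +0.0, both start values give the same result, so the difference is confined to the sign of a zero.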


2640 Commits