llvm-project

Author	SHA1	Message	Date
Florian Hahn	dd94537b40	[LV] Update call widening decision when scalarizing calls. collectInstsToScalarize may decide to scalarize a call. If so, we have to update the widening decision for the call, otherwise the call won't be scalarized as expected during VPlan construction. This issue was uncovered by f82543d509.	2024-09-03 14:12:41 +01:00
Simon Pilgrim	6c8746b6e3	[Analysis] getIntrinsicForCallSite - add vectorization support for acos/asin/atan and cosh/sinh/tanh libcalls (#106844) Followup to #106584 - ensure acos/asin/atan and cosh/sinh/tanh libcalls correctly map to the llvm intrinsic equivalents	2024-09-03 10:05:56 +01:00
Florian Hahn	954ed05c10	[VPlan] Simplify MUL operands at recipe construction. This moves the logic to create simplified operands using SCEV to MUL recipe creation. This is needed to match the behavior of the legacy's cost model. TODOs are to extend to other opcodes and move to a transform. Note that this also restricts the number of SCEV simplifications we apply to more precisely match the cases handled by the legacy cost model. Fixes https://github.com/llvm/llvm-project/issues/107015.	2024-09-02 21:25:31 +01:00
Florian Hahn	50a02e7c68	[VPlan] Pass intrinsic inst to TTI in VPWidenCallRecipe::computeCost. Follow-up to 9ccf825, adjust computeCost to also pass IntrinsicInst to TTI if available, as there are multiple places in TTI which use the IntrinsicInst. Fixes https://github.com/llvm/llvm-project/issues/107016.	2024-09-02 20:47:37 +01:00
Florian Hahn	b0de7fa466	[VPlan] Use op from underlying call in computeCost if needed. This fixes a divergence between legacy and VPlan-based cost model, e.g. if one of the operands has a first-order recurrence phi as operand.	2024-09-02 14:00:10 +01:00
Nikita Popov	f044564db1	[InstCombine] Make backedge check in op of phi transform more precise (#106075 ) The op of phi transform wants to prevent moving an operation across a backedge, as this may lead to an infinite combine loop. Currently, this is done using isPotentiallyReachable(). The problem with that is that all blocks inside a loop are reachable from each other. This means that the op of phi transform is effectively completely disabled for code inside loops, even when it's not actually operating on a loop phi (just a phi that happens to be in a loop). Fix this by explicitly computing the backedges inside the function instead. Do this via RPOT, which is a bit more efficient than using FindFunctionBackedges() (which does it without any pre-computed analyses). For irreducible cycles, there may be multiple possible choices of backedge, and this just picks one of them. This is still sufficient to prevent combine loops. This also removes the last use of LoopInfo in InstCombine -- I'll drop the analysis in a followup.	2024-09-02 09:09:21 +02:00
Florian Hahn	654bb4e9f2	[LV] Don't consider branches leaving loop in collectValuesToIgnore. Branches exiting the loop will remain regardless, so don't consider them in collectValuesToIgnore. This fixes another divergence between legacy and VPlan-based cost model. Fixes https://github.com/llvm/llvm-project/issues/106780.	2024-09-01 20:35:36 +01:00
Yingwei Zheng	380fa875ab	[InstCombine] Replace all dominated uses of condition with constants (#105510 ) This patch replaces all dominated uses of condition with true/false to improve context-sensitive optimizations. It eliminates a bunch of branches in llvm-opt-benchmark. As a side effect, it may introduce new phi nodes in some corner cases. See the following case: ``` define i1 @test(i1 %cmp, i1 %cond) { entry: br i1 %cond, label %bb1, label %bb2 bb1: br i1 %cmp, label %if.then, label %if.else if.then: br %bb2 if.else: br %bb2 bb2: %res = phi i1 [%cmp, %entry], [%cmp, %if.then], [%cmp, %if.else] ret i1 %res } ``` It will be simplified into: ``` define i1 @test(i1 %cmp, i1 %cond) { entry: br i1 %cond, label %bb1, label %bb2 bb1: br i1 %cmp, label %if.then, label %if.else if.then: br %bb2 if.else: br %bb2 bb2: %res = phi i1 [%cmp, %entry], [true, %if.then], [false, %if.else] ret i1 %res } ``` I am planning to fix this in late pipeline/CGP since this problem exists before the patch.	2024-09-01 09:49:23 +08:00
Simon Pilgrim	4d412bedcc	[LoopVectorize][X86] amdlibm-calls.ll - add missing sinh and f64 test coverage to all functions Shows failure to vectorise acos/asin/atan and cosh/sinh/tanh libcalls if they don't have a corresponding veclib mapping	2024-08-31 11:48:22 +01:00
Philip Reames	4b553f4916	Regen a bunch of vectorizer tests to avoid naming churn in upcoming review	2024-08-30 10:13:02 -07:00
Simon Pilgrim	d58d105cda	[Analysis] isTriviallyVectorizable - add vectorization support for acos/asin/atan and cosh/sinh/tanh intrinsics (#106584) Show fallback cases in amdlibm tests where it doesn't have that specific op	2024-08-30 16:49:23 +01:00
Paul Walker	ce5620ba9a	[LLVM][VPlan] Pick more optimal initial value for VPBlend. (#104019) By choosing an initial value whose mask is only used by the blend we can remove the need for the mask entirely.	2024-08-30 13:30:23 +01:00
Florian Hahn	f0e34f3818	[VPlan] Don't skip optimizable truncs in planContainsAdditionalSimps. An optimizable cast can also be removed by VPlan simplifications. Remove the restriction from planContainsAdditionalSimplifications, as this causes it to miss relevant simplifications, triggering false positives for the cost decision verification. Also adds debug output for printing additional cost-precomputations. Fixes https://github.com/llvm/llvm-project/issues/106641.	2024-08-30 11:29:30 +01:00
Florian Hahn	c4906588ce	[VPlan] Use skipCostComputation when pre-computing induction costs. This ensures we skip any instructions identified to be ignored by the legacy cost model as well. Fixes a divergence between legacy and VPlan-based cost model. Fixes https://github.com/llvm/llvm-project/issues/106417.	2024-08-29 21:20:00 +01:00
Simon Pilgrim	81acc84997	[LoopVectorize][X86] amdlibm-calls.ll - add 2/4/8/16 vector widths test checks for fallback to llvm intrinsics Check for cases where there isn't an amdlib call but it still vectorises the math call	2024-08-29 17:31:55 +01:00
Simon Pilgrim	2f95298727	[LoopVectorize][X86] amdlibm-calls.ll - add additional 2/4/8/16 vector widths test checks This should cover most amdlibm functions, but still not added every VF combo (e.g. 2f32/16f64 often vectorises to the llvm intrinsic for that vector type)	2024-08-29 14:27:31 +01:00
Simon Pilgrim	c57abc66e2	[LoopVectorize][X86] amdlibm-calls.ll - cleanup test checks for 2/4/8/16 vector widths This cleans up the existing tests and shows the gaps in the test checks (for instance we're often testing VF4 + VF16 but not VF8 even though amdlibm supports it).	2024-08-29 14:27:31 +01:00
Florian Hahn	0a272d3a17	[LV] Use SCEV to analyze second operand for cost query. Improve operand analysis using SCEV for cost purposes. This fixes a divergence between legacy and VPlan-based cost-modeling after 533e6bbd0d34. Fixes https://github.com/llvm/llvm-project/issues/106248.	2024-08-29 12:08:27 +01:00
Florian Hahn	7912abe149	[LV] Add extra tests with interleave groups and different insert pos. Add additional test coverage for interleave groups with different insert positions.	2024-08-28 19:35:31 +01:00
Florian Hahn	4b84288f00	[VPlan] Pass live-ins used as exit values straight to live-out. Live-ins that are used as exit values don't need to be extracted, they can be passed through directly. This fixes a crash when trying to extract from a live-in. Fixes https://github.com/llvm/llvm-project/issues/106257.	2024-08-28 19:12:05 +01:00
Maciej Gabka	95d2d1cba0	Move stepvector intrinsic out of experimental namespace (#98043) This patch moves the stepvector intrinsic out of the experimental namespace. This intrinsic has existed in LLVM for several years now, and is widely used.	2024-08-28 12:48:20 +01:00
Mel Chen	dfde1a7232	[LV][NFC] Update and clean up the test case LoopVectorize/RISCV/inloop-reduction.ll. (#102907)	2024-08-28 17:46:58 +08:00
Florian Hahn	d43a80936d	Revert "[LAA] Remove loop-invariant check added in 234cc40adc61." This reverts commit a80053322b765eec93951e21db490c55521da2d8. The new asserts exposed an underlying issue where the expanded bounds could wrap, causing the parts of the code to incorrectly determine that accesses do not overlap. Reproducer below based on @mstorsjo's test case. opt -passes='print<access-info>' target datalayout = "e-m:e-p:32:32-Fi8-i64:64-v128:64:128-a:0:32-n32-S64" define i32 @j(ptr %P, i32 %x, i32 %y) { entry: %gep.P.4 = getelementptr inbounds nuw i8, ptr %P, i32 4 %gep.P.8 = getelementptr inbounds nuw i8, ptr %P, i32 8 br label %loop loop: %1 = phi i32 [ %x, %entry ], [ %sel, %loop.latch ] %iv = phi i32 [ %y, %entry ], [ %iv.next, %loop.latch ] %gep.iv = getelementptr inbounds i64, ptr %gep.P.8, i32 %iv %l = load i32, ptr %gep.iv, align 4 %c.1 = icmp eq i32 %l, 3 br i1 %c.1, label %loop.latch, label %if.then if.then: ; preds = %for.body store i64 0, ptr %gep.iv, align 4 %l.2 = load i32, ptr %gep.P.4 br label %loop.latch loop.latch: %sel = phi i32 [ %l.2, %if.then ], [ %1, %loop ] %iv.next = add nsw i32 %iv, 1 %c.2 = icmp slt i32 %iv.next, %sel br i1 %c.2, label %loop, label %exit exit: %res = phi i32 [ %iv.next, %loop.latch ] ret i32 %res }	2024-08-27 11:55:47 +01:00
Florian Hahn	a80053322b	[LAA] Remove loop-invariant check added in 234cc40adc61. 234cc40adc61 introduced a loop-invariance check to limit the compile-time impact of the newly added checks. This patch removes the restriction and avoids extra compile-time impact by sinking the check to exits where we would return an unknown dependence. This notably reduces the amount the extra checks are executed while not missing out on any improvements from them. https://llvm-compile-time-tracker.com/compare.php?from=33e7cd6ff23f6c904314d17c68dc58168fd32d09&to=7c55e66d4f31ce8262b90c119a8e84e1f9515ff1&stat=instructions:u	2024-08-26 10:24:00 +01:00
Florian Hahn	533e6bbd0d	[VPlan] Simplify live-ins if they are SCEVConstant. The legacy cost model in some parts checks if any of the operands are constants via SCEV. Update VPlan construction to replace live-ins that are constants via SCEV with such constants. This means VPlans (and codegen) reflect what we are computing the cost of and removes another case where the legacy and VPlan cost model diverged. Fixes https://github.com/llvm/llvm-project/issues/105722.	2024-08-26 09:15:58 +01:00
Florian Hahn	885c4365c1	[VPlan] Skip branches marked as dead in cost precomputation. Don't consider the cost of branches marked to be skipped in VPlan cost pre-computation. Those aren't included in the legacy cost, so they should not be included in the VPlan cost.	2024-08-23 15:58:29 +01:00
Florian Hahn	cb4efe1d07	[VPlan] Don't trigger VF assertion if VPlan has extra simplifications. There are cases where VPlans contain some simplifications that are very hard to accurately account for up-front in the legacy cost model. Those cases are caused by un-simplified inputs, which trigger the assert ensuring both the legacy and VPlan-based cost model agree on the VF. To avoid false positives due to missed simplifications in general, only trigger the assert if the chosen VPlan doesn't contain any additional simplifications. Fixes https://github.com/llvm/llvm-project/issues/104714. Fixes https://github.com/llvm/llvm-project/issues/105713.	2024-08-22 21:38:06 +01:00
Florian Hahn	4e04286d61	[VPlan] Only use selectVectorizationFactor for cross-check (NFCI). (#103033) Use getBestVF to select the VF up-front and only use selectVectorizationFactor to get the legacy VF, to check that the vectorization decision matches the VPlan-based cost model. PR: https://github.com/llvm/llvm-project/pull/103033	2024-08-21 13:09:01 +02:00
Nikita Popov	a105877646	[InstCombine] Remove some of the complexity-based canonicalization (#91185 ) The idea behind this canonicalization is that it allows us to handle less patterns, because we know that some will be canonicalized away. This is indeed very useful to e.g. know that constants are always on the right. However, this is only useful if the canonicalization is actually reliable. This is the case for constants, but not for arguments: Moving these to the right makes it look like the "more complex" expression is guaranteed to be on the left, but this is not actually the case in practice. It fails as soon as you replace the argument with another instruction. The end result is that it looks like things correctly work in tests, while they actually don't. We use the "thwart complexity-based canonicalization" trick to handle this in tests, but it's often a challenge for new contributors to get this right, and based on the regressions this PR originally exposed, we clearly don't get this right in many cases. For this reason, I think that it's better to remove this complexity canonicalization. It will make it much easier to write tests for commuted cases and make sure that they are handled.	2024-08-21 12:02:54 +02:00
Florian Hahn	99741ac285	[VPlan] Introduce explicit ExtractFromEnd recipes for live-outs. (#100658 ) Introduce explicit ExtractFromEnd recipes to extract the final values for live-outs instead of implicitly extracting in VPLiveOut::fixPhi. This is a follow-up to the recent changes of modeling extracts for recurrences and consolidates live-out extract creation for fixed-order recurrences at a single place: addLiveOutsForFirstOrderRecurrences. It is also in preparation of replacing VPLiveOut with VPIRInstructions wrapping the original scalar phis. PR: https://github.com/llvm/llvm-project/pull/100658	2024-08-21 10:06:44 +02:00
Florian Hahn	b8dccb7d56	[VPlan] Emit note when UserVF > MaxUserVF (NFCI). As suggested in https://github.com/llvm/llvm-project/pull/103033, add a remark when the UserVF is ignored due to it being larger than MaxUserVF. Only changes behavior of diagnostic/debug output.	2024-08-19 12:40:20 +01:00
Florian Hahn	e9e3a183d6	[LV] Don't cost branches and conditions to empty blocks. Update the legacy cost model to skip branches with successor blocks that are empty or only contain dead instructions, together with their conditions. Such branches and conditions won't result in any generated code and will be cleaned up by VPlan transforms. This fixes a difference between the legacy and VPlan-based cost model. When running LV in its usual pipeline position, such dead blocks should already have been cleaned up, but they might be generated manually or by fuzzers. Fixes https://github.com/llvm/llvm-project/issues/100591.	2024-08-18 12:51:17 +01:00
Florian Hahn	42555cdba4	[VPlan] Run VPlan optimizations on plans in native path. Update buildVPlans (used in native path) to also run general VPlan optimizations in another small step to align both codepaths.	2024-08-15 13:05:51 +01:00
Volodymyr Vasylkun	8320b97ab9	[InstCombine] Fold an unsigned comparison of `add nsw X, C` with a constant into a signed comparison (#103480) Given an unsigned integer comparison of `add nsw X, C1` with some constant `C2`, we can fold it into a signed comparison of `X` and `C2 - C1` under the following conditions: there is an `nsw` flag on the addition; `C2` is non-negative; `X + C1` is non-negative; `C2 - C1` is non-negative.	2024-08-14 15:31:19 +01:00
Florian Hahn	3efcc8ec7d	[LV] Add test where diff checks not used when re-trying with RT checks.	2024-08-14 14:19:35 +01:00
Paul Walker	9e318bac5b	[LLVM] Regenerate some test outputs for llvm/test/Transforms/LoopVectorize.	2024-08-14 10:59:46 +00:00
Florian Hahn	31f593eb95	[LAA] Also clear DiffChecks in LAI::reset(). DiffChecks will get populated twice when re-trying with runtime checks. Without clearing it like the regular Checks vector, it will contain some duplicates and the order the checks are created may not match the order the checks have been queued when re-trying.	2024-08-14 10:19:29 +01:00
Florian Hahn	1360b9d412	[LV] Add test for diff check creation order. Add a test where diff checks are generated initially and then re-generated when re-trying with runtime checks. At the moment, the order doesn't match the order they are created in, as the DiffChecks field in LAI isn't cleared like the other fields holding runtime checks.	2024-08-14 10:17:00 +01:00
Madhur Amilkanthwar	b73771cf0f	[AArch64] Increase scatter overhead on Neoverse-V2 (#101296 ) This patch increases scatter overhead on Neoverse-V2 to 13. This benefits s128 kernel from TSVC_2 test suite. SPEC 17, RAJAPerf, and Sptter are unaffected by this patch. This patch boosts s128 kernel's performance from TSVC test suite by about 40% as this enables vectorization. Also, handle minor code refactoring for gather related part.	2024-08-14 10:12:40 +05:30
Nikita Popov	306b9c7b48	[SCEV] Handle more add/addrec mixes in computeConstantDifference() (#101999 ) computeConstantDifference() can currently look through addrecs with identical steps, and then through adds with identical operands (apart from constants). However, it fails to handle minor variations, such as two nested add recs, or an outer add with an inner addrec (rather than the other way around). This patch supports these cases by adding a loop over the simplifications, limited to a small number of iterations. The motivation is the same as in #101339, to make computeConstantDifference() powerful enough to replace existing uses of `dyn_cast<SCEVConstant>(getMinusSCEV())` with it. Though as the IR test diff shows, other callers may also benefit.	2024-08-13 11:01:39 +02:00
Florian Hahn	2ab910c08c	[LV] Check pointer users are in loop when checking for uniform pointers. Widening decisions are not set for users outside the loop. Avoid crashing by only calling isVectorizedMemAccessUse for users in the loop. Fixes https://github.com/llvm/llvm-project/issues/102934.	2024-08-13 09:23:44 +01:00
Florian Hahn	c7a44ec031	[VPlan] Check successors in VPlan to check if scalar epi required (NFC) Now that the branches to the scalar epilogue are modeled in VPlan directly, check the VPlan to see if a scalar epilogue is required. Preparation for https://github.com/llvm/llvm-project/pull/100658.	2024-08-12 15:33:52 +01:00
Florian Hahn	cd08fadd03	[LV] Include chains feeding inductions in cost precomputation. Include chain of ops feeding inductions in cost precomputation for inductions, not just the induction increment. In VPlan, those instructions will be cleaned up, as both phi and increment are generated by VPWidenIntOrFpInductionRecipe independently. Fixes https://github.com/llvm/llvm-project/issues/101337.	2024-08-12 14:45:43 +01:00
Florian Hahn	db0603cb7b	[LV] Only OR unique edges when creating block-in masks. This removes redundant ORs of matching masks. Follow-up to f0df4fbd0c7b to reduce the number of redundant ORs for masks.	2024-08-12 10:17:40 +01:00
Florian Hahn	55d7e59023	[VPlan] Replace hard-coded value number in test with pattern. Make test more robust w.r.t. future changes.	2024-08-12 09:36:12 +01:00
Florian Hahn	5a42a677aa	[VPlan] Mark VPVectorPointer as only using the first part of the ptr. VPVectorPointerRecipe only uses the first part of the pointer operand, so mark it accordingly. Follow-up suggested as part of https://github.com/llvm/llvm-project/pull/99808.	2024-08-12 08:46:55 +01:00
Farzon Lotfi	efc6b50d2d	[LoopVectorize][X86][AMDLibm] Add Missing AMD LibM trig vector intrinsics (#101125 ) Adding the following linked to their docs: - [amd_vrs16_acosf](`9c0b67293b/scripts/libalm.def (L221)`) - [amd_vrd2_cosh](`9c0b67293b/scripts/libalm.def (L124)`) - [amd_vrs16_tanhf](`9c0b67293b/scripts/libalm.def (L224)`)	2024-08-11 22:11:09 -04:00
Florian Hahn	60680f7181	[LV] Handle SwitchInst in ::isPredicatedInst. After f0df4fbd0c7b, isPredicatedInst needs to handle SwitchInst as well. Handle it the same as BranchInst. This fixes a crash in the newly added test and improves the results for one of the existing tests in predicate-switch.ll Should fix https://lab.llvm.org/buildbot/#/builders/113/builds/2099.	2024-08-11 20:56:58 +01:00
Florian Hahn	f0df4fbd0c	[LV] Support generating masks for switch terminators. (#99808) Update createEdgeMask to create masks where the terminator in Src is a switch. We need to handle 2 separate cases: 1. Dst is not the default destination. Dst is reached if any of the cases with destination == Dst are taken. Join the conditions for each case where destination == Dst using a logical OR. 2. Dst is the default destination. Dst is reached if none of the cases with destination != Dst are taken. Join the conditions for each case where the destination is != Dst using a logical OR and negate it. Edge masks are created for every destination of cases and/or default when requesting a mask where the source is a switch. Fixes https://github.com/llvm/llvm-project/issues/48188. PR: https://github.com/llvm/llvm-project/pull/99808	2024-08-11 20:38:36 +02:00
Florian Hahn	5286656609	[LV] Regenerate check lines in preparation for #99808 . Regenerate check lines for test to avoid unrelated changes in https://github.com/llvm/llvm-project/pull/99808.	2024-08-11 15:06:05 +01:00
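
Note on f0df4fbd0c ([LV] Support generating masks for switch terminators): the two cases described in that commit message can be modelled per vector lane as in the sketch below. This is an illustrative Python model only, not the LLVM C++ implementation. The function name `switch_edge_masks` and its arguments are hypothetical, and any mask inherited from the source block is deliberately left out.

```python
def switch_edge_masks(cond_lanes, cases, default_dst):
    """Model per-lane edge masks for a switch terminator (hypothetical helper).

    cond_lanes:  switch condition value for each vector lane
    cases:       list of (case_value, destination) pairs
    default_dst: destination taken when no case value matches
    Returns {destination: [bool per lane]}.
    """
    destinations = {dst for _, dst in cases} | {default_dst}
    masks = {}
    for dst in destinations:
        if dst != default_dst:
            # Case 1: dst is reached if any case targeting dst matches
            # (logical OR over those case conditions).
            masks[dst] = [
                any(lane == value for value, d in cases if d == dst)
                for lane in cond_lanes
            ]
        else:
            # Case 2: the default destination is reached if no case that
            # targets a different block matches (negated logical OR).
            masks[dst] = [
                not any(lane == value for value, d in cases if d != dst)
                for lane in cond_lanes
            ]
    return masks


# Four lanes; cases 0 and 1 go to "a", case 2 goes to "b", default is "d".
masks = switch_edge_masks([0, 1, 2, 7], [(0, "a"), (1, "a"), (2, "b")], "d")
# masks["a"] == [True, True, False, False]
# masks["d"] == [False, False, False, True]
```

In this model every lane ends up with exactly one destination mask set, which is the property the two cases are meant to guarantee.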

2615 Commits
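
Note on 8320b97ab9 ([InstCombine] Fold an unsigned comparison of `add nsw X, C` with a constant into a signed comparison): the four conditions listed in that commit message can be checked by brute force on a small bit width. The sketch below is an independent Python model, not code from LLVM. The 6-bit width, the helper names, and the choice of the `ult`/`slt` predicate pair are assumptions made to keep the exhaustive check fast.

```python
BITS = 6  # assumed small width so the exhaustive check stays fast
LO, HI = -(1 << (BITS - 1)), 1 << (BITS - 1)  # signed range is [LO, HI)


def to_unsigned(v):
    """Reinterpret an integer as an unsigned BITS-wide value."""
    return v & ((1 << BITS) - 1)


def to_signed(v):
    """Wrap an integer to BITS bits and reinterpret it as signed."""
    v = to_unsigned(v)
    return v - (1 << BITS) if v >= HI else v


def fold_applies(x, c1, c2):
    """The four conditions from the commit message, for concrete values."""
    total = x + c1                 # exact (unwrapped) sum
    has_nsw = LO <= total < HI     # the add does not overflow as signed
    return has_nsw and c2 >= 0 and total >= 0 and to_signed(c2 - c1) >= 0


def count_mismatches():
    """Compare `icmp ult (add nsw X, C1), C2` with `icmp slt X, C2 - C1`."""
    checked = mismatches = 0
    for x in range(LO, HI):
        for c1 in range(LO, HI):
            for c2 in range(LO, HI):
                if not fold_applies(x, c1, c2):
                    continue
                checked += 1
                original = to_unsigned(x + c1) < to_unsigned(c2)
                folded = x < to_signed(c2 - c1)
                mismatches += original != folded
    return checked, mismatches


checked, mismatches = count_mismatches()
print(checked > 0, mismatches)
```

The check passes because, under those conditions, `X + C1`, `C2` and `C2 - C1` all have the same value whether read as signed or unsigned, so the comparison reduces to ordinary integer arithmetic.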