llvm-project

Author	SHA1	Message	Date
David Green	3f8160e9ce	[LV][ARM][AArch64] Add multi-exit Loop Vectorizer tests. NFC These are useful to test with the various predication schemes available on different targets.	2023-03-27 10:21:24 +01:00
David Sherwood	1c4fedfa35	[LoopVectorize] Don't tail-fold for scalable VFs when there is no scalar tail Currently in LoopVectorize we avoid tail-folding if we can prove the trip count is always a multiple of the maximum fixed-width VF. This works because we know the vectoriser only ever chooses a VF that is a power of 2. However, if we are also considering scalable VFs then we conservatively bail out of the optimisation because we don't know the value of vscale, which could be an odd or prime number, etc. This patch tries to enable the same optimisation for scalable VFs by asking if vscale is known to be a power of 2. If so, we can then query the maximum value of vscale and use the same logic as we do for fixed-width VFs. I've also added a new TTI hook called isVScaleKnownToBeAPowerOfTwo that does the same thing as the existing TargetLowering hook. Differential Revision: https://reviews.llvm.org/D146199	2023-03-27 08:34:30 +00:00
David Sherwood	bd0c281fcd	[NFC][LoopVectorize] Change trip counts for some tests to guarantee a scalar tail Quite a few vectoriser tests were using a trip count of 1024, which meant: 1. For fixed-length VFs we would never actually tail-fold, e.g. see Transforms/LoopVectorize/RISCV/uniform-load-store.ll. This is because we can prove at compile-time there will never be a scalar tail. 2. As of D146199 the same optimisation mentioned above will also apply to scalable VFs too. I've changed all such trip counts to be 1025 instead. Differential Revision: https://reviews.llvm.org/D146219	2023-03-24 09:43:50 +00:00
Luke Lau	8d16c6809a	[RISCV] Increase default vectorizer LMUL to 2 After some discussion and experimentation, we have seen that changing the default number of vector register bits to LMUL=2 strikes a sweet spot. Whilst we could be clever here and make the vectorizer smarter about dynamically selecting an LMUL that a) doesn't affect register pressure and b) is suitable for the microarchitecture, we would need to teach its heuristics about RISC-V register grouping specifics. Instead this just does the easy, pragmatic thing by changing the default to a safe value that doesn't affect register pressure significantly[1], but should increase throughput and unlock more interleaving. [1] Register spilling when compiling sqlite at various levels of `-riscv-v-register-bit-width-lmul`: LMUL=1 2573 spills, LMUL=2 2583 spills, LMUL=4 2819 spills, LMUL=8 3256 spills. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D143723	2023-03-23 10:33:50 +00:00
Luke Lau	06f16232b1	[RISCV][NFC] Make interleaved access test more vectorizable The previous test case stored the result of a deinterleaved load and add into the same source address, which resulted in some scatters which we weren't testing for and made the tests harder to understand. Store it at a separate address, which will make the tests easier to read when the cost model is changed after D145085 is landed Reviewed By: reames Differential Revision: https://reviews.llvm.org/D146442	2023-03-22 16:02:44 +00:00
Anna Thomas	4277d932ef	[LV] Use speculatability within entire loop to avoid strided load predication Use existing functionality for identifying total access size by strided loads. If we can speculate the load across all vector iterations, we can avoid predication for these strided loads (or masked gathers in architectures which support it). Differential Revision: https://reviews.llvm.org/D145616	2023-03-21 12:08:25 -04:00
Florian Hahn	962c306a11	[LV] Don't consider pointer as uniform if it is also stored. Update isVectorizedMemAccessUse to also check if the pointer is stored. This prevents LV from incorrectly considering a pointer as uniform if it is both used as the pointer operand and stored by the same StoreInst. Fixes #61396.	2023-03-17 16:26:16 +00:00
Florian Hahn	a4bb037418	[LV] Add test where pointer is incorrectly marked as uniform. Test for #61396.	2023-03-17 14:24:13 +00:00
Florian Hahn	565b98e793	[LV] Convert consecutive-ptr-uniforms.ll to use opaque pointers (NFC).	2023-03-17 14:07:11 +00:00
Douglas Yung	3e16488769	Mark test added in D145155 as requiring asserts since it uses the "-debug-only" option. This should fix the test failure in Release builds.	2023-03-16 15:03:19 -07:00
Luke Lau	b9238abe05	[RISCV] Enable interleaved access vectorization The loop vectorizer supports generating interleaved loads and stores via shuffle patterns for fixed length vectors. This enables it for RISC-V, since interleaved shuffle patterns can be lowered to vlseg/vsseg in https://reviews.llvm.org/D145022 Reviewed By: reames Differential Revision: https://reviews.llvm.org/D145155	2023-03-16 15:48:55 +00:00
Luke Lau	fc220a1aa9	Revert "[RISCV] Enable interleaved access vectorization" This reverts commit acc03ad10af4f379a644e3956cb9aca54e40696c.	2023-03-15 22:00:48 +00:00
Luke Lau	acc03ad10a	[RISCV] Enable interleaved access vectorization The loop vectorizer supports generating interleaved loads and stores via shuffle patterns for fixed length vectors. This enables it for RISC-V, since interleaved shuffle patterns can be lowered to vlseg/vsseg in https://reviews.llvm.org/D145022 Reviewed By: reames Differential Revision: https://reviews.llvm.org/D145155	2023-03-15 21:56:30 +00:00
David Green	98481bc723	[LV][VPlan] Fix printing TripCount liveins. NFC The TripCount liveins would currently be printed as badref in the vplan as they are not allocated slots in the VPSlotTracker. This patch allocates them a slot and adds them to the printed Live-Ins. It also makes a minor adjustment to printing of Live-ins to reduce the empty lines when multiple Live-ins are present. Differential Revision: https://reviews.llvm.org/D145507	2023-03-13 19:44:12 +00:00
Sanjay Patel	ef6f23535d	Revert "[InstCombine] use loop info when running the pass after loop vectorization" This reverts commit 43ae4b62b2671cf73e691c0b53324cd39405cd51. This was intended to be practically NFC in terms of the overall opt pipeline, but there is experimental data showing that code changes occurred here: https://llvm-compile-time-tracker.com/compare.php?from=772aa05452f8ff90a47168e6801cda2acb5a1873&to=43ae4b62b2671cf73e691c0b53324cd39405cd51&stat=size-text	2023-03-11 17:28:56 -05:00
Sanjay Patel	43ae4b62b2	[InstCombine] use loop info when running the pass after loop vectorization This is the follow-up to D144199 and suggestion from D144045. We make use of loop info explicit via InstCombine pass parameter rather than semi-arbitrary via caching. The only InstCombine transform that uses LoopInfo currently is a GEP fold in visitGEPOfGEP(), so that shows up as a failure in the dedicated test for the fold as well as several LoopVectorizer tests that run extra passes. I don't see any pass manager regression tests that actually check for pass options, but this is intended to be NFC for the pass pipeline behavior - we only try to use loop info where it would have been used before via caching. Differential Revision: https://reviews.llvm.org/D144274	2023-03-11 14:20:30 -05:00
Arthur Eubanks	7c3c981442	[Passes] Remove some legacy passes DFAJumpThreading JumpThreading LibCallsShrink LoopVectorize SLPVectorizer DeadStoreElimination AggressiveDCE CorrelatedValuePropagation IndVarSimplify These are part of the optimization pipeline, of which the legacy version is deprecated and being removed.	2023-03-10 17:17:00 -08:00
Luke Lau	d7323f6a7a	[RISCV][NFC] Add tests for interleaved accesses in loop vectorizer Precommit test for D145155 Reviewed By: reames Differential Revision: https://reviews.llvm.org/D145697	2023-03-09 17:43:16 +00:00
Anna Thomas	ac4c0ea73b	[Tests] Precommit tests for D145616	2023-03-08 17:30:53 -05:00
Florian Hahn	79272ec028	[VPlan] Add predicate to VPReplicateRecipe, expand region later. This patch adds the predicate as additional operand to VPReplicateRecipe during initial construction. The predicated recipes are later moved into replicate regions. This simplifies constructions and some VPlan transformations, like fixed-order recurrence handling. It also improves codegen in some cases (e.g. for in-loop reductions), because the recipes remain in the same block. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D143865	2023-03-08 20:11:28 +01:00
Florian Hahn	7019624ee1	[SCEV] Strengthen nowrap flags via ranges for ARs on construction. At the moment, proveNoWrapViaConstantRanges is only used when creating SCEV[Zero,Sign]ExtendExprs. We can get significant improvements by strengthening flags after creating the AddRec. I'll also share a follow-up patch that removes the code to strengthen flags when creating SCEV[Zero,Sign]ExtendExprs. Modifying AddRecs while creating those can lead to surprising changes. Compile-time looks neutral: https://llvm-compile-time-tracker.com/compare.php?from=94676cf8a13c511a9acfc24ed53c98964a87bde3&to=aced434e8b103109104882776824c4136c90030d&stat=instructions:u Reviewed By: mkazantsev, nikic Differential Revision: https://reviews.llvm.org/D144050	2023-03-07 17:10:34 +01:00
Mel Chen	d5c404d1b9	[RISCV] Enable ordered reduction. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D144455	2023-03-07 04:36:50 -08:00
Graham Hunter	92e0ab937f	[AArch64] Don't map llvm sqrt intrinsics to veclib functions Since AArch64 has sqrt instructions, we want to use those instead of calls to vector math routines for llvm sqrt intrinsics (since those don't imply some of the constraints that libm calls might have) so we just remove the mappings. Code originally written by mgabka Reviewed By: danielkiss, paulwalker-arm Differential Revision: https://reviews.llvm.org/D145392	2023-03-07 11:43:41 +00:00
Graham Hunter	a180344589	[LV] Allow scalarization of function calls when masking is required This patch adds support for scalarizing calls to a function when there is a vector variant that cannot be used, either because there isn't a masked variant or because the cost model indicated a VF without a masked variant was better. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D134422	2023-03-03 15:26:04 +00:00
Mel Chen	9d0703a646	[RISCV] Pre-commit test case for ordered reduction, NFC Reviewed By: reames Differential Revision: https://reviews.llvm.org/D144458	2023-03-01 06:27:43 -08:00
Sander de Smalen	c41b41eb11	[LoopVectorize] Use overflow-check analysis to improve tail-folding. This work follows on from D142109 and addresses a possible regression when we know the loop iteration counter cannot overflow. When we know the overflow-check always evaluates to false, it's better to use the other style of tail folding where it assumes a runtime check was added, because that avoids having to calculate a modified trip-count. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D142894	2023-03-01 14:17:58 +00:00
Sander de Smalen	fe1b51ffee	[LoopVectorize] Remove runtime check and scalar tail loop when tail-folding. When using tail-folding and using the predicate for both data and control-flow (the next vector iteration's predicate is generated with the llvm.active.lane.mask intrinsic and then tested for the backedge), the LoopVectorizer still inserts a runtime check to see if the 'i + VF' may at any point overflow for the given trip-count. When it does, it falls back to a scalar epilogue loop. We can get rid of that runtime check in the pre-header and therefore also remove the scalar epilogue loop. This reduces code-size and avoids a runtime check. Consider the following loop: void foo(char * __restrict__ dst, char *src, unsigned long N) { for (unsigned long i=0; i<N; ++i) dst[i] = src[i] + 42; } If 'N' is e.g. ULONG_MAX, and the VF > 1, then the loop iteration counter will overflow when calculating the predicate for the next vector iteration at some point, because LLVM does: vector.ph: %active.lane.mask.entry = tail call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 0, i64 %N) vector.body: %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ] %active.lane.mask = phi <vscale x 16 x i1> [ %active.lane.mask.entry, %vector.ph ], [ %active.lane.mask.next, %vector.body ] ... %index.next = add i64 %index, 16 ; The add above may overflow, which would affect the lane mask and control flow. Hence a runtime check is needed. %active.lane.mask.next = tail call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 %index.next, i64 %N) %8 = extractelement <vscale x 16 x i1> %active.lane.mask.next, i64 0 br i1 %8, label %vector.body, label %for.cond.cleanup, !llvm.loop !7 The solution: What we can do instead is calculate the predicate before incrementing the loop iteration counter, such that the llvm.active.lane.mask is calculated from 'i' to 'tripcount > VF ? tripcount - VF : 0', i.e. 
vector.ph: %active.lane.mask.entry = tail call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 0, i64 %N) %N_minus_VF = select %N > 16 ? %N - 16 : 0 vector.body: %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ] %active.lane.mask = phi <vscale x 16 x i1> [ %active.lane.mask.entry, %vector.ph ], [ %active.lane.mask.next, %vector.body ] ... %active.lane.mask.next = tail call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 %index, i64 %N_minus_VF) %index.next = add i64 %index, %4 ; The add above may still overflow, but this time the active.lane.mask is not affected %8 = extractelement <vscale x 16 x i1> %active.lane.mask.next, i64 0 br i1 %8, label %vector.body, label %for.cond.cleanup, !llvm.loop !7 For N = 20, we'd then get: vector.ph: %active.lane.mask.entry = tail call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 0, i64 %N) ; %active.lane.mask.entry = <1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1> %N_minus_VF = select 20 > 16 ? 20 - 16 : 0 ; %N_minus_VF = 4 vector.body: (1st iteration) ... ; using <1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1> as predicate in the loop ... %active.lane.mask.next = tail call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 0, i64 4) ; %active.lane.mask.next = <1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0> %index.next = add i64 0, 16 ; %index.next = 16 %8 = extractelement <vscale x 16 x i1> %active.lane.mask.next, i64 0 ; %8 = 1 br i1 %8, label %vector.body, label %for.cond.cleanup, !llvm.loop !7 ; branch to %vector.body vector.body: (2nd iteration) ... ; using <1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0> as predicate in the loop ... 
%active.lane.mask.next = tail call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 16, i64 4) ; %active.lane.mask.next = <0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0> %index.next = add i64 16, 16 ; %index.next = 32 %8 = extractelement <vscale x 16 x i1> %active.lane.mask.next, i64 0 ; %8 = 0 br i1 %8, label %vector.body, label %for.cond.cleanup, !llvm.loop !7 ; branch to %for.cond.cleanup Reviewed By: fhahn, david-arm Differential Revision: https://reviews.llvm.org/D142109	2023-03-01 09:01:19 +00:00
Sander de Smalen	de111ae70a	NFC: Use generate_test_checks script for LV tests which seem to have been auto-generated.	2023-03-01 09:01:19 +00:00
sgokhale	ac67ec3a54	[LV] Reland testcase in 0ec4cae	2023-02-28 18:14:18 +05:30
sgokhale	0ec4cae146	[LV] Modify test case for commit 4f9a544 Was observing test failure. Relanding the test	2023-02-28 18:07:36 +05:30
sgokhale	4f9a5447c6	[LV] Reland "Update logic for calculating register usage due to invariants" Previously, while calculating register usage due to invariants, it was assumed that an invariant would always be part of widening instructions. This resulted in calculating vector register types for vectors which can't be legalized (check the newly added test for more details). An invariant might not always need a vector register. For example, an invariant might just be used for the iteration check. This patch checks if the invariant is part of any widening instruction and considers register usage accordingly. Fixes issue 60493 Differential Revision: https://reviews.llvm.org/D143422	2023-02-28 17:32:39 +05:30
sgokhale	3c8ddbde37	Revert "[LV] Update logic for calculating register usage due to invariants" Observing test failure for llvm/test/Transforms/LoopVectorize/AArch64/reg-usage.ll This reverts commit d1628266946fdddb44bdad2b3ccf3cd5fc769f42.	2023-02-28 15:46:59 +05:30
sgokhale	d162826694	[LV] Update logic for calculating register usage due to invariants Previously, while calculating register usage due to invariants, it was assumed that an invariant would always be part of widening instructions. This resulted in calculating vector register types for vectors which can't be legalized (check the newly added test for more details). An invariant might not always need a vector register. For example, an invariant might just be used for the iteration check. This patch checks if the invariant is part of any widening instruction and considers register usage accordingly. Fixes issue 60493 Differential Revision: https://reviews.llvm.org/D143422	2023-02-28 11:05:26 +05:30
Luke Lau	15f9cf164c	[LV][RISCV] Don't interleave scalable vector loops It's less clear with scalable vectors than fixed length vectors that interleaving exposes more ILP, as scalable vectors can be thought of as a sort of hardware form of interleaving, especially with larger LMULs. This also addresses the unexpected additional unrolling that occurs when using larger LMULs in the loop vectorizer. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D144485	2023-02-22 10:15:11 +00:00
Luke Lau	0b336e9ef0	[RISCV][NFC] Add test for different LMULs in vectorizer This is a test for an upcoming patch that proposes to change the default LMUL used by the loop vectorizer from 1 to 2 Reviewed By: reames Differential Revision: https://reviews.llvm.org/D143722	2023-02-20 22:40:40 +00:00
Mel Chen	3e84fc857f	[LV] Harden the test of the minmax with index pattern. (NFC) - Add test config: -force-vector-width=4 -force-vector-interleave=1 - New test case: The test case both returns the minimum value and the index. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D143905	2023-02-20 03:16:28 -08:00
Hongtao Yu	c38c8d6743	[PseudoProbe] Refactoring a test As titled. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D144137	2023-02-15 14:07:51 -08:00
Graham Hunter	0fa5df1959	[LV] Synthesize all true masks for masked vector function variants When vectorizing code with function calls in it, if we encounter a function which only has vectorized variants requiring a mask we can synthesize an all-true mask to enable us to proceed. Since we want the mask to be represented in vplan, the pointer to the chosen Function is now stored as part of the VPWidenCallRecipe, and mask arguments are added at the appropriate index to the recipe operands. Reviewed By: david-arm, fhahn, reames Differential Revision: https://reviews.llvm.org/D132458	2023-02-14 14:33:18 +00:00
Florian Hahn	af3c25dc3d	[VPlan] Fix iterator invalidation in adjustFixedOrderRecurrences. adjustFixedOrderRecurrences may insert instructions immediately after the PHI nodes in the block. This invalidates the phis() iterator. To avoid crashing/accessing invalid recipes, first collect all first-order recurrence phi recipes. This should fix a crash reported by @dmgreen after D142589 landed.	2023-02-13 13:51:14 +00:00
Sander de Smalen	5a115452c4	Reland D143267: [LoopVectorize] Use DataLayout::getIndexType instead of i32 for non-constant GEP indices. Fixed issue where 'ConstantInt::get(IndexTy, -Part)' was executed with the wrong type for Part, e.g. IndexTy was i64, but Part was 'unsigned', which led to things like 'mul i64 .., 4294967292', which was obviously wrong. Also changed sve-vector-reverse.ll to be vectorized with UF>1 to test this. This reverts commit 1f01cdda68614dba12af3cc3aff38541d0abcc6b.	2023-02-09 09:42:29 +00:00
Florian Hahn	c83fdc905a	[LV] Perform recurrence sinking directly on VPlan. This patch updates LV to sink recipes directly using the VPlan use chains. The initial patch only moves sinking to be purely VPlan-based. Follow-up patches will move legality checks to VPlan as well. At the moment, there's a single test failure remaining. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D142589	2023-02-08 15:49:29 +00:00
Sander de Smalen	1f01cdda68	Revert "[LoopVectorize] Use DataLayout::getIndexType instead of i32 for non-constant GEP indices." This patch causes a regression, so reverting it while I investigate the issue. This reverts commit e6eb84a191ca2a1afd5789c5bb398da68bb6065e.	2023-02-08 15:46:52 +00:00
Fangrui Song	af12879146	[RISCV] Allow mismatched SmallDataLimit and use Min for conflicting values Fix an issue about module linking with LTO. When compiling with PIE, the small data limitation needs to be consistent with that in PIC, otherwise there will be linking errors due to conflicting values. bar.c ``` int bar() { return 1; } ``` foo.c ``` int foo() { return 1; } ``` ``` clang --target=riscv64-unknown-linux-gnu -flto -c foo.c -o foo.o -fPIE clang --target=riscv64-unknown-linux-gnu -flto -c bar.c -o bar.o -fPIC clang --target=riscv64-unknown-linux-gnu -flto foo.o bar.o -flto -nostdlib -v -fuse-ld=lld ``` ``` ld.lld: error: linking module flags 'SmallDataLimit': IDs have conflicting values in 'bar.o' and 'ld-temp.o' clang-15: error: linker command failed with exit code 1 (use -v to see invocation) ``` Use Min instead of Error for conflicting SmallDataLimit. Authored by: @joshua-arch1 Signed-off-by: xiaojing.zhang <xiaojing.zhang@xcalibyte.com> Signed-off-by: jianxin.lai <jianxin.lai@xcalibyte.com> Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D131230	2023-02-07 17:13:21 -08:00
Florian Hahn	a69f23493e	[LV] Remove unused load from RISCV test (NFC). The test contained an unused load that appears unrelated to the test (store of vector of i1). Remove it to avoid test changes in a follow-up change which will lead to dead loads being removed.	2023-02-07 21:55:44 +00:00
Sander de Smalen	e6eb84a191	[LoopVectorize] Use DataLayout::getIndexType instead of i32 for non-constant GEP indices. This is specifically relevant for loops that vectorize using a scalable VF, where the code results in: %vscale = call i32 llvm.vscale.i32() %vf.part1 = mul i32 %vscale, 4 %gep = getelementptr ..., i32 %vf.part1 Which InstCombine then changes into: %vscale = call i32 llvm.vscale.i32() %vf.part1 = mul i32 %vscale, 4 %vf.part1.zext = sext i32 %vf.part1 to i64 %gep = getelementptr ..., i32 %vf.part1.zext D143016 tried to remove these extends, but that only works when the call to llvm.vscale.i32() has a single use. After doing any kind of CSE on these calls the combine no longer kicks in. It seems more sensible to ask DataLayout what type to use, rather than relying on InstCombine to insert the extend and hoping it can fold it away. I've only changed this for indices that are not constant, because I vaguely remember there was a reason for sticking with i32. It would also mean patching up loads more tests. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D143267	2023-02-07 11:47:51 +00:00
Florian Hahn	40621ff4b8	[LV] Also check interleaving only in select-min-index.ll The new combination exposed a crash in earlier versions of D132063.	2023-02-06 11:30:14 +00:00
Florian Hahn	a9ac22b501	[LV] Add users for loads to make tests more robust. Update a few tests to add users to loads to avoid them being optimized out by future changes. In cases where the unused loads didn't matter for the test, remove them.	2023-02-04 20:42:57 +00:00
Florian Hahn	fd5bccb8b1	[LV] Add initial tests for sinking loads past other instructions. Extend test coverage for sinking loads that use fixed order recurrences.	2023-02-04 18:18:33 +00:00
Craig Topper	df76ff98e8	[InstCombine][LV] Fold (add (zext (add X, -1)), 1) -> (zext X) if X is non-zero. This artifact can appear from the vectorizer. (add X, -1) is the backedge taken count. It gets zero extended and then 1 is added to it to get the trip count. There is usually a dominating branch that rules out X being zero. Alive: https://alive2.llvm.org/ce/z/NsRDwX	2023-01-30 17:45:01 -08:00
Matt Devereau	8ff47f6032	[LoopVectorize] Enable integer Mul and Add as select reduction patterns This patch vectorizes Phi node loop reductions for selects whose condition comes from a floating-point comparison, with their operands being integers for Add, Sub, and Mul reductions. Example: int foo(float *x, int n) { int sum = 0; for (int i=0; i<n; ++i) { float elem = x[i]; if (elem > 0) { sum += 2; } } return sum; } This would previously fail to vectorize due to the integer reduction.	2023-01-30 09:41:40 +00:00
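
The tail-folding scheme described in the D142109 entry above (commit fe1b51ffee) can be checked with a small simulation. The sketch below is only an illustrative model, not LLVM code: the names `active_lane_mask`, `tail_folded_iterations` and the `bits` parameter are hypothetical, it uses a fixed VF rather than a scalable one, and it assumes the lane-mask semantics "lane i is active iff base + i < n, compared without overflow". It computes the next predicate before the index is incremented, against `n - vf` clamped at 0, as the commit message describes.

```python
def active_lane_mask(base, n, vf):
    # Model of the lane mask: lane i is active iff base + i < n.
    # Python integers do not wrap, so the comparison cannot overflow.
    return [base + lane < n for lane in range(vf)]


def tail_folded_iterations(n, vf, bits=64):
    """Return the predicate used by each vector iteration (hypothetical helper)."""
    wrap = 1 << bits                       # models the width of the index register
    mask = active_lane_mask(0, n, vf)      # entry mask, computed in the preheader
    n_minus_vf = n - vf if n > vf else 0   # 'tripcount > VF ? tripcount - VF : 0'
    index = 0
    used = []
    while True:
        used.append(mask)                  # the loop body runs under this predicate
        # The next predicate is computed BEFORE the index is incremented.
        mask = active_lane_mask(index, n_minus_vf, vf)
        index = (index + vf) % wrap        # may wrap, but the mask is already computed
        if not mask[0]:                    # lane 0 inactive: leave the loop
            break
    return used


if __name__ == "__main__":
    for predicate in tail_folded_iterations(20, 16):
        print([int(active) for active in predicate])
```

For N = 20 and VF = 16 this gives two iterations, the first with all 16 lanes active and the second with 4, which agrees with the walk-through in the commit message. Passing a small `bits` value (for example `bits=8` with `n=255`) exercises the case where the index increment wraps while the number of processed elements still equals `n`.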

2388 Commits