llvm-project

Author	SHA1	Message	Date
David Sherwood	af2ed59f2b	[NFC][LoopVectorize] Add zext/sext cost tests when there is type shrinkage Differential Revision: https://reviews.llvm.org/D147151	2023-04-03 13:12:11 +00:00
Florian Hahn	0d61ffd350	[Loads] Support SCEVAddExpr as start for pointer AddRec. Extend handling to support `%base + offset` as start for AddRecs in isDereferenceableAndAlignedInLoop. This is done by adjusting AccessSize by the offset and effectively checking if the full object starting from %base to %base + offset + access-size is dereferenceable. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D147260	2023-04-02 12:33:44 +01:00
Florian Hahn	fdaebbeff4	[LV] Improve test added in 74dee4791a2. Adjust test so it triggers a case missed in the original version of D147260.	2023-03-31 21:50:39 +01:00
Florian Hahn	74dee4791a	[LV] Add test with predicated load where EltSize % Align != 0.	2023-03-31 21:15:24 +01:00
Philip Reames	78ae870f11	[tests] Rerun autogen to reduce a diff [nfc]	2023-03-31 12:47:08 -07:00
Philip Reames	a512ce5e12	[LV] Add tests for non-constant stride pointer inductions Reduced from the case which triggered the revert of 498aa534f472, and then generalized to cover both expansion paths.	2023-03-31 09:10:59 -07:00
David Green	965a090f02	Revert "[IVDescriptors] Add pointer InductionDescriptors with non-constant strides" Multiple errors have been reported on https://reviews.llvm.org/rG498aa534f472d28db893aa9a8627d0b46e17f312 Reverting until the correctness issues can be resolved. We are also seeing a lot of performance differences from the patch. Some are looking good, but some are looking pretty bad.	2023-03-31 11:08:50 +01:00
Florian Hahn	b060ca7042	[LV] Regenerate check lines for test to reduce diff in follow-up patch.	2023-03-30 20:17:12 +01:00
Philip Reames	498aa534f4	[IVDescriptors] Add pointer InductionDescriptors with non-constant strides This matches the handling for integer IVs. I left the non-opaque cases alone, mostly because they're largely irrelevant today. This doesn't actually make much difference in vectorization right now as we immediately fail on aliasing checks (which also bail on non-constant strides). Slightly surprisingly, it's the cases which do need runtime checks which work after this patch as they don't use the same dependency analysis path. This will also enable non-constant stride pointer recurrences for other consumers. I've audited said code, and don't see any obvious issues.	2023-03-30 11:56:00 -07:00
Philip Reames	1c5bb25d62	[RISCV][LV] Add test coverage for strided access patterns [nfc]	2023-03-30 10:02:27 -07:00
Florian Hahn	4173ed1382	[LV] Add test cases for global struct dereferenceability. Currently LLVM fails to determine that conditional loads in @accesses_to_struct_dereferenceable are dereferenceable unconditionally.	2023-03-29 17:47:41 +01:00
Paul Osmialowski	6b6f312cce	[TLI][AArch64] Extend SLEEF vectorized functions mapping with VLA functions This commit extends D134719 "[AArch64] Enable libm vectorized functions via SLEEF" with the mappings for the scalable functions. It also introduces all the necessary changes needed to support masked interfaces. Reviewed By: danielkiss, sdesmalen Differential Revision: https://reviews.llvm.org/D146839	2023-03-29 13:07:09 +01:00
Paul Osmialowski	f8f1909d36	Revert "[TLI][AArch64] Extend SLEEF vectorized functions mapping with VLA functions" Reverting it so I could land it with Arcanist. This reverts commit 59dcf927ee43e995374907b6846b657f68d7ea49.	2023-03-29 12:54:22 +01:00
Paul Osmialowski	59dcf927ee	[TLI][AArch64] Extend SLEEF vectorized functions mapping with VLA functions This commit extends D134719 "[AArch64] Enable libm vectorized functions via SLEEF" with the mappings for the scalable functions. It also introduces all the necessary changes needed to support masked interfaces. Signed-off-by: Paul Osmialowski <pawel.osmialowski@arm.com>	2023-03-29 11:07:35 +01:00
Graham Hunter	fba2a7c695	[LV][AArch64] Precommit interleaved access tests Precommit for D145163	2023-03-29 10:26:14 +01:00
Philip Reames	64f69e453e	[RISCV] Cost model for general case of single vector permute The cost model was not accounting for the fact that we can generate vrgather + an index expression. Two cases to call out. 1) I did not model the difference between vrgather and vrgatherei16. The result is the constant pool cost can be slightly understated on RV32. I don't think we care, but if someone disagrees, this would be easy to add. 2) Our current codegen for i8 vectors longer than 256 (which is the limit of what this costs) has some room for improvement. Differential Revision: https://reviews.llvm.org/D147000	2023-03-28 07:34:11 -07:00
David Sherwood	636efd2e35	[SVE][LoopVectorize] Add option to disable tail-folding for reverse loops If we use tail-folding for reverse loops that contain loads and stores then we will need to reverse the loop predicate. This patch adds a new 'reverse' sve-tail-folding option and ensures they are not considered 'simple'. I did this by adding a function called containsDecreasingPointers to AArch64TargetTransformInfo.cpp that searches all instructions in the loop for loads or stores with negative strides. Differential Revision: https://reviews.llvm.org/D146128	2023-03-27 14:10:15 +00:00
David Green	3f8160e9ce	[LV][ARM][AArch64] Add multi-exit Loop Vectorizer tests. NFC These are useful to test with the various predication schemes available on different targets.	2023-03-27 10:21:24 +01:00
David Sherwood	1c4fedfa35	[LoopVectorize] Don't tail-fold for scalable VFs when there is no scalar tail Currently in LoopVectorize we avoid tail-folding if we can prove the trip count is always a multiple of the maximum fixed-width VF. This works because we know the vectoriser only ever chooses a VF that is a power of 2. However, if we are also considering scalable VFs then we conservatively bail out of the optimisation because we don't know the value of vscale, which could be an odd or prime number, etc. This patch tries to enable the same optimisation for scalable VFs by asking if vscale is known to be a power of 2. If so, we can then query the maximum value of vscale and use the same logic as we do for fixed-width VFs. I've also added a new TTI hook called isVScaleKnownToBeAPowerOfTwo that does the same thing as the existing TargetLowering hook. Differential Revision: https://reviews.llvm.org/D146199	2023-03-27 08:34:30 +00:00
David Sherwood	bd0c281fcd	[NFC][LoopVectorize] Change trip counts for some tests to guarantee a scalar tail Quite a few vectoriser tests were using a trip count of 1024, which meant: 1. For fixed-length VFs we would never actually tail-fold, e.g. see Transforms/LoopVectorize/RISCV/uniform-load-store.ll. This is because we can prove at compile-time there will never be a scalar tail. 2. As of D146199 the same optimisation mentioned above will also apply to scalable VFs too. I've changed all such trip counts to be 1025 instead. Differential Revision: https://reviews.llvm.org/D146219	2023-03-24 09:43:50 +00:00
Luke Lau	8d16c6809a	[RISCV] Increase default vectorizer LMUL to 2 After some discussion and experimentation, we have seen that changing the default number of vector register bits to LMUL=2 strikes a sweet spot. Whilst we could be clever here and make the vectorizer smarter about dynamically selecting an LMUL that a) Doesn't affect register pressure b) Suitable for the microarchitecture we would need to teach its heuristics about RISC-V register grouping specifics. Instead this just does the easy, pragmatic thing by changing the default to a safe value that doesn't affect register pressure significantly[1], but should increase throughput and unlock more interleaving. [1] Register spilling when compiling sqlite at various levels of `-riscv-v-register-bit-width-lmul`: LMUL=1 2573 spills LMUL=2 2583 spills LMUL=4 2819 spills LMUL=8 3256 spills Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D143723	2023-03-23 10:33:50 +00:00
Luke Lau	06f16232b1	[RISCV][NFC] Make interleaved access test more vectorizable The previous test case stored the result of a deinterleaved load and add into the same source address, which resulted in some scatters which we weren't testing for and made the tests harder to understand. Store it at a separate address, which will make the tests easier to read when the cost model is changed after D145085 is landed Reviewed By: reames Differential Revision: https://reviews.llvm.org/D146442	2023-03-22 16:02:44 +00:00
Anna Thomas	4277d932ef	[LV] Use speculatability within entire loop to avoid strided load predication Use existing functionality for identifying total access size by strided loads. If we can speculate the load across all vector iterations, we can avoid predication for these strided loads (or masked gathers in architectures which support it). Differential Revision: https://reviews.llvm.org/D145616	2023-03-21 12:08:25 -04:00
Florian Hahn	962c306a11	[LV] Don't consider pointer as uniform if it is also stored. Update isVectorizedMemAccessUse to also check if the pointer is stored. This prevents LV from incorrectly considering a pointer as uniform if it is used as both pointer and stored by the same StoreInst. Fixes #61396.	2023-03-17 16:26:16 +00:00
Florian Hahn	a4bb037418	[LV] Add test where pointer is incorrectly marked as uniform. Test for #61396.	2023-03-17 14:24:13 +00:00
Florian Hahn	565b98e793	[LV] Convert consecutive-ptr-uniforms.ll to use opaque pointers (NFC).	2023-03-17 14:07:11 +00:00
Douglas Yung	3e16488769	Mark test added in D145155 as requiring asserts since it uses the "-debug-only" option. This should fix the test failure in Release builds.	2023-03-16 15:03:19 -07:00
Luke Lau	b9238abe05	[RISCV] Enable interleaved access vectorization The loop vectorizer supports generating interleaved loads and stores via shuffle patterns for fixed length vectors. This enables it for RISC-V, since interleaved shuffle patterns can be lowered to vlseg/vsseg in https://reviews.llvm.org/D145022 Reviewed By: reames Differential Revision: https://reviews.llvm.org/D145155	2023-03-16 15:48:55 +00:00
Luke Lau	fc220a1aa9	Revert "[RISCV] Enable interleaved access vectorization" This reverts commit acc03ad10af4f379a644e3956cb9aca54e40696c.	2023-03-15 22:00:48 +00:00
Luke Lau	acc03ad10a	[RISCV] Enable interleaved access vectorization The loop vectorizer supports generating interleaved loads and stores via shuffle patterns for fixed length vectors. This enables it for RISC-V, since interleaved shuffle patterns can be lowered to vlseg/vsseg in https://reviews.llvm.org/D145022 Reviewed By: reames Differential Revision: https://reviews.llvm.org/D145155	2023-03-15 21:56:30 +00:00
David Green	98481bc723	[LV][VPlan] Fix printing TripCount liveins. NFC The TripCount liveins would currently be printed as badref in the vplan as they are not allocated slots in the VPSlotTracker. This patch allocates them a slot and adds them to the printed Live-Ins. It also makes a minor adjustment to printing of Live-ins to reduce the empty lines when multiple Live-ins are present. Differential Revision: https://reviews.llvm.org/D145507	2023-03-13 19:44:12 +00:00
Sanjay Patel	ef6f23535d	Revert "[InstCombine] use loop info when running the pass after loop vectorization" This reverts commit 43ae4b62b2671cf73e691c0b53324cd39405cd51. This was intended to be practically NFC in terms of the overall opt pipeline, but there is experimental data showing that code changes occurred here: https://llvm-compile-time-tracker.com/compare.php?from=772aa05452f8ff90a47168e6801cda2acb5a1873&to=43ae4b62b2671cf73e691c0b53324cd39405cd51&stat=size-text	2023-03-11 17:28:56 -05:00
Sanjay Patel	43ae4b62b2	[InstCombine] use loop info when running the pass after loop vectorization This is the follow-up to D144199 and suggestion from D144045. We make use of loop info explicit via InstCombine pass parameter rather than semi-arbitrary via caching. The only InstCombine transform that uses LoopInfo currently is a GEP fold in visitGEPOfGEP(), so that shows up as a failure in the dedicated test for the fold as well as several LoopVectorizer tests that run extra passes. I don't see any pass manager regression tests that actually check for pass options, but this is intended to be NFC for the pass pipeline behavior - we only try to use loop info where it would have been used before via caching. Differential Revision: https://reviews.llvm.org/D144274	2023-03-11 14:20:30 -05:00
Arthur Eubanks	7c3c981442	[Passes] Remove some legacy passes DFAJumpThreading JumpThreading LibCallsShrink LoopVectorize SLPVectorizer DeadStoreElimination AggressiveDCE CorrelatedValuePropagation IndVarSimplify These are part of the optimization pipeline, of which the legacy version is deprecated and being removed.	2023-03-10 17:17:00 -08:00
Luke Lau	d7323f6a7a	[RISCV][NFC] Add tests for interleaved accesses in loop vectorizer Precommit test for D145155 Reviewed By: reames Differential Revision: https://reviews.llvm.org/D145697	2023-03-09 17:43:16 +00:00
Anna Thomas	ac4c0ea73b	[Tests] Precommit tests for D145616	2023-03-08 17:30:53 -05:00
Florian Hahn	79272ec028	[VPlan] Add predicate to VPReplicateRecipe, expand region later. This patch adds the predicate as additional operand to VPReplicateRecipe during initial construction. The predicated recipes are later moved into replicate regions. This simplifies constructions and some VPlan transformations, like fixed-order recurrence handling. It also improves codegen in some cases (e.g. for in-loop reductions), because the recipes remain in the same block. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D143865	2023-03-08 20:11:28 +01:00
Florian Hahn	7019624ee1	[SCEV] Strengthen nowrap flags via ranges for ARs on construction. At the moment, proveNoWrapViaConstantRanges is only used when creating SCEV[Zero,Sign]ExtendExprs. We can get significant improvements by strengthening flags after creating the AddRec. I'll also share a follow-up patch that removes the code to strengthen flags when creating SCEV[Zero,Sign]ExtendExprs. Modifying AddRecs while creating those can lead to surprising changes. Compile-time looks neutral: https://llvm-compile-time-tracker.com/compare.php?from=94676cf8a13c511a9acfc24ed53c98964a87bde3&to=aced434e8b103109104882776824c4136c90030d&stat=instructions:u Reviewed By: mkazantsev, nikic Differential Revision: https://reviews.llvm.org/D144050	2023-03-07 17:10:34 +01:00
Mel Chen	d5c404d1b9	[RISCV] Enable ordered reduction. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D144455	2023-03-07 04:36:50 -08:00
Graham Hunter	92e0ab937f	[AArch64] Don't map llvm sqrt intrinsics to veclib functions Since AArch64 has sqrt instructions, we want to use those instead of calls to vector math routines for llvm sqrt intrinsics (since those don't imply some of the constraints that libm calls might have) so we just remove the mappings. Code originally written by mgabka Reviewed By: danielkiss, paulwalker-arm Differential Revision: https://reviews.llvm.org/D145392	2023-03-07 11:43:41 +00:00
Graham Hunter	a180344589	[LV] Allow scalarization of function calls when masking is required This patch adds support for scalarizing calls to a function when there is a vector variant that cannot be used, either because there isn't a masked variant or because the cost model indicated a VF without a masked variant was better. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D134422	2023-03-03 15:26:04 +00:00
Mel Chen	9d0703a646	[RISCV] Pre-commit test case for ordered reduction, NFC Reviewed By: reames Differential Revision: https://reviews.llvm.org/D144458	2023-03-01 06:27:43 -08:00
Sander de Smalen	c41b41eb11	[LoopVectorize] Use overflow-check analysis to improve tail-folding. This work follows on from D142109 and addresses a possible regression when we know the loop iteration counter cannot overflow. When we know the overflow-check always evaluates to false, it's better to use the other style of tail folding where it assumes a runtime check was added, because that avoids having to calculate a modified trip-count. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D142894	2023-03-01 14:17:58 +00:00
Sander de Smalen	fe1b51ffee	[LoopVectorize] Remove runtime check and scalar tail loop when tail-folding.
When using tail-folding and using the predicate for both data and control-flow (the next vector iteration's predicate is generated with the llvm.active.lane.mask intrinsic and then tested for the backedge), the LoopVectorizer still inserts a runtime check to see if the 'i + VF' may at any point overflow for the given trip-count. When it does, it falls back to a scalar epilogue loop.
We can get rid of that runtime check in the pre-header and therefore also remove the scalar epilogue loop. This reduces code-size and avoids a runtime check.
Consider the following loop:
  void foo(char * __restrict__ dst, char *src, unsigned long N) {
    for (unsigned long i=0; i<N; ++i)
      dst[i] = src[i] + 42;
  }
If 'N' is e.g. ULONG_MAX, and the VF > 1, then the loop iteration counter will overflow when calculating the predicate for the next vector iteration at some point, because LLVM does:
  vector.ph:
    %active.lane.mask.entry = tail call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 0, i64 %N)
  vector.body:
    %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
    %active.lane.mask = phi <vscale x 16 x i1> [ %active.lane.mask.entry, %vector.ph ], [ %active.lane.mask.next, %vector.body ]
    ...
    %index.next = add i64 %index, 16
      ; The add above may overflow, which would affect the lane mask and control flow. Hence a runtime check is needed.
    %active.lane.mask.next = tail call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 %index.next, i64 %N)
    %8 = extractelement <vscale x 16 x i1> %active.lane.mask.next, i64 0
    br i1 %8, label %vector.body, label %for.cond.cleanup, !llvm.loop !7
The solution: What we can do instead is calculate the predicate before incrementing the loop iteration counter, such that the llvm.active.lane.mask is calculated from 'i' to 'tripcount > VF ? tripcount - VF : 0', i.e.
  vector.ph:
    %active.lane.mask.entry = tail call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 0, i64 %N)
    %N_minus_VF = select %N > 16 ? %N - 16 : 0
  vector.body:
    %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
    %active.lane.mask = phi <vscale x 16 x i1> [ %active.lane.mask.entry, %vector.ph ], [ %active.lane.mask.next, %vector.body ]
    ...
    %active.lane.mask.next = tail call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 %index, i64 %N_minus_VF)
    %index.next = add i64 %index, %4
      ; The add above may still overflow, but this time the active.lane.mask is not affected
    %8 = extractelement <vscale x 16 x i1> %active.lane.mask.next, i64 0
    br i1 %8, label %vector.body, label %for.cond.cleanup, !llvm.loop !7
For N = 20, we'd then get:
  vector.ph:
    %active.lane.mask.entry = tail call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 0, i64 %N)
      ; %active.lane.mask.entry = <1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1>
    %N_minus_VF = select 20 > 16 ? 20 - 16 : 0
      ; %N_minus_VF = 4
  vector.body: (1st iteration)
    ... ; using <1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1> as predicate in the loop
    ...
    %active.lane.mask.next = tail call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 0, i64 4)
      ; %active.lane.mask.next = <1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0>
    %index.next = add i64 0, 16
      ; %index.next = 16
    %8 = extractelement <vscale x 16 x i1> %active.lane.mask.next, i64 0
      ; %8 = 1
    br i1 %8, label %vector.body, label %for.cond.cleanup, !llvm.loop !7
      ; branch to %vector.body
  vector.body: (2nd iteration)
    ... ; using <1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0> as predicate in the loop
    ...
    %active.lane.mask.next = tail call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 16, i64 4)
      ; %active.lane.mask.next = <0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0>
    %index.next = add i64 16, 16
      ; %index.next = 32
    %8 = extractelement <vscale x 16 x i1> %active.lane.mask.next, i64 0
      ; %8 = 0
    br i1 %8, label %vector.body, label %for.cond.cleanup, !llvm.loop !7
      ; branch to %for.cond.cleanup
Reviewed By: fhahn, david-arm Differential Revision: https://reviews.llvm.org/D142109	2023-03-01 09:01:19 +00:00
Sander de Smalen	de111ae70a	NFC: Use generate_test_checks script for LV tests which seem to have been auto-generated.	2023-03-01 09:01:19 +00:00
sgokhale	ac67ec3a54	[LV] Reland testcase in 0ec4cae	2023-02-28 18:14:18 +05:30
sgokhale	0ec4cae146	[LV] Modify test case for commit 4f9a544 Was observing test failure. Relanding the test	2023-02-28 18:07:36 +05:30
sgokhale	4f9a5447c6	[LV] Reland "Update logic for calculating register usage due to invariants" Previously, while calculating register usage due to invariants, it was assumed that an invariant would always be part of widening instructions. This resulted in calculating vector register types for vectors which can't be legalized (check the newly added test for more details). An invariant might not always need a vector register. For example, an invariant might just be used for an iteration check. This patch checks if the invariant is part of any widening instruction and considers register usage accordingly. Fixes issue 60493 Differential Revision: https://reviews.llvm.org/D143422	2023-02-28 17:32:39 +05:30
sgokhale	3c8ddbde37	Revert "[LV] Update logic for calculating register usage due to invariants" Observing test failure for llvm/test/Transforms/LoopVectorize/AArch64/reg-usage.ll This reverts commit d1628266946fdddb44bdad2b3ccf3cd5fc769f42.	2023-02-28 15:46:59 +05:30
sgokhale	d162826694	[LV] Update logic for calculating register usage due to invariants Previously, while calculating register usage due to invariants, it was assumed that an invariant would always be part of widening instructions. This resulted in calculating vector register types for vectors which can't be legalized (check the newly added test for more details). An invariant might not always need a vector register. For example, an invariant might just be used for an iteration check. This patch checks if the invariant is part of any widening instruction and considers register usage accordingly. Fixes issue 60493 Differential Revision: https://reviews.llvm.org/D143422	2023-02-28 11:05:26 +05:30
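
Note on fe1b51ffee (D142109): the overflow argument in that commit message can be checked with a small model. The C++ sketch below is not LLVM code. It assumes an 8-bit induction variable so that wrap-around is reachable, fixes VF at 16, and represents a lane mask only by its number of active lanes. The names activeLanes, runIncrementFirst, runMaskFirst, checkAllTripCounts and the maxIters cap are hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <cstdint>

constexpr unsigned VF = 16; // fixed vectorization factor, chosen for illustration

// Models get.active.lane.mask(base, n) by its count of active lanes:
// lane i is active iff base + i < n, evaluated without wrap-around.
constexpr unsigned activeLanes(uint8_t base, uint8_t n) {
  if (base >= n)
    return 0;
  const unsigned remaining = static_cast<unsigned>(n) - base;
  return remaining < VF ? remaining : VF;
}

// Scheme before the patch: increment the index first, then compute the next
// mask from the (possibly wrapped) index. Returns the number of elements
// processed, or -1 if the loop has not exited after maxIters iterations.
constexpr int runIncrementFirst(uint8_t n, int maxIters = 64) {
  uint8_t index = 0;
  unsigned mask = activeLanes(0, n);
  int processed = 0;
  for (int it = 0; it < maxIters; ++it) {
    processed += static_cast<int>(mask);
    index = static_cast<uint8_t>(index + VF); // may wrap to a small value
    mask = activeLanes(index, n);             // a wrapped index re-activates lanes
    if (mask == 0)                            // lane 0 inactive: leave the loop
      return processed;
  }
  return -1;
}

// Scheme from the commit message: compute the next mask from the index
// before the increment, against max(n - VF, 0).
constexpr int runMaskFirst(uint8_t n, int maxIters = 64) {
  const uint8_t nMinusVF = n > VF ? static_cast<uint8_t>(n - VF) : uint8_t{0};
  uint8_t index = 0;
  unsigned mask = activeLanes(0, n);
  int processed = 0;
  for (int it = 0; it < maxIters; ++it) {
    processed += static_cast<int>(mask);
    mask = activeLanes(index, nMinusVF);      // uses the not-yet-incremented index
    index = static_cast<uint8_t>(index + VF); // may wrap, but the mask is already fixed
    if (mask == 0)
      return processed;
  }
  return -1;
}

static_assert(runMaskFirst(20) == 20, "matches the N = 20 walk-through");
static_assert(runIncrementFirst(250) == -1, "index wraps and the loop never exits");
static_assert(runMaskFirst(250) == 250, "no wrap-around problem");

// Exhaustive check over every 8-bit trip count.
inline void checkAllTripCounts() {
  for (int n = 0; n <= 255; ++n)
    assert(runMaskFirst(static_cast<uint8_t>(n)) == n);
}
```

In this model the increment-first scheme only misbehaves for trip counts above 256 - VF, which mirrors the 'N' near ULONG_MAX case in the commit message; the mask-first scheme processes exactly N elements for every 8-bit N.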

2005 Commits
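
Note on 1c4fedfa35 (D146199) and bd0c281fcd (D146219): the reasoning "if vscale is a power of 2 and its maximum is known, reuse the fixed-width logic" can be sketched as a divisibility check. This is a simplified model, not the LoopVectorize implementation: it ignores interleaving and any other conditions the vectorizer applies, and provablyNoScalarTail, isPowerOf2 and checkEveryVScale are hypothetical names. Only isVScaleKnownToBeAPowerOfTwo is named in the commit message; here it is represented by a plain bool parameter.

```cpp
#include <cassert>
#include <cstdint>

constexpr bool isPowerOf2(uint64_t x) { return x != 0 && (x & (x - 1)) == 0; }

// Hypothetical helper: returns true when the trip count is provably a
// multiple of every VF the loop could run with, so no scalar tail exists.
// minVF is the known-minimum element count of the scalable VF and
// maxVScale the largest value vscale can take.
constexpr bool provablyNoScalarTail(uint64_t tripCount, uint64_t minVF,
                                    uint64_t maxVScale, bool vscaleIsPow2) {
  // Without the power-of-two guarantee vscale could be odd or prime, so a
  // multiple of the maximum VF says nothing about smaller VFs: bail out.
  if (!vscaleIsPow2 || !isPowerOf2(minVF) || !isPowerOf2(maxVScale))
    return false;
  // Every power-of-two VF up to the maximum divides the maximum VF.
  return tripCount % (minVF * maxVScale) == 0;
}

// Trip count 1024 never needs a tail for VF = vscale x 4 with vscale <= 16;
// 1025, the value the tests were changed to, always leaves a remainder.
static_assert(provablyNoScalarTail(1024, 4, 16, true), "1024 % 64 == 0");
static_assert(!provablyNoScalarTail(1025, 4, 16, true), "1025 % 64 == 1");
static_assert(!provablyNoScalarTail(1024, 4, 16, false), "vscale unknown");

// Cross-check: when the helper says "no tail", every power-of-two vscale
// up to maxVScale really divides the trip count evenly.
inline void checkEveryVScale(uint64_t tripCount, uint64_t minVF,
                             uint64_t maxVScale) {
  if (!provablyNoScalarTail(tripCount, minVF, maxVScale, true))
    return;
  for (uint64_t vscale = 1; vscale <= maxVScale; vscale *= 2)
    assert(tripCount % (minVF * vscale) == 0);
}
```

The same check explains why D146219 moved test trip counts from 1024 to 1025: with 1024 the tail can be proven away, so tail-folding would never be exercised.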