llvm-project

Author	SHA1	Message	Date
David Sherwood	69ee653313	[LoopVectorize] Take vscale into account when deciding to create epilogues In LoopVectorizationCostModel::isEpilogueVectorizationProfitable we check to see if the chosen main vector loop VF >= 16. If so, we decide to create a vector epilogue loop. However, this doesn't take VScaleForTuning into account because we could be targeting a CPU where vscale > 1, and hence the runtime VF would be a multiple of the known minimum value. This patch multiplies scalable VFs by VScaleForTuning and several tests have been updated that now produce vector epilogues. Differential Revision: https://reviews.llvm.org/D147522	2023-04-17 10:49:40 +00:00
Florian Hahn	83ab5708d1	[LV] Don't sink scalar instructions that may read from memory. The current sinking code doesn't prevent us from sinking a load past an aliasing store. Skip sinking instructions that may read from memory to avoid a mis-compile. See @minimal_bit_widths_with_aliasing_store for an example where 2 loads are sunk past aliasing stores before this fix. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D147259	2023-04-17 09:30:25 +01:00
Kazu Hirata	c83c4b58d1	[Transforms] Apply fixes from performance-for-range-copy (NFC)	2023-04-16 08:25:28 -07:00
Florian Hahn	668045eb77	[VPlan] Unify Value2VPValue and VPExternalDefs maps (NFCI). Before this patch, a VPlan contained 2 mappings for Values -> VPValue: 1) Value2VPValue and 2) VPExternalDefs. This duplication is unnecessary and there are already cases where external defs are added to Value2VPValue. This patch replaces all uses of VPExternalDefs with Value2VPValue. It clarifies the naming of getOrAddVPValue (to getOrAddExternalVPValue) and addVPValue (to addExternalVPValue). At the moment, this is NFC, but will enable additional simplifications in D147783. Depends on D147891. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D147892	2023-04-16 15:38:31 +01:00
Florian Hahn	7fc0b3049d	[VPlan] Switch to checking sinking legality for recurrences in VPlan. Building on D142885 and D142589, retire the SinkAfter map from the recurrence handling code. It is replaced by checking whether it is possible to sink all users of a recurrence directly in VPlan. This results in simpler code overall and allows handling additional cases (see the improvements in @test_crash). Depends on D142885. Depends on D142589. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D142886	2023-04-13 22:00:52 +01:00
Craig Topper	4b47d875a1	[LV] Optimize trip count SCEV. To calculate the trip count we need to add 1 to the backedge taken count. If we need to widen the backedge count, it's better to do the add before the widening if we can guarantee it won't overflow. The code here is based on similar code I found in LoopIdiomRecognize. This is the vectorizer version of this InstCombine patch D142783. Looking at the IR diffs, this does look like it gets more cases than the InstCombine patch. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D147355	2023-04-12 16:17:58 -07:00
Florian Hahn	68afaa3f48	[LV] Use std::make_optional to fix build failure after 082a0046. Some compilers require std::make_optional(std::move()) to force construction of the std::optional return value. This should fix the build failure in https://lab.llvm.org/buildbot#builders/67/builds/10991	2023-04-11 17:56:15 +01:00
Florian Hahn	082a004690	[VPlan] Allow building a VPlan to fail. Update the planning code constructing VPlan to allow building VPlans to fail. This allows us to gradually shift some legality checks to VPlan construction. The first candidate is checking if all users of first-order recurrence phis can be sunk past the recipe computing the previous value. The new functionality will be used by D142886 which is approved and will be landed shortly. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D142885	2023-04-11 15:41:18 +01:00
Florian Hahn	f9d0b35d22	[LV] Re-use already computed runtime VF in fixFixedOrderRecurrence. This was suggested as independent cleanup in D147472. This removes a redundant runtime VF computation when using scalable vectors.	2023-04-10 21:25:12 +01:00
Florian Hahn	954befe2a7	[LV] Turn check into assert in fixFixedOrderRecurrence (NFCI). Suggested as independent cleanup in D147567. Either VF or UF need to be > 1. Note that if the condition would be false, the code below would use a nullptr and crash.	2023-04-10 21:11:41 +01:00
Florian Hahn	35af27c30a	[VPlan] Only create extracts for recurrence exits if there are live-outs. Move the code to collect live-out earlier and only generate extracts for exit values if there are any live-outs that use them. Depends on D147472. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D147567	2023-04-10 21:08:34 +01:00
Florian Hahn	c255eb2c4b	[VPlan] Use VPLiveOut to update FOR live-out users. Instead of iterating over all LCSSA phis in the exit block, collect all LiveOut users of the FOR splice VPInstruction and only update those users. Building on top of D147471, this removes an access to the cost model after VPlan execution. Depends on D147471. Reviewed By: Ayal, michaelmaitland Differential Revision: https://reviews.llvm.org/D147472	2023-04-10 13:02:44 +01:00
Florian Hahn	620e011a25	[VPlan] Don't add live-outs if scalar epilogue is required. Instead of clearing live outs when a scalar epilogue is required late, don't add live outs during VPlan construction if a scalar epilogue is required. This enables more VPlan-based DCE (if the live out would be the only user in the plan) and is a step towards removing an access of the cost model in fixVectorizedLoop (which is after VPlan execution). Depends on D147468. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D147471	2023-04-09 09:18:24 +01:00
Florian Hahn	c7a34d355a	[VPlan] Require VFRange.End to be a power-of-2. (NFCI) This removes the need to convert the end of the range to the next power-of-2 for the end iterator after 4bd3fda5124962 and was suggested as follow-up TODO in D147468.	2023-04-08 13:04:08 +01:00
Florian Hahn	4bd3fda512	[VPlan] Add VFRange::begin() and end() iterators. (NFCI) Add an iterator to iterate over all VFs in VFRange. This simplifies some existing code and allows using all_of,any_of and none_of on a VFRange. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D147468	2023-04-08 10:22:25 +01:00
Florian Hahn	11896357d4	[VPlan] Add VPInterleaveRecipe::NeedsMaskForGaps field (NFCI). This patch adds a NeedsMaskForGaps field to VPInterleaveRecipe to record whether a mask for gaps is needed. This removes a dependence on the cost model in VPlan code-generation. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D147467	2023-04-07 13:11:03 +01:00
Michael Maitland	e86ed9bf2a	[LV][NFC] Improve complexity of fixing users of recurrences The original loop is O(MxN) since `is_contained` iterates over all incoming values. This change makes it so only the phis which use the value as an incoming value are iterated over so it is now O(M). Differential Revision: https://reviews.llvm.org/D146999	2023-04-06 16:15:51 -07:00
Florian Hahn	3f36b9b456	[LV] Move conditional MaskForGaps construction to load case. Conditionally setting MaskForGaps is only needed for loads. This avoids re-computing MaskForGaps for stores. Suggested as independent cleanup in D147467.	2023-04-06 21:16:37 +01:00
David Sherwood	9278dd7b2b	[LoopVectorize] Fix zext/sext cost calculations when types are shrunk In getInstructionCost if we know a zext/sext is going to be shrunk we should only be changing the destination type, and leave the source type unchanged. For example, we may change a zext from zext <16 x i8> %a to <16 x i32> to zext <16 x i8> %a to <16 x i16> However, we were previously calculating the cost for doing zext <16 x i16> %a to <16 x i16> which is incorrect. Differential Revision: https://reviews.llvm.org/D147152	2023-04-06 08:52:25 +00:00
David Green	28c8616a5b	[LV] Cleanup and reformatting for some debug messages. NFC This is just some cleanup of various debug messages, pulled out of another patch to simplify it a little.	2023-04-05 17:50:01 +01:00
Philip Reames	c416f6700f	[IVDescriptors] Add pointer InductionDescriptors with non-constant strides (try 2) (JFYI - This has been heavily reframed since original attempt at landing.) This change updates the InductionDescriptor logic to allow matching a pointer IV with a non-constant stride, but also updates the LoopVectorizer to bail out on such descriptors by default. This preserves the default vectorizer behavior. In review, it was pointed out that there are multiple unfortunate performance implications which need to be addressed before this can be enabled. Having a flag allows us to exercise the behavior, and write test cases for logic which is otherwise unreachable (or hard to reach). This will also enable non-constant stride pointer recurrences for other consumers. I've audited said code, and don't see any obvious issues. Differential Revision: https://reviews.llvm.org/D147336	2023-04-05 09:32:35 -07:00
Graham Hunter	185863f7de	[LV] Use available masked vector function variants when required LLVM has the ability to vectorize using function variants that require a mask by creating an all-true mask, and to vectorize a conditional call via scalarization, now we want to join the two parts together and use a masked variant when a mask is required. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D136251	2023-04-05 11:18:38 +01:00
David Sherwood	b4089cfa2f	[NFC][LoopVectorize] Simplify preferPredicateOverEpilogue interface Given just how many arguments we pass to preferPredicateOverEpilogue and considering this list may grow over time I've decided to pass in a pointer to a new TailFoldingInfo structure instead, similar to what we do with IntrinsicCostAttributes, etc. In addition, many of the arguments we pass in are actually available in the LoopVectorizationLegality class so I've managed to reduce the set of pointers that we need to pass in the TailFoldingInfo struct. Differential Revision: https://reviews.llvm.org/D146127	2023-04-04 14:00:49 +00:00
Philip Reames	f6b217c7cb	[LV] Remove unused default argument to isLegalGatherOrScatter [nfc]	2023-04-03 11:03:35 -07:00
David Green	965a090f02	Revert "[IVDescriptors] Add pointer InductionDescriptors with non-constant strides" Multiple errors have been reported on https://reviews.llvm.org/rG498aa534f472d28db893aa9a8627d0b46e17f312 Reverting until the correctness issues can be resolved. We are also seeing a lot of performance differences from the patch. Some are looking good, but some are looking pretty bad.	2023-03-31 11:08:50 +01:00
Philip Reames	498aa534f4	[IVDescriptors] Add pointer InductionDescriptors with non-constant strides This matches the handling for integer IVs. I left the non-opaque cases alone, mostly because they're largely irrelevant today. This doesn't actually make much difference in vectorization right now as we immediately fail on aliasing checks (which also bail on non-constant strides). Slightly surprisingly, it's the cases which do need runtime checks which work after this patch as they don't use the same dependency analysis path. This will also enable non-constant stride pointer recurrences for other consumers. I've audited said code, and don't see any obvious issues.	2023-03-30 11:56:00 -07:00
David Sherwood	0ef8a79b12	[LoopVectorize] Add non-zero check for MaxPowerOf2RuntimeVF in computeMaxVF This one-line patch just tightens up the code added in 1c4fedfa35aeb8b456e2d8f4f826c0e026b9d863 where we try to avoid tail-folding if we know the trip count will always be a multiple of the runtime VF.	2023-03-29 10:08:32 +00:00
David Sherwood	1c4fedfa35	[LoopVectorize] Don't tail-fold for scalable VFs when there is no scalar tail Currently in LoopVectorize we avoid tail-folding if we can prove the trip count is always a multiple of the maximum fixed-width VF. This works because we know the vectoriser only ever chooses a VF that is a power of 2. However, if we are also considering scalable VFs then we conservatively bail out of the optimisation because we don't know the value of vscale, which could be an odd or prime number, etc. This patch tries to enable the same optimisation for scalable VFs by asking if vscale is known to be a power of 2. If so, we can then query the maximum value of vscale and use the same logic as we do for fixed-width VFs. I've also added a new TTI hook called isVScaleKnownToBeAPowerOfTwo that does the same thing as the existing TargetLowering hook. Differential Revision: https://reviews.llvm.org/D146199	2023-03-27 08:34:30 +00:00
Florian Hahn	ea929a07b6	[LV] Set inbounds flag using CreateGEP in vectorizeInterleaveGroup(NFC). This avoids having to cast the result of the builder to GetElementPtrInst.	2023-03-22 11:29:57 +00:00
Florian Hahn	af99aa0ff7	[LV] Set inbounds flag using CreateGEP in VPWidenMemInst (NFC). This avoids having to cast the result of the builder to GetElementPtrInst.	2023-03-21 11:44:21 +00:00
Florian Hahn	371bb2c9d3	[VPlan] Move createReplicateRegion out of VPRecipeBuilder.h. (NFC) The function doesn't use anything from VPRecipeBuilder, so move the definition to where it is actually used and turn it into a simple static function. It also makes the VPRecipeBuilder argument for createAndOptimizeReplicateRegions unnecessary.	2023-03-18 20:30:49 +00:00
Florian Hahn	6a6b65a84c	[LV] Restructure code creating replicate region (NFC). Re-order recipe and block creation to be in order, as suggested post-commit for 2db71c9851e5.	2023-03-18 17:17:07 +00:00
Florian Hahn	962c306a11	[LV] Don't consider pointer as uniform if it is also stored. Update isVectorizedMemAccessUse to also check if the pointer is stored. This prevents LV from incorrectly considering a pointer as uniform if it is used as both pointer and stored by the same StoreInst. Fixes #61396.	2023-03-17 16:26:16 +00:00
Graham Hunter	9aa01c4e89	[LV] Remove scalable constraints on creating bitcasts InnerLoopVectorizer::createBitOrPointerCast only supported fixed length vectors since it hadn't been updated. Supporting scalable vectors is just a matter of changing types and using elementcount instead of numelements, since there's nothing which actually relies on knowing the exact length of the vector. Original written by mgabka. Split out from D145163.	2023-03-17 16:19:33 +00:00
Florian Hahn	eca14a810e	[VPlan] Consolidate replicate region optimizations (NFC). As suggested in D143865, consolidate replicate region creation and optimization in a single helper that's exposed and used by LV.	2023-03-16 17:06:44 +00:00
Kazu Hirata	398af9b43b	[llvm] Use *{Map,Set}::contains (NFC)	2023-03-15 18:06:32 -07:00
Kazu Hirata	c8f9555c4d	[Transforms] Use *{Set,Map}::contains (NFC)	2023-03-14 00:24:30 -07:00
Philip Reames	dae682ce92	[IRBuilder] Add utilities for materializing scalable values [nfc] These idioms already appear a number of places in code, and upcoming changes to the various sanitizers continue to need more instances of the same patterns. Differential Revision: https://reviews.llvm.org/D145945	2023-03-13 11:54:19 -07:00
Florian Hahn	2db71c9851	[VPlan] Simplify code in createReplicateRegion (NFC). Simplify the code as suggested in D143865.	2023-03-11 11:47:23 +01:00
Arthur Eubanks	7c3c981442	[Passes] Remove some legacy passes DFAJumpThreading JumpThreading LibCallsShrink LoopVectorize SLPVectorizer DeadStoreElimination AggressiveDCE CorrelatedValuePropagation IndVarSimplify These are part of the optimization pipeline, of which the legacy version is deprecated and being removed.	2023-03-10 17:17:00 -08:00
Florian Hahn	54558fd8f3	[VPlan] Replace InvariantCond field from VPWidenSelectRecipe. There is no need to store information about invariance in the recipe. Replace the fields with checks of the operands using isDefinedOutsideVectorRegions. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D144489	2023-03-10 15:28:43 +01:00
Florian Hahn	a8adb38a96	[VPlan] Replace invariance fields from VPWidenGEPRecipe. There is no need to store information about invariance in the recipe. Replace the fields with checks of the operands using isDefinedOutsideVectorRegions. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D144487	2023-03-09 17:52:22 +01:00
Florian Hahn	79272ec028	[VPlan] Add predicate to VPReplicateRecipe, expand region later. This patch adds the predicate as additional operand to VPReplicateRecipe during initial construction. The predicated recipes are later moved into replicate regions. This simplifies constructions and some VPlan transformations, like fixed-order recurrence handling. It also improves codegen in some cases (e.g. for in-loop reductions), because the recipes remain in the same block. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D143865	2023-03-08 20:11:28 +01:00
Florian Hahn	3b2cf45d6b	[VPlan] Check if recipe is in ReplicateRegion for IfPredicateInstr (NFC) Check if replicate recipe is in a replicate region when considering whether to collect predicated instructions. This allows using IsPredicated for recipes with a mask attached directly in D143865. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D145322	2023-03-08 11:39:44 +01:00
sgokhale	4f018e54c4	[LV][AArch64] Resolve test failure due to use of unordered container AArch64/reg-usage.ll has an issue with the output ordering due to use of an unordered container. This was discovered by the -DLLVM_REVERSE_ITERATION:BOOL=ON cmake option. This patch tries to address it by making use of an ordered container. Differential Revision: https://reviews.llvm.org/D145472/	2023-03-07 16:42:21 +05:30
Graham Hunter	a180344589	[LV] Allow scalarization of function calls when masking is required This patch adds support for scalarizing calls to a function when there is a vector variant that cannot be used, either because there isn't a masked variant or because the cost model indicated a VF without a masked variant was better. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D134422	2023-03-03 15:26:04 +00:00
Sander de Smalen	c41b41eb11	[LoopVectorize] Use overflow-check analysis to improve tail-folding. This work follows on from D142109 and addresses a possible regression when we know the loop iteration counter cannot overflow. When we know the overflow-check always evaluates to false, it's better to use the other style of tail folding where it assumes a runtime check was added, because that avoids having to calculate a modified trip-count. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D142894	2023-03-01 14:17:58 +00:00
Sander de Smalen	fe1b51ffee	[LoopVectorize] Remove runtime check and scalar tail loop when tail-folding.

When using tail-folding and using the predicate for both data and control-flow (the next vector iteration's predicate is generated with the llvm.active.lane.mask intrinsic and then tested for the backedge), the LoopVectorizer still inserts a runtime check to see if the 'i + VF' may at any point overflow for the given trip-count. When it does, it falls back to a scalar epilogue loop.

We can get rid of that runtime check in the pre-header and therefore also remove the scalar epilogue loop. This reduces code-size and avoids a runtime check.

Consider the following loop:

  void foo(char * __restrict__ dst, char *src, unsigned long N) {
    for (unsigned long i=0; i<N; ++i)
      dst[i] = src[i] + 42;
  }

If 'N' is e.g. ULONG_MAX, and the VF > 1, then the loop iteration counter will overflow when calculating the predicate for the next vector iteration at some point, because LLVM does:

  vector.ph:
    %active.lane.mask.entry = tail call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 0, i64 %N)

  vector.body:
    %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
    %active.lane.mask = phi <vscale x 16 x i1> [ %active.lane.mask.entry, %vector.ph ], [ %active.lane.mask.next, %vector.body ]
    ...
    %index.next = add i64 %index, 16
      ; The add above may overflow, which would affect the lane mask and control flow. Hence a runtime check is needed.
    %active.lane.mask.next = tail call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 %index.next, i64 %N)
    %8 = extractelement <vscale x 16 x i1> %active.lane.mask.next, i64 0
    br i1 %8, label %vector.body, label %for.cond.cleanup, !llvm.loop !7

The solution:

What we can do instead is calculate the predicate before incrementing the loop iteration counter, such that the llvm.active.lane.mask is calculated from 'i' to 'tripcount > VF ? tripcount - VF : 0', i.e.

  vector.ph:
    %active.lane.mask.entry = tail call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 0, i64 %N)
    %N_minus_VF = select %N > 16 ? %N - 16 : 0

  vector.body:
    %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
    %active.lane.mask = phi <vscale x 16 x i1> [ %active.lane.mask.entry, %vector.ph ], [ %active.lane.mask.next, %vector.body ]
    ...
    %active.lane.mask.next = tail call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 %index, i64 %N_minus_VF)
    %index.next = add i64 %index, %4
      ; The add above may still overflow, but this time the active.lane.mask is not affected
    %8 = extractelement <vscale x 16 x i1> %active.lane.mask.next, i64 0
    br i1 %8, label %vector.body, label %for.cond.cleanup, !llvm.loop !7

For N = 20, we'd then get:

  vector.ph:
    %active.lane.mask.entry = tail call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 0, i64 %N)
      ; %active.lane.mask.entry = <1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1>
    %N_minus_VF = select 20 > 16 ? 20 - 16 : 0
      ; %N_minus_VF = 4

  vector.body: (1st iteration)
    ... ; using <1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1> as predicate in the loop
    ...
    %active.lane.mask.next = tail call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 0, i64 4)
      ; %active.lane.mask.next = <1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0>
    %index.next = add i64 0, 16
      ; %index.next = 16
    %8 = extractelement <vscale x 16 x i1> %active.lane.mask.next, i64 0
      ; %8 = 1
    br i1 %8, label %vector.body, label %for.cond.cleanup, !llvm.loop !7
      ; branch to %vector.body

  vector.body: (2nd iteration)
    ... ; using <1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0> as predicate in the loop
    ...
    %active.lane.mask.next = tail call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 16, i64 4)
      ; %active.lane.mask.next = <0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0>
    %index.next = add i64 16, 16
      ; %index.next = 32
    %8 = extractelement <vscale x 16 x i1> %active.lane.mask.next, i64 0
      ; %8 = 0
    br i1 %8, label %vector.body, label %for.cond.cleanup, !llvm.loop !7
      ; branch to %for.cond.cleanup

Reviewed By: fhahn, david-arm
Differential Revision: https://reviews.llvm.org/D142109	2023-03-01 09:01:19 +00:00
Nikita Popov	4bc254c664	[LoopVectorize] Only fetch BFI if profile summary available BlockFrequencyInfo should generally only be fetched in PGO builds where a PSI profile summary is available. However, LoopVectorize was fetching it unconditionally. This results in a small compile-time improvement for non-PGO builds. Differential Revision: https://reviews.llvm.org/D144953	2023-02-28 14:16:21 +01:00
sgokhale	4f9a5447c6	[LV] Reland "Update logic for calculating register usage due to invariants" Previously, while calculating register usage due to invariants, it was assumed that an invariant would always be part of widening instructions. This resulted in calculating vector register types for vectors which can't be legalized (check the newly added test for more details). An invariant might not always need a vector register. For example, an invariant might just be used for an iteration check. This patch checks if the invariant is part of any widening instruction and considers register usage accordingly. Fixes issue 60493 Differential Revision: https://reviews.llvm.org/D143422	2023-02-28 17:32:39 +05:30
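
The tail-folding change described in commit fe1b51ffee can be checked with a small simulation. The sketch below is not LLVM code: it models the two loop shapes from that commit message in Python, under the assumption that lane `i` of the lane mask is active exactly when `base + i < n`, evaluated without wraparound. The function names (`active_lane_mask`, `run_old_style`, `run_new_style`), the `bits` parameter used to model a narrow induction variable, and the `max_iters` guard are all hypothetical, introduced only for illustration.

```python
def active_lane_mask(base, n, vf):
    # Assumed semantics: lane i is active iff base + i < n, with no wraparound
    # in the comparison itself.
    return [base + i < n for i in range(vf)]


def run_old_style(n, vf, bits, max_iters=1000):
    """Mask for the next iteration is computed from the incremented index."""
    wrap = 1 << bits  # the induction variable wraps modulo 2**bits
    index = 0
    mask = active_lane_mask(0, n, vf)
    processed = []
    for _ in range(max_iters):
        processed += [index + i for i, on in enumerate(mask) if on]
        index = (index + vf) % wrap  # this add may wrap ...
        mask = active_lane_mask(index, n, vf)  # ... and corrupt the mask
        if not mask[0]:
            return processed
    return None  # hit the guard: the loop did not terminate


def run_new_style(n, vf, bits, max_iters=1000):
    """Mask for the next iteration is computed before incrementing the index."""
    wrap = 1 << bits
    index = 0
    mask = active_lane_mask(0, n, vf)
    n_minus_vf = n - vf if n > vf else 0
    processed = []
    for _ in range(max_iters):
        processed += [index + i for i, on in enumerate(mask) if on]
        mask = active_lane_mask(index, n_minus_vf, vf)  # uses the old index
        index = (index + vf) % wrap  # may still wrap, but the mask is unaffected
        if not mask[0]:
            return processed
    return None


if __name__ == "__main__":
    # N = 20, VF = 16, as in the commit message: two vector iterations.
    print(run_new_style(20, 16, bits=64) == list(range(20)))
    # An 8-bit counter stands in for the ULONG_MAX case: 240 + 16 wraps to 0.
    print(run_old_style(250, 16, bits=8) is None)
    print(run_new_style(250, 16, bits=8) == list(range(250)))
```

With an 8-bit counter and N = 250, the old shape wraps its index back to 0 and recomputes an all-true mask, so it never exits without the runtime check; the new shape visits each element exactly once because the exit mask is derived from the index before the increment.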


1854 Commits