llvm-project

Author	SHA1	Message	Date
Florian Hahn	4fc190351e	[VPlan] Remove unneeded NeedsVectorIV from VPWidenIntOrFpInduction. After recent improvements, all instances of VPWidenIntOrFpInductionRecipe need a vector IV and there's no need for a separate field.	2023-04-17 13:38:00 +01:00
Bjorn Pettersson	21a6890856	[Vectorize] Clean up Transforms/Vectorize.h Removed definitions of vectorizeBasicBlock and VectorizeConfig (possibly a remnant from the BBVectorize pass that was removed way back in 2017). Also reduced the number of include dependencies on Transforms/Vectorize.h.	2023-04-17 13:54:19 +02:00
Bjorn Pettersson	a20f7efbc5	Remove several no longer needed includes. NFCI Mostly removing includes of InitializePasses.h and Pass.h in passes that no longer have support for the legacy PM.	2023-04-17 13:54:19 +02:00
Florian Hahn	02369b75fd	[VPlan] Mark recurrence recipes as not having side-effects. Add support for FirstOrderRecurrenceSplice and VPFirstOrderRecurrencePHI recipes to mayHaveSideEffects. Neither of them has side-effects.	2023-04-17 12:30:52 +01:00
David Sherwood	69ee653313	[LoopVectorize] Take vscale into account when deciding to create epilogues In LoopVectorizationCostModel::isEpilogueVectorizationProfitable we check whether the chosen main vector loop VF >= 16. If so, we decide to create a vector epilogue loop. However, this doesn't take VScaleForTuning into account, even though we could be targeting a CPU where vscale > 1, in which case the runtime VF would be a multiple of the known minimum value. This patch multiplies scalable VFs by VScaleForTuning, and several tests that now produce vector epilogues have been updated. Differential Revision: https://reviews.llvm.org/D147522	2023-04-17 10:49:40 +00:00
Florian Hahn	83ab5708d1	[LV] Don't sink scalar instructions that may read from memory. The current sinking code doesn't prevent us from sinking a load past an aliasing store. Skip sinking instructions that may read from memory to avoid a mis-compile. See @minimal_bit_widths_with_aliasing_store for an example where 2 loads are sunk past aliasing stores before this fix. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D147259	2023-04-17 09:30:25 +01:00
Kazu Hirata	c83c4b58d1	[Transforms] Apply fixes from performance-for-range-copy (NFC)	2023-04-16 08:25:28 -07:00
Florian Hahn	668045eb77	[VPlan] Unify Value2VPValue and VPExternalDefs maps (NFCI). Before this patch, a VPlan contained 2 mappings for Values -> VPValue: 1) Value2VPValue and 2) VPExternalDefs. This duplication is unnecessary and there are already cases where external defs are added to Value2VPValue. This patch replaces all uses of VPExternalDefs with Value2VPValue. It clarifies the naming of getOrAddVPValue (to getOrAddExternalVPValue) and addVPValue (to addExternalVPValue). At the moment, this is NFC, but will enable additional simplifications in D147783. Depends on D147891. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D147892	2023-04-16 15:38:31 +01:00
Florian Hahn	2db031528e	[VPlan] Check VPValue step in isCanonical (NFCI). Update the isCanonical() implementations to check the VPValue step operand instead of the step in the induction descriptor. At the moment this is NFC, but it enables further optimizations if the step is replaced by a constant in D147783. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D147891	2023-04-16 14:48:03 +01:00
Vasileios Porpodas	7e67a9473d	[SLP][NFC] Remove Limit from tryToVectorizeSequence() arguments. Limit turns out to be implemented in the exact same way for all calls to tryToVectorizeSequence(). So this patch removes it and implements it internally as a lambda function. Differential Revision: https://reviews.llvm.org/D148382	2023-04-14 14:58:57 -07:00
Nikita Popov	62ef97e063	[llvm-c] Remove PassRegistry and initialization APIs Remove C APIs for interacting with PassRegistry and pass initialization. These are legacy PM concepts, and are no longer relevant for the new pass manager. Calls to these initialization functions can simply be dropped. Differential Revision: https://reviews.llvm.org/D145043	2023-04-14 12:12:48 +02:00
Florian Hahn	7fc0b3049d	[VPlan] Switch to checking sinking legality for recurrences in VPlan. Building on D142885 and D142589, retire the SinkAfter map from the recurrence handling code. It is replaced by checking whether it is possible to sink all users of a recurrence directly in VPlan. This results in simpler code overall and allows handling additional cases (see the improvements in @test_crash). Depends on D142885. Depends on D142589. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D142886	2023-04-13 22:00:52 +01:00
Alexey Bataev	a4eff2b56c	[SLP][NFC]Remove extra semicolons after function definitions, NFC	2023-04-13 11:33:25 -07:00
Alexey Bataev	f82eb7e066	[SLP]Introduce gather cost estimation function. Introduced BoUpSLP::ShuffleCostEstimator::gather function as an initial implementation of the gather/buildvector cost estimation for buildvector nodes. It will allow using the general codegen infrastructure for better cost estimation, and it improves the cost estimation for gathers/buildvectors. Improved part of D110978. Differential Revision: https://reviews.llvm.org/D148174	2023-04-13 10:16:00 -07:00
Simon Pilgrim	b3480d5ede	[SLP] Compute min/max scalar reduction costs using min/max intrinsics instead of expanded cmp+sel By default these will expand back to cmp/sel, but some targets (X86) have optimized costs for scalar integer min/max patterns which are lower than the default expansion (pre-SSE41 is particularly weak for vector min/max support). Differential Revision: [SLP] Compute min/max scalar reduction costs using min/max intrinsics instead of expanded cmp+sel	2023-04-13 17:00:39 +01:00
Simon Pilgrim	9e30b87afb	[TTI] getMinMaxReductionCost - add FastMathFlag argument Similar to the getArithmeticReductionCost / getExtendedReductionCost calls (which really don't need to use std::optional<>). This will be necessary to correctly recognize fast/nnan fmax/fmul reductions, which can avoid NaN handling. That in turn will allow us to remove the fmax/fmin special case in X86TTIImpl::getMinMaxCost and use getIntrinsicInstrCost like we do for integer reductions (63c3895327839ba5b57f5b99ec9e888abf976ac6). Differential Revision: https://reviews.llvm.org/D148149	2023-04-13 10:42:42 +01:00
Bjorn Pettersson	410775ecfd	[Transforms][LTO] Remove some redundant includes. NFC No need to include CallGraphSCCPass.h from the IPO/Inliner. Also removed the include of LegacyPassManager.h in a couple of files that do not really depend on that header file. Differential Revision: https://reviews.llvm.org/D148083	2023-04-13 10:12:00 +02:00
Craig Topper	4b47d875a1	[LV] Optimize trip count SCEV. To calculate the trip count we need to add 1 to the backedge taken count. If we need to widen the backedge count, it's better to do the add before the widening if we can guarantee it won't overflow. The code here is based on similar code I found in LoopIdiomRecognize. This is the vectorizer version of this InstCombine patch D142783. Looking at the IR diffs, this does look like it gets more cases than the InstCombine patch. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D147355	2023-04-12 16:17:58 -07:00
Alexey Bataev	b28f407df9	[SLP]Improve reduction cost model for scalars. Instead of using an abstract cost for the scalar reduction ops, try to use the cost of the actual reduction operation instructions, where possible. Also, remove the estimation of the vectorized GEP pointers for reduced loads, since it is already handled in the tree. Differential Revision: https://reviews.llvm.org/D148036	2023-04-12 11:32:51 -07:00
Alexey Bataev	d00158cd28	[SLP][NFC]Introduce ShuffleCostEstimator and adjustExtracts member function. Added ShuffleCostEstimator class and the first adjustExtracts member, which is just a copy of the previous AdjustExtractCost lambda. Differential Revision: https://reviews.llvm.org/D147787	2023-04-11 12:47:07 -07:00
Florian Hahn	68afaa3f48	[LV] Use std::make_optional to fix build failure after 082a0046. Some compilers require std::make_optional(std::move()) to force construction of the std::optional return value. This should fix the build failure in https://lab.llvm.org/buildbot#builders/67/builds/10991	2023-04-11 17:56:15 +01:00
Florian Hahn	082a004690	[VPlan] Allow building a VPlan to fail. Update the planning code constructing VPlan to allow building VPlans to fail. This allows us to gradually shift some legality checks to VPlan construction. The first candidate is checking if all users of first-order recurrence phis can be sunk past the recipe computing the previous value. The new functionality will be used by D142886 which is approved and will be landed shortly. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D142885	2023-04-11 15:41:18 +01:00
Florian Hahn	f9d0b35d22	[LV] Re-use already computed runtime VF in fixFixedOrderRecurrence. This was suggested as independent cleanup in D147472. This removes a redundant runtime VF computation when using scalable vectors.	2023-04-10 21:25:12 +01:00
Florian Hahn	954befe2a7	[LV] Turn check into assert in fixFixedOrderRecurrence (NFCI). Suggested as independent cleanup in D147567. Either VF or UF need to be > 1. Note that if the condition were false, the code below would use a nullptr and crash.	2023-04-10 21:11:41 +01:00
Florian Hahn	35af27c30a	[VPlan] Only create extracts for recurrence exits if there are live-outs. Move the code to collect live-outs earlier and only generate extracts for exit values if there are any live-outs that use them. Depends on D147472. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D147567	2023-04-10 21:08:34 +01:00
Florian Hahn	c255eb2c4b	[VPlan] Use VPLiveOut to update FOR live-out users. Instead of iterating over all LCSSA phis in the exit block, collect all LiveOut users of the FOR splice VPInstruction and only update those users. Building on top of D147471, this removes an access to the cost model after VPlan execution. Depends on D147471. Reviewed By: Ayal, michaelmaitland Differential Revision: https://reviews.llvm.org/D147472	2023-04-10 13:02:44 +01:00
Florian Hahn	0dbcbfe0d0	[VPlan] Don't assign slots for external defs (NFCI). External defs are VPValues wrapping an IR value and hence will get printed as ir<>. We don't need to assign them a slot (VPValue number).	2023-04-09 21:01:21 +01:00
Florian Hahn	620e011a25	[VPlan] Don't add live-outs if scalar epilogue is required. Instead of clearing live outs when a scalar epilogue is required late, don't add live outs during VPlan construction if a scalar epilogue is required. This enables more VPlan-based DCE (if the live out would be the only user in the plan) and is a step towards removing an access of the cost model in fixedVectorizedLoop (which is after VPlan execution). Depends on D147468. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D147471	2023-04-09 09:18:24 +01:00
Florian Hahn	c7a34d355a	[VPlan] Require VFRange.End to be a power-of-2. (NFCI) This removes the need to convert the end of the range to the next power-of-2 for the end iterator after 4bd3fda5124962 and was suggested as follow-up TODO in D147468.	2023-04-08 13:04:08 +01:00
Florian Hahn	4bd3fda512	[VPlan] Add VFRange::begin() and end() iterators. (NFCI) Add an iterator to iterate over all VFs in VFRange. This simplifies some existing code and allows using all_of, any_of and none_of on a VFRange. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D147468	2023-04-08 10:22:25 +01:00
Alexey Bataev	85327f307b	[SLP][NFC]Make clusterSortPtrAccesses static.	2023-04-07 13:24:24 -07:00
Alexey Bataev	6ff177d928	[SLP][NFC]Improve SLP time by precomputing value<->gather nodes dependencies. Improved compile time by precomputing the mapping between gathered scalars and their gather/buildvector nodes for later use in isGatherShuffledEntry, to avoid recomputing this map each time this function is called.	2023-04-07 12:12:02 -07:00
Florian Hahn	11896357d4	[VPlan] Add VPInterleaveRecipe::NeedsMaskForGaps field (NFCI). This patch adds a NeedsMaskForGaps field to VPInterleaveRecipe to record whether a mask for gaps is needed. This removes a dependence on the cost model in VPlan code-generation. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D147467	2023-04-07 13:11:03 +01:00
Alexey Bataev	52dd72a37a	[SLP][NFC]Make adjustExtracts/needToDelay members of ShuffleInstructionBuilder. Make adjustExtracts/needToDelay lambdas members of ShuffleInstructionBuilder so they can be overloaded later for the cost model. Differential Revision: https://reviews.llvm.org/D147730	2023-04-06 16:27:19 -07:00
Michael Maitland	e86ed9bf2a	[LV][NFC] Improve complexity of fixing users of recurrences The original loop is O(MxN) since `is_contained` iterates over all incoming values. This change makes it so only the phis which use the value as an incoming value are iterated over, so it is now O(M). Differential Revision: https://reviews.llvm.org/D146999	2023-04-06 16:15:51 -07:00
Florian Hahn	3f36b9b456	[LV] Move conditional MaskForGaps construction to load case. Conditionally setting MaskForGaps is only needed for loads. This avoids re-computing MaskForGaps for stores. Suggested as independent cleanup in D147467.	2023-04-06 21:16:37 +01:00
Alexey Bataev	e58a49300e	[SLP][NFC]Evaluate FMF for reductions before the loop, no need to reevaluate it.	2023-04-06 11:57:20 -07:00
Alexey Bataev	50af6ab0ab	[SLP]Fix emission of the masks in shuffles for undefs. If the value is used in the expression, the mask needs to be adjusted before it is applied. The analysis of the phi nodes for reused scalars also needed to be fixed.	2023-04-06 10:16:58 -07:00
Alexey Bataev	cf62adbbd8	[SLP]Fix delete of the extractelement with users. Made the condition for erasing the gathered extractelements stricter: remove one only if it has a single vectorized use; otherwise, leave it for InstCombine/InstSimplify analysis.	2023-04-06 09:15:30 -07:00
Philip Reames	92aae9e725	[LV] Remove a cover function with a single use [nfc] And more importantly, move the fixme to the sole caller where it actually makes sense in context.	2023-04-06 08:27:57 -07:00
David Sherwood	9278dd7b2b	[LoopVectorize] Fix zext/sext cost calculations when types are shrunk In getInstructionCost, if we know a zext/sext is going to be shrunk, we should only be changing the destination type and leave the source type unchanged. For example, we may change a zext from `zext <16 x i8> %a to <16 x i32>` to `zext <16 x i8> %a to <16 x i16>`. However, we were previously calculating the cost for `zext <16 x i16> %a to <16 x i16>`, which is incorrect. Differential Revision: https://reviews.llvm.org/D147152	2023-04-06 08:52:25 +00:00
David Green	28c8616a5b	[LV] Cleanup and reformatting for some debug messages. NFC This is just some cleanup of various debug messages, pulled out of another patch to simplify it a little.	2023-04-05 17:50:01 +01:00
Alexey Bataev	40105a9933	[SLP]Find reused scalars in buildvector sequences, if any. Patch generalizes analysis of scalars. The main part is outlined into a lambda, which can be used to find reused inserted scalars and emit a shuffle for them instead of multiple insertelement instructions, if the permutation is already found. I.e. some scalars are transformed by the permutation of previously vectorized nodes, and some are inserted directly. Reworked part of D110978 Differential Revision: https://reviews.llvm.org/D146564	2023-04-05 09:37:05 -07:00
Philip Reames	c416f6700f	[IVDescriptors] Add pointer InductionDescriptors with non-constant strides (try 2) (JFYI - This has been heavily reframed since original attempt at landing.) This change updates the InductionDescriptor logic to allow matching a pointer IV with a non-constant stride, but also updates the LoopVectorizer to bail out on such descriptors by default. This preserves the default vectorizer behavior. In review, it was pointed out that there are multiple unfortunate performance implications which need to be addressed before this can be enabled. Having a flag allows us to exercise the behavior, and write test cases for logic which is otherwise unreachable (or hard to reach). This will also enable non-constant stride pointer recurrences for other consumers. I've audited said code, and don't see any obvious issues. Differential Revision: https://reviews.llvm.org/D147336	2023-04-05 09:32:35 -07:00
Jie Fu	ae5f049378	[Transforms] Fix -Wunused-function for 'GetReplicateRegion' with -DLLVM_ENABLE_ASSERTIONS=OFF (NFC) /Users/jiefu/llvm-project/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:614:23: error: unused function 'GetReplicateRegion' [-Werror,-Wunused-function] `static VPRegionBlock *GetReplicateRegion(VPRecipeBase *R) {` ^ 1 error generated.	2023-04-05 22:34:42 +08:00
Florian Hahn	c18bc7f7fe	[VPlan] Replace check for replicate regions with assert (NFCI). After recent changes, replication regions only get introduced later, so there's no need to check for them.	2023-04-05 14:29:24 +01:00
Graham Hunter	185863f7de	[LV] Use available masked vector function variants when required LLVM can already vectorize using function variants that require a mask (by creating an all-true mask), and can vectorize a conditional call via scalarization. Now we want to join the two parts together and use a masked variant when a mask is required. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D136251	2023-04-05 11:18:38 +01:00
David Sherwood	b4089cfa2f	[NFC][LoopVectorize] Simplify preferPredicateOverEpilogue interface Given just how many arguments we pass to preferPredicateOverEpilogue and considering this list may grow over time I've decided to pass in a pointer to a new TailFoldingInfo structure instead, similar to what we do with IntrinsicCostAttributes, etc. In addition, many of the arguments we pass in are actually available in the LoopVectorizationLegality class so I've managed to reduce the set of pointers that we need to pass in the TailFoldingInfo struct. Differential Revision: https://reviews.llvm.org/D146127	2023-04-04 14:00:49 +00:00
Philip Reames	f6b217c7cb	[LV] Remove unused default argument to isLegalGatherOrScatter [nfc]	2023-04-03 11:03:35 -07:00
Alexey Bataev	c1660006b2	[SLP]Reorder counters for same values, if the root node is reordered. The counters for the repeated scalars are ordered in the natural order, but the original scalars might be reordered during SLP graph reordering and this order can be dropped. Need to use the scalars after the reordering, not the original ones, to emit correct code for same value counters.	2023-04-03 07:52:49 -07:00

3703 Commits