llvm-project

Author	SHA1	Message	Date
Florian Hahn	5b362e4c7f	[VPlan] Add Debugloc to VPInstruction. Upcoming changes require attaching debug locations to VPInstructions, e.g. adding induction increment recipes in D113223. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D115123	2021-12-20 15:10:41 +00:00
Florian Hahn	564d109b35	[LV] Pass VectorHeader block to emitTransformedIndex (NFC). Pass in the vector header instead of relying on ILV::LoopVectorBody. This reduces the dependence on state from ILV. Where VPTransformState is available, State.CFG.PrevBB can be used.	2021-12-17 10:11:16 +00:00
Evgeniy Brevnov	7002125cff	[LV][NFC] Fix debug message to print out resulting clamped VF	2021-12-13 18:54:05 +07:00
Evgeniy Brevnov	2025e0985c	[LV] Make sure VF doesn't exceed compile time known TC. For the simple copy loop (see test case) the vectorizer selects VF equal to 32 while the loop is known to have only 17 iterations. Such behavior makes no sense to me since such a vector loop will never be executed. The only case where we may want to select a VF larger than TC is masked vectorization, so I haven't touched that case. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D114528	2021-12-13 13:48:46 +07:00
Florian Hahn	b6a2ddb6c8	[LV] Use info from State in some helper functions (NFC). This updates several helper functions to use information provided by VPTransformState instead of ILV directly, to help with the transition out of ILV.	2021-12-12 20:48:38 +00:00
David Green	fed3041863	[LV][ARM] Improve reduction costmodel for mismatching extension types. Given a MLA reduction from two different types (say i8 and i16), we were previously failing to find the reduction pattern, often making us choose the lower vector factor. This improves that by using the largest of the two extension types, allowing us to use the larger VF as the type of the reduction. As per https://godbolt.org/z/KP549EEYM the backend handles this valiantly, leading to better performance. Differential Revision: https://reviews.llvm.org/D115432	2021-12-10 15:40:58 +00:00
Florian Hahn	505ad03c7d	[LV] Remove redundant IV casts using VPlan (NFCI). This patch simplifies handling of redundant induction casts, by removing dead cast instructions after initial VPlan construction. This has the following benefits: 1. Fixes a crash (see @test_optimized_cast_induction_feeding_first_order_recurrence) 2. Simplifies VPWidenIntOrFpInduction to a single-def recipe 3. Retires recordVectorLoopValueForInductionCast. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D115112	2021-12-10 13:57:03 +00:00
Florian Hahn	acea6e9cfa	[Passes] Only run extra vector passes if loops have been vectorized. This patch uses a similar trick as in D113947 to only run the extra passes after vectorization on functions where loops have been vectorized. The reason for running the 'extra vector passes' is simplification/unswitching of the runtime checks created by LV, so there should be no need to run them if nothing got vectorized. To do that, a new dummy analysis ShouldRunExtraVectorPasses has been added. If loops have been vectorized for a function, LV will cache the analysis. At the moment it uses MadeCFGChanges as proxy for loop vectorized, which isn't perfect (it could be too aggressive, e.g. because no runtime checks have been added), but should be good enough for now. The extra passes are now managed by a new FunctionPassManager that runs its passes only if ShouldRunExtraVectorPasses has been cached. Without this patch, `-extra-vectorizer-passes` has the following compile-time impact: NewPM-O3: +4.86% NewPM-ReleaseThinLTO: +3.56% NewPM-ReleaseLTO-g: +7.17% http://llvm-compile-time-tracker.com/compare.php?from=ead3979a92fc33add4710c4510d6906260dcb4ad&to=c292da649e2c6e88a31e702fdc474727d09c72bc&stat=instructions With this patch, that gets reduced to NewPM-O3: +1.43% NewPM-ReleaseThinLTO: +1.00% NewPM-ReleaseLTO-g: +1.58% http://llvm-compile-time-tracker.com/compare.php?from=ead3979a92fc33add4710c4510d6906260dcb4ad&to=e67d86b57810011cf285eb9aa1944781be6096f0&stat=instructions It is probably still too high to enable by default, but much better. Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D115052	2021-12-10 11:42:45 +00:00
Florian Hahn	978883d254	[VPlan] Add InductionDescriptor to VPWidenIntOrFpInduction. (NFC) This allows easier access to the induction descriptor from VPlan, without needing to go through Legal. VPReductionPHIRecipe already contains a RecurrenceDescriptor in a similar fashion. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D115111	2021-12-10 09:55:09 +00:00
Florian Hahn	d74a8a78ad	[LV] Mark various functions as const (NFC). Make sure various accessors do not modify any state, in preparation for D115111.	2021-12-09 10:51:29 +00:00
Florian Hahn	e9a2944495	[VPlan] Verify plan entry and exit blocks, set correct exit block. Both the entry and exit blocks of the top-region of a plan must be VPBasicBlocks. They also must have no predecessors or successors respectively. This invariant was broken when splitting a block for sink-after. To fix the issue, set the exit block of the region after sink-after is done. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D114586	2021-12-07 16:26:31 +00:00
Cullen Rhodes	0395e01583	[IR] Split vscale_range interface Interface is split from: std::pair<unsigned, unsigned> getVScaleRangeArgs() into separate functions for min/max: unsigned getVScaleRangeMin(); Optional<unsigned> getVScaleRangeMax(); Reviewed By: sdesmalen, paulwalker-arm Differential Revision: https://reviews.llvm.org/D114075	2021-12-07 10:38:26 +00:00
Florian Hahn	07276e49e3	[LV] Check VPValue operand instead of Cost::isUniformAfterVec (NFC). ILV::scalarizeInstruction still uses the original IR operands to check if an input value is uniform after vectorization. There is no need to go back to the cost model to figure that out, as the information is already explicit in the VPlan. Just check directly whether the VPValue is defined outside the plan or is a uniform VPReplicateRecipe. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D114253	2021-12-06 18:32:35 +00:00
Sander de Smalen	3d549dddf7	[LV] Pass compare predicate to getCmpSelInstrCost. If the condition of a select is a compare, pass its predicate to TTI::getCmpSelInstrCost to get a more accurate cost value instead of passing BAD_ICMP_PREDICATE. I noticed that the commit message from D90070 had a comment about the vectorized select predicate possibly being composed of other compares with different predicate values, but I wasn't able to construct an example where this was an actual issue. If this is an issue, I guess we could add another check that the block isn't predicated for any reason. Reviewed By: dmgreen, fhahn Differential Revision: https://reviews.llvm.org/D114646	2021-12-06 11:41:27 +00:00
Florian Hahn	7c3c352d82	[VPlan] Separate ctors for VPWidenIntOrFpInduction. (NFC) VPWidenIntOrFpInductionRecipes can either be constructed with a PHI and an optional cast or a PHI and a trunc instruction. Reflect this in 2 separate constructors. This also simplifies a follow-up change.	2021-12-05 12:15:18 +00:00
Florian Hahn	e44298a8f8	[LV] Move code from vectorizeMemoryInstruction to recipe's execute(). The code in widenMemoryInstruction has already been transitioned to only rely on information provided by VPWidenMemoryInstructionRecipe directly. Moving the code directly to VPWidenMemoryInstructionRecipe::execute completes the transition for the recipe. It provides the following advantages: 1. Less indirection, easier to see what's going on. 2. Removes accesses to fields of ILV. 2) in particular ensures that no dependencies on fields in ILV for vector code generation are re-introduced. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D114324	2021-12-01 14:56:51 +00:00
Philip Reames	c41b318423	[LV] Remove unneeded cast to Operator [NFC]	2021-11-30 08:45:13 -08:00
Florian Hahn	dab776dd0f	[LV] Move code from widenSelectInstruction to VPWidenSelectRecipe. (NFC) The code in widenSelectInstruction has already been transitioned to only rely on information provided by VPWidenSelectRecipe directly. Moving the code directly to VPWidenSelectRecipe::execute completes the transition for the recipe. It provides the following advantages: 1. Less indirection, easier to see what's going on. 2. Removes accesses to fields of ILV. 2) in particular ensures that no dependencies on fields in ILV for vector code generation are re-introduced. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D114323	2021-11-30 10:32:44 +00:00
Roman Lebedev	8cd782487f	[X86][LoopVectorize] "Fix" `X86TTIImpl::getAddressComputationCost()` We ask `TTI.getAddressComputationCost()` about the cost of computing vector address, and then multiply it by the vector width. This doesn't make any sense, it implies that we'd do a vector GEP and then scalarize the vector of pointers, but there is no such thing in the vectorized IR, we perform scalar GEP's. This is especially bad on X86, and was effectively prohibiting any scalarized vectorization of gathers/scatters, because `X86TTIImpl::getAddressComputationCost()` says that cost of vector address computation is `10` as compared to `1` for scalar. The computed costs are similar to the ones with D111222+D111220, but we end up without masked memory intrinsics that we'd then have to expand later on, without much luck. (D111363) Differential Revision: https://reviews.llvm.org/D111460	2021-11-30 10:47:56 +03:00
Florian Hahn	fd71159f64	[LV] Move code from widenInstruction to VPWidenRecipe. (NFC) The code in widenInstruction has already been transitioned to only rely on information provided by VPWidenRecipe directly. Moving the code directly to VPWidenRecipe::execute completes the transition for the recipe. It provides the following advantages: 1. Less indirection, easier to see what's going on. 2. Removes accesses to fields of ILV. 2) in particular ensures that no dependencies on fields in ILV for vector code generation are re-introduced. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D114322	2021-11-29 09:09:00 +00:00
Florian Hahn	3495090b9b	[LV] Move code from widenGEP to VPWidenGEPRecipe (NFC). The code in widenGEP has already been transitioned to only rely on information provided by VPWidenGEPRecipe directly. Moving the code directly to VPWidenGEPRecipe::execute completes the transition for the recipe. It provides the following advantages: 1. Less indirection, easier to see what's going on. 2. Removes accesses to fields of ILV. 2) in particular ensures that no dependencies on fields in ILV for GEP code generation are re-introduced. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D114321	2021-11-28 18:29:18 +00:00
Sander de Smalen	28a4deab92	[LV] Fix incorrectly marking a pointer indvar as 'scalar'. collectLoopScalars should only add non-uniform nodes to the list if they are used by a load/store instruction that is marked as CM_Scalarize. Before this patch, the LV incorrectly marked pointer induction variables as 'scalar' when they required to be widened by something else, such as a compare instruction, and weren't used by a node marked as 'CM_Scalarize'. This case is covered by sve-widen-phi.ll. This change also allows removing some code where the LV tried to widen the PHI nodes with a stepvector, even though it was marked as 'scalarAfterVectorization'. Now that this code is more careful about marking instructions that need widening as 'scalar', this code has become redundant. Differential Revision: https://reviews.llvm.org/D114373	2021-11-28 09:49:28 +00:00
David Sherwood	e20391fc5d	[LoopVectorize] When tail-folding, don't always predicate uniform loads In VPRecipeBuilder::handleReplication if we believe the instruction is predicated we then proceed to create new VP region blocks even when the load is uniform and only predicated due to tail-folding. I have updated isPredicatedInst to avoid treating a uniform load as predicated when tail-folding, which means we can do a single scalar load and a vector splat of the value. Tests added here: Transforms/LoopVectorize/AArch64/tail-fold-uniform-memops.ll Differential Revision: https://reviews.llvm.org/D112552	2021-11-26 11:30:54 +00:00
Florian Hahn	2897b67665	[LV] Use OrigLoop instead of induction to get function. (NFC) Upcoming changes will result in Induction not being set/used in some cases. Use OrigLoop to get the function instead.	2021-11-24 20:17:44 +00:00
Rosie Sumpter	df32a39dd0	[LoopVectorize][CostModel] Update cost model for fmuladd intrinsic This patch updates the cost model for ordered reductions so that a call to the llvm.fmuladd intrinsic is modelled as a normal fmul instruction plus the cost of an ordered fadd reduction. Differential Revision: https://reviews.llvm.org/D111630	2021-11-24 08:50:05 +00:00
Rosie Sumpter	991074012a	[LoopVectorize] Propagate fast-math flags for VPInstruction In-loop vector reductions which use the llvm.fmuladd intrinsic involve the creation of two recipes; a VPReductionRecipe for the fadd and a VPInstruction for the fmul. If the call to llvm.fmuladd has fast-math flags these should be propagated through to the fmul instruction, so an interface setFastMathFlags has been added to the VPInstruction class to enable this. Differential Revision: https://reviews.llvm.org/D113125	2021-11-24 08:50:04 +00:00
Rosie Sumpter	c2441b6b89	[LoopVectorize] Add vector reduction support for fmuladd intrinsic Enables LoopVectorize to handle reduction patterns involving the llvm.fmuladd intrinsic. Differential Revision: https://reviews.llvm.org/D111555	2021-11-24 08:50:04 +00:00
Diego Caballero	4348cd42c3	[LV] Drop integer poison-generating flags from instructions that need predication This patch fixes PR52111. The problem is that LV propagates poison-generating flags (`nuw`/`nsw`, `exact` and `inbounds`) in instructions that contribute to the address computation of widen loads/stores that are guarded by a condition. It may happen that when the code is vectorized and the control flow within the loop is linearized, these flags may lead to generating a poison value that is effectively used as the base address of the widen load/store. The fix drops all the integer poison-generating flags from instructions that contribute to the address computation of a widen load/store whose original instruction was in a basic block that needed predication and is not predicated after vectorization. Reviewed By: fhahn, spatel, nlopes Differential Revision: https://reviews.llvm.org/D111846	2021-11-22 10:57:29 +00:00
Florian Hahn	cf8efbd30e	[VPlan] Wrap vector loop blocks in region. A first step towards modeling preheader and exit blocks in VPlan as well. Keeping the vector loop in a region allows for changing the VF as we traverse region boundaries. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D113182	2021-11-20 17:59:48 +00:00
Florian Hahn	76effb001d	[LV] Remove obsolete comment about creating a dummy block (NFC) No dummy pre-entry block is created since a6c4969f5f45. The comment is stale now and can be removed. Mentioned by @Ayal in D113182.	2021-11-19 17:17:04 +00:00
David Sherwood	670dd40244	[Analysis] Fix getNumberOfParts to return 0 when the answer is unknown When asking how many parts are required for a scalable vector type there are occasions when it cannot be computed. For example, <vscale x 1 x i3> is one such vector for AArch64+SVE because at the moment no matter how we promote the i3 type we never end up with a legal vector. This means that getTypeConversion returns TypeScalarizeScalableVector as the LegalizeKind, and then getTypeLegalizationCost returns an invalid cost. This then causes BasicTTImpl::getNumberOfParts to dereference an invalid cost, which triggers an assert. This patch changes getNumberOfParts to return 0 for such cases, since the definition of getNumberOfParts in TargetTransformInfo.h states that we can use a return value of 0 to represent an unknown answer. Currently, LoopVectorize.cpp is the only place where we need to check for 0 as a return value, because all other instances will not currently ask for the number of parts for <vscale x 1 x iX> types. In addition, I have changed the target-independent interface for getNumberOfParts to return 1 and assume there is a single register that can fit the type. The loop vectoriser has lots of tests that are target-independent and they relied upon the 0 value to mean the answer is known and that we are not scalarising the vector. I have added tests here that show we correctly return an invalid cost for VF=vscale x 1 when the loop contains unusual types such as i7: Transforms/LoopVectorize/AArch64/sve-inductions-unusual-types.ll Differential Revision: https://reviews.llvm.org/D113772	2021-11-17 12:07:09 +00:00
Sander de Smalen	f835fe8ef7	[LV] Rename blockNeedsPredication to blockNeedsPredicationForAnyReason. The interface is a convenience function to ask if a block requires predication when widening, but it's important that there are two separate concepts to consider: (A) The block was predicated in the original loop. (B) The block was unpredicated in the original loop, but requires predication because of tail folding. In the case of (B) we know that at least one lane of the vector will be executed, which means we can implement a load from a uniform address with a scalar load + splat (D112552). In the case of predication because of (A), we cannot do this, because the scalar load itself requires predication. The name 'blockNeedsPredication' does not make the distinction between (A) and (B), hence the reason to rename it. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D113392	2021-11-15 08:04:20 +00:00
Florian Hahn	93931d78cf	[LV] Do not rely on InductionDescriptor::getCastInsts. (NFC) Now that CastDef is passed as VPValue, there is no need to access ID.getCastInsts, as CastDef can instead be checked.	2021-11-10 13:03:44 +00:00
Florian Hahn	e7f1232cb7	[LV] Move optimized IV recipes to phi section of header after sinking. Unfortunately sinking recipes for first-order recurrences relies on the original position of recipes. So if a recipe needs to be sunk after an optimized induction, it needs to stay in the original position until sinking is done. This is causing PR52460. To fix the crash, keep the recipes in the original position until sink-after is done. Post-commit follow-up to c45045bfd04af9 to address PR52460.	2021-11-10 11:41:08 +00:00
Kerry McLaughlin	6f16ee5e14	Revert "[LoopVectorize] Extract the last lane from a uniform store" This reverts commit 0d748b4d32cbddf58a1ff83f3ff178ec1ad49edc. This is causing some failures when building Spec2017 with scalable vectors. Reverting to investigate.	2021-11-10 11:21:19 +00:00
Kerry McLaughlin	0d748b4d32	[LoopVectorize] Extract the last lane from a uniform store Changes VPReplicateRecipe to extract the last lane from an unconditional, uniform store instruction. collectLoopUniforms will also add stores to the list of uniform instructions where Legal->isUniformMemOp is true. setCostBasedWideningDecision now sets the widening decision for all uniform memory ops to Scalarize, where previously GatherScatter may have been chosen for scalable stores. This fixes an assert ("Cannot yet scalarize uniform stores") in setCostBasedWideningDecision when we have a loop containing a uniform i1 store and a scalable VF, which we cannot create a scatter for. Reviewed By: sdesmalen, david-arm, fhahn Differential Revision: https://reviews.llvm.org/D112725	2021-11-09 14:43:16 +00:00
Florian Hahn	d9361bfbe2	[VPlan] Add initial inner-loop VPlan verification. This patch adds a function to verify general properties of VPlans. The first check makes sure that all phi-like recipes are at the beginning of a block, with no other recipes in between. Note that this currently may not hold for VPBlendRecipes at the moment, as other recipes may be inserted before the VPBlendRecipe during mask creation. Note that this patch depends on D111300 and D111301, which fix code that breaks the checked invariant. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D111302	2021-11-09 10:18:28 +00:00
Florian Hahn	e3bfb6a146	[VPlan] Make sure recurrence splice is not inserted between phis. All phi-like recipes should be at the beginning of a VPBasicBlock with no other recipes in between. Ensure that the recurrence-splicing recipe is not added between phi-like recipes, but after them. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D111301	2021-11-08 17:42:32 +00:00
Sander de Smalen	2829376bb2	[LV] Use VScaleForTuning to fine-tune the cost per lane. When targeting a specific CPU with scalable vectorization, the knowledge of that particular CPU's vscale value can be used to tune the cost-model and make the cost per lane less pessimistic. If the target implements 'TTI.getVScaleForTuning()', the cost-per-lane is calculated as: Cost / (VScaleForTuning * VF.KnownMinLanes) Otherwise, it assumes a value of 1 meaning that the behavior is unchanged and calculated as: Cost / VF.KnownMinLanes Reviewed By: kmclaughlin, david-arm Differential Revision: https://reviews.llvm.org/D113209	2021-11-08 16:59:46 +00:00
David Sherwood	c63b0f471b	[NFC][LoopVectorize] Make the createStepForVF interface more caller-friendly. The common use case for calling createStepForVF is currently something like: Value Step = createStepForVF(Builder, ConstantInt::get(Ty, UF), VF); and it makes more sense to reduce overall lines of code and change the function to let it create the constant instead. With my patch this becomes: Value Step = createStepForVF(Builder, Ty, VF, UF); and the ConstantInt is created inside createStepForVF instead. A side-effect of this is that the code in createStepForVF also becomes simpler. As part of this patch I've also replaced some calls to getRuntimeVF with calls to createStepForVF, i.e. getRuntimeVF(Builder, Count->getType(), VFactor * UFactor) -> createStepForVF(Builder, Count->getType(), VFactor, UFactor) because this feels semantically better. Differential Revision: https://reviews.llvm.org/D113122	2021-11-08 15:14:14 +00:00
David Sherwood	c42bb30b9e	[LoopVectorize] Permit fixed-width epilogue loops for scalable vector bodies At the moment in LoopVectorizationCostModel::selectEpilogueVectorizationFactor we bail out if the main vector loop uses a scalable VF. This patch adds support for generating epilogue vector loops using a fixed-width VF when the main vector loop uses a scalable VF. I've changed LoopVectorizationCostModel::selectEpilogueVectorizationFactor so that we convert the scalable VF into a fixed-width VF and do profitability checks on that instead. In addition, since the scalable and fixed-width VFs live in different VPlans that means I had to change the calls to LVP.hasPlanWithVFs so that we only pass in the fixed-width VF. New tests added here: Transforms/LoopVectorize/AArch64/sve-epilog-vect.ll Differential Revision: https://reviews.llvm.org/D109432	2021-11-08 09:41:13 +00:00
Kazu Hirata	0d182d9d1e	[Transforms] Use make_early_inc_range (NFC)	2021-11-07 17:03:15 -08:00
Nikita Popov	f8627877a9	[SCEV] Make eraseValueFromMap() private (NFC) The public API for this functionality is forgetValue(). There was only one call from LoopVectorize, which was directly next to a forgetValue() call and as such redundant.	2021-11-06 17:14:02 +01:00
Florian Hahn	b4992dbb21	[LV] Clarify uniform worklist contains instrs demanding lane 0.	2021-11-04 13:11:50 +01:00
David Sherwood	c0f2774973	[NFC][LoopVectorize] Simple tidy-up in InnerLoopVectorizer::createVectorIntOrFpInductionPHI Use getSignedIntOrFpConstant instead of creating int or FP constants manually.	2021-11-03 14:05:21 +00:00
Florian Hahn	64bc31ee93	[LV] Drop unneeded use of getVPSingleValue (NFC). VPReductionPHIRecipe inherits from VPValue, so there's no need to call getVPSingleValue.	2021-11-03 14:26:15 +01:00
Florian Hahn	8e44bdd12a	[VPlan] Make VPWidenCanonicalIVRecipe a VPValue (NFC). The recipe produces exactly one VPValue and can inherit directly from it. This is in line with other recipes and avoids having to use getVPSingleValue.	2021-11-03 14:11:01 +01:00
Rosie Sumpter	dcb8222d87	[LoopVectorize] Propagate fast-math flags for inloop reductions This patch updates VPReductionRecipe::execute so that the fast-math flags associated with the underlying instruction of the VPRecipe are propagated through to the reductions which are created. Differential Revision: https://reviews.llvm.org/D112548	2021-11-02 08:59:53 +00:00
David Sherwood	87a294d5eb	[LoopVectorize] Change getRuntimeVFAsFloat to use unsigned int->FP conversion We never expect the runtime VF to be negative so we should use the uitofp instruction instead of sitofp. Differential revision: https://reviews.llvm.org/D112610	2021-11-01 09:58:14 +00:00
Florian Hahn	c45045bfd0	[VPlan] Keep induction recipes in header. This patch updates recipe creation to ensure all VPWidenIntOrFpInductionRecipes are in the header block. At the moment, new induction recipes can be created in different blocks when trying to optimize casts and induction variables. Having all induction recipes in the header makes it easier to analyze/transform them in VPlan. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D111300	2021-10-28 18:22:05 +01:00
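
The cost-per-lane formula quoted in commit 2829376bb2 can be illustrated with a small standalone sketch. This is not LLVM's implementation: `costPerLane` and its parameters are hypothetical names chosen for illustration, and costs are modeled as plain `double` values rather than LLVM's cost types. A default of 1 for `VScaleForTuning` stands in for the case where the target does not provide a tuning value.

```cpp
#include <cassert>

// Hypothetical helper (not an LLVM API): cost per lane as described in the
// commit message, Cost / (VScaleForTuning * VF.KnownMinLanes).
// VScaleForTuning defaults to 1, which reduces to Cost / VF.KnownMinLanes.
double costPerLane(double Cost, unsigned KnownMinLanes,
                   unsigned VScaleForTuning = 1) {
  assert(KnownMinLanes > 0 && "a VF must have at least one lane");
  assert(VScaleForTuning > 0 && "vscale must be positive");
  return Cost / (static_cast<double>(VScaleForTuning) * KnownMinLanes);
}
```

For example, `costPerLane(16.0, 4, 2)` divides the cost over 8 estimated lanes, while `costPerLane(16.0, 4)` divides it over the 4 known-minimum lanes, so a scalable VF looks less pessimistic once a tuning value is known.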

1499 Commits