llvm-project

Author	SHA1	Message	Date
Rosie Sumpter	df32a39dd0	[LoopVectorize][CostModel] Update cost model for fmuladd intrinsic This patch updates the cost model for ordered reductions so that a call to the llvm.fmuladd intrinsic is modelled as a normal fmul instruction plus the cost of an ordered fadd reduction. Differential Revision: https://reviews.llvm.org/D111630	2021-11-24 08:50:05 +00:00
Rosie Sumpter	2d33327f9d	[LoopVectorize] Print fast-math flags for VPReductionRecipe	2021-11-24 08:50:05 +00:00
Rosie Sumpter	991074012a	[LoopVectorize] Propagate fast-math flags for VPInstruction In-loop vector reductions which use the llvm.fmuladd intrinsic involve the creation of two recipes; a VPReductionRecipe for the fadd and a VPInstruction for the fmul. If the call to llvm.fmuladd has fast-math flags these should be propagated through to the fmul instruction, so an interface setFastMathFlags has been added to the VPInstruction class to enable this. Differential Revision: https://reviews.llvm.org/D113125	2021-11-24 08:50:04 +00:00
Rosie Sumpter	c2441b6b89	[LoopVectorize] Add vector reduction support for fmuladd intrinsic Enables LoopVectorize to handle reduction patterns involving the llvm.fmuladd intrinsic. Differential Revision: https://reviews.llvm.org/D111555	2021-11-24 08:50:04 +00:00
Diego Caballero	4348cd42c3	[LV] Drop integer poison-generating flags from instructions that need predication This patch fixes PR52111. The problem is that LV propagates poison-generating flags (`nuw`/`nsw`, `exact` and `inbounds`) in instructions that contribute to the address computation of widen loads/stores that are guarded by a condition. It may happen that when the code is vectorized and the control flow within the loop is linearized, these flags may lead to generating a poison value that is effectively used as the base address of the widen load/store. The fix drops all the integer poison-generating flags from instructions that contribute to the address computation of a widen load/store whose original instruction was in a basic block that needed predication and is not predicated after vectorization. Reviewed By: fhahn, spatel, nlopes Differential Revision: https://reviews.llvm.org/D111846	2021-11-22 10:57:29 +00:00
Florian Hahn	cf8efbd30e	[VPlan] Wrap vector loop blocks in region. A first step towards modeling preheader and exit blocks in VPlan as well. Keeping the vector loop in a region allows for changing the VF as we traverse region boundaries. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D113182	2021-11-20 17:59:48 +00:00
Florian Hahn	76effb001d	[LV] Remove obsolete comment about creating a dummy block (NFC) No dummy pre-entry block is created since a6c4969f5f45. The comment is stale now and can be removed. Mentioned by @Ayal in D113182.	2021-11-19 17:17:04 +00:00
Alexey Bataev	d1fdf867b1	[SLP][NFC]Introduce TreeEntry::getVectorFactor member function, NFC. Added TreeEntry::getVectorFactor to get the final vectotization factor to simplify the code. Differential Revision: https://reviews.llvm.org/D114190	2021-11-19 06:32:19 -08:00
David Sherwood	670dd40244	[Analysis] Fix getNumberOfParts to return 0 when the answer is unknown When asking how many parts are required for a scalable vector type there are occasions when it cannot be computed. For example, <vscale x 1 x i3> is one such vector for AArch64+SVE because at the moment no matter how we promote the i3 type we never end up with a legal vector. This means that getTypeConversion returns TypeScalarizeScalableVector as the LegalizeKind, and then getTypeLegalizationCost returns an invalid cost. This then causes BasicTTImpl::getNumberOfParts to dereference an invalid cost, which triggers an assert. This patch changes getNumberOfParts to return 0 for such cases, since the definition of getNumberOfParts in TargetTransformInfo.h states that we can use a return value of 0 to represent an unknown answer. Currently, LoopVectorize.cpp is the only place where we need to check for 0 as a return value, because all other instances will not currently ask for the number of parts for <vscale x 1 x iX> types. In addition, I have changed the target-independent interface for getNumberOfParts to return 1 and assume there is a single register that can fit the type. The loop vectoriser has lots of tests that are target-independent and they relied upon the 0 value to mean the answer is known and that we are not scalarising the vector. I have added tests here that show we correctly return an invalid cost for VF=vscale x 1 when the loop contains unusual types such as i7: Transforms/LoopVectorize/AArch64/sve-inductions-unusual-types.ll Differential Revision: https://reviews.llvm.org/D113772	2021-11-17 12:07:09 +00:00
Alexey Bataev	900cc1a226	[SLP]Improve cost of the gather nodes. No need to count the final shuffle cost for the constants, gathering of the constants is just a constant vector + extra inserts, if required. Differential Revision: https://reviews.llvm.org/D113770	2021-11-16 06:25:07 -08:00
Alexey Bataev	cdf8a53c1d	[SLP]Fix windows build, NFC. Need to put `IndexIdx` var to the list of captures.	2021-11-16 06:09:51 -08:00
Alexey Bataev	aa9bbb64be	[SLP]Adjust GEP indices types when trying to build entries. Need to adjust the types of GEPs indices when building the tree entries/operands. Otherwise some of the nodes might differ and vectorizer is unable to correctly find them and count their cost. Differential Revision: https://reviews.llvm.org/D113792	2021-11-16 05:44:33 -08:00
Alexey Bataev	224e46d355	[SLP][DOT][NFCI]Output all scalars for the splats, not only the first one.	2021-11-15 10:54:26 -08:00
Alexey Bataev	036207d5f2	[SLP]Improve splat detection. A bunch of scalars can be treated as a splat not only if all elements are the same but also if some of them are undefvalues. Differential Revision: https://reviews.llvm.org/D113774	2021-11-15 07:50:34 -08:00
Alexey Bataev	b85152f8b1	[SLP][NFC]Use `isa_and_nonnull` and fix comment, NFC.	2021-11-15 06:49:33 -08:00
Alexey Bataev	6fb5bed7d1	[SLP]Do not create unused gather nodes for scalar arguments of vector intrinsics. If the vector intrinsic has scalar argument, we currently still create a tree entry for this argument. This entry is not used, just consumes resources and increases the cost of the tree. Differential Revision: https://reviews.llvm.org/D113806	2021-11-15 06:11:19 -08:00
Sander de Smalen	f835fe8ef7	[LV] Rename blockNeedsPredication to blockNeedsPredicationForAnyReason. The interface is a convenience function to ask if a block requires predication when widening, but it's important that there are two separate concepts to consider: (A) The block was predicated in the original loop. (B) The block was unpredicated in the original loop, but requires predication because of tail folding. In the case of (B) we know that at least one lane of the vector will be executed, which means we can implementing a load from a uniform address with a scalar load + splat (D112552). In the case of predication because of (A), we cannot do this, because the scalar load itself requires predication. The name 'blockNeedsPredication' does not make the distinction between (A) and (B), hence the reason to rename it. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D113392	2021-11-15 08:04:20 +00:00
Alexey Bataev	352c46e707	[SLP]Improve vectorization of split loads. Need to fix ther cost estimation for split loads, since we look at the subregs already, no need to permute them, need just to estimate subregister insert, if it is smaller than the real register. Also, using split loads, it might be profitable already to vectorize smaller trees with gathering of the loads. Differential Revision: https://reviews.llvm.org/D107188	2021-11-12 06:13:22 -08:00
Florian Hahn	93931d78cf	[LV] Do not rely on InductionDescriptor::getCastInsts. (NFC) Now that CastDef is passed as VPValue, there is no need to access ID.getCastInsts, as CastDef can instead be checked.	2021-11-10 13:03:44 +00:00
Florian Hahn	e7f1232cb7	[LV] Move optimized IV recipes to phi section of header after sinking. Unfortunately sinking recipes for first-order recurrences relies on the original position of recipes. So if a recipes needs to be sunk after an optimized induction, it needs to stay in the original position, until sinking is done. This is causing PR52460. To fix the crash, keep the recipes in the original position until sink-after is done. Post-commit follow-up to c45045bfd04af9 to address PR52460.	2021-11-10 11:41:08 +00:00
Kerry McLaughlin	6f16ee5e14	Revert "[LoopVectorize] Extract the last lane from a uniform store" This reverts commit 0d748b4d32cbddf58a1ff83f3ff178ec1ad49edc. This is causing some failures when building Spec2017 with scalable vectors. Reverting to investigate.	2021-11-10 11:21:19 +00:00
Kerry McLaughlin	0d748b4d32	[LoopVectorize] Extract the last lane from a uniform store Changes VPReplicateRecipe to extract the last lane from an unconditional, uniform store instruction. collectLoopUniforms will also add stores to the list of uniform instructions where Legal->isUniformMemOp is true. setCostBasedWideningDecision now sets the widening decision for all uniform memory ops to Scalarize, where previously GatherScatter may have been chosen for scalable stores. This fixes an assert ("Cannot yet scalarize uniform stores") in setCostBasedWideningDecision when we have a loop containing a uniform i1 store and a scalable VF, which we cannot create a scatter for. Reviewed By: sdesmalen, david-arm, fhahn Differential Revision: https://reviews.llvm.org/D112725	2021-11-09 14:43:16 +00:00
Florian Hahn	acbefbf19f	[VPlan] Guard code to dump instructions after d9361bfbe2ce. This should fix build failures when built without assertions enabled, e.g. https://lab.llvm.org/buildbot/#/builders/205/builds/172	2021-11-09 10:29:05 +00:00
Florian Hahn	d9361bfbe2	[VPlan] Add initial inner-loop VPlan verification. This patch adds a function to verify general properties of VPlans. The first check makes sure that all phi-like recipes are at the beginning of a block, with no other recipes in between. Note that this currently may not hold for VPBlendRecipes at the moment, as other recipes may be inserted before the VPBlendRecipe during mask creation. Note that this patch depends on D111300 and D111301, which fix code that breaks the checked invariant. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D111302	2021-11-09 10:18:28 +00:00
Florian Hahn	e3bfb6a146	[VPlan] Make sure recurrence splice is not inserted between phis. All phi-like recipes should be at the beginning of a VPBasicBlock with no other recipes in between. Ensure that the recurrence-splicing recipe is not added between phi-like recipes, but after them. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D111301	2021-11-08 17:42:32 +00:00
Sander de Smalen	2829376bb2	[LV] Use VScaleForTuning to fine-tune the cost per lane. When targeting a specific CPU with scalable vectorization, the knowledge of that particular CPU's vscale value can be used to tune the cost-model and make the cost per lane less pessimistic. If the target implements 'TTI.getVScaleForTuning()', the cost-per-lane is calculated as: Cost / (VScaleForTuning * VF.KnownMinLanes) Otherwise, it assumes a value of 1 meaning that the behavior is unchanged and calculated as: Cost / VF.KnownMinLanes Reviewed By: kmclaughlin, david-arm Differential Revision: https://reviews.llvm.org/D113209	2021-11-08 16:59:46 +00:00
David Sherwood	c63b0f471b	[NFC][LoopVectorize] Make the createStepForVF interface more caller-friendly The common use case for calling createStepForVF is currently something like: Value Step = createStepForVF(Builder, ConstantInt::get(Ty, UF), VF); and it makes more sense to reduce overall lines of code and change the function to let it create the constant instead. With my patch this becomes: Value Step = createStepForVF(Builder, Ty, VF, UF); and the ConstantInt is created instead createStepForVF. A side-effect of this is that the code in createStepForVF is also becomes simpler. As part of this patch I've also replaced some calls to getRuntimeVF with calls to createStepForVF, i.e. getRuntimeVF(Builder, Count->getType(), VFactor * UFactor) -> createStepForVF(Builder, Count->getType(), VFactor, UFactor) because this feels semantically better. Differential Revision: https://reviews.llvm.org/D113122	2021-11-08 15:14:14 +00:00
David Sherwood	c42bb30b9e	[LoopVectorize] Permit fixed-width epilogue loops for scalable vector bodies At the moment in LoopVectorizationCostModel::selectEpilogueVectorizationFactor we bail out if the main vector loop uses a scalable VF. This patch adds support for generating epilogue vector loops using a fixed-width VF when the main vector loop uses a scalable VF. I've changed LoopVectorizationCostModel::selectEpilogueVectorizationFactor so that we convert the scalable VF into a fixed-width VF and do profitability checks on that instead. In addition, since the scalable and fixed-width VFs live in different VPlans that means I had to change the calls to LVP.hasPlanWithVFs so that we only pass in the fixed-width VF. New tests added here: Transforms/LoopVectorize/AArch64/sve-epilog-vect.ll Differential Revision: https://reviews.llvm.org/D109432	2021-11-08 09:41:13 +00:00
Kazu Hirata	0d182d9d1e	[Transforms] Use make_early_inc_range (NFC)	2021-11-07 17:03:15 -08:00
Simon Pilgrim	f057756a1a	[SLP] Fix Wdocumentation warning - remove \returns from void function. NFC.	2021-11-07 15:08:39 +00:00
Nikita Popov	f8627877a9	[SCEV] Make eraseValueFromMap() private (NFC) The public API for this functionality is forgetValue(). There was only one call from LoopVectorize, which was directly next to a forgetValue() call and as such redundant.	2021-11-06 17:14:02 +01:00
Florian Hahn	b4992dbb21	[LV] Clarify uniform worklist contains instrs demanding lane 0.	2021-11-04 13:11:50 +01:00
David Sherwood	c0f2774973	[NFC][LoopVectorize] Simple tidy-up in InnerLoopVectorizer::createVectorIntOrFpInductionPHI Use getSignedIntOrFpConstant instead of creating int or FP constants manually.	2021-11-03 14:05:21 +00:00
Florian Hahn	64bc31ee93	[LV] Drop unneeded use of getVPSingleValue (NFC). VPReductionPHIRecipe inherits from VPValue, so there's no need to call getVPSingleValue.	2021-11-03 14:26:15 +01:00
Florian Hahn	8e44bdd12a	[VPlan] Make VPWidenCanonicalIVRecipe a VPValue (NFC). The recipe produces exactly one VPValue and can inherit directly from it. This is in line with other recipes and avoids having to use getVPSingleValue.	2021-11-03 14:11:01 +01:00
Kazu Hirata	1b108ab975	[Transforms] Use make_early_inc_range (NFC)	2021-11-02 18:13:23 -07:00
Rosie Sumpter	dcb8222d87	[LoopVectorize] Propagate fast-math flags for inloop reductions This patch updates VPReductionRecipe::execute so that the fast-math flags associated with the underlying instruction of the VPRecipe are propagated through to the reductions which are created. Differential Revision: https://reviews.llvm.org/D112548	2021-11-02 08:59:53 +00:00
David Sherwood	87a294d5eb	[LoopVectorize] Change getRuntimeVFAsFloat to use unsigned int->FP conversion We never expect the runtime VF to be negative so we should use the uitofp instruction instead of sitofp. Differential revision: https://reviews.llvm.org/D112610	2021-11-01 09:58:14 +00:00
Florian Hahn	c45045bfd0	[VPlan] Keep induction recipes in header. This patch updates recipe creation to ensure all VPWidenIntOrFpInductionRecipes are in the header block. At the moment, new induction recipes can be created in different blocks when trying to optimize casts and induction variables. Having all induction recipes in the header makes it easier to analyze/transform them in VPlan. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D111300	2021-10-28 18:22:05 +01:00
Alexey Bataev	07ef9f513f	[SLP]Improve/fix reordering of the gathered graph nodes. Gathered loads/extractelements/extractvalue instructions should be checked if they can represent a vector reordering node too and their order should ve taken into account for better graph reordering analysis/ Also, if the gather node has reused scalars, they must be reordered instead of the scalars themselves. Differential Revision: https://reviews.llvm.org/D112454	2021-10-28 05:45:09 -07:00
Alexey Bataev	f06e332982	Revert "[SLP]Improve/fix reordering of the gathered graph nodes." This reverts commit 64d1617d18cb8b6f9511d0eda481fc5a5d0ebddf to fix test non-stability.	2021-10-27 11:16:58 -07:00
Alexey Bataev	64d1617d18	[SLP]Improve/fix reordering of the gathered graph nodes. Gathered loads/extractelements/extractvalue instructions should be checked if they can represent a vector reordering node too and their order should ve taken into account for better graph reordering analysis/ Also, if the gather node has reused scalars, they must be reordered instead of the scalars themselves. Differential Revision: https://reviews.llvm.org/D112454	2021-10-27 08:49:13 -07:00
David Sherwood	5d9318638e	[NFC][LoopVectorize] Change getStepVector to take a Value* for the StartIdx This patch changes the definition of getStepVector from: Value getStepVector(Value Val, int StartIdx, Value Step, ... to Value getStepVector(Value Val, Value StartIdx, Value Step, ... because: 1. it seems inconsistent to pass some values as Value and some as integer, and 2. future work will require the StartIdx to be an expression made up of runtime calculations of the VF. In widenIntOrFpInduction I've changed the code to pass in the value returned from getRuntimeVF, but the presence of the assert: assert(!VF.isScalable() && "scalable vectors not yet supported."); means that currently this code path is only exercised for fixed-width VFs and so the patch is still NFC. Differential revision: https://reviews.llvm.org/D111882	2021-10-27 16:12:38 +01:00
Alexey Bataev	9b12975cbf	Revert "[SLP]Improve/fix reordering of the gathered graph nodes." This reverts commit f719b794bcaa1df8fa82659d6d4e754c77d2f94e to fix instability in tests.	2021-10-27 07:31:36 -07:00
Alexey Bataev	f719b794bc	[SLP]Improve/fix reordering of the gathered graph nodes. Gathered loads/extractelements/extractvalue instructions should be checked if they can represent a vector reordering node too and their order should ve taken into account for better graph reordering analysis/ Also, if the gather node has reused scalars, they must be reordered instead of the scalars themselves. Differential Revision: https://reviews.llvm.org/D112454	2021-10-27 06:08:40 -07:00
Alexey Bataev	cb4feae7bd	[SLP]Fix logical and/or reductions. Need to emit select(cmp) instructions for poison-safe forms of select ops. Currently alive reports that `Target is more poisonous than source` for operations we generating for such instructions. https://alive2.llvm.org/ce/z/FiNiAA Differential Revision: https://reviews.llvm.org/D112562	2021-10-27 04:25:20 -07:00
David Sherwood	3d706c20f8	[NFC][LoopVectorize] Remove setBestPlan in favour of getBestPlanFor I have removed LoopVectorizationPlanner::setBestPlan, since this function is quite aggressive because it deletes all other plans except the one containing the <VF,UF> pair required. The code is currently written to assume that all <VF,UF> pairs will live in the same vplan. This is overly restrictive, since scalable VFs live in different plans to fixed-width VFS. When we add support for vectorising epilogue loops when the main loop uses scalable vectors then we will the vplan for the main loop will be different to the epilogue. Instead I have added a new function called LoopVectorizationPlanner::getBestPlanFor that returns the best vplan for the <VF,UF> pair requested and leaves all the vplans untouched. We then pass this best vplan to LoopVectorizationPlanner::executePlan which now takes an additional VPlanPtr argument. Differential revision: https://reviews.llvm.org/D111125	2021-10-27 09:38:27 +01:00
Rosie Sumpter	b716d0aa94	[LoopVectorize] Clean up VPReductionRecipe::execute. NFC Use RdxDesc->getOpcode instead of getUnderlingInstr()->getOpcode. Move the code which finds Kind and IsOrdered to be outside the for loop since neither of these change with the vector part. Differential Revision: https://reviews.llvm.org/D112547	2021-10-26 17:18:25 +01:00
Alexey Bataev	ce14d1b690	[SLP]Do not reorder reduction nodes. The final reduction nodes should not be reordered, the order does not matter for reductions. Also, it might be profitable to vectorize smaller reduction trees, reduction cost may compensate small tree cost. Part of D111574 Differential Revision: https://reviews.llvm.org/D112467	2021-10-26 07:41:24 -07:00
Alexey Bataev	eb9b75dd4d	[SLP]Change the order of the reduction/binops args pair vectorization attempts. Need to change the order of the reduction/binops args pair vectorization attempts. Need to try to find the reduction at first and postpone vectorization of binops args. This may help to find more reduction patterns and vectorize them. Part of D111574. Differential Revision: https://reviews.llvm.org/D112224	2021-10-25 06:27:14 -07:00

... 14 15 16 17 18 ...

3535 Commits