llvm-project

Author	SHA1	Message	Date
Philip Reames	269bc684e7	[LV][RISCV] Disable vectorization of epilogue loops Epilogue loop vectorization is a feature in the vectorizer intended to avoid running fully scalar code when the vector length of the main loop turns out to be either longer than the trip count of the actual loop, or with a huge remainder. In practice, this feature appears to not have been well tuned. I honestly don't think it should be on by default at all, but it definitely shouldn't be on for RISCV. Note that other targets have also disabled it, but they've done so via disabling interleaving - which is, well, completely unrelated - and we don't want to do that for RISCV. In the near term, many examples I'm seeing have terrible codegen for epilogue vectorization. We are greatly increasing code size for little value at reasonable VLEN values for small types. In the long term, the cases that epilogue vectorization is intended to handle are likely better handled via tail folding on RISCV. As an aside, I also don't really trust the correctness of epilogue vectorization. The code structure is such that otherwise straightforward changes sometimes break only epilogue vectorization. The reuse of an existing vplan without careful validation opens significant room for nasty bugs. Given how rarely the code is exercised, that is not a good combination. As such, this patch introduces a TTI hook, and completely disables epilogue vectorization on RISCV. Differential Revision: https://reviews.llvm.org/D136695	2022-10-25 14:28:02 -07:00
David Green	093b4011e8	[ARM] Add a test demonstrating reductions with reused extend. NFC D136227 showed that tests for this case in getReductionPatternCost were missing.	2022-10-24 19:38:19 +01:00
Florian Hahn	7eb4ec1c75	[VPlan] Print predicates for widened cmp instructions (NFC).	2022-10-21 08:54:11 +01:00
William Huang	6c767cef5a	[InstCombine] Canonicalize GEP of GEP by swapping constant-indexed GEP to the back Canonicalize GEP of GEP by swapping GEP with some suffix constant indices to the back (and GEP with all constant indices to the back of that); this allows more constant index GEP merging to happen. Exceptions are: if swapping violates use-def relations, or anti-optimizes LICM. For constant indexed GEP of GEP, if they cannot be merged directly, they will be cast to i8* and merged. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D125845	2022-10-20 17:41:26 +00:00
Florian Hahn	e25ed058bc	[LV] Use buildScalarSteps to also handle VF = 1. (NFCI) The code in buildScalarSteps already properly handles creating the scalar induction values with VF = 1. Use it directly instead of using extra code to handle that case. Suggested by @Ayal in D133760.	2022-10-20 14:30:01 +01:00
Sander de Smalen	137459aff6	[AArch64][SME] Disable (SLP|Loop)Vectorizer when function may be executed in streaming mode. When the SME attributes tell that a function is or may be executed in Streaming SVE mode, we currently need to be conservative and disable _any_ vectorization (fixed or scalable) because the code-generator does not yet support generating streaming-compatible code. Scalable auto-vec will be gradually enabled in the future when we have confidence that the loop-vectorizer won't use any SVE or NEON instructions that are illegal in Streaming SVE mode. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D135950	2022-10-19 16:42:20 +00:00
Craig Topper	44f0b13494	[RISCV] Correct RISCVTTIImpl::getRegUsageForType for vectors of pointers. getPrimitiveSizeInBits returns 0 for pointers, we need to query the size via DataLayout instead. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D135976	2022-10-14 11:34:12 -07:00
Florian Hahn	518bccfd6e	[LV] Add epilogue test with variable induction start value. Add additional test mentioned by @venkataramanan.kumar.llvm in D92132.	2022-10-13 15:56:27 +01:00
Florian Hahn	26c8632f22	[LV] Add extra tests for epilogue vectorization with widened inductions. Extend test coverage to also include inductions with step > 1 and also with runtime trip counts.	2022-10-12 15:21:38 +01:00
Florian Hahn	c1fe52bfa6	[VPlan] Remove dead recipes before sinking. optimizeInductions may leave dead recipes which can prevent sinking. Sinking on the other hand should not introduce new dead recipes, so clean up dead recipes before sinking. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D133762	2022-10-12 12:49:42 +01:00
Florian Hahn	bbf01ab7b6	[VPlan] Add test for sinking pointer induction increments. Extra test for D133762.	2022-10-12 12:06:40 +01:00
Arthur Eubanks	f3a928e233	[opt] Don't translate legacy -analysis flag to require<analysis> Tests relying on this should explicitly use -passes='require<analysis>,foo'.	2022-10-07 14:54:34 -07:00
Arthur Eubanks	d3d8465446	[opt] Stop treating alias analysis specially when translating legacy opt syntax I've attempted to keep AA tests as close to their original intent as possible.	2022-10-07 11:50:43 -07:00
Guozhi Wei	ded26bf6b9	[IVDescriptors] Before moving an instruction in SinkAfter checking if it is target of other instructions The attached test case can cause LLVM crash in buildVPlanWithVPRecipes because invalid VPlan is generated. FIRST-ORDER-RECURRENCE-PHI ir<%792> = phi ir<%501>, ir<%806> CLONE ir<%804> = fdiv ir<1.000000e+00>, vp<%17> // use of %17 CLONE ir<%806> = load ir<%805> EMIT vp<%17> = first-order splice ir<%792> ir<%806> // def of %17 ... There is a use before def error on %17. When vectorizer generates a VPlan, it generates a "first-order splice" instruction for a loop carried variable after its definition. All related PHI users are changed to use this "first-order splice" result, and are moved after it. The move is guided by a MapVector SinkAfter. And the content of SinkAfter is filled by RecurrenceDescriptor::isFixedOrderRecurrence. Let's look at the first PHI and related instructions %v792 = phi double [ %v806, %Loop ], [ %d1, %Entry ] %v802 = fdiv double %v794, %v792 %v804 = fdiv double 1.000000e+00, %v792 %v806 = load double, ptr %v805, align 8 %v806 is a loop carried variable, %v792 is related PHI instruction. Vectorizer will generate a new "first-order splice" instruction for %v806, and it will be used by %v802 and %v804. So %v802 and %v804 will be moved after %v806 and its "first-order splice" instruction. So SinkAfter contains %v802 -> %v806 %v804 -> %v802 It means %v802 should be moved after %v806 and %v804 will be moved after %v802. Please pay attention that the order is important. When isFixedOrderRecurrence processing PHI instruction %v794, related instructions are %v793 = phi double [ %v813, %Loop ], [ %d1, %Entry ] %v794 = phi double [ %v793, %Loop ], [ %d2, %Entry ] %v802 = fdiv double %v794, %v792 %v813 = load double, ptr %v812, align 8 This time its related loop carried variable is %v813, its user is %v802. So %v802 should also be moved after %v813. But %v802 is already in SinkAfter, because %v813 is later than %v806, so the original %v802 entry in SinkAfter is deleted, a new %v802 entry is added. Now SinkAfter contains %v804 -> %v802 %v802 -> %v813 With these data, %v802 can still be moved after all its operands, but %v804 can't be moved after %v806 and its "first-order splice" instruction. This causes a use before def error. So when removing/re-inserting an instruction I in SinkAfter, we should also recursively remove instructions targeting I and re-insert them into SinkAfter. But for simplicity I just bail out in this case. Differential Revision: https://reviews.llvm.org/D134083	2022-10-03 18:47:51 +00:00
Zain Jaffal	966411790e	[AArch64] Add support to loop vectorization for non temporal loads Currently, AArch64 doesn't support vectorization for non temporal loads because `isLegalNTLoad` is not implemented for the target. This patch applies similar functionality as `D73158` but for non temporal loads Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D131964	2022-10-03 17:06:47 +01:00
Igor Kirillov	a94a85552c	[LoopVectorize] Add missing test for D133687	2022-10-03 14:54:17 +01:00
Florian Hahn	7c0ff64b0f	[LAA] Change to function analysis for new PM. At the moment, LoopAccessAnalysis is a loop analysis for the new pass manager. The issue with that is that LAI caches SCEV expressions and modifications in a loop may impact SCEV expressions in other loops, but we do not have a convenient way to invalidate LAI for other loops within a loop pipeline. To avoid this issue, turn it into a function analysis which returns a manager object that keeps track of the individual LAI objects per loop. Fixes #50940. Fixes #51669. Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D134606	2022-10-01 15:44:27 +01:00
Arthur Eubanks	e23aee7175	[test] Update some legacy PM tests	2022-09-30 11:31:02 -07:00
Florian Hahn	9933a2e9fd	[SCEVExpander] Move LCSSA fixup to ::expand. Move LCSSA fixup from ::expandCodeForImpl to ::expand(). This has the advantage that we directly preserve LCSSA nodes here instead of relying on doing so in rememberInstruction. It also ensures that we don't add the non-LCSSA-safe value to InsertedExpressions. Alternative to D132704. Fixes #57000. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D134739	2022-09-29 20:49:56 +01:00
Igor Kirillov	2d60d7ba1a	[LoopVectorize][Fix] Crash when invariant store address is calculated inside loop Fixes #57572. Generally the LICM pass is responsible for sinking out code that calculates an invariant address inside the loop, as it only needs to be calculated once. But in the rare case where that does not happen, we will not vectorize the loop. Differential Revision: https://reviews.llvm.org/D133687	2022-09-28 10:33:50 +01:00
Philip Reames	dc7387b587	[LV] Adjust cost model to use uniform store lowering for unpredicated uniform stores Follow up to D133580; adjust the cost model to prefer uniform store lowering for scalable stores which are unpredicated. The impact here isn't in the uniform store lowering quality itself. InstCombine happily converts the scatter form into the single store form. The main impact is in letting the rest of the cost model make choices based on the knowledge that the vector will be scalarized on use. Differential Revision: https://reviews.llvm.org/D134460	2022-09-27 07:28:40 -07:00
Florian Hahn	2c692d891e	[LV] Update handling of scalable pointer inductions after b73d2c8. The dependent code has been changed quite a lot since 151c144 which b73d2c8 effectively reverts. Now we run into a case where lowering didn't expect/support the behavior pre 151c144 any longer. Update the code dealing with scalable pointer inductions to also check for uniformity in combination with isScalarAfterVectorization. This should ensure scalable pointer inductions are handled properly during epilogue vectorization. Fixes #57912.	2022-09-23 18:23:02 +01:00
Florian Hahn	17167005d5	[LV] Add test for #57912 . Add test showing miscompilation during epilogue vectorization with SVE.	2022-09-23 11:49:55 +01:00
Florian Hahn	05b3493819	[LV] Convert sve-epilog-vect.ll to use opaque pointers.	2022-09-23 10:24:19 +01:00
Philip Reames	32dc1151e2	[VPlan] Only generate single instr for unpredicated stores of varying value to invariant address This extends the previously added uniform store case to handle stores of loop varying values to a loop invariant address. Note that the placement of this code only allows unpredicated stores; this is important for correctness. (That is "IsPredicated" is always false at this point in the function.) This patch does not include scalable types. The diff felt "large enough" as it were; I'll handle that in a separate patch. (It requires some changes to cost modeling.) Differential Revision: https://reviews.llvm.org/D133580	2022-09-22 08:53:46 -07:00
Simon Pilgrim	e030be64d8	[CostModel][X86] Add partial CostKinds handling for funnelshifts/rotates This mainly just adds costs for the targets where we have actual funnelshift/rotate instructions (VBMI2/XOP etc.) - the cases where we expand still need addressing, although for many the default shift+or expansion, especially for uniform cases, isn't that bad. This was achieved with the 'cost-tables vs llvm-mca' script D103695	2022-09-22 11:24:11 +01:00
Simon Pilgrim	b2cd8118d0	[CostModel][X86] Add CostKinds handling for smax/smin/umax/umin instructions This was achieved with the 'cost-tables vs llvm-mca' script D103695	2022-09-22 10:19:23 +01:00
Philip Reames	8c46881a53	[TTI] Recognize fp constants in getOperandInfo We were recognizing vectors of floats, but not scalars. That's a tad odd.	2022-09-21 14:34:34 -07:00
Graham Hunter	7b420a4a8b	[NFC][LV] Scalarizing test for masked vector calls	2022-09-21 15:43:25 +01:00
Simon Pilgrim	71162ad957	[LoopVectorize] Fix test name - the test is for fshl not cttz intrinsic costs	2022-09-21 15:24:43 +01:00
Sanjay Patel	0f32a5dea0	[InstCombine] don't canonicalize shl+sub to mul+add This stops Negator from transforming: `C1 - shl X, C2 --> mul X, (1<<C2) + C1` ...in the general case. There does not seem to be any analysis benefit to using mul in IR, and there's definitely downside in codegen (particularly when the multiply has to be expanded). If `C1` is 0, then there's a stronger argument that the single mul is a better canonicalization than negate-of-shl, but we may want to remove that too. This was noted as a potential conflict for D133667. Differential Revision: https://reviews.llvm.org/D134310	2022-09-21 08:39:07 -04:00
Simon Pilgrim	09cb9fdef9	[InstCombine] Fold ult(add(x,-1),c) -> ule(x,c) iff x != 0 (PR57635) Alive2: https://alive2.llvm.org/ce/z/sZ6wwS As detailed on Issue #57635 and #37628 - for unsigned comparisons, we can compare prior to a decrement iff the value is known never to be zero. Differential Revision: https://reviews.llvm.org/D134172	2022-09-20 16:44:41 +01:00
Vitaly Buka	bbef90ace4	[IRBuilder] Use PoisonValue in CreateMasked* Followup to 72b776168c7c80d2035c7226488462dcffc97e75 Reviewed By: nlopes Differential Revision: https://reviews.llvm.org/D133967	2022-09-19 11:01:41 -07:00
Florian Hahn	582f8ef19f	[LV] Keep track of cost-based ScalarAfterVec in VPWidenPointerInd. Epilogue vectorization uses isScalarAfterVectorization to check if widened versions for inductions need to be generated and bails out in those cases. At the moment, there are scenarios where isScalarAfterVectorization returns true but VPWidenPointerInduction::onlyScalarsGenerated would return false, causing widening. This can lead to widened phis with incorrect start values being created in the epilogue vector body. This patch addresses the issue by storing the cost-model decision in VPWidenPointerInductionRecipe and restoring the behavior before 151c144. This effectively reverts 151c144, but the long-term fix is to properly support widened inductions during epilogue vectorization. Fixes #57712.	2022-09-19 18:14:35 +01:00
Sebastian Peryt	99c9b37d11	[NFC][1/n] Remove -enable-new-pm=0 flags from lit tests This is the first patch in a series intended for removing flag -enable-new-pm=0 from lit tests. This is part of a bigger effort of completely removing legacy code related to legacy pass manager in favor of currently default new pass manager. In this patch the flag has been removed only from tests where no significant change has been required because checks have been duplicated for both PMs. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D134150	2022-09-19 09:57:37 -07:00
Florian Hahn	f02ff5348f	[LV] Move new epilog-vectorization-widen-inductions.ll to AArch64 dir. The test requires the AArch64 backend, so move it to the right subdir.	2022-09-19 17:13:06 +01:00
Florian Hahn	6087b6386e	[LV] Add tests for epilogue vectorization with widened inductions. Includes a test for the miscompile in #57712.	2022-09-19 17:10:41 +01:00
Simon Pilgrim	393cc6a354	[LoopVectorize] Regenerate runtime-check.ll	2022-09-19 10:25:48 +01:00
Simon Pilgrim	7e626d7a89	[LoopVectorize][X86] Use quotes around the pass list to appease DOS cmd evaluation DOS can't handle -passes='default<O3>' correctly	2022-09-19 10:24:37 +01:00
Sanjay Patel	d6498abc24	[InstCombine] remove multi-use add demanded constant fold This was originally part of D133788. There are no visible regressions. All of the diffs show a large unsigned constant becoming a small negative constant. This should be better for analysis (and slightly less compile-time) and codegen.	2022-09-18 14:23:43 -04:00
Vitaly Buka	ed188b39ab	[test] Regenerate few tests	2022-09-15 12:36:32 -07:00
Simon Pilgrim	0ec028fe10	[CostModel][X86] Add CostKinds handling for vector shift by uniform/constuniform ops Vector shift by const uniform is the cheapest shift instruction we have, non-const uniform have a marginally higher cost - some targets 'splat' the amount internally to use the shift-per-element instruction, others see a higher cost for the explicit zeroing of the upper bits for the (64-bit) shift amount. This was achieved with an updated version of the 'cost-tables vs llvm-mca' script D103695 (I'll update the patch soon for reference)	2022-09-15 14:05:30 +01:00
jacquesguan	ecf327f154	[RISCV] Add cost model for vector insert/extract element. This patch adds cost model for vector insert/extract element instructions. In RVV, we could use vector scalar move instruction to insert or extract the first element, and use vslide to move it. But for mask vector or i64 vector in i32 target, we need special instructions to make it. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D133007	2022-09-14 11:10:18 +08:00
Simon Pilgrim	8ae9cf550b	[LoopVectorize][X86] Add uniform shift costs checks for VF=1/2/4	2022-09-13 13:46:52 +01:00
Philip Reames	4e295cb1ce	[LV] Autogen a test for ease of update	2022-09-09 08:16:22 -07:00
Philip Reames	edb26268ce	[VPlan] Only generate single instr for stores uniform across all parts. Extend the approach taken by D133019 to store instructions. Differential Revision: https://reviews.llvm.org/D133497	2022-09-09 07:15:12 -07:00
Graham Hunter	1f639d1bd2	[NFC][LV] Convert masked call tests to use update script	2022-09-09 10:07:39 +01:00
Craig Topper	5f3a8b585b	[RISCV] Add RecurKind::FMulAdd to isLegalToVectorizeReduction for scalable vectors. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D133511	2022-09-08 12:34:59 -07:00
Philip Reames	4c4c0d2c06	[LV] Use safe-divisor lowering for fixed vectors if profitable This extends the safe-divisor widening scheme recently added for scalable vectors to handle fixed vectors as well. Differential Revision: https://reviews.llvm.org/D132591	2022-09-08 09:15:54 -07:00
Florian Hahn	422cf99161	[VPlan] Only generate single instr for loads uniform across all parts. VPReplicateRecipe::isUniform actually means uniform-per-parts, hence a scalar instruction is generated per-part. This is a potential alternative to D132892. For now the current patch only catches cases where the address is trivially invariant (defined outside VPlan), while D132892 catches any address that is considered invariant by SCEV AFAICT. It should be possible to hoist fully invariant recipes feeding loads out of the vector loop region as well, but in practice LICM should do that already. This version of the patch artificially limits this to loads to make it easier to compare, but this restriction should be easily liftable. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D133019	2022-09-08 14:27:58 +01:00
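
The 09cb9fdef9 entry above describes folding ult(add(x,-1),c) into ule(x,c) only when x is known to be non-zero. As a rough illustration of why that precondition matters, the following sketch brute-forces the equivalence over 8-bit unsigned values. It is not taken from the LLVM sources; the 8-bit width and all function names here are assumptions chosen for the example.

```python
# Illustrative check only: models wrapping 8-bit unsigned arithmetic in Python.
# BITS, MASK and the helper names are hypothetical, not LLVM identifiers.
BITS = 8
MASK = (1 << BITS) - 1


def ult_after_decrement(x, c):
    """Models `icmp ult (add x, -1), c` with wrapping BITS-bit arithmetic."""
    return ((x - 1) & MASK) < c


def ule(x, c):
    """Models `icmp ule x, c`."""
    return x <= c


def fold_mismatches(include_zero):
    """Return every (x, c) pair where the two comparisons disagree."""
    start = 0 if include_zero else 1
    return [
        (x, c)
        for x in range(start, MASK + 1)
        for c in range(MASK + 1)
        if ult_after_decrement(x, c) != ule(x, c)
    ]


if __name__ == "__main__":
    print("mismatches with x != 0:", len(fold_mismatches(include_zero=False)))
    print("mismatches allowing x == 0:", len(fold_mismatches(include_zero=True)))
```

When x is 0 the decrement wraps to the maximum value, so the original comparison is false for every c while ule(0, c) is always true; excluding x = 0 removes every disagreement in this model.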

1867 Commits