llvm-project

Author	SHA1	Message	Date
Graham Hunter	0fa5df1959	[LV] Synthesize all true masks for masked vector function variants When vectorizing code with function calls in it, if we encounter a function which only has vectorized variants requiring a mask we can synthesize an all-true mask to enable us to proceed. Since we want the mask to be represented in vplan, the pointer to the chosen Function is now stored as part of the VPWidenCallRecipe, and mask arguments are added at the appropriate index to the recipe operands. Reviewed By: david-arm, fhahn, reames Differential Revision: https://reviews.llvm.org/D132458	2023-02-14 14:33:18 +00:00
Florian Hahn	af3c25dc3d	[VPlan] Fix iterator invalidation in adjustFixedOrderRecurrences. adjustFixedOrderRecurrences may insert instructions after immediately after the PHI nodes in the block. This invalidates the phis() iterator. To avoid crashing/accessing invalid recipes, first collect all first-order recurrence phi recipes. This should fix a crash reported by @dmgreen after D142589 landed.	2023-02-13 13:51:14 +00:00
Sander de Smalen	5a115452c4	Reland D143267: [LoopVectorize] Use DataLayout::getIndexType instead of i32 for non-constant GEP indices. Fixed issue where 'ConstantInt::get(IndextTy, -Part)' was executed with the wrong type for Part, e.g. IndexTy was i64, but Part was 'unsigned', which led to things like 'mul i64 .., 4294967292', which was obviously wrong. Also changed sve-vector-reverse.ll to be vectorized with UF>1 to test this. This reverts commit 1f01cdda68614dba12af3cc3aff38541d0abcc6b.	2023-02-09 09:42:29 +00:00
Sander de Smalen	1f01cdda68	Revert "[LoopVectorize] Use DataLayout::getIndexType instead of i32 for non-constant GEP indices." This patch causes a regression, so reverting it while I investigate the issue. This reverts commit e6eb84a191ca2a1afd5789c5bb398da68bb6065e.	2023-02-08 15:46:52 +00:00
Sander de Smalen	e6eb84a191	[LoopVectorize] Use DataLayout::getIndexType instead of i32 for non-constant GEP indices. This is specifically relevant for loops that vectorize using a scalable VF, where the code results in: %vscale = call i32 llvm.vscale.i32() %vf.part1 = mul i32 %vscale, 4 %gep = getelementptr ..., i32 %vf.part1 Which InstCombine then changes into: %vscale = call i32 llvm.vscale.i32() %vf.part1 = mul i32 %vscale, 4 %vf.part1.zext = sext i32 %vf.part1 to i64 %gep = getelementptr ..., i32 %vf.part1.zext D143016 tried to remove these extends, but that only works when the call to llvm.vscale.i32() has a single use. After doing any kind of CSE on these calls the combine no longer kicks in. It seems more sensible to ask DataLayout what type to use, rather than relying on InstCombine to insert the extend and hoping it can fold it away. I've only changed this for indices that are not constant, because I vaguely remember there was a reason for sticking with i32. It would also mean patching up loads more tests. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D143267	2023-02-07 11:47:51 +00:00
Florian Hahn	a9ac22b501	[LV] Add users for loads to make tests more robust. Update a few tests to add users to loads to avoid them being optimized out by future changes. In cases the unused loads didn't matter for the test, remove them.	2023-02-04 20:42:57 +00:00
Craig Topper	df76ff98e8	[InstCombine][LV] Fold (add (zext (add X, -1)), 1) -> (zext X) if X is non-zero. This artifact can appear from the vectorizer. (add X, -1) is the backedge taken count. It gets zero extended and then 1 is added to it to get the trip count. There is usually a dominating branch that rules out X being zero. Alive: https://alive2.llvm.org/ce/z/NsRDwX	2023-01-30 17:45:01 -08:00
Daniel Kiss	c4fa504f79	[AArch64] Enable libm vectorized functions via SLEEF It enables trigonometry functions vectorization via SLEEF: http://sleef.org/. - A new vectorization library enum is added to TargetLibraryInfo.h: SLEEF. - A new option is added to TargetLibraryInfoImpl - ClVectorLibrary: SLEEF. - A comprehensive test case is included in this changeset. - A new vectorization library argument is added to -fveclib: -fveclib=SLEEF. Trigonometry functions that are vectorized by sleef: acos asin atan atanh cos cosh exp exp2 exp10 lgamma log10 log2 log sin sinh sqrt tan tanh tgamma Co-authored-by: Stefan Teleman Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D134719	2023-01-20 18:52:38 +01:00
Nikita Popov	9ed2f14c87	[AsmParser] Remove typed pointer auto-detection IR is now always parsed in opaque pointer mode, unless -opaque-pointers=0 is explicitly given. There is no automatic detection of typed pointers anymore. The -opaque-pointers=0 option is added to any remaining IR tests that haven't been migrated yet. Differential Revision: https://reviews.llvm.org/D141912	2023-01-18 09:58:32 +01:00
Florian Hahn	830d0bc56b	[AArch64] Set MaxInterleaveFactor for Apple A14, A15, A16. Those CPUs can benefit from additional interleaving. Reviewed By: jroelofs Differential Revision: https://reviews.llvm.org/D141499	2023-01-11 18:52:51 +00:00
Florian Hahn	42f6783747	[AArch64] Add tests for selecting interleave counts for different CPUs. Add extra tests for interleaving heuristics for different AArch64 CPUs.	2023-01-11 14:50:33 +00:00
Paul Walker	eae26b6640	[IRBuilder] Use canonical i64 type for insertelement index used by vector splats. Instcombine prefers this canonical form (see getPreferredVectorIndex), as does IRBuilder when passing the index as an integer so we may as well use the prefered form from creation. NOTE: All test changes are mechanical with nothing else expected beyond a change of index type from i32 to i64. Differential Revision: https://reviews.llvm.org/D140983	2023-01-11 14:08:06 +00:00
Nikita Popov	2fab927546	[LoopVectorize] Convert some tests to opaque pointers (NFC) Check lines for some of these tests were regenerated. The difference is that with opaque pointers SCEVExpander always emits i8 GEPs, making the address calculation explicit. This is a known problem that will be solved long term by making all address calculations explicit.	2023-01-04 17:25:42 +01:00
Paul Walker	0bca44680a	[InstCombine] Bubble vector.reverse of binop operands to their result. This mirrors a similar shufflevector transformation so the same effect is obtained for scalable vectors. The transformation is only performed when it can be proven the number of resulting reversals is not increased. By bubbling the reversals from operand to result this should typically be the case and ideally leads to back-back shuffles that can be elimitated entirely. Differential Revision: https://reviews.llvm.org/D139342	2022-12-21 15:53:14 +00:00
Paul Walker	362c52ad5a	[InstCombine] Bubble vector.reverse of compare operands to their result. This mirrors a similar shufflevector transformation so the same effect is obtained for scalable vectors. The transformation is only performed when it can be proven the number of resulting reversals is not increased. By bubbling the reversals from operand to result this should typically be the case and ideally leads to back-back shuffles that can be elimitated entirely. Differential Revision: https://reviews.llvm.org/D139340	2022-12-21 15:53:14 +00:00
Florian Hahn	f69ac9a22d	[LV] Support widened induction variables in epilogue vectorization. Code generation now uses the start VPValue of induction recipes. This makes it possible to adjust the start value of the epilogue vector loop to use the 'resume' value of the main vector loop. Fixes #59459. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D92132	2022-12-21 13:58:50 +00:00
Nikita Popov	5b40015063	[LoopVectorize] Convert some tests to opaque pointers (NFC) For these tests update_test_checks.py had to be rerun.	2022-12-14 15:27:31 +01:00
Nikita Popov	7d7577256b	[LoopVectorize] Convert some tests to opaque pointers (NFC)	2022-12-14 15:16:59 +01:00
liqinweng	6efb45f5ab	[AARCH64][CostModel] Modified the cost of mask vector load/store Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D134413	2022-12-09 14:11:21 +08:00
Roman Lebedev	1e08a08a87	[NFC] Port all LoopVectorize tests to `-passes=` syntax	2022-12-08 02:38:47 +03:00
Roman Lebedev	be51fa4580	[NFC] Port all runlines for LoopVectorize pass tests to -passes syntax	2022-12-05 22:17:30 +03:00
Florian Hahn	0c5df7cd2f	Recommit "[VPlan] Add VPDerivedIVRecipe, use for VPScalarIVStepsRecipe." This reverts commit bf15f1e489aa2f1ac13268c9081a992a8963eb5b. The updated version fixes a crash by checking the induction kind instead of the opcode; for integer inductions, the step is always added, but the opcode might not be set.	2022-11-30 17:04:20 +00:00
William Huang	be4b1dd35b	[InstCombine] Revert D125845 Reverting D125845 `[InstCombine] Canonicalize GEP of GEP by swapping constant-indexed GEP to the back` because multiple users reported performance regression Reviewed By: davidxl Differential Revision: https://reviews.llvm.org/D138950	2022-11-29 22:02:40 +00:00
Florian Hahn	bf15f1e489	Revert "[VPlan] Add VPDerivedIVRecipe, use for VPScalarIVStepsRecipe." This reverts commit 0fa666ecedc3f36471c0fee925d664512e7525a8. This triggers an assertion during AArch64 stage2 builds. Revert while I investigate. See https://lab.llvm.org/buildbot/#/builders/179/builds/4967/steps/11/logs/stdio	2022-11-28 22:43:11 +00:00
Florian Hahn	0fa666eced	[VPlan] Add VPDerivedIVRecipe, use for VPScalarIVStepsRecipe. This patch splits off the logic to transform the canonical IV to a a value for an induction with a different start and step. This transformation only needs to be done once (independent of VF/UF) and enables sinking of VPScalarIVStepsRecipe as follow-up. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D133758	2022-11-28 16:32:31 +00:00
Florian Hahn	758699c399	[VectorUtils] Skip interleave members with diff type and alloca sizes. Currently, codegen doesn't support cases where the type size doesn't match the alloc size. Skip them for now. Fixes #58722.	2022-11-13 22:06:20 +00:00
Karthik Senthil	d9c52c31a0	[LV][IVDescriptors] Fix recurrence identity element for FMin and FMax reductions For a min and max reduction idioms, the identity (i.e. neutral) element should be datatype's highest and lowest possible values respectively. Current implementation in IVDescriptors incorrectly returns -Inf for FMin reduction and +Inf for FMax reduction. This patch fixes this bug which was causing incorrect reduction computation results in loops vectorized by LV. Differential Revision: https://reviews.llvm.org/D137220	2022-11-04 10:39:37 -04:00
LiDongjin	d1cee3539f	[LoopVectorize] Fix crash on "Cannot dereference end iterator!"(PR56627) Check hasOneUser before user_back(). Differential Revision: https://reviews.llvm.org/D136227	2022-11-03 23:13:37 +08:00
Patrick Walton	f3d49dbcb1	[test] Remove readonly from some parameters that are written through in tests. In D136659 I found a few tests that write through readonly parameters: * Analysis/BasicAA/pr18573.ll: @foo1 writes through %arr.ptr, but declares it readonly. I removed the readonly annotation. * CodeGen/ARM/ParallelDSP/aliasing.ll: @restrict writes through the readonly %arg3, @store_alias_arg3_illegal_1 writes through the readonly %arg3, and @store_alias_arg3_illegal_2 writes through the readonly %arg3. I removed readonly from all three. Also, I added some CHECK-LABEL directives to make it harder for FileCheck output to be mixed up. * Transforms/LoopVectorize/AArch64/sve-gather-scatter.ll: @gather_nxv4i32_ind64_stride2 writes through the readonly %a. I removed the readonly attribute. * Transforms/LoopVectorize/interleaved-accesses.ll: @load_gap_reverse writes through the readonly %P1 and %P2. Also, the corresponding C code in the comment didn't match the test. I removed the readonly attribute from both parameters and corrected the C code. Differential Revision: https://reviews.llvm.org/D136880	2022-10-29 15:05:20 -07:00
William Huang	6c767cef5a	[InstCombine] Canonicalize GEP of GEP by swapping constant-indexed GEP to the back Canonicalize GEP of GEP by swapping GEP with some suffix constant indices to the back (and GEP with all constant indices to the back of that), this allows more constant index GEP merging to happen. Exceptions are: If swapping violates use-def relations, or anti-optimizes LICM For constant indexed GEP of GEP, if they cannot be merged directly, they will be casted to i8* and merged. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D125845	2022-10-20 17:41:26 +00:00
Florian Hahn	e25ed058bc	[LV] Use buildScalarSteps to also handle VF = 1. (NFCI) The code in buildScalarSteps already properly handles creating the scalar induction values with VF = 1. Use it directly instead of using extra code to handle that case. Suggested by @Ayal in D133760.	2022-10-20 14:30:01 +01:00
Sander de Smalen	137459aff6	[AArch64][SME] Disable (SLP\|Loop)Vectorizer when function may be executed in streaming mode. When the SME attributes tell that a function is or may be executed in Streaming SVE mode, we currently need to be conservative and disable _any_ vectorization (fixed or scalable) because the code-generator does not yet support generating streaming-compatible code. Scalable auto-vec will be gradually enabled in the future when we have confidence that the loop-vectorizer won't use any SVE or NEON instructions that are illegal in Streaming SVE mode. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D135950	2022-10-19 16:42:20 +00:00
Florian Hahn	518bccfd6e	[LV] Add epilogue test with variable induction start value. Add additional test mentioned by @venkataramanan.kumar.llvm in D92132.	2022-10-13 15:56:27 +01:00
Florian Hahn	26c8632f22	[LV] Add extra tests for epilogue vectorization with widened inductions. Extend test coverage to also include inductions with step > 1 and also with runtime trip counts.	2022-10-12 15:21:38 +01:00
Arthur Eubanks	f3a928e233	[opt] Don't translate legacy -analysis flag to require<analysis> Tests relying on this should explicitly use -passes='require<analysis>,foo'.	2022-10-07 14:54:34 -07:00
Zain Jaffal	966411790e	[AArch64] Add support to loop vectorization for non temporal loads Currently, AArch64 doesn't support vectorization for non temporal loads because `isLegalNTLoad` is not implemented for the target. This patch applies similar functionality as `D73158` but for non temporal loads Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D131964	2022-10-03 17:06:47 +01:00
Arthur Eubanks	e23aee7175	[test] Update some legacy PM tests	2022-09-30 11:31:02 -07:00
Philip Reames	dc7387b587	[LV] Adjust cost model to use uniform store lowering for unpredicated uniform stores Follow up to D133580; adjust the cost model to prefer uniform store lowering for scalable stores which are unpredicated. The impact here isn't in the uniform store lowering quality itself. InstCombine happily converts the scatter form into the single store form. The main impact is in letting the rest of the cost model make choices based on the knowledge that the vector will be scalarized on use. Differential Revision: https://reviews.llvm.org/D134460	2022-09-27 07:28:40 -07:00
Florian Hahn	2c692d891e	[LV] Update handling of scalable pointer inductions after b73d2c8. The dependent code has been changed quite a lot since 151c144 which b73d2c8 effectively reverts. Now we run into a case where lowering didn't expect/support the behavior pre 151c144 any longer. Update the code dealing with scalable pointer inductions to also check for uniformity in combination with isScalarAfterVectorization. This should ensure scalable pointer inductions are handled properly during epilogue vectorization. Fixes #57912.	2022-09-23 18:23:02 +01:00
Florian Hahn	17167005d5	[LV] Add test for #57912 . Add test showing miscompilation during epilogue vectorization with SVE.	2022-09-23 11:49:55 +01:00
Florian Hahn	05b3493819	[LV] Convert sve-epilog-vect.ll to use opaque pointers.	2022-09-23 10:24:19 +01:00
Philip Reames	32dc1151e2	[VPlan] Only generate single instr for unpredicated stores of varying value to invariant address This extends the previously added uniform store case to handle stores of loop varying values to a loop invariant address. Note that the placement of this code only allows unpredicated stores; this is important for correctness. (That is "IsPredicated" is always false at this point in the function.) This patch does not include scalable types. The diff felt "large enough" as it were; I'll handle that in a separate patch. (It requires some changes to cost modeling.) Differential Revision: https://reviews.llvm.org/D133580	2022-09-22 08:53:46 -07:00
Sanjay Patel	0f32a5dea0	[InstCombine] don't canonicalize shl+sub to mul+add This stops Negator from transforming: `C1 - shl X, C2 --> mul X, (1<<C2) + C1` ...in the general case. There does not seem to be any analysis benefit to using mul in IR, and there's definitely downside in codegen (particularly when the multiply has to be expanded). If `C1` is 0, then there's a stronger argument that the single mul is a better canonicalization than negate-of-shl, but we may want to remove that too. This was noted as a potential conflict for D133667. Differential Revision: https://reviews.llvm.org/D134310	2022-09-21 08:39:07 -04:00
Simon Pilgrim	09cb9fdef9	[InstCombine] Fold ult(add(x,-1),c) -> ule(x,c) iff x != 0 (PR57635) Alive2: https://alive2.llvm.org/ce/z/sZ6wwS As detailed on Issue #57635 and #37628 - for unsigned comparisons, we can compare prior to a decrement iff the value is known never to be zero. Differential Revision: https://reviews.llvm.org/D134172	2022-09-20 16:44:41 +01:00
Vitaly Buka	bbef90ace4	[IRBuilder] Use PoisonValue in CreateMasked* Followup to 72b776168c7c80d2035c7226488462dcffc97e75 Reviewed By: nlopes Differential Revision: https://reviews.llvm.org/D133967	2022-09-19 11:01:41 -07:00
Florian Hahn	582f8ef19f	[LV] Keep track of cost-based ScalarAfterVec in VPWidenPointerInd. Epilogue vectorization uses isScalarAfterVectorization to check if widened versions for inductions need to be generated and bails out in those cases. At the moment, there are scenarios where isScalarAfterVectorization returns true but VPWidenPointerInduction::onlyScalarsGenerated would return false, causing widening. This can lead to widened phis with incorrect start values being created in the epilogue vector body. This patch addresses the issue by storing the cost-model decision in VPWidenPointerInductionRecipe and restoring the behavior before 151c144. This effectively reverts 151c144, but the long-term fix is to properly support widened inductions during epilogue vectorization Fixes #57712.	2022-09-19 18:14:35 +01:00
Florian Hahn	f02ff5348f	[LV] Move new epilog-vectorization-widen-inductions.ll to AArch64 dir. The test requires the AArch64 backend, so move it to the right subdir.	2022-09-19 17:13:06 +01:00
Sanjay Patel	d6498abc24	[InstCombine] remove multi-use add demanded constant fold This was originally part of D133788. There are no visible regressions. All of the diffs show a large unsigned constant becoming a small negative constant. This should be better for analysis (and slightly less compile-time) and codegen.	2022-09-18 14:23:43 -04:00
Graham Hunter	1f639d1bd2	[NFC][LV] Convert masked call tests to use update script	2022-09-09 10:07:39 +01:00
Florian Hahn	fc444ddc77	[VPlan] Add field to track if intrinsic should be used for call. (NFC) This patch moves the cost-based decision whether to use an intrinsic or library call to the point where the recipe is created. This untangles code-gen from the cost model and also avoids doing some extra work as the information is already computed at construction. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D132585	2022-09-01 13:14:40 +01:00

1 2 3 4 5 ...

394 Commits