llvm-project

Author	SHA1	Message	Date
Hongtao Yu	c38c8d6743	[PseudoProbe] Refactoring a test As titled. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D144137	2023-02-15 14:07:51 -08:00
Graham Hunter	0fa5df1959	[LV] Synthesize all true masks for masked vector function variants When vectorizing code with function calls in it, if we encounter a function which only has vectorized variants requiring a mask we can synthesize an all-true mask to enable us to proceed. Since we want the mask to be represented in vplan, the pointer to the chosen Function is now stored as part of the VPWidenCallRecipe, and mask arguments are added at the appropriate index to the recipe operands. Reviewed By: david-arm, fhahn, reames Differential Revision: https://reviews.llvm.org/D132458	2023-02-14 14:33:18 +00:00
Florian Hahn	af3c25dc3d	[VPlan] Fix iterator invalidation in adjustFixedOrderRecurrences. adjustFixedOrderRecurrences may insert instructions immediately after the PHI nodes in the block. This invalidates the phis() iterator. To avoid crashing/accessing invalid recipes, first collect all first-order recurrence phi recipes. This should fix a crash reported by @dmgreen after D142589 landed.	2023-02-13 13:51:14 +00:00
Sander de Smalen	5a115452c4	Reland D143267: [LoopVectorize] Use DataLayout::getIndexType instead of i32 for non-constant GEP indices. Fixed issue where 'ConstantInt::get(IndexTy, -Part)' was executed with the wrong type for Part, e.g. IndexTy was i64, but Part was 'unsigned', which led to things like 'mul i64 .., 4294967292', which was obviously wrong. Also changed sve-vector-reverse.ll to be vectorized with UF>1 to test this. This reverts commit 1f01cdda68614dba12af3cc3aff38541d0abcc6b.	2023-02-09 09:42:29 +00:00
Florian Hahn	c83fdc905a	[LV] Perform recurrence sinking directly on VPlan. This patch updates LV to sink recipes directly using the VPlan use chains. The initial patch only moves sinking to be purely VPlan-based. Follow-up patches will move legality checks to VPlan as well. At the moment, there's a single test failure remaining. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D142589	2023-02-08 15:49:29 +00:00
Sander de Smalen	1f01cdda68	Revert "[LoopVectorize] Use DataLayout::getIndexType instead of i32 for non-constant GEP indices." This patch causes a regression, so reverting it while I investigate the issue. This reverts commit e6eb84a191ca2a1afd5789c5bb398da68bb6065e.	2023-02-08 15:46:52 +00:00
Fangrui Song	af12879146	[RISCV] Allow mismatched SmallDataLimit and use Min for conflicting values Fix an issue about module linking with LTO. When compiling with PIE, the small data limitation needs to be consistent with that in PIC, otherwise there will be linking errors due to conflicting values. bar.c ``` int bar() { return 1; } ``` foo.c ``` int foo() { return 1; } ``` ``` clang --target=riscv64-unknown-linux-gnu -flto -c foo.c -o foo.o -fPIE clang --target=riscv64-unknown-linux-gnu -flto -c bar.c -o bar.o -fPIC clang --target=riscv64-unknown-linux-gnu -flto foo.o bar.o -flto -nostdlib -v -fuse-ld=lld ``` ``` ld.lld: error: linking module flags 'SmallDataLimit': IDs have conflicting values in 'bar.o' and 'ld-temp.o' clang-15: error: linker command failed with exit code 1 (use -v to see invocation) ``` Use Min instead of Error for conflicting SmallDataLimit. Authored by: @joshua-arch1 Signed-off-by: xiaojing.zhang <xiaojing.zhang@xcalibyte.com> Signed-off-by: jianxin.lai <jianxin.lai@xcalibyte.com> Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D131230	2023-02-07 17:13:21 -08:00
Florian Hahn	a69f23493e	[LV] Remove unused load from RISCV test (NFC). The test contained an unused load that appears unrelated to the test (store of vector of i1). Remove it to avoid test changes in follow-up change which will lead to dead loads being removed.	2023-02-07 21:55:44 +00:00
Sander de Smalen	e6eb84a191	[LoopVectorize] Use DataLayout::getIndexType instead of i32 for non-constant GEP indices. This is specifically relevant for loops that vectorize using a scalable VF, where the code results in: %vscale = call i32 llvm.vscale.i32() %vf.part1 = mul i32 %vscale, 4 %gep = getelementptr ..., i32 %vf.part1 Which InstCombine then changes into: %vscale = call i32 llvm.vscale.i32() %vf.part1 = mul i32 %vscale, 4 %vf.part1.zext = sext i32 %vf.part1 to i64 %gep = getelementptr ..., i32 %vf.part1.zext D143016 tried to remove these extends, but that only works when the call to llvm.vscale.i32() has a single use. After doing any kind of CSE on these calls the combine no longer kicks in. It seems more sensible to ask DataLayout what type to use, rather than relying on InstCombine to insert the extend and hoping it can fold it away. I've only changed this for indices that are not constant, because I vaguely remember there was a reason for sticking with i32. It would also mean patching up loads more tests. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D143267	2023-02-07 11:47:51 +00:00
Florian Hahn	40621ff4b8	[LV] Also check interleaving only in select-min-index.ll The new combination exposed a crash in earlier versions of D132063.	2023-02-06 11:30:14 +00:00
Florian Hahn	a9ac22b501	[LV] Add users for loads to make tests more robust. Update a few tests to add users to loads to avoid them being optimized out by future changes. In cases where the unused loads didn't matter for the test, remove them.	2023-02-04 20:42:57 +00:00
Florian Hahn	fd5bccb8b1	[LV] Add initial tests for sinking loads past other instructions. Extend test coverage for sinking loads that use fixed order recurrences.	2023-02-04 18:18:33 +00:00
Craig Topper	df76ff98e8	[InstCombine][LV] Fold (add (zext (add X, -1)), 1) -> (zext X) if X is non-zero. This artifact can appear from the vectorizer. (add X, -1) is the backedge taken count. It gets zero extended and then 1 is added to it to get the trip count. There is usually a dominating branch that rules out X being zero. Alive: https://alive2.llvm.org/ce/z/NsRDwX	2023-01-30 17:45:01 -08:00
Matt Devereau	8ff47f6032	[LoopVectorize] Enable integer Mul and Add as select reduction patterns This patch vectorizes Phi node loop reductions for selects whose condition comes from a floating-point comparison, with its operands being integers for Add, Sub, and Mul reductions. Example: int foo(float *x, int n) { int sum = 0; for (int i=0; i<n; ++i) { float elem = x[i]; if (elem > 0) { sum += 2; } } return sum; } This would previously fail to vectorize due to the integer reduction.	2023-01-30 09:41:40 +00:00
Matt Devereau	4468e27d9f	Revert "[LoopVectorize] Enable integer Mul and Add as select reduction patterns" This reverts commit f90103851f9a381bbf7ed6da250217577afd00d2.	2023-01-26 12:02:16 +00:00
Matt Devereau	f90103851f	[LoopVectorize] Enable integer Mul and Add as select reduction patterns This patch vectorizes Phi node loop reductions for selects whose condition comes from a floating-point comparison, with its operands being integers for Add, Sub, and Mul reductions. Example: int foo(float *x, int n) { int sum = 0; for (int i=0; i<n; ++i) { float elem = x[i]; if (elem > 0) { sum += 2; } } return sum; } Differential Revision: https://reviews.llvm.org/D141842	2023-01-25 13:25:18 +00:00
Florian Hahn	fb40c34b8f	[VPlan] Consider all recipes in replicate blocks as sink candidates. Update sinkScalarOperands to consider all operands of recipes in replicate blocks as sink candidates. This enables additional sinking opportunities and is another step towards retiring LLVM IR-based sinkScalarOperands. This enables iterative sinking of operands for successive calls of sinkScalarOperands. Depends on D139788. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D139790	2023-01-21 17:14:13 +00:00
Daniel Kiss	c4fa504f79	[AArch64] Enable libm vectorized functions via SLEEF It enables trigonometry functions vectorization via SLEEF: http://sleef.org/. - A new vectorization library enum is added to TargetLibraryInfo.h: SLEEF. - A new option is added to TargetLibraryInfoImpl - ClVectorLibrary: SLEEF. - A comprehensive test case is included in this changeset. - A new vectorization library argument is added to -fveclib: -fveclib=SLEEF. Trigonometry functions that are vectorized by sleef: acos asin atan atanh cos cosh exp exp2 exp10 lgamma log10 log2 log sin sinh sqrt tan tanh tgamma Co-authored-by: Stefan Teleman Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D134719	2023-01-20 18:52:38 +01:00
Nikita Popov	9ed2f14c87	[AsmParser] Remove typed pointer auto-detection IR is now always parsed in opaque pointer mode, unless -opaque-pointers=0 is explicitly given. There is no automatic detection of typed pointers anymore. The -opaque-pointers=0 option is added to any remaining IR tests that haven't been migrated yet. Differential Revision: https://reviews.llvm.org/D141912	2023-01-18 09:58:32 +01:00
Florian Hahn	830d0bc56b	[AArch64] Set MaxInterleaveFactor for Apple A14, A15, A16. Those CPUs can benefit from additional interleaving. Reviewed By: jroelofs Differential Revision: https://reviews.llvm.org/D141499	2023-01-11 18:52:51 +00:00
Florian Hahn	42f6783747	[AArch64] Add tests for selecting interleave counts for different CPUs. Add extra tests for interleaving heuristics for different AArch64 CPUs.	2023-01-11 14:50:33 +00:00
Paul Walker	eae26b6640	[IRBuilder] Use canonical i64 type for insertelement index used by vector splats. InstCombine prefers this canonical form (see getPreferredVectorIndex), as does IRBuilder when passing the index as an integer so we may as well use the preferred form from creation. NOTE: All test changes are mechanical with nothing else expected beyond a change of index type from i32 to i64. Differential Revision: https://reviews.llvm.org/D140983	2023-01-11 14:08:06 +00:00
Miguel Saldivar	3c5e0d87f8	[LoopVectorize] Clear cache of `LoopAccessInfoManager` LAI is cached during the LoopDistribute pass, and is later re-used during LoopVectorize. The problem is that LoopVectorize changes SCEV, and the cached LAI does not get updated. Hence, when re-using the cached LAI, it references an invalid SCEV. Fixes #59319 Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D139601	2023-01-11 09:03:40 +00:00
Florian Hahn	78914e8c32	[VPlan] Keep entries in worklist in sinkScalarOperands. Not removing the entries ensures that duplicates are avoided, reducing the number of iterations.	2023-01-08 15:52:57 +00:00
Craig Topper	e5a71a41d8	[RISCV] Add support for the vscale_range attribute. This is based on @frasercrmck's D107290. At least some of the clang portion of D107290 has already been committed. This uses vscale_range for min/max vector width unless the command line overrides are used. As a follow up, I plan to add a max or exact VLEN option to clang to control the vscale_range. This will eliminate many of the reasons for users to use the overrides through the -mllvm interface. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D139873	2023-01-06 08:20:37 -08:00
Nikita Popov	5867241eac	[Transforms] Convert some tests to opaque pointers (NFC)	2023-01-06 12:14:45 +01:00
Florian Hahn	68469a80cb	[LV] Disable runtime unrolling for vectorized loops. This patch adds metadata to disable runtime unrolling to the vectorized loop. If runtime unrolling/interleaving is considered profitable, LV will interleave the loop directly. There should be no need to perform runtime unrolling at a later stage. Note that we already add metadata to disable runtime unrolling to the scalar loop after vectorization. The additional unrolling unnecessarily increases code size and compile time. In addition to that we have several bug reports of unnecessary runtime unrolling for vectorized loops, e.g. PR40961 Compile-time improvements: NewPM-O3: -1.04% NewPM-ReleaseThinLTO: -0.59% NewPM-ReleaseLTO-g: -0.97% https://llvm-compile-time-tracker.com/compare.php?from=ce1be13a868d0f8afa367975558c1a6175cce33a&to=78bc2e67f22e9e10e61cdb6cdac4bb857d95eb1b&stat=instructions:u Fixes #40306. Reviewed By: lebedev.ri, nikic Differential Revision: https://reviews.llvm.org/D115261	2023-01-06 10:56:17 +00:00
David Green	586fd86b0a	[LoopVectorizer] Fix inloop reductions mask placement The validation of vplans could fail if an inloop reduction was created with a block-in mask that did not dominate the reduction. This makes sure that the insert point is set when creating the mask, to ensure it dominates the reduction. Differential Revision: https://reviews.llvm.org/D141003	2023-01-05 11:37:37 +00:00
Augie Fackler	0676156f81	Revert "[VPlan] Also consider operands of sink candidates in same block." This reverts commit aa2414729ebbcb2d8f162e9002a3a6aa768b1f9d. Previously-valid IR from a tensorflow test case (as shown on the Diffusion revision for aa2414729ebbcb2d8f162e9002a3a6aa768b1f9d) started hanging in the loop-vectorize pass. Reverting to keep everyone working.	2023-01-04 16:17:13 -05:00
Nikita Popov	2fab927546	[LoopVectorize] Convert some tests to opaque pointers (NFC) Check lines for some of these tests were regenerated. The difference is that with opaque pointers SCEVExpander always emits i8 GEPs, making the address calculation explicit. This is a known problem that will be solved long term by making all address calculations explicit.	2023-01-04 17:25:42 +01:00
David Green	11f3308ca2	[NFC] Regenerate reduction-inloop.ll check lines. NFC	2023-01-04 16:02:20 +00:00
Florian Hahn	aa2414729e	[VPlan] Also consider operands of sink candidates in same block. Even if the sink candidate is already in the target block, its operands can be candidates for sinking. Queue them up as well. Also moves the queuing logic to a helper.	2022-12-30 18:24:35 +00:00
Florian Hahn	f5c766ba14	[LV] Convert a few tests to use opaque pointers (NFC).	2022-12-27 23:01:41 +00:00
Florian Hahn	e91e62db14	[LV] Sink scalar operands and merge regions repeatedly. Merging regions can enable new sinking opportunities (e.g. if users of a scalar value are moved from different VPBBs into the same VPBB). Sinking in turn can also enable new merging opportunities (e.g. if a recipe between two merge-able regions is moved). To enable more sinking opportunities, repeat sinking & merging if regions could be merged. Also fix mergeReplicateRegions to return the correct Changed status. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D139788	2022-12-27 18:08:32 +00:00
Florian Hahn	36d70a6aea	[VPlan] Remove redundant blocks by merging them into predecessors. Add and run VPlan transform to fold blocks with a single predecessor into the predecessor. This removes redundant blocks and addresses a TODO to replace special handling for the vector latch VPBB. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D139927	2022-12-26 22:47:09 +00:00
Florian Hahn	9758242046	[LV] Use SCEV to check if the trip count <= VF * UF. Just comparing constant trip counts causes LV to miss cases where the vector loop body only executes once. The motivation for this is to remove the need for unrolling to remove vector loop back-edges, if the body only executes once in more cases. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D133017	2022-12-24 18:34:54 +00:00
Florian Hahn	e1650c8d52	[LV] Move exit cond simplification to separate transform. This sets the stage for D133017 by moving out the code that performs VPlan based simplifications to a separate transform that takes the chosen VF & UF as arguments. The main advantage is that this transform runs before any changes to the CFG are being made. This allows using SCEV without worrying about making queries while the IR is in an incomplete state. Note that this patch switches the reasoning to use SCEV, but still only simplifies loops with constant trip counts. Using SCEV here is needed to access the backedge taken count, because the trip count IR value has not been created yet. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D135017	2022-12-23 12:51:21 +00:00
Paul Walker	0bca44680a	[InstCombine] Bubble vector.reverse of binop operands to their result. This mirrors a similar shufflevector transformation so the same effect is obtained for scalable vectors. The transformation is only performed when it can be proven the number of resulting reversals is not increased. By bubbling the reversals from operand to result this should typically be the case and ideally leads to back-to-back shuffles that can be eliminated entirely. Differential Revision: https://reviews.llvm.org/D139342	2022-12-21 15:53:14 +00:00
Paul Walker	362c52ad5a	[InstCombine] Bubble vector.reverse of compare operands to their result. This mirrors a similar shufflevector transformation so the same effect is obtained for scalable vectors. The transformation is only performed when it can be proven the number of resulting reversals is not increased. By bubbling the reversals from operand to result this should typically be the case and ideally leads to back-to-back shuffles that can be eliminated entirely. Differential Revision: https://reviews.llvm.org/D139340	2022-12-21 15:53:14 +00:00
Florian Hahn	f69ac9a22d	[LV] Support widened induction variables in epilogue vectorization. Code generation now uses the start VPValue of induction recipes. This makes it possible to adjust the start value of the epilogue vector loop to use the 'resume' value of the main vector loop. Fixes #59459. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D92132	2022-12-21 13:58:50 +00:00
Florian Hahn	d58f670788	[LV] Add test for #59459 .	2022-12-21 13:23:25 +00:00
Nikita Popov	88419a30a0	[LICM] Allow load-only scalar promotion in the presence of aliasing loads During scalar promotion, if there are additional potentially-aliasing loads outside the promoted set, we can still perform a load-only promotion. As the stores are retained, any potentially-aliasing loads will still read the correct value. This increases the number of load promotions in llvm-test-suite by a factor of two: \| Old \| New licm.NumPromotionCandidates \| 4448 \| 6038 licm.NumLoadPromoted \| 479 \| 1069 licm.NumLoadStorePromoted \| 1459 \| 1459 Unfortunately, this does have some impact on compile-time: http://llvm-compile-time-tracker.com/compare.php?from=57f7f0d6cf0706a88e1ecb74f3d3e8891cceabfa&to=72b811738148aab399966a0435f13b695da1c1c8&stat=instructions In part this is because we now have less early bailouts from promotion, but also due to second order effects (e.g. for one case I looked at we spend more time in SLP now). Differential Revision: https://reviews.llvm.org/D133192	2022-12-20 10:02:46 +01:00
Florian Hahn	c6bbf05a02	[LV] Convert some tests to use opaque pointers (NFC).	2022-12-19 20:55:44 +00:00
Florian Hahn	cf8d8a33c6	[LV] Convert some tests to use opaque pointers (NFC).	2022-12-19 20:44:44 +00:00
Roman Lebedev	4def99e642	[InstCombine] Try to fold `not` into `cmp` iff other users of `cmp` are freely invertible There are still some such patterns that require collaboration of folds to handle, which we don't currently do.	2022-12-19 00:24:28 +03:00
Florian Hahn	3d3634e8bd	[LV] Add extra test for D139927.	2022-12-14 22:47:05 +00:00
Florian Hahn	e898479f2b	[VPlan] Sink non-uniform recipes for scalar plans. In scalar plans, replicate recipes will only generate a single value per UF, independent of whether they are uniform or not. So don't consider uniformity for plans with scalar VFs only. This allows us to handle a few additional cases in VPlan sinking instead of non-VPlan sinkScalarOperands. Depends on D133762. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D134218	2022-12-14 17:55:31 +00:00
Nikita Popov	5b40015063	[LoopVectorize] Convert some tests to opaque pointers (NFC) For these tests update_test_checks.py had to be rerun.	2022-12-14 15:27:31 +01:00
Nikita Popov	7d7577256b	[LoopVectorize] Convert some tests to opaque pointers (NFC)	2022-12-14 15:16:59 +01:00
Philip Reames	b0f904b6da	[LV] Account for minimum vscale when rejecting scalable vectorization of short loops The vectorizer has code to reject scalable vectorization of loops with very short trip counts, and instead use fixed length vectors. The current code doesn't account for the minimum vscale value known, and thus underestimates the number of lanes in the scalable type for RISCV's default configuration. This results in use of predication and a trivially dead loop where a single straight line piece of code would suffice. Note that the code quality of the original scalable vectorization could (and probably should) be improved other ways as well. This patch is solely about whether the scalable vectorization was the right choice to begin with. This bit of code - both with and without my change - does make the unchecked assumption that the target knows how to lower fixed length vectors whose length is provably less than the vector length. Differential Revision: https://reviews.llvm.org/D137285	2022-12-09 11:29:41 -08:00
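
Note on 5a115452c4 (reland of D143267): the 'mul i64 .., 4294967292' symptom described in that message can be reproduced outside LLVM with plain integer arithmetic. The sketch below is only an illustration, not code from the patch. The helper names are hypothetical, and it assumes a platform where `int` is 32 bits, so that negating a `std::uint32_t` wraps at 32 bits before the value is widened.

```cpp
#include <cstdint>

// Hypothetical helper: negate a 32-bit unsigned part number first, then widen.
// The negation wraps modulo 2^32, and the widening zero-extends the result,
// so the value no longer represents -part in a 64-bit index type.
std::uint64_t negate_then_widen(std::uint32_t part) {
  std::uint32_t negated = 0u - part;          // wraps at 32 bits
  return static_cast<std::uint64_t>(negated); // zero-extended to 64 bits
}

// Hypothetical helper: widen to 64 bits first, then negate.
// The result has the same bit pattern as -part interpreted as a signed i64.
std::uint64_t widen_then_negate(std::uint32_t part) {
  std::uint64_t wide = part; // widen before negating
  return 0u - wide;          // wraps at 64 bits
}
```

For a part number of 4, the first helper returns 4294967292 (the constant quoted in the commit message), while the second returns 18446744073709551612, which is the bit pattern of -4 as a 64-bit integer.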


1952 Commits