llvm-project

Author	SHA1	Message	Date
Ramkumar Ramachandra	04b1276ad3	LoopVectorize/iv-select-cmp: add tests for truncated IV The current tests in iv-select-cmp.ll are not representative of clang output of common real-world C programs, which are often written with i32 induction vars, as opposed to i64 induction vars. Hence, add five tests corresponding to the following programs: int test(int *a, int n) { int rdx = 331; for (int i = 0; i < n; i++) { if (a[i] > 3) rdx = i; } return rdx; } int test(int *a) { int rdx = 331; for (int i = 0; i < 20000; i++) { if (a[i] > 3) rdx = i; } return rdx; } int test(int *a, long n) { int rdx = 331; for (int i = 0; i < n; i++) { if (a[i] > 3) rdx = i; } return rdx; } int test(int *a, unsigned n) { int rdx = 331; for (int i = 0; i < n; i++) { if (a[i] > 3) rdx = i; } return rdx; } int test(int *a) { int rdx = 331; for (long i = INT_MIN - 1; i < UINT_MAX; i++) { if (a[i] > 3) rdx = i; } return rdx; } The first two can theoretically be vectorized without a runtime-check, while the third and fourth cannot. The fifth cannot be vectorized, even with a runtime-check. This issue was found while reviewing D150851. Differential Revision: https://reviews.llvm.org/D156124	2023-08-30 13:09:37 +01:00
Florian Hahn	96e83d3705	[LV] Use IRBuilder to create and optimize middle-block compare. Split off from D150398 to avoid builder-related diff changes there. Using IRBuilder to create ICmps simplifies the result if both operands are constants. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D158332	2023-08-29 11:42:18 +01:00
David Sherwood	c02184f286	[LoopVectorize] Allow inner loop runtime checks to be hoisted above an outer loop Suppose we have a nested loop like this: void foo(int32_t *dst, int32_t *src, int m, int n) { for (int i = 0; i < m; i++) { for (int j = 0; j < n; j++) { dst[(i * n) + j] += src[(i * n) + j]; } } } We currently generate runtime memory checks as a precondition for entering the vectorised version of the inner loop. However, if the runtime-determined trip count for the inner loop is quite small then the cost of these checks becomes quite expensive. This patch attempts to mitigate these costs by adding a new option to expand the memory ranges being checked to include the outer loop as well. This leads to runtime checks that can then be hoisted above the outer loop. For example, rather than looking for a conflict between the memory ranges: 1. &dst[(i * n)] -> &dst[(i * n) + n] 2. &src[(i * n)] -> &src[(i * n) + n] we can instead look at the expanded ranges: 1. &dst[0] -> &dst[((m - 1) * n) + n] 2. &src[0] -> &src[((m - 1) * n) + n] which are outer-loop-invariant. As with many optimisations there is a trade-off here, because there is a danger that using the expanded ranges we may never enter the vectorised inner loop, whereas with the smaller ranges we might enter at least once. I have added a HoistRuntimeChecks option that is turned off by default, but can be enabled for workloads where we know this is guaranteed to be of real benefit. In future, we can also use PGO to determine if this is worthwhile by using the inner loop trip count information. When enabling this option for SPEC2017 on neoverse-v1 with the flags "-Ofast -mcpu=native -flto" I see an overall geomean improvement of ~0.5%: SPEC2017 results (+ is an improvement, - is a regression): 520.omnetpp: +2% 525.x264: +2% 557.xz: +1.2% ...
GEOMEAN: +0.5% I didn't investigate all the differences to see if they are genuine or noise, but I know the x264 improvement is real because it has some hot nested loops with low trip counts where I can see this hoisting is beneficial. Tests have been added here: Transforms/LoopVectorize/runtime-checks-hoist.ll Differential Revision: https://reviews.llvm.org/D152366	2023-08-24 12:14:02 +00:00
David Sherwood	494d28ec07	[LoopVectorize] Add pre-commit tests for D152366 Differential Revision: https://reviews.llvm.org/D154075	2023-08-24 10:52:18 +00:00
Florian Hahn	c071dba1a3	[LV] update hexagon test to use load results. The current version of the test doesn't use any of the loads, so they can be removed together with the mask of the interleave group. Use some loaded values and store them, to prevent the mask from being optimized away.	2023-08-22 20:20:58 +01:00
Florian Hahn	34d25924c4	[VPlan] Mark some VPInstruction opcodes as not having side effects. Mark some VPInstruction opcodes as not having side effects, preparation for D157037.	2023-08-22 20:05:57 +01:00
Kolya Panchenko	acbe886880	[LV] Vectorization remark for outerloop Reviewed By: fhahn, ABataev Differential Revision: https://reviews.llvm.org/D150696	2023-08-21 13:05:06 -04:00
Florian Hahn	686aef8401	[LV] Remove compares and branches on undef from a few tests.	2023-08-18 16:28:42 +01:00
Roland Froese	4d425f8663	[PowerPC] vector cost model add cost to extract i1 Try to avoid some unprofitable predication on PPC. Recognize in the cost model that computing on i1 values will require extra mask or compare operation. Differential Revision: https://reviews.llvm.org/D155876	2023-08-14 17:04:11 -04:00
Kerry McLaughlin	5d814b3848	Revert "[AArch64][SVE2] Change the cost of extends with S/URHADD to 0" This reverts commit dda2cd2505301aa626fcd3e8dea2a447227d00ca.	2023-08-14 10:44:13 +00:00
Kerry McLaughlin	dda2cd2505	[AArch64][SVE2] Change the cost of extends with S/URHADD to 0 When SVE2 is enabled, we can combine an add of 1, add & shift right by 1 to a single s/urhadd instruction. If the operands to the adds are extended, these extends will fold into the s/urhadd and their costs should be 0. Reviewed By: dtemirbulatov Differential Revision: https://reviews.llvm.org/D157628	2023-08-14 10:32:06 +00:00
Anna Thomas	5dfdf34df0	[LV] Move interleaved test to X86 directory Remove the x86-registered-target under REQUIRES.	2023-08-09 16:03:33 -04:00
David Spickett	c09bdfe6f7	[LV] Require x86 target for interleaved access test This is failing on every Linaro bot that only builds the Arm or AArch64 targets, adding X86, it passes.	2023-08-09 09:02:02 +00:00
Anna Thomas	cb7d28ef52	Fix BB failure for check lines Fix clang build bots which complain of missing check lines for Loop access analysis by generating two run lines (original commit: 3cf24dbb).	2023-08-08 20:28:33 -04:00
Anna Thomas	3cf24dbbdd	[LV] Complete load groups and release store groups. Try 2. This is a complete fix for CompleteLoadGroups introduced in D154309. We need to check for dependency between A and every member of the load Group of B. This patch also fixes another miscompile seen when we incorrectly sink stores below a depending load (see testcase in interleaved-accesses-sink-store-across-load.ll). This is fixed by releasing store groups correctly. This change was previously reverted (e85fd3cbdd68) due to Asan failure with use-after-free error. A testcase is added and the bug is fixed in this version of the patch. Differential Revision: https://reviews.llvm.org/D155520	2023-08-08 18:10:23 -04:00
Florian Hahn	af635a5547	[VPlan] Model wrap flags directly, remove NUW opcodes (NFC) Model wrap flags directly using VPRecipeWithIRFlags and clean up the duplicated NUW opcodes. D157144 will build on this and also model FMFs for VPInstruction. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D157194	2023-08-08 12:12:30 +01:00
Florian Hahn	93c5bae00e	[VPlan] Use printOperands for VPInstruction. Use the printOperands for printing VPInstruction's operands to be more in line with other recipes and ensure consistent printing after D15719. Also removes some stray spaces in print output.	2023-08-08 11:31:21 +01:00
Florian Hahn	539acce167	[LV] Add variant of test without dead load. The original test has an unused load, which is removed. Also add a variant with a store that cannot be removed, forcing the mask for the block to always be generated.	2023-08-05 14:15:18 +01:00
Jolanta Jensen	3feb63e112	[TLI][AArch64] Add SLEEF mappings to scalable vector functions for fmod and fmodf This patch adds SLEEF mappings to scalable vector functions for fmod and fmodf. Differential Revision: https://reviews.llvm.org/D156920	2023-08-03 14:33:33 +00:00
Mel Chen	425e9e81a0	[LV] Rename the Select[I|F]Cmp reduction pattern to [I|F]AnyOf. (NFC) Regarding this NFC change, please refer to the discussion in this thread. https://reviews.llvm.org/D150851#4467261 Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D155786	2023-08-03 00:37:19 -07:00
Florian Hahn	cdb7d5767c	[LV] Add test for select truncation. Add test coverage for truncating selects for D149903.	2023-08-01 18:53:36 +01:00
Florian Hahn	707359ecf5	Recommit "[LV] Re-use existing broadcast value for live-ins." This reverts commit 245ec675a4e41f7ec24dfc998720bffdc46a6c53. Recommits eea9258648ce with a fix to only erase the instruction from the first part if it is defined outside the loop. This fixes a use-after-free error reported.	2023-08-01 15:54:02 +01:00
Zhongyunde	497966f7f2	Reland [InstSimplify] Remove the remainder loop if we know the mask is always true We check the loop trip count is known a power of 2 to determine whether the tail loop can be eliminated in D146199. However, the remainder loop of mask scalable loop can also be removed if we know the mask is always going to be true for every vector iteration. Depend on the assume of power-of-two vscale on D155350 proofs: https://alive2.llvm.org/ce/z/bT62Wa Fix https://github.com/llvm/llvm-project/issues/63616. Reviewed By: goldstein.w.n, nikic, david-arm, paulwalker-arm Differential Revision: https://reviews.llvm.org/D154953	2023-08-01 22:20:22 +08:00
Florian Hahn	4e130420e3	[LV] Add test for op truncation from 245ec675a4e41. Add extra test for that issue in 245ec675a4e41. Also generate full check lines for tests, which should now be deterministic on all platforms.	2023-08-01 13:54:50 +01:00
Nikita Popov	d01aec4c76	[InstCombine] Set dead phi inputs to poison in more cases Set phi inputs to poison whenever we find a dead edge (either during initial worklist population or the main InstCombine run), instead of only doing this for successors of dead blocks. This means that the phi operand is set to poison even for critical edges without an intermediate block. There are quite a few test changes, because the pattern is fairly common in vectorizer output, for cases where we know the vectorized loop will be entered.	2023-08-01 11:53:47 +02:00
Nikita Popov	7c64449e44	[LoopVectorize] Regenerate test checks (NFC) To reduce spurious diffs in future changes.	2023-08-01 11:30:55 +02:00
Nikita Popov	eb9fce092a	Revert "[InstSimplify] Remove the remainder loop if we know the mask is always true" This reverts commit 3e386b227886e2fb77b0c1e9182026c4e049f346. Next to the original fold, this also implements an unnecessary and inappropriate simplifyICmpWithDominatingAssume() based fold.	2023-08-01 09:03:20 +02:00
Zhongyunde	3e386b2278	[InstSimplify] Remove the remainder loop if we know the mask is always true We check the loop trip count is known a power of 2 to determine whether the tail loop can be eliminated in D146199. However, the remainder loop of mask scalable loop can also be removed if we know the mask is always going to be true for every vector iteration. Depend on the assume of power-of-two vscale on D155350 proofs: https://alive2.llvm.org/ce/z/FkTMoy Fix https://github.com/llvm/llvm-project/issues/63616. Reviewed By: goldstein.w.n, nikic, david-arm, paulwalker-arm Differential Revision: https://reviews.llvm.org/D154953	2023-08-01 11:20:20 +08:00
Florian Hahn	4162f36bcb	[LV] Regenerate check lines for shrinking tests. Make sure the full IR is checked for loop-vectorization-factors.ll and to make sure nothing gets missed and add missing checks for type-shrinkage-insertelt.ll. Also removes some undef ops from tests.	2023-07-30 16:38:28 +01:00
David Green	2a859b2014	[AArch64] Change the cost of vector insert/extract to 2 The cost of vector instructions has always been high under AArch64, in order to add a high cost for inserts/extracts, shuffles and scalarization. This is a conservative approach to limit the scope of unusual SLP vectorization where the codegen ends up being quite poor, but has always been higher than the correct costs would be for any specific core. This relaxes that, reducing the vector insert/extract cost from 3 to 2. It is a generalization of D142359 to all AArch64 cpus. The ScalarizationOverhead is also overridden for integer vector at the same time, to remove the effect of lane 0 being considered free for integer vectors (something that should only be true for float when scalarizing). The lower insert/extract cost will reduce the cost of insert, extracts, shuffling and scalarization. The adjustments of ScalarizationOverhead will increase the cost on integer, especially for small vectors. The end result will be lower cost for float and long-integer types, some higher cost for some smaller vectors. This, along with the raw insert/extract cost being lower, will generally mean more vectorization from the Loop and SLP vectorizer. We may end up regretting this, as that vectorization is not always profitable. In all the benchmarking I have done this is generally an improvement in the overall performance, and I've attempted to address the places where it wasn't with other costmodel adjustments. Differential Revision: https://reviews.llvm.org/D155459	2023-07-28 21:26:50 +01:00
Florian Hahn	cc39866436	[LV] Reorganize and extend in-loop reduction tests. Split off min-max in-loop reduction tests into separate file and extend them by adding tests with * min & max intrinsics * fmuladd with permuted operands * min & max select tests with permuted operands. Adds extra test coverage as suggested in D155845.	2023-07-26 23:23:14 +01:00
Anna Thomas	e85fd3cbdd	Revert "[LV] Complete load groups and release store groups in presence of dependency" This reverts commit eaf6117f3388615f51198e47c0d6be0252729508 (D155520). There's an ASAN build failure that needs investigation.	2023-07-26 15:07:26 -04:00
Ramkumar Ramachandra	110ec1863a	LoopVectorize/iv-select-cmp: add test for decreasing IV, const start The most straightforward extension to D150851 would involve a loop with decreasing induction variable, with a constant start value. iv-select-cmp.ll only contains a negative test for the decreasing induction variable case when the start value is variable, namely not_vectorized_select_decreasing_induction_icmp. Hence, add a test for the most straightforward extension to D150851, in preparation to vectorize: long rdx = 331; for (long i = 19999; i >= 0; i--) { if (a[i] > 3) rdx = i; } return rdx; Differential Revision: https://reviews.llvm.org/D156152	2023-07-26 14:15:26 +01:00
Anna Thomas	eaf6117f33	[LV] Complete load groups and release store groups in presence of dependency This is a complete fix for CompleteLoadGroups introduced in D154309. We need to check for dependency between A and every member of the load Group of B. This patch also fixes another miscompile seen when we incorrectly sink stores below a depending load (see testcase in interleaved-accesses-sink-store-across-load.ll). This is fixed by releasing store groups correctly. Differential Revision: https://reviews.llvm.org/D155520	2023-07-25 17:32:09 -04:00
Martin Storsjö	245ec675a4	Revert "[LV] Re-use existing broadcast value for live-ins." This reverts commit eea9258648ce73507f6f85c395de978af659d498. That commit triggered crashes in the following testcase: $ cat reduced.c typedef struct { int a[8] } b; typedef struct { b *c; short d } e; void f() { int g; char h; e *i = f; short j = i->d; int a = i->c->a[0]; for (;;) for (; g < a; g++) { h = j * i->d >> 8; h++; } } $ clang -target aarch64-linux-gnu -w -c -O2 reduced.c	2023-07-25 10:35:41 +03:00
Florian Hahn	eea9258648	[LV] Re-use existing broadcast value for live-ins. When requesting a vector value for a live-in, we can re-use the broadcast of the live-in of part 0 for parts > 0.	2023-07-24 11:50:47 +01:00
Maciej Gabka	38cdb007a5	Add missing SLEEF mappings to scalable vector functions for log2 and log2f In the original commit adding SLEEF mappings, https://reviews.llvm.org/D146839 mappings for log2/log2f were missing. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D155801	2023-07-21 13:59:13 +00:00
Maciej Gabka	b172fbff68	Revert "[TLI][AArch64] Add missing SLEEF mappings to scalable vector functions for log2 and log2f" This reverts commit 791c89600aaa288d7066aea95a1e06cd6d61b2e3.	2023-07-21 13:50:10 +00:00
Maciej Gabka	791c89600a	[TLI][AArch64] Add missing SLEEF mappings to scalable vector functions for log2 and log2f In the original commit adding SLEEF mappings, https://reviews.llvm.org/D146839 mappings for log2/log2f were missing. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D155623	2023-07-21 13:46:03 +00:00
David Green	2e0bf67df1	[LV][AArch64] Fix reductions costs in strict-fadd-cost.ll. NFC These tests were originally added in 0aff1798b5721d5f95d16f465b99d, where they were measuring the cost of fadd and fmuladd reductions, which should be fairly high cost. For some reason, due to the forced vector factors, the debug costs of each instruction are printed twice by the vectorizer. Once as if the instruction is a simple fadd/fmuladd, and later with the correct reduction cost. In d827865e9f778f5b27edb2afe003c2a the costs were updated to match the first print statements, where they would be better to match the second to test the cost of the reduction. This patch returns them to testing the original reduction costs.	2023-07-20 10:34:05 +01:00
Mel Chen	4ddc1745a8	[LV] Add tests for select-cmp reduction pattern. (NFC) The test cases for selecting increasing integer induction variable. Reviewed By: fhahn, shiva0217 Differential Revision: https://reviews.llvm.org/D153936	2023-07-19 20:17:36 -07:00
Philip Reames	7cc6b80d9a	[RISCV][CostModel] Model vrgather.vv as being quadratic in LMUL vrgather.vv across multiple vector registers (i.e. LMUL > 1) requires all to all data movement. This includes two conceptual sets of changes: For permutes, we were modeling these as being linear in LMUL. For reverse, we were modeling them as being fixed cost in LMUL. Both were wrong, and have been adjusted to O(LMUL^2). Noticed via code inspection while looking at something else. It's worth asking whether we should be lowering reverse to something other than a vrgather at high LMULs. That shuffle is quite expensive. (Future work) Differential Revision: https://reviews.llvm.org/D152019	2023-07-18 11:52:34 -07:00
Sander de Smalen	08fd44b300	[AArch64] Force streaming-compatible codegen when attributes are set. Before this patch, the only way to generate streaming-compatible code was to use the `-force-streaming-compatible-sve` flag, but the compiler should also avoid the use of instructions invalid in streaming mode when a function has the aarch64_pstate_sm_enabled/compatible attribute. Reviewed By: paulwalker-arm, david-arm Differential Revision: https://reviews.llvm.org/D155428	2023-07-18 10:26:00 +00:00
Florian Hahn	68746a8cea	[LV] Move all VPlan transforms after initial VPlan construction. Reorder VPlan transforms slightly so they are all grouped together, after disabling Value -> VPValue lookup. In terms of codegen impact, this should be NFC modulo a small number of instruction reorderings. Preparation to split up tryToBuildVPlanWithVPRecipes in a follow-up. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D154640	2023-07-18 10:53:30 +01:00
Anna Thomas	a5573bf030	[LV] Precommit test for interleaving miscompile Identified another miscompile while working on fixing interleaving's current miscompile in D154309. This is different from testcases landed in D154309, since it showcases an incorrect sinking of store (the former testcases in that review and follow-up ones) showed incorrect hoisting of loads across stores.	2023-07-17 17:24:40 -04:00
zhongyunde	4d2723bd00	[ValueTracking] Support vscale assumes for isKnownToBeAPowerOfTwo This patch is separated from D154953 to see what tests are affected by this change alone according comment. Depend on the related updating of LangRef on D155193. Reviewed By: paulwalker-arm, nikic, david-arm Differential Revision: https://reviews.llvm.org/D155350	2023-07-15 19:42:58 +08:00
Anna Thomas	dfaf4587e4	Precommit follow-up testcase for interleaved miscompile Follow-up testcase for PR63602. Suggested by Ayal in D154309, more complete fix coming up which should handle this testcase as well.	2023-07-14 16:04:56 -04:00
Maciej Gabka	5b0e19a7ab	[TLI][AArch64] Add mappings to vectorized functions from ArmPL Arm Performance Libraries contain math library which provides vectorized versions of common math functions. This patch allows to use it with clang and llvm via -fveclib=ArmPL or -vector-library=ArmPL, so loops with such calls can be vectorized. The executable needs to be linked with the amath library. Arm Performance Libraries are available at: https://developer.arm.com/Tools%20and%20Software/Arm%20Performance%20Libraries Reviewed by: paulwalker-arm Differential Revision: https://reviews.llvm.org/D154508	2023-07-12 12:53:18 +00:00
Nikita Popov	edb2fc6dab	[llvm] Remove explicit -opaque-pointers flag from tests (NFC) Opaque pointers mode is enabled by default, no need to explicitly enable it.	2023-07-12 14:35:55 +02:00
Mel Chen	0158d86ab3	[LV] Change the test cases to ensure that the trip count is not zero. (NFC) Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D154415	2023-07-11 19:12:59 -07:00
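
The trade-off described in c02184f286 (hoisting inner-loop runtime checks above the outer loop) can be illustrated with a small sketch. This is not LoopVectorize's implementation: `ranges_overlap`, `count_inner_checks_passed` and `hoisted_check_passes` are hypothetical helpers that only mimic the two range comparisons from the commit message, one per outer iteration over `&dst[i * n]` and `&src[i * n]`, and one over the expanded ranges starting at `&dst[0]` and `&src[0]`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true if the ranges [a, a + len) and [b, b + len)
 * overlap. Both pointers are assumed to point into the same allocation. */
static bool ranges_overlap(const int32_t *a, const int32_t *b, size_t len) {
    uintptr_t a0 = (uintptr_t)a, b0 = (uintptr_t)b;
    uintptr_t bytes = (uintptr_t)(len * sizeof(int32_t));
    return a0 < b0 + bytes && b0 < a0 + bytes;
}

/* Un-hoisted form: one check per outer iteration, covering only row i.
 * Returns how many outer iterations would be allowed into a vector loop. */
static int count_inner_checks_passed(const int32_t *dst, const int32_t *src,
                                     int m, int n) {
    assert(m >= 0 && n >= 0);
    int passed = 0;
    for (int i = 0; i < m; i++) {
        const int32_t *d = dst + (size_t)i * (size_t)n;
        const int32_t *s = src + (size_t)i * (size_t)n;
        if (!ranges_overlap(d, s, (size_t)n))
            passed++;
    }
    return passed;
}

/* Hoisted form: a single outer-loop-invariant check over all m * n elements. */
static bool hoisted_check_passes(const int32_t *dst, const int32_t *src,
                                 int m, int n) {
    assert(m >= 0 && n >= 0);
    return !ranges_overlap(dst, src, (size_t)m * (size_t)n);
}
```

With `src` placed exactly one row after `dst` in the same buffer, every per-row check passes while the single expanded check fails, which is the case the commit message warns about: the expanded ranges may keep the loop out of the vectorised path entirely. When the two buffers are far enough apart, both forms pass and the hoisted form performs one check instead of `m`.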


2166 Commits