llvm-project

Author	SHA1	Message	Date
Florian Hahn	4162f36bcb	[LV] Regenerate check lines for shrinking tests. Make sure the full IR is checked for loop-vectorization-factors.ll so that nothing gets missed, and add missing checks for type-shrinkage-insertelt.ll. Also removes some undef ops from tests.	2023-07-30 16:38:28 +01:00
David Green	2a859b2014	[AArch64] Change the cost of vector insert/extract to 2 The cost of vector instructions has always been high under AArch64, in order to add a high cost for inserts/extracts, shuffles and scalarization. This is a conservative approach to limit the scope of unusual SLP vectorization where the codegen ends up being quite poor, but has always been higher than the correct costs would be for any specific core. This relaxes that, reducing the vector insert/extract cost from 3 to 2. It is a generalization of D142359 to all AArch64 CPUs. The ScalarizationOverhead is also overridden for integer vectors at the same time, to remove the effect of lane 0 being considered free for integer vectors (something that should only be true for float when scalarizing). The lower insert/extract cost will reduce the cost of inserts, extracts, shuffling and scalarization. The adjustments of ScalarizationOverhead will increase the cost on integer, especially for small vectors. The end result will be lower cost for float and long-integer types, and somewhat higher cost for some smaller vectors. This, along with the raw insert/extract cost being lower, will generally mean more vectorization from the Loop and SLP vectorizer. We may end up regretting this, as that vectorization is not always profitable. In all the benchmarking I have done this is generally an improvement in the overall performance, and I've attempted to address the places where it wasn't with other costmodel adjustments. Differential Revision: https://reviews.llvm.org/D155459	2023-07-28 21:26:50 +01:00
Florian Hahn	cc39866436	[LV] Reorganize and extend in-loop reduction tests. Split off min-max in-loop reduction tests into separate file and extend them by adding tests with * min & max intrinsics * fmuladd with permuted operands * min & max select tests with permuted operands. Adds extra test coverage as suggested in D155845.	2023-07-26 23:23:14 +01:00
Anna Thomas	e85fd3cbdd	Revert "[LV] Complete load groups and release store groups in presence of dependency" This reverts commit eaf6117f3388615f51198e47c0d6be0252729508 (D155520). There's an ASAN build failure that needs investigation.	2023-07-26 15:07:26 -04:00
Ramkumar Ramachandra	110ec1863a	LoopVectorize/iv-select-cmp: add test for decreasing IV, const start The most straightforward extension to D150851 would involve a loop with decreasing induction variable, with a constant start value. iv-select-cmp.ll only contains a negative test for the decreasing induction variable case when the start value is variable, namely not_vectorized_select_decreasing_induction_icmp. Hence, add a test for the most straightforward extension to D150851, in preparation to vectorize: long rdx = 331; for (long i = 19999; i >= 0; i--) { if (a[i] > 3) rdx = i; } return rdx; Differential Revision: https://reviews.llvm.org/D156152	2023-07-26 14:15:26 +01:00
Anna Thomas	eaf6117f33	[LV] Complete load groups and release store groups in presence of dependency This is a complete fix for CompleteLoadGroups introduced in D154309. We need to check for dependency between A and every member of the load Group of B. This patch also fixes another miscompile seen when we incorrectly sink stores below a depending load (see testcase in interleaved-accesses-sink-store-across-load.ll). This is fixed by releasing store groups correctly. Differential Revision: https://reviews.llvm.org/D155520	2023-07-25 17:32:09 -04:00
Martin Storsjö	245ec675a4	Revert "[LV] Re-use existing broadcast value for live-ins." This reverts commit eea9258648ce73507f6f85c395de978af659d498. That commit triggered crashes in the following testcase: $ cat reduced.c typedef struct { int a[8] } b; typedef struct { b c; short d } e; void f() { int g; char h; e i = f; short j = i->d; int a = i->c->a[0]; for (;;) for (; g < a; g++) { h = j * i->d >> 8; h++; } } $ clang -target aarch64-linux-gnu -w -c -O2 reduced.c	2023-07-25 10:35:41 +03:00
Florian Hahn	eea9258648	[LV] Re-use existing broadcast value for live-ins. When requesting a vector value for a live-in, we can re-use the broadcast of the live-in of part 0 for parts > 0.	2023-07-24 11:50:47 +01:00
Maciej Gabka	38cdb007a5	Add missing SLEEF mappings to scalable vector functions for log2 and log2f In the original commit adding SLEEF mappings, https://reviews.llvm.org/D146839 mappings for log2/log2f were missing. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D155801	2023-07-21 13:59:13 +00:00
Maciej Gabka	b172fbff68	Revert "[TLI][AArch64] Add missing SLEEF mappings to scalable vector functions for log2 and log2f" This reverts commit 791c89600aaa288d7066aea95a1e06cd6d61b2e3.	2023-07-21 13:50:10 +00:00
Maciej Gabka	791c89600a	[TLI][AArch64] Add missing SLEEF mappings to scalable vector functions for log2 and log2f In the original commit adding SLEEF mappings, https://reviews.llvm.org/D146839 mappings for log2/log2f were missing. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D155623	2023-07-21 13:46:03 +00:00
David Green	2e0bf67df1	[LV][AArch64] Fix reductions costs in strict-fadd-cost.ll. NFC These tests were originally added in 0aff1798b5721d5f95d16f465b99d, where they were measuring the cost of fadd and fmuladd reductions, which should be fairly high cost. For some reason, due to the forced vector factors, the debug costs of each instruction are printed twice by the vectorizer. Once as if the instruction is a simple fadd/fmuladd, and later with the correct reduction cost. In d827865e9f778f5b27edb2afe003c2a the costs were updated to match the first print statements, where they would be better to match the second to test the cost of the reduction. This patch returns them to testing the original reduction costs.	2023-07-20 10:34:05 +01:00
Mel Chen	4ddc1745a8	[LV] Add tests for select-cmp reduction pattern. (NFC) The test cases for selecting increasing integer induction variable. Reviewed By: fhahn, shiva0217 Differential Revision: https://reviews.llvm.org/D153936	2023-07-19 20:17:36 -07:00
Philip Reames	7cc6b80d9a	[RISCV][CostModel] Model vrgather.vv as being quadratic in LMUL vrgather.vv across multiple vector registers (i.e. LMUL > 1) requires all-to-all data movement. This includes two conceptual sets of changes: For permutes, we were modeling these as being linear in LMUL. For reverse, we were modeling them as being fixed cost in LMUL. Both were wrong, and have been adjusted to O(LMUL^2). Noticed via code inspection while looking at something else. It's worth asking whether we should be lowering reverse to something other than a vrgather at high LMULs. That shuffle is quite expensive. (Future work) Differential Revision: https://reviews.llvm.org/D152019	2023-07-18 11:52:34 -07:00
Sander de Smalen	08fd44b300	[AArch64] Force streaming-compatible codegen when attributes are set. Before this patch, the only way to generate streaming-compatible code was to use the `-force-streaming-compatible-sve` flag, but the compiler should also avoid the use of instructions invalid in streaming mode when a function has the aarch64_pstate_sm_enabled/compatible attribute. Reviewed By: paulwalker-arm, david-arm Differential Revision: https://reviews.llvm.org/D155428	2023-07-18 10:26:00 +00:00
Florian Hahn	68746a8cea	[LV] Move all VPlan transforms after initial VPlan construction. Reorder VPlan transforms slightly so they are all grouped together, after disabling Value -> VPValue lookup. In terms of codegen impact, this should be NFC modulo a small number of instruction reorderings. Preparation to split up tryToBuildVPlanWithVPRecipes in a follow-up. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D154640	2023-07-18 10:53:30 +01:00
Anna Thomas	a5573bf030	[LV] Precommit test for interleaving miscompile Identified another miscompile while working on fixing interleaving's current miscompile in D154309. This is different from the testcases landed in D154309, since it showcases an incorrect sinking of a store (the former testcases in that review and the follow-up ones showed incorrect hoisting of loads across stores).	2023-07-17 17:24:40 -04:00
zhongyunde	4d2723bd00	[ValueTracking] Support vscale assumes for isKnownToBeAPowerOfTwo This patch is separated from D154953 to see which tests are affected by this change alone, per the review comment. Depends on the related LangRef update in D155193. Reviewed By: paulwalker-arm, nikic, david-arm Differential Revision: https://reviews.llvm.org/D155350	2023-07-15 19:42:58 +08:00
Anna Thomas	dfaf4587e4	Precommit follow-up testcase for interleaved miscompile Follow-up testcase for PR63602. Suggested by Ayal in D154309, more complete fix coming up which should handle this testcase as well.	2023-07-14 16:04:56 -04:00
Maciej Gabka	5b0e19a7ab	[TLI][AArch64] Add mappings to vectorized functions from ArmPL Arm Performance Libraries contain math library which provides vectorized versions of common math functions. This patch allows to use it with clang and llvm via -fveclib=ArmPL or -vector-library=ArmPL, so loops with such calls can be vectorized. The executable needs to be linked with the amath library. Arm Performance Libraries are available at: https://developer.arm.com/Tools%20and%20Software/Arm%20Performance%20Libraries Reviewed by: paulwalker-arm Differential Revision: https://reviews.llvm.org/D154508	2023-07-12 12:53:18 +00:00
Nikita Popov	edb2fc6dab	[llvm] Remove explicit -opaque-pointers flag from tests (NFC) Opaque pointers mode is enabled by default, no need to explicitly enable it.	2023-07-12 14:35:55 +02:00
Mel Chen	0158d86ab3	[LV] Change the test cases to ensure that the trip count is not zero. (NFC) Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D154415	2023-07-11 19:12:59 -07:00
Florian Hahn	d7e79bd7d4	[LV] Check if ops can safely be truncated in computeMinimumValueSizes. Update computeMinimumValueSizes to check if an instruction's operands can safely be truncated. If more than MinBW bits are demanded for the operand or if the operand is a constant and cannot be safely truncated, it is not safe to evaluate the instruction in the narrower MinBW. Skip those cases. Fixes https://github.com/llvm/llvm-project/issues/47927 Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D154717	2023-07-11 20:18:55 +01:00
Florian Hahn	1739200654	[LV] Add trunc test variants with shl and ashr. Add extra tests for D154717 where narrowing results in poison.	2023-07-10 21:04:19 +01:00
Florian Hahn	14ec3f4b06	[LV] Skip VFs > # iterations remaining for epilogue vectorization. If a candidate VF for epilogue vectorization is greater than the number of remaining iterations, the epilogue loop would be dead. Skip such factors. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D154264	2023-07-07 21:43:51 +01:00
Florian Hahn	aee851fd0e	Revert "[LV] Skip VFs < iterations remaining for epilogue vectorization." This reverts commit 7cc0be01a0068946ea3613dc2cb45c81b0f45860. The title of the commit is incorrect, revert to fix the commit message.	2023-07-07 21:41:24 +01:00
Florian Hahn	7cc0be01a0	[LV] Skip VFs < iterations remaining for epilogue vectorization. If a candidate VF for epilogue vectorization is less than the number of remaining iterations, the epilogue loop would be dead. Skip such factors. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D154264	2023-07-07 20:33:42 +01:00
Luke Lau	b9af086292	[RISCV] Update loop vectorizer interleaved access test output 02bb33c3ce7a83d47244ae16c8b4c625aba187a2 changed it so it no longer unrolls the loop.	2023-07-07 15:38:04 +01:00
Nikita Popov	a5e253d659	[LoopVectorize] Regenerate test checks (NFC)	2023-07-07 14:42:31 +02:00
Florian Hahn	4d847bf4d0	[LV] Do not add load to group if it moves across conflicting store. This patch prevents invalid load groups from being formed, where a load needs to be moved across a conflicting store. Once we hit a store that conflicts with a load with an existing interleave group, we need to stop adding earlier loads to the group, as this would force hoisting the previous stores in the group across the conflicting load. To detect such cases, add a new CompletedLoadGroups set, which is used to keep track of load groups to which no earlier loads can be added. Fixes https://github.com/llvm/llvm-project/issues/63602 Reviewed By: anna Differential Revision: https://reviews.llvm.org/D154309	2023-07-07 11:06:30 +01:00
Florian Hahn	6b289304f6	[LV] Add test case for incorrect shift truncation. Test for https://github.com/llvm/llvm-project/issues/47927	2023-07-06 15:23:17 +01:00
Florian Hahn	a0fcf84a8c	[LV] Consider if scalar epilogue is required in getMaximizedVFForTarget. When a scalar epilogue is required, at least one iteration of the scalar loop has to execute. Adjust ConstTripCount accordingly to avoid picking a max VF that results in a dead vector loop. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D154261	2023-07-06 13:35:35 +01:00
Florian Hahn	1746ac42ca	[LV] Forget SCEVs for exit phis after vectorization. After vectorization, the exit blocks of the original loop will have additional predecessors. Invalidate SCEVs for the exit phis in case SE looked through single-entry phis. Fixes https://github.com/llvm/llvm-project/issues/63368 Fixes https://github.com/llvm/llvm-project/issues/63669	2023-07-04 21:28:03 +01:00
Florian Hahn	8a25dc3787	[LV] Regenerate check lines to reduce diff. Regenerate checks to avoid unnecessary changes in D154264.	2023-07-04 14:01:05 +01:00
Evgeniy Brevnov	d7329653d0	[VPlan] Allow sinking of instructions with no defs We started seeing a new failure after D142886. It looks like it enabled new cases and we hit an assert: assert(Current->getNumDefinedValues() == 1 && "only recipes with a single defined value expected"); When we do instruction sinking for the first-order recurrence, we hit the assert if an instruction doesn't have a single def. If an instruction doesn't produce any new def, there are no new users and nothing to sink. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D151204	2023-07-04 16:53:06 +07:00
Florian Hahn	e561edaaa5	[LV] Prepare tests for D154261. Update trip count of test in pr56319-vector-exit-cond-optimization-epilogue-vectorization.ll to make sure epilogue vectorization will still trigger after D154261, checking for the original issue. Move the original test to limit-vf-by-tripcount.ll for testing new functionality of D154261.	2023-07-03 17:49:36 +01:00
Florian Hahn	c14b0a7c55	[LV] Check for vector instruction in main vector loop. Update the test to check for the vectorization call in the main vector loop, not the dead epilogue vector loop as it does currently.	2023-07-03 14:16:47 +01:00
Florian Hahn	6954cb5425	[LV] Add test case for #63602 .	2023-07-02 22:17:16 +01:00
Nikita Popov	bb3763e497	Revert "[SimplifyCFG] Allow dropping block that only contains ephemeral values" This reverts commit 20f0c68fd83a0147a8ec1722bd2e848180610288. https://reviews.llvm.org/D153966#4464594 reports an optimization regression in Rust. Additionally this change has caused an unexpected 0.3% compile-time regression.	2023-06-30 21:24:05 +02:00
Nikita Popov	20f0c68fd8	[SimplifyCFG] Allow dropping block that only contains ephemeral values Perform the TryToSimplifyUncondBranchFromEmptyBlock() transform if the block is empty except for ephemeral values. The ephemeral values will be dropped in that case. This makes sure that assumes don't block this transform, as reported in https://discourse.llvm.org/t/llvm-assume-blocks-optimization/71609. Differential Revision: https://reviews.llvm.org/D153966	2023-06-30 15:24:01 +02:00
Florian Hahn	9078a9942d	[LV] Add additional tests with dead vector epilogues.	2023-06-30 12:17:57 +01:00
Igor Kirillov	17bde328d6	[LV] Add mask support for vectorizing interleaved groups This patch extends LoopVectorize to handle the vectorization of interleaved memory accesses with scalable vectors when a mask is required and/or predicated tail folding is enabled. Differential Revision: https://reviews.llvm.org/D152258	2023-06-29 17:50:56 +00:00
Michael Platings	54c79fa53c	[test] Replace aarch64--eabi with aarch64 Also replace aarch64_be--eabi with aarch64_be Using "eabi" for aarch64 targets is a common mistake and warned by Clang Driver. We want to avoid it elsewhere as well. Just use the common "aarch64" without other triple components. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D153943	2023-06-29 09:06:00 +01:00
Igor Kirillov	7049393a58	[LV] Precommit masked interleaved access tests Precommit for D152258. Differential Revision: https://reviews.llvm.org/D153443	2023-06-28 09:23:23 +00:00
Fangrui Song	ebbfdca586	[test] Replace aarch64-arm-none-eabi with aarch64 Similar to 02e9441d6ca73314afa1973a234dce1e390da1da, but for llvm/test and one lld/test/ELF test.	2023-06-27 19:36:27 -07:00
Florian Hahn	dc9f69e483	[LV] Add test with reduction start values that are/may be poison/undef. Test cases for #62565.	2023-06-22 20:15:23 +01:00
Anna Thomas	ec146cb7c0	[LV] Add support for minimum/maximum intrinsics {mini|maxi}mum intrinsics are different from {min|max}num intrinsics in the propagation of NaN and signed zero. Also, the minnum/maxnum intrinsics require the presence of nsz flags to be valid reductions in vectorizer. In this regard, we introduce a new recurrence kind and also add support for identifying reduction patterns using these intrinsics. The reduction intrinsics and lowering was introduced here: 26bfbec5d2. There are tests added which show how this interacts across chains of min/max patterns. Differential Revision: https://reviews.llvm.org/D151482	2023-06-20 13:17:28 -04:00
Florian Hahn	0a246a0c72	[LV] Use VPValues when creating GEP with all invariant indices. Update VPWidenGEPRecipe::execute to use the VPValue operands of the recipe when creating the GEP instruction. Fixes #63340.	2023-06-16 16:14:01 +01:00
Florian Hahn	ea6ca9cb2b	[LV] Fix crash when stride isn't a constant. In some cases, the stride may not be a constant. Just skip those cases for now. This should only happen for cases where LV interleaves only; if it is vectorized, the stride needs to be versioned to a constant.	2023-06-14 16:53:34 +01:00
Simon Pilgrim	4cbedaeff5	[LoopVectorize][X86] Regenerate slm-no-vectorize.ll	2023-06-13 14:15:37 +01:00


2388 Commits