llvm-project

Author	SHA1	Message	Date
Sanjay Patel	39de82ecc9	[SLP] fix insertion point for min/max reduction As discussed in D70148 (and caused a revert of the original commit): if we insert at the select, then we can produce invalid IR because the replacement for the compare may have uses before the select.	2019-11-19 10:50:10 -05:00
Evgeniy Brevnov	4a64d710ae	[NFC] Test commit. Please ignore. As a test commit I fixed a misspelling in one of comments in SLP vectorizer.	2019-11-19 15:41:57 +07:00
Eric Christopher	6f1cc4151a	Temporarily revert "[SLP] fix miscompile on min/max reductions with extra uses (PR43948)" as it causes an ICE on valid. A testcase was followed up on the original thread. This reverts commit a3e61946c5bd7bdfab15af76b292e52d6ffa27f7.	2019-11-18 14:41:37 -08:00
Richard Smith	7889d8e7eb	Revert "[LoadStoreVectorize] Use '\|\|' instead of '\|' between sides with function calls. NFCI." This broke two tests. Presumably the non-short-circuting '\|' was intentional here. This reverts commit f7efea0ded8e16c7751b378523407a491016edd6.	2019-11-15 12:49:35 -08:00
Dávid Bolvanský	f7efea0ded	[LoadStoreVectorize] Use '\|\|' instead of '\|' between sides with function calls. NFCI. Fixes warning from PVS Studio	2019-11-15 18:51:13 +01:00
Reid Kleckner	4c1a1d3cf9	Add missing includes needed to prune LLVMContext.h include, NFC These are a pre-requisite to removing #include "llvm/Support/Options.h" from LLVMContext.h: https://reviews.llvm.org/D70280	2019-11-14 15:23:15 -08:00
Alexey Bataev	bfa32573bf	Revert "Temporarily Revert:" This reverts commit e511c4b0dff1692c267addf17dce3cebe8f97faa: Temporarily Revert: "[SLP] Generalization of stores vectorization." "[SLP] Fix -Wunused-variable. NFC" "[SLP] Vectorize jumbled stores." after fixing the problem with compile time.	2019-11-14 16:38:20 -05:00
Sjoerd Meijer	cb47b87830	[LV] PreferPredicateOverEpilog respecting predicate loop hint The vectoriser queries TTI->preferPredicateOverEpilogue to determine if tail-folding is preferred for a loop, but it was not respecting loop hint 'predicate' that can disable this, which has now been added. This showed that we were incorrectly initialising loop hint 'vectorize.predicate.enable' with 0 (i.e. FK_Disabled) but this should have been FK_Undefined, which has been fixed. Differential Revision: https://reviews.llvm.org/D70125	2019-11-14 13:10:44 +00:00
Reid Kleckner	05da2fe521	Sink all InitializePasses.h includes This file lists every pass in LLVM, and is included by Pass.h, which is very popular. Every time we add, remove, or rename a pass in LLVM, it caused lots of recompilation. I found this fact by looking at this table, which is sorted by the number of times a file was changed over the last 100,000 git commits multiplied by the number of object files that depend on it in the current checkout: recompiles touches affected_files header 342380 95 3604 llvm/include/llvm/ADT/STLExtras.h 314730 234 1345 llvm/include/llvm/InitializePasses.h 307036 118 2602 llvm/include/llvm/ADT/APInt.h 213049 59 3611 llvm/include/llvm/Support/MathExtras.h 170422 47 3626 llvm/include/llvm/Support/Compiler.h 162225 45 3605 llvm/include/llvm/ADT/Optional.h 158319 63 2513 llvm/include/llvm/ADT/Triple.h 140322 39 3598 llvm/include/llvm/ADT/StringRef.h 137647 59 2333 llvm/include/llvm/Support/Error.h 131619 73 1803 llvm/include/llvm/Support/FileSystem.h Before this change, touching InitializePasses.h would cause 1345 files to recompile. After this change, touching it only causes 550 compiles in an incremental rebuild. Reviewers: bkramer, asbirlea, bollu, jdoerfert Differential Revision: https://reviews.llvm.org/D70211	2019-11-13 16:34:37 -08:00
Sanjay Patel	a3e61946c5	[SLP] fix miscompile on min/max reductions with extra uses (PR43948) The bug manifests as replacing a reduction operand with an undef value. The problem appears to be limited to cases where a min/max reduction has extra uses of the compare operand to the select. In the general case, we are tracking "ExternallyUsedValues" and an "IgnoreList" of the reduction operations, but those may not apply to the final compare+select in a min/max reduction. For that, we use replaceAllUsesWith (RAUW) to ensure that the new vectorized reduction values are transferred to all subsequent users. Differential Revision: https://reviews.llvm.org/D70148	2019-11-13 15:57:35 -05:00
Sanjay Patel	e9bf7a60a0	[SLP] reduce code duplication for min/max vs. other reductions; NFCI	2019-11-13 11:26:08 -05:00
Simon Pilgrim	d1bd5e476b	SLPVectorizer - make comparison operators + isInSchedulingRegion const Fixes cppcheck warnings.	2019-11-13 14:40:19 +00:00
Vasileios Porpodas	6a18a95487	[SLP] Look-ahead operand reordering heuristic. Summary: This patch introduces a new heuristic for guiding operand reordering. The new "look-ahead" heuristic can look beyond the immediate predecessors. This helps break ties when the immediate predecessors have identical opcodes (see lit test for examples). Reviewers: RKSimon, ABataev, dtemirbulatov, Ayal, hfinkel, rnk Reviewed By: RKSimon, dtemirbulatov Subscribers: xbolva00, Carrot, hiraditya, phosek, rnk, rcorcs, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60897	2019-11-11 21:06:51 -08:00
Simon Pilgrim	4ff246fef2	Remove unused variable (which allows us to remove vector include). NFC.	2019-11-10 12:16:23 +00:00
Gil Rapaport	7f152543e4	[LV] Apply sink-after & interleave-groups as VPlan transformations (NFCI) This recommits 11ed1c0239fd51fd2f064311dc7725277ed0a994 (reverted in 9f08ce0d2197d4f163dfa4633eae2347ce8fc881 for failing an assert) with a fix: tryToWidenMemory() now first checks if the widening decision is to interleave, thus maintaining previous behavior where tryToInterleaveMemory() was called first, giving priority to interleave decisions over widening/scalarization. This commit adds the test case that exposed this bug as a LIT.	2019-11-09 20:52:25 +02:00
Gil Rapaport	9f08ce0d21	Revert "[LV] Apply sink-after & interleave-groups as VPlan transformations (NFCI)" This reverts commit 11ed1c0239fd51fd2f064311dc7725277ed0a994 - causes an assert failure.	2019-11-08 22:17:11 +02:00
Gil Rapaport	11ed1c0239	[LV] Apply sink-after & interleave-groups as VPlan transformations (NFCI) This recommits 100e797adb433724a17c9b42b6533cd634cb796b (reverted in 009e032634b3bd7fc32071ac2344b12142286477 for failing an assert). While the root cause was independently reverted in eaff3004019f97c64c88ab76da6b25106b659b30, this commit includes a LIT to make sure IVDescriptor's SinkAfter logic does not try to sink branch instructions.	2019-11-08 15:25:14 +02:00
Sanjay Patel	7ff57705ba	[SLP] allow forming 2-way reduction patterns We have a vector compare reduction problem seen in PR39665 comment 2: https://bugs.llvm.org/show_bug.cgi?id=39665#c2 Or slightly reduced here: define i1 @cmp2(<2 x double> %a0) { %a = fcmp ogt <2 x double> %a0, <double 1.0, double 1.0> %b = extractelement <2 x i1> %a, i32 0 %c = extractelement <2 x i1> %a, i32 1 %d = and i1 %b, %c ret i1 %d } SLP would not attempt to turn this into a vector reduction because there is an artificial lower limit on that transform. We can not completely remove that limit without inducing regressions though, so this patch just hacks an extra attempt at creating a 2-way reduction to the end of the analysis. As shown in the test file, we are still not getting some of the motivating cases, so follow-on patches will be needed to solve those cases. Differential Revision: https://reviews.llvm.org/D59710	2019-11-07 06:08:42 -05:00
Eric Christopher	009e032634	Temporarily Revert "[LV] Apply sink-after & interleave-groups as VPlan transformations (NFC)" as it's causing assert failures. This reverts commit 100e797adb433724a17c9b42b6533cd634cb796b.	2019-11-06 21:58:28 -08:00
Eric Christopher	e511c4b0df	Temporarily Revert: "[SLP] Generalization of stores vectorization." "[SLP] Fix -Wunused-variable. NFC" "[SLP] Vectorize jumbled stores." As they're causing significant (10-30x) compile time regressions on vectorizable code. The primary cause of the compile-time regression is f228b5371647f471853c5fb3e6719823a42fe451. This reverts commits: f228b5371647f471853c5fb3e6719823a42fe451 5503455ccb3f5fcedced158332c016c8d3a7fa81 21d498c9c0f32dcab5bc89ac593aa813b533b43a	2019-11-06 16:06:15 -08:00
Sjoerd Meijer	6c2a4f5ff9	[TTI][LV] preferPredicateOverEpilogue We have two ways to steer creating a predicated vector body over creating a scalar epilogue. To force this, we have 1) a command line option and 2) a pragma available. This adds a third: a target hook to TargetTransformInfo that can be queried whether predication is preferred or not, which allows the vectoriser to make the decision without forcing it. While this change behaves as a non-functional change for now, it shows the required TTI plumbing, usage of this new hook in the vectoriser, and the beginning of an ARM MVE implementation. I will follow up on this with: - a complete MVE implementation, see D69845. - a patch to disable this, i.e. we should respect "vector_predicate(disable)" and its corresponding loophint. Differential Revision: https://reviews.llvm.org/D69040	2019-11-06 10:14:20 +00:00
Sergey Dmitriev	82588e05cc	[SLP] - Add couple safety checks to TreeEntry::dump(). NFC Summary: Check for MainOp and AltOp for NULL before dereferencing or issue NULL. Reviewers: Vasilis, dtemirbulatov, RKSimon, ABataev Reviewed By: ABataev Subscribers: mehdi_amini, hiraditya, dexonsmith, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69812	2019-11-05 09:57:30 -08:00
Gil Rapaport	100e797adb	[LV] Apply sink-after & interleave-groups as VPlan transformations (NFC) This recommits 2be17087f8c38934b7fc9208ae6cf4e9b4d44f4b (reverted in d3ec06d219788801380af1948c7f7ef9d3c6100b for heap-use-after-free) with a fix in IAI's reset() which was not clearing the set of interleave groups after deleting them.	2019-11-05 17:29:13 +02:00
Alexey Bataev	b80c41cd3c	[SLP]Fix PR43799: Crash on different sizes of GEP indices. Summary: If the GEP instructions are going to be vectorized, the indices in those GEP instructions must be of the same type. Otherwise, the compiler may crash when trying to build the vector constant. Reviewers: RKSimon, spatel Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69627	2019-11-04 10:36:26 -05:00
Benjamin Kramer	d3ec06d219	Revert "[LV] Apply sink-after & interleave-groups as VPlan transformations (NFC)" This reverts commit 2be17087f8c38934b7fc9208ae6cf4e9b4d44f4b. Fails ASAN.	2019-11-04 15:04:42 +01:00
Gil Rapaport	2be17087f8	[LV] Apply sink-after & interleave-groups as VPlan transformations (NFC) The sink-after and interleave-group vectorization decisions were so far applied to VPlan during initial VPlan construction, which complicates VPlan construction – also because of their inter-dependence. This patch refactors buildVPlanWithRecipes() to construct a simpler initial VPlan and later apply both these vectorization decisions, in order, as VPlan-to-VPlan transformations. Differential Revision: https://reviews.llvm.org/D68577	2019-11-04 10:37:39 +02:00
Simon Pilgrim	81ba611e88	Ensure VPlanPrinter::Depth is initialized to fix static analyzer warning. NFCI.	2019-11-03 11:17:05 +00:00
Alexey Bataev	70ad617dd6	[SLP] Vectorize jumbled stores. Summary: Patch adds support for vectorization of the jumbled stores. The value operands are vectorized and then shuffled in the right order before store. Reviewers: RKSimon, spatel, hfinkel, mkuper Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D43339	2019-10-31 16:02:25 -04:00
Haojian Wu	e65ddcafee	Revert "[SLP] Vectorize jumbled stores." This reverts commit 21d498c9c0f32dcab5bc89ac593aa813b533b43a. This commit causes some crashes on some targets.	2019-10-31 10:21:24 +01:00
Alexey Bataev	21d498c9c0	[SLP] Vectorize jumbled stores. Summary: Patch adds support for vectorization of the jumbled stores. The value operands are vectorized and then shuffled in the right order before store. Reviewers: RKSimon, spatel, hfinkel, mkuper Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D43339	2019-10-30 13:33:52 -04:00
Simon Pilgrim	d52f5ed01a	[SLPVectorizer] Use getAPInt() for comparison. NFCI. Technically integers can assert on getZExtValue() if beyond i64 range, and a fuzzer usually find this.....	2019-10-30 16:16:55 +00:00
Fangrui Song	5503455ccb	[SLP] Fix -Wunused-variable. NFC	2019-10-29 09:38:55 -07:00
Alexey Bataev	f228b53716	[SLP] Generalization of stores vectorization. Stores are vectorized with maximum vectorization factor of 16. Patch tries to improve the situation and use maximal vectorization factor. Reviewers: spatel, RKSimon, mkuper, hfinkel Differential Revision: https://reviews.llvm.org/D43582	2019-10-29 11:46:36 -04:00
Craig Topper	18824d25d8	[LV] Interleaving should not exceed estimated loop trip count. Currently we may do iterleaving by more than estimated trip count coming from the profile or computed maximum trip count. The solution is to use "best known" trip count instead of exact one in interleaving analysis. Patch by Evgeniy Brevnov. Differential Revision: https://reviews.llvm.org/D67948	2019-10-28 10:58:22 -07:00
Guillaume Chatelet	a4783ef58d	[Alignment][NFC] getMemoryOpCost uses MaybeAlign Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: nemanjai, hiraditya, kbarton, MaskRay, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69307	2019-10-25 21:26:59 +02:00
Sanjay Patel	b82fa80e80	[SLP] adjust code comment; NFC (check commit access)	2019-10-25 11:39:43 -04:00
Guillaume Chatelet	5e1e83ee23	[Alignment][NFC] Instructions::getLoadStoreAlignment Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69256 llvm-svn: 375416	2019-10-21 14:49:28 +00:00
Sanjay Patel	8cc6d42e8d	[SLP] avoid reduction transform on patterns that the backend can load-combine (2nd try) The 1st attempt at this modified the cost model in a bad way to avoid the vectorization, but that caused problems for other users (the loop vectorizer) of the cost model. I don't see an ideal solution to these 2 related, potentially large, perf regressions: https://bugs.llvm.org/show_bug.cgi?id=42708 https://bugs.llvm.org/show_bug.cgi?id=43146 We decided that load combining was unsuitable for IR because it could obscure other optimizations in IR. So we removed the LoadCombiner pass and deferred to the backend. Therefore, preventing SLP from destroying load combine opportunities requires that it recognizes patterns that could be combined later, but not do the optimization itself ( it's not a vector combine anyway, so it's probably out-of-scope for SLP). Here, we add a cost-independent bailout with a conservative pattern match for a multi-instruction sequence that can probably be reduced later. In the x86 tests shown (and discussed in more detail in the bug reports), SDAG combining will produce a single instruction on these tests like: movbe rax, qword ptr [rdi] or: mov rax, qword ptr [rdi] Not some (half) vector monstrosity as we currently do using SLP: vpmovzxbq ymm0, dword ptr [rdi + 1] # ymm0 = mem[0],zero,zero,.. vpsllvq ymm0, ymm0, ymmword ptr [rip + .LCPI0_0] movzx eax, byte ptr [rdi] movzx ecx, byte ptr [rdi + 5] shl rcx, 40 movzx edx, byte ptr [rdi + 6] shl rdx, 48 or rdx, rcx movzx ecx, byte ptr [rdi + 7] shl rcx, 56 or rcx, rdx or rcx, rax vextracti128 xmm1, ymm0, 1 vpor xmm0, xmm0, xmm1 vpshufd xmm1, xmm0, 78 # xmm1 = xmm0[2,3,0,1] vpor xmm0, xmm0, xmm1 vmovq rax, xmm0 or rax, rcx vzeroupper ret Differential Revision: https://reviews.llvm.org/D67841 llvm-svn: 375025	2019-10-16 18:06:24 +00:00
Sam Parker	527a35e155	[NFC][TTI] Add Alignment for isLegalMasked[Load/Store] Add an extra parameter so the backend can take the alignment into consideration. Differential Revision: https://reviews.llvm.org/D68400 llvm-svn: 374763	2019-10-14 10:00:21 +00:00
Benjamin Kramer	97c9804e06	[LV] Merge LLVM_DEBUG blocks. Avoids unused variable warnings about the range-based for loops in there. NFCI. llvm-svn: 374646	2019-10-12 10:57:22 +00:00
Zi Xuan Wu	9802268ad3	recommit: [LoopVectorize][PowerPC] Estimate int and float register pressure separately in loop-vectorize In loop-vectorize, interleave count and vector factor depend on target register number. Currently, it does not estimate different register pressure for different register class separately(especially for scalar type, float type should not be on the same position with int type), so it's not accurate. Specifically, it causes too many times interleaving/unrolling, result in too many register spills in loop body and hurting performance. So we need classify the register classes in IR level, and importantly these are abstract register classes, and are not the target register class of backend provided in td file. It's used to establish the mapping between the types of IR values and the number of simultaneous live ranges to which we'd like to limit for some set of those types. For example, POWER target, register num is special when VSX is enabled. When VSX is enabled, the number of int scalar register is 32(GPR), float is 64(VSR), but for int and float vector register both are 64(VSR). So there should be 2 kinds of register class when vsx is enabled, and 3 kinds of register class when VSX is NOT enabled. It runs on POWER target, it makes big(+~30%) performance improvement in one specific bmk(503.bwaves_r) of spec2017 and no other obvious degressions. Differential revision: https://reviews.llvm.org/D67148 llvm-svn: 374634	2019-10-12 02:53:04 +00:00
Florian Hahn	39d4c9fd56	[VPlan] Add moveAfter to VPRecipeBase. This patch adds a moveAfter method to VPRecipeBase, which can be used to move elements after other elements, across VPBasicBlocks, if necessary. Reviewers: dcaballe, hsaito, rengolin, hfinkel Reviewed By: dcaballe Differential Revision: https://reviews.llvm.org/D46825 llvm-svn: 374565	2019-10-11 15:36:55 +00:00
Florian Hahn	a3ca7acb4f	[LV][NFC] Factor out calculation of "best" estimated trip count. This is just small refactoring to minimize changes in upcoming patch. In the next path I'm going to introduce changes into heuristic for vectorization of "tiny trip count" loops. Patch by Evgeniy Brevnov <evgueni.brevnov@gmail.com> Reviewers: hsaito, Ayal, fhahn, reames Reviewed By: hsaito Differential Revision: https://reviews.llvm.org/D67690 llvm-svn: 374338	2019-10-10 13:07:01 +00:00
Guillaume Chatelet	837a1b84ce	[Alignment][NFC] Make VectorUtils uas llvm::Align Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: hiraditya, rogfer01, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68784 llvm-svn: 374330	2019-10-10 12:35:04 +00:00
Sanjay Patel	df14bd315d	[SLP] respect target register width for GEP vectorization (PR43578) We failed to account for the target register width (max vector factor) when vectorizing starting from GEPs. This causes vectorization to proceed to obviously illegal widths as in: https://bugs.llvm.org/show_bug.cgi?id=43578 For x86, this also means that SLP can produce rogue AVX or AVX512 code even when the user specifies a narrower vector width. The AArch64 test in ext-trunc.ll appears to be better using the narrower width. I'm not exactly sure what getelementptr.ll is trying to do, but it's testing with "-slp-threshold=-18", so I'm not worried about those diffs. The x86 test is an over-reduction from SPEC h264; this patch appears to restore the perf loss caused by SLP when using -march=haswell. Differential Revision: https://reviews.llvm.org/D68667 llvm-svn: 374183	2019-10-09 16:32:49 +00:00
Sjoerd Meijer	d1170dbe58	[LV] Emitting SCEV checks with OptForSize When optimising for size and SCEV runtime checks need to be emitted to check overflow behaviour, the loop vectorizer can run in this assert: LoopVectorize.cpp:2699: void llvm::InnerLoopVectorizer::emitSCEVChecks( llvm::Loop , llvm::BasicBlock ): Assertion `!BB->getParent()->hasOptSize() && "Cannot SCEV check stride or overflow when opt We should not generate predicates while optimising for size because code will be generated for predicates such as these SCEV overflow runtime checks. This should fix PR43371. Differential Revision: https://reviews.llvm.org/D68082 llvm-svn: 374166	2019-10-09 13:19:41 +00:00
Jinsong Ji	9912232b46	Revert "[LoopVectorize][PowerPC] Estimate int and float register pressure separately in loop-vectorize" Also Revert "[LoopVectorize] Fix non-debug builds after rL374017" This reverts commit 9f41deccc0e648a006c9f38e11919f181b6c7e0a. This reverts commit 18b6fe07bcf44294f200bd2b526cb737ed275c04. The patch is breaking PowerPC internal build, checked with author, reverting on behalf of him for now due to timezone. llvm-svn: 374091	2019-10-08 17:32:56 +00:00
Kadir Cetinkaya	18b6fe07bc	[LoopVectorize] Fix non-debug builds after rL374017 llvm-svn: 374021	2019-10-08 07:39:50 +00:00
Zi Xuan Wu	9f41deccc0	[LoopVectorize][PowerPC] Estimate int and float register pressure separately in loop-vectorize In loop-vectorize, interleave count and vector factor depend on target register number. Currently, it does not estimate different register pressure for different register class separately(especially for scalar type, float type should not be on the same position with int type), so it's not accurate. Specifically, it causes too many times interleaving/unrolling, result in too many register spills in loop body and hurting performance. So we need classify the register classes in IR level, and importantly these are abstract register classes, and are not the target register class of backend provided in td file. It's used to establish the mapping between the types of IR values and the number of simultaneous live ranges to which we'd like to limit for some set of those types. For example, POWER target, register num is special when VSX is enabled. When VSX is enabled, the number of int scalar register is 32(GPR), float is 64(VSR), but for int and float vector register both are 64(VSR). So there should be 2 kinds of register class when vsx is enabled, and 3 kinds of register class when VSX is NOT enabled. It runs on POWER target, it makes big(+~30%) performance improvement in one specific bmk(503.bwaves_r) of spec2017 and no other obvious degressions. Differential revision: https://reviews.llvm.org/D67148 llvm-svn: 374017	2019-10-08 03:28:33 +00:00
Martin Storsjo	dfc1aee25b	Revert "[SLP] avoid reduction transform on patterns that the backend can load-combine" This reverts SVN r373833, as it caused a failed assert "Non-zero loop cost expected" on building numerous projects, see PR43582 for details and reproduction samples. llvm-svn: 373882	2019-10-07 08:21:37 +00:00

... 24 25 26 27 28 ...

3059 Commits