llvm-project

Author	SHA1	Message	Date
Sanjay Patel	e2321bb448	[SLP] avoid reduction transform on patterns that the backend can load-combine I don't see an ideal solution to these 2 related, potentially large, perf regressions: https://bugs.llvm.org/show_bug.cgi?id=42708 https://bugs.llvm.org/show_bug.cgi?id=43146 We decided that load combining was unsuitable for IR because it could obscure other optimizations in IR. So we removed the LoadCombiner pass and deferred to the backend. Therefore, preventing SLP from destroying load combine opportunities requires that it recognize patterns that could be combined later, but not do the optimization itself (it's not a vector combine anyway, so it's probably out-of-scope for SLP). Here, we add a scalar cost model adjustment with a conservative pattern match and cost summation for a multi-instruction sequence that can probably be reduced later. This should prevent SLP from creating a vector reduction unless that sequence is extremely cheap. In the x86 tests shown (and discussed in more detail in the bug reports), SDAG combining will produce a single instruction on these tests like: movbe rax, qword ptr [rdi] or: mov rax, qword ptr [rdi] Not some (half) vector monstrosity as we currently do using SLP: vpmovzxbq ymm0, dword ptr [rdi + 1] # ymm0 = mem[0],zero,zero,.. vpsllvq ymm0, ymm0, ymmword ptr [rip + .LCPI0_0] movzx eax, byte ptr [rdi] movzx ecx, byte ptr [rdi + 5] shl rcx, 40 movzx edx, byte ptr [rdi + 6] shl rdx, 48 or rdx, rcx movzx ecx, byte ptr [rdi + 7] shl rcx, 56 or rcx, rdx or rcx, rax vextracti128 xmm1, ymm0, 1 vpor xmm0, xmm0, xmm1 vpshufd xmm1, xmm0, 78 # xmm1 = xmm0[2,3,0,1] vpor xmm0, xmm0, xmm1 vmovq rax, xmm0 or rax, rcx vzeroupper ret Differential Revision: https://reviews.llvm.org/D67841 llvm-svn: 373833	2019-10-05 18:03:58 +00:00
Aditya Kumar	6a2673605e	Invalidate assumption cache before outlining. Subscribers: llvm-commits Tags: #llvm Reviewers: compnerd, vsk, sebpop, fhahn, tejohnson Reviewed by: vsk Differential Revision: https://reviews.llvm.org/D68478 llvm-svn: 373807	2019-10-04 22:46:42 +00:00
Roman Lebedev	fb5af8b9b9	[InstCombine] Fold 'icmp eq/ne (?trunc (lshr/ashr %x, bitwidth(x)-1)), 0' -> 'icmp sge/slt %x, 0' We do indeed already get it right in some cases, but only transitively, with one-use restrictions. Since we only need to produce a single comparison, it makes sense to match the pattern directly: https://rise4fun.com/Alive/kPg llvm-svn: 373802	2019-10-04 22:16:22 +00:00
Roman Lebedev	f304d4d185	[InstCombine] Right-shift shift amount reassociation with truncation (PR43564, PR42391) Initially (D65380) I believed that if we have rightshift-trunc-rightshift, we can't do any folding. But, as usually happens, I was wrong. https://rise4fun.com/Alive/GEw https://rise4fun.com/Alive/gN2O In https://bugs.llvm.org/show_bug.cgi?id=43564 we happen to have this very sequence: two right shifts separated by a trunc. And, as it happens, we can apparently fold the pattern if the total shift amount is either 0 or equal to the bitwidth of the innermost widest shift - i.e. if we are left with only the original sign bit, which is exactly what is wanted there. llvm-svn: 373801	2019-10-04 22:16:11 +00:00
Roman Lebedev	ae738641d5	[NFC][InstCombine] Autogenerate shift.ll test llvm-svn: 373800	2019-10-04 22:15:57 +00:00
Roman Lebedev	007452532b	[NFC][InstCombine] Autogenerate icmp-shr-lt-gt.ll test llvm-svn: 373799	2019-10-04 22:15:49 +00:00
Roman Lebedev	3c56cc920f	[NFC][InstCombine] Tests for bit test via highest sign-bit extract (w/ trunc) (PR43564) https://rise4fun.com/Alive/x5IS llvm-svn: 373798	2019-10-04 22:15:41 +00:00
Roman Lebedev	6a954748c8	[NFC][InstCombine] Tests for right-shift shift amount reassociation (w/ trunc) (PR43564, PR42391) https://rise4fun.com/Alive/GEw llvm-svn: 373797	2019-10-04 22:15:32 +00:00
Sanjay Patel	6e312388b6	[InstCombine] add tests for fneg disguised as fmul; NFC llvm-svn: 373788	2019-10-04 20:54:14 +00:00
Kevin P. Neal	68b8052121	[FPEnv] Strict FP tests should use the requisite function attributes. A set of function attributes is required in any function that uses constrained floating point intrinsics. None of our tests use these attributes. This patch fixes this. These tests have been tested against the IR verifier changes in D68233. Reviewed by: andrew.w.kaylor, cameron.mcinally, uweigand Approved by: andrew.w.kaylor Differential Revision: https://reviews.llvm.org/D67925 llvm-svn: 373761	2019-10-04 17:03:46 +00:00
Peter Collingbourne	71662116fd	LowerTypeTests: Rename local functions to avoid collisions with identically named functions in ThinLTO modules. Without this we can encounter link errors or incorrect behaviour at runtime as a result of the wrong function being referenced. Differential Revision: https://reviews.llvm.org/D67945 llvm-svn: 373678	2019-10-03 23:42:44 +00:00
Alina Sbirlea	145cdad119	[MemorySSA] Don't hoist stores if interfering uses (as calls) exist. llvm-svn: 373674	2019-10-03 22:20:04 +00:00
Roman Lebedev	c780645736	[NFC][InstCombine] Some tests for sub-of-negatible pattern As we have previously established, `sub` is an outcast, and should be considered non-canonical iff it can be converted to `add`. It can be converted to `add` if its second operand can be negated. So far we mostly only do that for constants and negation itself, but we should be more thorough. llvm-svn: 373597	2019-10-03 13:36:00 +00:00
Roman Lebedev	ae3315af07	[InstCombine] Bypass high bit extract before variable sign-extension (PR43523) https://rise4fun.com/Alive/8BY - valid for lshr+trunc+variable sext https://rise4fun.com/Alive/7jk - the variable sext can be redundant https://rise4fun.com/Alive/Qslu - 'exact'-ness of first shift can be preserved https://rise4fun.com/Alive/IF63 - without trunc we could view this as more general "drop redundant mask before right-shift", but let's handle it here for now https://rise4fun.com/Alive/iip - likewise, without trunc, variable sext can be redundant. There are more patterns, for sure - e.g. we can have 'lshr' as the final shift, but that might be best handled by some more generic transform, e.g. "drop redundant masking before right-shift" (PR42456) I'm singling out this sext patch because you can only extract high bits with `shr` (unlike abstract bit masking), and I know* this fold is wanted by existing code. I don't believe there is much to review here, so I'm going to opt into post-review mode here. https://bugs.llvm.org/show_bug.cgi?id=43523 llvm-svn: 373542	2019-10-02 23:02:12 +00:00
Roman Lebedev	29339149c3	[NFC][InstCombine] Add tests for 'variable sext of variable high bit extract' pattern (PR43523) https://bugs.llvm.org/show_bug.cgi?id=43523 llvm-svn: 373541	2019-10-02 23:01:58 +00:00
David Bolvansky	6b45029676	[InstCombine] Transform bcopy to memmove bcopy is still widely used, mainly in network apps. Sadly, LLVM has no optimizations for bcopy, but there are some for memmove. Since bcopy(src, dst, n) is equivalent to memmove(dst, src, n) (with the pointer arguments swapped), it is profitable to transform bcopy to memmove and get the existing memmove optimizations for free. llvm-svn: 373537	2019-10-02 22:49:20 +00:00
Sanjay Patel	3f4726b818	[SLP] add test for vectorization of different widths (PR28457); NFC llvm-svn: 373483	2019-10-02 16:12:42 +00:00
Florian Hahn	f2ffa7a1c0	[InstCombine] Precommit tests for D68265 llvm-svn: 373458	2019-10-02 12:32:37 +00:00
Sanjay Patel	be21ceb565	[InstSimplify] fold fma/fmuladd with a NaN or undef operand This is intended to be similar to the constant folding results from D67446 and earlier, but not all operands are constant in these tests, so the responsibility for folding is left to InstSimplify. Differential Revision: https://reviews.llvm.org/D67721 llvm-svn: 373455	2019-10-02 12:12:02 +00:00
Roman Lebedev	053014f8f9	[InstCombine] Deal with -(trunc(X >>u 63)) -> trunc(X >>s 63) Identical to its trunc-less variant: just pretend to hoist the trunc, and everything else still holds: https://rise4fun.com/Alive/JRU llvm-svn: 373364	2019-10-01 17:50:20 +00:00
Roman Lebedev	65144149d0	[InstCombine] Preserve 'exact' in -(X >>u 31) -> (X >>s 31) fold https://rise4fun.com/Alive/yR4 llvm-svn: 373363	2019-10-01 17:50:09 +00:00
Roman Lebedev	f273fc793a	[NFC][InstCombine] (Better) tests for sign-bit-smearing pattern https://rise4fun.com/Alive/JRU https://rise4fun.com/Alive/yR4 <- we can preserve 'exact' llvm-svn: 373362	2019-10-01 17:49:58 +00:00
Philip Reames	0200626f0b	[IndVars] An implementation of loop predication without a need for speculation This patch implements a variation of a well-known technique for JIT compilers - we have an implementation in tree as LoopPredication - but with an interesting twist. This version does not assume the ability to execute a path which wasn't taken in the original program (such as a guard or widenable.condition intrinsic). The benefit is that this works for arbitrary IR from any frontend (including C/C++/Fortran). The tradeoff is that it's restricted to read-only loops without implicit exits. This builds on SCEV, and can thus eliminate the loop-varying portion of any early exit where all exits are understandable by SCEV. A key advantage is that fixing deficiencies exposed in SCEV - already found one while writing test cases - will also benefit all of full redundancy elimination (and most other loop transforms). I haven't seen anything in the literature which quite matches this. Given that, I'm not entirely sure that keeping the name "loop predication" is helpful. Anyone have suggestions for a better name? This is analogous to partial redundancy elimination - since we remove the condition flowing around the backedge - and has some parallels to our existing transforms which try to make conditions invariant in loops. Factoring-wise, I chose to put this in IndVarSimplify since it's generally applicable to all workloads. I could split this off into its own pass, but we'd then probably want to add that new pass every place we use IndVars. One solid argument for splitting it off into its own pass is that this transform is "too good". It breaks a huge number of existing IndVars test cases as they tend to be simple read-only loops. At the moment, I've opted it off by default, but if we add this to IndVars and enable it, we'll have to update around 20 test files to add side effects or disable this transform. The near-term plan is to fuzz this extensively while off by default, reflect on and discuss the factoring issue mentioned just above, and then enable by default. I also need to give some thought to supporting widenable conditions in this framing. Differential Revision: https://reviews.llvm.org/D67408 llvm-svn: 373351	2019-10-01 17:03:44 +00:00
David Bolvansky	4037582d6b	Revert [InstCombine] sprintf(dest, "%s", str) -> memccpy(dest, str, 0, MAX) Seems to be slower than memcpy + strlen. llvm-svn: 373335	2019-10-01 13:19:04 +00:00
David Bolvansky	8fc6a1bf56	[InstCombine] sprintf(dest, "%s", str) -> memccpy(dest, str, 0, MAX) llvm-svn: 373333	2019-10-01 13:03:10 +00:00
Evandro Menezes	110b1138ba	[InstCombine] Expand the simplification of log() Expand the simplification of special cases of `log()` to include `log2()` and `log10()` as well as intrinsics and more types. Differential revision: https://reviews.llvm.org/D67199 llvm-svn: 373261	2019-09-30 20:52:21 +00:00
David Bolvansky	a05e671c7e	[FunctionAttrs] Added noalias for memccpy/mempcpy arguments llvm-svn: 373251	2019-09-30 19:43:48 +00:00
Roman Lebedev	0205be8f12	[NFC][InstCombine] Redundant-left-shift-input-masking: add some more undef tests llvm-svn: 373248	2019-09-30 19:15:51 +00:00
Rong Xu	3674050087	[PGO] Don't group COMDAT variables for compiler generated profile variables in ELF With this patch, compiler generated profile variables will have its own COMDAT name for ELF format, which syncs the behavior with COFF. Tested with clang PGO bootstrap. This shows a modest reduction in object sizes in ELF format. Differential Revision: https://reviews.llvm.org/D68041 llvm-svn: 373241	2019-09-30 18:11:22 +00:00
Sanjay Patel	712b7c2463	[InstCombine] fold negate disguised as select+mul Name: negate if true %sel = select i1 %cond, i32 -1, i32 1 %r = mul i32 %sel, %x => %m = sub i32 0, %x %r = select i1 %cond, i32 %m, i32 %x Name: negate if false %sel = select i1 %cond, i32 1, i32 -1 %r = mul i32 %sel, %x => %m = sub i32 0, %x %r = select i1 %cond, i32 %x, i32 %m https://rise4fun.com/Alive/Nlh llvm-svn: 373230	2019-09-30 17:02:26 +00:00
Sanjay Patel	8913882fa2	[InstCombine] add tests for negate disguised as mul; NFC llvm-svn: 373222	2019-09-30 15:43:27 +00:00
Paul Robinson	14945186c2	[SSP] [1/3] Revert "StackProtector: Use PointerMayBeCaptured" "Captured" and "relevant to Stack Protector" are not the same thing. This reverts commit f29366b1f594f48465c5a2754bcffac6d70fd0b1. aka r363169. Differential Revision: https://reviews.llvm.org/D67842 llvm-svn: 373216	2019-09-30 15:01:35 +00:00
Roman Lebedev	d30093bb8a	[DivRemPairs] Don't assert that we won't ever get expanded-form rem pairs in different BB's (PR43500) If we happen to have the same div in two basic blocks, and in one of those we also happen to have the rem part, we'd match the div-rem pair, but the wrong ones. So let's drop overly-ambiguous assert. Fixes https://bugs.llvm.org/show_bug.cgi?id=43500 llvm-svn: 373167	2019-09-29 15:25:24 +00:00
Alexey Bataev	8b1eeafb91	[SLP] Fix for PR31847: Assertion failed: (isLoopInvariant(Operands[i], L) && "SCEVAddRecExpr operand is not loop-invariant!") Initially, the SLP vectorizer replaced all to-be-vectorized instructions with undef values. This may break ScalarEvolution and cause a crash. Reworked the SLP vectorizer so that it no longer replaces vectorized instructions with UndefValue. Instead, vectorized instructions are marked for deletion inside the BoUpSLP class and deleted upon class destruction. Reviewers: mzolotukhin, mkuper, hfinkel, RKSimon, davide, spatel Subscribers: RKSimon, Gerolf, anemet, hans, majnemer, llvm-commits, sanjoy Differential Revision: https://reviews.llvm.org/D29641 llvm-svn: 373166	2019-09-29 14:18:06 +00:00
Wei Mi	f0c4e70e95	[SampleFDO] Create a separate flag profile-accurate-for-symsinlist to handle profile symbol list. Currently many existing users of profile-sample-accurate want to reduce code size as much as possible. Their use cases are different from the scenario the profile symbol list tries to handle -- the main motivation for adding the profile symbol list is to get most of the memory/code size savings without introducing performance regressions. So to keep the behavior of profile-sample-accurate unchanged, we think decoupling these two things and using a new flag to control the handling of the profile symbol list may be better. When profile-sample-accurate and the new flag profile-accurate-for-symsinlist are both present, since profile-sample-accurate is a user assertion we let it have a higher precedence. Differential Revision: https://reviews.llvm.org/D68047 llvm-svn: 373133	2019-09-27 22:33:59 +00:00
Roman Lebedev	9c604a0dd6	[NFC][PhaseOrdering] Add end-to-end tests for the 'two shifts by sext' problem We start with two separate sext's, but EarlyCSE runs before InstCombine, so when we get them, they are a single sext, and we just ignore that. Likewise, if we had a single sext, we don't do anything there. llvm-svn: 373115	2019-09-27 19:32:43 +00:00
Sanjay Patel	1b40402aa2	[InstSimplify] add tests for fma/fmuladd with undef operand; NFC llvm-svn: 373109	2019-09-27 18:38:51 +00:00
Roman Lebedev	269f1bea0d	[InstCombine] Simplify shift-by-sext to shift-by-zext Summary: This is valid for any `sext` bitwidth pair: ``` Processing /tmp/opt.ll.. ---------------------------------------- %signed = sext %y %r = shl %x, %signed ret %r => %unsigned = zext %y %r = shl %x, %unsigned ret %r %signed = sext %y Done: 2016 Optimization is correct! ``` (This isn't so for funnel shifts, there it's illegal for e.g. i6->i7.) Main motivation is the C++ semantics: ``` int shl(int a, char b) { return a << b; } ``` ends as ``` %3 = sext i8 %1 to i32 %4 = shl i32 %0, %3 ``` https://godbolt.org/z/0jgqUq which is, as this shows, too pessimistic. There is another problem here - we can only do the fold if sext is one-use. But we can trivially have cases where several shifts have the same sext shift amount. This should be resolved, later. Reviewers: spatel, nikic, RKSimon Reviewed By: spatel Subscribers: efriedma, hiraditya, nlopes, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68103 llvm-svn: 373106	2019-09-27 18:12:15 +00:00
Simon Pilgrim	756f5cfc2a	[SLPVectorizer][X86] Regenerate arith-fp tests llvm-svn: 373063	2019-09-27 10:04:25 +00:00
Roman Lebedev	0956480459	[NFC][InstCombine] Revisit shift-by-signext tests llvm-svn: 373055	2019-09-27 09:09:15 +00:00
Wei Mi	9c8efeda5c	Revert "[LoopInfo] Limit the iterations to check whether a loop has dedicated exits" Get a better approach in https://reviews.llvm.org/D68107 to solve the problem. Revert the initial patch and will commit the new one soon. This reverts commit rL372990. llvm-svn: 373044	2019-09-27 05:43:30 +00:00
Jordan Rupprecht	f98d2c099a	Revert [SLP] Fix for PR31847: Assertion failed: (isLoopInvariant(Operands[i], L) && "SCEVAddRecExpr operand is not loop-invariant!") This reverts r372626 (git commit 6a278d9073bdc158d31d4f4b15bbe34238f22c18) llvm-svn: 373019	2019-09-26 22:09:17 +00:00
Kit Barton	50bc610460	[LoopFusion] Add ability to fuse guarded loops Summary: This patch extends the current capabilities in loop fusion to fuse guarded loops (as defined in https://reviews.llvm.org/D63885). The patch adds the necessary safety checks to ensure that it is safe to fuse the guarded loops (control flow equivalent, no intervening code, and same guard conditions). It also provides an alternative method to perform the actual fusion of guarded loops. The mechanics to fuse guarded loops are slightly different than fusing non-guarded loops, so I opted to keep them separate methods. I will be cleaning this up in later patches, and hope to converge on a single method to fuse both guarded and non-guarded loops, but for now I think keeping them separate will make the review easier. Reviewers: jdoerfert, Meinersbur, dmgreen, etiotto, Whitney Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65464 llvm-svn: 373018	2019-09-26 21:42:45 +00:00
Zhaoshi Zheng	1128fa0924	[Unroll] Do NOT unroll a loop with small runtime upperbound For a runtime loop if we can compute its trip count upperbound: Don't unroll if: 1. loop is not guaranteed to run either zero or upperbound iterations; and 2. trip count upperbound is less than UnrollMaxUpperBound Unless user or TTI asked to do so. If unrolling, limit unroll factor to loop's trip count upperbound. Differential Revision: https://reviews.llvm.org/D62989 Change-Id: I6083c46a9d98b2e22cd855e60523fdc5a4929c73 llvm-svn: 373017	2019-09-26 21:40:27 +00:00
Roman Lebedev	86b40b0bbf	[InstCombine][NFC] Add tests for shift-by-signext llvm-svn: 373013	2019-09-26 20:49:30 +00:00
Roman Lebedev	d1ef2e48fb	[InstCombine][NFC] Regenerate load-cmp.ll test llvm-svn: 373012	2019-09-26 20:49:21 +00:00
David Bolvansky	f1a5a93157	[NFC] Precommit tests for D68089 llvm-svn: 373006	2019-09-26 19:01:18 +00:00
Craig Topper	46721bb7f5	[InstCombine] Use m_Zero instead of isNullValue() when checking if a GEP index is all zeroes to prevent an infinite loop. The test case here previously infinite looped. Only one element from the GEP is used so SimplifyDemandedVectorElts would replace the other lanes in each index with undef leading to the first index being <0, undef, undef, undef>. But there's a GEP transform that tries to replace an index into a 0 sized type with a zero index. But the zero index check only works on ConstantInt 0 or ConstantAggregateZero so it would turn the index back to zeroinitializer. Resulting in a loop. The fix is to use m_Zero() to allow a vector of zeroes and undefs. Differential Revision: https://reviews.llvm.org/D67977 llvm-svn: 373000	2019-09-26 17:20:50 +00:00
Wei Mi	67d93f0d91	[LoopInfo] Limit the iterations to check whether a loop has dedicated exits for extremely large cases. We had a case where a single loop had 4000 exits and the average number of predecessors per exit was > 1000, and we found that compiling it spent a significant amount of time checking whether a loop has dedicated exits. This patch adds a limit for the iterations to the check. With the patch, the time to compile our testcase reduced from 1000s to 200s (clang release build). Differential Revision: https://reviews.llvm.org/D67359 llvm-svn: 372990	2019-09-26 15:36:25 +00:00
Jakub Kuderski	d98cb81cd1	Handle successor's PHI node correctly when flattening CFG merges two if-regions Summary: FlattenCFG merges two 'if' basicblocks by inserting one basicblock into another basicblock. The inserted basicblock can have a successor that contains a PHI node whose incoming basicblock is the inserted basicblock. Since the existing code does not handle this, it becomes a bad ref. if (cond1) statement if (cond2) statement successor - contains PHI node whose predecessor is cond2 --> if (cond1 || cond2) statement (BB for cond2 was deleted) successor - contains PHI node whose predecessor is cond2 --> bad ref! Author: Jaebaek Seo Reviewers: asbirlea, kuhar, tstellar, chandlerc, davide, dexonsmith Reviewed By: kuhar Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68032 llvm-svn: 372989	2019-09-26 15:20:17 +00:00
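Several InstCombine folds in this log are bit-level identities whose Alive links are the actual proofs: negate disguised as select+mul (712b7c2463), the sign-bit smear -(X >>u 31) -> X >>s 31 (65144149d0), the sign-bit icmp fold (fb5af8b9b9), and shift-by-sext to shift-by-zext (269f1bea0d). As a rough, non-authoritative illustration only, the Python sketch below brute-forces the plain forms of those identities at 8 bits. It is not LLVM or Alive code: it assumes i8 wraparound arithmetic, ignores undef, `exact`, and the trunc variants, and models poison only as "shift amount >= bit width".

```python
# Brute-force sanity check of a few InstCombine identities at i8.
# Assumptions (not LLVM semantics in full): wraparound arithmetic via masking;
# poison is modeled only as a shift amount >= the bit width.

BITS = 8
MASK = (1 << BITS) - 1
VALUES = range(1 << BITS)


def to_signed(v, bits=BITS):
    """Interpret a `bits`-wide pattern as a two's-complement integer."""
    return v - (1 << bits) if v >> (bits - 1) else v


def neg(v):
    """IR `sub i8 0, %v` with wraparound."""
    return -v & MASK


def lshr(v, n):
    """IR `lshr i8 %v, n` (v is already an unsigned 8-bit pattern)."""
    return v >> n


def ashr(v, n):
    """IR `ashr i8 %v, n`; Python's >> on negative ints is arithmetic."""
    return (to_signed(v) >> n) & MASK


def check_select_mul_negate():
    """mul (select %c, -1, 1), %x  ==>  select %c, (sub 0, %x), %x"""
    for c in (False, True):
        for x in VALUES:
            sel = MASK if c else 1  # -1 is the all-ones pattern
            before = (sel * x) & MASK
            after = neg(x) if c else x
            if before != after:
                return False
    return True


def check_sign_smear():
    """sub 0, (lshr %x, 7)  ==>  ashr %x, 7"""
    return all(neg(lshr(x, BITS - 1)) == ashr(x, BITS - 1) for x in VALUES)


def check_signbit_icmp():
    """icmp eq/ne (lshr %x, 7), 0  ==>  icmp sge/slt %x, 0"""
    return all(
        (lshr(x, BITS - 1) == 0) == (to_signed(x) >= 0)
        and (lshr(x, BITS - 1) != 0) == (to_signed(x) < 0)
        for x in VALUES
    )


def check_shl_sext_to_zext():
    """shl %x, (sext iK %y to i8)  ==>  shl %x, (zext iK %y to i8), K in 1..8.

    The target may only refine the source: wherever the source is not
    poison (shift amount < 8), the target must be non-poison and equal.
    """
    for k in range(1, BITS + 1):
        for y in range(1 << k):
            sext = to_signed(y, k) & MASK
            zext = y
            if sext >= BITS:
                continue  # source is poison, so any target result is allowed
            if zext >= BITS:
                return False  # target poison where the source is not
            for x in VALUES:
                if (x << sext) & MASK != (x << zext) & MASK:
                    return False
    return True


if __name__ == "__main__":
    print(check_select_mul_negate(), check_sign_smear(),
          check_signbit_icmp(), check_shl_sext_to_zext())
```

Running the file prints `True True True True`. Exhaustive enumeration only works here because i8 has 256 values; it says nothing about other widths, which is what the SMT-based Alive checks in the commit messages cover.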


13517 Commits