IRCE has the ability to further version pre-loops and post-loops that it
created, but this isn't useful at all. This change teaches IRCE to
leave behind some metadata in the loops it creates (by cloning the main
loop) so that these new loops are not re-processed by IRCE.
Today this bug is hidden by another bug -- IRCE does not update LoopInfo
properly so the loop pass manager does not re-invoke IRCE on the loops
it split out. However, once the latter is fixed the bug addressed in
this change causes IRCE to infinite-loop in some cases (e.g. it splits
out a pre-loop, a pre-pre-loop from that, a pre-pre-pre-loop from that
and so on).
llvm-svn: 278617
Loops containing `indirectbr` may not be in simplified form, even after
running LoopSimplify. Reject then gracefully, instead of tripping an
assert.
llvm-svn: 278611
Summary:
Refactor the existing support into a LoopDataPrefetch implementation
class and a LoopDataPrefetchLegacyPass class that invokes it.
Add a new LoopDataPrefetchPass for the new pass manager that utilizes
the LoopDataPrefetch implementation class.
Reviewers: mehdi_amini
Subscribers: sanjoy, mzolotukhin, nemanjai, llvm-commits
Differential Revision: https://reviews.llvm.org/D23483
llvm-svn: 278591
`IVVisitor::visitCast` used to have the invariant that if the
instruction it was passed was a sext or zext instruction, the result of
the instruction would be wider than the induction variable. This is no
longer true after rL275037, so this change teaches `IndVarSimplify` s
implementation of `IVVisitor::visitCast` to work with the relaxed
invariant.
A corresponding change to SimplifyIndVar to preserve the said invariant
after rL275037 would also work, but given how `IVVisitor::visitCast` is
spelled (no indication of said invariant), I figured the current fix is
cleaner.
Fixes PR28935.
llvm-svn: 278584
Summary:
In getVectorizablePrefix, this is less efficient (because we have to
iterate over the BB twice), but boy is it simpler. Given how much
trouble we've had here, I think the simplicity gain is worthwhile.
In reorder(), this is actually more efficient, as
DominatorTree::dominates iterates over the BB from the beginning when
the two instructions are in the same BB.
Reviewers: asbirlea
Subscribers: arsenm, llvm-commits, mzolotukhin
Differential Revision: https://reviews.llvm.org/D23472
llvm-svn: 278580
InnerLoopVectorizer shouldn't handle a loop with cycles inside the loop
body, even if that cycle isn't a natural loop.
Fixes PR28541.
Differential Revision: https://reviews.llvm.org/D22952
llvm-svn: 278573
They aren't static, and moving them to the entry block across something
else will only result in tears.
Root cause of http://crbug.com/636558.
llvm-svn: 278571
This is part of an effort to constify ValueTracking.cpp. This change is
to methods which need const Value* instead of Value* to go with the upcoming
changes to ValueTracking.
llvm-svn: 278528
Summary: The refined propagation algorithm is more accurate and robust.
Reviewers: davidxl, dnovillo
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D23224
llvm-svn: 278522
Summary:
Port the NameAnonFunction pass and add a test.
Depends on D23439.
Reviewers: mehdi_amini
Subscribers: llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D23440
llvm-svn: 278509
Summary:
1. Make coroutine representation more robust against optimization that may duplicate instruction by introducing coro.id intrinsics that returns a token that will get fed into coro.alloc and coro.begin. Due to coro.id returning a token, it won't get duplicated and can be used as reliable indicator of coroutine identify when a particular coroutine call gets inlined.
2. Move last three arguments of coro.begin into coro.id as they will be shared if coro.begin will get duplicated.
3. doc + test + code updated to support the new intrinsic.
Reviewers: mehdi_amini, majnemer
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D23412
llvm-svn: 278481
Summary:
This patch adds IsVariadicFunction bit to summary in order
to not import variadic functions. Inliner doesn't inline
variadic functions because it is hard to reason about it.
This one small fix improves Importer by about 16%
(going from 86% to 100% of imported functions that are
inlined anywhere)
on some spec benchmarks like 'int' and others.
Reviewers: eraman, mehdi_amini, tejohnson
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D23339
llvm-svn: 278432
When legal, extending trip count in the loop control logic generates better code compared to truncating IV. This is because
(1) extending trip count is a loop invariant operation (see genLoopLimit where we prove trip count is loop invariant).
(2) Scalar Evolution seems to have problems understanding trunc when computing loop trip count. So removing them allows better analysis performed in Scalar Evolution. (In particular this fixes PR 28363 which is the motivation for this change).
I am not going to perform any performance test. Any degradation caused by this should be an indication of a bug elsewhere.
To prove legality, we rely on SCEV to prove zext(trunc(IV)) == IV (or similarly for sext). If this holds, we can prove equivalence of trunc(IV)==ExitCnt (1) and IV == zext(ExitCnt). Simply take zext of boths sides of (1) and apply the proven equivalence.
This commit contains changes in a newly added testcase which was not included in the previous commit (which was reverted later on).
https://reviews.llvm.org/D23075
llvm-svn: 278421
Summary:
This is an extension of the fix in r271424. That fix dealt with builder
insert points being moved by SCEV expansion, but only for the lifetime
of the expand call. This change modifies the interface so that LSR can
safely call expand multiple times at the same insert point and do the
right thing if one of the expansions decides to move the original insert
point.
This is a fix for PR28719.
Reviewers: sanjoy
Subscribers: llvm-commits, mcrosier, mzolotukhin
Differential Revision: https://reviews.llvm.org/D23342
llvm-svn: 278413
Summary:
This fixes PR 28933 by making sure GVNHoist does not try to recreate memory
accesses when it has not actually moved them.
Reviewers: sebpop
Subscribers: llvm-commits, george.burgess.iv
Differential Revision: https://reviews.llvm.org/D23411
llvm-svn: 278401
Summary:
Keep track of all methods for which we have devirtualized at least
one call and then print them sorted alphabetically. That allows to
avoid duplicates and also makes the order deterministic.
Add optimization names into the remarks, so that it's easier to
understand how has each method been devirtualized.
Fix a bug when wrong methods could have been reported for
tryVirtualConstProp.
Reviewers: kcc, mehdi_amini
Differential Revision: https://reviews.llvm.org/D23297
llvm-svn: 278389
This adds a createFunctionInliningPass pass that takes an InlineParams object and use this to create the pre-inliner pass. This prevents the regular inliner's threshold flag from influencing the preinliner.
Differential revision: https://reviews.llvm.org/D23377
llvm-svn: 278377
When legal, extending trip count in the loop control logic generates better code compared to truncating IV. This is because
(1) extending trip count is a loop invariant operation (see genLoopLimit where we prove trip count is loop invariant).
(2) Scalar Evolution seems to have problems understanding trunc when computing loop trip count. So removing them allows better analysis performed in Scalar Evolution. (In particular this fixes PR 28363 which is the motivation for this change).
I am not going to perform any performance test. Any degradation caused by this should be an indication of a bug elsewhere.
To prove legality, we rely on SCEV to prove zext(trunc(IV)) == IV (or similarly for sext). If this holds, we can prove equivalence of trunc(IV)==ExitCnt (1) and IV == zext(ExitCnt). Simply take zext of boths sides of (1) and apply the proven equivalence.
https://reviews.llvm.org/D23075
llvm-svn: 278334
Change --no-pgo-warn-missing to -pgo-warn-missing-function
and negate the default. /NFC
Add more test to make sure the warning is off by default
llvm-svn: 278314
Summary:
I think it is much better this way.
When I firstly saw line:
Cost += InlineConstants::LastCallToStaticBonus;
I though that this is a bug, because everywhere where the cost is being reduced
it is usuing -=.
Reviewers: eraman, tejohnson, mehdi_amini
Subscribers: llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D23222
llvm-svn: 278290
We are seeing r276077 drastically increasing compiler time for our larger
benchmarks in PGO profile generation build (both clang based and IR based
mode) -- it can be 20x slower than without the patch (like from 30 secs to
780 secs)
The increased time are all in pass LCSSA. The problematic code is about
PostProcessPHIs after use-rewrite. Note that the InsertedPhis from ssa_updater
is accumulating (never been cleared). Since the inserted PHIs are added to the
candidate for each rewrite, The earlier ones will be repeatedly added. Later
when adding the new PHIs to the work-list, we don't check the duplication
either. This can result in extremely long work-list that containing tons of
duplicated PHIs.
This patch fixes the issue by hoisting the code out of the loop.
Differential Revision: http://reviews.llvm.org/D23344
llvm-svn: 278250
Summary:
A particular coroutine usage pattern, where a coroutine is created, manipulated and
destroyed by the same calling function, is common for coroutines implementing
RAII idiom and is suitable for allocation elision optimization which avoid
dynamic allocation by storing the coroutine frame as a static `alloca` in its
caller.
coro.free and coro.alloc intrinsics are used to indicate which code needs to be suppressed
when dynamic allocation elision happens:
```
entry:
%elide = call i8* @llvm.coro.alloc()
%need.dyn.alloc = icmp ne i8* %elide, null
br i1 %need.dyn.alloc, label %coro.begin, label %dyn.alloc
dyn.alloc:
%alloc = call i8* @CustomAlloc(i32 4)
br label %coro.begin
coro.begin:
%phi = phi i8* [ %elide, %entry ], [ %alloc, %dyn.alloc ]
%hdl = call i8* @llvm.coro.begin(i8* %phi, i32 0, i8* null,
i8* bitcast ([2 x void (%f.frame*)*]* @f.resumers to i8*))
```
and
```
%mem = call i8* @llvm.coro.free(i8* %hdl)
%need.dyn.free = icmp ne i8* %mem, null
br i1 %need.dyn.free, label %dyn.free, label %if.end
dyn.free:
call void @CustomFree(i8* %mem)
br label %if.end
if.end:
...
```
If heap allocation elision is performed, we replace coro.alloc with a static alloca on the caller frame and coro.free with null constant.
Also, we need to make sure that if there are any tail calls referencing the coroutine frame, we need to remote tail call attribute, since now coroutine frame lives on the stack.
Documentation and overview is here: http://llvm.org/docs/Coroutines.html.
Upstreaming sequence (rough plan)
1.Add documentation. (https://reviews.llvm.org/D22603)
2.Add coroutine intrinsics. (https://reviews.llvm.org/D22659)
3.Add empty coroutine passes. (https://reviews.llvm.org/D22847)
4.Add coroutine devirtualization + tests.
ab) Lower coro.resume and coro.destroy (https://reviews.llvm.org/D22998)
c) Do devirtualization (https://reviews.llvm.org/D23229)
5.Add CGSCC restart trigger + tests. (https://reviews.llvm.org/D23234)
6.Add coroutine heap elision + tests. <= we are here
7.Add the rest of the logic (split into more patches)
Reviewers: mehdi_amini, majnemer
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D23245
llvm-svn: 278242
This is a resubmission of previously reverted r277592. It was hitting overly strong assertion in getConstantRange which was relaxed in r278217.
Use LVI to prove that adds do not wrap. The change is motivated by https://llvm.org/bugs/show_bug.cgi?id=28620 bug and it's the first step to fix that problem.
Reviewed By: sanjoy
Differential Revision: http://reviews.llvm.org/D23059
llvm-svn: 278220