This reverts commit 9bd22595bad36cd19f5e7ae18ccd9f41cba29dc5.
Seeing some bot failures which look plausibly connected. Revert while investigating/waiting for bots to stablize.
e.g. https://lab.llvm.org/buildbot#builders/36/builds/15933
If we have an exit which is controlled by a loop invariant condition and which dominates the latch, we know only the copy in the first unrolled iteration can be taken. All other copies are dead.
The change itself is pretty straight forward, but let me add two points of context:
* I'd have expected other transform passes to catch this after unrolling, but I'm seeing multiple examples where we get to the end of O2/O3 without simplifying.
* I'd like to do a stronger change which did CSE during unroll and accounted for invariant expressions (as defined by SCEV instead of trivial ones from LoopInfo), but that doesn't fit cleanly into the current code structure.
Differential Revision: https://reviews.llvm.org/D116496
The unrolling code was previously inserting new cloned blocks at the end of the function. The result of this with typical loop structures is that the new iterations are placed far from the initial iteration.
With unrolling, the general assumption is that the a) the loop is reasonable hot, and b) the first Count-1 copies of the loop are rarely (if ever) loop exiting. As such, placing Count-1 copies out of line is a fairly poor code placement choice. We'd much rather fall through into the hot (non-exiting) path. For code with branch profiles, later layout would fix this, but this may have a positive impact on non-PGO compiled code.
However, the real motivation for this change isn't performance. Its readability and human understanding. Having to jump around long distances in an IR file to trace an unrolled loop structure is error prone and tedious.
Fold all exits based on known trip count/multiple information from
SCEV. Previously only the latch exit or the single exit were folded.
This doesn't yet eliminate ULO.TripCount and ULO.TripMultiple
entirely: They're still used to a) decide whether runtime unrolling
should be performed and b) for ORE remarks. However, the core
unrolling logic is independent of them now.
Differential Revision: https://reviews.llvm.org/D104203
Unrolling with more iterations than MaxTripCount is pointless, as
those iterations can never be executed. As such, we clamp ULO.Count
to MaxTripCount if it is known. This means we no longer need to
consider iterations after MaxTripCount for exit folding, and the
CompletelyUnroll flag becomes independent of ULO.TripCount.
Differential Revision: https://reviews.llvm.org/D103748
Initially it failed an assertion with "Do actual DCE in LoopUnroll (try 2)"
which was later reverted. Make sure that when this patch is returned, the
test works fine.
Turns out simplifyLoopIVs sometimes returns a non-dead instruction in it's DeadInsts out param. I had done a bit of NFC cleanup which was only NFC if simplifyLoopIVs obeyed it's documentation. I'm simplfy dropping that part of the change.
Commit message from try 3:
Recommitting after fixing a bug found post commit. Amusingly, try 1 had been correct, and by reverting to incorporate last minute review feedback, I introduce the bug. Oops. :)
Original commit message:
The problem was that recursively deleting an instruction can delete instructions beyond the current iterator (via a dead phi), thus invalidating iteration. Test case added in LoopUnroll/dce.ll to cover this case.
LoopUnroll does a limited DCE pass after unrolling, but if you have a chain of dead instructions, it only deletes the last one. Improve the code to recursively delete all trivially dead instructions.
Differential Revision: https://reviews.llvm.org/D102511
Recommitting after fixing a bug found post commit. Amusingly, try 1 had been correct, and by reverting to incorporate last minute review feedback, I introduce the bug. Oops. :)
The problem was that recursively deleting an instruction can delete instructions beyond the current iterator (via a dead phi), thus invalidating iteration. Test case added in LoopUnroll/dce.ll to cover this case.
LoopUnroll does a limited DCE pass after unrolling, but if you have a chain of dead instructions, it only deletes the last one. Improve the code to recursively delete all trivially dead instructions.
Differential Revision: https://reviews.llvm.org/D102511
Recommitting after addressing a missed review comment, and updating an aarch64 test I'd missed.
LoopUnroll does a limited DCE pass after unrolling, but if you have a chain of dead instructions, it only deletes the last one. Improve the code to recursively delete all trivially dead instructions.
Differential Revision: https://reviews.llvm.org/D102511
This reverts commit 9d1a61e695eb01298e26c76867d65592f1e1968c.
I'd missed some review feedback, and had missed updating an aarch64 test. Reverting while I fix both.
LoopUnroll does a limited DCE pass after unrolling, but if you have a chain of dead instructions, it only deletes the last one. Improve the code to recursively delete all trivially dead instructions.
Differential Revision: https://reviews.llvm.org/D102511
The legacy pass is called "loop-unroll", but in the new PM it's called "unroll".
Also applied to unroll-and-jam and unroll-full.
Fixes various check-llvm tests when NPM is turned on.
Reviewed By: Whitney, dmgreen
Differential Revision: https://reviews.llvm.org/D82590
is not necessary one of them.
Summary: Currently LoopUnrollPass already allow loops with multiple
exiting blocks, but it is only allowed when the loop latch is one of the
exiting blocks.
When the loop latch is not an exiting block, then only single exiting
block is supported.
When possible, the single loop latch or the single exiting block
terminator is optimized to an unconditional branch in the unrolled loop.
This patch allows loops with multiple exiting blocks even if the loop
latch is not one of them. However, the optimization of exiting block
terminator to unconditional branch is not done when there exists more
than one exiting block.
Reviewer: dmgreen, Meinersbur, etiotto, fhahn, efriedma, bmahjour
Reviewed By: efriedma
Subscribers: hiraditya, zzheng, llvm-commits
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D81053
rG7873376bb36b fixes a build failure for allyesconfig.
The problem happened when the single exiting block doesn't dominate the
loop latch, then the immediate dominator of the exit block should not be
the exiting block after unrolling. As the exiting block of
different unrolled iteration can branch to the exit block, and the ith
exiting block doesn't dominate (i+1)th exiting block, the immediate
dominator of the exit block should not the nearest common dominator of
the exiting block and the loop latch of the same iteration.
Differential Revision: https://reviews.llvm.org/D80477