This change allows us to estimate trip count from profile metadata for all multiple exit loops. We still do the estimate only from the latch, but that's fine as it causes us to over estimate the trip count at worst.
Reviewing the uses of the API, all but one are cases where we restrict a loop transformation (unroll, and vectorize respectively) when we know the trip count is short enough. So, as a result, the change makes these passes strictly less aggressive. The test change illustrates a case where we'd previously have runtime unrolled a loop which ran fewer iterations than the unroll factor. This is definitely unprofitable.
The one case where an upper bound on estimate trip count could drive a more aggressive transform is peeling, and I duplicated the logic being removed from the generic estimation there to keep it the same. The resulting heuristic makes no sense and should probably be immediately removed, but we can do that in a separate change.
This was noticed when analyzing regressions on D113939.
I plan to come back and incorporate estimated trip counts from other exits, but that's a minor improvement which can follow separately.
Differential Revision: https://reviews.llvm.org/D115362
Added support for peeling loops with exits that are followed either by an
unreachable-terminated block or block that has a terminatnig deoptimize call.
All blocks in the sequence must have an unique successor, maybe except
for the last one.
Reviewed By: mkazantsev
Differential Revision: https://reviews.llvm.org/D110922
When peeling a loop, we assume that the latch has a `br` terminator and that
all loop exits are either terminated with an `unreachable` or have a terminating
deoptimize call. So when we peel off the 1st iteration, we change the IDom of
all loop exits to the peeled copy of `NCD(IDom(Exit), Latch)`. This works now,
but if we add logic to support loops with exits that are followed by a block
with an `unreachable` or a terminating deoptimize call, changing the exit's idom
wouldn't be enough and DT would be broken.
For example, let `Exit1` and `Exit2` are loop exits, and each of them
unconditionally branches to the same `unreachable` terminated block. So neither
of the exits dominates this unreachable block. If we change the IDoms of the
exits to some peeled loop block, we don't update the dominators of the unreachable
block. Currently we just don't get to the peeling logic, saying that we can't peel
such loops.
Previously we stored exits' IDoms in a map before peeling a loop and then, after
peeling off one iteration, we changed their IDoms.
Now we use the same logic not only for exits but for all non-loop blocks dominated
by the loop.
So when we add logic to support peeling loops with exits which branch, for example,
to an unreachable-terminated block, we would update the IDoms not only for exits,
but for their successors.
Patch by Dmitry Makogon!
Differential Revision: https://reviews.llvm.org/D111611
Reviewed By: mkazantsev, nikic
This reverts commit fa16329ae0721023376f24c7577b9020d438df1a.
See comments in discussion. Merged by mistake, not entirely getting what
the problem was.
When peeling a loop, we assume that the latch has a `br` terminator and
that all loop exits are either terminated with an `unreachable` or have
a terminating deoptimize call. So when we peel off the 1st iteration, we
change the IDom of all loop exits to the peeled copy of
`NCD(IDom(Exit), Latch)`. This works now, but if we add logic to support
loops with exits that are followed by a block with an `unreachable` or a
terminating deoptimize call, changing the exit's idom wouldn't be enough
and DT would be broken.
For example, let `Exit1` and `Exit2` are loop exits, and each of them
unconditionally branches to the same `unreachable` terminated block. So
neither of the exits dominates this unreachable block. If we change the
IDoms of the exits to some peeled loop block, we don't update the
dominators of the unreachable block. Currently we just don't get to the
peeling logic, saying that we can't peel such loops.
With this NFC we just insert edges from cloned exiting blocks to their
exits after peeling each iteration (we accumulate the insertion updates
and then after peeling apply the updates to DT).
This patch was a part of D110922.
Patch by Dmitry Makogon!
Differential Revision: https://reviews.llvm.org/D111611
Reviewed By: mkazantsev
This patch adds a new cost heuristic that allows peeling a single
iteration off read-only loops, if the loop contains a load that
1. is feeding an exit condition,
2. dominates the latch,
3. is not already known to be dereferenceable,
4. and has a loop invariant address.
If all non-latch exits are terminated with unreachable, such loads
in the loop are guaranteed to be dereferenceable after peeling,
enabling hoisting/CSE'ing them.
This enables vectorization of loops with certain runtime-checks, like
multiple calls to `std::vector::at` if the vector is passed as pointer.
Reviewed By: mkazantsev
Differential Revision: https://reviews.llvm.org/D108114
Removed obsolete DT verification that should not be there because the
strategy of DT updates has changed.
Differential Revision: https://reviews.llvm.org/D110922
Added support for peeling loops with "deoptimizing" exits -
such exits that it or any of its children (or any of their
children, etc) either has a @llvm.experimental.deoptimize call
prior to the terminating return instruction of this basic block
or is terminated with unreachable. All blocks in the the
sequence must have a single successor, maybe except for the last
one.
Previously we only checked the exit block for being deoptimizing.
Now we check if the last reachable block from the exit is deoptimizing.
Patch by Dmitry Makogon!
Differential Revision: https://reviews.llvm.org/D110922
Reviewed By: mkazantsev
Support for peeling with multiple exit blocks was added in D63921/77bb3a486fa6.
So far it has only been enabled for loops where all non-latch exits are
'de-optimizing' exits (D63923). But peeling of multi-exit loops can be
highly beneficial in other cases too, like if all non-latch exiting
blocks are unreachable.
The motivating case are loops with runtime checks, like the C++ example
below. The main issue preventing vectorization is that the invariant
accesses to load the bounds of B is conditionally executed in the loop
and cannot be hoisted out. If we peel off the first iteration, they
become dereferenceable in the loop, because they must execute before the
loop is executed, as all non-latch exits are terminated with
unreachable. This subsequently allows hoisting the loads and runtime
checks out of the loop, allowing vectorization of the loop.
int sum(std::vector<int> *A, std::vector<int> *B, int N) {
int cost = 0;
for (int i = 0; i < N; ++i)
cost += A->at(i) + B->at(i);
return cost;
}
This gives a ~20-30% increase of score for Geekbench5/HDR on AArch64.
Note that this requires a follow-up improvement to the peeling cost
model to actually peel iterations off loops as above. I will share that
shortly.
Also, peeling of multi-exits might be beneficial for exit blocks with
other terminators, but I would like to keep the scope limited to known
high-reward cases for now.
I removed the option to disable peeling for multi-deopt exits because
the code is more general now. Alternatively, the option could also be
generalized, but I am not sure if there's much value in the option?
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D108108
The reduction of a sanitizer build failure when enabling the dominance check (D95335) showed that loop peeling also needs to take care of scope duplication, just like loop unrolling (D92887).
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D95544
Loop peeling assumes that the loop's latch is a conditional branch. Add
a check to canPeel that explicitly checks for this, and testcases that
otherwise fail an assertion when trying to peel a loop whose back-edge
is a switch case or the non-unwind edge of an invoke.
Reviewed By: skatkov, fhahn
Differential Revision: https://reviews.llvm.org/D94995
Summary: This patch separates the Loop Peeling Utilities from Loop Unrolling.
The reason for this change is that Loop Peeling is no longer only being used by
loop unrolling; Patch D82927 introduces loop peeling with fusion, such that
loops can be modified to have to same trip count, making them legal to be
peeled.
Reviewed By: Meinersbur
Differential Revision: https://reviews.llvm.org/D83056