19 Commits

Author SHA1 Message Date
Philip Reames
2d31b02517 Compute estimated trip counts for multiple exit loops
This change allows us to estimate trip count from profile metadata for all multiple exit loops. We still do the estimate only from the latch, but that's fine as it causes us to over estimate the trip count at worst.

Reviewing the uses of the API, all but one are cases where we restrict a loop transformation (unroll, and vectorize respectively) when we know the trip count is short enough. So, as a result, the change makes these passes strictly less aggressive. The test change illustrates a case where we'd previously have runtime unrolled a loop which ran fewer iterations than the unroll factor. This is definitely unprofitable.

The one case where an upper bound on estimate trip count could drive a more aggressive transform is peeling, and I duplicated the logic being removed from the generic estimation there to keep it the same. The resulting heuristic makes no sense and should probably be immediately removed, but we can do that in a separate change.

This was noticed when analyzing regressions on D113939.

I plan to come back and incorporate estimated trip counts from other exits, but that's a minor improvement which can follow separately.

Differential Revision: https://reviews.llvm.org/D115362
2021-12-09 09:53:49 -08:00
Max Kazantsev
cb728cb8a9 [NFC] Get rid of hardcoded magical constant and use Optionals instead
Refactor calculateIterationsToInvariance so that it doesn't need a magical
constant to signify unknown answer.
2021-11-09 18:13:19 +07:00
Dmitry Makogon
e09958d5eb [LoopPeel] Peel loops with exits followed by an unreachable or deopt block
Added support for peeling loops with exits that are followed either by an
unreachable-terminated block or block that has a terminatnig deoptimize call.
All blocks in the sequence must have an unique successor, maybe except
for the last one.

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D110922
2021-11-02 23:12:04 +07:00
Max Kazantsev
d4c74cd4e8 [NFC] [LoopPeel] Update IDoms of non-loop blocks dominated by the loop
When peeling a loop, we assume that the latch has a `br` terminator and that
all loop exits are either terminated with an `unreachable` or have a terminating
deoptimize call. So when we peel off the 1st iteration, we change the IDom of
all loop exits to the peeled copy of `NCD(IDom(Exit), Latch)`. This works now,
but if we add logic to support loops with exits that are followed by a block
with an `unreachable` or a terminating deoptimize call, changing the exit's idom
wouldn't be enough and DT would be broken.

For example, let `Exit1` and `Exit2` are loop exits, and each of them
unconditionally branches to the same `unreachable` terminated block. So neither
of the exits dominates this unreachable block. If we change the IDoms of the
exits to some peeled loop block, we don't update the dominators of the unreachable
block. Currently we just don't get to the peeling logic, saying that we can't peel
such loops.

Previously we stored exits' IDoms in a map before peeling a loop and then, after
peeling off one iteration, we changed their IDoms.
Now we use the same logic not only for exits but for all non-loop blocks dominated
by the loop.
So when we add logic to support peeling loops with exits which branch, for example,
to an unreachable-terminated block, we would update the IDoms not only for exits,
but for their successors.

Patch by Dmitry Makogon!

Differential Revision: https://reviews.llvm.org/D111611
Reviewed By: mkazantsev, nikic
2021-10-26 13:09:07 +07:00
Max Kazantsev
baad10c09e Revert "[NFC] [LoopPeel] Change the way DT is updated for loop exits"
This reverts commit fa16329ae0721023376f24c7577b9020d438df1a.

See comments in discussion. Merged by mistake, not entirely getting what
the problem was.
2021-10-18 17:14:11 +07:00
Max Kazantsev
fa16329ae0 [NFC] [LoopPeel] Change the way DT is updated for loop exits
When peeling a loop, we assume that the latch has a `br` terminator and
that all loop exits are either terminated with an `unreachable` or have
a terminating deoptimize call. So when we peel off the 1st iteration, we
change the IDom of all loop exits to the peeled copy of
`NCD(IDom(Exit), Latch)`. This works now, but if we add logic to support
loops with exits that are followed by a block with an `unreachable` or a
terminating deoptimize call, changing the exit's idom wouldn't be enough
and DT would be broken.

For example, let `Exit1` and `Exit2` are loop exits, and each of them
unconditionally branches to the same `unreachable` terminated block. So
neither of the exits dominates this unreachable block. If we change the
IDoms of the exits to some peeled loop block, we don't update the
dominators of the unreachable block. Currently we just don't get to the
peeling logic, saying that we can't peel such loops.

With this NFC we just insert edges from cloned exiting blocks to their
exits after peeling each iteration (we accumulate the insertion updates
and then after peeling apply the updates to DT).

This patch was a part of D110922.

Patch by Dmitry Makogon!

Differential Revision: https://reviews.llvm.org/D111611
Reviewed By: mkazantsev
2021-10-18 10:23:05 +07:00
Florian Hahn
40d85f16c4
[LoopPeel] Use any_of & contains instead of for & find.
Using contains was suggested in D108114, but I forgot to include it when
landing the patch.
2021-10-12 12:18:01 +01:00
Florian Hahn
cd0ba9dc58
[LoopPeel] Peel if it turns invariant loads dereferenceable.
This patch adds a new cost heuristic that allows peeling a single
iteration off read-only loops, if the loop contains a load that

    1. is feeding an exit condition,
    2. dominates the latch,
    3. is not already known to be dereferenceable,
    4. and has a loop invariant address.

If all non-latch exits are terminated with unreachable, such loads
in the loop are guaranteed to be dereferenceable after peeling,
enabling hoisting/CSE'ing them.

This enables vectorization of loops with certain runtime-checks, like
multiple calls to `std::vector::at` if the vector is passed as pointer.

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D108114
2021-10-12 11:42:28 +01:00
Arthur Eubanks
9405217999 Revert "Recommit "[LoopPeel] Peel loops with deoptimizing exits""
This reverts commit d68b59f3ebb253ee7119a25a71c51cf19b73e030.

This is causing crashes, see D110922 for details.
2021-10-08 10:53:23 -07:00
Max Kazantsev
d68b59f3eb Recommit "[LoopPeel] Peel loops with deoptimizing exits"
Removed obsolete DT verification that should not be there because the
strategy of DT updates has changed.

Differential Revision: https://reviews.llvm.org/D110922
2021-10-08 17:54:27 +07:00
Max Kazantsev
48a5a2d1af Revert "[LoopPeel] Peel loops with deoptimizing exits"
This reverts commit 8a959625c433f311233682afa7bfe1c76367700d.

Reported failures with LLVM_ENABLE_EXPENSIVE_CHECKS, need to investigate.
2021-10-08 16:07:59 +07:00
Max Kazantsev
8a959625c4 [LoopPeel] Peel loops with deoptimizing exits
Added support for peeling loops with "deoptimizing" exits -
such exits that it or any of its children (or any of their
children, etc) either has a @llvm.experimental.deoptimize call
prior to the terminating return instruction of this basic block
or is terminated with unreachable. All blocks in the the
sequence must have a single successor, maybe except for the last
one.

Previously we only checked the exit block for being deoptimizing.
Now we check if the last reachable block from the exit is deoptimizing.

Patch by Dmitry Makogon!

Differential Revision: https://reviews.llvm.org/D110922
Reviewed By: mkazantsev
2021-10-08 10:32:13 +07:00
Florian Hahn
90d09eb300
[LoopPeel] Allow peeling with multiple unreachable-terminated exit blocks.
Support for peeling with multiple exit blocks was added in D63921/77bb3a486fa6.

So far it has only been enabled for loops where all non-latch exits are
'de-optimizing' exits (D63923). But peeling of multi-exit loops can be
highly beneficial in other cases too, like if all non-latch exiting
blocks are unreachable.

The motivating case are loops with runtime checks, like the C++ example
below. The main issue preventing vectorization is that the invariant
accesses to load the bounds of B is conditionally executed in the loop
and cannot be hoisted out. If we peel off the first iteration, they
become dereferenceable in the loop, because they must execute before the
loop is executed, as all non-latch exits are terminated with
unreachable. This subsequently allows hoisting the loads and runtime
checks out of the loop, allowing vectorization of the loop.

     int sum(std::vector<int> *A, std::vector<int> *B, int N) {
       int cost = 0;
       for (int i = 0; i < N; ++i)
         cost += A->at(i) + B->at(i);
       return cost;
     }

This gives a ~20-30% increase of score for Geekbench5/HDR on AArch64.

Note that this requires a follow-up improvement to the peeling cost
model to actually peel iterations off loops as above. I will share that
shortly.

Also, peeling of multi-exits might be beneficial for exit blocks with
other terminators, but I would like to keep the scope limited to known
high-reward cases for now.

I removed the option to disable peeling for multi-deopt exits because
the code is more general now. Alternatively, the option could also be
generalized, but I am not sure if there's much value in the option?

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D108108
2021-08-25 13:26:40 +01:00
Max Kazantsev
26ec76add5 [NFC] One more use case for evaluatePredicate 2021-03-18 19:21:29 +07:00
Jeroen Dobbelaere
80cdd30eb9 [LoopPeel] Use llvm.experimental.noalias.scope.decl for duplicating noalias metadata as needed.
The reduction of a sanitizer build failure when enabling the dominance check (D95335) showed that loop peeling also needs to take care of scope duplication, just like loop unrolling (D92887).

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D95544
2021-02-01 10:01:17 +01:00
Joseph Tremoulet
40cd262c43 Loop peeling: check that latch is conditional branch
Loop peeling assumes that the loop's latch is a conditional branch.  Add
a check to canPeel that explicitly checks for this, and testcases that
otherwise fail an assertion when trying to peel a loop whose back-edge
is a switch case or the non-unwind edge of an invoke.

Reviewed By: skatkov, fhahn

Differential Revision: https://reviews.llvm.org/D94995
2021-01-20 11:01:16 -05:00
Max Kazantsev
a5b2e795c3 [NFC][SCEV] Refactor monotonic predicate checks to return enums instead of bools
This patch gets rid of output parameter which is not needed for most users
and prepares this API for further refactoring.
2020-10-29 16:01:25 +07:00
Stefanos Baziotis
89c1e35f3c [LoopInfo] empty() -> isInnermost(), add isOutermost()
Differential Revision: https://reviews.llvm.org/D82895
2020-09-22 23:28:51 +03:00
Sidharth Baveja
b7cfa6ca92 [Loop Peeling] Separate the Loop Peeling Utilities from the Loop Unrolling Utilities
Summary: This patch separates the Loop Peeling Utilities from Loop Unrolling.
The reason for this change is that Loop Peeling is no longer only being used by
loop unrolling; Patch D82927 introduces loop peeling with fusion, such that
loops can be modified to have to same trip count, making them legal to be
peeled.

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D83056
2020-07-31 18:31:58 +00:00