This is fix for
[187902](https://github.com/llvm/llvm-project/issues/187902), where
`LoopInfo` is not in a valid state at the beginning of `ScalarEvolution::createSCEVIter`.
The reason for the bug is that, `mergeLatch()` is called at a place
where control flow and dominator trees have been updated but `LoopInfo`
has not completed the update yet. `mergeLatch()` calls into
`ScalarEvolution` that uses `LoopInfo`, where out-of-date `LoopInfo` would
result in crash or unpredictable results.
This patch moves `mergeLatch()` to the place where `LoopInfo` has
completed its update and hence is in a valid state.
If the function dependencesAllowFusion returns false, in fuseCandidates
the reportLoopFusion function is used to increment InvalidDependencies
and to emit a OptimizationRemarkMissed. If both dependencesAllowFusion
and reportLoopFusion increment InvalidDependencies, statistics will
appear duplicated
Loop Fusion includes some internal dependence analysis code. Currently
the pass uses both DA and internal code and chooses the best result. The
goal is to use DA for all dependence analysis requirements in fusion.
This patch changes the default value. Removing the code will be done
separately later.
I ended up relaxing some of the checks that LoopInterchange made, the
assumptions that certain instructions were branches seemed to not be
used at all.
A LoopVector contains all the loops with the same parent loop (or all
loops with no parent). Once loop fusion is done with the transformation
for candidates extracted from one LoopVector we can safely clear
FusionCandidates. This avoids unnecssary work and results in more
meaningful statistics.
The order of visiting loops in collectFusionCandidates() guarantees that
a new member can only possibly be added to the end of a set.
Also currently `NumFusionCandidates` counts any loop that is added to a
candidate set. Usually large majority of candidate sets have a single
members so they are not really candidates for fusion. Only the second
member of a candidate set and the ones that come after that could be
counted as fusion candidates.
This patch fixes the issue #115279. After the fusion, some of the cached
SCEV values such as the induction variable may not be valid anymore and
need to be forgotten.
Fixed issue #165087.
When we sink phis from the 2nd loop preheader to the exit block, we
optimize it a bit further, i.e., propagate the uses of each phi node with
its incoming value and optimize away the phis. Deleted `fixPHINodes()`
too because the phis are already optimized away and there is no point
processing `fixPHINodes()`.
Loop fusion assumes the non-loop block of a guarded adjacent loop is the
immediate successor of its exit block. This patch ensures this condition
is hold and fixes the crash #166356.
Considering that the current loop fusion only supports adjacent loops,
we are able to simplify the checks in this pass. By removing
`isControlFlowEquivalent` check, this patch fixes multiple issues
including #166560, #166535, #165031, #80301 and #168263.
Now only the sequential/adjacent candidates are collected in the same
list. This patch is the implementation of approach 2 discussed in post
#171207.
Loop fusion pass will use the information provided by the recent
DA patch to fuse additional legal loops, including those with
forward loop-carried dependencies.
If we have instructions in second loop's preheader which can be sunk, we
should also be adjusting PHI nodes to receive values from the fused loop's latch block.
Fixes#128600
This reverts the revert commit bf92b127d2637948f53d11a187e865aa10e2e74c.
This adds missing initialization of PeelLast in gatherPeelingPreferences.
Original message:
Generalize countToEliminateCompares to also consider peeling off the
last iteration if it eliminates a compare.
At the moment, codegen for peeling off the last iteration is quite
restrictive and callers have to make sure that the exit condition can be
adjusted when peeling and that the loop executes at least 2 iterations.
Both will be relaxed in follow-ups.
PR: https://github.com/llvm/llvm-project/pull/139551
Generalize countToEliminateCompares to also consider peeling off the
last iteration if it eliminates a compare.
At the moment, codegen for peeling off the last iteration is quite
restrictive and callers have to make sure that the exit condition can be
adjusted when peeling and that the loop executes at least 2 iterations.
Both will be relaxed in follow-ups.
PR: https://github.com/llvm/llvm-project/pull/139551
Parameter PossiblyLoopIndependent has lost its intended purpose. This
flag is always set to true in all cases when depends() is called, hence
we want to reconsider the utility of this variable and remove it from
the function signature entirely. This is an NFC patch.
As part of the "RemoveDIs" project, BasicBlock::iterator now carries a
debug-info bit that's needed when getFirstNonPHI and similar feed into
instruction insertion positions. Call-sites where that's necessary were
updated a year ago; but to ensure some type safety however, we'd like to
have all calls to moveBefore use iterators.
This patch adds a (guaranteed dereferenceable) iterator-taking
moveBefore, and changes a bunch of call-sites where it's obviously safe
to change to use it by just calling getIterator() on an instruction
pointer. A follow-up patch will contain less-obviously-safe changes.
We'll eventually deprecate and remove the instruction-pointer
insertBefore, but not before adding concise documentation of what
considerations are needed (very few).
A dominance query of a block that is in a different function is
ill-defined, so assert that getNode() is only called for blocks that are
in the same function.
There are three cases, where this behavior did occur. LoopFuse didn't
explicitly do this, but didn't invalidate the SCEV block dispositions,
leaving dangling pointers to free'ed basic blocks behind, causing
use-after-free. We do, however, want to be able to dereference basic
blocks inside the dominator tree, so that we can refer to them by a
number stored inside the basic block.
Reverts #102780
Reland #101198Fixes#102784
Co-authored-by: Alexis Engelke <engelke@in.tum.de>
A dominance query of a block that is in a different function is
ill-defined, so assert that getNode() is only called for blocks that are
in the same function.
There are two cases, where this behavior did occur. LoopFuse didn't
explicitly do this, but didn't invalidate the SCEV block dispositions,
leaving dangling pointers to free'ed basic blocks behind, causing
use-after-free. We do, however, want to be able to dereference basic
blocks inside the dominator tree, so that we can refer to them by a
number stored inside the basic block.
Continuing the patch series to get rid of debug intrinsics [0], instruction
insertion needs to be done with iterators rather than instruction pointers,
so that we can communicate information in the iterator class. This patch
adds an iterator-taking insertBefore method and converts various call sites
to take iterators. These are all sites where such debug-info needs to be
preserved so that a stage2 clang can be built identically; it's likely that
many more will need to be changed in the future.
At this stage, this is just changing the spelling of a few operations,
which will eventually become signifiant once the debug-info bearing
iterator is used.
[0] https://discourse.llvm.org/t/rfc-instruction-api-changes-needed-to-eliminate-debug-intrinsics-from-ir/68939
Differential Revision: https://reviews.llvm.org/D152537
This patch tries to fix [[ https://github.com/llvm/llvm-project/issues/56263 | issue ]].
If two **FusionCandidates** are in same level of dominator tree then, they will not be dominates each other. But they are control flow equivalent. To sort those FusionCandidates **nonStrictlyPostDominate** check is needed.
Reviewed By: Narutoworld
Differential Revision: https://reviews.llvm.org/D139993
The value map of last peeled iteration is computed within peelLoop API.
This patch exposes it for callers of peelLoop.
While this is not currently used by upstream passes, we have a usecase
downstream which benefits from this API update. Future users of peelLoop
can also use the ValueMap if needed.
Similar value maps are exposed by other loop utilities such as loop
cloning.
Differential Revision: https://reviews.llvm.org/D138228
Use a consistent type for the operands() methods of different SCEV
types. Also make the API consistent by only providing operands(),
rather than also providin op_begin() and op_end() for some of them.
Fixes https://github.com/llvm/llvm-project/issues/59023
PHI nodes that are in the second loop only have the first loop as its
predecessor. These PHI nodes should be sunk to the end of the fused
loop. If the second loop uses the PHI, then the loops cannot be fused.
I don't think this should happen in typical compilation workflows.
The PHI will be in a dedicated exit block of the first loop following
LCSSA transformations.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D139812
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated. The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.
This is part of an effort to migrate from llvm::Optional to
std::optional:
https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
LoopInfo doesn't give all loops in a loop nest, it gives top level loops
only. While isLoopSimplifyForm() only checkes for the outter most loop of a
loop nest. As a result, inner loops that are not in simplied form can
not be simplified with the original code.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D137672
Loop Fusion (Function Pass) requires loops in simplified form. With
legacy-pm, loop-simplify pass is added as a dependency for loop-fusion.
But the new pass manager does not always ensure this format. This patch
tries to invoke simplifyLoop() on loops that are not in simplified form
only for new PM.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D136781
This bug was found by recent improvement in SCEV verifier. The code in LoopFuse
directly reassigns blocks to be a part of a different loop, which should automatically
invalidate all related cached loop dispositions.
Differential Revision: https://reviews.llvm.org/D134173
Reviewed By: nikic
Currently, instructions in the preheader of the second of two fusion
candidates are sunk and hoisted whenever possible, to try to allow the
loops to fuse. Memory instructions are skipped, and are never sunk or
hoisted. This change adds memory instructions for sinking/hoisting
consideration.
This change uses DependenceAnalysis to check if a mem inst in the
preheader of FC1 depends on an instruction in FC0's header, across
which it will be hoisted, or FC1's header, across which it will be
sunk. We reject cases where the dependency is a data hazard.
Differential Revision: https://reviews.llvm.org/D131606