This just makes it more obvious that having Parent as the single
predecessor is a special case, instead of checking for it in the middle
of a loop that finds the nearest common dominator of multiple
predecessors.
Currently `StructurizeCFG` drops branch_weight metadata .
This metadata can be generated from user annotations in the source code
like:
```cpp
if (...) [[likely]] {
}
```
After investigating more while-break cases, I think we should try to
optimize
the way we reconstruct phi nodes. Previously, we reconstruct each phi
nodes separately, but this is not optimal. For example:
```
header:
%v.1 = phi float [ %v, %entry ], [ %v.2, %latch ]
br i1 %cc, label %if, label %latch
if:
%v.if = fadd float %v.1, 1.0
br i1 %cc2, label %latch, label %exit
latch:
%v.2 = phi float [ %v.if, %if ], [ %v.1, %header ]
br i1 %cc3, label %exit, label %header
exit:
%v.3 = phi float [ %v.2, %latch ], [ %v.if, %if ]
```
For this case, we have different copies of value `v`, but there is at
most one copy of value `v` alive at any program point shown above.
The existing ssa reconstruction will use the incoming values from the
old deleted phi. Below is a possible output after ssa reconstruction.
```
header:
%v.1 = phi float [ %v, %entry ], [ %v.loop, %Flow1 ]
br i1 %cc, label %if, label %flow
if:
%v.if = fadd float %v.1, 1.0
br label %flow
flow:
%v.exit.if = phi float [ %v.if, %if ], [ undef, %header ]
%v.latch = phi float [ %v.if, %if ], [ %v.1, %header ]
latch:
br label %flow1
flow1:
%v.loop = phi float [ %v.latch, %latch ], [ undef, %Flow ]
%v.exit = phi float [ %v.latch, %latch ], [ %v.exit.if, %Flow ]
exit:
%v.3 = phi float [ %v.exit, %flow1 ]
```
If we look closely, in order to reconstruct `v.1` `v.2` `v.3`, we are
having two simultaneous copies of `v` alive at `flow` and `flow1`.
We highly depend on register coalescer to coalesce them together.
But register coalescer may not always be able to coalesce them
because of the complexity in the chain of phi.
On the other side, now that we have only one copy of `v` alive at any
program point before the transform, why not simplify the phi network
as much as we can? Look at the incoming values of these PHIs:
```
header if latch
v.1: -- -- v.2
v.2: v.1 v.if --
v.3: -- v.if v.2
```
If we let them share the same incoming values for these three different
incoming blocks, then we would have only one copy of alive `v` at any
program point after ssa reconstruction. Something like:
```
header:
%v.1 = phi float [ %v, %entry ], [ %v.2, %Flow1 ]
br i1 %cc, label %if, label %flow
if:
%v.if = fadd float %v.1, 1.0
br label %flow
flow:
%v.2 = phi float [ %v.if, %if ], [ %v.1, %header ]
latch:
br label %flow1
flow1:
...
exit:
%v.3 = phi float [ %v.2, %flow1 ]
```
Some passes has limitation that only support simple terminators:
branch/unreachable/return. Right now, they ask the pass manager to add
LowerSwitch pass to eliminate `switch`. Let's manage such kind of pass
dependency by ourselves. Also add the assertion in the related passes.
Small oversight in https://reviews.llvm.org/D145688 - the pass' dependency was not updated to reflect the change to UA.
Also, change DivergenceAnalysis to UniformityAnalysis in a comment. That way, StructurizeCFG only refers to UA and not DA anymore.
During structurization process, we may place non-predecessor blocks
between the predecessors of a block in the structurized CFG. Take
the typical while-break case as an example:
```
/---A(v=...)
| / \
^ B C
| \ /|
\---L |
\ /
E (r = phi (v:C)...)
```
After structurization, the CFG would be look like:
```
/---A
| |\
| | C
| |/
| F1
^ |\
| | B
| |/
| F2
| |\
| | L
\ |/
\--F3
|
E
```
We can see that block B is placed between the predecessors(C/L) of E.
During phi reconstruction, to achieve the same sematics as before, we
are reconstructing the PHIs as:
F1: v1 = phi (v:C), (undef:A)
F3: r = phi (v1:F2), ...
But this is also saying that `v1` would be live through B, which is not
quite necessary. The idea in the change is to say the incoming value
from B is Undef for the PHI in E. With this change, the reconstructed
PHI would be:
F1: v1 = phi (v:C), (undef:A)
F2: v2 = phi (v1:F1), (undef:B)
F3: r = phi (v2:F2), ...
Reviewed by: sameerds
Differential Revision: https://reviews.llvm.org/D132450
The instruction simplification will try to simplify the affected phis.
In some cases, this might extend the liveness of values. For example:
BB0:
| \
| BB1
| /
BB2:phi (BB0, v), (BB1, undef)
The phi in BB2 will be simplified to v as v dominates BB2, but this is
increasing the number of active values in BB1. By setting CanUseUndef
to false, we will not simplify the phi in this way, this would help
register pressure. This is mandatory for the later change to help
reducing VGPR pressure for AMDGPU.
Reviewed by: foad, sameerds
Differential Revision: https://reviews.llvm.org/D132449
This reverts commit f1b05a0a2bbbea160002be709f8a1c59de366761.
Need to revert to due to issues identified with testing. The
transformation is incorrect for blocks that contain convergent
instructions.
StructurizeCFG linearizes the successors of branching basic block
by adding Flow blocks to record the true/false path for branches
and back edges. This patch reduces the number of Phi values needed
to capture the control flow path by improving the basic block
ordering.
Previously, StructurizeCFG adds loop exit blocks outside of the
loop. StructurizeCFG sets a boolean value to indicate the path
taken, and all exit block live values extend to after the loop.
For loops with a large number of exits blocks, this creates a
huge number of values that are maintained, which increases
compilation time and register pressure. This is problem
especially with ASAN, which adds early exits to blocks with
unreachable instructions for each instrumented check in the loop.
In specific cases, this patch reduces the number of values needed
after the loop by moving the exit block into the loop. This is
done for blocks that have a single predecessor and single successor
by moving the block to appear just after the predecessor.
Differential Revision: https://reviews.llvm.org/D123231
Clang-format InstructionSimplify and convert all "FunctionName"s to
"functionName". This patch does touch a lot of files but gets done with
the cleanup of InstructionSimplify in one commit.
This is the alternative to the less invasive clang-format only patch: D126783
Reviewed By: spatel, rengolin
Differential Revision: https://reviews.llvm.org/D126889
D118623 added code to fold not-of-compare into a compare
with the inverted predicate, if the compare had no other
uses. This relies on accurate use lists in the IR but it
was run before setPhiValues, when some phi inputs are still
stored in a data structure on the side, instead of being
real uses in the IR. The effect was that a phi that should
be using the original compare result would now get an
inverted result instead.
Fix this by moving simplifyConditions after setPhiValues.
Differential Revision: https://reviews.llvm.org/D120312
In some cases StructurizeCFG inserts i1 xor instructions to invert
predicates. Add a quick loop to clean these up afterwards if we can get
away with modifying an existing compare instruction instead.
(StructurizeCFG is generally run late in the pipeline so instcombine
does not clean them up for us.)
Differential Revision: https://reviews.llvm.org/D118623
This doesn't support -structurizecfg-skip-uniform-regions since that
would require porting LegacyDivergenceAnalysis.
The NPM doesn't support adding a non-analysis pass as a dependency of
another, so I had to add -lowerswitch to some tests or pin them to the
legacy PM.
This is the only RegionPass in tree, so I simply copied the logic for
finding all Regions from the legacy PM's RGManager into
StructurizeCFG::run().
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D89026
PassManager.h is one of the top headers in the ClangBuildAnalyzer frontend worst offenders list.
This exposes a large number of implicit dependencies on various forward declarations/includes in other headers that need addressing.
This is a reimplementation of the `orderNodes` function, as the old
implementation didn't take into account all cases.
The new implementation uses SCCs instead of Loops to take account of
irreducible loops.
Fix PR41509
Differential Revision: https://reviews.llvm.org/D79037
This reverts commit 897d8ee5cd693e17f95a7e84194bca4c089a520b,
due to causing an infinite loop when encountering a loop with
a sub-region with an inner loop.
This is a reimplementation of the `orderNodes` function, as the old
implementation didn't take into account all cases.
Fix PR41509
Differential Revision: https://reviews.llvm.org/D79037
For each natural loop with multiple exit blocks, this pass creates a
new block N such that all exiting blocks now branch to N, and then
control flow is redistributed to all the original exit blocks.
The bulk of the tranformation is a new function introduced in
BasicBlockUtils that an redirect control flow from a set of incoming
blocks to a set of outgoing blocks via a common "hub".
This is a useful workaround for a limitation in the structurizer which
incorrectly orders blocks when processing a nest of loops. This pass
bypasses that issue by ensuring that each natural loop is recognized
as a separate region. Since the structurizer is a region pass, it no
longer sees a nest of loops in a single region, and instead processes
each "level" in the nesting as a separate region.
The AMDGPU backend provides a new option to enable this pass before
the structurizer, which may eventually be enabled by default.
Reviewers: madhur13490, arsenm, nhaehnle
Reviewed By: nhaehnle
Differential Revision: https://reviews.llvm.org/D75865
After structurization, some phi nodes can have a single incoming edge
and can be simplified away. This change runs a simplify query on all
phis that are either modified or added by the structurizer. This also
moves some phis closer to their use as a side benefit.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D75500
This file lists every pass in LLVM, and is included by Pass.h, which is
very popular. Every time we add, remove, or rename a pass in LLVM, it
caused lots of recompilation.
I found this fact by looking at this table, which is sorted by the
number of times a file was changed over the last 100,000 git commits
multiplied by the number of object files that depend on it in the
current checkout:
recompiles touches affected_files header
342380 95 3604 llvm/include/llvm/ADT/STLExtras.h
314730 234 1345 llvm/include/llvm/InitializePasses.h
307036 118 2602 llvm/include/llvm/ADT/APInt.h
213049 59 3611 llvm/include/llvm/Support/MathExtras.h
170422 47 3626 llvm/include/llvm/Support/Compiler.h
162225 45 3605 llvm/include/llvm/ADT/Optional.h
158319 63 2513 llvm/include/llvm/ADT/Triple.h
140322 39 3598 llvm/include/llvm/ADT/StringRef.h
137647 59 2333 llvm/include/llvm/Support/Error.h
131619 73 1803 llvm/include/llvm/Support/FileSystem.h
Before this change, touching InitializePasses.h would cause 1345 files
to recompile. After this change, touching it only causes 550 compiles in
an incremental rebuild.
Reviewers: bkramer, asbirlea, bollu, jdoerfert
Differential Revision: https://reviews.llvm.org/D70211
D62198 introduced an option to relax the checks for
hasOnlyUniformBranches. This commit turns the option on by default, for
better code generation in some cases in AMDGPU.
Differential Revision: https://reviews.llvm.org/D63198
Change-Id: I9cbff002a1e74d3b7eb96b4192dc8129936d537d
llvm-svn: 368042
Summary:
There is PHINode::getBasicBlockIndex() and PHINode::setIncomingValue()
but no function to replace incoming value for a specified BasicBlock*
predecessor.
Clearly, there are a lot of places that could use that functionality.
Reviewer: craig.topper, lebedev.ri, Meinersbur, kbarton, fhahn
Reviewed By: Meinersbur, fhahn
Subscribers: fhahn, hiraditya, zzheng, jsji, llvm-commits
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D63338
llvm-svn: 363566
This change relaxes the checks for hasOnlyUniformBranches such that our
region is uniform if:
1. All conditional branches that are direct children are uniform.
2. And either:
a. All sub-regions are uniform.
b. There is one or less conditional branches among the direct
children.
Differential Revision: https://reviews.llvm.org/D62198
llvm-svn: 361610
to reflect the new license.
We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.
llvm-svn: 351636
by `getTerminator()` calls instead be declared as `Instruction`.
This is the biggest remaining chunk of the usage of `getTerminator()`
that insists on the narrow type and so is an easy batch of updates.
Several files saw more extensive updates where this would cascade to
requiring API updates within the file to use `Instruction` instead of
`TerminatorInst`. All of these were trivial in nature (pervasively using
`Instruction` instead just worked).
llvm-svn: 344502