This PR removes tests with `br i1 undef` under
`llvm/tests/Transforms/ObjCARC, Reassociate, SCCP, SLPVectorizer...`.
After this PR, I'll continue to fix tests under `llvm/tests/CodeGen`,
which has more UB tests than `llvm/tests/Transforms`.
Currently `StructurizeCFG` drops branch_weight metadata .
This metadata can be generated from user annotations in the source code
like:
```cpp
if (...) [[likely]] {
}
```
[FixIrreducible] Use CycleInfo instead of a custom SCC traversal
1. CycleInfo efficiently locates all cycles in a single pass, while the
SCC is
repeated inside every natural loop.
2. CycleInfo provides a hierarchy of irreducible cycles, and the new
implementation transforms each cycle in this hierarchy separately
instead of
reducing an entire irreducible SCC in a single step. This reduces the
number
of control-flow paths that pass through the header of each newly created
loop. This is evidenced by the reduced number of predecessors on the
"guard"
blocks in the lit tests, and fewer operands on the corresponding PHI
nodes.
3. When an entry of an irreducible cycle is the header of a child
natural loop,
the original implementation destroyed that loop. This is now preserved,
since the incoming edges on non-header entries are not touched.
4. In the new implementation, if an irreducible cycle is a superset of a
natural
loop with the same header, then that natural loop is destroyed and
replaced
by the newly created loop.
CreateControlFlowHub is a method that redirects control flow edges from a set of
incoming blocks to a set of outgoing blocks through a new set of "guard" blocks.
This is now refactored into a separate file with one enhancement: The input to
the method is now a set of branches rather than two sets of blocks.
The original implementation reroutes every edge from incoming blocks to outgoing
blocks. But it is possible that for some incoming block InBB, some successor S
might be in the set of outgoing blocks, but that particular edge should not be
rerouted. The new implementation makes this possible by allowing the user to
specify the targets of each branch that need to be rerouted.
This is needed when improving the implementation of FixIrreducible #101386.
Current use in FixIrreducible does not demonstrate this finer control over the
edges being rerouted. But in UnifyLoopExits, when only one successor of an
exiting block is an exit block, this refinement now reroutes only the relevant
control-flow through the edge; the non-exit successor is not rerouted. This
results in fewer branches and PHI nodes in the hub.
After investigating more while-break cases, I think we should try to
optimize
the way we reconstruct phi nodes. Previously, we reconstruct each phi
nodes separately, but this is not optimal. For example:
```
header:
%v.1 = phi float [ %v, %entry ], [ %v.2, %latch ]
br i1 %cc, label %if, label %latch
if:
%v.if = fadd float %v.1, 1.0
br i1 %cc2, label %latch, label %exit
latch:
%v.2 = phi float [ %v.if, %if ], [ %v.1, %header ]
br i1 %cc3, label %exit, label %header
exit:
%v.3 = phi float [ %v.2, %latch ], [ %v.if, %if ]
```
For this case, we have different copies of value `v`, but there is at
most one copy of value `v` alive at any program point shown above.
The existing ssa reconstruction will use the incoming values from the
old deleted phi. Below is a possible output after ssa reconstruction.
```
header:
%v.1 = phi float [ %v, %entry ], [ %v.loop, %Flow1 ]
br i1 %cc, label %if, label %flow
if:
%v.if = fadd float %v.1, 1.0
br label %flow
flow:
%v.exit.if = phi float [ %v.if, %if ], [ undef, %header ]
%v.latch = phi float [ %v.if, %if ], [ %v.1, %header ]
latch:
br label %flow1
flow1:
%v.loop = phi float [ %v.latch, %latch ], [ undef, %Flow ]
%v.exit = phi float [ %v.latch, %latch ], [ %v.exit.if, %Flow ]
exit:
%v.3 = phi float [ %v.exit, %flow1 ]
```
If we look closely, in order to reconstruct `v.1` `v.2` `v.3`, we are
having two simultaneous copies of `v` alive at `flow` and `flow1`.
We highly depend on register coalescer to coalesce them together.
But register coalescer may not always be able to coalesce them
because of the complexity in the chain of phi.
On the other side, now that we have only one copy of `v` alive at any
program point before the transform, why not simplify the phi network
as much as we can? Look at the incoming values of these PHIs:
```
header if latch
v.1: -- -- v.2
v.2: v.1 v.if --
v.3: -- v.if v.2
```
If we let them share the same incoming values for these three different
incoming blocks, then we would have only one copy of alive `v` at any
program point after ssa reconstruction. Something like:
```
header:
%v.1 = phi float [ %v, %entry ], [ %v.2, %Flow1 ]
br i1 %cc, label %if, label %flow
if:
%v.if = fadd float %v.1, 1.0
br label %flow
flow:
%v.2 = phi float [ %v.if, %if ], [ %v.1, %header ]
latch:
br label %flow1
flow1:
...
exit:
%v.3 = phi float [ %v.2, %flow1 ]
```
Remove support for the icmp and fcmp constant expressions.
This is part of:
https://discourse.llvm.org/t/rfc-remove-most-constant-expressions/63179
As usual, many of the updated tests will no longer test what they were
originally intended to -- this is hard to preserve when constant
expressions get removed, and in many cases just impossible as the
existence of a specific kind of constant expression was the cause of the
issue in the first place.
Some passes has limitation that only support simple terminators:
branch/unreachable/return. Right now, they ask the pass manager to add
LowerSwitch pass to eliminate `switch`. Let's manage such kind of pass
dependency by ourselves. Also add the assertion in the related passes.
Relative to the previous attempt, this also adjusts RegionInfo
verification to allow unreachable predecessors.
-----
If a block in the CHR region has an unreachable predecessor, then
there will be no edge from that predecessor to the newly cloned
block. However, a phi node entry for it will be left behind. Make
sure that these incoming blocks get dropped as well.
Fixes https://github.com/llvm/llvm-project/issues/64594.
Differential Revision: https://reviews.llvm.org/D157621
In order to enable the LLVM frontend to better analyze buffer
operations (and to potentially enable more precise analyses on the
backend), define versions of the raw and structured buffer intrinsics
that use `ptr addrspace(8)` instead of `<4 x i32>` to represent their
rsrc arguments.
The new intrinsics are named by replacing `buffer.` with `buffer.ptr`.
One advantage to these intrinsic definitions is that, instead of
specifying that a buffer load/store will read/write some memory, we
can indicate that the memory read or written will be based on the
pointer argument. This means that, for example, a read from a
`noalias` buffer can be pulled out of a loop that is modifying a
distinct buffer.
In the future, we will define custom PseudoSourceValues that will
allow us to package up the (buffer, index, offset) triples that buffer
intrinsics contain and allow for more precise backend analysis.
This work also enables creating address space 7, which represents
manipulation of raw buffers using native LLVM load and store
instructions.
Where tests simply used a buffer intrinsic while testing some other
code path (such as the tests for VGPR spills), they have been updated
to use the new intrinsic form. Tests that are "about" buffer
intrinsics (for instance, those that ensure that they codegen as
expected) have been duplicated, either within existing files or into
new ones.
Depends on D145441
Reviewed By: arsenm, #amdgpu
Differential Revision: https://reviews.llvm.org/D147547
This is a follow-up to b71edfaa4ec3c998aadb35255ce2f60bba2940b0
since I forgot the lit.local.cfg files in that one.
Reformatting is done with `black`.
If you end up having problems merging this commit because you
have made changes to a python file, the best way to handle that
is to run git checkout --ours <yourfile> and then reformat it
with black.
If you run into any problems, post to discourse about it and
we will try to help.
RFC Thread below:
https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style
Reviewed By: barannikov88, kwk
Differential Revision: https://reviews.llvm.org/D150762
The existing way of creating the predicate in the guard blocks uses
a boolean value per outgoing block. This increases the number of live
booleans as the number of outgoing blocks increases. The new way added
in this change is to store one integer to represent the outgoing block
we want to branch to, then at each guard block, an integer equality
check is performed to decide which a specific outgoing block is taken.
Using an integer reduces the number of live values and decreases
register pressure especially in cases where there are a large number
of outgoing blocks. The integer based approach is used when the
number of outgoing blocks crosses a threshold, which is currently set
to 32.
Patch by Ruiling Song.
Differential review: https://reviews.llvm.org/D127831
During structurization process, we may place non-predecessor blocks
between the predecessors of a block in the structurized CFG. Take
the typical while-break case as an example:
```
/---A(v=...)
| / \
^ B C
| \ /|
\---L |
\ /
E (r = phi (v:C)...)
```
After structurization, the CFG would be look like:
```
/---A
| |\
| | C
| |/
| F1
^ |\
| | B
| |/
| F2
| |\
| | L
\ |/
\--F3
|
E
```
We can see that block B is placed between the predecessors(C/L) of E.
During phi reconstruction, to achieve the same sematics as before, we
are reconstructing the PHIs as:
F1: v1 = phi (v:C), (undef:A)
F3: r = phi (v1:F2), ...
But this is also saying that `v1` would be live through B, which is not
quite necessary. The idea in the change is to say the incoming value
from B is Undef for the PHI in E. With this change, the reconstructed
PHI would be:
F1: v1 = phi (v:C), (undef:A)
F2: v2 = phi (v1:F1), (undef:B)
F3: r = phi (v2:F2), ...
Reviewed by: sameerds
Differential Revision: https://reviews.llvm.org/D132450
The instruction simplification will try to simplify the affected phis.
In some cases, this might extend the liveness of values. For example:
BB0:
| \
| BB1
| /
BB2:phi (BB0, v), (BB1, undef)
The phi in BB2 will be simplified to v as v dominates BB2, but this is
increasing the number of active values in BB1. By setting CanUseUndef
to false, we will not simplify the phi in this way, this would help
register pressure. This is mandatory for the later change to help
reducing VGPR pressure for AMDGPU.
Reviewed by: foad, sameerds
Differential Revision: https://reviews.llvm.org/D132449
This reverts commit f1b05a0a2bbbea160002be709f8a1c59de366761.
Need to revert to due to issues identified with testing. The
transformation is incorrect for blocks that contain convergent
instructions.
StructurizeCFG linearizes the successors of branching basic block
by adding Flow blocks to record the true/false path for branches
and back edges. This patch reduces the number of Phi values needed
to capture the control flow path by improving the basic block
ordering.
Previously, StructurizeCFG adds loop exit blocks outside of the
loop. StructurizeCFG sets a boolean value to indicate the path
taken, and all exit block live values extend to after the loop.
For loops with a large number of exits blocks, this creates a
huge number of values that are maintained, which increases
compilation time and register pressure. This is problem
especially with ASAN, which adds early exits to blocks with
unreachable instructions for each instrumented check in the loop.
In specific cases, this patch reduces the number of values needed
after the loop by moving the exit block into the loop. This is
done for blocks that have a single predecessor and single successor
by moving the block to appear just after the predecessor.
Differential Revision: https://reviews.llvm.org/D123231
The NewDefault was used to simplify the updating of PHI nodes, but it
causes some inefficiency for target that will run structurizer later. For
example, for a simple two-case switch, the extra NewDefault is causing
unstructured CFG like:
O
/ \
O O
/ \ / \
C1 ND C2
\ | /
\ | /
D
The change is to avoid the ND(NewDefault) block, that is we will get a
structured CFG for above example like:
O
/ \
/ \
O O
/ \ / \
C1 \ / C2
\-> D <-/
The IR change introduced by this patch should be trivial to other targets,
so I am doing this unconditionally.
Fall-through among the cases will also cause unstructured CFG, but it need
more work and will be addressed in a separate change.
Reviewed by: arsenm
Differential Revision: https://reviews.llvm.org/D123607
D118623 added code to fold not-of-compare into a compare
with the inverted predicate, if the compare had no other
uses. This relies on accurate use lists in the IR but it
was run before setPhiValues, when some phi inputs are still
stored in a data structure on the side, instead of being
real uses in the IR. The effect was that a phi that should
be using the original compare result would now get an
inverted result instead.
Fix this by moving simplifyConditions after setPhiValues.
Differential Revision: https://reviews.llvm.org/D120312
In some cases StructurizeCFG inserts i1 xor instructions to invert
predicates. Add a quick loop to clean these up afterwards if we can get
away with modifying an existing compare instruction instead.
(StructurizeCFG is generally run late in the pipeline so instcombine
does not clean them up for us.)
Differential Revision: https://reviews.llvm.org/D118623
This reverts commit a6b54ddaba2d5dc0f72dcc4591c92b9544eb0016.
Apparently it is not safe to modify the condition even if it passes the
hasOneUse test, because StructurizeCFG might have other references to
the condition that are not manifest in the IR use-def chains.
This avoids various cases where StructurizeCFG would otherwise insert an
xor i1 instruction, and it since it generally runs late in the pipeline,
instcombine does not clean up the xor-of-cmp pattern.
Differential Revision: https://reviews.llvm.org/D118478
Since d6de1e1a71406c75a4ea4d5a2fe84289f07ea3a1, no attributes is quivalent to
setting attribute to false.
This is a preliminary commit for https://reviews.llvm.org/D99080
This commit copies existing tests at llvm/Transforms and replaces
'insertelement undef' in those files with 'insertelement poison'.
(see https://reviews.llvm.org/D93586)
Tests listed using this script:
grep -R -E '^[^;]*insertelement <.*> undef,' . | cut -d":" -f1 | uniq |
wc -l
Tests updated:
file_org=llvm/test/Transforms/$1
file=${file_org%.ll}-inseltpoison.ll
cp $file_org $file
sed -i -E 's/^([^;]*)insertelement <(.*)> undef/\1insertelement <\2> poison/g' $file
head -1 $file | grep "Assertions have been autogenerated by utils/update_test_checks.py" -q
if [ "$?" == 1 ]; then
echo "$file : should be manually updated"
# I manually updated the script
exit 1
fi
python3 ./llvm/utils/update_test_checks.py --opt-binary=./build-releaseassert/bin/opt $file
This doesn't support -structurizecfg-skip-uniform-regions since that
would require porting LegacyDivergenceAnalysis.
The NPM doesn't support adding a non-analysis pass as a dependency of
another, so I had to add -lowerswitch to some tests or pin them to the
legacy PM.
This is the only RegionPass in tree, so I simply copied the logic for
finding all Regions from the legacy PM's RGManager into
StructurizeCFG::run().
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D89026
This is a reimplementation of the `orderNodes` function, as the old
implementation didn't take into account all cases.
The new implementation uses SCCs instead of Loops to take account of
irreducible loops.
Fix PR41509
Differential Revision: https://reviews.llvm.org/D79037
Prevent `invertCondition` from creating the inversion instruction, in
case the given value is an argument which has already been inverted.
Note that this approach has already been taken in case the given value
is an instruction (and not an argument).
Differential Revision: https://reviews.llvm.org/D80399
This reverts commit 897d8ee5cd693e17f95a7e84194bca4c089a520b,
due to causing an infinite loop when encountering a loop with
a sub-region with an inner loop.
This is a reimplementation of the `orderNodes` function, as the old
implementation didn't take into account all cases.
Fix PR41509
Differential Revision: https://reviews.llvm.org/D79037
An irreducible SCC is one which has multiple "header" blocks, i.e., blocks
with control-flow edges incident from outside the SCC. This pass converts an
irreducible SCC into a natural loop by introducing a single new header
block and redirecting all the edges on the original headers to this
new block.
This is a useful workaround for a limitation in the structurizer
which, which produces incorrect control flow in the presence of
irreducible regions. The AMDGPU backend provides an option to
enable this pass before the structurizer, which may eventually be
enabled by default.
Reviewed By: nhaehnle
Differential Revision: https://reviews.llvm.org/D77198
This restores commit 2ada8e2525dd2653f30c8696a27162a3b1647d66.
Originally reverted with commit 44e09b59b869a91bf47d76e8bc569d9ee91ad145.
This reverts commit 2ada8e2525dd2653f30c8696a27162a3b1647d66.
Buildbots produced compilation errors which I was not able to quickly
reproduce locally. Need more time to investigate.
An irreducible SCC is one which has multiple "header" blocks, i.e., blocks
with control-flow edges incident from outside the SCC. This pass converts an
irreducible SCC into a natural loop by introducing a single new header
block and redirecting all the edges on the original headers to this
new block.
This is a useful workaround for a limitation in the structurizer
which, which produces incorrect control flow in the presence of
irreducible regions. The AMDGPU backend provides an option to
enable this pass before the structurizer, which may eventually be
enabled by default.
Reviewed By: nhaehnle
Differential Revision: https://reviews.llvm.org/D77198
For each natural loop with multiple exit blocks, this pass creates a
new block N such that all exiting blocks now branch to N, and then
control flow is redistributed to all the original exit blocks.
The bulk of the tranformation is a new function introduced in
BasicBlockUtils that an redirect control flow from a set of incoming
blocks to a set of outgoing blocks via a common "hub".
This is a useful workaround for a limitation in the structurizer which
incorrectly orders blocks when processing a nest of loops. This pass
bypasses that issue by ensuring that each natural loop is recognized
as a separate region. Since the structurizer is a region pass, it no
longer sees a nest of loops in a single region, and instead processes
each "level" in the nesting as a separate region.
The AMDGPU backend provides a new option to enable this pass before
the structurizer, which may eventually be enabled by default.
Reviewers: madhur13490, arsenm, nhaehnle
Reviewed By: nhaehnle
Differential Revision: https://reviews.llvm.org/D75865
After structurization, some phi nodes can have a single incoming edge
and can be simplified away. This change runs a simplify query on all
phis that are either modified or added by the structurizer. This also
moves some phis closer to their use as a side benefit.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D75500
This change relaxes the checks for hasOnlyUniformBranches such that our
region is uniform if:
1. All conditional branches that are direct children are uniform.
2. And either:
a. All sub-regions are uniform.
b. There is one or less conditional branches among the direct
children.
Differential Revision: https://reviews.llvm.org/D62198
llvm-svn: 361610