As per @arsenm 's instructions, I've separated the non-functional
changes from https://github.com/llvm/llvm-project/pull/169958.
Afterwards I'll tackle the functional ones one by one. I hope I did
everything right this time.
Full descriptions in the article:
https://pvs-studio.com/en/blog/posts/cpp/1318/
3. Array overrun is possible.
The PVS-Studio warning: V557 Array overrun is possible. The value of
'regIdx' index could reach 31. VEAsmParser.cpp 696
10. Excessive check.
The PVS-Studio warning: V547 Expression 'IsLeaf' is always false.
PPCInstrInfo.cpp 419
11. Doubling the same check.
The PVS-Studio warning: V581 The conditional expressions of the 'if'
statements situated alongside each other are identical. Check lines:
5820, 5823. PPCInstrInfo.cpp 5823
15. Excessive check.
The PVS-Studio warning: V547 Expression 'i != e' is always true.
MachineFunction.cpp 1444
17. Excessive assignment.
The PVS-Studio warning: V1048 The 'FirstOp' variable was assigned the
same value. MachineInstr.cpp 1995
18. Excessive check.
The PVS-Studio warning: V547 Expression 'AllSame' is always true.
SimplifyCFG.cpp 1914
19. Excessive check.
The PVS-Studio warning: V547 Expression 'AbbrevDecl' is always true.
LVDWARFReader.cpp 398
SwitchInst case values must be ConstantInt, which have no use list.
Therefore it is not necessary to store these as Use, instead store them
more efficiently as a simple array of pointers after the uses, similar
to how PHINode stores basic blocks.
After this change, the successors of all terminators are stored
consecutively in the operand list. This is preparatory work for
improving the performance of successor access.
Add new C API functions so that switch case values remain accessible
from bindings for other languages.
While this could also be achieved by merely changing the order of
operands (i.e., first all successors, then all constants), doing so
would increase the asymptotic runtime of addCase from O(1) to O(n)
(i.e., adding n cases would be O(n^2)), because it would need to shift
all constants by one slot. Having null/invalid operands is also a bad
idea and would cause much more breakage.
Pull Request: https://github.com/llvm/llvm-project/pull/170984
Continued from #165804
This maintains the BFI of the default branch. Originally `10/63`, post-pass, it ends up being `5/63 + 58/63 * 5/58`(first term is from `PROF`, second is the probability of going to the switch lookup times the probability, there, of taking the default branch)
Issue #147390
This picks up from #166028, making the `Function` argument optional:
most cases don't need to provide it, but in e.g. InstCombine's case,
where the instruction (select, branch) is not attached to a function
yet, the function needs to be passed explicitly.
Co-authored-by: Florian Hahn <flo@fhahn.com>
ConstantRange uses `[-1, -1)` as the canonical form of a full set.
Therefore, the `for (APInt I = Lower; I != Upper; ++I)` idiom doesn't
work for full ranges. This patch fixes the value enumeration in
`ConstantComparesGatherer` to prevent missing values for full sets.
Closes https://github.com/llvm/llvm-project/issues/166369.
PR #161000 introduced a bug whereby the IR would become invalid by having an unconditional branch have `!prof`attached to it. This only became evident in PR #165744, because the IR of `test/Transforms/SimplifyCFG/pr165301.ll`was simple enough to both (1) introduce the unconditional branch, and (2) survive in that fashion until the end of the pass (simplifycfg) and thus trip the verifier.
`simplifySwitchOfPowersOfTwo` converts (when applicable, see `00f5a1e30b`) a switch to a conditional branch. Its false case goes to the `default` target of the former switch, and the true case goes to a BB performing a `cttz`. We can calculate the branch weights from the branch weights of the old switch.
Issue #147390
A recent commit 00f5a1e30b modified simplifySwitchOfPowersOfTwo to
generate a branch to handle the non-power-of-2 case when appropriate,
but does not set a DebugLoc on the new branch instruction; this patch
propagates the switch's DebugLoc to the new branch, as for the other
instructions generated in the same block.
Found using #107279.
In `simplifyBranchOnICmpChain`, if we can merge the comparisons into a
range check, use a conditional branch instead. This change also breaks
the cycle found in https://github.com/llvm/llvm-project/issues/165088.
Closes https://github.com/llvm/llvm-project/issues/165088.
Detailed description of the cycle:
```
define void @pr165088_cycle_1(i8 %x) {
entry:
%switch = icmp uge i8 %x, 2
%cond1 = icmp ugt i8 %x, 1
%or.cond = and i1 %switch, %cond1
br i1 %or.cond, label %block3, label %block2
block1:
%cond2 = icmp ugt i8 %x, 1
br i1 %cond2, label %block3, label %block2
block2:
br label %block3
block3:
%cond3 = icmp eq i8 %x, 0
br i1 %cond3, label %exit, label %block1
exit:
ret void
}
```
`simplifyBranchOnICmpChain` folds the branch in `entry` to a switch.
Then we get:
```
entry:
switch i8 %x, label %block3 [
i8 1, label %block2
i8 0, label %block2
]
...
```
`performValueComparisonIntoPredecessorFolding` redirects the default
target `block3` into `block1` because `%x` is never zero.
```
entry:
switch i8 %x, label %block1 [
i8 1, label %block2
i8 0, label %block2
]
...
```
Then `turnSwitchRangeIntoICmp` will convert the switch back into a
branch on icmp.
```
entry:
%switch = icmp ult i8 %x, 2
br i1 %switch, label %block2, label %block1
...
```
Since `block1` and `block2` share the same successor `block3`,
`performBranchToCommonDestFolding` merges the conditions of `entry` and
`block1`, resulting in the original pattern again.
Favour a `cttz`-indexed table lookup over an indirect jump table when
the default switch case is reachable, by branching non-power-of-two
inputs to the default case.
Proofs: https://alive2.llvm.org/ce/z/HeRAtf.
While we do not want to form actual lookup tables early, we do want to
perform some optimizations, as they may enable inlining of the much
simpler form.
Builds on https://github.com/llvm/llvm-project/pull/156477, which
originally included this change as well. This PR makes two changes on
top of it:
* Do not perform the optimization early if it requires adding a mask
check. These make the resulting IR less analyzable.
* Add a new SimplifyCFG option that controls switch-to-arithmetic
conversion separately from switch-to-lookup conversion. Enable the new
flag at the end of the function simplification pipeline. This means that
we attempt the arithmetic conversion before inlining, but avoid it in
the early pipeline, where it may lose information.
Handle branch weights for `indirectbr` simplification: if we drop branches that aren't taken, we just need to remove the corresponding branch weight (which is presumably 0).
Issue #147390
Fixes#157113.
Take the following IR as an example; we know the destination of the `[1,
3]` cases is `%else`.
```llvm
define i32 @src(i8 range(i8 0, 6) %arg) {
switch i8 %arg, label %else [
i8 0, label %if
i8 4, label %if
i8 5, label %if
]
if:
ret i32 0
else:
ret i32 1
}
```
We can first try the non-wrapping range for both destinations, but I
don't see how that would be any better.
Proof: https://alive2.llvm.org/ce/z/acdWD4.
The switch becomes a conditional branch, one edge going to what was the default target of the switch, the other to a BB that performs a lookup in a table. The branch weights are accurately determinable from the ones of the switch.
Issue #147390
We cannot calculate the weights of the switch precisely, but we do know the probability of the default branch. We then split equally the remaining probability over the rest of the cases. If we did nothing, the static estimation could be considerably poorer.
Issue #147390
There's a pattern throughout LLVM of cl::opts being exported. That in
itself is probably a bit unfortunate, but what's especially bad about it
is that a lot of those symbols are in the global namespace. Move them
into the llvm namespace.
While doing this, I noticed some other variables in the global namespace
and moved them as well.
The main difference between SimplifyCFG's `setBranchWeights` and the ProfDataUtils' is that the former doesn't propagate all-zero weights. That seems like a sensible thing to do, so updated the latter accordingly, and added a flag to control the behavior.
Also moved to ProfDataUtils the logic fitting 64-bit weights to 32-bit.
As side-effect, this fixes some profcheck failures.
This is an overly broad check, the transformation made here can be done
safely for pointers with index!=repr width. This fixes the codegen
regression introduced by https://github.com/llvm/llvm-project/pull/105735
and should be beneficial for AMDGPU code-generation once the datalayout
there no longer uses the overly strict `ni:` specifier.
Reviewed By: arsenm
Pull Request: https://github.com/llvm/llvm-project/pull/159890
This patch fixes:
llvm/lib/Transforms/Utils/SimplifyCFG.cpp:338:6: error: unused
function 'isSelectInRoleOfConjunctionOrDisjunction'
[-Werror,-Wunused-function]
There’s a pattern where a branch is conditioned on a conjunction or disjunction that ends up being modeled as a `select` where the first operand is set to `true` or the second to `false`. If the branch has known branch weights, they can be copied to the `select`. This is worth doing in case later the `select` gets transformed to something else (i.e. if we know the profile, we should propagate it).
Issue #147390
In some cases, we can replace a switch with simpler instructions or a
lookup table.
For instance, if every case results in the same value, we can simply
replace the switch
with that single value.
However, lookup tables are not always supported.
Targets, function attributes and compiler options can deactivate lookup
table creation.
Currently, even simpler switch replacements like the single value
optimization do not
get applied, because we only enable these transformations if lookup
tables are enabled.
This PR enables the other kinds of replacements, even if lookup tables
are not supported.
First, it checks if the potential replacements are lookup tables.
If they are, then check if lookup tables are supported and whether to
continue.
If they are not, then we can apply the other transformations.
Originally, lookup table creation was delayed until late stages of the
compilation pipeline, because
it can result in difficult-to-analyze code and prevent other
optimizations.
As a side effect of this change, we can also enable the simpler
optimizations much earlier in the
compilation process.
The branch weights capture probability. The probability has everything to do with the (SSA) value the condition is predicated on, and nothing to do with the position in the CFG.
This PR is the first part to solve the issue in #149937.
The end goal is enabling more switch optimizations on targets that do
not support lookup tables.
SimplifyCFG has the ability to replace switches with either a few simple
calculations, a single value, or a lookup table.
However, it only considers these options if the target supports lookup
tables, even if the final result is not a LUT, but a few simple
instructions like muls, adds and shifts.
To enable more targets to use these other kinds of optimization, this PR
restructures the code in `switchToLookup`.
Previously, code was generated even before choosing what kind of
replacement to do. However, we need to know if we actually want to
create a true LUT or not before generating anything. Then we can check
for target support only if any LUT would be created.
This PR moves the code so it first determines the replacement kind and
then generates the instructions.
A later PR will insert the target support check after determining the
kind of replacement. If the result is not a LUT, then even targets
without LUT support can replace the switch with something else.
This is about code readability. The operands in the disjunction forming the combined predicate in `mergeConditionalStoreToAddress` could sometimes be negated twice. This patch addresses that.
2 tests needed updating because they exposed the double negation and now they don’t.
Updates SimplifyCFG to avoid jump threading through loop headers if
-keep-loops is requested. Canonical loop form requires a loop header
that dominates all blocks in the loop. If we thread through a header, we
risk breaking its domination of the loop. This change avoids this issue
by conservatively avoiding threading through headers entirely.
Fixes: https://github.com/llvm/llvm-project/issues/151144