Fixes#76609
This patch does:
- relax the phis constraint in `CanRedirectPredsOfEmptyBBToSucc`
- guarantee the BB has multiple different predecessors to redirect, so
that we can handle the case without phis in BB. Without this change and
phi constraint, we may redirect the CommonPred.
The motivation is consistent with JumpThreading. We always want the
branch to jump more direct to the destination, without passing the
middle block. In this way, we can expose more other optimization
opportunities.
An obivous example proposed by @dtcxzyw is like:
```llvm
define i32 @test(...) {
entry:
br i1 %c, label %do.end, label %if.then
if.then: ; preds = %entry
%call2 = call i32 @dummy()
%tobool3.not = icmp eq i32 %call2, 0
br i1 %tobool3.not, label %do.end, label %return
do.end: ; preds = %entry, %if.then
br label %return
return: ; preds = %if.then, %do.end
%retval.0 = phi i32 [ 0, %do.end ], [ %call2, %if.then ]
ret i32 %retval.0
}
```
`entry` can directly jump to return, without passing `do.end`, and then
the if-else pattern can be simplified further:
```llvm
define i32 @test(...) {
entry:
br i1 %c, label %return, label %if.then
if.then: ; preds = %entry
%call2 = call i32 @dummy()
br label %return
return: ; preds = %if.then
%retval.0 = phi i32 [ 0, %entry ], [ %call2, %if.then ]
ret i32 %retval.0
}
```
BlockFrequencyInfo calculates block frequencies as Scaled64 numbers but as a last step converts them to unsigned 64bit integers (`BlockFrequency`). This improves the factors picked for this conversion so that:
* Avoid big numbers close to UINT64_MAX to avoid users overflowing/saturating when adding multiply frequencies together or when multiplying with integers. This leaves the topmost 10 bits unused to allow for some room.
* Spread the difference between hottest/coldest block as much as possible to increase precision.
* If the hot/cold spread cannot be represented loose precision at the lower end, but keep the frequencies at the upper end for hot blocks differentiable.