When we simplify a pair of selects, we want to propagate profile
information when the condition remains the same and drop it when it does
not. Before this patch, we were keeping incorrect profile data in
addition to not annotating any new select instructions that had the same
value as a previous one.
Noticed by looking at 80d9df6b054cebfbe97d709195be4e61a7acc694.
This is a step on the yak-shaving expedition to properly implement the
new `minnum`/`maxnum` signed-zero semantics.
`InstCombineSelect` will convert a `fcmp`+`select` sequence to a
`minnum`/`maxnum` intrinsic. It doesn't require the `fcmp` to have any
particular fast-math flags, just that the `select` has `nnan` and `nsz`
(or is being used in a context where the result doesn't care about
signed zero).
It's not correct to propagate the `nnan` flag from the `fcmp`
instruction for poison-propagation reasons. Patches like
https://github.com/llvm/llvm-project/pull/117977 and
https://github.com/llvm/llvm-project/pull/141010 have *generously* made
it so that if `fcmp` doesn't have fast-math flags, we can still perform
the transformation by simply dropping the flags on the generated
intrinsic.
Unfortunately, converting an `fcmp`+`select` with fast-math flags to a
`minnum`/`maxnum` without fast-math flags is actually a
*pessimization*. Common ISAs like x86 and WebAssembly do not provide
single floating-point minimum/maximum instructions that handle NaN in
the same way as `minnum`/`maxnum`. They have to fall back to a routine
or a libcall, which causes its own problems
(https://github.com/llvm/llvm-project/issues/54554).
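To make the NaN difference concrete, here is a minimal Python sketch. `select_max` and `maxnum` are hypothetical helper names that model the two semantics described above; they are not LLVM APIs, and signed zeros and signaling NaNs are ignored.

```python
import math

def select_max(a, b):
    # Models `fcmp ogt` + `select`: any ordered comparison with NaN is
    # false, so the second operand is returned.
    return a if a > b else b

def maxnum(a, b):
    # Models maxnum-style semantics: if exactly one operand is NaN,
    # the other operand is returned.
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return max(a, b)

# The two agree on ordinary inputs and disagree when the second operand is NaN.
agree_on_numbers = select_max(1.0, 2.0) == maxnum(1.0, 2.0)
differ_on_nan = math.isnan(select_max(1.0, math.nan)) and maxnum(1.0, math.nan) == 1.0
```

Without `nnan`, a backend has to preserve the `maxnum` behavior for the NaN case, which is where the extra code comes from.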
[Here's an example](https://llvm.godbolt.org/z/avYxa7ccK). Using just
`llc`, the function compiles to a single `maxss`. But with `clang -O3`,
which enables InstCombine optimizations, we end up with a much larger
(and completely unnecessary) routine for handling NaNs.
We should only perform this transformation if it's safe to add `nnan`
and `nsz` to the `minnum`/`maxnum` call, since those are the flags
necessary for lowering it *back* into an `fcmp`+`select` (or equivalent
code).
There's currently one case where we might end up without `nsz` on the
intrinsic call: if the output has one use, and that use doesn't care
about signed zero. To account for that, we should also unconditionally
set `nsz` on the generated intrinsic call.
With this in place, it should be safe to eventually revert
https://reviews.llvm.org/D122610 and further consolidate
`minnum`/`maxnum` handling.
Fixes #82350
Address cases like:
```
select(C0, select(C1, b, a), b) -> select(C0&!C1, a, b)
select(C0, a, select(C1, b, a)) -> select(C0|!C1, a, b)
```
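As a quick sanity check separate from the Alive2 proof, the two rewrites can be verified over all boolean conditions. This is a sketch only: it treats `a` and `b` as opaque values and does not model poison or undef.

```python
from itertools import product

def nested_select_folds_hold():
    a, b = "a", "b"  # opaque placeholder values
    for c0, c1 in product([False, True], repeat=2):
        # select(C0, select(C1, b, a), b) -> select(C0 & !C1, a, b)
        before = (b if c1 else a) if c0 else b
        after = a if (c0 and not c1) else b
        if before != after:
            return False
        # select(C0, a, select(C1, b, a)) -> select(C0 | !C1, a, b)
        before = a if c0 else (b if c1 else a)
        after = a if (c0 or not c1) else b
        if before != after:
            return False
    return True
```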
It seems to generate better code for the real-world examples on the few
targets I have checked: https://godbolt.org/z/KeEMd9b8E
In the most generic case it generates the same assembly for the
sources and targets on all targets except RISC-V, where the targets
seem shorter and better (less branching):
https://godbolt.org/z/3has1Td5G
So I did not observe a regression on any target in any scenario.
Proofs: https://alive2.llvm.org/ce/z/DoL3zQ
Doing a nop replaceOperand leads us into an infinite loop here.
This was found by a fuzzer I'm working on. The high-level design is to
randomly generate LLVM IR, run a pass on it, and then run the original
and new IR through the interpreter. They should produce the same
results. Right now I'm only fuzzing instcombine.
The masking rewrite in `foldBitCeil` assumes a power-of-two bitwidth.
For non-power-of-two integer types, `(-ctlz) & (BitWidth - 1)` is not
equivalent to `BitWidth - ctlz` and can miscompile.
This patch restricts the transform to power-of-two bitwidths.
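The mismatch can be reproduced with plain integer arithmetic. The sketch below uses a hypothetical helper, `masked_shift_matches`, and assumes the negation wraps at `BitWidth` bits and that `ctlz` ranges over `1 .. BitWidth - 1`.

```python
def masked_shift_matches(bit_width):
    """Return True if (-ctlz) & (bit_width - 1) == bit_width - ctlz
    for every ctlz in 1 .. bit_width - 1."""
    modulus = 1 << bit_width  # negation wraps at the integer's width
    for ctlz in range(1, bit_width):
        masked = ((-ctlz) % modulus) & (bit_width - 1)
        if masked != bit_width - ctlz:
            return False
    return True

# Power-of-two widths agree; e.g. i6 fails at ctlz = 3, where the mask gives 5.
results = {bw: masked_shift_matches(bw) for bw in (6, 7, 8, 16, 32)}
```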
Alive2 proof: https://alive2.llvm.org/ce/z/i2E6zT
Fixes #173787
Fixes
https://github.com/llvm/llvm-project/pull/162003#issuecomment-3693943568.
The current flag propagation assumes that if a select has both `ninf`
and `nnan`, then the operands of the folded operation must be finite.
While this assumption holds for `fadd`, `fsub`, and `fmul`, it does not
hold for `fdiv`.
For example, assume we have:
```
A = 1.0, B = +Inf
A / B = 0.0 (finite, non-NaN)
```
The current transform would turn `fdiv A, B; select ninf nnan cond, A/B,
A;` into `A / (select ninf nnan cond, B, 1.0)`. If `cond` is true, the
inner select returns `B = +Inf`, and due to the propagated `ninf`, this
becomes poison.
This patch adds a check on the operator before flag propagation to
exclude `fdiv`.
Alive2: https://alive2.llvm.org/ce/z/o0MJmS
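The difference between `fdiv` and the other operators can also be seen with ordinary IEEE doubles. In this sketch the helper name and the sample values are illustrative assumptions; it asks whether an operation can yield a finite result when its second operand is infinite.

```python
import math
import operator

def finite_result_with_infinite_operand(op, samples):
    """True if op(a, +/-inf) is finite for some sample a."""
    for a in samples:
        for b in (math.inf, -math.inf):
            if math.isfinite(op(a, b)):
                return True
    return False

samples = [0.0, 1.0, -2.5, 1e300, math.inf]
summary = {
    "fadd": finite_result_with_infinite_operand(operator.add, samples),
    "fsub": finite_result_with_infinite_operand(operator.sub, samples),
    "fmul": finite_result_with_infinite_operand(operator.mul, samples),
    "fdiv": finite_result_with_infinite_operand(operator.truediv, samples),
}
```

Only `fdiv` can produce a finite, non-NaN result from an infinite operand, which is why a finite result says nothing about `B` in that case.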
#163412 touched this last and directly propagated the profile
information. This was not correct for the motivating example:
  %a = icmp eq i32 %z, 0
  %b = icmp eq i32 %z, 1
  %v2 = select i1 %b, i1 true, i1 %pred, !prof !18
  %v3 = and i1 %a, %v2
to
  %a = icmp eq i32 %z, 0
  %v3 = select i1 %a, i1 %pred, i1 false
z == 1 does not imply that z == 0 for i32. In general for the and case,
we need a => b, which means that b must be equivalent or more
restrictive than a, which means we cannot propagate profile information
without additional information on the value distribution of z. For the
or case we need !a => b. We again cannot derive profile information for
a/!a without additional value distribution information.
`cast<Constant>` is not guarded by a type check during canonicalization
of predicates. This patch adds a type check in the outer if to avoid the
crash. `dyn_cast` may introduce another nested if, so I just use
`isa<Constant>` instead.
Address the crash reported in
https://github.com/llvm/llvm-project/pull/153053#issuecomment-3593914124.
https://alive2.llvm.org/ce/z/YGT5SN
https://alive2.llvm.org/ce/z/PVDxCw
https://alive2.llvm.org/ce/z/8buR2N
This is tricky because with positive numbers, we only go up, so we can
in fact always hit the signed_max boundary. This is important because
the intrinsic we use has the behavior of going the OTHER way, aka clamp
to INT_MIN if it goes in that direction.
And the range checking we do only works for positive numbers.
Because of this issue, we can only do this for constants as well.
For the simplification
```
(C && A) || (!C && B) --> sel C, A, B
```
(and related), if `C` (or `!C`) is the condition in the select
instruction representing the logical and, we can preserve that logical
and's branch weights when emitting the new instruction. Otherwise, the
profile data is unknown.
If `C` is the condition of both logical ands, then we just take the
branch weights of the first logical and (though in practice they should
be equal).
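The underlying simplification can be checked over all boolean inputs. This sketch models the logical and/or as short-circuiting selects, as in the IR; `logical_and` and `logical_or` are hypothetical helper names, and poison is not modeled.

```python
from itertools import product

def logical_and(x, y):
    # select i1 x, i1 y, i1 false
    return y if x else False

def logical_or(x, y):
    # select i1 x, i1 true, i1 y
    return True if x else y

def fold_holds():
    for c, a, b in product([False, True], repeat=3):
        before = logical_or(logical_and(c, a), logical_and(not c, b))
        after = a if c else b  # sel C, A, B
        if before != after:
            return False
    return True
```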
Furthermore, `select-safe-transforms.ll` now passes under the profcheck
configuration, so we remove it from the failing tests.
Tracking issue: #147390
If a select instruction is replaced with one whose conditional is the
negation of the original, then the replacement's branch weights are the
reverse of the original's.
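A toy model of this rule, for illustration only: `Select` and `negate_condition` are hypothetical names, not LLVM classes, and the weights are a plain (true, false) pair standing in for `!prof` branch weights.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Select:
    cond: str
    true_val: str
    false_val: str
    weights: tuple  # (weight of true arm, weight of false arm)

def negate_condition(sel):
    # Negating the condition swaps the arms, so the weights swap too.
    true_w, false_w = sel.weights
    return Select("!" + sel.cond, sel.false_val, sel.true_val, (false_w, true_w))

original = Select("c", "x", "y", (90, 10))
replacement = negate_condition(original)
```

The weight attached to each value is unchanged: `x` is still selected with weight 90 and `y` with weight 10.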
Tracking issue: #147390
The `masked.load`, `masked.store`, `masked.gather` and `masked.scatter`
intrinsics currently accept a separate alignment immarg. Replace this
with an `align` attribute on the pointer / vector of pointers argument.
This is the standard representation for alignment information on
intrinsics, and is already used by all other memory intrinsics. This
means the signatures now match llvm.expandload, llvm.vp.load, etc.
(Things like llvm.memcpy used to have a separate alignment argument as
well, but were already migrated a long time ago.)
It's worth noting that the masked.gather and masked.scatter intrinsics
previously accepted a zero alignment to indicate the ABI type alignment
of the element type. This special case is gone now: If the align
attribute is omitted, the implied alignment is 1, as usual. If ABI
alignment is desired, it needs to be explicitly emitted (which the
IRBuilder API already requires anyway).
In the case where we have a conditional that is implied by a previous
conditional (like x < 10 => x < 20 in a select), we can simply propagate
the profile information along the select.
Consider the following transform:
```
C = binop float A, nnan OOp
D = select ninf, i1 cond, float C, float A
->
E = select ninf, i1 cond, float OOp, float Identity
F = binop float A, E
```
We cannot propagate ninf from the original select, because OOp may be
inf, and the flag only guarantees that FalseVal (op OOp) is never
infinity.
Examples: -inf + +inf = NaN, -inf - -inf = NaN, 0 * inf = NaN
Specifically, if the original select has both ninf and nnan, we can
safely propagate the flag.
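The three examples above can be reproduced with IEEE doubles. In each case the binop result is NaN rather than infinity, so `ninf` on the original select is satisfied even though one operand is infinite. This is a small sketch using Python floats as a stand-in for LLVM `float` values.

```python
import math

inf = math.inf
examples = {
    "fadd": -inf + inf,   # -inf + +inf
    "fsub": -inf - -inf,  # -inf - -inf
    "fmul": 0.0 * inf,    # 0 * inf
}
all_nan = all(math.isnan(v) for v in examples.values())
```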
Alive2:
+ fadd: https://alive2.llvm.org/ce/z/TWfktv
+ fsub: https://alive2.llvm.org/ce/z/RAsjJb
+ fmul: https://alive2.llvm.org/ce/z/8eg4ND
Closes https://github.com/llvm/llvm-project/issues/161634.
Logical booleans in LLVM are represented by select statements - e.g. the
statement
```
A && B
```
is represented as
```
select i1 %A, i1 %B, i1 false
```
When LLVM folds two of the same logical booleans into a logical boolean
and a bitwise boolean (e.g. `A && B && C` -> `A && (B & C)`), the first
logical boolean is a select statement that retains the original
condition from the first logical boolean of the original statement. This
means that the new select statement has the same branch weights as the
original select statement.
Tracking issue: #147390
This reverts commit 572b579632fb79ea6eb562a537c9ff1280b3d4f5.
This is a reland of #159666 but with a fix moving the `extern`
declaration of the flag under the LLVM namespace, which is needed to fix
a linker error caused by #161240.
If `select` simplification produces the transform:
```
(select A && B, T, F) -> (select A, T, F)
```
or
```
(select A || B, T, F) -> (select A, T, F)
```
it stands to reason that if the branches are the same, then the branch
weights remain the same since the net effect is a simplification of the
conditional.
There are also cases where InstCombine negates the conditional (and
therefore reverses the branches); this PR asserts that the branch
weights are reversed in this case.
Tracking issue: #147390
A common idiom is the usage of the PatternMatch match function within a
functional algorithm like all_of. Introduce a match functor to shorten
this idiom.
Co-authored-by: Luke Lau <luke@igalia.com>
```llvm
%sub = sub nsw T %x, %y
%cmp = icmp sgt T %x, %y ; or sge
%neg = sub T 0, %sub
%abs = select i1 %cmp, T %sub, T %neg
```
becomes:
```llvm
%sub = sub nsw T %x, %y
%abs = call T @llvm.abs.T(T %sub, i1 false)
```
Alive2: https://alive2.llvm.org/ce/z/ApdJX8
https://alive2.llvm.org/ce/z/gRTmZk
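Independently of the Alive2 proofs, the fold can be brute-forced for a small width. The sketch below assumes `T = i8`, skips inputs where `sub nsw` would overflow (poison), and models `llvm.abs` with `i1 false` as returning `INT_MIN` for `INT_MIN`. The helpers `wrap` and `abs_fold_holds` are hypothetical names.

```python
def wrap(value, bits=8):
    """Interpret value modulo 2**bits as a signed two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def abs_fold_holds(bits=8):
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    for x in range(lo, hi + 1):
        for y in range(lo, hi + 1):
            sub = x - y
            if not lo <= sub <= hi:
                continue  # `sub nsw` overflows: poison, nothing to check
            neg = wrap(0 - sub, bits)        # plain `sub T 0, %sub` wraps
            after = wrap(abs(sub), bits)     # abs(INT_MIN) stays INT_MIN
            for cmp in (x > y, x >= y):      # sgt and sge variants
                before = sub if cmp else neg
                if before != after:
                    return False
    return True
```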
This patch addresses
https://github.com/llvm/llvm-project/pull/155216#discussion_r2297724663.
This patch adds a helper function to put the inverse cast on constants,
optionally preserving cast flags.
Follow-up patches will add trunc/ext handling on VectorCombine and flags
preservation on InstCombine.
When folding `X Pred C2 ? X BOp C1 : C2 BOp C1` to `min/max(X, C2) BOp
C1`, if NUW/NSW flags are present on `X BOp C1` and could be safely
applied to `C2 BOp C1`, then they may be added on the BOp after the fold
is complete. Proof: https://alive2.llvm.org/ce/z/n_3aNJ
Preserving these flags can allow subsequent transforms to re-order the
min/max and BOp, which in the case of NVPTX would allow for some
potential future transformations which would improve
instruction-selection.
Add the following folds for integer min max folding in InstCombine:
- (X < Y) ? X : (Y - 1) ==> MIN(X, Y - 1)
- (X > Y) ? X : (Y + 1) ==> MAX(X, Y + 1)
These are safe when overflow corresponding to the sign of the comparison
is poison. (proof https://alive2.llvm.org/ce/z/oj5iiI).
The most common of these patterns is likely the minimum case which
occurs in some internal library code when clamping an integer index to a
range (The maximum cases are included for completeness). Here is a
simplified example:
int clampToWidth(int idx, int width) {
  if (idx >= width)
    return width - 1;
  return idx;
}
https://cuda.godbolt.org/z/nhPzWrc3W
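A small Python check that the clamp above is a minimum in disguise. Python integers do not overflow, so this corresponds to the case where `width - 1` does not wrap. The function names are illustrative.

```python
def clamp_to_width(idx, width):
    # Direct transcription of the C example.
    if idx >= width:
        return width - 1
    return idx

def matches_min(values):
    # The branchy clamp should equal min(idx, width - 1) for every pair.
    return all(
        clamp_to_width(idx, width) == min(idx, width - 1)
        for idx in values
        for width in values
    )
```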
Extend folding for `X Pred C2 ? X BOp C1 : C2 BOp C1` to `min/max(X, C2)
BOp C1` to allow min and max as `BOp`. This ensures a constant clamping
pattern is folded into a pair of min/max instructions. Here is a
simplified example of a case where this folding is not occurring
currently.
int clampToU8(int v) {
  if (v < 0) return 0;
  if (v > 255) return 255;
  return v;
}
https://godbolt.org/z/78jhKPWbv
Generic proof: https://alive2.llvm.org/ce/z/cdpLYy
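The intended result of the fold, a max followed by a min, can be compared against the branchy source over a range of inputs. This is a sketch only, and the function names are illustrative.

```python
def clamp_to_u8(v):
    # Direct transcription of the C example.
    if v < 0:
        return 0
    if v > 255:
        return 255
    return v

def clamp_with_min_max(v):
    # The min/max pair the fold is expected to produce.
    return min(max(v, 0), 255)

same = all(clamp_to_u8(v) == clamp_with_min_max(v) for v in range(-1000, 1000))
```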
Before this patch, InstCombine hung because it replaced a value with a
more complex one:
```
%sel = select i1 %cmp, i32 %smax, i32 0 ->
%sel = select i1 %cmp, i32 %masked, i32 0 ->
%sel = select i1 %cmp, i32 %smax, i32 0 ->
...
```
This patch makes this replacement more conservative. It only performs
the replacement iff the new value is one of the operands of the original
value.
Closes https://github.com/llvm/llvm-project/issues/142405.
Having a finite Depth (or recursion limit) for computeKnownBits is very
limiting, but is currently a load-bearing necessity, as all KnownBits
are recomputed on each call and there is no caching. As a prerequisite
for an effort to remove the recursion limit altogether, either using a
clever caching technique, or writing an easily invalidated KnownBits
analysis, make the Depth argument in APIs in ValueTracking uniformly the
last argument with a default value. This would aid in removing the
argument when the time comes, as many callers that currently pass 0
explicitly are now updated to omit the argument altogether.
Make use of known bits when trying to decompose a select/icmp bittest and folding it into an and. This means we can fold when additional information, for instance via a range attribute or metadata, allows us to conclude that the resulting mask is in fact a power of two.
For `trunc nuw`, this saves an instruction; otherwise it only produces
other instructions without the select, the same behavior as for the bit
test before.
proof: https://alive2.llvm.org/ce/z/a6QmyV