When threading operations over phis, we need to adjust the context
instruction to the terminator of the incoming block. This was
handled when threading icmps, but not when threading binops.
Fixes https://github.com/llvm/llvm-project/issues/61312.
Handle the dominating condition so that the urem fold benefits from
the analysis improvements.
Fix https://github.com/llvm/llvm-project/issues/60546
NOTE: the calls in simplifyBinaryIntrinsic and foldICmpWithDominatingICmp
are deleted to reduce compile time.
Reviewed By: nikic, arsenm, erikdesjardins
Differential Revision: https://reviews.llvm.org/D144248
This carries a bitmask indicating forbidden floating-point value kinds
in the argument or return value. This will enable interprocedural
-ffinite-math-only optimizations. This is primarily to cover the
no-nans and no-infinities cases, but also covers the other floating
point classes for free. Textually, this provides a number of names
corresponding to bits in FPClassTest, e.g.
call nofpclass(nan inf) @must_be_finite()
call nofpclass(snan) @cannot_be_snan()
This is more expressive than the existing nnan and ninf fast math
flags. As an added bonus, you can represent fun things like nanf:
declare nofpclass(inf zero sub norm) float @only_nans()
Compared to nnan/ninf:
- Can be applied to individual call operands as well as the return value
- Can distinguish signaling and quiet nans
- Distinguishes the sign of infinities
- Can be safely propagated since it doesn't imply anything about
other operands.
- Does not apply to FP instructions; it's not a flag
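As a rough illustration of the bitmask idea, here is a minimal Python
sketch. The bit values are assumed to follow the FPClassTest layout,
and the names FPClass, classify and is_forbidden are hypothetical
helpers for this example only, not LLVM API. It classifies a double by
its raw bits and checks it against a nofpclass-style mask.

```python
import struct
from enum import IntFlag


class FPClass(IntFlag):
    # Bit values assumed to mirror LLVM's FPClassTest layout.
    SNAN = 0x001
    QNAN = 0x002
    NEG_INF = 0x004
    NEG_NORMAL = 0x008
    NEG_SUBNORMAL = 0x010
    NEG_ZERO = 0x020
    POS_ZERO = 0x040
    POS_SUBNORMAL = 0x080
    POS_NORMAL = 0x100
    POS_INF = 0x200


# Shorthand groups matching the textual names nan, inf, zero, sub, norm.
NAN = FPClass.SNAN | FPClass.QNAN
INF = FPClass.NEG_INF | FPClass.POS_INF
ZERO = FPClass.NEG_ZERO | FPClass.POS_ZERO
SUB = FPClass.NEG_SUBNORMAL | FPClass.POS_SUBNORMAL
NORM = FPClass.NEG_NORMAL | FPClass.POS_NORMAL


def classify_bits(bits: int) -> FPClass:
    """Classify an IEEE-754 double given as its raw 64-bit pattern."""
    negative = bool(bits >> 63)
    exponent = (bits >> 52) & 0x7FF
    mantissa = bits & ((1 << 52) - 1)
    if exponent == 0x7FF:
        if mantissa == 0:
            return FPClass.NEG_INF if negative else FPClass.POS_INF
        # The top mantissa bit distinguishes quiet from signaling NaNs.
        return FPClass.QNAN if mantissa >> 51 else FPClass.SNAN
    if exponent == 0:
        if mantissa == 0:
            return FPClass.NEG_ZERO if negative else FPClass.POS_ZERO
        return FPClass.NEG_SUBNORMAL if negative else FPClass.POS_SUBNORMAL
    return FPClass.NEG_NORMAL if negative else FPClass.POS_NORMAL


def classify(x: float) -> FPClass:
    """Classify a Python float by reinterpreting it as 64 raw bits."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    return classify_bits(bits)


def is_forbidden(mask: FPClass, x: float) -> bool:
    """True if x belongs to a class that the mask rules out."""
    return bool(classify(x) & mask)
```

With this model, nofpclass(nan inf) corresponds to the mask NAN | INF,
and the only_nans declaration to INF | ZERO | SUB | NORM.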
This is one step closer to being able to retire "no-nans-fp-math" and
"no-infs-fp-math". The one remaining situation where we have no way to
represent no-nans/infs is for loads (if we wanted to solve this we
could introduce !nofpclass metadata, following along with
noundef/!noundef).
This is to help simplify the GPU builtin math library
distribution. Currently the library code has explicit finite math only
checks, read from global constants the compiler driver needs to set
based on the compiler flags during linking. We end up having to
internalize the library into each translation unit in case different
linked modules have different math flags. By propagating known-not-nan
and known-not-infinity information, we can automatically prune the
edge case handling in most functions if the function is only reached
from fast math uses.
This is a generalization of a suggestion from issue #60799
that allows removing a redundant guard of an input value
via icmp+select. It should also solve issue #60801.
This only comes into play for a select with an equality
condition where we are trying to substitute a constant into
the false arm of a select. (A 'true' select arm substitution
allows "refinement", so it is not on this code path.)
The constant must be the same in the compare and the select,
and it must be a "binop absorber" (X op C = C). That query
currently includes 'or', 'and', and 'mul', so there are tests
for all of those opcodes.
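The absorber condition can be checked exhaustively on a small bit
width. The following Python sketch is only a value-level model on i8:
it ignores poison and undef entirely (which is what the impliesPoison
check is for), and the helper names are hypothetical. It shows that
when C absorbs the binop, the guard "x == C ? C : x op y" computes the
same value as the bare "x op y".

```python
BITS = 8
MASK = (1 << BITS) - 1

# (opcode name, wrapping i8 operation, constant C with X op C == C)
ABSORBERS = [
    ("or", lambda a, b: (a | b) & MASK, MASK),
    ("and", lambda a, b: (a & b) & MASK, 0),
    ("mul", lambda a, b: (a * b) & MASK, 0),
]


def is_absorber(op, c):
    """True if c absorbs op on either side for every i8 value."""
    return all(op(x, c) == c and op(c, x) == c for x in range(MASK + 1))


def guarded(op, c, x, y):
    """Value-level model of: select (icmp eq x, c), c, (x op y)."""
    return c if x == c else op(x, y)


def guard_is_redundant(op, c):
    """True if the select never changes the value of the bare binop."""
    return all(
        guarded(op, c, x, y) == op(x, y)
        for x in range(MASK + 1)
        for y in range(MASK + 1)
    )
```

A non-absorbing pair such as 'add' with 0 fails both checks, which is
why the fold is restricted to absorbers.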
We then use "impliesPoison" on the false arm binop and the
original "Op" to be replaced to ensure that the select is not
actually blocking poison from leaking. That could be
potentially expensive as we recursively test each operand, but
it is currently limited to a depth of 2. That's enough to catch
our motivating cases, but probably nothing more complicated
(although that seems unlikely).
I don't know how to generalize a proof for Alive2 for this, but
here's a positive and negative test example to help illustrate
the subtle logic differences of poison/undef propagation:
https://alive2.llvm.org/ce/z/Sz5K-c
Differential Revision: https://reviews.llvm.org/D144493
define i1 @compare_vscales() {
%vscale = call i64 @llvm.vscale.i64()
%vscalex2 = shl nuw nsw i64 %vscale, 1
%vscalex4 = shl nuw nsw i64 %vscale, 2
%cmp = icmp ult i64 %vscalex2, %vscalex4
ret i1 %cmp
}
This IR is currently emitted by LLVM. The icmp is redundant: it can be
folded to a constant (true or false) because both operands are derived
from the same @llvm.vscale.i64() call.
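A small Python model of the snippet, as a sketch under two
assumptions: vscale is a positive integer, and the nuw/nsw flags mean
the shifts never wrap (a wrapping shift would be poison, which is
modelled here by raising). The function name mirrors the IR function
but is otherwise hypothetical.

```python
def compare_vscales(vscale: int, bits: int = 64) -> bool:
    """Model icmp ult (shl nuw nsw vscale, 1), (shl nuw nsw vscale, 2)."""
    x2 = vscale << 1
    x4 = vscale << 2
    # nuw and nsw together require the result to fit in the
    # non-negative signed range; otherwise the IR value is poison.
    if x4 >= 1 << (bits - 1):
        raise ValueError("shift would wrap; the IR result is poison")
    return x2 < x4
```

Under these assumptions the comparison holds for every vscale >= 1, so
the icmp does not depend on the runtime value. For vscale == 0 the
model returns False, which is why the positivity assumption matters.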
Differential Revision: https://reviews.llvm.org/D142542
We can only fold insertvalue undef, (extractvalue x, n) to x
if x is not poison, otherwise we might be replacing undef with
poison (https://alive2.llvm.org/ce/z/fnw3c8). The insertvalue
poison case is always fine.
I didn't go to particularly large effort to preserve cases where
folding with undef is still legal (mainly when there is a chain of
multiple inserts that end up covering the whole aggregate),
because this shouldn't really occur in practice: We should always
be generating the insertvalue poison form when constructing
aggregates nowadays.
Differential Revision: https://reviews.llvm.org/D144106
There are 2 issues here:
1. In the default LLVM FP environment (regular FP math instructions),
SNaN is some flavor of "don't care" which we will nail down in
D143074, so this is just a quality-of-implementation improvement
for default FP.
2. In the constrained FP environment (constrained intrinsics), SNaN
must not propagate through a math operation; it has to be quieted
according to IEEE-754 spec. That is independent of exception
handling mode, so the current behavior is a miscompile.
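To make "quieted" concrete, here is a minimal Python sketch on raw
double bit patterns. It assumes the common encoding where the most
significant mantissa bit is the quiet bit; the helper names are
hypothetical and this does not model exception flags.

```python
QUIET_BIT = 1 << 51             # top mantissa bit of a double
EXP_MASK = 0x7FF << 52          # all-ones exponent marks Inf/NaN
MANT_MASK = (1 << 52) - 1


def is_nan(bits: int) -> bool:
    """NaN: exponent all ones and a non-zero mantissa."""
    return (bits & EXP_MASK) == EXP_MASK and (bits & MANT_MASK) != 0


def is_signaling(bits: int) -> bool:
    """Signaling NaN: a NaN whose quiet bit is clear."""
    return is_nan(bits) and not (bits & QUIET_BIT)


def quiet(bits: int) -> int:
    """Return the quieted form of a NaN; leave other values unchanged."""
    return bits | QUIET_BIT if is_nan(bits) else bits
```

For example, quiet(0x7FF0000000000001) sets the quiet bit and keeps
the payload, giving 0x7FF8000000000001.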
Differential Revision: https://reviews.llvm.org/D143505
When inferring that a GEP of a global variable is inbounds because
there is no notional overindexing, we need to check that the
global value type and the GEP source element type match.
This was not necessary with typed pointers (because we would have
a bitcast in between), but is necessary with opaque pointers.
We should be able to recover some of the safe cases by performing
an offset based inbounds inference in DL-aware ConstantFolding.
https://alive2.llvm.org/ce/z/xuvL46
This is similar to the existing folds added with:
D138853 / f2973327496fc966c4e89597
7dbeb127eaf6
...but with the and/or swapped.
Existing tests were added with D138853, but that patch failed
to handle all of the commutes. The poison-safety behavior is
symmetric, so I'm not duplicating all of the tests that were
added with that patch.
By definition, a non-zero power of 2 has exactly 1 bit set, so this
is a freebie.
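The claim is easy to check with a short Python sketch (the helper
names are hypothetical, not LLVM functions):

```python
def popcount(x: int) -> int:
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")


def is_power_of_two(x: int) -> bool:
    """True for non-zero powers of 2, using the x & (x - 1) trick."""
    return x > 0 and (x & (x - 1)) == 0
```

Every value for which is_power_of_two holds has a popcount of exactly
1, and vice versa for non-zero inputs.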
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D141990
A bug was introduced with 68c197f07eeae71 as noted in the
post-commit review comments, and there are potentially
missed smaller transforms/simplifications because no-wrap
multiply with only 1 or 2 bits eliminates some potential
results.
InstCombine prefers this canonical form (see getPreferredVectorIndex),
as does IRBuilder when passing the index as an integer, so we may as
well use the preferred form from creation.
NOTE: All test changes are mechanical with nothing else expected
beyond a change of index type from i32 to i64.
Differential Revision: https://reviews.llvm.org/D140983
This fixes an annoying asymmetry in the test organization. We have
known-never-nan.ll for dedicated isKnownNeverNaN handling tests, but
the isKnownNeverInfinity tests were in floating-point-compare.ll. Move the
more targeted tests into a separate file to match.