This was a bit confused because nofpclass expresses the opposite
from what an assume of class expresses. We need to assume
the intersection of assumed classes, which also needs to be inverted
to convert to nofpclass.
For all shifts we can apply the same two optimizations.
1) `ShiftOp(KnownVal.One, Max(KnownCnt)) != 0`
-> result is non-zero
2) If already known `Val != 0` and we only shift out zeros (based
on `Max(KnownCnt)`)
-> result is non-zero
The former exists for `shl` and the latter (for constant `Cnt`) exists
for `ashr`/`lshr`.
This patch improves the latter to use `Max(KnownCnt)` instead of
relying on a constant shift `Cnt` and applies both techniques for all
shift ops.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D148404
This reverts commit 2c8d0048f03d054f13909a26f959ef95b2a0a4de.
This is incorrect: computeKnownFPClass() is only known up to
poison, and freeze poison may have any FP class.
Previously only return `shl` non-zero if the shift value was `1`. We
can expand this if we have some bounds on the shift count.
For example:
```
%cnt = and %c, 16 ; Max cnt == 16
%val = or %v, 4 ; val[2] is known one
%shl = shl %val, %cnt ; (val.known.one << cnt.maxval) != 0
```
Differential Revision: https://reviews.llvm.org/D147897
Now that SCEV has a dedicated vscale node type, we should also map
vscale intrinsics to it. To make sure this does not regress ranges
(which were KnownBits based previously), add support for vscale to
getRangeRef() as well.
Differential Revision: https://reviews.llvm.org/D146226
Add support for vscale in computeConstantRange(), based on
vscale_range attributes. This allows simplifying based on the
precise range, rather than a KnownBits approximation (which will
be off by a factor of two for the usual case of a power of two
upper bound).
Differential Revision: https://reviews.llvm.org/D146217
Add a new compute-known-bits like function to compute all
the interesting floating point properties at once.
Eventually this should absorb all the various floating point
queries we already have.
This carries a bitmask indicating forbidden floating-point value kinds
in the argument or return value. This will enable interprocedural
-ffinite-math-only optimizations. This is primarily to cover the
no-nans and no-infinities cases, but also covers the other floating
point classes for free. Textually, this provides a number of names
corresponding to bits in FPClassTest, e.g.
call nofpclass(nan inf) @must_be_finite()
call nofpclass(snan) @cannot_be_snan()
This is more expressive than the existing nnan and ninf fast math
flags. As an added bonus, you can represent fun things like nanf:
declare nofpclass(inf zero sub norm) float @only_nans()
Compared to nnan/ninf:
- Can be applied to individual call operands as well as the return value
- Can distinguish signaling and quiet nans
- Distinguishes the sign of infinities
- Can be safely propagated since it doesn't imply anything about
other operands.
- Does not apply to FP instructions; it's not a flag
This is one step closer to being able to retire "no-nans-fp-math" and
"no-infs-fp-math". The one remaining situation where we have no way to
represent no-nans/infs is for loads (if we wanted to solve this we
could introduce !nofpclass metadata, following along with
noundef/!noundef).
This is to help simplify the GPU builtin math library
distribution. Currently the library code has explicit finite math only
checks, read from global constants the compiler driver needs to set
based on the compiler flags during linking. We end up having to
internalize the library into each translation unit in case different
linked modules have different math flags. By propagating known-not-nan
and known-not-infinity information, we can automatically prune the
edge case handling in most functions if the function is only reached
from fast math uses.
This change just factors out the existing logic for and/xor/or and
puts them in a publicly available helper. functionality is the same.
Differential Revision: https://reviews.llvm.org/D142849
This reverts commit f9599bbc7a3f831e1793a549d8a7a19265f3e504.
For some reason it caused us a huge compile time regression in downstream
workloads. Not sure whether the source of it is in upstream code ir not.
Temporarily reverting until investigated.
Differential Revision: https://reviews.llvm.org/D142330