This reverts commit f4941319cba19d7691baa6ec783c84be4d847637.
This broke llvm/test/CodeGen/Thumb2/mve-vcvt-fixed-to-float.ll, which
took out a ton of buildbots and also broke premerge.
Some background: getExactInverse()'s callers expect that
the result is not subnormal.
DoubleAPFloat implemented getExactInverse() by going through
semPPCDoubleDoubleLegacy.
This means that numbers like 0x1p1022, whose inverse is normal in
semPPCDoubleDouble, would not have a normal inverse in
semPPCDoubleDoubleLegacy.
This commit refactors the logic into a single method on APFloat which
uses getExactLog2Abs() and scalbn() to calculate the inverse without
having to compute a reciprocal and test if it is inexact.
This approach works for both IEEEFloat and DoubleAPFloat.
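To make the idea concrete, here is a minimal sketch of that approach using the
public APFloat interface (getExactLog2Abs, scalbn, isDenormal); it is an
illustration of the technique, not the committed implementation:

```cpp
#include "llvm/ADT/APFloat.h"
#include <climits>
using namespace llvm;

// If |X| is exactly 2^k, its inverse is exactly 2^-k, so it can be produced
// by scaling 1.0 rather than dividing and testing the result for inexactness.
static bool sketchExactInverse(const APFloat &X, APFloat &Inv) {
  int Exp = X.getExactLog2Abs(); // INT_MIN if |X| is not a power of two
  if (Exp == INT_MIN)
    return false;
  APFloat One(X.getSemantics(), 1); // 1.0 in the same semantics
  Inv = scalbn(One, -Exp, APFloat::rmNearestTiesToEven); // exact unless it
                                                         // over/underflows
  if (X.isNegative())
    Inv.changeSign();
  // Callers expect a finite, non-zero, non-subnormal result.
  return Inv.isFiniteNonZero() && !Inv.isDenormal();
}
```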
An earlier draft of DoubleAPFloat::convertToSignExtendedInteger had
arranged for overflow to be handled in a different way. However, these
assertions can now fire if Hi+Lo is out of range and Lo != 0.
A test has been added to defend against a regression.
The prior implementation did not consider that the Lo component may
underflow when it undergoes scaling. This means we need to carefully
handle cases like binade crossings and how to handle roundTowardZero
when Hi and Lo have different signs.
Particularly annoying is roundTiesToAway when Hi and Lo have different
signs. It basically requires us to implement roundTiesTowardZero.
Use DoubleAPFloat::roundToIntegral to get a pair of APFloat values which
hold integral values. Then we sum the pair, taking care to handle edge
cases like (hi=2^128, lo=-1), where the sum still fits in an unsigned
i128.
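As an illustration of that edge case (using APInt directly; this is not the
conversion code itself): with hi = 2^128 and lo = -1, hi alone is one past the
unsigned 128-bit maximum, yet the exact sum 2^128 - 1 still fits.

```cpp
#include "llvm/ADT/APInt.h"
using namespace llvm;

// Work in a 129-bit space so 2^128 is representable exactly, then confirm
// that the summed value fits back into an unsigned 128-bit integer.
static bool sumFitsUnsignedI128() {
  APInt Hi = APInt::getOneBitSet(/*numBits=*/129, /*BitNo=*/128); // 2^128
  APInt Lo(/*numBits=*/129, -1, /*isSigned=*/true);               // -1
  APInt Sum = Hi + Lo;                                            // 2^128 - 1
  return Sum.getActiveBits() <= 128; // true: UINT128_MAX fits
}
```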
The previous implementation did not correctly handle double-doubles like
0x1p100 + 0x1p1, as the low-order component would need more than a
106-bit significand to represent.
Rather than converting to the legacy 106-bit format, perform next() on the
low APFloat. Of course, we need to renormalize the two APFloats if
either of the two constraints is violated:
1. abs(low) <= ulp(high)/2
2. high = rtne(high + low)
Should renormalization be needed, it will increment the high component
and set low to the smallest value which obeys these rules.
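For intuition, the two constraints can be checked on plain doubles as follows
(an illustration of the invariant only, not DoubleAPFloat code):

```cpp
#include <cmath>

// A double-double pair (Hi, Lo) is canonical when Hi already absorbs Lo under
// round-to-nearest-even and |Lo| is at most half an ulp of Hi.
static bool isCanonicalDoubleDouble(double Hi, double Lo) {
  // Constraint 2: Hi == rtne(Hi + Lo). The default FP rounding mode is
  // round-to-nearest-even, so a plain addition expresses this directly.
  if (Hi + Lo != Hi)
    return false;
  // Constraint 1: |Lo| <= ulp(Hi) / 2, using the distance to Hi's upward
  // neighbour as the ulp (good enough away from binade boundaries).
  double Ulp = std::fabs(std::nextafter(Hi, HUGE_VAL) - Hi);
  return std::fabs(Lo) <= Ulp / 2;
}
```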
Fixes #63895
Fixes #104984
Before this PR, `addOrSubtractSignificand` presumed that the loss came
from the side being subtracted, and didn't handle the case where lhs ==
rhs and there was loss. This can occur during FMA. This PR fixes the
situation by correctly determining where the loss came from and handling
it appropriately.
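For context, the affected path is reached through APFloat's public FMA entry
point; a minimal usage sketch follows (the operands here are arbitrary, not
the ones from #63895):

```cpp
#include "llvm/ADT/APFloat.h"
using namespace llvm;

static APFloat fmaExample() {
  APFloat A(1.0f), B(2.0f), C(3.0f);
  // Computes A = A * B + C with a single rounding; the intermediate
  // addOrSubtractSignificand call is where the lhs == rhs loss case arises.
  A.fusedMultiplyAdd(B, C, APFloat::rmNearestTiesToEven);
  return A; // 5.0f here
}
```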
Additionally, `normalize` failed to adjust the exponent when the
significand is zero but `lost_fraction != lfExactlyZero`. This meant
that the test case from #63895 was rounded incorrectly as the loss
wasn't adjusted to account for the exponent being below the minimum
exponent. This PR fixes this by only skipping the exponent adjustment if
the significand is zero and there was no lost fraction.
(Note to reviewer: I don't have commit access)
In order for the union APFloat::Storage to permit access to the
semantics field when another union member is stored there, all members
of Storage must be standard layout. This is not necessarily the case
for DoubleAPFloat which may be non-standard layout because there is no
requirement that its std::unique_ptr member is standard layout. Fix this
by converting Floats to a raw pointer.
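A reduced illustration of the layout concern, using stand-in types rather than
the real APFloat declarations:

```cpp
#include <memory>
#include <type_traits>

struct Sem {}; // stands in for fltSemantics

// Before: a std::unique_ptr member. unique_ptr is not required to be a
// standard-layout type, so this struct is not guaranteed to be one either,
// and the union's common-initial-sequence access to Semantics is not assured.
struct DDWithUniquePtr {
  const Sem *Semantics;
  std::unique_ptr<int[]> Floats;
};

// After: a raw pointer. Every member is scalar, so standard layout is
// guaranteed and the Semantics field may be read through the union.
struct DDWithRawPtr {
  const Sem *Semantics;
  int *Floats;
};

static_assert(std::is_standard_layout_v<DDWithRawPtr>,
              "raw-pointer version is standard layout");
```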
Reviewers: arsenm
Reviewed By: arsenm
Pull Request: https://github.com/llvm/llvm-project/pull/129981
* Add missing semantics to the `Semantics` enum.
* Move all documentation of the semantics to the header file.
* Also rename some functions for consistency.
Since both APFloat and (Double)IEEEFloat inherit from APFloatBase, empty
base optimization is not performed by GCC/Clang (Minimal reproducer:
https://godbolt.org/z/dY8cM3Wre). This patch removes the inheritance
relation between (Double)IEEEFloat and APFloatBase to make sure EBO is
performed on APFloat. After this patch, the size of `ConstantFPRange`
is reduced from 72 to 56 bytes.
Address comment
https://github.com/llvm/llvm-project/pull/111544#discussion_r1792398427.
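A reduced model of the problem, using stand-in types (not the actual LLVM
classes):

```cpp
struct Base {}; // stands in for APFloatBase (empty)

// Before: both the wrapper and the wrapped value derive from Base. The two
// Base subobjects must have distinct addresses, so the wrapper's empty base
// cannot be folded away and the wrapper grows.
struct InnerDerived : Base { long long Payload[2]; };
struct OuterBefore : Base { InnerDerived Value; };

// After: only the wrapper derives from Base, so the empty base optimization
// applies and the wrapper is no larger than its payload.
struct InnerPlain { long long Payload[2]; };
struct OuterAfter : Base { InnerPlain Value; };

static_assert(sizeof(OuterAfter) == sizeof(InnerPlain),
              "EBO applies once the inner type no longer derives from Base");
// sizeof(OuterBefore) is typically larger than sizeof(InnerDerived) because
// its two Base subobjects may not share an address.
```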
This patch adds an APFloat type for the unsigned E8M0 format. This
format is used for representing the "scale-format" in the MX
specification (section 5.4):
https://www.opencompute.org/documents/ocp-microscaling-formats-mx-v1-0-spec-final-pdf
This format does not support {Inf, denorms, zeroes}. Like FP32, this
format's exponent field is 8 bits (all of the bits here) and the bias
value is 127. However, it differs from IEEE FP32 in that the
minExponent is -127 (instead of -126). The APFloat utility functions
are updated to handle these constraints for this format.
* The bias calculation is different and convertIEEE* APIs are updated to
handle this.
* Since there are no significand bits, the isSignificandAll{Zeroes/Ones}
methods are updated accordingly.
* Although the format does not have any precision, the precision field
in the fltSemantics is set to 1 for consistency with APFloat's internal
representation.
* Many utility functions are updated to handle the fact that this format
does not support Zero.
* Provide a separate initFromAPInt() implementation to handle the quirks
of the format.
* Add specific tests to verify the range of values for this format.
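As a small worked example of the format, a hypothetical decoder (not an
APFloat API) shows how a stored byte maps to a scale value:

```cpp
#include <cmath>
#include <cstdint>

// E8M0: the whole byte is a biased exponent (bias 127), there is no sign bit,
// no significand, and no encodings for zero, infinities, or denormals.
static double decodeE8M0(uint8_t Stored) {
  // In the MX spec the all-ones byte is NaN; callers must check it separately.
  return std::ldexp(1.0, static_cast<int>(Stored) - 127);
}
// decodeE8M0(0)   == 0x1p-127  (minExponent is -127, unlike FP32's -126)
// decodeE8M0(127) == 1.0
```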
Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
ConstantFolding behaves differently depending on the host's
`HAS_IEE754_FLOAT128`. LLVM should not change behavior based on the
host configuration.
This reverts commit 14c7e4a1844904f3db9b2dc93b722925a8c66b27.
(llvmorg-20-init-3262-g14c7e4a18449 and llvmorg-20-init-3498-g001e423ac626)
This is a reland of #96287. This patch attempts to reduce the reverted
patch's clang compile-time impact by removing #includes of float128.h
and inlining the convertToQuad functions instead.
This is a reland of #96287. This change makes tests in logf128.ll ignore
the sign of NaNs for negative value tests and moves an #include <cmath>
to be blocked behind #ifndef _GLIBCXX_MATH_H.
This PR adds the `f8E3M4` type to APFloat.
The `f8E3M4` type follows the IEEE 754 convention:
```c
f8E3M4 (IEEE 754)
- Exponent bias: 3
- Maximum stored exponent value: 6 (binary 110)
- Maximum unbiased exponent value: 6 - 3 = 3
- Minimum stored exponent value: 1 (binary 001)
- Minimum unbiased exponent value: 1 − 3 = −2
- Precision specifies the total number of bits used for the significand (mantissa),
including implicit leading integer bit = 4 + 1 = 5
- Follows IEEE 754 conventions for representation of special values
- Has Positive and Negative zero
- Has Positive and Negative infinity
- Has NaNs
Additional details:
- Max exp (unbiased): 3
- Min exp (unbiased): -2
- Infinities (+/-): S.111.0000
- Zeros (+/-): S.000.0000
- NaNs: S.111.{0,1}⁴ except S.111.0000
- Max normal number: S.110.1111 = +/-2^(6-3) x (1 + 15/16) = +/-2^3 x 31 x 2^(-4) = +/-15.5
- Min normal number: S.001.0000 = +/-2^(1-3) x (1 + 0) = +/-2^(-2)
- Max subnormal number: S.000.1111 = +/-2^(-2) x 15/16 = +/-2^(-2) x 15 x 2^(-4) = +/-15 x 2^(-6)
- Min subnormal number: S.000.0001 = +/-2^(-2) x 1/16 = +/-2^(-2) x 2^(-4) = +/-2^(-6)
```
Related PRs:
- [PR-97179](https://github.com/llvm/llvm-project/pull/97179) [APFloat]
Add support for f8E4M3 IEEE 754 type
This PR adds the `f8E4M3` type to APFloat.
The `f8E4M3` type follows the IEEE 754 convention:
```c
f8E4M3 (IEEE 754)
- Exponent bias: 7
- Maximum stored exponent value: 14 (binary 1110)
- Maximum unbiased exponent value: 14 - 7 = 7
- Minimum stored exponent value: 1 (binary 0001)
- Minimum unbiased exponent value: 1 − 7 = −6
- Precision specifies the total number of bits used for the significand (mantissa),
including implicit leading integer bit = 3 + 1 = 4
- Follows IEEE 754 conventions for representation of special values
- Has Positive and Negative zero
- Has Positive and Negative infinity
- Has NaNs
Additional details:
- Max exp (unbiased): 7
- Min exp (unbiased): -6
- Infinities (+/-): S.1111.000
- Zeros (+/-): S.0000.000
- NaNs: S.1111.{001, 010, 011, 100, 101, 110, 111}
- Max normal number: S.1110.111 = +/-2^(7) x (1 + 0.875) = +/-240
- Min normal number: S.0001.000 = +/-2^(-6)
- Max subnormal number: S.0000.111 = +/-2^(-6) x 0.875 = +/-2^(-9) x 7
- Min subnormal number: S.0000.001 = +/-2^(-6) x 0.125 = +/-2^(-9)
```
Related PRs:
- [PR-97118](https://github.com/llvm/llvm-project/pull/97118) Add f8E4M3
IEEE 754 type to mlir
This PR lifts the body of IEEEFloat::toString out to a standalone
function. We do this to facilitate code sharing with other floating
point types, e.g., the forthcoming support for HexFloat.
There is no change in functionality.
Currently `f8E4M3` is mapped to `Float8E4M3FNType`.
This PR renames `f8E4M3` to `f8E4M3FN` to accurately reflect the actual
type.
This PR is needed to avoid a name conflict with an upcoming PR which
will add the IEEE 754 `Float8E4M3Type`:
https://github.com/llvm/llvm-project/pull/97118 Add f8E4M3 IEEE 754 type
This is a second attempt to land #84501, which failed on several
targets. This patch adds the HAS_IEE754_FLOAT128 define, which makes
the check for typedef'ing float128 more precise by checking whether
__uint128_t is available and that the host does not use __ibm128, which
is prevalent on PowerPC targets and replaces IEEE 754 float128s.
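Roughly the shape of the guard being described (a sketch, not the exact header
contents):

```cpp
// Only typedef float128 when 128-bit integers exist and long double is not
// PowerPC's __ibm128 double-double format, which stands in for a true
// IEEE 754 binary128 on those targets.
#if defined(__FLOAT128__) && defined(__SIZEOF_INT128__) &&                     \
    !defined(__LONG_DOUBLE_IBM128__)
#define HAS_IEE754_FLOAT128
typedef __float128 float128;
#endif
```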
I'm planning to remove StringRef::equals in favor of
StringRef::operator==.
- StringRef::operator== outnumbers StringRef::equals by a factor of 25
under llvm/ in terms of their usage.
- The elimination of StringRef::equals brings StringRef closer to
std::string_view, which has operator== but not equals.
- S == "foo" is more readable than S.equals("foo"), especially for
!Long.Expression.equals("str") vs Long.Expression != "str".
This is a second attempt to land #84501, which failed on several
targets. This patch adds the HAS_IEE754_FLOAT128 define, which makes
the check for typedef'ing float128 more precise by checking whether
__uint128_t is available and that the host does not use __ibm128, which
is prevalent on PowerPC targets and replaces IEEE 754 float128s.
This patch enables constant folding for 128-bit floating-point logf
calls. This is achieved by using a CMake test to query whether the host
system has the logf128() symbol available. If so, the runtime call is
replaced with the compile-time value returned from logf128.
All the IEEE formats are quite similar, so we can merge their code
effectively by writing it parametrically via the fltSemantics object.
We can metaprogram the implementation so that this parametricity is
zero-cost.
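A toy example of the parametric style (with a simplified stand-in for
fltSemantics, not the real struct): one constexpr helper replaces per-format
copies, and evaluating it at compile time keeps the parametricity free at run
time.

```cpp
struct ToySemantics {
  int MaxExponent;
  int MinExponent;
  unsigned Precision;  // significand bits, including the implicit integer bit
  unsigned SizeInBits; // total storage width
};

// Derived once for every IEEE-like format: total width minus the sign bit and
// the stored significand bits gives the exponent field width.
constexpr unsigned exponentBits(const ToySemantics &S) {
  return S.SizeInBits - 1 - (S.Precision - 1);
}

constexpr ToySemantics SingleLike{127, -126, 24, 32};
constexpr ToySemantics HalfLike{15, -14, 11, 16};
static_assert(exponentBits(SingleLike) == 8, "binary32 has an 8-bit exponent");
static_assert(exponentBits(HalfLike) == 5, "binary16 has a 5-bit exponent");
```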