155 Commits

Author SHA1 Message Date
David Majnemer
9fdb5e1fef [APFloat] Properly implement next for DoubleAPFloat
Rather than converting to the legacy 106-bit format, perform next() on the
low APFloat. Of course, we need to renormalize the two APFloats if
either of the two constraints are violated:
1. abs(low) <= ulp(high)/2
2. high = rtne(high + low)

Should renormalization be needed, it will increment the high component
and set low to the smallest value which obeys these rules.
2025-08-01 12:34:33 -07:00
Yingwei Zheng
e710a5a9f2
[InstCombine] Fold fneg/fabs patterns with ppc_f128 (#130557)
This patch is needed by
https://github.com/llvm/llvm-project/pull/130496.
2025-04-14 14:30:00 +08:00
beetrees
8b9c91ec7d
[APFloat] Fix IEEEFloat::addOrSubtractSignificand and IEEEFloat::normalize (#98721)
Fixes #63895
Fixes #104984

Before this PR, `addOrSubtractSignificand` presumed that the loss came
from the side being subtracted, and didn't handle the case where lhs ==
rhs and there was loss. This can occur during FMA. This PR fixes the
situation by correctly determining where the loss came from and handling
it appropriately.

Additionally, `normalize` failed to adjust the exponent when the
significand is zero but `lost_fraction != lfExactlyZero`. This meant
that the test case from #63895 was rounded incorrectly as the loss
wasn't adjusted to account for the exponent being below the minimum
exponent. This PR fixes this by only skipping the exponent adjustment if
the significand is zero and there was no lost fraction.

(Note to reviewer: I don't have commit access)
2025-03-10 09:44:27 +07:00
YunQiang Su
88ff6070a5
APFloat: Fix maxnum and minnum with sNaN (#112854)
See: https://github.com/llvm/llvm-project/pull/112852
Fixes: https://github.com/llvm/llvm-project/issues/111991

We have reclarify llvm.maxnum and llvm.minnum to follow IEEE-754 2008's
maxNum and minNum with +0.0>-0.0.
So let's make APFloat::maxnum and APFloat::minnum to follow it, too.
2025-02-27 14:16:33 +08:00
Matthias Springer
309c890921
[llvm] APFloat: Add helpers to query NaN/inf semantics (#116315)
`APFloat` changes extracted from #116176 as per reviewer comments.
2024-11-16 11:48:05 +09:00
Matthias Springer
2f55de4e31
[llvm] APFloat: Query hasNanOrInf from semantics (#116158)
Whether a floating point type supports NaN or infinity can be queried
from its semantics. No need to hard-code a list of types.
2024-11-15 09:29:54 +09:00
Sergey Kozub
7b308b18c3
Fix bitcasting E8M0 APFloat to APInt (#113298)
Fixes a bug in APFloat handling of E8M0 type (zero mantissa).

Related PRs:
- https://github.com/llvm/llvm-project/pull/107127
- https://github.com/llvm/llvm-project/pull/111028
2024-10-22 19:15:29 +02:00
Yingwei Zheng
a54d88f97d
[APFloat] Fix APFloat::getOne (#112308)
`APFloat::APFloat(const fltSemantics &Semantics, integerPart I)`
interprets 'I' as a unsigned integer.
Fix the bug found in
https://github.com/llvm/llvm-project/pull/112113#discussion_r1799744541.
2024-10-15 13:35:50 +08:00
Durgadoss R
99f527d280
[APFloat] Add APFloat support for E8M0 type (#107127)
This patch adds an APFloat type for unsigned E8M0 format. This format is
used for representing the "scale-format" in the MX specification:
(section 5.4)

https://www.opencompute.org/documents/ocp-microscaling-formats-mx-v1-0-spec-final-pdf

This format does not support {Inf, denorms, zeroes}. Like FP32, this
format's exponents are 8-bits (all bits here) and the bias value is 127.
However, it differs from IEEE-FP32 in that the minExponent is -127
(instead of -126). There are updates done in the APFloat utility
functions to handle these constraints for this format.

* The bias calculation is different and convertIEEE* APIs are updated to
handle this.
* Since there are no significand bits, the isSignificandAll{Zeroes/Ones}
methods are updated accordingly.
* Although the format does not have any precision, the precision bit in
the fltSemantics is set to 1 for consistency with APFloat's internal
representation.
* Many utility functions are updated to handle the fact that this format
does not support Zero.
* Provide a separate initFromAPInt() implementation to handle the quirks
of the format.
* Add specific tests to verify the range of values for this format.

Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
2024-10-02 23:04:21 +05:30
Alex Bradbury
a88901838a
[APFloat] Correct semantics of minimum/maximum for signaling NaN arguments (#109976)
The minimum and maximum operations were introduced in
https://reviews.llvm.org/D52764 alongside the intrinsics. The question
of NaN propagation was discussed at the time, but the resulting
semantics don't seem to match what was ultimately agreed in IEEE754-2019
or the description we now have in the LangRef at
<https://llvm.org/docs/LangRef.html#llvm-min-intrinsics-comparation>.

Essentially, the APFloat implementation doesn't quiet a signaling NaN
input when it should in order to match the LangRef and IEEE spec.
2024-10-01 12:47:55 +01:00
Sergey Kozub
918222ba43
[MLIR] Add f6E3M2FN type (#105573)
This PR adds `f6E3M2FN` type to mlir.

`f6E3M2FN` type is proposed in [OpenCompute MX
Specification](https://www.opencompute.org/documents/ocp-microscaling-formats-mx-v1-0-spec-final-pdf).
It defines a 6-bit floating point number with bit layout S1E3M2. Unlike
IEEE-754 types, there are no infinity or NaN values.

```c
f6E3M2FN
- Exponent bias: 3
- Maximum stored exponent value: 7 (binary 111)
- Maximum unbiased exponent value: 7 - 3 = 4
- Minimum stored exponent value: 1 (binary 001)
- Minimum unbiased exponent value: 1 − 3 = −2
- Has Positive and Negative zero
- Doesn't have infinity
- Doesn't have NaNs

Additional details:
- Zeros (+/-): S.000.00
- Max normal number: S.111.11 = ±2^(4) x (1 + 0.75) = ±28
- Min normal number: S.001.00 = ±2^(-2) = ±0.25
- Max subnormal number: S.000.11 = ±2^(-2) x 0.75 = ±0.1875
- Min subnormal number: S.000.01 = ±2^(-2) x 0.25 = ±0.0625
```

Related PRs:
- [PR-94735](https://github.com/llvm/llvm-project/pull/94735) [APFloat]
Add APFloat support for FP6 data types
- [PR-97118](https://github.com/llvm/llvm-project/pull/97118) [MLIR] Add
f8E4M3 type - was used as a template for this PR
2024-09-10 10:41:05 +02:00
Alexander Pivovarov
abc2fe31fc
[APFloat] Add support for f8E3M4 IEEE 754 type (#99698)
This PR adds `f8E4M3` type to APFloat.

`f8E3M4` type  follows IEEE 754 convention

```c
f8E3M4 (IEEE 754)
- Exponent bias: 3
- Maximum stored exponent value: 6 (binary 110)
- Maximum unbiased exponent value: 6 - 3 = 3
- Minimum stored exponent value: 1 (binary 001)
- Minimum unbiased exponent value: 1 − 3 = −2
- Precision specifies the total number of bits used for the significand (mantissa), 
    including implicit leading integer bit = 4 + 1 = 5
- Follows IEEE 754 conventions for representation of special values
- Has Positive and Negative zero
- Has Positive and Negative infinity
- Has NaNs

Additional details:
- Max exp (unbiased): 3
- Min exp (unbiased): -2
- Infinities (+/-): S.111.0000
- Zeros (+/-): S.000.0000
- NaNs: S.111.{0,1}⁴ except S.111.0000
- Max normal number: S.110.1111 = +/-2^(6-3) x (1 + 15/16) = +/-2^3 x 31 x 2^(-4) = +/-15.5
- Min normal number: S.001.0000 = +/-2^(1-3) x (1 + 0) = +/-2^(-2)
- Max subnormal number: S.000.1111 = +/-2^(-2) x 15/16 = +/-2^(-2) x 15 x 2^(-4) = +/-15 x 2^(-6)
- Min subnormal number: S.000.0001 = +/-2^(-2) x 1/16 =  +/-2^(-2) x 2^(-4) = +/-2^(-6)
```

Related PRs:
- [PR-97179](https://github.com/llvm/llvm-project/pull/97179) [APFloat]
Add support for f8E4M3 IEEE 754 type
2024-07-30 00:11:10 -07:00
Alexander Pivovarov
f363317702
[APFloat] Add support for f8E4M3 IEEE 754 type (#97179)
This PR adds `f8E4M3` type to APFloat.

`f8E4M3` type  follows IEEE 754 convention

```c
f8E4M3 (IEEE 754)
- Exponent bias: 7
- Maximum stored exponent value: 14 (binary 1110)
- Maximum unbiased exponent value: 14 - 7 = 7
- Minimum stored exponent value: 1 (binary 0001)
- Minimum unbiased exponent value: 1 − 7 = −6
- Precision specifies the total number of bits used for the significand (mantisa), 
    including implicit leading integer bit = 3 + 1 = 4
- Follows IEEE 754 conventions for representation of special values
- Has Positive and Negative zero
- Has Positive and Negative infinity
- Has NaNs

Additional details:
- Max exp (unbiased): 7
- Min exp (unbiased): -6
- Infinities (+/-): S.1111.000
- Zeros (+/-): S.0000.000
- NaNs: S.1111.{001, 010, 011, 100, 101, 110, 111}
- Max normal number: S.1110.111 = +/-2^(7) x (1 + 0.875) = +/-240
- Min normal number: S.0001.000 = +/-2^(-6)
- Max subnormal number: S.0000.111 = +/-2^(-6) x 0.875 = +/-2^(-9) x 7
- Min subnormal number: S.0000.001 = +/-2^(-6) x 0.125 = +/-2^(-9)
```

Related PRs:
- [PR-97118](https://github.com/llvm/llvm-project/pull/97118) Add f8E4M3
IEEE 754 type to mlir
2024-07-17 23:33:52 -07:00
Alexander Pivovarov
2628a5fd24
Rename f8E4M3 to f8E4M3FN in mlir.extras.types py package (#97102)
Currently `f8E4M3` is mapped to `Float8E4M3FNType`.

This PR renames `f8E4M3` to `f8E4M3FN` to accurately reflect the actual
type.

This PR is needed to avoid names conflict in upcoming PR which will add
IEEE 754 `Float8E4M3Type`.
https://github.com/llvm/llvm-project/pull/97118 Add f8E4M3 IEEE 754 type

Maksim, can you review this PR? @makslevental ?
2024-06-29 12:48:11 -07:00
YunQiang Su
0d53366505
APFloat: Add minimumnum and maximumnum (#96304)
They implements IEEE754-2019 minimumNumber and maximumNumber semantics.

Newer libc also has these 2 functions with names
   fminimum_num
   fmaximum_num

We are planning add minimumnum and maximumnum intrinsic. This is a step
to the goal.
2024-06-26 07:10:43 +08:00
Nikita Popov
f2f18459d4 Revert "Intrinsic: introduce minimumnum and maximumnum (#93841)"
As far as I can tell, this pull request was not approved, and
did not go through an RFC on discourse.

This reverts commit 89881480030f48f83af668175b70a9798edca2fb.
This reverts commit 225d8fc8eb24fb797154c1ef6dcbe5ba033142da.
2024-06-21 08:34:04 +02:00
YunQiang Su
8988148003
Intrinsic: introduce minimumnum and maximumnum (#93841)
Currently, on different platform, the behaivor of llvm.minnum is
different if one operand is sNaN:

When we compare sNaN vs NUM:

ARM/AArch64/PowerPC: follow the IEEE754-2008's minNUM: return qNaN.
RISC-V/Hexagon follow the IEEE754-2019's minimumNumber: return NUM. X86:
Returns NUM but not same with IEEE754-2019's minimumNumber as
     +0.0 is not always greater than -0.0.
MIPS/LoongArch/Generic: return NUM.
LIBCALL: returns qNaN.

So, let's introduce llvm.minmumnum/llvm.maximumnum, which always follow
IEEE754-2019's minimumNumber/maximumNumber.

Half-fix: #93033
2024-06-21 11:53:08 +08:00
Durgadoss R
880d37038c
[APFloat] Add APFloat support for FP4 data type (#95392)
This patch adds APFloat type support for the E2M1
FP4 datatype. The definitions for this format are
detailed in section 5.3.3 of the OCP specification,
which can be accessed here:

https://www.opencompute.org/documents/ocp-microscaling-formats-mx-v1-0-spec-final-pdf

Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
2024-06-14 14:17:37 +05:30
Durgadoss R
b1fe03f084
[APFloat] Add APFloat support for FP6 data types (#94735)
This patch adds APFloat type support for two FP6 data types,
E2M3 and E3M2. The definitions for the two formats are detailed
in section 5.3.2 of the OCP specification, which can be accessed here:

https://www.opencompute.org/documents/ocp-microscaling-formats-mx-v1-0-spec-final-pdf

Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
2024-06-11 13:16:51 +05:30
Matt Arsenault
6cfd3439d4
APFloat: Fix signed zero handling in minnum/maxnum (#83376)
Follow the 2019 rules and order -0 as less than +0 and +0 as greater
than -0. As currently defined this isn't required for the intrinsics,
but is a better QoI.

This will avoid the workaround in libc added by #83158
2024-02-29 16:51:33 +05:30
Matt Arsenault
8a02fd3f94 APFloat: Add getExactLog2Abs
Like the recently added getExactLog2 except ignore the sign bit.

https://reviews.llvm.org/D158102
2023-08-23 19:16:41 -04:00
Matt Arsenault
0b57c3a7b8 APFloat: Add getExactLog2
https://reviews.llvm.org/D157108
2023-08-07 18:40:16 -04:00
Elliot Goodrich
b0abd4893f [llvm] Add missing StringExtras.h includes
In preparation for removing the `#include "llvm/ADT/StringExtras.h"`
from the header to source file of `llvm/Support/Error.h`, first add in
all the missing includes that were previously included transitively
through this header.
2023-06-25 15:42:22 +01:00
Jeremy Furtek
55c2211a23 [APFloat] Add APFloat semantic support for TF32
This diff adds APFloat support for a semantic that matches the TF32 data type
used by some accelerators (most notably GPUs from both NVIDIA and AMD).

For more information on the TF32 data type, see https://blogs.nvidia.com/blog/2020/05/14/tensorfloat-32-precision-format/.
Some intrinsics that support the TF32 data type were added in https://reviews.llvm.org/D122044.

For some discussion on supporting common semantics in `APFloat`, see similar
efforts for 8-bit formats at https://reviews.llvm.org/D146441, as well as
https://discourse.llvm.org/t/rfc-adding-the-amd-graphcore-maybe-others-float8-formats-to-apfloat/67969.

A subsequent diff will extend MLIR to use this data type. (Those changes are
not part of this diff to simplify the review process.)

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D151923
2023-06-23 10:54:49 +02:00
David Majnemer
3d11652bbe [APFloat] Refactor common code for APFloat<->APInt conversion
All the IEEE formats are quite similar, we can merge their code
effectively by writing it parametrically via the fltSemantics object.

We can metaprogram the implementation such that this parametricity is
zero-cost.
2023-04-04 19:12:10 +00:00
David Majnemer
2f086f265b [APFloat] Add E4M3B11FNUZ
X. Sun et al. (https://dl.acm.org/doi/10.5555/3454287.3454728) published
a paper showing that an FP format with 4 bits of exponent, 3 bits of
significand and an exponent bias of 11 would work quite well for ML
applications.

Google hardware supports a variant of this format where 0x80 is used to
represent NaN, as in the Float8E4M3FNUZ format. Just like the
Float8E4M3FNUZ format, this format does not support -0 and values which
would map to it will become +0.

This format is proposed for inclusion in OpenXLA's StableHLO dialect: https://github.com/openxla/stablehlo/pull/1308

As part of inclusion in that dialect, APFloat needs to know how to
handle this format.

Differential Revision: https://reviews.llvm.org/D146441
2023-03-24 20:06:40 +00:00
Matt Arsenault
71f1ea2c15 APFloat: Add classify 2023-03-03 18:54:58 -04:00
Krzysztof Drewniak
6109e70c72 [llvm][APFloat] Add NaN-in-negative-zero formats by AMD and GraphCore
AMD, GraphCore, and Qualcom have published a standard for 8-bit floats that
differs from the 8-bit floats defined by Nvidia, Intel, and ARM. This
commit adds support for these alternate 8-bit floats to APFloat in
order to enable their usage in MLIR. These formats are presented in
the paper at https://arxiv.org/abs/2206.02915 and are implemented in
GRaphCore hardware whose ISA is available at
https://docs.graphcore.ai/projects/isa-mk2-with-fp8/en/latest/_static/TileVertexISA-IPU21-1.3.1.pdf .

In these formats, like the existing Float8E4M3FN, there are no
infinity values and there is only one NaN. Unlike in that format,
however, the NaN values is 0x80, which would be negative 0 in IEEE
formats. This means that these formats also make 0 unsigned.

To allow for these new variant semantics, this commit adds
fltNanEncoding, which can be IEEE (the default), AllOnes (used by
Fleat8E4M3FN), or NegativeZero (used by the new formats,
Float8E5M2FNUZ and Float8E4M3FNUZ). Normalization, arithmetic, and
other such routines have been updated to account for the potential
variant semantics.

The two new formats are Float8E5M2FNUZ (5 bits exponent, 2 bits
mantissa, finite, unsigned zero) and Float8E4M3FNUZ (4 bits exponent,
3 bits mantissa, finite, unsigned zero).

Reviewed By: jakeh-gc, reedwm, lattner

Differential Revision: https://reviews.llvm.org/D141863
2023-02-09 22:08:00 +00:00
Matt Arsenault
191c1d95e8 APFloat: Add isSmallestNormalized predicate function
It was annoying to write the check for this in the one case I added,
and I'm planning on adding another, so add a convenient PatternMatch
like for other special case values.

I have no idea what is going on in the DoubleAPFloat case, I reversed
this from the makeSmallestNormalized test. Also could implement this
as *this == getSmallestNormalized() for less code, but this avoids the
construction of a temporary APFloat copy and follows the style of the
other functions.
2022-12-15 14:04:26 -05:00
Matt Arsenault
ded163027e APFloat: Add isPosInfinity and isNegInfinity helpers 2022-12-13 09:02:58 -05:00
Reed
7476d59414 Fix APFloat::toString on Float8E5M2 values.
Before, an APInt with value 10 was created, whose width was the significand width. But 10 cannot fit in Float8E5M2's significand.

Differential Revision: https://reviews.llvm.org/D138540
2022-12-13 09:52:07 +01:00
Reed
88eb3c62f2 Add FP8 E4M3 support to APFloat.
NVIDIA, ARM, and Intel recently introduced two new FP8 formats, as described in the paper: https://arxiv.org/abs/2209.05433. The first of the two FP8 dtypes, E5M2, was added in https://reviews.llvm.org/D133823. This change adds the second of the two: E4M3.

There is an RFC for adding the FP8 dtypes here: https://discourse.llvm.org/t/rfc-add-apfloat-and-mlir-type-support-for-fp8-e5m2/65279. I spoke with the RFC's author, Stella, and she gave me the go ahead to implement the E4M3 type. The name of the E4M3 type in APFloat is Float8E4M3FN, as discussed in the RFC. The "FN" means only Finite and NaN values are supported.

Unlike E5M2, E4M3 has different behavior from IEEE types in regards to Inf and NaN values. There are no Inf values, and NaN is represented when the exponent and mantissa bits are all 1s. To represent these differences in APFloat, I added an enum field, fltNonfiniteBehavior, to the fltSemantics struct. The possible enum values are IEEE754 and NanOnly. Only Float8E4M3FN has the NanOnly behavior.

After this change is submitted, I plan on adding the Float8E4M3FN type to MLIR, in the same way as E5M2 was added in https://reviews.llvm.org/D133823.

Reviewed By: bkramer

Differential Revision: https://reviews.llvm.org/D137760
2022-11-15 20:26:42 +01:00
Stella Laurenzo
e28b15b572 Add APFloat and MLIR type support for fp8 (e5m2).
(Re-Apply with fixes to clang MicrosoftMangle.cpp)

This is a first step towards high level representation for fp8 types
that have been built in to hardware with near term roadmaps. Like the
BFLOAT16 type, the family of fp8 types are inspired by IEEE-754 binary
floating point formats but, due to the size limits, have been tweaked in
various ways in order to maximally use the range/precision in various
scenarios. The list of variants is small/finite and bounded by real
hardware.

This patch introduces the E5M2 FP8 format as proposed by Nvidia, ARM,
and Intel in the paper: https://arxiv.org/pdf/2209.05433.pdf

As the more conformant of the two implemented datatypes, we are plumbing
it through LLVM's APFloat type and MLIR's type system first as a
template. It will be followed by the range optimized E4M3 FP8 format
described in the paper. Since that format deviates further from the
IEEE-754 norms, it may require more debate and implementation
complexity.

Given that we see two parts of the FP8 implementation space represented
by these cases, we are recommending naming of:

* `F8M<N>` : For FP8 types that can be conceived of as following the
  same rules as FP16 but with a smaller number of mantissa/exponent
  bits. Including the number of mantissa bits in the type name is enough
  to fully specify the type. This naming scheme is used to represent
  the E5M2 type described in the paper.
* `F8M<N>F` : For FP8 types such as E4M3 which only support finite
  values.

The first of these (this patch) seems fairly non-controversial. The
second is previewed here to illustrate options for extending to the
other known variant (but can be discussed in detail in the patch
which implements it).

Many conversations about these types focus on the Machine-Learning
ecosystem where they are used to represent mixed-datatype computations
at a high level. At that level (which is why we also expose them in
MLIR), it is important to retain the actual type definition so that when
lowering to actual kernels or target specific code, the correct
promotions, casts and rescalings can be done as needed. We expect that
most LLVM backends will only experience these types as opaque `I8`
values that are applicable to some instructions.

MLIR does not make it particularly easy to add new floating point types
(i.e. the FloatType hierarchy is not open). Given the need to fully
model FloatTypes and make them interop with tooling, such types will
always be "heavy-weight" and it is not expected that a highly open type
system will be particularly helpful. There are also a bounded number of
floating point types in use for current and upcoming hardware, and we
can just implement them like this (perhaps looking for some cosmetic
ways to reduce the number of places that need to change). Creating a
more generic mechanism for extending floating point types seems like it
wouldn't be worth it and we should just deal with defining them one by
one on an as-needed basis when real hardware implements a new scheme.
Hopefully, with some additional production use and complete software
stacks, hardware makers will converge on a set of such types that is not
terribly divergent at the level that the compiler cares about.

(I cleaned up some old formatting and sorted some items for this case:
If we converge on landing this in some form, I will NFC commit format
only changes as a separate commit)

Differential Revision: https://reviews.llvm.org/D133823
2022-10-04 17:18:17 -07:00
Vitaly Buka
e68c7a9917 Revert "Add APFloat and MLIR type support for fp8 (e5m2)."
Breaks bots https://lab.llvm.org/buildbot/#/builders/37/builds/17086

This reverts commit 2dc68b5398258c7a0cf91f10192d058e787afcdf.
2022-10-02 21:22:44 -07:00
Stella Laurenzo
2dc68b5398 Add APFloat and MLIR type support for fp8 (e5m2).
This is a first step towards high level representation for fp8 types
that have been built in to hardware with near term roadmaps. Like the
BFLOAT16 type, the family of fp8 types are inspired by IEEE-754 binary
floating point formats but, due to the size limits, have been tweaked in
various ways in order to maximally use the range/precision in various
scenarios. The list of variants is small/finite and bounded by real
hardware.

This patch introduces the E5M2 FP8 format as proposed by Nvidia, ARM,
and Intel in the paper: https://arxiv.org/pdf/2209.05433.pdf

As the more conformant of the two implemented datatypes, we are plumbing
it through LLVM's APFloat type and MLIR's type system first as a
template. It will be followed by the range optimized E4M3 FP8 format
described in the paper. Since that format deviates further from the
IEEE-754 norms, it may require more debate and implementation
complexity.

Given that we see two parts of the FP8 implementation space represented
by these cases, we are recommending naming of:

* `F8M<N>` : For FP8 types that can be conceived of as following the
  same rules as FP16 but with a smaller number of mantissa/exponent
  bits. Including the number of mantissa bits in the type name is enough
  to fully specify the type. This naming scheme is used to represent
  the E5M2 type described in the paper.
* `F8M<N>F` : For FP8 types such as E4M3 which only support finite
  values.

The first of these (this patch) seems fairly non-controversial. The
second is previewed here to illustrate options for extending to the
other known variant (but can be discussed in detail in the patch
which implements it).

Many conversations about these types focus on the Machine-Learning
ecosystem where they are used to represent mixed-datatype computations
at a high level. At that level (which is why we also expose them in
MLIR), it is important to retain the actual type definition so that when
lowering to actual kernels or target specific code, the correct
promotions, casts and rescalings can be done as needed. We expect that
most LLVM backends will only experience these types as opaque `I8`
values that are applicable to some instructions.

MLIR does not make it particularly easy to add new floating point types
(i.e. the FloatType hierarchy is not open). Given the need to fully
model FloatTypes and make them interop with tooling, such types will
always be "heavy-weight" and it is not expected that a highly open type
system will be particularly helpful. There are also a bounded number of
floating point types in use for current and upcoming hardware, and we
can just implement them like this (perhaps looking for some cosmetic
ways to reduce the number of places that need to change). Creating a
more generic mechanism for extending floating point types seems like it
wouldn't be worth it and we should just deal with defining them one by
one on an as-needed basis when real hardware implements a new scheme.
Hopefully, with some additional production use and complete software
stacks, hardware makers will converge on a set of such types that is not
terribly divergent at the level that the compiler cares about.

(I cleaned up some old formatting and sorted some items for this case:
If we converge on landing this in some form, I will NFC commit format
only changes as a separate commit)

Differential Revision: https://reviews.llvm.org/D133823
2022-10-02 17:17:08 -07:00
Joe Loser
5e96cea1db [llvm] Use std::size instead of llvm::array_lengthof
LLVM contains a helpful function for getting the size of a C-style
array: `llvm::array_lengthof`. This is useful prior to C++17, but not as
helpful for C++17 or later: `std::size` already has support for C-style
arrays.

Change call sites to use `std::size` instead.

Differential Revision: https://reviews.llvm.org/D133429
2022-09-08 09:01:53 -06:00
Danila Malyutin
ed6c309d4b [APFloat] Fix truncation of certain subnormal numbers
Certain subnormals would be incorrectly rounded away from zero.

Fixes #55838

Differential Revision: https://reviews.llvm.org/D127140
2022-06-08 21:54:35 +03:00
Benjamin Kramer
f15014ff54 Revert "Rename llvm::array_lengthof into llvm::size to match std::size from C++17"
This reverts commit ef8206320769ad31422a803a0d6de6077fd231d2.

- It conflicts with the existing llvm::size in STLExtras, which will now
  never be called.
- Calling it without llvm:: breaks C++17 compat
2022-01-26 16:55:53 +01:00
serge-sans-paille
ef82063207 Rename llvm::array_lengthof into llvm::size to match std::size from C++17
As a conquence move llvm::array_lengthof from STLExtras.h to
STLForwardCompat.h (which is included by STLExtras.h so no build
breakage expected).
2022-01-26 16:17:45 +01:00
Serge Pavlov
c162f086ba [APFloat] convertToDouble/Float can work on shorter types
Previously APFloat::convertToDouble may be called only for APFloats that
were built using double semantics. Other semantics like single precision
were not allowed although corresponding numbers could be converted to
double without loss of precision. The similar restriction applied to
APFloat::convertToFloat.

With this change any APFloat that can be precisely represented by double
can be handled with convertToDouble. Behavior of convertToFloat was
updated similarly. It make the conversion operations more convenient and
adds support for formats like half and bfloat.

Differential Revision: https://reviews.llvm.org/D102671
2021-05-21 11:02:51 +07:00
Sanjay Patel
149f5b573c [APFloat] convert SNaN to QNaN in convert() and raise Invalid signal
This is an alternate fix (see D87835) for a bug where a NaN constant
gets wrongly transformed into Infinity via truncation.
In this patch, we uniformly convert any SNaN to QNaN while raising
'invalid op'.
But we don't have a way to directly specify a 32-bit SNaN value in LLVM IR,
so those are always encoded/decoded by calling convert from/to 64-bit hex.

See D88664 for a clang fix needed to allow this change.

Differential Revision: https://reviews.llvm.org/D88238
2020-10-01 14:37:38 -04:00
Craig Topper
b23916504a Patch IEEEFloat::isSignificandAllZeros and IEEEFloat::isSignificandAllOnes (bug 34579)
Patch IEEEFloat::isSignificandAllZeros and IEEEFloat::isSignificandAllOnes to behave correctly in the case that the size of the significand is a multiple of the width of the integerParts making up the significand.

The patch to IEEEFloat::isSignificandAllOnes fixes bug 34579, and the patch to IEEE:Float:isSignificandAllZeros fixes the unit test "APFloatTest.x87Next" I added here. I have included both in this diff since the changes are very similar.

Patch by Andrew Briand
2020-09-30 16:07:15 -07:00
Sanjay Patel
e34bd1e0b0 [APFloat] prevent NaN morphing into Inf on conversion (PR43907)
We shift the significand right on a truncation, but that needs to be made NaN-safe:
always set at least 1 bit in the significand.
https://llvm.org/PR43907

See D88238 for the likely follow-up (but needs some plumbing fixes before it can proceed).

Differential Revision: https://reviews.llvm.org/D87835
2020-09-24 14:02:19 -04:00
Sanjay Patel
b2c46633d1 [APFloat] add tests for convert of NAN; NFC
More coverage for the bug fix proposed in D87835.
2020-09-24 07:43:07 -04:00
Serge Pavlov
14a1b80e04 Make IEEEFloat::roundToIntegral more standard conformant
Behavior of IEEEFloat::roundToIntegral is aligned with IEEE-754
operation roundToIntegralExact. In partucular this function now:
- returns opInvalid for signaling NaNs,
- returns opInexact if the result of rounding differs from argument.

Differential Revision: https://reviews.llvm.org/D75246
2020-03-11 10:38:46 +07:00
Jay Foad
6c61edcbab [APFloat] Overload comparison operators
Summary:
These implement the usual IEEE-style floating point comparison
semantics, e.g. +0.0 == -0.0 and all operators except != return false
if either argument is NaN.

Subscribers: arsenm, jvesely, nhaehnle, hiraditya, dexonsmith, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75237
2020-03-06 16:42:53 +00:00
Jay Foad
3ecfdc70cf [APFloat] Overload unary operator-
Summary:
We already have overloaded binary arithemetic operators so you can write
A+B etc. This patch lets you write -A instead of neg(A).

Subscribers: hiraditya, dexonsmith, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75236
2020-03-06 09:11:38 +00:00
Ehud Katz
9d0956ebd4 [APFloat] Fix FP remainder operation
Reimplement IEEEFloat::remainder() function.

Fix PR3359.

Differential Revision: https://reviews.llvm.org/D69776
2020-02-12 10:42:55 +02:00
Fangrui Song
2a879e6884 [APFloat][unittest] Fix -Wsign-compare after D69773 2020-01-21 12:24:34 -08:00
Ehud Katz
0b336b6048 [APFloat] Add support for operations on Signaling NaN
Fix PR30781

Differential Revision: https://reviews.llvm.org/D69774
2020-01-21 21:02:00 +02:00