llvm-project

Author	SHA1	Message	Date
Yingwei Zheng	6004f5550c	[ADT][APFloat] Make sure EBO is performed on APFloat (#111641 ) Since both APFloat and (Double)IEEEFloat inherit from APFloatBase, empty base optimization is not performed by GCC/Clang (Minimal reproducer: https://godbolt.org/z/dY8cM3Wre). This patch removes inheritance relation between (Double)IEEEFloat and APFloatBase to make sure EBO is performed on APFloat. After this patch, the size of `ConstantFPRange` will be reduced from 72 to 56. Address comment https://github.com/llvm/llvm-project/pull/111544#discussion_r1792398427.	2024-10-09 16:39:02 +08:00
Durgadoss R	99f527d280	[APFloat] Add APFloat support for E8M0 type (#107127 ) This patch adds an APFloat type for unsigned E8M0 format. This format is used for representing the "scale-format" in the MX specification: (section 5.4) https://www.opencompute.org/documents/ocp-microscaling-formats-mx-v1-0-spec-final-pdf This format does not support {Inf, denorms, zeroes}. Like FP32, this format's exponents are 8-bits (all bits here) and the bias value is 127. However, it differs from IEEE-FP32 in that the minExponent is -127 (instead of -126). There are updates done in the APFloat utility functions to handle these constraints for this format. * The bias calculation is different and convertIEEE* APIs are updated to handle this. * Since there are no significand bits, the isSignificandAll{Zeroes/Ones} methods are updated accordingly. * Although the format does not have any precision, the precision bit in the fltSemantics is set to 1 for consistency with APFloat's internal representation. * Many utility functions are updated to handle the fact that this format does not support Zero. * Provide a separate initFromAPInt() implementation to handle the quirks of the format. * Add specific tests to verify the range of values for this format. Signed-off-by: Durgadoss R <durgadossr@nvidia.com>	2024-10-02 23:04:21 +05:30
Yingwei Zheng	fa824dc0dd	[LLVM][IR] Add constant range support for floating-point types (#86483 ) This patch adds basic constant range support for floating-point types to enable range-based optimizations.	2024-09-25 13:58:23 +08:00
NAKAMURA Takumi	3ef64f7ab5	Revert "Enable logf128 constant folding for hosts with 128bit long double (#104929 )" ConstantFolding behaves differently depending on host's `HAS_IEE754_FLOAT128`. LLVM should not change the behavior depending on host configurations. This reverts commit 14c7e4a1844904f3db9b2dc93b722925a8c66b27. (llvmorg-20-init-3262-g14c7e4a18449 and llvmorg-20-init-3498-g001e423ac626)	2024-08-25 08:30:23 +09:00
Matthew Devereau	14c7e4a184	Enable logf128 constant folding for hosts with 128bit long double (#104929 ) This is a reland of (#96287). This patch attempts to reduce the reverted patch's clang compile time by removing #includes of float128.h and inlining convertToQuad functions instead.	2024-08-22 10:12:59 +01:00
Nikita Popov	6300233de1	Revert "Reland logf128 constant folding (#103217 )" This reverts commit 3cab7c555ad6451f2b1b4dc918a4b4f4e4a3e45d. The modified test fails on ppc64le buildbots.	2024-08-14 12:30:33 +02:00
Matthew Devereau	3cab7c555a	Reland logf128 constant folding (#103217 ) This is a reland of #96287. This change makes tests in logf128.ll ignore the sign of NaNs for negative value tests and moves an #include <cmath> to be blocked behind #ifndef _GLIBCXX_MATH_H.	2024-08-14 08:55:52 +01:00
Nikita Popov	a15de17772	Revert "Enable logf128 constant folding for hosts with 128bit floats (#96287 )" This reverts commit ccb2b011e577e861254f61df9c59494e9e122b38. Causes buildbot failures, e.g. on ppc64le builders.	2024-08-09 15:12:11 +02:00
Matthew Devereau	ccb2b011e5	Enable logf128 constant folding for hosts with 128bit floats (#96287 ) Hosts which support a float size of 128 bits can benefit from constant fp128 folding.	2024-08-09 11:12:43 +01:00
Alexander Pivovarov	abc2fe31fc	[APFloat] Add support for f8E3M4 IEEE 754 type (#99698 ) This PR adds `f8E4M3` type to APFloat. `f8E3M4` type follows IEEE 754 convention ```c f8E3M4 (IEEE 754) - Exponent bias: 3 - Maximum stored exponent value: 6 (binary 110) - Maximum unbiased exponent value: 6 - 3 = 3 - Minimum stored exponent value: 1 (binary 001) - Minimum unbiased exponent value: 1 − 3 = −2 - Precision specifies the total number of bits used for the significand (mantissa), including implicit leading integer bit = 4 + 1 = 5 - Follows IEEE 754 conventions for representation of special values - Has Positive and Negative zero - Has Positive and Negative infinity - Has NaNs Additional details: - Max exp (unbiased): 3 - Min exp (unbiased): -2 - Infinities (+/-): S.111.0000 - Zeros (+/-): S.000.0000 - NaNs: S.111.{0,1}⁴ except S.111.0000 - Max normal number: S.110.1111 = +/-2^(6-3) x (1 + 15/16) = +/-2^3 x 31 x 2^(-4) = +/-15.5 - Min normal number: S.001.0000 = +/-2^(1-3) x (1 + 0) = +/-2^(-2) - Max subnormal number: S.000.1111 = +/-2^(-2) x 15/16 = +/-2^(-2) x 15 x 2^(-4) = +/-15 x 2^(-6) - Min subnormal number: S.000.0001 = +/-2^(-2) x 1/16 = +/-2^(-2) x 2^(-4) = +/-2^(-6) ``` Related PRs: - [PR-97179](https://github.com/llvm/llvm-project/pull/97179) [APFloat] Add support for f8E4M3 IEEE 754 type	2024-07-30 00:11:10 -07:00
Alexander Pivovarov	f363317702	[APFloat] Add support for f8E4M3 IEEE 754 type (#97179 ) This PR adds `f8E4M3` type to APFloat. `f8E4M3` type follows IEEE 754 convention ```c f8E4M3 (IEEE 754) - Exponent bias: 7 - Maximum stored exponent value: 14 (binary 1110) - Maximum unbiased exponent value: 14 - 7 = 7 - Minimum stored exponent value: 1 (binary 0001) - Minimum unbiased exponent value: 1 − 7 = −6 - Precision specifies the total number of bits used for the significand (mantisa), including implicit leading integer bit = 3 + 1 = 4 - Follows IEEE 754 conventions for representation of special values - Has Positive and Negative zero - Has Positive and Negative infinity - Has NaNs Additional details: - Max exp (unbiased): 7 - Min exp (unbiased): -6 - Infinities (+/-): S.1111.000 - Zeros (+/-): S.0000.000 - NaNs: S.1111.{001, 010, 011, 100, 101, 110, 111} - Max normal number: S.1110.111 = +/-2^(7) x (1 + 0.875) = +/-240 - Min normal number: S.0001.000 = +/-2^(-6) - Max subnormal number: S.0000.111 = +/-2^(-6) x 0.875 = +/-2^(-9) x 7 - Min subnormal number: S.0000.001 = +/-2^(-6) x 0.125 = +/-2^(-9) ``` Related PRs: - [PR-97118](https://github.com/llvm/llvm-project/pull/97118) Add f8E4M3 IEEE 754 type to mlir	2024-07-17 23:33:52 -07:00
Ariel-Burton	bbc6504b3d	[NFC] [APFloat] Refactor IEEEFloat::toString (#97117 ) This PR lifts the body of IEEEFloat::toString out to a standalone function. We do this to facilitate code sharing with other floating point types, e.g., the forthcoming support for HexFloat. There is no change in functionality.	2024-07-04 08:43:45 -04:00
Alexander Pivovarov	2628a5fd24	Rename f8E4M3 to f8E4M3FN in mlir.extras.types py package (#97102 ) Currently `f8E4M3` is mapped to `Float8E4M3FNType`. This PR renames `f8E4M3` to `f8E4M3FN` to accurately reflect the actual type. This PR is needed to avoid names conflict in upcoming PR which will add IEEE 754 `Float8E4M3Type`. https://github.com/llvm/llvm-project/pull/97118 Add f8E4M3 IEEE 754 type Maksim, can you review this PR? @makslevental ?	2024-06-29 12:48:11 -07:00
Durgadoss R	880d37038c	[APFloat] Add APFloat support for FP4 data type (#95392 ) This patch adds APFloat type support for the E2M1 FP4 datatype. The definitions for this format are detailed in section 5.3.3 of the OCP specification, which can be accessed here: https://www.opencompute.org/documents/ocp-microscaling-formats-mx-v1-0-spec-final-pdf Signed-off-by: Durgadoss R <durgadossr@nvidia.com>	2024-06-14 14:17:37 +05:30
Durgadoss R	b1fe03f084	[APFloat] Add APFloat support for FP6 data types (#94735 ) This patch adds APFloat type support for two FP6 data types, E2M3 and E3M2. The definitions for the two formats are detailed in section 5.3.2 of the OCP specification, which can be accessed here: https://www.opencompute.org/documents/ocp-microscaling-formats-mx-v1-0-spec-final-pdf Signed-off-by: Durgadoss R <durgadossr@nvidia.com>	2024-06-11 13:16:51 +05:30
Matthew Devereau	3613b26831	Constant Fold logf128 calls (#90611 ) This is a second attempt to land #84501 which failed on several targets. This patch adds the HAS_IEE754_FLOAT128 define which makes the check for typedef'ing float128 more precise by checking whether __uint128_t is available and checking if the host does not use __ibm128 which is prevalent on power pc targets and replaces IEEE754 float128s.	2024-05-29 06:13:02 +01:00
Nikita Popov	595de12ff3	[APFloat] Replace partsCount array with single variable (NFC) (#91910 ) We only ever use the last element of this array, so there shouldn't be a need to store the preceding elements as well.	2024-05-14 09:44:49 +09:00
Kazu Hirata	7ee6288312	[Support] Use StringRef::operator== instead of StringRef::equals (NFC) (#91042 ) I'm planning to remove StringRef::equals in favor of StringRef::operator==. - StringRef::operator== outnumbers StringRef::equals by a factor of 25 under llvm/ in terms of their usage. - The elimination of StringRef::equals brings StringRef closer to std::string_view, which has operator== but not equals. - S == "foo" is more readable than S.equals("foo"), especially for !Long.Expression.equals("str") vs Long.Expression != "str".	2024-05-04 08:46:48 -07:00
Matt Devereau	efce8a05aa	Revert "Constant Fold logf128 calls" This reverts commit 088aa81a545421933254f19cd3c8914a0373b493.	2024-05-01 12:18:39 +00:00
Matt Devereau	088aa81a54	Constant Fold logf128 calls This is a second attempt to land #84501 which failed on several targets. This patch adds the HAS_IEE754_FLOAT128 define which makes the check for typedef'ing float128 more precise by checking whether __uint128_t is available and checking if the host does not use __ibm128 which is prevalent on power pc targets and replaces IEEE754 float128s.	2024-05-01 11:55:54 +00:00
Matt Devereau	c26e9bf8fa	Revert "Constant Fold Logf128 calls (#84501 )" This reverts commit e90bc9cfd4d22c89dd993f62ede700ae25df49c5.	2024-04-18 11:20:54 +00:00
Matthew Devereau	e90bc9cfd4	Constant Fold Logf128 calls (#84501 ) This patch enables constant folding for 128 bit floating-point logf calls. This is achieved by querying if the host system has the logf128() symbol available with a CMake test. If so, replace the runtime call with the compile time value returned from logf128.	2024-04-18 10:19:01 +01:00
Simon Pilgrim	bcb685e119	[Support] Use StringRef::starts_with/ends_with instead of startswith/endswith. NFC. startswith/endswith wrap starts_with/ends_with and will eventually go away (to more closely match string_view)	2023-11-03 18:19:33 +00:00
Matt Arsenault	8a02fd3f94	APFloat: Add getExactLog2Abs Like the recently added getExactLog2 except ignore the sign bit. https://reviews.llvm.org/D158102	2023-08-23 19:16:41 -04:00
Matt Arsenault	0b57c3a7b8	APFloat: Add getExactLog2 https://reviews.llvm.org/D157108	2023-08-07 18:40:16 -04:00
Jeremy Furtek	55c2211a23	[APFloat] Add APFloat semantic support for TF32 This diff adds APFloat support for a semantic that matches the TF32 data type used by some accelerators (most notably GPUs from both NVIDIA and AMD). For more information on the TF32 data type, see https://blogs.nvidia.com/blog/2020/05/14/tensorfloat-32-precision-format/. Some intrinsics that support the TF32 data type were added in https://reviews.llvm.org/D122044. For some discussion on supporting common semantics in `APFloat`, see similar efforts for 8-bit formats at https://reviews.llvm.org/D146441, as well as https://discourse.llvm.org/t/rfc-adding-the-amd-graphcore-maybe-others-float8-formats-to-apfloat/67969. A subsequent diff will extend MLIR to use this data type. (Those changes are not part of this diff to simplify the review process.) Reviewed By: mehdi_amini Differential Revision: https://reviews.llvm.org/D151923	2023-06-23 10:54:49 +02:00
Matt Arsenault	6966859059	ValueTracking: Implement computeKnownFPClass for fpext	2023-04-21 07:02:55 -04:00
David Majnemer	3d11652bbe	[APFloat] Refactor common code for APFloat<->APInt conversion All the IEEE formats are quite similar, we can merge their code effectively by writing it parametrically via the fltSemantics object. We can metaprogram the implementation such that this parametricity is zero-cost.	2023-04-04 19:12:10 +00:00
David Majnemer	2f086f265b	[APFloat] Add E4M3B11FNUZ X. Sun et al. (https://dl.acm.org/doi/10.5555/3454287.3454728) published a paper showing that an FP format with 4 bits of exponent, 3 bits of significand and an exponent bias of 11 would work quite well for ML applications. Google hardware supports a variant of this format where 0x80 is used to represent NaN, as in the Float8E4M3FNUZ format. Just like the Float8E4M3FNUZ format, this format does not support -0 and values which would map to it will become +0. This format is proposed for inclusion in OpenXLA's StableHLO dialect: https://github.com/openxla/stablehlo/pull/1308 As part of inclusion in that dialect, APFloat needs to know how to handle this format. Differential Revision: https://reviews.llvm.org/D146441	2023-03-24 20:06:40 +00:00
Matt Arsenault	71f1ea2c15	APFloat: Add classify	2023-03-03 18:54:58 -04:00
Kazu Hirata	f8f3db2756	Use APInt::count{l,r}_{zero,one} (NFC)	2023-02-19 22:04:47 -08:00
Pavel Kopyl	36606cf070	[NFC] Replace -1U{LL} expressions with appropriate *_MAX macros in Support library. This makes a code a bit more clear and also gets rid of C4146 warning on MSVC compiler: 'unary minus operator applied to unsigned type, result still unsigned'. In case uint64_t variable is initialized or compared against -1U expression, which corresponds to 32-bit constant, UINT_MAX macro is used to preserve NFC semantics; -1ULL is replaced with UINT64_MAX. Reviewed By: dblaikie, craig.topper Differential Revision: https://reviews.llvm.org/D143942	2023-02-14 22:16:19 +01:00
Krzysztof Drewniak	6109e70c72	[llvm][APFloat] Add NaN-in-negative-zero formats by AMD and GraphCore AMD, GraphCore, and Qualcom have published a standard for 8-bit floats that differs from the 8-bit floats defined by Nvidia, Intel, and ARM. This commit adds support for these alternate 8-bit floats to APFloat in order to enable their usage in MLIR. These formats are presented in the paper at https://arxiv.org/abs/2206.02915 and are implemented in GRaphCore hardware whose ISA is available at https://docs.graphcore.ai/projects/isa-mk2-with-fp8/en/latest/_static/TileVertexISA-IPU21-1.3.1.pdf . In these formats, like the existing Float8E4M3FN, there are no infinity values and there is only one NaN. Unlike in that format, however, the NaN values is 0x80, which would be negative 0 in IEEE formats. This means that these formats also make 0 unsigned. To allow for these new variant semantics, this commit adds fltNanEncoding, which can be IEEE (the default), AllOnes (used by Fleat8E4M3FN), or NegativeZero (used by the new formats, Float8E5M2FNUZ and Float8E4M3FNUZ). Normalization, arithmetic, and other such routines have been updated to account for the potential variant semantics. The two new formats are Float8E5M2FNUZ (5 bits exponent, 2 bits mantissa, finite, unsigned zero) and Float8E4M3FNUZ (4 bits exponent, 3 bits mantissa, finite, unsigned zero). Reviewed By: jakeh-gc, reedwm, lattner Differential Revision: https://reviews.llvm.org/D141863	2023-02-09 22:08:00 +00:00
Samuel Parker	038f7debfd	[DAGCombine] fp_to_sint isSaturatingMinMax Recommitting after fixing scalable vector crash. Check for single smax pattern against zero when converting from a small enough float. Differential Revision: https://reviews.llvm.org/D142481	2023-01-30 12:25:25 +00:00
Kazu Hirata	4dc08de9d2	[Support] Use std::clamp (NFC)	2023-01-29 11:43:10 -08:00
Samuel Parker	e60b91df13	Revert "[DAGCombine] fp_to_sint isSaturatingMinMax" This reverts commit 85395af27241ab9c8d5763b8afcaa07f1bab26d5. This is causing trouble with scalable vectors.	2023-01-27 15:42:12 +00:00
Samuel Parker	85395af272	[DAGCombine] fp_to_sint isSaturatingMinMax Check for single smax pattern against zero when converting from a small enough float. Differential Revision: https://reviews.llvm.org/D142481	2023-01-26 12:37:43 +00:00
serge-sans-paille	38818b60c5	Move from llvm::makeArrayRef to ArrayRef deduction guides - llvm/ part Use deduction guides instead of helper functions. The only non-automatic changes have been: 1. ArrayRef(some_uint8_pointer, 0) needs to be changed into ArrayRef(some_uint8_pointer, (size_t)0) to avoid an ambiguous call with ArrayRef((uint8_t), (uint8_t)) 2. CVSymbol sym(makeArrayRef(symStorage)); needed to be rewritten as CVSymbol sym{ArrayRef(symStorage)}; otherwise the compiler is confused and thinks we have a (bad) function prototype. There was a few similar situation across the codebase. 3. ADL doesn't seem to work the same for deduction-guides and functions, so at some point the llvm namespace must be explicitly stated. 4. The "reference mode" of makeArrayRef(ArrayRef<T> &) that acts as no-op is not supported (a constructor cannot achieve that). Per reviewers' comment, some useless makeArrayRef have been removed in the process. This is a follow-up to https://reviews.llvm.org/D140896 that introduced the deduction guides. Differential Revision: https://reviews.llvm.org/D140955	2023-01-05 14:11:08 +01:00
Matt Arsenault	191c1d95e8	APFloat: Add isSmallestNormalized predicate function It was annoying to write the check for this in the one case I added, and I'm planning on adding another, so add a convenient PatternMatch like for other special case values. I have no idea what is going on in the DoubleAPFloat case, I reversed this from the makeSmallestNormalized test. Also could implement this as *this == getSmallestNormalized() for less code, but this avoids the construction of a temporary APFloat copy and follows the style of the other functions.	2022-12-15 14:04:26 -05:00
Reed	7476d59414	Fix APFloat::toString on Float8E5M2 values. Before, an APInt with value 10 was created, whose width was the significand width. But 10 cannot fit in Float8E5M2's significand. Differential Revision: https://reviews.llvm.org/D138540	2022-12-13 09:52:07 +01:00
Matt Arsenault	3d47afdf5a	APFloat: Simplify makeSmallestNormalized implementation	2022-12-02 18:47:43 -05:00
Reed	88eb3c62f2	Add FP8 E4M3 support to APFloat. NVIDIA, ARM, and Intel recently introduced two new FP8 formats, as described in the paper: https://arxiv.org/abs/2209.05433. The first of the two FP8 dtypes, E5M2, was added in https://reviews.llvm.org/D133823. This change adds the second of the two: E4M3. There is an RFC for adding the FP8 dtypes here: https://discourse.llvm.org/t/rfc-add-apfloat-and-mlir-type-support-for-fp8-e5m2/65279. I spoke with the RFC's author, Stella, and she gave me the go ahead to implement the E4M3 type. The name of the E4M3 type in APFloat is Float8E4M3FN, as discussed in the RFC. The "FN" means only Finite and NaN values are supported. Unlike E5M2, E4M3 has different behavior from IEEE types in regards to Inf and NaN values. There are no Inf values, and NaN is represented when the exponent and mantissa bits are all 1s. To represent these differences in APFloat, I added an enum field, fltNonfiniteBehavior, to the fltSemantics struct. The possible enum values are IEEE754 and NanOnly. Only Float8E4M3FN has the NanOnly behavior. After this change is submitted, I plan on adding the Float8E4M3FN type to MLIR, in the same way as E5M2 was added in https://reviews.llvm.org/D133823. Reviewed By: bkramer Differential Revision: https://reviews.llvm.org/D137760	2022-11-15 20:26:42 +01:00
Stella Laurenzo	e28b15b572	Add APFloat and MLIR type support for fp8 (e5m2). (Re-Apply with fixes to clang MicrosoftMangle.cpp) This is a first step towards high level representation for fp8 types that have been built in to hardware with near term roadmaps. Like the BFLOAT16 type, the family of fp8 types are inspired by IEEE-754 binary floating point formats but, due to the size limits, have been tweaked in various ways in order to maximally use the range/precision in various scenarios. The list of variants is small/finite and bounded by real hardware. This patch introduces the E5M2 FP8 format as proposed by Nvidia, ARM, and Intel in the paper: https://arxiv.org/pdf/2209.05433.pdf As the more conformant of the two implemented datatypes, we are plumbing it through LLVM's APFloat type and MLIR's type system first as a template. It will be followed by the range optimized E4M3 FP8 format described in the paper. Since that format deviates further from the IEEE-754 norms, it may require more debate and implementation complexity. Given that we see two parts of the FP8 implementation space represented by these cases, we are recommending naming of: * `F8M<N>` : For FP8 types that can be conceived of as following the same rules as FP16 but with a smaller number of mantissa/exponent bits. Including the number of mantissa bits in the type name is enough to fully specify the type. This naming scheme is used to represent the E5M2 type described in the paper. * `F8M<N>F` : For FP8 types such as E4M3 which only support finite values. The first of these (this patch) seems fairly non-controversial. The second is previewed here to illustrate options for extending to the other known variant (but can be discussed in detail in the patch which implements it). Many conversations about these types focus on the Machine-Learning ecosystem where they are used to represent mixed-datatype computations at a high level. At that level (which is why we also expose them in MLIR), it is important to retain the actual type definition so that when lowering to actual kernels or target specific code, the correct promotions, casts and rescalings can be done as needed. We expect that most LLVM backends will only experience these types as opaque `I8` values that are applicable to some instructions. MLIR does not make it particularly easy to add new floating point types (i.e. the FloatType hierarchy is not open). Given the need to fully model FloatTypes and make them interop with tooling, such types will always be "heavy-weight" and it is not expected that a highly open type system will be particularly helpful. There are also a bounded number of floating point types in use for current and upcoming hardware, and we can just implement them like this (perhaps looking for some cosmetic ways to reduce the number of places that need to change). Creating a more generic mechanism for extending floating point types seems like it wouldn't be worth it and we should just deal with defining them one by one on an as-needed basis when real hardware implements a new scheme. Hopefully, with some additional production use and complete software stacks, hardware makers will converge on a set of such types that is not terribly divergent at the level that the compiler cares about. (I cleaned up some old formatting and sorted some items for this case: If we converge on landing this in some form, I will NFC commit format only changes as a separate commit) Differential Revision: https://reviews.llvm.org/D133823	2022-10-04 17:18:17 -07:00
Vitaly Buka	e68c7a9917	Revert "Add APFloat and MLIR type support for fp8 (e5m2)." Breaks bots https://lab.llvm.org/buildbot/#/builders/37/builds/17086 This reverts commit 2dc68b5398258c7a0cf91f10192d058e787afcdf.	2022-10-02 21:22:44 -07:00
Stella Laurenzo	2dc68b5398	Add APFloat and MLIR type support for fp8 (e5m2). This is a first step towards high level representation for fp8 types that have been built in to hardware with near term roadmaps. Like the BFLOAT16 type, the family of fp8 types are inspired by IEEE-754 binary floating point formats but, due to the size limits, have been tweaked in various ways in order to maximally use the range/precision in various scenarios. The list of variants is small/finite and bounded by real hardware. This patch introduces the E5M2 FP8 format as proposed by Nvidia, ARM, and Intel in the paper: https://arxiv.org/pdf/2209.05433.pdf As the more conformant of the two implemented datatypes, we are plumbing it through LLVM's APFloat type and MLIR's type system first as a template. It will be followed by the range optimized E4M3 FP8 format described in the paper. Since that format deviates further from the IEEE-754 norms, it may require more debate and implementation complexity. Given that we see two parts of the FP8 implementation space represented by these cases, we are recommending naming of: * `F8M<N>` : For FP8 types that can be conceived of as following the same rules as FP16 but with a smaller number of mantissa/exponent bits. Including the number of mantissa bits in the type name is enough to fully specify the type. This naming scheme is used to represent the E5M2 type described in the paper. * `F8M<N>F` : For FP8 types such as E4M3 which only support finite values. The first of these (this patch) seems fairly non-controversial. The second is previewed here to illustrate options for extending to the other known variant (but can be discussed in detail in the patch which implements it). Many conversations about these types focus on the Machine-Learning ecosystem where they are used to represent mixed-datatype computations at a high level. At that level (which is why we also expose them in MLIR), it is important to retain the actual type definition so that when lowering to actual kernels or target specific code, the correct promotions, casts and rescalings can be done as needed. We expect that most LLVM backends will only experience these types as opaque `I8` values that are applicable to some instructions. MLIR does not make it particularly easy to add new floating point types (i.e. the FloatType hierarchy is not open). Given the need to fully model FloatTypes and make them interop with tooling, such types will always be "heavy-weight" and it is not expected that a highly open type system will be particularly helpful. There are also a bounded number of floating point types in use for current and upcoming hardware, and we can just implement them like this (perhaps looking for some cosmetic ways to reduce the number of places that need to change). Creating a more generic mechanism for extending floating point types seems like it wouldn't be worth it and we should just deal with defining them one by one on an as-needed basis when real hardware implements a new scheme. Hopefully, with some additional production use and complete software stacks, hardware makers will converge on a set of such types that is not terribly divergent at the level that the compiler cares about. (I cleaned up some old formatting and sorted some items for this case: If we converge on landing this in some form, I will NFC commit format only changes as a separate commit) Differential Revision: https://reviews.llvm.org/D133823	2022-10-02 17:17:08 -07:00
Fangrui Song	de9d80c1c5	[llvm] LLVM_FALLTHROUGH => [[fallthrough]]. NFC With C++17 there is no Clang pedantic warning or MSVC C5051.	2022-08-08 11:24:15 -07:00
Danila Malyutin	ed6c309d4b	[APFloat] Fix truncation of certain subnormal numbers Certain subnormals would be incorrectly rounded away from zero. Fixes #55838 Differential Revision: https://reviews.llvm.org/D127140	2022-06-08 21:54:35 +03:00
Qiu Chaofan	f45d5e71d3	[APFloat] Set size of PPCDoubleDouble to 128 566690b0 uses size information in float semantics, but PPCDoubleDouble left them empty. As follow-up, we can consider remove PPCDoubleDoubleLegacy and fill other fields in the future. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D111398	2021-10-09 10:12:10 +08:00
Jay Foad	c95584cd74	[APFloat] Common up some assertions. NFC.	2021-10-04 11:38:53 +01:00
Jay Foad	566690b067	[APFloat] Remove BitWidth argument from getAllOnesValue There's no need to pass this in explicitly because it is trivially available from the semantics.	2021-10-04 11:32:16 +01:00

1 2 3 4 5 ...

333 Commits