The signed i4 bitcast was used when setting the exponent and mantissa
and instead the sign should be omitted in the comparisons.
Without this, for example the following incorrect conversion from `-0.5`
f4 to `-3.0` f32 will happen:
| Binary | F4E2M1 | f32[23:32] | f32
| 1001 | -0.5 | ~~1 1000 000 01~~ | ~~-3.0~~
**Walkthrough:**
Bits 23 and 24 are set based on:
```
Value isHalf =
arith::CmpIOp::create(b, arith::CmpIPredicate::eq, i4BitsNoSign, c0x1);
```
Because `1001 (i4) != 1`, bit 23 and 24 are set to the leading two bits
of `1001 << 2`, which is `01`. The correct bits are `00`.
Bits 25 through 31 are set based on the i4 value being greater or equal
to 4:
```
Value useLargerExp =
arith::CmpIOp::create(b, arith::CmpIPredicate::uge, i4BitsNoSign, c0x4);
```
As `1001` is a negative i4 value, this is false and those bits are
incorrectly set to `1000 000` instead of `0111 111`.
These are identified by misc-include-cleaner. I've filtered out those
that break builds. Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.
The F4E2M1 truncation emulation was expanding or truncating operations
to F32 even when the pattern did not apply, causing non-convergent
rewrites when operating on doubles.
Also, fix a pair of whitespace issues that snuck in.
See work detail: https://github.com/iree-org/iree/issues/20920
Add support for f4E2M1FN in `arith.truncf` and `arith.extf` ops though a software emulation
---------
Signed-off-by: Muzammiluddin Syed <muzasyed@amd.com>
This PR adds `arith.scaling_truncf` and `arith.scaling_extf` operations
which supports the block quantization following OCP MXFP specs listed
here
https://www.opencompute.org/documents/ocp-microscaling-formats-mx-v1-0-spec-final-pdf
OCP MXFP Spec comes with reference implementation here
https://github.com/microsoft/microxcaling/tree/main
Interesting piece of reference code is this method `_quantize_mx`
7bc41952de/mx/mx_ops.py (L173).
Both `arith.scaling_truncf` and `arith.scaling_extf` are designed to be
an elementwise operation. Please see description about them in
`ArithOps.td` file for more details.
Internally,
`arith.scaling_truncf` does the
`arith.truncf(arith.divf(input/(2^scale)))`. `scale` should have
necessary broadcast, clamping, normalization and NaN propagation done
before callling into `arith.scaling_truncf`.
`arith.scaling_extf` does the `arith.mulf(2^scale, input)` after taking
care of necessary data type conversions.
CC: @krzysz00 @dhernandez0 @bjacob @pashu123 @MaheshRavishankar
@tgymnich
---------
Co-authored-by: Prashant Kumar <pk5561@gmail.com>
Co-authored-by: Krzysztof Drewniak <Krzysztof.Drewniak@amd.com>
F8E8M0 floating type is supposed to represent biased exponent bits of
F32 type in OCP Micro scaling floating point formats.
https://www.opencompute.org/documents/ocp-microscaling-formats-mx-v1-0-spec-final-pdf
This PR expands `arith.truncf` and `arith.extf` to support this
behavior.
For the `arith.truncf` thing to note here is that F8E8M0FNU type has one
NaN representation which is encoded as `0xFF`. Therefore alll kinds of
NaNs and +/-Inf in Float32Type would map to NaN in F8E8M0FNU. F8E8M0FNU
doesn't have a sign bit therefore it is a lossy and irreversible
downcast.
cc: @krzysz00 @MaheshRavishankar @Muzammiluddin-Syed-ECE
This fixes the current lowering of `arith.ceildivsi` in the arith-expand
pass, which was previously incorrect. The new version is based on the
lowering of `arith.floordivsi`, and will not introduce new undefined
behavior or poison during the lowering. It also replaces one division
with a multiplication.
The previous lowering of `ceildivsi(n, m)` was the following:
```
x = (m > 0) ? -1 : 1
(n*m>0) ? ((n+x) / m) + 1 : - (-n / m)
```
This caused two problems:
* In the case where `n` is INT_MIN and `m` is positive, the result would
be poison instead of an actual value
* In the case where `n` is INT_MAX and `m` is `-1`, this would trigger
undefined behavior, while the original code wouldn't. This is because
`n+x` would be equal to `INT_MIN` (`INT_MAX + 1`), so the `(n+x) / m`
division would overflow and trigger UB.
Expand `arith.minsi`, `arith.minui`, `arith.maxsi`, `arith.maxui` into
`arith.cmpi` and `arith.select`.
---------
Co-authored-by: Jakub Kuderski <kubakuderski@gmail.com>
Add rounding mode attribute to `arith`. This attribute can be used in
different FP `arith` operations to control rounding mode. Rounding modes
correspond to IEEE 754-specified rounding modes. Use in `arith.truncf` folding.
As this is not supported in dialects other than LLVM, conversion should fail for
now in case this attribute is present.
---------
Signed-off-by: Victor Perez <victor.perez@codeplay.com>
This lowering was not correctly handling the case where saturation of
the mantissa results in an increase of the exponent value. The new code
borrows, with credit, the idea from
e1502c0cdb/c10/util/BFloat16.h (L60-L79)
and adds comments to explain the magic trick going on here and why it's
correct. Hat tip to its original author, whom I believe to be
@Maratyszcza.
A testcase was also requiring a tie to be broken upwards in a case where
"to nearest-even" required going downward. The fact that it used to pass
suggests that there was another bug in the old code.
The maxnum/minnum semantics can be found at
https://llvm.org/docs/LangRef.html#llvm-minnum-intrinsic.
The revision also updates function names in lit tests to match op name.
Take arith.maxnumf as example:
```
func.func @maxnumf(%lhs: f32, %rhs: f32) -> f32 {
%result = arith.maxnumf %lhs, %rhs : f32
return %result : f32
}
```
will be expanded to
```
func.func @maxnumf(%lhs: f32, %rhs: f32) -> f32 {
%0 = arith.cmpf ugt, %lhs, %rhs : f32
%1 = arith.select %0, %lhs, %rhs : f32
%2 = arith.cmpf uno, %lhs, %lhs : f32
%3 = arith.select %2, %rhs, %1 : f32
return %3 : f32
}
```
Case 1: Both LHS and RHS are not NaN; LHS > RHS
In this case, `%1` is LHS. `%3` and `%1` have the same value, so `%3` is
LHS.
Case 2: LHS is NaN and RHS is not NaN
In this case, `%2` is true, so `%3` is always RHS.
Case 3: LHS is not NaN and RHS is NaN
In this case, `%0` is true and `%1` is LHS. `%2` is false, so `%3` and
`%1` have the same value, which is LHS.
Case 4: Both LHS and RHS are NaN:
`%1` and RHS are all NaN, so the result is still NaN.
Note that the `Pass` suffix is added in tablegen, and as a side effect the
options are renamed from `ArithExpandOpsOptions` to `ArithExpandOpsPassOptions`.
This patch is part of a larger initiative aimed at fixing floating-point `max` and `min` operations in MLIR: https://discourse.llvm.org/t/rfc-fix-floating-point-max-and-min-operations-in-mlir/72671.
This commit addresses Task 1.2 of the mentioned RFC. By renaming these operations, we align their names with LLVM intrinsics that have corresponding semantics.
This reverts commit 7c349c369847dc2f1736efb9c90d03521cd44a90.
Per discussion at
https://reviews.llvm.org/rG7c349c369847dc2f1736efb9c90d03521cd44a90
and elsewhere, the lowering to LLVM defined here isn't what it should
be and the fastmath flag usage isn't correct, so `arith.is_nan` and
`arith.is_inf` cannot exist in their current form.
It's unclear if those operations should be introduced in the future,
since they make the dialect more complex and don't add any expressive
power. Further discussion may be moved to an RFC (or I'll drop this
patch).
Differential Revision: https://reviews.llvm.org/D157543
Both LLVM and SPIR-V have some form of "is this float a NaN/Inf"
operation (though LLVM's uses the rather opaque "is.fpclass"
intrinsic), which is not exposed in MLIR.
This has lead to awkward workarounds in -arith-expands-ops where a NaN
test was performed by comparing an operation to itself. This commit
resolves that issue.
Reviewed By: dcaballe, kuhar
Differential Revision: https://reviews.llvm.org/D156169
ExpandOpsPass could only be configured via command line flags. Updated
to allowed constructing using the specified Options structure.
Reviewed By: NatashaKnk
Differential Revision: https://reviews.llvm.org/D148820
bf16 has a trivial truncation/extension behavior with F32 that
can be described in elementary arith operations. Include some
expansions to efficiently convert including rounding towards
infinity for f32 to bf16 truncation.
Reviewed By: jpienaar
Differential Revision: https://reviews.llvm.org/D147585
This reverts commit 5bff523793ee8c30c260cc77b23c61dcbb606486. The
bf16->f32 conversion is incorrect. This can't be on by default, if you
want this behavior make it a separate pass.
bf16 has a trivial truncation/extension behavior with F32 that
can be described in elementary arith operations. Include some
expansions to efficiently convert.
Reviewed By: jpienaar
Differential Revision: https://reviews.llvm.org/D147091
This patch extends the `createConst` method so that it can generate
constant vectors (it can already generate scalars). This change is
required to be able to apply the converter for `arith.floordivsi`
(i.e. `FloorDivSIOpConverter`) to vectors.
While `arith.floordivsi` is my main motivation for this change, this
patch should also allow other Arith ops to be converted in vector cases.
In my example, the Linalg vectorizer updates `arith.floordivsi` to
operate on vectors and hence the need for this change.
Differential Revision: https://reviews.llvm.org/D146741
As of several months ago, both ArithToLLVM and ArithToSPIRV have
native support for integer min and max operations. Since these are all
the targets available in MLIR core, the need to "expand" arith.minui,
arith.minsi, arith,maxsi, and arith.manxui to more primitive
operations is to longer present.
Therefore, the expanding of integer min and max operations in Arith,
while correct, is likely to lead to performance loss by way of
misoptimization further down the line, and is no longer needed for
anyone's correctness.
This change may break downstream tests, but will not affect the
semantics of MLIR programs.
arith.minf and arith.maxf have a lot of underlying complexity due to
the many different possible NaN and signed zero semantics available on
various platforms, and so removing their expansion is left to a future
commit.
Reviewed By: ThomasRaoux, Mogball
Differential Revision: https://reviews.llvm.org/D140856
This allows more precise control over which patterns to pick to
expand arithmetic ops. Previously ceil/floor division epxansion
is only available together with various min/max op expansion.
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D135479