llvm-project

Author	SHA1	Message	Date
Krzysztof Drewniak	51b65d0895	[mlir][AMDGPU] Improve BF16 handling through AMDGPU compilation Many previous sets of AMDGPU dialect code have been incorrect in the presence of the bf16 type (when lowered to LLVM's bfloat) as they were developed in a setting that ran a custom bf16-to-i16 pass before LLVM lowering. An overall effect of this patch is that you should run --arith-emulate-unsupported-floats="source-types=bf16 target-type=f32" on your GPU module before calling --convert-gpu-to-rocdl if your code performs bf16 arithmetic. While LLVM now supports software bfloat, initial experiments showed that using this support on AMDGPU inserted a large number of conversions around loads and stores which had substantial performance impacts. Furthermore, all of the native AMDGPU operations on bf16 types (like the WMMA operations) operate on 16-bit integers instead of the bfloat type. First, we make the following changes to preserve compatibility once the LLVM bfloat type is reenabled. 1. The matrix multiplication operations (MFMA and WMMA) will bitcast bfloat vectors to i16 vectors. 2. Buffer loads and stores will operate on the relevant integer datatype and then cast to bfloat if needed. Second, we add type conversions to convert bf16 and vectors of it to equivalent i16 types. Third, we add the bfloat <-> f32 expansion patterns to the set of operations run before the main LLVM conversion so that MLIR's implementation of these conversion routines is used. Finally, we extend the "floats treated as integers" support in the LLVM exporter to handle types other than fp8. We also fix a bug in the unsupported floats emulation where it tried to operate on `arith.bitcast` due to an oversight. Reviewed By: rsuderman Differential Revision: https://reviews.llvm.org/D156361	2023-08-17 18:31:28 +00:00
Matthias Springer	ce254598b7	[mlir][Conversion] Store const type converter in ConversionPattern ConversionPatterns do not (and should not) modify the type converter that they are using. * Make `ConversionPattern::typeConverter` const. * Make member functions of the `LLVMTypeConverter` const. * Conversion patterns take a const type converter. * Various helper functions (that are called from patterns) now also take a const type converter. Differential Revision: https://reviews.llvm.org/D157601	2023-08-14 09:03:11 +02:00
Krzysztof Drewniak	636f772871	[mlir][Arith] Make previous load-bearing assert into a real error When I landed the EmulateUnsupportedFloats pass, I'd negligently included an assert that needed to run for the pass to be correct. Previous emergency fix commits removed the assert. This commit re-adds the "can't happen" check as an emitOpError() and aborts the rewrite, thus allowing it to function in no-assertions builds. Reviewed By: kuhar Differential Revision: https://reviews.llvm.org/D155088	2023-07-13 14:49:29 +00:00
Sterling Augustine	5f4d96ebef	Don't do real work inside an assertion. asserts get compiled to empty when built in opt mode, so that makes certain tests fail, such as emulate-unsupported-floats.mlir.test. This removes the assert altogether, which is also suboptimal, but I have reported it to the original author.	2023-07-11 17:44:58 -07:00
Sterling Augustine	5671f02304	Fix unused variable warning.	2023-07-11 14:54:29 -07:00
Krzysztof Drewniak	10b56e0210	[mlir][Arith] Add pass for emulating unsupported float ops (#1079) To complement the bf16 expansion and truncation patterns added to ExpandOps, define a pass that replaces, for any arithmetic operation op, %y = arith.op %v0, %v1, ... : T with %e0 = arith.extf %v0 : T to U %e1 = arith.extf %v1 : T to U ... %y.exp = arith.op %e0, %e1, ... : U %y = arith.truncf %y.exp : U to T This allows for "emulating" floating-point operations not supported on a given target (such as bfloat operations or most arithmetic on 8-bit floats) by extending those types to supported ones, performing the arithmetic operation, and then truncating back to the original type (which ensures appropriate rounding behavior). The lowering of the extf and truncf ops introduced by this transformation should be handled by subsequent passes. Reviewed By: rsuderman Differential Revision: https://reviews.llvm.org/D154539	2023-07-11 20:32:35 +00:00
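
The extend/compute/truncate scheme described in 10b56e0210 can be illustrated outside MLIR. The following is a minimal Python sketch, not the pass itself: it assumes T = bf16 and U = f32, carries bf16 values as 16-bit integer patterns, and uses round-to-nearest-even when truncating. The helper names (`f32_bits`, `extf_from_bf16`, `truncf_to_bf16`, `emulated_addf`) are hypothetical and exist only for this illustration.

```python
import math
import struct


def f32_bits(x: float) -> int:
    """Round a Python float to f32 and return its 32-bit pattern."""
    try:
        packed = struct.pack("<f", x)
    except OverflowError:
        # Finite values too large for f32 round to infinity of the same sign.
        packed = struct.pack("<f", math.copysign(math.inf, x))
    return struct.unpack("<I", packed)[0]


def extf_from_bf16(h: int) -> float:
    """Extend a bf16 bit pattern to f32 (exact: bf16 is the top half of f32)."""
    return struct.unpack("<f", struct.pack("<I", (h & 0xFFFF) << 16))[0]


def truncf_to_bf16(x: float) -> int:
    """Truncate f32 to a bf16 bit pattern using round-to-nearest-even."""
    bits = f32_bits(x)
    is_nan = (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF) != 0
    if is_nan:
        # Keep it a NaN by forcing a mantissa bit that survives the shift.
        return ((bits >> 16) | 0x0040) & 0xFFFF
    # Adding 0x7FFF plus the lowest kept bit rounds ties to even.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def emulated_addf(a: int, b: int) -> int:
    """Emulate a bf16 add: extend both operands, add in f32, truncate back."""
    wide = extf_from_bf16(a) + extf_from_bf16(b)  # f64 sum, rounded to f32 below
    return truncf_to_bf16(wide)


ONE = truncf_to_bf16(1.0)                # 0x3F80
print(hex(emulated_addf(ONE, ONE)))      # prints 0x4000, i.e. 2.0 in bf16
```

Python computes the sum in double precision before it is rounded to f32; for a single addition of two f32 values this gives the same result as a native f32 add, so the sketch still models "compute in U, then truncate to T".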

6 Commits