fixes
https://github.com/llvm/llvm-project/pull/149879#issuecomment-3117145615
Note: this happens because overload resolution can't disambiguate between
`mlir::DenseI64ArrayAttr` and `llvm::ArrayRef<int64_t>` **for the value
0**, since the literal 0 is also a valid null-pointer constant; the
ambiguity only occurs with the value 0.
The main motivation is to pass vmcnt/expcnt/lgkmcnt values directly
(similar to the asm format) and delegate architecture-dependent
bitpacking to the amdgpu->rocdl lowering.
---------
Signed-off-by: Ivan Butygin <ivan.butygin@gmail.com>
New gfx950 MFMA allows bf16 operands.
c0cc81cdc0/llvm/include/llvm/IR/IntrinsicsAMDGPU.td (L3434)
When running `amdgpu-to-rocdl`, the current logic always converts bf16
to i16, which fails to compile for the newer bf16 MFMAs, e.g.
`v_mfma_f32_16x16x32bf16`.
The backend expects the bf16 type for the operands of those newer MFMAs.
This patch fixes that.
CC: @krzysz00 @dhernandez0 @giuseros @antiagainst @kuhar
This PR fixes a bug in GatherToLDSOpLowering: we were taking the
MemRefType of the source for the destination. Additionally, some related
typos are corrected.
CC: @krzysz00 @umangyadav @lialan
This PR adds support for the scaled conversion intrinsics for fp4 and
fp6 types so that they can be targeted by a future amdgpu dialect op or
used directly.
Additionally, this patch refactors the copy-paste-heavy fp8 versions of
these scaled conversion intrinsics with tablegen `foreach` loops, and
fixes the fact that certain immargs weren't being stored as attributes.
Note that some of the MLIR-level tests for those scaled fp8 intrinsics
had incorrect return types, which have been fixed.
(Note that while the operations have a known return type, the IR format
still prints that type for clarity).
Now that MLIR accepts nuw and nusw on getelementptr, this patch emits
the inbounds and nuw attributes when lowering memref load and store
operations to LLVM.
This patch also strengthens the memref.load and memref.store specs
regarding undefined behaviour during lowering.
This patch also moves the |rewriter| parameter of getStridedElementPtr
earlier, so that LLVM::GEPNoWrapFlags can be added at the end with a
default value, grouped together with the other operations' parameters.
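As a rough sketch (the printed flag syntax is approximate), a plain `memref.load` now lowers along these lines:

```mlir
// Input: a load from a statically shaped, trivially laid out memref.
%v = memref.load %m[%i] : memref<16xf32>

// Lowered form: the GEP now carries the inbounds and nuw flags.
%p = llvm.getelementptr inbounds|nuw %base[%i] : (!llvm.ptr, i64) -> !llvm.ptr, f32
%r = llvm.load %p : !llvm.ptr -> f32
```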
Signed-off-by: Lin, Peiyong <linpyong@gmail.com>
This PR adds an amdgcn_load_to_lds intrinsic that abstracts over loads to
LDS from global (address space 1) pointers and buffer fat pointers
(address space 7), since they use the same API and "gather from a
pointer to LDS" is something of an abstract operation.
This commit adds the intrinsic and its lowerings for addrspaces 1 and 7,
and updates the MLIR wrappers to use it (loosening up the restrictions
on loads to LDS along the way to match the ground truth from target
features).
It also plumbs the intrinsic through to clang.
Create a wrapper around the new scaled MFMAs that operate on specific
element types and tile sizes.
See [Issue](https://github.com/iree-org/iree/issues/20616).
---------
Signed-off-by: Muzammiluddin Syed <muzasyed@amd.com>
Repack `amdgpu.swizzle_bitmode` arguments and lower it to
`rocdl.ds_swizzle`.
The repacking logic is as follows:
* `sizeof(arg) < sizeof(i32)`: bitcast to an integer, zext to i32, then
trunc and bitcast back.
* `sizeof(arg) == sizeof(i32)`: just bitcast to i32 and back, if not
already i32.
* `sizeof(arg) > sizeof(i32)`: bitcast to `vector<Nxi32>`, extract the
individual elements, do a series of `rocdl.ds_swizzle`s, then recompose
the vector and bitcast back.
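For example, the first case looks roughly like this (a sketch; the `rocdl.ds_swizzle` operand syntax is approximate):

```mlir
// sizeof(f16) < sizeof(i32): widen, swizzle as i32, then narrow back.
%b = llvm.bitcast %arg : f16 to i16
%z = llvm.zext %b : i16 to i32
%s = rocdl.ds_swizzle %z, %offset : (i32, i32) -> i32
%t = llvm.trunc %s : i32 to i16
%r = llvm.bitcast %t : i16 to f16
```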
Added repacking logic to LLVM utils so it can be used elsewhere. I'm
planning to use it for `gpu.shuffle` later.
Defining a new `amdgpu.global_load` op, a thin wrapper around the ROCDL
`global_load_lds` intrinsic, along with its lowering logic to
`rocdl.global.load.lds`.
This commit extends the lowering of amdgpu.mfma to handle the new
double-rate MFMAs in gfx950 and adds tests for these operations.
It also adds support for MFMAs on small floats (f6 and f4), which are
implemented using the "scaled" MFMA intrinsic with a scale value of 0 in
order to have an unscaled MFMA.
This commit does not add a `amdgpu.scaled_mfma` operation, as that is
future work.
---------
Co-authored-by: Jakub Kuderski <kubakuderski@gmail.com>
- Add packed conversions fp8/bf8->bf16 for gfx950 and fp8/bf8->fp32 for
gfx942 in ROCDL dialect
- Update amdgpu.ext_packed_fp8 lowering to use ROCDL packed fp8/bf8->f32
conversions for vector target types and ROCDL scalar fp8/bf8->fp32
conversions for the scalar target type.
---------
Co-authored-by: Jungwook Park <jungwook.park@amd.com>
(Continuing from #106160)
This PR addresses remaining review comments from the original PR.
Original PR Description
---
Upcoming hardware (gfx12 and some future gfx9) will support the OCP
8-bit float formats for their matrix multiplication intrinsics and
conversion operations, retaining existing opcodes and compiler builtins.
This commit adds support for these types to the MLIR wrappers around
such operations, ensuring that the OCP types aren't used to generate
those builtins on hardware that doesn't expect that format and,
conversely, to ensure that the pre-OCP formats aren't used on new
hardware.
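For reference, the OCP formats correspond to MLIR's `f8E4M3FN`/`f8E5M2` types, while the pre-OCP gfx940 formats are `f8E4M3FNUZ`/`f8E5M2FNUZ`; for example, an extension on an OCP type:

```mlir
// Legal only on chipsets that implement the OCP encodings (e.g. gfx12).
%f = arith.extf %x : f8E4M3FN to f32
```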
---------
Signed-off-by: Mirza Halilcevic <mirza.halilcevic@amd.com>
Co-authored-by: Paul Fuqua <pf@acm.org>
Co-authored-by: Krzysztof Drewniak <Krzysztof.Drewniak@amd.com>
1. Extend the gfx12 FP8 support to allow mixed-type intrinsics (since
they've been added), creating limited mixed-type support that mirrors
MFMA.
2. Extend the `amdgpu.wmma` intrinsic lowering to correctly handle
shorter vectors, because gfx12 now has instructions that logically take
a 4xi8 or, as far as LLVM is concerned, an i32. Similarly, there are
4xi4 inputs, which are an i16 (that must be zero-extended to i32); see
the sketch after this list.
3. Correctly handle the ambiguities in the int4 intrinsics on gfx12,
which can be either 16x16x16 or 16x16x32.
4. Add tests showing all WMMAs being lowered the way gfx12 expects
(mirroring LLVM's tests).
5. Add a verifier to prevent emitting illegal instructions on gfx12.
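Point 2 amounts to the following packing (a minimal sketch; the value names are hypothetical):

```mlir
// Pack a vector<4xi4> WMMA input into the i32 LLVM expects: reinterpret
// its 16 bits as an i16, then zero-extend.
%bits = llvm.bitcast %in : vector<4xi4> to i16
%word = llvm.zext %bits : i16 to i32
```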
This commit adds support to the AMDGPU dialect for casting memrefs into
fat raw buffer pointers.
Fat raw buffer pointers - or, in LLVM terms, ptr addrspace(7) - allow
encapsulating a buffer descriptor (as produced by the make.buffer.rsrc
intrinsic or provided by some API) in a pointer that supports ordinary
pointer operations like load or store. This allows people to take
advantage of the additional semantics that buffer_load and similar
instructions provide without forcing the use of entirely separate
amdgpu.raw_buffer_* operations.
Operations on fat raw buffer pointers are translated to the
corresponding LLVM intrinsics by the backend.
This commit also defines a #amdgpu.address_space<> attribute so that
AMDGPU-specific memory spaces can be represented. Only
#amdgpu.address_space<fat_raw_buffer> will work correctly with the
memref dialect, but the other possible address spaces are included for
completeness.
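A sketch of the intended usage (exact assembly syntax approximate):

```mlir
// Wrap a buffer descriptor around the source memref, then use ordinary
// memref operations on the result.
%fat = amdgpu.fat_raw_buffer_cast %src
    : memref<64xf32> to memref<64xf32, #amdgpu.address_space<fat_raw_buffer>>
%v = memref.load %fat[%i] : memref<64xf32, #amdgpu.address_space<fat_raw_buffer>>
```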
---------
Co-authored-by: Jakub Kuderski <kubakuderski@gmail.com>
Co-authored-by: Prashant Kumar <pk5561@gmail.com>
During my previous cleanup (#127403), I did not notice that we defined a
type alias for the base class. This type alias allows us to use the
shorter form Base::Base, and this PR switches to that.
`let constructor` is deprecated since the TableGen backend emits most
of the glue logic to build a pass. This PR retires the td method for
most passes in the Conversion directory (one more pass is needed for
the rest).
There were a bunch of spots in ROCDL.td where we were defining our own
llvmBuilder call which could have been generated using the default
built-in one on LLVM_IntrOpBase.
This commit cleans up such usages in the interest of potentially
enabling ROCDL import in the future and of making best practices more
obvious.
The one breaking change is renaming WaitcntOp to SWaitcntOp, which
should have minimal impact.
This patch fixes several bugs in the lowering of AMDGPU raw buffer
operations. These bugs include:
- Incorrectly handling the offset of the memref, causing errors when
using subviews.
- Using the MaximumOp (float specific op) to calculate the number of
records.
- The number of records in the static shape case.
- The lowering when the index bitwidth is i64.
Furthermore, this patch also switches to using MLIR's data layout to get
the type size.
---------
Co-authored-by: Jakub Kuderski <kubakuderski@gmail.com>
1. We can use `getNumElements()` only for memrefs with a trivial layout.
2. Buffer ops expect sizes as i32, but descriptor values can be either
i32 or i64, so add the appropriate casts. This implementation is not
ideal, as it can overflow, but it's still better than generating broken
IR; see the sketch below.
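For example, the i64 case becomes roughly:

```mlir
// Truncate a 64-bit descriptor size to the i32 the buffer ops expect;
// this can overflow for buffers larger than 4 GiB.
%size32 = llvm.trunc %size64 : i64 to i32
```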
Found by inspecting AMDGPU assembly - so the arithmetic ops created
there were definitely making their way into the target ISA. An
`LLVM::BitcastOp` seems equivalent, and evaporates as expected in the
target asm.
Along the way, I thought that this helper function `mfmaConcatIfNeeded`
could be renamed to `convertMFMAVectorOperand` to better convey its
contract; so I don't need to think about whether a bitcast is a
legitimate "concat" :-)
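The replacement is a single reinterpreting cast along these lines (a sketch; the types are illustrative):

```mlir
// Reinterpret the MFMA operand as packed i32s; no arithmetic survives
// into the target ISA.
%packed = llvm.bitcast %operand : vector<4xf16> to vector<2xi32>
```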
---------
Signed-off-by: Benoit Jacob <jacob.benoit.1@gmail.com>
This commit marks the type converter in `populate...` functions as
`const`. This is useful for debugging.
Patterns already take a `const` type converter. However, some
`populate...` functions not only add new patterns but also register
additional type conversion rules. That makes it difficult to find the
place where a type conversion was added in the code base. With this
change, all `populate...` functions that only populate patterns now take
a `const` type converter. Programmers can then conclude from the
function signature that these functions do not register any new type
conversion rules.
Also some minor cleanups around the 1:N dialect conversion
infrastructure, which did not always pass the type converter as a
`const` object internally.
The AMDGPU backend now implements LLVM's `bfloat` type. Therefore, we no
longer need to type convert MLIR's `bf16` to `i16` during lowerings to
ROCDL.
As a result of this change, we discovered that, while the code for MFMA
and WMMA intrinsics was mostly prepared for this change, we were failing
to bitcast the bf16 results of WMMA operations out from the i16 they're
natively represented as. This commit also fixes that issue.
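The missing step was essentially one cast on the result (a sketch; `%wmma` stands for the raw intrinsic result):

```mlir
// Bitcast the intrinsic's natively-i16 WMMA result back to bf16.
%res = llvm.bitcast %wmma : vector<8xi16> to vector<8xbf16>
```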
---------
Co-authored-by: Jakub Kuderski <kubakuderski@gmail.com>
* Fix a bug introduced by the Chipset refactoring in #107720 where
atomics emulation for adds was mistakenly applied to gfx11+
* Add the case needed for gfx11+ atomic emulation, namely that gfx11
doesn't support atomically adding a v2f16 or v2bf16, thus requiring
MLIR-level legalization for buffer intrinsics that attempt to do such an
addition
* Add tests, including tests for gfx11 atomic emulation
Co-authored-by: Manupa Karunaratne <manupa.karunaratne@amd.com>
Extend the lowering of atomic.fadd to support the v2f16 variant
available on some AMDGPU chips.
Re-lands #108238 (and addresses review comments from there)
Co-authored-by: Giuseppe Rossini <giuseppe.rossini@amd.com>
Extend the lowering of atomic.fadd to support the v2f16 variant
available on some AMDGPU chips.
Co-authored-by: Giuseppe Rossini <giuseppe.rossini@amd.com>
Update the Chipset struct to follow the `IsaVersion` definition from
llvm's `TargetParser`. This is a follow up to
https://github.com/llvm/llvm-project/pull/106169#discussion_r1733955012.
* Add the stepping version. Note: This may break downstream code that
compares against the minor version directly.
* Use comparisons with full Chipset version where possible.
Note that we can't use the code in `TargetParser` directly because the
chipset utility sits outside of `mlir/Target`, the layer that re-exports
llvm's target library.
Defined an AMDGPU DPP operation in MLIR to represent its semantics.
Introduced a new enumeration attribute for the different permutations
and allowed for different types of arguments. Implemented constant
attribute handling for the ROCDL::DPPMovOp operation. The operation now
correctly accepts constant attributes for dppCtrl, rowMask, bankMask,
and boundCtrl, and passes them to the corresponding LLVM intrinsic.
On some architectures (currently gfx90a, gfx94*, and gfx10**), we can
implement an LDS barrier using compiler intrinsics instead of inline
assembly, improving optimization possibilities and decreasing the
fragility of the underlying code.
Other AMDGPU chipsets continue to require inline assembly to implement
this barrier because, by default, the LLVM backend will insert waits on
global memory (s_waitcnt vmcnt(0)) before barriers in order to ensure
memory watchpoints set by debuggers work correctly.
Use of amdgpu.lds_barrier on these architectures imposes a tradeoff
between debuggability and performance. The documentation, as well as the
generated inline assembly, has been updated to explicitly call
attention to this fact.
For chipsets that did not require the inline assembly hack, we move to
the s.waitcnt and s.barrier intrinsics, which have been added to the
ROCDL dialect. The magic constants used as an argument to the waitcnt
intrinsic can be derived from
llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
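On those chipsets the lowering is, schematically (the waitcnt immediate shown is illustrative, not one of the real per-architecture constants):

```mlir
// Wait only on LDS traffic rather than on global memory, then barrier.
rocdl.s.waitcnt 49279
rocdl.s.barrier
```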
Define operations that wrap gfx940's new operations for converting
between f32 and registers containing packed sets of four 8-bit floats.
Define rocdl operations for the intrinsics and an AMDGPU dialect
wrapper around them (to account for the fact that MLIR distinguishes
the two float formats at the type level but that the LLVM IR does
not).
Define an ArithToAMDGPU pass, meant to run before conversion to LLVM,
that replaces relevant calls to arith.extf and arith.truncf with the
packed operations in the AMDGPU dialect. Note that the conversion
currently only handles scalars and vectors of rank <= 1, as we do not
have a use case for multi-dimensional vector support right now.
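For example, the pass performs rewrites along these lines (the `amdgpu.ext_packed_fp8` syntax is approximate):

```mlir
// Before: a scalar extension from the gfx940-era fp8 format.
%y = arith.extf %x : f8E4M3FNUZ to f32
// After: the packed AMDGPU wrapper, reading element 0 of the source.
%y = amdgpu.ext_packed_fp8 %x[0] : f8E4M3FNUZ to f32
```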
Reviewed By: jsjodin
Differential Revision: https://reviews.llvm.org/D152457
The AMDGPU backend now has buffer resource intrinsics that take a ptr
addrspace(8) instead of a vector<4xi32>, improving LLVM's ability to
reason about their memory behavior. This commit moves MLIR to these
new functions.
Reviewed By: jsjodin
Differential Revision: https://reviews.llvm.org/D157053
Much of the previous AMDGPU dialect code was incorrect in the presence
of the bf16 type (when lowered to LLVM's bfloat), as it was developed
in a setting that ran a custom bf16-to-i16 pass before LLVM lowering.
An overall effect of this patch is that you should run
--arith-emulate-unsupported-floats="source-types=bf16 target-type=f32"
on your GPU module before calling --convert-gpu-to-rocdl if your code
performs bf16 arithmetic.
While LLVM now supports software bfloat, initial experiments showed
that using this support on AMDGPU inserted a large number of
conversions around loads and stores which had substantial performance
impacts. Furthermore, all of the native AMDGPU operations on bf16
types (like the WMMA operations) operate on 16-bit integers instead of
the bfloat type.
First, we make the following changes to preserve compatibility once
the LLVM bfloat type is reenabled.
1. The matrix multiplication operations (MFMA and WMMA) will bitcast
bfloat vectors to i16 vectors.
2. Buffer loads and stores will operate on the relevant integer
datatype and then cast to bfloat if needed.
Second, we add type conversions to convert bf16 and vectors of it to
equivalent i16 types.
Third, we add the bfloat <-> f32 expansion patterns to the set of
operations run before the main LLVM conversion so that MLIR's
implementation of these conversion routines is used.
Finally, we extend the "floats treated as integers" support in the
LLVM exporter to handle types other than fp8.
We also fix a bug in the unsupported floats emulation where it tried
to operate on `arith.bitcast` due to an oversight.
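Changes 1 and 2 amount to, schematically:

```mlir
// bf16 data is carried as its i16 bit pattern (sketch; types illustrative).
%opnd = llvm.bitcast %a : vector<4xbf16> to vector<4xi16>   // before an MFMA/WMMA
%back = llvm.bitcast %ld : vector<2xi16> to vector<2xbf16>  // after a buffer load
```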
Reviewed By: rsuderman
Differential Revision: https://reviews.llvm.org/D156361