llvm-project

Author	SHA1	Message	Date
Kazu Hirata	1a36588ec6	[mlir] Use std::nullopt instead of None (NFC) This patch mechanically replaces None with std::nullopt where the compiler would warn if None were deprecated. The intent is to reduce the amount of manual work required in migrating from Optional to std::optional. This is part of an effort to migrate from llvm::Optional to std::optional: https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716	2022-12-03 18:50:27 -08:00
rkayaith	b0bbc9b595	[mlir][NVGPU] Fix -Wunsequenced warning llvm-project/mlir/lib/Conversion/NVGPUToNVVM/NVGPUToNVVM.cpp:441:25: warning: multiple unsequenced modifications to 'asmArgIdx' [-Wunsequenced] ss << "$" << asmArgIdx++ << ",$" << asmArgIdx++ << ";"; ^ ~~	2022-11-08 15:47:12 -05:00
Christopher Bate	708185f03f	[mlir][NVGPU] Add support for structured sparsity MMA variants This change adds a new NVGPU operation that targets the PTX `mma.sp.sync` instruction variants. A lowering to NVVM is provided using inline assembly. Reviewed By: ThomasRaoux, manishucsd Differential Revision: https://reviews.llvm.org/D137202	2022-11-07 09:43:03 -07:00
Manish Gupta	fbf69f95b6	[mlir][NVGPU] Adding Support for cp_async_zfill via Inline Asm `cp_async_zfill` is currently not present in the nvvm backend, this patch adds `cp_async_zfill` support by adding inline asm when lowering from `nvgpu` to `nvvm`. Reviewed By: ThomasRaoux Differential Revision: https://reviews.llvm.org/D132269	2022-09-02 21:29:26 +00:00
Michele Scuttari	67d0d7ac0a	[MLIR] Update pass declarations to new autogenerated files The patch introduces the required changes to update the pass declarations and definitions to use the new autogenerated files and allow dropping the old infrastructure. Reviewed By: mehdi_amini, rriddle Differential Review: https://reviews.llvm.org/D132838	2022-08-31 12:28:45 +02:00
Michele Scuttari	039b969b32	Revert "[MLIR] Update pass declarations to new autogenerated files" This reverts commit 2be8af8f0e0780901213b6fd3013a5268ddc3359.	2022-08-30 22:21:55 +02:00
Michele Scuttari	2be8af8f0e	[MLIR] Update pass declarations to new autogenerated files The patch introduces the required changes to update the pass declarations and definitions to use the new autogenerated files and allow dropping the old infrastructure. Reviewed By: mehdi_amini, rriddle Differential Review: https://reviews.llvm.org/D132838	2022-08-30 21:56:31 +02:00
Jeff Niu	5c5af910fe	[mlir][LLVMIR] "Modernize" Insert/ExtractValueOp This patch "modernizes" the LLVM `insertvalue` and `extractvalue` operations to use DenseI64ArrayAttr, since they only require an array of indices and previously there was confusion about whether to use i32 or i64 arrays, and to use assembly format. Reviewed By: ftynse Differential Revision: https://reviews.llvm.org/D131537	2022-08-10 12:51:11 -04:00
Manish Gupta	14d79afeae	[mlir][NVGPU] nvgpu.mmasync on F32 through TF32 Adds optional attribute to support tensor cores on F32 datatype by lowering to `mma.sync` with TF32 operands. Since, TF32 is not a native datatype in LLVM we are adding `tf32Enabled` as an attribute to allow the IR to be aware of `MmaSyncOp` datatype. Additionally, this patch adds placeholders for nvgpu-to-nvgpu transformation targeting higher precision tf32x3. For mma.sync on f32 input using tensor cores there are two possibilites: (a) tf32 (1 `mma.sync` per warp-level matrix-multiply-accumulate) (b) tf32x3 (3 `mma.sync` per warp-level matrix-multiply-accumulate) Typically, tf32 tensor core acceleration comes at a cost of accuracy from missing precision bits. While f32 has 23 precision bits, tf32 has only 10 precision bits. tf32x3 aims to recover the precision bits by splitting each operand into two tf32 values and issue three `mma.sync` tensor core operations. Reviewed By: ThomasRaoux Differential Revision: https://reviews.llvm.org/D130294	2022-08-01 23:23:27 +00:00
Kazu Hirata	2789c4f51c	[mlir] Use value_or (NFC)	2022-07-25 23:00:56 -07:00
Jacques Pienaar	8df54a6a03	[mlir] Update accessors to prefixed form (NFC) Follow up from flipping dialects to both, flip accessor used to prefixed variant ahead to flipping from _Both to _Prefixed. This just flips to the accessors introduced in the preceding change which are just prefixed forms of the existing accessor changed from. Mechanical change using helper script https://github.com/jpienaar/llvm-project/blob/main/clang-tools-extra/clang-tidy/misc/AddGetterCheck.cpp and clang-format.	2022-06-18 17:53:22 -07:00
Christopher Bate	51b925df94	[mlir][nvgpu] shared memory access optimization pass This change adds a transformation and pass to the NvGPU dialect that attempts to optimize reads/writes from a memref representing GPU shared memory in order to avoid bank conflicts. Given a value representing a shared memory memref, it traverses all reads/writes within the parent op and, subject to suitable conditions, rewrites all last dimension index values such that element locations in the final (col) dimension are given by `newColIdx = col % vecSize + perm[row](col/vecSize,row)` where `perm` is a permutation function indexed by `row` and `vecSize` is the vector access size in elements (currently assumes 128bit vectorized accesses, but this can be made a parameter). This specific transformation can help optimize typical distributed & vectorized accesses common to loading matrix multiplication operands to/from shared memory. Differential Revision: https://reviews.llvm.org/D127457	2022-06-17 09:31:05 -06:00
Mogball	d7ef488bb6	[mlir][gpu] Move GPU headers into IR/ and Transforms/ Depends on D127350 Reviewed By: rriddle Differential Revision: https://reviews.llvm.org/D127352	2022-06-09 22:49:03 +00:00
Christopher Bate	7085cb6011	[mlir][NvGpuToNVVM] Fix byte size calculation in async copy lowering AsyncCopyOp lowering converted "size in elements" to "size in bytes" assuming the element type size is at least one byte. This removes that restriction, allowing for types such as i4 and b1 to be handled correctly. Differential Revision: https://reviews.llvm.org/D125838	2022-05-23 10:53:51 -06:00
Christopher Bate	334f63e7c3	[mlir][NvGpuToNVVM] Fix missing i4 support for nvgpu.mma.sync This changes adds missing support for the i4 data type. Tests are added to ensure proper lowering of an nvgpu.mma.sync operation targeting the 16x8x64xi4 and 16x8x32xi4 MMA variants in the NVVM dialect. Differential Revision: https://reviews.llvm.org/D126092	2022-05-23 10:52:28 -06:00
Thomas Raoux	15bcc36eed	[mlir][gpu] Move async copy ops to NVGPU and add caching hints Move async copy operations to NVGPU as they only exist on NV target and are designed to match ptx semantic. This allows us to also add more fine grain caching hint attribute to the op. Add hint to bypass L1 and hook it up to NVVM op. Differential Revision: https://reviews.llvm.org/D125244	2022-05-10 22:30:24 +00:00
Christopher Bate	9879807393	[mlir][NvGpu] Fix nvgpu.mma.sync lowering to NVVM for f32, tf32 types Adds missing logic in the lowering from NvGPU to NVVM to support fp32 (in an accumulator operand) and tf32 (in multiplicand operand) types. Fixes logic in one of the helper functions for converting the result of a mma.sync operation with multiple 8x256bit output tiles, which is the case for f32 outputs. Differential Revision: https://reviews.llvm.org/D124533	2022-05-08 21:49:42 -06:00
Thomas Raoux	894a591cf6	[mlir][nvgpu] Move mma.sync and ldmatrix in nvgpu dialect Move gpu operation mma.sync and ldmatrix in nvgpu as they are specific to nvidia target. Differential Revision: https://reviews.llvm.org/D123824	2022-04-14 23:44:52 +00:00

18 Commits