llvm-project

Author	SHA1	Message	Date
Hanhan Wang	0a1569a400	[mlir][NFC] Remove trailing whitespaces from `.td` and `.mlir` files. This is generated by running ``` sed --in-place 's/[[:space:]]\+$//' mlir/**/*.td sed --in-place 's/[[:space:]]\+$//' mlir/**/*.mlir ``` Reviewed By: rriddle, dcaballe Differential Revision: https://reviews.llvm.org/D138866	2022-11-28 15:26:30 -08:00
Christopher Bate	708185f03f	[mlir][NVGPU] Add support for structured sparsity MMA variants This change adds a new NVGPU operation that targets the PTX `mma.sp.sync` instruction variants. A lowering to NVVM is provided using inline assembly. Reviewed By: ThomasRaoux, manishucsd Differential Revision: https://reviews.llvm.org/D137202	2022-11-07 09:43:03 -07:00
rkayaith	13bd410962	[mlir][Pass] Include anchor op in -pass-pipeline In D134622 the printed form of a pass manager is changed to include the name of the op that the pass manager is anchored on. This updates the `-pass-pipeline` argument format to include the anchor op as well, so that the printed form of a pipeline can be directly passed to `-pass-pipeline`. In most cases this requires updating `-pass-pipeline='pipeline'` to `-pass-pipeline='builtin.module(pipeline)'`. This also fixes an outdated assert that prevented running a `PassManager` anchored on `'any'`. Reviewed By: rriddle Differential Revision: https://reviews.llvm.org/D134900	2022-11-03 11:36:12 -04:00
Thomas Raoux	91594b5b98	[mlir][nvgpu] Prevent F32ToTF32 pattern from generating illegal IR We shouldn't apply this pattern to non F32->F32 mma.sync operations. Differential Revision: https://reviews.llvm.org/D131902	2022-08-15 16:46:18 +00:00
Manish Gupta	14d79afeae	[mlir][NVGPU] nvgpu.mmasync on F32 through TF32 Adds an optional attribute to support tensor cores on the F32 datatype by lowering to `mma.sync` with TF32 operands. Since TF32 is not a native datatype in LLVM, we are adding `tf32Enabled` as an attribute to allow the IR to be aware of the `MmaSyncOp` datatype. Additionally, this patch adds placeholders for an nvgpu-to-nvgpu transformation targeting higher precision tf32x3. For mma.sync on f32 input using tensor cores there are two possibilities: (a) tf32 (1 `mma.sync` per warp-level matrix-multiply-accumulate) (b) tf32x3 (3 `mma.sync` per warp-level matrix-multiply-accumulate) Typically, tf32 tensor core acceleration comes at a cost of accuracy from missing precision bits. While f32 has 23 precision bits, tf32 has only 10 precision bits. tf32x3 aims to recover the precision bits by splitting each operand into two tf32 values and issuing three `mma.sync` tensor core operations. Reviewed By: ThomasRaoux Differential Revision: https://reviews.llvm.org/D130294	2022-08-01 23:23:27 +00:00
Manish Gupta	713d3de5fb	[mlir][NVGPU] Verifier for nvgpu.ldmatrix * Adds verifiers for `nvgpu.ldmatrix` op * Adds tests to `mlir/test/Dialect/NVGPU/invalid.mlir` Reviewed By: ThomasRaoux Differential Revision: https://reviews.llvm.org/D129669	2022-07-14 22:46:38 +00:00
Manish Gupta	f7d42d5149	[mlir][NVGPU] Verifiers for nvgpu.mma.sync Op - Adds verification for `nvgpu.mma.sync` op - Adds tests to `mlir/test/Dialect/NVGPU/invalid.mlir` - `nvgpu.mma.sync` verifier caught a bug and triggered a failure in m16n8k4_tf32_f32 variant in `mlir/test/Conversion/NVGPUToNVVM/nvgpu-to-nvvm.mlir` - The output shape of vector holding thread-level accumulators was inconsistent and fixed in this change Reviewed By: ThomasRaoux Differential Revision: https://reviews.llvm.org/D129400	2022-07-13 18:57:07 +00:00
Christopher Bate	51b925df94	[mlir][nvgpu] shared memory access optimization pass This change adds a transformation and pass to the NvGPU dialect that attempts to optimize reads/writes from a memref representing GPU shared memory in order to avoid bank conflicts. Given a value representing a shared memory memref, it traverses all reads/writes within the parent op and, subject to suitable conditions, rewrites all last dimension index values such that element locations in the final (col) dimension are given by `newColIdx = col % vecSize + perm[row](col/vecSize,row)` where `perm` is a permutation function indexed by `row` and `vecSize` is the vector access size in elements (currently assumes 128bit vectorized accesses, but this can be made a parameter). This specific transformation can help optimize typical distributed & vectorized accesses common to loading matrix multiplication operands to/from shared memory. Differential Revision: https://reviews.llvm.org/D127457	2022-06-17 09:31:05 -06:00
Thomas Raoux	15bcc36eed	[mlir][gpu] Move async copy ops to NVGPU and add caching hints Move async copy operations to NVGPU as they only exist on the NV target and are designed to match PTX semantics. This also allows us to add a finer-grained caching hint attribute to the op. Add a hint to bypass L1 and hook it up to the NVVM op. Differential Revision: https://reviews.llvm.org/D125244	2022-05-10 22:30:24 +00:00
River Riddle	0254b0bcf0	[mlir][NFC] Update textual references of `func` to `func.func` in LLVM/Math/MemRef/NVGPU/OpenACC/OpenMP/Quant/SCF/Shape tests The special case parsing of `func` operations is being removed.	2022-04-20 22:17:28 -07:00
Thomas Raoux	894a591cf6	[mlir][nvgpu] Move mma.sync and ldmatrix to the nvgpu dialect Move the gpu operations mma.sync and ldmatrix to nvgpu as they are specific to the NVIDIA target. Differential Revision: https://reviews.llvm.org/D123824	2022-04-14 23:44:52 +00:00
Thomas Raoux	4c564940a1	[mlir][nvgpu] Add NVGPU dialect (architecture-specific gpu dialect) This introduces a new dialect for vendor-specific PTX operations. This also adds the first operation, ldmatrix, as an example. More operations will be added in follow-up patches. This new dialect is meant to be a bridge between the GPU and Vector dialects and the NVVM dialect. This is based on the RFC proposed here: https://discourse.llvm.org/t/rfc-add-nv-gpu-dialect-hw-specific-extension-of-gpu-dialect-for-nvidia-gpus/61466/8 Differential Revision: https://reviews.llvm.org/D123266	2022-04-14 16:33:46 +00:00

12 Commits
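
Note on 14d79afeae (tf32 vs. tf32x3): the commit message describes splitting each f32 operand into two tf32 values and issuing three `mma.sync` operations to recover precision. The scalar sketch below illustrates that idea in plain Python and is not taken from the patch. It assumes tf32 conversion by truncating the low 13 mantissa bits (hardware may round differently) and assumes the three products are big*big, big*small, and small*big; the helper names `to_tf32` and `split_tf32` are hypothetical.

```python
import struct


def f32_bits(x: float) -> int:
    """Return the IEEE-754 binary32 bit pattern of x (x is rounded to f32 first)."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_f32(b: int) -> float:
    """Interpret a 32-bit pattern as an f32 value."""
    return struct.unpack("<f", struct.pack("<I", b))[0]


def to_f32(x: float) -> float:
    """Round a Python float (f64) to the nearest f32 value."""
    return bits_f32(f32_bits(x))


def to_tf32(x: float) -> float:
    """Assumption: model tf32 by zeroing the low 13 of the 23 f32 mantissa bits,
    which leaves the 10 explicit mantissa bits mentioned in the commit message."""
    return bits_f32(f32_bits(x) & 0xFFFFE000)


def split_tf32(x: float) -> tuple[float, float]:
    """Split an f32 value into a leading tf32 part and a tf32 remainder."""
    big = to_tf32(x)
    small = to_tf32(to_f32(x) - big)
    return big, small


a, b = to_f32(1.0 / 3.0), to_f32(2.0 / 7.0)
a_big, a_small = split_tf32(a)
b_big, b_small = split_tf32(b)

exact = a * b                                  # reference product, computed in f64
tf32_product = a_big * b_big                   # (a) one multiply
tf32x3_product = (a_big * b_big                # (b) three multiplies;
                  + a_big * b_small            #     the small*small term
                  + a_small * b_big)           #     is dropped

err_tf32 = abs(tf32_product - exact)
err_tf32x3 = abs(tf32x3_product - exact)
print(f"tf32 error:   {err_tf32:.3e}")
print(f"tf32x3 error: {err_tf32x3:.3e}")
```

For these inputs the three-term sum lands closer to the f64 reference than the single tf32 product, which is the trade-off the commit describes: three times the `mma.sync` work in exchange for most of the lost mantissa bits.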