llvm-project

Author	SHA1	Message	Date
Jakub Kuderski	fb7ef637a8	[mlir][vector][nvgpu] Move MMA contraction preparation to VectorUtils. This pattern is not specific to nvgpu; I intend to use it in SPIR-V codegen. `VectorTransforms` seems like a more generally useful place. In addition: - Fix a bug in the second condition (the dimensions were swapped for RHS). - Add tests. - Add support for externally provided filter functions, similar to other vector transforms. - Prefer to transpose before zero/sign-extending inputs. Reviewed By: ThomasRaoux Differential Revision: https://reviews.llvm.org/D145638	2023-03-09 14:56:21 -05:00
Nicolas Vasilache	5ef7ceae57	[mlir][Vector] Significantly improve VectorToGPU.cpp. This revision performs a bunch of cleanups and tracks free-flowing IR mutations. APIs are systematized around RewriterBase and relevant debug messages are added. Deliberate use of OpBuilder::InsertionGuard is added where needed. Differential Revision: https://reviews.llvm.org/D143738	2023-02-14 16:49:36 -08:00
Manish Gupta	9774cd17e8	[mlir][nvgpu] Fix affine maps computing indices for LdMatrixOp srcMemref. This patch fixes and simplifies the ldmatrix affine map arithmetic by abstracting the affine expressions in terms of pitch-linear layout (strided and contiguous dimensions). Then it applies the maps for strided and contiguous dimensions in row-major and col-major. LdMatrixOp collaboratively (32 threads in a warp) loads tiles (8 rows x 128b cols) of data. It can load x1, x2, or x4 tiles. Additionally, it can transpose at 16-bit granularity when moving data from shared memory to registers. This patch fixes the affine map (lane id -> coordinate index a thread points to in a tile): - Loading x4 tiles needs all 32 lanes (T0-31) to point to a contiguous chunk of 128b. The issue was exposed when running this case. - Loading x2 and x1 tiles needs threads T0-15 and T0-7, respectively, to point to a contiguous chunk of 128b. The patch is NFC for these cases. Differential Revision: https://reviews.llvm.org/D138978	2022-12-01 18:26:33 -08:00
Manish Gupta	114ba722c1	[mlir][NVGPU] Handle native mma.sync and ldmatrix(x4) sizes. This patch handles native `mma.sync` sizes and enables issuing `ldmatrix` on the largest possible tiles for matrixB. It requires handling `vector.extract_strided_slice` in the vector to nvgpu lowering. Differential Revision: https://reviews.llvm.org/D135749	2022-10-19 17:10:21 -07:00
Christopher Bate	ea2ed80e6d	[mlir][nvgpu] NFC - move NVGPU conversion helpers to NvGpu utils library. The ConvertVectorToGpu pass implementation contained a small private support library for performing various calculations during conversion between `vector` and `nvgpu.mma.sync` and `nvgpu.ldmatrix` operations. The support library is moved under `Dialect/NVGPU/Utils` because the functions have wider utility. Some documentation comments are added or improved. Reviewed By: ThomasRaoux Differential Revision: https://reviews.llvm.org/D135303	2022-10-05 20:21:27 -06:00

5 Commits