llvm-project

Author	SHA1	Message	Date
Manish Gupta	9774cd17e8	[mlir][nvgpu] Fix affine maps computing indices for LdMatrixOp srcMemref This patch fixes and simplifies the ldmatrix affine map arithmetic by abstracting the affine expressions in terms of pitch-linear layout (strided and contiguous dimensions). Then it applies the maps for strided and contiguous dimensions in row-major and col-major. LdMatrixOp collaboratively (32 threads in a warp) loads tiles (8 row x 128b col) of data. It can load either x1, x2, or x4 tiles. Additionally, it can transpose at 16-bit granularity when moving data from the Shared Memory to registers. This patch fixes the affine map: (laneid -> coordinate index a thread points to in a tile). - Loading x4 tiles needs all 32 lanes T0-31 to point to a contiguous chunk of 128b. The issue was exposed when running this case. - Loading x2 and x1 tiles needs threads T0-15 and T0-7, respectively, to point to a contiguous chunk of 128b. The patch is NFC for these cases. Differential Revision: https://reviews.llvm.org/D138978	2022-12-01 18:26:33 -08:00
Nicolas Vasilache	3af6438372	Revert "[WIP] Add support for MMA conversion for 1-D vector.transfer followed by a broadcast to 2-D" This reverts commit 7db25f78db807da171f23bcbaff258c5677901d1. This was mistakenly stacked below (and committed) along with an NFC change.	2022-12-01 02:57:03 -08:00
Nicolas Vasilache	7db25f78db	[WIP] Add support for MMA conversion for 1-D vector.transfer followed by a broadcast to 2-D Differential Revision: https://reviews.llvm.org/D139040	2022-12-01 02:49:47 -08:00
Quinn Dawkins	c0321edc26	[mlir][gpu] Adding support for transposed mma_load_matrix Enables transposed gpu.subgroup_mma_load_matrix and updates the lowerings in Vector to GPU and GPU to SPIRV. Needed to enable B transpose matmuls lowering to wmma ops. Taken over from author: stanley-nod <stanley@nod-labs.com> Reviewed By: ThomasRaoux, antiagainst Differential Revision: https://reviews.llvm.org/D138770	2022-11-29 03:35:49 +00:00
Hanhan Wang	0a1569a400	[mlir][NFC] Remove trailing whitespaces from `.td` and `.mlir` files. This is generated by running ``` sed --in-place 's/[[:space:]]\+$//' mlir/**/*.td sed --in-place 's/[[:space:]]\+$//' mlir/**/*.mlir ``` Reviewed By: rriddle, dcaballe Differential Revision: https://reviews.llvm.org/D138866	2022-11-28 15:26:30 -08:00
rkayaith	13bd410962	[mlir][Pass] Include anchor op in -pass-pipeline In D134622 the printed form of a pass manager is changed to include the name of the op that the pass manager is anchored on. This updates the `-pass-pipeline` argument format to include the anchor op as well, so that the printed form of a pipeline can be directly passed to `-pass-pipeline`. In most cases this requires updating `-pass-pipeline='pipeline'` to `-pass-pipeline='builtin.module(pipeline)'`. This also fixes an outdated assert that prevented running a `PassManager` anchored on `'any'`. Reviewed By: rriddle Differential Revision: https://reviews.llvm.org/D134900	2022-11-03 11:36:12 -04:00
rkayaith	1c0f541a4d	[mlir] Don't mix -pass-pipeline with other pass options These are test updates required for D135745, which disallows mixing `-pass-pipeline` and the individual `-pass-name` options. Reviewed By: rriddle, mehdi_amini Differential Revision: https://reviews.llvm.org/D135746	2022-11-02 12:10:51 -04:00
Manish Gupta	114ba722c1	[mlir][NVGPU] Handle native mma.sync and ldmatrix(x4) sizes This patch handles native `mma.sync` sizes and enables issuing `ldmatrix` on largest possible tiles for matrixB. It requires handling `vector.extract_strided_slice` from vector to nvgpu lowering. Differential Revision: https://reviews.llvm.org/D135749	2022-10-19 17:10:21 -07:00
Christopher Bate	670eee08ce	[mlir][VectorToGPU] Fix support for i4, col-major operand support For the conversion to nvgpu `mma.sync` and `ldmatrix` pathways, the code was missing support for the `i4` data type. While fixing this, another bug was discovered that caused the number of ldmatrix tiles calculated for certain operand types and configurations to be incorrect. This change fixes both issues and adds additional tests. Differential Revision: https://reviews.llvm.org/D128074	2022-06-30 10:26:59 -06:00
Thomas Raoux	271a48e029	[mlir][VectorToGPU] Fix bug generating incorrect ldmatrix ops ldmatrix transpose can only be used with types that are 16bits wide. Differential Revision: https://reviews.llvm.org/D126846	2022-06-03 04:30:22 +00:00
Christopher Bate	1ca772ed95	[MLIR][GPU] Add NvGpu mma.sync path to the VectorToGPU pass This change adds the option to lower to NvGpu dialect ops during the VectorToGPU conversion pass. Because this transformation reuses existing VectorToGPU logic, a separate VectorToNvGpu conversion pass is not created. The option `use-nvgpu` is added to the VectorToGPU pass. When this is true, the pass will attempt to convert slices rooted at `vector.contract` operations into `nvgpu.mma.sync` ops, and `vector.transfer_read` ops are converted to either `nvgpu.ldmatrix` or one or more `vector.load` operations. The specific data loaded will depend on the thread id within a subgroup (warp). These index calculations depend on data type and shape of the MMA op according to the downstream PTX specification. The code for supporting these details is separated into `NvGpuSupport.cpp|h`. Differential Revision: https://reviews.llvm.org/D122940	2022-05-20 09:42:55 -06:00
River Riddle	3028bf740e	[mlir][NFC] Update textual references of `func` to `func.func` in Conversion/ tests The special case parsing of `func` operations is being removed.	2022-04-20 22:17:27 -07:00
Thomas Raoux	d77f483640	[mlir][gpu] Relax restriction on mma load/store op Those ops can support more complex layout as long as the most inner dimension is contiguous. Differential Revision: https://reviews.llvm.org/D122452	2022-03-25 04:03:40 +00:00
River Riddle	3655069234	[mlir] Move the Builtin FuncOp to the Func dialect This commit moves FuncOp out of the builtin dialect, and into the Func dialect. This move has been planned in some capacity from the moment we made FuncOp an operation (years ago). This commit handles the functional aspects of the move, but various aspects are left untouched to ease migration: func::FuncOp is re-exported into mlir to reduce the actual API churn, the assembly format still accepts the unqualified `func`. These temporary measures will remain for a little while to simplify migration before being removed. Differential Revision: https://reviews.llvm.org/D121266	2022-03-16 17:07:03 -07:00
River Riddle	47f175b09b	[mlir] Update FuncOp conversion passes to Pass/InterfacePass<FunctionOpInterface> These passes generally don't rely on any special aspects of FuncOp, and moving allows for these passes to be used in many more situations. The passes that obviously weren't relying on invariants guaranteed by a "function" were updated to be generic passes, the rest were updated to be FunctionOpInterface InterfacePasses. The test updates are NFC switching from implicit nesting (-pass -pass2) form to the -pass-pipeline form (generic passes do not implicitly nest as op-specific passes do). Differential Revision: https://reviews.llvm.org/D121190	2022-03-08 12:25:32 -08:00
Thomas Raoux	a57ccad5a6	[VectorToGPU] Fix horizontal stride calculation for N-D memref Fix a bug in how we calculate the stride of mma load/store ops for N-D memrefs Differential Revision: https://reviews.llvm.org/D118378	2022-01-27 13:35:56 -08:00
Mogball	aae5125550	[mlir] Replace StrEnumAttr -> EnumAttr in core dialects Removes uses of `StrEnumAttr` in core dialects Reviewed By: mehdi_amini, rriddle Differential Revision: https://reviews.llvm.org/D117514	2022-01-18 17:15:00 +00:00
Thomas Raoux	e7969240dc	[mlir][VectorToGPU] Support more cases in conversion to MMA ops Support load with broadcast, elementwise divf op and remove the hardcoded restriction on the vector size. Picking the right size should be enforced by the user and will fail conversion to llvm/spirv if it is not supported. Differential Revision: https://reviews.llvm.org/D113618	2021-11-11 13:10:38 -08:00
thomasraoux	7fbb0678fa	[mlir][VectorToGPU] Add support for elementwise mma to vector to GPU Differential Revision: https://reviews.llvm.org/D112960	2021-11-02 08:01:04 -07:00
Mogball	a54f4eae0e	[MLIR] Replace std ops with arith dialect ops Precursor: https://reviews.llvm.org/D110200 Removed redundant ops from the standard dialect that were moved to the `arith` or `math` dialects. Renamed all instances of operations in the codebase and in tests. Reviewed By: rriddle, jpienaar Differential Revision: https://reviews.llvm.org/D110797	2021-10-13 03:07:03 +00:00
thomasraoux	4392841949	[mlir][VectorToGPU] Support converting vector.broadcast to MMA op Differential Revision: https://reviews.llvm.org/D105175	2021-06-30 09:08:55 -07:00
thomasraoux	1a86559276	[mlir][VectorToGPU] Add conversion for scf::For op with Matrix operands Differential Revision: https://reviews.llvm.org/D104134	2021-06-24 15:42:28 -07:00
thomasraoux	6413226dce	[mlir][VectorToGPU] Add conversion for splat constant to MMA const matrix Differential Revision: https://reviews.llvm.org/D104133	2021-06-24 15:38:12 -07:00
thomasraoux	edd9515bd1	[mlir][VectorToGPU] First step to convert vector ops to GPU MMA ops This is the first step to convert vector ops to MMA operations in order to target GPU tensor core ops. This currently only supports simple cases; transpose and element-wise operations will be added later. Differential Revision: https://reviews.llvm.org/D102962	2021-06-11 07:52:32 -07:00

24 Commits