llvm-project

Author	SHA1	Message	Date
harsh-nod	42bba97fc2	[mlir] Extend CombineTransferReadOpTranspose pattern to handle extf ops. (#74754) This patch modifies the CombineTransferReadOpTranspose pattern to handle extf ops. Also adds a test which shows the transpose getting folded into the transfer_read.	2023-12-07 15:01:55 -08:00
Cullen Rhodes	9816edc9f3	[mlir][vector] add result type to vector.extract assembly format (#66499) The vector.extract assembly format currently only contains the source type, for example: %1 = vector.extract %0[1] : vector<3x7x8xf32> it's not immediately obvious if this is the source or result type. This patch improves the assembly format to make this clearer, so the above becomes: %1 = vector.extract %0[1] : vector<7x8xf32> from vector<3x7x8xf32>	2023-09-28 11:11:16 +01:00
Christopher Bate	cafb6284d1	[mlir][VectorToGPU] Update memref stride preconditions on `nvgpu.mma.sync` path This change removes the requirement that the row stride be statically known when converting `vector.transfer_read` and `vector.transfer_write` to distributed SIMT operations in the `nvgpu` lowering path. It also adds a check to verify that the last dimension of the source memref is statically known to have stride 1 since this is assumed in the conversion logic. No other change should be required since the generated `vector.load` operations are never created across dimensions other than the last. The routines for checking preconditions on `vector.transfer_read/write` are moved to under nvgpu utilities. The change is NFC with respect to the GPU dialect lowering path. Reviewed By: ThomasRaoux Differential Revision: https://reviews.llvm.org/D155753	2023-09-14 13:51:42 -06:00
Lei Zhang	a01194377c	[mlir][gpu] Support arith.extf in subgroup MMA elementwise ops This commit adds support for arith.extf in the supported list of elementwise ops for subgroup MMA ops, and enables lowering to SPIR-V. Reviewed By: mravishankar Differential Revision: https://reviews.llvm.org/D156847	2023-08-01 21:12:37 -07:00
Manish Gupta	9a795f0c59	[mlir][Vector] Adds a pattern to fold `arith.extf` into `vector.contract` Consider mixed precision data type, i.e., F16 input lhs, F16 input rhs, F32 accumulation, and F32 output. This is typically written as F32 <= F16F16 + F32. During vectorization from linalg to vector for mixed precision data type (F32 <= F16F16 + F32), linalg.matmul introduces arith.extf on input lhs and rhs operands. "linalg.matmul"(%lhs, %rhs, %acc) ({ ^bb0(%arg1: f16, %arg2: f16, %arg3: f32): %lhs_f32 = "arith.extf"(%arg1) : (f16) -> f32 %rhs_f32 = "arith.extf"(%arg2) : (f16) -> f32 %mul = "arith.mulf"(%lhs_f32, %rhs_f32) : (f32, f32) -> f32 %acc = "arith.addf"(%arg3, %mul) : (f32, f32) -> f32 "linalg.yield"(%acc) : (f32) -> () }) There are backends that natively support mixed-precision data types and do not need the arith.extf. For example, NVIDIA A100 GPU has mma.sync.aligned.*.f32.f16.f16.f32 that can support mixed-precision data type. However, the presence of arith.extf in the IR introduces unnecessary casting, targeting F32 Tensor Cores instead of F16 Tensor Cores for the NVIDIA backend. This patch adds a folding pattern to fold arith.extf into vector.contract Differential Revision: https://reviews.llvm.org/D151918	2023-06-05 23:22:20 +00:00
Manish Gupta	84eed7843e	[Updated commit] Fix Transpose Check in MMA.SYNC Path. Pushed a stale commit for the same review in my previous commit. I am updating the main-line with the latest commit including review commits. Apologies for the redundant commit. Differential Revision: https://reviews.llvm.org/D147749	2023-04-11 00:38:35 +00:00
Thomas Raoux	3cf7f22498	[mlir][vectorToGPU] Fix type used when folding transpose into read op Pick the right result type when folding transpose op into a read Differential Revision: https://reviews.llvm.org/D144113	2023-02-15 17:34:09 +00:00
Nicolas Vasilache	5ef7ceae57	[mlir][Vector] Significantly improve VectorToGPU.cpp This revision performs a bunch of cleanups and tracks free-flowing IR mutations. APIs are systematized around RewriterBase and relevant debug messages are added. Deliberate use of OpBuilder::InsertionGuard is added where needed. Differential Revision: https://reviews.llvm.org/D143738	2023-02-14 16:49:36 -08:00
Quinn Dawkins	5205c7126b	[mlir][gpu] Add support for unsigned integer extend in vector to gpu.subgroup_mma lowering Unsigned integer types are supported in subgroup mma ops by matching against arith.extui ops. This allows for subgroup_mma_compute ops with mixed signedness which requires later conversions to handle this. SPIR-V cooperative matrix ops support this while the lowering to WMMA does not. Differential Revision: https://reviews.llvm.org/D143922	2023-02-14 13:09:46 -05:00
Quinn Dawkins	985f7ff632	[mlir][gpu] Add support for integer types in gpu.subgroup_mma ops The signedness is carried by `!gpu.mma_matrix` types to most closely match the Cooperative Matrix specification which determines signedness with the type (and sometimes the operation). See: https://htmlpreview.github.io/?https://github.com/KhronosGroup/SPIRV-Registry/blob/master/extensions/NV/SPV_NV_cooperative_matrix.html To handle the lowering from vector to gpu, ops such as arith.extsi are pattern matched next to `vector.transfer_read` and `vector.contract` to determine the signedness of the matrix type. Enables s8 and u8 WMMA types in NVVM for the GPUToNVVM conversion. Reviewed By: ThomasRaoux Differential Revision: https://reviews.llvm.org/D143223	2023-02-07 17:58:01 -05:00
Thomas Raoux	066b4fcb8d	[mlir] Update VectorToGPU to new memory space GPU memory spaces have changed to new attributes. Update the VectorToGPU pass to use those. Differential Revision: https://reviews.llvm.org/D142105	2023-01-19 20:12:37 +00:00
Lei Zhang	f1db4aec30	[mlir][VectorToGPU] Support transposed+broadcasted 2D MMA load This is loading from 2-D memref, in addition to D139655 where we load from 1-D memref cases. Reviewed By: ThomasRaoux Differential Revision: https://reviews.llvm.org/D140136	2022-12-15 19:34:32 +00:00
Lei Zhang	dbddd4f6a4	[mlir][VectorToGPU] Support transposed+broadcasted 1D MMA load This is now possible with transpose semantics on subgroup MMA load ops. Reviewed By: ThomasRaoux Differential Revision: https://reviews.llvm.org/D139655	2022-12-15 19:22:35 +00:00
Manish Gupta	9774cd17e8	[mlir][nvgpu] Fix affine maps computing indices for LdMatrixOp srcMemref This patch fixes and simplifies the ldmatrix affine map arithmetic by abstracting the affine expressions in terms of pitch-linear layout (strided and contiguous dimensions). Then it applies the maps for strided and contiguous dimensions in row-major and col-major. LdMatrixOp collaboratively (32 threads in a warp) loads tiles (8 row x 128b col) of data. It can load either x1, x2, or x4 tiles. Additionally, it can transpose at 16-bit granularity when moving data from the Shared Memory to registers. This patch fixes the affine map (laneid -> coordinate index a thread points to in a tile). - Loading x4 tiles needs all 32 lanes T0-31 to point to a contiguous chunk of 128b. The issue was exposed when running this case. - Loading x2 and x1 tiles needs threads T0-15 and T0-7, respectively, to point to a contiguous chunk of 128b. The patch is NFC for these cases. Differential Revision: https://reviews.llvm.org/D138978	2022-12-01 18:26:33 -08:00
Nicolas Vasilache	3af6438372	Revert "[WIP] Add support for MMA conversion for 1-D vector.transfer followed by a broadcast to 2-D" This reverts commit 7db25f78db807da171f23bcbaff258c5677901d1. This was mistakenly stacked below (and committed) along with an NFC change.	2022-12-01 02:57:03 -08:00
Nicolas Vasilache	7db25f78db	[WIP] Add support for MMA conversion for 1-D vector.transfer followed by a broadcast to 2-D Differential Revision: https://reviews.llvm.org/D139040	2022-12-01 02:49:47 -08:00
Quinn Dawkins	c0321edc26	[mlir][gpu] Adding support for transposed mma_load_matrix Enables transposed gpu.subgroup_mma_load_matrix and updates the lowerings in Vector to GPU and GPU to SPIRV. Needed to enable B transpose matmuls lowering to wmma ops. Taken over from author: stanley-nod <stanley@nod-labs.com> Reviewed By: ThomasRaoux, antiagainst Differential Revision: https://reviews.llvm.org/D138770	2022-11-29 03:35:49 +00:00
Hanhan Wang	0a1569a400	[mlir][NFC] Remove trailing whitespaces from `.td` and `.mlir` files. This is generated by running ``` sed --in-place 's/[[:space:]]\+$//' mlir/**/*.td sed --in-place 's/[[:space:]]\+$//' mlir/**/*.mlir ``` Reviewed By: rriddle, dcaballe Differential Revision: https://reviews.llvm.org/D138866	2022-11-28 15:26:30 -08:00
rkayaith	13bd410962	[mlir][Pass] Include anchor op in -pass-pipeline In D134622 the printed form of a pass manager is changed to include the name of the op that the pass manager is anchored on. This updates the `-pass-pipeline` argument format to include the anchor op as well, so that the printed form of a pipeline can be directly passed to `-pass-pipeline`. In most cases this requires updating `-pass-pipeline='pipeline'` to `-pass-pipeline='builtin.module(pipeline)'`. This also fixes an outdated assert that prevented running a `PassManager` anchored on `'any'`. Reviewed By: rriddle Differential Revision: https://reviews.llvm.org/D134900	2022-11-03 11:36:12 -04:00
rkayaith	1c0f541a4d	[mlir] Don't mix -pass-pipeline with other pass options These are test updates required for D135745, which disallows mixing `-pass-pipeline` and the individual `-pass-name` options. Reviewed By: rriddle, mehdi_amini Differential Revision: https://reviews.llvm.org/D135746	2022-11-02 12:10:51 -04:00
Manish Gupta	114ba722c1	[mlir][NVGPU] Handle native mma.sync and ldmatrix(x4) sizes This patch handles native `mma.sync` sizes and enables issuing `ldmatrix` on largest possible tiles for matrixB. It requires handling `vector.extract_strided_slice` from vector to nvgpu lowering. Differential Revision: https://reviews.llvm.org/D135749	2022-10-19 17:10:21 -07:00
Christopher Bate	670eee08ce	[mlir][VectorToGPU] Fix support for i4, col-major operand support For the conversion to nvgpu `mma.sync` and `ldmatrix` pathways, the code was missing support for the `i4` data type. While fixing this, another bug was discovered that caused the number of ldmatrix tiles calculated for certain operand types and configurations to be incorrect. This change fixes both issues and adds additional tests. Differential Revision: https://reviews.llvm.org/D128074	2022-06-30 10:26:59 -06:00
Thomas Raoux	271a48e029	[mlir][VectorToGPU] Fix bug generating incorrect ldmatrix ops ldmatrix transpose can only be used with types that are 16bits wide. Differential Revision: https://reviews.llvm.org/D126846	2022-06-03 04:30:22 +00:00
Christopher Bate	1ca772ed95	[MLIR][GPU] Add NvGpu mma.sync path to the VectorToGPU pass This change adds the option to lower to NvGpu dialect ops during the VectorToGPU conversion pass. Because this transformation reuses existing VectorToGPU logic, a separate VectorToNvGpu conversion pass is not created. The option `use-nvgpu` is added to the VectorToGPU pass. When this is true, the pass will attempt to convert slices rooted at `vector.contract` operations into `nvgpu.mma.sync` ops, and `vector.transfer_read` ops are converted to either `nvgpu.ldmatrix` or one or more `vector.load` operations. The specific data loaded will depend on the thread id within a subgroup (warp). These index calculations depend on data type and shape of the MMA op according to the downstream PTX specification. The code for supporting these details is separated into `NvGpuSupport.cpp|h`. Differential Revision: https://reviews.llvm.org/D122940	2022-05-20 09:42:55 -06:00
River Riddle	3028bf740e	[mlir][NFC] Update textual references of `func` to `func.func` in Conversion/ tests The special case parsing of `func` operations is being removed.	2022-04-20 22:17:27 -07:00
Thomas Raoux	d77f483640	[mlir][gpu] Relax restriction on mma load/store op Those ops can support more complex layouts as long as the innermost dimension is contiguous. Differential Revision: https://reviews.llvm.org/D122452	2022-03-25 04:03:40 +00:00
River Riddle	3655069234	[mlir] Move the Builtin FuncOp to the Func dialect This commit moves FuncOp out of the builtin dialect, and into the Func dialect. This move has been planned in some capacity from the moment we made FuncOp an operation (years ago). This commit handles the functional aspects of the move, but various aspects are left untouched to ease migration: func::FuncOp is re-exported into mlir to reduce the actual API churn, the assembly format still accepts the unqualified `func`. These temporary measures will remain for a little while to simplify migration before being removed. Differential Revision: https://reviews.llvm.org/D121266	2022-03-16 17:07:03 -07:00
River Riddle	47f175b09b	[mlir] Update FuncOp conversion passes to Pass/InterfacePass<FunctionOpInterface> These passes generally don't rely on any special aspects of FuncOp, and moving allows for these passes to be used in many more situations. The passes that obviously weren't relying on invariants guaranteed by a "function" were updated to be generic passes, the rest were updated to be FunctionOpInterface InterfacePasses. The test updates are NFC switching from implicit nesting (-pass -pass2) form to the -pass-pipeline form (generic passes do not implicitly nest as op-specific passes do). Differential Revision: https://reviews.llvm.org/D121190	2022-03-08 12:25:32 -08:00
Thomas Raoux	a57ccad5a6	[VectorToGPU] Fix horizontal stride calculation for N-D memref Fix a bug in how we calculate the stride of mma load/store ops for N-D memrefs Differential Revision: https://reviews.llvm.org/D118378	2022-01-27 13:35:56 -08:00
Mogball	aae5125550	[mlir] Replace StrEnumAttr -> EnumAttr in core dialects Removes uses of `StrEnumAttr` in core dialects Reviewed By: mehdi_amini, rriddle Differential Revision: https://reviews.llvm.org/D117514	2022-01-18 17:15:00 +00:00
Thomas Raoux	e7969240dc	[mlir][VectorToGPU] Support more cases in conversion to MMA ops Support load with broadcast, elementwise divf op and remove the hardcoded restriction on the vector size. Picking the right size should be enforced by the user; conversion to llvm/spirv will fail if it is not supported. Differential Revision: https://reviews.llvm.org/D113618	2021-11-11 13:10:38 -08:00
thomasraoux	7fbb0678fa	[mlir][VectorToGPU] Add support for elementwise mma to vector to GPU Differential Revision: https://reviews.llvm.org/D112960	2021-11-02 08:01:04 -07:00
Mogball	a54f4eae0e	[MLIR] Replace std ops with arith dialect ops Precursor: https://reviews.llvm.org/D110200 Removed redundant ops from the standard dialect that were moved to the `arith` or `math` dialects. Renamed all instances of operations in the codebase and in tests. Reviewed By: rriddle, jpienaar Differential Revision: https://reviews.llvm.org/D110797	2021-10-13 03:07:03 +00:00
thomasraoux	4392841949	[mlir][VectorToGPU] Support converting vector.broadcast to MMA op Differential Revision: https://reviews.llvm.org/D105175	2021-06-30 09:08:55 -07:00
thomasraoux	1a86559276	[mlir][VectorToGPU] Add conversion for scf::For op with Matrix operands Differential Revision: https://reviews.llvm.org/D104134	2021-06-24 15:42:28 -07:00
thomasraoux	6413226dce	[mlir][VectorToGPU] Add conversion for splat constant to MMA const matrix Differential Revision: https://reviews.llvm.org/D104133	2021-06-24 15:38:12 -07:00
thomasraoux	edd9515bd1	[mlir][VectorToGPU] First step to convert vector ops to GPU MMA ops This is the first step to convert vector ops to MMA operations in order to target GPU tensor core ops. This currently only supports simple cases; transpose and element-wise operations will be added later. Differential Revision: https://reviews.llvm.org/D102962	2021-06-11 07:52:32 -07:00

37 Commits