37 Commits

Author SHA1 Message Date
harsh-nod
42bba97fc2
[mlir] Extend CombineTransferReadOpTranspose pattern to handle extf ops. (#74754)
This patch modifies the CombineTransferReadOpTranspose pattern to handle
extf ops. Also adds a test which shows the transpose getting folded into
the transfer_read.
2023-12-07 15:01:55 -08:00
Cullen Rhodes
9816edc9f3
[mlir][vector] add result type to vector.extract assembly format (#66499)
The vector.extract assembly format currently only contains the source
type, for example:

  %1 = vector.extract %0[1] : vector<3x7x8xf32>

it's not immediately obvious if this is the source or result type. This
patch improves the assembly format to make this clearer, so the above
becomes:

  %1 = vector.extract %0[1] : vector<7x8xf32> from vector<3x7x8xf32>
2023-09-28 11:11:16 +01:00
Christopher Bate
cafb6284d1 [mlir][VectorToGPU] Update memref stride preconditions on nvgpu.mma.sync path
This change removes the requirement that the row stride be statically known when
converting `vector.transfer_read` and `vector.transfer_write` to distributed
SIMT operations in the `nvgpu` lowering path. It also adds a check to verify
that the last dimension of the source memref is statically known to have stride
1 since this is assumed in the conversion logic.  No other change should be
required since the generated `vector.load` operations are never created across
dimensions other than the last. The routines for checking preconditions on
`vector.transfer_read/write` are moved to under nvgpu utilities.

The change is NFC with respect to the GPU dialect lowering path.

Reviewed By: ThomasRaoux

Differential Revision: https://reviews.llvm.org/D155753
2023-09-14 13:51:42 -06:00
Lei Zhang
a01194377c [mlir][gpu] Support arith.extf in subgroup MMA elementwise ops
This commit adds support for arith.extf in the supported list of
elementwise ops for subgroup MMA ops, and enables lowering to
SPIR-V.

Reviewed By: mravishankar

Differential Revision: https://reviews.llvm.org/D156847
2023-08-01 21:12:37 -07:00
Manish Gupta
9a795f0c59 [mlir][Vector] Adds a pattern to fold arith.extf into vector.contract
Consider mixed precision data type, i.e., F16 input lhs, F16 input rhs, F32 accumulation, and F32 output. This is typically written as F32 <= F16*F16 + F32.

During vectorization from linalg to vector for mixed precision data type (F32 <= F16*F16 + F32), linalg.matmul introduces arith.extf on input lhs and rhs operands.

"linalg.matmul"(%lhs, %rhs, %acc) ({
      ^bb0(%arg1: f16, %arg2: f16, %arg3: f32):
        %lhs_f32 = "arith.extf"(%arg1) : (f16) -> f32
        %rhs_f32 = "arith.extf"(%arg2) : (f16) -> f32
       %mul = "arith.mulf"(%lhs_f32, %rhs_f32) : (f32, f32) -> f32
        %acc = "arith.addf"(%arg3, %mul) : (f32, f32) -> f32
      "linalg.yield"(%acc) : (f32) -> ()
    })
There are backend that natively supports mixed-precision data type and does not need the arith.extf. For example, NVIDIA A100 GPU has mma.sync.aligned.*.f32.f16.f16.f32 that can support mixed-precision data type. However, the presence of arith.extf in the IR, introduces the unnecessary casting targeting F32 Tensor Cores instead of F16 Tensor Cores for NVIDIA backend. This patch adds a folding pattern to fold arith.extf into vector.contract

Differential Revision: https://reviews.llvm.org/D151918
2023-06-05 23:22:20 +00:00
Manish Gupta
84eed7843e [Updated commit] Fix Transpose Check in MMA.SYNC Path.
Pushed a stale commit for the same review in my previous commit.
I am updating the main-line with the latest commit including
review commits. Apologies for the redundant commit.

Differential Revision: https://reviews.llvm.org/D147749
2023-04-11 00:38:35 +00:00
Thomas Raoux
3cf7f22498 [mlir][vectorToGPU] Fix type used when folding transpose into read op
Pick the right result type when folding transpose op into a read

Differential Revision: https://reviews.llvm.org/D144113
2023-02-15 17:34:09 +00:00
Nicolas Vasilache
5ef7ceae57 [mlir][Vector] Significantly improve VectorToGPU.cpp
This revision performs a bunch of cleanups and tracks free-flowing IR mutations.
APIs are systematized around RewriterBase and relevant debug messages are added.
Deliberate use of OpBuilder::InsertionGuard is added where needed.

Differential Revision: https://reviews.llvm.org/D143738
2023-02-14 16:49:36 -08:00
Quinn Dawkins
5205c7126b [mlir][gpu] Add support for unsigned integer extend in vector to gpu.subgroup_mma lowering
Unsigned integer types are supported in subgroup mma ops by matching
against arith.extui ops. This allows for subgroup_mma_compute ops with
mixed signedness which requires later conversions to handle this. SPIR-V
cooperative matrix ops support this while the lowering to WMMA does not.

Differential Revision: https://reviews.llvm.org/D143922
2023-02-14 13:09:46 -05:00
Quinn Dawkins
985f7ff632 [mlir][gpu] Add support for integer types in gpu.subgroup_mma ops
The signedness is carried by `!gpu.mma_matrix` types to most closely
match the Cooperative Matrix specification which determines signedness
with the type (and sometimes the operation).

See: https://htmlpreview.github.io/?https://github.com/KhronosGroup/SPIRV-Registry/blob/master/extensions/NV/SPV_NV_cooperative_matrix.html

To handle the lowering from vector to gpu, ops such as arith.extsi are
pattern matched next to `vector.transfer_read` and `vector.contract` to
determine the signedness of the matrix type.

Enables s8 and u8 WMMA types in NVVM for the GPUToNVVM conversion.

Reviewed By: ThomasRaoux

Differential Revision: https://reviews.llvm.org/D143223
2023-02-07 17:58:01 -05:00
Thomas Raoux
066b4fcb8d [mlir] Update VectorToGPU to new memory space
GPU memory space have changed to new attributes. Update VectorToGPU pass
to use those.

Differential Revision: https://reviews.llvm.org/D142105
2023-01-19 20:12:37 +00:00
Lei Zhang
f1db4aec30 [mlir][VectorToGPU] Support transposed+broadcasted 2D MMA load
This is loading from 2-D memref, in addition to D139655 where we
load from 1-D memref cases.

Reviewed By: ThomasRaoux

Differential Revision: https://reviews.llvm.org/D140136
2022-12-15 19:34:32 +00:00
Lei Zhang
dbddd4f6a4 [mlir][VectorToGPU] Support transposed+broadcasted 1D MMA load
This is now possible with transpose semantics on subgroup MMA
load ops.

Reviewed By: ThomasRaoux

Differential Revision: https://reviews.llvm.org/D139655
2022-12-15 19:22:35 +00:00
Manish Gupta
9774cd17e8 [mlir][nvgpu] Fix affine maps computing indices for LdMatrixOp srcMemref
This patch fixes and simplifies the ldmatrix affine map arithmetic by
abstracting the affine expressions in terms of pitch-linear layout
(strided and contiguous dimensions). Then it applies the maps for
strided and contiguous dimensions in row-major and col-major.

LdMatrixOp collaboratively (32 threads in a warp) load tiles
(8 row x 128b col) of data. It can load either x1, x2, x4 tiles.
Additionally, it can transpose at 16-bit granularity when moving
data from the Shared Memory to registers.

This patch fixes affine map:
(laneid -> coordinate index a thread points in a tile).

- Loading x4 tiles needs all 32 lanes T0-31 point to a contiguous
  chunk of 128b. The issue was exposed when running this case.
- Loading x2 tiles and x1 needs T0-15 threads and T0-7 threads points
  to contiguous chunk of 128b. The patch is NFC for these cases.

Differential Revision: https://reviews.llvm.org/D138978
2022-12-01 18:26:33 -08:00
Nicolas Vasilache
3af6438372 Revert "[WIP] Add support for MMA conversion for 1-D vector.transfer followed by a broadcast to 2-D"
This reverts commit 7db25f78db807da171f23bcbaff258c5677901d1.

This was mistakently stacked below (and committed) along with an NFC change.
2022-12-01 02:57:03 -08:00
Nicolas Vasilache
7db25f78db [WIP] Add support for MMA conversion for 1-D vector.transfer followed by a broadcast to 2-D
Differential Revision: https://reviews.llvm.org/D139040
2022-12-01 02:49:47 -08:00
Quinn Dawkins
c0321edc26 [mlir][gpu] Adding support for transposed mma_load_matrix
Enables transposed gpu.subgroup_mma_load_matrix and updates the lowerings in Vector to GPU and GPU to SPIRV. Needed to enable B transpose matmuls lowering to wmma ops.

Taken over from author: stanley-nod <stanley@nod-labs.com>

Reviewed By: ThomasRaoux, antiagainst

Differential Revision: https://reviews.llvm.org/D138770
2022-11-29 03:35:49 +00:00
Hanhan Wang
0a1569a400 [mlir][NFC] Remove trailing whitespaces from *.td and *.mlir files.
This is generated by running

```
sed --in-place 's/[[:space:]]\+$//' mlir/**/*.td
sed --in-place 's/[[:space:]]\+$//' mlir/**/*.mlir
```

Reviewed By: rriddle, dcaballe

Differential Revision: https://reviews.llvm.org/D138866
2022-11-28 15:26:30 -08:00
rkayaith
13bd410962 [mlir][Pass] Include anchor op in -pass-pipeline
In D134622 the printed form of a pass manager is changed to include the
name of the op that the pass manager is anchored on. This updates the
`-pass-pipeline` argument format to include the anchor op as well, so
that the printed form of a pipeline can be directly passed to
`-pass-pipeline`. In most cases this requires updating
`-pass-pipeline='pipeline'` to
`-pass-pipeline='builtin.module(pipeline)'`.

This also fixes an outdated assert that prevented running a
`PassManager` anchored on `'any'`.

Reviewed By: rriddle

Differential Revision: https://reviews.llvm.org/D134900
2022-11-03 11:36:12 -04:00
rkayaith
1c0f541a4d [mlir] Don't mix -pass-pipeline with other pass options
These are test updates required for D135745, which disallows mixing
`-pass-pipeline` and the individual `-pass-name` options.

Reviewed By: rriddle, mehdi_amini

Differential Revision: https://reviews.llvm.org/D135746
2022-11-02 12:10:51 -04:00
Manish Gupta
114ba722c1 [mlir][NVGPU] Handle native mma.sync and ldmatrix(x4) sizes
This patch handles native `mma.sync` sizes and enables issuing `ldmatrix` on
largest possible tiles for matrixB. It requires handling
`vector.extract_strided_slice` from vector to ngpu lowering.

Differential Revision: https://reviews.llvm.org/D135749
2022-10-19 17:10:21 -07:00
Christopher Bate
670eee08ce [mlir][VectorToGPU] Fix support for i4, col-major operand support
For the conversion to nvgpu `mma.sync` and `ldmatrix` pathways, the code
was missing support for the `i4` data type. While fixing this, another
bug was discoverd that caused the number of ldmatrix tiles calculated for
certain operand types and configurations to be incorrect. This change
fixes both issues and adds additional tests.

Differential Revision: https://reviews.llvm.org/D128074
2022-06-30 10:26:59 -06:00
Thomas Raoux
271a48e029 [mlir][VectorToGPU] Fix bug generating incorrect ldmatrix ops
ldmatrix transpose can only be used with types that are 16bits wide.

Differential Revision: https://reviews.llvm.org/D126846
2022-06-03 04:30:22 +00:00
Christopher Bate
1ca772ed95 [MLIR][GPU] Add NvGpu mma.sync path to the VectorToGPU pass
This changes adds the option to lower to NvGpu dialect ops during the
VectorToGPU convsersion pass. Because this transformation reuses
existing VectorToGPU logic, a seperate VectorToNvGpu conversion pass is
not created. The option `use-nvgpu` is added to the VectorToGPU pass.
When this is true, the pass will attempt to convert slices rooted at
`vector.contract` operations into `nvgpu.mma.sync` ops, and
`vector.transfer_read` ops are converted to either `nvgpu.ldmatrix` or
one or more `vector.load` operations.  The specific data loaded will
depend on the thread id within a subgroup (warp). These index
calculations depend on data type and shape of the MMA op
according to the downstream PTX specification. The code for supporting
these details is separated into `NvGpuSupport.cpp|h`.

Differential Revision: https://reviews.llvm.org/D122940
2022-05-20 09:42:55 -06:00
River Riddle
3028bf740e [mlir][NFC] Update textual references of func to func.func in Conversion/ tests
The special case parsing of `func` operations is being removed.
2022-04-20 22:17:27 -07:00
Thomas Raoux
d77f483640 [mlir][gpu] Relax restriction on mma load/store op
Those ops can support more complex layout as long as the most inner
dimension is contiguous.

Differential Revision: https://reviews.llvm.org/D122452
2022-03-25 04:03:40 +00:00
River Riddle
3655069234 [mlir] Move the Builtin FuncOp to the Func dialect
This commit moves FuncOp out of the builtin dialect, and into the Func
dialect. This move has been planned in some capacity from the moment
we made FuncOp an operation (years ago). This commit handles the
functional aspects of the move, but various aspects are left untouched
to ease migration: func::FuncOp is re-exported into mlir to reduce
the actual API churn, the assembly format still accepts the unqualified
`func`. These temporary measures will remain for a little while to
simplify migration before being removed.

Differential Revision: https://reviews.llvm.org/D121266
2022-03-16 17:07:03 -07:00
River Riddle
47f175b09b [mlir] Update FuncOp conversion passes to Pass/InterfacePass<FunctionOpInterface>
These passes generally don't rely on any special aspects of FuncOp, and moving allows
for these passes to be used in many more situations. The passes that obviously weren't
relying on invariants guaranteed by a "function" were updated to be generic pass, the
rest were updated to be FunctionOpinterface InterfacePasses.

The test updates are NFC switching from implicit nesting (-pass -pass2) form to
the -pass-pipeline form (generic passes do not implicitly nest as op-specific passes do).

Differential Revision: https://reviews.llvm.org/D121190
2022-03-08 12:25:32 -08:00
Thomas Raoux
a57ccad5a6 [VectorToGPU] Fix horizontal stride calculation for N-D memref
Fix a bug in how we calculate the stride of mma load/store ops for N-D
memrefs

Differential Revision: https://reviews.llvm.org/D118378
2022-01-27 13:35:56 -08:00
Mogball
aae5125550 [mlir] Replace StrEnumAttr -> EnumAttr in core dialects
Removes uses of `StrEnumAttr` in core dialects

Reviewed By: mehdi_amini, rriddle

Differential Revision: https://reviews.llvm.org/D117514
2022-01-18 17:15:00 +00:00
Thomas Raoux
e7969240dc [mlir][VectorToGPU] Support more cases in conversion to MMA ops
Support load with broadcast, elementwise divf op and remove the
hardcoded restriction on the vector size. Picking the right size should
be enfored by user and will fail conversion to llvm/spirv if it is not
supported.

Differential Revision: https://reviews.llvm.org/D113618
2021-11-11 13:10:38 -08:00
thomasraoux
7fbb0678fa [mlir][VectorToGPU] Add support for elementwise mma to vector to GPU
Differential Revision: https://reviews.llvm.org/D112960
2021-11-02 08:01:04 -07:00
Mogball
a54f4eae0e [MLIR] Replace std ops with arith dialect ops
Precursor: https://reviews.llvm.org/D110200

Removed redundant ops from the standard dialect that were moved to the
`arith` or `math` dialects.

Renamed all instances of operations in the codebase and in tests.

Reviewed By: rriddle, jpienaar

Differential Revision: https://reviews.llvm.org/D110797
2021-10-13 03:07:03 +00:00
thomasraoux
4392841949 [mlir][VectorToGPU] Support converting vetor.broadcast to MMA op
Differential Revision: https://reviews.llvm.org/D105175
2021-06-30 09:08:55 -07:00
thomasraoux
1a86559276 [mlir][VectorToGPU] Add conversion for scf::For op with Matrix operands
Differential Revision: https://reviews.llvm.org/D104134
2021-06-24 15:42:28 -07:00
thomasraoux
6413226dce [mlir][VectorToGPU] Add conversion for splat constant to MMA const matrix
Differential Revision: https://reviews.llvm.org/D104133
2021-06-24 15:38:12 -07:00
thomasraoux
edd9515bd1 [mlir][VectorToGPU] First step to convert vector ops to GPU MMA ops
This is the first step to convert vector ops to MMA operations in order to
target GPUs tensor core ops. This currently only support simple cases,
transpose and element-wise operation will be added later.

Differential Revision: https://reviews.llvm.org/D102962
2021-06-11 07:52:32 -07:00