38 Commits

Author SHA1 Message Date
Aart Bik
4df01dc270 [mlir][sparse][gpu][nvidia] add pruning step and check to 2:4 matrix multiplication
(1) without the check, the results may silently be wrong, so the check is needed
(2) add pruning step to guarantee 2:4 property

Note, in the longer run, we may want to split out the pruning step somehow,
or make it optional.
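
To illustrate what the 2:4 property means, here is a minimal host-side sketch. It is not the code from this commit: the helper names `isTwoFourSparse` and `pruneTwoFour` are hypothetical, the matrix is assumed to be dense row-major with a column count divisible by four, and pruning is assumed to keep the two largest-magnitude entries of each group.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: returns true if every aligned group of four
// consecutive entries in each row holds at most two nonzeros.
// Assumes a dense row-major matrix with cols % 4 == 0.
bool isTwoFourSparse(const std::vector<float> &a, std::size_t rows,
                     std::size_t cols) {
  assert(cols % 4 == 0 && a.size() == rows * cols);
  for (std::size_t g = 0; g < a.size(); g += 4) {
    int nonzeros = 0;
    for (std::size_t k = 0; k < 4; ++k)
      nonzeros += (a[g + k] != 0.0f) ? 1 : 0;
    if (nonzeros > 2)
      return false;
  }
  return true;
}

// Hypothetical helper: zeroes the two smallest-magnitude entries in each
// group of four, so the result always satisfies the 2:4 property.
void pruneTwoFour(std::vector<float> &a, std::size_t rows, std::size_t cols) {
  assert(cols % 4 == 0 && a.size() == rows * cols);
  for (std::size_t g = 0; g < a.size(); g += 4) {
    std::size_t idx[4] = {0, 1, 2, 3};
    // Order the group's positions by decreasing magnitude.
    std::stable_sort(idx, idx + 4, [&](std::size_t x, std::size_t y) {
      return std::fabs(a[g + x]) > std::fabs(a[g + y]);
    });
    a[g + idx[2]] = 0.0f;
    a[g + idx[3]] = 0.0f;
  }
}
```

Running the check after the pruning step mirrors the order described in the message: prune first to establish the property, then verify it so a wrong result cannot go unnoticed.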

Reviewed By: K-Wu

Differential Revision: https://reviews.llvm.org/D155320
2023-07-14 12:08:13 -07:00
Aart Bik
97678cec1b [mlir][sparse][gpu] remove zero init memset
avoids quite a big memory fill for each setup

Reviewed By: K-Wu

Differential Revision: https://reviews.llvm.org/D155251
2023-07-13 18:22:21 -07:00
Aart Bik
86eff489e7 [mlir][sparse][gpu] force 16-byte alignment on data structs for cuSparseLt
Also makes some minor consistency edits in the cuSparseLt wrapper lib.
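
As a rough sketch of forcing 16-byte alignment on a data struct, assuming a hypothetical `AlignedHandle` wrapper with an illustrative 40-byte payload (neither the name nor the size comes from this commit):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for an opaque library data struct. alignas(16)
// makes every instance (stack, static, array element) 16-byte aligned.
struct alignas(16) AlignedHandle {
  unsigned char storage[40]; // illustrative payload size
};

static_assert(alignof(AlignedHandle) == 16, "handle must be 16-byte aligned");
static_assert(sizeof(AlignedHandle) % 16 == 0,
              "size is padded to a multiple of the alignment");

// Returns true if the pointer value is a multiple of 16.
bool isAligned16(const void *p) {
  return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Hypothetical guard that could run before a handle is passed on.
AlignedHandle *checkedHandle(AlignedHandle *h) {
  assert(isAligned16(h) && "handle is not 16-byte aligned");
  return h;
}
```

Putting the requirement on the type, rather than on each allocation site, means array elements stay aligned as well, because the struct size is padded to a multiple of 16.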

Reviewed By: Peiming, K-Wu

Differential Revision: https://reviews.llvm.org/D155139
2023-07-13 10:45:15 -07:00
Adrian Kuegel
f250fbcbbb [mlir] Apply ClangTidy fix (NFC)
The return statement is redundant.
2023-07-10 11:46:32 +02:00
Aart Bik
03125e6894 [mlir][sparse][gpu] fix missing dealloc
This dealloc was incorrectly removed in
https://reviews.llvm.org/D153173

Reviewed By: K-Wu

Differential Revision: https://reviews.llvm.org/D154564
2023-07-06 09:48:19 -07:00
Kun Wu
be2dd22b8f [mlir][sparse][gpu] reuse CUDA environment handle throughout instance lifetime
Differential Revision: https://reviews.llvm.org/D153173
2023-06-30 21:52:34 +00:00
Kun Wu
7a3ebba9cb [mlir][sparse][gpu] Add explaining string to three static_assert stmts
Differential Revision: https://reviews.llvm.org/D154243
2023-06-30 14:10:45 -05:00
Kun Wu
632ccc538c [mlir][sparse][gpu] remove tuple as one of the spmm_buffer_size output type
Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D153188
2023-06-19 15:57:50 +00:00
Kun Wu
9167dd46ba [mlir][sparse][gpu] recognizing sddmm pattern in GPU libgen path
Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D151582
2023-06-15 23:48:11 +00:00
Kun Wu
ac30f48e37 [mlir][sparse][gpu] fix various cusparseLt bugs
Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D152489
2023-06-12 23:48:49 +00:00
Navdeep Katel
18cc07aa07 [MLIR][GPU] Add 16-bit version of cudaMemset in cudaRuntimeWrappers
Add 16-bit version of cudaMemset in cudaRuntimeWrappers and update the GPU to LLVM lowering.

Reviewed By: bondhugula

Differential Revision: https://reviews.llvm.org/D151642
2023-06-08 17:33:26 +05:30
Aart Bik
50db4789a8 [mlir][sparse][gpu] refined build setup for cusparse
Reviewed By: K-Wu

Differential Revision: https://reviews.llvm.org/D152387
2023-06-07 11:09:22 -07:00
Kun Wu
8ed59c53de [mlir][sparse][gpu] add sm8.0+ tensor core 2:4 sparsity support
Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D151775
2023-06-06 23:13:21 +00:00
Aart Bik
9fc02a7a08 [mlir][sparse][gpu] add AoS COO support to cuSPARSE
Even though this feature was deprecated in release 11.2,
any library before this version still supports the feature,
which is why we are making it available under a macro.

Reviewed By: K-Wu

Differential Revision: https://reviews.llvm.org/D152290
2023-06-06 12:32:46 -07:00
Kun Wu
7e44f0736a [mlir][gpu][sparse] fix broken type in cusparseCreateCsr
Differential Revision: https://reviews.llvm.org/D151912
2023-06-01 18:06:09 +00:00
Kun Wu
be6c532005 [mlir][sparse][gpu] fixing broken literal names in cuda runner macros
Differential Revision: https://reviews.llvm.org/D151910
2023-06-01 17:52:58 +00:00
Kun Wu
cc402de0b1 [mlir][sparse][gpu] add result type to spmv and spmm gpu libgen path
Differential Revision: https://reviews.llvm.org/D151592
2023-06-01 17:17:40 +00:00
Aart Bik
752c04777f [mlir][sparse][gpu] fix merge conflict
Reviewed By: K-Wu

Differential Revision: https://reviews.llvm.org/D151619
2023-05-27 13:42:20 -07:00
Kun Wu
cf44847b4d [mlir][gpu][sparse] adding cusparse sddmm support
Differential Revision: https://reviews.llvm.org/D151279
2023-05-27 20:01:41 +00:00
Aart Bik
74e29d3715 [mlir][sparse][gpu] fix merge conflict
Reviewed By: Peiming

Differential Revision: https://reviews.llvm.org/D151574
2023-05-26 11:00:20 -07:00
Kun Wu
235fbe792b [mlir] [sparse] [gpu] adding transpose support to spmm spmv
Reviewed By: aartbik, wrengr

Differential Revision: https://reviews.llvm.org/D151259
2023-05-26 17:07:09 +00:00
Aart Bik
bcb698bfdc [mlir][sparse][gpu] various cuSparse refinements
(1) keep all cuSparse ops on single stream without wait() in right order
(2) use more type precise memref types for COO
(3) use ToTensor on resulting memref (even though it folds away again)

Reviewed By: K-Wu

Differential Revision: https://reviews.llvm.org/D151404
2023-05-24 22:32:52 -07:00
Aart Bik
4ebd836d9e [mlir][sparse][gpu] fix F32 bug for SpMV and SpMM
The alpha/beta variables, residing on the host, should have the
32-bit or 64-bit width of the result type. They were formerly always
passed as double.
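
A minimal sketch of the idea, assuming a hypothetical `ElemType` tag and `HostScalar` holder (not the names used in the wrapper library): the host-side scalar is stored with the width of the result type, so a callee that reads four bytes for F32 sees a valid float rather than half of a double.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical tag for the result element type.
enum class ElemType { F32, F64 };

// Hypothetical host-side holder for an alpha or beta scalar. `bytes` is
// what would be handed to the library as an untyped pointer.
struct HostScalar {
  alignas(8) unsigned char bytes[8];
  std::size_t width; // number of valid bytes: 4 for F32, 8 for F64
};

// Stores `value` using the width of the result type.
HostScalar makeHostScalar(double value, ElemType type) {
  HostScalar s{};
  if (type == ElemType::F32) {
    float f = static_cast<float>(value);
    std::memcpy(s.bytes, &f, sizeof(f));
    s.width = sizeof(f);
  } else {
    assert(type == ElemType::F64);
    std::memcpy(s.bytes, &value, sizeof(value));
    s.width = sizeof(value);
  }
  return s;
}
```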

Reviewed By: Peiming

Differential Revision: https://reviews.llvm.org/D151255
2023-05-23 17:36:03 -07:00
Aart Bik
a8e1f80f8b [mlir][sparse][gpu] derive type of cuSparse op
This no longer assumes just F64 output.

Note, however, that it will be cleaner to carry the data type in the corresponding operation (rather than tracking operands). That will also allow for mixed-type cases, where the operand and result types differ.

This will be done in a follow-up revision where the result type is carried by the SpMV/SpMM op itself (and friends).

Reviewed By: Peiming

Differential Revision: https://reviews.llvm.org/D151005
2023-05-19 17:07:52 -07:00
Aart Bik
981cf1678d [mlir][sparse][gpu] add SpMM to GPU ops dialect
Reviewed By: ThomasRaoux, K-Wu

Differential Revision: https://reviews.llvm.org/D150618
2023-05-19 12:46:11 -07:00
Aart Bik
b700a90cc0 [mlir][gpu][sparse] add gpu ops for sparse matrix computations
This revision extends the GPU dialect with ops that can be lowered to
host-oriented sparse matrix library calls (in this case cuSparse focused
although the ops could be generalized to support more GPUs in principle).
This will allow the "sparse compiler pipeline" to accelerate sparse operations
(see follow up revisions with examples of this).

For some background, see:

https://discourse.llvm.org/t/sparse-compiler-and-gpu-code-generation/69786/2

Reviewed By: ThomasRaoux

Differential Revision: https://reviews.llvm.org/D150152
2023-05-12 10:44:36 -07:00
max
8f7c8a6ea7 Add gpu::HostUnregisterOp
Without explicitly unregistering you will get

```
'cuMemHostRegister(ptr, sizeBytes, 0)' failed with 'CUDA_ERROR_HOST_MEMORY_ALREADY_REGISTERED'
```

in CUDA (for example) after repeated runs (e.g., during benchmarking the same kernel).

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D147277
2023-04-06 15:07:12 -05:00
Mehdi Amini
6b7e6ea489 Revert "Fix CUDA runtime wrapper for GPU mem alloc/free to async"
This reverts commit b4117fede20b8c649320ad37364ae208baa0d0e7.
This broke one of the MLIR bots; a test is failing.
2022-04-12 06:50:27 +00:00
Uday Bondhugula
b4117fede2 Fix CUDA runtime wrapper for GPU mem alloc/free to async
Switch the CUDA runtime wrappers for GPU mem alloc/free to async. The
semantics of the GPU dialect ops (gpu.alloc/dealloc) and of the wrappers
they are lowered to (gpu-to-llvm) were those of the async versions.
However, they were being incorrectly mapped to cuMemAlloc/cuMemFree
instead of cuMemAllocAsync/cuMemFreeAsync.

Reviewed By: csigg

Differential Revision: https://reviews.llvm.org/D123482
2022-04-12 09:04:02 +05:30
Krzysztof Drewniak
c5803ee4fa [MLIR][GPU] Remove call to cudaSetDevice(), which no longer exists
Differential Revision: https://reviews.llvm.org/D120085
2022-02-17 21:38:05 +00:00
Krzysztof Drewniak
84718d37db [MLIR][GPU] Add gpu.set_default_device op
This op is added to allow MLIR code running on multi-GPU systems to
select the GPU they want to execute operations on when no GPU is
otherwise specified.

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D119883
2022-02-17 21:30:09 +00:00
Nicolas Vasilache
012c0cc7c3 [mlir] NFC - Avoid unused symbol in opt mode.
2021-10-14 11:26:33 +00:00
Loren Maggiore
361458b1ce [mlir] create gpu memset op
Create a gpu memset op and corresponding CUDA and ROCm wrappers.

Reviewed By: herhut, lorenrose1013

Differential Revision: https://reviews.llvm.org/D107548
2021-09-04 08:13:04 +02:00
Aart Bik
b9f87e24f2 [mlir] add missing include, fix broken build
Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D108873
2021-08-28 09:36:38 -07:00
Uday Bondhugula
4edc9e2acf [MLIR][GPU] Drop mgpuMemHostRegisterMemRef's dependence on LLVM Support
Drop mgpuMemHostRegisterMemRef's dependence on LLVM Support. This
method is the only one in CUDA runtime wrappers library that creates
a dependence on libLLVMSupport due to its use of SmallVector and
ArrayRef. The code can be written just as easily and compactly without
those ADTs. The dependence on LLVMSupport adds a significant amount of
additional complexity for external things that want to link this library
in (either statically or as a shared object) since libLLVMSupport includes numerous
other objects that are sensitive to C++ compiler version and ABI.
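
As an illustration of how stride bookkeeping can be written without SmallVector or ArrayRef, here is a sketch using only the standard library. The helper names `computeDenseStrides` and `isDense` are hypothetical and are not taken from the wrapper library; sizes and strides are assumed to be counted in elements.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

// Hypothetical helper: fills `strides` with dense row-major strides for
// `sizes` and returns the total element count. Both arrays hold `rank`
// entries.
int64_t computeDenseStrides(int64_t rank, const int64_t *sizes,
                            int64_t *strides) {
  assert(rank >= 0);
  int64_t running = 1;
  for (int64_t i = rank - 1; i >= 0; --i) {
    strides[i] = running;
    running *= sizes[i];
  }
  return running;
}

// Hypothetical helper: returns true if `strides` matches the dense
// row-major layout implied by `sizes`.
bool isDense(int64_t rank, const int64_t *sizes, const int64_t *strides) {
  assert(rank >= 0);
  // A heap array from the standard library replaces the LLVM ADT.
  auto expected = std::make_unique<int64_t[]>(rank);
  computeDenseStrides(rank, sizes, expected.get());
  for (int64_t i = 0; i < rank; ++i)
    if (strides[i] != expected[i])
      return false;
  return true;
}
```

Multiplying the returned element count by the element size gives the byte count of a dense buffer, using nothing beyond `<memory>` and `<cstdint>`.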

Differential Revision: https://reviews.llvm.org/D108684
2021-08-28 11:37:55 +05:30
Christian Sigg
f69d5a7fc7 [mlir] Initialize CUDA context lazily.
So we can remove the ignore-warning pragma again.
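
One common way to get lazy initialization without a global constructor is a function-local static. The sketch below assumes a hypothetical `Context` type and `createContext` factory and does not reproduce the actual wrapper code.

```cpp
#include <cassert>

// Hypothetical stand-in for an expensive-to-create runtime context.
struct Context {
  int id;
};

// Constant-initialized counter, so it needs no global constructor.
static int initCount = 0;

// Hypothetical factory; in a real wrapper this would talk to the driver.
Context createContext() {
  ++initCount;
  return Context{42};
}

// The context is created on first use. Initialization of a function-local
// static is thread-safe since C++11 and happens exactly once.
Context &getContext() {
  static Context ctx = createContext();
  return ctx;
}

// Reports how many times the factory has run.
int contextInitCount() {
  assert(initCount >= 0);
  return initCount;
}
```

Because nothing runs before `main`, a namespace-scope object with a dynamic initializer is no longer needed, which is the kind of code that -Wglobal-constructors flags.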

Reviewed By: herhut

Differential Revision: https://reviews.llvm.org/D97864
2021-03-04 13:07:56 +01:00
Christian Sigg
b6ac26fce5 [mlir] Silence -Wglobal-constructors error in CudaRuntimeWrapper.cpp
A stopgap to get the NVIDIA build bot green again until I have a better
solution with dynamic initialization.
2021-03-03 13:48:03 +01:00
Christian Sigg
9d7be77bf9 [mlir] Move cuda tests
Move test inputs to test/Integration directory.
Move runtime wrappers to ExecutionEngine.

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D97463
2021-03-03 13:16:51 +01:00