16 Commits

Author SHA1 Message Date
K-Wu
e37fc3cc39 [mlir][sparse][gpu] Impl 2:4 SpMM rewrite for linalg op w/ DENSE24 attr
Differential Revision: https://reviews.llvm.org/D154772
2023-07-10 22:36:57 +00:00
Aart Bik
03125e6894 [mlir][sparse][gpu] fix missing dealloc
This dealloc was incorrectly removed in
https://reviews.llvm.org/D153173

Reviewed By: K-Wu

Differential Revision: https://reviews.llvm.org/D154564
2023-07-06 09:48:19 -07:00
Kun Wu
be2dd22b8f [mlir][sparse][gpu] reuse CUDA environment handle throughout instance lifetime
Differential Revision: https://reviews.llvm.org/D153173
2023-06-30 21:52:34 +00:00
Aart Bik
f14c8eb595 [mlir][sparse][gpu] refine SDDMM pattern for cuSPARSE
Old pattern was missing some cases (e.g. swapping the arguments)
but it also allowed too many cases (e.g. non-empty "absent" or
different arguments for add/mul). This fixes the issues.

Reviewed By: K-Wu

Differential Revision: https://reviews.llvm.org/D153487
2023-06-21 18:31:55 -07:00
Kun Wu
9167dd46ba [mlir][sparse][gpu] recognizing sddmm pattern in GPU libgen path
Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D151582
2023-06-15 23:48:11 +00:00
Kun Wu
97f4c22b3a [mlir][sparse][gpu] unify dnmat and dnvec handle and ops
Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D152465
2023-06-09 17:16:48 +00:00
Kun Wu
8ed59c53de [mlir][sparse][gpu] add sm8.0+ tensor core 2:4 sparsity support
Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D151775
2023-06-06 23:13:21 +00:00
Aart Bik
22caafc9f3 [mlir][sparse][gpu] end to end test for matmul
(1) minor bug fix in copy back [always nice to run stuff ;-)]
(2) run with and without lib (even though some fall back to CPU)

Reviewed By: wrengr

Differential Revision: https://reviews.llvm.org/D151507
2023-05-25 16:10:22 -07:00
Aart Bik
bcb698bfdc [mlir][sparse][gpu] various cuSparse refinements
(1) keep all cuSparse ops on single stream without wait() in right order
(2) use more type precise memref types for COO
(3) use ToTensor on resulting memref (even though it folds away again)

Reviewed By: K-Wu

Differential Revision: https://reviews.llvm.org/D151404
2023-05-24 22:32:52 -07:00
Aart Bik
b75d6a40f1 [mlir][sparse][gpu] recognize SpMM cuSparse during sparsification
Reviewed By: Peiming

Differential Revision: https://reviews.llvm.org/D150715
2023-05-19 17:22:59 -07:00
wren romano
7f5fb90bbb [mlir][sparse] Fixing GPU tests (followup to D150330)
The GPU tests weren't updated when rebasing D150330, so this patch fixes that.

Reviewed By: anlunx

Differential Revision: https://reviews.llvm.org/D150822
2023-05-17 15:29:54 -07:00
wren romano
a0615d020a [mlir][sparse] Renaming the STEA field dimLevelType to lvlTypes
This commit is part of the migration of towards the new STEA syntax/design.  In particular, this commit includes the following changes:
* Renaming compiler-internal functions/methods:
  * `SparseTensorEncodingAttr::{getDimLevelType => getLvlTypes}`
  * `Merger::{getDimLevelType => getLvlType}` (for consistency)
  * `sparse_tensor::{getDimLevelType => buildLevelType}` (to help reduce confusion vs actual getter methods)
* Renaming external facets to match:
  * the STEA parser and printer
  * the C and Python bindings
  * PyTACO

However, the actual renaming of the `DimLevelType` itself (along with all the "dlt" names) will be handled in a separate commit.

Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D150330
2023-05-17 14:24:09 -07:00
Aart Bik
ee42e23614 [mlir][sparse][gpu] first implementation of the GPU libgen approach
The sparse compiler now has two prototype strategies for GPU acceleration:

* CUDA codegen: this converts sparsified code to CUDA threads
* CUDA libgen: this converts pre-sparsified code to cuSPARSE library calls

This revision introduces the first steps required for the second approach.

Reviewed By: ThomasRaoux

Differential Revision: https://reviews.llvm.org/D150170
2023-05-15 08:49:38 -07:00
Aart Bik
86888e420c [mlir][sparse][gpu] generate proper memcpy in/out host and device
The host registration is a convenient way to get CUDA kernels
running, but it may be slow and does not work for all buffer
(like global constants). This revision uses the proper alloc
copy dealloc chains for buffers, using asynchronous chains
to increase overlap. The host registration mechanism is
kept under a flag for the output, just for experimentation
purposes while this project ramps up.

Reviewed By: Peiming

Differential Revision: https://reviews.llvm.org/D148682
2023-04-21 09:30:42 -07:00
Aart Bik
4889214a48 [mlir][sparse][gpu] generate single module, unique kernel names
This fixes a TODO in the first version.

Reviewed By: Peiming

Differential Revision: https://reviews.llvm.org/D148406
2023-04-15 17:25:36 -07:00
Aart Bik
19466ebc7f [mlir][sparse][gpu] a first prototype sparse GPU code generator
This implements a proof-of-concept GPU code generator
to the sparse compiler pipeline, currently only capable
of generating CUDA threads for outermost parallel loops.

The objective, obviously, is to grow this concept
to a full blown GPU code generator, capable of the
right combinaton of code generation as well as exploiting
idiomatic kernels or vector specific libraries (think cuSparse).

Reviewed By: ThomasRaoux

Differential Revision: https://reviews.llvm.org/D147483
2023-04-05 11:32:06 -07:00