llvm-project

Author	SHA1	Message	Date
Yinying Li	e5924d6499	[mlir][sparse] Implement parsing n out of m (#79935 ) 1. Add parsing methods for block[n, m]. 2. Encode n and m with the newly extended 64-bit LevelType enum. 3. Update 2:4 methods names/comments to n:m.	2024-02-08 14:38:42 -05:00
Uday Bondhugula	fe8a62c463	[MLIR] Fix crash in AffineMap::replace for zero result maps (#80930 ) Fix obvious bug in AffineMap::replace for the case of zero result maps. Extend/complete inferExprsFromList to work with empty expression lists.	2024-02-08 19:16:29 +05:30
Aart Bik	41a07e668c	[mlir][sparse] recognize NVidia 2:4 type for matmul (#76758 ) This removes the temporary DENSE24 attribute and replaces it with proper recognition of dense to 24 conversion. The compression will be performed on the device prior to performing the matrix mult. Note that we no longer need to start with the linalg version, we can lift this to the proper named linalg op. Also renames some files into more consistent names.	2024-01-02 14:44:24 -08:00
Matthias Springer	10056c821a	[mlir][SCF] `scf.parallel`: Make reductions part of the terminator (#75314 ) This commit makes reductions part of the terminator. Instead of `scf.yield`, `scf.reduce` now terminates the body of `scf.parallel` ops. `scf.reduce` may contain an arbitrary number of reductions, with one region per reduction. Example: ```mlir %init = arith.constant 0.0 : f32 %r:2 = scf.parallel (%iv) = (%lb) to (%ub) step (%step) init (%init, %init) -> f32, f32 { %elem_to_reduce1 = load %buffer1[%iv] : memref<100xf32> %elem_to_reduce2 = load %buffer2[%iv] : memref<100xf32> scf.reduce(%elem_to_reduce1, %elem_to_reduce2 : f32, f32) { ^bb0(%lhs : f32, %rhs: f32): %res = arith.addf %lhs, %rhs : f32 scf.reduce.return %res : f32 }, { ^bb0(%lhs : f32, %rhs: f32): %res = arith.mulf %lhs, %rhs : f32 scf.reduce.return %res : f32 } } ``` `scf.reduce` operations can no longer be interleaved with other ops in the body of `scf.parallel`. This simplifies the op and makes it possible to assign the `RecursiveMemoryEffects` trait to `scf.reduce`. (This was not possible before because the op was not a terminator, causing the op to be DCE'd.)	2023-12-20 11:06:27 +09:00
Matthias Springer	ea979b24b0	[mlir][SparseTensor][NFC] Remove `isNestedIn` helper function (#75729 ) Use `Region::findAncestorBlockInRegion` instead of a custom IR traversal.	2023-12-17 13:19:27 +09:00
Peiming Liu	4a72a4ef12	[NFC][mlir][sparse] remove redundant parameter. (#75551 )	2023-12-15 09:29:22 -08:00
Aart Bik	365777ecbe	[mlir][sparse] refactor utilities into transform/utils dir (#75250 ) Separates actual transformation files from supporting utility files in the transforms directory. Includes a bazel overlay fix for the build (as well as a bit of cleanup of that file to be less verbose and more flexible).	2023-12-12 15:34:31 -08:00
Matthias Springer	861600f175	[mlir][SparseTensor] Fix invalid IR in `ForallRewriter` pattern (#74547 ) The `ForallRewriter` pattern used to generate invalid IR: ``` mlir/test/Dialect/SparseTensor/GPU/gpu_combi.mlir:0:0: error: 'scf.for' op expects region #0 to have 0 or 1 blocks mlir/test/Dialect/SparseTensor/GPU/gpu_combi.mlir:0:0: note: see current operation: "scf.for"(%8, %2, %9) ({ ^bb0(%arg5: index): // ... "scf.yield"() : () -> () ^bb1(%10: index): // no predecessors "scf.yield"() : () -> () }) : (index, index, index) -> () ``` This commit fixes tests such as `mlir/test/Dialect/SparseTensor/GPU/gpu_combi.mlir` when verifying the IR after each pattern application (#74270).	2023-12-07 08:47:20 +09:00
Maksim Levental	e35b606280	[mlir][sparsifier] fix `isAdmissibleBSR` (#72195 ) Fixes https://github.com/llvm/llvm-project/issues/72194.	2023-11-14 16:56:34 -06:00
Aart Bik	5ef446790f	[mlir][sparse][gpu] cleanup GPUDataTransferStrategy (#71615 ) The flag seems to be doing practically the same thing for zero cost and pinned dma. In addition, the register host is not truly the right zero cost mechanism according to Thomas. So we are simplifying the setup for now, until we have a better definition for what to implement and test. https://github.com/llvm/llvm-project/issues/64316	2023-11-08 09:45:11 -08:00
Tim Harvey	c43e627457	Changed the phrase sparse-compiler to sparsifier in comments (#71578 ) When the Powers That Be decided that the name "sparse compiler" should be changed to "sparsifier", we neglected to change some of the comments in the code; this pull request completes the name change.	2023-11-07 20:55:00 +00:00
Aart Bik	3d89c088af	[mlir][sparse] support BSR for cuSPARSE (libgen path only) (#69646 )	2023-10-19 16:56:52 -07:00
Aart Bik	3231a365c1	[mlir][sparse][gpu] add CSC to libgen GPU sparsification using cuSparse (#67713 ) Add CSC, but also adds BSR as a future format. Coming soon!	2023-09-28 11:47:22 -07:00
Peiming Liu	6ca47eb49d	[mlir][sparse] rename sparse_tensor.(un)pack to sparse_tensor.(dis)as… (#67717 ) …semble Pack/Unpack are overridden in many other places, rename the operations to avoid confusion.	2023-09-28 11:01:10 -07:00
Aart Bik	619a888dd8	[mlir][sparse][gpu] free all buffers allocated for spGEMM (#66813 ) Yup, a bit of an oversight ;-)	2023-09-19 14:33:12 -07:00
Matthias Springer	9b5ef2bea8	[mlir][Interfaces] `LoopLikeOpInterface`: Support ops with multiple regions (#66754 ) This commit implements `LoopLikeOpInterface` on `scf.while`. This enables LICM (and potentially other transforms) on `scf.while`. `LoopLikeOpInterface::getLoopBody()` is renamed to `getLoopRegions` and can now return multiple regions. Also fix a bug in the default implementation of `LoopLikeOpInterface::isDefinedOutsideOfLoop()`, which returned "false" for some values that are defined outside of the loop (in a nested op, in such a way that the value does not dominate the loop). This interface is currently only used for LICM and there is no way to trigger this bug, so no test is added.	2023-09-19 17:35:38 +02:00
Aart Bik	289f7231f9	[mlir][sparse][gpu] minor code cleanup for sparse gpu ops Consistent order of ops and related methods. Also, renamed SpGEMMGetSizeOp to SpMatGetSizeOp since this is a general utility for sparse matrices, not specific to GEMM ops only. Reviewed By: Peiming Differential Revision: https://reviews.llvm.org/D157922	2023-08-14 15:08:57 -07:00
Aart Bik	76a80a0808	[mlir][sparse][gpu] sparsifier GPU libgen for SpGEMM in cuSparse With working integration end-to-end test Reviewed By: K-Wu Differential Revision: https://reviews.llvm.org/D157652	2023-08-10 14:52:16 -07:00
K-Wu	cfa82f7783	[mlir][sparse][gpu] introduce flag that controls host to device copy strategies (regular dma default) Differential Revision: https://reviews.llvm.org/D155352	2023-08-01 22:30:40 +00:00
Kun Wu	1e491c425b	[mlir][sparse][gpu] add 2:4 spmm prune_and_check flag Differential Revision: https://reviews.llvm.org/D155909	2023-08-01 18:24:18 +00:00
K-Wu	e37fc3cc39	[mlir][sparse][gpu] Impl 2:4 SpMM rewrite for linalg op w/ DENSE24 attr Differential Revision: https://reviews.llvm.org/D154772	2023-07-10 22:36:57 +00:00
Aart Bik	03125e6894	[mlir][sparse][gpu] fix missing dealloc This dealloc was incorrectly removed in https://reviews.llvm.org/D153173 Reviewed By: K-Wu Differential Revision: https://reviews.llvm.org/D154564	2023-07-06 09:48:19 -07:00
Kun Wu	be2dd22b8f	[mlir][sparse][gpu] reuse CUDA environment handle throughout instance lifetime Differential Revision: https://reviews.llvm.org/D153173	2023-06-30 21:52:34 +00:00
Aart Bik	f14c8eb595	[mlir][sparse][gpu] refine SDDMM pattern for cuSPARSE Old pattern was missing some cases (e.g. swapping the arguments) but it also allowed too many cases (e.g. non-empty "absent" or different arguments for add/mul). This fixes the issues. Reviewed By: K-Wu Differential Revision: https://reviews.llvm.org/D153487	2023-06-21 18:31:55 -07:00
Kun Wu	9167dd46ba	[mlir][sparse][gpu] recognizing sddmm pattern in GPU libgen path Reviewed By: aartbik Differential Revision: https://reviews.llvm.org/D151582	2023-06-15 23:48:11 +00:00
Aart Bik	1ea903e164	[mlir][sparse][gpu] guard matvec COO AoS Reviewed By: K-Wu Differential Revision: https://reviews.llvm.org/D152738	2023-06-12 16:49:58 -07:00
Kun Wu	97f4c22b3a	[mlir][sparse][gpu] unify dnmat and dnvec handle and ops Reviewed By: aartbik Differential Revision: https://reviews.llvm.org/D152465	2023-06-09 17:16:48 +00:00
Kun Wu	8ed59c53de	[mlir][sparse][gpu] add sm8.0+ tensor core 2:4 sparsity support Reviewed By: aartbik Differential Revision: https://reviews.llvm.org/D151775	2023-06-06 23:13:21 +00:00
Aart Bik	9fc02a7a08	[mlir][sparse][gpu] add AoS COO support to cuSPARSE Even though this feature was deprecated in release 11.2, any library before this version still supports the feature, which is why we are making it available under a macro. Reviewed By: K-Wu Differential Revision: https://reviews.llvm.org/D152290	2023-06-06 12:32:46 -07:00
Kun Wu	fa98bdbd95	[mlir][sparse][gpu] make computeType mandatory Differential Revision: https://reviews.llvm.org/D152018	2023-06-02 21:47:44 +00:00
Kun Wu	235fbe792b	[mlir] [sparse] [gpu] adding transpose support to spmm spmv Reviewed By: aartbik, wrengr Differential Revision: https://reviews.llvm.org/D151259	2023-05-26 17:07:09 +00:00
Tres Popp	68f58812e3	[mlir] Move casting calls from methods to function calls The MLIR classes Type/Attribute/Operation/Op/Value support cast/dyn_cast/isa/dyn_cast_or_null functionality through llvm's doCast functionality in addition to defining methods with the same name. This change begins the migration of uses of the method to the corresponding function call as has been decided as more consistent. Note that there still exist classes that only define methods directly, such as AffineExpr, and this does not include work currently to support a functional cast/isa call. Context: - https://mlir.llvm.org/deprecation/ at "Use the free function variants for dyn_cast/cast/isa/…" - Original discussion at https://discourse.llvm.org/t/preferred-casting-style-going-forward/68443 Implementation: This patch updates all remaining uses of the deprecated functionality in mlir/. This was done with clang-tidy as described below and further modifications to GPUBase.td and OpenMPOpsInterfaces.td. Steps are described per line, as comments are removed by git: 0. Retrieve the change from the following to build clang-tidy with an additional check: main...tpopp:llvm-project:tidy-cast-check 1. Build clang-tidy 2. Run clang-tidy over your entire codebase while disabling all checks and enabling the one relevant one. Run on all header files also. 3. Delete .inc files that were also modified, so the next build rebuilds them to a pure state. ``` ninja -C $BUILD_DIR clang-tidy run-clang-tidy -clang-tidy-binary=$BUILD_DIR/bin/clang-tidy -checks='-,misc-cast-functions'\ -header-filter=mlir/ mlir/ -fix rm -rf $BUILD_DIR/tools/mlir/*/.inc ``` Differential Revision: https://reviews.llvm.org/D151542	2023-05-26 10:29:55 +02:00
Aart Bik	22caafc9f3	[mlir][sparse][gpu] end to end test for matmul (1) minor bug fix in copy back [always nice to run stuff ;-)] (2) run with and without lib (even though some fall back to CPU) Reviewed By: wrengr Differential Revision: https://reviews.llvm.org/D151507	2023-05-25 16:10:22 -07:00
Aart Bik	bcb698bfdc	[mlir][sparse][gpu] various cuSparse refinements (1) keep all cuSparse ops on single stream without wait() in right order (2) use more type precise memref types for COO (3) use ToTensor on resulting memref (even though it folds away again) Reviewed By: K-Wu Differential Revision: https://reviews.llvm.org/D151404	2023-05-24 22:32:52 -07:00
Kun Wu	86bf710cf7	[mlir] [gpu] [sparse] refined SparseHandle type Reviewed By: aartbik Differential Revision: https://reviews.llvm.org/D151014	2023-05-24 10:16:07 -07:00
Aart Bik	b75d6a40f1	[mlir][sparse][gpu] recognize SpMM cuSparse during sparsification Reviewed By: Peiming Differential Revision: https://reviews.llvm.org/D150715	2023-05-19 17:22:59 -07:00
Aart Bik	ee42e23614	[mlir][sparse][gpu] first implementation of the GPU libgen approach The sparse compiler now has two prototype strategies for GPU acceleration: * CUDA codegen: this converts sparsified code to CUDA threads * CUDA libgen: this converts pre-sparsified code to cuSPARSE library calls This revision introduces the first steps required for the second approach. Reviewed By: ThomasRaoux Differential Revision: https://reviews.llvm.org/D150170	2023-05-15 08:49:38 -07:00
Tres Popp	5550c82189	[mlir] Move casting calls from methods to function calls The MLIR classes Type/Attribute/Operation/Op/Value support cast/dyn_cast/isa/dyn_cast_or_null functionality through llvm's doCast functionality in addition to defining methods with the same name. This change begins the migration of uses of the method to the corresponding function call as has been decided as more consistent. Note that there still exist classes that only define methods directly, such as AffineExpr, and this does not include work currently to support a functional cast/isa call. Caveats include: - This clang-tidy script probably has more problems. - This only touches C++ code, so nothing that is being generated. Context: - https://mlir.llvm.org/deprecation/ at "Use the free function variants for dyn_cast/cast/isa/…" - Original discussion at https://discourse.llvm.org/t/preferred-casting-style-going-forward/68443 Implementation: This first patch was created with the following steps. The intention is to only do automated changes at first, so I waste less time if it's reverted, and so the first mass change is more clear as an example to other teams that will need to follow similar steps. Steps are described per line, as comments are removed by git: 0. Retrieve the change from the following to build clang-tidy with an additional check: https://github.com/llvm/llvm-project/compare/main...tpopp:llvm-project:tidy-cast-check 1. Build clang-tidy 2. Run clang-tidy over your entire codebase while disabling all checks and enabling the one relevant one. Run on all header files also. 3. Delete .inc files that were also modified, so the next build rebuilds them to a pure state. 4. Some changes have been deleted for the following reasons: - Some files had a variable also named cast - Some files had not included a header file that defines the cast functions - Some files are definitions of the classes that have the casting methods, so the code still refers to the method instead of the function without adding a prefix or removing the method declaration at the same time. ``` ninja -C $BUILD_DIR clang-tidy run-clang-tidy -clang-tidy-binary=$BUILD_DIR/bin/clang-tidy -checks='-,misc-cast-functions'\ -header-filter=mlir/ mlir/ -fix rm -rf $BUILD_DIR/tools/mlir/*/.inc git restore mlir/lib/IR mlir/lib/Dialect/DLTI/DLTI.cpp\ mlir/lib/Dialect/Complex/IR/ComplexDialect.cpp\ mlir/lib/**/IR/\ mlir/lib/Dialect/SparseTensor/Transforms/SparseVectorization.cpp\ mlir/lib/Dialect/Vector/Transforms/LowerVectorMultiReduction.cpp\ mlir/test/lib/Dialect/Test/TestTypes.cpp\ mlir/test/lib/Dialect/Transform/TestTransformDialectExtension.cpp\ mlir/test/lib/Dialect/Test/TestAttributes.cpp\ mlir/unittests/TableGen/EnumsGenTest.cpp\ mlir/test/python/lib/PythonTestCAPI.cpp\ mlir/include/mlir/IR/ ``` Differential Revision: https://reviews.llvm.org/D150123	2023-05-12 11:21:25 +02:00
Aart Bik	86888e420c	[mlir][sparse][gpu] generate proper memcpy in/out host and device The host registration is a convenient way to get CUDA kernels running, but it may be slow and does not work for all buffers (like global constants). This revision uses the proper alloc copy dealloc chains for buffers, using asynchronous chains to increase overlap. The host registration mechanism is kept under a flag for the output, just for experimentation purposes while this project ramps up. Reviewed By: Peiming Differential Revision: https://reviews.llvm.org/D148682	2023-04-21 09:30:42 -07:00
Aart Bik	4889214a48	[mlir][sparse][gpu] generate single module, unique kernel names This fixes a TODO in the first version. Reviewed By: Peiming Differential Revision: https://reviews.llvm.org/D148406	2023-04-15 17:25:36 -07:00
Aart Bik	19466ebc7f	[mlir][sparse][gpu] a first prototype sparse GPU code generator This implements a proof-of-concept GPU code generator to the sparse compiler pipeline, currently only capable of generating CUDA threads for outermost parallel loops. The objective, obviously, is to grow this concept to a full blown GPU code generator, capable of the right combination of code generation as well as exploiting idiomatic kernels or vector specific libraries (think cuSparse). Reviewed By: ThomasRaoux Differential Revision: https://reviews.llvm.org/D147483	2023-04-05 11:32:06 -07:00

41 Commits