As described in issue llvm/llvm-project#91518, a previous PR
llvm/llvm-project#78484 introduced the `defaultMemorySpaceFn` into
bufferization options, allowing one to inform OneShotBufferize that it
should use a specified function to derive the memory space attribute
from the encoding attribute attached to tensor types.
However, introducing this feature exposed unhandled edge cases,
examples of which are introduced by this change in the new test under
`test/Dialect/Bufferization/Transforms/one-shot-bufferize-encodings.mlir`.
Fixing the inconsistencies introduced by `defaultMemorySpaceFn` is
pretty simple. This change:
- Updates the `bufferization.to_memref` and `bufferization.to_tensor`
operations to explicitly include operand and destination types,
whereas previously they relied on type inference to deduce the
tensor types. Since the type inference cannot recover the correct
tensor encoding/memory space, the operand and result types must be
explicitly included. This is a small assembly format change, but it
touches a large number of test files.
- Makes minor updates to other bufferization functions to handle the
changes in building the above ops.
- Updates bufferization of `tensor.from_elements` to handle memory
space.
Integration/upgrade guide:
In downstream projects, if you have tests or MLIR files that explicitly
use
`bufferization.to_tensor` or `bufferization.to_memref`, then update
them to the new assembly format as follows:
```
%1 = bufferization.to_memref %0 : memref<10xf32>
%2 = bufferization.to_tensor %1 : memref<10xf32>
```
becomes
```
%1 = bufferization.to_memref %0 : tensor<10xf32> to memref<10xf32>
%2 = bufferization.to_tensor %0 : memref<10xf32> to tensor<10xf32>
```
This removes the temporary DENSE24 attribute and replaces it with proper
recognition of dense to 24 conversion. The compressionh will be
performed on the device prior to performing the matrix mult. Note that
we no longer need to start with the linalg version, we can lift this to
the proper named linalg op. Also renames some files into more consistent
names.
Previous change no longer properly used the GPU libgen pass (even though
most tests still passed falling back to CPU). This revision puts the
proper pass order into place. Also bit of a cleanup of CPU codegen vs.
libgen setup.
For all the mlir tests (except for roundtrip_coding.mlir), change the
check test to use general form of encoding
`#sparse_tensor.encoding<{{{.*}}}>` instead of actual encoding such as
`#sparse_tensor.encoding<{ lvlTypes = [ "compressed", "singleton" ] }>`.
Consistent order of ops and related methods.
Also, renamed SpGEMMGetSizeOp to SpMatGetSizeOp
since this is a general utility for sparse matrices,
not specific to GEMM ops only.
Reviewed By: Peiming
Differential Revision: https://reviews.llvm.org/D157922
Old pattern was missing some cases (e.g. swapping the arguments)
but it also allowed too many cases (e.g. non-empty "absent" or
different arguments for add/mul). This fixes the issues.
Reviewed By: K-Wu
Differential Revision: https://reviews.llvm.org/D153487
(1) minor bug fix in copy back [always nice to run stuff ;-)]
(2) run with and without lib (even though some fall back to CPU)
Reviewed By: wrengr
Differential Revision: https://reviews.llvm.org/D151507
(1) keep all cuSparse ops on single stream without wait() in right order
(2) use more type precise memref types for COO
(3) use ToTensor on resulting memref (even though it folds away again)
Reviewed By: K-Wu
Differential Revision: https://reviews.llvm.org/D151404
The GPU tests weren't updated when rebasing D150330, so this patch fixes that.
Reviewed By: anlunx
Differential Revision: https://reviews.llvm.org/D150822
This commit is part of the migration of towards the new STEA syntax/design. In particular, this commit includes the following changes:
* Renaming compiler-internal functions/methods:
* `SparseTensorEncodingAttr::{getDimLevelType => getLvlTypes}`
* `Merger::{getDimLevelType => getLvlType}` (for consistency)
* `sparse_tensor::{getDimLevelType => buildLevelType}` (to help reduce confusion vs actual getter methods)
* Renaming external facets to match:
* the STEA parser and printer
* the C and Python bindings
* PyTACO
However, the actual renaming of the `DimLevelType` itself (along with all the "dlt" names) will be handled in a separate commit.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D150330
The sparse compiler now has two prototype strategies for GPU acceleration:
* CUDA codegen: this converts sparsified code to CUDA threads
* CUDA libgen: this converts pre-sparsified code to cuSPARSE library calls
This revision introduces the first steps required for the second approach.
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D150170
The host registration is a convenient way to get CUDA kernels
running, but it may be slow and does not work for all buffer
(like global constants). This revision uses the proper alloc
copy dealloc chains for buffers, using asynchronous chains
to increase overlap. The host registration mechanism is
kept under a flag for the output, just for experimentation
purposes while this project ramps up.
Reviewed By: Peiming
Differential Revision: https://reviews.llvm.org/D148682
This implements a proof-of-concept GPU code generator
to the sparse compiler pipeline, currently only capable
of generating CUDA threads for outermost parallel loops.
The objective, obviously, is to grow this concept
to a full blown GPU code generator, capable of the
right combinaton of code generation as well as exploiting
idiomatic kernels or vector specific libraries (think cuSparse).
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D147483