Registering the SparsificationAndBufferization into a proper TD pass
has the advantage that it can be invoked and tested in isolation. This
change also moves some bufferization specific set up from the pipeline
file into the pass file, keeping the logic more locally.
Reviewed By: Peiming
Differential Revision: https://reviews.llvm.org/D158219
The sparse compiler now has two prototype strategies for GPU acceleration:
* CUDA codegen: this converts sparsified code to CUDA threads
* CUDA libgen: this converts pre-sparsified code to cuSPARSE library calls
This revision introduces the first steps required for the second approach.
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D150170
The op is not bufferizable but should be analyzable (for `EliminateEmptyTensors`, which uses the bufferization infrastructure).
Also improve debugging functionality and error messages.
Also adds a missing pass to the sparse pipeline. (tensor.empty should be replaced with bufferization.alloc_tensor, but it sometimes used to work without depending on how the tensor.empty is used. Now we always fail explicitly.)
TensorCopyInsertion should not have been exposed as a pass. This was a flaw in the original design. It is a preparation step for bufferization and certain transforms (that would otherwise be legal) are illegal between TensorCopyInsertion and actual rewrite to MemRef ops. Therefore, even if broken down as two separate steps internally, they should be exposed as a single pass.
This change affects the sparse compiler, which uses `TensorCopyInsertionPass`. A new `SparsificationAndBufferizationPass` is added to replace all passes in the sparse tensor pipeline from `TensorCopyInsertionPass` until the actual bufferization (rewrite to memref/non-tensor). It is generally unsafe to run arbitrary passes in-between, in particular passes that hoist tensor ops out of loops or change SSA use-def chains along tensor ops.
Differential Revision: https://reviews.llvm.org/D138915