369 Commits

Author SHA1 Message Date
Aart Bik
b86d3cbc12 [mlir][sparse] complete various FIXMEs in sparse support lib
Reviewed By: Peiming

Differential Revision: https://reviews.llvm.org/D159245
2023-08-30 21:30:25 -07:00
Peiming Liu
22e8d5b428 [mlir][sparse] Support strided convolution on dense level.
Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D159020
2023-08-30 20:00:50 +00:00
Peiming Liu
07bd5f20bc [mlir][sparse] Support strided convolution on compressed level.
Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D158912
2023-08-30 19:37:50 +00:00
Peiming Liu
96e1914aa2 [mlir][sparse] fix crash when generating convolution kernel with sparse input in DCCD format.
Reviewed By: aartbik, anlunx

Differential Revision: https://reviews.llvm.org/D159170
2023-08-30 17:49:36 +00:00
Yinying Li
51ebecf309 [mlir][sparse] Changed sparsity properties to use _ instead of -
Example: compressed-no -> compressed_no

Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D158567
2023-08-23 17:00:27 +00:00
Peiming Liu
8c8aecdca9 [mlir][sparse] Supporting (non)uniqueness in SparseTensorStorage::lexDiff.
Fix copied from https://reviews.llvm.org/D156946 but with a legit test case that triggers the bug.

Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D158578
2023-08-23 03:48:53 +00:00
Peiming Liu
6ca0b27298 [mlir][sparse] more complicated test for dual sparse convolution kernel.
Reviewed By: anlunx

Differential Revision: https://reviews.llvm.org/D158443
2023-08-21 18:48:01 +00:00
Andrzej Warzynski
51eaee3b42 [mlir][SparseTensor] Fix test regression
Fix a regression caused by https://reviews.llvm.org/D158012. Failing
bot:
  * https://lab.llvm.org/buildbot/#/builders/179/builds/7122

Note that both `RUN` lines in the affected file were previously
tested with similar configuration (_with_ and _without_ vectorisation).
This change restores that, though the new setting (from D158012) is
used, i.e.

  * with direct IR generation, `enable-runtime-library=true`.

This is sufficient to make the test pass and allows us to investigate
the root cause offline. Issue reported here:

  https://github.com/llvm/llvm-project/issues/64727
2023-08-16 09:37:07 +00:00
Aart Bik
30c1866dec [mlir][sparse][gpu] enable SpGEMM on GPU for libgen path
Direct IR supports pack, but libgen path did not until
this was added in https://reviews.llvm.org/D158012

Reviewed By: Peiming

Differential Revision: https://reviews.llvm.org/D158020
2023-08-15 17:16:37 -07:00
Peiming Liu
fa6726e27b [mlir][sparse] supports sparse_tensor.pack on libgen path
Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D158012
2023-08-15 20:20:54 +00:00
Benjamin Maxwell
f36e909da0 [mlir][VectorOps] Use SCF for vector.print and allow scalable vectors
Reland of the original patch after updating the Python binding tests,
a few CUDA/GPU MLIR tests, and ensuring the assembly format is
round-trippable.

This patch splits the lowering of vector.print into first converting
an n-D print into a loop of scalar prints of the elements, then a second
pass that converts those scalar prints into the runtime calls. The
former is done in VectorToSCF and the latter in VectorToLLVM.

The main reason for this is to allow printing scalable vector types,
which are not possible to fully unroll at compile time, though this
also avoids fully unrolling very large vectors.

To allow VectorToSCF to add the necessary punctuation between vectors
and elements, a "punctuation" attribute has been added to vector.print.
This abstracts calling the runtime functions such as printNewline(),
without leaking the LLVM details into the higher abstraction levels.
For example:

  vector.print punctuation <comma>

lowers to

  llvm.call @printComma() : () -> ()

The output format and runtime functions remain the same, which avoids
the need to alter a large number of tests (aside from the pipelines).
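
As a rough illustration of the two-stage lowering described above, the
following Python sketch (not the actual VectorToSCF/VectorToLLVM code)
first turns an n-D nested list into a flat sequence of scalar prints and
punctuation events, then maps each event to the name of a runtime call.
Only printNewline() and printComma() are named in this message; the other
call names (printOpen, printClose, printF64) and all helper names here are
assumptions made for illustration.

```python
def lower_to_events(vec):
    """Stage 1: walk an n-D nested list, emitting scalar prints and punctuation."""
    events = []

    def walk(v):
        if not isinstance(v, list):
            events.append(("scalar", v))
            return
        events.append(("punct", "open"))
        for i, elem in enumerate(v):
            if i > 0:
                events.append(("punct", "comma"))
            walk(elem)
        events.append(("punct", "close"))

    walk(vec)
    events.append(("punct", "newline"))
    return events


# Hypothetical mapping from punctuation kinds to runtime-call names.
RUNTIME_CALLS = {
    "open": "printOpen",
    "close": "printClose",
    "comma": "printComma",
    "newline": "printNewline",
}


def lower_to_calls(events):
    """Stage 2: map each event to the name of a runtime call."""
    calls = []
    for kind, payload in events:
        if kind == "scalar":
            calls.append(f"printF64({payload})")
        else:
            calls.append(f"{RUNTIME_CALLS[payload]}()")
    return calls


calls = lower_to_calls(lower_to_events([[1.0, 2.0], [3.0, 4.0]]))
```

In this sketch the first stage never needs to know the runtime-call names,
which mirrors the stated goal of keeping LLVM details out of the higher
abstraction level.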

Reviewed By: awarzynski, c-rhodes, aartbik

Differential Revision: https://reviews.llvm.org/D156519
2023-08-11 09:29:54 +00:00
Andrzej Warzynski
25396e1352 [mlir][test] Fix typo in a test
Remove an unnecessary `"` that prevents correct `RUN` line expansion.

Introduced in:
  * https://reviews.llvm.org/D156625

Bot failure:
  * https://lab.llvm.org/buildbot/#/builders/61/builds/47437
2023-08-11 09:37:08 +01:00
Andrzej Warzynski
23e5130ebf [mlir][test] Reland: Refactor SparseTensor CPU integration tests
CHANGES SINCE THE ORIGINAL VERSION
----------------------------------
The default test set-up was extracted from
  * SparseTensor/CPU/lit.local.cfg.
and duplicated in all tests. This is to support downstream users that
don't use these local LIT config files.

SUMMARY OF CHANGES
------------------
This patch aims to reduce test duplication. This is a direct follow-up of:
  1. https://reviews.llvm.org/D155403 (test duplication), and
  2. https://reviews.llvm.org/D155405 (code re-use).

All SVE/VLA tests are now enabled _conditionally_ and refactored to use
`mlir-cpu-runner` rather than `lli`. The former helps with test
duplication and the latter with code re-use.

A few additional refactoring changes are included.

1. To reduce verbosity, long runtime library names like:

  %mlir_native_utils_lib_dir/libmlir_c_runner_utils%shlibext

are replaced with:

  %mlir_c_runner_utils

2. In order to keep the code and the comments in sync, and to maintain
   consistency across the tests, the following:

  enable-runtime-library=true

is swapped with (and vice-versa):

  enable-runtime-library=false

Note that this change won't affect test coverage. Only a few tests
required such an update.

3. A VLS vectorization `RUN` line is added in tests where there was a
   VLA/VLS `RUN` line, but no VLS `RUN` line (with a few exceptions of
   tests that only contained one `RUN` line to begin with).

4. A few test variables are renamed/added. Most notable example:
  * `%{options}` --> `%{sparse_compiler_opts}`

TEST RUNTIME IMPROVEMENT
------------------------
TL;DR: This change improves test execution time by ~25%.

At the moment, the following `llvm-lit` invocation takes ~7.30s on my
AArch64 workstation (with SVE):

  llvm-lit  <llvm-project>/mlir/test/Integration/Dialect/SparseTensor/CPU/

This timing doesn't change no matter what the value of the following
CMake variable is (that should disable some tests):

  MLIR_RUN_ARM_SVE_TESTS

With this patch, the execution time will indeed depend on the value of
the above CMake variable:
  * with `MLIR_RUN_ARM_SVE_TESTS=true` the timing remains intact,
  * with `MLIR_RUN_ARM_SVE_TESTS=false` the timing drops to ~5.40s (~25%
    improvement).
This is expected:
  * on average there are 4 `RUN` lines per test,
  * _without this change_ (and with `MLIR_RUN_ARM_SVE_TESTS=false`) the
    4th `RUN` line would in most cases duplicate the 3rd `RUN` line,
  * _with this change_ (and with `MLIR_RUN_ARM_SVE_TESTS=false`) the
    4th `RUN` line becomes empty.
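
As a quick sanity check of the numbers above, the following Python snippet
compares the measured saving with what one would expect if one of four
`RUN` lines stops running. Treating all `RUN` lines as equally expensive is
a simplifying assumption made here, not something the measurements
establish.

```python
before_s = 7.30  # measured runtime before this patch
after_s = 5.40   # measured runtime with MLIR_RUN_ARM_SVE_TESTS=false

measured_saving = (before_s - after_s) / before_s

# Assumption: 4 RUN lines of equal cost per test, one of which becomes empty.
run_lines = 4
expected_saving = 1 / run_lines

print(f"measured: {measured_saving:.1%}, expected: {expected_saving:.1%}")
```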

PATCH SIZE
----------
While rather large and touching many files, most changes in this patch
are rather mechanical. All test configurations have been preserved and
only in a handful of cases were new `RUN` lines added.

Differential Revision: https://reviews.llvm.org/D156625
2023-08-11 08:16:01 +00:00
Aart Bik
76a80a0808 [mlir][sparse][gpu] sparsifier GPU libgen for SpGEMM in cuSparse
With working integration end-to-end test

Reviewed By: K-Wu

Differential Revision: https://reviews.llvm.org/D157652
2023-08-10 14:52:16 -07:00
Mehdi Amini
1b272d21c8 Revert "[mlir][VectorOps] Use SCF for vector.print and allow scalable vectors"
This reverts commit 490dae26cb3bee2e8401e4c2a7ad3e0996be67d0.

Bot is broken, seems like there is a problem of ambiguity in the parser.
2023-08-09 19:37:01 -07:00
Benjamin Maxwell
490dae26cb [mlir][VectorOps] Use SCF for vector.print and allow scalable vectors
Reland of the original patch after updating the Python binding tests and
a few CUDA/GPU MLIR tests.

This patch splits the lowering of vector.print into first converting
an n-D print into a loop of scalar prints of the elements, then a second
pass that converts those scalar prints into the runtime calls. The
former is done in VectorToSCF and the latter in VectorToLLVM.

The main reason for this is to allow printing scalable vector types,
which are not possible to fully unroll at compile time, though this
also avoids fully unrolling very large vectors.

To allow VectorToSCF to add the necessary punctuation between vectors
and elements, a "punctuation" attribute has been added to vector.print.
This abstracts calling the runtime functions such as printNewline(),
without leaking the LLVM details into the higher abstraction levels.
For example:

  vector.print <comma>

lowers to

  llvm.call @printComma() : () -> ()

The output format and runtime functions remain the same, which avoids
the need to alter a large number of tests (aside from the pipelines).

Reviewed By: awarzynski, c-rhodes, aartbik

Differential Revision: https://reviews.llvm.org/D156519
2023-08-09 11:47:18 +00:00
Aart Bik
5a1f87f9fc Revert "[mlir][test] Refactor SparseTensor CPU integration tests"
This reverts commit e77e891d8953b487f5f06bf69225a61ef537f766.

Differential Revision: https://reviews.llvm.org/D156947
2023-08-02 15:46:41 -07:00
Andrzej Warzynski
e77e891d89 [mlir][test] Refactor SparseTensor CPU integration tests
SUMMARY OF CHANGES
------------------
This patch aims to reduce test duplication and to improve code re-use in
SparseTensor integration tests for CPU. This is a direct follow-up of:
  1. https://reviews.llvm.org/D155403 (test duplication), and
  2. https://reviews.llvm.org/D155405 (code re-use).

The key logic for this patch is implemented in:
  * SparseTensor/CPU/lit.local.cfg.
Essentially, the set-up that used to be repeated across all test files
has been extracted into a common LIT configuration file. This makes code
re-use straightforward.

All SVE/VLA tests are now enabled _conditionally_ and refactored to use
`mlir-cpu-runner` rather than `lli`. The former helps with test
duplication and the latter with code re-use.

A few additional refactoring changes are included.

1. To reduce verbosity, long runtime library names like:

  %mlir_native_utils_lib_dir/libmlir_c_runner_utils%shlibext

are replaced with:

  %mlir_c_runner_utils

2. In order to keep the code and the comments in sync, and to maintain
   consistency across the tests, the following:

  enable-runtime-library=true

is swapped with (and vice-versa):

  enable-runtime-library=false

Note that this change won't affect test coverage. Only a few tests
required such an update.

3. A VLS vectorization `RUN` line is added in tests where there was a
   VLA/VLS `RUN` line, but no VLS `RUN` line (with a few exceptions of
   tests that only contained one `RUN` line to begin with).

4. A few test variables are renamed/added. Most notable example:
  * `%{options}` --> `%{sparse_compiler_opts}`

TEST RUNTIME IMPROVEMENT
------------------------
TL;DR: This change improves test execution time by ~25%.

At the moment, the following `llvm-lit` invocation takes ~7.30s on my
AArch64 workstation (with SVE):

  llvm-lit  <llvm-project>/mlir/test/Integration/Dialect/SparseTensor/CPU/

This timing doesn't change no matter what the value of the following
CMake variable is (that should disable some tests):

  MLIR_RUN_ARM_SVE_TESTS

With this patch, the execution time will indeed depend on the value of
the above CMake variable:
  * with `MLIR_RUN_ARM_SVE_TESTS=true` the timing remains intact,
  * with `MLIR_RUN_ARM_SVE_TESTS=false` the timing drops to ~5.40s (~25%
    improvement).
This is expected:
  * on average there are 4 `RUN` lines per test,
  * _without this change_ (and with `MLIR_RUN_ARM_SVE_TESTS=false`) the
    4th `RUN` line would in most cases duplicate the 3rd `RUN` line,
  * _with this change_ (and with `MLIR_RUN_ARM_SVE_TESTS=false`) the
    4th `RUN` line becomes empty.

PATCH SIZE
----------
While rather large and touching many files, most changes in this patch
are rather mechanical. All test configurations have been preserved and
only in a handful of cases were new `RUN` lines added.

Differential Revision: https://reviews.llvm.org/D156625
2023-08-02 20:21:50 +00:00
K-Wu
cfa82f7783 [mlir][sparse][gpu] introduce flag that controls host to device copy strategies (regular dma default)
Differential Revision: https://reviews.llvm.org/D155352
2023-08-01 22:30:40 +00:00
Kun Wu
1e491c425b [mlir][sparse][gpu] add 2:4 spmm prune_and_check flag
Differential Revision: https://reviews.llvm.org/D155909
2023-08-01 18:24:18 +00:00
Andrzej Warzynski
e62f366b01 [mlir] Update SVE integration tests to use mlir-cpu-runner
With the recent addition of "-mattr" and "-march" to the list of options
supported by mlir-cpu-runner [1], the SVE integration
tests can be updated to use mlir-cpu-runner instead of lli. This will
allow better code re-use and more consistency.

This patch updates 2 tests to demonstrate the new logic. The remaining
tests will be updated in the follow-up patches.

[1] https://reviews.llvm.org/D146917

Depends on D155403

Differential Revision: https://reviews.llvm.org/D155405
2023-07-19 08:29:17 +00:00
Andrzej Warzynski
aa9a10ac1d [mlir][SparseTensor][ArmSVE] Conditionally disable SVE RUN line
This patch updates one SparseTensor integration test so that the VLA
vectorisation is run conditionally based on the value of the
MLIR_RUN_ARM_SVE_TESTS CMake variable.

This change opens the path to reduce the duplication of RUN lines in
"mlir/test/Integration/Dialect/SparseTensor/CPU/". ATM, there are
usually 2 RUN lines to test vectorization in SparseTensor integration
tests:
  * one for VLS vectorisation,
  * one for VLA vectorisation whenever that's available and which
    reduces to VLS vectorisation when VLA is not supported.
When VLA is not available, VLS vectorisation is verified twice. This
duplication should be avoided - integration tests are relatively
expensive to run.

This patch makes sure that the 2nd vectorisation RUN line becomes:
```
  if (SVE integration tests are enabled)
    run VLA vectorisation
  else
    return
```
This logic is implemented using LIT's (relatively new) conditional
substitution [1]. It enables us to guarantee that all RUN lines are
unique and that the VLA vectorisation is only enabled when supported.
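
The intent of the conditional `RUN` line can be modelled in a few lines of
Python. This is only a sketch of the behaviour described above, not LIT's
implementation or syntax; the function name and the command strings are
placeholders invented for illustration.

```python
def vectorisation_run_lines(sve_tests_enabled):
    """Return the vectorisation RUN commands for one test (illustrative only)."""
    run_lines = ["run VLS vectorisation"]
    # The second RUN line is conditional: it expands to the VLA command
    # when SVE integration tests are enabled, and to nothing otherwise.
    vla = "run VLA vectorisation" if sve_tests_enabled else ""
    if vla:
        run_lines.append(vla)
    return run_lines


with_sve = vectorisation_run_lines(True)
without_sve = vectorisation_run_lines(False)
```

In both cases every command in the returned list is unique, so VLS
vectorisation is never verified twice.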

This patch updates only 1 test to set-up and to demonstrate the logic.
Subsequent patches will update the remaining tests.

[1] https://www.llvm.org/docs/TestingGuide.html

Differential Revision: https://reviews.llvm.org/D155403
2023-07-18 06:59:08 +00:00
Kun Wu
d46bad7b55 [mlir][sparse][gpu] add the 2:4 spmm integration test from linalg
Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D155351
2023-07-15 06:01:03 +00:00
Aart Bik
4df01dc270 [mlir][sparse][gpu][nvidia] add pruning step and check to 2:4 matrix multiplication
(1) without the check, the results may silently be wrong, so check is needed
(2) add pruning step to guarantee 2:4 property

Note, in the longer run, we may want to split out the pruning step somehow,
or make it optional.
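
To make the 2:4 property concrete, here is a minimal Python sketch of a
pruning step followed by a check. It assumes that pruning keeps the two
largest-magnitude values in every group of four; this message does not say
how the real pruning step selects values, and the function names are
hypothetical.

```python
def prune_2_4(row):
    """Keep the 2 largest-magnitude values in each group of 4; zero the rest."""
    assert len(row) % 4 == 0, "row length must be a multiple of 4"
    pruned = []
    for start in range(0, len(row), 4):
        group = row[start:start + 4]
        # Indices of the two entries with the largest absolute value.
        keep = sorted(range(4), key=lambda i: abs(group[i]), reverse=True)[:2]
        pruned.extend(v if i in keep else 0.0 for i, v in enumerate(group))
    return pruned


def is_2_4(row):
    """Check that every group of 4 holds at most 2 nonzeros."""
    return all(
        sum(1 for v in row[s:s + 4] if v != 0) <= 2
        for s in range(0, len(row), 4)
    )


row = [1.0, -5.0, 3.0, 0.5, 0.0, 2.0, 0.0, 0.0]
pruned = prune_2_4(row)
```

Here `is_2_4(row)` is false because the first group holds four nonzeros,
while `is_2_4(pruned)` is true. Running the check after pruning is what
guards against silently wrong results.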

Reviewed By: K-Wu

Differential Revision: https://reviews.llvm.org/D155320
2023-07-14 12:08:13 -07:00
Aart Bik
f6f817d0d7 [mlir][sparse][gpu] minor improvements in 2:4 example
Reviewed By: K-Wu

Differential Revision: https://reviews.llvm.org/D155244
2023-07-13 16:20:27 -07:00
Guray Ozen
22a32f7d9c [mlir][gpu] Add dump-ptx option
When targeting NVIDIA GPUs, seeing the generated PTX is important. Currently, we don't have a simple way to do it.

This work adds dump-ptx to the gpu-to-cubin pass. One can use it like `gpu-to-cubin{chip=sm_90 features=+ptx80 dump-ptx}`.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D155166
2023-07-13 21:14:57 +02:00
Peiming Liu
fc5d8fce7d [mlir][sparse] support dual sparse convolution.
Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D152601
2023-07-10 16:49:32 +00:00
Kun Wu
be2dd22b8f [mlir][sparse][gpu] reuse CUDA environment handle throughout instance lifetime
Differential Revision: https://reviews.llvm.org/D153173
2023-06-30 21:52:34 +00:00
Peiming Liu
a63d6a0014 [mlir][sparse] make UnpackOp return the actual filled length of unpacked memory
This might simplify frontend implementation by avoiding recomputation for the same value.

Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D154244
2023-06-30 21:35:15 +00:00
Peiming Liu
e7df82816b [mlir][sparse] rewrite arith::SelectOp to semiring operations to sparsify it.
Reviewed By: aartbik, K-Wu

Differential Revision: https://reviews.llvm.org/D153397
2023-06-21 21:22:18 +00:00
Aart Bik
cdbdf93bf0 [mlir][sparse][gpu] extend SDDMM gpu test
Reviewed By: K-Wu

Differential Revision: https://reviews.llvm.org/D153378
2023-06-20 16:12:12 -07:00
Kun Wu
632ccc538c [mlir][sparse][gpu] remove tuple as one of the spmm_buffer_size output type
Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D153188
2023-06-19 15:57:50 +00:00
Kun Wu
9167dd46ba [mlir][sparse][gpu] recognizing sddmm pattern in GPU libgen path
Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D151582
2023-06-15 23:48:11 +00:00
Kun Wu
b1c683f5c4 [mlir][sparse][gpu] enable sm80+ sparsity integration test only when explicitly set
Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D152966
2023-06-15 17:44:38 +00:00
Peiming Liu
faf7cd97d0 [mlir][sparse] merger extension to support sparsifying arith::CmpI/CmpF operation
Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D152761
2023-06-15 17:26:50 +00:00
Kun Wu
8f3fcbc687 [mlir][sparse][GPU] add 2:4 integration test
Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D152287
2023-06-13 02:10:26 +00:00
Aart Bik
80fe3168b5 [mlir][sparse] add support for direct prod/and/min/max reductions
We recently fixed a bug in "sparsifying" such reductions, since
it incorrectly changed this into reductions over stored elements
only, which only works for add/sub/or/xor. However, we still want
to be able to "sparsify" the reductions even in the general case,
and this is a first step by rewriting them into a custom reduction
that feeds in the implicit zeros. NOTE HOWEVER, that in the long run
we want to do this better and feed in any implicit zero only ONCE
for efficiency.
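
The longer-term idea of feeding in an implicit zero only once can be
sketched in Python as follows. This illustrates the idea only, not what
this patch generates; the function name and the (stored values, size)
representation of a sparse vector are assumptions.

```python
from functools import reduce


def sparse_reduce(stored_values, size, op, identity):
    """Reduce stored values, then feed in a single implicit zero if any exist."""
    acc = reduce(op, stored_values, identity)
    if len(stored_values) < size:
        # At least one implicit zero is present. For prod/and/min/max,
        # applying it once gives the same result as applying it for
        # every implicit zero.
        acc = op(acc, 0.0)
    return acc


prod_with_zeros = sparse_reduce([2.0, 3.0], 4, lambda a, b: a * b, 1.0)
min_with_zeros = sparse_reduce([2.0, 3.0], 4, min, float("inf"))
prod_all_stored = sparse_reduce([2.0, 3.0], 2, lambda a, b: a * b, 1.0)
```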

Reviewed By: Peiming

Differential Revision: https://reviews.llvm.org/D152580
2023-06-12 09:27:47 -07:00
Aart Bik
e2167d89db [mlir][sparse] refine absent branch feeding into custom op
Document better that unary/binary may only feed to the output
or the input of a custom reduction (not even a regular reduction
since it may have "no value"!). Also fixes a bug when present
branch is empty and feeds into custom reduction.

Reviewed By: Peiming

Differential Revision: https://reviews.llvm.org/D152224
2023-06-06 09:57:15 -07:00
Peiming Liu
23dc96bbe4 [mlir][sparse] fix crashes when using custom reduce with unary operation.
The test case is directly copied from https://reviews.llvm.org/D152179 authored by @aartbik

Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D152204
2023-06-05 23:41:26 +00:00
Peiming Liu
e7b4c93f5e [mlir][sparse] fix crash when using sparse_tensor::UnaryOp and ReduceOp.
Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D152048
2023-06-03 01:19:05 +00:00
Aart Bik
6a38c772d4 [mlir][sparse] fixed bug with unary op, dense output
Note that by sparse compiler convention, dense output
is zeroed out when not set, so complement results in
zeros where elements were present.

Reviewed By: wrengr

Differential Revision: https://reviews.llvm.org/D152046
2023-06-02 18:15:33 -07:00
Peiming Liu
ce6f8c5afe [mlir][sparse] fix various bugs to support sparse pooling
Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D151776
2023-06-02 17:34:47 +00:00
Aart Bik
378f1885e3 [mlir][sparse] enhance sparse reduction support
Formerly, we accepted and/prod reductions as a standard
reduction but these change the semantics after sparsification
by not looking at implicit zeros. Therefore, we only accept
standard reductions that are insensitive to implicit vs.
explicit zeros, and leave the more complex reductions to
the sparse_tensor.reduce custom reduction implementation.
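
A small worked example in Python shows why a sum is insensitive to
implicit zeros while a product is not. The dictionary-based storage below
is only a stand-in for a sparse tensor and is not how the sparse compiler
stores data.

```python
import math

dense = [2.0, 0.0, 3.0, 0.0]
# Sparse storage keeps only the nonzero entries (index -> value).
stored = {0: 2.0, 2: 3.0}

# Sum: skipping the implicit zeros does not change the result.
sum_stored = sum(stored.values())
sum_dense = sum(dense)

# Product: skipping the implicit zeros changes the result.
prod_stored = math.prod(stored.values())
prod_dense = math.prod(dense)
```

Both sums are 5.0, but the product over stored elements is 6.0 while the
product over all elements is 0.0, which is the semantic change described
above.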

Reviewed By: Peiming

Differential Revision: https://reviews.llvm.org/D151929
2023-06-01 16:30:21 -07:00
Peiming Liu
54ac02dd16 [mlir][sparse] fix crashes when generating conv_2d_nchw_fchw with Compressed Dense Compressed Dense sparse encoding.
Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D151773
2023-05-31 18:06:01 +00:00
wren romano
540d5e0ce6 [mlir][sparse] Updating STEA parser/printer to use the name "dimSlices"
Depends On D151505

Reviewed By: Peiming

Differential Revision: https://reviews.llvm.org/D151513
2023-05-30 15:50:07 -07:00
wren romano
76647fce13 [mlir][sparse] Combining dimOrdering+higherOrdering fields into dimToLvl
This is a major step along the way towards the new STEA design.  While a great deal of this patch is simple renaming, there are several significant changes as well.  I've done my best to ensure that this patch retains the previous behavior and error-conditions, even though those are at odds with the eventual intended semantics of the `dimToLvl` mapping.  Since the majority of the compiler does not yet support non-permutations, I've also added explicit assertions in places that previously had implicitly assumed it was dealing with permutations.

Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D151505
2023-05-30 15:19:50 -07:00
Peiming Liu
db7f639b90 [mlir][sparse] fix a crash when generating sparse convolution with nchw input
Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D151744
2023-05-30 20:16:54 +00:00
Tobias Hieta
f9008e6366 [NFC][Py Reformat] Reformat python files in mlir subdir
This is an ongoing series of commits that are reformatting our
Python code.

Reformatting is done with `black`.

If you end up having problems merging this commit because you
have made changes to a python file, the best way to handle that
is to run git checkout --ours <yourfile> and then reformat it
with black.

If you run into any problems, post to discourse about it and
we will try to help.

RFC Thread below:

https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style

Differential Revision: https://reviews.llvm.org/D150782
2023-05-26 08:05:40 +02:00
Aart Bik
22caafc9f3 [mlir][sparse][gpu] end to end test for matmul
(1) minor bug fix in copy back [always nice to run stuff ;-)]
(2) run with and without lib (even though some fall back to CPU)

Reviewed By: wrengr

Differential Revision: https://reviews.llvm.org/D151507
2023-05-25 16:10:22 -07:00
Peiming Liu
f7b8b005ff [mlir][sparse] fix bugs when computing the memory size when lowering pack op.
Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D151481
2023-05-25 19:19:52 +00:00