Updates:
1. Infer lvlToDim from dimToLvl
2. Add more tests for block sparsity
3. Finish TODOs related to lvlToDim, including adding lvlToDim to the
Python binding
Verification of a user-provided lvlToDim will be implemented in the
next PR.
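To illustrate item 1, the sketch below models the inference in Python for two simple cases. It is not the C++ code from this patch: the helper names, the list encoding of a permutation map, and the 2x3 block size are all assumptions made for the example.

```python
def infer_lvl_to_dim(dim_to_lvl):
    """Invert a permutation-only dimToLvl map (hypothetical helper).

    dim_to_lvl[l] == d means level l stores dimension d, so
    [2, 0, 1] models (d0, d1, d2) -> (d2, d0, d1).
    The result r satisfies: dimension d is read from level r[d].
    """
    rank = len(dim_to_lvl)
    if sorted(dim_to_lvl) != list(range(rank)):
        raise ValueError("not a permutation; cannot invert this way")
    lvl_to_dim = [0] * rank
    for lvl, dim in enumerate(dim_to_lvl):
        # Dimension `dim` lives at level `lvl`.
        lvl_to_dim[dim] = lvl
    return lvl_to_dim


# Block-sparse case with an assumed 2x3 block:
#   dimToLvl: (i, j) -> (i floordiv 2, j floordiv 3, i mod 2, j mod 3)
#   lvlToDim: (ii, jj, bi, bj) -> (ii * 2 + bi, jj * 3 + bj)
def bsr_dim_to_lvl(i, j, bh=2, bw=3):
    return (i // bh, j // bw, i % bh, j % bw)


def bsr_lvl_to_dim(ii, jj, bi, bj, bh=2, bw=3):
    return (ii * bh + bi, jj * bw + bj)
```

Applying `bsr_lvl_to_dim` to the output of `bsr_dim_to_lvl` returns the original `(i, j)`, which is the property an inferred lvlToDim has to satisfy.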
The vector.extract assembly format currently only contains the source
type, for example:
%1 = vector.extract %0[1] : vector<3x7x8xf32>
so it's not immediately obvious whether this is the source or result type. This
patch improves the assembly format to make this clearer, so the above
becomes:
%1 = vector.extract %0[1] : vector<7x8xf32> from vector<3x7x8xf32>
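The relation between the two types in the new syntax can be modelled as follows. This is a Python sketch rather than MLIR code; `extract_result_type` and `format_extract` are hypothetical helpers, and only static shapes with a constant position list are handled.

```python
def extract_result_type(shape, elem, position):
    """Return the type string obtained by extracting at `position`.

    shape:    source vector shape, e.g. [3, 7, 8]
    elem:     element type name, e.g. "f32"
    position: one index per leading dimension that is peeled off
    """
    if len(position) > len(shape):
        raise ValueError("more indices than dimensions")
    for idx, size in zip(position, shape):
        if not 0 <= idx < size:
            raise ValueError(f"index {idx} out of bounds for size {size}")
    rest = shape[len(position):]
    if not rest:
        # Every dimension was indexed: the result is a scalar element.
        return elem
    return "vector<" + "x".join(map(str, rest)) + "x" + elem + ">"


def format_extract(shape, elem, position):
    """Print the op in the `<result> from <source>` form."""
    src = "vector<" + "x".join(map(str, shape)) + "x" + elem + ">"
    res = extract_result_type(shape, elem, position)
    idx = ", ".join(map(str, position))
    return f"vector.extract %0[{idx}] : {res} from {src}"
```

For example, `format_extract([3, 7, 8], "f32", [1])` reproduces the line shown above.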
In the new syntax, we will parse **loose_compressed** as
**CompressedWithHigh** and **block2_4** as **TwoOutOfFour** level
format. Currently, we support unique and order as level properties.
Rationale:
A bufferization.alloc_tensor can be directly replaced
with tensor.empty since these are more or less semantically
equivalent. The latter is considered a bit more "pure"
with respect to SSA semantics.
This patch adds an NVPTX compilation path that enables JIT compilation
on NVIDIA targets. The following modifications were performed:
1. Adding a format field to the GPU object attribute, allowing the
translation attribute to use the correct runtime function to load the
module. Likewise, a dictionary attribute was added to hold any
extra options.
2. Adding the `createObject` method to `GPUTargetAttrInterface`; this
method returns a GPU object from a binary string.
3. Adding the function `mgpuModuleLoadJIT`, which is only available for
NVIDIA GPUs, as there is no equivalent for AMD.
4. Adding the CMake flag `MLIR_GPU_COMPILATION_TEST_FORMAT` to specify
the format to use during testing.
Rationale:
This test was a really fun way to compare the MLIR sparsifier with TACO using
the PyTACO format. However, the underlying mechanism is rapidly growing
outdated with our recent developments. Rather than maintaining the old
code, we are moving toward the newer, better approaches. So if you are
sad this is gone, stay tuned, something better is coming!
Rationale:
This was actually just a pure "string based" test
with very little actual python usage. The output
sparse tensor was handled via the deprecated
convertFromMLIRSparseTensor method.
The revert happened due to a build bot failure that threw 'CUDA_ERROR_UNSUPPORTED_PTX_VERSION'.
The failure's root cause was a pass using "+ptx76" for compilation and an old CUDA driver
on the bot. This commit relands the patch with "+ptx60".
Original GitHub PR: #65768
Original commit message:
Migrate tests referencing `gpu-to-cubin` to the new compilation workflow
using `TargetAttrs`. The `test-lower-to-nvvm` pass pipeline was modified
to use the new compilation workflow to simplify the introduction of
future tests.
The `createLowerGpuOpsToNVVMOpsPass` function was removed, as it didn't
allow for passing all options available in the `ConvertGpuOpsToNVVMOp`
pass.
Fix a regression caused by https://reviews.llvm.org/D158012. Failing
bot:
* https://lab.llvm.org/buildbot/#/builders/179/builds/7122
Note that both `RUN` lines in the affected file were previously
tested with similar configuration (_with_ and _without_ vectorisation).
This change restores that, though the new setting (from D158012) is
used, i.e.
* with direct IR generation, `enable-runtime-library=true`.
This is sufficient to make the test pass and allows us to investigate
the root cause offline. Issue reported here:
https://github.com/llvm/llvm-project/issues/64727
Reland of the original patch after updating the Python binding tests,
a few CUDA/GPU MLIR tests, and ensuring the assembly format is
round-trippable.
This patch splits the lowering of vector.print into first converting
an n-D print into a loop of scalar prints of the elements, then a second
pass that converts those scalar prints into the runtime calls. The
former is done in VectorToSCF and the latter in VectorToLLVM.
The main reason for this is to allow printing scalable vector types,
which are not possible to fully unroll at compile time, though this
also avoids fully unrolling very large vectors.
To allow VectorToSCF to add the necessary punctuation between vectors
and elements, a "punctuation" attribute has been added to vector.print.
This abstracts calling the runtime functions such as printNewline(),
without leaking the LLVM details into the higher abstraction levels.
For example:
vector.print punctuation <comma>
lowers to
llvm.call @printComma() : () -> ()
The output format and runtime functions remain the same, which avoids
the need to alter a large number of tests (aside from the pipelines).
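The two-stage structure can be modelled in a few lines of Python. This is a sketch of the idea only, not the VectorToSCF/VectorToLLVM code: the event tuples and the `lower_print`, `lower_top_level`, and `run_events` helpers are hypothetical, and the spacing of the printed text is an assumption of this model.

```python
def lower_print(value):
    """Stage 1 (the role of VectorToSCF in the text): turn an n-D print
    into a stream of scalar prints and punctuation events. Nested lists
    stand in for vectors; the loops run at execution time, so nothing
    has to be unrolled ahead of time."""
    if not isinstance(value, list):
        yield ("scalar", value)
        return
    yield ("punct", "open")
    for i, elem in enumerate(value):
        if i:
            yield ("punct", "comma")
        yield from lower_print(elem)
    yield ("punct", "close")


def lower_top_level(value):
    """A full print is the element stream followed by a newline."""
    yield from lower_print(value)
    yield ("punct", "newline")


# Stage 2 (the role of VectorToLLVM in the text): map every event to a
# "runtime call". Plain strings stand in for the runtime functions here.
PUNCTUATION = {"open": "( ", "close": " )", "comma": ", ", "newline": "\n"}


def run_events(events):
    out = []
    for kind, payload in events:
        out.append(str(payload) if kind == "scalar" else PUNCTUATION[payload])
    return "".join(out)
```

Because stage 1 only emits scalar prints and punctuation, stage 2 never needs to know the vector shape, which is what makes scalable vectors printable in this scheme.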
Reviewed By: awarzynski, c-rhodes, aartbik
Differential Revision: https://reviews.llvm.org/D156519
CHANGES SINCE THE ORIGINAL VERSION
----------------------------------
The default test set-up was extracted from
* SparseTensor/CPU/lit.local.cfg.
and duplicated in all tests. This is to support downstream users that
don't use these local LIT config files.
SUMMARY OF CHANGES
------------------
This patch aims to reduce test duplication. This is a direct follow-up of:
1. https://reviews.llvm.org/D155403 (test duplication), and
2. https://reviews.llvm.org/D155405 (code re-use).
All SVE/VLA tests are now enabled _conditionally_ and refactored to use
`mlir-cpu-runner` rather than `lli`. The former helps with test
duplication and the latter with code re-use.
A few additional refactoring changes are included.
1. To reduce verbosity, long runtime library names like:
%mlir_native_utils_lib_dir/libmlir_c_runner_utils%shlibext
are replaced with:
%mlir_c_runner_utils
2. In order to keep the code and the comments in sync, and to maintain
consistency across the tests, the following:
enable-runtime-library=true
is swapped with (and vice-versa):
enable-runtime-library=false
Note that this change won't affect test coverage. Only a few tests
required such an update.
3. A VLS vectorization `RUN` line is added in tests where there was a
VLA/VLS `RUN` line, but no VLS `RUN` line (with a few exceptions of
tests that only contained one `RUN` line to begin with).
4. A few test variables are renamed/added. Most notable example:
* `%{options}` --> `%{sparse_compiler_opts}`
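Item 1 relies on a LIT substitution. The sketch below models how such a substitution expands inside a `RUN` line. It is plain Python rather than a real LIT configuration file: the library directory `/build/lib`, the `.so` extension, and the `expand` helper are assumptions for illustration.

```python
# Hypothetical values; a real build supplies its own path and extension.
LIB_DIR = "/build/lib"
SHLIB_EXT = ".so"

# (key, replacement) pairs, applied in order. In an actual LIT config
# such pairs are appended to `config.substitutions`.
SUBSTITUTIONS = [
    ("%mlir_c_runner_utils", f"{LIB_DIR}/libmlir_c_runner_utils{SHLIB_EXT}"),
]


def expand(run_line, substitutions=SUBSTITUTIONS):
    """Replace every substitution key found in a RUN line."""
    for key, value in substitutions:
        run_line = run_line.replace(key, value)
    return run_line
```

With this in place, a test only needs to write `-shared-libs=%mlir_c_runner_utils` instead of spelling out the directory and extension variables.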
TEST RUNTIME IMPROVEMENT
------------------------
TL;DR: This change improves test execution time by ~25%.
At the moment, the following `llvm-lit` invocation takes ~7.30s on my
AArch64 workstation (with SVE):
llvm-lit <llvm-project>/mlir/test/Integration/Dialect/SparseTensor/CPU/
This timing doesn't change no matter what the value of the following
CMake variable is (that should disable some tests):
MLIR_RUN_ARM_SVE_TESTS
With this patch, the execution time will indeed depend on the value of
the above CMake variable:
* with `MLIR_RUN_ARM_SVE_TESTS=true` the timing remains intact,
* with `MLIR_RUN_ARM_SVE_TESTS=false` the timing drops to ~5.40s (~25%
improvement).
This is expected:
* on average there are 4 `RUN` lines per test,
* _without this change_ (and with `MLIR_RUN_ARM_SVE_TESTS=false`) the
4th `RUN` line would in most cases duplicate the 3rd `RUN` line,
* _with this change_ (and with `MLIR_RUN_ARM_SVE_TESTS=false`) the
4th `RUN` line becomes empty.
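The ~25% figure can be checked against the numbers quoted above. The snippet below is only arithmetic on those two timings plus the "4 `RUN` lines per test" estimate; treating every `RUN` line as equally expensive is a simplifying assumption.

```python
before_s = 7.30  # timing quoted above, before this patch
after_s = 5.40   # timing quoted above, MLIR_RUN_ARM_SVE_TESTS=false

# Relative saving observed on the workstation.
measured = (before_s - after_s) / before_s

# If one of four equally expensive RUN lines becomes empty,
# the expected saving is one quarter.
run_lines = 4
expected = 1 / run_lines

print(f"measured: {measured:.1%}, expected: {expected:.1%}")
```

The measured saving comes out slightly above one quarter, which is consistent with the explanation given in the list above.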
PATCH SIZE
----------
While rather large and touching many files, most changes in this patch
are rather mechanical. All test configurations have been preserved, and
new `RUN` lines were added in only a handful of cases.
Differential Revision: https://reviews.llvm.org/D156625
Reland of the original patch after updating the Python binding tests and
a few CUDA/GPU MLIR tests.
This patch splits the lowering of vector.print into first converting
an n-D print into a loop of scalar prints of the elements, then a second
pass that converts those scalar prints into the runtime calls. The
former is done in VectorToSCF and the latter in VectorToLLVM.
The main reason for this is to allow printing scalable vector types,
which are not possible to fully unroll at compile time, though this
also avoids fully unrolling very large vectors.
To allow VectorToSCF to add the necessary punctuation between vectors
and elements, a "punctuation" attribute has been added to vector.print.
This abstracts calling the runtime functions such as printNewline(),
without leaking the LLVM details into the higher abstraction levels.
For example:
vector.print <comma>
lowers to
llvm.call @printComma() : () -> ()
The output format and runtime functions remain the same, which avoids
the need to alter a large number of tests (aside from the pipelines).
Reviewed By: awarzynski, c-rhodes, aartbik
Differential Revision: https://reviews.llvm.org/D156519
SUMMARY OF CHANGES
------------------
This patch aims to reduce test duplication and to improve code re-use in
SparseTensor integration tests for CPU. This is a direct follow-up of:
1. https://reviews.llvm.org/D155403 (test duplication), and
2. https://reviews.llvm.org/D155405 (code re-use).
The key logic for this patch is implemented in:
* SparseTensor/CPU/lit.local.cfg.
Essentially, the set-up that used to be repeated across all test files
has been extracted into a common LIT configuration file. This makes code
re-use straightforward.
All SVE/VLA tests are now enabled _conditionally_ and refactored to use
`mlir-cpu-runner` rather than `lli`. The former helps with test
duplication and the latter with code re-use.
A few additional refactoring changes are included.
1. To reduce verbosity, long runtime library names like:
%mlir_native_utils_lib_dir/libmlir_c_runner_utils%shlibext
are replaced with:
%mlir_c_runner_utils
2. In order to keep the code and the comments in sync, and to maintain
consistency across the tests, the following:
enable-runtime-library=true
is swapped with (and vice-versa):
enable-runtime-library=false
Note that this change won't affect test coverage. Only a few tests
required such an update.
3. A VLS vectorization `RUN` line is added in tests where there was a
VLA/VLS `RUN` line, but no VLS `RUN` line (with a few exceptions of
tests that only contained one `RUN` line to begin with).
4. A few test variables are renamed/added. Most notable example:
* `%{options}` --> `%{sparse_compiler_opts}`
TEST RUNTIME IMPROVEMENT
------------------------
TL;DR: This change improves test execution time by ~25%.
At the moment, the following `llvm-lit` invocation takes ~7.30s on my
AArch64 workstation (with SVE):
llvm-lit <llvm-project>/mlir/test/Integration/Dialect/SparseTensor/CPU/
This timing doesn't change no matter what the value of the following
CMake variable is (that should disable some tests):
MLIR_RUN_ARM_SVE_TESTS
With this patch, the execution time will indeed depend on the value of
the above CMake variable:
* with `MLIR_RUN_ARM_SVE_TESTS=true` the timing remains intact,
* with `MLIR_RUN_ARM_SVE_TESTS=false` the timing drops to ~5.40s (~25%
improvement).
This is expected:
* on average there are 4 `RUN` lines per test,
* _without this change_ (and with `MLIR_RUN_ARM_SVE_TESTS=false`) the
4th `RUN` line would in most cases duplicate the 3rd `RUN` line,
* _with this change_ (and with `MLIR_RUN_ARM_SVE_TESTS=false`) the
4th `RUN` line becomes empty.
PATCH SIZE
----------
While rather large and touching many files, most changes in this patch
are rather mechanical. All test configurations have been preserved, and
new `RUN` lines were added in only a handful of cases.
Differential Revision: https://reviews.llvm.org/D156625