This commit fixes memory leaks in sparse tensor integration tests by
adding `bufferization.dealloc_tensor` ops.
Note: Buffer deallocation will be automated in the future with the
ownership-based buffer deallocation pass, making `dealloc_tensor`
obsolete (only codegen path, not when using the runtime library).
Rationale:
A bufferization.alloc_tensor can be directly replaced
with tensor.empty since these are more or less semantically
equivalent. The latter is considered a bit more "pure"
with respect to SSA semantics.
CHANGES SINCE THE ORIGINAL VERSION
----------------------------------
The default test set-up was extracted from
* SparseTensor/CPU/lit.local.cfg.
and duplicated in all tests. This is to support downstream users that
don't use these local LIT config files.
SUMMARY OF CHANGES
------------------
This patch aims to reduce test duplication. This is a direct follow-up of:
1. https://reviews.llvm.org/D155403 (test duplication), and
2. https://reviews.llvm.org/D155405 (code re-use),
All SVE/VLA tests are now enabled _conditionally_ and refactored to use
`mlir-cpu-runner` rather than `lli`. The former helps with test
duplication and the latter with code re-use.
A few additional refactoring changes are included.
1. The reduce verbosity, long runtime library names like:
%mlir_native_utils_lib_dir/libmlir_c_runner_utils%shlibext
are replaced with:
%mlir_c_runner_utils
2. In order to keep the code and the comments in sync, and to maintain
consistency across the tests, the following:
enable-runtime-library=true
is swapped with (and vice-versa):
enable-runtime-library=false
Note that this change won't affect test coverage. Only few tests
required such update.
3. A VLS vectorization `RUN` line is added in tests where there was a
VLA/VLS `RUN` line, but no VLS `RUN` line (with a few exceptions of
tests that only contained one `RUN` line to begin with).
4. A few test variables are renamed/added. Most notable example:
* %{options}` --> %{sparse_compiler_opts}
TEST RUNTIME IMPROVEMENT
------------------------
Tl;Dr This change improves test execution time by ~25%.
At the moment, the following `llvm-lit` invocation takes ~7.30s on my
AArch64 workstation (with SVE):
llvm-lit <llvm-project>/mlir/test/Integration/Dialect/SparseTensor/CPU/
This timing doesn't change no matter what the value of the following
CMake variable is (that should disable some tests):
MLIR_RUN_ARM_SVE_TESTS
With this patch, the execution time will indeed depend on the value of
the above CMake variable:
* with `MLIR_RUN_ARM_SVE_TESTS=true` the timing remains intact,
* with `MLIR_RUN_ARM_SVE_TESTS=false` the timing drops to ~5.40s (~25%
improvement).
This is expected:
* on average there are 4 `RUN` lines per test,
* _without this change_ (and with `MLIR_RUN_ARM_SVE_TESTS=false`) the
4th `RUN` line would in most cases duplicate the 3rd `RUN` line,
* _with this change) (and with `MLIR_RUN_ARM_SVE_TESTS=false`) the
4th `RUN` line becomes empty.
PATCH SIZE
----------
While rather large and touching many files, most changes in this patch
are rather mechanical. All test configurations have been preserved and
only in a handful of cases new `RUN` lines added.
Differential Revision: https://reviews.llvm.org/D156625
SUMMARY OF CHANGES
------------------
This patch aims to reduce test duplication and to improve code re-use in
SparseTensor integration tests for CPU. This is a direct follow-up of:
1. https://reviews.llvm.org/D155403 (test duplication), and
2. https://reviews.llvm.org/D155405 (code re-use),
The key logic for this patch is implemented in:
* SparseTensor/CPU/lit.local.cfg.
Essentially, the set-up that used to be repeated across all test files
has been extracted into a common LIT configuration file. This makes code
re-use straightforward.
All SVE/VLA tests are now enabled _conditionally_ and refactored to use
`mlir-cpu-runner` rather than `lli`. The former helps with test
duplication and the latter with code re-use.
A few additional refactoring changes are included.
1. The reduce verbosity, long runtime library names like:
%mlir_native_utils_lib_dir/libmlir_c_runner_utils%shlibext
are replaced with:
%mlir_c_runner_utils
2. In order to keep the code and the comments in sync, and to maintain
consistency across the tests, the following:
enable-runtime-library=true
is swapped with (and vice-versa):
enable-runtime-library=false
Note that this change won't affect test coverage. Only few tests
required such update.
3. A VLS vectorization `RUN` line is added in tests where there was a
VLA/VLS `RUN` line, but no VLS `RUN` line (with a few exceptions of
tests that only contained one `RUN` line to begin with).
4. A few test variables are renamed/added. Most notable example:
* %{options}` --> %{sparse_compiler_opts}
TEST RUNTIME IMPROVEMENT
------------------------
Tl;Dr This change improves test execution time by ~25%.
At the moment, the following `llvm-lit` invocation takes ~7.30s on my
AArch64 workstation (with SVE):
llvm-lit <llvm-project>/mlir/test/Integration/Dialect/SparseTensor/CPU/
This timing doesn't change no matter what the value of the following
CMake variable is (that should disable some tests):
MLIR_RUN_ARM_SVE_TESTS
With this patch, the execution time will indeed depend on the value of
the above CMake variable:
* with `MLIR_RUN_ARM_SVE_TESTS=true` the timing remains intact,
* with `MLIR_RUN_ARM_SVE_TESTS=false` the timing drops to ~5.40s (~25%
improvement).
This is expected:
* on average there are 4 `RUN` lines per test,
* _without this change_ (and with `MLIR_RUN_ARM_SVE_TESTS=false`) the
4th `RUN` line would in most cases duplicate the 3rd `RUN` line,
* _with this change) (and with `MLIR_RUN_ARM_SVE_TESTS=false`) the
4th `RUN` line becomes empty.
PATCH SIZE
----------
While rather large and touching many files, most changes in this patch
are rather mechanical. All test configurations have been preserved and
only in a handful of cases new `RUN` lines added.
Differential Revision: https://reviews.llvm.org/D156625
This commit is part of the migration of towards the new STEA syntax/design. In particular, this commit includes the following changes:
* Renaming compiler-internal functions/methods:
* `SparseTensorEncodingAttr::{getDimLevelType => getLvlTypes}`
* `Merger::{getDimLevelType => getLvlType}` (for consistency)
* `sparse_tensor::{getDimLevelType => buildLevelType}` (to help reduce confusion vs actual getter methods)
* Renaming external facets to match:
* the STEA parser and printer
* the C and Python bindings
* PyTACO
However, the actual renaming of the `DimLevelType` itself (along with all the "dlt" names) will be handled in a separate commit.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D150330
The logic enabling the Arm SVE (and now SME) integration tests for
various dialects, that may run under emulation, is now duplicated in
several places.
This patch moves the configuration to the top-level MLIR integration
tests Lit config and renames the '%lli' substitution in contexts where
it will run exclusively (ArmSVE, ArmSME) on AArch64 (and possibly under
emulation) to '%lli_aarch64_cmd', and '%lli_host_or_aarch64_cmd' for
contexts where it may run AArch64 (also possibly under emulation). The
latter is for integration tests that have target-specific and
target-agnostic codepaths such as SparseTensor, which supports scalable
vectors.
The two substitutions have the same effect but the names are different to
convey this information. The '%lli_aarch64_cmd' substitution could be
used in the SparseTensor tests but that would be a misnomer if the host
were x86 and the MLIR_RUN_SVE_TESTS=OFF.
The reason for renaming the '%lli' substitution is to not prevent running other
target-specific integration tests at the same time, since the same substitution
'%lli' is used for lli in other integration tests:
* mlir/test/Integration/Dialect/Vector/CPU/X86Vector - (AVX emulation via Intel SDE)
* mlir/test/Integration/Dialect/Vector/CPU/AMX - (AMX emulation via Intel SDE)
* mlir/test/Integration/Dialect/LLVMIR/CPU/test-vp-intrinsic.mlir - (RISCV emulation via QEMU if supported, native otherwise)
and substituting '%lli' at the top-level with Arm specific logic would override
this.
Reviewed By: awarzynski
Differential Revision: https://reviews.llvm.org/D148929
This reverts commit 5561e174117ff395d65b6978d04b62c1a1275138
The logic was moved from cmake into lit fixing the issue that lead to the revert and potentially others with multi-config cmake generators
Differential Revision: https://reviews.llvm.org/D143925
This patch contains the changes required to make the vast majority of integration and runner tests run on Windows.
Historically speaking, the JIT support for Windows has been lacking behind, but recent versions of ORC JIT have now caught up and works for basically all examples in repo.
Sadly due to these tests previously not working on Windows, basically all of them are making unix-like assumptions about things like filenames, paths, shell syntax etc.
This patch fixes all these issues in one big swoop and enables Windows support for the vast majority of integration tests.
More specifically, following changes had to be done:
* The various JIT runners used paths to the runtime libraries that assumed a Unix toolchain layout and filenames. I abstracted the specific path and filename of these runtime libraries away by making the paths to the runtime libraries be passed from cmake into lit. This now also allows a much more convenient syntax: `--shared-libs=%mlir_c_runner_utils` instead of `--shared-libs=%mlir_lib_dir/lib/libmlir_c_runner_utils%shlibext`
* Some tests using python set environment variables using the `ENV=VALUE cmd` format. This works on Unix, but on Windows it has to prefixed using `env ENV=VALUE cmd`
* Some tests used C functions that are simply not available or exported on Windows (`fabsf`, `aligned_alloc`). These tests have either been adjusted or explicitly marked as `UNSUPPORTED`
Some tests remain disabled on Windows as before:
* In SparseTensor some tests have non-trivial logic for finding the runtime libraries which seems to be required for the use of emulators. I do not have the time to port these so I simply kept them disabled
* Some tests requiring special hardware which I simply cannot test remain disabled on Windows. These include usage of AVX512 or AMX
The tests for `mlir-vulkan-runner` and `mlir-spirv-runner` all work now as well and so do the vast majority of `mlir-cpu-runner`.
Differential Revision: https://reviews.llvm.org/D143925
This patch updates the remaining SparseCompiler integration tests to
target SVE when available.
Two tests will require some investigation in the future:
* sparse_matmul.mlir
* sparse_tanh.mlir
The former passes regardless - that's due to how `CHECK` lines are
defined. The latter fails when SVE is enabled, but passes when it's
disabled. I marked it as UNSUPPORTED as there is no mechanism to XFAIL a
test conditionally. Also, see [1] for more details.
[1] https://github.com/llvm/llvm-project/issues/60626
Differential Revision: https://reviews.llvm.org/D143514
This op used to belong to the sparse dialect, but there are use cases for dense bufferization as well. (E.g., when a tensor alloc is returned from a function and should be deallocated at the call site.) This change moves the op to the bufferization dialect, which now has an `alloc_tensor` and a `dealloc_tensor` op.
Differential Revision: https://reviews.llvm.org/D129985
After recent bufferization improvement, this test
started failing due to missed zero initialization.
Reviewed By: jpienaar
Differential Revision: https://reviews.llvm.org/D129800
This change removes the partial bufferization passes from the sparse compilation pipeline and replaces them with One-Shot Bufferize. One-Shot Analysis (and TensorCopyInsertion) is used to resolve all out-of-place bufferizations, dense and sparse. Dense ops are then bufferized with BufferizableOpInterface. Sparse ops are still bufferized in the Sparsification pass.
Details:
* Dense allocations are automatically deallocated, unless they are yielded from a block. (In that case the alloc would leak.) All test cases are modified accordingly. E.g., some funcs now have an "out" tensor argument that is returned from the function. (That way, the allocation happens at the call site.)
* Sparse allocations are *not* automatically deallocated. They must be "released" manually. (No change, this will be addressed in a future change.)
* Sparse tensor copies are not supported yet. (Future change)
* Sparsification no longer has to consider inplacability. If necessary, allocations and/or copies are inserted during TensorCopyInsertion. All tensors are inplaceable by the time Sparsification is running. Instead of marking a tensor as "not inplaceable", it can be marked as "not writable", which will trigger an allocation and/or copy during TensorCopyInsertion.
Differential Revision: https://reviews.llvm.org/D129356