CHANGES SINCE THE ORIGINAL VERSION
----------------------------------
The default test set-up was extracted from
* SparseTensor/CPU/lit.local.cfg.
and duplicated in all tests. This is to support downstream users that
don't use these local LIT config files.
SUMMARY OF CHANGES
------------------
This patch aims to reduce test duplication. This is a direct follow-up of:
1. https://reviews.llvm.org/D155403 (test duplication), and
2. https://reviews.llvm.org/D155405 (code re-use),
All SVE/VLA tests are now enabled _conditionally_ and refactored to use
`mlir-cpu-runner` rather than `lli`. The former helps with test
duplication and the latter with code re-use.
A few additional refactoring changes are included.
1. The reduce verbosity, long runtime library names like:
%mlir_native_utils_lib_dir/libmlir_c_runner_utils%shlibext
are replaced with:
%mlir_c_runner_utils
2. In order to keep the code and the comments in sync, and to maintain
consistency across the tests, the following:
enable-runtime-library=true
is swapped with (and vice-versa):
enable-runtime-library=false
Note that this change won't affect test coverage. Only few tests
required such update.
3. A VLS vectorization `RUN` line is added in tests where there was a
VLA/VLS `RUN` line, but no VLS `RUN` line (with a few exceptions of
tests that only contained one `RUN` line to begin with).
4. A few test variables are renamed/added. Most notable example:
* %{options}` --> %{sparse_compiler_opts}
TEST RUNTIME IMPROVEMENT
------------------------
Tl;Dr This change improves test execution time by ~25%.
At the moment, the following `llvm-lit` invocation takes ~7.30s on my
AArch64 workstation (with SVE):
llvm-lit <llvm-project>/mlir/test/Integration/Dialect/SparseTensor/CPU/
This timing doesn't change no matter what the value of the following
CMake variable is (that should disable some tests):
MLIR_RUN_ARM_SVE_TESTS
With this patch, the execution time will indeed depend on the value of
the above CMake variable:
* with `MLIR_RUN_ARM_SVE_TESTS=true` the timing remains intact,
* with `MLIR_RUN_ARM_SVE_TESTS=false` the timing drops to ~5.40s (~25%
improvement).
This is expected:
* on average there are 4 `RUN` lines per test,
* _without this change_ (and with `MLIR_RUN_ARM_SVE_TESTS=false`) the
4th `RUN` line would in most cases duplicate the 3rd `RUN` line,
* _with this change) (and with `MLIR_RUN_ARM_SVE_TESTS=false`) the
4th `RUN` line becomes empty.
PATCH SIZE
----------
While rather large and touching many files, most changes in this patch
are rather mechanical. All test configurations have been preserved and
only in a handful of cases new `RUN` lines added.
Differential Revision: https://reviews.llvm.org/D156625
SUMMARY OF CHANGES
------------------
This patch aims to reduce test duplication and to improve code re-use in
SparseTensor integration tests for CPU. This is a direct follow-up of:
1. https://reviews.llvm.org/D155403 (test duplication), and
2. https://reviews.llvm.org/D155405 (code re-use),
The key logic for this patch is implemented in:
* SparseTensor/CPU/lit.local.cfg.
Essentially, the set-up that used to be repeated across all test files
has been extracted into a common LIT configuration file. This makes code
re-use straightforward.
All SVE/VLA tests are now enabled _conditionally_ and refactored to use
`mlir-cpu-runner` rather than `lli`. The former helps with test
duplication and the latter with code re-use.
A few additional refactoring changes are included.
1. The reduce verbosity, long runtime library names like:
%mlir_native_utils_lib_dir/libmlir_c_runner_utils%shlibext
are replaced with:
%mlir_c_runner_utils
2. In order to keep the code and the comments in sync, and to maintain
consistency across the tests, the following:
enable-runtime-library=true
is swapped with (and vice-versa):
enable-runtime-library=false
Note that this change won't affect test coverage. Only few tests
required such update.
3. A VLS vectorization `RUN` line is added in tests where there was a
VLA/VLS `RUN` line, but no VLS `RUN` line (with a few exceptions of
tests that only contained one `RUN` line to begin with).
4. A few test variables are renamed/added. Most notable example:
* %{options}` --> %{sparse_compiler_opts}
TEST RUNTIME IMPROVEMENT
------------------------
Tl;Dr This change improves test execution time by ~25%.
At the moment, the following `llvm-lit` invocation takes ~7.30s on my
AArch64 workstation (with SVE):
llvm-lit <llvm-project>/mlir/test/Integration/Dialect/SparseTensor/CPU/
This timing doesn't change no matter what the value of the following
CMake variable is (that should disable some tests):
MLIR_RUN_ARM_SVE_TESTS
With this patch, the execution time will indeed depend on the value of
the above CMake variable:
* with `MLIR_RUN_ARM_SVE_TESTS=true` the timing remains intact,
* with `MLIR_RUN_ARM_SVE_TESTS=false` the timing drops to ~5.40s (~25%
improvement).
This is expected:
* on average there are 4 `RUN` lines per test,
* _without this change_ (and with `MLIR_RUN_ARM_SVE_TESTS=false`) the
4th `RUN` line would in most cases duplicate the 3rd `RUN` line,
* _with this change) (and with `MLIR_RUN_ARM_SVE_TESTS=false`) the
4th `RUN` line becomes empty.
PATCH SIZE
----------
While rather large and touching many files, most changes in this patch
are rather mechanical. All test configurations have been preserved and
only in a handful of cases new `RUN` lines added.
Differential Revision: https://reviews.llvm.org/D156625
This is a major step along the way towards the new STEA design. While a great deal of this patch is simple renaming, there are several significant changes as well. I've done my best to ensure that this patch retains the previous behavior and error-conditions, even though those are at odds with the eventual intended semantics of the `dimToLvl` mapping. Since the majority of the compiler does not yet support non-permutations, I've also added explicit assertions in places that previously had implicitly assumed it was dealing with permutations.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D151505
This commit is part of the migration of towards the new STEA syntax/design. In particular, this commit includes the following changes:
* Renaming compiler-internal functions/methods:
* `SparseTensorEncodingAttr::{getDimLevelType => getLvlTypes}`
* `Merger::{getDimLevelType => getLvlType}` (for consistency)
* `sparse_tensor::{getDimLevelType => buildLevelType}` (to help reduce confusion vs actual getter methods)
* Renaming external facets to match:
* the STEA parser and printer
* the C and Python bindings
* PyTACO
However, the actual renaming of the `DimLevelType` itself (along with all the "dlt" names) will be handled in a separate commit.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D150330
The logic enabling the Arm SVE (and now SME) integration tests for
various dialects, that may run under emulation, is now duplicated in
several places.
This patch moves the configuration to the top-level MLIR integration
tests Lit config and renames the '%lli' substitution in contexts where
it will run exclusively (ArmSVE, ArmSME) on AArch64 (and possibly under
emulation) to '%lli_aarch64_cmd', and '%lli_host_or_aarch64_cmd' for
contexts where it may run AArch64 (also possibly under emulation). The
latter is for integration tests that have target-specific and
target-agnostic codepaths such as SparseTensor, which supports scalable
vectors.
The two substitutions have the same effect but the names are different to
convey this information. The '%lli_aarch64_cmd' substitution could be
used in the SparseTensor tests but that would be a misnomer if the host
were x86 and the MLIR_RUN_SVE_TESTS=OFF.
The reason for renaming the '%lli' substitution is to not prevent running other
target-specific integration tests at the same time, since the same substitution
'%lli' is used for lli in other integration tests:
* mlir/test/Integration/Dialect/Vector/CPU/X86Vector - (AVX emulation via Intel SDE)
* mlir/test/Integration/Dialect/Vector/CPU/AMX - (AMX emulation via Intel SDE)
* mlir/test/Integration/Dialect/LLVMIR/CPU/test-vp-intrinsic.mlir - (RISCV emulation via QEMU if supported, native otherwise)
and substituting '%lli' at the top-level with Arm specific logic would override
this.
Reviewed By: awarzynski
Differential Revision: https://reviews.llvm.org/D148929
The old "pointer/index" names often cause confusion since these names clash with names of unrelated things in MLIR; so this change rectifies this by changing everything to use "position/coordinate" terminology instead.
In addition to the basic terminology, there have also been various conventions for making certain distinctions like: (1) the overall storage for coordinates in the sparse-tensor, vs the particular collection of coordinates of a given element; and (2) particular coordinates given as a `Value` or `TypedValue<MemRefType>`, vs particular coordinates given as `ValueRange` or similar. I have striven to maintain these distinctions
as follows:
* "p/c" are used for individual position/coordinate values, when there is no risk of confusion. (Just like we use "d/l" to abbreviate "dim/lvl".)
* "pos/crd" are used for individual position/coordinate values, when a longer name is helpful to avoid ambiguity or to form compound names (e.g., "parentPos"). (Just like we use "dim/lvl" when we need a longer form of "d/l".)
I have also used these forms for a handful of compound names where the old name had been using a three-letter form previously, even though a longer form would be more appropriate. I've avoided renaming these to use a longer form purely for expediency sake, since changing them would require a cascade of other renamings. They should be updated to follow the new naming scheme, but that can be done in future patches.
* "coords" is used for the complete collection of crd values associated with a single element. In the runtime library this includes both `std::vector` and raw pointer representations. In the compiler, this is used specifically for buffer variables with C++ type `Value`, `TypedValue<MemRefType>`, etc.
The bare form "coords" is discouraged, since it fails to make the dim/lvl distinction; so the compound names "dimCoords/lvlCoords" should be used instead. (Though there may exist a rare few cases where is is appropriate to be intentionally ambiguous about what coordinate-space the coords live in; in which case the bare "coords" is appropriate.)
There is seldom the need for the pos variant of this notion. In most circumstances we use the term "cursor", since the same buffer is reused for a 'moving' pos-collection.
* "dcvs/lcvs" is used in the compiler as the `ValueRange` analogue of "dimCoords/lvlCoords". (The "vs" stands for "`Value`s".) I haven't found the need for it, but "pvs" would be the obvious name for a pos-`ValueRange`.
The old "ind"-vs-"ivs" naming scheme does not seem to have been sustained in more recent code, which instead prefers other mnemonics (e.g., adding "Buf" to the end of the names for `TypeValue<MemRefType>`). I have cleaned up a lot of these to follow the "coords"-vs-"cvs" naming scheme, though haven't done an exhaustive cleanup.
* "positions/coordinates" are used for larger collections of pos/crd values; in particular, these are used when referring to the complete sparse-tensor storage components.
I also prefer to use these unabbreviated names in the documentation, unless there is some specific reason why using the abbreviated forms helps resolve ambiguity.
In addition to making this terminology change, this change also does some cleanup along the way:
* correcting the dim/lvl terminology in certain places.
* adding `const` when it requires no other code changes.
* miscellaneous cleanup that was entailed in order to make the proper distinctions. Most of these are in CodegenUtils.{h,cpp}
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D144773
This reverts commit 5561e174117ff395d65b6978d04b62c1a1275138
The logic was moved from cmake into lit fixing the issue that lead to the revert and potentially others with multi-config cmake generators
Differential Revision: https://reviews.llvm.org/D143925
This patch contains the changes required to make the vast majority of integration and runner tests run on Windows.
Historically speaking, the JIT support for Windows has been lacking behind, but recent versions of ORC JIT have now caught up and works for basically all examples in repo.
Sadly due to these tests previously not working on Windows, basically all of them are making unix-like assumptions about things like filenames, paths, shell syntax etc.
This patch fixes all these issues in one big swoop and enables Windows support for the vast majority of integration tests.
More specifically, following changes had to be done:
* The various JIT runners used paths to the runtime libraries that assumed a Unix toolchain layout and filenames. I abstracted the specific path and filename of these runtime libraries away by making the paths to the runtime libraries be passed from cmake into lit. This now also allows a much more convenient syntax: `--shared-libs=%mlir_c_runner_utils` instead of `--shared-libs=%mlir_lib_dir/lib/libmlir_c_runner_utils%shlibext`
* Some tests using python set environment variables using the `ENV=VALUE cmd` format. This works on Unix, but on Windows it has to prefixed using `env ENV=VALUE cmd`
* Some tests used C functions that are simply not available or exported on Windows (`fabsf`, `aligned_alloc`). These tests have either been adjusted or explicitly marked as `UNSUPPORTED`
Some tests remain disabled on Windows as before:
* In SparseTensor some tests have non-trivial logic for finding the runtime libraries which seems to be required for the use of emulators. I do not have the time to port these so I simply kept them disabled
* Some tests requiring special hardware which I simply cannot test remain disabled on Windows. These include usage of AVX512 or AMX
The tests for `mlir-vulkan-runner` and `mlir-spirv-runner` all work now as well and so do the vast majority of `mlir-cpu-runner`.
Differential Revision: https://reviews.llvm.org/D143925
This patch updates the remaining SparseCompiler integration tests to
target SVE when available.
Two tests will require some investigation in the future:
* sparse_matmul.mlir
* sparse_tanh.mlir
The former passes regardless - that's due to how `CHECK` lines are
defined. The latter fails when SVE is enabled, but passes when it's
disabled. I marked it as UNSUPPORTED as there is no mechanism to XFAIL a
test conditionally. Also, see [1] for more details.
[1] https://github.com/llvm/llvm-project/issues/60626
Differential Revision: https://reviews.llvm.org/D143514
The "sparsification" pass does not need the ability to use runtime values for
the dimension, so the only source for variability would have been user code.
Restricting the dimension to constants simplifies code generation.
Reviewed By: Peiming, wrengr
Differential Revision: https://reviews.llvm.org/D133458
This op used to belong to the sparse dialect, but there are use cases for dense bufferization as well. (E.g., when a tensor alloc is returned from a function and should be deallocated at the call site.) This change moves the op to the bufferization dialect, which now has an `alloc_tensor` and a `dealloc_tensor` op.
Differential Revision: https://reviews.llvm.org/D129985
Rationale:
The silent exit(1) gives little clues on where the error occurs on failure
and may even be confusing at first. The CHECK testing of all computed values
and indices may be a little bit more elaborate, but it directly pinpoints
where errors happen if they occur. This style is also consistent with
the other tests, which I actually prefer.
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D112688
Precursor: https://reviews.llvm.org/D110200
Removed redundant ops from the standard dialect that were moved to the
`arith` or `math` dialects.
Renamed all instances of operations in the codebase and in tests.
Reviewed By: rriddle, jpienaar
Differential Revision: https://reviews.llvm.org/D110797
We have several ways to materialize sparse tensors (new and convert) but no explicit operation to release the underlying sparse storage scheme at runtime (other than making an explicit delSparseTensor() library call). To simplify memory management, a sparse_tensor.release operation has been introduced that lowers to the runtime library call while keeping tensors, opague pointers, and memrefs transparent in the initial IR.
*Note* There is obviously some tension between the concept of immutable tensors and memory management methods. This tension is addressed by simply stating that after the "release" call, no further memref related operations are allowed on the tensor value. We expect the design to evolve over time, however, and arrive at a more satisfactory view of tensors and buffers eventually.
Bug:
http://llvm.org/pr52046
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D111099
Conversion to the LLVM dialect is being refactored to be more progressive and
is now performed as a series of independent passes converting different
dialects. These passes may produce `unrealized_conversion_cast` operations that
represent pending conversions between built-in and LLVM dialect types.
Historically, a more monolithic Standard-to-LLVM conversion pass did not need
these casts as all operations were converted in one shot. Previous refactorings
have led to the requirement of running the Standard-to-LLVM conversion pass to
clean up `unrealized_conversion_cast`s even though the IR had no standard
operations in it. The pass must have been also run the last among all to-LLVM
passes, in contradiction with the partial conversion logic. Additionally, the
way it was set up could produce invalid operations by removing casts between
LLVM and built-in types even when the consumer did not accept the uncasted
type, or could lead to cryptic conversion errors (recursive application of the
rewrite pattern on `unrealized_conversion_cast` as a means to indicate failure
to eliminate casts).
In fact, the need to eliminate A->B->A `unrealized_conversion_cast`s is not
specific to to-LLVM conversions and can be factored out into a separate type
reconciliation pass, which is achieved in this commit. While the cast operation
itself has a folder pattern, it is insufficient in most conversion passes as
the folder only applies to the second cast. Without complex legality setup in
the conversion target, the conversion infra will either consider the cast
operations valid and not fold them (a separate canonicalization would be
necessary to trigger the folding), or consider the first cast invalid upon
generation and stop with error. The pattern provided by the reconciliation pass
applies to the first cast operation instead. Furthermore, having a separate
pass makes it clear when `unrealized_conversion_cast`s could not have been
eliminated since it is the only reason why this pass can fail.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D109507