This brings back previous SIMD functionality, but in a separate pass.
The idea is to improve this new pass incrementally, going beyond for-loops
to while-loops for co-iteration as welll (masking), while introducing new
abstractions to make the lowering more progressive. The separation of
sparsification and vectorization is a very good first step on this journey.
Also brings back ArmSVE support
Still to be fine-tuned:
+ use of "index" in SIMD loop (viz. a[i] = i)
+ check that all ops really have SIMD support
+ check all forms of reductions
+ chain reduction SIMD values
Reviewed By: dcaballe
Differential Revision: https://reviews.llvm.org/D138236
This change removes the partial bufferization passes from the sparse compilation pipeline and replaces them with One-Shot Bufferize. One-Shot Analysis (and TensorCopyInsertion) is used to resolve all out-of-place bufferizations, dense and sparse. Dense ops are then bufferized with BufferizableOpInterface. Sparse ops are still bufferized in the Sparsification pass.
Details:
* Dense allocations are automatically deallocated, unless they are yielded from a block. (In that case the alloc would leak.) All test cases are modified accordingly. E.g., some funcs now have an "out" tensor argument that is returned from the function. (That way, the allocation happens at the call site.)
* Sparse allocations are *not* automatically deallocated. They must be "released" manually. (No change, this will be addressed in a future change.)
* Sparse tensor copies are not supported yet. (Future change)
* Sparsification no longer has to consider inplacability. If necessary, allocations and/or copies are inserted during TensorCopyInsertion. All tensors are inplaceable by the time Sparsification is running. Instead of marking a tensor as "not inplaceable", it can be marked as "not writable", which will trigger an allocation and/or copy during TensorCopyInsertion.
Differential Revision: https://reviews.llvm.org/D129356
The SparseTensor passes currently use opaque numbers for the CLI, despite using an enum internally. This patch exposes the enums instead of numbered items that are matched back to the enum.
Fixes GitHub issue #53389
Reviewed by: aartbik, mehdi_amini
Differential Revision: https://reviews.llvm.org/D123876
Use "enable-vla-vectorization=vla" to generate a vector length agnostic
loops during vectorization. This option works for vectorization strategy 2.
Differential Revision: https://reviews.llvm.org/D118379
`vector::InsertElementOp` and `vector::ExtractElementOp` have had their `position`
operand changed to accept `AnySignlessIntegerOrIndex` for better operability with
operations that use `index`, such as affine loops.
LLVM's `extractelement` and `insertelement` can also accept `i64`, so lowering
directly to these operations without explicitly inserting casts is allowed. SPIRV's
equivalent ops can also accept `i64`.
Reviewed By: nicolasvasilache, jpienaar
Differential Revision: https://reviews.llvm.org/D114139
Precursor: https://reviews.llvm.org/D110200
Removed redundant ops from the standard dialect that were moved to the
`arith` or `math` dialects.
Renamed all instances of operations in the codebase and in tests.
Reviewed By: rriddle, jpienaar
Differential Revision: https://reviews.llvm.org/D110797
This relaxes vectorization of dense memrefs a bit so that affine expressions
are allowed in more outer dimensions. Vectorization of non unit stride
references is disabled though, since this seems ineffective anyway.
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D111469
Now not just SUM, but also PRODUCT, AND, OR, XOR. The reductions
MIN and MAX are still to be done (also depends on recognizing
these operations in cmp-select constructs).
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D110203
Apply the "for loop peeling" pattern from SCF dialect transforms. This pattern splits scf.for loops into full and partial iterations. In the full iteration, all masked loads/stores are canonicalized to unmasked loads/stores.
Differential Revision: https://reviews.llvm.org/D107733
Controlled by a compiler option, if 32-bit indices can be handled
with zero/sign-extention alike (viz. no worries on non-negative
indices), scatter/gather operations can use the more efficient
32-bit SIMD version.
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D103632
A very elaborate, but also very fun revision because all
puzzle pieces are finally "falling in place".
1. replaces lingalg annotations + flags with proper sparse tensor types
2. add rigorous verification on sparse tensor type and sparse primitives
3. removes glue and clutter on opaque pointers in favor of sparse tensor types
4. migrates all tests to use sparse tensor types
NOTE: next CL will remove *all* obsoleted sparse code in Linalg
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D102095
This revision migrates more code from Linalg into the new permanent home of
SparseTensor. It replaces the test passes with proper compiler passes.
NOTE: the actual removal of the last glue and clutter in Linalg will follow
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D101811