This revision adds a transformation and a pattern that rewrites a "maybe masked" `vector.transfer_read %view[...], %pad `into a pattern resembling:
```
%1:3 = scf.if (%inBounds) {
scf.yield %view : memref<A...>, index, index
} else {
%2 = vector.transfer_read %view[...], %pad : memref<A...>, vector<...>
%3 = vector.type_cast %extra_alloc : memref<...> to
memref<vector<...>> store %2, %3[] : memref<vector<...>> %4 =
memref_cast %extra_alloc: memref<B...> to memref<A...> scf.yield %4 :
memref<A...>, index, index
}
%res= vector.transfer_read %1#0[%1#1, %1#2] {masked = [false ... false]}
```
where `extra_alloc` is a top of the function alloca'ed buffer of one vector.
This rewrite makes it possible to realize the "always full tile" abstraction where vector.transfer_read operations are guaranteed to read from a padded full buffer.
The extra work only occurs on the boundary tiles.
Differential Revision: https://reviews.llvm.org/D84631
This reverts commit 35b65be041127db9fe23d3128a004c888893cbae.
Build is broken with -DBUILD_SHARED_LIBS=ON with some undefined
references like:
VectorTransforms.cpp:(.text._ZN4llvm12function_refIFvllEE11callback_fnIZL24createScopedInBoundsCondN4mlir25VectorTransferOpInterfaceEE3$_8EEvlll+0xa5): undefined reference to `mlir::edsc::op::operator+(mlir::Value, mlir::Value)'
This revision adds a transformation and a pattern that rewrites a "maybe masked" `vector.transfer_read %view[...], %pad `into a pattern resembling:
```
%1:3 = scf.if (%inBounds) {
scf.yield %view : memref<A...>, index, index
} else {
%2 = vector.transfer_read %view[...], %pad : memref<A...>, vector<...>
%3 = vector.type_cast %extra_alloc : memref<...> to
memref<vector<...>> store %2, %3[] : memref<vector<...>> %4 =
memref_cast %extra_alloc: memref<B...> to memref<A...> scf.yield %4 :
memref<A...>, index, index
}
%res= vector.transfer_read %1#0[%1#1, %1#2] {masked = [false ... false]}
```
where `extra_alloc` is a top of the function alloca'ed buffer of one vector.
This rewrite makes it possible to realize the "always full tile" abstraction where vector.transfer_read operations are guaranteed to read from a padded full buffer.
The extra work only occurs on the boundary tiles.
Differential Revision: https://reviews.llvm.org/D84631
For the purpose of vector transforms, the Tablegen-based infra is subsumed by simple C++ pattern application. Deprecate declarative transforms whose complexity does not pay for itself.
Differential Revision: https://reviews.llvm.org/D84753
Summary: Vector contract patterns were only parameterized by a `vectorTransformsOptions`. As a result, even if an mlir file was containing several occurrences of `vector.contract`, all of them would be lowered in the same way. More granularity might be required . This Diff adds a `constraint` argument to each of these patterns which allows the user to specify with more precision on which `vector.contract` should each of the lowering apply.
Differential Revision: https://reviews.llvm.org/D83960
We temporarily had separate OUTER lowering (for matmat flavors) and
AXPY lowering (for matvec flavors). With the new generalized
"vector.outerproduct" semantics, these cases can be merged into
a single lowering method. This refactoring will simplify future
decisions on cost models and lowering heuristics.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D83585
The UnrollVectorPattern is can be used in a programmable fashion by:
```
OwningRewritePatternList patterns;
patterns.insert<UnrollVectorPattern<AddFOp>>(ArrayRef<int64_t>{2, 2}, ctx);
patterns.insert<UnrollVectorPattern<vector::ContractionOp>>(
ArrayRef<int64_t>{2, 2, 2}, ctx);
...
applyPatternsAndFoldGreedily(getFunction(), patterns);
```
Differential revision: https://reviews.llvm.org/D83064
Default vector.contract lowering essentially yields a series of sdot/ddot
operations. However, for some layouts a series of saxpy/daxpy operations,
chained through fma are more efficient. This CL introduces a choice between
the two lowering paths. A default heuristic is to follow.
Some preliminary avx2 performance numbers for matrix-times-vector.
Here, dot performs best for 64x64 A x b and saxpy for 64x64 A^T x b.
```
------------------------------------------------------------
A x b A^T x b
------------------------------------------------------------
GFLOPS sdot (reassoc) saxpy sdot (reassoc) saxpy
------------------------------------------------------------
1x1 0.6 0.9 0.6 0.9
2x2 2.5 3.2 2.4 3.5
4x4 6.4 8.4 4.9 11.8
8x8 11.7 6.1 5.0 29.6
16x16 20.7 10.8 7.3 43.3
32x32 29.3 7.9 6.4 51.8
64x64 38.9 79.3
128x128 32.4 40.7
------------------------------------------------------------
```
Reviewed By: nicolasvasilache, ftynse
Differential Revision: https://reviews.llvm.org/D83012
Summary:
Progressive lowering of vector.transpose into an operation that
is closer to an intrinsic, and thus the hardware ISA. Currently
under the common vector transform testing flag, as we prepare
deploying this transformation in the LLVM lowering pipeline.
Reviewers: nicolasvasilache, reidtatge, andydavis1, ftynse
Reviewed By: nicolasvasilache, ftynse
Subscribers: mehdi_amini, rriddle, jpienaar, shauheen, antiagainst, nicolasvasilache, arpith-jacob, mgester, lucyrfox, liufengdb, stephenneuendorffer, Joonsoo, grosul1, frgossen, Kayjukh, jurahul, llvm-commits
Tags: #llvm, #mlir
Differential Revision: https://reviews.llvm.org/D80772
This revision adds the additional lowering and exposes the patterns at a finer granularity for better programmatic reuse. The unit test makes use of the finer grained pattern for simpler checks.
As the ContractionOpLowering is exposed programmatically, cleanup opportunities appear and static class methods are turned into free functions with static visibility.
Differential Revision: https://reviews.llvm.org/D80375
Rename mlir::applyPatternsGreedily -> applyPatternsAndFoldGreedily. The
new name is a more accurate description of the method - it performs
both, application of the specified patterns and folding of all ops in
the op's region irrespective of whether any patterns have been supplied.
Differential Revision: https://reviews.llvm.org/D77478
This revision removes all of the CRTP from the pass hierarchy in preparation for using the tablegen backend instead. This creates a much cleaner interface in the C++ code, and naturally fits with the rest of the infrastructure. A new utility class, PassWrapper, is added to replicate the existing behavior for passes not suitable for using the tablegen backend.
Differential Revision: https://reviews.llvm.org/D77350
Summary:
This revision restructures the calling of vector transforms to make it more flexible to ask for lowering through LLVM matrix intrinsics.
This also makes sure we bail out in degenerate cases (i.e. 1) in which LLVM complains about not being able to scalarize.
Differential Revision: https://reviews.llvm.org/D76266
Summary:
NFC - Moved StandardOps/Ops.h to a StandardOps/IR dir to better match surrounding
directories. This is to match other dialects, and prepare for moving StandardOps
related transforms in out for Transforms and into StandardOps/Transforms.
Differential Revision: https://reviews.llvm.org/D74940
Summary:
This sets the basic framework for lowering vector.contract progressively
into simpler vector.contract operations until a direct vector.reduction
operation is reached. More details will be filled out progressively as well.
Reviewers: nicolasvasilache
Reviewed By: nicolasvasilache
Subscribers: mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, nicolasvasilache, arpith-jacob, mgester, lucyrfox, liufengdb, Joonsoo, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74520
In the previous state, we were relying on forcing the linker to include
all libraries in the final binary and the global initializer to self-register
every piece of the system. This change help moving away from this model, and
allow users to compose pieces more freely. The current change is only "fixing"
the dialect registration and avoiding relying on "whole link" for the passes.
The translation is still relying on the global registry, and some refactoring
is needed to make this all more convenient.
Differential Revision: https://reviews.llvm.org/D74461
Summary:
Rewrites the extract/insert_slices operation in terms of
strided_slice/insert_strided_slice ops with intermediate
tuple uses (that should get optimimized away with typical
usage). This is done in a separate "pass" to enable testing
this particular rewriting in isolation.
Reviewers: nicolasvasilache, andydavis1, ftynse
Reviewed By: nicolasvasilache
Subscribers: merge_guards_bot, mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, arpith-jacob, mgester, lucyrfox, liufengdb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73295
This reorganizes the vector transformations to be more easily testable as patterns and more easily composable into fused passes in the future.
PiperOrigin-RevId: 284817474