Add support for folding `memref.expand_shape` ops into
`vector.transfer_read` ops when the permutation map is a
non-minor-identity.
In the case that the permutation map indexes into expanded dimensions
that would be contiguous within the original source shape then it is
safe to make this transformation.
Signed-off-by: Jack Frankland <jack.frankland@arm.com>
I'm planning to introduce an interface that'll allow FoldMemRefAliasOps
to not know about dialects like NVVM or GPU. To do this, however, I need
to get the `affine` ops (which need special handling in order to handle
their implicit affine maps) into a separate pass, analogously to how
`amdgpu` ops have these patterns under their dialect and ton under
`memref`.
This commit also changes the expand/collapse_shape index resolvers to
return `void`, since they never actually failed and to make it clearer
that they modify IR.
(Note: An LLM did the initial refactoring and test movement, I've
reviewed the results and edited them some.)
Extend the load of a expand shape rewrite pattern to support folding a
`memref.expand_shape` and `vector.transfer_read` when the permutation
map on `vector.transfer_read` is a minor identity.
---------
Signed-off-by: Jack Frankland <jack.frankland@arm.com>
This is a reapply of patch #149851. The reapply also fixes a CMake/Bazel
build issue, which was the reason of the revert. (Thanks @rupprecht )
Original patch (#149851) message:
-----
This PR adds a new optimization pass to fold
`memref.subview/expand_shape/collapse_shape` ops into consumer
`amdgpu.gather_to_lds` operations.
* Implements a new pass `AmdgpuFoldMemRefOpsPass` with pattern
`FoldMemRefOpsIntoGatherToLDSOp`
* Adds corresponding folding tests
This PR adds a new optimization pass to fold
`memref.subview/expand_shape/collapse_shape` ops into consumer
`amdgpu.gather_to_lds` operations.
* Implements a new pass `AmdgpuFoldMemRefOpsPass` with pattern
`FoldMemRefOpsIntoGatherToLDSOp`
* Adds corresponding folding tests
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
These are identified by misc-include-cleaner. I've filtered out those
that break builds. Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.
This PR updates the FoldMemRefAliasOps to use `affine.linearize_index`
and `affine.delinearize_index` to perform the index computations needed
to fold a `memref.expand_shape` or `memref.collapse_shape` into its
consumers, respectively.
This also loosens some limitations of the pass:
1. The existing `output_shape` argument to `memref.expand_shape` is now
used, eliminating the need to re-infer this shape or call `memref.dim`.
2. Because we're using `affine.delinearize_index`, the restriction that
each group in a `memref.collapse_shape` can only have one dynamic
dimension is removed.
[mlir][vector] Standardize base Naming Across Vector Ops (NFC)
This change standardizes the naming convention for the argument
representing the value to read from or write to in Vector ops that
interface with Tensors or MemRefs. Specifically, it ensures that all
such ops use the name `base` (i.e., the base address or location to
which offsets are applied).
Updated operations:
* `vector.transfer_read`,
* `vector.transfer_write`.
For reference, these ops already use `base`:
* `vector.load`, `vector.store`, `vector.scatter`, `vector.gather`,
`vector.expandload`, `vector.compressstore`, `vector.maskedstore`,
`vector.maskedload`.
This is a non-functional change (NFC) and does not alter the semantics of these
operations. However, it does require users of the XFer ops to switch from
`op.getSource()` to `op.getBase()`.
To ease the transition, this PR temporarily adds a `getSource()` interface
method for compatibility. This is intended for downstream use only and should
not be relied on upstream. The method will be removed prior to the LLVM 21
release.
Implements #131602
let constructor is legacy (do not use in tree!) since the tableGen
backend emits most of the glue logic to build a pass.
Note: The following constructor has been retired:
```cpp
std::unique_ptr<Pass> createExpandReallocPass(bool emitDeallocs = true);
```
To update your codebase, replace it with the new options-based API:
```cpp
memref::ExpandReallocPassOptions expandAllocPassOptions{
/*emitDeallocs=*/false};
pm.addPass(memref::createExpandReallocPass(expandAllocPassOptions));
```
The greedy rewriter is used in many different flows and it has a lot of
convenience (work list management, debugging actions, tracing, etc). But
it combines two kinds of greedy behavior 1) how ops are matched, 2)
folding wherever it can.
These are independent forms of greedy and leads to inefficiency. E.g.,
cases where one need to create different phases in lowering and is
required to applying patterns in specific order split across different
passes. Using the driver one ends up needlessly retrying folding/having
multiple rounds of folding attempts, where one final run would have
sufficed.
Of course folks can locally avoid this behavior by just building their
own, but this is also a common requested feature that folks keep on
working around locally in suboptimal ways.
For downstream users, there should be no behavioral change. Updating
from the deprecated should just be a find and replace (e.g., `find ./
-type f -exec sed -i
's|applyPatternsAndFoldGreedily|applyPatternsGreedily|g' {} \;` variety)
as the API arguments hasn't changed between the two.
This patch adds adds patterns to fold memref alias for
expand_shape/collapse_shape feeding into vector.load/vector.store and
vector.maskedload/vector.maskedstore
The contents of a mask on a masked transfer are unaffected by the
particular region of memory being read/stored to, so just forward the
mask in subview folding patterns.
The `resolveSourceIndicesCollapseShape` method is used to compute
indices into the source `MemRef` of a `CollapseShapeOp` from the
collapsed indices. This method didn't check for dynamic sizes of the
source shape which led to a crash.
Fix https://github.com/llvm/llvm-project/issues/68483
The pass uses `computeSuffixProduct` method which only allows static
shapes. This revision adds an early-exit for dynamic cases to avoid
crash.
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D157668
Folding mechanism does not recognize `ldmatrix` op. This work helps pass to recognize the op and fold the memref aliases.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D151412
The MLIR classes Type/Attribute/Operation/Op/Value support
cast/dyn_cast/isa/dyn_cast_or_null functionality through llvm's doCast
functionality in addition to defining methods with the same name.
This change begins the migration of uses of the method to the
corresponding function call as has been decided as more consistent.
Note that there still exist classes that only define methods directly,
such as AffineExpr, and this does not include work currently to support
a functional cast/isa call.
Caveats include:
- This clang-tidy script probably has more problems.
- This only touches C++ code, so nothing that is being generated.
Context:
- https://mlir.llvm.org/deprecation/ at "Use the free function variants
for dyn_cast/cast/isa/…"
- Original discussion at https://discourse.llvm.org/t/preferred-casting-style-going-forward/68443
Implementation:
This first patch was created with the following steps. The intention is
to only do automated changes at first, so I waste less time if it's
reverted, and so the first mass change is more clear as an example to
other teams that will need to follow similar steps.
Steps are described per line, as comments are removed by git:
0. Retrieve the change from the following to build clang-tidy with an
additional check:
https://github.com/llvm/llvm-project/compare/main...tpopp:llvm-project:tidy-cast-check
1. Build clang-tidy
2. Run clang-tidy over your entire codebase while disabling all checks
and enabling the one relevant one. Run on all header files also.
3. Delete .inc files that were also modified, so the next build rebuilds
them to a pure state.
4. Some changes have been deleted for the following reasons:
- Some files had a variable also named cast
- Some files had not included a header file that defines the cast
functions
- Some files are definitions of the classes that have the casting
methods, so the code still refers to the method instead of the
function without adding a prefix or removing the method declaration
at the same time.
```
ninja -C $BUILD_DIR clang-tidy
run-clang-tidy -clang-tidy-binary=$BUILD_DIR/bin/clang-tidy -checks='-*,misc-cast-functions'\
-header-filter=mlir/ mlir/* -fix
rm -rf $BUILD_DIR/tools/mlir/**/*.inc
git restore mlir/lib/IR mlir/lib/Dialect/DLTI/DLTI.cpp\
mlir/lib/Dialect/Complex/IR/ComplexDialect.cpp\
mlir/lib/**/IR/\
mlir/lib/Dialect/SparseTensor/Transforms/SparseVectorization.cpp\
mlir/lib/Dialect/Vector/Transforms/LowerVectorMultiReduction.cpp\
mlir/test/lib/Dialect/Test/TestTypes.cpp\
mlir/test/lib/Dialect/Transform/TestTransformDialectExtension.cpp\
mlir/test/lib/Dialect/Test/TestAttributes.cpp\
mlir/unittests/TableGen/EnumsGenTest.cpp\
mlir/test/python/lib/PythonTestCAPI.cpp\
mlir/include/mlir/IR/
```
Differential Revision: https://reviews.llvm.org/D150123
These patterns follow FoldMemRefAliasOps which is further refactored for reuse.
In the process, fix FoldMemRefAliasOps handling of strides for vector.transfer ops which was previously incorrect.
These opt-in patterns generalize the existing canonicalizations on vector.transfer ops.
In the future the blanket canonicalizations will be retired.
They are kept for now to minimize porting disruptions.
Differential Revision: https://reviews.llvm.org/D146624
Creating maximally folded and composd affine.apply operation during
FoldMemRefAliasOps composes better with other transformations without having
to interleave canonicalization passes.
Differential Revision: https://reviews.llvm.org/D146515
This commits adds support for folding subview into GPU subgroup
MMA load/store ops.
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D146150
`llvm.load` op has nonTemporal field which is missing for `memref.load` and `memref.store`. This revision first adds nonTemporal field to memref's load/store op, then it lowers the field to llvm.load/store ops.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D142616
The rewrite driver is typically applied to a single region or all regions of the same op. There is no longer an overload to apply the rewrite driver to a list of regions.
This simplifies the rewrite driver implementation because the scope is now a single region as opposed to a list of regions.
Note: This change is not NFC because `config.maxIterations` and `config.maxNumRewrites` is now counted for each region separately. Furthermore, worklist filtering (`scope`) is now applied to each region separately.
Differential Revision: https://reviews.llvm.org/D142611
This flips all of the remaining dialects to prefixed except for linalg, which
will be done in a followup.
Differential Revision: https://reviews.llvm.org/D134995
The patch introduces the required changes to update the pass declarations and definitions to use the new autogenerated files and allow dropping the old infrastructure.
Reviewed By: mehdi_amini, rriddle
Differential Review: https://reviews.llvm.org/D132838
The patch introduces the required changes to update the pass declarations and definitions to use the new autogenerated files and allow dropping the old infrastructure.
Reviewed By: mehdi_amini, rriddle
Differential Review: https://reviews.llvm.org/D132838