This flag enables the configuration of some transformation such as the
lowering of contractions and transposes. The default configuration
preserves the existing behavior.
The greedy rewriter is used in many different flows and it has a lot of
convenience (work list management, debugging actions, tracing, etc). But
it combines two kinds of greedy behavior 1) how ops are matched, 2)
folding wherever it can.
These are independent forms of greedy and leads to inefficiency. E.g.,
cases where one need to create different phases in lowering and is
required to applying patterns in specific order split across different
passes. Using the driver one ends up needlessly retrying folding/having
multiple rounds of folding attempts, where one final run would have
sufficed.
Of course folks can locally avoid this behavior by just building their
own, but this is also a common requested feature that folks keep on
working around locally in suboptimal ways.
For downstream users, there should be no behavioral change. Updating
from the deprecated should just be a find and replace (e.g., `find ./
-type f -exec sed -i
's|applyPatternsAndFoldGreedily|applyPatternsGreedily|g' {} \;` variety)
as the API arguments hasn't changed between the two.
Clean up `populateVectorToLLVMConversionPatterns` so that it populates
only conversion patterns. All rewrite patterns that do not lower to LLVM
should be populated into a separate greedy pattern rewrite.
The current combination of rewrite patterns and conversion patterns
triggered an edge case when merging the 1:1 and 1:N dialect conversions.
Depends on #119973.
The mask materialization patterns during `VectorToLLVM` are rewrite
patterns. They should run as part of the greedy pattern rewrite and not
the dialect conversion. (Rewrite patterns and conversion patterns are
not generally compatible.)
The current combination of rewrite patterns and conversion patterns
triggered an edge case when merging the 1:1 and 1:N dialect conversions.
This patch registers the tensor dialect as dependent of the
ConvertVectorToLLVM.
This which fixes a crash when `vector.transfer_write` is used with
dynamic tensor type.
The MaterializeTransferMask pattern would call
`vector::createOrFoldDimOp` which
creates a `tensor.dim` operation.
Fixes#107805.
There was some inconsistency with ConvertVectorToLLVM Pass builder,
files and option names.
This patch aims to move all occurences to ConvertVectorToLLVM.
The revision unrolls vector.bitcast like:
```mlir
%0 = vector.bitcast %arg0 : vector<2x4xi32> to vector<2x2xi64>
```
to
```mlir
%cst = arith.constant dense<0> : vector<2x2xi64>
%0 = vector.extract %arg0[0] : vector<4xi32> from vector<2x4xi32>
%1 = vector.bitcast %0 : vector<4xi32> to vector<2xi64>
%2 = vector.insert %1, %cst [0] : vector<2xi64> into vector<2x2xi64>
%3 = vector.extract %arg0[1] : vector<4xi32> from vector<2x4xi32>
%4 = vector.bitcast %3 : vector<4xi32> to vector<2xi64>
%5 = vector.insert %4, %2 [1] : vector<2xi64> into vector<2x2xi64>
```
The scalable vector is not supported because of the limitation of
`vector::createUnrollIterator`. The targetRank could mismatch the final
rank during unrolling; there is no direct way to query what the final
rank is from the object.
This gives more flexibility with when these lowerings are performed,
without also lowering unrelated vector ops.
This is a NFC (other than adding a new `-convert-arm-sme-to-llvm` pass)
This rearranges the Arm SVE dialect to have the same structure of the
Arm SME dialect. So this just moves around some source files and adds a
ArmSVE_IntrOp base class for SVE intrinsics. This makes later changes a
little easier and more consistent other dialects.
Currently, the VectorToLLVM patterns are built into a library along
with the corresponding pass, which also pulls in all the
platform-specific vector dialects (like AMXDialect) to apply all the
vector to LLVM conversions.
This causes dependency bloat when writing libraries - for example the
GPU to LLVM passes, which use the vector to LLVM patterns, don't need
the X86Vector dialect to be present at all.
This commit partitions the library into VectorToLLVM and
VectorToLLVMPass, where the latter pulls in all the other vector
transformations.
Reviewed By: nicolasvasilache, mehdi_amini
Differential Revision: https://reviews.llvm.org/D158287
At the moment, SME-to-LLVM lowerings rely entirely on
`LLVMTypeConverter`. This patch introduces a dedicated `TypeConverter`
that inherits from `LLVMTypeConverter` (it will also be used when
lowering ArmSME Ops to LLVM).
The new type converter merely disables lowerings for `VectorType` to
prevent 2-d scalable vectors (common in the context of ArmSME), e.g.
`vector<[16]x[16]xi8>`,
entering the LLVM Type converter. LLVM does not support arrays of
scalable vectors and hence the need for specialisation. In the case of
SME such types are effectively eliminated when emitting LLVM IR
intrinsics for SME.
Differential Revision: https://reviews.llvm.org/D155365
At the moment, the lowering from the Vector dialect to SME looks like
this:
* Vector --> SME LLVM IR intrinsics
This patch introduces a new lowering layer between the Vector dialect
and the Arm SME extension:
* Vector --> ArmSME dialect (custom Ops) --> SME LLVM IR intrinsics.
This is motivated by 2 considerations:
1. Storing `ZA` to memory (e.g. `vector.transfer_write`) requires an
`scf.for` loop over all rows of `ZA`. Similar logic will apply to
"load to ZA from memory". This is a rather complex transformation and
a custom Op seems justified.
2. As discussed in [1], we need to prevent the LLVM type converter from
having to convert types unsupported in LLVM, e.g.
`vector<[16]x[16]xi8>`. A dedicated abstraction layer with custom Ops
opens a path to some fine tuning (e.g. custom type converters) that
will allow us to avoid this.
To facilitate this change, two new custom SME Op are introduced:
* `TileStoreOp`, and
* `ZeroOp`.
Note that no new functionality is added - these Ops merely model what's
already supported. In particular, the following tile size is assumed
(dimension and element size are fixed):
* `vector<[16]x[16]xi8>`
The new lowering layer is introduced via a conversion pass between the
Vector and the SME dialects. You can use the `-convert-vector-to-sme`
flag to run it. The following function:
```
func.func @example(%arg0 : memref<?x?xi8>) {
// (...)
%cst = arith.constant dense<0> : vector<[16]x[16]xi8>
vector.transfer_write %cst, %arg0 : vector<[16]x[16]xi8>, memref<?x?xi8>
return
}
```
would be lowered to:
```
func.func @example(%arg0: memref<?x?xi8>) {
// (...)
%0 = arm_sme.zero : vector<[16]x[16]xi8>
arm_sme.tile_store %arg0[%c0, %c0], %0 : memref<?x?xi8>, vector<[16]x[16]xi8>
return
}
```
Later, a mechanism will be introduced to guarantee that `arm_sme.zero`
and `arm_sme.tile_store` operate on the same virtual tile. For `i8`
elements this is not required as there is only one tile.
In order to lower the above output to LLVM, use
* `-convert-vector-to-llvm="enable-arm-sme"`.
[1] https://github.com/openxla/iree/issues/14294
Reviewed By: WanderAway
Differential Revision: https://reviews.llvm.org/D154867
This patch adds two LLVM intrinsics to the ArmSME dialect:
* llvm.aarch64.sme.za.enable
* llvm.aarch64.sme.za.disable
for enabling the ZA storage array [1], as well as patterns for inserting
them during legalization to LLVM at the start and end of functions if
the function has the 'arm_za' attribute (D152695).
In the future ZA should probably be automatically enabled/disabled when
lowering from vector to SME, but this should be sufficient for now at
least until we have patterns lowering to SME instructions that use ZA.
N.B. The backend function attribute 'aarch64_pstate_za_new' can be used
manage ZA state (as was originally tried in D152694), but it emits calls
to the following SME support routines [2] for the lazy-save mechanism
[3]:
* __arm_tpidr2_restore
* __arm_tpidr2_save
These will soon be added to compiler-rt but there's currently no public
implementation, and using this attribute would introduce an MLIR
dependency on compiler-rt. Furthermore, this mechanism is for routines
with ZA enabled calling other routines with it also enabled. We can
choose not to enable ZA in the compiler when this is case.
Depends on D152695
[1] https://developer.arm.com/documentation/ddi0616/aa
[2] https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst#sme-support-routines
[3] https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst#the-za-lazy-saving-scheme
Reviewed By: awarzynski, dcaballe
Differential Revision: https://reviews.llvm.org/D153050
Vector dialect patterns have grown enormously in the past year to a point where they are now impenetrable.
Start reorganizing them towards finer-grained control.
Differential Revision: https://reviews.llvm.org/D146736
See https://github.com/llvm/llvm-project/issues/57475 for more context.
Using auto-generated constructors and options has significant advantages:
* It forces a uniform style and expectation for consuming a pass
* It allows to very easily add, remove or change options to a pass by simply making the changes in TableGen
* Its less code
This patch in particular ports all the conversion passes which lower to LLVM to use the auto generated constructors and options. For the most part, care was taken so that auto generated constructor functions have the same name as they previously did. Only following slight breaking changes (which I consider as worth the churn) have been made:
* `mlir::cf::createConvertControlFlowToLLVMPass` has been moved to the `mlir` namespace. This is consistent with basically all conversion passes
* `createGpuToLLVMConversionPass` now takes a proper options struct array for its pass options. The pass options are now also autogenerated.
* `LowerVectorToLLVMOptions` has been replaced by the autogenerated `ConvertVectorToLLVMPassOptions` which is automatically kept up to date by TableGen
* I had to move one function in the GPU to LLVM lowering as it is used as default value for an option.
* All passes that previously returned `unique_ptr<OperationPass<...>>` now simply return `unique_ptr<Pass>`
Differential Revision: https://reviews.llvm.org/D143773
The patch introduces the required changes to update the pass declarations and definitions to use the new autogenerated files and allow dropping the old infrastructure.
Reviewed By: mehdi_amini, rriddle
Differential Review: https://reviews.llvm.org/D132838
The patch introduces the required changes to update the pass declarations and definitions to use the new autogenerated files and allow dropping the old infrastructure.
Reviewed By: mehdi_amini, rriddle
Differential Review: https://reviews.llvm.org/D132838
We are using "enable-index-optimizations" and "indexOptimizations" as
names for an optimization that consists of using i32 for indices within
a vector. For instance, when building a vector comparison for mask
generation. The name is confusing and suggests a scope beyond these
vector indices. This change makes the function of the option explicit
in its name.
Differential Revision: https://reviews.llvm.org/D122415
The way vector.create_mask is currently lowered is
vector-length-dependent, and therefore incompatible with scalable vector
types. This patch adds an alternative lowering path for create_mask
operations that return a scalable vector mask.
Differential Revision: https://reviews.llvm.org/D118248
The Func has a large number of legacy dependencies carried over from the old
Standard dialect, which was pervasive and contained a large number of varied
operations. With the split of the standard dialect and its demise, a lot of lingering
dead dependencies have survived to the Func dialect. This commit removes a
large majority of then, greatly reducing the dependence surface area of the
Func dialect.
The last remaining operations in the standard dialect all revolve around
FuncOp/function related constructs. This patch simply handles the initial
renaming (which by itself is already huge), but there are a large number
of cleanups unlocked/necessary afterwards:
* Removing a bunch of unnecessary dependencies on Func
* Cleaning up the From/ToStandard conversion passes
* Preparing for the move of FuncOp to the Func dialect
See the discussion at https://discourse.llvm.org/t/standard-dialect-the-final-chapter/6061
Differential Revision: https://reviews.llvm.org/D120624
This reduces the dependencies of the MLIRVector target and makes the dialect consistent with other dialects.
Differential Revision: https://reviews.llvm.org/D118533
Precursor: https://reviews.llvm.org/D110200
Removed redundant ops from the standard dialect that were moved to the
`arith` or `math` dialects.
Renamed all instances of operations in the codebase and in tests.
Reviewed By: rriddle, jpienaar
Differential Revision: https://reviews.llvm.org/D110797
It was bundling quite a lot of patterns that convert high-D
vector ops into low-D elementary ops. It might not be good
for all of the patterns to happen for a particular downstream
user. For example, `ShapeCastOpRewritePattern` rewrites
`vector.shape_cast` into data movement extract/insert ops.
Instead, split the entry point into multiple ones so users
can pull in patterns on demand.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D111225
This simplifies the vector to LLVM lowering. Previously, both vector.load/store and vector.transfer_read/write lowered directly to LLVM. With this commit, there is a single path to LLVM vector load/store instructions and vector.transfer_read/write ops must first be lowered to vector.load/store ops.
* Remove vector.transfer_read/write to LLVM lowering.
* Allow non-unit memref strides on all but the most minor dimension for vector.load/store ops.
* Add maxTransferRank option to populateVectorTransferLoweringPatterns.
* vector.transfer_reads with changing element type can no longer be lowered to LLVM. (This functionality is needed only for SPIRV.)
Differential Revision: https://reviews.llvm.org/D106118
The dialect-specific cast between builtin (ex-standard) types and LLVM
dialect types was introduced long time before built-in support for
unrealized_conversion_cast. It has a similar purpose, but is restricted
to compatible builtin and LLVM dialect types, which may hamper
progressive lowering and composition with types from other dialects.
Replace llvm.mlir.cast with unrealized_conversion_cast, and drop the
operation that became unnecessary.
Also make unrealized_conversion_cast legal by default in
LLVMConversionTarget as the majority of convesions using it are partial
conversions that actually want the casts to persist in the IR. The
standard-to-llvm conversion, which is still expected to run last, cleans
up the remaining casts standard-to-llvm conversion, which is still
expected to run last, cleans up the remaining casts
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D105880
After the MemRef has been split out of the Standard dialect, the
conversion to the LLVM dialect remained as a huge monolithic pass.
This is undesirable for the same complexity management reasons as having
a huge Standard dialect itself, and is even more confusing given the
existence of a separate dialect. Extract the conversion of the MemRef
dialect operations to LLVM into a separate library and a separate
conversion pass.
Reviewed By: herhut, silvas
Differential Revision: https://reviews.llvm.org/D105625
Simplify vector unrolling pattern to be more aligned with rest of the
patterns and be closer to vector distribution.
The new implementation uses ExtractStridedSlice/InsertStridedSlice
instead of the Tuple ops. After this change the ops based on Tuple don't
have any more used so they can be removed.
This allows removing signifcant amount of dead code and will allow
extending the unrolling code going forward.
Differential Revision: https://reviews.llvm.org/D105381
Move TransposeOp lowering in its own populate function as in some cases
it is better to keep it during ContractOp lowering to better
canonicalize it rather than emiting scalar insert/extract.
Differential Revision: https://reviews.llvm.org/D101647
ArmSVE dialect is behind the recent changes in how the Vector dialect
interacts with backend vector dialects and the MLIR -> LLVM IR
translation module. This patch cleans up ArmSVE initialization within
Vector and removes the need for an LLVMArmSVE dialect.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D100171
We will soon be adding non-AVX512 operations to MLIR, such as AVX's rsqrt. In https://reviews.llvm.org/D99818 several possibilities were discussed, namely to (1) add non-AVX512 ops to the AVX512 dialect, (2) add more dialects (e.g. AVX dialect for AVX rsqrt), and (3) expand the scope of the AVX512 to include these SIMD x86 ops, thereby renaming the dialect to something more accurate such as X86Vector.
Consensus was reached on option (3), which this patch implements.
Reviewed By: aartbik, ftynse, nicolasvasilache
Differential Revision: https://reviews.llvm.org/D100119
Also factors out out-of-bounds mask generation from vector.transfer_read/write into a new MaterializeTransferMask pattern.
Differential Revision: https://reviews.llvm.org/D100001
This doesn't change APIs, this just cleans up the many in-tree uses of these
names to use the new preferred names. We'll keep the old names around for a
couple weeks to help transitions.
Differential Revision: https://reviews.llvm.org/D99127
This updates the codebase to pass the context when creating an instance of
OwningRewritePatternList, and starts removing extraneous MLIRContext
parameters. There are many many more to be removed.
Differential Revision: https://reviews.llvm.org/D99028
The Intel Advanced Matrix Extensions (AMX) provides a tile matrix
multiply unit (TMUL), a tile control register (TILECFG), and eight
tile registers TMM0 through TMM7 (TILEDATA). This new MLIR dialect
provides a bridge between MLIR concepts like vectors and memrefs
and the lower level LLVM IR details of AMX.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D98470
The dialect separation was introduced to demarkate ops operating in different
type systems. This is no longer the case after the LLVM dialect has migrated to
using built-in vector types, so the original reason for separation is no longer
valid. Squash the two dialects into one.
The code size decrease isn't quite large: the ops originally in LLVM_AVX512 are
preserved because they match LLVM IR intrinsics specialized for vector element
bitwidth. However, it is still conceptually beneficial to have only one
dialect. I originally considered to use Tablegen multiclasses to define both
the type-polymorphic op and its two intrinsic-related instantiations, but
decided against it given both the complexity of the required Tablegen input and
its dissimilarity with the rest of ODS-defined ops, both potentially resulting
in very poor maintainability.
Depends On D98327
Reviewed By: nicolasvasilache, springerm
Differential Revision: https://reviews.llvm.org/D98328
The two dialects are largely redundant. The former was introduced as a mirror
of the latter operating on LLVM dialect types. This is no longer necessary
since the LLVM dialect operates on built-in types. Combine the two dialects.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D98060
This makes ignoring a result explicit by the user, and helps to prevent accidental errors with dropped results. Marking LogicalResult as no discard was always the intention from the beginning, but got lost along the way.
Differential Revision: https://reviews.llvm.org/D95841
Historically, the Vector to LLVM dialect conversion subsumed the Standard to
LLVM dialect conversion patterns. This was necessary because the conversion
infrastructure did not have sufficient support for reconciling type
conversions. This support is now available. Only keep the patterns related to
the Vector dialect in the Vector to LLVM conversion and require type casts
operations to be inserted if necessary. These casts will be removed by
following conversions if possible. Update integration tests to also run the
Standard to LLVM conversion.
There is a significant amount of test churn, which is due to (a) unnecessarily
strict tests in VectorToLLVM and (b) many patterns actually targeting Standard
dialect ops instead of LLVM dialect ops leading to tests actually exercising a
Vector->Standard->LLVM conversion. This churn is a good illustration of the
reason to make the conversion partial: now the tests only check the code in the
Vector to LLVM conversion and will not be randomly broken by changes in
Standard to LLVM conversion.
Arguably, it may be possible to extract Vector to Standard patterns into a
separate pass, but given the ongoing splitting of the Standard dialect, such
pass will be short-lived and will require further refactoring.
Depends On D95626
Reviewed By: nicolasvasilache, aartbik
Differential Revision: https://reviews.llvm.org/D95685